International Nuclear Information System (INIS)
Jafri, Y.Z.; Kamal, L.
2007-01-01
Various statistical techniques was used on five-year data from 1998-2002 of average humidity, rainfall, maximum and minimum temperatures, respectively. The relationships to regression analysis time series (RATS) were developed for determining the overall trend of these climate parameters on the basis of which forecast models can be corrected and modified. We computed the coefficient of determination as a measure of goodness of fit, to our polynomial regression analysis time series (PRATS). The correlation to multiple linear regression (MLR) and multiple linear regression analysis time series (MLRATS) were also developed for deciphering the interdependence of weather parameters. Spearman's rand correlation and Goldfeld-Quandt test were used to check the uniformity or non-uniformity of variances in our fit to polynomial regression (PR). The Breusch-Pagan test was applied to MLR and MLRATS, respectively which yielded homoscedasticity. We also employed Bartlett's test for homogeneity of variances on a five-year data of rainfall and humidity, respectively which showed that the variances in rainfall data were not homogenous while in case of humidity, were homogenous. Our results on regression and regression analysis time series show the best fit to prediction modeling on climatic data of Quetta, Pakistan. (author)
Time series regression model for infectious disease and weather.
Imai, Chisato; Armstrong, Ben; Chalabi, Zaid; Mangtani, Punam; Hashizume, Masahiro
2015-10-01
Time series regression has been developed and long used to evaluate the short-term associations of air pollution and weather with mortality or morbidity of non-infectious diseases. The application of the regression approaches from this tradition to infectious diseases, however, is less well explored and raises some new issues. We discuss and present potential solutions for five issues often arising in such analyses: changes in immune population, strong autocorrelations, a wide range of plausible lag structures and association patterns, seasonality adjustments, and large overdispersion. The potential approaches are illustrated with datasets of cholera cases and rainfall from Bangladesh and influenza and temperature in Tokyo. Though this article focuses on the application of the traditional time series regression to infectious diseases and weather factors, we also briefly introduce alternative approaches, including mathematical modeling, wavelet analysis, and autoregressive integrated moving average (ARIMA) models. Modifications proposed to standard time series regression practice include using sums of past cases as proxies for the immune population, and using the logarithm of lagged disease counts to control autocorrelation due to true contagion, both of which are motivated from "susceptible-infectious-recovered" (SIR) models. The complexity of lag structures and association patterns can often be informed by biological mechanisms and explored by using distributed lag non-linear models. For overdispersed models, alternative distribution models such as quasi-Poisson and negative binomial should be considered. Time series regression can be used to investigate dependence of infectious diseases on weather, but may need modifying to allow for features specific to this context. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
Model and Variable Selection Procedures for Semiparametric Time Series Regression
Directory of Open Access Journals (Sweden)
Risa Kato
2009-01-01
Full Text Available Semiparametric regression models are very useful for time series analysis. They facilitate the detection of features resulting from external interventions. The complexity of semiparametric models poses new challenges for issues of nonparametric and parametric inference and model selection that frequently arise from time series data analysis. In this paper, we propose penalized least squares estimators which can simultaneously select significant variables and estimate unknown parameters. An innovative class of variable selection procedure is proposed to select significant variables and basis functions in a semiparametric model. The asymptotic normality of the resulting estimators is established. Information criteria for model selection are also proposed. We illustrate the effectiveness of the proposed procedures with numerical simulations.
A generalized additive regression model for survival times
DEFF Research Database (Denmark)
Scheike, Thomas H.
2001-01-01
Additive Aalen model; counting process; disability model; illness-death model; generalized additive models; multiple time-scales; non-parametric estimation; survival data; varying-coefficient models......Additive Aalen model; counting process; disability model; illness-death model; generalized additive models; multiple time-scales; non-parametric estimation; survival data; varying-coefficient models...
A generalized exponential time series regression model for electricity prices
DEFF Research Database (Denmark)
Haldrup, Niels; Knapik, Oskar; Proietti, Tomasso
on the estimated model, the best linear predictor is constructed. Our modeling approach provides good fit within sample and outperforms competing benchmark predictors in terms of forecasting accuracy. We also find that building separate models for each hour of the day and averaging the forecasts is a better...
Ng, Kar Yong; Awang, Norhashidah
2018-01-06
Frequent haze occurrences in Malaysia have made the management of PM 10 (particulate matter with aerodynamic less than 10 μm) pollution a critical task. This requires knowledge on factors associating with PM 10 variation and good forecast of PM 10 concentrations. Hence, this paper demonstrates the prediction of 1-day-ahead daily average PM 10 concentrations based on predictor variables including meteorological parameters and gaseous pollutants. Three different models were built. They were multiple linear regression (MLR) model with lagged predictor variables (MLR1), MLR model with lagged predictor variables and PM 10 concentrations (MLR2) and regression with time series error (RTSE) model. The findings revealed that humidity, temperature, wind speed, wind direction, carbon monoxide and ozone were the main factors explaining the PM 10 variation in Peninsular Malaysia. Comparison among the three models showed that MLR2 model was on a same level with RTSE model in terms of forecasting accuracy, while MLR1 model was the worst.
Flexible survival regression modelling
DEFF Research Database (Denmark)
Cortese, Giuliana; Scheike, Thomas H; Martinussen, Torben
2009-01-01
Regression analysis of survival data, and more generally event history data, is typically based on Cox's regression model. We here review some recent methodology, focusing on the limitations of Cox's regression model. The key limitation is that the model is not well suited to represent time-varyi...
Time-adaptive quantile regression
DEFF Research Database (Denmark)
Møller, Jan Kloppenborg; Nielsen, Henrik Aalborg; Madsen, Henrik
2008-01-01
An algorithm for time-adaptive quantile regression is presented. The algorithm is based on the simplex algorithm, and the linear optimization formulation of the quantile regression problem is given. The observations have been split to allow a direct use of the simplex algorithm. The simplex method...... and an updating procedure are combined into a new algorithm for time-adaptive quantile regression, which generates new solutions on the basis of the old solution, leading to savings in computation time. The suggested algorithm is tested against a static quantile regression model on a data set with wind power...... production, where the models combine splines and quantile regression. The comparison indicates superior performance for the time-adaptive quantile regression in all the performance parameters considered....
Krishnan, M.; Bhowmik, B.; Hazra, B.; Pakrashi, V.
2018-02-01
In this paper, a novel baseline free approach for continuous online damage detection of multi degree of freedom vibrating structures using Recursive Principal Component Analysis (RPCA) in conjunction with Time Varying Auto-Regressive Modeling (TVAR) is proposed. In this method, the acceleration data is used to obtain recursive proper orthogonal components online using rank-one perturbation method, followed by TVAR modeling of the first transformed response, to detect the change in the dynamic behavior of the vibrating system from its pristine state to contiguous linear/non-linear-states that indicate damage. Most of the works available in the literature deal with algorithms that require windowing of the gathered data owing to their data-driven nature which renders them ineffective for online implementation. Algorithms focussed on mathematically consistent recursive techniques in a rigorous theoretical framework of structural damage detection is missing, which motivates the development of the present framework that is amenable for online implementation which could be utilized along with suite experimental and numerical investigations. The RPCA algorithm iterates the eigenvector and eigenvalue estimates for sample covariance matrices and new data point at each successive time instants, using the rank-one perturbation method. TVAR modeling on the principal component explaining maximum variance is utilized and the damage is identified by tracking the TVAR coefficients. This eliminates the need for offline post processing and facilitates online damage detection especially when applied to streaming data without requiring any baseline data. Numerical simulations performed on a 5-dof nonlinear system under white noise excitation and El Centro (also known as 1940 Imperial Valley earthquake) excitation, for different damage scenarios, demonstrate the robustness of the proposed algorithm. The method is further validated on results obtained from case studies involving
Longitudinal beta regression models for analyzing health-related quality of life scores over time
Directory of Open Access Journals (Sweden)
Hunger Matthias
2012-09-01
Full Text Available Abstract Background Health-related quality of life (HRQL has become an increasingly important outcome parameter in clinical trials and epidemiological research. HRQL scores are typically bounded at both ends of the scale and often highly skewed. Several regression techniques have been proposed to model such data in cross-sectional studies, however, methods applicable in longitudinal research are less well researched. This study examined the use of beta regression models for analyzing longitudinal HRQL data using two empirical examples with distributional features typically encountered in practice. Methods We used SF-6D utility data from a German older age cohort study and stroke-specific HRQL data from a randomized controlled trial. We described the conceptual differences between mixed and marginal beta regression models and compared both models to the commonly used linear mixed model in terms of overall fit and predictive accuracy. Results At any measurement time, the beta distribution fitted the SF-6D utility data and stroke-specific HRQL data better than the normal distribution. The mixed beta model showed better likelihood-based fit statistics than the linear mixed model and respected the boundedness of the outcome variable. However, it tended to underestimate the true mean at the upper part of the distribution. Adjusted group means from marginal beta model and linear mixed model were nearly identical but differences could be observed with respect to standard errors. Conclusions Understanding the conceptual differences between mixed and marginal beta regression models is important for their proper use in the analysis of longitudinal HRQL data. Beta regression fits the typical distribution of HRQL data better than linear mixed models, however, if focus is on estimating group mean scores rather than making individual predictions, the two methods might not differ substantially.
Replica analysis of overfitting in regression models for time-to-event data
Coolen, A. C. C.; Barrett, J. E.; Paga, P.; Perez-Vicente, C. J.
2017-09-01
Overfitting, which happens when the number of parameters in a model is too large compared to the number of data points available for determining these parameters, is a serious and growing problem in survival analysis. While modern medicine presents us with data of unprecedented dimensionality, these data cannot yet be used effectively for clinical outcome prediction. Standard error measures in maximum likelihood regression, such as p-values and z-scores, are blind to overfitting, and even for Cox’s proportional hazards model (the main tool of medical statisticians), one finds in literature only rules of thumb on the number of samples required to avoid overfitting. In this paper we present a mathematical theory of overfitting in regression models for time-to-event data, which aims to increase our quantitative understanding of the problem and provide practical tools with which to correct regression outcomes for the impact of overfitting. It is based on the replica method, a statistical mechanical technique for the analysis of heterogeneous many-variable systems that has been used successfully for several decades in physics, biology, and computer science, but not yet in medical statistics. We develop the theory initially for arbitrary regression models for time-to-event data, and verify its predictions in detail for the popular Cox model.
Improving the Prediction of Total Surgical Procedure Time Using Linear Regression Modeling
Directory of Open Access Journals (Sweden)
Eric R. Edelman
2017-06-01
Full Text Available For efficient utilization of operating rooms (ORs, accurate schedules of assigned block time and sequences of patient cases need to be made. The quality of these planning tools is dependent on the accurate prediction of total procedure time (TPT per case. In this paper, we attempt to improve the accuracy of TPT predictions by using linear regression models based on estimated surgeon-controlled time (eSCT and other variables relevant to TPT. We extracted data from a Dutch benchmarking database of all surgeries performed in six academic hospitals in The Netherlands from 2012 till 2016. The final dataset consisted of 79,983 records, describing 199,772 h of total OR time. Potential predictors of TPT that were included in the subsequent analysis were eSCT, patient age, type of operation, American Society of Anesthesiologists (ASA physical status classification, and type of anesthesia used. First, we computed the predicted TPT based on a previously described fixed ratio model for each record, multiplying eSCT by 1.33. This number is based on the research performed by van Veen-Berkx et al., which showed that 33% of SCT is generally a good approximation of anesthesia-controlled time (ACT. We then systematically tested all possible linear regression models to predict TPT using eSCT in combination with the other available independent variables. In addition, all regression models were again tested without eSCT as a predictor to predict ACT separately (which leads to TPT by adding SCT. TPT was most accurately predicted using a linear regression model based on the independent variables eSCT, type of operation, ASA classification, and type of anesthesia. This model performed significantly better than the fixed ratio model and the method of predicting ACT separately. Making use of these more accurate predictions in planning and sequencing algorithms may enable an increase in utilization of ORs, leading to significant financial and productivity related
Improving the Prediction of Total Surgical Procedure Time Using Linear Regression Modeling.
Edelman, Eric R; van Kuijk, Sander M J; Hamaekers, Ankie E W; de Korte, Marcel J M; van Merode, Godefridus G; Buhre, Wolfgang F F A
2017-01-01
For efficient utilization of operating rooms (ORs), accurate schedules of assigned block time and sequences of patient cases need to be made. The quality of these planning tools is dependent on the accurate prediction of total procedure time (TPT) per case. In this paper, we attempt to improve the accuracy of TPT predictions by using linear regression models based on estimated surgeon-controlled time (eSCT) and other variables relevant to TPT. We extracted data from a Dutch benchmarking database of all surgeries performed in six academic hospitals in The Netherlands from 2012 till 2016. The final dataset consisted of 79,983 records, describing 199,772 h of total OR time. Potential predictors of TPT that were included in the subsequent analysis were eSCT, patient age, type of operation, American Society of Anesthesiologists (ASA) physical status classification, and type of anesthesia used. First, we computed the predicted TPT based on a previously described fixed ratio model for each record, multiplying eSCT by 1.33. This number is based on the research performed by van Veen-Berkx et al., which showed that 33% of SCT is generally a good approximation of anesthesia-controlled time (ACT). We then systematically tested all possible linear regression models to predict TPT using eSCT in combination with the other available independent variables. In addition, all regression models were again tested without eSCT as a predictor to predict ACT separately (which leads to TPT by adding SCT). TPT was most accurately predicted using a linear regression model based on the independent variables eSCT, type of operation, ASA classification, and type of anesthesia. This model performed significantly better than the fixed ratio model and the method of predicting ACT separately. Making use of these more accurate predictions in planning and sequencing algorithms may enable an increase in utilization of ORs, leading to significant financial and productivity related benefits.
Wilson, Barry T.; Knight, Joseph F.; McRoberts, Ronald E.
2018-03-01
Imagery from the Landsat Program has been used frequently as a source of auxiliary data for modeling land cover, as well as a variety of attributes associated with tree cover. With ready access to all scenes in the archive since 2008 due to the USGS Landsat Data Policy, new approaches to deriving such auxiliary data from dense Landsat time series are required. Several methods have previously been developed for use with finer temporal resolution imagery (e.g. AVHRR and MODIS), including image compositing and harmonic regression using Fourier series. The manuscript presents a study, using Minnesota, USA during the years 2009-2013 as the study area and timeframe. The study examined the relative predictive power of land cover models, in particular those related to tree cover, using predictor variables based solely on composite imagery versus those using estimated harmonic regression coefficients. The study used two common non-parametric modeling approaches (i.e. k-nearest neighbors and random forests) for fitting classification and regression models of multiple attributes measured on USFS Forest Inventory and Analysis plots using all available Landsat imagery for the study area and timeframe. The estimated Fourier coefficients developed by harmonic regression of tasseled cap transformation time series data were shown to be correlated with land cover, including tree cover. Regression models using estimated Fourier coefficients as predictor variables showed a two- to threefold increase in explained variance for a small set of continuous response variables, relative to comparable models using monthly image composites. Similarly, the overall accuracies of classification models using the estimated Fourier coefficients were approximately 10-20 percentage points higher than the models using the image composites, with corresponding individual class accuracies between six and 45 percentage points higher.
Prahutama, Alan; Suparti; Wahyu Utami, Tiani
2018-03-01
Regression analysis is an analysis to model the relationship between response variables and predictor variables. The parametric approach to the regression model is very strict with the assumption, but nonparametric regression model isn’t need assumption of model. Time series data is the data of a variable that is observed based on a certain time, so if the time series data wanted to be modeled by regression, then we should determined the response and predictor variables first. Determination of the response variable in time series is variable in t-th (yt), while the predictor variable is a significant lag. In nonparametric regression modeling, one developing approach is to use the Fourier series approach. One of the advantages of nonparametric regression approach using Fourier series is able to overcome data having trigonometric distribution. In modeling using Fourier series needs parameter of K. To determine the number of K can be used Generalized Cross Validation method. In inflation modeling for the transportation sector, communication and financial services using Fourier series yields an optimal K of 120 parameters with R-square 99%. Whereas if it was modeled by multiple linear regression yield R-square 90%.
Lee, Eunjee; Zhu, Hongtu; Kong, Dehan; Wang, Yalin; Giovanello, Kelly Sullivan; Ibrahim, Joseph G
2015-12-01
The aim of this paper is to develop a Bayesian functional linear Cox regression model (BFLCRM) with both functional and scalar covariates. This new development is motivated by establishing the likelihood of conversion to Alzheimer's disease (AD) in 346 patients with mild cognitive impairment (MCI) enrolled in the Alzheimer's Disease Neuroimaging Initiative 1 (ADNI-1) and the early markers of conversion. These 346 MCI patients were followed over 48 months, with 161 MCI participants progressing to AD at 48 months. The functional linear Cox regression model was used to establish that functional covariates including hippocampus surface morphology and scalar covariates including brain MRI volumes, cognitive performance (ADAS-Cog), and APOE status can accurately predict time to onset of AD. Posterior computation proceeds via an efficient Markov chain Monte Carlo algorithm. A simulation study is performed to evaluate the finite sample performance of BFLCRM.
Time-series regression models to study the short-term effects of environmental factors on health
Tobías, Aureli; Saez, Marc
2004-01-01
Time series regression models are especially suitable in epidemiology for evaluating short-term effects of time-varying exposures on health. The problem is that potential for confounding in time series regression is very high. Thus, it is important that trend and seasonality are properly accounted for. Our paper reviews the statistical models commonly used in time-series regression methods, specially allowing for serial correlation, make them potentially useful for selected epidemiological pu...
Lalonde, Trent L; Wilson, Jeffrey R; Yin, Jianqiong
2014-11-30
When analyzing longitudinal data, it is essential to account both for the correlation inherent from the repeated measures of the responses as well as the correlation realized on account of the feedback created between the responses at a particular time and the predictors at other times. As such one can analyze these data using generalized estimating equation with the independent working correlation. However, because it is essential to include all the appropriate moment conditions as you solve for the regression coefficients, we explore an alternative approach using a generalized method of moments for estimating the coefficients in such data. We develop an approach that makes use of all the valid moment conditions necessary with each time-dependent and time-independent covariate. This approach does not assume that feedback is always present over time, or if present occur at the same degree. Further, we make use of continuously updating generalized method of moments in obtaining estimates. We fit the generalized method of moments logistic regression model with time-dependent covariates using SAS PROC IML and also in R. We used p-values adjusted for multiple correlated tests to determine the appropriate moment conditions for determining the regression coefficients. We examined two datasets for illustrative purposes. We looked at re-hospitalization taken from a Medicare database. We also revisited data regarding the relationship between the body mass index and future morbidity among children in the Philippines. We conducted a simulated study to compare the performances of extended classifications. Copyright © 2014 John Wiley & Sons, Ltd.
Prediction of hourly PM2.5 using a space-time support vector regression model
Yang, Wentao; Deng, Min; Xu, Feng; Wang, Hang
2018-05-01
Real-time air quality prediction has been an active field of research in atmospheric environmental science. The existing methods of machine learning are widely used to predict pollutant concentrations because of their enhanced ability to handle complex non-linear relationships. However, because pollutant concentration data, as typical geospatial data, also exhibit spatial heterogeneity and spatial dependence, they may violate the assumptions of independent and identically distributed random variables in most of the machine learning methods. As a result, a space-time support vector regression model is proposed to predict hourly PM2.5 concentrations. First, to address spatial heterogeneity, spatial clustering is executed to divide the study area into several homogeneous or quasi-homogeneous subareas. To handle spatial dependence, a Gauss vector weight function is then developed to determine spatial autocorrelation variables as part of the input features. Finally, a local support vector regression model with spatial autocorrelation variables is established for each subarea. Experimental data on PM2.5 concentrations in Beijing are used to verify whether the results of the proposed model are superior to those of other methods.
A joint logistic regression and covariate-adjusted continuous-time Markov chain model.
Rubin, Maria Laura; Chan, Wenyaw; Yamal, Jose-Miguel; Robertson, Claudia Sue
2017-12-10
The use of longitudinal measurements to predict a categorical outcome is an increasingly common goal in research studies. Joint models are commonly used to describe two or more models simultaneously by considering the correlated nature of their outcomes and the random error present in the longitudinal measurements. However, there is limited research on joint models with longitudinal predictors and categorical cross-sectional outcomes. Perhaps the most challenging task is how to model the longitudinal predictor process such that it represents the true biological mechanism that dictates the association with the categorical response. We propose a joint logistic regression and Markov chain model to describe a binary cross-sectional response, where the unobserved transition rates of a two-state continuous-time Markov chain are included as covariates. We use the method of maximum likelihood to estimate the parameters of our model. In a simulation study, coverage probabilities of about 95%, standard deviations close to standard errors, and low biases for the parameter values show that our estimation method is adequate. We apply the proposed joint model to a dataset of patients with traumatic brain injury to describe and predict a 6-month outcome based on physiological data collected post-injury and admission characteristics. Our analysis indicates that the information provided by physiological changes over time may help improve prediction of long-term functional status of these severely ill subjects. Copyright © 2017 John Wiley & Sons, Ltd. Copyright © 2017 John Wiley & Sons, Ltd.
Directory of Open Access Journals (Sweden)
Hong Soo Lim
2018-01-01
Full Text Available Building automation systems is becoming more vital, especially in regard to reduced building energy consumption. However, the accuracy of such systems in calculating building thermal loads is limited as they are unable to predict future thermal loads based on prevailing environmental factors. The current paper therefore seeks to improve the understanding of the interactions between outdoor meteorological data and building energy consumption through a statistical analysis. Using weather data collected by the Korean Meteorological Agency (KMA over a period of three years (2011–2014, prediction models that are able to predict heating thermal loads considering the time-lag phenomenon are developed. In addition, the study develops different prediction models for buildings of different sizes. The results confirm the existence of the time-lag phenomenon: the heating load experienced by a building at a given time is better explained by a regression model developed using the climatic conditions that existed two hours before. As such, conventional building simulation programs must endeavor to include time-lag as well as Aerosol Optical Depth (AOD data as important factors in the prediction of building heating loads.
Predicting the "graduate on time (GOT)" of PhD students using binary logistics regression model
Shariff, S. Sarifah Radiah; Rodzi, Nur Atiqah Mohd; Rahman, Kahartini Abdul; Zahari, Siti Meriam; Deni, Sayang Mohd
2016-10-01
Malaysian government has recently set a new goal to produce 60,000 Malaysian PhD holders by the year 2023. As a Malaysia's largest institution of higher learning in terms of size and population which offers more than 500 academic programmes in a conducive and vibrant environment, UiTM has taken several initiatives to fill up the gap. Strategies to increase the numbers of graduates with PhD are a process that is challenging. In many occasions, many have already identified that the struggle to get into the target set is even more daunting, and that implementation is far too ideal. This has further being progressing slowly as the attrition rate increases. This study aims to apply the proposed models that incorporates several factors in predicting the number PhD students that will complete their PhD studies on time. Binary Logistic Regression model is proposed and used on the set of data to determine the number. The results show that only 6.8% of the 2014 PhD students are predicted to graduate on time and the results are compared wih the actual number for validation purpose.
Forecasting with Dynamic Regression Models
Pankratz, Alan
2012-01-01
One of the most widely used tools in statistical forecasting, single equation regression models is examined here. A companion to the author's earlier work, Forecasting with Univariate Box-Jenkins Models: Concepts and Cases, the present text pulls together recent time series ideas and gives special attention to possible intertemporal patterns, distributed lag responses of output to input series and the auto correlation patterns of regression disturbance. It also includes six case studies.
Hilbe, Joseph M
2009-01-01
This book really does cover everything you ever wanted to know about logistic regression … with updates available on the author's website. Hilbe, a former national athletics champion, philosopher, and expert in astronomy, is a master at explaining statistical concepts and methods. Readers familiar with his other expository work will know what to expect-great clarity.The book provides considerable detail about all facets of logistic regression. No step of an argument is omitted so that the book will meet the needs of the reader who likes to see everything spelt out, while a person familiar with some of the topics has the option to skip "obvious" sections. The material has been thoroughly road-tested through classroom and web-based teaching. … The focus is on helping the reader to learn and understand logistic regression. The audience is not just students meeting the topic for the first time, but also experienced users. I believe the book really does meet the author's goal … .-Annette J. Dobson, Biometric...
The projection of world geothermal energy consumption from time series and regression model
Simanullang, Elwin Y.; Supriatna, Agus; Supriatna, Asep K.
2015-12-01
World population growth has many impacts on human live activities and other related aspects. One among the aspects is the increase of the use of energy to support human daily activities, covering industrial aspect, transportation, domestic activities, etc. It is plausible that the higher the population size in a country the higher the needs for energy to support all aspects of human activities in the country. Considering the depletion of petroleum and other fossil-based energy, recently there is a tendency to use geothermal as other source of energy. In this paper we will discuss the prediction of the world consumption of geothermal energy by two different methods, i.e. via the time series of the geothermal usage and via the time series of the geothermal usage combined with the prediction of the world total population. For the first case, we use the simple exponential smoothing method while for the second case we use the simple regression method. The result shows that taking into account the prediction of the world population size giving a better prediction to forecast a short term of the geothermal energy consumption.
Using the mean approach in pooling cross-section and time series data for regression modelling
International Nuclear Information System (INIS)
Nuamah, N.N.N.N.
1989-12-01
The mean approach is one of the methods for pooling cross section and time series data for mathematical-statistical modelling. Though a simple approach, its results are sometimes paradoxical in nature. However, researchers still continue using it for its simplicity. Here, the paper investigates the nature and source of such unwanted phenomena. (author). 7 refs
Barry T. Wilson; Joseph F. Knight; Ronald E. McRoberts
2018-01-01
Imagery from the Landsat Program has been used frequently as a source of auxiliary data for modeling land cover, as well as a variety of attributes associated with tree cover. With ready access to all scenes in the archive since 2008 due to the USGS Landsat Data Policy, new approaches to deriving such auxiliary data from dense Landsat time series are required. Several...
Mixture regression models for the gap time distributions and illness-death processes.
Huang, Chia-Hui
2018-01-27
The aim of this study is to provide an analysis of gap event times under the illness-death model, where some subjects experience "illness" before "death" and others experience only "death." Which event is more likely to occur first and how the duration of the "illness" influences the "death" event are of interest. Because the occurrence of the second event is subject to dependent censoring, it can lead to bias in the estimation of model parameters. In this work, we generalize the semiparametric mixture models for competing risks data to accommodate the subsequent event and use a copula function to model the dependent structure between the successive events. Under the proposed method, the survival function of the censoring time does not need to be estimated when developing the inference procedure. We incorporate the cause-specific hazard functions with the counting process approach and derive a consistent estimation using the nonparametric maximum likelihood method. Simulations are conducted to demonstrate the performance of the proposed analysis, and its application in a clinical study on chronic myeloid leukemia is reported to illustrate its utility.
Directory of Open Access Journals (Sweden)
Tanck Michael WT
2008-01-01
Full Text Available Abstract Background This paper describes a likelihood approach to model the relation between failure time and haplotypes in studies with unrelated individuals where haplotype phase is unknown, while dealing with the problem of unstable estimates due to rare haplotypes by considering a penalized log-likelihood. Results The Cox model presented here incorporates the uncertainty related to the unknown phase of multiple heterozygous individuals as weights. Estimation is performed with an EM algorithm. In the E-step the weights are estimated, and in the M-step the parameter estimates are estimated by maximizing the expectation of the joint log-likelihood, and the baseline hazard function and haplotype frequencies are calculated. These steps are iterated until the parameter estimates converge. Two penalty functions are considered, namely the ridge penalty and a difference penalty, which is based on the assumption that similar haplotypes show similar effects. Simulations were conducted to investigate properties of the method, and the association between IL10 haplotypes and risk of target vessel revascularization was investigated in 2653 patients from the GENDER study. Conclusion Results from simulations and real data show that the penalized log-likelihood approach produces valid results, indicating that this method is of interest when studying the association between rare haplotypes and failure time in studies of unrelated individuals.
International Nuclear Information System (INIS)
Guo, Yin; Nazarian, Ehsan; Ko, Jeonghan; Rajurkar, Kamlakar
2014-01-01
Highlights: • Developed hourly-indexed ARX models for robust cooling-load forecasting. • Proposed a two-stage weighted least-squares regression approach. • Considered the effect of outliers as well as trend of cooling load and weather patterns. • Included higher order terms and day type patterns in the forecasting models. • Demonstrated better accuracy compared with some ARX and ANN models. - Abstract: This paper presents a robust hourly cooling-load forecasting method based on time-indexed autoregressive with exogenous inputs (ARX) models, in which the coefficients are estimated through a two-stage weighted least squares regression. The prediction method includes a combination of two separate time-indexed ARX models to improve prediction accuracy of the cooling load over different forecasting periods. The two-stage weighted least-squares regression approach in this study is robust to outliers and suitable for fast and adaptive coefficient estimation. The proposed method is tested on a large-scale central cooling system in an academic institution. The numerical case studies show the proposed prediction method performs better than some ANN and ARX forecasting models for the given test data set
Leroux, Romain; Chatellier, Ludovic; David, Laurent
2018-01-01
This article is devoted to the estimation of time-resolved particle image velocimetry (TR-PIV) flow fields using a time-resolved point measurements of a voltage signal obtained by hot-film anemometry. A multiple linear regression model is first defined to map the TR-PIV flow fields onto the voltage signal. Due to the high temporal resolution of the signal acquired by the hot-film sensor, the estimates of the TR-PIV flow fields are obtained with a multiple linear regression method called orthonormalized partial least squares regression (OPLSR). Subsequently, this model is incorporated as the observation equation in an ensemble Kalman filter (EnKF) applied on a proper orthogonal decomposition reduced-order model to stabilize it while reducing the effects of the hot-film sensor noise. This method is assessed for the reconstruction of the flow around a NACA0012 airfoil at a Reynolds number of 1000 and an angle of attack of {20}°. Comparisons with multi-time delay-modified linear stochastic estimation show that both the OPLSR and EnKF combined with OPLSR are more accurate as they produce a much lower relative estimation error, and provide a faithful reconstruction of the time evolution of the velocity flow fields.
(Non) linear regression modelling
Cizek, P.; Gentle, J.E.; Hardle, W.K.; Mori, Y.
2012-01-01
We will study causal relationships of a known form between random variables. Given a model, we distinguish one or more dependent (endogenous) variables Y = (Y1,…,Yl), l ∈ N, which are explained by a model, and independent (exogenous, explanatory) variables X = (X1,…,Xp),p ∈ N, which explain or
Directory of Open Access Journals (Sweden)
Qiong Hu
2018-03-01
Full Text Available Soybean cultivation in China has significantly decreased due to the rising import of genetically modified soybeans from other countries. Understanding soybean’s extent and change information is of great value for national agricultural policy implications and global food security. Some previous studies have explored the quantitative relationships between crop area and spectral variables derived from remote sensing data. However, both those linear or non-linear relationships were expressed by global regression models, which ignored the spatial non-stationarity of crop spectral signature and may limit the prediction accuracy. This study presented a geographically weighted regression model (GWR to estimate fractional soybean at 250 m spatial resolution in Heilongjiang Province, one of the most important food production regions in China, using time-series MODIS data and high-quality calibration information derived from Landsat data. A forward stepwise optimization strategy was embedded with the GWR model to select the optimal subset of independent variables for soybeans. Normalized Difference Vegetation Index (NDVI of Julian day 233 to 257 when soybeans are filling seed was found to be the most important temporal period for sub-pixel soybean area estimation. Our MODIS-based soybean area compared well with Landsat-based results at pixel-level. Also, there was a good agreement between the MODIS-based result and census data at county level, with the coefficient of determination (R2 of 0.80 and the root mean square error (RMSE was 340.21 km2. Additionally, F-test results showed GWR model had better model goodness-of-fit and higher prediction accuracy than the traditional ordinary least squares (OLS model. These promising results suggest crop spectral variations both at temporal and spatial scales should be considered when exploring its relationship with pixel-level crop acreage. The optimized GWR model by combining an automated feature selection
Ridge Regression for Interactive Models.
Tate, Richard L.
1988-01-01
An exploratory study of the value of ridge regression for interactive models is reported. Assuming that the linear terms in a simple interactive model are centered to eliminate non-essential multicollinearity, a variety of common models, representing both ordinal and disordinal interactions, are shown to have "orientations" that are…
Modified Regression Correlation Coefficient for Poisson Regression Model
Kaengthong, Nattacha; Domthong, Uthumporn
2017-09-01
This study gives attention to indicators in predictive power of the Generalized Linear Model (GLM) which are widely used; however, often having some restrictions. We are interested in regression correlation coefficient for a Poisson regression model. This is a measure of predictive power, and defined by the relationship between the dependent variable (Y) and the expected value of the dependent variable given the independent variables [E(Y|X)] for the Poisson regression model. The dependent variable is distributed as Poisson. The purpose of this research was modifying regression correlation coefficient for Poisson regression model. We also compare the proposed modified regression correlation coefficient with the traditional regression correlation coefficient in the case of two or more independent variables, and having multicollinearity in independent variables. The result shows that the proposed regression correlation coefficient is better than the traditional regression correlation coefficient based on Bias and the Root Mean Square Error (RMSE).
Nonparametric Mixture of Regression Models.
Huang, Mian; Li, Runze; Wang, Shaoli
2013-07-01
Motivated by an analysis of US house price index data, we propose nonparametric finite mixture of regression models. We study the identifiability issue of the proposed models, and develop an estimation procedure by employing kernel regression. We further systematically study the sampling properties of the proposed estimators, and establish their asymptotic normality. A modified EM algorithm is proposed to carry out the estimation procedure. We show that our algorithm preserves the ascent property of the EM algorithm in an asymptotic sense. Monte Carlo simulations are conducted to examine the finite sample performance of the proposed estimation procedure. An empirical analysis of the US house price index data is illustrated for the proposed methodology.
Dynamic travel time estimation using regression trees.
2008-10-01
This report presents a methodology for travel time estimation by using regression trees. The dissemination of travel time information has become crucial for effective traffic management, especially under congested road conditions. In the absence of c...
Regression Models for Repairable Systems
Czech Academy of Sciences Publication Activity Database
Novák, Petr
2015-01-01
Roč. 17, č. 4 (2015), s. 963-972 ISSN 1387-5841 Institutional support: RVO:67985556 Keywords : Reliability analysis * Repair models * Regression Subject RIV: BB - Applied Statistics , Operational Research Impact factor: 0.782, year: 2015 http://library.utia.cas.cz/separaty/2015/SI/novak-0450902.pdf
An Additive-Multiplicative Cox-Aalen Regression Model
DEFF Research Database (Denmark)
Scheike, Thomas H.; Zhang, Mei-Jie
2002-01-01
Aalen model; additive risk model; counting processes; Cox regression; survival analysis; time-varying effects......Aalen model; additive risk model; counting processes; Cox regression; survival analysis; time-varying effects...
Kühn, Michael; Schöne, Tim
2017-04-01
Water management tools are essential to ensure the conservation of natural resources. The geothermal hot water reservoir below the village of Waiwera, on the Northern Island of New Zealand is used commercially since 1863. The continuous production of 50 °C hot geothermal water, to supply hotels and spas, has a negative impact on the reservoir. Until the year 1969 from all wells drilled the warm water flow was artesian. Due to overproduction the water needs to be pumped up nowadays. Further, within the years 1975 to 1976 the warm water seeps on the beach of Waiwera ran dry. In order to protect the reservoir and the historical and tourist site in the early 1980s a water management plan was deployed. The "Auckland Council" established guidelines to enable a sustainable management of the resource [1]. The management plan demands that the water level in the official and appropriate observation well of the council is 0.5 m above sea level throughout the year in average. Almost four decades of data (since 1978 until today) are now available [2]. For a sustainable water management, it is necessary to be able to forecast the water level as a function of the production rates in the production wells. The best predictions are provided by a multivariate regression model of the water level and production rate time series, which takes into account the production rates of individual wells. It is based on the inversely proportional relationship between the independent variable (production rate) and the dependent variable (measured water level). In production scenarios, a maximum total production rate of approx. 1,100 m3 / day is determined in order to comply with the guidelines of the "Auckland Council". [1] Kühn M., Stöfen H. (2005) A reactive flow model of the geothermal reservoir Waiwera, New Zealand. Hydrogeology Journal 13, 606-626, doi: 10.1007/s10040-004-0377-6 [2] Kühn M., Altmannsberger C. (2016) Assessment of data driven and process based water management tools for
Linear Regression Models for Estimating True Subsurface ...
Indian Academy of Sciences (India)
47
The objective is to minimize the processing time and computer memory required .... Survey. 65 time to acquire extra GPR or seismic data for large sites and picking the first arrival time. 66 to provide the needed datasets for the joint inversion are also .... The data utilized for the regression modelling was acquired from ground.
Ali, M Sanni; Groenwold, Rolf H H; Belitser, Svetlana V; Souverein, Patrick C; Martín, Elisa; Gatto, Nicolle M; Huerta, Consuelo; Gardarsdottir, Helga; Roes, Kit C B; Hoes, Arno W; de Boer, Antonius; Klungel, Olaf H
2016-01-01
BACKGROUND: Observational studies including time-varying treatments are prone to confounding. We compared time-varying Cox regression analysis, propensity score (PS) methods, and marginal structural models (MSMs) in a study of antidepressant [selective serotonin reuptake inhibitors (SSRIs)] use and
Directory of Open Access Journals (Sweden)
Yu-Pin Liao
2017-11-01
Full Text Available In the past few decades, demand forecasting has become relatively difficult due to rapid changes in the global environment. This research illustrates the use of the make-to-stock (MTS production strategy in order to explain how forecasting plays an essential role in business management. The linear mixed-effect (LME model has been extensively developed and is widely applied in various fields. However, no study has used the LME model for business forecasting. We suggest that the LME model be used as a tool for prediction and to overcome environment complexity. The data analysis is based on real data in an international display company, where the company needs accurate demand forecasting before adopting a MTS strategy. The forecasting result from the LME model is compared to the commonly used approaches, including the regression model, autoregressive model, times series model, and exponential smoothing model, with the results revealing that prediction performance provided by the LME model is more stable than using the other methods. Furthermore, product types in the data are regarded as a random effect in the LME model, hence demands of all types can be predicted simultaneously using a single LME model. However, some approaches require splitting the data into different type categories, and then predicting the type demand by establishing a model for each type. This feature also demonstrates the practicability of the LME model in real business operations.
Nonparametric and semiparametric dynamic additive regression models
DEFF Research Database (Denmark)
Scheike, Thomas Harder; Martinussen, Torben
Dynamic additive regression models provide a flexible class of models for analysis of longitudinal data. The approach suggested in this work is suited for measurements obtained at random time points and aims at estimating time-varying effects. Both fully nonparametric and semiparametric models can...... in special cases. We investigate the finite sample properties of the estimators and conclude that the asymptotic results are valid for even samll samples....
Chen, Bo; Wu, Zhongru; Liang, Jiachen; Dou, Yanhong
2017-01-01
The modeling of cracks and identification of dam behavior changes are difficult issues in dam health monitoring research. In this paper, a time-varying identification model for crack monitoring data is built using support vector regression (SVR) and the Bayesian evidence framework (BEF). First, the SVR method is adopted for better modeling of the nonlinear relationship between the crack opening displacement (COD) and its influencing factors. Second, the BEF approach is applied to determine th...
Hamidi, Omid; Tapak, Leili; Abbasi, Hamed; Maryanaji, Zohreh
2017-10-01
We have conducted a case study to investigate the performance of support vector machine, multivariate adaptive regression splines, and random forest time series methods in snowfall modeling. These models were applied to a data set of monthly snowfall collected during six cold months at Hamadan Airport sample station located in the Zagros Mountain Range in Iran. We considered monthly data of snowfall from 1981 to 2008 during the period from October/November to April/May as the training set and the data from 2009 to 2015 as the testing set. The root mean square errors (RMSE), mean absolute errors (MAE), determination coefficient (R 2), coefficient of efficiency (E%), and intra-class correlation coefficient (ICC) statistics were used as evaluation criteria. Our results indicated that the random forest time series model outperformed the support vector machine and multivariate adaptive regression splines models in predicting monthly snowfall in terms of several criteria. The RMSE, MAE, R 2, E, and ICC for the testing set were 7.84, 5.52, 0.92, 0.89, and 0.93, respectively. The overall results indicated that the random forest time series model could be successfully used to estimate monthly snowfall values. Moreover, the support vector machine model showed substantial performance as well, suggesting it may also be applied to forecast snowfall in this area.
Model building in nonproportional hazard regression.
Rodríguez-Girondo, Mar; Kneib, Thomas; Cadarso-Suárez, Carmen; Abu-Assi, Emad
2013-12-30
Recent developments of statistical methods allow for a very flexible modeling of covariates affecting survival times via the hazard rate, including also the inspection of possible time-dependent associations. Despite their immediate appeal in terms of flexibility, these models typically introduce additional difficulties when a subset of covariates and the corresponding modeling alternatives have to be chosen, that is, for building the most suitable model for given data. This is particularly true when potentially time-varying associations are given. We propose to conduct a piecewise exponential representation of the original survival data to link hazard regression with estimation schemes based on of the Poisson likelihood to make recent advances for model building in exponential family regression accessible also in the nonproportional hazard regression context. A two-stage stepwise selection approach, an approach based on doubly penalized likelihood, and a componentwise functional gradient descent approach are adapted to the piecewise exponential regression problem. These three techniques were compared via an intensive simulation study. An application to prognosis after discharge for patients who suffered a myocardial infarction supplements the simulation to demonstrate the pros and cons of the approaches in real data analyses. Copyright © 2013 John Wiley & Sons, Ltd.
AIRLINE ACTIVITY FORECASTING BY REGRESSION MODELS
Directory of Open Access Journals (Sweden)
Н. Білак
2012-04-01
Full Text Available Proposed linear and nonlinear regression models, which take into account the equation of trend and seasonality indices for the analysis and restore the volume of passenger traffic over the past period of time and its prediction for future years, as well as the algorithm of formation of these models based on statistical analysis over the years. The desired model is the first step for the synthesis of more complex models, which will enable forecasting of passenger (income level airline with the highest accuracy and time urgency.
A Seemingly Unrelated Poisson Regression Model
King, Gary
1989-01-01
This article introduces a new estimator for the analysis of two contemporaneously correlated endogenous event count variables. This seemingly unrelated Poisson regression model (SUPREME) estimator combines the efficiencies created by single equation Poisson regression model estimators and insights from "seemingly unrelated" linear regression models.
Gaussian Process Regression Model in Spatial Logistic Regression
Sofro, A.; Oktaviarina, A.
2018-01-01
Spatial analysis has developed very quickly in the last decade. One of the favorite approaches is based on the neighbourhood of the region. Unfortunately, there are some limitations such as difficulty in prediction. Therefore, we offer Gaussian process regression (GPR) to accommodate the issue. In this paper, we will focus on spatial modeling with GPR for binomial data with logit link function. The performance of the model will be investigated. We will discuss the inference of how to estimate the parameters and hyper-parameters and to predict as well. Furthermore, simulation studies will be explained in the last section.
Panel Smooth Transition Regression Models
DEFF Research Database (Denmark)
González, Andrés; Terasvirta, Timo; Dijk, Dick van
models to the panel context. The strategy consists of model specification based on homogeneity tests, parameter estimation, and model evaluation, including tests of parameter constancy and no remaining heterogeneity. The model is applied to describing firms' investment decisions in the presence...
Directory of Open Access Journals (Sweden)
Bo Chen
2017-01-01
Full Text Available The modeling of cracks and identification of dam behavior changes are difficult issues in dam health monitoring research. In this paper, a time-varying identification model for crack monitoring data is built using support vector regression (SVR and the Bayesian evidence framework (BEF. First, the SVR method is adopted for better modeling of the nonlinear relationship between the crack opening displacement (COD and its influencing factors. Second, the BEF approach is applied to determine the optimal SVR modeling parameters, including the penalty coefficient, the loss coefficient, and the width coefficient of the radial kernel function, under the principle that the prediction errors between the monitored and the model forecasted values are as small as possible. Then, considering the predicted COD, the historical maximum COD, and the time-dependent component, forewarning criteria are proposed for identifying the time-varying behavior of cracks and the degree of abnormality of dam health. Finally, an example of modeling and forewarning analysis is presented using two monitoring subsequences from a real structural crack in the Chencun concrete arch-gravity dam. The findings indicate that the proposed time-varying model can provide predicted results that are more accurately nonlinearity fitted and is suitable for use in evaluating the behavior of cracks in dams.
Introduction to the use of regression models in epidemiology.
Bender, Ralf
2009-01-01
Regression modeling is one of the most important statistical techniques used in analytical epidemiology. By means of regression models the effect of one or several explanatory variables (e.g., exposures, subject characteristics, risk factors) on a response variable such as mortality or cancer can be investigated. From multiple regression models, adjusted effect estimates can be obtained that take the effect of potential confounders into account. Regression methods can be applied in all epidemiologic study designs so that they represent a universal tool for data analysis in epidemiology. Different kinds of regression models have been developed in dependence on the measurement scale of the response variable and the study design. The most important methods are linear regression for continuous outcomes, logistic regression for binary outcomes, Cox regression for time-to-event data, and Poisson regression for frequencies and rates. This chapter provides a nontechnical introduction to these regression models with illustrating examples from cancer research.
Oden, Timothy D.; Asquith, William H.; Milburn, Matthew S.
2009-01-01
In December 2005, the U.S. Geological Survey in cooperation with the City of Houston, Texas, began collecting discrete water-quality samples for nutrients, total organic carbon, bacteria (total coliform and Escherichia coli), atrazine, and suspended sediment at two U.S. Geological Survey streamflow-gaging stations upstream from Lake Houston near Houston (08068500 Spring Creek near Spring, Texas, and 08070200 East Fork San Jacinto River near New Caney, Texas). The data from the discrete water-quality samples collected during 2005-07, in conjunction with monitored real-time data already being collected - physical properties (specific conductance, pH, water temperature, turbidity, and dissolved oxygen), streamflow, and rainfall - were used to develop regression models for predicting water-quality constituent concentrations for inflows to Lake Houston. Rainfall data were obtained from a rain gage monitored by Harris County Homeland Security and Emergency Management and colocated with the Spring Creek station. The leaps and bounds algorithm was used to find the best subsets of possible regression models (minimum residual sum of squares for a given number of variables). The potential explanatory or predictive variables included discharge (streamflow), specific conductance, pH, water temperature, turbidity, dissolved oxygen, rainfall, and time (to account for seasonal variations inherent in some water-quality data). The response variables at each site were nitrite plus nitrate nitrogen, total phosphorus, organic carbon, Escherichia coli, atrazine, and suspended sediment. The explanatory variables provide easily measured quantities as a means to estimate concentrations of the various constituents under investigation, with accompanying estimates of measurement uncertainty. Each regression equation can be used to estimate concentrations of a given constituent in real time. In conjunction with estimated concentrations, constituent loads were estimated by multiplying the
Variable importance in latent variable regression models
Kvalheim, O.M.; Arneberg, R.; Bleie, O.; Rajalahti, T.; Smilde, A.K.; Westerhuis, J.A.
2014-01-01
The quality and practical usefulness of a regression model are a function of both interpretability and prediction performance. This work presents some new graphical tools for improved interpretation of latent variable regression models that can also assist in improved algorithms for variable
Regression modeling of ground-water flow
Cooley, R.L.; Naff, R.L.
1985-01-01
Nonlinear multiple regression methods are developed to model and analyze groundwater flow systems. Complete descriptions of regression methodology as applied to groundwater flow models allow scientists and engineers engaged in flow modeling to apply the methods to a wide range of problems. Organization of the text proceeds from an introduction that discusses the general topic of groundwater flow modeling, to a review of basic statistics necessary to properly apply regression techniques, and then to the main topic: exposition and use of linear and nonlinear regression to model groundwater flow. Statistical procedures are given to analyze and use the regression models. A number of exercises and answers are included to exercise the student on nearly all the methods that are presented for modeling and statistical analysis. Three computer programs implement the more complex methods. These three are a general two-dimensional, steady-state regression model for flow in an anisotropic, heterogeneous porous medium, a program to calculate a measure of model nonlinearity with respect to the regression parameters, and a program to analyze model errors in computed dependent variables such as hydraulic head. (USGS)
BOX-COX REGRESSION METHOD IN TIME SCALING
Directory of Open Access Journals (Sweden)
ATİLLA GÖKTAŞ
2013-06-01
Full Text Available Box-Cox regression method with λj, for j = 1, 2, ..., k, power transformation can be used when dependent variable and error term of the linear regression model do not satisfy the continuity and normality assumptions. The situation obtaining the smallest mean square error when optimum power λj, transformation for j = 1, 2, ..., k, of Y has been discussed. Box-Cox regression method is especially appropriate to adjust existence skewness or heteroscedasticity of error terms for a nonlinear functional relationship between dependent and explanatory variables. In this study, the advantage and disadvantage use of Box-Cox regression method have been discussed in differentiation and differantial analysis of time scale concept.
[From clinical judgment to linear regression model.
Palacios-Cruz, Lino; Pérez, Marcela; Rivas-Ruiz, Rodolfo; Talavera, Juan O
2013-01-01
When we think about mathematical models, such as linear regression model, we think that these terms are only used by those engaged in research, a notion that is far from the truth. Legendre described the first mathematical model in 1805, and Galton introduced the formal term in 1886. Linear regression is one of the most commonly used regression models in clinical practice. It is useful to predict or show the relationship between two or more variables as long as the dependent variable is quantitative and has normal distribution. Stated in another way, the regression is used to predict a measure based on the knowledge of at least one other variable. Linear regression has as it's first objective to determine the slope or inclination of the regression line: Y = a + bx, where "a" is the intercept or regression constant and it is equivalent to "Y" value when "X" equals 0 and "b" (also called slope) indicates the increase or decrease that occurs when the variable "x" increases or decreases in one unit. In the regression line, "b" is called regression coefficient. The coefficient of determination (R 2 ) indicates the importance of independent variables in the outcome.
Ali, M Sanni; Groenwold, Rolf H H; Belitser, Svetlana V; Souverein, Patrick C; Martín, Elisa; Gatto, Nicolle M; Huerta, Consuelo; Gardarsdottir, Helga; Roes, Kit C B; Hoes, Arno W; de Boer, Antonius; Klungel, Olaf H
2016-03-01
Observational studies including time-varying treatments are prone to confounding. We compared time-varying Cox regression analysis, propensity score (PS) methods, and marginal structural models (MSMs) in a study of antidepressant [selective serotonin reuptake inhibitors (SSRIs)] use and the risk of hip fracture. A cohort of patients with a first prescription for antidepressants (SSRI or tricyclic antidepressants) was extracted from the Dutch Mondriaan and Spanish Base de datos para la Investigación Farmacoepidemiológica en Atención Primaria (BIFAP) general practice databases for the period 2001-2009. The net (total) effect of SSRI versus no SSRI on the risk of hip fracture was estimated using time-varying Cox regression, stratification and covariate adjustment using the PS, and MSM. In MSM, censoring was accounted for by inverse probability of censoring weights. The crude hazard ratio (HR) of SSRI use versus no SSRI use on hip fracture was 1.75 (95%CI: 1.12, 2.72) in Mondriaan and 2.09 (1.89, 2.32) in BIFAP. After confounding adjustment using time-varying Cox regression, stratification, and covariate adjustment using the PS, HRs increased in Mondriaan [2.59 (1.63, 4.12), 2.64 (1.63, 4.25), and 2.82 (1.63, 4.25), respectively] and decreased in BIFAP [1.56 (1.40, 1.73), 1.54 (1.39, 1.71), and 1.61 (1.45, 1.78), respectively]. MSMs with stabilized weights yielded HR 2.15 (1.30, 3.55) in Mondriaan and 1.63 (1.28, 2.07) in BIFAP when accounting for censoring and 2.13 (1.32, 3.45) in Mondriaan and 1.66 (1.30, 2.12) in BIFAP without accounting for censoring. In this empirical study, differences between the different methods to control for time-dependent confounding were small. The observed differences in treatment effect estimates between the databases are likely attributable to different confounding information in the datasets, illustrating that adequate information on (time-varying) confounding is crucial to prevent bias. Copyright © 2016 John Wiley & Sons, Ltd.
Michna, Agata; Braselmann, Herbert; Selmansberger, Martin; Dietz, Anne; Hess, Julia; Gomolka, Maria; Hornhardt, Sabine; Blüthgen, Nils; Zitzelsberger, Horst; Unger, Kristian
2016-01-01
Gene expression time-course experiments allow to study the dynamics of transcriptomic changes in cells exposed to different stimuli. However, most approaches for the reconstruction of gene association networks (GANs) do not propose prior-selection approaches tailored to time-course transcriptome data. Here, we present a workflow for the identification of GANs from time-course data using prior selection of genes differentially expressed over time identified by natural cubic spline regression modeling (NCSRM). The workflow comprises three major steps: 1) the identification of differentially expressed genes from time-course expression data by employing NCSRM, 2) the use of regularized dynamic partial correlation as implemented in GeneNet to infer GANs from differentially expressed genes and 3) the identification and functional characterization of the key nodes in the reconstructed networks. The approach was applied on a time-resolved transcriptome data set of radiation-perturbed cell culture models of non-tumor cells with normal and increased radiation sensitivity. NCSRM detected significantly more genes than another commonly used method for time-course transcriptome analysis (BETR). While most genes detected with BETR were also detected with NCSRM the false-detection rate of NCSRM was low (3%). The GANs reconstructed from genes detected with NCSRM showed a better overlap with the interactome network Reactome compared to GANs derived from BETR detected genes. After exposure to 1 Gy the normal sensitive cells showed only sparse response compared to cells with increased sensitivity, which exhibited a strong response mainly of genes related to the senescence pathway. After exposure to 10 Gy the response of the normal sensitive cells was mainly associated with senescence and that of cells with increased sensitivity with apoptosis. We discuss these results in a clinical context and underline the impact of senescence-associated pathways in acute radiation response of normal
Regression Models for Market-Shares
DEFF Research Database (Denmark)
Birch, Kristina; Olsen, Jørgen Kai; Tjur, Tue
2005-01-01
On the background of a data set of weekly sales and prices for three brands of coffee, this paper discusses various regression models and their relation to the multiplicative competitive-interaction model (the MCI model, see Cooper 1988, 1993) for market-shares. Emphasis is put on the interpretat......On the background of a data set of weekly sales and prices for three brands of coffee, this paper discusses various regression models and their relation to the multiplicative competitive-interaction model (the MCI model, see Cooper 1988, 1993) for market-shares. Emphasis is put...... on the interpretation of the parameters in relation to models for the total sales based on discrete choice models.Key words and phrases. MCI model, discrete choice model, market-shares, price elasitcity, regression model....
Bias-corrected quantile regression estimation of censored regression models
Cizek, Pavel; Sadikoglu, Serhan
2018-01-01
In this paper, an extension of the indirect inference methodology to semiparametric estimation is explored in the context of censored regression. Motivated by weak small-sample performance of the censored regression quantile estimator proposed by Powell (J Econom 32:143–155, 1986a), two- and
Hu, Q.; Friedl, M. A.; Wu, W.
2017-12-01
Accurate and timely information regarding the spatial distribution of crop types and their changes is essential for acreage surveys, yield estimation, water management, and agricultural production decision-making. In recent years, increasing population, dietary shifts and climate change have driven drastic changes in China's agricultural land use. However, no maps are currently available that document the spatial and temporal patterns of these agricultural land use changes. Because of its short revisit period, rich spectral bands and global coverage, MODIS time series data has been shown to have great potential for detecting the seasonal dynamics of different crop types. However, its inherently coarse spatial resolution limits the accuracy with which crops can be identified from MODIS in regions with small fields or complex agricultural landscapes. To evaluate this more carefully and specifically understand the strengths and weaknesses of MODIS data for crop-type mapping, we used MODIS time-series imagery to map the sub-pixel fractional crop area for four major crop types (rice, corn, soybean and wheat) at 500-m spatial resolution for Heilongjiang province, one of the most important grain-production regions in China where recent agricultural land use change has been rapid and pronounced. To do this, a random forest regression (RF-g) model was constructed to estimate the percentage of each sub-pixel crop type in 2006, 2011 and 2016. Crop type maps generated through expert visual interpretation of high spatial resolution images (i.e., Landsat and SPOT data) were used to calibrate the regression model. Five different time series of vegetation indices (155 features) derived from different spectral channels of MODIS land surface reflectance (MOD09A1) data were used as candidate features for the RF-g model. An out-of-bag strategy and backward elimination approach was applied to select the optimal spectra-temporal feature subset for each crop type. The resulting crop maps
Vaeth, Michael; Skovlund, Eva
2004-06-15
For a given regression problem it is possible to identify a suitably defined equivalent two-sample problem such that the power or sample size obtained for the two-sample problem also applies to the regression problem. For a standard linear regression model the equivalent two-sample problem is easily identified, but for generalized linear models and for Cox regression models the situation is more complicated. An approximately equivalent two-sample problem may, however, also be identified here. In particular, we show that for logistic regression and Cox regression models the equivalent two-sample problem is obtained by selecting two equally sized samples for which the parameters differ by a value equal to the slope times twice the standard deviation of the independent variable and further requiring that the overall expected number of events is unchanged. In a simulation study we examine the validity of this approach to power calculations in logistic regression and Cox regression models. Several different covariate distributions are considered for selected values of the overall response probability and a range of alternatives. For the Cox regression model we consider both constant and non-constant hazard rates. The results show that in general the approach is remarkably accurate even in relatively small samples. Some discrepancies are, however, found in small samples with few events and a highly skewed covariate distribution. Comparison with results based on alternative methods for logistic regression models with a single continuous covariate indicates that the proposed method is at least as good as its competitors. The method is easy to implement and therefore provides a simple way to extend the range of problems that can be covered by the usual formulas for power and sample size determination. Copyright 2004 John Wiley & Sons, Ltd.
Horst, Fabian; Eekhoff, Alexander; Newell, Karl M; Schöllhorn, Wolfgang I
2017-01-01
Traditionally, gait analysis has been centered on the idea of average behavior and normality. On one hand, clinical diagnoses and therapeutic interventions typically assume that average gait patterns remain constant over time. On the other hand, it is well known that all our movements are accompanied by a certain amount of variability, which does not allow us to make two identical steps. The purpose of this study was to examine changes in the intra-individual gait patterns across different time-scales (i.e., tens-of-mins, tens-of-hours). Nine healthy subjects performed 15 gait trials at a self-selected speed on 6 sessions within one day (duration between two subsequent sessions from 10 to 90 mins). For each trial, time-continuous ground reaction forces and lower body joint angles were measured. A supervised learning model using a kernel-based discriminant regression was applied for classifying sessions within individual gait patterns. Discernable characteristics of intra-individual gait patterns could be distinguished between repeated sessions by classification rates of 67.8 ± 8.8% and 86.3 ± 7.9% for the six-session-classification of ground reaction forces and lower body joint angles, respectively. Furthermore, the one-on-one-classification showed that increasing classification rates go along with increasing time durations between two sessions and indicate that changes of gait patterns appear at different time-scales. Discernable characteristics between repeated sessions indicate continuous intrinsic changes in intra-individual gait patterns and suggest a predominant role of deterministic processes in human motor control and learning. Natural changes of gait patterns without any externally induced injury or intervention may reflect continuous adaptations of the motor system over several time-scales. Accordingly, the modelling of walking by means of average gait patterns that are assumed to be near constant over time needs to be reconsidered in the context of
Applied Regression Modeling A Business Approach
Pardoe, Iain
2012-01-01
An applied and concise treatment of statistical regression techniques for business students and professionals who have little or no background in calculusRegression analysis is an invaluable statistical methodology in business settings and is vital to model the relationship between a response variable and one or more predictor variables, as well as the prediction of a response value given values of the predictors. In view of the inherent uncertainty of business processes, such as the volatility of consumer spending and the presence of market uncertainty, business professionals use regression a
DEFF Research Database (Denmark)
Christensen, E; Altman, D G; Neuberger, J
1993-01-01
immunoglobulin (Ig)M also indicated a poor prognosis. The survival predicted by the models agreed well with the survival observed in the independent PBC patients. The time-dependent models predicted better than our previously published time-fixed model. CONCLUSIONS: Using the time-dependent Cox models, one can...
Flexible regression models with cubic splines.
Durrleman, S; Simon, R
1989-05-01
We describe the use of cubic splines in regression models to represent the relationship between the response variable and a vector of covariates. This simple method can help prevent the problems that result from inappropriate linearity assumptions. We compare restricted cubic spline regression to non-parametric procedures for characterizing the relationship between age and survival in the Stanford Heart Transplant data. We also provide an illustrative example in cancer therapeutics.
Linear Regression Based Real-Time Filtering
Directory of Open Access Journals (Sweden)
Misel Batmend
2013-01-01
Full Text Available This paper introduces real time filtering method based on linear least squares fitted line. Method can be used in case that a filtered signal is linear. This constraint narrows a band of potential applications. Advantage over Kalman filter is that it is computationally less expensive. The paper further deals with application of introduced method on filtering data used to evaluate a position of engraved material with respect to engraving machine. The filter was implemented to the CNC engraving machine control system. Experiments showing its performance are included.
Structured Dimensionality Reduction for Additive Model Regression
Fawzi, Alhussein; Fiot, Jean-Baptiste; Chen, Bei; Sinn, Mathieu; Frossard, Pascal
2016-01-01
Additive models are regression methods which model the response variable as the sum of univariate transfer functions of the input variables. Key benefits of additive models are their accuracy and interpretability on many real-world tasks. Additive models are however not adapted to problems involving a large number (e.g., hundreds) of input variables, as they are prone to overfitting in addition to losing interpretability. In this paper, we introduce a novel framework for applying additive ...
Mixed-effects regression models in linguistics
Heylen, Kris; Geeraerts, Dirk
2018-01-01
When data consist of grouped observations or clusters, and there is a risk that measurements within the same group are not independent, group-specific random effects can be added to a regression model in order to account for such within-group associations. Regression models that contain such group-specific random effects are called mixed-effects regression models, or simply mixed models. Mixed models are a versatile tool that can handle both balanced and unbalanced datasets and that can also be applied when several layers of grouping are present in the data; these layers can either be nested or crossed. In linguistics, as in many other fields, the use of mixed models has gained ground rapidly over the last decade. This methodological evolution enables us to build more sophisticated and arguably more realistic models, but, due to its technical complexity, also introduces new challenges. This volume brings together a number of promising new evolutions in the use of mixed models in linguistics, but also addres...
Bayesian Travel Time Inversion adopting Gaussian Process Regression
Mauerberger, S.; Holschneider, M.
2017-12-01
A major application in seismology is the determination of seismic velocity models. Travel time measurements are putting an integral constraint on the velocity between source and receiver. We provide insight into travel time inversion from a correlation-based Bayesian point of view. Therefore, the concept of Gaussian process regression is adopted to estimate a velocity model. The non-linear travel time integral is approximated by a 1st order Taylor expansion. A heuristic covariance describes correlations amongst observations and a priori model. That approach enables us to assess a proxy of the Bayesian posterior distribution at ordinary computational costs. No multi dimensional numeric integration nor excessive sampling is necessary. Instead of stacking the data, we suggest to progressively build the posterior distribution. Incorporating only a single evidence at a time accounts for the deficit of linearization. As a result, the most probable model is given by the posterior mean whereas uncertainties are described by the posterior covariance.As a proof of concept, a synthetic purely 1d model is addressed. Therefore a single source accompanied by multiple receivers is considered on top of a model comprising a discontinuity. We consider travel times of both phases - direct and reflected wave - corrupted by noise. Left and right of the interface are assumed independent where the squared exponential kernel serves as covariance.
A Skew-Normal Mixture Regression Model
Liu, Min; Lin, Tsung-I
2014-01-01
A challenge associated with traditional mixture regression models (MRMs), which rest on the assumption of normally distributed errors, is determining the number of unobserved groups. Specifically, even slight deviations from normality can lead to the detection of spurious classes. The current work aims to (a) examine how sensitive the commonly…
Regression modeling methods, theory, and computation with SAS
Panik, Michael
2009-01-01
Regression Modeling: Methods, Theory, and Computation with SAS provides an introduction to a diverse assortment of regression techniques using SAS to solve a wide variety of regression problems. The author fully documents the SAS programs and thoroughly explains the output produced by the programs.The text presents the popular ordinary least squares (OLS) approach before introducing many alternative regression methods. It covers nonparametric regression, logistic regression (including Poisson regression), Bayesian regression, robust regression, fuzzy regression, random coefficients regression,
Influence diagnostics in meta-regression model.
Shi, Lei; Zuo, ShanShan; Yu, Dalei; Zhou, Xiaohua
2017-09-01
This paper studies the influence diagnostics in meta-regression model including case deletion diagnostic and local influence analysis. We derive the subset deletion formulae for the estimation of regression coefficient and heterogeneity variance and obtain the corresponding influence measures. The DerSimonian and Laird estimation and maximum likelihood estimation methods in meta-regression are considered, respectively, to derive the results. Internal and external residual and leverage measure are defined. The local influence analysis based on case-weights perturbation scheme, responses perturbation scheme, covariate perturbation scheme, and within-variance perturbation scheme are explored. We introduce a method by simultaneous perturbing responses, covariate, and within-variance to obtain the local influence measure, which has an advantage of capable to compare the influence magnitude of influential studies from different perturbations. An example is used to illustrate the proposed methodology. Copyright © 2017 John Wiley & Sons, Ltd.
Geographically weighted regression model on poverty indicator
Slamet, I.; Nugroho, N. F. T. A.; Muslich
2017-12-01
In this research, we applied geographically weighted regression (GWR) for analyzing the poverty in Central Java. We consider Gaussian Kernel as weighted function. The GWR uses the diagonal matrix resulted from calculating kernel Gaussian function as a weighted function in the regression model. The kernel weights is used to handle spatial effects on the data so that a model can be obtained for each location. The purpose of this paper is to model of poverty percentage data in Central Java province using GWR with Gaussian kernel weighted function and to determine the influencing factors in each regency/city in Central Java province. Based on the research, we obtained geographically weighted regression model with Gaussian kernel weighted function on poverty percentage data in Central Java province. We found that percentage of population working as farmers, population growth rate, percentage of households with regular sanitation, and BPJS beneficiaries are the variables that affect the percentage of poverty in Central Java province. In this research, we found the determination coefficient R2 are 68.64%. There are two categories of district which are influenced by different of significance factors.
Tutorial on Using Regression Models with Count Outcomes Using R
Directory of Open Access Journals (Sweden)
A. Alexander Beaujean
2016-02-01
Full Text Available Education researchers often study count variables, such as times a student reached a goal, discipline referrals, and absences. Most researchers that study these variables use typical regression methods (i.e., ordinary least-squares either with or without transforming the count variables. In either case, using typical regression for count data can produce parameter estimates that are biased, thus diminishing any inferences made from such data. As count-variable regression models are seldom taught in training programs, we present a tutorial to help educational researchers use such methods in their own research. We demonstrate analyzing and interpreting count data using Poisson, negative binomial, zero-inflated Poisson, and zero-inflated negative binomial regression models. The count regression methods are introduced through an example using the number of times students skipped class. The data for this example are freely available and the R syntax used run the example analyses are included in the Appendix.
Adaptive regression for modeling nonlinear relationships
Knafl, George J
2016-01-01
This book presents methods for investigating whether relationships are linear or nonlinear and for adaptively fitting appropriate models when they are nonlinear. Data analysts will learn how to incorporate nonlinearity in one or more predictor variables into regression models for different types of outcome variables. Such nonlinear dependence is often not considered in applied research, yet nonlinear relationships are common and so need to be addressed. A standard linear analysis can produce misleading conclusions, while a nonlinear analysis can provide novel insights into data, not otherwise possible. A variety of examples of the benefits of modeling nonlinear relationships are presented throughout the book. Methods are covered using what are called fractional polynomials based on real-valued power transformations of primary predictor variables combined with model selection based on likelihood cross-validation. The book covers how to formulate and conduct such adaptive fractional polynomial modeling in the s...
Bayesian Inference of a Multivariate Regression Model
Directory of Open Access Journals (Sweden)
Marick S. Sinay
2014-01-01
Full Text Available We explore Bayesian inference of a multivariate linear regression model with use of a flexible prior for the covariance structure. The commonly adopted Bayesian setup involves the conjugate prior, multivariate normal distribution for the regression coefficients and inverse Wishart specification for the covariance matrix. Here we depart from this approach and propose a novel Bayesian estimator for the covariance. A multivariate normal prior for the unique elements of the matrix logarithm of the covariance matrix is considered. Such structure allows for a richer class of prior distributions for the covariance, with respect to strength of beliefs in prior location hyperparameters, as well as the added ability, to model potential correlation amongst the covariance structure. The posterior moments of all relevant parameters of interest are calculated based upon numerical results via a Markov chain Monte Carlo procedure. The Metropolis-Hastings-within-Gibbs algorithm is invoked to account for the construction of a proposal density that closely matches the shape of the target posterior distribution. As an application of the proposed technique, we investigate a multiple regression based upon the 1980 High School and Beyond Survey.
General regression and representation model for classification.
Directory of Open Access Journals (Sweden)
Jianjun Qian
Full Text Available Recently, the regularized coding-based classification methods (e.g. SRC and CRC show a great potential for pattern classification. However, most existing coding methods assume that the representation residuals are uncorrelated. In real-world applications, this assumption does not hold. In this paper, we take account of the correlations of the representation residuals and develop a general regression and representation model (GRR for classification. GRR not only has advantages of CRC, but also takes full use of the prior information (e.g. the correlations between representation residuals and representation coefficients and the specific information (weight matrix of image pixels to enhance the classification performance. GRR uses the generalized Tikhonov regularization and K Nearest Neighbors to learn the prior information from the training data. Meanwhile, the specific information is obtained by using an iterative algorithm to update the feature (or image pixel weights of the test sample. With the proposed model as a platform, we design two classifiers: basic general regression and representation classifier (B-GRR and robust general regression and representation classifier (R-GRR. The experimental results demonstrate the performance advantages of proposed methods over state-of-the-art algorithms.
Confidence bands for inverse regression models
International Nuclear Information System (INIS)
Birke, Melanie; Bissantz, Nicolai; Holzmann, Hajo
2010-01-01
We construct uniform confidence bands for the regression function in inverse, homoscedastic regression models with convolution-type operators. Here, the convolution is between two non-periodic functions on the whole real line rather than between two periodic functions on a compact interval, since the former situation arguably arises more often in applications. First, following Bickel and Rosenblatt (1973 Ann. Stat. 1 1071–95) we construct asymptotic confidence bands which are based on strong approximations and on a limit theorem for the supremum of a stationary Gaussian process. Further, we propose bootstrap confidence bands based on the residual bootstrap and prove consistency of the bootstrap procedure. A simulation study shows that the bootstrap confidence bands perform reasonably well for moderate sample sizes. Finally, we apply our method to data from a gel electrophoresis experiment with genetically engineered neuronal receptor subunits incubated with rat brain extract
Multitask Quantile Regression under the Transnormal Model.
Fan, Jianqing; Xue, Lingzhou; Zou, Hui
2016-01-01
We consider estimating multi-task quantile regression under the transnormal model, with focus on high-dimensional setting. We derive a surprisingly simple closed-form solution through rank-based covariance regularization. In particular, we propose the rank-based ℓ 1 penalization with positive definite constraints for estimating sparse covariance matrices, and the rank-based banded Cholesky decomposition regularization for estimating banded precision matrices. By taking advantage of alternating direction method of multipliers, nearest correlation matrix projection is introduced that inherits sampling properties of the unprojected one. Our work combines strengths of quantile regression and rank-based covariance regularization to simultaneously deal with nonlinearity and nonnormality for high-dimensional regression. Furthermore, the proposed method strikes a good balance between robustness and efficiency, achieves the "oracle"-like convergence rate, and provides the provable prediction interval under the high-dimensional setting. The finite-sample performance of the proposed method is also examined. The performance of our proposed rank-based method is demonstrated in a real application to analyze the protein mass spectroscopy data.
Directory of Open Access Journals (Sweden)
Jaime Araujo Cobuci
2012-09-01
Full Text Available Milk yield test-day records on the first three lactations of 25,500 Holstein cows were used to estimate genetic parameters and predict breeding values for nine measures of persistency and 305-d milk yield in a random regression animal model using two criteria to define the fixed regression. Legendre polynomials of fourth and fifth orders were used to model the fixed and random regressions of lactation curves. The fixed regressions were adjusted for average milk yield on populations (single or subpopulations (multiple formed by cows that calved at the same age and in the same season. Akaike Information (AIC and Bayesian Information (BIC criteria indicated that models with multiple regression lactation curves had the best fit to test-day milk records of first lactations, while models with a single regression curve had the best fit for the second and third lactations. Heritability and genetic correlation estimates between persistency and milk yield differed significantly depending on the lactation order and the measures of persistency used. These parameters did not differ significantly depending on the criteria used for defining the fixed regressions for lactation curves. In general, the heritability estimates were higher for first (0.07 to 0.43, followed by the second (0.08 to 0.21 and third (0.04 to 0.10 lactation. The rank of sires resulting from the processes of genetic evaluation for milk yield or persistency using random regression models differed according to the criteria used for determining the fixed regression of lactation curve.
Crime Modeling using Spatial Regression Approach
Saleh Ahmar, Ansari; Adiatma; Kasim Aidid, M.
2018-01-01
Act of criminality in Indonesia increased both variety and quantity every year. As murder, rape, assault, vandalism, theft, fraud, fencing, and other cases that make people feel unsafe. Risk of society exposed to crime is the number of reported cases in the police institution. The higher of the number of reporter to the police institution then the number of crime in the region is increasing. In this research, modeling criminality in South Sulawesi, Indonesia with the dependent variable used is the society exposed to the risk of crime. Modelling done by area approach is the using Spatial Autoregressive (SAR) and Spatial Error Model (SEM) methods. The independent variable used is the population density, the number of poor population, GDP per capita, unemployment and the human development index (HDI). Based on the analysis using spatial regression can be shown that there are no dependencies spatial both lag or errors in South Sulawesi.
Inferring gene regression networks with model trees
Directory of Open Access Journals (Sweden)
Aguilar-Ruiz Jesus S
2010-10-01
Full Text Available Abstract Background Novel strategies are required in order to handle the huge amount of data produced by microarray technologies. To infer gene regulatory networks, the first step is to find direct regulatory relationships between genes building the so-called gene co-expression networks. They are typically generated using correlation statistics as pairwise similarity measures. Correlation-based methods are very useful in order to determine whether two genes have a strong global similarity but do not detect local similarities. Results We propose model trees as a method to identify gene interaction networks. While correlation-based methods analyze each pair of genes, in our approach we generate a single regression tree for each gene from the remaining genes. Finally, a graph from all the relationships among output and input genes is built taking into account whether the pair of genes is statistically significant. For this reason we apply a statistical procedure to control the false discovery rate. The performance of our approach, named REGNET, is experimentally tested on two well-known data sets: Saccharomyces Cerevisiae and E.coli data set. First, the biological coherence of the results are tested. Second the E.coli transcriptional network (in the Regulon database is used as control to compare the results to that of a correlation-based method. This experiment shows that REGNET performs more accurately at detecting true gene associations than the Pearson and Spearman zeroth and first-order correlation-based methods. Conclusions REGNET generates gene association networks from gene expression data, and differs from correlation-based methods in that the relationship between one gene and others is calculated simultaneously. Model trees are very useful techniques to estimate the numerical values for the target genes by linear regression functions. They are very often more precise than linear regression models because they can add just different linear
Pan, Shin-Liang; Chen, Hsiu-Hsi
2010-09-01
The rates of functional recovery after stroke tend to decrease with time. Time-varying Markov processes (TVMP) may be more biologically plausible than time-invariant Markov process for modeling such data. However, analysis of such stochastic processes, particularly tackling reversible transitions and the incorporation of random effects into models, can be analytically intractable. We make use of ordinary differential equations to solve continuous-time TVMP with reversible transitions. The proportional hazard form was used to assess the effects of an individual's covariates on multi-state transitions with the incorporation of random effects that capture the residual variation after being explained by measured covariates under the concept of generalized linear model. We further built up Bayesian directed acyclic graphic model to obtain full joint posterior distribution. Markov chain Monte Carlo (MCMC) with Gibbs sampling was applied to estimate parameters based on posterior marginal distributions with multiple integrands. The proposed method was illustrated with empirical data from a study on the functional recovery after stroke. Copyright 2010 Elsevier Inc. All rights reserved.
STREAMFLOW AND WATER QUALITY REGRESSION MODELING ...
African Journals Online (AJOL)
Journal of Modeling, Design and Management of Engineering Systems ... Consistency tests, trend analyses and mathematical modeling of water quality constituents and riverflow characteristics at upstream Nekede station and downstream Obigbo station show: consistent time-trends in degree of contamination; linear and ...
Linear Regression Models for Estimating True Subsurface ...
Indian Academy of Sciences (India)
47
of the processing time and memory space required to carry out the inversion with the. 29. SCLS algorithm. ... consumption of time and memory space for the iterative computations to converge at. 54 minimum data ..... colour scale and blanking as the observed true resistivity models, for visual assessment. 163. The accuracy ...
Allegrini, Franco; Braga, Jez W B; Moreira, Alessandro C O; Olivieri, Alejandro C
2018-06-29
A new multivariate regression model, named Error Covariance Penalized Regression (ECPR) is presented. Following a penalized regression strategy, the proposed model incorporates information about the measurement error structure of the system, using the error covariance matrix (ECM) as a penalization term. Results are reported from both simulations and experimental data based on replicate mid and near infrared (MIR and NIR) spectral measurements. The results for ECPR are better under non-iid conditions when compared with traditional first-order multivariate methods such as ridge regression (RR), principal component regression (PCR) and partial least-squares regression (PLS). Copyright © 2018 Elsevier B.V. All rights reserved.
Probabilistic Solar Forecasting Using Quantile Regression Models
Directory of Open Access Journals (Sweden)
Philippe Lauret
2017-10-01
Full Text Available In this work, we assess the performance of three probabilistic models for intra-day solar forecasting. More precisely, a linear quantile regression method is used to build three models for generating 1 h–6 h-ahead probabilistic forecasts. Our approach is applied to forecasting solar irradiance at a site experiencing highly variable sky conditions using the historical ground observations of solar irradiance as endogenous inputs and day-ahead forecasts as exogenous inputs. Day-ahead irradiance forecasts are obtained from the Integrated Forecast System (IFS, a Numerical Weather Prediction (NWP model maintained by the European Center for Medium-Range Weather Forecast (ECMWF. Several metrics, mainly originated from the weather forecasting community, are used to evaluate the performance of the probabilistic forecasts. The results demonstrated that the NWP exogenous inputs improve the quality of the intra-day probabilistic forecasts. The analysis considered two locations with very dissimilar solar variability. Comparison between the two locations highlighted that the statistical performance of the probabilistic models depends on the local sky conditions.
Entrepreneurial intention modeling using hierarchical multiple regression
Directory of Open Access Journals (Sweden)
Marina Jeger
2014-12-01
Full Text Available The goal of this study is to identify the contribution of effectuation dimensions to the predictive power of the entrepreneurial intention model over and above that which can be accounted for by other predictors selected and confirmed in previous studies. As is often the case in social and behavioral studies, some variables are likely to be highly correlated with each other. Therefore, the relative amount of variance in the criterion variable explained by each of the predictors depends on several factors such as the order of variable entry and sample specifics. The results show the modest predictive power of two dimensions of effectuation prior to the introduction of the theory of planned behavior elements. The article highlights the main advantages of applying hierarchical regression in social sciences as well as in the specific context of entrepreneurial intention formation, and addresses some of the potential pitfalls that this type of analysis entails.
Hierarchical regression analysis in structural Equation Modeling
de Jong, P.F.
1999-01-01
In a hierarchical or fixed-order regression analysis, the independent variables are entered into the regression equation in a prespecified order. Such an analysis is often performed when the extra amount of variance accounted for in a dependent variable by a specific independent variable is the main
Extended cox regression model: The choice of timefunction
Isik, Hatice; Tutkun, Nihal Ata; Karasoy, Durdu
2017-07-01
Cox regression model (CRM), which takes into account the effect of censored observations, is one the most applicative and usedmodels in survival analysis to evaluate the effects of covariates. Proportional hazard (PH), requires a constant hazard ratio over time, is the assumptionofCRM. Using extended CRM provides the test of including a time dependent covariate to assess the PH assumption or an alternative model in case of nonproportional hazards. In this study, the different types of real data sets are used to choose the time function and the differences between time functions are analyzed and discussed.
Explaining Variation in Instructional Time: An Application of Quantile Regression
Corey, Douglas Lyman; Phelps, Geoffrey; Ball, Deborah Loewenberg; Demonte, Jenny; Harrison, Delena
2012-01-01
This research is conducted in the context of a large-scale study of three nationally disseminated comprehensive school reform projects (CSRs) and examines how school- and classroom-level factors contribute to variation in instructional time in English language arts and mathematics. When using mean-based OLS regression techniques such as…
Outlier detection algorithms for least squares time series regression
DEFF Research Database (Denmark)
Johansen, Søren; Nielsen, Bent
We review recent asymptotic results on some robust methods for multiple regression. The regressors include stationary and non-stationary time series as well as polynomial terms. The methods include the Huber-skip M-estimator, 1-step Huber-skip M-estimators, in particular the Impulse Indicator Sat...
The art of regression modeling in road safety
Hauer, Ezra
2015-01-01
This unique book explains how to fashion useful regression models from commonly available data to erect models essential for evidence-based road safety management and research. Composed from techniques and best practices presented over many years of lectures and workshops, The Art of Regression Modeling in Road Safety illustrates that fruitful modeling cannot be done without substantive knowledge about the modeled phenomenon. Class-tested in courses and workshops across North America, the book is ideal for professionals, researchers, university professors, and graduate students with an interest in, or responsibilities related to, road safety. This book also: · Presents for the first time a powerful analytical tool for road safety researchers and practitioners · Includes problems and solutions in each chapter as well as data and spreadsheets for running models and PowerPoint presentation slides · Features pedagogy well-suited for graduate courses and workshops including problems, solutions, and PowerPoint p...
Modeling maximum daily temperature using a varying coefficient regression model
Han Li; Xinwei Deng; Dong-Yum Kim; Eric P. Smith
2014-01-01
Relationships between stream water and air temperatures are often modeled using linear or nonlinear regression methods. Despite a strong relationship between water and air temperatures and a variety of models that are effective for data summarized on a weekly basis, such models did not yield consistently good predictions for summaries such as daily maximum temperature...
Tracking time-varying parameters with local regression
DEFF Research Database (Denmark)
Joensen, Alfred Karsten; Nielsen, Henrik Aalborg; Nielsen, Torben Skov
2000-01-01
This paper shows that the recursive least-squares (RLS) algorithm with forgetting factor is a special case of a varying-coe\\$cient model, and a model which can easily be estimated via simple local regression. This observation allows us to formulate a new method which retains the RLS algorithm......, but extends the algorithm by including polynomial approximations. Simulation results are provided, which indicates that this new method is superior to the classical RLS method, if the parameter variations are smooth....
Model performance analysis and model validation in logistic regression
Directory of Open Access Journals (Sweden)
Rosa Arboretti Giancristofaro
2007-10-01
Full Text Available In this paper a new model validation procedure for a logistic regression model is presented. At first, we illustrate a brief review of different techniques of model validation. Next, we define a number of properties required for a model to be considered "good", and a number of quantitative performance measures. Lastly, we describe a methodology for the assessment of the performance of a given model by using an example taken from a management study.
Applications of some discrete regression models for count data
Directory of Open Access Journals (Sweden)
B. M. Golam Kibria
2006-01-01
Full Text Available In this paper we have considered several regression models to fit the count data that encounter in the field of Biometrical, Environmental, Social Sciences and Transportation Engineering. We have fitted Poisson (PO, Negative Binomial (NB, Zero-Inflated Poisson (ZIP and Zero-Inflated Negative Binomial (ZINB regression models to run-off-road (ROR crash data which collected on arterial roads in south region (rural of Florida State. To compare the performance of these models, we analyzed data with moderate to high percentage of zero counts. Because the variances were almost three times greater than the means, it appeared that both NB and ZINB models performed better than PO and ZIP models for the zero inflated and over dispersed count data.
Lee, Michael T.; Asquith, William H.; Oden, Timothy D.
2012-01-01
In December 2005, the U.S. Geological Survey (USGS), in cooperation with the City of Houston, Texas, began collecting discrete water-quality samples for nutrients, total organic carbon, bacteria (Escherichia coli and total coliform), atrazine, and suspended sediment at two USGS streamflow-gaging stations that represent watersheds contributing to Lake Houston (08068500 Spring Creek near Spring, Tex., and 08070200 East Fork San Jacinto River near New Caney, Tex.). Data from the discrete water-quality samples collected during 2005–9, in conjunction with continuously monitored real-time data that included streamflow and other physical water-quality properties (specific conductance, pH, water temperature, turbidity, and dissolved oxygen), were used to develop regression models for the estimation of concentrations of water-quality constituents of substantial source watersheds to Lake Houston. The potential explanatory variables included discharge (streamflow), specific conductance, pH, water temperature, turbidity, dissolved oxygen, and time (to account for seasonal variations inherent in some water-quality data). The response variables (the selected constituents) at each site were nitrite plus nitrate nitrogen, total phosphorus, total organic carbon, E. coli, atrazine, and suspended sediment. The explanatory variables provide easily measured quantities to serve as potential surrogate variables to estimate concentrations of the selected constituents through statistical regression. Statistical regression also facilitates accompanying estimates of uncertainty in the form of prediction intervals. Each regression model potentially can be used to estimate concentrations of a given constituent in real time. Among other regression diagnostics, the diagnostics used as indicators of general model reliability and reported herein include the adjusted R-squared, the residual standard error, residual plots, and p-values. Adjusted R-squared values for the Spring Creek models ranged
Logistic Regression Modeling of Diminishing Manufacturing Sources for Integrated Circuits
National Research Council Canada - National Science Library
Gravier, Michael
1999-01-01
.... The research identified logistic regression as a powerful tool for analysis of DMSMS and further developed twenty models attempting to identify the "best" way to model and predict DMSMS using logistic regression...
Electricity consumption forecasting in Italy using linear regression models
International Nuclear Information System (INIS)
Bianco, Vincenzo; Manca, Oronzio; Nardini, Sergio
2009-01-01
The influence of economic and demographic variables on the annual electricity consumption in Italy has been investigated with the intention to develop a long-term consumption forecasting model. The time period considered for the historical data is from 1970 to 2007. Different regression models were developed, using historical electricity consumption, gross domestic product (GDP), gross domestic product per capita (GDP per capita) and population. A first part of the paper considers the estimation of GDP, price and GDP per capita elasticities of domestic and non-domestic electricity consumption. The domestic and non-domestic short run price elasticities are found to be both approximately equal to -0.06, while long run elasticities are equal to -0.24 and -0.09, respectively. On the contrary, the elasticities of GDP and GDP per capita present higher values. In the second part of the paper, different regression models, based on co-integrated or stationary data, are presented. Different statistical tests are employed to check the validity of the proposed models. A comparison with national forecasts, based on complex econometric models, such as Markal-Time, was performed, showing that the developed regressions are congruent with the official projections, with deviations of ±1% for the best case and ±11% for the worst. These deviations are to be considered acceptable in relation to the time span taken into account. (author)
Modelling Issues in Kernel Ridge Regression
P. Exterkate (Peter)
2011-01-01
textabstractKernel ridge regression is gaining popularity as a data-rich nonlinear forecasting tool, which is applicable in many different contexts. This paper investigates the influence of the choice of kernel and the setting of tuning parameters on forecast accuracy. We review several popular
Bayesian Estimation of Multivariate Latent Regression Models: Gauss versus Laplace
Culpepper, Steven Andrew; Park, Trevor
2017-01-01
A latent multivariate regression model is developed that employs a generalized asymmetric Laplace (GAL) prior distribution for regression coefficients. The model is designed for high-dimensional applications where an approximate sparsity condition is satisfied, such that many regression coefficients are near zero after accounting for all the model…
Rigoni, Marta; Torri, Emanuele; Nollo, Giandomenico; Zarantonello, Diana; Laudon, Alessandro; Sottini, Laura; Guarrera, Giovanni Maria; Brunori, Giuliano
2017-06-01
Despite several studies reporting similar outcomes for peritoneal dialysis (PD) and hemodialysis (HD), the former is underused worldwide, with a PD prevalence of 15% in Italy. In 2008, the Unit of Nephrology and Dialysis of the Healthcare Trust of the Autonomous Province of Trento implemented a successful PD program which has increased the proportion of PD incident patients from 7 to 47%. We aimed to assess the effect of this extensive use of PD by comparing HD and PD in terms of survival and time-to-transplantation. A total of 334 HD and 153 PD incident patients were enrolled between January 2008 and December 2014. After screening for exclusion criteria and propensity score matching, 279 HD and 132 PD patients were analyzed. Survival and time-to-transplantation were assessed by competing-risks regression models, using death and transplantation as primary and competing events. Crude and adjusted regression models for survival revealed the absence of significant differences between HD and PD cumulative incidence functions (subhazard ratio: 1.09, p = 0.62 and 1.34, p = 0.10, respectively). Differently, crude and adjusted regression models for transplantation revealed a lower time-to-transplantation for PD versus HD patients (subhazard ratio: 2.34, p < 0.01, and 2.57, p < 0.01, respectively). The waiting time for placement in the transplant waiting list was longer in HD than PD patients (330 vs. 224 days, p < 0.01). The extensive use of PD did not lead to any statistically significant difference in mortality. Furthermore, PD was associated with lower time to transplantation. PD may be a viable option for large-scale dialytic treatment in the advanced chronic kidney disease population.
Mixture of Regression Models with Single-Index
Xiang, Sijia; Yao, Weixin
2016-01-01
In this article, we propose a class of semiparametric mixture regression models with single-index. We argue that many recently proposed semiparametric/nonparametric mixture regression models can be considered special cases of the proposed model. However, unlike existing semiparametric mixture regression models, the new pro- posed model can easily incorporate multivariate predictors into the nonparametric components. Backfitting estimates and the corresponding algorithms have been proposed for...
Linear regression crash prediction models : issues and proposed solutions.
2010-05-01
The paper develops a linear regression model approach that can be applied to : crash data to predict vehicle crashes. The proposed approach involves novice data aggregation : to satisfy linear regression assumptions; namely error structure normality ...
Stagewise pseudo-value regression for time-varying effects on the cumulative incidence
DEFF Research Database (Denmark)
Zöller, Daniela; Schmidtmann, Irene; Weinmann, Arndt
2016-01-01
using a pseudo-value approach. For a grid of time points, the possibly unobserved binary event status is replaced by a jackknife pseudo-value based on the Aalen-Johansen method. We combine a stagewise regression technique with the pseudo-value approach to provide variable selection while allowing......In a competing risks setting, the cumulative incidence of an event of interest describes the absolute risk for this event as a function of time. For regression analysis, one can either choose to model all competing events by separate cause-specific hazard models or directly model the association...... between covariates and the cumulative incidence of one of the events. With a suitable link function, direct regression models allow for a straightforward interpretation of covariate effects on the cumulative incidence. In practice, where data can be right-censored, these regression models are implemented...
International Nuclear Information System (INIS)
Nuamah, N.N.N.N.
1990-12-01
The paradoxical nature of results of the mean approach in pooling cross-section and time series data has been identified to be caused by the presence in the normal equations of phenomena such as autocovariances, multicollinear covariances, drift covariances and drift multicollinear covariances. This paper considers the problem of autocorrelation and suggests ways of solving it. (author). 4 refs
Dynamic logistic regression and dynamic model averaging for binary classification.
McCormick, Tyler H; Raftery, Adrian E; Madigan, David; Burd, Randall S
2012-03-01
We propose an online binary classification procedure for cases when there is uncertainty about the model to use and parameters within a model change over time. We account for model uncertainty through dynamic model averaging, a dynamic extension of Bayesian model averaging in which posterior model probabilities may also change with time. We apply a state-space model to the parameters of each model and we allow the data-generating model to change over time according to a Markov chain. Calibrating a "forgetting" factor accommodates different levels of change in the data-generating mechanism. We propose an algorithm that adjusts the level of forgetting in an online fashion using the posterior predictive distribution, and so accommodates various levels of change at different times. We apply our method to data from children with appendicitis who receive either a traditional (open) appendectomy or a laparoscopic procedure. Factors associated with which children receive a particular type of procedure changed substantially over the 7 years of data collection, a feature that is not captured using standard regression modeling. Because our procedure can be implemented completely online, future data collection for similar studies would require storing sensitive patient information only temporarily, reducing the risk of a breach of confidentiality. © 2011, The International Biometric Society.
Logistic Regression Model on Antenna Control Unit Autotracking Mode
2015-10-20
412TW-PA-15240 Logistic Regression Model on Antenna Control Unit Autotracking Mode DANIEL T. LAIRD AIR FORCE TEST CENTER EDWARDS AFB, CA...OCT 15 4. TITLE AND SUBTITLE Logistic Regression Model on Antenna Control Unit Autotracking Mode 5a. CONTRACT NUMBER 5b. GRANT...alternative-hypothesis. This paper will present an Antenna Auto- tracking model using Logistic Regression modeling. This paper presents an example of
A mathematical model of tumour angiogenesis: growth, regression and regrowth.
Vilanova, Guillermo; Colominas, Ignasi; Gomez, Hector
2017-01-01
Cancerous tumours have the ability to recruit new blood vessels through a process called angiogenesis. By stimulating vascular growth, tumours get connected to the circulatory system, receive nutrients and open a way to colonize distant organs. Tumour-induced vascular networks become unstable in the absence of tumour angiogenic factors (TAFs). They may undergo alternating stages of growth, regression and regrowth. Following a phase-field methodology, we propose a model of tumour angiogenesis that reproduces the aforementioned features and highlights the importance of vascular regression and regrowth. In contrast with previous theories which focus on vessel remodelling due to the absence of flow, we model an alternative regression mechanism based on the dependency of tumour-induced vascular networks on TAFs. The model captures capillaries at full scale, the plastic dynamics of tumour-induced vessel networks at long time scales, and shows the key role played by filopodia during angiogenesis. The predictions of our model are in agreement with in vivo experiments and may prove useful for the design of antiangiogenic therapies. © 2017 The Author(s).
Parameters Estimation of Geographically Weighted Ordinal Logistic Regression (GWOLR) Model
Zuhdi, Shaifudin; Retno Sari Saputro, Dewi; Widyaningsih, Purnami
2017-06-01
A regression model is the representation of relationship between independent variable and dependent variable. The dependent variable has categories used in the logistic regression model to calculate odds on. The logistic regression model for dependent variable has levels in the logistics regression model is ordinal. GWOLR model is an ordinal logistic regression model influenced the geographical location of the observation site. Parameters estimation in the model needed to determine the value of a population based on sample. The purpose of this research is to parameters estimation of GWOLR model using R software. Parameter estimation uses the data amount of dengue fever patients in Semarang City. Observation units used are 144 villages in Semarang City. The results of research get GWOLR model locally for each village and to know probability of number dengue fever patient categories.
Time-localized wavelet multiple regression and correlation
Fernández-Macho, Javier
2018-02-01
This paper extends wavelet methodology to handle comovement dynamics of multivariate time series via moving weighted regression on wavelet coefficients. The concept of wavelet local multiple correlation is used to produce one single set of multiscale correlations along time, in contrast with the large number of wavelet correlation maps that need to be compared when using standard pairwise wavelet correlations with rolling windows. Also, the spectral properties of weight functions are investigated and it is argued that some common time windows, such as the usual rectangular rolling window, are not satisfactory on these grounds. The method is illustrated with a multiscale analysis of the comovements of Eurozone stock markets during this century. It is shown how the evolution of the correlation structure in these markets has been far from homogeneous both along time and across timescales featuring an acute divide across timescales at about the quarterly scale. At longer scales, evidence from the long-term correlation structure can be interpreted as stable perfect integration among Euro stock markets. On the other hand, at intramonth and intraweek scales, the short-term correlation structure has been clearly evolving along time, experiencing a sharp increase during financial crises which may be interpreted as evidence of financial 'contagion'.
Optimum short-time polynomial regression for signal analysis
Indian Academy of Sciences (India)
A Sreenivasa Murthy
does not require transformation to a different domain (fre- quency domain, for example). Model-fitting is done by working in the time domain. This feature allows ... provided applications to time-interleaved analog-to-digital converters (ADCs). Strohmer also developed a fast multi- level algorithm for signal reconstruction from ...
Alternative regression models to assess increase in childhood BMI
Beyerlein, Andreas; Fahrmeir, Ludwig; Mansmann, Ulrich; Toschke, André M
2008-01-01
Abstract Background Body mass index (BMI) data usually have skewed distributions, for which common statistical modeling approaches such as simple linear or logistic regression have limitations. Methods Different regression approaches to predict childhood BMI by goodness-of-fit measures and means of interpretation were compared including generalized linear models (GLMs), quantile regression and Generalized Additive Models for Location, Scale and Shape (GAMLSS). We analyzed data of 4967 childre...
Directory of Open Access Journals (Sweden)
Hong-Juan Li
2013-04-01
Full Text Available Electric load forecasting is an important issue for a power utility, associated with the management of daily operations such as energy transfer scheduling, unit commitment, and load dispatch. Inspired by strong non-linear learning capability of support vector regression (SVR, this paper presents a SVR model hybridized with the empirical mode decomposition (EMD method and auto regression (AR for electric load forecasting. The electric load data of the New South Wales (Australia market are employed for comparing the forecasting performances of different forecasting models. The results confirm the validity of the idea that the proposed model can simultaneously provide forecasting with good accuracy and interpretability.
DEFF Research Database (Denmark)
Azarang, Leyla; Scheike, Thomas; de Uña-Álvarez, Jacobo
2017-01-01
In this work, we present direct regression analysis for the transition probabilities in the possibly non-Markov progressive illness–death model. The method is based on binomial regression, where the response is the indicator of the occupancy for the given state along time. Randomly weighted score...
Stochastic Approximation Methods for Latent Regression Item Response Models
von Davier, Matthias; Sinharay, Sandip
2010-01-01
This article presents an application of a stochastic approximation expectation maximization (EM) algorithm using a Metropolis-Hastings (MH) sampler to estimate the parameters of an item response latent regression model. Latent regression item response models are extensions of item response theory (IRT) to a latent variable model with covariates…
Linear regression models for quantitative assessment of left ...
African Journals Online (AJOL)
STORAGESEVER
2008-07-04
Jul 4, 2008 ... computed. Linear regression models for the prediction of left ventricular structures were established. Prediction models for ... study aimed at establishing linear regression models that could be used in the prediction ..... Is white cat hypertension associated with artenal disease or left ventricular hypertrophy?
Poisson Mixture Regression Models for Heart Disease Prediction
Erol, Hamza
2016-01-01
Early heart disease control can be achieved by high disease prediction and diagnosis efficiency. This paper focuses on the use of model based clustering techniques to predict and diagnose heart disease via Poisson mixture regression models. Analysis and application of Poisson mixture regression models is here addressed under two different classes: standard and concomitant variable mixture regression models. Results show that a two-component concomitant variable Poisson mixture regression model predicts heart disease better than both the standard Poisson mixture regression model and the ordinary general linear Poisson regression model due to its low Bayesian Information Criteria value. Furthermore, a Zero Inflated Poisson Mixture Regression model turned out to be the best model for heart prediction over all models as it both clusters individuals into high or low risk category and predicts rate to heart disease componentwise given clusters available. It is deduced that heart disease prediction can be effectively done by identifying the major risks componentwise using Poisson mixture regression model. PMID:27999611
Development and Application of Nonlinear Land-Use Regression Models
Champendal, Alexandre; Kanevski, Mikhail; Huguenot, Pierre-Emmanuel
2014-05-01
The problem of air pollution modelling in urban zones is of great importance both from scientific and applied points of view. At present there are several fundamental approaches either based on science-based modelling (air pollution dispersion) or on the application of space-time geostatistical methods (e.g. family of kriging models or conditional stochastic simulations). Recently, there were important developments in so-called Land Use Regression (LUR) models. These models take into account geospatial information (e.g. traffic network, sources of pollution, average traffic, population census, land use, etc.) at different scales, for example, using buffering operations. Usually the dimension of the input space (number of independent variables) is within the range of (10-100). It was shown that LUR models have some potential to model complex and highly variable patterns of air pollution in urban zones. Most of LUR models currently used are linear models. In the present research the nonlinear LUR models are developed and applied for Geneva city. Mainly two nonlinear data-driven models were elaborated: multilayer perceptron and random forest. An important part of the research deals also with a comprehensive exploratory data analysis using statistical, geostatistical and time series tools. Unsupervised self-organizing maps were applied to better understand space-time patterns of the pollution. The real data case study deals with spatial-temporal air pollution data of Geneva (2002-2011). Nitrogen dioxide (NO2) has caught our attention. It has effects on human health and on plants; NO2 contributes to the phenomenon of acid rain. The negative effects of nitrogen dioxides on plants are the reduction of the growth, production and pesticide resistance. And finally, the effects on materials: nitrogen dioxide increases the corrosion. The data used for this study consist of a set of 106 NO2 passive sensors. 80 were used to build the models and the remaining 36 have constituted
Hahn, Andrew D; Rowe, Daniel B
2012-02-01
As more evidence is presented suggesting that the phase, as well as the magnitude, of functional MRI (fMRI) time series may contain important information and that there are theoretical drawbacks to modeling functional response in the magnitude alone, removing noise in the phase is becoming more important. Previous studies have shown that retrospective correction of noise from physiologic sources can remove significant phase variance and that dynamic main magnetic field correction and regression of estimated motion parameters also remove significant phase fluctuations. In this work, we investigate the performance of physiologic noise regression in a framework along with correction for dynamic main field fluctuations and motion regression. Our findings suggest that including physiologic regressors provides some benefit in terms of reduction in phase noise power, but it is small compared to the benefit of dynamic field corrections and use of estimated motion parameters as nuisance regressors. Additionally, we show that the use of all three techniques reduces phase variance substantially, removes undesirable spatial phase correlations and improves detection of the functional response in magnitude and phase. Copyright © 2011 Elsevier Inc. All rights reserved.
Multivariate Frequency-Severity Regression Models in Insurance
Directory of Open Access Journals (Sweden)
Edward W. Frees
2016-02-01
Full Text Available In insurance and related industries including healthcare, it is common to have several outcome measures that the analyst wishes to understand using explanatory variables. For example, in automobile insurance, an accident may result in payments for damage to one’s own vehicle, damage to another party’s vehicle, or personal injury. It is also common to be interested in the frequency of accidents in addition to the severity of the claim amounts. This paper synthesizes and extends the literature on multivariate frequency-severity regression modeling with a focus on insurance industry applications. Regression models for understanding the distribution of each outcome continue to be developed yet there now exists a solid body of literature for the marginal outcomes. This paper contributes to this body of literature by focusing on the use of a copula for modeling the dependence among these outcomes; a major advantage of this tool is that it preserves the body of work established for marginal models. We illustrate this approach using data from the Wisconsin Local Government Property Insurance Fund. This fund offers insurance protection for (i property; (ii motor vehicle; and (iii contractors’ equipment claims. In addition to several claim types and frequency-severity components, outcomes can be further categorized by time and space, requiring complex dependency modeling. We find significant dependencies for these data; specifically, we find that dependencies among lines are stronger than the dependencies between the frequency and average severity within each line.
Yang, Xue; Lauzon, Carolyn B; Crainiceanu, Ciprian; Caffo, Brian; Resnick, Susan M; Landman, Bennett A
2012-09-01
Massively univariate regression and inference in the form of statistical parametric mapping have transformed the way in which multi-dimensional imaging data are studied. In functional and structural neuroimaging, the de facto standard "design matrix"-based general linear regression model and its multi-level cousins have enabled investigation of the biological basis of the human brain. With modern study designs, it is possible to acquire multi-modal three-dimensional assessments of the same individuals--e.g., structural, functional and quantitative magnetic resonance imaging, alongside functional and ligand binding maps with positron emission tomography. Largely, current statistical methods in the imaging community assume that the regressors are non-random. For more realistic multi-parametric assessment (e.g., voxel-wise modeling), distributional consideration of all observations is appropriate. Herein, we discuss two unified regression and inference approaches, model II regression and regression calibration, for use in massively univariate inference with imaging data. These methods use the design matrix paradigm and account for both random and non-random imaging regressors. We characterize these methods in simulation and illustrate their use on an empirical dataset. Both methods have been made readily available as a toolbox plug-in for the SPM software. Copyright © 2012 Elsevier Inc. All rights reserved.
Ogutu, Joseph O; Schulz-Streeck, Torben; Piepho, Hans-Peter
2012-05-21
Genomic selection (GS) is emerging as an efficient and cost-effective method for estimating breeding values using molecular markers distributed over the entire genome. In essence, it involves estimating the simultaneous effects of all genes or chromosomal segments and combining the estimates to predict the total genomic breeding value (GEBV). Accurate prediction of GEBVs is a central and recurring challenge in plant and animal breeding. The existence of a bewildering array of approaches for predicting breeding values using markers underscores the importance of identifying approaches able to efficiently and accurately predict breeding values. Here, we comparatively evaluate the predictive performance of six regularized linear regression methods-- ridge regression, ridge regression BLUP, lasso, adaptive lasso, elastic net and adaptive elastic net-- for predicting GEBV using dense SNP markers. We predicted GEBVs for a quantitative trait using a dataset on 3000 progenies of 20 sires and 200 dams and an accompanying genome consisting of five chromosomes with 9990 biallelic SNP-marker loci simulated for the QTL-MAS 2011 workshop. We applied all the six methods that use penalty-based (regularization) shrinkage to handle datasets with far more predictors than observations. The lasso, elastic net and their adaptive extensions further possess the desirable property that they simultaneously select relevant predictive markers and optimally estimate their effects. The regression models were trained with a subset of 2000 phenotyped and genotyped individuals and used to predict GEBVs for the remaining 1000 progenies without phenotypes. Predictive accuracy was assessed using the root mean squared error, the Pearson correlation between predicted GEBVs and (1) the true genomic value (TGV), (2) the true breeding value (TBV) and (3) the simulated phenotypic values based on fivefold cross-validation (CV). The elastic net, lasso, adaptive lasso and the adaptive elastic net all had
A test for the parameters of multiple linear regression models ...
African Journals Online (AJOL)
A test for the parameters of multiple linear regression models is developed for conducting tests simultaneously on all the parameters of multiple linear regression models. The test is robust relative to the assumptions of homogeneity of variances and absence of serial correlation of the classical F-test. Under certain null and ...
Genetic evaluation of European quails by random regression models
Directory of Open Access Journals (Sweden)
Flaviana Miranda Gonçalves
2012-09-01
Full Text Available The objective of this study was to compare different random regression models, defined from different classes of heterogeneity of variance combined with different Legendre polynomial orders for the estimate of (covariance of quails. The data came from 28,076 observations of 4,507 female meat quails of the LF1 lineage. Quail body weights were determined at birth and 1, 14, 21, 28, 35 and 42 days of age. Six different classes of residual variance were fitted to Legendre polynomial functions (orders ranging from 2 to 6 to determine which model had the best fit to describe the (covariance structures as a function of time. According to the evaluated criteria (AIC, BIC and LRT, the model with six classes of residual variances and of sixth-order Legendre polynomial was the best fit. The estimated additive genetic variance increased from birth to 28 days of age, and dropped slightly from 35 to 42 days. The heritability estimates decreased along the growth curve and changed from 0.51 (1 day to 0.16 (42 days. Animal genetic and permanent environmental correlation estimates between weights and age classes were always high and positive, except for birth weight. The sixth order Legendre polynomial, along with the residual variance divided into six classes was the best fit for the growth rate curve of meat quails; therefore, they should be considered for breeding evaluation processes by random regression models.
A new inverse regression model applied to radiation biodosimetry
Higueras, Manuel; Puig, Pedro; Ainsbury, Elizabeth A.; Rothkamm, Kai
2015-01-01
Biological dosimetry based on chromosome aberration scoring in peripheral blood lymphocytes enables timely assessment of the ionizing radiation dose absorbed by an individual. Here, new Bayesian-type count data inverse regression methods are introduced for situations where responses are Poisson or two-parameter compound Poisson distributed. Our Poisson models are calculated in a closed form, by means of Hermite and negative binomial (NB) distributions. For compound Poisson responses, complete and simplified models are provided. The simplified models are also expressible in a closed form and involve the use of compound Hermite and compound NB distributions. Three examples of applications are given that demonstrate the usefulness of these methodologies in cytogenetic radiation biodosimetry and in radiotherapy. We provide R and SAS codes which reproduce these examples. PMID:25663804
An approach for quantifying small effects in regression models.
Bedrick, Edward J; Hund, Lauren
2018-04-01
We develop a novel approach for quantifying small effects in regression models. Our method is based on variation in the mean function, in contrast to methods that focus on regression coefficients. Our idea applies in diverse settings such as testing for a negligible trend and quantifying differences in regression functions across strata. Straightforward Bayesian methods are proposed for inference. Four examples are used to illustrate the ideas.
Foster, Guy M.; Graham, Jennifer L.
2016-04-06
The Kansas River is a primary source of drinking water for about 800,000 people in northeastern Kansas. Source-water supplies are treated by a combination of chemical and physical processes to remove contaminants before distribution. Advanced notification of changing water-quality conditions and cyanobacteria and associated toxin and taste-and-odor compounds provides drinking-water treatment facilities time to develop and implement adequate treatment strategies. The U.S. Geological Survey (USGS), in cooperation with the Kansas Water Office (funded in part through the Kansas State Water Plan Fund), and the City of Lawrence, the City of Topeka, the City of Olathe, and Johnson County Water One, began a study in July 2012 to develop statistical models at two Kansas River sites located upstream from drinking-water intakes. Continuous water-quality monitors have been operated and discrete-water quality samples have been collected on the Kansas River at Wamego (USGS site number 06887500) and De Soto (USGS site number 06892350) since July 2012. Continuous and discrete water-quality data collected during July 2012 through June 2015 were used to develop statistical models for constituents of interest at the Wamego and De Soto sites. Logistic models to continuously estimate the probability of occurrence above selected thresholds were developed for cyanobacteria, microcystin, and geosmin. Linear regression models to continuously estimate constituent concentrations were developed for major ions, dissolved solids, alkalinity, nutrients (nitrogen and phosphorus species), suspended sediment, indicator bacteria (Escherichia coli, fecal coliform, and enterococci), and actinomycetes bacteria. These models will be used to provide real-time estimates of the probability that cyanobacteria and associated compounds exceed thresholds and of the concentrations of other water-quality constituents in the Kansas River. The models documented in this report are useful for characterizing changes
Optimal experimental designs for inverse quadratic regression models
Dette, Holger; Kiss, Christine
2007-01-01
In this paper optimal experimental designs for inverse quadratic regression models are determined. We consider two different parameterizations of the model and investigate local optimal designs with respect to the $c$-, $D$- and $E$-criteria, which reflect various aspects of the precision of the maximum likelihood estimator for the parameters in inverse quadratic regression models. In particular it is demonstrated that for a sufficiently large design space geometric allocation rules are optim...
A Bayesian semiparametric Markov regression model for juvenile dermatomyositis.
De Iorio, Maria; Gallot, Natacha; Valcarcel, Beatriz; Wedderburn, Lucy
2018-02-20
Juvenile dermatomyositis (JDM) is a rare autoimmune disease that may lead to serious complications, even to death. We develop a 2-state Markov regression model in a Bayesian framework to characterise disease progression in JDM over time and gain a better understanding of the factors influencing disease risk. The transition probabilities between disease and remission state (and vice versa) are a function of time-homogeneous and time-varying covariates. These latter types of covariates are introduced in the model through a latent health state function, which describes patient-specific health over time and accounts for variability among patients. We assume a nonparametric prior based on the Dirichlet process to model the health state function and the baseline transition intensities between disease and remission state and vice versa. The Dirichlet process induces a clustering of the patients in homogeneous risk groups. To highlight clinical variables that most affect the transition probabilities, we perform variable selection using spike and slab prior distributions. Posterior inference is performed through Markov chain Monte Carlo methods. Data were made available from the UK JDM Cohort and Biomarker Study and Repository, hosted at the UCL Institute of Child Health. Copyright © 2018 John Wiley & Sons, Ltd.
Directory of Open Access Journals (Sweden)
Drzewiecki Wojciech
2016-12-01
Full Text Available In this work nine non-linear regression models were compared for sub-pixel impervious surface area mapping from Landsat images. The comparison was done in three study areas both for accuracy of imperviousness coverage evaluation in individual points in time and accuracy of imperviousness change assessment. The performance of individual machine learning algorithms (Cubist, Random Forest, stochastic gradient boosting of regression trees, k-nearest neighbors regression, random k-nearest neighbors regression, Multivariate Adaptive Regression Splines, averaged neural networks, and support vector machines with polynomial and radial kernels was also compared with the performance of heterogeneous model ensembles constructed from the best models trained using particular techniques.
Methods of Detecting Outliers in A Regression Analysis Model ...
African Journals Online (AJOL)
PROF. O. E. OSUAGWU
2013-06-01
Jun 1, 2013 ... This is the type of linear regression that involves only two variables one independent and one dependent plus the random error term. The simple linear regression model assumes that there is a straight line (linear) relationship between the dependent variable Y and the independent variable X. This can be.
The R Package threg to Implement Threshold Regression Models
Directory of Open Access Journals (Sweden)
Tao Xiao
2015-08-01
This new package includes four functions: threg, and the methods hr, predict and plot for threg objects returned by threg. The threg function is the model-fitting function which is used to calculate regression coefficient estimates, asymptotic standard errors and p values. The hr method for threg objects is the hazard-ratio calculation function which provides the estimates of hazard ratios at selected time points for specified scenarios (based on given categories or value settings of covariates. The predict method for threg objects is used for prediction. And the plot method for threg objects provides plots for curves of estimated hazard functions, survival functions and probability density functions of the first-hitting-time; function curves corresponding to different scenarios can be overlaid in the same plot for comparison to give additional research insights.
Gao, Xiangyun; An, Haizhong; Fang, Wei; Huang, Xuan; Li, Huajiao; Zhong, Weiqiong; Ding, Yinghui
2014-07-01
The linear regression parameters between two time series can be different under different lengths of observation period. If we study the whole period by the sliding window of a short period, the change of the linear regression parameters is a process of dynamic transmission over time. We tackle fundamental research that presents a simple and efficient computational scheme: a linear regression patterns transmission algorithm, which transforms linear regression patterns into directed and weighted networks. The linear regression patterns (nodes) are defined by the combination of intervals of the linear regression parameters and the results of the significance testing under different sizes of the sliding window. The transmissions between adjacent patterns are defined as edges, and the weights of the edges are the frequency of the transmissions. The major patterns, the distance, and the medium in the process of the transmission can be captured. The statistical results of weighted out-degree and betweenness centrality are mapped on timelines, which shows the features of the distribution of the results. Many measurements in different areas that involve two related time series variables could take advantage of this algorithm to characterize the dynamic relationships between the time series from a new perspective.
Al-Ghraibah, Amani
error of approximately 3/4 a GOES class. We also consider thresholding the regressed flare size for the experiment containing both flaring and non-flaring regions and find a TPR. of 0.69 and a TNR of 0.86 for flare prediction, consistent with our previous studies of flare prediction using the same magnetic complexity features. The results for both of these size regression experiments are consistent across a wide range of predictive time windows, indicating that the magnetic complexity features may be persistent in appearance long before flare activity. This conjecture is supported by our larger error rates of some 40 hours in the time-to-flare regression problem. The magnetic complexity features considered here appear to have discriminative potential for flare size, but their persistence in time makes them less discriminative for the time-to-flare problem. We also study the prediction of solar flare size and time-to-flare using two temporal features, namely the ▵- and ▵-▵-features, the same average size and time-to-flare regression error are found when these temporal features are used in size and time-to-flare prediction. In the third topic, we study the temporal evolution of active region magnetic fields using Hidden Markov Models (HMMs) which is one of the efficient temporal analyses found in literature. We extracted 38 features which describing the complexity of the photospheric magnetic field. These features are converted into a sequence of symbols using k-nearest neighbor search method. We study many parameters before prediction; like the length of the training window Wtrain which denotes to the number of history images use to train the flare and non-flare HMMs, and number of hidden states Q. In training phase, the model parameters of the HMM of each category are optimized so as to best describe the training symbol sequences. In testing phase, we use the best flare and non-flare models to predict/classify active regions as a flaring or non-flaring region
Application of random regression models to the genetic evaluation ...
African Journals Online (AJOL)
The model included fixed regression on AM (range from 30 to 138 mo) and the effect of herd-measurement date concatenation. Random parts of the model were RRM coefficients for additive and permanent environmental effects, while residual effects were modelled to account for heterogeneity of variance by AY. Estimates ...
Regression Model Optimization for the Analysis of Experimental Data
Ulbrich, N.
2009-01-01
A candidate math model search algorithm was developed at Ames Research Center that determines a recommended math model for the multivariate regression analysis of experimental data. The search algorithm is applicable to classical regression analysis problems as well as wind tunnel strain gage balance calibration analysis applications. The algorithm compares the predictive capability of different regression models using the standard deviation of the PRESS residuals of the responses as a search metric. This search metric is minimized during the search. Singular value decomposition is used during the search to reject math models that lead to a singular solution of the regression analysis problem. Two threshold dependent constraints are also applied. The first constraint rejects math models with insignificant terms. The second constraint rejects math models with near-linear dependencies between terms. The math term hierarchy rule may also be applied as an optional constraint during or after the candidate math model search. The final term selection of the recommended math model depends on the regressor and response values of the data set, the user s function class combination choice, the user s constraint selections, and the result of the search metric minimization. A frequently used regression analysis example from the literature is used to illustrate the application of the search algorithm to experimental data.
Real estate value prediction using multivariate regression models
Manjula, R.; Jain, Shubham; Srivastava, Sharad; Rajiv Kher, Pranav
2017-11-01
The real estate market is one of the most competitive in terms of pricing and the same tends to vary significantly based on a lot of factors, hence it becomes one of the prime fields to apply the concepts of machine learning to optimize and predict the prices with high accuracy. Therefore in this paper, we present various important features to use while predicting housing prices with good accuracy. We have described regression models, using various features to have lower Residual Sum of Squares error. While using features in a regression model some feature engineering is required for better prediction. Often a set of features (multiple regressions) or polynomial regression (applying a various set of powers in the features) is used for making better model fit. For these models are expected to be susceptible towards over fitting ridge regression is used to reduce it. This paper thus directs to the best application of regression models in addition to other techniques to optimize the result.
Technological progress and regress in pre-industrial times
DEFF Research Database (Denmark)
Aiyar, Shekhar; Dalgaard, Carl-Johan Lars; Moav, Omer
2008-01-01
This paper offers micro-foundations for the dynamic relationship between technology and population in the pre-industrial world, accounting for both technological progress and the hitherto neglected but common phenomenon of technological regress. A positive feedback between population and the adop....... Inventions don't just get adopted once and forever; they have to be constantly practised and transmitted, or useful techniques may be forgotten. Jared Diamond, Ten Thousand Years of Solitude, 1993...
Alternative regression models to assess increase in childhood BMI
Directory of Open Access Journals (Sweden)
Mansmann Ulrich
2008-09-01
Full Text Available Abstract Background Body mass index (BMI data usually have skewed distributions, for which common statistical modeling approaches such as simple linear or logistic regression have limitations. Methods Different regression approaches to predict childhood BMI by goodness-of-fit measures and means of interpretation were compared including generalized linear models (GLMs, quantile regression and Generalized Additive Models for Location, Scale and Shape (GAMLSS. We analyzed data of 4967 children participating in the school entry health examination in Bavaria, Germany, from 2001 to 2002. TV watching, meal frequency, breastfeeding, smoking in pregnancy, maternal obesity, parental social class and weight gain in the first 2 years of life were considered as risk factors for obesity. Results GAMLSS showed a much better fit regarding the estimation of risk factors effects on transformed and untransformed BMI data than common GLMs with respect to the generalized Akaike information criterion. In comparison with GAMLSS, quantile regression allowed for additional interpretation of prespecified distribution quantiles, such as quantiles referring to overweight or obesity. The variables TV watching, maternal BMI and weight gain in the first 2 years were directly, and meal frequency was inversely significantly associated with body composition in any model type examined. In contrast, smoking in pregnancy was not directly, and breastfeeding and parental social class were not inversely significantly associated with body composition in GLM models, but in GAMLSS and partly in quantile regression models. Risk factor specific BMI percentile curves could be estimated from GAMLSS and quantile regression models. Conclusion GAMLSS and quantile regression seem to be more appropriate than common GLMs for risk factor modeling of BMI data.
Alternative regression models to assess increase in childhood BMI.
Beyerlein, Andreas; Fahrmeir, Ludwig; Mansmann, Ulrich; Toschke, André M
2008-09-08
Body mass index (BMI) data usually have skewed distributions, for which common statistical modeling approaches such as simple linear or logistic regression have limitations. Different regression approaches to predict childhood BMI by goodness-of-fit measures and means of interpretation were compared including generalized linear models (GLMs), quantile regression and Generalized Additive Models for Location, Scale and Shape (GAMLSS). We analyzed data of 4967 children participating in the school entry health examination in Bavaria, Germany, from 2001 to 2002. TV watching, meal frequency, breastfeeding, smoking in pregnancy, maternal obesity, parental social class and weight gain in the first 2 years of life were considered as risk factors for obesity. GAMLSS showed a much better fit regarding the estimation of risk factors effects on transformed and untransformed BMI data than common GLMs with respect to the generalized Akaike information criterion. In comparison with GAMLSS, quantile regression allowed for additional interpretation of prespecified distribution quantiles, such as quantiles referring to overweight or obesity. The variables TV watching, maternal BMI and weight gain in the first 2 years were directly, and meal frequency was inversely significantly associated with body composition in any model type examined. In contrast, smoking in pregnancy was not directly, and breastfeeding and parental social class were not inversely significantly associated with body composition in GLM models, but in GAMLSS and partly in quantile regression models. Risk factor specific BMI percentile curves could be estimated from GAMLSS and quantile regression models. GAMLSS and quantile regression seem to be more appropriate than common GLMs for risk factor modeling of BMI data.
Analysis of Sting Balance Calibration Data Using Optimized Regression Models
Ulbrich, N.; Bader, Jon B.
2010-01-01
Calibration data of a wind tunnel sting balance was processed using a candidate math model search algorithm that recommends an optimized regression model for the data analysis. During the calibration the normal force and the moment at the balance moment center were selected as independent calibration variables. The sting balance itself had two moment gages. Therefore, after analyzing the connection between calibration loads and gage outputs, it was decided to choose the difference and the sum of the gage outputs as the two responses that best describe the behavior of the balance. The math model search algorithm was applied to these two responses. An optimized regression model was obtained for each response. Classical strain gage balance load transformations and the equations of the deflection of a cantilever beam under load are used to show that the search algorithm s two optimized regression models are supported by a theoretical analysis of the relationship between the applied calibration loads and the measured gage outputs. The analysis of the sting balance calibration data set is a rare example of a situation when terms of a regression model of a balance can directly be derived from first principles of physics. In addition, it is interesting to note that the search algorithm recommended the correct regression model term combinations using only a set of statistical quality metrics that were applied to the experimental data during the algorithm s term selection process.
Robust mislabel logistic regression without modeling mislabel probabilities.
Hung, Hung; Jou, Zhi-Yu; Huang, Su-Yun
2018-03-01
Logistic regression is among the most widely used statistical methods for linear discriminant analysis. In many applications, we only observe possibly mislabeled responses. Fitting a conventional logistic regression can then lead to biased estimation. One common resolution is to fit a mislabel logistic regression model, which takes into consideration of mislabeled responses. Another common method is to adopt a robust M-estimation by down-weighting suspected instances. In this work, we propose a new robust mislabel logistic regression based on γ-divergence. Our proposal possesses two advantageous features: (1) It does not need to model the mislabel probabilities. (2) The minimum γ-divergence estimation leads to a weighted estimating equation without the need to include any bias correction term, that is, it is automatically bias-corrected. These features make the proposed γ-logistic regression more robust in model fitting and more intuitive for model interpretation through a simple weighting scheme. Our method is also easy to implement, and two types of algorithms are included. Simulation studies and the Pima data application are presented to demonstrate the performance of γ-logistic regression. © 2017, The International Biometric Society.
Characteristics and Properties of a Simple Linear Regression Model
Directory of Open Access Journals (Sweden)
Kowal Robert
2016-12-01
Full Text Available A simple linear regression model is one of the pillars of classic econometrics. Despite the passage of time, it continues to raise interest both from the theoretical side as well as from the application side. One of the many fundamental questions in the model concerns determining derivative characteristics and studying the properties existing in their scope, referring to the first of these aspects. The literature of the subject provides several classic solutions in that regard. In the paper, a completely new design is proposed, based on the direct application of variance and its properties, resulting from the non-correlation of certain estimators with the mean, within the scope of which some fundamental dependencies of the model characteristics are obtained in a much more compact manner. The apparatus allows for a simple and uniform demonstration of multiple dependencies and fundamental properties in the model, and it does it in an intuitive manner. The results were obtained in a classic, traditional area, where everything, as it might seem, has already been thoroughly studied and discovered.
Directory of Open Access Journals (Sweden)
Sareh Keshavarzi
2012-01-01
Full Text Available Background. In many studies with longitudinal data, time-dependent covariates can only be measured intermittently (not at all observation times, and this presents difficulties for standard statistical analyses. This situation is common in medical studies, and methods that deal with this challenge would be useful. Methods. In this study, we performed the seemingly unrelated regression (SUR based models, with respect to each observation time in longitudinal data with intermittently observed time-dependent covariates and further compared these models with mixed-effect regression models (MRMs under three classic imputation procedures. Simulation studies were performed to compare the sample size properties of the estimated coefficients for different modeling choices. Results. In general, the proposed models in the presence of intermittently observed time-dependent covariates showed a good performance. However, when we considered only the observed values of the covariate without any imputations, the resulted biases were greater. The performances of the proposed SUR-based models in comparison with MRM using classic imputation methods were nearly similar with approximately equal amounts of bias and MSE. Conclusion. The simulation study suggests that the SUR-based models work as efficiently as MRM in the case of intermittently observed time-dependent covariates. Thus, it can be used as an alternative to MRM.
Song, Chao; Kwan, Mei-Po; Zhu, Jiping
2017-04-08
An increasing number of fires are occurring with the rapid development of cities, resulting in increased risk for human beings and the environment. This study compares geographically weighted regression-based models, including geographically weighted regression (GWR) and geographically and temporally weighted regression (GTWR), which integrates spatial and temporal effects and global linear regression models (LM) for modeling fire risk at the city scale. The results show that the road density and the spatial distribution of enterprises have the strongest influences on fire risk, which implies that we should focus on areas where roads and enterprises are densely clustered. In addition, locations with a large number of enterprises have fewer fire ignition records, probably because of strict management and prevention measures. A changing number of significant variables across space indicate that heterogeneity mainly exists in the northern and eastern rural and suburban areas of Hefei city, where human-related facilities or road construction are only clustered in the city sub-centers. GTWR can capture small changes in the spatiotemporal heterogeneity of the variables while GWR and LM cannot. An approach that integrates space and time enables us to better understand the dynamic changes in fire risk. Thus governments can use the results to manage fire safety at the city scale.
Directory of Open Access Journals (Sweden)
Chao Song
2017-04-01
Full Text Available An increasing number of fires are occurring with the rapid development of cities, resulting in increased risk for human beings and the environment. This study compares geographically weighted regression-based models, including geographically weighted regression (GWR and geographically and temporally weighted regression (GTWR, which integrates spatial and temporal effects and global linear regression models (LM for modeling fire risk at the city scale. The results show that the road density and the spatial distribution of enterprises have the strongest influences on fire risk, which implies that we should focus on areas where roads and enterprises are densely clustered. In addition, locations with a large number of enterprises have fewer fire ignition records, probably because of strict management and prevention measures. A changing number of significant variables across space indicate that heterogeneity mainly exists in the northern and eastern rural and suburban areas of Hefei city, where human-related facilities or road construction are only clustered in the city sub-centers. GTWR can capture small changes in the spatiotemporal heterogeneity of the variables while GWR and LM cannot. An approach that integrates space and time enables us to better understand the dynamic changes in fire risk. Thus governments can use the results to manage fire safety at the city scale.
Buffalos milk yield analysis using random regression models
Directory of Open Access Journals (Sweden)
A.S. Schierholt
2010-02-01
Full Text Available Data comprising 1,719 milk yield records from 357 females (predominantly Murrah breed, daughters of 110 sires, with births from 1974 to 2004, obtained from the Programa de Melhoramento Genético de Bubalinos (PROMEBUL and from records of EMBRAPA Amazônia Oriental - EAO herd, located in Belém, Pará, Brazil, were used to compare random regression models for estimating variance components and predicting breeding values of the sires. The data were analyzed by different models using the Legendre’s polynomial functions from second to fourth orders. The random regression models included the effects of herd-year, month of parity date of the control; regression coefficients for age of females (in order to describe the fixed part of the lactation curve and random regression coefficients related to the direct genetic and permanent environment effects. The comparisons among the models were based on the Akaike Infromation Criterion. The random effects regression model using third order Legendre’s polynomials with four classes of the environmental effect were the one that best described the additive genetic variation in milk yield. The heritability estimates varied from 0.08 to 0.40. The genetic correlation between milk yields in younger ages was close to the unit, but in older ages it was low.
Linear regression models for quantitative assessment of left ...
African Journals Online (AJOL)
Changes in left ventricular structures and function have been reported in cardiomyopathies. No prediction models have been established in this environment. This study established regression models for prediction of left ventricular structures in normal subjects. A sample of normal subjects was drawn from a large urban ...
Uncertainties in spatially aggregated predictions from a logistic regression model
Horssen, P.W. van; Pebesma, E.J.; Schot, P.P.
2002-01-01
This paper presents a method to assess the uncertainty of an ecological spatial prediction model which is based on logistic regression models, using data from the interpolation of explanatory predictor variables. The spatial predictions are presented as approximate 95% prediction intervals. The
Regression models for estimating charcoal yield in a Eucalyptus ...
African Journals Online (AJOL)
... dbh2H, and the product of dbh and merchantable height [(dbh)MH] as independent variables. Results of residual analysis showed that the models satisfied all the assumptions of regression analysis. Keywords: Models, charcoal production, biomass, Eucalyptus, arid, anergy, allometric. Bowen Journal of Agriculture Vol.
CICAAR - Convolutive ICA with an Auto-Regressive Inverse Model
DEFF Research Database (Denmark)
Dyrholm, Mads; Hansen, Lars Kai
2004-01-01
We invoke an auto-regressive IIR inverse model for convolutive ICA and derive expressions for the likelihood and its gradient. We argue that optimization will give a stable inverse. When there are more sensors than sources the mixing model parameters are estimated in a second step by least squares...
Default Bayes Factors for Model Selection in Regression
Rouder, Jeffrey N.; Morey, Richard D.
2012-01-01
In this article, we present a Bayes factor solution for inference in multiple regression. Bayes factors are principled measures of the relative evidence from data for various models or positions, including models that embed null hypotheses. In this regard, they may be used to state positive evidence for a lack of an effect, which is not possible…
Geographically Weighted Logistic Regression Applied to Credit Scoring Models
Directory of Open Access Journals (Sweden)
Pedro Henrique Melo Albuquerque
Full Text Available Abstract This study used real data from a Brazilian financial institution on transactions involving Consumer Direct Credit (CDC, granted to clients residing in the Distrito Federal (DF, to construct credit scoring models via Logistic Regression and Geographically Weighted Logistic Regression (GWLR techniques. The aims were: to verify whether the factors that influence credit risk differ according to the borrower’s geographic location; to compare the set of models estimated via GWLR with the global model estimated via Logistic Regression, in terms of predictive power and financial losses for the institution; and to verify the viability of using the GWLR technique to develop credit scoring models. The metrics used to compare the models developed via the two techniques were the AICc informational criterion, the accuracy of the models, the percentage of false positives, the sum of the value of false positive debt, and the expected monetary value of portfolio default compared with the monetary value of defaults observed. The models estimated for each region in the DF were distinct in their variables and coefficients (parameters, with it being concluded that credit risk was influenced differently in each region in the study. The Logistic Regression and GWLR methodologies presented very close results, in terms of predictive power and financial losses for the institution, and the study demonstrated viability in using the GWLR technique to develop credit scoring models for the target population in the study.
Visualisation and interpretation of Support Vector Regression models.
Ustün, B; Melssen, W J; Buydens, L M C
2007-07-09
This paper introduces a technique to visualise the information content of the kernel matrix and a way to interpret the ingredients of the Support Vector Regression (SVR) model. Recently, the use of Support Vector Machines (SVM) for solving classification (SVC) and regression (SVR) problems has increased substantially in the field of chemistry and chemometrics. This is mainly due to its high generalisation performance and its ability to model non-linear relationships in a unique and global manner. Modeling of non-linear relationships will be enabled by applying a kernel function. The kernel function transforms the input data, usually non-linearly related to the associated output property, into a high dimensional feature space where the non-linear relationship can be represented in a linear form. Usually, SVMs are applied as a black box technique. Hence, the model cannot be interpreted like, e.g., Partial Least Squares (PLS). For example, the PLS scores and loadings make it possible to visualise and understand the driving force behind the optimal PLS machinery. In this study, we have investigated the possibilities to visualise and interpret the SVM model. Here, we exclusively have focused on Support Vector Regression to demonstrate these visualisation and interpretation techniques. Our observations show that we are now able to turn a SVR black box model into a transparent and interpretable regression modeling technique.
Model-based Quantile Regression for Discrete Data
Padellini, Tullia
2018-04-10
Quantile regression is a class of methods voted to the modelling of conditional quantiles. In a Bayesian framework quantile regression has typically been carried out exploiting the Asymmetric Laplace Distribution as a working likelihood. Despite the fact that this leads to a proper posterior for the regression coefficients, the resulting posterior variance is however affected by an unidentifiable parameter, hence any inferential procedure beside point estimation is unreliable. We propose a model-based approach for quantile regression that considers quantiles of the generating distribution directly, and thus allows for a proper uncertainty quantification. We then create a link between quantile regression and generalised linear models by mapping the quantiles to the parameter of the response variable, and we exploit it to fit the model with R-INLA. We extend it also in the case of discrete responses, where there is no 1-to-1 relationship between quantiles and distribution\\'s parameter, by introducing continuous generalisations of the most common discrete variables (Poisson, Binomial and Negative Binomial) to be exploited in the fitting.
Cruz, Antonio M; Barr, Cameron; Puñales-Pozo, Elsa
2008-01-01
This research's main goals were to build a predictor for a turnaround time (TAT) indicator for estimating its values and use a numerical clustering technique for finding possible causes of undesirable TAT values. The following stages were used: domain understanding, data characterisation and sample reduction and insight characterisation. Building the TAT indicator multiple linear regression predictor and clustering techniques were used for improving corrective maintenance task efficiency in a clinical engineering department (CED). The indicator being studied was turnaround time (TAT). Multiple linear regression was used for building a predictive TAT value model. The variables contributing to such model were clinical engineering department response time (CE(rt), 0.415 positive coefficient), stock service response time (Stock(rt), 0.734 positive coefficient), priority level (0.21 positive coefficient) and service time (0.06 positive coefficient). The regression process showed heavy reliance on Stock(rt), CE(rt) and priority, in that order. Clustering techniques revealed the main causes of high TAT values. This examination has provided a means for analysing current technical service quality and effectiveness. In doing so, it has demonstrated a process for identifying areas and methods of improvement and a model against which to analyse these methods' effectiveness.
Harrell , Jr , Frank E
2015-01-01
This highly anticipated second edition features new chapters and sections, 225 new references, and comprehensive R software. In keeping with the previous edition, this book is about the art and science of data analysis and predictive modeling, which entails choosing and using multiple tools. Instead of presenting isolated techniques, this text emphasizes problem solving strategies that address the many issues arising when developing multivariable models using real data and not standard textbook examples. It includes imputation methods for dealing with missing data effectively, methods for fitting nonlinear relationships and for making the estimation of transformations a formal part of the modeling process, methods for dealing with "too many variables to analyze and not enough observations," and powerful model validation techniques based on the bootstrap. The reader will gain a keen understanding of predictive accuracy, and the harm of categorizing continuous predictors or outcomes. This text realistically...
Model building strategy for logistic regression: purposeful selection.
Zhang, Zhongheng
2016-03-01
Logistic regression is one of the most commonly used models to account for confounders in medical literature. The article introduces how to perform purposeful selection model building strategy with R. I stress on the use of likelihood ratio test to see whether deleting a variable will have significant impact on model fit. A deleted variable should also be checked for whether it is an important adjustment of remaining covariates. Interaction should be checked to disentangle complex relationship between covariates and their synergistic effect on response variable. Model should be checked for the goodness-of-fit (GOF). In other words, how the fitted model reflects the real data. Hosmer-Lemeshow GOF test is the most widely used for logistic regression model.
Thermal Efficiency Degradation Diagnosis Method Using Regression Model
International Nuclear Information System (INIS)
Jee, Chang Hyun; Heo, Gyun Young; Jang, Seok Won; Lee, In Cheol
2011-01-01
This paper proposes an idea for thermal efficiency degradation diagnosis in turbine cycles, which is based on turbine cycle simulation under abnormal conditions and a linear regression model. The correlation between the inputs for representing degradation conditions (normally unmeasured but intrinsic states) and the simulation outputs (normally measured but superficial states) was analyzed with the linear regression model. The regression models can inversely response an associated intrinsic state for a superficial state observed from a power plant. The diagnosis method proposed herein is classified into three processes, 1) simulations for degradation conditions to get measured states (referred as what-if method), 2) development of the linear model correlating intrinsic and superficial states, and 3) determination of an intrinsic state using the superficial states of current plant and the linear regression model (referred as inverse what-if method). The what-if method is to generate the outputs for the inputs including various root causes and/or boundary conditions whereas the inverse what-if method is the process of calculating the inverse matrix with the given superficial states, that is, component degradation modes. The method suggested in this paper was validated using the turbine cycle model for an operating power plant
Multiple Response Regression for Gaussian Mixture Models with Known Labels.
Lee, Wonyul; Du, Ying; Sun, Wei; Hayes, D Neil; Liu, Yufeng
2012-12-01
Multiple response regression is a useful regression technique to model multiple response variables using the same set of predictor variables. Most existing methods for multiple response regression are designed for modeling homogeneous data. In many applications, however, one may have heterogeneous data where the samples are divided into multiple groups. Our motivating example is a cancer dataset where the samples belong to multiple cancer subtypes. In this paper, we consider modeling the data coming from a mixture of several Gaussian distributions with known group labels. A naive approach is to split the data into several groups according to the labels and model each group separately. Although it is simple, this approach ignores potential common structures across different groups. We propose new penalized methods to model all groups jointly in which the common and unique structures can be identified. The proposed methods estimate the regression coefficient matrix, as well as the conditional inverse covariance matrix of response variables. Asymptotic properties of the proposed methods are explored. Through numerical examples, we demonstrate that both estimation and prediction can be improved by modeling all groups jointly using the proposed methods. An application to a glioblastoma cancer dataset reveals some interesting common and unique gene relationships across different cancer subtypes.
Overcoming Spurious Regression Using time-Varying Fourier ...
African Journals Online (AJOL)
Non-stationary time series data have been traditionally analyzed in the frequency domain by assuming constant amplitudes regardless of the timelag. A new approach called time-varying amplitude method (TVAM) is presented here. Oscillations are analyzed for changes in the magnitude of Fourier Coefficients which are ...
Detecting influential observations in nonlinear regression modeling of groundwater flow
Yager, Richard M.
1998-01-01
Nonlinear regression is used to estimate optimal parameter values in models of groundwater flow to ensure that differences between predicted and observed heads and flows do not result from nonoptimal parameter values. Parameter estimates can be affected, however, by observations that disproportionately influence the regression, such as outliers that exert undue leverage on the objective function. Certain statistics developed for linear regression can be used to detect influential observations in nonlinear regression if the models are approximately linear. This paper discusses the application of Cook's D, which measures the effect of omitting a single observation on a set of estimated parameter values, and the statistical parameter DFBETAS, which quantifies the influence of an observation on each parameter. The influence statistics were used to (1) identify the influential observations in the calibration of a three-dimensional, groundwater flow model of a fractured-rock aquifer through nonlinear regression, and (2) quantify the effect of omitting influential observations on the set of estimated parameter values. Comparison of the spatial distribution of Cook's D with plots of model sensitivity shows that influential observations correspond to areas where the model heads are most sensitive to certain parameters, and where predicted groundwater flow rates are largest. Five of the six discharge observations were identified as influential, indicating that reliable measurements of groundwater flow rates are valuable data in model calibration. DFBETAS are computed and examined for an alternative model of the aquifer system to identify a parameterization error in the model design that resulted in overestimation of the effect of anisotropy on horizontal hydraulic conductivity.
Direction of Effects in Multiple Linear Regression Models.
Wiedermann, Wolfgang; von Eye, Alexander
2015-01-01
Previous studies analyzed asymmetric properties of the Pearson correlation coefficient using higher than second order moments. These asymmetric properties can be used to determine the direction of dependence in a linear regression setting (i.e., establish which of two variables is more likely to be on the outcome side) within the framework of cross-sectional observational data. Extant approaches are restricted to the bivariate regression case. The present contribution extends the direction of dependence methodology to a multiple linear regression setting by analyzing distributional properties of residuals of competing multiple regression models. It is shown that, under certain conditions, the third central moments of estimated regression residuals can be used to decide upon direction of effects. In addition, three different approaches for statistical inference are discussed: a combined D'Agostino normality test, a skewness difference test, and a bootstrap difference test. Type I error and power of the procedures are assessed using Monte Carlo simulations, and an empirical example is provided for illustrative purposes. In the discussion, issues concerning the quality of psychological data, possible extensions of the proposed methods to the fourth central moment of regression residuals, and potential applications are addressed.
Logistic regression for risk factor modelling in stuttering research.
Reed, Phil; Wu, Yaqionq
2013-06-01
To outline the uses of logistic regression and other statistical methods for risk factor analysis in the context of research on stuttering. The principles underlying the application of a logistic regression are illustrated, and the types of questions to which such a technique has been applied in the stuttering field are outlined. The assumptions and limitations of the technique are discussed with respect to existing stuttering research, and with respect to formulating appropriate research strategies to accommodate these considerations. Finally, some alternatives to the approach are briefly discussed. The way the statistical procedures are employed are demonstrated with some hypothetical data. Research into several practical issues concerning stuttering could benefit if risk factor modelling were used. Important examples are early diagnosis, prognosis (whether a child will recover or persist) and assessment of treatment outcome. After reading this article you will: (a) Summarize the situations in which logistic regression can be applied to a range of issues about stuttering; (b) Follow the steps in performing a logistic regression analysis; (c) Describe the assumptions of the logistic regression technique and the precautions that need to be checked when it is employed; (d) Be able to summarize its advantages over other techniques like estimation of group differences and simple regression. Copyright © 2012 Elsevier Inc. All rights reserved.
Bayesian approach to errors-in-variables in regression models
Rozliman, Nur Aainaa; Ibrahim, Adriana Irawati Nur; Yunus, Rossita Mohammad
2017-05-01
In many applications and experiments, data sets are often contaminated with error or mismeasured covariates. When at least one of the covariates in a model is measured with error, Errors-in-Variables (EIV) model can be used. Measurement error, when not corrected, would cause misleading statistical inferences and analysis. Therefore, our goal is to examine the relationship of the outcome variable and the unobserved exposure variable given the observed mismeasured surrogate by applying the Bayesian formulation to the EIV model. We shall extend the flexible parametric method proposed by Hossain and Gustafson (2009) to another nonlinear regression model which is the Poisson regression model. We shall then illustrate the application of this approach via a simulation study using Markov chain Monte Carlo sampling methods.
Optimum short-time polynomial regression for signal analysis
Indian Academy of Sciences (India)
In the presence of noise, such modeling becomes important, because the polynomial approximation performs smoothing leading to noise suppression. The problem ... However, approximate optimization can be achieved using only the variance expressions and the intersection-ofconfidence-intervals (ICI) technique. The ICI ...
Efficient estimation of an additive quantile regression model
Cheng, Y.; de Gooijer, J.G.; Zerom, D.
2009-01-01
In this paper two kernel-based nonparametric estimators are proposed for estimating the components of an additive quantile regression model. The first estimator is a computationally convenient approach which can be viewed as a viable alternative to the method of De Gooijer and Zerom (2003). By
Efficient estimation of an additive quantile regression model
Cheng, Y.; de Gooijer, J.G.; Zerom, D.
2010-01-01
In this paper two kernel-based nonparametric estimators are proposed for estimating the components of an additive quantile regression model. The first estimator is a computationally convenient approach which can be viewed as a viable alternative to the method of De Gooijer and Zerom (2003). By
Efficient estimation of an additive quantile regression model
Cheng, Y.; de Gooijer, J.G.; Zerom, D.
2011-01-01
In this paper, two non-parametric estimators are proposed for estimating the components of an additive quantile regression model. The first estimator is a computationally convenient approach which can be viewed as a more viable alternative to existing kernel-based approaches. The second estimator
Linearity and Misspecification Tests for Vector Smooth Transition Regression Models
DEFF Research Database (Denmark)
Teräsvirta, Timo; Yang, Yukai
The purpose of the paper is to derive Lagrange multiplier and Lagrange multiplier type specification and misspecification tests for vector smooth transition regression models. We report results from simulation studies in which the size and power properties of the proposed asymptotic tests in small...
Application of multilinear regression analysis in modeling of soil ...
African Journals Online (AJOL)
The application of Multi-Linear Regression Analysis (MLRA) model for predicting soil properties in Calabar South offers a technical guide and solution in foundation designs problems in the area. Forty-five soil samples were collected from fifteen different boreholes at a different depth and 270 tests were carried out for CBR, ...
Regression Models and Experimental Designs : A Tutorial for Simulation Analaysts
Kleijnen, J.P.C.
2006-01-01
This tutorial explains the basics of linear regression models. especially low-order polynomials. and the corresponding statistical designs. namely, designs of resolution III, IV, V, and Central Composite Designs (CCDs).This tutorial assumes 'white noise', which means that the residuals of the fitted
application of multilinear regression analysis in modeling of soil
African Journals Online (AJOL)
Windows User
APPLICATION OF MULTILINEAR REGRESSION ANALYSIS IN MODELING OF. SOIL PROPERTIES FOR GEOTECHNICAL CIVIL ENGINEERING WORKS. IN CALABAR SOUTH. J. G. Egbe1, D. E. Ewa2, S. E. Ubi3, G. B. Ikwa4 and O. O. Tumenayo5. 1, 2, 3, 4, DEPT. OF CIVIL ENGINEERING, CROSS RIVER UNIV.
A binary logistic regression model with complex sampling design of ...
African Journals Online (AJOL)
A binary logistic regression model with complex sampling design of unmet need for family planning among all women aged (15-49) in Ethiopia. ... Conclusion: The key determinants of unmet need family planning in Ethiopia were residence, age, marital-status, education, household members, birth-events and number of ...
Transpiration of glasshouse rose crops: evaluation of regression models
Baas, R.; Rijssel, van E.
2006-01-01
Regression models of transpiration (T) based on global radiation inside the greenhouse (G), with or without energy input from heating pipes (Eh) and/or vapor pressure deficit (VPD) were parameterized. Therefore, data on T, G, temperatures from air, canopy and heating pipes, and VPD from both a
Optimum short-time polynomial regression for signal analysis
Indian Academy of Sciences (India)
A Sreenivasa Murthy
equivalent to discrete-time convolution with a fixed filter that depends on the window size and order of the .... pseudo-random white noise sequence to the synthesized signal (cf. figure 1(b)). The signal-to-noise ratio ..... Gaussian random variable distributed around s with bias. EfDsLg and standard deviation rL. Thus, for a ...
Auto-correlograms and auto-regressive models of trace metal distributions in Cochin backwaters
Digital Repository Service at National Institute of Oceanography (India)
Jayalakshmy, K.V.; Sankaranarayanan, V.N.
,2 and 3 and for Zn at stations 1 and 4. The stability in time for the concentration profiles increases as Fe Mn Ni Cu Co. The fraction of variability in the variables obtained by the auto-regressive model of order 1 ranges from 20 to 50%. Auto-regressive...
Transductive Ridge Regression in Structure-activity Modeling.
Marcou, Gilles; Delouis, Grace; Mokshyna, Olena; Horvath, Dragos; Lachiche, Nicolas; Varnek, Alexandre
2018-01-01
In this article we consider the application of the Transductive Ridge Regression (TRR) approach to structure-activity modeling. An original procedure of the TRR parameters optimization is suggested. Calculations performed on 3 different datasets involving two types of descriptors demonstrated that TRR outperforms its non-transductive analogue (Ridge Regression) in more than 90 % of cases. The most significant transductive effect was observed for small datasets. This suggests that transduction may be particularly useful when the data are expensive or difficult to collect. © 2018 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
Approximating prediction uncertainty for random forest regression models
John W. Coulston; Christine E. Blinn; Valerie A. Thomas; Randolph H. Wynne
2016-01-01
Machine learning approaches such as random forest haveÂ increased for the spatial modeling and mapping of continuousÂ variables. Random forest is a non-parametric ensembleÂ approach, and unlike traditional regression approaches thereÂ is no direct quantification of prediction error. UnderstandingÂ prediction uncertainty is important when using model-basedÂ continuous maps as...
Flexible competing risks regression modeling and goodness-of-fit
DEFF Research Database (Denmark)
Scheike, Thomas; Zhang, Mei-Jie
2008-01-01
In this paper we consider different approaches for estimation and assessment of covariate effects for the cumulative incidence curve in the competing risks model. The classic approach is to model all cause-specific hazards and then estimate the cumulative incidence curve based on these cause......-specific hazards. Another recent approach is to directly model the cumulative incidence by a proportional model (Fine and Gray, J Am Stat Assoc 94:496-509, 1999), and then obtain direct estimates of how covariates influences the cumulative incidence curve. We consider a simple and flexible class of regression...
Autoregressive Model Using Fuzzy C-Regression Model Clustering for Traffic Modeling
Tanaka, Fumiaki; Suzuki, Yukinori; Maeda, Junji
A robust traffic modeling is required to perform an effective congestion control for the broad band digital network. An autoregressive model using a fuzzy c-regression model (FCRM) clustering is proposed for a traffic modeling. This is a simpler modeling method than previous methods. The experiments show that the proposed method is more robust for traffic modeling than the previous method.
Yang, Huiqin; Thompson, Carl; Bland, Martin
2014-11-15
Time pressure is common in acute healthcare and significantly influences clinical judgement and decision making. Despite nurses' judgements being studied since the 1960s, the empirical picture of how time pressure impacts on nurses' judgement strategies and outcomes remain undeveloped. This paper aims to assess alterations in nurses' judgement strategies and outcomes under time pressure in a simulated acute care setting. In a simulated acute care environment, ninety-seven nurses were exposed to 25 clinical scenarios under time pressured and no time pressured conditions. Scenarios were randomly sampled from a large dataset of patient cases. A reference standard (judgement correctness) was generated from the same patient case records. In 12 of the scenarios only 20 seconds per judgement was allowed, in the other 13 scenarios no time pressure existed. Percentage of correct judgments in both conditions was calculated. Logistic regression modelling (of 2,425 observations) described the relationship between information cues used and judgments made. The degree of attention paid to particular cues was captured by calculating cue relative weights. The clustering effect of nurses was countered by estimating robust standard errors. The Chow test was used to test the null hypothesis that differences in regression coefficients in time pressure and no time pressure models were zero. Compared to no time pressure, no significant difference was observed in the proportion of correct judgments when nurses were put under time pressure. However, time pressure significantly impacted on the judgment strategies employed. Whilst nurses predominantly used respiration rate to make judgements, they used fewer cues to reach their clinical judgements under time pressure. The relative weighting afforded to heart rate was much smaller in the time pressure regression model, indicating that nurses paid significantly less attention to it when making judgements under time pressure. Time pressure had
Regression Model to Predict Global Solar Irradiance in Malaysia
Directory of Open Access Journals (Sweden)
Hairuniza Ahmed Kutty
2015-01-01
Full Text Available A novel regression model is developed to estimate the monthly global solar irradiance in Malaysia. The model is developed based on different available meteorological parameters, including temperature, cloud cover, rain precipitate, relative humidity, wind speed, pressure, and gust speed, by implementing regression analysis. This paper reports on the details of the analysis of the effect of each prediction parameter to identify the parameters that are relevant to estimating global solar irradiance. In addition, the proposed model is compared in terms of the root mean square error (RMSE, mean bias error (MBE, and the coefficient of determination (R2 with other models available from literature studies. Seven models based on single parameters (PM1 to PM7 and five multiple-parameter models (PM7 to PM12 are proposed. The new models perform well, with RMSE ranging from 0.429% to 1.774%, R2 ranging from 0.942 to 0.992, and MBE ranging from −0.1571% to 0.6025%. In general, cloud cover significantly affects the estimation of global solar irradiance. However, cloud cover in Malaysia lacks sufficient influence when included into multiple-parameter models although it performs fairly well in single-parameter prediction models.
Modeling energy expenditure in children and adolescents using quantile regression.
Yang, Yunwen; Adolph, Anne L; Puyau, Maurice R; Vohra, Firoz A; Butte, Nancy F; Zakeri, Issa F
2013-07-15
Advanced mathematical models have the potential to capture the complex metabolic and physiological processes that result in energy expenditure (EE). Study objective is to apply quantile regression (QR) to predict EE and determine quantile-dependent variation in covariate effects in nonobese and obese children. First, QR models will be developed to predict minute-by-minute awake EE at different quantile levels based on heart rate (HR) and physical activity (PA) accelerometry counts, and child characteristics of age, sex, weight, and height. Second, the QR models will be used to evaluate the covariate effects of weight, PA, and HR across the conditional EE distribution. QR and ordinary least squares (OLS) regressions are estimated in 109 children, aged 5-18 yr. QR modeling of EE outperformed OLS regression for both nonobese and obese populations. Average prediction errors for QR compared with OLS were not only smaller at the median τ = 0.5 (18.6 vs. 21.4%), but also substantially smaller at the tails of the distribution (10.2 vs. 39.2% at τ = 0.1 and 8.7 vs. 19.8% at τ = 0.9). Covariate effects of weight, PA, and HR on EE for the nonobese and obese children differed across quantiles (P effects of weight, PA, and HR on EE in nonobese and obese children.
Resampling procedures to validate dendro-auxometric regression models
Directory of Open Access Journals (Sweden)
2009-03-01
Full Text Available Regression analysis has a large use in several sectors of forest research. The validation of a dendro-auxometric model is a basic step in the building of the model itself. The more a model resists to attempts of demonstrating its groundlessness, the more its reliability increases. In the last decades many new theories, that quite utilizes the calculation speed of the calculators, have been formulated. Here we show the results obtained by the application of a bootsprap resampling procedure as a validation tool.
A mixed-effects multinomial logistic regression model.
Hedeker, Donald
2003-05-15
A mixed-effects multinomial logistic regression model is described for analysis of clustered or longitudinal nominal or ordinal response data. The model is parameterized to allow flexibility in the choice of contrasts used to represent comparisons across the response categories. Estimation is achieved using a maximum marginal likelihood (MML) solution that uses quadrature to numerically integrate over the distribution of random effects. An analysis of a psychiatric data set, in which homeless adults with serious mental illness are repeatedly classified in terms of their living arrangement, is used to illustrate features of the model. Copyright 2003 by John Wiley & Sons, Ltd.
Online Statistical Modeling (Regression Analysis) for Independent Responses
Made Tirta, I.; Anggraeni, Dian; Pandutama, Martinus
2017-06-01
Regression analysis (statistical analmodelling) are among statistical methods which are frequently needed in analyzing quantitative data, especially to model relationship between response and explanatory variables. Nowadays, statistical models have been developed into various directions to model various type and complex relationship of data. Rich varieties of advanced and recent statistical modelling are mostly available on open source software (one of them is R). However, these advanced statistical modelling, are not very friendly to novice R users, since they are based on programming script or command line interface. Our research aims to developed web interface (based on R and shiny), so that most recent and advanced statistical modelling are readily available, accessible and applicable on web. We have previously made interface in the form of e-tutorial for several modern and advanced statistical modelling on R especially for independent responses (including linear models/LM, generalized linier models/GLM, generalized additive model/GAM and generalized additive model for location scale and shape/GAMLSS). In this research we unified them in the form of data analysis, including model using Computer Intensive Statistics (Bootstrap and Markov Chain Monte Carlo/ MCMC). All are readily accessible on our online Virtual Statistics Laboratory. The web (interface) make the statistical modeling becomes easier to apply and easier to compare them in order to find the most appropriate model for the data.
Segmented relationships to model erosion of regression effect in Cox regression.
Muggeo, Vito M R; Attanasio, Massimo
2011-08-01
In this article we propose a parsimonious parameterisation to model the so-called erosion of the covariate effect in the Cox model, namely a covariate effect approaching to zero as the follow-up time increases. The proposed parameterisation is based on the segmented relationship where proper constraints are set to accomodate for the erosion. Relevant hypothesis testing is discussed. The approach is illustrated on two historical datasets in the survival analysis literature, and some simulation studies are presented to show how the proposed framework leads to a test for a global effect with good power as compared with alternative procedures. Finally, possible generalisations are also presented for future research.
Bootstrapping heteroskedastic regression models: wild bootstrap vs. pairs bootstrap
Emmanuel Flachaire
2005-01-01
In regression models, appropriate bootstrap methods for inference robust to heteroskedasticity of unknown form are the wild bootstrap and the pairs bootstrap. The finite sample performance of a heteroskedastic-robust test is investigated with Monte Carlo experiments. The simulation results suggest that one specific version of the wild bootstrap outperforms the other versions of the wild bootstrap and of the pairs bootstrap. It is the only one for which the bootstrap test gives always better r...
Correlation-regression model for physico-chemical quality of ...
African Journals Online (AJOL)
abusaad
3Department of Zoology, Gulbarga University Gulbarga, India. Accepted 2 July, 2012 ... multiple R2 value of 0.999 indicated that 99.9% variability in observed EC could be ascribed to Clˉ (76%),. HCO3. ˉ. (12.5%), NO3. - (10.3%) and SO4. 2- (1.1%). Multiple regression models can predict EC at 5% level of significance.
Bonellie, Sandra R
2012-10-01
To illustrate the use of regression and logistic regression models to investigate changes over time in size of babies particularly in relation to social deprivation, age of the mother and smoking. Mean birthweight has been found to be increasing in many countries in recent years, but there are still a group of babies who are born with low birthweights. Population-based retrospective cohort study. Multiple linear regression and logistic regression models are used to analyse data on term 'singleton births' from Scottish hospitals between 1994-2003. Mothers who smoke are shown to give birth to lighter babies on average, a difference of approximately 0.57 Standard deviations lower (95% confidence interval. 0.55-0.58) when adjusted for sex and parity. These mothers are also more likely to have babies that are low birthweight (odds ratio 3.46, 95% confidence interval 3.30-3.63) compared with non-smokers. Low birthweight is 30% more likely where the mother lives in the most deprived areas compared with the least deprived, (odds ratio 1.30, 95% confidence interval 1.21-1.40). Smoking during pregnancy is shown to have a detrimental effect on the size of infants at birth. This effect explains some, though not all, of the observed socioeconomic birthweight. It also explains much of the observed birthweight differences by the age of the mother. Identifying mothers at greater risk of having a low birthweight baby as important implications for the care and advice this group receives. © 2012 Blackwell Publishing Ltd.
Augmented Beta rectangular regression models: A Bayesian perspective.
Wang, Jue; Luo, Sheng
2016-01-01
Mixed effects Beta regression models based on Beta distributions have been widely used to analyze longitudinal percentage or proportional data ranging between zero and one. However, Beta distributions are not flexible to extreme outliers or excessive events around tail areas, and they do not account for the presence of the boundary values zeros and ones because these values are not in the support of the Beta distributions. To address these issues, we propose a mixed effects model using Beta rectangular distribution and augment it with the probabilities of zero and one. We conduct extensive simulation studies to assess the performance of mixed effects models based on both the Beta and Beta rectangular distributions under various scenarios. The simulation studies suggest that the regression models based on Beta rectangular distributions improve the accuracy of parameter estimates in the presence of outliers and heavy tails. The proposed models are applied to the motivating Neuroprotection Exploratory Trials in Parkinson's Disease (PD) Long-term Study-1 (LS-1 study, n = 1741), developed by The National Institute of Neurological Disorders and Stroke Exploratory Trials in Parkinson's Disease (NINDS NET-PD) network. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Regularized multivariate regression models with skew-t error distributions
Chen, Lianfu
2014-06-01
We consider regularization of the parameters in multivariate linear regression models with the errors having a multivariate skew-t distribution. An iterative penalized likelihood procedure is proposed for constructing sparse estimators of both the regression coefficient and inverse scale matrices simultaneously. The sparsity is introduced through penalizing the negative log-likelihood by adding L1-penalties on the entries of the two matrices. Taking advantage of the hierarchical representation of skew-t distributions, and using the expectation conditional maximization (ECM) algorithm, we reduce the problem to penalized normal likelihood and develop a procedure to minimize the ensuing objective function. Using a simulation study the performance of the method is assessed, and the methodology is illustrated using a real data set with a 24-dimensional response vector. © 2014 Elsevier B.V.
[Interaction between continuous variables in logistic regression model].
Qiu, Hong; Yu, Ignatius Tak-Sun; Tse, Lap Ah; Wang, Xiao-rong; Fu, Zhen-ming
2010-07-01
Rothman argued that interaction estimated as departure from additivity better reflected the biological interaction. In a logistic regression model, the product term reflects the interaction as departure from multiplicativity. So far, literature on estimating interaction regarding an additive scale using logistic regression was only focusing on two dichotomous factors. The objective of the present report was to provide a method to examine the interaction as departure from additivity between two continuous variables or between one continuous variable and one categorical variable. We used data from a lung cancer case-control study among males in Hong Kong as an example to illustrate the bootstrap re-sampling method for calculating the corresponding confidence intervals. Free software R (Version 2.8.1) was used to estimate interaction on the additive scale.
Modeling the number of car theft using Poisson regression
Zulkifli, Malina; Ling, Agnes Beh Yen; Kasim, Maznah Mat; Ismail, Noriszura
2016-10-01
Regression analysis is the most popular statistical methods used to express the relationship between the variables of response with the covariates. The aim of this paper is to evaluate the factors that influence the number of car theft using Poisson regression model. This paper will focus on the number of car thefts that occurred in districts in Peninsular Malaysia. There are two groups of factor that have been considered, namely district descriptive factors and socio and demographic factors. The result of the study showed that Bumiputera composition, Chinese composition, Other ethnic composition, foreign migration, number of residence with the age between 25 to 64, number of employed person and number of unemployed person are the most influence factors that affect the car theft cases. These information are very useful for the law enforcement department, insurance company and car owners in order to reduce and limiting the car theft cases in Peninsular Malaysia.
Drzewiecki, Wojciech
2016-12-01
In this work nine non-linear regression models were compared for sub-pixel impervious surface area mapping from Landsat images. The comparison was done in three study areas both for accuracy of imperviousness coverage evaluation in individual points in time and accuracy of imperviousness change assessment. The performance of individual machine learning algorithms (Cubist, Random Forest, stochastic gradient boosting of regression trees, k-nearest neighbors regression, random k-nearest neighbors regression, Multivariate Adaptive Regression Splines, averaged neural networks, and support vector machines with polynomial and radial kernels) was also compared with the performance of heterogeneous model ensembles constructed from the best models trained using particular techniques. The results proved that in case of sub-pixel evaluation the most accurate prediction of change may not necessarily be based on the most accurate individual assessments. When single methods are considered, based on obtained results Cubist algorithm may be advised for Landsat based mapping of imperviousness for single dates. However, Random Forest may be endorsed when the most reliable evaluation of imperviousness change is the primary goal. It gave lower accuracies for individual assessments, but better prediction of change due to more correlated errors of individual predictions. Heterogeneous model ensembles performed for individual time points assessments at least as well as the best individual models. In case of imperviousness change assessment the ensembles always outperformed single model approaches. It means that it is possible to improve the accuracy of sub-pixel imperviousness change assessment using ensembles of heterogeneous non-linear regression models.
Dynamic Regression Intervention Modeling for the Malaysian Daily Load
Directory of Open Access Journals (Sweden)
Fadhilah Abdrazak
2014-05-01
Full Text Available Malaysia is a unique country due to having both fixed and moving holidays. These moving holidays may overlap with other fixed holidays and therefore, increase the complexity of the load forecasting activities. The errors due to holidays’ effects in the load forecasting are known to be higher than other factors. If these effects can be estimated and removed, the behavior of the series could be better viewed. Thus, the aim of this paper is to improve the forecasting errors by using a dynamic regression model with intervention analysis. Based on the linear transfer function method, a daily load model consists of either peak or average is developed. The developed model outperformed the seasonal ARIMA model in estimating the fixed and moving holidays’ effects and achieved a smaller Mean Absolute Percentage Error (MAPE in load forecast.
Can We Use Regression Modeling to Quantify Mean Annual Streamflow at a Global-Scale?
Barbarossa, V.; Huijbregts, M. A. J.; Hendriks, J. A.; Beusen, A.; Clavreul, J.; King, H.; Schipper, A.
2016-12-01
Quantifying mean annual flow of rivers (MAF) at ungauged sites is essential for a number of applications, including assessments of global water supply, ecosystem integrity and water footprints. MAF can be quantified with spatially explicit process-based models, which might be overly time-consuming and data-intensive for this purpose, or with empirical regression models that predict MAF based on climate and catchment characteristics. Yet, regression models have mostly been developed at a regional scale and the extent to which they can be extrapolated to other regions is not known. In this study, we developed a global-scale regression model for MAF using observations of discharge and catchment characteristics from 1,885 catchments worldwide, ranging from 2 to 106 km2 in size. In addition, we compared the performance of the regression model with the predictive ability of the spatially explicit global hydrological model PCR-GLOBWB [van Beek et al., 2011] by comparing results from both models to independent measurements. We obtained a regression model explaining 89% of the variance in MAF based on catchment area, mean annual precipitation and air temperature, average slope and elevation. The regression model performed better than PCR-GLOBWB for the prediction of MAF, as root-mean-square error values were lower (0.29 - 0.38 compared to 0.49 - 0.57) and the modified index of agreement was higher (0.80 - 0.83 compared to 0.72 - 0.75). Our regression model can be applied globally at any point of the river network, provided that the input parameters are within the range of values employed in the calibration of the model. The performance is reduced for water scarce regions and further research should focus on improving such an aspect for regression-based global hydrological models.
Modeling of the Monthly Rainfall-Runoff Process Through Regressions
Directory of Open Access Journals (Sweden)
Campos-Aranda Daniel Francisco
2014-10-01
Full Text Available To solve the problems associated with the assessment of water resources of a river, the modeling of the rainfall-runoff process (RRP allows the deduction of runoff missing data and to extend its record, since generally the information available on precipitation is larger. It also enables the estimation of inputs to reservoirs, when their building led to the suppression of the gauging station. The simplest mathematical model that can be set for the RRP is the linear regression or curve on a monthly basis. Such a model is described in detail and is calibrated with the simultaneous record of monthly rainfall and runoff in Ballesmi hydrometric station, which covers 35 years. Since the runoff of this station has an important contribution from the spring discharge, the record is corrected first by removing that contribution. In order to do this a procedure was developed based either on the monthly average regional runoff coefficients or on nearby and similar watershed; in this case the Tancuilín gauging station was used. Both stations belong to the Partial Hydrologic Region No. 26 (Lower Rio Panuco and are located within the state of San Luis Potosi, México. The study performed indicates that the monthly regression model, due to its conceptual approach, faithfully reproduces monthly average runoff volumes and achieves an excellent approximation in relation to the dispersion, proved by calculation of the means and standard deviations.
Poisson regression for modeling count and frequency outcomes in trauma research.
Gagnon, David R; Doron-LaMarca, Susan; Bell, Margret; O'Farrell, Timothy J; Taft, Casey T
2008-10-01
The authors describe how the Poisson regression method for analyzing count or frequency outcome variables can be applied in trauma studies. The outcome of interest in trauma research may represent a count of the number of incidents of behavior occurring in a given time interval, such as acts of physical aggression or substance abuse. Traditional linear regression approaches assume a normally distributed outcome variable with equal variances over the range of predictor variables, and may not be optimal for modeling count outcomes. An application of Poisson regression is presented using data from a study of intimate partner aggression among male patients in an alcohol treatment program and their female partners. Results of Poisson regression and linear regression models are compared.
Interpreting parameters in the logistic regression model with random effects
DEFF Research Database (Denmark)
Larsen, Klaus; Petersen, Jørgen Holm; Budtz-Jørgensen, Esben
2000-01-01
interpretation, interval odds ratio, logistic regression, median odds ratio, normally distributed random effects......interpretation, interval odds ratio, logistic regression, median odds ratio, normally distributed random effects...
Learning Supervised Topic Models for Classification and Regression from Crowds
DEFF Research Database (Denmark)
Rodrigues, Filipe; Lourenco, Mariana; Ribeiro, Bernardete
2017-01-01
annotation tasks, prone to ambiguity and noise, often with high volumes of documents, deem learning under a single-annotator assumption unrealistic or unpractical for most real-world applications. In this article, we propose two supervised topic models, one for classification and another for regression......The growing need to analyze large collections of documents has led to great developments in topic modeling. Since documents are frequently associated with other related variables, such as labels or ratings, much interest has been placed on supervised topic models. However, the nature of most...... problems, which account for the heterogeneity and biases among different annotators that are encountered in practice when learning from crowds. We develop an efficient stochastic variational inference algorithm that is able to scale to very large datasets, and we empirically demonstrate the advantages...
Diagnostic Measures for the Cox Regression Model with Missing Covariates.
Zhu, Hongtu; Ibrahim, Joseph G; Chen, Ming-Hui
2015-12-01
This paper investigates diagnostic measures for assessing the influence of observations and model misspecification in the presence of missing covariate data for the Cox regression model. Our diagnostics include case-deletion measures, conditional martingale residuals, and score residuals. The Q-distance is proposed to examine the effects of deleting individual observations on the estimates of finite-dimensional and infinite-dimensional parameters. Conditional martingale residuals are used to construct goodness of fit statistics for testing possible misspecification of the model assumptions. A resampling method is developed to approximate the p -values of the goodness of fit statistics. Simulation studies are conducted to evaluate our methods, and a real data set is analyzed to illustrate their use.
Fuzzy regression modeling for tool performance prediction and degradation detection.
Li, X; Er, M J; Lim, B S; Zhou, J H; Gan, O P; Rutkowski, L
2010-10-01
In this paper, the viability of using Fuzzy-Rule-Based Regression Modeling (FRM) algorithm for tool performance and degradation detection is investigated. The FRM is developed based on a multi-layered fuzzy-rule-based hybrid system with Multiple Regression Models (MRM) embedded into a fuzzy logic inference engine that employs Self Organizing Maps (SOM) for clustering. The FRM converts a complex nonlinear problem to a simplified linear format in order to further increase the accuracy in prediction and rate of convergence. The efficacy of the proposed FRM is tested through a case study - namely to predict the remaining useful life of a ball nose milling cutter during a dry machining process of hardened tool steel with a hardness of 52-54 HRc. A comparative study is further made between four predictive models using the same set of experimental data. It is shown that the FRM is superior as compared with conventional MRM, Back Propagation Neural Networks (BPNN) and Radial Basis Function Networks (RBFN) in terms of prediction accuracy and learning speed.
Preference learning with evolutionary Multivariate Adaptive Regression Spline model
DEFF Research Database (Denmark)
Abou-Zleikha, Mohamed; Shaker, Noor; Christensen, Mads Græsbøll
2015-01-01
This paper introduces a novel approach for pairwise preference learning through combining an evolutionary method with Multivariate Adaptive Regression Spline (MARS). Collecting users' feedback through pairwise preferences is recommended over other ranking approaches as this method is more appealing...... for function approximation as well as being relatively easy to interpret. MARS models are evolved based on their efficiency in learning pairwise data. The method is tested on two datasets that collectively provide pairwise preference data of five cognitive states expressed by users. The method is analysed...
Failure and reliability prediction by support vector machines regression of time series data
International Nuclear Information System (INIS)
Chagas Moura, Marcio das; Zio, Enrico; Lins, Isis Didier; Droguett, Enrique
2011-01-01
Support Vector Machines (SVMs) are kernel-based learning methods, which have been successfully adopted for regression problems. However, their use in reliability applications has not been widely explored. In this paper, a comparative analysis is presented in order to evaluate the SVM effectiveness in forecasting time-to-failure and reliability of engineered components based on time series data. The performance on literature case studies of SVM regression is measured against other advanced learning methods such as the Radial Basis Function, the traditional MultiLayer Perceptron model, Box-Jenkins autoregressive-integrated-moving average and the Infinite Impulse Response Locally Recurrent Neural Networks. The comparison shows that in the analyzed cases, SVM outperforms or is comparable to other techniques. - Highlights: → Realistic modeling of reliability demands complex mathematical formulations. → SVM is proper when the relation input/output is unknown or very costly to be obtained. → Results indicate the potential of SVM for reliability time series prediction. → Reliability estimates support the establishment of adequate maintenance strategies.
By, Kunthel; Qaqish, Bahjat F; Preisser, John S; Perin, Jamie; Zink, Richard C
2014-02-01
This article describes a new software for modeling correlated binary data based on orthogonalized residuals, a recently developed estimating equations approach that includes, as a special case, alternating logistic regressions. The software is flexible with respect to fitting in that the user can choose estimating equations for association models based on alternating logistic regressions or orthogonalized residuals, the latter choice providing a non-diagonal working covariance matrix for second moment parameters providing potentially greater efficiency. Regression diagnostics based on this method are also implemented in the software. The mathematical background is briefly reviewed and the software is applied to medical data sets. Published by Elsevier Ireland Ltd.
Due to the complexity of the processes contributing to beach bacteria concentrations, many researchers rely on statistical modeling, among which multiple linear regression (MLR) modeling is most widely used. Despite its ease of use and interpretation, there may be time dependence...
Regression Models for Predicting Force Coefficients of Aerofoils
Directory of Open Access Journals (Sweden)
Mohammed ABDUL AKBAR
2015-09-01
Full Text Available Renewable sources of energy are attractive and advantageous in a lot of different ways. Among the renewable energy sources, wind energy is the fastest growing type. Among wind energy converters, Vertical axis wind turbines (VAWTs have received renewed interest in the past decade due to some of the advantages they possess over their horizontal axis counterparts. VAWTs have evolved into complex 3-D shapes. A key component in predicting the output of VAWTs through analytical studies is obtaining the values of lift and drag coefficients which is a function of shape of the aerofoil, ‘angle of attack’ of wind and Reynolds’s number of flow. Sandia National Laboratories have carried out extensive experiments on aerofoils for the Reynolds number in the range of those experienced by VAWTs. The volume of experimental data thus obtained is huge. The current paper discusses three Regression analysis models developed wherein lift and drag coefficients can be found out using simple formula without having to deal with the bulk of the data. Drag coefficients and Lift coefficients were being successfully estimated by regression models with R2 values as high as 0.98.
Additive Intensity Regression Models in Corporate Default Analysis
DEFF Research Database (Denmark)
Lando, David; Medhat, Mamdouh; Nielsen, Mads Stenbo
2013-01-01
We consider additive intensity (Aalen) models as an alternative to the multiplicative intensity (Cox) models for analyzing the default risk of a sample of rated, nonfinancial U.S. firms. The setting allows for estimating and testing the significance of time-varying effects. We use a variety...... of model checking techniques to identify misspecifications. In our final model, we find evidence of time-variation in the effects of distance-to-default and short-to-long term debt. Also we identify interactions between distance-to-default and other covariates, and the quick ratio covariate is significant....... None of our macroeconomic covariates are significant....
Khoshravesh, Mojtaba; Sefidkouhi, Mohammad Ali Gholami; Valipour, Mohammad
2017-07-01
The proper evaluation of evapotranspiration is essential in food security investigation, farm management, pollution detection, irrigation scheduling, nutrient flows, carbon balance as well as hydrologic modeling, especially in arid environments. To achieve sustainable development and to ensure water supply, especially in arid environments, irrigation experts need tools to estimate reference evapotranspiration on a large scale. In this study, the monthly reference evapotranspiration was estimated by three different regression models including the multivariate fractional polynomial (MFP), robust regression, and Bayesian regression in Ardestan, Esfahan, and Kashan. The results were compared with Food and Agriculture Organization (FAO)-Penman-Monteith (FAO-PM) to select the best model. The results show that at a monthly scale, all models provided a closer agreement with the calculated values for FAO-PM ( R 2 > 0.95 and RMSE < 12.07 mm month-1). However, the MFP model gives better estimates than the other two models for estimating reference evapotranspiration at all stations.
MULTIPLE LOGISTIC REGRESSION MODEL TO PREDICT RISK FACTORS OF ORAL HEALTH DISEASES
Directory of Open Access Journals (Sweden)
Parameshwar V. Pandit
2012-06-01
Full Text Available Purpose: To analysis the dependence of oral health diseases i.e. dental caries and periodontal disease on considering the number of risk factors through the applications of logistic regression model. Method: The cross sectional study involves a systematic random sample of 1760 permanent dentition aged between 18-40 years in Dharwad, Karnataka, India. Dharwad is situated in North Karnataka. The mean age was 34.26±7.28. The risk factors of dental caries and periodontal disease were established by multiple logistic regression model using SPSS statistical software. Results: The factors like frequency of brushing, timings of cleaning teeth and type of toothpastes are significant persistent predictors of dental caries and periodontal disease. The log likelihood value of full model is –1013.1364 and Akaike’s Information Criterion (AIC is 1.1752 as compared to reduced regression model are -1019.8106 and 1.1748 respectively for dental caries. But, the log likelihood value of full model is –1085.7876 and AIC is 1.2577 followed by reduced regression model are -1019.8106 and 1.1748 respectively for periodontal disease. The area under Receiver Operating Characteristic (ROC curve for the dental caries is 0.7509 (full model and 0.7447 (reduced model; the ROC for the periodontal disease is 0.6128 (full model and 0.5821 (reduced model. Conclusions: The frequency of brushing, timings of cleaning teeth and type of toothpastes are main signifi cant risk factors of dental caries and periodontal disease. The fitting performance of reduced logistic regression model is slightly a better fit as compared to full logistic regression model in identifying the these risk factors for both dichotomous dental caries and periodontal disease.
Complex Environmental Data Modelling Using Adaptive General Regression Neural Networks
Kanevski, Mikhail
2015-04-01
The research deals with an adaptation and application of Adaptive General Regression Neural Networks (GRNN) to high dimensional environmental data. GRNN [1,2,3] are efficient modelling tools both for spatial and temporal data and are based on nonparametric kernel methods closely related to classical Nadaraya-Watson estimator. Adaptive GRNN, using anisotropic kernels, can be also applied for features selection tasks when working with high dimensional data [1,3]. In the present research Adaptive GRNN are used to study geospatial data predictability and relevant feature selection using both simulated and real data case studies. The original raw data were either three dimensional monthly precipitation data or monthly wind speeds embedded into 13 dimensional space constructed by geographical coordinates and geo-features calculated from digital elevation model. GRNN were applied in two different ways: 1) adaptive GRNN with the resulting list of features ordered according to their relevancy; and 2) adaptive GRNN applied to evaluate all possible models N [in case of wind fields N=(2^13 -1)=8191] and rank them according to the cross-validation error. In both cases training were carried out applying leave-one-out procedure. An important result of the study is that the set of the most relevant features depends on the month (strong seasonal effect) and year. The predictabilities of precipitation and wind field patterns, estimated using the cross-validation and testing errors of raw and shuffled data, were studied in detail. The results of both approaches were qualitatively and quantitatively compared. In conclusion, Adaptive GRNN with their ability to select features and efficient modelling of complex high dimensional data can be widely used in automatic/on-line mapping and as an integrated part of environmental decision support systems. 1. Kanevski M., Pozdnoukhov A., Timonin V. Machine Learning for Spatial Environmental Data. Theory, applications and software. EPFL Press
Structured Additive Regression Models: An R Interface to BayesX
Directory of Open Access Journals (Sweden)
Nikolaus Umlauf
2015-02-01
Full Text Available Structured additive regression (STAR models provide a flexible framework for model- ing possible nonlinear effects of covariates: They contain the well established frameworks of generalized linear models and generalized additive models as special cases but also allow a wider class of effects, e.g., for geographical or spatio-temporal data, allowing for specification of complex and realistic models. BayesX is standalone software package providing software for fitting general class of STAR models. Based on a comprehensive open-source regression toolbox written in C++, BayesX uses Bayesian inference for estimating STAR models based on Markov chain Monte Carlo simulation techniques, a mixed model representation of STAR models, or stepwise regression techniques combining penalized least squares estimation with model selection. BayesX not only covers models for responses from univariate exponential families, but also models from less-standard regression situations such as models for multi-categorical responses with either ordered or unordered categories, continuous time survival data, or continuous time multi-state models. This paper presents a new fully interactive R interface to BayesX: the R package R2BayesX. With the new package, STAR models can be conveniently specified using Rs formula language (with some extended terms, fitted using the BayesX binary, represented in R with objects of suitable classes, and finally printed/summarized/plotted. This makes BayesX much more accessible to users familiar with R and adds extensive graphics capabilities for visualizing fitted STAR models. Furthermore, R2BayesX complements the already impressive capabilities for semiparametric regression in R by a comprehensive toolbox comprising in particular more complex response types and alternative inferential procedures such as simulation-based Bayesian inference.
Developing and testing a global-scale regression model to quantify mean annual streamflow
Barbarossa, Valerio; Huijbregts, Mark A. J.; Hendriks, A. Jan; Beusen, Arthur H. W.; Clavreul, Julie; King, Henry; Schipper, Aafke M.
2017-01-01
Quantifying mean annual flow of rivers (MAF) at ungauged sites is essential for assessments of global water supply, ecosystem integrity and water footprints. MAF can be quantified with spatially explicit process-based models, which might be overly time-consuming and data-intensive for this purpose, or with empirical regression models that predict MAF based on climate and catchment characteristics. Yet, regression models have mostly been developed at a regional scale and the extent to which they can be extrapolated to other regions is not known. In this study, we developed a global-scale regression model for MAF based on a dataset unprecedented in size, using observations of discharge and catchment characteristics from 1885 catchments worldwide, measuring between 2 and 106 km2. In addition, we compared the performance of the regression model with the predictive ability of the spatially explicit global hydrological model PCR-GLOBWB by comparing results from both models to independent measurements. We obtained a regression model explaining 89% of the variance in MAF based on catchment area and catchment averaged mean annual precipitation and air temperature, slope and elevation. The regression model performed better than PCR-GLOBWB for the prediction of MAF, as root-mean-square error (RMSE) values were lower (0.29-0.38 compared to 0.49-0.57) and the modified index of agreement (d) was higher (0.80-0.83 compared to 0.72-0.75). Our regression model can be applied globally to estimate MAF at any point of the river network, thus providing a feasible alternative to spatially explicit process-based global hydrological models.
Global Land Use Regression Model for Nitrogen Dioxide Air Pollution.
Larkin, Andrew; Geddes, Jeffrey A; Martin, Randall V; Xiao, Qingyang; Liu, Yang; Marshall, Julian D; Brauer, Michael; Hystad, Perry
2017-06-20
Nitrogen dioxide is a common air pollutant with growing evidence of health impacts independent of other common pollutants such as ozone and particulate matter. However, the worldwide distribution of NO 2 exposure and associated impacts on health is still largely uncertain. To advance global exposure estimates we created a global nitrogen dioxide (NO 2 ) land use regression model for 2011 using annual measurements from 5,220 air monitors in 58 countries. The model captured 54% of global NO 2 variation, with a mean absolute error of 3.7 ppb. Regional performance varied from R 2 = 0.42 (Africa) to 0.67 (South America). Repeated 10% cross-validation using bootstrap sampling (n = 10,000) demonstrated a robust performance with respect to air monitor sampling in North America, Europe, and Asia (adjusted R 2 within 2%) but not for Africa and Oceania (adjusted R 2 within 11%) where NO 2 monitoring data are sparse. The final model included 10 variables that captured both between and within-city spatial gradients in NO 2 concentrations. Variable contributions differed between continental regions, but major roads within 100 m and satellite-derived NO 2 were consistently the strongest predictors. The resulting model can be used for global risk assessments and health studies, particularly in countries without existing NO 2 monitoring data or models.
Generalized constraint neural network regression model subject to linear priors.
Qu, Ya-Jun; Hu, Bao-Gang
2011-12-01
This paper is reports an extension of our previous investigations on adding transparency to neural networks. We focus on a class of linear priors (LPs), such as symmetry, ranking list, boundary, monotonicity, etc., which represent either linear-equality or linear-inequality priors. A generalized constraint neural network-LPs (GCNN-LPs) model is studied. Unlike other existing modeling approaches, the GCNN-LP model exhibits its advantages. First, any LP is embedded by an explicitly structural mode, which may add a higher degree of transparency than using a pure algorithm mode. Second, a direct elimination and least squares approach is adopted to study the model, which produces better performances in both accuracy and computational cost over the Lagrange multiplier techniques in experiments. Specific attention is paid to both "hard (strictly satisfied)" and "soft (weakly satisfied)" constraints for regression problems. Numerical investigations are made on synthetic examples as well as on the real-world datasets. Simulation results demonstrate the effectiveness of the proposed modeling approach in comparison with other existing approaches.
Conditional Monte Carlo randomization tests for regression models.
Parhat, Parwen; Rosenberger, William F; Diao, Guoqing
2014-08-15
We discuss the computation of randomization tests for clinical trials of two treatments when the primary outcome is based on a regression model. We begin by revisiting the seminal paper of Gail, Tan, and Piantadosi (1988), and then describe a method based on Monte Carlo generation of randomization sequences. The tests based on this Monte Carlo procedure are design based, in that they incorporate the particular randomization procedure used. We discuss permuted block designs, complete randomization, and biased coin designs. We also use a new technique by Plamadeala and Rosenberger (2012) for simple computation of conditional randomization tests. Like Gail, Tan, and Piantadosi, we focus on residuals from generalized linear models and martingale residuals from survival models. Such techniques do not apply to longitudinal data analysis, and we introduce a method for computation of randomization tests based on the predicted rate of change from a generalized linear mixed model when outcomes are longitudinal. We show, by simulation, that these randomization tests preserve the size and power well under model misspecification. Copyright © 2014 John Wiley & Sons, Ltd.
Collision prediction models using multivariate Poisson-lognormal regression.
El-Basyouny, Karim; Sayed, Tarek
2009-07-01
This paper advocates the use of multivariate Poisson-lognormal (MVPLN) regression to develop models for collision count data. The MVPLN approach presents an opportunity to incorporate the correlations across collision severity levels and their influence on safety analyses. The paper introduces a new multivariate hazardous location identification technique, which generalizes the univariate posterior probability of excess that has been commonly proposed and applied in the literature. In addition, the paper presents an alternative approach for quantifying the effect of the multivariate structure on the precision of expected collision frequency. The MVPLN approach is compared with the independent (separate) univariate Poisson-lognormal (PLN) models with respect to model inference, goodness-of-fit, identification of hot spots and precision of expected collision frequency. The MVPLN is modeled using the WinBUGS platform which facilitates computation of posterior distributions as well as providing a goodness-of-fit measure for model comparisons. The results indicate that the estimates of the extra Poisson variation parameters were considerably smaller under MVPLN leading to higher precision. The improvement in precision is due mainly to the fact that MVPLN accounts for the correlation between the latent variables representing property damage only (PDO) and injuries plus fatalities (I+F). This correlation was estimated at 0.758, which is highly significant, suggesting that higher PDO rates are associated with higher I+F rates, as the collision likelihood for both types is likely to rise due to similar deficiencies in roadway design and/or other unobserved factors. In terms of goodness-of-fit, the MVPLN model provided a superior fit than the independent univariate models. The multivariate hazardous location identification results demonstrated that some hazardous locations could be overlooked if the analysis was restricted to the univariate models.
A Gompertz regression model for fern spores germination
Directory of Open Access Journals (Sweden)
Gabriel y Galán, Jose María
2015-06-01
Full Text Available Germination is one of the most important biological processes for both seed and spore plants, also for fungi. At present, mathematical models of germination have been developed in fungi, bryophytes and several plant species. However, ferns are the only group whose germination has never been modelled. In this work we develop a regression model of the germination of fern spores. We have found that for Blechnum serrulatum, Blechnum yungense, Cheilanthes pilosa, Niphidium macbridei and Polypodium feuillei species the Gompertz growth model describe satisfactorily cumulative germination. An important result is that regression parameters are independent of fern species and the model is not affected by intraspecific variation. Our results show that the Gompertz curve represents a general germination model for all the non-green spore leptosporangiate ferns, including in the paper a discussion about the physiological and ecological meaning of the model.La germinación es uno de los procesos biológicos más relevantes tanto para las plantas con esporas, como para las plantas con semillas y los hongos. Hasta el momento, se han desarrollado modelos de germinación para hongos, briofitos y diversas especies de espermatófitos. Los helechos son el único grupo de plantas cuya germinación nunca ha sido modelizada. En este trabajo se desarrolla un modelo de regresión para explicar la germinación de las esporas de helechos. Observamos que para las especies Blechnum serrulatum, Blechnum yungense, Cheilanthes pilosa, Niphidium macbridei y Polypodium feuillei el modelo de crecimiento de Gompertz describe satisfactoriamente la germinación acumulativa. Un importante resultado es que los parámetros de la regresión son independientes de la especie y que el modelo no está afectado por variación intraespecífica. Por lo tanto, los resultados del trabajo muestran que la curva de Gompertz puede representar un modelo general para todos los helechos leptosporangiados
THE REGRESSION MODEL OF IRAN LIBRARIES ORGANIZATIONAL CLIMATE.
Jahani, Mohammad Ali; Yaminfirooz, Mousa; Siamian, Hasan
2015-10-01
The purpose of this study was to drawing a regression model of organizational climate of central libraries of Iran's universities. This study is an applied research. The statistical population of this study consisted of 96 employees of the central libraries of Iran's public universities selected among the 117 universities affiliated to the Ministry of Health by Stratified Sampling method (510 people). Climate Qual localized questionnaire was used as research tools. For predicting the organizational climate pattern of the libraries is used from the multivariate linear regression and track diagram. of the 9 variables affecting organizational climate, 5 variables of innovation, teamwork, customer service, psychological safety and deep diversity play a major role in prediction of the organizational climate of Iran's libraries. The results also indicate that each of these variables with different coefficient have the power to predict organizational climate but the climate score of psychological safety (0.94) plays a very crucial role in predicting the organizational climate. Track diagram showed that five variables of teamwork, customer service, psychological safety, deep diversity and innovation directly effects on the organizational climate variable that contribution of the team work from this influence is more than any other variables. Of the indicator of the organizational climate of climateQual, the contribution of the team work from this influence is more than any other variables that reinforcement of teamwork in academic libraries can be more effective in improving the organizational climate of this type libraries.
Bruno, Delia Evelina; Barca, Emanuele; Goncalves, Rodrigo Mikosz; de Araujo Queiroz, Heithor Alexandre; Berardi, Luigi; Passarella, Giuseppe
2018-01-01
In this paper, the Evolutionary Polynomial Regression data modelling strategy has been applied to study small scale, short-term coastal morphodynamics, given its capability for treating a wide database of known information, non-linearly. Simple linear and multilinear regression models were also applied to achieve a balance between the computational load and reliability of estimations of the three models. In fact, even though it is easy to imagine that the more complex the model, the more the prediction improves, sometimes a "slight" worsening of estimations can be accepted in exchange for the time saved in data organization and computational load. The models' outcomes were validated through a detailed statistical, error analysis, which revealed a slightly better estimation of the polynomial model with respect to the multilinear model, as expected. On the other hand, even though the data organization was identical for the two models, the multilinear one required a simpler simulation setting and a faster run time. Finally, the most reliable evolutionary polynomial regression model was used in order to make some conjecture about the uncertainty increase with the extension of extrapolation time of the estimation. The overlapping rate between the confidence band of the mean of the known coast position and the prediction band of the estimated position can be a good index of the weakness in producing reliable estimations when the extrapolation time increases too much. The proposed models and tests have been applied to a coastal sector located nearby Torre Colimena in the Apulia region, south Italy.
Variable selection in Logistic regression model with genetic algorithm.
Zhang, Zhongheng; Trevino, Victor; Hoseini, Sayed Shahabuddin; Belciug, Smaranda; Boopathi, Arumugam Manivanna; Zhang, Ping; Gorunescu, Florin; Subha, Velappan; Dai, Songshi
2018-02-01
Variable or feature selection is one of the most important steps in model specification. Especially in the case of medical-decision making, the direct use of a medical database, without a previous analysis and preprocessing step, is often counterproductive. In this way, the variable selection represents the method of choosing the most relevant attributes from the database in order to build a robust learning models and, thus, to improve the performance of the models used in the decision process. In biomedical research, the purpose of variable selection is to select clinically important and statistically significant variables, while excluding unrelated or noise variables. A variety of methods exist for variable selection, but none of them is without limitations. For example, the stepwise approach, which is highly used, adds the best variable in each cycle generally producing an acceptable set of variables. Nevertheless, it is limited by the fact that it commonly trapped in local optima. The best subset approach can systematically search the entire covariate pattern space, but the solution pool can be extremely large with tens to hundreds of variables, which is the case in nowadays clinical data. Genetic algorithms (GA) are heuristic optimization approaches and can be used for variable selection in multivariable regression models. This tutorial paper aims to provide a step-by-step approach to the use of GA in variable selection. The R code provided in the text can be extended and adapted to other data analysis needs.
Bayesian Regression of Thermodynamic Models of Redox Active Materials
Energy Technology Data Exchange (ETDEWEB)
Johnston, Katherine [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
2017-09-01
Finding a suitable functional redox material is a critical challenge to achieving scalable, economically viable technologies for storing concentrated solar energy in the form of a defected oxide. Demonstrating e ectiveness for thermal storage or solar fuel is largely accomplished by using a thermodynamic model derived from experimental data. The purpose of this project is to test the accuracy of our regression model on representative data sets. Determining the accuracy of the model includes parameter tting the model to the data, comparing the model using di erent numbers of param- eters, and analyzing the entropy and enthalpy calculated from the model. Three data sets were considered in this project: two demonstrating materials for solar fuels by wa- ter splitting and the other of a material for thermal storage. Using Bayesian Inference and Markov Chain Monte Carlo (MCMC), parameter estimation was preformed on the three data sets. Good results were achieved, except some there was some deviations on the edges of the data input ranges. The evidence values were then calculated in a variety of ways and used to compare models with di erent number of parameters. It was believed that at least one of the parameters was unnecessary and comparing evidence values demonstrated that the parameter was need on one data set and not signi cantly helpful on another. The entropy was calculated by taking the derivative in one variable and integrating over another. and its uncertainty was also calculated by evaluating the entropy over multiple MCMC samples. Afterwards, all the parts were written up as a tutorial for the Uncertainty Quanti cation Toolkit (UQTk).
Meta-Modeling by Symbolic Regression and Pareto Simulated Annealing
Stinstra, E.; Rennen, G.; Teeuwen, G.J.A.
2006-01-01
The subject of this paper is a new approach to Symbolic Regression.Other publications on Symbolic Regression use Genetic Programming.This paper describes an alternative method based on Pareto Simulated Annealing.Our method is based on linear regression for the estimation of constants.Interval
Tightness of M-estimators for multiple linear regression in time series
DEFF Research Database (Denmark)
Johansen, Søren; Nielsen, Bent
We show tightness of a general M-estimator for multiple linear regression in time series. The positive criterion function for the M-estimator is assumed lower semi-continuous and sufficiently large for large argument: Particular cases are the Huber-skip and quantile regression. Tightness requires...
de Schryver, Tom; Eisinga, R.N.
2011-01-01
The key question in research on dismissals of head coaches in sports clubs is not whether they should happen but when they will happen. This paper applies piecewise linear regression to advance our understanding of the timing of head coach dismissals. Essentially, the regression sacrifices degrees
Convergence diagnostics for Eigenvalue problems with linear regression model
International Nuclear Information System (INIS)
Shi, Bo; Petrovic, Bojan
2011-01-01
Although the Monte Carlo method has been extensively used for criticality/Eigenvalue problems, a reliable, robust, and efficient convergence diagnostics method is still desired. Most methods are based on integral parameters (multiplication factor, entropy) and either condense the local distribution information into a single value (e.g., entropy) or even disregard it. We propose to employ the detailed cycle-by-cycle local flux evolution obtained by using mesh tally mechanism to assess the source and flux convergence. By applying a linear regression model to each individual mesh in a mesh tally for convergence diagnostics, a global convergence criterion can be obtained. We exemplify this method on two problems and obtain promising diagnostics results. (author)
Ultracentrifuge separative power modeling with multivariate regression using covariance matrix
International Nuclear Information System (INIS)
Migliavacca, Elder
2004-01-01
In this work, the least-squares methodology with covariance matrix is applied to determine a data curve fitting to obtain a performance function for the separative power δU of a ultracentrifuge as a function of variables that are experimentally controlled. The experimental data refer to 460 experiments on the ultracentrifugation process for uranium isotope separation. The experimental uncertainties related with these independent variables are considered in the calculation of the experimental separative power values, determining an experimental data input covariance matrix. The process variables, which significantly influence the δU values are chosen in order to give information on the ultracentrifuge behaviour when submitted to several levels of feed flow rate F, cut θ and product line pressure P p . After the model goodness-of-fit validation, a residual analysis is carried out to verify the assumed basis concerning its randomness and independence and mainly the existence of residual heteroscedasticity with any explained regression model variable. The surface curves are made relating the separative power with the control variables F, θ and P p to compare the fitted model with the experimental data and finally to calculate their optimized values. (author)
Modeling Pan Evaporation for Kuwait by Multiple Linear Regression
Almedeij, Jaber
2012-01-01
Evaporation is an important parameter for many projects related to hydrology and water resources systems. This paper constitutes the first study conducted in Kuwait to obtain empirical relations for the estimation of daily and monthly pan evaporation as functions of available meteorological data of temperature, relative humidity, and wind speed. The data used here for the modeling are daily measurements of substantial continuity coverage, within a period of 17 years between January 1993 and December 2009, which can be considered representative of the desert climate of the urban zone of the country. Multiple linear regression technique is used with a procedure of variable selection for fitting the best model forms. The correlations of evaporation with temperature and relative humidity are also transformed in order to linearize the existing curvilinear patterns of the data by using power and exponential functions, respectively. The evaporation models suggested with the best variable combinations were shown to produce results that are in a reasonable agreement with observation values. PMID:23226984
Schmidtmann, I; Elsäßer, A; Weinmann, A; Binder, H
2014-12-30
For determining a manageable set of covariates potentially influential with respect to a time-to-event endpoint, Cox proportional hazards models can be combined with variable selection techniques, such as stepwise forward selection or backward elimination based on p-values, or regularized regression techniques such as component-wise boosting. Cox regression models have also been adapted for dealing with more complex event patterns, for example, for competing risks settings with separate, cause-specific hazard models for each event type, or for determining the prognostic effect pattern of a variable over different landmark times, with one conditional survival model for each landmark. Motivated by a clinical cancer registry application, where complex event patterns have to be dealt with and variable selection is needed at the same time, we propose a general approach for linking variable selection between several Cox models. Specifically, we combine score statistics for each covariate across models by Fisher's method as a basis for variable selection. This principle is implemented for a stepwise forward selection approach as well as for a regularized regression technique. In an application to data from hepatocellular carcinoma patients, the coupled stepwise approach is seen to facilitate joint interpretation of the different cause-specific Cox models. In conditional survival models at landmark times, which address updates of prediction as time progresses and both treatment and other potential explanatory variables may change, the coupled regularized regression approach identifies potentially important, stably selected covariates together with their effect time pattern, despite having only a small number of events. These results highlight the promise of the proposed approach for coupling variable selection between Cox models, which is particularly relevant for modeling for clinical cancer registries with their complex event patterns. Copyright © 2014 John Wiley & Sons
Observer-Based and Regression Model-Based Detection of Emerging Faults in Coal Mills
DEFF Research Database (Denmark)
Odgaard, Peter Fogh; Lin, Bao; Jørgensen, Sten Bay
2006-01-01
In order to improve the reliability of power plants it is important to detect fault as fast as possible. Doing this it is interesting to find the most efficient method. Since modeling of large scale systems is time consuming it is interesting to compare a model-based method with data driven ones....... In this paper three different fault detection approaches are compared using a example of a coal mill, where a fault emerges. The compared methods are based on: an optimal unknown input observer, static and dynamic regression model-based detections. The conclusion on the comparison is that observer-based scheme...... detects the fault 13 samples earlier than the dynamic regression model-based method, and that the static regression based method is not usable due to generation of far too many false detections....
Shih, Ya-Chen Tina; Konrad, Thomas R
2007-10-01
Physician income is generally high, but quite variable; hence, physicians have divergent perspectives regarding health policy initiatives and market reforms that could affect their incomes. We investigated factors underlying the distribution of income within the physician population. Full-time physicians (N=10,777) from the restricted version of the 1996-1997 Community Tracking Study Physician Survey (CTS-PS), 1996 Area Resource File, and 1996 health maintenance organization penetration data. We conducted separate analyses for primary care physicians (PCPs) and specialists. We employed least square and quantile regression models to examine factors associated with physician incomes at the mean and at various points of the income distribution, respectively. We accounted for the complex survey design for the CTS-PS data using appropriate weighted procedures and explored endogeneity using an instrumental variables method. We detected widespread and subtle effects of many variables on physician incomes at different points (10th, 25th, 75th, and 90th percentiles) in the distribution that were undetected when employing regression estimations focusing on only the means or medians. Our findings show that the effects of managed care penetration are demonstrable at the mean of specialist incomes, but are more pronounced at higher levels. Conversely, a gender gap in earnings occurs at all levels of income of both PCPs and specialists, but is more pronounced at lower income levels. The quantile regression technique offers an analytical tool to evaluate policy effects beyond the means. A longitudinal application of this approach may enable health policy makers to identify winners and losers among segments of the physician workforce and assess how market dynamics and health policy initiatives affect the overall physician income distribution over various time intervals.
An Ordered Regression Model to Predict Transit Passengers’ Behavioural Intentions
Energy Technology Data Exchange (ETDEWEB)
Oña, J. de; Oña, R. de; Eboli, L.; Forciniti, C.; Mazzulla, G.
2016-07-01
Passengers’ behavioural intentions after experiencing transit services can be viewed as signals that show if a customer continues to utilise a company’s service. Users’ behavioural intentions can depend on a series of aspects that are difficult to measure directly. More recently, transit passengers’ behavioural intentions have been just considered together with the concepts of service quality and customer satisfaction. Due to the characteristics of the ways for evaluating passengers’ behavioural intentions, service quality and customer satisfaction, we retain that this kind of issue could be analysed also by applying ordered regression models. This work aims to propose just an ordered probit model for analysing service quality factors that can influence passengers’ behavioural intentions towards the use of transit services. The case study is the LRT of Seville (Spain), where a survey was conducted in order to collect the opinions of the passengers about the existing transit service, and to have a measure of the aspects that can influence the intentions of the users to continue using the transit service in the future. (Author)
Cason, Gerald J.; Cason, Carolyn L.
A more familiar and efficient method for estimating the parameters of Cason and Cason's model was examined. Using a two-step analysis based on linear regression, rather than the direct search interative procedure, gave about equally good results while providing a 33 to 1 computer processing time advantage, across 14 cohorts of junior medical…
Shi, Jinfei; Zhu, Songqing; Chen, Ruwen
2017-12-01
An order selection method based on multiple stepwise regressions is proposed for General Expression of Nonlinear Autoregressive model which converts the model order problem into the variable selection of multiple linear regression equation. The partial autocorrelation function is adopted to define the linear term in GNAR model. The result is set as the initial model, and then the nonlinear terms are introduced gradually. Statistics are chosen to study the improvements of both the new introduced and originally existed variables for the model characteristics, which are adopted to determine the model variables to retain or eliminate. So the optimal model is obtained through data fitting effect measurement or significance test. The simulation and classic time-series data experiment results show that the method proposed is simple, reliable and can be applied to practical engineering.
Directory of Open Access Journals (Sweden)
Menon Carlo
2011-09-01
Full Text Available Abstract Background Several regression models have been proposed for estimation of isometric joint torque using surface electromyography (SEMG signals. Common issues related to torque estimation models are degradation of model accuracy with passage of time, electrode displacement, and alteration of limb posture. This work compares the performance of the most commonly used regression models under these circumstances, in order to assist researchers with identifying the most appropriate model for a specific biomedical application. Methods Eleven healthy volunteers participated in this study. A custom-built rig, equipped with a torque sensor, was used to measure isometric torque as each volunteer flexed and extended his wrist. SEMG signals from eight forearm muscles, in addition to wrist joint torque data were gathered during the experiment. Additional data were gathered one hour and twenty-four hours following the completion of the first data gathering session, for the purpose of evaluating the effects of passage of time and electrode displacement on accuracy of models. Acquired SEMG signals were filtered, rectified, normalized and then fed to models for training. Results It was shown that mean adjusted coefficient of determination (Ra2 values decrease between 20%-35% for different models after one hour while altering arm posture decreased mean Ra2 values between 64% to 74% for different models. Conclusions Model estimation accuracy drops significantly with passage of time, electrode displacement, and alteration of limb posture. Therefore model retraining is crucial for preserving estimation accuracy. Data resampling can significantly reduce model training time without losing estimation accuracy. Among the models compared, ordinary least squares linear regression model (OLS was shown to have high isometric torque estimation accuracy combined with very short training times.
A computational approach to compare regression modelling strategies in prediction research.
Pajouheshnia, Romin; Pestman, Wiebe R; Teerenstra, Steven; Groenwold, Rolf H H
2016-08-25
It is often unclear which approach to fit, assess and adjust a model will yield the most accurate prediction model. We present an extension of an approach for comparing modelling strategies in linear regression to the setting of logistic regression and demonstrate its application in clinical prediction research. A framework for comparing logistic regression modelling strategies by their likelihoods was formulated using a wrapper approach. Five different strategies for modelling, including simple shrinkage methods, were compared in four empirical data sets to illustrate the concept of a priori strategy comparison. Simulations were performed in both randomly generated data and empirical data to investigate the influence of data characteristics on strategy performance. We applied the comparison framework in a case study setting. Optimal strategies were selected based on the results of a priori comparisons in a clinical data set and the performance of models built according to each strategy was assessed using the Brier score and calibration plots. The performance of modelling strategies was highly dependent on the characteristics of the development data in both linear and logistic regression settings. A priori comparisons in four empirical data sets found that no strategy consistently outperformed the others. The percentage of times that a model adjustment strategy outperformed a logistic model ranged from 3.9 to 94.9 %, depending on the strategy and data set. However, in our case study setting the a priori selection of optimal methods did not result in detectable improvement in model performance when assessed in an external data set. The performance of prediction modelling strategies is a data-dependent process and can be highly variable between data sets within the same clinical domain. A priori strategy comparison can be used to determine an optimal logistic regression modelling strategy for a given data set before selecting a final modelling approach.
The analysis of nonstationary time series using regression, correlation and cointegration
DEFF Research Database (Denmark)
Johansen, Søren
2012-01-01
There are simple well-known conditions for the validity of regression and correlation as statistical tools. We analyse by examples the effect of nonstationarity on inference using these methods and compare them to model based inference using the cointegrated vector autoregressive model. Finally w...... analyse some monthly data from US on interest rates as an illustration of the methods......There are simple well-known conditions for the validity of regression and correlation as statistical tools. We analyse by examples the effect of nonstationarity on inference using these methods and compare them to model based inference using the cointegrated vector autoregressive model. Finally we...
Zihajehzadeh, Shaghayegh; Park, Edward J
2016-08-01
This study provides a concurrent comparison of regression model-based walking speed estimation accuracy using lower body mounted inertial sensors. The comparison is based on different sets of variables, features, mounting locations and regression methods. An experimental evaluation was performed on 15 healthy subjects during free walking trials. Our results show better accuracy of Gaussian process regression compared to least square regression using Lasso. Among the variables, external acceleration tends to provide improved accuracy. By using both time-domain and frequency-domain features, waist and ankle-mounted sensors result in similar accuracies: 4.5% for the waist and 4.9% for the ankle. When using only frequency-domain features, estimation accuracy based on a waist-mounted sensor suffers more compared to the one from ankle.
Color Image Segmentation Using Fuzzy C-Regression Model
Directory of Open Access Journals (Sweden)
Min Chen
2017-01-01
Full Text Available Image segmentation is one important process in image analysis and computer vision and is a valuable tool that can be applied in fields of image processing, health care, remote sensing, and traffic image detection. Given the lack of prior knowledge of the ground truth, unsupervised learning techniques like clustering have been largely adopted. Fuzzy clustering has been widely studied and successfully applied in image segmentation. In situations such as limited spatial resolution, poor contrast, overlapping intensities, and noise and intensity inhomogeneities, fuzzy clustering can retain much more information than the hard clustering technique. Most fuzzy clustering algorithms have originated from fuzzy c-means (FCM and have been successfully applied in image segmentation. However, the cluster prototype of the FCM method is hyperspherical or hyperellipsoidal. FCM may not provide the accurate partition in situations where data consists of arbitrary shapes. Therefore, a Fuzzy C-Regression Model (FCRM using spatial information has been proposed whose prototype is hyperplaned and can be either linear or nonlinear allowing for better cluster partitioning. Thus, this paper implements FCRM and applies the algorithm to color segmentation using Berkeley’s segmentation database. The results show that FCRM obtains more accurate results compared to other fuzzy clustering algorithms.
Application of regression model on stream water quality parameters
International Nuclear Information System (INIS)
Suleman, M.; Maqbool, F.; Malik, A.H.; Bhatti, Z.A.
2012-01-01
Statistical analysis was conducted to evaluate the effect of solid waste leachate from the open solid waste dumping site of Salhad on the stream water quality. Five sites were selected along the stream. Two sites were selected prior to mixing of leachate with the surface water. One was of leachate and other two sites were affected with leachate. Samples were analyzed for pH, water temperature, electrical conductivity (EC), total dissolved solids (TDS), Biological oxygen demand (BOD), chemical oxygen demand (COD), dissolved oxygen (DO) and total bacterial load (TBL). In this study correlation coefficient r among different water quality parameters of various sites were calculated by using Pearson model and then average of each correlation between two parameters were also calculated, which shows TDS and EC and pH and BOD have significantly increasing r value, while temperature and TDS, temp and EC, DO and BL, DO and COD have decreasing r value. Single factor ANOVA at 5% level of significance was used which shows EC, TDS, TCL and COD were significantly differ among various sites. By the application of these two statistical approaches TDS and EC shows strongly positive correlation because the ions from the dissolved solids in water influence the ability of that water to conduct an electrical current. These two parameters significantly vary among 5 sites which are further confirmed by using linear regression. (author)
CANFIS: A non-linear regression procedure to produce statistical air-quality forecast models
Energy Technology Data Exchange (ETDEWEB)
Burrows, W.R.; Montpetit, J. [Environment Canada, Downsview, Ontario (Canada). Meteorological Research Branch; Pudykiewicz, J. [Environment Canada, Dorval, Quebec (Canada)
1997-12-31
Statistical models for forecasts of environmental variables can provide a good trade-off between significance and precision in return for substantial saving of computer execution time. Recent non-linear regression techniques give significantly increased accuracy compared to traditional linear regression methods. Two are Classification and Regression Trees (CART) and the Neuro-Fuzzy Inference System (NFIS). Both can model predict and distributions, including the tails, with much better accuracy than linear regression. Given a learning data set of matched predict and predictors, CART regression produces a non-linear, tree-based, piecewise-continuous model of the predict and data. Its variance-minimizing procedure optimizes the task of predictor selection, often greatly reducing initial data dimensionality. NFIS reduces dimensionality by a procedure known as subtractive clustering but it does not of itself eliminate predictors. Over-lapping coverage in predictor space is enhanced by NFIS with a Gaussian membership function for each cluster component. Coefficients for a continuous response model based on the fuzzified cluster centers are obtained by a least-squares estimation procedure. CANFIS is a two-stage data-modeling technique that combines the strength of CART to optimize the process of selecting predictors from a large pool of potential predictors with the modeling strength of NFIS. A CANFIS model requires negligible computer time to run. CANFIS models for ground-level O{sub 3}, particulates, and other pollutants will be produced for each of about 100 Canadian sites. The air-quality models will run twice daily using a small number of predictors isolated from a large pool of upstream and local Lagrangian potential predictors.
The microcomputer scientific software series 2: general linear model--regression.
Harold M. Rauscher
1983-01-01
The general linear model regression (GLMR) program provides the microcomputer user with a sophisticated regression analysis capability. The output provides a regression ANOVA table, estimators of the regression model coefficients, their confidence intervals, confidence intervals around the predicted Y-values, residuals for plotting, a check for multicollinearity, a...
Bertrand, Julie; Balding, David J
2013-03-01
Studies on the influence of single nucleotide polymorphisms (SNPs) on drug pharmacokinetics (PK) have usually been limited to the analysis of observed drug concentration or area under the concentration versus time curve. Nonlinear mixed effects models enable analysis of the entire curve, even for sparse data, but until recently, there has been no systematic method to examine the effects of multiple SNPs on the model parameters. The aim of this study was to assess different penalized regression methods for including SNPs in PK analyses. A total of 200 data sets were simulated under both the null and an alternative hypothesis. In each data set for each of the 300 participants, a PK profile at six sampling times was simulated and 1227 genotypes were generated through haplotypes. After modelling the PK profiles using an expectation maximization algorithm, genetic association with individual parameters was investigated using the following approaches: (i) a classical stepwise approach, (ii) ridge regression modified to include a test, (iii) Lasso and (iv) a generalization of Lasso, the HyperLasso. Penalized regression approaches are often much faster than the stepwise approach. There are significantly fewer true positives for ridge regression than for the stepwise procedure and HyperLasso. The higher number of true positives in the stepwise procedure was accompanied by a higher count of false positives (not significant). We find that all approaches except ridge regression show similar power, but penalized regression can be much less computationally demanding. We conclude that penalized regression should be preferred over stepwise procedures for PK analyses with a large panel of genetic covariates.
Ban, Masashi
2017-08-01
Violation of the quantum regression theorem and the Leggett-Garg inequality is studied by means of the exactly solvable multi-mode Jaynes-Cummings model. An exact expression of a two-time correlation function is compared with that derived by the quantum regression theorem. It is found that the quantum regression theorem is not valid even if the reduced time evolution of the qubit is Markovian. Furthermore, it is shown that if the quantum regression theorem is applied in this model, the Leggett-Garg inequality is satisfied while it is violated by the exact correlation function.
MODELING SNAKE MICROHABITAT FROM RADIOTELEMETRY STUDIES USING POLYTOMOUS LOGISTIC REGRESSION
Multivariate analysis of snake microhabitat has historically used techniques that were derived under assumptions of normality and common covariance structure (e.g., discriminant function analysis, MANOVA). In this study, polytomous logistic regression (PLR which does not require ...
Kala, Abhishek K; Tiwari, Chetan; Mikler, Armin R; Atkinson, Samuel F
2017-01-01
The primary aim of the study reported here was to determine the effectiveness of utilizing local spatial variations in environmental data to uncover the statistical relationships between West Nile Virus (WNV) risk and environmental factors. Because least squares regression methods do not account for spatial autocorrelation and non-stationarity of the type of spatial data analyzed for studies that explore the relationship between WNV and environmental determinants, we hypothesized that a geographically weighted regression model would help us better understand how environmental factors are related to WNV risk patterns without the confounding effects of spatial non-stationarity. We examined commonly mapped environmental factors using both ordinary least squares regression (LSR) and geographically weighted regression (GWR). Both types of models were applied to examine the relationship between WNV-infected dead bird counts and various environmental factors for those locations. The goal was to determine which approach yielded a better predictive model. LSR efforts lead to identifying three environmental variables that were statistically significantly related to WNV infected dead birds (adjusted R 2 = 0.61): stream density, road density, and land surface temperature. GWR efforts increased the explanatory value of these three environmental variables with better spatial precision (adjusted R 2 = 0.71). The spatial granularity resulting from the geographically weighted approach provides a better understanding of how environmental spatial heterogeneity is related to WNV risk as implied by WNV infected dead birds, which should allow improved planning of public health management strategies.
A logistic regression model for Ghana National Health Insurance claims
Directory of Open Access Journals (Sweden)
Samuel Antwi
2013-07-01
Full Text Available In August 2003, the Ghanaian Government made history by implementing the first National Health Insurance System (NHIS in Sub-Saharan Africa. Within three years, over half of the country’s population had voluntarily enrolled into the National Health Insurance Scheme. This study had three objectives: 1 To estimate the risk factors that influences the Ghana national health insurance claims. 2 To estimate the magnitude of each of the risk factors in relation to the Ghana national health insurance claims. In this work, data was collected from the policyholders of the Ghana National Health Insurance Scheme with the help of the National Health Insurance database and the patients’ attendance register of the Koforidua Regional Hospital, from 1st January to 31st December 2011. Quantitative analysis was done using the generalized linear regression (GLR models. The results indicate that risk factors such as sex, age, marital status, distance and length of stay at the hospital were important predictors of health insurance claims. However, it was found that the risk factors; health status, billed charges and income level are not good predictors of national health insurance claim. The outcome of the study shows that sex, age, marital status, distance and length of stay at the hospital are statistically significant in the determination of the Ghana National health insurance premiums since they considerably influence claims. We recommended, among other things that, the National Health Insurance Authority should facilitate the institutionalization of the collection of appropriate data on a continuous basis to help in the determination of future premiums.
Keat, Sim Chong; Chun, Beh Boon; San, Lim Hwee; Jafri, Mohd Zubir Mat
2015-04-01
Climate change due to carbon dioxide (CO2) emissions is one of the most complex challenges threatening our planet. This issue considered as a great and international concern that primary attributed from different fossil fuels. In this paper, regression model is used for analyzing the causal relationship among CO2 emissions based on the energy consumption in Malaysia using time series data for the period of 1980-2010. The equations were developed using regression model based on the eight major sources that contribute to the CO2 emissions such as non energy, Liquefied Petroleum Gas (LPG), diesel, kerosene, refinery gas, Aviation Turbine Fuel (ATF) and Aviation Gasoline (AV Gas), fuel oil and motor petrol. The related data partly used for predict the regression model (1980-2000) and partly used for validate the regression model (2001-2010). The results of the prediction model with the measured data showed a high correlation coefficient (R2=0.9544), indicating the model's accuracy and efficiency. These results are accurate and can be used in early warning of the population to comply with air quality standards.
Directory of Open Access Journals (Sweden)
Irene Fernández Monsalve
2014-08-01
Full Text Available During language comprehension, semantic contextual information is used to generate expectations about upcoming items. This has been commonly studied through the N400 event-related potential (ERP, as a measure of facilitated lexical retrieval. However, the associative relationships in multi-word expressions (MWE may enable the generation of a categorical expectation, leading to lexical retrieval before target word onset. Processing of the target word would thus reflect a target-identification mechanism, possibly indexed by a P3 ERP component. However, given their time overlap (200-500 ms post-stimulus onset, differentiating between N400/P3 ERP responses (averaged over multiple linguistically variable trials is problematic. In the present study, we analyzed EEG data from a previous experiment, which compared ERP responses to highly expected words that were placed either in a MWE or a regular non-fixed compositional context, and to low predictability controls. We focused on oscillatory dynamics and regression analyses, in order to dissociate between the two contexts by modeling the electrophysiological response as a function of item-level parameters. A significant interaction between word position and condition was found in the regression model for power in a theta range (~7-9 Hz, providing evidence for the presence of qualitative differences between conditions. Power levels within this band were lower for MWE than compositional contexts then the target word appeared later on in the sentence, confirming that in the former lexical retrieval would have taken place before word onset. On the other hand, gamma-power (~50-70 Hz was also modulated by predictability of the item in all conditions, which is interpreted as an index of a similar `matching' sub-step for both types of contexts, binding an expected representation and the external input.
Linking Simple Economic Theory Models and the Cointegrated Vector AutoRegressive Model
DEFF Research Database (Denmark)
Møller, Niels Framroze
This paper attempts to clarify the connection between simple economic theory models and the approach of the Cointegrated Vector-Auto-Regressive model (CVAR). By considering (stylized) examples of simple static equilibrium models, it is illustrated in detail, how the theoretical model and its...
Faraway, Julian J
2005-01-01
Linear models are central to the practice of statistics and form the foundation of a vast range of statistical methodologies. Julian J. Faraway''s critically acclaimed Linear Models with R examined regression and analysis of variance, demonstrated the different methods available, and showed in which situations each one applies. Following in those footsteps, Extending the Linear Model with R surveys the techniques that grow from the regression model, presenting three extensions to that framework: generalized linear models (GLMs), mixed effect models, and nonparametric regression models. The author''s treatment is thoroughly modern and covers topics that include GLM diagnostics, generalized linear mixed models, trees, and even the use of neural networks in statistics. To demonstrate the interplay of theory and practice, throughout the book the author weaves the use of the R software environment to analyze the data of real examples, providing all of the R commands necessary to reproduce the analyses. All of the ...
International Nuclear Information System (INIS)
Alvarez R, J.T.; Morales P, R.
1992-06-01
The absorbed dose for equivalent soft tissue is determined,it is imparted by ophthalmologic applicators, ( 90 Sr/ 90 Y, 1850 MBq) using an extrapolation chamber of variable electrodes; when estimating the slope of the extrapolation curve using a simple lineal regression model is observed that the dose values are underestimated from 17.7 percent up to a 20.4 percent in relation to the estimate of this dose by means of a regression model polynomial two grade, at the same time are observed an improvement in the standard error for the quadratic model until in 50%. Finally the global uncertainty of the dose is presented, taking into account the reproducibility of the experimental arrangement. As conclusion it can infers that in experimental arrangements where the source is to contact with the extrapolation chamber, it was recommended to substitute the lineal regression model by the quadratic regression model, in the determination of the slope of the extrapolation curve, for more exact and accurate measurements of the absorbed dose. (Author)
International Nuclear Information System (INIS)
Fang, Tingting; Lahdelma, Risto
2016-01-01
Highlights: • Social factor is considered for the linear regression models besides weather file. • Simultaneously optimize all the coefficients for linear regression models. • SARIMA combined with linear regression is used to forecast the heat demand. • The accuracy for both linear regression and time series models are evaluated. - Abstract: Forecasting heat demand is necessary for production and operation planning of district heating (DH) systems. In this study we first propose a simple regression model where the hourly outdoor temperature and wind speed forecast the heat demand. Weekly rhythm of heat consumption as a social component is added to the model to significantly improve the accuracy. The other type of model is the seasonal autoregressive integrated moving average (SARIMA) model with exogenous variables as a combination to take weather factors, and the historical heat consumption data as depending variables. One outstanding advantage of the model is that it peruses the high accuracy for both long-term and short-term forecast by considering both exogenous factors and time series. The forecasting performance of both linear regression models and time series model are evaluated based on real-life heat demand data for the city of Espoo in Finland by out-of-sample tests for the last 20 full weeks of the year. The results indicate that the proposed linear regression model (T168h) using 168-h demand pattern with midweek holidays classified as Saturdays or Sundays gives the highest accuracy and strong robustness among all the tested models based on the tested forecasting horizon and corresponding data. Considering the parsimony of the input, the ease of use and the high accuracy, the proposed T168h model is the best in practice. The heat demand forecasting model can also be developed for individual buildings if automated meter reading customer measurements are available. This would allow forecasting the heat demand based on more accurate heat consumption
Bonfatti, V; Tiezzi, F; Miglior, F; Carnier, P
2017-09-01
The objective of this study was to compare the prediction accuracy of 92 infrared prediction equations obtained by different statistical approaches. The predicted traits included fatty acid composition (n = 1,040); detailed protein composition (n = 1,137); lactoferrin (n = 558); pH and coagulation properties (n = 1,296); curd yield and composition obtained by a micro-cheese making procedure (n = 1,177); and Ca, P, Mg, and K contents (n = 689). The statistical methods used to develop the prediction equations were partial least squares regression (PLSR), Bayesian ridge regression, Bayes A, Bayes B, Bayes C, and Bayesian least absolute shrinkage and selection operator. Model performances were assessed, for each trait and model, in training and validation sets over 10 replicates. In validation sets, Bayesian regression models performed significantly better than PLSR for the prediction of 33 out of 92 traits, especially fatty acids, whereas they yielded a significantly lower prediction accuracy than PLSR in the prediction of 8 traits: the percentage of C18:1n-7 trans-9 in fat; the content of unglycosylated κ-casein and its percentage in protein; the content of α-lactalbumin; the percentage of α S2 -casein in protein; and the contents of Ca, P, and Mg. Even though Bayesian methods produced a significant enhancement of model accuracy in many traits compared with PLSR, most variations in the coefficient of determination in validation sets were smaller than 1 percentage point. Over traits, the highest predictive ability was obtained by Bayes C even though most of the significant differences in accuracy between Bayesian regression models were negligible. Copyright © 2017 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Directory of Open Access Journals (Sweden)
Oleg Svatos
2013-01-01
Full Text Available In this paper we analyze complexity of time limits we can find especially in regulated processes of public administration. First we review the most popular process modeling languages. There is defined an example scenario based on the current Czech legislature which is then captured in discussed process modeling languages. Analysis shows that the contemporary process modeling languages support capturing of the time limit only partially. This causes troubles to analysts and unnecessary complexity of the models. Upon unsatisfying results of the contemporary process modeling languages we analyze the complexity of the time limits in greater detail and outline lifecycles of a time limit using the multiple dynamic generalizations pattern. As an alternative to the popular process modeling languages there is presented PSD process modeling language, which supports the defined lifecycles of a time limit natively and therefore allows keeping the models simple and easy to understand.
Directory of Open Access Journals (Sweden)
Nataša Šarlija
2017-01-01
Full Text Available This study sheds light on the most common issues related to applying logistic regression in prediction models for company growth. The purpose of the paper is 1 to provide a detailed demonstration of the steps in developing a growth prediction model based on logistic regression analysis, 2 to discuss common pitfalls and methodological errors in developing a model, and 3 to provide solutions and possible ways of overcoming these issues. Special attention is devoted to the question of satisfying logistic regression assumptions, selecting and defining dependent and independent variables, using classification tables and ROC curves, for reporting model strength, interpreting odds ratios as effect measures and evaluating performance of the prediction model. Development of a logistic regression model in this paper focuses on a prediction model of company growth. The analysis is based on predominantly financial data from a sample of 1471 small and medium-sized Croatian companies active between 2009 and 2014. The financial data is presented in the form of financial ratios divided into nine main groups depicting following areas of business: liquidity, leverage, activity, profitability, research and development, investing and export. The growth prediction model indicates aspects of a business critical for achieving high growth. In that respect, the contribution of this paper is twofold. First, methodological, in terms of pointing out pitfalls and potential solutions in logistic regression modelling, and secondly, theoretical, in terms of identifying factors responsible for high growth of small and medium-sized companies.
Gaussian process regression bootstrapping: exploring the effects of uncertainty in time course data.
Kirk, Paul D W; Stumpf, Michael P H
2009-05-15
Although widely accepted that high-throughput biological data are typically highly noisy, the effects that this uncertainty has upon the conclusions we draw from these data are often overlooked. However, in order to assign any degree of confidence to our conclusions, we must quantify these effects. Bootstrap resampling is one method by which this may be achieved. Here, we present a parametric bootstrapping approach for time-course data, in which Gaussian process regression (GPR) is used to fit a probabilistic model from which replicates may then be drawn. This approach implicitly allows the time dependence of the data to be taken into account, and is applicable to a wide range of problems. We apply GPR bootstrapping to two datasets from the literature. In the first example, we show how the approach may be used to investigate the effects of data uncertainty upon the estimation of parameters in an ordinary differential equations (ODE) model of a cell signalling pathway. Although we find that the parameter estimates inferred from the original dataset are relatively robust to data uncertainty, we also identify a distinct second set of estimates. In the second example, we use our method to show that the topology of networks constructed from time-course gene expression data appears to be sensitive to data uncertainty, although there may be individual edges in the network that are robust in light of present data. Matlab code for performing GPR bootstrapping is available from our web site: http://www3.imperial.ac.uk/theoreticalsystemsbiology/data-software/.
Rosso, Luigi; Riatti, Riccardo
2016-01-01
The knowledge of the associations between the timing of skeletal maturation and craniofacial growth is of primary importance when planning a functional treatment for most of the skeletal malocclusions. This cross-sectional study was thus aimed at evaluating whether sagittal and vertical craniofacial growth has an association with the timing of circumpubertal skeletal maturation. A total of 320 subjects (160 females and 160 males) were included in the study (mean age, 12.3 ± 1.7 years; range, 7.6–16.7 years). These subjects were equally distributed in the circumpubertal cervical vertebral maturation (CVM) stages 2 to 5. Each CVM stage group also had equal number of females and males. Multiple regression models were run for each CVM stage group to assess the significance of the association of cephalometric parameters (ANB, SN/MP, and NSBa angles) with age of attainment of the corresponding CVM stage (in months). Significant associations were seen only for stage 3, where the SN/MP angle was negatively associated with age (β coefficient, −0.7). These results show that hyperdivergent and hypodivergent subjects may have an anticipated and delayed attainment of the pubertal CVM stage 3, respectively. However, such association remains of little entity and it would become clinically relevant only in extreme cases. PMID:27995136
Directory of Open Access Journals (Sweden)
Giuseppe Perinetti
2016-01-01
Full Text Available The knowledge of the associations between the timing of skeletal maturation and craniofacial growth is of primary importance when planning a functional treatment for most of the skeletal malocclusions. This cross-sectional study was thus aimed at evaluating whether sagittal and vertical craniofacial growth has an association with the timing of circumpubertal skeletal maturation. A total of 320 subjects (160 females and 160 males were included in the study (mean age, 12.3±1.7 years; range, 7.6–16.7 years. These subjects were equally distributed in the circumpubertal cervical vertebral maturation (CVM stages 2 to 5. Each CVM stage group also had equal number of females and males. Multiple regression models were run for each CVM stage group to assess the significance of the association of cephalometric parameters (ANB, SN/MP, and NSBa angles with age of attainment of the corresponding CVM stage (in months. Significant associations were seen only for stage 3, where the SN/MP angle was negatively associated with age (β coefficient, −0.7. These results show that hyperdivergent and hypodivergent subjects may have an anticipated and delayed attainment of the pubertal CVM stage 3, respectively. However, such association remains of little entity and it would become clinically relevant only in extreme cases.
Parametric vs. Nonparametric Regression Modelling within Clinical Decision Support
Czech Academy of Sciences Publication Activity Database
Kalina, Jan; Zvárová, Jana
2017-01-01
Roč. 5, č. 1 (2017), s. 21-27 ISSN 1805-8698 R&D Projects: GA ČR GA17-01251S Institutional support: RVO:67985807 Keywords : decision support systems * decision rules * statistical analysis * nonparametric regression Subject RIV: IN - Informatics, Computer Science OBOR OECD: Statistics and probability
Bilinear regression model with Kronecker and linear structures for ...
African Journals Online (AJOL)
On the basis of n independent observations from a matrix normal distribution, estimating equations in a flip-flop relation are established and the consistency of estimators is studied. Keywords: Bilinear regression; Estimating equations; Flip- flop algorithm; Kronecker product structure; Linear structured covariance matrix; ...
Covariance Functions and Random Regression Models in the ...
African Journals Online (AJOL)
ARC-IRENE
CFs were on age of the cow expressed in months (AM) using quadratic (order three) regressions based on orthogonal (Legendre) polynomials, initially proposed by Kirkpatrick & Heckman (1989). The matrices of coefficients KG and KC (corresponding to the additive genetic and permanent environmental functions, G.
A binary logistic regression model with complex sampling design of ...
African Journals Online (AJOL)
2017-09-03
Sep 3, 2017 ... SPSS-21. Binary logistic regression with complex sam- pling design was fitted for the unmet need outcomes. Married women are disaggregated by various background characteristics to have an insight of their characteristics. All background characteristics of women used in this study were categorical ...
Phase Space Prediction of Chaotic Time Series with Nu-Support Vector Machine Regression
International Nuclear Information System (INIS)
Ye Meiying; Wang Xiaodong
2005-01-01
A new class of support vector machine, nu-support vector machine, is discussed which can handle both classification and regression. We focus on nu-support vector machine regression and use it for phase space prediction of chaotic time series. The effectiveness of the method is demonstrated by applying it to the Henon map. This study also compares nu-support vector machine with back propagation (BP) networks in order to better evaluate the performance of the proposed methods. The experimental results show that the nu-support vector machine regression obtains lower root mean squared error than the BP networks and provides an accurate chaotic time series prediction. These results can be attributable to the fact that nu-support vector machine implements the structural risk minimization principle and this leads to better generalization than the BP networks.
FRICTION MODELING OF Al-Mg ALLOY SHEETS BASED ON MULTIPLE REGRESSION ANALYSIS AND NEURAL NETWORKS
Directory of Open Access Journals (Sweden)
Hirpa G. Lemu
2017-03-01
Full Text Available This article reports a proposed approach to a frictional resistance description in sheet metal forming processes that enables determination of the friction coefficient value under a wide range of friction conditions without performing time-consuming experiments. The motivation for this proposal is the fact that there exists a considerable amount of factors affect the friction coefficient value and as a result building analytical friction model for specified process conditions is practically impossible. In this proposed approach, a mathematical model of friction behaviour is created using multiple regression analysis and artificial neural networks. The regression analysis was performed using a subroutine in MATLAB programming code and STATISTICA Neural Networks was utilized to build an artificial neural networks model. The effect of different training strategies on the quality of neural networks was studied. As input variables for regression model and training of radial basis function networks, generalized regression neural networks and multilayer networks the results of strip drawing friction test were utilized. Four kinds of Al-Mg alloy sheets were used as a test material.
Near Real-Time Dust Aerosol Detection with Support Vector Machines for Regression
Rivas-Perea, P.; Rivas-Perea, P. E.; Cota-Ruiz, J.; Aragon Franco, R. A.
2015-12-01
Remote sensing instruments operating in the near-infrared spectrum usually provide the necessary information for further dust aerosol spectral analysis using statistical or machine learning algorithms. Such algorithms have proven to be effective in analyzing very specific case studies or dust events. However, very few make the analysis open to the public on a regular basis, fewer are designed specifically to operate in near real-time to higher resolutions, and almost none give a global daily coverage. In this research we investigated a large-scale approach to a machine learning algorithm called "support vector regression". The algorithm uses four near-infrared spectral bands from NASA MODIS instrument: B20 (3.66-3.84μm), B29 (8.40-8.70μm), B31 (10.78-11.28μm), and B32 (11.77-12.27μm). The algorithm is presented with ground truth from more than 30 distinct reported dust events, from different geographical regions, at different seasons, both over land and sea cover, in the presence of clouds and clear sky, and in the presence of fires. The purpose of our algorithm is to learn to distinguish the dust aerosols spectral signature from other spectral signatures, providing as output an estimate of the probability of a data point being consistent with dust aerosol signatures. During modeling with ground truth, our algorithm achieved more than 90% of accuracy, and the current live performance of the algorithm is remarkable. Moreover, our algorithm is currently operating in near real-time using NASA's Land, Atmosphere Near real-time Capability for EOS (LANCE) servers, providing a high resolution global overview including 64, 32, 16, 8, 4, 2, and 1km. The near real-time analysis of our algorithm is now available to the general public at http://dust.reev.us and archives of the results starting from 2012 are available upon request.
Directory of Open Access Journals (Sweden)
Soyoung Park
2017-07-01
Full Text Available This study mapped and analyzed groundwater potential using two different models, logistic regression (LR and multivariate adaptive regression splines (MARS, and compared the results. A spatial database was constructed for groundwater well data and groundwater influence factors. Groundwater well data with a high potential yield of ≥70 m3/d were extracted, and 859 locations (70% were used for model training, whereas the other 365 locations (30% were used for model validation. We analyzed 16 groundwater influence factors including altitude, slope degree, slope aspect, plan curvature, profile curvature, topographic wetness index, stream power index, sediment transport index, distance from drainage, drainage density, lithology, distance from fault, fault density, distance from lineament, lineament density, and land cover. Groundwater potential maps (GPMs were constructed using LR and MARS models and tested using a receiver operating characteristics curve. Based on this analysis, the area under the curve (AUC for the success rate curve of GPMs created using the MARS and LR models was 0.867 and 0.838, and the AUC for the prediction rate curve was 0.836 and 0.801, respectively. This implies that the MARS model is useful and effective for groundwater potential analysis in the study area.
Semiparametric Mixtures of Regressions with Single-index for Model Based Clustering
Xiang, Sijia; Yao, Weixin
2017-01-01
In this article, we propose two classes of semiparametric mixture regression models with single-index for model based clustering. Unlike many semiparametric/nonparametric mixture regression models that can only be applied to low dimensional predictors, the new semiparametric models can easily incorporate high dimensional predictors into the nonparametric components. The proposed models are very general, and many of the recently proposed semiparametric/nonparametric mixture regression models a...
Grajeda, Laura M; Ivanescu, Andrada; Saito, Mayuko; Crainiceanu, Ciprian; Jaganath, Devan; Gilman, Robert H; Crabtree, Jean E; Kelleher, Dermott; Cabrera, Lilia; Cama, Vitaliano; Checkley, William
2016-01-01
Childhood growth is a cornerstone of pediatric research. Statistical models need to consider individual trajectories to adequately describe growth outcomes. Specifically, well-defined longitudinal models are essential to characterize both population and subject-specific growth. Linear mixed-effect models with cubic regression splines can account for the nonlinearity of growth curves and provide reasonable estimators of population and subject-specific growth, velocity and acceleration. We provide a stepwise approach that builds from simple to complex models, and account for the intrinsic complexity of the data. We start with standard cubic splines regression models and build up to a model that includes subject-specific random intercepts and slopes and residual autocorrelation. We then compared cubic regression splines vis-à-vis linear piecewise splines, and with varying number of knots and positions. Statistical code is provided to ensure reproducibility and improve dissemination of methods. Models are applied to longitudinal height measurements in a cohort of 215 Peruvian children followed from birth until their fourth year of life. Unexplained variability, as measured by the variance of the regression model, was reduced from 7.34 when using ordinary least squares to 0.81 (p linear mixed-effect models with random slopes and a first order continuous autoregressive error term. There was substantial heterogeneity in both the intercept (p linear regression equation for both estimation and prediction of population- and individual-level growth in height. We show that cubic regression splines are superior to linear regression splines for the case of a small number of knots in both estimation and prediction with the full linear mixed effect model (AIC 19,352 vs. 19,598, respectively). While the regression parameters are more complex to interpret in the former, we argue that inference for any problem depends more on the estimated curve or differences in curves rather
Tan, C. H.; Matjafri, M. Z.; Lim, H. S.
2015-10-01
This paper presents the prediction models which analyze and compute the CO2 emission in Malaysia. Each prediction model for CO2 emission will be analyzed based on three main groups which is transportation, electricity and heat production as well as residential buildings and commercial and public services. The prediction models were generated using data obtained from World Bank Open Data. Best subset method will be used to remove irrelevant data and followed by multi linear regression to produce the prediction models. From the results, high R-square (prediction) value was obtained and this implies that the models are reliable to predict the CO2 emission by using specific data. In addition, the CO2 emissions from these three groups are forecasted using trend analysis plots for observation purpose.
Nagel-Alne, G E; Krontveit, R; Bohlin, J; Valle, P S; Skjerve, E; Sølverød, L S
2014-07-01
In 2001, the Norwegian Goat Health Service initiated the Healthier Goats program (HG), with the aim of eradicating caprine arthritis encephalitis, caseous lymphadenitis, and Johne's disease (caprine paratuberculosis) in Norwegian goat herds. The aim of the present study was to explore how control and eradication of the above-mentioned diseases by enrolling in HG affected milk yield by comparison with herds not enrolled in HG. Lactation curves were modeled using a multilevel cubic spline regression model where farm, goat, and lactation were included as random effect parameters. The data material contained 135,446 registrations of daily milk yield from 28,829 lactations in 43 herds. The multilevel cubic spline regression model was applied to 4 categories of data: enrolled early, control early, enrolled late, and control late. For enrolled herds, the early and late notations refer to the situation before and after enrolling in HG; for nonenrolled herds (controls), they refer to development over time, independent of HG. Total milk yield increased in the enrolled herds after eradication: the total milk yields in the fourth lactation were 634.2 and 873.3 kg in enrolled early and enrolled late herds, respectively, and 613.2 and 701.4 kg in the control early and control late herds, respectively. Day of peak yield differed between enrolled and control herds. The day of peak yield came on d 6 of lactation for the control early category for parities 2, 3, and 4, indicating an inability of the goats to further increase their milk yield from the initial level. For enrolled herds, on the other hand, peak yield came between d 49 and 56, indicating a gradual increase in milk yield after kidding. Our results indicate that enrollment in the HG disease eradication program improved the milk yield of dairy goats considerably, and that the multilevel cubic spline regression was a suitable model for exploring effects of disease control and eradication on milk yield. Copyright © 2014
Semiparametric nonlinear quantile regression model for financial returns
Czech Academy of Sciences Publication Activity Database
Avdulaj, Krenar; Baruník, Jozef
2017-01-01
Roč. 21, č. 1 (2017), s. 81-97 ISSN 1081-1826 R&D Projects: GA ČR(CZ) GBP402/12/G097 Institutional support: RVO:67985556 Keywords : copula quantile regression * realized volatility * value-at-risk Subject RIV: AH - Economics OBOR OECD: Applied Economics, Econometrics Impact factor: 0.649, year: 2016 http://library.utia.cas.cz/separaty/2017/E/avdulaj-0472346.pdf
International Nuclear Information System (INIS)
Urrutia, J D; Bautista, L A; Baccay, E B
2014-01-01
The aim of this study was to develop mathematical models for estimating earthquake casualties such as death, number of injured persons, affected families and total cost of damage. To quantify the direct damages from earthquakes to human beings and properties given the magnitude, intensity, depth of focus, location of epicentre and time duration, the regression models were made. The researchers formulated models through regression analysis using matrices and used α = 0.01. The study considered thirty destructive earthquakes that hit the Philippines from the inclusive years 1968 to 2012. Relevant data about these said earthquakes were obtained from Philippine Institute of Volcanology and Seismology. Data on damages and casualties were gathered from the records of National Disaster Risk Reduction and Management Council. This study will be of great value in emergency planning, initiating and updating programs for earthquake hazard reduction in the Philippines, which is an earthquake-prone country.
Global regression model for moisture content determination using near-infrared spectroscopy.
Clavaud, Matthieu; Roggo, Yves; Dégardin, Klara; Sacré, Pierre-Yves; Hubert, Philippe; Ziemons, Eric
2017-10-01
Near-infrared (NIR) global quantitative models were evaluated for the moisture content (MC) determination of three different freeze-dried drug products. The quantitative models were based on 3822 spectra measured on two identical spectrometers to include variability. The MC, measured with the reference Karl Fischer (KF) method, were ranged from 0.05% to 4.96%. Linear and non-linear regression models using Partial Least Square (PLS), Decision Tree (DT), Bayesian Ridge Regression (Bayes-RR), K-Nearest Neighbors (KNN), and Support Vector Regression (SVR) algorithms were created and evaluated. Among them, the SVR model was retained for a global application. The Standard Error of Calibration (SEC) and the Standard Error of Prediction (SEP) were respectively 0.12% and 0.15%. This model was then evaluated in terms of total error and risk-based assessment, linearity, and accuracy. It was observed that MC can be fastly and simultaneously determined in freeze-dried pharmaceutical products thanks to a global NIR model created with different medicines. This innovative approach allows to speed up the validation time and the in-lab release analyses. Copyright © 2017 Elsevier B.V. All rights reserved.
Model checks for Cox-type regression models based on optimally weighted martingale residuals.
Gandy, Axel; Jensen, Uwe
2009-12-01
We introduce directed goodness-of-fit tests for Cox-type regression models in survival analysis. "Directed" means that one may choose against which alternatives the tests are particularly powerful. The tests are based on sums of weighted martingale residuals and their asymptotic distributions.We derive optimal tests against certain competing models which include Cox-type regression models with different covariates and/or a different link function. We report results from several simulation studies and apply our test to a real dataset.
Abad, Cesar C C; Barros, Ronaldo V; Bertuzzi, Romulo; Gagliardi, João F L; Lima-Silva, Adriano E; Lambert, Mike I; Pires, Flavio O
2016-06-01
The aim of this study was to verify the power of VO 2max , peak treadmill running velocity (PTV), and running economy (RE), unadjusted or allometrically adjusted, in predicting 10 km running performance. Eighteen male endurance runners performed: 1) an incremental test to exhaustion to determine VO 2max and PTV; 2) a constant submaximal run at 12 km·h -1 on an outdoor track for RE determination; and 3) a 10 km running race. Unadjusted (VO 2max , PTV and RE) and adjusted variables (VO 2max 0.72 , PTV 0.72 and RE 0.60 ) were investigated through independent multiple regression models to predict 10 km running race time. There were no significant correlations between 10 km running time and either the adjusted or unadjusted VO 2max . Significant correlations (p 0.84 and power > 0.88. The allometrically adjusted predictive model was composed of PTV 0.72 and RE 0.60 and explained 83% of the variance in 10 km running time with a standard error of the estimate (SEE) of 1.5 min. The unadjusted model composed of a single PVT accounted for 72% of the variance in 10 km running time (SEE of 1.9 min). Both regression models provided powerful estimates of 10 km running time; however, the unadjusted PTV may provide an uncomplicated estimation.
Beta Regression Finite Mixture Models of Polarization and Priming
Smithson, Michael; Merkle, Edgar C.; Verkuilen, Jay
2011-01-01
This paper describes the application of finite-mixture general linear models based on the beta distribution to modeling response styles, polarization, anchoring, and priming effects in probability judgments. These models, in turn, enhance our capacity for explicitly testing models and theories regarding the aforementioned phenomena. The mixture…
Prediction of Mind-Wandering with Electroencephalogram and Non-linear Regression Modeling.
Kawashima, Issaku; Kumano, Hiroaki
2017-01-01
Mind-wandering (MW), task-unrelated thought, has been examined by researchers in an increasing number of articles using models to predict whether subjects are in MW, using numerous physiological variables. However, these models are not applicable in general situations. Moreover, they output only binary classification. The current study suggests that the combination of electroencephalogram (EEG) variables and non-linear regression modeling can be a good indicator of MW intensity. We recorded EEGs of 50 subjects during the performance of a Sustained Attention to Response Task, including a thought sampling probe that inquired the focus of attention. We calculated the power and coherence value and prepared 35 patterns of variable combinations and applied Support Vector machine Regression (SVR) to them. Finally, we chose four SVR models: two of them non-linear models and the others linear models; two of the four models are composed of a limited number of electrodes to satisfy model usefulness. Examination using the held-out data indicated that all models had robust predictive precision and provided significantly better estimations than a linear regression model using single electrode EEG variables. Furthermore, in limited electrode condition, non-linear SVR model showed significantly better precision than linear SVR model. The method proposed in this study helps investigations into MW in various little-examined situations. Further, by measuring MW with a high temporal resolution EEG, unclear aspects of MW, such as time series variation, are expected to be revealed. Furthermore, our suggestion that a few electrodes can also predict MW contributes to the development of neuro-feedback studies.
Prediction of Mind-Wandering with Electroencephalogram and Non-linear Regression Modeling
Directory of Open Access Journals (Sweden)
Issaku Kawashima
2017-07-01
Full Text Available Mind-wandering (MW, task-unrelated thought, has been examined by researchers in an increasing number of articles using models to predict whether subjects are in MW, using numerous physiological variables. However, these models are not applicable in general situations. Moreover, they output only binary classification. The current study suggests that the combination of electroencephalogram (EEG variables and non-linear regression modeling can be a good indicator of MW intensity. We recorded EEGs of 50 subjects during the performance of a Sustained Attention to Response Task, including a thought sampling probe that inquired the focus of attention. We calculated the power and coherence value and prepared 35 patterns of variable combinations and applied Support Vector machine Regression (SVR to them. Finally, we chose four SVR models: two of them non-linear models and the others linear models; two of the four models are composed of a limited number of electrodes to satisfy model usefulness. Examination using the held-out data indicated that all models had robust predictive precision and provided significantly better estimations than a linear regression model using single electrode EEG variables. Furthermore, in limited electrode condition, non-linear SVR model showed significantly better precision than linear SVR model. The method proposed in this study helps investigations into MW in various little-examined situations. Further, by measuring MW with a high temporal resolution EEG, unclear aspects of MW, such as time series variation, are expected to be revealed. Furthermore, our suggestion that a few electrodes can also predict MW contributes to the development of neuro-feedback studies.
Lamont, A.E.; Vermunt, J.K.; Van Horn, M.L.
2016-01-01
Regression mixture models are increasingly used as an exploratory approach to identify heterogeneity in the effects of a predictor on an outcome. In this simulation study, we tested the effects of violating an implicit assumption often made in these models; that is, independent variables in the
Epsilon-insensitive fuzzy c-regression models: introduction to epsilon-insensitive fuzzy modeling.
Leski, Jacek M
2004-02-01
This paper introduces a new epsilon-insensitive fuzzy c-regression models (epsilonFCRM), that can be used in fuzzy modeling. To fit these regression models to real data, a weighted epsilon-insensitive loss function is used. The proposed method make it possible to exclude an intrinsic inconsistency of fuzzy modeling, where crisp loss function (usually quadratic) is used to match real data and the fuzzy model. The epsilon-insensitive fuzzy modeling is based on human thinking and learning. This method allows easy control of generalization ability and outliers robustness. This approach leads to c simultaneous quadratic programming problems with bound constraints and one linear equality constraint. To solve this problem, computationally efficient numerical method, called incremental learning, is proposed. Finally, examples are given to demonstrate the validity of introduced approach to fuzzy modeling.
Directory of Open Access Journals (Sweden)
Yoonsu Shin
2016-01-01
Full Text Available In the 5G era, the operational cost of mobile wireless networks will significantly increase. Further, massive network capacity and zero latency will be needed because everything will be connected to mobile networks. Thus, self-organizing networks (SON are needed, which expedite automatic operation of mobile wireless networks, but have challenges to satisfy the 5G requirements. Therefore, researchers have proposed a framework to empower SON using big data. The recent framework of a big data-empowered SON analyzes the relationship between key performance indicators (KPIs and related network parameters (NPs using machine-learning tools, and it develops regression models using a Gaussian process with those parameters. The problem, however, is that the methods of finding the NPs related to the KPIs differ individually. Moreover, the Gaussian process regression model cannot determine the relationship between a KPI and its various related NPs. In this paper, to solve these problems, we proposed multivariate multiple regression models to determine the relationship between various KPIs and NPs. If we assume one KPI and multiple NPs as one set, the proposed models help us process multiple sets at one time. Also, we can find out whether some KPIs are conflicting or not. We implement the proposed models using MapReduce.
Dynamic modeling of predictive uncertainty by regression on absolute errors
Pianosi, F.; Raso, L.
2012-01-01
Uncertainty of hydrological forecasts represents valuable information for water managers and hydrologists. This explains the popularity of probabilistic models, which provide the entire distribution of the hydrological forecast. Nevertheless, many existing hydrological models are deterministic and
Langella, Giuliano; Basile, Angelo; Bonfante, Antonello; Manna, Piero; Terribile, Fabio
2013-04-01
Digital soil mapping procedures are widespread used to build two-dimensional continuous maps about several pedological attributes. Our work addressed a regression kriging (RK) technique and a bootstrapped artificial neural network approach in order to evaluate and compare (i) the accuracy of prediction, (ii) the susceptibility of being included in automatic engines (e.g. to constitute web processing services), and (iii) the time cost needed for calibrating models and for making predictions. Regression kriging is maybe the most widely used geostatistical technique in the digital soil mapping literature. Here we tried to apply the EBLUP regression kriging as it is deemed to be the most statistically sound RK flavor by pedometricians. An unusual multi-parametric and nonlinear machine learning approach was accomplished, called BAGAP (Bootstrap aggregating Artificial neural networks with Genetic Algorithms and Principal component regression). BAGAP combines a selected set of weighted neural nets having specified characteristics to yield an ensemble response. The purpose of applying these two particular models is to ascertain whether and how much a more cumbersome machine learning method could be much promising in making more accurate/precise predictions. Being aware of the difficulty to handle objects based on EBLUP-RK as well as BAGAP when they are embedded in environmental applications, we explore the susceptibility of them in being wrapped within Web Processing Services. Two further kinds of aspects are faced for an exhaustive evaluation and comparison: automaticity and time of calculation with/without high performance computing leverage.
Sun, Yanqing; Sun, Liuquan; Zhou, Jie
2013-07-01
This paper studies the generalized semiparametric regression model for longitudinal data where the covariate effects are constant for some and time-varying for others. Different link functions can be used to allow more flexible modelling of longitudinal data. The nonparametric components of the model are estimated using a local linear estimating equation and the parametric components are estimated through a profile estimating function. The method automatically adjusts for heterogeneity of sampling times, allowing the sampling strategy to depend on the past sampling history as well as possibly time-dependent covariates without specifically model such dependence. A [Formula: see text]-fold cross-validation bandwidth selection is proposed as a working tool for locating an appropriate bandwidth. A criteria for selecting the link function is proposed to provide better fit of the data. Large sample properties of the proposed estimators are investigated. Large sample pointwise and simultaneous confidence intervals for the regression coefficients are constructed. Formal hypothesis testing procedures are proposed to check for the covariate effects and whether the effects are time-varying. A simulation study is conducted to examine the finite sample performances of the proposed estimation and hypothesis testing procedures. The methods are illustrated with a data example.
Parental Vaccine Acceptance: A Logistic Regression Model Using Previsit Decisions.
Lee, Sara; Riley-Behringer, Maureen; Rose, Jeanmarie C; Meropol, Sharon B; Lazebnik, Rina
2017-07-01
This study explores how parents' intentions regarding vaccination prior to their children's visit were associated with actual vaccine acceptance. A convenience sample of parents accompanying 6-week-old to 17-year-old children completed a written survey at 2 pediatric practices. Using hierarchical logistic regression, for hospital-based participants (n = 216), vaccine refusal history ( P < .01) and vaccine decision made before the visit ( P < .05) explained 87% of vaccine refusals. In community-based participants (n = 100), vaccine refusal history ( P < .01) explained 81% of refusals. Over 1 in 5 parents changed their minds about vaccination during the visit. Thirty parents who were previous vaccine refusers accepted current vaccines, and 37 who had intended not to vaccinate choose vaccination. Twenty-nine parents without a refusal history declined vaccines, and 32 who did not intend to refuse before the visit declined vaccination. Future research should identify key factors to nudge parent decision making in favor of vaccination.
Forecast Model of Urban Stagnant Water Based on Logistic Regression
Directory of Open Access Journals (Sweden)
Liu Pan
2017-01-01
Full Text Available With the development of information technology, the construction of water resource system has been gradually carried out. In the background of big data, the work of water information needs to carry out the process of quantitative to qualitative change. Analyzing the correlation of data and exploring the deep value of data which are the key of water information’s research. On the basis of the research on the water big data and the traditional data warehouse architecture, we try to find out the connection of different data source. According to the temporal and spatial correlation of stagnant water and rainfall, we use spatial interpolation to integrate data of stagnant water and rainfall which are from different data source and different sensors, then use logistic regression to find out the relationship between them.
Asavaskulkiet, Krissada
2014-01-01
This paper proposes a novel face super-resolution reconstruction (hallucination) technique for YCbCr color space. The underlying idea is to learn with an error regression model and multi-linear principal component analysis (MPCA). From hallucination framework, many color face images are explained in YCbCr space. To reduce the time complexity of color face hallucination, we can be naturally described the color face imaged as tensors or multi-linear arrays. In addition, the error regression analysis is used to find the error estimation which can be obtained from the existing LR in tensor space. In learning process is from the mistakes in reconstruct face images of the training dataset by MPCA, then finding the relationship between input and error by regression analysis. In hallucinating process uses normal method by backprojection of MPCA, after that the result is corrected with the error estimation. In this contribution we show that our hallucination technique can be suitable for color face images both in RGB and YCbCr space. By using the MPCA subspace with error regression model, we can generate photorealistic color face images. Our approach is demonstrated by extensive experiments with high-quality hallucinated color faces. Comparison with existing algorithms shows the effectiveness of the proposed method.
Regression Modeling of EDM Process for AISI D2 Tool Steel with RSM
Directory of Open Access Journals (Sweden)
Shakir M. Mousa
2018-01-01
Full Text Available In this paper, Response Surface Method (RSM is utilized to carry out an investigation of the impact of input parameters: electrode type (E.T. [Gr, Cu and CuW], pulse duration of current (Ip, pulse duration on time (Ton, and pulse duration off time (Toff on the surface finish in EDM operation. To approximate and concentrate the suggested second- order regression model is generally accepted for Surface Roughness Ra, a Central Composite Design (CCD is utilized for evaluating the model constant coefficients of the input parameters on Surface Roughness (Ra. Examinations were performed on AISI D2 tool steel. The important coefficients are gotten by achieving successfully an Analysis of Variance (ANOVA at the 5 % confidence interval. The outcomes discover that Surface Roughness (Ra is much more impacted by E.T., Ton, Toff, Ip and little of their interactions action or influence. To predict the average Surface Roughness (Ra, a mathematical regression model was developed. Furthermore, for saving in time, the created model could be utilized for the choice of the high levels in the EDM procedure. The model adequacy was extremely agreeable as the constant Coefficient of Determination (R2 is observed to be 99.72% and adjusted R2-measurement (R2adj 99.60%.
On pseudo-values for regression analysis in competing risks models
DEFF Research Database (Denmark)
Graw, F; Gerds, Thomas Alexander; Schumacher, M
2009-01-01
For regression on state and transition probabilities in multi-state models Andersen et al. (Biometrika 90:15-27, 2003) propose a technique based on jackknife pseudo-values. In this article we analyze the pseudo-values suggested for competing risks models and prove some conjectures regarding...... their asymptotics (Klein and Andersen, Biometrics 61:223-229, 2005). The key is a second order von Mises expansion of the Aalen-Johansen estimator which yields an appropriate representation of the pseudo-values. The method is illustrated with data from a clinical study on total joint replacement. In the application...... we consider for comparison the estimates obtained with the Fine and Gray approach (J Am Stat Assoc 94:496-509, 1999) and also time-dependent solutions of pseudo-value regression equations....
DEFF Research Database (Denmark)
Xu, Man; Pinson, Pierre; Lu, Zongxiang
2016-01-01
Wind farm power curve modeling, which characterizes the relationship between meteorological variables and power production, is a crucial procedure for wind power forecasting. In many cases, power curve modeling is more impacted by the limited quality of input data rather than the stochastic nature...... of the energy conversion process. Such nature may be due the varying wind conditions, aging and state of the turbines, etc. And, an equivalent steady-state power curve, estimated under normal operating conditions with the intention to filter abnormal data, is not sufficient to solve the problem because...... of the lack of time adaptivity. In this paper, a refined local polynomial regression algorithm is proposed to yield an adaptive robust model of the time-varying scattered power curve for forecasting applications. The time adaptivity of the algorithm is considered with a new data-driven bandwidth selection...
Time-trend of melanoma screening practice by primary care physicians: A meta-regression analysis
Valachis, Antonis; Mauri, Davide; Karampoiki, Vassiliki; Polyzos, Nikolaos P; Cortinovis, Ivan; Koukourakis, Georgios; Zacharias, Georgios; Xilomenos, Apostolos; Tsappi, Maria; Casazza, Giovanni
2009-01-01
Objective To assess whether the proportion of primary care physicians implementing full body skin examination (FBSE) to screen for melanoma changed over time. Methods Meta-regression analyses of available data. Data Sources: MEDLINE, ISI, Cochrane Central Register of Controlled Trials. Results Fifteen studies surveying 10,336 physicians were included in the analyses. Overall, 15%?82% of them reported to perform FBSE to screen for melanoma. The proportion of physicians using FBSE screening ten...
Cluster regression model and level fluctuation features of Van Lake, Turkey
Directory of Open Access Journals (Sweden)
Z. Şen
Full Text Available Lake water levels change under the influences of natural and/or anthropogenic environmental conditions. Among these influences are the climate change, greenhouse effects and ozone layer depletions which are reflected in the hydrological cycle features over the lake drainage basins. Lake levels are among the most significant hydrological variables that are influenced by different atmospheric and environmental conditions. Consequently, lake level time series in many parts of the world include nonstationarity components such as shifts in the mean value, apparent or hidden periodicities. On the other hand, many lake level modeling techniques have a stationarity assumption. The main purpose of this work is to develop a cluster regression model for dealing with nonstationarity especially in the form of shifting means. The basis of this model is the combination of transition probability and classical regression technique. Both parts of the model are applied to monthly level fluctuations of Lake Van in eastern Turkey. It is observed that the cluster regression procedure does preserve the statistical properties and the transitional probabilities that are indistinguishable from the original data.
Key words. Hydrology (hydrologic budget; stochastic processes · Meteorology and atmospheric dynamics (ocean-atmosphere interactions
MODELLING OF ORDINAL TIME SERIES BY PROPORTIONAL ODDS MODEL
Directory of Open Access Journals (Sweden)
Serpil AKTAŞ ALTUNAY
2013-06-01
Full Text Available Categorical time series data with random time dependent covariates often arise when the variable categories are assigned as categorical. There are several other models that have been proposed in the literature for the analysis of categorical time series. For example, Markov chain models, integer autoregressive processes, discrete ARMA models can be utilized for modeling of categorical time series. In general, the choice of model depends on the measurement of study variables: nominal, ordinal and interval. However, regression theory is successful approach for categorical time series which is based on generalized linear models and partial likelihood inference. One of the models for ordinal time series in regression theory is proportional odds model. In this study, proportional odds model approach to ordinal categorical time series is investigated based on a real air pollution data set and the results are discussed.
Misspecified poisson regression models for large-scale registry data
DEFF Research Database (Denmark)
Grøn, Randi; Gerds, Thomas A.; Andersen, Per K.
2016-01-01
working models that are then likely misspecified. To support and improve conclusions drawn from such models, we discuss methods for sensitivity analysis, for estimation of average exposure effects using aggregated data, and a semi-parametric bootstrap method to obtain robust standard errors. The methods...
Approximate Tests of Hypotheses in Regression Models with Grouped Data
1979-02-01
in terms of Kolmogoroff -Smirnov statistic in the next section. I 1 1 I t A 4. Simulations Two models have been considered for simulations. Model I. Yuk...Fort Meade, MD 20755 2 Commanding Officer Navy LibraryrnhOffice o Naval Research National Space Technology LaboratoryBranch Office *Attn: Navy
Effects of Employing Ridge Regression in Structural Equation Models.
McQuitty, Shaun
1997-01-01
LISREL 8 invokes a ridge option when maximum likelihood or generalized least squares are used to estimate a structural equation model with a nonpositive definite covariance or correlation matrix. Implications of the ridge option for model fit, parameter estimates, and standard errors are explored through two examples. (SLD)
Predicting recycling behaviour: Comparison of a linear regression model and a fuzzy logic model.
Vesely, Stepan; Klöckner, Christian A; Dohnal, Mirko
2016-03-01
In this paper we demonstrate that fuzzy logic can provide a better tool for predicting recycling behaviour than the customarily used linear regression. To show this, we take a set of empirical data on recycling behaviour (N=664), which we randomly divide into two halves. The first half is used to estimate a linear regression model of recycling behaviour, and to develop a fuzzy logic model of recycling behaviour. As the first comparison, the fit of both models to the data included in estimation of the models (N=332) is evaluated. As the second comparison, predictive accuracy of both models for "new" cases (hold-out data not included in building the models, N=332) is assessed. In both cases, the fuzzy logic model significantly outperforms the regression model in terms of fit. To conclude, when accurate predictions of recycling and possibly other environmental behaviours are needed, fuzzy logic modelling seems to be a promising technique. Copyright © 2015 Elsevier Ltd. All rights reserved.
Logistic regression model for detecting radon prone areas in Ireland.
Elío, J; Crowley, Q; Scanlon, R; Hodgson, J; Long, S
2017-12-01
A new high spatial resolution radon risk map of Ireland has been developed, based on a combination of indoor radon measurements (n=31,910) and relevant geological information (i.e. Bedrock Geology, Quaternary Geology, soil permeability and aquifer type). Logistic regression was used to predict the probability of having an indoor radon concentration above the national reference level of 200Bqm -3 in Ireland. The four geological datasets evaluated were found to be statistically significant, and, based on combinations of these four variables, the predicted probabilities ranged from 0.57% to 75.5%. Results show that the Republic of Ireland may be divided in three main radon risk categories: High (HR), Medium (MR) and Low (LR). The probability of having an indoor radon concentration above 200Bqm -3 in each area was found to be 19%, 8% and 3%; respectively. In the Republic of Ireland, the population affected by radon concentrations above 200Bqm -3 is estimated at ca. 460k (about 10% of the total population). Of these, 57% (265k), 35% (160k) and 8% (35k) are in High, Medium and Low Risk Areas, respectively. Our results provide a high spatial resolution utility which permit customised radon-awareness information to be targeted at specific geographic areas. Copyright © 2017 Elsevier B.V. All rights reserved.
Learning Supervised Topic Models for Classification and Regression from Crowds
Rodrigues, Filipe; Lourenco, Mariana; Ribeiro, Bernardete; Pereira, Francisco
2017-01-01
The growing need to analyze large collections of documents has led to great developments in topic modeling. Since documents are frequently associated with other related variables, such as labels or ratings, much interest has been placed on supervised topic models. However, the nature of most annotation tasks, prone to ambiguity and noise, often with high volumes of documents, deem learning under a single-annotator assumption unrealistic or unpractical for most real-world applications. In this...
Directory of Open Access Journals (Sweden)
Simone Becker Lopes
2014-04-01
Full Text Available Considering the importance of spatial issues in transport planning, the main objective of this study was to analyze the results obtained from different approaches of spatial regression models. In the case of spatial autocorrelation, spatial dependence patterns should be incorporated in the models, since that dependence may affect the predictive power of these models. The results obtained with the spatial regression models were also compared with the results of a multiple linear regression model that is typically used in trips generation estimations. The findings support the hypothesis that the inclusion of spatial effects in regression models is important, since the best results were obtained with alternative models (spatial regression models or the ones with spatial variables included. This was observed in a case study carried out in the city of Porto Alegre, in the state of Rio Grande do Sul, Brazil, in the stages of specification and calibration of the models, with two distinct datasets.
Directory of Open Access Journals (Sweden)
Laura M. Grajeda
2016-01-01
Full Text Available Abstract Background Childhood growth is a cornerstone of pediatric research. Statistical models need to consider individual trajectories to adequately describe growth outcomes. Specifically, well-defined longitudinal models are essential to characterize both population and subject-specific growth. Linear mixed-effect models with cubic regression splines can account for the nonlinearity of growth curves and provide reasonable estimators of population and subject-specific growth, velocity and acceleration. Methods We provide a stepwise approach that builds from simple to complex models, and account for the intrinsic complexity of the data. We start with standard cubic splines regression models and build up to a model that includes subject-specific random intercepts and slopes and residual autocorrelation. We then compared cubic regression splines vis-à-vis linear piecewise splines, and with varying number of knots and positions. Statistical code is provided to ensure reproducibility and improve dissemination of methods. Models are applied to longitudinal height measurements in a cohort of 215 Peruvian children followed from birth until their fourth year of life. Results Unexplained variability, as measured by the variance of the regression model, was reduced from 7.34 when using ordinary least squares to 0.81 (p < 0.001 when using a linear mixed-effect models with random slopes and a first order continuous autoregressive error term. There was substantial heterogeneity in both the intercept (p < 0.001 and slopes (p < 0.001 of the individual growth trajectories. We also identified important serial correlation within the structure of the data (ρ = 0.66; 95 % CI 0.64 to 0.68; p < 0.001, which we modeled with a first order continuous autoregressive error term as evidenced by the variogram of the residuals and by a lack of association among residuals. The final model provides a parametric linear regression equation for both estimation and
Online Support Vector Regression with Varying Parameters for Time-Dependent Data
International Nuclear Information System (INIS)
Omitaomu, Olufemi A.; Jeong, Myong K.; Badiru, Adedeji B.
2011-01-01
Support vector regression (SVR) is a machine learning technique that continues to receive interest in several domains including manufacturing, engineering, and medicine. In order to extend its application to problems in which datasets arrive constantly and in which batch processing of the datasets is infeasible or expensive, an accurate online support vector regression (AOSVR) technique was proposed. The AOSVR technique efficiently updates a trained SVR function whenever a sample is added to or removed from the training set without retraining the entire training data. However, the AOSVR technique assumes that the new samples and the training samples are of the same characteristics; hence, the same value of SVR parameters is used for training and prediction. This assumption is not applicable to data samples that are inherently noisy and non-stationary such as sensor data. As a result, we propose Accurate On-line Support Vector Regression with Varying Parameters (AOSVR-VP) that uses varying SVR parameters rather than fixed SVR parameters, and hence accounts for the variability that may exist in the samples. To accomplish this objective, we also propose a generalized weight function to automatically update the weights of SVR parameters in on-line monitoring applications. The proposed function allows for lower and upper bounds for SVR parameters. We tested our proposed approach and compared results with the conventional AOSVR approach using two benchmark time series data and sensor data from nuclear power plant. The results show that using varying SVR parameters is more applicable to time dependent data.
von Davier, Matthias; Sinharay, Sandip
2009-01-01
This paper presents an application of a stochastic approximation EM-algorithm using a Metropolis-Hastings sampler to estimate the parameters of an item response latent regression model. Latent regression models are extensions of item response theory (IRT) to a 2-level latent variable model in which covariates serve as predictors of the…
A computational approach to compare regression modelling strategies in prediction research
Pajouheshnia, R.; Pestman, W.R.; Teerenstra, S.; Groenwold, R.H.
2016-01-01
BACKGROUND: It is often unclear which approach to fit, assess and adjust a model will yield the most accurate prediction model. We present an extension of an approach for comparing modelling strategies in linear regression to the setting of logistic regression and demonstrate its application in
Cepeda-Cuervo, Edilberto; Núñez-Antón, Vicente
2013-01-01
In this article, a proposed Bayesian extension of the generalized beta spatial regression models is applied to the analysis of the quality of education in Colombia. We briefly revise the beta distribution and describe the joint modeling approach for the mean and dispersion parameters in the spatial regression models' setting. Finally, we motivate…
U.S. Environmental Protection Agency — Spreadsheets are included here to support the manuscript "Boosted Regression Tree Models to Explain Watershed Nutrient Concentrations and Biological Condition". This...
Cox's regression model for dynamics of grouped unemployment data
Czech Academy of Sciences Publication Activity Database
Volf, Petr
2003-01-01
Roč. 10, č. 19 (2003), s. 151-162 ISSN 1212-074X R&D Projects: GA ČR GA402/01/0539 Institutional research plan: CEZ:AV0Z1075907 Keywords : mathematical statistics * survival analysis * Cox's model Subject RIV: BB - Applied Statistics, Operational Research
Inflation, Forecast Intervals and Long Memory Regression Models
C.S. Bos (Charles); Ph.H.B.F. Franses (Philip Hans); M. Ooms (Marius)
2001-01-01
textabstractWe examine recursive out-of-sample forecasting of monthly postwar U.S. core inflation and log price levels. We use the autoregressive fractionally integrated moving average model with explanatory variables (ARFIMAX). Our analysis suggests a significant explanatory power of leading
Inflation, Forecast Intervals and Long Memory Regression Models
Ooms, M.; Bos, C.S.; Franses, P.H.
2003-01-01
We examine recursive out-of-sample forecasting of monthly postwar US core inflation and log price levels. We use the autoregressive fractionally integrated moving average model with explanatory variables (ARFIMAX). Our analysis suggests a significant explanatory power of leading indicators
Multiple Linear Regression Model for Estimating the Price of a ...
African Journals Online (AJOL)
Ghana Mining Journal ... In the modeling, the Ordinary Least Squares (OLS) normality assumption which could introduce errors in the statistical analyses was dealt with by log transformation of the data, ensuring the data is normally ... The resultant MLRM is: Ŷi MLRM = (X'X)-1X'Y(xi') where X is the sample data matrix.
A linear regression modelling of the relation- ship between initial ...
African Journals Online (AJOL)
despite there being a need for certainty regarding the completion of projects. This article reports on an investigation of the ... Keywords: Relationship, initial & final construction time, project delivery. Dr Ayodeji Olatunji Aiyetan, Department of ..... for computing final contract duration. Therefore, both are recommended for use ...
Current and Predicted Fertility using Poisson Regression Model ...
African Journals Online (AJOL)
AJRH Managing Editor
marriage, modern contraceptive use, paid employment status, marital status, marital duration, education attainment, husbands education attainment, residence, zones, wealth quintiles. Children ever born in the context of this study refers to the number of children a woman previously born alive as at the time of the study.
Liu, Shelley H; Bobb, Jennifer F; Lee, Kyu Ha; Gennings, Chris; Claus Henn, Birgit; Bellinger, David; Austin, Christine; Schnaas, Lourdes; Tellez-Rojo, Martha M; Hu, Howard; Wright, Robert O; Arora, Manish; Coull, Brent A
2017-09-06
The impact of neurotoxic chemical mixtures on children's health is a critical public health concern. It is well known that during early life, toxic exposures may impact cognitive function during critical time intervals of increased vulnerability, known as windows of susceptibility. Knowledge on time windows of susceptibility can help inform treatment and prevention strategies, as chemical mixtures may affect a developmental process that is operating at a specific life phase. There are several statistical challenges in estimating the health effects of time-varying exposures to multi-pollutant mixtures, such as: multi-collinearity among the exposures both within time points and across time points, and complex exposure-response relationships. To address these concerns, we develop a flexible statistical method, called lagged kernel machine regression (LKMR). LKMR identifies critical exposure windows of chemical mixtures, and accounts for complex non-linear and non-additive effects of the mixture at any given exposure window. Specifically, LKMR estimates how the effects of a mixture of exposures change with the exposure time window using a Bayesian formulation of a grouped, fused lasso penalty within a kernel machine regression (KMR) framework. A simulation study demonstrates the performance of LKMR under realistic exposure-response scenarios, and demonstrates large gains over approaches that consider each time window separately, particularly when serial correlation among the time-varying exposures is high. Furthermore, LKMR demonstrates gains over another approach that inputs all time-specific chemical concentrations together into a single KMR. We apply LKMR to estimate associations between neurodevelopment and metal mixtures in Early Life Exposures in Mexico and Neurotoxicology, a prospective cohort study of child health in Mexico City. © The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Directory of Open Access Journals (Sweden)
Igor K. Kochanenko
2013-01-01
Full Text Available Procedures of construction of curve regress by criterion of the least fractals, i.e. the greatest probability of the sums of degrees of the least deviations measured intensity from their modelling values are proved. The exponent is defined as fractal dimension of a time number. The difference of results of a well-founded method and a method of the least squares is quantitatively estimated.
Parametric modelling of thresholds across scales in wavelet regression
Anestis Antoniadis; Piotr Fryzlewicz
2006-01-01
We propose a parametric wavelet thresholding procedure for estimation in the ‘function plus independent, identically distributed Gaussian noise’ model. To reflect the decreasing sparsity of wavelet coefficients from finer to coarser scales, our thresholds also decrease. They retain the noise-free reconstruction property while being lower than the universal threshold, and are jointly parameterised by a single scalar parameter. We show that our estimator achieves near-optimal risk rates for the...
Shaofu Zhuyu Decoction Regresses Endometriotic Lesions in a Rat Model
Directory of Open Access Journals (Sweden)
Guanghui Zhu
2018-01-01
Full Text Available The current therapies for endometriosis are restricted by various side effects and treatment outcome has been less than satisfactory. Shaofu Zhuyu Decoction (SZD, a classic traditional Chinese medicinal (TCM prescription for dysmenorrhea, has been widely used in clinical practice by TCM doctors to relieve symptoms of endometriosis. The present study aimed to investigate the effects of SZD on a rat model of endometriosis. Forty-eight female Sprague-Dawley rats with regular estrous cycles went through autotransplantation operation to establish endometriosis model. Then 38 rats with successful ectopic implants were randomized into two groups: vehicle- and SZD-treated groups. The latter were administered SZD through oral gavage for 4 weeks. By the end of the treatment period, the volume of the endometriotic lesions was measured, the histopathological properties of the ectopic endometrium were evaluated, and levels of proliferating cell nuclear antigen (PCNA, CD34, and hypoxia inducible factor- (HIF- 1α in the ectopic endometrium were detected with immunohistochemistry. Furthermore, apoptosis was assessed using the terminal deoxynucleotidyl transferase (TdT deoxyuridine 5′-triphosphate (dUTP nick-end labeling (TUNEL assay. In this study, SZD significantly reduced the size of ectopic lesions in rats with endometriosis, inhibited cell proliferation, increased cell apoptosis, and reduced microvessel density and HIF-1α expression. It suggested that SZD could be an effective therapy for the treatment and prevention of endometriosis recurrence.
International Nuclear Information System (INIS)
Vajna, Szabolcs; Kertész, János; Tóth, Bálint
2013-01-01
Many human-related activities show power-law decaying interevent time distribution with exponents usually varying between 1 and 2. We study a simple task-queuing model, which produces bursty time series due to the non-trivial dynamics of the task list. The model is characterized by a priority distribution as an input parameter, which describes the choice procedure from the list. We give exact results on the asymptotic behaviour of the model and we show that the interevent time distribution is power-law decaying for any kind of input distributions that remain normalizable in the infinite list limit, with exponents tunable between 1 and 2. The model satisfies a scaling law between the exponents of interevent time distribution (β) and autocorrelation function (α): α + β = 2. This law is general for renewal processes with power-law decaying interevent time distribution. We conclude that slowly decaying autocorrelation function indicates long-range dependence only if the scaling law is violated. (paper)
[Application of detecting and taking overdispersion into account in Poisson regression model].
Bouche, G; Lepage, B; Migeot, V; Ingrand, P
2009-08-01
Researchers often use the Poisson regression model to analyze count data. Overdispersion can occur when a Poisson regression model is used, resulting in an underestimation of variance of the regression model parameters. Our objective was to take overdispersion into account and assess its impact with an illustration based on the data of a study investigating the relationship between use of the Internet to seek health information and number of primary care consultations. Three methods, overdispersed Poisson, a robust estimator, and negative binomial regression, were performed to take overdispersion into account in explaining variation in the number (Y) of primary care consultations. We tested overdispersion in the Poisson regression model using the ratio of the sum of Pearson residuals over the number of degrees of freedom (chi(2)/df). We then fitted the three models and compared parameter estimation to the estimations given by Poisson regression model. Variance of the number of primary care consultations (Var[Y]=21.03) was greater than the mean (E[Y]=5.93) and the chi(2)/df ratio was 3.26, which confirmed overdispersion. Standard errors of the parameters varied greatly between the Poisson regression model and the three other regression models. Interpretation of estimates from two variables (using the Internet to seek health information and single parent family) would have changed according to the model retained, with significant levels of 0.06 and 0.002 (Poisson), 0.29 and 0.09 (overdispersed Poisson), 0.29 and 0.13 (use of a robust estimator) and 0.45 and 0.13 (negative binomial) respectively. Different methods exist to solve the problem of underestimating variance in the Poisson regression model when overdispersion is present. The negative binomial regression model seems to be particularly accurate because of its theorical distribution ; in addition this regression is easy to perform with ordinary statistical software packages.
Chardon, Jérémy; Hingray, Benoit; Favre, Anne-Catherine
2018-01-01
Statistical downscaling models (SDMs) are often used to produce local weather scenarios from large-scale atmospheric information. SDMs include transfer functions which are based on a statistical link identified from observations between local weather and a set of large-scale predictors. As physical processes driving surface weather vary in time, the most relevant predictors and the regression link are likely to vary in time too. This is well known for precipitation for instance and the link is thus often estimated after some seasonal stratification of the data. In this study, we present a two-stage analog/regression model where the regression link is estimated from atmospheric analogs of the current prediction day. Atmospheric analogs are identified from fields of geopotential heights at 1000 and 500 hPa. For the regression stage, two generalized linear models are further used to model the probability of precipitation occurrence and the distribution of non-zero precipitation amounts, respectively. The two-stage model is evaluated for the probabilistic prediction of small-scale precipitation over France. It noticeably improves the skill of the prediction for both precipitation occurrence and amount. As the analog days vary from one prediction day to another, the atmospheric predictors selected in the regression stage and the value of the corresponding regression coefficients can vary from one prediction day to another. The model allows thus for a day-to-day adaptive and tailored downscaling. It can also reveal specific predictors for peculiar and non-frequent weather configurations.
Directory of Open Access Journals (Sweden)
Yoojeong Seo
2018-01-01
Full Text Available The issue of detecting objects bottoming on the sea floor is significant in various fields including civilian and military areas. The objective of this study is to investigate the logistic regression model to discriminate the target from the clutter and to verify the possibility of applying the model trained by the simulated data generated by the mathematical model to the real experimental data because it is not easy to obtain sufficient data in the underwater field. In the first stage of this study, when the clutter signal energy is so strong that the detection of a target is difficult, the logistic regression model is employed to distinguish the strong clutter signal and the target signal. Previous studies have found that if the clutter energy is larger, false detection occurs even for the various existing detection schemes. For this reason, the discrete Fourier transform (DFT magnitude spectrum of acoustic signals received by active sonar is applied to train the model to distinguish whether the received signal contains a target signal or not. The goodness of fit of the model is verified in terms of receiver operation characteristic (ROC, area under ROC curve (AUC, and classification table. The detection performance of the proposed model is evaluated in terms of detection rate according to target to clutter ratio (TCR. Furthermore, the real experimental data are employed to test the proposed approach. When using the experimental data to test the model, the logistic regression model is trained by the simulated data that are generated based on the mathematical model for the backscattering of the cylindrical object. The mathematical model is developed according to the size of the cylinder used in the experiment. Since the information on the experimental environment including the sound speed, the sediment type and such is not available, once simulated data are generated under various conditions, valid simulated data are selected using 70% of the
Capacitance Regression Modelling Analysis on Latex from Selected Rubber Tree Clones
Rosli, A. D.; Hashim, H.; Khairuzzaman, N. A.; Mohd Sampian, A. F.; Baharudin, R.; Abdullah, N. E.; Sulaiman, M. S.; Kamaru'zzaman, M.
2015-11-01
This paper investigates the capacitance regression modelling performance of latex for various rubber tree clones, namely clone 2002, 2008, 2014 and 3001. Conventionally, the rubber tree clones identification are based on observation towards tree features such as shape of leaf, trunk, branching habit and pattern of seeds texture. The former method requires expert persons and very time-consuming. Currently, there is no sensing device based on electrical properties that can be employed to measure different clones from latex samples. Hence, with a hypothesis that the dielectric constant of each clone varies, this paper discusses the development of a capacitance sensor via Capacitance Comparison Bridge (known as capacitance sensor) to measure an output voltage of different latex samples. The proposed sensor is initially tested with 30ml of latex sample prior to gradually addition of dilution water. The output voltage and capacitance obtained from the test are recorded and analyzed using Simple Linear Regression (SLR) model. This work outcome infers that latex clone of 2002 has produced the highest and reliable linear regression line with determination coefficient of 91.24%. In addition, the study also found that the capacitive elements in latex samples deteriorate if it is diluted with higher volume of water.
Capacitance Regression Modelling Analysis on Latex from Selected Rubber Tree Clones
International Nuclear Information System (INIS)
Rosli, A D; Baharudin, R; Hashim, H; Khairuzzaman, N A; Mohd Sampian, A F; Abdullah, N E; Kamaru'zzaman, M; Sulaiman, M S
2015-01-01
This paper investigates the capacitance regression modelling performance of latex for various rubber tree clones, namely clone 2002, 2008, 2014 and 3001. Conventionally, the rubber tree clones identification are based on observation towards tree features such as shape of leaf, trunk, branching habit and pattern of seeds texture. The former method requires expert persons and very time-consuming. Currently, there is no sensing device based on electrical properties that can be employed to measure different clones from latex samples. Hence, with a hypothesis that the dielectric constant of each clone varies, this paper discusses the development of a capacitance sensor via Capacitance Comparison Bridge (known as capacitance sensor) to measure an output voltage of different latex samples. The proposed sensor is initially tested with 30ml of latex sample prior to gradually addition of dilution water. The output voltage and capacitance obtained from the test are recorded and analyzed using Simple Linear Regression (SLR) model. This work outcome infers that latex clone of 2002 has produced the highest and reliable linear regression line with determination coefficient of 91.24%. In addition, the study also found that the capacitive elements in latex samples deteriorate if it is diluted with higher volume of water. (paper)
A logistic regression model of Coronary Artery Disease among Male Patients in Punjab
Directory of Open Access Journals (Sweden)
Sohail Chand
2005-07-01
Full Text Available This is a cross-sectional retrospective study of 308 male patients, who were presented first time for coronary angiography at the Punjab Institute of Cardiology. The mean age was 50.97 + 9.9 among male patients. As the response variable coronary artery disease (CAD was a binary variable, logistic regression model was fitted to predict the Coronary Artery Disease with the help of significant risk factors. Age, Chest pain, Diabetes Mellitus, Smoking and Lipids are resulted as significant risk factors associated with CAD among male population.
Directory of Open Access Journals (Sweden)
Y Shamshirgaran
2011-12-01
Full Text Available The Fixed Regression Test-Day Model (FRM and Random Regression Test-Day Model (RRM for genetic evaluation of milk yield trait of dairy cattle in Khorasan Razavi province were studied. Breeding values and genetic parameters of milk yield trait from two models were compared. A total of 164391 monthly test day milk records (three times milking per day obtained from 19217 Holstein cows distributed in 172 herds and calved from 1991 to 2008, were used to estimate genetic parameters and to predict breeding values. The contemporary group of herd- year- month of production was fitted as fixed effects in the models. Also linear and quadratic forms of age at calving and Holstein gene percentage were fitted as covariate. The random factors of the models were additive genetic and permanent environmental effects. In the random regression model, orthogonal legendre polynomial up to order 4(cubic was implemented to take account of genetic and environmental aspects of milk production over the course of lactation. Heritability estimates resulted from the FRM was 0.15. The average heritability estimates resulted from the RRM of monthly test day milk production for the second half of the lactation was higher than that of the first half of lactation period. The highest and lowest heritability values were found for the first (0.102 and sixth (0.235 month of lactation. Breeding value of animals predicted from FRM and RRM were also compared. The results showed similar ranking of animals based on their breeding values from both models.
Travel time reliability modeling.
2011-07-01
This report includes three papers as follows: : 1. Guo F., Rakha H., and Park S. (2010), "A Multi-state Travel Time Reliability Model," : Transportation Research Record: Journal of the Transportation Research Board, n 2188, : pp. 46-54. : 2. Park S.,...
Validation of regression models for nitrate concentrations in the upper groundwater in sandy soils
Sonneveld, M.P.W.; Brus, D.J.; Roelsma, J.
2010-01-01
For Dutch sandy regions, linear regression models have been developed that predict nitrate concentrations in the upper groundwater on the basis of residual nitrate contents in the soil in autumn. The objective of our study was to validate these regression models for one particular sandy region
Use of a Regression Model to Study Host-Genomic Determinants of Phage Susceptibility in MRSA
DEFF Research Database (Denmark)
Zschach, Henrike; Larsen, Mette Voldby; Hasman, Henrik
2018-01-01
strains to 12 (nine monovalent) different therapeutic phage preparations and subsequently employed linear regression models to estimate the influence of individual host gene families on resistance to phages. Specifically, we used a two-step regression model setup with a preselection step based on gene...
Bayesian networks with a logistic regression model for the conditional probabilities
Rijmen, F.P.J.
2008-01-01
Logistic regression techniques can be used to restrict the conditional probabilities of a Bayesian network for discrete variables. More specifically, each variable of the network can be modeled through a logistic regression model, in which the parents of the variable define the covariates. When all
Technology diffusion in hospitals : A log odds random effects regression model
Blank, J.L.T.; Valdmanis, V.G.
2013-01-01
This study identifies the factors that affect the diffusion of hospital innovations. We apply a log odds random effects regression model on hospital micro data. We introduce the concept of clustering innovations and the application of a log odds random effects regression model to describe the
Technology diffusion in hospitals: A log odds random effects regression model
J.L.T. Blank (Jos); V.G. Valdmanis (Vivian G.)
2015-01-01
textabstractThis study identifies the factors that affect the diffusion of hospital innovations. We apply a log odds random effects regression model on hospital micro data. We introduce the concept of clustering innovations and the application of a log odds random effects regression model to
Clinical Predictors of Regression of Choroidal Melanomas after Brachytherapy: A Growth Curve Model.
Rashid, Mamunur; Heikkonen, Jorma; Singh, Arun D; Kivelä, Tero T
2018-02-27
To build multivariate models to assess correctly and efficiently the contribution of tumor characteristics on the rate of regression of choroidal melanomas after brachytherapy in a way that adjusts for confounding and takes into account variation in tumor regression patterns. Modeling of longitudinal observational data. Ultrasound images from 330 of 388 consecutive choroidal melanomas (87%) irradiated from 2000 through 2008 at the Helsinki University Hospital, Helsinki, Finland, a national referral center. Images were obtained with a 10-MHz B-scan during 3 years of follow-up. Change in tumor thickness and cross-sectional area were modeled using a polynomial growth-curve function in a nested mixed linear regression model considering regression pattern and tumor levels. Initial tumor dimensions, tumor-node-metastasis (TNM) stage, shape, ciliary body involvement, pigmentation, isotope, plaque size, detached muscles, and radiation parameters were considered as covariates. Covariates that independently predict tumor regression. Initial tumor thickness, largest basal diameter, ciliary body involvement, TNM stage, tumor shape group, break in Bruch's membrane, having muscles detached, and radiation dose to tumor base predicted faster regression, whether considering all tumors or those that regressed in a pattern compatible with exponential decay. Dark brown pigmentation was associated with slower regression. In multivariate modeling, initial tumor thickness remained the predominant and robust predictor of tumor regression (P future analyses efficiently without matching. Copyright © 2018 American Academy of Ophthalmology. Published by Elsevier Inc. All rights reserved.
Real-time prediction of respiratory motion based on local regression methods
International Nuclear Information System (INIS)
Ruan, D; Fessler, J A; Balter, J M
2007-01-01
Recent developments in modulation techniques enable conformal delivery of radiation doses to small, localized target volumes. One of the challenges in using these techniques is real-time tracking and predicting target motion, which is necessary to accommodate system latencies. For image-guided-radiotherapy systems, it is also desirable to minimize sampling rates to reduce imaging dose. This study focuses on predicting respiratory motion, which can significantly affect lung tumours. Predicting respiratory motion in real-time is challenging, due to the complexity of breathing patterns and the many sources of variability. We propose a prediction method based on local regression. There are three major ingredients of this approach: (1) forming an augmented state space to capture system dynamics, (2) local regression in the augmented space to train the predictor from previous observation data using semi-periodicity of respiratory motion, (3) local weighting adjustment to incorporate fading temporal correlations. To evaluate prediction accuracy, we computed the root mean square error between predicted tumor motion and its observed location for ten patients. For comparison, we also investigated commonly used predictive methods, namely linear prediction, neural networks and Kalman filtering to the same data. The proposed method reduced the prediction error for all imaging rates and latency lengths, particularly for long prediction lengths
Masselot, Pierre; Chebana, Fateh; Bélanger, Diane; St-Hilaire, André; Abdous, Belkacem; Gosselin, Pierre; Ouarda, Taha B. M. J.
2018-01-01
In a number of environmental studies, relationships between natural processes are often assessed through regression analyses, using time series data. Such data are often multi-scale and non-stationary, leading to a poor accuracy of the resulting regression models and therefore to results with moderate reliability. To deal with this issue, the present paper introduces the EMD-regression methodology consisting in applying the empirical mode decomposition (EMD) algorithm on data series and then using the resulting components in regression models. The proposed methodology presents a number of advantages. First, it accounts of the issues of non-stationarity associated to the data series. Second, this approach acts as a scan for the relationship between a response variable and the predictors at different time scales, providing new insights about this relationship. To illustrate the proposed methodology it is applied to study the relationship between weather and cardiovascular mortality in Montreal, Canada. The results shed new knowledge concerning the studied relationship. For instance, they show that the humidity can cause excess mortality at the monthly time scale, which is a scale not visible in classical models. A comparison is also conducted with state of the art methods which are the generalized additive models and distributed lag models, both widely used in weather-related health studies. The comparison shows that EMD-regression achieves better prediction performances and provides more details than classical models concerning the relationship.
Evaluating penalized logistic regression models to predict Heat-Related Electric grid stress days
Energy Technology Data Exchange (ETDEWEB)
Bramer, L. M.; Rounds, J.; Burleyson, C. D.; Fortin, D.; Hathaway, J.; Rice, J.; Kraucunas, I.
2017-11-01
Understanding the conditions associated with stress on the electricity grid is important in the development of contingency plans for maintaining reliability during periods when the grid is stressed. In this paper, heat-related grid stress and the relationship with weather conditions is examined using data from the eastern United States. Penalized logistic regression models were developed and applied to predict stress on the electric grid using weather data. The inclusion of other weather variables, such as precipitation, in addition to temperature improved model performance. Several candidate models and datasets were examined. A penalized logistic regression model fit at the operation-zone level was found to provide predictive value and interpretability. Additionally, the importance of different weather variables observed at different time scales were examined. Maximum temperature and precipitation were identified as important across all zones while the importance of other weather variables was zone specific. The methods presented in this work are extensible to other regions and can be used to aid in planning and development of the electrical grid.
Time-trend of melanoma screening practice by primary care physicians: a meta-regression analysis.
Valachis, Antonis; Mauri, Davide; Karampoiki, Vassiliki; Polyzos, Nikolaos P; Cortinovis, Ivan; Koukourakis, Georgios; Zacharias, Georgios; Xilomenos, Apostolos; Tsappi, Maria; Casazza, Giovanni
2009-01-01
To assess whether the proportion of primary care physicians implementing full body skin examination (FBSE) to screen for melanoma changed over time. Meta-regression analyses of available data. MEDLINE, ISI, Cochrane Central Register of Controlled Trials. Fifteen studies surveying 10,336 physicians were included in the analyses. Overall, 15%-82% of them reported to perform FBSE to screen for melanoma. The proportion of physicians using FBSE screening tended to decrease by 1.72% per year (P =0.086). Corresponding annual changes in European, North American, and Australian settings were -0.68% (P =0.494), -2.02% (P =0.044), and +2.59% (P =0.010), respectively. Changes were not influenced by national guide-lines. Considering the increasing incidence of melanoma and other skin malignancies, as well as their relative potential consequences, the FBSE implementation time-trend we retrieved should be considered a worrisome phenomenon.
Austin, Peter C
2018-01-01
The use of the Cox proportional hazards regression model is widespread. A key assumption of the model is that of proportional hazards. Analysts frequently test the validity of this assumption using statistical significance testing. However, the statistical power of such assessments is frequently unknown. We used Monte Carlo simulations to estimate the statistical power of two different methods for detecting violations of this assumption. When the covariate was binary, we found that a model-based method had greater power than a method based on cumulative sums of martingale residuals. Furthermore, the parametric nature of the distribution of event times had an impact on power when the covariate was binary. Statistical power to detect a strong violation of the proportional hazards assumption was low to moderate even when the number of observed events was high. In many data sets, power to detect a violation of this assumption is likely to be low to modest.
Directory of Open Access Journals (Sweden)
Soldić-Aleksić Jasna
2009-01-01
Full Text Available Market segmentation presents one of the key concepts of the modern marketing. The main goal of market segmentation is focused on creating groups (segments of customers that have similar characteristics, needs, wishes and/or similar behavior regarding the purchase of concrete product/service. Companies can create specific marketing plan for each of these segments and therefore gain short or long term competitive advantage on the market. Depending on the concrete marketing goal, different segmentation schemes and techniques may be applied. This paper presents a predictive market segmentation model based on the application of logistic regression model and CHAID analysis. The logistic regression model was used for the purpose of variables selection (from the initial pool of eleven variables which are statistically significant for explaining the dependent variable. Selected variables were afterwards included in the CHAID procedure that generated the predictive market segmentation model. The model results are presented on the concrete empirical example in the following form: summary model results, CHAID tree, Gain chart, Index chart, risk and classification tables.
Koon, Sharon; Petscher, Yaacov
2015-01-01
The purpose of this report was to explicate the use of logistic regression and classification and regression tree (CART) analysis in the development of early warning systems. It was motivated by state education leaders' interest in maintaining high classification accuracy while simultaneously improving practitioner understanding of the rules by…
DEFF Research Database (Denmark)
Tan, Qihua; Bathum, L; Christiansen, L
2003-01-01
In this paper, we apply logistic regression models to measure genetic association with human survival for highly polymorphic and pleiotropic genes. By modelling genotype frequency as a function of age, we introduce a logistic regression model with polytomous responses to handle the polymorphic...... situation. Genotype and allele-based parameterization can be used to investigate the modes of gene action and to reduce the number of parameters, so that the power is increased while the amount of multiple testing minimized. A binomial logistic regression model with fractional polynomials is used to capture...
International Nuclear Information System (INIS)
Liu, Jie
2015-01-01
This Ph. D. work is motivated by the possibility of monitoring the conditions of components of energy systems for their extended and safe use, under proper practice of operation and adequate policies of maintenance. The aim is to develop a Support Vector Regression (SVR)-based framework for predicting time series data under stationary/nonstationary environmental and operational conditions. Single SVR and SVR-based ensemble approaches are developed to tackle the prediction problem based on both small and large datasets. Strategies are proposed for adaptively updating the single SVR and SVR-based ensemble models in the existence of pattern drifts. Comparisons with other online learning approaches for kernel-based modelling are provided with reference to time series data from a critical component in Nuclear Power Plants (NPPs) provided by Electricite de France (EDF). The results show that the proposed approaches achieve comparable prediction results, considering the Mean Squared Error (MSE) and Mean Relative Error (MRE), in much less computation time. Furthermore, by analyzing the geometrical meaning of the Feature Vector Selection (FVS) method proposed in the literature, a novel geometrically interpretable kernel method, named Reduced Rank Kernel Ridge Regression-II (RRKRR-II), is proposed to describe the linear relations between a predicted value and the predicted values of the Feature Vectors (FVs) selected by FVS. Comparisons with several kernel methods on a number of public datasets prove the good prediction accuracy and the easy-of-tuning of the hyper-parameters of RRKRR-II. (author)
Davies, Patrick Laurie
2014-01-01
Introduction IntroductionApproximate Models Notation Two Modes of Statistical AnalysisTowards One Mode of Analysis Approximation, Randomness, Chaos, Determinism ApproximationA Concept of Approximation Approximation Approximating a Data Set by a Model Approximation Regions Functionals and EquivarianceRegularization and Optimality Metrics and DiscrepanciesStrong and Weak Topologies On Being (almost) Honest Simulations and Tables Degree of Approximation and p-values ScalesStability of Analysis The Choice of En(α, P) Independence Procedures, Approximation and VaguenessDiscrete Models The Empirical Density Metrics and Discrepancies The Total Variation Metric The Kullback-Leibler and Chi-Squared Discrepancies The Po(λ) ModelThe b(k, p) and nb(k, p) Models The Flying Bomb Data The Student Study Times Data OutliersOutliers, Data Analysis and Models Breakdown Points and Equivariance Identifying Outliers and Breakdown Outliers in Multivariate Data Outliers in Linear Regression Outliers in Structured Data The Location...
Using regression heteroscedasticity to model trends in the mean and variance of floods
Hecht, Jory; Vogel, Richard
2015-04-01
Changes in the frequency of extreme floods have been observed and anticipated in many hydrological settings in response to numerous drivers of environmental change, including climate, land cover, and infrastructure. To help decision-makers design flood control infrastructure in settings with non-stationary hydrological regimes, a parsimonious approach for detecting and modeling trends in extreme floods is needed. An approach using ordinary least squares (OLS) to fit a heteroscedastic regression model can accommodate nonstationarity in both the mean and variance of flood series while simultaneously offering a means of (i) analytically evaluating type I and type II trend detection errors, (ii) analytically generating expressions of uncertainty, such as confidence and prediction intervals, (iii) providing updated estimates of the frequency of floods exceeding the flood of record, (iv) accommodating a wide range of non-linear functions through ladder of powers transformations, and (v) communicating hydrological changes in a single graphical image. Previous research has shown that the two-parameter lognormal distribution can adequately model the annual maximum flood distribution of both stationary and non-stationary hydrological regimes in many regions of the United States. A simple logarithmic transformation of annual maximum flood series enables an OLS heteroscedastic regression modeling approach to be especially suitable for creating a non-stationary flood frequency distribution with parameters that are conditional upon time or physically meaningful covariates. While heteroscedasticity is often viewed as an impediment, we document how detecting and modeling heteroscedasticity presents an opportunity for characterizing both the conditional mean and variance of annual maximum floods. We introduce an approach through which variance trend models can be analytically derived from the behavior of residuals of the conditional mean flood model. Through case studies of
Describing Growth Pattern of Bali Cows Using Non-linear Regression Models
Directory of Open Access Journals (Sweden)
Mohd. Hafiz A.W
2016-12-01
Full Text Available The objective of this study was to evaluate the best fit non-linear regression model to describe the growth pattern of Bali cows. Estimates of asymptotic mature weight, rate of maturing and constant of integration were derived from Brody, von Bertalanffy, Gompertz and Logistic models which were fitted to cross-sectional data of body weight taken from 74 Bali cows raised in MARDI Research Station Muadzam Shah Pahang. Coefficient of determination (R2 and residual mean squares (MSE were used to determine the best fit model in describing the growth pattern of Bali cows. Von Bertalanffy model was the best model among the four growth functions evaluated to determine the mature weight of Bali cattle as shown by the highest R2 and lowest MSE values (0.973 and 601.9, respectively, followed by Gompertz (0.972 and 621.2, respectively, Logistic (0.971 and 648.4, respectively and Brody (0.932 and 660.5, respectively models. The correlation between rate of maturing and mature weight was found to be negative in the range of -0.170 to -0.929 for all models, indicating that animals of heavier mature weight had lower rate of maturing. The use of non-linear model could summarize the weight-age relationship into several biologically interpreted parameters compared to the entire lifespan weight-age data points that are difficult and time consuming to interpret.
Perceived Organizational Support for Enhancing Welfare at Work: A Regression Tree Model
Giorgi, Gabriele; Dubin, David; Perez, Javier Fiz
2016-01-01
When trying to examine outcomes such as welfare and well-being, research tends to focus on main effects and take into account limited numbers of variables at a time. There are a number of techniques that may help address this problem. For example, many statistical packages available in R provide easy-to-use methods of modeling complicated analysis such as classification and tree regression (i.e., recursive partitioning). The present research illustrates the value of recursive partitioning in the prediction of perceived organizational support in a sample of more than 6000 Italian bankers. Utilizing the tree function party package in R, we estimated a regression tree model predicting perceived organizational support from a multitude of job characteristics including job demand, lack of job control, lack of supervisor support, training, etc. The resulting model appears particularly helpful in pointing out several interactions in the prediction of perceived organizational support. In particular, training is the dominant factor. Another dimension that seems to influence organizational support is reporting (perceived communication about safety and stress concerns). Results are discussed from a theoretical and methodological point of view. PMID:28082924
Evaluation of Regression and Neuro_Fuzzy Models in Estimating Saturated Hydraulic Conductivity
Directory of Open Access Journals (Sweden)
J. Behmanesh
2015-06-01
Full Text Available Study of soil hydraulic properties such as saturated and unsaturated hydraulic conductivity is required in the environmental investigations. Despite numerous research, measuring saturated hydraulic conductivity using by direct methods are still costly, time consuming and professional. Therefore estimating saturated hydraulic conductivity using rapid and low cost methods such as pedo-transfer functions with acceptable accuracy was developed. The purpose of this research was to compare and evaluate 11 pedo-transfer functions and Adaptive Neuro-Fuzzy Inference System (ANFIS to estimate saturated hydraulic conductivity of soil. In this direct, saturated hydraulic conductivity and physical properties in 40 points of Urmia were calculated. The soil excavated was used in the lab to determine its easily accessible parameters. The results showed that among existing models, Aimrun et al model had the best estimation for soil saturated hydraulic conductivity. For mentioned model, the Root Mean Square Error and Mean Absolute Error parameters were 0.174 and 0.028 m/day respectively. The results of the present research, emphasises the importance of effective porosity application as an important accessible parameter in accuracy of pedo-transfer functions. sand and silt percent, bulk density and soil particle density were selected to apply in 561 ANFIS models. In training phase of best ANFIS model, the R2 and RMSE were calculated 1 and 1.2×10-7 respectively. These amounts in the test phase were 0.98 and 0.0006 respectively. Comparison of regression and ANFIS models showed that the ANFIS model had better results than regression functions. Also Nuro-Fuzzy Inference System had capability to estimatae with high accuracy in various soil textures.
MCKissick, Burnell T. (Technical Monitor); Plassman, Gerald E.; Mall, Gerald H.; Quagliano, John R.
2005-01-01
Linear multivariable regression models for predicting day and night Eddy Dissipation Rate (EDR) from available meteorological data sources are defined and validated. Model definition is based on a combination of 1997-2000 Dallas/Fort Worth (DFW) data sources, EDR from Aircraft Vortex Spacing System (AVOSS) deployment data, and regression variables primarily from corresponding Automated Surface Observation System (ASOS) data. Model validation is accomplished through EDR predictions on a similar combination of 1994-1995 Memphis (MEM) AVOSS and ASOS data. Model forms include an intercept plus a single term of fixed optimal power for each of these regression variables; 30-minute forward averaged mean and variance of near-surface wind speed and temperature, variance of wind direction, and a discrete cloud cover metric. Distinct day and night models, regressing on EDR and the natural log of EDR respectively, yield best performance and avoid model discontinuity over day/night data boundaries.
Amalia, Junita; Purhadi, Otok, Bambang Widjanarko
2017-11-01
Poisson distribution is a discrete distribution with count data as the random variables and it has one parameter defines both mean and variance. Poisson regression assumes mean and variance should be same (equidispersion). Nonetheless, some case of the count data unsatisfied this assumption because variance exceeds mean (over-dispersion). The ignorance of over-dispersion causes underestimates in standard error. Furthermore, it causes incorrect decision in the statistical test. Previously, paired count data has a correlation and it has bivariate Poisson distribution. If there is over-dispersion, modeling paired count data is not sufficient with simple bivariate Poisson regression. Bivariate Poisson Inverse Gaussian Regression (BPIGR) model is mix Poisson regression for modeling paired count data within over-dispersion. BPIGR model produces a global model for all locations. In another hand, each location has different geographic conditions, social, cultural and economic so that Geographically Weighted Regression (GWR) is needed. The weighting function of each location in GWR generates a different local model. Geographically Weighted Bivariate Poisson Inverse Gaussian Regression (GWBPIGR) model is used to solve over-dispersion and to generate local models. Parameter estimation of GWBPIGR model obtained by Maximum Likelihood Estimation (MLE) method. Meanwhile, hypothesis testing of GWBPIGR model acquired by Maximum Likelihood Ratio Test (MLRT) method.
Regression models for analyzing radiological visual grading studies--an empirical comparison.
Saffari, S Ehsan; Löve, Áskell; Fredrikson, Mats; Smedby, Örjan
2015-10-30
For optimizing and evaluating image quality in medical imaging, one can use visual grading experiments, where observers rate some aspect of image quality on an ordinal scale. To analyze the grading data, several regression methods are available, and this study aimed at empirically comparing such techniques, in particular when including random effects in the models, which is appropriate for observers and patients. Data were taken from a previous study where 6 observers graded or ranked in 40 patients the image quality of four imaging protocols, differing in radiation dose and image reconstruction method. The models tested included linear regression, the proportional odds model for ordinal logistic regression, the partial proportional odds model, the stereotype logistic regression model and rank-order logistic regression (for ranking data). In the first two models, random effects as well as fixed effects could be included; in the remaining three, only fixed effects. In general, the goodness of fit (AIC and McFadden's Pseudo R (2)) showed small differences between the models with fixed effects only. For the mixed-effects models, higher AIC and lower Pseudo R (2) was obtained, which may be related to the different number of parameters in these models. The estimated potential for dose reduction by new image reconstruction methods varied only slightly between models. The authors suggest that the most suitable approach may be to use ordinal logistic regression, which can handle ordinal data and random effects appropriately.
Genome-wide selection by mixed model ridge regression and extensions based on geostatistical models.
Schulz-Streeck, Torben; Piepho, Hans-Peter
2010-03-31
The success of genome-wide selection (GS) approaches will depend crucially on the availability of efficient and easy-to-use computational tools. Therefore, approaches that can be implemented using mixed models hold particular promise and deserve detailed study. A particular class of mixed models suitable for GS is given by geostatistical mixed models, when genetic distance is treated analogously to spatial distance in geostatistics. We consider various spatial mixed models for use in GS. The analyses presented for the QTL-MAS 2009 dataset pay particular attention to the modelling of residual errors as well as of polygenetic effects. It is shown that geostatistical models are viable alternatives to ridge regression, one of the common approaches to GS. Correlations between genome-wide estimated breeding values and true breeding values were between 0.879 and 0.889. In the example considered, we did not find a large effect of the residual error variance modelling, largely because error variances were very small. A variance components model reflecting the pedigree of the crosses did not provide an improved fit. We conclude that geostatistical models deserve further study as a tool to GS that is easily implemented in a mixed model package.
Hirsch, Robert M; Moyer, Douglas L; Archfield, Stacey A
2010-10-01
A new approach to the analysis of long-term surface water-quality data is proposed and implemented. The goal of this approach is to increase the amount of information that is extracted from the types of rich water-quality datasets that now exist. The method is formulated to allow for maximum flexibility in representations of the long-term trend, seasonal components, and discharge-related components of the behavior of the water-quality variable of interest. It is designed to provide internally consistent estimates of the actual history of concentrations and fluxes as well as histories that eliminate the influence of year-to-year variations in streamflow. The method employs the use of weighted regressions of concentrations on time, discharge, and season. Finally, the method is designed to be useful as a diagnostic tool regarding the kinds of changes that are taking place in the watershed related to point sources, groundwater sources, and surface-water nonpoint sources. The method is applied to datasets for the nine large tributaries of Chesapeake Bay from 1978 to 2008. The results show a wide range of patterns of change in total phosphorus and in dissolved nitrate plus nitrite. These results should prove useful in further examination of the causes of changes, or lack of changes, and may help inform decisions about future actions to reduce nutrient enrichment in the Chesapeake Bay and its watershed.Hirsch, Robert M., Douglas L. Moyer, and Stacey A. Archfield, 2010. Weighted Regressions on Time, Discharge, and Season (WRTDS), With an Application to Chesapeake Bay River Inputs. Journal of the American Water Resources Association (JAWRA) 46(5):857-880. DOI: 10.1111/j.1752-1688.2010.00482.x.
Gale, Ella M; Johns, Marcus A; Wirawan, Remigius H; Scott, Janet L
2017-07-21
Polysaccharides, such as cellulose, are often processed by dissolution in solvent mixtures, e.g. an ionic liquid (IL) combined with a dipolar aprotic co-solvent (CS) that the polymer does not dissolve in. A multi-walker, discrete-time, discrete-space 1-dimensional random walk can be applied to model solvation of a polymer in a multi-component solvent mixture. The number of IL pairs in a solvent mixture and the number of solvent shells formable, x, is associated with n, the model time-step, and N, the number of random walkers. The mean number of distinct sites visited is proportional to the amount of polymer soluble in a solution. By also fitting a polynomial regression model to the data, we can associate the random walk terms with chemical interactions between components and probe where the system deviates from a 1-D random walk. The 'frustration' between solvents shells is given as ln x in the random walk model and as a negative IL:IL interaction term in the regression model. This frustration appears in regime II of the random walk model (high volume fractions of IL) where walkers interfere with each other, and the system tends to its limiting behaviour. In the low concentration regime, (regime I) the solvent shells do not interact, and the system depends only on IL and CS terms. In both models (and both regimes), the system is almost entirely controlled by the volume available to solvation shells, and thus is a counting/space-filling problem, where the molar volume of the CS is important. Small deviations are observed when there is an IL-CS interaction. The use of two models, built on separate approaches, confirm these findings, demonstrating that this is a real effect and offering a route to identifying such systems. Specifically, the majority of CSs - such as dimethylformide - follow the random walk model, whilst 1-methylimidazole, dimethyl sulfoxide, 1,3-dimethyl-2-imidazolidinone and tetramethylurea offer a CS-mediated improvement and propylene carbonate
Formulating state space models in R with focus on longitudinal regression models
DEFF Research Database (Denmark)
Dethlefsen, Claus; Lundbye-Christensen, Søren
We provide a language for formulating a range of state space models. The described methodology is implemented in the R -package sspir available from cran.r-project.org . A state space model is specified similarly to a generalized linear model in R , by marking the time-varying terms in the form...... We provide a language for formulating a range of state space models. The described methodology is implemented in the R -package sspir available from cran.r-project.org . A state space model is specified similarly to a generalized linear model in R , by marking the time-varying terms...
Hiebert, Sara M; Green, Stephen A; Yellon, Steven M
2006-05-01
This study tested the efficacy of timed oral administration of melatonin as an alternative both to invasive methods (daily injections, timed infusions) and to untimed oral administration in Siberian hamsters (Phodopus sungorus), an important model for the study of photoperiodism. Hamsters readily consumed a small piece of melatonin-treated apple immediately when presented and circulating melatonin was rapidly elevated with a half-life of approximately 3.5 h. Melatonin-treated apple was fed to hamsters for 3 weeks at 2 h before lights off to extend the duration of the nighttime rise in endogenous melatonin. Melatonin treatment induced testicular regression and elevated serum cortisol, effects comparable to those in hamsters exposed to short days. These findings support the hypothesis that timed oral administration of melatonin can mimic the effects of short days and provide a method by which melatonin can be delivered without the potentially confounding effects of handling and injection stress.
DEFF Research Database (Denmark)
Strathe, Anders B; Mark, Thomas; Nielsen, Bjarne
2014-01-01
Random regression models were used to estimate covariance functions between cumulated feed intake (CFI) and body weight (BW) in 8424 Danish Duroc pigs. Random regressions on second order Legendre polynomials of age were used to describe genetic and permanent environmental curves in BW and CFI...
A connection between item/subtest regression and the Rasch model
Engelen, Ronald J.H.; Jannarone, Robert J.
1989-01-01
The purpose of this paper is to link empirical Bayes methods with two specific topics in item response theory--item/subtest regression, and testing the goodness of fit of the Rasch model--under the assumptions of local independence and sufficiency. It is shown that item/subtest regression results in
Bulcock, J. W.; And Others
Advantages of normalization regression estimation over ridge regression estimation are demonstrated by reference to Bloom's model of school learning. Theoretical concern centered on the structure of scholastic achievement at grade 10 in Canadian high schools. Data on 886 students were randomly sampled from the Carnegie Human Resources Data Bank.…
Brakebill, John W; Preston, Stephen D
2003-01-01
The U.S. Geological Survey has developed a methodology for statistically relating nutrient sources and land-surface characteristics to nutrient loads of streams. The methodology is referred to as SPAtially Referenced Regressions On Watershed attributes (SPARROW), and relates measured stream nutrient loads to nutrient sources using nonlinear statistical regression models. A spatially detailed digital hydrologic network of stream reaches, stream-reach characteristics such as mean streamflow, water velocity, reach length, and travel time, and their associated watersheds supports the regression models. This network serves as the primary framework for spatially referencing potential nutrient source information such as atmospheric deposition, septic systems, point-sources, land use, land cover, and agricultural sources and land-surface characteristics such as land use, land cover, average-annual precipitation and temperature, slope, and soil permeability. In the Chesapeake Bay watershed that covers parts of Delaware, Maryland, Pennsylvania, New York, Virginia, West Virginia, and Washington D.C., SPARROW was used to generate models estimating loads of total nitrogen and total phosphorus representing 1987 and 1992 land-surface conditions. The 1987 models used a hydrologic network derived from an enhanced version of the U.S. Environmental Protection Agency's digital River Reach File, and course resolution Digital Elevation Models (DEMs). A new hydrologic network was created to support the 1992 models by generating stream reaches representing surface-water pathways defined by flow direction and flow accumulation algorithms from higher resolution DEMs. On a reach-by-reach basis, stream reach characteristics essential to the modeling were transferred to the newly generated pathways or reaches from the enhanced River Reach File used to support the 1987 models. To complete the new network, watersheds for each reach were generated using the direction of surface-water flow derived
Model-free prediction and regression a transformation-based approach to inference
Politis, Dimitris N
2015-01-01
The Model-Free Prediction Principle expounded upon in this monograph is based on the simple notion of transforming a complex dataset to one that is easier to work with, e.g., i.i.d. or Gaussian. As such, it restores the emphasis on observable quantities, i.e., current and future data, as opposed to unobservable model parameters and estimates thereof, and yields optimal predictors in diverse settings such as regression and time series. Furthermore, the Model-Free Bootstrap takes us beyond point prediction in order to construct frequentist prediction intervals without resort to unrealistic assumptions such as normality. Prediction has been traditionally approached via a model-based paradigm, i.e., (a) fit a model to the data at hand, and (b) use the fitted model to extrapolate/predict future data. Due to both mathematical and computational constraints, 20th century statistical practice focused mostly on parametric models. Fortunately, with the advent of widely accessible powerful computing in the late 1970s, co...
Modeling group size and scalar stress by logistic regression from an archaeological perspective.
Directory of Open Access Journals (Sweden)
Gianmarco Alberti
Full Text Available Johnson's scalar stress theory, describing the mechanics of (and the remedies to the increase in in-group conflictuality that parallels the increase in groups' size, provides scholars with a useful theoretical framework for the understanding of different aspects of the material culture of past communities (i.e., social organization, communal food consumption, ceramic style, architecture and settlement layout. Due to its relevance in archaeology and anthropology, the article aims at proposing a predictive model of critical level of scalar stress on the basis of community size. Drawing upon Johnson's theory and on Dunbar's findings on the cognitive constrains to human group size, a model is built by means of Logistic Regression on the basis of the data on colony fissioning among the Hutterites of North America. On the grounds of the theoretical framework sketched in the first part of the article, the absence or presence of colony fissioning is considered expression of not critical vs. critical level of scalar stress for the sake of the model building. The model, which is also tested against a sample of archaeological and ethnographic cases: a confirms the existence of a significant relationship between critical scalar stress and group size, setting the issue on firmer statistical grounds; b allows calculating the intercept and slope of the logistic regression model, which can be used in any time to estimate the probability that a community experienced a critical level of scalar stress; c allows locating a critical scalar stress threshold at community size 127 (95% CI: 122-132, while the maximum probability of critical scale stress is predicted at size 158 (95% CI: 147-170. The model ultimately provides grounds to assess, for the sake of any further archaeological/anthropological interpretation, the probability that a group reached a hot spot of size development critical for its internal cohesion.
Shirk, Andrew J; Landguth, Erin L; Cushman, Samuel A
2018-01-01
Anthropogenic migration barriers fragment many populations and limit the ability of species to respond to climate-induced biome shifts. Conservation actions designed to conserve habitat connectivity and mitigate barriers are needed to unite fragmented populations into larger, more viable metapopulations, and to allow species to track their climate envelope over time. Landscape genetic analysis provides an empirical means to infer landscape factors influencing gene flow and thereby inform such conservation actions. However, there are currently many methods available for model selection in landscape genetics, and considerable uncertainty as to which provide the greatest accuracy in identifying the true landscape model influencing gene flow among competing alternative hypotheses. In this study, we used population genetic simulations to evaluate the performance of seven regression-based model selection methods on a broad array of landscapes that varied by the number and type of variables contributing to resistance, the magnitude and cohesion of resistance, as well as the functional relationship between variables and resistance. We also assessed the effect of transformations designed to linearize the relationship between genetic and landscape distances. We found that linear mixed effects models had the highest accuracy in every way we evaluated model performance; however, other methods also performed well in many circumstances, particularly when landscape resistance was high and the correlation among competing hypotheses was limited. Our results provide guidance for which regression-based model selection methods provide the most accurate inferences in landscape genetic analysis and thereby best inform connectivity conservation actions. Published 2017. This article is a U.S. Government work and is in the public domain in the USA.
Land-use regression panel models of NO2 concentrations in Seoul, Korea
Kim, Youngkook; Guldmann, Jean-Michel
2015-04-01
Transportation and land-use activities are major air pollution contributors. Since their shares of emissions vary across space and time, so do air pollution concentrations. Despite these variations, panel data have rarely been used in land-use regression (LUR) modeling of air pollution. In addition, the complex interactions between traffic flows, land uses, and meteorological variables, have not been satisfactorily investigated in LUR models. The purpose of this research is to develop and estimate nitrogen dioxide (NO2) panel models based on the LUR framework with data for Seoul, Korea, accounting for the impacts of these variables, and their interactions with spatial and temporal dummy variables. The panel data vary over several scales: daily (24 h), seasonally (4), and spatially (34 intra-urban measurement locations). To enhance model explanatory power, wind direction and distance decay effects are accounted for. The results show that vehicle-kilometers-traveled (VKT) and solar radiation have statistically strong positive and negative impacts on NO2 concentrations across the four seasonal models. In addition, there are significant interactions with the dummy variables, pointing to VKT and solar radiation effects on NO2 concentrations that vary with time and intra-urban location. The results also show that residential, commercial, and industrial land uses, and wind speed, temperature, and humidity, all impact NO2 concentrations. The R2 vary between 0.95 and 0.98.
Directory of Open Access Journals (Sweden)
DT Wiyanti
2013-07-01
Full Text Available Salah satu metode peramalan yang paling dikembangkan saat ini adalah time series, yakni menggunakan pendekatan kuantitatif dengan data masa lampau yang dijadikan acuan untuk peramalan masa depan. Berbagai penelitian telah mengusulkan metode-metode untuk menyelesaikan time series, di antaranya statistik, jaringan syaraf, wavelet, dan sistem fuzzy. Metode-metode tersebut memiliki kekurangan dan keunggulan yang berbeda. Namun permasalahan yang ada dalam dunia nyata merupakan masalah yang kompleks. Satu metode saja mungkin tidak mampu mengatasi masalah tersebut dengan baik. Dalam artikel ini dibahas penggabungan dua buah metode yaitu Auto Regressive Integrated Moving Average (ARIMA dan Radial Basis Function (RBF. Alasan penggabungan kedua metode ini adalah karena adanya asumsi bahwa metode tunggal tidak dapat secara total mengidentifikasi semua karakteristik time series. Pada artikel ini dibahas peramalan terhadap data Indeks Harga Perdagangan Besar (IHPB dan data inflasi komoditi Indonesia; kedua data berada pada rentang tahun 2006 hingga beberapa bulan di tahun 2012. Kedua data tersebut masing-masing memiliki enam variabel. Hasil peramalan metode ARIMA-RBF dibandingkan dengan metode ARIMA dan metode RBF secara individual. Hasil analisa menunjukkan bahwa dengan metode penggabungan ARIMA dan RBF, model yang diberikan memiliki hasil yang lebih akurat dibandingkan dengan penggunaan salah satu metode saja. Hal ini terlihat dalam visual plot, MAPE, dan RMSE dari semua variabel pada dua data uji coba.Â The accuracy of time series forecasting is the subject of many decision-making processes. Time series use a quantitative approach to employ data from the past to make forecast for the future. Many researches have proposed several methods to solve time series, such as using statistics, neural networks, wavelets, and fuzzy systems. These methods have different advantages and disadvantages. But often the problem in the real world is just too complex that a
[Evaluation of estimation of prevalence ratio using bayesian log-binomial regression model].
Gao, W L; Lin, H; Liu, X N; Ren, X W; Li, J S; Shen, X P; Zhu, S L
2017-03-10
To evaluate the estimation of prevalence ratio ( PR ) by using bayesian log-binomial regression model and its application, we estimated the PR of medical care-seeking prevalence to caregivers' recognition of risk signs of diarrhea in their infants by using bayesian log-binomial regression model in Openbugs software. The results showed that caregivers' recognition of infant' s risk signs of diarrhea was associated significantly with a 13% increase of medical care-seeking. Meanwhile, we compared the differences in PR 's point estimation and its interval estimation of medical care-seeking prevalence to caregivers' recognition of risk signs of diarrhea and convergence of three models (model 1: not adjusting for the covariates; model 2: adjusting for duration of caregivers' education, model 3: adjusting for distance between village and township and child month-age based on model 2) between bayesian log-binomial regression model and conventional log-binomial regression model. The results showed that all three bayesian log-binomial regression models were convergence and the estimated PRs were 1.130(95 %CI : 1.005-1.265), 1.128(95 %CI : 1.001-1.264) and 1.132(95 %CI : 1.004-1.267), respectively. Conventional log-binomial regression model 1 and model 2 were convergence and their PRs were 1.130(95 % CI : 1.055-1.206) and 1.126(95 % CI : 1.051-1.203), respectively, but the model 3 was misconvergence, so COPY method was used to estimate PR , which was 1.125 (95 %CI : 1.051-1.200). In addition, the point estimation and interval estimation of PRs from three bayesian log-binomial regression models differed slightly from those of PRs from conventional log-binomial regression model, but they had a good consistency in estimating PR . Therefore, bayesian log-binomial regression model can effectively estimate PR with less misconvergence and have more advantages in application compared with conventional log-binomial regression model.
Risk of Recurrence in Operated Parasagittal Meningiomas: A Logistic Binary Regression Model.
Escribano Mesa, José Alberto; Alonso Morillejo, Enrique; Parrón Carreño, Tesifón; Huete Allut, Antonio; Narro Donate, José María; Méndez Román, Paddy; Contreras Jiménez, Ascensión; Pedrero García, Francisco; Masegosa González, José
2018-02-01
Parasagittal meningiomas arise from the arachnoid cells of the angle formed between the superior sagittal sinus (SSS) and the brain convexity. In this retrospective study, we focused on factors that predict early recurrence and recurrence times. We reviewed 125 patients with parasagittal meningiomas operated from 1985 to 2014. We studied the following variables: age, sex, location, laterality, histology, surgeons, invasion of the SSS, Simpson removal grade, follow-up time, angiography, embolization, radiotherapy, recurrence and recurrence time, reoperation, neurologic deficit, degree of dependency, and patient status at the end of follow-up. Patients ranged in age from 26 to 81 years (mean 57.86 years; median 60 years). There were 44 men (35.2%) and 81 women (64.8%). There were 57 patients with neurologic deficits (45.2%). The most common presenting symptom was motor deficit. World Health Organization grade I tumors were identified in 104 patients (84.6%), and the majority were the meningothelial type. Recurrence was detected in 34 cases. Time of recurrence was 9 to 336 months (mean: 84.4 months; median: 79.5 months). Male sex was identified as an independent risk for recurrence with relative risk 2.7 (95% confidence interval 1.21-6.15), P = 0.014. Kaplan-Meier curves for recurrence had statistically significant differences depending on sex, age, histologic type, and World Health Organization histologic grade. A binary logistic regression was made with the Hosmer-Lemeshow test with P > 0.05; sex, tumor size, and histologic type were used in this model. Male sex is an independent risk factor for recurrence that, associated with other factors such tumor size and histologic type, explains 74.5% of all cases in a binary regression model. Copyright © 2017 Elsevier Inc. All rights reserved.
Wang, Yubo; Veluvolu, Kalyana C
2017-06-14
It is often difficult to analyze biological signals because of their nonlinear and non-stationary characteristics. This necessitates the usage of time-frequency decomposition methods for analyzing the subtle changes in these signals that are often connected to an underlying phenomena. This paper presents a new approach to analyze the time-varying characteristics of such signals by employing a simple truncated Fourier series model, namely the band-limited multiple Fourier linear combiner (BMFLC). In contrast to the earlier designs, we first identified the sparsity imposed on the signal model in order to reformulate the model to a sparse linear regression model. The coefficients of the proposed model are then estimated by a convex optimization algorithm. The performance of the proposed method was analyzed with benchmark test signals. An energy ratio metric is employed to quantify the spectral performance and results show that the proposed method Sparse-BMFLC has high mean energy (0.9976) ratio and outperforms existing methods such as short-time Fourier transfrom (STFT), continuous Wavelet transform (CWT) and BMFLC Kalman Smoother. Furthermore, the proposed method provides an overall 6.22% in reconstruction error.
Directory of Open Access Journals (Sweden)
Yubo Wang
2017-06-01
Full Text Available It is often difficult to analyze biological signals because of their nonlinear and non-stationary characteristics. This necessitates the usage of time-frequency decomposition methods for analyzing the subtle changes in these signals that are often connected to an underlying phenomena. This paper presents a new approach to analyze the time-varying characteristics of such signals by employing a simple truncated Fourier series model, namely the band-limited multiple Fourier linear combiner (BMFLC. In contrast to the earlier designs, we first identified the sparsity imposed on the signal model in order to reformulate the model to a sparse linear regression model. The coefficients of the proposed model are then estimated by a convex optimization algorithm. The performance of the proposed method was analyzed with benchmark test signals. An energy ratio metric is employed to quantify the spectral performance and results show that the proposed method Sparse-BMFLC has high mean energy (0.9976 ratio and outperforms existing methods such as short-time Fourier transfrom (STFT, continuous Wavelet transform (CWT and BMFLC Kalman Smoother. Furthermore, the proposed method provides an overall 6.22% in reconstruction error.
A Regression Algorithm for Model Reduction of Large-Scale Multi-Dimensional Problems
Rasekh, Ehsan
2011-11-01
Model reduction is an approach for fast and cost-efficient modelling of large-scale systems governed by Ordinary Differential Equations (ODEs). Multi-dimensional model reduction has been suggested for reduction of the linear systems simultaneously with respect to frequency and any other parameter of interest. Multi-dimensional model reduction is also used to reduce the weakly nonlinear systems based on Volterra theory. Multiple dimensions degrade the efficiency of reduction by increasing the size of the projection matrix. In this paper a new methodology is proposed to efficiently build the reduced model based on regression analysis. A numerical example confirms the validity of the proposed regression algorithm for model reduction.
Formulating state space models in R with focus on longitudinal regression models
DEFF Research Database (Denmark)
Dethlefsen, Claus; Lundbye-Christensen, Søren
2006-01-01
We provide a language for formulating a range of state space models with response densities within the exponential family. The described methodology is implemented in the R-package sspir. A state space model is specified similarly to a generalized linear model in R, and then the time-varying terms...
Amaliana, Luthfatul; Sa'adah, Umu; Wayan Surya Wardhani, Ni
2017-12-01
Tetanus Neonatorum is an infectious disease that can be prevented by immunization. The number of Tetanus Neonatorum cases in East Java Province is the highest in Indonesia until 2015. Tetanus Neonatorum data contain over dispersion and big enough proportion of zero-inflation. Negative Binomial (NB) regression is an alternative method when over dispersion happens in Poisson regression. However, the data containing over dispersion and zero-inflation are more appropriately analyzed by using Zero-Inflated Negative Binomial (ZINB) regression. The purpose of this study are: (1) to model Tetanus Neonatorum cases in East Java Province with 71.05 percent proportion of zero-inflation by using NB and ZINB regression, (2) to obtain the best model. The result of this study indicates that ZINB is better than NB regression with smaller AIC.
Societies, *Conflict, Statistical processes, Behavioral science, Motivation, Social psychology, Nations, Political science, Correlation techniques, Pattern recognition, Hypotheses, Regression analysis
Stone, Wesley W.; Crawford, Charles G.; Gilliom, Robert J.
2013-01-01
Watershed Regressions for Pesticides for multiple pesticides (WARP-MP) are statistical models developed to predict concentration statistics for a wide range of pesticides in unmonitored streams. The WARP-MP models use the national atrazine WARP models in conjunction with an adjustment factor for each additional pesticide. The WARP-MP models perform best for pesticides with application timing and methods similar to those used with atrazine. For other pesticides, WARP-MP models tend to overpredict concentration statistics for the model development sites. For WARP and WARP-MP, the less-than-ideal sampling frequency for the model development sites leads to underestimation of the shorter-duration concentration; hence, the WARP models tend to underpredict 4- and 21-d maximum moving-average concentrations, with median errors ranging from 9 to 38% As a result of this sampling bias, pesticides that performed well with the model development sites are expected to have predictions that are biased low for these shorter-duration concentration statistics. The overprediction by WARP-MP apparent for some of the pesticides is variably offset by underestimation of the model development concentration statistics. Of the 112 pesticides used in the WARP-MP application to stream segments nationwide, 25 were predicted to have concentration statistics with a 50% or greater probability of exceeding one or more aquatic life benchmarks in one or more stream segments. Geographically, many of the modeled streams in the Corn Belt Region were predicted to have one or more pesticides that exceeded an aquatic life benchmark during 2009, indicating the potential vulnerability of streams in this region.
A Technique of Fuzzy C-Mean in Multiple Linear Regression Model toward Paddy Yield
Syazwan Wahab, Nur; Saifullah Rusiman, Mohd; Mohamad, Mahathir; Amira Azmi, Nur; Che Him, Norziha; Ghazali Kamardan, M.; Ali, Maselan
2018-04-01
In this paper, we propose a hybrid model which is a combination of multiple linear regression model and fuzzy c-means method. This research involved a relationship between 20 variates of the top soil that are analyzed prior to planting of paddy yields at standard fertilizer rates. Data used were from the multi-location trials for rice carried out by MARDI at major paddy granary in Peninsular Malaysia during the period from 2009 to 2012. Missing observations were estimated using mean estimation techniques. The data were analyzed using multiple linear regression model and a combination of multiple linear regression model and fuzzy c-means method. Analysis of normality and multicollinearity indicate that the data is normally scattered without multicollinearity among independent variables. Analysis of fuzzy c-means cluster the yield of paddy into two clusters before the multiple linear regression model can be used. The comparison between two method indicate that the hybrid of multiple linear regression model and fuzzy c-means method outperform the multiple linear regression model with lower value of mean square error.
As a fast and effective technique, the multiple linear regression (MLR) method has been widely used in modeling and prediction of beach bacteria concentrations. Among previous works on this subject, however, several issues were insufficiently or inconsistently addressed. Those is...
Rank Set Sampling in Improving the Estimates of Simple Regression Model
Directory of Open Access Journals (Sweden)
M Iqbal Jeelani
2015-04-01
Full Text Available In this paper Rank set sampling (RSS is introduced with a view of increasing the efficiency of estimates of Simple regression model. Regression model is considered with respect to samples taken from sampling techniques like Simple random sampling (SRS, Systematic sampling (SYS and Rank set sampling (RSS. It is found that R2 and Adj R2 obtained from regression model based on Rank set sample is higher than rest of two sampling schemes. Similarly Root mean square error, p-values, coefficient of variation are much lower in Rank set based regression model, also under validation technique (Jackknifing there is consistency in the measure of R2, Adj R2 and RMSE in case of RSS as compared to SRS and SYS. Results are supported with an empirical study involving a real data set generated of Pinus Wallichiana taken from block Langate of district Kupwara.
Reflexion on linear regression trip production modelling method for ensuring good model quality
Suprayitno, Hitapriya; Ratnasari, Vita
2017-11-01
Transport Modelling is important. For certain cases, the conventional model still has to be used, in which having a good trip production model is capital. A good model can only be obtained from a good sample. Two of the basic principles of a good sampling is having a sample capable to represent the population characteristics and capable to produce an acceptable error at a certain confidence level. It seems that this principle is not yet quite understood and used in trip production modeling. Therefore, investigating the Trip Production Modelling practice in Indonesia and try to formulate a better modeling method for ensuring the Model Quality is necessary. This research result is presented as follows. Statistics knows a method to calculate span of prediction value at a certain confidence level for linear regression, which is called Confidence Interval of Predicted Value. The common modeling practice uses R2 as the principal quality measure, the sampling practice varies and not always conform to the sampling principles. An experiment indicates that small sample is already capable to give excellent R2 value and sample composition can significantly change the model. Hence, good R2 value, in fact, does not always mean good model quality. These lead to three basic ideas for ensuring good model quality, i.e. reformulating quality measure, calculation procedure, and sampling method. A quality measure is defined as having a good R2 value and a good Confidence Interval of Predicted Value. Calculation procedure must incorporate statistical calculation method and appropriate statistical tests needed. A good sampling method must incorporate random well distributed stratified sampling with a certain minimum number of samples. These three ideas need to be more developed and tested.
Kamarianakis, Yiannis; Gao, H Oliver
2010-02-15
Collecting and analyzing high frequency emission measurements has become very usual during the past decade as significantly more information with respect to formation conditions can be collected than from regulated bag measurements. A challenging issue for researchers is the accurate time-alignment between tailpipe measurements and engine operating variables. An alignment procedure should take into account both the reaction time of the analyzers and the dynamics of gas transport in the exhaust and measurement systems. This paper discusses a statistical modeling framework that compensates for variable exhaust transport delay while relating tailpipe measurements with engine operating covariates. Specifically it is shown that some variants of the smooth transition regression model allow for transport delays that vary smoothly as functions of the exhaust flow rate. These functions are characterized by a pair of coefficients that can be estimated via a least-squares procedure. The proposed models can be adapted to encompass inherent nonlinearities that were implicit in previous instantaneous emissions modeling efforts. This article describes the methodology and presents an illustrative application which uses data collected from a diesel bus under real-world driving conditions.
Tsai, C C; Chen, Z S; Duh, C T; Horng, F W
2001-01-01
Techniques for conventional forest soil surveys in Taiwan need to be further developed in order to save time and money. Although some soil-landscape regression models have been developed to describe and predict soil properties and depths, they have seldom been studied in Taiwan. This study establishes linear soil-landscape regression models related to soil depths and landscape factors found in the forest soils of southern Taiwan. These models were evaluated by validating the models according to their mean errors and root mean square errors. The study was carried out at the 60,000 ha Chishan Forest Working Circle. About 310 soil pedons were collected. The landscape factors included elevation, slope, aspect, and surface stone contents. Sixty percent of the total field samples were used to establish the soil-landscape regression models, and forty % were used for validation. The sampling strategy indicated that each representative pedon covers an area of about 147 ha. The number of samples was appropriate considering the available time and budget. The single variate and/or multivariate linear regression soil-landscape models were successfully established. Those models revealed significant inter-relations among the soil depths of the B and B+BC horizons, solum thickness, and landscape factors, including slope and surface stone contents (p stone should be collected in a field soil survey to increase the precision of soil depth prediction of the B and B+BC horizons, and the solum thickness.
Yusuf, O B; Bamgboye, E A; Afolabi, R F; Shodimu, M A
2014-09-01
Logistic regression model is widely used in health research for description and predictive purposes. Unfortunately, most researchers are sometimes not aware that the underlying principles of the techniques have failed when the algorithm for maximum likelihood does not converge. Young researchers particularly postgraduate students may not know why separation problem whether quasi or complete occurs, how to identify it and how to fix it. This study was designed to critically evaluate convergence issues in articles that employed logistic regression analysis published in an African Journal of Medicine and medical sciences between 2004 and 2013. Problems of quasi or complete separation were described and were illustrated with the National Demographic and Health Survey dataset. A critical evaluation of articles that employed logistic regression was conducted. A total of 581 articles was reviewed, of which 40 (6.9%) used binary logistic regression. Twenty-four (60.0%) stated the use of logistic regression model in the methodology while none of the articles assessed model fit. Only 3 (12.5%) properly described the procedures. Of the 40 that used the logistic regression model, the problem of convergence occurred in 6 (15.0%) of the articles. Logistic regression tends to be poorly reported in studies published between 2004 and 2013. Our findings showed that the procedure may not be well understood by researchers since very few described the process in their reports and may be totally unaware of the problem of convergence or how to deal with it.
Directory of Open Access Journals (Sweden)
Hossein Fallahzadeh
2017-05-01
Full Text Available Introduction: Different statistical methods can be used to analyze fertility data. When the response variable is discrete, Poisson model is applied. If the condition does not hold for the Poisson model, its generalized model will be applied. The goal of this study was to compare the efficiency of generalized Poisson regression model with the standard Poisson regression model in estimating the coefficient of effective factors onthe current number of children. Methods: This is a cross-sectional study carried out on a populationof married women within the age range of15-49 years in Kashan, Iran. The cluster sampling method was used for data collection. Clusters consisted ofthe urbanblocksdeterminedby the municipality.Atotal number of10clusters each containing30households was selected according to the health center's framework. The necessary data were then collected through a self-madequestionnaireanddirectinterviewswith women under study. Further, the data analysiswas performed by usingthe standard and generalizedPoisson regression models through theRsoftware. Results: The average number of children for each woman was 1.45 with a variance of 1.073.A significant relationship was observed between the husband's age, number of unwanted pregnancies, and the average durationof breastfeeding with the present number of children in the two standard and generalized Poisson regression models (p < 0.05.The mean ageof women participating in thisstudy was33.1± 7.57 years (from 25.53 years to 40.67, themean age of marriage was 20.09 ± 3.82 (from16.27 years to23.91, and themean age of their husbands was 37.9 ± 8.4years (from 29.5 years to 46.3. In the current study, the majority of women werein the age range of 30-35years old with the medianof 32years, however, most ofmen were in the age range of 35-40yearswith the median of37years. While 236of women did not have unwanted pregnancies, most participants of the present study had one unwanted pregnancy
Non-linear regression techniques are used widely to fit weed field emergence patterns to soil microclimatic indices using S-type functions. Artificial neural networks present interesting and alternative features for such modeling purposes. In this work, a univariate hydrothermal-time based Weibull m...
Chen, Baojiang; Qin, Jing
2014-05-10
In statistical analysis, a regression model is needed if one is interested in finding the relationship between a response variable and covariates. When the response depends on the covariate, then it may also depend on the function of this covariate. If one has no knowledge of this functional form but expect for monotonic increasing or decreasing, then the isotonic regression model is preferable. Estimation of parameters for isotonic regression models is based on the pool-adjacent-violators algorithm (PAVA), where the monotonicity constraints are built in. With missing data, people often employ the augmented estimating method to improve estimation efficiency by incorporating auxiliary information through a working regression model. However, under the framework of the isotonic regression model, the PAVA does not work as the monotonicity constraints are violated. In this paper, we develop an empirical likelihood-based method for isotonic regression model to incorporate the auxiliary information. Because the monotonicity constraints still hold, the PAVA can be used for parameter estimation. Simulation studies demonstrate that the proposed method can yield more efficient estimates, and in some situations, the efficiency improvement is substantial. We apply this method to a dementia study. Copyright © 2013 John Wiley & Sons, Ltd.
Mapping soil organic carbon stocks by robust geostatistical and boosted regression models
Nussbaum, Madlene; Papritz, Andreas; Baltensweiler, Andri; Walthert, Lorenz
2013-04-01
Carbon (C) sequestration in forests offsets greenhouse gas emissions. Therefore, quantifying C stocks and fluxes in forest ecosystems is of interest for greenhouse gas reporting according to the Kyoto protocol. In Switzerland, the National Forest Inventory offers comprehensive data to quantify the aboveground forest biomass and its change in time. Estimating stocks of soil organic C (SOC) in forests is more difficult because the variables needed to quantify stocks vary strongly in space and precise quantification of some of them is very costly. Based on data from 1'033 plots we modeled SOC stocks of the organic layer and the mineral soil to depths of 30 cm and 100 cm for the Swiss forested area. For the statistical modeling a broad range of covariates were available: Climate data (e. g. precipitation, temperature), two elevation models (resolutions 25 and 2 m) with respective terrain attributes and spectral reflectance data representing vegetation. Furthermore, the main mapping units of an overview soil map and a coarse scale geological map were used to coarsely represent the parent material of the soils. The selection of important covariates for SOC stocks modeling out of a large set was a major challenge for the statistical modeling. We used two approaches to deal with this problem: 1) A robust restricted maximum likelihood method to fit linear regression model with spatially correlated errors. The large number of covariates was first reduced by LASSO (Least Absolute Shrinkage and Selection Operator) and then further narrowed down to a parsimonious set of important covariates by cross-validation of the robustly fitted model. To account for nonlinear dependencies of the response on the covariates interaction terms of the latter were included in model if this improved the fit. 2) A boosted structured regression model with componentwise linear least squares or componentwise smoothing splines as base procedures. The selection of important covariates was done by the
Mölter, A; Lindley, S; de Vocht, F; Simpson, A; Agius, R
2010-12-01
Over recent years land use regression (LUR) has become a frequently used method in air pollution exposure studies, as it can model intra-urban variation in pollutant concentrations at a fine spatial scale. However, very few studies have used the LUR methodology to also model the temporal variation in air pollution exposure. The aim of this study is to estimate annual mean NO(2) and PM(10) concentrations from 1996 to 2008 for Greater Manchester using land use regression models. The results from these models will be used in the Manchester Asthma and Allergy Study (MAAS) birth cohort to determine health effects of air pollution exposure. The Greater Manchester LUR model for 2005 was recalibrated using interpolated and adjusted NO(2) and PM(10) concentrations as dependent variables for 1996-2008. In addition, temporally resolved variables were available for traffic intensity and PM(10) emissions. To validate the resulting LUR models, they were applied to the locations of automatic monitoring stations and the estimated concentrations were compared against measured concentrations. The 2005 LUR models were successfully recalibrated, providing individual models for each year from 1996 to 2008. When applied to the monitoring stations the mean prediction error (MPE) for NO(2) concentrations for all stations and years was -0.8μg/m³ and the root mean squared error (RMSE) was 6.7μg/m³. For PM(10) concentrations the MPE was 0.8μg/m³ and the RMSE was 3.4μg/m³. These results indicate that it is possible to model temporal variation in air pollution through LUR with relatively small prediction errors. It is likely that most previous LUR studies did not include temporal variation, because they were based on short term monitoring campaigns and did not have historic pollution data. The advantage of this study is that it uses data from an air dispersion model, which provided concentrations for 2005 and 2010, and therefore allowed extrapolation over a longer time period
Anderson, Weston; Guikema, Seth; Zaitchik, Ben; Pan, William
2014-01-01
Obtaining accurate small area estimates of population is essential for policy and health planning but is often difficult in countries with limited data. In lieu of available population data, small area estimate models draw information from previous time periods or from similar areas. This study focuses on model-based methods for estimating population when no direct samples are available in the area of interest. To explore the efficacy of tree-based models for estimating population density, we compare six different model structures including Random Forest and Bayesian Additive Regression Trees. Results demonstrate that without information from prior time periods, non-parametric tree-based models produced more accurate predictions than did conventional regression methods. Improving estimates of population density in non-sampled areas is important for regions with incomplete census data and has implications for economic, health and development policies.
Prediction of the indoor temperatures of an urban area with an in-time regression mapping approach.
Smargiassi, Audrey; Fournier, Michel; Griot, Chloé; Baudouin, Yves; Kosatsky, Tom
2008-05-01
Excess mortality has been noted during high ambient temperature episodes. During such episodes, individuals are not likely to be uniformly exposed to temperatures within cities. Exposure of individuals to high temperatures is likely to fluctuate with the micro-urban variation of outdoor temperatures (heat island effect) and with factors linked to building properties. In this paper, a GIS-based regression mapping approach is proposed to model urban spatial patterns of indoor temperatures in time, for all residential buildings of an urban area. In July 2005, the hourly indoor temperature was measured with data loggers for 31 consecutive days, concurrently in 75 dwellings in Montreal. The general estimating equation model (GEE) developed to predict indoor temperatures integrates temporal variability of outdoor temperatures (and their 24 h moving average), with geo-referenced determinants available for the entire city, such as surface temperatures at each site (from a satellite image) and building characteristics (from the Montreal Property Assessment database). The proportion of the variability of the indoor temperatures explained increases from 20%, using only outdoor temperatures, to 54% with the full model. Using this model, high-resolution maps of indoor temperatures can be provided across an entire urban area. The model developed adds a temporal dimension to similar regression mapping approaches used to estimate exposure for population health studies, based on spatial predictors, and can thus be used to predict exposure to indoor temperatures under various outdoor temperature scenarios. It is thus concluded that such a model might be used as a means of mapping indoor temperatures either to inform urban planning and housing strategies to mitigate the effects of climate change, to orient public health interventions, or as a basis for assessing exposure as part of epidemiological studies.
A soft-sensing model for feedwater flow rate using fuzzy support vector regression
Energy Technology Data Exchange (ETDEWEB)
Na, Man Gyun; Yang, Heon Young; Lim, Dong Hyuk [Chosun University, Gwangju (Korea, Republic of)
2008-02-15
Most pressurized water reactors use Venturi flow meters to measure the feedwater flow rate. However, fouling phenomena, which allow corrosion products to accumulate and increase the differential pressure across the Venturi flow meter, can result in an overestimation of the flow rate. In this study, a soft-sensing model based on fuzzy support vector regression was developed to enable accurate on-line prediction of the feedwater flow rate. The available data was divided into two groups by fuzzy c-means clustering in order to reduce the training time. The data for training the soft-sensing model was selected from each data group with the aid of a subtractive clustering scheme because informative data increases the learning effect. The proposed soft-sensing model was confirmed with the real plant data of Yonggwang Nuclear Power Plant Unit 3. The root mean square error and relative maximum error of the model were quite small. Hence, this model can be used to validate and monitor existing hardware feedwater flow meters.
Generic global regression models for growth prediction of Salmonella in ground pork and pork cuts
DEFF Research Database (Denmark)
Buschhardt, Tasja; Hansen, Tina Beck; Bahl, Martin Iain
2017-01-01
Introduction and Objectives Models for the prediction of bacterial growth in fresh pork are primarily developed using two-step regression (i.e. primary models followed by secondary models). These models are also generally based on experiments in liquids or ground meat and neglect surface growth....... It has been shown that one-step global regressions can result in more accurate models and that bacterial growth on intact surfaces can substantially differ from growth in liquid culture. Material and Methods We used a global-regression approach to develop predictive models for the growth of Salmonella...... for three pork matrices: on the surface of shoulder (neck) and hind part (ham), and in ground pork. We conducted five experimental trials and inoculated essentially sterile pork pieces with a Salmonella cocktail (n = 192). Inoculated meat was aerobically incubated at 4 °C, 7 °C, 12 °C, and 16 °C for 96 h...
Optimization of end-members used in multiple linear regression geochemical mixing models
Dunlea, Ann G.; Murray, Richard W.
2015-11-01
Tracking marine sediment provenance (e.g., of dust, ash, hydrothermal material, etc.) provides insight into contemporary ocean processes and helps construct paleoceanographic records. In a simple system with only a few end-members that can be easily quantified by a unique chemical or isotopic signal, chemical ratios and normative calculations can help quantify the flux of sediment from the few sources. In a more complex system (e.g., each element comes from multiple sources), more sophisticated mixing models are required. MATLAB codes published in Pisias et al. solidified the foundation for application of a Constrained Least Squares (CLS) multiple linear regression technique that can use many elements and several end-members in a mixing model. However, rigorous sensitivity testing to check the robustness of the CLS model is time and labor intensive. MATLAB codes provided in this paper reduce the time and labor involved and facilitate finding a robust and stable CLS model. By quickly comparing the goodness of fit between thousands of different end-member combinations, users are able to identify trends in the results that reveal the CLS solution uniqueness and the end-member composition precision required for a good fit. Users can also rapidly check that they have the appropriate number and type of end-members in their model. In the end, these codes improve the user's confidence that the final CLS model(s) they select are the most reliable solutions. These advantages are demonstrated by application of the codes in two case studies of well-studied datasets (Nazca Plate and South Pacific Gyre).
International Nuclear Information System (INIS)
Fang, Xiande; Xu, Yu
2011-01-01
The empirical model of turbine efficiency is necessary for the control- and/or diagnosis-oriented simulation and useful for the simulation and analysis of dynamic performances of the turbine equipment and systems, such as air cycle refrigeration systems, power plants, turbine engines, and turbochargers. Existing empirical models of turbine efficiency are insufficient because there is no suitable form available for air cycle refrigeration turbines. This work performs a critical review of empirical models (called mean value models in some literature) of turbine efficiency and develops an empirical model in the desired form for air cycle refrigeration, the dominant cooling approach in aircraft environmental control systems. The Taylor series and regression analysis are used to build the model, with the Taylor series being used to expand functions with the polytropic exponent and the regression analysis to finalize the model. The measured data of a turbocharger turbine and two air cycle refrigeration turbines are used for the regression analysis. The proposed model is compact and able to present the turbine efficiency map. Its predictions agree with the measured data very well, with the corrected coefficient of determination R c 2 ≥ 0.96 and the mean absolute percentage deviation = 1.19% for the three turbines. -- Highlights: → Performed a critical review of empirical models of turbine efficiency. → Developed an empirical model in the desired form for air cycle refrigeration, using the Taylor expansion and regression analysis. → Verified the method for developing the empirical model. → Verified the model.
Random regression test-day model for the analysis of dairy cattle ...
African Journals Online (AJOL)
Genetic evaluation of dairy cattle using test-day models is now common internationally. In South Africa a fixed regression test-day model is used to generate breeding values for dairy animals on a routine basis. The model is, however, often criticized for erroneously assuming a standard lactation curve for cows in similar ...
Xiao, Yongling; Abrahamowicz, Michal
2010-03-30
We propose two bootstrap-based methods to correct the standard errors (SEs) from Cox's model for within-cluster correlation of right-censored event times. The cluster-bootstrap method resamples, with replacement, only the clusters, whereas the two-step bootstrap method resamples (i) the clusters, and (ii) individuals within each selected cluster, with replacement. In simulations, we evaluate both methods and compare them with the existing robust variance estimator and the shared gamma frailty model, which are available in statistical software packages. We simulate clustered event time data, with latent cluster-level random effects, which are ignored in the conventional Cox's model. For cluster-level covariates, both proposed bootstrap methods yield accurate SEs, and type I error rates, and acceptable coverage rates, regardless of the true random effects distribution, and avoid serious variance under-estimation by conventional Cox-based standard errors. However, the two-step bootstrap method over-estimates the variance for individual-level covariates. We also apply the proposed bootstrap methods to obtain confidence bands around flexible estimates of time-dependent effects in a real-life analysis of cluster event times.
Singh, Kunwar P; Gupta, Shikha; Rai, Premanjali
2014-05-01
Kernel function-based regression models were constructed and applied to a nonlinear hydro-chemical dataset pertaining to surface water for predicting the dissolved oxygen levels. Initial features were selected using nonlinear approach. Nonlinearity in the data was tested using BDS statistics, which revealed the data with nonlinear structure. Kernel ridge regression, kernel principal component regression, kernel partial least squares regression, and support vector regression models were developed using the Gaussian kernel function and their generalization and predictive abilities were compared in terms of several statistical parameters. Model parameters were optimized using the cross-validation procedure. The proposed kernel regression methods successfully captured the nonlinear features of the original data by transforming it to a high dimensional feature space using the kernel function. Performance of all the kernel-based modeling methods used here were comparable both in terms of predictive and generalization abilities. Values of the performance criteria parameters suggested for the adequacy of the constructed models to fit the nonlinear data and their good predictive capabilities.
Laporte, Audrey; Karimova, Alfia; Ferguson, Brian
2010-09-01
The time path of consumption from a rational addiction (RA) model contains information about an individual's tendency to be forward looking. In this paper, we use quantile regression (QR) techniques to investigate whether the tendency to be forward looking varies systematically with the level of consumption of cigarettes. Using panel data, we find that the forward-looking effect is strongest relative to the addiction effect in the lower quantiles of cigarette consumption, and that the forward-looking effect declines and the addiction effect increases as we move toward the upper quantiles. The results indicate that QR can be used to illuminate the heterogeneity in individuals' tendency to be forward looking even after controlling for factors such as education. QR also gives useful information about the differential impact of policy variables, most notably workplace smoking restrictions, on light and heavy smokers. Copyright (c) 2010 John Wiley & Sons, Ltd.
Giacomo, Della Riccia; Stefania, Del Zotto
2013-12-15
Fumonisins are mycotoxins produced by Fusarium species that commonly live in maize. Whereas fungi damage plants, fumonisins cause disease both to cattle breedings and human beings. Law limits set fumonisins tolerable daily intake with respect to several maize based feed and food. Chemical techniques assure the most reliable and accurate measurements, but they are expensive and time consuming. A method based on Near Infrared spectroscopy and multivariate statistical regression is described as a simpler, cheaper and faster alternative. We apply Partial Least Squares with full cross validation. Two models are described, having high correlation of calibration (0.995, 0.998) and of validation (0.908, 0.909), respectively. Description of observed phenomenon is accurate and overfitting is avoided. Screening of contaminated maize with respect to European legal limit of 4 mg kg(-1) should be assured. Copyright © 2013 Elsevier Ltd. All rights reserved.
Linear regression models and k-means clustering for statistical analysis of fNIRS data.
Bonomini, Viola; Zucchelli, Lucia; Re, Rebecca; Ieva, Francesca; Spinelli, Lorenzo; Contini, Davide; Paganoni, Anna; Torricelli, Alessandro
2015-02-01
We propose a new algorithm, based on a linear regression model, to statistically estimate the hemodynamic activations in fNIRS data sets. The main concern guiding the algorithm development was the minimization of assumptions and approximations made on the data set for the application of statistical tests. Further, we propose a K-means method to cluster fNIRS data (i.e. channels) as activated or not activated. The methods were validated both on simulated and in vivo fNIRS data. A time domain (TD) fNIRS technique was preferred because of its high performances in discriminating cortical activation and superficial physiological changes. However, the proposed method is also applicable to continuous wave or frequency domain fNIRS data sets.
REGRESSION MODEL FOR RISK REPORTING IN FINANCIAL STATEMENTS OF ACCOUNTING SERVICES ENTITIES
Directory of Open Access Journals (Sweden)
Mirela NICHITA
2015-06-01
Full Text Available The purpose of financial reports is to provide useful information to users; the utility of information is defined through the qualitative characteristics (fundamental and enhancing. The financial crisis emphasized the limits of financial reporting which has been unable to prevent investors about the risks they were facing. Due to the current changes in business environment, managers have been highly motivated to rethink and improve the risk governance philosophy, processes and methodologies. The lack of quality, timely data and adequate systems to capture, report and measure the right information across the organization is a fundamental challenge for implementing and sustaining all aspects of effective risk management. Starting with the 80s, the investors are more interested in narratives (Notes to financial statements, than in primary reports (financial position and performance. The research will apply a regression model for assessment of risk reporting by the professional (accounting and taxation services for major companies from Romania during the period 2009 – 2013.
Modelling and analysis of turbulent datasets using Auto Regressive Moving Average processes
International Nuclear Information System (INIS)
Faranda, Davide; Dubrulle, Bérengère; Daviaud, François; Pons, Flavio Maria Emanuele; Saint-Michel, Brice; Herbert, Éric; Cortet, Pierre-Philippe
2014-01-01
We introduce a novel way to extract information from turbulent datasets by applying an Auto Regressive Moving Average (ARMA) statistical analysis. Such analysis goes well beyond the analysis of the mean flow and of the fluctuations and links the behavior of the recorded time series to a discrete version of a stochastic differential equation which is able to describe the correlation structure in the dataset. We introduce a new index Υ that measures the difference between the resulting analysis and the Obukhov model of turbulence, the simplest stochastic model reproducing both Richardson law and the Kolmogorov spectrum. We test the method on datasets measured in a von Kármán swirling flow experiment. We found that the ARMA analysis is well correlated with spatial structures of the flow, and can discriminate between two different flows with comparable mean velocities, obtained by changing the forcing. Moreover, we show that the Υ is highest in regions where shear layer vortices are present, thereby establishing a link between deviations from the Kolmogorov model and coherent structures. These deviations are consistent with the ones observed by computing the Hurst exponents for the same time series. We show that some salient features of the analysis are preserved when considering global instead of local observables. Finally, we analyze flow configurations with multistability features where the ARMA technique is efficient in discriminating different stability branches of the system
An Ionospheric Index Model based on Linear Regression and Neural Network Approaches
Tshisaphungo, Mpho; McKinnell, Lee-Anne; Bosco Habarulema, John
2017-04-01
The ionosphere is well known to reflect radio wave signals in the high frequency (HF) band due to the present of electron and ions within the region. To optimise the use of long distance HF communications, it is important to understand the drivers of ionospheric storms and accurately predict the propagation conditions especially during disturbed days. This paper presents the development of an ionospheric storm-time index over the South African region for the application of HF communication users. The model will result into a valuable tool to measure the complex ionospheric behaviour in an operational space weather monitoring and forecasting environment. The development of an ionospheric storm-time index is based on a single ionosonde station data over Grahamstown (33.3°S,26.5°E), South Africa. Critical frequency of the F2 layer (foF2) measurements for a period 1996-2014 were considered for this study. The model was developed based on linear regression and neural network approaches. In this talk validation results for low, medium and high solar activity periods will be discussed to demonstrate model's performance.
Modelling and analysis of turbulent datasets using Auto Regressive Moving Average processes
Energy Technology Data Exchange (ETDEWEB)
Faranda, Davide, E-mail: davide.faranda@cea.fr; Dubrulle, Bérengère; Daviaud, François [Laboratoire SPHYNX, Service de Physique de l' Etat Condensé, DSM, CEA Saclay, CNRS URA 2464, 91191 Gif-sur-Yvette (France); Pons, Flavio Maria Emanuele [Dipartimento di Scienze Statistiche, Universitá di Bologna, Via delle Belle Arti 41, 40126 Bologna (Italy); Saint-Michel, Brice [Institut de Recherche sur les Phénomènes Hors Equilibre, Technopole de Chateau Gombert, 49 rue Frédéric Joliot Curie, B.P. 146, 13 384 Marseille (France); Herbert, Éric [Université Paris Diderot - LIED - UMR 8236, Laboratoire Interdisciplinaire des Énergies de Demain, Paris (France); Cortet, Pierre-Philippe [Laboratoire FAST, CNRS, Université Paris-Sud (France)
2014-10-15
We introduce a novel way to extract information from turbulent datasets by applying an Auto Regressive Moving Average (ARMA) statistical analysis. Such analysis goes well beyond the analysis of the mean flow and of the fluctuations and links the behavior of the recorded time series to a discrete version of a stochastic differential equation which is able to describe the correlation structure in the dataset. We introduce a new index Υ that measures the difference between the resulting analysis and the Obukhov model of turbulence, the simplest stochastic model reproducing both Richardson law and the Kolmogorov spectrum. We test the method on datasets measured in a von Kármán swirling flow experiment. We found that the ARMA analysis is well correlated with spatial structures of the flow, and can discriminate between two different flows with comparable mean velocities, obtained by changing the forcing. Moreover, we show that the Υ is highest in regions where shear layer vortices are present, thereby establishing a link between deviations from the Kolmogorov model and coherent structures. These deviations are consistent with the ones observed by computing the Hurst exponents for the same time series. We show that some salient features of the analysis are preserved when considering global instead of local observables. Finally, we analyze flow configurations with multistability features where the ARMA technique is efficient in discriminating different stability branches of the system.
Culpepper, Steven Andrew
2010-01-29
Statistical prediction remains an important tool for decisions in a variety of disciplines. An equally important issue is identifying factors that contribute to more or less accurate predictions. The time series literature includes well developed methods for studying predictability and volatility over time. This article develops distribution-appropriate methods for studying individual differences in predictability for settings in psychological research. Specifically, 3 different approaches are discussed for modeling predictability. The 1st is a bivariate measure of predictability discussed previously in the psychology literature, the squared or absolute valued difference between criterion and predictor, which is shown to follow the gamma distribution. The 2nd method extended limitations of previous research and involved understanding predictability in regression models. The 3rd method used nonlinear multilevel models to study predictability in settings where participants are nested within clusters. An application was presented using SAS NLMIXED to understand the predictability of college grade point average by student demographic characteristics. The findings from the application suggest that the 1st-year college performance of English as a second language students were, on average, less predictable whereas females and Whites tended to demonstrate more predictable academic performance than their male or racial/ethnic minority counterparts.
Hirsch, Robert M.; Moyer, Douglas; Archfield, Stacey A.
2010-01-01
A new approach to the analysis of long-term surface water-quality data is proposed and implemented. The goal of this approach is to increase the amount of information that is extracted from the types of rich water-quality datasets that now exist. The method is formulated to allow for maximum flexibility in representations of the long-term trend, seasonal components, and discharge-related components of the behavior of the water-quality variable of interest. It is designed to provide internally consistent estimates of the actual history of concentrations and fluxes as well as histories that eliminate the influence of year-to-year variations in streamflow. The method employs the use of weighted regressions of concentrations on time, discharge, and season. Finally, the method is designed to be useful as a diagnostic tool regarding the kinds of changes that are taking place in the watershed related to point sources, groundwater sources, and surface-water nonpoint sources. The method is applied to datasets for the nine large tributaries of Chesapeake Bay from 1978 to 2008. The results show a wide range of patterns of change in total phosphorus and in dissolved nitrate plus nitrite. These results should prove useful in further examination of the causes of changes, or lack of changes, and may help inform decisions about future actions to reduce nutrient enrichment in the Chesapeake Bay and its watershed.
I. M. Yassin; M. F. Abdul Khalid; S. H. Herman; I. Pasya; N. Ab Wahab; Z. Awang
2017-01-01
The prediction of stocks in the stock market is important in investment as it would help the investor to time buy and sell transactions to maximize profits. In this paper, a Multi-Layer Perceptron (MLP)-based Nonlinear Auto-Regressive with Exogenous Inputs (NARX) model was used to predict the prices of the Apple Inc. weekly stock prices over a time horizon of 1995 to 2013. The NARX model belongs is a system identification model that constructs a mathematical model from the dynamic input/outpu...
The limiting behavior of the estimated parameters in a misspecified random field regression model
DEFF Research Database (Denmark)
Dahl, Christian Møller; Qin, Yu
This paper examines the limiting properties of the estimated parameters in the random field regression model recently proposed by Hamilton (Econometrica, 2001). Though the model is parametric, it enjoys the flexibility of the nonparametric approach since it can approximate a large collection...... convenient new uniform convergence results that we propose. This theory may have applications beyond those presented here. Our results indicate that classical statistical inference techniques, in general, works very well for random field regression models in finite samples and that these models succesfully...
DEFF Research Database (Denmark)
Carstensen, Bendix
1996-01-01
This paper shows how to fit excess and relative risk regression models to interval censored survival data, and how to implement the models in standard statistical software. The methods developed are used for the analysis of HIV infection rates in a cohort of Danish homosexual men.......This paper shows how to fit excess and relative risk regression models to interval censored survival data, and how to implement the models in standard statistical software. The methods developed are used for the analysis of HIV infection rates in a cohort of Danish homosexual men....
The Relationship between Economic Growth and Money Laundering – a Linear Regression Model
Directory of Open Access Journals (Sweden)
Daniel Rece
2009-09-01
Full Text Available This study provides an overview of the relationship between economic growth and money laundering modeled by a least squares function. The report analyzes statistically data collected from USA, Russia, Romania and other eleven European countries, rendering a linear regression model. The study illustrates that 23.7% of the total variance in the regressand (level of money laundering is “explained” by the linear regression model. In our opinion, this model will provide critical auxiliary judgment and decision support for anti-money laundering service systems.
Directory of Open Access Journals (Sweden)
J. Chardon
2018-01-01
Full Text Available Statistical downscaling models (SDMs are often used to produce local weather scenarios from large-scale atmospheric information. SDMs include transfer functions which are based on a statistical link identified from observations between local weather and a set of large-scale predictors. As physical processes driving surface weather vary in time, the most relevant predictors and the regression link are likely to vary in time too. This is well known for precipitation for instance and the link is thus often estimated after some seasonal stratification of the data. In this study, we present a two-stage analog/regression model where the regression link is estimated from atmospheric analogs of the current prediction day. Atmospheric analogs are identified from fields of geopotential heights at 1000 and 500 hPa. For the regression stage, two generalized linear models are further used to model the probability of precipitation occurrence and the distribution of non-zero precipitation amounts, respectively. The two-stage model is evaluated for the probabilistic prediction of small-scale precipitation over France. It noticeably improves the skill of the prediction for both precipitation occurrence and amount. As the analog days vary from one prediction day to another, the atmospheric predictors selected in the regression stage and the value of the corresponding regression coefficients can vary from one prediction day to another. The model allows thus for a day-to-day adaptive and tailored downscaling. It can also reveal specific predictors for peculiar and non-frequent weather configurations.
Regression models for the restricted residual mean life for right-censored and left-truncated data.
Cortese, Giuliana; Holmboe, Stine A; Scheike, Thomas H
2017-05-20
The hazard ratios resulting from a Cox's regression hazards model are hard to interpret and to be converted into prolonged survival time. As the main goal is often to study survival functions, there is increasing interest in summary measures based on the survival function that are easier to interpret than the hazard ratio; the residual mean time is an important example of those measures. However, because of the presence of right censoring, the tail of the survival distribution is often difficult to estimate correctly. Therefore, we consider the restricted residual mean time, which represents a partial area under the survival function, given any time horizon τ, and is interpreted as the residual life expectancy up to τ of a subject surviving up to time t. We present a class of regression models for this measure, based on weighted estimating equations and inverse probability of censoring weighted estimators to model potential right censoring. Furthermore, we show how to extend the models and the estimators to deal with delayed entries. We demonstrate that the restricted residual mean life estimator is equivalent to integrals of Kaplan-Meier estimates in the case of simple factor variables. Estimation performance is investigated by simulation studies. Using real data from Danish Monitoring Cardiovascular Risk Factor Surveys, we illustrate an application to additive regression models and discuss the general assumption of right censoring and left truncation being dependent on covariates. Copyright © 2017 John Wiley & Sons, Ltd. Copyright © 2017 John Wiley & Sons, Ltd.
A History of Regression and Related Model-Fitting in the Earth Sciences (1636?-2000)
International Nuclear Information System (INIS)
Howarth, Richard J.
2001-01-01
roots in meeting the evident need for improved estimators in spatial interpolation. Technical advances in regression analysis during the 1970s embraced the development of regression diagnostics and consequent attention to outliers; the recognition of problems caused by correlated predictors, and the subsequent introduction of ridge regression to overcome them; and techniques for fitting errors-in-variables and mixture models. Improvements in computational power have enabled ever more computer-intensive methods to be applied. These include algorithms which are robust in the presence of outliers, for example Rousseeuw's 1984 Least Median Squares; nonparametric smoothing methods, such as kernel-functions, splines and Cleveland's 1979 LOcally WEighted Scatterplot Smoother (LOWESS); and the Classification and Regression Tree (CART) technique of Breiman and others in 1984. Despite a continuing improvement in the rate of technology-transfer from the statistical to the earth-science community, despite an abrupt drop to a time-lag of about 10 years following the introduction of digital computers, these more recent developments are only just beginning to penetrate beyond the research community of earth scientists. Examples of applications to problem-solving in the earth sciences are given
Gonzales, Gerard Bryan; De Saeger, Sarah
2018-02-26
In this paper, the stability of the plasma metabolome at -20 °C for up to 30 days was evaluated using liquid chromatography-high resolution mass spectrometric metabolomics analysis. To follow the time-series deterioration of the plasma metabolome, the use of an elastic net regularized regression model for the prediction of storage time at -20 °C based on the plasma metabolomic profile, and the selection and ranking of metabolites with high temporal changes was demonstrated using the glmnet package in R. Out of 1229 (positive mode) and 1483 (negative mode) metabolite features, the elastic net model extracted 32 metabolites of interest in both positive and negative modes. L-gamma-glutamyl-L-(iso)leucine (tentative identification) was found to have the highest time-dependent change and significantly increased proportionally to the storage time of plasma at -20 °C (R 2 = 0.6378 [positive mode], R 2 = 0.7893 [negative mode], p-value plan logistics and storage strategies for samples obtained from low-resource settings, where -80 °C storage is not guaranteed.
Hu, L; Zhang, Z G; Mouraux, A; Iannetti, G D
2015-05-01
Transient sensory, motor or cognitive event elicit not only phase-locked event-related potentials (ERPs) in the ongoing electroencephalogram (EEG), but also induce non-phase-locked modulations of ongoing EEG oscillations. These modulations can be detected when single-trial waveforms are analysed in the time-frequency domain, and consist in stimulus-induced decreases (event-related desynchronization, ERD) or increases (event-related synchronization, ERS) of synchrony in the activity of the underlying neuronal populations. ERD and ERS reflect changes in the parameters that control oscillations in neuronal networks and, depending on the frequency at which they occur, represent neuronal mechanisms involved in cortical activation, inhibition and binding. ERD and ERS are commonly estimated by averaging the time-frequency decomposition of single trials. However, their trial-to-trial variability that can reflect physiologically-important information is lost by across-trial averaging. Here, we aim to (1) develop novel approaches to explore single-trial parameters (including latency, frequency and magnitude) of ERP/ERD/ERS; (2) disclose the relationship between estimated single-trial parameters and other experimental factors (e.g., perceived intensity). We found that (1) stimulus-elicited ERP/ERD/ERS can be correctly separated using principal component analysis (PCA) decomposition with Varimax rotation on the single-trial time-frequency distributions; (2) time-frequency multiple linear regression with dispersion term (TF-MLRd) enhances the signal-to-noise ratio of ERP/ERD/ERS in single trials, and provides an unbiased estimation of their latency, frequency, and magnitude at single-trial level; (3) these estimates can be meaningfully correlated with each other and with other experimental factors at single-trial level (e.g., perceived stimulus intensity and ERP magnitude). The methods described in this article allow exploring fully non-phase-locked stimulus-induced cortical
Microbiome Data Accurately Predicts the Postmortem Interval Using Random Forest Regression Models
Directory of Open Access Journals (Sweden)
Aeriel Belk
2018-02-01
Full Text Available Death investigations often include an effort to establish the postmortem interval (PMI in cases in which the time of death is uncertain. The postmortem interval can lead to the identification of the deceased and the validation of witness statements and suspect alibis. Recent research has demonstrated that microbes provide an accurate clock that starts at death and relies on ecological change in the microbial communities that normally inhabit a body and its surrounding environment. Here, we explore how to build the most robust Random Forest regression models for prediction of PMI by testing models built on different sample types (gravesoil, skin of the torso, skin of the head, gene markers (16S ribosomal RNA (rRNA, 18S rRNA, internal transcribed spacer regions (ITS, and taxonomic levels (sequence variants, species, genus, etc.. We also tested whether particular suites of indicator microbes were informative across different datasets. Generally, results indicate that the most accurate models for predicting PMI were built using gravesoil and skin data using the 16S rRNA genetic marker at the taxonomic level of phyla. Additionally, several phyla consistently contributed highly to model accuracy and may be candidate indicators of PMI.
Regression trees modeling and forecasting of PM10 air pollution in urban areas
Stoimenova, M.; Voynikova, D.; Ivanov, A.; Gocheva-Ilieva, S.; Iliev, I.
2017-10-01
Fine particulate matter (PM10) air pollution is a serious problem affecting the health of the population in many Bulgarian cities. As an example, the object of this study is the pollution with PM10 of the town of Pleven, Northern Bulgaria. The measured concentrations of this air pollutant for this city consistently exceeded the permissible limits set by European and national legislation. Based on data for the last 6 years (2011-2016), the analysis shows that this applies both to the daily limit of 50 micrograms per cubic meter and the allowable number of daily concentration exceedances to 35 per year. Also, the average annual concentration of PM10 exceeded the prescribed norm of no more than 40 micrograms per cubic meter. The aim of this work is to build high performance mathematical models for effective prediction and forecasting the level of PM10 pollution. The study was conducted with the powerful flexible data mining technique Classification and Regression Trees (CART). The values of PM10 were fitted with respect to meteorological data such as maximum and minimum air temperature, relative humidity, wind speed and direction and others, as well as with time and autoregressive variables. As a result the obtained CART models demonstrate high predictive ability and fit the actual data with up to 80%. The best models were applied for forecasting the level pollution for 3 to 7 days ahead. An interpretation of the modeling results is presented.
Directory of Open Access Journals (Sweden)
Tao Gao
2014-01-01
Full Text Available Extreme precipitation is likely to be one of the most severe meteorological disasters in China; however, studies on the physical factors affecting precipitation extremes and corresponding prediction models are not accurately available. From a new point of view, the sensible heat flux (SHF and latent heat flux (LHF, which have significant impacts on summer extreme rainfall in Yangtze River basin (YRB, have been quantified and then selections of the impact factors are conducted. Firstly, a regional extreme precipitation index was applied to determine Regions of Significant Correlation (RSC by analyzing spatial distribution of correlation coefficients between this index and SHF, LHF, and sea surface temperature (SST on global ocean scale; then the time series of SHF, LHF, and SST in RSCs during 1967–2010 were selected. Furthermore, other factors that significantly affect variations in precipitation extremes over YRB were also selected. The methods of multiple stepwise regression and leave-one-out cross-validation (LOOCV were utilized to analyze and test influencing factors and statistical prediction model. The correlation coefficient between observed regional extreme index and model simulation result is 0.85, with significant level at 99%. This suggested that the forecast skill was acceptable although many aspects of the prediction model should be improved.
Energy Technology Data Exchange (ETDEWEB)
Ghazali, Amirul Syafiq Mohd; Ali, Zalila; Noor, Norlida Mohd; Baharum, Adam [Pusat Pengajian Sains Matematik, Universiti Sains Malaysia, 11800 USM, Pulau Pinang, Malaysia amirul@unisel.edu.my, zalila@cs.usm.my, norlida@usm.my, adam@usm.my (Malaysia)
2015-10-22
Multinomial logistic regression is widely used to model the outcomes of a polytomous response variable, a categorical dependent variable with more than two categories. The model assumes that the conditional mean of the dependent categorical variables is the logistic function of an affine combination of predictor variables. Its procedure gives a number of logistic regression models that make specific comparisons of the response categories. When there are q categories of the response variable, the model consists of q-1 logit equations which are fitted simultaneously. The model is validated by variable selection procedures, tests of regression coefficients, a significant test of the overall model, goodness-of-fit measures, and validation of predicted probabilities using odds ratio. This study used the multinomial logistic regression model to investigate obesity and overweight among primary school students in a rural area on the basis of their demographic profiles, lifestyles and on the diet and food intake. The results indicated that obesity and overweight of students are related to gender, religion, sleep duration, time spent on electronic games, breakfast intake in a week, with whom meals are taken, protein intake, and also, the interaction between breakfast intake in a week with sleep duration, and the interaction between gender and protein intake.
International Nuclear Information System (INIS)
Ghazali, Amirul Syafiq Mohd; Ali, Zalila; Noor, Norlida Mohd; Baharum, Adam
2015-01-01
Multinomial logistic regression is widely used to model the outcomes of a polytomous response variable, a categorical dependent variable with more than two categories. The model assumes that the conditional mean of the dependent categorical variables is the logistic function of an affine combination of predictor variables. Its procedure gives a number of logistic regression models that make specific comparisons of the response categories. When there are q categories of the response variable, the model consists of q-1 logit equations which are fitted simultaneously. The model is validated by variable selection procedures, tests of regression coefficients, a significant test of the overall model, goodness-of-fit measures, and validation of predicted probabilities using odds ratio. This study used the multinomial logistic regression model to investigate obesity and overweight among primary school students in a rural area on the basis of their demographic profiles, lifestyles and on the diet and food intake. The results indicated that obesity and overweight of students are related to gender, religion, sleep duration, time spent on electronic games, breakfast intake in a week, with whom meals are taken, protein intake, and also, the interaction between breakfast intake in a week with sleep duration, and the interaction between gender and protein intake
Ghazali, Amirul Syafiq Mohd; Ali, Zalila; Noor, Norlida Mohd; Baharum, Adam
2015-10-01
Multinomial logistic regression is widely used to model the outcomes of a polytomous response variable, a categorical dependent variable with more than two categories. The model assumes that the conditional mean of the dependent categorical variables is the logistic function of an affine combination of predictor variables. Its procedure gives a number of logistic regression models that make specific comparisons of the response categories. When there are q categories of the response variable, the model consists of q-1 logit equations which are fitted simultaneously. The model is validated by variable selection procedures, tests of regression coefficients, a significant test of the overall model, goodness-of-fit measures, and validation of predicted probabilities using odds ratio. This study used the multinomial logistic regression model to investigate obesity and overweight among primary school students in a rural area on the basis of their demographic profiles, lifestyles and on the diet and food intake. The results indicated that obesity and overweight of students are related to gender, religion, sleep duration, time spent on electronic games, breakfast intake in a week, with whom meals are taken, protein intake, and also, the interaction between breakfast intake in a week with sleep duration, and the interaction between gender and protein intake.
Estimasi Model Seemingly Unrelated Regression (SUR dengan Metode Generalized Least Square (GLS
Directory of Open Access Journals (Sweden)
Ade Widyaningsih
2015-04-01
Full Text Available Regression analysis is a statistical tool that is used to determine the relationship between two or more quantitative variables so that one variable can be predicted from the other variables. A method that can used to obtain a good estimation in the regression analysis is ordinary least squares method. The least squares method is used to estimate the parameters of one or more regression but relationships among the errors in the response of other estimators are not allowed. One way to overcome this problem is Seemingly Unrelated Regression model (SUR in which parameters are estimated using Generalized Least Square (GLS. In this study, the author applies SUR model using GLS method on world gasoline demand data. The author obtains that SUR using GLS is better than OLS because SUR produce smaller errors than the OLS.
Estimasi Model Seemingly Unrelated Regression (SUR dengan Metode Generalized Least Square (GLS
Directory of Open Access Journals (Sweden)
Ade Widyaningsih
2014-06-01
Full Text Available Regression analysis is a statistical tool that is used to determine the relationship between two or more quantitative variables so that one variable can be predicted from the other variables. A method that can used to obtain a good estimation in the regression analysis is ordinary least squares method. The least squares method is used to estimate the parameters of one or more regression but relationships among the errors in the response of other estimators are not allowed. One way to overcome this problem is Seemingly Unrelated Regression model (SUR in which parameters are estimated using Generalized Least Square (GLS. In this study, the author applies SUR model using GLS method on world gasoline demand data. The author obtains that SUR using GLS is better than OLS because SUR produce smaller errors than the OLS.
Regression analysis understanding and building business and economic models using Excel
Wilson, J Holton
2012-01-01
The technique of regression analysis is used so often in business and economics today that an understanding of its use is necessary for almost everyone engaged in the field. This book will teach you the essential elements of building and understanding regression models in a business/economic context in an intuitive manner. The authors take a non-theoretical treatment that is accessible even if you have a limited statistical background. It is specifically designed to teach the correct use of regression, while advising you of its limitations and teaching about common pitfalls. This book describe
Weichenthal, Scott; Ryswyk, Keith Van; Goldstein, Alon; Bagg, Scott; Shekkarizfard, Maryam; Hatzopoulou, Marianne
2016-04-01
Existing evidence suggests that ambient ultrafine particles (UFPs) (regression model for UFPs in Montreal, Canada using mobile monitoring data collected from 414 road segments during the summer and winter months between 2011 and 2012. Two different approaches were examined for model development including standard multivariable linear regression and a machine learning approach (kernel-based regularized least squares (KRLS)) that learns the functional form of covariate impacts on ambient UFP concentrations from the data. The final models included parameters for population density, ambient temperature and wind speed, land use parameters (park space and open space), length of local roads and rail, and estimated annual average NOx emissions from traffic. The final multivariable linear regression model explained 62% of the spatial variation in ambient UFP concentrations whereas the KRLS model explained 79% of the variance. The KRLS model performed slightly better than the linear regression model when evaluated using an external dataset (R(2)=0.58 vs. 0.55) or a cross-validation procedure (R(2)=0.67 vs. 0.60). In general, our findings suggest that the KRLS approach may offer modest improvements in predictive performance compared to standard multivariable linear regression models used to estimate spatial variations in ambient UFPs. However, differences in predictive performance were not statistically significant when evaluated using the cross-validation procedure. Crown Copyright © 2015. Published by Elsevier Inc. All rights reserved.
International Nuclear Information System (INIS)
Yu, Jie; Chen, Kuilin; Mori, Junichi; Rashid, Mudassir M.
2013-01-01
Optimizing wind power generation and controlling the operation of wind turbines to efficiently harness the renewable wind energy is a challenging task due to the intermittency and unpredictable nature of wind speed, which has significant influence on wind power production. A new approach for long-term wind speed forecasting is developed in this study by integrating GMCM (Gaussian mixture copula model) and localized GPR (Gaussian process regression). The time series of wind speed is first classified into multiple non-Gaussian components through the Gaussian mixture copula model and then Bayesian inference strategy is employed to incorporate the various non-Gaussian components using the posterior probabilities. Further, the localized Gaussian process regression models corresponding to different non-Gaussian components are built to characterize the stochastic uncertainty and non-stationary seasonality of the wind speed data. The various localized GPR models are integrated through the posterior probabilities as the weightings so that a global predictive model is developed for the prediction of wind speed. The proposed GMCM–GPR approach is demonstrated using wind speed data from various wind farm locations and compared against the GMCM-based ARIMA (auto-regressive integrated moving average) and SVR (support vector regression) methods. In contrast to GMCM–ARIMA and GMCM–SVR methods, the proposed GMCM–GPR model is able to well characterize the multi-seasonality and uncertainty of wind speed series for accurate long-term prediction. - Highlights: • A novel predictive modeling method is proposed for long-term wind speed forecasting. • Gaussian mixture copula model is estimated to characterize the multi-seasonality. • Localized Gaussian process regression models can deal with the random uncertainty. • Multiple GPR models are integrated through Bayesian inference strategy. • The proposed approach shows higher prediction accuracy and reliability
Setiawan, Suhartono, Ahmad, Imam Safawi; Rahmawati, Noorgam Ika
2015-12-01
Bank Indonesia (BI) as the central bank of Republic Indonesiahas a single overarching objective to establish and maintain rupiah stability. This objective could be achieved by monitoring traffic of inflow and outflow money currency. Inflow and outflow are related to stock and distribution of money currency around Indonesia territory. It will effect of economic activities. Economic activities of Indonesia,as one of Moslem country, absolutely related to Islamic Calendar (lunar calendar), that different with Gregorian calendar. This research aims to forecast the inflow and outflow money currency of Representative Office (RO) of BI Semarang Central Java region. The results of the analysis shows that the characteristics of inflow and outflow money currency influenced by the effects of the calendar variations, that is the day of Eid al-Fitr (moslem holyday) as well as seasonal patterns. In addition, the period of a certain week during Eid al-Fitr also affect the increase of inflow and outflow money currency. The best model based on the value of the smallestRoot Mean Square Error (RMSE) for inflow data is ARIMA model. While the best model for predicting the outflow data in RO of BI Semarang is ARIMAX model or Time Series Regression, because both of them have the same model. The results forecast in a period of 2015 shows an increase of inflow money currency happened in August, while the increase in outflow money currency happened in July.
Preisser, J. S.; Phillips, C.; Perin, J.; Schwartz, T. A.
2011-01-01
Objectives The article reviews proportional and partial proportional odds regression for ordered categorical outcomes, such as patient-reported measures, that are frequently used in clinical research in dentistry. Methods The proportional odds regression model for ordinal data is a generalization of ordinary logistic regression for dichotomous responses. When the proportional odds assumption holds for some but not all of the covariates, the lesser known partial proportional odds model is shown to provide a useful extension. Results The ordinal data models are illustrated for the analysis of repeated ordinal outcomes to determine whether the burden associated with sensory alteration following a bilateral sagittal split osteotomy procedure differed for those patients who were given opening exercises only following surgery and those who received sensory retraining exercises in conjunction with standard opening exercises. Conclusions Proportional and partial proportional odds models are broadly applicable to the analysis of cross-sectional and longitudinal ordinal data in dental research. PMID:21070317
truncSP: An R Package for Estimation of Semi-Parametric Truncated Linear Regression Models
Directory of Open Access Journals (Sweden)
Maria Karlsson
2014-05-01
Full Text Available Problems with truncated data occur in many areas, complicating estimation and inference. Regarding linear regression models, the ordinary least squares estimator is inconsistent and biased for these types of data and is therefore unsuitable for use. Alternative estimators, designed for the estimation of truncated regression models, have been developed. This paper presents the R package truncSP. The package contains functions for the estimation of semi-parametric truncated linear regression models using three different estimators: the symmetrically trimmed least squares, quadratic mode, and left truncated estimators, all of which have been shown to have good asymptotic and ?nite sample properties. The package also provides functions for the analysis of the estimated models. Data from the environmental sciences are used to illustrate the functions in the package.
Modeling and prediction of Turkey's electricity consumption using Support Vector Regression
International Nuclear Information System (INIS)
Kavaklioglu, Kadir
2011-01-01
Support Vector Regression (SVR) methodology is used to model and predict Turkey's electricity consumption. Among various SVR formalisms, ε-SVR method was used since the training pattern set was relatively small. Electricity consumption is modeled as a function of socio-economic indicators such as population, Gross National Product, imports and exports. In order to facilitate future predictions of electricity consumption, a separate SVR model was created for each of the input variables using their current and past values; and these models were combined to yield consumption prediction values. A grid search for the model parameters was performed to find the best ε-SVR model for each variable based on Root Mean Square Error. Electricity consumption of Turkey is predicted until 2026 using data from 1975 to 2006. The results show that electricity consumption can be modeled using Support Vector Regression and the models can be used to predict future electricity consumption. (author)
DEFF Research Database (Denmark)
Larsen, Ulrik; Pierobon, Leonardo; Wronski, Jorrit
2014-01-01
to power. In this study we propose four linear regression models to predict the maximum obtainable thermal efficiency for simple and recuperated ORCs. A previously derived methodology is able to determine the maximum thermal efficiency among many combinations of fluids and processes, given the boundary...... conditions of the process. Hundreds of optimised cases with varied design parameters are used as observations in four multiple regression analyses. We analyse the model assumptions, prediction abilities and extrapolations, and compare the results with recent studies in the literature. The models...
Bayes Wavelet Regression Approach to Solve Problems in Multivariable Calibration Modeling
Directory of Open Access Journals (Sweden)
Setiawan Setiawan
2010-05-01
Full Text Available In the multiple regression modeling, a serious problems would arise if the independent variables are correlated among each other (the problem of ill conditioned and the number of observations is much smaller than the number of independent variables (the problem of singularity. Bayes Regression (BR is an approach that can be used to solve the problem of ill conditioned, but computing constraints will be experienced, so pre-processing methods will be necessary in the form of dimensional reduction of independent variables. The results of empirical studies and literature shows that the discrete wavelet transform (WT gives estimation results of regression model which is better than the other preprocessing methods. This experiment will study a combination of BR with WT as pre-processing method to solve the problems ill conditioned and singularities. One application of calibration in the field of chemistry is relationship modeling between the concentration of active substance as measured by High Performance Liquid Chromatography (HPLC with Fourier Transform Infrared (FTIR absorbance spectrum. Spectrum pattern is expected to predict the value of the concentration of active substance. The exploration of Continuum Regression Wavelet Transform (CR-WT, and Partial Least Squares Regression Wavelet Transform (PLS-WT, and Bayes Regression Wavelet Transform (BR-WT shows that the BR-WT has a good performance. BR-WT is superior than PLS-WT method, and relatively is as good as CR-WT method.
Improved model of the retardance in citric acid coated ferrofluids using stepwise regression
Lin, J. F.; Qiu, X. R.
2017-06-01
Citric acid (CA) coated Fe3O4 ferrofluids (FFs) have been conducted for biomedical application. The magneto-optical retardance of CA coated FFs was measured by a Stokes polarimeter. Optimization and multiple regression of retardance in FFs were executed by Taguchi method and Microsoft Excel previously, and the F value of regression model was large enough. However, the model executed by Excel was not systematic. Instead we adopted the stepwise regression to model the retardance of CA coated FFs. From the results of stepwise regression by MATLAB, the developed model had highly predictable ability owing to F of 2.55897e+7 and correlation coefficient of one. The average absolute error of predicted retardances to measured retardances was just 0.0044%. Using the genetic algorithm (GA) in MATLAB, the optimized parametric combination was determined as [4.709 0.12 39.998 70.006] corresponding to the pH of suspension, molar ratio of CA to Fe3O4, CA volume, and coating temperature. The maximum retardance was found as 31.712°, close to that obtained by evolutionary solver in Excel and a relative error of -0.013%. Above all, the stepwise regression method was successfully used to model the retardance of CA coated FFs, and the maximum global retardance was determined by the use of GA.
Bias and Uncertainty in Regression-Calibrated Models of Groundwater Flow in Heterogeneous Media
DEFF Research Database (Denmark)
Cooley, R.L.; Christensen, Steen
2006-01-01
small. Model error is accounted for in the weighted nonlinear regression methodology developed to estimate θ* and assess model uncertainties by incorporating the second-moment matrix of the model errors into the weight matrix. Techniques developed by statisticians to analyze classical nonlinear...... are reduced in magnitude. Biases, correction factors, and confidence and prediction intervals were obtained for a test problem for which model error is large to test robustness of the methodology. Numerical results conform with the theoretical analysis....
Directory of Open Access Journals (Sweden)
Kyung-Jin Kim
2017-02-01
An accuracy evaluation using observations from 2002 to 2009 found that the time-lagged ensemble approach alone produced significant bias but the AR processor reduced the relative error percentage of the peak discharge from 60% to 10% and also decreased the peak timing error from more than 10 h to less than 3 h, on average. The proposed methodology is easy and inexpensive to implement with the existing products and models and thus can be immediately activated until a new product for forecasted meteorological ensembles is officially issued in Korea.
Genomic prediction based on data from three layer lines using non-linear regression models
Huang, H.; Windig, J.J.; Vereijken, A.; Calus, M.P.L.
2014-01-01
Background - Most studies on genomic prediction with reference populations that include multiple lines or breeds have used linear models. Data heterogeneity due to using multiple populations may conflict with model assumptions used in linear regression methods. Methods - In an attempt to alleviate
Kleijnen, J.P.C.
1995-01-01
This tutorial discusses what-if analysis and optimization of System Dynamics models. These problems are solved, using the statistical techniques of regression analysis and design of experiments (DOE). These issues are illustrated by applying the statistical techniques to a System Dynamics model for
A national fine spatial scale land-use regression model for ozone
Kerckhoffs, Jules|info:eu-repo/dai/nl/411260502; Wang, Meng|info:eu-repo/dai/nl/345480279; Meliefste, Kees; Malmqvist, Ebba; Fischer, Paul; Janssen, Nicole A H; Beelen, Rob|info:eu-repo/dai/nl/30483100X; Hoek, Gerard|info:eu-repo/dai/nl/069553475
Uncertainty about health effects of long-term ozone exposure remains. Land use regression (LUR) models have been used successfully for modeling fine scale spatial variation of primary pollutants but very limited for ozone. Our objective was to assess the feasibility of developing a national LUR
A LATENT CLASS POISSON REGRESSION-MODEL FOR HETEROGENEOUS COUNT DATA
WEDEL, M; DESARBO, WS; BULT, [No Value; RAMASWAMY, [No Value
1993-01-01
In this paper an approach is developed that accommodates heterogeneity in Poisson regression models for count data. The model developed assumes that heterogeneity arises from a distribution of both the intercept and the coefficients of the explanatory variables. We assume that the mixing
Determining factors influencing survival of breast cancer by fuzzy logistic regression model.
Nikbakht, Roya; Bahrampour, Abbas
2017-01-01
Fuzzy logistic regression model can be used for determining influential factors of disease. This study explores the important factors of actual predictive survival factors of breast cancer's patients. We used breast cancer data which collected by cancer registry of Kerman University of Medical Sciences during the period of 2000-2007. The variables such as morphology, grade, age, and treatments (surgery, radiotherapy, and chemotherapy) were applied in the fuzzy logistic regression model. Performance of model was determined in terms of mean degree of membership (MDM). The study results showed that almost 41% of patients were in neoplasm and malignant group and more than two-third of them were still alive after 5-year follow-up. Based on the fuzzy logistic model, the most important factors influencing survival were chemotherapy, morphology, and radiotherapy, respectively. Furthermore, the MDM criteria show that the fuzzy logistic regression have a good fit on the data (MDM = 0.86). Fuzzy logistic regression model showed that chemotherapy is more important than radiotherapy in survival of patients with breast cancer. In addition, another ability of this model is calculating possibilistic odds of survival in cancer patients. The results of this study can be applied in clinical research. Furthermore, there are few studies which applied the fuzzy logistic models. Furthermore, we recommend using this model in various research areas.
de Vries, S O; Fidler, Vaclav; Kuipers, Wietze D; Hunink, Maria G M
1998-01-01
The purpose of this study was to develop a model that predicts the outcome of supervised exercise for intermittent claudication. The authors present an example of the use of autoregressive logistic regression for modeling observed longitudinal data. Data were collected from 329 participants in a
Logistic regression models of factors influencing the location of bioenergy and biofuels plants
T.M. Young; R.L. Zaretzki; J.H. Perdue; F.M. Guess; X. Liu
2011-01-01
Logistic regression models were developed to identify significant factors that influence the location of existing wood-using bioenergy/biofuels plants and traditional wood-using facilities. Logistic models provided quantitative insight for variables influencing the location of woody biomass-using facilities. Availability of "thinnings to a basal area of 31.7m2/ha...
Profile-driven regression for modeling and runtime optimization of mobile networks
DEFF Research Database (Denmark)
McClary, Dan; Syrotiuk, Violet; Kulahci, Murat
2010-01-01
of throughput in a mobile ad hoc network, a self-organizing collection of mobile wireless nodes without any fixed infrastructure. The intermediate models generated in profile-driven regression are used to fit an overall model of throughput, and are also used to optimize controllable factors at runtime. Unlike...
The use of logistic regression in modelling the distributions of bird ...
African Journals Online (AJOL)
The method of logistic regression was used to model the observed geographical distribution patterns of bird species in Swaziland in relation to a set of environmental variables. Reporting rates derived from bird atlas data are used as an index of population densities. This is justified in part by the success of the modelling ...
The use of logistic regression in modelling the distributions of bird ...
African Journals Online (AJOL)
The method of logistic regression was used to model the observed geographical distribution patterns of bird species in Swaziland in relation to a set of environmental variables. Reporting rates derived from brrd atlas data are used as an index of population densities. This is justified in part by the success of the modelling ...
Random regression models in the evaluation of the growth curve of Simbrasil beef cattle
Mota, M.; Marques, F.A.; Lopes, P.S.; Hidalgo, A.M.
2013-01-01
Random regression models were used to estimate the types and orders of random effects of (co)variance functions in the description of the growth trajectory of the Simbrasil cattle breed. Records for 7049 animals totaling 18,677 individual weighings were submitted to 15 models from the third to the
DEFF Research Database (Denmark)
Carstensen, Bendix
1996-01-01
This paper shows how to fit excess and relative risk regression models to interval censored survival data, and how to implement the models in standard statistical software. The methods developed are used for the analysis of HIV infection rates in a cohort of Danish homosexual men....
Zheng, F.
2011-01-01
Urban travel times are intrinsically uncertain due to a lot of stochastic characteristics of traffic, especially at signalized intersections. A single travel time does not have much meaning and is not informative to drivers or traffic managers. The range of travel times is large such that certain
Deep ensemble learning of sparse regression models for brain disease diagnosis.
Suk, Heung-Il; Lee, Seong-Whan; Shen, Dinggang
2017-04-01
Recent studies on brain imaging analysis witnessed the core roles of machine learning techniques in computer-assisted intervention for brain disease diagnosis. Of various machine-learning techniques, sparse regression models have proved their effectiveness in handling high-dimensional data but with a small number of training samples, especially in medical problems. In the meantime, deep learning methods have been making great successes by outperforming the state-of-the-art performances in various applications. In this paper, we propose a novel framework that combines the two conceptually different methods of sparse regression and deep learning for Alzheimer's disease/mild cognitive impairment diagnosis and prognosis. Specifically, we first train multiple sparse regression models, each of which is trained with different values of a regularization control parameter. Thus, our multiple sparse regression models potentially select different feature subsets from the original feature set; thereby they have different powers to predict the response values, i.e., clinical label and clinical scores in our work. By regarding the response values from our sparse regression models as target-level representations, we then build a deep convolutional neural network for clinical decision making, which thus we call 'Deep Ensemble Sparse Regression Network.' To our best knowledge, this is the first work that combines sparse regression models with deep neural network. In our experiments with the ADNI cohort, we validated the effectiveness of the proposed method by achieving the highest diagnostic accuracies in three classification tasks. We also rigorously analyzed our results and compared with the previous studies on the ADNI cohort in the literature. Copyright © 2017 Elsevier B.V. All rights reserved.
Forghani, Ali; Peralta, Richard C.
2017-10-01
The study presents a procedure using solute transport and statistical models to evaluate the performance of aquifer storage and recovery (ASR) systems designed to earn additional water rights in freshwater aquifers. The recovery effectiveness (REN) index quantifies the performance of these ASR systems. REN is the proportion of the injected water that the same ASR well can recapture during subsequent extraction periods. To estimate REN for individual ASR wells, the presented procedure uses finely discretized groundwater flow and contaminant transport modeling. Then, the procedure uses multivariate adaptive regression splines (MARS) analysis to identify the significant variables affecting REN, and to identify the most recovery-effective wells. Achieving REN values close to 100% is the desire of the studied 14-well ASR system operator. This recovery is feasible for most of the ASR wells by extracting three times the injectate volume during the same year as injection. Most of the wells would achieve RENs below 75% if extracting merely the same volume as they injected. In other words, recovering almost all the same water molecules that are injected requires having a pre-existing water right to extract groundwater annually. MARS shows that REN most significantly correlates with groundwater flow velocity, or hydraulic conductivity and hydraulic gradient. MARS results also demonstrate that maximizing REN requires utilizing the wells located in areas with background Darcian groundwater velocities less than 0.03 m/d. The study also highlights the superiority of MARS over regular multiple linear regressions to identify the wells that can provide the maximum REN. This is the first reported application of MARS for evaluating performance of an ASR system in fresh water aquifers.
Bakhtiyari, Mahmood; Mehmandar, Mohammad Reza; Mirbagheri, Babak; Hariri, Gholam Reza; Delpisheh, Ali; Soori, Hamid
2014-01-01
Risk factors of human-related traffic crashes are the most important and preventable challenges for community health due to their noteworthy burden in developing countries in particular. The present study aims to investigate the role of human risk factors of road traffic crashes in Iran. Through a cross-sectional study using the COM 114 data collection forms, the police records of almost 600,000 crashes occurred in 2010 are investigated. The binary logistic regression and proportional odds regression models are used. The odds ratio for each risk factor is calculated. These models are adjusted for known confounding factors including age, sex and driving time. The traffic crash reports of 537,688 men (90.8%) and 54,480 women (9.2%) are analysed. The mean age is 34.1 ± 14 years. Not maintaining eyes on the road (53.7%) and losing control of the vehicle (21.4%) are the main causes of drivers' deaths in traffic crashes within cities. Not maintaining eyes on the road is also the most frequent human risk factor for road traffic crashes out of cities. Sudden lane excursion (OR = 9.9, 95% CI: 8.2-11.9) and seat belt non-compliance (OR = 8.7, CI: 6.7-10.1), exceeding authorised speed (OR = 17.9, CI: 12.7-25.1) and exceeding safe speed (OR = 9.7, CI: 7.2-13.2) are the most significant human risk factors for traffic crashes in Iran. The high mortality rate of 39 people for every 100,000 population emphasises on the importance of traffic crashes in Iran. Considering the important role of human risk factors in traffic crashes, struggling efforts are required to control dangerous driving behaviours such as exceeding speed, illegal overtaking and not maintaining eyes on the road.
A personalized microRNA microarray normalization method using a logistic regression model.
Wang, Bin; Wang, Xiao-Feng; Howell, Paul; Qian, Xuemin; Huang, Kun; Riker, Adam I; Ju, Jingfang; Xi, Yaguang
2010-01-15
MicroRNA (miRNA) is a set of newly discovered non-coding small RNA molecules. Its significant effects have contributed to a number of critical biological events including cell proliferation, apoptosis development, as well as tumorigenesis. High-dimensional genomic discovery platforms (e.g. microarray) have been employed to evaluate the important roles of miRNAs by analyzing their expression profiling. However, because of the small total number of miRNAs and the absence of well-known endogenous controls, the traditional normalization methods for messenger RNA (mRNA) profiling analysis could not offer a suitable solution for miRNA analysis. The need for the establishment of new adaptive methods has come to the forefront. Locked nucleic acid (LNA)-based miRNA array was employed to profile miRNAs using colorectal cancer cell lines under different treatments. The expression pattern of overall miRNA profiling was pre-evaluated by a panel of miRNAs using Taqman-based quantitative real-time polymerase chain reaction (qRT-PCR) miRNA assays. A logistic regression model was built based on qRT-PCR results and then applied to the normalization of miRNA array data. The expression levels of 20 additional miRNAs selected from the normalized list were post-validated. Compared with other popularly used normalization methods, the logistic regression model efficiently calibrates the variance across arrays and improves miRNA microarray discovery accuracy. Datasets and R package are available at http://gauss.usouthal.edu/publ/logit/.
Wang, Qingliang; Li, Xiaojie; Hu, Kunpeng; Zhao, Kun; Yang, Peisheng; Liu, Bo
2015-05-12
To explore the risk factors of portal hypertensive gastropathy (PHG) in patients with hepatitis B associated cirrhosis and establish a Logistic regression model of noninvasive prediction. The clinical data of 234 hospitalized patients with hepatitis B associated cirrhosis from March 2012 to March 2014 were analyzed retrospectively. The dependent variable was the occurrence of PHG while the independent variables were screened by binary Logistic analysis. Multivariate Logistic regression was used for further analysis of significant noninvasive independent variables. Logistic regression model was established and odds ratio was calculated for each factor. The accuracy, sensitivity and specificity of model were evaluated by the curve of receiver operating characteristic (ROC). According to univariate Logistic regression, the risk factors included hepatic dysfunction, albumin (ALB), bilirubin (TB), prothrombin time (PT), platelet (PLT), white blood cell (WBC), portal vein diameter, spleen index, splenic vein diameter, diameter ratio, PLT to spleen volume ratio, esophageal varices (EV) and gastric varices (GV). Multivariate analysis showed that hepatic dysfunction (X1), TB (X2), PLT (X3) and splenic vein diameter (X4) were the major occurring factors for PHG. The established regression model was Logit P=-2.667+2.186X1-2.167X2+0.725X3+0.976X4. The accuracy of model for PHG was 79.1% with a sensitivity of 77.2% and a specificity of 80.8%. Hepatic dysfunction, TB, PLT and splenic vein diameter are risk factors for PHG and the noninvasive predicted Logistic regression model was Logit P=-2.667+2.186X1-2.167X2+0.725X3+0.976X4.
Flexible regression models for estimating postmortem interval (PMI) in forensic medicine.
Muñoz Barús, José Ignacio; Febrero-Bande, Manuel; Cadarso-Suárez, Carmen
2008-10-30
Correct determination of time of death is an important goal in forensic medicine. Numerous methods have been described for estimating postmortem interval (PMI), but most are imprecise, poorly reproducible and/or have not been validated with real data. In recent years, however, some progress in PMI estimation has been made, notably through the use of new biochemical methods for quantifying relevant indicator compounds in the vitreous humour. The best, but unverified, results have been obtained with [K+] and hypoxanthine [Hx], using simple linear regression (LR) models. The main aim of this paper is to offer more flexible alternatives to LR, such as generalized additive models (GAMs) and support vector machines (SVMs) in order to obtain improved PMI estimates. The present study, based on detailed analysis of [K+] and [Hx] in more than 200 vitreous humour samples from subjects with known PMI, compared classical LR methodology with GAM and SVM methodologies. Both proved better than LR for estimation of PMI. SVM showed somewhat greater precision than GAM, but GAM offers a readily interpretable graphical output, facilitating understanding of findings by legal professionals; there are thus arguments for using both types of models. R code for these methods is available from the authors, permitting accurate prediction of PMI from vitreous humour [K+], [Hx] and [U], with confidence intervals and graphical output provided. Copyright 2008 John Wiley & Sons, Ltd.
Estimating carbon and showing impacts of drought using satellite data in regression-tree models
Boyte, Stephen; Wylie, Bruce K.; Howard, Danny; Dahal, Devendra; Gilmanov, Tagir G.
2018-01-01
Integrating spatially explicit biogeophysical and remotely sensed data into regression-tree models enables the spatial extrapolation of training data over large geographic spaces, allowing a better understanding of broad-scale ecosystem processes. The current study presents annual gross primary production (GPP) and annual ecosystem respiration (RE) for 2000–2013 in several short-statured vegetation types using carbon flux data from towers that are located strategically across the conterminous United States (CONUS). We calculate carbon fluxes (annual net ecosystem production [NEP]) for each year in our study period, which includes 2012 when drought and higher-than-normal temperatures influence vegetation productivity in large parts of the study area. We present and analyse carbon flux dynamics in the CONUS to better understand how drought affects GPP, RE, and NEP. Model accuracy metrics show strong correlation coefficients (r) (r ≥ 94%) between training and estimated data for both GPP and RE. Overall, average annual GPP, RE, and NEP are relatively constant throughout the study period except during 2012 when almost 60% less carbon is sequestered than normal. These results allow us to conclude that this modelling method effectively estimates carbon dynamics through time and allows the exploration of impacts of meteorological anomalies and vegetation types on carbon dynamics.
Association of footprint measurements with plantar kinetics: a linear regression model.
Fascione, Jeanna M; Crews, Ryan T; Wrobel, James S
2014-03-01
The use of foot measurements to classify morphology and interpret foot function remains one of the focal concepts of lower-extremity biomechanics. However, only 27% to 55% of midfoot variance in foot pressures has been determined in the most comprehensive models. We investigated whether dynamic walking footprint measurements are associated with inter-individual foot loading variability. Thirty individuals (15 men and 15 women; mean ± SD age, 27.17 ± 2.21 years) walked at a self-selected speed over an electronic pedography platform using the midgait technique. Kinetic variables (contact time, peak pressure, pressure-time integral, and force-time integral) were collected for six masked regions. Footprints were digitized for area and linear boundaries using digital photo planimetry software. Six footprint measurements were determined: contact area, footprint index, arch index, truncated arch index, Chippaux-Smirak index, and Staheli index. Linear regression analysis with a Bonferroni adjustment was performed to determine the association between the footprint measurements and each of the kinetic variables. The findings demonstrate that a relationship exists between increased midfoot contact and increased kinetic values in respective locations. Many of these variables produced large effect sizes while describing 38% to 71% of the common variance of select plantar kinetic variables in the medial midfoot region. In addition, larger footprints were associated with larger kinetic values at the medial heel region and both masked forefoot regions. Dynamic footprint measurements are associated with dynamic plantar loading kinetics, with emphasis on the midfoot region.
FlexMix: A General Framework for Finite Mixture Models and Latent Class Regression in R
Directory of Open Access Journals (Sweden)
Friedrich Leisch
2004-10-01
Full Text Available FlexMix implements a general framework for fitting discrete mixtures of regression models in the R statistical computing environment: three variants of the EM algorithm can be used for parameter estimation, regressors and responses may be multivariate with arbitrary dimension, data may be grouped, e.g., to account for multiple observations per individual, the usual formula interface of the S language is used for convenient model specification, and a modular concept of driver functions allows to interface many different types of regression models. Existing drivers implement mixtures of standard linear models, generalized linear models and model-based clustering. FlexMix provides the E-step and all data handling, while the M-step can be supplied by the user to easily define new models.
Arida, Maya Ahmad
In 1972 sustainable development concept existed and during The years it became one of the most important solution to save natural resources and energy, but now with rising energy costs and increasing awareness of the effect of global warming, the development of building energy saving methods and models become apparently more necessary for sustainable future. According to U.S. Energy Information Administration EIA (EIA), today buildings in the U.S. consume 72 percent of electricity produced, and use 55 percent of U.S. natural gas. Buildings account for about 40 percent of the energy consumed in the United States, more than industry and transportation. Of this energy, heating and cooling systems use about 55 percent. If energy-use trends continue, buildings will become the largest consumer of global energy by 2025. This thesis proposes procedures and analysis techniques for building energy system and optimization methods using time series auto regression artificial neural networks. The model predicts whole building energy consumptions as a function of four input variables, dry bulb and wet bulb outdoor air temperatures, hour of day and type of day. The proposed model and the optimization process are tested using data collected from an existing building located in Greensboro, NC. The testing results show that the model can capture very well the system performance, and The optimization method was also developed to automate the process of finding the best model structure that can produce the best accurate prediction against the actual data. The results show that the developed model can provide results sufficiently accurate for its use in various energy efficiency and saving estimation applications.
Madarang, Krish J; Kang, Joo-Hyon
2014-06-01
Stormwater runoff has been identified as a source of pollution for the environment, especially for receiving waters. In order to quantify and manage the impacts of stormwater runoff on the environment, predictive models and mathematical models have been developed. Predictive tools such as regression models have been widely used to predict stormwater discharge characteristics. Storm event characteristics, such as antecedent dry days (ADD), have been related to response variables, such as pollutant loads and concentrations. However it has been a controversial issue among many studies to consider ADD as an important variable in predicting stormwater discharge characteristics. In this study, we examined the accuracy of general linear regression models in predicting discharge characteristics of roadway runoff. A total of 17 storm events were monitored in two highway segments, located in Gwangju, Korea. Data from the monitoring were used to calibrate United States Environmental Protection Agency's Storm Water Management Model (SWMM). The calibrated SWMM was simulated for 55 storm events, and the results of total suspended solid (TSS) discharge loads and event mean concentrations (EMC) were extracted. From these data, linear regression models were developed. R(2) and p-values of the regression of ADD for both TSS loads and EMCs were investigated. Results showed that pollutant loads were better predicted than pollutant EMC in the multiple regression models. Regression may not provide the true effect of site-specific characteristics, due to uncertainty in the data. Copyright © 2014 The Research Centre for Eco-Environmental Sciences, Chinese Academy of Sciences. Published by Elsevier B.V. All rights reserved.
Adjusting for Confounding in Early Postlaunch Settings: Going Beyond Logistic Regression Models.
Schmidt, Amand F; Klungel, Olaf H; Groenwold, Rolf H H
2016-01-01
Postlaunch data on medical treatments can be analyzed to explore adverse events or relative effectiveness in real-life settings. These analyses are often complicated by the number of potential confounders and the possibility of model misspecification. We conducted a simulation study to compare the performance of logistic regression, propensity score, disease risk score, and stabilized inverse probability weighting methods to adjust for confounding. Model misspecification was induced in the independent derivation dataset. We evaluated performance using relative bias confidence interval coverage of the true effect, among other metrics. At low events per coefficient (1.0 and 0.5), the logistic regression estimates had a large relative bias (greater than -100%). Bias of the disease risk score estimates was at most 13.48% and 18.83%. For the propensity score model, this was 8.74% and >100%, respectively. At events per coefficient of 1.0 and 0.5, inverse probability weighting frequently failed or reduced to a crude regression, resulting in biases of -8.49% and 24.55%. Coverage of logistic regression estimates became less than the nominal level at events per coefficient ≤5. For the disease risk score, inverse probability weighting, and propensity score, coverage became less than nominal at events per coefficient ≤2.5, ≤1.0, and ≤1.0, respectively. Bias of misspecified disease risk score models was 16.55%. In settings with low events/exposed subjects per coefficient, disease risk score methods can be useful alternatives to logistic regression models, especially when propensity score models cannot be used. Despite better performance of disease risk score methods than logistic regression and propensity score models in small events per coefficient settings, bias, and coverage still deviated from nominal.
Modeling of Soil Aggregate Stability using Support Vector Machines and Multiple Linear Regression
Directory of Open Access Journals (Sweden)
Ali Asghar Besalatpour
2016-02-01
Full Text Available Introduction: Soil aggregate stability is a key factor in soil resistivity to mechanical stresses, including the impacts of rainfall and surface runoff, and thus to water erosion (Canasveras et al., 2010. Various indicators have been proposed to characterize and quantify soil aggregate stability, for example percentage of water-stable aggregates (WSA, mean weight diameter (MWD, geometric mean diameter (GMD of aggregates, and water-dispersible clay (WDC content (Calero et al., 2008. Unfortunately, the experimental methods available to determine these indicators are laborious, time-consuming and difficult to standardize (Canasveras et al., 2010. Therefore, it would be advantageous if aggregate stability could be predicted indirectly from more easily available data (Besalatpour et al., 2014. The main objective of this study is to investigate the potential use of support vector machines (SVMs method for estimating soil aggregate stability (as quantified by GMD as compared to multiple linear regression approach. Materials and Methods: The study area was part of the Bazoft watershed (31° 37′ to 32° 39′ N and 49° 34′ to 50° 32′ E, which is located in the Northern part of the Karun river basin in central Iran. A total of 160 soil samples were collected from the top 5 cm of soil surface. Some easily available characteristics including topographic, vegetation, and soil properties were used as inputs. Soil organic matter (SOM content was determined by the Walkley-Black method (Nelson & Sommers, 1986. Particle size distribution in the soil samples (clay, silt, sand, fine sand, and very fine sand were measured using the procedure described by Gee & Bauder (1986 and calcium carbonate equivalent (CCE content was determined by the back-titration method (Nelson, 1982. The modified Kemper & Rosenau (1986 method was used to determine wet-aggregate stability (GMD. The topographic attributes of elevation, slope, and aspect were characterized using a 20-m
Introduction to Time Series Modeling
Kitagawa, Genshiro
2010-01-01
In time series modeling, the behavior of a certain phenomenon is expressed in relation to the past values of itself and other covariates. Since many important phenomena in statistical analysis are actually time series and the identification of conditional distribution of the phenomenon is an essential part of the statistical modeling, it is very important and useful to learn fundamental methods of time series modeling. Illustrating how to build models for time series using basic methods, "Introduction to Time Series Modeling" covers numerous time series models and the various tools f
Buonaccorsi, John P; Romeo, Giovanni; Thoresen, Magne
2018-03-01
When fitting regression models, measurement error in any of the predictors typically leads to biased coefficients and incorrect inferences. A plethora of methods have been proposed to correct for this. Obtaining standard errors and confidence intervals using the corrected estimators can be challenging and, in addition, there is concern about remaining bias in the corrected estimators. The bootstrap, which is one option to address these problems, has received limited attention in this context. It has usually been employed by simply resampling observations, which, while suitable in some situations, is not always formally justified. In addition, the simple bootstrap does not allow for estimating bias in non-linear models, including logistic regression. Model-based bootstrapping, which can potentially estimate bias in addition to being robust to the original sampling or whether the measurement error variance is constant or not, has received limited attention. However, it faces challenges that are not present in handling regression models with no measurement error. This article develops new methods for model-based bootstrapping when correcting for measurement error in logistic regression with replicate measures. The methodology is illustrated using two examples, and a series of simulations are carried out to assess and compare the simple and model-based bootstrap methods, as well as other standard methods. While not always perfect, the model-based approaches offer some distinct improvements over the other methods. © 2017, The International Biometric Society.
Construction of risk prediction model of type 2 diabetes mellitus based on logistic regression
Directory of Open Access Journals (Sweden)
Li Jian
2017-01-01
Full Text Available Objective: to construct multi factor prediction model for the individual risk of T2DM, and to explore new ideas for early warning, prevention and personalized health services for T2DM. Methods: using logistic regression techniques to screen the risk factors for T2DM and construct the risk prediction model of T2DM. Results: Male’s risk prediction model logistic regression equation: logit(P=BMI × 0.735+ vegetables × (−0.671 + age × 0.838+ diastolic pressure × 0.296+ physical activity× (−2.287 + sleep ×(−0.009 +smoking ×0.214; Female’s risk prediction model logistic regression equation: logit(P=BMI ×1.979+ vegetables× (−0.292 + age × 1.355+ diastolic pressure× 0.522+ physical activity × (−2.287 + sleep × (−0.010.The area under the ROC curve of male was 0.83, the sensitivity was 0.72, the specificity was 0.86, the area under the ROC curve of female was 0.84, the sensitivity was 0.75, the specificity was 0.90. Conclusion: This study model data is from a compared study of nested case, the risk prediction model has been established by using the more mature logistic regression techniques, and the model is higher predictive sensitivity, specificity and stability.
Directory of Open Access Journals (Sweden)
Gang WU
2016-01-01
Full Text Available Objective To analyze the risk factors for prognosis in intracerebral hemorrhage using decision tree (classification and regression tree, CART model and logistic regression model. Methods CART model and logistic regression model were established according to the risk factors for prognosis of patients with cerebral hemorrhage. The differences in the results were compared between the two methods. Results Logistic regression analyses showed that hematoma volume (OR-value 0.953, initial Glasgow Coma Scale (GCS score (OR-value 1.210, pulmonary infection (OR-value 0.295, and basal ganglia hemorrhage (OR-value 0.336 were the risk factors for the prognosis of cerebral hemorrhage. The results of CART analysis showed that volume of hematoma and initial GCS score were the main factors affecting the prognosis of cerebral hemorrhage. The effects of two models on the prognosis of cerebral hemorrhage were similar (Z-value 0.402, P=0.688. Conclusions CART model has a similar value to that of logistic model in judging the prognosis of cerebral hemorrhage, and it is characterized by using transactional analysis between the risk factors, and it is more intuitive. DOI: 10.11855/j.issn.0577-7402.2015.12.13
Attribute Selection Impact on Linear and Nonlinear Regression Models for Crop Yield Prediction
Directory of Open Access Journals (Sweden)
Alberto Gonzalez-Sanchez
2014-01-01
Full Text Available Efficient cropping requires yield estimation for each involved crop, where data-driven models are commonly applied. In recent years, some data-driven modeling technique comparisons have been made, looking for the best model to yield prediction. However, attributes are usually selected based on expertise assessment or in dimensionality reduction algorithms. A fairer comparison should include the best subset of features for each regression technique; an evaluation including several crops is preferred. This paper evaluates the most common data-driven modeling techniques applied to yield prediction, using a complete method to define the best attribute subset for each model. Multiple linear regression, stepwise linear regression, M5′ regression trees, and artificial neural networks (ANN were ranked. The models were built using real data of eight crops sowed in an irrigation module of Mexico. To validate the models, three accuracy metrics were used: the root relative square error (RRSE, relative mean absolute error (RMAE, and correlation factor (R. The results show that ANNs are more consistent in the best attribute subset composition between the learning and the training stages, obtaining the lowest average RRSE (86.04%, lowest average RMAE (8.75%, and the highest average correlation factor (0.63.
Regional regression models of watershed suspended-sediment discharge for the eastern United States
Roman, David C.; Vogel, Richard M.; Schwarz, Gregory E.
2012-01-01
Estimates of mean annual watershed sediment discharge, derived from long-term measurements of suspended-sediment concentration and streamflow, often are not available at locations of interest. The goal of this study was to develop multivariate regression models to enable prediction of mean annual suspended-sediment discharge from available basin characteristics useful for most ungaged river locations in the eastern United States. The models are based on long-term mean sediment discharge estimates and explanatory variables obtained from a combined dataset of 1201 US Geological Survey (USGS) stations derived from a SPAtially Referenced Regression on Watershed attributes (SPARROW) study and the Geospatial Attributes of Gages for Evaluating Streamflow (GAGES) database. The resulting regional regression models summarized for major US water resources regions 1–8, exhibited prediction R2 values ranging from 76.9% to 92.7% and corresponding average model prediction errors ranging from 56.5% to 124.3%. Results from cross-validation experiments suggest that a majority of the models will perform similarly to calibration runs. The 36-parameter regional regression models also outperformed a 16-parameter national SPARROW model of suspended-sediment discharge and indicate that mean annual sediment loads in the eastern United States generally correlates with a combination of basin area, land use patterns, seasonal precipitation, soil composition, hydrologic modification, and to a lesser extent, topography.
SPSS macros to compare any two fitted values from a regression model.
Weaver, Bruce; Dubois, Sacha
2012-12-01
In regression models with first-order terms only, the coefficient for a given variable is typically interpreted as the change in the fitted value of Y for a one-unit increase in that variable, with all other variables held constant. Therefore, each regression coefficient represents the difference between two fitted values of Y. But the coefficients represent only a fraction of the possible fitted value comparisons that might be of interest to researchers. For many fitted value comparisons that are not captured by any of the regression coefficients, common statistical software packages do not provide the standard errors needed to compute confidence intervals or carry out statistical tests-particularly in more complex models that include interactions, polynomial terms, or regression splines. We describe two SPSS macros that implement a matrix algebra method for comparing any two fitted values from a regression model. The !OLScomp and !MLEcomp macros are for use with models fitted via ordinary least squares and maximum likelihood estimation, respectively. The output from the macros includes the standard error of the difference between the two fitted values, a 95% confidence interval for the difference, and a corresponding statistical test with its p-value.
Non-parametric genetic prediction of complex traits with latent Dirichlet process regression models.
Zeng, Ping; Zhou, Xiang
2017-09-06
Using genotype data to perform accurate genetic prediction of complex traits can facilitate genomic selection in animal and plant breeding programs, and can aid in the development of personalized medicine in humans. Because most complex traits have a polygenic architecture, accurate genetic prediction often requires modeling all genetic variants together via polygenic methods. Here, we develop such a polygenic method, which we refer to as the latent Dirichlet process regression model. Dirichlet process regression is non-parametric in nature, relies on the Dirichlet process to flexibly and adaptively model the effect size distribution, and thus enjoys robust prediction performance across a broad spectrum of genetic architectures. We compare Dirichlet process regression with several commonly used prediction methods with simulations. We further apply Dirichlet process regression to predict gene expressions, to conduct PrediXcan based gene set test, to perform genomic selection of four traits in two species, and to predict eight complex traits in a human cohort.Genetic prediction of complex traits with polygenic architecture has wide application from animal breeding to disease prevention. Here, Zeng and Zhou develop a non-parametric genetic prediction method based on latent Dirichlet Process regression models.
Multiple regression models for energy use in air-conditioned office buildings in different climates
International Nuclear Information System (INIS)
Lam, Joseph C.; Wan, Kevin K.W.; Liu Dalong; Tsang, C.L.
2010-01-01
An attempt was made to develop multiple regression models for office buildings in the five major climates in China - severe cold, cold, hot summer and cold winter, mild, and hot summer and warm winter. A total of 12 key building design variables were identified through parametric and sensitivity analysis, and considered as inputs in the regression models. The coefficient of determination R 2 varies from 0.89 in Harbin to 0.97 in Kunming, indicating that 89-97% of the variations in annual building energy use can be explained by the changes in the 12 parameters. A pseudo-random number generator based on three simple multiplicative congruential generators was employed to generate random designs for evaluation of the regression models. The difference between regression-predicted and DOE-simulated annual building energy use are largely within 10%. It is envisaged that the regression models developed can be used to estimate the likely energy savings/penalty during the initial design stage when different building schemes and design concepts are being considered.
Quantile regression for censored mixed-effects models with applications to HIV studies.
Lachos, Victor H; Chen, Ming-Hui; Abanto-Valle, Carlos A; Azevedo, Caio L N
HIV RNA viral load measures are often subjected to some upper and lower detection limits depending on the quantification assays. Hence, the responses are either left or right censored. Linear/nonlinear mixed-effects models, with slight modifications to accommodate censoring, are routinely used to analyze this type of data. Usually, the inference procedures are based on normality (or elliptical distribution) assumptions for the random terms. However, those analyses might not provide robust inference when the distribution assumptions are questionable. In this paper, we discuss a fully Bayesian quantile regression inference using Markov Chain Monte Carlo (MCMC) methods for longitudinal data models with random effects and censored responses. Compared to the conventional mean regression approach, quantile regression can characterize the entire conditional distribution of the outcome variable, and is more robust to outliers and misspecification of the error distribution. Under the assumption that the error term follows an asymmetric Laplace distribution, we develop a hierarchical Bayesian model and obtain the posterior distribution of unknown parameters at the p th level, with the median regression ( p = 0.5) as a special case. The proposed procedures are illustrated with two HIV AIDS studies on viral loads that were initially analyzed using the typical normal (censored) mean regression mixed-effects models, as well as a simulation study.
LINEAR REGRESSION MODEL ESTİMATİON FOR RIGHT CENSORED DATA
Directory of Open Access Journals (Sweden)
Ersin Yılmaz
2016-05-01
Full Text Available In this study, firstly we will define a right censored data. If we say shortly right-censored data is censoring values that above the exact line. This may be related with scaling device. And then we will use response variable acquainted from right-censored explanatory variables. Then the linear regression model will be estimated. For censored data’s existence, Kaplan-Meier weights will be used for the estimation of the model. With the weights regression model will be consistent and unbiased with that. And also there is a method for the censored data that is a semi parametric regression and this method also give useful results for censored data too. This study also might be useful for the health studies because of the censored data used in medical issues generally.
Arora, Amarpreet S; Reddy, Akepati S
2014-01-01
Stormwater management at urban sub-watershed level has been envisioned to include stormwater collection, treatment, and disposal of treated stormwater through groundwater recharging. Sizing, operation and control of the stormwater management systems require information on the quantities and characteristics of the stormwater generated. Stormwater characteristics depend upon dry spell between two successive rainfall events, intensity of rainfall and watershed characteristics. However, sampling and analysis of stormwater, spanning only few rainfall events, provides insufficient information on the characteristics. An attempt has been made in the present study to assess the stormwater characteristics through regression modeling. Stormwater of five sub-watersheds of Patiala city were sampled and analyzed. The results obtained were related with the antecedent dry periods and with the intensity of the rainfall event through regression modeling. Obtained regression models were used to assess the stormwater quality for various antecedent dry periods and rainfall event intensities.
Modeling Personalized Email Prioritization: Classification-based and Regression-based Approaches
Energy Technology Data Exchange (ETDEWEB)
Yoo S.; Yang, Y.; Carbonell, J.
2011-10-24
Email overload, even after spam filtering, presents a serious productivity challenge for busy professionals and executives. One solution is automated prioritization of incoming emails to ensure the most important are read and processed quickly, while others are processed later as/if time permits in declining priority levels. This paper presents a study of machine learning approaches to email prioritization into discrete levels, comparing ordinal regression versus classier cascades. Given the ordinal nature of discrete email priority levels, SVM ordinal regression would be expected to perform well, but surprisingly a cascade of SVM classifiers significantly outperforms ordinal regression for email prioritization. In contrast, SVM regression performs well -- better than classifiers -- on selected UCI data sets. This unexpected performance inversion is analyzed and results are presented, providing core functionality for email prioritization systems.
Evaluating Non-Linear Regression Models in Analysis of Persian Walnut Fruit Growth
Directory of Open Access Journals (Sweden)
I. Karamatlou
2016-02-01
Full Text Available Introduction: Persian walnut (Juglans regia L. is a large, wind-pollinated, monoecious, dichogamous, long lived, perennial tree cultivated for its high quality wood and nuts throughout the temperate regions of the world. Growth model methodology has been widely used in the modeling of plant growth. Mathematical models are important tools to study the plant growth and agricultural systems. These models can be applied for decision-making anddesigning management procedures in horticulture. Through growth analysis, planning for planting systems, fertilization, pruning operations, harvest time as well as obtaining economical yield can be more accessible.Non-linear models are more difficult to specify and estimate than linear models. This research was aimed to studynon-linear regression models based on data obtained from fruit weight, length and width. Selecting the best models which explain that fruit inherent growth pattern of Persian walnut was a further goal of this study. Materials and Methods: The experimental material comprising 14 Persian walnut genotypes propagated by seed collected from a walnut orchard in Golestan province, Minoudasht region, Iran, at latitude 37◦04’N; longitude 55◦32’E; altitude 1060 m, in a silt loam soil type. These genotypes were selected as a representative sampling of the many walnut genotypes available throughout the Northeastern Iran. The age range of walnut trees was 30 to 50 years. The annual mean temperature at the location is16.3◦C, with annual mean rainfall of 690 mm.The data used here is the average of walnut fresh fruit and measured withgram/millimeter/day in2011.According to the data distribution pattern, several equations have been proposed to describesigmoidal growth patterns. Here, we used double-sigmoid and logistic–monomolecular models to evaluate fruit growth based on fruit weight and4different regression models in cluding Richards, Gompertz, Logistic and Exponential growth for evaluation
Martínez-Fernández, J.; Chuvieco, E.; Koutsias, N.
2013-02-01
parameter estimates for all the variables and an important reduction of the autocorrelation in the residuals of the GW linear model. Despite the fitting improvement of local models, GW regression, more than an alternative to "global" or traditional regression modelling, seems to be a valuable complement to explore the non-stationary relationships between the response variable and the explanatory variables. The synergy of global and local modelling provides insights into fire management and policy and helps further our understanding of the fire problem over large areas while at the same time recognizing its local character.
Madonna, Erica; Ginsbourger, David; Martius, Olivia
2018-05-01
In Switzerland, hail regularly causes substantial damage to agriculture, cars and infrastructure, however, little is known about its long-term variability. To study the variability, the monthly number of days with hail in northern Switzerland is modeled in a regression framework using large-scale predictors derived from ERA-Interim reanalysis. The model is developed and verified using radar-based hail observations for the extended summer season (April-September) in the period 2002-2014. The seasonality of hail is explicitly modeled with a categorical predictor (month) and monthly anomalies of several large-scale predictors are used to capture the year-to-year variability. Several regression models are applied and their performance tested with respect to standard scores and cross-validation. The chosen model includes four predictors: the monthly anomaly of the two meter temperature, the monthly anomaly of the logarithm of the convective available potential energy (CAPE), the monthly anomaly of the wind shear and the month. This model well captures the intra-annual variability and slightly underestimates its inter-annual variability. The regression model is applied to the reanalysis data back in time to 1980. The resulting hail day time series shows an increase of the number of hail days per month, which is (in the model) related to an increase in temperature and CAPE. The trend corresponds to approximately 0.5 days per month per decade. The results of the regression model have been compared to two independent data sets. All data sets agree on the sign of the trend, but the trend is weaker in the other data sets.
Accounting for spatial effects in land use regression for urban air pollution modeling.
Bertazzon, Stefania; Johnson, Markey; Eccles, Kristin; Kaplan, Gilaad G
2015-01-01
In order to accurately assess air pollution risks, health studies require spatially resolved pollution concentrations. Land-use regression (LUR) models estimate ambient concentrations at a fine spatial scale. However, spatial effects such as spatial non-stationarity and spatial autocorrelation can reduce the accuracy of LUR estimates by increasing regression errors and uncertainty; and statistical methods for resolving these effects--e.g., spatially autoregressive (SAR) and geographically weighted regression (GWR) models--may be difficult to apply simultaneously. We used an alternate approach to address spatial non-stationarity and spatial autocorrelation in LUR models for nitrogen dioxide. Traditional models were re-specified to include a variable capturing wind speed and direction, and re-fit as GWR models. Mean R(2) values for the resulting GWR-wind models (summer: 0.86, winter: 0.73) showed a 10-20% improvement over traditional LUR models. GWR-wind models effectively addressed both spatial effects and produced meaningful predictive models. These results suggest a useful method for improving spatially explicit models. Copyright © 2015 The Authors. Published by Elsevier Ltd.. All rights reserved.
Combination of supervised and semi-supervised regression models for improved unbiased estimation
DEFF Research Database (Denmark)
Arenas-Garía, Jeronimo; Moriana-Varo, Carlos; Larsen, Jan
2010-01-01
In this paper we investigate the steady-state performance of semisupervised regression models adjusted using a modified RLS-like algorithm, identifying the situations where the new algorithm is expected to outperform standard RLS. By using an adaptive combination of the supervised and semisupervi......In this paper we investigate the steady-state performance of semisupervised regression models adjusted using a modified RLS-like algorithm, identifying the situations where the new algorithm is expected to outperform standard RLS. By using an adaptive combination of the supervised...
Methods and applications of linear models regression and the analysis of variance
Hocking, Ronald R
2013-01-01
Praise for the Second Edition"An essential desktop reference book . . . it should definitely be on your bookshelf." -Technometrics A thoroughly updated book, Methods and Applications of Linear Models: Regression and the Analysis of Variance, Third Edition features innovative approaches to understanding and working with models and theory of linear regression. The Third Edition provides readers with the necessary theoretical concepts, which are presented using intuitive ideas rather than complicated proofs, to describe the inference that is appropriate for the methods being discussed. The book
Jovanovic, Milos; Radovanovic, Sandro; Vukicevic, Milan; Van Poucke, Sven; Delibasic, Boris
2016-09-01
Quantification and early identification of unplanned readmission risk have the potential to improve the quality of care during hospitalization and after discharge. However, high dimensionality, sparsity, and class imbalance of electronic health data and the complexity of risk quantification, challenge the development of accurate predictive models. Predictive models require a certain level of interpretability in order to be applicable in real settings and create actionable insights. This paper aims to develop accurate and interpretable predictive models for readmission in a general pediatric patient population, by integrating a data-driven model (sparse logistic regression) and domain knowledge based on the international classification of diseases 9th-revision clinical modification (ICD-9-CM) hierarchy of diseases. Additionally, we propose a way to quantify the interpretability of a model and inspect the stability of alternative solutions. The analysis was conducted on >66,000 pediatric hospital discharge records from California, State Inpatient Databases, Healthcare Cost and Utilization Project between 2009 and 2011. We incorporated domain knowledge based on the ICD-9-CM hierarchy in a data driven, Tree-Lasso regularized logistic regression model, providing the framework for model interpretation. This approach was compared with traditional Lasso logistic regression resulting in models that are easier to interpret by fewer high-level diagnoses, with comparable prediction accuracy. The results revealed that the use of a Tree-Lasso model was as competitive in terms of accuracy (measured by area under the receiver operating characteristic curve-AUC) as the traditional Lasso logistic regression, but integration with the ICD-9-CM hierarchy of diseases provided more interpretable models in terms of high-level diagnoses. Additionally, interpretations of models are in accordance with existing medical understanding of pediatric readmission. Best performing models have
Multi-omics facilitated variable selection in Cox-regression model for cancer prognosis prediction.
Liu, Cong; Wang, Xujun; Genchev, Georgi Z; Lu, Hui
2017-07-15
New developments in high-throughput genomic technologies have enabled the measurement of diverse types of omics biomarkers in a cost-efficient and clinically-feasible manner. Developing computational methods and tools for analysis and translation of such genomic data into clinically-relevant information is an ongoing and active area of investigation. For example, several studies have utilized an unsupervised learning framework to cluster patients by integrating omics data. Despite such recent advances, predicting cancer prognosis using integrated omics biomarkers remains a challenge. There is also a shortage of computational tools for predicting cancer prognosis by using supervised learning methods. The current standard approach is to fit a Cox regression model by concatenating the different types of omics data in a linear manner, while penalty could be added for feature selection. A more powerful approach, however, would be to incorporate data by considering relationships among omics datatypes. Here we developed two methods: a SKI-Cox method and a wLASSO-Cox method to incorporate the association among different types of omics data. Both methods fit the Cox proportional hazards model and predict a risk score based on mRNA expression profiles. SKI-Cox borrows the information generated by these additional types of omics data to guide variable selection, while wLASSO-Cox incorporates this information as a penalty factor during model fitting. We show that SKI-Cox and wLASSO-Cox models select more true variables than a LASSO-Cox model in simulation studies. We assess the performance of SKI-Cox and wLASSO-Cox using TCGA glioblastoma multiforme and lung adenocarcinoma data. In each case, mRNA expression, methylation, and copy number variation data are integrated to predict the overall survival time of cancer patients. Our methods achieve better performance in predicting patients' survival in glioblastoma and lung adenocarcinoma. Copyright © 2017. Published by Elsevier
A ¤nonparametric dynamic additive regression model for longitudinal data
DEFF Research Database (Denmark)
Martinussen, T.; Scheike, T. H.
2000-01-01
dynamic linear models, estimating equations, least squares, longitudinal data, nonparametric methods, partly conditional mean models, time-varying-coefficient models......dynamic linear models, estimating equations, least squares, longitudinal data, nonparametric methods, partly conditional mean models, time-varying-coefficient models...
Accounting for Zero Inflation of Mussel Parasite Counts Using Discrete Regression Models
Directory of Open Access Journals (Sweden)
Emel Çankaya
2017-06-01
Full Text Available In many ecological applications, the absences of species are inevitable due to either detection faults in samples or uninhabitable conditions for their existence, resulting in high number of zero counts or abundance. Usual practice for modelling such data is regression modelling of log(abundance+1 and it is well know that resulting model is inadequate for prediction purposes. New discrete models accounting for zero abundances, namely zero-inflated regression (ZIP and ZINB, Hurdle-Poisson (HP and Hurdle-Negative Binomial (HNB amongst others are widely preferred to the classical regression models. Due to the fact that mussels are one of the economically most important aquatic products of Turkey, the purpose of this study is therefore to examine the performances of these four models in determination of the significant biotic and abiotic factors on the occurrences of Nematopsis legeri parasite harming the existence of Mediterranean mussels (Mytilus galloprovincialis L.. The data collected from the three coastal regions of Sinop city in Turkey showed more than 50% of parasite counts on the average are zero-valued and model comparisons were based on information criterion. The results showed that the probability of the occurrence of this parasite is here best formulated by ZINB or HNB models and influential factors of models were found to be correspondent with ecological differences of the regions.
A componential model of human interaction with graphs: 1. Linear regression modeling
Gillan, Douglas J.; Lewis, Robert
1994-01-01
Task analyses served as the basis for developing the Mixed Arithmetic-Perceptual (MA-P) model, which proposes (1) that people interacting with common graphs to answer common questions apply a set of component processes-searching for indicators, encoding the value of indicators, performing arithmetic operations on the values, making spatial comparisons among indicators, and repsonding; and (2) that the type of graph and user's task determine the combination and order of the components applied (i.e., the processing steps). Two experiments investigated the prediction that response time will be linearly related to the number of processing steps according to the MA-P model. Subjects used line graphs, scatter plots, and stacked bar graphs to answer comparison questions and questions requiring arithmetic calculations. A one-parameter version of the model (with equal weights for all components) and a two-parameter version (with different weights for arithmetic and nonarithmetic processes) accounted for 76%-85% of individual subjects' variance in response time and 61%-68% of the variance taken across all subjects. The discussion addresses possible modifications in the MA-P model, alternative models, and design implications from the MA-P model.
Carie, Adam; Rios-Doria, Jonathan; Costich, Tara; Burke, Brian; Slama, Richard; Skaff, Habib; Sill, Kevin
2011-01-01
Polymer micelles are promising drug delivery vehicles for the delivery of anticancer agents to tumors. Often, anticancer drugs display potent cytotoxic effects towards cancer cells but are too hydrophobic to be administered in the clinic as a free drug. To address this problem, a polymer micelle was designed using a triblock copolymer (ITP-101) that enables hydrophobic drugs to be encapsulated. An SN-38 encapsulated micelle, IT-141, was prepared that exhibited potent in vitro cytotoxicity against a wide array of cancer cell lines. In a mouse model, pharmacokinetic analysis revealed that IT-141 had a much longer circulation time, plasma exposure, and tumor exposure compared to irinotecan. IT-141 was also superior to irinotecan in terms of antitumor activity, exhibiting greater tumor inhibition in HT-29 and HCT116 colorectal cancer xenograft models at half the dose of irinotecan. The antitumor effect of IT-141 was dose-dependent and caused complete growth inhibition and tumor regression at well-tolerated doses. Varying the specific concentration of SN-38 within the IT-141 micelle had no detectible effect on this antitumor activity, indicating no differences in activity between different IT-141 formulations. In summary, IT-141 is a potent micelle-based chemotherapy that holds promise for the treatment of colorectal cancer.
Directory of Open Access Journals (Sweden)
Adam Carie
2011-01-01
Full Text Available Polymer micelles are promising drug delivery vehicles for the delivery of anticancer agents to tumors. Often, anticancer drugs display potent cytotoxic effects towards cancer cells but are too hydrophobic to be administered in the clinic as a free drug. To address this problem, a polymer micelle was designed using a triblock copolymer (ITP-101 that enables hydrophobic drugs to be encapsulated. An SN-38 encapsulated micelle, IT-141, was prepared that exhibited potent in vitro cytotoxicity against a wide array of cancer cell lines. In a mouse model, pharmacokinetic analysis revealed that IT-141 had a much longer circulation time, plasma exposure, and tumor exposure compared to irinotecan. IT-141 was also superior to irinotecan in terms of antitumor activity, exhibiting greater tumor inhibition in HT-29 and HCT116 colorectal cancer xenograft models at half the dose of irinotecan. The antitumor effect of IT-141 was dose-dependent and caused complete growth inhibition and tumor regression at well-tolerated doses. Varying the specific concentration of SN-38 within the IT-141 micelle had no detectible effect on this antitumor activity, indicating no differences in activity between different IT-141 formulations. In summary, IT-141 is a potent micelle-based chemotherapy that holds promise for the treatment of colorectal cancer.
A Gaussian process regression model for walking speed estimation using a head-worn IMU.
Zihajehzadeh, Shaghayegh; Park, Edward J
2017-07-01
Miniature inertial sensors mainly worn on waist, ankle and wrist have been widely used to measure walking speed of the individuals for lifestyle and/or health monitoring. Recent emergence of head-worn inertial sensors in the form of a smart eyewear (e.g. Recon Jet) or a smart ear-worn device (e.g. Sensixa e-AR) provides an opportunity to use these sensors for estimation of walking speed in real-world environment. This work studies the feasibility of using a head-worn inertial sensor for estimation of walking speed. A combination of time-domain and frequency-domain features of tri-axial acceleration norm signal were used in a Gaussian process regression model to estimate walking speed. An experimental evaluation was performed on 15 healthy subjects during free walking trials in an indoor environment. The results show that the proposed method can provide accuracies of better than around 10% for various walking speed regimes. Additionally, further evaluation of the model for long (15-minutes) outdoor walking trials reveals high correlation of the estimated walking speed values to the ones obtained from fusion of GPS with inertial sensors.
Evans, Wiley; Mathis, Jeremy T.; Winsor, Peter; Statscewich, Hank; Whitledge, Terry E.
2013-01-01
northern Gulf of Alaska (GOA) shelf experiences carbonate system variability on seasonal and annual time scales, but little information exists to resolve higher frequency variability in this region. To resolve this variability using platforms-of-opportunity, we present multiple linear regression (MLR) models constructed from hydrographic data collected along the Northeast Pacific Global Ocean Ecosystems Dynamics (GLOBEC) Seward Line. The empirical algorithms predict dissolved inorganic carbon (DIC) and total alkalinity (TA) using observations of nitrate (NO3-), temperature, salinity and pressure from the surface to 500 m, with R2s > 0.97 and RMSE values of 11 µmol kg-1 for DIC and 9 µmol kg-1 for TA. We applied these relationships to high-resolution NO3- data sets collected during a novel 20 h glider flight and a GLOBEC mesoscale SeaSoar survey. Results from the glider flight demonstrated time/space along-isopycnal variability of aragonite saturations (Ωarag) associated with a dicothermal layer (a cold near-surface layer found in high latitude oceans) that rivaled changes seen vertically through the thermocline. The SeaSoar survey captured the uplift to aragonite saturation horizon (depth where Ωarag = 1) shoaled to a previously unseen depth in the northern GOA. This work is similar to recent studies aimed at predicting the carbonate system in continental margin settings, albeit demonstrates that a NO3--based approach can be applied to high-latitude data collected from platforms capable of high-frequency measurements.
Abbring, J.H.
2009-01-01
We study mixed hitting-time models, which specify durations as the first time a Levy process (a continuous-time process with stationary and independent increments) crosses a heterogeneous threshold. Such models of substantial interest because they can be reduced from optimal-stopping models with
Wang, Wen-Cheng; Cho, Wen-Chien; Chen, Yin-Jen
2014-01-01
It is estimated that mainland Chinese tourists travelling to Taiwan can bring annual revenues of 400 billion NTD to the Taiwan economy. Thus, how the Taiwanese Government formulates relevant measures to satisfy both sides is the focus of most concern. Taiwan must improve the facilities and service quality of its tourism industry so as to attract more mainland tourists. This paper conducted a questionnaire survey of mainland tourists and used grey relational analysis in grey mathematics to analyze the satisfaction performance of all satisfaction question items. The first eight satisfaction items were used as independent variables, and the overall satisfaction performance was used as a dependent variable for quantile regression model analysis to discuss the relationship between the dependent variable under different quantiles and independent variables. Finally, this study further discussed the predictive accuracy of the least mean regression model and each quantile regression model, as a reference for research personnel. The analysis results showed that other variables could also affect the overall satisfaction performance of mainland tourists, in addition to occupation and age. The overall predictive accuracy of quantile regression model Q0.25 was higher than that of the other three models. PMID:24574916
Directory of Open Access Journals (Sweden)
S. G. Gocheva-Ilieva
2010-01-01
Full Text Available In order to model the output laser power of a copper bromide laser with wavelengths of 510.6 and 578.2 nm we have applied two regression techniques—multiple linear regression and multivariate adaptive regression splines. The models have been constructed on the basis of PCA factors for historical data. The influence of first- and second-order interactions between predictors has been taken into account. The models are easily interpreted and have good prediction power, which is established from the results of their validation. The comparison of the derived models shows that these based on multivariate adaptive regression splines have an advantage over the others. The obtained results allow for the clarification of relationships between laser generation and the observed laser input variables, for better determining their influence on laser generation, in order to improve the experimental setup and laser production technology. They can be useful for evaluation of known experiments as well as for prediction of future experiments. The developed modeling methodology is also applicable for a wide range of similar laser devices—metal vapor lasers and gas lasers.
Directory of Open Access Journals (Sweden)
Wen-Cheng Wang
2014-01-01
Full Text Available It is estimated that mainland Chinese tourists travelling to Taiwan can bring annual revenues of 400 billion NTD to the Taiwan economy. Thus, how the Taiwanese Government formulates relevant measures to satisfy both sides is the focus of most concern. Taiwan must improve the facilities and service quality of its tourism industry so as to attract more mainland tourists. This paper conducted a questionnaire survey of mainland tourists and used grey relational analysis in grey mathematics to analyze the satisfaction performance of all satisfaction question items. The first eight satisfaction items were used as independent variables, and the overall satisfaction performance was used as a dependent variable for quantile regression model analysis to discuss the relationship between the dependent variable under different quantiles and independent variables. Finally, this study further discussed the predictive accuracy of the least mean regression model and each quantile regression model, as a reference for research personnel. The analysis results showed that other variables could also affect the overall satisfaction performance of mainland tourists, in addition to occupation and age. The overall predictive accuracy of quantile regression model Q0.25 was higher than that of the other three models.
Directory of Open Access Journals (Sweden)
Anke Hüls
2017-05-01
Full Text Available Antimicrobial resistance in livestock is a matter of general concern. To develop hygiene measures and methods for resistance prevention and control, epidemiological studies on a population level are needed to detect factors associated with antimicrobial resistance in livestock holdings. In general, regression models are used to describe these relationships between environmental factors and resistance outcome. Besides the study design, the correlation structures of the different outcomes of antibiotic resistance and structural zero measurements on the resistance outcome as well as on the exposure side are challenges for the epidemiological model building process. The use of appropriate regression models that acknowledge these complexities is essential to assure valid epidemiological interpretations. The aims of this paper are (i to explain the model building process comparing several competing models for count data (negative binomial model, quasi-Poisson model, zero-inflated model, and hurdle model and (ii to compare these models using data from a cross-sectional study on antibiotic resistance in animal husbandry. These goals are essential to evaluate which model is most suitable to identify potential prevention measures. The dataset used as an example in our analyses was generated initially to study the prevalence and associated factors for the appearance of cefotaxime-resistant Escherichia coli in 48 German fattening pig farms. For each farm, the outcome was the count of samples with resistant bacteria. There was almost no overdispersion and only moderate evidence of excess zeros in the data. Our analyses show that it is essential to evaluate regression models in studies analyzing the relationship between environmental factors and antibiotic resistances in livestock. After model comparison based on evaluation of model predictions, Akaike information criterion, and Pearson residuals, here the hurdle model was judged to be the most appropriate
INVESTIGATION OF E-MAIL TRAFFIC BY USING ZERO-INFLATED REGRESSION MODELS
Directory of Open Access Journals (Sweden)
Yılmaz KAYA
2012-06-01
Full Text Available Based on count data obtained with a value of zero may be greater than anticipated. These types of data sets should be used to analyze by regression methods taking into account zero values. Zero- Inflated Poisson (ZIP, Zero-Inflated negative binomial (ZINB, Poisson Hurdle (PH, negative binomial Hurdle (NBH are more common approaches in modeling more zero value possessing dependent variables than expected. In the present study, the e-mail traffic of Yüzüncü Yıl University in 2009 spring semester was investigated. ZIP and ZINB, PH and NBH regression methods were applied on the data set because more zeros counting (78.9% were found in data set than expected. ZINB and NBH regression considered zero dispersion and overdispersion were found to be more accurate results due to overdispersion and zero dispersion in sending e-mail. ZINB is determined to be best model accordingto Vuong statistics and information criteria.
Directory of Open Access Journals (Sweden)
Wun Wong
2003-01-01
Full Text Available The assessment of medical outcomes is important in the effort to contain costs, streamline patient management, and codify medical practices. As such, it is necessary to develop predictive models that will make accurate predictions of these outcomes. The neural network methodology has often been shown to perform as well, if not better, than the logistic regression methodology in terms of sample predictive performance. However, the logistic regression method is capable of providing an explanation regarding the relationship(s between variables. This explanation is often crucial to understanding the clinical underpinnings of the disease process. Given the respective strengths of the methodologies in question, the combined use of a statistical (i.e., logistic regression and machine learning (i.e., neural network technology in the classification of medical outcomes is warranted under appropriate conditions. The study discusses these conditions and describes an approach for combining the strengths of the models.
Bayesian Bandwidth Selection for a Nonparametric Regression Model with Mixed Types of Regressors
Directory of Open Access Journals (Sweden)
Xibin Zhang
2016-04-01
Full Text Available This paper develops a sampling algorithm for bandwidth estimation in a nonparametric regression model with continuous and discrete regressors under an unknown error density. The error density is approximated by the kernel density estimator of the unobserved errors, while the regression function is estimated using the Nadaraya-Watson estimator admitting continuous and discrete regressors. We derive an approximate likelihood and posterior for bandwidth parameters, followed by a sampling algorithm. Simulation results show that the proposed approach typically leads to better accuracy of the resulting estimates than cross-validation, particularly for smaller sample sizes. This bandwidth estimation approach is applied to nonparametric regression model of the Australian All Ordinaries returns and the kernel density estimation of gross domestic product (GDP growth rates among the organisation for economic co-operation and development (OECD and non-OECD countries.