Praise for the Third Edition: "...this is an excellent book which could easily be used as a course text..." (International Statistical Institute). The Fourth Edition of Applied Linear Regression provides a thorough update of the basic theory and methodology of linear regression modeling. Demonstrating the practical applications of linear regression analysis techniques, the Fourth Edition uses interesting, real-world exercises and examples. Stressing central concepts such as model building, understanding parameters, assessing fit and reliability, and drawing conclusions, the new edition illus
Hosmer, David W; Sturdivant, Rodney X
A new edition of the definitive guide to logistic regression modeling for health science and other applications This thoroughly expanded Third Edition provides an easily accessible introduction to the logistic regression (LR) model and highlights the power of this model by examining the relationship between a dichotomous outcome and a set of covariables. Applied Logistic Regression, Third Edition emphasizes applications in the health sciences and handpicks topics that best suit the use of modern statistical software. The book provides readers with state-of-
... It emphasizes major statistical software packages, including SPSS®, Minitab®, SAS®, R, and R/S-PLUS®. Detailed instructions for use of these packages, as well as for Microsoft Office Excel...
Pantula, Sastry; Dickey, David
Least squares estimation, when used appropriately, is a powerful research tool. A deeper understanding of regression concepts is essential for obtaining the full benefit of a least squares analysis. This book builds on the fundamentals of statistical methods and provides appropriate concepts that will allow a scientist to use least squares as an effective research tool. Applied Regression Analysis is aimed at the scientist who wishes to gain a working knowledge of regression analysis. The basic purpose of this book is to develop an understanding of least squares and related statistical methods without becoming excessively mathematical. It is the outgrowth of more than 30 years of consulting experience with scientists and many years of teaching an applied regression course to graduate students. Applied Regression Analysis serves as an excellent text for a service course on regression for non-statisticians and as a reference for researchers. It also provides a bridge between a two-semester introduction to...
An applied and concise treatment of statistical regression techniques for business students and professionals who have little or no background in calculus. Regression analysis is an invaluable statistical methodology in business settings and is vital for modeling the relationship between a response variable and one or more predictor variables, as well as for predicting a response value given values of the predictors. In view of the inherent uncertainty of business processes, such as the volatility of consumer spending and the presence of market uncertainty, business professionals use regression a
Guanche, Yanira; Mínguez, Roberto; Méndez, Fernando J.
Atmospheric patterns (weather types or circulation patterns) have been studied extensively by climatologists, and it is widely accepted practice to disaggregate the atmospheric conditions over a region into a certain number of representative states. This consensus simplifies the study of climate conditions, helping to improve weather predictions and to better understand the influence of anthropogenic activities on the climate system. Once the atmospheric conditions have been reduced to a catalogue of representative states, it is desirable to have numerical models that improve our understanding of weather dynamics, i.e., i) to analyze climate change by studying trends in the probability of occurrence of weather types, ii) to study seasonality, and iii) to analyze the possible influence of previous states (autoregressive terms or Markov chains). This work introduces the mathematical framework to analyze those effects from a qualitative point of view. In particular, it presents an autoregressive logistic regression model, which has been successfully applied in medical and pharmacological research. The main advantages of autoregressive logistic regression are that i) it can be used to model polytomous outcome variables, such as circulation types, and ii) standard statistical software can be used for fitting. To show the potential of this kind of model for analyzing atmospheric conditions, a case study located in the Northeastern Atlantic is described. The results show that the model can deal simultaneously with predictors related to different time scales, and that it can be used to simulate the behaviour of circulation patterns.
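The abstract does not give the model equations. The sketch below assumes one common form of an autoregressive multinomial logit. Seasonality enters as a single annual cosine/sine harmonic, and the previous day's type enters as a lag-1 (Markov) effect. All coefficients are hypothetical and were not fitted to any data. The function returns the probability of each type for a given day.

```python
import math

def weather_type_probs(day_of_year, prev_type, coef):
    """P(type k today | day of year, yesterday's type) under a multinomial logit.

    coef[k] = (intercept, cos_coef, sin_coef, lag_effects), where lag_effects[j]
    is the effect on type k of yesterday's type being j. Type 0 is the
    reference category, so its coefficients are all zero.
    """
    omega = 2.0 * math.pi * day_of_year / 365.25   # one annual harmonic (seasonality)
    scores = [
        intercept + b_cos * math.cos(omega) + b_sin * math.sin(omega) + lag[prev_type]
        for intercept, b_cos, b_sin, lag in coef
    ]
    top = max(scores)                               # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical coefficients for three circulation types (illustrative only).
COEF = [
    (0.0, 0.0, 0.0, [0.0, 0.0, 0.0]),      # type 0: reference category
    (0.2, 0.5, -0.3, [0.0, 1.2, -0.4]),    # type 1
    (-0.5, -0.2, 0.1, [0.0, -0.3, 1.5]),   # type 2: strongly persistent
]

print(weather_type_probs(day_of_year=15, prev_type=2, coef=COEF))
```

A Monte Carlo simulation of the kind the authors describe would draw each day's type from these probabilities, for example with `random.choices(range(3), weights=p)`, and then feed that draw back in as the next day's `prev_type`.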
Guanche, Y.; Mínguez, R.; Méndez, F. J.
Autoregressive logistic regression models have been successfully applied in medical and pharmacological research, and in simple models for analyzing weather types. The main purpose of this paper is to introduce a general framework for studying atmospheric circulation patterns that can deal simultaneously with seasonality, interannual variability, long-term trends, and autocorrelation of different orders. To show its modeling effectiveness, daily atmospheric circulation patterns identified from observed sea level pressure fields over the Northeastern Atlantic have been analyzed using this framework. Model predictions are compared with probabilities from the historical database, showing very good fitting diagnostics. In addition, the fitted model is used to simulate the evolution of atmospheric circulation patterns over time using the Monte Carlo method. The simulation results are statistically consistent with the historical sequence in terms of (1) the probability of occurrence of the different weather types, (2) transition probabilities, and (3) persistence. The proposed model constitutes an easy-to-use and powerful tool for better understanding the climate system.
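Three statistics are used to compare the simulations with history: probability of occurrence, transition probabilities and persistence. The sketch below computes all three from a daily sequence of integer weather-type labels, and the same functions would be applied to both the historical and the simulated sequence. Two definitions here are assumptions, since the abstract does not spell them out: persistence is counted as run lengths (consecutive days in the same type), and transitions are lag-1 only.

```python
from collections import Counter

def occurrence_probs(seq, n_types):
    """Relative frequency of each weather type in the sequence."""
    counts = Counter(seq)
    return [counts[k] / len(seq) for k in range(n_types)]

def transition_matrix(seq, n_types):
    """Row-normalized counts of type i on one day followed by type j the next."""
    counts = [[0] * n_types for _ in range(n_types)]
    for today, tomorrow in zip(seq, seq[1:]):
        counts[today][tomorrow] += 1
    return [[c / sum(row) if sum(row) else 0.0 for c in row] for row in counts]

def persistence(seq):
    """Counter of (type, run length): how often each type lasted L consecutive days."""
    runs = Counter()
    start = 0
    for i in range(1, len(seq) + 1):
        if i == len(seq) or seq[i] != seq[start]:
            runs[(seq[start], i - start)] += 1
            start = i
    return runs

history = [0, 0, 1, 1, 1, 0, 2, 2, 0]   # toy sequence, not real data
print(occurrence_probs(history, 3))
print(transition_matrix(history, 3))
print(persistence(history))
```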
Pedro Henrique Melo Albuquerque
This study used real data from a Brazilian financial institution on Consumer Direct Credit (CDC) transactions, granted to clients residing in the Distrito Federal (DF), to construct credit scoring models via Logistic Regression and Geographically Weighted Logistic Regression (GWLR). The aims were: to verify whether the factors that influence credit risk differ according to the borrower's geographic location; to compare the set of models estimated via GWLR with the global model estimated via Logistic Regression, in terms of predictive power and financial losses for the institution; and to verify the viability of using the GWLR technique to develop credit scoring models. The metrics used to compare the models were the AICc information criterion, model accuracy, the percentage of false positives, the total value of false-positive debt, and the expected monetary value of portfolio default compared with the monetary value of defaults observed. The models estimated for each region in the DF differed in their variables and coefficients (parameters), leading to the conclusion that credit risk was influenced differently in each region in the study. The Logistic Regression and GWLR methodologies produced very similar results in terms of predictive power and financial losses for the institution, and the study demonstrated the viability of using GWLR to develop credit scoring models for the target population.
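GWLR fits a separate logistic regression at each focal location. Each observation is weighted by its distance from that location. The sketch below makes several simplifying assumptions: a Gaussian kernel with a fixed bandwidth, a single covariate, and Newton-Raphson fitting. The study's actual kernel, bandwidth and covariates are not given in the abstract, though AICc is a common criterion for choosing the bandwidth. With a very large bandwidth every weight approaches 1, and the local fit reduces to the global logistic regression. That is the sense in which GWLR generalizes the global model.

```python
import math

def gaussian_weights(coords, focal, bandwidth):
    """Gaussian kernel weight of each observation relative to a focal location."""
    fu, fv = focal
    return [math.exp(-0.5 * (math.hypot(u - fu, v - fv) / bandwidth) ** 2)
            for u, v in coords]

def weighted_logistic_fit(x, y, w, iters=100):
    """Newton-Raphson for logit p = b0 + b1*x with observation weights w."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, wi in zip(x, y, w):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            r = wi * (yi - p)            # weighted score contribution
            g0 += r
            g1 += r * xi
            v = wi * p * (1.0 - p)       # weighted information contribution
            h00 += v
            h01 += v * xi
            h11 += v * xi * xi
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det   # solve the 2x2 Newton system
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < 1e-10:
            break
    return b0, b1

# Toy data: (u, v) locations, one covariate, binary default indicator.
coords = [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (1, 1), (2, 1), (3, 1)]
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
y = [0, 0, 1, 0, 1, 0, 1, 1]
local = weighted_logistic_fit(x, y, gaussian_weights(coords, (0.0, 0.0), 1.5))
print(local)
```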
Higueras, Manuel; Puig, Pedro; Ainsbury, Elizabeth A.; Rothkamm, Kai
Biological dosimetry based on chromosome aberration scoring in peripheral blood lymphocytes enables timely assessment of the ionizing radiation dose absorbed by an individual. Here, new Bayesian-type count data inverse regression methods are introduced for situations where responses are Poisson or two-parameter compound Poisson distributed. Our Poisson models are calculated in closed form by means of Hermite and negative binomial (NB) distributions. For compound Poisson responses, complete and simplified models are provided. The simplified models can also be expressed in closed form and involve the use of compound Hermite and compound NB distributions. Three application examples demonstrate the usefulness of these methodologies in cytogenetic radiation biodosimetry and in radiotherapy. We provide R and SAS code that reproduces these examples. PMID:25663804
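The paper derives closed-form posteriors; the sketch below does not reproduce them. It is a plain numerical alternative for the Poisson case only, to illustrate the inverse-regression idea of recovering a dose from an observed count. Three assumptions are made. The calibration curve is the linear-quadratic yield λ(D) = C + αD + βD² per cell. Its coefficients are treated as known, which ignores their uncertainty. The prior on dose is flat over a grid. The coefficient values are hypothetical.

```python
import math

def dose_posterior(y, n_cells, c, alpha, beta, max_dose=5.0, step=0.01):
    """Grid posterior for absorbed dose D given y aberrations in n_cells cells.

    Assumes y ~ Poisson(n_cells * (c + alpha*D + beta*D**2)) and a flat prior on D.
    """
    doses = [i * step for i in range(int(round(max_dose / step)) + 1)]
    log_lik = []
    for d in doses:
        mu = n_cells * (c + alpha * d + beta * d * d)
        log_lik.append(y * math.log(mu) - mu)   # log(y!) omitted: constant in D
    top = max(log_lik)
    unnorm = [math.exp(v - top) for v in log_lik]
    total = sum(unnorm)
    return doses, [u / total for u in unnorm]

# Hypothetical calibration coefficients and observation (not from the paper).
doses, post = dose_posterior(y=281, n_cells=1000, c=0.001, alpha=0.02, beta=0.06)
posterior_mean = sum(d * p for d, p in zip(doses, post))
print(posterior_mean)
```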
Cohen, Ayala; Nahum-Shani, Inbal; Doveh, Etti
In their seminal paper, Edwards and Parry (1993) presented polynomial regression as a better alternative to difference scores in the study of congruence. Although this method is increasingly applied in congruence research, its complexity relative to other methods for assessing congruence (e.g., difference score methods) was one of the…
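In the polynomial regression approach, Z is regressed on X, Y, X², XY and Y². The fitted surface is then usually read along the congruence line (X = Y) and the incongruence line (X = -Y). The sketch below assumes the six coefficients have already been estimated by ordinary least squares. It computes the slope and curvature along each line. It also checks whether the coefficients satisfy the constraints that a squared difference score (X - Y)² imposes implicitly, which is the core of the argument against difference scores. The example coefficients are hypothetical.

```python
def surface_features(b):
    """b = (b0, b1, b2, b3, b4, b5) for Z = b0 + b1*X + b2*Y + b3*X**2 + b4*X*Y + b5*Y**2."""
    _, b1, b2, b3, b4, b5 = b
    return {
        "congruence_slope": b1 + b2,            # along X = Y
        "congruence_curvature": b3 + b4 + b5,
        "incongruence_slope": b1 - b2,          # along X = -Y
        "incongruence_curvature": b3 - b4 + b5,
    }

def implied_by_squared_difference(b, tol=1e-9):
    """True if b satisfies the constraints a (X - Y)**2 score imposes:
    b1 = b2 = 0, b3 = b5, and b4 = -2*b3."""
    _, b1, b2, b3, b4, b5 = b
    return (abs(b1) < tol and abs(b2) < tol
            and abs(b3 - b5) < tol and abs(b4 + 2.0 * b3) < tol)

print(surface_features((1.0, 0.0, 0.0, 0.5, -1.0, 0.5)))
```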
Sales forecasting is one of the most important issues in managing information technology (IT) chain store sales, since an IT chain store has many branches. Integrating a feature extraction method with a prediction tool such as support vector regression (SVR) is a useful way to construct an effective sales forecasting scheme. Independent component analysis (ICA) is a novel feature extraction technique that has been widely applied to various forecasting problems. However, until now only the basic ICA method (i.e., the temporal ICA model) had been applied to sales forecasting. In this paper, we use three different ICA methods, namely spatial ICA (sICA), temporal ICA (tICA), and spatiotemporal ICA (stICA), to extract features from the sales data and compare their performance in forecasting IT chain store sales. Experimental results on real sales data show that the sales forecasting scheme integrating stICA and SVR outperforms the comparison models in terms of forecasting error. stICA is a promising tool for extracting effective features from branch sales data, and the extracted features can improve the prediction performance of SVR for sales forecasting. PMID:25165740
Sidik, S. M.
Ridge, Marquardt's generalized inverse, shrunken, and principal components estimators are discussed in terms of the objectives of point estimation of parameters, estimation of the predictive regression function, and hypothesis testing. It is found that as the normal equations approach singularity, more consideration must be given to estimable functions of the parameters as opposed to estimation of the full parameter vector; that biased estimators all introduce constraints on the parameter space; that adoption of mean squared error as a criterion of goodness should be independent of the degree of singularity; and that ordinary least-squares subset regression is the best overall method.
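The ridge estimator (X'X + kI)⁻¹X'y adds a constant k to the diagonal of X'X. This moves the normal equations away from singularity at the cost of bias. As k grows, the coefficient vector shrinks toward the origin, which is the kind of implicit constraint on the parameter space the abstract describes. The sketch below uses two nearly collinear predictors with no intercept, for brevity. The data are hypothetical. The example only illustrates the estimator; it does not contradict the paper's conclusion that OLS subset regression is the best overall method.

```python
import math

def ridge_2d(x1, x2, y, k):
    """Ridge estimate (X'X + kI)^{-1} X'y for two predictors, no intercept.
    k = 0 gives ordinary least squares."""
    s11 = sum(a * a for a in x1) + k
    s22 = sum(b * b for b in x2) + k
    s12 = sum(a * b for a, b in zip(x1, x2))
    r1 = sum(a * t for a, t in zip(x1, y))
    r2 = sum(b * t for b, t in zip(x2, y))
    det = s11 * s22 - s12 * s12          # close to 0 when x1 and x2 are nearly collinear
    return ((s22 * r1 - s12 * r2) / det, (s11 * r2 - s12 * r1) / det)

x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [1.1, 1.9, 3.05, 3.95]              # almost a copy of x1
y = [2.1, 3.9, 6.2, 7.8]
for k in (0.0, 0.1, 1.0, 10.0):
    beta = ridge_2d(x1, x2, y, k)
    print(k, beta, math.hypot(*beta))    # coefficient norm shrinks as k grows
```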
Gomes, Daniel de Souza; Baptista Filho, Benedito; Oliveira, Fabio Branco de [Instituto de Pesquisas Energeticas e Nucleares (IPEN/CNEN-SP), Sao Paulo, SP (Brazil)]; Giovedi, Claudia [Universidade de Sao Paulo (POLI/USP), Sao Paulo, SP (Brazil). Lab. de Analise, Avaliacao e Gerenciamento de Risco]
A reactivity-initiated accident (RIA) is a disastrous failure that occurs because of an unexpected rise in the fission rate and reactor power. This sudden increase in reactor power may trigger processes that can lead to failure of the fuel cladding. In severe accidents, fuel disruption and core melting can occur. The purpose of the present research is to study the patterns of such accidents using exploratory data analysis techniques. An applied statistics approach was used for the simulations. We chose peak enthalpy, pulse width, burnup, fission gas release, and zirconium oxidation as input parameters and set the safety boundary conditions. This new approach incorporates logistic regression, with which the research also aims to identify the conditions for, and the probability of, failure. Zirconium-based cladding alloys with 1% niobium were analyzed for high burnup limits of 65 MWd/kgU. The data are based on six decades of experimental programs: tests performed in American reactors such as the Transient Reactor Test Facility (TREAT) and the Power Burst Facility (PBF), experiments in the Japanese Nuclear Safety Research Reactor (NSRR), and tests in the Impulse Graphite Reactor (IGR) in Kazakhstan. The database obtained from these tests served as the basis for our study. (author)
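A fitted logistic model yields a failure probability for any input condition. Inverting it gives the input value at a chosen probability, which is one natural way to express a safety boundary. The sketch below uses only peak enthalpy (assumed to be in J/g), whereas the study uses five inputs. The coefficients are hypothetical, not estimates from the TREAT, PBF, NSRR or IGR data. The paper's own definition of its safety boundary conditions may differ.

```python
import math

def failure_probability(enthalpy, b0, b1):
    """P(cladding failure) under logit p = b0 + b1 * peak_enthalpy."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * enthalpy)))

def enthalpy_at_probability(p, b0, b1):
    """Invert the logistic model: peak enthalpy at which P(failure) = p."""
    return (math.log(p / (1.0 - p)) - b0) / b1

B0, B1 = -8.0, 0.05          # hypothetical coefficients (enthalpy in J/g)
print(failure_probability(120.0, B0, B1))
print(enthalpy_at_probability(0.5, B0, B1))    # 50% failure boundary
print(enthalpy_at_probability(0.05, B0, B1))   # a more conservative 5% boundary
```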
Kitada, Y.; Makiguchi, M.; Komori, A.; Ichiki, T.
Records of three earthquakes that induced significant responses in the piping system were obtained with the earthquake observation system. This paper first describes the results of an eigenvalue analysis of the actual piping system based on its support (boundary) conditions, and then the evaluated frequency and damping factor for each vibrational mode. The autoregressive (AR) analysis method is used to evaluate natural frequencies and damping factors. The AR analysis applied here can evaluate natural frequencies and damping factors directly from earthquake records observed on a piping system, without any information on the input motions to the system. (orig./HP)
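An AR model fitted to a response record has characteristic roots z = exp(sΔt). Each complex pair gives s = -ζω ± iω√(1 - ζ²). The natural frequency is therefore |s|/2π and the damping ratio is -Re(s)/|s|. Only the measured response is needed, not the input motion.

The sketch below shows this for the simplest case: one mode, an AR(2) model, and a noiseless free-decay signal. Real earthquake records would need a higher AR order (two roots per mode) and a choice of model order. The signal and its parameters are hypothetical, and the abstract does not describe the paper's exact procedure.

```python
import cmath
import math

def fit_ar2(x):
    """Least-squares AR(2) fit x[k] = a1*x[k-1] + a2*x[k-2]."""
    s11 = s12 = s22 = r1 = r2 = 0.0
    for k in range(2, len(x)):
        u, v, t = x[k - 1], x[k - 2], x[k]
        s11 += u * u
        s12 += u * v
        s22 += v * v
        r1 += u * t
        r2 += v * t
    det = s11 * s22 - s12 * s12
    return (s22 * r1 - s12 * r2) / det, (s11 * r2 - s12 * r1) / det

def modal_parameters(a1, a2, dt):
    """Natural frequency (Hz) and damping ratio from the AR(2) characteristic root."""
    z = (a1 + cmath.sqrt(a1 * a1 + 4.0 * a2)) / 2.0   # root of z**2 - a1*z - a2 = 0
    s = cmath.log(z) / dt                              # s = -zeta*omega + i*omega_d
    omega = abs(s)
    return omega / (2.0 * math.pi), -s.real / omega

# Synthetic free decay: 5 Hz mode, 2% damping, sampled at 200 Hz.
f_true, zeta_true, dt = 5.0, 0.02, 0.005
omega = 2.0 * math.pi * f_true
omega_d = omega * math.sqrt(1.0 - zeta_true ** 2)
x = [math.exp(-zeta_true * omega * k * dt) * math.cos(omega_d * k * dt) for k in range(400)]
f_est, zeta_est = modal_parameters(*fit_ar2(x), dt)
print(f_est, zeta_est)
```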
Zhang, Y J; Xue, F X; Bai, Z P
The impact of maternal air pollution exposure on offspring health has received much attention. Precise and feasible exposure estimation is particularly important for clarifying exposure-response relationships and reducing heterogeneity among studies. Temporally-adjusted land use regression (LUR) models are exposure assessment methods developed in recent years that offer high spatiotemporal resolution. Studies on the health effects of outdoor air pollution exposure during pregnancy have increasingly used this model. In China, research applying LUR models has mostly remained at the model construction stage, and findings from related epidemiological studies have rarely been reported. In this paper, the sources of heterogeneity and the progress of meta-analyses on the associations between air pollution and adverse pregnancy outcomes are analyzed. The construction methods and characteristics of temporally-adjusted LUR models are introduced. Current epidemiological studies on adverse pregnancy outcomes that applied this model are systematically summarized. Recommendations for the development and application of LUR models in China are presented. This will encourage more valid exposure predictions during pregnancy in large-scale epidemiological studies on the health effects of air pollution in China.
Gomes, Marcos José Timbó Lima; Cunto, Flávio; da Silva, Alan Ricardo
Generalized linear models (GLMs) with a negative binomial error distribution have been widely used to estimate safety at the transportation planning level. The limited ability of this technique to take spatial effects into account can be overcome by local models from spatial regression techniques, such as Geographically Weighted Poisson Regression (GWPR). Although GWPR deals with spatial dependency and heterogeneity and has already been used in some road safety studies at the planning level, it fails to account for the overdispersion that may be present in road-traffic crash observations. Two approaches were adopted for the Geographically Weighted Negative Binomial Regression (GWNBR) model, allowing discrete data to be modeled in a non-stationary form while accounting for overdispersion: the first assumes a constant overdispersion parameter for all traffic zones, and the second allows the parameter to vary for each spatial unit. This research conducts a comparative analysis of non-spatial global crash prediction models and spatial local GWPR and GWNBR models at the level of traffic zones in Fortaleza, Brazil. A geographic database of 126 traffic zones was compiled from the available data on exposure, network characteristics, socioeconomic factors, and land use. The models were calibrated using the frequency of injury crashes as the dependent variable. The results showed that GWPR and GWNBR outperformed the GLM in terms of average residuals and likelihood and reduced the spatial autocorrelation of the residuals, and that the GWNBR model better captured the spatial heterogeneity of crash frequency. Copyright © 2017 Elsevier Ltd. All rights reserved.
Goldstein, Benjamin A; Navar, Ann Marie; Carter, Rickey E
Risk prediction plays an important role in clinical cardiology research. Traditionally, most risk models have been based on regression models. While useful and robust, these statistical methods are limited to using a small number of predictors that operate in the same way on everyone, and uniformly throughout their range. The purpose of this review is to illustrate the use of machine-learning methods for developing risk prediction models. Although typically presented as black-box approaches, most machine-learning methods are aimed at solving particular challenges that arise in data analysis and are not well addressed by typical regression approaches. To illustrate these challenges, as well as how different methods can address them, we consider predicting mortality after diagnosis of acute myocardial infarction. We use data derived from our institution's electronic health record and abstract data on 13 regularly measured laboratory markers. We walk through different challenges that arise in modelling these data and then introduce different machine-learning approaches. Finally, we discuss general issues in the application of machine-learning methods, including tuning parameters, loss functions, variable importance, and missing data. Overall, this review serves as an introduction for those working on risk modelling to approach the diffuse field of machine learning. © The Author 2016. Published by Oxford University Press on behalf of the European Society of Cardiology.
Kitada, Yoshio; Ichiki, Tadaharu; Makiguchi, Morio; Komori, Akio.
Observing the equipment and piping systems installed in an operating nuclear power plant during earthquakes is very important for evaluating and confirming the adequacy and safety margin expected at the design stage. By analyzing observed earthquake records, valuable data on the behavior of these systems during earthquakes can be obtained, and information about their aseismic design parameters can be extracted. From these viewpoints, an earthquake observation system was installed in a reactor building of an operating plant. To date, records of three earthquakes have been obtained with this system. This paper presents an example of earthquake record analysis whose main purpose was to evaluate the vibration modes, natural frequencies, and damping factors of this piping system. Prior to the earthquake record analysis, an eigenvalue analysis of the piping system was performed. Autoregressive analysis was applied to the observed acceleration time histories obtained from a piping system installed in an operating BWR. The results of the earthquake record analysis agreed well with those of the eigenvalue analysis. (Kako, I.)
Kiezun, Adam; Lee, I-Ting Angelina; Shomron, Noam
Logistic regression is often used to help make medical decisions with binary outcomes. Here we evaluate the use of several methods for selection of variables in logistic regression. We use a large dataset to predict the diagnosis of myocardial infarction in patients reporting to an emergency room with chest pain. Our results indicate that some of the examined methods are well suited for variable selection in logistic regression and that our model, and our myocardial infarction risk calculator, can be an additional tool to aid physicians in myocardial infarction diagnosis.
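The abstract does not name the variable-selection methods it evaluated. Greedy forward selection by AIC is shown here as one common example. The sketch separates the search from the model fitting: `aic_of` is a caller-supplied function that would fit a logistic regression on a set of variables and return its AIC (2k - 2 log L). In the usage below, a hypothetical table of precomputed AIC values with placeholder variable names stands in for that function.

```python
def forward_select(candidates, aic_of):
    """Greedy forward selection: start from the empty model and repeatedly add
    the variable that lowers AIC the most; stop when no addition improves AIC.
    aic_of(frozenset_of_variables) must return the AIC of that fitted model."""
    selected = frozenset()
    best_aic = aic_of(selected)
    remaining = set(candidates)
    while remaining:
        scores = {v: aic_of(selected | {v}) for v in remaining}
        var, aic = min(scores.items(), key=lambda kv: kv[1])
        if aic >= best_aic:            # no candidate improves the current model
            break
        selected = selected | {var}
        best_aic = aic
        remaining.discard(var)
    return selected, best_aic

# Hypothetical AIC values for every subset of three placeholder predictors.
AIC_TABLE = {
    frozenset(): 500.0,
    frozenset({"x1"}): 480.0,
    frozenset({"x2"}): 420.0,
    frozenset({"x3"}): 450.0,
    frozenset({"x1", "x2"}): 415.0,
    frozenset({"x2", "x3"}): 405.0,
    frozenset({"x1", "x3"}): 440.0,
    frozenset({"x1", "x2", "x3"}): 407.0,
}
print(forward_select(["x1", "x2", "x3"], AIC_TABLE.__getitem__))
```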
de los Campos, Gustavo; Hickey, John M.; Pong-Wong, Ricardo; Daetwyler, Hans D.; Calus, Mario P. L.
Genomic-enabled prediction is becoming increasingly important in animal and plant breeding and is also receiving attention in human genetics. Deriving accurate predictions of complex traits requires implementing whole-genome regression (WGR) models in which phenotypes are regressed on thousands of markers concurrently. Methods exist for fitting these large-p, small-n regressions, and genome-enabled selection (GS) is being implemented in several plant and animal breeding programs. The list of available methods is long, and the relationships between them have not been fully addressed. In this article we provide an overview of available methods for implementing parametric WGR models, discuss selected topics that emerge in applications, and present a general discussion of lessons learned from simulation and empirical data analysis in the last decade. PMID:22745228
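Ridge-type shrinkage is one widely used choice for large-p, small-n whole-genome regressions. When markers far outnumber phenotyped individuals, the identity (X'X + λI)⁻¹X'y = X'(XX' + λI)⁻¹y lets the estimate be computed from an n × n system instead of a p × p one.

The sketch below computes both forms so they can be compared. It assumes markers coded 0/1/2, no intercept and a fixed λ. The genotypes and phenotypes are hypothetical, and the sketch is not taken from any specific method in the review.

```python
def solve(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ridge_dual(X, y, lam):
    """Ridge coefficients via the n x n system: beta = X'(XX' + lam*I)^{-1} y."""
    n, p = len(X), len(X[0])
    gram = [[sum(u * v for u, v in zip(X[i], X[j])) + (lam if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    alpha = solve(gram, y)
    return [sum(X[i][k] * alpha[i] for i in range(n)) for k in range(p)]

def ridge_primal(X, y, lam):
    """Same estimate via the p x p system (X'X + lam*I) beta = X'y."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)

# 3 individuals x 6 markers (hypothetical genotypes coded 0/1/2).
X = [[0.0, 1.0, 2.0, 1.0, 0.0, 2.0],
     [1.0, 1.0, 0.0, 2.0, 2.0, 0.0],
     [2.0, 0.0, 1.0, 1.0, 1.0, 1.0]]
y = [1.2, -0.3, 0.8]
print(ridge_dual(X, y, 1.0))
```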
L.F. Hoogerheide (Lennart); F.R. Kleibergen (Frank); H.K. van Dijk (Herman)
We propose a natural conjugate prior for the instrumental variables regression model. The prior is a natural conjugate one since the marginal prior and posterior of the structural parameter have the same functional expressions which directly reveal the update from prior to posterior. The
Heine, John J; Land, Walker H; Egan, Kathleen M
When investigating covariate interactions and group associations with standard regression analyses, the relationship between the response variable and exposure may be difficult to characterize. When the relationship is nonlinear, linear modeling techniques do not capture the nonlinear information content. Statistical learning (SL) techniques with kernels can address nonlinear problems without making parametric assumptions. However, these techniques do not produce findings that lend themselves to epidemiologic interpretation. A simulated case-control study was used to contrast the information embedding characteristics and separation boundaries produced by a specific SL technique with those of logistic regression (LR) modeling, representing a parametric approach. The SL technique consisted of a kernel mapping combined with a perceptron neural network. Because the LR model has an important epidemiologic interpretation, the SL method was modified to produce the analogous interpretation and generate odds ratios for comparison. The SL approach can generate odds ratios for main effects and risk factor interactions that capture nonlinear relationships between exposure variables and outcome better than LR. The integration of SL methods in epidemiology may improve both the understanding and interpretation of complex exposure/disease relationships.
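On the LR side of this comparison, the epidemiologic interpretation comes from exponentiated coefficients. Take a model with two binary exposures A and B and their product term. Each odds ratio is computed against the doubly unexposed group: A alone gives exp(b1), B alone gives exp(b2), and the joint exposure gives exp(b1 + b2 + b3). exp(b3) measures departure from multiplicativity. The coefficients below are hypothetical, and the sketch does not reproduce the modified SL odds ratios.

```python
import math

def odds_ratios(b1, b2, b3):
    """Odds ratios from logit p = b0 + b1*A + b2*B + b3*A*B with binary A, B,
    each relative to the unexposed group (A = 0, B = 0)."""
    return {
        "A only": math.exp(b1),
        "B only": math.exp(b2),
        "A and B": math.exp(b1 + b2 + b3),
        "interaction (multiplicative)": math.exp(b3),
    }

# Hypothetical coefficients: OR 2 for A, OR 3 for B, and a positive interaction.
print(odds_ratios(math.log(2.0), math.log(3.0), 0.4))
```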
We have been developing an approach based on identifying multimodal distribution laws of complex structures in medicine, biology, the chemistry of ultrapure materials, and membrane technology, as well as in technical applications. The method is based on formulating and solving inverse problems of mathematical physics for the respective probability density functions. The algorithmic tools were verified on limited-size model samples. For the stochastic structures and systems under study, the method is supplemented with an original regression-analysis option that takes the identified stochastic laws into account, mapping numerical parameters into a binary space. The proposed approach has been tested on clinical material in practical medicine.
Zhu, Ting-Lei; Zhao, Chang-Yin; Zhang, Ming-Jiang
This paper aims to obtain an analytic approximation to the evolution of circular orbits governed by the Earth's J2 and luni-solar gravitational perturbations. Assuming that the lunar orbital plane coincides with the ecliptic plane, Allan and Cook (Proc. R. Soc. A, Math. Phys. Eng. Sci. 280(1380):97, 1964) derived an analytic solution for the orbital plane evolution of circular orbits. Using their result as an intermediate solution, we establish an approximate analytic model that takes into account the lunar orbital inclination and the regression of its node. Finally, an approximate analytic expression is derived that is accurate compared with numerical results, except in resonant cases where the period of the reference orbit approximately equals an integer multiple (especially 1 or 2 times) of the lunar node regression period.
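One ingredient of this problem is the classical secular regression of the ascending node caused by J2 alone: dΩ/dt = -(3/2) n J2 (R/p)² cos i, where p = a(1 - e²). The sketch below evaluates only that term. It does not include the luni-solar perturbations or the paper's analytic solution. The Earth constants are standard values supplied here as assumptions.

```python
import math

MU_EARTH = 398600.4418   # km^3/s^2, Earth's gravitational parameter
R_EARTH = 6378.137       # km, equatorial radius
J2 = 1.08263e-3          # Earth's second zonal harmonic

def j2_nodal_rate(a_km, inc_deg, e=0.0):
    """Secular rate of the right ascension of the ascending node (rad/s) from J2 only."""
    n = math.sqrt(MU_EARTH / a_km ** 3)     # mean motion, rad/s
    p = a_km * (1.0 - e * e)                # semi-latus rectum, km
    return -1.5 * n * J2 * (R_EARTH / p) ** 2 * math.cos(math.radians(inc_deg))

# A circular low-Earth orbit at 400 km altitude and 51.6 deg inclination.
rate = j2_nodal_rate(R_EARTH + 400.0, 51.6)
print(math.degrees(rate) * 86400.0, "deg/day")   # about -5 deg/day (westward)
```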
Saro, Lee; Woo, Jeon Seong; Kwan-Young, Oh; Moung-Jin, Lee
The aim of this study is to predict landslide susceptibility through GIS-based spatial analysis using a statistical methodology. Logistic regression and artificial neural network models were applied and validated to analyze landslide susceptibility in Inje, Korea. Landslide occurrence areas in the study area were identified by interpreting optical remote sensing data (aerial photographs), followed by field surveys. A spatial database of forest, geophysical, soil, and topographic data was built for the study area using a Geographical Information System (GIS). These factors were analysed using artificial neural network (ANN) and logistic regression models to generate a landslide susceptibility map. The study validates the map by comparing it with landslide occurrence areas. The locations of landslide occurrence were divided randomly into a training set (50%) and a test set (50%). The training set was used to build the landslide susceptibility map with the artificial neural network and logistic regression models, and the test set was retained to validate the prediction map. The validation results revealed that the artificial neural network model (accuracy 80.10%) was better at predicting landslides than the logistic regression model (accuracy 77.05%). Of the weights used in the artificial neural network model, 'slope' had the highest value (1.330) and 'aspect' the lowest (1.000). This research applied two statistical analysis methods in a GIS and compared their results. Based on the findings, we were able to derive a more effective method for analyzing landslide susceptibility.
Background: Multiple imputation is becoming increasingly popular. Theoretical considerations as well as simulation studies have shown that including auxiliary variables is generally beneficial. Methods: A simulation study of a linear regression with a response Y and two predictors X1 and X2 was performed on data with n = 50, 100 and 200, using complete cases or multiple imputation with 0, 10, 20, 40 and 80 auxiliary variables. Missingness mechanisms were either 100% MCAR or 50% MAR + 50% MCAR. Auxiliary variables had low (r = .10) or moderate (r = .50) correlations with the X's and Y. Results: Including auxiliary variables can improve a multiple imputation model. However, including too many variables biases the regression coefficients downward and decreases precision. When the correlations are low, including auxiliary variables is not useful. Conclusion: More research on auxiliary variables in multiple imputation is needed. A preliminary rule of thumb could be that the ratio of variables to cases with complete data should not go below 1:3.
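After each of the m imputed datasets is analyzed, the per-dataset regression coefficients are typically combined with Rubin's rules. The pooled estimate is the mean of the m estimates. Its total variance is T = W + (1 + 1/m)B, where W is the mean within-imputation variance and B is the between-imputation variance of the estimates. The abstract does not state how the study pooled its results, so this is a sketch of the standard rules only. The numbers in the example are hypothetical.

```python
import statistics

def pool_rubin(estimates, variances):
    """Combine m completed-data estimates of one coefficient with Rubin's rules.
    Returns (pooled estimate, total variance)."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)      # pooled point estimate
    w = statistics.fmean(variances)          # mean within-imputation variance
    b = statistics.variance(estimates)       # between-imputation variance (m - 1 denominator)
    return q_bar, w + (1.0 + 1.0 / m) * b

# Hypothetical slope estimates for X1 and their squared standard errors from 5 imputations.
print(pool_rubin([0.52, 0.47, 0.55, 0.49, 0.51], [0.010, 0.011, 0.009, 0.010, 0.012]))
```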
Greensmith, David J.
Here I present an Excel-based program for the analysis of intracellular Ca transients recorded using fluorescent indicators. The program can perform all the steps necessary to convert recorded raw voltage changes into meaningful physiological information. It performs two fundamental processes: (1) it can prepare the raw signal by several methods, and (2) it can then analyze the prepared data to provide information such as absolute intracellular Ca levels. In addition, the rates of change of Ca can be measured using multiple, simultaneous regression analysis. I demonstrate that this program performs as well as commercially available software while offering numerous advantages, namely a simplified, self-contained analysis workflow. PMID:24125908 Copyright © 2013 The Author. Published by Elsevier Ireland Ltd. All rights reserved.
Danilo Cuzzuol Pedrini
This work proposes a method for applying regression control charts to the monitoring of manufacturing processes. To facilitate the chart's application, the method is presented in two phases: retrospective analysis (Phase I) and process monitoring (Phase II). It includes a simple modification of the multiple regression control chart that allows the values of the process quality characteristics to be monitored directly, instead of the standardized residuals of the regression model. It also proposes an extrapolation control chart, which checks whether the control variables extrapolate beyond the set of data used to estimate the regression model. The proposed method was successfully applied to a rubber manufacturing process. The Average Run Length (ARL) distribution was estimated using the Monte Carlo method, demonstrating the chart's efficiency in detecting certain changes in process parameters.
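The sketch below shows a basic regression control chart with one control variable. Phase I fits the regression. Phase II compares each new quality value with 3-sigma limits around its predicted value, which monitors the quality characteristic directly rather than a standardized residual. It also flags observations whose control variable falls outside the Phase I range. This is a simplified stand-in for the paper's multiple-regression chart.

Two simplifications apply. The limits use the residual standard deviation and ignore uncertainty in the fitted line. With several control variables, a point can lie inside every marginal range yet outside the joint region, which is why leverage-based extrapolation checks are often preferred. The data are hypothetical.

```python
import statistics

def fit_line(x, y):
    """Phase I: least-squares fit y = b0 + b1*x; returns (b0, b1, residual sd)."""
    n = len(x)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b0 = my - b1 * mx
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    return b0, b1, (sse / (n - 2)) ** 0.5

def monitor(x_new, y_new, phase1_x, b0, b1, s, width=3.0):
    """Phase II check for one observation: limits around the predicted value,
    plus a range-based extrapolation flag for the control variable."""
    y_hat = b0 + b1 * x_new
    lcl, ucl = y_hat - width * s, y_hat + width * s
    return {
        "out_of_control": not (lcl <= y_new <= ucl),
        "extrapolating": not (min(phase1_x) <= x_new <= max(phase1_x)),
        "limits": (lcl, ucl),
    }

phase1_x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
phase1_y = [2.0 + 0.5 * xi + (0.1 if i % 2 == 0 else -0.1) for i, xi in enumerate(phase1_x)]
b0, b1, s = fit_line(phase1_x, phase1_y)
print(monitor(4.5, 4.9, phase1_x, b0, b1, s))
```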
Helgesson, P.; Sjöstrand, H.; Rochman, D.
This paper presents a novel approach to the evaluation of nuclear data (ND), combining experimental data for thermal cross sections with resonance parameters and nuclear reaction modeling. The method involves sampling of various uncertain parameters, in particular uncertain components in experimental setups, and provides extensive covariance information, including consistent cross-channel correlations over the whole energy spectrum. The method is developed for, and applied to, ⁵⁹Ni, but may be used as a whole, or in part, for other nuclides. ⁵⁹Ni is particularly interesting because a substantial amount of it is produced in thermal nuclear reactors by neutron capture in ⁵⁸Ni, and because it has a non-threshold (n,α) cross section. Therefore, ⁵⁹Ni makes a very important contribution to helium production in stainless steel in a thermal reactor. However, current evaluated ND libraries contain old information for ⁵⁹Ni, without any uncertainty information. The work includes a study of thermal cross section experiments and a novel combination of this experimental information, giving the full multivariate distribution of the thermal cross sections. In particular, the thermal (n,α) cross section is found to be 12.7 ± 0.7 b. This is consistent with, yet different from, current established values. Further, the distribution of thermal cross sections is combined with reported resonance parameters, and with TENDL-2015 data, to provide full random ENDF files; all of this is done in a novel way, keeping uncertainties and correlations in mind. The random files are also condensed into one single ENDF file with covariance information, which is now part of a beta version of JEFF 3.3. Finally, the random ENDF files have been processed and used in an MCNP model to study helium production in stainless steel. The increase in the (n,α) rate due to ⁵⁹Ni compared to fresh stainless steel is found to be a factor of 5.2 at a certain time in the reactor vessel, with a relative
Engström, Emma; Mörtberg, Ulla; Karlström, Anders; Mangold, Mikael
This study developed a methodology for statistically assessing groundwater contamination mechanisms, focusing on microbial water pollution in low-income regions. Risk factors for faecal contamination of groundwater-fed drinking-water sources were evaluated in a case study in Juba, South Sudan. The study was based on counts of thermotolerant coliforms in water samples from 129 sources, collected by the humanitarian aid organisation Médecins Sans Frontières in 2010. The factors included hydrogeological settings, land use and socio-economic characteristics. The results showed that the residuals of a conventional probit regression model had a significant positive spatial autocorrelation (Moran's I = 3.05, I-stat = 9.28). A spatial model was therefore developed, and it had a better goodness of fit to the observations. The most significant factor in this model (p-value 0.005) was the distance from a water source to the nearest Tukul area, an area of informal settlements that lack sanitation services. It is thus recommended that future remediation and monitoring efforts in the city be concentrated in such low-income areas. The spatial model differed from the conventional approach: unlike in the conventional model, lowland topography was not significant at the 5% level, with a p-value of 0.074 in the spatial model versus 0.040 in the conventional model. This study showed that statistical risk-factor assessments of groundwater contamination need to consider spatial interactions when the water sources are located close to each other. Future studies might further investigate the cut-off distance that reflects spatial autocorrelation. In particular, these results can inform research on urban groundwater quality.
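The spatial-autocorrelation check on model residuals can be illustrated with global Moran's I, defined as I = (n / W) · Σᵢ Σⱼ wᵢⱼ zᵢ zⱼ / Σᵢ zᵢ², where z is the mean-centered residual and W is the sum of all weights. This minimal sketch assumes binary distance-band weights: two sources count as neighbors when they lie within a chosen cut-off of each other. The weighting scheme, the cut-off and the residual values are hypothetical and are not necessarily those used in the Juba study.

```python
import math


def morans_i(values, coords, cutoff):
    """Global Moran's I with binary distance-band weights:
    w_ij = 1 if sources i != j lie within `cutoff` of each other, else 0."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    num = 0.0
    w_sum = 0.0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(coords[i], coords[j]) <= cutoff:
                num += z[i] * z[j]
                w_sum += 1.0
    if w_sum == 0:
        raise ValueError("no pairs of sources fall within the cutoff distance")
    den = sum(zi * zi for zi in z)
    return (n / w_sum) * (num / den)


# Hypothetical residuals at six sources spaced 1 unit apart along a line.
coords = [(float(i), 0.0) for i in range(6)]
clustered = morans_i([1, 1, 1, -1, -1, -1], coords, cutoff=1.0)    # similar neighbors
alternating = morans_i([1, -1, 1, -1, 1, -1], coords, cutoff=1.0)  # dissimilar neighbors
```

Positive values indicate that nearby sources have similar residuals, which is the pattern that motivated the spatial model. Changing `cutoff` changes the neighbor set, which is why the cut-off distance matters.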
Shape is an important morphological characteristic in both animals and plants. In the present study, we examined a method for predicting biological contour shapes based on genome-wide marker polymorphisms. The method is expected to help accelerate the genetic improvement of biological shape via genomic selection. Grain shape variation observed in rice (Oryza sativa L.) germplasms was delineated using elliptic Fourier descriptors (EFDs) and was predicted based on genome-wide single nucleotide polymorphism (SNP) genotypes. We applied four methods, including kernel PLS (KPLS) regression, to build a prediction model of grain shape, and compared their accuracy via cross-validation. We analyzed multiple datasets that differed in marker density and sample size. Datasets with larger sample size and higher marker density showed higher accuracy. Among the four methods, KPLS showed the highest accuracy. Although KPLS and ridge regression (RR) had equivalent accuracy in a single dataset, the results suggested the potential of KPLS for predicting high-dimensional EFDs. Ordinary PLS, however, was less accurate than RR in all datasets, suggesting that a non-linear kernel was necessary for accurate prediction with the PLS method. Rice grain shape can be predicted accurately from genome-wide SNP genotypes. The proposed method is expected to be useful for genomic selection on biological shape.
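Ridge regression, one of the four compared methods, has a closed-form solution that suits the many-markers, few-lines setting. The sketch below uses NumPy with simulated -1/0/1 genotype codes, a hypothetical set of ten causal markers and an arbitrary grid of penalties. It fits ridge regression and picks the penalty by k-fold cross-validation. It does not implement KPLS or the EFD extraction.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge on centered data: beta = (Xc'Xc + lam*I)^-1 Xc'yc,
    with the intercept recovered from the column means."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    p = X.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    return y_mean - x_mean @ beta, beta


def cv_error(X, y, lam, k=5):
    """Mean squared prediction error over k contiguous cross-validation folds."""
    idx = np.arange(len(y))
    errs = []
    for test in np.array_split(idx, k):
        train = np.setdiff1d(idx, test)
        b0, beta = ridge_fit(X[train], y[train], lam)
        errs.append(np.mean((y[test] - (b0 + X[test] @ beta)) ** 2))
    return float(np.mean(errs))


rng = np.random.default_rng(0)
# Hypothetical stand-in for SNP genotypes: 60 lines, 200 markers coded -1/0/1.
X = rng.integers(-1, 2, size=(60, 200)).astype(float)
true_beta = np.zeros(200)
true_beta[:10] = 0.5
y = X @ true_beta + rng.normal(scale=0.5, size=60)

lams = [0.1, 1.0, 10.0, 100.0]
best_lam = min(lams, key=lambda lam: cv_error(X, y, lam))
```

The penalty keeps the system solvable even though there are more markers than lines, and larger penalties shrink the coefficient vector toward zero.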
Espelt, Albert; Marí-Dell'Olmo, Marc; Penelo, Eva; Bosque-Prous, Marina
To examine the differences between the Prevalence Ratio (PR) and the Odds Ratio (OR) in a cross-sectional study, and to provide tools to calculate PR using two statistical packages widely used in substance use research (STATA and R). We used cross-sectional data from 41,263 participants from 16 European countries taking part in the Survey on Health, Ageing and Retirement in Europe (SHARE). The dependent variable, hazardous drinking, was calculated using the Alcohol Use Disorders Identification Test - Consumption (AUDIT-C). The main independent variable was gender. Other variables used were age, educational level and country of residence. The PR of hazardous drinking in men relative to women was estimated using the Mantel-Haenszel method, log-binomial regression models and Poisson regression models with robust variance. These estimates were compared with the OR calculated using logistic regression models. The prevalence of hazardous drinking varied among countries. Generally, men had a higher prevalence of hazardous drinking than women [PR=1.43 (1.38-1.47)]. The estimated PR was identical regardless of the method and the statistical package used. However, the OR overestimated the PR, to an extent that depended on the prevalence of hazardous drinking in each country. In cross-sectional studies that compare countries with different prevalences of the disease or condition, it is advisable to use PR instead of OR.
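The divergence between OR and PR follows directly from their definitions: PR = p₁/p₀ and OR = [p₁/(1-p₁)]/[p₀/(1-p₀)]. The crude (unadjusted) calculation below uses hypothetical counts that give the same PR at a low and at a high baseline prevalence. The adjusted estimates in the study come from Mantel-Haenszel, log-binomial and robust Poisson models, which this sketch does not reproduce.

```python
def prevalence_ratio(cases_exp, n_exp, cases_unexp, n_unexp):
    """PR = prevalence among the exposed / prevalence among the unexposed."""
    return (cases_exp / n_exp) / (cases_unexp / n_unexp)


def odds_ratio(cases_exp, n_exp, cases_unexp, n_unexp):
    """OR = odds among the exposed / odds among the unexposed."""
    odds_exp = cases_exp / (n_exp - cases_exp)
    odds_unexp = cases_unexp / (n_unexp - cases_unexp)
    return odds_exp / odds_unexp


# Hypothetical counts (cases, group size) for exposed then unexposed groups.
low = (15, 1000, 10, 1000)     # prevalence 1.5% vs 1.0%
high = (600, 1000, 400, 1000)  # prevalence 60% vs 40%
pr_low, or_low = prevalence_ratio(*low), odds_ratio(*low)
pr_high, or_high = prevalence_ratio(*high), odds_ratio(*high)
```

Both scenarios have PR = 1.5. The OR is about 1.51 when the condition is rare but 2.25 when it is common, which is the overestimation the study reports.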
Smith, Kelly M.; Gay, Robert S.; Stachowiak, Susan J.
In late 2014, NASA will fly the Orion capsule on a Delta IV-Heavy rocket for the Exploration Flight Test-1 (EFT-1) mission. For EFT-1, the Orion capsule will be flying with a new GPS receiver and new navigation software. Given the experimental nature of the flight, the flight software must be robust to the loss of GPS measurements. Once the high-speed entry is complete, the drogue parachutes must be deployed within the proper conditions to stabilize the vehicle prior to deploying the main parachutes. When GPS is available in nominal operations, the vehicle will deploy the drogue parachutes based on an altitude trigger. However, when GPS is unavailable, the navigated altitude errors become excessively large, driving the need for a backup barometric altimeter to improve altitude knowledge. In order to increase overall robustness, the vehicle also has an alternate method of triggering the parachute deployment sequence based on planet-relative velocity if both the GPS and the barometric altimeter fail. However, this backup trigger results in large altitude errors relative to the targeted altitude. Motivated by this challenge, this paper demonstrates how logistic regression may be employed to semi-automatically generate robust triggers based on statistical analysis. Logistic regression is used as a ground processor pre-flight to develop a statistical classifier. The classifier would then be implemented in flight software and executed in real-time. This technique offers improved performance even in the face of highly inaccurate measurements. Although the logistic regression-based trigger approach will not be implemented within EFT-1 flight software, the methodology can be carried forward for future missions and vehicles.
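The ground/flight split described above can be sketched as follows. A logistic classifier is trained offline by gradient descent, and flight software only evaluates a weighted sum and a sigmoid. The single scaled feature, the labels (1 when deploying at that point met the altitude target) and all names are hypothetical. The abstract does not give the paper's actual feature set or training procedure.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=5000):
    """Ground-side training: batch gradient descent on the mean log-loss.
    Returns weights [bias, w1, ..., wp]."""
    p = len(X[0])
    n = len(X)
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
            grad[0] += err
            for j in range(p):
                grad[j + 1] += err * xi[j]
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def should_deploy(w, x, threshold=0.5):
    """Flight-side evaluation: one weighted sum and one sigmoid per cycle."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))) >= threshold


# Hypothetical training set from simulated entries: one scaled feature per
# case, labelled 1 if deploying at that point met the altitude target.
X = [[0.0], [0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5]]
y = [0, 0, 0, 1, 0, 1, 1, 1]
w = train_logistic(X, y)
```

Only the trained weights and the threshold need to be uploaded. The decision threshold can be tuned on the ground to trade early against late triggering.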
Thorsen, Kenneth; Søreide, Jon Arne; Søreide, Kjetil
Mortality rates in perforated peptic ulcer (PPU) have remained unchanged. The aim of this study was to compare known clinical factors and three scoring systems (American Society of Anesthesiologists (ASA), Boey and peptic ulcer perforation (PULP)) in their ability to predict mortality in PPU. This is a consecutive, observational cohort study of patients surgically treated for perforated peptic ulcer over a decade (January 2001 through December 2010). The primary outcome was 30-day mortality. A total of 172 patients were included, of whom 28 (16 %) died within 30 days. Among the factors associated with mortality, the PULP score had an odds ratio (OR) of 18.6 and the ASA score had an OR of 11.6, both with an area under the curve (AUC) of 0.79. The Boey score had an OR of 5.0 and an AUC of 0.75. Hypoalbuminaemia alone (≤37 g/l) achieved an OR of 8.7 and an AUC of 0.78. In multivariable regression, mortality was best predicted by a combination of increasing age, presence of active cancer and delay from admission to surgery of >24 h, together with hypoalbuminaemia, hyperbilirubinaemia and increased creatinine values, for a model AUC of 0.89. Six clinical factors predicted 30-day mortality better than available risk scores. Hypoalbuminaemia was the strongest single predictor of mortality and may be included for improved risk estimation.
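The AUC values quoted above can be computed directly from risk scores. The AUC is the probability that a randomly chosen patient who died scored higher than a randomly chosen survivor, with ties counted as one half (the Mann-Whitney formulation). The scores below are hypothetical, not the study's data.

```python
def auc(scores, labels):
    """Area under the ROC curve: P(score of a positive > score of a negative),
    counting ties as one half (Mann-Whitney formulation)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical risk scores: three patients who died (1) and five who survived (0).
scores = [4, 3, 5, 1, 2, 2, 3, 1]
died = [1, 1, 1, 0, 0, 0, 0, 0]
area = auc(scores, died)
```

An AUC of 0.5 means no discrimination and 1.0 means perfect separation. On that scale, the reported 0.79 for PULP and ASA and 0.89 for the six-factor model can be compared directly.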
A number of clinical trials and single-subject studies have been published measuring the effectiveness of long-term, comprehensive applied behavior analytic (ABA) intervention for young children with autism. However, an overall appraisal of this literature through standardized measures has been hampered by the varying methods, designs, treatment features and quality standards of published studies. To fill this gap in the literature, state-of-the-art meta-analytical methods were implemented, including quality assessment, sensitivity analysis, meta-regression, dose-response meta-analysis and meta-analysis of studies using different metrics. Results suggested that long-term, comprehensive ABA intervention leads to (positive) medium to large effects in terms of intellectual functioning, language development, acquisition of daily living skills and social functioning in children with autism. Although favorable effects were apparent across all outcomes, effects on language-related outcomes (IQ, receptive and expressive language, communication) were larger than those on non-verbal IQ, social functioning and daily living skills, with effect sizes approaching 1.5 for receptive and expressive language and communication skills. Dose-dependent effect sizes were apparent by level of total treatment hours for language and adaptation composite scores. Methodological issues relating to ABA clinical trials for autism are discussed.
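The abstract does not name its pooling estimator. As an assumed, commonly used example, the sketch below implements DerSimonian-Laird random-effects pooling of standardized effect sizes, a building block of many meta-regression and dose-response analyses. The effect sizes and variances are hypothetical.

```python
import math


def dersimonian_laird(effects, variances):
    """Random-effects pooled estimate, its standard error, and the
    between-study variance tau^2 (DerSimonian-Laird moment estimator)."""
    if len(effects) < 2:
        raise ValueError("need at least two studies")
    w = [1.0 / v for v in variances]  # inverse-variance (fixed-effect) weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)
    w_star = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2


# Hypothetical studies: identical effects (no heterogeneity) ...
homog = dersimonian_laird([0.5, 0.5, 0.5], [0.04, 0.04, 0.04])
# ... and two precise but conflicting studies (strong heterogeneity).
heterog = dersimonian_laird([0.2, 1.0], [0.01, 0.01])
```

When studies disagree more than their sampling error allows, tau² grows. The weights then become more equal and the pooled standard error widens.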
Açikgöz, Güneş; Hamamci, Berna; Yildiz, Abdulkadir
Alcohol consumption has toxic effects on organs and tissues in the human body. The risks are essentially thought to be related to the ethanol content of alcoholic beverages. Identifying ethanol in blood samples calls for rapid, non-destructive analysis with minimal sample handling, such as Raman spectroscopy. This study aims to apply Raman spectroscopy to the identification of ethanol in blood samples. Silver nanoparticles were synthesized to obtain Surface Enhanced Raman Spectroscopy (SERS) spectra of blood samples. The SERS spectra were used in a Partial Least Squares (PLS) model to determine ethanol quantitatively. To apply the PLS method, the 920–820 cm⁻¹ band interval was chosen, and the spectral changes were statistically related to the observed concentrations. The blood samples were examined with this model, and ethanol was quantified in two steps. First, a calibration model was established: a strong relationship was observed between the known concentration values and the values obtained by the PLS method (R² = 1). Second, the quantities of ethanol in 40 blood samples were predicted with the calibration model. Quantitative analysis of ethanol in blood was thus achieved by analyzing Raman spectroscopy data with the PLS method.
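The sketch below is a minimal PLS1 (single-response) calibration using the NIPALS algorithm in NumPy. The spectra are simulated. The Gaussian band near 880 cm⁻¹, the noise level, the concentrations and the number of components are assumptions for illustration, and the abstract does not describe the study's preprocessing or software.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """PLS1 regression via NIPALS. Returns (intercept, coefficient vector)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)      # weight vector
        t = Xr @ w                  # scores
        tt = t @ t
        p = Xr.T @ t / tt           # X loadings
        c = (yr @ t) / tt           # y loading
        Xr = Xr - np.outer(t, p)    # deflate X and y
        yr = yr - c * t
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)  # B = W (P'W)^-1 q
    return y_mean - x_mean @ coef, coef


rng = np.random.default_rng(1)
wavenumber = np.linspace(920, 820, 60)             # the 920-820 cm^-1 window
band = np.exp(-((wavenumber - 880) / 10) ** 2)     # hypothetical ethanol band
conc = np.linspace(0.0, 2.0, 12)                   # hypothetical calibration levels
spectra = conc[:, None] * band + rng.normal(scale=0.01, size=(12, 60))

b0, coef = pls1_fit(spectra, conc, n_components=2)
pred = b0 + spectra @ coef
r2 = 1 - np.sum((conc - pred) ** 2) / np.sum((conc - conc.mean()) ** 2)
```

A training-set R² near 1 is expected with clean simulated spectra. Independent validation samples, like the 40 blood samples in the study, are what test the calibration.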
Tanizaki, Junko; Hayashi, Hidetoshi; Kimura, Masatomo; Tanaka, Kaoru; Takeda, Masayuki; Shimizu, Shigeki; Ito, Akihiko; Nakagawa, Kazuhiko
The recent approval of nivolumab and other immune-checkpoint inhibitors for the treatment of certain solid tumors including non-small cell lung cancer (NSCLC) has transformed cancer therapy. However, it will be important to characterize effects of such agents not seen with classical cytotoxic drugs or other targeted therapeutics. We here report two cases of NSCLC showing so-called pseudoprogression during nivolumab treatment. In both cases, imaging assessment revealed that liver metastatic lesions initially progressed but subsequently shrank during continuous nivolumab administration, with treatment also resulting in a decline in serum levels of carcinoembryonic antigen. Histological evaluation of the liver metastatic lesion of one case after regression revealed fibrotic tissue containing infiltrated lymphocytes positive for CD3, CD4, or CD8 but no viable tumor cells, suggestive of a durable immune reaction even after a pathological complete response. Given the increasing use of immune-checkpoint inhibitors in patients with NSCLC or other solid tumors, further clinical evaluation and pathological assessment are warranted to provide a better understanding of such pseudoprogression. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Given the importance of import management, this study applies a generalized ARDL approach to estimate a MIDAS regression for wheat import value, and compares the accuracy of its forecasts with those computed by a regression model with frequency-adjusted data. Mixed-data sampling (MIDAS) models aim to extract information from high-frequency indicators so that lower-frequency variables can be modeled and forecast. Because the relationships among the variables are identified more precisely, more accurate predictions are expected. In both the estimated regression with frequency-adjusted data and the MIDAS models, using 1978-2003 as the training period, wheat import value was positively related to internal products and the exchange rate, while the relative price variable had an inverse relationship with Iran's wheat import value. Based on conventional statistics such as RMSE, MAD and MAPE, and on statistical significance, MIDAS models using annual wheat import value, internal products, relative price and the seasonal exchange rate significantly improve the prediction of annual wheat import value for the 2004-2008 testing period. Hence, prediction approaches with mixed-frequency data are recommended for modeling and predicting agricultural import value, especially for strategic import products.
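One common way for MIDAS regressions to use high-frequency data is to collapse each low-frequency period's high-frequency observations with a parametric lag polynomial. The sketch assumes the normalized exponential Almon scheme, since the abstract does not state which weighting its generalized ARDL specification uses. With both parameters at zero the weights are equal, which reduces to the simple averaging of a frequency-adjusted regression. The names and data are hypothetical.

```python
import math


def exp_almon_weights(theta1, theta2, n_lags):
    """Normalized exponential Almon lag weights:
    w_j proportional to exp(theta1*j + theta2*j**2), j = 1..n_lags."""
    raw = [math.exp(theta1 * j + theta2 * j * j) for j in range(1, n_lags + 1)]
    total = sum(raw)
    return [r / total for r in raw]


def midas_regressor(high_freq_values, theta1, theta2):
    """Collapse the high-frequency observations within one low-frequency
    period (listed most recent first) into a single weighted regressor."""
    w = exp_almon_weights(theta1, theta2, len(high_freq_values))
    return sum(wj * xj for wj, xj in zip(w, high_freq_values))


# Hypothetical quarterly exchange-rate values for one year, most recent first.
quarters = [4.0, 3.0, 2.0, 1.0]
flat = midas_regressor(quarters, 0.0, 0.0)     # equal weights: plain average
recent = midas_regressor(quarters, -1.0, 0.0)  # decaying weights favor recent quarters
```

In a full MIDAS model, the θ parameters are estimated jointly with the regression coefficients, typically by nonlinear least squares. The data therefore determine how quickly the influence of older quarters fades.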
Scarpace, F. L.; Voss, A. W.
Dye densities of multi-layered films are determined by applying a regression analysis to the spectral response of the composite transparency. The amount of dye in each layer is determined by fitting the sum of the individual dye layer densities to the measured dye densities. From this, dye content constants are calculated. Methods of calculating equivalent exposures are discussed. Equivalent exposures are a constant amount of energy over a limited band-width that will give the same dye content constants as the real incident energy. Methods of using these equivalent exposures for analysis of photographic data are presented.
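Fitting the sum of the individual dye-layer densities to the measured densities is a linear least-squares problem when densities add across layers. The sketch below assumes three Gaussian-shaped unit dye density curves (hypothetical stand-ins for measured dye spectra) and recovers the amount of each dye with `numpy.linalg.lstsq`.

```python
import numpy as np

wavelengths = np.arange(400, 701, 10)  # nm, hypothetical sampling grid


def gaussian(center, width):
    """Hypothetical spectral density curve for one unit of a dye."""
    return np.exp(-((wavelengths - center) / width) ** 2)


# Columns: hypothetical unit-amount density curves for yellow, magenta and cyan dyes.
dye_curves = np.column_stack([gaussian(450, 40), gaussian(550, 40), gaussian(650, 40)])


def dye_contents(measured_density):
    """Least-squares dye amounts c minimizing ||dye_curves @ c - measured||^2,
    assuming the composite density is the sum of the layer densities."""
    c, *_ = np.linalg.lstsq(dye_curves, measured_density, rcond=None)
    return c


true_c = np.array([0.8, 1.2, 0.5])
measured = dye_curves @ true_c  # noise-free composite density spectrum
```

With measured rather than synthetic spectra, the residual of the fit also indicates how well the additivity assumption holds.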
Marami Milani, Mohammad Reza; Hense, Andreas; Rahmani, Elham; Ploeger, Angelika
This study focuses on multiple linear regression models relating six climate indices (temperature-humidity index THI, environmental stress index ESI, equivalent temperature index ETI, heat load index HLI, modified HLI (HLI_new), and respiratory rate predictor RRP) to three main components of cows' milk (yield, fat and protein) for cows in Iran. The least absolute shrinkage and selection operator (LASSO) and the Akaike information criterion (AIC) are applied to select, for each milk predictand, the best model with the smallest number of climate predictors. Uncertainty is estimated by bootstrap resampling, and cross-validation is used to avoid over-fitting. Climatic parameters are calculated from the NASA-MERRA global atmospheric reanalysis. Milk data for the months from April to September, 2002 to 2010, are used. The best linear regression models are found in spring between milk yield as the predictand and THI, ESI, ETI, HLI and RRP as predictors, with p-value < 0.001 and R² of 0.50 and 0.49, respectively. In summer, milk yield with THI, ETI and ESI as independent variables shows the strongest relationship (p-value < 0.001), with R² = 0.69. For fat and protein, the results are only marginal. This method is suggested for impact studies of climate variability/change in agriculture and food science when only short time series or data with large uncertainty are available.
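AIC-based selection of the smallest adequate predictor set can be sketched as an exhaustive best-subset search over ordinary least-squares fits, using AIC = n ln(RSS/n) + 2k. The data are simulated: the standardized indices are independent, and yield depends on THI only. Real climate indices are strongly correlated, which is precisely why such selection (or LASSO) matters. This is an illustration, not the study's code.

```python
import itertools

import numpy as np


def ols_aic(X, y):
    """AIC = n*ln(RSS/n) + 2k for an OLS fit with intercept; k counts the
    intercept and slopes (terms constant across models are dropped)."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    return n * np.log(rss / n) + 2 * design.shape[1]


def best_subset_aic(X, y, names):
    """Search every non-empty predictor subset; return (AIC, names) of the best."""
    best = None
    for r in range(1, X.shape[1] + 1):
        for cols in itertools.combinations(range(X.shape[1]), r):
            aic = ols_aic(X[:, list(cols)], y)
            if best is None or aic < best[0]:
                best = (aic, [names[c] for c in cols])
    return best


rng = np.random.default_rng(42)
names = ["THI", "ESI", "ETI", "HLI"]
X = rng.normal(size=(100, 4))                              # hypothetical standardized indices
milk_yield = 20.0 - 2.0 * X[:, 0] + rng.normal(size=100)   # depends on THI only
best_aic, best_names = best_subset_aic(X, milk_yield, names)
```

Exhaustive search is feasible here because six indices give only 63 subsets. With many more candidates, penalized methods such as LASSO become the practical choice.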
In this study, we used high-dimensional pattern regression methods based on structural (gray and white matter; GM and WM) and functional (positron emission tomography of regional cerebral blood flow; PET) brain data to identify cross-sectional imaging biomarkers of cognitive performance in cognitively normal older adults from the Baltimore Longitudinal Study of Aging (BLSA). We focused on specific components of executive and memory domains known to decline with aging, including manipulation, semantic retrieval, long-term memory (LTM) and short-term memory (STM). For each imaging modality, brain regions associated with each cognitive domain were generated by adaptive regional clustering. A relevance vector machine was adopted to model the nonlinear continuous relationship between brain regions and cognitive performance. Cross-validation was used to select the most informative brain regions as imaging biomarkers (via recursive feature elimination) and to optimize model parameters. Cognitive scores predicted by our regression algorithm from the resulting brain regions correlated well with actual performance. Regression models combining the GM, WM and PET imaging modalities also outperformed models based on single modalities. Imaging biomarkers related to memory performance included the orbito-frontal and medial temporal cortical regions, with LTM showing a stronger correlation with the temporal lobe than STM. Brain regions predicting executive performance included orbito-frontal and occipito-temporal areas. The PET modality made a larger contribution to most cognitive domains except manipulation, which had a larger WM contribution from the superior longitudinal fasciculus and the genu of the corpus callosum. These findings based on machine-learning methods demonstrate the importance of combining structural and functional imaging data in understanding complex cognitive mechanisms and also their potential usage as biomarkers that predict cognitive
Yeh, Chun-Yuan; Schafferer, Christian; Lee, Jie-Min; Ho, Li-Ming; Hsieh, Chi-Jung
European Union public healthcare expenditure on treating smoking and attributable diseases is estimated at over €25bn annually. The reduction of tobacco consumption has thus become one of the major social policies of the EU. This study investigates the effects of price hikes on cigarette consumption, tobacco tax revenues and smoking-caused deaths in 28 EU countries. Employing panel data for the years 2005 to 2014 from Euromonitor International, the World Bank and the World Health Organization, we used income as a threshold variable and applied threshold regression modelling to estimate the elasticity of cigarette prices and to simulate the effect of price fluctuations. The results showed an income threshold effect on cigarette prices in the 28 EU countries for countries with a gross national income (GNI) per capita lower than US$5418, with a maximum cigarette price elasticity of -1.227. The simulated analysis showed that a 10% rise in cigarette prices would significantly reduce cigarette consumption as well as the total death toll caused by smoking in all the observed countries. The effect would be largest in Bulgaria and Romania, followed by Latvia and Poland. Additionally, increasing the number of MPOWER tobacco control policies implemented at the highest level of achievement would help reduce cigarette consumption. It is recommended that all EU countries levy higher tobacco taxes to increase cigarette prices and thereby reduce cigarette consumption. The resulting increase in tobacco tax revenues would be instrumental in covering expenditures related to tobacco prevention and control programs.
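Under an assumed constant-elasticity demand curve, Q ∝ P^ε, the consumption response to a price change follows from simple arithmetic. The sketch applies the reported maximum elasticity of -1.227 to a 10% price rise. It does not reproduce the study's threshold-regression model or its per-country simulation.

```python
def consumption_change(price_change, elasticity):
    """Relative change in quantity under constant-elasticity demand
    Q = A * P**elasticity, for a relative price change (0.10 means +10%)."""
    return (1.0 + price_change) ** elasticity - 1.0


exact = consumption_change(0.10, -1.227)  # constant-elasticity result
linear = -1.227 * 0.10                    # first-order approximation: %dQ ~ e * %dP
```

With ε = -1.227, the constant-elasticity formula gives a fall in consumption of roughly 11.0%, slightly smaller than the 12.3% first-order approximation.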
Pires, Flávio de Oliveira
According to Thomas Kuhn, the scientific progress of any discipline can be divided into a pre-paradigm phase, a normal science phase and a revolution phase. Science advances when a scientific revolution takes place after a quiet period of normal science and the scientific community moves on to a paradigm shift. I suggest there has been a recent change of course in the direction of exercise science. According to the 'current paradigm', exercise would probably be limited by alterations in either central command or peripheral skeletal muscles, and fatigue would develop in a task-dependent manner. By contrast, the central governor model (CGM) proposes that all forms of exercise are centrally regulated: the central nervous system calculates the metabolic cost required to complete a task in order to avoid catastrophic bodily failure. Some have criticized the CGM and supported the traditional interpretation, but the scientific community recently appears to have begun moving toward accepting this theory. First, the increased number of citations of articles supporting the CGM could indicate that the community has changed its focus. Second, relevant journals have devoted special editions to promoting the debate on subjects challenged by the CGM. Finally, scientists from different fields have drawn on mechanisms included in the CGM to understand the limits of exercise. Given the importance of the scientific community in demarcating a Kuhnian paradigm shift, I suggest that these three aspects could indicate increasing acceptance of a centrally regulated effort model for understanding the limits of exercise.
Samuel Ribeiro Figueiredo
Nominal logistic regressions establish mathematical relationships between continuous or discrete independent variables and discrete dependent variables. Their potential to predict the occurrence and distribution of soil classes was evaluated in the region of the municipalities of Ibirubá and Quinze de Novembro, RS, Brazil. From a digital elevation model (DEM) with 90 m resolution, topographic terrain variables (elevation, slope and curvature) and hydrographic terrain variables (distance from rivers, topographic wetness index, flow length and stream power index) were calculated. Multiple logistic regressions were then established between the terrain variables and the soil classes of the region, taken from a conventional soil survey at 1:80,000 scale. The regressions were used to calculate the probability of occurrence of each soil class. The final estimated soil map was produced by assigning to each map cell the soil class with the highest probability of occurrence. Comparing the original map with the estimated map at the original scale gave an overall accuracy (OA) of 58% and a Cohen's Kappa coefficient of 38%. Simplifying the scale did little to improve map accuracy, giving 61% OA and 39% Kappa. It was concluded that multiple logistic regressions have predictive potential for use as tools in supervised soil mapping.
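The two accuracy measures reported above can be computed from a confusion matrix of reference versus predicted soil classes. The sketch also shows the class-assignment rule (highest predicted probability). The matrix, class labels and probabilities are hypothetical.

```python
def accuracy_and_kappa(confusion):
    """Overall accuracy and Cohen's kappa from a square confusion matrix
    (rows: reference map classes, columns: predicted classes)."""
    total = sum(sum(row) for row in confusion)
    k = len(confusion)
    observed = sum(confusion[i][i] for i in range(k)) / total
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance from the marginal class frequencies.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return observed, (observed - expected) / (1.0 - expected)


def assign_class(class_probs):
    """Label a map cell with the soil class of highest predicted probability."""
    return max(class_probs, key=class_probs.get)


# Hypothetical two-class comparison of 100 map cells.
confusion = [[50, 10], [20, 20]]
overall, kappa = accuracy_and_kappa(confusion)
```

In this example, 70% overall agreement corresponds to a kappa of only about 0.35, because 54% agreement would be expected by chance. That gap is why the study reports both measures.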
Jović, Ozren; Smrečki, Neven; Popović, Zora
A novel quantitative prediction and variable selection method called interval ridge regression (iRR) is studied in this work. The method is applied to six FTIR data sets, two UV-vis data sets and one DSC data set. The results show that models built with ridge regression on optimal variables selected with iRR significantly outperform models built with ridge regression on all variables in both calibration (6 out of 9 cases) and validation (2 out of 9 cases). In this study, iRR is also compared with interval partial least squares regression (iPLS). iRR outperformed iPLS in validation (insignificantly in 6 out of 9 cases and significantly in one out of 9 cases for p0.99). Copyright © 2015 Elsevier B.V. All rights reserved.
Callén, M S; López, J M; Mastral, A M
The estimation of benzo(a)pyrene (BaP) concentrations in ambient air is very important from an environmental point of view, especially since the introduction of Directive 2004/107/EC and given the carcinogenic character of this pollutant. In a sampling campaign carried out during 2008-2009 at four locations in Spain, particulate matter of 10 microns or less (PM10) was collected to determine BaP concentrations experimentally by gas chromatography-tandem mass spectrometry (GC-MS-MS). Multivariate linear regression models (MLRM) were used to predict BaP air concentrations at two sampling sites, taking PM10 and meteorological variables as possible predictors. The model obtained with data from the two sampling sites (all sites model) (R² = 0.817, PRESS/SSY = 0.183) included significant variables such as PM10, temperature, solar radiation and wind speed, and was validated internally and externally. Internal validation was performed by cross-validation. External validation used BaP concentrations from previous campaigns carried out in Zaragoza from 2001-2004. The proposed model constitutes a first approximation for estimating BaP concentrations in urban atmospheres, with very good internal prediction (Q²_CV = 0.813, PRESS/SSY = 0.187). Its external prediction was best for the 2001-2002 campaign (Q²_ext = 0.679, PRESS/SSY = 0.321) compared with the 2001-2004 campaign (Q²_ext = 0.551, PRESS/SSY = 0.449). Copyright 2010 Elsevier B.V. All rights reserved.
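The internal-validation statistics above are related by Q² = 1 - PRESS/SSY. PRESS sums the squared leave-one-out prediction errors, and SSY is the total sum of squares about the mean, consistent with the quoted pair Q² = 0.813 and PRESS/SSY = 0.187. For brevity the sketch uses a single predictor, whereas the study's model has four. The PM10 and BaP values are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for one predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          / sum((x - mx) ** 2 for x in xs))
    return my - b1 * mx, b1


def loo_q2(xs, ys):
    """Leave-one-out PRESS, PRESS/SSY and Q^2 = 1 - PRESS/SSY."""
    press = 0.0
    for i in range(len(xs)):
        # Refit without observation i, then predict it.
        b0, b1 = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (b0 + b1 * xs[i])) ** 2
    my = sum(ys) / len(ys)
    ssy = sum((y - my) ** 2 for y in ys)
    return press, press / ssy, 1.0 - press / ssy


# Hypothetical PM10 (ug/m3) and BaP (ng/m3) pairs.
pm10 = [20.0, 35.0, 50.0, 65.0, 80.0]
bap = [0.1, 0.3, 0.4, 0.6, 0.9]
press, press_ratio, q2 = loo_q2(pm10, bap)
```

Unlike R², Q² is computed from predictions for observations left out of each fit, so it penalizes over-fitting.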
Logistic regression was originally intended to explain the relationship between the probability of an event and a set of covariables. The model's coefficients can be interpreted via odds and odds ratios, which are presented in the introduction to the chapter. When the observations are recorded individually, we speak of binary logistic regression; when they are grouped, the logistic regression is called binomial. Our presentation focuses mainly on the binary case. For statistical inference, the main tool is the maximum likelihood methodology: we present the Wald, Rao (score) and likelihood ratio tests and their use in comparing nested models. The problems we deal with are essentially the same as in multiple linear regression: testing the global effect and individual effects, selecting variables to build a model, measuring the goodness of fit of the model, and predicting new values. The methods are demonstrated on data sets using R. Finally, we briefly consider the binomial case and the situation where we are interested in several events, that is, polytomous (multinomial) logistic regression and the particular case of ordinal logistic regression.
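In binary logistic regression, $\operatorname{logit} P(Y = 1 \mid x) = \beta_0 + \beta_1 x$, so $e^{\beta_1}$ is the odds ratio for a one-unit increase in $x$, and the Wald statistic is $\hat\beta_j / \mathrm{SE}(\hat\beta_j)$. The sketch below fits the model by maximum likelihood with Newton-Raphson and reads off these quantities. It is a Python illustration on simulated data, not the chapter's R code; the function name `fit_logistic` and the iteration count are assumptions.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Binary logistic regression by Newton-Raphson (equivalently IRLS)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X1 @ beta))          # fitted probabilities
        W = p * (1.0 - p)                              # binomial variance weights
        info = X1.T @ (X1 * W[:, None])                # Fisher information X'WX
        score = X1.T @ (y - p)                         # gradient of the log-likelihood
        beta = beta + np.linalg.solve(info, score)     # Newton step
    p = 1.0 / (1.0 + np.exp(-X1 @ beta))
    info = X1.T @ (X1 * (p * (1.0 - p))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(info)))         # asymptotic standard errors
    return beta, se

rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = (rng.random(500) < 1.0 / (1.0 + np.exp(-(-0.5 + 1.2 * x)))).astype(float)
beta, se = fit_logistic(x, y)
wald = beta / se                                       # Wald z statistics
odds_ratio = np.exp(beta[1])                           # odds ratio per unit increase in x
print(beta, wald, odds_ratio)
```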
Electric load forecasting plays an important role in electricity markets and power systems. Because electric load time series are complex and nonlinear, it is very difficult to achieve satisfactory forecasting accuracy. In this paper, a hybrid model for short-term load forecasting, Wavelet Denoising-Extreme Learning Machine optimized by k-Nearest Neighbor Regression (EWKM), which combines k-Nearest Neighbor (KNN) and Extreme Learning Machine (ELM) on the basis of a wavelet denoising technique, is proposed. The hybrid model first decomposes the time series into a low-frequency main signal and several detail signals associated with high frequencies, then uses KNN to determine the independent and dependent variables from the low-frequency signal. Finally, the ELM is used to capture the nonlinear relationship between these variables and produce the final electric load prediction. Compared with three other models, Extreme Learning Machine optimized by k-Nearest Neighbor Regression (EKM), Wavelet Denoising-Extreme Learning Machine (WKM) and Wavelet Denoising-Back Propagation Neural Network optimized by k-Nearest Neighbor Regression (WNNM), the proposed model improves forecasting accuracy effectively. New South Wales is the economic powerhouse of Australia, so the proposed model is used to predict electric demand for that region, where accurate prediction is of considerable practical significance.
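An extreme learning machine is a single-hidden-layer network whose input weights and biases are drawn at random and never trained; only the output weights are fitted, by least squares. The sketch below shows that ELM core alone on a toy lagged series. The wavelet denoising and KNN-based variable selection of EWKM are not reproduced, and the hidden-layer size, tanh activation, small ridge penalty and lag setup are assumptions of this sketch.

```python
import numpy as np

class ELM:
    """Single-hidden-layer extreme learning machine for regression."""

    def __init__(self, n_hidden=40, alpha=1e-3, seed=0):
        self.n_hidden = n_hidden
        self.alpha = alpha                              # small ridge penalty (sketch assumption)
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)             # random nonlinear features

    def fit(self, X, y):
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))  # random input weights, never trained
        self.b = self.rng.normal(size=self.n_hidden)                 # random biases, never trained
        H = self._hidden(X)
        A = H.T @ H + self.alpha * np.eye(self.n_hidden)
        self.beta = np.linalg.solve(A, H.T @ y)         # output weights by regularized least squares
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta

# Toy one-step-ahead forecast: predict the next value from the previous 3 values.
rng = np.random.default_rng(2)
t = np.arange(400)
series = np.sin(2 * np.pi * t / 48) + 0.05 * rng.normal(size=400)
lags = 3
X = np.column_stack([series[i:len(series) - lags + i] for i in range(lags)])
y = series[lags:]
model = ELM().fit(X[:300], y[:300])
rmse = np.sqrt(np.mean((model.predict(X[300:]) - y[300:]) ** 2))
print(f"test RMSE: {rmse:.3f}")
```

Because only a linear solve is needed, training is fast; the price is that performance depends on the random draw of hidden weights, which is one reason hybrid schemes add preprocessing around the ELM.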
Alfredo BONINI NETO
This work presents a regression analysis by the least-squares method, whose purpose is to predict a result from a sequence of known data. Matlab is used to create the graphical interface, making the program more interactive for the user. The program also helps solve the system of linear equations of the least-squares method. One application of the least-squares method is in forecasting exercises: starting from data that have already been measured, one obtains a function (of first or second degree) that passes as close as possible to the given points. One point to note is that the most appropriate function must be found, since it is this function that will pass as close as possible to the known points. In this work, a scatter plot is produced in order to determine which function is most appropriate. Keywords: Least Squares, Forecasting, Regression, Graphical Interface.
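Fitting a first- or second-degree polynomial by least squares amounts to solving the normal equations $(V^\top V)c = V^\top y$, where the columns of $V$ are $1, x, x^2, \dots$. The Python sketch below (not the authors' Matlab program) solves them for both degrees and reports the residual sum of squares as a numerical complement to inspecting the scatter plot. The data are made up for illustration.

```python
import numpy as np

def polyfit_normal_equations(x, y, degree):
    """Least-squares polynomial fit by solving the normal equations (V'V) c = V'y."""
    V = np.vander(x, degree + 1, increasing=True)     # columns 1, x, x^2, ...
    coeffs = np.linalg.solve(V.T @ V, V.T @ y)        # c0, c1, ..., c_degree
    sse = np.sum((y - V @ coeffs) ** 2)               # residual sum of squares
    return coeffs, sse

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 2.9, 7.2, 13.1, 20.8, 31.2])       # illustrative, roughly quadratic data
for d in (1, 2):
    c, sse = polyfit_normal_equations(x, y, d)
    print(d, np.round(c, 3), round(sse, 3))
```

For first and second degree the normal equations are well conditioned; for higher degrees a QR-based solver such as `np.linalg.lstsq` is numerically safer than forming $V^\top V$.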
Background: Regression calibration as a method for handling measurement error is becoming increasingly well known and used in epidemiologic research. However, the standard version of the method is not appropriate for exposure analyzed on a categorical (e.g. quintile) scale, an approach commonly used in epidemiologic studies. A tempting solution could then be to use the predicted continuous exposure obtained through the regression calibration method and treat it as an approximation to the true exposure, that is, to include the categorized calibrated exposure in the main regression analysis. Methods: We use semi-analytical calculations and simulations to evaluate the performance of the proposed approach compared with the naive approach of not correcting for measurement error, both when analyses are performed on the quintile scale and when the original scale is incorporated into the categorical variables. We also present analyses of real data, containing measures of folate intake and depression, from the Norwegian Women and Cancer study (NOWAC). Results: When the extra information comes from replicated measurements rather than validation data, regression calibration does not maintain important qualities of the true exposure distribution; thus estimates of variance and percentiles can be severely biased. We show that the outlined approach maintains much, in some cases all, of the misclassification found in the observed exposure. For that reason, regression analysis with the corrected variable included on a categorical scale is still biased. In some cases the corrected estimates are analytically equal to those obtained by the naive approach. Regression calibration is, however, vastly superior to the naive method when the medians of each category are used in the analysis. Conclusion: Regression calibration in its most well-known form is not appropriate for measurement error correction when the exposure is analyzed on a
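With $k$ replicates $W_{ij} = X_i + U_{ij}$ under classical additive error, regression calibration replaces the replicate mean $\bar W_i$ by $\mu + \lambda(\bar W_i - \mu)$, where $\lambda = \sigma^2_X / (\sigma^2_X + \sigma^2_U / k)$ is estimated from within- and between-subject variances. The sketch below assumes classical, nondifferential error, equal numbers of replicates and no covariates in the calibration model; the function name and simulated data are hypothetical. It also illustrates the Results above: the calibrated values are shrunk toward the mean, so their variance (and hence their percentiles) do not match the true exposure distribution.

```python
import numpy as np

def regression_calibration(W):
    """Calibrated exposure E[X | mean of replicates] under classical additive error.

    W: array of shape (n_subjects, k) with k replicate measurements per subject.
    """
    n, k = W.shape
    w_bar = W.mean(axis=1)
    mu = w_bar.mean()
    sigma2_u = W.var(axis=1, ddof=1).mean()              # within-subject (error) variance
    sigma2_x = w_bar.var(ddof=1) - sigma2_u / k          # between-subject (true) variance
    lam = sigma2_x / (sigma2_x + sigma2_u / k)           # shrinkage (attenuation) factor
    return mu + lam * (w_bar - mu), lam

rng = np.random.default_rng(3)
x_true = rng.normal(10.0, 2.0, size=2000)                   # true exposure, variance 4
W = x_true[:, None] + rng.normal(0.0, 3.0, size=(2000, 2))  # 2 replicates, error variance 9
x_cal, lam = regression_calibration(W)
print(lam, x_true.var(), x_cal.var())
```

In this simplest setting the calibration is an increasing linear function of $\bar W_i$, so quintiles of the calibrated exposure contain exactly the same subjects as quintiles of $\bar W_i$; categorizing the calibrated exposure therefore retains all of the observed misclassification.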
Adragni, Kofi P; Cook, R Dennis
Dimension reduction for regression is a prominent issue today because technological advances now allow scientists to routinely formulate regressions in which the number of predictors is considerably larger than in the past. While several methods have been proposed to deal with such regressions, principal components (PCs) still seem to be the most widely used across the applied sciences. We give a broad overview of ideas underlying a particular class of methods for dimension reduction that includes PCs, along with an introduction to the corresponding methodology. New methods are proposed for prediction in regressions with many predictors.
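Principal components regression, the most widely used member of the class described above, regresses the response on the leading principal components of the centered predictors and maps the coefficients back to the original predictors. A minimal sketch via the SVD, with simulated data and a hypothetical function name; note that the components are chosen from X alone, without using y.

```python
import numpy as np

def pcr_fit(X, y, n_components):
    """Principal components regression: OLS of y on the leading PCs of centered X."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V_k = Vt[:n_components].T                      # loadings of the leading components
    Z = Xc @ V_k                                   # component scores
    gamma = np.linalg.lstsq(Z, y - y_mean, rcond=None)[0]
    beta = V_k @ gamma                             # coefficients on the original predictors
    intercept = y_mean - x_mean @ beta
    return intercept, beta

rng = np.random.default_rng(4)
latent = rng.normal(size=(100, 2))
X = latent @ rng.normal(size=(2, 30)) + 0.1 * rng.normal(size=(100, 30))  # 30 correlated predictors
y = latent[:, 0] - 2 * latent[:, 1] + 0.1 * rng.normal(size=100)
b0, b = pcr_fit(X, y, n_components=2)
print(np.corrcoef(b0 + X @ b, y)[0, 1])
```

With all components retained, PCR reproduces ordinary least squares; the dimension reduction comes entirely from truncating the component list.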
Olive, David J
This text covers both multiple linear regression and some experimental design models. The text uses the response plot to visualize the model and to detect outliers, does not assume that the error distribution has a known parametric distribution, develops prediction intervals that work when the error distribution is unknown, suggests bootstrap hypothesis tests that may be useful for inference after variable selection, and develops prediction regions and large sample theory for the multivariate linear regression model that has m response variables. A relationship between multivariate prediction regions and confidence regions provides a simple way to bootstrap confidence regions. These confidence regions often provide a practical method for testing hypotheses. There is also a chapter on generalized linear models and generalized additive models. There are many R functions to produce response and residual plots, to simulate prediction intervals and hypothesis tests, to detect outliers, and to choose response trans...
Igor K. Kochanenko
Procedures are justified for constructing a regression curve by the criterion of least fractals, that is, the greatest probability of the sums of powers of the least deviations of the measured intensity from its modelled values. The exponent is defined as the fractal dimension of a time series. The difference between the results of the justified method and those of the least-squares method is estimated quantitatively.
Fitzenberger, Bernd; Wilke, Ralf Andreas
Quantile regression is emerging as a popular statistical approach, which complements the estimation of conditional mean models. While the latter only focuses on one aspect of the conditional distribution of the dependent variable, the mean, quantile regression provides more detailed insights by modeling conditional quantiles. Quantile regression can therefore detect whether the partial effect of a regressor on the conditional quantiles is the same for all quantiles or differs across quantiles. Quantile regression can provide evidence for a statistical relationship between two variables even if the mean regression model does not. We provide a short informal introduction to the principle of quantile regression which includes an illustrative application from empirical labor market research. This is followed by briefly sketching the underlying statistical model for linear quantile regression based...
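Where least squares minimizes squared residuals, quantile regression at level $\tau$ minimizes the check (pinball) loss $\rho_\tau(u) = u\,(\tau - \mathbb{1}[u < 0])$. The sketch below shows the loss and checks, for an intercept-only model, that its minimizer is a sample $\tau$-quantile and is unaffected by making the outlier more extreme. The function names and data are illustrative.

```python
def check_loss(u, tau):
    """Check (pinball) loss rho_tau(u) = u * (tau - 1[u < 0])."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def fit_constant_quantile(y, tau):
    """Minimise sum_i rho_tau(y_i - q) over q; a minimiser is always attained at a data point."""
    return min(sorted(y), key=lambda q: sum(check_loss(yi - q, tau) for yi in y))

y = [3.1, 4.7, 2.2, 9.8, 5.5, 6.0, 1.4, 7.3, 4.1, 30.0]   # illustrative data with one outlier
for tau in (0.1, 0.5, 0.9):
    print(tau, fit_constant_quantile(y, tau))
```

With covariates the same loss is minimized over the intercept and slopes, usually by linear programming (for example with `QuantReg` in statsmodels); fitting several values of $\tau$ then shows whether a regressor's effect differs across quantiles.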
Esposito, Gabriele; van Bavel, René; Baranowski, Tom; Duch-Brown, Néstor
The theory of planned behavior (TPB) has received its fair share of criticism lately, including calls for it to retire. We contribute to improving the theory by testing extensions such as the model of goal-directed behavior (MGDB, which adds desire and anticipated positive and negative emotions) applied to physical activity (PA) intention. We also test the inclusion of a descriptive norms construct as an addition to the subjective norms construct, also applied to PA, resulting in two additional models: TPB including descriptive norms (TPB + DN) and MGDB including descriptive norms (MGDB + DN). The study is based on an online survey of 400 young adult Internet users, previously enrolled in a subject pool. Confirmatory factor analysis (CFA) showed that TPB and TPB + DN were not fit for purpose, while MGDB and MGDB + DN were. Structural equation modelling (SEM) conducted on MGDB and MGDB + DN showed that the inclusion of descriptive norms took over the significance of injunctive norms, and increased the model's account of total variance in intention to be physically active. © The Author(s) 2016.
Peng, Chao-Ying Joanne; Lee, Kuk Lida; Ingersoll, Gary M.
Provides guidelines for what to expect in an article using logistic regression techniques, discussing tables, figures, and charts to be included to comprehensively assess results and assumptions to be verified; demonstrating the preferred pattern for applying logistic methods, with an illustration of logistic regression applied to a data set; and…
Fabyano Fonseca Silva
Nowadays, an important and interesting alternative for controlling tick infestation in cattle is to select resistant animals and to identify the respective quantitative trait loci (QTLs) and DNA markers for later use in breeding programs. The number of ticks per animal is a discrete count trait, which could potentially follow a Poisson distribution. However, when there is an excess of zeros, due to the occurrence of several noninfected animals, zero-inflated Poisson (ZIP) and generalized zero-inflated Poisson (GZIP) distributions may provide a better description of the data. Thus, the objective here was to compare, through simulation, Poisson and ZIP models (simple and generalized) with classical approaches for QTL mapping with count phenotypes under different scenarios, and to apply these approaches to a QTL study of tick resistance in an F2 cattle (Gyr x Holstein) population. It was concluded that, when working with zero-inflated data, it is advisable to use the generalized and simple ZIP models for analysis. On the other hand, when working with data that contain zeros but are not zero-inflated, the Poisson model or a data-transformation approach, such as a square-root or Box-Cox transformation, is applicable.
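A zero-inflated Poisson mixes structural zeros (probability $\pi$) with a Poisson($\lambda$) count, so $P(Y = 0) = \pi + (1 - \pi)e^{-\lambda}$ and $P(Y = k) = (1 - \pi)e^{-\lambda}\lambda^k / k!$ for $k \ge 1$. The sketch below fits an intercept-only ZIP by EM and compares its zero probability with that of a Poisson having the same mean. There are no QTL or marker effects here; the simulated data and function names are assumptions for illustration.

```python
import math
import random

def zip_pmf(k, pi, lam):
    """P(Y = k) for a zero-inflated Poisson: structural zero w.p. pi, else Poisson(lam)."""
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    return (pi if k == 0 else 0.0) + (1.0 - pi) * poisson

def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for small lam."""
    limit, k, p = math.exp(-lam), 0, rng.random()
    while p >= limit:
        k += 1
        p *= rng.random()
    return k

def fit_zip_em(y, n_iter=200):
    """EM estimates of (pi, lam) for an intercept-only ZIP model."""
    pi, lam = 0.5, max(sum(y) / len(y), 1e-6)
    for _ in range(n_iter):
        # E-step: posterior probability that an observed zero is a structural zero
        p0 = pi / (pi + (1.0 - pi) * math.exp(-lam))
        z = [p0 if yi == 0 else 0.0 for yi in y]
        # M-step: update the mixing weight and the Poisson mean
        pi = sum(z) / len(y)
        lam = sum((1.0 - zi) * yi for zi, yi in zip(z, y)) / sum(1.0 - zi for zi in z)
    return pi, lam

rng = random.Random(5)
y = [0 if rng.random() < 0.3 else sample_poisson(2.5, rng) for _ in range(1000)]
pi_hat, lam_hat = fit_zip_em(y)
mean = sum(y) / len(y)
print(f"pi={pi_hat:.3f}, lambda={lam_hat:.3f}")
print(f"observed P(0)={y.count(0) / len(y):.3f}, "
      f"ZIP P(0)={zip_pmf(0, pi_hat, lam_hat):.3f}, "
      f"Poisson P(0)={math.exp(-mean):.3f}")
```

A Poisson fitted to the same data badly underpredicts the share of zeros, which is the situation in which the ZIP models above are recommended.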
Cooley, R.L.; Naff, R.L.
Nonlinear multiple regression methods are developed to model and analyze groundwater flow systems. Complete descriptions of regression methodology as applied to groundwater flow models allow scientists and engineers engaged in flow modeling to apply the methods to a wide range of problems. Organization of the text proceeds from an introduction that discusses the general topic of groundwater flow modeling, to a review of basic statistics necessary to properly apply regression techniques, and then to the main topic: exposition and use of linear and nonlinear regression to model groundwater flow. Statistical procedures are given to analyze and use the regression models. A number of exercises and answers are included to exercise the student on nearly all the methods that are presented for modeling and statistical analysis. Three computer programs implement the more complex methods. These three are a general two-dimensional, steady-state regression model for flow in an anisotropic, heterogeneous porous medium, a program to calculate a measure of model nonlinearity with respect to the regression parameters, and a program to analyze model errors in computed dependent variables such as hydraulic head. (USGS)
Feigelson, Eric D.; Babu, Gutti J.
A wide variety of least-squares linear regression procedures used in observational astronomy, particularly investigations of the cosmic distance scale, are presented and discussed. The classes of linear models considered are (1) unweighted regression lines, with bootstrap and jackknife resampling; (2) regression solutions when measurement error, in one or both variables, dominates the scatter; (3) methods to apply a calibration line to new data; (4) truncated regression models, which apply to flux-limited data sets; and (5) censored regression models, which apply when nondetections are present. For the calibration problem we develop two new procedures: a formula for the intercept offset between two parallel data sets, which propagates slope errors from one regression to the other; and a generalization of the Working-Hotelling confidence bands to nonstandard least-squares lines. They can provide improved error analysis for Faber-Jackson, Tully-Fisher, and similar cosmic distance scale relations.
Lee, Kyewon; Ahn, Hongshik; Moon, Hojin; Kodell, Ralph L; Chen, James J
This article proposes a method for multiclass classification problems using ensembles of multinomial logistic regression models. A multinomial logit model is used as a base classifier in ensembles from random partitions of predictors. The multinomial logit model can be applied to each mutually exclusive subset of the feature space without variable selection. By combining multiple models the proposed method can handle a huge database without a constraint needed for analyzing high-dimensional data, and the random partition can improve the prediction accuracy by reducing the correlation among base classifiers. The proposed method is implemented using R, and the performance including overall prediction accuracy, sensitivity, and specificity for each category is evaluated on two real data sets and simulation data sets. To investigate the quality of prediction in terms of sensitivity and specificity, the area under the receiver operating characteristic (ROC) curve (AUC) is also examined. The performance of the proposed model is compared to a single multinomial logit model and it shows a substantial improvement in overall prediction accuracy. The proposed method is also compared with other classification methods such as the random forest, support vector machines, and random multinomial logit model.
Varanasi, S. V.
The order of the model is determined easily. The linear-regression algorithm includes recursive equations for the coefficients of the model of increased order. The algorithm eliminates duplicative calculations and facilitates the search for the minimum order of a linear-regression model that fits a set of data satisfactorily.
We give an explicit algorithm and source code for combining alpha streams via bounded regression. In practical applications, there is typically insufficient history to compute a sample covariance matrix (SCM) for a large number of alphas. To compute alpha allocation weights, one then resorts to (weighted) regression over SCM principal components. Regression often produces alpha weights with insufficient diversification and/or a skewed distribution against, e.g., turnover. This can be rectified by imposing bounds on the alpha weights within the regression procedure. Bounded regression can also be applied to the construction of stock and other asset portfolios. We discuss illustrative examples.
... concerning drug-free workplace in the Governmentwide common rule that the DoD has codified at 32 CFR part 26. The requirements apply to all financial assistance. 3. Prohibitions on discrimination on the basis of... “Nondiscrimination” in Appendix B to 32 CFR part 22. 4. Prohibitions on discrimination on the basis of age, in the...
Szpunar, C.B.; Gillette, J.L.
This report examines the concept of environmental externality. It discusses various factors -- the atmospheric transformations, relationship of point-source emissions to ambient air quality, dose-response relationships, applicable cause-and-effect principles, and risk and valuation research -- that are considered by a number of state utilities when they apply the environmental externality concept to energy resource planning. It describes a methodology developed by Argonne National Laboratory for general use in resource planning, in combination with traditional methods that consider the cost of electricity production. Finally, it shows how the methodology can be applied in Indonesia, Thailand, and Taiwan to potential coal-fired power plant projects that will make use of clean coal technologies.
Improvement of multivariate image analysis applied to quantitative structure-activity relationship (QSAR) analysis by using wavelet-principal component analysis ranking variable selection and least-squares support vector machine regression: QSAR study of checkpoint kinase WEE1 inhibitors.
Cormanich, Rodrigo A; Goodarzi, Mohammad; Freitas, Matheus P
Inhibition of the tyrosine kinase enzyme WEE1 is an important step in the treatment of cancer. The bioactivities of a series of WEE1 inhibitors have previously been modeled through comparative molecular field analyses (CoMFA and CoMSIA), but a two-dimensional image-based quantitative structure-activity relationship approach has been shown to be highly predictive for other compound classes. This method, called multivariate image analysis applied to quantitative structure-activity relationship, was applied here to derive quantitative structure-activity relationship models. Whilst the well-known bilinear and multilinear partial least squares regressions (PLS and N-PLS, respectively) correlated the multivariate image analysis descriptors with the corresponding dependent variables only reasonably well, the use of wavelet and principal component ranking as variable selection methods, together with least-squares support vector machine regression, significantly improved the prediction statistics. These recently implemented mathematical tools, particularly novel in quantitative structure-activity relationship studies, represent an important advance for the development of more predictive quantitative structure-activity relationship models and, consequently, new drugs.
Kuhl, Mark R.
The introduction of the Global Positioning System (GPS) into the National Airspace System (NAS) necessitates the development of Receiver Autonomous Integrity Monitoring (RAIM) techniques. In order to guarantee a certain level of integrity, a thorough understanding of modern estimation techniques applied to navigational problems is required. The extended Kalman filter (EKF) is derived and analyzed under poor geometry conditions. It was found that the performance of the EKF is difficult to predict, since the EKF is designed for a Gaussian environment. A novel approach is implemented which incorporates ridge regression to explain the behavior of an EKF in the presence of dynamics under poor geometry conditions. The basic principles of ridge regression theory are presented, followed by the derivation of a linearized recursive ridge estimator. Computer simulations are performed to confirm the underlying theory and to provide a comparative analysis of the EKF and the recursive ridge estimator.
Goutte, Cyril; Larsen, Jan
Kernel smoothing is a widely used non-parametric pattern recognition technique. By nature, it suffers from the curse of dimensionality and is usually difficult to apply to high input dimensions. In this contribution, we propose an algorithm that adapts the input metric used in multivariate regression by minimising a cross-validation estimate of the generalisation error. This allows one to automatically adjust the importance of different dimensions. The improvement in terms of modelling performance is illustrated on a variable selection task where the adaptive metric kernel clearly outperforms...
Goutte, Cyril; Larsen, Jan
Kernel smoothing is a widely used nonparametric pattern recognition technique. By nature, it suffers from the curse of dimensionality and is usually difficult to apply to high input dimensions. In this paper, we propose an algorithm that adapts the input metric used in multivariate regression by minimising a cross-validation estimate of the generalisation error. This allows one to automatically adjust the importance of different dimensions. The improvement in terms of modelling performance is illustrated on a variable selection task where the adaptive metric kernel clearly outperforms the standard...
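Adapting the input metric can be sketched with a Nadaraya-Watson smoother whose Gaussian kernel carries one weight per input dimension, $K(x, x') = \exp(-\sum_d w_d (x_d - x'_d)^2)$, with the weights chosen to minimise the leave-one-out cross-validation error. This is a minimal sketch under our own assumptions: the crude coordinate-wise grid search stands in for whatever optimiser the authors use, and the data are synthetic. A weight of zero removes a dimension from the metric, which is why adapting the metric can act as variable selection.

```python
import numpy as np

def loo_cv_error(X, y, w):
    """Leave-one-out squared error of Nadaraya-Watson regression with metric weights w."""
    diff = X[:, None, :] - X[None, :, :]
    K = np.exp(-np.sum(w * diff ** 2, axis=2))     # Gaussian kernel with per-dimension weights
    np.fill_diagonal(K, 0.0)                        # leave each point out of its own fit
    pred = K @ y / np.maximum(K.sum(axis=1), 1e-300)
    return np.mean((y - pred) ** 2)

def adapt_metric(X, y, grid=(0.0, 0.1, 1.0, 10.0), sweeps=3):
    """Crude coordinate-wise grid search for the weights minimising LOO CV error."""
    w = np.ones(X.shape[1])
    for _ in range(sweeps):
        for d in range(X.shape[1]):
            errs = []
            for g in grid:
                trial = w.copy()
                trial[d] = g
                errs.append(loo_cv_error(X, y, trial))
            w[d] = grid[int(np.argmin(errs))]       # never worse than the current value
    return w

rng = np.random.default_rng(8)
X = rng.uniform(-1, 1, size=(150, 4))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.normal(size=150)   # only the first input matters
w = adapt_metric(X, y)
print(w, loo_cv_error(X, y, w))
```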
Pedrini, D. T.; Pedrini, Bonnie C.
Regression, another mechanism studied by Sigmund Freud, has been the subject of much research, e.g., on hypnotic regression, frustration regression, schizophrenic regression, and infrahuman-animal regression (often directly related to fixation). Many investigators have worked with hypnotic age regression, which has a long history going back to the Russian reflexologists.…
Aoki, Takayuki; Kobayashi, Hiroyuki; Higuchi, Shinichi; Shimizu, Sadato
A Ni-base alloy weld containing stress corrosion cracks, found in 1999 in the reactor internals of Tsuruga Unit 1, the oldest BWR in Japan, was examined by three (3) types of UT method. After this examination, the depth of each crack was confirmed by alternately grinding a small excavation and performing PT examination until the crack disappeared. The depths measured by UT were then compared with those confirmed by excavation, thereby verifying the performance of the UT methods. As a result, the combination of the three types of UT method was found to meet the acceptance criteria given by ASME Sec. XI Appendix VIII, Performance Demonstration for Ultrasonic Examination Systems, Supplement 6. In this paper, the results of the UT examination described above and their evaluation are discussed. (author)
Brodeur, Garrett M.; Bagatell, Rochelle
Recent genomic and biological studies of neuroblastoma have shed light on the dramatic heterogeneity in the clinical behaviour of this disease, which spans from spontaneous regression or differentiation in some patients, to relentless disease progression in others, despite intensive multimodality therapy. This evidence also suggests several possible mechanisms to explain the phenomena of spontaneous regression in neuroblastomas, including neurotrophin deprivation, humoral or cellular immunity, loss of telomerase activity and alterations in epigenetic regulation. A better understanding of the mechanisms of spontaneous regression might help to identify optimal therapeutic approaches for patients with these tumours. Currently, the most druggable mechanism is the delayed activation of developmentally programmed cell death regulated by the tropomyosin receptor kinase A pathway. Indeed, targeted therapy aimed at inhibiting neurotrophin receptors might be used in lieu of conventional chemotherapy or radiation in infants with biologically favourable tumours that require treatment. Alternative approaches consist of breaking immune tolerance to tumour antigens or activating neurotrophin receptor pathways to induce neuronal differentiation. These approaches are likely to be most effective against biologically favourable tumours, but they might also provide insights into treatment of biologically unfavourable tumours. We describe the different mechanisms of spontaneous neuroblastoma regression and the consequent therapeutic approaches. PMID:25331179
Isobe, Takashi; Feigelson, Eric D.; Akritas, Michael G.; Babu, Gutti Jogesh
Five methods for obtaining linear regression fits to bivariate data with unknown or insignificant measurement errors are discussed: ordinary least-squares (OLS) regression of Y on X, OLS regression of X on Y, the bisector of the two OLS lines, orthogonal regression, and 'reduced major-axis' regression. These methods have been used by various researchers in observational astronomy, most importantly in cosmic distance scale applications. Formulas for calculating the slope and intercept coefficients and their uncertainties are given for all the methods, including a new general form of the OLS variance estimates. The accuracy of the formulas was confirmed using numerical simulations. The applicability of the procedures is discussed with respect to their mathematical properties, the nature of the astronomical data under consideration, and the scientific purpose of the regression. It is found that, for problems needing symmetrical treatment of the variables, the OLS bisector performs significantly better than orthogonal or reduced major-axis regression.
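All five lines have closed-form slopes in terms of the centered sums $S_{xx}$, $S_{yy}$ and $S_{xy}$: $\beta_1 = S_{xy}/S_{xx}$ for OLS(Y|X), $\beta_2 = S_{yy}/S_{xy}$ for OLS(X|Y) expressed as $dy/dx$, the bisector of the two, the orthogonal line, and the reduced major axis $\operatorname{sign}(S_{xy})\sqrt{\beta_1\beta_2}$. The sketch below is written from these standard results rather than copied from the paper, and computes only point estimates, not the variance formulas discussed in the abstract. Data and function name are illustrative.

```python
import math

def regression_slopes(x, y):
    """Slopes (dy/dx) and intercepts of the five bivariate lines named in the abstract."""
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    sxx = sum((xi - xm) ** 2 for xi in x)
    syy = sum((yi - ym) ** 2 for yi in y)
    sxy = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y))
    b1 = sxy / sxx                                   # OLS(Y|X)
    b2 = syy / sxy                                   # OLS(X|Y), expressed as dy/dx
    b3 = (b1 * b2 - 1 + math.sqrt((1 + b1**2) * (1 + b2**2))) / (b1 + b2)  # OLS bisector
    d = b2 - 1 / b1
    b4 = 0.5 * (d + math.copysign(math.sqrt(4 + d**2), sxy))  # orthogonal regression
    b5 = math.copysign(math.sqrt(b1 * b2), sxy)      # reduced major axis
    slopes = {"OLS(Y|X)": b1, "OLS(X|Y)": b2, "bisector": b3,
              "orthogonal": b4, "reduced major axis": b5}
    intercepts = {name: ym - b * xm for name, b in slopes.items()}
    return slopes, intercepts

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.8, 4.3, 5.9, 8.4, 9.7, 12.5, 13.8, 16.4]
slopes, intercepts = regression_slopes(x, y)
for name, b in slopes.items():
    print(f"{name:>20}: slope {b:.4f}, intercept {intercepts[name]:.4f}")
```

Because $\beta_1/\beta_2 = r^2 \le 1$, the OLS(Y|X) and OLS(X|Y) slopes bracket the bisector and the reduced major axis; the lines coincide only when the points are exactly collinear.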
Cluster analysis applied to nonlinear regression models selection to growth curves of crossed lambs
Fernanda Gomes da Silveira
This study aimed to use cluster analysis to classify nonlinear regression models used to describe the growth curve of crossbred lambs, based on the results of different goodness-of-fit evaluators. To this end, weight-age data from the following crosses between meat sheep breeds were used: Dorper x Morada Nova, Dorper x Rabo Largo and Dorper x Santa Inês. After the best model was identified, the model identity technique was also applied in order to identify the most productive cross. Twelve nonlinear models were fitted, and their goodness of fit was measured by the adjusted coefficient of determination, the Akaike and Bayesian information criteria, the mean squared error of prediction and the predicted coefficient of determination. The cluster analysis indicated the Richards model as the most suitable for describing the growth curves of the three genetic groups considered, and the model identity tests indicated the Dorper x Santa Inês cross as the most suitable for local livestock production.
Originally published in 1990, the first edition of Subset Selection in Regression filled a significant gap in the literature, and its critical and popular success has continued for more than a decade. Thoroughly revised to reflect progress in theory, methods, and computing power, the second edition promises to continue that tradition. The author has thoroughly updated each chapter, incorporated new material on recent developments, and included more examples and references. New in the Second Edition: a separate chapter on Bayesian methods; complete revision of the chapter on estimation; a major example from the field of near-infrared spectroscopy; more emphasis on cross-validation; greater focus on bootstrapping; stochastic algorithms for finding good subsets from large numbers of predictors when an exhaustive search is not feasible; software available on the Internet for implementing many of the algorithms presented; more examples. Subset Selection in Regression, Second Edition remains dedicated to the techniques for fitting...
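Exhaustive search, the baseline that stochastic algorithms replace once it becomes infeasible, fits every one of the $2^p - 1$ nonempty predictor subsets and keeps the one with the smallest residual sum of squares for each subset size. A minimal sketch on simulated data; the function name is hypothetical and not taken from the book's software.

```python
import itertools
import numpy as np

def best_subsets(X, y, max_size=None):
    """Exhaustive search: for each subset size, the predictor subset with the smallest RSS."""
    n, p = X.shape
    max_size = p if max_size is None else max_size
    best = {}
    for k in range(1, max_size + 1):
        for cols in itertools.combinations(range(p), k):
            Xs = np.column_stack([np.ones(n), X[:, list(cols)]])   # intercept + subset
            beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
            rss = float(np.sum((y - Xs @ beta) ** 2))
            if k not in best or rss < best[k][1]:
                best[k] = (cols, rss)
    return best

rng = np.random.default_rng(9)
X = rng.normal(size=(100, 6))
y = 3 * X[:, 0] - 2 * X[:, 3] + 0.5 * rng.normal(size=100)
for k, (cols, rss) in best_subsets(X, y).items():
    print(k, cols, round(rss, 2))
```

The best RSS can only fall as the subset grows, so choosing among sizes needs a penalized criterion or cross-validation rather than RSS itself.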
Walton, Joseph M.; And Others
Ridge regression is an approach to the problem of large standard errors of regression estimates of intercorrelated regressors. The effect of ridge regression on the estimated squared multiple correlation coefficient is discussed and illustrated. (JKS)
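Ridge regression replaces the least-squares solution $(Z^\top Z)^{-1}Z^\top y$ by $(Z^\top Z + kI)^{-1}Z^\top y$ on standardized regressors $Z$. The sketch below shows the effect on two nearly collinear regressors: the coefficients shrink and stabilize as $k$ grows, while the fitted $R^2$ can only fall, because $k = 0$ is the least-squares fit. Standardization by the sample standard deviation, the values of $k$ and the simulated data are assumptions of this sketch.

```python
import numpy as np

def ridge(X, y, k):
    """Ridge estimates on standardized predictors and the resulting fitted R^2."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = y - y.mean()
    beta = np.linalg.solve(Z.T @ Z + k * np.eye(Z.shape[1]), Z.T @ yc)
    r2 = 1.0 - np.sum((yc - Z @ beta) ** 2) / np.sum(yc ** 2)
    return beta, r2

rng = np.random.default_rng(6)
x1 = rng.normal(size=50)
X = np.column_stack([x1, x1 + 0.05 * rng.normal(size=50)])   # two nearly collinear regressors
y = x1 + rng.normal(size=50)
for k in (0.0, 1.0, 10.0):
    beta, r2 = ridge(X, y, k)
    print(k, np.round(beta, 3), round(r2, 4))
```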
Vol. 1, No. 4 (2011), pp. 25-28 ISSN 2045-3345 Grant - others: GA ČR (CZ) GA402/09/0557 Institutional research plan: CEZ:AV0Z10300504 Keywords: robust regression * heteroscedasticity * regression quantiles * diagnostics Subject RIV: BB - Applied Statistics, Operational Research http://www.researchjournals.co.uk/documents/Vol4/06%20Kalina.pdf
Davino, Cristina; Vistocco, Domenico
A guide to the implementation and interpretation of Quantile Regression models. This book explores the theory and numerous applications of quantile regression, offering empirical data analysis as well as the software tools to implement the methods. The main focus of this book is to provide the reader with a comprehensive description of the main issues concerning quantile regression; these include basic modeling, geometrical interpretation, estimation and inference for quantile regression, as well as model validity and diagnostic tools. Each methodological aspect is explored and
Regression Modeling: Methods, Theory, and Computation with SAS provides an introduction to a diverse assortment of regression techniques using SAS to solve a wide variety of regression problems. The author fully documents the SAS programs and thoroughly explains the output produced by the programs. The text presents the popular ordinary least squares (OLS) approach before introducing many alternative regression methods. It covers nonparametric regression, logistic regression (including Poisson regression), Bayesian regression, robust regression, fuzzy regression, random coefficients regression,
Mark A. Wolters
Autologistic regression is an important probability model for dichotomous random variables observed along with covariate information. It has been used in various fields for analyzing binary data possessing spatial or network structure. The model can be viewed as an extension of the autologistic model (also known as the Ising model, quadratic exponential binary distribution, or Boltzmann machine) to include covariates. It can also be viewed as an extension of logistic regression to handle responses that are not independent. Not all authors use exactly the same form of the autologistic regression model. Variations of the model differ in two respects. First, the variable coding (the two numbers used to represent the two possible states of the variables) might differ. Common coding choices are (zero, one) and (minus one, plus one). Second, the model might appear in either of two algebraic forms: a standard form, or a recently proposed centered form. Little attention has been paid to the effect of these differences, and the literature shows ambiguity about their importance. It is shown here that changes to either coding or centering in fact produce distinct, non-nested probability models. Theoretical results, numerical studies, and analysis of an ecological data set all show that the differences among the models can be large and practically significant. Understanding the nature of the differences and making appropriate modeling choices can lead to significantly improved autologistic regression analyses. The results strongly suggest that the standard model with plus/minus coding, which we call the symmetric autologistic model, is the most natural choice among the autologistic variants.
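One way to see why the coding matters is through the conditional distribution of one site given its neighbors. Assuming the standard (uncentered) form, with joint probability proportional to $\exp(\sum_i \alpha z_i + \lambda \sum_{i \sim j} z_i z_j)$, the log-odds of the high state at site $i$ is $(\text{hi} - \text{lo})(\alpha + \lambda \sum_{j \sim i} z_j)$. With identical parameters, neighbors in the low state exert no pull under {0, 1} coding but a strong pull under {-1, +1} coding. The function name and the single constant $\alpha$ (standing in for a covariate term) are illustrative; the centered form is not shown.

```python
import math

def conditional_prob_high(alpha, lam, neighbors, coding):
    """P(site is in the high state | neighbors) in a standard-form autologistic model.

    coding="01": states {0, 1}; coding="pm": states {-1, +1}.
    `neighbors` holds the neighbors' states in the same coding.
    """
    if coding not in ("01", "pm"):
        raise ValueError("coding must be '01' or 'pm'")
    lo, hi = (0, 1) if coding == "01" else (-1, 1)
    s = sum(neighbors)
    eta = (hi - lo) * (alpha + lam * s)          # log-odds of high versus low state
    return 1.0 / (1.0 + math.exp(-eta))

# Same parameters, all four neighbors in the low state:
print(conditional_prob_high(0.0, 0.5, [0, 0, 0, 0], "01"))      # low neighbors have no effect
print(conditional_prob_high(0.0, 0.5, [-1, -1, -1, -1], "pm"))  # low neighbors pull strongly low
```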
The reduced rank regression model is a multivariate regression model whose coefficient matrix has reduced rank. The reduced rank regression algorithm is an estimation procedure that estimates this model. It is related to canonical correlations and involves calculating...
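As a concrete illustration of the estimation step, here is a minimal NumPy sketch for the simplest case, an unweighted least-squares criterion: fit ordinary least squares, then project the coefficient matrix onto the leading right singular vectors of the fitted values (the best low-rank approximation of the fitted values, by the Eckart-Young theorem). The identity weighting is an assumption made for brevity; the weighted versions that connect to canonical correlations are not shown, and the function name is hypothetical.

```python
import numpy as np


def reduced_rank_regression(X: np.ndarray, Y: np.ndarray, rank: int) -> np.ndarray:
    """Coefficient matrix B (p x q) minimizing ||Y - X B||_F^2 subject to rank(B) <= rank."""
    B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)    # unrestricted least squares
    fitted = X @ B_ols
    # The leading right singular vectors of the fitted values span the
    # best rank-r approximation of X @ B_ols.
    _, _, Vt = np.linalg.svd(fitted, full_matrices=False)
    V_r = Vt[:rank].T                                 # q x rank
    return B_ols @ V_r @ V_r.T


# Synthetic example with a rank-1 true coefficient matrix.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 6))
B_true = rng.standard_normal((6, 1)) @ rng.standard_normal((1, 4))
Y = X @ B_true + 0.1 * rng.standard_normal((200, 4))
B_hat = reduced_rank_regression(X, Y, rank=1)
print(np.linalg.matrix_rank(B_hat))
```

Setting `rank` equal to the number of responses recovers the ordinary least-squares fit, so the rank restriction is the only thing that distinguishes the two estimators here.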
Vol. 17, No. 4 (2015), pp. 963-972 ISSN 1387-5841 Institutional support: RVO:67985556 Keywords: Reliability analysis * Repair models * Regression Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 0.782, year: 2015 http://library.utia.cas.cz/separaty/2015/SI/novak-0450902.pdf
We review the serious problem associated with instabilities of the coefficients of regression equations, referred to as the MRA (multivariate regression analysis) "nightmare of the first kind". It manifests itself when, in a stepwise regression, a descriptor is included in or excluded from the regression: the coefficients of the descriptors that remain in the regression equation then change unpredictably. We then consider an even more serious problem, referred to as the MRA "nightmare of the second kind", which arises when optimal descriptors are selected from a large pool of descriptors. At different steps of the stepwise regression, this process typically replaces several previously used descriptors with new ones. We describe a procedure that resolves these difficulties. The approach is illustrated on boiling points of nonanes, which are considered (1) using an ordered connectivity basis; (2) using an ordering obtained by applying a greedy algorithm; and (3) using an ordering derived from an exhaustive search for optimal descriptors. A novel variant of multiple regression analysis, called retro-regression (RR), is outlined, showing how it resolves the ambiguities associated with both the first and the second kind of MRA "nightmare".
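The "nightmare of the first kind" is easy to reproduce with two correlated descriptors, and it disappears once the descriptors are orthogonalized in a fixed order (here with a QR decomposition, which is Gram-Schmidt orthogonalization). This sketch only illustrates why ordered orthogonal descriptors keep earlier coefficients fixed; it is not the retro-regression procedure itself, and the data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
x1 = rng.standard_normal(n)
x2 = 0.9 * x1 + 0.3 * rng.standard_normal(n)        # strongly correlated with x1
y = 1.0 + 2.0 * x1 - 1.5 * x2 + 0.2 * rng.standard_normal(n)


def ols(M, y):
    """Least-squares coefficients of y on the columns of M."""
    return np.linalg.lstsq(M, y, rcond=None)[0]


ones = np.ones(n)
raw_small = ols(np.column_stack([ones, x1]), y)       # x1 only
raw_big = ols(np.column_stack([ones, x1, x2]), y)     # x1 and x2
print("raw coefficient of x1:", raw_small[1], "->", raw_big[1])

# Orthogonalize in a fixed order (intercept, x1, x2); Q has orthonormal columns,
# so each coefficient is Q[:, k] @ y and does not depend on later columns.
Q, _ = np.linalg.qr(np.column_stack([ones, x1, x2]))
orth_small = ols(Q[:, :2], y)
orth_big = ols(Q, y)
print("orthogonalized coefficient of x1:", orth_small[1], "->", orth_big[1])
```

The price of this stability is interpretability: each orthogonalized descriptor represents only the part of the original descriptor not explained by the descriptors that precede it, so the chosen ordering matters.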
Praise for the Fourth Edition: "This book is . . . an excellent source of examples for regression analysis. It has been and still is readily readable and understandable." -Journal of the American Statistical Association Regression analysis is a conceptually simple method for investigating relationships among variables. Carrying out a successful application of regression analysis, however, requires a balance of theoretical results, empirical rules, and subjective judgment. Regression Analysis by Example, Fifth Edition has been expanded
Chen, Yen-Chi; Genovese, Christopher R.; Tibshirani, Ryan J.; Wasserman, Larry
Modal regression estimates the local modes of the distribution of $Y$ given $X=x$, instead of the mean, as in the usual regression sense, and can hence reveal important structure missed by usual regression methods. We study a simple nonparametric method for modal regression, based on a kernel density estimate (KDE) of the joint distribution of $Y$ and $X$. We derive asymptotic error bounds for this method, and propose techniques for constructing confidence sets and prediction sets. The latter...
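To make the kernel-based idea concrete, here is a minimal NumPy sketch of one way to locate the local modes of a kernel estimate of the density of Y given X = x0: a mean-shift ascent in the y-direction only, with Gaussian kernels in both coordinates. The bandwidths, starting points (the observed responses), merging tolerance, and function name are illustrative assumptions, not choices taken from the paper, and no confidence or prediction sets are computed.

```python
import numpy as np


def conditional_modes(x0, X, Y, hx=0.3, hy=0.3, tol=1e-8, max_iter=500, merge_tol=1e-3):
    """Local modes in y of a Gaussian-kernel estimate of the density of Y given X = x0."""
    wx = np.exp(-0.5 * ((X - x0) / hx) ** 2)     # kernel weights in the x-direction
    y = Y.astype(float).copy()                    # one ascent started at each observed Y
    for _ in range(max_iter):
        w = wx[None, :] * np.exp(-0.5 * ((y[:, None] - Y[None, :]) / hy) ** 2)
        y_new = (w @ Y) / w.sum(axis=1)          # mean-shift update in the y-direction
        converged = np.max(np.abs(y_new - y)) < tol
        y = y_new
        if converged:
            break
    y.sort()
    # Ascents that end at (numerically) the same point belong to the same mode.
    groups = np.split(y, np.where(np.diff(y) > merge_tol)[0] + 1)
    return np.array([g.mean() for g in groups])


# Synthetic data with two branches, where the conditional mean (about 0) is
# a poor summary but the conditional modes (about -2 and 2) are informative.
rng = np.random.default_rng(0)
n = 400
X = rng.uniform(-1, 1, n)
Y = 2.0 * rng.choice([-1.0, 1.0], n) + 0.2 * rng.standard_normal(n)
print(conditional_modes(0.0, X, Y))
```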
Zafar, S.N.; Siddique, S.N.; Zaheer, N.
Objective: To observe the types of tumor regression after treatment and identify the common pattern of regression in our patients. Study Design: Descriptive study. Place and Duration of Study: Department of Pediatric Ophthalmology and Strabismus, Al-Shifa Trust Eye Hospital, Rawalpindi, Pakistan, from October 2011 to October 2014. Methodology: Children with unilateral and bilateral retinoblastoma were included in the study. Patients were referred to Pakistan Institute of Medical Sciences, Islamabad, for chemotherapy. After every cycle of chemotherapy, dilated fundus examination under anesthesia was performed to record the response to treatment. Regression patterns were recorded on RetCam II. Results: Seventy-four tumors were included in the study. Out of 74 tumors, 3 were ICRB group A tumors, 43 were ICRB group B tumors, 14 belonged to ICRB group C, and the remaining 14 were ICRB group D tumors. Type IV regression was seen in 39.1% (n=29) of tumors, type II in 29.7% (n=22), type III in 25.6% (n=19), and type I in 5.4% (n=4). All group A tumors (100%) showed type IV regression. Seventeen (39.5%) group B tumors showed type IV regression. In group C, 5 tumors (35.7%) showed type II regression and 5 tumors (35.7%) showed type IV regression. In group D, 6 tumors (42.9%) regressed to type II non-calcified remnants. Conclusion: The response and success of focal and systemic treatment, as judged by the appearance of different patterns of tumor regression, vary with the ICRB grouping of the tumor. (author)
Cortese, Giuliana; Scheike, Thomas H; Martinussen, Torben
Regression analysis of survival data, and more generally event history data, is typically based on Cox's regression model. We here review some recent methodology, focusing on the limitations of Cox's regression model. The key limitation is that the model is not well suited to represent time-varyi...
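For reference, Cox's model is usually fitted by maximizing the partial likelihood. Below is a minimal NumPy sketch of the log partial likelihood, assuming the Breslow convention for tied event times. In this form the covariate effects beta are constant over time. The function name, the grid search, and the simulated exponential data are illustrative only; a real analysis would use a Newton-type optimizer.

```python
import numpy as np


def cox_log_partial_likelihood(beta, times, events, X):
    """Log partial likelihood of Cox's model (Breslow convention for ties).

    times: observed times; events: 1 if the event was observed, 0 if censored;
    X: n x p covariate matrix; beta: p-vector of time-constant effects.
    """
    eta = X @ beta                                  # linear predictors
    loglik = 0.0
    for i in np.flatnonzero(events):
        at_risk = times >= times[i]                 # risk set at the i-th event time
        loglik += eta[i] - np.log(np.sum(np.exp(eta[at_risk])))
    return loglik


# Simulated data: hazard proportional to exp(0.8 * x), no censoring.
rng = np.random.default_rng(2)
X = rng.standard_normal((300, 1))
times = rng.exponential(1.0 / np.exp(0.8 * X[:, 0]))
events = np.ones(300, dtype=int)
grid = np.linspace(-1, 2, 151)
values = [cox_log_partial_likelihood(np.array([b]), times, events, X) for b in grid]
best = grid[int(np.argmax(values))]
print("grid maximizer of the partial likelihood:", best)
```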
By, Kunthel; Qaqish, Bahjat F; Preisser, John S; Perin, Jamie; Zink, Richard C
This article describes new software for modeling correlated binary data based on orthogonalized residuals, a recently developed estimating equations approach that includes alternating logistic regressions as a special case. The software is flexible with respect to fitting: the user can choose estimating equations for association models based on either alternating logistic regressions or orthogonalized residuals, the latter providing a non-diagonal working covariance matrix for second-moment parameters and thus potentially greater efficiency. Regression diagnostics based on this method are also implemented in the software. The mathematical background is briefly reviewed, and the software is applied to medical data sets. Published by Elsevier Ireland Ltd.
Estimates of genetic parameters for milk yield and persistency of lactation of Gyr cows, applying random regression models
Luis Gabriel González Herrera
of Gyr cows calving between 1990 and 2005 were used to estimate genetic parameters of monthly test-day milk yield (TDMY). Records were analyzed by random regression models (MRA) that included the additive genetic and permanent environmental random effects, with the contemporary group, age of cow at calving (linear and quadratic components), and the average trend of the population as fixed effects. Random trajectories were fitted by Wilmink's (WIL) and Ali & Schaeffer's (AS) parametric functions. Residual variances were fitted by step functions with 1, 4, 6, or 10 classes. The contemporary group was defined by herd-year-season of test-day and included at least three animals. Models were compared by Akaike's information criterion and Schwarz's Bayesian information criterion (BIC). The best-fitting model used the AS function for the additive genetic and permanent environmental effects, with heterogeneous residual variances modeled by a step function with four classes. Heritability estimates ranged from 0.21 to 0.33 for the AS function and from 0.17 to 0.30 for the WIL function, and were larger in the first half of the lactation period. Genetic correlations between TDMY were high and positive for adjacent test-days and decreased as the number of days between records increased. Predicted breeding values for total 305-day milk yield (MRA305) and for specific periods of lactation (obtained as the mean of all breeding values in the periods, using the AS function) were compared by rank correlation with those predicted by a standard model using accumulated 305-day milk yield (PTA305). The magnitude of the correlations suggests that animal rankings may differ depending on which of the criteria compared in this study is used.
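The two parametric functions enter a random regression model as covariates evaluated at days in milk t. The sketch below uses the forms commonly quoted in the literature: Wilmink [1, exp(-k t), t] with k = 0.05 (an assumption), and Ali & Schaeffer [1, g, g^2, w, w^2] with g = t/305 and w = ln(305/t). To hint at the information-criterion comparison, it fits a single fixed curve by ordinary least squares to synthetic yields. This is a deliberately simplified stand-in, not the mixed-model comparison with genetic and permanent environmental effects performed in the study.

```python
import numpy as np


def wilmink_basis(dim, k=0.05):
    """Wilmink covariates [1, exp(-k t), t] for days in milk t (k = 0.05 assumed)."""
    t = np.asarray(dim, dtype=float)
    return np.column_stack([np.ones_like(t), np.exp(-k * t), t])


def ali_schaeffer_basis(dim):
    """Ali & Schaeffer covariates [1, g, g^2, w, w^2] with g = t/305, w = ln(305/t)."""
    t = np.asarray(dim, dtype=float)
    g = t / 305.0
    w = np.log(305.0 / t)
    return np.column_stack([np.ones_like(t), g, g ** 2, w, w ** 2])


def gaussian_aic_bic(basis, y):
    """AIC and BIC of an OLS fit, up to a common additive constant."""
    coef, *_ = np.linalg.lstsq(basis, y, rcond=None)
    rss = np.sum((y - basis @ coef) ** 2)
    n, k = basis.shape
    base = n * np.log(rss / n)                   # -2 log-likelihood up to a constant
    return base + 2 * k, base + k * np.log(n)


# Synthetic test-day yields (kg), for illustration only.
rng = np.random.default_rng(3)
dim = np.arange(5, 306, 10)
yield_kg = 20 - 8 * np.exp(-0.05 * dim) - 0.03 * dim + 0.5 * rng.standard_normal(dim.size)
for name, basis in [("WIL", wilmink_basis(dim)), ("AS", ali_schaeffer_basis(dim))]:
    print(name, gaussian_aic_bic(basis, yield_kg))
```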
Single-equation regression models, one of the most widely used tools in statistical forecasting, are examined here. A companion to the author's earlier work, Forecasting with Univariate Box-Jenkins Models: Concepts and Cases, the present text pulls together recent time series ideas and gives special attention to possible intertemporal patterns, distributed lag responses of output to input series, and the autocorrelation patterns of regression disturbances. It also includes six case studies.
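A finite distributed lag regression relates the output y_t to the current and past values of an input series x. A minimal sketch, assuming a known maximum lag and ordinary least squares, is shown below. It does not model autocorrelated disturbances, and the helper names are hypothetical.

```python
import numpy as np


def lag_matrix(x, max_lag):
    """Rows t = max_lag..n-1 with columns [1, x_t, x_{t-1}, ..., x_{t-max_lag}]."""
    x = np.asarray(x, dtype=float)
    n = x.size
    cols = [x[max_lag - j: n - j] for j in range(max_lag + 1)]
    return np.column_stack([np.ones(n - max_lag)] + cols)


def fit_distributed_lag(x, y, max_lag):
    """OLS estimates [intercept, b_0, ..., b_max_lag] of a finite distributed lag model."""
    Z = lag_matrix(x, max_lag)
    coef, *_ = np.linalg.lstsq(Z, np.asarray(y, dtype=float)[max_lag:], rcond=None)
    return coef


# Noiseless synthetic series: y_t = 1 + 0.5 x_t + 0.3 x_{t-1} + 0.1 x_{t-2}.
rng = np.random.default_rng(7)
x = rng.standard_normal(200)
y = np.zeros(200)
y[2:] = 1 + 0.5 * x[2:] + 0.3 * x[1:-1] + 0.1 * x[:-2]
coef = fit_distributed_lag(x, y, max_lag=2)
print("lag weights:", coef[1:], "long-run response:", coef[1:].sum())
```

The sum of the lag weights is the total (long-run) response of the output to a permanent unit change in the input.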
Regression is the study of the dependence of a response variable y on a collection of p predictors collected in x. In dimension reduction regression, we seek to find a few linear combinations β1'x, ..., βd'x such that all the information about the regression is contained in these linear combinations. If d is very small, perhaps one or two, then the regression problem can be summarized using simple graphics; for example, for d = 1, the plot of y versus β1'x contains all the regression information. When d = 2, a 3D plot contains all the information. Several methods for estimating d and relevant functions of β1, ..., βd have been suggested in the literature. In this paper, we describe an R package for three important dimension reduction methods: sliced inverse regression (sir), sliced average variance estimates (save), and principal Hessian directions (phd). The package is very general and flexible, and can easily be extended to include other methods of dimension reduction. It includes tests and estimates of the dimension, estimates of the relevant information including β1, ..., βd, and some useful graphical summaries as well.
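To make sliced inverse regression concrete, here is a minimal NumPy sketch of the basic algorithm: standardize x, slice the observations by sorted y, and take the leading eigenvectors of the weighted covariance of the slice means. This is not the R package described above. The function name, the default of 10 slices, and the simulated single-index example are illustrative assumptions, and no test of the dimension d is included.

```python
import numpy as np


def sir_directions(X, y, n_slices=10, d=1):
    """Estimated dimension-reduction directions (columns, unit length) by SIR."""
    n, p = X.shape
    mu = X.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(X, rowvar=False))
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T   # Sigma^{-1/2}
    Z = (X - mu) @ inv_sqrt                                # standardized predictors
    slices = np.array_split(np.argsort(y), n_slices)       # slices of sorted y
    M = np.zeros((p, p))
    for idx in slices:
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)               # weighted slice-mean covariance
    w, v = np.linalg.eigh(M)
    top = v[:, np.argsort(w)[::-1][:d]]                    # leading eigenvectors
    B = inv_sqrt @ top                                     # back to the original x scale
    return B / np.linalg.norm(B, axis=0)


# Single-index example: y depends on x only through beta'x.
rng = np.random.default_rng(1)
X = rng.standard_normal((2000, 5))
beta = np.array([1.0, 1.0, 0.0, 0.0, 0.0]) / np.sqrt(2.0)
y = X @ beta + 0.1 * rng.standard_normal(2000)
B = sir_directions(X, y)
print("|cos(angle)| with the true direction:", abs(B[:, 0] @ beta))
```

The sign of an estimated direction is arbitrary, which is why the comparison uses the absolute cosine.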
Goeman, Jelle J
This paper presents autocorrelated logistic ridge regression, an extension of logistic ridge regression for ordered covariates that is based on the assumption that adjacent covariates have similar regression coefficients. The method is applied to the analysis of proteomics mass spectra.
Generalized regression quantiles, including the conditional quantiles and expectiles as special cases, are useful alternatives to the conditional means for characterizing a conditional distribution, especially when the interest lies in the tails. We develop a functional data analysis approach to jointly estimate a family of generalized regression quantiles. Our approach assumes that the generalized regression quantiles share some common features that can be summarized by a small number of principal component functions. The principal component functions are modeled as splines and are estimated by minimizing a penalized asymmetric loss measure. An iterative least asymmetrically weighted squares algorithm is developed for computation. While separate estimation of individual generalized regression quantiles usually suffers from large variability due to lack of sufficient data, by borrowing strength across data sets, our joint estimation approach significantly improves the estimation efficiency, which is demonstrated in a simulation study. The proposed method is applied to data from 159 weather stations in China to obtain the generalized quantile curves of the volatility of the temperature at these stations. © 2013 Springer Science+Business Media New York.
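The asymmetric weighting behind the least asymmetrically weighted squares idea is easiest to see in the scalar case. A sample tau-expectile m satisfies tau * sum over y > m of (y - m) = (1 - tau) * sum over y < m of (m - y), and it can be found by repeatedly computing a weighted mean with weight tau above the current value and 1 - tau below it. The sketch below shows only this scalar iteration, with a hypothetical function name. The paper's method applies such weights inside a penalized spline fit of principal component functions, which is not reproduced here.

```python
import numpy as np


def expectile(y, tau, tol=1e-10, max_iter=100):
    """Sample tau-expectile via iteratively reweighted (asymmetric) least squares."""
    y = np.asarray(y, dtype=float)
    m = y.mean()                                   # tau = 0.5 gives the mean
    for _ in range(max_iter):
        w = np.where(y < m, 1.0 - tau, tau)        # asymmetric squared-loss weights
        m_new = np.sum(w * y) / np.sum(w)
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    return m


y = np.array([1.0, 2.0, 3.0, 4.0, 10.0])
print(expectile(y, 0.5))   # 4.0, the sample mean
print(expectile(y, 0.9))   # about 7.692
```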
Yoon, Sangho; Assimes, Themistocles L; Quertermous, Thomas; Hsiao, Chin-Fu; Chuang, Lee-Ming; Hwu, Chii-Min; Rajaratnam, Bala; Olshen, Richard A
In this paper we try to define insulin resistance (IR) precisely for a group of Chinese women. Our definition deliberately does not depend upon body mass index (BMI) or age, although in other studies, with particular random effects models quite different from models used here, BMI accounts for a large part of the variability in IR. We accomplish our goal through application of Gauss mixture vector quantization (GMVQ), a technique for clustering that was developed for application to lossy data compression. Defining data come from measurements that play major roles in medical practice. A precise statement of what the data are is in Section 1. Their family structures are described in detail. They concern levels of lipids and the results of an oral glucose tolerance test (OGTT). We apply GMVQ to residuals obtained from regressions of outcomes of an OGTT and lipids on functions of age and BMI that are inferred from the data. A bootstrap procedure developed for our family data supplemented by insights from other approaches leads us to believe that two clusters are appropriate for defining IR precisely. One cluster consists of women who are IR, and the other of women who seem not to be. Genes and other features are used to predict cluster membership. We argue that prediction with "main effects" is not satisfactory, but prediction that includes interactions may be.
Hybrid regression trees applied to the monitoring of the dynamic security of isolated networks with a large contribution from wind power production
Lopes, J.A. Pecas; Vasconcelos, Maria Helena O.P. de [Instituto de Engenharia de Sistemas e Computadores (INESC), Porto (Portugal)]
This paper briefly describes the approach adopted to build structures for the fast evaluation of the dynamic security of isolated networks with a high level of wind power production. The methodology uses hybrid regression trees, which quantify the robustness of these networks' dynamic behavior by emulating the minimum frequency deviation that the system will experience when subjected to a predefined disturbance. New procedures for automatic data generation are also presented; these data are used to build the evaluation structures and to measure their performance. The paper describes a case study of the network of Terceira Island in the Azores archipelago.
Cook, R Dennis
Covers the use of dynamic and interactive computer graphics in linear regression analysis, focusing on analytical graphics. Features new techniques such as plot rotation. The authors have written their own regression code, called R-code, in the Xlisp-Stat language; it is a nearly complete system for linear regression analysis and can be used as the main computer program in a linear regression course. The accompanying disks, for both Macintosh and Windows computers, contain the R-code and Xlisp-Stat. An Instructor's Manual presenting detailed solutions to all the problems in the book is ava
Of related interest. Nonlinear Regression Analysis and its Applications Douglas M. Bates and Donald G. Watts "...an extraordinary presentation of concepts and methods concerning the use and analysis of nonlinear regression models...highly recommend[ed]...for anyone needing to use and/or understand issues concerning the analysis of nonlinear regression models." --Technometrics This book provides a balance between theory and practice supported by extensive displays of instructive geometrical constructs. Numerous in-depth case studies illustrate the use of nonlinear regression analysis--with all data s
Kumar, Suman; Karmakar, Probir; Mohanan, Akhil
Regression in autism refers to the phenomenon of apparently normal early development followed by the loss of previously acquired skills and the manifestation of symptoms of autism. Estimates of the frequency of regression in autism range from 10% to 50%. Although tools are available to evaluate and diagnose Autism Spectrum Disorders (ASD), there is no published tool available in the Indian context to identify children with ASD at an early age. The study aimed to describe the differences in language regression between children with ASD and typically developing children and also to determine the age of regression. A regression screening tool, in the form of a questionnaire, was developed based on the Regression Supplement Form (Goldberg et al., 2003). The skills were validated by five clinical psychologists. The tool comprised 16 skills covering domains such as 'spoken language and nonverbal communication', 'social interest and responsiveness', and 'play and imagination'. This retrospective study was conducted on a single group. The participants were the parents of 30 children with ASD (22 males and 8 females). The findings revealed significant regression in children with ASD. The mean regression age was 20.19 months (SD = 5.2). The regression profile of the children with ASD revealed that regression of language skills occurred at 19.16 months, followed by regression of non-language skills at 20.5 months. Based on these findings, it can be stated that including the regression screening tool will offer clinicians a convenient way to examine the phenomenon of regression in children with ASD and to identify them as early as 21 months of age for early intervention. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Preisser, John S; Das, Kalyan; Benecha, Habtamu; Stamm, John W
Sometimes there is interest in a dichotomized outcome indicating whether a count variable is positive or zero. Under this scenario, the application of ordinary logistic regression may result in efficiency loss, which is quantifiable under an assumed model for the counts. In such situations, a shared-parameter hurdle model is investigated for more efficient estimation of regression parameters relating to overall effects of covariates on the dichotomous outcome, while handling count data with many zeroes. One model part provides a logistic regression containing marginal log odds ratio effects of primary interest, while an ancillary model part describes the mean count of a Poisson or negative binomial process in terms of nuisance regression parameters. Asymptotic efficiency of the logistic model parameter estimators of the two-part models is evaluated with respect to ordinary logistic regression. Simulations are used to assess the properties of the models with respect to power and Type I error, the latter investigated under both misspecified and correctly specified models. The methods are applied to data from a randomized clinical trial of three toothpaste formulations to prevent incident dental caries in a large population of Scottish schoolchildren. © The Author(s) 2014.
Boček, Pavel; Šiman, Miroslav
Vol. 53, No. 3 (2017), pp. 480-492 ISSN 0023-5954 R&D Projects: GA ČR GA14-07234S Institutional support: RVO:67985556 Keywords: multivariate quantile * regression quantile * halfspace depth * depth contour Subject RIV: BD - Theory of Information OECD field: Applied mathematics Impact factor: 0.379, year: 2016 http://library.utia.cas.cz/separaty/2017/SI/bocek-0476587.pdf
P.M.C. de Boer (Paul); C.M. Hafner (Christian)
We argue in this paper that general ridge (GR) regression implies no major complication compared with simple ridge regression. We introduce a generalization of an explicit GR estimator derived by Hemmerle and by Teekens and de Boer and show that this estimator, which is more
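For orientation, the general ridge estimator in its usual canonical form assigns a separate shrinkage constant k_j to each principal axis of X'X, and simple ridge is the special case where all k_j are equal. The sketch below computes this textbook form. It is not the explicit estimator of Hemmerle or of Teekens and de Boer discussed in the paper, and the function name is hypothetical.

```python
import numpy as np


def general_ridge(X, y, k):
    """General ridge estimate with one shrinkage constant per principal axis of X'X.

    k: scalar (simple ridge) or array aligned with the ascending eigenvalues
    returned by np.linalg.eigh.
    """
    lam, P = np.linalg.eigh(X.T @ X)              # X'X = P diag(lam) P'
    k = np.broadcast_to(np.asarray(k, dtype=float), lam.shape)
    alpha = (P.T @ (X.T @ y)) / (lam + k)         # shrunken canonical coefficients
    return P @ alpha


rng = np.random.default_rng(8)
X = rng.standard_normal((50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.standard_normal(50)
print(general_ridge(X, y, [5.0, 1.0, 0.0]))      # shrink the weakest axis most
```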
Regression clustering is a statistical learning and data mining method that mixes unsupervised and supervised learning; it is found in a wide range of applications, including artificial intelligence and neuroscience. It performs unsupervised learning when it clusters the data according to their respective unobserved regression hyperplanes, and supervised learning when it fits regression hyperplanes to the corresponding data clusters. Applying regression clustering in practice requires means of determining the underlying number of clusters in the data, finding the cluster label of each data point, and estimating the regression coefficients of the model. In this paper, we review the estimation and selection issues in regression clustering with regard to least squares and robust statistical methods. We also provide a model-selection-based technique to determine the number of regression clusters underlying the data. We further develop a computing procedure for regression clustering estimation and selection. Finally, simulation studies are presented for assessing the procedure, together with the analysis of a real data set on RGB cell marking in neuroscience to illustrate and interpret the method.
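The interplay of the unsupervised and supervised steps can be sketched as a k-means-like alternation: fit a least-squares hyperplane to each cluster, then reassign each point to the hyperplane with the smallest squared residual. In this minimal sketch the number of clusters is fixed, and the initialization (splitting the residuals of one global fit into quantile groups) is a heuristic assumption. The robust estimation and model-selection procedure of the paper are not implemented, and the alternation can stop at a local optimum.

```python
import numpy as np


def regression_clustering(X, y, n_clusters, n_iter=100):
    """Alternate per-cluster least-squares fits and residual-based reassignment."""
    Z = np.column_stack([np.ones(len(y)), X])            # add an intercept column
    # Heuristic start: quantile groups of the residuals from one global fit.
    resid = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    edges = np.quantile(resid, np.linspace(0, 1, n_clusters + 1)[1:-1])
    labels = np.searchsorted(edges, resid)
    B = np.zeros((Z.shape[1], n_clusters))                # one coefficient column per cluster
    for _ in range(n_iter):
        for c in range(n_clusters):
            idx = labels == c
            if idx.sum() >= Z.shape[1]:                   # refit only clusters with enough points
                B[:, c] = np.linalg.lstsq(Z[idx], y[idx], rcond=None)[0]
        new_labels = ((y[:, None] - Z @ B) ** 2).argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, B


# Two synthetic regression lines, y = 5 + x and y = -5 + x.
rng = np.random.default_rng(4)
x = rng.uniform(0, 1, 200)
group = np.repeat([0, 1], 100)
y = np.where(group == 0, 5 + x, -5 + x) + 0.1 * rng.standard_normal(200)
labels, B = regression_clustering(x[:, None], y, n_clusters=2)
print("intercepts:", B[0], "slopes:", B[1])
```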
Liu, R X; Kuang, J; Gong, Q; Hou, X L
The paper introduces the indices used to diagnose multicollinearity, the basic principle of principal component regression, and the method for determining the 'best' equation. An example is used to describe how to carry out principal component regression analysis with SPSS 10.0, including all calculation steps of principal component regression and all operations of the linear regression, factor analysis, descriptives, compute variable, and bivariate correlations procedures in SPSS 10.0. Principal component regression analysis can be used to overcome the disturbing effects of multicollinearity, and carrying it out in SPSS gives a simpler, faster, and accurate statistical analysis.
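Outside SPSS, the same steps take only a few lines. The NumPy sketch below assumes standardized predictors and a user-chosen number of components: it regresses the centered response on the leading principal component scores and maps the coefficients back to the original variables. Retaining all components reproduces ordinary least squares, and dropping the components with small variance is what reduces the influence of multicollinearity. The function name is hypothetical.

```python
import numpy as np


def principal_component_regression(X, y, n_components):
    """Regress y on the leading principal components of standardized X.

    Returns (intercept, coefficients) on the original, unstandardized scale.
    """
    mu, sd = X.mean(axis=0), X.std(axis=0, ddof=1)
    Z = (X - mu) / sd                                   # standardized predictors
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    V = Vt[:n_components].T                             # loadings of retained components
    scores = Z @ V
    gamma, *_ = np.linalg.lstsq(scores, y - y.mean(), rcond=None)
    beta = (V @ gamma) / sd                             # back to the original scale
    intercept = y.mean() - mu @ beta
    return intercept, beta


# Synthetic data with two nearly collinear predictors.
rng = np.random.default_rng(9)
x1 = rng.standard_normal(100)
X = np.column_stack([x1, x1 + 0.01 * rng.standard_normal(100), rng.standard_normal(100)])
y = 1.0 + x1 + 0.5 * X[:, 2] + 0.1 * rng.standard_normal(100)
print(principal_component_regression(X, y, n_components=2))
```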
This chapter deals with multiple linear regression, that is, the situation where the mean of a variable depends linearly on a set of covariables. The noise is assumed to be Gaussian. We develop the least squares method to obtain the parameter estimators and estimates of their precision. This leads to confidence intervals, prediction intervals, global tests, individual tests, and more generally tests of submodels defined by linear constraints. Methods for model choice and variable selection, measures of goodness of fit, residual analysis, and diagnostic methods are presented. Finally, identification of departures from the model's assumptions and ways to deal with these problems are addressed. A real data set is used to illustrate the methodology with the software R. Note that this chapter is intended to serve as a guide for other regression methods, such as logistic regression, AFT models, and Cox regression.
Quade, Markus; Gout, Julien; Abel, Markus
We present Glyph, a Python package for genetic programming based symbolic regression. Glyph is designed for use both with numerical simulations and with real-world experiments. For experimentalists, glyph-remote provides a separation of tasks: a ZeroMQ interface splits the genetic programming optimization task from the evaluation of an experimental (or numerical) run. Glyph can be accessed at http://github.com/ambrosys/glyph . Domain experts are able to employ symbolic regression in their ex...
Tang, Songze; Xiao, Liang; Liu, Pengfei; Huang, Lili; Zhou, Nan; Xu, Yang
Pansharpening is an effective way to enhance the spatial resolution of a multispectral (MS) image by fusing it with a provided panchromatic image. Instead of restricting the coding coefficients of low-resolution (LR) and high-resolution (HR) images to be equal, we propose a pansharpening approach via sparse regression in which the relationship between the sparse coefficients of HR and LR MS images is modeled by ridge regression and elastic-net regression while the corresponding dictionaries are learned simultaneously. The compact dictionaries are learned from sampled patch pairs from the high- and low-resolution images, and they characterize the structural information of the LR MS and HR MS images well. Then, taking into account the complex relationship between the coding coefficients of LR MS and HR MS images, ridge regression is used to characterize the intrapatch relationship and elastic-net regression is used to describe the interpatch relationship. Thus, the HR MS image can be reconstructed almost exactly by multiplying the HR dictionary by the sparse coefficient vector computed with the learned regression relationship. The simulated and real experimental results illustrate that the proposed method outperforms several well-known methods, both quantitatively and perceptually.
In this research work, an attempt was made to critically analyze the effect of the Federal Road Safety Corps (FRSC) on various categories of road traffic accidents in Nigeria over a certain period of time, across all the states of the federation including the Federal Capital Territory. This was done using a panel data regression model. The conventional OLS estimator applied to panel data leads to inconsistent estimates of the regression parameters because it does not adequately handle individual-specific effects. A better and preferable estimation method was used in this analysis to obtain a more reliable result that can be used to predict likely future occurrences. Among all the estimation methods considered, only the fixed-effects panel data regression method with heteroscedasticity-robust variance-covariance estimation gives consistent estimates of the regression parameters.
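The fixed-effects (within) estimator removes individual-specific effects by demeaning every variable within each unit (here, each state) before running OLS. The NumPy sketch below shows only the slope estimates. The heteroscedasticity-robust variance-covariance step mentioned above is not included, and the simulated panel (10 units, 8 periods, unit effects correlated with the regressor) is purely illustrative. In this setup pooled OLS is biased, while the within estimator recovers the slope.

```python
import numpy as np


def within_estimator(y, X, unit):
    """Fixed-effects (within) slope estimates: demean y and X by unit, then OLS."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    y_dm, X_dm = y.copy(), X.copy()
    for u in np.unique(unit):
        idx = unit == u
        y_dm[idx] -= y[idx].mean()                # remove the unit mean of y
        X_dm[idx] -= X[idx].mean(axis=0)          # remove the unit means of X
    beta, *_ = np.linalg.lstsq(X_dm, y_dm, rcond=None)
    return beta


rng = np.random.default_rng(10)
unit = np.repeat(np.arange(10), 8)
alpha = 2.0 * rng.standard_normal(10)[unit]       # unit effects
X = (alpha + rng.standard_normal(80))[:, None]    # regressor correlated with the effects
y = 3.0 * alpha + 0.5 * X[:, 0]                   # true slope 0.5, no idiosyncratic noise
pooled = np.linalg.lstsq(np.column_stack([np.ones(80), X]), y, rcond=None)[0][1]
print("pooled OLS slope:", pooled, "within slope:", within_estimator(y, X, unit))
```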
Experimental variability and data pre-processing as factors affecting the discrimination power of some chemometric approaches (PCA, CA and a new algorithm based on linear regression) applied to (+/-)ESI/MS and RPLC/UV data: Application on green tea extracts.
Iorgulescu, E; Voicu, V A; Sârbu, C; Tache, F; Albu, F; Medvedovici, A
The influence of experimental variability (instrumental repeatability, instrumental intermediate precision, and sample preparation variability) and data pre-processing (normalization, peak alignment, background subtraction) on the discrimination power of multivariate data analysis methods (Principal Component Analysis (PCA) and Cluster Analysis (CA)), as well as of a new algorithm based on linear regression, was studied. Data used in the study were obtained through positive or negative ion monitoring electrospray mass spectrometry (+/-ESI/MS) and reversed phase liquid chromatography with UV spectrometric detection (RPLC/UV) applied to green tea extracts. Extraction in ethanol and heated water infusion were used as sample preparation procedures. The multivariate methods were applied directly to mass spectra and chromatograms, involving a strictly holistic comparison of shapes, without assigning any structural identity to compounds. An alternative data interpretation based on linear regression analysis mutually applied to data series is also discussed. Slopes, intercepts, and correlation coefficients produced by linear regression analysis applied to pairs of very large experimental data series successfully retain the information resulting from high-frequency instrumental acquisition rates, clearly better defining the profiles being compared. Consequently, each type of sample, or each comparison between samples, produces in Cartesian space an ellipsoidal volume defined by the normal variation intervals of the slope, intercept, and correlation coefficient. Distances between volumes graphically illustrate (dis)similarities between the compared data. Instrumental intermediate precision had the largest effect on the discrimination power of the multivariate data analysis methods. Mass spectra produced through ionization from the liquid state under atmospheric pressure conditions of bulk complex mixtures resulting from extracted materials of natural origins provided an excellent data
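The regression-based comparison reduces each pair of data series (two spectra or two chromatograms sampled on the same grid) to three numbers: slope, intercept, and correlation coefficient. Replicate comparisons then scatter around a point in that three-dimensional space. A minimal NumPy sketch of the per-pair step follows. The function name is hypothetical, and the ellipsoid construction and distance computations are not shown.

```python
import numpy as np


def profile_comparison(a, b):
    """Slope, intercept and Pearson r from regressing profile b on profile a."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    slope, intercept = np.polyfit(a, b, 1)         # least-squares line b = slope*a + intercept
    r = np.corrcoef(a, b)[0, 1]
    return slope, intercept, r


a = np.linspace(0.0, 1.0, 50) ** 2                 # a synthetic "profile"
print(profile_comparison(a, a))                    # identical profiles: about (1, 0, 1)
print(profile_comparison(a, 2 * a + 3))            # scaled and shifted copy: about (2, 3, 1)
```

Identical profiles map to the point (1, 0, 1). Differences in intensity scale, baseline, and shape move a comparison away from that point along different axes.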
Yamashita, H.; Marinova, I.; Cingoski, V.
These proceedings contain papers relating to the 3rd Japanese-Bulgarian-Macedonian Joint Seminar on Applied Electromagnetics. Included are the following groups: Numerical Methods I; Electrical and Mechanical System Analysis and Simulations; Inverse Problems and Optimizations; Software Methodology; Numerical Methods II; Applied Electromagnetics
Regression quantiles can be substantially biased when the covariates are measured with error. In this paper we propose a new method that produces consistent linear quantile estimation in the presence of covariate measurement error. The method corrects the bias induced by measurement error by constructing joint estimating equations that hold simultaneously for all quantile levels. An iterative EM-type algorithm for solving these joint estimating equations is provided. The finite-sample performance of the proposed method is investigated in a simulation study and compared with the standard regression calibration approach. Finally, we apply our methodology to part of the National Collaborative Perinatal Project growth data, a longitudinal study with an unusual measurement error structure. © 2009 American Statistical Association.
Polynomial specifications are widely used, not only in applied economics but also in epidemiology, physics, political analysis, and psychology, to mention just a few examples. In many cases, the data employed to estimate such specifications are time series that may exhibit stochastic nonstationary behavior. We extend Phillips' results (Phillips, P. Understanding spurious regressions in econometrics. J. Econom. 1986, 33, 311–340) by proving that inference drawn from polynomial specifications under stochastic nonstationarity is misleading unless the variables cointegrate. We use a generalized polynomial specification as a vehicle to study its asymptotic and finite-sample properties. Our results therefore call for caution whenever practitioners estimate polynomial regressions.
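The underlying spurious-regression phenomenon is easy to reproduce in the linear case that the paper generalizes. Regress one random walk on another, independent one, and the conventional t test of the slope rejects the true null far more often than its nominal 5%. The Monte Carlo sketch below uses an arbitrary sample size and replication count and does not reproduce the polynomial specifications studied in the paper.

```python
import numpy as np


def slope_t_stat(y, x):
    """Conventional OLS t statistic for the slope in y = a + b x + e."""
    Z = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    s2 = resid @ resid / (len(y) - 2)              # residual variance estimate
    cov = s2 * np.linalg.inv(Z.T @ Z)
    return coef[1] / np.sqrt(cov[1, 1])


rng = np.random.default_rng(5)
n, reps = 100, 500
rejections = 0
for _ in range(reps):
    x = np.cumsum(rng.standard_normal(n))          # two independent random walks:
    y = np.cumsum(rng.standard_normal(n))          # there is no true relationship
    rejections += int(abs(slope_t_stat(y, x)) > 1.96)
print("nominal 5% test rejects in", rejections / reps, "of replications")
```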
This study formulates regression of vector data that enables statistical analysis of various geodetic phenomena such as polar motion, ocean currents, typhoon/hurricane tracking, crustal deformations, and precursory earthquake signals. The observed vector variable of an event (the dependent vector variable) is expressed as a function of a number of hypothesized phenomena, realized also as vector variables (independent vector variables) and/or scalar variables, that are likely to affect the dependent vector variable. The proposed representation has the unique property of solving for the coefficients of the independent vector variables (explanatory variables) also as vectors; hence it supersedes multivariate multiple regression models, in which the unknown coefficients are scalar quantities. For the solution, complex numbers are used to represent vector information, and the method of least squares is deployed to estimate the vector model parameters after transforming the complex vector regression model into a real vector regression model through isomorphism. Various operational statistics for testing the predictive significance of the estimated vector parameter coefficients are also derived. A simple numerical example demonstrates the use of the proposed vector regression analysis in modeling typhoon paths.
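The complex-number representation can be tried directly, because NumPy's least-squares solver accepts complex arrays. Minimizing the sum of squared moduli of complex residuals is equivalent to least squares on the real isomorphic model in which each coefficient a + bi acts as the rotation-scaling matrix [[a, -b], [b, a]]. In the sketch below each 2-D vector is stored as x + iy, and each complex coefficient rotates and scales its explanatory vector. The data and coefficient values are synthetic, and the significance statistics derived in the paper are not computed.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 100
# Two hypothetical explanatory vector variables, stored as complex numbers x + i*y.
v1 = rng.standard_normal(n) + 1j * rng.standard_normal(n)
v2 = rng.standard_normal(n) + 1j * rng.standard_normal(n)
c_true = np.array([0.5 + 0.5j, 1.0 - 0.25j])       # vector-valued coefficients
noise = 0.05 * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
w = c_true[0] * v1 + c_true[1] * v2 + noise        # dependent vector variable

A = np.column_stack([v1, v2])
c_hat, *_ = np.linalg.lstsq(A, w, rcond=None)      # complex least squares
for c in c_hat:
    print(f"scale {abs(c):.3f}, rotation {np.degrees(np.angle(c)):.1f} deg")
```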
Bache, Stefan Holst
A new and alternative quantile regression estimator is developed, and it is shown that the estimator is root-n consistent and asymptotically normal. The estimator is based on a minimax ‘deviance function’ and has asymptotically equivalent properties to the usual quantile regression estimator. It is, however, a different and therefore new estimator. It allows for both linear and nonlinear model specifications. A simple algorithm for computing the estimates is proposed. It seems to work quite well in practice, but whether it has theoretical justification is still an open question.
Clausel, M.; Grégoire, G.
An exercise is proposed to illustrate logistic regression. It investigates the different risk factors for the onset of coronary heart disease. The exercise was proposed in Chapter 5 of the book by D.G. Kleinbaum and M. Klein, "Logistic Regression", Statistics for Biology and Health, Springer Science+Business Media, LLC (2010), and also by D. Chessel and A.B. Dufour in Lyon 1 (see Sect. 6 of http://pbil.univ-lyon1.fr/R/pdf/tdr341.pdf). This example is based on data given in the file evans.txt from http://www.sph.emory.edu/dkleinb/logreg3.htm#data.
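The core computation in such an exercise can be sketched as Newton-Raphson (equivalently, iteratively reweighted least squares) for a logistic model with one covariate. This is a self-contained illustration on a toy 2×2 table. It is not the Evans data or the R code of the cited course material, and the function name is hypothetical.

```python
import math


def fit_logistic(x, y, n_iter=25):
    """Fit logit P(y=1) = b0 + b1*x by Newton-Raphson (equivalently IRLS)."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        # Score vector (g0, g1) and information matrix [[h00, h01], [h01, h11]]
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            r = yi - p
            wgt = p * (1.0 - p)
            g0 += r
            g1 += r * xi
            h00 += wgt
            h01 += wgt * xi
            h11 += wgt * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta += (X'WX)^{-1} X'(y - p), using the explicit 2x2 inverse
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


print(fit_logistic([0, 0, 0, 0, 1, 1, 1, 1], [0, 0, 0, 1, 0, 1, 1, 1]))
```

For a binary covariate the fitted slope is the log odds ratio. Here it converges to ln 9 ≈ 2.197, the log of (3/1)/(1/3), and the intercept converges to ln(1/3).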
of recursive procedures. Acta Informatica, 45(6):403–439, 2008. [GS11] Benny Godlin and Ofer Strichman. Regression verification. Technical Report...functions. Therefore, we need to redefine m-term. – Mutual termination. If either function f or function f′ (or both) is non-deterministic, then their
Edwards, T. R.
Program rapidly selects best-suited set of coefficients. User supplies only vectors of independent and dependent data and specifies confidence level required. Program uses stepwise statistical procedure for relating minimal set of variables to set of observations; final regression contains only most statistically significant coefficients. Program is written in FORTRAN IV for batch execution and has been implemented on NOVA 1200.
Jensen, Bjørn Sand; Nielsen, Jens Brehm; Larsen, Jan
We extend the Gaussian process (GP) framework for bounded regression by introducing two bounded likelihood functions that model the noise on the dependent variable explicitly. This is fundamentally different from the implicit noise assumption in the previously suggested warped GP framework. We...
Van Erp, H.R.N.; Van Gelder, P.H.A.J.M.
In this paper we present a Bayesian logistic regression analysis. It is found that if one wishes to derive the posterior distribution of the probability of some event, then, together with the traditional Bayes theorem and the integrating out of nuisance parameters, the Jacobian transformation is an
G. Yu Evzikov
Compression of the spinal nerve root, which gives rise to pain and sensory and motor disorders in the area of its innervation, is the most vivid manifestation of a herniated intervertebral disk. Different treatment modalities for these conditions, including neurosurgery, are discussed. Recent evidence indicates that disk herniation can regress spontaneously. The paper describes a female patient with a large lateralized disc extrusion that compressed the S1 nerve root, leading to obvious myotonic and radicular syndromes. Magnetic resonance imaging showed that the clinical manifestations of discogenic radiculopathy, as well as the myotonic syndrome and the morphological changes, had completely regressed 8 months later. The likely mechanism is inflammation-induced resorption of a large herniated disk fragment, which agrees with the data available in the literature. At her first consultation, a decision was made to perform neurosurgery, for which the patient had indications. After regression of the discogenic radiculopathy, only moderate pain caused by musculoskeletal diseases (facet syndrome and piriformis syndrome) remained, and these were successfully eliminated by minimally invasive techniques.
Kernberg, O F
The choice of good leaders is a major task for all organizations. Information regarding the prospective administrator's personality should complement questions regarding his previous experience, his general conceptual skills, his technical knowledge, and the specific skills in the area for which he is being selected. The growing psychoanalytic knowledge about the crucial importance of internal, in contrast to external, object relations, and about the mutual relationships of regression in individuals and in groups, constitutes an important practical tool for the selection of leaders.
Breiman, Leo; Olshen, Richard A; Stone, Charles J
The methodology used to construct tree structured rules is the focus of this monograph. Unlike many other statistical procedures, which moved from pencil and paper to calculators, this text's use of trees was unthinkable before computers. Both the practical and theoretical sides have been developed in the authors' study of tree methods. Classification and Regression Trees reflects these two sides, covering the use of trees as a data analysis method, and in a more mathematical framework, proving some of their fundamental properties.
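The basic step behind a regression tree can be sketched as an exhaustive search. It looks for the single split of one numeric predictor that minimizes the pooled within-node sum of squared errors. This covers only the node-splitting step for one predictor. A full CART implementation searches all predictors, grows the tree recursively and then prunes it (the monograph uses cost-complexity pruning). The function names are hypothetical.

```python
def _sse(values):
    """Sum of squared deviations from the node mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Threshold on one numeric predictor minimizing total within-node SSE.

    Returns (total SSE, threshold); threshold is None if no split is possible.
    """
    pairs = sorted(zip(x, y))
    best = (float("inf"), None)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between tied predictor values
        left = [yy for _, yy in pairs[:i]]
        right = [yy for _, yy in pairs[i:]]
        sse = _sse(left) + _sse(right)
        if sse < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint between neighbors
            best = (sse, threshold)
    return best


print(best_split([1, 2, 3, 4, 5, 6], [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]))
```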
Hansen, Henrik; Tarp, Finn
... There are, however, decreasing returns to aid, and the estimated effectiveness of aid is highly sensitive to the choice of estimator and the set of control variables. When investment and human capital are controlled for, no positive effect of aid is found. Yet aid continues to affect growth via ... investment. We conclude by stressing the need for more theoretical work before cross-country regressions of this kind are used for policy purposes...
Hilbe, Joseph M
This book really does cover everything you ever wanted to know about logistic regression … with updates available on the author's website. Hilbe, a former national athletics champion, philosopher, and expert in astronomy, is a master at explaining statistical concepts and methods. Readers familiar with his other expository work will know what to expect: great clarity. The book provides considerable detail about all facets of logistic regression. No step of an argument is omitted, so the book will meet the needs of the reader who likes to see everything spelt out, while a person familiar with some of the topics has the option to skip "obvious" sections. The material has been thoroughly road-tested through classroom and web-based teaching. … The focus is on helping the reader to learn and understand logistic regression. The audience is not just students meeting the topic for the first time, but also experienced users. I believe the book really does meet the author's goal … . -Annette J. Dobson, Biometric...
Ozenne, Brice; Sørensen, Anne Lyngholm; Scheike, Thomas
In the presence of competing risks, a prediction of the time-dynamic absolute risk of an event can be based on cause-specific Cox regression models for the event and the competing risks (Benichou and Gail, 1990). We present computationally fast and memory-optimized C++ functions with an R interface for predicting the covariate-specific absolute risks, their confidence intervals, and their confidence bands based on right-censored time-to-event data. We provide explicit formulas for our implementation of the estimator of the (stratified) baseline hazard function in the presence of tied event times. As a by... functionals. The software presented here is implemented in the riskRegression package.
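A simplified sketch of the absolute-risk computation, assuming no covariates. Each cause-specific hazard is then estimated by its Nelson-Aalen increments, and the absolute risk of cause 1 is the Aalen-Johansen cumulative incidence F1(t) = Σ_{s ≤ t} S(s-)·d1(s)/Y(s). Here S is the all-cause Kaplan-Meier survival, d1(s) is the number of cause-1 events at s and Y(s) is the number at risk. The riskRegression package instead plugs cause-specific Cox models into this construction. The function below is a hypothetical illustration, not part of that package's API.

```python
def cumulative_incidence(times, causes, cause_of_interest=1):
    """Aalen-Johansen cumulative incidence (absolute risk) for one cause.

    times  -- observed follow-up times
    causes -- 0 for censored, otherwise the code (1, 2, ...) of the observed event
    Returns a list of (t, F(t)) pairs at each distinct event time.
    """
    data = sorted(zip(times, causes))
    at_risk = len(data)
    surv = 1.0  # all-cause Kaplan-Meier survival just before the current time
    cif = 0.0
    result = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_at_t = d_all = d_cause = 0
        # Collect every record tied at time t
        while i < len(data) and data[i][0] == t:
            cause = data[i][1]
            n_at_t += 1
            if cause != 0:
                d_all += 1
                if cause == cause_of_interest:
                    d_cause += 1
            i += 1
        if d_all > 0:
            cif += surv * d_cause / at_risk  # S(t-) times cause-specific hazard increment
            surv *= 1.0 - d_all / at_risk    # update all-cause survival
            result.append((t, cif))
        at_risk -= n_at_t  # events and censorings leave the risk set
    return result


print(cumulative_incidence([1, 2, 3, 4, 5], [1, 2, 1, 1, 2]))
```

Without censoring, the final value equals the observed proportion of cause-1 events (3 of 5 here). The cumulative incidences of all causes then sum to one.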
Rakow, Ernest A.
Ridge regression is a technique used to ameliorate the problem of highly correlated independent variables in multiple regression analysis. This paper explains the fundamentals of ridge regression and illustrates its use. (JKS)
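A minimal sketch of the ridge estimator β(λ) = (X'X + λI)^{-1} X'y for two predictors. The variables are centered first so the intercept is not penalized. With λ = 0 it reduces to ordinary least squares. Increasing λ shrinks the coefficient vector, which stabilizes the estimates when the predictors are highly correlated. The function name and the explicit 2×2 solve are illustrative choices.

```python
def ridge_two_predictors(x1, x2, y, lam):
    """Ridge estimates for y ~ x1 + x2 on centered data: (X'X + lam*I)^{-1} X'y."""
    n = len(y)

    def center(v):
        m = sum(v) / n
        return [vi - m for vi in v]

    x1c, x2c, yc = center(x1), center(x2), center(y)
    # Entries of X'X + lam*I and of X'y
    a = sum(u * u for u in x1c) + lam
    b = sum(u * v for u, v in zip(x1c, x2c))
    d = sum(v * v for v in x2c) + lam
    g1 = sum(u * w for u, w in zip(x1c, yc))
    g2 = sum(v * w for v, w in zip(x2c, yc))
    det = a * d - b * b
    return ((d * g1 - b * g2) / det, (a * g2 - b * g1) / det)


# Two strongly correlated predictors; the response is exactly x1 + 2*x2.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.1, 1.9, 3.2, 3.9, 5.05]
y = [u + 2.0 * v for u, v in zip(x1, x2)]
print(ridge_two_predictors(x1, x2, y, 0.0), ridge_two_predictors(x1, x2, y, 5.0))
```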
The paper develops a linear regression modeling approach that can be applied to crash data to predict vehicle crashes. The proposed approach involves novel data aggregation to satisfy linear regression assumptions, namely normality of the error structure ...
L. Yu. Glukhova
The author presents a review of the current scientific literature on autistic epileptiform regression. This is a special form of autistic disorder, characterized by the development of severe communicative disorders in children as a result of continuous, prolonged epileptiform activity on the EEG. The condition was described by R.F. Tuchman and I. Rapin in 1997. The author describes the pathogenesis, clinical picture and diagnosis of this disorder, including the characteristic EEG anomalies (benign epileptiform patterns of childhood, with a high index of epileptiform activity, especially during sleep). Special attention is given to approaches to the treatment of autistic epileptiform regression. The efficacy of valproates, corticosteroid hormones and antiepileptic drugs of other groups is considered.
Bordacconi, Mats Joe; Larsen, Martin Vinæs
Humans are fundamentally primed for making causal attributions based on correlations. This implies that researchers must be careful to present their results in a manner that inhibits unwarranted causal attribution. In this paper, we present the results of an experiment that suggests regression ... more likely. Our experiment drew on a sample of 235 university students from three different social science degree programs (political science, sociology and economics), all of whom had received substantial training in statistics. The subjects were asked to compare and evaluate the validity...
FENG, Ge; PENG, Jing; TU, Dongke; ZHENG, Julia Z.; FENG, Changyong
Regression is one of the favorite tools in applied statistics. However, misuse and misinterpretation of results from regression analysis are common in biomedical research. In this paper we use statistical theory and simulation studies to clarify some paradoxes around this popular statistical method. In particular, we show that a widely used model selection procedure employed in many publications in top medical journals is wrong. Formal procedures based on solid statistical theory shou...
van Wieringen, Wessel N.
The linear regression model cannot be fitted to high-dimensional data, as high dimensionality brings about empirical non-identifiability. Penalized regression overcomes this non-identifiability by augmenting the loss function with a penalty (i.e., a function of the regression coefficients). The ridge penalty is the sum of squared regression coefficients, giving rise to ridge regression. Here many aspects of ridge regression are reviewed, e.g., moments, mean squared error, its equivalence to co...
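Assume the standard linear model Y = Xβ + ε with E(ε) = 0 and Var(ε) = σ²I_n, in notation chosen here for illustration. The first two moments of the ridge estimator, and its mean squared error, then follow directly from its closed form; these are standard results. Because X^⊤X + λI_p is invertible for every λ > 0, even when p > n, the estimator exists where ordinary least squares does not.

$$\hat\beta(\lambda) = (X^\top X + \lambda I_p)^{-1} X^\top Y$$

$$\mathbb{E}[\hat\beta(\lambda)] = (X^\top X + \lambda I_p)^{-1} X^\top X\,\beta, \qquad \mathbb{E}[\hat\beta(\lambda)] - \beta = -\lambda\,(X^\top X + \lambda I_p)^{-1}\beta$$

$$\operatorname{Var}[\hat\beta(\lambda)] = \sigma^2\,(X^\top X + \lambda I_p)^{-1} X^\top X\,(X^\top X + \lambda I_p)^{-1}$$

$$\operatorname{MSE}[\hat\beta(\lambda)] = \operatorname{tr}\operatorname{Var}[\hat\beta(\lambda)] + \lambda^2\,\beta^\top (X^\top X + \lambda I_p)^{-2}\,\beta$$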
Kuhl, Mark R.
Current navigation requirements depend on a geometric dilution of precision (GDOP) criterion. As long as the GDOP stays below a specific value, navigation requirements are met. The GDOP will exceed the specified value when the measurement geometry becomes too collinear. A new signal processing technique, called Ridge Regression Processing, can reduce the effects of nearly collinear measurement geometry, thereby reducing the inflation of the measurement errors. It is shown that the Ridge signal processor gives a consistently better mean squared error (MSE) in position than the Ordinary Least Squares (OLS) estimator. The applicability of this technique is currently being investigated to improve the following areas: receiver autonomous integrity monitoring (RAIM), coverage requirements, availability requirements, and precision approaches.
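A minimal sketch of why nearly collinear geometry inflates errors. For a 2-D position fix from unit line-of-sight vectors, the dilution of precision is sqrt(trace((H'H)^{-1})), where each row of H is a line-of-sight direction. As two bearings approach each other, H'H approaches singularity and the DOP grows. Ridge processing counters this by adding a multiple of the identity to H'H. The clock-bias column of a real GPS geometry matrix is omitted here, so this is an illustrative simplification rather than the full GDOP, and the function name is hypothetical.

```python
import math


def dop_2d(angles_deg):
    """Dilution of precision sqrt(trace((H^T H)^{-1})) for a 2-D position fix.

    Each row of H is the unit line-of-sight vector (cos a, sin a) for a bearing a
    in degrees; the clock-bias term is ignored for simplicity.
    """
    a = b = d = 0.0  # entries of the symmetric 2x2 matrix H^T H
    for ang in angles_deg:
        c, s = math.cos(math.radians(ang)), math.sin(math.radians(ang))
        a += c * c
        b += c * s
        d += s * s
    det = a * d - b * b
    return math.sqrt((a + d) / det)  # trace of the 2x2 inverse is (a + d) / det


print(dop_2d([0.0, 90.0]), dop_2d([0.0, 5.0]))
```

Perpendicular bearings give a DOP of sqrt(2). Bearings only 5 degrees apart give a DOP more than ten times larger.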
This book covers econometric methods for applied econometric analysis, with a particular focus on applications in macroeconomics. Topics include macroeconomic data, panel data models, unobserved heterogeneity, model comparison, endogeneity, dynamic econometric models, vector autoregressions, forecast evaluation, and structural identification. The book provides undergraduate students with the knowledge needed to undertake econometric analysis in modern macroeconomic research.
Regression models are introduced into receiver operating characteristic (ROC) analysis to accommodate the effects of covariates, such as genes. If many covariates are available, the issue of variable selection arises. The traditional induced methodology models the outcomes of the diseased and non-diseased groups separately. Applying variable selection separately to the two models therefore hinders interpretation, because the selected models differ. Furthermore, in ROC regression, the focus should be the accuracy of the area under the curve (AUC) rather than the consistency of model selection or good prediction performance. In this paper, we obtain a single objective function with the group SCAD to select grouped variables, which adapts to popular model selection criteria. We also propose a two-stage framework to apply the focused information criterion (FIC). Some asymptotic properties of the proposed methods are derived. Simulation studies show that grouped variable selection is superior to separate model selections. Furthermore, the FIC improves the accuracy of the estimated AUC compared with other criteria.
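ROC regression models how the AUC varies with covariates. Without covariates, the AUC can be estimated empirically as the probability that a randomly chosen diseased subject scores higher than a randomly chosen non-diseased one, with ties counted as one half (the Mann-Whitney form). The sketch shows only this baseline estimate. It is not the group-SCAD or FIC procedure of the paper, and the function name is hypothetical.

```python
def empirical_auc(scores_diseased, scores_healthy):
    """Empirical AUC: P(diseased score > non-diseased score), ties counted as 1/2."""
    wins = 0.0
    for s1 in scores_diseased:
        for s0 in scores_healthy:
            if s1 > s0:
                wins += 1.0
            elif s1 == s0:
                wins += 0.5
    return wins / (len(scores_diseased) * len(scores_healthy))


print(empirical_auc([0.9, 0.8, 0.4], [0.3, 0.5]))
```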
Morris, John D.
Although methods for using ordinary least squares regression computer programs to calculate a ridge regression are available, the calculation of a stepwise ridge regression requires a special purpose algorithm and computer program. The correct stepwise ridge regression procedure is given, and a parallel FORTRAN computer program is described.…
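Ordinary least squares programs can produce a (non-stepwise) ridge fit through data augmentation, a well-known approach that may or may not be the one the author cites. Append sqrt(λ)·I_p as p extra rows of the centered design matrix, with response 0, and run OLS. The augmented cross-product matrix is then X'X + λI while X'y is unchanged, so OLS returns the ridge solution. This is a sketch for two centered predictors, and the function names are hypothetical.

```python
import math


def ols_no_intercept_2(x1, x2, y):
    """OLS for y ~ x1 + x2 without an intercept, via the 2x2 normal equations."""
    a = sum(u * u for u in x1)
    b = sum(u * v for u, v in zip(x1, x2))
    d = sum(v * v for v in x2)
    g1 = sum(u * w for u, w in zip(x1, y))
    g2 = sum(v * w for v, w in zip(x2, y))
    det = a * d - b * b
    return ((d * g1 - b * g2) / det, (a * g2 - b * g1) / det)


def ridge_by_augmentation(x1, x2, y, lam):
    """Ridge fit from a plain OLS routine: append sqrt(lam)*I as two
    pseudo-observations with response 0 (data assumed already centered)."""
    r = math.sqrt(lam)
    return ols_no_intercept_2(x1 + [r, 0.0], x2 + [0.0, r], y + [0.0, 0.0])


x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x2 = [-1.5, -1.2, 0.3, 0.9, 1.5]
y = [-4.0, -2.5, 0.2, 2.9, 3.4]
print(ridge_by_augmentation(x1, x2, y, 0.7))
```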
Wu, Yuanshan; Yin, Guosheng
The main challenge in the context of cure rate analysis is that one never knows whether censored subjects are cured or uncured, or whether they are susceptible or insusceptible to the event of interest. Considering the susceptible indicator as missing data, we propose a multiple imputation approach to cure rate quantile regression for censored data with a survival fraction. We develop an iterative algorithm to estimate the conditionally uncured probability for each subject. By utilizing this estimated probability and Bernoulli sample imputation, we can classify each subject as cured or uncured, and then employ the locally weighted method to estimate the quantile regression coefficients with only the uncured subjects. Repeating the imputation procedure multiple times and taking an average over the resultant estimators, we obtain consistent estimators for the quantile regression coefficients. Our approach relaxes the usual global linearity assumption, so that we can apply quantile regression to any particular quantile of interest. We establish asymptotic properties for the proposed estimators, including both consistency and asymptotic normality. We conduct simulation studies to assess the finite-sample performance of the proposed multiple imputation method and apply it to a lung cancer study as an illustration. © 2016, The International Biometric Society.
González, Andrés; Terasvirta, Timo; Dijk, Dick van
models to the panel context. The strategy consists of model specification based on homogeneity tests, parameter estimation, and model evaluation, including tests of parameter constancy and no remaining heterogeneity. The model is applied to describing firms' investment decisions in the presence...
Kaengthong, Nattacha; Domthong, Uthumporn
This study focuses on indicators of the predictive power of the Generalized Linear Model (GLM), which are widely used but often have some restrictions. We are interested in the regression correlation coefficient for a Poisson regression model. It is a measure of predictive power, defined by the relationship between the dependent variable (Y) and the expected value of the dependent variable given the independent variables [E(Y|X)] in the Poisson regression model, where the dependent variable is Poisson distributed. The purpose of this research was to modify the regression correlation coefficient for the Poisson regression model. We also compare the proposed modified regression correlation coefficient with the traditional one in the case of two or more independent variables with multicollinearity among them. The results show that the proposed regression correlation coefficient outperforms the traditional regression correlation coefficient in terms of bias and root mean square error (RMSE).
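The traditional regression correlation coefficient can be sketched as the Pearson correlation between the observed counts Y and the fitted means E(Y|X) from a Poisson log-linear model. The coefficients b0 = 0.1 and b1 = 0.9 and the counts below are illustrative values, not estimates from the study. The paper's modified coefficient is not reproduced here.

```python
import math


def regression_correlation(y, mu_hat):
    """Pearson correlation between observed counts y and fitted means mu_hat = E(Y|X)."""
    n = len(y)
    my = sum(y) / n
    mm = sum(mu_hat) / n
    cov = sum((a - my) * (b - mm) for a, b in zip(y, mu_hat))
    vy = sum((a - my) ** 2 for a in y)
    vm = sum((b - mm) ** 2 for b in mu_hat)
    return cov / math.sqrt(vy * vm)


# Fitted means from an assumed Poisson log-linear model log E(Y|x) = 0.1 + 0.9*x
x = [0.0, 0.5, 1.0, 1.5, 2.0]
y = [1, 2, 2, 5, 7]
mu_hat = [math.exp(0.1 + 0.9 * xi) for xi in x]
print(regression_correlation(y, mu_hat))
```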
Luo, Chongliang; Liu, Jin; Dey, Dipak K; Chen, Kun
In many fields, multi-view datasets, which measure multiple distinct but interrelated sets of characteristics on the same set of subjects, are routinely collected together with data on certain outcomes or phenotypes. The objective in such a problem is often two-fold: to explore the association structures of multiple sets of measurements and to develop a parsimonious model for predicting future outcomes. We study a unified canonical variate regression framework to tackle the two problems simultaneously. The proposed criterion integrates multiple canonical correlation analysis with predictive modeling, balancing the association strength of the canonical variates against their joint predictive power on the outcomes. Moreover, the proposed criterion seeks multiple sets of canonical variates simultaneously to enable examination of their joint effects on the outcomes, and it can handle multivariate and non-Gaussian outcomes. An efficient algorithm based on variable splitting and Lagrangian multipliers is proposed. Simulation studies show the superior performance of the proposed approach. We demonstrate the effectiveness of the proposed approach in an F2 intercross mice study and an alcohol dependence study. © The Author 2016. Published by Oxford University Press. All rights reserved.
Using linear algebra, this thesis developed linear regression analysis, including analysis of variance, covariance analysis, special experimental designs, linear and fertility adjustments, and analysis of experiments at different places and times. The determination of the orthogonal projection, yielding
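The orthogonal projection at the heart of this development can be written compactly, assuming a design matrix X of full column rank (notation chosen here for illustration). The fitted values are the projection of y onto the column space of X, and the residuals are orthogonal to every column of X. This orthogonality gives the sum-of-squares decomposition on which analysis of variance rests.

$$H = X(X^\top X)^{-1}X^\top, \qquad \hat{y} = Hy, \qquad e = (I - H)\,y$$

$$H^\top = H, \qquad H^2 = H, \qquad X^\top e = 0, \qquad y^\top y = \hat{y}^\top \hat{y} + e^\top e$$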
Regression modeling is one of the most important statistical techniques used in analytical epidemiology. By means of regression models, the effect of one or several explanatory variables (e.g., exposures, subject characteristics, risk factors) on a response variable such as mortality or cancer can be investigated. From multiple regression models, adjusted effect estimates can be obtained that take the effect of potential confounders into account. Regression methods can be applied in all epidemiologic study designs, so they represent a universal tool for data analysis in epidemiology. Different kinds of regression models have been developed depending on the measurement scale of the response variable and the study design. The most important methods are linear regression for continuous outcomes, logistic regression for binary outcomes, Cox regression for time-to-event data, and Poisson regression for frequencies and rates. This chapter provides a nontechnical introduction to these regression models with illustrative examples from cancer research.
Smeekes, Stephan; Wijler, Etiënne
We study the suitability of lasso-type penalized regression techniques when applied to macroeconomic forecasting with high-dimensional datasets. We consider the performance of the lasso-type methods when the true DGP is a factor model, contradicting the sparsity assumption underlying penalized
preferable when possible to work with a simple functional form in transformed variables rather than with a more complicated form in the original variables. In this paper, it is shown that linear transformations applied to independent variables in polynomial regression models affect the t ratio and hence the statistical ...
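A small sketch of the effect described fits y = b0 + b1·x + b2·x² by OLS before and after centering x. The column space is unchanged, so the fitted values, the residuals and the t ratio of the highest-order coefficient are identical. The t ratio of the linear term changes, because after the shift x = x' + c the linear coefficient becomes b1 + 2c·b2. The data are made up for illustration, and the small Gauss-Jordan solver stands in for a linear-algebra library.

```python
import math


def invert(m):
    """Gauss-Jordan inverse of a small square matrix (list of lists), partial pivoting."""
    n = len(m)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def quadratic_t_ratios(x, y):
    """OLS fit of y = b0 + b1*x + b2*x^2; returns the t ratios b_j / se(b_j)."""
    X = [[1.0, xi, xi * xi] for xi in x]
    n, k = len(X), 3
    xtx = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)] for i in range(k)]
    xty = [sum(X[r][i] * y[r] for r in range(n)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [y[r] - sum(X[r][j] * beta[j] for j in range(k)) for r in range(n)]
    s2 = sum(e * e for e in resid) / (n - k)  # residual variance estimate
    return [beta[j] / math.sqrt(s2 * inv[j][j]) for j in range(k)]


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 2.9, 4.2, 6.1, 8.8, 11.9, 16.2, 20.8]
raw = quadratic_t_ratios(x, y)
centered = quadratic_t_ratios([xi - 4.5 for xi in x], y)  # x shifted by its mean
print(raw, centered)
```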
Orszag, A.; Antonetti, A.
The 1988 progress report of the Applied Optics Laboratory of the Polytechnic School (France) is presented. The optical fiber activities focus on the development of an optical gyrometer containing a resonance cavity. The research program covers the following domains: infrared laser physics, laser sources, semiconductor physics, multiple-photon ionization and nonlinear optics. Investigations in the biomedical, biological and biophysical domains are also carried out. The published papers and conference communications are listed (in French).
Gao Zhengming; Zhao Juan; He Shengping
To analyze the decay heating power per kilogram of a certain radioactive isotope with the polynomial regression method, the paper first demonstrates the broad usage of polynomial functions and derives their parameters by ordinary least squares estimation. A significance test for the polynomial regression function is then derived, based on the similarity between the polynomial regression model and the multivariable linear regression model. Finally, polynomial regression analysis and a significance test of the polynomial function are applied to the decay heating power of the isotope per kilogram, in accordance with the authors' real work. (authors)
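Treating the powers x, x², ..., x^k as k separate regressors turns the polynomial model into a multiple linear regression, so the usual overall F test applies. Assuming n observations and independent normal errors, the standard form (notation chosen here for illustration) is given below. Under H₀, F follows an F distribution with k and n - k - 1 degrees of freedom, and H₀ is rejected at level α when F exceeds the upper α quantile.

$$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \cdots + \beta_k x_i^k + \varepsilon_i, \qquad i = 1, \ldots, n$$

$$H_0:\ \beta_1 = \beta_2 = \cdots = \beta_k = 0$$

$$F = \frac{\mathrm{SSR}/k}{\mathrm{SSE}/(n-k-1)}, \qquad \mathrm{SSR} = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2, \qquad \mathrm{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$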
Hobza, T.; Pardo, L.; Vajda, Igor
Vol. 138, No. 12 (2008), pp. 3822-3840, ISSN 0378-3758. R&D Projects: GA MŠk 1M0572. Grant - others: Instituto Nacional de Estadistica (ES) MPO FI - IM3/136; GA MŠk (CZ) MTM 2006-06872. Institutional research plan: CEZ:AV0Z10750506. Keywords: Logistic regression; Median; Robustness; Consistency and asymptotic normality; Morgenthaler; Bianco and Yohai; Croux and Haesbroeck. Subject RIV: BB - Applied Statistics, Operational Research. Impact factor: 0.679, year: 2008. http://library.utia.cas.cz/separaty/2008/SI/vajda-robust%20median%20estimator%20in%20logistic%20regression.pdf
Whittaker, J C; Thompson, R; Denham, M C
In crosses between inbred lines, linear regression can be used to estimate the correlation of markers with a trait of interest; these marker effects then allow marker-assisted selection (MAS) for quantitative traits. Usually a subset of markers to include in the model must be selected, and no completely satisfactory method of doing this exists. We show that replacing this selection of markers by ridge regression can improve the mean response to selection and reduce the variability of selection response.
Glaws, Andrew; Constantine, Paul G.; Cook, R. Dennis
We investigate the application of sufficient dimension reduction (SDR) to a noiseless data set derived from a deterministic function of several variables. In this context, SDR provides a framework for ridge recovery. In this second part, we explore the numerical subtleties associated with using two inverse regression methods---sliced inverse regression (SIR) and sliced average variance estimation (SAVE)---for ridge recovery. This includes a detailed numerical analysis of the eigenvalues of th...
Møller, Jan Kloppenborg; Nielsen, Henrik Aalborg; Madsen, Henrik
An algorithm for time-adaptive quantile regression is presented. The algorithm is based on the simplex algorithm, and the linear optimization formulation of the quantile regression problem is given. The observations have been split to allow a direct use of the simplex algorithm. The simplex method ... and an updating procedure are combined into a new algorithm for time-adaptive quantile regression, which generates new solutions on the basis of the old solution, leading to savings in computation time. The suggested algorithm is tested against a static quantile regression model on a data set with wind power ... production, where the models combine splines and quantile regression. The comparison indicates superior performance for the time-adaptive quantile regression in all the performance parameters considered...
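The linear optimization formulation rests on the check (pinball) loss ρ_τ(u) = u(τ - 1{u < 0}). In the intercept-only case, minimizing the summed check loss returns the sample τ-quantile. Because the objective is piecewise linear with kinks at the observations, a minimizer lies at a data point, which is the property the simplex formulation exploits. The brute-force search below illustrates the objective only; it is not the time-adaptive simplex algorithm of the paper, and the function names are hypothetical.

```python
def check_loss(u, tau):
    """Quantile-regression check (pinball) loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def sample_quantile_by_check_loss(y, tau):
    """Intercept-only quantile regression: minimize sum rho_tau(y_i - q) over q.

    The objective is piecewise linear, so searching over the data points suffices.
    """
    return min(sorted(y), key=lambda q: sum(check_loss(yi - q, tau) for yi in y))


print(sample_quantile_by_check_loss(list(range(1, 10)), 0.25))
```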
Cizek, Pavel; Sadikoglu, Serhan
In this paper, an extension of the indirect inference methodology to semiparametric estimation is explored in the context of censored regression. Motivated by weak small-sample performance of the censored regression quantile estimator proposed by Powell (J Econom 32:143–155, 1986a), two- and
Fawzi, Alhussein; Fiot, Jean-Baptiste; Chen, Bei; Sinn, Mathieu; Frossard, Pascal
Additive models are regression methods that model the response variable as the sum of univariate transfer functions of the input variables. Key benefits of additive models are their accuracy and interpretability on many real-world tasks. However, additive models are not well suited to problems involving a large number (e.g., hundreds) of input variables, as they are prone to overfitting in addition to losing interpretability. In this paper, we introduce a novel framework for applying additive ...
The main focus of logistic regression analysis is the classification of individuals into different groups. The aim of the present study is to explain the basic concepts and processes of binary logistic regression analysis. This analysis is intended to determine the combination of independent variables that best explains membership in certain groups, called dichotomous…
In nonparametric regression, it is often necessary to detect whether there are jump discontinuities in the mean function. In this paper, we revisit the difference-based method in [13] (H.-G. Müller and U. Stadtmüller, Discontinuous versus smooth regression, Ann. Stat. 27 (1999), pp. 299–337, doi: 10.1214/aos/1018031100)
Karlsson, S.M.; Pont, S.C.; Koenderink, J.J.; Zisserman, A.
We investigate the estimation of illuminance flow using Histograms of Oriented Gradient features (HOGs). In a regression setting, we found, for both ridge regression and support vector machines, that the optimal solution shows close resemblance to the gradient-based structure tensor (also known as
.... It also incorporates BayesX code, which is particularly useful in nonlinear regression. To demonstrate MCMC sampling from first principles, the author includes worked examples using the R package...
Jones, Jeff A; Waller, Niels G
In this article we develop methods for assessing parameter sensitivity in logistic regression models. To set the stage for this work, we first review Waller's (2008) equations for computing fungible weights in linear regression. Next, we describe 2 methods for computing fungible weights in logistic regression. To demonstrate the utility of these methods, we compute fungible logistic regression weights using data from the Centers for Disease Control and Prevention's (2010) Youth Risk Behavior Surveillance Survey, and we illustrate how these alternate weights can be used to evaluate parameter sensitivity. To make our work accessible to the research community, we provide R code (R Core Team, 2015) that will generate both kinds of fungible logistic regression weights. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Vaeth, Michael; Skovlund, Eva
For a given regression problem it is possible to identify a suitably defined equivalent two-sample problem such that the power or sample size obtained for the two-sample problem also applies to the regression problem. For a standard linear regression model the equivalent two-sample problem is easily identified, but for generalized linear models and for Cox regression models the situation is more complicated. An approximately equivalent two-sample problem may, however, also be identified here. In particular, we show that for logistic regression and Cox regression models the equivalent two-sample problem is obtained by selecting two equally sized samples for which the parameters differ by a value equal to the slope times twice the standard deviation of the independent variable and further requiring that the overall expected number of events is unchanged. In a simulation study we examine the validity of this approach to power calculations in logistic regression and Cox regression models. Several different covariate distributions are considered for selected values of the overall response probability and a range of alternatives. For the Cox regression model we consider both constant and non-constant hazard rates. The results show that in general the approach is remarkably accurate even in relatively small samples. Some discrepancies are, however, found in small samples with few events and a highly skewed covariate distribution. Comparison with results based on alternative methods for logistic regression models with a single continuous covariate indicates that the proposed method is at least as good as its competitors. The method is easy to implement and therefore provides a simple way to extend the range of problems that can be covered by the usual formulas for power and sample size determination. Copyright 2004 John Wiley & Sons, Ltd.
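A rough sketch of the equivalent two-sample idea for logistic regression with one covariate. Two groups of N/2 have log-odds differing by 2·β·SD(x), and their average event probability is held at the overall probability p̄, so the expected number of events is unchanged. Power then comes from the usual normal-approximation test for two proportions. The power formula, the bisection step and all names are assumptions made for illustration, effect sizes are assumed moderate (very large log-odds would overflow `math.exp`), and the paper's treatment of Cox models is not covered.

```python
import math
from statistics import NormalDist


def expit(v):
    return 1.0 / (1.0 + math.exp(-v))


def equivalent_two_sample_power(beta, sd_x, p_bar, n_total, alpha=0.05):
    """Approximate power for the slope of one covariate in logistic regression,
    via two groups of n_total/2 whose log-odds differ by 2*beta*sd_x and whose
    mean event probability equals p_bar."""
    delta = 2.0 * beta * sd_x
    # Bisection for the centre c of the two log-odds so that the mean risk is p_bar
    lo, hi = -50.0, 50.0
    for _ in range(200):
        c = (lo + hi) / 2.0
        mean_p = (expit(c - delta / 2.0) + expit(c + delta / 2.0)) / 2.0
        if mean_p < p_bar:
            lo = c
        else:
            hi = c
    p1, p2 = expit(c - delta / 2.0), expit(c + delta / 2.0)
    m = n_total / 2.0  # subjects per group
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    se0 = math.sqrt(2.0 * p_bar * (1.0 - p_bar) / m)          # under H0
    se1 = math.sqrt((p1 * (1.0 - p1) + p2 * (1.0 - p2)) / m)  # under H1
    return NormalDist().cdf((abs(p2 - p1) - z * se0) / se1)


print(equivalent_two_sample_power(0.5, 1.0, 0.3, 200))
```

Only the rejection tail in the direction of the effect is counted, as is usual in such approximations. For that reason, β = 0 gives about 0.025 rather than α.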
Brabers, A.; Esch, T.E.M. van; Groenewegen, P.P.; Hek, K.; Mullenders, P.; Dijk, L. van; Jong, J.D. de
Background: One perceived barrier to adherence to guidelines is the existence of patient preferences which may conflict with them. We examined whether patient preferences influence the prescription of antibiotics in general practice, and how this affects adherence to guidelines. We hypothesised that
Jafri, Y.Z.; Kamal, L.
Various statistical techniques were applied to five years (1998-2002) of data on average humidity, rainfall, and maximum and minimum temperatures. Regression analysis time series (RATS) relationships were developed to determine the overall trend of these climate parameters, on the basis of which forecast models can be corrected and modified. We computed the coefficient of determination as a measure of goodness of fit for our polynomial regression analysis time series (PRATS). Multiple linear regression (MLR) and multiple linear regression analysis time series (MLRATS) correlations were also developed to decipher the interdependence of weather parameters. Spearman's rank correlation and the Goldfeld-Quandt test were used to check the uniformity or non-uniformity of variances in our polynomial regression (PR) fit. The Breusch-Pagan test was applied to MLR and MLRATS, respectively, and indicated homoscedasticity. We also applied Bartlett's test for homogeneity of variances to five years of rainfall and humidity data, respectively. It showed that the variances in the rainfall data were not homogeneous, while those of the humidity data were homogeneous. Our results on regression and regression analysis time series show the best fit to prediction modeling on climatic data of Quetta, Pakistan. (author)
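Bartlett's statistic for homogeneity of variances across k groups can be sketched as follows. Under the null hypothesis and with normal data, it is approximately chi-square with k - 1 degrees of freedom. The example groups are made up, not the Quetta rainfall or humidity data, and the function name is hypothetical. SciPy users can cross-check against `scipy.stats.bartlett`.

```python
import math


def bartlett_statistic(*groups):
    """Bartlett's test statistic for equal variances across k groups
    (approximately chi-square with k - 1 df under H0 for normal data)."""
    k = len(groups)
    ns = [len(g) for g in groups]
    variances = []
    for g in groups:
        m = sum(g) / len(g)
        variances.append(sum((v - m) ** 2 for v in g) / (len(g) - 1))  # sample variance
    n_total = sum(ns)
    pooled = sum((n - 1) * s2 for n, s2 in zip(ns, variances)) / (n_total - k)
    numerator = (n_total - k) * math.log(pooled) - sum(
        (n - 1) * math.log(s2) for n, s2 in zip(ns, variances))
    correction = 1.0 + (sum(1.0 / (n - 1) for n in ns) - 1.0 / (n_total - k)) / (3.0 * (k - 1))
    return numerator / correction


print(bartlett_statistic([1, 2, 3], [0, 10, 20]))
```

For the two example groups (variances 1 and 100) the statistic is about 5.18. This exceeds the 5% chi-square critical value of 3.84 for one degree of freedom.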
Polat, Esra; Gunay, Suleyman
One of the problems encountered in Multiple Linear Regression (MLR) is multicollinearity, which causes overestimation of the regression parameters and an increase in their variance. Hence, when multicollinearity is present, biased estimation procedures such as classical Principal Component Regression (CPCR) and Partial Least Squares Regression (PLSR) are performed. The SIMPLS algorithm is the leading PLSR algorithm because of its speed and efficiency, and because its results are easier to interpret. However, both CPCR and SIMPLS yield very unreliable results when the data set contains outlying observations. Therefore, Hubert and Vanden Branden (2003) presented a robust PCR (RPCR) method and a robust PLSR (RPLSR) method called RSIMPLS. In RPCR, a robust Principal Component Analysis (PCA) method for high-dimensional data is first applied to the independent variables; the dependent variables are then regressed on the scores using a robust regression method. RSIMPLS is constructed from a robust covariance matrix for high-dimensional data and robust linear regression. The purpose of this study is to show the use of the RPCR and RSIMPLS methods on an econometric data set by comparing the two methods on an inflation model of Turkey. The methods are compared in terms of predictive ability and goodness of fit using a robust Root Mean Squared Error of Cross-validation (R-RMSECV), a robust R2 value and the Robust Component Selection (RCS) statistic.
Hoek, G.; Eeftens, M.; Beelen, R.; Fischer, P.; Brunekreef, B.; Boersma, K.F.; Veefkind, P.
Land use regression (LUR) modelling has increasingly been applied to model the fine-scale spatial variation of outdoor air pollutants, including nitrogen dioxide (NO2). Satellite observations of tropospheric NO2 have improved LUR models in very large study areas, including Canada, the United States and Australia.
Schwender, Holger; Ruczinski, Ingo
Logic regression is an adaptive classification and regression procedure, initially developed to reveal interacting single nucleotide polymorphisms (SNPs) in genetic association studies. In general, this approach can be used in any setting with binary predictors when the interaction of these covariates is of primary interest. Logic regression searches for Boolean (logic) combinations of binary variables that best explain the variability in the outcome variable, and thus reveals variables and interactions that are associated with the response and/or have predictive capabilities. The logic expressions are embedded in a generalized linear regression framework, so logic regression can handle a variety of outcome types, such as binary responses in case-control studies, numeric responses, and time-to-event data. In this chapter, we provide an introduction to the logic regression methodology, list some applications in public health and medicine, and summarize some of the direct extensions and modifications of logic regression that have been proposed in the literature. Copyright © 2010 Elsevier Inc. All rights reserved.
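The idea of searching Boolean combinations of binary predictors can be shown with a toy sketch. It exhaustively scores every two-variable expression of the form `L_i AND/OR L_j` (where `L` is a predictor or its complement) by misclassification count. This is a deliberate simplification: the published method searches much larger logic trees (for example with simulated annealing) and scores them inside a GLM. The function name and data are hypothetical.

```python
from itertools import combinations, product

def best_two_variable_logic_term(X, y):
    """Score every expression 'L_i OP L_j' (L = predictor or its complement,
    OP = AND/OR) by the number of outcomes it misclassifies; return the best."""
    ops = {"AND": lambda a, b: a & b, "OR": lambda a, b: a | b}
    best_errors, best_label = None, None
    for i, j in combinations(range(len(X[0])), 2):
        for neg_i, neg_j, (name, op) in product((0, 1), (0, 1), ops.items()):
            # XOR with 1 complements a binary predictor.
            preds = [op(row[i] ^ neg_i, row[j] ^ neg_j) for row in X]
            errors = sum(p != t for p, t in zip(preds, y))
            if best_errors is None or errors < best_errors:
                label = f"{'NOT ' * neg_i}X{i} {name} {'NOT ' * neg_j}X{j}"
                best_errors, best_label = errors, label
    return best_errors, best_label

# All 16 settings of four binary predictors (e.g. SNP indicators); y = X0 AND X2.
X = [list(bits) for bits in product((0, 1), repeat=4)]
y = [row[0] & row[2] for row in X]
print(best_two_variable_logic_term(X, y))  # (0, 'X0 AND X2')
```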
Muqit, M M K; Marcellino, G R; Henson, D B; Young, L B; Turner, G S; Stanga, P E
To quantify the 20-ms Pattern Scan Laser (Pascal) panretinal laser photocoagulation (PRP) ablation dosage required for regression of proliferative diabetic retinopathy (PDR), and to explore factors related to long-term regression. We retrospectively studied a cohort of patients who participated in a randomised clinical trial, the Manchester Pascal Study. In all, 36 eyes of 22 patients were investigated over a follow-up period of 18 months. Primary outcome measures included visual acuity (VA) and complete PDR regression. Secondary outcomes included laser burn dosimetry, calculation of retinal PRP ablation areas, and the effect of patient-related factors on disease regression. A PDR subgroup analysis was undertaken to assess all factors related to PDR regression according to disease severity. There were no significant changes in logMAR VA for any group over time. In total, 10 eyes (28%) regressed after a single PRP. Following top-up PRP treatment, regression rates varied according to severity: 75% for mild PDR (n=6), 67% for moderate PDR (n=14), and 43% for severe PDR (n=3). To achieve complete disease regression, mild PDR required a mean of 2187 PRP burns and a 264 mm² ablation area, moderate PDR required 3998 PRP burns and a 456 mm² area, and severe PDR needed 6924 PRP laser burns (836 mm²; P<0.05). Multiple 20-ms PRP treatments applied over time do not adversely affect visual outcomes, with favourable PDR regression rates and minimal laser burn expansion over 18 months. The average laser dosimetry and retinal ablation areas needed to achieve complete regression increased significantly with worsening PDR.
Bedrick, Edward J; Hund, Lauren
We develop a novel approach for quantifying small effects in regression models. Our method is based on variation in the mean function, in contrast to methods that focus on regression coefficients. Our idea applies in diverse settings such as testing for a negligible trend and quantifying differences in regression functions across strata. Straightforward Bayesian methods are proposed for inference. Four examples are used to illustrate the ideas.
Moore, Dirk F
Applied Survival Analysis Using R covers the main principles of survival analysis, gives examples of how it is applied, and teaches how to put those principles to use to analyze data using R as a vehicle. Survival data, where the primary outcome is time to a specific event, arise in many areas of biomedical research, including clinical trials, epidemiological studies, and studies of animals. Many survival methods are extensions of techniques used in linear regression and categorical data, while other aspects of this field are unique to survival data. This text employs numerous actual examples to illustrate survival curve estimation, comparison of survivals of different groups, proper accounting for censoring and truncation, model variable selection, and residual analysis. Because explaining survival analysis requires more advanced mathematics than many other statistical topics, this book is organized with basic concepts and most frequently used procedures covered in earlier chapters, with more advanced topics...
Korns, Michael F.
This chapter examines the use of Abstract Expression Grammars to perform the entire Symbolic Regression process without the use of Genetic Programming per se. The techniques explored produce a symbolic regression engine that has absolutely no bloat, allows total user control of the search space and output formulas, and is faster and more accurate than the engines produced in our previous papers using Genetic Programming. The genome is an all-vector structure with four chromosomes plus additional epigenetic and constraint vectors, allowing total user control of the search space and the final output formulas. A combination of specialized compiler techniques, genetic algorithms, particle swarm, age-layered populations, plus discrete and continuous differential evolution is used to produce an improved symbolic regression system. Nine base test cases from the literature are used to test the improvement in speed and accuracy. The improved results indicate that these techniques move us a big step closer to future industrial-strength symbolic regression systems.
Christensen, Karl Bang
Rasch models provide a framework for measuring and modelling latent variables. Having measured a latent variable in a population, a comparison of groups will often be of interest. For this purpose, the use of observed raw scores will often be inadequate because these lack interval scale properties... This paper compares two approaches to group comparison: linear regression models using estimated person locations as outcome variables and latent regression models based on the distribution of the score...
Tai, Bee Choo
Regression Methods for Medical Research provides medical researchers with the skills they need to critically read and interpret research using more advanced statistical methods. The statistical requirements of interpreting and publishing in medical journals, together with rapid changes in science and technology, increasingly demand an understanding of more complex and sophisticated analytic procedures. The text explains the application of statistical models to a wide variety of practical medical investigative studies and clinical trials. Regression methods are used to appropriately answer the
Bihrmann, Kristine; Toft, Nils; Nielsen, Søren Saxmose
Standard logistic regression assumes that the outcome is measured perfectly. In practice, this is often not the case, which could lead to biased estimates if not accounted for. This study presents Bayesian logistic regression with adjustment for misclassification of the outcome applied to data...
This report focuses on the relationship between walking and its contributing factors by applying spatial regression methods. Using the Vermont data from the New England Transportation Survey (NETS), walking variables as well as 170 independent va...
McMahan, Christopher S; Tebbs, Joshua M; Hanson, Timothy E; Bilder, Christopher R
Group testing involves pooling individual specimens (e.g., blood, urine, swabs, etc.) and testing the pools for the presence of a disease. When individual covariate information is available (e.g., age, gender, number of sexual partners, etc.), a common goal is to relate an individual's true disease status to the covariates in a regression model. Estimating this relationship is a nonstandard problem in group testing because true individual statuses are not observed and all testing responses (on pools and on individuals) are subject to misclassification arising from assay error. Previous regression methods for group testing data can be inefficient because they are restricted to using only initial pool responses and/or they make potentially unrealistic assumptions regarding the assay accuracy probabilities. To overcome these limitations, we propose a general Bayesian regression framework for modeling group testing data. The novelty of our approach is that it can be easily implemented with data from any group testing protocol. Furthermore, our approach will simultaneously estimate assay accuracy probabilities (along with the covariate effects) and can even be applied in screening situations where multiple assays are used. We apply our methods to group testing data collected in Iowa as part of statewide screening efforts for chlamydia, and we make user-friendly R code available to practitioners. © 2017, The International Biometric Society.
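The basic building block of a group-testing likelihood, the probability that a pool tests positive given each member's covariate-dependent risk and an imperfect assay, can be sketched as follows. The assumptions are a logistic model for individual status, a pool that is truly positive if any member is positive, and no dilution effect. This is not the authors' Bayesian framework, and the function names and coefficients are hypothetical.

```python
from math import exp, prod

def individual_prob(x, beta):
    """Logistic model for an individual's true status: P(positive) = 1 / (1 + exp(-x'beta))."""
    eta = sum(xi * bi for xi, bi in zip(x, beta))
    return 1.0 / (1.0 + exp(-eta))

def pool_positive_prob(probs, sensitivity, specificity):
    """P(pool tests positive): truly positive pools are detected with probability
    `sensitivity`; truly negative pools give a false positive with 1 - specificity."""
    p_all_negative = prod(1.0 - p for p in probs)
    return sensitivity * (1.0 - p_all_negative) + (1.0 - specificity) * p_all_negative

# Hypothetical pool of four people described by (intercept, age) covariates.
beta = (-3.0, 0.05)
pool = [(1.0, 18.0), (1.0, 22.0), (1.0, 25.0), (1.0, 40.0)]
probs = [individual_prob(x, beta) for x in pool]
print(pool_positive_prob(probs, sensitivity=0.95, specificity=0.98))
```

Because sensitivity and specificity enter this expression directly, they can be estimated jointly with the covariate effects, as the abstract describes, rather than being fixed in advance.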
NASA's Kepler mission is the source of more exoplanets than any other instrument, but the discovery depends on complex statistical analysis procedures embedded in the Kepler pipeline. A particular challenge is mitigating irregular stellar variability without loss of sensitivity to faint periodic planetary transits. This proposal presents a two-stage alternative analysis procedure. First, parametric autoregressive ARFIMA models, commonly used in econometrics, remove most of the stellar variations. Second, a novel matched filter is used to create a periodogram from which transit-like periodicities are identified. This analysis procedure, the Kepler AutoRegressive Planet Search (KARPS), is confirming most of the Kepler Objects of Interest and is expected to identify additional planetary candidates. The proposed research will complete application of the KARPS methodology to the prime Kepler mission light curves of 200,000 stars, and compare the results with Kepler Objects of Interest obtained with the Kepler pipeline. We will then conduct a variety of astronomical studies based on the KARPS results. Important subsamples will be extracted, including Habitable Zone planets, hot super-Earths, grazing-transit hot Jupiters, and multi-planet systems. Ground-based spectroscopy of poorly studied candidates will be performed to better characterize the host stars. Studies of stellar variability will then be pursued based on KARPS analysis. The autocorrelation function and nonstationarity measures will be used to identify spotted stars at different stages of autoregressive modeling. Periodic variables with folded light curves inconsistent with planetary transits will be identified; they may be eclipsing or mutually-illuminating binary star systems. Classification of stellar variables with KARPS-derived statistical properties will be attempted. KARPS procedures will then be applied to archived K2 data to identify planetary transits and characterize stellar variability.
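The "FI" in ARFIMA is the fractional difference operator (1 − B)^d, where B is the backshift operator and a non-integer d captures long memory. A minimal sketch of that filter follows, using the binomial-expansion recursion w_0 = 1, w_k = w_{k−1}(k − 1 − d)/k and truncating at a fixed number of lags. This is only the differencing step, not the KARPS pipeline; the truncation length is an arbitrary assumption.

```python
def frac_diff_weights(d, n_weights):
    """Coefficients w_k of the binomial expansion (1 - B)^d = sum_k w_k B^k."""
    w = [1.0]
    for k in range(1, n_weights):
        w.append(w[-1] * (k - 1 - d) / k)
    return w

def frac_diff(series, d, n_weights=100):
    """Truncated fractional difference y_t = sum_{k < n_weights} w_k x_{t-k}.
    The earliest outputs use fewer lags because the series starts at t = 0."""
    w = frac_diff_weights(d, n_weights)
    return [
        sum(w[k] * series[t - k] for k in range(min(t + 1, n_weights)))
        for t in range(len(series))
    ]

print(frac_diff_weights(1, 4))    # d = 1 reduces to ordinary first differencing
print(frac_diff_weights(0.4, 4))  # slowly decaying weights: long memory
```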
Shi, Lei; Zuo, ShanShan; Yu, Dalei; Zhou, Xiaohua
This paper studies influence diagnostics in the meta-regression model, including case-deletion diagnostics and local influence analysis. We derive the subset deletion formulae for the estimation of the regression coefficients and the heterogeneity variance and obtain the corresponding influence measures. The DerSimonian and Laird estimation and maximum likelihood estimation methods in meta-regression are considered, respectively, to derive the results. Internal and external residuals and a leverage measure are defined. Local influence analyses based on case-weights, response, covariate, and within-variance perturbation schemes are explored. We introduce a method that simultaneously perturbs responses, covariates, and within-variances to obtain a local influence measure, which has the advantage of being able to compare the influence magnitude of influential studies across different perturbations. An example is used to illustrate the proposed methodology. Copyright © 2017 John Wiley & Sons, Ltd.
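The DerSimonian and Laird moment estimator and the case-deletion idea can be sketched for the simplest special case, an intercept-only random-effects model with no covariates. Leave-one-out refitting shows how much each study moves the heterogeneity variance and the pooled mean. This is not the paper's subset-deletion formulae for meta-regression; the function names are hypothetical.

```python
def dersimonian_laird(y, v):
    """DerSimonian-Laird moment estimate of tau^2 and the random-effects mean
    for an intercept-only model; y are study effects, v within-study variances."""
    w = [1.0 / vi for vi in v]
    sw = sum(w)
    mu_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - mu_fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (len(y) - 1)) / c)                     # truncated at zero
    w_re = [1.0 / (vi + tau2) for vi in v]
    mu_re = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    return tau2, mu_re

def leave_one_out(y, v):
    """Case-deletion diagnostic: refit without each study in turn."""
    return [dersimonian_laird(y[:i] + y[i + 1:], v[:i] + v[i + 1:]) for i in range(len(y))]

effects, variances = [0.0, 1.0, 2.0, 3.0], [1.0, 1.0, 1.0, 1.0]
print(dersimonian_laird(effects, variances))  # tau^2 = 2/3, mean 1.5
print(leave_one_out(effects, variances))
```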
Liu, Chuan-Fen; Burgess, James F; Manning, Willard G; Maciejewski, Matthew L
To illustrate how bimodal, U-shaped utilization distributions can be modeled with beta-binomial regression, which is rarely used in health services research. The data were Veterans Affairs (VA) administrative data and Medicare claims from 2001-2004 for 11,123 Medicare-eligible VA primary care users in 2000. We compared means and distributions of VA reliance (the proportion of all VA/Medicare primary care visits occurring in VA) predicted from beta-binomial, binomial, and ordinary least-squares (OLS) models. The beta-binomial model fits the bimodal distribution of VA reliance better than the binomial and OLS models because it does not depend on normality and has greater flexibility in its shape parameters. Increased awareness of beta-binomial regression may help analysts apply appropriate methods to outcomes with bimodal or U-shaped distributions. © Health Research and Educational Trust.
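The shape flexibility mentioned above can be seen directly from the beta-binomial probability mass function. The sketch assumes one common regression parameterization, with mean mu and precision theta, alpha = mu·theta and beta = (1 − mu)·theta; in a regression, mu would be linked to covariates, for example through a logit. With alpha and beta below 1 the distribution is U-shaped, which a binomial with the same mean cannot reproduce.

```python
from math import comb, exp, lgamma

def log_beta(a, b):
    """log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def beta_binomial_pmf(k, n, mu, theta):
    """P(K = k) for n trials, mean success probability mu and precision theta,
    using alpha = mu * theta and beta = (1 - mu) * theta."""
    a, b = mu * theta, (1.0 - mu) * theta
    return comb(n, k) * exp(log_beta(k + a, n - k + b) - log_beta(a, b))

def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n = 10  # e.g. number of primary care visits
bb = [beta_binomial_pmf(k, n, mu=0.5, theta=1.0) for k in range(n + 1)]
bi = [binomial_pmf(k, n, 0.5) for k in range(n + 1)]
print([round(p, 3) for p in bb])  # mass piles up at 0 and n (U-shape)
print([round(p, 3) for p in bi])  # mass piles up in the middle
```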
Birke, Melanie; Bissantz, Nicolai; Holzmann, Hajo
We construct uniform confidence bands for the regression function in inverse, homoscedastic regression models with convolution-type operators. Here, the convolution is between two non-periodic functions on the whole real line rather than between two periodic functions on a compact interval, since the former situation arguably arises more often in applications. First, following Bickel and Rosenblatt (1973 Ann. Stat. 1 1071–95), we construct asymptotic confidence bands which are based on strong approximations and on a limit theorem for the supremum of a stationary Gaussian process. Further, we propose bootstrap confidence bands based on the residual bootstrap and prove consistency of the bootstrap procedure. A simulation study shows that the bootstrap confidence bands perform reasonably well for moderate sample sizes. Finally, we apply our method to data from a gel electrophoresis experiment with genetically engineered neuronal receptor subunits incubated with rat brain extract.
Xie, Dan; Liu, Yi; Chen, Jining
Forecasting and preventing urban noise pollution are major challenges in urban environmental management. Most existing efforts, including experiment-based models, statistical models, and noise mapping, however, have limited capacity to explain the association between urban growth and the corresponding change in noise. Therefore, these conventional methods can hardly forecast urban noise for a given development layout. This paper, for the first time, introduces a land use regression method, which has been applied to simulating urban air quality for a decade, to construct an urban noise model (LUNOS) in Dalian Municipality, Northeast China. The LUNOS model describes noise as a dependent variable of the areas of the various surrounding land use types via a regression function. The results suggest that a linear model performs better in fitting monitoring data, and there is no significant difference in LUNOS outputs when it is applied at different spatial scales. As LUNOS facilitates a better understanding of the association between land use and urban environmental noise than conventional methods, it can be regarded as a promising tool for noise prediction for planning purposes and to aid smart decision-making.
Gandy, Axel; Jensen, Uwe
We introduce directed goodness-of-fit tests for Cox-type regression models in survival analysis. "Directed" means that one may choose against which alternatives the tests are particularly powerful. The tests are based on sums of weighted martingale residuals and their asymptotic distributions. We derive optimal tests against certain competing models, which include Cox-type regression models with different covariates and/or a different link function. We report results from several simulation studies and apply our test to a real dataset.
Huang, Mian; Li, Runze; Wang, Shaoli
Motivated by an analysis of US house price index data, we propose nonparametric finite mixture of regression models. We study the identifiability of the proposed models and develop an estimation procedure by employing kernel regression. We further systematically study the sampling properties of the proposed estimators and establish their asymptotic normality. A modified EM algorithm is proposed to carry out the estimation procedure. We show that our algorithm preserves the ascent property of the EM algorithm in an asymptotic sense. Monte Carlo simulations are conducted to examine the finite sample performance of the proposed estimation procedure. The proposed methodology is illustrated with an empirical analysis of the US house price index data.
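The EM algorithm that the estimation procedure builds on can be sketched for the ordinary *parametric* case: a two-component mixture of linear regressions with constant mixing weights, coefficients and variances. The nonparametric models in the abstract let these quantities vary with the covariate and estimate them by kernel regression; that is not shown here. The initialization by residual sign and the simulated data are assumptions chosen for a clean illustration.

```python
import numpy as np

def em_two_linear_regressions(x, y, n_iter=100):
    """EM for y ~ pi_1 N(X b_1, s_1^2) + pi_2 N(X b_2, s_2^2), X = [1, x]."""
    X = np.column_stack([np.ones_like(x), x])
    # Initialise responsibilities by the sign of the residual from a pooled OLS fit.
    b_pooled, *_ = np.linalg.lstsq(X, y, rcond=None)
    above = (y - X @ b_pooled > 0).astype(float)
    R = np.column_stack([above, 1.0 - above])
    for _ in range(n_iter):
        # M-step: mixing weights, then a weighted least-squares fit per component.
        pis = R.mean(axis=0)
        betas, sigmas = [], []
        for j in range(2):
            w = R[:, j]
            Xw = X * w[:, None]
            b = np.linalg.solve(X.T @ Xw, Xw.T @ y)
            s2 = np.sum(w * (y - X @ b) ** 2) / np.sum(w)
            betas.append(b)
            sigmas.append(np.sqrt(s2))
        # E-step: posterior probability that each point belongs to each component.
        dens = np.column_stack([
            pis[j] * np.exp(-0.5 * ((y - X @ betas[j]) / sigmas[j]) ** 2) / sigmas[j]
            for j in range(2)
        ])
        R = dens / dens.sum(axis=1, keepdims=True)
    return pis, betas, sigmas

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, size=400)
first = rng.uniform(size=400) < 0.4
y = np.where(first, 5.0 + 2.0 * x, -5.0 - 1.0 * x) + rng.normal(scale=0.5, size=400)
pis, betas, sigmas = em_two_linear_regressions(x, y)
print(pis, betas)
```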
The problem considered is that of resolving a measured pulse height spectrum of a material mixture (e.g. a gamma-ray spectrum or a Raman spectrum) into a weighted sum of the spectra of the individual constituents. The model on which the analytical formulation is based is described. The problem reduces to that of a multiple linear regression. A stepwise linear regression procedure was constructed. The efficiency of this method was then tested by implementing the procedure in a computer programme, which was used to unfold test spectra obtained by mixing some spectra from a library of arbitrarily chosen spectra and adding a noise component. (U.K.)
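The reduction to multiple linear regression can be sketched directly. Each library spectrum is a column of the design matrix, and the constituent weights are the least-squares coefficients. This sketch fits all library spectra at once rather than selecting them stepwise as in the abstract, and the Gaussian peak shapes, weights and noise level are invented for illustration.

```python
import numpy as np

def unfold(measured, library):
    """Least-squares weights w minimising ||measured - library @ w||^2.
    library has shape (n_channels, n_constituents), one reference spectrum per column."""
    weights, *_ = np.linalg.lstsq(library, measured, rcond=None)
    return weights

channels = np.arange(256)

def peak(centre, width):
    """Hypothetical Gaussian-shaped line used to build synthetic reference spectra."""
    return np.exp(-0.5 * ((channels - centre) / width) ** 2)

library = np.column_stack([
    peak(60, 8),
    peak(120, 12),
    peak(200, 6) + 0.3 * peak(90, 10),   # a constituent with two lines
])
true_w = np.array([2.0, 0.5, 1.0])
rng = np.random.default_rng(42)
measured = library @ true_w + rng.normal(scale=0.01, size=channels.size)
print(unfold(measured, library))         # close to [2.0, 0.5, 1.0]
```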
Al-Daffaie, Kadhem; Khan, Shahjahan
This paper considers the relationship between a binary response and a circular predictor. It develops the logistic regression model by employing the linear-circular regression approach. The maximum likelihood method is used to estimate the parameters, and the Newton-Raphson numerical method is used to compute the estimates. A data set from weather records of Toowoomba city is analysed by the proposed methods. Moreover, a simulation study is conducted. The R software is used for all computations and simulations.
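A common way to enter a circular predictor θ into a logistic model is through cos θ and sin θ, giving logit P(y = 1) = b0 + b1 cos θ + b2 sin θ. The sketch below fits that model by Newton-Raphson in Python rather than R. The exact linear-circular specification used by the authors is not given in the abstract, so this form and the simulated data are assumptions.

```python
import numpy as np

def fit_circular_logistic(theta, y, n_iter=25):
    """Newton-Raphson MLE for logit P(y=1) = b0 + b1*cos(theta) + b2*sin(theta)."""
    X = np.column_stack([np.ones_like(theta), np.cos(theta), np.sin(theta)])
    beta = np.zeros(3)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        score = X.T @ (y - p)                         # gradient of the log-likelihood
        info = X.T @ (X * (p * (1.0 - p))[:, None])   # information matrix
        beta = beta + np.linalg.solve(info, score)
    return beta

rng = np.random.default_rng(7)
theta = rng.uniform(0.0, 2.0 * np.pi, size=5000)      # e.g. wind direction in radians
true_beta = np.array([-0.5, 1.5, -1.0])
eta = true_beta[0] + true_beta[1] * np.cos(theta) + true_beta[2] * np.sin(theta)
y = (rng.uniform(size=theta.size) < 1.0 / (1.0 + np.exp(-eta))).astype(float)

beta = fit_circular_logistic(theta, y)
# Direction at which the fitted probability peaks.
peak_direction = np.arctan2(beta[2], beta[1]) % (2.0 * np.pi)
print(beta, peak_direction)
```

Writing b1 cos θ + b2 sin θ = A cos(θ − φ) shows why the parameterization is natural: A = √(b1² + b2²) is the strength of the directional effect and φ is the direction of highest probability.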
Mitchell, T.J.; Beauchamp, J.J.
This paper is concerned with the selection of subsets of "predictor" variables in a linear regression model for the prediction of a "dependent" variable. We take a Bayesian approach and assign a probability distribution to the dependent variable through a specification of prior distributions for the unknown parameters in the regression model. The appropriate posterior probabilities are derived for each submodel, and methods are proposed for evaluating the family of prior distributions. Examples are given that show the application of the Bayesian methodology. 23 refs., 3 figs.
Stepwise linear regression is a multivariable regression procedure for identifying statistically significant variables in a linear regression equation. In the present study, we present a Matlab program for stepwise regression.
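The core loop of stepwise selection can be illustrated with a forward-only sketch in Python (not the Matlab program described). Starting from the intercept-only model, at each step it adds the candidate variable with the largest partial F statistic and stops when that statistic falls below an F-to-enter threshold. The removal step of full stepwise regression is omitted, and the threshold of 4.0 and the simulated data are assumptions.

```python
import numpy as np

def rss(design, y):
    """Residual sum of squares of the least-squares fit of y on the design matrix."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)

def forward_stepwise(X, y, f_to_enter=4.0):
    """Forward selection by partial F tests; returns indices in order of entry."""
    n, p = X.shape
    selected = []
    design = np.ones((n, 1))                     # intercept-only starting model
    while len(selected) < p:
        rss_current = rss(design, y)
        best_f, best_j = -np.inf, None
        for j in range(p):
            if j in selected:
                continue
            candidate = np.column_stack([design, X[:, j]])
            rss_new = rss(candidate, y)
            df_resid = n - candidate.shape[1]
            f_stat = (rss_current - rss_new) / (rss_new / df_resid)  # 1 numerator df
            if f_stat > best_f:
                best_f, best_j = f_stat, j
        if best_f < f_to_enter:
            break
        selected.append(best_j)
        design = np.column_stack([design, X[:, best_j]])
    return selected

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 6))
y = 2.0 * X[:, 0] - 3.0 * X[:, 3] + rng.normal(scale=0.5, size=200)
print(forward_stepwise(X, y))                    # columns 3 and 0 enter first
```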
Acevedo Rodriguez, F J; López-Sastre, R J; Gil-Jiménez, P; Maldonado Bascón, S; Ruiz-Reyes, N
Cyclic voltammetry is an electroanalytical technique for obtaining information about substances under analysis without the need for complex flow systems. However, classifying the information in voltammograms obtained using this technique is difficult. In this paper, we propose the use of fixed kernel regression as a method for extracting features from these voltammograms, reducing the information to a few coefficients. The proposed approach has been applied to a wine classification problem with accuracy rates of over 98%. Although the method is described here for extracting voltammogram information, it can be used for other types of signals.
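The idea of reducing a signal to a few kernel-regression coefficients can be sketched as follows. The signal is regressed on a fixed set of kernel functions at fixed centres, and the least-squares coefficients become the feature vector passed to a classifier. The Gaussian kernel, the number of kernels and the width are assumptions for illustration; the abstract does not specify the authors' choices.

```python
import numpy as np

def gaussian_design(n_points, n_kernels, width=None):
    """Matrix of fixed Gaussian kernels evaluated on a normalised [0, 1] grid."""
    t = np.linspace(0.0, 1.0, n_points)
    centres = np.linspace(0.0, 1.0, n_kernels)
    if width is None:
        width = centres[1] - centres[0]          # width equal to the centre spacing
    return np.exp(-0.5 * ((t[:, None] - centres[None, :]) / width) ** 2)

def kernel_features(signal, n_kernels=8, width=None):
    """Least-squares kernel coefficients summarising a sampled signal."""
    Phi = gaussian_design(len(signal), n_kernels, width)
    coef, *_ = np.linalg.lstsq(Phi, signal, rcond=None)
    return coef

# A 200-sample synthetic "voltammogram" becomes an 8-number feature vector.
t = np.linspace(0.0, 1.0, 200)
signal = np.sin(2.0 * np.pi * t) + 0.5 * t
print(kernel_features(signal))
```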
Grilo, Luís M.; Grilo, Helena L.; Gonçalves, Sónia P.; Junça, Ana
In European countries, particularly in Portugal, it is common to hear people mention that they are exposed to excessive and continuous psychosocial stressors at work. This is increasing in diverse activity sectors, such as the Services sector. A representative sample was collected from a Portuguese Services organization by applying an internationally validated survey, whose variables were measured on five-category ordered Likert-type scales. A multinomial logistic regression model is used to estimate the probability of each category of the dependent variable, general health perception, where, among other independent variables, burnout appears as statistically significant.
Stel, Vianda S.; Dekker, Friedo W.; Tripepi, Giovanni; Zoccali, Carmine; Jager, Kitty J.
In contrast to the Kaplan-Meier method, Cox proportional hazards regression can provide an effect estimate by quantifying the difference in survival between patient groups and can adjust for confounding effects of other variables. The purpose of this article is to explain the basic concepts of the
Tate, Richard L.
An exploratory study of the value of ridge regression for interactive models is reported. Assuming that the linear terms in a simple interactive model are centered to eliminate non-essential multicollinearity, a variety of common models, representing both ordinal and disordinal interactions, are shown to have "orientations" that are…
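The "non-essential multicollinearity" that centering removes can be demonstrated numerically. When two positive-valued predictors x and z are uncentered, x is strongly correlated with the interaction term xz; after centering both, that correlation nearly vanishes. The uniform data are an assumption, and the sketch does not perform ridge regression itself.

```python
import numpy as np

def corr_x_with_product(x, z):
    """Correlation between a predictor and its product (interaction) term."""
    return np.corrcoef(x, x * z)[0, 1]

rng = np.random.default_rng(2)
x = rng.uniform(1.0, 5.0, size=5000)
z = rng.uniform(1.0, 5.0, size=5000)
raw = corr_x_with_product(x, z)                              # about 0.68 for this design
centred = corr_x_with_product(x - x.mean(), z - z.mean())    # close to 0
print(raw, centred)
```

Centering changes only the parameterization of the linear terms, not the fit. Collinearity that remains after centering is "essential" and is the kind ridge regression is meant to address.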
When the author first used "VisiCalc," he thought it a very useful tool when he had the formulas. But how could he design a spreadsheet if there was no known formula for the quantities he was trying to predict? A few months later, the author relates, he learned to use multiple linear regression software and suddenly it all clicked into…
This book contains a course in applied macroeconomics. Macroeconomic theory is applied to real world cases. Students are expected to compute model results with the help of a spreadsheet program. To that end the book also contains descriptions of the spreadsheet applications used, such as linear
Sharif, Behzad; Makowski, David; Plauborg, Finn
Statistical regression models represent alternatives to process-based dynamic models for predicting the response of crop yields to variation in climatic conditions. Regression models can be used to quantify the effect of changes in temperature and precipitation on yields. However, it is difficult to identify the most relevant input variables that should be included in regression models due to the high number of candidate variables and their correlations. This paper compares several regression techniques for modeling the response of winter oilseed rape yield to a high number of correlated input... ...showing similar performance led in some cases to different conclusions with respect to the effect of temperature and precipitation. Hence, it is recommended to apply an ensemble of regression models in order to account for the sensitivity of the data-driven models for projecting crop yield under climate...
Chiu, Long S.
The potential use of logistic regression in rainfall estimation from satellite measurements is investigated. Satellite measurements provide covariate information in terms of radiances from different remote sensors. The logistic regression technique can effectively accommodate many covariates and test their significance in the estimation. The outcome from the logistic model is the probability that the rainrate of a satellite pixel is above a certain threshold. By varying the thresholds, a rainrate histogram can be obtained, from which the mean and the variance can be estimated. A logistic model is developed and applied to rainfall data collected during GATE, using as covariates the fractional rain area and a radiance measurement deduced from a microwave temperature-rainrate relation. It is demonstrated that the fractional rain area is an important covariate in the model, consistent with the use of the so-called Area Time Integral in estimating total rain volume in other studies. To calibrate the logistic model, simulated rain fields generated by rain field models with prescribed parameters are needed. A stringent test of the logistic model is its ability to recover the prescribed parameters of simulated rain fields. A rain field simulation model that preserves the fractional rain area and lognormality of rainrates as found in GATE is developed. A stochastic regression model of branching and immigration whose solutions are lognormally distributed in some asymptotic limits has also been developed.
Slamet, I.; Nugroho, N. F. T. A.; Muslich
In this research, we applied geographically weighted regression (GWR) to analyze poverty in Central Java. We use a Gaussian kernel as the weighting function. The GWR uses the diagonal matrix obtained from the Gaussian kernel function as the weight matrix in the regression model. The kernel weights are used to handle spatial effects in the data so that a model can be obtained for each location. The purpose of this paper is to model poverty percentage data in Central Java province using GWR with a Gaussian kernel weighting function and to determine the influencing factors in each regency/city in Central Java province. Based on the research, we obtained a geographically weighted regression model with a Gaussian kernel weighting function for poverty percentage data in Central Java province. We found that the percentage of the population working as farmers, the population growth rate, the percentage of households with regular sanitation, and BPJS beneficiaries are the variables that affect the percentage of poverty in Central Java province. The coefficient of determination R2 is 68.64%. The districts fall into two categories, which are influenced by different sets of significant factors.
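The per-location estimation step can be sketched as follows. At each location i, GWR solves a weighted least-squares problem with the diagonal weight matrix built from Gaussian kernel weights w_ij = exp(−0.5 (d_ij / h)²), where h is the bandwidth. This kernel form is one common choice and is an assumption here, as are the fixed bandwidth (usually chosen by cross-validation or AIC in practice) and the simulated coordinates.

```python
import numpy as np

def gwr_coefficients(coords, X, y, bandwidth):
    """Local weighted least-squares coefficients at every observation location.

    Weights use a Gaussian kernel, w_ij = exp(-0.5 * (d_ij / bandwidth) ** 2),
    where d_ij is the Euclidean distance between locations i and j.
    """
    n = len(y)
    design = np.column_stack([np.ones(n), X])        # intercept + predictors
    betas = np.empty((n, design.shape[1]))
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-0.5 * (d / bandwidth) ** 2)
        xtw = design.T * w                           # same as design.T @ diag(w)
        betas[i] = np.linalg.solve(xtw @ design, xtw @ y)
    return betas

rng = np.random.default_rng(5)
coords = rng.uniform(0.0, 1.0, size=(150, 2))       # hypothetical district centroids
x = rng.normal(size=150)
# The slope grows from west to east, so local estimates should differ by location.
y_varying = (1.0 + 2.0 * coords[:, 0]) * x
local = gwr_coefficients(coords, x, y_varying, bandwidth=0.2)
print(local[:5])
```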
Laparra, Valero; Malo, Jesus; Camps-Valls, Gustau
This paper introduces a new unsupervised method for dimensionality reduction via regression (DRR). The algorithm belongs to the family of invertible transforms that generalize Principal Component Analysis (PCA) by using curvilinear instead of linear features. DRR identifies the nonlinear features through multivariate regression to ensure the reduction in redundancy between the PCA coefficients, the reduction of the variance of the scores, and the reduction in the reconstruction error. More importantly, unlike other nonlinear dimensionality reduction methods, its invertibility, volume preservation, and straightforward out-of-sample extension make DRR interpretable and easy to apply. The properties of DRR enable learning a broader class of data manifolds than the recently proposed Non-linear Principal Components Analysis (NLPCA) and Principal Polynomial Analysis (PPA). We illustrate the performance of the representation in reducing the dimensionality of remote sensing data. In particular, we tackle two common problems: processing very high dimensional spectral information, such as in hyperspectral image sounding data, and dealing with spatial-spectral image patches of multispectral images. Both settings pose collinearity and ill-determination problems. The expressive power of the features is assessed in terms of truncation error, estimation of atmospheric variables, and surface land cover classification error. Results show that DRR outperforms linear PCA and recently proposed invertible extensions based on neural networks (NLPCA) and univariate regressions (PPA).
Heylen, Kris; Geeraerts, Dirk
When data consist of grouped observations or clusters, and there is a risk that measurements within the same group are not independent, group-specific random effects can be added to a regression model in order to account for such within-group associations. Regression models that contain such group-specific random effects are called mixed-effects regression models, or simply mixed models. Mixed models are a versatile tool that can handle both balanced and unbalanced datasets and that can also be applied when several layers of grouping are present in the data; these layers can either be nested or crossed. In linguistics, as in many other fields, the use of mixed models has gained ground rapidly over the last decade. This methodological evolution enables us to build more sophisticated and arguably more realistic models, but, due to its technical complexity, also introduces new challenges. This volume brings together a number of promising new evolutions in the use of mixed models in linguistics, but also addres...
Ogutu, Joseph O; Schulz-Streeck, Torben; Piepho, Hans-Peter
Genomic selection (GS) is emerging as an efficient and cost-effective method for estimating breeding values using molecular markers distributed over the entire genome. In essence, it involves estimating the simultaneous effects of all genes or chromosomal segments and combining the estimates to predict the total genomic breeding value (GEBV). Accurate prediction of GEBVs is a central and recurring challenge in plant and animal breeding. The existence of a bewildering array of approaches for predicting breeding values using markers underscores the importance of identifying approaches able to efficiently and accurately predict breeding values. Here, we comparatively evaluate the predictive performance of six regularized linear regression methods (ridge regression, ridge regression BLUP, lasso, adaptive lasso, elastic net and adaptive elastic net) for predicting GEBV using dense SNP markers. We predicted GEBVs for a quantitative trait using a dataset on 3000 progenies of 20 sires and 200 dams and an accompanying genome consisting of five chromosomes with 9990 biallelic SNP-marker loci simulated for the QTL-MAS 2011 workshop. All six methods apply penalty-based (regularization) shrinkage to handle datasets with far more predictors than observations. The lasso, elastic net and their adaptive extensions further possess the desirable property that they simultaneously select relevant predictive markers and optimally estimate their effects. The regression models were trained with a subset of 2000 phenotyped and genotyped individuals and used to predict GEBVs for the remaining 1000 progenies without phenotypes. Predictive accuracy was assessed using the root mean squared error and the Pearson correlation between predicted GEBVs and (1) the true genomic value (TGV), (2) the true breeding value (TBV) and (3) the simulated phenotypic values, based on fivefold cross-validation (CV). The elastic net, lasso, adaptive lasso and the adaptive elastic net all had
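For the "far more predictors than observations" setting, ridge regression has a convenient dual form. The identity (XᵀX + λI_p)⁻¹Xᵀy = Xᵀ(XXᵀ + λI_n)⁻¹y means only an n × n system must be solved instead of a p × p one. The sketch assumes centred markers and phenotypes (no intercept), an arbitrary λ, and random Gaussian stand-ins for SNP codes. It illustrates plain ridge only, not the BLUP, lasso or elastic-net variants compared in the study.

```python
import numpy as np

def ridge_primal(X, y, lam):
    """(X'X + lam I_p)^{-1} X'y: solves a p x p system."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def ridge_dual(X, y, lam):
    """X'(XX' + lam I_n)^{-1} y: the same estimate via an n x n system."""
    n = X.shape[0]
    return X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), y)

rng = np.random.default_rng(11)
X = rng.normal(size=(20, 500))    # 20 individuals, 500 markers
y = rng.normal(size=20)
b_dual = ridge_dual(X, y, lam=5.0)
print(b_dual[:5])
```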
A detailed mathematical determination of regression laws is presented in the article. Particular emphasis is placed on determining the laws of X_j on X_l to account for source nuclei decay and detector errors in nuclear physics instrumentation. Both linear and nonlinear relations are presented. Linearization of 19 functions is tabulated, including graph, relation, variable substitution, obtained linear function, and remarks. 6 refs., 1 tab.
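A typical entry in such a linearization table is the exponential law y = a·e^(bx). With the substitution Y = ln y, it becomes the straight line Y = ln a + b·x, which can be fitted by ordinary least squares. A minimal sketch follows; the decay data are invented for illustration.

```python
import math

def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) via the substitution Y = ln y, giving Y = ln a + b x.

    Ordinary least squares on the transformed scale; requires every y > 0.
    """
    logs = [math.log(v) for v in ys]
    n = len(xs)
    mean_x, mean_l = sum(xs) / n, sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxl = sum((x - mean_x) * (l - mean_l) for x, l in zip(xs, logs))
    b = sxl / sxx
    a = math.exp(mean_l - b * mean_x)
    return a, b

# Noise-free decay curve N(t) = 2 * exp(-0.3 t): the fit recovers a = 2, b = -0.3.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
counts = [2.0 * math.exp(-0.3 * t) for t in times]
a, b = fit_exponential(times, counts)
print(a, b)
```

With noisy counts, least squares on the log scale is not equivalent to nonlinear least squares on the original scale, because the transformation reweights the errors.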
Lobo Pereira, J. A.; Ferreira, Maria Cristina; Oliveira, Teresa
Multivariate statistical analysis is indispensable for assessing the associations and interactions between different factors and the risk of periodontitis. Among others, regression analysis is a statistical technique widely used in healthcare to investigate and model the relationship between variables. In our work we study the impact of socio-demographic, medical and behavioral factors on periodontal health. Using linear and logistic regression models, we can assess the relevance, as risk factors for periodontitis, of the following independent variables (IVs): Age, Gender, Diabetic Status, Education, Smoking status and Plaque Index. A multiple linear regression model was built to evaluate the influence of the IVs on mean Attachment Loss (AL). The regression coefficients will be obtained along with the p-values of the corresponding significance tests. In the logistic model, a case (individual) was defined by the extent of destruction of periodontal tissues: an Attachment Loss greater than or equal to 4 mm in at least 25% of the sites surveyed (AL≥4mm/≥25%). The association measures include odds ratios together with the corresponding 95% confidence intervals.
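The case definition and the reported association measure can both be sketched in a few lines. A logistic coefficient b with standard error se gives the odds ratio exp(b) and the Wald 95% interval exp(b ± 1.96·se). The per-site attachment-loss values and the coefficient below are hypothetical, not results from the study.

```python
import math

def is_periodontitis_case(attachment_loss_mm, threshold_mm=4.0, min_fraction=0.25):
    """Case if AL >= 4 mm at >= 25% of the surveyed sites (AL>=4mm/>=25%)."""
    affected = sum(al >= threshold_mm for al in attachment_loss_mm)
    return affected / len(attachment_loss_mm) >= min_fraction

def odds_ratio_with_ci(coef, se, z=1.959964):
    """Odds ratio exp(coef) with a Wald 95% confidence interval exp(coef +/- z*se)."""
    return math.exp(coef), math.exp(coef - z * se), math.exp(coef + z * se)

# Hypothetical smoking coefficient from a fitted logistic model.
print(odds_ratio_with_ci(math.log(2.0), 0.2))   # OR 2.0 with an interval around it
print(is_periodontitis_case([1, 2, 4, 5, 3, 2, 1, 2]))
```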
Rodríguez-Girondo, Mar; Kneib, Thomas; Cadarso-Suárez, Carmen; Abu-Assi, Emad
Recent developments in statistical methods allow very flexible modeling of covariates affecting survival times via the hazard rate, including the inspection of possible time-dependent associations. Despite their immediate appeal in terms of flexibility, these models typically introduce additional difficulties when a subset of covariates and the corresponding modeling alternatives have to be chosen, that is, when building the most suitable model for given data. This is particularly true when potentially time-varying associations are present. We propose using a piecewise exponential representation of the original survival data to link hazard regression with estimation schemes based on the Poisson likelihood, making recent advances in model building for exponential family regression accessible also in the nonproportional hazards regression context. A two-stage stepwise selection approach, an approach based on doubly penalized likelihood, and a componentwise functional gradient descent approach are adapted to the piecewise exponential regression problem. These three techniques were compared via an intensive simulation study. An application to prognosis after discharge for patients who suffered a myocardial infarction supplements the simulation to demonstrate the pros and cons of the approaches in real data analyses. Copyright © 2013 John Wiley & Sons, Ltd.
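The piecewise exponential representation amounts to a data expansion. Each subject's follow-up is split at fixed cut points into one row per interval at risk, recording the time at risk (exposure) and whether the event occurred in that interval. A Poisson GLM with log(exposure) as an offset and an interval-specific baseline term then reproduces the piecewise exponential hazard model, and covariate-by-interval interactions allow time-varying effects. The cut points and function name below are illustrative assumptions.

```python
def piecewise_exponential_rows(time, event, cuts):
    """Split one subject's follow-up at the cut points.

    Returns (interval index, exposure, event indicator) tuples; the exposure is
    used as a log-offset and the indicator as the response in a Poisson GLM.
    """
    rows = []
    bounds = [0.0] + list(cuts) + [float("inf")]
    for k in range(len(bounds) - 1):
        start, end = bounds[k], bounds[k + 1]
        if time <= start:
            break                                  # no follow-up in this interval
        exposure = min(time, end) - start
        event_here = event and time <= end         # the event falls in this interval
        rows.append((k, exposure, int(event_here)))
    return rows

# A subject with an event at t = 2.5, cut points at 1, 2 and 3 (e.g. years).
print(piecewise_exponential_rows(2.5, 1, [1, 2, 3]))
# [(0, 1.0, 0), (1, 1, 0), (2, 0.5, 1)]
```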
Newhouse, Vernon L
Applied Superconductivity, Volume II, is part of a two-volume series on applied superconductivity. The first volume dealt with electronic applications and radiation detection, and contains a chapter on liquid helium refrigeration. The present volume discusses magnets, electromechanical applications, accelerators, and microwave and rf devices. The book opens with a chapter on high-field superconducting magnets, covering applications and magnet design. Subsequent chapters discuss superconductive machinery such as superconductive bearings and motors; rf superconducting devices; and future prospec
Musicant, David R; Feinberg, Alexander
This paper presents active set support vector regression (ASVR), a new active set strategy to solve a straightforward reformulation of the standard support vector regression problem. This new algorithm is based on the successful ASVM algorithm for classification problems, and consists of solving a finite number of linear equations with a typically large dimensionality equal to the number of points to be approximated. However, by making use of the Sherman-Morrison-Woodbury formula, a much smaller matrix of the order of the original input space is inverted at each step. The algorithm requires no specialized quadratic or linear programming code, but merely a linear equation solver which is publicly available. ASVR is extremely fast, produces comparable generalization error to other popular algorithms, and is available on the web for download.
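The Sherman-Morrison-Woodbury step can be illustrated as follows. We assume, by analogy with the related ASVM algorithm, a system matrix of the form I/ν + HHᵀ, where H has one row per data point but only as many columns as the input dimension plus one; the abstract does not give ASVR's exact matrix, so this is a sketch of the identity rather than of ASVR itself.

```python
import numpy as np

def smw_solve(H, b, nu):
    """Solve (I/nu + H H^T) x = b using only a k x k linear solve (H is m x k, k << m).

    Sherman-Morrison-Woodbury:
    (I/nu + H H^T)^{-1} = nu * (I - H (I/nu + H^T H)^{-1} H^T)
    """
    k = H.shape[1]
    small = np.eye(k) / nu + H.T @ H          # k x k, cheap to factor
    return nu * (b - H @ np.linalg.solve(small, H.T @ b))

rng = np.random.default_rng(0)
H = rng.standard_normal((10_000, 6))          # many points, few input dimensions
b = rng.standard_normal(10_000)
x = smw_solve(H, b, nu=0.5)                   # never forms the 10000 x 10000 matrix
```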
This thesis is the first comprehensive research work conducted on the Beirut-based TV station, an important representative of the post-2011 generation of Arab satellite news media. The launch of al-Mayadeen in June 2012 was closely linked to the political developments across the Arab world...... members, this thesis investigates a growing political trend and ideological discourse in the Arab world that I have called The New Regressive Left. On the premise that a media outlet can function as a forum for ideology production, the thesis argues that an analysis of this material can help to trace...... the contexture of The New Regressive Left. While the first part of the thesis lays out the theoretical approach and draws the contextual framework through an exploration of the surrounding Arab media- and ideoscapes, the second part is an analytical investigation of the discourse that permeates the programmes aired...
Kihara, Kyoichi; Fujita, Shin; Ohshiro, Taihei; Yamamoto, Seiichiro; Sekine, Shigeki
A case of spontaneous regression of transverse colon cancer is reported. A 64-year-old man was diagnosed with cancer of the transverse colon at a local hospital. Initial and second colonoscopy examinations revealed a typical cancer of the transverse colon, which was diagnosed as moderately differentiated adenocarcinoma. The patient underwent right hemicolectomy 6 weeks after the initial colonoscopy. The resected specimen showed only a scar at the tumor site, and no cancerous tissue was found histologically. The patient is alive with no evidence of recurrence 1 year after surgery. Although an antitumor immune response is the most likely explanation, the exact nature of the phenomenon remains unclear. We describe this rare case and review the literature on spontaneous regression of colorectal cancer. © The Author 2014. Published by Oxford University Press. All rights reserved.
Sofro, A.; Oktaviarina, A.
Spatial analysis has developed very quickly in the last decade. One popular approach is based on the neighbourhood structure of the regions. Unfortunately, such approaches have limitations, for example difficulty in prediction. We therefore propose Gaussian process regression (GPR) to address this issue. In this paper, we focus on spatial modeling with GPR for binomial data with a logit link function and investigate the performance of the model. We discuss inference, namely how to estimate the parameters and hyper-parameters, as well as prediction. Finally, simulation studies are presented in the last section.
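One standard way to fit a Gaussian process with a binomial likelihood and logit link is the Laplace approximation, sketched below following Algorithm 3.1 of Rasmussen and Williams, adapted from Bernoulli to binomial counts. The abstract does not say which inference scheme the authors use, so treat this as an illustration; the kernel hyper-parameters are fixed here rather than estimated, and the data are synthetic.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def sigmoid(f):
    return 1.0 / (1.0 + np.exp(-f))

def laplace_mode(K, y, n_trials, max_iter=100, tol=1e-10):
    """Newton iterations for the mode of p(f | y), y_i ~ Binomial(n_i, sigmoid(f_i)), f ~ N(0, K)."""
    f = np.zeros(len(y))
    for _ in range(max_iter):
        p = sigmoid(f)
        grad = y - n_trials * p               # gradient of the binomial log-likelihood
        W = n_trials * p * (1.0 - p)          # negative Hessian (diagonal)
        sW = np.sqrt(W)
        B = np.eye(len(y)) + sW[:, None] * K * sW[None, :]
        L = np.linalg.cholesky(B)             # numerically stable form of the Newton step
        b = W * f + grad
        a = b - sW * np.linalg.solve(L.T, np.linalg.solve(L, sW * (K @ b)))
        f_new = K @ a
        converged = np.max(np.abs(f_new - f)) < tol
        f = f_new
        if converged:
            break
    return f

def predict_latent_mean(K_star, y, n_trials, f_hat):
    """Approximate posterior mean of the latent f at new sites: K(X*, X) grad log p(y | f_hat)."""
    return K_star @ (y - n_trials * sigmoid(f_hat))

X = np.linspace(-2, 2, 15)[:, None]             # hypothetical 1-D "locations"
n_trials = np.full(15, 20.0)
y = np.round(n_trials * sigmoid(2 * X[:, 0]))   # synthetic binomial counts
K = rbf_kernel(X, X)
f_hat = laplace_mode(K, y, n_trials)
X_new = np.array([[-1.5], [0.0], [1.5]])
probs = sigmoid(predict_latent_mean(rbf_kernel(X_new, X), y, n_trials, f_hat))
```

Plugging the latent mean into the sigmoid, as above, ignores the latent variance and so gives slightly overconfident probabilities; averaging the sigmoid over the Laplace predictive distribution is the more careful choice.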
Korman, Valentin; Polzin, Kurt A.
The capability to provide localized, real-time monitoring of material regression rates in various applications has the potential to provide a new stream of data for development testing of various components and systems, as well as serving as a monitoring tool in flight applications. These applications include, but are not limited to, the regression of a combusting solid fuel surface, the ablation of the throat in a chemical rocket or the heat shield of an aeroshell, and the monitoring of erosion in long-life plasma thrusters. Regression is very fast in the first application and progressively slower in the second and third. A recent fundamental sensor development effort has led to a novel regression, erosion, and ablation sensor technology (REAST). The REAST sensor allows real-time measurement of surface erosion rates at a discrete surface location. The sensor is optical, using two different, co-located optical fibers to perform the regression measurement. The disparate optical transmission properties of the two fibers make it possible to measure the regression rate by monitoring the relative light attenuation through the fibers. As the fibers regress along with the parent material in which they are embedded, the relative light intensities through the two fibers change, providing a measure of the regression rate. The optical nature of the system makes it relatively easy to use in a variety of harsh, high-temperature environments, and it is also unaffected by the presence of electric and magnetic fields. In addition, the sensor could be used to perform optical spectroscopy on the light emitted by a process and collected by the fibers, giving localized measurements of various properties. The capability to perform an in-situ measurement of material regression rates is useful in addressing a variety of physical issues in various applications. An in-situ measurement allows for real-time data regarding the erosion rates, providing a quick method for
Djarfour, Nouredine; Ferahtia, Jalal; Babaia, Foudel; Baddari, Kamel; Said, El-adj; Farfour, Mohammed
This paper deals with the application of Generalized Regression Neural Networks (GRNNs) to seismic data filtering. GRNNs are a class of neural networks widely used for continuous function mapping and are based on well-known nonparametric kernel statistical estimators. The main advantages of this network are adaptability, simplicity and rapid training. Several synthetic tests are performed to highlight the merits of the proposed network topology. The filtering strategy has then been applied to remove random noise as well as source-related noise from real seismic data from a field in the south of Algeria. The results are very promising and indicate that the proposed filter performs well compared with the well-known frequency-wavenumber filter.
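GRNNs are commonly described as the Nadaraya-Watson kernel regression estimator arranged as a network: the output is a Gaussian-weighted average of the training targets, with a single smoothing width σ as the only trained parameter. The sketch below assumes that formulation; how the authors build network inputs from seismic gathers is not described in the abstract, so the usage example applies it to a single synthetic trace.

```python
import numpy as np

def grnn_predict(X_train, y_train, X_query, sigma):
    """GRNN output: Gaussian-kernel weighted average of the training targets (Nadaraya-Watson)."""
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    # Subtract each row's minimum distance before exponentiating; the ratio is unchanged,
    # but this avoids all weights underflowing to zero for small sigma.
    w = np.exp(-(d2 - d2.min(axis=1, keepdims=True)) / (2.0 * sigma**2))
    return (w @ y_train) / w.sum(axis=1)

# Hypothetical use: smooth one noisy trace by treating amplitude as a function of time.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)[:, None]
clean = np.sin(2 * np.pi * 5 * t[:, 0])
noisy = clean + 0.3 * rng.standard_normal(200)
filtered = grnn_predict(t, noisy, t, sigma=0.01)
```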
Krishan, Kewal; Kanchan, Tanuj; Sharma, Abhilasha
Estimation of stature is an important step in the identification of human remains in forensic examinations. The present study aims to compare the reliability and accuracy of stature estimation and to demonstrate the variability between estimated and actual stature using the multiplication factor and regression analysis methods. The study is based on a sample of 246 subjects (123 males and 123 females) from North India aged between 17 and 20 years. Four anthropometric measurements (hand length, hand breadth, foot length and foot breadth), taken on the left side of each subject, were included in the study. Stature was measured using standard anthropometric techniques. Multiplication factors were calculated and linear regression models were derived for estimating stature from hand and foot dimensions. The derived multiplication factors and regression formulae were applied to the hand and foot measurements in the study sample. The stature estimated by the multiplication factors and by regression analysis was compared with the actual stature to find the error in estimated stature. The results indicate that the range of error in stature estimation is smaller for the regression analysis method than for the multiplication factor method, confirming that regression analysis is better than multiplication factor analysis for stature estimation. Copyright © 2012 Elsevier Ltd and Faculty of Forensic and Legal Medicine. All rights reserved.
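The two approaches compared in the study can be sketched as follows. A multiplication factor is the mean ratio of stature to a body dimension, while the regression method fits stature = a + b·dimension by least squares. The measurements below are invented for illustration and are not the study's data.

```python
import numpy as np

def multiplication_factor(measure, stature):
    """Mean of stature / measurement; stature is then estimated as factor * measurement."""
    return float(np.mean(np.asarray(stature, float) / np.asarray(measure, float)))

def linear_fit(measure, stature):
    """Least-squares stature = a + b * measurement; returns (a, b)."""
    b, a = np.polyfit(measure, stature, 1)  # polyfit returns [slope, intercept]
    return float(a), float(b)

def error_range(actual, estimated):
    err = np.asarray(estimated, float) - np.asarray(actual, float)
    return float(err.min()), float(err.max())

# Hypothetical foot lengths (cm) and statures (cm).
foot = np.array([22.0, 24.0, 25.0, 26.0, 28.0])
stature = np.array([158.0, 166.5, 169.0, 175.5, 181.0])
mf = multiplication_factor(foot, stature)
a, b = linear_fit(foot, stature)
print("MF error range:", error_range(stature, mf * foot))
print("Regression error range:", error_range(stature, a + b * foot))
```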
Caceres, Gabriel Antonio; Feigelson, Eric
The Kepler AutoRegressive Planet Search (KARPS) project uses statistical methodology associated with autoregressive (AR) processes to model Kepler light curves in order to improve exoplanet transit detection in systems with high stellar variability. We also introduce a planet-search algorithm to detect transits in time-series residuals after application of the AR models. One of the main obstacles in detecting faint planetary transits is the intrinsic variability of the host star. The variability displayed by many stars may have autoregressive properties, wherein later flux values are correlated with previous ones. Our analysis procedure consists of three steps: pre-processing of the data to remove discontinuities, gaps and outliers; AR-type model selection and fitting; and a transit signal search of the residuals using a new Transit Comb Filter (TCF) that replaces traditional box-finding algorithms. The analysis procedures of the project are applied to a portion of the publicly available Kepler light curve data for the full 4-year mission duration. The methods have been tested on a subset of Kepler Objects of Interest (KOI) systems, classified both as planetary 'candidates' and 'false positives' by the Kepler Team, as well as on a random sample of unclassified systems. We find that ARMA-type modeling successfully reduces stellar variability, by a factor of 10 or more in active stars and by smaller factors in more quiescent stars. A typical quiescent Kepler star has an interquartile range (IQR) of ~10 e-/sec, which may improve slightly after modeling, while stars with IQRs from 20 to 50 e-/sec show improvements of 20% to 70%. High-activity stars (IQR exceeding 100 e-/sec) improve markedly. A periodogram based on the TCF is constructed to concentrate the signal of these periodic spikes. When a periodic transit is found, the model is displayed on a standard period-folded averaged light curve. Our findings to date on real
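As an illustration of the AR-modeling step only, the sketch fits an AR(p) model by ordinary least squares and compares the interquartile range of the flux before and after modeling. KARPS uses ARMA-type models with formal model selection and the Transit Comb Filter, neither of which is reproduced here; the AR(1) light curve is synthetic.

```python
import numpy as np

def fit_ar(x, p):
    """Least-squares AR(p) with intercept; returns (coefficients, residuals).

    coefficients = [c, phi_1, ..., phi_p] for x_t = c + sum_k phi_k x_{t-k} + e_t.
    """
    x = np.asarray(x, float)
    n = len(x)
    lags = np.column_stack([x[p - k : n - k] for k in range(1, p + 1)])  # column k-1 is x_{t-k}
    X = np.column_stack([np.ones(n - p), lags])
    coef, *_ = np.linalg.lstsq(X, x[p:], rcond=None)
    return coef, x[p:] - X @ coef

def iqr(v):
    q75, q25 = np.percentile(v, [75, 25])
    return float(q75 - q25)

# Synthetic "variable star": AR(1) flux with phi = 0.95 (illustrative only).
rng = np.random.default_rng(0)
flux = np.zeros(5000)
for t in range(1, 5000):
    flux[t] = 0.95 * flux[t - 1] + rng.standard_normal()
coef, resid = fit_ar(flux, p=1)
print(coef, iqr(flux), iqr(resid))
```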
Schmidt, Amand F; Finan, Chris
Researchers often perform arbitrary outcome transformations to fulfill the normality assumption of a linear regression model. This commentary explains and illustrates that in large data settings such transformations are often unnecessary and, worse, may bias model estimates. Linear regression assumptions are illustrated using simulated data and an empirical example on the relation between time since type 2 diabetes diagnosis and glycated hemoglobin levels. Simulation results were evaluated on coverage, i.e., the proportion of times the 95% confidence interval included the true slope coefficient. Although outcome transformations bias point estimates, violations of the normality assumption in linear regression analyses do not. The normality assumption is necessary to estimate standard errors, and hence confidence intervals and P-values, without bias. However, in large samples (e.g., where the number of observations per variable is >10), violations of the normality assumption often do not noticeably affect results. In contrast, the assumptions about the parametric model, the absence of extreme observations, homoscedasticity, and independence of the errors remain influential even in large samples. Given that modern healthcare research typically includes thousands of subjects, focusing on the normality assumption is often unnecessary, does not guarantee valid results, and, worse, may bias estimates through the practice of outcome transformations. Copyright © 2017 Elsevier Inc. All rights reserved.
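The coverage criterion can be reproduced in miniature. The sketch below simulates a linear model with strongly skewed (centred exponential) but homoscedastic, independent errors and records how often the usual 95% confidence interval for the slope contains the true value. Sample size, design and error distribution are illustrative assumptions, and the normal quantile 1.96 stands in for the t quantile, which is adequate at n = 200.

```python
import numpy as np

def slope_ci_coverage(n=200, reps=1000, beta=0.5, seed=0):
    """Fraction of simulations whose 95% CI for the OLS slope contains the true slope."""
    rng = np.random.default_rng(seed)
    z = 1.959964
    hits = 0
    for _ in range(reps):
        x = rng.uniform(0.0, 10.0, n)
        e = rng.exponential(1.0, n) - 1.0      # skewed, mean-zero, homoscedastic errors
        y = 1.0 + beta * x + e
        xc = x - x.mean()
        b1 = (xc @ (y - y.mean())) / (xc @ xc)
        b0 = y.mean() - b1 * x.mean()
        resid = y - b0 - b1 * x
        se = np.sqrt(resid @ resid / (n - 2) / (xc @ xc))
        hits += abs(b1 - beta) <= z * se
    return hits / reps

print(slope_ci_coverage())
```

With these settings the coverage should come out close to the nominal 0.95 despite the non-normal errors, in line with the commentary's point.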
Logan, J David
Praise for the Third Edition: "Future mathematicians, scientists, and engineers should find the book to be an excellent introductory text for coursework or self-study as well as worth its shelf space for reference." -MAA Reviews. Applied Mathematics, Fourth Edition is a thoroughly updated and revised edition on the applications of modeling and analyzing natural, social, and technological processes. The book covers a wide range of key topics in mathematical methods and modeling and highlights the connections between mathematics and the applied and nat
Faraway, Julian J
Linear models are central to the practice of statistics and form the foundation of a vast range of statistical methodologies. Julian J. Faraway's critically acclaimed Linear Models with R examined regression and analysis of variance, demonstrated the different methods available, and showed in which situations each one applies. Following in those footsteps, Extending the Linear Model with R surveys the techniques that grow from the regression model, presenting three extensions to that framework: generalized linear models (GLMs), mixed effect models, and nonparametric regression models. The author's treatment is thoroughly modern and covers topics that include GLM diagnostics, generalized linear mixed models, trees, and even the use of neural networks in statistics. To demonstrate the interplay of theory and practice, throughout the book the author weaves the use of the R software environment to analyze the data of real examples, providing all of the R commands necessary to reproduce the analyses. All of the ...
Efficient cropping requires yield estimation for each crop involved, for which data-driven models are commonly applied. In recent years, several comparisons of data-driven modeling techniques have been made in search of the best model for yield prediction. However, attributes are usually selected based on expert assessment or dimensionality reduction algorithms. A fairer comparison should include the best subset of features for each regression technique, and an evaluation covering several crops is preferable. This paper evaluates the most common data-driven modeling techniques applied to yield prediction, using a complete method to define the best attribute subset for each model. Multiple linear regression, stepwise linear regression, M5′ regression trees, and artificial neural networks (ANNs) were ranked. The models were built using real data from eight crops sown in an irrigation module in Mexico. To validate the models, three accuracy metrics were used: the root relative square error (RRSE), the relative mean absolute error (RMAE), and the correlation factor (R). The results show that ANNs are more consistent in the composition of the best attribute subset between the learning and the training stages, obtaining the lowest average RRSE (86.04%), the lowest average RMAE (8.75%), and the highest average correlation factor (0.63).
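The three validation metrics can be written in a few lines. RRSE follows its usual definition (the square root of the squared-error ratio against always predicting the observed mean), so 100% corresponds to that baseline and lower is better. The abstract does not define RMAE; the version below, mean absolute error divided by the mean observed yield, is an assumption. The yields in the usage lines are hypothetical.

```python
import numpy as np

def rrse(actual, predicted):
    """Root relative squared error; 1.0 (100%) equals always predicting the observed mean."""
    a, p = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.sqrt(np.sum((p - a) ** 2) / np.sum((a - a.mean()) ** 2)))

def rmae(actual, predicted):
    """Assumed definition: mean absolute error divided by the mean observed value."""
    a, p = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.mean(np.abs(p - a)) / np.mean(a))

def correlation(actual, predicted):
    return float(np.corrcoef(actual, predicted)[0, 1])

yields = np.array([5.1, 6.3, 4.8, 7.0])          # hypothetical observed yields
predicted = np.array([5.0, 6.0, 5.2, 6.8])
print(100 * rrse(yields, predicted), 100 * rmae(yields, predicted), correlation(yields, predicted))
```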
Han, Xixuan; Clemmensen, Line Katrine Harder
We propose a new type of weighted support vector regression (SVR), motivated by modeling local dependencies in time and space in prediction of house prices. The classic weights of the weighted SVR are added to the slack variables in the objective function (OF‐weights). This procedure directly...... the differences and similarities of the two types of weights by demonstrating the connection between the Least Absolute Shrinkage and Selection Operator (LASSO) and the SVR. We show that an SVR problem can be transformed to a LASSO problem plus a linear constraint and a box constraint. We demonstrate...
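The classic OF-weights described above enter only through the slack term of the ε-insensitive primal objective. A minimal sketch of that objective for a linear SVR, evaluated at a given (w, b), is shown below; the per-sample weights and data are illustrative, and the paper's proposed alternative weighting is not reproduced because the abstract excerpt does not define it.

```python
import numpy as np

def weighted_svr_objective(w, b, X, y, C, eps, weights):
    """0.5*||w||^2 + C * sum_i weights_i * (xi_i + xi_i^*) for a linear eps-insensitive SVR.

    For fixed (w, b) the optimal slacks satisfy xi_i + xi_i^* = max(0, |y_i - f(x_i)| - eps).
    """
    residual = y - (X @ w + b)
    slack = np.maximum(0.0, np.abs(residual) - eps)
    return 0.5 * float(w @ w) + C * float(np.sum(weights * slack))

X = np.array([[1.0], [2.0], [3.0]])
y = np.array([1.0, 2.0, 5.0])
print(weighted_svr_objective(np.array([1.0]), 0.0, X, y, C=1.0, eps=0.5,
                             weights=np.array([1.0, 1.0, 2.0])))
```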
Beyerlein, Andreas; Fahrmeir, Ludwig; Mansmann, Ulrich; Toschke, André M
Background: Body mass index (BMI) data usually have skewed distributions, for which common statistical modeling approaches such as simple linear or logistic regression have limitations. Methods: Different regression approaches to predicting childhood BMI were compared in terms of goodness-of-fit measures and ease of interpretation, including generalized linear models (GLMs), quantile regression and Generalized Additive Models for Location, Scale and Shape (GAMLSS). We analyzed data of 4967 childre...
Reiss, Philip T; Huang, Lei; Mennes, Maarten
Regression models for functional responses and scalar predictors are often fitted by means of basis functions, with quadratic roughness penalties applied to avoid overfitting. The fitting approach described by Ramsay and Silverman in the 1990s amounts to a penalized ordinary least squares (P-OLS) estimator of the coefficient functions. We recast this estimator as a generalized ridge regression estimator and present a penalized generalized least squares (P-GLS) alternative. We describe algorithms by which both estimators can be implemented, with automatic selection of optimal smoothing parameters, more efficiently than was previously possible. We discuss pointwise confidence intervals for the coefficient functions, simultaneous inference by permutation tests, and model selection, including a novel notion of pointwise model selection. P-OLS and P-GLS are compared in a simulation study. Our methods are illustrated with an analysis of age effects in a functional magnetic resonance imaging data set, as well as a reanalysis of a now-classic Canadian weather data set. An R package implementing the methods is publicly available.
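The generalized ridge view of P-OLS can be sketched with Kronecker products. We assume a response matrix Y observed on a common grid, coefficient functions expanded in a Gaussian-bump basis Θ, and a second-difference penalty P standing in for the integrated squared-derivative roughness penalty; all of these choices, and the function names, are illustrative. Smoothing-parameter selection and the P-GLS variant are not shown.

```python
import numpy as np

def gaussian_basis(t, n_basis, width):
    centres = np.linspace(t.min(), t.max(), n_basis)
    return np.exp(-0.5 * ((t[:, None] - centres[None, :]) / width) ** 2)  # T x K

def second_difference_penalty(n_basis):
    D = np.diff(np.eye(n_basis), n=2, axis=0)  # (K-2) x K second-difference operator
    return D.T @ D

def pols_fit(X, Y, Theta, P, lam):
    """P-OLS for Y (n x T) ~ X (n x p) @ C (p x K) @ Theta.T with penalty lam * trace(C P C^T).

    As a generalized ridge problem in vec(C) (column-major):
    (Theta^T Theta kron X^T X + lam * P kron I_p) vec(C) = vec(X^T Y Theta)
    """
    p, K = X.shape[1], Theta.shape[1]
    lhs = np.kron(Theta.T @ Theta, X.T @ X) + lam * np.kron(P, np.eye(p))
    rhs = (X.T @ Y @ Theta).reshape(-1, order="F")
    return np.linalg.solve(lhs, rhs).reshape(p, K, order="F")

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
Theta = gaussian_basis(t, n_basis=8, width=0.1)
P = second_difference_penalty(8)
X = np.column_stack([np.ones(40), rng.standard_normal((40, 2))])  # intercept + 2 scalar predictors
C_true = rng.standard_normal((3, 8))
Y = X @ C_true @ Theta.T + 0.1 * rng.standard_normal((40, 50))
C_hat = pols_fit(X, Y, Theta, P, lam=1.0)
coef_functions = C_hat @ Theta.T                                  # p x T estimated curves
```

Forming the Kronecker product explicitly, as here, is only practical for small bases and grids.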
Vol. 18, No. 4 (2012), pp. 154-164. ISSN 1803-9782. Grant (others): GA ČR (CZ) GAP209/10/2045. Institutional support: RVO:67985556. Keywords: regression analysis; Gordon surface; prediction error; projection pursuit. Subject RIV: BB - Applied Statistics, Operational Research. http://library.utia.cas.cz/separaty/2013/SI/volf-on two flexible methods of 2-dimensional regression analysis.pdf
Mahdiyah; Norsiah Mohamed, Wan; Ibrahim, Kamarulzaman
In this study, a quantile regression approach is applied to data from the Malaysian Family Life Survey (MFLS) to identify factors that are significantly related to different conditional quantiles of breastfeeding duration. Classical linear regression methods are based on minimizing the residual sum of squares, whereas quantile regression models the conditional median function and the full range of other conditional quantile functions, which are estimated by minimizing an asymmetrically weighted sum of absolute residuals. Overall, the duration of breastfeeding is found to be significantly related to place of living, religion and the total number of children in the family.
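For a small data set, the quantile-regression fit can be computed exactly without an LP solver, using the fact that an optimal solution interpolates p of the observations. The sketch below enumerates those elemental subsets and minimizes the check (pinball) loss; it is only an illustration on synthetic data, not the software used in the study, and real analyses use linear-programming or interior-point solvers, for example `QuantReg` in statsmodels or `rq` in the R package quantreg.

```python
import itertools
import numpy as np

def check_loss(residuals, tau):
    """Check (pinball) loss: sum of r * (tau - 1[r < 0])."""
    return float(np.sum(residuals * (tau - (residuals < 0).astype(float))))

def quantile_regression_exact(X, y, tau):
    """Exact linear quantile regression for small n by searching elemental subsets."""
    n, p = X.shape
    best_beta, best_loss = None, np.inf
    for idx in itertools.combinations(range(n), p):
        rows = list(idx)
        Xs = X[rows]
        if abs(np.linalg.det(Xs)) < 1e-12:
            continue  # singular subset, cannot interpolate
        beta = np.linalg.solve(Xs, y[rows])
        loss = check_loss(y - X @ beta, tau)
        if loss < best_loss:
            best_beta, best_loss = beta, loss
    return best_beta

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 30)
y = 2.0 + 0.5 * x + (0.2 + 0.1 * x) * rng.standard_normal(30)  # spread grows with x
X = np.column_stack([np.ones(30), x])
for tau in (0.25, 0.5, 0.75):
    print(tau, quantile_regression_exact(X, y, tau))
```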
Barbarossa, V.; Huijbregts, M. A. J.; Hendriks, J. A.; Beusen, A.; Clavreul, J.; King, H.; Schipper, A.
Quantifying mean annual flow of rivers (MAF) at ungauged sites is essential for a number of applications, including assessments of global water supply, ecosystem integrity and water footprints. MAF can be quantified with spatially explicit process-based models, which might be overly time-consuming and data-intensive for this purpose, or with empirical regression models that predict MAF based on climate and catchment characteristics. Yet regression models have mostly been developed at a regional scale, and the extent to which they can be extrapolated to other regions is not known. In this study, we developed a global-scale regression model for MAF using observations of discharge and catchment characteristics from 1,885 catchments worldwide, ranging from 2 to 10⁶ km² in size. In addition, we compared the performance of the regression model with the predictive ability of the spatially explicit global hydrological model PCR-GLOBWB [van Beek et al., 2011] by comparing results from both models to independent measurements. We obtained a regression model explaining 89% of the variance in MAF based on catchment area, mean annual precipitation and air temperature, average slope and elevation. The regression model performed better than PCR-GLOBWB for the prediction of MAF, as its root-mean-square error values were lower (0.29-0.38 compared to 0.49-0.57) and its modified index of agreement was higher (0.80-0.83 compared to 0.72-0.75). Our regression model can be applied globally at any point of the river network, provided that the input parameters are within the range of values employed in the calibration of the model. Performance is reduced in water-scarce regions, and further research should focus on improving this aspect of regression-based global hydrological models.
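The two comparison criteria can be sketched as follows. RMSE is standard; for the modified index of agreement we assume Willmott's absolute-value form d1, a common meaning of the term that the abstract does not spell out. For the usage example we also assume that errors are computed on log10-transformed MAF; the flow values are invented.

```python
import numpy as np

def rmse(obs, pred):
    o, p = np.asarray(obs, float), np.asarray(pred, float)
    return float(np.sqrt(np.mean((p - o) ** 2)))

def modified_index_of_agreement(obs, pred):
    """Assumed Willmott d1: 1 - sum|P - O| / sum(|P - mean(O)| + |O - mean(O)|)."""
    o, p = np.asarray(obs, float), np.asarray(pred, float)
    om = o.mean()
    return float(1.0 - np.sum(np.abs(p - o)) / np.sum(np.abs(p - om) + np.abs(o - om)))

obs_log = np.log10(np.array([12.0, 150.0, 3.4, 980.0]))   # hypothetical MAF values
pred_log = np.log10(np.array([10.0, 170.0, 5.0, 800.0]))
print(rmse(obs_log, pred_log), modified_index_of_agreement(obs_log, pred_log))
```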
Applied Dynamics is an important branch of engineering mechanics widely applied to mechanical and automotive engineering, aerospace and biomechanics as well as control engineering and mechatronics. The computational methods presented are based on common fundamentals. For this purpose analytical mechanics turns out to be very useful, with D’Alembert’s principle in the Lagrangian formulation proving most efficient. The methods of multibody systems, finite element systems and continuous systems are treated consistently. Thus, students get a much better understanding of dynamical phenomena, and engineers in design and development departments using computer codes may check the results more easily by choosing models of different complexity for vibration and stress analysis.
Ventosa-Santaulària, Daniel; Rodríguez-Caballero, Carlos Vladimir
Polynomial specifications are widely used, not only in applied economics but also in epidemiology, physics, political analysis, and psychology, to mention just a few examples. In many cases, the data employed to estimate such specifications are time series that may exhibit stochastic nonstationary ...
Nowadays, humanoid robots are increasingly expected to act in the real world and complete high-level tasks in a human-like, intelligent way. However, this is difficult because the real world is extremely complicated and full of miscellaneous variations. As a consequence, for a robot acting in the real world, precisely perceiving environmental changes may be an essential prerequisite. Unlike humans, humanoid robots usually have far fewer sensors with which to gather information from the real world, which makes the environmental perception problem even more challenging. Although the problem can be tackled by establishing direct sensory mappings or adopting probabilistic filtering methods, the nonlinearity and uncertainty caused by both the complexity of the environment and the high number of degrees of freedom of the robots result in difficult modeling problems. In our study, an alternative learning approach based on the Gaussian process regression framework is proposed and discussed to address this modeling problem. Meanwhile, to reduce the influence of the limited sensors, the idea of fusing multiple sources of sensory information is also incorporated. To evaluate its effectiveness, the proposed approach is applied to a humanoid equipped only with a three-axis gyroscope and a three-axis accelerometer, in two representative environment-changing tasks: suffering unknown external pushes and suddenly encountering sloped terrain. Experimental results reveal that the proposed Gaussian process regression-based approach is effective in coping with the nonlinearity and uncertainty of the humanoid environmental perception problem. Further, a humanoid balancing controller is developed, which takes the output of the Gaussian process regression-based environmental perception as the seed to activate the corresponding balancing strategy. Both simulated and hardware experiments consistently show that our approach is valuable and leads to a
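The core of such a perception model is standard GP regression. The sketch below computes the posterior mean and variance with a squared-exponential kernel and a Cholesky factorization; the kernel choice, the fixed hyper-parameters and the toy 1-D input are assumptions, whereas the paper's inputs would be fused gyroscope and accelerometer features.

```python
import numpy as np

def sq_exp_kernel(A, B, lengthscale=1.0, variance=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior(X, y, X_star, lengthscale=1.0, variance=1.0, noise=1e-6):
    """Posterior mean and variance of f(X_star) for GP regression with Gaussian noise."""
    K = sq_exp_kernel(X, X, lengthscale, variance) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))   # K^{-1} y
    K_s = sq_exp_kernel(X_star, X, lengthscale, variance)
    mean = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    var = variance - np.sum(v**2, axis=0)
    return mean, np.maximum(var, 0.0)  # clip tiny negative values caused by round-off

X = np.linspace(0, 2 * np.pi, 20)[:, None]   # toy 1-D input
y = np.sin(X[:, 0])
mean, var = gp_posterior(X, y, np.array([[1.0], [10.0]]))
```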
Sharifzadeh, Sara; Skytte, Jacob Lercke; Nielsen, Otto Højager Attermann
Statistical solutions are in widespread use in food and medicine quality control. We investigate the effect of different regression and sparse regression methods on a viscosity estimation problem using the spectro-temporal features from a new Sub-Surface Laser Scattering (SLS) vision system. From...... with sparse LAR, lasso and Elastic Net (EN) sparse regression methods. Due to inconsistent measurement conditions, Locally Weighted Scatterplot Smoothing (Loess) has been employed to alleviate the undesired variation in the estimated viscosity. The experimental results of applying the different methods show...... that the sparse lasso regression outperforms the other methods. In addition, local smoothing improved the results considerably for all regression methods. Due to the sparsity of the lasso, this result could help in designing a simpler vision system with fewer spectral bands....
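The lasso favoured in the study can be fitted by cyclic coordinate descent with soft-thresholding, the strategy used by many standard implementations. The sketch below minimizes (1/2n)·‖y − Xβ‖² + α‖β‖₁ on centred data without an intercept; it is a minimal illustration on synthetic data, not the authors' pipeline, and it omits the Loess smoothing step.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, alpha, iters=500, tol=1e-8):
    """Minimise (1/2n)||y - X b||^2 + alpha * ||b||_1 by cyclic coordinate descent.

    Assumes centred columns of X and centred y (no intercept is fitted).
    """
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y - X @ b                                  # current residual
    for _ in range(iters):
        max_step = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            old = b[j]
            rho = X[:, j] @ r / n + col_sq[j] * old  # correlation with the partial residual
            b[j] = soft_threshold(rho, alpha) / col_sq[j]
            r += X[:, j] * (old - b[j])              # keep residual in sync with b
            max_step = max(max_step, abs(b[j] - old))
        if max_step < tol:
            break
    return b

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10)); X -= X.mean(axis=0)
true_b = np.zeros(10); true_b[0], true_b[3] = 3.0, -2.0
y = X @ true_b + 0.1 * rng.standard_normal(200); y -= y.mean()
b_hat = lasso_cd(X, y, alpha=0.05)                 # expected: sparse, close to true_b
```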
A 54-year-old female patient was admitted to our outpatient clinic with a two-month history of neck muscle spasms and pain radiating to the left upper extremity. Magnetic resonance imaging had shown a large left-sided paracentral disc herniation at the C6-C7 disc space (Figure 1). Neurological examination showed no obvious neurological deficit. She received conservative treatment including bed rest, rehabilitation, and analgesic drugs. After 13 months, a second magnetic resonance imaging study, requested by the patient, showed resolution of the disc herniation (Figure 2). Although the literature contains several reports of spontaneous regression of herniated lumbar discs without surgical intervention, the phenomenon is rarely reported at the cervical level. In conclusion, herniated intervertebral discs have the potential to regress spontaneously regardless of spine level. Further studies determining the predictive signs of spontaneous regression, which would favor conservative treatment, would be beneficial for prognostic evaluation.
Huang, Ying; Pepe, Margaret S; Feng, Ziding
Two different approaches to the analysis of data from diagnostic biomarker studies are commonly employed. Logistic regression is used to fit models for the probability of disease given marker values, while ROC curves and risk distributions are used to evaluate classification performance. In this paper we present a method that simultaneously accomplishes both tasks. The key step is to standardize markers relative to the non-diseased population before including them in the logistic regression model. Among the advantages of this method are: (i) ensuring that results from regression and performance assessments are consistent with each other; (ii) allowing covariate adjustment and covariate effects on ROC curves to be handled in a familiar way; and (iii) providing a mechanism to incorporate important assumptions about structure in the ROC curve into the fitted risk model. We develop the method in detail for the problem of combining biomarker datasets derived from multiple studies, populations or biomarker measurement platforms, when ROC curves are similar across data sources. The methods are applicable to both cohort and case-control sampling designs. The dataset motivating this application concerns Prostate Cancer Antigen 3 (PCA3) for diagnosis of prostate cancer in patients with or without a previous negative biopsy, where the ROC curves for PCA3 are found to be the same in the two populations. Constrained maximum likelihood and empirical likelihood estimators are derived. The estimators are compared in simulation studies and the methods are illustrated with the PCA3 dataset.
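The standardization step can be sketched as follows, assuming it means replacing each marker value by its percentile in the non-diseased (control) distribution, which makes markers from different studies or platforms comparable; the paper may use a transformed version of this quantity. The logistic model is then fitted to the standardized marker by Newton-Raphson (IRLS). Data are simulated and the function names are illustrative.

```python
import numpy as np

def control_percentile(marker, control_values):
    """Empirical CDF of the non-diseased (control) marker values, evaluated at `marker`."""
    controls = np.sort(np.asarray(control_values, float))
    return np.searchsorted(controls, marker, side="right") / len(controls)

def logistic_irls(x, y, max_iter=50, tol=1e-10):
    """Logistic regression of binary y on one covariate x (plus intercept) by Newton-Raphson."""
    X = np.column_stack([np.ones(len(y)), x])
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        W = p * (1.0 - p)
        step = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - p))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

rng = np.random.default_rng(0)
controls = rng.normal(0.0, 1.0, 300)               # hypothetical non-diseased marker values
cases = rng.normal(1.0, 1.0, 300)                  # hypothetical diseased marker values
marker = np.concatenate([controls, cases])
disease = np.r_[np.zeros(300), np.ones(300)]
z = control_percentile(marker, controls)           # standardized marker in [0, 1]
beta = logistic_irls(z, disease)
```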
The pool adjacent violators (PAV) algorithm is an efficient technique for the class of isotonic regression problems with complete ordering. The algorithm yields a stepwise isotonic estimate which approximates the function and assigns maximum likelihood to the data. However, if one has reasons to believe that the data were generated by a continuous function, a smoother estimate may provide a better approximation to that function. In this paper, we consider the formulation which assumes that the data were generated by a continuous monotonic function obeying the Lipschitz condition. We propose a new algorithm, the Lipschitz pool adjacent violators (LPAV) algorithm, which approximates that function; we prove the convergence of the algorithm and examine its complexity. PMID:29456266
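The classic PAV algorithm itself fits in a few lines: scan the data and, whenever a new block's mean falls below the previous block's, merge the two into their weighted mean. The sketch below implements the weighted least-squares (non-decreasing) version; the Lipschitz-constrained LPAV variant proposed in the paper is not shown.

```python
def pav(y, w=None):
    """Weighted least-squares non-decreasing fit by pool adjacent violators."""
    if w is None:
        w = [1.0] * len(y)
    blocks = []  # each block: [weighted mean, total weight, number of points]
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Merge backwards while the monotonicity constraint is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted

print(pav([1, 3, 2, 4]))  # the violating pair (3, 2) is pooled to 2.5
```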
This paper considers the possibility of prediction in land use planning, and the use of statistical research methods in analyses of relationships between urban form and travel behaviour. Influential writers within the tradition of critical realism reject the possibility of predicting social...... phenomena. This position is fundamentally problematic for public planning. Without at least some ability to predict the likely consequences of different proposals, the justification for public sector intervention in market mechanisms will be frail. Statistical methods like regression analyses are commonly...... seen as necessary in order to identify aggregate level effects of policy measures, but are questioned by many advocates of critical realist ontology. Using research into the relationship between urban structure and travel as an example, the paper discusses relevant research methods and the kinds...
Halcoussis, Dennis; Phillips, G. Michael
Statistics, econometrics, investment analysis, and data analysis classes often review the calculation of several types of averages, including the arithmetic mean, geometric mean, harmonic mean, and various weighted averages. This note shows how each of these can be computed using a basic regression framework. By recognizing when a regression model…
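One way to see the regression framing is that each average is the intercept of an intercept-only least-squares regression on a suitably transformed response: y itself for the arithmetic mean, log y for the geometric mean (then exponentiated), 1/y for the harmonic mean (then inverted), and weighted least squares for a weighted mean. The sketch below illustrates this; the note may present the derivations differently.

```python
import numpy as np

def intercept_only_fit(y, w=None):
    """Intercept from (weighted) least squares of y on a column of ones."""
    y = np.asarray(y, float)
    w = np.ones_like(y) if w is None else np.asarray(w, float)
    sw = np.sqrt(w)  # WLS equals OLS after scaling each row by sqrt(weight)
    coef, *_ = np.linalg.lstsq(sw[:, None], sw * y, rcond=None)
    return float(coef[0])

def arithmetic_mean(y):
    return intercept_only_fit(y)

def geometric_mean(y):
    return float(np.exp(intercept_only_fit(np.log(y))))

def harmonic_mean(y):
    return 1.0 / intercept_only_fit(1.0 / np.asarray(y, float))

def weighted_mean(y, w):
    return intercept_only_fit(y, w)

y = [1.0, 2.0, 4.0]
print(arithmetic_mean(y), geometric_mean(y), harmonic_mean(y), weighted_mean(y, [1, 1, 2]))
```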
Altun, Idiris; Yüksel, Kasım Zafer
Low back pain is a frequent condition that results in substantial disability and causes admission of patients to neurosurgery clinics. The aim of this study was to evaluate and present the therapeutic outcomes in lumbar disc hernia (LDH) patients treated by means of a conservative approach consisting of bed rest and medical therapy. This retrospective cohort study was carried out in the neurosurgery departments of hospitals in Kahramanmaraş city, and 23 patients diagnosed with LDH at the L3-L4, L4-L5 or L5-S1 levels were enrolled. The average age was 38.4 ± 8.0 years, and the chief complaint was low back pain and sciatica radiating to one or both lower extremities. Conservative treatment was administered. Neurological examination findings, durations of treatment and intervals until symptomatic recovery were recorded. Lasègue tests and neurosensory examination revealed mild neurological deficits in 16 of our patients. Previously, 5 patients had received physiotherapy and 7 patients had been on medical treatment. The numbers of patients with LDH at the L3-L4, L4-L5, and L5-S1 levels were 1, 13, and 9, respectively. All patients reported that they had benefited from medical treatment and bed rest, and radiologic improvement was observed simultaneously on MRI scans. The average duration until symptomatic recovery and/or regression of LDH symptoms was 13.6 ± 5.4 months (range: 5-22). It should be kept in mind that lumbar disc hernias can regress with medical treatment and rest without surgery, and that these patients can recover radiologically. This must be taken into account when deciding on surgical intervention in LDH patients without indications for emergency surgery.
The goal of this study is to identify the contribution of effectuation dimensions to the predictive power of the entrepreneurial intention model over and above that which can be accounted for by other predictors selected and confirmed in previous studies. As is often the case in social and behavioral studies, some variables are likely to be highly correlated with each other. Therefore, the relative amount of variance in the criterion variable explained by each predictor depends on several factors, such as the order of variable entry and sample specifics. The results show the modest predictive power of two dimensions of effectuation prior to the introduction of the theory of planned behavior elements. The article highlights the main advantages of applying hierarchical regression in the social sciences as well as in the specific context of entrepreneurial intention formation, and addresses some of the potential pitfalls that this type of analysis entails.
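The hierarchical (blockwise) logic can be sketched as follows: fit the model with the first block of predictors, add the second block, and test the increase in R² with the usual nested-model F statistic, (ΔR²/k₂) / ((1 − R²_full)/(n − k − 1)), where k₂ is the size of the added block and k the total number of predictors. The data and block contents below are synthetic placeholders, not the study's variables.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an OLS fit of y on X plus an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    return float(1.0 - resid @ resid / np.sum((y - y.mean()) ** 2))

def r2_change(X_block1, X_block2, y):
    """R^2 for block 1, R^2 for blocks 1+2, and the F statistic for the increment."""
    n = len(y)
    r2_1 = r_squared(X_block1, y)
    X_full = np.column_stack([X_block1, X_block2])
    r2_2 = r_squared(X_full, y)
    k2 = X_block2.shape[1]
    df_resid = n - X_full.shape[1] - 1
    f_change = ((r2_2 - r2_1) / k2) / ((1.0 - r2_2) / df_resid)
    return r2_1, r2_2, f_change

rng = np.random.default_rng(0)
n = 300
block1 = rng.standard_normal((n, 2))               # e.g. previously established predictors
block2 = rng.standard_normal((n, 1))               # e.g. the added dimension
y = block1 @ np.array([1.0, 1.0]) + 0.5 * block2[:, 0] + rng.standard_normal(n)
print(r2_change(block1, block2, y))
```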
Knafl, George J
This book presents methods for investigating whether relationships are linear or nonlinear and for adaptively fitting appropriate models when they are nonlinear. Data analysts will learn how to incorporate nonlinearity in one or more predictor variables into regression models for different types of outcome variables. Such nonlinear dependence is often not considered in applied research, yet nonlinear relationships are common and so need to be addressed. A standard linear analysis can produce misleading conclusions, while a nonlinear analysis can provide novel insights into data, not otherwise possible. A variety of examples of the benefits of modeling nonlinear relationships are presented throughout the book. Methods are covered using what are called fractional polynomials based on real-valued power transformations of primary predictor variables combined with model selection based on likelihood cross-validation. The book covers how to formulate and conduct such adaptive fractional polynomial modeling in the s...
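A minimal sketch of degree-1 fractional polynomial selection: each power in a fixed grid transforms the predictor, a straight line is fitted, and the best-fitting power is kept. Two simplifications are assumptions of this sketch, not the book's method: the power grid is the conventional set {-2, -1, -0.5, 0, 0.5, 1, 2, 3} (with 0 meaning log x) rather than arbitrary real-valued powers, and in-sample residual sum of squares replaces likelihood cross-validation. The data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Conventional fractional-polynomial power set; by convention power 0 means log(x).
FP_POWERS = (-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0, 3.0)


def fp_transform(x: float, power: float) -> float:
    """Fractional-polynomial transform of a positive predictor value."""
    return math.log(x) if power == 0 else x ** power


def simple_ols_rss(u: Sequence[float], y: Sequence[float]) -> float:
    """Residual sum of squares of the straight-line OLS fit y ~ a + b*u."""
    n = len(u)
    u_bar, y_bar = sum(u) / n, sum(y) / n
    sxx = sum((ui - u_bar) ** 2 for ui in u)
    sxy = sum((ui - u_bar) * (yi - y_bar) for ui, yi in zip(u, y))
    slope = sxy / sxx
    intercept = y_bar - slope * u_bar
    return sum((yi - intercept - slope * ui) ** 2 for ui, yi in zip(u, y))


def best_fp1(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (power, RSS) of the best degree-1 fractional polynomial over FP_POWERS."""
    fits: List[Tuple[float, float]] = []
    for p in FP_POWERS:
        transformed = [fp_transform(xi, p) for xi in x]
        fits.append((p, simple_ols_rss(transformed, y)))
    return min(fits, key=lambda fit: fit[1])


# Hypothetical data generated from a logarithmic (hence nonlinear) relationship.
x = [float(v) for v in range(1, 11)]
y = [2.0 + 3.0 * math.log(v) for v in x]
power, rss = best_fp1(x, y)
print(power, rss)
```

A plain linear fit corresponds to power 1 in the grid, so the comparison also shows how much a standard linear analysis would miss.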
Lee, Hyekyoung; Lee, Dong Soo; Kang, Hyejin; Kim, Boong-Nyun; Chung, Moo K.
Sparse partial correlation is a useful connectivity measure for brain networks when it is difficult to compute the exact partial correlation in the small-n, large-p setting. In this paper, we formulate the problem of estimating partial correlation as a sparse linear regression with an ℓ1-norm penalty. The method is applied to brain networks consisting of parcellated regions of interest (ROIs), which are obtained from FDG-PET images of children with autism spectrum disorder (ASD) and pediatric control (PedCon) subjects. To validate the results, we check the reproducibility of the obtained brain networks by leave-one-out cross-validation and compare the clustered structures derived from the brain networks of the ASD and PedCon groups.
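The ℓ1-penalized regression step can be sketched with cyclic coordinate descent and soft thresholding. In the neighborhood-selection formulation assumed here (each region regressed on all others, after Meinshausen and Bühlmann), a nonzero lasso coefficient is taken as evidence of a nonzero partial correlation and hence an edge. The paper's exact formulation, penalty scaling and edge rule may differ, and the function names are hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def soft_threshold(z: float, lam: float) -> float:
    """Shrink z towards zero by lam, returning exactly 0.0 inside [-lam, lam]."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X: Sequence[Sequence[float]], y: Sequence[float], lam: float,
             sweeps: int = 200) -> List[float]:
    """Minimise (1/2n)*||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent.

    No intercept is fitted, so X and y are assumed to be centred beforehand.
    """
    n, p = len(y), len(X[0])
    b = [0.0] * p
    resid = list(y)  # current residual y - Xb (b starts at zero)
    for _ in range(sweeps):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(c * c for c in col) / n
            # Inner product of column j with the partial residual that leaves b_j out.
            rho = sum(c * (r + c * b[j]) for c, r in zip(col, resid)) / n
            new = soft_threshold(rho, lam) / z
            if new != b[j]:
                resid = [r - c * (new - b[j]) for r, c in zip(resid, col)]
                b[j] = new
    return b


def neighborhood_edges(data: Sequence[Sequence[float]], lam: float) -> Set[Tuple[int, int]]:
    """Regress each variable (column) on all others; keep an edge if either fit selects it."""
    p = len(data[0])
    edges: Set[Tuple[int, int]] = set()
    for k in range(p):
        others = [j for j in range(p) if j != k]
        X = [[row[j] for j in others] for row in data]
        y = [row[k] for row in data]
        for j, coef in zip(others, lasso_cd(X, y, lam)):
            if coef != 0.0:
                edges.add((min(j, k), max(j, k)))
    return edges


# Orthogonal toy design with y = 3*x1 + 0.5*x2.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [3.5, -2.5, 2.5, -3.5]
print(lasso_cd(X, y, lam=1.0))  # -> [2.0, 0.0]: the weak coefficient is set exactly to zero
```

Larger penalties give sparser networks; choosing the penalty (for example by cross-validation) is outside this sketch.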
Applied ethics is a growing, interdisciplinary field dealing with ethical problems in different areas of society. It includes, for instance, social and political ethics, computer ethics, medical ethics, bioethics, environmental ethics and business ethics, and it also relates to different forms of professional ethics. From the perspective of ethics, applied ethics is a specialisation in one area of ethics. From the perspective of social practice, applying ethics means focusing on ethical aspects and ...
Sonwane, Chandrashekhar; Saunders, Timothy; Fitzsimmons, Mark Andrew
A pump apparatus includes a particulate pump that defines a passage that extends from an inlet to an outlet. A duct is in flow communication with the outlet. The duct includes a deconsolidator configured to fragment particle agglomerates received from the passage.
In a recent paper (Weller EA, Milton DK, Eisen EA, Spiegelman D. Regression calibration for logistic regression with multiple surrogates for one exposure. Journal of Statistical Planning and Inference 2007; 137: 449-461), the authors discussed fitting logistic regression models when a scalar main explanatory variable is measured with error by several surrogates, that is, a situation with more surrogates than variables measured with error. They compared two methods of adjusting for measurement error, each using a regression calibration approximate model as if it were exact. One is the standard regression calibration approach, which substitutes an estimated conditional expectation of the true covariate given the observed data into the logistic regression. The other is a novel two-stage approach in which the logistic regression is fitted to the multiple surrogates and a linear combination of the estimated slopes is then formed as the estimate of interest. Applying estimated asymptotic variances for both methods to a single data set, with some sensitivity analysis, the authors asserted the superiority of their two-stage approach. We investigate this claim in some detail. A troubling aspect of the proposed two-stage method is that, unlike standard regression calibration and a natural form of maximum likelihood, the resulting estimates are not invariant to reparameterization of nuisance parameters in the model. We show, however, that under the regression calibration approximation, the two-stage method is asymptotically equivalent to a maximum likelihood formulation, and is therefore in theory superior to standard regression calibration. However, our extensive finite-sample simulations, in the practically important parameter space where the regression calibration model provides a good approximation, failed to uncover such superiority of the two-stage method. We also discuss extensions to different data structures.
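The standard regression calibration step, substituting an estimate of E[X | surrogates] for the unobserved covariate, can be sketched for the simplest case. This sketch assumes a continuous outcome with a linear model rather than the logistic model discussed in the paper (in the linear case the substitution removes the attenuation, whereas for logistic regression it is an approximation), classical additive error W_j = X + U_j with k replicate surrogates per subject, and normality. The two-stage method is not shown, and the simulated data are hypothetical.

```python
import random
from typing import List, Sequence


def mean(v: Sequence[float]) -> float:
    return sum(v) / len(v)


def sample_var(v: Sequence[float]) -> float:
    m = mean(v)
    return sum((a - m) ** 2 for a in v) / (len(v) - 1)


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def calibrate(replicates: Sequence[Sequence[float]]) -> List[float]:
    """Estimate E[X | mean surrogate] assuming W_j = X + U_j with independent normal X and U_j."""
    k = len(replicates[0])
    w_bar = [mean(ws) for ws in replicates]
    sigma_u2 = mean([sample_var(ws) for ws in replicates])  # within-subject spread = error variance
    sigma_x2 = sample_var(w_bar) - sigma_u2 / k             # var(W_bar) = sigma_x^2 + sigma_u^2 / k
    reliability = sigma_x2 / (sigma_x2 + sigma_u2 / k)
    mu = mean(w_bar)
    return [mu + reliability * (w - mu) for w in w_bar]     # shrink towards the overall mean


# Hypothetical simulation: true slope 1, two error-prone surrogates per subject.
rng = random.Random(7)
x_true = [rng.gauss(0.0, 1.0) for _ in range(20000)]
surrogates = [[x + rng.gauss(0.0, 1.0) for _ in range(2)] for x in x_true]
y = [1.0 * x + rng.gauss(0.0, 0.5) for x in x_true]

naive = ols_slope([mean(ws) for ws in surrogates], y)
calibrated = ols_slope(calibrate(surrogates), y)
print(f"naive slope {naive:.3f}, calibrated slope {calibrated:.3f}")
```

With these settings the naive slope is attenuated towards 1/1.5 ≈ 0.67, while the calibrated slope is close to the true value of 1.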
Cule, Erika; Vineis, Paolo; De Iorio, Maria
Technological developments have increased the feasibility of large scale genetic association studies. Densely typed genetic markers are obtained using SNP arrays, next-generation sequencing technologies and imputation. However, SNPs typed using these methods can be highly correlated due to linkage disequilibrium among them, and standard multiple regression techniques fail with these data sets due to their high dimensionality and correlation structure. There has been increasing interest in using penalised regression in the analysis of high dimensional data. Ridge regression is one such penalised regression technique which does not perform variable selection, instead estimating a regression coefficient for each predictor variable. It is therefore desirable to obtain an estimate of the significance of each ridge regression coefficient. We develop and evaluate a test of significance for ridge regression coefficients. Using simulation studies, we demonstrate that the performance of the test is comparable to that of a permutation test, with the advantage of a much-reduced computational cost. We introduce the p-value trace, a plot of the negative logarithm of the p-values of ridge regression coefficients with increasing shrinkage parameter, which enables the visualisation of the change in p-value of the regression coefficients with increasing penalisation. We apply the proposed method to a lung cancer case-control data set from EPIC, the European Prospective Investigation into Cancer and Nutrition. The proposed test is a useful alternative to a permutation test for the estimation of the significance of ridge regression coefficients, at a much-reduced computational cost. The p-value trace is an informative graphical tool for evaluating the results of a test of significance of ridge regression coefficients as the shrinkage parameter increases, and the proposed test makes its production computationally feasible.
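The shrinkage behind the p-value trace can be illustrated by the ridge coefficient path beta(λ) = (X'X + λI)⁻¹X'y over increasing values of the shrinkage parameter λ. The sketch below computes only the coefficients: the paper's significance test and p-values are not reproduced, no intercept is fitted for brevity, and the nearly collinear toy data (standing in for correlated SNPs) are hypothetical.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ridge(X: Sequence[Sequence[float]], y: Sequence[float], lam: float) -> List[float]:
    """Ridge estimate (X'X + lam*I)^{-1} X'y, without an intercept."""
    p = len(X[0])
    xtx = [[sum(r[j] * r[k] for r in X) + (lam if j == k else 0.0) for k in range(p)]
           for j in range(p)]
    xty = [sum(r[j] * yi for r, yi in zip(X, y)) for j in range(p)]
    return solve(xtx, xty)


# Two strongly correlated predictors; y is exactly x1 + x2.
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]]
y = [3.0, 3.0, 7.0, 7.0, 10.0]
for lam in (0.0, 1.0, 10.0, 100.0):
    print(lam, [round(c, 4) for c in ridge(X, y, lam)])
```

At λ = 0 the solution is the ordinary least squares fit; as λ grows, the length of the coefficient vector can only decrease, which is why significance has to be re-assessed along the shrinkage path.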
A Monte Carlo simulation study comparing linear regression, beta regression, variable-dispersion beta regression and fractional logit regression at recovering average difference measures in a two sample design.
Meaney, Christopher; Moineddin, Rahim
In biomedical research, response variables are often encountered that have bounded support on the open unit interval (0, 1). Traditionally, researchers have attempted to estimate covariate effects on these types of response data using linear regression. Alternative modelling strategies include beta regression, variable-dispersion beta regression and fractional logit regression models. This study employs a Monte Carlo simulation design to compare the statistical properties of the linear regression model with those of the more novel beta regression, variable-dispersion beta regression and fractional logit regression models. In the Monte Carlo experiment we assume a simple two-sample design. We assume that observations are realizations of independent draws from their respective probability models. The randomly simulated draws from the various probability models are chosen to emulate average proportion/percentage/rate differences of pre-specified magnitudes. Following simulation of the experimental data, we estimate average proportion/percentage/rate differences. We compare the estimators in terms of bias, variance, type-1 error and power. Estimates of the Monte Carlo error associated with these quantities are provided. If response data are beta distributed with constant dispersion parameters across the two samples, then all models are unbiased and have reasonable type-1 error rates and power profiles. If the response data in the two samples have different dispersion parameters, then the simple beta regression model is biased. When the sample size is small (N0 = N1 = 25), linear regression has superior type-1 error rates compared to the other models. Small-sample type-1 error rates can be improved in beta regression models using bias correction/reduction methods. In the power experiments, variable-dispersion beta regression and fractional logit regression models have slightly elevated power compared to linear regression models. Similar results were observed if the
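The Monte Carlo design can be sketched for the linear regression estimator alone. With a single 0/1 group indicator, the OLS slope equals the difference between the two sample means, so its bias and Monte Carlo standard error can be estimated from repeated beta draws. The beta parameters (chosen with unequal dispersion), the replication count and the seed are assumptions for illustration; the beta, variable-dispersion beta and fractional logit fits are not shown.

```python
import random
from typing import Sequence, Tuple


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def simulate_linear_regression_bias(a0: float, b0: float, a1: float, b1: float,
                                    n: int = 25, reps: int = 2000,
                                    seed: int = 2024) -> Tuple[float, float]:
    """Monte Carlo bias (and its Monte Carlo standard error) of the OLS group effect
    when sample 0 is drawn from Beta(a0, b0) and sample 1 from Beta(a1, b1)."""
    rng = random.Random(seed)
    true_diff = a1 / (a1 + b1) - a0 / (a0 + b0)  # difference in beta means
    estimates = []
    for _ in range(reps):
        y0 = [rng.betavariate(a0, b0) for _ in range(n)]
        y1 = [rng.betavariate(a1, b1) for _ in range(n)]
        # OLS slope on a 0/1 group indicator = difference in sample means
        estimates.append(mean(y1) - mean(y0))
    m = mean(estimates)
    sd = (sum((e - m) ** 2 for e in estimates) / (reps - 1)) ** 0.5
    return m - true_diff, sd / reps ** 0.5


# Unequal dispersion: Beta(2, 5) has precision a + b = 7, Beta(6, 10) has precision 16.
bias, mc_error = simulate_linear_regression_bias(2.0, 5.0, 6.0, 10.0)
print(f"bias = {bias:.4f} (Monte Carlo SE {mc_error:.4f})")
```

The same loop could record a test statistic per replication to estimate type-1 error and power, in line with the study design.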
Ma, Yu-Feng; Wang, Qing-Fu; Chen, Zhao-Jun; Du, Chun-Lin; Li, Jun-Hai; Huang, Hu; Shi, Zong-Ting; Yin, Yue-Shan; Zhang, Lei; A-Di, Li-Jiang; Dong, Shi-Yu; Wu, Ji
To perform multiple linear regression analysis of X-ray measurements and WOMAC scores in knee osteoarthritis, and to analyze their relationship with clinical and biomechanical concepts. From March 2011 to July 2011, 140 patients (250 knees) were reviewed, including 132 left knees and 118 right knees; ages ranged from 40 to 71 years, with an average of 54.68 years. The MB-RULER measurement software was used to measure the femoral angle, tibial angle, femorotibial angle and joint gap angle on anteroposterior and lateral X-rays. WOMAC scores were also collected. Multiple regression equations were then applied to analyze the correlation between the X-ray measurements and WOMAC scores. The regression equation of AP X-ray values and WOMAC scores was statistically significant, whereas the regression equation of lateral X-ray values and WOMAC scores was not (P>0.05). 1) X-ray measurement of the knee joint can reflect the WOMAC scores to a certain extent. 2) It is necessary to measure the X-ray mechanical axis of the knee, which is important for the diagnosis and treatment of osteoarthritis. 3) The correlation between the tibial angle and joint gap angle on anteroposterior X-rays and WOMAC scores is significant, and can be used to assess the functional recovery of patients before and after treatment.
Crane, Paul K; Gibbons, Laura E; Jolley, Lance; van Belle, Gerald
We present an ordinal logistic regression model for identification of items with differential item functioning (DIF) and apply this model to a Mini-Mental State Examination (MMSE) dataset. We employ item response theory ability estimation in our models. Three nested ordinal logistic regression models are applied to each item. Model testing begins with examination of the statistical significance of the interaction term between ability and the group indicator, consistent with nonuniform DIF. Then we turn our attention to the coefficient of the ability term in models with and without the group term. If including the group term has a marked effect on that coefficient, we declare that it has uniform DIF. We examined DIF related to language of test administration in addition to self-reported race, Hispanic ethnicity, age, years of education, and sex. We used PARSCALE for IRT analyses and STATA for ordinal logistic regression approaches. We used an iterative technique for adjusting IRT ability estimates on the basis of DIF findings. Five items were found to have DIF related to language. These same items also had DIF related to other covariates. The ordinal logistic regression approach to DIF detection, when combined with IRT ability estimates, provides a reasonable alternative for DIF detection. There appear to be several items with significant DIF related to language of test administration in the MMSE. More attention needs to be paid to the specific criteria used to determine whether an item has DIF, not just the technique used to identify DIF.
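The two decision rules can be sketched as follows, assuming the three nested ordinal logistic models have already been fitted (for example in Stata) and their log-likelihoods and ability coefficients are available. Nonuniform DIF is tested with a 1-degree-of-freedom likelihood ratio test of the ability-by-group interaction; uniform DIF is flagged when adding the group term changes the ability coefficient by more than a relative threshold. The 0.05 level, the 10% threshold and the choice of denominator are assumptions, not values taken from the paper, and the numbers in the usage line are hypothetical.

```python
import math
from typing import Dict


def chi2_1df_pvalue(statistic: float) -> float:
    """Upper-tail p-value of a chi-square(1) statistic: P(chi2_1 > s) = erfc(sqrt(s / 2))."""
    return math.erfc(math.sqrt(statistic / 2.0))


def classify_dif(loglik_model2: float, loglik_model3: float,
                 ability_coef_model1: float, ability_coef_model2: float,
                 alpha: float = 0.05, change_threshold: float = 0.10) -> Dict[str, object]:
    """Model 1: ability; model 2: ability + group; model 3: adds ability x group.

    Nonuniform DIF: likelihood ratio test of model 3 against model 2 (1 df).
    Uniform DIF: relative change in the ability coefficient when the group term is added.
    """
    lr = 2.0 * (loglik_model3 - loglik_model2)
    p_nonuniform = chi2_1df_pvalue(max(lr, 0.0))
    relative_change = abs(ability_coef_model1 - ability_coef_model2) / abs(ability_coef_model2)
    return {
        "p_nonuniform": p_nonuniform,
        "nonuniform_dif": p_nonuniform < alpha,
        "relative_change": relative_change,
        "uniform_dif": relative_change > change_threshold,
    }


print(classify_dif(loglik_model2=-100.0, loglik_model3=-97.0,
                   ability_coef_model1=1.2, ability_coef_model2=1.0))
```

As the abstract stresses, the flags depend on the chosen criteria as much as on the models, so alpha and change_threshold are exposed as parameters.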
Liu, Ming; Yin, Xiaobo; Zhang, Xiang
The present invention provides a graphene optical modulator with one or more layers. In a first exemplary embodiment, the optical modulator includes an optical waveguide, a nanoscale oxide spacer adjacent to a working region of the waveguide, and a monolayer graphene sheet adjacent to the spacer. In a second exemplary embodiment, the optical modulator includes at least one pair of active media, where the pair includes an oxide spacer, a first monolayer graphene sheet adjacent to a first side of the spacer, and a second monolayer graphene sheet adjacent to a second side of the spacer, and at least one optical waveguide adjacent to the pair.
Aguilar-Ruiz Jesus S
Background: Novel strategies are required to handle the huge amount of data produced by microarray technologies. To infer gene regulatory networks, the first step is to find direct regulatory relationships between genes by building so-called gene co-expression networks. These are typically generated using correlation statistics as pairwise similarity measures. Correlation-based methods are very useful for determining whether two genes have a strong global similarity, but they do not detect local similarities. Results: We propose model trees as a method to identify gene interaction networks. Whereas correlation-based methods analyze each pair of genes, our approach generates a single regression tree for each gene from the remaining genes. Finally, a graph of all the relationships between output and input genes is built, taking into account whether each gene pair is statistically significant; for this purpose, we apply a statistical procedure to control the false discovery rate. The performance of our approach, named REGNET, is experimentally tested on two well-known data sets: Saccharomyces cerevisiae and E. coli. First, the biological coherence of the results is tested. Second, the E. coli transcriptional network (from the Regulon database) is used as a control to compare the results with those of a correlation-based method. This experiment shows that REGNET detects true gene associations more accurately than the Pearson and Spearman zeroth- and first-order correlation-based methods. Conclusions: REGNET generates gene association networks from gene expression data, and differs from correlation-based methods in that the relationship between one gene and the others is calculated simultaneously. Model trees are very useful techniques for estimating the numerical values of the target genes by linear regression functions. They are very often more precise than linear regression models because they can add just different linear
Fang, Xiaozhao; Xu, Yong; Li, Xuelong; Lai, Zhihui; Wong, Wai Keung; Fang, Bingwu
Linear regression (LR) and some of its variants have been widely used for classification problems. Most of these methods assume that, during the learning phase, the training samples can be exactly transformed into a strict binary label matrix, which has too little freedom to fit the labels adequately. To address this problem, in this paper we propose a novel regularized label relaxation LR method, which has the following notable characteristics. First, the proposed method relaxes the strict binary label matrix into a slack variable matrix by introducing a nonnegative label relaxation matrix into LR, which provides more freedom to fit the labels and simultaneously enlarges the margins between different classes as much as possible. Second, the proposed method constructs a class compactness graph based on manifold learning and uses it as a regularization term to avoid overfitting. The class compactness graph is used to ensure that samples sharing the same labels are kept close after they are transformed. Two different algorithms, based respectively on -norm and -norm loss functions, are devised. These two algorithms have compact closed-form solutions in each iteration, so they are easily implemented. Extensive experiments show that these two algorithms outperform state-of-the-art algorithms in terms of classification accuracy and running time.
The 1988 progress report of the Applied Mathematics Center (Polytechnic School, France) is presented. The research fields of the Center are scientific computing, probability and statistics, and video image synthesis. The research topics developed are: the analysis of numerical methods, the mathematical analysis of fundamental models in physics and mechanics, the numerical solution of complex models related to industrial problems, stochastic calculus and Brownian motion, stochastic partial differential equations, the identification of adaptive filtering parameters, discrete element systems, statistics, stochastic control, and the development of image synthesis techniques for education and research programs. The published papers, conference communications and theses are listed.
Passmore, David L.; Mohamed, Dominic A.
Describes the workings of a simple two-way table of employment status by sex and extends this table to include school enrollment status by sex, race, and high school graduation status using logistic regression techniques. (JOW)
National Aeronautics and Space Administration: Regression problems on massive data sets are ubiquitous in many application domains, including the Internet, earth and space sciences, and finance. In many cases,...
Yang, Heon Young; Na, Man Gyun
PWRs (Pressurized Water Reactors) generally operate in the nucleate boiling state. However, the transition from nucleate boiling to film boiling, with its conspicuously reduced heat transfer, induces a boiling crisis that may eventually cause fuel clad melting. This type of boiling crisis is called the Departure from Nucleate Boiling (DNB) phenomenon. Because predicting the minimum DNB ratio (DNBR) in a reactor core is very important for preventing boiling crises such as clad melting, much research has been conducted to predict DNBR values. The objective of this research is to predict the minimum DNBR by applying support vector regression (SVR) to the measured signals of a reactor coolant system (RCS). SVR has been extensively and successfully applied to nonlinear function approximation problems such as this one, in which the DNBR is a function of various input variables such as reactor power, reactor pressure, core mass flow rate and control rod positions. The minimum DNBR in a reactor core is predicted using these operating condition data as inputs to the SVR. The minimum DNBR values predicted by the SVR were compared with COLSS values, confirming the accuracy of the SVR predictions.
Reed, Phil; Wu, Yaqionq
To outline the uses of logistic regression and other statistical methods for risk factor analysis in the context of research on stuttering. The principles underlying the application of logistic regression are illustrated, and the types of questions to which such a technique has been applied in the stuttering field are outlined. The assumptions and limitations of the technique are discussed with respect to existing stuttering research, and with respect to formulating appropriate research strategies to accommodate these considerations. Finally, some alternatives to the approach are briefly discussed. The way the statistical procedures are employed is demonstrated with some hypothetical data. Research into several practical issues concerning stuttering could benefit if risk factor modelling were used. Important examples are early diagnosis, prognosis (whether a child will recover or persist) and assessment of treatment outcome. After reading this article you will be able to: (a) summarize the situations in which logistic regression can be applied to a range of issues about stuttering; (b) follow the steps in performing a logistic regression analysis; (c) describe the assumptions of the logistic regression technique and the precautions that need to be checked when it is employed; (d) summarize its advantages over other techniques, such as estimation of group differences and simple regression. Copyright © 2012 Elsevier Inc. All rights reserved.
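As a small illustration of the mechanics, a logistic regression with one binary risk factor can be fitted by Newton-Raphson. The data are hypothetical (presence of a risk factor, and whether stuttering persists), in the spirit of the article's own hypothetical examples, and the function name is illustrative. For a single binary predictor, the fitted odds ratio exp(b1) equals the cross-product ratio of the 2×2 table, which gives an easy check.

```python
import math
from typing import Sequence, Tuple


def fit_logistic_1d(x: Sequence[float], y: Sequence[int],
                    iterations: int = 25) -> Tuple[float, float]:
    """Fit logit P(y=1) = b0 + b1*x by Newton-Raphson; returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # score (gradient of the log-likelihood)
            g1 += (yi - p) * xi
            h00 += w                # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: (b0, b1) += information^{-1} @ score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical data: risk factor present (1) or absent (0); stuttering persists (1) or recovers (0).
risk = [1] * 20 + [0] * 20
persist = [1] * 12 + [0] * 8 + [1] * 6 + [0] * 14
b0, b1 = fit_logistic_1d(risk, persist)
print("odds ratio:", math.exp(b1))
```

Here the 2×2 table gives (12 × 14)/(8 × 6) = 3.5, which the fitted odds ratio reproduces; with continuous or multiple predictors no such shortcut exists and the iterative fit is needed.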
Manjula, R.; Jain, Shubham; Srivastava, Sharad; Rajiv Kher, Pranav
The real estate market is one of the most competitive in terms of pricing, and prices tend to vary significantly based on many factors; hence it is one of the prime fields in which to apply machine learning concepts to optimize and predict prices with high accuracy. In this paper, we therefore present various important features to use when predicting housing prices with good accuracy. We describe regression models that use various features to achieve a lower residual sum of squares (RSS) error. When features are used in a regression model, some feature engineering is required for better prediction. Often a set of features (multiple regression) or polynomial regression (applying various powers to the features) is used to obtain a better model fit. Because these models are expected to be susceptible to overfitting, ridge regression is used to reduce it. This paper thus points to the best application of regression models, in addition to other techniques, to optimize the result.
Ulbrich, N.; Bader, Jon B.
Calibration data of a wind tunnel sting balance was processed using a candidate math model search algorithm that recommends an optimized regression model for the data analysis. During the calibration the normal force and the moment at the balance moment center were selected as independent calibration variables. The sting balance itself had two moment gages. Therefore, after analyzing the connection between calibration loads and gage outputs, it was decided to choose the difference and the sum of the gage outputs as the two responses that best describe the behavior of the balance. The math model search algorithm was applied to these two responses. An optimized regression model was obtained for each response. Classical strain gage balance load transformations and the equations of the deflection of a cantilever beam under load are used to show that the search algorithm's two optimized regression models are supported by a theoretical analysis of the relationship between the applied calibration loads and the measured gage outputs. The analysis of the sting balance calibration data set is a rare example of a situation when terms of a regression model of a balance can directly be derived from first principles of physics. In addition, it is interesting to note that the search algorithm recommended the correct regression model term combinations using only a set of statistical quality metrics that were applied to the experimental data during the algorithm's term selection process.
Sarkar, Abhra; Mallick, Bani K; Carroll, Raymond J
We consider the problem of robust estimation of the regression relationship between a response and a covariate based on a sample in which precise measurements on the covariate are not available, but error-prone surrogates for the unobserved covariate are available for each sampled unit. Existing methods often make restrictive and unrealistic assumptions about the density of the covariate and the densities of the regression and measurement errors, for example normality and, for the latter two, also homoscedasticity and thus independence from the covariate. In this article we describe Bayesian semiparametric methodology based on mixtures of B-splines and mixtures induced by Dirichlet processes that relaxes these restrictive assumptions. In particular, our models for the aforementioned densities adapt to asymmetry, heavy tails and multimodality. The models for the densities of regression and measurement errors also accommodate conditional heteroscedasticity. In simulation experiments, our method vastly outperforms existing methods. We apply our method to data from nutritional epidemiology. © 2014, The International Biometric Society.
Zhu, K; Lou, Z; Zhou, J; Ballester, N; Kong, N; Parikh, P
This article is part of the Focus Theme of Methods of Information in Medicine on "Big Data and Analytics in Healthcare". Hospital readmissions raise healthcare costs and cause significant distress to providers and patients. It is therefore of great interest to healthcare organizations to predict which patients are at risk of being readmitted to their hospitals. However, current risk prediction models based on logistic regression have limited predictive power when applied to hospital administrative data. Meanwhile, although decision trees and random forests have been applied, they tend to be too complex for hospital practitioners to understand. Objective: to explore the use of conditional logistic regression to increase prediction accuracy. We analyzed an HCUP statewide inpatient discharge record dataset from California, which includes patient demographics, clinical data and care utilization data. We extracted records of heart failure Medicare beneficiaries who had an inpatient stay during an 11-month period. We corrected the data imbalance issue with under-sampling. In our study, we first applied standard logistic regression and a decision tree to obtain influential variables and derive practically meaningful decision rules. We then stratified the original data set accordingly and applied logistic regression to each data stratum. We further explored the effect of interacting variables in the logistic regression modeling. We conducted cross-validation to assess the overall prediction performance of conditional logistic regression (CLR) and compared it with standard classification models. The developed CLR models outperformed several standard classification models (e.g., straightforward logistic regression, stepwise logistic regression, random forest, support vector machine). For example, the best CLR model improved the classification accuracy by nearly 20% over the straightforward logistic regression model. Furthermore, the developed CLR models tend to achieve better sensitivity of
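Two preprocessing steps described above, random under-sampling of the majority class and splitting the data into strata before fitting a separate logistic regression in each, can be sketched in a few lines. The 1:1 class ratio, the seed, the toy cohort and the age-based stratification rule (a stand-in for a rule read off a decision tree) are all hypothetical, and the per-stratum model fitting is omitted. Note that in this abstract "conditional logistic regression" refers to fitting logistic regression within each stratum, which differs from the matched-set conditional likelihood method usually known by that name.

```python
import random
from typing import Callable, Dict, Hashable, List, Sequence, Tuple


def undersample(records: Sequence, labels: Sequence[int],
                seed: int = 0) -> Tuple[list, List[int]]:
    """Randomly drop majority-class records so that both classes end up the same size."""
    rng = random.Random(seed)
    positives = [i for i, lab in enumerate(labels) if lab == 1]
    negatives = [i for i, lab in enumerate(labels) if lab == 0]
    if len(positives) <= len(negatives):
        minority, majority = positives, negatives
    else:
        minority, majority = negatives, positives
    keep = sorted(minority + rng.sample(majority, len(minority)))
    return [records[i] for i in keep], [labels[i] for i in keep]


def stratify(records: Sequence, labels: Sequence[int],
             rule: Callable[[object], Hashable]) -> Dict[Hashable, Tuple[list, List[int]]]:
    """Group records and labels by the stratum a (hypothetical) decision rule assigns."""
    strata: Dict[Hashable, Tuple[list, List[int]]] = {}
    for rec, lab in zip(records, labels):
        recs, labs = strata.setdefault(rule(rec), ([], []))
        recs.append(rec)
        labs.append(lab)
    return strata


# Hypothetical cohort: record = age, label 1 = readmitted (10 of 100 patients).
ages = [65 + (i % 30) for i in range(100)]
readmitted = [1 if i % 10 == 0 else 0 for i in range(100)]
balanced_ages, balanced_labels = undersample(ages, readmitted)
strata = stratify(balanced_ages, balanced_labels, rule=lambda age: age >= 80)
for key, (recs, labs) in strata.items():
    # A separate logistic regression would be fitted to each stratum here.
    print(key, len(recs), sum(labs))
```

Under-sampling changes the baseline event rate, so predicted probabilities from models trained on the balanced data would need recalibration before being read as absolute readmission risks.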
Mallozzi, P.J.; Epstein, H.M.; Jung, R.G.; Applebaum, D.C.; Fairand, B.P.; Gallagher, W.J.; Uecker, R.L.; Muckerheide, M.C.
The invention discloses a method and apparatus for applying radiation by producing X-rays of a selected spectrum and intensity and directing them to a desired location. Radiant energy is directed from a laser onto a target to produce such X-rays at the target, which is so positioned adjacent to the desired location as to emit the X-rays toward the desired location; or such X-rays are produced in a region away from the desired location, and are channeled to the desired location. The radiant energy directing means may be shaped (as with bends; adjustable, if desired) to circumvent any obstruction between the laser and the target. Similarly, the X-ray channeling means may be shaped (as with fixed or adjustable bends) to circumvent any obstruction between the region where the X-rays are produced and the desired location. For producing a radiograph in a living organism the X-rays are provided in a short pulse to avoid any blurring of the radiograph from movement of or in the organism. For altering tissue in a living organism the selected spectrum and intensity are such as to affect substantially the tissue in a preselected volume without injuring nearby tissue. Typically, the selected spectrum comprises the range of about 0.1 to 100 keV, and the intensity is selected to provide about 100 to 1000 rads at the desired location. The X-rays may be produced by stimulated emission thereof, typically in a single direction
Tracey, Terrence J.; Sedlacek, William E.
A study of the effectiveness of ridge regression over ordinary least squares regression as applied to both cognitive and noncognitive admissions data is reported. Separate race equations and a general equation were used. The analysis used did not improve on existing regression analyses. (MSE)
This paper applies the regression-based inequality decomposition approach to explore determinants of income inequality in Cameroon using the 2007 Cameroon household consumption survey. The contribution of each source to measured income inequality is the sum of its weighted marginal contributions in all possible ...
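Weighting each source's marginal contributions over all possible subsets of the other sources corresponds to a Shapley-value decomposition (an interpretation assumed here, since the abstract is truncated). A minimal sketch, assuming a user-supplied function that returns the inequality measure when only the income sources in a given subset are allowed to vary; how that function is built from the fitted income regression is not specified here, and the source names and values below are hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, Sequence


def shapley_decomposition(sources: Sequence[str],
                          value: Callable[[FrozenSet[str]], float]) -> Dict[str, float]:
    """Shapley share of each source: its marginal contribution value(S | {s}) - value(S),
    summed over subsets S of the other sources with weights |S|! (n-|S|-1)! / n!."""
    n = len(sources)
    shares: Dict[str, float] = {}
    for s in sources:
        others = [t for t in sources if t != s]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                base = frozenset(subset)
                total += weight * (value(base | {s}) - value(base))
        shares[s] = total
    return shares


# Hypothetical inequality values (e.g. a Gini index) when only the listed sources vary.
gini = {
    frozenset(): 0.0,
    frozenset({"wages"}): 0.30,
    frozenset({"farm"}): 0.10,
    frozenset({"wages", "farm"}): 0.35,
}
shares = shapley_decomposition(["wages", "farm"], gini.__getitem__)
print(shares)
```

The shares always add up to value(all sources) minus value(no sources), so the decomposition is exact regardless of the order in which sources would otherwise be entered; the cost is exponential in the number of sources.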
The influence of individual cases in a data set is studied when variable selection is applied in multiple linear regression. Two different influence measures, based on the Cp criterion and Akaike's information criterion, are introduced. The relative change in the selection criterion when an individual case is omitted is proposed ...
The paper discusses the merits of partial shrinkage of the ordinary least squares estimator of the coefficients of the full-rank multiple regression model. Theoretical comparisons of the scalar and matrix-valued risks of the partially shrunken and totally shrunken estimators are given. The strategy of partial shrinkage is applied to ...
Kilmer, J T; Rodríguez, R L
When it comes to fitting simple allometric slopes through measurement data, evolutionary biologists have been torn between regression methods. On the one hand, there is the ordinary least squares (OLS) regression, which is commonly used across many disciplines of biology to fit lines through data, but which has a reputation for underestimating slopes when measurement error is present. On the other hand, there is the reduced major axis (RMA) regression, which is often recommended as a substitute for OLS regression in studies of allometry, but which has several weaknesses of its own. Here, we review statistical theory as it applies to evolutionary biology and studies of allometry. We point out that the concerns that arise from measurement error for OLS regression are small and straightforward to deal with, whereas RMA has several key properties that make it unfit for use in the field of allometry. The recommended approach for researchers interested in allometry is to use OLS regression on measurements taken with low (but realistically achievable) measurement error. If measurement error is unavoidable and relatively large, it is preferable to correct for slope attenuation rather than to turn to RMA regression, or to take the expected amount of attenuation into account when interpreting the data. © 2016 European Society For Evolutionary Biology. Journal of Evolutionary Biology © 2016 European Society For Evolutionary Biology.
A number of novel features of QCD are reviewed, including the consequences of formation zone and color transparency phenomena in hadronic collisions, the use of automatic scale setting for perturbative predictions, null-zone phenomena as a fundamental test of gauge theory, and the relationship of intrinsic heavy colored particle Fock state components to new particle production. We conclude with a review of the applications of QCD to nuclear multiquark systems. 74 references
Lynch, Scott M
"Introduction to Applied Bayesian Statistics and Estimation for Social Scientists" covers the complete process of Bayesian statistical analysis in great detail, from the development of a model through the process of making statistical inference. The key feature of this book is that it covers the models most commonly used in social science research (including the linear regression model, generalized linear models, hierarchical models, and multivariate regression models), and it thoroughly develops each real-data example in painstaking detail. The first part of the book provides a detailed
Kalialis, Louise V; Drzewiecki, Krzysztof T; Mohammadi, Mahin
A case of a 61-year-old male with widespread metastatic melanoma is presented 5 years after complete spontaneous cure. Spontaneous regression occurred in cutaneous, pulmonary, hepatic and cerebral metastases. A review of the literature reveals seven cases of regression of cerebral metastases ...; this report is the first to document complete spontaneous regression of cerebral metastases from malignant melanoma by means of computed tomography scans. Spontaneous regression is defined as the partial or complete disappearance of a malignant tumour in the absence of all treatment or in the presence ... of therapy, which is considered inadequate to exert a significant influence on neoplastic disease. The incidence of spontaneous regression of metastases from malignant melanoma is approximately one per 400 patients, and possible mechanisms include immunologic, endocrine, inflammatory and tumour nutritional...
A. Alexander Beaujean
Education researchers often study count variables, such as the number of times a student reached a goal, discipline referrals, and absences. Most researchers who study these variables use typical regression methods (i.e., ordinary least squares), either with or without transforming the count variables. In either case, using typical regression for count data can produce biased parameter estimates, which weakens any inferences drawn from such data. As count-variable regression models are seldom taught in training programs, we present a tutorial to help educational researchers use such methods in their own research. We demonstrate analyzing and interpreting count data using Poisson, negative binomial, zero-inflated Poisson, and zero-inflated negative binomial regression models. The count regression methods are introduced through an example using the number of times students skipped class. The data for this example are freely available, and the R syntax used to run the example analyses is included in the Appendix.
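The tutorial's own examples use R. As a language-neutral illustration of what a Poisson regression estimates, here is a minimal Python sketch of Poisson regression with a log link fitted by iteratively reweighted least squares (IRLS). The function name `poisson_irls` and the convergence settings are assumptions. Negative binomial and zero-inflated models need extra parameters and are not shown. Exponentiated coefficients are incidence-rate ratios.

```python
import numpy as np

def poisson_irls(X, y, max_iter=50, tol=1e-10):
    """Poisson regression (log link) by IRLS; column 0 of X is assumed to be the intercept."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())          # start at the overall mean rate (needs y.mean() > 0)
    for _ in range(max_iter):
        eta = X @ beta                  # linear predictor
        mu = np.exp(eta)                # fitted mean counts
        z = eta + (y - mu) / mu         # working response
        XtW = X.T * mu                  # working weights equal mu for the canonical log link
        new_beta = np.linalg.solve(XtW @ X, XtW @ z)
        if np.max(np.abs(new_beta - beta)) < tol:
            return new_beta
        beta = new_beta
    return beta
```

With a single binary predictor, the fitted intercept is the log of the mean count in the reference group, and the slope is the log of the ratio of the two group means. This makes the output easy to check by hand.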
Kojo, Nobuto; Tokutomi, Takashi; Eguchi, Gihachirou; Takagi, Shigeyuki; Matsumoto, Tomie; Sasaguri, Yasuyuki; Shigemori, Minoru.
In a 46-year-old female with a 1-month history of gait and speech disturbances, computed tomography (CT) demonstrated mass lesions of slightly high density in the left basal ganglia and left frontal lobe. The lesions were markedly enhanced by contrast medium. The patient received no specific treatment, but her clinical manifestations gradually abated and the lesions decreased in size. Five months after her initial examination, the lesions were absent on CT scans; only a small area of low density remained. Residual clinical symptoms included mild right hemiparesis and aphasia. After 14 months the patient again deteriorated, and a CT scan revealed mass lesions in the right frontal lobe and the pons. However, no enhancement was observed in the previously affected regions. A biopsy revealed malignant lymphoma. Despite treatment with steroids and radiation, the patient's clinical status progressively worsened and she died 27 months after initial presentation. Seven other cases of spontaneous regression of primary malignant lymphoma have been reported. In this case, the mechanism of the spontaneous regression was not clear, but changes in immunologic status may have been involved. (author)
Li, Wenfa; Liu, Hongzhe; Yang, Peng; Xie, Wei
As one of the most popular statistical and machine learning models, logistic regression with regularization has found wide adoption in biomedicine, the social sciences, information technology, and other fields. These domains often involve data on human subjects that are subject to strict privacy regulations. Concerns over data privacy make it increasingly difficult to coordinate and conduct large-scale collaborative studies, which typically rely on cross-institution data sharing and joint analysis. Our work focuses on safeguarding regularized logistic regression, a widely used statistical model that has nonetheless not been investigated from a data security and privacy perspective. We consider a common use scenario of multi-institution collaborative studies, such as research consortia or networks, as widely seen in genetics, epidemiology, and the social sciences. To make our privacy-enhancing solution practical, we demonstrate a non-conventional and computationally efficient method that leverages distributed computing and strong cryptography to provide comprehensive protection of individual-level and summary data. Extensive empirical evaluations on several studies validate the privacy guarantee, efficiency and scalability of our proposal. We also discuss the practical implications of our solution for large-scale studies and applications in various disciplines, including genetic and biomedical studies, smart grids, and network analysis.
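The abstract does not describe the protocol itself, so the following is not the authors' method. It is a minimal Python sketch of one generic idea used in privacy-preserving federated fitting. Each site computes its local gradient of the logistic loss and adds pairwise random masks that cancel in the sum, so the coordinator sees only the aggregate gradient of an L2-regularized objective. The floating-point masks drawn from NumPy are for illustration only and are not cryptographically secure. Real systems use cryptographic randomness and fixed-point or modular arithmetic. All function names are hypothetical.

```python
import numpy as np

def local_gradient(X, y, beta):
    """Gradient of the (unpenalized) logistic log-loss on one site's data."""
    prob = 1.0 / (1.0 + np.exp(-(X @ beta)))
    return X.T @ (prob - y)

def pairwise_masks(n_sites, dim, rng):
    """Masks that sum to zero: site i adds r_ij and site j subtracts it (illustrative, not secure)."""
    masks = [np.zeros(dim) for _ in range(n_sites)]
    for i in range(n_sites):
        for j in range(i + 1, n_sites):
            r = rng.normal(scale=1e3, size=dim)
            masks[i] += r
            masks[j] -= r
    return masks

def federated_ridge_logistic(sites, lam=1.0, lr=0.1, n_iter=2000, seed=0):
    """Gradient descent on (sum of log-losses) + (lam/2)*||beta||^2,
    where the coordinator only ever receives masked site gradients."""
    rng = np.random.default_rng(seed)
    dim = sites[0][0].shape[1]
    n_total = sum(len(y) for _, y in sites)
    beta = np.zeros(dim)
    for _ in range(n_iter):
        masks = pairwise_masks(len(sites), dim, rng)
        shared = [local_gradient(X, y, beta) + m for (X, y), m in zip(sites, masks)]
        total_grad = np.sum(shared, axis=0)            # the masks cancel in the sum
        beta = beta - lr * (total_grad + lam * beta) / n_total
    return beta
```

Because the masks cancel exactly in exact arithmetic, the iterates match pooled gradient descent up to floating-point rounding. This is the property the check below relies on when comparing against a pooled Newton fit.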
Rodríguez, F Lucas; Atanassov, I; Burkimsher, P; Frost, O; Taskinen, J; Tulimaki, V
The Detector Control System of the TOTEM experiment at the LHC is built with the industrial product WinCC OA (PVSS). The TOTEM system is generated automatically by scripts. These take as input the detector Product Breakdown Structure (PBS) and its pinout connectivity, archiving and alarm meta-information, and some other heuristics based on naming conventions. When those initial parameters and the automation code are modified to include new features, the resulting PVSS system can also acquire unintended side effects. On a daily basis, a custom-developed regression testing tool takes the most recent code from a Subversion (SVN) repository and builds a new control system from scratch. This system is exported in plain-text format using the PVSS export tool and compared with a system previously validated by a human. A report highlighting any differences is sent to the developers, ready for validation and acceptance as a new stable version. This regression approach does not depend on any particular development framework or methodology. The process has worked satisfactorily for several months and has proven to be a very valuable tool before new versions are deployed in the production systems.
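The comparison step can be sketched with Python's standard `difflib`: export the freshly built system as plain text, diff it against the validated export, and report any differences. The export lines in the check below are a made-up stand-in. The real PVSS export format and the internals of the TOTEM tool are not described in the abstract, and the function names are hypothetical.

```python
import difflib

def compare_exports(validated: str, candidate: str) -> list:
    """Unified diff between a validated plain-text export and a freshly built one."""
    return list(difflib.unified_diff(
        validated.splitlines(), candidate.splitlines(),
        fromfile="validated", tofile="candidate", lineterm=""))

def build_report(validated: str, candidate: str) -> str:
    """Human-readable report for developers: empty diff means the build matches."""
    diff = compare_exports(validated, candidate)
    if not diff:
        return "No differences: candidate matches the validated system."
    return "Differences found:\n" + "\n".join(diff)
```

A text diff keeps the check independent of the development framework, in line with the abstract's point. It works with whatever the export tool produces, as long as the output is deterministic in ordering.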
Made Tirta, I.; Anggraeni, Dian; Pandutama, Martinus
Regression analysis (statistical modelling) is among the statistical methods most frequently needed in analyzing quantitative data, especially to model the relationship between response and explanatory variables. Statistical models have now been developed in various directions to capture many types of complex relationships in data. A rich variety of advanced and recent statistical models is available, mostly in open-source software (R being one example). However, these advanced models are not very friendly to novice R users, since they rely on programming scripts or a command-line interface. Our research aims to develop a web interface (based on R and Shiny) so that the most recent and advanced statistical models are readily available, accessible and applicable on the web. We have previously built interfaces in the form of e-tutorials for several modern and advanced statistical models in R, especially for independent responses. These include linear models (LM), generalized linear models (GLM), generalized additive models (GAM) and generalized additive models for location, scale and shape (GAMLSS). In this research we unify them in the form of a data-analysis tool, including models that use computer-intensive statistics (bootstrap and Markov chain Monte Carlo, MCMC). All are readily accessible in our online Virtual Statistics Laboratory. The web interface makes statistical models easier to apply and easier to compare in order to find the most appropriate model for the data.
Proposal to consistently apply the International Code of Nomenclature of Prokaryotes (ICNP) to names of the oxygenic photosynthetic bacteria (cyanobacteria), including those validly published under the International Code of Botanical Nomenclature (ICBN)/International Code of Nomenclature for algae, fungi and plants (ICN), and proposal to change Principle 2 of the ICNP.
Pinevich, Alexander V
This taxonomic note was motivated by the recent proposal [Oren & Garrity (2014) Int J Syst Evol Microbiol 64, 309-310] to exclude the oxygenic photosynthetic bacteria (cyanobacteria) from the wording of General Consideration 5 of the International Code of Nomenclature of Prokaryotes (ICNP), which entails unilateral coverage of these prokaryotes by the International Code of Nomenclature for algae, fungi, and plants (ICN; formerly the International Code of Botanical Nomenclature, ICBN). On the basis of key viewpoints, approaches and rules in the systematics, taxonomy and nomenclature of prokaryotes it is reciprocally proposed to apply the ICNP to names of cyanobacteria including those validly published under the ICBN/ICN. For this purpose, a change to Principle 2 of the ICNP is proposed to enable validation of cyanobacterial names published under the ICBN/ICN rules. © 2015 IUMS.
Shi, Jian Qing
Gaussian Process Regression Analysis for Functional Data presents nonparametric statistical methods for functional regression analysis, specifically the methods based on a Gaussian process prior in a functional space. The authors focus on problems involving functional response variables and mixed covariates of functional and scalar variables. Covering the basics of Gaussian process regression, the first several chapters discuss functional data analysis, theoretical aspects based on the asymptotic properties of Gaussian process regression models, and new methodological developments for high dime
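To make the basic machinery concrete, here is a minimal Python sketch of Gaussian process regression with a scalar input and a squared exponential kernel. The posterior mean is K_*(K + σ²I)⁻¹y and the posterior covariance is K_** - K_*(K + σ²I)⁻¹K_*ᵀ, both computed through a Cholesky factor. This is only the scalar-covariate building block. The book's functional-response and mixed-covariate models are not reproduced, and the kernel hyperparameters are fixed by assumption rather than estimated.

```python
import numpy as np

def se_kernel(a, b, length=1.0, variance=1.0):
    """Squared exponential kernel for 1-D inputs."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / length**2)

def gp_posterior(x_train, y_train, x_test, noise_var=1e-2, **kernel_args):
    """Posterior mean and covariance of a zero-mean GP evaluated at x_test."""
    K = se_kernel(x_train, x_train, **kernel_args) + noise_var * np.eye(len(x_train))
    K_s = se_kernel(x_test, x_train, **kernel_args)
    K_ss = se_kernel(x_test, x_test, **kernel_args)
    L = np.linalg.cholesky(K)                                   # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))   # K^{-1} y via the Cholesky factor
    mean = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    cov = K_ss - v.T @ v
    return mean, cov
```

Far from the data, the posterior reverts to the prior: the mean approaches 0 and the variance approaches the kernel variance. With small noise, the mean nearly interpolates the training points.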
Osterrieder, Daniela; Ventosa-Santaulària, Daniel; Vera-Valdés, J. Eduardo
Predictive return regressions with persistent regressors are typically plagued by (asymptotically) biased/inconsistent estimates of the slope, non-standard or potentially even spurious statistical inference, and regression unbalancedness. We alleviate the problem of unbalancedness in the theoreti...
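One well-known source of finite-sample slope bias in predictive regressions is often called Stambaugh bias. It arises when a persistent regressor is combined with correlation between the return shock and the predictor's innovation. The following minimal Python sketch simulates that setting under assumed parameter values: the true slope is zero, x follows an AR(1) with ρ = 0.95, and the shocks have correlation -0.9. It illustrates the bias only and does not implement the authors' remedy for unbalancedness.

```python
import numpy as np

def simulate_mean_slope(T=60, rho=0.95, corr=-0.9, n_rep=4000, seed=0):
    """Monte Carlo mean of the OLS slope b in r[t+1] = a + b*x[t] + u[t+1] when the true b = 0,
    x[t+1] = rho*x[t] + v[t+1], and corr(u, v) = corr (assumed data-generating process)."""
    rng = np.random.default_rng(seed)
    chol = np.linalg.cholesky(np.array([[1.0, corr], [corr, 1.0]]))
    shocks = rng.standard_normal((n_rep, T + 1, 2)) @ chol.T    # correlated (u, v) pairs
    u, v = shocks[..., 0], shocks[..., 1]
    x = np.empty((n_rep, T + 1))
    x[:, 0] = rng.standard_normal(n_rep) / np.sqrt(1.0 - rho**2)  # stationary start
    for t in range(T):
        x[:, t + 1] = rho * x[:, t] + v[:, t + 1]
    x_lag, r_next = x[:, :-1], u[:, 1:]                          # regress r[t+1] on x[t]
    xc = x_lag - x_lag.mean(axis=1, keepdims=True)
    rc = r_next - r_next.mean(axis=1, keepdims=True)
    slopes = (xc * rc).sum(axis=1) / (xc**2).sum(axis=1)
    return slopes.mean()
```

With these settings the mean estimate is clearly positive although the true slope is zero. It is roughly in line with the classical approximation -corr·(1 + 3ρ)/T ≈ 0.058. Setting the correlation to zero removes the bias.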
Semiparametric regression is a fusion between parametric regression and nonparametric regression that integrates low-rank penalized splines, mixed model and hierarchical Bayesian methodology – thus allowing more streamlined handling of longitudinal and spatial correlation. We review progress in the field over the five-year period between 2003 and 2007. We find semiparametric regression to be a vibrant field with substantial involvement and activity, continual enhancement and widespread application.
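Here is a minimal Python sketch of the penalized-spline building block, assuming a truncated-line basis with knots at sample quantiles. The intercept and slope are unpenalized; these are the fixed effects in the mixed-model view. The knot coefficients receive a ridge penalty λ, which corresponds to treating them as random effects with λ = σ²_ε/σ²_u. Here λ is fixed by hand. In the mixed-model approach it would be estimated, for example by REML, which is not shown.

```python
import numpy as np

def penalized_spline(x, y, n_knots=20, lam=1.0):
    """Truncated-line penalized spline; returns a function that evaluates the fit."""
    knots = np.quantile(x, np.linspace(0.0, 1.0, n_knots + 2)[1:-1])  # interior quantiles

    def design(t):
        t = np.asarray(t, dtype=float)
        fixed = np.column_stack([np.ones_like(t), t])            # unpenalized intercept and slope
        spline = np.maximum(t[:, None] - knots[None, :], 0.0)    # (t - kappa_k)_+ columns
        return np.hstack([fixed, spline])

    C = design(x)
    D = np.diag(np.r_[np.zeros(2), np.ones(n_knots)])            # penalize knot coefficients only
    coef = np.linalg.solve(C.T @ C + lam * D, C.T @ y)
    return lambda t: design(t) @ coef
```

A small λ lets the fit follow curvature. A very large λ shrinks the knot coefficients toward zero, so the fit collapses to the ordinary least squares line.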
Hoerl and Kennard (1970a) introduced the ridge regression estimator as an alternative to the ordinary least squares (OLS) estimator in the presence of multicollinearity. In ridge regression, the ridge parameter plays an important role in parameter estimation. In this article, a new method for estimating the ridge parameter in both ordinary ridge regression (ORR) and generalized ridge regression (GRR) is proposed. A simulation study evaluates the performance of the proposed estimator ...
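The article's new ridge-parameter estimator is not given in the abstract. As a reference point, here is a minimal Python sketch of the ridge estimator β(k) = (XᵀX + kI)⁻¹Xᵀy on standardized predictors, together with the classical Hoerl-Kennard-Baldwin (1975) choice k = p·s²/(bᵀb) computed from the OLS fit b. That choice is a common benchmark in simulation studies of this kind. The function names are illustrative.

```python
import numpy as np

def standardize(X):
    """Center and scale each predictor column to unit sample standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def ridge(Xs, yc, k):
    """Ridge estimator (Xs'Xs + k I)^{-1} Xs'yc on standardized X and centered y."""
    p = Xs.shape[1]
    return np.linalg.solve(Xs.T @ Xs + k * np.eye(p), Xs.T @ yc)

def hkb_k(Xs, yc):
    """Hoerl-Kennard-Baldwin ridge parameter k = p * s^2 / (b'b) from the OLS fit b."""
    n, p = Xs.shape
    b = ridge(Xs, yc, 0.0)                           # k = 0 gives OLS on the standardized scale
    s2 = np.sum((yc - Xs @ b) ** 2) / (n - p - 1)    # one extra df for the centered-out intercept
    return p * s2 / (b @ b)
```

Under near-collinearity the OLS coefficients can be large and unstable. A positive k trades a little bias for a shorter, more stable coefficient vector.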
This article introduces a new estimator for the analysis of two contemporaneously correlated endogenous event count variables. This seemingly unrelated Poisson regression model (SUPREME) estimator combines the efficiencies created by single equation Poisson regression model estimators and insights from "seemingly unrelated" linear regression models.
Standardized coefficients in logistic regression analysis have the same utility as standardized coefficients in linear regression analysis. Although there has been no consensus on the best way to construct standardized logistic regression coefficients, there is now sufficient evidence to suggest a single best approach to the construction of a…
Marquette, J. F.; Dufala, M. M.
Ridge regression is an approach to ameliorating the problem of large standard errors of regression estimates when predictor variables are highly intercorrelated. An interactive computer program is presented which allows for investigation of the effects of using various ridge regression adjustment values. (JKS)
A candidate math model search algorithm was developed at Ames Research Center that determines a recommended math model for the multivariate regression analysis of experimental data. The search algorithm is applicable to classical regression analysis problems as well as wind tunnel strain gage balance calibration analysis applications. The algorithm compares the predictive capability of different regression models using the standard deviation of the PRESS residuals of the responses as a search metric. This search metric is minimized during the search. Singular value decomposition is used during the search to reject math models that lead to a singular solution of the regression analysis problem. Two threshold-dependent constraints are also applied. The first constraint rejects math models with insignificant terms. The second constraint rejects math models with near-linear dependencies between terms. The math term hierarchy rule may also be applied as an optional constraint during or after the candidate math model search. The final term selection of the recommended math model depends on the regressor and response values of the data set, the user's function class combination choice, the user's constraint selections, and the result of the search metric minimization. A frequently used regression analysis example from the literature is used to illustrate the application of the search algorithm to experimental data.
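The search metric can be sketched in Python. PRESS residuals are the leave-one-out prediction errors, available without refitting as e_i/(1 - h_ii) from the hat-matrix diagonal. Candidate term sets are then ranked by the standard deviation of those residuals. In this sketch, singular value decomposition is used only to reject (near-)singular designs via a condition-number threshold, an assumed stand-in for the paper's rule. The significance, near-linear-dependency and hierarchy constraints are omitted, and the function names and dictionary interface are hypothetical.

```python
import itertools
import numpy as np

def press_residuals(X, y):
    """Leave-one-out (PRESS) residuals e_i / (1 - h_ii), assuming X has full column rank."""
    Q, _ = np.linalg.qr(X)                 # reduced QR: hat matrix H = Q Q^T
    h = np.sum(Q**2, axis=1)               # diagonal of H
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return (y - X @ beta) / (1.0 - h)

def search_models(terms, y, max_cond=1e8):
    """Rank every subset of candidate terms (plus an intercept) by the std of PRESS residuals.
    terms: dict mapping a term name to its regressor column."""
    n = len(y)
    names = list(terms)
    best = None
    for size in range(len(names) + 1):
        for subset in itertools.combinations(names, size):
            X = np.column_stack([np.ones(n)] + [terms[t] for t in subset])
            s = np.linalg.svd(X, compute_uv=False)
            if s[-1] == 0.0 or s[0] / s[-1] > max_cond:
                continue                   # reject (near-)singular candidate math models
            metric = np.std(press_residuals(X, y), ddof=1)
            if best is None or metric < best[0]:
                best = (metric, subset)
    return best
```

Because PRESS residuals measure out-of-sample error, adding terms that only fit noise tends to increase the metric. This is why it can be minimized directly rather than penalized.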
Wilcox, Rand R
Applying Contemporary Statistical Techniques explains why traditional statistical methods are often inadequate or outdated when applied to modern problems. Wilcox demonstrates how new and more powerful techniques address these problems far more effectively, making these modern robust methods understandable, practical, and easily accessible.
* Assumes no previous training in statistics
* Explains how and why modern statistical methods provide more accurate results than conventional methods
* Covers the latest developments on multiple comparisons
* Includes recent advanc
Denzel, Alexander; Kästner, Johannes
We implemented a geometry optimizer based on Gaussian process regression (GPR) to find minimum structures on potential energy surfaces. We tested both a twice-differentiable form of the Matérn kernel and the squared exponential kernel. The Matérn kernel performs much better. We give a detailed description of the optimization procedures. These include overshooting the step resulting from GPR in order to obtain a higher degree of interpolation vs. extrapolation. In a benchmark against the Limited-memory Broyden-Fletcher-Goldfarb-Shanno optimizer of the DL-FIND library on 26 test systems, we found that the new optimizer generally reduces the number of required optimization steps.
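The following is a minimal Python sketch of the two kernels being compared. It assumes the "twice-differentiable form of the Matérn kernel" is the common ν = 5/2 member of the family, whose sample paths are twice differentiable. The sketch shows that kernel, its derivative with respect to the distance r, and the squared exponential kernel. Kernel derivatives of this kind are the starting point when a GPR model also uses energy gradients. Whether and how the authors' optimizer does so is not stated in the abstract.

```python
import numpy as np

def matern52(r, length=1.0, variance=1.0):
    """Matern kernel with nu = 5/2 as a function of distance r."""
    a = np.sqrt(5.0) * np.abs(r) / length
    return variance * (1.0 + a + a**2 / 3.0) * np.exp(-a)

def matern52_dr(r, length=1.0, variance=1.0):
    """Derivative of matern52 with respect to r, valid for r >= 0."""
    a = np.sqrt(5.0) * r / length
    return -variance * (5.0 * r / (3.0 * length**2)) * (1.0 + a) * np.exp(-a)

def squared_exponential(r, length=1.0, variance=1.0):
    """Squared exponential kernel, the infinitely differentiable alternative."""
    return variance * np.exp(-0.5 * (r / length) ** 2)
```

The derivative follows from differentiating σ²(1 + a + a²/3)e^(-a) with a = √5·r/ℓ. The linear and quadratic terms cancel against the exponential's derivative, leaving -σ²(5r/(3ℓ²))(1 + a)e^(-a).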
Berk, Richard A
This textbook considers statistical learning applications when interest centers on the conditional distribution of the response variable, given a set of predictors, and when it is important to characterize how the predictors are related to the response. As a first approximation, this can be seen as an extension of nonparametric regression. This fully revised new edition includes important developments over the past 8 years. Consistent with modern data analytics, it emphasizes that a proper statistical learning data analysis derives from sound data collection, intelligent data management, appropriate statistical procedures, and an accessible interpretation of results. A continued emphasis on the implications for practice runs through the text. Among the statistical learning procedures examined are bagging, random forests, boosting, support vector machines and neural networks. Response variables may be quantitative or categorical. As in the first edition, a unifying theme is supervised learning that can be trea...
Ji, Shuang; Peng, Limin; Cheng, Yu; Lai, HuiChuan
Summary: Double censoring often occurs in registry studies when left censoring is present in addition to right censoring. In this work, we propose a new analysis strategy for such doubly censored data by adopting a quantile regression model. We develop computationally simple estimation and inference procedures by appropriately using the embedded martingale structure. Asymptotic properties, including uniform consistency and weak convergence, are established for the resulting estimators. Moreover, we propose conditional inference to address the special identifiability issues attached to the doubly censored setting. We further show that the proposed method can be readily adapted to handle left truncation. Simulation studies demonstrate good finite-sample performance of the new inferential procedures. The practical utility of our method is illustrated by an analysis of the onset of the most commonly investigated respiratory infection, Pseudomonas aeruginosa, in children with cystic fibrosis, using the US Cystic Fibrosis Registry. PMID:21950348
Yuan, Ying; MacKinnon, David P.
Mediation analysis has many applications in psychology and the social sciences. The most prevalent methods typically assume that the error distribution is normal and homoscedastic. However, this assumption may rarely be met in practice, which can affect the validity of the mediation analysis. To address this problem, we propose robust mediation analysis based on median regression. Our approach is robust to various departures from the assumption of homoscedasticity and normality, including heavy-tailed, skewed, contaminated, and heteroscedastic distributions. Simulation studies show that under these circumstances, the proposed method is more efficient and powerful than standard mediation analysis. We further extend the proposed robust method to multilevel mediation analysis, and demonstrate through simulation studies that the new approach outperforms the standard multilevel mediation analysis. We illustrate the proposed method using data from a program designed to increase reemployment and enhance mental health of job seekers. PMID:24079925
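Here is a minimal Python sketch of median-regression mediation under stated assumptions. The indirect effect is estimated as a·b, where a is the median-regression slope of M on X and b is the slope of Y on M adjusting for X. To keep the example small and self-contained, the least-absolute-deviations fit is computed exactly by enumerating basic solutions, since an optimal median-regression fit passes through p observations. This is O(n^p) and only suitable for tiny data sets. Practical work would use a linear-programming solver such as R's `quantreg::rq`. Inference (e.g. bootstrap intervals) and the multilevel extension are not shown, and the helper names are hypothetical.

```python
import itertools
import numpy as np

def lad_fit(X, y):
    """Exact least-absolute-deviations (median) regression for small n: some optimal fit
    passes through p observations, so enumerate all p-subsets (O(n^p), toy sizes only)."""
    n, p = X.shape
    best_loss, best_beta = np.inf, None
    for idx in itertools.combinations(range(n), p):
        rows = list(idx)
        A = X[rows]
        if abs(np.linalg.det(A)) < 1e-12:
            continue                                      # skip singular p-subsets
        beta = np.linalg.solve(A, y[rows])
        loss = np.sum(np.abs(y - X @ beta))
        if loss < best_loss:
            best_loss, best_beta = loss, beta
    return best_beta

def median_mediation(x, m, y):
    """Indirect effect a*b and direct effect c' from median regressions."""
    ones = np.ones_like(x)
    a = lad_fit(np.column_stack([ones, x]), m)[1]                # path X -> M
    _, b, c_prime = lad_fit(np.column_stack([ones, m, x]), y)    # path M -> Y, adjusting for X
    return a * b, c_prime
```

Unlike least squares, a single gross outlier in the mediator does not move the median-regression slope of M on X. The check below relies on this robustness.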
Noorzad, Pardis; Sturm, Bob L.
We propose sparse approximation weighted regression (SPARROW), a method for local estimation of the regression function that uses sparse approximation with a dictionary of measurements. SPARROW estimates the regression function at a point with a linear combination of a few regressands selected by a sparse approximation of the point in terms of the regressors. We show that SPARROW can be considered a variant of k-nearest neighbors regression (k-NNR) and, more generally, of local polynomial kernel regression. Unlike k-NNR, however, SPARROW can adapt the number of regressors to use based...
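Since the abstract positions SPARROW relative to k-NNR, here is a minimal Python sketch of that baseline only. The estimate at x0 is the average response of the k training points closest in Euclidean distance. SPARROW itself would replace this fixed neighbourhood with the regressors chosen by a sparse approximation of x0, using an adaptive number of terms; that part is not reproduced. The function name is illustrative.

```python
import numpy as np

def knn_regress(X_train, y_train, x0, k=5):
    """k-nearest-neighbours regression: mean response of the k closest training points."""
    dist = np.linalg.norm(X_train - x0, axis=1)
    nearest = np.argsort(dist, kind="stable")[:k]     # stable sort keeps ties in input order
    return float(y_train[nearest].mean())
```

The fixed k is exactly the design choice SPARROW relaxes. Here every query uses the same number of neighbours, regardless of how well they represent x0.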
Kalialis, Louise Vennegaard; Drzewiecki, Krzysztof T; Klyver, Helle
Regression of metastatic melanoma is a rare event, and a review of the literature reveals a total of 76 reported cases since 1866. The proposed mechanisms include immunologic, endocrine, inflammatory and metastatic tumour nutritional factors. We conclude from this review that although the precise ... mechanisms remain unknown, some event must trigger the immune system to produce a stronger than normal response that results in regression of the melanoma metastases. Immunologic studies of patients with regression may disclose the underlying mechanisms and lead to new therapies for disseminated melanoma.
Matt N. Williams
In 2002, an article entitled "Four assumptions of multiple regression that researchers should always test" by Osborne and Waters was published in PARE. This article has gone on to be viewed more than 275,000 times (as of August 2013), and it is one of the first results displayed in a Google search for "regression assumptions". While Osborne and Waters' efforts in raising awareness of the need to check assumptions when using regression are laudable, we note that the original article contained at least two fairly important misconceptions about the assumptions of multiple regression. The first is that multiple regression requires the assumption of normally distributed variables. The second is that measurement errors necessarily cause underestimation of simple regression coefficients. In this article, we clarify that multiple regression models estimated using ordinary least squares require normally distributed errors for trustworthy inferences, at least in small samples, but do not require normally distributed response or predictor variables. We also point out that regression coefficients in simple regression models will be biased (toward zero) estimates of the relationships between variables of interest when measurement error is uncorrelated across those variables. When correlated measurement error is present, however, regression coefficients may be either upwardly or downwardly biased. We conclude with a brief corrected summary of the assumptions of multiple regression when using ordinary least squares.
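The point about correlated measurement error can be checked with a short calculation. Suppose y = α + βx + e and we observe x + u and y + v, with errors u and v independent of x and e. The OLS slope then converges to (β·var(x) + cov(u, v))/(var(x) + var(u)). With cov(u, v) = 0 this is the familiar attenuation toward zero. A positive error covariance can offset the attenuation or even push the estimate above β, while a negative one pushes it further toward or past zero. The Python sketch below evaluates that limit and checks it by simulation with assumed variances. The function names are illustrative.

```python
import numpy as np

def limiting_slope(beta, var_x, var_u, cov_uv):
    """Large-sample OLS slope of (y + v) on (x + u) when y = a + beta*x + e and
    the errors u, v are independent of x and e."""
    return (beta * var_x + cov_uv) / (var_x + var_u)

def simulated_slope(beta, cov_uv, n=200_000, seed=0):
    """Simulation check with var(x) = 1, var(u) = var(v) = 0.25 (requires |cov_uv| <= 0.25)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    y = beta * x + rng.normal(scale=0.5, size=n)
    z1, z2 = rng.normal(size=n), rng.normal(size=n)
    u = 0.5 * z1                               # var(u) = 0.25
    c = cov_uv / 0.5
    v = c * z1 + np.sqrt(0.25 - c**2) * z2     # var(v) = 0.25, cov(u, v) = cov_uv
    return np.polyfit(x + u, y + v, 1)[0]
```

With β = 0.5 and error variance 0.25, uncorrelated errors give a limit of 0.4. An error covariance of 0.2 gives 0.56, which is an upward bias.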
Pásztor, László; Laborczi, Annamária; Takács, Katalin; Szatmári, Gábor
information representing IEW or GRP forming environmental factors were taken into account to support the spatial inference of the locally experienced IEW frequency and measured GRP values, respectively. An efficient spatial prediction methodology was applied to construct reliable maps, namely regression kriging (RK) using spatially exhaustive auxiliary data on soil, geology, topography, land use and climate. RK divides the spatial inference into two parts. First, the deterministic component of the target variable is determined by a regression model. The residuals of the multiple linear regression analysis represent the spatially varying but dependent stochastic component, which is interpolated by kriging. The final map is the sum of the two component predictions. Application of RK also provides the possibility of inherent accuracy assessment. The resulting maps are characterized by global and local measures of their accuracy. Additionally, the method enables interval estimation for the spatial extent of the areas of predefined risk categories. All of these outputs make a useful contribution to spatial planning, action planning and decision making. Acknowledgement: Our work was partly supported by the Hungarian National Scientific Research Foundation (OTKA, Grant No. K105167).
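The two-step logic of regression kriging can be sketched in Python. First, fit the trend by multiple linear regression on the auxiliary covariates. Next, krige the regression residuals. Finally, add the two predictions. This sketch assumes simple kriging with a zero-mean exponential covariance whose sill and range are given (in practice they are fitted to the residual variogram) and no nugget. The returned kriging variance covers the residual component only and ignores uncertainty in the trend coefficients. The function names are hypothetical.

```python
import numpy as np

def exp_cov(A, B, sill=1.0, range_=1.0):
    """Exponential covariance sill * exp(-d / range_) between two sets of 2-D coordinates."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return sill * np.exp(-d / range_)

def regression_kriging(coords, X, z, coords_new, X_new, sill=1.0, range_=1.0):
    # Step 1: deterministic trend from multiple linear regression on auxiliary covariates.
    A = np.column_stack([np.ones(len(z)), X])
    beta, *_ = np.linalg.lstsq(A, z, rcond=None)
    resid = z - A @ beta
    # Step 2: simple kriging of the zero-mean residuals (known covariance, no nugget).
    C = exp_cov(coords, coords, sill, range_)
    c0 = exp_cov(coords_new, coords, sill, range_)       # shape (n_new, n)
    weights = np.linalg.solve(C, c0.T)                    # one column of weights per new location
    resid_new = weights.T @ resid
    var_new = sill - np.sum(c0.T * weights, axis=0)       # residual kriging variance
    # Step 3: final prediction = trend + kriged residual.
    trend_new = np.column_stack([np.ones(len(X_new)), X_new]) @ beta
    return trend_new + resid_new, var_new
```

Without a nugget, the predictor reproduces the observed values at sampled locations with zero kriging variance. Far from all samples, the residual variance reverts to the sill, which is where the local accuracy measures in the maps come from.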
In this work, we assess the performance of three probabilistic models for intra-day solar forecasting. More precisely, a linear quantile regression method is used to build three models for generating 1 h–6 h-ahead probabilistic forecasts. Our approach is applied to forecasting solar irradiance at a site experiencing highly variable sky conditions, using historical ground observations of solar irradiance as endogenous inputs and day-ahead forecasts as exogenous inputs. Day-ahead irradiance forecasts are obtained from the Integrated Forecast System (IFS), a Numerical Weather Prediction (NWP) model maintained by the European Centre for Medium-Range Weather Forecasts (ECMWF). Several metrics, mostly originating from the weather forecasting community, are used to evaluate the performance of the probabilistic forecasts. The results demonstrate that the NWP exogenous inputs improve the quality of the intra-day probabilistic forecasts. The analysis considered two locations with very dissimilar solar variability. Comparison between the two locations highlighted that the statistical performance of the probabilistic models depends on the local sky conditions.
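The abstract does not list its metrics. As an assumption-labelled illustration, here are two standard checks for quantile forecasts in Python. The first is the pinball (quantile) loss, the quantity whose sum linear quantile regression minimizes when fitting the τ-quantile. The second is the empirical coverage, which for a reliable τ-quantile forecast should be close to τ. The function names are illustrative.

```python
def pinball_loss(y, q_forecast, tau):
    """Pinball (quantile) loss of a tau-quantile forecast for one observation."""
    diff = y - q_forecast
    return tau * diff if diff >= 0 else (tau - 1.0) * diff

def mean_pinball(ys, qs, tau):
    """Average pinball loss over a set of observations and quantile forecasts."""
    return sum(pinball_loss(y, q, tau) for y, q in zip(ys, qs)) / len(ys)

def empirical_coverage(ys, qs):
    """Share of observations at or below the forecast quantile; close to tau if reliable."""
    return sum(y <= q for y, q in zip(ys, qs)) / len(ys)
```

For τ = 0.9, under-forecasting by 2 costs 0.9 × 2 = 1.8, while over-forecasting by 2 costs only 0.1 × 2 = 0.2. This asymmetry is what pushes the fitted line toward the upper tail.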
Chu, Carlton; Ni, Yizhao; Tan, Geoffrey; Saunders, Craig J; Ashburner, John
This paper introduces two kernel-based regression schemes to decode or predict brain states from functional brain scans as part of the Pittsburgh Brain Activity Interpretation Competition (PBAIC) 2007, in which our team was awarded first place. Our procedure involved image realignment, spatial smoothing, detrending of low-frequency drifts, and application of multivariate linear and non-linear kernel regression methods, namely kernel ridge regression (KRR) and relevance vector regression (RVR). RVR is based on a Bayesian framework, which automatically determines a sparse solution through maximization of marginal likelihood. KRR is the dual-form formulation of ridge regression, which solves regression problems with high-dimensional data in a computationally efficient way. Feature selection based on prior knowledge about human brain function was also used. Post-processing by constrained deconvolution and re-convolution was used to furnish the prediction. This paper also contains a detailed description of how prior knowledge was used to fine-tune predictions of specific "feature ratings," which we believe is one of the key factors in our prediction accuracy. The impact of pre-processing was also evaluated, demonstrating that different pre-processing may lead to significantly different accuracies. Although the original work was aimed at the PBAIC, many techniques described in this paper can be applied generally to fMRI decoding work to increase prediction accuracy. Published by Elsevier Inc.
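The dual form can be sketched in Python. Kernel ridge regression solves (K + λI)α = y for an n-vector α, so the cost grows with the number of scans n rather than the number of voxel features p. Predictions are then k(x_new, X)α. With a linear kernel this reproduces primal ridge regression without an intercept. The RBF kernel and its γ value are illustrative assumptions. The authors' preprocessing, feature selection and RVR model are not shown.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel exp(-gamma * ||a - b||^2) between rows of A and B."""
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def linear_kernel(A, B):
    return A @ B.T

def krr_fit(X, y, lam=1.0, kernel=linear_kernel):
    """Kernel ridge regression in dual form: alpha = (K + lam*I)^{-1} y (n unknowns)."""
    K = kernel(X, X)
    alpha = np.linalg.solve(K + lam * np.eye(len(y)), y)
    return lambda X_new: kernel(X_new, X) @ alpha
```

With 20 scans and 500 features, for instance, the dual system is 20 × 20, whereas the primal ridge system is 500 × 500. The two give the same predictions for a linear kernel because X(XᵀX + λI)⁻¹Xᵀ = XXᵀ(XXᵀ + λI)⁻¹.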
Contribution to the systemic study of energetic systems including electrochemical devices: Bond Graph formalism applied to modelling fuel cells, lithium-ion batteries and sun-racer; Contribution a l'etude systemique de dispositifs energetiques a composants electrochimiques. Formalisme Bond Graph applique aux piles a combustible, accumulateurs lithium-ion, vehicule solaire
This thesis is a contribution to the study of electric power conversion systems that include electrochemical devices. A systemic approach takes advantage of the unified Bond Graph formalism in order to model every component as well as the whole system. A state-of-the-art review of electrochemical devices for decentralized electric energy generation and storage emphasizes common phenomena, with the aim of developing 'system-oriented' generic models. Solid Oxide and Proton Exchange Membrane Fuel Cells (SOFC, PEMFC), as well as lithium-ion batteries, have been modelled through joint work with electrochemistry specialists. These models involve an explicit representation, at a macroscopic level, of conversion and irreversible phenomena linked to the chemical reaction, coupled together across the hydraulic, chemical, thermodynamic, electric and thermal domains. These models are used to study the modularity of the components, particularly the electric and thermal imbalances in series and parallel fuel cell associations. The systemic approach is also applied to the study of architectures and energy management of electric power generating units involving PEMFCs and battery or supercapacitor storage. Different working conditions for the fuel cells are defined and studied, in which voltage, current or power is imposed by means of the storage and static-converter environment. Parameter identification and working tests are performed on specially developed test benches to validate the theoretical results. Finally, the method is applied to study a 'sun-racer', an original complex system with an embedded photovoltaic generator, electrochemical storage and a brushless wheel motor. The whole vehicle is modelled in order to compare various energy-management strategies on board the solar vehicle 'Solelhada'. (author)
Background: In vertebrates, a large part of gene transcriptional regulation is carried out by cis-regulatory modules. These modules are believed to regulate much of the tissue specificity of gene expression. Results: We develop a Bayesian network approach for identifying cis-regulatory modules likely to regulate tissue-specific expression. The network integrates predicted transcription factor binding site information, transcription factor expression data, and target gene expression data. At its core is a regression tree modeling the effect of combinations of transcription factors bound to a module. A new unsupervised EM-like algorithm is developed to learn the parameters of the network, including the regression tree structure. Conclusion: Our approach is shown to accurately identify known human liver and erythroid-specific modules. When applied to the prediction of tissue-specific modules in 10 different tissues, the network predicts a number of important transcription factor combinations whose concerted binding is associated with specific expression.
Kinnebrock, Silja; Podolskij, Mark
This paper introduces a new estimator to measure the ex-post covariation between high-frequency financial time series under market microstructure noise. We provide an asymptotic limit theory (including feasible central limit theorems) for standard methods such as regression, correlation analysis ... and covariance, for which we obtain the optimal rate of convergence. We demonstrate some positive semidefinite estimators of the covariation and construct a positive semidefinite estimator of the conditional covariance matrix in the central limit theorem. Furthermore, we indicate how the assumptions on the noise ... process can be relaxed and how our method can be applied to non-synchronous observations. We also present an empirical study of how high-frequency correlations, regressions and covariances change through time.
Sun, Yan V; Shedden, Kerby A; Zhu, Ji; Choi, Nam-Hee; Kardia, Sharon Lr
Using the North American Rheumatoid Arthritis Consortium genome-wide association dataset, we applied ridge-penalized multiple least-squares regression to identify genetic variants with apparent unique contributions to variation of anti-cyclic citrullinated peptide (anti-CCP), a newly identified clinical risk factor for development of rheumatoid arthritis. Within a 2.7-Mbp region on chromosome 6 around the well-studied HLA-DRB1 locus, ridge regression identified a single-nucleotide polymorphism that was associated with anti-CCP variation when including the additive effects of other single-nucleotide polymorphisms in a multivariable analysis, but that showed only a weak direct association with anti-CCP. This suggests that multivariable methods can be used to identify potentially relevant genetic variants in regions of interest that would be difficult to detect based on direct associations.
Grajeda, Laura M; Ivanescu, Andrada; Saito, Mayuko; Crainiceanu, Ciprian; Jaganath, Devan; Gilman, Robert H; Crabtree, Jean E; Kelleher, Dermott; Cabrera, Lilia; Cama, Vitaliano; Checkley, William
Childhood growth is a cornerstone of pediatric research. Statistical models need to consider individual trajectories to adequately describe growth outcomes. Specifically, well-defined longitudinal models are essential to characterize both population and subject-specific growth. Linear mixed-effect models with cubic regression splines can account for the nonlinearity of growth curves and provide reasonable estimators of population and subject-specific growth, velocity and acceleration. We provide a stepwise approach that builds from simple to complex models and accounts for the intrinsic complexity of the data. We start with standard cubic regression spline models and build up to a model that includes subject-specific random intercepts and slopes and residual autocorrelation. We then compare cubic regression splines with linear piecewise splines, using varying numbers and positions of knots. Statistical code is provided to ensure reproducibility and improve dissemination of methods. Models are applied to longitudinal height measurements in a cohort of 215 Peruvian children followed from birth until their fourth year of life. Unexplained variability, as measured by the variance of the regression model, was reduced from 7.34 when using ordinary least squares to 0.81 (p [...] linear mixed-effect models with random slopes and a first-order continuous autoregressive error term. There was substantial heterogeneity in both the intercept (p [...] linear regression equation for both estimation and prediction of population- and individual-level growth in height. We show that cubic regression splines are superior to linear regression splines for the case of a small number of knots in both estimation and prediction with the full linear mixed-effect model (AIC 19,352 vs. 19,598, respectively). While the regression parameters are more complex to interpret in the former, we argue that inference for any problem depends more on the estimated curve or differences in curves rather
Atem, Folefac D; Qian, Jing; Maye, Jacqueline E; Johnson, Keith A; Betensky, Rebecca A
Randomly censored covariates arise frequently in epidemiologic studies. The most commonly used methods, including complete case analysis and single imputation or substitution, suffer from inefficiency and bias. They either make strong parametric assumptions or consider only limit-of-detection censoring. We employ multiple imputation, in conjunction with semi-parametric modeling of the censored covariate, to overcome these shortcomings and to facilitate robust estimation. We develop a multiple imputation approach for randomly censored covariates within the framework of a logistic regression model. We use the non-parametric estimate of the covariate distribution or the semiparametric Cox model estimate in the presence of additional covariates in the model. We evaluate this procedure in simulations, and compare its operating characteristics to those from the complete case analysis and a survival regression approach. We apply the procedures to an Alzheimer's study of the association between amyloid positivity and maternal age of onset of dementia. Multiple imputation achieves lower standard errors and higher power than the complete case approach under heavy and moderate censoring and is comparable under light censoring. The survival regression approach achieves the highest power among all procedures, but does not produce interpretable estimates of association. Multiple imputation offers a favorable alternative to complete case analysis and ad hoc substitution methods in the presence of randomly censored covariates within the framework of logistic regression.
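The abstract does not say how the analyses of the imputed data sets are pooled; a standard choice in multiple imputation is Rubin's rules. The sketch below assumes each of the m completed-data logistic fits has already produced a coefficient estimate and its variance; the imputation step itself (drawing censored covariate values from a Kaplan-Meier or Cox-based conditional distribution) is not shown, and the function name is hypothetical.

```python
def rubin_combine(estimates, variances):
    """Pool m completed-data estimates with Rubin's rules.

    estimates: coefficient estimates, one per imputed data set (m >= 2)
    variances: their squared standard errors
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    q_bar = sum(estimates) / m                                     # pooled point estimate
    within = sum(variances) / m                                    # average within-imputation variance
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)   # between-imputation variance
    total = within + (1.0 + 1.0 / m) * between                     # Rubin's total variance
    return q_bar, total


if __name__ == "__main__":
    print(rubin_combine([1.0, 2.0, 3.0], [0.5, 0.5, 0.5]))
```

The between-imputation term is what lets the pooled standard error reflect uncertainty about the censored covariate values, rather than treating a single imputed value as if it had been observed.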
Zied Ben Bouallègue
Nowadays, ensemble-based numerical weather forecasts provide probabilistic guidance to actors in the renewable energy sector. Ensemble forecasts can, however, suffer from statistical inconsistencies that affect forecast reliability. Statistical post-processing techniques address this issue using learning algorithms based on past data. In this study, it is shown that quantile regression is a suitable method for the post-processing of ensemble global radiation forecasts. In a basic approach, conditional quantiles are estimated using the first-guess quantile forecasts and a solar geometry variable as predictors. In a more complex approach, adequate meteorological predictors are selected from a pool of ensemble model outputs by means of a regularization scheme. The basic and the so-called penalized quantile regression approaches are applied to hourly averaged global radiation forecasts of the high-resolution ensemble prediction system COSMO-DE-EPS. Both calibration setups provide reliable probabilistic forecasts at all investigated probability levels, considerably improving the ensemble forecast skill. Moreover, verification results demonstrate that including rigorously selected predictors in the regression scheme increases the ensemble forecast sharpness and thereby the value of the probabilistic guidance.
Scott, Neil W; Fayers, Peter M; Aaronson, Neil K; Bottomley, Andrew; de Graeff, Alexander; Groenvold, Mogens; Gundy, Chad; Koller, Michael; Petersen, Morten A; Sprangers, Mirjam A G
Differential item functioning (DIF) methods can be used to determine whether different subgroups respond differently to particular items within a health-related quality of life (HRQoL) subscale, after allowing for overall subgroup differences in that scale. This article reviews issues that arise when testing for DIF in HRQoL instruments. We focus on logistic regression methods, which are often used because of their efficiency, simplicity and ease of application. A review of logistic regression DIF analyses in HRQoL was undertaken. Methodological articles from other fields and using other DIF methods were also included if considered relevant. There are many competing approaches for the conduct of DIF analyses and many criteria for determining what constitutes significant DIF. DIF in short scales, as commonly found in HRQoL instruments, may be more difficult to interpret. Qualitative methods may aid interpretation of such DIF analyses. A number of methodological choices must be made when applying logistic regression for DIF analyses, and many of these affect the results. We provide recommendations based on reviewing the current evidence. Although the focus is on logistic regression, many of our results should be applicable to DIF analyses in general. There is a need for more empirical and theoretical work in this area.
Burns, Douglas A.; Smith, Martyn J.; Freehafer, Douglas A.
A new Web-based application, titled “Application of Flood Regressions and Climate Change Scenarios To Explore Estimates of Future Peak Flows”, has been developed by the U.S. Geological Survey, in cooperation with the New York State Department of Transportation, that allows a user to apply a set of regression equations to estimate the magnitude of future floods for any stream or river in New York State (exclusive of Long Island) and the Lake Champlain Basin in Vermont. The regression equations that are the basis of the current application were developed in previous investigations by the U.S. Geological Survey (USGS) and are described at the USGS StreamStats Web sites for New York (http://water.usgs.gov/osw/streamstats/new_york.html) and Vermont (http://water.usgs.gov/osw/streamstats/Vermont.html). These regression equations include several fixed landscape metrics that quantify aspects of watershed geomorphology, basin size, and land cover as well as a climate variable—either annual precipitation or annual runoff.
Accurate electricity forecasting remains a critical issue in many energy management fields. Hybridizing novel algorithms with support vector regression (SVR) models to overcome the premature convergence problem and improve forecasting accuracy also deserves wider exploration. This paper applies chaotic function and quantum computing concepts to address drawbacks inherent in the crossover and mutation operations of genetic algorithms (GA). It then proposes a novel electricity load forecasting model that hybridizes chaotic function and quantum computing with GA in an SVR model (named SVRCQGA) to achieve more satisfactory forecasting accuracy. Experimental examples demonstrate that the proposed SVRCQGA model is superior to other competitive models.
Yang, Shun-hua; Zhang, Hai-tao; Guo, Long; Ren, Yan
Relative elevation and stream power index were selected as auxiliary variables based on correlation analysis for mapping soil organic matter. Geographically weighted regression Kriging (GWRK) and regression Kriging (RK) were used for spatial interpolation of soil organic matter and compared with ordinary Kriging (OK), which served as a control. The results indicated that soil organic matter was significantly positively correlated with relative elevation, whilst it was significantly negatively correlated with stream power index. Semivariance analysis showed that both soil organic matter content and its residuals (including the ordinary least squares regression residual and the GWR residual) had strong spatial autocorrelation. Interpolation accuracies of the different methods were estimated based on a data set of 98 validation samples. Results showed that the mean error (ME), mean absolute error (MAE) and root mean square error (RMSE) of RK were respectively 39.2%, 17.7% and 20.6% lower than the corresponding values of OK, with a relative improvement (RI) of 20.63. GWRK showed a similar tendency, with ME, MAE and RMSE respectively 60.6%, 23.7% and 27.6% lower than those of OK, and an RI of 59.79. Therefore, both RK and GWRK significantly improved the accuracy of OK interpolation of soil organic matter owing to their incorporation of auxiliary variables. In addition, GWRK clearly outperformed RK in this study, and its improved performance can be attributed to its consideration of sample spatial locations.
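The comparison above rests on three validation statistics and on percentage reductions relative to OK. A minimal sketch of those computations follows, assuming error = predicted minus observed and assuming reductions are computed on absolute values (ME can be negative, and the abstract does not state its sign convention). The relative improvement (RI) index is not defined in the abstract, so it is not reproduced here; function names are hypothetical.

```python
import math


def error_metrics(observed, predicted):
    """Return (ME, MAE, RMSE) for paired validation samples."""
    errors = [p - o for o, p in zip(observed, predicted)]  # assumed sign: predicted - observed
    n = len(errors)
    me = sum(errors) / n
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return me, mae, rmse


def percent_reduction(reference, candidate):
    """Percentage by which |candidate| is lower than |reference| (e.g. RK vs. OK)."""
    return 100.0 * (abs(reference) - abs(candidate)) / abs(reference)


if __name__ == "__main__":
    print(error_metrics([1.0, 2.0, 3.0], [2.0, 2.0, 5.0]))
    print(percent_reduction(10.0, 6.0))
```

Applied to the OK and RK (or GWRK) statistics from the same validation set, `percent_reduction` gives figures of the kind quoted above.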
Shih, Ching-Lin; Liu, Tien-Hsiang; Wang, Wen-Chung
The simultaneous item bias test (SIBTEST) method regression procedure and the differential item functioning (DIF)-free-then-DIF strategy are applied to the logistic regression (LR) method simultaneously in this study. These procedures are used to adjust the effects of matching true score on observed score and to better control the Type I error…
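Richardson, David B. [University of North Carolina at Chapel Hill, Department of Epidemiology, School of Public Health, Chapel Hill, NC (United States)]; Langholz, Bryan [Keck School of Medicine, University of Southern California, Division of Biostatistics, Department of Preventive Medicine, Los Angeles, CA (United States)]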
Background stratified Poisson regression is an approach that has been used in the analysis of data derived from a variety of epidemiologically important studies of radiation-exposed populations, including uranium miners, nuclear industry workers, and atomic bomb survivors. We describe a novel approach to fit Poisson regression models that adjust for a set of covariates through background stratification while directly estimating the radiation-disease association of primary interest. The approach makes use of an expression for the Poisson likelihood that treats the coefficients for stratum-specific indicator variables as 'nuisance' variables and avoids the need to explicitly estimate the coefficients for these stratum-specific parameters. Log-linear models, as well as other general relative rate models, are accommodated. This approach is illustrated using data from the Life Span Study of Japanese atomic bomb survivors and data from a study of underground uranium miners. The point estimate and confidence interval obtained from this 'conditional' regression approach are identical to the values obtained using unconditional Poisson regression with model terms for each background stratum. Moreover, it is shown that the proposed approach allows estimation of background stratified Poisson regression models of non-standard form, such as models that parameterize latency effects, as well as regression models in which the number of strata is large, thereby overcoming the limitations of previously available statistical software for fitting background stratified Poisson regression models.
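The key idea is that, conditional on the total number of cases in each background stratum, the stratum counts follow a multinomial distribution whose probabilities are proportional to person-years times exp(βx), so the stratum-specific intercepts cancel. The sketch below maximizes that conditional likelihood by Newton's method for a log-linear model with a single exposure covariate. It is an illustration under those assumptions, not the authors' software, and the data layout is hypothetical.

```python
import math


def conditional_poisson_loglinear(strata, beta=0.0, tol=1e-10, max_iter=50):
    """Estimate beta in rate = exp(alpha_s + beta * x) without estimating alpha_s.

    strata: list of strata; each stratum is a list of (cases, person_years, x) cells.
    Returns (beta_hat, approximate variance from the conditional information).
    Requires x to vary within at least one stratum that has cases.
    """
    for _ in range(max_iter):
        score = 0.0
        info = 0.0
        for stratum in strata:
            total_cases = sum(d for d, _, _ in stratum)
            weights = [py * math.exp(beta * x) for _, py, x in stratum]
            w_sum = sum(weights)
            # Multinomial cell probabilities are weights / w_sum; compute the weighted mean and variance of x.
            mean_x = sum(w * x for w, (_, _, x) in zip(weights, stratum)) / w_sum
            var_x = sum(w * (x - mean_x) ** 2 for w, (_, _, x) in zip(weights, stratum)) / w_sum
            score += sum(d * x for d, _, x in stratum) - total_cases * mean_x
            info += total_cases * var_x
        step = score / info  # Newton step on the concave conditional log-likelihood
        beta += step
        if abs(step) < tol:
            break
    return beta, 1.0 / info
```

For example, two strata in which the exposed rate is exactly twice the unexposed rate give an estimate of log 2, with variance 1/15 + 1/30 = 0.1 from the pooled case counts. As the abstract states for the real method, this coincides with what an unconditional Poisson fit with one indicator per stratum would return.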
Incomplete spontaneous regression of melanoma is common; complete melanoma regression, however, remains a very rare phenomenon. Because melanoma is the most immunogenic human malignancy, accumulating evidence suggests that regression is driven by the host's immune responses. Unfortunately, therapies aiming to enhance the patient's natural immunity against melanoma have yet to meet expectations. Reasons for failure include various tumor-induced immune escape mechanisms that subsequently lead to tolerance. Here, we performed time-dependent gene expression profiling to unravel the molecular changes involved in the transition from progressive melanoma to complete tumor regression using a porcine model. Melanoblastoma-bearing Libechov minipigs are highly suitable for this study because these animals exhibit naturally occurring and regressing melanomas. We were able to identify a molecular signature of the melanoma regression process. Genes regulated in this signature were associated with (1) cell cycle, (2) immune response, and (3) melanocyte differentiation. These genes may shed light on the molecular mechanisms involved in complete melanoma regression and indicate what improvements are needed for successful antimelanoma therapy.
Sher, Gene; Zhi, Degui; Zhang, Shaojie
The ability to predict epitopes plays an enormous role in vaccine development, because it lets researchers zero in on where to perform a more thorough in-vivo analysis of the protein in question. Although the past decade has seen numerous advancements and improvements in epitope prediction, the best benchmark prediction accuracies still average only around 60%. New machine learning algorithms have arisen within the domains of deep learning, text mining, and convolutional networks. This paper presents a novel analytically trained, string-kernel-based deep neural network tailored for continuous epitope prediction, called the Deep Ridge Regressed Epitope Predictor (DRREP). DRREP was tested on long protein sequences from the following datasets: SARS, Pellequer, HIV, AntiJen, and SEQ194. DRREP was compared to numerous state-of-the-art epitope predictors, including the most recently published predictors, LBtope and DMNLBE. Using the area under the ROC curve (AUC), DRREP achieved a performance improvement over the best performing predictors on SARS (13.7%), HIV (8.9%), Pellequer (1.5%), and SEQ194 (3.1%); its performance was matched only on the AntiJen dataset, by the LBtope predictor, where both DRREP and LBtope achieved an AUC of 0.702. DRREP is an analytically trained deep neural network and is thus capable of learning in a single step through regression. By combining the features of deep learning, string kernels, and convolutional networks, the system is able to perform residue-by-residue prediction of continuous epitopes with higher accuracy than the current state-of-the-art predictors.
Shen, Xia; Alam, Moudud; Fikse, Freddy; Rönnegård, Lars
As molecular marker density grows, there is a strong need in both genome-wide association studies and genomic selection to fit models with a large number of parameters. Here we present a computationally efficient generalized ridge regression (RR) algorithm for situations in which the number of parameters greatly exceeds the number of observations. The computationally demanding parts of the method depend mainly on the number of observations rather than the number of parameters. The algorithm was implemented in the R package bigRR based on the previously developed package hglm. Using such an approach, a heteroscedastic effects model (HEM) was also developed, implemented, and tested. The efficiency for different data sizes was evaluated via simulation. The method was tested for a bacteria-hypersensitive trait in a publicly available Arabidopsis data set including 84 inbred lines and 216,130 SNPs. The computation of all the SNP effects required <10 sec using a single 2.7-GHz core. The advantage in run time makes permutation testing feasible for such a whole-genome model, so that a genome-wide significance threshold can be obtained. HEM was found to be more robust than ordinary RR (a.k.a. SNP-best linear unbiased prediction) in terms of QTL mapping, because SNP-specific shrinkage was applied instead of a common shrinkage. The proposed algorithm was also assessed for genomic evaluation and was shown to give better predictions than ordinary RR.
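Why can the cost depend on the number of observations n rather than the number of parameters p? One standard explanation is the dual (kernel) form of ridge regression: (XᵀX + λI)⁻¹Xᵀy = Xᵀ(XXᵀ + λI)⁻¹y. The right-hand side only requires solving an n × n system. The sketch below checks this identity with a common shrinkage λ. It is not the bigRR/hglm algorithm and does not implement HEM's SNP-specific shrinkage; the function names are hypothetical.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented copy; inputs untouched
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ridge_dual(X, y, lam):
    """Ridge coefficients via an n x n solve: w = X^T (X X^T + lam I)^-1 y."""
    n, p = len(X), len(X[0])
    gram = [[sum(a * b for a, b in zip(X[i], X[j])) + (lam if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    alpha = solve(gram, y)
    return [sum(alpha[i] * X[i][k] for i in range(n)) for k in range(p)]


def ridge_primal(X, y, lam):
    """Ridge coefficients via a p x p solve: w = (X^T X + lam I)^-1 X^T y."""
    p = len(X[0])
    xtx = [[sum(row[j] * row[k] for row in X) + (lam if j == k else 0.0)
            for k in range(p)] for j in range(p)]
    xty = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(p)]
    return solve(xtx, xty)
```

With two observations and four parameters, `ridge_dual` solves a 2 × 2 system while `ridge_primal` solves a 4 × 4 one, yet both return the same coefficients. With hundreds of thousands of SNPs and a few dozen lines, that difference is what makes whole-genome fits and permutation tests cheap.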
Intervertebral disc herniation of the lumbar spine is a common disease presenting with low back pain and involving nerve root radiculopathy. In the majority of patients, some neurological symptoms improve after a period of conservative treatment. This has been regarded as the result of decreased pressure exerted by the herniated disc on neighboring neural structures and a gradual regression of inflammation. Recently, with advances in magnetic resonance imaging, many reports have demonstrated that a herniated disc has the potential for spontaneous regression, and that regression coincides with improvement of the associated symptoms. However, the exact regression mechanism remains unclear. Here, we present 2 cases of lumbar intervertebral disc herniation with spontaneous regression. We review the literature and discuss the possible mechanisms, the precipitating factors of spontaneous disc regression and the proper timing of surgical intervention.
Johansen, Søren; Nielsen, Bent
We review recent asymptotic results on some robust methods for multiple regression. The regressors include stationary and non-stationary time series as well as polynomial terms. The methods include the Huber-skip M-estimator, 1-step Huber-skip M-estimators, in particular the Impulse Indicator Sat...
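A 1-step Huber-skip M-estimator starts from an initial estimator, removes observations whose residuals exceed a cutoff multiple of the estimated scale, and refits least squares on the rest. The sketch below shows that idea for a single regressor. It assumes full-sample OLS as the initial estimator and a cutoff c = 2.0. It omits the consistency corrections for the scale estimator that the asymptotic theory reviewed here involves, so treat it as illustrative only; names are hypothetical.

```python
import math


def ols_line(x, y):
    """Ordinary least squares intercept and slope for one regressor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


def one_step_huber_skip(x, y, c=2.0):
    """Drop observations with |residual| > c * sigma from an initial fit, then refit OLS."""
    a0, b0 = ols_line(x, y)  # initial estimator: full-sample OLS (an assumption)
    residuals = [yi - (a0 + b0 * xi) for xi, yi in zip(x, y)]
    sigma = math.sqrt(sum(r * r for r in residuals) / (len(x) - 2))
    keep = [i for i, r in enumerate(residuals) if abs(r) <= c * sigma]
    a1, b1 = ols_line([x[i] for i in keep], [y[i] for i in keep])
    return a1, b1, keep


if __name__ == "__main__":
    x = list(range(10))
    y = [2 * xi + 1 for xi in x]
    y[5] = 100  # a single gross outlier
    print(one_step_huber_skip(x, y))
```

In the usage example, the outlier at index 5 is skipped and the refit recovers the intercept 1 and slope 2 of the clean data. Iterating the skip-and-refit step gives the iterated version that the review studies alongside the 1-step estimator.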
Megan L. Sawatsky
Partial least squares regression (PLSR) is a statistical modeling technique that extracts latent factors to explain both predictor and response variation. PLSR is particularly useful as a data exploration technique because it is highly flexible (e.g., there are few assumptions, and variables can be highly collinear). While it is gaining importance across a diverse range of fields, its application in the social sciences has been limited. Here, we provide a brief introduction to PLSR, directed towards a novice audience with limited exposure to the technique; demonstrate its utility as an alternative to more classic approaches (multiple linear regression, principal component regression); and apply the technique to a hypothetical dataset using JMP statistical software (with references to SAS software).
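To make "extracting a latent factor" concrete, the sketch below fits a single-component PLS regression (PLS1) for one response. The weight vector is the direction in centered predictor space with maximal covariance with y, the scores are projections onto it, and y is regressed on those scores. It is a simplified illustration that assumes only one component is wanted. It is not JMP's or SAS's implementation, which extract several components with deflation.

```python
import math


def pls1_one_component(X, y):
    """Single-component PLS regression; returns (intercept, coefficients)."""
    n, p = len(X), len(X[0])
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]
    yc = [yi - y_mean for yi in y]
    # Weight vector: direction of maximal covariance between the predictors and y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Latent scores, then the least squares regression of y on those scores.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    coef = [wj * q for wj in w]
    intercept = y_mean - sum(c * m for c, m in zip(coef, x_means))
    return intercept, coef


if __name__ == "__main__":
    # Perfectly collinear predictors: X^T X is singular, so OLS has no unique fit.
    print(pls1_one_component([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]], [1.0, 2.0, 3.0]))
```

With perfectly collinear predictors, ordinary least squares has no unique solution. The single PLS component still yields a well-defined fit, here coefficients of about 0.2 and 0.4 with a zero intercept, which illustrates the flexibility described above.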
Bulcock, J. W.
The problem of model estimation when the data are collinear was examined. Although ridge regression (RR) outperforms ordinary least squares (OLS) regression in the presence of acute multicollinearity, it is not a problem-free technique for reducing the variance of the estimates. It is a stochastic procedure when it should be nonstochastic and it…
Palacios-Cruz, Lino; Pérez, Marcela; Rivas-Ruiz, Rodolfo; Talavera, Juan O
When we think about mathematical models, such as the linear regression model, we tend to assume that these terms are used only by those engaged in research, a notion that is far from the truth. Legendre described the method of least squares in 1805, and Galton introduced the term "regression" in 1886. Linear regression is one of the most commonly used regression models in clinical practice. It is useful for predicting, or showing the relationship between, two or more variables, provided that the dependent variable is quantitative and the model residuals are approximately normally distributed. Stated another way, regression is used to predict a measure based on knowledge of at least one other variable. The first objective of linear regression is to determine the slope of the regression line Y = a + bX, where "a" is the intercept or regression constant, equal to the value of Y when X equals 0, and "b" (the slope) indicates the increase or decrease in Y when X increases by one unit. In the regression line, "b" is called the regression coefficient. The coefficient of determination (R²) indicates the proportion of the variance in the outcome that is explained by the independent variables.
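The quantities described above, the intercept a, the slope b and R², follow directly from the least squares formulas: b = Sxy / Sxx, a = ȳ − b·x̄, and R² = 1 − SS_res / SS_tot. A minimal sketch, with a hypothetical function name:

```python
def simple_linear_regression(x, y):
    """Least squares fit of Y = a + bX; returns (a, b, R^2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx        # slope: change in Y per one-unit increase in X
    a = y_bar - b * x_bar  # intercept: value of Y when X equals 0
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot  # share of the variance in Y explained by X
    return a, b, r_squared


if __name__ == "__main__":
    print(simple_linear_regression([0.0, 1.0, 2.0], [1.0, 3.0, 2.0]))  # (1.5, 0.5, 0.25)
```

In the usage example, Y rises by 0.5 for each one-unit increase in X, the line crosses X = 0 at 1.5, and only a quarter of the variance in Y is explained.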
Yutaka, Yojiro; Omasa, Mitsugu; Shikuma, Kei; Okuda, Masato; Taki, Toshihiko
Although there are many reports of spontaneous regression of noninvasive thymoma, there are no reports of spontaneous regression of an invasive thymoma. Moreover, the mechanism of the spontaneous regression is still unknown. The present case concerns a 47-year-old man who presented with chest pain. Computed tomography (CT) showed a large anterior mediastinal mass with left pleural effusion that occluded the innominate vein. The tissue obtained by video-assisted thoracic surgery suggested a diagnosis of invasive thymic carcinoma. One month later, CT showed prominent regression of the tumor, and the tumor was completely resected. On pathology, the diagnosis was thymoma type B3.
Hung, Hung; Jou, Zhi-Yu; Huang, Su-Yun
Logistic regression is among the most widely used statistical methods for linear discriminant analysis. In many applications, we observe only possibly mislabeled responses. Fitting a conventional logistic regression can then lead to biased estimation. One common resolution is to fit a mislabel logistic regression model, which takes mislabeled responses into consideration. Another common method is to adopt robust M-estimation by down-weighting suspected instances. In this work, we propose a new robust mislabel logistic regression based on γ-divergence. Our proposal has two advantageous features: (1) it does not need to model the mislabel probabilities; (2) minimum γ-divergence estimation leads to a weighted estimating equation without the need to include any bias correction term, that is, it is automatically bias-corrected. These features make the proposed γ-logistic regression more robust in model fitting and more intuitive for model interpretation through a simple weighting scheme. Our method is also easy to implement, and two types of algorithms are included. Simulation studies and the Pima data application are presented to demonstrate the performance of γ-logistic regression. © 2017, The International Biometric Society.
Nielsen, Allan Aasbjerg
This note primarily describes the mathematics of least squares regression analysis as it is often used in geodesy, including land surveying and satellite positioning applications. In these fields, regression is often termed adjustment. The note also contains a couple of typical land surveying ... and satellite positioning application examples. In these application areas we are typically interested in the model parameters, typically 2-D or 3-D positions, rather than in predictive modelling, which is often the main concern in other regression analysis applications. Adjustment is often used to obtain ... the clock error) and to obtain estimates of the uncertainty with which the position is determined. Regression analysis is used in many other fields of application in the natural, technical and social sciences. Examples may be curve fitting, calibration, establishing relationships between...
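The simplest adjustment problem shows the emphasis on parameters and their uncertainty. One unknown, say a distance, is measured directly several times with different precisions. With design matrix A = (1, ..., 1)ᵀ and weights wᵢ = 1/σᵢ², the normal equation AᵀWA x̂ = AᵀWl gives the weighted mean, whose standard deviation is 1/√(Σwᵢ). The sketch below assumes known a priori standard deviations and uncorrelated observations; the names are hypothetical and it is not taken from the note.

```python
import math


def weighted_adjustment(observations, std_devs):
    """Least squares adjustment of one unknown observed directly several times.

    Returns (estimate, standard deviation of the estimate, residuals l_i - x_hat).
    Assumes uncorrelated observations with known standard deviations.
    """
    weights = [1.0 / s ** 2 for s in std_devs]  # w_i = 1 / sigma_i^2
    w_sum = sum(weights)
    estimate = sum(w * l for w, l in zip(weights, observations)) / w_sum
    std_estimate = math.sqrt(1.0 / w_sum)       # propagated uncertainty of the estimate
    residuals = [l - estimate for l in observations]
    return estimate, std_estimate, residuals


if __name__ == "__main__":
    print(weighted_adjustment([100.0, 100.2], [0.1, 0.1]))
```

Two equally precise measurements of 100.0 m and 100.2 m (σ = 0.1 m) give 100.1 m with a standard deviation of about 0.071 m. Positioning problems generalize this to several unknowns and a full design matrix.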
Cade, B.S.; Noon, B.R.
Quantile regression is a way to estimate the conditional quantiles of a response variable distribution in the linear model, and it provides a more complete view of possible causal relationships between variables in ecological processes. Typically, not all the factors that affect ecological processes are measured and included in the statistical models used to investigate relationships between variables associated with those processes. As a consequence, there may be a weak or no predictive relationship between the mean of the response variable (y) distribution and the measured predictive factors (X). Yet there may be stronger, useful predictive relationships with other parts of the response variable distribution. This primer relates quantile regression estimates to prediction intervals in parametric error distribution regression models (e.g., least squares), and discusses the ordering characteristics, interval nature, sampling variation, weighting, and interpretation of the estimates for homogeneous and heterogeneous regression models.
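A linear quantile regression at level τ minimizes the sum of pinball (check) losses, ρ_τ(u) = u·(τ − 1[u < 0]), instead of squared residuals. For an intercept and one predictor, the problem is a linear program, and some optimal line passes through at least two observations. An exhaustive search over lines through pairs of points therefore finds an exact fit for small samples. The sketch below uses that brute-force approach, which costs O(n³) and is meant for illustration only; practical software uses simplex or interior-point algorithms instead. Names are hypothetical.

```python
def pinball_loss(residual, tau):
    """Check loss: tau * u for u >= 0 and (tau - 1) * u for u < 0."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def quantile_regression_line(x, y, tau):
    """Exact linear quantile regression for one predictor by searching lines through pairs of points."""
    best = None
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            if x[i] == x[j]:
                continue  # a vertical line is not a valid regression line
            slope = (y[j] - y[i]) / (x[j] - x[i])
            intercept = y[i] - slope * x[i]
            loss = sum(pinball_loss(yk - (intercept + slope * xk), tau) for xk, yk in zip(x, y))
            if best is None or loss < best[0]:
                best = (loss, intercept, slope)
    return best[1], best[2]


if __name__ == "__main__":
    x = [0.0, 1.0, 2.0, 3.0, 4.0]
    y = [0.0, 1.0, 2.0, 3.0, 40.0]  # one extreme response
    print(quantile_regression_line(x, y, 0.5))  # median regression: (0.0, 1.0)
```

The median regression (τ = 0.5) in the usage example ignores the single extreme response and follows the other four points. Refitting at, say, τ = 0.1 and τ = 0.9 traces other parts of the conditional distribution, which is the "more complete view" described above.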
Lee, Kang N. (Inventor)
An enhanced environmental barrier coating for a silicon-containing substrate. The enhanced barrier coating may include a bond coat doped with at least one of an alkali metal oxide and an alkaline earth metal oxide. The enhanced barrier coating may include a composite mullite bond coat including BSAS and another distinct second phase oxide applied over said surface.
Ustün, B; Melssen, W J; Buydens, L M C
This paper introduces a technique to visualise the information content of the kernel matrix and a way to interpret the ingredients of the Support Vector Regression (SVR) model. Recently, the use of Support Vector Machines (SVM) for solving classification (SVC) and regression (SVR) problems has increased substantially in the field of chemistry and chemometrics. This is mainly due to their high generalisation performance and their ability to model non-linear relationships in a unique and global manner. Modeling of non-linear relationships is enabled by applying a kernel function. The kernel function transforms the input data, usually non-linearly related to the associated output property, into a high-dimensional feature space where the non-linear relationship can be represented in a linear form. Usually, SVMs are applied as a black box technique. Hence, the model cannot be interpreted in the way that, e.g., a Partial Least Squares (PLS) model can. For example, the PLS scores and loadings make it possible to visualise and understand the driving force behind the optimal PLS machinery. In this study, we have investigated the possibilities of visualising and interpreting the SVM model. Here, we focused exclusively on Support Vector Regression to demonstrate these visualisation and interpretation techniques. Our observations show that we are now able to turn an SVR black box model into a transparent and interpretable regression modeling technique.
Understanding associations between genotypes and complex traits is a fundamental problem in human genetics. A major open problem in mapping phenotypes is identifying a set of interacting genetic variants that might contribute to complex traits. Logic regression (LR) is a powerful multivariant association tool. Several LR-based approaches have been successfully applied to different datasets. However, these approaches are not adequate with regard to accuracy and efficiency. In this paper, we propose a new LR-based approach, called fish-swarm logic regression (FSLR), which improves the logic regression process by incorporating swarm optimization. In our approach, a school of fish agents runs in parallel. Each fish agent holds a regression model, while the school searches for better models through various preset behaviors. The swarm algorithm improves accuracy and efficiency by speeding up convergence and preventing the search from becoming trapped in local optima. We apply our approach to a real screening dataset and a series of simulation scenarios. Compared with three existing LR-based approaches, our approach has lower type I and type II error rates, identifies more preset causal sites, and runs faster.
Most of applied statistics involves regression analysis of data. In practice, it is important to specify a regression model that has minimal assumptions which are not violated by data, to ensure that statistical inferences from the model are informative and not misleading. This paper presents a stand-alone and menu-driven software package, Bayesian Regression: Nonparametric and Parametric Models, constructed from MATLAB Compiler. Currently, this package gives the user a choice from 83 Bayesian models for data analysis. They include 47 Bayesian nonparametric (BNP) infinite-mixture regression models; 5 BNP infinite-mixture models for density estimation; and 31 normal random effects models (HLMs), including normal linear models. Each of the 78 regression models handles either a continuous, binary, or ordinal dependent variable, and can handle multi-level (grouped) data. All 83 Bayesian models can handle the analysis of weighted observations (e.g., for meta-analysis), and the analysis of left-censored, right-censored, and/or interval-censored data. Each BNP infinite-mixture model has a mixture distribution assigned one of various BNP prior distributions, including priors defined by either the Dirichlet process, Pitman-Yor process (including the normalized stable process), beta (two-parameter) process, normalized inverse-Gaussian process, geometric weights prior, dependent Dirichlet process, or the dependent infinite-probits prior. The software user can mouse-click to select a Bayesian model and perform data analysis via Markov chain Monte Carlo (MCMC) sampling. After the sampling completes, the software automatically opens text output that reports MCMC-based estimates of the model's posterior distribution and model predictive fit to the data. Additional text and/or graphical output can be generated by mouse-clicking other menu options. This includes output of MCMC convergence analyses, and estimates of the model's posterior predictive distribution, for selected
Kallager, Per Kristian Roko
The effect of economic globalisation on the welfare state is the subject of a highly polarised debate in the scholarly literature. In essence, there are three possible effects: economic globalisation increases welfare, decreases welfare, or has no effect. By applying meta-regression analysis to 33 empirical studies, this thesis concludes that globalisation has a positive, though quite small, effect on the welfare state. Moreover, the thesis finds that publication bias is not ...
Avdulaj, Krenar; Baruník, Jozef
Roč. 21, č. 1 (2017), s. 81-97 ISSN 1081-1826 R&D Projects: GA ČR(CZ) GBP402/12/G097 Institutional support: RVO:67985556 Keywords : copula quantile regression * realized volatility * value-at-risk Subject RIV: AH - Economics OBOR OECD: Applied Economics, Econometrics Impact factor: 0.649, year: 2016 http://library.utia.cas.cz/separaty/2017/E/avdulaj-0472346.pdf
This book seeks new perspectives on the growing inequalities that our societies face, putting forward Structured Additive Distributional Regression as a means of statistical analysis that circumvents the common problem of analytical reduction to simple point estimators. This approach allows the observed discrepancy between individuals' realities and the abstract representation of those realities to be taken explicitly into consideration, rather than relying on the arithmetic mean alone. The method is then applied to the question of economic inequality in Germany.
Metallic heat shields for Space Shuttle thermal protection systems must operate for many flight cycles at high temperatures in low-pressure air and must use thin-gage sheet (0.65 mm or less). Available creep data for thin sheet under those conditions are inadequate. To assess the effects of oxygen partial pressure and sheet thickness on creep behavior, and to develop constitutive creep equations from small data sets, regression techniques are applied and discussed.
Nagy, Ivan; Suzdaleva, Evgenia
Vol. 26, No. 5 (2016), pp. 417-437. ISSN 1210-0552. R&D Projects: GA ČR GA15-03564S. Institutional support: RVO:67985556. Keywords: on-line modeling * on-line logistic regression * recursive mixture estimation * data dependent pointer. Subject RIV: BB - Applied Statistics, Operational Research. Impact factor: 0.394, year: 2016. http://library.utia.cas.cz/separaty/2016/ZS/suzdaleva-0464463.pdf
Lin, Yunzhi; Yu, Menggang; Wang, Sijian; Chappell, Richard; Imperiale, Thomas F
Colorectal cancer is the second leading cause of death from cancer in the United States. To improve the efficiency of colorectal cancer screening, there is a need to stratify risk for colorectal cancer among the 90% of US residents who are considered "average risk." In this article, we investigate such risk stratification rules for advanced colorectal neoplasia (colorectal cancer and advanced, precancerous polyps). We use a recently completed large cohort study of subjects who underwent a first screening colonoscopy. Logistic regression models have been used in the literature to estimate the risk of advanced colorectal neoplasia based on quantifiable risk factors. However, logistic regression may be prone to overfitting and instability in variable selection. Since most of the risk factors in our study have several categories, it was tempting to collapse these categories into fewer risk groups. We propose a penalized logistic regression method that automatically and simultaneously selects variables, groups categories, and estimates their coefficients by penalizing the L1-norm of both the coefficients and their differences. Hence, it encourages sparsity in the categories, i.e. grouping of the categories, and sparsity in the variables, i.e. variable selection. We apply the penalized logistic regression method to our data. The important variables are selected, with close categories simultaneously grouped, by penalized regression models with and without the interaction terms. The models are validated with 10-fold cross-validation. The receiver operating characteristic curves of the penalized regression models dominate the receiver operating characteristic curve of naive logistic regressions, indicating superior discriminative performance. © The Author(s) 2013.
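Assuming, as the stated sparsity goals suggest, that the penalized norm is the L1 norm, the objective combines an L1 term on the category coefficients (which can drive whole variables out of the model) with an L1 term on differences between coefficients of the same variable (which fuses categories into groups). The plain-Python sketch below evaluates such an objective; penalizing all pairwise differences within a variable, and the names `group_fusion_penalty` and `logistic_nll`, are assumptions for illustration, and no optimizer is included.

```python
import math
from itertools import combinations

def group_fusion_penalty(coef_groups, lam_coef, lam_diff):
    """L1 penalty on category coefficients plus L1 penalty on within-variable differences.

    coef_groups: one list per categorical risk factor, holding the
    dummy-coded coefficients of its non-reference categories.
    """
    sparsity = sum(abs(b) for group in coef_groups for b in group)
    fusion = sum(abs(a - b)
                 for group in coef_groups
                 for a, b in combinations(group, 2))   # all pairs within one variable
    return lam_coef * sparsity + lam_diff * fusion

def logistic_nll(intercept, coefs, rows, y):
    """Negative log-likelihood of a logistic regression model."""
    total = 0.0
    for x, yi in zip(rows, y):
        eta = intercept + sum(c * v for c, v in zip(coefs, x))
        # log(1 + exp(eta)) - yi * eta, written to avoid overflow for large |eta|
        total += max(eta, 0.0) + math.log1p(math.exp(-abs(eta))) - yi * eta
    return total

# Hypothetical coefficients: two categories of the first factor already fused at 0.5
groups = [[0.5, 0.5, 0.0], [1.2]]
penalty = group_fusion_penalty(groups, lam_coef=1.0, lam_diff=1.0)
print(penalty)
```

Minimizing `logistic_nll(...) + group_fusion_penalty(...)` over the coefficients would need a solver suited to non-smooth penalties (for example a proximal or ADMM method); equal fitted coefficients within a variable then correspond to merged categories.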
Dissing, Bjørn Skovlund; Carstensen, Jens Michael; Larsen, Rasmus
-XYZ color matching functions. The target of the regression is a well-known color chart, and the models are validated using leave-one-out cross-validation in order to maintain the best possible generalization ability. The authors compare the method with a direct linear regression and see...
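Leave-one-out cross-validation refits the regression once per sample, each time predicting the held-out sample from a model trained on the rest. A minimal Python/NumPy sketch for a linear mapping from camera responses to XYZ values follows; the three-channel input, the 24 patches and the random data are hypothetical stand-ins, not the authors' setup.

```python
import numpy as np

def loo_rmse(X, Y):
    """Leave-one-out RMSE of a multivariate linear regression Y ~ [1, X] B."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])       # add an intercept column
    errors = []
    for i in range(n):
        keep = np.arange(n) != i               # train on every sample except i
        B, *_ = np.linalg.lstsq(A[keep], Y[keep], rcond=None)
        errors.append(Y[i] - A[i] @ B)         # error on the held-out sample
    errors = np.asarray(errors)
    return float(np.sqrt(np.mean(errors ** 2)))

rng = np.random.default_rng(0)
camera_rgb = rng.uniform(size=(24, 3))         # hypothetical sensor responses, 24 patches
mixing = rng.normal(size=(3, 3))
xyz = camera_rgb @ mixing + 0.5                # exactly linear, so the LOO error is ~0
print(loo_rmse(camera_rgb, xyz))
```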
Distante, Roberta; Petrella, Ivan; Santoro, Emiliano
The nexus between firm growth, size and age in U.S. manufacturing is examined through the lens of quantile regression models. This methodology allows us to overcome serious shortcomings entailed by linear regression models employed by much of the existing literature, unveiling a number of important...
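Quantile regression replaces the squared error of least squares with the asymmetric check (pinball) loss, so that minimizing it targets a conditional quantile rather than the conditional mean. The plain-Python sketch below defines the loss and shows, for the intercept-only case, that minimizing it over a constant recovers a sample quantile; the function names and toy data are illustrative.

```python
def check_loss(residual, tau):
    """Check (pinball) loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))

def best_constant_quantile(y, tau):
    """Constant c minimizing sum_i rho_tau(y_i - c).

    The objective is convex and piecewise linear with kinks at the data
    points, so a minimizer can always be found among the observations.
    """
    return min(y, key=lambda c: sum(check_loss(v - c, tau) for v in y))

data = [1.0, 2.0, 3.0, 4.0, 100.0]
print(best_constant_quantile(data, 0.5), best_constant_quantile(data, 0.25))  # 3.0 2.0
```

With `tau=0.5` the minimizer is the median, and the outlying value 100 does not move it. A full quantile regression solves the same minimization over linear predictors, usually by linear programming.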
Rüschendorf, L.; de Valk
We construct a.s. nonlinear regression representations of general stochastic processes (X_n)_{n ∈ ℕ}. As a consequence, we obtain in particular special regression representations of Markov chains and of certain m-dependent sequences. For m-dependent sequences we obtain a constructive
Kvalheim, O.M.; Arneberg, R.; Bleie, O.; Rajalahti, T.; Smilde, A.K.; Westerhuis, J.A.
The quality and practical usefulness of a regression model are a function of both interpretability and prediction performance. This work presents some new graphical tools for improved interpretation of latent variable regression models that can also assist in improved algorithms for variable
Zhdanov, Fedor; Kalnishkan, Yuri
This paper derives an identity connecting the square loss of ridge regression in on-line mode with the loss of the retrospectively best regressor. Some corollaries about the properties of the cumulative loss of on-line ridge regression are also obtained.
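In on-line mode, ridge regression predicts each new label from the examples seen so far and only then updates. The Python/NumPy sketch below runs that protocol and checks a recursive-least-squares identity: the cumulative square loss, with each term divided by 1 + x_t' A_{t-1}^{-1} x_t, equals the regularized loss of the best regressor chosen in hindsight. We assume this is the kind of identity the paper derives but do not claim it reproduces the paper's exact statement; the data and `a=1.0` are hypothetical.

```python
import numpy as np

def online_ridge(X, y, a=1.0):
    """On-line ridge: predict y_t with weights fitted on examples 1..t-1."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    d = X.shape[1]
    A = a * np.eye(d)          # a*I + sum of x_s x_s' seen so far
    b = np.zeros(d)            # sum of y_s x_s seen so far
    preds, factors = [], []
    for x_t, y_t in zip(X, y):
        w = np.linalg.solve(A, b)                            # current ridge weights
        preds.append(float(w @ x_t))                         # predict before y_t is revealed
        factors.append(1.0 + float(x_t @ np.linalg.solve(A, x_t)))
        A += np.outer(x_t, x_t)                              # then learn from (x_t, y_t)
        b += y_t * x_t
    return np.array(preds), np.array(factors)

def best_retrospective_loss(X, y, a=1.0):
    """min over theta of sum (y_t - theta'x_t)^2 + a * ||theta||^2."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    theta = np.linalg.solve(a * np.eye(X.shape[1]) + X.T @ X, X.T @ y)
    resid = y - X @ theta
    return float(resid @ resid + a * theta @ theta)

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=40)
preds, factors = online_ridge(X, y, a=1.0)
cumulative_loss = float(np.sum((y - preds) ** 2))
normalised_loss = float(np.sum((y - preds) ** 2 / factors))
print(cumulative_loss, normalised_loss, best_retrospective_loss(X, y, a=1.0))
```

Because every divisor is at least 1, the unnormalised cumulative loss is never smaller than the retrospective ridge loss.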
Dijkstra, Theo K.
For ridge regression the degrees of freedom are commonly calculated by the trace of the matrix that transforms the vector of observations on the dependent variable into the ridge regression estimate of its expected value. For a fixed ridge parameter this is unobjectionable. When the ridge parameter
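For a fixed ridge parameter lam, the trace definition can be computed directly, or equivalently from the singular values d_j of X as sum_j d_j^2 / (d_j^2 + lam). The Python/NumPy sketch below implements both forms; the random design matrix is hypothetical, and the sketch deliberately covers only the fixed-parameter case that the abstract calls unobjectionable.

```python
import numpy as np

def ridge_df(X, lam):
    """Ridge degrees of freedom as trace(H), H = X (X'X + lam I)^{-1} X'."""
    X = np.asarray(X, dtype=float)
    p = X.shape[1]
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)   # hat (smoother) matrix
    return float(np.trace(H))

def ridge_df_svd(X, lam):
    """Same quantity from the singular values d_j of X: sum_j d_j^2 / (d_j^2 + lam)."""
    d = np.linalg.svd(np.asarray(X, dtype=float), compute_uv=False)
    return float(np.sum(d**2 / (d**2 + lam)))

rng = np.random.default_rng(3)
X = rng.normal(size=(50, 4))      # hypothetical design matrix
for lam in (0.0, 10.0, 100.0):
    print(lam, ridge_df(X, lam), ridge_df_svd(X, lam))
```

At lam = 0 the trace equals the number of columns (for a full-rank X), and it decreases toward 0 as lam grows.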
de Jong, P.F.
In a hierarchical or fixed-order regression analysis, the independent variables are entered into the regression equation in a prespecified order. Such an analysis is often performed when the extra amount of variance accounted for in a dependent variable by a specific independent variable is the main
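In a fixed-order analysis, each block of predictors is added to the model in turn and the increase in R² is attributed to that block. A Python/NumPy sketch follows; the variable names `age` and `anxiety` and the simulated data are hypothetical, and the sketch omits the F-test usually reported for each increment.

```python
import numpy as np

def r_squared(A, y):
    """R^2 of a least-squares fit of y on the columns of A (A includes the intercept)."""
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1.0 - float(resid @ resid) / float(np.sum((y - y.mean()) ** 2))

def hierarchical_r2(blocks, y):
    """Enter predictor blocks in a fixed order; return (R^2, increment) per step."""
    y = np.asarray(y, dtype=float)
    A = np.ones((len(y), 1))              # start from the intercept-only model (R^2 = 0)
    r2_prev, steps = 0.0, []
    for block in blocks:
        block = np.asarray(block, dtype=float).reshape(len(y), -1)
        A = np.column_stack([A, block])   # add this block to everything entered before
        r2 = r_squared(A, y)
        steps.append((r2, r2 - r2_prev))
        r2_prev = r2
    return steps

rng = np.random.default_rng(2)
age = rng.normal(size=100)                # hypothetical covariate entered first
anxiety = rng.normal(size=100)            # hypothetical predictor of interest
outcome = 0.5 * age + 0.8 * anxiety + rng.normal(size=100)
for step, (r2, delta) in enumerate(hierarchical_r2([age, anxiety], outcome), start=1):
    print(f"step {step}: R^2 = {r2:.3f}, increment = {delta:.3f}")
```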
Faggion, Clovis Mariano; Chambrone, Leandro; Tu, Yu-Kang
To evaluate the quality of reporting of logistic regression models used to assess risk factors for tooth loss in patients who have received periodontal treatment. The PubMed, EMBASE, BIOSIS Citation Index, CINAHL, Web of Science, and LILACS electronic databases were searched up to 01 March 2014 to identify interventional longitudinal studies assessing risk factors for tooth loss after periodontal treatment. The reference lists of included studies were searched manually. No language restriction was applied to the search. Quality of reporting of logistic regression models was assessed using analytical and documentation criteria with a 15-item checklist. Criteria were judged as met (adequately reported) or not met (not reported). All searches, selection, data extraction, and quality assessment were performed independently and in duplicate. Of 621 records initially retrieved, 24 articles were included in the analysis. Fewer than 30% of all 360 data points met the criteria. "Coding of independent variables" was reported most frequently [n = 22 (83%) articles]. Criteria such as "internal and external validation of the model" were not met in any study assessed. The reporting of logistic regression models in studies assessing risk factors for tooth loss in patients who have received periodontal treatment is not optimal. © 2014 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Hepatic fibrosis is the common pathological outcome of chronic hepatic diseases. An accurate assessment of fibrosis degree provides an important reference for a definite diagnosis of diseases, treatment decision-making, treatment outcome monitoring, and prognostic evaluation. At present, many clinical studies have proven that regression of hepatic fibrosis and early-stage liver cirrhosis can be achieved by effective treatment, and a correct evaluation of fibrosis regression has become a hot topic in clinical research. Liver biopsy has long been regarded as the gold standard for the assessment of hepatic fibrosis, and thus it plays an important role in the evaluation of fibrosis regression. This article reviews the clinical application of current pathological staging systems in the evaluation of fibrosis regression from the perspectives of semi-quantitative scoring system, quantitative approach, and qualitative approach, in order to propose a better pathological evaluation system for the assessment of fibrosis regression.
Shanmugam, Nesan; Román-Rego, Ana; Ong, Peter; Kaski, Juan Carlos
Coronary artery disease is the major cause of death in the western world. The formation and rapid progression of atheromatous plaques can lead to serious cardiovascular events in patients with atherosclerosis. In recent years, a better understanding of the mechanisms leading to atheromatous plaque growth and disruption, together with the availability of powerful HMG-CoA reductase inhibitors (statins), has made plaque regression a realistic therapeutic goal. This article reviews the existing evidence underpinning current therapeutic strategies aimed at achieving atherosclerotic plaque regression. We also discuss imaging modalities for the assessment of plaque regression, predictors of regression, and whether plaque regression is associated with a survival benefit.
Rausch, Manuel; Zehetleitner, Michael
Are logistic regression slopes suitable for quantifying metacognitive sensitivity, i.e. the efficiency with which subjective reports differentiate between correct and incorrect task responses? We show analytically that logistic regression slopes are independent of rating criteria in one specific model of metacognition, which assumes (i) that rating decisions are based on sensory evidence generated independently of the sensory evidence used for primary task responses and (ii) that the distributions of evidence are logistic. Given a hierarchical model of metacognition, logistic regression slopes depend on rating criteria. According to all considered models, regression slopes depend on the primary task criterion. A reanalysis of previous data revealed that massive numbers of trials are required to distinguish between hierarchical and independent models with tolerable accuracy. It is argued that researchers who wish to use logistic regression as a measure of metacognitive sensitivity need to control the primary task criterion and rating criteria. Copyright © 2017 Elsevier Inc. All rights reserved.
Recent attempts to assess the practical impact of scientific research prompted my own reflections on over 40 years' worth of combining basic and applied cognitive psychology. Examples are drawn principally from the study of memory disorders, but also include applications to the assessment of attention, reading, and intelligence. The most striking conclusion concerns the many years it typically takes to go from an initial study to the final practical outcome. Although the complexity and sheer timescale involved make external evaluation problematic, the combination of practical satisfaction and theoretical stimulation makes the attempt to combine basic and applied research very rewarding. © 2013 The British Psychological Society.
Dries F. Benoit
After its introduction by Koenker and Bassett (1978), quantile regression has become an important and popular tool to investigate the conditional response distribution in regression. The R package bayesQR contains a number of routines to estimate quantile regression parameters using a Bayesian approach based on the asymmetric Laplace distribution. The package contains functions for typical quantile regression with a continuous dependent variable, but also supports quantile regression for binary dependent variables. For both types of dependent variables, an approach to variable selection using the adaptive lasso is provided. For the binary quantile regression model, the package also contains a routine that calculates the fitted probabilities for each vector of predictors. In addition, functions for summarizing the results, creating traceplots, posterior histograms and drawing quantile plots are included. This paper starts with a brief overview of the theoretical background of the models used in the bayesQR package. The main part of this paper discusses the computational problems that arise in the implementation of the procedure and illustrates the usefulness of the package through selected examples.
Allegrini, Franco; Braga, Jez W B; Moreira, Alessandro C O; Olivieri, Alejandro C
A new multivariate regression model, named Error Covariance Penalized Regression (ECPR), is presented. Following a penalized regression strategy, the proposed model incorporates information about the measurement error structure of the system, using the error covariance matrix (ECM) as a penalization term. Results are reported from both simulations and experimental data based on replicate mid- and near-infrared (MIR and NIR) spectral measurements. Under non-iid conditions, ECPR performs better than traditional first-order multivariate methods such as ridge regression (RR), principal component regression (PCR) and partial least-squares regression (PLS). Copyright © 2018 Elsevier B.V. All rights reserved.
Pajouheshnia, Romin; Pestman, Wiebe R; Teerenstra, Steven; Groenwold, Rolf H H
It is often unclear which approach to fit, assess and adjust a model will yield the most accurate prediction model. We present an extension of an approach for comparing modelling strategies in linear regression to the setting of logistic regression and demonstrate its application in clinical prediction research. A framework for comparing logistic regression modelling strategies by their likelihoods was formulated using a wrapper approach. Five different strategies for modelling, including simple shrinkage methods, were compared in four empirical data sets to illustrate the concept of a priori strategy comparison. Simulations were performed in both randomly generated data and empirical data to investigate the influence of data characteristics on strategy performance. We applied the comparison framework in a case study setting. Optimal strategies were selected based on the results of a priori comparisons in a clinical data set and the performance of models built according to each strategy was assessed using the Brier score and calibration plots. The performance of modelling strategies was highly dependent on the characteristics of the development data in both linear and logistic regression settings. A priori comparisons in four empirical data sets found that no strategy consistently outperformed the others. The percentage of times that a model adjustment strategy outperformed a logistic model ranged from 3.9% to 94.9%, depending on the strategy and data set. However, in our case study setting the a priori selection of optimal methods did not result in detectable improvement in model performance when assessed in an external data set. The performance of prediction modelling strategies is a data-dependent process and can be highly variable between data sets within the same clinical domain. A priori strategy comparison can be used to determine an optimal logistic regression modelling strategy for a given data set before selecting a final modelling approach.
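The Brier score mentioned above is the mean squared difference between predicted probabilities and observed 0/1 outcomes, and a calibration plot compares mean predicted risk with the observed event rate within groups of predictions. A plain-Python sketch of both follows; equal-width probability bins and the default of five bins are assumptions, and the numbers in the usage lines are made up.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("need equally long, non-empty inputs")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

def calibration_table(probs, outcomes, n_bins=5):
    """Equal-width bins on [0, 1]; (mean predicted, observed rate, count) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, o in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)   # p == 1.0 falls in the top bin
        bins[idx].append((p, o))
    table = []
    for members in bins:
        if members:
            mean_p = sum(p for p, _ in members) / len(members)
            rate = sum(o for _, o in members) / len(members)
            table.append((mean_p, rate, len(members)))
    return table

score = brier_score([0.9, 0.2, 0.6, 0.1], [1, 0, 1, 0])   # (0.01 + 0.04 + 0.16 + 0.01) / 4
table = calibration_table([0.1, 0.15, 0.9], [0, 0, 1])
print(score, table)
```

Points of the calibration table plotted as mean predicted risk against observed rate should lie near the diagonal for a well-calibrated model.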
Molano, Monica; Gonzalez, Mauricio; Gamboa, Oscar; Ortiz, Natasha; Luna, Joaquin; Hernandez, Gustavo; Posso, Hector; Murillo, Raul; Munoz, Nubia
Objective: To analyze the role of Human Papillomavirus (HPV) and other risk factors in the regression of cervical lesions in women from the Bogota Cohort. Methods: 200 HPV-positive women with abnormal cytology were included for regression analysis. The time to lesion regression was modeled using methods for interval-censored survival time data. Median duration of total follow-up was 9 years. Results: 80 (40%) women were diagnosed with Atypical Squamous Cells of Undetermined Significance (ASCUS) or Atypical Glandular Cells of Undetermined Significance (AGUS), while 120 (60%) were diagnosed with Low Grade Squamous Intra-epithelial Lesions (LSIL). Overall, 40% of the lesions were still present at the first year of follow-up, while 1.5% were still present at the 5-year check-up. The multivariate model showed similar regression rates for lesions in women with ASCUS/AGUS and women with LSIL (HR=0.82, 95% CI 0.59-1.12). Women infected with high-risk (HR) HPV types and those with mixed infections had lower lesion regression rates than women infected with low-risk (LR) types (HR=0.526, 95% CI 0.33-0.84, for HR types and HR=0.378, 95% CI 0.20-0.69, for mixed infections). Furthermore, women over 30 years had a higher lesion regression rate than women under 30 years (HR=1.53, 95% CI 1.03-2.27). The median time to lesion regression was 9 months, while the median time to HPV clearance was 12 months. Conclusions: In the studied population, the type of infection and the age of the women are critical factors for the regression of cervical lesions.
Kayano, Mitsunori; Kataoka, Tomoko
Multiple logistic regression was applied to milk yield and composition data for 632 records of healthy cows and 61 records of ketotic cows in Hokkaido, Japan. The purpose was to diagnose ketosis from milk yield and composition simultaneously. Because nutritional status, milk yield and composition are affected by parity, the cows were divided into two groups: (1) multiparous, including 314 healthy cows and 45 ketotic cows, and (2) primiparous, including 318 healthy cows and 16 ketotic cows. Multiple logistic regression was applied to these groups separately. For multiparous cows, milk yield (kg/day/cow) and protein-to-fat (P/F) ratio in milk were significant factors (P … ketosis. For primiparous cows, lactose content (%), solids-not-fat (SNF) content (%) and milk urea nitrogen (MUN) content (mg/dl) were significantly associated with ketosis (P … ketosis, provided sensitivity, specificity and area under the ROC curve (AUC) values of (1) 0.711, 0.726 and 0.781; and (2) 0.678, 0.767 and 0.738, respectively.
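Sensitivity and specificity depend on a chosen probability cut-off, whereas the AUC summarizes discrimination over all cut-offs; it equals the probability that a randomly chosen case (here, a ketotic cow) receives a higher predicted score than a randomly chosen non-case. A plain-Python sketch using that pairwise (Mann-Whitney) form follows; the scores, labels and the 0.5 threshold are hypothetical, and both classes are assumed to be present.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Classify score >= threshold as positive; labels are 1 (case) or 0 (non-case)."""
    tp = sum(1 for s, l in zip(scores, labels) if l == 1 and s >= threshold)
    fn = sum(1 for s, l in zip(scores, labels) if l == 1 and s < threshold)
    tn = sum(1 for s, l in zip(scores, labels) if l == 0 and s < threshold)
    fp = sum(1 for s, l in zip(scores, labels) if l == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

def auc(scores, labels):
    """Share of (case, non-case) pairs ranked correctly; ties count one half."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

scores = [0.9, 0.8, 0.4, 0.3, 0.2]   # hypothetical predicted probabilities
labels = [1, 0, 1, 0, 0]             # 1 = ketotic, 0 = healthy (hypothetical)
print(sensitivity_specificity(scores, labels, 0.5), auc(scores, labels))
```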
Tashakkor, Scott B.
NASA is developing the Space Launch System (SLS) to be a heavy-lift launch vehicle supporting human and scientific exploration beyond Earth orbit. SLS will have a common core stage, an upper stage, and different permutations of boosters and fairings to perform various crewed or cargo missions. Marshall Space Flight Center (MSFC) is writing the Flight Software (FSW) that will operate the SLS launch vehicle. The FSW is developed incrementally using "Agile" software techniques. As the FSW grows, its functionality must be tested continually to ensure that the integrity of the software is maintained. Manually testing an ever-growing set of requirements and features is inefficient, so testing must be automated to remain comprehensive. To support test automation, a framework for a regression test harness has been developed and used on SLS FSW. The test harness has a modular design that can compile or read in the required information specified by the developer of the test. The modularity keeps groups of tests independent and allows tests to be added and removed without disturbing others. This saves the SLS FSW team time, which is essential to meeting SLS Program technical and programmatic requirements. During development of SLS FSW, this technique has proved a useful tool for ensuring that all requirements have been tested and that desired functionality is maintained as changes occur. It also provides a mechanism for developers to check the functionality of the code they have developed. With this system, regression testing is automated through a scheduling tool and/or commit hooks. Key advantages of this test harness capability include execution support for multiple independent test cases, the ability for developers to specify precisely what they are testing and how, the ability to add
Tassios, Dimitrios P
Applied Chemical Engineering Thermodynamics provides undergraduate and graduate students of chemical engineering with the basic knowledge, methodology and references they need to apply this knowledge in industrial practice. Thus, in addition to the classical topics of the laws of thermodynamics, pure-component and mixture thermodynamic properties, and phase and chemical equilibria, the reader will find: the history of thermodynamics; energy conservation; intermolecular forces and molecular thermodynamics; cubic equations of state; and statistical mechanics. A large number of worked problems with solutions and an appendix with numerous tables of practical importance are extremely helpful for applied calculations. The computer programs on the included disk help the student become familiar with the typical methods used in industry for volumetric and vapor-liquid equilibria calculations.
The purpose of this paper is to present some useful methods for the introductory analysis of variables and subsets in relation to PLS regression. We present methods that are efficient in finding the appropriate variables or subsets to use in PLS regression. The general conclusion is that variable selection is important for successful analysis of chemometric data. An important aspect of the results presented is that a lack of variable selection can spoil the PLS regression, and that cross-validation measures using a test set can show larger variation, when we use different subsets of X, than...
A review of regression procedures for randomized response data, including univariate and multivariate logistic regression, the proportional odds model and item response model, and self-protective responses
Cruyff, M.; Böckenholt, U.; van der Heijden, P.G.M.; Frank, L.E.
In survey research, it is often problematic to ask people sensitive questions because they may refuse to answer or they may provide a socially desirable answer that does not reveal their true status on the sensitive question. To solve this problem Warner (1965) proposed randomized response (RR).
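In Warner's design each respondent, using a private randomizing device, answers either the sensitive statement (with known probability p) or its complement, so the interviewer never learns which question was answered. Solving P(yes) = p*pi + (1 - p)*(1 - pi) for pi gives a simple moment estimator. A plain-Python sketch follows; the counts and p = 0.7 are hypothetical.

```python
import math

def warner_estimate(n_yes, n, p):
    """Warner (1965) randomized-response estimate of a sensitive proportion.

    Each respondent answers the sensitive statement with probability p and its
    complement with probability 1 - p, so P(yes) = p*pi + (1 - p)*(1 - pi).
    Returns (pi_hat, standard error).
    """
    if abs(2 * p - 1) < 1e-12:
        raise ValueError("p must differ from 0.5")
    lam = n_yes / n                                   # observed share of 'yes' answers
    pi_hat = (lam - (1 - p)) / (2 * p - 1)
    se = math.sqrt(lam * (1 - lam) / (n * (2 * p - 1) ** 2))
    return pi_hat, se

pi_hat, se = warner_estimate(n_yes=380, n=1000, p=0.7)
print(pi_hat, se)
```

The estimate can fall outside [0, 1] in small samples, and its standard error grows as p approaches 0.5, which is the price paid for the respondents' privacy.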
Buckner, Mark A.
Learning from data is fast becoming the rule rather than the exception for many science and engineering research problems, particularly those encountered in nuclear engineering. Problems associated with learning from data fall under the more general category of inverse problems. A data-driven inverse problem involves constructing a predictive model of a target system from a collection of input/output observations. One difficulty in constructing a model that approximates such unknown causes solely from observations of their effects is that collinearities in the input data make the problem ill-posed. Ill-posed problems cause models obtained by conventional techniques, such as linear regression, neural networks and kernel techniques, to become unstable and produce unreliable results. Regularization methods using ordinary ridge regression (ORR) and kernel regression (KR) have been proposed as viable solutions to ill-posed problems. Successful application of ORR and KR requires the selection of optimal parameter values: ridge parameters for ORR and bandwidth parameters for KR. The common practice for both methods is to select a single parameter by minimizing an objective function that estimates empirical risk. The single parameter value is then applied to all predictor variables indiscriminately, in a one-size-fits-all fashion. Versions of ORR and KR have been proposed that use individual localized ridge parameters and a matrix of localized bandwidth parameters, optimally selected according to the relevance of their associated predictor variables to reducing empirical risk. Although the practical and theoretical value of both localized regression techniques is recognized, they have seen limited use because of the difficulties associated with selecting multiple optimal ridge parameters for localized ridge regression (LRR), defined as the localized ridge regression problem, and multiple optimal bandwidth
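Localized (generalized) ridge regression replaces the single ridge parameter of ordinary ridge with one parameter per predictor, beta = (X'X + diag(k_1, ..., k_p))^{-1} X'y. The Python/NumPy sketch below computes that estimator for given parameters; choosing the k_j by minimizing an empirical-risk estimate, which is the hard problem the abstract describes, is not shown. The nearly collinear toy data and the particular k values are hypothetical, and the sketch assumes centred data with no intercept.

```python
import numpy as np

def localized_ridge(X, y, ridge_params):
    """Ridge estimate with one ridge parameter per predictor.

    beta = (X'X + diag(k))^{-1} X'y; ordinary ridge is the special case
    in which every entry of k is the same.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    K = np.diag(np.asarray(ridge_params, dtype=float))
    return np.linalg.solve(X.T @ X + K, X.T @ y)

rng = np.random.default_rng(4)
x1 = rng.normal(size=50)
x2 = x1 + 0.05 * rng.normal(size=50)           # nearly collinear with x1
X = np.column_stack([x1, x2])
y = 2.0 * x1 + 0.1 * rng.normal(size=50)
beta_ols = localized_ridge(X, y, [0.0, 0.0])   # no penalty: ordinary least squares
beta_orr = localized_ridge(X, y, [1.0, 1.0])   # ordinary ridge, one shared parameter
beta_lrr = localized_ridge(X, y, [1e-3, 1e3])  # hypothetical per-variable choice
print(beta_ols, beta_orr, beta_lrr)
```

With the large parameter on `x2`, almost all of the shared signal is assigned to `x1`, which is the kind of variable-specific shrinkage a single ridge parameter cannot express.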
In this study, 52 asymptotic Curve Number (CN) regression equations were developed for combinations of representative land covers and hydrologic soil groups. In addition, to overcome the limitations of the original Long-term Hydrologic Impact Assessment (L-THIA) model when it is applied to larger watersheds, a watershed-scale L-THIA Asymptotic CN (ACN) regression equation model (watershed-scale L-THIA ACN model) was developed by integrating the asymptotic CN regressions with various modules for direct runoff, baseflow and channel routing. The watershed-scale L-THIA ACN model was applied to four watersheds in South Korea to evaluate the accuracy of its streamflow prediction. The coefficient of determination (R²) and Nash–Sutcliffe Efficiency (NSE) values for observed versus simulated streamflows over eight-day intervals were greater than 0.6 for all four watersheds. The watershed-scale L-THIA ACN model, including the asymptotic CN regression equation method, can simulate long-term streamflow sufficiently well with the ten parameters that have been added for the characterization of streamflow.
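The two fit statistics reported above can be computed directly: NSE = 1 - sum (obs - sim)^2 / sum (obs - mean(obs))^2, while R² is taken here as the squared Pearson correlation between observed and simulated flows (one common convention; the study's exact definition is not stated). A plain-Python sketch with hypothetical flows follows.

```python
def nash_sutcliffe(observed, simulated):
    """NSE = 1 - sum (obs - sim)^2 / sum (obs - mean(obs))^2."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst

def squared_correlation(observed, simulated):
    """R^2 taken as the squared Pearson correlation of observed and simulated values."""
    n = len(observed)
    mo = sum(observed) / n
    ms = sum(simulated) / n
    cov = sum((o - mo) * (s - ms) for o, s in zip(observed, simulated))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_s = sum((s - ms) ** 2 for s in simulated)
    return cov * cov / (var_o * var_s)

observed = [1.0, 2.0, 3.0, 4.0]      # hypothetical eight-day mean flows
simulated = [2.0, 3.0, 4.0, 5.0]     # same pattern, biased upward by 1
print(nash_sutcliffe(observed, simulated), squared_correlation(observed, simulated))
```

For a simulation that is uniformly 1 unit too high, R² is 1 but NSE is only 0.2, which is why the two measures are usually reported together.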
Langella, Giuliano; Basile, Angelo; Bonfante, Antonello; Manna, Piero; Terribile, Fabio
Digital soil mapping procedures are widely used to build two-dimensional continuous maps of several pedological attributes. Our work addressed a regression kriging (RK) technique and a bootstrapped artificial neural network approach in order to evaluate and compare (i) the accuracy of prediction, (ii) the suitability for inclusion in automatic engines (e.g. to constitute web processing services), and (iii) the time needed for calibrating models and for making predictions. Regression kriging is perhaps the most widely used geostatistical technique in the digital soil mapping literature. Here we applied EBLUP regression kriging, as it is deemed by pedometricians to be the most statistically sound RK flavor. An unusual multi-parametric and nonlinear machine learning approach was also implemented, called BAGAP (Bootstrap aggregating Artificial neural networks with Genetic Algorithms and Principal component regression). BAGAP combines a selected set of weighted neural nets having specified characteristics to yield an ensemble response. The purpose of applying these two particular models is to ascertain whether, and by how much, a more cumbersome machine learning method is more promising for making accurate and precise predictions. Aware of the difficulty of handling objects based on EBLUP-RK as well as BAGAP when they are embedded in environmental applications, we explore how readily they can be wrapped within Web Processing Services. Two further aspects are considered for an exhaustive evaluation and comparison: automaticity and calculation time with and without high-performance computing.
Xiang, Sijia; Yao, Weixin
In this article, we propose two classes of semiparametric mixture regression models with a single index for model-based clustering. Unlike many semiparametric/nonparametric mixture regression models that can only be applied to low-dimensional predictors, the new semiparametric models can easily incorporate high-dimensional predictors into the nonparametric components. The proposed models are very general, and many of the recently proposed semiparametric/nonparametric mixture regression models a...
Kumar, Akansha; Tsvetkov, Pavel V.; McClarren, Ryan G.
Highlights: • Presented a benchmark for the applicability of linear regression to complex systems. • Applied linear regression to a nuclear reactor power system. • Performed neutronics, thermal–hydraulics, and energy conversion analyses using the Brayton cycle for the design of a GCFBR. • Performed detailed sensitivity analysis on a set of parameters in a nuclear reactor power system. • Modeled and developed the reactor design using MCNP, regression using R, and thermal–hydraulics in Java. - Abstract: The paper presents a general strategy for sensitivity analysis (SA) and uncertainty quantification analysis (UA) of parameters related to a nuclear reactor design. This work also validates the use of linear regression (LR) for predictive analysis in a nuclear reactor design. The analysis helps to determine the parameters on which an LR model can be fit for predictive analysis. For those parameters, a regression surface is created from trial data and predictions are made using this surface. A general SA strategy to identify the influential parameters that affect the operation of the reactor is described. Design parameters are identified, and the linearity assumption required to apply LR to the reactor design is validated, based on a set of tests. The testing methods used to determine the behavior of the parameters can be used as a general strategy for UA and SA of nuclear reactor models and thermal–hydraulics calculations. A design of a gas-cooled fast breeder reactor (GCFBR), with thermal–hydraulics and energy transfer, has been used to demonstrate this method. MCNP6 is used to simulate the GCFBR design and perform the necessary criticality calculations. Java is used to build and run input samples and to extract data from the MCNP6 output files, and R is used to perform regression analysis and other multivariate variance, and analysis of the collinearity of data
© 2015, © American Statistical Association, Institute of Mathematical Statistics, and Interface Foundation of North America. Recent years have seen active developments of various penalized regression methods, such as LASSO and elastic net, to analyze high-dimensional data. In these approaches, the direction and length of the regression coefficients are determined simultaneously. Due to the introduction of penalties, the length of the estimates can be far from being optimal for accurate predictions. We introduce a new framework, regression by projection, and its sparse version to analyze high-dimensional data. The unique nature of this framework is that the directions of the regression coefficients are inferred first, and the lengths and the tuning parameters are determined by a cross-validation procedure to achieve the largest prediction accuracy. We provide a theoretical result for simultaneous model selection consistency and parameter estimation consistency of our method in high dimension. This new framework is then generalized such that it can be applied to principal components analysis, partial least squares, and canonical correlation analysis. We also adapt this framework for discriminant analysis. Compared with the existing methods, where there is relatively little control of the dependency among the sparse components, our method can control the relationships among the components. We present efficient algorithms and related theory for solving the sparse regression by projection problem. Based on extensive simulations and real data analysis, we demonstrate that our method achieves good predictive performance and variable selection in the regression setting, and the ability to control relationships between the sparse components leads to more accurate classification. In supplementary materials available online, the details of the algorithms and theoretical proofs, and R codes for all simulation studies are provided.
Offers an introduction to the R system for users with a background in economics. This book covers a variety of regression models, regression diagnostics and robustness issues, the nonlinear models of microeconomics, time series and time series econometrics.
Osterrieder, Daniela; Ventosa-Santaulària, Daniel; Vera-Valdés, J. Eduardo
Predictive return regressions with persistent regressors are typically plagued by (asymptotically) biased/inconsistent estimates of the slope, non-standard or potentially even spurious statistical inference, and regression unbalancedness. We alleviate the problem of unbalancedness in the theoretical predictive equation by suggesting a data generating process, where returns are generated as linear functions of a lagged latent I(0) risk process. The observed predictor is a function of this latent I(0) process, but it is corrupted by a fractionally integrated noise. Such a process may arise due to aggregation or unexpected level shifts. In this setup, the practitioner estimates a misspecified, unbalanced, and endogenous predictive regression. We show that the OLS estimate of this regression is inconsistent, but standard inference is possible. To obtain a consistent slope estimate, we then suggest...
This report presents a methodology for travel time estimation by using regression trees. The dissemination of travel time information has become crucial for effective traffic management, especially under congested road conditions. In the absence of c...
J Gordon Millichap
Patterns and features of regression in a case series of 53 girls and women with Rett syndrome were studied at the Institute of Child Health and Great Ormond Street Children’s Hospital, London, UK.
The model included fixed regression on AM (range from 30 to 138 mo) and the effect of herd-measurement date concatenation. Random parts of the model were RRM coefficients for additive and permanent environmental effects, while residual effects were modelled to account for heterogeneity of variance by AY. Estimates ...
Jaccard, James; Guilamo-Ramos, Vincent; Johansson, Margaret; Bouris, Alida
A major form of data analysis in clinical child and adolescent psychology is multiple regression. This article reviews issues in the application of such methods in light of the research designs typical of this field. Issues addressed include controlling covariates, evaluation of predictor relevance, comparing predictors, analysis of moderation,…
Rouder, Jeffrey N.; Morey, Richard D.
In this article, we present a Bayes factor solution for inference in multiple regression. Bayes factors are principled measures of the relative evidence from data for various models or positions, including models that embed null hypotheses. In this regard, they may be used to state positive evidence for a lack of an effect, which is not possible…
Zhang, Zheng; Lai, Zhihui; Xu, Yong; Shao, Ling; Wu, Jian; Xie, Guo-Sen
In this paper, we aim to learn compact and discriminative linear regression models. Linear regression has been widely used in different problems. However, most existing linear regression methods use the conventional zero-one matrix as the regression targets, which greatly narrows the flexibility of the regression model. Another major limitation of these methods is that the learned projection matrix fails to precisely project the image features to the target space because of its weak discriminative capability. To this end, we present an elastic-net regularized linear regression (ENLR) framework and develop two robust linear regression models with the following characteristics. First, our methods use two particular strategies to enlarge the margins between classes by relaxing the strict binary targets into a more feasible variable matrix. Second, a robust elastic-net regularization of singular values is introduced to enhance the compactness and effectiveness of the learned projection matrix. Third, the resulting optimization problem of ENLR has a closed-form solution in each iteration and can therefore be solved efficiently. Finally, rather than directly using the projection matrix for recognition, our methods employ the transformed features as new discriminative representations for the final image classification. Compared with the traditional linear regression model and some of its variants, our method is much more accurate in image classification. Extensive experiments on publicly available data sets demonstrate that the proposed framework can outperform state-of-the-art methods. The MATLAB code for our methods is available at http://www.yongxu.org/lunwen.html.
Kühnel, Line; Sommer, Stefan Horst
This paper considers the estimation problem arising when inferring parameters in the stochastic development regression model for manifold valued non-linear data. Stochastic development regression captures the relation between manifold-valued response and Euclidean covariate variables using the stochastic development construction. It is thereby able to incorporate several covariate variables and random effects. The model is intrinsically defined using the connection of the manifold, and the use of stochastic development avoids linearizing the geometry. We propose to infer parameters using...
Juang, C. H.; Huang, X. H.; Fleming, J. W.
This paper presents a new computational approach for performing fuzzy regression. In contrast to Bardossy's approach, the new approach, while dealing with fuzzy variables, closely follows the conventional regression technique. In this approach, treatment of fuzzy input is more 'computational' than 'symbolic.' The following sections first outline the formulation of the new approach, then deal with the implementation and computational scheme, and this is followed by examples to illustrate the new procedure.
Paindaveine, D.; Šiman, Miroslav
Vol. 56, No. 4 (2012), pp. 840-853. ISSN 0167-9473. R&D Projects: GA MŠk(CZ) 1M06047. Institutional research plan: CEZ:AV0Z10750506. Keywords: halfspace depth; multiple-output regression; parametric linear programming; quantile regression. Subject RIV: BA - General Mathematics. Impact factor: 1.304, year: 2012. http://library.utia.cas.cz/separaty/2012/SI/siman-0376413.pdf
Durrleman, S; Simon, R
We describe the use of cubic splines in regression models to represent the relationship between the response variable and a vector of covariates. This simple method can help prevent the problems that result from inappropriate linearity assumptions. We compare restricted cubic spline regression to non-parametric procedures for characterizing the relationship between age and survival in the Stanford Heart Transplant data. We also provide an illustrative example in cancer therapeutics.
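A restricted cubic spline replaces a single linear term with a small set of basis columns that can be fed to any regression routine. The sketch below builds one common truncated-power form of the basis for a scalar covariate; the resulting curve is cubic between knots and linear beyond the outer knots. Some software additionally rescales these terms (for example by the squared distance between the outer knots), which this sketch omits. The function name `rcs_basis` and the knot values in the test are hypothetical.

```python
def rcs_basis(x, knots):
    """Restricted cubic spline basis (unnormalized truncated-power form).

    Returns [x, s_1(x), ..., s_{k-2}(x)]. A linear combination of these
    terms is cubic between knots and linear below the first knot and
    above the last knot.
    """
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("need at least 3 knots")

    def pos3(u):
        # Truncated cubic (u)_+^3
        return u ** 3 if u > 0 else 0.0

    span = t[-1] - t[-2]
    basis = [x]
    for j in range(k - 2):
        # The two correction terms cancel the cubic and quadratic parts
        # beyond the last knot, which enforces linearity in the tail.
        basis.append(
            pos3(x - t[j])
            - pos3(x - t[-2]) * (t[-1] - t[j]) / span
            + pos3(x - t[-1]) * (t[-2] - t[j]) / span
        )
    return basis
```

Each row of a design matrix would be `rcs_basis(age, knots)` plus an intercept column, so the number of nonlinear coefficients grows with the number of knots (k - 2) rather than with the sample size.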
Abreu, Mery Natali Silva; Siqueira, Arminda Lucia; Caiaffa, Waleska Teixeira
Ordinal logistic regression models have been developed for analysis of epidemiological studies. However, the adequacy of such models for adjustment has so far received little attention. In this article, we reviewed the most important ordinal regression models and common approaches used to verify goodness-of-fit, using R or Stata programs. We performed formal and graphical analyses to compare ordinal models using data sets on health conditions from the National Health and Nutrition Examination Survey (NHANES II).
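The most widely used ordinal model is the proportional-odds (cumulative logit) model. As a minimal sketch, assuming the parameterization P(Y <= j) = logistic(theta_j - eta) with increasing cutpoints theta_j and linear predictor eta, the category probabilities are successive differences of the cumulative probabilities. Sign conventions for eta differ between packages, so this is not a description of any particular R or Stata routine; the function name is hypothetical.

```python
import math


def proportional_odds_probs(eta, cutpoints):
    """Category probabilities under a cumulative-logit (proportional-odds) model.

    P(Y <= j) = logistic(theta_j - eta) for increasing cutpoints theta_j;
    a larger eta shifts probability mass toward higher categories.
    """
    def logistic(u):
        return 1.0 / (1.0 + math.exp(-u))

    cum = [logistic(th - eta) for th in cutpoints] + [1.0]
    probs, prev = [], 0.0
    for c in cum:
        probs.append(c - prev)  # P(Y = j) = P(Y <= j) - P(Y <= j - 1)
        prev = c
    return probs
```

Because eta enters every cumulative logit with the same coefficient, a covariate shifts all cumulative odds by the same factor; checking that assumption is one of the goodness-of-fit questions the abstract refers to.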
Hassan, S J
Merkel cell carcinoma is a rare aggressive neuroendocrine carcinoma of the skin predominantly affecting elderly Caucasians. It has a high rate of local recurrence and regional lymph node metastases. It is associated with a poor prognosis. Complete spontaneous regression of Merkel cell carcinoma has been reported but is a poorly understood phenomenon. Here we present a case of complete spontaneous regression of metastatic Merkel cell carcinoma demonstrating a markedly different pattern of events from those previously published.
Al Kadiri, M.
We study the marginal longitudinal nonparametric regression problem and some of its semiparametric extensions. We point out that, while several elaborate proposals for efficient estimation have been made, a relatively simple and straightforward one, based on penalized splines, has not. After describing our approach, we explain how Gibbs sampling and the BUGS software can be used to achieve quick and effective implementation. Illustrations are provided for nonparametric regression and additive models.
Liu, Bing; Xia, Shixiong; Zhou, Yong
Traditional manifold learning algorithms, such as locally linear embedding, Isomap, and Laplacian eigenmap, only provide the embedding results of the training samples. To solve the out-of-sample extension problem, spectral regression (SR) solves the problem of learning an embedding function by establishing a regression framework, which can avoid eigen-decomposition of dense matrices. Motivated by the effectiveness of SR, we incorporate multiple kernel learning (MKL) into SR for dimensionality...
This book is an undergraduate text that introduces students to commonly-used statistical methods in economics. Using examples based on contemporary economic issues and readily-available data, it not only explains the mechanics of the various methods, it also guides students to connect statistical results to detailed economic interpretations. Because the goal is for students to be able to apply the statistical methods presented, online sources for economic data and directions for performing each task in Excel are also included.
Lange, Theis; Hansen, Kim Wadt; Sørensen, Rikke
In recent years, mediation analysis has emerged as a powerful tool to disentangle causal pathways from an exposure/treatment to clinically relevant outcomes. Mediation analysis has been applied in scientific fields as diverse as labour market relations and randomized clinical trials of heart disease treatments. In parallel to these applications, the underlying mathematical theory and computer tools have been refined. This combined review and tutorial will introduce the reader to modern mediation analysis including: the mathematical framework; required assumptions; and software implementation...
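As a concrete starting point, the classical product-of-coefficients approach fits a mediator model (M on X) and an outcome model (Y on X and M) and reports a*b as the indirect effect and the X coefficient of the outcome model as the direct effect. This coincides with the natural indirect effect of the modern counterfactual framework only under linear models without exposure-mediator interaction and the usual no-unmeasured-confounding assumptions; the sketch below is that classical special case, not the full framework of the review. The function name and data are hypothetical.

```python
def mediation_product(x, m, y):
    """Indirect (a*b) and direct effects from two linear regressions.

    a: slope of M on X; c_prime and b: slopes of Y on X and M fitted jointly.
    """
    n = len(x)

    def mean(v):
        return sum(v) / n

    def dot(u, v):
        return sum(p * q for p, q in zip(u, v))

    mx, mm, my = mean(x), mean(m), mean(y)
    cx = [v - mx for v in x]  # centering removes the intercepts
    cm = [v - mm for v in m]
    cy = [v - my for v in y]
    # Mediator model: M = i1 + a X + e1
    a = dot(cx, cm) / dot(cx, cx)
    # Outcome model: Y = i2 + c' X + b M + e2, solved via 2x2 normal equations
    sxx, smm, sxm = dot(cx, cx), dot(cm, cm), dot(cx, cm)
    sxy, smy = dot(cx, cy), dot(cm, cy)
    det = sxx * smm - sxm ** 2
    c_prime = (sxy * smm - sxm * smy) / det  # Cramer's rule
    b = (sxx * smy - sxm * sxy) / det
    return {"a": a, "b": b, "indirect": a * b, "direct": c_prime}
```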
Percin, M.; De Baar, J.H.S.; Van Oudheusden, B.W.; Dwight, R.P.
The work explores the three-dimensional unsteady wake of a flapping-wing Micro Air Vehicle (MAV) ‘DelFly II’, applying a Kriging regression technique for the spatial regression of time-resolved Stereoscopic Particle Image Velocimetry (Stereo-PIV) data. In view of the limited number of measurement
de Schryver, Tom; Eisinga, R.N.
The key question in research on dismissals of head coaches in sports clubs is not whether they should happen but when they will happen. This paper applies piecewise linear regression to advance our understanding of the timing of head coach dismissals. Essentially, the regression sacrifices degrees
Blank, J.L.T.; Valdmanis, V.G.
This study identifies the factors that affect the diffusion of hospital innovations. We apply a log odds random effects regression model on hospital micro data. We introduce the concept of clustering innovations and the application of a log odds random effects regression model to describe the
Cepeda-Cuervo, Edilberto; Núñez-Antón, Vicente
In this article, a proposed Bayesian extension of the generalized beta spatial regression models is applied to the analysis of the quality of education in Colombia. We briefly review the beta distribution and describe the joint modeling approach for the mean and dispersion parameters in the spatial regression models' setting. Finally, we motivate…
Bassiouni, Maoya; Vogel, Richard M.; Archfield, Stacey A.
Multicollinearity and omitted-variable bias are major limitations to developing multiple linear regression models to estimate streamflow characteristics in ungaged areas and varying rainfall conditions. Panel regression is used to overcome limitations of traditional regression methods, and obtain reliable model coefficients, in particular to understand the elasticity of streamflow to rainfall. Using annual rainfall and selected basin characteristics at 86 gaged streams in the Hawaiian Islands, regional regression models for three stream classes were developed to estimate the annual low-flow duration discharges. Three panel-regression structures (random effects, fixed effects, and pooled) were compared to traditional regression methods, in which space is substituted for time. Results indicated that panel regression generally was able to reproduce the temporal behavior of streamflow and reduce the standard errors of model coefficients compared to traditional regression, even for models in which the unobserved heterogeneity between streams is significant and the variance inflation factor for rainfall is much greater than 10. This is because both spatial and temporal variability were better characterized in panel regression. In a case study, regional rainfall elasticities estimated from panel regressions were applied to ungaged basins on Maui, using available rainfall projections to estimate plausible changes in surface-water availability and usable stream habitat for native species. The presented panel-regression framework is shown to offer benefits over existing traditional hydrologic regression methods for developing robust regional relations to investigate streamflow response in a changing climate.
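Of the three panel structures compared, the fixed-effects model absorbs unobserved stream-level heterogeneity by demeaning each variable within its group before fitting the slope (the "within" transformation). A minimal single-covariate sketch follows; the function name, group labels, and data are hypothetical and do not come from the Hawaiian streamflow study.

```python
from collections import defaultdict


def within_slope(groups, x, y):
    """Fixed-effects (within) slope: demean x and y by group, then OLS."""
    # Accumulate per-group sums of x, sums of y, and counts.
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for g, xi, yi in zip(groups, x, y):
        s = sums[g]
        s[0] += xi
        s[1] += yi
        s[2] += 1
    num = den = 0.0
    for g, xi, yi in zip(groups, x, y):
        sx, sy, n = sums[g]
        dx, dy = xi - sx / n, yi - sy / n  # deviations from group means
        num += dx * dy
        den += dx * dx
    return num / den
```

In the test data each group has its own intercept (10 and -5) but a common slope of 2; the within estimator recovers that slope exactly, whereas a pooled regression that ignores the groups would be pulled toward the difference in intercepts.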
Bu, Yude; Pan, Jingchang
As is well known, it is necessary to derive stellar parameters from massive amounts of spectral data automatically and efficiently. However, in traditional automatic methods such as artificial neural networks (ANNs) and kernel regression (KR), it is often difficult to optimize the algorithm structure and determine the optimal algorithm parameters. Gaussian process regression (GPR) is a recently developed method that has been proven to be capable of overcoming these difficulties. Here we apply GPR to derive stellar atmospheric parameters from spectra. Through evaluating the performance of GPR on Sloan Digital Sky Survey (SDSS) spectra, Medium resolution Isaac Newton Telescope Library of Empirical Spectra (MILES) spectra, ELODIE spectra and the spectra of member stars of galactic globular clusters, we conclude that GPR can derive stellar parameters accurately and precisely, especially when we use data preprocessed with principal component analysis (PCA). We then compare the performance of GPR with that of several widely used regression methods (ANNs, support-vector regression and KR) and find that with GPR it is easier to optimize structures and parameters and more efficient and accurate to extract atmospheric parameters.
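The core computation of Gaussian process regression is the posterior mean k(x*, X)(K + noise I)^{-1} y. The sketch below uses scalar inputs, a zero prior mean, and a squared-exponential kernel with fixed hyperparameters; in practice (and presumably in the study, whose inputs are PCA-processed spectra) the kernel hyperparameters are tuned, typically by maximizing the marginal likelihood. All function names are hypothetical, and the small linear solver is included only to keep the block self-contained.

```python
import math


def rbf(a, b, length=1.0, variance=1.0):
    """Squared-exponential (RBF) covariance between two scalars."""
    return variance * math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(A, b):
    """Solve A v = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    v = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        v[r] = (M[r][n] - sum(M[r][c] * v[c] for c in range(r + 1, n))) / M[r][r]
    return v


def gp_posterior_mean(x_train, y_train, x_test, noise=1e-6, length=1.0):
    """GP regression posterior mean: k(x*, X) (K + noise I)^{-1} y."""
    K = [[rbf(a, b, length) + (noise if i == j else 0.0)
          for j, b in enumerate(x_train)] for i, a in enumerate(x_train)]
    alpha = solve(K, y_train)
    return [sum(rbf(xs, xt, length) * al for xt, al in zip(x_train, alpha))
            for xs in x_test]
```

With a tiny noise term the posterior mean nearly interpolates the training targets, and far from the data it reverts to the zero prior mean.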
Liang Jiang; Peter C.B. Phillips; Jun Yu
This paper develops a new hedonic method for constructing a real estate price index that utilizes all transaction price information that encompasses both single-sale and repeat-sale properties. The new method is less prone to specification errors than standard hedonic methods and uses all available data. Like the Case-Shiller repeat-sales method, the new method has the advantage of being computationally efficient. In an empirical analysis of the methodology, we fit the model to all transactio...
Rong, Yao; Han, Xixuan; Hao, Dongmei
to compare the corticomuscular coherence in the alpha (7–15Hz), beta (15–30Hz) and gamma (30–45Hz) band at 25 % maximum grip force (MGF) and 75 % MGF. Results show that ESVR could reduce the influence of deflected signals and summarize the overall behavior of multiple coherence curves. Coherence proportion...
The aim of the present paper is two-fold. First, it attempts to support previous findings on the role of some psychometric variables, such as, M-capacity, the degree of field dependence-independence, logical thinking and the mobility-fixity dimension, on students' achievement in chemistry problem solving. Second, the paper aims to raise some…
Bauman, William; Crawford, Winifred; Barrett, Joe; Watson, Leela; Wheeler, Mark
This report summarizes the Applied Meteorology Unit (AMU) activities for the first quarter of Fiscal Year 2010 (October - December 2009). A detailed project schedule is included in the Appendix. Included tasks are: (1) Peak Wind Tool for User Launch Commit Criteria (LCC), (2) Objective Lightning Probability Tool, Phase III, (3) Peak Wind Tool for General Forecasting, Phase II, (4) Upgrade Summer Severe Weather Tool in Meteorological Interactive Data Display System (MIDDS), (5) Advanced Regional Prediction System (ARPS) Data Analysis System (ADAS) Update and Maintainability, (6) Verify 12-km resolution North American Model (MesoNAM) Performance, and (7) Hybrid Single-Particle Lagrangian Integrated Trajectory (HYSPLIT) Graphical User Interface.
We propose a surprisingly simple model for supervised video background estimation. Our model is based on $\ell_1$ regression. As existing methods for $\ell_1$ regression do not scale to high-resolution videos, we propose several simple and scalable methods for solving the problem, including iteratively reweighted least squares, a homotopy method, and stochastic gradient descent. We show through extensive experiments that our model and methods match or outperform the state-of-the-art online and batch methods in virtually all quantitative and qualitative measures.
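Iteratively reweighted least squares, one of the solvers named above, turns $\ell_1$ (least absolute deviations) regression into a sequence of weighted least-squares problems with weights 1/|r_i| from the previous residuals. The sketch below fits a single line y = a + b x; the floor `eps` on the residuals (to avoid division by zero) and the fixed iteration count are assumptions of this sketch, and it does not reflect the scalable video-scale implementation of the paper.

```python
def l1_line_irls(x, y, iters=200, eps=1e-8):
    """Fit y ~ a + b*x minimizing sum |residual| by IRLS.

    Each step solves a weighted least-squares problem with weights
    1 / max(|r_i|, eps) computed from the previous residuals.
    """
    w = [1.0] * len(x)  # first pass is ordinary least squares
    a = b = 0.0
    for _ in range(iters):
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        b = sxy / sxx
        a = my - b * mx
        # Large residuals get small weights, so outliers lose influence.
        w = [1.0 / max(abs(yi - a - b * xi), eps) for xi, yi in zip(x, y)]
    return a, b
```

In the test, nine points lie exactly on y = 2x + 1 and one is a gross outlier; ordinary least squares is pulled far off the line, while the $\ell_1$ fit recovers it, which is the robustness property that makes $\ell_1$ attractive for separating a static background from moving foreground.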
We consider nonparametric regression analysis in a generalized linear model (GLM) framework for data with covariates that are the subject-specific random effects of longitudinal measurements. The usual assumption that the effects of the longitudinal covariate processes are linear in the GLM may be unrealistic and if this happens it can cast doubt on the inference of observed covariate effects. Allowing the regression functions to be unknown, we propose to apply Bayesian nonparametric methods including cubic smoothing splines or P-splines for the possible nonlinearity and use an additive model in this complex setting. To improve computational efficiency, we propose the use of data-augmentation schemes. The approach allows flexible covariance structures for the random effects and within-subject measurement errors of the longitudinal processes. The posterior model space is explored through a Markov chain Monte Carlo (MCMC) sampler. The proposed methods are illustrated and compared to other approaches, the "naive" approach and the regression calibration, via simulations and by an application that investigates the relationship between obesity in adulthood and childhood growth curves. © 2010, The International Biometric Society.
Boukal, David S; Ditrich, Tomáš; Kutcherov, Dmitry; Sroka, Pavel; Dudová, Pavla; Papáček, Miroslav
Temperature drives development in insects and other ectotherms because their metabolic rate and growth depends directly on thermal conditions. However, relative durations of successive ontogenetic stages often remain nearly constant across a substantial range of temperatures. This pattern, termed 'developmental rate isomorphy' (DRI) in insects, appears to be widespread and reported departures from DRI are generally very small. We show that these conclusions may be due to the caveats hidden in the statistical methods currently used to study DRI. Because the DRI concept is inherently based on proportional data, we propose that Dirichlet regression applied to individual-level data is an appropriate statistical method to critically assess DRI. As a case study we analyze data on five aquatic and four terrestrial insect species. We find that results obtained by Dirichlet regression are consistent with DRI violation in at least eight of the studied species, although standard analysis detects significant departure from DRI in only four of them. Moreover, the departures from DRI detected by Dirichlet regression are consistently much larger than previously reported. The proposed framework can also be used to infer whether observed departures from DRI reflect life history adaptations to size- or stage-dependent effects of varying temperature. Our results indicate that the concept of DRI in insects and other ectotherms should be critically re-evaluated and put in a wider context, including the concept of 'equiproportional development' developed for copepods.
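Dirichlet regression treats each individual's vector of stage-duration proportions as a draw from a Dirichlet distribution whose parameters depend on covariates such as temperature. A minimal sketch of the log-likelihood, assuming the common log-link parameterization alpha_k = exp(x . beta_k) for each component k (other parameterizations, such as mean/precision, also exist, and this is not necessarily the authors' exact specification), is given below. Proportions must be strictly positive; function names and coefficient values are hypothetical.

```python
import math


def dirichlet_logpdf(p, alpha):
    """Log-density of a proportion vector p (all entries > 0) under Dirichlet(alpha)."""
    return (math.lgamma(sum(alpha))
            - sum(math.lgamma(a) for a in alpha)
            + sum((a - 1.0) * math.log(pi) for a, pi in zip(alpha, p)))


def dirichlet_regression_loglik(X, P, betas):
    """Log-likelihood with alpha_k = exp(x . beta_k) for each component k.

    X: covariate rows (e.g. [1, temperature]); P: proportion rows;
    betas: one coefficient vector per component.
    """
    total = 0.0
    for x, p in zip(X, P):
        alpha = [math.exp(sum(xi * bi for xi, bi in zip(x, beta))) for beta in betas]
        total += dirichlet_logpdf(p, alpha)
    return total
```

Maximizing this log-likelihood over the betas (with any numerical optimizer) gives the regression fit; under strict developmental rate isomorphy the expected proportions would not change with temperature.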
Among the recent data mining techniques available, the boosting approach has attracted a great deal of attention because of its effective learning algorithm and strong bounds on its generalization performance. However, although the boosting approach has been actively used in other domains, it has yet to be applied to regression problems in the construction domain, such as cost estimation. Therefore, a boosting regression tree (BRT) is applied to cost estimation at the early stage of a construction project to examine the applicability of the boosting approach to a regression problem within the construction domain. To evaluate the BRT model, its performance was compared with that of a neural network (NN) model, which has been shown to perform well in cost estimation. Using 234 actual cost datasets of a building construction project, the BRT model produced results similar to those of the NN model. In addition, the BRT model can provide additional information, such as the importance plot and structure model, which can help estimators understand the decision-making process. Consequently, the boosting approach has potential applicability in preliminary cost estimation for building construction projects.
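Boosted regression trees build a prediction by repeatedly fitting a small tree to the current residuals (the negative gradient of squared-error loss) and adding a shrunken copy of it to the model. The sketch below uses one feature and depth-one trees (stumps) for brevity; the study's BRT model uses full regression trees over many cost drivers, and the function names, learning rate, and toy data are hypothetical.

```python
def fit_stump(x, r):
    """Best single split on x minimizing the squared error of residuals r."""
    best = None
    for s in sorted(set(x))[:-1]:  # every threshold leaves both sides non-empty
        left = [ri for xi, ri in zip(x, r) if xi <= s]
        right = [ri for xi, ri in zip(x, r) if xi > s]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, s, lm, rm)
    _, s, lm, rm = best
    return lambda v: lm if v <= s else rm


def boost(x, y, n_rounds=50, lr=0.1):
    """Least-squares gradient boosting with decision stumps (needs >= 2 distinct x)."""
    base = sum(y) / len(y)
    stumps = []
    pred = [base] * len(y)
    for _ in range(n_rounds):
        resid = [yi - pi for yi, pi in zip(y, pred)]  # negative gradient
        h = fit_stump(x, resid)
        stumps.append(h)
        pred = [pi + lr * h(xi) for pi, xi in zip(pred, x)]
    return lambda v: base + lr * sum(h(v) for h in stumps)
```

The learning rate trades accuracy per round against the number of rounds: each round here removes only 10% of the remaining residual, which tends to generalize better than fitting the residuals in one step.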
Sjölund, J; Forsberg, D; Andersson, M; Knutsson, H
Radiotherapy planning and attenuation correction of PET images require simulation of radiation transport. The necessary physical properties are typically derived from computed tomography (CT) images, but in some cases, including stereotactic neurosurgery and combined PET/MR imaging, only magnetic resonance (MR) images are available. With these applications in mind, we describe how a realistic, patient-specific pseudo-CT of the head can be derived from anatomical MR images. We refer to the method as atlas-based regression, because of its similarity to atlas-based segmentation. Given a target MR and an atlas database comprising MR and CT pairs, atlas-based regression works by registering each atlas MR to the target MR, applying the resulting displacement fields to the corresponding atlas CTs and, finally, fusing the deformed atlas CTs into a single pseudo-CT. We use a deformable registration algorithm known as the Morphon and augment it with a certainty mask that allows tailoring of the influence that certain regions have on the registration. Moreover, we propose a novel method of fusion, wherein the collection of deformed CTs is iteratively registered to their joint mean, and find that the resulting mean CT becomes more similar to the target CT. However, the voxelwise median provided even better results, at least as good as those of earlier work that required special MR imaging techniques. This makes atlas-based regression a good candidate for clinical use.
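The final fusion step can be illustrated in a few lines: once the atlas CTs have been deformed into the target space, each voxel of the pseudo-CT is a summary (mean or median) of the corresponding voxels across atlases. The sketch assumes registration has already been done and represents each image as a flat list of voxel values; the function name and numbers are hypothetical, and the iterative registration-to-the-mean variant is not shown.

```python
from statistics import mean, median


def fuse_voxelwise(deformed_cts, use_median=True):
    """Fuse co-registered images (flat lists of voxel values) into one image."""
    combine = median if use_median else mean
    # zip(*images) groups the values of the same voxel across all atlases.
    return [combine(values) for values in zip(*deformed_cts)]
```

The median is less sensitive than the mean to a single poorly registered atlas: in the test, one atlas has an extreme value at the second voxel, which shifts the mean there but leaves the median unchanged.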
Lake water levels change under the influence of natural and/or anthropogenic environmental conditions. Among these influences are climate change, the greenhouse effect and ozone layer depletion, which are reflected in the hydrological cycle features over the lake drainage basins. Lake levels are among the most significant hydrological variables that are influenced by different atmospheric and environmental conditions. Consequently, lake level time series in many parts of the world include nonstationary components such as shifts in the mean value and apparent or hidden periodicities. On the other hand, many lake level modeling techniques have a stationarity assumption. The main purpose of this work is to develop a cluster regression model for dealing with nonstationarity, especially in the form of shifting means. The basis of this model is the combination of transition probability and the classical regression technique. Both parts of the model are applied to monthly level fluctuations of Lake Van in eastern Turkey. It is observed that the cluster regression procedure preserves the statistical properties of the data and yields transition probabilities that are indistinguishable from those of the original data.
Key words: Hydrology (hydrologic budget; stochastic processes) · Meteorology and atmospheric dynamics (ocean-atmosphere interactions)
Bertazzon, Stefania; Johnson, Markey; Eccles, Kristin; Kaplan, Gilaad G
In order to accurately assess air pollution risks, health studies require spatially resolved pollution concentrations. Land-use regression (LUR) models estimate ambient concentrations at a fine spatial scale. However, spatial effects such as spatial non-stationarity and spatial autocorrelation can reduce the accuracy of LUR estimates by increasing regression errors and uncertainty; and statistical methods for resolving these effects, e.g., spatially autoregressive (SAR) and geographically weighted regression (GWR) models, may be difficult to apply simultaneously. We used an alternate approach to address spatial non-stationarity and spatial autocorrelation in LUR models for nitrogen dioxide. Traditional models were re-specified to include a variable capturing wind speed and direction, and re-fit as GWR models. Mean R² values for the resulting GWR-wind models (summer: 0.86, winter: 0.73) showed a 10-20% improvement over traditional LUR models. GWR-wind models effectively addressed both spatial effects and produced meaningful predictive models. These results suggest a useful method for improving spatially explicit models. Copyright © 2015 The Authors. Published by Elsevier Ltd. All rights reserved.
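Geographically weighted regression handles spatial non-stationarity by fitting a separate weighted least-squares regression at each location, with weights that decay with distance from that location. The sketch below uses a fixed Gaussian kernel and a single predictor; in practice the bandwidth is usually selected by cross-validation or AIC, and the study's wind-speed-and-direction variable and NO2 data are not reproduced. The function name and toy data are hypothetical.

```python
import math


def gwr_local_fit(coords, x, y, target, bandwidth):
    """Local intercept and slope at `target` via Gaussian-kernel weighted LS."""
    # Weight each observation by its distance to the target location.
    w = [math.exp(-0.5 * (math.dist(c, target) / bandwidth) ** 2) for c in coords]
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return my - b * mx, b
```

In the test, the western half of a transect has slope 2 and the eastern half slope 6; a single global regression would average them, whereas the local fits recover each regime near its own end of the transect.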
Quintero, Adrian; Lesaffre, Emmanuel
Multivariate regression methods generally assume a constant covariance matrix for the observations. When a heteroscedastic model is needed, the parametric and nonparametric covariance regression approaches in the literature can be restrictive. We propose a multilevel regression model for the mean and covariance structure, including random intercepts in both components and allowing for correlation between them. The implied conditional covariance function can differ across clusters as a result of the random effect in the variance structure. In addition, allowing for correlation between the random intercepts in the mean and covariance makes the model convenient for skewed responses. Furthermore, it permits us to analyse directly the relation between the mean response level and the variability in each cluster. Parameter estimation is carried out via Gibbs sampling. We compare the performance of our model to other covariance modelling approaches in a simulation study. Finally, the proposed model is applied to the RN4CAST dataset to identify the variables that impact burnout of nurses in Belgium. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Beyerlein, Andreas; Fahrmeir, Ludwig; Mansmann, Ulrich; Toschke, André M
Body mass index (BMI) data usually have skewed distributions, for which common statistical modeling approaches such as simple linear or logistic regression have limitations. Different regression approaches to predict childhood BMI by goodness-of-fit measures and means of interpretation were compared including generalized linear models (GLMs), quantile regression and Generalized Additive Models for Location, Scale and Shape (GAMLSS). We analyzed data of 4967 children participating in the school entry health examination in Bavaria, Germany, from 2001 to 2002. TV watching, meal frequency, breastfeeding, smoking in pregnancy, maternal obesity, parental social class and weight gain in the first 2 years of life were considered as risk factors for obesity. GAMLSS showed a much better fit regarding the estimation of risk factors effects on transformed and untransformed BMI data than common GLMs with respect to the generalized Akaike information criterion. In comparison with GAMLSS, quantile regression allowed for additional interpretation of prespecified distribution quantiles, such as quantiles referring to overweight or obesity. The variables TV watching, maternal BMI and weight gain in the first 2 years were directly, and meal frequency was inversely significantly associated with body composition in any model type examined. In contrast, smoking in pregnancy was not directly, and breastfeeding and parental social class were not inversely significantly associated with body composition in GLM models, but in GAMLSS and partly in quantile regression models. Risk factor specific BMI percentile curves could be estimated from GAMLSS and quantile regression models. GAMLSS and quantile regression seem to be more appropriate than common GLMs for risk factor modeling of BMI data.
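Quantile regression, one of the approaches compared here, estimates a chosen quantile (for example a percentile associated with overweight) by minimizing the summed check, or pinball, loss rather than squared error. With only an intercept the minimizer is the empirical quantile, which the sketch below finds by searching over the data points; models with covariates are usually solved by linear programming, which is not shown. Function names and data are hypothetical.

```python
def check_loss(u, tau):
    """Quantile-regression check (pinball) loss for residual u at level tau."""
    # tau * u for u >= 0, (1 - tau) * |u| for u < 0
    return u * (tau - (1.0 if u < 0 else 0.0))


def fit_constant_quantile(y, tau):
    """Minimize summed check loss over constants; an optimum lies at a data point."""
    return min(sorted(y), key=lambda q: sum(check_loss(yi - q, tau) for yi in y))
```

Because the loss penalizes under- and over-prediction asymmetrically, raising tau from 0.5 to 0.9 moves the fitted value from the median toward the upper tail.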
Ali A. Al-Subaihi
This paper introduces a SAS/IML program to select among multivariate candidate models based on a few well-known multivariate model selection criteria. Stepwise regression and all-possible-regressions are considered. The program is user friendly: the user pastes or reads the data at the beginning of the module, includes the names of the dependent and independent variables (the y's and the x's), and then runs the module. The program produces the multivariate candidate models based on the following criteria: Forward Selection, Forward Stepwise Regression, Backward Elimination, Mean Square Error, Coefficient of Multiple Determination, Adjusted Coefficient of Multiple Determination, Akaike's Information Criterion, the Corrected Form of Akaike's Information Criterion, the Hannan and Quinn Information Criterion, the Corrected Form of the Hannan and Quinn (HQc) Information Criterion, Schwarz's Criterion, and Mallows' Cp. The output includes both detailed and summarized results.
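For orientation, the single-response versions of several of these criteria can be computed from a model's residual sum of squares. The sketch below is not the SAS/IML program, and the program's multivariate criteria generalize these formulas rather than match them. It assumes p counts the estimated regression coefficients including the intercept (some texts also count the error variance), and the information criteria are given up to additive constants, which differ between software packages. The function name is hypothetical.

```python
import math


def selection_criteria(rss, n, p, tss, sigma2_full):
    """Common single-response model-selection criteria for a linear model.

    rss: residual sum of squares; n: observations; p: coefficients incl.
    intercept; tss: total sum of squares about the mean; sigma2_full:
    error-variance estimate from the largest candidate model (for Cp).
    """
    ll_term = n * math.log(rss / n)  # shared Gaussian log-likelihood term
    return {
        "R2": 1 - rss / tss,
        "adj_R2": 1 - (rss / (n - p)) / (tss / (n - 1)),
        "AIC": ll_term + 2 * p,
        "AICc": ll_term + 2 * p + 2 * p * (p + 1) / (n - p - 1),
        "HQ": ll_term + 2 * p * math.log(math.log(n)),
        "SBC": ll_term + p * math.log(n),  # Schwarz's criterion
        "Cp": rss / sigma2_full - n + 2 * p,  # Mallows' Cp
    }
```

Smaller AIC, AICc, HQ, and SBC values indicate a preferred model, while for Mallows' Cp a good candidate has Cp close to p.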
Grasmair, M.; Scherzer, O.; Vanhems, A.
This paper considers the nonparametric regression model with an additive error that is dependent on the explanatory variables. As is common in empirical studies in epidemiology and economics, it also supposes that valid instrumental variables are observed. A classical example in microeconomics considers the consumer demand function as a function of the price of goods and the income, both variables often considered as endogenous. In this framework, the economic theory also imposes shape restrictions on the demand function, such as integrability conditions. Motivated by this illustration in microeconomics, we study an estimator of a nonparametric constrained regression function using instrumental variables by means of Tikhonov regularization. We derive rates of convergence for the regularized model both in a deterministic and stochastic setting under the assumption that the true regression function satisfies a projected source condition including, because of the non-convexity of the imposed constraints, an additional smallness condition.
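The role of Tikhonov regularization can be seen in a generic discretized ill-posed problem K phi = r, where the estimate phi_alpha = (K'K + alpha I)^(-1) K'r trades a small bias for a large reduction in noise amplification. This sketch is only an illustration under simplifying assumptions: a known integration matrix stands in for the paper's conditional-expectation operator, there are no instruments or shape constraints, and alpha = 1e-3 is an arbitrary choice.

```python
import numpy as np

def tikhonov(K, r, alpha):
    """Solve min_phi ||K phi - r||^2 + alpha * ||phi||^2 via the regularized normal equations."""
    m = K.shape[1]
    return np.linalg.solve(K.T @ K + alpha * np.eye(m), K.T @ r)

m = 100
h = 1.0 / m
s = (np.arange(m) + 0.5) * h               # grid midpoints on [0, 1]
K = h * np.tril(np.ones((m, m)))           # discretized integration operator (ill-posed)
phi_true = np.sin(2 * np.pi * s)

rng = np.random.default_rng(1)
r = K @ phi_true + 0.01 * rng.normal(size=m)   # noisy "reduced-form" data

phi_naive = tikhonov(K, r, 0.0)            # unregularized: amplifies the noise
phi_reg = tikhonov(K, r, 1e-3)             # Tikhonov-regularized

err_naive = np.linalg.norm(phi_naive - phi_true)
err_reg = np.linalg.norm(phi_reg - phi_true)
print(f"error without regularization: {err_naive:.2f}, with: {err_reg:.2f}")
```

In practice alpha would be tied to the noise level, which is what convergence-rate results such as those above make precise.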
In this study, we analyzed the factors that affect the performance of Turkey's Top 500 Industrial Enterprises using quantile regression. Labor productivity of the enterprises is the dependent variable, and assets is the independent variable. The distribution of labor productivity is right-skewed. When the distribution of the dependent variable is skewed, linear regression may fail to capture important aspects of the relationship between the dependent variable and its predictors, because it models only the conditional mean. Quantile regression, which can model any quantile of the dependent variable's distribution, including the median, is therefore useful: it shows whether the relationships between the dependent and independent variables differ across low, medium, and high percentiles. The analysis shows that the effect of total assets is relatively constant over the entire distribution except the upper tail, where it has a moderately stronger effect.
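Quantile regression replaces squared error with the check (pinball) loss rho_tau(u) = u (tau - 1{u < 0}). The sketch below shows the key property on a synthetic right-skewed sample: the constant that minimizes the mean check loss is a sample tau-quantile, just as the mean minimizes squared error. It does not reproduce the study's fitted model.

```python
import numpy as np

def check_loss(u, tau):
    """Koenker-Bassett check (pinball) function: rho_tau(u) = u * (tau - 1{u < 0})."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0).astype(float))

def best_constant(y, tau):
    """Observation that minimizes the mean check loss, i.e. a sample tau-quantile."""
    y = np.asarray(y, dtype=float)
    losses = [check_loss(y - q, tau).mean() for q in y]
    return float(y[int(np.argmin(losses))])

# A right-skewed sample: the mean and the upper quantiles tell different stories.
rng = np.random.default_rng(5)
skewed = rng.lognormal(mean=0.0, sigma=1.0, size=501)
print("mean:", skewed.mean())
print("median (tau = 0.5):", best_constant(skewed, 0.5))
print("90th percentile (tau = 0.9):", best_constant(skewed, 0.9))
```

Replacing the constant q with a linear predictor x'beta and minimizing the same loss gives linear quantile regression. In Python this is available, for example, as `QuantReg` in statsmodels.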
U.S. Environmental Protection Agency — Spreadsheets are included here to support the manuscript "Boosted Regression Tree Models to Explain Watershed Nutrient Concentrations and Biological Condition". This...
Data comprising 1,719 milk yield records from 357 females (predominantly of the Murrah breed), daughters of 110 sires, born between 1974 and 2004 and obtained from the Programa de Melhoramento Genético de Bubalinos (PROMEBUL) and from records of the EMBRAPA Amazônia Oriental (EAO) herd in Belém, Pará, Brazil, were used to compare random regression models for estimating variance components and predicting breeding values of the sires. The data were analyzed with different models using Legendre polynomial functions of second to fourth order. The random regression models included the effects of herd-year and month of parity date of the control; regression coefficients for age of the females (to describe the fixed part of the lactation curve); and random regression coefficients for the direct genetic and permanent environmental effects. The models were compared using the Akaike Information Criterion. The random regression model using third-order Legendre polynomials with four classes of the environmental effect best described the additive genetic variation in milk yield. Heritability estimates varied from 0.08 to 0.40. The genetic correlation between milk yields at younger ages was close to unity, but at older ages it was low.
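In random regression models, each animal's genetic curve is written as a linear combination of Legendre polynomials of age rescaled to [-1, 1]. The genetic covariance between two ages then follows as phi(a1)' K_g phi(a2), where K_g is the covariance matrix of the random coefficients. This sketch makes several assumptions. The ages and K_g are hypothetical. It uses the normalized polynomials sqrt((2j + 1)/2) P_j common in the breeding literature. It uses degree 2 (three coefficients), because papers differ on whether "order" counts coefficients or polynomial degree.

```python
import numpy as np
from numpy.polynomial import legendre

def legendre_covariates(age, degree, normalized=True):
    """Legendre covariates of age, rescaled to [-1, 1], for a random regression model."""
    age = np.asarray(age, dtype=float)
    x = 2.0 * (age - age.min()) / (age.max() - age.min()) - 1.0
    phi = legendre.legvander(x, degree)            # columns P_0(x), ..., P_degree(x)
    if normalized:
        # Normalized polynomials sqrt((2j + 1) / 2) * P_j(x)
        phi = phi * np.sqrt((2 * np.arange(degree + 1) + 1) / 2.0)
    return phi

ages = np.array([36, 48, 60, 84, 120, 180])        # hypothetical ages (months)
phi = legendre_covariates(ages, degree=2)          # three coefficients per animal

# Hypothetical (positive definite) covariance matrix of the random genetic coefficients
k_g = np.array([[4.0, 0.5, 0.1],
                [0.5, 1.0, 0.2],
                [0.1, 0.2, 0.3]])
g = phi @ k_g @ phi.T                              # genetic (co)variance between ages
sd = np.sqrt(np.diag(g))
genetic_corr = g / np.outer(sd, sd)
print(np.round(genetic_corr, 2))
```

Heritability at each age would additionally need the permanent-environment and residual variances, which this sketch leaves out.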
Hou, Jianglong; Kang, Y. James
Pathological cardiac hypertrophy is a key risk factor for heart failure. It is associated with increased interstitial fibrosis, cell death and cardiac dysfunction. The progression of pathological cardiac hypertrophy has long been considered irreversible. However, recent clinical observations and experimental studies have produced evidence of its reversal. Left ventricular assist devices used in heart failure patients as a bridge to transplantation not only improve peripheral circulation but also often cause reverse remodeling of the heart's geometry and recovery of its function. Dietary supplementation with physiologically relevant levels of copper can reverse pathological cardiac hypertrophy in mice. Angiogenesis is essential, and vascular endothelial growth factor (VEGF) is a constitutive factor for the regression. The action of VEGF is mediated by VEGF receptor-1, whose activation is linked to cyclic GMP-dependent protein kinase-1 (PKG-1) signaling pathways, and inhibition of cyclic GMP degradation leads to regression of pathological cardiac hypertrophy. Most of these pathways are regulated by hypoxia-inducible factor. Potential therapeutic targets for promoting the regression include promotion of angiogenesis, selective enhancement of VEGF receptor-1 signaling pathways, stimulation of PKG-1 pathways, and maintenance of hypoxia-inducible factor transcriptional activity. More exciting insights into the regression of pathological cardiac hypertrophy are emerging, and the time for translating the concept of regression of pathological cardiac hypertrophy into clinical practice is approaching. PMID:22750195 Copyright © 2012 Elsevier Inc. All rights reserved.
Langer, Rupert; Becker, Karen
Neoadjuvant therapy has been successfully introduced in the treatment of locally advanced gastrointestinal malignancies, particularly esophageal, gastric, and rectal cancers. The effects of preoperative chemo- or radiochemotherapy can be determined by histopathological investigation of the resection specimen following this treatment. Frequent histological findings after neoadjuvant therapy include various amounts of residual tumor, inflammation, resorptive changes with infiltrates of foamy histiocytes, foreign body reactions, and scar-like fibrosis. Several tumor regression grading (TRG) systems, which aim to categorize the amount of regressive changes after cytotoxic treatment in primary tumor sites, have been proposed for gastroesophageal and rectal carcinomas. These systems primarily refer to the amount of therapy-induced fibrosis in relation to the residual tumor (e.g., the Mandard, Dworak, or AJCC systems) or the estimated percentage of residual tumor in relation to the previous tumor site (e.g., the Becker, Rödel, or Rectal Cancer Regression Grading systems). TRGs provide valuable prognostic information because, in most cases, complete or subtotal tumor regression after neoadjuvant treatment is associated with better patient outcomes. This review describes the typical histopathological findings after neoadjuvant treatment, discusses the most commonly used TRG systems for gastroesophageal and rectal carcinomas, addresses the limitations and critical issues of tumor regression grading in these tumors, and describes the clinical impact of TRG.
Rozliman, Nur Aainaa; Ibrahim, Adriana Irawati Nur; Yunus, Rossita Mohammad
In many applications and experiments, data sets are contaminated with error or contain mismeasured covariates. When at least one covariate in a model is measured with error, an errors-in-variables (EIV) model can be used. Measurement error that is not corrected leads to misleading statistical inference and analysis. Our goal is therefore to examine the relationship between the outcome variable and the unobserved exposure variable, given the observed mismeasured surrogate, by applying a Bayesian formulation of the EIV model. We extend the flexible parametric method proposed by Hossain and Gustafson (2009) to another nonlinear regression model, the Poisson regression model, and illustrate the approach with a simulation study using Markov chain Monte Carlo sampling methods.
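Why uncorrected measurement error misleads can be seen in a short simulation. The setup assumes a normal exposure x, classical additive normal error (w = x + u with var(u) = var(x)), and a log-linear Poisson outcome. Under these assumptions, the naive slope from regressing y on w tends to lambda * beta, where lambda = var(x) / (var(x) + var(u)) = 0.5. The fit is plain maximum likelihood via Newton-Raphson. This is not the Bayesian flexible parametric correction described above, which is not implemented here.

```python
import numpy as np

def poisson_fit(X, y, n_iter=100, tol=1e-10):
    """Maximum-likelihood Poisson regression (log link) by Newton-Raphson.

    Assumes column 0 of X is the intercept.
    """
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())               # start at the intercept-only fit
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        step = np.linalg.solve(X.T @ (X * mu[:, None]), X.T @ (y - mu))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

rng = np.random.default_rng(2)
n = 20_000
x = rng.normal(size=n)                       # true (unobserved) exposure
w = x + rng.normal(size=n)                   # observed surrogate with classical error
y = rng.poisson(np.exp(0.5 + 0.5 * x))       # outcome depends on the true exposure

ones = np.ones(n)
b_true = poisson_fit(np.column_stack([ones, x]), y)    # infeasible in practice
b_naive = poisson_fit(np.column_stack([ones, w]), y)   # ignores measurement error
print("slope using x:", round(b_true[1], 3), " naive slope using w:", round(b_naive[1], 3))
```

The naive slope is attenuated to about half the true value, which is the kind of bias an EIV model is meant to remove.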
While support vector regression is widely used both as a function-approximation tool and as a residual generator for fault isolation in nonlinear systems, a drawback of this method is the freedom in selecting its model parameters. Moreover, for samples whose distributions vary in complexity, selecting reasonable parameters may even be impossible. To alleviate this problem, we introduce flexible support vector regression (F-SVR), which is especially suited to modelling complicated sample distributions because it does not require parameter selection: reasonable parameters for F-SVR are generated automatically from the sample distribution. Finally, we apply this method to the fault isolation of high-frequency power supplies, with satisfactory results.
Kala, Abhishek K; Tiwari, Chetan; Mikler, Armin R; Atkinson, Samuel F
The primary aim of the study reported here was to determine the effectiveness of using local spatial variations in environmental data to uncover the statistical relationships between West Nile Virus (WNV) risk and environmental factors. Least squares regression methods do not account for the spatial autocorrelation and non-stationarity of the spatial data analyzed in studies of WNV and its environmental determinants. We therefore hypothesized that a geographically weighted regression model would help us better understand how environmental factors are related to WNV risk patterns without the confounding effects of spatial non-stationarity. We examined commonly mapped environmental factors using both ordinary least squares regression (LSR) and geographically weighted regression (GWR). Both types of models were applied to examine the relationship between counts of WNV-infected dead birds and various environmental factors at those locations. The goal was to determine which approach yielded a better predictive model. The LSR model identified three environmental variables that were statistically significantly related to WNV-infected dead birds (adjusted R² = 0.61): stream density, road density, and land surface temperature. GWR increased the explanatory value of these three environmental variables, with better spatial precision (adjusted R² = 0.71). The spatial granularity of the geographically weighted approach provides a better understanding of how environmental spatial heterogeneity is related to WNV risk, as indicated by WNV-infected dead birds, which should allow improved planning of public health management strategies.
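GWR fits a separate weighted least squares regression at every location, so each place gets its own coefficients, beta(i) = (X'W_i X)^(-1) X'W_i y. The weights W_i decay with distance from location i. This sketch makes several assumptions. It uses a Gaussian kernel with a fixed bandwidth of 0.1, whereas in practice the bandwidth is usually chosen by cross-validation or AICc. It uses one synthetic covariate whose effect strengthens from west to east. It is not the study's data or software.

```python
import numpy as np

def gwr_coefficients(coords, X, y, bandwidth):
    """Local weighted least squares at every observation location (Gaussian distance kernel)."""
    n = len(y)
    betas = np.empty((n, X.shape[1]))
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-0.5 * (d / bandwidth) ** 2)
        Xw = X * w[:, None]                              # rows of X scaled by the weights
        betas[i] = np.linalg.solve(X.T @ Xw, Xw.T @ y)   # (X'WX)^-1 X'Wy
    return betas

rng = np.random.default_rng(6)
n = 400
coords = rng.random((n, 2))                       # locations in a unit square
x = rng.normal(size=n)                            # one environmental covariate
slope = 1.0 + 2.0 * coords[:, 0]                  # effect strengthens from west to east
y = 0.5 + slope * x + 0.5 * rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

beta_global = np.linalg.lstsq(X, y, rcond=None)[0]   # one OLS slope for the whole area
beta_local = gwr_coefficients(coords, X, y, bandwidth=0.1)
print("global OLS slope:", round(beta_global[1], 2))
print("range of local GWR slopes:", beta_local[:, 1].min().round(2), beta_local[:, 1].max().round(2))
```

The single OLS slope averages over the spatial trend, while the local slopes recover it. This is the non-stationarity that GWR is designed to expose.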
Richardson, David B; Hamra, Ghassan B; MacLehose, Richard F; Cole, Stephen R; Chu, Haitao
In cohort mortality studies, there often is interest in associations between an exposure of primary interest and mortality due to a range of different causes. A standard approach to such analyses involves fitting a separate regression model for each type of outcome. However, the statistical precision of some estimated associations may be poor because of sparse data. In this paper, we describe a hierarchical regression model for estimation of parameters describing outcome-specific relative rate functions and associated credible intervals. The proposed model uses background stratification to provide flexible control for the outcome-specific associations of potential confounders, and it employs a hierarchical "shrinkage" approach to stabilize estimates of an exposure's associations with mortality due to different causes of death. The approach is illustrated in analyses of cancer mortality in 2 cohorts: a cohort of dioxin-exposed US chemical workers and a cohort of radiation-exposed Japanese atomic bomb survivors. Compared with standard regression estimates of associations, hierarchical regression yielded estimates with improved precision that tended to have less extreme values. The hierarchical regression approach also allowed the fitting of models with effect-measure modification. The proposed hierarchical approach can yield estimates of association that are more precise than conventional estimates when one wishes to estimate associations with multiple outcomes. © The Author 2015. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved.
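The shrinkage idea can be illustrated with a much simpler model than the one described above. Here the cause-specific log rate ratios are treated as draws from a common normal distribution, and each estimate is pulled toward the common mean by the factor tau² / (tau² + se²). This is a normal-normal empirical-Bayes sketch with a DerSimonian-Laird moment estimate of tau², not the authors' fully Bayesian hierarchical model. The estimates and standard errors are hypothetical.

```python
import numpy as np

def eb_shrink(est, se):
    """Normal-normal empirical-Bayes shrinkage of outcome-specific estimates toward a common mean."""
    est = np.asarray(est, dtype=float)
    var = np.asarray(se, dtype=float) ** 2
    w = 1.0 / var
    mu_fixed = np.sum(w * est) / np.sum(w)
    q = np.sum(w * (est - mu_fixed) ** 2)                 # Cochran's Q heterogeneity statistic
    tau2 = max(0.0, (q - (len(est) - 1)) / (np.sum(w) - np.sum(w ** 2) / np.sum(w)))
    w_star = 1.0 / (var + tau2)
    mu = np.sum(w_star * est) / np.sum(w_star)            # estimated common (prior) mean
    shrink = tau2 / (tau2 + var)                          # weight kept on each own estimate
    return mu + shrink * (est - mu), mu, tau2

# Hypothetical log rate ratios per unit exposure for five causes of death, with standard errors.
log_rr = np.array([0.8, 0.1, -0.3, 1.5, 0.2])
se = np.array([0.3, 0.2, 0.5, 0.9, 0.25])
post, mu, tau2 = eb_shrink(log_rr, se)
print("common mean:", round(mu, 3), "between-cause variance:", round(tau2, 3))
print("shrunken estimates:", np.round(post, 3))
```

Imprecise estimates (large se) are pulled furthest toward the common mean. This matches the less extreme values that the hierarchical approach yielded.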
Past life regression therapy is used by some physicians for certain mental disorders. Some doctors have treated anxiety disorders, mood disorders, and gender dysphoria with past life regression therapy on the assumption that these conditions reflect problems in past lives. Although it is not supported by psychiatric associations, few medical associations have actually condemned it as unethical. In this article, I argue that past life regression therapy is unethical for two basic reasons. First, it is not evidence-based. Past life regression is based on the reincarnation hypothesis, but this hypothesis is not supported by evidence and, in fact, faces some insurmountable conceptual problems. If patients are not fully informed about these problems, they cannot give informed consent, and hence the principle of autonomy is violated. Second, past life regression therapy carries a great risk of implanting false memories in patients and thus causing significant harm. This violates the principle of non-maleficence, which is surely the most important principle in medical ethics.
Kim, Sun Mi; Kim, Yongdai; Jeong, Kuhwan; Jeong, Heeyeong; Kim, Jiyoung
The aim of this study was to compare the performance of image analysis for predicting breast cancer using two distinct regression models and to evaluate the usefulness of incorporating clinical and demographic data (CDD) into the image analysis in order to improve the diagnosis of breast cancer. This study included 139 solid masses from 139 patients who underwent an ultrasonography-guided core biopsy and had available CDD between June 2009 and April 2010. Three breast radiologists retrospectively reviewed the 139 breast masses and described each lesion using the Breast Imaging Reporting and Data System (BI-RADS) lexicon. We applied and compared two regression methods, stepwise logistic (SL) regression and logistic least absolute shrinkage and selection operator (LASSO) regression, in which the BI-RADS descriptors and CDD were used as covariates. We investigated the performance of these regression methods and the agreement of radiologists in terms of test misclassification error and the area under the curve (AUC) of the tests. Logistic LASSO regression was superior (PBI-RADS descriptors and CDD showed better performance than SL in predicting the presence of breast cancer. The use of CDD as a supplement to the BI-RADS descriptors significantly improved the prediction of breast cancer using logistic LASSO regression.
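Logistic LASSO adds an L1 penalty lam * ||beta||_1 to the logistic log-loss. As a result, some coefficients become exactly zero, so covariate selection happens inside the fit rather than through stepwise entry and removal. The sketch below solves this problem by proximal gradient descent (ISTA) on synthetic standardized covariates. The solver, the penalty lam = 0.05, and the data are assumptions made for illustration. The study presumably used standard software, and lam is normally chosen by cross-validation.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1 (sets small coefficients exactly to zero)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def logistic_lasso(X, y, lam, n_iter=5000):
    """Minimize mean logistic loss + lam * ||beta||_1 (intercept unpenalized) by proximal gradient."""
    n, p = X.shape
    b0, beta = 0.0, np.zeros(p)
    # 1 / L, where L bounds the Lipschitz constant of the gradient of the mean logistic loss
    step = 4.0 * n / (n + np.linalg.norm(X, 2) ** 2)
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(b0 + X @ beta)))
        resid = prob - y
        b0 -= step * resid.mean()
        beta = soft_threshold(beta - step * (X.T @ resid) / n, step * lam)
    return b0, beta

rng = np.random.default_rng(7)
n, p = 300, 10
X = rng.normal(size=(n, p))                 # stand-ins for standardized descriptors and CDD
true_beta = np.zeros(p)
true_beta[:2] = [1.5, -1.0]                 # only two covariates truly matter
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(X @ true_beta)))).astype(float)

b0, beta_hat = logistic_lasso(X, y, lam=0.05)
print("selected covariates:", np.flatnonzero(beta_hat))
```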
Zhang, Yong; Kwon, Dongjin; Pohl, Kilian M
We derive an algorithm to directly solve logistic regression under a group cardinality (group sparsity) constraint and use it to classify intra-subject MRI sequences (e.g., cine MRIs) of healthy versus diseased subjects. Group cardinality constraint models are often applied to medical images to avoid overfitting the classifier to the training data. Solutions within these models are generally determined by relaxing the cardinality constraint to a weighted feature selection scheme. However, these solutions relate to the original sparse problem only under specific assumptions, which generally do not hold for medical image applications. In addition, inferring clinical meaning from features weighted by a classifier is an ongoing topic of discussion. Avoiding feature weighting, we propose to directly solve the group cardinality constrained logistic regression problem by generalizing the Penalty Decomposition method. To do so, we assume that an intra-subject series of images represents repeated samples of the same disease patterns. We model this assumption by combining the series of measurements created by a feature across time into a single group. Our algorithm then derives a solution within that model by decoupling the minimization of the logistic regression function from the enforcement of the group sparsity constraint. The minimum of the smooth and convex logistic regression problem is determined via gradient descent, while we derive a closed-form solution for finding a sparse approximation of that minimum. We apply our method to cine MRI of 38 healthy controls and 44 adult patients who received reconstructive surgery for Tetralogy of Fallot (TOF) during infancy. Our method correctly identifies regions impacted by TOF and generally obtains significantly higher classification accuracy than alternative solutions to this model, i.e., ones relaxing the group cardinality constraint. Copyright © 2016 Elsevier B.V. All rights reserved.
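A closed-form sparse approximation is possible because the Euclidean projection onto the set {x : at most k groups nonzero} has an explicit solution: keep the k groups with the largest Euclidean norm and set every other group to zero. The sketch below shows only that projection step, which is one natural reading of "closed form" here. The full penalty decomposition alternates this step with a gradient-based logistic minimization that is not shown. The feature layout and values are hypothetical.

```python
import numpy as np

def project_group_cardinality(z, groups, k):
    """Euclidean projection onto {x : at most k groups of x are nonzero}.

    Closed form: keep the k groups with the largest Euclidean norm, zero the rest.
    """
    z = np.asarray(z, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    norms = np.array([np.linalg.norm(z[groups == g]) for g in labels])
    keep = labels[np.argsort(norms)[::-1][:k]]
    return np.where(np.isin(groups, keep), z, 0.0)

# Hypothetical: 4 image features, each observed at 3 time points of a cine sequence;
# all time points of one feature form one group.
groups = np.repeat(np.arange(4), 3)
z = np.array([0.1, -0.2, 0.1,   1.0, 0.8, -0.9,   0.05, 0.0, 0.1,   -0.6, 0.7, 0.5])
x = project_group_cardinality(z, groups, k=2)
print(x.reshape(4, 3))
```

Because whole groups are kept or dropped, a selected feature is used at every time point. This is how the model encodes the assumption that the image series repeats the same disease pattern.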
Ioannidou, Elena; Flori, Aikaterini; Varouchakis, Emmanouil A.; Giannakis, Georgios; Vozinaki, Anthi Eirini K.; Karatzas, George P.; Nikolaidis, Nikolaos
erosion occurrence probability can be calculated in conjunction with the model deviance regarding the independent variables tested. The most straightforward measure of goodness of fit is the G statistic. It is a simple and effective way to study and evaluate the efficiency of the logistic regression model and the reliability of each independent variable. The developed statistical model is applied to the Koiliaris River Basin on the island of Crete, Greece. Two datasets of river bank slope, river cross-section width and indications of erosion were available for the analysis (12 and 8 locations). Two types of spatial dependence functions, exponential and tricubic, were examined to determine the local spatial dependence of the independent variables at the measurement locations. The results show a significant improvement when the tricubic function is applied, as the erosion probability is accurately predicted at all eight validation locations. The model deviance results show that cross-section width is more important than bank slope in estimating erosion probability along the Koiliaris riverbanks. The proposed statistical model is a useful tool that quantifies the erosion probability along the riverbanks and can be used to assist in managing erosion and flooding events. Acknowledgements: This work is part of an ongoing THALES project (CYBERSENSORS - High Frequency Monitoring System for Integrated Water Resources Management of Rivers). The project has been co-financed by the European Union (European Social Fund - ESF) and Greek national funds through the Operational Program "Education and Lifelong Learning" of the National Strategic Reference Framework (NSRF) - Research Funding Program: THALES. Investing in knowledge society through the European Social Fund.
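The G statistic is the likelihood-ratio statistic G = 2 (log L_model - log L_null), which equals the null deviance minus the model deviance. Under the null hypothesis it is approximately chi-square with as many degrees of freedom as added predictors. The sketch below computes G for an ordinary (non-spatial) logistic regression. It uses two predictors named after bank slope and cross-section width, but with invented values and far more locations than the study had. The exponential and tricubic spatial weighting functions are not included. With 2 degrees of freedom the chi-square tail probability has the closed form exp(-G/2).

```python
import math
import numpy as np

def logit_loglik(X, y, n_iter=100, tol=1e-10):
    """Fit a logistic regression by Newton-Raphson; return (coefficients, maximized log-likelihood)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        step = np.linalg.solve(X.T @ (X * (p * (1 - p))[:, None]), X.T @ (y - p))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    return beta, float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))

# Hypothetical cross-sections: bank slope (degrees) and cross-section width (m).
rng = np.random.default_rng(8)
n = 200
slope = rng.uniform(5, 45, n)
width = rng.uniform(5, 40, n)
Xs = np.column_stack([(slope - slope.mean()) / slope.std(), (width - width.mean()) / width.std()])
eta = -0.5 + 0.4 * Xs[:, 0] + 1.2 * Xs[:, 1]
eroded = (rng.random(n) < 1 / (1 + np.exp(-eta))).astype(float)

ones = np.ones((n, 1))
_, ll_null = logit_loglik(ones, eroded)
_, ll_full = logit_loglik(np.hstack([ones, Xs]), eroded)

g_stat = 2.0 * (ll_full - ll_null)        # null deviance minus model deviance
p_value = math.exp(-g_stat / 2.0)         # chi-square upper tail with 2 df
print(f"G = {g_stat:.2f}, p = {p_value:.3g}")
```

Refitting without one predictor and comparing the two deviances in the same way gives a per-variable G. This is one way to judge which variable matters more, as in the width-versus-slope comparison above.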
Wang, Lingfeng; Pan, Chunhong
In this brief, we propose a new groupwise retargeted least squares regression (GReLSR) model for multicategory classification. The main motivation behind GReLSR is to use an additional regularization term to restrict the translation values of ReLSR so that they are similar within the same class. By analyzing the regression targets of ReLSR, we propose a new formulation of ReLSR in which the translation values are expressed explicitly. On the basis of this new formulation, discriminative least squares regression can be regarded as a special case of ReLSR with zero translation values. Moreover, a groupwise constraint is added to ReLSR to form the new GReLSR model. Extensive experiments on various machine learning data sets show that our method outperforms the current state-of-the-art approaches.
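ReLSR, discriminative LSR, and GReLSR all start from least squares regression onto fixed class targets. The baseline regresses on one-hot 0/1 vectors and predicts the class with the largest output. The sketch below shows only this baseline, with a small ridge term and synthetic, well-separated clusters. The retargeting, translation values, and groupwise constraint proposed above modify these targets and are not implemented here.

```python
import numpy as np

def fit_lsr_classifier(X, labels, n_classes, ridge=1e-3):
    """Least squares regression onto one-hot (0/1) targets, with a small ridge term."""
    Xb = np.column_stack([X, np.ones(len(X))])        # append a bias column
    Y = np.eye(n_classes)[labels]                    # one-hot regression targets
    return np.linalg.solve(Xb.T @ Xb + ridge * np.eye(Xb.shape[1]), Xb.T @ Y)

def predict(W, X):
    """Assign each sample to the class with the largest regression output."""
    Xb = np.column_stack([X, np.ones(len(X))])
    return np.argmax(Xb @ W, axis=1)

rng = np.random.default_rng(16)
centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
labels = np.repeat(np.arange(3), 50)
X = centers[labels] + 0.5 * rng.normal(size=(150, 2))
W = fit_lsr_classifier(X, labels, 3)
accuracy = np.mean(predict(W, X) == labels)
print("training accuracy:", accuracy)
```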
Birch, Kristina; Olsen, Jørgen Kai; Tjur, Tue
Using a data set of weekly sales and prices for three brands of coffee, this paper discusses various regression models and their relation to the multiplicative competitive-interaction model (the MCI model, see Cooper 1988, 1993) for market shares. Emphasis is put on the interpretation of the parameters in relation to models for the total sales based on discrete choice models. Key words and phrases: MCI model, discrete choice model, market shares, price elasticity, regression model.
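In its simplest form, with price as the only marketing instrument and a price parameter common to all brands, the MCI model sets brand i's attraction to A_i = exp(alpha_i) p_i^beta and its share to s_i = A_i / sum_j A_j. Subtracting the mean log share across brands (log-centering) makes the model linear in alpha and beta, which is what allows it to be estimated by ordinary regression. The sketch below checks that identity and the implied own-price share elasticity beta (1 - s_i). All parameter and price values are hypothetical.

```python
import numpy as np

def mci_shares(alpha, prices, beta):
    """MCI market shares: s_i = A_i / sum_j A_j, with attraction A_i = exp(alpha_i) * price_i ** beta."""
    attraction = np.exp(alpha) * prices ** beta
    return attraction / attraction.sum()

alpha = np.array([0.2, 0.0, -0.1])     # hypothetical brand constants
prices = np.array([4.5, 5.0, 3.8])     # hypothetical weekly prices
beta = -2.5                            # hypothetical common price parameter

s = mci_shares(alpha, prices, beta)

# Log-centering: subtracting the mean log share over brands makes the model linear,
# so alpha and beta can be estimated by ordinary regression on log-centered data.
lhs = np.log(s) - np.log(s).mean()
rhs = (alpha - alpha.mean()) + beta * (np.log(prices) - np.log(prices).mean())

own_price_elasticity = beta * (1.0 - s)   # d log s_i / d log price_i under this model
print("shares:", np.round(s, 3))
print("own-price share elasticities:", np.round(own_price_elasticity, 3))
```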
Tracy Zhou Wu
Lq-penalized regression arises in multidimensional statistical modelling, where all or part of the regression coefficients are penalized to achieve both accuracy and parsimony of statistical models. Except in the quadratic penalty case, there is often substantial computational difficulty, partly due to the nonsmoothness of the objective function inherited from the use of the absolute value. We propose a new solution method for the general Lq-penalized regression problem based on a space transformation, which makes efficient optimization algorithms applicable. The new method has immediate applications in statistics, notably in penalized spline smoothing problems. In particular, the LASSO problem is shown to be solvable in polynomial time. Numerical studies show promise for our approach.
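A standard way to remove the nonsmoothness in the q = 1 (LASSO) case is to split beta = u - v with u, v >= 0. With this split, lam * ||beta||_1 becomes the linear term lam * sum(u + v), and the problem becomes a smooth convex quadratic program with simple bounds. This device is shown for illustration only, and it is not necessarily the transformation used in the paper. The sketch solves the split problem by projected gradient descent. It then checks the result against the known closed-form LASSO solution for an orthonormal design, which is soft-thresholding of X'y.

```python
import numpy as np

def lasso_split(X, y, lam, n_iter=5000):
    """min 0.5*||y - X b||^2 + lam*||b||_1 via b = u - v with u, v >= 0 (projected gradient)."""
    p = X.shape[1]
    u, v = np.zeros(p), np.zeros(p)
    step = 1.0 / (2.0 * np.linalg.norm(X, 2) ** 2)   # 1 / Lipschitz constant of the split problem
    for _ in range(n_iter):
        grad_b = -X.T @ (y - X @ (u - v))             # gradient of the quadratic part w.r.t. b
        u = np.maximum(0.0, u - step * (grad_b + lam))
        v = np.maximum(0.0, v - step * (-grad_b + lam))
    return u - v

rng = np.random.default_rng(9)
Q, _ = np.linalg.qr(rng.normal(size=(50, 5)))       # design with orthonormal columns
y = 2.0 * rng.normal(size=50)
lam = 1.0

b = lasso_split(Q, y, lam)
c = Q.T @ y
closed_form = np.sign(c) * np.maximum(np.abs(c) - lam, 0.0)
print(np.round(b, 4))
print(np.round(closed_form, 4))
```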
In this paper, we present a way to solve the linear regression model with R and Hadoop using the RHadoop library. We show how the linear regression model can be solved even for very large models that require special technologies. We used Hadoop for storing the data and R for computation; the interface between R and Hadoop is the open-source RHadoop library. We present the main features of the Hadoop and R software systems and the way of interconnecting them. We then show how the least squares solution of the linear regression problem can be expressed in terms of the map-reduce programming paradigm and how it can be implemented using the RHadoop library.
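The least squares solution fits the map-reduce pattern because the normal equations need only the sums X'X and X'y, and both decompose over row blocks. Each mapper computes the partial cross-products for its block, the reducer adds them, and a single small p x p system is solved at the end. The sketch below mimics this with Python's built-in map and functools.reduce on in-memory chunks. It is a stand-in for the pattern, not the RHadoop API, and assumes few predictors and a well-conditioned design.

```python
import numpy as np
from functools import reduce

def map_chunk(chunk):
    """Map step: each worker returns the partial cross-products X'X and X'y for its chunk."""
    X_part, y_part = chunk
    return X_part.T @ X_part, X_part.T @ y_part

def reduce_pairs(a, b):
    """Reduce step: partial cross-products are simply summed."""
    return a[0] + b[0], a[1] + b[1]

rng = np.random.default_rng(3)
n = 10_000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([1.0, 2.0, -0.5, 0.3]) + rng.normal(size=n)

chunks = [(X[i:i + 2500], y[i:i + 2500]) for i in range(0, n, 2500)]   # stand-in for data blocks
XtX, Xty = reduce(reduce_pairs, map(map_chunk, chunks))
beta = np.linalg.solve(XtX, Xty)        # final small p x p solve on one node
print(np.round(beta, 3))
```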
Suryanarayana, T M V
This book highlights the estimation of crop yield in Central Gujarat, especially the development of multiple regression models and principal component regression (PCR) models using climatological parameters as independent variables and crop yield as the dependent variable. It then compares the multiple linear regression (MLR) and PCR results and discusses the significance of PCR for crop yield estimation. In this context, the book also covers principal component analysis (PCA), a statistical procedure used to reduce a number of correlated variables to a smaller number of uncorrelated variables called principal components (PCs). This book will be helpful to students and researchers beginning work on climate and agriculture, particularly on estimation models. The chapters lead readers along a smooth path, from understanding climate, weather and the impact of climate change, gradually toward downscaling techniques and finally toward the development of ...
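PCR standardizes the predictors and computes their principal components with an SVD. It then regresses the response on the leading component scores and maps the coefficients back to the original predictor scale. The sketch below does this with synthetic, deliberately correlated "climate" variables and a synthetic yield. The variable names and coefficients are hypothetical. When all components are kept, PCR reproduces ordinary MLR exactly, which makes a convenient check. Dropping the smallest components is what stabilizes the estimates under collinearity.

```python
import numpy as np

def pcr_fit(X, y, n_components):
    """Principal component regression on standardized predictors; returns (intercept, slopes)."""
    x_mean, x_sd = X.mean(axis=0), X.std(axis=0)
    Z = (X - x_mean) / x_sd
    y_mean = y.mean()
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    V = Vt[:n_components].T                  # loadings of the retained components
    T = Z @ V                                # principal component scores
    gamma = np.linalg.lstsq(T, y - y_mean, rcond=None)[0]
    beta = (V @ gamma) / x_sd                # back to the original predictor scale
    return y_mean - x_mean @ beta, beta

rng = np.random.default_rng(10)
n = 120
temp = rng.normal(30, 3, n)
rain = rng.normal(800, 150, n)
humidity = 0.5 * temp + 0.01 * rain + rng.normal(0, 0.5, n)   # correlated with the others
X = np.column_stack([temp, rain, humidity])
yield_ = 2.0 + 0.1 * temp + 0.002 * rain + 0.05 * humidity + rng.normal(0, 0.3, n)

b0_all, b_all = pcr_fit(X, yield_, n_components=3)   # all components: equals MLR
b0_two, b_two = pcr_fit(X, yield_, n_components=2)   # reduced model
print("PCR with 2 components:", round(b0_two, 3), np.round(b_two, 4))
```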
Ordinary least squares is a parameter estimation method that minimizes the residual sum of squares. If multicollinearity is present in the data, the unbiased least squares estimator can have very large variance. Multicollinearity is a linear correlation among the independent variables in a model. Jackknife ridge regression (JRR) is an extension of generalized ridge regression (GRR) for handling multicollinearity. GRR counters the effect of multicollinearity on the estimators by adding a different bias parameter for each independent variable to the least squares equation after transforming the data into an orthogonal form. In addition, JRR can reduce the bias of the ridge estimator. The results showed that the JRR model outperforms the GRR model.
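Both estimators are easiest to write in the canonical (orthogonal) form Z = XQ, where X'X = Q Lambda Q'. Componentwise, GRR multiplies the OLS estimate alpha_i by 1 - k_i/(lambda_i + k_i). The jackknifed version multiplies it by 1 - (k_i/(lambda_i + k_i))², which removes part of the ridge bias. The sketch below is a minimal illustration under several assumptions. The data are synthetic and standardized, with no intercept. The Hoerl-Kennard-type choice k_i = sigma² / alpha_i² is only one common option.

```python
import numpy as np

def grr_and_jrr(X, y, k):
    """OLS, generalized ridge (GRR) and jackknifed ridge (JRR) estimates via the canonical form.

    X is assumed centered/standardized (no intercept); k holds one ridge constant per component.
    """
    lam, Q = np.linalg.eigh(X.T @ X)          # X'X = Q diag(lam) Q'
    Z = X @ Q                                 # orthogonal predictors: Z'Z = diag(lam)
    alpha_ols = (Z.T @ y) / lam
    bias_factor = k / (lam + k)               # diagonal of (Lambda + K)^{-1} K
    alpha_grr = (1.0 - bias_factor) * alpha_ols
    alpha_jrr = (1.0 - bias_factor ** 2) * alpha_ols
    return Q @ alpha_ols, Q @ alpha_grr, Q @ alpha_jrr

rng = np.random.default_rng(11)
n = 60
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)           # nearly collinear with x1
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = X @ np.array([1.0, 1.0, 0.5]) + rng.normal(size=n)
y = y - y.mean()

b_ols, _, _ = grr_and_jrr(X, y, np.zeros(3))
lam, Q = np.linalg.eigh(X.T @ X)
alpha_ols = Q.T @ b_ols
sigma2 = np.sum((y - X @ b_ols) ** 2) / (n - 3)
k = sigma2 / alpha_ols ** 2                   # Hoerl-Kennard-type per-component constants
b_ols, b_grr, b_jrr = grr_and_jrr(X, y, k)
print("OLS:", np.round(b_ols, 3), "GRR:", np.round(b_grr, 3), "JRR:", np.round(b_jrr, 3))
```

Because the JRR factor 1 - t² is always closer to 1 than the GRR factor 1 - t, JRR shrinks less and stays closer to OLS. In this way it trades back some variance for a smaller bias.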
Khoshravesh, Mojtaba; Sefidkouhi, Mohammad Ali Gholami; Valipour, Mohammad
The proper evaluation of evapotranspiration is essential in food security investigation, farm management, pollution detection, irrigation scheduling, nutrient flows, carbon balance as well as hydrologic modeling, especially in arid environments. To achieve sustainable development and to ensure water supply, especially in arid environments, irrigation experts need tools to estimate reference evapotranspiration on a large scale. In this study, the monthly reference evapotranspiration was estimated by three different regression models including the multivariate fractional polynomial (MFP), robust regression, and Bayesian regression in Ardestan, Esfahan, and Kashan. The results were compared with Food and Agriculture Organization (FAO)-Penman-Monteith (FAO-PM) to select the best model. The results show that at a monthly scale, all models provided a closer agreement with the calculated values for FAO-PM (R² > 0.95 and RMSE < 12.07 mm month⁻¹). However, the MFP model gives better estimates than the other two models for estimating reference evapotranspiration at all stations.
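The abstract does not say which robust regression estimator was used. As an assumed example, the sketch below fits a Huber M-estimator by iteratively reweighted least squares, using a MAD scale estimate and the conventional tuning constant 1.345. It compares the result with OLS on synthetic data containing a few gross outliers at high x. This illustrates what "robust" buys, not the study's model.

```python
import numpy as np

def huber_regression(X, y, c=1.345, n_iter=100, tol=1e-10):
    """Huber M-estimator by iteratively reweighted least squares (MAD scale, tuning constant c)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]            # start from OLS
    for _ in range(n_iter):
        r = y - X @ beta
        scale = np.median(np.abs(r - np.median(r))) / 0.6745
        u = np.abs(r) / scale
        w = np.where(u <= c, 1.0, c / np.maximum(u, 1e-12))   # Huber weights
        Xw = X * w[:, None]
        new = np.linalg.solve(X.T @ Xw, Xw.T @ y)
        converged = np.max(np.abs(new - beta)) < tol
        beta = new
        if converged:
            break
    return beta

rng = np.random.default_rng(14)
n = 100
x = rng.uniform(0, 10, n)
y = 1.0 + 2.0 * x + rng.normal(0, 1, n)
y[np.argsort(x)[-5:]] -= 40.0                  # five gross outliers at high x
X = np.column_stack([np.ones(n), x])

b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
b_huber = huber_regression(X, y)
print("OLS slope:", round(b_ols[1], 3), " Huber slope:", round(b_huber[1], 3), " (true 2.0)")
```

Agreement with the FAO-PM benchmark would then be summarized, as above, by R² and RMSE between each model's estimates and the FAO-PM values.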
Hazra, Avijit; Gogtay, Nithya
Correlation and linear regression are the most commonly used techniques for quantifying the association between two numeric variables. Correlation quantifies the strength of the linear relationship between paired variables, expressing this as a correlation coefficient. If both variables x and y are normally distributed, we calculate Pearson's correlation coefficient (r). If the normality assumption is not met for one or both variables in a correlation analysis, a rank correlation coefficient, such as Spearman's rho (ρ), may be calculated. A hypothesis test of correlation tests whether the linear relationship between the two variables holds in the underlying population, in which case it returns a small P value (e.g., P < 0.05). The square of the correlation coefficient, r², denotes the proportion of the variability of the dependent variable y that can be attributed to its linear relation with the independent variable x and is called the coefficient of determination. Linear regression is a technique that attempts to link two correlated variables x and y in the form of a mathematical equation (y = a + bx), such that given the value of one variable the other may be predicted. In general, the method of least squares is applied to obtain the equation of the regression line. Correlation and linear regression analysis are based on certain assumptions pertaining to the data sets. If these assumptions are not met, misleading conclusions may be drawn. The first assumption is that of a linear relationship between the two variables. A scatter plot is essential before embarking on any correlation-regression analysis to show that this is indeed the case. Outliers or clustering within data sets can distort the correlation coefficient value. Finally, it is vital to remember that although a strong correlation can point toward causation, the two are not synonymous.
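The quantities described above can each be computed in a few lines: Pearson's r, Spearman's rho (Pearson's r applied to ranks, with ties given average ranks), and the least squares line y = a + bx. The sketch uses a synthetic, monotone but strongly curved relationship. On such data Spearman's rho is exactly 1 while Pearson's r is clearly below 1, which is why a scatter plot should come first. It also confirms that r² equals the coefficient of determination of the fitted line.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    xc, yc = x - x.mean(), y - y.mean()
    return float(np.sum(xc * yc) / np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2)))

def ranks(v):
    """1-based ranks; tied values share the mean of their positions."""
    v = np.asarray(v, dtype=float)
    r = np.empty(len(v))
    r[np.argsort(v, kind="mergesort")] = np.arange(1, len(v) + 1)
    for val in np.unique(v):
        tied = v == val
        r[tied] = r[tied].mean()
    return r

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))

def least_squares_line(x, y):
    """Intercept a and slope b of the least squares line y = a + b x."""
    b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    return y.mean() - b * x.mean(), b

x = np.arange(1.0, 11.0)
y = np.exp(x / 2.0)                       # monotone but strongly curved
r, rho = pearson_r(x, y), spearman_rho(x, y)
a, b = least_squares_line(x, y)
fitted = a + b * x
r2 = 1 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)   # coefficient of determination
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}, r^2 = {r2:.3f}")
```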
Malmquist bias is present in all astronomical surveys where sources are observed above an apparent brightness threshold. Those sources which can be detected at progressively larger distances are progressively more limited to the intrinsically luminous portion of the true distribution. This bias does not distort any of the measurements, but distorts the sample composition. We have developed the first treatment to correct for Malmquist bias in linear regressions of astronomical data. A demonstration of the corrected linear regression that is computed in four steps is presented.
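The bias is easy to reproduce by simulation. Sources are placed uniformly in space, with luminosities independent of distance. Only those whose apparent flux exceeds a threshold are kept. The detected sample then shows the intrinsic luminosity rising with distance, and a naive regression of log luminosity on log distance finds a positive slope where the true slope is zero. The luminosity function, distance range, and flux limit below are hypothetical. The sketch only demonstrates the bias and does not implement the authors' four-step correction.

```python
import numpy as np

rng = np.random.default_rng(15)
n = 200_000
d = 100.0 * rng.random(n) ** (1.0 / 3.0)        # uniform space density inside radius 100
log_l = rng.normal(0.0, 0.5, n)                  # intrinsic log10 luminosity, independent of d
flux = 10.0 ** log_l / (4.0 * np.pi * d ** 2)
seen = flux > 1e-5                               # hypothetical apparent-brightness limit

near = seen & (d < 20)
far = seen & (d > 80)
print("mean log L, all sources:", round(log_l.mean(), 3))
print("mean log L, detected near:", round(log_l[near].mean(), 3))
print("mean log L, detected far:", round(log_l[far].mean(), 3))

# Naive regression of log L on log distance within the detected sample
slope, intercept = np.polyfit(np.log10(d[seen]), log_l[seen], 1)
print("fitted slope (true value 0):", round(slope, 3))
```

None of the individual flux or luminosity values is wrong. Only the composition of the detected sample changes with distance, which is exactly the distinction drawn above.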
Czekaj, Tomasz Gerard; Henningsen, Arne
We discuss nonparametric regression models for panel data. A fully nonparametric panel data specification that uses the time variable and the individual identifier as additional (categorical) explanatory variables is considered to be the most suitable. We use this estimator and conventional parametric panel data estimators to analyse the production technology of Polish crop farms. The results of our nonparametric kernel regressions generally differ from the estimates of the parametric models but they only slightly depend on the choice of the kernel functions. Based on economic reasoning, we found the estimates of the fully nonparametric panel data model to be more reliable.
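In the fully nonparametric specification, the individual identifier and the time period enter as categorical regressors alongside the continuous inputs. A local-constant (Nadaraya-Watson) estimate then weights each observation by a product kernel. The sketch below assumes Li-Racine-type categorical kernels. For the unordered identifier, the weight is 1 if the farm matches and lam_ind otherwise. For the ordered period, the weight is lam_time^|t - t0|. It also uses a Gaussian kernel for the continuous input. The kernels, bandwidths, and simulated farm data are illustrative assumptions. In practice, bandwidths are usually chosen by cross-validation.

```python
import numpy as np

def kernel_panel_regression(point, x, ind, period, y, h, lam_ind, lam_time):
    """Local-constant (Nadaraya-Watson) estimate of E[y | x, individual, period] at `point`.

    Continuous x: Gaussian kernel with bandwidth h.
    Individual identifier (unordered): weight 1 if equal, lam_ind otherwise.
    Time period (ordered): weight lam_time ** |t - t0|.
    """
    x0, i0, t0 = point
    w = (np.exp(-0.5 * ((x - x0) / h) ** 2)
         * np.where(ind == i0, 1.0, lam_ind)
         * lam_time ** np.abs(period - t0))
    return float(np.sum(w * y) / np.sum(w))

rng = np.random.default_rng(13)
farms, years = 30, 5
ind = np.repeat(np.arange(farms), years)          # hypothetical farm identifiers
period = np.tile(np.arange(years), farms)         # hypothetical survey years 0..4
x = rng.normal(size=farms * years)                # e.g. log of an input
farm_effect = rng.normal(0, 0.3, farms)[ind]
y = 1.0 + 0.6 * x - 0.1 * x ** 2 + farm_effect + 0.05 * period + rng.normal(0, 0.2, farms * years)

est = kernel_panel_regression((0.0, 3, 2), x, ind, period, y, h=0.5, lam_ind=0.2, lam_time=0.5)
print("estimated log output for farm 3, year 2, at x = 0:", round(est, 3))
```

Setting lam_ind = lam_time = 1 pools all farms and years, while values near 0 use only the matching cell. The data-driven choice between these extremes is what lets the fully nonparametric model avoid imposing a parametric panel structure.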
Paindaveine, D.; Šiman, Miroslav
Vol. 102, No. 2 (2011), pp. 193-212. ISSN 0047-259X. R&D projects: GA MŠk (CZ) 1M06047. Grants (others): Commission EC (BE); Fonds National de la Recherche Scientifique. Institutional research plan: CEZ:AV0Z10750506. Keywords: multivariate quantile; quantile regression; multiple-output regression; halfspace depth; portfolio optimization; value-at-risk. Subject RIV: BA - General Mathematics. Impact factor: 0.879, year: 2011. http://library.utia.cas.cz/separaty/2011/SI/siman-0364128.pdf
Henrard, S; Speybroeck, N; Hermans, C
Haemophilia is a rare genetic haemorrhagic disease characterized by partial or complete deficiency of coagulation factor VIII (haemophilia A) or IX (haemophilia B). As in any other medical research domain, haemophilia research is increasingly concerned with finding factors associated with binary or continuous outcomes through multivariable models. Traditional models include multiple logistic regression for binary outcomes and multiple linear regression for continuous outcomes. Yet these regression models are at times difficult to implement, especially for non-statisticians, and can be difficult to interpret. The present paper sought to explain didactically how, why, and when to use classification and regression tree (CART) analysis in haemophilia research. The CART method is non-parametric and non-linear, based on the repeated partitioning of a sample into subgroups according to a certain criterion. Breiman developed this method in 1984. Classification trees (CTs) are used to analyse categorical outcomes and regression trees (RTs) to analyse continuous ones. The CART methodology has become increasingly popular in the medical field, yet only a few studies using this methodology specifically in haemophilia have been published to date. Two previously published examples of CART analysis in this field are explained didactically in detail. There is increasing interest in using CART analysis in the health domain, primarily because of its ease of implementation, use, and interpretation, which facilitates medical decision-making. This method should be promoted for analysing continuous or categorical outcomes in haemophilia, when applicable. © 2015 John Wiley & Sons Ltd.
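The "repeated partitioning" in CART is built from one basic step. For a regression tree, the algorithm searches every threshold on a predictor and keeps the one that minimizes the total within-node sum of squared errors. The tree is then grown by applying the same search to each resulting subgroup. The sketch shows that single step on a hypothetical noise-free example. Classification trees use the same search with an impurity measure such as the Gini index instead of the SSE.

```python
import numpy as np

def best_split(x, y):
    """One CART step for a regression tree: the threshold on x minimizing total within-node SSE."""
    order = np.argsort(x, kind="mergesort")
    xs, ys = np.asarray(x, dtype=float)[order], np.asarray(y, dtype=float)[order]
    best_sse, best_threshold = np.inf, None
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue                                    # no threshold separates tied values
        left, right = ys[:i], ys[i:]
        sse = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
        if sse < best_sse:
            best_sse, best_threshold = float(sse), float((xs[i - 1] + xs[i]) / 2.0)
    return best_sse, best_threshold

# Hypothetical example: an outcome that jumps when a patient characteristic exceeds 39.5
age = np.arange(20, 60)
outcome = np.where(age < 40, 1.0, 5.0)
print(best_split(age, outcome))          # -> (0.0, 39.5)
```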
Takagi, Daisuke; Ikeda, Ken'ichi; Kawachi, Ichiro
Crime is an important determinant of public health outcomes, including quality of life, mental well-being, and health behavior. A body of research has documented the association between community social capital and crime victimization. This association has been examined at multiple levels of spatial aggregation, ranging from entire countries to states, metropolitan areas, counties, and neighborhoods. In multilevel analysis, the spatial boundaries at level 2 are most often drawn from administrative boundaries (e.g., Census tracts in the U.S.). One problem with adopting administrative definitions of neighborhoods is that they ignore spatial spillover. We conducted a study of social capital and crime victimization in one ward of Tokyo, using a spatial Durbin model with an inverse-distance weighting matrix that assigned each respondent a unique level of "exposure" to social capital based on all other residents' perceptions. The study is based on a postal questionnaire sent to residents aged 20-69 years in Arakawa Ward, Tokyo. The response rate was 43.7%. We examined the contextual influence of generalized trust, perceptions of reciprocity, two types of social network variables, and two principal components of social capital (constructed from the above four variables). Our outcome measure was self-reported crime victimization in the last five years. In the spatial Durbin model, we found that neighborhood generalized trust, reciprocity, supportive networks, and the two principal components of social capital were each inversely associated with crime victimization. By contrast, a multilevel regression performed with the same data (using administrative neighborhood boundaries) found generally null associations between neighborhood social capital and crime. Spatial regression methods may be more appropriate for investigating the contextual influence of social capital in homogeneous cultural settings such as Japan.
Multiple regression was used to study the influence of the characteristics of γ-Al2O3-supported catalysts, including acidity, specific surface area, average pore volume, average pore radius, Ni content, and Mo content, on the hydrocracking conversion of asphaltene. A multivariable regression analysis method, comprising regression analysis and correlation analysis, was applied in this study. Using multivariable regression, the catalyst characteristics were correlated with the asphaltene conversion data. This method also makes it possible to identify the catalyst characteristics that have the greatest influence on conversion. The results showed a high correlation between the catalyst characteristics and the hydrocracking conversion of asphaltene (r = 0.983), i.e. a very strong linear association. The multivariable coefficient of determination was 0.966, indicating that 96.6% of the variation in conversion in this study was explained by the combination of catalyst characteristics. The parameter values of the regression equation also showed that average pore radius and specific surface area were the two characteristics with the greatest influence on the hydrocracking conversion of asphaltene. Keywords: multivariable regression, catalyst's characters, high correlation degree, determination coefficient
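The two reported figures are linked: for a least-squares fit with an intercept, the coefficient of determination R² equals the square of the correlation between observed and fitted values (in simple regression, the square of Pearson's r), and 0.983² ≈ 0.966. The sketch below checks this identity on made-up illustrative data with a single predictor; it does not reproduce the catalyst data set.

```python
def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def least_squares_line(x, y):
    """Intercept and slope of the ordinary least-squares line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


def r_squared(y, fitted):
    """1 - residual sum of squares / total sum of squares."""
    my = sum(y) / len(y)
    ss_res = sum((a - f) ** 2 for a, f in zip(y, fitted))
    ss_tot = sum((a - my) ** 2 for a in y)
    return 1 - ss_res / ss_tot


x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]  # illustrative values only
a, b = least_squares_line(x, y)
fitted = [a + b * v for v in x]
print(pearson_r(x, y) ** 2, r_squared(y, fitted))  # agree up to rounding
print(round(0.983 ** 2, 3))  # 0.966, consistent with the reported values
```

Because r is a correlation rather than a proportion, R² (not r) is the quantity read as "percentage of variation explained".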
Song Meicun; Cai Qi
The state of research on fault trend prediction and the basic theory of support vector regression (SVR) were introduced. SVR was applied to fault trend prediction for roller bearings and compared with other methods (BP neural network, gray model, and gray-AR model). The results show that the BP network tends to overfit and become trapped in local minima, so its predictions are unstable. They also show that SVR predictions are stable and that SVR is superior to the BP neural network, gray model, and gray-AR model in predictive precision. SVR is an effective method for fault trend prediction. (authors)
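What distinguishes SVR from least squares is its ε-insensitive loss: residuals inside a tube of half-width ε cost nothing, and larger residuals are penalised linearly, alongside a penalty on the size of the weights. The sketch below only evaluates the primal objective for a one-dimensional linear model; the abstract does not state the kernel or parameter values used, so C and ε here are illustrative assumptions, and no solver is shown.

```python
def eps_insensitive(residual, eps):
    """Zero inside the tube |residual| <= eps, linear outside it."""
    return max(0.0, abs(residual) - eps)


def linear_svr_objective(w, b, xs, ys, C=1.0, eps=0.1):
    """Primal SVR objective 0.5*w^2 + C * sum of eps-insensitive losses
    for the one-dimensional linear model f(x) = w*x + b."""
    loss = sum(eps_insensitive(y - (w * x + b), eps) for x, y in zip(xs, ys))
    return 0.5 * w * w + C * loss


# Only the last point lies outside the eps-tube around f(x) = x
print(linear_svr_objective(1.0, 0.0, [0, 1, 2], [0.05, 1.0, 2.3]))  # about 0.7
```

The flat region of the loss is what makes the fitted function depend only on the points outside the tube (the support vectors).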
Bache, Stefan Holst; Dahl, Christian Møller; Kristensen, Johannes Tang
to the possibility that smoking habits can be influenced through policy conduct. It is widely believed that maternal smoking reduces birthweight; however, the crucial difficulty in estimating such effects is the unobserved heterogeneity among mothers. We consider extensions of three panel data models to a quantile...... regression framework in order to control for heterogeneity and to infer conclusions about causality across the entire birthweight distribution. We obtain estimation results for maternal smoking and other interesting determinants, applying these to data obtained from Aarhus University Hospital, Skejby...
LaVange, L M; Koch, G G; Schwartz, T A
This paper outlines the utility of statistical methods for sample surveys in analysing clinical trials data. Sample survey statisticians face a variety of complex data analysis issues deriving from the use of multi-stage probability sampling from finite populations. One such issue is that of clustering of observations at the various stages of sampling. Survey data analysis approaches developed to accommodate clustering in the sample design have more general application to clinical studies in which repeated measures structures are encountered. Situations where these methods are of interest include multi-visit studies where responses are observed at two or more time points for each patient, multi-period cross-over studies, and epidemiological studies for repeated occurrences of adverse events or illnesses. We describe statistical procedures for fitting multiple regression models to sample survey data that are more effective for repeated measures studies with complicated data structures than the more traditional approaches of multivariate repeated measures analysis. In this setting, one can specify a primary sampling unit within which repeated measures have intraclass correlation. This intraclass correlation is taken into account by sample survey regression methods through robust estimates of the standard errors of the regression coefficients. Regression estimates are obtained from model fitting estimation equations which ignore the correlation structure of the data (that is, computing procedures which assume that all observational units are independent or are from simple random samples). The analytic approach is straightforward to apply with logistic models for dichotomous data, proportional odds models for ordinal data, and linear models for continuously scaled data, and results are interpretable in terms of population average parameters. Through the features summarized here, the sample survey regression methods have many similarities to the broader family of
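The approach described above, point estimates from estimating equations that assume independence plus standard errors that are robust to correlation within primary sampling units, can be sketched for a simple linear regression. The function name is hypothetical, and the sketch uses the basic "sandwich" form without the finite-sample corrections that survey software typically applies.

```python
from collections import defaultdict


def ols_slope_with_cluster_se(x, y, cluster):
    """Simple linear regression fitted as if observations were independent,
    returning (slope, naive_se, cluster_robust_se)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    xc = [a - mx for a in x]  # centred covariate
    sxx = sum(a * a for a in xc)
    slope = sum(a * (b - my) for a, b in zip(xc, y)) / sxx
    intercept = my - slope * mx
    resid = [b - (intercept + slope * a) for a, b in zip(x, y)]

    # Model-based SE: valid only if all observations are independent
    s2 = sum(u * u for u in resid) / (n - 2)
    naive_se = (s2 / sxx) ** 0.5

    # Sandwich SE: sum the score contributions within each cluster first,
    # so correlated residuals of the same unit are not treated as independent
    score = defaultdict(float)
    for g, a, u in zip(cluster, xc, resid):
        score[g] += a * u
    robust_se = (sum(s * s for s in score.values()) / sxx ** 2) ** 0.5
    return slope, naive_se, robust_se


# Hypothetical repeated measures: three visits for each of three patients
patients = ["p1"] * 3 + ["p2"] * 3 + ["p3"] * 3
visit = [1, 2, 3] * 3
resp = [2.0, 2.9, 4.1, 3.5, 4.4, 5.6, 1.2, 2.1, 3.0]
print(ols_slope_with_cluster_se(visit, resp, patients))
```

With every observation in its own cluster, the robust SE reduces to the heteroscedasticity-consistent (HC0) form.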
Stolzer, Alan J.; Halford, Carl
In a previous study, multiple regression techniques were applied to Flight Operations Quality Assurance-derived data to develop parsimonious model(s) for fuel consumption on the Boeing 757 airplane. The present study examined several data mining algorithms, including neural networks, on the fuel consumption problem and compared them to the multiple regression results obtained earlier. Using regression methods, parsimonious models were obtained that explained approximately 85% of the variation in fuel flow. In general, data mining methods were more effective in predicting fuel consumption. Classification and Regression Tree methods reported correlation coefficients of .91 to .92, and General Linear Models and Multilayer Perceptron neural networks reported correlation coefficients of about .99. These data mining models show great promise for use in further examining large FOQA databases for operational and safety improvements.
In this study, birth weight has been estimated from anthropometric measurements of the hand and foot. Linear regression equations were formed from each of the measured variables. These simple equations can be used to estimate the birth weight of newborn babies in order to identify those with low birth weight and referred to ...
Cologne, John B.; Sposto, Richard
Consider the problem of fitting a curve to data that exhibit a multiphase linear response with smooth transitions between phases. We propose substituting hyperbolas as covariates in piecewise linear regression splines to obtain curves that are smoothly joined. The method provides an intuitive and easy way to extend the two-phase linear hyperbolic response model of Griffiths and Miller and Watts and Bacon to accommodate more than two linear segments. The resulting regression spline with hyperbolic covariates may be fit by nonlinear regression methods to estimate the degree of curvature between adjoining linear segments. The added complexity of fitting nonlinear, as opposed to linear, regression models is not great. The extra effort is particularly worthwhile when investigators are unwilling to assume that the slope of the response changes abruptly at the join points. We can also estimate the join points (the values of the abscissas where the linear segments would intersect if extrapolated) if their number and approximate locations may be presumed known. An example using data on changing age at menarche in a cohort of Japanese women illustrates the use of the method for exploratory data analysis. (author)
Madsen, Kaj; Nielsen, Hans Bruun
The Huber M-estimator for robust linear regression is analyzed. Newton type methods for solution of the problem are defined and analyzed, and finite convergence is proved. Numerical experiments with a large number of test problems demonstrate efficiency and indicate that this kind of approach may...
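To show what the Huber M-estimator computes, the sketch below fits a straight line by iteratively reweighted least squares (IRLS). Note that this is a swapped-in technique: the paper analyses Newton-type methods, not IRLS. The scale is held fixed at an assumed value (in practice a robust estimate such as the MAD is common), the tuning constant k = 1.345 is a conventional choice, and the function names are hypothetical.

```python
def huber_weight(r, k):
    """IRLS weight psi(r)/r for the Huber loss: 1 inside [-k, k], k/|r| outside."""
    a = abs(r)
    return 1.0 if a <= k else k / a


def weighted_line(x, y, w):
    """Weighted least-squares intercept and slope."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return my - b * mx, b


def huber_line(x, y, k=1.345, scale=1.0, iters=100):
    """Huber M-estimate of (intercept, slope) with a fixed, assumed scale."""
    w = [1.0] * len(x)  # first pass is ordinary least squares
    for _ in range(iters):
        a, b = weighted_line(x, y, w)
        w = [huber_weight((yi - a - b * xi) / scale, k) for xi, yi in zip(x, y)]
    return weighted_line(x, y, w)


x = list(range(10))
y = [2 * v + 1 for v in x]
y[9] = 100  # one gross outlier
print(huber_line(x, y))                 # slope stays near 2
print(weighted_line(x, y, [1.0] * 10))  # OLS slope is pulled far above 2
```

Because the Huber loss grows only linearly for large residuals, the outlier's influence on the fit is bounded rather than quadratic.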
Liu, Min; Lin, Tsung-I
A challenge associated with traditional mixture regression models (MRMs), which rest on the assumption of normally distributed errors, is determining the number of unobserved groups. Specifically, even slight deviations from normality can lead to the detection of spurious classes. The current work aims to (a) examine how sensitive the commonly…
Víšek, Jan Ámos
Vol. 23, No. 2 (2003), pp. 435-448 ISSN 0208-4147 R&D Projects: GA ČR(CZ) GA402/03/0084 Institutional research plan: CEZ:AV0Z1075907 Keywords : diagnostics * regression * M-estimators * critical values of robustified D-W statistics Subject RIV: BA - General Mathematics
Demaerel, P.; Eerens, I.; Wilms, G. [University Hospital, Leuven (Belgium). Dept. of Radiology]; Goffin, J. [Dept. of Neurosurgery, University Hospitals, Leuven (Belgium)]
We present a patient with a so-called disc cyst. Its location in the ventrolateral epidural space and its communication with the herniated disc are clearly shown. The disc cyst developed rapidly and regressed spontaneously. This observation, which has not previously been reported, appears to support focal degeneration with cyst formation as the pathogenesis. (orig.)
Adwere-Boamah, Joseph; Hufstedler, Shirley
This study used binary logistic regression to predict social trust with five demographic variables from a national sample of adult individuals who participated in The General Social Survey (GSS) in 2012. The five predictor variables were respondents' highest degree earned, race, sex, general happiness and the importance of personally assisting…
Abi Morshed, Alaa; Andreou, E.; Boldea, Otilia
Structural break tests developed in the literature for regression models are sensitive to model misspecification. We show - analytically and through simulations - that the sup Wald test for breaks in the conditional mean and variance of a time series process exhibits severe size distortions when the
Williams, Matt N.; Gomez Grajales, Carlos Alberto; Kurkiewicz, Dason
In 2002, an article entitled "Four assumptions of multiple regression that researchers should always test" by Osborne and Waters was published in "PARE." This article has gone on to be viewed more than 275,000 times (as of August 2013), and it is one of the first results displayed in a Google search for "regression…
The objective is to minimize the processing time and computer memory required ... Survey. time to acquire extra GPR or seismic data for large sites and picking the first arrival time to provide the needed datasets for the joint inversion are also ... The data utilized for the regression modelling was acquired from ground.
Zeng, Jiabei; Liu, Yang; Leng, Biao; Xiong, Zhang; Cheung, Yiu-Ming
Supervised dimensionality reduction (DR) plays an important role in learning systems with high-dimensional data. It projects the data into a low-dimensional subspace and keeps the projected data distinguishable in different classes. In addition to preserving the discriminant information for binary or multiple classes, some real-world applications also require keeping the preference degrees of assigning the data to multiple aspects, e.g., to keep the different intensities for co-occurring facial expressions or the product ratings in different aspects. To address this issue, we propose a novel supervised DR method for DR in multiple ordinal regression (DRMOR), whose projected subspace preserves all the ordinal information in multiple aspects or labels. We formulate this problem as a joint optimization framework to simultaneously perform DR and ordinal regression. In contrast to most existing DR methods, which are conducted independently of the subsequent classification or ordinal regression, the proposed framework fully benefits from both of the procedures. We experimentally demonstrate that the proposed DRMOR method (DRMOR-M) well preserves the ordinal information from all the aspects or labels in the learned subspace. Moreover, DRMOR-M exhibits advantages compared with representative DR or ordinal regression algorithms on three standard data sets.
Anton Andreevich Krasnopevtsev
The article describes practical approaches to implementing automated functional and load regression testing on an arbitrary hardware-software complex, using «MARSh 3.0» as an example. Test automation is implemented to improve the information security of «MARSh 3.0».
Let (Y, C, X) be a vector of random variables, where Y, C and X are, respectively, the variable of interest, a right-censoring variable and a covariate (predictor). In this paper, we introduce a new nonlinear wavelet-based estimator of the regression function in the right-censorship model. An asymptotic expression for the mean integrated ...
Westphal, Alexander; Schelinski, Stefanie; Volkmar, Fred; Pelphrey, Kevin
Theodor Heller first described a severe regression of adaptive function in normally developing children, which he termed dementia infantilis, over 100 years ago. Dementia infantilis is most closely related to the modern diagnosis of childhood disintegrative disorder. We translate Heller's paper, Uber Dementia Infantilis, and discuss…
Portela, M.; Teulings, C.N.; Alessie, R.
The perpetual inventory method used for the construction of education data per country leads to systematic measurement error. This paper analyses the effect of this measurement error on GDP regressions. There is a systematic difference in the education level between census data and observations
Lai, Zhihui; Mo, Dongmei; Wong, Wai Keung; Xu, Yong; Miao, Duoqian; Zhang, David
Ridge regression (RR) and its extended versions are widely used as an effective feature extraction method in pattern recognition. However, RR-based methods are sensitive to variations in the data and can learn only a limited number of projections for feature extraction and recognition. To address these problems, we propose a new method called robust discriminant regression (RDR) for feature extraction. To enhance robustness, the L₂,₁-norm is used as the basic metric in the proposed RDR. The designed robust objective function in regression form can be solved by an iterative algorithm containing an eigenfunction, through which the optimal orthogonal projections of RDR can be obtained by eigendecomposition. The convergence analysis and computational complexity are presented. In addition, we explore the intrinsic connections and differences between RDR and some previous methods. Experiments on some well-known databases show that RDR is superior to the classical and very recently proposed methods reported in the literature, whether L₂-norm- or L₂,₁-norm-based regression methods. The code of this paper can be downloaded from http://www.scholat.com/laizhihui.
Mar 8, 2018 ... sequence stratigraphic architecture to understand the exact paleogeographic setup of the Raniganj ... regressive cycles in the light of tectonic/basinal changes, fluctuating sea level conditions and pro- ...... allowing incursion of marine water within the basin. (Bhattacharya et al. 2016). As a result, the estu-.
Camacho, José; Saccenti, Edoardo
This paper introduces the group-wise partial least squares (GPLS) regression. GPLS is a new sparse PLS technique where the sparsity structure is defined in terms of groups of correlated variables, similarly to what is done in the related group-wise principal component analysis. These groups are
In this article, we consider the problem of predicting actual and average values of the study variable, both within and outside the sample, using ordinary least squares and ridge regression estimators. The performance properties of the estimators are then analyzed.
P. Exterkate (Peter)
Kernel ridge regression is gaining popularity as a data-rich nonlinear forecasting tool, which is applicable in many different contexts. This paper investigates the influence of the choice of kernel and the setting of tuning parameters on forecast accuracy. We review several popular
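The role of the kernel and the tuning parameters can be seen in the closed-form solution of kernel ridge regression: the dual coefficients solve (K + λI)α = y, and a forecast is f(x) = Σᵢ αᵢ k(xᵢ, x). The sketch below assumes a Gaussian (RBF) kernel, one-dimensional inputs and a plain Gaussian-elimination solver; the function names are hypothetical, and the paper's specific kernels and tuning procedure are not reproduced.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / M[r][r]
    return x


def rbf(a, b, gamma=1.0):
    """Gaussian kernel; gamma is the (assumed) bandwidth parameter."""
    return math.exp(-gamma * (a - b) ** 2)


def fit_krr(xs, ys, lam, gamma=1.0):
    """Dual coefficients alpha = (K + lam*I)^{-1} y."""
    n = len(xs)
    K = [[rbf(xs[i], xs[j], gamma) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(K, ys)


def predict_krr(alpha, xs, x_new, gamma=1.0):
    """Forecast f(x_new) = sum_i alpha_i * k(x_i, x_new)."""
    return sum(a * rbf(xi, x_new, gamma) for a, xi in zip(alpha, xs))


xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 0.8, 0.9, 0.1]
for lam in (1e-6, 1.0, 1e6):  # small lam interpolates, large lam shrinks to 0
    alpha = fit_krr(xs, ys, lam)
    print(lam, predict_krr(alpha, xs, 1.5))
```

Setting λ and the kernel bandwidth trades off fitting the training data against smoothness, which is the tuning question the paper studies.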
Scott B. Raymond, MD, PhD
Mandibular arteriovenous malformations (AVMs) are rare lesions that may initially present as catastrophic bleeding during dental surgical procedures. Owing to the significant risk of bleeding, most mandibular AVMs are treated definitively by resection or embolization. In this report, we describe a mandibular AVM that spontaneously regressed after biopsy.
Nutt, A. T.; Batsell, R. R.
Examples of the use of Multiple Linear Regression (MLR) techniques are presented. This is done to show how MLR aids data processing and decision-making by providing the decision-maker with freedom in phrasing questions and by accurately reflecting the data on hand. A brief overview of the rationale underlying MLR is given, some basic definitions…
Ph.H.B.F. Franses (Philip Hans)
A MIDAS regression involves a dependent variable observed at a low frequency and independent variables observed at a higher frequency. This paper relates a true high-frequency data generating process, in which the dependent variable is also observed (hypothetically) at the high frequency,
Using the theory of impulsive differential equations, this book focuses on mathematical models which reflect current research in biology, population dynamics, neural networks and economics. The authors provide the basic background from the fundamental theory and give a systematic exposition of recent results related to the qualitative analysis of impulsive mathematical models. Consisting of six chapters, the book presents many applicable techniques, making them available in a single source easily accessible to researchers interested in mathematical models and their applications. Serving as a valuable reference, this text is addressed to a wide audience of professionals, including mathematicians, applied researchers and practitioners.
While preserving the clear, accessible style of previous editions, Applied Nonparametric Statistical Methods, Fourth Edition reflects the latest developments in computer-intensive methods that deal with intractable analytical problems and unwieldy data sets. Reorganized and with additional material, this edition begins with a brief summary of some relevant general statistical concepts and an introduction to basic ideas of nonparametric or distribution-free methods. Designed experiments, including those with factorial treatment structures, are now the focus of an entire chapter. The text also e
Linear regression methods are without doubt the most used approaches to describe and predict data in the physical sciences. They are often good first-order approximations, and they are in general easier to apply and interpret than more advanced methods. However, even the properties of univariate regression can lead to debate over the appropriateness of various models, as witnessed by the recent discussion about climate reconstruction methods. Before linear regression is applied, important choices have to be made regarding the origins of the noise terms and regarding which of the two variables under consideration should be treated as the independent variable. These decisions are often not easy to make, but they may have a considerable impact on the results. We seek to give a unified probabilistic (Bayesian with flat priors) treatment of univariate linear regression and prediction by taking, as the starting point, the general errors-in-variables model (Christiansen, J. Clim., 27, 2014-2031, 2014). Other versions of linear regression can be obtained as limits of this model. We derive the likelihood of the model parameters and predictands of the general errors-in-variables model by marginalizing over the nuisance parameters. The resulting likelihood is relatively simple and easy to analyze and calculate. The well-known unidentifiability of the errors-in-variables model is manifested as the absence of a well-defined maximum in the likelihood. However, this does not mean that probabilistic inference cannot be made; the marginal likelihoods of the model parameters and the predictands have, in general, well-defined maxima. We also include a probabilistic version of classical calibration and show how it is related to the errors-in-variables model. The results are illustrated by an example from the coupling between the lower stratosphere and the troposphere in the Northern Hemisphere winter.
Such categories as applied science and pure science can be thought of as "ideological." They have been contested in the public sphere, exposing long-term intellectual commitments, assumptions, balances of power, and material interests. This group of essays explores the contest over applied science in Britain and the United States during the nineteenth century. The essays look at the concept in the context of a variety of neighbors, including pure science, technology, and art. They are closely related and connected to contemporary historiographic debate. Jennifer Alexander links the issues raised to a recent paper by Paul Forman. Paul Lucier and Graeme Gooday deal with the debates in the last quarter of the century in the United States and Britain, respectively. Robert Bud deals with the earlier part of the nineteenth century, with an eye specifically on the variety of concepts hybridized under the heading of "applied science." Eric Schatzberg looks at the erosion of the earlier concept of art. As a whole, the essays illuminate both long-term changes and nuanced debate and are themselves intended to provoke further reflection on science in the public sphere.
Imai, Chisato; Armstrong, Ben; Chalabi, Zaid; Mangtani, Punam; Hashizume, Masahiro
Time series regression has been developed and long used to evaluate the short-term associations of air pollution and weather with mortality or morbidity of non-infectious diseases. The application of the regression approaches from this tradition to infectious diseases, however, is less well explored and raises some new issues. We discuss and present potential solutions for five issues often arising in such analyses: changes in immune population, strong autocorrelations, a wide range of plausible lag structures and association patterns, seasonality adjustments, and large overdispersion. The potential approaches are illustrated with datasets of cholera cases and rainfall from Bangladesh and influenza and temperature in Tokyo. Though this article focuses on the application of the traditional time series regression to infectious diseases and weather factors, we also briefly introduce alternative approaches, including mathematical modeling, wavelet analysis, and autoregressive integrated moving average (ARIMA) models. Modifications proposed to standard time series regression practice include using sums of past cases as proxies for the immune population, and using the logarithm of lagged disease counts to control autocorrelation due to true contagion, both of which are motivated from "susceptible-infectious-recovered" (SIR) models. The complexity of lag structures and association patterns can often be informed by biological mechanisms and explored by using distributed lag non-linear models. For overdispersed models, alternative distribution models such as quasi-Poisson and negative binomial should be considered. Time series regression can be used to investigate dependence of infectious diseases on weather, but may need modifying to allow for features specific to this context. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
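For the overdispersion issue, a quasi-Poisson model keeps the Poisson mean structure but estimates a dispersion parameter φ, commonly as the Pearson chi-square statistic divided by the residual degrees of freedom, and scales the Poisson standard errors by √φ. The sketch below assumes an intercept-only model (whose Poisson fitted mean is the sample mean) and made-up counts; the function name is hypothetical, and a real analysis would include the seasonal, lag and weather terms discussed above.

```python
import math


def pearson_dispersion(y, mu, n_params):
    """Quasi-Poisson dispersion: Pearson chi-square / residual df."""
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / (len(y) - n_params)


counts = [0, 2, 10, 1, 7]  # illustrative weekly case counts
mean = sum(counts) / len(counts)  # Poisson MLE for an intercept-only model
phi = pearson_dispersion(counts, [mean] * len(counts), n_params=1)

# Poisson SE of log(mean) is 1/sqrt(n*mean); quasi-Poisson inflates it by sqrt(phi)
se_poisson = 1 / math.sqrt(len(counts) * mean)
se_quasi = se_poisson * math.sqrt(phi)
print(phi)  # 4.625: variance far exceeds the mean, i.e. overdispersion
print(se_poisson, se_quasi)
```

A value of φ near 1 would indicate that the Poisson variance assumption is adequate; the negative binomial model is the alternative mentioned above when a full likelihood is wanted.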
Zheng, Lily; Han, Pengfei; Liu, Jiaming; Li, Rui; Yin, Wen; Wang, Tao; Zhang, Wenjing; Kang, Y James
Pressure overload causes an accumulation of homocysteine in the heart, which is accompanied by copper depletion through the formation of copper-homocysteine complexes and the excretion of the complexes. Copper supplementation recovers cytochrome c oxidase (CCO) activity and promotes myocardial angiogenesis, along with the regression of cardiac hypertrophy and the recovery of cardiac contractile function. Increased copper availability is responsible for the recovery of CCO activity. Copper-promoted expression of angiogenesis factors, including vascular endothelial growth factor (VEGF), in endothelial cells is responsible for angiogenesis. VEGF receptor-2 (VEGFR-2) is critical for hypertrophic growth of cardiomyocytes, and VEGFR-1 is essential for the regression of cardiomyocyte hypertrophy. Copper, through promoting VEGF production and suppressing VEGFR-2, switches the VEGF signaling pathway from VEGFR-2-dependent to VEGFR-1-dependent, leading to the regression of cardiomyocyte hypertrophy. Copper is also required for hypoxia-inducible factor-1 (HIF-1) transcriptional activity, acting on the interaction between HIF-1 and the hypoxia-responsive element and on the formation of the HIF-1 transcriptional complex by inhibiting the factor inhibiting HIF-1. Therefore, therapeutic targets for copper supplementation-induced regression of cardiac hypertrophy include: (1) the recovery of copper availability for CCO and other critical cellular events; (2) the activation of the HIF-1 transcriptional complex, leading to the promotion of angiogenesis in endothelial cells by VEGF and other factors; (3) the activation of the VEGFR-1-dependent regression signaling pathway in cardiomyocytes; and (4) the inhibition of VEGFR-2 through post-translational regulation in hypertrophic cardiomyocytes. Future studies should focus on target-specific delivery of copper for the development of clinical applications. Copyright © 2014 Elsevier Inc. All rights reserved.
Oculoauriculovertebral spectrum, or Goldenhar Syndrome, is a condition characterized by variable degrees of uni- or bilateral involvement of craniofacial structures, ocular anomalies, and vertebral defects. Its expressivity is variable; therefore, the term “expanded Goldenhar complex” has been coined. The Goldenhar Syndrome usually involves anomalies in craniofacial structures, but it is known that nervous system anomalies, including encephalocele or caudal regression, may, rarely, occur in this condition. We report two rare cases of infants affected by Goldenhar Syndrome, associated with neural tube defects, specifically caudal regression syndrome and nasal encephaloceles, to underline the extremely complex and heterogeneous clinical features of this oculoauriculovertebral spectrum. These additional particular cases could increase the number of new variable spectrums to be included in the “expanded Goldenhar complex.”
This unique book explains how to fashion useful regression models from commonly available data to erect models essential for evidence-based road safety management and research. Composed from techniques and best practices presented over many years of lectures and workshops, The Art of Regression Modeling in Road Safety illustrates that fruitful modeling cannot be done without substantive knowledge about the modeled phenomenon. Class-tested in courses and workshops across North America, the book is ideal for professionals, researchers, university professors, and graduate students with an interest in, or responsibilities related to, road safety. This book also: · Presents for the first time a powerful analytical tool for road safety researchers and practitioners · Includes problems and solutions in each chapter as well as data and spreadsheets for running models and PowerPoint presentation slides · Features pedagogy well-suited for graduate courses and workshops including problems, solutions, and PowerPoint p...
Yang, Yunwen; Adolph, Anne L; Puyau, Maurice R; Vohra, Firoz A; Butte, Nancy F; Zakeri, Issa F
Advanced mathematical models have the potential to capture the complex metabolic and physiological processes that result in energy expenditure (EE). The study objective is to apply quantile regression (QR) to predict EE and determine quantile-dependent variation in covariate effects in nonobese and obese children. First, QR models will be developed to predict minute-by-minute awake EE at different quantile levels based on heart rate (HR) and physical activity (PA) accelerometry counts, and child characteristics of age, sex, weight, and height. Second, the QR models will be used to evaluate the covariate effects of weight, PA, and HR across the conditional EE distribution. QR and ordinary least squares (OLS) regressions are estimated in 109 children, aged 5-18 yr. QR modeling of EE outperformed OLS regression for both nonobese and obese populations. Average prediction errors for QR compared with OLS were not only smaller at the median τ = 0.5 (18.6 vs. 21.4%), but also substantially smaller at the tails of the distribution (10.2 vs. 39.2% at τ = 0.1 and 8.7 vs. 19.8% at τ = 0.9). Covariate effects of weight, PA, and HR on EE for the nonobese and obese children differed across quantiles (P effects of weight, PA, and HR on EE in nonobese and obese children.
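Quantile regression at level τ replaces the squared-error criterion of OLS with the asymmetric "check" loss ρ_τ(u) = u(τ − 1[u < 0]). The sketch below shows the simplest case, a model with no covariates, where minimising the check loss over a constant recovers the sample τ-quantile; the function names and data are illustrative assumptions, and fitting covariate effects such as HR and PA would require a linear-programming or similar solver.

```python
def check_loss(u, tau):
    """Quantile-regression check function rho_tau(u) = u * (tau - 1[u < 0])."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def constant_quantile_fit(y, tau):
    """Constant c minimising sum rho_tau(y_i - c).
    The loss is piecewise linear and convex with kinks at the data points,
    so a minimiser can always be found among the observations."""
    return min(y, key=lambda c: sum(check_loss(yi - c, tau) for yi in y))


y = [1, 2, 3, 4, 100]  # note the extreme value at the upper tail
for tau in (0.1, 0.5, 0.9):
    print(tau, constant_quantile_fit(y, tau))
```

Because the loss weights positive and negative residuals differently, each τ targets a different part of the conditional distribution, which is what allows covariate effects to vary across quantiles.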
Joensen, Alfred Karsten; Nielsen, Henrik Aalborg; Nielsen, Torben Skov
This paper shows that the recursive least-squares (RLS) algorithm with forgetting factor is a special case of a varying-coefficient model, and a model which can easily be estimated via simple local regression. This observation allows us to formulate a new method which retains the RLS algorithm......, but extends the algorithm by including polynomial approximations. Simulation results are provided, which indicate that this new method is superior to the classical RLS method if the parameter variations are smooth....
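For reference, the classical RLS update with forgetting factor λ that the paper extends can be written in a few lines: gain k = Pφ / (λ + φᵀPφ), parameter update θ ← θ + k(y − φᵀθ), and P ← (P − kφᵀP) / λ. The sketch below is this classical recursion only, not the paper's new polynomial-approximation method; the function names, the initial P and the simulated slope change are illustrative assumptions.

```python
def rls_update(theta, P, phi, y, lam):
    """One recursive least-squares step with forgetting factor lam (0 < lam <= 1)."""
    n = len(theta)
    Pphi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
    denom = lam + sum(phi[i] * Pphi[i] for i in range(n))
    gain = [v / denom for v in Pphi]
    err = y - sum(t * f for t, f in zip(theta, phi))  # a-priori prediction error
    theta = [t + g * err for t, g in zip(theta, gain)]
    # P <- (P - gain * phi^T P) / lam; phi^T P equals (P phi)^T because P is symmetric
    P = [[(P[i][j] - gain[i] * Pphi[j]) / lam for j in range(n)] for i in range(n)]
    return theta, P


def track_slope(lam, steps=100):
    """Fit y = a + b*x online; the true slope jumps from 2 to 3 halfway through."""
    theta = [0.0, 0.0]
    P = [[1e4, 0.0], [0.0, 1e4]]  # large initial P expresses a vague starting guess
    for t in range(2 * steps):
        x = float(t % 5)
        slope = 2.0 if t < steps else 3.0
        theta, P = rls_update(theta, P, [1.0, x], 1.0 + slope * x, lam)
    return theta


print(track_slope(0.95))  # forgetting: slope estimate tracks the new value, near 3
print(track_slope(1.0))   # no forgetting: old data still count, slope near 2.5
```

The forgetting factor discounts an observation t steps old by λᵗ, so RLS behaves like a one-sided, exponentially weighted local fit in time; this is the connection to local regression that the abstract describes.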
Jee, Chang Hyun; Heo, Gyun Young; Jang, Seok Won; Lee, In Cheol
This paper proposes an approach to diagnosing thermal efficiency degradation in turbine cycles, based on turbine cycle simulation under abnormal conditions and a linear regression model. The correlation between the inputs representing degradation conditions (normally unmeasured but intrinsic states) and the simulation outputs (normally measured but superficial states) was analyzed with the linear regression model. The regression models can be used inversely to infer the intrinsic state associated with a superficial state observed in a power plant. The proposed diagnosis method consists of three processes: 1) simulations of degradation conditions to obtain the measured states (referred to as the what-if method), 2) development of the linear model correlating intrinsic and superficial states, and 3) determination of an intrinsic state using the superficial states of the current plant and the linear regression model (referred to as the inverse what-if method). The what-if method generates the outputs for inputs including various root causes and/or boundary conditions, whereas the inverse what-if method calculates the inverse matrix with the given superficial states, that is, the component degradation modes. The method suggested in this paper was validated using the turbine cycle model of an operating power plant.
Zhang, Lin; Li, Kailong; Zhang, Chengqi; Qi, Xianlong; Zheng, Ning; Wang, Guangbin
Language regression is observed as an initial symptom in a subset of toddlers with autism spectrum disorder (ASD). However, this phenomenon has not been fully explored, partly due to the lack of definite diagnostic evaluation methods and criteria. Fifteen toddlers with ASD exhibiting language regression and fourteen age-matched typically developing (TD) controls underwent diffusion tensor imaging (DTI). DTI parameters including fractional anisotropy (FA), average fiber length (AFL), tract volume (TV) and number of voxels (NV) were analyzed by Neuro 3D in a Siemens syngo workstation. Subsequently, the data were analyzed using IBM SPSS Statistics 22. Compared with TD children, a significant reduction of FA along with an increase in TV and NV was observed in ASD children with language regression. Note that there were no significant differences between ASD and TD children in AFL of the arcuate fasciculus (AF). These DTI changes in the AF suggest that microstructural anomalies of the AF white matter may be associated with language deficits in ASD children exhibiting language regression starting from an early age.
Pérez, Paulino; de Los Campos, Gustavo; Crossa, José; Gianola, Daniel
The availability of dense molecular markers has made possible the use of genomic selection in plant and animal breeding. However, models for genomic selection pose several computational and statistical challenges and require specialized computer programs, not always available to the end user and not yet implemented in standard statistical software. The R-package BLR (Bayesian Linear Regression) implements several statistical procedures (e.g., Bayesian Ridge Regression, Bayesian LASSO) in a unified framework that allows including marker genotypes and pedigree data jointly. This article describes the classes of models implemented in the BLR package and illustrates their use through examples. Some challenges faced when applying genomic-enabled selection, such as model choice, evaluation of predictive ability through cross-validation, and choice of hyper-parameters, are also addressed.
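For orientation, one standard fact behind Bayesian Ridge Regression: with independent Gaussian priors beta_j ~ N(0, sigma_b^2) on the marker effects, Gaussian residuals with variance sigma_e^2, and both variances held fixed, the posterior mean of the effects solves (X'X + lam*I) b = X'y with lam = sigma_e^2 / sigma_b^2, i.e. it coincides with the ridge estimate. The sketch below solves that system one coefficient at a time (Gauss-Seidel). It illustrates only this identity under those assumptions; it is not the BLR package's algorithm or API, and the function name and data are hypothetical.

```python
def ridge_gauss_seidel(X, y, lam, sweeps=200):
    """Solve (X'X + lam*I) b = X'y one coefficient at a time (Gauss-Seidel).

    Assumption: lam = sigma_e^2 / sigma_b^2 is known, so the result is the
    posterior mean of b under i.i.d. N(0, sigma_b^2) priors.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)                                            # y - X b, kept up to date
    xx = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(sweeps):
        for j in range(p):
            # add back column j's own contribution, then solve the 1-D ridge problem
            rhs = sum(X[i][j] * resid[i] for i in range(n)) + xx[j] * b[j]
            new = rhs / (xx[j] + lam)
            for i in range(n):
                resid[i] -= X[i][j] * (new - b[j])
            b[j] = new
    return b


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0], [2.0, 1.0]]   # toy "genotypes"
y = [1.0, 2.0, 3.0, 0.0, 4.0]
print(ridge_gauss_seidel(X, y, lam=0.5))
```

A larger lam (stronger prior shrinkage relative to the residual variance) pulls all marker effects toward zero.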
In data mining we often have to learn from biased data, because, for instance, data comes from different batches or there was a gender or racial bias in the collection of social data. In some applications it may be necessary to explicitly control this bias in the models we learn from the data. This paper is the first to study learning linear regression models under constraints that control the biasing effect of a given attribute such as gender or batch number. We show how propensity modeling can be used for factoring out the part of the bias that can be justified by externally provided explanatory attributes. Then we analytically derive linear models that minimize squared error while controlling the bias by imposing constraints on the mean outcome or residuals of the models. Experiments with discrimination-aware crime prediction and batch effect normalization tasks show that the proposed techniques are successful in controlling attribute effects in linear regression models. © 2013 IEEE.
Traditional manifold learning algorithms, such as locally linear embedding, Isomap, and Laplacian eigenmap, only provide the embedding results of the training samples. To solve the out-of-sample extension problem, spectral regression (SR) solves the problem of learning an embedding function by establishing a regression framework, which avoids eigen-decomposition of dense matrices. Motivated by the effectiveness of SR, we incorporate multiple kernel learning (MKL) into SR for dimensionality reduction. The proposed approach (termed MKL-SR) seeks an embedding function in the Reproducing Kernel Hilbert Space (RKHS) induced by the multiple base kernels. An MKL-SR algorithm is proposed to further improve the performance of kernel-based SR (KSR). Furthermore, the proposed MKL-SR algorithm can be performed in supervised, unsupervised, and semi-supervised settings. Experimental results on supervised classification and semi-supervised classification demonstrate the effectiveness and efficiency of our algorithm.
Boček, Pavel; Šiman, Miroslav
Vol. 52, No. 1 (2016), pp. 28-51 ISSN 0023-5954 R&D Projects: GA ČR GA14-07234S Institutional support: RVO:67985556 Keywords: quantile regression * multivariate quantile * depth contour * Matlab Subject RIV: IN - Informatics, Computer Science Impact factor: 0.379, year: 2016 http://library.utia.cas.cz/separaty/2016/SI/bocek-0458380.pdf
Vol. 11, No. 2 (2015), pp. 69-78 ISSN 1336-9180 Grant - others: GA ČR(CZ) GA13-01930S; Nadační fond na podporu vědy(CZ) Neuron Institutional support: RVO:67985807 Keywords: robust regression * robust econometrics * hypothesis testing Subject RIV: BA - General Mathematics http://www.degruyter.com/view/j/jamsi.2015.11.issue-2/jamsi-2015-0013/jamsi-2015-0013.xml?format=INT
Das, Sakti Prasad; Ojha, Niranjan; Ganesh, G Shankar; Mohanty, Ram Narayan
The presence of a single umbilical persistent vitelline artery distinguishes sirenomelia from caudal regression syndrome. We report the case of a 12-year-old boy with bilateral umbilical arteries who presented with fusion of both legs in the lower third of the leg. Both feet were rudimentary. The right foot had a valgus rocker-bottom deformity. All toes were present but rudimentary. The left foot showed absence of all toes. Physical examination showed left tibia vara. The chest evaluation in sitting re...
Lindsey M. Negrete, BS
We present a case of caudal regression syndrome (CRS), a relatively uncommon defect of the lower spine accompanied by a wide range of developmental abnormalities. CRS is closely associated with pregestational diabetes and is nearly 200 times more prevalent in infants of diabetic mothers (1, 2). We report a case of prenatally suspected CRS in a fetus of a nondiabetic mother and discuss how the initial neurological abnormalities found on imaging correlate with the postnatal clinical deficits.
Kleinbaum, David G
This textbook provides students and professionals in the health sciences with a presentation of the use of logistic regression in research. The text is self-contained and designed to be used both in class and as a tool for self-study. It arises from the author's many years of experience teaching this material, and the notes on which it is based have been used extensively throughout the world.
Scheike, Thomas Harder; Martinussen, Torben
Dynamic additive regression models provide a flexible class of models for analysis of longitudinal data. The approach suggested in this work is suited for measurements obtained at random time points and aims at estimating time-varying effects. Both fully nonparametric and semiparametric models can... in special cases. We investigate the finite sample properties of the estimators and conclude that the asymptotic results are valid even for small samples...
Berk, Richard; Heidari, Hoda; Jabbari, Shahin; Joseph, Matthew; Kearns, Michael; Morgenstern, Jamie; Neel, Seth; Roth, Aaron
We introduce a flexible family of fairness regularizers for (linear and logistic) regression problems. These regularizers are all convex, permitting fast optimization, and they span the range from notions of group fairness to strong individual fairness. By varying the weight on the fairness regularizer, we can compute the efficient frontier of the accuracy-fairness trade-off on any given dataset, and we measure the severity of this trade-off via a numerical quantity we call the Price of F...
equivalent to the one described in Artzner et al. (1999), where axiom (i) is replaced by translation invariance. When we refer to a coherent measure ... defined as $D_i = \frac{(f(X) - f^{(i)}(X))^2}{m\,\mathrm{MSE}} = \frac{(f(X) - f^{(i)}(X))^2}{m\,(f(X) - Y)^2}$, (II.47) where $f^{(i)}(\cdot)$ represents the fitted regression function without
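The quantity D_i in the fragment above has the form of Cook's distance: the squared change in the fitted values when observation i is left out, scaled by m times the MSE. A minimal sketch for a straight-line fit, assuming that m counts the regression parameters (here 2) and that MSE = SSE / (n - m), the usual convention, which may differ from the normalization used in the source. f^(i) is obtained by refitting without observation i. Function names and data are illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def cooks_distance(xs, ys):
    """D_i = sum_j (f(x_j) - f^(i)(x_j))^2 / (m * MSE), with f^(i) refitted without point i."""
    n, m = len(xs), 2                      # assumption: m = number of parameters
    a, b = fit_line(xs, ys)
    fitted = [a + b * x for x in xs]
    mse = sum((f - y) ** 2 for f, y in zip(fitted, ys)) / (n - m)   # assumed convention
    dists = []
    for i in range(n):
        ai, bi = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])  # leave point i out
        change = sum((f - (ai + bi * x)) ** 2 for f, x in zip(fitted, xs))
        dists.append(change / (m * mse))
    return dists


xs = [1.0, 2.0, 3.0, 4.0, 10.0]
ys = [1.1, 1.9, 3.2, 3.9, 6.0]
print(cooks_distance(xs, ys))  # the high-leverage point x = 10 has by far the largest D_i
```

For ordinary least squares the refits are not actually needed: D_i equals e_i^2 / (m MSE) * h_ii / (1 - h_ii)^2, where e_i is the residual and h_ii the leverage of observation i; the explicit refit is kept here because it mirrors the definition through f^(i).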
Marick S. Sinay
We explore Bayesian inference of a multivariate linear regression model with a flexible prior for the covariance structure. The commonly adopted Bayesian setup involves the conjugate prior: a multivariate normal distribution for the regression coefficients and an inverse Wishart specification for the covariance matrix. Here we depart from this approach and propose a novel Bayesian estimator for the covariance. A multivariate normal prior for the unique elements of the matrix logarithm of the covariance matrix is considered. Such a structure allows for a richer class of prior distributions for the covariance, with respect to the strength of beliefs in prior location hyperparameters, as well as the added ability to model potential correlation among the elements of the covariance structure. The posterior moments of all relevant parameters of interest are calculated numerically via a Markov chain Monte Carlo procedure. The Metropolis-Hastings-within-Gibbs algorithm is invoked, with a proposal density constructed to closely match the shape of the target posterior distribution. As an application of the proposed technique, we investigate a multiple regression based upon the 1980 High School and Beyond Survey.
Recently, regularized coding-based classification methods (e.g., SRC and CRC) have shown great potential for pattern classification. However, most existing coding methods assume that the representation residuals are uncorrelated. In real-world applications, this assumption does not hold. In this paper, we take account of the correlations of the representation residuals and develop a general regression and representation model (GRR) for classification. GRR not only has the advantages of CRC, but also makes full use of the prior information (e.g., the correlations between representation residuals and representation coefficients) and the specific information (the weight matrix of image pixels) to enhance classification performance. GRR uses generalized Tikhonov regularization and K Nearest Neighbors to learn the prior information from the training data. Meanwhile, the specific information is obtained by using an iterative algorithm to update the feature (or image pixel) weights of the test sample. With the proposed model as a platform, we design two classifiers: the basic general regression and representation classifier (B-GRR) and the robust general regression and representation classifier (R-GRR). The experimental results demonstrate the performance advantages of the proposed methods over state-of-the-art algorithms.
Fan, Jianqing; Xue, Lingzhou; Zou, Hui
We consider estimating multi-task quantile regression under the transnormal model, with a focus on the high-dimensional setting. We derive a surprisingly simple closed-form solution through rank-based covariance regularization. In particular, we propose the rank-based ℓ1 penalization with positive definite constraints for estimating sparse covariance matrices, and the rank-based banded Cholesky decomposition regularization for estimating banded precision matrices. By taking advantage of the alternating direction method of multipliers, a nearest correlation matrix projection is introduced that inherits the sampling properties of the unprojected one. Our work combines the strengths of quantile regression and rank-based covariance regularization to simultaneously deal with nonlinearity and nonnormality in high-dimensional regression. Furthermore, the proposed method strikes a good balance between robustness and efficiency, achieves the "oracle"-like convergence rate, and provides a provable prediction interval in the high-dimensional setting. The finite-sample performance of the proposed method is also examined. The performance of our proposed rank-based method is demonstrated in a real application to the analysis of protein mass spectroscopy data.
We study the properties of the treatment effect estimate, in terms of the odds ratio at the study end point, from a logistic regression model adjusting for the baseline value when the underlying continuous repeated measurements follow a multivariate normal distribution. Compared with the analysis that does not adjust for the baseline value, the adjusted analysis produces a larger treatment effect as well as a larger standard error. However, the increase in standard error is more than offset by the increase in treatment effect, so that the adjusted analysis is more powerful than the unadjusted analysis for detecting the treatment effect. On the other hand, the true adjusted odds ratio implied by the normal distribution of the underlying continuous variable is a function of the baseline value and hence is unlikely to be adequately represented by a single value of the adjusted odds ratio from the logistic regression model. In contrast, the risk difference function derived from the logistic regression model provides a reasonable approximation to the true risk difference function implied by the normal distribution of the underlying continuous variable over the range of the baseline distribution. We show that different metrics of treatment effect have similar statistical power when evaluated at the baseline mean. Copyright © 2016 John Wiley & Sons, Ltd.
We describe a supervised prediction method for diagnosis of acute myeloid leukemia (AML) from patient samples based on flow cytometry measurements. We use a data-driven approach with machine learning methods to train a computational model that takes in flow cytometry measurements from a single patient and gives a confidence score of the patient being AML-positive. Our solution is based on an [Formula: see text] regularized logistic regression model that aggregates AML test statistics calculated from individual test tubes with different cell populations and fluorescent markers. The model construction is entirely data driven, and no prior biological knowledge is used. The described solution scored a 100% classification accuracy in the DREAM6/FlowCAP2 Molecular Classification of Acute Myeloid Leukaemia Challenge against a gold standard consisting of 20 AML-positive and 160 healthy patients. Here we perform a more extensive validation of the prediction model performance and further improve and simplify our original method, showing that statistically equal results can be obtained by using simple average marker intensities as features in the logistic regression model. In addition to the logistic regression based model, we also present other classification models and compare their performance quantitatively. The key benefit of our prediction method compared to other solutions with similar performance is that our model uses only a small fraction of the flow cytometry measurements, making our solution highly economical.
Ni, Karl S; Nguyen, Truong Q
A thorough investigation of the application of support vector regression (SVR) to the superresolution problem is conducted through various frameworks. Prior to the study, the SVR problem is enhanced by finding the optimal kernel. This is done by formulating the kernel learning problem in SVR form as a convex optimization problem, specifically a semi-definite programming (SDP) problem. An additional constraint is added to reduce the SDP to a quadratically constrained quadratic programming (QCQP) problem. After this optimization, the investigation of the relevance of SVR to superresolution proceeds with the possibility of using a single, general support vector regression for all image content, and the results are impressive for small training sets. This idea is improved upon by observing structural properties in the discrete cosine transform (DCT) domain to aid in learning the regression. Further improvement involves a combination of classification and SVR-based techniques, extending works in resolution synthesis. This method, termed kernel resolution synthesis, uses specific regressors for isolated image content to describe the domain through a partitioned view of the vector space, thereby yielding good results.
Chen, Gang; Chen, Guangyu; Xie, Chunming; Ward, B Douglas; Li, Wenjun; Antuono, Piero; Li, Shi-Jiang
In resting-state functional MRI studies, the global signal (operationally defined as the global average of resting-state functional MRI time courses) is often considered a nuisance effect and commonly removed in preprocessing. This global signal regression method can introduce artifacts, such as false anticorrelated resting-state networks in functional connectivity analyses. Therefore, the efficacy of this technique as a correction tool remains questionable. In this article, we establish that the accuracy of the estimated global signal is determined by the level of global noise (i.e., non-neural noise that has a global effect on the resting-state functional MRI signal). When the global noise level is low, the global signal resembles the resting-state functional MRI time courses of the largest cluster, but not those of the global noise. Using real data, we demonstrate that the global signal is strongly correlated with the default mode network components and has biological significance. These results call into question whether or not global signal regression should be applied. We introduce a method to quantify global noise levels. We show that a criterion for global signal regression can be derived from this method. Using this criterion, one can determine whether to include or exclude global signal regression in order to minimize errors in functional connectivity measures. Copyright © 2012 Wiley Periodicals, Inc.
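Global signal regression, as described above, amounts to regressing each voxel's time course on the global average time course and keeping the residuals. A minimal sketch on toy data (the function name and data are illustrative, not from the article). One side effect is easy to verify: when every voxel enters the global mean with equal weight, the residuals sum to zero across voxels at every time point, so the sum of all pairwise covariances between voxels must be negative. This is one way to see how the anticorrelations mentioned above can arise.

```python
def global_signal_regression(ts):
    """Regress each voxel time course on the global mean signal; return the residuals."""
    n_vox, T = len(ts), len(ts[0])
    g = [sum(v[t] for v in ts) / n_vox for t in range(T)]   # global signal
    gbar = sum(g) / T
    gc = [gt - gbar for gt in g]                              # centered global signal
    ss = sum(c * c for c in gc)
    residuals = []
    for v in ts:
        vbar = sum(v) / T
        beta = sum(c * (x - vbar) for c, x in zip(gc, v)) / ss   # per-voxel OLS slope
        residuals.append([x - vbar - beta * c for x, c in zip(v, gc)])
    return residuals


ts = [[1.0, 2.0, 3.0, 2.0], [2.0, 2.5, 4.0, 3.5], [0.5, 1.0, 1.0, 2.5]]  # 3 toy voxels
r = global_signal_regression(ts)
print([sum(v[t] for v in r) for t in range(4)])  # approximately 0 at every time point
```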
Lo, Ching F.
The integration of Radial Basis Function Networks and Back Propagation Neural Networks with Multiple Linear Regression has been accomplished to map nonlinear response surfaces over a wide range of independent variables in the process of the Modern Design of Experiments. The integrated method is capable of estimating precision intervals, including confidence and prediction intervals. The power of the innovative method has been demonstrated by applying it to a set of wind tunnel test data in the construction of a response surface and the estimation of precision intervals.
Image segmentation is an important process in image analysis and computer vision and is a valuable tool that can be applied in the fields of image processing, health care, remote sensing, and traffic image detection. Given the lack of prior knowledge of the ground truth, unsupervised learning techniques like clustering have been widely adopted. Fuzzy clustering has been widely studied and successfully applied in image segmentation. In situations such as limited spatial resolution, poor contrast, overlapping intensities, and noise and intensity inhomogeneities, fuzzy clustering can retain much more information than hard clustering techniques. Most fuzzy clustering algorithms originate from fuzzy c-means (FCM) and have been successfully applied in image segmentation. However, the cluster prototype of the FCM method is hyperspherical or hyperellipsoidal, so FCM may not provide an accurate partition when the data consist of arbitrary shapes. Therefore, a Fuzzy C-Regression Model (FCRM) using spatial information has been proposed, whose prototype is a hyperplane and can be either linear or nonlinear, allowing for better cluster partitioning. This paper implements FCRM and applies the algorithm to color segmentation using Berkeley’s segmentation database. The results show that FCRM obtains more accurate results than other fuzzy clustering algorithms.
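A minimal sketch of the basic fuzzy c-regression model: it alternates weighted least-squares line fits (weights u**m) with membership updates based on squared regression residuals. It assumes straight-line prototypes in a single input variable, fuzzifier m = 2, and a fixed number of iterations. The spatial term and the color-image setting of the paper are omitted, and the names, data, and starting memberships are illustrative.

```python
def wls_line(xs, ys, w):
    """Weighted least-squares intercept and slope."""
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def fcrm(xs, ys, u, m=2.0, iters=50, eps=1e-12):
    """Fuzzy c-regression with line prototypes; u[i][k] = membership of point k in cluster i."""
    c, n = len(u), len(xs)
    for _ in range(iters):
        # 1) fit one regression line per cluster, weighting points by u**m
        lines = [wls_line(xs, ys, [uk ** m for uk in u[i]]) for i in range(c)]
        # 2) squared residual of every point w.r.t. every line (eps avoids division by 0)
        E = [[(ys[k] - (a + b * xs[k])) ** 2 + eps for k in range(n)] for a, b in lines]
        # 3) membership update: u_ik = 1 / sum_j (E_ik / E_jk)^(1/(m-1))
        u = [[1.0 / sum((E[i][k] / E[j][k]) ** (1.0 / (m - 1)) for j in range(c))
              for k in range(n)] for i in range(c)]
    return lines, u


xs = [0.0, 1.0, 2.0, 3.0, 4.0] * 2
ys = [2 * x + 1 for x in xs[:5]] + [-x + 12 for x in xs[5:]]   # two hidden lines
u0 = [[0.7] * 5 + [0.3] * 5, [0.3] * 5 + [0.7] * 5]           # rough starting memberships
lines, u = fcrm(xs, ys, u0)
print(lines)  # close to [(1, 2), (12, -1)]
```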
Laura M. Grajeda
Background: Childhood growth is a cornerstone of pediatric research. Statistical models need to consider individual trajectories to adequately describe growth outcomes. Specifically, well-defined longitudinal models are essential to characterize both population and subject-specific growth. Linear mixed-effects models with cubic regression splines can account for the nonlinearity of growth curves and provide reasonable estimators of population and subject-specific growth, velocity and acceleration. Methods: We provide a stepwise approach that builds from simple to complex models and accounts for the intrinsic complexity of the data. We start with standard cubic spline regression models and build up to a model that includes subject-specific random intercepts and slopes and residual autocorrelation. We then compared cubic regression splines with linear piecewise splines, using varying numbers and positions of knots. Statistical code is provided to ensure reproducibility and improve dissemination of methods. Models are applied to longitudinal height measurements in a cohort of 215 Peruvian children followed from birth until their fourth year of life. Results: Unexplained variability, as measured by the variance of the regression model, was reduced from 7.34 when using ordinary least squares to 0.81 (p < 0.001) when using a linear mixed-effects model with random slopes and a first-order continuous autoregressive error term. There was substantial heterogeneity in both the intercepts (p < 0.001) and slopes (p < 0.001) of the individual growth trajectories. We also identified important serial correlation within the structure of the data (ρ = 0.66; 95% CI 0.64 to 0.68; p < 0.001), which we modeled with a first-order continuous autoregressive error term, as evidenced by the variogram of the residuals and by a lack of association among residuals. The final model provides a parametric linear regression equation for both estimation and
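To make the structure of the final model concrete: for one child measured at ages t_1, ..., t_k, a linear mixed model with a random intercept and slope and a first-order continuous autoregressive (CAR(1)) error has marginal covariance V = Z G Z' + sigma^2 R, with rows Z_j = [1, t_j] and R_jk = phi^|t_j - t_k|. The sketch below builds V. The G entries are hypothetical, sigma^2 = 0.81 and phi = 0.66 are borrowed from the reported estimates purely for illustration, and the spline terms of the fixed part are omitted.

```python
def marginal_cov(times, g00, g01, g11, sigma2, phi):
    """V = Z G Z' + sigma2 * R for one subject, Z_j = [1, t_j], R[j][k] = phi ** |t_j - t_k|.

    Assumption: G = [[g00, g01], [g01, g11]] is the covariance of the random
    intercept and slope; the values passed below are illustrative only.
    """
    V = []
    for tj in times:
        row = []
        for tk in times:
            zgz = g00 + g01 * (tj + tk) + g11 * tj * tk       # random intercept and slope part
            row.append(zgz + sigma2 * phi ** abs(tj - tk))    # CAR(1) residual part
        V.append(row)
    return V


V = marginal_cov([0.0, 0.5, 1.0, 2.0], g00=4.0, g01=0.5, g11=1.0, sigma2=0.81, phi=0.66)
for row in V:
    print([round(v, 3) for v in row])
```

Because the residual correlation depends on the time gap rather than on the visit index, CAR(1) accommodates the irregularly spaced measurement ages typical of cohort data.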
Nagwani, Naresh Kumar; Deo, Shirish V
Understanding the compressive strength of concrete is important for activities such as construction planning, prestressing operations, and proportioning new mixtures, and for quality assurance. Regression techniques are widely used for prediction tasks in which the relationship between the independent variables and the dependent (prediction) variable is identified. The accuracy of regression-based prediction can be improved if clustering is used along with regression, since combining clustering with regression yields more accurate curve fitting between the dependent and independent variables. In this work, a cluster regression technique is applied to estimate the compressive strength of concrete, and a novel state-of-the-art approach is proposed for predicting concrete compressive strength. The objective of this work is to demonstrate that clustering along with regression yields lower prediction errors when estimating concrete compressive strength. The proposed technique consists of two major stages: in the first stage, clustering is used to group concrete data with similar characteristics, and in the second stage, regression techniques are applied to these clusters (groups) to predict the compressive strength within each cluster. Experiments show that clustering combined with regression gives the lowest errors for predicting the compressive strength of concrete; moreover, the fuzzy C-means clustering algorithm performs better than the K-means algorithm.
Avdic, S. Dz.
Positron lifetime spectroscopy is a non-destructive tool for detecting radiation-induced defects in nuclear reactor materials. This work concerns the applicability of the support vector machine (SVM) method for input data compression in the neural network analysis of positron lifetime spectra. It has been demonstrated that the SVM technique can be successfully applied to regression analysis of positron spectra. A substantial data compression, to about 50 % and 8 % of the whole training set for two and three spectral components respectively, has been achieved while maintaining a high accuracy of the spectra approximation. However, some parameters in the SVM approach, such as the insensitivity zone ε and the penalty parameter C, have to be chosen carefully to obtain good performance. (author)
Song, Chao; Kwan, Mei-Po; Zhu, Jiping
An increasing number of fires are occurring with the rapid development of cities, resulting in increased risk for human beings and the environment. This study compares geographically weighted regression-based models, including geographically weighted regression (GWR) and geographically and temporally weighted regression (GTWR), which integrates spatial and temporal effects, with global linear regression models (LM) for modeling fire risk at the city scale. The results show that road density and the spatial distribution of enterprises have the strongest influences on fire risk, which implies that we should focus on areas where roads and enterprises are densely clustered. In addition, locations with a large number of enterprises have fewer fire ignition records, probably because of strict management and prevention measures. The changing number of significant variables across space indicates that heterogeneity mainly exists in the northern and eastern rural and suburban areas of Hefei city, where human-related facilities or road construction are clustered only in the city sub-centers. GTWR can capture small changes in the spatiotemporal heterogeneity of the variables, while GWR and LM cannot. An approach that integrates space and time enables us to better understand the dynamic changes in fire risk. Thus, governments can use the results to manage fire safety at the city scale.
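Geographically weighted regression fits a separate weighted least-squares model at each location, down-weighting observations by their distance from it, which is what lets coefficients vary across space. A minimal sketch with one covariate and a Gaussian distance-decay kernel, assuming a fixed bandwidth (in practice the bandwidth is usually selected, e.g., by cross-validation or AIC). The temporal distance used by GTWR is omitted, and the function name and data are illustrative.

```python
import math


def gwr_at(u, v, coords, xs, ys, bandwidth):
    """Local intercept and slope at location (u, v) with Gaussian distance-decay weights."""
    w = [math.exp(-0.5 * (math.hypot(cu - u, cv - v) / bandwidth) ** 2) for cu, cv in coords]
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


# toy data on a transect: the effect of x is 1 in the west and 3 in the east
coords = [(float(i), 0.0) for i in range(10)]
xs = [0.0, 1.0, 2.0, 3.0, 4.0] * 2
ys = [1.0 * x for x in xs[:5]] + [3.0 * x for x in xs[5:]]
print(gwr_at(1.0, 0.0, coords, xs, ys, bandwidth=1.0))  # local slope near 1
print(gwr_at(8.0, 0.0, coords, xs, ys, bandwidth=1.0))  # local slope near 3
```

With a very large bandwidth all weights approach 1 and the local fits collapse to the global linear model (LM).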
The research deals with the adaptation and application of Adaptive General Regression Neural Networks (GRNN) to high-dimensional environmental data. GRNN [1,2,3] are efficient modelling tools for both spatial and temporal data and are based on nonparametric kernel methods closely related to the classical Nadaraya-Watson estimator. Adaptive GRNN, using anisotropic kernels, can also be applied to feature selection tasks when working with high-dimensional data [1,3]. In the present research, Adaptive GRNN are used to study geospatial data predictability and relevant feature selection using both simulated and real data case studies. The original raw data were either three-dimensional monthly precipitation data or monthly wind speeds embedded into a 13-dimensional space constructed from geographical coordinates and geo-features calculated from a digital elevation model. GRNN were applied in two different ways: 1) adaptive GRNN with the resulting list of features ordered according to their relevancy; and 2) adaptive GRNN applied to evaluate all N possible models [for wind fields, N = 2^13 - 1 = 8191] and rank them according to the cross-validation error. In both cases, training was carried out using a leave-one-out procedure. An important result of the study is that the set of the most relevant features depends on the month (strong seasonal effect) and year. The predictabilities of precipitation and wind field patterns, estimated using the cross-validation and testing errors of raw and shuffled data, were studied in detail. The results of both approaches were qualitatively and quantitatively compared. In conclusion, Adaptive GRNN, with their ability to select features and to efficiently model complex high-dimensional data, can be widely used in automatic/on-line mapping and as an integrated part of environmental decision support systems. 1. Kanevski M., Pozdnoukhov A., Timonin V. Machine Learning for Spatial Environmental Data. Theory, applications and software. EPFL Press
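A GRNN prediction is essentially a Nadaraya-Watson kernel-weighted average of the training outputs. The sketch below uses an anisotropic Gaussian kernel with one bandwidth per feature and scores a bandwidth vector by leave-one-out error, the criterion used above. A very large bandwidth effectively switches a feature off, which is how the fitted bandwidths can indicate feature relevance. This is an illustration under these assumptions, not the authors' software, and the names and data are hypothetical.

```python
import math


def grnn_predict(x, X, y, sigmas):
    """Nadaraya-Watson / GRNN estimate with an anisotropic Gaussian kernel."""
    # one bandwidth per feature; a huge sigma makes that feature irrelevant
    w = [math.exp(-0.5 * sum(((a - b) / s) ** 2 for a, b, s in zip(x, xi, sigmas)))
         for xi in X]
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


def loo_mse(X, y, sigmas):
    """Leave-one-out cross-validation error for a given bandwidth vector."""
    err = 0.0
    for i in range(len(X)):
        pred = grnn_predict(X[i], X[:i] + X[i + 1:], y[:i] + y[i + 1:], sigmas)
        err += (pred - y[i]) ** 2
    return err / len(X)


# toy data: y depends on feature 0 only; feature 1 is a scrambled nuisance variable
X = [[float(i), float((7 * i) % 5)] for i in range(10)]
y = [float(i) for i in range(10)]
print(loo_mse(X, y, [1.0, 100.0]), loo_mse(X, y, [100.0, 1.0]))  # first is much smaller
```

Minimizing loo_mse over the bandwidth vector, rather than comparing two hand-picked settings, would correspond to the adaptive variant.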
Jaime Araujo Cobuci
Milk yield test-day records on the first three lactations of 25,500 Holstein cows were used to estimate genetic parameters and predict breeding values for nine measures of persistency and 305-d milk yield in a random regression animal model, using two criteria to define the fixed regression. Legendre polynomials of fourth and fifth orders were used to model the fixed and random regressions of lactation curves. The fixed regressions were adjusted for average milk yield on populations (single) or subpopulations (multiple) formed by cows that calved at the same age and in the same season. The Akaike Information (AIC) and Bayesian Information (BIC) criteria indicated that models with multiple regression lactation curves had the best fit to test-day milk records of first lactations, while models with a single regression curve had the best fit for the second and third lactations. Heritability and genetic correlation estimates between persistency and milk yield differed significantly depending on the lactation order and the measures of persistency used. These parameters did not differ significantly depending on the criteria used for defining the fixed regressions for lactation curves. In general, the heritability estimates were highest for the first lactation (0.07 to 0.43), followed by the second (0.08 to 0.21) and third (0.04 to 0.10). The ranking of sires resulting from genetic evaluation for milk yield or persistency using random regression models differed according to the criteria used for determining the fixed regression of the lactation curve.
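Random regression test-day models commonly describe the lactation curve with Legendre polynomials of days in milk rescaled to [-1, 1]. The following sketches how such covariates can be generated. It assumes a 5-305 day range, takes "order" to mean the highest polynomial degree, and uses the normalization sqrt((2n + 1)/2) that is common in animal-breeding applications; these choices are assumptions, not details taken from the study.

```python
import math


def legendre(x, order):
    """P_0..P_order at x in [-1, 1] via Bonnet's recurrence."""
    p = [1.0, x]
    for n in range(1, order):
        p.append(((2 * n + 1) * x * p[n] - n * p[n - 1]) / (n + 1))
    return p[: order + 1]


def lactation_covariates(dim, order, dim_min=5, dim_max=305, normalized=True):
    """Legendre covariates for one test day (assumed DIM range dim_min..dim_max)."""
    x = 2.0 * (dim - dim_min) / (dim_max - dim_min) - 1.0   # rescale days in milk to [-1, 1]
    p = legendre(x, order)
    if normalized:
        # assumption: normalized polynomials sqrt((2n + 1) / 2) * P_n
        p = [math.sqrt((2 * n + 1) / 2.0) * pn for n, pn in enumerate(p)]
    return p


print(lactation_covariates(155, 4))  # covariates for a mid-lactation test day
```

Each test-day record then contributes one row of these covariates to both the fixed and the random regression design matrices.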
The presentation focuses on some of the time-proven and new technologies being used to accomplish radiological work. These techniques can be applied at nuclear facilities to reduce radiation doses and protect the environment. The last reactor plants and processing facilities were shut down, and Hanford was given a new mission to put the facilities in a safe condition, decontaminate them, and prepare them for decommissioning. The skills that were necessary to operate these facilities were different from the skills needed today to clean up Hanford. Workers were not familiar with many of the tools, equipment, and materials needed to accomplish the new mission, which includes cleanup of contaminated areas in and around all the facilities, recovery of reactor fuel from spent fuel pools, and the removal of millions of gallons of highly radioactive waste from 177 underground tanks. In addition, this work has to be done with a reduced number of workers and a smaller budget. At Hanford, facilities contain a myriad of radioactive isotopes located inside plant systems, underground tanks, and the soil. As cleanup work at Hanford began, it became obvious early on that, in order to get workers to apply ALARA and use new tools and equipment to accomplish the radiological work, it was necessary to plan the work in advance and to involve radiological control and/or ALARA committee personnel early in the planning process. Emphasis was placed on applying ALARA techniques to reduce dose, limit contamination spread, and minimize the amount of radioactive waste generated. Progress on the cleanup has been steady, and Hanford workers have learned to use different types of engineered controls and ALARA techniques to perform radiological work. The purpose of this presentation is to share the lessons learned on how Hanford is accomplishing radiological work.
The development of applied ethics in recent decades has had great significance for philosophy and society. In this article, I try to characterise this field of philosophical inquiry. I also discuss the relation of applied ethics to social policy and to professional ethics. In the first part, I address the following questions: What is applied ethics? When and why did applied ethics appear? How do we engage in applied ethics? What are the methods? In the second part of the article, I introduce...
Kempe, P T; van Oppen, P; de Haan, E; Twisk, J W R; Sluis, A; Smit, J H; van Dyck, R; van Balkom, A J L M
Two methods for predicting remissions in obsessive-compulsive disorder (OCD) treatment are evaluated. Y-BOCS measurements of 88 patients with a primary OCD (DSM-III-R) diagnosis were performed over a 16-week treatment period and during three follow-ups. Remission at any measurement was defined as a Y-BOCS score lower than thirteen combined with a reduction of seven points compared with baseline. Logistic regression models were compared with a Cox regression model for recurrent events. Logistic regression yielded different models at different evaluation times. The recurrent events model remained stable when fewer measurements were used. Higher baseline levels of neuroticism and more severe OCD symptoms were associated with a lower chance of remission, whereas early age of onset and more depressive symptoms were associated with a higher chance. Choice of outcome time affects logistic regression prediction models. Recurrent events analysis uses all information on remissions and relapses. Short- and long-term predictors for OCD remission show overlap.
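The remission rule and the difference between the two analysis strategies can be sketched in Python. Reading "a reduction of seven points" as "at least seven points", the Andersen-Gill-style counting-process layout, and all function names are assumptions for illustration, not details taken from the study.

```python
def is_remission(score, baseline, cutoff=13, min_drop=7):
    """Y-BOCS remission: score below `cutoff` AND at least `min_drop` points below baseline."""
    return score < cutoff and (baseline - score) >= min_drop


def remission_flags(scores, baseline, cutoff=13, min_drop=7):
    """Per-measurement remission status; a logistic model uses only one chosen time point."""
    return [is_remission(s, baseline, cutoff, min_drop) for s in scores]


def counting_process_intervals(times, scores, baseline, cutoff=13, min_drop=7):
    """(start, stop, event) rows for a recurrent-events (counting-process) layout.

    Risk time starts at baseline (time 0). A patient contributes risk time only
    while not in remission; event=1 marks the measurement at which a new
    remission is first observed. After a relapse the patient re-enters the risk
    set, so later remissions are counted too.
    """
    rows = []
    start, in_remission = 0, False
    for t, s in zip(times, scores):
        now = is_remission(s, baseline, cutoff, min_drop)
        if not in_remission:
            rows.append((start, t, int(now)))
        in_remission = now
        start = t
    return rows
```

For a hypothetical patient with baseline score 28 and scores 20, 12, 18, 10 at weeks 4, 8, 12, 16, `counting_process_intervals` returns `[(0, 4, 0), (4, 8, 1), (12, 16, 1)]`: two remissions separated by a relapse, whereas a logistic model evaluated only at week 12 would record no remission.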
Baldwin, Scott A; Larson, Michael J
Statistical training in psychology focuses on frequentist methods. Bayesian methods are an alternative to standard frequentist methods. This article provides researchers with an introduction to fundamental ideas in Bayesian modeling. We use data from an electroencephalogram (EEG) and anxiety study to illustrate Bayesian models. Specifically, the models examine the relationship between error-related negativity (ERN), a particular event-related potential, and trait anxiety. Methodological topics covered include: how to set up a regression model in a Bayesian framework, specifying priors, examining convergence of the model, visualizing and interpreting posterior distributions, interval estimates, expected and predicted values, and model comparison tools. We also discuss situations where Bayesian methods can outperform frequentist methods, as well as how to specify more complicated regression models. Finally, we conclude with recommendations about reporting guidelines for those using Bayesian methods in their own research. We provide data and R code for replicating our analyses. Copyright © 2017 Elsevier Ltd. All rights reserved.
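To make the prior-to-posterior step concrete, here is a minimal Python sketch of Bayesian linear regression under a conjugate normal prior with the residual variance treated as known. That simplification is an assumption that yields a closed-form posterior; the abstract's mention of convergence checks implies the article's models are fit by simulation instead. Function names are hypothetical.

```python
import numpy as np


def posterior_normal(X, y, sigma2, prior_mean, prior_cov):
    """Posterior N(mean, cov) for beta in y = X beta + e, e ~ N(0, sigma2 I), beta ~ N(prior_mean, prior_cov)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    prior_mean = np.asarray(prior_mean, dtype=float)
    prior_prec = np.linalg.inv(prior_cov)
    # Posterior precision = prior precision + data precision.
    post_cov = np.linalg.inv(prior_prec + X.T @ X / sigma2)
    # Posterior mean is a precision-weighted blend of prior mean and data.
    post_mean = post_cov @ (prior_prec @ prior_mean + X.T @ y / sigma2)
    return post_mean, post_cov


def credible_intervals(post_mean, post_cov, z=1.96):
    """Central ~95% credible interval for each coefficient (normal posterior)."""
    sd = np.sqrt(np.diag(post_cov))
    return np.column_stack([post_mean - z * sd, post_mean + z * sd])


def expected_and_predicted(x_new, post_mean, post_cov, sigma2):
    """Posterior mean at x_new plus variances for the expected value and a new observation."""
    x_new = np.asarray(x_new, dtype=float)
    mean = x_new @ post_mean
    var_expected = x_new @ post_cov @ x_new  # uncertainty about the regression line only
    var_predicted = var_expected + sigma2  # adds observation noise for a new individual
    return mean, var_expected, var_predicted
```

With a very diffuse prior the posterior mean essentially reproduces ordinary least squares; a tight prior centered at zero pulls the estimates toward zero. The gap between the two variances returned by `expected_and_predicted` mirrors the abstract's distinction between expected and predicted values.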
Petersen, Ashley; Simon, Noah; Witten, Daniela
We consider the problem of predicting an outcome variable on the basis of a small number of covariates, using an interpretable yet non-additive model. We propose convex regression with interpretable sharp partitions (CRISP) for this task. CRISP partitions the covariate space into blocks in a data-adaptive way, and fits a mean model within each block. Unlike other partitioning methods, CRISP is fit using a non-greedy approach by solving a convex optimization problem, resulting in low-variance fits. We explore the properties of CRISP, and evaluate its performance in a simulation study and on a housing price data set.
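The partition-then-average idea can be sketched in Python as an unpenalized q-by-q grid of block means over two covariates, with cut points at empirical quantiles. This is only a building block: CRISP fits its block means by solving a convex optimization problem that produces data-adaptive partitions, and that step is not implemented here. The quantile grid and the function names are assumptions for illustration.

```python
import numpy as np


def quantile_grid_means(x1, x2, y, q):
    """Unpenalized q-by-q block means of y over two covariates (NaN marks empty blocks)."""
    x1, x2, y = (np.asarray(a, dtype=float) for a in (x1, x2, y))
    # Interior cut points at the empirical quantiles of each covariate.
    probs = np.linspace(0, 1, q + 1)[1:-1]
    cuts1, cuts2 = np.quantile(x1, probs), np.quantile(x2, probs)
    # Block index in 0..q-1 along each axis for every observation.
    i = np.searchsorted(cuts1, x1, side="right")
    j = np.searchsorted(cuts2, x2, side="right")
    sums = np.zeros((q, q))
    counts = np.zeros((q, q))
    np.add.at(sums, (i, j), y)  # unbuffered accumulation handles repeated indices
    np.add.at(counts, (i, j), 1)
    with np.errstate(invalid="ignore", divide="ignore"):
        means = sums / counts
    return cuts1, cuts2, means


def predict_grid(x1_new, x2_new, cuts1, cuts2, means):
    """Look up the block mean for new covariate values."""
    i = np.searchsorted(cuts1, np.asarray(x1_new, dtype=float), side="right")
    j = np.searchsorted(cuts2, np.asarray(x2_new, dtype=float), side="right")
    return means[i, j]
```

Fit independently, block means like these have high variance when q is large and blocks are sparsely populated; the abstract's point is that CRISP's convex, non-greedy fit counters this, yielding low-variance fits.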