WorldWideScience

Sample records for random coefficient regression

  1. The performance of random coefficient regression in accounting for residual confounding.

    Science.gov (United States)

    Gustafson, Paul; Greenland, Sander

    2006-09-01

    Greenland (2000, Biometrics 56, 915-921) describes the use of random coefficient regression to adjust for residual confounding in a particular setting. We examine this setting further, giving theoretical and empirical results concerning the frequentist and Bayesian performance of random coefficient regression. Particularly, we compare estimators based on this adjustment for residual confounding to estimators based on the assumption of no residual confounding. This devolves to comparing an estimator from a nonidentified but more realistic model to an estimator from a less realistic but identified model. The approach described by Gustafson (2005, Statistical Science 20, 111-140) is used to quantify the performance of a Bayesian estimator arising from a nonidentified model. From both theoretical calculations and simulations we find support for the idea that superior performance can be obtained by replacing unrealistic identifying constraints with priors that allow modest departures from those constraints. In terms of point-estimator bias this superiority arises when the extent of residual confounding is substantial, but the advantage is much broader in terms of interval estimation. The benefit from modeling residual confounding is maintained when the prior distributions employed only roughly correspond to reality, for the standard identifying constraints are equivalent to priors that typically correspond much worse.

  2. Modified Regression Correlation Coefficient for Poisson Regression Model

    Science.gov (United States)

    Kaengthong, Nattacha; Domthong, Uthumporn

    2017-09-01

    This study examines measures of the predictive power of the Generalized Linear Model (GLM), which are widely used but often subject to restrictions. We are interested in the regression correlation coefficient for a Poisson regression model. This is a measure of predictive power, defined by the relationship between the dependent variable (Y) and the expected value of the dependent variable given the independent variables [E(Y|X)] for the Poisson regression model. The dependent variable is distributed as Poisson. The purpose of this research was to modify the regression correlation coefficient for the Poisson regression model. We also compare the proposed modified regression correlation coefficient with the traditional regression correlation coefficient in the case of two or more independent variables with multicollinearity among them. The results show that the proposed regression correlation coefficient is better than the traditional regression correlation coefficient in terms of bias and root mean square error (RMSE).

  3. Standards for Standardized Logistic Regression Coefficients

    Science.gov (United States)

    Menard, Scott

    2011-01-01

    Standardized coefficients in logistic regression analysis have the same utility as standardized coefficients in linear regression analysis. Although there has been no consensus on the best way to construct standardized logistic regression coefficients, there is now sufficient evidence to suggest a single best approach to the construction of a…

  4. Algebraic polynomials with random coefficients

    Directory of Open Access Journals (Sweden)

    K. Farahmand

    2002-01-01

    This paper provides an asymptotic value for the mathematical expected number of points of inflection of a random polynomial of the form $a_0(\omega) + a_1(\omega)\binom{n}{1}^{1/2}x + a_2(\omega)\binom{n}{2}^{1/2}x^2 + \dots + a_n(\omega)\binom{n}{n}^{1/2}x^n$ when $n$ is large. The coefficients $\{a_j(\omega)\}_{j=0}^{n}$, $\omega \in \Omega$, are assumed to be a sequence of independent normally distributed random variables with means zero and variance one, each defined on a fixed probability space $(A, \Omega, \Pr)$. A special case of dependent coefficients is also studied.

  5. SDE based regression for random PDEs

    KAUST Repository

    Bayer, Christian

    2016-01-01

    A simulation based method for the numerical solution of PDE with random coefficients is presented. By the Feynman-Kac formula, the solution can be represented as conditional expectation of a functional of a corresponding stochastic differential equation driven by independent noise. A time discretization of the SDE for a set of points in the domain and a subsequent Monte Carlo regression lead to an approximation of the global solution of the random PDE. We provide an initial error and complexity analysis of the proposed method along with numerical examples illustrating its behaviour.

  6. SDE based regression for random PDEs

    KAUST Repository

    Bayer, Christian

    2016-01-06

    A simulation based method for the numerical solution of PDE with random coefficients is presented. By the Feynman-Kac formula, the solution can be represented as conditional expectation of a functional of a corresponding stochastic differential equation driven by independent noise. A time discretization of the SDE for a set of points in the domain and a subsequent Monte Carlo regression lead to an approximation of the global solution of the random PDE. We provide an initial error and complexity analysis of the proposed method along with numerical examples illustrating its behaviour.

  7. On the Occurrence of Standardized Regression Coefficients Greater than One.

    Science.gov (United States)

    Deegan, John, Jr.

    1978-01-01

    It is demonstrated here that standardized regression coefficients greater than one can legitimately occur. Furthermore, the relationship between the occurrence of such coefficients and the extent of multicollinearity present among the set of predictor variables in an equation is examined. Comments on the interpretation of these coefficients are…
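
    The claim in this record can be checked numerically. The sketch below is not taken from the paper: the correlation values are hypothetical and chosen only to form a valid correlation structure with strongly collinear predictors. It uses the standard identity that standardized coefficients solve R_xx b = r_xy.

```python
import numpy as np

# Hypothetical correlations (assumed for illustration, not from the paper).
R_xx = np.array([[1.0, 0.9],
                 [0.9, 1.0]])   # correlation between the two predictors
r_xy = np.array([0.8, 0.5])     # correlations of each predictor with the outcome

# Standardized regression coefficients solve R_xx @ beta = r_xy.
beta_std = np.linalg.solve(R_xx, r_xy)

# Squared multiple correlation implied by the same quantities.
r_squared = float(r_xy @ beta_std)

print(beta_std, r_squared)
```

    With these assumed values the first standardized coefficient exceeds one in absolute value while the implied R-squared stays below one, so the fit itself remains admissible.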

  8. Sabine absorption coefficients to random incidence absorption coefficients

    DEFF Research Database (Denmark)

    Jeong, Cheol-Ho

    2014-01-01

    into random incidence absorption coefficients for porous absorbers are investigated. Two optimization-based conversion methods are suggested: the surface impedance estimation for locally reacting absorbers and the flow resistivity estimation for extendedly reacting absorbers. The suggested conversion methods...

  9. Regression Models for Predicting Force Coefficients of Aerofoils

    Directory of Open Access Journals (Sweden)

    Mohammed ABDUL AKBAR

    2015-09-01

    Renewable sources of energy are attractive and advantageous in many ways. Among the renewable energy sources, wind energy is the fastest growing type. Among wind energy converters, vertical axis wind turbines (VAWTs) have received renewed interest in the past decade due to some of the advantages they possess over their horizontal axis counterparts. VAWTs have evolved into complex 3-D shapes. A key component in predicting the output of VAWTs through analytical studies is obtaining the values of the lift and drag coefficients, which are functions of the shape of the aerofoil, the angle of attack of the wind and the Reynolds number of the flow. Sandia National Laboratories have carried out extensive experiments on aerofoils for Reynolds numbers in the range experienced by VAWTs. The volume of experimental data thus obtained is huge. The current paper discusses three regression analysis models with which lift and drag coefficients can be found using simple formulas without having to deal with the bulk of the data. Drag and lift coefficients were successfully estimated by regression models with R² values as high as 0.98.

  10. Converting Sabine absorption coefficients to random incidence absorption coefficients

    DEFF Research Database (Denmark)

    Jeong, Cheol-Ho

    2013-01-01

    are suggested: An optimization method for the surface impedances for locally reacting absorbers, the flow resistivity for extendedly reacting absorbers, and the flow resistance for fabrics. With four porous type absorbers, the conversion methods are validated. For absorbers backed by a rigid wall, the surface...... coefficients to random incidence absorption coefficients are proposed. The overestimations of the Sabine absorption coefficient are investigated theoretically based on Miki's model for porous absorbers backed by a rigid wall or an air cavity, resulting in conversion factors. Additionally, three optimizations...... impedance optimization produces the best results, while the flow resistivity optimization also yields reasonable results. The flow resistivity and flow resistance optimization for extendedly reacting absorbers are also found to be successful. However, the theoretical conversion factors based on Miki's model...

  11. Application of random regression models to the genetic evaluation ...

    African Journals Online (AJOL)

    The model included fixed regression on AM (range from 30 to 138 mo) and the effect of herd-measurement date concatenation. Random parts of the model were RRM coefficients for additive and permanent environmental effects, while residual effects were modelled to account for heterogeneity of variance by AY. Estimates ...

  12. Interpreting parameters in the logistic regression model with random effects

    DEFF Research Database (Denmark)

    Larsen, Klaus; Petersen, Jørgen Holm; Budtz-Jørgensen, Esben

    2000-01-01

    interpretation, interval odds ratio, logistic regression, median odds ratio, normally distributed random effects...

  13. Overcoming multicollinearity in multiple regression using correlation coefficient

    Science.gov (United States)

    Zainodin, H. J.; Yap, S. J.

    2013-09-01

    Multicollinearity occurs when there are high correlations among independent variables. In this case, it is difficult to distinguish the contributions of these independent variables to the dependent variable, as they may compete to explain much of the same variance. Moreover, multicollinearity violates an assumption of multiple regression: that there is no collinearity among the independent variables. Thus, an alternative approach is introduced for overcoming the multicollinearity problem and eventually achieving a well-represented model. This approach removes the multicollinearity source variables on the basis of the correlation coefficient values in the full correlation matrix. Using the full correlation matrix facilitates the use of Excel functions in removing the multicollinearity source variables. This procedure is found to be easier and time-saving, especially when dealing with a greater number of independent variables in a model and a large number of possible models. Hence, this paper presents, compares and implements the procedure in detail.
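
    The idea of removing variables on the basis of the full correlation matrix can be sketched as follows. This is an illustrative greedy rule, not the authors' Excel procedure: the function name `drop_collinear`, the 0.9 threshold, and the tie-breaking rule (drop the member of the pair that is less correlated with the dependent variable) are all assumptions.

```python
import numpy as np

def drop_collinear(X, y, names, threshold=0.9):
    """Greedy sketch: while any pair of predictors has |r| > threshold,
    drop the member of the most correlated pair that is less correlated with y."""
    keep = list(range(X.shape[1]))
    while len(keep) > 1:
        R = np.corrcoef(X[:, keep], rowvar=False)   # full correlation matrix
        np.fill_diagonal(R, 0.0)                    # ignore self-correlations
        i, j = np.unravel_index(np.argmax(np.abs(R)), R.shape)
        if abs(R[i, j]) <= threshold:
            break                                   # no remaining collinear pair
        r_iy = abs(np.corrcoef(X[:, keep[i]], y)[0, 1])
        r_jy = abs(np.corrcoef(X[:, keep[j]], y)[0, 1])
        keep.pop(i if r_iy < r_jy else j)
    return [names[k] for k in keep]

# Simulated data (assumed): x2 is nearly a copy of x1, x3 is unrelated.
rng = np.random.default_rng(0)
n = 500
x1 = rng.standard_normal(n)
x2 = x1 + 0.1 * rng.standard_normal(n)
x3 = rng.standard_normal(n)
y = 2.0 * x1 + x3 + rng.standard_normal(n)

kept = drop_collinear(np.column_stack([x1, x2, x3]), y, ["x1", "x2", "x3"])
print(kept)
```

    On this simulated data one of the two near-duplicate predictors is removed and the unrelated predictor is retained.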

  14. Interpreting Bivariate Regression Coefficients: Going beyond the Average

    Science.gov (United States)

    Halcoussis, Dennis; Phillips, G. Michael

    2010-01-01

    Statistics, econometrics, investment analysis, and data analysis classes often review the calculation of several types of averages, including the arithmetic mean, geometric mean, harmonic mean, and various weighted averages. This note shows how each of these can be computed using a basic regression framework. By recognizing when a regression model…
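
    One way to see such equivalences is to regress a transformed variable on a constant only. The sketch below is an assumption about how the correspondence can be set up, not necessarily the authors' formulation; the helper `intercept_only` and the sample values are hypothetical.

```python
import numpy as np

y = np.array([2.0, 4.0, 8.0])          # hypothetical sample
ones = np.ones((len(y), 1))            # design matrix holding only a constant

def intercept_only(target, weights=None):
    """Least-squares fit of target on a constant; optional weights give weighted LS."""
    w = np.ones_like(target) if weights is None else np.asarray(weights, float)
    sw = np.sqrt(w)                    # scale rows by sqrt(weight) for weighted LS
    coef = np.linalg.lstsq(ones * sw[:, None], target * sw, rcond=None)[0]
    return coef[0]

arithmetic = intercept_only(y)                    # mean of y
geometric = np.exp(intercept_only(np.log(y)))     # exp of the mean of log(y)
harmonic = 1.0 / intercept_only(1.0 / y)          # reciprocal of the mean of 1/y
weighted = intercept_only(y, weights=[1, 1, 2])   # sum(w*y) / sum(w)

print(arithmetic, geometric, harmonic, weighted)
```

    Each average is the fitted constant of an intercept-only regression, applied to y itself, to log(y), to 1/y, or with observation weights.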

  15. Bias in regression coefficient estimates upon different treatments of ...

    African Journals Online (AJOL)

    MS and PW consistently overestimated the population parameter. EM and RI, on the other hand, tended to consistently underestimate the population parameter under non-monotonic pattern. Keywords: Missing data, bias, regression, percent missing, non-normality, missing pattern > East African Journal of Statistics Vol.

  16. Modeling maximum daily temperature using a varying coefficient regression model

    Science.gov (United States)

    Han Li; Xinwei Deng; Dong-Yum Kim; Eric P. Smith

    2014-01-01

    Relationships between stream water and air temperatures are often modeled using linear or nonlinear regression methods. Despite a strong relationship between water and air temperatures and a variety of models that are effective for data summarized on a weekly basis, such models did not yield consistently good predictions for summaries such as daily maximum temperature...

  17. Using the Ridge Regression Procedures to Estimate the Multiple Linear Regression Coefficients

    Science.gov (United States)

    Gorgees, HazimMansoor; Mahdi, FatimahAssim

    2018-05-01

    This article compares the performance of different types of ordinary ridge regression estimators that have already been proposed to estimate the regression parameters when near-exact linear relationships among the explanatory variables are present. For this situation we employ data obtained from the tagi gas filling company during the period 2008-2010. The main result is that the method based on the condition number performs better than the other methods, since it has a smaller mean square error (MSE).
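
    For orientation, the ordinary ridge estimator and the condition number mentioned in this record can be computed as below. This is a generic sketch on simulated data: the ridge constant k = 1.0 is an arbitrary illustrative value, and the record does not give the condition-number-based rule for choosing k, so none is implemented here.

```python
import numpy as np

def ridge(X, y, k):
    """Ordinary ridge estimator (X'X + k I)^{-1} X'y for a fixed k >= 0."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

# Simulated data (assumed) with a near-exact linear relationship between predictors.
rng = np.random.default_rng(0)
n = 100
x1 = rng.standard_normal(n)
x2 = x1 + 0.01 * rng.standard_normal(n)
X = np.column_stack([x1, x2])
X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize predictors
y = x1 + x2 + rng.standard_normal(n)
y = y - y.mean()                           # center the response

cond_number = np.linalg.cond(X.T @ X)      # a large value signals multicollinearity
beta_ols = ridge(X, y, k=0.0)              # k = 0 reduces to least squares
beta_ridge = ridge(X, y, k=1.0)            # arbitrary illustrative ridge constant

print(cond_number, beta_ols, beta_ridge)
```

    A positive k shrinks the coefficient vector, which trades some bias for a reduction in variance when X'X is ill-conditioned.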

  18. MANCOVA for one way classification with homogeneity of regression coefficient vectors

    Science.gov (United States)

    Mokesh Rayalu, G.; Ravisankar, J.; Mythili, G. Y.

    2017-11-01

    MANOVA and MANCOVA are the extensions of the univariate ANOVA and ANCOVA techniques to multidimensional or vector-valued observations. In the statistical models of these techniques, the assumption of a Gaussian distribution is replaced with the multivariate Gaussian distribution for the data vectors and residual term variables. The objective of MANCOVA is to determine if there are statistically reliable mean differences that can be demonstrated between groups later modifying the newly created variable. When random assignment of samples or subjects to groups is not possible, multivariate analysis of covariance (MANCOVA) provides statistical matching of groups by adjusting dependent variables as if all subjects scored the same on the covariates. In this research article, the MANCOVA technique is extended to a larger number of covariates, and homogeneity of regression coefficient vectors is also tested.

  19. Estimating nonlinear selection gradients using quadratic regression coefficients: double or nothing?

    Science.gov (United States)

    Stinchcombe, John R; Agrawal, Aneil F; Hohenlohe, Paul A; Arnold, Stevan J; Blows, Mark W

    2008-09-01

    The use of regression analysis has been instrumental in allowing evolutionary biologists to estimate the strength and mode of natural selection. Although directional and correlational selection gradients are equal to their corresponding regression coefficients, quadratic regression coefficients must be doubled to estimate stabilizing/disruptive selection gradients. Based on a sample of 33 papers published in Evolution between 2002 and 2007, at least 78% of papers have not doubled quadratic regression coefficients, leading to an appreciable underestimate of the strength of stabilizing and disruptive selection. Proper treatment of quadratic regression coefficients is necessary for estimation of fitness surfaces and contour plots, canonical analysis of the gamma matrix, and modeling the evolution of populations on an adaptive landscape.
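
    The doubling rule follows from writing the fitness surface as w = a + beta*z + (gamma/2)*z^2, so the fitted quadratic coefficient estimates gamma/2. A minimal sketch on simulated data follows; the gradient values, sample size, and noise level are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated standardized trait z and relative fitness w (illustrative data only).
beta_true = 0.2      # assumed directional selection gradient
gamma_true = -0.4    # assumed quadratic (stabilizing) selection gradient
z = rng.standard_normal(5000)
w = 1.0 + beta_true * z + 0.5 * gamma_true * z**2 + 0.1 * rng.standard_normal(5000)

# Fit w = a + b*z + q*z^2 by ordinary least squares.
X = np.column_stack([np.ones_like(z), z, z**2])
a, b, q = np.linalg.lstsq(X, w, rcond=None)[0]

beta_hat = b           # directional gradient equals the linear coefficient
gamma_hat = 2.0 * q    # quadratic gradient is twice the quadratic coefficient

print(beta_hat, q, gamma_hat)
```

    Reporting q directly as the quadratic gradient would understate the strength of stabilizing or disruptive selection by a factor of two.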

  20. A Note on the Correlated Random Coefficient Model

    DEFF Research Database (Denmark)

    Kolodziejczyk, Christophe

    In this note we derive the bias of the OLS estimator for a correlated random coefficient model with one random coefficient, but which is correlated with a binary variable. We provide set-identification to the parameters of interest of the model. We also show how to reduce the bias of the estimator...

  1. Comparing Regression Coefficients between Nested Linear Models for Clustered Data with Generalized Estimating Equations

    Science.gov (United States)

    Yan, Jun; Aseltine, Robert H., Jr.; Harel, Ofer

    2013-01-01

    Comparing regression coefficients between models when one model is nested within another is of great practical interest when two explanations of a given phenomenon are specified as linear models. The statistical problem is whether the coefficients associated with a given set of covariates change significantly when other covariates are added into…

  2. Sintering equation: determination of its coefficients by experiments - using multiple regression

    International Nuclear Information System (INIS)

    Windelberg, D.

    1999-01-01

    Sintering is a method for volume-compression (or volume-contraction) of powdered or grained material applying high temperature (less than the melting point of the material). Maekipirtti tried to find an equation which describes the process of sintering by its main parameters: sintering time, sintering temperature and volume contraction. Such an equation is called a sintering equation. It also contains some coefficients which characterise the behaviour of the material during the process of sintering. These coefficients have to be determined by experiments. Here we show that some linear regressions will produce wrong coefficients, but multiple regression results in a useful sintering equation. (orig.)

  3. Reproducibility of The Random Incidence Absorption Coefficient Converted From the Sabine Absorption Coefficient

    DEFF Research Database (Denmark)

    Jeong, Cheol-Ho; Chang, Ji-ho

    2015-01-01

    largely depending on the test room. Several conversion methods for porous absorbers from the Sabine absorption coefficient to the random incidence absorption coefficient were suggested by considering the finite size of a test specimen and non-uniformly incident energy onto the specimen, which turned out...... resistivity optimization outperforms the surface impedance optimization in terms of the reproducibility....

  4. Meta-analytical synthesis of regression coefficients under different categorization scheme of continuous covariates.

    Science.gov (United States)

    Yoneoka, Daisuke; Henmi, Masayuki

    2017-11-30

    Recently, the number of clinical prediction models sharing the same regression task has increased in the medical literature. However, evidence synthesis methodologies that use the results of these regression models have not been sufficiently studied, particularly in meta-analysis settings where only regression coefficients are available. One of the difficulties lies in the differences between the categorization schemes of continuous covariates across different studies. In general, categorization methods using cutoff values are study specific across available models, even if they focus on the same covariates of interest. Differences in the categorization of covariates could lead to serious bias in the estimated regression coefficients and thus in subsequent syntheses. To tackle this issue, we developed synthesis methods for linear regression models with different categorization schemes of covariates. A 2-step approach to aggregate the regression coefficient estimates is proposed. The first step is to estimate the joint distribution of covariates by introducing a latent sampling distribution, which uses one set of individual participant data to estimate the marginal distribution of covariates with categorization. The second step is to use a nonlinear mixed-effects model with correction terms for the bias due to categorization to estimate the overall regression coefficients. Especially in terms of precision, numerical simulations show that our approach outperforms conventional methods, which only use studies with common covariates or ignore the differences between categorization schemes. The method developed in this study is also applied to a series of WHO epidemiologic studies on white blood cell counts. Copyright © 2017 John Wiley & Sons, Ltd.

  5. Simulating WTP Values from Random-Coefficient Models

    OpenAIRE

    Maurus Rischatsch

    2009-01-01

    Discrete Choice Experiments (DCEs) designed to estimate willingness-to-pay (WTP) values are very popular in health economics. With increased computation power and advanced simulation techniques, random-coefficient models have gained an increasing importance in applied work as they allow for taste heterogeneity. This paper discusses the parametrical derivation of WTP values from estimated random-coefficient models and shows how these values can be simulated in cases where they do not have a kn...

  6. A Structural Modeling Approach to a Multilevel Random Coefficients Model.

    Science.gov (United States)

    Rovine, Michael J.; Molenaar, Peter C. M.

    2000-01-01

    Presents a method for estimating the random coefficients model using covariance structure modeling and allowing one to estimate both fixed and random effects. The method is applied to real and simulated data, including marriage data from J. Belsky and M. Rovine (1990). (SLD)

  7. Random regression models for detection of gene by environment interaction

    Directory of Open Access Journals (Sweden)

    Meuwissen Theo HE

    2007-02-01

    Two random regression models, where the effect of a putative QTL was regressed on an environmental gradient, are described. The first model estimates the correlation between intercept and slope of the random regression, while the other model restricts this correlation to 1 or -1, which is expected under a bi-allelic QTL model. The random regression models were compared to a model assuming no gene by environment interactions. The comparison was done with regard to the models' ability to detect QTL, to position them accurately and to detect possible QTL by environment interactions. A simulation study based on a granddaughter design was conducted, and QTL were assumed, either by assigning an effect independent of the environment or as a linear function of a simulated environmental gradient. It was concluded that the random regression models were suitable for detection of QTL effects, in the presence and absence of interactions with environmental gradients. Fixing the correlation between intercept and slope of the random regression had a positive effect on power when the QTL effects re-ranked between environments.

  8. Approximating prediction uncertainty for random forest regression models

    Science.gov (United States)

    John W. Coulston; Christine E. Blinn; Valerie A. Thomas; Randolph H. Wynne

    2016-01-01

    Machine learning approaches such as random forest have increased for the spatial modeling and mapping of continuous variables. Random forest is a non-parametric ensemble approach, and unlike traditional regression approaches there is no direct quantification of prediction error. Understanding prediction uncertainty is important when using model-based continuous maps as...

  9. Random effects coefficient of determination for mixed and meta-analysis models.

    Science.gov (United States)

    Demidenko, Eugene; Sargent, James; Onega, Tracy

    2012-01-01

    The key feature of a mixed model is the presence of random effects. We have developed a coefficient, called the random effects coefficient of determination, [Formula: see text], that estimates the proportion of the conditional variance of the dependent variable explained by random effects. This coefficient takes values from 0 to 1 and indicates how strong the random effects are. The difference from the earlier suggested fixed effects coefficient of determination is emphasized. If [Formula: see text] is close to 0, there is weak support for random effects in the model because the reduction of the variance of the dependent variable due to random effects is small; consequently, random effects may be ignored and the model simplifies to standard linear regression. A value of [Formula: see text] away from 0 indicates evidence of variance reduction in support of the mixed model. If the random effects coefficient of determination is close to 1, the variance of random effects is very large and random effects turn into free fixed effects; the model can then be estimated using the dummy variable approach. We derive explicit formulas for [Formula: see text] in three special cases: the random intercept model, the growth curve model, and meta-analysis model. Theoretical results are illustrated with three mixed model examples: (1) travel time to the nearest cancer center for women with breast cancer in the U.S., (2) cumulative time watching alcohol related scenes in movies among young U.S. teens, as a risk factor for early drinking onset, and (3) the classic example of the meta-analysis model for combination of 13 studies on tuberculosis vaccine.

  10. SPSS and SAS programs for comparing Pearson correlations and OLS regression coefficients.

    Science.gov (United States)

    Weaver, Bruce; Wuensch, Karl L

    2013-09-01

    Several procedures that use summary data to test hypotheses about Pearson correlations and ordinary least squares regression coefficients have been described in various books and articles. To our knowledge, however, no single resource describes all of the most common tests. Furthermore, many of these tests have not yet been implemented in popular statistical software packages such as SPSS and SAS. In this article, we describe all of the most common tests and provide SPSS and SAS programs to perform them. When they are applicable, our code also computes 100 × (1 - α)% confidence intervals corresponding to the tests. For testing hypotheses about independent regression coefficients, we demonstrate one method that uses summary data and another that uses raw data (i.e., Potthoff analysis). When the raw data are available, the latter method is preferred, because use of summary data entails some loss of precision due to rounding.

  11. Conditional Monte Carlo randomization tests for regression models.

    Science.gov (United States)

    Parhat, Parwen; Rosenberger, William F; Diao, Guoqing

    2014-08-15

    We discuss the computation of randomization tests for clinical trials of two treatments when the primary outcome is based on a regression model. We begin by revisiting the seminal paper of Gail, Tan, and Piantadosi (1988), and then describe a method based on Monte Carlo generation of randomization sequences. The tests based on this Monte Carlo procedure are design based, in that they incorporate the particular randomization procedure used. We discuss permuted block designs, complete randomization, and biased coin designs. We also use a new technique by Plamadeala and Rosenberger (2012) for simple computation of conditional randomization tests. Like Gail, Tan, and Piantadosi, we focus on residuals from generalized linear models and martingale residuals from survival models. Such techniques do not apply to longitudinal data analysis, and we introduce a method for computation of randomization tests based on the predicted rate of change from a generalized linear mixed model when outcomes are longitudinal. We show, by simulation, that these randomization tests preserve the size and power well under model misspecification. Copyright © 2014 John Wiley & Sons, Ltd.
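
    A Monte Carlo randomization test of this general kind can be sketched as follows. The sketch assumes complete randomization (independent fair coin flips), a linear working model, and a difference in mean residuals as the test statistic; it does not implement the conditional tests or the permuted block and biased coin designs discussed in the record, and the simulated data and the helper `stat` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.standard_normal(n)                  # baseline covariate
treat = rng.integers(0, 2, size=n)          # complete randomization: fair coin per subject
y = 1.0 + 0.5 * x + 0.8 * treat + rng.standard_normal(n)   # assumed treatment effect 0.8

# Residuals from the model fitted without the treatment term (the null model).
X0 = np.column_stack([np.ones(n), x])
resid = y - X0 @ np.linalg.lstsq(X0, y, rcond=None)[0]

def stat(assign):
    """Difference in mean residuals between the two arms."""
    return resid[assign == 1].mean() - resid[assign == 0].mean()

observed = stat(treat)

# Regenerate assignment sequences with the same randomization procedure.
draws = np.array([stat(rng.integers(0, 2, size=n)) for _ in range(2000)])
p_value = (1 + np.sum(np.abs(draws) >= abs(observed))) / (1 + len(draws))

print(observed, p_value)
```

    The reference distribution comes from re-running the randomization procedure itself, which is what makes the test design based; a different design would only change how the assignment sequences are regenerated.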

  12. Synthesis of linear regression coefficients by recovering the within-study covariance matrix from summary statistics.

    Science.gov (United States)

    Yoneoka, Daisuke; Henmi, Masayuki

    2017-06-01

    Recently, the number of regression models has dramatically increased in several academic fields. However, within the context of meta-analysis, synthesis methods for such models have not been developed in a commensurate trend. One of the difficulties hindering the development is the disparity in sets of covariates among literature models. If the sets of covariates differ across models, interpretation of coefficients will differ, thereby making it difficult to synthesize them. Moreover, previous synthesis methods for regression models, such as multivariate meta-analysis, often have problems because the covariance matrix of coefficients (i.e. within-study correlations) or individual patient data are not necessarily available. This study, therefore, gives a brief explanation of a method to synthesize linear regression models under different covariate sets by using a generalized least squares method involving bias correction terms. In particular, we also propose an approach to recover (at most) three correlations of covariates, which is required for the calculation of the bias term without individual patient data. Copyright © 2016 John Wiley & Sons, Ltd.

  13. Simultaneous confidence bands for Cox regression from semiparametric random censorship.

    Science.gov (United States)

    Mondal, Shoubhik; Subramanian, Sundarraman

    2016-01-01

    Cox regression is combined with semiparametric random censorship models to construct simultaneous confidence bands (SCBs) for subject-specific survival curves. Simulation results are presented to compare the performance of the proposed SCBs with the SCBs that are based only on standard Cox. The new SCBs provide correct empirical coverage and are more informative. The proposed SCBs are illustrated with two real examples. An extension to handle missing censoring indicators is also outlined.

  14. A comparison of random forest regression and multiple linear regression for prediction in neuroscience.

    Science.gov (United States)

    Smith, Paul F; Ganesh, Siva; Liu, Ping

    2013-10-30

    Regression is a common statistical tool for prediction in neuroscience. However, linear regression is by far the most common form of regression used, with regression trees receiving comparatively little attention. In this study, the results of conventional multiple linear regression (MLR) were compared with those of random forest regression (RFR), in the prediction of the concentrations of 9 neurochemicals in the vestibular nucleus complex and cerebellum that are part of the l-arginine biochemical pathway (agmatine, putrescine, spermidine, spermine, l-arginine, l-ornithine, l-citrulline, glutamate and γ-aminobutyric acid (GABA)). The R² values for the MLRs were higher than the proportion of variance explained values for the RFRs: 6/9 of them were ≥ 0.70 compared to 4/9 for RFRs. Even the variables that had the lowest R² values for the MLRs, e.g. ornithine (0.50) and glutamate (0.61), had much lower proportion of variance explained values for the RFRs (0.27 and 0.49, respectively). The RSE values for the MLRs were lower than those for the RFRs in all but two cases. In general, MLRs seemed to be superior to the RFRs in terms of predictive value and error. In the case of this data set, MLR appeared to be superior to RFR in terms of its explanatory value and error. This result suggests that MLR may have advantages over RFR for prediction in neuroscience with this kind of data set, but that RFR can still have good predictive value in some cases. Copyright © 2013 Elsevier B.V. All rights reserved.

  15. Comparing spatial regression to random forests for large ...

    Science.gov (United States)

    Environmental data may be “large” due to number of records, number of covariates, or both. Random forests has a reputation for good predictive performance when using many covariates, whereas spatial regression, when using reduced rank methods, has a reputation for good predictive performance when using many records. In this study, we compare these two techniques using a data set containing the macroinvertebrate multimetric index (MMI) at 1859 stream sites with over 200 landscape covariates. Our primary goal is predicting MMI at over 1.1 million perennial stream reaches across the USA. For spatial regression modeling, we develop two new methods to accommodate large data: (1) a procedure that estimates optimal Box-Cox transformations to linearize covariate relationships; and (2) a computationally efficient covariate selection routine that takes into account spatial autocorrelation. We show that our new methods lead to cross-validated performance similar to random forests, but that there is an advantage for spatial regression when quantifying the uncertainty of the predictions. Simulations are used to clarify advantages for each method. This research investigates different approaches for modeling and mapping national stream condition. We use MMI data from the EPA's National Rivers and Streams Assessment and predictors from StreamCat (Hill et al., 2015). Previous studies have focused on modeling the MMI condition classes (i.e., good, fair, and po

  16. The Initial Regression Statistical Characteristics of Intervals Between Zeros of Random Processes

    Directory of Open Access Journals (Sweden)

    V. K. Hohlov

    2014-01-01

    The article substantiates the initial regression statistical characteristics of the intervals between zeros of realizations of random processes and studies the properties that allow these characteristics to be used in autonomous information systems (AIS) of near location (NL). The coefficients of the initial regression (CIR) that minimize the residual sum of squares of multiple initial regression are justified on the basis of vector representations associated with the notion of a random vector of the analyzed signal parameters. It is shown that even with no covariance-based partial CIR it is possible to predict one random variable through another with respect to the deterministic components. The paper studies how the CIR of the interval lengths between zeros of a narrowband wide-sense stationary random process depend on its energy spectrum. Partial CIR for random processes with Gaussian and rectangular energy spectra are obtained. It is shown that the considered CIRs do not depend on the average frequency of the spectra, are determined by the relative bandwidth of the energy spectra, and depend only weakly on the type of spectrum. These properties enable the use of the CIR as an informative parameter when implementing time-domain regression methods of signal processing that are invariant to the average rate and variance of the input realizations. We consider estimates of the average energy spectrum frequency of a stationary random process obtained by calculating the length of the time interval corresponding to a specified number of intervals between zeros. It is shown that, as the relative bandwidth increases, the relative variance of the estimate of the average energy spectrum frequency ceases to depend on the last process realization when more than ten intervals between zeros are processed. The obtained results can be used in the AIS NL to solve the tasks of detection and signal recognition, when a decision is made in conditions of unknown mathematical expectations on a limited observation

  17. Least squares estimation in a simple random coefficient autoregressive model

    DEFF Research Database (Denmark)

    Johansen, S; Lange, T

    2013-01-01

    The question we discuss is whether a simple random coefficient autoregressive model with infinite variance can create the long swings, or persistence, which are observed in many macroeconomic variables. The model is defined by $y_t = s_t \rho y_{t-1} + \varepsilon_t$, $t = 1, \ldots, n$, where $s_t$ is an i.i.d. binary variable with p...... we prove the curious result that [formula not rendered]. The proof applies the notion of a tail index of sums of positive random variables with infinite variance to find the order of magnitude of [formula not rendered] and [formula not rendered] and hence the limit of [formula not rendered]...
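
    The model equation above can be simulated in a few lines. The following is a minimal sketch, not the authors' procedure: the success probability `p`, the standard normal noise, the starting value `y_0 = 0` and all function names are assumptions made for illustration. It also uses a finite-variance parameter choice (`p * rho**2 < 1`) so that the least squares slope settles near `p * rho`, whereas the abstract concerns the infinite-variance case.

```python
import random

def simulate_rca(n, rho, p, seed=0):
    """Simulate y_t = s_t * rho * y_{t-1} + eps_t, starting from y_0 = 0."""
    rng = random.Random(seed)
    y = [0.0]
    for _ in range(n):
        s = 1 if rng.random() < p else 0   # assumed: P(s_t = 1) = p
        eps = rng.gauss(0.0, 1.0)          # assumed: standard normal innovations
        y.append(s * rho * y[-1] + eps)
    return y

def least_squares_slope(y):
    """Least squares coefficient from regressing y_t on y_{t-1} (no intercept)."""
    num = sum(y[t] * y[t - 1] for t in range(1, len(y)))
    den = sum(y[t - 1] ** 2 for t in range(1, len(y)))
    return num / den

y = simulate_rca(n=5000, rho=0.5, p=0.8, seed=1)
slope = least_squares_slope(y)
print(slope)
```

    Raising `rho` until `p * rho**2` reaches or exceeds 1 moves the simulation into the infinite-variance regime the abstract discusses, where the estimate is expected to behave differently.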

  18. Random errors in the magnetic field coefficients of superconducting magnets

    International Nuclear Information System (INIS)

    Herrera, J.; Hogue, R.; Prodell, A.; Wanderer, P.; Willen, E.

    1985-01-01

    Random errors in the multipole magnetic coefficients of superconducting magnet have been of continuing interest in accelerator research. The Superconducting Super Collider (SSC) with its small magnetic aperture only emphasizes this aspect of magnet design, construction, and measurement. With this in mind, we present a magnet model which mirrors the structure of a typical superconducting magnet. By taking advantage of the basic symmetries of a dipole magnet, we use this model to fit the measured multipole rms widths. The fit parameters allow us then to predict the values of the rms multipole errors expected for the SSC dipole reference design D, SSC-C5. With the aid of first-order perturbation theory, we then give an estimate of the effect of these random errors on the emittance growth of a proton beam stored in an SSC. 10 refs., 6 figs., 2 tabs

  19. Estimating overall exposure effects for the clustered and censored outcome using random effect Tobit regression models.

    Science.gov (United States)

    Wang, Wei; Griswold, Michael E

    2016-11-30

    The random effect Tobit model is a regression model that accommodates both left- and/or right-censoring and within-cluster dependence of the outcome variable. Regression coefficients of random effect Tobit models have conditional interpretations on a constructed latent dependent variable and do not provide inference of overall exposure effects on the original outcome scale. Marginalized random effects model (MREM) permits likelihood-based estimation of marginal mean parameters for the clustered data. For random effect Tobit models, we extend the MREM to marginalize over both the random effects and the normal space and boundary components of the censored response to estimate overall exposure effects at population level. We also extend the 'Average Predicted Value' method to estimate the model-predicted marginal means for each person under different exposure status in a designated reference group by integrating over the random effects and then use the calculated difference to assess the overall exposure effect. The maximum likelihood estimation is proposed utilizing a quasi-Newton optimization algorithm with Gauss-Hermite quadrature to approximate the integration of the random effects. We use these methods to carefully analyze two real datasets. Copyright © 2016 John Wiley & Sons, Ltd.

  20. A special covariance structure for random coefficient models with both between and within covariates

    International Nuclear Information System (INIS)

    Riedel, K.S.

    1990-07-01

    We review random coefficient (RC) models in linear regression and propose a bias correction to the maximum likelihood (ML) estimator. Asymptotic expansions of the ML equations are given when the between-individual variance is much larger or smaller than the variance from within-individual fluctuations. The standard model assumes all but one covariate varies within each individual (we denote the within covariates by the vector $\chi_1$). We consider random coefficient models where some of the covariates do not vary in any single individual (we denote the between covariates by the vector $\chi_0$). The regression coefficients, vector $\beta_k$, can only be estimated in the subspace $X_k$ of $X$. Thus the number of individuals necessary to estimate the vector $\beta$ and the covariance matrix $\Delta$ of $\beta$ increases significantly in the presence of more than one between covariate. When the number of individuals is sufficient to estimate $\beta$ but not the entire matrix $\Delta$, additional assumptions must be imposed on the structure of $\Delta$. A simple reduced model is that the between component of $\beta$ is fixed and only the within component varies randomly. This model fails because it is not invariant under linear coordinate transformations and it can significantly overestimate the variance of new observations. We propose a covariance structure for $\Delta$ without these difficulties by first projecting the within covariates onto the space perpendicular to the between covariates. (orig.)

  1. Genetic evaluation of European quails by random regression models

    Directory of Open Access Journals (Sweden)

    Flaviana Miranda Gonçalves

    2012-09-01

    The objective of this study was to compare different random regression models, defined from different classes of heterogeneity of variance combined with different Legendre polynomial orders, for the estimation of (co)variances of quails. The data came from 28,076 observations of 4,507 female meat quails of the LF1 lineage. Quail body weights were determined at birth and 1, 14, 21, 28, 35 and 42 days of age. Six different classes of residual variance were fitted to Legendre polynomial functions (orders ranging from 2 to 6) to determine which model had the best fit to describe the (co)variance structures as a function of time. According to the evaluated criteria (AIC, BIC and LRT), the model with six classes of residual variances and a sixth-order Legendre polynomial was the best fit. The estimated additive genetic variance increased from birth to 28 days of age, and dropped slightly from 35 to 42 days. The heritability estimates decreased along the growth curve and changed from 0.51 (1 day) to 0.16 (42 days). Animal genetic and permanent environmental correlation estimates between weights and age classes were always high and positive, except for birth weight. The sixth-order Legendre polynomial, along with the residual variance divided into six classes, was the best fit for the growth rate curve of meat quails; therefore, they should be considered for breeding evaluation processes by random regression models.
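
    Random regression models of this kind regress on Legendre polynomials of age, which first requires mapping age onto the interval [-1, 1]. The sketch below shows only that covariate construction, under stated assumptions: `order` is read as the highest polynomial degree, the polynomials are left unnormalized, and the function names are hypothetical. The ages are the weighing days quoted in the abstract, used purely for illustration.

```python
def legendre_basis(x, order):
    """Return [P_0(x), ..., P_order(x)] using Bonnet's recurrence."""
    values = [1.0, x]
    for k in range(1, order):
        # (k + 1) * P_{k+1}(x) = (2k + 1) * x * P_k(x) - k * P_{k-1}(x)
        values.append(((2 * k + 1) * x * values[k] - k * values[k - 1]) / (k + 1))
    return values[: order + 1]

def standardize_age(age, age_min, age_max):
    """Map an age linearly onto [-1, 1], where Legendre polynomials are orthogonal."""
    return -1.0 + 2.0 * (age - age_min) / (age_max - age_min)

ages = [1, 14, 21, 28, 35, 42]  # days of age, taken from the abstract for illustration
design = [legendre_basis(standardize_age(a, ages[0], ages[-1]), order=2) for a in ages]
for age, row in zip(ages, design):
    print(age, [round(v, 3) for v in row])
```

    Each row of `design` holds the covariates for one age; in a random regression model the animal-specific coefficients on these covariates are the random effects whose (co)variances are estimated.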

  2. Weighted SGD for ℓp Regression with Randomized Preconditioning*

    Science.gov (United States)

    Yang, Jiyan; Chow, Yin-Lam; Ré, Christopher; Mahoney, Michael W.

    2018-01-01

    In recent years, stochastic gradient descent (SGD) methods and randomized linear algebra (RLA) algorithms have been applied to many large-scale problems in machine learning and data analysis. SGD methods are easy to implement and applicable to a wide range of convex optimization problems. In contrast, RLA algorithms provide much stronger performance guarantees but are applicable to a narrower class of problems. We aim to bridge the gap between these two methods in solving constrained overdetermined linear regression problems, e.g., ℓ2 and ℓ1 regression problems. We propose a hybrid algorithm named pwSGD that uses RLA techniques for preconditioning and constructing an importance sampling distribution, and then performs an SGD-like iterative process with weighted sampling on the preconditioned system. By rewriting a deterministic ℓp regression problem as a stochastic optimization problem, we connect pwSGD to several existing ℓp solvers including RLA methods with algorithmic leveraging (RLA for short). We prove that pwSGD inherits faster convergence rates that only depend on the lower dimension of the linear system, while maintaining low computation complexity. Such SGD convergence rates are superior to those of other related SGD algorithms such as the weighted randomized Kaczmarz algorithm. Particularly, when solving ℓ1 regression with size n by d, pwSGD returns an approximate solution with ε relative error in the objective value in $\mathcal{O}(\log n \cdot \mathrm{nnz}(A) + \mathrm{poly}(d)/\varepsilon^2)$ time. This complexity is uniformly better than that of RLA methods in terms of both ε and d when the problem is unconstrained. In the presence of constraints, pwSGD only has to solve a sequence of much simpler and smaller optimization problems over the same constraints. In general this is more efficient than solving the constrained subproblem required in RLA. For ℓ2 regression, pwSGD returns an approximate solution with ε relative error in the objective value and the solution vector measured in

  3. Using the Coefficient of Determination R² to Test the Significance of Multiple Linear Regression

    Science.gov (United States)

    Quinino, Roberto C.; Reis, Edna A.; Bessegato, Lupercio F.

    2013-01-01

    This article proposes the use of the coefficient of determination as a statistic for hypothesis testing in multiple linear regression based on distributions acquired by beta sampling. (Contains 3 figures.)
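
    The link between R² and the overall F-test can be sketched with the standard textbook relations, which may differ in detail from the article's own procedure. With n observations, k predictors and normal errors, the null distribution of R² is Beta(k/2, (n - k - 1)/2), and F = (R²/k) / ((1 - R²)/(n - k - 1)). The function names, the Monte Carlo approach and the sample sizes below are assumptions for illustration.

```python
import random

def f_from_r2(r2, n, k):
    """Overall F statistic implied by R^2 with n observations and k predictors."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))

def beta_critical_value(n, k, alpha=0.05, draws=50_000, seed=0):
    """Approximate the upper-alpha critical value of R^2 by sampling its null Beta law."""
    rng = random.Random(seed)
    sample = sorted(rng.betavariate(k / 2.0, (n - k - 1) / 2.0) for _ in range(draws))
    return sample[int((1.0 - alpha) * draws)]

n, k = 23, 2
r2_crit = beta_critical_value(n, k)
print(r2_crit, f_from_r2(r2_crit, n, k))
```

    An observed R² above `r2_crit` would lead to rejecting the hypothesis that all slope coefficients are zero, which is the same decision the usual F-test gives.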

  4. Stable Parameter Estimation for Autoregressive Equations with Random Coefficients

    Directory of Open Access Journals (Sweden)

    V. B. Goryainov

    2014-01-01

    In recent years there has been a growing interest in non-linear time series models. They are more flexible than traditional linear models and allow a more adequate description of real data. Among these models, the autoregressive model with random coefficients plays an important role. It is widely used in various fields of science and technology, for example, in physics, biology, economics and finance. The model parameters are the mean values of the autoregressive coefficients. Their estimation is the main task of model identification. The basic method of estimation is still the least squares method, which gives good results for Gaussian time series, but it is quite sensitive to even small departures from the assumption of Gaussian observations. In this paper we propose estimates which generalize the least squares estimate in the sense that the quadratic objective function is replaced by an arbitrary convex and even function. A reasonable choice of objective function makes it possible to keep the benefits of the least squares estimate and eliminate its shortcomings. In particular, the estimates can be made almost as efficient as the least squares estimate in the Gaussian case while losing almost no accuracy under small deviations of the probability distribution of the observations from the Gaussian distribution. The main result is the proof of consistency and asymptotic normality of the proposed estimates in the particular case of the one-parameter model describing a stationary process with finite variance. Another important result is the derivation of the asymptotic relative efficiency of the proposed estimates with respect to the least squares estimate. This makes it possible to compare the two estimates, depending on the probability distributions of the innovation process and of the autoregressive coefficients. The results can be used to identify an autoregressive process, especially one of non-Gaussian nature, and/or autoregressive processes observed with gross
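
    Replacing the quadratic objective by a convex, even function can be illustrated for the one-parameter model. The sketch below is an assumption-laden illustration rather than the paper's estimator: it picks the Huber function as the convex even loss, brackets the mean coefficient in [-1, 1], and minimizes by ternary search, which is valid because the objective is convex in the single parameter. The simulated data, the contamination rate and all names are invented.

```python
import random

def huber(r, c=1.345):
    """Huber loss: quadratic near zero, linear in the tails (convex and even)."""
    a = abs(r)
    return 0.5 * r * r if a <= c else c * (a - 0.5 * c)

def m_estimate(y, loss, lo=-1.0, hi=1.0, iters=60):
    """Minimize sum_t loss(y_t - a * y_{t-1}) over a by ternary search."""
    def objective(a):
        return sum(loss(y[t] - a * y[t - 1]) for t in range(1, len(y)))
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) <= objective(m2):
            hi = m2   # a minimizer lies in [lo, m2]
        else:
            lo = m1   # a minimizer lies in [m1, hi]
    return 0.5 * (lo + hi)

def simulate(n, mean_coef, seed=0):
    """AR(1) with a random coefficient and occasionally heavy-tailed innovations."""
    rng = random.Random(seed)
    y = [0.0]
    for _ in range(n):
        coef = mean_coef + rng.gauss(0.0, 0.1)            # random coefficient
        scale = 10.0 if rng.random() < 0.05 else 1.0      # 5% gross innovations
        y.append(coef * y[-1] + rng.gauss(0.0, scale))
    return y

y = simulate(2000, mean_coef=0.5, seed=3)
a_ls = m_estimate(y, lambda r: r * r)   # quadratic loss reproduces least squares
a_huber = m_estimate(y, huber)
print(round(a_ls, 3), round(a_huber, 3))
```

    Passing a different convex even function as `loss` changes the estimator without touching the search routine, which mirrors the generalization described in the abstract.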

  5. Partial F-tests with multiply imputed data in the linear regression framework via coefficient of determination.

    Science.gov (United States)

    Chaurasia, Ashok; Harel, Ofer

    2015-02-10

    Tests for regression coefficients such as global, local, and partial F-tests are common in applied research. In the framework of multiple imputation, there are several papers addressing tests for regression coefficients. However, for simultaneous hypothesis testing, the existing methods are computationally intensive because they involve calculation with vectors and (inversion of) matrices. In this paper, we propose a simple method based on the scalar entity, coefficient of determination, to perform (global, local, and partial) F-tests with multiply imputed data. The proposed method is evaluated using simulated data and applied to suicide prevention data. Copyright © 2014 John Wiley & Sons, Ltd.

  6. Towards molecular design using 2D-molecular contour maps obtained from PLS regression coefficients

    Science.gov (United States)

    Borges, Cleber N.; Barigye, Stephen J.; Freitas, Matheus P.

    2017-12-01

    The multivariate image analysis descriptors used in quantitative structure-activity relationships are direct representations of chemical structures as they are simply numerical decodifications of pixels forming the 2D chemical images. These MDs have found great utility in the modeling of diverse properties of organic molecules. Given the multicollinearity and high dimensionality of the data matrices generated with the MIA-QSAR approach, modeling techniques that involve the projection of the data space onto orthogonal components e.g. Partial Least Squares (PLS) have been generally used. However, the chemical interpretation of the PLS-based MIA-QSAR models, in terms of the structural moieties affecting the modeled bioactivity has not been straightforward. This work describes the 2D-contour maps based on the PLS regression coefficients, as a means of assessing the relevance of single MIA predictors to the response variable, and thus allowing for the structural, electronic and physicochemical interpretation of the MIA-QSAR models. A sample study to demonstrate the utility of the 2D-contour maps to design novel drug-like molecules is performed using a dataset of some anti-HIV-1 2-amino-6-arylsulfonylbenzonitriles and derivatives, and the inferences obtained are consistent with other reports in the literature. In addition, the different schemes for encoding atomic properties in molecules are discussed and evaluated.

  7. Varying coefficient subdistribution regression for left-truncated semi-competing risks data.

    Science.gov (United States)

    Li, Ruosha; Peng, Limin

    2014-10-01

    Semi-competing risks data frequently arise in biomedical studies when time to a disease landmark event is subject to dependent censoring by death, the observation of which however is not precluded by the occurrence of the landmark event. In observational studies, the analysis of such data can be further complicated by left truncation. In this work, we study a varying coefficient subdistribution regression model for left-truncated semi-competing risks data. Our method appropriately accounts for the specific truncation and censoring features of the data, and moreover has the flexibility to accommodate potentially varying covariate effects. The proposed method can be easily implemented and the resulting estimators are shown to have nice asymptotic properties. We also present inference, such as Kolmogorov-Smirnov type and Cramér-von Mises type hypothesis testing procedures for the covariate effects. Simulation studies and an application to the Denmark diabetes registry demonstrate good finite-sample performance and practical utility of the proposed method.

  8. [Correlation coefficient-based classification method of hydrological dependence variability: With auto-regression model as example].

    Science.gov (United States)

    Zhao, Yu Xi; Xie, Ping; Sang, Yan Fang; Wu, Zi Yi

    2018-04-01

    Hydrological process evaluation is temporally dependent. Hydrological time series that include dependence components do not meet the data consistency assumption for hydrological computation. Both of these factors cause great difficulty for water research. Given the existence of hydrological dependence variability, we proposed a correlation coefficient-based method for evaluating the significance of hydrological dependence, based on the auto-regression model. By calculating the correlation coefficient between the original series and its dependence component and selecting reasonable thresholds for the correlation coefficient, this method divided the significance degree of dependence into no variability, weak variability, mid variability, strong variability, and drastic variability. By deducing the relationship between the correlation coefficient and the auto-correlation coefficient at each order of the series, we found that the correlation coefficient was mainly determined by the magnitude of the auto-correlation coefficients from order 1 to order p, which clarified the theoretical basis of this method. With the first-order and second-order auto-regression models as examples, the reasonability of the deduced formula was verified through Monte-Carlo experiments to classify the relationship between correlation coefficient and auto-correlation coefficient. This method was used to analyze three observed hydrological time series. The results indicated the coexistence of stochastic and dependence characteristics in hydrological processes.
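
    For the first-order case, the correlation between a series and its dependence component can be checked by simulation. This is a minimal sketch under assumptions: the dependence component of an AR(1) series is taken to be the fitted value `phi_hat * x[t-1]`, the coefficient is estimated by the lag-1 autocorrelation, and the series is synthetic. The abstract does not state the classification thresholds, so none are coded here.

```python
import math
import random

def pearson(u, v):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

def simulate_ar1(n, phi, seed=0):
    """Generate x_t = phi * x_{t-1} + e_t with standard normal e_t."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x

x = simulate_ar1(20000, phi=0.6, seed=7)
phi_hat = pearson(x[1:], x[:-1])              # lag-1 autocorrelation as the AR(1) coefficient
dependence = [phi_hat * v for v in x[:-1]]    # assumed form of the dependence component
r = pearson(x[1:], dependence)
print(round(phi_hat, 3), round(r, 3))
```

    In this first-order setting the correlation `r` equals the absolute value of the lag-1 autocorrelation, consistent with the abstract's statement that the correlation coefficient is determined by the auto-correlation coefficients.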

  9. Coefficient shifts in geographical ecology: an empirical evaluation of spatial and non-spatial regression

    DEFF Research Database (Denmark)

    Bini, L. M.; Diniz-Filho, J. A. F.; Rangel, T. F. L. V. B.

    2009-01-01

    A major focus of geographical ecology and macroecology is to understand the causes of spatially structured ecological patterns. However, achieving this understanding can be complicated when using multiple regression, because the relative importance of explanatory variables, as measured by regress...

  10. Robust linear registration of CT images using random regression forests

    Science.gov (United States)

    Konukoglu, Ender; Criminisi, Antonio; Pathak, Sayan; Robertson, Duncan; White, Steve; Haynor, David; Siddiqui, Khan

    2011-03-01

    Global linear registration is a necessary first step for many different tasks in medical image analysis. Comparing longitudinal studies [1], cross-modality fusion [2], and many other applications depend heavily on the success of the automatic registration. The robustness and efficiency of this step is crucial as it affects all subsequent operations. Most common techniques cast the linear registration problem as the minimization of a global energy function based on the image intensities. Although these algorithms have proved useful, their robustness in fully automated scenarios is still an open question. In fact, the optimization step often gets caught in local minima yielding unsatisfactory results. Recent algorithms constrain the space of registration parameters by exploiting implicit or explicit organ segmentations, thus increasing robustness [4,5]. In this work we propose a novel robust algorithm for automatic global linear image registration. Our method uses random regression forests to estimate posterior probability distributions for the locations of anatomical structures, represented as axis-aligned bounding boxes [6]. These posterior distributions are later integrated in a global linear registration algorithm. The biggest advantage of our algorithm is that it does not require pre-defined segmentations or regions. Yet it yields robust registration results. We compare the robustness of our algorithm with that of the state-of-the-art Elastix toolbox [7]. Validation is performed via 1464 pair-wise registrations in a database of very diverse 3D CT images. We show that our method decreases the "failure" rate of the global linear registration from 12.5% (Elastix) to only 1.9%.

  11. EXISTENCE AND UNIQUENESS OF SOLUTIONS TO STOCHASTIC DIFFERENTIAL EQUATION WITH RANDOM COEFFICIENTS

    Institute of Scientific and Technical Information of China (English)

    2010-01-01

    This paper mainly deals with a stochastic differential equation (SDE) with random coefficients. Sufficient conditions which guarantee the existence and uniqueness of solutions to the equation are given.

  12. Predicting longitudinal trajectories of health probabilities with random-effects multinomial logit regression.

    Science.gov (United States)

    Liu, Xian; Engel, Charles C

    2012-12-20

    Researchers often encounter longitudinal health data characterized with three or more ordinal or nominal categories. Random-effects multinomial logit models are generally applied to account for potential lack of independence inherent in such clustered data. When parameter estimates are used to describe longitudinal processes, however, random effects, both between and within individuals, need to be retransformed for correctly predicting outcome probabilities. This study attempts to go beyond existing work by developing a retransformation method that derives longitudinal growth trajectories of unbiased health probabilities. We estimated variances of the predicted probabilities by using the delta method. Additionally, we transformed the covariates' regression coefficients on the multinomial logit function, not substantively meaningful, to the conditional effects on the predicted probabilities. The empirical illustration uses the longitudinal data from the Asset and Health Dynamics among the Oldest Old. Our analysis compared three sets of the predicted probabilities of three health states at six time points, obtained from, respectively, the retransformation method, the best linear unbiased prediction, and the fixed-effects approach. The results demonstrate that neglect of retransforming random errors in the random-effects multinomial logit model results in severely biased longitudinal trajectories of health probabilities as well as overestimated effects of covariates on the probabilities. Copyright © 2012 John Wiley & Sons, Ltd.

  13. THE DETERMINATION OF BETA COEFFICIENTS OF PUBLICLY-HELD COMPANIES BY A REGRESSION MODEL AND AN APPLICATION ON PRIVATE FIRMS

    Directory of Open Access Journals (Sweden)

    METİN KAMİL ERCAN

    2013-06-01

    It is possible to determine the value of private companies by means of suggestions and assumptions derived from their financial statements. However, a serious problem arises in determining the equity costs of these private companies using the Capital Asset Pricing Model (CAPM), as their beta coefficients are unknown or unavailable. In this study, firstly, a regression model that represents the relationship between the beta coefficients and the financial statement variables of publicly-held companies will be developed. Then, this model will be tested and applied to private companies.
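
    The quantity being modeled can be made concrete with the textbook definitions: beta is the slope from regressing a firm's returns on market returns, and CAPM gives the cost of equity as the risk-free rate plus beta times the market risk premium. The sketch below shows only these two steps for a listed firm, not the study's regression of beta on financial statement variables. The return series and rates are invented, and the function names are hypothetical.

```python
def beta_coefficient(asset_returns, market_returns):
    """Slope of the least squares regression of asset returns on market returns."""
    n = len(market_returns)
    mean_a = sum(asset_returns) / n
    mean_m = sum(market_returns) / n
    cov = sum((a - mean_a) * (m - mean_m) for a, m in zip(asset_returns, market_returns))
    var = sum((m - mean_m) ** 2 for m in market_returns)
    return cov / var

def capm_cost_of_equity(risk_free, beta, market_return):
    """CAPM: cost of equity = risk-free rate + beta * market risk premium."""
    return risk_free + beta * (market_return - risk_free)

# Invented period returns, for illustration only.
market = [0.010, -0.020, 0.030, 0.000, 0.015]
asset = [0.018, -0.025, 0.041, 0.002, 0.020]

beta = beta_coefficient(asset, market)
print(round(beta, 3), round(capm_cost_of_equity(0.03, beta, 0.08), 4))
```

    For a private firm there is no return series to feed into `beta_coefficient`, which is why the study proposes predicting beta from financial statement variables instead.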

  14. Random Decrement and Regression Analysis of Traffic Responses of Bridges

    DEFF Research Database (Denmark)

    Asmussen, J. C.; Ibrahim, S. R.; Brincker, Rune

    1996-01-01

    The topic of this paper is the estimation of modal parameters from ambient data by applying the Random Decrement technique. The data from the Queensborough Bridge over the Fraser River in Vancouver, Canada have been applied. The loads producing the dynamic response are ambient, e.g. wind, traffic...

  15. Random Decrement and Regression Analysis of Traffic Responses of Bridges

    DEFF Research Database (Denmark)

    Asmussen, J. C.; Ibrahim, S. R.; Brincker, Rune

    The topic of this paper is the estimation of modal parameters from ambient data by applying the Random Decrement technique. The data from the Queensborough Bridge over the Fraser River in Vancouver, Canada have been applied. The loads producing the dynamic response are ambient, e.g. wind, traffic...

  16. Deriving Genomic Breeding Values for Residual Feed Intake from Covariance Functions of Random Regression Models

    DEFF Research Database (Denmark)

    Strathe, Anders B; Mark, Thomas; Nielsen, Bjarne

    2014-01-01

    Random regression models were used to estimate covariance functions between cumulated feed intake (CFI) and body weight (BW) in 8424 Danish Duroc pigs. Random regressions on second order Legendre polynomials of age were used to describe genetic and permanent environmental curves in BW and CFI...

  17. Estimating the Performance of Random Forest versus Multiple Regression for Predicting Prices of the Apartments

    Directory of Open Access Journals (Sweden)

    Marjan Čeh

    2018-05-01

    The goal of this study is to analyse the predictive performance of the random forest machine learning technique in comparison to commonly used hedonic models based on multiple regression for the prediction of apartment prices. A data set that includes 7407 records of apartment transactions referring to real estate sales from 2008–2013 in the city of Ljubljana, the capital of Slovenia, was used in order to test and compare the predictive performances of both models. Apparent challenges faced during modelling included (1) the non-linear nature of the prediction assignment task; (2) input data being based on transactions occurring over a period of great price changes in Ljubljana whereby a 28% decline was noted in six consecutive testing years; and (3) the complex urban form of the case study area. Available explanatory variables, organised as a Geographic Information Systems (GIS) ready dataset, including the structural and age characteristics of the apartments as well as environmental and neighbourhood information, were considered in the modelling procedure. All performance measures (R² values, sales ratios, mean average percentage error (MAPE), coefficient of dispersion (COD)) revealed significantly better results for predictions obtained by the random forest method, which confirms the potential of this machine learning technique for apartment price prediction.
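
    Two of the performance measures named above are simple to compute from sale prices and model predictions. The sketch below uses common definitions that may differ from the study's exact formulas: MAPE is taken as the mean absolute percentage error, the sales ratio as predicted price divided by sale price, and COD as the average absolute deviation of the ratios from their median, expressed as a percentage of the median. The prices are invented.

```python
from statistics import median

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def sales_ratios(actual, predicted):
    """Ratio of predicted price to observed sale price for each transaction."""
    return [p / a for a, p in zip(actual, predicted)]

def cod(actual, predicted):
    """Coefficient of dispersion of the sales ratios around their median, in percent."""
    ratios = sales_ratios(actual, predicted)
    med = median(ratios)
    return 100.0 * sum(abs(r - med) for r in ratios) / (len(ratios) * med)

# Invented sale prices and predictions, for illustration only.
sale_prices = [100.0, 200.0, 400.0]
predictions = [110.0, 180.0, 400.0]
print(round(mape(sale_prices, predictions), 3), round(cod(sale_prices, predictions), 3))
```

    Lower values of both measures indicate predictions that track sale prices more closely and more uniformly across apartments.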

  18. Estimation of the Coefficient of Restitution of Rocking Systems by the Random Decrement Technique

    DEFF Research Database (Denmark)

    Brincker, Rune; Demosthenous, Milton; Manos, George C.

    1994-01-01

    The aim of this paper is to investigate the possibility of estimating an average damping parameter for a rocking system due to impact, the so-called coefficient of restitution, from the random response, i.e. when the loads are random and unknown, and the response is measured. The objective...... is to obtain an estimate of the free rocking response from the measured random response using the Random Decrement (RDD) Technique, and then estimate the coefficient of restitution from this free response estimate. In the paper this approach is investigated by simulating the response of a single degree...

  19. Analysis and computation of the elastic wave equation with random coefficients

    KAUST Repository

    Motamed, Mohammad; Nobile, Fabio; Tempone, Raul

    2015-01-01

    We consider the stochastic initial-boundary value problem for the elastic wave equation with random coefficients and deterministic data. We propose a stochastic collocation method for computing statistical moments of the solution or statistics

  20. Comparison of regression coefficient and GIS-based methodologies for regional estimates of forest soil carbon stocks

    International Nuclear Information System (INIS)

    Elliott Campbell, J.; Moen, Jeremie C.; Ney, Richard A.; Schnoor, Jerald L.

    2008-01-01

    Estimates of forest soil organic carbon (SOC) have applications in carbon science, soil quality studies, carbon sequestration technologies, and carbon trading. Forest SOC has been modeled using a regression coefficient methodology that applies mean SOC densities (mass/area) to broad forest regions. A higher resolution model is based on an approach that employs a geographic information system (GIS) with soil databases and satellite-derived landcover images. Despite this advancement, the regression approach remains the basis of current state and federal level greenhouse gas inventories. Both approaches are analyzed in detail for Wisconsin forest soils from 1983 to 2001, applying rigorous error-fixing algorithms to soil databases. Resulting SOC stock estimates are 20% larger when determined using the GIS method rather than the regression approach. Average annual rates of increase in SOC stocks are 3.6 and 1.0 million metric tons of carbon per year for the GIS and regression approaches respectively. - Large differences in estimates of soil organic carbon stocks and annual changes in stocks for Wisconsin forestlands indicate a need for validation from forthcoming forest surveys

  1. Methods for identifying SNP interactions: a review on variations of Logic Regression, Random Forest and Bayesian logistic regression.

    Science.gov (United States)

    Chen, Carla Chia-Ming; Schwender, Holger; Keith, Jonathan; Nunkesser, Robin; Mengersen, Kerrie; Macrossan, Paula

    2011-01-01

    Due to advancements in computational ability, enhanced technology and a reduction in the price of genotyping, more data are being generated for understanding genetic associations with diseases and disorders. However, with the availability of large data sets come the inherent challenges of new methods of statistical analysis and modeling. Considering that a complex phenotype may be the effect of a combination of multiple loci, various statistical methods have been developed for identifying genetic epistasis effects. Among these methods, logic regression (LR) is an intriguing approach incorporating tree-like structures. Various methods have built on the original LR to improve different aspects of the model. In this study, we review four variations of LR, namely Logic Feature Selection, Monte Carlo Logic Regression, Genetic Programming for Association Studies, and Modified Logic Regression-Gene Expression Programming, and investigate the performance of each method using simulated and real genotype data. We contrast these with another tree-like approach, namely Random Forests, and a Bayesian logistic regression with stochastic search variable selection.

  2. Estimation of the Coefficient of Restitution of Rocking Systems by the Random Decrement Technique

    DEFF Research Database (Denmark)

    Brincker, Rune; Demosthenous, M.; Manos, G. C.

    The aim of this paper is to investigate the possibility of estimating an average damping parameter for a rocking system due to impact, the so-called coefficient of restitution, from the random response, i.e. when the loads are random and unknown, and the response is measured. The objective is to ...... of freedom system loaded by white noise, estimating the coefficient of restitution as explained, and comparing the estimates with the value used in the simulations. Several estimates for the coefficient of restitution are considered, and reasonable results are achieved....

  3. Formulae of differentiation for solving differential equations with complex-valued random coefficients

    International Nuclear Information System (INIS)

    Kim, Ki Hong; Lee, Dong Hun

    1999-01-01

    Generalizing the work of Shapiro and Loginov, we derive new formulae of differentiation useful for solving differential equations with complex-valued random coefficients. We apply the formulae to the quantum-mechanical problem of noninteracting electrons moving in a correlated random potential in one dimension.

  4. Predicting volume of distribution with decision tree-based regression methods using predicted tissue:plasma partition coefficients.

    Science.gov (United States)

    Freitas, Alex A; Limbu, Kriti; Ghafourian, Taravat

    2015-01-01

    Volume of distribution is an important pharmacokinetic property that indicates the extent of a drug's distribution in the body tissues. This paper addresses the problem of how to estimate the apparent volume of distribution at steady state (Vss) of chemical compounds in the human body using decision tree-based regression methods from the area of data mining (or machine learning). Hence, the pros and cons of several different types of decision tree-based regression methods have been discussed. The regression methods predict Vss using, as predictive features, both the compounds' molecular descriptors and the compounds' tissue:plasma partition coefficients (Kt:p), which are often used in physiologically-based pharmacokinetics. Therefore, this work has assessed whether the data mining-based prediction of Vss can be made more accurate by using as input not only the compounds' molecular descriptors but also (a subset of) their predicted Kt:p values. Comparison of the models that used only molecular descriptors, in particular, the Bagging decision tree (mean fold error of 2.33), with those employing predicted Kt:p values in addition to the molecular descriptors, such as the Bagging decision tree using adipose Kt:p (mean fold error of 2.29), indicated that the use of predicted Kt:p values as descriptors may be beneficial for accurate prediction of Vss using decision trees if prior feature selection is applied. Decision tree based models presented in this work have an accuracy that is reasonable and similar to the accuracy of reported Vss inter-species extrapolations in the literature. The estimation of Vss for new compounds in drug discovery will benefit from methods that are able to integrate large and varied sources of data and flexible non-linear data mining methods such as decision trees, which can produce interpretable models. Graphical Abstract: Decision trees for the prediction of tissue partition coefficient and volume of distribution of drugs.

  5. Diffusion coefficients for multi-step persistent random walks on lattices

    International Nuclear Information System (INIS)

    Gilbert, Thomas; Sanders, David P

    2010-01-01

    We calculate the diffusion coefficients of persistent random walks on lattices, where the direction of a walker at a given step depends on the memory of a certain number of previous steps. In particular, we describe a simple method which enables us to obtain explicit expressions for the diffusion coefficients of walks with a two-step memory on different classes of one-, two- and higher dimensional lattices.
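
    As a rough illustration of what a diffusion coefficient of a persistent walk means, the sketch below treats only the simplest case: a one-dimensional walk with one-step memory, which is not the two-step-memory setting of the paper. The function names, the persistence probability p and the sample sizes are all assumptions made for this example. For this simple case the closed form D = p / (2(1 - p)) follows from summing the step-step correlations (2p - 1)^n, and the simulation estimates D from the mean-square displacement.

```python
import random


def simulate_msd(p_forward, n_steps, n_walkers, seed=0):
    """Mean-square displacement of a 1D walk with one-step memory.

    Each walker repeats its previous step direction with probability
    p_forward and reverses it otherwise (lattice spacing = time step = 1).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_walkers):
        direction = 1 if rng.random() < 0.5 else -1  # unbiased first step
        x = 0
        for _ in range(n_steps):
            x += direction
            # Decide the direction of the next step from the current one.
            if rng.random() >= p_forward:
                direction = -direction
        total += x * x
    return total / n_walkers


def diffusion_coefficient_exact(p_forward):
    """Closed form D = p / (2 (1 - p)) for the one-step-memory walk in 1D."""
    return p_forward / (2.0 * (1.0 - p_forward))


p = 0.75      # hypothetical persistence probability
steps = 500   # hypothetical walk length
msd = simulate_msd(p, steps, n_walkers=4000, seed=1)
d_estimate = msd / (2 * steps)  # D = MSD / (2 d t) with dimension d = 1
d_exact = diffusion_coefficient_exact(p)
print(f"simulated D = {d_estimate:.3f}, closed form D = {d_exact:.3f}")
```

    With p = 1/2 the walk has no memory and the closed form reduces to the ordinary random-walk value D = 1/2.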

  6. Statistical Analysis for Multisite Trials Using Instrumental Variables with Random Coefficients

    Science.gov (United States)

    Raudenbush, Stephen W.; Reardon, Sean F.; Nomi, Takako

    2012-01-01

    Multisite trials can clarify the average impact of a new program and the heterogeneity of impacts across sites. Unfortunately, in many applications, compliance with treatment assignment is imperfect. For these applications, we propose an instrumental variable (IV) model with person-specific and site-specific random coefficients. Site-specific IV…

  7. Transmission coefficient and heat conduction of a harmonic chain with random masses

    International Nuclear Information System (INIS)

    Verheggen, T.

    1979-01-01

    We find upper and lower bounds for the transmission coefficient of a chain of random masses. Using these bounds we show that the heat conduction in such a chain does not obey Fourier's law: for different temperatures at the ends of a chain containing N particles, the energy flux falls off like $N^{-1/2}$ rather than $N^{-1}$. (orig.)

  8. Investigation of Pear Drying Performance by Different Methods and Regression of Convective Heat Transfer Coefficient with Support Vector Machine

    Directory of Open Access Journals (Sweden)

    Mehmet Das

    2018-01-01

    In this study, an air heated solar collector (AHSC) dryer was designed to determine the drying characteristics of the pear. Flat pear slices of 10 mm thickness were used in the experiments. The pears were dried both in the AHSC dryer and under the sun. Panel glass temperature, panel floor temperature, panel inlet temperature, panel outlet temperature, drying cabinet inlet temperature, drying cabinet outlet temperature, drying cabinet temperature, drying cabinet moisture, solar radiation, pear internal temperature, air velocity and mass loss of pear were measured at 30 min intervals. Experiments were carried out during June 2017 in Elazig, Turkey. The experiments started at 8:00 a.m. and continued till 18:00. The experiments were continued until the weight changes in the pear slices stopped. Wet basis moisture content (MCw), dry basis moisture content (MCd), adjustable moisture ratio (MR), drying rate (DR), and convective heat transfer coefficient (hc) were calculated from both the AHSC dryer and the open sun drying experiment data. The values of hc in both drying systems were found to range between 12.4 and 20.8 W/m² °C. Three different kernel models were used in the support vector machine (SVM) regression to construct the predictive model of the calculated hc values for both systems. The mean absolute error (MAE), root mean squared error (RMSE), relative absolute error (RAE) and root relative absolute error (RRAE) analyses were performed to indicate the predictive model's accuracy. As a result, the rate of drying of the pear was examined for both systems and it was observed that the pear dried earlier in the AHSC drying system. A predictive model was obtained using the SVM regression for the calculated hc values for the pear in the AHSC drying system. The normalized polynomial kernel was determined as the best kernel model in SVM for estimating the hc values.

  9. Random errors in the magnetic field coefficients of superconducting quadrupole magnets

    International Nuclear Information System (INIS)

    Herrera, J.; Hogue, R.; Prodell, A.; Thompson, P.; Wanderer, P.; Willen, E.

    1987-01-01

    The random multipole errors of superconducting quadrupoles are studied. For analyzing the multipoles which arise due to random variations in the size and locations of the current blocks, a model is outlined which gives the fractional field coefficients from the current distributions. With this approach, based on the symmetries of the quadrupole magnet, estimates are obtained of the random multipole errors for the arc quadrupoles envisioned for the Relativistic Heavy Ion Collider and for a single-layer quadrupole proposed for the Superconducting Super Collider.

  10. A random regression model in analysis of litter size in pigs

    African Journals Online (AJOL)

    Dispersion parameters for number of piglets born alive (NBA) were estimated using a random regression model (RRM). Two data sets of litter records from the Nemščak farm in Slovenia were used for analyses. The first dataset (DS1) included records from the first to the sixth parity. The second dataset (DS2) was extended ...

  11. Reduction of the number of parameters needed for a polynomial random regression test-day model

    NARCIS (Netherlands)

    Pool, M.H.; Meuwissen, T.H.E.

    2000-01-01

    Legendre polynomials were used to describe the (co)variance matrix within a random regression test day model. The goodness of fit depended on the polynomial order of fit, i.e., number of parameters to be estimated per animal but is limited by computing capacity. Two aspects: incomplete lactation

  12. Comparing spatial regression to random forests for large environmental data sets

    Science.gov (United States)

    Environmental data may be “large” due to number of records, number of covariates, or both. Random forests has a reputation for good predictive performance when using many covariates, whereas spatial regression, when using reduced rank methods, has a reputatio...

  13. Semi-parametric estimation of random effects in a logistic regression model using conditional inference

    DEFF Research Database (Denmark)

    Petersen, Jørgen Holm

    2016-01-01

    This paper describes a new approach to the estimation in a logistic regression model with two crossed random effects where special interest is in estimating the variance of one of the effects while not making distributional assumptions about the other effect. A composite likelihood is studied...

  14. The limiting behavior of the estimated parameters in a misspecified random field regression model

    DEFF Research Database (Denmark)

    Dahl, Christian Møller; Qin, Yu

    This paper examines the limiting properties of the estimated parameters in the random field regression model recently proposed by Hamilton (Econometrica, 2001). Though the model is parametric, it enjoys the flexibility of the nonparametric approach since it can approximate a large collection of n...

  15. Modeling Ontario regional electricity system demand using a mixed fixed and random coefficients approach

    Energy Technology Data Exchange (ETDEWEB)

    Hsiao, C.; Mountain, D.C.; Chan, M.W.L.; Tsui, K.Y. (University of Southern California, Los Angeles (USA) McMaster Univ., Hamilton, ON (Canada) Chinese Univ. of Hong Kong, Shatin)

    1989-12-01

    In examining the municipal peak and kilowatt-hour demand for electricity in Ontario, the issue of homogeneity across geographic regions is explored. A common model across municipalities and geographic regions cannot be supported by the data. Considered are various procedures which deal with this heterogeneity and yet reduce the multicollinearity problems associated with regional specific demand formulations. The recommended model controls for regional differences assuming that the coefficients of regional-seasonal specific factors are fixed and different while the coefficients of economic and weather variables are random draws from a common population for any one municipality by combining the information on all municipalities through a Bayes procedure. 8 tabs., 41 refs.

  16. Multiple Imputation of a Randomly Censored Covariate Improves Logistic Regression Analysis.

    Science.gov (United States)

    Atem, Folefac D; Qian, Jing; Maye, Jacqueline E; Johnson, Keith A; Betensky, Rebecca A

    2016-01-01

    Randomly censored covariates arise frequently in epidemiologic studies. The most commonly used methods, including complete case and single imputation or substitution, suffer from inefficiency and bias. They make strong parametric assumptions or they consider limit of detection censoring only. We employ multiple imputation, in conjunction with semi-parametric modeling of the censored covariate, to overcome these shortcomings and to facilitate robust estimation. We develop a multiple imputation approach for randomly censored covariates within the framework of a logistic regression model. We use the non-parametric estimate of the covariate distribution or the semiparametric Cox model estimate in the presence of additional covariates in the model. We evaluate this procedure in simulations, and compare its operating characteristics to those from the complete case analysis and a survival regression approach. We apply the procedures to an Alzheimer's study of the association between amyloid positivity and maternal age of onset of dementia. Multiple imputation achieves lower standard errors and higher power than the complete case approach under heavy and moderate censoring and is comparable under light censoring. The survival regression approach achieves the highest power among all procedures, but does not produce interpretable estimates of association. Multiple imputation offers a favorable alternative to complete case analysis and ad hoc substitution methods in the presence of randomly censored covariates within the framework of logistic regression.
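
    The abstract does not say how the estimates from the imputed datasets are combined. A common way to pool multiply imputed analyses is Rubin's rules, and the sketch below shows that pooling step only, under that assumption. The function name and the five log-odds-ratio estimates with their variances are hypothetical numbers chosen for illustration, not results from the study.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m completed-data estimates with Rubin's rules.

    Returns the pooled point estimate and its total variance
    T = W + (1 + 1/m) * B, where W is the mean within-imputation
    variance and B is the between-imputation (sample) variance.
    """
    m = len(estimates)
    q_bar = mean(estimates)
    within = mean(variances)
    between = variance(estimates)  # sample variance, divisor m - 1
    total = within + (1.0 + 1.0 / m) * between
    return q_bar, total


# Hypothetical logistic-regression coefficients from m = 5 imputed datasets.
coef = [0.50, 0.60, 0.40, 0.55, 0.45]
var = [0.04, 0.04, 0.04, 0.04, 0.04]
pooled, total_var = pool_rubin(coef, var)
print(f"pooled estimate = {pooled:.3f}, standard error = {total_var ** 0.5:.3f}")
```

    The between-imputation term is what separates multiple imputation from single imputation or substitution: it carries the uncertainty about the censored covariate values into the standard error.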

  17. Estimating filtration coefficients for straining from percolation and random walk theories

    DEFF Research Database (Denmark)

    Yuan, Hao; Shapiro, Alexander; You, Zhenjiang

    2012-01-01

    In this paper, laboratory challenge tests are carried out under unfavorable attachment conditions, so that size exclusion or straining is the only particle capture mechanism. The experimental results show that far above the percolation threshold the filtration coefficients are not proportional...... size exclusion theory or the model of parallel tubes with mixing chambers, where the filtration coefficients are proportional to the flux through smaller pores, and the predicted penetration depths are much lower. A special capture mechanism is proposed, which makes it possible to explain...... the experimentally observed power law dependencies of filtration coefficients and large penetration depths of particles. Such a capture mechanism is realized in a 2D pore network model with periodical boundaries with the random walk of particles on the percolation lattice. Geometries of infinite and finite clusters...

  18. Maximum Simulated Likelihood and Expectation-Maximization Methods to Estimate Random Coefficients Logit with Panel Data

    DEFF Research Database (Denmark)

    Cherchi, Elisabetta; Guevara, Cristian

    2012-01-01

    with cross-sectional or with panel data, and (d) EM systematically attained more efficient estimators than the MSL method. The results imply that if the purpose of the estimation is only to determine the ratios of the model parameters (e.g., the value of time), the EM method should be preferred. For all......The random coefficients logit model allows a more realistic representation of agents' behavior. However, the estimation of that model may involve simulation, which may become impractical with many random coefficients because of the curse of dimensionality. In this paper, the traditional maximum...... simulated likelihood (MSL) method is compared with the alternative expectation-maximization (EM) method, which does not require simulation. Previous literature had shown that for cross-sectional data, MSL outperforms the EM method in the ability to recover the true parameters and estimation time...

  19. Analysis and implementation issues for the numerical approximation of parabolic equations with random coefficients

    KAUST Repository

    Nobile, Fabio; Tempone, Raul

    2009-01-01

    We consider the problem of numerically approximating statistical moments of the solution of a time-dependent linear parabolic partial differential equation (PDE), whose coefficients and/or forcing terms are spatially correlated random fields. The stochastic coefficients of the PDE are approximated by truncated Karhunen-Loève expansions driven by a finite number of uncorrelated random variables. After approximating the stochastic coefficients, the original stochastic PDE turns into a new deterministic parametric PDE of the same type, the dimension of the parameter set being equal to the number of random variables introduced. After proving that the solution of the parametric PDE problem is analytic with respect to the parameters, we consider global polynomial approximations based on tensor product, total degree or sparse polynomial spaces and constructed by either a Stochastic Galerkin or a Stochastic Collocation approach. We derive convergence rates for the different cases and present numerical results that show how these approaches are a valid alternative to the more traditional Monte Carlo Method for this class of problems. © 2009 John Wiley & Sons, Ltd.
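
    For orientation, the truncation step described above can be written out as follows. This is a generic textbook form of the truncated Karhunen-Loève expansion, not notation taken from the paper: the symbols a, b_n, lambda_n, Y_n, f and the choice of a diffusion-type operator are assumptions made for illustration.

```latex
% Truncated Karhunen-Loeve expansion of a random coefficient a(x, \omega):
% (\lambda_n, b_n) are eigenpairs of the covariance operator of a,
% and the Y_n are uncorrelated, zero-mean, unit-variance random variables.
a_N(x,\omega) = \mathbb{E}[a](x) + \sum_{n=1}^{N} \sqrt{\lambda_n}\, b_n(x)\, Y_n(\omega),
\qquad \mathbb{E}[Y_n] = 0, \quad \mathbb{E}[Y_n Y_m] = \delta_{nm}.

% Replacing Y_n(\omega) by a parameter y_n gives a deterministic parametric
% problem in y = (y_1, \dots, y_N) \in \Gamma \subset \mathbb{R}^N, for example
\partial_t u(t,x,y) - \nabla \cdot \bigl( a_N(x,y)\, \nabla u(t,x,y) \bigr) = f(t,x,y).
```

    The dimension of the parameter y equals the number N of retained random variables, which is the quantity that drives the cost of tensor product, total degree and sparse polynomial approximations.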

  20. Analysis and implementation issues for the numerical approximation of parabolic equations with random coefficients

    KAUST Repository

    Nobile, Fabio

    2009-11-05

    We consider the problem of numerically approximating statistical moments of the solution of a time-dependent linear parabolic partial differential equation (PDE), whose coefficients and/or forcing terms are spatially correlated random fields. The stochastic coefficients of the PDE are approximated by truncated Karhunen-Loève expansions driven by a finite number of uncorrelated random variables. After approximating the stochastic coefficients, the original stochastic PDE turns into a new deterministic parametric PDE of the same type, the dimension of the parameter set being equal to the number of random variables introduced. After proving that the solution of the parametric PDE problem is analytic with respect to the parameters, we consider global polynomial approximations based on tensor product, total degree or sparse polynomial spaces and constructed by either a Stochastic Galerkin or a Stochastic Collocation approach. We derive convergence rates for the different cases and present numerical results that show how these approaches are a valid alternative to the more traditional Monte Carlo Method for this class of problems. © 2009 John Wiley & Sons, Ltd.

  1. Random regression models for daily feed intake in Danish Duroc pigs

    DEFF Research Database (Denmark)

    Strathe, Anders Bjerring; Mark, Thomas; Jensen, Just

    The objective of this study was to develop random regression models and estimate covariance functions for daily feed intake (DFI) in Danish Duroc pigs. A total of 476201 DFI records were available on 6542 Duroc boars between 70 and 160 days of age. The data originated from the National test station......-year-season, permanent, and animal genetic effects. The functional form was based on Legendre polynomials. A total of 64 models for random regressions were initially ranked by BIC to identify the approximate order for the Legendre polynomials using AI-REML. The parsimonious model included Legendre polynomials of 2nd...... order for genetic and permanent environmental curves and a heterogeneous residual variance, allowing the daily residual variance to change along the age trajectory due to scale effects. The parameters of the model were estimated in a Bayesian framework, using the RJMC module of the DMU package, where...
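
    To make the role of the Legendre polynomials concrete, the sketch below builds a second-order basis over the 70 to 160 day age range mentioned in the abstract. It assumes the normalized form phi_k(x) = sqrt((2k+1)/2) P_k(x) that is common in random regression work; the abstract does not state which normalization was used, and the function names and the animal-specific coefficients are hypothetical.

```python
import math


def standardize_age(age, age_min=70.0, age_max=160.0):
    """Map age in days linearly onto [-1, 1], the domain of Legendre polynomials."""
    return 2.0 * (age - age_min) / (age_max - age_min) - 1.0


def legendre_basis(x, order=2):
    """Normalized Legendre polynomials phi_k(x) = sqrt((2k+1)/2) * P_k(x), k = 0..order."""
    p = [1.0, x]
    for k in range(1, order):
        # Bonnet recursion: (k + 1) P_{k+1} = (2k + 1) x P_k - k P_{k-1}
        p.append(((2 * k + 1) * x * p[k] - k * p[k - 1]) / (k + 1))
    return [math.sqrt((2 * k + 1) / 2.0) * p[k] for k in range(order + 1)]


def curve_value(age, coefficients):
    """Deviation of one animal's curve at a given age: sum_k a_k * phi_k(x)."""
    basis = legendre_basis(standardize_age(age), order=len(coefficients) - 1)
    return sum(a * phi for a, phi in zip(coefficients, basis))


# Hypothetical random regression coefficients for one animal (kg/day scale).
animal_coefficients = [0.20, 0.05, -0.03]
for age in (70, 115, 160):
    print(age, round(curve_value(age, animal_coefficients), 4))
```

    Each animal then needs only order + 1 coefficients per random effect, which is why the choice of polynomial order controls the number of covariance parameters to be estimated.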

  2. Linear Regression with a Randomly Censored Covariate: Application to an Alzheimer's Study.

    Science.gov (United States)

    Atem, Folefac D; Qian, Jing; Maye, Jacqueline E; Johnson, Keith A; Betensky, Rebecca A

    2017-01-01

    The association between maternal age of onset of dementia and amyloid deposition (measured by in vivo positron emission tomography (PET) imaging) in cognitively normal older offspring is of interest. In a regression model for amyloid, special methods are required due to the random right censoring of the covariate of maternal age of onset of dementia. Prior literature has proposed methods to address the problem of censoring due to assay limit of detection, but not random censoring. We propose imputation methods and a survival regression method that do not require parametric assumptions about the distribution of the censored covariate. Existing imputation methods address missing covariates, but not right censored covariates. In simulation studies, we compare these methods to the simple, but inefficient complete case analysis, and to thresholding approaches. We apply the methods to the Alzheimer's study.

  3. Interpretation of diffusion coefficients in nanostructured materials from random walk numerical simulation.

    Science.gov (United States)

    Anta, Juan A; Mora-Seró, Iván; Dittrich, Thomas; Bisquert, Juan

    2008-08-14

    We make use of the numerical simulation random walk (RWNS) method to compute the "jump" diffusion coefficient of electrons in nanostructured materials via mean-square displacement. First, a summary of analytical results is given that relates the diffusion coefficient obtained from RWNS to those in the multiple-trapping (MT) and hopping models. Simulations are performed in a three-dimensional lattice of trap sites with energies distributed according to an exponential distribution and with a step-function distribution centered at the Fermi level. It is observed that once the stationary state is reached, the ensemble of particles follow Fermi-Dirac statistics with a well-defined Fermi level. In this stationary situation the diffusion coefficient obeys the theoretical predictions so that RWNS effectively reproduces the MT model. Mobilities can be also computed when an electrical bias is applied and they are observed to comply with the Einstein relation when compared with steady-state diffusion coefficients. The evolution of the system towards the stationary situation is also studied. When the diffusion coefficients are monitored along simulation time a transition from anomalous to trap-limited transport is observed. The nature of this transition is discussed in terms of the evolution of electron distribution and the Fermi level. All these results will facilitate the use of RW simulation and related methods to interpret steady-state as well as transient experimental techniques.

  4. Genetic correlations among body condition score, yield and fertility in multiparous cows using random regression models

    OpenAIRE

    Bastin, Catherine; Gillon, Alain; Massart, Xavier; Bertozzi, Carlo; Vanderick, Sylvie; Gengler, Nicolas

    2010-01-01

    Genetic correlations between body condition score (BCS) in lactation 1 to 3 and four economically important traits (days open, 305-days milk, fat, and protein yields recorded in the first 3 lactations) were estimated on about 12,500 Walloon Holstein cows using 4-trait random regression models. Results indicated moderate favorable genetic correlations between BCS and days open (from -0.46 to -0.62) and suggested the use of BCS for indirect selection on fertility. However, unfavorable genetic c...

  5. Multilevel covariance regression with correlated random effects in the mean and variance structure.

    Science.gov (United States)

    Quintero, Adrian; Lesaffre, Emmanuel

    2017-09-01

    Multivariate regression methods generally assume a constant covariance matrix for the observations. In case a heteroscedastic model is needed, the parametric and nonparametric covariance regression approaches can be restrictive in the literature. We propose a multilevel regression model for the mean and covariance structure, including random intercepts in both components and allowing for correlation between them. The implied conditional covariance function can be different across clusters as a result of the random effect in the variance structure. In addition, allowing for correlation between the random intercepts in the mean and covariance makes the model convenient for skewedly distributed responses. Furthermore, it permits us to analyse directly the relation between the mean response level and the variability in each cluster. Parameter estimation is carried out via Gibbs sampling. We compare the performance of our model to other covariance modelling approaches in a simulation study. Finally, the proposed model is applied to the RN4CAST dataset to identify the variables that impact burnout of nurses in Belgium. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  6. Left ventricular mass regression after porcine versus bovine aortic valve replacement: a randomized comparison.

    Science.gov (United States)

    Suri, Rakesh M; Zehr, Kenton J; Sundt, Thoralf M; Dearani, Joseph A; Daly, Richard C; Oh, Jae K; Schaff, Hartzell V

    2009-10-01

    It is unclear whether small differences in transprosthetic gradient between porcine and bovine biologic aortic valves translate into improved regression of left ventricular (LV) hypertrophy after aortic valve replacement. We investigated transprosthetic gradient, aortic valve orifice area, and LV mass in patients randomized to aortic valve replacement with either the Medtronic Mosaic (MM) porcine or an Edwards Perimount (EP) bovine pericardial bioprosthesis. One hundred fifty-two patients with aortic valve disease were randomly assigned to receive either the MM (n = 76) or an EP prosthesis. There were 89 men (59%), and the mean age was 76 years. Echocardiograms from preoperative, postoperative, predismissal, and 1-year time points were analyzed. Baseline characteristics and preoperative echocardiograms were similar between the two groups. The median implant size was 23 mm for both. There were no early deaths, and 10 patients (7%) died after dismissal. One hundred seven of 137 patients (78%) had a 1-year echocardiogram, and none required aortic valve reoperation. The mean aortic valve gradient at dismissal was 19.4 mm Hg (MM) versus 13.5 mm Hg (EP; p ... regression of LV mass index (MM, -32.4 g/m² versus EP, -27.0 g/m²; p = 0.40). Greater preoperative LV mass index was the sole independent predictor of greater LV mass regression after surgery (p ... regression of LV mass during the first year after aortic valve replacement.

  7. Application of QMC methods to PDEs with random coefficients : a survey of analysis and implementation

    KAUST Repository

    Kuo, Frances

    2016-01-05

    In this talk I will provide a survey of recent research efforts on the application of quasi-Monte Carlo (QMC) methods to PDEs with random coefficients. Such PDE problems occur in the area of uncertainty quantification. In recent years many papers have been written on this topic using a variety of methods. QMC methods are relatively new to this application area. I will consider different models for the randomness (uniform versus lognormal) and contrast different QMC algorithms (single-level versus multilevel, first order versus higher order, deterministic versus randomized). I will give a summary of the QMC error analysis and proof techniques in a unified view, and provide a practical guide to the software for constructing QMC points tailored to the PDE problems.

  8. Logistic Regression with Multiple Random Effects: A Simulation Study of Estimation Methods and Statistical Packages

    Science.gov (United States)

    Kim, Yoonsang; Emery, Sherry

    2013-01-01

    Several statistical packages are capable of estimating generalized linear mixed models and these packages provide one or more of three estimation methods: penalized quasi-likelihood, Laplace, and Gauss-Hermite. Many studies have investigated these methods’ performance for the mixed-effects logistic regression model. However, the authors focused on models with one or two random effects and assumed a simple covariance structure between them, which may not be realistic. When there are multiple correlated random effects in a model, the computation becomes intensive, and often an algorithm fails to converge. Moreover, in our analysis of smoking status and exposure to anti-tobacco advertisements, we have observed that when a model included multiple random effects, parameter estimates varied considerably from one statistical package to another even when using the same estimation method. This article presents a comprehensive review of the advantages and disadvantages of each estimation method. In addition, we compare the performances of the three methods across statistical packages via simulation, which involves two- and three-level logistic regression models with at least three correlated random effects. We apply our findings to a real dataset. Our results suggest that two packages—SAS GLIMMIX Laplace and SuperMix Gaussian quadrature—perform well in terms of accuracy, precision, convergence rates, and computing speed. We also discuss the strengths and weaknesses of the two packages in regard to sample sizes. PMID:24288415

  9. Logistic Regression with Multiple Random Effects: A Simulation Study of Estimation Methods and Statistical Packages.

    Science.gov (United States)

    Kim, Yoonsang; Choi, Young-Ku; Emery, Sherry

    2013-08-01

    Several statistical packages are capable of estimating generalized linear mixed models and these packages provide one or more of three estimation methods: penalized quasi-likelihood, Laplace, and Gauss-Hermite. Many studies have investigated these methods' performance for the mixed-effects logistic regression model. However, the authors focused on models with one or two random effects and assumed a simple covariance structure between them, which may not be realistic. When there are multiple correlated random effects in a model, the computation becomes intensive, and often an algorithm fails to converge. Moreover, in our analysis of smoking status and exposure to anti-tobacco advertisements, we have observed that when a model included multiple random effects, parameter estimates varied considerably from one statistical package to another even when using the same estimation method. This article presents a comprehensive review of the advantages and disadvantages of each estimation method. In addition, we compare the performances of the three methods across statistical packages via simulation, which involves two- and three-level logistic regression models with at least three correlated random effects. We apply our findings to a real dataset. Our results suggest that two packages, SAS GLIMMIX Laplace and SuperMix Gaussian quadrature, perform well in terms of accuracy, precision, convergence rates, and computing speed. We also discuss the strengths and weaknesses of the two packages in regard to sample sizes.

  10. The Use of Alternative Regression Methods in Social Sciences and the Comparison of Least Squares and M Estimation Methods in Terms of the Determination of Coefficient

    Science.gov (United States)

    Coskuntuncel, Orkun

    2013-01-01

    The purpose of this study is two-fold; the first aim being to show the effect of outliers on the widely used least squares regression estimator in social sciences. The second aim is to compare the classical method of least squares with the robust M-estimator using the "determination of coefficient" (R²). For this purpose,…
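
    The effect of a single outlier on least squares, and the way an M-estimator limits it, can be seen in a small sketch. The example below fits a straight line by ordinary least squares and by a Huber M-estimator computed with iteratively reweighted least squares. The choice of the Huber loss, the tuning constant 1.345, the MAD scale estimate and the ten data points are all assumptions for illustration; the abstract does not say which M-estimator or data were used.

```python
from statistics import median


def weighted_line_fit(x, y, w):
    """Weighted least squares fit of y = a + b x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def huber_line_fit(x, y, c=1.345, n_iter=50):
    """Huber M-estimate of a line via iteratively reweighted least squares."""
    w = [1.0] * len(x)
    for _ in range(n_iter):
        a, b = weighted_line_fit(x, y, w)
        resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        scale = median(abs(r) for r in resid) / 0.6745  # robust MAD-type scale
        if scale == 0:
            break
        # Huber weights: 1 inside the threshold, shrinking like 1/|r| outside.
        w = [1.0 if abs(r) <= c * scale else c * scale / abs(r) for r in resid]
    return weighted_line_fit(x, y, w)


# Hypothetical data: roughly y = 1 + 2x, with the last point a gross outlier.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [1.1, 2.9, 5.2, 6.8, 9.1, 10.9, 13.2, 14.8, 17.1, 40.0]

ols_intercept, ols_slope = weighted_line_fit(x, y, [1.0] * len(x))
huber_intercept, huber_slope = huber_line_fit(x, y)
print(f"OLS slope   = {ols_slope:.3f}")
print(f"Huber slope = {huber_slope:.3f}")
```

    Because the Huber weights cap the contribution of large residuals, the robust slope stays close to the trend of the nine regular points, while the least squares slope is pulled toward the outlier.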

  11. Bayesian Nonparametric Regression Analysis of Data with Random Effects Covariates from Longitudinal Measurements

    KAUST Repository

    Ryu, Duchwan

    2010-09-28

    We consider nonparametric regression analysis in a generalized linear model (GLM) framework for data with covariates that are the subject-specific random effects of longitudinal measurements. The usual assumption that the effects of the longitudinal covariate processes are linear in the GLM may be unrealistic and if this happens it can cast doubt on the inference of observed covariate effects. Allowing the regression functions to be unknown, we propose to apply Bayesian nonparametric methods including cubic smoothing splines or P-splines for the possible nonlinearity and use an additive model in this complex setting. To improve computational efficiency, we propose the use of data-augmentation schemes. The approach allows flexible covariance structures for the random effects and within-subject measurement errors of the longitudinal processes. The posterior model space is explored through a Markov chain Monte Carlo (MCMC) sampler. The proposed methods are illustrated and compared to other approaches, the "naive" approach and the regression calibration, via simulations and by an application that investigates the relationship between obesity in adulthood and childhood growth curves. © 2010, The International Biometric Society.

  12. Random matrix theory analysis of cross-correlations in the US stock market: Evidence from Pearson’s correlation coefficient and detrended cross-correlation coefficient

    Science.gov (United States)

    Wang, Gang-Jin; Xie, Chi; Chen, Shou; Yang, Jiao-Jiao; Yang, Ming-Yan

    2013-09-01

    In this study, we first build two empirical cross-correlation matrices in the US stock market by two different methods, namely the Pearson’s correlation coefficient and the detrended cross-correlation coefficient (DCCA coefficient). Then, combining the two matrices with the method of random matrix theory (RMT), we mainly investigate the statistical properties of cross-correlations in the US stock market. We choose the daily closing prices of 462 constituent stocks of S&P 500 index as the research objects and select the sample data from January 3, 2005 to August 31, 2012. In the empirical analysis, we examine the statistical properties of cross-correlation coefficients, the distribution of eigenvalues, the distribution of eigenvector components, and the inverse participation ratio. From the two methods, we find some new results of the cross-correlations in the US stock market in our study, which are different from the conclusions reached by previous studies. The empirical cross-correlation matrices constructed by the DCCA coefficient show several interesting properties at different time scales in the US stock market, which are useful to the risk management and optimal portfolio selection, especially to the diversity of the asset portfolio. It will be an interesting and meaningful work to find the theoretical eigenvalue distribution of a completely random matrix R for the DCCA coefficient because it does not obey the Marčenko-Pastur distribution.
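
    For context on the random matrix benchmark mentioned above: for a Pearson correlation matrix built from N independent unit-variance series of length T, the Marčenko-Pastur law places the eigenvalues (asymptotically) in the interval [(1 - sqrt(N/T))², (1 + sqrt(N/T))²]. The sketch below only evaluates those two edges. The number of trading days is not stated in the abstract, so the value of T used here is a placeholder, and the function name is an assumption.

```python
import math


def marchenko_pastur_bounds(n_series, n_obs):
    """Edges of the Marchenko-Pastur spectrum for unit-variance data.

    n_series: number of time series N; n_obs: length T of each series.
    """
    if n_series <= 0 or n_obs < n_series:
        raise ValueError("expects 0 < n_series <= n_obs")
    root = math.sqrt(n_series / n_obs)
    return (1.0 - root) ** 2, (1.0 + root) ** 2


# 462 stocks as in the abstract; T = 1848 is a hypothetical sample length.
lam_min, lam_max = marchenko_pastur_bounds(462, 1848)
print(lam_min, lam_max)
```

    Empirical eigenvalues above the upper edge are the ones usually read as carrying genuine cross-correlation; as the abstract notes, this benchmark applies to the Pearson matrix and not to the DCCA-based one.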

  13. A comparison of two least-squared random coefficient autoregressive models: with and without autocorrelated errors

    OpenAIRE

    Autcha Araveeporn

    2013-01-01

    This paper compares a Least-Squared Random Coefficient Autoregressive (RCA) model with a Least-Squared RCA model based on Autocorrelated Errors (RCA-AR). We looked at only the first order models, denoted RCA(1) and RCA(1)-AR(1). The efficiency of the Least-Squared method was checked by applying the models to Brownian motion and Wiener process, and the efficiency followed closely the asymptotic properties of a normal distribution. In a simulation study, we compared the performance of RCA(1) an...
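
    To make the model class concrete, the sketch below simulates a first-order random coefficient autoregression in its usual textbook form, x_t = (φ + b_t) x_{t-1} + e_t with independent zero-mean noise b_t and e_t, and estimates φ by least squares. This is an assumed generic formulation, not the paper's exact setup; the parameter values, seed, and function names are hypothetical.

```python
import random


def simulate_rca1(n, phi, sigma_b, sigma_e, seed=0):
    """Simulate n steps of x_t = (phi + b_t) * x_{t-1} + e_t, starting at 0."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n):
        coef = phi + rng.gauss(0.0, sigma_b)  # random coefficient at time t
        x.append(coef * x[-1] + rng.gauss(0.0, sigma_e))
    return x


def least_squares_phi(x):
    """Least-squares estimate of phi: sum(x_t * x_{t-1}) / sum(x_{t-1}^2)."""
    num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
    den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    return num / den


# phi^2 + sigma_b^2 < 1 keeps the simulated process second-order stationary.
series = simulate_rca1(20000, phi=0.5, sigma_b=0.3, sigma_e=1.0, seed=1)
phi_hat = least_squares_phi(series)
print(phi_hat)
```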

  14. Regression modeling methods, theory, and computation with SAS

    CERN Document Server

    Panik, Michael

    2009-01-01

    Regression Modeling: Methods, Theory, and Computation with SAS provides an introduction to a diverse assortment of regression techniques using SAS to solve a wide variety of regression problems. The author fully documents the SAS programs and thoroughly explains the output produced by the programs. The text presents the popular ordinary least squares (OLS) approach before introducing many alternative regression methods. It covers nonparametric regression, logistic regression (including Poisson regression), Bayesian regression, robust regression, fuzzy regression, random coefficients regression,

  15. A SOCIOLOGICAL ANALYSIS OF THE CHILDBEARING COEFFICIENT IN THE ALTAI REGION BASED ON METHOD OF FUZZY LINEAR REGRESSION

    Directory of Open Access Journals (Sweden)

    Sergei Vladimirovich Varaksin

    2017-06-01

    Purpose. Construction of a mathematical model of the dynamics of childbearing change in the Altai region in 2000–2016, and analysis of the dynamics of changes in birth rates for multiple age categories of women of childbearing age. Methodology. An auxiliary element of the analysis is the construction of linear mathematical models of the dynamics of childbearing using the fuzzy linear regression method based on fuzzy numbers. Fuzzy linear regression is considered as an alternative to standard statistical linear regression for short time series and an unknown distribution law. The parameters of the fuzzy linear and standard statistical regressions for the childbearing time series were determined using an algorithm written in the built-in MATLAB language. The method of fuzzy linear regression has not yet been used in sociological research. Results. Conclusions are drawn about the socio-demographic changes in society, the high efficiency of the demographic policy of the leadership of the region and the country, and the applicability of the method of fuzzy linear regression for sociological analysis.

  16. Genetic analyses of partial egg production in Japanese quail using multi-trait random regression models.

    Science.gov (United States)

    Karami, K; Zerehdaran, S; Barzanooni, B; Lotfi, E

    2017-12-01

    1. The aim of the present study was to estimate genetic parameters for average egg weight (EW) and egg number (EN) at different ages in Japanese quail using multi-trait random regression (MTRR) models. 2. A total of 8534 records from 900 quail, hatched between 2014 and 2015, were used in the study. Average weekly egg weights and egg numbers were measured from second until sixth week of egg production. 3. Nine random regression models were compared to identify the best order of the Legendre polynomials (LP). The most optimal model was identified by the Bayesian Information Criterion. A model with second order of LP for fixed effects, second order of LP for additive genetic effects and third order of LP for permanent environmental effects (MTRR23) was found to be the best. 4. According to the MTRR23 model, direct heritability for EW increased from 0.26 in the second week to 0.53 in the sixth week of egg production, whereas the ratio of permanent environment to phenotypic variance decreased from 0.48 to 0.1. Direct heritability for EN was low, whereas the ratio of permanent environment to phenotypic variance decreased from 0.57 to 0.15 during the production period. 5. For each trait, estimated genetic correlations among weeks of egg production were high (from 0.85 to 0.98). Genetic correlations between EW and EN were low and negative for the first two weeks, but they were low and positive for the rest of the egg production period. 6. In conclusion, random regression models can be used effectively for analysing egg production traits in Japanese quail. Response to selection for increased egg weight would be higher at older ages because of its higher heritability and such a breeding program would have no negative genetic impact on egg production.
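
    Random regression models of this kind use Legendre polynomials of standardized age as covariates. The sketch below maps a recording week onto [-1, 1] and evaluates the polynomials with Bonnet's recursion. It shows the unnormalized polynomials; many animal-breeding programs use a normalized variant, and the week range, degree, and function name here are illustrative assumptions.

```python
def legendre_basis(t, t_min, t_max, degree):
    """Return [P_0(x), ..., P_degree(x)] with t rescaled linearly to x in [-1, 1]."""
    x = -1.0 + 2.0 * (t - t_min) / (t_max - t_min)
    p = [1.0, x]
    for k in range(1, degree):
        # Bonnet's recursion: (k + 1) P_{k+1} = (2k + 1) x P_k - k P_{k-1}
        p.append(((2 * k + 1) * x * p[k] - k * p[k - 1]) / (k + 1))
    return p[: degree + 1]


# Weeks 2 to 6 of egg production, cubic basis (values chosen for illustration).
for week in range(2, 7):
    print(week, legendre_basis(week, 2, 6, 3))
```

    Each animal's random regression coefficients multiply these covariates, so the choice of degree for the genetic and permanent environmental parts determines how flexible the fitted trajectories can be.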

  17. Basis adaptation and domain decomposition for steady-state partial differential equations with random coefficients

    Energy Technology Data Exchange (ETDEWEB)

    Tipireddy, R.; Stinis, P.; Tartakovsky, A. M.

    2017-12-01

    We present a novel approach for solving steady-state stochastic partial differential equations (PDEs) with high-dimensional random parameter space. The proposed approach combines spatial domain decomposition with basis adaptation for each subdomain. The basis adaptation is used to address the curse of dimensionality by constructing an accurate low-dimensional representation of the stochastic PDE solution (probability density function and/or its leading statistical moments) in each subdomain. Restricting the basis adaptation to a specific subdomain affords finding a locally accurate solution. Then, the solutions from all of the subdomains are stitched together to provide a global solution. We support our construction with numerical experiments for a steady-state diffusion equation with a random spatially dependent coefficient. Our results show that highly accurate global solutions can be obtained with significantly reduced computational costs.

  18. Convergence of quasi-optimal Stochastic Galerkin methods for a class of PDES with random coefficients

    KAUST Repository

    Beck, Joakim; Nobile, Fabio; Tamellini, Lorenzo; Tempone, Raul

    2014-01-01

    In this work we consider quasi-optimal versions of the Stochastic Galerkin method for solving linear elliptic PDEs with stochastic coefficients. In particular, we consider the case of a finite number N of random inputs and an analytic dependence of the solution of the PDE with respect to the parameters in a polydisc of the complex plane C^N. We show that a quasi-optimal approximation is given by a Galerkin projection on a weighted (anisotropic) total degree space and prove a (sub)exponential convergence rate. As a specific application we consider a thermal conduction problem with non-overlapping inclusions of random conductivity. Numerical results show the sharpness of our estimates. © 2013 Elsevier Ltd. All rights reserved.

  19. Convergence of quasi-optimal Stochastic Galerkin methods for a class of PDES with random coefficients

    KAUST Repository

    Beck, Joakim

    2014-03-01

    In this work we consider quasi-optimal versions of the Stochastic Galerkin method for solving linear elliptic PDEs with stochastic coefficients. In particular, we consider the case of a finite number N of random inputs and an analytic dependence of the solution of the PDE with respect to the parameters in a polydisc of the complex plane C^N. We show that a quasi-optimal approximation is given by a Galerkin projection on a weighted (anisotropic) total degree space and prove a (sub)exponential convergence rate. As a specific application we consider a thermal conduction problem with non-overlapping inclusions of random conductivity. Numerical results show the sharpness of our estimates. © 2013 Elsevier Ltd. All rights reserved.

  20. Adaptive Algebraic Multigrid for Finite Element Elliptic Equations with Random Coefficients

    Energy Technology Data Exchange (ETDEWEB)

    Kalchev, D

    2012-04-02

    This thesis presents a two-grid algorithm based on Smoothed Aggregation Spectral Element Agglomeration Algebraic Multigrid (SA-ρAMGe) combined with adaptation. The aim is to build an efficient solver for the linear systems arising from discretization of second-order elliptic partial differential equations (PDEs) with stochastic coefficients. Examples include PDEs that model subsurface flow with a random permeability field. During a Markov Chain Monte Carlo (MCMC) simulation process that draws PDE coefficient samples from a certain distribution, the PDE coefficients change, and hence the resulting linear systems to be solved change. At every such step the system (discretized PDE) needs to be solved and the computed solution used to evaluate some functional(s) of interest that then determine whether the coefficient sample is acceptable or not. The MCMC process is hence computationally intensive and requires the solvers used to be efficient and fast. The fact that the resulting linear system changes at every step of MCMC makes an already existing solver built for the old problem perhaps not as efficient for the problem corresponding to the newly sampled coefficient. This motivates the main goal of our study, namely, to adapt an already existing solver to handle the problem (with changed coefficient), with the objective of doing so faster and more efficiently than building a completely new solver from scratch. Our approach utilizes the local element matrices (for the problem with changed coefficients) to build local problems associated with the agglomerated elements constructed by the method (a set of subdomains that cover the given computational domain). We solve a generalized eigenproblem for each set in a subspace spanned by the previous local coarse space (used for the old solver) and a vector, a component of the error, that the old solver cannot handle. A portion of the spectrum of these local eigen-problems (corresponding to eigenvalues close to zero) form the

  1. Estimation of genetic parameters related to eggshell strength using random regression models.

    Science.gov (United States)

    Guo, J; Ma, M; Qu, L; Shen, M; Dou, T; Wang, K

    2015-01-01

    This study examined the changes in eggshell strength and the genetic parameters related to this trait throughout a hen's laying life using random regression. The data were collected from a crossbred population between 2011 and 2014, where the eggshell strength was determined repeatedly for 2260 hens. Using random regression models (RRMs), several Legendre polynomials were employed to estimate the fixed, direct genetic and permanent environment effects. The residual effects were treated as independently distributed with heterogeneous variance for each test week. The direct genetic variance was included with second-order Legendre polynomials and the permanent environment with third-order Legendre polynomials. The heritability of eggshell strength ranged from 0.26 to 0.43, the repeatability ranged between 0.47 and 0.69, and the estimated genetic correlations between test weeks was high at > 0.67. The first eigenvalue of the genetic covariance matrix accounted for about 97% of the sum of all the eigenvalues. The flexibility and statistical power of RRM suggest that this model could be an effective method to improve eggshell quality and to reduce losses due to cracked eggs in a breeding plan.

  2. ESTIMATION OF GENETIC PARAMETERS IN TROPICARNE CATTLE WITH RANDOM REGRESSION MODELS USING B-SPLINES

    Directory of Open Access Journals (Sweden)

    Joel Domínguez Viveros

    2015-04-01

    The objectives were to estimate variance components, and direct (h²) and maternal (m²) heritability in the growth of Tropicarne cattle based on a random regression model using B-Splines for random effects modeling. Information from 12 890 monthly weighings of 1787 calves, from birth to 24 months old, was analyzed. The pedigree included 2504 animals. The random effects model included genetic and permanent environmental effects (direct and maternal) of cubic order, and residuals. The fixed effects included contemporaneous groups (year – season of weighing), sex and the covariate age of the cow (linear and quadratic). The B-Splines were defined with four knots across the growth period analyzed. Analyses were performed with the software Wombat. The variances (phenotypic and residual) behaved similarly: they had a negative trend from 7 to 12 months of age, a positive trend from birth to 6 months and from 13 to 18 months, and remained constant after 19 months. The m² estimates were low and near zero, with an average of 0.06 in an interval of 0.04 to 0.11; the h² estimates were also close to zero, with an average of 0.10 in an interval of 0.03 to 0.23.

  3. Determination of Nonlinear Stiffness Coefficients for Finite Element Models with Application to the Random Vibration Problem

    Science.gov (United States)

    Muravyov, Alexander A.

    1999-01-01

    In this paper, a method for obtaining nonlinear stiffness coefficients in modal coordinates for geometrically nonlinear finite-element models is developed. The method requires application of a finite-element program with a geometrically nonlinear static capability. The MSC/NASTRAN code is employed for this purpose. The equations of motion of an MDOF system are formulated in modal coordinates. A set of linear eigenvectors is used to approximate the solution of the nonlinear problem. The random vibration problem of the MDOF nonlinear system is then considered. The solutions obtained by application of two different versions of a stochastic linearization technique are compared with linear and exact (analytical) solutions in terms of root-mean-square (RMS) displacements and strains for a beam structure.

  4. Cost-effective degradation test plan for a nonlinear random-coefficients model

    International Nuclear Information System (INIS)

    Kim, Seong-Joon; Bae, Suk Joo

    2013-01-01

    The determination of requisite sample size and the inspection schedule considering both testing cost and accuracy has been an important issue in the degradation test. This paper proposes a cost-effective degradation test plan in the context of a nonlinear random-coefficients model, while meeting some precision constraints for failure-time distribution. We introduce a precision measure to quantify the information losses incurred by reducing testing resources. The precision measure is incorporated into time-varying cost functions to reflect real circumstances. We apply a hybrid genetic algorithm to general cost optimization problem with reasonable constraints on the level of testing precision in order to determine a cost-effective inspection scheme. The proposed method is applied to the degradation data of plasma display panels (PDPs) following a bi-exponential degradation model. Finally, sensitivity analysis via simulation is provided to evaluate the robustness of the proposed degradation test plan.

  5. Genetic analysis of partial egg production records in Japanese quail using random regression models.

    Science.gov (United States)

    Abou Khadiga, G; Mahmoud, B Y F; Farahat, G S; Emam, A M; El-Full, E A

    2017-08-01

    The main objectives of this study were to detect the most appropriate random regression model (RRM) to fit the data of monthly egg production in 2 lines (selected and control) of Japanese quail and to test the consistency of different criteria of model choice. Data from 1,200 female Japanese quails for the first 5 months of egg production from 4 consecutive generations of an egg line selected for egg production in the first month (EP1) was analyzed. Eight RRMs with different orders of Legendre polynomials were compared to determine the proper model for analysis. All criteria of model choice suggested that the adequate model included the second-order Legendre polynomials for fixed effects, and the third-order for additive genetic effects and permanent environmental effects. Predictive ability of the best model was the highest among all models (ρ = 0.987). According to the best model fitted to the data, estimates of heritability were relatively low to moderate (0.10 to 0.17) showed a descending pattern from the first to the fifth month of production. A similar pattern was observed for permanent environmental effects with greater estimates in the first (0.36) and second (0.23) months of production than heritability estimates. Genetic correlations between separate production periods were higher (0.18 to 0.93) than their phenotypic counterparts (0.15 to 0.87). The superiority of the selected line over the control was observed through significant (P egg production in earlier ages (first and second months) than later ones. A methodology based on random regression animal models can be recommended for genetic evaluation of egg production in Japanese quail. © 2017 Poultry Science Association Inc.

  6. Genetic correlations between body condition scores and fertility in dairy cattle using bivariate random regression models.

    Science.gov (United States)

    De Haas, Y; Janss, L L G; Kadarmideen, H N

    2007-10-01

    Genetic correlations between body condition score (BCS) and fertility traits in dairy cattle were estimated using bivariate random regression models. BCS was recorded by the Swiss Holstein Association on 22,075 lactating heifers (primiparous cows) from 856 sires. Fertility data during first lactation were extracted for 40,736 cows. The fertility traits were days to first service (DFS), days between first and last insemination (DFLI), calving interval (CI), number of services per conception (NSPC) and conception rate to first insemination (CRFI). A bivariate model was used to estimate genetic correlations between BCS as a longitudinal trait by random regression components, and daughter's fertility at the sire level as a single lactation measurement. Heritability of BCS was 0.17, and heritabilities for fertility traits were low (0.01-0.08). Genetic correlations between BCS and fertility over the lactation varied from -0.45 to -0.14 for DFS; from -0.75 to 0.03 for DFLI; from -0.59 to -0.02 for CI; from -0.47 to 0.33 for NSPC and from 0.08 to 0.82 for CRFI. These results show (genetic) interactions between fat reserves and reproduction along the lactation trajectory of modern dairy cows, which can be useful in genetic selection as well as in management. Maximum genetic gain in fertility from indirect selection on BCS should be based on measurements taken in mid lactation, when the genetic variance for BCS is largest and the genetic correlations between BCS and fertility are strongest.

  7. Improved profile fitting and quantification of uncertainty in experimental measurements of impurity transport coefficients using Gaussian process regression

    International Nuclear Information System (INIS)

    Chilenski, M.A.; Greenwald, M.; Howard, N.T.; White, A.E.; Rice, J.E.; Walk, J.R.; Marzouk, Y.

    2015-01-01

    The need to fit smooth temperature and density profiles to discrete observations is ubiquitous in plasma physics, but the prevailing techniques for this have many shortcomings that cast doubt on the statistical validity of the results. This issue is amplified in the context of validation of gyrokinetic transport models (Holland et al 2009 Phys. Plasmas 16 052301), where the strong sensitivity of the code outputs to input gradients means that inadequacies in the profile fitting technique can easily lead to an incorrect assessment of the degree of agreement with experimental measurements. In order to rectify the shortcomings of standard approaches to profile fitting, we have applied Gaussian process regression (GPR), a powerful non-parametric regression technique, to analyse an Alcator C-Mod L-mode discharge used for past gyrokinetic validation work (Howard et al 2012 Nucl. Fusion 52 063002). We show that the GPR techniques can reproduce the previous results while delivering more statistically rigorous fits and uncertainty estimates for both the value and the gradient of plasma profiles with an improved level of automation. We also discuss how the use of GPR can allow for dramatic increases in the rate of convergence of uncertainty propagation for any code that takes experimental profiles as inputs. The new GPR techniques for profile fitting and uncertainty propagation are quite useful and general, and we describe the steps to implementation in detail in this paper. These techniques have the potential to substantially improve the quality of uncertainty estimates on profile fits and the rate of convergence of uncertainty propagation, making them of great interest for wider use in fusion experiments and modelling efforts.
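
    The core computation behind Gaussian process regression can be sketched in a few lines: build the kernel matrix of the observations, add the noise variance on the diagonal, and solve two linear systems to obtain the posterior mean and variance at a new point. The sketch below assumes a squared-exponential kernel with fixed hyperparameters and uses made-up data; it is a generic illustration, not the profile-fitting workflow of the paper, and every name in it is hypothetical.

```python
import math


def sq_exp_kernel(a, b, length=1.0, sigma_f=1.0):
    """Squared-exponential covariance between scalar inputs a and b."""
    return sigma_f ** 2 * math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def gp_predict(x_train, y_train, x_new, noise=1e-2, length=1.0):
    """Posterior mean and variance of a zero-mean GP at the point x_new."""
    K = [[sq_exp_kernel(a, b, length) + (noise ** 2 if i == j else 0.0)
          for j, b in enumerate(x_train)] for i, a in enumerate(x_train)]
    k_star = [sq_exp_kernel(x_new, a, length) for a in x_train]
    alpha = solve(K, y_train)   # (K + noise^2 I)^{-1} y
    v = solve(K, k_star)        # (K + noise^2 I)^{-1} k_*
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = sq_exp_kernel(x_new, x_new, length) - sum(k * vi for k, vi in zip(k_star, v))
    return mean, max(var, 0.0)


# Made-up observations, for illustration only.
x_obs = [0.0, 1.0, 2.0, 3.0]
y_obs = [0.0, 1.0, 0.0, -1.0]
print(gp_predict(x_obs, y_obs, 1.5, noise=1e-3))
```

    The predictive variance grows back toward the prior variance away from the data, which is what makes the fit usable for uncertainty propagation; in practice the hyperparameters would be inferred from the data, not fixed by hand as they are here.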

  8. Quasi optimal and adaptive sparse grids with control variates for PDEs with random diffusion coefficient

    KAUST Repository

    Tamellini, Lorenzo

    2016-01-05

    In this talk we discuss possible strategies to minimize the impact of the curse of dimensionality effect when building sparse-grid approximations of a multivariate function u = u(y1, ..., yN). More precisely, we present a knapsack approach, in which we estimate the cost and the error reduction contribution of each possible component of the sparse grid, and then we choose the components with the highest error reduction/cost ratio. The estimates of the error reduction are obtained by either a mixed a-priori/a-posteriori approach, in which we first derive a theoretical bound and then tune it with some inexpensive auxiliary computations (resulting in the so-called quasi-optimal sparse grids), or by a fully a-posteriori approach (obtaining the so-called adaptive sparse grids). This framework is very general and can be used to build quasi-optimal/adaptive sparse grids on bounded and unbounded domains (e.g. u depending on uniform and normal random distributions for yn), using both nested and non-nested families of univariate collocation points. We present some theoretical convergence results as well as numerical results showing the efficiency of the proposed approach for the approximation of the solution of elliptic PDEs with random diffusion coefficients. In this context, to treat the case of rough permeability fields in which a sparse grid approach may not be suitable, we propose to use the sparse grids as a control variate in a Monte Carlo simulation.
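
    The selection rule described above (rank candidate components by estimated error reduction per unit cost and take the best ones) can be sketched as a greedy knapsack heuristic. The candidate list, the budget, and the function name below are hypothetical, and the sketch ignores the admissibility constraints that real sparse-grid index sets must satisfy.

```python
def select_components(candidates, budget):
    """Greedy selection by error-reduction / cost ratio under a cost budget.

    candidates: list of (name, estimated_error_reduction, cost) tuples.
    Returns the chosen names (best ratio first) and the total cost spent.
    """
    ranked = sorted(candidates, key=lambda c: c[1] / c[2], reverse=True)
    chosen, spent = [], 0.0
    for name, gain, cost in ranked:
        if spent + cost <= budget:
            chosen.append(name)
            spent += cost
    return chosen, spent


# Hypothetical profit and cost estimates for four candidate components.
candidates = [("a", 1.0, 4.0), ("b", 0.5, 1.0), ("c", 0.3, 3.0), ("d", 0.05, 2.0)]
print(select_components(candidates, budget=5.0))
```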

  9. Substituting random forest for multiple linear regression improves binding affinity prediction of scoring functions: Cyscore as a case study.

    Science.gov (United States)

    Li, Hongjian; Leung, Kwong-Sak; Wong, Man-Hon; Ballester, Pedro J

    2014-08-27

    State-of-the-art protein-ligand docking methods are generally limited by the traditionally low accuracy of their scoring functions, which are used to predict binding affinity and thus vital for discriminating between active and inactive compounds. Despite intensive research over the years, classical scoring functions have reached a plateau in their predictive performance. These assume a predetermined additive functional form for some sophisticated numerical features, and use standard multivariate linear regression (MLR) on experimental data to derive the coefficients. In this study we show that such a simple functional form is detrimental for the prediction performance of a scoring function, and replacing linear regression by machine learning techniques like random forest (RF) can improve prediction performance. We investigate the conditions of applying RF under various contexts and find that given sufficient training samples RF manages to comprehensively capture the non-linearity between structural features and measured binding affinities. Incorporating more structural features and training with more samples can both boost RF performance. In addition, we analyze the importance of structural features to binding affinity prediction using the RF variable importance tool. Lastly, we use Cyscore, a top performing empirical scoring function, as a baseline for comparison study. Machine-learning scoring functions are fundamentally different from classical scoring functions because the former circumvents the fixed functional form relating structural features with binding affinities. RF, but not MLR, can effectively exploit more structural features and more training samples, leading to higher prediction performance. The future availability of more X-ray crystal structures will further widen the performance gap between RF-based and MLR-based scoring functions. This further stresses the importance of substituting RF for MLR in scoring function development.

  10. Systematic review of treatment modalities for gingival depigmentation: a random-effects poisson regression analysis.

    Science.gov (United States)

    Lin, Yi Hung; Tu, Yu Kang; Lu, Chun Tai; Chung, Wen Chen; Huang, Chiung Fang; Huang, Mao Suan; Lu, Hsein Kun

    2014-01-01

    Repigmentation variably occurs with different treatment methods in patients with gingival pigmentation. A systematic review was conducted of various treatment modalities for eliminating melanin pigmentation of the gingiva, comprising bur abrasion, scalpel surgery, cryosurgery, electrosurgery, gingival grafts, and laser techniques, to compare the recurrence rates (Rrs) of these treatment procedures. Electronic databases, including PubMed, Web of Science, Google, and Medline, were comprehensively searched, and manual searches were conducted for studies published from January 1951 to June 2013. After applying inclusion and exclusion criteria, the final list of articles was reviewed in depth to achieve the objectives of this review. A Poisson regression was used to analyze the outcome of depigmentation using the various treatment methods. The systematic review was based mainly on case reports. In total, 61 eligible publications met the defined criteria. The various therapeutic procedures showed variable clinical results with a wide range of Rrs. A random-effects Poisson regression showed that cryosurgery (Rr = 0.32%), electrosurgery (Rr = 0.74%), and laser depigmentation (Rr = 1.16%) yielded superior results, whereas bur abrasion yielded the highest Rr (8.89%). Within the limit of the sampling level, the present evidence-based results show that cryosurgery exhibits the optimal predictability for depigmentation of the gingiva among all procedures examined, followed by electrosurgery and laser techniques. It is possible to treat melanin pigmentation of the gingiva with various methods and prevent repigmentation. Among those treatment modalities, cryosurgery, electrosurgery, and laser surgery appear to be the best choices for treating gingival pigmentation. © 2014 Wiley Periodicals, Inc.

  11. Bounds and Estimates for Transport Coefficients of Random and Porous Media with High Contrasts

    International Nuclear Information System (INIS)

    Berryman, J G

    2004-01-01

    Bounds on transport coefficients of random polycrystals of laminates are presented, including the well-known Hashin-Shtrikman bounds and some newly formulated bounds involving two formation factors for a two-component porous medium. Some new types of self-consistent estimates are then formulated based on the observed analytical structure both of these bounds and also of earlier self-consistent estimates (of the CPA or coherent potential approximation type). A numerical study is made, assuming first that the internal structure (i.e., the laminated grain structure) is not known, and then that it is known. The purpose of this aspect of the study is to attempt to quantify the differences in the predictions of properties of a system being modeled when such organized internal structure is present in the medium but detailed spatial correlation information may or (more commonly) may not be available. Some methods of estimating formation factors from data are also presented and then applied to a high-contrast fluid-permeability data set. Hashin-Shtrikman bounds are found to be very accurate estimates for low contrast heterogeneous media. But formation factor lower bounds are superior estimates for high contrast situations. The new self-consistent estimators also tend to agree better with data than either the bounds or the CPA estimates, which themselves tend to overestimate values for high contrast conducting composites.

  12. Analysis and computation of the elastic wave equation with random coefficients

    KAUST Repository

    Motamed, Mohammad

    2015-10-21

    We consider the stochastic initial-boundary value problem for the elastic wave equation with random coefficients and deterministic data. We propose a stochastic collocation method for computing statistical moments of the solution or statistics of some given quantities of interest. We study the convergence rate of the error in the stochastic collocation method. In particular, we show that, the rate of convergence depends on the regularity of the solution or the quantity of interest in the stochastic space, which is in turn related to the regularity of the deterministic data in the physical space and the type of the quantity of interest. We demonstrate that a fast rate of convergence is possible in two cases: for the elastic wave solutions with high regular data; and for some high regular quantities of interest even in the presence of low regular data. We perform numerical examples, including a simplified earthquake, which confirm the analysis and show that the collocation method is a valid alternative to the more traditional Monte Carlo sampling method for approximating quantities with high stochastic regularity.

  13. BOX-COX transformation and random regression models for fecal egg count data

    Directory of Open Access Journals (Sweden)

    Marcos Vinicius Silva

    2012-01-01

    Accurate genetic evaluation of livestock is based on appropriate modeling of phenotypic measurements. In ruminants, fecal egg count (FEC) is commonly used to measure resistance to nematodes. FEC values are not normally distributed and logarithmic transformations have been used to achieve normality before analysis. However, the transformed data are often not normally distributed, especially when data are extremely skewed. A series of repeated FEC measurements may provide information about the population dynamics of a group or individual. A total of 6,375 FEC measures were obtained for 410 animals between 1992 and 2003 from the Beltsville Agricultural Research Center Angus herd. Original data were transformed using an extension of the Box-Cox transformation to approach normality and to estimate (co)variance components. We also proposed using random regression models (RRM) for genetic and non-genetic studies of FEC. Phenotypes were analyzed using RRM and restricted maximum likelihood. Within the different orders of Legendre polynomials used, those with more parameters (order 4) adjusted FEC data best. Results indicated that the transformation of FEC data utilizing the Box-Cox transformation family was effective in reducing the skewness and kurtosis, and dramatically increased estimates of heritability, and measurements of FEC obtained in the period between 12 and 26 weeks in a 26-week experimental challenge period are genetically correlated.
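
    For reference, the classical one-parameter Box-Cox transformation is (y^λ - 1)/λ for λ ≠ 0 and log(y) for λ = 0. The sketch below implements that standard form with an optional shift so that zero counts can be handled, and compares sample skewness before and after a log-type transform. It is not the extension used in the study, whose details are not given in the abstract; the shift, the counts, and the function names are assumptions.

```python
import math


def box_cox(y, lam, shift=0.0):
    """Classical Box-Cox transform of y + shift (which must be positive)."""
    z = y + shift
    if z <= 0:
        raise ValueError("y + shift must be positive")
    if abs(lam) < 1e-12:
        return math.log(z)  # limiting case lambda -> 0
    return (z ** lam - 1.0) / lam


def skewness(values):
    """Sample skewness: third central moment over the cubed standard deviation."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical, strongly right-skewed egg counts (not data from the study).
counts = [0, 0, 50, 100, 150, 400, 2500]
transformed = [box_cox(c, lam=0.0, shift=1.0) for c in counts]
print(skewness(counts), skewness(transformed))
```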

  14. Multi-fidelity Gaussian process regression for prediction of random fields

    Energy Technology Data Exchange (ETDEWEB)

    Parussini, L. [Department of Engineering and Architecture, University of Trieste (Italy); Venturi, D., E-mail: venturi@ucsc.edu [Department of Applied Mathematics and Statistics, University of California Santa Cruz (United States); Perdikaris, P. [Department of Mechanical Engineering, Massachusetts Institute of Technology (United States); Karniadakis, G.E. [Division of Applied Mathematics, Brown University (United States)

    2017-05-01

    We propose a new multi-fidelity Gaussian process regression (GPR) approach for prediction of random fields based on observations of surrogate models or hierarchies of surrogate models. Our method builds upon recent work on recursive Bayesian techniques, in particular recursive co-kriging, and extends it to vector-valued fields and various types of covariances, including separable and non-separable ones. The framework we propose is general and can be used to perform uncertainty propagation and quantification in model-based simulations, multi-fidelity data fusion, and surrogate-based optimization. We demonstrate the effectiveness of the proposed recursive GPR techniques through various examples. Specifically, we study the stochastic Burgers equation and the stochastic Oberbeck–Boussinesq equations describing natural convection within a square enclosure. In both cases we find that the standard deviation of the Gaussian predictors as well as the absolute errors relative to benchmark stochastic solutions are very small, suggesting that the proposed multi-fidelity GPR approaches can yield highly accurate results.

  15. Box-Cox Transformation and Random Regression Models for Fecal egg Count Data.

    Science.gov (United States)

    da Silva, Marcos Vinícius Gualberto Barbosa; Van Tassell, Curtis P; Sonstegard, Tad S; Cobuci, Jaime Araujo; Gasbarre, Louis C

    2011-01-01

    Accurate genetic evaluation of livestock is based on appropriate modeling of phenotypic measurements. In ruminants, fecal egg count (FEC) is commonly used to measure resistance to nematodes. FEC values are not normally distributed and logarithmic transformations have been used in an effort to achieve normality before analysis. However, the transformed data are often still not normally distributed, especially when data are extremely skewed. A series of repeated FEC measurements may provide information about the population dynamics of a group or individual. A total of 6375 FEC measures were obtained for 410 animals between 1992 and 2003 from the Beltsville Agricultural Research Center Angus herd. Original data were transformed using an extension of the Box-Cox transformation to approach normality and to estimate (co)variance components. We also proposed using random regression models (RRM) for genetic and non-genetic studies of FEC. Phenotypes were analyzed using RRM and restricted maximum likelihood. Within the different orders of Legendre polynomials used, those with more parameters (order 4) adjusted FEC data best. Results indicated that the transformation of FEC data utilizing the Box-Cox transformation family was effective in reducing the skewness and kurtosis, and dramatically increased estimates of heritability, and measurements of FEC obtained in the period between 12 and 26 weeks in a 26-week experimental challenge period are genetically correlated.

  16. Microbiome Data Accurately Predicts the Postmortem Interval Using Random Forest Regression Models

    Directory of Open Access Journals (Sweden)

    Aeriel Belk

    2018-02-01

    Death investigations often include an effort to establish the postmortem interval (PMI) in cases in which the time of death is uncertain. The postmortem interval can lead to the identification of the deceased and the validation of witness statements and suspect alibis. Recent research has demonstrated that microbes provide an accurate clock that starts at death and relies on ecological change in the microbial communities that normally inhabit a body and its surrounding environment. Here, we explore how to build the most robust Random Forest regression models for prediction of PMI by testing models built on different sample types (gravesoil, skin of the torso, skin of the head), gene markers (16S ribosomal RNA (rRNA), 18S rRNA, internal transcribed spacer regions (ITS)), and taxonomic levels (sequence variants, species, genus, etc.). We also tested whether particular suites of indicator microbes were informative across different datasets. Generally, results indicate that the most accurate models for predicting PMI were built using gravesoil and skin data using the 16S rRNA genetic marker at the taxonomic level of phyla. Additionally, several phyla consistently contributed highly to model accuracy and may be candidate indicators of PMI.

  17. Multi-fidelity Gaussian process regression for prediction of random fields

    International Nuclear Information System (INIS)

    Parussini, L.; Venturi, D.; Perdikaris, P.; Karniadakis, G.E.

    2017-01-01

    We propose a new multi-fidelity Gaussian process regression (GPR) approach for prediction of random fields based on observations of surrogate models or hierarchies of surrogate models. Our method builds upon recent work on recursive Bayesian techniques, in particular recursive co-kriging, and extends it to vector-valued fields and various types of covariances, including separable and non-separable ones. The framework we propose is general and can be used to perform uncertainty propagation and quantification in model-based simulations, multi-fidelity data fusion, and surrogate-based optimization. We demonstrate the effectiveness of the proposed recursive GPR techniques through various examples. Specifically, we study the stochastic Burgers equation and the stochastic Oberbeck–Boussinesq equations describing natural convection within a square enclosure. In both cases we find that the standard deviation of the Gaussian predictors as well as the absolute errors relative to benchmark stochastic solutions are very small, suggesting that the proposed multi-fidelity GPR approaches can yield highly accurate results.

  18. Herd-specific random regression carcass profiles for beef cattle after adjustment for animal genetic merit.

    Science.gov (United States)

    Englishby, Tanya M; Moore, Kirsty L; Berry, Donagh P; Coffey, Mike P; Banos, Georgios

    2017-07-01

    Abattoir data are an important source of information for the genetic evaluation of carcass traits, but also for on-farm management purposes. The present study aimed to quantify the contribution of herd environment to beef carcass characteristics (weight, conformation score and fat score) with particular emphasis on generating finishing herd-specific profiles for these traits across different ages at slaughter. Abattoir records from 46,115 heifers and 78,790 steers aged between 360 and 900 days, and from 22,971 young bulls aged between 360 and 720 days, were analysed. Finishing herd-year and animal genetic (co)variance components for each trait were estimated using random regression models. Across slaughter age and gender, the ratio of finishing herd-year to total phenotypic variance ranged from 0.31 to 0.72 for carcass weight, 0.21 to 0.57 for carcass conformation and 0.11 to 0.44 for carcass fat score. These parameters indicate that the finishing herd environment is an important contributor to carcass trait variability and amenable to improvement with management practices. Copyright © 2017 Elsevier Ltd. All rights reserved.

  19. Regression Discontinuity and Randomized Controlled Trial Estimates: An Application to The Mycotic Ulcer Treatment Trials.

    Science.gov (United States)

    Oldenburg, Catherine E; Venkatesh Prajna, N; Krishnan, Tiruvengada; Rajaraman, Revathi; Srinivasan, Muthiah; Ray, Kathryn J; O'Brien, Kieran S; Glymour, M Maria; Porco, Travis C; Acharya, Nisha R; Rose-Nussbaumer, Jennifer; Lietman, Thomas M

    2018-08-01

    We compare results from regression discontinuity (RD) analysis to primary results of a randomized controlled trial (RCT) utilizing data from two contemporaneous RCTs for treatment of fungal corneal ulcers. Patients were enrolled in the Mycotic Ulcer Treatment Trials I and II (MUTT I & MUTT II) based on baseline visual acuity: patients with acuity ≤ 20/400 (logMAR 1.3) enrolled in MUTT I, and >20/400 in MUTT II. MUTT I investigated the effect of topical natamycin versus voriconazole on best spectacle-corrected visual acuity. MUTT II investigated the effect of topical voriconazole plus placebo versus topical voriconazole plus oral voriconazole. We compared the RD estimate (natamycin arm of MUTT I [N = 162] versus placebo arm of MUTT II [N = 54]) to the RCT estimate from MUTT I (topical natamycin [N = 162] versus topical voriconazole [N = 161]). In the RD, patients receiving natamycin had mean improvement of 4-lines of visual acuity at 3 months (logMAR -0.39, 95% CI: -0.61, -0.17) compared to topical voriconazole plus placebo, and 2-lines in the RCT (logMAR -0.18, 95% CI: -0.30, -0.05) compared to topical voriconazole. The RD and RCT estimates were similar, although the RD design overestimated effects compared to the RCT.
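
As a rough illustration of the regression discontinuity idea in this abstract, the sketch below fits a separate least-squares line on each side of a cutoff in a running variable and takes the difference of the two fitted values at the cutoff. This is a generic sharp-RD sketch, not the estimator used in the MUTT analysis; the function names and the noiseless synthetic data are hypothetical.

```python
def ols_line(xs, ys):
    """Intercept and slope of a simple least-squares line."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def rd_estimate(running, outcome, cutoff):
    """Jump in the fitted outcome at the cutoff (at/above minus below)."""
    centered = [r - cutoff for r in running]
    below = [(c, y) for c, y in zip(centered, outcome) if c < 0]
    above = [(c, y) for c, y in zip(centered, outcome) if c >= 0]
    # After centering, each intercept is the fitted value at the cutoff.
    a_below, _ = ols_line(*zip(*below))
    a_above, _ = ols_line(*zip(*above))
    return a_above - a_below

# Synthetic, noiseless data with a built-in jump of 3.0 at the cutoff.
running = [i / 10 for i in range(-10, 11)]
outcome = [1.0 + 2.0 * r + (3.0 if r >= 0 else 0.0) for r in running]
effect = rd_estimate(running, outcome, cutoff=0.0)
```

With real data the estimate depends on the bandwidth around the cutoff and on the functional form chosen on each side, which is one reason RD and RCT estimates can differ.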

  20. Comparative evaluation of left ventricular mass regression after aortic valve replacement: a prospective randomized analysis

    Directory of Open Access Journals (Sweden)

    Kiessling Arndt H

    2011-10-01

    Background: We assessed the hemodynamic performance of various prostheses and the clinical outcomes after aortic valve replacement in different age groups. Methods: One-hundred-and-twenty patients with isolated aortic valve stenosis were included in this prospective randomized trial and allocated in three age groups to receive either pulmonary autograft (PA, n = 20) or mechanical prosthesis (MP, Edwards Mira, n = 20) in group 1 (age 75. Clinical outcomes and hemodynamic performance were evaluated at discharge, six months and one year. Results: In group 1, patients with PA had significantly lower mean gradients than the MP (2.6 vs. 10.9 mmHg, p = 0.0005) with comparable left ventricular mass regression (LVMR). Morbidity included 1 stroke in the PA population and 1 gastrointestinal bleeding in the MP subgroup. In group 2, mean gradients did not differ significantly between both populations (7.0 vs. 8.9 mmHg, p = 0.81). The rate of LVMR and EF were comparable at 12 months; each group with one mortality. Morbidity included 1 stroke and 1 gastrointestinal bleeding in the stentless and 3 bleeding complications in the MP group. In group 3, mean gradients did not differ significantly (7.8 vs. 6.5 mmHg, p = 0.06). Postoperative EF and LVMR were comparable. There were 3 deaths in the stented group and no mortality in the stentless group. Morbidity included 1 endocarditis and 1 stroke in the stentless compared to 1 endocarditis, 1 stroke and one pulmonary embolism in the stented group. Conclusions: Clinical outcomes justify valve replacement with either valve substitute in the respective age groups. The PA hemodynamically outperformed the MPs. Stentless valves, however, did not demonstrate significantly superior hemodynamics or outcomes in comparison to stented bioprostheses or MPs.

  1. Comparing cluster-level dynamic treatment regimens using sequential, multiple assignment, randomized trials: Regression estimation and sample size considerations.

    Science.gov (United States)

    NeCamp, Timothy; Kilbourne, Amy; Almirall, Daniel

    2017-08-01

    Cluster-level dynamic treatment regimens can be used to guide sequential treatment decision-making at the cluster level in order to improve outcomes at the individual or patient-level. In a cluster-level dynamic treatment regimen, the treatment is potentially adapted and re-adapted over time based on changes in the cluster that could be impacted by prior intervention, including aggregate measures of the individuals or patients that compose it. Cluster-randomized sequential multiple assignment randomized trials can be used to answer multiple open questions preventing scientists from developing high-quality cluster-level dynamic treatment regimens. In a cluster-randomized sequential multiple assignment randomized trial, sequential randomizations occur at the cluster level and outcomes are observed at the individual level. This manuscript makes two contributions to the design and analysis of cluster-randomized sequential multiple assignment randomized trials. First, a weighted least squares regression approach is proposed for comparing the mean of a patient-level outcome between the cluster-level dynamic treatment regimens embedded in a sequential multiple assignment randomized trial. The regression approach facilitates the use of baseline covariates which is often critical in the analysis of cluster-level trials. Second, sample size calculators are derived for two common cluster-randomized sequential multiple assignment randomized trial designs for use when the primary aim is a between-dynamic treatment regimen comparison of the mean of a continuous patient-level outcome. The methods are motivated by the Adaptive Implementation of Effective Programs Trial which is, to our knowledge, the first-ever cluster-randomized sequential multiple assignment randomized trial in psychiatry.

  2. Monte Carlo Finite Volume Element Methods for the Convection-Diffusion Equation with a Random Diffusion Coefficient

    Directory of Open Access Journals (Sweden)

    Qian Zhang

    2014-01-01

    The paper presents a framework for the construction of a Monte Carlo finite volume element method (MCFVEM) for the convection-diffusion equation with a random diffusion coefficient, which is described as a random field. We first approximate the continuous stochastic field by a finite number of random variables via the Karhunen-Loève expansion and transform the initial stochastic problem into a deterministic one with a parameter in high dimensions. Then we generate independent identically distributed approximations of the solution by sampling the coefficient of the equation and employing a finite volume element variational formulation. Finally, the Monte Carlo (MC) method is used to compute corresponding sample averages. The statistical error is estimated analytically and experimentally. A quasi-Monte Carlo (QMC) technique with Sobol sequences is also used to accelerate convergence, and experiments indicate that it can improve the efficiency of the Monte Carlo method.
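
The sample-then-average workflow in this abstract can be sketched on a much simpler toy problem than the one the paper treats. The example below is an assumption-laden stand-in: it replaces the convection-diffusion equation and finite volume element solver with the 1D problem -(a u')' = 1 on (0, 1), u(0) = u(1) = 0, with a single random constant a ~ Uniform(1, 2), whose midpoint solution u(0.5) = 1 / (8a) is known exactly. A base-2 van der Corput sequence stands in for the Sobol points mentioned in the abstract.

```python
import math
import random

def u_mid(a):
    """Exact midpoint solution of -(a u')' = 1 on (0, 1) with u(0) = u(1) = 0."""
    return 1.0 / (8.0 * a)

def monte_carlo(n, seed=0):
    """Sample mean and standard error of u(0.5) for a ~ Uniform(1, 2)."""
    rng = random.Random(seed)
    samples = [u_mid(rng.uniform(1.0, 2.0)) for _ in range(n)]
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / (n - 1)
    return mean, math.sqrt(var / n)

def van_der_corput(i):
    """Base-2 radical inverse of the integer i (a 1D low-discrepancy point)."""
    x, f = 0.0, 0.5
    while i > 0:
        if i & 1:
            x += f
        i >>= 1
        f *= 0.5
    return x

def quasi_monte_carlo(n):
    """Average of u(0.5) over the first n van der Corput points."""
    return sum(u_mid(1.0 + van_der_corput(i)) for i in range(n)) / n

exact = math.log(2.0) / 8.0          # E[1 / (8a)] for a ~ Uniform(1, 2)
mc_mean, mc_se = monte_carlo(20000)  # plain Monte Carlo with its standard error
qmc_mean = quasi_monte_carlo(1024)   # low-discrepancy points instead of random ones
```

In this toy setting the statistical error of plain Monte Carlo decays like 1 / sqrt(n), while the low-discrepancy average converges faster for this smooth integrand, which mirrors the acceleration the abstract attributes to QMC.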

  3. An R package to compute commonality coefficients in the multiple regression case: an introduction to the package and a practical example.

    Science.gov (United States)

    Nimon, Kim; Lewis, Mitzi; Kane, Richard; Haynes, R Michael

    2008-05-01

    Multiple regression is a widely used technique for data analysis in social and behavioral research. The complexity of interpreting such results increases when correlated predictor variables are involved. Commonality analysis provides a method of determining the variance accounted for by respective predictor variables and is especially useful in the presence of correlated predictors. However, computing commonality coefficients is laborious. To make commonality analysis accessible to more researchers, a program was developed to automate the calculation of unique and common elements in commonality analysis, using the statistical package R. The program is described, and a heuristic example using data from the Holzinger and Swineford (1939) study, readily available in the MBESS R package, is presented.
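
For the two-predictor case, the unique and common elements mentioned in this abstract can be worked out directly from pairwise correlations. The sketch below is written in Python rather than R and does not reproduce the package's interface; the function names and the correlation values are hypothetical. It uses the standard identity for R² with two predictors and the usual commonality definitions: unique(x1) = R²(x1, x2) - R²(x2), unique(x2) = R²(x1, x2) - R²(x1), and common = R²(x1) + R²(x2) - R²(x1, x2).

```python
def r2_two_predictors(r_y1, r_y2, r_12):
    """R^2 of y regressed on x1 and x2, from the three pairwise correlations."""
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)

def commonality(r_y1, r_y2, r_12):
    """Unique and common variance components for two predictors."""
    r2_full = r2_two_predictors(r_y1, r_y2, r_12)
    unique_1 = r2_full - r_y2 ** 2           # gain from adding x1 to x2
    unique_2 = r2_full - r_y1 ** 2           # gain from adding x2 to x1
    common = r_y1 ** 2 + r_y2 ** 2 - r2_full  # variance shared by x1 and x2
    return {"unique_1": unique_1, "unique_2": unique_2,
            "common": common, "r2": r2_full}

# Made-up correlations: r(y, x1) = 0.6, r(y, x2) = 0.5, r(x1, x2) = 0.4.
parts = commonality(0.6, 0.5, 0.4)
```

The three components sum to the full-model R². With more predictors the number of components grows as 2^k - 1, which is why the abstract describes the manual computation as laborious.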

  4. Predicting attention-deficit/hyperactivity disorder severity from psychosocial stress and stress-response genes : A random forest regression approach

    NARCIS (Netherlands)

    Van Der Meer, D.; Hoekstra, P. J.; Van Donkelaar, M.; Bralten, J.; Oosterlaan, J.; Heslenfeld, D.; Faraone, S. V.; Franke, B.; Buitelaar, J. K.; Hartman, C. A.

    2017-01-01

    Identifying genetic variants contributing to attention-deficit/hyperactivity disorder (ADHD) is complicated by the involvement of numerous common genetic variants with small effects, interacting with each other as well as with environmental factors, such as stress exposure. Random forest regression

  5. Genetic Parameters for Body condition score, Body weight, Milk yield and Fertility estimated using random regression models

    NARCIS (Netherlands)

    Berry, D.P.; Buckley, F.; Dillon, P.; Evans, R.D.; Rath, M.; Veerkamp, R.F.

    2003-01-01

    Genetic (co)variances between body condition score (BCS), body weight (BW), milk yield, and fertility were estimated using a random regression animal model extended to multivariate analysis. The data analyzed included 81,313 BCS observations, 91,937 BW observations, and 100,458 milk test-day yields

  6. Predicting attention-deficit/hyperactivity disorder severity from psychosocial stress and stress-response genes : a random forest regression approach

    NARCIS (Netherlands)

    van der Meer, D.; Hoekstra, P. J.; van Donkelaar, Marjolein M. J.; Bralten, Janita; Oosterlaan, J; Heslenfeld, Dirk J.; Faraone, S. V.; Franke, B.; Buitelaar, J. K.; Hartman, C. A.

    2017-01-01

    Identifying genetic variants contributing to attention-deficit/hyperactivity disorder (ADHD) is complicated by the involvement of numerous common genetic variants with small effects, interacting with each other as well as with environmental factors, such as stress exposure. Random forest regression

  7. An Introduction to Recursive Partitioning: Rationale, Application, and Characteristics of Classification and Regression Trees, Bagging, and Random Forests

    Science.gov (United States)

    Strobl, Carolin; Malley, James; Tutz, Gerhard

    2009-01-01

    Recursive partitioning methods have become popular and widely used tools for nonparametric regression and classification in many scientific fields. Especially random forests, which can deal with large numbers of predictor variables even in the presence of complex interactions, have been applied successfully in genetics, clinical medicine, and…

  8. Quantitative structure-property relationship study of n-octanol-water partition coefficients of some of diverse drugs using multiple linear regression

    International Nuclear Information System (INIS)

    Ghasemi, Jahanbakhsh; Saaidpour, Saadi

    2007-01-01

    A quantitative structure-property relationship (QSPR) study was performed to develop models that relate the structures of 150 drug organic compounds to their n-octanol-water partition coefficients (log P(o/w)). Molecular descriptors were derived solely from 3D structures of the molecular drugs. A genetic algorithm was also applied as a variable selection tool in QSPR analysis. The models were constructed using 110 molecules as training set, and predictive ability was tested using 40 compounds. Modeling of log P(o/w) of these compounds as a function of the theoretically derived descriptors was established by multiple linear regression (MLR). Four descriptors for these compounds, molecular volume (MV) (geometrical), hydrophilic-lipophilic balance (HLB) (constitutional), hydrogen bond forming ability (HB) (electronic) and polar surface area (PSA) (electrostatic), are taken as inputs for the model. The use of descriptors calculated only from molecular structure eliminates the need for experimental determination of properties for use in the correlation and allows for the estimation of log P(o/w) for molecules not yet synthesized. Application of the developed model to a testing set of 40 drug organic compounds demonstrates that the model is reliable with good predictive accuracy and simple formulation. The prediction results are in good agreement with the experimental values. The root mean square error of prediction (RMSEP) and squared correlation coefficient (R²) for the MLR model were 0.22 and 0.99 for the prediction set log P(o/w)
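
The two summary statistics quoted at the end of this abstract, RMSEP and the squared correlation coefficient, can be computed as in the sketch below. The observed and predicted values are made up for illustration and are not taken from the study; the squared correlation is taken here as the squared Pearson correlation between observed and predicted values, which is an assumption about the authors' definition.

```python
import math

def rmsep(observed, predicted):
    """Root mean square error of prediction over a test set."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def squared_correlation(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(observed)
    mo = sum(observed) / n
    mp = sum(predicted) / n
    sop = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    soo = sum((o - mo) ** 2 for o in observed)
    spp = sum((p - mp) ** 2 for p in predicted)
    return sop ** 2 / (soo * spp)

# Hypothetical log P values for a four-compound test set.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
error = rmsep(observed, predicted)
r2 = squared_correlation(observed, predicted)
```

RMSEP is in the units of log P, so a value such as 0.22 reads as a typical prediction error of about a fifth of a log unit.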

  9. Prediction of octanol-water partition coefficients of organic compounds by multiple linear regression, partial least squares, and artificial neural network.

    Science.gov (United States)

    Golmohammadi, Hassan

    2009-11-30

    A quantitative structure-property relationship (QSPR) study was performed to develop models that relate the structure of 141 organic compounds to their octanol-water partition coefficients (log P(o/w)). A genetic algorithm was applied as a variable selection tool. Modeling of log P(o/w) of these compounds as a function of theoretically derived descriptors was established by multiple linear regression (MLR), partial least squares (PLS), and artificial neural network (ANN). The best selected descriptors that appear in the models are: atomic charge weighted partial positively charged surface area (PPSA-3), fractional atomic charge weighted partial positive surface area (FPSA-3), minimum atomic partial charge (Qmin), molecular volume (MV), total dipole moment of molecule (mu), maximum antibonding contribution of a molecule orbital in the molecule (MAC), and maximum free valency of a C atom in the molecule (MFV). The results obtained showed the ability of the developed artificial neural network to predict partition coefficients of organic compounds. Also, the results revealed the superiority of ANN over the MLR and PLS models. Copyright 2009 Wiley Periodicals, Inc.

  10. Variances in the projections, resulting from CLIMEX, Boosted Regression Trees and Random Forests techniques

    Science.gov (United States)

    Shabani, Farzin; Kumar, Lalit; Solhjouy-fard, Samaneh

    2017-08-01

    The aim of this study was to compare and evaluate the capabilities of correlative and mechanistic modeling processes applied to the projection of future distributions of date palm in novel environments, and to establish a method of minimizing uncertainty in the projections of differing techniques. The study area comprises Middle Eastern countries. We compared the mechanistic model CLIMEX (CL) with the correlative models MaxEnt (MX), Boosted Regression Trees (BRT), and Random Forests (RF) to project current and future distributions of date palm (Phoenix dactylifera L.). The Global Climate Model (GCM) CSIRO-Mk3.0 (CS), using the A2 emissions scenario, was selected for making projections. Both indigenous and alien distribution data of the species were utilized in the modeling process. The common areas predicted by MX, BRT, RF, and CL from the CS GCM were extracted and compared to ascertain projection uncertainty levels of each individual technique. The common areas identified by all four modeling techniques were used to produce a map indicating suitable and unsuitable areas for date palm cultivation for Middle Eastern countries, for the present and the year 2100. The four different modeling approaches predict fairly different distributions. Projections from CL were more conservative than from MX. The BRT and RF were the most conservative methods in terms of projections for the current time. The combination of the final CL and MX projections for the present and 2100 provides higher certainty concerning those areas that will become highly suitable for future date palm cultivation. According to the four models, cold, hot, and wet stress, with differences on a regional basis, appear to be the major restrictions on future date palm distribution. The results demonstrate variances in the projections, resulting from different techniques. The assessment and interpretation of model projections requires reservations

  11. Bayesian Nonparametric Regression Analysis of Data with Random Effects Covariates from Longitudinal Measurements

    KAUST Repository

    Ryu, Duchwan; Li, Erning; Mallick, Bani K.

    2010-01-01

    " approach and the regression calibration, via simulations and by an application that investigates the relationship between obesity in adulthood and childhood growth curves. © 2010, The International Biometric Society.

  12. Random regression models with different residual variance structures for describing litter size in swine (original title: Modelos de regressão aleatória com diferentes estruturas de variância residual para descrever o tamanho da leitegada)

    Directory of Open Access Journals (Sweden)

    Aderbal Cavalcante-Neto

    2011-12-01

    random-regression, single-characteristic animal model. The fixed and random regressions were represented by continuous functions over the farrowing order, adjusted by third-order Legendre orthogonal polynomials. To obtain the best modeling for the residual variance, variance heterogeneity was assumed by means of 1 to 7 classes of residual variance. The general analysis model included a contemporary group; the fixed regression coefficients for modeling the population's average trajectory; the random regression coefficients of the direct additive genetic effects both of the litter and of the animal's permanent environment; and the residual random effect. The likelihood-ratio test, Akaike's information criterion, and Schwarz's Bayesian information criterion indicated the model that considered variance homogeneity as being the one that provided the best adjustment to the data used. Overall, the heritabilities obtained were close to zero (0.002 to 0.006). Regarding the permanent environment proportion, different magnitudes were observed across farrowing orders: increasing from the 1st (0.06) to the 5th (0.28) order and decreasing from there to the 7th order (0.18). The common litter effect presented low values (from 0.01 to 0.02). The use of residual variance homogeneity was more suitable for modeling variances associated with the trait litter size at birth in this data set.

  13. Backward Stochastic Riccati Equations and Infinite Horizon L-Q Optimal Control with Infinite Dimensional State Space and Random Coefficients

    International Nuclear Information System (INIS)

    Guatteri, Giuseppina; Tessitore, Gianmario

    2008-01-01

    We study the Riccati equation arising in a class of quadratic optimal control problems with infinite dimensional stochastic differential state equation and infinite horizon cost functional. We allow the coefficients, both in the state equation and in the cost, to be random. In such a context backward stochastic Riccati equations are backward stochastic differential equations on the whole positive real axis that involve quadratic non-linearities and take values in a non-Hilbertian space. We prove existence of a minimal non-negative solution and, under additional assumptions, its uniqueness. We show that such a solution allows one to perform the synthesis of the optimal control and investigate its attractivity properties. Finally, the case where the coefficients are stationary is addressed and an example concerning a controlled wave equation in random media is proposed.

  14. Genetic analysis of body weights of individually fed beef bulls in South Africa using random regression models.

    Science.gov (United States)

    Selapa, N W; Nephawe, K A; Maiwashe, A; Norris, D

    2012-02-08

    The aim of this study was to estimate genetic parameters for body weights of individually fed beef bulls measured at centralized testing stations in South Africa using random regression models. Weekly body weights of Bonsmara bulls (N = 2919) tested between 1999 and 2003 were available for the analyses. The model included a fixed regression of the body weights on fourth-order orthogonal Legendre polynomials of the actual days on test (7, 14, 21, 28, 35, 42, 49, 56, 63, 70, 77, and 84) for starting age and contemporary group effects. Random regressions on fourth-order orthogonal Legendre polynomials of the actual days on test were included for additive genetic effects and additional uncorrelated random effects of the weaning-herd-year and the permanent environment of the animal. Residual effects were assumed to be independently distributed with heterogeneous variance for each test day. Variance ratios for additive genetic, permanent environment and weaning-herd-year for weekly body weights at different test days ranged from 0.26 to 0.29, 0.37 to 0.44 and 0.26 to 0.34, respectively. The weaning-herd-year was found to have a significant effect on the variation of body weights of bulls despite a 28-day adjustment period. Genetic correlations amongst body weights at different test days were high, ranging from 0.89 to 1.00. Heritability estimates were comparable to literature estimates obtained using multivariate models. Therefore, random regression models could be applied in the genetic evaluation of body weight of individually fed beef bulls in South Africa.
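
The Legendre polynomial covariates that such random regression models regress on can be built as in the following sketch. It standardizes the days on test listed in the abstract (7 to 84) to the interval [-1, 1] and evaluates unnormalized Legendre polynomials with the standard three-term recurrence. Two points are assumptions: "fourth-order" is read here as polynomials up to degree 4, and no normalization constant is applied, although some applications rescale each polynomial.

```python
def standardize(t, t_min, t_max):
    """Map t from [t_min, t_max] onto [-1, 1], the domain of Legendre polynomials."""
    return 2.0 * (t - t_min) / (t_max - t_min) - 1.0

def legendre_basis(x, degree):
    """Values P_0(x), ..., P_degree(x) from the three-term recurrence."""
    values = [1.0]
    if degree >= 1:
        values.append(x)
    for n in range(1, degree):
        # (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x)
        values.append(((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1))
    return values

days = list(range(7, 85, 7))  # 7, 14, ..., 84 days on test
design = [legendre_basis(standardize(d, 7, 84), 4) for d in days]
```

Each row of `design` holds the covariates for one test day; in a random regression model the same rows multiply the fixed regression coefficients and each animal's random regression coefficients.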

  15. Random Forest as a Predictive Analytics Alternative to Regression in Institutional Research

    Science.gov (United States)

    He, Lingjun; Levine, Richard A.; Fan, Juanjuan; Beemer, Joshua; Stronach, Jeanne

    2018-01-01

    In institutional research, modern data mining approaches are seldom considered to address predictive analytics problems. The goal of this paper is to highlight the advantages of tree-based machine learning algorithms over classic (logistic) regression methods for data-informed decision making in higher education problems, and stress the success of…

  16. Appropriate assessment of neighborhood effects on individual health: integrating random and fixed effects in multilevel logistic regression

    DEFF Research Database (Denmark)

    Larsen, Klaus; Merlo, Juan

    2005-01-01

    The logistic regression model is frequently used in epidemiologic studies, yielding odds ratio or relative risk interpretations. Inspired by the theory of linear normal models, the logistic regression model has been extended to allow for correlated responses by introducing random effects. However......, the model does not inherit the interpretational features of the normal model. In this paper, the authors argue that the existing measures are unsatisfactory (and some of them are even improper) when quantifying results from multilevel logistic regression analyses. The authors suggest a measure...... of heterogeneity, the median odds ratio, that quantifies cluster heterogeneity and facilitates a direct comparison between covariate effects and the magnitude of heterogeneity in terms of well-known odds ratios. Quantifying cluster-level covariates in a meaningful way is a challenge in multilevel logistic...
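
The median odds ratio proposed in this abstract is usually stated as MOR = exp(sqrt(2 * sigma²) * z), where sigma² is the cluster-level variance on the log-odds scale and z is the 75th percentile of the standard normal distribution (about 0.6745). The formula itself is not given in the truncated abstract above, so the sketch below rests on that commonly cited form; the function name is hypothetical.

```python
import math
from statistics import NormalDist

def median_odds_ratio(cluster_variance):
    """MOR from the random-intercept variance of a multilevel logistic model."""
    z75 = NormalDist().inv_cdf(0.75)  # 75th percentile of N(0, 1), about 0.6745
    return math.exp(math.sqrt(2.0 * cluster_variance) * z75)

mor_none = median_odds_ratio(0.0)  # no between-cluster variation
mor_unit = median_odds_ratio(1.0)  # cluster variance of 1 on the log-odds scale
```

A MOR of 1 means no cluster heterogeneity. Because the measure is on the odds ratio scale, it can be set side by side with the odds ratios of individual covariates, which is the comparison the abstract highlights.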

  17. Evaluating an Organizational-Level Occupational Health Intervention in a Combined Regression Discontinuity and Randomized Control Design.

    Science.gov (United States)

    Sørensen, By Ole H

    2016-10-01

    Organizational-level occupational health interventions have great potential to improve employees' health and well-being. However, they often compare unfavourably to individual-level interventions. This calls for improving methods for designing, implementing and evaluating organizational interventions. This paper presents and discusses the regression discontinuity design because, like the randomized control trial, it is a strong summative experimental design, but it typically fits organizational-level interventions better. The paper explores advantages and disadvantages of a regression discontinuity design with an embedded randomized control trial. It provides an example from an intervention study focusing on reducing sickness absence in 196 preschools. The paper demonstrates that such a design fits the organizational context, because it allows management to focus on organizations or workgroups with the most salient problems. In addition, organizations may accept an embedded randomized design because the organizations or groups with most salient needs receive obligatory treatment as part of the regression discontinuity design. Copyright © 2016 John Wiley & Sons, Ltd.

  18. Application of QMC methods to PDEs with random coefficients : a survey of analysis and implementation

    KAUST Repository

    Kuo, Frances; Dick, Josef; Le Gia, Thong; Nichols, James; Sloan, Ian; Graham, Ivan; Scheichl, Robert; Nuyens, Dirk; Schwab, Christoph

    2016-01-01

    have been written on this topic using a variety of methods. QMC methods are relatively new to this application area. I will consider different models for the randomness (uniform versus lognormal) and contrast different QMC algorithms (single-level

  19. 3D statistical shape models incorporating 3D random forest regression voting for robust CT liver segmentation

    Science.gov (United States)

    Norajitra, Tobias; Meinzer, Hans-Peter; Maier-Hein, Klaus H.

    2015-03-01

    During image segmentation, 3D Statistical Shape Models (SSM) usually conduct a limited search for target landmarks within one-dimensional search profiles perpendicular to the model surface. In addition, landmark appearance is modeled only locally based on linear profiles and weak learners, altogether leading to segmentation errors from landmark ambiguities and limited search coverage. We present a new method for 3D SSM segmentation based on 3D Random Forest Regression Voting. For each surface landmark, a Random Regression Forest is trained that learns a 3D spatial displacement function between the corresponding reference landmark and a set of surrounding sample points, based on an infinite set of non-local randomized 3D Haar-like features. Landmark search is then conducted omni-directionally within 3D search spaces, where voxelwise forest predictions on landmark position contribute to a common voting map which reflects the overall position estimate. Segmentation experiments were conducted on a set of 45 CT volumes of the human liver, of which 40 images were randomly chosen for training and 5 for testing. Without parameter optimization, using a simple candidate selection and a single resolution approach, excellent results were achieved, while faster convergence and better concavity segmentation were observed, altogether underlining the potential of our approach in terms of increased robustness from distinct landmark detection and from better search coverage.

  20. Vertical random variability of the distribution coefficient in the soil and its effect on the migration of fallout radionuclides

    International Nuclear Information System (INIS)

    Bunzl, K.

    2002-01-01

    In the field, the distribution coefficient, K_d, for the sorption of a radionuclide by the soil cannot be expected to be constant. Even in a well-defined soil horizon, K_d will vary stochastically in the horizontal as well as in the vertical direction around a mean value. While the horizontal random variability of K_d produces a pronounced tailing effect in the concentration depth profile of a fallout radionuclide, much less is known about the corresponding effect of the vertical random variability. To analyze this effect theoretically, the classical convection-dispersion model in combination with the random-walk particle method was applied. The concentration depth profile of a radionuclide was calculated one year after deposition for two cases: (i) assuming constant values of the pore water velocity, the diffusion/dispersion coefficient, and the distribution coefficient (K_d = 100 cm^3 g^-1), and (ii) allowing K_d to vary vertically according to a log-normal distribution with a geometric mean of 100 cm^3 g^-1 and a coefficient of variation (CV) of 0.53. The results show that these two concentration depth profiles are only slightly different: the location of the peak is shifted somewhat upwards, and the dispersion of the concentration depth profile is slightly larger. A substantial tailing effect of the concentration depth profile is not perceivable. Especially with respect to the location of the peak, a very good approximation of the concentration depth profile is obtained if the arithmetic mean of the K_d values (K_d = 113 cm^3 g^-1) and a slightly increased dispersion coefficient are used in the analytical solution of the classical convection-dispersion equation with constant K_d. The evaluation of the observed concentration depth profile with the analytical solution of the classical convection-dispersion equation with constant parameters will, within the usual experimental limits, hardly reveal the presence of a log-normal random distribution of K_d in the vertical direction in
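
The abstract quotes a geometric mean of 100 and an arithmetic mean of 113 for the same log-normal distribution. The sketch below checks that the two figures are consistent, using the standard log-normal identities sigma^2 = ln(1 + CV^2) and arithmetic mean = geometric mean * sqrt(1 + CV^2). The function names are illustrative and are not taken from the paper.

```python
from math import exp, log, sqrt


def lognormal_params(geometric_mean: float, cv: float) -> tuple[float, float]:
    """Return (mu, sigma) of ln(X) for a log-normal X.

    mu is the log of the geometric mean; sigma follows from the
    coefficient of variation via sigma^2 = ln(1 + CV^2).
    """
    mu = log(geometric_mean)
    sigma = sqrt(log(1.0 + cv ** 2))
    return mu, sigma


def lognormal_arithmetic_mean(geometric_mean: float, cv: float) -> float:
    """Arithmetic mean of a log-normal variable: exp(mu + sigma^2 / 2)."""
    mu, sigma = lognormal_params(geometric_mean, cv)
    return exp(mu + 0.5 * sigma ** 2)


if __name__ == "__main__":
    # Values quoted in the abstract: geometric mean 100, CV 0.53.
    print(round(lognormal_arithmetic_mean(100.0, 0.53), 1))
```

With the abstract's inputs the result rounds to 113, matching the arithmetic mean reported for the K_d values.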

  1. Gaussian Mixture Random Coefficient model based framework for SHM in structures with time-dependent dynamics under uncertainty

    Science.gov (United States)

    Avendaño-Valencia, Luis David; Fassois, Spilios D.

    2017-12-01

    The problem of vibration-based damage diagnosis in structures characterized by time-dependent dynamics under significant environmental and/or operational uncertainty is considered. A stochastic framework consisting of a Gaussian Mixture Random Coefficient model of the uncertain time-dependent dynamics under each structural health state, proper estimation methods, and Bayesian or minimum distance type decision making, is postulated. The Random Coefficient (RC) time-dependent stochastic model with coefficients following a multivariate Gaussian Mixture Model (GMM) allows for significant flexibility in uncertainty representation. Certain of the model parameters are estimated via a simple procedure which is founded on the related Multiple Model (MM) concept, while the GMM weights are explicitly estimated for optimizing damage diagnostic performance. The postulated framework is demonstrated via damage detection in a simple simulated model of a quarter-car active suspension with time-dependent dynamics and considerable uncertainty on the payload. Comparisons with a simpler Gaussian RC model based method are also presented, with the postulated framework shown to be capable of offering considerable improvement in diagnostic performance.

  2. Auto Regressive Moving Average (ARMA) Modeling Method for Gyro Random Noise Using a Robust Kalman Filter

    Science.gov (United States)

    Huang, Lei

    2015-01-01

    To solve the problem that conventional ARMA modeling methods for gyro random noise require a large number of samples and converge slowly, an ARMA modeling method using robust Kalman filtering is developed. The ARMA model parameters are employed as state arguments. Unknown time-varying estimators of observation noise are used to obtain the estimated mean and variance of the observation noise. Using the robust Kalman filtering, the ARMA model parameters are estimated accurately. The developed ARMA modeling method has the advantages of rapid convergence and high accuracy. Thus, the required sample size is reduced. It can be applied to modeling applications for gyro random noise in which a fast and accurate ARMA modeling method is required. PMID:26437409

  3. Polynomial Chaos Expansion of Random Coefficients and the Solution of Stochastic Partial Differential Equations in the Tensor Train Format

    KAUST Repository

    Dolgov, Sergey

    2015-11-03

    We apply the tensor train (TT) decomposition to construct the tensor product polynomial chaos expansion (PCE) of a random field, to solve the stochastic elliptic diffusion PDE with the stochastic Galerkin discretization, and to compute some quantities of interest (mean, variance, and exceedance probabilities). We assume that the random diffusion coefficient is given as a smooth transformation of a Gaussian random field. In this case, the PCE is delivered by a complicated formula, which lacks an analytic TT representation. To construct its TT approximation numerically, we develop the new block TT cross algorithm, a method that computes the whole TT decomposition from a few evaluations of the PCE formula. The new method is conceptually similar to the adaptive cross approximation in the TT format but is more efficient when several tensors must be stored in the same TT representation, which is the case for the PCE. In addition, we demonstrate how to assemble the stochastic Galerkin matrix and to compute the solution of the elliptic equation and its postprocessing, staying in the TT format. We compare our technique with the traditional sparse polynomial chaos and the Monte Carlo approaches. In the tensor product polynomial chaos, the polynomial degree is bounded for each random variable independently. This provides higher accuracy than the sparse polynomial set or the Monte Carlo method, but the cardinality of the tensor product set grows exponentially with the number of random variables. However, when the PCE coefficients are implicitly approximated in the TT format, the computations with the full tensor product polynomial set become possible. In the numerical experiments, we confirm that the new methodology is competitive in a wide range of parameters, especially where high accuracy and high polynomial degrees are required.
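
To make the abstract's remark on cardinality concrete, the sketch below counts basis polynomials under two standard conventions: a full tensor product set with degree at most p in each of d variables has (p + 1)^d members, while a total-degree set (one common form of sparse polynomial set) has C(d + p, p) members. The function names and the chosen values of d and p are illustrative assumptions; the paper's own sparse set may be defined differently.

```python
from math import comb


def tensor_product_size(num_vars: int, max_degree: int) -> int:
    """Number of multi-indices with each component in 0..max_degree."""
    return (max_degree + 1) ** num_vars


def total_degree_size(num_vars: int, max_degree: int) -> int:
    """Number of multi-indices whose components sum to at most max_degree."""
    return comb(num_vars + max_degree, max_degree)


if __name__ == "__main__":
    degree = 3  # hypothetical polynomial degree
    for d in (2, 5, 10, 20):  # hypothetical numbers of random variables
        print(d, tensor_product_size(d, degree), total_degree_size(d, degree))
```

The tensor product count grows exponentially in the number of variables, which is why storing its coefficients explicitly quickly becomes infeasible and a compressed representation such as the TT format is attractive.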

  4. A Correction of Random Incidence Absorption Coefficients for the Angular Distribution of Acoustic Energy under Measurement Conditions

    DEFF Research Database (Denmark)

    Jeong, Cheol-Ho

    2009-01-01

    Most acoustic measurements are based on an assumption of ideal conditions. One such ideal condition is a diffuse and reverberant field. In practice, a perfectly diffuse sound field cannot be achieved in a reverberation chamber. Uneven incident energy density under measurement conditions can cause...... discrepancies between the measured value and the theoretical random incidence absorption coefficient. Therefore the angular distribution of the incident acoustic energy onto an absorber sample should be taken into account. The angular distribution of the incident energy density was simulated using the beam...... tracing method for various room shapes and source positions. The averaged angular distribution is found to be similar to a Gaussian distribution. As a result, an angle-weighted absorption coefficient was proposed by considering the angular energy distribution to improve the agreement between...

  5. Implementation of optimal Galerkin and Collocation approximations of PDEs with Random Coefficients

    KAUST Repository

    Beck, Joakim

    2011-12-22

    In this work we first focus on the Stochastic Galerkin approximation of the solution u of an elliptic stochastic PDE. We rely on sharp estimates for the decay of the coefficients of the spectral expansion of u on orthogonal polynomials to build a sequence of polynomial subspaces that features better convergence properties compared to standard polynomial subspaces such as Total Degree or Tensor Product. We consider then the Stochastic Collocation method, and use the previous estimates to introduce a new effective class of Sparse Grids, based on the idea of selecting a priori the most profitable hierarchical surpluses, that, again, features better convergence properties compared to standard Smolyak or tensor product grids.

  6. Regression models to predict the behavior of the coefficient of friction of AISI 316L on UHMWPE under ISO 14243-3 conditions.

    Science.gov (United States)

    Garcia-Garcia, A L; Alvarez-Vera, M; Montoya-Santiyanes, L A; Dominguez-Lopez, I; Montes-Seguedo, J L; Sosa-Savedra, J C; Barceinas-Sanchez, J D O

    2018-06-01

    Friction is the natural response of all tribosystems. In a total knee replacement (TKR) prosthetic device, its measurement is hindered by the complex geometry of its integrating parts and that of the testing simulation rig operating under the ISO 14243-3:2014 standard. To develop prediction models of the coefficient of friction (COF) between AISI 316L steel and ultra-high molecular weight polyethylene (UHMWPE) lubricated with fetal bovine serum dilutions, the arthrokinematics and loading conditions prescribed by the ISO 14243-3:2014 standard were translated to a simpler geometrical setup, via Hertz contact theory. Tribological testing proceeded by loading a stainless steel AISI 316L ball against the surface of a UHMWPE disk, with the test fluid at 37 °C. The method has been applied to study the behavior of the COF during a whole walking cycle. On the other hand, the role of protein aggregation phenomena as a lubrication mechanism has been extensively studied in hip joint replacements but little explored for the operating conditions of a TKR. Lubricant testing fluids were prepared with fetal bovine serum (FBS) dilutions having protein mass concentrations of 5, 10, 20 and 36 g/L. The results were contrasted against deionized, sterilized water. The results indicate that even at protein concentration as low as 5 g/L, protein aggregation phenomena play an important role in the lubrication of the metal-on-polymer tribopair. The regression models of the COF developed herein are available for numerical simulations of the tribological behavior of the aforementioned tribosystem. In this case, surface stress rather than film thickness should be considered. Copyright © 2018 Elsevier Ltd. All rights reserved.

  7. Random regression models to estimate genetic parameters for milk production of Guzerat cows using orthogonal Legendre polynomials

    Directory of Open Access Journals (Sweden)

    Maria Gabriela Campolina Diniz Peixoto

    2014-05-01

    The objective of this work was to compare random regression models for the estimation of genetic parameters for Guzerat milk production, using orthogonal Legendre polynomials. Records (20,524) of test-day milk yield (TDMY) from 2,816 first-lactation Guzerat cows were used. TDMY grouped into 10 monthly classes were analyzed for additive genetic, permanent environmental, and residual effects (random effects), whereas the contemporary group, calving age (linear and quadratic effects), and mean lactation curve were analyzed as fixed effects. Trajectories for the additive genetic and permanent environmental effects were modeled by means of a covariance function employing orthogonal Legendre polynomials ranging from the second to the fifth order. Residual variances were considered in one, four, six, or ten variance classes. The best model had six residual variance classes. The heritability estimates for the TDMY records varied from 0.19 to 0.32. The random regression model that used a second-order Legendre polynomial for the additive genetic effect and a fifth-order polynomial for the permanent environmental effect is adequate according to the main criteria employed for comparison. The model with a second-order Legendre polynomial for the additive genetic effect and a fourth-order polynomial for the permanent environmental effect could also be employed in these analyses.
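
Random regression models of this kind use Legendre polynomials of standardized time as covariates. The sketch below builds one row of such covariates, under two assumptions that the abstract does not state: days in milk are mapped linearly onto [-1, 1], and the polynomials are normalized by sqrt((2k + 1) / 2), a convention often used in the random regression literature. The range of 5 to 305 days and all function names are hypothetical.

```python
from math import sqrt


def standardize(t: float, t_min: float, t_max: float) -> float:
    """Map t from [t_min, t_max] linearly onto [-1, 1]."""
    return 2.0 * (t - t_min) / (t_max - t_min) - 1.0


def normalized_legendre(x: float, degree: int) -> list[float]:
    """Normalized Legendre polynomials of degree 0..degree evaluated at x.

    Uses the three-term recurrence
    (k + 1) P_{k+1}(x) = (2k + 1) x P_k(x) - k P_{k-1}(x),
    then scales P_k by sqrt((2k + 1) / 2).
    """
    p = [1.0, x]
    for k in range(1, degree):
        p.append(((2 * k + 1) * x * p[k] - k * p[k - 1]) / (k + 1))
    return [sqrt((2 * k + 1) / 2.0) * p[k] for k in range(degree + 1)]


if __name__ == "__main__":
    # Hypothetical test days within an assumed 5 to 305 day lactation.
    for dim in (5, 155, 305):
        x = standardize(dim, 5, 305)
        print(dim, [round(v, 4) for v in normalized_legendre(x, 2)])
```

Each animal's additive genetic and permanent environmental trajectories are then modeled as linear combinations of these covariates with animal-specific random coefficients. Note that "order" in the abstract may count coefficients rather than polynomial degree, so the degree argument here should be matched to the convention of the software in use.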

  8. Modeling urban coastal flood severity from crowd-sourced flood reports using Poisson regression and Random Forest

    Science.gov (United States)

    Sadler, J. M.; Goodall, J. L.; Morsy, M. M.; Spencer, K.

    2018-04-01

    Sea level rise has already caused more frequent and severe coastal flooding and this trend will likely continue. Flood prediction is an essential part of a coastal city's capacity to adapt to and mitigate this growing problem. Complex coastal urban hydrological systems, however, do not always lend themselves easily to physically-based flood prediction approaches. This paper presents a method for using a data-driven approach to estimate flood severity in an urban coastal setting using crowd-sourced data, a non-traditional but growing data source, along with environmental observation data. Two data-driven models, Poisson regression and Random Forest regression, are trained to predict the number of flood reports per storm event as a proxy for flood severity, given extensive environmental data (i.e., rainfall, tide, groundwater table level, and wind conditions) as input. The method is demonstrated using data from Norfolk, Virginia USA from September 2010 to October 2016. Quality-controlled, crowd-sourced street flooding reports ranging from 1 to 159 per storm event for 45 storm events are used to train and evaluate the models. Random Forest performed better than Poisson regression at predicting the number of flood reports and had a lower false negative rate. From the Random Forest model, total cumulative rainfall was by far the most dominant input variable in predicting flood severity, followed by low tide and lower low tide. These methods serve as a first step toward using data-driven methods for spatially and temporally detailed coastal urban flood prediction.

  9. Evaluation of random errors in Williams’ series coefficients obtained with digital image correlation

    International Nuclear Information System (INIS)

    Lychak, Oleh V; Holyns’kiy, Ivan S

    2016-01-01

    The use of the Williams’ series parameters for fracture analysis requires valid information about their error values. The aim of this investigation is the development of a method for estimating the standard deviation of random errors of the Williams’ series parameters obtained from the measured components of the stress field. Criteria for choosing the optimal number of terms in the truncated Williams’ series, so that its parameters are derived with minimal errors, are also proposed. The method was used to evaluate the Williams’ parameters obtained from data measured by the digital image correlation technique in a test of a three-point bending specimen. (paper)

  10. Land surface temperature downscaling using random forest regression: primary result and sensitivity analysis

    Science.gov (United States)

    Pan, Xin; Cao, Chen; Yang, Yingbao; Li, Xiaolong; Shan, Liangliang; Zhu, Xi

    2018-04-01

    The land surface temperature (LST) derived from thermal infrared satellite images is a meaningful variable in many remote sensing applications. However, at present, the spatial resolution of satellite thermal infrared sensors is too coarse to meet application needs. In this study, an LST image was downscaled by a random forest model relating LST to multiple predictors in an arid region with an oasis-desert ecotone. The proposed downscaling approach was evaluated using LST derived from the MODIS LST product of Zhangye City in Heihe Basin. The primary result of LST downscaling showed that the distribution of downscaled LST matched that of the oasis and desert ecosystems. According to the sensitivity analysis, the most sensitive factors to LST downscaling were modified normalized difference water index (MNDWI)/normalized multi-band drought index (NMDI), soil adjusted vegetation index (SAVI)/shortwave infrared reflectance (SWIR)/normalized difference vegetation index (NDVI), normalized difference building index (NDBI)/SAVI and SWIR/NDBI/MNDWI/NDWI for the regions of water, vegetation, building and desert, with LST variation (at most) of 0.20/-0.22 K, 0.92/0.62/0.46 K, 0.28/-0.29 K and 3.87/-1.53/-0.64/-0.25 K under predictor perturbations of +/-0.02, respectively.

  11. The relationship between multilevel models and non-parametric multilevel mixture models: Discrete approximation of intraclass correlation, random coefficient distributions, and residual heteroscedasticity.

    Science.gov (United States)

    Rights, Jason D; Sterba, Sonya K

    2016-11-01

    Multilevel data structures are common in the social sciences. Often, such nested data are analysed with multilevel models (MLMs) in which heterogeneity between clusters is modelled by continuously distributed random intercepts and/or slopes. Alternatively, the non-parametric multilevel regression mixture model (NPMM) can accommodate the same nested data structures through discrete latent class variation. The purpose of this article is to delineate analytic relationships between NPMM and MLM parameters that are useful for understanding the indirect interpretation of the NPMM as a non-parametric approximation of the MLM, with relaxed distributional assumptions. We define how seven standard and non-standard MLM specifications can be indirectly approximated by particular NPMM specifications. We provide formulas showing how the NPMM can serve as an approximation of the MLM in terms of intraclass correlation, random coefficient means and (co)variances, heteroscedasticity of residuals at level 1, and heteroscedasticity of residuals at level 2. Further, we discuss how these relationships can be useful in practice. The specific relationships are illustrated with simulated graphical demonstrations, and direct and indirect interpretations of NPMM classes are contrasted. We provide an R function to aid in implementing and visualizing an indirect interpretation of NPMM classes. An empirical example is presented and future directions are discussed. © 2016 The British Psychological Society.

  12. Evaluation Procedures of Random Uncertainties in Theoretical Calculations of Cross Sections and Rate Coefficients

    International Nuclear Information System (INIS)

    Kokoouline, V.; Richardson, W.

    2014-01-01

    Uncertainties in theoretical calculations may include: • Systematic uncertainty: Due to applicability limits of the chosen model. • Random: Within a model, uncertainties of model parameters result in uncertainties of final results (such as cross sections). • If uncertainties of experimental and theoretical data are known, for the purpose of data evaluation (to produce recommended data), one should combine two data sets to produce the best guess data with the smallest possible uncertainty. In many situations, it is possible to assess the accuracy of theoretical calculations because theoretical models usually rely on parameters that are uncertain, but not completely random, i.e. the uncertainties of the parameters of the models are approximately known. If there are one or several such parameters with corresponding uncertainties, even if some or all parameters are correlated, the above approach gives a conceptually simple way to calculate uncertainties of final cross sections (uncertainty propagation). Numerically, the statistical approach to the uncertainty propagation could be computationally expensive. However, in situations, where uncertainties are considered to be as important as the actual cross sections (for data validation or benchmark calculations, for example), such a numerical effort is justified. Having data from different sources (say, from theory and experiment), a systematic statistical approach allows one to compare the data and produce “unbiased” evaluated data with improved uncertainties, if uncertainties of initial data from different sources are available. Without uncertainties, the data evaluation/validation becomes impossible. This is the reason why theoreticians should assess the accuracy of their calculations in one way or another. A statistical and systematic approach, similar to the described above, is preferable.
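
The statistical uncertainty propagation described in the abstract can be illustrated with a small Monte Carlo sketch: sample the uncertain model parameters, possibly correlated, run the model on each sample, and summarize the spread of the outputs. Everything specific here is an assumption for illustration: the two-parameter Gaussian setup, the function `propagate`, and the toy model in the demo are hypothetical and are not the authors' procedure.

```python
import random
from math import exp, sqrt
from statistics import mean, stdev


def propagate(model, means, sds, rho=0.0, n=20000, seed=0):
    """Monte Carlo propagation for a model of two Gaussian parameters.

    means and sds are pairs (a, b); rho is the assumed correlation
    between the two parameters. Returns (mean, standard deviation)
    of the sampled model outputs.
    """
    rng = random.Random(seed)
    (m1, m2), (s1, s2) = means, sds
    outputs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        a = m1 + s1 * z1
        # Mix z1 into the second draw to induce correlation rho.
        b = m2 + s2 * (rho * z1 + sqrt(1.0 - rho * rho) * z2)
        outputs.append(model(a, b))
    return mean(outputs), stdev(outputs)


if __name__ == "__main__":
    # Hypothetical toy "cross section" depending on two uncertain parameters.
    toy_model = lambda a, b: a * exp(-b)
    print(propagate(toy_model, means=(1.0, 0.5), sds=(0.05, 0.1), rho=0.3))
```

Each sample costs one full model evaluation, which is why the abstract warns that this approach can be computationally expensive when the underlying calculation is heavy.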

  13. A Logistic Regression Model with a Hierarchical Random Error Term for Analyzing the Utilization of Public Transport

    Directory of Open Access Journals (Sweden)

    Chong Wei

    2015-01-01

    Logistic regression models have been widely used in previous studies to analyze public transport utilization. These studies have shown travel time to be an indispensable variable for such analysis and usually consider it to be a deterministic variable. This formulation does not allow us to capture travelers’ perception error regarding travel time, and recent studies have indicated that this error can have a significant effect on modal choice behavior. In this study, we propose a logistic regression model with a hierarchical random error term. The proposed model adds a new random error term for the travel time variable. This term structure enables us to investigate travelers’ perception error regarding travel time from a given choice behavior dataset. We also propose an extended model that allows constraining the sign of this error in the model. We develop two Gibbs samplers to estimate the basic hierarchical model and the extended model. The performance of the proposed models is examined using a well-known dataset.

  14. Genetic parameters for body condition score, body weight, milk yield, and fertility estimated using random regression models.

    Science.gov (United States)

    Berry, D P; Buckley, F; Dillon, P; Evans, R D; Rath, M; Veerkamp, R F

    2003-11-01

    Genetic (co)variances between body condition score (BCS), body weight (BW), milk yield, and fertility were estimated using a random regression animal model extended to multivariate analysis. The data analyzed included 81,313 BCS observations, 91,937 BW observations, and 100,458 milk test-day yields from 8725 multiparous Holstein-Friesian cows. A cubic random regression was sufficient to model the changing genetic variances for BCS, BW, and milk across different days in milk. The genetic correlations between BCS and fertility changed little over the lactation; genetic correlations between BCS and interval to first service and between BCS and pregnancy rate to first service varied from -0.47 to -0.31, and from 0.15 to 0.38, respectively. This suggests that maximum genetic gain in fertility from indirect selection on BCS should be based on measurements taken in midlactation when the genetic variance for BCS is largest. Selection for increased BW resulted in shorter intervals to first service, but more services and poorer pregnancy rates; genetic correlations between BW and pregnancy rate to first service varied from -0.52 to -0.45. Genetic selection for higher lactation milk yield alone through selection on increased milk yield in early lactation is likely to have a more deleterious effect on genetic merit for fertility than selection on higher milk yield in late lactation.

  15. Guideline for Adopting the Local Reaction Assumption for Porous Absorbers in Terms of Random Incidence Absorption Coefficients

    DEFF Research Database (Denmark)

    Jeong, Cheol-Ho

    2011-01-01

    resistivity and the absorber thickness on the difference between the two surface reaction models are examined and discussed. For a porous absorber backed by a rigid surface, the assumption of local reaction always underestimates the random incidence absorption coefficient and the local reaction models give...... incidence acoustical characteristics of typical building elements made of porous materials assuming extended and local reaction. For each surface reaction, five well-established wave propagation models, the Delany-Bazley, Miki, Beranek, Allard-Champoux, and Biot model, are employed. Effects of the flow...... errors of less than 10% if the thickness exceeds 120 mm for a flow resistivity of 5000 N s m^-4. As the flow resistivity doubles, a decrease in the required thickness by 25 mm is observed to achieve the same amount of error. For an absorber backed by an air gap, the thickness ratio between the material...

  16. Genetic Analysis of Milk Yield Using Random Regression Test Day Model in Tehran Province Holstein Dairy Cow

    Directory of Open Access Journals (Sweden)

    A. Seyeddokht

    2012-09-01

    In this research, a random regression test-day model was used to estimate heritability values and to calculate genetic correlations between test-day milk records. A total of 140,357 monthly test-day milk records belonging to 28,292 first-lactation Holstein cows (milked three times a day), distributed in 165 herds of Tehran province and calved from 2001 to 2010, were used. The fixed effects of herd-year-month of calving as contemporary group, and age at calving and Holstein gene percentage as covariates, were fitted. Fourth-order orthogonal Legendre polynomials were implemented to take account of genetic and environmental aspects of milk production over the course of lactation. The RRM using Legendre polynomials as base functions appears to be the most adequate to describe the covariance structure of the data. The results showed that the average heritability for the second half of the lactation period was higher than that of the first half. The heritability value was lowest for the first month (0.117) and highest for the eighth month of lactation (0.230). Because genetic variation increased gradually and residual variance was high in the first months of lactation, heritabilities differed over the course of lactation. The RRMs with a higher number of parameters were more useful to describe the genetic variation of test-day milk yield throughout the lactation. In this research, genetic parameters and genetic correlations were estimated with a random regression test-day model; this method is therefore considered a more accurate way to account for these parameters than the alternatives.

  17. Spatial prediction of landslides using a hybrid machine learning approach based on Random Subspace and Classification and Regression Trees

    Science.gov (United States)

    Pham, Binh Thai; Prakash, Indra; Tien Bui, Dieu

    2018-02-01

    A hybrid machine learning approach of Random Subspace (RSS) and Classification And Regression Trees (CART) is proposed to develop a model named RSSCART for spatial prediction of landslides. This model is a combination of the RSS method, which is known as an efficient ensemble technique, and CART, which is a state-of-the-art classifier. The Luc Yen district of Yen Bai province, a prominent landslide-prone area of Viet Nam, was selected for the model development. Performance of the RSSCART model was evaluated through the Receiver Operating Characteristic (ROC) curve, statistical analysis methods, and the Chi Square test. Results were compared with other benchmark landslide models, namely Support Vector Machines (SVM), single CART, Naïve Bayes Trees (NBT), and Logistic Regression (LR). In developing the model, ten important landslide-affecting factors related to geomorphology, geology and geo-environment were considered, namely slope angles, elevation, slope aspect, curvature, lithology, distance to faults, distance to rivers, distance to roads, and rainfall. Performance of the RSSCART model (AUC = 0.841) is the best compared with other popular landslide models, namely SVM (0.835), single CART (0.822), NBT (0.821), and LR (0.723). These results indicate that the RSSCART is a promising method for spatial landslide prediction.

  18. Potential misinterpretation of treatment effects due to use of odds ratios and logistic regression in randomized controlled trials.

    Directory of Open Access Journals (Sweden)

    Mirjam J Knol

    BACKGROUND: In randomized controlled trials (RCTs), the odds ratio (OR) can substantially overestimate the risk ratio (RR) if the incidence of the outcome is over 10%. This study determined the frequency of use of ORs and the frequency of overestimation of the OR as compared with its accompanying RR in published RCTs, and assessed how often regression models that calculate RRs were used. METHODS: We included 288 RCTs published in 2008 in five major general medical journals (Annals of Internal Medicine, British Medical Journal, Journal of the American Medical Association, Lancet, New England Journal of Medicine). If an OR was reported, we calculated the corresponding RR, and we calculated the percentage of overestimation by using the formula . RESULTS: Of 193 RCTs with a dichotomous primary outcome, 24 (12.4%) presented a crude and/or adjusted OR for the primary outcome. In five RCTs (2.6%), the OR differed more than 100% from its accompanying RR on the log scale. Forty-one of all included RCTs (n = 288; 14.2%) presented ORs for other outcomes, or for subgroup analyses. Nineteen of these RCTs (6.6%) had at least one OR that deviated more than 100% from its accompanying RR on the log scale. Of 53 RCTs that adjusted for baseline variables, 15 used logistic regression. Alternative methods to estimate RRs were only used in four RCTs. CONCLUSION: ORs and logistic regression are often used in RCTs and in many articles the OR did not approximate the RR. Although the authors did not explicitly misinterpret these ORs as RRs, misinterpretation by readers can seriously affect treatment decisions and policy making.
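
The gap between OR and RR that the abstract describes can be seen with a few lines of arithmetic. The sketch below computes both measures from two risks and converts an OR back to an RR using the widely used relation RR = OR / (1 - p0 + p0 * OR), where p0 is the risk in the control group. The function names and the example risks are hypothetical; the sketch does not attempt to reproduce the study's own overestimation formula, which is not given in the text.

```python
def odds_ratio(p_treat: float, p_control: float) -> float:
    """Odds ratio from the outcome risks in two groups."""
    return (p_treat / (1.0 - p_treat)) / (p_control / (1.0 - p_control))


def risk_ratio(p_treat: float, p_control: float) -> float:
    """Risk ratio (relative risk) from the outcome risks in two groups."""
    return p_treat / p_control


def rr_from_or(or_value: float, p_control: float) -> float:
    """Convert an odds ratio to a risk ratio given the control-group risk."""
    return or_value / (1.0 - p_control + p_control * or_value)


if __name__ == "__main__":
    # Hypothetical risks: a rare outcome, then a common one.
    for p1, p0 in ((0.02, 0.01), (0.60, 0.30)):
        print(f"p1={p1} p0={p0} OR={odds_ratio(p1, p0):.2f} RR={risk_ratio(p1, p0):.2f}")
```

When the outcome is rare the two measures nearly coincide, but for common outcomes the OR lies further from 1 than the RR, which is the source of the misinterpretation risk raised in the conclusion.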

  19. Comparative Performance Analysis of Support Vector Machine, Random Forest, Logistic Regression and k-Nearest Neighbours in Rainbow Trout (Oncorhynchus Mykiss) Classification Using Image-Based Features.

    Science.gov (United States)

    Saberioon, Mohammadmehdi; Císař, Petr; Labbé, Laurent; Souček, Pavel; Pelissier, Pablo; Kerneis, Thierry

    2018-03-29

    The main aim of this study was to develop a new objective method for evaluating the impacts of different diets on the live fish skin using image-based features. In total, one-hundred and sixty rainbow trout (Oncorhynchus mykiss) were fed either a fish-meal based diet (80 fish) or a 100% plant-based diet (80 fish) and photographed using a consumer-grade digital camera. Twenty-three colour features and four texture features were extracted. Four different classification methods were used to evaluate fish diets, including Random forest (RF), Support vector machine (SVM), Logistic regression (LR) and k-Nearest neighbours (k-NN). The SVM with radial based kernel provided the best classifier with correct classification rate (CCR) of 82% and Kappa coefficient of 0.65. Although both the LR and RF methods were less accurate than SVM, they achieved good classification with CCR 75% and 70% respectively. The k-NN was the least accurate (40%) classification model. Overall, it can be concluded that consumer-grade digital cameras could be employed as a fast, accurate and non-invasive sensor for classifying rainbow trout based on their diets. Furthermore, there was a close association between image-based features and fish diet received during cultivation. These procedures can be used as non-invasive, accurate and precise approaches for monitoring fish status during cultivation by evaluating the diet's effects on fish skin.
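
    The two reported metrics, correct classification rate and Kappa, can both be derived from a confusion matrix. The sketch below assumes Cohen's kappa (the abstract does not say which Kappa variant was used) and uses a hypothetical 2x2 matrix, not the study's results.

```python
def ccr_and_kappa(confusion):
    """Return (correct classification rate, Cohen's kappa) for a square
    confusion matrix given as a list of rows (true class) by columns (predicted)."""
    total = sum(sum(row) for row in confusion)
    # Observed agreement: share of cases on the diagonal.
    observed = sum(confusion[i][i] for i in range(len(confusion))) / total
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    # Agreement expected by chance from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical matrix for two diet classes, 50 fish each.
example = [[40, 10],
           [10, 40]]
ccr, kappa = ccr_and_kappa(example)
print(ccr, kappa)
```

    Kappa discounts the agreement that would arise by chance, which is why it is lower than the CCR for the same matrix.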

  20. Comparative Performance Analysis of Support Vector Machine, Random Forest, Logistic Regression and k-Nearest Neighbours in Rainbow Trout (Oncorhynchus Mykiss) Classification Using Image-Based Features

    Directory of Open Access Journals (Sweden)

    Mohammadmehdi Saberioon

    2018-03-01

    Full Text Available The main aim of this study was to develop a new objective method for evaluating the impacts of different diets on the live fish skin using image-based features. In total, one-hundred and sixty rainbow trout (Oncorhynchus mykiss) were fed either a fish-meal based diet (80 fish) or a 100% plant-based diet (80 fish) and photographed using a consumer-grade digital camera. Twenty-three colour features and four texture features were extracted. Four different classification methods were used to evaluate fish diets, including Random forest (RF), Support vector machine (SVM), Logistic regression (LR) and k-Nearest neighbours (k-NN). The SVM with radial based kernel provided the best classifier with correct classification rate (CCR) of 82% and Kappa coefficient of 0.65. Although both the LR and RF methods were less accurate than SVM, they achieved good classification with CCR 75% and 70% respectively. The k-NN was the least accurate (40%) classification model. Overall, it can be concluded that consumer-grade digital cameras could be employed as a fast, accurate and non-invasive sensor for classifying rainbow trout based on their diets. Furthermore, there was a close association between image-based features and fish diet received during cultivation. These procedures can be used as non-invasive, accurate and precise approaches for monitoring fish status during cultivation by evaluating the diet’s effects on fish skin.

  1. Genetic correlations among body condition score, yield, and fertility in first-parity cows estimated by random regression models.

    Science.gov (United States)

    Veerkamp, R F; Koenen, E P; De Jong, G

    2001-10-01

    Twenty type classifiers scored body condition (BCS) of 91,738 first-parity cows from 601 sires and 5518 maternal grandsires. Fertility data during first lactation were extracted for 177,220 cows, of which 67,278 also had a BCS observation, and first-lactation 305-d milk, fat, and protein yields were added for 180,631 cows. Heritabilities and genetic correlations were estimated using a sire-maternal grandsire model. Heritability of BCS was 0.38. Heritabilities for fertility traits were low (0.01 to 0.07), but genetic standard deviations were substantial: 9 d for days to first service and calving interval, 0.25 for number of services, and 5% for first-service conception. Phenotypic correlations between fertility and yield or BCS were small (-0.15 to 0.20). Genetic correlations between yield and all fertility traits were unfavorable (0.37 to 0.74). Genetic correlations with BCS were between -0.4 and -0.6 for calving interval and days to first service. Random regression analysis (RR) showed that correlations changed with days in milk for BCS. Little agreement was found between variances and correlations from RR and from analyses including a single month (mo 1 to 10) of data for BCS, especially during early and late lactation. However, this was due to excluding data from the conventional analysis, rather than due to the polynomials used. RR and a conventional five-trait model, in which BCS in mo 1, 4, 7, and 10 were treated as separate traits (plus yield or fertility), gave similar results. Thus a parsimonious random regression model gave more realistic estimates for the (co)variances than a series of bivariate analyses on subsets of the data for BCS. A higher genetic merit for yield has unfavorable effects on fertility, but the genetic correlation suggests that BCS (at some stages of lactation) might help to alleviate the unfavorable effect of selection for higher yield on fertility.

  2. Development of a predictive model for distribution coefficient (Kd) of 137Cs and 60Co in marine sediments using multiple linear regression analysis

    International Nuclear Information System (INIS)

    Kumar, Ajay; Ravi, P.M.; Guneshwar, S.L.; Rout, Sabyasachi; Mishra, Manish K.; Pulhani, Vandana; Tripathi, R.M.

    2018-01-01

    Numerous common methods (batch laboratory, column laboratory, field-batch method, field modeling and the Koc method) are frequently used to determine Kd values. Recently, multiple regression models have been considered as new best estimates for predicting the Kd of radionuclides in the environment. It is also a well-known fact that the Kd value is highly influenced by the physico-chemical properties of sediment. Due to the significant variability in the influencing parameters, measured Kd values can range over several orders of magnitude under different environmental conditions. The aim of this study is to develop a predictive model for Kd values of 137Cs and 60Co based on the sediment properties using multiple linear regression analysis.

  3. Longitudinal changes in telomere length and associated genetic parameters in dairy cattle analysed using random regression models.

    Directory of Open Access Journals (Sweden)

    Luise A Seeker

    Full Text Available Telomeres cap the ends of linear chromosomes and shorten with age in many organisms. In humans short telomeres have been linked to morbidity and mortality. With the accumulation of longitudinal datasets the focus shifts from investigating telomere length (TL) to exploring TL change within individuals over time. Some studies indicate that the speed of telomere attrition is predictive of future disease. The objectives of the present study were to 1) characterize the change in bovine relative leukocyte TL (RLTL) across the lifetime in Holstein Friesian dairy cattle, 2) estimate genetic parameters of RLTL over time and 3) investigate the association of differences in individual RLTL profiles with productive lifespan. RLTL measurements were analysed using Legendre polynomials in a random regression model to describe TL profiles and genetic variance over age. The analyses were based on 1,328 repeated RLTL measurements of 308 female Holstein Friesian dairy cattle. A quadratic Legendre polynomial was fitted to the fixed effect of age in months and to the random effect of the animal identity. Changes in RLTL, heritability and within-trait genetic correlation along the age trajectory were calculated and illustrated. At a population level, the relationship between RLTL and age was described by a positive quadratic function. Individuals varied significantly regarding the direction and amount of RLTL change over life. The heritability of RLTL ranged from 0.36 to 0.47 (SE = 0.05-0.08) and remained statistically unchanged over time. The genetic correlation of RLTL at birth with measurements later in life decreased with the time interval between samplings from near unity to 0.69, indicating that TL later in life might be regulated by different genes than TL early in life. Even though animals differed in their RLTL profiles significantly, those differences were not correlated with productive lifespan (p = 0.954).
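
    As a rough illustration of how a quadratic Legendre polynomial turns a covariance matrix of random regression coefficients into a variance at each age, consider the sketch below. It uses unnormalized Legendre polynomials (many animal breeding packages use a normalized form), and the age range and the matrix K are hypothetical values, not estimates from the study.

```python
def legendre_basis(age, age_min, age_max):
    """Return [P0(x), P1(x), P2(x)] with age rescaled linearly to x in [-1, 1]."""
    x = 2.0 * (age - age_min) / (age_max - age_min) - 1.0
    return [1.0, x, 0.5 * (3.0 * x * x - 1.0)]


def variance_at(age, K, age_min, age_max):
    """Variance implied at a given age by the 3x3 coefficient covariance
    matrix K, computed as the quadratic form phi' K phi."""
    phi = legendre_basis(age, age_min, age_max)
    return sum(phi[i] * K[i][j] * phi[j] for i in range(3) for j in range(3))


# Hypothetical covariance matrix of the three random regression coefficients.
K = [[1.00, 0.10, 0.00],
     [0.10, 0.20, 0.00],
     [0.00, 0.00, 0.05]]

# Hypothetical age range of 0 to 120 months.
for age in (0, 60, 120):
    print(age, round(variance_at(age, K, 0, 120), 4))
```

    Dividing such an age-specific genetic variance by the corresponding total variance would give a heritability that changes along the age trajectory.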

  4. Dropout from exercise randomized controlled trials among people with depression: A meta-analysis and meta regression.

    Science.gov (United States)

    Stubbs, Brendon; Vancampfort, Davy; Rosenbaum, Simon; Ward, Philip B; Richards, Justin; Soundy, Andrew; Veronese, Nicola; Solmi, Marco; Schuch, Felipe B

    2016-01-15

    Exercise has established efficacy in improving depressive symptoms. Dropouts from randomized controlled trials (RCTs) pose a threat to the validity of this evidence base, with dropout rates varying across studies. We conducted a systematic review and meta-analysis to investigate the prevalence and predictors of dropout rates among adults with depression participating in exercise RCTs. Three authors identified RCTs from a recent Cochrane review and conducted updated searches of major electronic databases from 01/2013 to 08/2015. We included RCTs of exercise interventions in people with depression (including major depressive disorder (MDD) and depressive symptoms) that reported dropout rates. A random effects meta-analysis and meta regression were conducted. Overall, 40 RCTs were included reporting dropout rates across 52 exercise interventions including 1720 people with depression (49.1 years (range=19-76 years), 72% female (range=0-100)). The trim and fill adjusted prevalence of dropout across all studies was 18.1% (95%CI=15.0-21.8%) and 17.2% (95%CI=13.5-21.7, N=31) in MDD only. In MDD participants, higher baseline depressive symptoms (β=0.0409, 95%CI=0.0809-0.0009, P=0.04) predicted greater dropout, whilst supervised interventions delivered by physiotherapists (β=-1.2029, 95%CI=-2.0967 to -0.3091, p=0.008) and exercise physiologists (β=-1.3396, 95%CI=-2.4478 to -0.2313, p=0.01) predicted lower dropout. A comparative meta-analysis (N=29) established dropout was lower in exercise than control conditions (OR=0.642, 95%CI=0.43-0.95, p=0.02). Exercise is well tolerated by people with depression and dropout in RCTs is lower than in control conditions. Thus, exercise is a feasible treatment, in particular when delivered by healthcare professionals with specific training in exercise prescription. Copyright © 2015 Elsevier B.V. All rights reserved.

  5. Full Random Coefficients Multilevel Modeling of the Relationship between Land Use and Trip Time on Weekdays and Weekends

    Directory of Open Access Journals (Sweden)

    Tae-Hyoung Tommy Gim

    2017-10-01

    Full Text Available Interest in weekend trips is increasing, but few studies have examined how they are affected by land use. In this study, we analyze the relationship between compact land use characteristics and trip time in Seoul, Korea by comparing two research models, each of which uses the weekday and weekend data of the same travelers. To secure sufficient numbers of subjects and groups, full random coefficients multilevel models define the trip as level one and the neighborhood as level two, and find that level-two land use characteristics account for less variation in trip time than level-one individual characteristics. At level one, weekday trip time is found to be reduced by the choice of the automobile as a travel mode, but not by its ownership per se. In addition, it is reduced for trips made by high-income travelers and extended for travel to quality jobs. Among four land use characteristics at level two, population density, road connectivity, and subway availability are shown to be significant in the weekday model. Only subway availability has a positive relationship with trip time, and this finding is consistent with the level-one result that the choice of automobile alternatives increases trip time. The other land use characteristic, land use balance, turns out to be the single significant land use variable in the weekend model, implying that it is concerned mainly with non-work, non-mandatory travel.

  6. Global industrial impact coefficient based on random walk process and inter-country input-output table

    Science.gov (United States)

    Xing, Lizhi; Dong, Xianlei; Guan, Jun

    2017-04-01

    The input-output table describes the national economic system comprehensively and in detail, capturing many economic relationships, including supply and demand information among industrial sectors. Complex network theory, a method for measuring the structure of complex systems, can describe the internal structural characteristics of the research object by measuring the structural indicators of the social and economic system, revealing the complex relationship between the inner hierarchy and the external economic function. This paper builds GIVCN-WIOT models based on the World Input-Output Database in order to depict the topological structure of the Global Value Chain (GVC), and assumes that the competitive advantage of a nation is equal to the overall performance of its domestic sectors' impact on the GVC. From the perspective of econophysics, the Global Industrial Impact Coefficient (GIIC) is proposed to measure national competitiveness in gaining information superiority and intermediate interests. Analysis of GIVCN-WIOT models yields several insights including the following: (1) sectors with higher Random Walk Centrality contribute more to transmitting value streams within the global economic system; (2) Half-Value Ratio can be used to measure robustness of open-economy macroeconomics in the process of globalization; (3) the positive correlation between GIIC and GDP indicates that one country's global industrial impact could reveal its international competitive advantage.

  7. A Two-Stage Estimation Method for Random Coefficient Differential Equation Models with Application to Longitudinal HIV Dynamic Data.

    Science.gov (United States)

    Fang, Yun; Wu, Hulin; Zhu, Li-Xing

    2011-07-01

    We propose a two-stage estimation method for random coefficient ordinary differential equation (ODE) models. A maximum pseudo-likelihood estimator (MPLE) is derived based on a mixed-effects modeling approach and its asymptotic properties for population parameters are established. The proposed method does not require repeatedly solving ODEs, and is computationally efficient although it does pay a price with the loss of some estimation efficiency. However, the method does offer an alternative approach when the exact likelihood approach fails due to model complexity and high-dimensional parameter space, and it can also serve as a method to obtain the starting estimates for more accurate estimation methods. In addition, the proposed method does not need to specify the initial values of state variables and preserves all the advantages of the mixed-effects modeling approach. The finite sample properties of the proposed estimator are studied via Monte Carlo simulations and the methodology is also illustrated with application to an AIDS clinical data set.

  8. Random regression analysis for body weights and main morphological traits in genetically improved farmed tilapia (Oreochromis niloticus).

    Science.gov (United States)

    He, Jie; Zhao, Yunfeng; Zhao, Jingli; Gao, Jin; Xu, Pao; Yang, Runqing

    2018-02-01

    To genetically analyse growth traits in genetically improved farmed tilapia (GIFT), the body weight (BWE) and main morphological traits, including body length (BL), body depth (BD), body width (BWI), head length (HL) and length of the caudal peduncle (CPL), were measured six times in growth duration on 1451 fish from 45 mixed families of full and half sibs. A random regression model (RRM) was used to model genetic changes of the growth traits with days of age and estimate the heritability for any growth point and genetic correlations between pairwise growth points. Using the covariance function based on optimal RRMs, the heritabilities were estimated to be from 0.102 to 0.662 for BWE, 0.157 to 0.591 for BL, 0.047 to 0.621 for BD, 0.018 to 0.577 for BWI, 0.075 to 0.597 for HL and 0.032 to 0.610 for CPL between 60 and 140 days of age. All genetic correlations exceeded 0.5 between pairwise growth points. Moreover, the traits at initial days of age showed less correlation with those at later days of age. With phenotypes observed repeatedly, the model choice showed that the optimal RRMs could more precisely predict breeding values at a specific growth time than repeatability models or multiple trait animal models, which enhanced the efficiency of selection for the BWE and main morphological traits.

  9. Multitrait, Random Regression, or Simple Repeatability Model in High-Throughput Phenotyping Data Improve Genomic Prediction for Wheat Grain Yield.

    Science.gov (United States)

    Sun, Jin; Rutkoski, Jessica E; Poland, Jesse A; Crossa, José; Jannink, Jean-Luc; Sorrells, Mark E

    2017-07-01

    High-throughput phenotyping (HTP) platforms can be used to measure traits that are genetically correlated with wheat (Triticum aestivum L.) grain yield across time. Incorporating such secondary traits in the multivariate pedigree and genomic prediction models would be desirable to improve indirect selection for grain yield. In this study, we evaluated three statistical models, simple repeatability (SR), multitrait (MT), and random regression (RR), for the longitudinal data of secondary traits and compared the impact of the proposed models for secondary traits on their predictive abilities for grain yield. Grain yield and secondary traits, canopy temperature (CT) and normalized difference vegetation index (NDVI), were collected in five diverse environments for 557 wheat lines with available pedigree and genomic information. A two-stage analysis was applied for pedigree and genomic selection (GS). First, secondary traits were fitted by SR, MT, or RR models, separately, within each environment. Then, best linear unbiased predictions (BLUPs) of secondary traits from the above models were used in the multivariate prediction models to compare predictive abilities for grain yield. Predictive ability was substantially improved by 70%, on average, from multivariate pedigree and genomic models when including secondary traits in both training and test populations. Additionally, (i) predictive abilities slightly varied for MT, RR, or SR models in this data set, (ii) results indicated that including BLUPs of secondary traits from the MT model was the best in severe drought, and (iii) the RR model was slightly better than SR and MT models under drought environment. Copyright © 2017 Crop Science Society of America.

  10. Inferring genetic parameters of lactation in Tropical Milking Criollo cattle with random regression test-day models.

    Science.gov (United States)

    Santellano-Estrada, E; Becerril-Pérez, C M; de Alba, J; Chang, Y M; Gianola, D; Torres-Hernández, G; Ramírez-Valverde, R

    2008-11-01

    This study inferred genetic and permanent environmental variation of milk yield in Tropical Milking Criollo cattle and compared 5 random regression test-day models using Wilmink's function and Legendre polynomials. Data consisted of 15,377 test-day records from 467 Tropical Milking Criollo cows that calved between 1974 and 2006 in the tropical lowlands of the Gulf Coast of Mexico and in southern Nicaragua. Estimated heritabilities of test-day milk yields ranged from 0.18 to 0.45, and repeatabilities ranged from 0.35 to 0.68 for the period spanning from 6 to 400 d in milk. Genetic correlation between days in milk 10 and 400 was around 0.50 but greater than 0.90 for most pairs of test days. The model that used first-order Legendre polynomials for additive genetic effects and second-order Legendre polynomials for permanent environmental effects gave the smallest residual variance and was also favored by the Akaike information criterion and likelihood ratio tests.

  11. Reduced Rank Regression

    DEFF Research Database (Denmark)

    Johansen, Søren

    2008-01-01

    The reduced rank regression model is a multivariate regression model with a coefficient matrix with reduced rank. The reduced rank regression algorithm is an estimation procedure, which estimates the reduced rank regression model. It is related to canonical correlations and involves calculating...

  12. Using Multisite Experiments to Study Cross-Site Variation in Treatment Effects: A Hybrid Approach with Fixed Intercepts and A Random Treatment Coefficient

    Science.gov (United States)

    Bloom, Howard S.; Raudenbush, Stephen W.; Weiss, Michael J.; Porter, Kristin

    2017-01-01

    The present article considers a fundamental question in evaluation research: "By how much do program effects vary across sites?" The article first presents a theoretical model of cross-site impact variation and a related estimation model with a random treatment coefficient and fixed site-specific intercepts. This approach eliminates…

  13. Models for Estimating Genetic Parameters of Milk Production Traits Using Random Regression Models in Korean Holstein Cattle

    Directory of Open Access Journals (Sweden)

    C. I. Cho

    2016-05-01

    Full Text Available The objectives of the study were to estimate genetic parameters for milk production traits of Holstein cattle using random regression models (RRMs), and to compare the goodness of fit of various RRMs with homogeneous and heterogeneous residual variances. A total of 126,980 test-day milk production records of first-parity Holstein cows between 2007 and 2014 from the Dairy Cattle Improvement Center of the National Agricultural Cooperative Federation in South Korea were used. These records included milk yield (MILK), fat yield (FAT), protein yield (PROT), and solids-not-fat yield (SNF). The statistical models included random effects of genetic and permanent environments using Legendre polynomials (LP) of the third to fifth order (L3–L5), fixed effects of herd-test day, year-season at calving, and a fixed regression for the test-day record (third to fifth order). The residual variances in the models were either homogeneous (HOM) or heterogeneous (15 classes, HET15; 60 classes, HET60). A total of nine models (3 orders of polynomials × 3 types of residual variance), including L3-HOM, L3-HET15, L3-HET60, L4-HOM, L4-HET15, L4-HET60, L5-HOM, L5-HET15, and L5-HET60, were compared using Akaike information criteria (AIC) and/or Schwarz Bayesian information criteria (BIC) statistics to identify the model(s) of best fit for their respective traits. The lowest BIC value was observed for the models L5-HET15 (MILK; PROT; SNF) and L4-HET15 (FAT), which fit the best. In general, the BIC values of HET15 models for a particular polynomial order were lower than those of the HET60 models in most cases. This implies that the orders of LP and types of residual variances affect the goodness of fit of the models. Also, the heterogeneity of residual variances should be considered for the test-day analysis. The heritability estimates from the best fitted models ranged from 0.08 to 0.15 for MILK, 0.06 to 0.14 for FAT, 0.08 to 0.12 for PROT, and 0.07 to 0.13 for SNF according to days in milk of first
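
    The model comparison described above rests on two penalized-likelihood criteria. The sketch below uses the textbook definitions, AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L. The model names, log-likelihoods and parameter counts are hypothetical, and using the number of records as n is an assumption (REML-based software may use a different effective sample size).

```python
import math


def aic(log_lik, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2.0 * n_params - 2.0 * log_lik


def bic(log_lik, n_params, n_obs):
    """Schwarz Bayesian information criterion: k ln n - 2 ln L (lower is better)."""
    return n_params * math.log(n_obs) - 2.0 * log_lik


# Hypothetical candidates: name -> (maximized log-likelihood, number of parameters).
candidates = {
    "model_A": (-1000.0, 13),  # fewer variance parameters
    "model_B": (-980.0, 27),   # better fit, more variance parameters
}
n_obs = 126980  # number of test-day records reported in the abstract

best_by_aic = min(candidates, key=lambda m: aic(*candidates[m]))
best_by_bic = min(candidates, key=lambda m: bic(*candidates[m], n_obs))
print(best_by_aic, best_by_bic)
```

    With a large n, BIC penalizes each extra parameter far more heavily than AIC, so the two criteria can favor different models, as they do for these hypothetical values.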

  14. Evaluation of Systematic and Random Error in the Measurement of Equilibrium Solubility and Diffusion Coefficient for Liquids in Polymers

    National Research Council Canada - National Science Library

    Shuely, Wendel

    2001-01-01

    A standardized thermogravimetric analyzer (TGA) desorption method for measuring the equilibrium solubility and diffusion coefficient of toxic contaminants with polymers was further developed and evaluated...

  15. Prediction of the thermal expansion coefficients of biodiesels from several sources through the application of linear regression; Predicao dos coeficientes de expansao termica de biodieseis de diversas origens atraves da aplicacao da regressao linear

    Energy Technology Data Exchange (ETDEWEB)

    Canciam, Cesar Augusto [Universidade Tecnologica Federal do Parana (UTFPR), Campus Ponta Grossa, PR (Brazil)], e-mail: canciam@utfpr.edu.br

    2012-07-01

    When evaluating the consumption of biofuels, knowledge of the density is of great importance for correcting for the effect of temperature. The thermal expansion coefficient is a thermodynamic property that provides a measure of the density variation in response to temperature variation at constant pressure. This study aimed to predict the thermal expansion coefficients of ethyl biodiesels from castor bean, soybean, sunflower seed and Mabea fistulifera Mart. oils and of methyl biodiesels from soybean, sunflower seed, souari nut, cotton, coconut, castor bean and palm oils, and from beef tallow, chicken fat and residual hydrogenated vegetable fat. For this purpose, a linear regression analysis of the density of each biodiesel as a function of temperature was performed. These data were obtained from other works. The thermal expansion coefficients for the biodiesels are between 6.3729 × 10^-4 and 1.0410 × 10^-3 °C^-1. In all cases, the correlation coefficients were over 0.99. (author)
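
    One common way to obtain a thermal expansion coefficient from density data, shown here as a sketch, starts from the definition β = -(1/ρ)(∂ρ/∂T) at constant pressure, which makes ln ρ linear in T with slope -β when β is constant. Whether the study regressed ln ρ or ρ itself on temperature is not stated in the abstract, so the log form is an assumption. The density values are synthetic, generated from an assumed β of 8.0 × 10^-4 °C^-1.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x.
    Returns (slope, intercept, Pearson correlation coefficient)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy / math.sqrt(sxx * syy)


def thermal_expansion_coefficient(temps_c, densities):
    """Estimate beta as minus the slope of ln(density) against temperature."""
    slope, _, r = fit_line(temps_c, [math.log(d) for d in densities])
    return -slope, r


# Synthetic data: density in kg/m^3 at temperatures in degrees C,
# generated from an assumed beta rather than measured.
assumed_beta = 8.0e-4
temps = [10.0, 20.0, 30.0, 40.0, 50.0]
rho = [880.0 * math.exp(-assumed_beta * (t - 20.0)) for t in temps]

beta, r = thermal_expansion_coefficient(temps, rho)
print(beta, r)
```

    Because density falls as temperature rises, the correlation coefficient of this fit is negative. A reported value "over 0.99" would then refer to its magnitude.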

  16. Development of a predictive model for lead, cadmium and fluorine soil-water partition coefficients using sparse multiple linear regression analysis.

    Science.gov (United States)

    Nakamura, Kengo; Yasutaka, Tetsuo; Kuwatani, Tatsu; Komai, Takeshi

    2017-11-01

    In this study, we applied sparse multiple linear regression (SMLR) analysis to clarify the relationships between soil properties and adsorption characteristics for a range of soils across Japan and identify easily-obtained physical and chemical soil properties that could be used to predict K and n values of cadmium, lead and fluorine. A model was first constructed that can easily predict the K and n values from nine soil parameters (pH, cation exchange capacity, specific surface area, total carbon, soil organic matter from loss on ignition and water holding capacity, the ratio of sand, silt and clay). The K and n values of cadmium, lead and fluorine of 17 soil samples were used to verify the SMLR models by the root mean square error values obtained from 512 combinations of soil parameters. The SMLR analysis indicated that fluorine adsorption to soil may be associated with organic matter, whereas cadmium or lead adsorption to soil is more likely to be influenced by soil pH, IL. We found that an accurate K value can be predicted from more than three soil parameters for most soils. Approximately 65% of the predicted values were between 33 and 300% of their measured values for the K value; 76% of the predicted values were within ±30% of their measured values for the n value. Our findings suggest that adsorption properties of lead, cadmium and fluorine to soil can be predicted from the soil physical and chemical properties using the presented models. Copyright © 2017 Elsevier Ltd. All rights reserved.

  17. A comparison of confidence interval methods for the intraclass correlation coefficient in community-based cluster randomization trials with a binary outcome.

    Science.gov (United States)

    Braschel, Melissa C; Svec, Ivana; Darlington, Gerarda A; Donner, Allan

    2016-04-01

    Many investigators rely on previously published point estimates of the intraclass correlation coefficient rather than on their associated confidence intervals to determine the required size of a newly planned cluster randomized trial. Although confidence interval methods for the intraclass correlation coefficient that can be applied to community-based trials have been developed for a continuous outcome variable, fewer methods exist for a binary outcome variable. The aim of this study is to evaluate confidence interval methods for the intraclass correlation coefficient applied to binary outcomes in community intervention trials enrolling a small number of large clusters. Existing methods for confidence interval construction are examined and compared to a new ad hoc approach based on dividing clusters into a large number of smaller sub-clusters and subsequently applying existing methods to the resulting data. Monte Carlo simulation is used to assess the width and coverage of confidence intervals for the intraclass correlation coefficient based on Smith's large sample approximation of the standard error of the one-way analysis of variance estimator, an inverted modified Wald test for the Fleiss-Cuzick estimator, and intervals constructed using a bootstrap-t applied to a variance-stabilizing transformation of the intraclass correlation coefficient estimate. In addition, a new approach is applied in which clusters are randomly divided into a large number of smaller sub-clusters with the same methods applied to these data (with the exception of the bootstrap-t interval, which assumes large cluster sizes). These methods are also applied to a cluster randomized trial on adolescent tobacco use for illustration. When applied to a binary outcome variable in a small number of large clusters, existing confidence interval methods for the intraclass correlation coefficient provide poor coverage. 
However, confidence intervals constructed using the new approach combined with Smith
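
    The one-way analysis of variance estimator of the intraclass correlation coefficient mentioned above can be sketched for a binary outcome as follows. The formula is the standard ANOVA estimator for unequal cluster sizes. The cluster counts are hypothetical, and the confidence interval methods compared in the study are not implemented here.

```python
def anova_icc(clusters):
    """One-way ANOVA estimator of the ICC for a binary outcome.
    `clusters` is a list of (number of events, cluster size) pairs."""
    k = len(clusters)
    total_n = sum(size for _, size in clusters)
    overall_p = sum(events for events, _ in clusters) / total_n

    # Between-cluster mean square: spread of cluster proportions around the overall one.
    msb = sum(size * (events / size - overall_p) ** 2
              for events, size in clusters) / (k - 1)
    # Within-cluster mean square: for 0/1 data the within sum of squares is m * p * (1 - p).
    msw = sum(size * (events / size) * (1 - events / size)
              for events, size in clusters) / (total_n - k)
    # Adjusted average cluster size, which allows for unequal sizes.
    m0 = (total_n - sum(size ** 2 for _, size in clusters) / total_n) / (k - 1)

    return (msb - msw) / (msb + (m0 - 1) * msw)


# Hypothetical trial with a small number of large clusters.
example_clusters = [(10, 100), (30, 100), (20, 100), (40, 100)]
print(round(anova_icc(example_clusters), 4))
```

    The estimator can be negative when cluster proportions vary less than expected by chance, which is one reason interval estimation is difficult with few clusters.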

  18. New machine learning tools for predictive vegetation mapping after climate change: Bagging and Random Forest perform better than Regression Tree Analysis

    Science.gov (United States)

    L.R. Iverson; A.M. Prasad; A. Liaw

    2004-01-01

    More and better machine learning tools are becoming available for landscape ecologists to aid in understanding species-environment relationships and to map probable species occurrence now and potentially into the future. To that end, we evaluated three statistical models: Regression Tree Analysis (RTA), Bagging Trees (BT) and Random Forest (RF) for their utility in...

  19. Estimation of genotype X environment interactions, in a grass-based system, for milk yield, body condition score, and body weight using random regression models

    NARCIS (Netherlands)

    Berry, D.P.; Buckley, F.; Dillon, P.; Evans, R.D.; Rath, M.; Veerkamp, R.F.

    2003-01-01

    (Co)variance components for milk yield, body condition score (BCS), body weight (BW), BCS change and BW change over different herd-year mean milk yields (HMY) and nutritional environments (concentrate feeding level, grazing severity and silage quality) were estimated using a random regression model.

  20. Kendall-Theil Robust Line (KTRLine--version 1.0)-A Visual Basic Program for Calculating and Graphing Robust Nonparametric Estimates of Linear-Regression Coefficients Between Two Continuous Variables

    Science.gov (United States)

    Granato, Gregory E.

    2006-01-01

    The Kendall-Theil Robust Line software (KTRLine-version 1.0) is a Visual Basic program that may be used with the Microsoft Windows operating system to calculate parameters for robust, nonparametric estimates of linear-regression coefficients between two continuous variables. The KTRLine software was developed by the U.S. Geological Survey, in cooperation with the Federal Highway Administration, for use in stochastic data modeling with local, regional, and national hydrologic data sets to develop planning-level estimates of potential effects of highway runoff on the quality of receiving waters. The Kendall-Theil robust line was selected because this robust nonparametric method is resistant to the effects of outliers and nonnormality in residuals that commonly characterize hydrologic data sets. The slope of the line is calculated as the median of all possible pairwise slopes between points. The intercept is calculated so that the line will run through the median of input data. A single-line model or a multisegment model may be specified. The program was developed to provide regression equations with an error component for stochastic data generation because nonparametric multisegment regression tools are not available with the software that is commonly used to develop regression models. The Kendall-Theil robust line is a median line and, therefore, may underestimate total mass, volume, or loads unless the error component or a bias correction factor is incorporated into the estimate. Regression statistics such as the median error, the median absolute deviation, the prediction error sum of squares, the root mean square error, the confidence interval for the slope, and the bias correction factor for median estimates are calculated by use of nonparametric methods. These statistics, however, may be used to formulate estimates of mass, volume, or total loads. 
The program is used to read a two- or three-column tab-delimited input file with variable names in the first row and
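
    The slope and intercept rules stated in this abstract can be sketched in a few lines. This is a minimal single-segment illustration, not the KTRLine program itself: the function name and the sample data are hypothetical, and "through the median of input data" is read here as the point (median x, median y).

```python
from itertools import combinations
from statistics import median


def kendall_theil_line(x, y):
    """Return (slope, intercept) of a single-segment Kendall-Theil line."""
    # Slope: median of the slopes between every pair of points.
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(zip(x, y), 2)
        if x2 != x1  # pairs with tied x have no defined slope
    ]
    slope = median(slopes)
    # Intercept: chosen so the line passes through (median x, median y).
    # Reading "median of input data" this way is an assumption.
    intercept = median(y) - slope * median(x)
    return slope, intercept


# Made-up data: four points on y = 2x plus one gross outlier.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 100]
slope, intercept = kendall_theil_line(x, y)
```

    With these made-up values the outlier leaves the fitted line at y = 2x, which is the resistance to outliers that the abstract describes.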

  1. Polynomial Chaos Expansion of Random Coefficients and the Solution of Stochastic Partial Differential Equations in the Tensor Train Format

    KAUST Repository

    Dolgov, Sergey; Khoromskij, Boris N.; Litvinenko, Alexander; Matthies, Hermann G.

    2015-01-01

    We apply the tensor train (TT) decomposition to construct the tensor product polynomial chaos expansion (PCE) of a random field, to solve the stochastic elliptic diffusion PDE with the stochastic Galerkin discretization, and to compute some

  2. Estimates of Intraclass Correlation Coefficients from Longitudinal Group-Randomized Trials of Adolescent HIV/STI/Pregnancy Prevention Programs

    Science.gov (United States)

    Glassman, Jill R.; Potter, Susan C.; Baumler, Elizabeth R.; Coyle, Karin K.

    2015-01-01

    Introduction: Group-randomized trials (GRTs) are one of the most rigorous methods for evaluating the effectiveness of group-based health risk prevention programs. Efficiently designing GRTs with a sample size that is sufficient for meeting the trial's power and precision goals while not wasting resources exceeding them requires estimates of the…

  3. SAS Code for Calculating Intraclass Correlation Coefficients and Effect Size Benchmarks for Site-Randomized Education Experiments

    Science.gov (United States)

    Brandon, Paul R.; Harrison, George M.; Lawton, Brian E.

    2013-01-01

    When evaluators plan site-randomized experiments, they must conduct the appropriate statistical power analyses. These analyses are most likely to be valid when they are based on data from the jurisdictions in which the studies are to be conducted. In this method note, we provide software code, in the form of a SAS macro, for producing statistical…

  4. Multiple linear regression analysis

    Science.gov (United States)

    Edwards, T. R.

    1980-01-01

    Program rapidly selects best-suited set of coefficients. User supplies only vectors of independent and dependent data and specifies confidence level required. Program uses stepwise statistical procedure for relating minimal set of variables to set of observations; final regression contains only most statistically significant coefficients. Program is written in FORTRAN IV for batch execution and has been implemented on NOVA 1200.

  5. Application of single-step genomic best linear unbiased prediction with a multiple-lactation random regression test-day model for Japanese Holsteins.

    Science.gov (United States)

    Baba, Toshimi; Gotoh, Yusaku; Yamaguchi, Satoshi; Nakagawa, Satoshi; Abe, Hayato; Masuda, Yutaka; Kawahara, Takayoshi

    2017-08-01

    This study aimed to evaluate the validation reliability of single-step genomic best linear unbiased prediction (ssGBLUP) with a multiple-lactation random regression test-day model and to investigate the effect of adding genotyped cows on that reliability. Two data sets of test-day records from the first three lactations were used: full data from February 1975 to December 2015 (60 850 534 records from 2 853 810 cows) and reduced data cut off in 2011 (53 091 066 records from 2 502 307 cows). We used marker genotypes of 4480 bulls and 608 cows. Genomic enhanced breeding values (GEBV) of 305-day milk yield in all the lactations were estimated for at least 535 young bulls using two marker data sets: bull genotypes only, and both bull and cow genotypes. The realized reliability (R²) from linear regression analysis was used as an indicator of validation reliability. Using only genotyped bulls, R² ranged from 0.41 to 0.46 and was always higher than that of parent averages. Very similar R² values were observed when genotyped cows were added. Applying ssGBLUP to a multiple-lactation random regression model is feasible, and adding a limited number of genotyped cows has no significant effect on the reliability of GEBV for genotyped bulls. © 2016 Japanese Society of Animal Science.

  6. A Monte Carlo experiment to analyze the curse of dimensionality in estimating random coefficients models with a full variance–covariance matrix

    DEFF Research Database (Denmark)

    Cherchi, Elisabetta; Guevara, Cristian Angelo

    2012-01-01

    of parameters increases is usually known as the “curse of dimensionality” in the simulation methods. We investigate this problem in the case of the random coefficients Logit model. We compare the traditional Maximum Simulated Likelihood (MSL) method with two alternative estimation methods: the Expectation–Maximization (EM) and the Laplace Approximation (HH) methods that do not require simulation. We use Monte Carlo experimentation to investigate systematically the performance of the methods under different circumstances, including different numbers of variables, sample sizes and structures of the variance...

  7. Acupuncture for musculoskeletal pain: A meta-analysis and meta-regression of sham-controlled randomized clinical trials

    Science.gov (United States)

    Yuan, Qi-ling; Wang, Peng; Liu, Liang; Sun, Fu; Cai, Yong-song; Wu, Wen-tao; Ye, Mao-lin; Ma, Jiang-tao; Xu, Bang-bang; Zhang, Yin-gang

    2016-01-01

    The aims of this systematic review were to study the analgesic effect of real acupuncture and to explore whether sham acupuncture (SA) type is related to the estimated effect of real acupuncture for musculoskeletal pain. Five databases were searched. The outcome was pain or disability immediately (≤1 week) following an intervention. Standardized mean differences (SMDs) with 95% confidence intervals were calculated. Meta-regression was used to explore possible sources of heterogeneity. Sixty-three studies (6382 individuals) were included. Eight condition types were included. The pooled effect size was moderate for pain relief (59 trials, 4980 individuals, SMD −0.61, 95% CI −0.76 to −0.47; P acupuncture has a moderate effect (approximate 12-point reduction on the 100-mm visual analogue scale) on musculoskeletal pain. SA type did not appear to be related to the estimated effect of real acupuncture. PMID:27471137

  8. Dual Regression

    OpenAIRE

    Spady, Richard; Stouli, Sami

    2012-01-01

    We propose dual regression as an alternative to the quantile regression process for the global estimation of conditional distribution functions under minimal assumptions. Dual regression provides all the interpretational power of the quantile regression process while avoiding the need for repairing the intersecting conditional quantile surfaces that quantile regression often produces in practice. Our approach introduces a mathematical programming characterization of conditional distribution f...

  9. Longitudinal analysis of the strengths and difficulties questionnaire scores of the Millennium Cohort Study children in England using M-quantile random-effects regression.

    Science.gov (United States)

    Tzavidis, Nikos; Salvati, Nicola; Schmid, Timo; Flouri, Eirini; Midouhas, Emily

    2016-02-01

    Multilevel modelling is a popular approach for longitudinal data analysis. Statistical models conventionally target a parameter at the centre of a distribution. However, when the distribution of the data is asymmetric, modelling other location parameters, e.g. percentiles, may be more informative. We present a new approach, M-quantile random-effects regression, for modelling multilevel data. The proposed method is used for modelling location parameters of the distribution of the strengths and difficulties questionnaire scores of children in England who participate in the Millennium Cohort Study. Quantile mixed models are also considered. The analyses offer insights to child psychologists about the differential effects of risk factors on children's outcomes.

  10. Recursive Algorithm For Linear Regression

    Science.gov (United States)

    Varanasi, S. V.

    1988-01-01

    Order of model determined easily. Linear-regression algorithm includes recursive equations for coefficients of model of increased order. Algorithm eliminates duplicative calculations, facilitates search for minimum order of linear-regression model fitting set of data satisfactorily.

  11. Modelos de regressão aleatória para avaliação da curva de crescimento em matrizes de codorna de corte Random regression models for growth evaluation of meat-type quail hens

    Directory of Open Access Journals (Sweden)

    Bruno Bastos Teixeira

    2012-09-01

    Full Text Available The objective was to compare different random regression models using Legendre polynomial functions of different orders, to determine which best fits the genetic study of the growth curve of meat-type quail. Data from 2136 meat-type quail hens were evaluated, of which 1026 belonged to genetic group UFV1 and 1110 to group UFV2. The quail were weighed at 1, 7, 14, 21, 28, 35, 42, 77, 112 and 147 days of age, and these weights were used in the analysis. Two possible structures for heterogeneous residual variance were tested, with ages grouped into 3 and 5 classes. The random regression model that best fits the quail growth curve was then investigated. Models were compared using the Akaike Information Criterion (AIC), Schwarz's Bayesian Information Criterion (BIC), the logarithm of the likelihood function (Log L) and the likelihood ratio test (LRT), at the 1% level. The model with heterogeneous residual variance CL3 was adequate for line UFV1, and model CL5 for line UFV2. A Legendre polynomial function of order 5 for the direct additive genetic effect and order 5 for the animal permanent effect for line UFV1, and of order 3 for the direct additive genetic effect and order 5 for the animal permanent effect for line UFV2, should be used in the genetic evaluation of the growth curve of meat-type quail.
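
    To make the role of the Legendre polynomials concrete, the sketch below builds the covariates such a random regression model would use for the weighing ages listed in the abstract. It is an illustration only: the helper names are hypothetical, ages are rescaled linearly to [-1, 1] (a common convention, assumed here), and `degree` is the highest polynomial degree, which may not match how "order" is counted in the paper.

```python
def standardize_ages(ages):
    """Rescale ages linearly so the youngest maps to -1 and the oldest to +1."""
    lo, hi = min(ages), max(ages)
    return [-1 + 2 * (a - lo) / (hi - lo) for a in ages]


def legendre_row(t, degree):
    """Return [P_0(t), ..., P_degree(t)] using Bonnet's recurrence."""
    row = [1.0, t][: degree + 1]
    for k in range(1, degree):
        # (k + 1) P_{k+1}(t) = (2k + 1) t P_k(t) - k P_{k-1}(t)
        row.append(((2 * k + 1) * t * row[k] - k * row[k - 1]) / (k + 1))
    return row


# Weighing ages (days) taken from the abstract.
ages = [1, 7, 14, 21, 28, 35, 42, 77, 112, 147]
t = standardize_ages(ages)
# One row of covariates per age; the degree of 4 is an arbitrary choice here.
design = [legendre_row(ti, 4) for ti in t]
```

    Each row of `design` holds the polynomial values that multiply an animal's random regression coefficients at that age. Animal-breeding software often rescales these polynomials to a normalized form, which is omitted here.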

  12. Mapping SOC (Soil Organic Carbon) using LiDAR-derived vegetation indices in a random forest regression model

    Science.gov (United States)

    Will, R. M.; Glenn, N. F.; Benner, S. G.; Pierce, J. L.; Spaete, L.; Li, A.

    2015-12-01

    Quantifying SOC (Soil Organic Carbon) storage in complex terrain is challenging due to high spatial variability. Generally, the challenge is met by transforming point data to the entire landscape using surrogate, spatially-distributed, variables like elevation or precipitation. In many ecosystems, remotely sensed information on above-ground vegetation (e.g. NDVI) is a good predictor of below-ground carbon stocks. In this project, we are attempting to improve this predictive method by incorporating LiDAR-derived vegetation indices. LiDAR provides a mechanism for improved characterization of aboveground vegetation by providing structural parameters such as vegetation height and biomass. In this study, a random forest model is used to predict SOC using a suite of LiDAR-derived vegetation indices as predictor variables. The Reynolds Creek Experimental Watershed (RCEW) is an ideal location for a study of this type since it encompasses a strong elevation/precipitation gradient that supports lower biomass sagebrush ecosystems at low elevations and forests with more biomass at higher elevations. Sagebrush ecosystems composed of Wyoming, Low and Mountain Sagebrush have SOC values ranging from 0.4 to 1% (top 30 cm), while higher biomass ecosystems composed of aspen, juniper and fir have SOC values approaching 4% (top 30 cm). Large differences in SOC have been observed between canopy and interspace locations and high resolution vegetation information is likely to explain plot scale variability in SOC. Mapping of the SOC reservoir will help identify underlying controls on SOC distribution and provide insight into which processes are most important in determining SOC in semi-arid mountainous regions. In addition, airborne LiDAR has the potential to characterize vegetation communities at a high resolution and could be a tool for improving estimates of SOC at larger scales.

  13. Estimation of Genetic Parameters for First Lactation Monthly Test-day Milk Yields using Random Regression Test Day Model in Karan Fries Cattle

    Directory of Open Access Journals (Sweden)

    Ajay Singh

    2016-06-01

    Full Text Available A single trait linear mixed random regression test-day model was applied for the first time for analyzing the first lactation monthly test-day milk yield records in Karan Fries cattle. The test-day milk yield data was modeled using a random regression model (RRM) considering different orders of Legendre polynomial for the additive genetic effect (4th order) and the permanent environmental effect (5th order). Data pertaining to 1,583 lactation records spread over a period of 30 years were recorded and analyzed in the study. The variance component, heritability and genetic correlations among test-day milk yields were estimated using RRM. RRM heritability estimates of test-day milk yield varied from 0.11 to 0.22 in different test-day records. The estimates of genetic correlations between different test-day milk yields ranged from 0.01 (test-day 1 [TD-1] and TD-11) to 0.99 (TD-4 and TD-5). The magnitudes of genetic correlations between test-day milk yields decreased as the interval between test-days increased and adjacent test-day had higher correlations. Additive genetic and permanent environment variances were higher for test-day milk yields at both ends of lactation. The residual variance was observed to be lower than the permanent environment variance for all the test-day milk yields.

  14. Downscaling of surface moisture flux and precipitation in the Ebro Valley (Spain) using analogues and analogues followed by random forests and multiple linear regression

    Directory of Open Access Journals (Sweden)

    G. Ibarra-Berastegi

    2011-06-01

    Full Text Available In this paper, reanalysis fields from the ECMWF have been statistically downscaled to predict, from large-scale atmospheric fields, surface moisture flux and daily precipitation at two observatories (Zaragoza and Tortosa, Ebro Valley, Spain) during the 1961–2001 period. Three types of downscaling models have been built: (i) analogues, (ii) analogues followed by random forests and (iii) analogues followed by multiple linear regression. The inputs consist of data (predictor fields) taken from the ERA-40 reanalysis. The predicted fields are precipitation and surface moisture flux as measured at the two observatories. With the aim of reducing the dimensionality of the problem, the ERA-40 fields have been decomposed using empirical orthogonal functions. Available daily data has been divided into two parts: a training period used to find a group of about 300 analogues to build the downscaling model (1961–1996) and a test period (1997–2001), where models' performance has been assessed using independent data. In the case of surface moisture flux, the models based on analogues followed by random forests do not clearly outperform those built on analogues plus multiple linear regression, while simple averages calculated from the nearest analogues found in the training period yielded only slightly worse results. In the case of precipitation, the three types of model performed equally. These results suggest that most of the models' downscaling capabilities can be attributed to the analogues-calculation stage.
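
    The simplest of the three model types, averaging the observations of the nearest analogues, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: the function name is hypothetical, similarity is taken to be Euclidean distance between reduced predictor vectors (for example EOF scores), and the numbers are made up.

```python
from math import dist


def analogue_forecast(target, train_predictors, train_obs, k):
    """Average the observations of the k training days most similar to `target`."""
    # Rank training days by Euclidean distance in predictor space (an assumption;
    # the abstract does not state which similarity measure was used).
    ranked = sorted(
        range(len(train_predictors)),
        key=lambda i: dist(target, train_predictors[i]),
    )
    nearest = ranked[:k]
    return sum(train_obs[i] for i in nearest) / k


# Made-up training set: two reduced predictors per day and one observed value.
train_predictors = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (0.0, 1.0)]
train_obs = [2.0, 4.0, 50.0, 6.0]
estimate = analogue_forecast((0.1, 0.1), train_predictors, train_obs, k=3)
```

    The other two model types in the abstract would replace the final average with a random forest or a multiple linear regression fitted on the selected analogues.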

  15. Panel Smooth Transition Regression Models

    DEFF Research Database (Denmark)

    González, Andrés; Terasvirta, Timo; Dijk, Dick van

    We introduce the panel smooth transition regression model. This new model is intended for characterizing heterogeneous panels, allowing the regression coefficients to vary both across individuals and over time. Specifically, heterogeneity is allowed for by assuming that these coefficients are bou...

  16. Supremum Norm Posterior Contraction and Credible Sets for Nonparametric Multivariate Regression

    NARCIS (Netherlands)

    Yoo, W.W.; Ghosal, S

    2016-01-01

    In the setting of nonparametric multivariate regression with unknown error variance, we study asymptotic properties of a Bayesian method for estimating a regression function f and its mixed partial derivatives. We use a random series of tensor product of B-splines with normal basis coefficients as a

  17. Contribution to the neutronic theory of random stacks (diffusion coefficient and first-flight collision probabilities) with a general theorem on collision probabilities

    International Nuclear Information System (INIS)

    Dixmier, Marc.

    1980-10-01

    A general expression of the diffusion coefficient (d.c.) of neutrons was given, with stress being put on symmetries. A system of first-flight collision probabilities for the case of a random stack of any number of types of one- and two-zoned spherical pebbles, with an albedo at the frontiers of the elements or (either) consideration of the interstitial medium, was built; to that end, the bases of collision probability theory were reviewed, and a wide generalisation of the reciprocity theorem for those probabilities was demonstrated. The migration area of neutrons was expressed for any random stack of convex, 'simple' and 'regular-contact' elements, taking into account the correlations between free-paths; the average cosine of re-emission of neutrons by an element, in the case of a homogeneous spherical pebble and the transport approximation, was expressed; the superiority of the so-found result over Behrens' theory, for the type of media under consideration, was established. The 'fine structure current term' of the d.c. was also expressed, and it was shown that its 'polarisation term' is negligible. Numerical applications showed that the global heterogeneity effect on the d.c. of pebble-bed reactors is comparable with that for Graphite-moderated, Carbon gas-cooled, natural Uranium reactors. The code CARACOLE, which integrates all the results here obtained, was introduced. [fr]

  18. Exploring reasons for the observed inconsistent trial reports on intra-articular injections with hyaluronic acid in the treatment of osteoarthritis: Meta-regression analyses of randomized trials.

    Science.gov (United States)

    Johansen, Mette; Bahrt, Henriette; Altman, Roy D; Bartels, Else M; Juhl, Carsten B; Bliddal, Henning; Lund, Hans; Christensen, Robin

    2016-08-01

    The aim was to identify factors explaining inconsistent observations concerning the efficacy of intra-articular hyaluronic acid compared to intra-articular sham/control, or non-intervention control, in patients with symptomatic osteoarthritis, based on randomized clinical trials (RCTs). A systematic review and meta-regression analyses of available randomized trials were conducted. The outcome, pain, was assessed according to a pre-specified hierarchy of potentially available outcomes. Hedges's standardized mean difference [SMD (95% CI)] served as effect size. REstricted Maximum Likelihood (REML) mixed-effects models were used to combine study results, and heterogeneity was calculated and interpreted as Tau-squared and I-squared, respectively. Overall, 99 studies (14,804 patients) met the inclusion criteria: Of these, only 71 studies (72%), including 85 comparisons (11,216 patients), had adequate data available for inclusion in the primary meta-analysis. Overall, compared with placebo, intra-articular hyaluronic acid reduced pain with an effect size of -0.39 [-0.47 to -0.31; P hyaluronic acid. Based on available trial data, intra-articular hyaluronic acid showed a better effect than intra-articular saline on pain reduction in osteoarthritis. Publication bias and the risk of selective outcome reporting suggest only small clinical effect compared to saline. Copyright © 2016 Elsevier Inc. All rights reserved.

  19. Assessing the suitability of summary data for two-sample Mendelian randomization analyses using MR-Egger regression: the role of the I2 statistic.

    Science.gov (United States)

    Bowden, Jack; Del Greco M, Fabiola; Minelli, Cosetta; Davey Smith, George; Sheehan, Nuala A; Thompson, John R

    2016-12-01

    MR-Egger regression has recently been proposed as a method for Mendelian randomization (MR) analyses incorporating summary data estimates of causal effect from multiple individual variants, which is robust to invalid instruments. It can be used to test for directional pleiotropy and provides an estimate of the causal effect adjusted for its presence. MR-Egger regression provides a useful additional sensitivity analysis to the standard inverse variance weighted (IVW) approach that assumes all variants are valid instruments. Both methods use weights that consider the single nucleotide polymorphism (SNP)-exposure associations to be known, rather than estimated. We call this the `NO Measurement Error' (NOME) assumption. Causal effect estimates from the IVW approach exhibit weak instrument bias whenever the genetic variants utilized violate the NOME assumption, which can be reliably measured using the F-statistic. The effect of NOME violation on MR-Egger regression has yet to be studied. An adaptation of the I² statistic from the field of meta-analysis is proposed to quantify the strength of NOME violation for MR-Egger. It lies between 0 and 1, and indicates the expected relative bias (or dilution) of the MR-Egger causal estimate in the two-sample MR context. We call it I²_GX. The method of simulation extrapolation is also explored to counteract the dilution. Their joint utility is evaluated using simulated data and applied to a real MR example. In simulated two-sample MR analyses we show that, when a causal effect exists, the MR-Egger estimate of causal effect is biased towards the null when NOME is violated, and the stronger the violation (as indicated by lower values of I²_GX), the stronger the dilution. When additionally all genetic variants are valid instruments, the type I error rate of the MR-Egger test for pleiotropy is inflated and the causal effect underestimated. Simulation extrapolation is shown to substantially mitigate these adverse effects. We
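
    The quantities named in this abstract can be illustrated with a short sketch. It rests on several assumptions: IVW is written as a weighted regression through the origin and MR-Egger as the same regression with an intercept, both weighted by the inverse variance of the SNP-outcome estimates; the I² function applies the usual meta-analysis definition, (Q - (k - 1)) / Q, to the SNP-exposure estimates, which may differ from the exact weighting used in the paper. All names and numbers are hypothetical.

```python
def weighted_line(x, y, w, intercept=True):
    """Weighted least-squares fit of y on x; returns (intercept, slope)."""
    total = sum(w)
    # Without an intercept the line is forced through the origin.
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total if intercept else 0.0
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total if intercept else 0.0
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def i_squared(estimates, std_errors):
    """Meta-analysis I^2 = (Q - (k - 1)) / Q, truncated at 0."""
    w = [1.0 / se ** 2 for se in std_errors]
    mean = sum(wi * e for wi, e in zip(w, estimates)) / sum(w)
    q = sum(wi * (e - mean) ** 2 for wi, e in zip(w, estimates))  # Cochran's Q
    return max(0.0, (q - (len(estimates) - 1)) / q) if q > 0 else 0.0


# Made-up summary data for four variants.
beta_x = [0.1, 0.2, 0.3, 0.4]               # SNP-exposure associations
se_x = [0.01] * 4
beta_y = [0.05 + 0.5 * b for b in beta_x]   # pleiotropy 0.05, causal effect 0.5
se_y = [0.02] * 4
weights = [1.0 / s ** 2 for s in se_y]

egger_intercept, egger_slope = weighted_line(beta_x, beta_y, weights)
_, ivw_slope = weighted_line(beta_x, beta_y, weights, intercept=False)
i2_gx = i_squared(beta_x, se_x)
```

    In this noise-free example the intercept absorbs the directional pleiotropy, so the MR-Egger slope recovers the causal effect while the IVW slope does not. Real analyses also orient the variants so that the exposure associations are positive, a step omitted here.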

  20. Regression Phalanxes

    OpenAIRE

    Zhang, Hongyang; Welch, William J.; Zamar, Ruben H.

    2017-01-01

    Tomal et al. (2015) introduced the notion of "phalanxes" in the context of rare-class detection in two-class classification problems. A phalanx is a subset of features that work well for classification tasks. In this paper, we propose a different class of phalanxes for application in regression settings. We define a "Regression Phalanx" - a subset of features that work well together for prediction. We propose a novel algorithm which automatically chooses Regression Phalanxes from high-dimensi...

  1. Vector regression introduced

    Directory of Open Access Journals (Sweden)

    Mok Tik

    2014-06-01

    Full Text Available This study formulates regression of vector data that will enable statistical analysis of various geodetic phenomena such as polar motion, ocean currents, typhoon/hurricane tracking, crustal deformations, and precursory earthquake signals. The observed vector variable of an event (dependent vector variable) is expressed as a function of a number of hypothesized phenomena realized also as vector variables (independent vector variables) and/or scalar variables that are likely to impact the dependent vector variable. The proposed representation has the unique property of solving the coefficients of independent vector variables (explanatory variables) also as vectors, hence it supersedes multivariate multiple regression models, in which the unknown coefficients are scalar quantities. For the solution, complex numbers are used to represent vector information, and the method of least squares is deployed to estimate the vector model parameters after transforming the complex vector regression model into a real vector regression model through isomorphism. Various operational statistics for testing the predictive significance of the estimated vector parameter coefficients are also derived. A simple numerical example demonstrates the use of the proposed vector regression analysis in modeling typhoon paths.
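
    The idea of carrying 2-D vectors as complex numbers can be shown with one dependent and one independent vector variable. The sketch below is a simplification and not the paper's procedure: it solves the least-squares problem directly in complex arithmetic instead of transforming to a real model, and the function name and data are hypothetical.

```python
def complex_least_squares(x, y):
    """Fit y = a + b * x by least squares, with x, y, a and b all complex."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Minimising sum |y - a - b x|^2 gives b = sum(conj(xc) * yc) / sum |xc|^2,
    # where xc and yc are the centred observations.
    num = sum((xi - x_mean).conjugate() * (yi - y_mean) for xi, yi in zip(x, y))
    den = sum(abs(xi - x_mean) ** 2 for xi in x)
    b = num / den
    a = y_mean - b * x_mean
    return a, b


# Made-up 2-D vectors stored as complex numbers (east + 1j * north).
x = [1 + 0j, 0 + 1j, -1 + 0j, 2 + 2j]
true_a, true_b = 1 + 1j, 1j          # shift by (1, 1), rotate by 90 degrees
y = [true_a + true_b * xi for xi in x]
a_hat, b_hat = complex_least_squares(x, y)
```

    The fitted coefficient `b_hat` is itself a vector quantity: its modulus scales the predictor and its argument rotates it, which is the property the abstract contrasts with scalar coefficients.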

  2. Multicollinearity and Regression Analysis

    Science.gov (United States)

    Daoud, Jamal I.

    2017-12-01

    In regression analysis, a correlation between the response and the predictor(s) is expected, but correlation among the predictors is undesirable. The number of predictors included in the regression model depends on many factors, among them historical data and experience. In the end, the selection of the most important predictors is a judgment made by the researcher. Multicollinearity is a phenomenon in which two or more predictors are correlated; when this happens, the standard errors of the coefficients increase [8]. Increased standard errors mean that the coefficients for some or all independent variables may be found not to be significantly different from 0. In other words, by inflating the standard errors, multicollinearity makes some variables statistically insignificant when they should be significant. In this paper we focus on multicollinearity, its causes, and its consequences for the reliability of the regression model.
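
    The inflation of standard errors can be quantified with the variance inflation factor (VIF), a standard diagnostic that the abstract itself does not name. For a model with exactly two predictors, VIF = 1 / (1 - r²), where r is their correlation. The sketch below uses hypothetical function names and made-up data.

```python
from math import sqrt


def pearson_r(a, b):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(a)
    a_mean, b_mean = sum(a) / n, sum(b) / n
    sab = sum((ai - a_mean) * (bi - b_mean) for ai, bi in zip(a, b))
    saa = sum((ai - a_mean) ** 2 for ai in a)
    sbb = sum((bi - b_mean) ** 2 for bi in b)
    return sab / sqrt(saa * sbb)


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a two-predictor linear model."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


# Made-up predictors with correlation 0.8.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
vif = vif_two_predictors(x1, x2)
se_inflation = sqrt(vif)  # factor by which each coefficient's standard error grows
```

    A VIF of 1 means the predictors are uncorrelated; the value grows without bound as r approaches 1 or -1.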

  3. Determining clinical benefits of drug-eluting coronary stents according to the population risk profile: a meta-regression from 31 randomized trials.

    Science.gov (United States)

    Moreno, Raul; Martin-Reyes, Roberto; Jimenez-Valero, Santiago; Sanchez-Recalde, Angel; Galeote, Guillermo; Calvo, Luis; Plaza, Ignacio; Lopez-Sendon, Jose-Luis

    2011-04-01

    The use of drug-eluting stents (DES) in unfavourable patients has been associated with higher rates of clinical complications and stent thrombosis, and because of that concerns about the use of DES in high-risk settings have been raised. This study sought to demonstrate that the clinical benefit of DES increases as the risk profile of the patients increases. A meta-regression analysis from 31 randomized trials that compared DES and bare-metal stents, including overall 12,035 patients, was performed. The relationship between the clinical benefit of using DES (number of patients to treat [NNT] to prevent one episode of target lesion revascularization [TLR]), and the risk profile of the population (rate of TLR in patients allocated to bare-metal stents) in each trial was evaluated. The clinical benefit of DES increased as the risk profile of each study population increased: NNT for TLR=31.1-1.2 (TLR for bare-metal stents); prisk profile of each study population, since the effect of DES in mortality, myocardial infarction, and stent thrombosis, was not adversely affected by the risk profile of each study population (95% confidence interval for β value 0.09 to 0.11, -0.12 to 0.19, and -0.03 to-0.15 for mortality, myocardial infarction, and stent thrombosis, respectively). The clinical benefit of DES increases as the risk profile of the patients increases, without affecting safety. Copyright © 2009 Elsevier Ireland Ltd. All rights reserved.

  4. Sample size adjustments for varying cluster sizes in cluster randomized trials with binary outcomes analyzed with second-order PQL mixed logistic regression.

    Science.gov (United States)

    Candel, Math J J M; Van Breukelen, Gerard J P

    2010-06-30

    Adjustments of sample size formulas are given for varying cluster sizes in cluster randomized trials with a binary outcome when testing the treatment effect with mixed effects logistic regression using second-order penalized quasi-likelihood estimation (PQL). Starting from first-order marginal quasi-likelihood (MQL) estimation of the treatment effect, the asymptotic relative efficiency of unequal versus equal cluster sizes is derived. A Monte Carlo simulation study shows this asymptotic relative efficiency to be rather accurate for realistic sample sizes, when employing second-order PQL. An approximate, simpler formula is presented to estimate the efficiency loss due to varying cluster sizes when planning a trial. In many cases sampling 14 per cent more clusters is sufficient to repair the efficiency loss due to varying cluster sizes. Since current closed-form formulas for sample size calculation are based on first-order MQL, planning a trial also requires a conversion factor to obtain the variance of the second-order PQL estimator. In a second Monte Carlo study, this conversion factor turned out to be 1.25 at most. (c) 2010 John Wiley & Sons, Ltd.

  5. Detecting spatio-temporal changes in agricultural land use in Heilongjiang province, China using MODIS time-series data and a random forest regression model

    Science.gov (United States)

    Hu, Q.; Friedl, M. A.; Wu, W.

    2017-12-01

    Accurate and timely information regarding the spatial distribution of crop types and their changes is essential for acreage surveys, yield estimation, water management, and agricultural production decision-making. In recent years, increasing population, dietary shifts and climate change have driven drastic changes in China's agricultural land use. However, no maps are currently available that document the spatial and temporal patterns of these agricultural land use changes. Because of its short revisit period, rich spectral bands and global coverage, MODIS time series data has been shown to have great potential for detecting the seasonal dynamics of different crop types. However, its inherently coarse spatial resolution limits the accuracy with which crops can be identified from MODIS in regions with small fields or complex agricultural landscapes. To evaluate this more carefully and specifically understand the strengths and weaknesses of MODIS data for crop-type mapping, we used MODIS time-series imagery to map the sub-pixel fractional crop area for four major crop types (rice, corn, soybean and wheat) at 500-m spatial resolution for Heilongjiang province, one of the most important grain-production regions in China where recent agricultural land use change has been rapid and pronounced. To do this, a random forest regression (RF-g) model was constructed to estimate the percentage of each sub-pixel crop type in 2006, 2011 and 2016. Crop type maps generated through expert visual interpretation of high spatial resolution images (i.e., Landsat and SPOT data) were used to calibrate the regression model. Five different time series of vegetation indices (155 features) derived from different spectral channels of MODIS land surface reflectance (MOD09A1) data were used as candidate features for the RF-g model. An out-of-bag strategy and backward elimination approach was applied to select the optimal spectra-temporal feature subset for each crop type. 
The resulting crop maps

  6. Random regression models to account for the effect of genotype by environment interaction due to heat stress on the milk yield of Holstein cows under tropical conditions.

    Science.gov (United States)

    Santana, Mário L; Bignardi, Annaiza Braga; Pereira, Rodrigo Junqueira; Menéndez-Buxadera, Alberto; El Faro, Lenira

    2016-02-01

    The present study had the following objectives: to compare random regression models (RRM) considering the time-dependent (days in milk, DIM) and/or temperature × humidity-dependent (THI) covariate for genetic evaluation; to identify the effect of genotype by environment interaction (G×E) due to heat stress on milk yield; and to quantify the loss of milk yield due to heat stress across lactation of cows under tropical conditions. A total of 937,771 test-day records from 3603 first lactations of Brazilian Holstein cows obtained between 2007 and 2013 were analyzed. An important reduction in milk yield due to heat stress was observed for THI values above 66 (-0.23 kg/day/THI). Three phases of milk yield loss were identified during lactation, the most damaging one at the end of lactation (-0.27 kg/day/THI). Using the most complex RRM, the additive genetic variance could be altered simultaneously as a function of both DIM and THI values. This model could be recommended for the genetic evaluation taking into account the effect of G×E. The response to selection in the comfort zone (THI ≤ 66) is expected to be higher than that obtained in the heat stress zone (THI > 66) of the animals. The genetic correlations between milk yield in the comfort and heat stress zones were less than unity at opposite extremes of the environmental gradient. Thus, the best animals for milk yield in the comfort zone are not necessarily the best in the zone of heat stress and, therefore, G×E due to heat stress should not be neglected in the genetic evaluation.

  7. Autistic Regression

    Science.gov (United States)

    Matson, Johnny L.; Kozlowski, Alison M.

    2010-01-01

    Autistic regression is one of the many mysteries in the developmental course of autism and pervasive developmental disorders not otherwise specified (PDD-NOS). Various definitions of this phenomenon have been used, further clouding the study of the topic. Despite this problem, some efforts at establishing prevalence have been made. The purpose of…

  8. Linear regression

    CERN Document Server

    Olive, David J

    2017-01-01

    This text covers both multiple linear regression and some experimental design models. The text uses the response plot to visualize the model and to detect outliers, does not assume that the error distribution has a known parametric distribution, develops prediction intervals that work when the error distribution is unknown, suggests bootstrap hypothesis tests that may be useful for inference after variable selection, and develops prediction regions and large sample theory for the multivariate linear regression model that has m response variables. A relationship between multivariate prediction regions and confidence regions provides a simple way to bootstrap confidence regions. These confidence regions often provide a practical method for testing hypotheses. There is also a chapter on generalized linear models and generalized additive models. There are many R functions to produce response and residual plots, to simulate prediction intervals and hypothesis tests, to detect outliers, and to choose response trans...

  9. Signal intensity of normal breast tissue at MR mammography on midfield: Applying a random coefficient model evaluating the effect of doubling the contrast dose

    Energy Technology Data Exchange (ETDEWEB)

    Marklund, Mette [Parker Institute: Imaging Unit, Frederiksberg Hospital (Denmark)], E-mail: mm@frh.regionh.dk; Christensen, Robin [Parker Institute: Musculoskeletal Statistics Unit, Frederiksberg Hospital (Denmark)], E-mail: robin.christensen@frh.regionh.dk; Torp-Pedersen, Soren [Parker Institute: Imaging Unit, Frederiksberg Hospital (Denmark)], E-mail: stp@frh.regionh.dk; Thomsen, Carsten [Department of Radiology, Rigshospitalet, University of Copenhagen (Denmark)], E-mail: carsten.thomsen@rh.regionh.dk; Nolsoe, Christian P. [Department of Radiology, Koge Hospital (Denmark)], E-mail: cnolsoe@dadlnet.dk

    2009-01-15

    Purpose: To prospectively investigate the effect on signal intensity (SI) of healthy breast parenchyma on magnetic resonance mammography (MRM) when doubling the contrast dose from 0.1 to 0.2 mmol/kg bodyweight. Materials and methods: Informed consent and institutional review board approval were obtained. Twenty-five healthy female volunteers (median age: 24 years (range: 21-37 years) and median bodyweight: 65 kg (51-80 kg)) completed two dynamic MRM examinations on a 0.6 T open scanner. The inter-examination time was 24 h (23.5-25 h). The following sequences were applied: axial T2W TSE and an axial dynamic T1W FFED, with a total of seven frames. At day 1, an i.v. gadolinium (Gd) bolus injection of 0.1 mmol/kg bodyweight (Omniscan) (low) was administered. On day 2, the contrast dose was increased to 0.2 mmol/kg (high). Injection rate was 2 mL/s (day 1) and 4 mL/s (day 2). Any use of estrogen containing oral contraceptives (ECOC) was recorded. Post-processing with automated subtraction, manually traced ROI (region of interest) and recording of the SI was performed. A random coefficient model was applied. Results: We found an SI increase of 24.2% and 40% following the low and high dose, respectively (P < 0.0001); corresponding to a 65% (95% CI: 37-99%) SI increase, indicating a moderate saturation. Although not statistically significant (P = 0.06), the results indicated a tendency, towards lower maximal SI in the breast parenchyma of ECOC users compared to non-ECOC users. Conclusion: We conclude that the contrast dose can be increased from 0.1 to 0.2 mmol/kg bodyweight, if a better contrast/noise relation is desired but increasing the contrast dose above 0.2 mmol/kg bodyweight is not likely to improve the enhancement substantially due to the moderate saturation observed. 
Further research is needed to determine the impact of ECOC on the relative enhancement ratio, and further studies are needed to determine if a possible use of ECOC should be considered a compromising

  10. Signal intensity of normal breast tissue at MR mammography on midfield: Applying a random coefficient model evaluating the effect of doubling the contrast dose

    International Nuclear Information System (INIS)

    Marklund, Mette; Christensen, Robin; Torp-Pedersen, Soren; Thomsen, Carsten; Nolsoe, Christian P.

    2009-01-01

    Purpose: To prospectively investigate the effect on signal intensity (SI) of healthy breast parenchyma on magnetic resonance mammography (MRM) when doubling the contrast dose from 0.1 to 0.2 mmol/kg bodyweight. Materials and methods: Informed consent and institutional review board approval were obtained. Twenty-five healthy female volunteers (median age: 24 years (range: 21-37 years) and median bodyweight: 65 kg (51-80 kg)) completed two dynamic MRM examinations on a 0.6 T open scanner. The inter-examination time was 24 h (23.5-25 h). The following sequences were applied: axial T2W TSE and an axial dynamic T1W FFED, with a total of seven frames. At day 1, an i.v. gadolinium (Gd) bolus injection of 0.1 mmol/kg bodyweight (Omniscan) (low) was administered. On day 2, the contrast dose was increased to 0.2 mmol/kg (high). Injection rate was 2 mL/s (day 1) and 4 mL/s (day 2). Any use of estrogen containing oral contraceptives (ECOC) was recorded. Post-processing with automated subtraction, manually traced ROI (region of interest) and recording of the SI was performed. A random coefficient model was applied. Results: We found an SI increase of 24.2% and 40% following the low and high dose, respectively (P < 0.0001); corresponding to a 65% (95% CI: 37-99%) SI increase, indicating a moderate saturation. Although not statistically significant (P = 0.06), the results indicated a tendency, towards lower maximal SI in the breast parenchyma of ECOC users compared to non-ECOC users. Conclusion: We conclude that the contrast dose can be increased from 0.1 to 0.2 mmol/kg bodyweight, if a better contrast/noise relation is desired but increasing the contrast dose above 0.2 mmol/kg bodyweight is not likely to improve the enhancement substantially due to the moderate saturation observed. 
Further research is needed to determine the impact of ECOC on the relative enhancement ratio, and further studies are needed to determine if a possible use of ECOC should be considered a compromising

  11. Interpretation of commonly used statistical regression models.

    Science.gov (United States)

    Kasza, Jessica; Wolfe, Rory

    2014-01-01

    A review of some regression models commonly used in respiratory health applications is provided in this article. Simple linear regression, multiple linear regression, logistic regression and ordinal logistic regression are considered. The focus of this article is on the interpretation of the regression coefficients of each model, which are illustrated through the application of these models to a respiratory health research study. © 2013 The Authors. Respirology © 2013 Asian Pacific Society of Respirology.

  12. Linear regression in astronomy. I

    Science.gov (United States)

    Isobe, Takashi; Feigelson, Eric D.; Akritas, Michael G.; Babu, Gutti Jogesh

    1990-01-01

    Five methods for obtaining linear regression fits to bivariate data with unknown or insignificant measurement errors are discussed: ordinary least-squares (OLS) regression of Y on X, OLS regression of X on Y, the bisector of the two OLS lines, orthogonal regression, and 'reduced major-axis' regression. These methods have been used by various researchers in observational astronomy, most importantly in cosmic distance scale applications. Formulas for calculating the slope and intercept coefficients and their uncertainties are given for all the methods, including a new general form of the OLS variance estimates. The accuracy of the formulas was confirmed using numerical simulations. The applicability of the procedures is discussed with respect to their mathematical properties, the nature of the astronomical data under consideration, and the scientific purpose of the regression. It is found that, for problems needing symmetrical treatment of the variables, the OLS bisector performs significantly better than orthogonal or reduced major-axis regression.
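
    As an illustration of the five slope estimators listed above, the following Python sketch computes them from the sample sums of squares using the commonly quoted closed forms. The function name `five_slopes` and the toy data are hypothetical, and the uncertainty formulas mentioned in the abstract are not reproduced here.

```python
import math


def five_slopes(x, y):
    """Return (slope, intercept) pairs for five bivariate line fits.

    Assumes len(x) == len(y) >= 2 and a non-zero sample covariance.
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sign = math.copysign(1.0, sxy)

    b_yx = sxy / sxx  # OLS regression of Y on X
    b_xy = syy / sxy  # OLS regression of X on Y, expressed as dy/dx
    # Line bisecting the angle between the two OLS lines.
    bisector = (b_yx * b_xy - 1.0
                + math.sqrt((1.0 + b_yx ** 2) * (1.0 + b_xy ** 2))) / (b_yx + b_xy)
    # Orthogonal regression: minimizes perpendicular distances to the line.
    d = b_xy - 1.0 / b_yx
    orthogonal = 0.5 * (d + sign * math.sqrt(4.0 + d * d))
    # Reduced major axis: geometric mean of the two OLS slopes.
    rma = sign * math.sqrt(b_yx * b_xy)

    slopes = {
        "ols_y_on_x": b_yx,
        "ols_x_on_y": b_xy,
        "ols_bisector": bisector,
        "orthogonal": orthogonal,
        "reduced_major_axis": rma,
    }
    # Every fitted line passes through the centroid (mx, my).
    return {name: (b, my - b * mx) for name, b in slopes.items()}


# Toy data, invented for illustration only.
fits = five_slopes([0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 1.0, 3.0])
for name, (slope, intercept) in fits.items():
    print(f"{name}: slope={slope:.4f}, intercept={intercept:.4f}")
```

    For positively correlated data the three symmetric estimates fall between the two OLS slopes, which is one reason the choice among them depends on the scientific purpose of the fit.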

  13. Uso de modelos de regressão aleatória para descrever a variação genética da produção de leite na raça Holandesa [Random regressions models to describe the genetic variation of milk yield in Holstein breed]

    Directory of Open Access Journals (Sweden)

    Cláudio Vieira de Araújo

    2006-06-01

    Data comprising 68,523 test-day milk yield records of 8,536 Holstein cows, calving from 1996 to 2001, were used to compare random regression models for estimating variance components. Test-day (TD) records were analyzed as multiple traits, considering each TD as a different trait. The same records were also analyzed as longitudinal data by means of random regression models that differed in the function used to describe the trajectory of the animals' lactation curve. The functions used were the Wilmink exponential function, the Ali and Schaeffer function, and Legendre polynomials of second and fourth degree. Models were compared on the following criteria: variance component estimates obtained from the multiple-trait model and from random regression; residual variance values; and values of the log-likelihood function. Heritability estimates from the multiple-trait models ranged from 0.110 to 0.244. For the random regression models these values ranged from 0.127 to 0.301, with the highest estimates in the models with the largest number of parameters. The random regression models that used Legendre polynomials described the genetic variation of milk yield best.

  14. Correlation and simple linear regression.

    Science.gov (United States)

    Zou, Kelly H; Tuncali, Kemal; Silverman, Stuart G

    2003-06-01

    In this tutorial article, the concepts of correlation and regression are reviewed and demonstrated. The authors review and compare two correlation coefficients, the Pearson correlation coefficient and the Spearman rho, for measuring linear and nonlinear relationships between two continuous variables. In the case of measuring the linear relationship between a predictor and an outcome variable, simple linear regression analysis is conducted. These statistical concepts are illustrated by using a data set from published literature to assess a computed tomography-guided interventional technique. These statistical methods are important for exploring the relationships between variables and can be applied to many radiologic studies.
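
    To make the contrast between the two coefficients concrete, here is a minimal Python sketch, assuming the usual definitions: Pearson's r on the raw values and Spearman's rho as Pearson's r applied to (tie-averaged) ranks. The helper names and the toy data are hypothetical and are not taken from the tutorial.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# A monotonic but nonlinear relationship (y = x cubed), invented for illustration.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print("Pearson r   :", round(pearson(x, y), 4))
print("Spearman rho:", round(spearman(x, y), 4))
```

    On monotonic but curved data such as this, the rank-based coefficient reaches 1 while Pearson's r stays below 1, which is the distinction between linear and monotonic association drawn in the abstract.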

  15. Retro-regression--another important multivariate regression improvement.

    Science.gov (United States)

    Randić, M

    2001-01-01

    We review the serious problem associated with instabilities of the coefficients of regression equations, referred to as the MRA (multivariate regression analysis) "nightmare of the first kind". This is manifested when in a stepwise regression a descriptor is included or excluded from a regression. The consequence is an unpredictable change of the coefficients of the descriptors that remain in the regression equation. We follow with consideration of an even more serious problem, referred to as the MRA "nightmare of the second kind", arising when optimal descriptors are selected from a large pool of descriptors. This process typically causes at different steps of the stepwise regression a replacement of several previously used descriptors by new ones. We describe a procedure that resolves these difficulties. The approach is illustrated on boiling points of nonanes which are considered (1) by using an ordered connectivity basis; (2) by using an ordering resulting from application of greedy algorithm; and (3) by using an ordering derived from an exhaustive search for optimal descriptors. A novel variant of multiple regression analysis, called retro-regression (RR), is outlined showing how it resolves the ambiguities associated with both "nightmares" of the first and the second kind of MRA.

  16. Repeated measurements of blood lactate concentration as a prognostic marker in horses with acute colitis evaluated with classification and regression trees (CART) and random forest analysis

    DEFF Research Database (Denmark)

    Petersen, Mette Bisgaard; Tolver, Anders; Husted, Louise

    2016-01-01

    -off value of 7 mmol/L had a sensitivity of 0.66 and a specificity of 0.92 in predicting survival. In independent test data, the sensitivity was 0.69 and the specificity was 0.76. At the observed survival rate (38%), the optimal decision tree identified horses as non-survivors when the Lac at admission...... admitted with acute colitis (trees, as well as random...

  17. Basic Diagnosis and Prediction of Persistent Contrail Occurrence using High-resolution Numerical Weather Analyses/Forecasts and Logistic Regression. Part I: Effects of Random Error

    Science.gov (United States)

    Duda, David P.; Minnis, Patrick

    2009-01-01

    Straightforward application of the Schmidt-Appleman contrail formation criteria to diagnose persistent contrail occurrence from numerical weather prediction data is hindered by significant bias errors in the upper tropospheric humidity. Logistic models of contrail occurrence have been proposed to overcome this problem, but basic questions remain about how random measurement error may affect their accuracy. A set of 5000 synthetic contrail observations is created to study the effects of random error in these probabilistic models. The simulated observations are based on distributions of temperature, humidity, and vertical velocity derived from Advanced Regional Prediction System (ARPS) weather analyses. The logistic models created from the simulated observations were evaluated using two common statistical measures of model accuracy, the percent correct (PC) and the Hanssen-Kuipers discriminant (HKD). To convert the probabilistic results of the logistic models into a dichotomous yes/no choice suitable for the statistical measures, two critical probability thresholds are considered. The HKD scores are higher when the climatological frequency of contrail occurrence is used as the critical threshold, while the PC scores are higher when the critical probability threshold is 0.5. For both thresholds, typical random errors in temperature, relative humidity, and vertical velocity are found to be small enough to allow for accurate logistic models of contrail occurrence. The accuracy of the models developed from synthetic data is over 85 percent for both the prediction of contrail occurrence and non-occurrence, although in practice, larger errors would be anticipated.
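
    The two accuracy measures named above can be sketched in a few lines of Python, assuming the standard 2x2 contingency-table definitions: percent correct is the fraction of correct yes/no forecasts, and the Hanssen-Kuipers discriminant is the hit rate minus the false-alarm rate. The function name `pc_and_hkd` and the toy probabilities are hypothetical; they are not the study's data.

```python
def pc_and_hkd(probabilities, observed, threshold):
    """Dichotomize forecast probabilities and score them.

    Returns (percent correct as a fraction, Hanssen-Kuipers discriminant).
    Assumes `observed` contains at least one event and one non-event.
    """
    hits = misses = false_alarms = correct_negatives = 0
    for p, event in zip(probabilities, observed):
        forecast_yes = p >= threshold
        if forecast_yes and event:
            hits += 1
        elif forecast_yes and not event:
            false_alarms += 1
        elif event:
            misses += 1
        else:
            correct_negatives += 1
    total = hits + misses + false_alarms + correct_negatives
    pc = (hits + correct_negatives) / total
    hit_rate = hits / (hits + misses)
    false_alarm_rate = false_alarms / (false_alarms + correct_negatives)
    return pc, hit_rate - false_alarm_rate


# Toy forecasts, invented for illustration only.
probs = [0.9, 0.6, 0.4, 0.45, 0.35, 0.2, 0.1, 0.1, 0.05, 0.05]
obs = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
climatology = sum(obs) / len(obs)  # climatological frequency of occurrence

print("threshold 0.5        :", pc_and_hkd(probs, obs, 0.5))
print("threshold climatology:", pc_and_hkd(probs, obs, climatology))
```

    Because events are rarer than non-events in this toy set, lowering the threshold to the climatological frequency trades some correct negatives for extra hits, which raises the discriminant while lowering percent correct.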

  18. Long-term remineralizing effect of MI Paste Plus on regression of early caries after orthodontic fixed appliance treatment: a 12-month follow-up randomized controlled trial.

    Science.gov (United States)

    Beerens, Moniek W; Ten Cate, Jacob M; Buijs, Mark J; van der Veen, Monique H

    2017-11-17

    Casein-phosphopeptide-amorphous-calcium-fluoride-phosphate (CPP-ACFP) can remineralize subsurface lesions. It is the active ingredient of MI-Paste-Plus® (MPP). The long-term remineralization efficacy is unknown. The objective was to evaluate the long-term effect of MPP versus a placebo paste on remineralization of enamel after fixed orthodontic treatment over a 12-month period. This trial was designed as a prospective, double-blinded, placebo-controlled RCT. Patients with subsurface lesions scheduled for removal of the appliance were included. They applied either MPP or control paste once a day at bedtime for 12 months, complementary to normal oral hygiene. Changes in enamel lesions (primary outcome) were fluorescence loss and lesion area determined by quantitative light-induced fluorescence (QLF). Secondary outcomes were microbial composition, by conventional plating, and acidogenicity of plaque, by capillary ion analysis (CIA), and lesion changes scored visually on clinical photographs. Participants [age = 15.5 years (SD = 1.6)] were randomly assigned to either the MPP or the control group, as determined by a computer-randomization scheme, created and locked before the start of the study. Participants received neutral-coloured concealed toothpaste tubes marked A or B. The patients and the observers were blinded with respect to the content of tube A or B. A total of 51 patients were analysed; MPP (n = 25) versus control group (n = 26); data loss (n = 14). There was no significant difference between the groups over time for all the used outcome measures. There was a significant improvement in enamel lesions (fluorescence loss) over time in both groups (P ...) ... orthodontic fixed appliance treatment did not improve these lesions during the 1 year following debonding. This trial is registered at the medical ethical committee of the VU Medical Centre in Amsterdam (NL.199226.029.07). © The Author 2017. Published by Oxford University Press on behalf of the European Orthodontic Society

  19. Interpreting Multiple Logistic Regression Coefficients in Prospective Observational Studies

    Science.gov (United States)

    1982-11-01

    prompted close examination of the issue at a workshop on hypertriglyceridemia where some of the cautions and perspectives given in this paper were...characteristics. If this is not the interest, then to isolate and understand the effect of a characteristic on CHD when it could be one of several interacting...also easily extended to the case when several independent variables are modeled in a multiple logistic equation. In this instance, if x1, x2, ..., x are

  20. Advanced statistics: linear regression, part II: multiple linear regression.

    Science.gov (United States)

    Marill, Keith A

    2004-01-01

    The applications of simple linear regression in medical research are limited, because in most situations, there are multiple relevant predictor variables. Univariate statistical techniques such as simple linear regression use a single predictor variable, and they often may be mathematically correct but clinically misleading. Multiple linear regression is a mathematical technique used to model the relationship between multiple independent predictor variables and a single dependent outcome variable. It is used in medical research to model observational data, as well as in diagnostic and therapeutic studies in which the outcome is dependent on more than one factor. Although the technique generally is limited to data that can be expressed with a linear function, it benefits from a well-developed mathematical framework that yields unique solutions and exact confidence intervals for regression coefficients. Building on Part I of this series, this article acquaints the reader with some of the important concepts in multiple regression analysis. These include multicollinearity, interaction effects, and an expansion of the discussion of inference testing, leverage, and variable transformations to multivariate models. Examples from the first article in this series are expanded on using a primarily graphic, rather than mathematical, approach. The importance of the relationships among the predictor variables and the dependence of the multivariate model coefficients on the choice of these variables are stressed. Finally, concepts in regression model building are discussed.

  1. A nonparametric random coefficient approach for life expectancy growth using a hierarchical mixture likelihood model with application to regional data from North Rhine-Westphalia (Germany).

    Science.gov (United States)

    Böhning, Dankmar; Karasek, Sarah; Terschüren, Claudia; Annuß, Rolf; Fehr, Rainer

    2013-03-09

    Life expectancy is of increasing interest for a variety of reasons. In many countries, life expectancy is growing linearly, without any indication of reaching a limit. The state of North Rhine-Westphalia (NRW) in Germany, with its 54 districts, is considered here; the growth in life expectancy mentioned above is occurring there as well. However, there is also empirical evidence that life expectancy is not growing linearly at the same level in different regions. To explore this situation further, a likelihood-based cluster analysis is suggested and performed. The modelling uses a nonparametric mixture approach for the latent random effect. Maximum likelihood estimates are determined by means of the EM algorithm, and the number of components in the mixture model is found on the basis of the Bayesian Information Criterion. Regions are classified into the mixture components (clusters) using the maximum posterior allocation rule. For the data analyzed here, 7 components are found, with a spatial concentration of lower life expectancy levels in a central area of NRW, formerly an enormous conglomerate of heavy industry and still the most densely populated area, with Gelsenkirchen having the lowest level of life expectancy growth for both genders. The paper offers some explanations for this fact, including demographic and socio-economic sources. This case study shows that life expectancy growth is largely linear, but that it may occur at different levels.

  2. Differentiating regressed melanoma from regressed lichenoid keratosis.

    Science.gov (United States)

    Chan, Aegean H; Shulman, Kenneth J; Lee, Bonnie A

    2017-04-01

    Distinguishing regressed lichen planus-like keratosis (LPLK) from regressed melanoma can be difficult on histopathologic examination, potentially resulting in mismanagement of patients. We aimed to identify histopathologic features by which regressed melanoma can be differentiated from regressed LPLK. Twenty actively inflamed LPLK, 12 LPLK with regression and 15 melanomas with regression were compared and evaluated by hematoxylin and eosin staining as well as Melan-A, microphthalmia transcription factor (MiTF) and cytokeratin (AE1/AE3) immunostaining. (1) A total of 40% of regressed melanomas showed complete or near complete loss of melanocytes within the epidermis with Melan-A and MiTF immunostaining, while 8% of regressed LPLK exhibited this finding. (2) Necrotic keratinocytes were seen in the epidermis in 33% regressed melanomas as opposed to all of the regressed LPLK. (3) A dense infiltrate of melanophages in the papillary dermis was seen in 40% of regressed melanomas, a feature not seen in regressed LPLK. In summary, our findings suggest that a complete or near complete loss of melanocytes within the epidermis strongly favors a regressed melanoma over a regressed LPLK. In addition, necrotic epidermal keratinocytes and the presence of a dense band-like distribution of dermal melanophages can be helpful in differentiating these lesions. © 2016 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  3. Better Autologistic Regression

    Directory of Open Access Journals (Sweden)

    Mark A. Wolters

    2017-11-01

    Autologistic regression is an important probability model for dichotomous random variables observed along with covariate information. It has been used in various fields for analyzing binary data possessing spatial or network structure. The model can be viewed as an extension of the autologistic model (also known as the Ising model, quadratic exponential binary distribution, or Boltzmann machine) to include covariates. It can also be viewed as an extension of logistic regression to handle responses that are not independent. Not all authors use exactly the same form of the autologistic regression model. Variations of the model differ in two respects. First, the variable coding (the two numbers used to represent the two possible states of the variables) might differ. Common coding choices are (zero, one) and (minus one, plus one). Second, the model might appear in either of two algebraic forms: a standard form, or a recently proposed centered form. Little attention has been paid to the effect of these differences, and the literature shows ambiguity about their importance. It is shown here that changes to either coding or centering in fact produce distinct, non-nested probability models. Theoretical results, numerical studies, and analysis of an ecological data set all show that the differences among the models can be large and practically significant. Understanding the nature of the differences and making appropriate modeling choices can lead to significantly improved autologistic regression analyses. The results strongly suggest that the standard model with plus/minus coding, which we call the symmetric autologistic model, is the most natural choice among the autologistic variants.

  4. Data mining methods in the prediction of Dementia: A real-data comparison of the accuracy, sensitivity and specificity of linear discriminant analysis, logistic regression, neural networks, support vector machines, classification trees and random forests

    Directory of Open Access Journals (Sweden)

    Santana Isabel

    2011-08-01

    Background: Dementia and cognitive impairment associated with aging are a major medical and social concern. Neuropsychological testing is a key element in the diagnostic procedures of Mild Cognitive Impairment (MCI), but has presently a limited value in the prediction of progression to dementia. We advance the hypothesis that newer statistical classification methods derived from data mining and machine learning methods like Neural Networks, Support Vector Machines and Random Forests can improve accuracy, sensitivity and specificity of predictions obtained from neuropsychological testing. Seven nonparametric classifiers derived from data mining methods (Multilayer Perceptrons Neural Networks, Radial Basis Function Neural Networks, Support Vector Machines, CART, CHAID and QUEST Classification Trees and Random Forests) were compared to three traditional classifiers (Linear Discriminant Analysis, Quadratic Discriminant Analysis and Logistic Regression) in terms of overall classification accuracy, specificity, sensitivity, area under the ROC curve and Press' Q. Model predictors were 10 neuropsychological tests currently used in the diagnosis of dementia. Statistical distributions of classification parameters obtained from a 5-fold cross-validation were compared using the Friedman's nonparametric test. Results: Press' Q test showed that all classifiers performed better than chance alone (p ...). Conclusions: When taking into account sensitivity, specificity and overall classification accuracy, Random Forests and Linear Discriminant Analysis rank first among all the classifiers tested in prediction of dementia using several neuropsychological tests. These methods may be used to improve accuracy, sensitivity and specificity of dementia predictions from neuropsychological testing.

  5. Regression: A Bibliography.

    Science.gov (United States)

    Pedrini, D. T.; Pedrini, Bonnie C.

    Regression, another mechanism studied by Sigmund Freud, has been researched extensively, e.g., hypnotic regression, frustration regression, schizophrenic regression, and infra-human-animal regression (often directly related to fixation). Many investigators worked with hypnotic age regression, which has a long history, going back to Russian reflexologists.…

  6. Estimação de parâmetros genéticos para produção de leite de vacas da raça Holandesa via regressão aleatória [Estimation of genetic parameters for Holstein cows milk production by random regression]

    Directory of Open Access Journals (Sweden)

    C.K.P. Dorneles

    2009-04-01

    A total of 21,702 test-day milk yield records from 2,429 first-lactation Holstein cows, sired by 233 bulls and collected in 33 herds in the State of Rio Grande do Sul from 1991 to 2003, were used to estimate genetic parameters for test-day milk yield. The random regression model fitted to test-day records between the 6th and the 305th day of lactation included the effect of herd-year-month of test day, the age of the cow at calving, and the parameters of a fourth-order Legendre polynomial to model the population mean milk yield curve, along with parameters of the same polynomial to model the additive genetic and permanent environmental random effects. Genetic and permanent environmental variances for test-day milk yield ranged from 2.38 to 3.14 and from 7.55 to 10.35, respectively. Heritability estimates increased gradually from the beginning (0.14) to the end (0.20) of lactation, indicating a trait of moderate heritability. Genetic correlations between milk yields at different stages of lactation ranged from 0.33 to 0.99 and were highest between adjacent test days. Permanent environmental correlations followed the same trend as the genetic correlations. The random regression model with a fourth-order Legendre polynomial can be considered a good tool for estimating genetic parameters for milk yield throughout lactation.
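
    The Legendre covariates used in random regression models of this kind can be sketched as follows in Python. This is a minimal sketch under stated assumptions: days in milk are rescaled linearly from the 6 to 305 day range onto [-1, 1], "order four" is read here as polynomials up to degree 4, and the unnormalized polynomials are used (genetic evaluation software often applies a normalizing constant as well). The function names are hypothetical.

```python
def standardize_dim(dim, dim_min=6.0, dim_max=305.0):
    """Map days in milk linearly from [dim_min, dim_max] onto [-1, 1]."""
    return 2.0 * (dim - dim_min) / (dim_max - dim_min) - 1.0


def legendre_basis(x, degree=4):
    """Evaluate P_0(x) .. P_degree(x) with Bonnet's recurrence:
    (n + 1) * P_{n+1}(x) = (2n + 1) * x * P_n(x) - n * P_{n-1}(x).
    """
    values = [1.0, x]
    for n in range(1, degree):
        values.append(((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1))
    return values[: degree + 1]


# Covariate rows for a few test days; each row would multiply the
# regression coefficients (fixed curve, additive genetic, permanent
# environment) for an animal at that stage of lactation.
for dim in (6, 100, 200, 305):
    row = legendre_basis(standardize_dim(dim))
    print(dim, [round(v, 4) for v in row])
```

    Rescaling to [-1, 1] matters because Legendre polynomials are orthogonal only on that interval, which helps keep the estimated regression coefficients less correlated than with ordinary powers of days in milk.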

  7. REGRES: A FORTRAN-77 program to calculate nonparametric and "structural" parametric solutions to bivariate regression equations

    Science.gov (United States)

    Rock, N. M. S.; Duffy, T. R.

    REGRES allows a range of regression equations to be calculated for paired sets of data values in which both variables are subject to error (i.e. neither is the "independent" variable). Nonparametric regressions, based on medians of all possible pairwise slopes and intercepts, are treated in detail. Estimated slopes and intercepts are output, along with confidence limits, Spearman and Kendall rank correlation coefficients. Outliers can be rejected with user-determined stringency. Parametric regressions can be calculated for any value of λ (the ratio of the variances of the random errors for y and x), including: (1) major axis (λ = 1); (2) reduced major axis (λ = variance of y/variance of x); (3) Y on X (λ = infinity); or (4) X on Y (λ = 0) solutions. Pearson linear correlation coefficients also are output. REGRES provides an alternative to conventional isochron assessment techniques where bivariate normal errors cannot be assumed, or weighting methods are inappropriate.
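
    The median-of-pairwise-slopes idea behind the nonparametric fits can be sketched in Python as below. This is not the REGRES code: the function name is hypothetical, and the intercept is taken as the median of y minus slope times x, which is one common convention and an assumption here, since the abstract does not spell out how the program forms its intercepts.

```python
from itertools import combinations
from statistics import median


def median_slope_fit(x, y):
    """Fit a line from the median of all pairwise slopes.

    Pairs with identical x values are skipped because their slope is undefined.
    """
    points = list(zip(x, y))
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(points, 2)
        if x2 != x1
    ]
    slope = median(slopes)
    # Assumed convention: intercept is the median residual offset.
    intercept = median(yi - slope * xi for xi, yi in points)
    return slope, intercept


# Four points on y = 2x + 1 plus one gross outlier, invented for illustration.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 5.0, 7.0, 100.0]
print(median_slope_fit(x, y))  # prints (2.0, 1.0)
```

    Six of the ten pairwise slopes in this toy set equal 2, so the median ignores the outlier entirely, whereas a least-squares fit would be pulled strongly toward it.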

  8. Random Coefficient Logit Model for Large Datasets

    NARCIS (Netherlands)

    C. Hernández-Mireles (Carlos); D. Fok (Dennis)

    2010-01-01

    We present an approach for analyzing market shares and product price elasticities based on large datasets containing aggregate sales data for many products, several markets and for relatively long time periods. We consider the recently proposed Bayesian approach of Jiang et al [Jiang,

  9. Bacteria Reduction In Ponds Under Random Coefficients ...

    African Journals Online (AJOL)

    Journal of Modeling, Design and Management of Engineering Systems, Vol 1, No 1 (2002).

  10. Sparse Regression by Projection and Sparse Discriminant Analysis

    KAUST Repository

    Qi, Xin; Luo, Ruiyan; Carroll, Raymond J.; Zhao, Hongyu

    2015-01-01

    predictions. We introduce a new framework, regression by projection, and its sparse version to analyze high-dimensional data. The unique nature of this framework is that the directions of the regression coefficients are inferred first, and the lengths

  11. Persistência na lactação para vacas da raça Holandesa criadas no Estado do Rio Grande do Sul via modelos de regressão aleatória / Lactation persistency for Holstein cows raised in the State of Rio Grande do Sul using a random regression model

    Directory of Open Access Journals (Sweden)

    Cristian Kelen Pinto Dorneles

    2009-08-01

    Full Text Available A total of 21,702 test-day milk yield records from 2,429 first-lactation Holstein cows, daughters of 2,031 dams and 233 sires, collected in 33 herds in the State of Rio Grande do Sul between 1992 and 2003, were used to estimate genetic parameters for three measures of lactation persistency (PS1, PS2 and PS3) and for milk yield up to 305 days of lactation (P305). The random regression models fitted to test-day records between the sixth and the 300th day of lactation included the effects of herd-year-month of the test day and age of the cow at calving, together with fourth-order Legendre polynomial parameters to model the mean milk yield curve of the population and parameters of the same polynomial to model the direct additive genetic and permanent environmental random effects. Heritability estimates were 0.05, 0.08 and 0.19 for PS1, PS2 and PS3, respectively, and 0.25 for P305, suggesting that genetic gain is possible through selection for PS3 and for P305. Genetic correlations between the three persistency measures and P305 ranged from -0.05 to 0.07, indicating that persistency and yield are determined by different groups of genes. Consequently, selection for P305, as generally practiced, does not produce genetic progress in persistency.

  12. AN APPLICATION OF FUNCTIONAL MULTIVARIATE REGRESSION MODEL TO MULTICLASS CLASSIFICATION

    OpenAIRE

    Krzyśko, Mirosław; Smaga, Łukasz

    2017-01-01

    In this paper, the scale response functional multivariate regression model is considered. By using the basis functions representation of functional predictors and regression coefficients, this model is rewritten as a multivariate regression model. This representation of the functional multivariate regression model is used for multiclass classification for multivariate functional data. Computational experiments performed on real labelled data sets demonstrate the effectiveness of the proposed ...

  13. Virologic response to tipranavir-ritonavir or darunavir-ritonavir based regimens in antiretroviral therapy experienced HIV-1 patients: a meta-analysis and meta-regression of randomized controlled clinical trials.

    Directory of Open Access Journals (Sweden)

    Asres Berhan

    Full Text Available The development of tipranavir and darunavir, second generation non-peptidic HIV protease inhibitors, with marked improved resistance profiles, has opened a new perspective on the treatment of antiretroviral therapy (ART) experienced HIV patients with poor viral load control. The aim of this study was to determine the virologic response in ART experienced patients to tipranavir-ritonavir and darunavir-ritonavir based regimens. A computer based literature search was conducted in the databases of HINARI (Health InterNetwork Access to Research Initiative), Medline and Cochrane library. Meta-analysis was performed by including randomized controlled studies that were conducted in ART experienced patients with plasma viral load above 1,000 copies HIV RNA/ml. The odds ratios and 95% confidence intervals (CI) for viral loads of <50 copies and <400 copies HIV RNA/ml at the end of the intervention were determined by the random effects model. Meta-regression, sensitivity analysis and funnel plots were done. The number of HIV-1 patients who were on either a tipranavir-ritonavir or darunavir-ritonavir based regimen and achieved viral load less than 50 copies HIV RNA/ml was significantly higher (overall OR = 3.4; 95% CI, 2.61-4.52) than the number of HIV-1 patients who were on investigator selected boosted comparator HIV-1 protease inhibitors (CPIs)-ritonavir. Similarly, the number of patients with viral load less than 400 copies HIV RNA/ml was significantly higher in either the tipranavir-ritonavir or darunavir-ritonavir based regimen treated group (overall OR = 3.0; 95% CI, 2.15-4.11). Meta-regression showed that the viral load reduction was independent of baseline viral load, baseline CD4 count and duration of tipranavir-ritonavir or darunavir-ritonavir based regimen. Tipranavir and darunavir based regimens were more effective in patients who were ART experienced and had poor viral load control. Further studies are required to determine their consistent

  14. [From clinical judgment to linear regression model].

    Science.gov (United States)

    Palacios-Cruz, Lino; Pérez, Marcela; Rivas-Ruiz, Rodolfo; Talavera, Juan O

    2013-01-01

    When we think about mathematical models, such as the linear regression model, we tend to assume that these terms are used only by those engaged in research, a notion that is far from the truth. Legendre described the first mathematical model in 1805, and Galton introduced the formal term in 1886. Linear regression is one of the most commonly used regression models in clinical practice. It is useful to predict or show the relationship between two or more variables as long as the dependent variable is quantitative and has a normal distribution. Stated another way, regression is used to predict a measure based on the knowledge of at least one other variable. The first objective of linear regression is to determine the slope or inclination of the regression line Y = a + bx, where "a" is the intercept or regression constant, equivalent to the value of "Y" when "x" equals 0, and "b" (also called the slope) indicates the increase or decrease in "Y" that occurs when "x" increases or decreases by one unit. In the regression line, "b" is called the regression coefficient. The coefficient of determination (R²) indicates the importance of the independent variables in the outcome.
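
    As a minimal sketch of the quantities named in this abstract, the Python snippet below computes the intercept "a", the slope "b" and R² by ordinary least squares. The function name and the five data points are hypothetical and serve only to illustrate the formulas.

```python
from statistics import mean


def fit_line(x, y):
    """Return (a, b, r2) for the least-squares line Y = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx            # slope: change in Y per one-unit change in x
    a = y_bar - b * x_bar    # intercept: fitted Y when x equals 0
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return a, b, 1.0 - ss_res / ss_tot


# Hypothetical measurements, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
a, b, r2 = fit_line(x, y)
print(f"Y = {a:.2f} + {b:.2f}x, R^2 = {r2:.2f}")
```

    Here R² is computed as one minus the ratio of the residual to the total sum of squares, that is, the share of the variability in Y that the fitted line accounts for.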

  15. Regression analysis by example

    CERN Document Server

    Chatterjee, Samprit

    2012-01-01

    Praise for the Fourth Edition: "This book is . . . an excellent source of examples for regression analysis. It has been and still is readily readable and understandable." -Journal of the American Statistical Association Regression analysis is a conceptually simple method for investigating relationships among variables. Carrying out a successful application of regression analysis, however, requires a balance of theoretical results, empirical rules, and subjective judgment. Regression Analysis by Example, Fifth Edition has been expanded

  16. Logistic regression for dichotomized counts.

    Science.gov (United States)

    Preisser, John S; Das, Kalyan; Benecha, Habtamu; Stamm, John W

    2016-12-01

    Sometimes there is interest in a dichotomized outcome indicating whether a count variable is positive or zero. Under this scenario, the application of ordinary logistic regression may result in efficiency loss, which is quantifiable under an assumed model for the counts. In such situations, a shared-parameter hurdle model is investigated for more efficient estimation of regression parameters relating to overall effects of covariates on the dichotomous outcome, while handling count data with many zeroes. One model part provides a logistic regression containing marginal log odds ratio effects of primary interest, while an ancillary model part describes the mean count of a Poisson or negative binomial process in terms of nuisance regression parameters. Asymptotic efficiency of the logistic model parameter estimators of the two-part models is evaluated with respect to ordinary logistic regression. Simulations are used to assess the properties of the models with respect to power and Type I error, the latter investigated under both misspecified and correctly specified models. The methods are applied to data from a randomized clinical trial of three toothpaste formulations to prevent incident dental caries in a large population of Scottish schoolchildren. © The Author(s) 2014.

  17. Modeling Linguistic Variables With Regression Models: Addressing Non-Gaussian Distributions, Non-independent Observations, and Non-linear Predictors With Random Effects and Generalized Additive Models for Location, Scale, and Shape.

    Science.gov (United States)

    Coupé, Christophe

    2018-01-01

    As statistical approaches are getting increasingly used in linguistics, attention must be paid to the choice of methods and algorithms used. This is especially true since they require assumptions to be satisfied to provide valid results, and because scientific articles still often fall short of reporting whether such assumptions are met. Progress is being, however, made in various directions, one of them being the introduction of techniques able to model data that cannot be properly analyzed with simpler linear regression models. We report recent advances in statistical modeling in linguistics. We first describe linear mixed-effects regression models (LMM), which address grouping of observations, and generalized linear mixed-effects models (GLMM), which offer a family of distributions for the dependent variable. Generalized additive models (GAM) are then introduced, which allow modeling non-linear parametric or non-parametric relationships between the dependent variable and the predictors. We then highlight the possibilities offered by generalized additive models for location, scale, and shape (GAMLSS). We explain how they make it possible to go beyond common distributions, such as Gaussian or Poisson, and offer the appropriate inferential framework to account for 'difficult' variables such as count data with strong overdispersion. We also demonstrate how they offer interesting perspectives on data when not only the mean of the dependent variable is modeled, but also its variance, skewness, and kurtosis. As an illustration, the case of phonemic inventory size is analyzed throughout the article. For over 1,500 languages, we consider as predictors the number of speakers, the distance from Africa, an estimation of the intensity of language contact, and linguistic relationships. We discuss the use of random effects to account for genealogical relationships, the choice of appropriate distributions to model count data, and non-linear relationships. 
Relying on GAMLSS, we

  18. Modeling Linguistic Variables With Regression Models: Addressing Non-Gaussian Distributions, Non-independent Observations, and Non-linear Predictors With Random Effects and Generalized Additive Models for Location, Scale, and Shape

    Directory of Open Access Journals (Sweden)

    Christophe Coupé

    2018-04-01

    Full Text Available As statistical approaches are getting increasingly used in linguistics, attention must be paid to the choice of methods and algorithms used. This is especially true since they require assumptions to be satisfied to provide valid results, and because scientific articles still often fall short of reporting whether such assumptions are met. Progress is being, however, made in various directions, one of them being the introduction of techniques able to model data that cannot be properly analyzed with simpler linear regression models. We report recent advances in statistical modeling in linguistics. We first describe linear mixed-effects regression models (LMM), which address grouping of observations, and generalized linear mixed-effects models (GLMM), which offer a family of distributions for the dependent variable. Generalized additive models (GAM) are then introduced, which allow modeling non-linear parametric or non-parametric relationships between the dependent variable and the predictors. We then highlight the possibilities offered by generalized additive models for location, scale, and shape (GAMLSS). We explain how they make it possible to go beyond common distributions, such as Gaussian or Poisson, and offer the appropriate inferential framework to account for ‘difficult’ variables such as count data with strong overdispersion. We also demonstrate how they offer interesting perspectives on data when not only the mean of the dependent variable is modeled, but also its variance, skewness, and kurtosis. As an illustration, the case of phonemic inventory size is analyzed throughout the article. For over 1,500 languages, we consider as predictors the number of speakers, the distance from Africa, an estimation of the intensity of language contact, and linguistic relationships. We discuss the use of random effects to account for genealogical relationships, the choice of appropriate distributions to model count data, and non-linear relationships

  19. Parâmetros genéticos para a produção de leite de controles individuais de vacas da raça Gir estimados com modelos de repetibilidade e regressão aleatória / Estimation of genetic parameters for test day milk records of first lactation Gyr cows using repeatability and random regression animal models

    Directory of Open Access Journals (Sweden)

    Claudio Napolis Costa

    2005-10-01

    number of negative estimates between test-day milk yields (PLC) at the beginning and end of lactation than the FAS. Except for the FAS, genetic correlation estimates declined from close to unity between adjacent test-day yields to negative values between yields at the beginning and end of lactation. Among the Legendre polynomials, the fifth-order polynomial gave the best fit to the test-day yields. The results indicate the potential of random regression, with the LP5 model and the FAS being the most suitable for modeling the genetic and permanent environmental variances of test-day milk yields in the Gyr breed. Data comprising 8,183 test day records of 1,273 first lactations of Gyr cows from herds supervised by ABCZ were used to estimate variance components and genetic parameters for milk yield using repeatability and random regression animal models by REML. Genetic modelling with logarithmic (FAS) and exponential (FW) curves was compared to orthogonal Legendre polynomials (LP) of order 3 to 5. Residual variance was assumed to be constant in all (ME=1) or some periods of lactation (ME=4). Lactation milk yield in 305-d was also adjusted by an animal model. Genetic variance, heritability and repeatability for test day milk yields estimated by a repeatability animal model were 1.74 kg², 0.27, and 0.76, respectively. Genetic variance and heritability estimates for lactation milk yield were respectively 121,094.6 and 0.22. Heritability estimates from FAS and FW, respectively, decreased from 0.59 and 0.74 at the beginning of lactation to 0.20 at the end of the period. Except for a fifth-order LP with ME=1, heritability estimates decreased from around 0.70 at early lactation to 0.30 at the end of lactation. Residual variance estimates were slightly smaller for logarithmic than for exponential curves both for homogeneous and heterogeneous variance assumptions. Estimates of residual variance in all stages of lactation decreased as the order of LP increased and depended on the assumption about ME

  20. Quantile Regression Methods

    DEFF Research Database (Denmark)

    Fitzenberger, Bernd; Wilke, Ralf Andreas

    2015-01-01

    Quantile regression is emerging as a popular statistical approach, which complements the estimation of conditional mean models. While the latter only focuses on one aspect of the conditional distribution of the dependent variable, the mean, quantile regression provides more detailed insights by modeling conditional quantiles. Quantile regression can therefore detect whether the partial effect of a regressor on the conditional quantiles is the same for all quantiles or differs across quantiles. Quantile regression can provide evidence for a statistical relationship between two variables even if the mean regression model does not. We provide a short informal introduction into the principle of quantile regression which includes an illustrative application from empirical labor market research. This is followed by briefly sketching the underlying statistical model for linear quantile regression based...
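
    The principle can be sketched in Python for the simplest case, an intercept-only model with no regressors. This is an illustration rather than the chapter's method: it assumes the usual check ("pinball") loss, and the function names and wage figures are hypothetical.

```python
def pinball_loss(q, tau, values):
    """Average check (pinball) loss of predicting q at quantile level tau."""
    return sum(
        tau * (v - q) if v >= q else (1.0 - tau) * (q - v)
        for v in values
    ) / len(values)


def best_constant(tau, values):
    """Pick the observed value with the smallest pinball loss."""
    return min(sorted(values), key=lambda q: pinball_loss(q, tau, values))


# Hypothetical hourly wages, for illustration only.
wages = [10, 12, 13, 15, 18, 22, 30, 45, 80]
median_fit = best_constant(0.5, wages)  # central tendency
upper_fit = best_constant(0.9, wages)   # upper tail of the distribution
print(median_fit, upper_fit)
```

    Changing tau moves the fit to a different part of the distribution. Linear quantile regression applies the same loss with a linear function of the regressors in place of the constant q.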

  1. Estimativas de parâmetros genéticos para produção de leite e persistência da lactação em vacas Gir, aplicando modelos de regressão aleatória / Estimates of genetic parameters for milk yield and persistency of lactation of Gyr cows, applying random regression models

    Directory of Open Access Journals (Sweden)

    Luis Gabriel González Herrera

    2008-09-01

    of Gyr cows calving between 1990 and 2005 were used to estimate genetic parameters of monthly test-day milk yield (TDMY). Records were analyzed by random regression models (MRA) that included the additive genetic and permanent environmental random effects and the contemporary group, age of cow at calving (linear and quadratic components) and the average trend of the population as fixed effects. Random trajectories were fitted by Wilmink's (WIL) and Ali & Schaeffer's (AS) parametric functions. Residual variances were fitted by step functions with 1, 4, 6 or 10 classes. The contemporary group was defined by herd-year-season of test-day and included at least three animals. Models were compared by Akaike's and Schwarz's Bayesian (BIC) information criteria. The AS function used for modeling the additive genetic and permanent environmental effects with heterogeneous residual variances adjusted with a step function with four classes was the best fitted model. Heritability estimates ranged from 0.21 to 0.33 for the AS function and from 0.17 to 0.30 for WIL function and were larger in the first half of the lactation period. Genetic correlations between TDMY were high and positive for adjacent test-days and decreased as days between records increased. Predicted breeding values for total 305-day milk yield (MRA305) and specific periods of lactation (obtained by the mean of all breeding values in the periods) using the AS function were compared with those predicted by a standard model using accumulated 305-day milk yield (PTA305) by rank correlation. The magnitude of correlations suggested differences may be observed in ranking animals by using the different criteria which were compared in this study.

  2. Understanding logistic regression analysis

    OpenAIRE

    Sperandei, Sandro

    2014-01-01

    Logistic regression is used to obtain odds ratio in the presence of more than one explanatory variable. The procedure is quite similar to multiple linear regression, with the exception that the response variable is binomial. The result is the impact of each variable on the odds ratio of the observed event of interest. The main advantage is to avoid confounding effects by analyzing the association of all variables together. In this article, we explain the logistic regression procedure using ex...

  3. Introduction to regression graphics

    CERN Document Server

    Cook, R Dennis

    2009-01-01

    Covers the use of dynamic and interactive computer graphics in linear regression analysis, focusing on analytical graphics. Features new techniques like plot rotation. The authors have composed their own regression code, using Xlisp-Stat language called R-code, which is a nearly complete system for linear regression analysis and can be utilized as the main computer program in a linear regression course. The accompanying disks, for both Macintosh and Windows computers, contain the R-code and Xlisp-Stat. An Instructor's Manual presenting detailed solutions to all the problems in the book is ava

  4. Alternative Methods of Regression

    CERN Document Server

    Birkes, David

    2011-01-01

    Of related interest. Nonlinear Regression Analysis and its Applications Douglas M. Bates and Donald G. Watts "...an extraordinary presentation of concepts and methods concerning the use and analysis of nonlinear regression models...highly recommend[ed]...for anyone needing to use and/or understand issues concerning the analysis of nonlinear regression models." --Technometrics This book provides a balance between theory and practice supported by extensive displays of instructive geometrical constructs. Numerous in-depth case studies illustrate the use of nonlinear regression analysis--with all data s

  5. Avaliação de medidas da persistência da lactação de cabras da raça Saanen sob modelo de regressão aleatória / Evaluation of persistency lactation measures of Saanen goats under random regression model

    Directory of Open Access Journals (Sweden)

    Gilberto Romeiro de Oliveira Menezes

    2010-08-01

    Full Text Available A total of 10,238 weekly test-day milk yield records from 388 first lactations of Saanen goats were used to evaluate six measures of lactation persistency, in order to determine which is most suitable for use in genetic evaluations of the trait. The six measures are adaptations of measures used in dairy cattle, obtained by replacing the bovine reference values in the formulas with those for goats. The values used in the calculations were obtained from random regression models. Heritability estimates for the persistency measures ranged from 0.03 to 0.09. Genetic correlations between the persistency measures and milk yield up to 268 days ranged from -0.64 to 0.67. Because it showed the lowest genetic correlation with yield at 268 days (0.14), the persistency measure PS4, obtained as the sum of the values from the 41st to the 240th day of lactation expressed as deviations from yield at 40 days of lactation, is the most suitable for genetic evaluations of lactation persistency in Saanen goats. Thus, selecting goats for better lactation persistency does not change yield at 268 days. Because of the low heritability of this measure (0.03), small responses to selection are expected in this herd.

  6. Dimension Reduction and Discretization in Stochastic Problems by Regression Method

    DEFF Research Database (Denmark)

    Ditlevsen, Ove Dalager

    1996-01-01

    The chapter mainly deals with dimension reduction and field discretizations based directly on the concept of linear regression. Several examples of interesting applications in stochastic mechanics are also given. Keywords: Random fields discretization, Linear regression, Stochastic interpolation, ...

  7. Harmonic regression of Landsat time series for modeling attributes from national forest inventory data

    Science.gov (United States)

    Wilson, Barry T.; Knight, Joseph F.; McRoberts, Ronald E.

    2018-03-01

    Imagery from the Landsat Program has been used frequently as a source of auxiliary data for modeling land cover, as well as a variety of attributes associated with tree cover. With ready access to all scenes in the archive since 2008 due to the USGS Landsat Data Policy, new approaches to deriving such auxiliary data from dense Landsat time series are required. Several methods have previously been developed for use with finer temporal resolution imagery (e.g. AVHRR and MODIS), including image compositing and harmonic regression using Fourier series. The manuscript presents a study, using Minnesota, USA during the years 2009-2013 as the study area and timeframe. The study examined the relative predictive power of land cover models, in particular those related to tree cover, using predictor variables based solely on composite imagery versus those using estimated harmonic regression coefficients. The study used two common non-parametric modeling approaches (i.e. k-nearest neighbors and random forests) for fitting classification and regression models of multiple attributes measured on USFS Forest Inventory and Analysis plots using all available Landsat imagery for the study area and timeframe. The estimated Fourier coefficients developed by harmonic regression of tasseled cap transformation time series data were shown to be correlated with land cover, including tree cover. Regression models using estimated Fourier coefficients as predictor variables showed a two- to threefold increase in explained variance for a small set of continuous response variables, relative to comparable models using monthly image composites. Similarly, the overall accuracies of classification models using the estimated Fourier coefficients were approximately 10-20 percentage points higher than the models using the image composites, with corresponding individual class accuracies between six and 45 percentage points higher.
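
    Harmonic regression of an irregularly sampled series can be sketched in Python as an ordinary least-squares fit of a mean level plus first-order cosine and sine terms. This is a simplified illustration, not the authors' workflow: the function names, the annual period, the acquisition days and the noise-free signal are all assumptions.

```python
import math


def solve(a, b):
    """Solve the linear system a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def harmonic_fit(days, values, period=365.25):
    """Least-squares estimates of the mean, cosine and sine coefficients."""
    rows = [
        [1.0, math.cos(2 * math.pi * d / period), math.sin(2 * math.pi * d / period)]
        for d in days
    ]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * v for r, v in zip(rows, values)) for i in range(3)]
    return solve(xtx, xty)


# Hypothetical, irregularly spaced acquisition days and a noise-free signal.
days = [5, 37, 53, 101, 133, 197, 229, 261, 309, 341]
signal = [0.4 + 0.2 * math.cos(2 * math.pi * d / 365.25)
          - 0.1 * math.sin(2 * math.pi * d / 365.25) for d in days]
mean_level, cos_coef, sin_coef = harmonic_fit(days, signal)
print(round(mean_level, 3), round(cos_coef, 3), round(sin_coef, 3))
```

    The three fitted numbers summarize the whole series for one location, which is the sense in which estimated Fourier coefficients can serve as predictor variables in a later classification or regression model.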

  8. Transport Coefficients of Fluids

    CERN Document Server

    Eu, Byung Chan

    2006-01-01

    Until recently the formal statistical mechanical approach offered no practicable method for computing the transport coefficients of liquids, and so most practitioners had to resort to empirical fitting formulas. This has now changed, as demonstrated in this innovative monograph. The author presents and applies new methods based on statistical mechanics for calculating the transport coefficients of simple and complex liquids over wide ranges of density and temperature. These molecular theories enable the transport coefficients to be calculated in terms of equilibrium thermodynamic properties, and the results are shown to account satisfactorily for experimental observations, including even the non-Newtonian behavior of fluids far from equilibrium.

  9. Boosted beta regression.

    Directory of Open Access Journals (Sweden)

    Matthias Schmid

    Full Text Available Regression analysis with a bounded outcome is a common problem in applied statistics. Typical examples include regression models for percentage outcomes and the analysis of ratings that are measured on a bounded scale. In this paper, we consider beta regression, which is a generalization of logit models to situations where the response is continuous on the interval (0,1). Consequently, beta regression is a convenient tool for analyzing percentage responses. The classical approach to fit a beta regression model is to use maximum likelihood estimation with subsequent AIC-based variable selection. As an alternative to this established - yet unstable - approach, we propose a new estimation technique called boosted beta regression. With boosted beta regression, estimation and variable selection can be carried out simultaneously in a highly efficient way. Additionally, both the mean and the variance of a percentage response can be modeled using flexible nonlinear covariate effects. As a consequence, the new method accounts for common problems such as overdispersion and non-binomial variance structures.

  10. Tracking time-varying coefficient-functions

    DEFF Research Database (Denmark)

    Nielsen, Henrik Aalborg; Nielsen, Torben Skov; Joensen, Alfred K.

    2000-01-01

    is a combination of recursive least squares with exponential forgetting and local polynomial regression. It is argued that it is appropriate to let the forgetting factor vary with the value of the external signal which is the argument of the coefficient functions. Some of the key properties of the modified method are studied by simulation...
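
    One building block named in this abstract, recursive least squares with exponential forgetting, can be sketched in Python as follows. The sketch uses a constant forgetting factor and omits the local polynomial part and the signal-dependent forgetting the authors argue for. The function name, starting values and simulated data are assumptions for illustration.

```python
import math


def rls_step(theta, p, x, y, forgetting=0.95):
    """One recursive-least-squares update with exponential forgetting."""
    n = len(theta)
    px = [sum(p[i][j] * x[j] for j in range(n)) for i in range(n)]
    denom = forgetting + sum(x[i] * px[i] for i in range(n))
    gain = [v / denom for v in px]
    error = y - sum(t * xi for t, xi in zip(theta, x))  # one-step prediction error
    theta = [t + g * error for t, g in zip(theta, gain)]
    # P is symmetric, so x^T P equals (P x)^T and px can be reused here.
    p = [[(p[i][j] - gain[i] * px[j]) / forgetting for j in range(n)]
         for i in range(n)]
    return theta, p


# Simulated data: y = a + b*u, with (a, b) jumping halfway through the series.
theta = [0.0, 0.0]
p = [[1000.0, 0.0], [0.0, 1000.0]]  # large initial P: weak prior on theta
for t in range(200):
    u = math.sin(0.3 * t)
    a, b = (1.0, 2.0) if t < 100 else (3.0, -1.0)
    theta, p = rls_step(theta, p, [1.0, u], a + b * u)
print([round(v, 2) for v in theta])
```

    With a forgetting factor of 0.95, observations are down-weighted by 0.95 per step of age, so the estimates follow the coefficient change instead of averaging over both regimes.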

  11. Understanding logistic regression analysis.

    Science.gov (United States)

    Sperandei, Sandro

    2014-01-01

    Logistic regression is used to obtain odds ratio in the presence of more than one explanatory variable. The procedure is quite similar to multiple linear regression, with the exception that the response variable is binomial. The result is the impact of each variable on the odds ratio of the observed event of interest. The main advantage is to avoid confounding effects by analyzing the association of all variables together. In this article, we explain the logistic regression procedure using examples to make it as simple as possible. After definition of the technique, the basic interpretation of the results is highlighted and then some special issues are discussed.
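
    The link between logistic regression coefficients and odds ratios can be sketched in Python for the simplest case, a single binary explanatory variable, where the maximum likelihood estimates have a closed form. The function names and the 2x2 table are hypothetical. With more than one explanatory variable the coefficients must instead be fitted iteratively.

```python
import math


def saturated_logit(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Intercept and slope of a logistic model with one binary predictor."""
    odds_unexposed = unexposed_cases / (unexposed_total - unexposed_cases)
    odds_exposed = exposed_cases / (exposed_total - exposed_cases)
    intercept = math.log(odds_unexposed)            # log odds when predictor = 0
    slope = math.log(odds_exposed / odds_unexposed)  # log odds ratio
    return intercept, slope


def predict(intercept, slope, exposed):
    """Predicted probability from the logistic (inverse-logit) function."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * exposed)))


# Hypothetical table: 30 of 100 exposed and 10 of 100 unexposed had the event.
b0, b1 = saturated_logit(30, 100, 10, 100)
odds_ratio = math.exp(b1)
print(round(odds_ratio, 2), round(predict(b0, b1, 1), 2), round(predict(b0, b1, 0), 2))
```

    Exponentiating the slope gives the odds ratio, and the predicted probabilities reproduce the observed proportions in the two groups.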

  12. Applied linear regression

    CERN Document Server

    Weisberg, Sanford

    2013-01-01

    Praise for the Third Edition "...this is an excellent book which could easily be used as a course text..." -International Statistical Institute The Fourth Edition of Applied Linear Regression provides a thorough update of the basic theory and methodology of linear regression modeling. Demonstrating the practical applications of linear regression analysis techniques, the Fourth Edition uses interesting, real-world exercises and examples. Stressing central concepts such as model building, understanding parameters, assessing fit and reliability, and drawing conclusions, the new edition illus

  13. Applied logistic regression

    CERN Document Server

    Hosmer, David W; Sturdivant, Rodney X

    2013-01-01

     A new edition of the definitive guide to logistic regression modeling for health science and other applications This thoroughly expanded Third Edition provides an easily accessible introduction to the logistic regression (LR) model and highlights the power of this model by examining the relationship between a dichotomous outcome and a set of covariables. Applied Logistic Regression, Third Edition emphasizes applications in the health sciences and handpicks topics that best suit the use of modern statistical software. The book provides readers with state-of-

  14. Significance testing in ridge regression for genetic data

    Directory of Open Access Journals (Sweden)

    De Iorio Maria

    2011-09-01

    Full Text Available Abstract. Background: Technological developments have increased the feasibility of large scale genetic association studies. Densely typed genetic markers are obtained using SNP arrays, next-generation sequencing technologies and imputation. However, SNPs typed using these methods can be highly correlated due to linkage disequilibrium among them, and standard multiple regression techniques fail with these data sets due to their high dimensionality and correlation structure. There has been increasing interest in using penalised regression in the analysis of high dimensional data. Ridge regression is one such penalised regression technique which does not perform variable selection, instead estimating a regression coefficient for each predictor variable. It is therefore desirable to obtain an estimate of the significance of each ridge regression coefficient. Results: We develop and evaluate a test of significance for ridge regression coefficients. Using simulation studies, we demonstrate that the performance of the test is comparable to that of a permutation test, with the advantage of a much-reduced computational cost. We introduce the p-value trace, a plot of the negative logarithm of the p-values of ridge regression coefficients with increasing shrinkage parameter, which enables the visualisation of the change in p-value of the regression coefficients with increasing penalisation. We apply the proposed method to a lung cancer case-control data set from EPIC, the European Prospective Investigation into Cancer and Nutrition. Conclusions: The proposed test is a useful alternative to a permutation test for the estimation of the significance of ridge regression coefficients, at a much-reduced computational cost. The p-value trace is an informative graphical tool for evaluating the results of a test of significance of ridge regression coefficients as the shrinkage parameter increases, and the proposed test makes its production computationally feasible.

  15. On Weighted Support Vector Regression

    DEFF Research Database (Denmark)

    Han, Xixuan; Clemmensen, Line Katrine Harder

    2014-01-01

    We propose a new type of weighted support vector regression (SVR), motivated by modeling local dependencies in time and space in prediction of house prices. The classic weights of the weighted SVR are added to the slack variables in the objective function (OF‐weights). This procedure directly...... shrinks the coefficient of each observation in the estimated functions; thus, it is widely used for minimizing influence of outliers. We propose to additionally add weights to the slack variables in the constraints (CF‐weights) and call the combination of weights the doubly weighted SVR. We illustrate...... the differences and similarities of the two types of weights by demonstrating the connection between the Least Absolute Shrinkage and Selection Operator (LASSO) and the SVR. We show that an SVR problem can be transformed to a LASSO problem plus a linear constraint and a box constraint. We demonstrate...

  16. Understanding poisson regression.

    Science.gov (United States)

    Hayat, Matthew J; Higgins, Melinda

    2014-04-01

    Nurse investigators often collect study data in the form of counts. Traditional methods of data analysis have historically approached analysis of count data either as if the count data were continuous and normally distributed or with dichotomization of the counts into the categories of occurred or did not occur. These outdated methods for analyzing count data have been replaced with more appropriate statistical methods that make use of the Poisson probability distribution, which is useful for analyzing count data. The purpose of this article is to provide an overview of the Poisson distribution and its use in Poisson regression. Assumption violations for the standard Poisson regression model are addressed with alternative approaches, including addition of an overdispersion parameter or negative binomial regression. An illustrative example is presented with an application from the ENSPIRE study, and regression modeling of comorbidity data is included for illustrative purposes. Copyright 2014, SLACK Incorporated.
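
    The Poisson regression fit and overdispersion check described in this abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions: the counts, the single-predictor log-linear model, and the function names fit_poisson and pearson_dispersion are hypothetical and are not taken from the article or the ENSPIRE study. The Pearson statistic divided by its degrees of freedom is one common overdispersion check; values well above 1 suggest that an overdispersion parameter or a negative binomial model may be worth considering.

```python
import math


def fit_poisson(x, y, iterations=25):
    """Fit log E[y] = b0 + b1 * x by Newton-Raphson (Fisher scoring)."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start at the intercept-only fit
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector: gradient of the Poisson log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Fisher information matrix (2 x 2, symmetric).
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton update: beta <- beta + information^{-1} * score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def pearson_dispersion(x, y, b0, b1):
    """Pearson chi-square divided by residual degrees of freedom (n - 2)."""
    mu = [math.exp(b0 + b1 * xi) for xi in x]
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / (len(y) - 2)


# Hypothetical count data, for illustration only.
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [1, 2, 2, 4, 5, 9, 12, 20]
b0, b1 = fit_poisson(x, y)
fitted = [math.exp(b0 + b1 * xi) for xi in x]
dispersion = pearson_dispersion(x, y, b0, b1)
```

    At the maximum likelihood estimate the fitted means sum to the observed total, which gives a quick sanity check on convergence.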

  17. Multiple regression models for energy use in air-conditioned office buildings in different climates

    International Nuclear Information System (INIS)

    Lam, Joseph C.; Wan, Kevin K.W.; Liu Dalong; Tsang, C.L.

    2010-01-01

    An attempt was made to develop multiple regression models for office buildings in the five major climates in China - severe cold, cold, hot summer and cold winter, mild, and hot summer and warm winter. A total of 12 key building design variables were identified through parametric and sensitivity analysis, and considered as inputs in the regression models. The coefficient of determination R² varies from 0.89 in Harbin to 0.97 in Kunming, indicating that 89-97% of the variations in annual building energy use can be explained by the changes in the 12 parameters. A pseudo-random number generator based on three simple multiplicative congruential generators was employed to generate random designs for evaluation of the regression models. The differences between regression-predicted and DOE-simulated annual building energy use are largely within 10%. It is envisaged that the regression models developed can be used to estimate the likely energy savings/penalty during the initial design stage when different building schemes and design concepts are being considered.
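
    Two ingredients of this abstract lend themselves to a short sketch: the coefficient of determination and a generator built from three multiplicative congruential generators. The abstract does not name the generator; the sketch below assumes the well-known Wichmann-Hill (1982) combination as a stand-in, and the function names r_squared and combined_mcg are hypothetical.

```python
from itertools import islice


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def combined_mcg(seed1, seed2, seed3):
    """Yield uniforms in [0, 1) by combining three small multiplicative
    congruential generators (Wichmann-Hill constants; an assumption here,
    since the source does not say which generators were used).
    Seeds should be integers between 1 and 30000."""
    s1, s2, s3 = seed1, seed2, seed3
    while True:
        s1 = (171 * s1) % 30269
        s2 = (172 * s2) % 30307
        s3 = (170 * s3) % 30323
        # The fractional part of the sum is the combined uniform variate.
        yield (s1 / 30269.0 + s2 / 30307.0 + s3 / 30323.0) % 1.0


# Five reproducible draws that could be scaled to design-variable ranges.
sample = list(islice(combined_mcg(1, 2, 3), 5))
```

    Each draw can be mapped linearly onto the admissible range of a design variable to produce a random building design for evaluating the regression models.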

  18. Discharge Coefficient of Rectangular Short-Crested Weir with Varying Slope Coefficients

    Directory of Open Access Journals (Sweden)

    Yuejun Chen

    2018-02-01

    Rectangular short-crested weirs are widely used for their simple structure and high discharge capacity. As one of the most important and influential factors of discharge capacity, side slope can improve the hydraulic characteristics of weirs at special conditions. In order to systematically study the effects of upstream and downstream slope coefficients S1 and S2 on overflow discharge coefficient in a rectangular short-crested weir, the Volume of Fluid (VOF) method and the Renormalization Group (RNG) κ-ε turbulence model are used. In this study, the slope coefficient ranges from V to 3H:1V and each model corresponds to five total energy heads of H0 ranging from 8.0 to 24.0 cm. Comparisons of discharge coefficients and free surface profiles between simulated and laboratory results display a good agreement. The simulated results show that the difference of discharge coefficients will decrease with upstream slopes and increase with downstream slopes as H0 increases. For a given H0, the discharge coefficient has a convex parabolic relation with S1 and a piecewise linearity relation with S2. The maximum discharge coefficient is always obtained at S2 = 0.8. There exists a difference between upstream and downstream slope coefficients in the influence range of free surface curvatures. Furthermore, a proposed discharge coefficient equation by nonlinear regression is a function of upstream and downstream slope coefficients.

  19. Dose-Dependent Effects of Statins for Patients with Aneurysmal Subarachnoid Hemorrhage: Meta-Regression Analysis.

    Science.gov (United States)

    To, Minh-Son; Prakash, Shivesh; Poonnoose, Santosh I; Bihari, Shailesh

    2018-05-01

    The study uses meta-regression analysis to quantify the dose-dependent effects of statin pharmacotherapy on vasospasm, delayed ischemic neurologic deficits (DIND), and mortality in aneurysmal subarachnoid hemorrhage. Prospective, retrospective observational studies, and randomized controlled trials (RCTs) were retrieved by a systematic database search. Summary estimates were expressed as absolute risk (AR) for a given statin dose or control (placebo). Meta-regression using inverse variance weighting and robust variance estimation was performed to assess the effect of statin dose on transformed AR in a random effects model. Dose-dependence of predicted AR with 95% confidence interval (CI) was recovered by using Miller's Freeman-Tukey inverse. The database search and study selection criteria yielded 18 studies (2594 patients) for analysis. These included 12 RCTs, 4 retrospective observational studies, and 2 prospective observational studies. Twelve studies investigated simvastatin, whereas the remaining studies investigated atorvastatin, pravastatin, or pitavastatin, with simvastatin-equivalent doses ranging from 20 to 80 mg. Meta-regression revealed dose-dependent reductions in Freeman-Tukey-transformed AR of vasospasm (slope coefficient -0.00404, 95% CI -0.00720 to -0.00087; P = 0.0321), DIND (slope coefficient -0.00316, 95% CI -0.00586 to -0.00047; P = 0.0392), and mortality (slope coefficient -0.00345, 95% CI -0.00623 to -0.00067; P = 0.0352). The present meta-regression provides weak evidence for dose-dependent reductions in vasospasm, DIND and mortality associated with acute statin use after aneurysmal subarachnoid hemorrhage. However, the analysis was limited by substantial heterogeneity among individual studies. Greater dosing strategies are a potential consideration for future RCTs. Copyright © 2018 Elsevier Inc. All rights reserved.
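
    The transformation step mentioned in this abstract can be illustrated with a minimal sketch of the Freeman-Tukey double-arcsine transform and Miller's back-transformation. The function names and the example counts are hypothetical, and the unhalved form of the transform is assumed here (some software halves it). When back-transforming a pooled or predicted value, the sample size n is a modelling choice; the harmonic mean of the study sizes is often used.

```python
import math


def freeman_tukey(events, n):
    """Double-arcsine transform of the proportion events / n (unhalved form)."""
    return (math.asin(math.sqrt(events / (n + 1)))
            + math.asin(math.sqrt((events + 1) / (n + 1))))


def miller_inverse(t, n):
    """Miller's (1978) inverse of the Freeman-Tukey transform for sample size n."""
    s = math.sin(t)
    inner = s + (s - 1.0 / s) / n
    # Guard against tiny negative values caused by floating-point rounding.
    root = math.sqrt(max(0.0, 1.0 - inner * inner))
    return 0.5 * (1.0 - math.copysign(1.0, math.cos(t)) * root)


# Illustrative numbers only: 10 events among 50 patients.
t = freeman_tukey(10, 50)
recovered = miller_inverse(t, 50)
```

    A slope estimated on the transformed scale, as in the meta-regression above, only becomes an absolute risk again after this kind of back-transformation.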

  20. Testing homogeneity in Weibull-regression models.

    Science.gov (United States)

    Bolfarine, Heleno; Valença, Dione M

    2005-10-01

    In survival studies with families or geographical units it may be of interest to test whether such groups are homogeneous for given explanatory variables. In this paper we consider score type tests for group homogeneity based on a mixing model in which the group effect is modelled as a random variable. As opposed to hazard-based frailty models, this model yields survival times that, conditioned on the random effect, have an accelerated failure time representation. The test statistic requires only estimation of the conventional regression model without the random effect and does not require specifying the distribution of the random effect. The tests are derived for a Weibull regression model and, in the uncensored situation, a closed form is obtained for the test statistic. A simulation study is used for comparing the power of the tests. The proposed tests are applied to real data sets with censored data.

  1. Regularity of the Interband Light Absorption Coefficient

    Indian Academy of Sciences (India)

    In this paper we consider the interband light absorption coefficient (ILAC), in a symmetric form, in the case of random operators on the -dimensional lattice. We show that the symmetrized version of ILAC is either continuous or has a component which has the same modulus of continuity as the density of states.

  2. Recognition of NEMP and LEMP signals based on auto-regression model and artificial neutral network

    International Nuclear Information System (INIS)

    Li Peng; Song Lijun; Han Chao; Zheng Yi; Cao Baofeng; Li Xiaoqiang; Zhang Xueqin; Liang Rui

    2010-01-01

    The auto-regression (AR) model, a power spectrum estimation method for stationary random signals, and an artificial neural network were adopted to recognize nuclear and lightning electromagnetic pulses. The self-correlation function and Burg algorithms were used to acquire the AR model coefficients as eigenvalues, and a BP artificial neural network was introduced as the classifier with different numbers of hidden layers and hidden layer nodes. The results show that the AR model is effective for feature extraction from those signals, and the Burg algorithm is more effective than the self-correlation function algorithm. (authors)
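
    As a rough illustration of how AR coefficients can be obtained from the self-correlation (autocorrelation) function, the sketch below solves the Yule-Walker equations with the Levinson-Durbin recursion. It is not the authors' implementation: the function names are hypothetical, the Burg algorithm is not shown, and the sign convention assumed is x[t] = a1*x[t-1] + ... + ap*x[t-p] + e[t].

```python
def autocorrelation(signal, max_lag):
    """Biased sample autocovariances r[0..max_lag] of a mean-removed signal."""
    n = len(signal)
    mean = sum(signal) / n
    centred = [v - mean for v in signal]
    return [sum(centred[t] * centred[t + k] for t in range(n - k)) / n
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations for AR coefficients a[1..order].

    Returns (coefficients, prediction_error_variance)."""
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for m in range(1, order + 1):
        # Reflection coefficient for model order m.
        acc = r[m] - sum(a[k] * r[m - k] for k in range(1, m))
        k_m = acc / err
        updated = a[:]
        updated[m] = k_m
        for k in range(1, m):
            updated[k] = a[k] - k_m * a[m - k]
        a = updated
        err *= 1.0 - k_m * k_m
    return a[1:], err


# Exact autocorrelations of an AR(1) process with coefficient 0.5.
coeffs, noise_var = levinson_durbin([1.0, 0.5, 0.25], 2)
```

    The resulting coefficient vector is the kind of low-dimensional feature that could then be passed to a classifier such as the BP network described above.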

  3. REGRESSIVE ANALYSIS OF BRAKING EFFICIENCY OF M1 CATEGORY VEHICLES WITH ANTI-BLOCKING BRAKE SYSTEM

    Directory of Open Access Journals (Sweden)

    О. Sarayev

    2015-07-01

    The problem of assessing the effectiveness of vehicle braking after a road accident is considered. For the first time for modern vehicle models equipped with anti-lock brakes, regression models were obtained that describe the relationship between the coefficient of traction and a random variable of steady deceleration. This does not contradict the essence of the stochastic physical object, which is the process of vehicle braking, unlike the previously adopted method of formalizing this process, using a deterministic function.

  4. Neutrosophic Correlation and Simple Linear Regression

    Directory of Open Access Journals (Sweden)

    A. A. Salama

    2014-09-01

    Since the world is full of indeterminacy, neutrosophics has found its place in contemporary research. The fundamental concepts of the neutrosophic set were introduced by Smarandache. Recently, Salama et al. introduced the concept of the correlation coefficient of neutrosophic data. In this paper, we introduce and study the concepts of correlation and correlation coefficient of neutrosophic data in probability spaces and study some of their properties. Also, we introduce and study the neutrosophic simple linear regression model. Possible applications to data processing are touched upon.

  5. Minimax Regression Quantiles

    DEFF Research Database (Denmark)

    Bache, Stefan Holst

    A new and alternative quantile regression estimator is developed and it is shown that the estimator is root n-consistent and asymptotically normal. The estimator is based on a minimax ‘deviance function’ and has asymptotically equivalent properties to the usual quantile regression estimator. It is......, however, a different and therefore new estimator. It allows for both linear- and nonlinear model specifications. A simple algorithm for computing the estimates is proposed. It seems to work quite well in practice but whether it has theoretical justification is still an open question....

  6. riskRegression

    DEFF Research Database (Denmark)

    Ozenne, Brice; Sørensen, Anne Lyngholm; Scheike, Thomas

    2017-01-01

    In the presence of competing risks a prediction of the time-dynamic absolute risk of an event can be based on cause-specific Cox regression models for the event and the competing risks (Benichou and Gail, 1990). We present computationally fast and memory optimized C++ functions with an R interface...... for predicting the covariate specific absolute risks, their confidence intervals, and their confidence bands based on right censored time to event data. We provide explicit formulas for our implementation of the estimator of the (stratified) baseline hazard function in the presence of tied event times. As a by...... functionals. The software presented here is implemented in the riskRegression package....

  7. Bayesian logistic regression analysis

    NARCIS (Netherlands)

    Van Erp, H.R.N.; Van Gelder, P.H.A.J.M.

    2012-01-01

    In this paper we present a Bayesian logistic regression analysis. It is found that if one wishes to derive the posterior distribution of the probability of some event, then, together with the traditional Bayes Theorem and the integrating out of nuisance parameters, the Jacobian transformation is an

  8. Linear Regression Analysis

    CERN Document Server

    Seber, George A F

    2012-01-01

    Concise, mathematically clear, and comprehensive treatment of the subject. Expanded coverage of diagnostics and methods of model fitting. Requires no specialized knowledge beyond a good grasp of matrix algebra and some acquaintance with straight-line regression and simple analysis of variance models. More than 200 problems throughout the book plus outline solutions for the exercises. This revision has been extensively class-tested.

  9. Nonlinear Regression with R

    CERN Document Server

    Ritz, Christian; Parmigiani, Giovanni

    2009-01-01

    R is a rapidly evolving lingua franca of graphical display and statistical analysis of experiments from the applied sciences. This book provides a coherent treatment of nonlinear regression with R by means of examples from a diversity of applied sciences such as biology, chemistry, engineering, medicine and toxicology.

  10. Bayesian ARTMAP for regression.

    Science.gov (United States)

    Sasu, L M; Andonie, R

    2013-10-01

    Bayesian ARTMAP (BA) is a recently introduced neural architecture which uses a combination of Fuzzy ARTMAP competitive learning and Bayesian learning. Training is generally performed online, in a single-epoch. During training, BA creates input data clusters as Gaussian categories, and also infers the conditional probabilities between input patterns and categories, and between categories and classes. During prediction, BA uses Bayesian posterior probability estimation. So far, BA was used only for classification. The goal of this paper is to analyze the efficiency of BA for regression problems. Our contributions are: (i) we generalize the BA algorithm using the clustering functionality of both ART modules, and name it BA for Regression (BAR); (ii) we prove that BAR is a universal approximator with the best approximation property. In other words, BAR approximates arbitrarily well any continuous function (universal approximation) and, for every given continuous function, there is one in the set of BAR approximators situated at minimum distance (best approximation); (iii) we experimentally compare the online trained BAR with several neural models, on the following standard regression benchmarks: CPU Computer Hardware, Boston Housing, Wisconsin Breast Cancer, and Communities and Crime. Our results show that BAR is an appropriate tool for regression tasks, both for theoretical and practical reasons. Copyright © 2013 Elsevier Ltd. All rights reserved.

  11. Bounded Gaussian process regression

    DEFF Research Database (Denmark)

    Jensen, Bjørn Sand; Nielsen, Jens Brehm; Larsen, Jan

    2013-01-01

    We extend the Gaussian process (GP) framework for bounded regression by introducing two bounded likelihood functions that model the noise on the dependent variable explicitly. This is fundamentally different from the implicit noise assumption in the previously suggested warped GP framework. We...... with the proposed explicit noise-model extension....

  12. and Multinomial Logistic Regression

    African Journals Online (AJOL)

    This work presented the results of an experimental comparison of two models: Multinomial Logistic Regression (MLR) and Artificial Neural Network (ANN) for classifying students based on their academic performance. The predictive accuracy for each model was measured by their average Classification Correct Rate (CCR).

  13. Mechanisms of neuroblastoma regression

    Science.gov (United States)

    Brodeur, Garrett M.; Bagatell, Rochelle

    2014-01-01

    Recent genomic and biological studies of neuroblastoma have shed light on the dramatic heterogeneity in the clinical behaviour of this disease, which spans from spontaneous regression or differentiation in some patients, to relentless disease progression in others, despite intensive multimodality therapy. This evidence also suggests several possible mechanisms to explain the phenomena of spontaneous regression in neuroblastomas, including neurotrophin deprivation, humoral or cellular immunity, loss of telomerase activity and alterations in epigenetic regulation. A better understanding of the mechanisms of spontaneous regression might help to identify optimal therapeutic approaches for patients with these tumours. Currently, the most druggable mechanism is the delayed activation of developmentally programmed cell death regulated by the tropomyosin receptor kinase A pathway. Indeed, targeted therapy aimed at inhibiting neurotrophin receptors might be used in lieu of conventional chemotherapy or radiation in infants with biologically favourable tumours that require treatment. Alternative approaches consist of breaking immune tolerance to tumour antigens or activating neurotrophin receptor pathways to induce neuronal differentiation. These approaches are likely to be most effective against biologically favourable tumours, but they might also provide insights into treatment of biologically unfavourable tumours. We describe the different mechanisms of spontaneous neuroblastoma regression and the consequent therapeutic approaches. PMID:25331179

  14. Improvement of Storm Forecasts Using Gridded Bayesian Linear Regression for Northeast United States

    Science.gov (United States)

    Yang, J.; Astitha, M.; Schwartz, C. S.

    2017-12-01

    Bayesian linear regression (BLR) is a post-processing technique in which regression coefficients are derived and used to correct raw forecasts based on pairs of observation-model values. This study presents the development and application of a gridded Bayesian linear regression (GBLR) as a new post-processing technique to improve numerical weather prediction (NWP) of rain and wind storm forecasts over northeast United States. Ten controlled variables produced from ten ensemble members of the National Center for Atmospheric Research (NCAR) real-time prediction system are used for a GBLR model. In the GBLR framework, leave-one-storm-out cross-validation is utilized to study the performances of the post-processing technique in a database composed of 92 storms. To estimate the regression coefficients of the GBLR, optimization procedures that minimize the systematic and random error of predicted atmospheric variables (wind speed, precipitation, etc.) are implemented for the modeled-observed pairs of training storms. The regression coefficients calculated for meteorological stations of the National Weather Service are interpolated back to the model domain. An analysis of forecast improvements based on error reductions during the storms will demonstrate the value of GBLR approach. This presentation will also illustrate how the variances are optimized for the training partition in GBLR and discuss the verification strategy for grid points where no observations are available. The new post-processing technique is successful in improving wind speed and precipitation storm forecasts using past event-based data and has the potential to be implemented in real-time.
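
    The idea of deriving regression coefficients from observation-model pairs and using them to correct a raw forecast can be sketched with a conjugate Bayesian linear regression. This is a simplified illustration, not the GBLR system: it assumes one predictor, a known noise variance, and independent zero-mean Gaussian priors on intercept and slope, and all names and numbers are hypothetical.

```python
def posterior(forecast, observed, noise_var=1.0, prior_var=10.0):
    """Posterior mean and covariance of (intercept, slope) for the model
    observed = intercept + slope * forecast + Gaussian noise."""
    n = len(forecast)
    sx = sum(forecast)
    sxx = sum(f * f for f in forecast)
    sy = sum(observed)
    sxy = sum(f * o for f, o in zip(forecast, observed))
    # Posterior precision = X'X / noise_var + I / prior_var.
    p00 = n / noise_var + 1.0 / prior_var
    p01 = sx / noise_var
    p11 = sxx / noise_var + 1.0 / prior_var
    det = p00 * p11 - p01 * p01
    # Posterior covariance is the inverse of the 2 x 2 precision matrix.
    c00, c01, c11 = p11 / det, -p01 / det, p00 / det
    # Posterior mean = covariance * X'y / noise_var.
    m0 = (c00 * sy + c01 * sxy) / noise_var
    m1 = (c01 * sy + c11 * sxy) / noise_var
    return (m0, m1), ((c00, c01), (c01, c11))


def correct(raw_forecast, mean):
    """Apply the posterior-mean coefficients to a raw forecast value."""
    return mean[0] + mean[1] * raw_forecast


# Hypothetical training pairs with an exact linear relation.
raw = [0.0, 1.0, 2.0, 3.0, 4.0]
obs = [1.0, 3.0, 5.0, 7.0, 9.0]
vague_mean, _ = posterior(raw, obs, prior_var=1e6)
tight_mean, _ = posterior(raw, obs, prior_var=0.01)
```

    With a vague prior the posterior mean approaches the least-squares fit, while a tight prior shrinks the coefficients toward zero. In a gridded setting, coefficients estimated at stations would then be interpolated to the model grid, a step this sketch omits.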

  15. Ridge Regression Signal Processing

    Science.gov (United States)

    Kuhl, Mark R.

    1990-01-01

    The introduction of the Global Positioning System (GPS) into the National Airspace System (NAS) necessitates the development of Receiver Autonomous Integrity Monitoring (RAIM) techniques. In order to guarantee a certain level of integrity, a thorough understanding of modern estimation techniques applied to navigational problems is required. The extended Kalman filter (EKF) is derived and analyzed under poor geometry conditions. It was found that the performance of the EKF is difficult to predict, since the EKF is designed for a Gaussian environment. A novel approach is implemented which incorporates ridge regression to explain the behavior of an EKF in the presence of dynamics under poor geometry conditions. The basic principles of ridge regression theory are presented, followed by the derivation of a linearized recursive ridge estimator. Computer simulations are performed to confirm the underlying theory and to provide a comparative analysis of the EKF and the recursive ridge estimator.
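
    The basic principle of ridge regression referred to in this abstract can be shown with a closed-form sketch for two predictors. It is not the linearized recursive ridge estimator from the report; the near-collinear data merely stand in for a poorly conditioned problem, and the function name and numbers are hypothetical.

```python
import math


def ridge_two_predictors(x1, x2, y, lam):
    """Closed-form ridge estimate (X'X + lam * I)^{-1} X'y without intercept."""
    s11 = sum(a * a for a in x1) + lam
    s22 = sum(b * b for b in x2) + lam
    s12 = sum(a * b for a, b in zip(x1, x2))
    t1 = sum(a * c for a, c in zip(x1, y))
    t2 = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    return ((s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det)


def norm(coefficients):
    return math.sqrt(sum(c * c for c in coefficients))


# Two almost identical predictors make the least-squares solution unstable.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [1.01, 1.99, 3.02, 3.98]
y = [2.1, 3.9, 6.2, 7.8]
norms = [norm(ridge_two_predictors(x1, x2, y, lam)) for lam in (0.0, 1.0, 10.0)]
```

    The coefficient norm shrinks as the penalty grows, which is the stabilising trade of variance for bias that ridge methods exploit when the geometry is poor.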

  16. Subset selection in regression

    CERN Document Server

    Miller, Alan

    2002-01-01

    Originally published in 1990, the first edition of Subset Selection in Regression filled a significant gap in the literature, and its critical and popular success has continued for more than a decade. Thoroughly revised to reflect progress in theory, methods, and computing power, the second edition promises to continue that tradition. The author has thoroughly updated each chapter, incorporated new material on recent developments, and included more examples and references. New in the Second Edition: a separate chapter on Bayesian methods; complete revision of the chapter on estimation; a major example from the field of near infrared spectroscopy; more emphasis on cross-validation; greater focus on bootstrapping; stochastic algorithms for finding good subsets from large numbers of predictors when an exhaustive search is not feasible; software available on the Internet for implementing many of the algorithms presented; more examples. Subset Selection in Regression, Second Edition remains dedicated to the techniques for fitting...

  17. (Non) linear regression modelling

    NARCIS (Netherlands)

    Cizek, P.; Gentle, J.E.; Hardle, W.K.; Mori, Y.

    2012-01-01

    We will study causal relationships of a known form between random variables. Given a model, we distinguish one or more dependent (endogenous) variables Y = (Y1,…,Yl), l ∈ N, which are explained by a model, and independent (exogenous, explanatory) variables X = (X1,…,Xp),p ∈ N, which explain or

  18. Regression in organizational leadership.

    Science.gov (United States)

    Kernberg, O F

    1979-02-01

    The choice of good leaders is a major task for all organizations. Information regarding the prospective administrator's personality should complement questions regarding his previous experience, his general conceptual skills, his technical knowledge, and the specific skills in the area for which he is being selected. The growing psychoanalytic knowledge about the crucial importance of internal, in contrast to external, object relations, and about the mutual relationships of regression in individuals and in groups, constitutes an important practical tool for the selection of leaders.

  19. Classification and regression trees

    CERN Document Server

    Breiman, Leo; Olshen, Richard A; Stone, Charles J

    1984-01-01

    The methodology used to construct tree structured rules is the focus of this monograph. Unlike many other statistical procedures, which moved from pencil and paper to calculators, this text's use of trees was unthinkable before computers. Both the practical and theoretical sides have been developed in the authors' study of tree methods. Classification and Regression Trees reflects these two sides, covering the use of trees as a data analysis method, and in a more mathematical framework, proving some of their fundamental properties.

  20. Attenuation coefficients of soils

    International Nuclear Information System (INIS)

    Martini, E.; Naziry, M.J.

    1989-01-01

    As a prerequisite to the interpretation of gamma-spectrometric in situ measurements of activity concentrations of soil radionuclides the attenuation of 60 to 1332 keV gamma radiation by soil samples varying in water content and density has been investigated. A useful empirical equation could be set up to describe the dependence of the mass attenuation coefficient upon photon energy for soil with a mean water content of 10%, with the results comparing well with data in the literature. The mean density of soil in the GDR was estimated at 1.6 g/cm³. This value was used to derive the linear attenuation coefficients, their range of variation being 10%. 7 figs., 5 tabs. (author)
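
    The step from mass to linear attenuation coefficient described here is a multiplication by density, after which exponential attenuation gives the transmitted fraction. The sketch below uses the 1.6 g/cm³ density from the abstract; the mass attenuation value of 0.08 cm²/g is a placeholder chosen for illustration and is not a result of the study.

```python
import math

SOIL_DENSITY_G_PER_CM3 = 1.6  # mean soil density quoted in the abstract


def linear_attenuation(mass_attenuation_cm2_per_g, density=SOIL_DENSITY_G_PER_CM3):
    """Linear attenuation coefficient (1/cm) = mass coefficient * density."""
    return mass_attenuation_cm2_per_g * density


def transmitted_fraction(mu_linear_per_cm, thickness_cm):
    """Fraction of a narrow photon beam left after a soil layer (exponential law)."""
    return math.exp(-mu_linear_per_cm * thickness_cm)


# Placeholder mass attenuation coefficient, for illustration only.
mu = linear_attenuation(0.08)
fractions = [transmitted_fraction(mu, depth) for depth in (0.0, 5.0, 10.0)]
```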

  1. Logistic regression models

    CERN Document Server

    Hilbe, Joseph M

    2009-01-01

    This book really does cover everything you ever wanted to know about logistic regression … with updates available on the author's website. Hilbe, a former national athletics champion, philosopher, and expert in astronomy, is a master at explaining statistical concepts and methods. Readers familiar with his other expository work will know what to expect: great clarity. The book provides considerable detail about all facets of logistic regression. No step of an argument is omitted so that the book will meet the needs of the reader who likes to see everything spelt out, while a person familiar with some of the topics has the option to skip "obvious" sections. The material has been thoroughly road-tested through classroom and web-based teaching. … The focus is on helping the reader to learn and understand logistic regression. The audience is not just students meeting the topic for the first time, but also experienced users. I believe the book really does meet the author's goal … . -Annette J. Dobson, Biometric...

  2. Modelling subject-specific childhood growth using linear mixed-effect models with cubic regression splines.

    Science.gov (United States)

    Grajeda, Laura M; Ivanescu, Andrada; Saito, Mayuko; Crainiceanu, Ciprian; Jaganath, Devan; Gilman, Robert H; Crabtree, Jean E; Kelleher, Dermott; Cabrera, Lilia; Cama, Vitaliano; Checkley, William

    2016-01-01

    Childhood growth is a cornerstone of pediatric research. Statistical models need to consider individual trajectories to adequately describe growth outcomes. Specifically, well-defined longitudinal models are essential to characterize both population and subject-specific growth. Linear mixed-effect models with cubic regression splines can account for the nonlinearity of growth curves and provide reasonable estimators of population and subject-specific growth, velocity and acceleration. We provide a stepwise approach that builds from simple to complex models, and account for the intrinsic complexity of the data. We start with standard cubic splines regression models and build up to a model that includes subject-specific random intercepts and slopes and residual autocorrelation. We then compared cubic regression splines vis-à-vis linear piecewise splines, and with varying number of knots and positions. Statistical code is provided to ensure reproducibility and improve dissemination of methods. Models are applied to longitudinal height measurements in a cohort of 215 Peruvian children followed from birth until their fourth year of life. Unexplained variability, as measured by the variance of the regression model, was reduced from 7.34 when using ordinary least squares to 0.81 (p linear mixed-effect models with random slopes and a first order continuous autoregressive error term. There was substantial heterogeneity in both the intercept (p modeled with a first order continuous autoregressive error term as evidenced by the variogram of the residuals and by a lack of association among residuals. The final model provides a parametric linear regression equation for both estimation and prediction of population- and individual-level growth in height. We show that cubic regression splines are superior to linear regression splines for the case of a small number of knots in both estimation and prediction with the full linear mixed effect model (AIC 19,352 vs. 19

  3. Testing the equality of nonparametric regression curves based on ...

    African Journals Online (AJOL)

    Abstract. In this work we propose a new methodology for the comparison of two regression functions f1 and f2 in the case of homoscedastic error structure and a fixed design. Our approach is based on the empirical Fourier coefficients of the regression functions f1 and f2 respectively. As our main results we obtain the ...

  4. Implicit collinearity effect in linear regression: Application to basal ...

    African Journals Online (AJOL)

    Collinearity of predictor variables is a severe problem in the least square regression analysis. It contributes to the instability of regression coefficients and leads to a wrong prediction accuracy. Despite these problems, studies are conducted with a large number of observed and derived variables linked with a response ...

  5. Changes in persistence, spurious regressions and the Fisher hypothesis

    DEFF Research Database (Denmark)

    Kruse, Robinson; Ventosa-Santaulària, Daniel; Noriega, Antonio E.

    Declining inflation persistence has been documented in numerous studies. When such series are analyzed in a regression framework in conjunction with other persistent time series, spurious regressions are likely to occur. We propose to use the coefficient of determination R2 as a test statistic to...

  6. Steganalysis using logistic regression

    Science.gov (United States)

    Lubenko, Ivans; Ker, Andrew D.

    2011-02-01

    We advocate Logistic Regression (LR) as an alternative to the Support Vector Machine (SVM) classifiers commonly used in steganalysis. LR offers more information than traditional SVM methods - it estimates class probabilities as well as providing a simple classification - and can be adapted more easily and efficiently for multiclass problems. Like SVM, LR can be kernelised for nonlinear classification, and it shows comparable classification accuracy to SVM methods. This work is a case study, comparing accuracy and speed of SVM and LR classifiers in detection of LSB Matching and other related spatial-domain image steganography, through the state-of-art 686-dimensional SPAM feature set, in three image sets.

  7. SEPARATION PHENOMENA LOGISTIC REGRESSION

    Directory of Open Access Journals (Sweden)

    Ikaro Daniel de Carvalho Barreto

    2014-03-01

    This paper applies concepts from maximum likelihood estimation of the binomial logistic regression model to the separation phenomenon. Separation generates bias in the estimation, leads to different interpretations of the estimates under the different statistical tests (Wald, Likelihood Ratio and Score) and yields different estimates under the different iterative methods (Newton-Raphson and Fisher Score). The paper also presents an example that demonstrates the direct implications for the validation of the model and of the variables, and the implications for estimates of odds ratios and confidence intervals generated from the Wald statistics. Furthermore, we briefly present the Firth correction to circumvent the separation phenomenon.
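
    Why separation breaks maximum likelihood estimation can be seen in a small sketch: with perfectly separated data the log-likelihood keeps increasing as the slope grows, so no finite maximiser exists and iterative methods drift toward ever larger coefficients. The data and function names below are hypothetical, the model has a single slope and no intercept, and the Firth correction itself is not implemented.

```python
import math


def log_sigmoid(z):
    """Numerically stable log(1 / (1 + exp(-z)))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def log_likelihood(beta, xs, ys):
    """Bernoulli log-likelihood of the model P(y = 1 | x) = sigmoid(beta * x)."""
    total = 0.0
    for x, y in zip(xs, ys):
        z = beta * x
        total += log_sigmoid(z) if y == 1 else log_sigmoid(-z)
    return total


# Perfect separation: every negative x has y = 0, every positive x has y = 1.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]
trace = [log_likelihood(beta, xs, ys) for beta in (1.0, 5.0, 25.0)]
```

    The log-likelihood approaches zero from below without ever reaching a maximum, which is why Wald standard errors and odds-ratio intervals become unreliable under separation.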

  8. riskRegression

    DEFF Research Database (Denmark)

    Ozenne, Brice; Sørensen, Anne Lyngholm; Scheike, Thomas

    2017-01-01

    In the presence of competing risks a prediction of the time-dynamic absolute risk of an event can be based on cause-specific Cox regression models for the event and the competing risks (Benichou and Gail, 1990). We present computationally fast and memory optimized C++ functions with an R interface......-product we obtain fast access to the baseline hazards (compared to survival::basehaz()) and predictions of survival probabilities, their confidence intervals and confidence bands. Confidence intervals and confidence bands are based on point-wise asymptotic expansions of the corresponding statistical...

  9. Adaptive metric kernel regression

    DEFF Research Database (Denmark)

    Goutte, Cyril; Larsen, Jan

    2000-01-01

    Kernel smoothing is a widely used non-parametric pattern recognition technique. By nature, it suffers from the curse of dimensionality and is usually difficult to apply to high input dimensions. In this contribution, we propose an algorithm that adapts the input metric used in multivariate...... regression by minimising a cross-validation estimate of the generalisation error. This allows to automatically adjust the importance of different dimensions. The improvement in terms of modelling performance is illustrated on a variable selection task where the adaptive metric kernel clearly outperforms...

  10. Adaptive Metric Kernel Regression

    DEFF Research Database (Denmark)

    Goutte, Cyril; Larsen, Jan

    1998-01-01

    Kernel smoothing is a widely used nonparametric pattern recognition technique. By nature, it suffers from the curse of dimensionality and is usually difficult to apply to high input dimensions. In this paper, we propose an algorithm that adapts the input metric used in multivariate regression...... by minimising a cross-validation estimate of the generalisation error. This allows one to automatically adjust the importance of different dimensions. The improvement in terms of modelling performance is illustrated on a variable selection task where the adaptive metric kernel clearly outperforms the standard...

  11. The Truth About Ballistic Coefficients

    OpenAIRE

    Courtney, Michael; Courtney, Amy

    2007-01-01

    The ballistic coefficient of a bullet describes how it slows in flight due to air resistance. This article presents experimental determinations of ballistic coefficients showing that the majority of bullets tested have their previously published ballistic coefficients exaggerated from 5-25% by the bullet manufacturers. These exaggerated ballistic coefficients lead to inaccurate predictions of long range bullet drop, retained energy and wind drift.

  12. On Solving Lq-Penalized Regressions

    Directory of Open Access Journals (Sweden)

    Tracy Zhou Wu

    2007-01-01

    Lq-penalized regression arises in multidimensional statistical modelling where all or part of the regression coefficients are penalized to achieve both accuracy and parsimony of statistical models. There is often substantial computational difficulty except for the quadratic penalty case. The difficulty is partly due to the nonsmoothness of the objective function inherited from the use of the absolute value. We propose a new solution method for the general Lq-penalized regression problem based on space transformation and thus efficient optimization algorithms. The new method has immediate applications in statistics, notably in penalized spline smoothing problems. In particular, the LASSO problem is shown to be polynomial time solvable. Numerical studies show promise of our approach.

  13. Influence diagnostics in meta-regression model.

    Science.gov (United States)

    Shi, Lei; Zuo, ShanShan; Yu, Dalei; Zhou, Xiaohua

    2017-09-01

    This paper studies influence diagnostics in the meta-regression model, including case deletion diagnostics and local influence analysis. We derive the subset deletion formulae for the estimation of the regression coefficient and heterogeneity variance and obtain the corresponding influence measures. The DerSimonian and Laird estimation and maximum likelihood estimation methods in meta-regression are considered, respectively, to derive the results. Internal and external residual and leverage measures are defined. The local influence analysis based on the case-weights perturbation scheme, responses perturbation scheme, covariate perturbation scheme, and within-variance perturbation scheme is explored. We introduce a method that simultaneously perturbs responses, covariates, and within-variance to obtain the local influence measure, which has the advantage of being able to compare the influence magnitude of influential studies from different perturbations. An example is used to illustrate the proposed methodology. Copyright © 2017 John Wiley & Sons, Ltd.

  14. Aid and growth regressions

    DEFF Research Database (Denmark)

    Hansen, Henrik; Tarp, Finn

    2001-01-01

    This paper examines the relationship between foreign aid and growth in real GDP per capita as it emerges from simple augmentations of popular cross country growth specifications. It is shown that aid in all likelihood increases the growth rate, and this result is not conditional on ‘good’ policy....... investment. We conclude by stressing the need for more theoretical work before this kind of cross-country regressions are used for policy purposes.

  15. The microcomputer scientific software series 2: general linear model--regression.

    Science.gov (United States)

    Harold M. Rauscher

    1983-01-01

    The general linear model regression (GLMR) program provides the microcomputer user with a sophisticated regression analysis capability. The output provides a regression ANOVA table, estimators of the regression model coefficients, their confidence intervals, confidence intervals around the predicted Y-values, residuals for plotting, a check for multicollinearity, a...

  16. A logistic regression estimating function for spatial Gibbs point processes

    DEFF Research Database (Denmark)

    Baddeley, Adrian; Coeurjolly, Jean-François; Rubak, Ege

    We propose a computationally efficient logistic regression estimating function for spatial Gibbs point processes. The sample points for the logistic regression consist of the observed point pattern together with a random pattern of dummy points. The estimating function is closely related to the p......

  17. Quantum Non-Markovian Langevin Equations and Transport Coefficients

    International Nuclear Information System (INIS)

    Sargsyan, V.V.; Antonenko, N.V.; Kanokov, Z.; Adamian, G.G.

    2005-01-01

    Quantum diffusion equations featuring explicitly time-dependent transport coefficients are derived from generalized non-Markovian Langevin equations. Generalized fluctuation-dissipation relations and analytic expressions for calculating the friction and diffusion coefficients in nuclear processes are obtained. The asymptotic behavior of the transport coefficients and correlation functions for a damped harmonic oscillator that is linearly coupled in momentum to a heat bath is studied. The coupling to a heat bath in momentum is responsible for the appearance of the diffusion coefficient in coordinate. The problem of regression of correlations in quantum dissipative systems is analyzed.

  18. A varying-coefficient method for analyzing longitudinal clinical trials data with nonignorable dropout

    Science.gov (United States)

    Forster, Jeri E.; MaWhinney, Samantha; Ball, Erika L.; Fairclough, Diane

    2011-01-01

    Dropout is common in longitudinal clinical trials and when the probability of dropout depends on unobserved outcomes even after conditioning on available data, it is considered missing not at random and therefore nonignorable. To address this problem, mixture models can be used to account for the relationship between a longitudinal outcome and dropout. We propose a Natural Spline Varying-coefficient mixture model (NSV), which is a straightforward extension of the parametric Conditional Linear Model (CLM). We assume that the outcome follows a varying-coefficient model conditional on a continuous dropout distribution. Natural cubic B-splines are used to allow the regression coefficients to semiparametrically depend on dropout and inference is therefore more robust. Additionally, this method is computationally stable and relatively simple to implement. We conduct simulation studies to evaluate performance and compare methodologies in settings where the longitudinal trajectories are linear and dropout time is observed for all individuals. Performance is assessed under conditions where model assumptions are both met and violated. In addition, we compare the NSV to the CLM and a standard random-effects model using an HIV/AIDS clinical trial with probable nonignorable dropout. The simulation studies suggest that the NSV is an improvement over the CLM when dropout has a nonlinear dependence on the outcome. PMID:22101223

  19. Canonical variate regression.

    Science.gov (United States)

    Luo, Chongliang; Liu, Jin; Dey, Dipak K; Chen, Kun

    2016-07-01

    In many fields, multi-view datasets, measuring multiple distinct but interrelated sets of characteristics on the same set of subjects, together with data on certain outcomes or phenotypes, are routinely collected. The objective in such a problem is often two-fold: both to explore the association structures of multiple sets of measurements and to develop a parsimonious model for predicting the future outcomes. We study a unified canonical variate regression framework to tackle the two problems simultaneously. The proposed criterion integrates multiple canonical correlation analysis with predictive modeling, balancing between the association strength of the canonical variates and their joint predictive power on the outcomes. Moreover, the proposed criterion seeks multiple sets of canonical variates simultaneously to enable the examination of their joint effects on the outcomes, and is able to handle multivariate and non-Gaussian outcomes. An efficient algorithm based on variable splitting and Lagrangian multipliers is proposed. Simulation studies show the superior performance of the proposed approach. We demonstrate the effectiveness of the proposed approach in an F2 intercross mice study and an alcohol dependence study. © The Author 2016. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  20. On the Kendall Correlation Coefficient

    OpenAIRE

    Stepanov, Alexei

    2015-01-01

    In the present paper, we first discuss the Kendall rank correlation coefficient. In the continuous case, we define the Kendall rank correlation coefficient in terms of the concomitants of order statistics, find the expected value of the Kendall rank correlation coefficient and show that the latter is free of n. We also prove that in the continuous case the Kendall correlation coefficient converges in probability to its expected value. We then propose to consider the expected value of the Kendall rank ...
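
    As background for the coefficient discussed above, here is a minimal sketch of the usual sample Kendall rank correlation (tau-a), computed by counting concordant and discordant pairs. It is the textbook pair-counting definition, not the concomitant-based construction of the paper, and it assumes data without ties; the function name `kendall_tau` and the sample values are illustrative.

```python
from itertools import combinations

def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / number of pairs (assumes no ties)."""
    n = len(xs)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # A pair is concordant when x and y move in the same direction.
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]  # two adjacent swaps relative to x
tau = kendall_tau(x, y)
print(tau)  # 8 concordant and 2 discordant pairs out of 10 -> 0.6
```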

  1. Regression and regression analysis time series prediction modeling on climate data of Quetta, Pakistan

    International Nuclear Information System (INIS)

    Jafri, Y.Z.; Kamal, L.

    2007-01-01

    Various statistical techniques were used on five-year data (1998-2002) of average humidity, rainfall, and maximum and minimum temperatures. The relationships to regression analysis time series (RATS) were developed for determining the overall trend of these climate parameters, on the basis of which forecast models can be corrected and modified. We computed the coefficient of determination as a measure of goodness of fit to our polynomial regression analysis time series (PRATS). The correlation to multiple linear regression (MLR) and multiple linear regression analysis time series (MLRATS) were also developed for deciphering the interdependence of weather parameters. Spearman's rank correlation and the Goldfeld-Quandt test were used to check the uniformity or non-uniformity of variances in our fit to polynomial regression (PR). The Breusch-Pagan test was applied to MLR and MLRATS, respectively, which yielded homoscedasticity. We also employed Bartlett's test for homogeneity of variances on five-year data of rainfall and humidity, which showed that the variances in the rainfall data were not homogeneous while those in the humidity data were. Our results on regression and regression analysis time series show the best fit to prediction modeling on climatic data of Quetta, Pakistan. (author)

  2. Genetic Analysis of Daily Maximum Milking Speed by a Random Walk Model in Dairy Cows

    DEFF Research Database (Denmark)

    Karacaören, Burak; Janss, Luc; Kadarmideen, Haja

    Data were obtained from dairy cows stationed at research farm ETH Zurich for maximum milking speed. The main aims of this paper are a) to evaluate if the Wood curve is suitable to model mean lactation curve b) to predict longitudinal breeding values by random regression and random walk models of maximum milking speed. Wood curve did not provide a good fit to the data set. Quadratic random regressions gave better predictions compared with the random walk model. However random walk model does not need to be evaluated for different orders of regression coefficients. In addition with the Kalman filter applications: random walk model could give online prediction of breeding values. Hence without waiting for whole lactation records, genetic evaluation could be made when the daily or monthly data is available.

  3. Fixed kernel regression for voltammogram feature extraction

    International Nuclear Information System (INIS)

    Acevedo Rodriguez, F J; López-Sastre, R J; Gil-Jiménez, P; Maldonado Bascón, S; Ruiz-Reyes, N

    2009-01-01

    Cyclic voltammetry is an electroanalytical technique for obtaining information about substances under analysis without the need for complex flow systems. However, classifying the information in voltammograms obtained using this technique is difficult. In this paper, we propose the use of fixed kernel regression as a method for extracting features from these voltammograms, reducing the information to a few coefficients. The proposed approach has been applied to a wine classification problem with accuracy rates of over 98%. Although the method is described here for extracting voltammogram information, it can be used for other types of signals.

  4. Polynomial regression analysis and significance test of the regression function

    International Nuclear Information System (INIS)

    Gao Zhengming; Zhao Juan; He Shengping

    2012-01-01

    In order to analyze the decay heating power of a certain radioactive isotope per kilogram with the polynomial regression method, the paper first demonstrates the broad usage of the polynomial function and deduces its parameters with the ordinary least squares estimate. Then a significance test method for the polynomial regression function is derived, considering the similarity between the polynomial regression model and the multivariable linear regression model. Finally, polynomial regression analysis and a significance test of the polynomial function are applied to the decay heating power of the isotope per kilogram in accord with the authors' real work. (authors)

  5. Assessing risk factors for periodontitis using regression

    Science.gov (United States)

    Lobo Pereira, J. A.; Ferreira, Maria Cristina; Oliveira, Teresa

    2013-10-01

    Multivariate statistical analysis is indispensable to assess the associations and interactions between different factors and the risk of periodontitis. Among others, regression analysis is a statistical technique widely used in healthcare to investigate and model the relationship between variables. In our work we study the impact of socio-demographic, medical and behavioral factors on periodontal health. Using linear and logistic regression models, we can assess the relevance, as risk factors for periodontitis disease, of the following independent variables (IVs): Age, Gender, Diabetic Status, Education, Smoking status and Plaque Index. The multiple linear regression analysis model was built to evaluate the influence of IVs on mean Attachment Loss (AL). Thus, the regression coefficients will be obtained along with the respective p-values from the significance tests. The classification of a case (individual) adopted in the logistic model was the extent of the destruction of periodontal tissues defined by an Attachment Loss greater than or equal to 4 mm in 25% (AL≥4mm/≥25%) of sites surveyed. The association measures include the Odds Ratios together with the corresponding 95% confidence intervals.

  6. Combining Alphas via Bounded Regression

    Directory of Open Access Journals (Sweden)

    Zura Kakushadze

    2015-11-01

    We give an explicit algorithm and source code for combining alpha streams via bounded regression. In practical applications, typically, there is insufficient history to compute a sample covariance matrix (SCM) for a large number of alphas. To compute alpha allocation weights, one then resorts to (weighted) regression over SCM principal components. Regression often produces alpha weights with insufficient diversification and/or skewed distribution against, e.g., turnover. This can be rectified by imposing bounds on alpha weights within the regression procedure. Bounded regression can also be applied to stock and other asset portfolio construction. We discuss illustrative examples.

  7. Regression in autistic spectrum disorders.

    Science.gov (United States)

    Stefanatos, Gerry A

    2008-12-01

    A significant proportion of children diagnosed with Autistic Spectrum Disorder experience a developmental regression characterized by a loss of previously-acquired skills. This may involve a loss of speech or social responsiveness, but often entails both. This paper critically reviews the phenomena of regression in autistic spectrum disorders, highlighting the characteristics of regression, age of onset, temporal course, and long-term outcome. Important considerations for diagnosis are discussed and multiple etiological factors currently hypothesized to underlie the phenomenon are reviewed. It is argued that regressive autistic spectrum disorders can be conceptualized on a spectrum with other regressive disorders that may share common pathophysiological features. The implications of this viewpoint are discussed.

  8. Advanced statistics: linear regression, part I: simple linear regression.

    Science.gov (United States)

    Marill, Keith A

    2004-01-01

    Simple linear regression is a mathematical technique used to model the relationship between a single independent predictor variable and a single dependent outcome variable. In this, the first of a two-part series exploring concepts in linear regression analysis, the four fundamental assumptions and the mechanics of simple linear regression are reviewed. The most common technique used to derive the regression line, the method of least squares, is described. The reader will be acquainted with other important concepts in simple linear regression, including: variable transformations, dummy variables, relationship to inference testing, and leverage. Simplified clinical examples with small datasets and graphic models are used to illustrate the points. This will provide a foundation for the second article in this series: a discussion of multiple linear regression, in which there are multiple predictor variables.
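
    The method of least squares mentioned above has a closed form for a single predictor: the slope is the ratio of the x-y cross-deviation sum to the x deviation sum, and the intercept makes the line pass through the point of means. The sketch below illustrates this with a small made-up dataset; the function name `fit_line` and the numbers are assumptions for illustration and are not taken from the article.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and sum of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the line passes through (mean_x, mean_y)
    return intercept, slope

# Hypothetical small dataset, roughly following y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
intercept, slope = fit_line(xs, ys)
print(f"y = {intercept:.2f} + {slope:.2f} x")
```

    For these values the fitted slope is 1.99 and the intercept is 0.05.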

  9. Analysis of quantile regression as alternative to ordinary least squares

    OpenAIRE

    Ibrahim Abdullahi; Abubakar Yahaya

    2015-01-01

    In this article, an alternative to ordinary least squares (OLS) regression based on analytical solution in the Statgraphics software is considered, and this alternative is no other than quantile regression (QR) model. We also present goodness of fit statistic as well as approximate distributions of the associated test statistics for the parameters. Furthermore, we suggest a goodness of fit statistic called the least absolute deviation (LAD) coefficient of determination. The procedure is well ...

  10. Quadrature formulas for Fourier coefficients

    KAUST Repository

    Bojanov, Borislav

    2009-09-01

    We consider quadrature formulas of high degree of precision for the computation of the Fourier coefficients in expansions of functions with respect to a system of orthogonal polynomials. In particular, we show the uniqueness of a multiple node formula for the Fourier-Tchebycheff coefficients given by Micchelli and Sharma and construct new Gaussian formulas for the Fourier coefficients of a function, based on the values of the function and its derivatives. © 2009 Elsevier B.V. All rights reserved.

  11. Household water treatment in developing countries: comparing different intervention types using meta-regression.

    Science.gov (United States)

    Hunter, Paul R

    2009-12-01

    Household water treatment (HWT) is being widely promoted as an appropriate intervention for reducing the burden of waterborne disease in poor communities in developing countries. A recent study has raised concerns about the effectiveness of HWT, in part because of concerns over the lack of blinding and in part because of considerable heterogeneity in the reported effectiveness of randomized controlled trials. This study set out to attempt to investigate the causes of this heterogeneity and so identify factors associated with good health gains. Studies identified in an earlier systematic review and meta-analysis were supplemented with more recently published randomized controlled trials. A total of 28 separate studies of randomized controlled trials of HWT with 39 intervention arms were included in the analysis. Heterogeneity was studied using the "metareg" command in Stata. Initial analyses with single candidate predictors were undertaken and all variables significant at the P Risk and the parameter estimates from the final regression model. The overall effect size of all unblinded studies was relative risk = 0.56 (95% confidence intervals 0.51-0.63), but after adjusting for bias due to lack of blinding the effect size was much lower (RR = 0.85, 95% CI = 0.76-0.97). Four main variables were significant predictors of effectiveness of intervention in a multipredictor meta regression model: Log duration of study follow-up (regression coefficient of log effect size = 0.186, standard error (SE) = 0.072), whether or not the study was blinded (coefficient 0.251, SE 0.066) and being conducted in an emergency setting (coefficient -0.351, SE 0.076) were all significant predictors of effect size in the final model. Compared to the ceramic filter all other interventions were much less effective (Biosand 0.247, 0.073; chlorine and safe waste storage 0.295, 0.061; combined coagulant-chlorine 0.2349, 0.067; SODIS 0.302, 0.068). A Monte Carlo model predicted that over 12 months

  12. Coefficient Alpha: A Reliability Coefficient for the 21st Century?

    Science.gov (United States)

    Yang, Yanyun; Green, Samuel B.

    2011-01-01

    Coefficient alpha is almost universally applied to assess reliability of scales in psychology. We argue that researchers should consider alternatives to coefficient alpha. Our preference is for structural equation modeling (SEM) estimates of reliability because they are informative and allow for an empirical evaluation of the assumptions…

  13. Coefficient estimates of negative powers and inverse coefficients for ...

    Indian Academy of Sciences (India)

    and the inequality is sharp for the inverse of the Koebe function k(z) = z/(1 − z)^2. An alternative approach to the inverse coefficient problem for functions in the class S has been investigated by Schaeffer and Spencer [27] and FitzGerald [6]. Although, the inverse coefficient problem for the class S has been completely solved ...

  14. Measuring of heat transfer coefficient

    DEFF Research Database (Denmark)

    Henningsen, Poul; Lindegren, Maria

    Subtask 3.4 Measuring of heat transfer coefficient Subtask 3.4.1 Design and setting up of tests to measure heat transfer coefficient Objective: Complementary testing methods together with the relevant experimental equipment are to be designed by the two partners involved in order to measure...... the heat transfer coefficient for a wide range of interface conditions in hot and warm forging processes. Subtask 3.4.2 Measurement of heat transfer coefficient The objective of subtask 3.4.2 is to determine heat transfer values for different interface conditions reflecting those typically operating in hot...

  15. Estimation of octanol/water partition coefficients using LSER parameters

    Science.gov (United States)

    Luehrs, Dean C.; Hickey, James P.; Godbole, Kalpana A.; Rogers, Tony N.

    1998-01-01

    The logarithms of octanol/water partition coefficients, logKow, were regressed against the linear solvation energy relationship (LSER) parameters for a training set of 981 diverse organic chemicals. The standard deviation for logKow was 0.49. The regression equation was then used to estimate logKow for a test set of 146 chemicals which included pesticides and other diverse polyfunctional compounds. Thus the octanol/water partition coefficient may be estimated by LSER parameters without elaborate software, but only moderate accuracy should be expected.

  16. Moderation analysis using a two-level regression model.

    Science.gov (United States)

    Yuan, Ke-Hai; Cheng, Ying; Maxwell, Scott

    2014-10-01

    Moderation analysis is widely used in social and behavioral research. The most commonly used model for moderation analysis is moderated multiple regression (MMR) in which the explanatory variables of the regression model include product terms, and the model is typically estimated by least squares (LS). This paper argues for a two-level regression model in which the regression coefficients of a criterion variable on predictors are further regressed on moderator variables. An algorithm for estimating the parameters of the two-level model by normal-distribution-based maximum likelihood (NML) is developed. Formulas for the standard errors (SEs) of the parameter estimates are provided and studied. Results indicate that, when heteroscedasticity exists, NML with the two-level model gives more efficient and more accurate parameter estimates than the LS analysis of the MMR model. When error variances are homoscedastic, NML with the two-level model leads to essentially the same results as LS with the MMR model. Most importantly, the two-level regression model permits estimating the percentage of variance of each regression coefficient that is due to moderator variables. When applied to data from General Social Surveys 1991, NML with the two-level model identified a significant moderation effect of race on the regression of job prestige on years of education while LS with the MMR model did not. An R package is also developed and documented to facilitate the application of the two-level model.

  17. QSAR Modeling of COX-2 Inhibitory Activity of Some Dihydropyridine and Hydroquinoline Derivatives Using Multiple Linear Regression (MLR) Method.

    Science.gov (United States)

    Akbari, Somaye; Zebardast, Tannaz; Zarghi, Afshin; Hajimahdi, Zahra

    2017-01-01

    COX-2 inhibitory activities of some 1,4-dihydropyridine and 5-oxo-1,4,5,6,7,8-hexahydroquinoline derivatives were modeled by quantitative structure-activity relationship (QSAR) using the stepwise multiple linear regression (SW-MLR) method. The built model was robust and predictive, with correlation coefficients (R²) of 0.972 and 0.531 for the training and test groups, respectively. The quality of the model was evaluated by leave-one-out (LOO) cross validation (LOO correlation coefficient (Q²) of 0.943) and Y-randomization. We also employed a leverage approach to define the applicability domain of the model. Based on the QSAR model results, COX-2 inhibitory activity of the selected data set correlated with the BEHm6 (highest eigenvalue n. 6 of Burden matrix/weighted by atomic masses), Mor03u (signal 03/unweighted) and IVDE (mean information content on the vertex degree equality) descriptors, which were derived from their structures.

  18. Biostatistics Series Module 6: Correlation and Linear Regression.

    Science.gov (United States)

    Hazra, Avijit; Gogtay, Nithya

    2016-01-01

    Correlation and linear regression are the most commonly used techniques for quantifying the association between two numeric variables. Correlation quantifies the strength of the linear relationship between paired variables, expressing this as a correlation coefficient. If both variables x and y are normally distributed, we calculate Pearson's correlation coefficient (r). If normality assumption is not met for one or both variables in a correlation analysis, a rank correlation coefficient, such as Spearman's rho (ρ) may be calculated. A hypothesis test of correlation tests whether the linear relationship between the two variables holds in the underlying population, in which case it returns a P correlation coefficient can also be calculated for an idea of the correlation in the population. The value r² denotes the proportion of the variability of the dependent variable y that can be attributed to its linear relation with the independent variable x and is called the coefficient of determination. Linear regression is a technique that attempts to link two correlated variables x and y in the form of a mathematical equation (y = a + bx), such that given the value of one variable the other may be predicted. In general, the method of least squares is applied to obtain the equation of the regression line. Correlation and linear regression analysis are based on certain assumptions pertaining to the data sets. If these assumptions are not met, misleading conclusions may be drawn. The first assumption is that of linear relationship between the two variables. A scatter plot is essential before embarking on any correlation-regression analysis to show that this is indeed the case. Outliers or clustering within data sets can distort the correlation coefficient value. Finally, it is vital to remember that though strong correlation can be a pointer toward causation, the two are not synonymous.

  19. Mixed-effects regression models in linguistics

    CERN Document Server

    Heylen, Kris; Geeraerts, Dirk

    2018-01-01

    When data consist of grouped observations or clusters, and there is a risk that measurements within the same group are not independent, group-specific random effects can be added to a regression model in order to account for such within-group associations. Regression models that contain such group-specific random effects are called mixed-effects regression models, or simply mixed models. Mixed models are a versatile tool that can handle both balanced and unbalanced datasets and that can also be applied when several layers of grouping are present in the data; these layers can either be nested or crossed.  In linguistics, as in many other fields, the use of mixed models has gained ground rapidly over the last decade. This methodological evolution enables us to build more sophisticated and arguably more realistic models, but, due to its technical complexity, also introduces new challenges. This volume brings together a number of promising new evolutions in the use of mixed models in linguistics, but also addres...

  20. Linear regression in astronomy. II

    Science.gov (United States)

    Feigelson, Eric D.; Babu, Gutti J.

    1992-01-01

    A wide variety of least-squares linear regression procedures used in observational astronomy, particularly investigations of the cosmic distance scale, are presented and discussed. The classes of linear models considered are (1) unweighted regression lines, with bootstrap and jackknife resampling; (2) regression solutions when measurement error, in one or both variables, dominates the scatter; (3) methods to apply a calibration line to new data; (4) truncated regression models, which apply to flux-limited data sets; and (5) censored regression models, which apply when nondetections are present. For the calibration problem we develop two new procedures: a formula for the intercept offset between two parallel data sets, which propagates slope errors from one regression to the other; and a generalization of the Working-Hotelling confidence bands to nonstandard least-squares lines. They can provide improved error analysis for Faber-Jackson, Tully-Fisher, and similar cosmic distance scale relations.

  1. Time-adaptive quantile regression

    DEFF Research Database (Denmark)

    Møller, Jan Kloppenborg; Nielsen, Henrik Aalborg; Madsen, Henrik

    2008-01-01

    An algorithm for time-adaptive quantile regression is presented. The algorithm is based on the simplex algorithm, and the linear optimization formulation of the quantile regression problem is given. The observations have been split to allow a direct use of the simplex algorithm. The simplex method and an updating procedure are combined into a new algorithm for time-adaptive quantile regression, which generates new solutions on the basis of the old solution, leading to savings in computation time. The suggested algorithm is tested against a static quantile regression model on a data set with wind power production, where the models combine splines and quantile regression. The comparison indicates superior performance for the time-adaptive quantile regression in all the performance parameters considered.
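
    As background to the linear optimization formulation mentioned here, quantile regression minimizes the asymmetric "check" (pinball) loss. The sketch below is not the simplex-based or time-adaptive algorithm of the record; it only shows, on made-up data and with hypothetical function names, that for an intercept-only model the minimizer of this loss is a sample quantile.

```python
def pinball_loss(residual, tau):
    """Check loss: weight tau on positive residuals, (1 - tau) on negative ones."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))

def total_loss(q, y, tau):
    """Total check loss of the constant prediction q."""
    return sum(pinball_loss(v - q, tau) for v in y)

def intercept_only_quantile(y, tau):
    """The objective is piecewise linear and convex, so a minimum lies at a data point."""
    return min(sorted(y), key=lambda q: total_loss(q, y, tau))

# Hypothetical observations.
y = [3.0, 1.0, 4.0, 1.5, 9.0, 2.6, 5.0]
median_fit = intercept_only_quantile(y, 0.5)
upper_fit = intercept_only_quantile(y, 0.9)
```

    With covariates, the same loss is minimized over regression coefficients, which is the problem that can be written as a linear program.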

  2. Quantile regression theory and applications

    CERN Document Server

    Davino, Cristina; Vistocco, Domenico

    2013-01-01

    A guide to the implementation and interpretation of Quantile Regression models. This book explores the theory and numerous applications of quantile regression, offering empirical data analysis as well as the software tools to implement the methods. The main focus of this book is to provide the reader with a comprehensive description of the main issues concerning quantile regression; these include basic modeling, geometrical interpretation, estimation and inference for quantile regression, as well as issues on validity of the model, diagnostic tools. Each methodological aspect is explored and

  3. [Hyperspectral Estimation of Apple Tree Canopy LAI Based on SVM and RF Regression].

    Science.gov (United States)

    Han, Zhao-ying; Zhu, Xi-cun; Fang, Xian-yi; Wang, Zhuo-yuan; Wang, Ling; Zhao, Geng-Xing; Jiang, Yuan-mao

    2016-03-01

    Leaf area index (LAI) is a dynamic index of crop population size. Hyperspectral technology can be used to estimate apple canopy LAI rapidly and nondestructively, and can provide a reference for monitoring tree growth and estimating yield. Red Fuji apple trees in the full fruit-bearing period were the research objects. Canopy spectral reflectance and LAI values of ninety apple trees were measured with the ASD Fieldspec3 spectrometer and LAI-2200 in thirty orchards over two consecutive years in the Qixia research area of Shandong Province. The optimal vegetation indices were selected by correlation analysis of the original spectral reflectance and vegetation indices. Models for predicting LAI were built with the multivariate regression methods of support vector machine (SVM) and random forest (RF). The new vegetation indices GNDVI527, NDVI676, RVI682, FD-NVI656 and GRVI517, and the two main existing vegetation indices NDVI670 and NDVI705, are in accordance with LAI. In the RF regression model, the calibration set coefficient of determination C-R2 of 0.920 and the validation set coefficient of determination V-R2 of 0.889 are higher than those of the SVM regression model by 0.045 and 0.033, respectively. The calibration set root mean square error C-RMSE of 0.249 and the validation set root mean square error V-RMSE of 0.236 are lower than those of the SVM regression model by 0.054 and 0.058, respectively. The relative analysis errors of the calibration set (C-RPD) and the validation set (V-RPD) reached 3.363 and 2.520, which were higher than those of the SVM regression model by 0.598 and 0.262, respectively. The slopes of the trend lines of measured versus predicted values for the calibration set and validation set, C-S and V-S, are close to 1. The estimation result of the RF regression model is better than that of the SVM. The RF regression model can be used to estimate the LAI of Red Fuji apple trees in the full fruit period.

  4. Linear regression and the normality assumption.

    Science.gov (United States)

    Schmidt, Amand F; Finan, Chris

    2017-12-16

    Researchers often perform arbitrary outcome transformations to fulfill the normality assumption of a linear regression model. This commentary explains and illustrates that in large data settings, such transformations are often unnecessary, and worse, may bias model estimates. Linear regression assumptions are illustrated using simulated data and an empirical example on the relation between time since type 2 diabetes diagnosis and glycated hemoglobin levels. Simulation results were evaluated on coverage; i.e., the number of times the 95% confidence interval included the true slope coefficient. Although outcome transformations bias point estimates, violations of the normality assumption in linear regression analyses do not. The normality assumption is necessary to unbiasedly estimate standard errors, and hence confidence intervals and P-values. However, in large sample sizes (e.g., where the number of observations per variable is >10) violations of this normality assumption often do not noticeably impact results. In contrast, assumptions on the parametric model, absence of extreme observations, homoscedasticity, and independence of the errors remain influential even in large sample size settings. Given that modern healthcare research typically includes thousands of subjects, focusing on the normality assumption is often unnecessary, does not guarantee valid results, and worse, may bias estimates due to the practice of outcome transformations. Copyright © 2017 Elsevier Inc. All rights reserved.
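
    The coverage criterion described in this abstract can be sketched with a small simulation. This is not the authors' simulation: the sample size, the skewed (centered exponential) error distribution, the true slope, and the use of the normal quantile 1.96 in place of a t quantile are all assumptions chosen for illustration.

```python
import math
import random

def slope_ci(x, y, z=1.96):
    """Approximate 95% confidence interval for the slope of y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((p - mx) ** 2 for p in x)
    b = sum((p - mx) * (q - my) for p, q in zip(x, y)) / sxx
    a = my - b * mx
    rss = sum((q - a - b * p) ** 2 for p, q in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    return b - z * se, b + z * se

def coverage(n=200, reps=500, true_slope=0.5, seed=1):
    """Fraction of simulated data sets whose interval contains the true slope."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = [rng.uniform(0.0, 10.0) for _ in range(n)]
        # Errors are exponential minus 1: mean zero but clearly non-normal.
        y = [1.0 + true_slope * p + (rng.expovariate(1.0) - 1.0) for p in x]
        lo, hi = slope_ci(x, y)
        hits += lo <= true_slope <= hi
    return hits / reps

cov = coverage()
```

    Under these assumptions the estimated coverage should land near the nominal 95% despite the skewed errors, which is the kind of behavior the commentary describes for large samples.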

  5. Bayesian Inference of a Multivariate Regression Model

    Directory of Open Access Journals (Sweden)

    Marick S. Sinay

    2014-01-01

    Full Text Available We explore Bayesian inference of a multivariate linear regression model with use of a flexible prior for the covariance structure. The commonly adopted Bayesian setup involves the conjugate prior, multivariate normal distribution for the regression coefficients and inverse Wishart specification for the covariance matrix. Here we depart from this approach and propose a novel Bayesian estimator for the covariance. A multivariate normal prior for the unique elements of the matrix logarithm of the covariance matrix is considered. Such structure allows for a richer class of prior distributions for the covariance, with respect to strength of beliefs in prior location hyperparameters, as well as the added ability, to model potential correlation amongst the covariance structure. The posterior moments of all relevant parameters of interest are calculated based upon numerical results via a Markov chain Monte Carlo procedure. The Metropolis-Hastings-within-Gibbs algorithm is invoked to account for the construction of a proposal density that closely matches the shape of the target posterior distribution. As an application of the proposed technique, we investigate a multiple regression based upon the 1980 High School and Beyond Survey.

  6. Geographically weighted regression model on poverty indicator

    Science.gov (United States)

    Slamet, I.; Nugroho, N. F. T. A.; Muslich

    2017-12-01

    In this research, we applied geographically weighted regression (GWR) to analyze poverty in Central Java, using the Gaussian kernel as the weighting function. GWR uses the diagonal matrix obtained from the Gaussian kernel function as the weight matrix in the regression model. The kernel weights handle spatial effects in the data so that a model can be obtained for each location. The purpose of this paper is to model poverty percentage data in Central Java province using GWR with a Gaussian kernel weighting function and to determine the influencing factors in each regency/city in Central Java province. Based on the research, we obtained a geographically weighted regression model with a Gaussian kernel weighting function for the poverty percentage data in Central Java province. We found that the percentage of the population working as farmers, the population growth rate, the percentage of households with regular sanitation, and BPJS beneficiaries are the variables that affect the percentage of poverty in Central Java province. The coefficient of determination R2 is 68.64%. There are two categories of district, which are influenced by different significant factors.
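
    The idea of fitting one weighted regression per location with Gaussian kernel weights can be sketched as follows. The grid of locations, the single predictor, the bandwidth, and the function names are hypothetical and unrelated to the Central Java data; the weight form exp(-0.5 (d/h)^2) is one common convention for the Gaussian kernel and is assumed here.

```python
import math

def gaussian_weight(distance, bandwidth):
    """Gaussian kernel weight, one common form: exp(-0.5 * (d / h)^2)."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)

def gwr_fit_at(target, coords, x, y, bandwidth):
    """Weighted least-squares fit of y = a + b*x centered on one target location."""
    w = [gaussian_weight(math.hypot(target[0] - cx, target[1] - cy), bandwidth)
         for cx, cy in coords]
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical 5 x 5 grid; the true slope grows from west (i = 0) to east (i = 4).
coords = [(i, j) for i in range(5) for j in range(5)]
x = [1.0 + (3 * i + 2 * j) % 5 for i, j in coords]
y = [(1.0 + i) * xi for (i, _), xi in zip(coords, x)]

west = gwr_fit_at((0, 2), coords, x, y, bandwidth=1.0)
east = gwr_fit_at((4, 2), coords, x, y, bandwidth=1.0)
```

    Each call returns a local (intercept, slope) pair, so the fitted slope at the eastern location comes out larger than at the western one, reflecting the spatially varying relationship built into the toy data.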

  7. General regression and representation model for classification.

    Directory of Open Access Journals (Sweden)

    Jianjun Qian

    Full Text Available Recently, the regularized coding-based classification methods (e.g. SRC and CRC) show a great potential for pattern classification. However, most existing coding methods assume that the representation residuals are uncorrelated. In real-world applications, this assumption does not hold. In this paper, we take account of the correlations of the representation residuals and develop a general regression and representation model (GRR) for classification. GRR not only has advantages of CRC, but also takes full use of the prior information (e.g. the correlations between representation residuals and representation coefficients) and the specific information (weight matrix of image pixels) to enhance the classification performance. GRR uses the generalized Tikhonov regularization and K Nearest Neighbors to learn the prior information from the training data. Meanwhile, the specific information is obtained by using an iterative algorithm to update the feature (or image pixel) weights of the test sample. With the proposed model as a platform, we design two classifiers: basic general regression and representation classifier (B-GRR) and robust general regression and representation classifier (R-GRR). The experimental results demonstrate the performance advantages of proposed methods over state-of-the-art algorithms.

  8. Varying coefficients model with measurement error.

    Science.gov (United States)

    Li, Liang; Greene, Tom

    2008-06-01

    We propose a semiparametric partially varying coefficient model to study the relationship between serum creatinine concentration and the glomerular filtration rate (GFR) among kidney donors and patients with chronic kidney disease. A regression model is used to relate serum creatinine to GFR and demographic factors in which coefficient of GFR is expressed as a function of age to allow its effect to be age dependent. GFR measurements obtained from the clearance of a radioactively labeled isotope are assumed to be a surrogate for the true GFR, with the relationship between measured and true GFR expressed using an additive error model. We use locally corrected score equations to estimate parameters and coefficient functions, and propose an expected generalized cross-validation (EGCV) method to select the kernel bandwidth. The performance of the proposed methods, which avoid distributional assumptions on the true GFR and residuals, is investigated by simulation. Accounting for measurement error using the proposed model reduced apparent inconsistencies in the relationship between serum creatinine and GFR among different clinical data sets derived from kidney donor and chronic kidney disease source populations.

  9. Stochastic development regression using method of moments

    DEFF Research Database (Denmark)

    Kühnel, Line; Sommer, Stefan Horst

    2017-01-01

    This paper considers the estimation problem arising when inferring parameters in the stochastic development regression model for manifold valued non-linear data. Stochastic development regression captures the relation between manifold-valued response and Euclidean covariate variables using the stochastic development construction. It is thereby able to incorporate several covariate variables and random effects. The model is intrinsically defined using the connection of the manifold, and the use of stochastic development avoids linearizing the geometry. We propose to infer parameters using the Method of Moments procedure that matches known constraints on moments of the observations conditional on the latent variables. The performance of the model is investigated in a simulation example using data on finite dimensional landmark manifolds.

  10. Testing discontinuities in nonparametric regression

    KAUST Repository

    Dai, Wenlin

    2017-01-19

    In nonparametric regression, it is often needed to detect whether there are jump discontinuities in the mean function. In this paper, we revisit the difference-based method in [13 H.-G. Müller and U. Stadtmüller, Discontinuous versus smooth regression, Ann. Stat. 27 (1999), pp. 299–337. doi: 10.1214/aos/1018031100

  11. Testing discontinuities in nonparametric regression

    KAUST Repository

    Dai, Wenlin; Zhou, Yuejin; Tong, Tiejun

    2017-01-01

    In nonparametric regression, it is often needed to detect whether there are jump discontinuities in the mean function. In this paper, we revisit the difference-based method in [13 H.-G. Müller and U. Stadtmüller, Discontinuous versus smooth regression, Ann. Stat. 27 (1999), pp. 299–337. doi: 10.1214/aos/1018031100

  12. Logistic Regression: Concept and Application

    Science.gov (United States)

    Cokluk, Omay

    2010-01-01

    The main focus of logistic regression analysis is classification of individuals in different groups. The aim of the present study is to explain basic concepts and processes of binary logistic regression analysis intended to determine the combination of independent variables which best explain the membership in certain groups called dichotomous…

  13. Background stratified Poisson regression analysis of cohort data.

    Science.gov (United States)

    Richardson, David B; Langholz, Bryan

    2012-03-01

    Background stratified Poisson regression is an approach that has been used in the analysis of data derived from a variety of epidemiologically important studies of radiation-exposed populations, including uranium miners, nuclear industry workers, and atomic bomb survivors. We describe a novel approach to fit Poisson regression models that adjust for a set of covariates through background stratification while directly estimating the radiation-disease association of primary interest. The approach makes use of an expression for the Poisson likelihood that treats the coefficients for stratum-specific indicator variables as 'nuisance' variables and avoids the need to explicitly estimate the coefficients for these stratum-specific parameters. Log-linear models, as well as other general relative rate models, are accommodated. This approach is illustrated using data from the Life Span Study of Japanese atomic bomb survivors and data from a study of underground uranium miners. The point estimate and confidence interval obtained from this 'conditional' regression approach are identical to the values obtained using unconditional Poisson regression with model terms for each background stratum. Moreover, it is shown that the proposed approach allows estimation of background stratified Poisson regression models of non-standard form, such as models that parameterize latency effects, as well as regression models in which the number of strata is large, thereby overcoming the limitations of previously available statistical software for fitting background stratified Poisson regression models.

  14. Background stratified Poisson regression analysis of cohort data

    International Nuclear Information System (INIS)

    Richardson, David B.; Langholz, Bryan

    2012-01-01

    Background stratified Poisson regression is an approach that has been used in the analysis of data derived from a variety of epidemiologically important studies of radiation-exposed populations, including uranium miners, nuclear industry workers, and atomic bomb survivors. We describe a novel approach to fit Poisson regression models that adjust for a set of covariates through background stratification while directly estimating the radiation-disease association of primary interest. The approach makes use of an expression for the Poisson likelihood that treats the coefficients for stratum-specific indicator variables as 'nuisance' variables and avoids the need to explicitly estimate the coefficients for these stratum-specific parameters. Log-linear models, as well as other general relative rate models, are accommodated. This approach is illustrated using data from the Life Span Study of Japanese atomic bomb survivors and data from a study of underground uranium miners. The point estimate and confidence interval obtained from this 'conditional' regression approach are identical to the values obtained using unconditional Poisson regression with model terms for each background stratum. Moreover, it is shown that the proposed approach allows estimation of background stratified Poisson regression models of non-standard form, such as models that parameterize latency effects, as well as regression models in which the number of strata is large, thereby overcoming the limitations of previously available statistical software for fitting background stratified Poisson regression models. (orig.)

  15. Drag coefficient Variability and Thermospheric models

    Science.gov (United States)

    Moe, Kenneth

    Satellite drag coefficients depend upon a variety of factors: The shape of the satellite, its altitude, the eccentricity of its orbit, the temperature and mean molecular mass of the ambient atmosphere, and the time in the sunspot cycle. At altitudes where the mean free path of the atmospheric molecules is large compared to the dimensions of the satellite, the drag coefficients can be determined from the theory of free-molecule flow. The dependence on altitude is caused by the concentration of atomic oxygen which plays an important role by its ability to adsorb on the satellite surface and thereby affect the energy loss of molecules striking the surface. The eccentricity of the orbit determines the satellite velocity at perigee, and therefore the energy of the incident molecules relative to the energy of adsorption of atomic oxygen atoms on the surface. The temperature of the ambient atmosphere determines the extent to which the random thermal motion of the molecules influences the momentum transfer to the satellite. The time in the sunspot cycle affects the ambient temperature as well as the concentration of atomic oxygen at a particular altitude. Tables and graphs will be used to illustrate the variability of drag coefficients. Before there were any measurements of gas-surface interactions in orbit, Izakov and Cook independently made an excellent estimate that the drag coefficient of satellites of compact shape would be 2.2. That numerical value, independent of altitude, was used by Jacchia to construct his model from the early measurements of satellite drag. Consequently, there is an altitude dependent bias in the model. From the sparse orbital experiments that have been done, we know that the molecules which strike satellite surfaces rebound in a diffuse angular distribution with an energy loss given by the energy accommodation coefficient. As more evidence accumulates on the energy loss, more realistic drag coefficients are being calculated. These improved drag

  16. Fungible weights in logistic regression.

    Science.gov (United States)

    Jones, Jeff A; Waller, Niels G

    2016-06-01

    In this article we develop methods for assessing parameter sensitivity in logistic regression models. To set the stage for this work, we first review Waller's (2008) equations for computing fungible weights in linear regression. Next, we describe 2 methods for computing fungible weights in logistic regression. To demonstrate the utility of these methods, we compute fungible logistic regression weights using data from the Centers for Disease Control and Prevention's (2010) Youth Risk Behavior Surveillance Survey, and we illustrate how these alternate weights can be used to evaluate parameter sensitivity. To make our work accessible to the research community, we provide R code (R Core Team, 2015) that will generate both kinds of fungible logistic regression weights. (PsycINFO Database Record (c) 2016 APA, all rights reserved).

  17. Ordinary least square regression, orthogonal regression, geometric mean regression and their applications in aerosol science

    International Nuclear Information System (INIS)

    Leng Ling; Zhang Tianyi; Kleinman, Lawrence; Zhu Wei

    2007-01-01

    Regression analysis, especially the ordinary least squares method which assumes that errors are confined to the dependent variable, has seen a fair share of its applications in aerosol science. The ordinary least squares approach, however, could be problematic due to the fact that atmospheric data often does not lend itself to calling one variable independent and the other dependent. Errors often exist for both measurements. In this work, we examine two regression approaches available to accommodate this situation. They are orthogonal regression and geometric mean regression. Comparisons are made theoretically as well as numerically through an aerosol study examining whether the ratio of organic aerosol to CO would change with age
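
    The three slope estimators named in this record can be compared with a short sketch. The formulas are the standard textbook ones (ordinary least squares of y on x, geometric mean regression, and orthogonal regression under the assumption of equal error variances in both variables); the data and function names are hypothetical and not taken from the aerosol study.

```python
import math

def centered_sums(x, y):
    """Return (Sxx, Syy, Sxy), the centered sums of squares and cross products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    return sxx, syy, sxy

def ols_slope(x, y):
    """Ordinary least squares: all error is assigned to y."""
    sxx, _, sxy = centered_sums(x, y)
    return sxy / sxx

def geometric_mean_slope(x, y):
    """Geometric mean regression: sign(Sxy) * sqrt(Syy / Sxx)."""
    sxx, syy, sxy = centered_sums(x, y)
    return math.copysign(math.sqrt(syy / sxx), sxy)

def orthogonal_slope(x, y):
    """Orthogonal regression, assuming equal error variances (requires Sxy != 0)."""
    sxx, syy, sxy = centered_sums(x, y)
    d = syy - sxx
    return (d + math.sqrt(d * d + 4.0 * sxy * sxy)) / (2.0 * sxy)

# Hypothetical paired measurements with scatter in both variables.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.3, 3.6, 6.4, 7.7, 10.2]

b_ols = ols_slope(x, y)
b_gm = geometric_mean_slope(x, y)
b_orth = orthogonal_slope(x, y)
b_inverse = 1.0 / ols_slope(y, x)  # slope implied by regressing x on y
```

    For positively correlated data, the geometric mean and orthogonal slopes both fall between the slope from regressing y on x and the slope implied by regressing x on y, which is why the choice matters when neither variable is clearly the independent one.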

  18. Evaluation of random forest regression for prediction of breeding ...

    Indian Academy of Sciences (India)

    have been widely used for prediction of breeding values of genotypes from genomewide association studies. However, appli- ... tolerance to biotic and abiotic stresses. But due to ..... School, IARI, New Delhi, during his Ph.D. References.

  19. Multi-trait and random regression mature weight heritability and ...

    African Journals Online (AJOL)

    Legendre polynomials of orders 4, 3, 6 and 3 were used for animal and maternal genetic and permanent environmental effects, respectively, considering five classes of residual variances. Mature weight (five years) direct heritability estimates were 0.35 (MM) and 0.38 (RRM). Rank correlation between sires' breeding values ...

  20. Covariance Functions and Random Regression Models in the ...

    African Journals Online (AJOL)

    ARC-IRENE

    0.31 ± 0.021. 0.28 ± 0.004. aNo = number of records; Mean = unadjusted mean; s.d. = standard deviation .... A manual for. USA. Kirkp ., 1990. ... J. Math. Biol. 27, 429-450. atrick, M., Lofsvold, D. & Bulmer, M trajectories. Genetics 124, 979-993.

  1. Tumor regression patterns in retinoblastoma

    International Nuclear Information System (INIS)

    Zafar, S.N.; Siddique, S.N.; Zaheer, N.

    2016-01-01

    To observe the types of tumor regression after treatment, and identify the common pattern of regression in our patients. Study Design: Descriptive study. Place and Duration of Study: Department of Pediatric Ophthalmology and Strabismus, Al-Shifa Trust Eye Hospital, Rawalpindi, Pakistan, from October 2011 to October 2014. Methodology: Children with unilateral and bilateral retinoblastoma were included in the study. Patients were referred to Pakistan Institute of Medical Sciences, Islamabad, for chemotherapy. After every cycle of chemotherapy, dilated fundus examination under anesthesia was performed to record response of the treatment. Regression patterns were recorded on RetCam II. Results: Seventy-four tumors were included in the study. Out of 74 tumors, 3 were ICRB group A tumors, 43 were ICRB group B tumors, 14 tumors belonged to ICRB group C, and remaining 14 were ICRB group D tumors. Type IV regression was seen in 39.1% (n=29) tumors, type II in 29.7% (n=22), type III in 25.6% (n=19), and type I in 5.4% (n=4). All group A tumors (100%) showed type IV regression. Seventeen (39.5%) group B tumors showed type IV regression. In group C, 5 tumors (35.7%) showed type II regression and 5 tumors (35.7%) showed type IV regression. In group D, 6 tumors (42.9%) regressed to type II non-calcified remnants. Conclusion: The response and success of the focal and systemic treatment, as judged by the appearance of different patterns of tumor regression, varies with the ICRB grouping of the tumor. (author)

  2. Noninvasive spectral imaging of skin chromophores based on multiple regression analysis aided by Monte Carlo simulation

    Science.gov (United States)

    Nishidate, Izumi; Wiswadarma, Aditya; Hase, Yota; Tanaka, Noriyuki; Maeda, Takaaki; Niizeki, Kyuichi; Aizu, Yoshihisa

    2011-08-01

    In order to visualize melanin and blood concentrations and oxygen saturation in human skin tissue, a simple imaging technique based on multispectral diffuse reflectance images acquired at six wavelengths (500, 520, 540, 560, 580 and 600nm) was developed. The technique utilizes multiple regression analysis aided by Monte Carlo simulation for diffuse reflectance spectra. Using the absorbance spectrum as a response variable and the extinction coefficients of melanin, oxygenated hemoglobin, and deoxygenated hemoglobin as predictor variables, multiple regression analysis provides regression coefficients. Concentrations of melanin and total blood are then determined from the regression coefficients using conversion vectors that are deduced numerically in advance, while oxygen saturation is obtained directly from the regression coefficients. Experiments with a tissue-like agar gel phantom validated the method. In vivo experiments with human skin of the human hand during upper limb occlusion and of the inner forearm exposed to UV irradiation demonstrated the ability of the method to evaluate physiological reactions of human skin tissue.

  3. Probabilistic optimization of safety coefficients

    International Nuclear Information System (INIS)

    Marques, M.; Devictor, N.; Magistris, F. de

    1999-01-01

    This article describes a reliability-based method for the optimization of safety coefficients defined and used in design codes. The purpose of the optimization is to determine the partial safety coefficients which minimize an objective function for sets of components and loading situations covered by a design rule. This objective function is a sum of distances between the reliability of the components designed using the safety coefficients and a target reliability. The advantage of this method is shown on the examples of the reactor vessel, a vapour pipe and the safety injection circuit. (authors)

  4. Recursive least squares method of regression coefficients estimation as a special case of Kalman filter

    Science.gov (United States)

    Borodachev, S. M.

    2016-06-01

    A simple derivation of the recursive least squares (RLS) method equations is given as a special case of Kalman filter estimation of a constant system state under changing observation conditions. A numerical example illustrates the application of RLS to the multicollinearity problem.
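
    The connection stated here can be sketched in a few lines: a Kalman filter whose state (the regression coefficients) never changes reduces to the usual RLS recursion. The code below is a generic illustration, not the derivation or the numerical example from the record; the data, the unit observation noise variance r, and the large initial covariance standing in for a diffuse prior are assumptions.

```python
def rls_update(theta, P, h, y, r=1.0):
    """One Kalman measurement update for a constant state theta.

    theta: current coefficient estimate, P: its covariance (symmetric),
    h: regressor vector for this observation, y: observed response,
    r: observation noise variance (assumed known).
    """
    n = len(theta)
    Ph = [sum(P[i][j] * h[j] for j in range(n)) for i in range(n)]
    s = r + sum(h[i] * Ph[i] for i in range(n))        # innovation variance
    K = [v / s for v in Ph]                            # gain vector
    innovation = y - sum(h[i] * theta[i] for i in range(n))
    theta = [theta[i] + K[i] * innovation for i in range(n)]
    # P - K h^T P, using the symmetry of P so that h^T P equals (P h)^T.
    P = [[P[i][j] - K[i] * Ph[j] for j in range(n)] for i in range(n)]
    return theta, P

# Hypothetical data roughly following y = 1 + 2*x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]

theta = [0.0, 0.0]                   # prior mean for (intercept, slope)
P = [[1e6, 0.0], [0.0, 1e6]]         # large covariance: nearly uninformative prior
for xi, yi in zip(xs, ys):
    theta, P = rls_update(theta, P, [1.0, xi], yi)
```

    There is no prediction step because the state is constant, and with a nearly uninformative prior the final estimate agrees closely with the batch least-squares fit to the same five points.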

  5. Deriving proper uniform priors for regression coefficients, Parts I, II, and III

    NARCIS (Netherlands)

    van Erp, H.R.N.; Linger, R.O.; van Gelder, P.H.A.J.M.

    2017-01-01

    It is a relatively well-known fact that in problems of Bayesian model selection, improper priors should, in general, be avoided. In this paper we will derive and discuss a collection of four proper uniform priors which lie on an ascending scale of informativeness. It will turn out that these

  6. Non-linear Bayesian update of PCE coefficients

    KAUST Repository

    Litvinenko, Alexander

    2014-01-06

    Given: a physical system modeled by a PDE or ODE with uncertain coefficient q(ω), a measurement operator Y(u(q), q), where u(q, ω) is the uncertain solution. Aim: to identify q(ω). The mapping from parameters to observations is usually not invertible, hence this inverse identification problem is generally ill-posed. To identify q(ω) we derived a non-linear Bayesian update from the variational problem associated with conditional expectation. To reduce the cost of the Bayesian update we offer a functional approximation, e.g. polynomial chaos expansion (PCE). New: We apply the Bayesian update to the PCE coefficients of the random coefficient q(ω) (not to the probability density function of q).

  7. Non-linear Bayesian update of PCE coefficients

    KAUST Repository

    Litvinenko, Alexander; Matthies, Hermann G.; Pojonk, Oliver; Rosic, Bojana V.; Zander, Elmar

    2014-01-01

    Given: a physical system modeled by a PDE or ODE with uncertain coefficient q(ω), a measurement operator Y(u(q), q), where u(q, ω) is the uncertain solution. Aim: to identify q(ω). The mapping from parameters to observations is usually not invertible, hence this inverse identification problem is generally ill-posed. To identify q(ω) we derived a non-linear Bayesian update from the variational problem associated with conditional expectation. To reduce the cost of the Bayesian update we offer a functional approximation, e.g. polynomial chaos expansion (PCE). New: We apply the Bayesian update to the PCE coefficients of the random coefficient q(ω) (not to the probability density function of q).

  8. Regression to Causality : Regression-style presentation influences causal attribution

    DEFF Research Database (Denmark)

    Bordacconi, Mats Joe; Larsen, Martin Vinæs

    2014-01-01

    ...models – one of the primary vehicles for analyzing statistical results in political science – encourage causal interpretation. Specifically, we demonstrate that presenting observational results in a regression model, rather than as a simple comparison of means, makes causal interpretation of the results more likely. Our experiment drew on a sample of 235 university students from three different social science degree programs (political science, sociology and economics), all of whom had received substantial training in statistics. The subjects were asked to compare and evaluate the validity of equivalent results presented as either regression models or as a test of two sample means. Our experiment shows that the subjects who were presented with results as estimates from a regression model were more inclined to interpret these results causally. Our experiment implies that scholars using regression...

  9. Regression analysis with categorized regression calibrated exposure: some interesting findings

    Directory of Open Access Journals (Sweden)

    Hjartåker Anette

    2006-07-01

    Full Text Available Abstract. Background: Regression calibration as a method for handling measurement error is becoming increasingly well-known and used in epidemiologic research. However, the standard version of the method is not appropriate for exposure analyzed on a categorical (e.g. quintile) scale, an approach commonly used in epidemiologic studies. A tempting solution could then be to use the predicted continuous exposure obtained through the regression calibration method and treat it as an approximation to the true exposure, that is, include the categorized calibrated exposure in the main regression analysis. Methods: We use semi-analytical calculations and simulations to evaluate the performance of the proposed approach compared to the naive approach of not correcting for measurement error, in situations where analyses are performed on quintile scale and when incorporating the original scale into the categorical variables, respectively. We also present analyses of real data, containing measures of folate intake and depression, from the Norwegian Women and Cancer study (NOWAC). Results: In cases where extra information is available through replicated measurements and not validation data, regression calibration does not maintain important qualities of the true exposure distribution, thus estimates of variance and percentiles can be severely biased. We show that the outlined approach maintains much, in some cases all, of the misclassification found in the observed exposure. For that reason, regression analysis with the corrected variable included on a categorical scale is still biased. In some cases the corrected estimates are analytically equal to those obtained by the naive approach. Regression calibration is however vastly superior to the naive method when applying the medians of each category in the analysis. Conclusion: Regression calibration in its most well-known form is not appropriate for measurement error correction when the exposure is analyzed on a

  10. The Crash Intensity Evaluation Using General Centrality Criterions and a Geographically Weighted Regression

    Science.gov (United States)

    Ghadiriyan Arani, M.; Pahlavani, P.; Effati, M.; Noori Alamooti, F.

    2017-09-01

    Today, one of the social problems affecting the lives of many people is road traffic crashes, especially those on highways. This paper focuses on the highways of Atlanta, the capital and most populous city of the U.S. state of Georgia and the ninth largest metropolitan area in the United States. Geographically weighted regression and general centrality criteria are the tools used in this article. In the first step, to estimate crash intensity, the dual graph is extracted from the street and highway network so that general centrality criteria can be computed. The criteria computed on this graph are: Degree, PageRank, Random walk, Eccentricity, Closeness, Betweenness, Clustering coefficient, Eigenvector, and Straightness. The crash intensity of each highway is calculated by dividing the number of crashes on that highway by the total number of crashes. Then, the criteria and crash intensities were normalized and the correlation between them was calculated to determine the criteria that are not dependent on each other. The proposed hybrid approach is well suited to regression problems because these effective measures lead to a more desirable output. The R² value for geographically weighted regression was 0.539 using the Gaussian kernel and 0.684 using a triple-core cube kernel. The results showed that the triple-core cube kernel is better for modeling the crash intensity.
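
    The two computations named in this abstract, the crash-intensity ratio and a Gaussian-kernel geographically weighted fit, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the kernel form exp(-0.5 (d/h)^2), and the bandwidth parameter are assumptions, and the centrality criteria are taken as already computed columns of X.

```python
import numpy as np

def crash_intensity(crash_counts):
    """Share of all crashes that occurred on each highway (as described in the abstract)."""
    counts = np.asarray(crash_counts, dtype=float)
    total = counts.sum()
    if total == 0:
        raise ValueError("total number of crashes must be positive")
    return counts / total

def gaussian_kernel_weights(distances, bandwidth):
    """Assumed Gaussian kernel: weight decays with distance from the regression point."""
    d = np.asarray(distances, dtype=float)
    return np.exp(-0.5 * (d / bandwidth) ** 2)

def gwr_fit_at_point(X, y, distances, bandwidth):
    """Weighted least squares at one location; returns [intercept, slopes...]."""
    y = np.asarray(y, dtype=float)
    w = gaussian_kernel_weights(distances, bandwidth)
    design = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    sw = np.sqrt(w)  # scaling rows by sqrt(w) turns WLS into ordinary least squares
    beta, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
    return beta
```

    A full geographically weighted regression repeats gwr_fit_at_point once per location, each time with the distances measured from that location.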

  11. THE CRASH INTENSITY EVALUATION USING GENERAL CENTRALITY CRITERIONS AND A GEOGRAPHICALLY WEIGHTED REGRESSION

    Directory of Open Access Journals (Sweden)

    M. Ghadiriyan Arani

    2017-09-01

    Full Text Available Today, one of the social problems affecting the lives of many people is road traffic crashes, especially those on highways. This paper focuses on the highways of Atlanta, the capital and most populous city of the U.S. state of Georgia and the ninth largest metropolitan area in the United States. Geographically weighted regression and general centrality criteria are the tools used in this article. In the first step, to estimate crash intensity, the dual graph is extracted from the street and highway network so that general centrality criteria can be computed. The criteria computed on this graph are: Degree, PageRank, Random walk, Eccentricity, Closeness, Betweenness, Clustering coefficient, Eigenvector, and Straightness. The crash intensity of each highway is calculated by dividing the number of crashes on that highway by the total number of crashes. Then, the criteria and crash intensities were normalized and the correlation between them was calculated to determine the criteria that are not dependent on each other. The proposed hybrid approach is well suited to regression problems because these effective measures lead to a more desirable output. The R² value for geographically weighted regression was 0.539 using the Gaussian kernel and 0.684 using a triple-core cube kernel. The results showed that the triple-core cube kernel is better for modeling the crash intensity.

  12. Photon mass attenuation coefficients, effective atomic numbers and ...

    Indian Academy of Sciences (India)

    of atomic number Z was performed using the logarithmic regression analysis of the data measured by the authors and reported earlier. The best-fit coefficients so obtained in the photon ..... This photon build-up is a function of thickness and atomic number of the sample and also the incident photon energy, which combine to ...

  13. On the misinterpretation of the correlation coefficient in pharmaceutical sciences

    DEFF Research Database (Denmark)

    Sonnergaard, Jørn

    2006-01-01

    The correlation coefficient is often used and more often misused as a universal parameter expressing the quality in linear regression analysis. The popularity of this dimensionless quantity is evident as it is easy to communicate and considered to be unproblematic to comprehend. However, illustra...

  14. Logic regression and its extensions.

    Science.gov (United States)

    Schwender, Holger; Ruczinski, Ingo

    2010-01-01

    Logic regression is an adaptive classification and regression procedure, initially developed to reveal interacting single nucleotide polymorphisms (SNPs) in genetic association studies. In general, this approach can be used in any setting with binary predictors, when the interaction of these covariates is of primary interest. Logic regression searches for Boolean (logic) combinations of binary variables that best explain the variability in the outcome variable, and thus, reveals variables and interactions that are associated with the response and/or have predictive capabilities. The logic expressions are embedded in a generalized linear regression framework, and thus, logic regression can handle a variety of outcome types, such as binary responses in case-control studies, numeric responses, and time-to-event data. In this chapter, we provide an introduction to the logic regression methodology, list some applications in public health and medicine, and summarize some of the direct extensions and modifications of logic regression that have been proposed in the literature. Copyright © 2010 Elsevier Inc. All rights reserved.

  15. Analysis of Palm Oil Production, Export, and Government Consumption to Gross Domestic Product of Five Districts in West Kalimantan by Panel Regression

    Science.gov (United States)

    Sulistianingsih, E.; Kiftiah, M.; Rosadi, D.; Wahyuni, H.

    2017-04-01

    Gross Domestic Product (GDP) is an indicator of economic growth in a region. GDP data are panel data, which consist of cross-section and time series data. Panel regression is a tool which can be utilised to analyse panel data. There are three models in panel regression, namely the Common Effect Model (CEM), Fixed Effect Model (FEM) and Random Effect Model (REM). The model is chosen based on the results of the Chow Test, Hausman Test and Lagrange Multiplier Test. This research uses panel regression to analyse the effect of palm oil production, export, and government consumption on the GDP of five districts in West Kalimantan, namely Sanggau, Sintang, Sambas, Ketapang and Bengkayang. Based on the results of the analyses, it is concluded that REM, whose adjusted coefficient of determination is 0.823, is the best model in this case. Also, according to the results, only Export and Government Consumption influence the GDP of the districts.

  16. Quadrature formulas for Fourier coefficients

    KAUST Repository

    Bojanov, Borislav; Petrova, Guergana

    2009-01-01

    We consider quadrature formulas of high degree of precision for the computation of the Fourier coefficients in expansions of functions with respect to a system of orthogonal polynomials. In particular, we show the uniqueness of a multiple node

  17. Diffusion coefficient for anomalous transport

    International Nuclear Information System (INIS)

    1986-01-01

    A report on the progress towards the goal of estimating the diffusion coefficient for anomalous transport is given. The gyrokinetic theory is used to identify different time and length scales inherent to the characteristics of plasmas which exhibit anomalous transport.

  18. Fuel Temperature Coefficient of Reactivity

    Energy Technology Data Exchange (ETDEWEB)

    Loewe, W.E.

    2001-07-31

    A method for measuring the fuel temperature coefficient of reactivity in a heterogeneous nuclear reactor is presented. The method, which is used during normal operation, requires that calibrated control rods be oscillated in a special way at a high reactor power level. The value of the fuel temperature coefficient of reactivity is found from the measured flux responses to these oscillations. Application of the method in a Savannah River reactor charged with natural uranium is discussed.

  19. Properties of Traffic Risk Coefficient

    Science.gov (United States)

    Tang, Tie-Qiao; Huang, Hai-Jun; Shang, Hua-Yan; Xue, Yu

    2009-10-01

    We use the model with the consideration of the traffic interruption probability (Physica A 387(2008)6845) to study the relationship between the traffic risk coefficient and the traffic interruption probability. The analytical and numerical results show that the traffic interruption probability will reduce the traffic risk coefficient and that the reduction is related to the density, which shows that this model can improve traffic security.

  20. BANK FAILURE PREDICTION WITH LOGISTIC REGRESSION

    Directory of Open Access Journals (Sweden)

    Taha Zaghdoudi

    2013-04-01

    Full Text Available In recent years the economic and financial world has been shaken by a wave of financial crises that inflicted fairly huge losses on banks. Several authors have focused on the study of the crises in order to develop an early warning model. Our work takes its inspiration from the same path. Indeed, we have tried to develop a predictive model of Tunisian bank failures using the binary logistic regression method. The specificity of our prediction model is that it takes into account microeconomic indicators of bank failures. The results obtained using our provisional model show that a bank's ability to repay its debt, the coefficient of banking operations, bank profitability per employee and the financial leverage ratio have a negative impact on the probability of failure.

  1. Systematic Risk on Istanbul Stock Exchange: Traditional Beta Coefficient Versus Downside Beta Coefficient

    Directory of Open Access Journals (Sweden)

    Gülfen TUNA

    2013-03-01

    Full Text Available The aim of this study is to test the validity of the Downside Capital Asset Pricing Model (D-CAPM) on the ISE. At the same time, the explanatory power of CAPM's traditional beta and D-CAPM's downside beta for changes in average returns is compared. For this purpose, monthly data for seventy-three stocks that were continuously traded on the ISE over the period 1991-2009 are used, and regression analysis is applied. The results show that D-CAPM is valid on the ISE. In addition, the downside beta coefficient explains return changes better than the traditional beta coefficient. Therefore, it can be said that the downside beta is superior to the traditional beta on the ISE for the chosen period.
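
    The two coefficients compared in this abstract can be sketched as follows. This is a minimal illustration under an assumption the abstract does not state: downside beta is taken in its common mean-benchmark form, the co-semivariance of asset and market returns divided by the semivariance of market returns. The function names are hypothetical.

```python
import numpy as np

def traditional_beta(asset_returns, market_returns):
    """CAPM beta: covariance of asset and market over variance of market."""
    a = np.asarray(asset_returns, dtype=float)
    m = np.asarray(market_returns, dtype=float)
    da, dm = a - a.mean(), m - m.mean()
    return (da * dm).mean() / (dm ** 2).mean()

def downside_beta(asset_returns, market_returns):
    """Assumed D-CAPM beta: only below-mean deviations contribute."""
    a = np.asarray(asset_returns, dtype=float)
    m = np.asarray(market_returns, dtype=float)
    da = np.minimum(a - a.mean(), 0.0)  # keep shortfalls, zero out gains
    dm = np.minimum(m - m.mean(), 0.0)
    return (da * dm).mean() / (dm ** 2).mean()
```

    When asset returns are an exact multiple of market returns the two measures coincide; they diverge when an asset tracks the market differently in falling and rising periods.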

  2. Abstract Expression Grammar Symbolic Regression

    Science.gov (United States)

    Korns, Michael F.

    This chapter examines the use of Abstract Expression Grammars to perform the entire Symbolic Regression process without the use of Genetic Programming per se. The techniques explored produce a symbolic regression engine which has absolutely no bloat, which allows total user control of the search space and output formulas, and which is faster and more accurate than the engines produced in our previous papers using Genetic Programming. The genome is an all vector structure with four chromosomes plus additional epigenetic and constraint vectors, allowing total user control of the search space and the final output formulas. A combination of specialized compiler techniques, genetic algorithms, particle swarm, aged layered populations, plus discrete and continuous differential evolution are used to produce an improved symbolic regression system. Nine base test cases, from the literature, are used to test the improvement in speed and accuracy. The improved results indicate that these techniques move us a big step closer toward future industrial strength symbolic regression systems.

  3. Quantile Regression With Measurement Error

    KAUST Repository

    Wei, Ying; Carroll, Raymond J.

    2009-01-01

    . The finite sample performance of the proposed method is investigated in a simulation study, and compared to the standard regression calibration approach. Finally, we apply our methodology to part of the National Collaborative Perinatal Project growth data, a

  4. Linear and logistic regression analysis

    NARCIS (Netherlands)

    Tripepi, G.; Jager, K. J.; Dekker, F. W.; Zoccali, C.

    2008-01-01

    In previous articles of this series, we focused on relative risks and odds ratios as measures of effect to assess the relationship between exposure to risk factors and clinical outcomes and on control for confounding. In randomized clinical trials, the random allocation of patients is hoped to

  5. From Rasch scores to regression

    DEFF Research Database (Denmark)

    Christensen, Karl Bang

    2006-01-01

    Rasch models provide a framework for measurement and modelling latent variables. Having measured a latent variable in a population a comparison of groups will often be of interest. For this purpose the use of observed raw scores will often be inadequate because these lack interval scale propertie....... This paper compares two approaches to group comparison: linear regression models using estimated person locations as outcome variables and latent regression models based on the distribution of the score....

  6. Testing Heteroscedasticity in Robust Regression

    Czech Academy of Sciences Publication Activity Database

    Kalina, Jan

    2011-01-01

    Vol. 1, No. 4 (2011), pp. 25-28. ISSN 2045-3345. Grant - others: GA ČR (CZ) GA402/09/0557. Institutional research plan: CEZ:AV0Z10300504. Keywords: robust regression; heteroscedasticity; regression quantiles; diagnostics. Subject RIV: BB - Applied Statistics, Operational Research. http://www.researchjournals.co.uk/documents/Vol4/06%20Kalina.pdf

  7. Regression methods for medical research

    CERN Document Server

    Tai, Bee Choo

    2013-01-01

    Regression Methods for Medical Research provides medical researchers with the skills they need to critically read and interpret research using more advanced statistical methods. The statistical requirements of interpreting and publishing in medical journals, together with rapid changes in science and technology, increasingly demand an understanding of more complex and sophisticated analytic procedures. The text explains the application of statistical models to a wide variety of practical medical investigative studies and clinical trials. Regression methods are used to appropriately answer the

  8. Forecasting with Dynamic Regression Models

    CERN Document Server

    Pankratz, Alan

    2012-01-01

    One of the most widely used tools in statistical forecasting, the single-equation regression model, is examined here. A companion to the author's earlier work, Forecasting with Univariate Box-Jenkins Models: Concepts and Cases, the present text pulls together recent time series ideas and gives special attention to possible intertemporal patterns, distributed lag responses of output to input series and the autocorrelation patterns of regression disturbance. It also includes six case studies.

  9. Clustering Coefficients for Correlation Networks.

    Science.gov (United States)

    Masuda, Naoki; Sakaki, Michiko; Ezaki, Takahiro; Watanabe, Takamitsu

    2018-01-01

    Graph theory is a useful tool for deciphering structural and functional networks of the brain on various spatial and temporal scales. The clustering coefficient quantifies the abundance of connected triangles in a network and is a major descriptive statistics of networks. For example, it finds an application in the assessment of small-worldness of brain networks, which is affected by attentional and cognitive conditions, age, psychiatric disorders and so forth. However, it remains unclear how the clustering coefficient should be measured in a correlation-based network, which is among major representations of brain networks. In the present article, we propose clustering coefficients tailored to correlation matrices. The key idea is to use three-way partial correlation or partial mutual information to measure the strength of the association between the two neighboring nodes of a focal node relative to the amount of pseudo-correlation expected from indirect paths between the nodes. Our method avoids the difficulties of previous applications of clustering coefficient (and other) measures in defining correlational networks, i.e., thresholding on the correlation value, discarding of negative correlation values, the pseudo-correlation problem and full partial correlation matrices whose estimation is computationally difficult. For proof of concept, we apply the proposed clustering coefficient measures to functional magnetic resonance imaging data obtained from healthy participants of various ages and compare them with conventional clustering coefficients. We show that the clustering coefficients decline with the age. The proposed clustering coefficients are more strongly correlated with age than the conventional ones are. We also show that the local variants of the proposed clustering coefficients (i.e., abundance of triangles around a focal node) are useful in characterizing individual nodes. In contrast, the conventional local clustering coefficients were strongly

  10. Clustering Coefficients for Correlation Networks

    Directory of Open Access Journals (Sweden)

    Naoki Masuda

    2018-03-01

    Full Text Available Graph theory is a useful tool for deciphering structural and functional networks of the brain on various spatial and temporal scales. The clustering coefficient quantifies the abundance of connected triangles in a network and is a major descriptive statistics of networks. For example, it finds an application in the assessment of small-worldness of brain networks, which is affected by attentional and cognitive conditions, age, psychiatric disorders and so forth. However, it remains unclear how the clustering coefficient should be measured in a correlation-based network, which is among major representations of brain networks. In the present article, we propose clustering coefficients tailored to correlation matrices. The key idea is to use three-way partial correlation or partial mutual information to measure the strength of the association between the two neighboring nodes of a focal node relative to the amount of pseudo-correlation expected from indirect paths between the nodes. Our method avoids the difficulties of previous applications of clustering coefficient (and other) measures in defining correlational networks, i.e., thresholding on the correlation value, discarding of negative correlation values, the pseudo-correlation problem and full partial correlation matrices whose estimation is computationally difficult. For proof of concept, we apply the proposed clustering coefficient measures to functional magnetic resonance imaging data obtained from healthy participants of various ages and compare them with conventional clustering coefficients. We show that the clustering coefficients decline with the age. The proposed clustering coefficients are more strongly correlated with age than the conventional ones are. We also show that the local variants of the proposed clustering coefficients (i.e., abundance of triangles around a focal node) are useful in characterizing individual nodes. In contrast, the conventional local clustering coefficients

  11. Clustering Coefficients for Correlation Networks

    Science.gov (United States)

    Masuda, Naoki; Sakaki, Michiko; Ezaki, Takahiro; Watanabe, Takamitsu

    2018-01-01

    Graph theory is a useful tool for deciphering structural and functional networks of the brain on various spatial and temporal scales. The clustering coefficient quantifies the abundance of connected triangles in a network and is a major descriptive statistics of networks. For example, it finds an application in the assessment of small-worldness of brain networks, which is affected by attentional and cognitive conditions, age, psychiatric disorders and so forth. However, it remains unclear how the clustering coefficient should be measured in a correlation-based network, which is among major representations of brain networks. In the present article, we propose clustering coefficients tailored to correlation matrices. The key idea is to use three-way partial correlation or partial mutual information to measure the strength of the association between the two neighboring nodes of a focal node relative to the amount of pseudo-correlation expected from indirect paths between the nodes. Our method avoids the difficulties of previous applications of clustering coefficient (and other) measures in defining correlational networks, i.e., thresholding on the correlation value, discarding of negative correlation values, the pseudo-correlation problem and full partial correlation matrices whose estimation is computationally difficult. For proof of concept, we apply the proposed clustering coefficient measures to functional magnetic resonance imaging data obtained from healthy participants of various ages and compare them with conventional clustering coefficients. We show that the clustering coefficients decline with the age. The proposed clustering coefficients are more strongly correlated with age than the conventional ones are. We also show that the local variants of the proposed clustering coefficients (i.e., abundance of triangles around a focal node) are useful in characterizing individual nodes. In contrast, the conventional local clustering coefficients were strongly

  12. Association between response rates and survival outcomes in patients with newly diagnosed multiple myeloma. A systematic review and meta-regression analysis.

    Science.gov (United States)

    Mainou, Maria; Madenidou, Anastasia-Vasiliki; Liakos, Aris; Paschos, Paschalis; Karagiannis, Thomas; Bekiari, Eleni; Vlachaki, Efthymia; Wang, Zhen; Murad, Mohammad Hassan; Kumar, Shaji; Tsapas, Apostolos

    2017-06-01

    We performed a systematic review and meta-regression analysis of randomized control trials to investigate the association between response to initial treatment and survival outcomes in patients with newly diagnosed multiple myeloma (MM). Response outcomes included complete response (CR) and the combined outcome of CR or very good partial response (VGPR), while survival outcomes were overall survival (OS) and progression-free survival (PFS). We used random-effect meta-regression models and conducted sensitivity analyses based on definition of CR and study quality. Seventy-two trials were included in the systematic review, 63 of which contributed data in meta-regression analyses. There was no association between OS and CR in patients without autologous stem cell transplant (ASCT) (regression coefficient: .02, 95% confidence interval [CI] -0.06, 0.10), in patients undergoing ASCT (-.11, 95% CI -0.44, 0.22) and in trials comparing ASCT with non-ASCT patients (.04, 95% CI -0.29, 0.38). Similarly, OS did not correlate with the combined metric of CR or VGPR, and no association was evident between response outcomes and PFS. Sensitivity analyses yielded similar results. This meta-regression analysis suggests that there is no association between conventional response outcomes and survival in patients with newly diagnosed MM. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  13. Regression analysis of sparse asynchronous longitudinal data.

    Science.gov (United States)

    Cao, Hongyuan; Zeng, Donglin; Fine, Jason P

    2015-09-01

    We consider estimation of regression models for sparse asynchronous longitudinal observations, where time-dependent responses and covariates are observed intermittently within subjects. Unlike with synchronous data, where the response and covariates are observed at the same time point, with asynchronous data, the observation times are mismatched. Simple kernel-weighted estimating equations are proposed for generalized linear models with either time invariant or time-dependent coefficients under smoothness assumptions for the covariate processes which are similar to those for synchronous data. For models with either time invariant or time-dependent coefficients, the estimators are consistent and asymptotically normal but converge at slower rates than those achieved with synchronous data. Simulation studies evidence that the methods perform well with realistic sample sizes and may be superior to a naive application of methods for synchronous data based on an ad hoc last value carried forward approach. The practical utility of the methods is illustrated on data from a study on human immunodeficiency virus.

  14. Relationships between the structure of wheat gluten and ACE inhibitory activity of hydrolysate: stepwise multiple linear regression analysis.

    Science.gov (United States)

    Zhang, Yanyan; Ma, Haile; Wang, Bei; Qu, Wenjuan; Wali, Asif; Zhou, Cunshan

    2016-08-01

    Ultrasound pretreatment of wheat gluten (WG) before enzymolysis can improve the angiotensin converting enzyme (ACE) inhibitory activity of the hydrolysates by altering the structure of substrate proteins. Establishment of a relationship between the structure of WG and ACE inhibitory activity of the hydrolysates to judge the end point of the ultrasonic pretreatment is vital. The results of stepwise multiple linear regression (MLR) showed that the contents of free sulfhydryl, α-helix, disulfide bond, surface hydrophobicity and random coil were significantly correlated to ACE inhibitory activity of the hydrolysate, with standard partial regression coefficients of 3.729, -0.676, -0.252, 0.022 and 0.156, respectively. The R² of this model was 0.970. External validation showed that the stepwise MLR model could well predict the ACE inhibitory activity of hydrolysate based on the content of free sulfhydryl, α-helix, disulfide bond, surface hydrophobicity and random coil of WG before hydrolysis. A stepwise multiple linear regression model describing the quantitative relationships between the structure of WG and the ACE inhibitory activity of the hydrolysates was established. This model can be used to predict the endpoint of the ultrasonic pretreatment. © 2015 Society of Chemical Industry.
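
    The standard (standardized) partial regression coefficients reported in this abstract can be obtained by fitting least squares on z-scored variables. The sketch below shows only that step, under the assumption that the predictors have already been selected; it does not implement the stepwise selection procedure, and the function name is hypothetical.

```python
import numpy as np

def standardized_coefficients(X, y):
    """Least-squares coefficients after z-scoring every predictor and the response."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    yz = (y - y.mean()) / y.std(ddof=1)
    # No intercept column is needed: z-scored variables all have mean zero.
    beta, *_ = np.linalg.lstsq(Xz, yz, rcond=None)
    return beta
```

    With a single predictor the standardized coefficient equals the Pearson correlation between predictor and response, which gives a quick sanity check. With several correlated predictors the coefficients can exceed 1 in absolute value.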

  15. Path coefficient analysis of zinc dynamics in varying soil environment

    International Nuclear Information System (INIS)

    Rattan, R.K.; Phung, C.V.; Singhal, S.K.; Deb, D.L.; Singh, A.K.

    1994-01-01

    Influence of soil properties on labile zinc, as measured by diethylene-triamine pentaacetic acid (DTPA) and zinc-65, and self-diffusion coefficients of zinc was assessed on 22 surface soil samples varying widely in their characteristics following linear regression and path coefficient analysis techniques. DTPA extractable zinc could be predicted from organic carbon status and pH of the soil with a highly significant coefficient of determination (R² = 0.84**). Ninety-seven per cent of the variation in isotopically exchangeable zinc was explained by pH, clay content and cation exchange capacity (CEC) of soil. The self-diffusion coefficients (DaZn and DpZn) and buffer power of zinc exhibited exponential relationship with soil properties, pH being the most dominant one. Soil properties like organic matter, clay content etc. exhibited indirect effects on zinc diffusion rates via pH only. (author). 13 refs., 6 tabs

  16. Power coefficient anomaly in JOYO

    Energy Technology Data Exchange (ETDEWEB)

    Yamamoto, H

    1980-12-15

    Operation of the JOYO experimental fast reactor with the MK-I core has been divided into two phases: (1) 50 MWt power ascension and operation; and (2) 75 MWt power ascension and operation. The 50 MWt power-up tests were conducted in August 1978. In these tests, the measured reactivity loss due to power increases from 15 MWt to 50 MWt was 0.28% ΔK/K, and agreed well with the predicted value of 0.27% ΔK/K. The 75 MWt power ascension tests were conducted in July-August 1979. In the process of the first power increase above 50 MWt to 65 MWt conducted on July 11, 1979, an anomalously large negative power coefficient was observed. The value was about twice the power coefficient values measured in the tests below 50 MWt. In order to reproduce the anomaly, the reactor power was decreased and again increased up to the maximum power of 65 MWt. However, the large negative power coefficient was not observed at this time. In the succeeding power increase from 65 MWt to 75 MWt, a similar anomalous power coefficient was again observed. This anomaly disappeared in the subsequent power ascensions to 75 MWt, and the magnitude of the power coefficient gradually decreased with power cycles above the 50 MWt level.

  17. An improved multiple linear regression and data analysis computer program package

    Science.gov (United States)

    Sidik, S. M.

    1972-01-01

    NEWRAP, an improved version of a previous multiple linear regression program called RAPIER, CREDUC, and CRSPLT, allows for a complete regression analysis including cross plots of the independent and dependent variables, correlation coefficients, regression coefficients, analysis of variance tables, t-statistics and their probability levels, rejection of independent variables, plots of residuals against the independent and dependent variables, and a canonical reduction of quadratic response functions useful in optimum seeking experimentation. A major improvement over RAPIER is that all regression calculations are done in double precision arithmetic.

  18. Semisupervised Clustering by Iterative Partition and Regression with Neuroscience Applications

    Directory of Open Access Journals (Sweden)

    Guoqi Qian

    2016-01-01

    Full Text Available Regression clustering is a mixture of unsupervised and supervised statistical learning and data mining method which is found in a wide range of applications including artificial intelligence and neuroscience. It performs unsupervised learning when it clusters the data according to their respective unobserved regression hyperplanes. The method also performs supervised learning when it fits regression hyperplanes to the corresponding data clusters. Applying regression clustering in practice requires means of determining the underlying number of clusters in the data, finding the cluster label of each data point, and estimating the regression coefficients of the model. In this paper, we review the estimation and selection issues in regression clustering with regard to the least squares and robust statistical methods. We also provide a model selection based technique to determine the number of regression clusters underlying the data. We further develop a computing procedure for regression clustering estimation and selection. Finally, simulation studies are presented for assessing the procedure, together with analyzing a real data set on RGB cell marking in neuroscience to illustrate and interpret the method.

  19. Analysis of internal conversion coefficients

    International Nuclear Information System (INIS)

    Coursol, N.; Gorozhankin, V.M.; Yakushev, E.A.; Briancon, C.; Vylov, Ts.

    2000-01-01

    An extensive database has been assembled that contains the three most widely used sets of calculated internal conversion coefficients (ICC): [Hager R.S., Seltzer E.C., 1968. Internal conversion tables. K-, L-, M-shell Conversion coefficients for Z=30 to Z=103, Nucl. Data Tables A4, 1-237; Band I.M., Trzhaskovskaya M.B., 1978. Tables of gamma-ray internal conversion coefficients for the K-, L- and M-shells, 10≤Z≤104, Special Report of Leningrad Nuclear Physics Institute; Roesel F., Fries H.M., Alder K., Pauli H.C., 1978. Internal conversion coefficients for all atomic shells, At. Data Nucl. Data Tables 21, 91-289] and also includes new Dirac-Fock calculations [Band I.M. and Trzhaskovskaya M.B., 1993. Internal conversion coefficients for low-energy nuclear transitions, At. Data Nucl. Data Tables 55, 43-61]. This database is linked to a computer program to plot ICCs and their combinations (sums and ratios) as a function of Z and energy, as well as relative deviations of ICC or their combinations for any pair of tabulated data. Examples of these analyses are presented for the K-shell and total ICCs of the gamma-ray standards [Hansen H.H., 1985. Evaluation of K-shell and total internal conversion coefficients for some selected nuclear transitions, Eur. Appl. Res. Rept. Nucl. Sci. Tech. 11.6 (4) 777-816] and for the K-shell and total ICCs of high multipolarity transitions (total, K-, L-, M-shells of E3 and M3 and K-shell of M4). Experimental data sets are also compared with the theoretical values of these specific calculations

  20. Producing The New Regressive Left

    DEFF Research Database (Denmark)

    Crone, Christine

    members, this thesis investigates a growing political trend and ideological discourse in the Arab world that I have called The New Regressive Left. On the premise that a media outlet can function as a forum for ideology production, the thesis argues that an analysis of this material can help to trace...... the contexture of The New Regressive Left. If the first part of the thesis lays out the theoretical approach and draws the contextual framework, through an exploration of the surrounding Arab media-and ideoscapes, the second part is an analytical investigation of the discourse that permeates the programmes aired...... becomes clear from the analytical chapters is the emergence of the new cross-ideological alliance of The New Regressive Left. This emerging coalition between Shia Muslims, religious minorities, parts of the Arab Left, secular cultural producers, and the remnants of the political,strategic resistance...

  1. Research and analyze of physical health using multiple regression analysis

    Directory of Open Access Journals (Sweden)

    T. S. Kyi

    2014-01-01

    This paper presents research that attempts to build a mathematical model of "healthy people" using regression analysis. The factors are a person's physical parameters (such as heart rate, lung capacity, blood pressure, breath holding, weight-height coefficient, flexibility of the spine, muscles of the shoulder belt, abdominal muscles, squatting, etc.), and the response variable is an indicator of physical working capacity. Multiple regression analysis yielded useful regression models that can predict the physical performance of boys aged fourteen to seventeen years. The paper presents the development of the regression model for sixteen-year-old boys and analyzes the results.

  2. A Matlab program for stepwise regression

    Directory of Open Access Journals (Sweden)

    Yanhong Qi

    2016-03-01

    Stepwise linear regression is a multivariable regression method for identifying statistically significant variables in a linear regression equation. In the present study, we present a Matlab program for stepwise regression.

  3. Regression filter for signal resolution

    International Nuclear Information System (INIS)

    Matthes, W.

    1975-01-01

    The problem considered is that of resolving a measured pulse height spectrum of a material mixture, e.g. gamma ray spectrum, Raman spectrum, into a weighted sum of the spectra of the individual constituents. The model on which the analytical formulation is based is described. The problem reduces to that of a multiple linear regression. A stepwise linear regression procedure was constructed. The efficiency of this method was then tested by implementing the procedure as a computer programme, which was used to unfold test spectra obtained by mixing some spectra from a library of arbitrarily chosen spectra and adding a noise component. (U.K.)
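
    The reduction of spectrum unfolding to multiple linear regression can be sketched as follows. This is a minimal illustration, not the programme described in the record: it handles only two constituents, solves the normal equations directly, and omits the stepwise selection step. The function name `unmix_two` and the toy spectra are assumptions made up for the example.

```python
def unmix_two(spectrum, s1, s2):
    """Least-squares weights (w1, w2) such that spectrum ~ w1*s1 + w2*s2.

    Solves the 2x2 normal equations of the regression of the measured
    spectrum on the two library spectra (no intercept term).
    """
    a11 = sum(a * a for a in s1)
    a12 = sum(a * b for a, b in zip(s1, s2))
    a22 = sum(b * b for b in s2)
    b1 = sum(a * y for a, y in zip(s1, spectrum))
    b2 = sum(b * y for b, y in zip(s2, spectrum))
    det = a11 * a22 - a12 * a12
    if det == 0:
        raise ValueError("library spectra are collinear")
    w1 = (b1 * a22 - b2 * a12) / det
    w2 = (a11 * b2 - a12 * b1) / det
    return w1, w2


# Hypothetical library spectra (counts per channel) and a noise-free mixture.
s1 = [1.0, 4.0, 2.0, 0.0, 0.0]
s2 = [0.0, 1.0, 3.0, 5.0, 1.0]
mixture = [0.7 * a + 0.3 * b for a, b in zip(s1, s2)]

w1, w2 = unmix_two(mixture, s1, s2)
print(w1, w2)
```

    With a noise component added to `mixture`, the recovered weights would only approximate the mixing proportions.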

  4. Nonparametric Mixture of Regression Models.

    Science.gov (United States)

    Huang, Mian; Li, Runze; Wang, Shaoli

    2013-07-01

    Motivated by an analysis of US house price index data, we propose nonparametric finite mixture of regression models. We study the identifiability issue of the proposed models, and develop an estimation procedure by employing kernel regression. We further systematically study the sampling properties of the proposed estimators, and establish their asymptotic normality. A modified EM algorithm is proposed to carry out the estimation procedure. We show that our algorithm preserves the ascent property of the EM algorithm in an asymptotic sense. Monte Carlo simulations are conducted to examine the finite sample performance of the proposed estimation procedure. An empirical analysis of the US house price index data is illustrated for the proposed methodology.

  5. Parameter Selection Method for Support Vector Regression Based on Adaptive Fusion of the Mixed Kernel Function

    Directory of Open Access Journals (Sweden)

    Hailun Wang

    2017-01-01

    The support vector regression algorithm is widely used in fault diagnosis of rolling bearings. A new model parameter selection method for support vector regression based on adaptive fusion of the mixed kernel function is proposed in this paper. We choose the mixed kernel function as the kernel function of support vector regression. The fusion coefficients of the mixed kernel function, the kernel function parameters, and the regression parameters are combined together as the parameters of the state vector. Thus, the model selection problem is transformed into a nonlinear system state estimation problem. We use a 5th-degree cubature Kalman filter to estimate the parameters. In this way, we realize the adaptive selection of the mixed kernel function weighted coefficients, the kernel parameters, and the regression parameters. Compared with a single kernel function, unscented Kalman filter (UKF) support vector regression algorithms, and genetic algorithms, the decision regression function obtained by the proposed method has better generalization ability and higher prediction accuracy.

  6. Irrational "Coefficients" in Renaissance Algebra.

    Science.gov (United States)

    Oaks, Jeffrey A

    2017-06-01

    Argument From the time of al-Khwārizmī in the ninth century to the beginning of the sixteenth century algebraists did not allow irrational numbers to serve as coefficients. To multiply by x, for instance, the result was expressed as the rhetorical equivalent of . The reason for this practice has to do with the premodern concept of a monomial. The coefficient, or "number," of a term was thought of as how many of that term are present, and not as the scalar multiple that we work with today. Then, in sixteenth-century Europe, a few algebraists began to allow for irrational coefficients in their notation. Christoff Rudolff (1525) was the first to admit them in special cases, and subsequently they appear more liberally in Cardano (1539), Scheubel (1550), Bombelli (1572), and others, though most algebraists continued to ban them. We survey this development by examining the texts that show irrational coefficients and those that argue against them. We show that the debate took place entirely in the conceptual context of premodern, "cossic" algebra, and persisted in the sixteenth century independent of the development of the new algebra of Viète, Descartes, and Fermat. This was a formal innovation violating prevailing concepts that we propose could only be introduced because of the growing autonomy of notation from rhetorical text.

  7. Integer Solutions of Binomial Coefficients

    Science.gov (United States)

    Gilbertson, Nicholas J.

    2016-01-01

    A good formula is like a good story, rich in description, powerful in communication, and eye-opening to readers. The formula presented in this article for determining the coefficients of the binomial expansion of (x + y)^n is one such "good read." The beauty of this formula is in its simplicity--both describing a quantitative situation…
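
    The record is truncated before the article's formula appears, so the sketch below uses the standard closed form C(n, k) = n! / (k! (n - k)!) through Python's `math.comb`; it may differ from the presentation in the article. The helper names are assumptions for illustration.

```python
from math import comb


def binomial_coefficients(n):
    """Coefficients of (x + y)^n, ordered from x^n down to y^n."""
    return [comb(n, k) for k in range(n + 1)]


def expand(x, y, n):
    """Evaluate (x + y)^n term by term from the binomial expansion."""
    return sum(comb(n, k) * x ** (n - k) * y ** k for k in range(n + 1))


print(binomial_coefficients(4))
print(expand(2, 3, 5) == (2 + 3) ** 5)
```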

  8. Solution Methods for Structures with Random Properties Subject to Random Excitation

    DEFF Research Database (Denmark)

    Köylüoglu, H. U.; Nielsen, Søren R. K.; Cakmak, A. S.

    This paper deals with the lower order statistical moments of the response of structures with random stiffness and random damping properties subject to random excitation. The arising stochastic differential equations (SDE) with random coefficients are solved by two methods, a second order...... the SDE with random coefficients with deterministic initial conditions to an equivalent nonlinear SDE with deterministic coefficient and random initial conditions. In both methods, the statistical moment equations are used. Hierarchy of statistical moments in the markovian approach is closed...... by the cumulant neglect closure method applied at the fourth order level....

  9. Cactus: An Introduction to Regression

    Science.gov (United States)

    Hyde, Hartley

    2008-01-01

    When the author first used "VisiCalc," the author thought it a very useful tool when he had the formulas. But how could he design a spreadsheet if there was no known formula for the quantities he was trying to predict? A few months later, the author relates he learned to use multiple linear regression software and suddenly it all clicked into…

  10. Regression Models for Repairable Systems

    Czech Academy of Sciences Publication Activity Database

    Novák, Petr

    2015-01-01

    Roč. 17, č. 4 (2015), s. 963-972 ISSN 1387-5841 Institutional support: RVO:67985556 Keywords : Reliability analysis * Repair models * Regression Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 0.782, year: 2015 http://library.utia.cas.cz/separaty/2015/SI/novak-0450902.pdf

  11. Survival analysis II: Cox regression

    NARCIS (Netherlands)

    Stel, Vianda S.; Dekker, Friedo W.; Tripepi, Giovanni; Zoccali, Carmine; Jager, Kitty J.

    2011-01-01

    In contrast to the Kaplan-Meier method, Cox proportional hazards regression can provide an effect estimate by quantifying the difference in survival between patient groups and can adjust for confounding effects of other variables. The purpose of this article is to explain the basic concepts of the

  12. Kernel regression with functional response

    OpenAIRE

    Ferraty, Frédéric; Laksaci, Ali; Tadj, Amel; Vieu, Philippe

    2011-01-01

    We consider kernel regression estimate when both the response variable and the explanatory one are functional. The rates of uniform almost complete convergence are stated as function of the small ball probability of the predictor and as function of the entropy of the set on which uniformity is obtained.

  13. Determining Sample Size for Accurate Estimation of the Squared Multiple Correlation Coefficient.

    Science.gov (United States)

    Algina, James; Olejnik, Stephen

    2000-01-01

    Discusses determining sample size for estimation of the squared multiple correlation coefficient and presents regression equations that permit determination of the sample size for estimating this parameter for up to 20 predictor variables. (SLD)

  14. Simple and multiple linear regression: sample size considerations.

    Science.gov (United States)

    Hanley, James A

    2016-11-01

    The suggested "two subjects per variable" (2SPV) rule of thumb in the Austin and Steyerberg article is a chance to bring out some long-established and quite intuitive sample size considerations for both simple and multiple linear regression. This article distinguishes two of the major uses of regression models that imply very different sample size considerations, neither served well by the 2SPV rule. The first is etiological research, which contrasts mean Y levels at differing "exposure" (X) values and thus tends to focus on a single regression coefficient, possibly adjusted for confounders. The second research genre guides clinical practice. It addresses Y levels for individuals with different covariate patterns or "profiles." It focuses on the profile-specific (mean) Y levels themselves, estimating them via linear compounds of regression coefficients and covariates. By drawing on long-established closed-form variance formulae that lie beneath the standard errors in multiple regression, and by rearranging them for heuristic purposes, one arrives at quite intuitive sample size considerations for both research genres. Copyright © 2016 Elsevier Inc. All rights reserved.

  15. Comparison of Linear and Non-linear Regression Analysis to Determine Pulmonary Pressure in Hyperthyroidism.

    Science.gov (United States)

    Scarneciu, Camelia C; Sangeorzan, Livia; Rus, Horatiu; Scarneciu, Vlad D; Varciu, Mihai S; Andreescu, Oana; Scarneciu, Ioan

    2017-01-01

    This study aimed at assessing the incidence of pulmonary hypertension (PH) in newly diagnosed hyperthyroid patients and at finding a simple model showing the complex functional relation between pulmonary hypertension in hyperthyroidism and the factors causing it. The 53 hyperthyroid patients (H-group) were evaluated mainly by using an echocardiographical method and compared with 35 euthyroid (E-group) and 25 healthy people (C-group). In order to identify the factors causing pulmonary hypertension, the statistical method of comparing the values of arithmetical means was used. The functional relation between the two random variables (PAPs and each of the factors determining it within our research study) can be expressed by a linear or non-linear function. By applying the linear regression method described by a first-degree equation, the line of regression (linear model) was determined; by applying the non-linear regression method described by a second-degree equation, a parabola-type curve of regression (non-linear or polynomial model) was determined. We compared and validated these two models by calculating the determination coefficient (criterion 1), comparing residuals (criterion 2), applying the AIC criterion (criterion 3) and using the F-test (criterion 4). From the H-group, 47% have pulmonary hypertension completely reversible when obtaining euthyroidism. The factors causing pulmonary hypertension were identified: previously known factors (level of free thyroxin, pulmonary vascular resistance, cardiac output) and new factors identified in this study (pretreatment period, age, systolic blood pressure). According to the four criteria and to the clinical judgment, we consider that the polynomial model (graphically parabola-type) is better than the linear one. The better model showing the functional relation between the pulmonary hypertension in hyperthyroidism and the factors identified in this study is given by a polynomial equation of second
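
    Two of the four comparison criteria named above, the determination coefficient and AIC, can be sketched for a first-degree versus second-degree fit. Everything here is an assumption for illustration: the data are invented rather than taken from the study, the helper names are hypothetical, and AIC is written in the common Gaussian least-squares form n·ln(RSS/n) + 2k with k counting only the polynomial coefficients.

```python
from math import log


def solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients, lowest power first (normal equations)."""
    k = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    return solve(a, b)


def fit_summary(xs, ys, degree):
    """Return R^2 and AIC for a polynomial fit of the given degree."""
    coef = polyfit(xs, ys, degree)
    pred = [sum(c * x ** i for i, c in enumerate(coef)) for x in xs]
    rss = sum((y - p) ** 2 for y, p in zip(ys, pred))
    ybar = sum(ys) / len(ys)
    tss = sum((y - ybar) ** 2 for y in ys)
    n, k = len(xs), degree + 1
    return {"r2": 1 - rss / tss, "aic": n * log(rss / n) + 2 * k}


# Invented data with visible curvature.
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [0.1, 0.9, 4.2, 8.8, 16.1, 25.2, 35.9]

linear = fit_summary(xs, ys, 1)
quadratic = fit_summary(xs, ys, 2)
print(linear, quadratic)
```

    On data like these the second-degree model has the higher R² and the lower AIC; on nearly linear data the AIC penalty for the extra coefficient can reverse the ranking.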

  16. A graphical method to evaluate spectral preprocessing in multivariate regression calibrations: example with Savitzky-Golay filters and partial least squares regression.

    Science.gov (United States)

    Delwiche, Stephen R; Reeves, James B

    2010-01-01

    In multivariate regression analysis of spectroscopy data, spectral preprocessing is often performed to reduce unwanted background information (offsets, sloped baselines) or accentuate absorption features in intrinsically overlapping bands. These procedures, also known as pretreatments, are commonly smoothing operations or derivatives. While such operations are often useful in reducing the number of latent variables of the actual decomposition and lowering residual error, they also run the risk of misleading the practitioner into accepting calibration equations that are poorly adapted to samples outside of the calibration. The current study developed a graphical method to examine this effect on partial least squares (PLS) regression calibrations of near-infrared (NIR) reflection spectra of ground wheat meal with two analytes, protein content and sodium dodecyl sulfate sedimentation (SDS) volume (an indicator of the quantity of the gluten proteins that contribute to strong doughs). These two properties were chosen because of their differing abilities to be modeled by NIR spectroscopy: excellent for protein content, fair for SDS sedimentation volume. To further demonstrate the potential pitfalls of preprocessing, an artificial component, a randomly generated value, was included in PLS regression trials. Savitzky-Golay (digital filter) smoothing, first-derivative, and second-derivative preprocess functions (5 to 25 centrally symmetric convolution points, derived from quadratic polynomials) were applied to PLS calibrations of 1 to 15 factors. The results demonstrated the danger of an overreliance on preprocessing when (1) the number of samples used in a multivariate calibration is low (<50), (2) the spectral response of the analyte is weak, and (3) the goodness of the calibration is based on the coefficient of determination (R²) rather than a term based on residual error. The graphical method has application to the evaluation of other preprocess functions and various

  17. Assessment of deforestation using regression; Hodnotenie odlesnenia s vyuzitim regresie

    Energy Technology Data Exchange (ETDEWEB)

    Juristova, J. [Univerzita Komenskeho, Prirodovedecka fakulta, Katedra kartografie, geoinformatiky a DPZ, 84215 Bratislava (Slovakia)

    2013-04-16

    This work is devoted to the evaluation of deforestation by regression methods using the Idrisi Taiga software. Deforestation is evaluated by the method of logistic regression. The dependent variable has discrete values '0' and '1', indicating whether or not deforestation occurred. Independent variables have continuous values, expressing the distance from the edge of the deforested areas of forests from urban areas, the river and the road network. The results were also used in predicting the probability of deforestation in subsequent periods. The result is a map showing the output probability of deforestation for the periods 1990/2000 and 2000/2006 in accordance with predetermined coefficients (values of independent variables). (authors)

  18. A Monte Carlo simulation study comparing linear regression, beta regression, variable-dispersion beta regression and fractional logit regression at recovering average difference measures in a two sample design.

    Science.gov (United States)

    Meaney, Christopher; Moineddin, Rahim

    2014-01-24

    In biomedical research, response variables are often encountered which have bounded support on the open unit interval--(0,1). Traditionally, researchers have attempted to estimate covariate effects on these types of response data using linear regression. Alternative modelling strategies may include: beta regression, variable-dispersion beta regression, and fractional logit regression models. This study employs a Monte Carlo simulation design to compare the statistical properties of the linear regression model to that of the more novel beta regression, variable-dispersion beta regression, and fractional logit regression models. In the Monte Carlo experiment we assume a simple two sample design. We assume observations are realizations of independent draws from their respective probability models. The randomly simulated draws from the various probability models are chosen to emulate average proportion/percentage/rate differences of pre-specified magnitudes. Following simulation of the experimental data we estimate average proportion/percentage/rate differences. We compare the estimators in terms of bias, variance, type-1 error and power. Estimates of Monte Carlo error associated with these quantities are provided. If response data are beta distributed with constant dispersion parameters across the two samples, then all models are unbiased and have reasonable type-1 error rates and power profiles. If the response data in the two samples have different dispersion parameters, then the simple beta regression model is biased. When the sample size is small (N0 = N1 = 25) linear regression has superior type-1 error rates compared to the other models. Small sample type-1 error rates can be improved in beta regression models using bias correction/reduction methods. In the power experiments, variable-dispersion beta regression and fractional logit regression models have slightly elevated power compared to linear regression models. Similar results were observed if the

  19. A Predictive Logistic Regression Model of World Conflict Using Open Source Data

    Science.gov (United States)

    2015-03-26

    No correlation between the error terms and the independent variables 9. Absence of perfect multicollinearity (Menard, 2001) When assumptions are...some of the variables before initial model building. Multicollinearity, or near-linear dependence among the variables, will cause problems in the...model. High multicollinearity tends to produce unreasonably high logistic regression coefficients and can result in coefficients that are not

  20. [Application of negative binomial regression and modified Poisson regression in the research of risk factors for injury frequency].

    Science.gov (United States)

    Cao, Qingqing; Wu, Zhenqiang; Sun, Ying; Wang, Tiezhu; Han, Tengwei; Gu, Chaomei; Sun, Yehuan

    2011-11-01

    To explore the application of negative binomial regression and modified Poisson regression analysis in analyzing the influential factors for injury frequency and the risk factors leading to the increase of injury frequency. 2917 primary and secondary school students were selected from Hefei by cluster random sampling method and surveyed by questionnaire. The count data on injury events were used to fit modified Poisson regression and negative binomial regression models. The risk factors for increased unintentional injury frequency among juvenile students were explored, so as to probe the efficiency of these two models in studying the influential factors for injury frequency. The Poisson model existed over-dispersion (P Poisson regression and negative binomial regression model, was fitted better. respectively. Both showed that male gender, younger age, father working outside of the hometown, the level of the guardian being above junior high school and smoking might be associated with higher injury frequencies. For clustered frequency data on injury events, both the modified Poisson regression analysis and negative binomial regression analysis can be used. However, based on our data, the modified Poisson regression fitted better and this model could give a more accurate interpretation of relevant factors affecting the frequency of injury.
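
    Over-dispersion, the reason a plain Poisson model is set aside here, can be screened with a variance-to-mean ratio. This is a crude sketch that ignores covariates and is not the test used by the authors; the counts are invented and `dispersion_index` is a hypothetical helper name.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Sample variance divided by sample mean.

    A Poisson variable has variance equal to its mean, so values well
    above 1 suggest over-dispersion in the raw counts.
    """
    return variance(counts) / mean(counts)


# Invented injury counts per student: mostly zeros with a few large values.
injuries = [0, 0, 0, 1, 0, 5, 0, 0, 7, 1]
print(dispersion_index(injuries))
```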

  1. Impact of multicollinearity on small sample hydrologic regression models

    Science.gov (United States)

    Kroll, Charles N.; Song, Peter

    2013-06-01

    Often hydrologic regression models are developed with ordinary least squares (OLS) procedures. The use of OLS with highly correlated explanatory variables produces multicollinearity, which creates highly sensitive parameter estimators with inflated variances and improper model selection. It is not clear how to best address multicollinearity in hydrologic regression models. Here a Monte Carlo simulation is developed to compare four techniques to address multicollinearity: OLS, OLS with variance inflation factor screening (VIF), principal component regression (PCR), and partial least squares regression (PLS). The performance of these four techniques was observed for varying sample sizes, correlation coefficients between the explanatory variables, and model error variances consistent with hydrologic regional regression models. The negative effects of multicollinearity are magnified at smaller sample sizes, higher correlations between the variables, and larger model error variances (smaller R²). The Monte Carlo simulation indicates that if the true model is known, multicollinearity is present, and the estimation and statistical testing of regression parameters are of interest, then PCR or PLS should be employed. If the model is unknown, or if the interest is solely on model predictions, it is recommended that OLS be employed since using more complicated techniques did not produce any improvement in model performance. A leave-one-out cross-validation case study was also performed using low-streamflow data sets from the eastern United States. Results indicate that OLS with stepwise selection generally produces models across study regions with varying levels of multicollinearity that are as good as biased regression techniques such as PCR and PLS.
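
    The variance inflation factor used for screening can be illustrated in the simplest case of two explanatory variables, where VIF = 1 / (1 - r²) and r is their Pearson correlation. This is a sketch under that two-predictor assumption, not the simulation from the record; the data and function names are invented.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a two-predictor regression."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


# Invented explanatory variables.
x1 = [1, 2, 3, 4, 5]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1]  # nearly a copy of x1
x3 = [2, -1, 0, -1, 2]          # uncorrelated with x1

print(vif_two_predictors(x1, x2))
print(vif_two_predictors(x1, x3))
```

    With more than two predictors, each VIF comes from regressing one predictor on all the others, which this sketch does not cover.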

  2. Statistical learning from a regression perspective

    CERN Document Server

    Berk, Richard A

    2016-01-01

    This textbook considers statistical learning applications when interest centers on the conditional distribution of the response variable, given a set of predictors, and when it is important to characterize how the predictors are related to the response. As a first approximation, this can be seen as an extension of nonparametric regression. This fully revised new edition includes important developments over the past 8 years. Consistent with modern data analytics, it emphasizes that a proper statistical learning data analysis derives from sound data collection, intelligent data management, appropriate statistical procedures, and an accessible interpretation of results. A continued emphasis on the implications for practice runs through the text. Among the statistical learning procedures examined are bagging, random forests, boosting, support vector machines and neural networks. Response variables may be quantitative or categorical. As in the first edition, a unifying theme is supervised learning that can be trea...

  3. Improving sub-pixel imperviousness change prediction by ensembling heterogeneous non-linear regression models

    Directory of Open Access Journals (Sweden)

    Drzewiecki Wojciech

    2016-12-01

    In this work nine non-linear regression models were compared for sub-pixel impervious surface area mapping from Landsat images. The comparison was done in three study areas both for accuracy of imperviousness coverage evaluation in individual points in time and accuracy of imperviousness change assessment. The performance of individual machine learning algorithms (Cubist, Random Forest, stochastic gradient boosting of regression trees, k-nearest neighbors regression, random k-nearest neighbors regression, Multivariate Adaptive Regression Splines, averaged neural networks, and support vector machines with polynomial and radial kernels) was also compared with the performance of heterogeneous model ensembles constructed from the best models trained using particular techniques.

  4. Testing overall and moderator effects meta-regression

    NARCIS (Netherlands)

    Huizenga, H.M.; Visser, I.; Dolan, C.V.

    2011-01-01

    Random effects meta-regression is a technique to synthesize results of multiple studies. It allows for a test of an overall effect, as well as for tests of effects of study characteristics, that is, (discrete or continuous) moderator effects. We describe various procedures to test moderator effects:

  5. A Bayesian Nonparametric Causal Model for Regression Discontinuity Designs

    Science.gov (United States)

    Karabatsos, George; Walker, Stephen G.

    2013-01-01

    The regression discontinuity (RD) design (Thistlewaite & Campbell, 1960; Cook, 2008) provides a framework to identify and estimate causal effects from a non-randomized design. Each subject of a RD design is assigned to the treatment (versus assignment to a non-treatment) whenever her/his observed value of the assignment variable equals or…

  6. On the null distribution of Bayes factors in linear regression

    Science.gov (United States)

    We show that under the null, the 2 log (Bayes factor) is asymptotically distributed as a weighted sum of chi-squared random variables with a shifted mean. This claim holds for Bayesian multi-linear regression with a family of conjugate priors, namely, the normal-inverse-gamma prior, the g-prior, and...

  7. Quantile Regression With Measurement Error

    KAUST Repository

    Wei, Ying

    2009-08-27

    Regression quantiles can be substantially biased when the covariates are measured with error. In this paper we propose a new method that produces consistent linear quantile estimation in the presence of covariate measurement error. The method corrects the measurement error induced bias by constructing joint estimating equations that simultaneously hold for all the quantile levels. An iterative EM-type estimation algorithm to obtain the solutions to such joint estimation equations is provided. The finite sample performance of the proposed method is investigated in a simulation study, and compared to the standard regression calibration approach. Finally, we apply our methodology to part of the National Collaborative Perinatal Project growth data, a longitudinal study with an unusual measurement error structure. © 2009 American Statistical Association.
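
    For orientation, the standard check (pinball) loss that defines a regression quantile can be sketched in the intercept-only case, where its minimizer is a sample quantile. This does not implement the measurement-error correction proposed in the record; the data and function names are assumptions for illustration.

```python
def check_loss(residual, tau):
    """Check loss: tau * r for r >= 0, (tau - 1) * r for r < 0."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def total_check_loss(q, ys, tau):
    """Sum of check losses when every observation is predicted by q."""
    return sum(check_loss(y - q, tau) for y in ys)


def best_constant(ys, tau):
    """Constant minimizing the total check loss.

    The loss is piecewise linear in q, so a minimizer is always found
    among the observed values.
    """
    return min(sorted(ys), key=lambda q: total_check_loss(q, ys, tau))


ys = [1, 2, 3, 4, 100]
print(best_constant(ys, 0.5))  # the median, unaffected by the outlier
print(best_constant(ys, 0.9))
```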

  8. Calibration factor or calibration coefficient?

    International Nuclear Information System (INIS)

    Meghzifene, A.; Shortt, K.R.

    2002-01-01

    Full text: The IAEA/WHO network of SSDLs was set up in order to establish links between SSDL members and the international measurement system. At the end of 2001, there were 73 network members in 63 Member States. The SSDL network members provide calibration services to end-users at the national or regional level. The results of the calibrations are summarized in a document called calibration report or calibration certificate. The IAEA has been using the term calibration certificate and will continue using the same terminology. The most important information in a calibration certificate is a list of calibration factors and their related uncertainties that apply to the calibrated instrument for the well-defined irradiation and ambient conditions. The IAEA has recently decided to change the term calibration factor to calibration coefficient, to be fully in line with ISO [ISO 31-0], which recommends the use of the term coefficient when it links two quantities A and B (equation 1) that have different dimensions. The term factor should only be used for k when it is used to link the terms A and B that have the same dimensions A=k.B. However, in a typical calibration, an ion chamber is calibrated in terms of a physical quantity such as air kerma, dose to water, ambient dose equivalent, etc. If the chamber is calibrated together with its electrometer, then the calibration refers to the physical quantity to be measured per electrometer unit reading. In this case, the terms referred have different dimensions. The adoption by the Agency of the term coefficient to express the results of calibrations is consistent with the 'International vocabulary of basic and general terms in metrology' prepared jointly by the BIPM, IEC, ISO, OIML and other organizations. The BIPM has changed from factor to coefficient. The authors believe that this is more than just a matter of semantics and recommend that the SSDL network members adopt this change in terminology. (author)

  9. Extinction Coefficient of Gold Nanostars

    OpenAIRE

    de Puig, Helena; Tam, Justina O.; Yen, Chun-Wan; Gehrke, Lee; Hamad-Schifferli, Kimberly

    2015-01-01

    Gold nanostars (NStars) are highly attractive for biological applications due to their surface chemistry, facile synthesis and optical properties. Here, we synthesize NStars in HEPES buffer at different HEPES/Au ratios, producing NStars of different sizes and shapes, and therefore varying optical properties. We measure the extinction coefficient of the synthesized NStars at their maximum surface plasmon resonances (SPR), which range from 5.7 × 10⁸ to 26.8 × 10⁸ M⁻¹cm⁻¹. Measured values correl...

  10. Multivariate and semiparametric kernel regression

    OpenAIRE

    Härdle, Wolfgang; Müller, Marlene

    1997-01-01

    The paper gives an introduction to theory and application of multivariate and semiparametric kernel smoothing. Multivariate nonparametric density estimation is an often used pilot tool for examining the structure of data. Regression smoothing helps in investigating the association between covariates and responses. We concentrate on kernel smoothing using local polynomial fitting which includes the Nadaraya-Watson estimator. Some theory on the asymptotic behavior and bandwidth selection is pro...
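
    The Nadaraya-Watson estimator mentioned above is a kernel-weighted average of the responses. A minimal one-dimensional sketch follows, assuming a Gaussian kernel and a fixed, hand-picked bandwidth (the record discusses bandwidth selection, which is not attempted here); the data are invented.

```python
from math import exp


def gaussian_kernel(u):
    """Unnormalized Gaussian kernel; the constant cancels in the ratio below."""
    return exp(-0.5 * u * u)


def nadaraya_watson(x0, xs, ys, bandwidth):
    """Kernel regression estimate at x0: sum(K_i * y_i) / sum(K_i)."""
    weights = [gaussian_kernel((x0 - x) / bandwidth) for x in xs]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, ys)) / total


# Invented data on a straight line y = 2x + 1.
xs = [0, 1, 2, 3, 4]
ys = [2 * x + 1 for x in xs]

print(nadaraya_watson(2, xs, ys, bandwidth=1.0))
```

    At the centre of this symmetric design the estimate equals the true value; near the boundaries a local-constant fit is pulled toward the interior, which is one motivation for the local polynomial fitting the paper concentrates on.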

  11. Regression algorithm for emotion detection

    OpenAIRE

    Berthelon , Franck; Sander , Peter

    2013-01-01

    We present here two components of a computational system for emotion detection. PEMs (Personalized Emotion Maps) store links between bodily expressions and emotion values, and are individually calibrated to capture each person's emotion profile. They are an implementation based on aspects of Scherer's theoretical complex system model of emotion. We also present a regression algorithm that determines a person's emotional feeling from sensor m...

  12. Directional quantile regression in R

    Czech Academy of Sciences Publication Activity Database

    Boček, Pavel; Šiman, Miroslav

    2017-01-01

    Roč. 53, č. 3 (2017), s. 480-492 ISSN 0023-5954 R&D Projects: GA ČR GA14-07234S Institutional support: RVO:67985556 Keywords : multivariate quantile * regression quantile * halfspace depth * depth contour Subject RIV: BD - Theory of Information OBOR OECD: Applied mathematics Impact factor: 0.379, year: 2016 http://library.utia.cas.cz/separaty/2017/SI/bocek-0476587.pdf

  13. Correlation, Regression, and Cointegration of Nonstationary Economic Time Series

    DEFF Research Database (Denmark)

    Johansen, Søren

    Yule (1926) introduced the concept of spurious or nonsense correlation, and showed by simulation that for some nonstationary processes the empirical correlations seem not to converge in probability even if the processes were independent. This was later discussed by Granger and Newbold (1974), and Phillips (1986) found the limit distributions. We propose to distinguish between empirical and population correlation coefficients and show in a bivariate autoregressive model for nonstationary variables that the empirical correlation and regression coefficients do not converge to the relevant population values, due to the trending nature of the data. We conclude by giving a simple cointegration analysis of two interests. The analysis illustrates that much more insight can be gained about the dynamic behavior of the nonstationary variables than simply by calculating a correlation coefficient.
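
The spurious-correlation effect that this record attributes to Yule's simulations can be illustrated with a short sketch. It is not taken from the paper: the series length, the number of replications, the Gaussian increments and all function names are assumptions chosen for illustration. It compares the average absolute empirical correlation between pairs of independent random walks with that between pairs of independent white-noise series.

```python
import random
import statistics


def pearson(x, y):
    """Empirical (sample) Pearson correlation of two equal-length series."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def random_walk(n, rng):
    """Nonstationary series: cumulative sum of independent N(0, 1) steps."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path


def white_noise(n, rng):
    """Stationary series: independent N(0, 1) draws."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def mean_abs_corr(make_series, n=200, reps=200, seed=0):
    """Average |correlation| over `reps` pairs of independent series."""
    rng = random.Random(seed)
    return statistics.fmean(
        abs(pearson(make_series(n, rng), make_series(n, rng)))
        for _ in range(reps)
    )


if __name__ == "__main__":
    print("random walks:", round(mean_abs_corr(random_walk), 3))
    print("white noise: ", round(mean_abs_corr(white_noise), 3))
```

Under these assumptions the random-walk pairs should show a much larger typical correlation than the white-noise pairs, even though every pair is independent by construction.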

  14. Advanced colorectal neoplasia risk stratification by penalized logistic regression.

    Science.gov (United States)

    Lin, Yunzhi; Yu, Menggang; Wang, Sijian; Chappell, Richard; Imperiale, Thomas F

    2016-08-01

    Colorectal cancer is the second leading cause of death from cancer in the United States. To facilitate the efficiency of colorectal cancer screening, there is a need to stratify risk for colorectal cancer among the 90% of US residents who are considered "average risk." In this article, we investigate such risk stratification rules for advanced colorectal neoplasia (colorectal cancer and advanced, precancerous polyps). We use a recently completed large cohort study of subjects who underwent a first screening colonoscopy. Logistic regression models have been used in the literature to estimate the risk of advanced colorectal neoplasia based on quantifiable risk factors. However, logistic regression may be prone to overfitting and instability in variable selection. Since most of the risk factors in our study have several categories, it was tempting to collapse these categories into fewer risk groups. We propose a penalized logistic regression method that automatically and simultaneously selects variables, groups categories, and estimates their coefficients by penalizing the [Formula: see text]-norm of both the coefficients and their differences. Hence, it encourages sparsity in the categories, i.e. grouping of the categories, and sparsity in the variables, i.e. variable selection. We apply the penalized logistic regression method to our data. The important variables are selected, with close categories simultaneously grouped, by penalized regression models with and without the interaction terms. The models are validated with 10-fold cross-validation. The receiver operating characteristic curves of the penalized regression models dominate the receiver operating characteristic curve of naive logistic regressions, indicating a superior discriminative performance. © The Author(s) 2013.

  15. Polylinear regression analysis in radiochemistry

    International Nuclear Information System (INIS)

    Kopyrin, A.A.; Terent'eva, T.N.; Khramov, N.N.

    1995-01-01

    A number of radiochemical problems have been formulated in the framework of polylinear regression analysis, which permits the use of conventional mathematical methods for their solution. The authors have considered features of the use of polylinear regression analysis for estimating the contributions of various sources to atmospheric pollution, for studying irradiated nuclear fuel, for estimating concentrations from spectral data, for measuring neutron fields of a nuclear reactor, for estimating crystal lattice parameters from X-ray diffraction patterns, for interpreting data of X-ray fluorescence analysis, for estimating complex formation constants, and for analyzing results of radiometric measurements. The problem of estimating the target parameters can be ill-posed for certain properties of the system under study. The authors showed the possibility of regularization by adding a fictitious set of data "obtained" from the orthogonal design. To estimate only a part of the parameters under consideration, the authors used incomplete rank models. In this case, it is necessary to take into account the possibility of confounding estimates. An algorithm for evaluating the degree of confounding is presented, which is realized using standard software for regression analysis.

  16. Earning on Response Coefficient in Automobile and Go Public Companies

    Directory of Open Access Journals (Sweden)

    Lisdawati Arifin

    2017-09-01

    This study analyzes factors that influence earnings response coefficients (ERC), simultaneously and partially: leverage, systematic risk (beta), growth opportunities (market-to-book value ratio), and firm size. The sample comprises 12 automotive and components companies with complete data for 2008 to 2012, selected by the following criteria: (1) the company is an automotive or components firm listed on the stock exchange, (2) it has financial statements for 2008-2012, and (3) it has return data (closing price on the first day after the date of issuance of the financial statements). The study uses secondary data and multiple linear regression models to test the effect of the independent variables on the dependent variable partially (t-test) and simultaneously (F-test), and to assess the goodness of fit (R-square) of the research model. The results show that leverage, beta, growth opportunities (market-to-book value ratio), and firm size simultaneously affect the dependent variable, the earnings response coefficient. Partially, leverage negatively affects earnings response coefficients, beta is negatively correlated with earnings response coefficients, growth opportunities (market-to-book value ratio) have a significant effect on earnings response coefficients, and firm size significantly influences earnings response coefficients.

  17. Gaussian Process Regression Model in Spatial Logistic Regression

    Science.gov (United States)

    Sofro, A.; Oktaviarina, A.

    2018-01-01

    Spatial analysis has developed very quickly in the last decade. One of the favorite approaches is based on the neighbourhood of the region. Unfortunately, there are some limitations such as difficulty in prediction. Therefore, we offer Gaussian process regression (GPR) to accommodate the issue. In this paper, we will focus on spatial modeling with GPR for binomial data with logit link function. The performance of the model will be investigated. We will discuss the inference of how to estimate the parameters and hyper-parameters and to predict as well. Furthermore, simulation studies will be explained in the last section.

  18. Form of multicomponent Fickian diffusion coefficients matrix

    International Nuclear Information System (INIS)

    Wambui Mutoru, J.; Firoozabadi, Abbas

    2011-01-01

    Highlights: → Irreversible thermodynamics establishes form of multicomponent diffusion coefficients. → Phenomenological coefficients and thermodynamic factors affect sign of diffusion coefficients. → Negative diagonal elements of diffusion coefficients matrix can occur in non-ideal mixtures. → Eigenvalues of the matrix of Fickian diffusion coefficients may not be all real. - Abstract: The form of multicomponent Fickian diffusion coefficients matrix in thermodynamically stable mixtures is established based on the form of phenomenological coefficients and thermodynamic factors. While phenomenological coefficients form a symmetric positive definite matrix, the determinant of thermodynamic factors matrix is positive. As a result, the Fickian diffusion coefficients matrix has a positive determinant, but its elements - including diagonal elements - can be negative. Comprehensive survey of reported diffusion coefficients data for ternary and quaternary mixtures, confirms that invariably the determinant of the Fickian diffusion coefficients matrix is positive.

  19. Sparse Reduced-Rank Regression for Simultaneous Dimension Reduction and Variable Selection

    KAUST Repository

    Chen, Lisha; Huang, Jianhua Z.

    2012-01-01

    and hence improves predictive accuracy. We propose to select relevant variables for reduced-rank regression by using a sparsity-inducing penalty. We apply a group-lasso type penalty that treats each row of the matrix of the regression coefficients as a group

  20. Easy methods for extracting individual regression slopes: Comparing SPSS, R, and Excel

    Directory of Open Access Journals (Sweden)

    Roland Pfister

    2013-10-01

    Three different methods for extracting coefficients of linear regression analyses are presented. The focus is on automatic and easy-to-use approaches for common statistical packages: SPSS, R, and MS Excel / LibreOffice Calc. Hands-on examples are included for each analysis, followed by a brief description of how a subsequent regression coefficient analysis is performed.
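
The paper covers SPSS, R and spreadsheet software; as a rough illustration of the same idea in Python, the sketch below fits one ordinary least-squares line per subject and returns the slopes. The long-format `(subject, x, y)` row layout and the function names are assumptions made for this example, not the paper's procedure.

```python
from collections import defaultdict


def ols_slope(xs, ys):
    """Slope of the least-squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def slopes_by_subject(rows):
    """rows: iterable of (subject, x, y); returns {subject: slope}."""
    grouped = defaultdict(lambda: ([], []))
    for subject, x, y in rows:
        grouped[subject][0].append(x)
        grouped[subject][1].append(y)
    return {s: ols_slope(xs, ys) for s, (xs, ys) in grouped.items()}


if __name__ == "__main__":
    # Hypothetical data: two subjects, three observations each.
    data = [
        ("a", 0, 1.0), ("a", 1, 3.0), ("a", 2, 5.0),
        ("b", 0, 4.0), ("b", 1, 3.5), ("b", 2, 3.0),
    ]
    print(slopes_by_subject(data))
```

The resulting per-subject slopes could then be passed to a second-level analysis, for example a one-sample test of whether the mean slope differs from zero.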

  1. Modeling Group Differences in OLS and Orthogonal Regression: Implications for Differential Validity Studies

    Science.gov (United States)

    Kane, Michael T.; Mroch, Andrew A.

    2010-01-01

    In evaluating the relationship between two measures across different groups (i.e., in evaluating "differential validity") it is necessary to examine differences in correlation coefficients and in regression lines. Ordinary least squares (OLS) regression is the standard method for fitting lines to data, but its criterion for optimal fit…

  2. A new correlation for two-phase critical discharge coefficient

    International Nuclear Information System (INIS)

    Park, Jong Woon; Chun, Moon Hyun

    1989-01-01

    A new simple correlation for subcooled and two-phase critical flow discharge coefficient has been developed by stepwise regression technique. The new discharge coefficient has three independent variables and they are length to hydraulic diameter ratio, degree of subcooling, and stagnation temperature. The new discharge coefficient is applied as a multiplier to homogeneous equilibrium model and Abauf's single phase critical mass flux calculation equation. This method has been tested for its accuracy by comparing with experimental data. Results of the comparison show that the agreement between the predictions with new correlation and the experimental data is good for pipes and nozzles with vertical upward flow for subcooled upstream condition and nozzles with horizontal configuration for two-phase upstream condition

  3. Study of transport coefficients of nanodiamond nanofluids

    Science.gov (United States)

    Pryazhnikov, M. I.; Minakov, A. V.; Guzei, D. V.

    2017-09-01

    Experimental data on the thermal conductivity coefficient and viscosity coefficient of nanodiamond nanofluids are presented. Distilled water and ethylene glycol were used as the base fluids. Dependences of the transport coefficients on concentration are obtained. It was shown that the thermal conductivity coefficient increases with increasing nanodiamond concentration, and that the base fluid properties and nanodiamond concentration affect the rheology of the nanofluids.

  4. On concurvity in nonlinear and nonparametric regression models

    Directory of Open Access Journals (Sweden)

    Sonia Amodio

    2014-12-01

    When data are affected by multicollinearity in the linear regression framework, concurvity will be present in fitting a generalized additive model (GAM). The term concurvity describes nonlinear dependencies among the predictor variables. As collinearity results in inflated variance of the estimated regression coefficients in the linear regression model, the result of the presence of concurvity leads to instability of the estimated coefficients in GAMs. Even if the backfitting algorithm will always converge to a solution, in case of concurvity the final solution of the backfitting procedure in fitting a GAM is influenced by the starting functions. While exact concurvity is highly unlikely, approximate concurvity, the analogue of multicollinearity, is of practical concern as it can lead to upwardly biased estimates of the parameters and to underestimation of their standard errors, increasing the risk of committing type I error. We compare the existing approaches to detect concurvity, pointing out their advantages and drawbacks, using simulated and real data sets. As a result, this paper will provide a general criterion to detect concurvity in nonlinear and nonparametric regression models.

  5. Evaluation of Rock Joint Coefficients

    Science.gov (United States)

    Audy, Ondřej; Ficker, Tomáš

    2017-10-01

    A computer method for evaluation of rock joint coefficients is described and several applications are presented. The method is based on two absolute numerical indicators that are formed by means of the Fourier replicas of rock joint profiles. The first indicator quantifies the vertical depth of profiles and the second indicator classifies wavy character of profiles. The absolute indicators have replaced the formerly used relative indicators that showed some artificial behavior in some cases. This contribution is focused on practical computations testing the functionality of the newly introduced indicators.

  6. Variable selection and model choice in geoadditive regression models.

    Science.gov (United States)

    Kneib, Thomas; Hothorn, Torsten; Tutz, Gerhard

    2009-06-01

    Model choice and variable selection are issues of major concern in practical regression analyses, arising in many biometric applications such as habitat suitability analyses, where the aim is to identify the influence of potentially many environmental conditions on certain species. We describe regression models for breeding bird communities that facilitate both model choice and variable selection, by a boosting algorithm that works within a class of geoadditive regression models comprising spatial effects, nonparametric effects of continuous covariates, interaction surfaces, and varying coefficients. The major modeling components are penalized splines and their bivariate tensor product extensions. All smooth model terms are represented as the sum of a parametric component and a smooth component with one degree of freedom to obtain a fair comparison between the model terms. A generic representation of the geoadditive model allows us to devise a general boosting algorithm that automatically performs model choice and variable selection.

  7. Direction of Effects in Multiple Linear Regression Models.

    Science.gov (United States)

    Wiedermann, Wolfgang; von Eye, Alexander

    2015-01-01

    Previous studies analyzed asymmetric properties of the Pearson correlation coefficient using higher than second order moments. These asymmetric properties can be used to determine the direction of dependence in a linear regression setting (i.e., establish which of two variables is more likely to be on the outcome side) within the framework of cross-sectional observational data. Extant approaches are restricted to the bivariate regression case. The present contribution extends the direction of dependence methodology to a multiple linear regression setting by analyzing distributional properties of residuals of competing multiple regression models. It is shown that, under certain conditions, the third central moments of estimated regression residuals can be used to decide upon direction of effects. In addition, three different approaches for statistical inference are discussed: a combined D'Agostino normality test, a skewness difference test, and a bootstrap difference test. Type I error and power of the procedures are assessed using Monte Carlo simulations, and an empirical example is provided for illustrative purposes. In the discussion, issues concerning the quality of psychological data, possible extensions of the proposed methods to the fourth central moment of regression residuals, and potential applications are addressed.
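
The residual-skewness idea can be sketched for the simplest bivariate case. This is an illustration only, not the authors' procedure: it assumes a skewed (exponential) predictor, a normal error term, a sample size of 5000, and hypothetical function names. Under those assumptions the residuals of the correctly specified model should be close to symmetric, while those of the reversed model inherit skewness from the predictor.

```python
import random


def ols_residuals(x, y):
    """Residuals from the least-squares regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


def skewness(values):
    """Standardized third central moment."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def simulate(n=5000, seed=1):
    """Assumed data-generating model: x skewed, y = x + normal error."""
    rng = random.Random(seed)
    x = [rng.expovariate(1.0) for _ in range(n)]
    y = [xi + rng.gauss(0.0, 1.0) for xi in x]
    return x, y


x, y = simulate()
skew_true = skewness(ols_residuals(x, y))      # model y on x (true direction)
skew_reversed = skewness(ols_residuals(y, x))  # model x on y (reversed)

if __name__ == "__main__":
    print("residual skewness, y on x:", round(skew_true, 3))
    print("residual skewness, x on y:", round(skew_reversed, 3))
```

A formal decision would need one of the inference procedures the abstract lists (for example a skewness difference test or a bootstrap); the sketch only compares the two point estimates.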

  8. Correlation coefficients in neutron β-decay

    International Nuclear Information System (INIS)

    Byrne, J.

    1978-01-01

    The various angular and polarisation coefficients in neutron decay are the principal sources of information on the β-interaction. Measurements of the electron-neutrino angular correlation coefficient (a), the neutron-spin-electron-momentum correlation coefficient (A), the neutron-spin-neutrino-momentum correlation coefficient (B), and the triple correlation coefficient D and time-reversal invariance are reviewed and the results discussed. (U.K.)

  9. Spontaneous regression of pulmonary bullae

    International Nuclear Information System (INIS)

    Satoh, H.; Ishikawa, H.; Ohtsuka, M.; Sekizawa, K.

    2002-01-01

    The natural history of pulmonary bullae is often characterized by gradual, progressive enlargement. Spontaneous regression of bullae is, however, very rare. We report a case in which complete resolution of pulmonary bullae in the left upper lung occurred spontaneously. The management of pulmonary bullae is occasionally made difficult because of gradual progressive enlargement associated with abnormal pulmonary function. Some patients have multiple bullae in both lungs and/or have a history of pulmonary emphysema. Others have a giant bulla without emphysematous change in the lungs. Our present case had treated lung cancer with no evidence of local recurrence. He had no emphysematous change in lung function test and had no complaints, although the high resolution CT scan shows evidence of underlying minimal changes of emphysema. Ortin and Gurney presented three cases of spontaneous reduction in size of bullae. Interestingly, one of them had a marked decrease in the size of a bulla in association with thickening of the wall of the bulla, which was observed in our patient. This case we describe is of interest, not only because of the rarity with which regression of pulmonary bullae has been reported in the literature, but also because of the spontaneous improvements in the radiological picture in the absence of overt infection or tumor. Copyright (2002) Blackwell Science Pty Ltd

  10. Quantum algorithm for linear regression

    Science.gov (United States)

    Wang, Guoming

    2017-07-01

    We present a quantum algorithm for fitting a linear regression model to a given data set using the least-squares approach. Differently from previous algorithms which yield a quantum state encoding the optimal parameters, our algorithm outputs these numbers in the classical form. So by running it once, one completely determines the fitted model and then can use it to make predictions on new data at little cost. Moreover, our algorithm works in the standard oracle model, and can handle data sets with nonsparse design matrices. It runs in time poly(log₂(N), d, κ, 1/ε), where N is the size of the data set, d is the number of adjustable parameters, κ is the condition number of the design matrix, and ε is the desired precision in the output. We also show that the polynomial dependence on d and κ is necessary. Thus, our algorithm cannot be significantly improved. Furthermore, we also give a quantum algorithm that estimates the quality of the least-squares fit (without computing its parameters explicitly). This algorithm runs faster than the one for finding this fit, and can be used to check whether the given data set qualifies for linear regression in the first place.

  11. Sparse Regression by Projection and Sparse Discriminant Analysis

    KAUST Repository

    Qi, Xin

    2015-04-03

    © 2015, © American Statistical Association, Institute of Mathematical Statistics, and Interface Foundation of North America. Recent years have seen active developments of various penalized regression methods, such as LASSO and elastic net, to analyze high-dimensional data. In these approaches, the direction and length of the regression coefficients are determined simultaneously. Due to the introduction of penalties, the length of the estimates can be far from being optimal for accurate predictions. We introduce a new framework, regression by projection, and its sparse version to analyze high-dimensional data. The unique nature of this framework is that the directions of the regression coefficients are inferred first, and the lengths and the tuning parameters are determined by a cross-validation procedure to achieve the largest prediction accuracy. We provide a theoretical result for simultaneous model selection consistency and parameter estimation consistency of our method in high dimension. This new framework is then generalized such that it can be applied to principal components analysis, partial least squares, and canonical correlation analysis. We also adapt this framework for discriminant analysis. Compared with the existing methods, where there is relatively little control of the dependency among the sparse components, our method can control the relationships among the components. We present efficient algorithms and related theory for solving the sparse regression by projection problem. Based on extensive simulations and real data analysis, we demonstrate that our method achieves good predictive performance and variable selection in the regression setting, and the ability to control relationships between the sparse components leads to more accurate classification. In supplementary materials available online, the details of the algorithms and theoretical proofs, and R codes for all simulation studies are provided.

  12. Prediction of retention indices for frequently reported compounds of plant essential oils using multiple linear regression, partial least squares, and support vector machine.

    Science.gov (United States)

    Yan, Jun; Huang, Jian-Hua; He, Min; Lu, Hong-Bing; Yang, Rui; Kong, Bo; Xu, Qing-Song; Liang, Yi-Zeng

    2013-08-01

    Retention indices for frequently reported compounds of plant essential oils on three different stationary phases were investigated. Multivariate linear regression, partial least squares, and support vector machine, combined with a new variable selection approach called random-frog recently proposed by our group, were employed to model quantitative structure-retention relationships. Internal and external validations were performed to ensure the stability and predictive ability. All three methods yielded acceptable models. The best results were obtained by support vector machine based on a small number of informative descriptors, with squared correlation coefficients for cross-validation of 0.9726, 0.9759, and 0.9331 on the dimethylsilicone stationary phase, the dimethylsilicone phase with 5% phenyl groups, and the PEG stationary phase, respectively. The performances of two variable selection approaches, random-frog and genetic algorithm, are compared. The importance of the variables was found to be consistent when estimated from correlation coefficients in multivariate linear regression equations and selection probability in model spaces. © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  13. Tools to support interpreting multiple regression in the face of multicollinearity.

    Science.gov (United States)

    Kraha, Amanda; Turner, Heather; Nimon, Kim; Zientek, Linda Reichwein; Henson, Robin K

    2012-01-01

    While multicollinearity may increase the difficulty of interpreting multiple regression (MR) results, it should not cause undue problems for the knowledgeable researcher. In the current paper, we argue that rather than using one technique to investigate regression results, researchers should consider multiple indices to understand the contributions that predictors make not only to a regression model, but to each other as well. Some of the techniques to interpret MR effects include, but are not limited to, correlation coefficients, beta weights, structure coefficients, all possible subsets regression, commonality coefficients, dominance weights, and relative importance weights. This article will review a set of techniques to interpret MR effects, identify the elements of the data on which the methods focus, and identify statistical software to support such analyses.

  14. Correlation factor, velocity autocorrelation function and frequency-dependent tracer diffusion coefficient

    NARCIS (Netherlands)

    Beijeren, H. van; Kehr, K.W.

    1986-01-01

    The correlation factor, defined as the ratio between the tracer diffusion coefficient in lattice gases and the diffusion coefficient for a corresponding uncorrelated random walk, is known to assume a very simple form under certain conditions. A simple derivation of this is given with the aid of

  15. The number of subjects per variable required in linear regression analyses.

    Science.gov (United States)

    Austin, Peter C; Steyerberg, Ewout W

    2015-06-01

    To determine the number of independent variables that can be included in a linear regression model. We used a series of Monte Carlo simulations to examine the impact of the number of subjects per variable (SPV) on the accuracy of estimated regression coefficients and standard errors, on the empirical coverage of estimated confidence intervals, and on the accuracy of the estimated R² of the fitted model. A minimum of approximately two SPV tended to result in estimation of regression coefficients with relative bias of less than 10%. Furthermore, with this minimum number of SPV, the standard errors of the regression coefficients were accurately estimated and estimated confidence intervals had approximately the advertised coverage rates. A much higher number of SPV were necessary to minimize bias in estimating the model R², although adjusted R² estimates behaved well. The bias in estimating the model R² statistic was inversely proportional to the magnitude of the proportion of variation explained by the population regression model. Linear regression models require only two SPV for adequate estimation of regression coefficients, standard errors, and confidence intervals. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.

  16. Exploring the Influence of Neighborhood Characteristics on Burglary Risks: A Bayesian Random Effects Modeling Approach

    Directory of Open Access Journals (Sweden)

    Hongqiang Liu

    2016-06-01

    A Bayesian random effects modeling approach was used to examine the influence of neighborhood characteristics on burglary risks in Jianghan District, Wuhan, China. This random effects model is essentially spatial; a spatially structured random effects term and an unstructured random effects term are added to the traditional non-spatial Poisson regression model. Based on social disorganization and routine activity theories, five covariates extracted from the available data at the neighborhood level were used in the modeling. Three regression models were fitted and compared by the deviance information criterion to identify which model best fit our data. A comparison of the results from the three models indicates that the Bayesian random effects model is superior to the non-spatial models in fitting the data and estimating regression coefficients. Our results also show that neighborhoods with above average bar density and department store density have higher burglary risks. Neighborhood-specific burglary risks and posterior probabilities of neighborhoods having a burglary risk greater than 1.0 were mapped, indicating the neighborhoods that should warrant more attention and be prioritized for crime intervention and reduction. Implications and limitations of the study are discussed in our concluding section.

  17. Regression and artificial neural network modeling for the prediction of gray leaf spot of maize.

    Science.gov (United States)

    Paul, P A; Munkvold, G P

    2005-04-01

    Regression and artificial neural network (ANN) modeling approaches were combined to develop models to predict the severity of gray leaf spot of maize, caused by Cercospora zeae-maydis. In all, 329 cases consisting of environmental, cultural, and location-specific variables were collected for field plots in Iowa between 1998 and 2002. Disease severity on the ear leaf at the dough to dent plant growth stage was used as the response variable. Correlation and regression analyses were performed to select potentially useful predictor variables. Predictors from the best 9 of 80 regression models were used to develop ANN models. A random sample of 60% of the cases was used to train the networks, and 20% each for testing and validation. Model performance was evaluated based on coefficient of determination (R²) and mean square error (MSE) for the validation data set. The best models had R² ranging from 0.70 to 0.75 and MSE ranging from 174.7 to 202.8. The most useful predictor variables were hours of daily temperatures between 22 and 30 degrees C (85.50 to 230.50 h) and hours of nightly relative humidity ≥90% (122 to 330 h) for the period between growth stages V4 and V12, mean nightly temperature (65.26 to 76.56 degrees C) for the period between growth stages V12 and R2, longitude (90.08 to 95.14 degrees W), maize residue on the soil surface (0 to 100%), planting date (in day of the year; 112 to 182), and gray leaf spot resistance rating (2 to 7; based on a 1-to-9 scale, where 1 = most susceptible to 9 = most resistant).

  18. Network clustering coefficient approach to DNA sequence analysis

    Energy Technology Data Exchange (ETDEWEB)

    Gerhardt, Guenther J.L. [Universidade Federal do Rio Grande do Sul-Hospital de Clinicas de Porto Alegre, Rua Ramiro Barcelos 2350/sala 2040/90035-003 Porto Alegre (Brazil); Departamento de Fisica e Quimica da Universidade de Caxias do Sul, Rua Francisco Getulio Vargas 1130, 95001-970 Caxias do Sul (Brazil); Lemke, Ney [Programa Interdisciplinar em Computacao Aplicada, Unisinos, Av. Unisinos, 950, 93022-000 Sao Leopoldo, RS (Brazil); Corso, Gilberto [Departamento de Biofisica e Farmacologia, Centro de Biociencias, Universidade Federal do Rio Grande do Norte, Campus Universitario, 59072 970 Natal, RN (Brazil)]. E-mail: corso@dfte.ufrn.br

    2006-05-15

    In this work we propose an alternative DNA sequence analysis tool based on graph theoretical concepts. The methodology investigates the path topology of an organism genome through a triplet network. In this network, triplets in the DNA sequence are vertices and two vertices are connected if they occur juxtaposed on the genome. We characterize this network topology by measuring the clustering coefficient. We test our methodology against two main biases: the guanine-cytosine (GC) content and the 3-bp (base pairs) periodicity of the DNA sequence. We perform the test by constructing random networks with variable GC content and imposed 3-bp periodicity. A test group of some organisms is constructed and we investigate the methodology in the light of the constructed random networks. We conclude that the clustering coefficient is a valuable tool since it gives information that is not trivially contained in the 3-bp periodicity nor in the variable GC content.
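
The triplet network described in this record can be sketched as follows. The abstract does not say how the sequence is cut into triplets, so the sketch assumes non-overlapping triplets read from the first base, treats the graph as undirected, ignores self-links, and uses the average local clustering coefficient; the function names are hypothetical.

```python
from itertools import combinations


def triplet_network(seq):
    """Adjacency sets: consecutive non-overlapping triplets are linked."""
    triplets = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    adj = {}
    for a, b in zip(triplets, triplets[1:]):
        adj.setdefault(a, set())
        adj.setdefault(b, set())
        if a != b:  # assumption: ignore self-loops
            adj[a].add(b)
            adj[b].add(a)
    return adj


def average_clustering(adj):
    """Mean over vertices of (links among neighbours) / (possible links)."""
    if not adj:
        return 0.0
    total = 0.0
    for neighbours in adj.values():
        k = len(neighbours)
        if k < 2:
            continue  # vertices with fewer than two neighbours count as 0
        links = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


if __name__ == "__main__":
    # Toy sequence: AAA -> CCC -> GGG -> AAA closes a triangle.
    print(average_clustering(triplet_network("AAACCCGGGAAA")))
```

Comparing this value with the same statistic on shuffled sequences that preserve GC content would mirror the random-network control described in the abstract.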

  19. Prediction, Regression and Critical Realism

    DEFF Research Database (Denmark)

    Næss, Petter

    2004-01-01

    This paper considers the possibility of prediction in land use planning, and the use of statistical research methods in analyses of relationships between urban form and travel behaviour. Influential writers within the tradition of critical realism reject the possibility of predicting social...... phenomena. This position is fundamentally problematic to public planning. Without at least some ability to predict the likely consequences of different proposals, the justification for public sector intervention into market mechanisms will be frail. Statistical methods like regression analyses are commonly...... seen as necessary in order to identify aggregate level effects of policy measures, but are questioned by many advocates of critical realist ontology. Using research into the relationship between urban structure and travel as an example, the paper discusses relevant research methods and the kinds...

  20. The adsorption coefficient (KOC) of chlorpyrifos in clay soil

    International Nuclear Information System (INIS)

    Halimah Muhamad; Nashriyah Mat; Tan Yew Ai; Ismail Sahid

    2005-01-01

    The purpose of this study was to determine the adsorption coefficient (KOC) of chlorpyrifos in clay soil by measuring the Freundlich adsorption coefficient (Kads(f)) and desorption coefficient (1/n value) of chlorpyrifos. The Freundlich adsorption coefficient (Kads(f)) and the coefficient of determination (r²) of the linear regression for the Freundlich adsorption isotherm of chlorpyrifos in the clay soil were 52.6 L/kg and 0.5244, respectively. Adsorption equilibrium was reached within 24 hours for clay soil, and this equilibration time was used to determine the effect of concentration on adsorption. The adsorption coefficient (KOC) of the clay soil was 2783 L/kg at an initial solution concentration of 1 μg/g, a soil-solution ratio of 1:5, a temperature of 30 °C, and a 24-hour equilibration between the soil matrix and the solution. The Kdes decreased over four repetitions of the desorption process. The chlorpyrifos residues may be strongly adsorbed onto the surface of clay. (Author)
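
    As a hedged illustration of how a Freundlich coefficient and its r² are usually obtained, the sketch below fits the log-linearized isotherm log q = log Kf + (1/n) log C by least squares and then normalizes a distribution coefficient by the organic carbon fraction. The data are synthetic and noise-free, generated from Kf = 52.6 L/kg with an assumed 1/n of 0.9, and the organic carbon fraction of 0.02 is hypothetical; none of these inputs other than 52.6 L/kg comes from the study.

```python
import math


def fit_freundlich(conc, sorbed):
    """Least-squares fit of log10(q) = log10(Kf) + (1/n) * log10(C).
    Returns (Kf, 1/n, r squared)."""
    x = [math.log10(c) for c in conc]
    y = [math.log10(q) for q in sorbed]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = sxy ** 2 / (sxx * syy)
    return 10 ** intercept, slope, r2


def koc_from_kd(kd, organic_carbon_fraction):
    """Normalise a distribution coefficient by the organic carbon fraction."""
    return kd / organic_carbon_fraction


# Synthetic, noise-free data (hypothetical concentrations)
conc = [0.1, 0.5, 1.0, 2.0, 5.0]
sorbed = [52.6 * c ** 0.9 for c in conc]
kf, inv_n, r2 = fit_freundlich(conc, sorbed)

# 0.02 is a hypothetical organic carbon fraction, not a value from the study
koc = koc_from_kd(52.6, 0.02)
```

    With real, noisy measurements r² would fall below 1, as the value of 0.5244 reported in the abstract shows.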

  1. New Inference Procedures for Semiparametric Varying-Coefficient Partially Linear Cox Models

    Directory of Open Access Journals (Sweden)

    Yunbei Ma

    2014-01-01

    In biomedical research, one major objective is to identify risk factors and study their risk impacts, as this identification can help clinicians to both properly make a decision and increase efficiency of treatments and resource allocation. A two-step penalized-based procedure is proposed to select linear regression coefficients for linear components and to identify significant nonparametric varying-coefficient functions for semiparametric varying-coefficient partially linear Cox models. It is shown that the penalized-based resulting estimators of the linear regression coefficients are asymptotically normal and have oracle properties, and the resulting estimators of the varying-coefficient functions have optimal convergence rates. A simulation study and an empirical example are presented for illustration.

  2. Application of principal component regression and partial least squares regression in ultraviolet spectrum water quality detection

    Science.gov (United States)

    Li, Jiangtong; Luo, Yongdao; Dai, Honglin

    2018-01-01

    Water is the essential foundation of all life. With industrialization, water pollution is becoming more and more frequent, which directly affects human survival and development. Water quality detection is one of the necessary measures to protect water resources. Ultraviolet (UV) spectral analysis is an important method in water quality detection, and partial least squares regression (PLSR) is becoming the predominant technique within it; in some special cases, however, PLSR produces considerable errors. To solve this problem, the traditional principal component regression (PCR) method was improved in this paper by using the principle of PLSR. The experimental results show that, for some special data sets, the improved PCR performs better than PLSR. PCR and PLSR are the focus of this paper. First, principal component analysis (PCA) is performed in MATLAB to reduce the dimensionality of the spectral data; on the basis of a large number of experiments, the optimized principal components, which carry most of the original data information, are extracted by using the principle of PLSR. Second, linear regression on the principal components is carried out with the Statistical Package for the Social Sciences (SPSS), from which the coefficients and relations of the principal components are obtained. Finally, the same water spectral data set is analyzed with PLSR and the improved PCR and the two results are compared: the improved PCR and PLSR give similar results for most data, but the improved PCR is better than PLSR for data near the detection limit. Both PLSR and the improved PCR can be used in UV spectral analysis of water, but for data near the detection limit the improved PCR gives better results than PLSR.

  3. Effective elasticity coefficients of native rocks and consolidated granular matter

    International Nuclear Information System (INIS)

    Schulz, Beatrix M.; Schulz, Michael

    2008-01-01

    The elastic coefficients of binary heterogeneous materials, such as several native rock materials or consolidated granular matter, are determined in terms of a perturbation expansion. To check the validity of the results, they are compared with numerical investigations using Boole's model of randomly distributed spheres. Finally, we apply the results to several classes of native rocks and consolidated granular materials.

  4. Extinction Coefficient of Gold Nanostars.

    Science.gov (United States)

    de Puig, Helena; Tam, Justina O; Yen, Chun-Wan; Gehrke, Lee; Hamad-Schifferli, Kimberly

    2015-07-30

    Gold nanostars (NStars) are highly attractive for biological applications due to their surface chemistry, facile synthesis and optical properties. Here, we synthesize NStars in HEPES buffer at different HEPES/Au ratios, producing NStars of different sizes and shapes, and therefore varying optical properties. We measure the extinction coefficient of the synthesized NStars at their maximum surface plasmon resonances (SPR), which range from 5.7 × 10⁸ to 26.8 × 10⁸ M⁻¹ cm⁻¹. Measured values correlate with those obtained from theoretical models of the NStars using the discrete dipole approximation (DDA), which we use to simulate the extinction spectra of the nanostars. Finally, because NStars are typically used in biological applications, we conjugate DNA and antibodies to the NStars and calculate the footprint of the bound biomolecules.
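
    The abstract does not say how the extinction coefficient was extracted. A common route, assumed here purely for illustration, is the Beer-Lambert law A = ε·c·l: absorbance at the SPR maximum is regressed on particle concentration with a line through the origin, and the slope is divided by the path length. The dilution series below is hypothetical.

```python
def extinction_coefficient(concentrations_molar, absorbances, path_cm=1.0):
    """Slope of absorbance versus concentration for a line through the
    origin, divided by the path length (Beer-Lambert: A = eps * c * l)."""
    num = sum(c * a for c, a in zip(concentrations_molar, absorbances))
    den = sum(c * c for c in concentrations_molar)
    return num / den / path_cm


# Hypothetical dilution series, not data from the paper
conc = [1e-10, 2e-10, 4e-10]   # particle concentration in mol/L
absorbance = [0.1, 0.2, 0.4]   # dimensionless, at the SPR maximum
eps = extinction_coefficient(conc, absorbance)  # in M^-1 cm^-1
```

    For these made-up numbers the slope is 10 × 10⁸ M⁻¹ cm⁻¹, which happens to lie inside the range quoted in the abstract.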

  5. Kerr scattering coefficients via isomonodromy

    Energy Technology Data Exchange (ETDEWEB)

    Cunha, Bruno Carneiro da [Departamento de Física, Universidade Federal de Pernambuco,50670-901, Recife, Pernambuco (Brazil); Novaes, Fábio [International Institute of Physics, Federal University of Rio Grande do Norte,Av. Odilon Gomes de Lima 1722, Capim Macio, Natal-RN 59078-400 (Brazil)

    2015-11-23

    We study the scattering of a massless scalar field in a generic Kerr background. Using a particular gauge choice based on the current conservation of the radial equation, we give a generic formula for the scattering coefficient in terms of the composite monodromy parameter σ between the inner and the outer horizons. Using the isomonodromy flow, we calculate σ exactly in terms of the Painlevé V τ-function. We also show that the eigenvalue problem for the angular equation (spheroidal harmonics) can be calculated using the same techniques. We use recent developments relating the Painlevé V τ-function to Liouville irregular conformal blocks to claim that this scattering problem is solved in the combinatorial sense, with known expressions for the τ-function near the critical points.

  6. Non-Markovian dynamics of quantum systems: formalism, transport coefficients

    International Nuclear Information System (INIS)

    Kanokov, Z.; Palchikov, Yu.V.; Antonenko, N.V.; Adamian, G.G.; Scheid, W.

    2004-01-01

    The generalized Lindblad equations with non-stationary transport coefficients are derived from the Langevin equations for the case of nonlinear non-Markovian noise [1]. The equations of motion for the collective coordinates are consistent with the generalized quantum fluctuation dissipation relations. The microscopic justification of the Lindblad axiomatic approach is performed. Explicit expressions for the time-dependent transport coefficients are presented for the case of FC- and RWA-oscillators and a general linear coupling in coordinate and in momentum between the collective subsystem and heat bath. The explicit equations for the correlation functions show that Onsager's regression hypothesis does not hold exactly for the non-Markovian equations of motion. However, under some conditions the regression of fluctuations goes to zero in the same manner as the average values. In the low and high temperature regimes we found that the dissipation leads to long-time tails in correlation functions in the RWA-oscillator. In the case of the FC-oscillator a non-exponential power-like decay of the correlation function in coordinate is obtained only in the low temperature limit. The calculated results depend rather weakly on the memory time in many applications. The transient times found for the diffusion coefficients D_pp(t), D_qp(t) and D_qq(t) are quite short. The value of the classical diffusion coefficient in momentum underestimates the asymptotic value of the quantum one, D_pp(t), but the asymptotic values of the classical σ_qq^c and quantum σ_qq second moments are close due to the negativity of the quantum mixed diffusion coefficient D_qp(t).

  7. Credit Scoring Problem Based on Regression Analysis

    OpenAIRE

    Khassawneh, Bashar Suhil Jad Allah

    2014-01-01

    ABSTRACT: This thesis provides an explanatory introduction to the regression models of data mining and contains basic definitions of key terms in the linear, multiple and logistic regression models. Meanwhile, the aim of this study is to illustrate fitting models for the credit scoring problem using simple linear, multiple linear and logistic regression models and also to analyze the found model functions by statistical tools. Keywords: Data mining, linear regression, logistic regression....

  8. Randomization tests

    CERN Document Server

    Edgington, Eugene

    2007-01-01

    Statistical Tests That Do Not Require Random Sampling Randomization Tests Numerical Examples Randomization Tests and Nonrandom Samples The Prevalence of Nonrandom Samples in Experiments The Irrelevance of Random Samples for the Typical Experiment Generalizing from Nonrandom Samples Intelligibility Respect for the Validity of Randomization Tests Versatility Practicality Precursors of Randomization Tests Other Applications of Permutation Tests Questions and Exercises Notes References Randomized Experiments Unique Benefits of Experiments Experimentation without Mani
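
    To make the book's subject concrete, here is a minimal sketch of an exact randomization test for a difference in means between two small groups. It enumerates every way the pooled observations could have been assigned to the treated group and reports the one-sided proportion of assignments with a difference at least as large as the observed one. The function name and the data are illustrative assumptions, not taken from the book.

```python
from itertools import combinations


def exact_randomization_p(treated, control):
    """One-sided exact randomization p-value for mean(treated) - mean(control)."""
    pooled = treated + control
    k = len(treated)
    observed = sum(treated) / k - sum(control) / len(control)
    count = total = 0
    for idx in combinations(range(len(pooled)), k):
        chosen = set(idx)
        g1 = [pooled[i] for i in chosen]
        g2 = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        diff = sum(g1) / len(g1) - sum(g2) / len(g2)
        total += 1
        # Small tolerance guards against floating-point ties
        if diff >= observed - 1e-12:
            count += 1
    return count / total


# Hypothetical measurements
p_value = exact_randomization_p([12, 14, 15], [8, 9, 10])
```

    With three units per group there are 20 possible assignments, so the smallest attainable one-sided p-value is 1/20. No random sampling from a population is assumed, only random assignment.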

  9. Model-based Quantile Regression for Discrete Data

    KAUST Repository

    Padellini, Tullia

    2018-04-10

    Quantile regression is a class of methods devoted to the modelling of conditional quantiles. In a Bayesian framework quantile regression has typically been carried out exploiting the Asymmetric Laplace Distribution as a working likelihood. Although this leads to a proper posterior for the regression coefficients, the resulting posterior variance is affected by an unidentifiable parameter, hence any inferential procedure besides point estimation is unreliable. We propose a model-based approach for quantile regression that considers quantiles of the generating distribution directly, and thus allows for a proper uncertainty quantification. We then create a link between quantile regression and generalised linear models by mapping the quantiles to the parameter of the response variable, and we exploit it to fit the model with R-INLA. We also extend it to the case of discrete responses, where there is no 1-to-1 relationship between quantiles and the distribution's parameter, by introducing continuous generalisations of the most common discrete variables (Poisson, Binomial and Negative Binomial) to be exploited in the fitting.
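
    For background, the check (pinball) loss that underlies classical quantile regression, and that corresponds to the Asymmetric Laplace working likelihood mentioned in this abstract, can be sketched as follows. This is not the model-based method the authors propose; it only shows, on hypothetical count data, that minimizing the check loss over candidate values recovers an empirical quantile.

```python
def pinball_loss(q, data, tau):
    """Average check loss of candidate value q at quantile level tau.
    Observations below q are weighted by (1 - tau), the rest by tau."""
    return sum((tau - (y < q)) * (y - q) for y in data) / len(data)


# Hypothetical discrete responses
counts = [0, 1, 1, 2, 2, 2, 3, 4, 6, 9]
tau = 0.75

# Search over the observed values only, which is enough for a sample quantile
best = min(sorted(set(counts)), key=lambda q: pinball_loss(q, counts, tau))
```

    Because the data are discrete, the loss is piecewise linear in `q` and several levels `tau` map to the same minimizer, which illustrates the lack of a 1-to-1 relationship noted in the abstract.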

  10. Regularized Label Relaxation Linear Regression.

    Science.gov (United States)

    Fang, Xiaozhao; Xu, Yong; Li, Xuelong; Lai, Zhihui; Wong, Wai Keung; Fang, Bingwu

    2018-04-01

    Linear regression (LR) and some of its variants have been widely used for classification problems. Most of these methods assume that during the learning phase, the training samples can be exactly transformed into a strict binary label matrix, which has too little freedom to fit the labels adequately. To address this problem, in this paper, we propose a novel regularized label relaxation LR method, which has the following notable characteristics. First, the proposed method relaxes the strict binary label matrix into a slack variable matrix by introducing a nonnegative label relaxation matrix into LR, which provides more freedom to fit the labels and simultaneously enlarges the margins between different classes as much as possible. Second, the proposed method constructs the class compactness graph based on manifold learning and uses it as the regularization item to avoid the problem of overfitting. The class compactness graph is used to ensure that the samples sharing the same labels can be kept close after they are transformed. Two different algorithms, which are, respectively, based on -norm and -norm loss functions are devised. These two algorithms have compact closed-form solutions in each iteration so that they are easily implemented. Extensive experiments show that these two algorithms outperform the state-of-the-art algorithms in terms of the classification accuracy and running time.

  11. Factorization of Transport Coefficients in Macroporous Media

    DEFF Research Database (Denmark)

    Shapiro, Alexander; Stenby, Erling Halfdan

    2000-01-01

    We prove the fundamental theorem about factorization of the phenomenological coefficients for transport in macroporous media. By factorization we mean the representation of the transport coefficients as products of geometric parameters of the porous medium and the parameters characteristic...

  12. Anomalous Seebeck coefficient in boron carbides

    International Nuclear Information System (INIS)

    Aselage, T.L.; Emin, D.; Wood, C.; Mackinnon, I.D.R.; Howard, I.A.

    1987-01-01

    Boron carbides exhibit an anomalously large Seebeck coefficient with a temperature coefficient that is characteristic of polaronic hopping between inequivalent sites. The inequivalence in the sites is associated with disorder in the solid. The temperature dependence of the Seebeck coefficient for materials prepared by different techniques provides insight into the nature of the disorder

  13. Soccer Ball Lift Coefficients via Trajectory Analysis

    Science.gov (United States)

    Goff, John Eric; Carre, Matt J.

    2010-01-01

    We performed experiments in which a soccer ball was launched from a machine while two high-speed cameras recorded portions of the trajectory. Using the trajectory data and published drag coefficients, we extracted lift coefficients for a soccer ball. We determined lift coefficients for a wide range of spin parameters, including several spin…

  14. Kepler AutoRegressive Planet Search (KARPS)

    Science.gov (United States)

    Caceres, Gabriel

    2018-01-01

    One of the main obstacles in detecting faint planetary transits is the intrinsic stellar variability of the host star. The Kepler AutoRegressive Planet Search (KARPS) project implements statistical methodology associated with autoregressive processes (in particular, ARIMA and ARFIMA) to model stellar lightcurves in order to improve exoplanet transit detection. We also develop a novel Transit Comb Filter (TCF) applied to the AR residuals which provides a periodogram analogous to the standard Box-fitting Least Squares (BLS) periodogram. We train a random forest classifier on known Kepler Objects of Interest (KOIs) using select features from different stages of this analysis, and then use ROC curves to define and calibrate the criteria to recover the KOI planet candidates with high fidelity. These statistical methods are detailed in a contributed poster (Feigelson et al., this meeting). These procedures are applied to the full DR25 dataset of NASA’s Kepler mission. Using the classification criteria, a vast majority of known KOIs are recovered and dozens of new KARPS Candidate Planets (KCPs) discovered, including ultra-short period exoplanets. The KCPs will be briefly presented and discussed.

  15. Symmetry chains and adaptation coefficients

    International Nuclear Information System (INIS)

    Fritzer, H.P.; Gruber, B.

    1985-01-01

    Given a symmetry chain of physical significance it becomes necessary to obtain states which transform properly with respect to the symmetries of the chain. In this article we describe a method which permits us to calculate symmetry-adapted quantum states with relative ease. The coefficients for the symmetry-adapted linear combinations are obtained, in numerical form, in terms of the original states of the system and can thus be represented in the form of numerical tables. In addition, one also obtains automatically the matrix elements for the operators of the symmetry groups which are involved, and thus for any physical operator which can be expressed either as an element of the algebra or of the enveloping algebra. The method is well suited for computers once the physically relevant symmetry chain, or chains, have been defined. While the method to be described is generally applicable to any physical system for which semisimple Lie algebras play a role we choose here a familiar example in order to illustrate the method and to illuminate its simplicity. We choose the nuclear shell model for the case of two nucleons with orbital angular momentum l = 1. While the states of the entire shell transform like the smallest spin representation of SO(25) we restrict our attention to its subgroup SU(6) × SU(2)_T. We determine the symmetry chains which lead to total angular momentum SU(2)_J and obtain the symmetry-adapted states for these chains.

  16. The Regression Analysis of Individual Financial Performance: Evidence from Croatia

    OpenAIRE

    Bahovec, Vlasta; Barbić, Dajana; Palić, Irena

    2017-01-01

    Background: A large body of empirical literature indicates that gender and financial literacy are significant determinants of individual financial performance. Objectives: The purpose of this paper is to recognize the impact of the variable financial literacy and the variable gender on the variation of the financial performance using the regression analysis. Methods/Approach: The survey was conducted using the systematically chosen random sample of Croatian financial consumers. The cross sect...

  17. Some effects of random dose measurement errors on analysis of atomic bomb survivor data

    International Nuclear Information System (INIS)

    Gilbert, E.S.

    1985-01-01

    The effects of random dose measurement errors on analyses of atomic bomb survivor data are described and quantified for several procedures. It is found that the ways in which measurement error is most likely to mislead are through downward bias in the estimated regression coefficients and through distortion of the shape of the dose-response curve. The magnitude of the bias with simple linear regression is evaluated for several dose treatments including the use of grouped and ungrouped data, analyses with and without truncation at 600 rad, and analyses which exclude doses exceeding 200 rad. Limited calculations have also been made for maximum likelihood estimation based on Poisson regression. 16 refs., 6 tabs
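
    The downward bias in regression coefficients described here can be reproduced with a small simulation. The sketch assumes classical additive measurement error with the same variance as the true dose, for which the expected slope shrinks by the factor var(dose) / (var(dose) + var(error)) = 0.5. The error model, sample size, and parameter values are assumptions for illustration and are not taken from the report.

```python
import random


def ols_slope(x, y):
    """Slope of the simple least-squares regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(0)  # fixed seed so the run is reproducible
true_slope = 2.0
n = 20000

true_dose = [rng.gauss(0.0, 1.0) for _ in range(n)]
response = [true_slope * d + rng.gauss(0.0, 1.0) for d in true_dose]
# Classical additive error: measured dose = true dose + independent noise
measured_dose = [d + rng.gauss(0.0, 1.0) for d in true_dose]

slope_true_dose = ols_slope(true_dose, response)
slope_measured = ols_slope(measured_dose, response)
expected_attenuation = 1.0 / (1.0 + 1.0)  # var(dose) / (var(dose) + var(error))
```

    Regressing on the error-prone dose gives a slope near `true_slope * expected_attenuation`, while regressing on the true dose recovers a value near `true_slope`.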

  18. Principal component regression analysis with SPSS.

    Science.gov (United States)

    Liu, R X; Kuang, J; Gong, Q; Hou, X L

    2003-06-01

    The paper introduces the indices used to diagnose multicollinearity, the basic principle of principal component regression, and a method for determining the 'best' equation. It uses an example to describe how to do principal component regression analysis with SPSS 10.0, including all calculation steps of the principal component regression and all operations of the linear regression, factor analysis, descriptives, compute variable and bivariate correlations procedures in SPSS 10.0. Principal component regression can be used to overcome the disturbance caused by multicollinearity. With SPSS, the analysis becomes simpler and faster while remaining accurate.

  19. Random Intercept and Random Slope 2-Level Multilevel Models

    Directory of Open Access Journals (Sweden)

    Rehan Ahmad Khan

    2012-11-01

    A random intercept model and a random intercept and random slope model, each with two levels of hierarchy in the population, are presented and compared with the traditional regression approach. The impact of students’ satisfaction on their grade point average (GPA) was explored with and without controlling for teachers' influence. The variation at level 1 can be controlled by introducing the higher levels of hierarchy in the model. The fanning of the fitted lines indicates that student grades vary across teachers.

  20. The intermediate endpoint effect in logistic and probit regression

    Science.gov (United States)

    MacKinnon, DP; Lockwood, CM; Brown, CH; Wang, W; Hoffman, JM

    2010-01-01

    Background: An intermediate endpoint is hypothesized to be in the middle of the causal sequence relating an independent variable to a dependent variable. The intermediate variable is also called a surrogate or mediating variable and the corresponding effect is called the mediated, surrogate endpoint, or intermediate endpoint effect. Clinical studies are often designed to change an intermediate or surrogate endpoint and through this intermediate change influence the ultimate endpoint. In many intermediate endpoint clinical studies the dependent variable is binary, and logistic or probit regression is used. Purpose: The purpose of this study is to describe a limitation of a widely used approach to assessing intermediate endpoint effects and to propose an alternative method, based on products of coefficients, that yields more accurate results. Methods: The intermediate endpoint model for a binary outcome is described for a true binary outcome and for a dichotomization of a latent continuous outcome. Plots of true values and a simulation study are used to evaluate the different methods. Results: Distorted estimates of the intermediate endpoint effect and incorrect conclusions can result from the application of widely used methods to assess the intermediate endpoint effect. The same problem occurs for the proportion of an effect explained by an intermediate endpoint, which has been suggested as a useful measure for identifying intermediate endpoints. A solution to this problem is given based on the relationship between latent variable modeling and logistic or probit regression. Limitations: More complicated intermediate variable models are not addressed in the study, although the methods described in the article can be extended to these more complicated models. Conclusions: Researchers are encouraged to use an intermediate endpoint method based on the product of regression coefficients. A common method based on difference in coefficient methods can lead to distorted
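
    The two estimators contrasted in this abstract are the product of coefficients (a·b) and the difference in coefficients (c − c′). As a baseline, the sketch below shows that with a continuous outcome and ordinary least squares the two are algebraically identical; the abstract's point is that this agreement does not carry over to logistic or probit regression. The simulated data and effect sizes are hypothetical, and the binary-outcome case is not reproduced here.

```python
import random


def center(v):
    """Subtract the mean from every element."""
    mean = sum(v) / len(v)
    return [a - mean for a in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


rng = random.Random(1)
n = 500
x = [rng.gauss(0, 1) for _ in range(n)]                      # independent variable
m = [0.5 * xi + rng.gauss(0, 1) for xi in x]                 # intermediate endpoint
y = [0.3 * xi + 0.7 * mi + rng.gauss(0, 1) for xi, mi in zip(x, m)]
xc, mc, yc = center(x), center(m), center(y)

a = dot(xc, mc) / dot(xc, xc)        # effect of X on M
c_total = dot(xc, yc) / dot(xc, xc)  # effect of X on Y, ignoring M

# Regress Y on X and M by solving the 2x2 normal equations (centred data)
sxx, smm, sxm = dot(xc, xc), dot(mc, mc), dot(xc, mc)
sxy, smy = dot(xc, yc), dot(mc, yc)
det = sxx * smm - sxm ** 2
c_direct = (sxy * smm - smy * sxm) / det  # effect of X on Y, adjusted for M
b = (smy * sxx - sxy * sxm) / det         # effect of M on Y, adjusted for X

product = a * b
difference = c_total - c_direct
```

    In this linear setting `product` and `difference` agree up to rounding error, so any divergence seen with a binary outcome comes from the nonlinear link rather than from the mediation model itself.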

  1. Energy coefficients for a propeller series

    DEFF Research Database (Denmark)

    Olsen, Anders Smærup

    2004-01-01

    The efficiency for a propeller is calculated by energy coefficients. These coefficients are related to four types of losses, i.e. the axial, the rotational, the frictional, and the finite blade number loss, and one gain, i.e. the axial gain. The energy coefficients are derived by use...... of the potential theory with the propeller modelled as an actuator disk. The efficiency based on the energy coefficients is calculated for a propeller series. The results show a good agreement between the efficiency based on the energy coefficients and the efficiency obtained by a vortex-lattice method....

  2. Statistical power analyses using G*Power 3.1: tests for correlation and regression analyses.

    Science.gov (United States)

    Faul, Franz; Erdfelder, Edgar; Buchner, Axel; Lang, Albert-Georg

    2009-11-01

    G*Power is a free power analysis program for a variety of statistical tests. We present extensions and improvements of the version introduced by Faul, Erdfelder, Lang, and Buchner (2007) in the domain of correlation and regression analyses. In the new version, we have added procedures to analyze the power of tests based on (1) single-sample tetrachoric correlations, (2) comparisons of dependent correlations, (3) bivariate linear regression, (4) multiple linear regression based on the random predictor model, (5) logistic regression, and (6) Poisson regression. We describe these new features and provide a brief introduction to their scope and handling.

  3. PENGARUH PENGUNGKAPAN CORPORATE SOCIAL RESPONSIBILITY TERHADAP EARNING RESPONSE COEFFICIENT

    Directory of Open Access Journals (Sweden)

    MI Mitha Dwi Restuti

    2012-03-01

    The purpose of this study is to determine whether Corporate Social Responsibility (CSR) disclosure has a negative effect on the Earnings Response Coefficient (ERC). Multiple regression was used to analyze the data. The sample comprised 150 companies listed on the Indonesia Stock Exchange in 2010. The results show that CSR disclosure did not influence the ERC. This suggests that investors did not yet treat the social information disclosed in companies' annual reports as information that could affect their investment decisions. Investors still regarded earnings information as more useful for assessing a company, and as better able to inform the stock returns they expect, than the social information disclosed by the company.

  4. Methodology update for determination of the erosion coefficient (Z)

    Directory of Open Access Journals (Sweden)

    Tošić Radislav

    2012-01-01

    Research on and mapping of the intensity of mechanical water erosion began with the empirical methodology of S. Gavrilović in the mid-twentieth century and has continued, with varying intensity, to the present. Decades of work on these issues pointed to some shortcomings of the existing methodology and thus to the need for its innovation. In this sense, R. Lazarević adjusted the empirical methodology of S. Gavrilović by changing the tables for determining the coefficients Φ, X and Y, that is, the tables for determining the mean erosion coefficient (Z). The main objective of this paper is to update the existing methodology for determining the erosion coefficient (Z), based on the empirical methodology of S. Gavrilović and the amendments made by R. Lazarević (1985), and to adapt it better to information technologies and the needs of modern society. The proposed procedure, that is, the model for determining the erosion coefficient (Z), is the result of ten years of scientific research and project work on mapping the intensity of mechanical water erosion and modeling it with various erosion models in the Republic of Srpska and Serbia. Analysis of the correlation between the results obtained by regression models and those obtained during erosion mapping in the Republic of Srpska established a high degree of correlation (R² = 0.9963), which indicates a good assessment by the proposed models.
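
    For orientation, the erosion coefficient in the Gavrilović method is commonly written as Z = Y · X · (Φ + √I), where Y is the soil erodibility coefficient, X the soil protection coefficient, Φ the coefficient of erosion type and extent, and I the mean slope. That formula is quoted from general knowledge of the method, not from this abstract, and the updated tables proposed in the paper are not reproduced. The sketch below assumes the slope is given as a fraction, and all input values are hypothetical.

```python
import math


def erosion_coefficient(y_erodibility, x_protection, phi_erosion_type, mean_slope):
    """Z = Y * X * (phi + sqrt(I)), with the mean slope I as a fraction.
    Assumed general form of the Gavrilovic formula; table values for
    Y, X and phi must come from the methodology itself."""
    if mean_slope < 0:
        raise ValueError("mean slope must be non-negative")
    return y_erodibility * x_protection * (phi_erosion_type + math.sqrt(mean_slope))


# Hypothetical inputs for a single map unit
z = erosion_coefficient(0.8, 0.6, 0.5, 0.25)
```

    Evaluating the function per map unit is the kind of step that lends itself to the information-technology workflow the abstract refers to.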

  5. Marginal regression analysis of recurrent events with coarsened censoring times.

    Science.gov (United States)

    Hu, X Joan; Rosychuk, Rhonda J

    2016-12-01

    Motivated by an ongoing pediatric mental health care (PMHC) study, this article presents weakly structured methods for analyzing doubly censored recurrent event data where only coarsened information on censoring is available. The study extracted administrative records of emergency department visits from provincial health administrative databases. The available information of each individual subject is limited to a subject-specific time window determined up to concealed data. To evaluate time-dependent effect of exposures, we adapt the local linear estimation with right censored survival times under the Cox regression model with time-varying coefficients (cf. Cai and Sun, Scandinavian Journal of Statistics 2003, 30, 93-111). We establish the pointwise consistency and asymptotic normality of the regression parameter estimator, and examine its performance by simulation. The PMHC study illustrates the proposed approach throughout the article. © 2016, The International Biometric Society.

  6. Regularized multivariate regression models with skew-t error distributions

    KAUST Repository

    Chen, Lianfu

    2014-06-01

    We consider regularization of the parameters in multivariate linear regression models with the errors having a multivariate skew-t distribution. An iterative penalized likelihood procedure is proposed for constructing sparse estimators of both the regression coefficient and inverse scale matrices simultaneously. The sparsity is introduced through penalizing the negative log-likelihood by adding L1-penalties on the entries of the two matrices. Taking advantage of the hierarchical representation of skew-t distributions, and using the expectation conditional maximization (ECM) algorithm, we reduce the problem to penalized normal likelihood and develop a procedure to minimize the ensuing objective function. Using a simulation study the performance of the method is assessed, and the methodology is illustrated using a real data set with a 24-dimensional response vector. © 2014 Elsevier B.V.

  7. Bayesian median regression for temporal gene expression data

    Science.gov (United States)

    Yu, Keming; Vinciotti, Veronica; Liu, Xiaohui; 't Hoen, Peter A. C.

    2007-09-01

    Most of the existing methods for the identification of biologically interesting genes in a temporal expression profiling dataset do not fully exploit the temporal ordering in the dataset and are based on normality assumptions for the gene expression. In this paper, we introduce a Bayesian median regression model to detect genes whose temporal profile is significantly different across a number of biological conditions. The regression model is defined by a polynomial function where both time and condition effects as well as interactions between the two are included. MCMC-based inference returns the posterior distribution of the polynomial coefficients. From this a simple Bayes factor test is proposed to test for significance. The estimation of the median rather than the mean, and within a Bayesian framework, increases the robustness of the method compared to a Hotelling T²-test previously suggested. This is shown on simulated data and on muscular dystrophy gene expression data.

  8. Two SPSS programs for interpreting multiple regression results.

    Science.gov (United States)

    Lorenzo-Seva, Urbano; Ferrando, Pere J; Chico, Eliseo

    2010-02-01

    When multiple regression is used in explanation-oriented designs, it is very important to determine both the usefulness of the predictor variables and their relative importance. Standardized regression coefficients are routinely provided by commercial programs. However, they generally function rather poorly as indicators of relative importance, especially in the presence of substantially correlated predictors. We provide two user-friendly SPSS programs that implement currently recommended techniques and recent developments for assessing the relevance of the predictors. The programs also allow the user to take into account the effects of measurement error. The first program, MIMR-Corr.sps, uses a correlation matrix as input, whereas the second program, MIMR-Raw.sps, uses the raw data and computes bootstrap confidence intervals of different statistics. The SPSS syntax, a short manual, and data files related to this article are available as supplemental materials from http://brm.psychonomic-journals.org/content/supplemental.

  9. Correlation of Cadmium Distribution Coefficients to Soil Characteristics

    DEFF Research Database (Denmark)

    Holm, Peter Engelund; Rootzen, Helle; Borggaard, Ole K.

    2003-01-01

    on whole soil samples have shown that pH is the main parameter controlling the distribution. To identify further the components that are important for Cd binding in soil we measured Cd distribution coefficients (K-d) at two fixed pH values and at low Cd loadings for 49 soils sampled in Denmark. The Kd...... values for Cd ranged from 5 to 3000 L kg(-1). The soils were described pedologically and characterized in detail (22 parameters) including determination of contents of the various minerals in the clay fraction. Correlating parameters were grouped and step-wise regression analysis revealed...... interlayered clay minerals [HIM], chlorite, quartz, microcline, plagioclase) were significant in explaining the Cd distribution coefficient....

  10. Unbalanced Regressions and the Predictive Equation

    DEFF Research Database (Denmark)

    Osterrieder, Daniela; Ventosa-Santaulària, Daniel; Vera-Valdés, J. Eduardo

    Predictive return regressions with persistent regressors are typically plagued by (asymptotically) biased/inconsistent estimates of the slope, non-standard or potentially even spurious statistical inference, and regression unbalancedness. We alleviate the problem of unbalancedness in the theoreti...

  11. Semiparametric regression during 2003–2007

    KAUST Repository

    Ruppert, David; Wand, M.P.; Carroll, Raymond J.

    2009-01-01

    Semiparametric regression is a fusion between parametric regression and nonparametric regression that integrates low-rank penalized splines, mixed model and hierarchical Bayesian methodology – thus allowing more streamlined handling of longitudinal and spatial correlation. We review progress in the field over the five-year period between 2003 and 2007. We find semiparametric regression to be a vibrant field with substantial involvement and activity, continual enhancement and widespread application.

  12. Gaussian process regression analysis for functional data

    CERN Document Server

    Shi, Jian Qing

    2011-01-01

    Gaussian Process Regression Analysis for Functional Data presents nonparametric statistical methods for functional regression analysis, specifically the methods based on a Gaussian process prior in a functional space. The authors focus on problems involving functional response variables and mixed covariates of functional and scalar variables. Covering the basics of Gaussian process regression, the first several chapters discuss functional data analysis, theoretical aspects based on the asymptotic properties of Gaussian process regression models, and new methodological developments for high dime

  13. Does higher education protect against obesity? Evidence using Mendelian randomization.

    Science.gov (United States)

    Böckerman, Petri; Viinikainen, Jutta; Pulkki-Råback, Laura; Hakulinen, Christian; Pitkänen, Niina; Lehtimäki, Terho; Pehkonen, Jaakko; Raitakari, Olli T

    2017-08-01

    The aim of this explorative study was to examine the effect of education on obesity using Mendelian randomization. Participants (N=2011) were from the on-going nationally representative Young Finns Study (YFS) that began in 1980 when six cohorts (aged 30, 33, 36, 39, 42 and 45 in 2007) were recruited. The average value of BMI (kg/m²) measurements in 2007 and 2011 and genetic information were linked to comprehensive register-based information on the years of education in 2007. We first used a linear regression (Ordinary Least Squares, OLS) to estimate the relationship between education and BMI. To identify a causal relationship, we exploited Mendelian randomization and used a genetic score as an instrument for education. The genetic score was based on 74 genetic variants that genome-wide association studies (GWASs) have found to be associated with the years of education. Because the genotypes are randomly assigned at conception, the instrument causes exogenous variation in the years of education and thus enables identification of causal effects. The years of education in 2007 were associated with lower BMI in 2007/2011 (regression coefficient (b)=-0.22; 95% Confidence Intervals [CI]=-0.29, -0.14) according to the linear regression results. The results based on Mendelian randomization suggest that there may be a negative causal effect of education on BMI (b=-0.84; 95% CI=-1.77, 0.09). The findings indicate that education could be a protective factor against obesity in advanced countries. Copyright © 2017 Elsevier Inc. All rights reserved.
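
    The contrast between the OLS and the instrument-based estimate can be illustrated with a minimal sketch of the single-instrument ratio (Wald) estimator. This is not the authors' analysis, which used a 74-variant genetic score; the function names and the four-point toy data below are assumptions made up for illustration, constructed so that the instrument z is uncorrelated with the unobserved confounder u.

```python
def covariance(a, b):
    """Sample covariance of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (len(a) - 1)


def ols_slope(x, y):
    """Slope from regressing y on x by ordinary least squares."""
    return covariance(x, y) / covariance(x, x)


def iv_slope(z, x, y):
    """Ratio (Wald) instrumental-variable estimate with a single instrument z."""
    return covariance(z, y) / covariance(z, x)


# Hypothetical toy data: u is an unobserved confounder, z an instrument
# that is uncorrelated with u by construction.
z = [0, 1, 0, 1]
u = [-1, -1, 1, 1]
x = [2 * zi + ui for zi, ui in zip(z, u)]          # exposure
y = [-0.5 * xi + 2 * ui for xi, ui in zip(x, u)]   # outcome; true effect of x is -0.5

print("OLS slope:", ols_slope(x, y))    # distorted by the confounder u
print("IV slope:", iv_slope(z, x, y))   # uses only the variation in x induced by z
```

    In this toy setting the OLS slope even has the wrong sign, while the ratio estimator recovers the effect used to generate the data. With real data the instrument-based estimate is typically much less precise, which is consistent with the wider confidence interval reported in the abstract.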

  14. Adjusting for Confounding in Early Postlaunch Settings: Going Beyond Logistic Regression Models.

    Science.gov (United States)

    Schmidt, Amand F; Klungel, Olaf H; Groenwold, Rolf H H

    2016-01-01

    Postlaunch data on medical treatments can be analyzed to explore adverse events or relative effectiveness in real-life settings. These analyses are often complicated by the number of potential confounders and the possibility of model misspecification. We conducted a simulation study to compare the performance of logistic regression, propensity score, disease risk score, and stabilized inverse probability weighting methods to adjust for confounding. Model misspecification was induced in the independent derivation dataset. We evaluated performance using relative bias and confidence interval coverage of the true effect, among other metrics. At low events per coefficient (1.0 and 0.5), the logistic regression estimates had a large relative bias (greater than -100%). Bias of the disease risk score estimates was at most 13.48% and 18.83%. For the propensity score model, this was 8.74% and >100%, respectively. At events per coefficient of 1.0 and 0.5, inverse probability weighting frequently failed or reduced to a crude regression, resulting in biases of -8.49% and 24.55%. Coverage of logistic regression estimates became less than the nominal level at events per coefficient ≤5. For the disease risk score, inverse probability weighting, and propensity score, coverage became less than nominal at events per coefficient ≤2.5, ≤1.0, and ≤1.0, respectively. Bias of misspecified disease risk score models was 16.55%. In settings with low events/exposed subjects per coefficient, disease risk score methods can be useful alternatives to logistic regression models, especially when propensity score models cannot be used. Despite better performance of disease risk score methods than logistic regression and propensity score models in small events per coefficient settings, bias, and coverage still deviated from nominal.

  15. Regression Analysis by Example. 5th Edition

    Science.gov (United States)

    Chatterjee, Samprit; Hadi, Ali S.

    2012-01-01

    Regression analysis is a conceptually simple method for investigating relationships among variables. Carrying out a successful application of regression analysis, however, requires a balance of theoretical results, empirical rules, and subjective judgment. "Regression Analysis by Example, Fifth Edition" has been expanded and thoroughly…

  16. A Seemingly Unrelated Poisson Regression Model

    OpenAIRE

    King, Gary

    1989-01-01

    This article introduces a new estimator for the analysis of two contemporaneously correlated endogenous event count variables. This seemingly unrelated Poisson regression model (SUPREME) estimator combines the efficiencies created by single equation Poisson regression model estimators and insights from "seemingly unrelated" linear regression models.

  17. Multiple Response Regression for Gaussian Mixture Models with Known Labels.

    Science.gov (United States)

    Lee, Wonyul; Du, Ying; Sun, Wei; Hayes, D Neil; Liu, Yufeng

    2012-12-01

    Multiple response regression is a useful regression technique to model multiple response variables using the same set of predictor variables. Most existing methods for multiple response regression are designed for modeling homogeneous data. In many applications, however, one may have heterogeneous data where the samples are divided into multiple groups. Our motivating example is a cancer dataset where the samples belong to multiple cancer subtypes. In this paper, we consider modeling the data coming from a mixture of several Gaussian distributions with known group labels. A naive approach is to split the data into several groups according to the labels and model each group separately. Although it is simple, this approach ignores potential common structures across different groups. We propose new penalized methods to model all groups jointly in which the common and unique structures can be identified. The proposed methods estimate the regression coefficient matrix, as well as the conditional inverse covariance matrix of response variables. Asymptotic properties of the proposed methods are explored. Through numerical examples, we demonstrate that both estimation and prediction can be improved by modeling all groups jointly using the proposed methods. An application to a glioblastoma cancer dataset reveals some interesting common and unique gene relationships across different cancer subtypes.

  18. Boosting structured additive quantile regression for longitudinal childhood obesity data.

    Science.gov (United States)

    Fenske, Nora; Fahrmeir, Ludwig; Hothorn, Torsten; Rzehak, Peter; Höhle, Michael

    2013-07-25

    Childhood obesity and the investigation of its risk factors has become an important public health issue. Our work is based on and motivated by a German longitudinal study including 2,226 children with up to ten measurements on their body mass index (BMI) and risk factors from birth to the age of 10 years. We introduce boosting of structured additive quantile regression as a novel distribution-free approach for longitudinal quantile regression. The quantile-specific predictors of our model include conventional linear population effects, smooth nonlinear functional effects, varying-coefficient terms, and individual-specific effects, such as intercepts and slopes. Estimation is based on boosting, a computer intensive inference method for highly complex models. We propose a component-wise functional gradient descent boosting algorithm that allows for penalized estimation of the large variety of different effects, particularly leading to individual-specific effects shrunken toward zero. This concept allows us to flexibly estimate the nonlinear age curves of upper quantiles of the BMI distribution, both on population and on individual-specific level, adjusted for further risk factors and to detect age-varying effects of categorical risk factors. Our model approach can be regarded as the quantile regression analog of Gaussian additive mixed models (or structured additive mean regression models), and we compare both model classes with respect to our obesity data.

  19. Geographically Weighted Logistic Regression Applied to Credit Scoring Models

    Directory of Open Access Journals (Sweden)

    Pedro Henrique Melo Albuquerque

    This study used real data from a Brazilian financial institution on transactions involving Consumer Direct Credit (CDC), granted to clients residing in the Distrito Federal (DF), to construct credit scoring models via Logistic Regression and Geographically Weighted Logistic Regression (GWLR) techniques. The aims were: to verify whether the factors that influence credit risk differ according to the borrower’s geographic location; to compare the set of models estimated via GWLR with the global model estimated via Logistic Regression, in terms of predictive power and financial losses for the institution; and to verify the viability of using the GWLR technique to develop credit scoring models. The metrics used to compare the models developed via the two techniques were the AICc informational criterion, the accuracy of the models, the percentage of false positives, the sum of the value of false positive debt, and the expected monetary value of portfolio default compared with the monetary value of defaults observed. The models estimated for each region in the DF were distinct in their variables and coefficients (parameters), leading to the conclusion that credit risk was influenced differently in each region in the study. The Logistic Regression and GWLR methodologies presented very close results, in terms of predictive power and financial losses for the institution, and the study demonstrated viability in using the GWLR technique to develop credit scoring models for the target population in the study.

  20. Intermediate and advanced topics in multilevel logistic regression analysis.

    Science.gov (United States)

    Austin, Peter C; Merlo, Juan

    2017-09-10

    Multilevel data occur frequently in health services, population and public health, and epidemiologic research. In such research, binary outcomes are common. Multilevel logistic regression models allow one to account for the clustering of subjects within clusters of higher-level units when estimating the effect of subject and cluster characteristics on subject outcomes. A search of the PubMed database demonstrated that the use of multilevel or hierarchical regression models is increasing rapidly. However, our impression is that many analysts simply use multilevel regression models to account for the nuisance of within-cluster homogeneity that is induced by clustering. In this article, we describe a suite of analyses that can complement the fitting of multilevel logistic regression models. These ancillary analyses permit analysts to estimate the marginal or population-average effect of covariates measured at the subject and cluster level, in contrast to the within-cluster or cluster-specific effects arising from the original multilevel logistic regression model. We describe the interval odds ratio and the proportion of opposed odds ratios, which are summary measures of effect for cluster-level covariates. We describe the variance partition coefficient and the median odds ratio which are measures of components of variance and heterogeneity in outcomes. These measures allow one to quantify the magnitude of the general contextual effect. We describe an R² measure that allows analysts to quantify the proportion of variation explained by different multilevel logistic regression models. We illustrate the application and interpretation of these measures by analyzing mortality in patients hospitalized with a diagnosis of acute myocardial infarction. © 2017 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
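
    Two of the measures named above have simple closed forms once the cluster-level (random intercept) variance of a multilevel logistic model has been estimated. The sketch below uses the commonly cited formulas: the median odds ratio as exp(sqrt(2σ²)·Φ⁻¹(0.75)), and the variance partition coefficient under the latent-variable formulation, where the subject-level variance is π²/3. The function names and the example variance are assumptions for illustration and are not taken from the article.

```python
import math
from statistics import NormalDist


def median_odds_ratio(cluster_variance):
    """Median odds ratio implied by a random-intercept variance (logit scale)."""
    z75 = NormalDist().inv_cdf(0.75)  # 75th percentile of the standard normal
    return math.exp(math.sqrt(2.0 * cluster_variance) * z75)


def variance_partition_coefficient(cluster_variance):
    """VPC under the latent-variable formulation (level-1 variance = pi^2 / 3)."""
    return cluster_variance / (cluster_variance + math.pi ** 2 / 3.0)


# Hypothetical estimate of the between-cluster variance on the logit scale.
sigma2 = 1.0
print("MOR:", round(median_odds_ratio(sigma2), 3))
print("VPC:", round(variance_partition_coefficient(sigma2), 3))
```

    A variance of zero gives a median odds ratio of exactly 1 and a variance partition coefficient of 0, i.e. no between-cluster heterogeneity; both measures grow as the cluster variance grows.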

  1. Direct modeling of regression effects for transition probabilities in the progressive illness-death model

    DEFF Research Database (Denmark)

    Azarang, Leyla; Scheike, Thomas; de Uña-Álvarez, Jacobo

    2017-01-01

    In this work, we present direct regression analysis for the transition probabilities in the possibly non-Markov progressive illness–death model. The method is based on binomial regression, where the response is the indicator of the occupancy for the given state along time. Randomly weighted score...

  2. Optimizing Prophylactic CPAP in Patients Without Obstructive Sleep Apnoea for High-Risk Abdominal Surgeries: A Meta-regression Analysis.

    Science.gov (United States)

    Singh, Preet Mohinder; Borle, Anuradha; Shah, Dipal; Sinha, Ashish; Makkar, Jeetinder Kaur; Trikha, Anjan; Goudra, Basavana Gouda

    2016-04-01

    Prophylactic continuous positive airway pressure (CPAP) can prevent pulmonary adverse events following upper abdominal surgeries. The present meta-regression evaluates and quantifies the effect of degree/duration of CPAP on the incidence of postoperative pulmonary events. Medical databases were searched for randomized controlled trials involving adult patients, comparing the outcome in those receiving prophylactic postoperative CPAP versus no CPAP, undergoing high-risk abdominal surgeries. Our meta-analysis evaluated the relationship between the postoperative pulmonary complications and the use of CPAP. Furthermore, meta-regression was used to quantify the effect of cumulative duration and degree of CPAP on the measured outcomes. Seventy-three potentially relevant studies were identified, of which 11 had appropriate data, allowing us to compare a total of 362 and 363 patients in CPAP and control groups, respectively. Qualitatively, the odds ratio for CPAP showed a protective effect for pneumonia [0.39 (0.19-0.78)], atelectasis [0.51 (0.32-0.80)] and pulmonary complications [0.37 (0.24-0.56)] with zero heterogeneity. For prevention of pulmonary complications, the odds ratio was better for continuous than intermittent CPAP. Meta-regression demonstrated a positive correlation between the degree of CPAP and the incidence of pneumonia with a regression coefficient of +0.61 (95% CI 0.02-1.21, P = 0.048, τ² = 0.078, R² = 7.87%). Overall, adverse effects were similar with or without the use of CPAP. Prophylactic postoperative use of continuous CPAP significantly reduces the incidence of postoperative pneumonia, atelectasis and pulmonary complications in patients undergoing high-risk abdominal surgeries. Quantitatively, increasing the CPAP levels does not necessarily enhance the protective effect against pneumonia. Instead, the protective effect diminishes with increasing degree of CPAP.

  3. Design and analysis of experiments classical and regression approaches with SAS

    CERN Document Server

    Onyiah, Leonard C

    2008-01-01

    Introductory Statistical Inference and Regression Analysis Elementary Statistical Inference Regression Analysis Experiments, the Completely Randomized Design (CRD)-Classical and Regression Approaches Experiments Experiments to Compare Treatments Some Basic Ideas Requirements of a Good Experiment One-Way Experimental Layout or the CRD: Design and Analysis Analysis of Experimental Data (Fixed Effects Model) Expected Values for the Sums of Squares The Analysis of Variance (ANOVA) Table Follow-Up Analysis to Check fo

  4. Regression with Sparse Approximations of Data

    DEFF Research Database (Denmark)

    Noorzad, Pardis; Sturm, Bob L.

    2012-01-01

    We propose sparse approximation weighted regression (SPARROW), a method for local estimation of the regression function that uses sparse approximation with a dictionary of measurements. SPARROW estimates the regression function at a point with a linear combination of a few regressands selected...... by a sparse approximation of the point in terms of the regressors. We show SPARROW can be considered a variant of k-nearest neighbors regression (k-NNR), and more generally, local polynomial kernel regression. Unlike k-NNR, however, SPARROW can adapt the number of regressors to use based...

  5. Spontaneous regression of a congenital melanocytic nevus

    Directory of Open Access Journals (Sweden)

    Amiya Kumar Nath

    2011-01-01

    Congenital melanocytic nevus (CMN) may rarely regress, and the regression may be associated with a halo or vitiligo. We describe a 10-year-old girl who presented with CMN on the left leg since birth, which recently started to regress spontaneously with associated depigmentation in the lesion and at a distant site. Dermoscopy performed at different sites of the regressing lesion demonstrated loss of epidermal pigments first followed by loss of dermal pigments. Histopathology and Masson-Fontana stain demonstrated lymphocytic infiltration and loss of pigment production in the regressing area. Immunohistochemistry staining (S100 and HMB-45), however, showed that nevus cells were present in the regressing areas.

  6. SPSS macros to compare any two fitted values from a regression model.

    Science.gov (United States)

    Weaver, Bruce; Dubois, Sacha

    2012-12-01

    In regression models with first-order terms only, the coefficient for a given variable is typically interpreted as the change in the fitted value of Y for a one-unit increase in that variable, with all other variables held constant. Therefore, each regression coefficient represents the difference between two fitted values of Y. But the coefficients represent only a fraction of the possible fitted value comparisons that might be of interest to researchers. For many fitted value comparisons that are not captured by any of the regression coefficients, common statistical software packages do not provide the standard errors needed to compute confidence intervals or carry out statistical tests, particularly in more complex models that include interactions, polynomial terms, or regression splines. We describe two SPSS macros that implement a matrix algebra method for comparing any two fitted values from a regression model. The !OLScomp and !MLEcomp macros are for use with models fitted via ordinary least squares and maximum likelihood estimation, respectively. The output from the macros includes the standard error of the difference between the two fitted values, a 95% confidence interval for the difference, and a corresponding statistical test with its p-value.
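
    The general idea behind such a comparison can be sketched as follows. With coefficient vector b and its estimated covariance matrix V, the difference between the fitted values at two covariate rows x1 and x2 is c'b with c = x1 - x2, and its standard error is sqrt(c'Vc). The code below is a standalone illustration of that standard result, not the macros' own syntax; the function name, the coefficients, and the covariance matrix are hypothetical, and the interval uses a normal critical value of 1.96 as a simplifying assumption.

```python
import math


def fitted_value_difference(x1, x2, coefs, cov):
    """Difference between two fitted values, its standard error, and a 95% interval.

    x1, x2 : covariate rows (including the intercept term) for the two fitted values
    coefs  : estimated regression coefficients
    cov    : estimated covariance matrix of the coefficients (list of lists)
    """
    c = [a - b for a, b in zip(x1, x2)]                      # contrast vector
    diff = sum(ci * bi for ci, bi in zip(c, coefs))          # c'b
    var = sum(c[i] * cov[i][j] * c[j]
              for i in range(len(c)) for j in range(len(c)))  # c'Vc
    se = math.sqrt(var)
    return diff, se, (diff - 1.96 * se, diff + 1.96 * se)


# Hypothetical quadratic model y = b0 + b1*x + b2*x^2, comparing x = 3 with x = 1.
coefs = [1.0, 2.0, 0.5]
cov = [[0.04, 0.0, 0.0],
       [0.0, 0.01, -0.001],
       [0.0, -0.001, 0.0025]]
x1 = [1.0, 3.0, 9.0]   # intercept, x, x^2 at x = 3
x2 = [1.0, 1.0, 1.0]   # intercept, x, x^2 at x = 1

diff, se, ci = fitted_value_difference(x1, x2, coefs, cov)
print(diff, se, ci)
```

    Because the comparison involves both the linear and the quadratic term, no single coefficient's standard error would give this interval; the off-diagonal covariance between the two terms enters through c'Vc.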

  7. A comparison of two indices for the intraclass correlation coefficient.

    Science.gov (United States)

    Shieh, Gwowen

    2012-12-01

    In the present study, we examined the behavior of two indices for measuring the intraclass correlation in the one-way random effects model: the prevailing ICC(1) (Fisher, 1938) and the corrected eta-squared (Bliese & Halverson, 1998). These two procedures differ both in their methods of estimating the variance components that define the intraclass correlation coefficient and in their performance of bias and mean squared error in the estimation of the intraclass correlation coefficient. In contrast with the natural unbiased principle used to construct ICC(1), in the present study it was analytically shown that the corrected eta-squared estimator is identical to the maximum likelihood estimator and the pairwise estimator under equal group sizes. Moreover, the empirical results obtained from the present Monte Carlo simulation study across various group structures revealed the mutual dominance relationship between their truncated versions for negative values. The corrected eta-squared estimator performs better than the ICC(1) estimator when the underlying population intraclass correlation coefficient is small. Conversely, ICC(1) has a clear advantage over the corrected eta-squared for medium and large magnitudes of population intraclass correlation coefficient. The conceptual description and numerical investigation provide guidelines to help researchers choose between the two indices for more accurate reliability analysis in multilevel research.
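
    For orientation, the ICC(1) index can be computed from the one-way ANOVA mean squares as (MSB - MSW) / (MSB + (n - 1)·MSW) when every group has n members. The sketch below covers only that equal-group-size case and only ICC(1); the corrected eta-squared estimator studied in the article is not reproduced here. The function name and the example data are assumptions for illustration.

```python
def icc1(groups):
    """ICC(1) from one-way ANOVA mean squares, assuming equal group sizes."""
    k = len(groups)        # number of groups
    n = len(groups[0])     # members per group
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes equal group sizes")
    grand_mean = sum(sum(g) for g in groups) / (k * n)
    group_means = [sum(g) / n for g in groups]
    ms_between = n * sum((m - grand_mean) ** 2 for m in group_means) / (k - 1)
    ms_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g) / (k * (n - 1))
    return (ms_between - ms_within) / (ms_between + (n - 1) * ms_within)


# Hypothetical ratings from three groups of three members each.
example_groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(icc1(example_groups))
```

    The estimate can be negative when the within-group mean square exceeds the between-group mean square, which is why the abstract refers to truncated versions of the estimators.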

  8. FITTING OF THE DATA FOR DIFFUSION COEFFICIENTS IN UNSATURATED POROUS MEDIA

    Energy Technology Data Exchange (ETDEWEB)

    B. Bullard

    1999-05-01

    The purpose of this calculation is to evaluate diffusion coefficients in unsaturated porous media for use in the TSPA-VA analyses. Using experimental data, regression techniques were used to curve fit the diffusion coefficient in unsaturated porous media as a function of volumetric water content. This calculation substantiates the model fit used in Total System Performance Assessment-1995 An Evaluation of the Potential Yucca Mountain Repository (TSPA-1995), Section 6.5.4.

  9. FITTING OF THE DATA FOR DIFFUSION COEFFICIENTS IN UNSATURATED POROUS MEDIA

    International Nuclear Information System (INIS)

    B. Bullard

    1999-01-01

    The purpose of this calculation is to evaluate diffusion coefficients in unsaturated porous media for use in the TSPA-VA analyses. Using experimental data, regression techniques were used to curve fit the diffusion coefficient in unsaturated porous media as a function of volumetric water content. This calculation substantiates the model fit used in Total System Performance Assessment-1995 An Evaluation of the Potential Yucca Mountain Repository (TSPA-1995), Section 6.5.4

  10. A drying coefficient for building materials

    DEFF Research Database (Denmark)

    Scheffler, Gregor Albrecht; Plagge, Rudolf

    2009-01-01

    coefficient is defined which can be determined based on measured drying data. The correlation of this coefficient with the water absorption and the vapour diffusion coefficient is analyzed and its additional information content is critically challenged. As result, a drying coefficient has been derived......The drying experiment is an important element of the hygrothermal characterisation of building materials. Contrary to other moisture transport experiments as the vapour diffusion and the water absorption test, it is until now not possible to derive a simple coefficient for the drying. However......, in many cases such a coefficient would be highly appreciated, e.g. in interaction of industry and research or for the distinction and selection of suitable building materials throughout design and practise. This article first highlights the importance of drying experiments for hygrothermal...

  11. Apparatus for measurement of coefficient of friction

    Science.gov (United States)

    Slifka, A. J.; Siegwarth, J. D.; Sparks, L. L.; Chaudhuri, Dilip K.

    1990-01-01

    An apparatus designed to measure the coefficient of friction in certain controlled atmospheres is described. The coefficient of friction observed during high-load tests was nearly constant, with an average value of 0.56. This value is in general agreement with that found in the literature and also with the initial friction coefficient value of 0.67 measured during self-mated friction of 440C steel in an oxygen environment.

  12. New definition of the cell diffusion coefficient

    International Nuclear Information System (INIS)

    Koehler, P.

    1975-01-01

    As was shown in a recent work by Gelbard, the usually applied Benoist definition of the cell diffusion coefficient gives two different values if two different definitions of the cell are made. A new definition is proposed that preserves the neutron balance for the homogenized lattice and that is independent of the cell definition. The resulting diffusion coefficient is identical with the main term of Benoist's diffusion coefficient

  13. Sample size determination for logistic regression on a logit-normal distribution.

    Science.gov (United States)

    Kim, Seongho; Heath, Elisabeth; Heilbrun, Lance

    2017-06-01

    Although the sample size for simple logistic regression can be readily determined using currently available methods, the sample size calculation for multiple logistic regression requires some additional information, such as the coefficient of determination (R²) of a covariate of interest with other covariates, which is often unavailable in practice. The response variable of logistic regression follows a logit-normal distribution which can be generated from a logistic transformation of a normal distribution. Using this property of logistic regression, we propose new methods of determining the sample size for simple and multiple logistic regressions using a normal transformation of outcome measures. Simulation studies and a motivating example show several advantages of the proposed methods over the existing methods: (i) no need for R² for multiple logistic regression, (ii) available interim or group-sequential designs, and (iii) much smaller required sample size.

  14. Mean centering, multicollinearity, and moderators in multiple regression: The reconciliation redux.

    Science.gov (United States)

    Iacobucci, Dawn; Schneider, Matthew J; Popovich, Deidre L; Bakamitsos, Georgios A

    2017-02-01

    In this article, we attempt to clarify our statements regarding the effects of mean centering. In a multiple regression with predictors A, B, and A × B (where A × B serves as an interaction term), mean centering A and B prior to computing the product term can clarify the regression coefficients (which is good) and the overall model fit R² will remain undisturbed (which is also good).
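
    The reason the fit is unchanged can be seen from a short expansion. The notation below (coefficients β for the uncentered model, γ for the centered model, sample means written with bars) is introduced here for illustration and is not taken from the article.

```latex
% Uncentered model:  y = \beta_0 + \beta_1 A + \beta_2 B + \beta_3 AB + \varepsilon
% Centered model:    y = \gamma_0 + \gamma_1 (A - \bar{A}) + \gamma_2 (B - \bar{B})
%                        + \gamma_3 (A - \bar{A})(B - \bar{B}) + \varepsilon
\begin{align*}
(A - \bar{A})(B - \bar{B}) &= AB - \bar{B}\,A - \bar{A}\,B + \bar{A}\bar{B}, \\
\beta_3 &= \gamma_3, \\
\beta_1 &= \gamma_1 - \gamma_3 \bar{B}, \\
\beta_2 &= \gamma_2 - \gamma_3 \bar{A}, \\
\beta_0 &= \gamma_0 - \gamma_1 \bar{A} - \gamma_2 \bar{B} + \gamma_3 \bar{A}\bar{B}.
\end{align*}
```

    The centered predictors are therefore linear combinations of 1, A, B and AB, so both models span the same column space and give identical fitted values, residuals and R². Only the lower-order coefficients change: γ1 = β1 + β3·B̄ is the slope of A evaluated at the mean of B rather than at B = 0, while the interaction coefficient is the same in both parameterizations.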

  15. Transfer coefficients in ultracold strongly coupled plasma

    Science.gov (United States)

    Bobrov, A. A.; Vorob'ev, V. S.; Zelener, B. V.

    2018-03-01

    We use both analytical and molecular dynamic methods for electron transfer coefficients in an ultracold plasma when its temperature is small and the coupling parameter characterizing the interaction of electrons and ions exceeds unity. For these conditions, we use the approach of nearest neighbor to determine the average electron (ion) diffusion coefficient and to calculate other electron transfer coefficients (viscosity and electrical and thermal conductivities). Molecular dynamics simulations produce electronic and ionic diffusion coefficients, confirming the reliability of these results. The results compare favorably with experimental and numerical data from earlier studies.

  16. Comparing linear probability model coefficients across groups

    DEFF Research Database (Denmark)

    Holm, Anders; Ejrnæs, Mette; Karlson, Kristian Bernt

    2015-01-01

    of the following three components: outcome truncation, scale parameters and distributional shape of the predictor variable. These results point to limitations in using linear probability model coefficients for group comparisons. We also provide Monte Carlo simulations and real examples to illustrate......This article offers a formal identification analysis of the problem in comparing coefficients from linear probability models between groups. We show that differences in coefficients from these models can result not only from genuine differences in effects, but also from differences in one or more...... these limitations, and we suggest a restricted approach to using linear probability model coefficients in group comparisons....

  17. Random walk on random walks

    NARCIS (Netherlands)

    Hilário, M.; Hollander, den W.Th.F.; Sidoravicius, V.; Soares dos Santos, R.; Teixeira, A.

    2014-01-01

    In this paper we study a random walk in a one-dimensional dynamic random environment consisting of a collection of independent particles performing simple symmetric random walks in a Poisson equilibrium with density ρ ∈ (0,∞). At each step the random walk performs a nearest-neighbour jump, moving to

  18. Applied regression analysis: a research tool

    CERN Document Server

    Pantula, Sastry; Dickey, David

    1998-01-01

    Least squares estimation, when used appropriately, is a powerful research tool. A deeper understanding of the regression concepts is essential for achieving optimal benefits from a least squares analysis. This book builds on the fundamentals of statistical methods and provides appropriate concepts that will allow a scientist to use least squares as an effective research tool. Applied Regression Analysis is aimed at the scientist who wishes to gain a working knowledge of regression analysis. The basic purpose of this book is to develop an understanding of least squares and related statistical methods without becoming excessively mathematical. It is the outgrowth of more than 30 years of consulting experience with scientists and many years of teaching an applied regression course to graduate students. Applied Regression Analysis serves as an excellent text for a service course on regression for non-statisticians and as a reference for researchers. It also provides a bridge between a two-semester introduction to...

  19. Modified loss coefficients in the determination of optimum generation scheduling

    Energy Technology Data Exchange (ETDEWEB)

    Hazarika, D.; Bordoloi, P.K. (Assam Engineering Coll. (IN))

    1991-03-01

    A modified method has been developed to form the loss coefficients of an electrical power system network by decoupling load and generation and thereby creating additional fictitious load buses. The system losses are then calculated and co-ordinated to arrive at an optimum scheduling of generation using the standard co-ordination equation. The method presented is superior to the ones currently available, in that it is applicable to a multimachine system with random variation of load and it accounts for limits in plant generations and line losses. The precise nature of the results and the economy in the cost of energy production obtained by this method are quantified and presented. (author).
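
    For readers unfamiliar with the standard co-ordination equation, the sketch below solves it for a two-unit system. It does not implement the modified loss-coefficient method of the paper, and it ignores generation limits. Assumed for illustration only: quadratic fuel costs F_i = b_i P_i + c_i P_i^2, the classical loss formula P_L = P^T B P with a symmetric matrix `B`, a bisection search on the incremental cost `lam`, and every numerical value.

```python
import numpy as np


def dispatch(b, c, B, demand, lam_hi=50.0, iters=200):
    """Solve dF_i/dP_i + lam * dP_L/dP_i = lam with sum(P) - P_L = demand (no unit limits)."""

    def solve_for(lam):
        # With F_i = b_i P_i + c_i P_i^2 and P_L = P^T B P the co-ordination
        # equation reads b_i + 2 c_i P_i = lam * (1 - 2 (B P)_i), linear in P.
        P = np.linalg.solve(2.0 * np.diag(c) + 2.0 * lam * B, lam - b)
        return P, float(P @ B @ P)

    # Bisection on lam; assumes the power balance changes sign on [min(b), lam_hi].
    lo, hi = float(b.min()), lam_hi
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        P, loss = solve_for(lam)
        if P.sum() - loss < demand:
            lo = lam  # delivered power too low: raise the incremental cost
        else:
            hi = lam
    return lam, P, loss


# Hypothetical two-unit data (costs per MWh, B in 1/MW, demand in MW).
b = np.array([5.3, 5.5])
c = np.array([0.004, 0.006])
B = np.array([[1.0e-4, 2.0e-5],
              [2.0e-5, 1.5e-4]])
demand = 300.0

lam, P, loss = dispatch(b, c, B, demand)
print("lambda:", round(lam, 4))
print("generation (MW):", np.round(P, 2), "losses (MW):", round(loss, 2))
```

    The returned schedule satisfies both the co-ordination equation and the power balance; handling plant limits would require fixing units at their bounds and re-solving for the remaining ones.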

  20. Resummed coefficient function for the shape function

    OpenAIRE

    Aglietti, U.

    2001-01-01

    We present a leading evaluation of the resummed coefficient function for the shape function. It is also shown that the coefficient function is short-distance-dominated. Our results allow relating the shape function computed on the lattice to the physical QCD distributions.