WorldWideScience

Sample records for regression test problems

  1. Summary of Documentation for DYNA3D-ParaDyn's Software Quality Assurance Regression Test Problems

    Energy Technology Data Exchange (ETDEWEB)

    Zywicz, Edward [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)]

    2016-08-18

    The Software Quality Assurance (SQA) regression test suite for DYNA3D (Zywicz and Lin, 2015) and ParaDyn (DeGroot, et al., 2015) currently contains approximately 600 problems divided into 21 suites, and is a required component of ParaDyn’s SQA plan (Ferencz and Oliver, 2013). The regression suite allows developers to ensure that software modifications do not unintentionally alter the code response. The entire regression suite is run prior to permanently incorporating any software modification or addition. When code modifications alter test problem results, the specific cause must be determined and fully understood before the software changes and revised test answers can be incorporated. The regression suite is executed on LLNL platforms using a Python script and an associated data file. The user specifies the DYNA3D or ParaDyn executable, number of processors to use, test problems to run, and other options to the script. The data file details how each problem and its answer extraction scripts are executed. For each problem in the regression suite there exists an input deck, an eight-processor partition file, an answer file, and various extraction scripts. These scripts assemble a temporary answer file in a specific format from the simulation results. The temporary and stored answer files are compared to a specific level of numerical precision, and when differences are detected the test problem is flagged as failed. Presently, numerical results are stored and compared to 16 digits. At this accuracy level different processor types, compilers, number of partitions, etc. impact the results to various degrees. Thus, for consistency purposes the regression suite is run with ParaDyn using 8 processors on machines with a specific processor type (currently the Intel Xeon E5530 processor). For non-parallel regression problems, i.e., the two XFEM problems, DYNA3D is used instead. When environments or platforms change, executables using the current source code and the new
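
    A minimal sketch, in Python, of the answer-comparison step described above. The "label value" answer-file format, the function names, and the tolerance handling are illustrative assumptions; the actual DYNA3D/ParaDyn extraction scripts and answer formats are not reproduced here.

      # Illustrative regression-suite answer check. The answer-file format
      # (one "label value" pair per line) is an assumption for this sketch.
      def read_answers(path):
          answers = {}
          with open(path) as f:
              for line in f:
                  label, value = line.split()
                  answers[label] = float(value)
          return answers

      def failed_answers(stored_path, temp_path, digits=16):
          """Return labels whose values differ beyond the stored precision."""
          stored = read_answers(stored_path)
          temp = read_answers(temp_path)
          tol = 10.0 ** (-digits)              # results stored to 16 digits
          return [label for label, ref in stored.items()
                  if label not in temp
                  or abs(temp[label] - ref) > tol * max(abs(ref), 1.0)]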

  2. DYNA3D/ParaDyn Regression Test Suite Inventory

    Energy Technology Data Exchange (ETDEWEB)

    Lin, Jerry I. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)]

    2016-09-01

    The following table constitutes an initial assessment of feature coverage across the regression test suite used for DYNA3D and ParaDyn. It documents the regression test suite at the time of preliminary release 16.1 in September 2016. The columns of the table represent groupings of functionalities, e.g., material models. Each problem in the test suite is represented by a row in the table. All features exercised by the problem are denoted by a check mark (√) in the corresponding column. The definition of “feature” has not been subdivided to its smallest unit of user input, e.g., algorithmic parameters specific to a particular type of contact surface. This reflects a judgment call: it gives code developers and users a reasonable impression of feature coverage without expanding the width of the table by several multiples. All regression testing is run in parallel, typically with eight processors, except for problems involving features only available in serial mode. Many problems are strictly regression tests, acting as a check that the codes continue to produce adequately repeatable results as development unfolds, compilers change and platforms are replaced. A subset of the tests represents true verification problems that have been checked against analytical or other benchmark solutions. Users are welcome to submit documented problems for inclusion in the test suite, especially if they are heavily exercising, and dependent upon, features that are currently underrepresented.

  3. A Powerful Test for Comparing Multiple Regression Functions.

    Science.gov (United States)

    Maity, Arnab

    2012-09-01

    In this article, we address the important problem of comparison of two or more population regression functions. Recently, Pardo-Fernández, Van Keilegom and González-Manteiga (2007) developed test statistics for simple nonparametric regression models: Y(ij) = θ(j)(Z(ij)) + σ(j)(Z(ij))∊(ij), based on empirical distributions of the errors in each population j = 1, … , J. In this paper, we propose a test for equality of the θ(j)(·) based on the concept of generalized likelihood ratio type statistics. We also generalize our test to other nonparametric regression setups, e.g., nonparametric logistic regression, where the loglikelihood for population j is any general smooth function [Formula: see text]. We describe a resampling procedure to obtain the critical values of the test. In addition, we present a simulation study to evaluate the performance of the proposed test and compare our results to those in Pardo-Fernández et al. (2007).

  4. Credit Scoring Problem Based on Regression Analysis

    OpenAIRE

    Khassawneh, Bashar Suhil Jad Allah

    2014-01-01

    ABSTRACT: This thesis provides an explanatory introduction to the regression models of data mining and contains basic definitions of key terms in the linear, multiple and logistic regression models. The aim of this study is to illustrate fitting models for the credit scoring problem using simple linear, multiple linear and logistic regression models, and to analyze the fitted model functions with statistical tools. Keywords: Data mining, linear regression, logistic regression....

  5. Polynomial regression analysis and significance test of the regression function

    International Nuclear Information System (INIS)

    Gao Zhengming; Zhao Juan; He Shengping

    2012-01-01

    In order to analyze the decay heating power of a certain radioactive isotope per kilogram with the polynomial regression method, the paper firstly demonstrated the broad usage of polynomial functions and deduced their parameters with the ordinary least squares estimate. Then a significance test method for the polynomial regression function is derived, considering the similarity between the polynomial regression model and the multivariable linear regression model. Finally, polynomial regression analysis and a significance test of the polynomial function are carried out for the decay heating power of the isotope per kilogram, in accordance with the authors' real work. (authors)
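
    As a concrete illustration of the procedure, the sketch below fits a polynomial by ordinary least squares and forms the overall F statistic used to test the significance of the regression function. The decay-power data are synthetic placeholders, not the authors' isotope data.

      import numpy as np

      # Polynomial regression via ordinary least squares, plus the overall
      # F statistic for the significance test; the data are synthetic.
      rng = np.random.default_rng(0)
      t = np.linspace(0.0, 10.0, 50)
      y = 5.0 * np.exp(-0.3 * t) + rng.normal(0.0, 0.05, t.size)

      k = 3                                       # polynomial degree
      X = np.vander(t, k + 1, increasing=True)    # columns 1, t, ..., t^k
      beta, *_ = np.linalg.lstsq(X, y, rcond=None)

      yhat = X @ beta
      sse = np.sum((y - yhat) ** 2)               # residual sum of squares
      ssr = np.sum((yhat - y.mean()) ** 2)        # regression sum of squares
      F = (ssr / k) / (sse / (y.size - k - 1))    # refer to F(k, n - k - 1)
      print(beta, F)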

  6. Bayesian nonlinear regression for large p small n problems

    KAUST Repository

    Chakraborty, Sounak; Ghosh, Malay; Mallick, Bani K.

    2012-01-01

    Statistical modeling and inference problems with sample sizes substantially smaller than the number of available covariates are challenging. This is known as the large p, small n problem. Furthermore, the problem is more complicated when we have multiple correlated responses. We develop multivariate nonlinear regression models in this setup for accurate prediction. In this paper, we introduce a full Bayesian support vector regression model with Vapnik's ε-insensitive loss function, based on reproducing kernel Hilbert spaces (RKHS) under the multivariate correlated response setup. This provides a full probabilistic description of the support vector machine (SVM) rather than an algorithm for fitting purposes. We have also introduced a multivariate version of the relevance vector machine (RVM). Instead of the original treatment of the RVM relying on the use of type II maximum likelihood estimates of the hyper-parameters, we put a prior on the hyper-parameters and use Markov chain Monte Carlo techniques for computation. We have also proposed an empirical Bayes method for our RVM and SVM. Our methods are illustrated with a prediction problem in near-infrared (NIR) spectroscopy. A simulation study is also undertaken to check the prediction accuracy of our models. © 2012 Elsevier Inc.

  7. Bayesian nonlinear regression for large p small n problems

    KAUST Repository

    Chakraborty, Sounak

    2012-07-01

    Statistical modeling and inference problems with sample sizes substantially smaller than the number of available covariates are challenging. This is known as the large p, small n problem. Furthermore, the problem is more complicated when we have multiple correlated responses. We develop multivariate nonlinear regression models in this setup for accurate prediction. In this paper, we introduce a full Bayesian support vector regression model with Vapnik's ε-insensitive loss function, based on reproducing kernel Hilbert spaces (RKHS) under the multivariate correlated response setup. This provides a full probabilistic description of the support vector machine (SVM) rather than an algorithm for fitting purposes. We have also introduced a multivariate version of the relevance vector machine (RVM). Instead of the original treatment of the RVM relying on the use of type II maximum likelihood estimates of the hyper-parameters, we put a prior on the hyper-parameters and use Markov chain Monte Carlo techniques for computation. We have also proposed an empirical Bayes method for our RVM and SVM. Our methods are illustrated with a prediction problem in near-infrared (NIR) spectroscopy. A simulation study is also undertaken to check the prediction accuracy of our models. © 2012 Elsevier Inc.

  8. Testing for constant nonparametric effects in general semiparametric regression models with interactions

    KAUST Repository

    Wei, Jiawei; Carroll, Raymond J.; Maity, Arnab

    2011-01-01

    We consider the problem of testing for a constant nonparametric effect in a general semi-parametric regression model when there is the potential for interaction between the parametrically and nonparametrically modeled variables. The work

  9. Significance testing in ridge regression for genetic data

    Directory of Open Access Journals (Sweden)

    De Iorio Maria

    2011-09-01

    Background: Technological developments have increased the feasibility of large-scale genetic association studies. Densely typed genetic markers are obtained using SNP arrays, next-generation sequencing technologies and imputation. However, SNPs typed using these methods can be highly correlated due to linkage disequilibrium among them, and standard multiple regression techniques fail with these data sets due to their high dimensionality and correlation structure. There has been increasing interest in using penalised regression in the analysis of high-dimensional data. Ridge regression is one such penalised regression technique which does not perform variable selection, instead estimating a regression coefficient for each predictor variable. It is therefore desirable to obtain an estimate of the significance of each ridge regression coefficient. Results: We develop and evaluate a test of significance for ridge regression coefficients. Using simulation studies, we demonstrate that the performance of the test is comparable to that of a permutation test, with the advantage of a much-reduced computational cost. We introduce the p-value trace, a plot of the negative logarithm of the p-values of ridge regression coefficients with increasing shrinkage parameter, which enables the visualisation of the change in p-value of the regression coefficients with increasing penalisation. We apply the proposed method to a lung cancer case-control data set from EPIC, the European Prospective Investigation into Cancer and Nutrition. Conclusions: The proposed test is a useful alternative to a permutation test for the estimation of the significance of ridge regression coefficients, at a much-reduced computational cost. The p-value trace is an informative graphical tool for evaluating the results of a test of significance of ridge regression coefficients as the shrinkage parameter increases, and the proposed test makes its production computationally feasible.
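
    The p-value trace idea can be sketched with a Wald-type approximation to the significance of ridge coefficients, as below. This is only one plausible construction and not necessarily the exact test developed in the paper; the data are simulated.

      import numpy as np
      from scipy import stats

      # Ridge fit and an approximate Wald test across a grid of shrinkage
      # parameters, tracing -log10(p) for one coefficient. Illustrative only.
      rng = np.random.default_rng(0)
      n, p = 200, 10
      X = rng.standard_normal((n, p))
      y = 0.5 * X[:, 0] + rng.standard_normal(n)

      XtX = X.T @ X
      for lam in (0.1, 1.0, 10.0, 100.0):
          A = np.linalg.inv(XtX + lam * np.eye(p))
          beta = A @ X.T @ y
          resid = y - X @ beta
          sigma2 = resid @ resid / (n - p)        # rough variance estimate
          cov = sigma2 * A @ XtX @ A              # approximate Var(beta_ridge)
          z = beta / np.sqrt(np.diag(cov))
          pval = 2 * stats.norm.sf(abs(z[0]))     # p-value of the true signal
          print(lam, -np.log10(pval))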

  10. Bias in logistic regression due to imperfect diagnostic test results and practical correction approaches.

    Science.gov (United States)

    Valle, Denis; Lima, Joanna M Tucker; Millar, Justin; Amratia, Punam; Haque, Ubydul

    2015-11-04

    Logistic regression is a statistical model widely used in cross-sectional and cohort studies to identify and quantify the effects of potential disease risk factors. However, the impact of imperfect tests on adjusted odds ratios (and thus on the identification of risk factors) is under-appreciated. The purpose of this article is to draw attention to the problem associated with modelling imperfect diagnostic tests, and propose simple Bayesian models to adequately address this issue. A systematic literature review was conducted to determine the proportion of malaria studies that appropriately accounted for false-negatives/false-positives in a logistic regression setting. Inference from the standard logistic regression was also compared with that from three proposed Bayesian models using simulations and malaria data from the western Brazilian Amazon. A systematic literature review suggests that malaria epidemiologists are largely unaware of the problem of using logistic regression to model imperfect diagnostic test results. Simulation results reveal that statistical inference can be substantially improved when using the proposed Bayesian models versus the standard logistic regression. Finally, analysis of original malaria data with one of the proposed Bayesian models reveals that microscopy sensitivity is strongly influenced by how long people have lived in the study region, and an important risk factor (i.e., participation in forest extractivism) is identified that would have been missed by standard logistic regression. Given the numerous diagnostic methods employed by malaria researchers and the ubiquitous use of logistic regression to model the results of these diagnostic tests, this paper provides critical guidelines to improve data analysis practice in the presence of misclassification error. Easy-to-use code that can be readily adapted to WinBUGS is provided, enabling straightforward implementation of the proposed Bayesian models.
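
    The core correction can be written down directly: the probability of observing a positive result mixes true and false positives. The sketch below fits that corrected likelihood by maximum likelihood with sensitivity and specificity assumed known; the paper's Bayesian models place priors on these quantities instead, and the data here are simulated.

      import numpy as np
      from scipy.optimize import minimize
      from scipy.special import expit

      # P(observed positive) = Se * p + (1 - Sp) * (1 - p), p = expit(X @ b).
      rng = np.random.default_rng(1)
      n = 1000
      X = np.column_stack([np.ones(n), rng.standard_normal(n)])
      p_true = expit(X @ np.array([-1.0, 0.8]))
      Se, Sp = 0.90, 0.95                       # assumed-known test properties
      y = rng.binomial(1, Se * p_true + (1 - Sp) * (1 - p_true))

      def negloglik(beta):
          p = expit(X @ beta)
          q = Se * p + (1 - Sp) * (1 - p)       # prob. of a positive result
          return -np.sum(y * np.log(q) + (1 - y) * np.log1p(-q))

      fit = minimize(negloglik, x0=np.zeros(2))
      print(fit.x)      # risk-factor effects corrected for test imperfection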

  11. Constitutive Theories of Self-Knowledge and the Regress Problem ...

    African Journals Online (AJOL)

    ... on the other hand, hold that self-knowledge is constitutive of intentional states. That is, self-ascription is a necessary condition for being in a particular mental state. Akeel Bilgrami is a defender of the constitutive model. I argue that the constitutive model gives rise to a regress problem. This paper will focus on that problem ...

  12. Testing homogeneity in Weibull-regression models.

    Science.gov (United States)

    Bolfarine, Heleno; Valença, Dione M

    2005-10-01

    In survival studies with families or geographical units it may be of interest to test whether such groups are homogeneous for given explanatory variables. In this paper we consider score-type tests for group homogeneity based on a mixing model in which the group effect is modelled as a random variable. As opposed to hazard-based frailty models, this model yields survival times that, conditioned on the random effect, have an accelerated failure time representation. The test statistic requires only estimation of the conventional regression model without the random effect and does not require specifying the distribution of the random effect. The tests are derived for a Weibull regression model, and in the uncensored situation a closed form is obtained for the test statistic. A simulation study is used for comparing the power of the tests. The proposed tests are applied to real data sets with censored data.

  13. HYBRID DATA APPROACH FOR SELECTING EFFECTIVE TEST CASES DURING THE REGRESSION TESTING

    OpenAIRE

    Mohan, M.; Shrimali, Tarun

    2017-01-01

    In the software industry, software testing has become more important across the entire software development life cycle. Software testing is one of the fundamental components of software quality assurance. The Software Testing Life Cycle (STLC) is a process involved in testing the complete software, which includes Regression Testing, Unit Testing, Smoke Testing, Integration Testing, Interface Testing, System Testing, etc. In the STLC of Regression testing, test case selection is one of the most importan...

  14. Computing group cardinality constraint solutions for logistic regression problems.

    Science.gov (United States)

    Zhang, Yong; Kwon, Dongjin; Pohl, Kilian M

    2017-01-01

    We derive an algorithm to directly solve logistic regression based on cardinality constraint, group sparsity and use it to classify intra-subject MRI sequences (e.g. cine MRIs) of healthy from diseased subjects. Group cardinality constraint models are often applied to medical images in order to avoid overfitting of the classifier to the training data. Solutions within these models are generally determined by relaxing the cardinality constraint to a weighted feature selection scheme. However, these solutions relate to the original sparse problem only under specific assumptions, which generally do not hold for medical image applications. In addition, inferring clinical meaning from features weighted by a classifier is an ongoing topic of discussion. To avoid weighting features, we propose to directly solve the group cardinality constraint logistic regression problem by generalizing the Penalty Decomposition method. To do so, we assume that an intra-subject series of images represents repeated samples of the same disease patterns. We model this assumption by combining series of measurements created by a feature across time into a single group. Our algorithm then derives a solution within that model by decoupling the minimization of the logistic regression function from enforcing the group sparsity constraint. The minimum to the smooth and convex logistic regression problem is determined via gradient descent while we derive a closed form solution for finding a sparse approximation of that minimum. We apply our method to cine MRI of 38 healthy controls and 44 adult patients who received reconstructive surgery of Tetralogy of Fallot (TOF) during infancy. Our method correctly identifies regions impacted by TOF and generally obtains statistically significant higher classification accuracy than alternative solutions to this model, i.e., ones relaxing group cardinality constraints. Copyright © 2016 Elsevier B.V. All rights reserved.
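
    The closed-form sparse approximation mentioned above amounts, in its generic form, to keeping the groups with the largest norms. The sketch below shows that projection step only; the paper's full penalty-decomposition algorithm alternates it with gradient descent on the logistic loss, and the group labels here are invented.

      import numpy as np

      # Project beta onto vectors with at most k active groups by keeping
      # the k groups of largest Euclidean norm (a generic sketch).
      def project_group_cardinality(beta, groups, k):
          norms = {g: np.linalg.norm(beta[idx]) for g, idx in groups.items()}
          keep = sorted(norms, key=norms.get, reverse=True)[:k]
          out = np.zeros_like(beta)
          for g in keep:
              out[groups[g]] = beta[groups[g]]
          return out

      beta = np.array([0.9, -0.1, 0.05, 1.2, 0.7, 0.0])
      groups = {"roi_a": [0, 1], "roi_b": [2, 3], "roi_c": [4, 5]}
      print(project_group_cardinality(beta, groups, k=2))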

  15. Multiple regression for physiological data analysis: the problem of multicollinearity.

    Science.gov (United States)

    Slinker, B K; Glantz, S A

    1985-07-01

    Multiple linear regression, in which several predictor variables are related to a response variable, is a powerful statistical tool for gaining quantitative insight into complex in vivo physiological systems. For these insights to be correct, all predictor variables must be uncorrelated. However, in many physiological experiments the predictor variables cannot be precisely controlled and thus change in parallel (i.e., they are highly correlated). There is a redundancy of information about the response, a situation called multicollinearity, that leads to numerical problems in estimating the parameters in regression equations; the parameters are often of incorrect magnitude or sign or have large standard errors. Although multicollinearity can be avoided with good experimental design, not all interesting physiological questions can be studied without encountering multicollinearity. In these cases various ad hoc procedures have been proposed to mitigate multicollinearity. Although many of these procedures are controversial, they can be helpful in applying multiple linear regression to some physiological problems.
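
    Multicollinearity of the kind described here is commonly screened with variance inflation factors, a standard diagnostic rather than anything specific to this paper; a short sketch:

      import numpy as np

      # VIF_j = 1 / (1 - R_j^2), where R_j^2 is from regressing predictor j
      # on the remaining predictors; values above ~10 are a common warning.
      def vif(X):
          out = []
          for j in range(X.shape[1]):
              xj = X[:, j]
              others = np.delete(X, j, axis=1)
              A = np.column_stack([np.ones(len(xj)), others])
              coef, *_ = np.linalg.lstsq(A, xj, rcond=None)
              resid = xj - A @ coef
              r2 = 1.0 - resid @ resid / np.sum((xj - xj.mean()) ** 2)
              out.append(1.0 / (1.0 - r2))
          return np.array(out)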

  16. Testing for marginal linear effects in quantile regression

    KAUST Repository

    Wang, Huixia Judy

    2017-10-23

    The paper develops a new marginal testing procedure to detect significant predictors that are associated with the conditional quantiles of a scalar response. The idea is to fit the marginal quantile regression on each predictor one at a time, and then to base the test on the t-statistics that are associated with the most predictive predictors. A resampling method is devised to calibrate this test statistic, which has non-regular limiting behaviour due to the selection of the most predictive variables. Asymptotic validity of the procedure is established in a general quantile regression setting in which the marginal quantile regression models can be misspecified. Even though a fixed dimension is assumed to derive the asymptotic results, the test proposed is applicable and computationally feasible for large dimensional predictors. The method is more flexible than existing marginal screening test methods based on mean regression and has the added advantage of being robust against outliers in the response. The approach is illustrated by using an application to a human immunodeficiency virus drug resistance data set.

  17. Testing for marginal linear effects in quantile regression

    KAUST Repository

    Wang, Huixia Judy; McKeague, Ian W.; Qian, Min

    2017-01-01

    The paper develops a new marginal testing procedure to detect significant predictors that are associated with the conditional quantiles of a scalar response. The idea is to fit the marginal quantile regression on each predictor one at a time, and then to base the test on the t-statistics that are associated with the most predictive predictors. A resampling method is devised to calibrate this test statistic, which has non-regular limiting behaviour due to the selection of the most predictive variables. Asymptotic validity of the procedure is established in a general quantile regression setting in which the marginal quantile regression models can be misspecified. Even though a fixed dimension is assumed to derive the asymptotic results, the test proposed is applicable and computationally feasible for large dimensional predictors. The method is more flexible than existing marginal screening test methods based on mean regression and has the added advantage of being robust against outliers in the response. The approach is illustrated by using an application to a human immunodeficiency virus drug resistance data set.
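
    The marginal fitting step described in these two records is easy to sketch; below, statsmodels' QuantReg stands in for the authors' implementation, and the resampling calibration of the maximum |t|, the paper's key contribution, is omitted. The data are simulated.

      import numpy as np
      import statsmodels.api as sm

      # Fit the tau-th quantile of y on each predictor one at a time and
      # record the slope t-statistic.
      rng = np.random.default_rng(2)
      n, p, tau = 300, 20, 0.5
      X = rng.standard_normal((n, p))
      y = 1.5 * X[:, 3] + rng.standard_normal(n)

      tstats = []
      for j in range(p):
          res = sm.QuantReg(y, sm.add_constant(X[:, j])).fit(q=tau)
          tstats.append(res.tvalues[1])           # slope t-statistic
      print(int(np.argmax(np.abs(tstats))))       # most predictive predictor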

  18. Considering a non-polynomial basis for local kernel regression problem

    Science.gov (United States)

    Silalahi, Divo Dharma; Midi, Habshah

    2017-01-01

    A commonly used solution to the local kernel nonparametric regression problem is polynomial regression. In this study, we demonstrate the estimator and its properties, using the maximum likelihood estimator, for a non-polynomial basis such as the B-spline replacing the polynomial basis. This estimator allows for flexibility in the selection of a bandwidth and a knot. The best estimator was selected by finding an optimal bandwidth and knot through minimizing the well-known generalized validation function.

  19. Testing hypotheses for differences between linear regression lines

    Science.gov (United States)

    Stanley J. Zarnoch

    2009-01-01

    Five hypotheses are identified for testing differences between simple linear regression lines. The distinctions between these hypotheses are based on a priori assumptions and illustrated with full and reduced models. The contrast approach is presented as an easy and complete method for testing for overall differences between the regressions and for making pairwise...

  20. Testing for constant nonparametric effects in general semiparametric regression models with interactions

    KAUST Repository

    Wei, Jiawei

    2011-07-01

    We consider the problem of testing for a constant nonparametric effect in a general semi-parametric regression model when there is the potential for interaction between the parametrically and nonparametrically modeled variables. The work was originally motivated by a unique testing problem in genetic epidemiology (Chatterjee, et al., 2006) that involved a typical generalized linear model but with an additional term reminiscent of the Tukey one-degree-of-freedom formulation, and their interest was in testing for main effects of the genetic variables, while gaining statistical power by allowing for a possible interaction between genes and the environment. Later work (Maity, et al., 2009) involved the possibility of modeling the environmental variable nonparametrically, but they focused on whether there was a parametric main effect for the genetic variables. In this paper, we consider the complementary problem, where the interest is in testing for the main effect of the nonparametrically modeled environmental variable. We derive a generalized likelihood ratio test for this hypothesis, show how to implement it, and provide evidence that our method can improve statistical power when compared to standard partially linear models with main effects only. We use the method for the primary purpose of analyzing data from a case-control study of colorectal adenoma.

  1. Testing overall and moderator effects meta-regression

    NARCIS (Netherlands)

    Huizenga, H.M.; Visser, I.; Dolan, C.V.

    2011-01-01

    Random effects meta-regression is a technique to synthesize results of multiple studies. It allows for a test of an overall effect, as well as for tests of effects of study characteristics, that is, (discrete or continuous) moderator effects. We describe various procedures to test moderator effects:

  2. Solving Dynamic Traveling Salesman Problem Using Dynamic Gaussian Process Regression

    Directory of Open Access Journals (Sweden)

    Stephen M. Akandwanaho

    2014-01-01

    This paper solves the dynamic traveling salesman problem (DTSP) using the dynamic Gaussian Process Regression (DGPR) method. The problem of varying correlation tour is alleviated by the nonstationary covariance function interleaved with DGPR to generate a predictive distribution for the DTSP tour. This approach is conjoined with the Nearest Neighbor (NN) method and the iterated local search to track dynamic optima. Experimental results were obtained on DTSP instances. The comparisons were performed with Genetic Algorithm and Simulated Annealing. The proposed approach demonstrates superiority in finding a good traveling salesman problem (TSP) tour and less computational time in nonstationary conditions.
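
    Of the components named above, the nearest-neighbour construction is the simplest to sketch; the DGPR prediction and the iterated local search are omitted, and the coordinates are random placeholders.

      import numpy as np

      # Greedy nearest-neighbour tour: repeatedly visit the closest
      # unvisited city (one building block of the approach above).
      def nearest_neighbor_tour(coords):
          unvisited = set(range(1, len(coords)))
          tour = [0]
          while unvisited:
              last = coords[tour[-1]]
              nxt = min(unvisited,
                        key=lambda j: np.linalg.norm(coords[j] - last))
              tour.append(nxt)
              unvisited.remove(nxt)
          return tour

      coords = np.random.default_rng(3).random((30, 2))
      print(nearest_neighbor_tour(coords))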

  3. A test for the parameters of multiple linear regression models ...

    African Journals Online (AJOL)

    A test for the parameters of multiple linear regression models is developed for conducting tests simultaneously on all the parameters of multiple linear regression models. The test is robust relative to the assumptions of homogeneity of variances and absence of serial correlation of the classical F-test. Under certain null and ...

  4. Regression testing Ajax applications : Coping with dynamism

    NARCIS (Netherlands)

    Roest, D.; Mesbah, A.; Van Deursen, A.

    2009-01-01

    Note: This paper is a pre-print of: Danny Roest, Ali Mesbah and Arie van Deursen. Regression Testing AJAX Applications: Coping with Dynamism. In Proceedings of the 3rd International Conference on Software Testing, Verification and Validation (ICST’10), Paris, France. IEEE Computer Society, 2010.

  5. Testing and Estimating Shape-Constrained Nonparametric Density and Regression in the Presence of Measurement Error

    KAUST Repository

    Carroll, Raymond J.

    2011-03-01

    In many applications we can expect that, or are interested to know if, a density function or a regression curve satisfies some specific shape constraints. For example, when the explanatory variable, X, represents the value taken by a treatment or dosage, the conditional mean of the response, Y , is often anticipated to be a monotone function of X. Indeed, if this regression mean is not monotone (in the appropriate direction) then the medical or commercial value of the treatment is likely to be significantly curtailed, at least for values of X that lie beyond the point at which monotonicity fails. In the case of a density, common shape constraints include log-concavity and unimodality. If we can correctly guess the shape of a curve, then nonparametric estimators can be improved by taking this information into account. Addressing such problems requires a method for testing the hypothesis that the curve of interest satisfies a shape constraint, and, if the conclusion of the test is positive, a technique for estimating the curve subject to the constraint. Nonparametric methodology for solving these problems already exists, but only in cases where the covariates are observed precisely. However in many problems, data can only be observed with measurement errors, and the methods employed in the error-free case typically do not carry over to this error context. In this paper we develop a novel approach to hypothesis testing and function estimation under shape constraints, which is valid in the context of measurement errors. Our method is based on tilting an estimator of the density or the regression mean until it satisfies the shape constraint, and we take as our test statistic the distance through which it is tilted. Bootstrap methods are used to calibrate the test. The constrained curve estimators that we develop are also based on tilting, and in that context our work has points of contact with methodology in the error-free case.

  6. The Prediction Properties of Inverse and Reverse Regression for the Simple Linear Calibration Problem

    Science.gov (United States)

    Parker, Peter A.; Vining, G. Geoffrey; Wilson, Sara R.; Szarka, John L., III; Johnson, Nels G.

    2010-01-01

    The calibration of measurement systems is a fundamental but under-studied problem within industrial statistics. The origins of this problem go back to basic chemical analysis based on NIST standards. In today's world these issues extend to mechanical, electrical, and materials engineering. Often, these new scenarios do not provide "gold standards" such as the standard weights provided by NIST. This paper considers the classic "forward regression followed by inverse regression" approach. In this approach the initial experiment treats the "standards" as the regressor and the observed values as the response to calibrate the instrument. The analyst then must invert the resulting regression model in order to use the instrument to make actual measurements in practice. This paper compares this classical approach to "reverse regression," which treats the standards as the response and the observed measurements as the regressor in the calibration experiment. Such an approach is intuitively appealing because it avoids the need for the inverse regression. However, it also violates some of the basic regression assumptions.
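
    The two approaches are easy to contrast numerically. In the sketch below, the classical route fits the observed readings on the standards and inverts the fitted line, while reverse regression fits the standards on the readings directly; the standards are synthetic stand-ins for NIST-style reference values.

      import numpy as np

      rng = np.random.default_rng(4)
      standard = np.linspace(1.0, 10.0, 20)        # reference values
      observed = 0.5 + 1.02 * standard + rng.normal(0.0, 0.1, standard.size)

      b, a = np.polyfit(standard, observed, 1)     # forward fit: obs = a + b*std
      d, c = np.polyfit(observed, standard, 1)     # reverse fit: std = c + d*obs

      x_new = 7.3                                  # a new raw instrument reading
      print((x_new - a) / b)                       # inverse-regression estimate
      print(c + d * x_new)                         # reverse-regression estimate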

  7. Structural Break Tests Robust to Regression Misspecification

    Directory of Open Access Journals (Sweden)

    Alaa Abi Morshed

    2018-05-01

    Structural break tests for regression models are sensitive to model misspecification. We show—analytically and through simulations—that the sup Wald test for breaks in the conditional mean and variance of a time series process exhibits severe size distortions when the conditional mean dynamics are misspecified. We also show that the sup Wald test for breaks in the unconditional mean and variance does not have the same size distortions, yet benefits from similar power to its conditional counterpart in correctly specified models. Hence, we propose using it as an alternative and complementary test for breaks. We apply the unconditional and conditional mean and variance tests to three US series: unemployment, industrial production growth and interest rates. Both the unconditional and the conditional mean tests detect a break in the mean of interest rates. However, for the other two series, the unconditional mean test does not detect a break, while the conditional mean tests based on dynamic regression models occasionally detect a break, with the implied break-point estimator varying across different dynamic specifications. For all series, the unconditional variance does not detect a break while most tests for the conditional variance do detect a break which also varies across specifications.
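
    For the unconditional mean, the scan underlying a sup-Wald style test can be sketched in a few lines: compute a Wald statistic at every admissible break point and keep the maximum. Critical values, trimming choices and the variance counterpart from the paper are not reproduced; the series below is simulated with one true break.

      import numpy as np

      # Sup-Wald style scan for one break in the unconditional mean.
      def sup_wald_mean(y, trim=0.15):
          n = len(y)
          stats = []
          for k in range(int(trim * n), int((1 - trim) * n)):
              y1, y2 = y[:k], y[k:]
              se2 = y1.var(ddof=1) / len(y1) + y2.var(ddof=1) / len(y2)
              stats.append((y1.mean() - y2.mean()) ** 2 / se2)
          return max(stats)

      rng = np.random.default_rng(5)
      y = np.r_[rng.normal(0.0, 1.0, 150), rng.normal(0.8, 1.0, 150)]
      print(sup_wald_mean(y))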

  8. Posterior consistency for Bayesian inverse problems through stability and regression results

    International Nuclear Information System (INIS)

    Vollmer, Sebastian J

    2013-01-01

    In the Bayesian approach, the a priori knowledge about the input of a mathematical model is described via a probability measure. The joint distribution of the unknown input and the data is then conditioned, using Bayes’ formula, giving rise to the posterior distribution on the unknown input. In this setting we prove posterior consistency for nonlinear inverse problems: a sequence of data is considered, with diminishing fluctuations around a single truth and it is then of interest to show that the resulting sequence of posterior measures arising from this sequence of data concentrates around the truth used to generate the data. Posterior consistency justifies the use of the Bayesian approach very much in the same way as error bounds and convergence results for regularization techniques do. As a guiding example, we consider the inverse problem of reconstructing the diffusion coefficient from noisy observations of the solution to an elliptic PDE in divergence form. This problem is approached by splitting the forward operator into the underlying continuum model and a simpler observation operator based on the output of the model. In general, these splittings allow us to conclude posterior consistency provided a deterministic stability result for the underlying inverse problem and a posterior consistency result for the Bayesian regression problem with the push-forward prior. Moreover, we prove posterior consistency for the Bayesian regression problem based on the regularity, the tail behaviour and the small ball probabilities of the prior. (paper)

  9. Automation of Flight Software Regression Testing

    Science.gov (United States)

    Tashakkor, Scott B.

    2016-01-01

    NASA is developing the Space Launch System (SLS) to be a heavy lift launch vehicle supporting human and scientific exploration beyond earth orbit. SLS will have a common core stage, an upper stage, and different permutations of boosters and fairings to perform various crewed or cargo missions. Marshall Space Flight Center (MSFC) is writing the Flight Software (FSW) that will operate the SLS launch vehicle. The FSW is developed in an incremental manner based on "Agile" software techniques. As the FSW is incrementally developed, testing the functionality of the code needs to be performed continually to ensure that the integrity of the software is maintained. Manually testing the functionality on an ever-growing set of requirements and features is not an efficient solution and therefore needs to be done automatically to ensure testing is comprehensive. To support test automation, a framework for a regression test harness has been developed and used on SLS FSW. The test harness provides a modular design approach that can compile or read in the required information specified by the developer of the test. The modularity provides independence between groups of tests and the ability to add and remove tests without disturbing others. This provides the SLS FSW team a time saving feature that is essential to meeting SLS Program technical and programmatic requirements. During development of SLS FSW, this technique has proved to be a useful tool to ensure all requirements have been tested, and that desired functionality is maintained, as changes occur. It also provides a mechanism for developers to check functionality of the code that they have developed. With this system, automation of regression testing is accomplished through a scheduling tool and/or commit hooks. Key advantages of this test harness capability includes execution support for multiple independent test cases, the ability for developers to specify precisely what they are testing and how, the ability to add
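
    In spirit, such a harness keeps test cases independent and lets developers state what each case checks; a drastically simplified sketch follows. All names are illustrative and bear no relation to the actual SLS FSW framework.

      # Toy regression-test registry: independent cases, easy to add/remove.
      REGISTRY = []

      def regression_test(name):
          def wrap(fn):
              REGISTRY.append((name, fn))
              return fn
          return wrap

      @regression_test("example: placeholder numerical check")
      def test_example():
          assert abs(sum([0.1] * 10) - 1.0) < 1e-9   # placeholder assertion

      def run_all():
          for name, fn in REGISTRY:
              try:
                  fn()
                  print("PASS", name)
              except AssertionError:
                  print("FAIL", name)

      if __name__ == "__main__":
          run_all()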

  10. Regression filter for signal resolution

    International Nuclear Information System (INIS)

    Matthes, W.

    1975-01-01

    The problem considered is that of resolving a measured pulse height spectrum of a material mixture, e.g. a gamma-ray spectrum or Raman spectrum, into a weighted sum of the spectra of the individual constituents. The model on which the analytical formulation is based is described. The problem reduces to that of a multiple linear regression. A stepwise linear regression procedure was constructed. The efficiency of this method was then tested by transforming the procedure into a computer programme which was used to unfold test spectra obtained by mixing some spectra, from a library of arbitrarily chosen spectra, and adding a noise component. (U.K.)
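
    The unfolding step can be sketched as a constrained linear fit. Below, non-negative least squares stands in for the paper's stepwise linear regression procedure (component weights cannot be negative); all spectra are synthetic placeholders.

      import numpy as np
      from scipy.optimize import nnls

      rng = np.random.default_rng(6)
      channels, n_components = 128, 4
      library = rng.random((channels, n_components))  # one column per constituent
      true_w = np.array([2.0, 0.0, 1.5, 0.3])
      measured = library @ true_w + rng.normal(0.0, 0.02, channels)

      weights, _ = nnls(library, measured)            # weighted-sum decomposition
      print(weights)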

  11. Convergence diagnostics for Eigenvalue problems with linear regression model

    International Nuclear Information System (INIS)

    Shi, Bo; Petrovic, Bojan

    2011-01-01

    Although the Monte Carlo method has been extensively used for criticality/Eigenvalue problems, a reliable, robust, and efficient convergence diagnostics method is still desired. Most methods are based on integral parameters (multiplication factor, entropy) and either condense the local distribution information into a single value (e.g., entropy) or even disregard it. We propose to employ the detailed cycle-by-cycle local flux evolution obtained by using mesh tally mechanism to assess the source and flux convergence. By applying a linear regression model to each individual mesh in a mesh tally for convergence diagnostics, a global convergence criterion can be obtained. We exemplify this method on two problems and obtain promising diagnostics results. (author)
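
    A stripped-down version of the idea: regress each mesh cell's cycle-by-cycle tally on the cycle index and declare convergence when every fitted slope is negligible. The threshold and the data layout are assumptions of this sketch, not the authors' exact criterion.

      import numpy as np

      # tallies: array of shape (n_cycles, n_meshes); one OLS slope per mesh.
      def converged(tallies, slope_tol=1e-4):
          cycles = np.arange(tallies.shape[0])
          xc = cycles - cycles.mean()
          slopes = (xc @ (tallies - tallies.mean(axis=0))) / (xc @ xc)
          return bool(np.all(np.abs(slopes) < slope_tol))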

  12. Continuous validation of ASTEC containment models and regression testing

    International Nuclear Information System (INIS)

    Nowack, Holger; Reinke, Nils; Sonnenkalb, Martin

    2014-01-01

    The focus of the ASTEC (Accident Source Term Evaluation Code) development at GRS is primarily on the containment module CPA (Containment Part of ASTEC), whose modelling is to a large extent based on the GRS containment code COCOSYS (COntainment COde SYStem). Validation is usually understood as the approval of the modelling capabilities by calculations of appropriate experiments performed by external users different from the code developers. During the development process of ASTEC CPA, bugs and unintended side effects may occur, which leads to changes in the results of the initially conducted validation. Due to the involvement of a considerable number of developers in the coding of ASTEC modules, validation of the code alone, even if executed repeatedly, is not sufficient. Therefore, a regression testing procedure has been implemented in order to ensure that the initially obtained validation results are still valid with succeeding code versions. Within the regression testing procedure, calculations of experiments and plant sequences are performed with the same input deck but applying two different code versions. For every test case the up-to-date code version is compared to the preceding one on the basis of physical parameters deemed to be characteristic for the test case under consideration. In the case of post-calculations of experiments, a comparison to experimental data is also carried out. Three validation cases from the regression testing procedure are presented within this paper. The very good post-calculation of the HDR E11.1 experiment shows the high-quality modelling of thermal-hydraulics in ASTEC CPA. Aerosol behaviour is validated on the BMC VANAM M3 experiment, and the results also show a very good agreement with experimental data. Finally, iodine behaviour is checked in the validation test case of the THAI IOD-11 experiment. Within this test case, the comparison of the ASTEC versions V2.0r1 and V2.0r2 shows how an error was detected by the regression testing

  13. Time-adaptive quantile regression

    DEFF Research Database (Denmark)

    Møller, Jan Kloppenborg; Nielsen, Henrik Aalborg; Madsen, Henrik

    2008-01-01

    An algorithm for time-adaptive quantile regression is presented. The algorithm is based on the simplex algorithm, and the linear optimization formulation of the quantile regression problem is given. The observations have been split to allow a direct use of the simplex algorithm. The simplex method and an updating procedure are combined into a new algorithm for time-adaptive quantile regression, which generates new solutions on the basis of the old solution, leading to savings in computation time. The suggested algorithm is tested against a static quantile regression model on a data set with wind power production, where the models combine splines and quantile regression. The comparison indicates superior performance for the time-adaptive quantile regression in all the performance parameters considered.
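
    The linear-programming formulation referred to above can be written out directly: minimize tau*sum(u) + (1-tau)*sum(v) subject to y = Xb + u - v with u, v >= 0. The static sketch below solves it with a generic LP solver; the paper's contribution, updating this solution adaptively in time, is not shown.

      import numpy as np
      from scipy.optimize import linprog

      def quantile_regression_lp(X, y, tau):
          n, p = X.shape
          c = np.r_[np.zeros(p), np.full(n, tau), np.full(n, 1.0 - tau)]
          A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
          bounds = [(None, None)] * p + [(0, None)] * (2 * n)
          res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
          return res.x[:p]                     # regression coefficients

      rng = np.random.default_rng(7)
      X = np.column_stack([np.ones(100), rng.standard_normal(100)])
      y = X @ np.array([1.0, 2.0]) + rng.standard_normal(100)
      print(quantile_regression_lp(X, y, tau=0.5))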

  14. Testing for Stock Market Contagion: A Quantile Regression Approach

    NARCIS (Netherlands)

    S.Y. Park (Sung); W. Wang (Wendun); N. Huang (Naijing)

    2015-01-01

    Regarding the asymmetric and leptokurtic behavior of financial data, we propose a new contagion test in the quantile regression framework that is robust to model misspecification. Unlike conventional correlation-based tests, the proposed quantile contagion test

  15. Finite Algorithms for Robust Linear Regression

    DEFF Research Database (Denmark)

    Madsen, Kaj; Nielsen, Hans Bruun

    1990-01-01

    The Huber M-estimator for robust linear regression is analyzed. Newton type methods for solution of the problem are defined and analyzed, and finite convergence is proved. Numerical experiments with a large number of test problems demonstrate efficiency and indicate that this kind of approach may...

  16. Linear Regression Analysis

    CERN Document Server

    Seber, George A F

    2012-01-01

    Concise, mathematically clear, and comprehensive treatment of the subject. Expanded coverage of diagnostics and methods of model fitting. Requires no specialized knowledge beyond a good grasp of matrix algebra and some acquaintance with straight-line regression and simple analysis of variance models. More than 200 problems throughout the book plus outline solutions for the exercises. This revision has been extensively class-tested.

  17. Retro-regression--another important multivariate regression improvement.

    Science.gov (United States)

    Randić, M

    2001-01-01

    We review the serious problem associated with instabilities of the coefficients of regression equations, referred to as the MRA (multivariate regression analysis) "nightmare of the first kind". This is manifested when in a stepwise regression a descriptor is included or excluded from a regression. The consequence is an unpredictable change of the coefficients of the descriptors that remain in the regression equation. We follow with consideration of an even more serious problem, referred to as the MRA "nightmare of the second kind", arising when optimal descriptors are selected from a large pool of descriptors. This process typically causes at different steps of the stepwise regression a replacement of several previously used descriptors by new ones. We describe a procedure that resolves these difficulties. The approach is illustrated on boiling points of nonanes which are considered (1) by using an ordered connectivity basis; (2) by using an ordering resulting from application of greedy algorithm; and (3) by using an ordering derived from an exhaustive search for optimal descriptors. A novel variant of multiple regression analysis, called retro-regression (RR), is outlined showing how it resolves the ambiguities associated with both "nightmares" of the first and the second kind of MRA.

  18. A Solution to Separation and Multicollinearity in Multiple Logistic Regression.

    Science.gov (United States)

    Shen, Jianzhao; Gao, Sujuan

    2008-10-01

    In dementia screening tests, item selection for shortening an existing screening test can be achieved using multiple logistic regression. However, maximum likelihood estimates for such logistic regression models often experience serious bias or even non-existence because of separation and multicollinearity problems resulting from a large number of highly correlated items. Firth (1993, Biometrika, 80(1), 27-38) proposed a penalized likelihood estimator for generalized linear models which was shown to reduce bias and the non-existence problems. Ridge regression has been used in logistic regression to stabilize the estimates in cases of multicollinearity. However, neither method solves both problems. In this paper, we propose a double penalized maximum likelihood estimator combining Firth's penalized likelihood equation with a ridge parameter. We present a simulation study evaluating the empirical performance of the double penalized likelihood estimator in small to moderate sample sizes. We demonstrate the proposed approach using current screening data from a community-based dementia study.

  19. Notes on power of normality tests of error terms in regression models

    International Nuclear Information System (INIS)

    Střelec, Luboš

    2015-01-01

    Normality is one of the basic assumptions in applying statistical procedures. For example, in linear regression most of the inferential procedures are based on the assumption of normality, i.e. the disturbance vector is assumed to be normally distributed. Failure to assess non-normality of the error terms may lead to incorrect results of usual statistical inference techniques such as the t-test or F-test. Thus, error terms should be normally distributed in order to allow us to make exact inferences. As a consequence, normally distributed stochastic errors are necessary in order to draw inferences that are not misleading, which explains the necessity and importance of robust tests of normality. Therefore, the aim of this contribution is to discuss normality testing of error terms in regression models. In this contribution, we introduce the general RT class of robust tests for normality, and present and discuss the trade-off between power and robustness of selected classical and robust normality tests of error terms in regression models.

  20. Notes on power of normality tests of error terms in regression models

    Energy Technology Data Exchange (ETDEWEB)

    Střelec, Luboš [Department of Statistics and Operation Analysis, Faculty of Business and Economics, Mendel University in Brno, Zemědělská 1, Brno, 61300 (Czech Republic)

    2015-03-10

    Normality is one of the basic assumptions in applying statistical procedures. For example, in linear regression most of the inferential procedures are based on the assumption of normality, i.e. the disturbance vector is assumed to be normally distributed. Failure to assess non-normality of the error terms may lead to incorrect results of usual statistical inference techniques such as the t-test or F-test. Thus, error terms should be normally distributed in order to allow us to make exact inferences. As a consequence, normally distributed stochastic errors are necessary in order to draw inferences that are not misleading, which explains the necessity and importance of robust tests of normality. Therefore, the aim of this contribution is to discuss normality testing of error terms in regression models. In this contribution, we introduce the general RT class of robust tests for normality, and present and discuss the trade-off between power and robustness of selected classical and robust normality tests of error terms in regression models.
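
    The basic workflow discussed in these two records is easily sketched: fit the regression, then test the residuals for normality. Classical tests from scipy are used below; the robust RT-class tests the contribution introduces are not implemented here.

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(8)
      X = np.column_stack([np.ones(200), rng.standard_normal(200)])
      y = X @ np.array([1.0, 0.5]) + rng.standard_normal(200)

      beta, *_ = np.linalg.lstsq(X, y, rcond=None)
      resid = y - X @ beta                      # estimated error terms
      print(stats.shapiro(resid))               # Shapiro-Wilk test
      print(stats.jarque_bera(resid))           # Jarque-Bera test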

  1. Regression testing in the TOTEM DCS

    International Nuclear Information System (INIS)

    Rodríguez, F Lucas; Atanassov, I; Burkimsher, P; Frost, O; Taskinen, J; Tulimaki, V

    2012-01-01

    The Detector Control System of the TOTEM experiment at the LHC is built with the industrial product WinCC OA (PVSS). The TOTEM system is generated automatically through scripts using as input the detector Product Breakdown Structure (PBS) and its pinout connectivity, archiving and alarm metainformation, and some other heuristics based on the naming conventions. When those initial parameters and automation code are modified to include new features, the resulting PVSS system can also introduce side-effects. On a daily basis, a custom-developed regression testing tool takes the most recent code from a Subversion (SVN) repository and builds a new control system from scratch. This system is exported in plain text format using the PVSS export tool and compared with a system previously validated by a human. A report is sent to the developers with any differences highlighted, in readiness for validation and acceptance as a new stable version. This regression approach is not dependent on any development framework or methodology. The process has run satisfactorily for several months, proving to be a very valuable tool for checking new versions before they are deployed to the production systems.
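
    The daily comparison step reduces to a text diff between the fresh export and the last human-validated one; a sketch using only the standard library, with illustrative file names rather than the TOTEM tool's actual paths.

      import difflib

      def regression_report(validated="system_validated.txt",
                            candidate="system_today.txt"):
          with open(validated) as f1, open(candidate) as f2:
              diff = difflib.unified_diff(f1.readlines(), f2.readlines(),
                                          fromfile=validated, tofile=candidate)
          report = "".join(diff)
          return report or "No differences: candidate matches validated system."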

  2. Comparing Linear Discriminant Function with Logistic Regression for the Two-Group Classification Problem.

    Science.gov (United States)

    Fan, Xitao; Wang, Lin

    The Monte Carlo study compared the performance of predictive discriminant analysis (PDA) and that of logistic regression (LR) for the two-group classification problem. Prior probabilities were used for classification, but the cost of misclassification was assumed to be equal. The study used a fully crossed three-factor experimental design (with…

  3. Linearity and Misspecification Tests for Vector Smooth Transition Regression Models

    DEFF Research Database (Denmark)

    Teräsvirta, Timo; Yang, Yukai

    The purpose of the paper is to derive Lagrange multiplier and Lagrange multiplier type specification and misspecification tests for vector smooth transition regression models. We report results from simulation studies in which the size and power properties of the proposed asymptotic tests in small...

  4. Significance tests to determine the direction of effects in linear regression models.

    Science.gov (United States)

    Wiedermann, Wolfgang; Hagmann, Michael; von Eye, Alexander

    2015-02-01

    Previous studies have discussed asymmetric interpretations of the Pearson correlation coefficient and have shown that higher moments can be used to decide on the direction of dependence in the bivariate linear regression setting. The current study extends this approach by illustrating that the third moment of regression residuals may also be used to derive conclusions concerning the direction of effects. Assuming non-normally distributed variables, it is shown that the distribution of residuals of the correctly specified regression model (e.g., Y is regressed on X) is more symmetric than the distribution of residuals of the competing model (i.e., X is regressed on Y). Based on this result, 4 one-sample tests are discussed which can be used to decide which variable is more likely to be the response and which one is more likely to be the explanatory variable. A fifth significance test is proposed based on the differences of skewness estimates, which leads to a more direct test of a hypothesis that is compatible with direction of dependence. A Monte Carlo simulation study was performed to examine the behaviour of the procedures under various degrees of associations, sample sizes, and distributional properties of the underlying population. An empirical example is given which illustrates the application of the tests in practice. © 2014 The British Psychological Society.
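
    The underlying heuristic is easy to demonstrate: with a skewed explanatory variable, residuals from the correctly specified direction are more symmetric than residuals from the reversed fit. The sketch compares residual skewness for both directions on simulated data; the paper's formal significance tests are not reproduced.

      import numpy as np
      from scipy.stats import skew

      rng = np.random.default_rng(9)
      x = rng.exponential(1.0, 500)              # non-normal cause
      y = 0.8 * x + rng.normal(0.0, 0.5, 500)    # effect

      def residuals(a, b):
          slope, intercept = np.polyfit(a, b, 1)
          return b - (intercept + slope * a)

      print(abs(skew(residuals(x, y))))          # y on x: nearly symmetric
      print(abs(skew(residuals(y, x))))          # x on y: more skewed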

  5. Normalization Ridge Regression in Practice I: Comparisons Between Ordinary Least Squares, Ridge Regression and Normalization Ridge Regression.

    Science.gov (United States)

    Bulcock, J. W.

    The problem of model estimation when the data are collinear was examined. Though ridge regression (RR) outperforms ordinary least squares (OLS) regression in the presence of acute multicollinearity, it is not a problem-free technique for reducing the variance of the estimates. It is a stochastic procedure when it should be nonstochastic and it…

  6. A Spreadsheet Tool for Learning the Multiple Regression F-Test, T-Tests, and Multicollinearity

    Science.gov (United States)

    Martin, David

    2008-01-01

    This note presents a spreadsheet tool that allows teachers the opportunity to guide students towards answering on their own questions related to the multiple regression F-test, the t-tests, and multicollinearity. The note demonstrates approaches for using the spreadsheet that might be appropriate for three different levels of statistics classes,…

  7. Testing the Perturbation Sensitivity of Abortion-Crime Regressions

    Directory of Open Access Journals (Sweden)

    Michał Brzeziński

    2012-06-01

    The hypothesis that the legalisation of abortion contributed significantly to the reduction of crime in the United States in the 1990s is one of the most prominent ideas from the recent “economics-made-fun” movement sparked by the book Freakonomics. This paper expands on the existing literature about the computational stability of abortion-crime regressions by testing the sensitivity of coefficients’ estimates to small amounts of data perturbation. In contrast to previous studies, we use a new data set on crime correlates for each of the US states, the original model specification and estimation methodology, and an improved data perturbation algorithm. We find that the coefficients’ estimates in abortion-crime regressions are not computationally stable and, therefore, are unreliable.
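
    A generic version of the perturbation check is shown below: repeatedly refit after adding tiny noise to the regressors and report how much the coefficients move. The actual abortion-crime data and model specification are not reproduced.

      import numpy as np

      def perturbation_spread(X, y, scale=1e-3, reps=200, seed=0):
          """Std. dev. of OLS coefficients under small data perturbations."""
          rng = np.random.default_rng(seed)
          coefs = []
          for _ in range(reps):
              Xp = X + rng.normal(0.0, scale * X.std(axis=0), X.shape)
              A = np.column_stack([np.ones(len(y)), Xp])
              beta, *_ = np.linalg.lstsq(A, y, rcond=None)
              coefs.append(beta)
          return np.std(coefs, axis=0)   # large spreads signal fragility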

  8. Statistical power analyses using G*Power 3.1: tests for correlation and regression analyses.

    Science.gov (United States)

    Faul, Franz; Erdfelder, Edgar; Buchner, Axel; Lang, Albert-Georg

    2009-11-01

    G*Power is a free power analysis program for a variety of statistical tests. We present extensions and improvements of the version introduced by Faul, Erdfelder, Lang, and Buchner (2007) in the domain of correlation and regression analyses. In the new version, we have added procedures to analyze the power of tests based on (1) single-sample tetrachoric correlations, (2) comparisons of dependent correlations, (3) bivariate linear regression, (4) multiple linear regression based on the random predictor model, (5) logistic regression, and (6) Poisson regression. We describe these new features and provide a brief introduction to their scope and handling.

  9. Accounting for estimated IQ in neuropsychological test performance with regression-based techniques.

    Science.gov (United States)

    Testa, S Marc; Winicki, Jessica M; Pearlson, Godfrey D; Gordon, Barry; Schretlen, David J

    2009-11-01

    Regression-based normative techniques account for variability in test performance associated with multiple predictor variables and generate expected scores based on algebraic equations. Using this approach, we show that estimated IQ, based on oral word reading, accounts for 1-9% of the variability beyond that explained by individual differences in age, sex, race, and years of education for most cognitive measures. These results confirm that adding estimated "premorbid" IQ to demographic predictors in multiple regression models can incrementally improve the accuracy with which regression-based norms (RBNs) benchmark expected neuropsychological test performance in healthy adults. It remains to be seen whether the incremental variance in test performance explained by estimated "premorbid" IQ translates to improved diagnostic accuracy in patient samples. We describe these methods, and illustrate the step-by-step application of RBNs with two cases. We also discuss the rationale, assumptions, and caveats of this approach. More broadly, we note that adjusting test scores for age and other characteristics might actually decrease the accuracy with which test performance predicts absolute criteria, such as the ability to drive or live independently.
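
    In practice the RBN step is a plug-in calculation: predict the expected score from the demographic (and estimated IQ) equation, then standardize the observed score against the model's residual spread. All coefficients below are invented for illustration; real normative equations come from fitting healthy samples.

      # Hypothetical normative equation: expected = b0 + b1*age + b2*educ
      #                                             + b3*male + b4*iq.
      coef = {"intercept": 40.0, "age": -0.15, "educ": 0.8,
              "male": -1.2, "iq": 0.25}
      rmse = 7.5                # residual SD of the normative model (assumed)

      def rbn_z(observed, age, educ, male, iq):
          expected = (coef["intercept"] + coef["age"] * age
                      + coef["educ"] * educ + coef["male"] * male
                      + coef["iq"] * iq)
          return (observed - expected) / rmse

      print(rbn_z(observed=52.0, age=70, educ=16, male=1, iq=110))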

  10. Power and Sample Size Calculations for Logistic Regression Tests for Differential Item Functioning

    Science.gov (United States)

    Li, Zhushan

    2014-01-01

    Logistic regression is a popular method for detecting uniform and nonuniform differential item functioning (DIF) effects. Theoretical formulas for the power and sample size calculations are derived for likelihood ratio tests and Wald tests based on the asymptotic distribution of the maximum likelihood estimators for the logistic regression model.…

  11. PELE-IC test problems

    International Nuclear Information System (INIS)

    Gong, E.Y.; Alexander, E.E.; McMaster, W.H.; Quinones, D.F.

    1979-01-01

    This report provides prospective users of the Lawrence Livermore Laboratory (LLL) fluid-structure interaction computer code, PELE-IC, a variety of test problems for verifying the code on CDC 7600 computer systems at facilities external to the LLL environment. The test problems have been successfully run on CDC 7600 computers at the LLL and Lawrence Berkeley Laboratory (LBL) computer centers

  12. Testing discontinuities in nonparametric regression

    KAUST Repository

    Dai, Wenlin

    2017-01-19

    In nonparametric regression, it is often necessary to detect whether there are jump discontinuities in the mean function. In this paper, we revisit the difference-based method in [13] (H.-G. Müller and U. Stadtmüller, Discontinuous versus smooth regression, Ann. Stat. 27 (1999), pp. 299–337, doi: 10.1214/aos/1018031100)

  13. Testing discontinuities in nonparametric regression

    KAUST Repository

    Dai, Wenlin; Zhou, Yuejin; Tong, Tiejun

    2017-01-01

    In nonparametric regression, it is often necessary to detect whether there are jump discontinuities in the mean function. In this paper, we revisit the difference-based method in [13] (H.-G. Müller and U. Stadtmüller, Discontinuous versus smooth regression, Ann. Stat. 27 (1999), pp. 299–337, doi: 10.1214/aos/1018031100)

  14. Testing Heteroscedasticity in Robust Regression

    Czech Academy of Sciences Publication Activity Database

    Kalina, Jan

    2011-01-01

    Roč. 1, č. 4 (2011), s. 25-28 ISSN 2045-3345 Grant - others:GA ČR(CZ) GA402/09/0557 Institutional research plan: CEZ:AV0Z10300504 Keywords: robust regression * heteroscedasticity * regression quantiles * diagnostics Subject RIV: BB - Applied Statistics, Operational Research http://www.researchjournals.co.uk/documents/Vol4/06%20Kalina.pdf

  15. SOCP relaxation bounds for the optimal subset selection problem applied to robust linear regression

    OpenAIRE

    Flores, Salvador

    2015-01-01

    This paper deals with the problem of finding the globally optimal subset of h elements from a larger set of n elements in d space dimensions so as to minimize a quadratic criterion, with a special emphasis on applications to computing the Least Trimmed Squares Estimator (LTSE) for robust regression. The computation of the LTSE is a challenging subset selection problem involving a nonlinear program with continuous and binary variables, linked in a highly nonlinear fashion. The selection of a ...

  16. Accounting for measurement error in log regression models with applications to accelerated testing.

    Science.gov (United States)

    Richardson, Robert; Tolley, H Dennis; Evenson, William E; Lunt, Barry M

    2018-01-01

    In regression settings, parameter estimates will be biased when the explanatory variables are measured with error. This bias can significantly affect modeling goals. In particular, accelerated lifetime testing involves an extrapolation of the fitted model, and a small amount of bias in parameter estimates may result in a significant increase in the bias of the extrapolated predictions. Additionally, bias may arise when the stochastic component of a log regression model is assumed to be multiplicative when the actual underlying stochastic component is additive. To account for these possible sources of bias, a log regression model with measurement error and additive error is approximated by a weighted regression model which can be estimated using Iteratively Re-weighted Least Squares. Using the reduced Eyring equation in an accelerated testing setting, the model is compared to previously accepted approaches to modeling accelerated testing data with both simulations and real data.
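
    A generic IRLS loop of the kind used to fit such a weighted-regression approximation is sketched below; the variance model (weights proportional to the squared fitted mean, the delta-method result for additive error on the raw scale) is an illustrative assumption, not the paper's exact derivation.

    ```python
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    n = 300
    x = rng.uniform(1, 5, n)
    y = np.exp(1.0 + 0.5 * x) + rng.normal(0, 2.0, n)   # additive error, raw scale
    y = np.clip(y, 1e-3, None)                          # keep logs defined
    X = sm.add_constant(x)
    ly = np.log(y)

    fit = sm.OLS(ly, X).fit()             # initial unweighted fit
    for _ in range(10):
        mu = np.exp(fit.fittedvalues)     # fitted mean on the raw scale
        w = mu ** 2                       # Var(log y) ~ sigma^2 / mu^2
        fit = sm.WLS(ly, X, weights=w).fit()
    print(fit.params)
    ```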

  17. Accounting for measurement error in log regression models with applications to accelerated testing.

    Directory of Open Access Journals (Sweden)

    Robert Richardson

    Full Text Available In regression settings, parameter estimates will be biased when the explanatory variables are measured with error. This bias can significantly affect modeling goals. In particular, accelerated lifetime testing involves an extrapolation of the fitted model, and a small amount of bias in parameter estimates may result in a significant increase in the bias of the extrapolated predictions. Additionally, bias may arise when the stochastic component of a log regression model is assumed to be multiplicative when the actual underlying stochastic component is additive. To account for these possible sources of bias, a log regression model with measurement error and additive error is approximated by a weighted regression model which can be estimated using Iteratively Re-weighted Least Squares. Using the reduced Eyring equation in an accelerated testing setting, the model is compared to previously accepted approaches to modeling accelerated testing data with both simulations and real data.

  18. An application of robust ridge regression model in the presence of outliers to real data problem

    Science.gov (United States)

    Shariff, N. S. Md.; Ferdaos, N. A.

    2017-09-01

    Multicollinearity and outliers often lead to inconsistent and unreliable parameter estimates in regression analysis. The well-known procedure that is robust to the multicollinearity problem is the ridge regression method. This method, however, is believed to be affected by the presence of outliers. The combination of GM-estimation and the ridge parameter, which is robust towards both problems, is of interest in this study. As such, both techniques are employed to investigate the relationship between stock market price and macroeconomic variables in Malaysia, since the data set is suspected to involve both multicollinearity and outlier problems. Four macroeconomic factors are selected for this study: Consumer Price Index (CPI), Gross Domestic Product (GDP), Base Lending Rate (BLR) and Money Supply (M1). The results demonstrate that the proposed procedure is able to produce reliable results in the presence of multicollinearity and outliers in the real data.
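
    As a readily available stand-in for a ridge model that is also robust to outliers, scikit-learn's HuberRegressor combines a Huber loss with an L2 penalty (its alpha parameter); this is not the paper's GM-estimator, just a hedged analogue on synthetic data.

    ```python
    import numpy as np
    from sklearn.linear_model import HuberRegressor

    rng = np.random.default_rng(3)
    n = 200
    X = rng.normal(size=(n, 4))
    X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=n)   # near-collinear columns
    y = X @ np.array([1.0, 1.0, -0.5, 0.2]) + rng.normal(size=n)
    y[:5] += 20                                     # a few gross outliers

    model = HuberRegressor(epsilon=1.35, alpha=1.0).fit(X, y)
    print(model.coef_)
    ```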

  19. Power properties of invariant tests for spatial autocorrelation in linear regression

    NARCIS (Netherlands)

    Martellosio, F.

    2006-01-01

    Many popular tests for residual spatial autocorrelation in the context of the linear regression model belong to the class of invariant tests. This paper derives a number of exact properties of the power function of such tests. In particular, we extend the work of Krämer (2005, Journal of Statistical

  20. A simple approach to power and sample size calculations in logistic regression and Cox regression models.

    Science.gov (United States)

    Vaeth, Michael; Skovlund, Eva

    2004-06-15

    For a given regression problem it is possible to identify a suitably defined equivalent two-sample problem such that the power or sample size obtained for the two-sample problem also applies to the regression problem. For a standard linear regression model the equivalent two-sample problem is easily identified, but for generalized linear models and for Cox regression models the situation is more complicated. An approximately equivalent two-sample problem may, however, also be identified here. In particular, we show that for logistic regression and Cox regression models the equivalent two-sample problem is obtained by selecting two equally sized samples for which the parameters differ by a value equal to the slope times twice the standard deviation of the independent variable and further requiring that the overall expected number of events is unchanged. In a simulation study we examine the validity of this approach to power calculations in logistic regression and Cox regression models. Several different covariate distributions are considered for selected values of the overall response probability and a range of alternatives. For the Cox regression model we consider both constant and non-constant hazard rates. The results show that in general the approach is remarkably accurate even in relatively small samples. Some discrepancies are, however, found in small samples with few events and a highly skewed covariate distribution. Comparison with results based on alternative methods for logistic regression models with a single continuous covariate indicates that the proposed method is at least as good as its competitors. The method is easy to implement and therefore provides a simple way to extend the range of problems that can be covered by the usual formulas for power and sample size determination. Copyright 2004 John Wiley & Sons, Ltd.
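
    The recipe can be sketched directly: place the two groups a logit distance of slope times twice the covariate SD apart, center them so the overall event rate is approximately preserved, and apply a standard two-proportion power formula. All numeric inputs below are hypothetical, and the normal-approximation details are my own simplification.

    ```python
    import numpy as np
    from scipy import stats

    def logistic_power(n, slope, sd_x, p_overall, alpha=0.05):
        delta = slope * 2.0 * sd_x                   # logit gap between groups
        logit = np.log(p_overall / (1 - p_overall))
        p1 = 1 / (1 + np.exp(-(logit - delta / 2)))
        p2 = 1 / (1 + np.exp(-(logit + delta / 2)))
        se = np.sqrt(p1 * (1 - p1) / (n / 2) + p2 * (1 - p2) / (n / 2))
        z = abs(p2 - p1) / se - stats.norm.ppf(1 - alpha / 2)
        return stats.norm.cdf(z)

    print(logistic_power(n=400, slope=0.5, sd_x=1.0, p_overall=0.3))
    ```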

  1. A test of inflated zeros for Poisson regression models.

    Science.gov (United States)

    He, Hua; Zhang, Hui; Ye, Peng; Tang, Wan

    2017-01-01

    Excessive zeros are common in practice and may cause overdispersion and invalidate inference when fitting Poisson regression models. There is a large body of literature on zero-inflated Poisson models. However, methods for testing whether there are excessive zeros are less well developed. The Vuong test comparing a Poisson and a zero-inflated Poisson model is commonly applied in practice. However, the type I error of the test often deviates seriously from the nominal level, casting serious doubt on the validity of the test in such applications. In this paper, we develop a new approach for testing inflated zeros under the Poisson model. Unlike the Vuong test for inflated zeros, our method does not require a zero-inflated Poisson model to perform the test. Simulation studies show that, when compared with the Vuong test, our approach is not only better at controlling the type I error rate but also yields more power.
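
    A simple variant of such a zeros test can be sketched by comparing the observed zero count with its expectation under a fitted Poisson model and applying a normal approximation; this illustrates the idea only and ignores the correction for estimating the rate, so it is not the paper's exact statistic.

    ```python
    import numpy as np
    from scipy import stats

    def zero_inflation_test(y):
        lam = y.mean()                      # Poisson rate estimate
        p0 = np.exp(-lam)                   # P(Y = 0) under the model
        n = len(y)
        observed = np.sum(y == 0)
        var0 = n * p0 * (1 - p0)            # ignores estimation of lam (sketch)
        z = (observed - n * p0) / np.sqrt(var0)
        return z, 1 - stats.norm.cdf(z)     # one-sided: excess zeros

    rng = np.random.default_rng(4)
    y = rng.poisson(2.0, size=500)
    y[:60] = 0                              # inject extra zeros
    print(zero_inflation_test(y))
    ```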

  2. Double Length Regressions for Testing the Box-Cox Difference Transformation.

    OpenAIRE

    Park, Timothy

    1991-01-01

    The Box-Cox difference transformation is used to determine the appropriate specification for estimation of hedge ratios and a new double length regression form of the Lagrange multiplier test is presented for the difference transformation. The Box-Cox difference transformation allows the testing of the first difference model and the returns model as special cases of the Box-Cox difference transformation. Copyright 1991 by MIT Press.

  3. Boosted beta regression.

    Directory of Open Access Journals (Sweden)

    Matthias Schmid

    Full Text Available Regression analysis with a bounded outcome is a common problem in applied statistics. Typical examples include regression models for percentage outcomes and the analysis of ratings that are measured on a bounded scale. In this paper, we consider beta regression, which is a generalization of logit models to situations where the response is continuous on the interval (0,1). Consequently, beta regression is a convenient tool for analyzing percentage responses. The classical approach to fitting a beta regression model is to use maximum likelihood estimation with subsequent AIC-based variable selection. As an alternative to this established - yet unstable - approach, we propose a new estimation technique called boosted beta regression. With boosted beta regression, estimation and variable selection can be carried out simultaneously in a highly efficient way. Additionally, both the mean and the variance of a percentage response can be modeled using flexible nonlinear covariate effects. As a consequence, the new method accounts for common problems such as overdispersion and non-binomial variance structures.

  4. CUSUM-Logistic Regression analysis for the rapid detection of errors in clinical laboratory test results.

    Science.gov (United States)

    Sampson, Maureen L; Gounden, Verena; van Deventer, Hendrik E; Remaley, Alan T

    2016-02-01

    The main drawback of the periodic analysis of quality control (QC) material is that test performance is not monitored in time periods between QC analyses, potentially leading to the reporting of faulty test results. The objective of this study was to develop a patient based QC procedure for the more timely detection of test errors. Results from a Chem-14 panel measured on the Beckman LX20 analyzer were used to develop the model. Each test result was predicted from the other 13 members of the panel by multiple regression, which resulted in correlation coefficients between the predicted and measured result of >0.7 for 8 of the 14 tests. A logistic regression model, which utilized the measured test result, the predicted test result, the day of the week and time of day, was then developed for predicting test errors. The output of the logistic regression was tallied by a daily CUSUM approach and used to predict test errors, with a fixed specificity of 90%. The mean average run length (ARL) before error detection by CUSUM-Logistic Regression (CSLR) was 20 with a mean sensitivity of 97%, which was considerably shorter than the mean ARL of 53 (sensitivity 87.5%) for a simple prediction model that only used the measured result for error detection. A CUSUM-Logistic Regression analysis of patient laboratory data can be an effective approach for the rapid and sensitive detection of clinical laboratory errors. Published by Elsevier Inc.
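
    A skeleton of the three-stage idea is sketched below: predict one analyte from the other panel members, score discrepancies with a logistic model, and tally the scores with a one-sided CUSUM. The panel data, error labels, and thresholds are all simulated stand-ins, not the study's Chem-14 model.

    ```python
    import numpy as np
    from sklearn.linear_model import LinearRegression, LogisticRegression

    rng = np.random.default_rng(5)
    panel = rng.normal(size=(1000, 14))                 # 14-test panel, synthetic
    panel[:, 0] = 0.7 * panel[:, 1:].mean(axis=1) + 0.3 * rng.normal(size=1000)

    # Stage 1: predict analyte 0 from the other 13 panel members
    reg = LinearRegression().fit(panel[:, 1:], panel[:, 0])
    predicted = reg.predict(panel[:, 1:])

    # Stage 2: logistic model for the probability of an erroneous result
    features = np.column_stack([panel[:, 0], predicted])
    labels = rng.binomial(1, 0.05, size=1000)           # stand-in error flags
    clf = LogisticRegression().fit(features, labels)
    p_err = clf.predict_proba(features)[:, 1]

    # Stage 3: one-sided CUSUM of the error scores
    k, h, s = p_err.mean(), 2.0, 0.0                    # reference and threshold
    for i, p in enumerate(p_err):
        s = max(0.0, s + (p - k))
        if s > h:
            print("alarm at result", i)
            break
    ```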

  5. Two-Sample Tests for High-Dimensional Linear Regression with an Application to Detecting Interactions.

    Science.gov (United States)

    Xia, Yin; Cai, Tianxi; Cai, T Tony

    2018-01-01

    Motivated by applications in genomics, we consider in this paper global and multiple testing for the comparisons of two high-dimensional linear regression models. A procedure for testing the equality of the two regression vectors globally is proposed and shown to be particularly powerful against sparse alternatives. We then introduce a multiple testing procedure for identifying unequal coordinates while controlling the false discovery rate and false discovery proportion. Theoretical justifications are provided to guarantee the validity of the proposed tests and optimality results are established under sparsity assumptions on the regression coefficients. The proposed testing procedures are easy to implement. Numerical properties of the procedures are investigated through simulation and data analysis. The results show that the proposed tests maintain the desired error rates under the null and have good power under the alternative at moderate sample sizes. The procedures are applied to the Framingham Offspring study to investigate the interactions between smoking and cardiovascular related genetic mutations important for an inflammation marker.

  6. Conditional Monte Carlo randomization tests for regression models.

    Science.gov (United States)

    Parhat, Parwen; Rosenberger, William F; Diao, Guoqing

    2014-08-15

    We discuss the computation of randomization tests for clinical trials of two treatments when the primary outcome is based on a regression model. We begin by revisiting the seminal paper of Gail, Tan, and Piantadosi (1988), and then describe a method based on Monte Carlo generation of randomization sequences. The tests based on this Monte Carlo procedure are design based, in that they incorporate the particular randomization procedure used. We discuss permuted block designs, complete randomization, and biased coin designs. We also use a new technique by Plamadeala and Rosenberger (2012) for simple computation of conditional randomization tests. Like Gail, Tan, and Piantadosi, we focus on residuals from generalized linear models and martingale residuals from survival models. Such techniques do not apply to longitudinal data analysis, and we introduce a method for computation of randomization tests based on the predicted rate of change from a generalized linear mixed model when outcomes are longitudinal. We show, by simulation, that these randomization tests preserve the size and power well under model misspecification. Copyright © 2014 John Wiley & Sons, Ltd.
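
    A minimal design-based randomization test for complete randomization can be sketched as follows: compute a statistic on residuals from the treatment-free model, then compare it against the distribution obtained by regenerating treatment sequences under the same allocation scheme. The data and model are synthetic.

    ```python
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(6)
    n = 100
    t = rng.permutation(np.repeat([0, 1], n // 2))     # complete randomization
    x = rng.normal(size=n)
    y = 1.0 + 0.5 * x + 0.4 * t + rng.normal(size=n)

    resid = sm.OLS(y, sm.add_constant(x)).fit().resid  # model without treatment
    obs = resid[t == 1].mean() - resid[t == 0].mean()

    null = []
    for _ in range(2000):
        t_star = rng.permutation(t)                    # regenerate the sequence
        null.append(resid[t_star == 1].mean() - resid[t_star == 0].mean())

    print(np.mean(np.abs(null) >= abs(obs)))           # randomization p-value
    ```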

  7. Parameter estimation and statistical test of geographically weighted bivariate Poisson inverse Gaussian regression models

    Science.gov (United States)

    Amalia, Junita; Purhadi, Otok, Bambang Widjanarko

    2017-11-01

    The Poisson distribution is a discrete distribution for count data with a single parameter that defines both the mean and the variance. Poisson regression assumes that the mean and variance are equal (equidispersion). Nonetheless, some count data violate this assumption because the variance exceeds the mean (over-dispersion). Ignoring over-dispersion leads to underestimated standard errors and, in turn, to incorrect decisions in statistical tests. Paired count data are correlated and follow a bivariate Poisson distribution. When over-dispersion is present, simple bivariate Poisson regression is not sufficient for modeling paired count data. The Bivariate Poisson Inverse Gaussian Regression (BPIGR) model is a mixed Poisson regression for modeling paired count data with over-dispersion. The BPIGR model produces a global model for all locations. On the other hand, each location differs in geographic, social, cultural, and economic conditions, so Geographically Weighted Regression (GWR) is needed. The weighting function of each location in GWR generates a different local model. The Geographically Weighted Bivariate Poisson Inverse Gaussian Regression (GWBPIGR) model is used to handle over-dispersion and to generate local models. Parameter estimates of the GWBPIGR model are obtained by the Maximum Likelihood Estimation (MLE) method, while hypothesis testing of the GWBPIGR model is carried out by the Maximum Likelihood Ratio Test (MLRT) method.

  8. The alarming problems of confounding equivalence using logistic regression models in the perspective of causal diagrams.

    Science.gov (United States)

    Yu, Yuanyuan; Li, Hongkai; Sun, Xiaoru; Su, Ping; Wang, Tingting; Liu, Yi; Yuan, Zhongshang; Liu, Yanxun; Xue, Fuzhong

    2017-12-28

    Confounders can produce spurious associations between exposure and outcome in observational studies. For the majority of epidemiologists, adjusting for confounders using a logistic regression model is the habitual method, though it has some problems in accuracy and precision. It is, therefore, important to highlight the problems of logistic regression and to search for alternative methods. Four causal diagram models were defined to summarize confounding equivalence. Both theoretical proofs and simulation studies were performed to verify whether conditioning on different confounding equivalence sets had the same bias-reducing potential and then to select the optimum adjusting strategy, in which the logistic regression model and the inverse probability weighting based marginal structural model (IPW-based-MSM) were compared. The "do-calculus" was used to calculate the true causal effect of exposure on outcome, and the bias and standard error were used to evaluate the performances of different strategies. Adjusting for different sets of confounding equivalence, as judged by identical Markov boundaries, produced different bias-reducing potential in the logistic regression model. For the sets satisfying G-admissibility, adjusting for the set including all the confounders reduced the equivalent bias to the one containing the parent nodes of the outcome, while the bias after adjusting for the parent nodes of exposure was not equivalent to them. In addition, all causal effect estimations through logistic regression were biased, although the estimation after adjusting for the parent nodes of exposure was nearest to the true causal effect. However, conditioning on different confounding equivalence sets had the same bias-reducing potential under IPW-based-MSM. Compared with logistic regression, the IPW-based-MSM could obtain unbiased causal effect estimation when the adjusted confounders satisfied G-admissibility and the optimal strategy was to adjust for the parent nodes of outcome, which
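
    The IPW-based MSM the paper favors can be sketched in a few lines: estimate propensity scores with a logistic model of exposure on confounders, form inverse probability weights, and fit a weighted marginal outcome model. The data-generating values below are invented for illustration.

    ```python
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(7)
    n = 2000
    c = rng.normal(size=n)                           # confounder
    a = rng.binomial(1, 1 / (1 + np.exp(-0.8 * c)))  # exposure depends on c
    y = rng.binomial(1, 1 / (1 + np.exp(-(-1.0 + 0.7 * a + 0.5 * c))))

    ps = sm.Logit(a, sm.add_constant(c)).fit(disp=0).predict()
    w = a / ps + (1 - a) / (1 - ps)                  # inverse probability weights

    msm = sm.GLM(y, sm.add_constant(a), family=sm.families.Binomial(),
                 freq_weights=w).fit()
    print(msm.params[1])                             # marginal log-odds ratio
    ```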

  9. The alarming problems of confounding equivalence using logistic regression models in the perspective of causal diagrams

    Directory of Open Access Journals (Sweden)

    Yuanyuan Yu

    2017-12-01

    Full Text Available Abstract Background Confounders can produce spurious associations between exposure and outcome in observational studies. For the majority of epidemiologists, adjusting for confounders using a logistic regression model is the habitual method, though it has some problems in accuracy and precision. It is, therefore, important to highlight the problems of logistic regression and to search for alternative methods. Methods Four causal diagram models were defined to summarize confounding equivalence. Both theoretical proofs and simulation studies were performed to verify whether conditioning on different confounding equivalence sets had the same bias-reducing potential and then to select the optimum adjusting strategy, in which the logistic regression model and the inverse probability weighting based marginal structural model (IPW-based-MSM) were compared. The “do-calculus” was used to calculate the true causal effect of exposure on outcome, and the bias and standard error were used to evaluate the performances of different strategies. Results Adjusting for different sets of confounding equivalence, as judged by identical Markov boundaries, produced different bias-reducing potential in the logistic regression model. For the sets satisfying G-admissibility, adjusting for the set including all the confounders reduced the equivalent bias to the one containing the parent nodes of the outcome, while the bias after adjusting for the parent nodes of exposure was not equivalent to them. In addition, all causal effect estimations through logistic regression were biased, although the estimation after adjusting for the parent nodes of exposure was nearest to the true causal effect. However, conditioning on different confounding equivalence sets had the same bias-reducing potential under IPW-based-MSM. Compared with logistic regression, the IPW-based-MSM could obtain unbiased causal effect estimation when the adjusted confounders satisfied G-admissibility and the optimal

  10. Regression Phalanxes

    OpenAIRE

    Zhang, Hongyang; Welch, William J.; Zamar, Ruben H.

    2017-01-01

    Tomal et al. (2015) introduced the notion of "phalanxes" in the context of rare-class detection in two-class classification problems. A phalanx is a subset of features that work well for classification tasks. In this paper, we propose a different class of phalanxes for application in regression settings. We define a "Regression Phalanx" - a subset of features that work well together for prediction. We propose a novel algorithm which automatically chooses Regression Phalanxes from high-dimensi...

  11. Chandra X-ray Center Science Data Systems Regression Testing of CIAO

    Science.gov (United States)

    Lee, N. P.; Karovska, M.; Galle, E. C.; Bonaventura, N. R.

    2011-07-01

    The Chandra Interactive Analysis of Observations (CIAO) is a software system developed for the analysis of Chandra X-ray Observatory observations. An important component of a successful CIAO release is the repeated testing of the tools across various platforms to ensure consistent and scientifically valid results. We describe the procedures of the scientific regression testing of CIAO and the enhancements made to the testing system to increase the efficiency of run time and result validation.

  12. Testing the water-energy theory on American palms (Arecaceae) using geographically weighted regression.

    Directory of Open Access Journals (Sweden)

    Wolf L Eiserhardt

    Full Text Available Water and energy have emerged as the best contemporary environmental correlates of broad-scale species richness patterns. A corollary hypothesis of water-energy dynamics theory is that the influence of water decreases and the influence of energy increases with absolute latitude. We report the first use of geographically weighted regression for testing this hypothesis on a continuous species richness gradient that is entirely located within the tropics and subtropics. The dataset was divided into northern and southern hemispheric portions to test whether predictor shifts are more pronounced in the less oceanic northern hemisphere. American palms (Arecaceae, n = 547 spp.), whose species richness and distributions are known to respond strongly to water and energy, were used as a model group. The ability of water and energy to explain palm species richness was quantified locally at different spatial scales and regressed on latitude. Clear latitudinal trends in agreement with water-energy dynamics theory were found, but the results did not differ qualitatively between hemispheres. Strong inherent spatial autocorrelation in local modeling results and collinearity of water and energy variables were identified as important methodological challenges. We overcame these problems by using simultaneous autoregressive models and variation partitioning. Our results show that the ability of water and energy to explain species richness changes not only across large climatic gradients spanning tropical to temperate or arctic zones but also within megathermal climates, at least for strictly tropical taxa such as palms. This finding suggests that the predictor shifts are related to gradual latitudinal changes in ambient energy (related to solar flux input) rather than to abrupt transitions at specific latitudes, such as the occurrence of frost.

  13. Semiparametric Allelic Tests for Mapping Multiple Phenotypes: Binomial Regression and Mahalanobis Distance.

    Science.gov (United States)

    Majumdar, Arunabha; Witte, John S; Ghosh, Saurabh

    2015-12-01

    Binary phenotypes commonly arise due to multiple underlying quantitative precursors and genetic variants may impact multiple traits in a pleiotropic manner. Hence, simultaneously analyzing such correlated traits may be more powerful than analyzing individual traits. Various genotype-level methods, e.g., MultiPhen (O'Reilly et al. []), have been developed to identify genetic factors underlying a multivariate phenotype. For univariate phenotypes, the usefulness and applicability of allele-level tests have been investigated. The test of allele frequency difference among cases and controls is commonly used for mapping case-control association. However, allelic methods for multivariate association mapping have not been studied much. In this article, we explore two allelic tests of multivariate association: one using a Binomial regression model based on inverted regression of genotype on phenotype (Binomial regression-based Association of Multivariate Phenotypes [BAMP]), and the other employing the Mahalanobis distance between two sample means of the multivariate phenotype vector for two alleles at a single-nucleotide polymorphism (Distance-based Association of Multivariate Phenotypes [DAMP]). These methods can incorporate both discrete and continuous phenotypes. Some theoretical properties for BAMP are studied. Using simulations, the power of the methods for detecting multivariate association is compared with the genotype-level test MultiPhen's. The allelic tests yield marginally higher power than MultiPhen for multivariate phenotypes. For one/two binary traits under recessive mode of inheritance, allelic tests are found to be substantially more powerful. All three tests are applied to two different real data and the results offer some support for the simulation study. We propose a hybrid approach for testing multivariate association that implements MultiPhen when Hardy-Weinberg Equilibrium (HWE) is violated and BAMP otherwise, because the allelic approaches assume HWE
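
    The distance-based test (DAMP) lends itself to a compact sketch: compute the Mahalanobis distance between the multivariate phenotype means of the two allele groups. The data below are synthetic, and the reference distribution needed for an actual p-value is omitted.

    ```python
    import numpy as np

    rng = np.random.default_rng(8)
    pheno = rng.normal(size=(400, 3))            # 3 phenotypes, synthetic
    allele = rng.binomial(1, 0.4, size=400)      # allele indicator at the SNP

    m1 = pheno[allele == 1].mean(axis=0)
    m0 = pheno[allele == 0].mean(axis=0)
    cov = np.cov(pheno, rowvar=False)            # pooled covariance (sketch)
    d2 = (m1 - m0) @ np.linalg.inv(cov) @ (m1 - m0)
    print(d2)                                    # squared Mahalanobis distance
    ```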

  14. A Bayesian goodness of fit test and semiparametric generalization of logistic regression with measurement data.

    Science.gov (United States)

    Schörgendorfer, Angela; Branscum, Adam J; Hanson, Timothy E

    2013-06-01

    Logistic regression is a popular tool for risk analysis in medical and population health science. With continuous response data, it is common to create a dichotomous outcome for logistic regression analysis by specifying a threshold for positivity. Fitting a linear regression to the nondichotomized response variable assuming a logistic sampling model for the data has been empirically shown to yield more efficient estimates of odds ratios than ordinary logistic regression of the dichotomized endpoint. We illustrate that risk inference is not robust to departures from the parametric logistic distribution. Moreover, the model assumption of proportional odds is generally not satisfied when the condition of a logistic distribution for the data is violated, leading to biased inference from a parametric logistic analysis. We develop novel Bayesian semiparametric methodology for testing goodness of fit of parametric logistic regression with continuous measurement data. The testing procedures hold for any cutoff threshold and our approach simultaneously provides the ability to perform semiparametric risk estimation. Bayes factors are calculated using the Savage-Dickey ratio for testing the null hypothesis of logistic regression versus a semiparametric generalization. We propose a fully Bayesian and a computationally efficient empirical Bayesian approach to testing, and we present methods for semiparametric estimation of risks, relative risks, and odds ratios when parametric logistic regression fails. Theoretical results establish the consistency of the empirical Bayes test. Results from simulated data show that the proposed approach provides accurate inference irrespective of whether parametric assumptions hold or not. Evaluation of risk factors for obesity shows that different inferences are derived from an analysis of a real data set when deviations from a logistic distribution are permissible in a flexible semiparametric framework. © 2013, The International Biometric

  15. Application of range-test in multiple linear regression analysis in ...

    African Journals Online (AJOL)

    Application of range-test in multiple linear regression analysis in the presence of outliers is studied in this paper. First, the plot of the explanatory variables (i.e. Administration, Social/Commercial, Economic services and Transfer) on the dependent variable (i.e. GDP) was done to identify the statistical trend over the years.

  16. Unbalanced Regressions and the Predictive Equation

    DEFF Research Database (Denmark)

    Osterrieder, Daniela; Ventosa-Santaulària, Daniel; Vera-Valdés, J. Eduardo

    Predictive return regressions with persistent regressors are typically plagued by (asymptotically) biased/inconsistent estimates of the slope, non-standard or potentially even spurious statistical inference, and regression unbalancedness. We alleviate the problem of unbalancedness in the theoreti…

  17. A Note on Three Statistical Tests in the Logistic Regression DIF Procedure

    Science.gov (United States)

    Paek, Insu

    2012-01-01

    Although logistic regression became one of the well-known methods in detecting differential item functioning (DIF), its three statistical tests, the Wald, likelihood ratio (LR), and score tests, which are readily available under the maximum likelihood, do not seem to be consistently distinguished in DIF literature. This paper provides a clarifying…
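
    Two of the three tests are easy to demonstrate on a toy uniform-DIF setup with statsmodels: the Wald test reads the group coefficient off the full fit, and the LR test compares nested fits (the score test is omitted here). The data and effect sizes are simulated, not taken from the paper.

    ```python
    import numpy as np
    import statsmodels.api as sm
    from scipy import stats

    rng = np.random.default_rng(9)
    n = 1000
    ability = rng.normal(size=n)
    group = rng.binomial(1, 0.5, size=n)
    p = 1 / (1 + np.exp(-(0.8 * ability + 0.3 * group)))  # uniform DIF
    item = rng.binomial(1, p)

    X_full = sm.add_constant(np.column_stack([ability, group]))
    full = sm.Logit(item, X_full).fit(disp=0)
    reduced = sm.Logit(item, sm.add_constant(ability)).fit(disp=0)

    wald_z = full.params[2] / full.bse[2]
    lr = 2 * (full.llf - reduced.llf)
    print(2 * stats.norm.sf(abs(wald_z)),    # Wald p-value
          stats.chi2.sf(lr, df=1))           # LR p-value
    ```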

  18. Solving the Omitted Variables Problem of Regression Analysis Using the Relative Vertical Position of Observations

    Directory of Open Access Journals (Sweden)

    Jonathan E. Leightner

    2012-01-01

    Full Text Available The omitted variables problem is one of regression analysis’ most serious problems. The standard approach to the omitted variables problem is to find instruments, or proxies, for the omitted variables, but this approach makes strong assumptions that are rarely met in practice. This paper introduces best projection reiterative truncated projected least squares (BP-RTPLS), the third generation of a technique that solves the omitted variables problem without using proxies or instruments. This paper presents a theoretical argument that BP-RTPLS produces unbiased reduced form estimates when there are omitted variables. This paper also provides simulation evidence that shows OLS produces between 250% and 2450% more errors than BP-RTPLS when there are omitted variables and when measurement and round-off error is 1 percent or less. In an example, the government spending multiplier is estimated using annual data for the USA between 1929 and 2010.

  19. Beta/gamma test problems for ITS

    International Nuclear Information System (INIS)

    Mei, G.T.

    1993-01-01

    The Integrated Tiger Series of Coupled Electron/Photon Monte Carlo Transport Codes (ITS 3.0, PC Version) was used at Oak Ridge National Laboratory (ORNL) to compare with and extend the experimental findings of the beta/gamma response of selected health physics instruments. In order to assure that ITS gives correct results, several beta/gamma problems have been tested. ITS was used to simulate these problems numerically, and results for each were compared to the problem's experimental or analytical results. ITS successfully predicted the experimental or analytical results of all tested problems within the statistical uncertainty inherent in the Monte Carlo method

  20. Testing problem-solving capacities: differences between individual testing and social group setting.

    Science.gov (United States)

    Krasheninnikova, Anastasia; Schneider, Jutta M

    2014-09-01

    Testing animals individually in problem-solving tasks limits distractions of the subjects during the test, so that they can fully concentrate on the problem. However, such individual performance may not indicate the problem-solving capacity that is commonly employed in the wild when individuals are faced with a novel problem in their social groups, where the presence of a conspecific influences an individual's behaviour. To assess the validity of data gathered from parrots when tested individually, we compared the performance on patterned-string tasks among parrots tested singly and parrots tested in social context. We tested two captive groups of orange-winged amazons (Amazona amazonica) with several patterned-string tasks. Despite the differences in the testing environment (singly vs. social context), parrots from both groups performed similarly. However, we found that the willingness to participate in the tasks was significantly higher for the individuals tested in social context. The study provides further evidence for the crucial influence of social context on individual's response to a challenging situation such as a problem-solving test.

  1. Testing Under Fire: Chicago's Problem.

    Science.gov (United States)

    Byrd, Manford, Jr.

    The history and development of city-wide testing programs in Chicago since 1936 are reviewed and placed in context with the impact on testing of Sputnik and the passage of the National Defense Education Act of 1958. Current testing problems include the time lag between events and curricular changes and new test construction, the time lag between…

  2. Testing of a Fiber Optic Wear, Erosion and Regression Sensor

    Science.gov (United States)

    Korman, Valentin; Polzin, Kurt A.

    2011-01-01

    The nature of the physical processes and harsh environments associated with erosion and wear in propulsion environments makes their measurement and real-time rate quantification difficult. A fiber optic sensor capable of determining the wear (regression, erosion, ablation) associated with these environments has been developed and tested in a number of different applications to validate the technique. The sensor consists of two fiber optics that have differing attenuation coefficients and transmit light to detectors. The ratio of the two measured intensities can be correlated to the lengths of the fiber optic lines, and if the fibers and the host parent material in which they are embedded wear at the same rate the remaining length of fiber provides a real-time measure of the wear process. Testing in several disparate situations has been performed, with the data exhibiting excellent qualitative agreement with the theoretical description of the process and when a separate calibrated regression measurement is available good quantitative agreement is obtained as well. The light collected by the fibers can also be used to optically obtain the spectra and measure the internal temperature of the wear layer.

  3. On Solving Lq-Penalized Regressions

    Directory of Open Access Journals (Sweden)

    Tracy Zhou Wu

    2007-01-01

    Full Text Available Lq-penalized regression arises in multidimensional statistical modelling where all or part of the regression coefficients are penalized to achieve both accuracy and parsimony of statistical models. There is often substantial computational difficulty except for the quadratic penalty case. The difficulty is partly due to the nonsmoothness of the objective function inherited from the use of the absolute value. We propose a new solution method for the general Lq-penalized regression problem based on space transformation and thus efficient optimization algorithms. The new method has immediate applications in statistics, notably in penalized spline smoothing problems. In particular, the LASSO problem is shown to be polynomial time solvable. Numerical studies show promise of our approach.
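
    The L1 (LASSO) special case the abstract singles out as tractable can be illustrated with scikit-learn, which solves it by coordinate descent; the sparse setup below is synthetic.

    ```python
    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(10)
    X = rng.normal(size=(100, 20))
    beta = np.zeros(20)
    beta[:3] = [2.0, -1.5, 1.0]                 # sparse ground truth
    y = X @ beta + rng.normal(size=100)

    fit = Lasso(alpha=0.1).fit(X, y)
    print(np.nonzero(fit.coef_)[0])             # recovered support
    ```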

  4. Standardized Definitions for Code Verification Test Problems

    Energy Technology Data Exchange (ETDEWEB)

    Doebling, Scott William [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2017-09-14

    This document contains standardized definitions for several commonly used code verification test problems. These definitions are intended to contain sufficient information to set up the test problem in a computational physics code. These definitions are intended to be used in conjunction with exact solutions to these problems generated using ExactPack, www.github.com/lanl/exactpack.

  5. Partial F-tests with multiply imputed data in the linear regression framework via coefficient of determination.

    Science.gov (United States)

    Chaurasia, Ashok; Harel, Ofer

    2015-02-10

    Tests for regression coefficients such as global, local, and partial F-tests are common in applied research. In the framework of multiple imputation, there are several papers addressing tests for regression coefficients. However, for simultaneous hypothesis testing, the existing methods are computationally intensive because they involve calculation with vectors and (inversion of) matrices. In this paper, we propose a simple method based on the scalar entity, coefficient of determination, to perform (global, local, and partial) F-tests with multiply imputed data. The proposed method is evaluated using simulated data and applied to suicide prevention data. Copyright © 2014 John Wiley & Sons, Ltd.
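
    The R^2-based partial F statistic the method builds on is shown below for a single complete data set; the paper's actual contribution, the combining rule across multiple imputations, is not reproduced here.

    ```python
    import numpy as np
    import statsmodels.api as sm
    from scipy import stats

    rng = np.random.default_rng(11)
    n = 200
    X = rng.normal(size=(n, 3))
    y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=n)

    full = sm.OLS(y, sm.add_constant(X)).fit()
    reduced = sm.OLS(y, sm.add_constant(X[:, :1])).fit()

    q, p = 2, X.shape[1]                   # q restrictions, p predictors
    F = ((full.rsquared - reduced.rsquared) / q) / \
        ((1 - full.rsquared) / (n - p - 1))
    print(F, stats.f.sf(F, q, n - p - 1))  # partial F and its p-value
    ```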

  6. Sierra/SolidMechanics 4.46 Example Problems Manual.

    Energy Technology Data Exchange (ETDEWEB)

    Plews, Julia A.; Crane, Nathan K; de Frias, Gabriel Jose; Le, San; Littlewood, David John; Merewether, Mark Thomas; Mosby, Matthew David; Pierson, Kendall H.; Porter, Vicki L.; Shelton, Timothy; Thomas, Jesse David; Tupek, Michael R.; Veilleux, Michael

    2018-03-01

    Presented in this document are tests that exist in the Sierra/SolidMechanics example problem suite, which is a subset of the Sierra/SM regression and performance test suite. These examples showcase common and advanced code capabilities. A wide variety of other regression and verification tests exist in the Sierra/SM test suite that are not included in this manual.

  7. Multiple Linear Regression for Reconstruction of Gene Regulatory Networks in Solving Cascade Error Problems.

    Science.gov (United States)

    Salleh, Faridah Hani Mohamed; Zainudin, Suhaila; Arif, Shereena M

    2017-01-01

    Gene regulatory network (GRN) reconstruction is the process of identifying regulatory gene interactions from experimental data through computational analysis. One of the main reasons for the reduced performance of previous GRN methods has been the inaccurate prediction of cascade motifs. Cascade error is defined as the wrong prediction of cascade motifs, where an indirect interaction is misinterpreted as a direct interaction. Despite the active research on various GRN prediction methods, the discussion of specific methods to solve problems related to cascade errors is still lacking. In fact, the experiments conducted by past studies were not specifically geared towards proving the ability of GRN prediction methods to avoid the occurrence of cascade errors. Hence, this research proposes Multiple Linear Regression (MLR) to infer GRNs from gene expression data and to avoid wrongly inferring an indirect interaction (A → B → C) as a direct interaction (A → C). Since the number of observations in the real experiment datasets was far less than the number of predictors, some predictors were eliminated by extracting random subnetworks from global interaction networks via an established extraction method. In addition, the experiment was extended to assess the effectiveness of MLR in dealing with cascade error by using a novel experimental procedure proposed in this work. The experiment revealed that the number of cascade errors was minimal. Apart from that, the Belsley collinearity test showed that multicollinearity strongly affected the datasets used in this experiment. All the tested subnetworks obtained satisfactory results, with AUROC values above 0.5.
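
    The core regression step can be sketched simply: model each target gene's expression as a linear function of candidate regulators and rank regulators by coefficient magnitude. The expression matrix below is synthetic, and the subnetwork extraction and cascade-error checks of the paper are not reproduced.

    ```python
    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(12)
    n_samples, n_genes = 50, 10
    expr = rng.normal(size=(n_samples, n_genes))
    expr[:, 0] = 0.9 * expr[:, 1] + 0.1 * rng.normal(size=n_samples)  # gene 1 -> gene 0

    target = 0
    regulators = [g for g in range(n_genes) if g != target]
    fit = LinearRegression().fit(expr[:, regulators], expr[:, target])
    ranking = sorted(zip(regulators, np.abs(fit.coef_)), key=lambda t: -t[1])
    print(ranking[:3])   # top candidate regulators of the target gene
    ```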

  8. Stochastic search, optimization and regression with energy applications

    Science.gov (United States)

    Hannah, Lauren A.

    models. We evaluate DP-GLM on several data sets, comparing it to modern methods of nonparametric regression like CART, Bayesian trees and Gaussian processes. Compared to existing techniques, the DP-GLM provides a single model (and corresponding inference algorithms) that performs well in many regression settings. Finally, we study convex stochastic search problems where a noisy objective function value is observed after a decision is made. There are many stochastic search problems whose behavior depends on an exogenous state variable which affects the shape of the objective function. Currently, there is no general purpose algorithm to solve this class of problems. We use nonparametric density estimation to take observations from the joint state-outcome distribution and use them to infer the optimal decision for a given query state. We propose two solution methods that depend on the problem characteristics: function-based and gradient-based optimization. We examine two weighting schemes, kernel-based weights and Dirichlet process-based weights, for use with the solution methods. The weights and solution methods are tested on a synthetic multi-product newsvendor problem and the hour-ahead wind commitment problem. Our results show that in some cases Dirichlet process weights offer substantial benefits over kernel based weights and more generally that nonparametric estimation methods provide good solutions to otherwise intractable problems.

  9. Application of multilinear regression analysis in modeling of soil ...

    African Journals Online (AJOL)

    The application of Multi-Linear Regression Analysis (MLRA) model for predicting soil properties in Calabar South offers a technical guide and solution in foundation designs problems in the area. Forty-five soil samples were collected from fifteen different boreholes at a different depth and 270 tests were carried out for CBR, ...

  10. Predicting Antitumor Activity of Peptides by Consensus of Regression Models Trained on a Small Data Sample

    Directory of Open Access Journals (Sweden)

    Ivanka Jerić

    2011-11-01

    Full Text Available Predicting the antitumor activity of compounds using regression models trained on a small number of compounds with measured biological activity is an ill-posed inverse problem. Yet, it occurs very often within the academic community. To counteract, to some extent, overfitting problems caused by small training data, we propose to use a consensus of six regression models for predicting the biological activity of a virtual library of compounds. The QSAR descriptors of 22 compounds related to the opioid growth factor (OGF, Tyr-Gly-Gly-Phe-Met) with known antitumor activity were used to train the regression models: the feed-forward artificial neural network, the k-nearest neighbor, sparseness-constrained linear regression, and the linear and nonlinear (with polynomial and Gaussian kernel) support vector machine. The regression models were applied to a virtual library of 429 compounds, which resulted in six lists of candidate compounds ranked by predicted antitumor activity. The highly ranked candidate compounds were synthesized, characterized and tested for antiproliferative activity. Some of the prepared peptides showed more pronounced activity compared with the native OGF; however, they were less active than the highly ranked compounds selected previously by the radial basis function support vector machine (RBF SVM) regression model. The ill-posedness of the related inverse problem causes unstable behavior of trained regression models on test data. These results point to the high complexity of prediction based on regression models trained on a small data sample.
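
    A consensus-by-averaging sketch is shown below with stand-in models from scikit-learn; the paper's six specific regressors and real QSAR descriptors are replaced by hypothetical ones, and ranking by the averaged prediction is one simple way to form the consensus.

    ```python
    import numpy as np
    from sklearn.linear_model import Lasso, LinearRegression
    from sklearn.neighbors import KNeighborsRegressor
    from sklearn.svm import SVR

    rng = np.random.default_rng(13)
    X_train = rng.normal(size=(22, 8))       # 22 compounds, 8 descriptors
    y_train = X_train[:, 0] - 0.5 * X_train[:, 1] + rng.normal(0, 0.1, 22)
    X_library = rng.normal(size=(429, 8))    # virtual library

    models = [LinearRegression(), Lasso(alpha=0.1),
              KNeighborsRegressor(n_neighbors=3), SVR(kernel="rbf")]
    preds = np.column_stack([m.fit(X_train, y_train).predict(X_library)
                             for m in models])
    consensus = preds.mean(axis=1)
    print(np.argsort(-consensus)[:5])        # top-ranked candidate compounds
    ```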

  11. A Spline-Based Lack-Of-Fit Test for Independent Variable Effect in Poisson Regression.

    Science.gov (United States)

    Li, Chin-Shang; Tu, Wanzhu

    2007-05-01

    In regression analysis of count data, independent variables are often modeled by their linear effects under the assumption of log-linearity. In reality, the validity of such an assumption is rarely tested, and its use is at times unjustifiable. A lack-of-fit test is proposed for the adequacy of a postulated functional form of an independent variable within the framework of semiparametric Poisson regression models based on penalized splines. It offers added flexibility in accommodating the potentially non-loglinear effect of the independent variable. A likelihood ratio test is constructed for the adequacy of the postulated parametric form, for example log-linearity, of the independent variable effect. Simulations indicate that the proposed model performs well, and that a misspecified parametric model has much reduced power. An example is given.

  12. Dimension Reduction and Discretization in Stochastic Problems by Regression Method

    DEFF Research Database (Denmark)

    Ditlevsen, Ove Dalager

    1996-01-01

    The chapter mainly deals with dimension reduction and field discretizations based directly on the concept of linear regression. Several examples of interesting applications in stochastic mechanics are also given.Keywords: Random fields discretization, Linear regression, Stochastic interpolation, ...

  13. Introduction to regression graphics

    CERN Document Server

    Cook, R Dennis

    2009-01-01

    Covers the use of dynamic and interactive computer graphics in linear regression analysis, focusing on analytical graphics. Features new techniques like plot rotation. The authors have composed their own regression code, using Xlisp-Stat language called R-code, which is a nearly complete system for linear regression analysis and can be utilized as the main computer program in a linear regression course. The accompanying disks, for both Macintosh and Windows computers, contain the R-code and Xlisp-Stat. An Instructor's Manual presenting detailed solutions to all the problems in the book is ava

  14. Group Work Tests for Context-Rich Problems

    Science.gov (United States)

    Meyer, Chris

    2016-05-01

    The group work test is an assessment strategy that promotes higher-order thinking skills for solving context-rich problems. With this format, teachers are able to pose challenging, nuanced questions on a test, while providing the support weaker students need to get started and show their understanding. The test begins with a group discussion phase, when students are given a "number-free" version of the problem. This phase allows students to digest the story-like problem, explore solution ideas, and alleviate some test anxiety. After 10-15 minutes of discussion, students inform the instructor of their readiness for the individual part of the test. What follows next is a pedagogical phase change from lively group discussion to quiet individual work. The group work test is a natural continuation of the group work in our daily physics classes and helps reinforce the importance of collaboration. This method has met with success at York Mills Collegiate Institute, in Toronto, Ontario, where it has been used consistently for unit tests and the final exam of the grade 12 university preparation physics course.

  15. The art of regression modeling in road safety

    CERN Document Server

    Hauer, Ezra

    2015-01-01

    This unique book explains how to fashion useful regression models from commonly available data to erect models essential for evidence-based road safety management and research. Composed from techniques and best practices presented over many years of lectures and workshops, The Art of Regression Modeling in Road Safety illustrates that fruitful modeling cannot be done without substantive knowledge about the modeled phenomenon. Class-tested in courses and workshops across North America, the book is ideal for professionals, researchers, university professors, and graduate students with an interest in, or responsibilities related to, road safety. This book also: · Presents for the first time a powerful analytical tool for road safety researchers and practitioners · Includes problems and solutions in each chapter as well as data and spreadsheets for running models and PowerPoint presentation slides · Features pedagogy well-suited for graduate courses and workshops including problems, solutions, and PowerPoint p...

  16. Is past life regression therapy ethical?

    Science.gov (United States)

    Andrade, Gabriel

    2017-01-01

    Past life regression therapy is used by some physicians in cases of certain mental disorders. Anxiety disorders, mood disorders, and gender dysphoria have all been treated using past life regression therapy by some doctors on the assumption that they reflect problems in past lives. Although it is not supported by psychiatric associations, few medical associations have actually condemned it as unethical. In this article, I argue that past life regression therapy is unethical for two basic reasons. First, it is not evidence-based. Past life regression is based on the reincarnation hypothesis, but this hypothesis is not supported by evidence, and in fact, it faces some insurmountable conceptual problems. If patients are not fully informed about these problems, they cannot provide informed consent, and hence, the principle of autonomy is violated. Second, past life regression therapy carries a great risk of implanting false memories in patients, and thus causing significant harm. This is a violation of the principle of non-maleficence, which is surely the most important principle in medical ethics.

  17. Image superresolution using support vector regression.

    Science.gov (United States)

    Ni, Karl S; Nguyen, Truong Q

    2007-06-01

    A thorough investigation of the application of support vector regression (SVR) to the superresolution problem is conducted through various frameworks. Prior to the study, the SVR problem is enhanced by finding the optimal kernel. This is done by formulating the kernel learning problem in SVR form as a convex optimization problem, specifically a semi-definite programming (SDP) problem. An additional constraint is added to reduce the SDP to a quadratically constrained quadratic programming (QCQP) problem. After this optimization, investigation of the relevancy of SVR to superresolution proceeds with the possibility of using a single and general support vector regression for all image content, and the results are impressive for small training sets. This idea is improved upon by observing structural properties in the discrete cosine transform (DCT) domain to aid in learning the regression. Further improvement involves a combination of classification and SVR-based techniques, extending works in resolution synthesis. This method, termed kernel resolution synthesis, uses specific regressors for isolated image content to describe the domain through a partitioned look of the vector space, thereby yielding good results.
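
    A bare-bones version of the learning setup is sketched below: support vector regression from low-resolution patch vectors to the corresponding high-resolution center pixel. The kernel-learning (SDP/QCQP) step and the DCT-domain classification are omitted, and the image is random noise for illustration.

    ```python
    import numpy as np
    from sklearn.svm import SVR

    rng = np.random.default_rng(14)
    hi = rng.random((64, 64))                      # stand-in high-res image
    lo = hi[::2, ::2]                              # crude 2x downsampling

    X, y = [], []
    for i in range(1, lo.shape[0] - 1):
        for j in range(1, lo.shape[1] - 1):
            X.append(lo[i - 1:i + 2, j - 1:j + 2].ravel())  # 3x3 low-res patch
            y.append(hi[2 * i, 2 * j])                      # high-res target
    model = SVR(kernel="rbf", C=10.0).fit(np.array(X), np.array(y))
    print(model.predict(np.array(X[:1])))
    ```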

  18. Distance Based Root Cause Analysis and Change Impact Analysis of Performance Regressions

    Directory of Open Access Journals (Sweden)

    Junzan Zhou

    2015-01-01

    Full Text Available Performance regression testing is applied to uncover both performance and functional problems of software releases. A performance problem revealed by performance testing can manifest as high response time, low throughput, or even loss of service. A mature performance testing process helps systematically detect software performance problems. However, it is difficult to identify the root cause and evaluate the potential change impact. In this paper, we present an approach that leverages server-side logs for identifying the root causes of performance problems. First, server-side logs are used to recover the call tree of each business transaction. We define a novel distance-based metric computed from call trees for root cause analysis and apply an inverted index from methods to business transactions for change impact analysis. Empirical studies show that our approach can effectively and efficiently help developers diagnose the root cause of performance problems.
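
    The change-impact step is easy to picture with a toy inverted index from methods to the business transactions whose recovered call trees contain them, so a changed method maps directly to the transactions it can affect. Transaction and method names below are invented.

    ```python
    from collections import defaultdict

    call_trees = {
        "checkout": ["validateCart", "priceItems", "chargeCard"],
        "browse":   ["loadCatalog", "priceItems"],
        "refund":   ["chargeCard", "writeLedger"],
    }

    index = defaultdict(set)
    for transaction, methods in call_trees.items():
        for m in methods:
            index[m].add(transaction)

    print(sorted(index["priceItems"]))   # transactions impacted by a change
    ```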

  19. Comparison of IRT Likelihood Ratio Test and Logistic Regression DIF Detection Procedures

    Science.gov (United States)

    Atar, Burcu; Kamata, Akihito

    2011-01-01

    The Type I error rates and the power of IRT likelihood ratio test and cumulative logit ordinal logistic regression procedures in detecting differential item functioning (DIF) for polytomously scored items were investigated in this Monte Carlo simulation study. For this purpose, 54 simulation conditions (combinations of 3 sample sizes, 2 sample…

  20. Multiple Linear Regression for Reconstruction of Gene Regulatory Networks in Solving Cascade Error Problems

    Directory of Open Access Journals (Sweden)

    Faridah Hani Mohamed Salleh

    2017-01-01

    Full Text Available Gene regulatory network (GRN) reconstruction is the process of identifying regulatory gene interactions from experimental data through computational analysis. One of the main reasons for the reduced performance of previous GRN methods has been the inaccurate prediction of cascade motifs. Cascade error is defined as the wrong prediction of cascade motifs, where an indirect interaction is misinterpreted as a direct interaction. Despite the active research on various GRN prediction methods, the discussion of specific methods to solve problems related to cascade errors is still lacking. In fact, the experiments conducted by past studies were not specifically geared towards proving the ability of GRN prediction methods to avoid the occurrence of cascade errors. Hence, this research proposes Multiple Linear Regression (MLR) to infer GRNs from gene expression data and to avoid wrongly inferring an indirect interaction (A → B → C) as a direct interaction (A → C). Since the number of observations in the real experiment datasets was far less than the number of predictors, some predictors were eliminated by extracting random subnetworks from global interaction networks via an established extraction method. In addition, the experiment was extended to assess the effectiveness of MLR in dealing with cascade error by using a novel experimental procedure proposed in this work. The experiment revealed that the number of cascade errors was minimal. Apart from that, the Belsley collinearity test showed that multicollinearity strongly affected the datasets used in this experiment. All the tested subnetworks obtained satisfactory results, with AUROC values above 0.5.

  1. Regressive Imagery in Creative Problem-Solving: Comparing Verbal Protocols of Expert and Novice Visual Artists and Computer Programmers

    Science.gov (United States)

    Kozbelt, Aaron; Dexter, Scott; Dolese, Melissa; Meredith, Daniel; Ostrofsky, Justin

    2015-01-01

    We applied computer-based text analyses of regressive imagery to verbal protocols of individuals engaged in creative problem-solving in two domains: visual art (23 experts, 23 novices) and computer programming (14 experts, 14 novices). Percentages of words involving primary process and secondary process thought, plus emotion-related words, were…

  2. A Simulation Investigation of Principal Component Regression.

    Science.gov (United States)

    Allen, David E.

    Regression analysis is one of the more common analytic tools used by researchers. However, multicollinearity between the predictor variables can cause problems in using the results of regression analyses. Problems associated with multicollinearity include entanglement of relative influences of variables due to reduced precision of estimation,…

  3. SOFC regulation at constant temperature: Experimental test and data regression study

    International Nuclear Information System (INIS)

    Barelli, L.; Bidini, G.; Cinti, G.; Ottaviano, A.

    2016-01-01

    Highlights: • SOFC operating temperature impacts strongly on its performance and lifetime. • Experimental tests were carried out varying the electric load and the feed gas mixture. • Three different anodic inlet gases were tested while maintaining constant temperature. • The cathodic air flow rate was used to keep the operating temperature constant. • A regression law was defined from experimental data to regulate the air flow rate. - Abstract: The operating temperature of a solid oxide fuel cell (SOFC) stack is an important parameter to be controlled, which impacts the SOFC performance and its lifetime. Rapid temperature change implies significant temperature differences between the surface and the mean body, leading to a state of thermal shock. Thermal shock and thermal cycling introduce stress in a material due to temperature differences between the surface and the interior, or between different regions of the cell. In this context, in order to determine a control law that keeps the fuel cell temperature constant while the electrical load and the infeed fuel mixture vary, an experimental activity was carried out on a planar SOFC short stack to analyse the stack temperature. Specifically, three different anodic inlet gas compositions were tested: pure hydrogen, and reformed natural gas with steam-to-carbon ratios equal to 2 and 2.5. By processing the obtained results, a regression law was defined to regulate the air flow rate to be provided to the fuel cell so as to keep its operating temperature constant as its operating conditions vary.

  4. The use of cognitive ability measures as explanatory variables in regression analysis.

    Science.gov (United States)

    Junker, Brian; Schofield, Lynne Steuerle; Taylor, Lowell J

    2012-12-01

    Cognitive ability measures are often taken as explanatory variables in regression analysis, e.g., as a factor affecting a market outcome such as an individual's wage, or a decision such as an individual's education acquisition. Cognitive ability is a latent construct; its true value is unobserved. Nonetheless, researchers often assume that a test score, constructed via standard psychometric practice from individuals' responses to test items, can be safely used in regression analysis. We examine problems that can arise, and suggest that an alternative approach, a "mixed effects structural equations" (MESE) model, may be more appropriate in many circumstances.

  5. Advanced statistics: linear regression, part I: simple linear regression.

    Science.gov (United States)

    Marill, Keith A

    2004-01-01

    Simple linear regression is a mathematical technique used to model the relationship between a single independent predictor variable and a single dependent outcome variable. In this, the first of a two-part series exploring concepts in linear regression analysis, the four fundamental assumptions and the mechanics of simple linear regression are reviewed. The most common technique used to derive the regression line, the method of least squares, is described. The reader will be acquainted with other important concepts in simple linear regression, including: variable transformations, dummy variables, relationship to inference testing, and leverage. Simplified clinical examples with small datasets and graphic models are used to illustrate the points. This will provide a foundation for the second article in this series: a discussion of multiple linear regression, in which there are multiple predictor variables.
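
    The method of least squares described here reduces to two closed-form expressions; a minimal sketch with illustrative data:

        # Simple linear regression by least squares:
        # slope = Sxy / Sxx, intercept = ybar - slope * xbar.
        import numpy as np

        def least_squares_line(x, y):
            x, y = np.asarray(x, float), np.asarray(y, float)
            slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
            intercept = y.mean() - slope * x.mean()
            return slope, intercept

        x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
        y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])
        print(least_squares_line(x, y))   # approximately (1.95, 0.15)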

  6. [Comparison of application of Cochran-Armitage trend test and linear regression analysis for rate trend analysis in epidemiology study].

    Science.gov (United States)

    Wang, D Z; Wang, C; Shen, C F; Zhang, Y; Zhang, H; Song, G D; Xue, X D; Xu, Z L; Zhang, S; Jiang, G H

    2017-05-10

    We described the time trend of acute myocardial infarction (AMI) incidence rate in Tianjin from 1999 to 2013 with the Cochran-Armitage trend (CAT) test and linear regression analysis, and compared the results. Based on the actual population, the CAT test had much stronger statistical power than linear regression analysis for both the overall incidence trend and the age-specific incidence trends (Cochran-Armitage trend test P value < linear regression P value). The statistical power of the CAT test decreased, while the result of linear regression analysis remained the same, when the population size was reduced by 100 times and the AMI incidence rate remained unchanged. The two statistical methods have their advantages and disadvantages. It is necessary to choose the statistical method according to the fitting degree of the data, or to comprehensively analyze the results of both methods.
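
    The CAT test is short enough to implement directly; the sketch below assumes equally spaced year scores and uses illustrative case counts, not the Tianjin data:

        # Cochran-Armitage trend test for annual case counts:
        # cases[i] events out of n[i] persons in year i, with year scores w.
        import numpy as np
        from scipy.stats import norm

        def cochran_armitage(cases, n, scores=None):
            cases, n = np.asarray(cases, float), np.asarray(n, float)
            w = np.arange(len(n), dtype=float) if scores is None else np.asarray(scores, float)
            p = cases.sum() / n.sum()
            num = np.sum(w * (cases - n * p))
            var = p * (1 - p) * (np.sum(n * w**2) - np.sum(n * w)**2 / n.sum())
            z = num / np.sqrt(var)
            return z, 2 * norm.sf(abs(z))      # two-sided p value

        z, pval = cochran_armitage([30, 38, 51, 60], [10000, 10200, 10100, 9900])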

  7. Why some children with externalising problems develop internalising symptoms: testing two pathways in a genetically sensitive cohort study.

    Science.gov (United States)

    Wertz, Jasmin; Zavos, Helena; Matthews, Timothy; Harvey, Kirsten; Hunt, Alice; Pariante, Carmine M; Arseneault, Louise

    2015-07-01

    Children with externalising problems are at risk of developing internalising problems as they grow older. The pathways underlying this developmental association remain to be elucidated. We tested two processes that could explain why some children with externalising problems develop internalising symptoms in preadolescence: a mediation model, whereby the association between early externalising and later new internalising symptoms is explained by negative experiences; and a genetic model, whereby genes influence both problems. We used data from the Environmental Risk (E-Risk) Study, a 1994-1995 birth cohort of 2,232 twins born in England and Wales. We assessed externalising and internalising problems using combined mothers' and teachers' ratings at ages 5 and 12. We measured bullying victimisation, maternal dissatisfaction and academic difficulties between ages 7 and 10 and used linear regression analyses to test the effects of these negative experiences on the association between early externalising and later internalising problems. We employed a Cholesky decomposition to examine the genetic influences on the association. Children with externalising problems at age 5 showed increased rates of new internalising problems at age 12 (r = .24, p < .001). Genetic influences played an important role in explaining why some children with externalising problems develop internalising symptoms in preadolescence. Negative experiences also contribute to the association, possibly through gene-environment interplay. Mental health professionals should monitor the development of internalising symptoms in young children with externalising problems. © 2014 Association for Child and Adolescent Mental Health.

  8. Accounting for regression-to-the-mean in tests for recent changes in institutional performance: analysis and power.

    Science.gov (United States)

    Jones, Hayley E; Spiegelhalter, David J

    2009-05-30

    Recent changes in individual units are often of interest when monitoring and assessing the performance of healthcare providers. We consider three high profile examples: (a) annual teenage pregnancy rates in English local authorities, (b) quarterly rates of the hospital-acquired infection Clostridium difficile in National Health Service (NHS) Trusts and (c) annual mortality rates following heart surgery in New York State hospitals. Increasingly, government targets call for continual improvements, in each individual provider as well as overall. Owing to the well-known statistical phenomenon of regression-to-the-mean, observed changes between just two measurements are potentially misleading. This problem has received much attention in other areas, but there is a need for guidelines within performance monitoring. In this paper we show theoretically and with worked examples that a simple random effects predictive distribution can be used to 'correct' for the potentially undesirable consequences of regression-to-the-mean on a test for individual change. We discuss connections to the literature in other fields, and build upon this, in particular by examining the effect of the correction on the power to detect genuine changes. It is demonstrated that a gain in average power can be expected, but that this gain is only very slight if the providers are very different from one another, for example due to poor risk adjustment. Further, the power of the corrected test depends on the provider's baseline rate and, although large gains can be expected for some providers, this is at the cost of some power to detect real changes in others. (c) 2009 John Wiley & Sons, Ltd.
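
    The flavor of the correction can be sketched with simple empirical-Bayes shrinkage; the method-of-moments variance split below is an illustrative simplification of the paper's random effects predictive distribution, not its exact procedure:

        # Sketch: shrink each provider's baseline rate toward the overall mean,
        # then compare the follow-up value against its predictive distribution.
        import numpy as np
        from scipy.stats import norm

        def rtm_corrected_z(y1, y2, se2):
            """y1, y2: baseline and follow-up rates per provider; se2: sampling variance."""
            y1, y2, se2 = (np.asarray(a, float) for a in (y1, y2, se2))
            mu = y1.mean()
            tau2 = max(y1.var(ddof=1) - se2.mean(), 0.0)   # between-provider variance
            shrink = tau2 / (tau2 + se2)                   # shrinkage toward mu
            pred_mean = mu + shrink * (y1 - mu)            # predictive mean of y2
            pred_var = shrink * se2 + se2                  # predictive variance of y2
            z = (y2 - pred_mean) / np.sqrt(pred_var)
            return z, 2 * norm.sf(np.abs(z))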

  9. 49 CFR 40.205 - How are drug test problems corrected?

    Science.gov (United States)

    2010-10-01

    ... 49 Transportation 1 2010-10-01 2010-10-01 false How are drug test problems corrected? 40.205 Section 40.205 Transportation Office of the Secretary of Transportation PROCEDURES FOR TRANSPORTATION WORKPLACE DRUG AND ALCOHOL TESTING PROGRAMS Problems in Drug Tests § 40.205 How are drug test problems...

  10. Autistic Regression

    Science.gov (United States)

    Matson, Johnny L.; Kozlowski, Alison M.

    2010-01-01

    Autistic regression is one of the many mysteries in the developmental course of autism and pervasive developmental disorders not otherwise specified (PDD-NOS). Various definitions of this phenomenon have been used, further clouding the study of the topic. Despite this problem, some efforts at establishing prevalence have been made. The purpose of…

  11. Pivotal statistics for testing subsets of structural parameters in the IV Regression Model

    NARCIS (Netherlands)

    Kleibergen, F.R.

    2000-01-01

    We construct a novel statistic to test hypotheses on subsets of the structural parameters in an Instrumental Variables (IV) regression model. We derive the chi-squared limiting distribution of the statistic and show that it has a degrees-of-freedom parameter that is equal to the number of structural…

  12. A multiple objective test assembly approach for exposure control problems in Computerized Adaptive Testing

    Directory of Open Access Journals (Sweden)

    Theo J.H.M. Eggen

    2010-01-01

    Full Text Available Overexposure and underexposure of items in the bank are serious problems in operational computerized adaptive testing (CAT) systems. These exposure problems might result in item compromise, or point to a waste of investments. The exposure control problem can be viewed as a test assembly problem with multiple objectives. Information in the test has to be maximized, item compromise has to be minimized, and pool usage has to be optimized. In this paper, a multiple-objective method is developed to deal with both types of exposure problems. In this method, exposure control parameters based on observed exposure rates are implemented as weights for the information in the item selection procedure. The method does not need time-consuming simulation studies, and it can be implemented conditional on ability level. The method is compared with the Sympson-Hetter method for exposure control, with the Progressive method, and with alpha-stratified testing. The results show that the method is successful in dealing with both kinds of exposure problems.

  13. Unbalanced Regressions and the Predictive Equation

    DEFF Research Database (Denmark)

    Osterrieder, Daniela; Ventosa-Santaulària, Daniel; Vera-Valdés, J. Eduardo

    Predictive return regressions with persistent regressors are typically plagued by (asymptotically) biased/inconsistent estimates of the slope, non-standard or potentially even spurious statistical inference, and regression unbalancedness. We alleviate the problem of unbalancedness in the theoretical predictive equation by suggesting a data generating process, where returns are generated as linear functions of a lagged latent I(0) risk process. The observed predictor is a function of this latent I(0) process, but it is corrupted by a fractionally integrated noise. Such a process may arise due to aggregation or unexpected level shifts. In this setup, the practitioner estimates a misspecified, unbalanced, and endogenous predictive regression. We show that the OLS estimate of this regression is inconsistent, but standard inference is possible. To obtain a consistent slope estimate, we then suggest…

  14. Test set for initial value problem solvers

    NARCIS (Netherlands)

    W.M. Lioen (Walter); J.J.B. de Swart (Jacques)

    1998-01-01

    The CWI test set for IVP solvers presents a collection of Initial Value Problems to test solvers for implicit differential equations. This test set can both decrease the effort for the code developer to test his software in a reliable way, and cross the bridge between the application…

  15. Testing Mediation Using Multiple Regression and Structural Equation Modeling Analyses in Secondary Data

    Science.gov (United States)

    Li, Spencer D.

    2011-01-01

    Mediation analysis in child and adolescent development research is possible using large secondary data sets. This article provides an overview of two statistical methods commonly used to test mediated effects in secondary analysis: multiple regression and structural equation modeling (SEM). Two empirical studies are presented to illustrate the…

  16. Regression modeling methods, theory, and computation with SAS

    CERN Document Server

    Panik, Michael

    2009-01-01

    Regression Modeling: Methods, Theory, and Computation with SAS provides an introduction to a diverse assortment of regression techniques using SAS to solve a wide variety of regression problems. The author fully documents the SAS programs and thoroughly explains the output produced by the programs. The text presents the popular ordinary least squares (OLS) approach before introducing many alternative regression methods. It covers nonparametric regression, logistic regression (including Poisson regression), Bayesian regression, robust regression, fuzzy regression, random coefficients regression,…

  17. Motor operated valves problems tests and simulations

    Energy Technology Data Exchange (ETDEWEB)

    Pinier, D.; Haas, J.L.

    1996-12-01

    An analysis of the two refusals of operation of the EAS recirculation shutoff valves enabled two distinct problems to be identified on the motorized valves: the calculation methods for the operating torques of valves in use in the power plants are not conservative enough, which results in the misadjustment of the torque limiters installed on their motorizations; the second problem concerns the pressure locking phenomenon: a number of valves may entrap a pressure exceeding the in-line pressure between the disks, which may cause a jamming of the valve. EDF has taken the following approach to settle the first problem: determination of the friction coefficients and the efficiency of the valve and its actuator through general and specific tests and models, and definition of a new calculation method. In order to solve the second problem, EDF has carried out the following operations: identification of the valves whose technology enables the pressure to be entrapped (the tests and numerical simulations carried out in the Research and Development Division confirm the possibility of a "boiler" effect); determination of the necessary modifications; and development and testing of anti-boiler-effect systems.

  18. Motor operated valves problems tests and simulations

    International Nuclear Information System (INIS)

    Pinier, D.; Haas, J.L.

    1996-01-01

    An analysis of the two refusals of operation of the EAS recirculation shutoff valves enabled two distinct problems to be identified on the motorized valves: the calculation methods for the operating torques of valves in use in the power plants are not conservative enough, which results in the misadjustment of the torque limiters installed on their motorizations; the second problem concerns the pressure locking phenomenon: a number of valves may entrap a pressure exceeding the in-line pressure between the disks, which may cause a jamming of the valve. EDF has taken the following approach to settle the first problem: determination of the friction coefficients and the efficiency of the valve and its actuator through general and specific tests and models, and definition of a new calculation method. In order to solve the second problem, EDF has carried out the following operations: identification of the valves whose technology enables the pressure to be entrapped (the tests and numerical simulations carried out in the Research and Development Division confirm the possibility of a "boiler" effect); determination of the necessary modifications; and development and testing of anti-boiler-effect systems.

  19. Regression and regression analysis time series prediction modeling on climate data of quetta, pakistan

    International Nuclear Information System (INIS)

    Jafri, Y.Z.; Kamal, L.

    2007-01-01

    Various statistical techniques were applied to five-year data, from 1998-2002, on average humidity, rainfall, and maximum and minimum temperatures, respectively. Relationships for regression analysis time series (RATS) were developed to determine the overall trend of these climate parameters, on the basis of which forecast models can be corrected and modified. We computed the coefficient of determination as a measure of goodness of fit for our polynomial regression analysis time series (PRATS). Correlations for multiple linear regression (MLR) and multiple linear regression analysis time series (MLRATS) were also developed for deciphering the interdependence of weather parameters. Spearman's rank correlation and the Goldfeld-Quandt test were used to check the uniformity or non-uniformity of variances in our fit to polynomial regression (PR). The Breusch-Pagan test was applied to MLR and MLRATS, respectively, which yielded homoscedasticity. We also employed Bartlett's test for homogeneity of variances on five-year data of rainfall and humidity, which showed that the variances in the rainfall data were not homogeneous, while those in the humidity data were homogeneous. Our results on regression and regression analysis time series show the best fit to prediction modeling on climatic data of Quetta, Pakistan. (author)
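
    The diagnostics named in this abstract are available off the shelf; a sketch with statsmodels and scipy on synthetic data (not the Quetta records):

        # Residual diagnostics on a generic fitted OLS model.
        import numpy as np
        import statsmodels.api as sm
        from statsmodels.stats.diagnostic import het_breuschpagan, het_goldfeldquandt
        from scipy.stats import bartlett

        rng = np.random.default_rng(1)
        X = sm.add_constant(rng.normal(size=(120, 2)))
        y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=120)

        fit = sm.OLS(y, X).fit()
        lm_stat, lm_pval, _, _ = het_breuschpagan(fit.resid, X)   # homoscedasticity
        gq_stat, gq_pval, _ = het_goldfeldquandt(y, X)            # variance uniformity
        b_stat, b_pval = bartlett(y[:60], y[60:])                 # variance homogeneity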

  20. Validity of the Clock Drawing Test in predicting reports of driving problems in the elderly

    Directory of Open Access Journals (Sweden)

    Banou Evangelia

    2004-10-01

    Full Text Available Background This study examined the use of the Folstein Mini Mental Status Exam (MMSE) and the Clock Drawing Test (CDT) in predicting retrospective reports of driving problems among the elderly. The utility of existing scoring systems for the CDT was also examined. Methods Archival chart records of 325 patients of a geriatric outpatient clinic were reviewed, of which 162 had CDT results (including original clock drawings). T-test, correlation, and regression procedures were used to analyze the data. Results Both CDT and MMSE scores were significantly worse among non-drivers than among individuals who were currently or recently driving. Among current or recent drivers, scores on both instruments correlated significantly with the total number of reported accidents or near misses, although the magnitude of the respective correlations was small. Only MMSE scores, however, significantly predicted whether or not any accidents or near misses were reported at all. Neither MMSE nor CDT scores predicted unique variance in the regressions. Conclusions The overall results suggest that both the MMSE and CDT have limited utility as potential indicators of driving problems in the elderly. The demonstrated predictive power for these instruments appears to be redundant, such that both appear to assess general cognitive function versus more specific abilities. Furthermore, the lack of robust prediction suggests that neither is sufficient to serve as a stand-alone instrument on which to solely base decisions of driving capacity. Rather, individuals who evidence impairment should be provided a more thorough and comprehensive assessment than can be obtained through screening tools.

  1. Projected regression method for solving Fredholm integral equations arising in the analytic continuation problem of quantum physics

    International Nuclear Information System (INIS)

    Arsenault, Louis-François; Millis, Andrew J; Neuberg, Richard; Hannah, Lauren A

    2017-01-01

    We present a supervised machine learning approach to the inversion of Fredholm integrals of the first kind as they arise, for example, in the analytic continuation problem of quantum many-body physics. The approach provides a natural regularization for the ill-conditioned inverse of the Fredholm kernel, as well as an efficient and stable treatment of constraints. The key observation is that the stability of the forward problem permits the construction of a large database of outputs for physically meaningful inputs. Applying machine learning to this database generates a regression function of controlled complexity, which returns approximate solutions for previously unseen inputs; the approximate solutions are then projected onto the subspace of functions satisfying relevant constraints. Under standard error metrics the method performs as well as or better than the Maximum Entropy method for low input noise and is substantially more robust to increased input noise. We suggest that the methodology will be similarly effective for other problems involving a formally ill-conditioned inversion of an integral operator, provided that the forward problem can be efficiently solved. (paper)
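
    A toy version of the learn-then-project idea, with ridge regression standing in for the paper's regression function and a nonnegativity-plus-normalization projection standing in for its constraint set; the kernel and data below are illustrative:

        # Build a database by solving the (stable) forward problem for many
        # plausible inputs, fit a regression from data g to input x, then
        # project predictions onto the constraint set.
        import numpy as np

        rng = np.random.default_rng(2)
        m, n, N = 40, 60, 5000
        t = np.linspace(0, 1, m)[:, None]
        w = np.linspace(0, 1, n)[None, :]
        K = np.exp(-10.0 * t * w)                      # a smooth, ill-conditioned kernel

        X = rng.dirichlet(np.ones(n), size=N)          # database of valid inputs
        G = X @ K.T + 1e-4 * rng.normal(size=(N, m))   # forward problem plus noise

        lam = 1e-3                                     # ridge regression g -> x
        W = np.linalg.solve(G.T @ G + lam * np.eye(m), G.T @ X)

        def predict(g):
            x = np.clip(g @ W, 0.0, None)              # project onto constraints
            return x / x.sum()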

  2. A meta-regression analysis of 41 Australian problem gambling prevalence estimates and their relationship to total spending on electronic gaming machines

    Directory of Open Access Journals (Sweden)

    Francis Markham

    2017-05-01

    Full Text Available Background Many jurisdictions regularly conduct surveys to estimate the prevalence of problem gambling in their adult populations. However, the comparison of such estimates is problematic due to methodological variations between studies. Total consumption theory suggests that an association between mean electronic gaming machine (EGM) and casino gambling losses and problem gambling prevalence estimates may exist. If this is the case, then changes in EGM losses may be used as a proxy indicator for changes in problem gambling prevalence. To test for this association this study examines the relationship between aggregated losses on electronic gaming machines (EGMs) and problem gambling prevalence estimates for Australian states and territories between 1994 and 2016. Methods A Bayesian meta-regression analysis of 41 cross-sectional problem gambling prevalence estimates was undertaken using EGM gambling losses, year of survey and methodological variations as predictor variables. General population studies of adults in Australian states and territories published before 1 July 2016 were considered in scope. 41 studies were identified, with a total of 267,367 participants. Problem gambling prevalence, moderate-risk problem gambling prevalence, problem gambling screen, administration mode and frequency threshold were extracted from surveys. Administrative data on EGM and casino gambling losses were extracted from government reports and expressed as the proportion of household disposable income lost. Results Money lost on EGMs is correlated with problem gambling prevalence. An increase of 1% of household disposable income lost on EGMs and in casinos was associated with problem gambling prevalence estimates that were 1.33 times higher [95% credible interval 1.04, 1.71]. There was no clear association between EGM losses and moderate-risk problem gambling prevalence estimates. Moderate-risk problem gambling prevalence estimates were not explained by…

  3. A meta-regression analysis of 41 Australian problem gambling prevalence estimates and their relationship to total spending on electronic gaming machines.

    Science.gov (United States)

    Markham, Francis; Young, Martin; Doran, Bruce; Sugden, Mark

    2017-05-23

    Many jurisdictions regularly conduct surveys to estimate the prevalence of problem gambling in their adult populations. However, the comparison of such estimates is problematic due to methodological variations between studies. Total consumption theory suggests that an association between mean electronic gaming machine (EGM) and casino gambling losses and problem gambling prevalence estimates may exist. If this is the case, then changes in EGM losses may be used as a proxy indicator for changes in problem gambling prevalence. To test for this association this study examines the relationship between aggregated losses on electronic gaming machines (EGMs) and problem gambling prevalence estimates for Australian states and territories between 1994 and 2016. A Bayesian meta-regression analysis of 41 cross-sectional problem gambling prevalence estimates was undertaken using EGM gambling losses, year of survey and methodological variations as predictor variables. General population studies of adults in Australian states and territories published before 1 July 2016 were considered in scope. 41 studies were identified, with a total of 267,367 participants. Problem gambling prevalence, moderate-risk problem gambling prevalence, problem gambling screen, administration mode and frequency threshold were extracted from surveys. Administrative data on EGM and casino gambling losses were extracted from government reports and expressed as the proportion of household disposable income lost. Money lost on EGMs is correlated with problem gambling prevalence. An increase of 1% of household disposable income lost on EGMs and in casinos was associated with problem gambling prevalence estimates that were 1.33 times higher [95% credible interval 1.04, 1.71]. There was no clear association between EGM losses and moderate-risk problem gambling prevalence estimates. Moderate-risk problem gambling prevalence estimates were not explained by the models (I² ≥ 0.97; R² ≤ 0.01). The…

  4. Alcohol Use-Related Problems Among a Rural Indian Population of West Bengal: An Application of the Alcohol Use Disorders Identification Test (AUDIT).

    Science.gov (United States)

    Barik, Anamitra; Rai, Rajesh Kumar; Chowdhury, Abhijit

    2016-03-01

    To examine alcohol use and related problems among a rural subset of the Indian population. The Alcohol Use Disorders Identification Test (AUDIT) was used as part of Health and Demographic Surveillance of 36,611 individuals aged ≥18 years. From this survey, data on 3671 current alcohol users were analysed using bivariate and multivariate ordered logit regression. Over 19% of males and 2.4% of females were current alcohol users. Mean ethanol consumption on a typical drinking day was estimated to be higher among males (96.3 g) than among females (56.5 g). The mean AUDIT score among current alcohol users was 11. In the ordered logit regression, estimated alcohol use-related problems were low among women, Scheduled Tribes and unmarried people, whereas they were high among Muslims. This rural population appears to be in need of an effective intervention program, perhaps targeting men and the household, aimed at reducing the level of alcohol use and related problems. © The Author 2015. Medical Council on Alcohol and Oxford University Press. All rights reserved.

  5. Change-based test selection : An empirical evaluation

    NARCIS (Netherlands)

    Soetens, Quinten; Demeyer, Serge; Zaidman, A.E.; Perez, Javier

    2015-01-01

    Regression test selection (i.e., selecting a subset of a given regression test suite) is a problem that has been studied intensely over the last decade. However, with the increasing popularity of developer tests as the driver of the test process, more fine-grained solutions that work well within the

  6. Estimation of Genetic Parameters for First Lactation Monthly Test-day Milk Yields using Random Regression Test Day Model in Karan Fries Cattle

    Directory of Open Access Journals (Sweden)

    Ajay Singh

    2016-06-01

    Full Text Available A single-trait linear mixed random regression test-day model was applied for the first time for analyzing the first-lactation monthly test-day milk yield records in Karan Fries cattle. The test-day milk yield data were modeled using a random regression model (RRM) considering different orders of Legendre polynomials for the additive genetic effect (4th order) and the permanent environmental effect (5th order). Data pertaining to 1,583 lactation records spread over a period of 30 years were recorded and analyzed in the study. The variance components, heritability and genetic correlations among test-day milk yields were estimated using the RRM. RRM heritability estimates of test-day milk yield varied from 0.11 to 0.22 across test-day records. The estimates of genetic correlations between different test-day milk yields ranged from 0.01 (test-day 1 [TD-1] and TD-11) to 0.99 (TD-4 and TD-5). The magnitude of the genetic correlations between test-day milk yields decreased as the interval between test-days increased, and adjacent test-days had higher correlations. Additive genetic and permanent environment variances were higher for test-day milk yields at both ends of lactation. The residual variance was observed to be lower than the permanent environment variance for all test-day milk yields.
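
    The Legendre covariables of such a model are easy to generate; the sketch below maps days in milk to [-1, 1] over an assumed 5-305-day range and uses plain (unnormalized) Legendre polynomials, since the abstract does not print its basis:

        # Legendre polynomial covariates for a random regression test-day model:
        # order 4 for the additive genetic effect, order 5 for permanent environment.
        import numpy as np
        from numpy.polynomial.legendre import legvander

        def legendre_covariates(dim, order, dim_min=5, dim_max=305):
            t = 2.0 * (np.asarray(dim, float) - dim_min) / (dim_max - dim_min) - 1.0
            return legvander(t, order)          # shape (len(dim), order + 1)

        Z_additive = legendre_covariates([35, 65, 95, 125], order=4)
        Z_permanent = legendre_covariates([35, 65, 95, 125], order=5)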

  7. Regression to Causality : Regression-style presentation influences causal attribution

    DEFF Research Database (Denmark)

    Bordacconi, Mats Joe; Larsen, Martin Vinæs

    2014-01-01

    Our experiment shows that subjects who were presented with results as estimates from a regression model were more inclined to interpret these results causally. Our experiment implies that scholars using regression models – one of the primary vehicles for analyzing statistical results in political science – encourage causal interpretation. Specifically, we demonstrate that presenting observational results in a regression model, rather than as a simple comparison of means, makes causal interpretation of the results more likely. Our experiment drew on a sample of 235 university students from three different social science degree programs (political science, sociology and economics), all of whom had received substantial training in statistics. The subjects were asked to compare and evaluate the validity of equivalent results presented as either regression models or as a test of two sample means.

  8. Simulation and Analysis of Converging Shock Wave Test Problems

    Energy Technology Data Exchange (ETDEWEB)

    Ramsey, Scott D. [Los Alamos National Laboratory; Shashkov, Mikhail J. [Los Alamos National Laboratory

    2012-06-21

    Results and analysis pertaining to the simulation of the Guderley converging shock wave test problem (and associated code verification hydrodynamics test problems involving converging shock waves) in the LANL ASC radiation-hydrodynamics code xRAGE are presented. One-dimensional (1D) spherical and two-dimensional (2D) axi-symmetric geometric setups are utilized and evaluated in this study, as is an instantiation of the xRAGE adaptive mesh refinement capability. For the 2D simulations, a 'Surrogate Guderley' test problem is developed and used to obviate subtleties inherent to the true Guderley solution's initialization on a square grid, while still maintaining a high degree of fidelity to the original problem, and minimally straining the general credibility of associated analysis and conclusions.

  9. 49 CFR 40.271 - How are alcohol testing problems corrected?

    Science.gov (United States)

    2010-10-01

    ... 49 Transportation 1 2010-10-01 2010-10-01 false How are alcohol testing problems corrected? 40.271 Section 40.271 Transportation Office of the Secretary of Transportation PROCEDURES FOR TRANSPORTATION WORKPLACE DRUG AND ALCOHOL TESTING PROGRAMS Problems in Alcohol Testing § 40.271 How are alcohol testing...

  10. Fast metabolite identification with Input Output Kernel Regression

    Science.gov (United States)

    Brouard, Céline; Shen, Huibin; Dührkop, Kai; d'Alché-Buc, Florence; Böcker, Sebastian; Rousu, Juho

    2016-01-01

    Motivation: An important problem in metabolomics is to identify metabolites using tandem mass spectrometry data. Machine learning methods have been proposed recently to solve this problem by predicting molecular fingerprint vectors and matching these fingerprints against existing molecular structure databases. In this work we propose to address the metabolite identification problem using a structured output prediction approach. This type of approach is not limited to vector output spaces and can handle structured output spaces such as the molecule space. Results: We use the Input Output Kernel Regression method to learn the mapping between tandem mass spectra and molecular structures. The principle of this method is to encode the similarities in the input (spectra) space and the similarities in the output (molecule) space using two kernel functions. This method approximates the spectra-molecule mapping in two phases. The first phase corresponds to a regression problem from the input space to the feature space associated with the output kernel. The second phase is a preimage problem, consisting of mapping back the predicted output feature vectors to the molecule space. We show that our approach achieves state-of-the-art accuracy in metabolite identification. Moreover, our method has the advantage of decreasing the running times for the training step and the test step by several orders of magnitude over the preceding methods. Availability and implementation: Contact: celine.brouard@aalto.fi Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27307628

  11. Linear regression

    CERN Document Server

    Olive, David J

    2017-01-01

    This text covers both multiple linear regression and some experimental design models. The text uses the response plot to visualize the model and to detect outliers, does not assume that the error distribution has a known parametric distribution, develops prediction intervals that work when the error distribution is unknown, suggests bootstrap hypothesis tests that may be useful for inference after variable selection, and develops prediction regions and large sample theory for the multivariate linear regression model that has m response variables. A relationship between multivariate prediction regions and confidence regions provides a simple way to bootstrap confidence regions. These confidence regions often provide a practical method for testing hypotheses. There is also a chapter on generalized linear models and generalized additive models. There are many R functions to produce response and residual plots, to simulate prediction intervals and hypothesis tests, to detect outliers, and to choose response trans...

  12. Predicting Student Success on the Texas Chemistry STAAR Test: A Logistic Regression Analysis

    Science.gov (United States)

    Johnson, William L.; Johnson, Annabel M.; Johnson, Jared

    2012-01-01

    Background: The context is the new Texas STAAR end-of-course testing program. Purpose: The authors developed a logistic regression model to predict who would pass or fail the new Texas chemistry STAAR end-of-course exam. Setting: Robert E. Lee High School (5A) with an enrollment of 2700 students, Tyler, Texas. Date of the study was the 2011-2012…

  13. Linear regression in astronomy. I

    Science.gov (United States)

    Isobe, Takashi; Feigelson, Eric D.; Akritas, Michael G.; Babu, Gutti Jogesh

    1990-01-01

    Five methods for obtaining linear regression fits to bivariate data with unknown or insignificant measurement errors are discussed: ordinary least-squares (OLS) regression of Y on X, OLS regression of X on Y, the bisector of the two OLS lines, orthogonal regression, and 'reduced major-axis' regression. These methods have been used by various researchers in observational astronomy, most importantly in cosmic distance scale applications. Formulas for calculating the slope and intercept coefficients and their uncertainties are given for all the methods, including a new general form of the OLS variance estimates. The accuracy of the formulas was confirmed using numerical simulations. The applicability of the procedures is discussed with respect to their mathematical properties, the nature of the astronomical data under consideration, and the scientific purpose of the regression. It is found that, for problems needing symmetrical treatment of the variables, the OLS bisector performs significantly better than orthogonal or reduced major-axis regression.
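
    Three of the five fits can be sketched directly from the slope formulas in Isobe et al. (1990); the helper names below are illustrative:

        # OLS(Y|X), OLS(X|Y) expressed in Y-on-X form, and their bisector.
        import numpy as np

        def ols_slopes(x, y):
            x, y = np.asarray(x, float), np.asarray(y, float)
            xc, yc = x - x.mean(), y - y.mean()
            b1 = np.sum(xc * yc) / np.sum(xc**2)        # OLS(Y|X)
            b2 = np.sum(yc**2) / np.sum(xc * yc)        # OLS(X|Y), in Y-on-X form
            b3 = (b1 * b2 - 1.0 + np.sqrt((1 + b1**2) * (1 + b2**2))) / (b1 + b2)
            return b1, b2, b3                           # b3 is the bisector slope

        def intercept(x, y, slope):
            return np.mean(y) - slope * np.mean(x)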

  14. The M Word: Multicollinearity in Multiple Regression.

    Science.gov (United States)

    Morrow-Howell, Nancy

    1994-01-01

    Notes that existence of substantial correlation between two or more independent variables creates problems of multicollinearity in multiple regression. Discusses multicollinearity problem in social work research in which independent variables are usually intercorrelated. Clarifies problems created by multicollinearity, explains detection of…
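
    A standard way to detect the problem is the variance inflation factor; this generic check is a common companion to the article's discussion, not its own procedure:

        # Variance inflation factors; values well above ~10 are often read as
        # signaling problematic multicollinearity.
        import numpy as np
        import statsmodels.api as sm
        from statsmodels.stats.outliers_influence import variance_inflation_factor

        rng = np.random.default_rng(3)
        x1 = rng.normal(size=200)
        x2 = x1 + 0.05 * rng.normal(size=200)       # nearly collinear with x1
        X = sm.add_constant(np.column_stack([x1, x2]))

        vifs = [variance_inflation_factor(X, i) for i in range(1, X.shape[1])]
        print(vifs)                                  # both VIFs are very large here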

  15. Analysis of some methods for reduced rank Gaussian process regression

    DEFF Research Database (Denmark)

    Quinonero-Candela, J.; Rasmussen, Carl Edward

    2005-01-01

    While there is strong motivation for using Gaussian Processes (GPs) due to their excellent performance in regression and classification problems, their computational complexity makes them impractical when the size of the training set exceeds a few thousand cases. This has motivated the recent...... proliferation of a number of cost-effective approximations to GPs, both for classification and for regression. In this paper we analyze one popular approximation to GPs for regression: the reduced rank approximation. While generally GPs are equivalent to infinite linear models, we show that Reduced Rank...... Gaussian Processes (RRGPs) are equivalent to finite sparse linear models. We also introduce the concept of degenerate GPs and show that they correspond to inappropriate priors. We show how to modify the RRGP to prevent it from being degenerate at test time. Training RRGPs consists both in learning...

  16. Linear regression in astronomy. II

    Science.gov (United States)

    Feigelson, Eric D.; Babu, Gutti J.

    1992-01-01

    A wide variety of least-squares linear regression procedures used in observational astronomy, particularly investigations of the cosmic distance scale, are presented and discussed. The classes of linear models considered are (1) unweighted regression lines, with bootstrap and jackknife resampling; (2) regression solutions when measurement error, in one or both variables, dominates the scatter; (3) methods to apply a calibration line to new data; (4) truncated regression models, which apply to flux-limited data sets; and (5) censored regression models, which apply when nondetections are present. For the calibration problem we develop two new procedures: a formula for the intercept offset between two parallel data sets, which propagates slope errors from one regression to the other; and a generalization of the Working-Hotelling confidence bands to nonstandard least-squares lines. They can provide improved error analysis for Faber-Jackson, Tully-Fisher, and similar cosmic distance scale relations.

  17. Implicit collinearity effect in linear regression: Application to basal ...

    African Journals Online (AJOL)

    Collinearity of predictor variables is a severe problem in the least square regression analysis. It contributes to the instability of regression coefficients and leads to a wrong prediction accuracy. Despite these problems, studies are conducted with a large number of observed and derived variables linked with a response ...

  18. Testing the equality of nonparametric regression curves based on ...

    African Journals Online (AJOL)

    Abstract. In this work we propose a new methodology for the comparison of two regression functions f1 and f2 in the case of homoscedastic error structure and a fixed design. Our approach is based on the empirical Fourier coefficients of the regression functions f1 and f2 respectively. As our main results we obtain the ...

  19. Testing Homogeneity in a Semiparametric Two-Sample Problem

    Directory of Open Access Journals (Sweden)

    Yukun Liu

    2012-01-01

    Full Text Available We study a two-sample homogeneity testing problem, in which one sample comes from a population with density f(x and the other is from a mixture population with mixture density (1−λf(x+λg(x. This problem arises naturally from many statistical applications such as test for partial differential gene expression in microarray study or genetic studies for gene mutation. Under the semiparametric assumption g(x=f(xeα+βx, a penalized empirical likelihood ratio test could be constructed, but its implementation is hindered by the fact that there is neither feasible algorithm for computing the test statistic nor available research results on its theoretical properties. To circumvent these difficulties, we propose an EM test based on the penalized empirical likelihood. We prove that the EM test has a simple chi-square limiting distribution, and we also demonstrate its competitive testing performances by simulations. A real-data example is used to illustrate the proposed methodology.

  20. Testing contingency hypotheses in budgetary research: An evaluation of the use of moderated regression analysis

    NARCIS (Netherlands)

    Hartmann, Frank G.H.; Moers, Frank

    1999-01-01

    In the contingency literature on the behavioral and organizational effects of budgeting, use of the Moderated Regression Analysis (MRA) technique is prevalent. This technique is used to test contingency hypotheses that predict interaction effects between budgetary and contextual variables. This…
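
    In practice the contingency hypothesis rides on the interaction term; a sketch with the statsmodels formula API, using illustrative variable names and simulated data:

        # Moderated regression: the budget x context interaction carries the
        # contingency hypothesis and is tested by its own coefficient.
        import numpy as np
        import pandas as pd
        import statsmodels.formula.api as smf

        rng = np.random.default_rng(4)
        df = pd.DataFrame({"budget": rng.normal(size=300),
                           "context": rng.normal(size=300)})
        df["perf"] = (0.4 * df.budget + 0.2 * df.context
                      + 0.5 * df.budget * df.context + rng.normal(size=300))

        mra = smf.ols("perf ~ budget * context", data=df).fit()
        print(mra.pvalues["budget:context"])    # significance of the interaction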

  1. Using the Coefficient of Determination "R"[superscript 2] to Test the Significance of Multiple Linear Regression

    Science.gov (United States)

    Quinino, Roberto C.; Reis, Edna A.; Bessegato, Lupercio F.

    2013-01-01

    This article proposes the use of the coefficient of determination as a statistic for hypothesis testing in multiple linear regression based on distributions acquired by beta sampling. (Contains 3 figures.)
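
    For reference, the classical route to the same hypothesis converts R² into an F statistic through a standard identity; a sketch:

        # F = (R^2 / k) / ((1 - R^2) / (n - k - 1)) tests H0: all k slopes are zero.
        from scipy.stats import f

        def r2_f_test(r2, n, k):
            """n observations, k predictors (excluding the intercept)."""
            F = (r2 / k) / ((1.0 - r2) / (n - k - 1))
            return F, f.sf(F, k, n - k - 1)

        print(r2_f_test(0.35, n=50, k=3))   # F statistic and its p value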

  2. Discriminative Elastic-Net Regularized Linear Regression.

    Science.gov (United States)

    Zhang, Zheng; Lai, Zhihui; Xu, Yong; Shao, Ling; Wu, Jian; Xie, Guo-Sen

    2017-03-01

    In this paper, we aim at learning compact and discriminative linear regression models. Linear regression has been widely used in different problems. However, most of the existing linear regression methods exploit the conventional zero-one matrix as the regression targets, which greatly narrows the flexibility of the regression model. Another major limitation of these methods is that the learned projection matrix fails to precisely project the image features to the target space due to their weak discriminative capability. To this end, we present an elastic-net regularized linear regression (ENLR) framework, and develop two robust linear regression models which possess the following special characteristics. First, our methods exploit two particular strategies to enlarge the margins of different classes by relaxing the strict binary targets into a more feasible variable matrix. Second, a robust elastic-net regularization of singular values is introduced to enhance the compactness and effectiveness of the learned projection matrix. Third, the resulting optimization problem of ENLR has a closed-form solution in each iteration, which can be solved efficiently. Finally, rather than directly exploiting the projection matrix for recognition, our methods employ the transformed features as the new discriminative representations to make the final image classification. Compared with the traditional linear regression model and some of its variants, our method is much more accurate in image classification. Extensive experiments conducted on publicly available data sets well demonstrate that the proposed framework can outperform the state-of-the-art methods. The MATLAB codes of our methods are available at http://www.yongxu.org/lunwen.html.
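
    For flavor, a generic elastic-net fit with scikit-learn; note that the paper's ENLR additionally relaxes the binary targets and regularizes singular values, which plain ElasticNet does not do:

        # Elastic net combines L1 (sparsity) and L2 (stability) penalties.
        import numpy as np
        from sklearn.linear_model import ElasticNet

        rng = np.random.default_rng(5)
        X = rng.normal(size=(200, 50))
        beta = np.zeros(50)
        beta[:5] = 2.0                                 # sparse ground truth
        y = X @ beta + 0.1 * rng.normal(size=200)

        model = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
        print(np.flatnonzero(np.abs(model.coef_) > 0.1))  # mostly the first 5 features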

  3. Gaussian process regression analysis for functional data

    CERN Document Server

    Shi, Jian Qing

    2011-01-01

    Gaussian Process Regression Analysis for Functional Data presents nonparametric statistical methods for functional regression analysis, specifically the methods based on a Gaussian process prior in a functional space. The authors focus on problems involving functional response variables and mixed covariates of functional and scalar variables.Covering the basics of Gaussian process regression, the first several chapters discuss functional data analysis, theoretical aspects based on the asymptotic properties of Gaussian process regression models, and new methodological developments for high dime

  4. Regression and Sparse Regression Methods for Viscosity Estimation of Acid Milk From it’s Sls Features

    DEFF Research Database (Denmark)

    Sharifzadeh, Sara; Skytte, Jacob Lercke; Nielsen, Otto Højager Attermann

    2012-01-01

    Statistical solutions find widespread use in food and medicine quality control. We investigate the effect of different regression and sparse regression methods on a viscosity estimation problem using the spectro-temporal features from a new Sub-Surface Laser Scattering (SLS) vision system. From … with sparse LAR, lasso and Elastic Net (EN) sparse regression methods. Due to the inconsistent measurement conditions, Locally Weighted Scatterplot Smoothing (Loess) has been employed to alleviate the undesired variation in the estimated viscosity. The experimental results of applying the different methods show…

  5. Sex Differences and Self-Reported Attention Problems During Baseline Concussion Testing.

    Science.gov (United States)

    Brooks, Brian L; Iverson, Grant L; Atkins, Joseph E; Zafonte, Ross; Berkner, Paul D

    2016-01-01

    Amateur athletic programs often use computerized cognitive testing as part of their concussion management programs. There is evidence that athletes with preexisting attention problems will have worse cognitive performance and more symptoms at baseline testing. The purpose of this study was to examine whether attention problems affect assessments differently for male and female athletes. Participants were drawn from a database that included 6,840 adolescents from Maine who completed Immediate Postconcussion Assessment and Cognitive Testing (ImPACT) at baseline (primary outcome measure). The final sample included 249 boys and 100 girls with self-reported attention problems. Each participant was individually matched for sex, age, number of past concussions, and sport to a control participant (249 boys, 100 girls). Boys with attention problems had worse reaction time than boys without attention problems. Girls with attention problems had worse visual-motor speed than girls without attention problems. Boys with attention problems reported more total symptoms, including more cognitive-sensory and sleep-arousal symptoms, compared with boys without attention problems. Girls with attention problems reported more cognitive-sensory, sleep-arousal, and affective symptoms than girls without attention problems. When considering the assessment, management, and outcome from concussions in adolescent athletes, it is important to consider both sex and preinjury attention problems regarding cognitive test results and symptom reporting.

  6. Hypothesis Designs for Three-Hypothesis Test Problems

    OpenAIRE

    Yan Li; Xiaolong Pu

    2010-01-01

    As a helpful guide for applications, the alternative hypotheses of the three-hypothesis test problems are designed under the required error probabilities and average sample number in this paper. The asymptotic formulas and the proposed numerical quadrature formulas are adopted, respectively, to obtain the hypothesis designs and the corresponding sequential test schemes under the Koopman-Darmois distributions. The example of the normal mean test shows that our methods are qu...

  7. A New Quantile Regression Model to forecast one-day-ahead Value-at-Risk

    OpenAIRE

    Steine, Sturla Aavik; Eliassen, Markus Thorsø

    2014-01-01

    This master thesis focuses on the problem of forecasting volatility and Value-at-Risk (VaR) in the financial markets. There are numerous methods for calculating VaR. However, research in this area has not currently reached one universally accepted method that can produce good VaR estimates across different data series, and VaR prediction and quality testing is still a very challenging statistical problem. The thesis has two main purposes: the first is to propose a simple quantile regression model…
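
    The basic construction, one-day VaR as a conditional quantile of returns, can be sketched with statsmodels; the single lagged-volatility regressor below is an illustrative choice, not the thesis model:

        # 5% one-day VaR modeled as the conditional 0.05-quantile of returns
        # given yesterday's absolute return.
        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(6)
        r = rng.standard_t(df=5, size=1001) * 0.01          # fake daily returns
        y, x = r[1:], np.abs(r[:-1])
        X = sm.add_constant(x)

        qfit = sm.QuantReg(y, X).fit(q=0.05)
        var_forecast = -qfit.predict([1.0, abs(r[-1])])     # one-day-ahead 5% VaR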

  8. Neoclassical versus Frontier Production Models ? Testing for the Skewness of Regression Residuals

    DEFF Research Database (Denmark)

    Kuosmanen, T; Fosgerau, Mogens

    2009-01-01

    The empirical literature on production and cost functions is divided into two strands. The neoclassical approach concentrates on model parameters, while the frontier approach decomposes the disturbance term into a symmetric noise term and a positively skewed inefficiency term. We propose a theoretical justification for the skewness of the inefficiency term, arguing that this skewness is the key testable hypothesis of the frontier approach. We propose to test the regression residuals for skewness in order to distinguish the two competing approaches. Our test builds directly upon the asymmetry…
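
    The proposed diagnostic amounts to a skewness test on OLS residuals; a sketch on synthetic frontier-style data (a normal noise term minus a half-normal inefficiency term):

        # Fit OLS, then test the residuals for skewness: a frontier-type model
        # implies negatively skewed OLS residuals, symmetric residuals favor
        # the neoclassical reading.
        import numpy as np
        import statsmodels.api as sm
        from scipy.stats import skewtest

        rng = np.random.default_rng(7)
        X = sm.add_constant(rng.normal(size=(300, 1)))
        noise = rng.normal(size=300) - np.abs(rng.normal(size=300))
        y = X @ np.array([1.0, 0.8]) + noise

        resid = sm.OLS(y, X).fit().resid
        print(skewtest(resid))          # significant negative skew supports a frontier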

  9. Wind Power Ramp Events Prediction with Hybrid Machine Learning Regression Techniques and Reanalysis Data

    Directory of Open Access Journals (Sweden)

    Laura Cornejo-Bueno

    2017-11-01

    Full Text Available Wind Power Ramp Events (WPREs) are large fluctuations of wind power in a short time interval, which lead to strong, undesirable variations in the electric power produced by a wind farm. Their accurate prediction is important in the effort to efficiently integrate wind energy into the electric system without considerably affecting its stability, robustness and resilience. In this paper, we tackle the problem of predicting WPREs by applying Machine Learning (ML) regression techniques. Our approach consists of using variables from atmospheric reanalysis data as predictive inputs for the learning machine, which opens the possibility of hybridizing numerical-physical weather models with ML techniques for WPRE prediction in real systems. Specifically, we have explored the feasibility of a number of state-of-the-art ML regression techniques, such as support vector regression, artificial neural networks (multi-layer perceptrons and extreme learning machines) and Gaussian processes to solve the problem. Furthermore, the ERA-Interim reanalysis from the European Centre for Medium-Range Weather Forecasts is the one used in this paper because of its accuracy and high resolution (in both the spatial and temporal domains). Aiming to validate the feasibility of our prediction approach, we have carried out extensive experimental work using real data from three wind farms in Spain, discussing the performance of the different ML regression techniques tested on this wind power ramp event prediction problem.

  10. Multiple linear combination (MLC) regression tests for common variants adapted to linkage disequilibrium structure.

    Science.gov (United States)

    Yoo, Yun Joo; Sun, Lei; Poirier, Julia G; Paterson, Andrew D; Bull, Shelley B

    2017-02-01

    By jointly analyzing multiple variants within a gene, instead of one at a time, gene-based multiple regression can improve power, robustness, and interpretation in genetic association analysis. We investigate multiple linear combination (MLC) test statistics for analysis of common variants under realistic trait models with linkage disequilibrium (LD) based on HapMap Asian haplotypes. MLC is a directional test that exploits LD structure in a gene to construct clusters of closely correlated variants recoded such that the majority of pairwise correlations are positive. It combines variant effects within the same cluster linearly, and aggregates cluster-specific effects in a quadratic sum of squares and cross-products, producing a test statistic with reduced degrees of freedom (df) equal to the number of clusters. By simulation studies of 1000 genes from across the genome, we demonstrate that MLC is a well-powered and robust choice among existing methods across a broad range of gene structures. Compared to minimum P-value, variance-component, and principal-component methods, the mean power of MLC is never much lower than that of other methods, and can be higher, particularly with multiple causal variants. Moreover, the variation in gene-specific MLC test size and power across 1000 genes is less than that of other methods, suggesting it is a complementary approach for discovery in genome-wide analysis. The cluster construction of the MLC test statistics helps reveal within-gene LD structure, allowing interpretation of clustered variants as haplotypic effects, while multiple regression helps to distinguish direct and indirect associations. © 2016 The Authors Genetic Epidemiology Published by Wiley Periodicals, Inc.

  11. Reduction of the number of parameters needed for a polynomial random regression test-day model

    NARCIS (Netherlands)

    Pool, M.H.; Meuwissen, T.H.E.

    2000-01-01

    Legendre polynomials were used to describe the (co)variance matrix within a random regression test-day model. The goodness of fit depended on the polynomial order of fit, i.e., the number of parameters to be estimated per animal, but is limited by computing capacity. Two aspects: incomplete lactation…

  12. Advanced statistics: linear regression, part II: multiple linear regression.

    Science.gov (United States)

    Marill, Keith A

    2004-01-01

    The applications of simple linear regression in medical research are limited, because in most situations, there are multiple relevant predictor variables. Univariate statistical techniques such as simple linear regression use a single predictor variable, and they often may be mathematically correct but clinically misleading. Multiple linear regression is a mathematical technique used to model the relationship between multiple independent predictor variables and a single dependent outcome variable. It is used in medical research to model observational data, as well as in diagnostic and therapeutic studies in which the outcome is dependent on more than one factor. Although the technique generally is limited to data that can be expressed with a linear function, it benefits from a well-developed mathematical framework that yields unique solutions and exact confidence intervals for regression coefficients. Building on Part I of this series, this article acquaints the reader with some of the important concepts in multiple regression analysis. These include multicollinearity, interaction effects, and an expansion of the discussion of inference testing, leverage, and variable transformations to multivariate models. Examples from the first article in this series are expanded on using a primarily graphic, rather than mathematical, approach. The importance of the relationships among the predictor variables and the dependence of the multivariate model coefficients on the choice of these variables are stressed. Finally, concepts in regression model building are discussed.

  13. Regression analysis with categorized regression calibrated exposure: some interesting findings

    Directory of Open Access Journals (Sweden)

    Hjartåker Anette

    2006-07-01

    percentile scale. Relating back to the original scale of the exposure solves the problem. This conclusion applies to all regression models.

  14. Weighted SGD for ℓp Regression with Randomized Preconditioning*

    Science.gov (United States)

    Yang, Jiyan; Chow, Yin-Lam; Ré, Christopher; Mahoney, Michael W.

    2018-01-01

    In recent years, stochastic gradient descent (SGD) methods and randomized linear algebra (RLA) algorithms have been applied to many large-scale problems in machine learning and data analysis. SGD methods are easy to implement and applicable to a wide range of convex optimization problems. In contrast, RLA algorithms provide much stronger performance guarantees but are applicable to a narrower class of problems. We aim to bridge the gap between these two methods in solving constrained overdetermined linear regression problems—e.g., ℓ2 and ℓ1 regression problems. We propose a hybrid algorithm named pwSGD that uses RLA techniques for preconditioning and constructing an importance sampling distribution, and then performs an SGD-like iterative process with weighted sampling on the preconditioned system. By rewriting a deterministic ℓp regression problem as a stochastic optimization problem, we connect pwSGD to several existing ℓp solvers including RLA methods with algorithmic leveraging (RLA for short). We prove that pwSGD inherits faster convergence rates that only depend on the lower dimension of the linear system, while maintaining low computation complexity. Such SGD convergence rates are superior to those of other related SGD algorithms such as the weighted randomized Kaczmarz algorithm. In particular, when solving ℓ1 regression with size n × d, pwSGD returns an approximate solution with ε relative error in the objective value in 𝒪(log n·nnz(A) + poly(d)/ε²) time. This complexity is uniformly better than that of RLA methods in terms of both ε and d when the problem is unconstrained. In the presence of constraints, pwSGD only has to solve a sequence of much simpler and smaller optimization problems over the same constraints. In general this is more efficient than solving the constrained subproblem required in RLA. For ℓ2 regression, pwSGD returns an approximate solution with ε relative error in the objective value and the solution vector measured in
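
    A toy least-squares sketch of the precondition-then-weighted-SGD idea, with an exact QR standing in for the paper's randomized preconditioner and an arbitrary step-size rule:

```python
import numpy as np

def pwsgd_ls(A, b, n_iter=5000, step=0.1, seed=0):
    """Toy weighted SGD for min ||Ax - b||_2 on a preconditioned system."""
    rng = np.random.default_rng(seed)
    Q, R = np.linalg.qr(A)              # stand-in for the randomized preconditioner
    probs = (Q ** 2).sum(axis=1)
    probs /= probs.sum()                # leverage-score importance sampling
    y = np.zeros(A.shape[1])            # iterate for the system (A R^-1) y = b
    for t in range(1, n_iter + 1):
        i = rng.choice(A.shape[0], p=probs)
        ai = np.linalg.solve(R.T, A[i])                     # i-th row of A R^-1
        g = (ai @ y - b[i]) * ai / (probs[i] * A.shape[0])  # unbiased gradient estimate
        y -= (step / np.sqrt(t)) * g
    return np.linalg.solve(R, y)        # map back: x = R^-1 y
```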

  15. Bias and Uncertainty in Regression-Calibrated Models of Groundwater Flow in Heterogeneous Media

    DEFF Research Database (Denmark)

    Cooley, R.L.; Christensen, Steen

    2006-01-01

    small. Model error is accounted for in the weighted nonlinear regression methodology developed to estimate θ* and assess model uncertainties by incorporating the second-moment matrix of the model errors into the weight matrix. Techniques developed by statisticians to analyze classical nonlinear … are reduced in magnitude. Biases, correction factors, and confidence and prediction intervals were obtained for a test problem for which model error is large to test robustness of the methodology. Numerical results conform with the theoretical analysis…

  16. A general equation to obtain multiple cut-off scores on a test from multinomial logistic regression.

    Science.gov (United States)

    Bersabé, Rosa; Rivas, Teresa

    2010-05-01

    The authors derive a general equation to compute multiple cut-offs on a total test score in order to classify individuals into more than two ordinal categories. The equation is derived from the multinomial logistic regression (MLR) model, which is an extension of the binary logistic regression (BLR) model to accommodate polytomous outcome variables. From this analytical procedure, cut-off scores are established at the test score (the predictor variable) at which an individual is as likely to be in category j as in category j+1 of an ordinal outcome variable. The application of the complete procedure is illustrated by an example with data from an actual study on eating disorders. In this example, two cut-off scores on the Eating Attitudes Test (EAT-26) scores are obtained in order to classify individuals into three ordinal categories: asymptomatic, symptomatic and eating disorder. Diagnoses were made from the responses to a self-report (Q-EDD) that operationalises DSM-IV criteria for eating disorders. Alternatives to the MLR model to set multiple cut-off scores are discussed.
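
    The crossing condition is simple enough to show numerically: with a single predictor (the total score), adjacent categories are equally likely where their linear predictors intersect. The coefficients below are invented for illustration.

```python
# Fitted multinomial-logit coefficients (invented): category 0 is the reference.
a = [0.0, -2.1, -6.3]   # intercepts for categories 0, 1, 2
b = [0.0, 0.35, 0.62]   # slopes on the total test score

def cutoff(j):
    """Score where categories j and j+1 are equally likely:
    a[j] + b[j]*x = a[j+1] + b[j+1]*x."""
    return (a[j] - a[j + 1]) / (b[j + 1] - b[j])

print([round(cutoff(j), 1) for j in (0, 1)])  # [6.0, 15.6]
```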

  17. Association of Stressful Life Events with Psychological Problems: A Large-Scale Community-Based Study Using Grouped Outcomes Latent Factor Regression with Latent Predictors

    Directory of Open Access Journals (Sweden)

    Akbar Hassanzadeh

    2017-01-01

    Objective. The current study is aimed at investigating the association between stressful life events and psychological problems in a large sample of Iranian adults. Method. In a cross-sectional large-scale community-based study, 4763 Iranian adults, living in Isfahan, Iran, were investigated. Grouped outcomes latent factor regression on latent predictors was used for modeling the association of psychological problems (depression, anxiety, and psychological distress), measured by the Hospital Anxiety and Depression Scale (HADS) and General Health Questionnaire (GHQ-12), as the grouped outcomes, and stressful life events, measured by a self-administered stressful life events (SLEs) questionnaire, as the latent predictors. Results. The results showed that the personal stressors domain has a significant positive association with psychological distress (β=0.19), anxiety (β=0.25), depression (β=0.15), and their collective profile score (β=0.20), with greater associations in females (β=0.28) than in males (β=0.13) (all P<0.001). In addition, in the adjusted models, the regression coefficients for the association of the social stressors domain and the psychological problems profile score were 0.37, 0.35, and 0.46 in the total sample, males, and females, respectively (P<0.001). Conclusion. Results of our study indicated that different stressors, particularly socioeconomic ones, have an effective impact on psychological problems. It is important to consider the social and cultural background of a population for managing the stressors as an effective approach for preventing and reducing the destructive burden of psychological problems.

  18. Association of Stressful Life Events with Psychological Problems: A Large-Scale Community-Based Study Using Grouped Outcomes Latent Factor Regression with Latent Predictors

    Science.gov (United States)

    Hassanzadeh, Akbar; Heidari, Zahra; Hassanzadeh Keshteli, Ammar; Afshar, Hamid

    2017-01-01

    Objective The current study is aimed at investigating the association between stressful life events and psychological problems in a large sample of Iranian adults. Method In a cross-sectional large-scale community-based study, 4763 Iranian adults, living in Isfahan, Iran, were investigated. Grouped outcomes latent factor regression on latent predictors was used for modeling the association of psychological problems (depression, anxiety, and psychological distress), measured by Hospital Anxiety and Depression Scale (HADS) and General Health Questionnaire (GHQ-12), as the grouped outcomes, and stressful life events, measured by a self-administered stressful life events (SLEs) questionnaire, as the latent predictors. Results The results showed that the personal stressors domain has a significant positive association with psychological distress (β = 0.19), anxiety (β = 0.25), depression (β = 0.15), and their collective profile score (β = 0.20), with greater associations in females (β = 0.28) than in males (β = 0.13) (all P < 0.001). In addition, in the adjusted models, the regression coefficients for the association of the social stressors domain and the psychological problems profile score were 0.37, 0.35, and 0.46 in the total sample, males, and females, respectively (P < 0.001). Conclusion Results of our study indicated that different stressors, particularly socioeconomic ones, have an effective impact on psychological problems. It is important to consider the social and cultural background of a population for managing the stressors as an effective approach for preventing and reducing the destructive burden of psychological problems. PMID:29312459

  19. A Fast Solution of the Lindley Equations for the M-Group Regression Problem. Technical Report 78-3, October 1977 through May 1978.

    Science.gov (United States)

    Molenaar, Ivo W.

    The technical problems involved in obtaining Bayesian model estimates for the regression parameters in m similar groups are studied. The available computer programs, BPREP (BASIC), and BAYREG, both written in FORTRAN, require an amount of computer processing that does not encourage regular use. These programs are analyzed so that the performance…

  20. Inverse problem in radionuclide transport

    International Nuclear Information System (INIS)

    Yu, C.

    1988-01-01

    The disposal of radioactive waste must comply with the performance objectives set forth in 10 CFR 61 for low-level waste (LLW) and 10 CFR 60 for high-level waste (HLW). To determine probable compliance, the proposed disposal system can be modeled to predict its performance. One of the difficulties encountered in such a study is modeling the migration of radionuclides through a complex geologic medium for the long term. Although many radionuclide transport models exist in the literature, the accuracy of the model prediction is highly dependent on the model parameters used. The problem of using known parameters in a radionuclide transport model to predict radionuclide concentrations is a direct problem (DP), whereas the reverse of the DP, i.e., the parameter identification problem of determining model parameters from known radionuclide concentrations, is called the inverse problem (IP). In this study, a procedure to solve the IP is tested using the regression technique. Several nonlinear regression programs are examined, and the best one is recommended.
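
    As a generic illustration of solving an IP by nonlinear regression (not the report's transport model), a decay-like parameter can be recovered from observed concentrations with scipy:

```python
import numpy as np
from scipy.optimize import curve_fit

def model(t, c0, k):
    """Toy transport model: exponentially decaying concentration."""
    return c0 * np.exp(-k * t)

t = np.linspace(0.0, 50.0, 25)
rng = np.random.default_rng(5)
obs = model(t, 1.0, 0.08) + rng.normal(0.0, 0.02, t.size)  # "known" concentrations
(c0_hat, k_hat), pcov = curve_fit(model, t, obs, p0=(0.5, 0.01))
print(k_hat)  # nonlinear-regression estimate of the transport parameter k
```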

  1. Polylinear regression analysis in radiochemistry

    International Nuclear Information System (INIS)

    Kopyrin, A.A.; Terent'eva, T.N.; Khramov, N.N.

    1995-01-01

    A number of radiochemical problems have been formulated in the framework of polylinear regression analysis, which permits the use of conventional mathematical methods for their solution. The authors have considered features of the use of polylinear regression analysis for estimating the contributions of various sources to the atmospheric pollution, for studying irradiated nuclear fuel, for estimating concentrations from spectral data, for measuring neutron fields of a nuclear reactor, for estimating crystal lattice parameters from X-ray diffraction patterns, for interpreting data of X-ray fluorescence analysis, for estimating complex formation constants, and for analyzing results of radiometric measurements. The problem of estimating the target parameters can be ill-posed for certain properties of the system under study. The authors showed the possibility of regularization by adding a fictitious set of data "obtained" from the orthogonal design. To estimate only a part of the parameters under consideration, the authors used incomplete rank models. In this case, it is necessary to take into account the possibility of confounding estimates. An algorithm for evaluating the degree of confounding is presented, which is realized using standard software for regression analysis.

  2. Problem-Solving Test: Tryptophan Operon Mutants

    Science.gov (United States)

    Szeberenyi, Jozsef

    2010-01-01

    This paper presents a problem-solving test that deals with the regulation of the "trp" operon of "Escherichia coli." Two mutants of this operon are described: in mutant A, the operator region of the operon carries a point mutation so that it is unable to carry out its function; mutant B expresses a "trp" repressor protein unable to bind…

  3. Testing the transferability of regression equations derived from small sub-catchments to a large area in central Sweden

    Directory of Open Access Journals (Sweden)

    C. Xu

    2003-01-01

    There is an ever increasing need to apply hydrological models to catchments where streamflow data are unavailable or to large geographical regions where calibration is not feasible. Estimation of model parameters from spatial physical data is the key issue in the development and application of hydrological models at various scales. To investigate the suitability of transferring the regression equations relating model parameters to physical characteristics developed from small sub-catchments to a large region for estimating model parameters, a conceptual snow and water balance model was optimised on all the sub-catchments in the region. A multiple regression analysis related model parameters to physical data for the catchments, and the regression equations derived from the small sub-catchments were used to calculate regional parameter values for the large basin using spatially aggregated physical data. For the model tested, the results support the suitability of transferring the regression equations to the larger region. Keywords: water balance modelling, large scale, multiple regression, regionalisation

  4. Zero-Shot Learning via Attribute Regression and Class Prototype Rectification.

    Science.gov (United States)

    Luo, Changzhi; Li, Zhetao; Huang, Kaizhu; Feng, Jiashi; Wang, Meng

    2018-02-01

    Zero-shot learning (ZSL) aims at classifying examples for unseen classes (with no training examples) given some other seen classes (with training examples). Most existing approaches exploit intermediate-level information (e.g., attributes) to transfer knowledge from seen classes to unseen classes. A common practice is to first learn projections from samples to attributes on seen classes via a regression method, and then apply such projections to unseen classes directly. However, it turns out that such a learning strategy easily causes the projection domain shift problem and the hubness problem, which hinder the performance of the ZSL task. In this paper, we also formulate ZSL as an attribute regression problem. However, different from general regression-based solutions, the proposed approach is novel in three aspects. First, a class prototype rectification method is proposed to connect the unseen classes to the seen classes. Here, a class prototype refers to a vector representation of a class, and it is also known as a class center, class signature, or class exemplar. Second, an alternating learning scheme is proposed for jointly performing attribute regression and rectifying the class prototypes. Finally, a new objective function which takes into consideration both the attribute regression accuracy and the class prototype discrimination is proposed. By introducing such a solution, the domain shift and hubness problems can be mitigated. Experimental results on three public datasets (i.e., CUB200-2011, SUN Attribute, and aPaY) well demonstrate the effectiveness of our approach.
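
    A generic sketch of the regression-then-nearest-prototype baseline that the paper improves on; the ridge formulation and cosine matching are common choices, not necessarily the authors':

```python
import numpy as np

def zsl_predict(X_seen, A_seen, X_test, prototypes, lam=1.0):
    """Ridge-regress features onto attributes, then match unseen prototypes."""
    d = X_seen.shape[1]
    W = np.linalg.solve(X_seen.T @ X_seen + lam * np.eye(d), X_seen.T @ A_seen)
    A_hat = X_test @ W                              # predicted attribute vectors
    A_hat = A_hat / np.linalg.norm(A_hat, axis=1, keepdims=True)
    P = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    return (A_hat @ P.T).argmax(axis=1)             # nearest prototype by cosine
```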

  5. Regression modeling of ground-water flow

    Science.gov (United States)

    Cooley, R.L.; Naff, R.L.

    1985-01-01

    Nonlinear multiple regression methods are developed to model and analyze groundwater flow systems. Complete descriptions of regression methodology as applied to groundwater flow models allow scientists and engineers engaged in flow modeling to apply the methods to a wide range of problems. Organization of the text proceeds from an introduction that discusses the general topic of groundwater flow modeling, to a review of basic statistics necessary to properly apply regression techniques, and then to the main topic: exposition and use of linear and nonlinear regression to model groundwater flow. Statistical procedures are given to analyze and use the regression models. A number of exercises and answers are included to exercise the student on nearly all the methods that are presented for modeling and statistical analysis. Three computer programs implement the more complex methods. These three are a general two-dimensional, steady-state regression model for flow in an anisotropic, heterogeneous porous medium, a program to calculate a measure of model nonlinearity with respect to the regression parameters, and a program to analyze model errors in computed dependent variables such as hydraulic head. (USGS)

  6. Cointegrating MiDaS Regressions and a MiDaS Test

    OpenAIRE

    J. Isaac Miller

    2011-01-01

    This paper introduces cointegrating mixed data sampling (CoMiDaS) regressions, generalizing nonlinear MiDaS regressions in the extant literature. Under a linear mixed-frequency data-generating process, MiDaS regressions provide a parsimoniously parameterized nonlinear alternative when the linear forecasting model is over-parameterized and may be infeasible. In spite of potential correlation of the error term both serially and with the regressors, I find that nonlinear least squares consistent...

  7. General Nature of Multicollinearity in Multiple Regression Analysis.

    Science.gov (United States)

    Liu, Richard

    1981-01-01

    Discusses multiple regression, a very popular statistical technique in the field of education. One of the basic assumptions in regression analysis requires that independent variables in the equation should not be highly correlated. The problem of multicollinearity and some of the solutions to it are discussed. (Author)
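
    One standard diagnostic for the multicollinearity problem discussed above is the variance inflation factor (VIF); a minimal check with statsmodels might look as follows.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.1, size=100)      # nearly collinear with x1
X = pd.DataFrame({"const": 1.0, "x1": x1, "x2": x2})
# VIFs far above 10 signal problematic multicollinearity.
print([variance_inflation_factor(X.values, i) for i in (1, 2)])
```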

  8. Testing and Modeling Fuel Regression Rate in a Miniature Hybrid Burner

    Directory of Open Access Journals (Sweden)

    Luciano Fanton

    2012-01-01

    Ballistic characterization of an extended group of innovative HTPB-based solid fuel formulations for hybrid rocket propulsion was performed in a lab-scale burner. An optical time-resolved technique was used to assess the quasi-steady regression history of single-perforation, cylindrical samples. The effects of metalized additives and radiant heat transfer on the regression rate of such formulations were assessed. Under the investigated operating conditions and based on phenomenological models from the literature, analyses of the collected experimental data show an appreciable influence of the radiant heat flux from burnt gases and soot for both unloaded and loaded fuel formulations. Pure HTPB regression rate data are satisfactorily reproduced, while the impressive initial regression rates of metalized formulations require further assessment.

  9. Testing quantum contextuality. The problem of compatibility

    International Nuclear Information System (INIS)

    Szangolies, Jochen

    2015-01-01

    Jochen Szangolies contributes a novel way of dealing with the problem of the experimental testability of the Kochen-Specker theorem posed by realistic, that is, noisy, measurements. Such noise spoils perfect compatibility between successive measurements, which however is a necessary requirement to test the notion of contextuality in usual approaches. To overcome this difficulty, a new, extended notion of contextuality that reduces to Kochen-Specker contextuality in the limit of perfect measurement implementations is proposed by the author, together with a scheme to test this notion experimentally. Furthermore, the behaviour of these tests under realistic noise conditions is investigated.

  10. Test-state approach to the quantum search problem

    International Nuclear Information System (INIS)

    Sehrawat, Arun; Nguyen, Le Huy; Englert, Berthold-Georg

    2011-01-01

    The search for 'a quantum needle in a quantum haystack' is a metaphor for the problem of finding out which one of a permissible set of unitary mappings - the oracles - is implemented by a given black box. Grover's algorithm solves this problem with quadratic speedup as compared with the analogous search for 'a classical needle in a classical haystack'. Since the outcome of Grover's algorithm is probabilistic - it gives the correct answer with high probability, not with certainty - the answer requires verification. For this purpose we introduce specific test states, one for each oracle. These test states can also be used to realize 'a classical search for the quantum needle' which is deterministic - it always gives a definite answer after a finite number of steps - and 3.41 times as fast as the purely classical search. Since the test-state search and Grover's algorithm look for the same quantum needle, the average number of oracle queries of the test-state search is the classical benchmark for Grover's algorithm.

  11. A unified framework for testing in the linear regression model under unknown order of fractional integration

    DEFF Research Database (Denmark)

    Christensen, Bent Jesper; Kruse, Robinson; Sibbertsen, Philipp

    We consider hypothesis testing in a general linear time series regression framework when the possibly fractional order of integration of the error term is unknown. We show that the approach suggested by Vogelsang (1998a) for the case of integer integration does not apply to the case of fractional...

  12. Abstract Expression Grammar Symbolic Regression

    Science.gov (United States)

    Korns, Michael F.

    This chapter examines the use of Abstract Expression Grammars to perform the entire Symbolic Regression process without the use of Genetic Programming per se. The techniques explored produce a symbolic regression engine which has absolutely no bloat, allows total user control of the search space and output formulas, and is faster and more accurate than the engines produced in our previous papers using Genetic Programming. The genome is an all-vector structure with four chromosomes plus additional epigenetic and constraint vectors, allowing total user control of the search space and the final output formulas. A combination of specialized compiler techniques, genetic algorithms, particle swarm, aged layered populations, plus discrete and continuous differential evolution are used to produce an improved symbolic regression system. Nine base test cases, from the literature, are used to test the improvement in speed and accuracy. The improved results indicate that these techniques move us a big step closer toward future industrial strength symbolic regression systems.

  13. The inverse problem of the magnetostatic nondestructive testing

    International Nuclear Information System (INIS)

    Pechenkov, A.N.; Shcherbinin, V.E.

    2006-01-01

    The inverse problem of magnetostatic nondestructive testing consists in the calculation of the shape and magnetic characteristics of a flaw in a uniformly magnetized body from measurements of the static magnetic field outside the body. If the flaw does not contain any magnetic material, the inverse problem is reduced to identification of the shape and magnetic susceptibility of the substance. This case has been considered in the study.

  14. Some problems in use of the moral judgment test.

    Science.gov (United States)

    Villegas de Posada, Cristina

    2005-06-01

    The Moral Judgment Test has been widely used in evaluation of moral development; however, it presents some problems related to the trait measured, reliability, and validity of its summary score (C-index). This index reflects consistency in moral judgment, but this construct is different from moral development as stated by Kohlberg. Therefore, users interested in the latter evaluation should refer to other indexes derived from the test. Some of the analyzed problems could be partially corrected with more theory and research on moral consistency as a component of moral competence.

  15. Vector regression introduced

    Directory of Open Access Journals (Sweden)

    Mok Tik

    2014-06-01

    This study formulates regression of vector data that will enable statistical analysis of various geodetic phenomena such as polar motion, ocean currents, typhoon/hurricane tracking, crustal deformations, and precursory earthquake signals. The observed vector variable of an event (the dependent vector variable) is expressed as a function of a number of hypothesized phenomena realized also as vector variables (the independent vector variables) and/or scalar variables that are likely to impact the dependent vector variable. The proposed representation has the unique property of solving the coefficients of independent vector variables (explanatory variables) also as vectors, hence it supersedes multivariate multiple regression models, in which the unknown coefficients are scalar quantities. For the solution, complex numbers are used to represent vector information, and the method of least squares is deployed to estimate the vector model parameters after transforming the complex vector regression model into a real vector regression model through isomorphism. Various operational statistics for testing the predictive significance of the estimated vector parameter coefficients are also derived. A simple numerical example demonstrates the use of the proposed vector regression analysis in modeling typhoon paths.
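
    The complex-number device described above can be tried directly in NumPy, since ordinary least squares extends to complex arrays; the data below are synthetic.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
u = rng.normal(size=n) + 1j * rng.normal(size=n)   # independent vector variable
beta_true = 0.8 - 0.3j                              # vector-valued coefficient
noise = 0.1 * (rng.normal(size=n) + 1j * rng.normal(size=n))
y = beta_true * u + noise                           # dependent vector variable
beta_hat, *_ = np.linalg.lstsq(u[:, None], y, rcond=None)
print(beta_hat)  # recovers approximately 0.8 - 0.3j
```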

  16. Sparse reduced-rank regression with covariance estimation

    KAUST Repository

    Chen, Lisha

    2014-12-08

    Improving the prediction performance of the multiple response regression compared with separate linear regressions is a challenging question. On the one hand, it is desirable to seek model parsimony when facing a large number of parameters. On the other hand, for certain applications it is necessary to take into account the general covariance structure for the errors of the regression model. We assume a reduced-rank regression model and work with the likelihood function with general error covariance to achieve both objectives. In addition we propose to select relevant variables for reduced-rank regression by using a sparsity-inducing penalty, and to estimate the error covariance matrix simultaneously by using a similar penalty on the precision matrix. We develop a numerical algorithm to solve the penalized regression problem. In a simulation study and real data analysis, the new method is compared with two recent methods for multivariate regression and exhibits competitive performance in prediction and variable selection.

  17. Sparse reduced-rank regression with covariance estimation

    KAUST Repository

    Chen, Lisha; Huang, Jianhua Z.

    2014-01-01

    Improving the prediction performance of the multiple response regression compared with separate linear regressions is a challenging question. On the one hand, it is desirable to seek model parsimony when facing a large number of parameters. On the other hand, for certain applications it is necessary to take into account the general covariance structure for the errors of the regression model. We assume a reduced-rank regression model and work with the likelihood function with general error covariance to achieve both objectives. In addition we propose to select relevant variables for reduced-rank regression by using a sparsity-inducing penalty, and to estimate the error covariance matrix simultaneously by using a similar penalty on the precision matrix. We develop a numerical algorithm to solve the penalized regression problem. In a simulation study and real data analysis, the new method is compared with two recent methods for multivariate regression and exhibits competitive performance in prediction and variable selection.

  18. Testing foreign language impact on engineering students' scientific problem-solving performance

    Science.gov (United States)

    Tatzl, Dietmar; Messnarz, Bernd

    2013-12-01

    This article investigates the influence of English as the examination language on the solution of physics and science problems by non-native speakers in tertiary engineering education. For that purpose, a total of 96 students in four year groups from freshman to senior level participated in a testing experiment in the Degree Programme of Aviation at the FH JOANNEUM University of Applied Sciences, Graz, Austria. Half of each test group were given a set of 12 physics problems described in German; the other half received the same set of problems described in English. The goal was to test the linguistic reading comprehension necessary for scientific problem solving, rather than physics knowledge as such. The results imply that written undergraduate English-medium engineering tests and examinations may not require additional examination time or language-specific aids for students who have reached university-entrance proficiency in English as a foreign language.

  19. A Novel Multiobjective Evolutionary Algorithm Based on Regression Analysis

    Directory of Open Access Journals (Sweden)

    Zhiming Song

    2015-01-01

    As is known, the Pareto set of a continuous multiobjective optimization problem with m objective functions is a piecewise continuous (m-1)-dimensional manifold in the decision space under some mild conditions. How to utilize this regularity to design multiobjective optimization algorithms has, however, become a research focus. In this paper, based on this regularity, a model-based multiobjective evolutionary algorithm with regression analysis (MMEA-RA) is put forward to solve continuous multiobjective optimization problems with variable linkages. In the algorithm, the optimization problem is modelled as a promising area in the decision space by a probability distribution whose centroid is an (m-1)-dimensional piecewise continuous manifold. The least squares method is used to construct such a model. A selection strategy based on nondominated sorting is used to choose the individuals for the next generation. The new algorithm is tested and compared with NSGA-II and RM-MEDA. The results show that MMEA-RA outperforms RM-MEDA and NSGA-II on the test instances with variable linkages. At the same time, MMEA-RA has higher efficiency than the other two algorithms. A few shortcomings of MMEA-RA have also been identified and discussed in this paper.

  20. Regression of environmental noise in LIGO data

    International Nuclear Information System (INIS)

    Tiwari, V; Klimenko, S; Mitselmakher, G; Necula, V; Drago, M; Prodi, G; Frolov, V; Yakushin, I; Re, V; Salemi, F; Vedovato, G

    2015-01-01

    We address the problem of noise regression in the output of gravitational-wave (GW) interferometers, using data from the physical environmental monitors (PEM). The objective of the regression analysis is to predict environmental noise in the GW channel from the PEM measurements. One of the most promising regression methods is based on the construction of Wiener–Kolmogorov (WK) filters. Using this method, the seismic noise cancellation from the LIGO GW channel has already been performed. In the presented approach the WK method has been extended, incorporating banks of Wiener filters in the time–frequency domain, multi-channel analysis and regulation schemes, which greatly enhance the versatility of the regression analysis. Also we present the first results on regression of the bi-coherent noise in the LIGO data.
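
    A minimal single-witness-channel sketch of the idea: a finite-impulse-response Wiener filter fit by least squares (equivalent to solving the Wiener-Hopf equations in finite form). The tap count and preprocessing are assumptions.

```python
import numpy as np

def wiener_fir(witness, target, n_taps=64):
    """Least-squares FIR filter predicting `target` from a witness channel."""
    X = np.column_stack([np.roll(witness, k) for k in range(n_taps)])
    X[:n_taps] = 0.0                      # discard wrap-around samples
    h, *_ = np.linalg.lstsq(X, target, rcond=None)
    return h, target - X @ h              # filter taps and the cleaned residual
```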

  1. Research Problems Associated with Limiting the Applied Force in Vibration Tests and Conducting Base-Drive Modal Vibration Tests

    Science.gov (United States)

    Scharton, Terry D.

    1995-01-01

    The intent of this paper is to make a case for developing and conducting vibration tests which are both realistic and practical (a question of tailoring versus standards). Tests are essential for finding things overlooked in the analyses. The best test is often the most realistic test which can be conducted within the cost and budget constraints. Some standards are essential, but the author believes more in the individual's ingenuity to solve a specific problem than in the application of standards which reduce problems (and technology) to their lowest common denominator. Force limited vibration tests and base-drive modal tests are two examples of realistic, but practical testing approaches. Since both of these approaches are relatively new, a number of interesting research problems exist, and these are emphasized herein.

  2. Feasibility testing for dial-a-ride problems

    DEFF Research Database (Denmark)

    Haugland, Dag; Ho, Sin C.

    Hunsaker and Savelsbergh have proposed an algorithm for testing feasibility of a route in the solution to the dial-a-ride problem. The constraints that are checked are load capacity constraints, time windows, ride time bounds and wait time bounds. The algorithm has linear running time. By virtue...

  3. Feasibility Testing for Dial-a-Ride Problems

    DEFF Research Database (Denmark)

    Haugland, Dag; Ho, Sin C.

    2010-01-01

    Hunsaker and Savelsbergh have proposed an algorithm for testing feasibility of a route in the solution to the dial-a-ride problem. The constraints that are checked are load capacity constraints, time windows, ride time bounds and wait time bounds. The algorithm has linear running time. By virtue...

  4. Regression-Based Norms for a Bi-factor Model for Scoring the Brief Test of Adult Cognition by Telephone (BTACT).

    Science.gov (United States)

    Gurnani, Ashita S; John, Samantha E; Gavett, Brandon E

    2015-05-01

    The current study developed regression-based normative adjustments for a bi-factor model of the Brief Test of Adult Cognition by Telephone (BTACT). Archival data from the Midlife Development in the United States-II Cognitive Project were used to develop eight separate linear regression models that predicted bi-factor BTACT scores, accounting for age, education, gender, and occupation, alone and in various combinations. All regression models provided statistically significant fit to the data. A three-predictor regression model fit best and accounted for 32.8% of the variance in the global bi-factor BTACT score. The fit of the regression models was not improved by gender. Eight different regression models are presented to allow the user flexibility in applying demographic corrections to the bi-factor BTACT scores. Occupation corrections, while not widely used, may provide useful demographic adjustments for adult populations or for those individuals who have attained an occupational status not commensurate with expected educational attainment.
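
    The shape of such a regression-based norm is easy to illustrate: a raw bi-factor score is compared with its demographically predicted value and scaled by the residual standard deviation of the normative regression. All coefficients below are invented.

```python
def adjusted_z(raw, age, educ_years, b=(0.30, -0.020, 0.055), sd_resid=0.90):
    """Demographically adjusted z-score; all coefficients are hypothetical."""
    predicted = b[0] + b[1] * age + b[2] * educ_years
    return (raw - predicted) / sd_resid

print(round(adjusted_z(0.10, age=65, educ_years=16), 2))  # 0.24
```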

  5. Inverse problems in the design, modeling and testing of engineering systems

    Science.gov (United States)

    Alifanov, Oleg M.

    1991-01-01

    Formulations, classification, areas of application, and approaches to solving different inverse problems are considered for the design of structures, modeling, and experimental data processing. Problems in the practical implementation of theoretical-experimental methods based on solving inverse problems are analyzed in order to identify mathematical models of physical processes, aid in input data preparation for design parameter optimization, help in design parameter optimization itself, and to model experiments, large-scale tests, and real tests of engineering systems.

  6. THE USEFULNESS OF USER TESTING METHODS IN IDENTIFYING PROBLEMS ON UNIVERSITY WEBSITES

    Directory of Open Access Journals (Sweden)

    Layla Hasan

    2014-10-01

    This paper aims to investigate the usefulness of three user testing methods (observation, and the use of quantitative and qualitative data from a post-test questionnaire) in terms of their ability or inability to find specific usability problems on university websites. The results showed that observation was the best method, compared to the other two, in identifying large numbers of major and minor usability problems on university websites. The results also showed that employing qualitative data from a post-test questionnaire was a useful complementary method, since this identified additional usability problems that were not identified by the observation method. However, the results showed that the quantitative data from the post-test questionnaire were inaccurate and ineffective in terms of identifying usability problems on such websites.

  7. Robust Face Recognition via Multi-Scale Patch-Based Matrix Regression.

    Directory of Open Access Journals (Sweden)

    Guangwei Gao

    In many real-world applications such as smart card solutions, law enforcement, surveillance and access control, the limited training sample size is the most fundamental problem. By making use of the low-rank structural information of the reconstructed error image, the so-called nuclear norm-based matrix regression has been demonstrated to be effective for robust face recognition with continuous occlusions. However, the recognition performance of nuclear norm-based matrix regression degrades greatly in the face of the small sample size problem. An alternative solution to tackle this problem is performing matrix regression on each patch and then integrating the outputs from all patches. However, it is difficult to set an optimal patch size across different databases. To fully utilize the complementary information from different patch scales for the final decision, we propose a multi-scale patch-based matrix regression scheme based on which the ensemble of multi-scale outputs can be achieved optimally. Extensive experiments on benchmark face databases validate the effectiveness and robustness of our method, which outperforms several state-of-the-art patch-based face recognition algorithms.

  8. A Cross-Domain Collaborative Filtering Algorithm Based on Feature Construction and Locally Weighted Linear Regression.

    Science.gov (United States)

    Yu, Xu; Lin, Jun-Yu; Jiang, Feng; Du, Jun-Wei; Han, Ji-Zhong

    2018-01-01

    Cross-domain collaborative filtering (CDCF) solves the sparsity problem by transferring rating knowledge from auxiliary domains. Obviously, different auxiliary domains have different importance to the target domain. However, previous works cannot evaluate effectively the significance of different auxiliary domains. To overcome this drawback, we propose a cross-domain collaborative filtering algorithm based on Feature Construction and Locally Weighted Linear Regression (FCLWLR). We first construct features in different domains and use these features to represent different auxiliary domains. Thus the weight computation across different domains can be converted into the weight computation across different features. Then we combine the features in the target domain and in the auxiliary domains together and convert the cross-domain recommendation problem into a regression problem. Finally, we employ a Locally Weighted Linear Regression (LWLR) model to solve the regression problem. As LWLR is a nonparametric regression method, it can effectively avoid the underfitting or overfitting problem occurring in parametric regression methods. We conduct extensive experiments to show that the proposed FCLWLR algorithm is effective in addressing the data sparsity problem by transferring useful knowledge from the auxiliary domains, as compared to many state-of-the-art single-domain or cross-domain CF methods.
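
    The LWLR building block named above is short enough to sketch for a one-dimensional predictor; the Gaussian kernel and bandwidth are conventional choices.

```python
import numpy as np

def lwlr_predict(x0, X, y, tau=0.5):
    """Locally weighted linear regression at query point x0 (1-D predictor)."""
    Xb = np.column_stack([np.ones_like(X), X])
    w = np.exp(-((X - x0) ** 2) / (2.0 * tau ** 2))   # Gaussian kernel weights
    G = Xb.T @ (w[:, None] * Xb)                      # weighted normal equations
    theta = np.linalg.solve(G, Xb.T @ (w * y))
    return theta[0] + theta[1] * x0
```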

  9. Understanding and quantifying cognitive complexity level in mathematical problem solving items

    Directory of Open Access Journals (Sweden)

    SUSAN E. EMBRETSON

    2008-09-01

    The linear logistic test model (LLTM; Fischer, 1973) has been applied to a wide variety of new tests. When the LLTM application involves item complexity variables that are both theoretically interesting and empirically supported, several advantages can result. These advantages include elaborating construct validity at the item level, defining variables for test design, predicting parameters of new items, item banking by sources of complexity, and providing a basis for item design and item generation. However, despite the many advantages of applying LLTM to test items, it has been applied less often to understand the sources of complexity for large-scale operational test items. Instead, previously calibrated item parameters are modeled using regression techniques because raw item response data often cannot be made available. In the current study, both LLTM and regression modeling are applied to mathematical problem solving items from a widely used test. The findings from the two methods are compared and contrasted for their implications for continued development of ability and achievement tests based on mathematical problem solving items.

  10. Bayesian ARTMAP for regression.

    Science.gov (United States)

    Sasu, L M; Andonie, R

    2013-10-01

    Bayesian ARTMAP (BA) is a recently introduced neural architecture which uses a combination of Fuzzy ARTMAP competitive learning and Bayesian learning. Training is generally performed online, in a single epoch. During training, BA creates input data clusters as Gaussian categories, and also infers the conditional probabilities between input patterns and categories, and between categories and classes. During prediction, BA uses Bayesian posterior probability estimation. So far, BA has been used only for classification. The goal of this paper is to analyze the efficiency of BA for regression problems. Our contributions are: (i) we generalize the BA algorithm using the clustering functionality of both ART modules, and name it BA for Regression (BAR); (ii) we prove that BAR is a universal approximator with the best approximation property. In other words, BAR approximates arbitrarily well any continuous function (universal approximation) and, for every given continuous function, there is one in the set of BAR approximators situated at minimum distance (best approximation); (iii) we experimentally compare the online-trained BAR with several neural models, on the following standard regression benchmarks: CPU Computer Hardware, Boston Housing, Wisconsin Breast Cancer, and Communities and Crime. Our results show that BAR is an appropriate tool for regression tasks, for both theoretical and practical reasons.

  11. Genetic analysis of somatic cell score in Danish dairy cattle using ramdom regression test-day model

    DEFF Research Database (Denmark)

    Elsaid, Reda; Sabry, Ayman; Lund, Mogens Sandø

    2011-01-01

    ,233 Danish Holstein cows, were extracted from the national milk recording database. Each data set was analyzed with random regression models using AI-REML. Fixed effects in all models were age at first calving, herd test day, days carrying calf, and effects of germ plasm importation (e.g., additive breed effects) … and low between the beginning and the end of lactation. The estimated environmental correlations were lower than the genetic correlations, but the trends were similar. Based on test-day records, the accuracy of genetic evaluations for SCC should be improved when the variation in heritabilities

  12. Social problems on Semipalatinsk test site

    International Nuclear Information System (INIS)

    Cherepnin, Yu.S.; Zhdanov, N.A.; Tumenova, B.N.

    2000-01-01

    The report reflects the main stages of the activity of the National Nuclear Center of the Republic of Kazakhstan in obtaining scientific information about the consequences of the nuclear tests conducted, in radioecological and medical-biological research, and in the restoration of the natural environment and people's health in the Republic of Kazakhstan. A chronicle and the results of joint work carried out within the framework of international programs in these fields are given as well. An analysis of the current social problems of the population of the region is carried out.

  13. Extensions of Morse-Smale Regression with Application to Actuarial Science

    OpenAIRE

    Farrelly, Colleen M.

    2017-01-01

    The problem of subgroups is ubiquitous in scientific research (ex. disease heterogeneity, spatial distributions in ecology...), and piecewise regression is one way to deal with this phenomenon. Morse-Smale regression offers a way to partition the regression function based on level sets of a defined function and that function's basins of attraction. This topologically-based piecewise regression algorithm has shown promise in its initial applications, but the current implementation in the liter...

  14. Marginal longitudinal semiparametric regression via penalized splines

    KAUST Repository

    Al Kadiri, M.

    2010-08-01

    We study the marginal longitudinal nonparametric regression problem and some of its semiparametric extensions. We point out that, while several elaborate proposals for efficient estimation have been made, a relatively simple and straightforward one, based on penalized splines, has not. After describing our approach, we then explain how Gibbs sampling and the BUGS software can be used to achieve quick and effective implementation. Illustrations are provided for nonparametric regression and additive models.
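
    A minimal penalized-spline smoother of the kind discussed, using a truncated-line basis with a ridge penalty on the knot coefficients; the basis and penalty choices here are generic, not the authors' exact formulation.

```python
import numpy as np

def pspline_fit(x, y, n_knots=20, lam=1.0):
    """Penalized spline with a truncated-line basis; returns fitted values."""
    knots = np.quantile(x, np.linspace(0, 1, n_knots + 2)[1:-1])
    Z = np.maximum(x[:, None] - knots[None, :], 0.0)   # truncated-line terms
    C = np.column_stack([np.ones_like(x), x, Z])
    D = np.diag([0.0, 0.0] + [lam] * n_knots)          # penalize knot terms only
    beta = np.linalg.solve(C.T @ C + D, C.T @ y)
    return C @ beta
```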

  15. Marginal longitudinal semiparametric regression via penalized splines

    KAUST Repository

    Al Kadiri, M.; Carroll, R.J.; Wand, M.P.

    2010-01-01

    We study the marginal longitudinal nonparametric regression problem and some of its semiparametric extensions. We point out that, while several elaborate proposals for efficient estimation have been made, a relatively simple and straightforward one, based on penalized splines, has not. After describing our approach, we then explain how Gibbs sampling and the BUGS software can be used to achieve quick and effective implementation. Illustrations are provided for nonparametric regression and additive models.

  16. Excel 2016 in applied statistics for high school students a guide to solving practical problems

    CERN Document Server

    Quirk, Thomas J

    2018-01-01

    This textbook is a step-by-step guide for high school, community college, or undergraduate students who are taking a course in applied statistics and wish to learn how to use Excel to solve statistical problems. All of the statistics problems in this book will come from the following fields of study: business, education, psychology, marketing, engineering and advertising. Students will learn how to perform key statistical tests in Excel without being overwhelmed by statistical theory. Each chapter briefly explains a topic and then demonstrates how to use Excel commands and formulas to solve specific statistics problems. This book gives practice in using Excel in two different ways: (1) writing formulas (e.g., confidence interval about the mean, one-group t-test, two-group t-test, correlation) and (2) using Excel’s drop-down formula menus (e.g., simple linear regression, multiple correlations and multiple regression, and one-way ANOVA). Three practice problems are provided at the end of each chapter, along w...

  17. Goodness-of-fit tests and model diagnostics for negative binomial regression of RNA sequencing data.

    Science.gov (United States)

    Mi, Gu; Di, Yanming; Schafer, Daniel W

    2015-01-01

    This work is about assessing model adequacy for negative binomial (NB) regression, particularly (1) assessing the adequacy of the NB assumption, and (2) assessing the appropriateness of models for NB dispersion parameters. Tools for the first are appropriate for NB regression generally; those for the second are primarily intended for RNA sequencing (RNA-Seq) data analysis. The typically small number of biological samples and large number of genes in RNA-Seq analysis motivate us to address the trade-offs between robustness and statistical power using NB regression models. One widely-used power-saving strategy, for example, is to assume some commonalities of NB dispersion parameters across genes via simple models relating them to mean expression rates, and many such models have been proposed. As RNA-Seq analysis is becoming ever more popular, it is appropriate to make more thorough investigations into power and robustness of the resulting methods, and into practical tools for model assessment. In this article, we propose simulation-based statistical tests and diagnostic graphics to address model adequacy. We provide simulated and real data examples to illustrate that our proposed methods are effective for detecting the misspecification of the NB mean-variance relationship as well as judging the adequacy of fit of several NB dispersion models.
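
    A rough parametric-bootstrap rendering of the simulation-based test idea: compare the observed Pearson statistic of an NB fit against statistics from data simulated under the fitted model. Treating the dispersion alpha as known is a simplifying assumption.

```python
import numpy as np
import statsmodels.api as sm

def nb_gof_pvalue(y, X, alpha, n_sim=500, seed=0):
    """Simulation-based GOF p-value for an NB2 fit (alpha assumed known)."""
    rng = np.random.default_rng(seed)
    fam = sm.families.NegativeBinomial(alpha=alpha)
    mu = sm.GLM(y, X, family=fam).fit().fittedvalues

    def pearson(yy, mm):                 # NB2 variance: mu + alpha * mu^2
        return (((yy - mm) ** 2) / (mm + alpha * mm ** 2)).sum()

    t_obs, n_nb = pearson(y, mu), 1.0 / alpha
    t_sim = []
    for _ in range(n_sim):
        y_sim = rng.negative_binomial(n_nb, n_nb / (n_nb + mu))
        mu_sim = sm.GLM(y_sim, X, family=fam).fit().fittedvalues
        t_sim.append(pearson(y_sim, mu_sim))
    return float(np.mean(np.asarray(t_sim) >= t_obs))
```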

  18. The Finite Deformation Dynamic Sphere Test Problem

    Energy Technology Data Exchange (ETDEWEB)

    Versino, Daniele [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Brock, Jerry Steven [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2016-09-02

    In this manuscript we describe test cases for the dynamic sphere problem in the presence of finite deformations. The spherical shell under examination is made of a homogeneous, isotropic or transversely isotropic material, and elastic and elastic-plastic material behaviors are considered. Twenty cases, (a) to (t), are thus defined by combining material types and boundary conditions. The inner surface radius, the outer surface radius, and the material's density are kept constant for all the considered test cases; their values are ri = 10 mm, ro = 20 mm, and ρ = 1000 kg/m³, respectively.

  19. A Cross-Domain Collaborative Filtering Algorithm Based on Feature Construction and Locally Weighted Linear Regression

    Directory of Open Access Journals (Sweden)

    Xu Yu

    2018-01-01

    Cross-domain collaborative filtering (CDCF) solves the sparsity problem by transferring rating knowledge from auxiliary domains. Obviously, different auxiliary domains have different importance to the target domain. However, previous works cannot evaluate effectively the significance of different auxiliary domains. To overcome this drawback, we propose a cross-domain collaborative filtering algorithm based on Feature Construction and Locally Weighted Linear Regression (FCLWLR). We first construct features in different domains and use these features to represent different auxiliary domains. Thus the weight computation across different domains can be converted into the weight computation across different features. Then we combine the features in the target domain and in the auxiliary domains together and convert the cross-domain recommendation problem into a regression problem. Finally, we employ a Locally Weighted Linear Regression (LWLR) model to solve the regression problem. As LWLR is a nonparametric regression method, it can effectively avoid the underfitting or overfitting problem occurring in parametric regression methods. We conduct extensive experiments to show that the proposed FCLWLR algorithm is effective in addressing the data sparsity problem by transferring useful knowledge from the auxiliary domains, as compared to many state-of-the-art single-domain or cross-domain CF methods.

  20. Bayesian models based on test statistics for multiple hypothesis testing problems.

    Science.gov (United States)

    Ji, Yuan; Lu, Yiling; Mills, Gordon B

    2008-04-01

    We propose a Bayesian method for the problem of multiple hypothesis testing that is routinely encountered in bioinformatics research, such as the differential gene expression analysis. Our algorithm is based on modeling the distributions of test statistics under both null and alternative hypotheses. We substantially reduce the complexity of the process of defining posterior model probabilities by modeling the test statistics directly instead of modeling the full data. Computationally, we apply a Bayesian FDR approach to control the number of rejections of null hypotheses. To check if our model assumptions for the test statistics are valid for various bioinformatics experiments, we also propose a simple graphical model-assessment tool. Using extensive simulations, we demonstrate the performance of our models and the utility of the model-assessment tool. In the end, we apply the proposed methodology to an siRNA screening and a gene expression experiment.

  1. Progression and regression of cervical pap test lesions in an urban AIDS clinic in the combined antiretroviral therapy era: a longitudinal, retrospective study.

    Science.gov (United States)

    Lofgren, Sarah M; Tadros, Talaat; Herring-Bailey, Gina; Birdsong, George; Mosunjac, Marina; Flowers, Lisa; Nguyen, Minh Ly

    2015-05-01

    Our objective was to evaluate the progression and regression of cervical dysplasia in human immunodeficiency virus (HIV)-positive women during the late antiretroviral era. Risk factors as well as outcomes after treatment of cancerous or precancerous lesions were examined. This is a longitudinal retrospective review of cervical Pap tests performed on HIV-infected women with an intact cervix between 2004 and 2011. Subjects needed more than two Pap tests over at least 2 years of follow-up. Progression was defined as development of a squamous intraepithelial lesion (SIL) or atypical glandular cells (AGC), low-grade SIL (LSIL) followed by atypical squamous cells-cannot exclude high-grade SIL (ASC-H) or high-grade SIL (HSIL), or cancer. Regression was defined as an initial SIL with two or more subsequent normal Pap tests. Persistence was defined as having an SIL without progression or regression. High-risk human papillomavirus (HPV) testing started in 2006 on atypical squamous cells of undetermined significance (ASCUS) Pap tests. Subjects with AGC at enrollment were excluded from the progression analysis. Of 1,445 women screened, 383 had more than two Pap tests over a 2-year period. Of those, 309 had an intact cervix. The median age was 40 years and the median CD4+ cell count was 277 cells/mL. Four had AGC at enrollment. A quarter had persistently normal Pap tests, 64 (31%) regressed, and 50 (24%) progressed. Four developed cancer. The only risk factor associated with progression was CD4 count. Among those with treated lesions, 24 (59%) had negative Pap tests at the end of follow-up. More studies are needed to evaluate follow-up strategies for LSIL patients, potentially combined with HPV testing. Guidelines for HIV-seropositive women who are in care, have improved CD4 counts, and have persistently negative Pap tests could likely lengthen the follow-up interval.

  2. Differentiating regressed melanoma from regressed lichenoid keratosis.

    Science.gov (United States)

    Chan, Aegean H; Shulman, Kenneth J; Lee, Bonnie A

    2017-04-01

    Distinguishing regressed lichen planus-like keratosis (LPLK) from regressed melanoma can be difficult on histopathologic examination, potentially resulting in mismanagement of patients. We aimed to identify histopathologic features by which regressed melanoma can be differentiated from regressed LPLK. Twenty actively inflamed LPLK, 12 LPLK with regression and 15 melanomas with regression were compared and evaluated by hematoxylin and eosin staining as well as Melan-A, microphthalmia transcription factor (MiTF) and cytokeratin (AE1/AE3) immunostaining. (1) A total of 40% of regressed melanomas showed complete or near complete loss of melanocytes within the epidermis with Melan-A and MiTF immunostaining, while 8% of regressed LPLK exhibited this finding. (2) Necrotic keratinocytes were seen in the epidermis in 33% of regressed melanomas as opposed to all of the regressed LPLK. (3) A dense infiltrate of melanophages in the papillary dermis was seen in 40% of regressed melanomas, a feature not seen in regressed LPLK. In summary, our findings suggest that a complete or near complete loss of melanocytes within the epidermis strongly favors a regressed melanoma over a regressed LPLK. In addition, necrotic epidermal keratinocytes and the presence of a dense band-like distribution of dermal melanophages can be helpful in differentiating these lesions.

  3. Development of a computer program to support an efficient non-regression test of a thermal-hydraulic system code

    Energy Technology Data Exchange (ETDEWEB)

    Lee, Jun Yeob; Jeong, Jae Jun [School of Mechanical Engineering, Pusan National University, Busan (Korea, Republic of); Suh, Jae Seung [System Engineering and Technology Co., Daejeon (Korea, Republic of); Kim, Kyung Doo [Korea Atomic Energy Research Institute, Daejeon (Korea, Republic of)

    2014-10-15

    During the development process of a thermal-hydraulic system code, a non-regression test (NRT) must be performed repeatedly in order to prevent software regression. The NRT process, however, is time-consuming and labor-intensive. Thus, automation of this process is an ideal solution. In this study, we have developed a program to support an efficient NRT for the SPACE code and demonstrated its usability. This results in a high degree of efficiency for code development. The program was developed using the Visual Basic for Applications and designed so that it can be easily customized for the NRT of other computer codes.
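
    A hypothetical miniature of the comparison step such a support program automates; the file formats and the tolerance are assumptions.

```python
import numpy as np

def compare_answers(new_file, baseline_file, rel_tol=1e-12):
    """Flag a regression-test failure if extracted answers drift from baseline."""
    new = np.loadtxt(new_file)
    base = np.loadtxt(baseline_file)
    same_shape = new.shape == base.shape
    return "PASS" if same_shape and np.allclose(new, base, rtol=rel_tol, atol=0.0) else "FAIL"
```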

  4. Composite marginal quantile regression analysis for longitudinal adolescent body mass index data.

    Science.gov (United States)

    Yang, Chi-Chuan; Chen, Yi-Hau; Chang, Hsing-Yi

    2017-09-20

    Childhood and adolescent overweight or obesity, which may be quantified through the body mass index (BMI), is strongly associated with adult obesity and other health problems. Motivated by the child and adolescent behaviors in long-term evolution (CABLE) study, we are interested in individual, family, and school factors associated with marginal quantiles of longitudinal adolescent BMI values. We propose a new method for composite marginal quantile regression analysis for longitudinal outcome data, which performs marginal quantile regressions at multiple quantile levels simultaneously. The proposed method extends the quantile regression coefficient modeling method introduced by Frumento and Bottai (Biometrics 2016; 72:74-84) to longitudinal data, accounting suitably for the correlation structure in longitudinal observations. A goodness-of-fit test for the proposed modeling is also developed. Simulation results show that the proposed method can be much more efficient than an analysis that does not take correlation into account and an analysis performing separate quantile regressions at different quantile levels. The application to the longitudinal adolescent BMI data from the CABLE study demonstrates the practical utility of our proposal. Copyright © 2017 John Wiley & Sons, Ltd.
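
    To illustrate the marginal part of this approach, the sketch below fits separate quantile regressions of a BMI-type outcome at several quantile levels with statsmodels. The data are simulated stand-ins, and unlike the proposed composite method the sketch ignores the longitudinal correlation structure and does not model coefficients jointly across quantile levels.

        import numpy as np
        import pandas as pd
        import statsmodels.formula.api as smf

        # Simulated stand-in for adolescent BMI data (within-child correlation ignored)
        rng = np.random.default_rng(1)
        df = pd.DataFrame({"age": rng.uniform(11, 18, 500)})
        df["bmi"] = 17 + 0.4 * df["age"] + (1 + 0.1 * df["age"]) * rng.standard_normal(500)

        # Marginal quantile regressions at several levels; the paper's composite
        # method would instead model these coefficients jointly across quantiles.
        for tau in (0.1, 0.5, 0.9):
            fit = smf.quantreg("bmi ~ age", df).fit(q=tau)
            print(tau, fit.params.to_dict())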

  5. Logistic regression models

    CERN Document Server

    Hilbe, Joseph M

    2009-01-01

    This book really does cover everything you ever wanted to know about logistic regression … with updates available on the author's website. Hilbe, a former national athletics champion, philosopher, and expert in astronomy, is a master at explaining statistical concepts and methods. Readers familiar with his other expository work will know what to expect: great clarity. The book provides considerable detail about all facets of logistic regression. No step of an argument is omitted, so the book will meet the needs of the reader who likes to see everything spelt out, while a person familiar with some of the topics has the option to skip "obvious" sections. The material has been thoroughly road-tested through classroom and web-based teaching. … The focus is on helping the reader to learn and understand logistic regression. The audience is not just students meeting the topic for the first time, but also experienced users. I believe the book really does meet the author's goal … . -Annette J. Dobson, Biometric...

  6. On the estimation and testing of predictive panel regressions

    NARCIS (Netherlands)

    Karabiyik, H.; Westerlund, Joakim; Narayan, Paresh

    2016-01-01

    Hjalmarsson (2010) considers an OLS-based estimator of predictive panel regressions that is argued to be mixed normal under very general conditions. In a recent paper, Westerlund et al. (2016) show that while consistent, the estimator is generally not mixed normal, which invalidates standard normal inference.

  7. Direction of Effects in Multiple Linear Regression Models.

    Science.gov (United States)

    Wiedermann, Wolfgang; von Eye, Alexander

    2015-01-01

    Previous studies analyzed asymmetric properties of the Pearson correlation coefficient using higher than second order moments. These asymmetric properties can be used to determine the direction of dependence in a linear regression setting (i.e., establish which of two variables is more likely to be on the outcome side) within the framework of cross-sectional observational data. Extant approaches are restricted to the bivariate regression case. The present contribution extends the direction of dependence methodology to a multiple linear regression setting by analyzing distributional properties of residuals of competing multiple regression models. It is shown that, under certain conditions, the third central moments of estimated regression residuals can be used to decide upon direction of effects. In addition, three different approaches for statistical inference are discussed: a combined D'Agostino normality test, a skewness difference test, and a bootstrap difference test. Type I error and power of the procedures are assessed using Monte Carlo simulations, and an empirical example is provided for illustrative purposes. In the discussion, issues concerning the quality of psychological data, possible extensions of the proposed methods to the fourth central moment of regression residuals, and potential applications are addressed.
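
    A minimal bivariate sketch of the residual-moment idea (the data-generating step and the bootstrap below are illustrative assumptions, not the authors' simulation design): when the true predictor is skewed, residuals of the mis-directed model inherit that skewness, so comparing third moments of the two residual sets hints at the direction of effect.

        import numpy as np
        from scipy.stats import skew

        rng = np.random.default_rng(2)
        x = rng.exponential(size=1000) - 1.0          # skewed "true" cause
        y = 0.6 * x + 0.8 * rng.standard_normal(1000)

        def ols_resid(pred, out):                     # residuals of out ~ pred
            b1, b0 = np.polyfit(pred, out, 1)
            return out - (b0 + b1 * pred)

        def stat(x, y):                               # |skew| difference of residuals
            return abs(skew(ols_resid(y, x))) - abs(skew(ols_resid(x, y)))

        d_obs = stat(x, y)                            # > 0 favours x -> y
        idx = [rng.integers(0, x.size, x.size) for _ in range(500)]
        boot = [stat(x[i], y[i]) for i in idx]
        print(d_obs, np.percentile(boot, [2.5, 97.5]))  # bootstrap difference test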

  8. Parenting, attention and externalizing problems: testing mediation longitudinally, repeatedly and reciprocally.

    Science.gov (United States)

    Belsky, Jay; Pasco Fearon, R M; Bell, Brian

    2007-12-01

    Building on prior work, this paper tests, longitudinally and repeatedly, the proposition that attentional control processes mediate the effect of earlier parenting on later externalizing problems. Repeated independent measurements of all three constructs (observed parenting, computer-tested attentional control and adult-reported externalizing problems) were subjected to structural equation modeling using data from the large-scale American study of child care and youth development. Structural equation modeling indicated (1) that greater maternal sensitivity at two different ages (54 months, approximately 6 years) predicted better attentional control on the Continuous Performance Test (CPT) of attention regulation at two later ages (approximately 6 and 9 years); (2) that better attentional control at three different ages (54 months, approximately 6 and 9 years) predicted fewer teacher-reported externalizing problems at three later ages (approximately 6, 8 and 10 years); and (3) that attentional control partially mediated the effect of parenting on externalizing problems at two different lags (i.e., 54 months -> approximately 6 years -> approximately 8 years; approximately 6 years -> approximately 9 years -> approximately 10 years), though somewhat more strongly for the first. Additionally, (4) some evidence emerged of reciprocal effects of attentional processes on parenting (54 months -> approximately 6 years; approximately 6 years -> approximately 8 years), but not of problem behavior on attention. Because attentional control partially mediates the effects of parenting on externalizing problems, intervention efforts could target both parenting and attentional processes.

  9. Methods for significance testing of categorical covariates in logistic regression models after multiple imputation: power and applicability analysis

    NARCIS (Netherlands)

    Eekhout, I.; Wiel, M.A. van de; Heymans, M.W.

    2017-01-01

    Background. Multiple imputation is a recommended method to handle missing data. For significance testing after multiple imputation, Rubin’s Rules (RR) are easily applied to pool parameter estimates. In a logistic regression model, however, to consider whether a categorical covariate with more than two levels contributes significantly to the model, the parameters for all of its levels must be tested simultaneously, and it is not directly obvious how to combine RR with such multi-parameter tests.

  10. [Application of detecting and taking overdispersion into account in Poisson regression model].

    Science.gov (United States)

    Bouche, G; Lepage, B; Migeot, V; Ingrand, P

    2009-08-01

    Researchers often use the Poisson regression model to analyze count data. Overdispersion can occur when a Poisson regression model is used, resulting in an underestimation of the variance of the regression model parameters. Our objective was to take overdispersion into account and assess its impact, with an illustration based on data from a study investigating the relationship between use of the Internet to seek health information and the number of primary care consultations. Three methods (overdispersed Poisson, a robust estimator, and negative binomial regression) were applied to take overdispersion into account in explaining variation in the number (Y) of primary care consultations. We tested for overdispersion in the Poisson regression model using the ratio of the sum of squared Pearson residuals over the number of degrees of freedom (χ²/df). We then fitted the three models and compared their parameter estimates to those given by the Poisson regression model. The variance of the number of primary care consultations (Var[Y]=21.03) was greater than the mean (E[Y]=5.93) and the χ²/df ratio was 3.26, which confirmed overdispersion. Standard errors of the parameters varied greatly between the Poisson regression model and the three other regression models. Interpretation of the estimates for two variables (use of the Internet to seek health information, and single-parent family) would have changed according to the model retained, with significance levels of 0.06 and 0.002 (Poisson), 0.29 and 0.09 (overdispersed Poisson), 0.29 and 0.13 (robust estimator) and 0.45 and 0.13 (negative binomial), respectively. Different methods exist to solve the problem of underestimated variance in the Poisson regression model when overdispersion is present. The negative binomial regression model seems particularly appropriate because of its theoretical distribution; in addition, this regression is easy to perform with ordinary statistical software packages.
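
    The diagnostic and the three remedies discussed here are easy to reproduce with statsmodels; the sketch below uses simulated counts with extra-Poisson variance rather than the consultation data of the study.

        import numpy as np
        import statsmodels.api as sm

        # Simulated overdispersed counts (an assumption, not the study's data)
        rng = np.random.default_rng(0)
        X = sm.add_constant(rng.normal(size=(200, 2)))
        y = rng.negative_binomial(2, 0.4, size=200)

        pois = sm.GLM(y, X, family=sm.families.Poisson()).fit()
        print(pois.pearson_chi2 / pois.df_resid)   # chi2/df >> 1 flags overdispersion

        # The three remedies compared in the paper:
        quasi = sm.GLM(y, X, family=sm.families.Poisson()).fit(scale="X2")
        robust = sm.GLM(y, X, family=sm.families.Poisson()).fit(cov_type="HC0")
        negbin = sm.GLM(y, X, family=sm.families.NegativeBinomial()).fit()
        print(pois.bse, quasi.bse, robust.bse, negbin.bse)  # compare standard errors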

  11. Support vector methods for survival analysis: a comparison between ranking and regression approaches.

    Science.gov (United States)

    Van Belle, Vanya; Pelckmans, Kristiaan; Van Huffel, Sabine; Suykens, Johan A K

    2011-10-01

    To compare and evaluate ranking, regression and combined machine learning approaches for the analysis of survival data. The literature describes two approaches based on support vector machines to deal with censored observations. In the first approach the key idea is to rephrase the task as a ranking problem via the concordance index, a problem which can be solved efficiently in a context of structural risk minimization and convex optimization techniques. In the second approach, one uses a regression approach, dealing with censoring by means of inequality constraints. The goal of this paper is then twofold: (i) introducing a new model combining the ranking and regression strategy, which retains the link with existing survival models such as the proportional hazards model via transformation models; and (ii) comparing the three techniques on 6 clinical and 3 high-dimensional datasets and discussing the relevance of these techniques over classical approaches for survival data. We compare SVM-based survival models based on ranking constraints, on regression constraints, and on both ranking and regression constraints. The performance of the models is compared by means of three different measures: (i) the concordance index, measuring the model's discriminating ability; (ii) the logrank test statistic, indicating whether patients with a prognostic index lower than the median prognostic index have a significantly different survival than patients with a prognostic index higher than the median; and (iii) the hazard ratio after normalization to restrict the prognostic index between 0 and 1. Our results indicate a significantly better performance for models including regression constraints over models based only on ranking constraints. This work gives empirical evidence that SVM-based models using regression constraints perform significantly better than SVM-based models based on ranking constraints. Our experiments show a comparable performance for methods

  12. A comparison of discriminant logistic regression and Item Response Theory Likelihood-Ratio Tests for Differential Item Functioning (IRTLRDIF) in polytomous short tests.

    Science.gov (United States)

    Hidalgo, María D; López-Martínez, María D; Gómez-Benito, Juana; Guilera, Georgina

    2016-01-01

    Short scales are typically used in the social, behavioural and health sciences. This is relevant since test length can influence whether items showing DIF are correctly flagged. This paper compares the relative effectiveness of discriminant logistic regression (DLR) and IRTLRDIF for detecting DIF in polytomous short tests. A simulation study was designed in which test length, sample size, amount of DIF and number of item response categories were manipulated, and Type I error and power were evaluated. IRTLRDIF and DLR yielded Type I error rates close to the nominal level in no-DIF conditions. Under DIF conditions, Type I error rates were affected by test length, amount of DIF, degree of test contamination, sample size and number of item response categories. DLR showed a higher Type I error rate than did IRTLRDIF. Power rates were affected by amount of DIF and sample size, but not by test length. DLR achieved higher power rates than did IRTLRDIF in very short tests, although the high Type I error rate involved means that this result cannot be taken at face value. Test length had an important impact on the Type I error rate. IRTLRDIF and DLR showed low power rates in short tests and with small sample sizes.

  13. COVAR: Computer Program for Multifactor Relative Risks and Tests of Hypotheses Using a Variance-Covariance Matrix from Linear and Log-Linear Regression

    Directory of Open Access Journals (Sweden)

    Leif E. Peterson

    1997-11-01

    A computer program for multifactor relative risks, confidence limits, and tests of hypotheses using regression coefficients and a variance-covariance matrix obtained from a previous additive or multiplicative regression analysis is described in detail. Data used by the program can be stored in and input from an external disk file or entered via the keyboard. The output contains a list of the input data, point estimates of single or joint effects, confidence intervals, and tests of hypotheses based on a minimum modified chi-square statistic. Availability of the program is also discussed.

  14. Method for nonlinear exponential regression analysis

    Science.gov (United States)

    Junkin, B. G.

    1972-01-01

    Two computer programs, developed according to two general types of exponential models, for conducting nonlinear exponential regression analysis are described. A least squares procedure is used in which the nonlinear problem is linearized by expanding in a Taylor series. The programs are written in FORTRAN 5 for the Univac 1108 computer.
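
    The original FORTRAN programs are not reproduced here; the snippet below sketches the same idea with SciPy, whose default Levenberg-Marquardt solver is a damped version of the linearize-and-iterate scheme the report describes, applied to an assumed exponential model y = a·exp(b·x).

        import numpy as np
        from scipy.optimize import curve_fit

        def model(x, a, b):
            # Exponential model; the solver repeatedly linearizes it around the
            # current (a, b), as in the Taylor-series least squares procedure.
            return a * np.exp(b * x)

        rng = np.random.default_rng(3)
        x = np.linspace(0, 4, 60)
        y = model(x, 2.5, 0.7) + rng.standard_normal(60)

        (a_hat, b_hat), cov = curve_fit(model, x, y, p0=(1.0, 0.1))
        print(a_hat, b_hat)   # close to the true (2.5, 0.7)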

  15. Analysis of standard problem six (Semiscale test S-02-6) data

    International Nuclear Information System (INIS)

    Cartmill, C.E.

    1977-08-01

    Test S-02-6 of the Semiscale Mod-1 blowdown heat transfer test series was conducted to supply data for the U.S. Nuclear Regulatory Commission Standard Problem Six. To determine the credibility of the data and thus establish the validity of Standard Problem Six, an analysis of the results of Test S-02-6 was performed and is presented. This analysis consisted of investigations of system hydraulic and core thermal data. The credibility of the system hydraulic data was investigated through comparisons of the data with data and calculations from related sources (Test S-02-4) and, when necessary, through assessment of physical events. The credibility of the core thermal data was based on a thorough analysis of physical events. The results of these investigations substantiate the validity of Test S-02-6 data

  16. Convergent Time-Varying Regression Models for Data Streams: Tracking Concept Drift by the Recursive Parzen-Based Generalized Regression Neural Networks.

    Science.gov (United States)

    Duda, Piotr; Jaworski, Maciej; Rutkowski, Leszek

    2018-03-01

    One of the greatest challenges in data mining is related to the processing and analysis of massive data streams. Contrary to traditional static data mining problems, data streams require that each element be processed only once, that the amount of allocated memory remain constant, and that the models incorporate changes in the investigated streams. The vast majority of available methods have been developed for data stream classification, and only a few of them attempt to solve regression problems, using various heuristic approaches. In this paper, we develop mathematically justified regression models working in a time-varying environment. More specifically, we study incremental versions of generalized regression neural networks, called IGRNNs, and we prove their tracking properties - weak (in probability) and strong (with probability one) convergence - assuming various concept drift scenarios. First, we present the IGRNNs, based on Parzen kernels, for modeling stationary systems under nonstationary noise. Next, we extend our approach to modeling time-varying systems under nonstationary noise. We present several types of concept drift to be handled by our approach in such a way that weak and strong convergence holds under certain conditions. Finally, in a series of simulations, we compare our method with commonly used heuristic approaches based on forgetting mechanisms or sliding windows to deal with concept drift, and we apply our concept in a real-life scenario, solving the problem of currency exchange rate prediction.
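
    For orientation, the sketch below shows a Parzen-kernel (Nadaraya-Watson) regression estimator updated one stream element at a time with a shrinking bandwidth. It is a stationary-case illustration only: the class name, Gaussian kernel and bandwidth schedule are assumptions rather than the authors' IGRNN, and for readability it stores all samples, so it does not honor the constant-memory requirement.

        import numpy as np

        class IncrementalParzenRegressor:
            """Minimal incremental Nadaraya-Watson sketch (not the authors' IGRNN)."""

            def __init__(self, h0=1.0, decay=0.2):
                self.xs, self.ys = [], []
                self.h0, self.decay = h0, decay

            def update(self, x, y):
                # Each stream element is processed exactly once.
                self.xs.append(x)
                self.ys.append(y)

            def predict(self, x):
                n = len(self.xs)
                h = self.h0 * n ** (-self.decay)   # shrinking Parzen bandwidth
                w = np.exp(-0.5 * ((np.asarray(self.xs) - x) / h) ** 2)
                return float(w @ np.asarray(self.ys) / w.sum())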

  17. Inferring genetic parameters of lactation in Tropical Milking Criollo cattle with random regression test-day models.

    Science.gov (United States)

    Santellano-Estrada, E; Becerril-Pérez, C M; de Alba, J; Chang, Y M; Gianola, D; Torres-Hernández, G; Ramírez-Valverde, R

    2008-11-01

    This study inferred genetic and permanent environmental variation of milk yield in Tropical Milking Criollo cattle and compared 5 random regression test-day models using Wilmink's function and Legendre polynomials. Data consisted of 15,377 test-day records from 467 Tropical Milking Criollo cows that calved between 1974 and 2006 in the tropical lowlands of the Gulf Coast of Mexico and in southern Nicaragua. Estimated heritabilities of test-day milk yields ranged from 0.18 to 0.45, and repeatabilities ranged from 0.35 to 0.68 for the period spanning from 6 to 400 d in milk. Genetic correlation between days in milk 10 and 400 was around 0.50 but greater than 0.90 for most pairs of test days. The model that used first-order Legendre polynomials for additive genetic effects and second-order Legendre polynomials for permanent environmental effects gave the smallest residual variance and was also favored by the Akaike information criterion and likelihood ratio tests.
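
    The regression covariates used in such test-day models are straightforward to build: days in milk are rescaled to [-1, 1] and Legendre polynomials are evaluated at the rescaled values. In the sketch below, the day range is taken from the abstract, while the normalization constant sqrt((2j+1)/2) follows a convention common in test-day modeling and is an assumption here.

        import numpy as np
        from numpy.polynomial import legendre

        def legendre_covariates(dim, d_min=6, d_max=400, order=2):
            # Rescale days in milk to [-1, 1], then evaluate normalized
            # Legendre polynomials P_0 ... P_order at each record.
            t = 2.0 * (np.asarray(dim, dtype=float) - d_min) / (d_max - d_min) - 1.0
            cols = [np.sqrt((2 * j + 1) / 2.0) * legendre.Legendre.basis(j)(t)
                    for j in range(order + 1)]
            return np.column_stack(cols)

        Z = legendre_covariates([6, 100, 200, 300, 400], order=2)  # one row per test day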

  18. BRGLM, Interactive Linear Regression Analysis by Least Square Fit

    International Nuclear Information System (INIS)

    Ringland, J.T.; Bohrer, R.E.; Sherman, M.E.

    1985-01-01

    1 - Description of program or function: BRGLM is an interactive program written to fit general linear regression models by least squares and to provide a variety of statistical diagnostic information about the fit. Stepwise and all-subsets regression can also be carried out. There are facilities for interactive data management (e.g., setting missing-value flags, data transformations) and tools for constructing design matrices for the more commonly used models such as factorials, cubic splines, and auto-regressions. 2 - Method of solution: The least squares computations are based on the orthogonal (QR) decomposition of the design matrix, obtained using the modified Gram-Schmidt algorithm. 3 - Restrictions on the complexity of the problem: The current release of BRGLM allows maxima of 1000 observations, 99 variables, and 3000 words of main memory workspace. For a problem with N observations and P variables, the number of words of main memory storage required is MAX(N*(P+6), N*P+P*P+3*N, 3*P*P+6*N). Any linear model may be fit, although the in-memory workspace will have to be increased for larger problems.

  19. The Impact of Problem Sets on Student Learning

    Science.gov (United States)

    Kim, Myeong Hwan; Cho, Moon-Heum; Leonard, Karen Moustafa

    2012-01-01

    The authors examined the role of problem sets on student learning in university microeconomics. A total of 126 students participated in the study in consecutive years. An independent samples t test showed that students who were not given answer keys outperformed students who were given answer keys. Multiple regression analysis showed that, along with…

  20. Detecting and Analyzing I/O Performance Regressions

    NARCIS (Netherlands)

    Bezemer, C.P.; Milon, E.; Zaidman, A.; Pouwelse, J.

    2014-01-01

    Regression testing can be done by re-executing a test suite on different software versions and comparing the outcome. For functional testing, the outcome of such tests is either pass (correct behaviour) or fail (incorrect behaviour). For non-functional testing, such as performance testing, this is

  1. [Testing a Model to Predict Problem Gambling in Speculative Game Users].

    Science.gov (United States)

    Park, Hyangjin; Kim, Suk Sun

    2018-04-01

    The purpose of the study was to develop and test a model for predicting problem gambling in speculative game users, based on Blaszczynski and Nower's pathways model of problem and pathological gambling. The participants were 262 speculative game users recruited from seven speculative gambling venues located in Seoul, Gangwon, and Gyeonggi, Korea. They completed a structured self-report questionnaire comprising measures of problem gambling, negative emotions, attentional impulsivity, motor impulsivity, non-planning impulsivity, gambler's fallacy, and gambling self-efficacy. Structural equation modeling was used to test the hypothesized model and to examine the direct and indirect effects on problem gambling in speculative game users, using the SPSS 22.0 and AMOS 20.0 programs. The hypothesized model provided a reasonable fit to the data. Negative emotions, motor impulsivity, gambler's fallacy, and gambling self-efficacy had direct effects on problem gambling in speculative game users, while indirect effects were found for negative emotions, motor impulsivity, and gambler's fallacy. These predictors explained 75.2% of the variance in problem gambling in speculative game users. The findings suggest that intervention programs designed to reduce negative emotions, motor impulsivity, and gambler's fallacy, and to increase gambling self-efficacy, are needed to prevent problem gambling in speculative game users. © 2018 Korean Society of Nursing Science.

  2. Genetic Analysis of Milk Yield Using Random Regression Test Day Model in Tehran Province Holstein Dairy Cow

    Directory of Open Access Journals (Sweden)

    A. Seyeddokht

    2012-09-01

    In this research a random regression test-day model was used to estimate heritability values and genetic correlations between test-day milk records. A total of 140,357 monthly test-day milk records belonging to 28,292 first-lactation Holstein cattle (milked three times a day), distributed over 165 herds and calved from 2001 to 2010, from the herds of Tehran province were used. The fixed effects of herd-year-month of calving as contemporary group, and age at calving and Holstein gene percentage as covariates, were fitted. A 4th-order orthogonal Legendre polynomial was implemented to take account of genetic and environmental aspects of milk production over the course of lactation. Random regression models using Legendre polynomials as base functions appear to be the most adequate to describe the covariance structure of the data. The results showed that the average heritability for the second half of the lactation period was higher than that of the first half. The heritability value was lowest for the first month (0.117) and highest for the eighth month of lactation (0.230) compared to the other months of lactation. Because genetic variation increased gradually and residual variance was high in the first months of lactation, heritabilities differed over the course of lactation. The random regression models with a higher number of parameters were more useful for describing the genetic variation of test-day milk yield throughout the lactation. In this research, estimation of genetic parameters and calculation of genetic correlations were implemented with a random regression test-day model; this approach is therefore an appropriate way to account for these parameters.

  3. Ridge Regression Signal Processing

    Science.gov (United States)

    Kuhl, Mark R.

    1990-01-01

    The introduction of the Global Positioning System (GPS) into the National Airspace System (NAS) necessitates the development of Receiver Autonomous Integrity Monitoring (RAIM) techniques. In order to guarantee a certain level of integrity, a thorough understanding of modern estimation techniques applied to navigational problems is required. The extended Kalman filter (EKF) is derived and analyzed under poor geometry conditions. It was found that the performance of the EKF is difficult to predict, since the EKF is designed for a Gaussian environment. A novel approach is implemented which incorporates ridge regression to explain the behavior of an EKF in the presence of dynamics under poor geometry conditions. The basic principles of ridge regression theory are presented, followed by the derivation of a linearized recursive ridge estimator. Computer simulations are performed to confirm the underlying theory and to provide a comparative analysis of the EKF and the recursive ridge estimator.
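
    The core ridge computation underlying this work is compact; the sketch below shows the static estimator (a recursive, Kalman-style variant would update the same solution measurement by measurement, as in the EKF context above). The near-collinear design mimics the poor-geometry situation and is an illustrative assumption.

        import numpy as np

        def ridge(X, y, k):
            # Ridge estimator: beta = (X'X + k I)^(-1) X'y; k = 0 gives OLS.
            p = X.shape[1]
            return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

        rng = np.random.default_rng(8)
        x1 = rng.normal(size=100)
        X = np.column_stack([x1, x1 + 1e-4 * rng.normal(size=100)])  # near-collinear
        y = X @ np.array([1.0, 1.0]) + 0.1 * rng.normal(size=100)

        print(ridge(X, y, 0.0))   # unstable coefficients under poor geometry
        print(ridge(X, y, 0.1))   # shrunk toward a stable solution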

  4. Modeling Governance KB with CATPCA to Overcome Multicollinearity in the Logistic Regression

    Science.gov (United States)

    Khikmah, L.; Wijayanto, H.; Syafitri, U. D.

    2017-04-01

    A problem often encountered in logistic regression modeling is multicollinearity. Multicollinearity between explanatory variables results in biased parameter estimates and in classification errors. In general, stepwise regression is used to overcome multicollinearity in regression. Another method, which involves all variables in the prediction, is Principal Component Analysis (PCA). However, classical PCA is only for numeric data; when the data are categorical, one method to solve the problem is Categorical Principal Component Analysis (CATPCA). The data used in this research were part of the Demographic and Health Survey Indonesia (IDHS) 2012. This research focuses on the characteristics of women using contraceptive methods. Classification results were evaluated using Area Under the Curve (AUC) values; the higher the AUC value, the better. Based on AUC values, the classification of the contraceptive method using the stepwise method (58.66%) is better than the logistic regression model (57.39%) and CATPCA (57.39%). Evaluation of the results of logistic regression using sensitivity shows the opposite, where the CATPCA method (99.79%) is better than the logistic regression method (92.43%) and stepwise (92.05%). Since this study focuses on the major class classification (using a contraceptive method), the selected model is CATPCA because it raises the accuracy for the major class.
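
    A hedged sketch of the components-before-logistic-regression idea: numeric PCA stands in for CATPCA (which additionally performs optimal scaling of categorical variables), and the data are simulated rather than the IDHS survey.

        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import roc_auc_score
        from sklearn.pipeline import make_pipeline

        rng = np.random.default_rng(4)
        X = rng.normal(size=(300, 5))
        X[:, 3] = X[:, 0] + 0.01 * rng.normal(size=300)   # near-collinear covariate
        y = (X[:, 0] - X[:, 1] + rng.normal(size=300) > 0).astype(int)

        # Components replace the collinear covariates before the logistic fit.
        clf = make_pipeline(PCA(n_components=3), LogisticRegression()).fit(X, y)
        print(roc_auc_score(y, clf.predict_proba(X)[:, 1]))  # AUC, as in the paper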

  5. Testing Environmental Kuznets Curve in the Selected Transition Economies with Panel Smooth Transition Regression Analysis

    Directory of Open Access Journals (Sweden)

    Mahmut Zortuk

    2016-08-01

    The Environmental Kuznets Curve (EKC) posits an inverted U-shaped relationship between environmental pollution and economic development. The inverted U-shaped curve is seen as a complete pattern for developed economies. However, our study tests the EKC for the developing transition economies of the European Union; therefore, our results could make a significant contribution to the literature. In this paper, the relationship between carbon dioxide (CO2) emissions, gross domestic product (GDP), energy use and urban population is investigated in the transition economies (Bulgaria, Croatia, Czech Republic, Estonia, Hungary, Latvia, Lithuania, Poland, Romania, Slovakia and Slovenia). The Environmental Kuznets Curve is tested by panel smooth transition regression for these economies for the 1993-2010 period. As a result of the study, the null hypothesis of linearity was rejected, and the no-remaining-nonlinearity test showed that a smooth transition exists between two regimes (below $5,176 GDP per capita in the first and above $5,176 GDP per capita in the second) in the related period for these economies.

  6. The students' ability in the mathematical literacy for uncertainty problems on the PISA adaptation test

    Science.gov (United States)

    Julie, Hongki; Sanjaya, Febi; Anggoro, Ant. Yudhi

    2017-08-01

    One purpose of this study was to describe the solution profiles of junior high school students on the PISA adaptation test. The procedures conducted by the researchers to achieve this objective were (1) adapting the PISA test, (2) validating the adapted PISA test, (3) asking junior high school students to take the adapted PISA test, and (4) constructing the students' solution profiles. PISA problems for mathematics can be classified into four areas, namely quantity, space and shape, change and relationship, and uncertainty. The research results presented in this paper are the test results for the uncertainty problems. The adapted PISA test contained fifteen questions. Subjects in this study were 18 students from 11 junior high schools in Yogyakarta, Central Java, and Banten. The type of research used by the researchers was qualitative. For the first uncertainty problem in the adapted test, 66.67% of students reached level 3. For the second uncertainty problem, 44.44% of students achieved level 4, and 33.33% reached level 3. For the third uncertainty problem, 38.89% of students achieved level 5, 11.11% reached level 4, and 5.56% achieved level 3. For part a of the fourth uncertainty problem, 72.22% of students reached level 4, and for part b of the fourth uncertainty problem, 83.33% of students achieved level 4.

  7. On selection of optimal stochastic model for accelerated life testing

    International Nuclear Information System (INIS)

    Volf, P.; Timková, J.

    2014-01-01

    This paper deals with the problem of proper lifetime model selection in the context of statistical reliability analysis. Namely, we consider regression models describing the dependence of failure intensities on a covariate, for instance, a stressor. Testing the model fit is standardly based on the so-called martingale residuals. Their analysis has already been studied by many authors. Nevertheless, the Bayes approach to the problem, in spite of its advantages, is still developing. We present the Bayes procedure of estimation in several semi-parametric regression models of failure intensity. Then, our main concern is the Bayes construction of residual processes and goodness-of-fit tests based on them. The method is illustrated with both artificial and real-data examples. - Highlights: • Statistical survival and reliability analysis and Bayes approach. • Bayes semi-parametric regression modeling in Cox's and AFT models. • Bayes version of martingale residuals and goodness-of-fit test

  8. Identifying Interacting Genetic Variations by Fish-Swarm Logic Regression

    Science.gov (United States)

    Yang, Aiyuan; Yan, Chunxia; Zhu, Feng; Zhao, Zhongmeng; Cao, Zhi

    2013-01-01

    Understanding associations between genotypes and complex traits is a fundamental problem in human genetics. A major open problem in mapping phenotypes is that of identifying a set of interacting genetic variants, which might contribute to complex traits. Logic regression (LR) is a powerful multivariant association tool. Several LR-based approaches have been successfully applied to different datasets. However, these approaches are not adequate with regard to accuracy and efficiency. In this paper, we propose a new LR-based approach, called fish-swarm logic regression (FSLR), which improves the logic regression process by incorporating swarm optimization. In our approach, a school of fish agents is run in parallel. Each fish agent holds a regression model, while the school searches for better models through various preset behaviors. A swarm algorithm improves accuracy and efficiency by speeding up convergence and preventing it from dropping into local optima. We apply our approach to a real screening dataset and a series of simulation scenarios. Compared to three existing LR-based approaches, our approach outperforms them by having lower type I and type II error rates, being able to identify more preset causal sites, and performing at faster speeds. PMID:23984382

  9. SPLINE LINEAR REGRESSION USED FOR EVALUATING FINANCIAL ASSETS

    Directory of Open Access Journals (Sweden)

    Liviu GEAMBAŞU

    2010-12-01

    One of the most important preoccupations of financial market participants was, and still is, the problem of determining more precisely the trend of financial asset prices. To solve this problem, many scientific papers have been written and many mathematical and statistical models have been developed in order to better determine the trend of financial asset prices. If until recently simple linear models were largely used due to their ease of use, the financial crises that have affected the world economy starting in 2008 highlight the necessity of adapting mathematical models to the variation of the economy. A model that is simple to use but adapted to the realities of economic life is spline linear regression. This type of regression keeps the continuity of the regression function but splits the studied data into intervals with homogeneous characteristics. The characteristics of each interval are highlighted, as well as the evolution of the market over all the intervals, resulting in reduced standard errors. The first objective of the article is the theoretical presentation of spline linear regression, with reference to national and international scientific papers related to this subject. The second objective is applying the theoretical model to data from the Bucharest Stock Exchange.
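
    A minimal sketch of spline linear regression via a truncated-power (hinge) basis: the fitted function stays continuous while its slope may change at each knot. The knot location and the price-like series below are invented for illustration.

        import numpy as np

        def linear_spline_basis(x, knots):
            # Columns: intercept, x, and one hinge term max(x - k, 0) per knot,
            # so the fit is continuous with a possible slope change at each knot.
            cols = [np.ones_like(x), x]
            cols += [np.maximum(x - k, 0.0) for k in knots]
            return np.column_stack(cols)

        rng = np.random.default_rng(5)
        x = np.linspace(0, 10, 200)
        y = np.where(x < 5, 1 + 0.5 * x, 3.5 - 0.8 * (x - 5))
        y = y + 0.2 * rng.standard_normal(200)

        B = linear_spline_basis(x, knots=[5.0])
        beta, *_ = np.linalg.lstsq(B, y, rcond=None)
        print(beta)   # roughly (1, 0.5, -1.3): slope changes by -1.3 at the knot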

  10. Identifying Interacting Genetic Variations by Fish-Swarm Logic Regression

    Directory of Open Access Journals (Sweden)

    Xuanping Zhang

    2013-01-01

    Understanding associations between genotypes and complex traits is a fundamental problem in human genetics. A major open problem in mapping phenotypes is that of identifying a set of interacting genetic variants, which might contribute to complex traits. Logic regression (LR) is a powerful multivariant association tool. Several LR-based approaches have been successfully applied to different datasets. However, these approaches are not adequate with regard to accuracy and efficiency. In this paper, we propose a new LR-based approach, called fish-swarm logic regression (FSLR), which improves the logic regression process by incorporating swarm optimization. In our approach, a school of fish agents is run in parallel. Each fish agent holds a regression model, while the school searches for better models through various preset behaviors. A swarm algorithm improves accuracy and efficiency by speeding up convergence and preventing it from dropping into local optima. We apply our approach to a real screening dataset and a series of simulation scenarios. Compared to three existing LR-based approaches, our approach outperforms them by having lower type I and type II error rates, being able to identify more preset causal sites, and performing at faster speeds.

  11. Extending multivariate distance matrix regression with an effect size measure and the asymptotic null distribution of the test statistic.

    Science.gov (United States)

    McArtor, Daniel B; Lubke, Gitta H; Bergeman, C S

    2017-12-01

    Person-centered methods are useful for studying individual differences in terms of (dis)similarities between response profiles on multivariate outcomes. Multivariate distance matrix regression (MDMR) tests the significance of associations of response profile (dis)similarities and a set of predictors using permutation tests. This paper extends MDMR by deriving and empirically validating the asymptotic null distribution of its test statistic, and by proposing an effect size for individual outcome variables, which is shown to recover true associations. These extensions alleviate the computational burden of permutation tests currently used in MDMR and render more informative results, thus making MDMR accessible to new research domains.
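
    A compact sketch of the MDMR statistic and the permutation test it conventionally relies on (the paper's contribution is precisely to replace the permutations with the asymptotic null distribution and to add a per-outcome effect size; neither is reproduced here). X is assumed to include an intercept column.

        import numpy as np

        def mdmr_pseudo_f(D, X):
            n = D.shape[0]
            C = np.eye(n) - np.ones((n, n)) / n
            G = -0.5 * C @ (D ** 2) @ C                  # Gower-centered matrix
            H = X @ np.linalg.solve(X.T @ X, X.T)        # hat matrix of predictors
            m = X.shape[1] - 1                           # predictors excl. intercept
            num = np.trace(H @ G @ H) / m
            den = np.trace((np.eye(n) - H) @ G @ (np.eye(n) - H)) / (n - m - 1)
            return num / den

        def mdmr_perm_test(D, X, n_perm=999, seed=0):
            rng = np.random.default_rng(seed)
            f_obs = mdmr_pseudo_f(D, X)
            count = sum(mdmr_pseudo_f(D[np.ix_(p, p)], X) >= f_obs
                        for p in (rng.permutation(D.shape[0]) for _ in range(n_perm)))
            return f_obs, (count + 1) / (n_perm + 1)     # statistic and p-value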

  12. Developing and testing a global-scale regression model to quantify mean annual streamflow

    Science.gov (United States)

    Barbarossa, Valerio; Huijbregts, Mark A. J.; Hendriks, A. Jan; Beusen, Arthur H. W.; Clavreul, Julie; King, Henry; Schipper, Aafke M.

    2017-01-01

    Quantifying mean annual flow of rivers (MAF) at ungauged sites is essential for assessments of global water supply, ecosystem integrity and water footprints. MAF can be quantified with spatially explicit process-based models, which might be overly time-consuming and data-intensive for this purpose, or with empirical regression models that predict MAF based on climate and catchment characteristics. Yet, regression models have mostly been developed at a regional scale and the extent to which they can be extrapolated to other regions is not known. In this study, we developed a global-scale regression model for MAF based on a dataset unprecedented in size, using observations of discharge and catchment characteristics from 1885 catchments worldwide, measuring between 2 and 10^6 km2. In addition, we compared the performance of the regression model with the predictive ability of the spatially explicit global hydrological model PCR-GLOBWB by comparing results from both models to independent measurements. We obtained a regression model explaining 89% of the variance in MAF based on catchment area and catchment-averaged mean annual precipitation and air temperature, slope and elevation. The regression model performed better than PCR-GLOBWB for the prediction of MAF, as root-mean-square error (RMSE) values were lower (0.29-0.38 compared to 0.49-0.57) and the modified index of agreement (d) was higher (0.80-0.83 compared to 0.72-0.75). Our regression model can be applied globally to estimate MAF at any point of the river network, thus providing a feasible alternative to spatially explicit process-based global hydrological models.
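
    The two reported evaluation metrics are easy to compute. The functions below sketch RMSE and a modified (absolute-difference) index of agreement d in the Willmott style, which is assumed here to be the variant the study used.

        import numpy as np

        def rmse(obs, pred):
            return float(np.sqrt(np.mean((obs - pred) ** 2)))

        def modified_d(obs, pred):
            # Willmott-style modified index of agreement on absolute differences;
            # 1 is perfect agreement, lower values indicate poorer agreement.
            num = np.sum(np.abs(obs - pred))
            den = np.sum(np.abs(pred - obs.mean()) + np.abs(obs - obs.mean()))
            return float(1.0 - num / den)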

  13. A Comparative Study of Pairwise Learning Methods Based on Kernel Ridge Regression.

    Science.gov (United States)

    Stock, Michiel; Pahikkala, Tapio; Airola, Antti; De Baets, Bernard; Waegeman, Willem

    2018-06-12

    Many machine learning problems can be formulated as predicting labels for a pair of objects. Problems of that kind are often referred to as pairwise learning, dyadic prediction, or network inference problems. During the past decade, kernel methods have played a dominant role in pairwise learning. They still obtain a state-of-the-art predictive performance, but a theoretical analysis of their behavior has been underexplored in the machine learning literature. In this work we review and unify kernel-based algorithms that are commonly used in different pairwise learning settings, ranging from matrix filtering to zero-shot learning. To this end, we focus on closed-form efficient instantiations of Kronecker kernel ridge regression. We show that independent task kernel ridge regression, two-step kernel ridge regression, and a linear matrix filter arise naturally as a special case of Kronecker kernel ridge regression, implying that all these methods implicitly minimize a squared loss. In addition, we analyze universality, consistency, and spectral filtering properties. Our theoretical results provide valuable insights into assessing the advantages and limitations of existing pairwise learning methods.
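
    A sketch of the closed-form Kronecker kernel ridge regression the paper builds on, for a complete label grid: the system (Kv ⊗ Ku + λI) vec(C) = vec(Y) is solved through the two small eigendecompositions, without ever forming the Kronecker product. The kernels and grid below are random placeholders.

        import numpy as np

        def kron_krr_fit(Ku, Kv, Y, lam=1.0):
            # Eigendecompose both kernels, rescale in the joint eigenbasis, map back.
            su, U = np.linalg.eigh(Ku)
            sv, V = np.linalg.eigh(Kv)
            A = U.T @ Y @ V
            return U @ (A / (np.outer(su, sv) + lam)) @ V.T   # dual coefficients C

        rng = np.random.default_rng(9)
        A, B = rng.normal(size=(30, 5)), rng.normal(size=(20, 4))
        Ku, Kv = A @ A.T, B @ B.T            # PSD kernels on the two object sets
        Y = rng.normal(size=(30, 20))        # complete grid of pairwise labels

        C = kron_krr_fit(Ku, Kv, Y, lam=0.5)
        F = Ku @ C @ Kv                      # fitted pairwise scores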

  14. Repeated Results Analysis for Middleware Regression Benchmarking

    Czech Academy of Sciences Publication Activity Database

    Bulej, Lubomír; Kalibera, T.; Tůma, P.

    2005-01-01

    Vol. 60 (2005), pp. 345-358. ISSN 0166-5316. R&D Projects: GA ČR GA102/03/0672. Institutional research plan: CEZ:AV0Z10300504. Keywords: middleware benchmarking * regression benchmarking * regression testing. Subject RIV: JD - Computer Applications, Robotics. Impact factor: 0.756, year: 2005

  15. Seed germination test for toxicity evaluation of compost: Its roles, problems and prospects.

    Science.gov (United States)

    Luo, Yuan; Liang, Jie; Zeng, Guangming; Chen, Ming; Mo, Dan; Li, Guoxue; Zhang, Difang

    2018-01-01

    Compost is commonly used for the growth of plants and the remediation of environmental pollution. It is important to evaluate the quality of compost, and the seed germination test is a powerful tool for examining the toxicity of compost, which is the most important aspect of that quality. The test is now widely adopted, but its main problem is that the results vary with different methods and seed species, which limits its development and application. The standardization of methods and the use of model seed species can contribute to solving this problem. Additionally, according to the probabilistic theory of seed germination, the error caused by the methods used to analyze and judge the test results can be reduced. Here, we review the roles, problems and prospects of the seed germination test in studies of compost. Copyright © 2017 Elsevier Ltd. All rights reserved.

  16. Modified Regression Correlation Coefficient for Poisson Regression Model

    Science.gov (United States)

    Kaengthong, Nattacha; Domthong, Uthumporn

    2017-09-01

    This study gives attention to indicators of predictive power for the Generalized Linear Model (GLM), which are widely used but often subject to some restrictions. We are interested in the regression correlation coefficient for a Poisson regression model. This is a measure of predictive power defined by the relationship between the dependent variable (Y) and the expected value of the dependent variable given the independent variables, E[Y|X], for the Poisson regression model, where the dependent variable is Poisson distributed. The purpose of this research was to modify the regression correlation coefficient for the Poisson regression model. We also compare the proposed modified regression correlation coefficient with the traditional regression correlation coefficient in the case of two or more independent variables, and in the presence of multicollinearity among the independent variables. The results show that the proposed regression correlation coefficient is better than the traditional one in terms of bias and root mean square error (RMSE).
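
    The quantity being modified is simple to compute: the Pearson correlation between Y and the fitted conditional mean E[Y|X] of the Poisson model. The snippet sketches the traditional coefficient on simulated data (the proposed modification itself is not specified in the abstract and is not reproduced).

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(6)
        X = sm.add_constant(rng.normal(size=(400, 2)))
        y = rng.poisson(np.exp(0.3 + 0.5 * X[:, 1] - 0.4 * X[:, 2]))

        fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
        r = np.corrcoef(y, fit.mu)[0, 1]   # correlation of Y with fitted E[Y|X]
        print(r)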

  17. The Collinearity Free and Bias Reduced Regression Estimation Project: The Theory of Normalization Ridge Regression. Report No. 2.

    Science.gov (United States)

    Bulcock, J. W.; And Others

    Multicollinearity refers to the presence of highly intercorrelated independent variables in structural equation models, that is, models estimated by using techniques such as least squares regression and maximum likelihood. There is a problem of multicollinearity in both the natural and social sciences where theory formulation and estimation is in…

  18. bayesQR: A Bayesian Approach to Quantile Regression

    Directory of Open Access Journals (Sweden)

    Dries F. Benoit

    2017-01-01

    After its introduction by Koenker and Bassett (1978), quantile regression has become an important and popular tool to investigate the conditional response distribution in regression. The R package bayesQR contains a number of routines to estimate quantile regression parameters using a Bayesian approach based on the asymmetric Laplace distribution. The package contains functions for typical quantile regression with a continuous dependent variable, but also supports quantile regression for binary dependent variables. For both types of dependent variables, an approach to variable selection using the adaptive lasso is provided. For the binary quantile regression model, the package also contains a routine that calculates the fitted probabilities for each vector of predictors. In addition, functions for summarizing the results, creating traceplots, posterior histograms and drawing quantile plots are included. This paper starts with a brief overview of the theoretical background of the models used in the bayesQR package. The main part of the paper discusses the computational problems that arise in the implementation of the procedure and illustrates the usefulness of the package through selected examples.

  19. Hierarchical Matching and Regression with Application to Photometric Redshift Estimation

    Science.gov (United States)

    Murtagh, Fionn

    2017-06-01

    This work emphasizes that heterogeneity, diversity, discontinuity, and discreteness in data is to be exploited in classification and regression problems. A global a priori model may not be desirable. For data analytics in cosmology, this is motivated by the variety of cosmological objects such as elliptical, spiral, active, and merging galaxies at a wide range of redshifts. Our aim is matching and similarity-based analytics that takes account of discrete relationships in the data. The information structure of the data is represented by a hierarchy or tree where the branch structure, rather than just the proximity, is important. The representation is related to p-adic number theory. The clustering or binning of the data values, related to the precision of the measurements, has a central role in this methodology. If used for regression, our approach is a method of cluster-wise regression, generalizing nearest neighbour regression. Both to exemplify this analytics approach, and to demonstrate computational benefits, we address the well-known photometric redshift or `photo-z' problem, seeking to match Sloan Digital Sky Survey (SDSS) spectroscopic and photometric redshifts.

  20. Methodical approaches to solving special problems of testing. Seminar papers

    International Nuclear Information System (INIS)

    1996-01-01

    This Seminar volume introduces concepts and applications from different areas of application of ultrasonic testing and other non-destructive test methods in 18 lectures, in order to give an idea of new trends in development and stimuli for special solutions to problems. 3 articles were recorded separately for the ENERGY data bank. (orig./MM) [de]

  1. Estimating the Proportion of True Null Hypotheses in Multiple Testing Problems

    Directory of Open Access Journals (Sweden)

    Oluyemi Oyeniran

    2016-01-01

    The problem of estimating the proportion, π0, of true null hypotheses in a multiple testing problem is important in cases where large-scale parallel hypothesis tests are performed independently. While the proportion is a quantity of interest in its own right in applications, the estimate of π0 can also be used for assessing or controlling an overall false discovery rate. In this article, we develop an innovative nonparametric maximum likelihood approach to estimate π0. The nonparametric likelihood is restricted to multinomial models, and an EM algorithm is developed to approximate the estimate of π0. Simulation studies show that the proposed method outperforms other existing methods. Using experimental microarray datasets, we demonstrate that the new method provides satisfactory estimates in practice.
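
    The paper's multinomial-restricted NPMLE/EM estimator is not reproduced here; for orientation, the sketch below shows the classical λ-threshold estimator that such methods aim to improve upon. Under the null, p-values are uniform, so the density of p-values beyond λ estimates π0.

        import numpy as np

        def pi0_lambda(pvals, lam=0.5):
            # Fraction of p-values above lam, rescaled by the uniform tail mass.
            pvals = np.asarray(pvals)
            return min(1.0, (pvals > lam).mean() / (1.0 - lam))

        rng = np.random.default_rng(10)
        p = np.concatenate([rng.uniform(size=800),        # 800 true nulls
                            rng.beta(0.2, 5.0, size=200)])  # 200 non-nulls near 0
        print(pi0_lambda(p))   # close to the true pi0 of 0.8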

  2. Principal Covariates Clusterwise Regression (PCCR) : Accounting for multicollinearity and population heterogeneity in hierarchically organized data.

    NARCIS (Netherlands)

    Wilderjans, Tom F.; Van de Gaer, E.; Kiers, H.A.L.; Van Mechelen, Iven; Ceulemans, Eva

    In the behavioral sciences, many research questions pertain to a regression problem in that one wants to predict a criterion on the basis of a number of predictors. Although in many cases, ordinary least squares regression will suffice, sometimes the prediction problem is more challenging, for three

  3. Regularized principal covariates regression and its application to finding coupled patterns in climate fields

    Science.gov (United States)

    Fischer, M. J.

    2014-02-01

    There are many different methods for investigating the coupling between two climate fields, which are all based on the multivariate regression model. Each different method of solving the multivariate model has its own attractive characteristics, but often the suitability of a particular method for a particular problem is not clear. Continuum regression methods search the solution space between the conventional methods and thus can find regression model subspaces that mix the attractive characteristics of the end-member subspaces. Principal covariates regression is a continuum regression method that is easily applied to climate fields and makes use of two end-members: principal components regression and redundancy analysis. In this study, principal covariates regression is extended to additionally span a third end-member (partial least squares or maximum covariance analysis). The new method, regularized principal covariates regression, has several attractive features including the following: it easily applies to problems in which the response field has missing values or is temporally sparse, it explores a wide range of model spaces, and it seeks a model subspace that will, for a set number of components, have a predictive skill that is the same or better than conventional regression methods. The new method is illustrated by applying it to the problem of predicting the southern Australian winter rainfall anomaly field using the regional atmospheric pressure anomaly field. Regularized principal covariates regression identifies four major coupled patterns in these two fields. The two leading patterns, which explain over half the variance in the rainfall field, are related to the subtropical ridge and features of the zonally asymmetric circulation.

  4. The Use of Nonparametric Kernel Regression Methods in Econometric Production Analysis

    DEFF Research Database (Denmark)

    Czekaj, Tomasz Gerard

    This PhD thesis addresses one of the fundamental problems in applied econometric analysis, namely the econometric estimation of regression functions. The conventional approach to regression analysis is the parametric approach, which requires the researcher to specify the form of the regression function. The first paper compares parametric and nonparametric estimations of production functions in order to evaluate the optimal firm size. The second paper discusses the use of parametric and nonparametric regression methods to estimate panel data regression models. The third paper analyses production risk, price uncertainty, and farmers' risk preferences within a nonparametric panel data regression framework. The fourth paper analyses the technical efficiency of dairy farms with environmental output using nonparametric kernel regression in a semiparametric stochastic frontier analysis. The results provided in this PhD thesis show that nonparametric…

  5. Regression-based approach for testing the association between multi-region haplotype configuration and complex trait

    Directory of Open Access Journals (Sweden)

    Zhao Hongbo

    2009-09-01

    Background: It is quite common that the genetic architecture of complex traits involves many genes and their interactions. Therefore, dealing with multiple unlinked genomic regions simultaneously is desirable. Results: In this paper we develop a regression-based approach to assess the interactions of haplotypes that belong to different unlinked regions, and we use score statistics to test the null hypothesis of no genetic association. Additionally, multiple marker combinations at each unlinked region are considered. The multiple tests are settled via the minP approach. The P value of the "best" multi-region multi-marker configuration is corrected via Monte Carlo simulations. Through simulation studies, we assess the performance of the proposed approach and demonstrate its validity and power in testing for haplotype interaction association. Conclusion: Our simulations showed that, for binary traits without covariates, our proposed methods prove to be as powerful as, and in some cases more powerful than, htr and hapcc, which are part of the FAMHAP program. Additionally, our model can be applied to a wider variety of traits and allows adjustment for other covariates. To test its validity, our methods are applied to analyze the association between four unlinked candidate genes and pig meat quality.

  6. Using Regression Equations Built from Summary Data in the Psychological Assessment of the Individual Case: Extension to Multiple Regression

    Science.gov (United States)

    Crawford, John R.; Garthwaite, Paul H.; Denham, Annie K.; Chelune, Gordon J.

    2012-01-01

    Regression equations have many useful roles in psychological assessment. Moreover, there is a large reservoir of published data that could be used to build regression equations; these equations could then be employed to test a wide variety of hypotheses concerning the functioning of individual cases. This resource is currently underused because…
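
    A sketch of the basic single-predictor case: a regression equation is rebuilt from published summary statistics (means, SDs, correlation, N) and an individual's observed score is compared with the predicted score using the usual prediction-interval t statistic. The formulas follow standard regression theory and are assumptions here, not copied from the paper.

        import numpy as np
        from scipy import stats

        def summary_regression_test(mx, sx, my, sy, r, n, x0, y0):
            # Rebuild the simple regression of y on x from summary data.
            b = r * sy / sx
            a = my - b * mx
            y_hat = a + b * x0
            # SD of residuals, then the standard error of a new prediction.
            s_e = sy * np.sqrt((1 - r ** 2) * (n - 1) / (n - 2))
            se_pred = s_e * np.sqrt(1 + 1 / n + (x0 - mx) ** 2 / ((n - 1) * sx ** 2))
            t = (y0 - y_hat) / se_pred
            p = 2 * stats.t.sf(abs(t), df=n - 2)
            return y_hat, t, p

        # Hypothetical normative data and one individual's scores:
        print(summary_regression_test(mx=100, sx=15, my=50, sy=10,
                                      r=0.6, n=120, x0=130, y0=40))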

  7. Regression-based Multi-View Facial Expression Recognition

    NARCIS (Netherlands)

    Rudovic, Ognjen; Patras, Ioannis; Pantic, Maja

    2010-01-01

    We present a regression-based scheme for multi-view facial expression recognition based on 2-D geometric features. We address the problem by mapping facial points (e.g. mouth corners) from non-frontal to frontal view where further recognition of the expressions can be performed using a

  8. Order Selection for General Expression of Nonlinear Autoregressive Model Based on Multivariate Stepwise Regression

    Science.gov (United States)

    Shi, Jinfei; Zhu, Songqing; Chen, Ruwen

    2017-12-01

    An order selection method based on multiple stepwise regressions is proposed for the General Expression of the Nonlinear Autoregressive (GNAR) model, which converts the model order problem into variable selection for a multiple linear regression equation. The partial autocorrelation function is adopted to define the linear term in the GNAR model. The result is set as the initial model, and then the nonlinear terms are introduced gradually. Statistics are chosen to study the improvements from both the newly introduced and the originally existing variables on the model characteristics, and these statistics are adopted to determine which model variables to retain or eliminate. The optimal model is thus obtained through measurement of the data-fitting effect or significance testing. The simulation and classic time-series data experiment results show that the proposed method is simple, reliable and applicable to practical engineering.

  9. Demonstration of a Fiber Optic Regression Probe

    Science.gov (United States)

    Korman, Valentin; Polzin, Kurt A.

    2010-01-01

    The capability to provide localized, real-time monitoring of material regression rates in various applications has the potential to provide a new stream of data for development testing of various components and systems, as well as serving as a monitoring tool in flight applications. These applications include, but are not limited to, the regression of a combusting solid fuel surface, the ablation of the throat in a chemical rocket or the heat shield of an aeroshell, and the monitoring of erosion in long-life plasma thrusters. The rate of regression in the first application is very fast, while the second and third are increasingly slower. A recent fundamental sensor development effort has led to a novel regression, erosion, and ablation sensor technology (REAST). The REAST sensor allows for measurement of real-time surface erosion rates at a discrete surface location. The sensor is optical, using two different, co-located fiber-optics to perform the regression measurement. The disparate optical transmission properties of the two fiber-optics make it possible to measure the regression rate by monitoring the relative light attenuation through the fibers. As the fibers regress along with the parent material in which they are embedded, the relative light intensities through the two fibers change, providing a measure of the regression rate. The optical nature of the system makes it relatively easy to use in a variety of harsh, high-temperature environments, and it is also unaffected by the presence of electric and magnetic fields. In addition, the sensor could be used to perform optical spectroscopy on the light emitted by a process and collected by the fibers, giving localized measurements of various properties. The capability to perform an in-situ measurement of material regression rates is useful in addressing a variety of physical issues in various applications. An in-situ measurement allows for real-time data regarding the erosion rates, providing a quick method for

  10. Block-GP: Scalable Gaussian Process Regression for Multimodal Data

    Data.gov (United States)

    National Aeronautics and Space Administration — Regression problems on massive data sets are ubiquitous in many application domains including the Internet, earth and space sciences, and finances. In many cases,...

  11. An appraisal of convergence failures in the application of logistic regression model in published manuscripts.

    Science.gov (United States)

    Yusuf, O B; Bamgboye, E A; Afolabi, R F; Shodimu, M A

    2014-09-01

    The logistic regression model is widely used in health research for descriptive and predictive purposes. Unfortunately, researchers are sometimes not aware that the underlying principles of the technique fail when the maximum-likelihood algorithm does not converge. Young researchers, particularly postgraduate students, may not know why the separation problem, whether quasi-complete or complete, occurs, how to identify it, and how to fix it. This study was designed to critically evaluate convergence issues in articles that employed logistic regression analysis published in the African Journal of Medicine and Medical Sciences between 2004 and 2013. Problems of quasi-complete or complete separation were described and illustrated with the National Demographic and Health Survey dataset. A critical evaluation of articles that employed logistic regression was conducted. A total of 581 articles were reviewed, of which 40 (6.9%) used binary logistic regression. Twenty-four (60.0%) stated the use of a logistic regression model in the methodology, while none of the articles assessed model fit. Only 3 (12.5%) properly described the procedures. Of the 40 articles that used logistic regression, the problem of convergence occurred in 6 (15.0%). Logistic regression tended to be poorly reported in studies published between 2004 and 2013. Our findings show that the procedure may not be well understood by researchers, since very few described the process in their reports, and they may be totally unaware of the problem of convergence or how to deal with it.
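
    A minimal sketch of what convergence failure from complete separation looks like in practice, assuming statsmodels; the toy data are deliberately separable. Older statsmodels versions raise PerfectSeparationError, while newer ones may only warn, so the sketch catches exceptions broadly.

```python
# Complete separation: the outcome is fully determined by the predictor,
# so the maximum-likelihood estimates diverge and fitting fails or warns.
import numpy as np
import statsmodels.api as sm

x = np.arange(10, dtype=float)
y = (x > 4.5).astype(int)          # outcome perfectly determined by x
X = sm.add_constant(x)

try:
    res = sm.Logit(y, X).fit(disp=0)
    print(res.params)              # if it "converges", coefficients explode
except Exception as err:           # e.g. PerfectSeparationError in older versions
    print("separation detected:", err)
```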

  12. Statistical approach for selection of regression model during validation of bioanalytical method

    Directory of Open Access Journals (Sweden)

    Natalija Nakov

    2014-06-01

    The selection of an adequate regression model is the basis for obtaining accurate and reproducible results during bioanalytical method validation. Given the wide concentration ranges frequently present in bioanalytical assays, heteroscedasticity of the data may be expected. Several weighted linear and quadratic regression models were evaluated during the selection of the adequate curve fit using nonparametric statistical tests: the one-sample rank test and the Wilcoxon signed rank test for two independent groups of samples. The results obtained with the one-sample rank test could not give statistical justification for the selection of linear vs. quadratic regression models, because only slight differences between the errors (presented through the relative residuals, RR) were obtained. Estimation of the significance of the differences in the RR was achieved using the Wilcoxon signed rank test, where the linear and quadratic regression models were treated as two independent groups. The application of this simple non-parametric statistical test provides statistical confirmation of the choice of an adequate regression model.
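
    A minimal sketch of the comparison described above, assuming numpy/scipy and synthetic heteroscedastic calibration data; the 1/x² weighting and the concentration levels are illustrative.

```python
# Compare weighted linear vs. quadratic calibration fits via relative residuals.
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(1)
conc = np.array([0.1, 0.5, 1, 5, 10, 50, 100.0])                      # levels
resp = 2.0 * conc + 0.002 * conc**2 + rng.normal(scale=0.05 * conc)   # heteroscedastic

w = 1.0 / conc**2                               # 1/x^2 WLS weights
lin = np.polyfit(conc, resp, 1, w=np.sqrt(w))   # polyfit squares its weights
quad = np.polyfit(conc, resp, 2, w=np.sqrt(w))

rr_lin = (np.polyval(lin, conc) - resp) / resp  # relative residuals
rr_quad = (np.polyval(quad, conc) - resp) / resp
stat, p = wilcoxon(np.abs(rr_lin), np.abs(rr_quad))
print(f"Wilcoxon p = {p:.3f}  (small p favours the quadratic model)")
```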

  13. Integration of association statistics over genomic regions using Bayesian adaptive regression splines

    Directory of Open Access Journals (Sweden)

    Zhang Xiaohua

    2003-11-01

    In the search for genetic determinants of complex disease, two approaches to association analysis are most often employed: testing single loci, or testing a small group of loci jointly via haplotypes for their relationship to disease status. It is still debatable which of these approaches is more favourable, and under what conditions. The former has the advantage of simplicity but suffers severely when alleles at the tested loci are not in linkage disequilibrium (LD) with liability alleles; the latter should capture more of the signal encoded in LD, but is far from simple. The complexity of haplotype analysis could be especially troublesome for association scans over large genomic regions, which, in fact, is becoming the standard design. For these reasons, the authors have been evaluating statistical methods that bridge the gap between single-locus and haplotype-based tests. In this article, they present one such method, which uses non-parametric regression techniques embodied by Bayesian adaptive regression splines (BARS). For a set of markers falling within a common genomic region and a corresponding set of single-locus association statistics, the BARS procedure integrates these results into a single test by examining the class of smooth curves consistent with the data. The non-parametric BARS procedure generally finds no signal when no liability allele exists in the tested region (i.e., it achieves the specified size of the test), and it is sensitive enough to pick up signals when a liability allele is present. The BARS procedure provides a robust and potentially powerful alternative to classical tests of association, diminishes the multiple testing problem inherent in those tests, and can be applied to a wide range of data types, including genotype frequencies estimated from pooled samples.

  14. Testing After Worked Example Study Does Not Enhance Delayed Problem-Solving Performance Compared to Restudy

    NARCIS (Netherlands)

    T.A.J.M. van Gog (Tamara); L. Kester (Liesbeth); K. Dirkx (Kim); V. Hoogerheide (Vincent); J. Boerboom (Joris); P.P.J.L. Verkoeijen (Peter)

    2015-01-01

    Four experiments investigated whether the testing effect also applies to the acquisition of problem-solving skills from worked examples. Experiment 1 (n = 120) showed no beneficial effects of testing consisting of isomorphic problem solving or example recall on final test performance,

  15. Testing After Worked Example Study Does Not Enhance Delayed Problem-Solving Performance Compared to Restudy

    NARCIS (Netherlands)

    Van Gog, Tamara; Kester, Liesbeth; Dirkx, Kim; Hoogerheide, Vincent; Boerboom, Joris; Verkoeijen, Peter P. J. L.

    2016-01-01

    Four experiments investigated whether the testing effect also applies to the acquisition of problem-solving skills from worked examples. Experiment 1 (n=120) showed no beneficial effects of testing consisting of isomorphic problem solving or example recall on final test performance, which

  16. Testing After Worked Example Study Does Not Enhance Delayed Problem-Solving Performance Compared to Restudy

    NARCIS (Netherlands)

    van Gog, Tamara; Kester, Liesbeth; Dirkx, Kim; Hoogerheide, Vincent; Boerboom, Joris; Verkoeijen, Peter P J L

    Four experiments investigated whether the testing effect also applies to the acquisition of problem-solving skills from worked examples. Experiment 1 (n = 120) showed no beneficial effects of testing consisting of isomorphic problem solving or example recall on final test performance, which

  17. Predicting hyperketonemia by logistic and linear regression using test-day milk and performance variables in early-lactation Holstein and Jersey cows.

    Science.gov (United States)

    Chandler, T L; Pralle, R S; Dórea, J R R; Poock, S E; Oetzel, G R; Fourdraine, R H; White, H M

    2018-03-01

    Although cowside testing strategies for diagnosing hyperketonemia (HYK) are available, many are labor intensive and costly, and some lack sufficient accuracy. Predicting milk ketone bodies by Fourier transform infrared spectrometry during routine milk sampling may offer a more practical monitoring strategy. The objectives of this study were to (1) develop linear and logistic regression models using all available test-day milk and performance variables for predicting HYK and (2) compare prediction methods (Fourier transform infrared milk ketone bodies, linear regression models, and logistic regression models) to determine which is the most predictive of HYK. Given the data available, a secondary objective was to evaluate differences in test-day milk and performance variables (continuous measurements) between Holsteins and Jerseys and between cows with or without HYK within breed. Blood samples were collected on the same day as milk sampling from 658 Holstein and 468 Jersey cows between 5 and 20 d in milk (DIM). Diagnosis of HYK was at a serum β-hydroxybutyrate (BHB) concentration ≥1.2 mmol/L. Concentrations of milk BHB and acetone were predicted by Fourier transform infrared spectrometry (Foss Analytical, Hillerød, Denmark). Thresholds of milk BHB and acetone were tested for diagnostic accuracy, and logistic models were built from continuous variables to predict HYK in primiparous and multiparous cows within breed. Linear models were constructed from continuous variables for primiparous and multiparous cows within breed that were 5 to 11 DIM or 12 to 20 DIM. Milk ketone body thresholds diagnosed HYK with 64.0 to 92.9% accuracy in Holsteins and 59.1 to 86.6% accuracy in Jerseys. Logistic models predicted HYK with 82.6 to 97.3% accuracy. Internally cross-validated multiple linear regression models diagnosed HYK of Holstein cows with 97.8% accuracy for primiparous and 83.3% accuracy for multiparous cows. Accuracy of Jersey models was 81.3% in primiparous and 83

  18. Independent contrasts and PGLS regression estimators are equivalent.

    Science.gov (United States)

    Blomberg, Simon P; Lefevre, James G; Wells, Jessie A; Waterhouse, Mary

    2012-05-01

    We prove that the slope parameter of the ordinary least squares regression of phylogenetically independent contrasts (PICs) conducted through the origin is identical to the slope parameter of the method of generalized least squares (GLS) regression under a Brownian motion model of evolution. This equivalence has several implications: 1. Understanding the structure of the linear model for GLS regression provides insight into when and why phylogeny is important in comparative studies. 2. The limitations of the PIC regression analysis are the same as the limitations of the GLS model. In particular, phylogenetic covariance applies only to the response variable in the regression and the explanatory variable should be regarded as fixed. Calculation of PICs for explanatory variables should be treated as a mathematical idiosyncrasy of the PIC regression algorithm. 3. Since the GLS estimator is the best linear unbiased estimator (BLUE), the slope parameter estimated using PICs is also BLUE. 4. If the slope is estimated using different branch lengths for the explanatory and response variables in the PIC algorithm, the estimator is no longer the BLUE, so this is not recommended. Finally, we discuss whether or not and how to accommodate phylogenetic covariance in regression analyses, particularly in relation to the problem of phylogenetic uncertainty. This discussion is from both frequentist and Bayesian perspectives.
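
    A minimal numerical check of the claimed equivalence, assuming numpy and the 3-taxon tree ((A:1,B:1):1,C:2), whose contrasts can be written down by hand; the data are simulated, and the two slope estimates should agree up to rounding.

```python
# PIC regression through the origin vs. GLS under Brownian motion.
import numpy as np

rng = np.random.default_rng(11)
V = np.array([[2.0, 1.0, 0.0],     # shared branch lengths for taxa A, B, C
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 2.0]])
L = np.linalg.cholesky(V)
x, y = L @ rng.normal(size=3), L @ rng.normal(size=3)   # Brownian-like traits

def pics(z):                        # Felsenstein contrasts for this tree
    c1 = (z[0] - z[1]) / np.sqrt(2.0)                   # A vs B
    c2 = ((z[0] + z[1]) / 2 - z[2]) / np.sqrt(1.5 + 2.0)  # node AB vs C
    return np.array([c1, c2])

cx, cy = pics(x), pics(y)
slope_pic = (cx @ cy) / (cx @ cx)   # OLS through the origin on contrasts

Vi = np.linalg.inv(V)
X = np.column_stack([np.ones(3), x])
slope_gls = (np.linalg.inv(X.T @ Vi @ X) @ X.T @ Vi @ y)[1]
print(slope_pic, slope_gls)         # identical up to floating-point rounding
```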

  19. Explaining Discrepancies Between the Digit Triplet Speech-in-Noise Test Score and Self-Reported Hearing Problems in Older Adults.

    Science.gov (United States)

    Pronk, Marieke; Deeg, Dorly J H; Kramer, Sophia E

    2018-04-17

    The purpose of this study is to determine which demographic, health-related, mood, personality, or social factors predict discrepancies between older adults' functional speech-in-noise test result and their self-reported hearing problems. Data of 1,061 respondents from the Longitudinal Aging Study Amsterdam were used (ages ranged from 57 to 95 years). Functional hearing problems were measured using a digit triplet speech-in-noise test. Five questions were used to assess self-reported hearing problems. Scores of both hearing measures were dichotomized. Two discrepancy outcomes were created: (a) being unaware: those with functional but without self-reported problems (reference is aware: those with functional and self-reported problems); (b) reporting false complaints: those without functional but with self-reported problems (reference is well: those without functional and self-reported hearing problems). Two multivariable prediction models (logistic regression) were built with 19 candidate predictors. The speech reception threshold in noise was kept (forced) as a predictor in both models. Persons with higher self-efficacy (to initiate behavior) and higher self-esteem had a higher odds to being unaware than persons with lower self-efficacy scores (odds ratio [OR] = 1.13 and 1.11, respectively). Women had a higher odds than men (OR = 1.47). Persons with more chronic diseases and persons with worse (i.e., higher) speech reception thresholds in noise had a lower odds to being unaware (OR = 0.85 and 0.91, respectively) than persons with fewer diseases and better thresholds, respectively. A higher odds to reporting false complaints was predicted by more depressive symptoms (OR = 1.06), more chronic diseases (OR = 1.21), and a larger social network (OR = 1.02). Persons with higher self-efficacy (to complete behavior) had a lower odds (OR = 0.86), whereas persons with higher self-esteem had a higher odds to report false complaints (OR = 1.21). The explained variance

  20. Establishment of regression dependences. Linear and nonlinear dependences

    International Nuclear Information System (INIS)

    Onishchenko, A.M.

    1994-01-01

    The determination of linear and of 19 types of nonlinear regression dependences is discussed in detail. It is taken into account that total dispersions are the sum of measurement dispersions and of the dispersions of parameter variation themselves. Approaches to determining all of these dispersions are described. It is shown that the least-squares fit gives inconsistent estimates for industrial objects and processes. Correction methods that take into account comparable measurement errors in both variables make it possible to obtain consistent estimates of the regression equation parameters. The conditions under which applying the correction technique is worthwhile are given. A technique for determining nonlinear regression dependences that takes into account the form of the dependence and comparable errors in both variables is described. 6 refs., 1 tab

  1. Comparison of multinomial logistic regression and logistic regression: which is more efficient in allocating land use?

    Science.gov (United States)

    Lin, Yingzhi; Deng, Xiangzheng; Li, Xing; Ma, Enjun

    2014-12-01

    Spatially explicit simulation of land use change is the basis for estimating the effects of land use and cover change on energy fluxes, ecology and the environment. At the pixel level, logistic regression is one of the most common approaches used in spatially explicit land use allocation models to determine the relationship between land use and its causal factors in driving land use change, and thereby to evaluate land use suitability. However, these models have a drawback in that they do not determine/allocate land use based on the direct relationship between land use change and its driving factors. Consequently, a multinomial logistic regression method was introduced to address this flaw, and thereby, judge the suitability of a type of land use in any given pixel in a case study area of the Jiangxi Province, China. A comparison of the two regression methods indicated that the proportion of correctly allocated pixels using multinomial logistic regression was 92.98%, which was 8.47% higher than that obtained using logistic regression. Paired t-test results also showed that pixels were more clearly distinguished by multinomial logistic regression than by logistic regression. In conclusion, multinomial logistic regression is a more efficient and accurate method for the spatial allocation of land use changes. The application of this method in future land use change studies may improve the accuracy of predicting the effects of land use and cover change on energy fluxes, ecology, and environment.
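
    A minimal sketch of the comparison on synthetic data, assuming scikit-learn; sklearn's LogisticRegression is multinomial by default, so the one-vs-rest wrapper stands in for separate binary fits. Data and accuracies are illustrative, not the Jiangxi case study.

```python
# Multinomial logistic regression vs. one-vs-rest binary logistic regression.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

X, y = make_classification(n_samples=2000, n_features=6, n_informative=4,
                           n_classes=3, n_clusters_per_class=1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X_tr, y_tr)
mnl = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)  # multinomial by default
print("one-vs-rest accuracy:", round(ovr.score(X_te, y_te), 3))
print("multinomial accuracy:", round(mnl.score(X_te, y_te), 3))
```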

  2. Testing after Worked Example Study Does Not Enhance Delayed Problem-Solving Performance Compared to Restudy

    Science.gov (United States)

    van Gog, Tamara; Kester, Liesbeth; Dirkx, Kim; Hoogerheide, Vincent; Boerboom, Joris; Verkoeijen, Peter P. J. L.

    2015-01-01

    Four experiments investigated whether the testing effect also applies to the acquisition of problem-solving skills from worked examples. Experiment 1 (n = 120) showed no beneficial effects of testing consisting of "isomorphic" problem solving or "example recall" on final test performance, which consisted of isomorphic problem…

  3. Dual Regression

    OpenAIRE

    Spady, Richard; Stouli, Sami

    2012-01-01

    We propose dual regression as an alternative to the quantile regression process for the global estimation of conditional distribution functions under minimal assumptions. Dual regression provides all the interpretational power of the quantile regression process while avoiding the need for repairing the intersecting conditional quantile surfaces that quantile regression often produces in practice. Our approach introduces a mathematical programming characterization of conditional distribution f...

  4. Social problem solving ability predicts mental health among undergraduate students.

    Science.gov (United States)

    Ranjbar, Mansour; Bayani, Ali Asghar; Bayani, Ali

    2013-11-01

    The main objective of this study was to predict students' mental health using social problem-solving ability. In this correlational, descriptive study, 369 students (208 female and 161 male) from Mazandaran University of Medical Science were selected through a stratified random sampling method. In order to collect the data, the Social Problem Solving Inventory-Revised and the General Health Questionnaire were used. Data were analyzed through SPSS-19, Pearson's correlation, t test, and stepwise regression analysis. Data analysis showed a significant relationship between social problem solving ability and mental health (P < 0.01). Social problem solving ability was significantly associated with somatic symptoms, anxiety and insomnia, social dysfunction, and severe depression (P < 0.01). There was thus a significant correlation between social problem solving ability and mental health.

  5. Nonlinear reaction-diffusion equations with delay: some theorems, test problems, exact and numerical solutions

    Science.gov (United States)

    Polyanin, A. D.; Sorokin, V. G.

    2017-12-01

    The paper deals with nonlinear reaction-diffusion equations with one or several delays. We formulate theorems that allow constructing exact solutions for some classes of these equations, which depend on several arbitrary functions. Examples of application of these theorems for obtaining new exact solutions in elementary functions are provided. We state basic principles of construction, selection, and use of test problems for nonlinear partial differential equations with delay. Some test problems which can be suitable for estimating accuracy of approximate analytical and numerical methods of solving reaction-diffusion equations with delay are presented. Some examples of numerical solutions of nonlinear test problems with delay are considered.

  6. Hyperspectral Unmixing with Robust Collaborative Sparse Regression

    Directory of Open Access Journals (Sweden)

    Chang Li

    2016-07-01

    Recently, sparse unmixing (SU) of hyperspectral data has received particular attention for analyzing remote sensing images. However, most SU methods are based on the commonly admitted linear mixing model (LMM), which ignores the possible nonlinear effects (i.e., nonlinearity). In this paper, we propose a new method named robust collaborative sparse regression (RCSR), based on the robust LMM (rLMM), for hyperspectral unmixing. The rLMM takes the nonlinearity into consideration, and the nonlinearity is merely treated as an outlier, which has the underlying sparse property. The RCSR simultaneously takes the collaborative sparse property of the abundance and the sparsely distributed additive property of the outlier into consideration, which can be formed as a robust joint sparse regression problem. The inexact augmented Lagrangian method (IALM) is used to optimize the proposed RCSR. The qualitative and quantitative experiments on synthetic datasets and real hyperspectral images demonstrate that the proposed RCSR is efficient for solving the hyperspectral SU problem compared with the other four state-of-the-art algorithms.

  7. Regression-Based Norms for the Symbol Digit Modalities Test in the Dutch Population: Improving Detection of Cognitive Impairment in Multiple Sclerosis?

    Science.gov (United States)

    Burggraaff, Jessica; Knol, Dirk L; Uitdehaag, Bernard M J

    2017-01-01

    Appropriate and timely screening instruments that sensitively capture the cognitive functioning of multiple sclerosis (MS) patients are the need of the hour. We evaluated newly derived regression-based norms for the Symbol Digit Modalities Test (SDMT) in a Dutch-speaking sample, as an indicator of the cognitive state of MS patients. Regression-based norms for the SDMT were created from a healthy control sample (n = 96) and used to convert MS patients' (n = 157) raw scores to demographically adjusted Z-scores, correcting for the effects of age, age², gender, and education. Conventional and regression-based norms were compared on their impairment-classification rates and related to other neuropsychological measures. The regression analyses revealed that age was the only demographic with a significant influence in our healthy sample. Regression-based norms for the SDMT more readily detected impairment in MS patients than conventional normalization methods (32 patients instead of 15). Patients changing from an SDMT-preserved to -impaired status (n = 17) were also impaired on other cognitive domains (p < 0.05), except for visuospatial memory (p = 0.34). Regression-based norms for the SDMT more readily detect abnormal performance in MS patients than conventional norms, identifying those patients at highest risk for cognitive impairment, which was supported by a worse performance on other neuropsychological measures. © 2017 S. Karger AG, Basel.
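
    A minimal sketch of regression-based norming, assuming numpy/statsmodels and simulated control data; the age-only model mirrors the finding above, but all numbers are illustrative.

```python
# Fit raw scores on demographics in controls, then express a patient's score
# as a Z-score relative to the demographically predicted value.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
age = rng.uniform(20, 70, 96)                    # healthy control sample
raw = 70 - 0.4 * age + rng.normal(scale=6, size=96)

fit = sm.OLS(raw, sm.add_constant(age)).fit()
sd_resid = np.std(fit.resid, ddof=2)             # residual SD of the norm model

def z_score(raw_score, patient_age):
    expected = fit.params[0] + fit.params[1] * patient_age
    return (raw_score - expected) / sd_resid

print(round(z_score(35.0, 55.0), 2))             # a low Z may flag impairment
```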

  8. Regression Benchmarking: An Approach to Quality Assurance in Performance

    OpenAIRE

    Bulej, Lubomír

    2005-01-01

    The paper presents a short summary of our work in the area of regression benchmarking and its application to software development. Specifically, we explain the concept of regression benchmarking, the requirements for employing regression testing in a software project, and methods used for analyzing the vast amounts of data resulting from repeated benchmarking. We present the application of regression benchmarking on a real software project and conclude with a glimpse at the challenges for the fu...

  9. Short-term electricity prices forecasting based on support vector regression and Auto-regressive integrated moving average modeling

    International Nuclear Information System (INIS)

    Che Jinxing; Wang Jianzhou

    2010-01-01

    In this paper, we present the use of different mathematical models to forecast electricity prices in deregulated power markets. A successful prediction tool for electricity prices can help both power producers and consumers plan their bidding strategies. Inspired by the fact that the support vector regression (SVR) model, with the ε-insensitive loss function, tolerates residuals within the boundary of the ε-tube, we propose a hybrid model that combines both SVR and auto-regressive integrated moving average (ARIMA) models to take advantage of the unique strengths of SVR and ARIMA in nonlinear and linear modeling, which we call SVRARIMA. A nonlinear analysis of the time series indicates the suitability of nonlinear modeling, so the SVR is applied to capture the nonlinear patterns. ARIMA models have been successfully applied to the residual regression estimation problem. The experimental results demonstrate that the proposed model outperforms existing neural-network approaches, traditional ARIMA models, and other hybrid models in terms of root mean square error and mean absolute percentage error.
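
    A minimal sketch of an SVR-plus-ARIMA hybrid in the spirit of SVRARIMA, assuming scikit-learn and statsmodels with a simulated price series; the lag structure, ARIMA order, and SVR settings are illustrative and may differ from the paper's exact decomposition.

```python
# SVR learns the nonlinear signal from lagged prices; ARIMA models its
# residuals; the two one-step forecasts are summed.
import numpy as np
from sklearn.svm import SVR
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(3)
price = 40 + 5 * np.sin(np.arange(400.0) / 20) \
        + np.cumsum(rng.normal(scale=0.3, size=400))

lags = np.column_stack([price[2:-1], price[1:-2], price[:-3]])
target = price[3:]
svr = SVR(C=10.0, epsilon=0.1).fit(lags, target)
resid = target - svr.predict(lags)

arima = ARIMA(resid, order=(1, 0, 1)).fit()
next_feats = price[[-1, -2, -3]].reshape(1, -1)     # most recent three prices
hybrid = svr.predict(next_feats)[0] + arima.forecast(1)[0]
print(f"one-step hybrid forecast: {hybrid:.2f}")
```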

  10. Multi-stratified multiple regression tests of the linear/no-threshold theory of radon-induced lung cancer

    International Nuclear Information System (INIS)

    Cohen, B.L.

    1992-01-01

    A plot of lung-cancer rates versus radon exposures in 965 US counties, or in all US states, has a strong negative slope, b, in sharp contrast to the strong positive slope predicted by linear/no-threshold theory. The discrepancy between these slopes exceeds 20 standard deviations (SD). Including smoking frequency in the analysis substantially improves fits to a linear relationship but has little effect on the discrepancy in b, because correlations between smoking frequency and radon levels are quite weak. Including 17 socioeconomic variables (SEV) in multiple regression analysis reduces the discrepancy to 15 SD. Data were divided into segments by stratifying on each SEV in turn, and on geography, and on both simultaneously, giving over 300 data sets to be analyzed individually, but negative slopes predominated. The slope is negative whether one considers only the most urban counties or only the most rural; only the richest or only the poorest; only the richest in the South Atlantic region or only the poorest in that region; and so on for all the strata in between. Since this is an ecological study, the well-known problems with ecological studies were investigated and found not to be applicable here. The "ecological fallacy" was shown not to apply in testing a linear/no-threshold theory, and the vulnerability to confounding is greatly reduced when confounding factors are only weakly correlated with radon levels, as is generally the case here. All confounding factors known to correlate with radon and with lung cancer were investigated quantitatively and found to have little effect on the discrepancy

  11. Detection of epistatic effects with logic regression and a classical linear regression model.

    Science.gov (United States)

    Malina, Magdalena; Ickstadt, Katja; Schwender, Holger; Posch, Martin; Bogdan, Małgorzata

    2014-02-01

    To locate multiple interacting quantitative trait loci (QTL) influencing a trait of interest within experimental populations, methods such as Cockerham's model are usually applied. Within this framework, interactions are understood as the part of the joint effect of several genes that cannot be explained as the sum of their additive effects. However, if a change in the phenotype (such as disease) is caused by Boolean combinations of genotypes of several QTL, Cockerham's approach is often not capable of identifying them properly. To detect such interactions more efficiently, we propose a logic regression framework. Even though a larger number of models has to be considered with the logic regression approach (requiring more stringent multiple testing correction), the efficient representation of higher-order logic interactions in logic regression models leads to a significant increase in power to detect such interactions compared to Cockerham's approach. The increase in power is demonstrated analytically for a simple two-way interaction model and illustrated in more complex settings with a simulation study and real data analysis.

  12. Variable and subset selection in PLS regression

    DEFF Research Database (Denmark)

    Høskuldsson, Agnar

    2001-01-01

    The purpose of this paper is to present some useful methods for introductory analysis of variables and subsets in relation to PLS regression. We present here methods that are efficient in finding the appropriate variables or subset to use in the PLS regression. The general conclusion is that variable selection is important for successful analysis of chemometric data. An important aspect of the results presented is that lack of variable selection can spoil the PLS regression, and that cross-validation measures using a test set can show larger variation, when we use different subsets of X, than...
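
    A minimal sketch of one crude subset-selection loop for PLS regression, assuming scikit-learn; backward elimination on cross-validated R² is an assumption here, not necessarily the authors' procedure.

```python
# Drop each variable in turn and keep the subset with the best CV fit.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
X = rng.normal(size=(60, 8))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=60)  # 2 useful columns

def cv_r2(cols):
    return cross_val_score(PLSRegression(n_components=2),
                           X[:, cols], y, cv=5, scoring="r2").mean()

cols = list(range(X.shape[1]))
while len(cols) > 2:
    scores = [(cv_r2([c for c in cols if c != d]), d) for d in cols]
    best, drop = max(scores)
    if best < cv_r2(cols):          # stop when dropping anything hurts
        break
    cols.remove(drop)
print("retained columns:", cols)
```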

  13. Support Vector Regression-Based Adaptive Divided Difference Filter for Nonlinear State Estimation Problems

    Directory of Open Access Journals (Sweden)

    Hongjian Wang

    2014-01-01

    We present a support vector regression-based adaptive divided difference filter (SVRADDF) algorithm for improving the low state-estimation accuracy of nonlinear systems, which are typically affected by large initial estimation errors and imprecise prior knowledge of process and measurement noises. The derivative-free SVRADDF algorithm is significantly simpler to compute than other methods and is implemented using only functional evaluations. The SVRADDF algorithm involves the use of the theoretical and actual covariance of the innovation sequence. Support vector regression (SVR) is employed to generate the adaptive factor to tune the noise covariance at each sampling instant when the measurement update step executes, which improves the algorithm's robustness. The performance of the proposed algorithm is evaluated by estimating states for (i) an underwater nonmaneuvering target bearing-only tracking system and (ii) maneuvering target bearing-only tracking in an air-traffic control system. The simulation results show that the proposed SVRADDF algorithm exhibits better performance when compared with a traditional DDF algorithm.

  14. Distributed Monitoring of the R2 Statistic for Linear Regression

    Data.gov (United States)

    National Aeronautics and Space Administration — The problem of monitoring a multivariate linear regression model is relevant in studying the evolving relationship between a set of input variables (features) and...

  15. Bradley’s Regress, Russell’s States of Affairs, and Some General Remarks on the Problem

    Directory of Open Access Journals (Sweden)

    Holger Leerhoff

    2008-12-01

    In this paper, I will give a presentation of Bradley's two main arguments against the reality of relations. Whereas one of his arguments is highly specific to Bradley's metaphysical background, his famous regress argument seems to pose a serious threat not only for ontological pluralism, but especially for states of affairs as an ontological category. Amongst the proponents of states-of-affairs ontologies two groups can be distinguished: One group holds states of affairs to be complexes consisting of their particular and universal constituents alone, the other holds that there has to be a "unifying relation" of some sort to establish the unity of a given state of affairs. Bradley's regress is often conceived to be a compelling argument against the first and for the latter. I will argue that the latter approaches have no real advantage over the simpler theories—neither in the light of Bradley's regress nor in other respects.

  16. Piecewise linear regression splines with hyperbolic covariates

    International Nuclear Information System (INIS)

    Cologne, John B.; Sposto, Richard

    1992-09-01

    Consider the problem of fitting a curve to data that exhibit a multiphase linear response with smooth transitions between phases. We propose substituting hyperbolas as covariates in piecewise linear regression splines to obtain curves that are smoothly joined. The method provides an intuitive and easy way to extend the two-phase linear hyperbolic response model of Griffiths and Miller, and of Watts and Bacon, to accommodate more than two linear segments. The resulting regression spline with hyperbolic covariates may be fit by nonlinear regression methods to estimate the degree of curvature between adjoining linear segments. The added complexity of fitting nonlinear, as opposed to linear, regression models is not great. The extra effort is particularly worthwhile when investigators are unwilling to assume that the slope of the response changes abruptly at the join points. We can also estimate the join points (the values of the abscissas where the linear segments would intersect if extrapolated) if their number and approximate locations may be presumed known. An example using data on changing age at menarche in a cohort of Japanese women illustrates the use of the method for exploratory data analysis. (author)
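
    A minimal sketch of a hyperbolic (smoothed-hinge) covariate fitted by nonlinear least squares, assuming scipy; the specific hyperbola and starting values are illustrative.

```python
# Two-phase fit with a hyperbolic covariate; gamma controls the curvature
# at the join point (the hyperbola approaches max(0, x - knot) as gamma -> 0).
import numpy as np
from scipy.optimize import curve_fit

def smooth_hinge(x, knot, gamma):
    return 0.5 * ((x - knot) + np.sqrt((x - knot) ** 2 + gamma ** 2))

def model(x, b0, b1, b2, knot, gamma):
    return b0 + b1 * x + b2 * smooth_hinge(x, knot, gamma)

rng = np.random.default_rng(5)
x = np.linspace(0, 10, 120)
y = 1 + 0.5 * x + 1.5 * np.clip(x - 6, 0, None) + rng.normal(scale=0.3, size=120)

popt, _ = curve_fit(model, x, y, p0=[1, 0.5, 1.0, 5.0, 1.0])
print("estimated join point:", round(popt[3], 2))
```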

  17. Stochastic development regression using method of moments

    DEFF Research Database (Denmark)

    Kühnel, Line; Sommer, Stefan Horst

    2017-01-01

    This paper considers the estimation problem arising when inferring parameters in the stochastic development regression model for manifold-valued non-linear data. Stochastic development regression captures the relation between manifold-valued response and Euclidean covariate variables using the stochastic development construction. It is thereby able to incorporate several covariate variables and random effects. The model is intrinsically defined using the connection of the manifold, and the use of stochastic development avoids linearizing the geometry. We propose to infer parameters using the Method of Moments procedure that matches known constraints on moments of the observations conditional on the latent variables. The performance of the model is investigated in a simulation example using data on finite-dimensional landmark manifolds.

  18. Optimal regression for reasoning about knowledge and actions

    NARCIS (Netherlands)

    Ditmarsch, van H.; Herzig, Andreas; Lima, de Tiago

    2007-01-01

    We show how in the propositional case both Reiter’s and Scherl & Levesque’s solutions to the frame problem can be modelled in dynamic epistemic logic (DEL), and provide an optimal regression algorithm for the latter. Our method is as follows: we extend Reiter’s framework by integrating observation

  19. Penalized linear regression for discrete ill-posed problems: A hybrid least-squares and mean-squared error approach

    KAUST Repository

    Suliman, Mohamed Abdalla Elhag

    2016-12-19

    This paper proposes a new approach to find the regularization parameter for linear least-squares discrete ill-posed problems. In the proposed approach, an artificial perturbation matrix with a bounded norm is forced into the discrete ill-posed model matrix. This perturbation is introduced to enhance the singular-value (SV) structure of the matrix and hence to provide a better solution. The proposed approach is derived to select the regularization parameter in a way that minimizes the mean-squared error (MSE) of the estimator. Numerical results demonstrate that the proposed approach outperforms a set of benchmark methods in most cases when applied to different scenarios of discrete ill-posed problems. In addition, the proposed approach enjoys the lowest run-time and offers the highest level of robustness amongst all the tested methods.

  20. Regression Models For Multivariate Count Data.

    Science.gov (United States)

    Zhang, Yiwen; Zhou, Hua; Zhou, Jin; Sun, Wei

    2017-01-01

    Data with multivariate count responses frequently occur in modern applications. The commonly used multinomial-logit model is limiting due to its restrictive mean-variance structure. For instance, analyzing count data from the recent RNA-seq technology by the multinomial-logit model leads to serious errors in hypothesis testing. The ubiquity of over-dispersion and complicated correlation structures among multivariate counts calls for more flexible regression models. In this article, we study some generalized linear models that incorporate various correlation structures among the counts. Current literature lacks a treatment of these models, partly due to the fact that they do not belong to the natural exponential family. We study the estimation, testing, and variable selection for these models in a unifying framework. The regression models are compared on both synthetic and real RNA-seq data.

  1. Social problem solving ability predicts mental health among undergraduate students

    Directory of Open Access Journals (Sweden)

    Mansour Ranjbar

    2013-01-01

    Methods: In this correlational, descriptive study, 369 students (208 female and 161 male) from Mazandaran University of Medical Science were selected through a stratified random sampling method. In order to collect the data, the Social Problem Solving Inventory-Revised and the General Health Questionnaire were used. Data were analyzed through SPSS-19, Pearson's correlation, t test, and stepwise regression analysis. Results: Data analysis showed a significant relationship between social problem solving ability and mental health (P < 0.01). Social problem solving ability was significantly associated with somatic symptoms, anxiety and insomnia, social dysfunction, and severe depression (P < 0.01). Conclusions: The results of our study demonstrated that there is a significant correlation between social problem solving ability and mental health.

  2. The Effect of Multicollinearity and the Violation of the Assumption of Normality on the Testing of Hypotheses in Regression Analysis.

    Science.gov (United States)

    Vasu, Ellen S.; Elmore, Patricia B.

    The effects of the violation of the assumption of normality, coupled with the condition of multicollinearity, upon the outcome of testing the hypothesis Beta equals zero in the two-predictor regression equation are investigated. A Monte Carlo approach was utilized in which three different distributions were sampled for two sample sizes over…

  3. Special problems in making geotechnical measurements in salt

    International Nuclear Information System (INIS)

    Verslvis, S.; Lindner, E.N.

    1983-01-01

    The transfer of experience, theory, and instrumentation suitable for hard-rock media has posed numerous problems, which this paper addresses. Foremost of these pertains to the time-dependent (creep) behavior of salt. The theoretical mechanism is elusive; creep laws formulated to predict this behavior represent the state of the art in regression analysis. Furthermore, long-term experiments (1 year) that would be necessary to determine the creep mechanism(s) are enormously expensive and tie up test equipment. Second, tests for determining in situ stress are based on the theory of elasticity; however, anelastic (non-recoverable) strains contribute a significant portion of the material behavior, precluding back-calculation of in situ stresses. Another problem pertains to the rate-dependent behavior of salt. Loading and temperature gradients experienced in the laboratory are more severe than would be experienced in a repository. Significant differences in material behavior can be expected, along with special problems with instrumentation

  4. Regression: A Bibliography.

    Science.gov (United States)

    Pedrini, D. T.; Pedrini, Bonnie C.

    Regression, another mechanism studied by Sigmund Freud, has had much research, e.g., hypnotic regression, frustration regression, schizophrenic regression, and infra-human-animal regression (often directly related to fixation). Many investigators worked with hypnotic age regression, which has a long history, going back to Russian reflexologists.…

  5. Parameter Selection Method for Support Vector Regression Based on Adaptive Fusion of the Mixed Kernel Function

    Directory of Open Access Journals (Sweden)

    Hailun Wang

    2017-01-01

    The support vector regression algorithm is widely used in fault diagnosis of rolling bearings. A new model parameter selection method for support vector regression, based on adaptive fusion of a mixed kernel function, is proposed in this paper. We choose the mixed kernel function as the kernel function of the support vector regression. The fusion coefficients of the mixed kernel function, the kernel function parameters, and the regression parameters are combined together as the state vector, so that the model selection problem is transformed into a nonlinear system state estimation problem. We use a 5th-degree cubature Kalman filter to estimate the parameters. In this way, we realize the adaptive selection of the mixed kernel function's weighting coefficients, the kernel parameters, and the regression parameters. Compared with a single kernel function, unscented Kalman filter (UKF) support vector regression algorithms, and genetic algorithms, the decision regression function obtained by the proposed method has better generalization ability and higher prediction accuracy.

  6. On Weighted Support Vector Regression

    DEFF Research Database (Denmark)

    Han, Xixuan; Clemmensen, Line Katrine Harder

    2014-01-01

    We propose a new type of weighted support vector regression (SVR), motivated by modeling local dependencies in time and space in prediction of house prices. The classic weights of the weighted SVR are added to the slack variables in the objective function (OF-weights). This procedure directly shrinks the coefficient of each observation in the estimated functions; thus, it is widely used for minimizing influence of outliers. We propose to additionally add weights to the slack variables in the constraints (CF-weights) and call the combination of weights the doubly weighted SVR. We illustrate the differences and similarities of the two types of weights by demonstrating the connection between the Least Absolute Shrinkage and Selection Operator (LASSO) and the SVR. We show that an SVR problem can be transformed to a LASSO problem plus a linear constraint and a box constraint. We demonstrate...
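
    A minimal sketch of OF-weights only, assuming scikit-learn, whose SVR accepts per-observation weights on the slack penalties via sample_weight; the CF-weights of the doubly weighted SVR would need a custom solver and are not shown.

```python
# Down-weighting planted outliers in an SVR fit via sample_weight.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(6)
X = rng.uniform(0, 10, size=(100, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=100)
y[::10] += 3.0                      # plant some outliers

w = np.ones(100)
w[::10] = 0.05                      # shrink the outliers' slack penalties
plain = SVR().fit(X, y)
weighted = SVR().fit(X, y, sample_weight=w)
x_test = np.array([[2.0]])
print(plain.predict(x_test), weighted.predict(x_test))
```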

  7. Tax Evasion, Information Reporting, and the Regressive Bias Hypothesis

    DEFF Research Database (Denmark)

    Boserup, Simon Halphen; Pinje, Jori Veng

    A robust prediction from the tax evasion literature is that optimal auditing induces a regressive bias in effective tax rates compared to statutory rates. If correct, this will have important distributional consequences. Nevertheless, the regressive bias hypothesis has never been tested empirically...

  8. Steganalysis using logistic regression

    Science.gov (United States)

    Lubenko, Ivans; Ker, Andrew D.

    2011-02-01

    We advocate Logistic Regression (LR) as an alternative to the Support Vector Machine (SVM) classifiers commonly used in steganalysis. LR offers more information than traditional SVM methods - it estimates class probabilities as well as providing a simple classification - and can be adapted more easily and efficiently for multiclass problems. Like SVM, LR can be kernelised for nonlinear classification, and it shows comparable classification accuracy to SVM methods. This work is a case study, comparing accuracy and speed of SVM and LR classifiers in detection of LSB Matching and other related spatial-domain image steganography, through the state-of-the-art 686-dimensional SPAM feature set, in three image sets.

  9. Stepwise and hierarchical multiple regression in organizational psychology: applications, problems, and solutions

    Directory of Open Access Journals (Sweden)

    Gardênia Abbad

    2002-01-01

    This article discusses applications of stepwise and hierarchical multiple regression analyses to research in organizational psychology. Strategies for identifying Type I and II errors, and solutions to potential problems that may arise from such errors, are proposed. In addition, phenomena such as suppression, complementarity, and redundancy are reviewed. The article presents examples of research where these phenomena occurred, and the manner in which they were explained by researchers. Some applications of multiple regression analyses to studies involving between-variable interactions are presented, along with tests used to analyze the presence of linearity among variables. Finally, some suggestions are provided for dealing with limitations implicit in multiple regression analyses (stepwise and hierarchical).

  10. Working memory dysfunctions predict social problem solving skills in schizophrenia.

    Science.gov (United States)

    Huang, Jia; Tan, Shu-ping; Walsh, Sarah C; Spriggens, Lauren K; Neumann, David L; Shum, David H K; Chan, Raymond C K

    2014-12-15

    The current study aimed to examine the contribution of neurocognition and social cognition to components of social problem solving. Sixty-seven inpatients with schizophrenia and 31 healthy controls were administered batteries of neurocognitive tests, emotion perception tests, and the Chinese Assessment of Interpersonal Problem Solving Skills (CAIPSS). MANOVAs were conducted to investigate the domains in which patients with schizophrenia showed impairments. Correlations were used to determine which impaired domains were associated with social problem solving, and multiple regression analyses were conducted to compare the relative contribution of neurocognitive and social cognitive functioning to components of social problem solving. Compared with healthy controls, patients with schizophrenia performed significantly worse in sustained attention, working memory, negative emotion identification, intention identification, and all components of the CAIPSS. Specifically, sustained attention, working memory, and negative emotion identification were found to correlate with social problem solving, and 1-back accuracy significantly predicted the poor performance in social problem solving. Among the dysfunctions in schizophrenia, working memory contributed most to deficits in social problem solving in patients with schizophrenia. This finding provides support for targeting working memory in the development of future social problem solving rehabilitation interventions. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.

  11. The efficiency of modified jackknife and ridge type regression estimators: a comparison

    Directory of Open Access Journals (Sweden)

    Sharad Damodar Gore

    2008-09-01

    A common problem in multiple regression models is multicollinearity, which produces undesirable effects on the least squares estimator. To circumvent this problem, two well-known estimation procedures are often suggested in the literature: Generalized Ridge Regression (GRR) estimation, suggested by Hoerl and Kennard, and Jackknifed Ridge Regression (JRR) estimation, suggested by Singh et al. The GRR estimation leads to a reduction in the sampling variance, whereas JRR leads to a reduction in the bias. In this paper, we propose a new estimator, namely the Modified Jackknife Ridge Regression (MJR) estimator. It is based on a criterion that combines the ideas underlying both the GRR and JRR estimators. We have investigated standard properties of this new estimator. From a simulation study, we find that the new estimator often outperforms the LASSO, and that it is superior to both the GRR and JRR estimators under the mean squared error criterion. The conditions under which the MJR estimator is better than the other two competing estimators have been investigated.
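
    A minimal sketch of the two baseline estimators, assuming numpy and the common forms β_R = (X'X + kI)⁻¹X'y and β_JR = (I + k(X'X + kI)⁻¹)β_R; the MJR estimator itself is not reproduced here.

```python
# Ridge and jackknifed-ridge estimators on deliberately collinear data.
import numpy as np

rng = np.random.default_rng(7)
n, p, k = 50, 4, 1.0
X = rng.normal(size=(n, p))
X[:, 3] = X[:, 2] + rng.normal(scale=0.01, size=n)   # induce multicollinearity
beta = np.array([1.0, -1.0, 2.0, 0.0])
y = X @ beta + rng.normal(scale=0.5, size=n)

A_inv = np.linalg.inv(X.T @ X + k * np.eye(p))
beta_ridge = A_inv @ X.T @ y
beta_jack = (np.eye(p) + k * A_inv) @ beta_ridge     # bias-reduced version
print("ridge     :", beta_ridge.round(2))
print("jackknifed:", beta_jack.round(2))
```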

  12. Learning a Nonnegative Sparse Graph for Linear Regression.

    Science.gov (United States)

    Fang, Xiaozhao; Xu, Yong; Li, Xuelong; Lai, Zhihui; Wong, Wai Keung

    2015-09-01

    Previous graph-based semisupervised learning (G-SSL) methods have the following drawbacks: 1) they usually predefine the graph structure and then use it to perform label prediction, which cannot guarantee an overall optimum and 2) they only focus on the label prediction or the graph structure construction but are not competent in handling new samples. To this end, a novel nonnegative sparse graph (NNSG) learning method was first proposed. Then, both the label prediction and projection learning were integrated into linear regression. Finally, the linear regression and graph structure learning were unified within the same framework to overcome these two drawbacks. Therefore, a novel method, named learning a NNSG for linear regression, was presented, in which the linear regression and graph learning were simultaneously performed to guarantee an overall optimum. In the learning process, the label information can be accurately propagated via the graph structure so that the linear regression can learn a discriminative projection to better fit sample labels and accurately classify new samples. An effective algorithm was designed to solve the corresponding optimization problem with fast convergence. Furthermore, NNSG provides a unified perspective for a number of graph-based learning methods and linear regression methods. The experimental results showed that NNSG can obtain very high classification accuracy and greatly outperforms conventional G-SSL methods, especially some conventional graph construction methods.

  13. Usability Testing: Too Early? Too Much Talking? Too Many Problems?

    DEFF Research Database (Denmark)

    Hertzum, Morten

    2016-01-01

    Usability testing has evolved in response to a search for tests that are cheap, early, easy, and fast. In addition, it accords with a situational definition of usability, such as the one propounded by ISO. By approaching usability from an organizational perspective, this author argues that usability should (also) be evaluated late, that usability professionals should be wary of thinking aloud, and that they should focus more on effects achievement than problem detection.

  14. Application of single-step genomic best linear unbiased prediction with a multiple-lactation random regression test-day model for Japanese Holsteins.

    Science.gov (United States)

    Baba, Toshimi; Gotoh, Yusaku; Yamaguchi, Satoshi; Nakagawa, Satoshi; Abe, Hayato; Masuda, Yutaka; Kawahara, Takayoshi

    2017-08-01

    This study aimed to evaluate the validation reliability of single-step genomic best linear unbiased prediction (ssGBLUP) with a multiple-lactation random regression test-day model and to investigate the effect of adding genotyped cows on that reliability. Two data sets of test-day records from the first three lactations were used: full data from February 1975 to December 2015 (60 850 534 records from 2 853 810 cows) and reduced data cut off in 2011 (53 091 066 records from 2 502 307 cows). We used marker genotypes of 4480 bulls and 608 cows. Genomic enhanced breeding values (GEBV) of 305-day milk yield in all lactations were estimated for at least 535 young bulls using two marker data sets: bull genotypes only, and both bull and cow genotypes. The realized reliability (R²) from linear regression analysis was used as an indicator of validation reliability. Using only genotyped bulls, R² ranged from 0.41 to 0.46 and was always higher than that of parent averages. Very similar R² values were observed when genotyped cows were added. An application of ssGBLUP to a multiple-lactation random regression model is feasible, and adding a limited number of genotyped cows has no significant effect on the reliability of GEBV for genotyped bulls. © 2016 Japanese Society of Animal Science.

  15. Geographically weighted regression and multicollinearity: dispelling the myth

    Science.gov (United States)

    Fotheringham, A. Stewart; Oshan, Taylor M.

    2016-10-01

    Geographically weighted regression (GWR) extends the familiar regression framework by estimating a set of parameters for any number of locations within a study area, rather than producing a single parameter estimate for each relationship specified in the model. Recent literature has suggested that GWR is highly susceptible to the effects of multicollinearity between explanatory variables and has proposed a series of local measures of multicollinearity as an indicator of potential problems. In this paper, we employ a controlled simulation to demonstrate that GWR is in fact very robust to the effects of multicollinearity. Consequently, the contention that GWR is highly susceptible to multicollinearity issues needs rethinking.

  16. Changes in persistence, spurious regressions and the Fisher hypothesis

    DEFF Research Database (Denmark)

    Kruse, Robinson; Ventosa-Santaulària, Daniel; Noriega, Antonio E.

    Declining inflation persistence has been documented in numerous studies. When such series are analyzed in a regression framework in conjunction with other persistent time series, spurious regressions are likely to occur. We propose to use the coefficient of determination R2 as a test statistic to...

  17. The use of regression analysis in determining reference intervals for low hematocrit and thrombocyte count in multiple electrode aggregometry and platelet function analyzer 100 testing of platelet function.

    Science.gov (United States)

    Kuiper, Gerhardus J A J M; Houben, Rik; Wetzels, Rick J H; Verhezen, Paul W M; Oerle, Rene van; Ten Cate, Hugo; Henskens, Yvonne M C; Lancé, Marcus D

    2017-11-01

    Low platelet counts and hematocrit levels hinder whole blood point-of-care testing of platelet function. Thus far, no reference ranges for MEA (multiple electrode aggregometry) and PFA-100 (platelet function analyzer 100) devices exist for low ranges. Through dilution methods of volunteer whole blood, platelet function at low ranges of platelet count and hematocrit levels was assessed on MEA for four agonists and for PFA-100 in two cartridges. Using (multiple) regression analysis, 95% reference intervals were computed for these low ranges. Low platelet counts affected MEA in a positive correlation (all agonists showed r² ≥ 0.75) and PFA-100 in an inverse correlation (closure times were prolonged with lower platelet counts). Lowered hematocrit did not affect MEA testing, except for arachidonic acid activation (ASPI), which showed a weak positive correlation (r² = 0.14). Closure time on PFA-100 testing was inversely correlated with hematocrit for both cartridges. Regression analysis revealed different 95% reference intervals in comparison with originally established intervals for both MEA and PFA-100 in low platelet or hematocrit conditions. Multiple regression analysis of ASPI and both tests on the PFA-100 for combined low platelet and hematocrit conditions revealed that only PFA-100 testing should be adjusted for both thrombocytopenia and anemia. 95% reference intervals were calculated using multiple regression analysis. However, coefficients of determination of PFA-100 were poor, and some variance remained unexplained. Thus, in this pilot study using (multiple) regression analysis, we could establish reference intervals of platelet function in anemia and thrombocytopenia conditions on PFA-100 and in thrombocytopenia conditions on MEA.
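
    A minimal sketch of a regression-derived 95% reference interval, assuming numpy/statsmodels and simulated data; the prediction ± 1.96 residual-SD construction is a simplification of the paper's (multiple) regression intervals.

```python
# Regress the platelet-function readout on platelet count, then take
# prediction +/- 1.96 residual SD as the reference interval at a given count.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(10)
platelets = rng.uniform(20, 150, 80)                 # x10^9/L, low range
readout = 5 + 0.6 * platelets + rng.normal(scale=8, size=80)

fit = sm.OLS(readout, sm.add_constant(platelets)).fit()
s = np.std(fit.resid, ddof=2)
count = 50.0
mid = fit.params[0] + fit.params[1] * count
print(f"95% interval at {count:.0f}: {mid - 1.96*s:.1f} to {mid + 1.96*s:.1f}")
```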

  18. A generalized right truncated bivariate Poisson regression model with applications to health data.

    Science.gov (United States)

    Islam, M Ataharul; Chowdhury, Rafiqul I

    2017-01-01

    A generalized right-truncated bivariate Poisson regression model is proposed in this paper. Estimation and tests for goodness of fit and for over- or under-dispersion are illustrated for both untruncated and right-truncated bivariate Poisson regression models using a marginal-conditional approach. Estimation and test procedures are illustrated for bivariate Poisson regression models with applications to Health and Retirement Study data on the number of health conditions and the number of health care services utilized. The proposed test statistics are easy to compute, and it is evident from the results that the models fit the data very well. A comparison between the right-truncated and untruncated bivariate Poisson regression models using the test for nonnested models clearly shows that the truncated model performs significantly better than the untruncated model.

  19. [The effect of prison crowding on prisoners' violence in Japan: testing with cointegration regressions and error correction models].

    Science.gov (United States)

    Yuma, Yoshikazu

    2010-08-01

    This research examined the effect of prison population densities (PPD) on inmate-inmate prison violence rates (PVR) in Japan using one-year-interval time-series data (1972-2006). Cointegration regressions revealed a long-run equilibrium relationship between PPD and PVR. PPD had a significant and increasing effect on PVR in the long-term. Error correction models showed that in the short-term, the effect of PPD was significant and positive on PVR, even after controlling for the effects of the proportions of males, age younger than 30 years, less than one-year incarceration, and prisoner/staff ratio. The results were discussed in regard to (a) differences between Japanese prisons and prisons in the United States, and (b) methodological problems found in previous research.
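
    A minimal Engle-Granger-style sketch of this two-step analysis (Python with statsmodels; the series below are synthetic stand-ins for the PPD and PVR data, and all coefficients are illustrative assumptions):

      import numpy as np
      import statsmodels.api as sm
      from statsmodels.tsa.stattools import coint

      rng = np.random.default_rng(0)
      n = 35                                              # 1972-2006, one value per year
      ppd = 100 + np.cumsum(rng.normal(0, 1.0, n))        # prison population density (synthetic)
      pvr = 0.05 * ppd + rng.normal(0, 0.5, n)            # violence rate sharing the stochastic trend

      # Step 1: test for a long-run equilibrium (cointegrating) relationship
      t_stat, p_value, _ = coint(pvr, ppd)

      # Step 2: error correction model -- short-run dynamics plus the lagged equilibrium error
      resid = sm.OLS(pvr, sm.add_constant(ppd)).fit().resid
      ecm_X = sm.add_constant(np.column_stack([np.diff(ppd), resid[:-1]]))
      ecm = sm.OLS(np.diff(pvr), ecm_X).fit()
      print(p_value, ecm.params)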

  20. Sparse Inverse Gaussian Process Regression with Application to Climate Network Discovery

    Data.gov (United States)

    National Aeronautics and Space Administration — Regression problems on massive data sets are ubiquitous in many application domains including the Internet, earth and space sciences, and finances. Gaussian Process...

  1. Wavelet regression model in forecasting crude oil price

    Science.gov (United States)

    Hamid, Mohd Helmie; Shabri, Ani

    2017-05-01

    This study presents the performance of the wavelet multiple linear regression (WMLR) technique in daily crude oil forecasting. The WMLR model was developed by integrating the discrete wavelet transform (DWT) and the multiple linear regression (MLR) model. The original time series was decomposed into sub-series at different scales by wavelet theory. Correlation analysis was conducted to assist in the selection of optimal decomposed components as inputs for the WMLR model. The daily WTI crude oil price series was used to test the prediction capability of the proposed model. The forecasting performance of the WMLR model was also compared with regular multiple linear regression (MLR), the autoregressive integrated moving average (ARIMA) model and generalized autoregressive conditional heteroscedasticity (GARCH), using root mean square error (RMSE) and mean absolute error (MAE). Based on the experimental results, the WMLR model performs better than the other forecasting techniques tested in this study.
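
    A rough sketch of the WMLR idea, assuming the PyWavelets and scikit-learn packages; the price series is synthetic, and the wavelet, decomposition level, and one-step-ahead design are illustrative choices rather than the authors' exact setup:

      import numpy as np
      import pywt
      from sklearn.linear_model import LinearRegression

      rng = np.random.default_rng(1)
      price = 60 + np.cumsum(rng.normal(0, 1.0, 512))     # synthetic stand-in for daily WTI prices

      # stationary wavelet transform keeps every sub-series at full length
      coeffs = pywt.swt(price, 'db4', level=3)            # list of (approximation, detail) pairs
      subseries = [c for pair in coeffs for c in pair]

      # predict tomorrow's price from today's decomposed components
      X, y = np.column_stack(subseries)[:-1], price[1:]
      wmlr = LinearRegression().fit(X, y)
      rmse = np.sqrt(np.mean((wmlr.predict(X) - y) ** 2))
      print(rmse)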

  2. Sparse Reduced-Rank Regression for Simultaneous Dimension Reduction and Variable Selection

    KAUST Repository

    Chen, Lisha

    2012-12-01

    The reduced-rank regression is an effective method in predicting multiple response variables from the same set of predictor variables. It reduces the number of model parameters and takes advantage of interrelations between the response variables and hence improves predictive accuracy. We propose to select relevant variables for reduced-rank regression by using a sparsity-inducing penalty. We apply a group-lasso type penalty that treats each row of the matrix of the regression coefficients as a group and show that this penalty satisfies certain desirable invariance properties. We develop two numerical algorithms to solve the penalized regression problem and establish the asymptotic consistency of the proposed method. In particular, the manifold structure of the reduced-rank regression coefficient matrix is considered and studied in our theoretical analysis. In our simulation study and real data analysis, the new method is compared with several existing variable selection methods for multivariate regression and exhibits competitive performance in prediction and variable selection. © 2012 American Statistical Association.
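
    The row-wise group-lasso penalty can be sketched with a proximal gradient loop; this toy version fits an ordinary (full-rank) multivariate regression with row sparsity and omits the paper's rank constraint and its specific algorithms:

      import numpy as np

      def row_soft_threshold(B, t):
          # shrink each row of B toward zero as a group; rows that hit zero drop the predictor
          norms = np.linalg.norm(B, axis=1, keepdims=True)
          return np.maximum(0.0, 1.0 - t / np.maximum(norms, 1e-12)) * B

      def row_sparse_regression(X, Y, lam, n_iter=500):
          # proximal gradient descent on 0.5*||Y - XB||_F^2 + lam * sum_j ||B[j, :]||_2
          step = 1.0 / np.linalg.norm(X, 2) ** 2          # 1 / Lipschitz constant of the gradient
          B = np.zeros((X.shape[1], Y.shape[1]))
          for _ in range(n_iter):
              B = row_soft_threshold(B - step * X.T @ (X @ B - Y), lam * step)
          return B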

  3. DYNAMIC PROGRAMMING APPROACH TO TESTING RESOURCE ALLOCATION PROBLEM FOR MODULAR SOFTWARE

    Directory of Open Access Journals (Sweden)

    P.K. Kapur

    2003-02-01

    Full Text Available The testing phase of software begins with module testing. During this period modules are tested independently to remove the maximum possible number of faults within a specified time limit or testing resource budget. This gives rise to some interesting optimization problems, which are discussed in this paper. Two optimization models are proposed for the optimal allocation of testing resources among the modules of a software system. In the first model, we maximize the total fault removal subject to a budgetary constraint. In the second model, an additional constraint representing an aspiration level for fault removal in each module of the software is added. These models are solved using a dynamic programming technique. The methods have been illustrated through numerical examples.
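
    The first model has a textbook dynamic-programming structure; a sketch assuming an exponential fault-removal curve a(1 - e^{-bw}) per module, which may differ from the paper's exact software-reliability formulation:

      import math

      def removed(a, b, w):
          # assumed fault-removal curve for a module given w units of testing resource
          return a * (1.0 - math.exp(-b * w))

      def allocate(modules, budget):
          # modules: list of (a_i, b_i); budget: integer number of resource units
          n = len(modules)
          best = [[0.0] * (budget + 1) for _ in range(n + 1)]
          give = [[0] * (budget + 1) for _ in range(n)]
          for i, (a, b) in enumerate(modules, start=1):
              for w in range(budget + 1):
                  for g in range(w + 1):
                      v = best[i - 1][w - g] + removed(a, b, g)
                      if v > best[i][w]:
                          best[i][w], give[i - 1][w] = v, g
          alloc, w = [], budget
          for i in range(n - 1, -1, -1):                  # backtrack the optimal split
              alloc.append(give[i][w]); w -= give[i][w]
          return best[n][budget], alloc[::-1]

      print(allocate([(50, 0.10), (30, 0.25), (80, 0.05)], budget=40))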

  4. Tools to support interpreting multiple regression in the face of multicollinearity.

    Science.gov (United States)

    Kraha, Amanda; Turner, Heather; Nimon, Kim; Zientek, Linda Reichwein; Henson, Robin K

    2012-01-01

    While multicollinearity may increase the difficulty of interpreting multiple regression (MR) results, it should not cause undue problems for the knowledgeable researcher. In the current paper, we argue that rather than using one technique to investigate regression results, researchers should consider multiple indices to understand the contributions that predictors make not only to a regression model, but to each other as well. Some of the techniques to interpret MR effects include, but are not limited to, correlation coefficients, beta weights, structure coefficients, all possible subsets regression, commonality coefficients, dominance weights, and relative importance weights. This article will review a set of techniques to interpret MR effects, identify the elements of the data on which the methods focus, and identify statistical software to support such analyses.
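
    A few of these indices are straightforward to compute directly; a NumPy sketch on synthetic data showing beta weights, structure coefficients, and variance inflation factors (only a subset of the techniques the authors review):

      import numpy as np

      rng = np.random.default_rng(2)
      n = 200
      x1 = rng.normal(size=n)
      x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)            # deliberately correlated with x1
      X = np.column_stack([x1, x2])
      y = x1 + 0.5 * x2 + rng.normal(size=n)

      Z = (X - X.mean(0)) / X.std(0)                      # standardize, so weights are beta weights
      z = (y - y.mean()) / y.std()
      beta = np.linalg.lstsq(Z, z, rcond=None)[0]
      yhat = Z @ beta
      structure = [np.corrcoef(Z[:, j], yhat)[0, 1] for j in range(Z.shape[1])]
      vif = np.diag(np.linalg.inv(np.corrcoef(Z, rowvar=False)))   # VIF per predictor
      print(beta, structure, vif)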

  5. Testing of the PELSHIE shielding code using Benchmark problems and other special shielding models

    International Nuclear Information System (INIS)

    Language, A.E.; Sartori, D.E.; De Beer, G.P.

    1981-08-01

    The PELSHIE shielding code for gamma rays from point and extended sources was written in 1971 and a revised version was published in October 1979. At Pelindaba the program is used extensively due to its flexibility and ease of use for a wide range of problems. The testing of PELSHIE results with the results of a range of models and so-called Benchmark problems is desirable to determine possible weaknesses in PELSHIE. Benchmark problems, experimental data, and shielding models, some of which were resolved by the discrete-ordinates method with the ANISN and DOT 3.5 codes, were used for the efficiency test. The description of the models followed the pattern of a classical shielding problem. After the intercomparison with six different models, the usefulness of the PELSHIE code was quantitatively determined.

  6. Cognitive functioning and social problem-solving skills in schizophrenia.

    Science.gov (United States)

    Hatashita-Wong, Michi; Smith, Thomas E; Silverstein, Steven M; Hull, James W; Willson, Deborah F

    2002-05-01

    This study examined the relationships between symptoms, cognitive functioning, and social skill deficits in schizophrenia. Few studies have incorporated measures of cognitive functioning and symptoms in predictive models for social problem solving. For our study, 44 participants were recruited from consecutive outpatient admissions. Neuropsychological tests were given to assess cognitive function, and social problem solving was assessed using structured vignettes designed to evoke the participant's ability to generate, evaluate, and apply solutions to social problems. A sequential model-fitting method of analysis was used to incorporate social problem solving, symptom presentation, and cognitive impairment into linear regression models. Predictor variables were drawn from demographic, cognitive, and symptom domains. Because this method of analysis was exploratory and not intended as hierarchical modelling, no a priori hypotheses were proposed. Participants with higher scores on tests of cognitive flexibility were better able to generate accurate, appropriate, and relevant responses to the social problem-solving vignettes. The results suggest that cognitive flexibility is a potentially important mediating factor in social problem-solving competence. While other factors are related to social problem-solving skill, this study supports the importance of cognition and understanding how it relates to the complex and multifaceted nature of social functioning.

  7. A Comparative Study of Classification and Regression Algorithms for Modelling Students' Academic Performance

    Science.gov (United States)

    Strecht, Pedro; Cruz, Luís; Soares, Carlos; Mendes-Moreira, João; Abreu, Rui

    2015-01-01

    Predicting the success or failure of a student in a course or program is a problem that has recently been addressed using data mining techniques. In this paper we evaluate some of the most popular classification and regression algorithms on this problem. We address two problems: prediction of approval/failure and prediction of grade. The former is…

  8. Ordinary least square regression, orthogonal regression, geometric mean regression and their applications in aerosol science

    International Nuclear Information System (INIS)

    Leng Ling; Zhang Tianyi; Kleinman, Lawrence; Zhu Wei

    2007-01-01

    Regression analysis, especially the ordinary least squares method, which assumes that errors are confined to the dependent variable, has seen a fair share of applications in aerosol science. The ordinary least squares approach, however, can be problematic, because atmospheric data often do not lend themselves to calling one variable independent and the other dependent; errors often exist in both measurements. In this work, we examine two regression approaches available to accommodate this situation: orthogonal regression and geometric mean regression. Comparisons are made theoretically as well as numerically through an aerosol study examining whether the ratio of organic aerosol to CO changes with age.
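
    The three fitting methods differ only in their slope formula; a compact NumPy comparison on synthetic data with measurement error in both variables:

      import numpy as np

      def slopes(x, y):
          xc, yc = x - x.mean(), y - y.mean()
          sxx, syy, sxy = xc @ xc, yc @ yc, xc @ yc
          ols = sxy / sxx                                  # assumes errors in y only
          orth = ((syy - sxx) + np.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
          gmr = np.sign(sxy) * np.sqrt(syy / sxx)          # geometric mean regression slope
          return ols, orth, gmr

      rng = np.random.default_rng(3)
      truth = rng.normal(0, 2.0, 300)
      x = truth + rng.normal(0, 0.5, 300)                  # both variables measured with error
      y = 1.5 * truth + rng.normal(0, 0.5, 300)
      print(slopes(x, y))                                  # OLS slope is attenuated; the others less so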

  9. Canonical variate regression.

    Science.gov (United States)

    Luo, Chongliang; Liu, Jin; Dey, Dipak K; Chen, Kun

    2016-07-01

    In many fields, multi-view datasets, measuring multiple distinct but interrelated sets of characteristics on the same set of subjects, together with data on certain outcomes or phenotypes, are routinely collected. The objective in such a problem is often two-fold: both to explore the association structures of multiple sets of measurements and to develop a parsimonious model for predicting the future outcomes. We study a unified canonical variate regression framework to tackle the two problems simultaneously. The proposed criterion integrates multiple canonical correlation analysis with predictive modeling, balancing between the association strength of the canonical variates and their joint predictive power on the outcomes. Moreover, the proposed criterion seeks multiple sets of canonical variates simultaneously to enable the examination of their joint effects on the outcomes, and is able to handle multivariate and non-Gaussian outcomes. An efficient algorithm based on variable splitting and Lagrangian multipliers is proposed. Simulation studies show the superior performance of the proposed approach. We demonstrate the effectiveness of the proposed approach in an F2 intercross mice study and an alcohol dependence study. © The Author 2016. Published by Oxford University Press. All rights reserved.

  10. Random Forest as a Predictive Analytics Alternative to Regression in Institutional Research

    Science.gov (United States)

    He, Lingjun; Levine, Richard A.; Fan, Juanjuan; Beemer, Joshua; Stronach, Jeanne

    2018-01-01

    In institutional research, modern data mining approaches are seldom considered to address predictive analytics problems. The goal of this paper is to highlight the advantages of tree-based machine learning algorithms over classic (logistic) regression methods for data-informed decision making in higher education problems, and stress the success of…

  11. Dynamic Optimization for IPS2 Resource Allocation Based on Improved Fuzzy Multiple Linear Regression

    Directory of Open Access Journals (Sweden)

    Maokuan Zheng

    2017-01-01

    Full Text Available The study focuses on resource allocation optimization for industrial product-service systems (IPS2). The development of IPS2 promotes a sustainable economy by introducing cooperative mechanisms beyond commodity transactions. The randomness and fluctuation of service requests from customers lead to volatility in the IPS2 resource utilization ratio. Three basic rules for resource allocation optimization are put forward to improve system operation efficiency and cut unnecessary costs. An approach based on fuzzy multiple linear regression (FMLR) is developed; it integrates the strength and concision of multiple linear regression in data fitting and factor analysis with the merit of fuzzy theory in dealing with uncertain or vague problems, and helps reduce the costs caused by unnecessary resource transfer. An iteration mechanism is introduced in the FMLR algorithm to improve forecasting accuracy. A case study of human resource allocation optimization in the construction machinery industry is implemented to test and verify the proposed model.

  12. Patch testing with markers of fragrance contact allergy. Do clinical tests correspond to patients' self-reported problems?

    Science.gov (United States)

    Johansen, J D; Andersen, T F; Veien, N; Avnstorp, C; Andersen, K E; Menné, T

    1997-03-01

    The aim of the present study was to investigate the relationship between patients' own recognition of skin problems using consumer products and the results of patch testing with markers of fragrance sensitization. Eight hundred and eighty-four consecutive eczema patients, 18-69 years of age, filled in a questionnaire prior to patch testing with the European standard series. The questionnaire contained questions about skin symptoms from the use of scented and unscented products as well as skin reactions from contact with spices, flowers and citrus fruits that could indicate fragrance sensitivity. A highly significant association was found between reporting a history of visible skin symptoms from using scented products and a positive patch test to the fragrance mix, whereas no such relationship could be established to the Peru balsam in univariate or multivariate analysis. Our results suggest that the role of Peru balsam in detecting relevant fragrance contact allergy is limited, while most fragrance mix-positive patients are aware that the use of scented products may cause skin problems.

  13. A Test Set for stiff Initial Value Problem Solvers in the open source software R: Package deTestSet

    NARCIS (Netherlands)

    Mazzia, F.; Cash, J.R.; Soetaert, K.

    2012-01-01

    In this paper we present the R package deTestSet, which includes challenging test problems written as ordinary differential equations (ODEs), differential algebraic equations (DAEs) of index up to 3, and implicit differential equations (IDEs). In addition, it includes 6 new codes to solve initial value problems.

  14. Exploring the Domain Specificity of Creativity in Children: The Relationship between a Non-Verbal Creative Production Test and Creative Problem-Solving Activities

    Directory of Open Access Journals (Sweden)

    Ahmed Mohamed

    2012-12-01

    Full Text Available In this study, we explored whether creativity is domain specific or domain general. The relationships between students' scores on three creative problem-solving activities (math, spatial artistic, and oral linguistic) in the DISCOVER assessment (Discovering Intellectual Strengths and Capabilities While Observing Varied Ethnic Responses) and the TCT-DP (Test of Creative Thinking-Drawing Production), a non-verbal general measure of creativity, were examined. The participants were 135 first and second graders from two schools in the Southwestern United States from linguistically and culturally diverse backgrounds. Pearson correlations, canonical correlations, and multiple regression analyses were calculated to describe the relationship between the TCT-DP and the three DISCOVER creative problem-solving activities. We found that creativity has both domain-specific and domain-general aspects, but the domain-specific component seemed more prominent. One implication of these results is that educators should consider assessing creativity in specific domains to place students in special programs for gifted students rather than relying only on domain-general measures of divergent thinking or creativity.

  15. Online and Batch Supervised Background Estimation via L1 Regression

    KAUST Repository

    Dutta, Aritra

    2017-11-23

    We propose a surprisingly simple model for supervised video background estimation. Our model is based on $\ell_1$ regression. As existing methods for $\ell_1$ regression do not scale to high-resolution videos, we propose several simple and scalable methods for solving the problem, including iteratively reweighted least squares, a homotopy method, and stochastic gradient descent. We show through extensive experiments that our model and methods match or outperform the state-of-the-art online and batch methods in virtually all quantitative and qualitative measures.
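
    Of the solvers mentioned, iteratively reweighted least squares is the simplest to sketch; the following is a generic least-absolute-deviations fit on synthetic data, not the authors' video-specific pipeline:

      import numpy as np

      def lad_irls(X, y, n_iter=50, eps=1e-8):
          # minimize ||X w - y||_1 by reweighted least squares with weights 1/|residual|
          w = np.linalg.lstsq(X, y, rcond=None)[0]
          for _ in range(n_iter):
              d = 1.0 / np.maximum(np.abs(X @ w - y), eps)
              Xw = X * d[:, None]
              w = np.linalg.solve(X.T @ Xw, Xw.T @ y)      # weighted normal equations
          return w

      rng = np.random.default_rng(4)
      X = np.column_stack([np.ones(100), rng.normal(size=100)])
      y = X @ np.array([1.0, 2.0]) + 0.1 * rng.standard_t(df=2, size=100)   # heavy-tailed noise
      print(lad_irls(X, y))                                # close to [1, 2] despite outliers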

  16. Reduced Rank Regression

    DEFF Research Database (Denmark)

    Johansen, Søren

    2008-01-01

    The reduced rank regression model is a multivariate regression model with a coefficient matrix with reduced rank. The reduced rank regression algorithm is an estimation procedure, which estimates the reduced rank regression model. It is related to canonical correlations and involves calculating...

  17. [Parental beliefs and child-rearing attitudes and mental health problems among schoolchildren].

    Science.gov (United States)

    Vitolo, Ymara Lúcia Camargo; Fleitlich-Bilyk, Bacy; Goodman, Robert; Bordin, Isabel Altenfelder Santos

    2005-10-01

    To verify the prevalence and identify the risk factors related to mental health problems among schoolchildren and its possible association with the beliefs and educational attitudes of parents/caretakers. Cross-sectional study with a stratified probabilistic sample (n=454) of first to third-graders from public and private schools in Southeastern Brazil. Standardized instruments were administered to parents/caretakers by trained interviewers, including screening questionnaires for mental health problems among children and parents/caretakers; a questionnaire on beliefs and attitudes; and a questionnaire for socio-economic status. Chi-square tests and logistic regression models were used for statistical analysis. We found 35.2% prevalence of clinical/borderline cases among students. Parents/caretakers that believed in corporal punishment as a child-rearing method used physical aggression towards their children more frequently (64.8%). Logistic regression models showed that the act of hitting the child with a belt was associated to conduct problems and to overall mental health problems among schoolchildren in the presence of other risk factors: child gender (male), parents/caretakers with mental health problems, and adverse socioeconomic conditions. The high prevalence of mental health problems among schoolchildren and its association with child-rearing methods and mental health problems among parents/caretakers indicate the need for psycho-educational interventions aimed to reduce physical abuse and mental health problems in childhood.

  18. Cognitive functioning and everyday problem solving in older adults.

    Science.gov (United States)

    Burton, Catherine L; Strauss, Esther; Hultsch, David F; Hunter, Michael A

    2006-09-01

    The relationship between cognitive functioning and a performance-based measure of everyday problem-solving, the Everyday Problems Test (EPT), thought to index instrumental activities of daily living (IADL), was examined in 291 community-dwelling non-demented older adults. Performance on the EPT was found to vary according to age, cognitive status, and education. Hierarchical regression analyses revealed that, after adjusting for demographic and health variables, measures of cognitive functioning accounted for 23.6% of the variance in EPT performance. In particular, measures of global cognitive status, cognitive decline, speed of processing, executive functioning, episodic memory, and verbal ability were significant predictors of EPT performance. These findings suggest that cognitive functioning along with demographic variables are important determinants of everyday problem-solving.

  19. Estimating the exceedance probability of rain rate by logistic regression

    Science.gov (United States)

    Chiu, Long S.; Kedem, Benjamin

    1990-01-01

    Recent studies have shown that the fraction of an area with rain intensity above a fixed threshold is highly correlated with the area-averaged rain rate. To estimate the fractional rainy area, a logistic regression model, which estimates the conditional probability that rain rate over an area exceeds a fixed threshold given the values of related covariates, is developed. The problem of dependency in the data in the estimation procedure is bypassed by the method of partial likelihood. Analyses of simulated scanning multichannel microwave radiometer and observed electrically scanning microwave radiometer data during the Global Atlantic Tropical Experiment period show that the use of logistic regression in pixel classification is superior to multiple regression in predicting whether rain rate at each pixel exceeds a given threshold, even in the presence of noisy data. The potential of the logistic regression technique in satellite rain rate estimation is discussed.
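
    A minimal sketch of the exceedance-probability idea with scikit-learn on synthetic data (the covariates and coefficients are invented, and the paper's partial-likelihood handling of dependent data is not shown):

      import numpy as np
      from sklearn.linear_model import LogisticRegression

      rng = np.random.default_rng(5)
      n = 1000
      frac_area = rng.uniform(0, 1, n)                    # fractional rainy area (covariate)
      radiance = rng.normal(0, 1, n)
      p_true = 1 / (1 + np.exp(-(-2.0 + 4.0 * frac_area + 0.5 * radiance)))
      exceeds = rng.uniform(size=n) < p_true              # rain rate above the fixed threshold?

      X = np.column_stack([frac_area, radiance])
      clf = LogisticRegression().fit(X, exceeds)
      # estimated conditional exceedance probabilities for the first few pixels
      print(clf.predict_proba(X[:3])[:, 1])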

  1. Breath tests: principles, problems, and promise

    International Nuclear Information System (INIS)

    Lo, C.W.; Carter, E.A.; Walker, W.A.

    1982-01-01

    Breath tests rely on the measurement of gases produced in the intestine, absorbed, and expired in the breath. Carbohydrates, such as lactose and sucrose, can be administered in physiologic doses; if malabsorbed, they will be metabolized to hydrogen by colonic bacteria. Since hydrogen is not produced by human metabolic reactions, a rise in breath hydrogen, as measured by gas chromatography, is evidence of carbohydrate malabsorption. Likewise, a rise in breath hydrogen marks the transit time of nonabsorbable carbohydrates such as lactulose through the small intestine into the colon. Simple end-expiratory interval collection into nonsiliconized vacutainer tubes has made these noninvasive tests quite convenient to perform, but various problems, including changes in stool pH, intestinal motility, or metabolic rate, may influence results. Another group of breath tests uses substrates labeled with radioactive or stable isotopes of carbon. Labeled fat substrates such as trioctanoin, tripalmitin, and triolein do not produce the expected rise in labeled breath CO2 if there is fat malabsorption. Bile acid malabsorption and small intestinal bacterial overgrowth can be measured with labeled cholylglycine or cholyltaurine. Labeled drugs such as aminopyrine, methacetin, and phenacetin can be used as an indication of drug metabolism and liver function. Radioactive substrates have been used to trace metabolic pathways and can be measured by scintillation counters. The availability of nonradioactive stable isotopes has made these ideal for use in children and pregnant women, but the cost of substrates and the mass spectrometers to measure them has so far limited their use to research centers. It is hoped that new techniques of processing and measurement will allow further realization of the exciting potential breath analysis has in a growing list of clinical applications.

  2. Quantile Regression Methods

    DEFF Research Database (Denmark)

    Fitzenberger, Bernd; Wilke, Ralf Andreas

    2015-01-01

    Quantile regression is emerging as a popular statistical approach, which complements the estimation of conditional mean models. While the latter only focuses on one aspect of the conditional distribution of the dependent variable, the mean, quantile regression provides more detailed insights by modeling conditional quantiles. Quantile regression can therefore detect whether the partial effect of a regressor on the conditional quantiles is the same for all quantiles or differs across quantiles, and it can provide evidence for a statistical relationship between two variables even if the mean regression model does not. We provide a short informal introduction into the principle of quantile regression, which includes an illustrative application from empirical labor market research. This is followed by a brief sketch of the underlying statistical model for linear quantile regression...
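
    A short illustration in the labor-market spirit of the chapter, assuming statsmodels and pandas with synthetic wage data; under heteroscedasticity the estimated slope differs across quantiles:

      import numpy as np
      import pandas as pd
      import statsmodels.formula.api as smf

      rng = np.random.default_rng(6)
      exper = rng.uniform(0, 30, 500)
      wage = 10 + 0.6 * exper + rng.normal(0, 1 + 0.15 * exper)   # spread grows with experience

      df = pd.DataFrame({'wage': wage, 'exper': exper})
      fits = {q: smf.quantreg('wage ~ exper', df).fit(q=q) for q in (0.1, 0.5, 0.9)}
      for q, f in fits.items():
          print(q, f.params['exper'])       # slope varies across quantiles, unlike mean regression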

  3. Robustness to non-normality of various tests for the one-sample location problem

    Directory of Open Access Journals (Sweden)

    Michelle K. McDougall

    2004-01-01

    Full Text Available This paper studies the effect of the normal distribution assumption on the power and size of the sign test, Wilcoxon's signed rank test and the t-test when used in one-sample location problems. Power functions for these tests under various skewness and kurtosis conditions are produced for several sample sizes from simulated data using the g-and-k distribution of MacGillivray and Cannon [5].
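
    A simulation of this kind is easy to reproduce in outline; the sketch below uses a centered lognormal as the skewed alternative rather than the paper's g-and-k family, with SciPy's tests:

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(7)

      def rejections(n=20, shift=0.5, alpha=0.05):
          # skewed sample, centered at zero mean, then shifted by the alternative
          x = rng.lognormal(0, 0.8, n) - np.exp(0.8 ** 2 / 2) + shift
          return (stats.ttest_1samp(x, 0).pvalue < alpha,
                  stats.wilcoxon(x).pvalue < alpha,
                  stats.binomtest(int((x > 0).sum()), n).pvalue < alpha)   # sign test

      power = np.mean([rejections() for _ in range(2000)], axis=0)
      print(dict(zip(['t', 'wilcoxon', 'sign'], power)))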

  4. Addiction-Like Mobile Phone Behavior - Validation and Association With Problem Gambling.

    Science.gov (United States)

    Fransson, Andreas; Chóliz, Mariano; Håkansson, Anders

    2018-01-01

    Mobile phone use and its potential addiction has become a point of interest within the research community. The aim of the study was to translate and validate the Test of Mobile Dependence (TMD), and to investigate if there are any associations between mobile phone use and problem gambling. This was a cross-sectional study on a Swedish general population. A questionnaire consisting of a translated version of the TMD, three problem gambling questions (NODS-CLiP) together with two questions concerning previous addiction treatment was published online. Exploratory factor analysis based on polychoric correlations was performed on the TMD. Independent-samples t-tests, the Mann-Whitney test, logistic regression analyses and ANOVA were performed to examine mean differences between subjects based on TMD test score, gambling and previous addiction treatment. A total of 1,515 people (38.3% men) answered the questionnaire. The TMD showed acceptable internal consistency (Cronbach's alpha: 0.905), and significant correlation with subjective dependence on one's mobile phone. Women scored higher on the TMD and 15-18 year olds had the highest mean test score. The TMD test score was significantly associated with problem gambling, but only when controlling for age and sex. Various separated items related to mobile phone use were associated with problem gambling. The TMD had acceptable internal consistency and correlates with subjective dependence, while future confirmatory factor analysis is recommended. An association between mobile phone use and problem gambling may be possible, but requires further research.

  5. 49 CFR 40.203 - What problems cause a drug test to be cancelled unless they are corrected?

    Science.gov (United States)

    2010-10-01

    ... 49 Transportation 1 2010-10-01 2010-10-01 false What problems cause a drug test to be cancelled unless they are corrected? 40.203 Section 40.203 Transportation Office of the Secretary of Transportation PROCEDURES FOR TRANSPORTATION WORKPLACE DRUG AND ALCOHOL TESTING PROGRAMS Problems in Drug Tests § 40.203...

  6. Inverse Tasks In The Tsunami Problem: Nonlinear Regression With Inaccurate Input Data

    Science.gov (United States)

    Lavrentiev, M.; Shchemel, A.; Simonov, K.

    A variant of a modified training functional that allows for inaccurate input data is suggested. A limiting case, in which part of the input data is completely undefined and a problem of reconstructing hidden parameters must therefore be solved, is also considered, and some numerical experiments are presented. The classic problem definition, widely used in the majority of neural net algorithms, assumes that a dependence of known output variables on known input ones should be found. The quality of approximation is evaluated by a performance function; often the error of the task is evaluated as the squared distance between known data and predicted data, multiplied by weight coefficients, which may be called "precision coefficients". When inputs are not known exactly, a natural generalization of the performance function adds a term responsible for the distance between the known inputs and shifted inputs that lessen the model's error. It is desirable that the set of variable parameters be compact for training to converge. In the above problem it is possible to choose variants of a priori compactness requirements that allow meaningful interpretation in terms of the smoothness of the model dependence. Two kinds of regularization were used: the first limits the squares of the coefficients responsible for nonlinearity, and the second limits the product of those coefficients and the linear coefficients. The asymptotic universality of a neural net's ability to approximate various smooth functions with any accuracy as the number of tunable parameters increases is often the basis for selecting a type of neural net approximation. It can be shown that the neural net used here approaches a Fourier integral transform, whose approximation abilities are known, as the number of tunable parameters increases. In the limiting case, when input data are given with zero precision, the problem of reconstructing hidden parameters from observed output data appears...

  7. Linear Regression on Sparse Features for Single-Channel Speech Separation

    DEFF Research Database (Denmark)

    Schmidt, Mikkel N.; Olsson, Rasmus Kongsgaard

    2007-01-01

    In this work we address the problem of separating multiple speakers from a single microphone recording. We formulate a linear regression model for estimating each speaker based on features derived from the mixture. The employed feature representation is a sparse, non-negative encoding of the speech mixture in terms of pre-learned speaker-dependent dictionaries. Previous work has shown that this feature representation by itself provides some degree of separation. We show that the performance is significantly improved when regression analysis is performed on the sparse, non-negative features, both...

  8. Online monitoring and conditional regression tree test: Useful tools for a better understanding of combined sewer network behavior.

    Science.gov (United States)

    Bersinger, T; Bareille, G; Pigot, T; Bru, N; Le Hécho, I

    2018-06-01

    A good knowledge of the dynamics of pollutant concentration and flux in a combined sewer network is necessary when considering solutions to limit the pollutants discharged by combined sewer overflow (CSO) into receiving water during wet weather. Identification of the parameters that influence pollutant concentration and flux is important; nevertheless, few studies have obtained satisfactory results in identifying these parameters using statistical tools. This work therefore uses a large database of rain events (116 over one year), obtained via continuous measurement of rainfall, discharge flow and chemical oxygen demand (COD) estimated from online turbidity, to identify these parameters. We carried out a statistical study of the parameters influencing the maximum COD concentration, the discharge flow and the discharged COD flux, using a test not previously applied in this field: the conditional regression tree test. We demonstrated that the antecedent dry weather period, the average rain event intensity and the flow before the event are the three main factors influencing the maximum COD concentration during a rainfall event. The discharge flow is mainly influenced by the overall rainfall height but not by the maximum rainfall intensity. Finally, the discharged COD flux is influenced by the discharge volume and the maximum COD concentration. Regression trees seem much more appropriate than common tests like PCA and PLS for this type of study, as they take into account thresholds and the cumulative effects of various parameters as a function of the target variable. These results could help to improve sewer and CSO management in order to decrease the discharge of pollutants into receiving waters. Copyright © 2017 Elsevier B.V. All rights reserved.
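
    As an illustration of the tree idea, a CART regression tree on synthetic rain-event features; note that scikit-learn's trees are not the conditional inference trees used in the study, and the data-generating model here is invented:

      import numpy as np
      from sklearn.tree import DecisionTreeRegressor, export_text

      rng = np.random.default_rng(8)
      n = 116                                              # one row per rain event
      dry_days = rng.gamma(2.0, 3.0, n)                    # antecedent dry weather period
      intensity = rng.gamma(2.0, 1.5, n)                   # average rain event intensity
      base_flow = rng.normal(50, 10, n)                    # flow before the event
      cod_max = (200 + 30 * dry_days + 15 * intensity + 0.5 * base_flow
                 + rng.normal(0, 40, n))                   # maximum COD concentration

      X = np.column_stack([dry_days, intensity, base_flow])
      tree = DecisionTreeRegressor(max_depth=3, min_samples_leaf=10).fit(X, cod_max)
      print(export_text(tree, feature_names=['dry_days', 'intensity', 'base_flow']))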

  9. Satellite rainfall retrieval by logistic regression

    Science.gov (United States)

    Chiu, Long S.

    1986-01-01

    The potential use of logistic regression in rainfall estimation from satellite measurements is investigated. Satellite measurements provide covariate information in terms of radiances from different remote sensors. The logistic regression technique can effectively accommodate many covariates and test their significance in the estimation. The outcome of the logistic model is the probability that the rain rate of a satellite pixel is above a certain threshold. By varying the threshold, a rain-rate histogram can be obtained, from which the mean and the variance can be estimated. A logistic model is developed and applied to rainfall data collected during GATE, using as covariates the fractional rain area and a radiance measurement deduced from a microwave temperature-rain-rate relation. It is demonstrated that the fractional rain area is an important covariate in the model, consistent with the use of the so-called area-time integral in estimating total rain volume in other studies. To calibrate the logistic model, simulated rain fields generated by rain-field models with prescribed parameters are needed. A stringent test of the logistic model is its ability to recover the prescribed parameters of simulated rain fields. A rain-field simulation model which preserves the fractional rain area and the lognormality of rain rates as found in GATE has been developed, along with a stochastic regression model of branching and immigration whose solutions are lognormally distributed in some asymptotic limits.

  10. A Monte Carlo simulation study comparing linear regression, beta regression, variable-dispersion beta regression and fractional logit regression at recovering average difference measures in a two sample design.

    Science.gov (United States)

    Meaney, Christopher; Moineddin, Rahim

    2014-01-24

    In biomedical research, response variables are often encountered which have bounded support on the open unit interval--(0,1). Traditionally, researchers have attempted to estimate covariate effects on these types of response data using linear regression. Alternative modelling strategies may include: beta regression, variable-dispersion beta regression, and fractional logit regression models. This study employs a Monte Carlo simulation design to compare the statistical properties of the linear regression model to that of the more novel beta regression, variable-dispersion beta regression, and fractional logit regression models. In the Monte Carlo experiment we assume a simple two sample design. We assume observations are realizations of independent draws from their respective probability models. The randomly simulated draws from the various probability models are chosen to emulate average proportion/percentage/rate differences of pre-specified magnitudes. Following simulation of the experimental data we estimate average proportion/percentage/rate differences. We compare the estimators in terms of bias, variance, type-1 error and power. Estimates of Monte Carlo error associated with these quantities are provided. If response data are beta distributed with constant dispersion parameters across the two samples, then all models are unbiased and have reasonable type-1 error rates and power profiles. If the response data in the two samples have different dispersion parameters, then the simple beta regression model is biased. When the sample size is small (N0 = N1 = 25) linear regression has superior type-1 error rates compared to the other models. Small sample type-1 error rates can be improved in beta regression models using bias correction/reduction methods. In the power experiments, variable-dispersion beta regression and fractional logit regression models have slightly elevated power compared to linear regression models. Similar results were observed if the
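
    A stripped-down version of such a Monte Carlo comparison, assuming statsmodels and covering only two of the four estimators (OLS and the fractional logit via a binomial GLM):

      import numpy as np
      import statsmodels.api as sm

      rng = np.random.default_rng(9)

      def one_rep(n=25):
          # Beta(2,8) vs Beta(3,7): true mean difference is 0.3 - 0.2 = 0.1
          y = np.concatenate([rng.beta(2, 8, n), rng.beta(3, 7, n)])
          X = sm.add_constant(np.repeat([0.0, 1.0], n))
          ols_diff = sm.OLS(y, X).fit().params[1]
          frac = sm.GLM(y, X, family=sm.families.Binomial()).fit()   # fractional logit
          p0, p1 = frac.predict(sm.add_constant(np.array([0.0, 1.0])))
          return ols_diff, p1 - p0

      est = np.array([one_rep() for _ in range(1000)])
      print(est.mean(axis=0))            # both recover roughly 0.1; compare spreads for efficiency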

  11. Descriptor Learning via Supervised Manifold Regularization for Multioutput Regression.

    Science.gov (United States)

    Zhen, Xiantong; Yu, Mengyang; Islam, Ali; Bhaduri, Mousumi; Chan, Ian; Li, Shuo

    2017-09-01

    Multioutput regression has recently shown great ability to solve challenging problems in both computer vision and medical image analysis. However, due to the huge image variability and ambiguity, it is fundamentally challenging to handle the highly complex input-target relationship of multioutput regression, especially with indiscriminate high-dimensional representations. In this paper, we propose a novel supervised descriptor learning (SDL) algorithm for multioutput regression, which can establish discriminative and compact feature representations to improve the multivariate estimation performance. The SDL is formulated as generalized low-rank approximations of matrices with a supervised manifold regularization. The SDL is able to simultaneously extract discriminative features closely related to multivariate targets and remove irrelevant and redundant information by transforming raw features into a new low-dimensional space aligned to targets. The achieved discriminative while compact descriptor largely reduces the variability and ambiguity for multioutput regression, which enables more accurate and efficient multivariate estimation. We conduct extensive evaluation of the proposed SDL on both synthetic data and real-world multioutput regression tasks for both computer vision and medical image analysis. Experimental results have shown that the proposed SDL can achieve high multivariate estimation accuracy on all tasks and largely outperforms the algorithms in the state of the arts. Our method establishes a novel SDL framework for multioutput regression, which can be widely used to boost the performance in different applications.

  12. Improving Students’ Scientific Reasoning and Problem-Solving Skills by The 5E Learning Model

    Directory of Open Access Journals (Sweden)

    Sri Mulyani Endang Susilowati

    2017-12-01

    Full Text Available Biology learning in MA (Madrasah Aliyah) Khas Kempek was still dominated by the teacher, with low student involvement. This study analyzes the effectiveness of the 5E (Engagement, Exploration, Explanation, Elaboration, Evaluation) learning model in improving scientific reasoning and problem solving, and examines the relationship between students' scientific reasoning and their problem-solving abilities. This was a pre-experimental study with a one-group pre-test post-test design. Sixty students of MA Khas Kempek from XI MIA 3 and XI MIA 4 were involved in this study. The students' learning outcomes were collected by tests of reasoning and problem solving. The results showed that the gains in students' scientific reasoning ability were 69.77% for XI MIA 3 and 66.27% for XI MIA 4, in the medium category. The gains in problem-solving skills were 63.40% for XI MIA 3 and 61.67% for XI MIA 4, also in the moderate category. A simple regression test found a linear correlation between students' scientific reasoning and problem-solving ability. This study affirms that reasoning ability is needed in problem solving, and that the 5E learning model is effective in improving the scientific reasoning and problem-solving abilities of students.

  13. The effect of high leverage points on the logistic ridge regression estimator having multicollinearity

    Science.gov (United States)

    Ariffin, Syaiba Balqish; Midi, Habshah

    2014-06-01

    This article is concerned with the performance of the logistic ridge regression estimation technique in the presence of multicollinearity and high leverage points. In logistic regression, multicollinearity exists among predictors and in the information matrix. The maximum likelihood estimator suffers a huge setback in the presence of multicollinearity, which causes regression estimates to have unduly large standard errors. To remedy this problem, a logistic ridge regression estimator is put forward. It is evident that the logistic ridge regression estimator outperforms the maximum likelihood approach in handling multicollinearity. The effect of high leverage points on the performance of the logistic ridge regression estimator is then investigated through a real data set and a simulation study. The findings signify that the logistic ridge regression estimator fails to provide better parameter estimates in the presence of both high leverage points and multicollinearity.
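
    A minimal sketch of ridge-penalized logistic regression under near-collinearity, using scikit-learn's L2-penalized estimator on synthetic data; the high-leverage-point aspect studied in the paper is not addressed here:

      import numpy as np
      from sklearn.linear_model import LogisticRegression
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import StandardScaler

      rng = np.random.default_rng(10)
      n = 300
      x1 = rng.normal(size=n)
      x2 = x1 + rng.normal(scale=0.05, size=n)             # nearly collinear predictor
      X = np.column_stack([x1, x2])
      y = (rng.uniform(size=n) < 1 / (1 + np.exp(-(x1 + x2)))).astype(int)

      # C is the inverse penalty strength: smaller C means heavier ridge shrinkage
      ridge_logit = make_pipeline(StandardScaler(), LogisticRegression(penalty='l2', C=0.1))
      ridge_logit.fit(X, y)
      print(ridge_logit[-1].coef_)       # coefficients stabilized relative to the plain ML fit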

  14. Gaussian process regression for sensor networks under localization uncertainty

    Science.gov (United States)

    Jadaliha, M.; Xu, Yunfei; Choi, Jongeun; Johnson, N.S.; Li, Weiming

    2013-01-01

    In this paper, we formulate Gaussian process regression with observations under the localization uncertainty due to the resource-constrained sensor networks. In our formulation, effects of observations, measurement noise, localization uncertainty, and prior distributions are all correctly incorporated in the posterior predictive statistics. The analytically intractable posterior predictive statistics are proposed to be approximated by two techniques, viz., Monte Carlo sampling and Laplace's method. Such approximation techniques have been carefully tailored to our problems and their approximation error and complexity are analyzed. Simulation study demonstrates that the proposed approaches perform much better than approaches without considering the localization uncertainty properly. Finally, we have applied the proposed approaches on the experimentally collected real data from a dye concentration field over a section of a river and a temperature field of an outdoor swimming pool to provide proof of concept tests and evaluate the proposed schemes in real situations. In both simulation and experimental results, the proposed methods outperform the quick-and-dirty solutions often used in practice.
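
    The Monte Carlo approximation can be sketched with a tiny one-dimensional GP: sample plausible true sensor locations around the reported ones and average the resulting predictions (illustrative hyperparameters; the paper's full posterior treatment and Laplace approximation are not shown):

      import numpy as np

      def rbf(a, b, ell=1.0, sf=1.0):
          # squared-exponential covariance between two sets of 1-D inputs
          return sf ** 2 * np.exp(-0.5 * ((a[:, None] - b[None, :]) / ell) ** 2)

      def gp_mean(x, y, xs, noise=0.1):
          K = rbf(x, x) + noise ** 2 * np.eye(len(x))
          return rbf(xs, x) @ np.linalg.solve(K, y)

      rng = np.random.default_rng(11)
      x_reported = np.linspace(0, 5, 15)                   # reported sensor positions
      y = np.sin(x_reported) + rng.normal(0, 0.1, 15)
      xs = np.linspace(0, 5, 50)

      # average the GP prediction over sampled true locations (localization sigma = 0.1)
      draws = [gp_mean(x_reported + rng.normal(0, 0.1, 15), y, xs) for _ in range(200)]
      mean_pred = np.mean(draws, axis=0)
      print(mean_pred[:5])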

  15. Estimation of adjusted rate differences using additive negative binomial regression.

    Science.gov (United States)

    Donoghoe, Mark W; Marschner, Ian C

    2016-08-15

    Rate differences are an important effect measure in biostatistics and provide an alternative perspective to rate ratios. When the data are event counts observed during an exposure period, adjusted rate differences may be estimated using an identity-link Poisson generalised linear model, also known as additive Poisson regression. A problem with this approach is that the assumption of equality of mean and variance rarely holds in real data, which often show overdispersion. An additive negative binomial model is the natural alternative to account for this; however, standard model-fitting methods are often unable to cope with the constrained parameter space arising from the non-negativity restrictions of the additive model. In this paper, we propose a novel solution to this problem using a variant of the expectation-conditional maximisation-either algorithm. Our method provides a reliable way to fit an additive negative binomial regression model and also permits flexible generalisations using semi-parametric regression functions. We illustrate the method using a placebo-controlled clinical trial of fenofibrate treatment in patients with type II diabetes, where the outcome is the number of laser therapy courses administered to treat diabetic retinopathy. An R package is available that implements the proposed method. Copyright © 2016 John Wiley & Sons, Ltd.
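
    For orientation, the simpler identity-link Poisson baseline that the paper generalizes can be fitted directly with statsmodels on synthetic data; the constrained fitting of the additive negative binomial model itself is not shown, and identity-link fits can fail when fitted rates approach zero:

      import numpy as np
      import statsmodels.api as sm

      rng = np.random.default_rng(12)
      n = 400
      treated = rng.integers(0, 2, n).astype(float)
      followup = rng.uniform(0.5, 2.0, n)                  # exposure time per subject
      events = rng.poisson((1.0 + 0.5 * treated) * followup)   # true rate difference = 0.5

      # identity link keeps the treatment effect additive on the rate scale
      fit = sm.GLM(events, sm.add_constant(treated),
                   family=sm.families.Poisson(link=sm.families.links.Identity()),
                   exposure=followup).fit()
      print(fit.params)        # [baseline rate, adjusted rate difference]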

  16. A speech reception in noise test for preschool children (the Galker-test)

    DEFF Research Database (Denmark)

    Lauritsen, Maj-Britt Glenn; Kreiner, Svend; Söderström, Margareta

    2015-01-01

    Purpose: This study evaluates the initial validity and reliability of the "Galker test of speech reception in noise", developed for Danish preschool children suspected of having problems with hearing or understanding speech, against strict psychometric standards, and assesses its acceptance by the children. Methods: The Galker test is an audio-visual, computerised, word discrimination test in background noise, originally comprising 50 word pairs. Three hundred and eighty-eight children attending ordinary day care centres and aged 3-5 years were included. With multiple regression and the Rasch item response model, it was examined whether the total score of the Galker test validly reflected item responses across subgroups defined by sex, age, bilingualism, tympanometry, audiometry and verbal comprehension. Results: A total of 370 children (95%) accepted testing and 339 (87%) completed all 50 items...

  17. Online Censoring for Large-Scale Regressions with Application to Streaming Big Data.

    Science.gov (United States)

    Berberidis, Dimitris; Kekatos, Vassilis; Giannakis, Georgios B

    2016-08-01

    On par with data-intensive applications, the sheer size of modern linear regression problems creates an ever-growing demand for efficient solvers. Fortunately, a significant percentage of the data accrued can be omitted while maintaining a certain quality of statistical inference with an affordable computational budget. This work introduces means of identifying and omitting less informative observations in an online and data-adaptive fashion. Given streaming data, the related maximum-likelihood estimator is sequentially found using first- and second-order stochastic approximation algorithms. These schemes are well suited when data are inherently censored or when the aim is to save communication overhead in decentralized learning setups. In a different operational scenario, the task of joint censoring and estimation is put forth to solve large-scale linear regressions in a centralized setup. Novel online algorithms are developed enjoying simple closed-form updates and provable (non)asymptotic convergence guarantees. To attain desired censoring patterns and levels of dimensionality reduction, thresholding rules are investigated too. Numerical tests on real and synthetic datasets corroborate the efficacy of the proposed data-adaptive methods compared to data-agnostic random projection-based alternatives.
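
    A bare-bones version of the censoring idea with a least-mean-squares update (threshold, step size, and noise scale are illustrative assumptions, not the paper's algorithms):

      import numpy as np

      def censored_lms(stream, dim, tau=0.5, lr=0.05, sigma=1.0):
          # skip observations whose normalized residual is small (deemed uninformative)
          theta, used = np.zeros(dim), 0
          for x, y in stream:
              r = y - x @ theta
              if abs(r) / sigma > tau:
                  theta += lr * r * x                      # SGD/LMS update on the kept sample
                  used += 1
          return theta, used

      rng = np.random.default_rng(13)
      truth = np.array([1.0, -2.0, 0.5])
      data = [(x, x @ truth + rng.normal(0, 1.0)) for x in rng.normal(size=(5000, 3))]
      theta, used = censored_lms(data, dim=3)
      print(theta, used / 5000)          # estimate close to truth using a fraction of the stream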

  18. Theory of Regression Apple Professional Cooperation Organization Research

    OpenAIRE

    Ouyang Bin

    2013-01-01

    In view of the various problems existing in enterprise-run ecological apple manors, this paper proposes a transformation of enterprise management toward integrated operation and management by the enterprise, the collective, and individuals, and applies a regression model to the analysis of apple professional cooperation organizations. The example shows that the input-output ratio of the innovative apple professional economic cooperation organization model is much higher than that of the traditional rural economic cooperation organization...

  19. Model selection with multiple regression on distance matrices leads to incorrect inferences.

    Directory of Open Access Journals (Sweden)

    Ryan P Franckowiak

    Full Text Available In landscape genetics, model selection procedures based on Information Theoretic and Bayesian principles have been used with multiple regression on distance matrices (MRM) to test the relationship between multiple vectors of pairwise genetic, geographic, and environmental distance. Using Monte Carlo simulations, we examined the ability of model selection criteria based on Akaike's information criterion (AIC), its small-sample correction (AICc), and the Bayesian information criterion (BIC) to reliably rank candidate models when applied with MRM while varying the sample size. The results showed a serious problem: all three criteria exhibit a systematic bias toward selecting unnecessarily complex models containing spurious random variables and erroneously suggest a high level of support for the incorrectly ranked best model. These problems effectively increased with increasing sample size. The failure of AIC, AICc, and BIC was likely driven by the inflated sample size and the different sum-of-squares partitioned by MRM, and the resulting effect on delta values. Based on these findings, we strongly discourage the continued application of AIC, AICc, and BIC for model selection with MRM.

  20. Replica analysis of overfitting in regression models for time-to-event data

    Science.gov (United States)

    Coolen, A. C. C.; Barrett, J. E.; Paga, P.; Perez-Vicente, C. J.

    2017-09-01

    Overfitting, which happens when the number of parameters in a model is too large compared to the number of data points available for determining these parameters, is a serious and growing problem in survival analysis. While modern medicine presents us with data of unprecedented dimensionality, these data cannot yet be used effectively for clinical outcome prediction. Standard error measures in maximum likelihood regression, such as p-values and z-scores, are blind to overfitting, and even for Cox's proportional hazards model (the main tool of medical statisticians), one finds in the literature only rules of thumb on the number of samples required to avoid overfitting. In this paper we present a mathematical theory of overfitting in regression models for time-to-event data, which aims to increase our quantitative understanding of the problem and provide practical tools with which to correct regression outcomes for the impact of overfitting. It is based on the replica method, a statistical mechanical technique for the analysis of heterogeneous many-variable systems that has been used successfully for several decades in physics, biology, and computer science, but not yet in medical statistics. We develop the theory initially for arbitrary regression models for time-to-event data, and verify its predictions in detail for the popular Cox model.

  1. Patch testing with markers of fragrance contact allergy. Do clinical tests correspond to patients' self-reported problems?

    DEFF Research Database (Denmark)

    Johansen, J D; Andersen, T F; Veien, Niels

    1997-01-01

    The aim of the present study was to investigate the relationship between patients' own recognition of skin problems using consumer products and the results of patch testing with markers of fragrance sensitization. Eight hundred and eighty-four consecutive eczema patients, 18-69 years of age, filled...

  2. Problems of overcoming medical consequences of nuclear tests at the former Semipalatinsk test site (STS)

    International Nuclear Information System (INIS)

    Devyatko, V.N.

    1997-01-01

    Tests conducted over many years resulted in extensive radioactive contamination of the Semipalatinsk, East Kazakhstan, Pavlodar and Karaganda regions. About 1.5 million people were subjected to repeated acute and chronic exposure, mostly to small doses of ionizing radiation. In this connection, the Ministry of Health Protection and social protection organizations are concerned with the problem of treating and rehabilitating the population of the above regions. To address these problems, the Ministry of Health Protection of the Republic of Kazakhstan established Scientific Research Institutes of Medicine and Ecology in Semipalatinsk and a regional Medical and Diagnostic Center in Kurchatov. With the help of the regional administrations, specialized medical centers were created: diagnostic, children's, rehabilitation, ophthalmological, and for the protection of motherhood and childhood. Work on creating a State National Medical Register of people exposed to ionizing radiation is in progress.

  3. Construct-level predictive validity of educational attainment and intellectual aptitude tests in medical student selection: meta-regression of six UK longitudinal studies

    Science.gov (United States)

    2013-01-01

    Background: Measures used for medical student selection should predict future performance during training. A problem for any selection study is that predictor-outcome correlations are known only in those who have been selected, whereas selectors need to know how measures would predict in the entire pool of applicants. That problem of interpretation can be solved by calculating construct-level predictive validity, an estimate of the true predictor-outcome correlation across the range of applicant abilities. Methods: Construct-level predictive validities were calculated in six cohort studies of medical student selection and training (student entry, 1972 to 2009) for a range of predictors, including A-levels, General Certificates of Secondary Education (GCSEs)/O-levels, and aptitude tests (AH5 and UK Clinical Aptitude Test (UKCAT)). Outcomes included undergraduate basic medical science and finals assessments, as well as postgraduate measures of Membership of the Royal Colleges of Physicians of the United Kingdom (MRCP(UK)) performance and entry in the Specialist Register. Construct-level predictive validity was calculated with the method of Hunter, Schmidt and Le (2006), adapted to correct for right-censorship of examination results due to grade inflation. Results: Meta-regression analyzed 57 separate predictor-outcome correlations (POCs) and construct-level predictive validities (CLPVs). Mean CLPVs are substantially higher (.450) than mean POCs (.171). Mean CLPVs for first-year examinations were high for A-levels (.809; CI: .501 to .935), and lower for GCSEs/O-levels (.332; CI: .024 to .583) and UKCAT (mean = .245; CI: .207 to .276). A-levels had higher CLPVs for all undergraduate and postgraduate assessments than did GCSEs/O-levels and intellectual aptitude tests. CLPVs of educational attainment measures decline somewhat during training, but continue to predict postgraduate performance. Intellectual aptitude tests have lower CLPVs than A-levels or GCSEs...

  4. A flexible fuzzy regression algorithm for forecasting oil consumption estimation

    International Nuclear Information System (INIS)

    Azadeh, A.; Khakestani, M.; Saberi, M.

    2009-01-01

    Oil consumption plays a vital role in the socio-economic development of most countries. This study presents a flexible fuzzy regression algorithm for forecasting oil consumption based on standard economic indicators. The standard indicators are annual population, cost of crude oil import, gross domestic production (GDP) and annual oil production in the last period. The proposed algorithm uses analysis of variance (ANOVA) to select either fuzzy regression or conventional regression for future demand estimation. The significance of the proposed algorithm is threefold. First, it is flexible and identifies the best model based on the results of ANOVA and minimum absolute percentage error (MAPE), whereas previous studies consider the best-fitted fuzzy regression model based on MAPE or other relative error results. Second, the proposed model may identify conventional regression as the best model for future oil consumption forecasting because of its dynamic structure, whereas previous studies assume that fuzzy regression always provides the best solutions and estimation. Third, it utilizes the most standard independent variables for the regression models. To show the applicability and superiority of the proposed flexible fuzzy regression algorithm, the data for oil consumption in Canada, United States, Japan and Australia from 1990 to 2005 are used. The results show that the flexible algorithm provides an accurate solution for the oil consumption estimation problem. The algorithm may be used by policy makers to accurately foresee the behavior of oil consumption in various regions.

  6. FATAL, General Experiment Fitting Program by Nonlinear Regression Method

    International Nuclear Information System (INIS)

    Salmon, L.; Budd, T.; Marshall, M.

    1982-01-01

    1 - Description of problem or function: A generalized fitting program with a free-format keyword interface to the user. It permits experimental data to be fitted by non-linear regression methods to any function describable by the user. The user requires minimal computer experience but needs to provide a subroutine to define his function. Some statistical output is included, as well as 'best' estimates of the function's parameters. 2 - Method of solution: The regression method used is based on a minimization technique devised by Powell (Harwell Subroutine Library VA05A, 1972), which does not require the use of analytical derivatives. The method employs a quasi-Newton procedure balanced with a steepest descent correction. Experience shows this to be efficient for a very wide range of applications. 3 - Restrictions on the complexity of the problem: The current version of the program permits functions to be defined with up to 20 parameters. The function may be fitted to a maximum of 400 points, preferably with estimated values of weights given
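
    FATAL itself is a FORTRAN program built around Harwell's VA05A routine, but the derivative-free fitting idea is easy to reproduce. A minimal sketch with hypothetical decay-curve data, using SciPy's Powell method (a related derivative-free technique, not VA05A itself):

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical decay-curve data to be fitted by y = a * exp(-b * x) + c.
x = np.linspace(0.0, 10.0, 21)
rng = np.random.default_rng(0)
y = 5.0 * np.exp(-0.7 * x) + 1.0 + rng.normal(0.0, 0.05, x.size)

def sum_of_squares(params):
    """Least-squares objective; unit weights for simplicity."""
    a, b, c = params
    return np.sum((y - (a * np.exp(-b * x) + c)) ** 2)

# Powell's method needs no analytical derivatives, mirroring FATAL's approach.
result = minimize(sum_of_squares, x0=[1.0, 1.0, 0.0], method="Powell")
print(result.x)  # 'best' estimates of the function's parameters
```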

  7. Online Support Vector Regression with Varying Parameters for Time-Dependent Data

    International Nuclear Information System (INIS)

    Omitaomu, Olufemi A.; Jeong, Myong K.; Badiru, Adedeji B.

    2011-01-01

    Support vector regression (SVR) is a machine learning technique that continues to receive interest in several domains, including manufacturing, engineering, and medicine. In order to extend its application to problems in which datasets arrive constantly and in which batch processing of the datasets is infeasible or expensive, an accurate online support vector regression (AOSVR) technique was proposed. The AOSVR technique efficiently updates a trained SVR function whenever a sample is added to or removed from the training set, without retraining on the entire training data. However, the AOSVR technique assumes that the new samples and the training samples have the same characteristics; hence, the same values of the SVR parameters are used for training and prediction. This assumption is not applicable to data samples that are inherently noisy and non-stationary, such as sensor data. As a result, we propose Accurate On-line Support Vector Regression with Varying Parameters (AOSVR-VP), which uses varying rather than fixed SVR parameters and hence accounts for the variability that may exist in the samples. To accomplish this objective, we also propose a generalized weight function to automatically update the weights of the SVR parameters in on-line monitoring applications. The proposed function allows for lower and upper bounds on the SVR parameters. We tested our proposed approach and compared results with the conventional AOSVR approach using two benchmark time series datasets and sensor data from a nuclear power plant. The results show that using varying SVR parameters is more applicable to time-dependent data.
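
    The incremental AOSVR update is beyond a short sketch, but the central idea (let the SVR parameters vary with the local character of the data, bounded below and above) can be caricatured with a sliding-window retrain. The window size, the noise-based update rule and the bounds below are all illustrative assumptions, not the authors' weight function.

```python
import numpy as np
from sklearn.svm import SVR

def bounded_weight(value, lower, upper):
    """Clip an adaptive parameter into its allowed [lower, upper] range."""
    return float(np.clip(value, lower, upper))

def sliding_window_svr(x, y, window=50):
    """Retrain an SVR on a moving window, adapting C and epsilon to the
    local noise level (a crude stand-in for the AOSVR-VP update rule)."""
    preds = []
    for t in range(window, len(x)):
        xw, yw = x[t - window:t], y[t - window:t]
        noise = np.std(np.diff(yw))          # local variability estimate
        C = bounded_weight(10.0 / (noise + 1e-6), 1.0, 100.0)
        eps = bounded_weight(noise, 0.01, 0.5)
        model = SVR(kernel="rbf", C=C, epsilon=eps).fit(xw, yw)
        preds.append(model.predict(x[t:t + 1])[0])
    return np.array(preds)

# Hypothetical noisy, non-stationary sensor signal.
times = np.linspace(0, 20, 400).reshape(-1, 1)
signal = np.sin(times).ravel() + np.random.default_rng(1).normal(0, 0.1, 400)
print(sliding_window_svr(times, signal)[:5])
```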

  8. Addiction-Like Mobile Phone Behavior – Validation and Association With Problem Gambling

    Directory of Open Access Journals (Sweden)

    Andreas Fransson

    2018-05-01

    Mobile phone use and its potential addiction have become a point of interest within the research community. The aim of the study was to translate and validate the Test of Mobile Dependence (TMD), and to investigate if there are any associations between mobile phone use and problem gambling. This was a cross-sectional study on a Swedish general population. A questionnaire consisting of a translated version of the TMD, three problem gambling questions (NODS-CLiP) together with two questions concerning previous addiction treatment was published online. Exploratory factor analysis based on polychoric correlations was performed on the TMD. Independent samples t-tests, the Mann-Whitney test, logistic regression analyses and ANOVA were performed to examine mean differences between subjects based on TMD test score, gambling and previous addiction treatment. A total of 1,515 people (38.3% men) answered the questionnaire. The TMD showed acceptable internal consistency (Cronbach's alpha: 0.905) and a significant correlation with subjective dependence on one's mobile phone. Women scored higher on the TMD, and 15-18 year olds had the highest mean test score. The TMD test score was significantly associated with problem gambling, but only when controlling for age and sex. Various separate items related to mobile phone use were associated with problem gambling. The TMD had acceptable internal consistency and correlates with subjective dependence, while future confirmatory factor analysis is recommended. An association between mobile phone use and problem gambling may be possible, but requires further research.

  10. Modification of Harvard step-test for assessment of students’ with health problems functional potentials

    Directory of Open Access Journals (Sweden)

    E.N. Kopeikina

    2016-08-01

    Purpose: to substantiate, work out and experimentally prove a modified test for assessment of the functional potential of students with health problems. Material: students (male and female, aged 18-20 years; n=522) participated in the research. According to the worked-out modification of the test, the student ascended onto a bench (h=43 cm) and descended from it for 30 seconds. Then the pulse was measured three times. In total the test took 4 minutes. Results: to work out the scale for interpretation of the received results, we assessed the new 30-second modification of the Harvard step-test for validity. First, all students fulfilled the modified step-test. Then, after full recovery (after 20 minutes), they fulfilled its three-minute variant. Correlation analysis of the received results showed an average correlation between the two samples (r=0.64). Conclusions: application of this modified variant permits pedagogues to fully assess the functional potential of students with health problems.

  11. Model building strategy for logistic regression: purposeful selection.

    Science.gov (United States)

    Zhang, Zhongheng

    2016-03-01

    Logistic regression is one of the most commonly used models to account for confounders in the medical literature. This article introduces how to perform the purposeful selection model building strategy with R. I stress the use of the likelihood ratio test to see whether deleting a variable has a significant impact on model fit. A deleted variable should also be checked for whether it is an important adjustment of the remaining covariates. Interactions should be checked to disentangle complex relationships between covariates and their synergistic effect on the response variable. The model should be checked for goodness of fit (GOF); in other words, how well the fitted model reflects the real data. The Hosmer-Lemeshow GOF test is the most widely used for logistic regression models.
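
    The likelihood ratio test at the heart of purposeful selection compares the log-likelihoods of nested models. The article works in R; the sketch below performs the same comparison in Python with statsmodels on hypothetical data, and the covariate names and effect sizes are invented for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from scipy import stats

# Hypothetical data: binary outcome with two candidate covariates.
rng = np.random.default_rng(42)
df = pd.DataFrame({"age": rng.normal(60, 10, 300),
                   "lactate": rng.normal(2.0, 0.8, 300)})
logit = 0.05 * (df["age"] - 60) + 0.9 * (df["lactate"] - 2.0)
df["death"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))

full = sm.Logit(df["death"], sm.add_constant(df[["age", "lactate"]])).fit(disp=0)
reduced = sm.Logit(df["death"], sm.add_constant(df[["age"]])).fit(disp=0)

# Likelihood ratio statistic: twice the difference in log-likelihoods.
lr = 2 * (full.llf - reduced.llf)
p = stats.chi2.sf(lr, df=1)  # one dropped parameter
print(f"LR = {lr:.2f}, p = {p:.4f}")  # small p => keep 'lactate' in the model
```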

  12. Retrieving relevant factors with exploratory SEM and principal-covariate regression: A comparison.

    Science.gov (United States)

    Vervloet, Marlies; Van den Noortgate, Wim; Ceulemans, Eva

    2018-02-12

    Behavioral researchers often linearly regress a criterion on multiple predictors, aiming to gain insight into the relations between the criterion and predictors. Obtaining this insight from the ordinary least squares (OLS) regression solution may be troublesome, because OLS regression weights show only the effect of a predictor on top of the effects of other predictors. Moreover, when the number of predictors grows larger, it becomes likely that the predictors will be highly collinear, which makes the regression weights' estimates unstable (i.e., the "bouncing beta" problem). Among other procedures, dimension-reduction-based methods have been proposed for dealing with these problems. These methods yield insight into the data by reducing the predictors to a smaller number of summarizing variables and regressing the criterion on these summarizing variables. Two promising methods are principal-covariate regression (PCovR) and exploratory structural equation modeling (ESEM). Both simultaneously optimize reduction and prediction, but they are based on different frameworks. The resulting solutions have not yet been compared; it is thus unclear what the strengths and weaknesses of both methods are. In this article, we focus on the extent to which PCovR and ESEM are able to extract the factors that truly underlie the predictor scores and can predict a single criterion. The results of two simulation studies showed that for a typical behavioral dataset, ESEM (using the BIC for model selection) in this regard is successful more often than PCovR. Yet, in 93% of the datasets PCovR performed equally well, and in the case of 48 predictors, 100 observations, and large differences in the strengths of the factors, PCovR even outperformed ESEM.

  13. Regression analysis of mixed recurrent-event and panel-count data.

    Science.gov (United States)

    Zhu, Liang; Tong, Xinwei; Sun, Jianguo; Chen, Manhua; Srivastava, Deo Kumar; Leisenring, Wendy; Robison, Leslie L

    2014-07-01

    In event history studies concerning recurrent events, two types of data have been extensively discussed. One is recurrent-event data (Cook and Lawless, 2007. The Analysis of Recurrent Event Data. New York: Springer), and the other is panel-count data (Zhao and others, 2010. Nonparametric inference based on panel-count data. Test 20: 1-42). In the former case, all study subjects are monitored continuously; thus, complete information is available for the underlying recurrent-event processes of interest. In the latter case, study subjects are monitored periodically; thus, only incomplete information is available for the processes of interest. In reality, however, a third type of data could occur in which some study subjects are monitored continuously, but others are monitored periodically. When this occurs, we have mixed recurrent-event and panel-count data. This paper discusses regression analysis of such mixed data and presents two estimation procedures for the problem. One is a maximum likelihood estimation procedure, and the other is an estimating equation procedure. The asymptotic properties of both resulting estimators of regression parameters are established. Also, the methods are applied to a set of mixed recurrent-event and panel-count data that arose from a Childhood Cancer Survivor Study and motivated this investigation.

  14. Implementing fuzzy polynomial interpolation (FPI) and fuzzy linear regression (LFR)

    Directory of Open Access Journals (Sweden)

    Maria Cristina Floreno

    1996-05-01

    This paper presents some preliminary results arising within a general framework concerning the development of software tools for fuzzy arithmetic. The program is at a preliminary stage. What has already been implemented consists of a set of routines for elementary operations, optimized function evaluation, interpolation and regression. Some of these have been applied to real problems. This paper describes a prototype of a C++ library for polynomial interpolation of fuzzifying functions, a set of FORTRAN routines for fuzzy linear regression and a program with a graphical user interface allowing the use of such routines.

  15. 49 CFR 40.267 - What problems always cause an alcohol test to be cancelled?

    Science.gov (United States)

    2010-10-01

    ... cancelled? 40.267 Section 40.267 Transportation Office of the Secretary of Transportation PROCEDURES FOR... always cause an alcohol test to be cancelled? As an employer, a BAT, or an STT, you must cancel an... the test was cancelled and must be treated as if the test never occurred. These problems are: (a) In...

  16. Testing a model of research intention among U.K. clinical psychologists: a logistic regression analysis.

    Science.gov (United States)

    Eke, Gemma; Holttum, Sue; Hayward, Mark

    2012-03-01

    Previous research highlights barriers to clinical psychologists conducting research, but has rarely examined U.K. clinical psychologists. The study investigated U.K. clinical psychologists' self-reported research output and tested part of a theoretical model of factors influencing their intention to conduct research. Questionnaires were mailed to 1,300 U.K. clinical psychologists. Three hundred and seventy-four questionnaires were returned (29% response rate). This study replicated in a U.K. sample the finding, highlighted in a number of U.K. and U.S. studies, that the modal number of publications was zero. Research intention was bimodally distributed, and logistic regression classified 78% of cases successfully. Outcome expectations, perceived behavioral control and normative beliefs mediated between research training environment and intention. Further research should explore how research is negotiated in clinical roles, and this issue should be incorporated into prequalification training.

  17. Testing Foreign Language Impact on Engineering Students' Scientific Problem-Solving Performance

    Science.gov (United States)

    Tatzl, Dietmar; Messnarz, Bernd

    2013-01-01

    This article investigates the influence of English as the examination language on the solution of physics and science problems by non-native speakers in tertiary engineering education. For that purpose, a total of 96 students in four year groups from freshman to senior level participated in a testing experiment in…

  18. Comparison of ν-support vector regression and logistic equation for ...

    African Journals Online (AJOL)

    Due to the complexity and high non-linearity of bioprocesses, most simple mathematical models fail to describe the exact behavior of biochemical systems. As a novel type of learning method, support vector regression (SVR) has a powerful capability to characterize problems via small samples, nonlinearity, high dimension …

  19. Problem-Solving Test: RNA and Protein Synthesis in Bacteriophage-Infected "E. coli" Cells

    Science.gov (United States)

    Szeberenyi, Jozsef

    2008-01-01

    The classic experiment presented in this problem-solving test was designed to identify the template molecules of translation by analyzing the synthesis of phage proteins in "Escherichia coli" cells infected with bacteriophage T4. The work described in this test led to one of the most seminal discoveries of early molecular biology: it dealt a…

  20. Multiple regression analysis of Jominy hardenability data for boron treated steels

    International Nuclear Information System (INIS)

    Komenda, J.; Sandstroem, R.; Tukiainen, M.

    1997-01-01

    The relations between the chemical composition and hardenability of boron-treated steels have been investigated using multiple regression analysis. A linear regression model was chosen. The free boron content that is effective for hardenability was calculated using a model proposed by Jansson. The regression analysis for 1261 steel heats provided equations that were statistically significant at the 95% level. All heats met the specification according to the Nordic countries' producers' classification. The variation in chemical composition explained typically 80 to 90% of the variation in hardenability. In the regression analysis, elements which did not significantly contribute to the calculated hardness according to the F test were eliminated. Carbon, silicon, manganese, phosphorus and chromium were of importance at all Jominy distances; nickel, vanadium, boron and nitrogen at distances above 6 mm. After the regression analysis it was demonstrated that very few outliers were present in the data set, i.e., data points outside four times the standard deviation. The model has successfully been used in industrial practice, replacing some of the necessary Jominy tests. (orig.)
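
    The elimination scheme described (refit, drop the least significant element, repeat) is ordinary backward elimination. A minimal sketch on hypothetical composition data; the per-coefficient p-values from statsmodels stand in for the F-test criterion, and the element effects are invented.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def backward_eliminate(X, y, alpha=0.05):
    """Refit OLS repeatedly, dropping the least significant element
    until every remaining coefficient passes the significance test."""
    cols = list(X.columns)
    while cols:
        model = sm.OLS(y, sm.add_constant(X[cols])).fit()
        pvals = model.pvalues.drop("const")
        worst = pvals.idxmax()
        if pvals[worst] <= alpha:
            return model, cols
        cols.remove(worst)
    raise ValueError("no significant predictors")

# Hypothetical composition data (wt%) vs. hardness at one Jominy distance.
rng = np.random.default_rng(7)
X = pd.DataFrame({el: rng.uniform(0, 1, 200)
                  for el in ["C", "Si", "Mn", "P", "Cr", "Ni"]})
y = 20 + 45 * X["C"] + 8 * X["Mn"] + 5 * X["Cr"] + rng.normal(0, 1, 200)
model, kept = backward_eliminate(X, y)
print(kept)  # elements retained in the hardenability equation
```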

  1. General Dimensional Multiple-Output Support Vector Regressions and Their Multiple Kernel Learning.

    Science.gov (United States)

    Chung, Wooyong; Kim, Jisu; Lee, Heejin; Kim, Euntai

    2015-11-01

    Support vector regression has been considered one of the most important regression or function approximation methodologies in a variety of fields. In this paper, two new general dimensional multiple-output support vector regressions (MSVRs), named SOCPL1 and SOCPL2, are proposed. The proposed methods are formulated in the dual space and their relationship with previous work is clearly investigated. Further, the proposed MSVRs are extended to multiple kernel learning and their training is implemented with off-the-shelf convex optimization tools. The proposed MSVRs are applied to benchmark problems and their performances are compared with those of previous methods in the experimental section.

  2. Testing and Modeling of Contact Problems in Resistance Welding

    DEFF Research Database (Denmark)

    Song, Quanfeng

    As a part of the efforts towards a professional and reliable numerical tool for resistance welding engineers, this Ph.D. project is dedicated to refining the numerical models related to the interface behavior. An FE algorithm for the contact problems in resistance welding has been developed … for the formulation, and the interfaces are treated in a symmetric pattern. The frictional sliding contact is also solved employing the constant friction model. The algorithm is incorporated into the finite element code. Verification is carried out in some numerical tests as well as experiments such as upsetting … together two or three cylindrical parts as well as disc-ring pairs of dissimilar metals. The tests have demonstrated the effectiveness of the model. A theoretical and experimental study is performed on the contact resistance, aiming at a more reliable model for numerical simulation of resistance welding …

  3. Testing the Emotional Vulnerability Pathway to Problem Gambling in Culturally Diverse University Students.

    Science.gov (United States)

    Hum, Sandra; Carr, Sherilene M

    2018-02-12

    Loneliness and adapting to an unfamiliar environment can increase emotional vulnerability in culturally and linguistically diverse (CALD) university students. According to Blaszczynski and Nower's pathways model of problem and pathological gambling, this emotional vulnerability could increase the risk of problem gambling. The current study examined whether loneliness was associated with problem gambling risk in CALD students relative to their Australian peers. Additionally, differences in coping strategies were examined to determine their buffering effect on the relationship. A total of 463 female and 165 male university students (aged 18-38) from Australian (38%), mixed Australian and CALD (23%) and CALD (28%) backgrounds responded to an online survey of problem gambling behaviour, loneliness, and coping strategies. The results supported the hypothesis that loneliness would be related to problem gambling in CALD students. There was no evidence of a moderating effect of coping strategies. Future research could test whether the introduction of programs designed to alleviate loneliness in culturally diverse university students reduces their risk of developing problem gambling.

  4. Bayesian median regression for temporal gene expression data

    Science.gov (United States)

    Yu, Keming; Vinciotti, Veronica; Liu, Xiaohui; 't Hoen, Peter A. C.

    2007-09-01

    Most of the existing methods for the identification of biologically interesting genes in a temporal expression profiling dataset do not fully exploit the temporal ordering in the dataset and are based on normality assumptions for the gene expression. In this paper, we introduce a Bayesian median regression model to detect genes whose temporal profile is significantly different across a number of biological conditions. The regression model is defined by a polynomial function where both time and condition effects, as well as interactions between the two, are included. MCMC-based inference returns the posterior distribution of the polynomial coefficients. From this, a simple Bayes factor test is proposed to test for significance. The estimation of the median rather than the mean, and within a Bayesian framework, increases the robustness of the method compared to a previously suggested Hotelling T²-test. This is shown on simulated data and on muscular dystrophy gene expression data.
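
    The robustness argument, estimating the median rather than the mean, can be illustrated without the Bayesian machinery. As a quick frequentist analogue (not the authors' model), median regression is quantile regression at tau = 0.5, here with a time-by-condition interaction on hypothetical heavy-tailed expression data.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical temporal expression profile for one gene under two conditions.
rng = np.random.default_rng(3)
data = pd.DataFrame({"time": np.tile(np.arange(8), 6),
                     "condition": np.repeat([0, 1], 24)})
data["expr"] = (1.0 + 0.3 * data["time"]
                + 0.5 * data["condition"] * data["time"]
                + rng.standard_t(df=3, size=48))  # heavy-tailed noise

# Median (q = 0.5) regression with a time-by-condition interaction.
fit = smf.quantreg("expr ~ time * condition", data).fit(q=0.5)
print(fit.params)
```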

  5. Novel applications of multitask learning and multiple output regression to multiple genetic trait prediction.

    Science.gov (United States)

    He, Dan; Kuhn, David; Parida, Laxmi

    2016-06-15

    Given a set of biallelic molecular markers, such as SNPs, with genotype values encoded numerically on a collection of plant, animal or human samples, the goal of genetic trait prediction is to predict the quantitative trait values by simultaneously modeling all marker effects. Genetic trait prediction is usually represented as linear regression models. In many cases, for the same set of samples and markers, multiple traits are observed. Some of these traits might be correlated with each other. Therefore, modeling all the multiple traits together may improve the prediction accuracy. In this work, we view the multitrait prediction problem from a machine learning angle: as either a multitask learning problem or a multiple output regression problem, depending on whether different traits share the same genotype matrix or not. We then adapted multitask learning algorithms and multiple output regression algorithms to solve the multitrait prediction problem. We proposed a few strategies to improve the least square error of the prediction from these algorithms. Our experiments show that modeling multiple traits together could improve the prediction accuracy for correlated traits. The programs we used are either public or directly from the referred authors, such as the MALSAR package (http://www.public.asu.edu/~jye02/Software/MALSAR/). The Avocado data set has not been published yet and is available upon request (dhe@us.ibm.com).
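
    One concrete instance of the multitask view, in which all traits share one genotype matrix and a common sparsity pattern across marker effects, is scikit-learn's MultiTaskLasso. The sketch below uses simulated genotypes; it illustrates the problem framing, not the specific algorithms adapted in the paper.

```python
import numpy as np
from sklearn.linear_model import MultiTaskLasso

# Hypothetical genotype matrix (samples x markers, coded 0/1/2) and
# three correlated quantitative traits sharing causal markers.
rng = np.random.default_rng(11)
G = rng.integers(0, 3, size=(120, 300)).astype(float)
effects = np.zeros((300, 3))
effects[:10] = rng.normal(0, 1, (10, 3))     # shared causal markers
Y = G @ effects + rng.normal(0, 1, (120, 3))

# MultiTaskLasso enforces a common sparsity pattern across all traits.
model = MultiTaskLasso(alpha=1.0).fit(G, Y)
print((np.abs(model.coef_) > 1e-8).sum(axis=1))  # markers used per trait
```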

  6. Risk factors that predicted problem drinking in Danish men at age thirty

    DEFF Research Database (Denmark)

    Knop, Joachim; Penick, Elizabeth C; Jensen, Per

    2003-01-01

    … records and a series of structured interviews and psychometric tests at ages 19-20 and 30 years. The present analysis focuses on the degree to which premorbid differences between the high- and low-risk groups later predicted lifetime drinking problems at age 30 (n = 241). RESULTS: As expected, lifetime alcohol abuse/dependence by age 30 was reported significantly more often in the high-risk group. Of the 394 premorbid variables tested, 68 were found to distinguish the high- from the low-risk group before any subjects had developed a drinking problem. Of these 68 variables, 28 (41%) were also associated with DSM-III-R alcohol abuse/dependence at age 30. These 28 putative markers were reduced to 12 that were entered into a multiple regression analysis to search for the most powerful unique predictors of alcoholism. Four of the 28 putative markers were independently associated with problem drinking at age …

  7. A Fast Gradient Method for Nonnegative Sparse Regression With Self-Dictionary

    Science.gov (United States)

    Gillis, Nicolas; Luce, Robert

    2018-01-01

    A nonnegative matrix factorization (NMF) can be computed efficiently under the separability assumption, which asserts that all the columns of the given input data matrix belong to the cone generated by a (small) subset of them. The provably most robust methods to identify these conic basis columns are based on nonnegative sparse regression and self dictionaries, and require the solution of large-scale convex optimization problems. In this paper we study a particular nonnegative sparse regression model with self dictionary. As opposed to previously proposed models, this model yields a smooth optimization problem where the sparsity is enforced through linear constraints. We show that the Euclidean projection on the polyhedron defined by these constraints can be computed efficiently, and propose a fast gradient method to solve our model. We compare our algorithm with several state-of-the-art methods on synthetic data sets and real-world hyperspectral images.

  8. A multiple regression method for genomewide association studies ...

    Indian Academy of Sciences (India)

    Bujun Mei

    2018-06-07

    Jun 7, 2018 … Similar to the typical genomewide association tests using LD … new approach performed validly when the multiple regression based on linkage method was employed. … the model, two groups of scenarios were simulated.

  9. Regularized Label Relaxation Linear Regression.

    Science.gov (United States)

    Fang, Xiaozhao; Xu, Yong; Li, Xuelong; Lai, Zhihui; Wong, Wai Keung; Fang, Bingwu

    2018-04-01

    Linear regression (LR) and some of its variants have been widely used for classification problems. Most of these methods assume that during the learning phase, the training samples can be exactly transformed into a strict binary label matrix, which has too little freedom to fit the labels adequately. To address this problem, in this paper, we propose a novel regularized label relaxation LR method, which has the following notable characteristics. First, the proposed method relaxes the strict binary label matrix into a slack variable matrix by introducing a nonnegative label relaxation matrix into LR, which provides more freedom to fit the labels and simultaneously enlarges the margins between different classes as much as possible. Second, the proposed method constructs a class compactness graph based on manifold learning and uses it as the regularization term to avoid overfitting. The class compactness graph is used to ensure that samples sharing the same labels are kept close after they are transformed. Two different algorithms, based on two different norm loss functions, are devised. These two algorithms have compact closed-form solutions in each iteration, so they are easily implemented. Extensive experiments show that these two algorithms outperform the state-of-the-art algorithms in terms of classification accuracy and running time.

  10. The Effectiveness of Mathematics Learning with the Problem-Posing Method Based on Character Education

    Directory of Open Access Journals (Sweden)

    Eka Lia Susanti

    2012-06-01

    … Method of data collection: observation sheets and tests. Data were processed by the t-test and a comparative regression-effect test. The results showed that the experimental class's learning achievement (82.74) statistically exceeded the KKM (75). A simple linear regression test yielded the regression equation Y = -15.847 + 1.194X with R² = 0.829. The coefficient of X is positive, so activity has a positive effect on learning achievement, accounting for 82.9%. The average learning achievement of the experimental class was 82.74 and that of the control class 72.91. In the statistical test, the experimental class's learning achievement is better than the control class's. Based on the results of the analysis, it is concluded that (1) the learning achieves mastery, (2) activity positively influences learning achievement, and (3) the experimental class's learning achievement is better than the control class's; that is, learning mathematics with the problem-posing method based on character education in the TeenZania laboratory is effective.

  11. Problems With Section Two ITP TOEFL Test

    Directory of Open Access Journals (Sweden)

    Rizki Ananda

    2016-03-01

    This study was designed to investigate (1) the difficulties faced by EFL university students with section two of the ITP, and (2) whether part A or part B was more difficult for them and why. A total of 26 students from two different universities, Syiah Kuala University and the State Islamic University Ar-Raniry, were the samples for the test. The data were obtained from a multiple-choice questionnaire test consisting of 46 questions, each with 4 answers to choose from. The results showed that inversions (12%), subject-verb agreements (10%), adverb clause connectors (7%), passives (6%), reduced adjective clauses (5%), parallel structures (5%) and use of verbs (5%) were the most difficult questions for the students. Furthermore, they felt that part B was more difficult than part A, as finding an error in a sentence was harder than completing a sentence from a multiple choice. Moreover, the length of questions in part A did not affect the amount of time the students spent on part A and did not cause them to panic. Also, unfamiliar words in part A were not regarded as a problem by the students. Hence, TOEFL teachers and trainers are highly encouraged to pay more attention to study exercises for the seven topics with the highest percentages above in part A and also to more practice for part B.

  12. Physics constrained nonlinear regression models for time series

    International Nuclear Information System (INIS)

    Majda, Andrew J; Harlim, John

    2013-01-01

    A central issue in contemporary science is the development of data-driven statistical nonlinear dynamical models for time series of partial observations of nature or a complex physical model. It has been established recently that ad hoc quadratic multi-level regression (MLR) models can have finite-time blow-up of statistical solutions and/or pathological behaviour of their invariant measure. Here a new class of physics-constrained multi-level quadratic regression models is introduced, analysed and applied to build reduced stochastic models from data of nonlinear systems. These models have the advantage of incorporating memory effects in time as well as the nonlinear noise from energy-conserving nonlinear interactions. The mathematical guidelines for the performance and behaviour of these physics-constrained MLR models, as well as filtering algorithms for their implementation, are developed here. Data-driven applications of these new multi-level nonlinear regression models are developed for test models involving a nonlinear oscillator with memory effects and the difficult test case of the truncated Burgers–Hopf model. These new physics-constrained quadratic MLR models are proposed here as process models for Bayesian estimation, through Markov chain Monte Carlo algorithms, of low-frequency behaviour in complex physical data. (paper)

  13. Large biases in regression-based constituent flux estimates: causes and diagnostic tools

    Science.gov (United States)

    Hirsch, Robert M.

    2014-01-01

    It has been documented in the literature that, in some cases, widely used regression-based models can produce severely biased estimates of long-term mean river fluxes of various constituents. These models, estimated using sample values of concentration, discharge, and date, are used to compute estimated fluxes for a multiyear period at a daily time step. This study compares results of the LOADEST seven-parameter model, LOADEST five-parameter model, and the Weighted Regressions on Time, Discharge, and Season (WRTDS) model using subsampling of six very large datasets to better understand this bias problem. This analysis considers sample datasets for dissolved nitrate and total phosphorus. The results show that LOADEST-7 and LOADEST-5, although they often produce very nearly unbiased results, can produce highly biased results. This study identifies three conditions that can give rise to these severe biases: (1) lack of fit of the log of concentration vs. log discharge relationship, (2) substantial differences in the shape of this relationship across seasons, and (3) severely heteroscedastic residuals. The WRTDS model is more resistant to the bias problem than the LOADEST models but is not immune to them. Understanding the causes of the bias problem is crucial to selecting an appropriate method for flux computations. Diagnostic tools for identifying the potential for bias problems are introduced, and strategies for resolving bias problems are described.

  14. Sparse Regression by Projection and Sparse Discriminant Analysis

    KAUST Repository

    Qi, Xin

    2015-04-03

    Recent years have seen active developments of various penalized regression methods, such as LASSO and elastic net, to analyze high-dimensional data. In these approaches, the direction and length of the regression coefficients are determined simultaneously. Due to the introduction of penalties, the length of the estimates can be far from being optimal for accurate predictions. We introduce a new framework, regression by projection, and its sparse version to analyze high-dimensional data. The unique nature of this framework is that the directions of the regression coefficients are inferred first, and the lengths and the tuning parameters are determined by a cross-validation procedure to achieve the largest prediction accuracy. We provide a theoretical result for simultaneous model selection consistency and parameter estimation consistency of our method in high dimension. This new framework is then generalized such that it can be applied to principal components analysis, partial least squares, and canonical correlation analysis. We also adapt this framework for discriminant analysis. Compared with the existing methods, where there is relatively little control of the dependency among the sparse components, our method can control the relationships among the components. We present efficient algorithms and related theory for solving the sparse regression by projection problem. Based on extensive simulations and real data analysis, we demonstrate that our method achieves good predictive performance and variable selection in the regression setting, and the ability to control relationships between the sparse components leads to more accurate classification. In supplementary materials available online, the details of the algorithms and theoretical proofs, and R codes for all simulation studies are provided.

  15. Estimasi Model Seemingly Unrelated Regression (SUR dengan Metode Generalized Least Square (GLS

    Directory of Open Access Journals (Sweden)

    Ade Widyaningsih

    2015-04-01

    Regression analysis is a statistical tool that is used to determine the relationship between two or more quantitative variables so that one variable can be predicted from the others. A method that can be used to obtain a good estimation in regression analysis is the ordinary least squares method. The least squares method is used to estimate the parameters of one or more regression equations, but it does not allow for relationships among the errors of the different equations. One way to overcome this problem is the Seemingly Unrelated Regression (SUR) model, in which parameters are estimated using Generalized Least Squares (GLS). In this study, the author applies the SUR model using the GLS method to world gasoline demand data. The author finds that SUR using GLS is better than OLS because SUR produces smaller errors than OLS.
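
    The SUR estimator referred to above can be written down compactly: run OLS per equation, estimate the cross-equation error covariance from the residuals, then take one GLS step on the stacked system. A minimal two-equation numpy sketch on hypothetical data (a production analysis would iterate the FGLS step or use a dedicated library):

```python
import numpy as np

def sur_fgls(X1, y1, X2, y2):
    """Two-equation SUR: OLS first, then one feasible-GLS step using the
    estimated cross-equation error covariance."""
    b1 = np.linalg.lstsq(X1, y1, rcond=None)[0]
    b2 = np.linalg.lstsq(X2, y2, rcond=None)[0]
    resid = np.column_stack([y1 - X1 @ b1, y2 - X2 @ b2])
    sigma = resid.T @ resid / len(y1)          # 2x2 error covariance
    # Stack the system and apply GLS with Omega = Sigma kron I.
    X = np.block([[X1, np.zeros_like(X2)], [np.zeros_like(X1), X2]])
    y = np.concatenate([y1, y2])
    omega_inv = np.kron(np.linalg.inv(sigma), np.eye(len(y1)))
    xtoi = X.T @ omega_inv
    return np.linalg.solve(xtoi @ X, xtoi @ y)

# Hypothetical gasoline-demand equations for two countries.
rng = np.random.default_rng(5)
n = 60
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
X2 = np.column_stack([np.ones(n), rng.normal(size=n)])
e = rng.multivariate_normal([0, 0], [[1.0, 0.7], [0.7, 1.0]], size=n)
y1 = X1 @ np.array([2.0, 1.5]) + e[:, 0]
y2 = X2 @ np.array([1.0, -0.8]) + e[:, 1]
print(sur_fgls(X1, y1, X2, y2))  # [b1_const, b1_x, b2_const, b2_x]
```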

  17. Distributed Monitoring of the R² Statistic for Linear Regression

    Science.gov (United States)

    Bhaduri, Kanishka; Das, Kamalika; Giannella, Chris R.

    2011-01-01

    The problem of monitoring a multivariate linear regression model is relevant in studying the evolving relationship between a set of input variables (features) and one or more dependent target variables. This problem becomes challenging for large scale data in a distributed computing environment when only a subset of instances is available at individual nodes and the local data changes frequently. Data centralization and periodic model recomputation can add high overhead to tasks like anomaly detection in such dynamic settings. Therefore, the goal is to develop techniques for monitoring and updating the model over the union of all nodes' data in a communication-efficient fashion. Correctness guarantees on such techniques are also often highly desirable, especially in safety-critical application scenarios. In this paper we develop DReMo, a distributed algorithm with very low resource overhead, for monitoring the quality of a regression model in terms of its coefficient of determination (R² statistic). When the nodes collectively determine that R² has dropped below a fixed threshold, the linear regression model is recomputed via a network-wide convergecast and the updated model is broadcast back to all nodes. We show empirically, using both synthetic and real data, that our proposed method is highly communication-efficient and scalable, and also provide theoretical guarantees on correctness.
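
    The monitored quantity is just the coefficient of determination; the sketch below shows the local threshold check a node might run. The distributed part of DReMo, reaching the decision collectively with correctness guarantees, is the paper's contribution and is not modeled here.

```python
import numpy as np

def r_squared(y, y_pred):
    """Coefficient of determination (R^2) of a fitted model."""
    ss_res = np.sum((y - y_pred) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

def needs_recompute(y, y_pred, threshold=0.8):
    """Flag that the regression model should be recomputed network-wide."""
    return r_squared(y, y_pred) < threshold

# Hypothetical local data at one node.
rng = np.random.default_rng(12)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(0, 0.5, 200)
y_pred = 2.1 * x  # predictions from the current (possibly stale) model
print(needs_recompute(y, y_pred))
```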

  18. Assessing risk factors for periodontitis using regression

    Science.gov (United States)

    Lobo Pereira, J. A.; Ferreira, Maria Cristina; Oliveira, Teresa

    2013-10-01

    Multivariate statistical analysis is indispensable to assess the associations and interactions between different factors and the risk of periodontitis. Among others, regression analysis is a statistical technique widely used in healthcare to investigate and model the relationship between variables. In our work we study the impact of socio-demographic, medical and behavioral factors on periodontal health. Using linear and logistic regression models, we assess the relevance, as risk factors for periodontitis, of the following independent variables (IVs): Age, Gender, Diabetic Status, Education, Smoking Status and Plaque Index. The multiple linear regression model was built to evaluate the influence of the IVs on mean Attachment Loss (AL). Thus, the regression coefficients are obtained, along with the p-values from the corresponding significance tests. The classification of a case (individual) adopted in the logistic model was the extent of destruction of periodontal tissues, defined as an Attachment Loss greater than or equal to 4 mm in at least 25% (AL≥4mm/≥25%) of the sites surveyed. The association measures include the Odds Ratios together with the corresponding 95% confidence intervals.

  19. Early breastfeeding problems

    DEFF Research Database (Denmark)

    Feenstra, Maria Monberg; Kirkeby, Mette Jørgine; Thygesen, Marianne

    2018-01-01

    Objectives Breastfeeding problems are common and associated with early cessation. Still, the length of postpartum hospital stay has been reduced. This leaves new mothers to establish breastfeeding at home with less support from health care professionals. The objective was to explore mothers' perspectives on when breastfeeding problems were most challenging and prominent in the early postnatal period. The aim was also to identify possible factors associated with the breastfeeding problems. Methods In a cross-sectional study, a mixed-method approach was used to analyse postal survey data from 1437 mothers with full-term singleton infants. Content analysis was used to analyse mothers' open-text descriptions of their most challenging breastfeeding problem. Multiple logistic regression was used to calculate odds ratios for early breastfeeding problems according to sociodemographic and psychosocial factors. Results …

  20. Estimation of Geographically Weighted Regression Case Study on Wet Land Paddy Productivities in Tulungagung Regency

    Directory of Open Access Journals (Sweden)

    Danang Ariyanto

    2017-11-01

    Regression is a method connecting independent variables and a dependent variable, with estimated parameters as output. A principal problem of this method is its application to spatial data. The Geographically Weighted Regression (GWR) method is used to solve this problem. GWR is a regression technique that extends the traditional regression framework by allowing the estimation of local rather than global parameters. In other words, GWR runs a regression for each location, instead of a sole regression for the entire study area. The purpose of this research is to analyze the factors influencing wet land paddy productivities in Tulungagung Regency. The method used in this research is GWR with a cross-validation bandwidth and weights from an adaptive Gaussian kernel function. This research uses 4 variables presumed to affect wet land paddy productivities: the rate of rainfall (X1), the average cost of fertilizer per hectare (X2), the average cost of pesticides per hectare (X3) and the allocation of subsidized NPK fertilizer of the food crops sub-sector (X4). Based on the results, X1, X2, X3 and X4 have different effects in each district. So, improving the productivity of wet land paddy in Tulungagung Regency requires a policy tailored to the GWR model in each district.
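
    GWR's core loop is one weighted regression per location. A minimal fixed-bandwidth sketch with a Gaussian kernel on hypothetical district data; the cross-validated bandwidth and the adaptive kernel used in the study are omitted here.

```python
import numpy as np

def gwr_fit(coords, X, y, bandwidth):
    """Fit one weighted least-squares regression per location, with
    Gaussian kernel weights that decay with geographic distance."""
    X1 = np.column_stack([np.ones(len(y)), X])   # add intercept
    betas = []
    for c in coords:
        d = np.linalg.norm(coords - c, axis=1)
        w = np.exp(-0.5 * (d / bandwidth) ** 2)  # Gaussian kernel
        W = np.diag(w)
        beta = np.linalg.solve(X1.T @ W @ X1, X1.T @ W @ y)
        betas.append(beta)
    return np.array(betas)                       # one coefficient row per site

# Hypothetical district centroids and predictors X1..X4 for paddy yield.
rng = np.random.default_rng(9)
coords = rng.uniform(0, 50, size=(19, 2))        # e.g., 19 districts
X = rng.normal(size=(19, 4))
y = (5 + X @ np.array([0.8, 0.4, -0.3, 0.6])
     + 0.05 * coords[:, 0] + rng.normal(0, 0.2, 19))
print(gwr_fit(coords, X, y, bandwidth=15.0).shape)  # (19, 5)
```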

  1. [Development of a proverb test for assessment of concrete thinking problems in schizophrenic patients].

    Science.gov (United States)

    Barth, A; Küfferle, B

    2001-11-01

    Concretism is considered an important aspect of schizophrenic thought disorder. Traditionally it is measured using the method of proverb interpretation, in which metaphoric proverbs are presented with the request that the subject tell their meaning. Interpretations are recorded and scored for concretistic tendencies. However, this method has two problems: its reliability is doubtful and it is rather complicated to perform. In this paper, a new version of a multiple choice proverb test is presented which solves these problems in a reliable and economic manner. Using the new test, it has been shown that schizophrenic patients have greater deficits in proverb interpretation than depressive patients.

  2. The 1-min Screening Test for Reading Problems in College Students: Psychometric Properties of the 1-min TIL.

    Science.gov (United States)

    Fernandes, Tânia; Araújo, Susana; Sucena, Ana; Reis, Alexandra; Castro, São Luís

    2017-02-01

    Reading is a central cognitive domain, but little research has been devoted to standardized tests for adults. We thus examined the psychometric properties of the 1-min version of Teste de Idade de Leitura (Reading Age Test; 1-min TIL), the Portuguese version of the Lobrot L3 test, in three experiments with college students: typical readers in Experiment 1A and B, and dyslexic readers and chronological age controls in Experiment 2. In Experiment 1A, test-retest reliability and convergent validity were evaluated in 185 students. Reliability was >.70, and phonological decoding underpinned the 1-min TIL. In Experiment 1B, internal consistency was assessed by presenting two 45-s versions of the test to 19 students, and performance in these versions was significantly associated (r = .78). In Experiment 2, construct validity, criterion validity and clinical utility of the 1-min TIL were investigated. A multiple regression analysis corroborated construct validity; both phonological decoding and listening comprehension were reliable predictors of 1-min TIL scores. Logistic regression and receiver operating characteristic analyses revealed the high accuracy of this test in distinguishing dyslexic from typical readers. Therefore, the 1-min TIL, which assesses reading comprehension and potential reading difficulties in college students, has the necessary psychometric properties to become a useful screening instrument in neuropsychological assessment and research.

  3. The application of an artificial immune system for solving the identification problem

    Directory of Open Access Journals (Sweden)

    Astachova Irina

    2017-01-01

    Ecological prognosis sets the identification task, which is to find the capacity of pollution sources based on the available experimental data. This is an inverse problem, for the solution of which the method of symbolic regression is considered. A distributed artificial immune system is used as the algorithm for solving the problem. The artificial immune system (AIS) is a model that allows solving various identification problems; its concept was borrowed from biology. The solution is sought using a distributed version of the artificial immune system, which is implemented over a network. This distributed network can operate in any heterogeneous environment, which is achieved through the use of the cross-platform Python programming language. The AIS demonstrates the ability to recover the original function in the identification problem. The solution obtained for the test data is represented by a graph.
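
    The identification task, recovering a function's form from data, is symbolic regression. The authors' solver is a bespoke distributed artificial immune system; purely as a stand-in illustration, the sketch below applies genetic programming (a different population-based metaheuristic) via the gplearn package, assuming it is installed, to a hypothetical target function.

```python
# pip install gplearn
import numpy as np
from gplearn.genetic import SymbolicRegressor

# Hypothetical inverse problem: recover a source term from observations
# (here with a known ground truth so the result can be checked).
rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, size=(200, 2))
y = X[:, 0] ** 2 - 0.5 * X[:, 1] + 0.1  # the "unknown" function

est = SymbolicRegressor(population_size=500, generations=20,
                        function_set=("add", "sub", "mul"),
                        random_state=0, verbose=0)
est.fit(X, y)
print(est._program)  # best evolved symbolic expression
```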

  4. Inferring gene expression dynamics via functional regression analysis

    Directory of Open Access Journals (Sweden)

    Leng Xiaoyan

    2008-01-01

    Abstract Background Temporal gene expression profiles characterize the time-dynamics of expression of specific genes and are increasingly collected in current gene expression experiments. In the analysis of experiments where gene expression is obtained over the life cycle, it is of interest to relate temporal patterns of gene expression associated with different developmental stages to each other to study patterns of long-term developmental gene regulation. We use tools from functional data analysis to study dynamic changes by relating temporal gene expression profiles of different developmental stages to each other. Results We demonstrate that functional regression methodology can pinpoint relationships that exist between temporal gene expression profiles for different life cycle phases and incorporates dimension reduction as needed for these high-dimensional data. By applying these tools, gene expression profiles for pupa and adult phases are found to be strongly related to the profiles of the same genes obtained during the embryo phase. Moreover, one can distinguish between gene groups that exhibit relationships with positive and others with negative associations between later life and embryonal expression profiles. Specifically, we find a positive relationship in expression for muscle development related genes, and a negative relationship for strictly maternal genes for Drosophila, using temporal gene expression profiles. Conclusion Our findings point to specific reactivation patterns of gene expression during the Drosophila life cycle which differ in characteristic ways between various gene groups. Functional regression emerges as a useful tool for relating gene expression patterns from different developmental stages, and avoids the problems with large numbers of parameters and multiple testing that affect alternative approaches.

  5. Investigating the Effect of Complexity Factors in Stoichiometry Problems Using Logistic Regression and Eye Tracking

    Science.gov (United States)

    Tang, Hui; Kirk, John; Pienta, Norbert J.

    2014-01-01

    This paper includes two experiments, one investigating complexity factors in stoichiometry word problems, and the other identifying students' problem-solving protocols by using eye-tracking technology. The word problems used in this study had five different complexity factors, which were randomly assigned by a Web-based tool that we developed. The…

  6. Modelling infant mortality rate in Central Java, Indonesia use generalized poisson regression method

    Science.gov (United States)

    Prahutama, Alan; Sudarno

    2018-05-01

    The infant mortality rate is the number of deaths under one year of age occurring among the live births in a given geographical area during a given year, per 1,000 live births occurring among the population of the given geographical area during the same year. This problem needs to be addressed because it is an important element of a country's economic development. A high infant mortality rate will disrupt the stability of a country as it relates to the sustainability of the population in the country. One regression model that can be used to analyze the relationship between a discrete dependent variable Y and independent variables X is the Poisson regression model. The regression models recently used for data with a discrete dependent variable include, among others, Poisson regression, negative binomial regression and generalized Poisson regression. In this research, generalized Poisson regression modelling gives a better AIC value than Poisson regression. The most significant variable is the number of health facilities (X1), while the variable that gives the most influence on the infant mortality rate is average breastfeeding (X9).
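
    The reported model comparison, generalized Poisson beating Poisson on AIC, can be reproduced on any overdispersed count data with statsmodels. The covariates and effect sizes below are invented stand-ins for the paper's district-level variables.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.discrete.discrete_model import GeneralizedPoisson, Poisson

# Hypothetical district-level data: health facilities (x1) and average
# breastfeeding duration (x2) vs. infant death counts.
rng = np.random.default_rng(8)
n = 35
X = sm.add_constant(np.column_stack([rng.poisson(20, n), rng.normal(6, 1, n)]))
mu = np.exp(2.0 - 0.03 * X[:, 1] - 0.1 * (X[:, 2] - 6))
y = rng.poisson(mu * rng.gamma(2.0, 0.5, n))   # overdispersed counts

poisson_fit = Poisson(y, X).fit(disp=0)
gp_fit = GeneralizedPoisson(y, X).fit(disp=0)
print(f"Poisson AIC: {poisson_fit.aic:.1f}, "
      f"Generalized Poisson AIC: {gp_fit.aic:.1f}")  # smaller is better
```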

  7. Multi-platform SCADA GUI Regression Testing at CERN

    CERN Document Server

    Burkimsher, P C; Klikovits, S

    2011-01-01

    The JCOP Framework is a toolkit used widely at CERN for the development of industrial control systems in several domains (i.e. experiments, accelerators and technical infrastructure). The software development started 10 years ago and there is now a large base of production systems running it. For the success of the project, it was essential to formalize and automate the quality assurance process. This paper will present the overall testing strategy and will describe in detail mechanisms used for GUI testing. The choice of a commercial tool (Squish) and the architectural features making it appropriate for our multi-platform environment will be described. Practical difficulties encountered when using the tool in the CERN context are discussed as well as how these were addressed. In the light of initial experience, the test code itself has been recently reworked in object-oriented style to facilitate future maintenance and extension. The current reporting process is described, as well as future plans for easy re...

  8. Exposure to child abuse and risk for mental health problems in women.

    Science.gov (United States)

    Schneider, Renee; Baumrind, Nikki; Kimerling, Rachel

    2007-01-01

    Risk for adult mental health problems associated with child sexual, physical, or emotional abuse and multiple types of child abuse was examined. Logistic regression analyses were used to test study hypotheses in a population-based sample of women (N = 3,936). As expected, child sexual, physical, and emotional abuse were independently associated with increased risk for mental health problems. History of multiple types of child abuse was also associated with elevated risk for mental health problems. In particular, exposure to all three types of child abuse was linked to a 23-fold increase in risk for probable posttraumatic stress disorder (PTSD). Findings underscore relations between child emotional abuse and adult mental health problems and highlight the need for mental health services for survivors of multiple types of child abuse.

  9. A Comparison of Advanced Regression Algorithms for Quantifying Urban Land Cover

    Directory of Open Access Journals (Sweden)

    Akpona Okujeni

    2014-07-01

    Quantitative methods for mapping sub-pixel land cover fractions are gaining increasing attention, particularly with regard to upcoming hyperspectral satellite missions. We evaluated five advanced regression algorithms combined with synthetically mixed training data for quantifying urban land cover from HyMap data at 3.6 and 9 m spatial resolution. Methods included support vector regression (SVR), kernel ridge regression (KRR), artificial neural networks (NN), random forest regression (RFR) and partial least squares regression (PLSR). Our experiments demonstrate that both kernel methods, SVR and KRR, yield high accuracies for mapping complex urban surface types, i.e., rooftops, pavements, grass- and tree-covered areas. SVR and KRR models proved to be stable with regard to the spatial and spectral differences between both images and effectively utilized the higher complexity of the synthetic training mixtures for improving estimates for coarser resolution data. Observed deficiencies mainly relate to known problems arising from spectral similarities or shadowing. The remaining regressors either revealed erratic (NN) or limited (RFR and PLSR) performances when comprehensively mapping urban land cover. Our findings suggest that the combination of kernel-based regression methods, such as SVR and KRR, with synthetically mixed training data is well suited for quantifying urban land cover from imaging spectrometer data at multiple scales.
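
    A scaled-down version of the comparison, with the two kernel methods fitted to synthetically mixed training data and scored on fraction estimates, might look as follows in scikit-learn; the synthetic 'spectra' and the hyperparameters are placeholders, not the study's HyMap setup.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import cross_val_score

# Hypothetical synthetic mixtures: reflectance spectra (rows) with known
# sub-pixel fractions of one surface type as the regression target.
rng = np.random.default_rng(4)
spectra = rng.uniform(0, 1, size=(300, 100))     # 100 spectral bands
fractions = np.clip(spectra[:, :10].mean(axis=1) +
                    rng.normal(0, 0.05, 300), 0, 1)

for name, model in [("SVR", SVR(kernel="rbf", C=10.0, epsilon=0.05)),
                    ("KRR", KernelRidge(kernel="rbf", alpha=0.1))]:
    score = cross_val_score(model, spectra, fractions,
                            scoring="neg_mean_absolute_error", cv=5)
    print(f"{name}: MAE = {-score.mean():.3f}")
```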

  10. 3D Modeling and Simulation for Electromagnetic Non-Destructive Testing- Problems and Limitations

    International Nuclear Information System (INIS)

    Ilham Mukriz Zainal Abidin; Nurul Ain Ahmad Latif

    2011-01-01

    Non-Destructive Testing (NDT) plays a critical role in nuclear power plants (NPPs) for life cycle management; such testing requires specialists with various NDT-related expertise and specific equipment. This paper discusses the importance of 3D modeling and simulation for electromagnetic NDT of critical and complex components, in terms of engineering reasoning and physical trials. Results from simulation are presented which show the link established between the measurements and information relating to defects, such as 3D shape, size and location. This facilitates not only the forward problem but also inverse modeling, involving experimental system specification and configuration, and pattern recognition for 3D defect information. Subsequently, the problems and limitations pertinent to 3D modeling and simulation are highlighted and areas of improvement are discussed. (author)

  11. No rationale for 1 variable per 10 events criterion for binary logistic regression analysis.

    Science.gov (United States)

    van Smeden, Maarten; de Groot, Joris A H; Moons, Karel G M; Collins, Gary S; Altman, Douglas G; Eijkemans, Marinus J C; Reitsma, Johannes B

    2016-11-24

    Ten events per variable (EPV) is a widely advocated minimal criterion for sample size considerations in logistic regression analysis. Of three previous simulation studies that examined this minimal EPV criterion only one supports the use of a minimum of 10 EPV. In this paper, we examine the reasons for substantial differences between these extensive simulation studies. The current study uses Monte Carlo simulations to evaluate small sample bias, coverage of confidence intervals and mean square error of logit coefficients. Logistic regression models fitted by maximum likelihood and a modified estimation procedure, known as Firth's correction, are compared. The results show that besides EPV, the problems associated with low EPV depend on other factors such as the total sample size. It is also demonstrated that simulation results can be dominated by even a few simulated data sets for which the prediction of the outcome by the covariates is perfect ('separation'). We reveal that different approaches for identifying and handling separation lead to substantially different simulation results. We further show that Firth's correction can be used to improve the accuracy of regression coefficients and alleviate the problems associated with separation. The current evidence supporting EPV rules for binary logistic regression is weak. Given our findings, there is an urgent need for new research to provide guidance for supporting sample size considerations for binary logistic regression analysis.
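
    For readers unfamiliar with Firth's correction, the following bare-bones numpy sketch (a Jeffreys-prior penalized Newton iteration of our own; not the authors' simulation code) shows how it yields finite coefficients on perfectly separated data, where plain maximum likelihood diverges:

        import numpy as np

        def firth_logit(X, y, n_iter=100, tol=1e-8):
            # Newton iterations on the Jeffreys-penalized likelihood; the extra
            # h*(0.5 - p) term in the score keeps estimates finite even when
            # the outcome is perfectly separated by the covariates.
            beta = np.zeros(X.shape[1])
            for _ in range(n_iter):
                p = 1.0 / (1.0 + np.exp(-X @ beta))
                W = p * (1.0 - p)
                cov = np.linalg.inv(X.T @ (W[:, None] * X))
                h = W * np.einsum("ij,jk,ik->i", X, cov, X)  # hat diagonals
                step = cov @ (X.T @ (y - p + h * (0.5 - p)))
                beta += step
                if np.max(np.abs(step)) < tol:
                    break
            return beta

        # Perfectly separated toy data: ML estimates diverge, Firth's do not.
        x = np.arange(6.0)
        X = np.column_stack([np.ones(6), x])
        y = (x > 2.5).astype(float)
        print(firth_logit(X, y))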

  12. Simultaneous Estimation of Regression Functions for Marine Corps Technical Training Specialties.

    Science.gov (United States)

    Dunbar, Stephen B.; And Others

    This paper considers the application of Bayesian techniques for simultaneous estimation to the specification of regression weights for selection tests used in various technical training courses in the Marine Corps. Results of a method for m-group regression developed by Molenaar and Lewis (1979) suggest that common weights for training courses…

  13. The Screening Test for Emotional Problems-Parent Report (STEP-P): Studies of Reliability and Validity

    Science.gov (United States)

    Erford, Bradley T.; Alsamadi, Silvana C.

    2012-01-01

    Score reliability and validity of parent responses concerning their 10- to 17-year-old students were analyzed using the Screening Test for Emotional Problems-Parent Report (STEP-P), which assesses a variety of emotional problems classified under the Individuals with Disabilities Education Improvement Act. Score reliability, convergent, and…

  14. Biostatistics Series Module 6: Correlation and Linear Regression.

    Science.gov (United States)

    Hazra, Avijit; Gogtay, Nithya

    2016-01-01

    Correlation and linear regression are the most commonly used techniques for quantifying the association between two numeric variables. Correlation quantifies the strength of the linear relationship between paired variables, expressing this as a correlation coefficient. If both variables x and y are normally distributed, we calculate Pearson's correlation coefficient (r). If the normality assumption is not met for one or both variables, a rank correlation coefficient, such as Spearman's rho (ρ), may be calculated. A hypothesis test of correlation tests whether the linear relationship between the two variables holds in the underlying population and returns a P value; a confidence interval for the correlation coefficient can also be calculated to give an idea of the correlation in the population. The value r² denotes the proportion of the variability of the dependent variable y that can be attributed to its linear relation with the independent variable x and is called the coefficient of determination. Linear regression is a technique that attempts to link two correlated variables x and y in the form of a mathematical equation (y = a + bx), such that given the value of one variable the other may be predicted. In general, the method of least squares is applied to obtain the equation of the regression line. Correlation and linear regression analysis are based on certain assumptions pertaining to the data sets. If these assumptions are not met, misleading conclusions may be drawn. The first assumption is that of a linear relationship between the two variables. A scatter plot is essential before embarking on any correlation-regression analysis to show that this is indeed the case. Outliers or clustering within data sets can distort the correlation coefficient value. Finally, it is vital to remember that though strong correlation can be a pointer toward causation, the two are not synonymous.
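
    The quantities described above map directly onto standard library calls; a short illustration with simulated paired data (the numbers are invented):

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(1)
        x = rng.normal(50, 10, 40)               # illustrative paired measurements
        y = 2.0 + 0.5 * x + rng.normal(0, 5, 40)

        r, p = stats.pearsonr(x, y)              # Pearson's r and its P value
        rho, p_rho = stats.spearmanr(x, y)       # rank-based alternative
        fit = stats.linregress(x, y)             # least-squares line y = a + b*x

        print(f"r = {r:.2f} (P = {p:.3g}), rho = {rho:.2f}")
        print(f"y = {fit.intercept:.2f} + {fit.slope:.2f}x, r^2 = {fit.rvalue**2:.2f}")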

  15. Exhaustive Search for Sparse Variable Selection in Linear Regression

    Science.gov (United States)

    Igarashi, Yasuhiko; Takenaka, Hikaru; Nakanishi-Ohno, Yoshinori; Uemura, Makoto; Ikeda, Shiro; Okada, Masato

    2018-04-01

    We propose a K-sparse exhaustive search (ES-K) method and a K-sparse approximate exhaustive search method (AES-K) for selecting variables in linear regression. With these methods, K-sparse combinations of variables are tested exhaustively, assuming that the optimal combination of explanatory variables is K-sparse. By collecting the results of exhaustively computing ES-K, various approximate methods for selecting sparse variables can be summarized as a density of states. With this density of states, we can compare different methods for selecting sparse variables, such as relaxation and sampling. For large problems, where the combinatorial explosion of explanatory variables is crucial, the AES-K method enables the density of states to be effectively reconstructed by using the replica-exchange Monte Carlo method and the multiple histogram method. Applying the ES-K and AES-K methods to type Ia supernova data, we confirmed the conventional understanding in astronomy when an appropriate K is given beforehand. However, we found it difficult to determine K from the data alone. Using virtual measurement and analysis, we argue that this is caused by data shortage.
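
    The exhaustive K-sparse idea is easy to sketch for small problems; the following toy implementation (scoring combinations by residual sum of squares; the density-of-states bookkeeping of the full method is omitted) is our illustration, not the authors' code:

        import numpy as np
        from itertools import combinations

        def es_k(X, y, k):
            # Score every k-sparse column combination by its least-squares
            # residual sum of squares and return them sorted (best first).
            scored = []
            for cols in combinations(range(X.shape[1]), k):
                Xk = X[:, cols]
                beta, *_ = np.linalg.lstsq(Xk, y, rcond=None)
                scored.append((np.sum((y - Xk @ beta) ** 2), cols))
            return sorted(scored)

        rng = np.random.default_rng(0)
        X = rng.normal(size=(60, 10))
        y = X[:, 2] - 2.0 * X[:, 7] + rng.normal(0, 0.1, 60)
        print(es_k(X, y, k=2)[0])   # best pair should be columns (2, 7)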

  16. Regression: The Apple Does Not Fall Far From the Tree.

    Science.gov (United States)

    Vetter, Thomas R; Schober, Patrick

    2018-05-15

    Researchers and clinicians are frequently interested in either: (1) assessing whether there is a relationship or association between 2 or more variables and quantifying this association; or (2) determining whether 1 or more variables can predict another variable. The strength of such an association is mainly described by the correlation. However, regression analysis and regression models can be used not only to identify whether there is a significant relationship or association between variables but also to generate estimations of such a predictive relationship between variables. This basic statistical tutorial discusses the fundamental concepts and techniques related to the most common types of regression analysis and modeling, including simple linear regression, multiple regression, logistic regression, ordinal regression, and Poisson regression, as well as the common yet often underrecognized phenomenon of regression toward the mean. The various types of regression analysis are powerful statistical techniques, which when appropriately applied, can allow for the valid interpretation of complex, multifactorial data. Regression analysis and models can assess whether there is a relationship or association between 2 or more observed variables and estimate the strength of this association, as well as determine whether 1 or more variables can predict another variable. Regression is thus being applied more commonly in anesthesia, perioperative, critical care, and pain research. However, it is crucial to note that regression can identify plausible risk factors; it does not prove causation (a definitive cause and effect relationship). The results of a regression analysis instead identify independent (predictor) variable(s) associated with the dependent (outcome) variable. As with other statistical methods, applying regression requires that certain assumptions be met, which can be tested with specific diagnostics.

  17. A method for nonlinear exponential regression analysis

    Science.gov (United States)

    Junkin, B. G.

    1971-01-01

    A computer-oriented technique is presented for performing a nonlinear exponential regression analysis on decay-type experimental data. The technique involves the least squares procedure wherein the nonlinear problem is linearized by expansion in a Taylor series. A linear curve fitting procedure for determining the initial nominal estimates for the unknown exponential model parameters is included as an integral part of the technique. A correction matrix was derived and then applied to the nominal estimate to produce an improved set of model parameters. The solution cycle is repeated until some predetermined criterion is satisfied.
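
    The described scheme is essentially a Gauss-Newton iteration; a compact sketch for the model y = a*exp(-b*t) (initial estimates are passed in directly here, whereas the technique above derives them from a preliminary linear curve fit):

        import numpy as np

        def fit_exponential(t, y, a, b, n_iter=20):
            # Gauss-Newton: linearize y = a*exp(-b*t) by a first-order Taylor
            # expansion around the current (a, b), solve the linear least-squares
            # system for the correction, and repeat.
            for _ in range(n_iter):
                f = a * np.exp(-b * t)
                J = np.column_stack([np.exp(-b * t), -a * t * np.exp(-b * t)])
                delta, *_ = np.linalg.lstsq(J, y - f, rcond=None)
                a, b = a + delta[0], b + delta[1]
            return a, b

        t = np.linspace(0, 5, 50)
        y = 3.0 * np.exp(-1.2 * t) + np.random.default_rng(0).normal(0, 0.02, 50)
        print(fit_exponential(t, y, a=1.0, b=0.5))   # approaches (3.0, 1.2)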

  18. Fixed kernel regression for voltammogram feature extraction

    International Nuclear Information System (INIS)

    Acevedo Rodriguez, F J; López-Sastre, R J; Gil-Jiménez, P; Maldonado Bascón, S; Ruiz-Reyes, N

    2009-01-01

    Cyclic voltammetry is an electroanalytical technique for obtaining information about substances under analysis without the need for complex flow systems. However, classifying the information in voltammograms obtained using this technique is difficult. In this paper, we propose the use of fixed kernel regression as a method for extracting features from these voltammograms, reducing the information to a few coefficients. The proposed approach has been applied to a wine classification problem with accuracy rates of over 98%. Although the method is described here for extracting voltammogram information, it can be used for other types of signals

  19. Linear regression and sensitivity analysis in nuclear reactor design

    International Nuclear Information System (INIS)

    Kumar, Akansha; Tsvetkov, Pavel V.; McClarren, Ryan G.

    2015-01-01

    Highlights: • Presented a benchmark for the applicability of linear regression to complex systems. • Applied linear regression to a nuclear reactor power system. • Performed neutronics, thermal–hydraulics, and energy conversion using Brayton’s cycle for the design of a GCFBR. • Performed detailed sensitivity analysis for a set of parameters in a nuclear reactor power system. • Modeled and developed the reactor design using MCNP, regression using R, and thermal–hydraulics in Java. - Abstract: The paper presents a general strategy applicable for sensitivity analysis (SA) and uncertainty quantification analysis (UA) of parameters related to a nuclear reactor design. This work also validates the use of linear regression (LR) for predictive analysis in nuclear reactor design. The analysis helps to determine the parameters on which a LR model can be fit for predictive analysis. For those parameters, a regression surface is created based on trial data and predictions are made using this surface. A general strategy of SA to determine and identify the influential parameters that affect the operation of the reactor is presented. Identification of design parameters and validation of the linearity assumption for the application of LR to reactor design, based on a set of tests, is performed. The testing methods used to determine the behavior of the parameters can be used as a general strategy for UA and SA of nuclear reactor models and thermal hydraulics calculations. A design of a gas cooled fast breeder reactor (GCFBR), with thermal–hydraulics and energy transfer, has been used for the demonstration of this method. MCNP6 is used to simulate the GCFBR design and perform the necessary criticality calculations. Java is used to build and run input samples and to extract data from the output files of MCNP6, and R is used to perform regression analysis, multivariate variance analysis, and analysis of the collinearity of the data.

  20. THE DETERMINATION OF BETA COEFFICIENTS OF PUBLICLY-HELD COMPANIES BY A REGRESSION MODEL AND AN APPLICATION ON PRIVATE FIRMS

    Directory of Open Access Journals (Sweden)

    METİN KAMİL ERCAN

    2013-06-01

    Full Text Available It is possible to determine the value of private companies by means of suggestions and assumptions derived from their financial statements. However, a serious problem arises in determining the equity costs of these private companies using the Capital Assets Pricing Model (CAPM), as their beta coefficients are unknown or unavailable. In this study, firstly, a regression model that represents the relationship between the beta coefficients and financial statement variables of publicly-held companies will be developed. Then, this model will be tested and applied on private companies.

  1. Regression tools for CO2 inversions: application of a shrinkage estimator to process attribution

    International Nuclear Information System (INIS)

    Shaby, Benjamin A.; Field, Christopher B.

    2006-01-01

    In this study we perform an atmospheric inversion based on a shrinkage estimator. This method is used to estimate surface fluxes of CO2, first partitioned according to constituent geographic regions, and then according to constituent processes that are responsible for the total flux. Our approach differs from previous approaches in two important ways. The first is that the technique of linear Bayesian inversion is recast as a regression problem. Seen as such, standard regression tools are employed to analyse and reduce errors in the resultant estimates. A shrinkage estimator, which combines standard ridge regression with the linear 'Bayesian inversion' model, is introduced. This method introduces additional bias into the model with the aim of reducing variance such that errors are decreased overall. Compared with standard linear Bayesian inversion, the ridge technique seems to reduce both flux estimation errors and prediction errors. The second divergence from previous studies is that instead of dividing the world into geographically distinct regions and estimating the CO2 flux in each region, the flux space is divided conceptually into processes that contribute to the total global flux. Formulating the problem in this manner adds to the interpretability of the resultant estimates and attempts to shed light on the problem of attributing sources and sinks to their underlying mechanisms.
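
    The shrinkage idea can be seen in a few lines: the closed-form ridge estimator below (a generic sketch with invented, nearly collinear "process" columns, not the study's inversion setup) trades a little bias for a large reduction in variance:

        import numpy as np

        def ridge(X, y, lam):
            # Closed-form ridge estimate: (X'X + lam*I)^{-1} X'y.
            return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

        rng = np.random.default_rng(0)
        base = rng.normal(size=(30, 1))
        X = base + 0.1 * rng.normal(size=(30, 5))   # nearly collinear "processes"
        y = X @ np.array([1.0, -0.5, 0.8, 0.0, 0.3]) + rng.normal(0, 0.1, 30)

        for lam in (0.0, 1.0, 10.0):   # lam = 0 recovers ordinary least squares
            print(lam, np.round(ridge(X, y, lam), 2))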

  2. Optimal choice of basis functions in the linear regression analysis

    International Nuclear Information System (INIS)

    Khotinskij, A.M.

    1988-01-01

    The problem of the optimal choice of basis functions in linear regression analysis is investigated. A stepwise algorithm, with an estimate of its efficiency that holds at a finite number of measurements, is suggested. Conditions providing a probability of correct choice close to 1 are formulated. Application of the stepwise algorithm to the analysis of decay curves is substantiated. 8 refs

  3. Leak testing of cryogenic components — problems and solutions

    Science.gov (United States)

    Srivastava, S. P.; Pandarkar, S. P.; Unni, T. G.; Sinha, A. K.; Mahajan, K.; Suthar, R. L.

    2008-05-01

    moderator pot was driving the MSLD out of range. Since it was very difficult to locate the leak by the Tracer Probe Method, another technique was tried to solve the problem of leak location. Finally, it was possible to locate the leak by observing the change in the helium background reading of the MSLD during masking/unmasking of the welded joints. This paper, in general, describes the design and leak testing aspects of cryogenic components of the Cold Neutron Source and, in particular, the problems and solutions for leak testing of the transfer lines and moderator pot.

  4. Principal Covariates Clusterwise Regression (PCCR): Accounting for Multicollinearity and Population Heterogeneity in Hierarchically Organized Data.

    Science.gov (United States)

    Wilderjans, Tom Frans; Vande Gaer, Eva; Kiers, Henk A L; Van Mechelen, Iven; Ceulemans, Eva

    2017-03-01

    In the behavioral sciences, many research questions pertain to a regression problem in that one wants to predict a criterion on the basis of a number of predictors. Although in many cases, ordinary least squares regression will suffice, sometimes the prediction problem is more challenging, for three reasons: first, multiple highly collinear predictors can be available, making it difficult to grasp their mutual relations as well as their relations to the criterion. In that case, it may be very useful to reduce the predictors to a few summary variables, on which one regresses the criterion and which at the same time yields insight into the predictor structure. Second, the population under study may consist of a few unknown subgroups that are characterized by different regression models. Third, the obtained data are often hierarchically structured, with, for instance, observations being nested into persons or participants within groups or countries. Although some methods have been developed that partially meet these challenges (i.e., principal covariates regression (PCovR), clusterwise regression (CR), and structural equation models), none of these methods adequately deals with all of them simultaneously. To fill this gap, we propose the principal covariates clusterwise regression (PCCR) method, which combines the key ideas behind PCovR (de Jong & Kiers in Chemom Intell Lab Syst 14(1-3):155-164, 1992) and CR (Späth in Computing 22(4):367-373, 1979). The PCCR method is validated by means of a simulation study and by applying it to cross-cultural data regarding satisfaction with life.

  5. Firstborn's regression and growth in the process of becoming a sibling

    Directory of Open Access Journals (Sweden)

    Débora Silva Oliveira

    2013-03-01

    Full Text Available Regression and growth indicators in the process of becoming a sibling were investigated. Three preschool-aged firstborns took part in the study during the third trimester of the mother's pregnancy and when the sibling was 12 and 24 months old. The Fables Test was used and a qualitative content analysis was carried out. Results revealed regression indicators during pregnancy. At 12 and 24 months there were growth indicators together with regression indicators. Regression was used by the firstborn for coping with the sibling's arrival, while growth revealed the capacity for acquisitions or the costs of being an older sibling. Both regressive and growth manifestations enabled a healthy to and fro, which is fundamental for development towards independence. These findings have both research and clinical implications.

  6. Some problems on materials tests in high temperature hydrogen base gas mixture

    International Nuclear Information System (INIS)

    Shikama, Tatsuo; Tanabe, Tatsuhiko; Fujitsuka, Masakazu; Yoshida, Heitaro; Watanabe, Ryoji

    1980-01-01

    Some problems have been examined in materials tests (creep rupture tests and corrosion tests) in a high temperature hydrogen-base gas mixture (80% H2 + 15% CO + 5% CO2) simulating the reducing gas for direct steel making. H2, CO, CO2 and CH4 in the reducing gas interact with each other at elevated temperature and produce water vapor (H2O) and carbon (soot). Carbon deposited on the walls of retorts, and water condensed in the piping at the lower temperature gas outlet, cause blocking of gas flow. The gas reactions have been found to be catalyzed by the retort walls, and appropriate selection of the materials for retorts has been found to mitigate the problems caused by water condensation and carbon deposition. Quartz has been recognized as one of the most promising materials for minimizing the gas reactions. A ceramic coating, namely BN (boron nitride), on the heat resistant superalloy MO-RE II has reduced the amounts of water vapor and deposited carbon (sooting) produced by gas reactions and has kept dew points of the outlet gas below room temperature. The well-known emf (thermo-electromotive force) deterioration of Alumel-Chromel thermocouples in reducing gases at elevated temperatures has also been found to be prevented by the ceramic (BN) coating. (author)

  7. Attitude and practice of physical activity and social problem-solving ability among university students.

    Science.gov (United States)

    Sone, Toshimasa; Kawachi, Yousuke; Abe, Chihiro; Otomo, Yuki; Sung, Yul-Wan; Ogawa, Seiji

    2017-04-04

    Effective social problem-solving abilities can contribute to decreased risk of poor mental health. In addition, physical activity has a favorable effect on mental health. These previous studies suggest that physical activity and social problem-solving ability can interact by helping to sustain mental health. The present study aimed to determine the association between attitude and practice of physical activity and social problem-solving ability among university students. Information on physical activity and social problem-solving was collected using a self-administered questionnaire. We analyzed data from 185 students who participated in the questionnaire surveys and psychological tests. Social problem-solving as measured by the Social Problem-Solving Inventory-Revised (SPSI-R) (median score 10.85) was the dependent variable. Multiple logistic regression analysis was employed to calculate the odds ratios (ORs) and 95% confidence intervals (CIs) for higher SPSI-R according to physical activity categories. The multiple logistic regression analysis indicated that the ORs (95% CI) in reference to participants who said they never considered exercising were 2.08 (0.69-6.93), 1.62 (0.55-5.26), 2.78 (0.86-9.77), and 6.23 (1.81-23.97) for participants who did not exercise but intended to start, tried to exercise but did not, exercised but not regularly, and exercised regularly, respectively. This finding suggested a positive linear association between physical activity and social problem-solving ability (significant p value for linear trend): the more established the exercise habit, the better the social problem-solving ability.

  8. A brief introduction to regression designs and mixed-effects modelling by a recent convert

    OpenAIRE

    Balling, Laura Winther

    2008-01-01

    This article discusses the advantages of multiple regression designs over the factorial designs traditionally used in many psycholinguistic experiments. It is shown that regression designs are typically more informative, statistically more powerful and better suited to the analysis of naturalistic tasks. The advantages of including both fixed and random effects are demonstrated with reference to linear mixed-effects models, and problems of collinearity, variable distribution and variable selection...

  9. Robust Image Regression Based on the Extended Matrix Variate Power Exponential Distribution of Dependent Noise.

    Science.gov (United States)

    Luo, Lei; Yang, Jian; Qian, Jianjun; Tai, Ying; Lu, Gui-Fu

    2017-09-01

    Dealing with partial occlusion or illumination is one of the most challenging problems in image representation and classification. In this problem, the characterization of the representation error plays a crucial role. In most current approaches, the error matrix needs to be stretched into a vector and each element is assumed to be independently corrupted. This ignores the dependence between the elements of error. In this paper, it is assumed that the error image caused by partial occlusion or illumination changes is a random matrix variate and follows the extended matrix variate power exponential distribution. This has the heavy tailed regions and can be used to describe a matrix pattern of l×m dimensional observations that are not independent. This paper reveals the essence of the proposed distribution: it actually alleviates the correlations between pixels in an error matrix E and makes E approximately Gaussian. On the basis of this distribution, we derive a Schatten p-norm-based matrix regression model with Lq regularization. The alternating direction method of multipliers is applied to solve this model. To get a closed-form solution in each step of the algorithm, two singular value function thresholding operators are introduced. In addition, the extended Schatten p-norm is utilized to characterize the distance between the test samples and classes in the design of the classifier. Extensive experimental results for image reconstruction and classification with structural noise demonstrate that the proposed algorithm works much more robustly than some existing regression-based methods.

  10. Quantile regression for the statistical analysis of immunological data with many non-detects.

    Science.gov (United States)

    Eilers, Paul H C; Röder, Esther; Savelkoul, Huub F J; van Wijk, Roy Gerth

    2012-07-07

    Immunological parameters are hard to measure. A well-known problem is the occurrence of values below the detection limit, the non-detects. Non-detects are a nuisance, because classical statistical analyses, like ANOVA and regression, cannot be applied. The more advanced statistical techniques currently available for the analysis of datasets with non-detects can only be used if a small percentage of the data are non-detects. Quantile regression, a generalization of percentiles to regression models, models the median or higher percentiles and tolerates very high numbers of non-detects. We present a non-technical introduction and illustrate it with an application to real data from a clinical trial. We show that by using quantile regression, groups can be compared and that meaningful linear trends can be computed, even if more than half of the data consists of non-detects.
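
    A minimal sketch of the approach with statsmodels (simulated data; the detection limit and substituted value are invented for illustration):

        import numpy as np
        import pandas as pd
        import statsmodels.formula.api as smf

        rng = np.random.default_rng(0)
        dose = rng.uniform(0, 10, 200)
        conc = np.exp(0.1 * dose + rng.normal(0, 1, 200))   # skewed marker

        LOD = 1.0                                   # invented detection limit
        obs = np.where(conc < LOD, LOD / 2, conc)   # substitute the non-detects
        df = pd.DataFrame({"obs": obs, "dose": dose})

        # The 75th-percentile fit is unaffected by the substituted values as
        # long as the non-detects stay below the fitted quantile line.
        q75 = smf.quantreg("obs ~ dose", df).fit(q=0.75)
        print(q75.params)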

  11. Penalized regression procedures for variable selection in the potential outcomes framework.

    Science.gov (United States)

    Ghosh, Debashis; Zhu, Yeying; Coffman, Donna L

    2015-05-10

    A recent topic of much interest in causal inference is model selection. In this article, we describe a framework in which to consider penalized regression approaches to variable selection for causal effects. The framework leads to a simple 'impute, then select' class of procedures that is agnostic to the type of imputation algorithm as well as penalized regression used. It also clarifies how model selection involves a multivariate regression model for causal inference problems and that these methods can be applied for identifying subgroups in which treatment effects are homogeneous. Analogies and links with the literature on machine learning methods, missing data, and imputation are drawn. A difference least absolute shrinkage and selection operator algorithm is defined, along with its multiple imputation analogs. The procedures are illustrated using a well-known right-heart catheterization dataset. Copyright © 2015 John Wiley & Sons, Ltd.
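
    The "impute, then select" recipe is agnostic to the imputer and the penalized regressor; one possible instantiation with scikit-learn (simulated data, and the choice of IterativeImputer plus LassoCV is ours, not the paper's):

        import numpy as np
        from sklearn.experimental import enable_iterative_imputer  # noqa: F401
        from sklearn.impute import IterativeImputer
        from sklearn.linear_model import LassoCV

        rng = np.random.default_rng(0)
        X = rng.normal(size=(200, 8))
        y = 2.0 * X[:, 0] - X[:, 3] + rng.normal(0, 1, 200)
        X[rng.random(X.shape) < 0.1] = np.nan       # 10% missing at random

        X_imp = IterativeImputer(random_state=0).fit_transform(X)   # impute...
        lasso = LassoCV(cv=5).fit(X_imp, y)                         # ...then select
        print(np.round(lasso.coef_, 2))   # irrelevant coefficients shrink to ~0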

  12. Performance of the modified Poisson regression approach for estimating relative risks from clustered prospective data.

    Science.gov (United States)

    Yelland, Lisa N; Salter, Amy B; Ryan, Philip

    2011-10-15

    Modified Poisson regression, which combines a log Poisson regression model with robust variance estimation, is a useful alternative to log binomial regression for estimating relative risks. Previous studies have shown both analytically and by simulation that modified Poisson regression is appropriate for independent prospective data. This method is often applied to clustered prospective data, despite a lack of evidence to support its use in this setting. The purpose of this article is to evaluate the performance of the modified Poisson regression approach for estimating relative risks from clustered prospective data, by using generalized estimating equations to account for clustering. A simulation study is conducted to compare log binomial regression and modified Poisson regression for analyzing clustered data from intervention and observational studies. Both methods generally perform well in terms of bias, type I error, and coverage. Unlike log binomial regression, modified Poisson regression is not prone to convergence problems. The methods are contrasted by using example data sets from 2 large studies. The results presented in this article support the use of modified Poisson regression as an alternative to log binomial regression for analyzing clustered prospective data when clustering is taken into account by using generalized estimating equations.
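
    The modified Poisson approach is straightforward to express with statsmodels GEE; a toy sketch (simulated data with no real within-cluster effect; the point is the model specification, not the data):

        import numpy as np
        import pandas as pd
        import statsmodels.api as sm
        import statsmodels.formula.api as smf

        rng = np.random.default_rng(0)
        cluster = np.repeat(np.arange(100), 4)          # 100 clusters of size 4
        treat = np.repeat(rng.integers(0, 2, 100), 4)   # cluster-level exposure
        y = rng.binomial(1, 0.2 * 1.5 ** treat)         # true relative risk = 1.5

        df = pd.DataFrame({"y": y, "treat": treat, "cluster": cluster})
        fit = smf.gee("y ~ treat", groups="cluster", data=df,
                      family=sm.families.Poisson(),       # log link, robust SEs
                      cov_struct=sm.cov_struct.Exchangeable()).fit()
        print(np.exp(fit.params["treat"]))                # estimated relative risk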

  13. Automatically stable discontinuous Petrov-Galerkin methods for stationary transport problems: Quasi-optimal test space norm

    KAUST Repository

    Niemi, Antti H.; Collier, Nathan; Calo, Victor M.

    2013-12-01

    We investigate the application of the discontinuous Petrov-Galerkin (DPG) finite element framework to stationary convection-diffusion problems. In particular, we demonstrate how the quasi-optimal test space norm improves the robustness of the DPG method with respect to vanishing diffusion. We numerically compare coarse-mesh accuracy of the approximation when using the quasi-optimal norm, the standard norm, and the weighted norm. Our results show that the quasi-optimal norm leads to more accurate results on three benchmark problems in two spatial dimensions. We address the problems associated to the resolution of the optimal test functions with respect to the quasi-optimal norm by studying their convergence numerically. In order to facilitate understanding of the method, we also include a detailed explanation of the methodology from the algorithmic point of view. © 2013 Elsevier Ltd. All rights reserved.

  15. REGSTEP - stepwise multivariate polynomial regression with singular extensions

    International Nuclear Information System (INIS)

    Davierwalla, D.M.

    1977-09-01

    The program REGSTEP determines a polynomial approximation, in the least squares sense, to tabulated data. The polynomial may be univariate or multivariate. The computational method is that of stepwise regression. A variable is inserted into the regression basis if it is significant with respect to an appropriate F-test at a preselected risk level. In addition, should a variable already in the basis become nonsignificant (again with respect to an appropriate F-test) after the entry of a new variable, it is expelled from the model. Thus only significant variables are retained in the model. Although REGSTEP was written expressly to be incorporated into CORCOD, a code for predicting nuclear cross sections for given values of power, temperature, void fractions, boron content, etc., nothing limits its use to nuclear applications, as the examples demonstrate. A separate version has been incorporated into RSYST for the general user. (Auth.)
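
    A forward step of this kind of procedure can be sketched in a few lines (partial F-test for entry only; the backward-elimination pass REGSTEP also performs is omitted, and the data are invented):

        import numpy as np
        from scipy import stats

        def forward_step(X, y, selected, alpha=0.05):
            # Try each remaining column; admit the best one if its partial
            # F-test (1 numerator df) is significant at the chosen risk level.
            n = len(y)
            def rss(cols):
                Z = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
                beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
                return np.sum((y - Z @ beta) ** 2)
            best_rss, best_j = min((rss(selected + [j]), j)
                                   for j in range(X.shape[1]) if j not in selected)
            df2 = n - len(selected) - 2      # residual df of the larger model
            F = (rss(selected) - best_rss) / (best_rss / df2)
            return selected + [best_j] if stats.f.sf(F, 1, df2) < alpha else selected

        rng = np.random.default_rng(0)
        X = rng.normal(size=(50, 6))
        y = 3.0 * X[:, 1] + rng.normal(0, 1, 50)
        print(forward_step(X, y, []))   # expect [1]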

  16. Regularized multivariate regression models with skew-t error distributions

    KAUST Repository

    Chen, Lianfu

    2014-06-01

    We consider regularization of the parameters in multivariate linear regression models with the errors having a multivariate skew-t distribution. An iterative penalized likelihood procedure is proposed for constructing sparse estimators of both the regression coefficient and inverse scale matrices simultaneously. The sparsity is introduced through penalizing the negative log-likelihood by adding L1-penalties on the entries of the two matrices. Taking advantage of the hierarchical representation of skew-t distributions, and using the expectation conditional maximization (ECM) algorithm, we reduce the problem to penalized normal likelihood and develop a procedure to minimize the ensuing objective function. Using a simulation study the performance of the method is assessed, and the methodology is illustrated using a real data set with a 24-dimensional response vector. © 2014 Elsevier B.V.

  17. Clinical evaluation of a novel population-based regression analysis for detecting glaucomatous visual field progression.

    Science.gov (United States)

    Kovalska, M P; Bürki, E; Schoetzau, A; Orguel, S F; Orguel, S; Grieshaber, M C

    2011-04-01

    The distinction of real progression from test variability in visual field (VF) series may be based on clinical judgment, on trend analysis based on follow-up of test parameters over time, or on identification of a significant change related to the mean of baseline exams (event analysis). The aim of this study was to compare a new population-based method (Octopus field analysis, OFA) with classic regression analyses and clinical judgment for detecting glaucomatous VF changes. 240 VF series of 240 patients with at least 9 consecutive examinations available were included in this study. They were independently classified by two experienced investigators. The results of such a classification served as a reference for comparison for the following statistical tests: (a) t-test global, (b) r-test global, (c) regression analysis of 10 VF clusters and (d) point-wise linear regression analysis. 32.5 % of the VF series were classified as progressive by the investigators. The sensitivity and specificity were 89.7 % and 92.0 % for the r-test, and 73.1 % and 93.8 % for the t-test, respectively. In the point-wise linear regression analysis, the specificity was comparable (89.5 % versus 92 %), but the sensitivity was clearly lower than in the r-test (22.4 % versus 89.7 %) at a significance level of p = 0.01. A regression analysis for the 10 VF clusters showed a markedly higher sensitivity for the r-test (37.7 %) than the t-test (14.1 %) at a similar specificity (88.3 % versus 93.8 %) for a significant trend (p = 0.005). In regard to the cluster distribution, the paracentral clusters and the superior nasal hemifield progressed most frequently. The population-based regression analysis seems to be superior to the trend analysis in detecting VF progression in glaucoma, and may eliminate the drawbacks of the event analysis. Further, it may assist the clinician in the evaluation of VF series and may allow better visualization of the correlation between function and structure owing to VF

  18. Methods of Detecting Outliers in A Regression Analysis Model ...

    African Journals Online (AJOL)

    PROF. O. E. OSUAGWU

    2013-06-01

  19. Associations between overweight, peer problems, and mental health in 12-13-year-old Norwegian children.

    Science.gov (United States)

    Hestetun, Ingebjørg; Svendsen, Martin Veel; Oellingrath, Inger Margaret

    2015-03-01

    Overweight and mental health problems represent two major challenges related to child and adolescent health. More knowledge of a possible relationship between the two problems and the influence of peer problems on the mental health of overweight children is needed. It has previously been hypothesized that peer problems may be an underlying factor in the association between overweight and mental health problems. The purpose of the present study was to investigate the associations between overweight, peer problems, and indications of mental health problems in a sample of 12-13-year-old Norwegian schoolchildren. Children aged 12-13 years were recruited from the seventh grade of primary schools in Telemark County, Norway. Parents gave information about mental health and peer problems by completing the extended version of the Strength and Difficulties Questionnaire (SDQ). Height and weight were objectively measured. Complete data were obtained for 744 children. Fisher's exact probability test and multiple logistic regressions were used. Most children had good mental health. Multiple logistic regression analysis showed that overweight children were more likely to have indications of psychiatric disorders (adjusted OR: 1.8, CI: 1.0-3.2) and peer problems (adjusted OR: 2.6, CI: 1.6-4.2) than normal-weight children, when adjusted for relevant background variables. When adjusted for peer problems, the association between overweight and indications of any psychiatric disorder was no longer significant. The results support the hypothesis that peer problems may be an important underlying factor for mental health problems in overweight children.

  20. Three Contributions to Robust Regression Diagnostics

    Czech Academy of Sciences Publication Activity Database

    Kalina, Jan

    2015-01-01

    Vol. 11, No. 2 (2015), pp. 69-78. ISSN 1336-9180 Grant - others: GA ČR(CZ) GA13-01930S; Nadační fond na podporu vědy(CZ) Neuron Institutional support: RVO:67985807 Keywords: robust regression * robust econometrics * hypothesis testing Subject RIV: BA - General Mathematics http://www.degruyter.com/view/j/jamsi.2015.11.issue-2/jamsi-2015-0013/jamsi-2015-0013.xml?format=INT

  1. Further investigations of the W-test for pairwise epistasis testing [version 1; referees: 2 approved

    Directory of Open Access Journals (Sweden)

    Richard Howey

    2017-07-01

    Full Text Available Background: In a recent paper, a novel W-test for pairwise epistasis testing was proposed that appeared, in computer simulations, to have higher power than competing alternatives. Application to genome-wide bipolar data detected significant epistasis between SNPs in genes of relevant biological function. Network analysis indicated that the implicated genes formed two separate interaction networks, each containing genes highly related to autism and neurodegenerative disorders. Methods: Here we investigate further the properties and performance of the W-test via theoretical evaluation, computer simulations and application to real data. Results: We demonstrate that, for common variants, the W-test is closely related to several existing tests of association allowing for interaction, including logistic regression on 8 degrees of freedom, although logistic regression can show inflated type I error for low minor allele frequencies, whereas the W-test shows good/conservative type I error control. Although in some situations the W-test can show higher power, logistic regression is not limited to tests on 8 degrees of freedom but can instead be tailored to impose greater structure on the assumed alternative hypothesis, offering a power advantage when the imposed structure matches the true structure. Conclusions: The W-test is a potentially useful method for testing for association - without necessarily implying interaction - between genetic variants and disease, particularly when one or more of the genetic variants are rare. For common variants, the advantages of the W-test are less clear, and, indeed, there are situations where existing methods perform better. In our investigations, we further uncover a number of problems with the practical implementation and application of the W-test (to bipolar disorder) previously described, apparently due to inadequate use of standard data quality-control procedures. This observation leads us to urge caution in

  2. Medical Tests for Prostate Problems

    Science.gov (United States)

    ... walnut-shaped gland that is part of the male reproductive system. It has two or more lobes, or sections, ... treating problems of the urinary tract and the male reproductive system. Abdominal Ultrasound Ultrasound uses a device, called a ...

  3. No rationale for 1 variable per 10 events criterion for binary logistic regression analysis

    Directory of Open Access Journals (Sweden)

    Maarten van Smeden

    2016-11-01

    Full Text Available Abstract Background Ten events per variable (EPV) is a widely advocated minimal criterion for sample size considerations in logistic regression analysis. Of three previous simulation studies that examined this minimal EPV criterion only one supports the use of a minimum of 10 EPV. In this paper, we examine the reasons for substantial differences between these extensive simulation studies. Methods The current study uses Monte Carlo simulations to evaluate small sample bias, coverage of confidence intervals and mean square error of logit coefficients. Logistic regression models fitted by maximum likelihood and a modified estimation procedure, known as Firth’s correction, are compared. Results The results show that besides EPV, the problems associated with low EPV depend on other factors such as the total sample size. It is also demonstrated that simulation results can be dominated by even a few simulated data sets for which the prediction of the outcome by the covariates is perfect (‘separation’). We reveal that different approaches for identifying and handling separation lead to substantially different simulation results. We further show that Firth’s correction can be used to improve the accuracy of regression coefficients and alleviate the problems associated with separation. Conclusions The current evidence supporting EPV rules for binary logistic regression is weak. Given our findings, there is an urgent need for new research to provide guidance for supporting sample size considerations for binary logistic regression analysis.

  4. Reconstructing missing daily precipitation data using regression trees and artificial neural networks

    Science.gov (United States)

    Incomplete meteorological data has been a problem in environmental modeling studies. The objective of this work was to develop a technique to reconstruct missing daily precipitation data in the central part of Chesapeake Bay Watershed using regression trees (RT) and artificial neural networks (ANN)....

  5. Regression Analysis: Instructional Resource for Cost/Managerial Accounting

    Science.gov (United States)

    Stout, David E.

    2015-01-01

    This paper describes a classroom-tested instructional resource, grounded in principles of active learning and constructivism, that embraces two primary objectives: "demystify" for accounting students technical material from statistics regarding ordinary least-squares (OLS) regression analysis--material that students may find obscure or…

  6. Scaling model for prediction of radionuclide activity in cooling water using a regression triplet technique

    International Nuclear Information System (INIS)

    Silvia Dulanska; Lubomir Matel; Milan Meloun

    2010-01-01

    The decommissioning of the nuclear power plant (NPP) A1 Jaslovske Bohunice (Slovakia) is a complicated set of problems that is highly demanding both technically and financially. The basic goal of the decommissioning process is the total elimination of radioactive materials from the nuclear power plant area, and radwaste treatment to a form suitable for its safe disposal. The initial conditions of decommissioning also include elimination of the operational events, preparation and transport of the fuel from the plant territory, and radiochemical and physical-chemical characterization of the radioactive wastes. One of the problems was, and still is, the processing of the liquid radioactive wastes. One such medium is the cooling water of the long-term storage of spent fuel. A suitable scaling model for predicting the activity of the hard-to-detect radionuclides 239,240Pu and 90Sr and of summary beta in cooling water has been built using regression triplet analysis and regression diagnostics. (author)

  7. Determination of osteoporosis risk factors using a multiple logistic regression model in postmenopausal Turkish women.

    Science.gov (United States)

    Akkus, Zeki; Camdeviren, Handan; Celik, Fatma; Gur, Ali; Nas, Kemal

    2005-09-01

    To determine the risk factors of osteoporosis using a multiple binary logistic regression method and to assess the risk variables for osteoporosis, which is a major and growing health problem in many countries. We presented a case-control study, consisting of 126 postmenopausal healthy women as control group and 225 postmenopausal osteoporotic women as the case group. The study was carried out in the Department of Physical Medicine and Rehabilitation, Dicle University, Diyarbakir, Turkey between 1999-2002. The data from the 351 participants were collected using a standard questionnaire that contains 43 variables. A multiple logistic regression model was then used to evaluate the data and to find the best regression model. We classified 80.1% (281/351) of the participants using the regression model. Furthermore, the specificity value of the model was 67% (84/126) of the control group while the sensitivity value was 88% (197/225) of the case group. We found the distribution of standardized residual values for the final model to be exponential using the Kolmogorov-Smirnov test (p=0.193). The receiver operating characteristic curve was found to successfully predict patients at risk for osteoporosis. This study suggests that low levels of dietary calcium intake, physical activity, education, and longer duration of menopause are independent predictors of the risk of low bone density in our population. Adequate dietary calcium intake in combination with maintaining a daily physical activity, increasing educational level, decreasing birth rate, and duration of breast-feeding may contribute to healthy bones and play a role in practical prevention of osteoporosis in Southeast Anatolia. In addition, the findings of the present study indicate that the use of a multivariate statistical method, such as multiple logistic regression, in osteoporosis, which may be influenced by many variables, is better than univariate statistical evaluation.

  8. SEPARATION PHENOMENA LOGISTIC REGRESSION

    Directory of Open Access Journals (Sweden)

    Ikaro Daniel de Carvalho Barreto

    2014-03-01

    Full Text Available This paper proposes an application of concepts from maximum likelihood estimation of the binomial logistic regression model to the separation phenomenon. Separation generates bias in the estimation, yields different interpretations of the estimates under the different statistical tests (Wald, Likelihood Ratio and Score), and produces different estimates under the different iterative methods (Newton-Raphson and Fisher Score). It also presents an example that demonstrates the direct implications for the validation of the model and of the variables, and for the estimates of odds ratios and confidence intervals generated from the Wald statistics. Furthermore, we present, briefly, the Firth correction to circumvent the phenomenon of separation.

  9. Patch testing with markers of fragrance contact allergy. Do clinical tests correspond to patients' self-reported problems?

    DEFF Research Database (Denmark)

    Johansen, J D; Andersen, T F; Veien, N

    1997-01-01

    The aim of the present study was to investigate the relationship between patients' own recognition of skin problems using consumer products and the results of patch testing with markers of fragrance sensitization. Eight hundred and eighty-four consecutive eczema patients, 18-69 years of age, filled...... in a questionnaire prior to patch testing with the European standard series. The questionnaire contained questions about skin symptoms from the use of scented and unscented products as well as skin reactions from contact with spices, flowers and citrus fruits that could indicate fragrance sensitivity. A highly...... significant association was found between reporting a history of visible skin symptoms from using scented products and a positive patch test to the fragrance mix, whereas no such relationship could be established to the Peru balsam in univariate or multivariate analysis. Our results suggest that the role...

  10. Gaussian process regression for geometry optimization

    Science.gov (United States)

    Denzel, Alexander; Kästner, Johannes

    2018-03-01

    We implemented a geometry optimizer based on Gaussian process regression (GPR) to find minimum structures on potential energy surfaces. We tested both a twice-differentiable form of the Matérn kernel and the squared exponential kernel; the Matérn kernel performs much better. We give a detailed description of the optimization procedures. These include overshooting the step resulting from GPR in order to obtain a higher degree of interpolation vs. extrapolation. In a benchmark against the Limited-memory Broyden-Fletcher-Goldfarb-Shanno optimizer of the DL-FIND library on 26 test systems, we found the new optimizer to generally reduce the number of required optimization steps.
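
    With scikit-learn, the kernel comparison reported above is easy to reproduce in spirit (a toy one-dimensional "surface" stands in for a real potential energy surface; Matern with nu=2.5 is a twice-differentiable form):

        import numpy as np
        from sklearn.gaussian_process import GaussianProcessRegressor
        from sklearn.gaussian_process.kernels import Matern, RBF

        rng = np.random.default_rng(0)
        X = rng.uniform(-2, 2, 12).reshape(-1, 1)
        y = (X.ravel() ** 2 - 1.0) ** 2             # toy double-well "surface"
        grid = np.linspace(-2, 2, 401).reshape(-1, 1)

        for kernel in (Matern(nu=2.5), RBF()):      # nu=2.5: twice differentiable
            gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)
            x_min = grid[np.argmin(gpr.predict(grid))][0]
            print(type(kernel).__name__, "predicted minimum near x =", round(x_min, 2))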

  11. Right frontal pole cortical thickness and executive functioning in children with traumatic brain injury: the impact on social problems.

    Science.gov (United States)

    Levan, Ashley; Black, Garrett; Mietchen, Jonathan; Baxter, Leslie; Brock Kirwan, C; Gale, Shawn D

    2016-12-01

    Cognitive and social outcomes may be negatively affected in children with a history of traumatic brain injury (TBI). We hypothesized that executive function would mediate the association between right frontal pole cortical thickness and problematic social behaviors. Child participants with a history of TBI were recruited from inpatient admissions for long-term follow-up (n = 23; average age = 12.8, average time post-injury =3.2 years). Three measures of executive function, the Trail Making Test, verbal fluency test, and the Conners' Continuous Performance Test-Second edition (CPT-II), were administered to each participant while caregivers completed the Childhood Behavior Checklist (CBCL). All participants underwent brain magnetic resonance imaging following cognitive testing. Regression analysis demonstrated right frontal pole cortical thickness significantly predicted social problems. Measures of executive functioning also significantly predicted social problems; however, the mediation model testing whether executive function mediated the relationship between cortical thickness and social problems was not statistically significant. Right frontal pole cortical thickness and omission errors on the CPT-II predicted Social Problems on the CBCL. Results did not indicate that the association between cortical thickness and social problems was mediated by executive function.

  12. Regression analysis by example

    CERN Document Server

    Chatterjee, Samprit

    2012-01-01

    Praise for the Fourth Edition: "This book is . . . an excellent source of examples for regression analysis. It has been and still is readily readable and understandable." -Journal of the American Statistical Association Regression analysis is a conceptually simple method for investigating relationships among variables. Carrying out a successful application of regression analysis, however, requires a balance of theoretical results, empirical rules, and subjective judgment. Regression Analysis by Example, Fifth Edition has been expanded

  13. Accuracy of Bayes and Logistic Regression Subscale Probabilities for Educational and Certification Tests

    Science.gov (United States)

    Rudner, Lawrence

    2016-01-01

    In the machine learning literature, it is commonly accepted as fact that as calibration sample sizes increase, Naïve Bayes classifiers initially outperform Logistic Regression classifiers in terms of classification accuracy. Applied to subtests from an on-line final examination and from a highly regarded certification examination, this study shows…

  14. Applied logistic regression

    CERN Document Server

    Hosmer, David W; Sturdivant, Rodney X

    2013-01-01

     A new edition of the definitive guide to logistic regression modeling for health science and other applications This thoroughly expanded Third Edition provides an easily accessible introduction to the logistic regression (LR) model and highlights the power of this model by examining the relationship between a dichotomous outcome and a set of covariables. Applied Logistic Regression, Third Edition emphasizes applications in the health sciences and handpicks topics that best suit the use of modern statistical software. The book provides readers with state-of-

  15. Application of Boosting Regression Trees to Preliminary Cost Estimation in Building Construction Projects

    Directory of Open Access Journals (Sweden)

    Yoonseok Shin

    2015-01-01

    Full Text Available Among the recent data mining techniques available, the boosting approach has attracted a great deal of attention because of its effective learning algorithm and strong boundaries in terms of its generalization performance. However, the boosting approach has yet to be used in regression problems within the construction domain, including cost estimations, but has been actively utilized in other domains. Therefore, a boosting regression tree (BRT) is applied to cost estimations at the early stage of a construction project to examine the applicability of the boosting approach to a regression problem within the construction domain. To evaluate the performance of the BRT model, its performance was compared with that of a neural network (NN) model, which has been proven to have a high performance in cost estimation domains. The BRT model has shown results similar to those of the NN model using 234 actual cost datasets of a building construction project. In addition, the BRT model can provide additional information such as the importance plot and structure model, which can support estimators in comprehending the decision making process. Consequently, the boosting approach has potential applicability in preliminary cost estimations in a building construction project.
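
    In spirit, the BRT model can be approximated with scikit-learn's gradient boosting (the cost drivers below are invented stand-ins for the study's 234 project records):

        import numpy as np
        from sklearn.ensemble import GradientBoostingRegressor
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(0)
        X = rng.uniform(0, 1, size=(234, 5))   # invented early-stage cost drivers
        cost = 100 * X[:, 0] + 30 * X[:, 1] + rng.normal(0, 5, 234)

        brt = GradientBoostingRegressor(random_state=0)
        print("CV R^2:", cross_val_score(brt, X, cost, cv=5).mean().round(3))
        print("importances:", brt.fit(X, cost).feature_importances_.round(2))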

  16. SPSS macros to compare any two fitted values from a regression model.

    Science.gov (United States)

    Weaver, Bruce; Dubois, Sacha

    2012-12-01

    In regression models with first-order terms only, the coefficient for a given variable is typically interpreted as the change in the fitted value of Y for a one-unit increase in that variable, with all other variables held constant. Therefore, each regression coefficient represents the difference between two fitted values of Y. But the coefficients represent only a fraction of the possible fitted value comparisons that might be of interest to researchers. For many fitted value comparisons that are not captured by any of the regression coefficients, common statistical software packages do not provide the standard errors needed to compute confidence intervals or carry out statistical tests-particularly in more complex models that include interactions, polynomial terms, or regression splines. We describe two SPSS macros that implement a matrix algebra method for comparing any two fitted values from a regression model. The !OLScomp and !MLEcomp macros are for use with models fitted via ordinary least squares and maximum likelihood estimation, respectively. The output from the macros includes the standard error of the difference between the two fitted values, a 95% confidence interval for the difference, and a corresponding statistical test with its p-value.
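
    The macros' underlying matrix algebra is compact: for two fitted values x1'b and x2'b, the difference is c'b with c = x1 - x2, and its variance is c'Cov(b)c. A numpy sketch under OLS with simulated data (not the SPSS implementation itself):

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(0)
        n = 100
        x = rng.uniform(0, 10, n)
        X = np.column_stack([np.ones(n), x, x ** 2])   # quadratic OLS model
        y = 1 + 0.5 * x - 0.03 * x ** 2 + rng.normal(0, 1, n)

        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        cov_b = (resid @ resid / (n - 3)) * np.linalg.inv(X.T @ X)

        c = np.array([0.0, 7 - 3, 7 ** 2 - 3 ** 2])    # fitted at x=7 minus x=3
        diff, se = c @ beta, np.sqrt(c @ cov_b @ c)
        p = 2 * stats.t.sf(abs(diff / se), n - 3)      # test of the difference
        print(round(diff, 3), round(se, 3), round(p, 4))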

  17. Analysis of quantile regression as alternative to ordinary least squares

    OpenAIRE

    Ibrahim Abdullahi; Abubakar Yahaya

    2015-01-01

    In this article, an alternative to ordinary least squares (OLS) regression, based on the analytical solution in the Statgraphics software, is considered; this alternative is none other than the quantile regression (QR) model. We also present a goodness-of-fit statistic as well as approximate distributions of the associated test statistics for the parameters. Furthermore, we suggest a goodness-of-fit statistic called the least absolute deviation (LAD) coefficient of determination. The procedure is well ...
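    For readers without Statgraphics, the same LAD/median fit can be sketched in Python with statsmodels' QuantReg (simulated heavy-tailed data, illustrative coefficients):

```python
# Sketch: median (LAD) regression as an alternative to OLS under heavy tails.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=200)
y = 1.0 + 0.8 * x + rng.standard_t(df=2, size=200)    # heavy-tailed errors

X = sm.add_constant(x)
lad = sm.QuantReg(y, X).fit(q=0.5)    # q=0.5 gives the LAD / median fit
ols = sm.OLS(y, X).fit()
print("LAD:", lad.params, " OLS:", ols.params)        # LAD is less swayed by outliers
```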

  18. Laplacian embedded regression for scalable manifold regularization.

    Science.gov (United States)

    Chen, Lin; Tsang, Ivor W; Xu, Dong

    2012-06-01

    Semi-supervised learning (SSL), as a powerful tool to learn from a limited number of labeled data and a large number of unlabeled data, has been attracting increasing attention in the machine learning community. In particular, the manifold regularization framework has laid solid theoretical foundations for a large family of SSL algorithms, such as Laplacian support vector machine (LapSVM) and Laplacian regularized least squares (LapRLS). However, most of these algorithms are limited to small scale problems due to the high computational cost of the matrix inversion operation involved in the optimization problem. In this paper, we propose a novel framework called Laplacian embedded regression by introducing an intermediate decision variable into the manifold regularization framework. By using ε-insensitive loss, we obtain the Laplacian embedded support vector regression (LapESVR) algorithm, which inherits the sparse solution from SVR. Also, we derive Laplacian embedded RLS (LapERLS) corresponding to RLS under the proposed framework. Both LapESVR and LapERLS possess a simpler form of a transformed kernel, which is the summation of the original kernel and a graph kernel that captures the manifold structure. The benefits of the transformed kernel are two-fold: (1) we can deal with the original kernel matrix and the graph Laplacian matrix in the graph kernel separately and (2) if the graph Laplacian matrix is sparse, we only need to perform the inverse operation for a sparse matrix, which is much more efficient when compared with that for a dense one. Inspired by kernel principal component analysis, we further propose to project the introduced decision variable into a subspace spanned by a few eigenvectors of the graph Laplacian matrix in order to better reflect the data manifold, as well as accelerate the calculation of the graph kernel, allowing our methods to efficiently and effectively cope with large scale SSL problems. Extensive experiments on both toy and real

  19. Applied linear regression

    CERN Document Server

    Weisberg, Sanford

    2013-01-01

    Praise for the Third Edition: "...this is an excellent book which could easily be used as a course text..." (International Statistical Institute). The Fourth Edition of Applied Linear Regression provides a thorough update of the basic theory and methodology of linear regression modeling. Demonstrating the practical applications of linear regression analysis techniques, the Fourth Edition uses interesting, real-world exercises and examples. Stressing central concepts such as model building, understanding parameters, assessing fit and reliability, and drawing conclusions, the new edition illus

  20. SNOW DEPTH ESTIMATION USING TIME SERIES PASSIVE MICROWAVE IMAGERY VIA GENETICALLY SUPPORT VECTOR REGRESSION (CASE STUDY URMIA LAKE BASIN)

    Directory of Open Access Journals (Sweden)

    N. Zahir

    2015-12-01

    Full Text Available Lake Urmia is one of the most important ecosystems of the country, and it is on the verge of disappearing. Many factors contribute to this crisis, among which precipitation plays an important role. Precipitation takes many forms, one of which is snow. The snow on Sahand Mountain is one of the main sources of Lake Urmia's water, and snow depth (SD) is a vital parameter for estimating the water balance in future years. In this regard, this study focuses on the SD parameter using the Special Sensor Microwave/Imager (SSM/I) instrument on board the Defense Meteorological Satellite Program (DMSP) F16 satellite. The usual statistical methods for retrieving SD include linear and non-linear ones, which use a least squares procedure to estimate the SD model. Recently, kernel-based methods have been widely used for modelling statistical problems, and among them support vector regression (SVR) has achieved high performance. Examination of the obtained data shows the existence of outliers; to remove them, a wavelet denoising method is applied. After the omission of the outliers, the optimum bands and parameters for SVR must be selected. Feature selection methods have shown a direct effect on improving regression performance, so we used a genetic algorithm (GA) to select suitable features of the SSM/I bands in order to estimate the SD model. The results for the training and testing data in the Sahand mountain area [R²_TEST=0.9049 and RMSE= 6.9654] show the high performance of SVR.
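    A minimal sketch of the regression step only (not the paper's GA feature selection or wavelet denoising): support vector regression with an RBF kernel on stand-in brightness temperatures, using scikit-learn:

```python
# Sketch: RBF-kernel SVR for snow-depth estimation from passive-microwave
# brightness temperatures. Band values and depths are synthetic stand-ins,
# not the SSM/I Sahand data.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(3)
tb = rng.uniform(150, 280, size=(300, 4))         # stand-in brightness temps (4 bands)
depth = 80 - 0.3 * (tb[:, 1] - tb[:, 3]) + rng.normal(scale=5, size=300)

model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=1.0))
model.fit(tb, depth)
print("training R^2:", model.score(tb, depth))
```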

  1. Solution of the Chandler-Gibson equations for a three-body test problem

    International Nuclear Information System (INIS)

    Gibson, A.G.; Waters, A.J.; Berthold, G.H.; Chandler, C.

    1991-01-01

    The Chandler-Gibson (CG) N-body equations are tested by considering the problem of three nonrelativistic particles moving on a line and interacting through attractive delta-function potentials. In particular, the input Born and overlap matrix-valued functions are evaluated analytically, and the CG equations are solved using a B-spline collocation method. The computed scattering matrix elements are within 0.5% of the known exact solutions, and the corresponding scattering probabilities are within 0.001% of the exact probabilities, both below and above the 3-body breakup threshold. These results establish that the CG method is practical, as well as theoretically correct, and may be a valuable approach for solving certain more complicated N-body scattering problems

  2. Organizational adoption of preemployment drug testing.

    Science.gov (United States)

    Spell, C S; Blum, T C

    2001-04-01

    This study explored the adoption of preemployment drug testing by 360 organizations. Survival models were developed that included internal organizational and labor market factors hypothesized to affect the likelihood of adoption of drug testing. Also considered was another set of variables that included social and political variables based on institutional theory. An event history analysis using Cox regressions indicated that both internal organizational and environmental variables predicted adoption of drug testing. Results indicate that the higher the proportion of drug testers in the worksite's industry, the more likely it would be to adopt drug testing. Also, the extent to which an organization uses an internal labor market, voluntary turnover rate, and the extent to which management perceives drugs to be a problem were related to likelihood of adoption of drug testing.
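    The event-history analysis described here can be sketched with a Cox proportional hazards model; the lifelines library is one option, and the toy data frame below uses hypothetical column names rather than the study's variables:

```python
# Sketch: Cox regression for time until a worksite adopts drug testing.
# Columns and values are hypothetical, not the study's data.
import pandas as pd
from lifelines import CoxPHFitter

df = pd.DataFrame({
    "years_observed": [2, 5, 3, 8, 6, 4, 7, 1],
    "adopted": [1, 0, 1, 1, 0, 1, 0, 1],              # 1 = adopted testing, 0 = censored
    "pct_industry_testing": [0.6, 0.1, 0.5, 0.7, 0.2, 0.4, 0.3, 0.8],
    "perceived_drug_problem": [1, 0, 0, 1, 0, 1, 1, 0],
})
cph = CoxPHFitter()
cph.fit(df, duration_col="years_observed", event_col="adopted")
cph.print_summary()                                   # hazard ratios for adoption
```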

  3. Do job demands and job control affect problem-solving?

    Science.gov (United States)

    Bergman, Peter N; Ahlberg, Gunnel; Johansson, Gun; Stoetzer, Ulrich; Aborg, Carl; Hallsten, Lennart; Lundberg, Ingvar

    2012-01-01

    The Job Demand Control model presents combinations of working conditions that may facilitate learning (the active learning hypothesis) or have detrimental effects on health (the strain hypothesis). To test the active learning hypothesis, this study analysed the effects of job demands and job control on general problem-solving strategies. A population-based sample of 4,636 individuals (55% women, 45% men), with job characteristics measured twice with a three-year time lag, was used. Main effects of demands, skill discretion, task authority and control, and the combined effects of demands and control, were analysed with logistic regressions on four outcomes representing general problem-solving strategies. Those reporting high levels of skill discretion, task authority and control, as well as those reporting high-demand/high-control and low-demand/high-control job characteristics, were more likely to report using problem-solving strategies. The results suggest that working conditions involving high levels of control may affect how individuals cope with problems, and that workplace characteristics may affect behaviour in the non-work domain.

  4. The comparison between several robust ridge regression estimators in the presence of multicollinearity and multiple outliers

    Science.gov (United States)

    Zahari, Siti Meriam; Ramli, Norazan Mohamed; Moktar, Balkiah; Zainol, Mohammad Said

    2014-09-01

    In the presence of multicollinearity and multiple outliers, statistical inference of a linear regression model using ordinary least squares (OLS) estimators would be severely affected and produce misleading results. To overcome this, many approaches have been investigated. These include robust methods, which were reported to be less sensitive to the presence of outliers. In addition, the ridge regression technique has been employed to tackle the multicollinearity problem. In order to mitigate both problems, a combination of ridge regression and robust methods is discussed in this study. The superiority of this approach is examined when multicollinearity and multiple outliers occur simultaneously in multiple linear regression. This study looks at the performance of several well-known robust estimators (M, MM, RIDGE) and robust ridge regression estimators, namely the Weighted Ridge M-estimator (WRM), Weighted Ridge MM (WRMM) and Ridge MM (RMM), in such a situation. Results of the study show that in the simultaneous presence of multicollinearity and multiple outliers (in both the x- and y-directions), RMM and RIDGE are more or less similar in terms of superiority over the other estimators, regardless of the number of observations, the level of collinearity and the percentage of outliers used. However, when outliers occur in only a single direction (the y-direction), the WRMM estimator is the most superior among the robust ridge regression estimators, producing the least variance. In conclusion, robust ridge regression is the best alternative to robust and conventional least squares estimators when dealing with the simultaneous presence of multicollinearity and outliers.

  5. Causal correlation of foliar biochemical concentrations with AVIRIS spectra using forced entry linear regression

    Science.gov (United States)

    Dawson, Terence P.; Curran, Paul J.; Kupiec, John A.

    1995-01-01

    A major goal of airborne imaging spectrometry is to estimate the biochemical composition of vegetation canopies from reflectance spectra. Remotely-sensed estimates of foliar biochemical concentrations of forests would provide valuable indicators of ecosystem function at regional and eventually global scales. Empirical research has shown a relationship exists between the amount of radiation reflected from absorption features and the concentration of given biochemicals in leaves and canopies (Matson et al., 1994, Johnson et al., 1994). A technique commonly used to determine which wavelengths have the strongest correlation with the biochemical of interest is unguided (stepwise) multiple regression. Wavelengths are entered into a multivariate regression equation, in their order of importance, each contributing to the reduction of the variance in the measured biochemical concentration. A significant problem with the use of stepwise regression for determining the correlation between biochemical concentration and spectra is that of 'overfitting' as there are significantly more wavebands than biochemical measurements. This could result in the selection of wavebands which may be more accurately attributable to noise or canopy effects. In addition, there is a real problem of collinearity in that the individual biochemical concentrations may covary. A strong correlation between the reflectance at a given wavelength and the concentration of a biochemical of interest, therefore, may be due to the effect of another biochemical which is closely related. Furthermore, it is not always possible to account for potentially suitable waveband omissions in the stepwise selection procedure. This concern about the suitability of stepwise regression has been identified and acknowledged in a number of recent studies (Wessman et al., 1988, Curran, 1989, Curran et al., 1992, Peterson and Hubbard, 1992, Martin and Aber, 1994, Kupiec, 1994). These studies have pointed to the lack of a physical

  6. 49 CFR 40.208 - What problem requires corrective action but does not result in the cancellation of a test?

    Science.gov (United States)

    2010-10-01

    49 CFR 40.208 (2010-10-01), Title 49 (Transportation), Office of the Secretary of Transportation, Procedures for Transportation Workplace Drug and Alcohol Testing Programs. Section 40.208: What problem requires corrective action but does not result in the cancellation of a test? Problems...

  7. The students’ ability in mathematical literacy for the quantity, and the change and relationship problems on the PISA adaptation test

    Science.gov (United States)

    Julie, Hongki; Sanjaya, Febi; Yudhi Anggoro, Ant.

    2017-09-01

    One of the purposes of this study was to describe the solution profiles of junior high school students on the PISA adaptation test. The procedure conducted by the researchers to achieve this objective was to (1) adapt the PISA test, (2) validate the adapted PISA test, (3) ask junior high school students to take the adapted PISA test, and (4) construct the students' solution profiles. PISA mathematics problems can be classified into four areas, namely quantity, space and shape, change and relationship, and uncertainty. The research results presented in this paper are the test results for the quantity and the change and relationship problems. The adapted PISA test consisted of fifteen questions: two for the quantity group, six for the space and shape group, three for the change and relationship group, and four for uncertainty. Subjects in this study were 18 students from 11 junior high schools in Yogyakarta, Central Java, and Banten. The study used a qualitative research design. For the first quantity problem, 38.89% of the students achieved level 3. For the second quantity problem, 88.89% achieved level 2. For part a of the first change and relationship problem, 55.56% achieved level 5. For part b of the first change and relationship problem, 77.78% achieved level 2. For the second change and relationship problem, 38.89% achieved level 2.

  8. Testing developmental pathways to antisocial personality problems

    NARCIS (Netherlands)

    S. Diamantopoulou; F.C. Verhulst (Frank); J. van der Ende (Jan)

    2010-01-01

    This study examined the development of antisocial personality problems (APP) in young adulthood from disruptive behaviors and internalizing problems in childhood and adolescence. Parent ratings of 507 children's (aged 6-8 years) symptoms of attention deficit hyperactivity disorder,

  9. Top Incomes, Heavy Tails, and Rank-Size Regressions

    Directory of Open Access Journals (Sweden)

    Christian Schluter

    2018-03-01

    Full Text Available In economics, rank-size regressions provide popular estimators of tail exponents of heavy-tailed distributions. We discuss the properties of this approach when the tail of the distribution is regularly varying rather than strictly Pareto. The estimator then over-estimates the true value in the leading parametric income models (so the upper income tail is less heavy than estimated), which leads to test size distortions and undermines inference. For practical work, we propose a sensitivity analysis based on regression diagnostics in order to assess the likely impact of the distortion. The methods are illustrated using data on top incomes in the UK.
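    A bare-bones version of the rank-size estimator (on simulated Pareto incomes, so the strictly-Pareto assumption the paper relaxes actually holds here):

```python
# Sketch of a rank-size regression: regress log(rank) on log(income)
# over the top of the sample to estimate a Pareto tail exponent.
import numpy as np

rng = np.random.default_rng(4)
incomes = rng.pareto(a=2.0, size=10_000) + 1.0     # true tail exponent = 2
top = np.sort(incomes)[::-1][:500]                  # top 500 observations
rank = np.arange(1, top.size + 1)

# OLS slope of log(rank) on log(size); minus the slope estimates the exponent.
slope, intercept = np.polyfit(np.log(top), np.log(rank), 1)
print("estimated tail exponent:", -slope)
```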

  10. SPSS and SAS programs for comparing Pearson correlations and OLS regression coefficients.

    Science.gov (United States)

    Weaver, Bruce; Wuensch, Karl L

    2013-09-01

    Several procedures that use summary data to test hypotheses about Pearson correlations and ordinary least squares regression coefficients have been described in various books and articles. To our knowledge, however, no single resource describes all of the most common tests. Furthermore, many of these tests have not yet been implemented in popular statistical software packages such as SPSS and SAS. In this article, we describe all of the most common tests and provide SPSS and SAS programs to perform them. When they are applicable, our code also computes 100 × (1 - α)% confidence intervals corresponding to the tests. For testing hypotheses about independent regression coefficients, we demonstrate one method that uses summary data and another that uses raw data (i.e., Potthoff analysis). When the raw data are available, the latter method is preferred, because use of summary data entails some loss of precision due to rounding.
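    The article's own programs are in SPSS and SAS; purely to show the kind of summary-data computation involved, here is one of the classical tests (comparing two independent Pearson correlations via Fisher's r-to-z transformation) in NumPy/SciPy:

```python
# Sketch: test the difference between two independent Pearson correlations
# using Fisher's r-to-z transformation (summary data only).
import numpy as np
from scipy import stats

def compare_independent_r(r1, n1, r2, n2):
    z1, z2 = np.arctanh(r1), np.arctanh(r2)      # Fisher transforms
    se = np.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (z1 - z2) / se
    p = 2 * stats.norm.sf(abs(z))                 # two-tailed p-value
    return z, p

print(compare_independent_r(r1=0.55, n1=100, r2=0.30, n2=120))
```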

  11. Targeting: Logistic Regression, Special Cases and Extensions

    Directory of Open Access Journals (Sweden)

    Helmut Schaeben

    2014-12-01

    Full Text Available Logistic regression is a classical linear model for logit-transformed conditional probabilities of a binary target variable. It recovers the true conditional probabilities if the joint distribution of predictors and the target is of log-linear form. Weights-of-evidence is an ordinary logistic regression with parameters equal to the differences of the weights of evidence if all predictor variables are discrete and conditionally independent given the target variable. The hypothesis of conditional independence can be tested in terms of log-linear models. If the assumption of conditional independence is violated, the application of weights-of-evidence corrupts not only the predicted conditional probabilities, but also their rank transform. Logistic regression models including interaction terms can account for the lack of conditional independence; appropriate interaction terms compensate exactly for violations of conditional independence. Multilayer artificial neural nets may be seen as nested regression-like models with some sigmoidal activation function. Most often, the logistic function is used as the activation function. If the net topology, i.e., its control, is sufficiently versatile to mimic interaction terms, artificial neural nets are able to account for violations of conditional independence and yield very similar results. Weights-of-evidence cannot reasonably include interaction terms; subsequent modifications of the weights, as often suggested, cannot emulate the effect of interaction terms.

  12. Understanding Poisson regression.

    Science.gov (United States)

    Hayat, Matthew J; Higgins, Melinda

    2014-04-01

    Nurse investigators often collect study data in the form of counts. Traditional methods of data analysis have historically approached analysis of count data either as if the count data were continuous and normally distributed or with dichotomization of the counts into the categories of occurred or did not occur. These outdated methods for analyzing count data have been replaced with more appropriate statistical methods that make use of the Poisson probability distribution, which is useful for analyzing count data. The purpose of this article is to provide an overview of the Poisson distribution and its use in Poisson regression. Assumption violations for the standard Poisson regression model are addressed with alternative approaches, including addition of an overdispersion parameter or negative binomial regression. An illustrative example is presented with an application from the ENSPIRE study, and regression modeling of comorbidity data is included for illustrative purposes. Copyright 2014, SLACK Incorporated.
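    A small statsmodels sketch of the progression the article describes: fit a Poisson GLM to count data, check for overdispersion, and refit with a negative binomial family (data simulated):

```python
# Sketch: Poisson regression for counts, with a negative binomial refit
# as one remedy when the counts are overdispersed.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
x = rng.normal(size=500)
mu = np.exp(0.2 + 0.7 * x)
y = rng.negative_binomial(n=2, p=2 / (2 + mu))       # overdispersed counts with mean mu

X = sm.add_constant(x)
pois = sm.GLM(y, X, family=sm.families.Poisson()).fit()
print(pois.params, pois.pearson_chi2 / pois.df_resid)   # ratio >> 1 flags overdispersion
nb = sm.GLM(y, X, family=sm.families.NegativeBinomial()).fit()
print(nb.params)
```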

  13. Alternative Methods of Regression

    CERN Document Server

    Birkes, David

    2011-01-01

    Of related interest: Nonlinear Regression Analysis and Its Applications, Douglas M. Bates and Donald G. Watts. "...an extraordinary presentation of concepts and methods concerning the use and analysis of nonlinear regression models... highly recommend[ed]... for anyone needing to use and/or understand issues concerning the analysis of nonlinear regression models." --Technometrics. This book provides a balance between theory and practice supported by extensive displays of instructive geometrical constructs. Numerous in-depth case studies illustrate the use of nonlinear regression analysis--with all data s

  14. Privacy-Preserving Distributed Linear Regression on High-Dimensional Data

    Directory of Open Access Journals (Sweden)

    Gascón Adrià

    2017-10-01

    Full Text Available We propose privacy-preserving protocols for computing linear regression models, in the setting where the training dataset is vertically distributed among several parties. Our main contribution is a hybrid multi-party computation protocol that combines Yao’s garbled circuits with tailored protocols for computing inner products. Like many machine learning tasks, building a linear regression model involves solving a system of linear equations. We conduct a comprehensive evaluation and comparison of different techniques for securely performing this task, including a new Conjugate Gradient Descent (CGD) algorithm. This algorithm is suitable for secure computation because it uses an efficient fixed-point representation of real numbers while maintaining accuracy and convergence rates comparable to what can be obtained with a classical solution using floating point numbers. Our technique improves on Nikolaenko et al.’s method for privacy-preserving ridge regression (S&P 2013), and can be used as a building block in other analyses. We implement a complete system and demonstrate that our approach is highly scalable, solving data analysis problems with one million records and one hundred features in less than one hour of total running time.
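    The conjugate gradient idea is easy to see in the clear (no secure computation here): solving the normal equations (XᵀX)b = Xᵀy requires only matrix-vector products, which is part of what makes it amenable to fixed-point arithmetic. A plain-NumPy sketch:

```python
# Sketch: conjugate gradients on the normal equations of a linear regression.
import numpy as np

def cg(A, b, iters=50, tol=1e-10):
    """Solve Ax = b for symmetric positive-definite A by conjugate gradients."""
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    for _ in range(iters):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

rng = np.random.default_rng(6)
X = rng.normal(size=(1000, 10))
y = X @ rng.normal(size=10) + rng.normal(scale=0.1, size=1000)
beta = cg(X.T @ X, X.T @ y)      # regression coefficients
print(beta)
```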

  15. EPS-LASSO: Test for High-Dimensional Regression Under Extreme Phenotype Sampling of Continuous Traits.

    Science.gov (United States)

    Xu, Chao; Fang, Jian; Shen, Hui; Wang, Yu-Ping; Deng, Hong-Wen

    2018-01-25

    Extreme phenotype sampling (EPS) is a broadly-used design to identify candidate genetic factors contributing to the variation of quantitative traits. By enriching the signals in extreme phenotypic samples, EPS can boost the association power compared to random sampling. Most existing statistical methods for EPS examine the genetic factors individually, even though many quantitative traits have multiple genetic factors underlying their variation. It is desirable to model the joint effects of genetic factors, which may increase the power and identify novel quantitative trait loci under EPS. The joint analysis of genetic data in high-dimensional situations requires specialized techniques, e.g., the least absolute shrinkage and selection operator (LASSO). Although there are extensive research and application related to LASSO, the statistical inference and testing for the sparse model under EPS remain unknown. We propose a novel sparse model (EPS-LASSO) with hypothesis test for high-dimensional regression under EPS based on a decorrelated score function. The comprehensive simulation shows EPS-LASSO outperforms existing methods with stable type I error and FDR control. EPS-LASSO can provide a consistent power for both low- and high-dimensional situations compared with the other methods dealing with high-dimensional situations. The power of EPS-LASSO is close to other low-dimensional methods when the causal effect sizes are small and is superior when the effects are large. Applying EPS-LASSO to a transcriptome-wide gene expression study for obesity reveals 10 significant body mass index associated genes. Our results indicate that EPS-LASSO is an effective method for EPS data analysis, which can account for correlated predictors. The source code is available at https://github.com/xu1912/EPSLASSO. hdeng2@tulane.edu. Supplementary data are available at Bioinformatics online. © The Author (2018). Published by Oxford University Press. All rights reserved. For Permissions, please

  16. COMPARISON OF PARTIAL LEAST SQUARES REGRESSION METHOD ALGORITHMS: NIPALS AND PLS-KERNEL AND AN APPLICATION

    Directory of Open Access Journals (Sweden)

    ELİF BULUT

    2013-06-01

    Full Text Available Partial Least Squares Regression (PLSR) is a multivariate statistical method that consists of partial least squares and multiple linear regression analysis. Explanatory variables, X, having multicollinearity are reduced to components which explain a large amount of the covariance between the explanatory and response variables. These components are few in number and do not suffer from multicollinearity. Multiple linear regression analysis is then applied to those components to model the response variable Y. There are various PLSR algorithms. In this study the NIPALS and PLS-Kernel algorithms are studied and illustrated on a real data set.
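    The article's implementations are NIPALS and PLS-Kernel; purely as a behavioural sketch, scikit-learn's PLSRegression (a NIPALS-style implementation) shows the reduce-then-regress effect on deliberately collinear data:

```python
# Sketch: PLS regression on collinear predictors driven by two latent factors.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(7)
t = rng.normal(size=(100, 2))                       # two latent factors
X = np.hstack([t + 0.05 * rng.normal(size=(100, 2)) for _ in range(5)])  # 10 collinear columns
y = t @ np.array([2.0, -1.0]) + 0.1 * rng.normal(size=100)

pls = PLSRegression(n_components=2)                 # few components, no multicollinearity
pls.fit(X, y)
print("R^2:", pls.score(X, y))
```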

  17. Length bias correction in gene ontology enrichment analysis using logistic regression.

    Science.gov (United States)

    Mi, Gu; Di, Yanming; Emerson, Sarah; Cumbie, Jason S; Chang, Jeff H

    2012-01-01

    When assessing differential gene expression from RNA sequencing data, commonly used statistical tests tend to have greater power to detect differential expression of genes encoding longer transcripts. This phenomenon, called "length bias", will influence subsequent analyses such as Gene Ontology enrichment analysis. In the presence of length bias, Gene Ontology categories that include longer genes are more likely to be identified as enriched. These categories, however, are not necessarily biologically more relevant. We show that one can effectively adjust for length bias in Gene Ontology analysis by including transcript length as a covariate in a logistic regression model. The logistic regression model makes the statistical issue underlying length bias more transparent: transcript length becomes a confounding factor when it correlates with both the Gene Ontology membership and the significance of the differential expression test. The inclusion of the transcript length as a covariate allows one to investigate the direct correlation between the Gene Ontology membership and the significance of testing differential expression, conditional on the transcript length. We present both real and simulated data examples to show that the logistic regression approach is simple, effective, and flexible.
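    A schematic of the adjustment on simulated data (the paper's actual model and data differ): logistic regression of GO-category membership on differential-expression significance with log transcript length as a covariate, so the enrichment coefficient is conditional on length:

```python
# Sketch: length-adjusted GO enrichment via logistic regression.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
log_len = rng.normal(loc=8, scale=1, size=2000)                      # log transcript length
de_sig = rng.binomial(1, 1 / (1 + np.exp(-(log_len - 8))))           # DE calls favour long genes
in_go = rng.binomial(1, 1 / (1 + np.exp(-0.3 * (log_len - 8))))      # category with long genes

X = sm.add_constant(np.column_stack([de_sig, log_len]))
fit = sm.GLM(in_go, X, family=sm.families.Binomial()).fit()
print(fit.params)   # the de_sig coefficient is the length-adjusted enrichment
```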

  18. The intermediate endpoint effect in logistic and probit regression

    Science.gov (United States)

    MacKinnon, DP; Lockwood, CM; Brown, CH; Wang, W; Hoffman, JM

    2010-01-01

    Background An intermediate endpoint is hypothesized to be in the middle of the causal sequence relating an independent variable to a dependent variable. The intermediate variable is also called a surrogate or mediating variable and the corresponding effect is called the mediated, surrogate endpoint, or intermediate endpoint effect. Clinical studies are often designed to change an intermediate or surrogate endpoint and through this intermediate change influence the ultimate endpoint. In many intermediate endpoint clinical studies the dependent variable is binary, and logistic or probit regression is used. Purpose The purpose of this study is to describe a limitation of a widely used approach to assessing intermediate endpoint effects and to propose an alternative method, based on products of coefficients, that yields more accurate results. Methods The intermediate endpoint model for a binary outcome is described for a true binary outcome and for a dichotomization of a latent continuous outcome. Plots of true values and a simulation study are used to evaluate the different methods. Results Distorted estimates of the intermediate endpoint effect and incorrect conclusions can result from the application of widely used methods to assess the intermediate endpoint effect. The same problem occurs for the proportion of an effect explained by an intermediate endpoint, which has been suggested as a useful measure for identifying intermediate endpoints. A solution to this problem is given based on the relationship between latent variable modeling and logistic or probit regression. Limitations More complicated intermediate variable models are not addressed in the study, although the methods described in the article can be extended to these more complicated models. Conclusions Researchers are encouraged to use an intermediate endpoint method based on the product of regression coefficients. A common method based on difference in coefficient methods can lead to distorted

  19. Application of principal component regression and partial least squares regression in ultraviolet spectrum water quality detection

    Science.gov (United States)

    Li, Jiangtong; Luo, Yongdao; Dai, Honglin

    2018-01-01

    Water is the source of life and the essential foundation of all life. With the development of industrialization, water pollution has become more and more frequent, directly affecting human survival and development. Water quality detection is one of the necessary measures to protect water resources. Ultraviolet (UV) spectral analysis is an important research method in the field of water quality detection, in which partial least squares regression (PLSR) is becoming the predominant technology; however, in some special cases, PLSR produces considerable errors. In order to solve this problem, the traditional principal component regression (PCR) method was improved in this paper by using the principles of PLSR. The experimental results show that for some special experimental data sets, the improved PCR method performs better than PLSR. PCR and PLSR are the focus of this paper. Firstly, principal component analysis (PCA) is performed in MATLAB to reduce the dimensionality of the spectral data; on the basis of a large number of experiments, the optimized principal components, which carry most of the original data information, are extracted using the principles of PLSR. Secondly, linear regression analysis of the principal components is carried out with the Statistical Package for the Social Sciences (SPSS), from which the coefficients and relations of the principal components can be obtained. Finally, the same water spectral data set is calculated by both PLSR and improved PCR, and the two results are analyzed and compared: improved PCR and PLSR are similar for most data, but improved PCR is better than PLSR for data near the detection limit. Both PLSR and improved PCR can be used in UV spectral analysis of water, but for data near the detection limit, the improved PCR results are better than those of PLSR.
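    The paper works in MATLAB and SPSS; purely as an illustration of the PCR idea (PCA compression followed by linear regression on the scores), a scikit-learn pipeline on stand-in spectra:

```python
# Sketch: principal component regression as a two-step pipeline.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(9)
spectra = rng.normal(size=(60, 200))                              # stand-in UV spectra
conc = spectra[:, :5].sum(axis=1) + 0.1 * rng.normal(size=60)     # stand-in analyte level

pcr = make_pipeline(PCA(n_components=5), LinearRegression())
pcr.fit(spectra, conc)
print("R^2:", pcr.score(spectra, conc))
```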

  20. Quantile regression for the statistical analysis of immunological data with many non-detects

    NARCIS (Netherlands)

    Eilers, P.H.C.; Roder, E.; Savelkoul, H.F.J.; Wijk, van R.G.

    2012-01-01

    Background Immunological parameters are hard to measure. A well-known problem is the occurrence of values below the detection limit, the non-detects. Non-detects are a nuisance, because classical statistical analyses, like ANOVA and regression, cannot be applied. The more advanced statistical

  1. Quantile regression for the statistical analysis of immunological data with many non-detects

    NARCIS (Netherlands)

    P.H.C. Eilers (Paul); E. Röder (Esther); H.F.J. Savelkoul (Huub); R. Gerth van Wijk (Roy)

    2012-01-01

    Background: Immunological parameters are hard to measure. A well-known problem is the occurrence of values below the detection limit, the non-detects. Non-detects are a nuisance, because classical statistical analyses, like ANOVA and regression, cannot be applied. The more advanced

  2. Asymptotic normality of kernel estimator of $\psi$-regression function for functional ergodic data

    OpenAIRE

    Laksaci ALI; Benziadi Fatima; Gheriballak Abdelkader

    2016-01-01

    In this paper we consider the problem of the estimation of the $\psi$-regression function when the covariates take values in an infinite dimensional space. Our main aim is to establish, under a stationary ergodic process assumption, the asymptotic normality of this estimate.

  3. Testing Developmental Pathways to Antisocial Personality Problems

    Science.gov (United States)

    Diamantopoulou, Sofia; Verhulst, Frank C.; van der Ende, Jan

    2010-01-01

    This study examined the development of antisocial personality problems (APP) in young adulthood from disruptive behaviors and internalizing problems in childhood and adolescence. Parent ratings of 507 children's (aged 6-8 years) symptoms of attention deficit hyperactivity disorder, oppositional defiant disorder, and anxiety, were linked to…

  4. A study of fuzzy logic ensemble system performance on face recognition problem

    Science.gov (United States)

    Polyakova, A.; Lipinskiy, L.

    2017-02-01

    Some problems are difficult to solve using a single intelligent information technology (IIT). An ensemble of various data mining (DM) techniques is a set of models, each of which is able to solve the problem by itself, but whose combination increases the efficiency of the system as a whole. Using IIT ensembles can improve the reliability and efficiency of the final decision, since the approach emphasizes the diversity of its components. A new method for designing ensembles of intelligent information technologies is considered in this paper. It is based on fuzzy logic and is designed to solve classification and regression problems. The ensemble consists of several data mining algorithms: an artificial neural network, a support vector machine and decision trees. These algorithms and their ensemble have been tested on face recognition problems. Principal component analysis (PCA) is used for feature selection.

  5. Design of a Maglev Vibration Test Platform for the Research of Maglev Vehicle-girder Coupled Vibration Problem

    Directory of Open Access Journals (Sweden)

    Zhou Danfeng

    2017-01-01

    Full Text Available The maglev vehicle-girder coupled vibration problem has been encountered on many maglev test and commercial lines, and it significantly degrades the performance of the maglev train. Previous research on the principle of the coupled vibration problem discovered that the fundamental model of the maglev girder can be simplified as a series of mass-spring resonators of different but related resonance frequencies, and that the stability of the vehicle-girder coupled system can be investigated by separately examining the stability of each mass-spring resonator – electromagnet coupled system. Based on this conclusion, a maglev test platform, which includes a single electromagnetic suspension control system, was built for experimental study of the coupled vibration problem. The guideway of the test platform is supported by a number of springs so as to change its flexibility. The mass of the guideway can also be changed by adjusting extra weights attached to it. By changing the flexibility and mass of the guideway, the rules of the maglev vehicle-girder coupled vibration problem are examined through experiments, and the theory of vehicle-girder self-excited vibration proposed in previous research is also verified.

  6. Cox regression with missing covariate data using a modified partial likelihood method

    DEFF Research Database (Denmark)

    Martinussen, Torben; Holst, Klaus K.; Scheike, Thomas H.

    2016-01-01

    Missing covariate values is a common problem in survival analysis. In this paper we propose a novel method for the Cox regression model that is close to maximum likelihood but avoids the use of the EM-algorithm. It exploits that the observed hazard function is multiplicative in the baseline hazard...

  7. Poisson regression approach for modeling fatal injury rates amongst Malaysian workers

    International Nuclear Information System (INIS)

    Kamarulzaman Ibrahim; Heng Khai Theng

    2005-01-01

    Many safety studies are based on analyses carried out on injury surveillance data. The injury surveillance data gathered for the analysis include information on the number of employees at risk of injury in each of several strata, where the strata are defined in terms of a series of important predictor variables. Further insight into the relationship between fatal injury rates and predictor variables may be obtained by the Poisson regression approach. Poisson regression is widely used in analyzing count data. In this study, Poisson regression is used to model the relationship between fatal injury rates and predictor variables, which are year (1995-2002), gender, recording system and industry type. Data for the analysis were obtained from PERKESO and Jabatan Perangkaan Malaysia. It was found that the assumption that the data follow a Poisson distribution was violated. After correction for the problem of overdispersion, the predictor variables found to be significant in the model are gender, system of recording, industry type, and two interaction effects (between recording system and industry type, and between year and industry type).

  8. Iterative and range test methods for an inverse source problem for acoustic waves

    International Nuclear Information System (INIS)

    Alves, Carlos; Kress, Rainer; Serranho, Pedro

    2009-01-01

    We propose two methods for solving an inverse source problem for time-harmonic acoustic waves. Based on the reciprocity gap principle a nonlinear equation is presented for the locations and intensities of the point sources that can be solved via Newton iterations. To provide an initial guess for this iteration we suggest a range test algorithm for approximating the source locations. We give a mathematical foundation for the range test and exhibit its feasibility in connection with the iteration method by some numerical examples

  9. A brief introduction to regression designs and mixed-effects modelling by a recent convert

    DEFF Research Database (Denmark)

    Balling, Laura Winther

    2008-01-01

    This article discusses the advantages of multiple regression designs over the factorial designs traditionally used in many psycholinguistic experiments. It is shown that regression designs are typically more informative, statistically more powerful and better suited to the analysis of naturalistic... tasks. The advantages of including both fixed and random effects are demonstrated with reference to linear mixed-effects models, and problems of collinearity, variable distribution and variable selection are discussed. The advantages of these techniques are exemplified in an analysis of a word
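    A hedged sketch of a mixed-effects regression design, here with a fixed effect of word frequency and random intercepts for subjects via statsmodels (variable names hypothetical, in the spirit of the psycholinguistic examples discussed):

```python
# Sketch: linear mixed model with random subject intercepts (simulated data).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(12)
n_subj, n_items = 20, 30
df = pd.DataFrame({
    "subject": np.repeat(np.arange(n_subj), n_items),
    "log_freq": np.tile(rng.normal(size=n_items), n_subj),
})
subj_int = rng.normal(scale=50, size=n_subj)[df["subject"]]   # per-subject offsets
df["rt"] = 600 - 30 * df["log_freq"] + subj_int + rng.normal(scale=40, size=len(df))

fit = smf.mixedlm("rt ~ log_freq", df, groups=df["subject"]).fit()
print(fit.summary())
```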

  10. Discontinuous Petrov-Galerkin method based on the optimal test space norm for one-dimensional transport problems

    KAUST Repository

    Niemi, Antti

    2011-05-14

    We revisit the finite element analysis of convection-dominated flow problems within the recently developed Discontinuous Petrov-Galerkin (DPG) variational framework. We demonstrate how test function spaces that guarantee numerical stability can be computed automatically with respect to the so-called optimal test space norm by using an element subgrid discretization. This should make the DPG method not only stable but also robust, that is, uniformly stable with respect to the Péclet number in the current application. The effectiveness of the algorithm is demonstrated on two problems for the linear advection-diffusion equation.

  11. Impact of multicollinearity on small sample hydrologic regression models

    Science.gov (United States)

    Kroll, Charles N.; Song, Peter

    2013-06-01

    Often hydrologic regression models are developed with ordinary least squares (OLS) procedures. The use of OLS with highly correlated explanatory variables produces multicollinearity, which creates highly sensitive parameter estimators with inflated variances and improper model selection. It is not clear how to best address multicollinearity in hydrologic regression models. Here a Monte Carlo simulation is developed to compare four techniques to address multicollinearity: OLS, OLS with variance inflation factor screening (VIF), principal component regression (PCR), and partial least squares regression (PLS). The performance of these four techniques was observed for varying sample sizes, correlation coefficients between the explanatory variables, and model error variances consistent with hydrologic regional regression models. The negative effects of multicollinearity are magnified at smaller sample sizes, higher correlations between the variables, and larger model error variances (smaller R²). The Monte Carlo simulation indicates that if the true model is known, multicollinearity is present, and the estimation and statistical testing of regression parameters are of interest, then PCR or PLS should be employed. If the model is unknown, or if the interest is solely in model predictions, it is recommended that OLS be employed, since using more complicated techniques did not produce any improvement in model performance. A leave-one-out cross-validation case study was also performed using low-streamflow data sets from the eastern United States. Results indicate that OLS with stepwise selection generally produces models across study regions with varying levels of multicollinearity that are as good as biased regression techniques such as PCR and PLS.
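    The VIF screen mentioned above is simple to compute; a statsmodels sketch with a deliberately collinear pair of stand-in basin characteristics:

```python
# Sketch: variance inflation factors as a multicollinearity screen before OLS.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(10)
basin_area = rng.lognormal(size=50)
mean_precip = 0.9 * basin_area + 0.1 * rng.normal(size=50)    # nearly collinear pair
slope = rng.normal(size=50)

X = sm.add_constant(np.column_stack([basin_area, mean_precip, slope]))
for i in range(1, X.shape[1]):                                # skip the constant column
    print("VIF:", variance_inflation_factor(X, i))            # >10 is a common screen threshold
```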

  12. Gaussian Processes and Polynomial Chaos Expansion for Regression Problem: Linkage via the RKHS and Comparison via the KL Divergence

    Directory of Open Access Journals (Sweden)

    Liang Yan

    2018-03-01

    Full Text Available In this paper, we examine two widely-used approaches, the polynomial chaos expansion (PCE) and Gaussian process (GP) regression, for the development of surrogate models. The theoretical differences between the PCE and GP approximations are discussed. A state-of-the-art PCE approach is constructed based on high precision quadrature points; however, the need for truncation may result in potential precision loss. The GP approach performs well on small datasets and allows a fine and precise trade-off between fitting the data and smoothing, but its overall performance depends largely on the training dataset. The reproducing kernel Hilbert space (RKHS) and Mercer’s theorem are introduced to form a linkage between the two methods. The theorem proves that the two surrogates can be embedded in two isomorphic RKHS, by which we propose a novel method named Gaussian process on polynomial chaos basis (GPCB) that incorporates the PCE and GP. A theoretical comparison is made between the PCE and GPCB with the help of the Kullback–Leibler divergence. We show that the GPCB is as stable and accurate as the PCE method. Furthermore, the GPCB is a one-step Bayesian method that chooses the best subset of the RKHS in which the true function should lie, while the PCE method requires an adaptive procedure. Simulations of 1D and 2D benchmark functions show that GPCB outperforms both the PCE and classical GP methods. In order to solve high dimensional problems, a random sample scheme with a constructive design (i.e., a tensor product of quadrature points) is proposed to generate a valid training dataset for the GPCB method. This approach utilizes the high numerical accuracy of the quadrature points while ensuring computational feasibility. Finally, the experimental results show that our sample strategy has higher accuracy than classical experimental designs; meanwhile, it is suitable for solving high dimensional problems.
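    As a minimal illustration of the GP side of the comparison (not the paper's GPCB method), a scikit-learn Gaussian process surrogate on a 1D toy function:

```python
# Sketch: GP surrogate of a 1D benchmark function; kernel choice illustrative.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def f(x):
    return np.sin(3 * x) + 0.5 * x            # toy target function

X_train = np.linspace(0, 3, 12).reshape(-1, 1)
y_train = f(X_train).ravel()

gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(1e-4))
gp.fit(X_train, y_train)
mean, std = gp.predict(np.array([[1.5]]), return_std=True)
print(mean, std)                               # posterior mean and uncertainty
```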

  13. How Can Comparison Groups Strengthen Regression Discontinuity Designs?

    Science.gov (United States)

    Wing, Coady; Cook, Thomas D.

    2011-01-01

    In this paper, the authors examine some of the ways that different types of non-equivalent comparison groups can be used to strengthen causal inferences based on regression discontinuity design (RDD). First, they consider a design that incorporates pre-test data on assignment scores and outcomes that were collected either before the treatment…

  14. Single image super-resolution using locally adaptive multiple linear regression.

    Science.gov (United States)

    Yu, Soohwan; Kang, Wonseok; Ko, Seungyong; Paik, Joonki

    2015-12-01

    This paper presents a regularized superresolution (SR) reconstruction method using locally adaptive multiple linear regression to overcome the limitation of spatial resolution of digital images. In order to make the SR problem better-posed, the proposed method incorporates the locally adaptive multiple linear regression into the regularization process as a local prior. The local regularization prior assumes that the target high-resolution (HR) pixel is generated by a linear combination of similar pixels in differently scaled patches and optimum weight parameters. In addition, we adapt a modified version of the nonlocal means filter as a smoothness prior to utilize the patch redundancy. Experimental results show that the proposed algorithm better restores HR images than existing state-of-the-art methods in the sense of the most objective measures in the literature.

  15. Validity of the reduced-sample insulin modified frequently-sampled intravenous glucose tolerance test using the nonlinear regression approach.

    Science.gov (United States)

    Sumner, Anne E; Luercio, Marcella F; Frempong, Barbara A; Ricks, Madia; Sen, Sabyasachi; Kushner, Harvey; Tulloch-Reid, Marshall K

    2009-02-01

    The disposition index, the product of the insulin sensitivity index (S(I)) and the acute insulin response to glucose, is linked in African Americans to chromosome 11q. This link was determined with S(I) calculated with the nonlinear regression approach to the minimal model and data from the reduced-sample insulin-modified frequently-sampled intravenous glucose tolerance test (Reduced-Sample-IM-FSIGT). However, the application of the nonlinear regression approach to calculate S(I) using data from the Reduced-Sample-IM-FSIGT has been challenged as being not only inaccurate but also having a high failure rate in insulin-resistant subjects. Our goal was to determine the accuracy and failure rate of the Reduced-Sample-IM-FSIGT using the nonlinear regression approach to the minimal model. With S(I) from the Full-Sample-IM-FSIGT considered the standard and using the nonlinear regression approach to the minimal model, we compared the agreement between S(I) from the Full- and Reduced-Sample-IM-FSIGT protocols. One hundred African Americans (body mass index, 31.3 +/- 7.6 kg/m(2) [mean +/- SD]; range, 19.0-56.9 kg/m(2)) had FSIGTs. Glucose (0.3 g/kg) was given at baseline. Insulin was infused from 20 to 25 minutes (total insulin dose, 0.02 U/kg). For the Full-Sample-IM-FSIGT, S(I) was calculated based on the glucose and insulin samples taken at -1, 1, 2, 3, 4, 5, 6, 7, 8,10, 12, 14, 16, 19, 22, 23, 24, 25, 27, 30, 40, 50, 60, 70, 80, 90, 100, 120, 150, and 180 minutes. For the Reduced-Sample-FSIGT, S(I) was calculated based on the time points that appear in bold. Agreement was determined by Spearman correlation, concordance, and the Bland-Altman method. In addition, for both protocols, the population was divided into tertiles of S(I). Insulin resistance was defined by the lowest tertile of S(I) from the Full-Sample-IM-FSIGT. The distribution of subjects across tertiles was compared by rank order and kappa statistic. We found that the rate of failure of resolution of S(I) by

  16. WASP (Write a Scientific Paper) using Excel - 13: Correlation and Regression.

    Science.gov (United States)

    Grech, Victor

    2018-07-01

    Correlation and regression measure the closeness of association between two continuous variables. This paper explains how to perform these tests in Microsoft Excel and their interpretation, as well as how to apply these tests dynamically using Excel's functions. Copyright © 2018 Elsevier B.V. All rights reserved.

  17. Management of Industrial Performance Indicators: Regression Analysis and Simulation

    Directory of Open Access Journals (Sweden)

    Walter Roberto Hernandez Vergara

    2017-11-01

    Full Text Available Stochastic methods can be used in problem solving and in the explanation of natural phenomena through the application of statistical procedures. This article aims to combine regression analysis and systems simulation in order to facilitate the practical understanding of data analysis. The algorithms were developed in Microsoft Office Excel, using statistical techniques such as regression theory, ANOVA and Cholesky factorization, which made it possible to create models of single and multiple systems with up to five independent variables. For the analysis of these models, Monte Carlo simulation and the analysis of industrial performance indicators were used, resulting in numerical indices intended to improve the management of goals for compliance indicators by identifying system instability, correlation and anomalies. The analytical models presented in the survey showed satisfactory results, with numerous possibilities for industrial and academic application as well as potential for deployment in new analytical techniques.
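    The Cholesky step the article relies on can be shown in a few lines: factor a target covariance matrix and use it to turn independent normals into correlated Monte Carlo inputs (NumPy sketch, illustrative numbers):

```python
# Sketch: Cholesky factorization for correlated Monte Carlo inputs.
import numpy as np

target_cov = np.array([[1.0, 0.8],
                       [0.8, 1.0]])          # desired correlation structure
L = np.linalg.cholesky(target_cov)

rng = np.random.default_rng(11)
z = rng.normal(size=(2, 10_000))             # independent standard normals
correlated = L @ z                           # rows now have the target covariance
print(np.corrcoef(correlated))               # close to the 0.8 target
```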

  18. General regression and representation model for classification.

    Directory of Open Access Journals (Sweden)

    Jianjun Qian

    Full Text Available Recently, regularized coding-based classification methods (e.g., SRC and CRC) have shown great potential for pattern classification. However, most existing coding methods assume that the representation residuals are uncorrelated. In real-world applications, this assumption does not hold. In this paper, we take account of the correlations of the representation residuals and develop a general regression and representation model (GRR) for classification. GRR not only has the advantages of CRC, but also makes full use of prior information (e.g., the correlations between representation residuals and representation coefficients) and specific information (the weight matrix of image pixels) to enhance classification performance. GRR uses generalized Tikhonov regularization and K Nearest Neighbors to learn the prior information from the training data. Meanwhile, the specific information is obtained by using an iterative algorithm to update the feature (or image pixel) weights of the test sample. With the proposed model as a platform, we design two classifiers: the basic general regression and representation classifier (B-GRR) and the robust general regression and representation classifier (R-GRR). The experimental results demonstrate the performance advantages of the proposed methods over state-of-the-art algorithms.

  19. Deep ensemble learning of sparse regression models for brain disease diagnosis.

    Science.gov (United States)

    Suk, Heung-Il; Lee, Seong-Whan; Shen, Dinggang

    2017-04-01

    Recent studies on brain imaging analysis witnessed the core roles of machine learning techniques in computer-assisted intervention for brain disease diagnosis. Of various machine-learning techniques, sparse regression models have proved their effectiveness in handling high-dimensional data but with a small number of training samples, especially in medical problems. In the meantime, deep learning methods have been making great successes by outperforming the state-of-the-art performances in various applications. In this paper, we propose a novel framework that combines the two conceptually different methods of sparse regression and deep learning for Alzheimer's disease/mild cognitive impairment diagnosis and prognosis. Specifically, we first train multiple sparse regression models, each of which is trained with different values of a regularization control parameter. Thus, our multiple sparse regression models potentially select different feature subsets from the original feature set; thereby they have different powers to predict the response values, i.e., clinical label and clinical scores in our work. By regarding the response values from our sparse regression models as target-level representations, we then build a deep convolutional neural network for clinical decision making, which thus we call 'Deep Ensemble Sparse Regression Network.' To our best knowledge, this is the first work that combines sparse regression models with deep neural network. In our experiments with the ADNI cohort, we validated the effectiveness of the proposed method by achieving the highest diagnostic accuracies in three classification tasks. We also rigorously analyzed our results and compared with the previous studies on the ADNI cohort in the literature. Copyright © 2017 Elsevier B.V. All rights reserved.

  20. Prediction of unwanted pregnancies using logistic regression, probit regression and discriminant analysis.

    Science.gov (United States)

    Ebrahimzadeh, Farzad; Hajizadeh, Ebrahim; Vahabi, Nasim; Almasian, Mohammad; Bakhteyar, Katayoon

    2015-01-01

    Unwanted pregnancy, not intended by at least one of the parents, has undesirable consequences for the family and society. In the present study, three classification models were used and compared to predict unwanted pregnancies in an urban population. In this cross-sectional study, 887 pregnant mothers referring to health centers in Khorramabad, Iran, in 2012 were selected by stratified cluster sampling; relevant variables were measured, and logistic regression, discriminant analysis, and probit regression models, implemented in SPSS software version 21, were used to predict unwanted pregnancy. To compare these models, indicators such as sensitivity, specificity, the area under the ROC curve, and the percentage of correct predictions were used. The prevalence of unwanted pregnancies was 25.3%. The logistic and probit regression models indicated that parity, pregnancy spacing, contraceptive methods, household income and the number of living male children were related to unwanted pregnancy. The performance of the models based on the area under the ROC curve was 0.735, 0.733, and 0.680 for logistic regression, probit regression, and linear discriminant analysis, respectively. Given the relatively high prevalence of unwanted pregnancies in Khorramabad, it seems necessary to revise family planning programs. Despite the similar accuracy of the models, if the researcher is interested in the interpretability of the results, the use of the logistic regression model is recommended.

  1. Depression and Related Problems in University Students

    Science.gov (United States)

    Field, Tiffany; Diego, Miguel; Pelaez, Martha; Deeds, Osvelia; Delgado, Jeannette

    2012-01-01

    Method: Depression and related problems were studied in a sample of 283 university students. Results: The students with high depression scores also had high scores on anxiety, intrusive thoughts, controlling intrusive thoughts and sleep disturbances scales. A stepwise regression suggested that those problems contributed to a significant proportion…

  2. Spontaneous regression of multiple pulmonary metastatic nodules of hepatocarcinoma: a case report

    Energy Technology Data Exchange (ETDEWEB)

    Bahk, Yong Whee; Park, Seog Hee; Kim, Sun Moo [St. Mary's Hospital, Catholic Medical College, Seoul (Korea, Republic of)]

    1981-09-15

    Although spontaneous regression of either primary or metastatic malignant tumors in the absence of therapy, or with inadequate therapy, has been well documented, cases involving hepatocarcinoma have been very rare. Since the earliest days of this century, various malignant tumors have been reported to spontaneously disappear or to have their growth arrested. From the literature, we were able to find 5 previously reported cases of hepatocarcinoma which showed spontaneous regression at the primary site. Recently we have seen a case of multiple pulmonary metastatic nodules of hepatocarcinoma which completely regressed spontaneously, and this forms the basis of the present case report. The patient was a 55-year-old male admitted to St. Mary's Hospital, Catholic Medical College because of a hard palpable mass in the epigastrium on April 26, 1978. The admission PA chest roentgenogram revealed multiple small nodular densities scattered throughout both lung fields, especially in the lower zones and toward the peripheral portions. A hepatoscintigram revealed a large cold area involving the left lobe and intermediate zone of the liver. Alpha-fetoprotein and hepatitis B serum antigen tests were positive, whereas many other standard liver function tests were negative. A needle biopsy of the tumor revealed well differentiated hepatocellular carcinoma. The patient was put under chemotherapy consisting of 5-FU 500 mg intravenously for 6 days, from April 28 to May 3, 1978. The patient was discharged after this single course of 5-FU treatment and was on a herbal medicine, the nature and quantity of which are obscure. No other specific treatment was given. The second admission took place on Dec. 3, 1980 because of irregular bowel habits and dyspepsia. A follow-up PA chest roentgenogram obtained on the second admission revealed complete disappearance of the previously noted multiple pulmonary nodular lesions (Fig. 3). A follow-up liver scan revealed persistence of the cold area in the left lobe

  3. Spontaneous regression of multiple pulmonary metastatic nodules of hepatocarcinoma: a case report

    International Nuclear Information System (INIS)

    Bahk, Yong Whee; Park, Seog Hee; Kim, Sun Moo

    1981-01-01

    Spontaneous regression of either primary or metastatic malignant tumors, in the absence of therapy or under inadequate therapy, has been well documented. Since the earliest days of this century, various malignant tumors have been reported to disappear spontaneously or to have their growth arrested, but cases of hepatocarcinoma are very rare. From the literature we were able to find five previously reported cases of hepatocarcinoma showing spontaneous regression at the primary site. Recently we saw a case of multiple pulmonary metastatic nodules of hepatocarcinoma that regressed completely and spontaneously, and this forms the basis of the present case report. The patient was a 55-year-old male admitted to St. Mary's Hospital, Catholic Medical College, because of a hard palpable mass in the epigastrium on April 26, 1978. The admission PA chest roentgenogram revealed multiple small nodular densities scattered throughout both lung fields, especially in the lower zones and toward the periphery. A hepatoscintigram revealed a large cold area involving the left lobe and intermediate zone of the liver. Alpha-fetoprotein and hepatitis B serum antigen tests were positive, whereas many other standard liver function tests were negative. A needle biopsy of the tumor revealed well-differentiated hepatocellular carcinoma. The patient was put under chemotherapy consisting of 5-FU, 500 mg intravenously, for 6 days from April 28 to May 3, 1978. The patient was discharged after this single course of 5-FU treatment and was on a herbal medicine, the nature and quantity of which are obscure. No other specific treatment was given. The second admission took place on Dec. 3, 1980, because of irregular bowel habits and dyspepsia. A follow-up PA chest roentgenogram obtained on the second admission revealed complete disappearance of the previously noted multiple pulmonary nodular lesions (Fig. 3). A follow-up liver scan revealed persistence of the cold area in the left lobe

  4. Estimating the Impact of Urbanization on Air Quality in China Using Spatial Regression Models

    Directory of Open Access Journals (Sweden)

    Chuanglin Fang

    2015-11-01

    Urban air pollution is one of the most visible environmental problems to have accompanied China's rapid urbanization. Based on 2014 emission inventory data gathered from 289 cities, we used Global and Local Moran's I to measure the spatial autocorrelation of Air Quality Index (AQI) values at the city level, and employed Ordinary Least Squares (OLS), Spatial Lag Model (SAR), and Geographically Weighted Regression (GWR) models to quantitatively estimate the comprehensive impact and spatial variation of China's urbanization process on air quality. The results show significant spatial dependence and heterogeneity in AQI values. The regression models revealed that urbanization has played an important negative role in determining air quality in Chinese cities. Population, urbanization rate, automobile density, and the proportion of secondary industry were all found to have a significant influence on air quality. Per capita Gross Domestic Product (GDP) and the scale of urban land use, however, failed the significance test at the 10% level. The GWR model performed better than the global models, and the GWR results show that the relationship between urbanization and air quality was not constant in space. Further, the local parameter estimates suggest significant spatial variation in the impacts of the various urbanization factors on air quality.
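
    For illustration, Global Moran's I can be computed directly; the sketch below uses a hypothetical distance-based weight matrix and simulated AQI values, not the study's 289-city data:

        # Minimal sketch of Global Moran's I for spatial autocorrelation.
        import numpy as np

        def morans_i(x, w):
            """x: values (n,); w: spatial weight matrix (n, n), zero diagonal."""
            n = len(x)
            z = x - x.mean()
            return n * (z @ w @ z) / (w.sum() * (z @ z))

        rng = np.random.default_rng(1)
        coords = rng.uniform(size=(30, 2))
        d = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
        w = (d < 0.3).astype(float)       # hypothetical neighborhood weights
        np.fill_diagonal(w, 0.0)
        aqi = 80 + 30 * coords[:, 0] + rng.normal(scale=5, size=30)  # spatial trend
        print(f"Moran's I = {morans_i(aqi, w):.3f}")   # > 0 suggests clustering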

  5. Effects of dependence in high-dimensional multiple testing problems

    Directory of Open Access Journals (Sweden)

    van de Wiel Mark A

    2008-02-01

    Background: We consider the effects of dependence among variables of high-dimensional data in multiple hypothesis testing problems, in particular in False Discovery Rate (FDR) control procedures. Recent simulation studies consider only simple correlation structures among variables, which are hardly inspired by real data features. Our aim is to systematically study the effects of several network features, such as sparsity and correlation strength, by imposing dependence structures among variables using random correlation matrices. Results: We study the robustness against dependence of several FDR procedures that are popular in microarray studies, such as the Benjamini-Hochberg FDR, Storey's q-value, SAM, and resampling-based FDR procedures. False non-discovery rates and estimates of the number of null hypotheses are computed from these methods and compared. Our simulation study shows that methods such as SAM and the q-value do not adequately control the FDR at the claimed level under dependence conditions. On the other hand, the adaptive Benjamini-Hochberg procedure seems to be the most robust while remaining conservative. Finally, the estimates of the number of true null hypotheses are variable under the various dependence conditions. Conclusion: We discuss a new method for efficient guided simulation of dependent data that satisfy imposed network constraints as conditional independence structures. Our simulation set-up allows for a structural study of the effect of dependencies on multiple testing criteria and is useful for testing a potentially new method of π0 or FDR estimation in a dependency context.
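
    A sketch of the non-adaptive Benjamini-Hochberg step-up procedure that the study benchmarks, run on simulated p-values rather than microarray data:

        # Benjamini-Hochberg: reject the k smallest p-values, where k is the
        # largest index with p_(k) <= (k/m) * alpha.
        import numpy as np

        def benjamini_hochberg(pvals, alpha=0.05):
            """Return a boolean rejection mask controlling FDR at level alpha."""
            p = np.asarray(pvals)
            m = len(p)
            order = np.argsort(p)
            thresh = alpha * np.arange(1, m + 1) / m
            below = p[order] <= thresh
            k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
            reject = np.zeros(m, dtype=bool)
            reject[order[:k]] = True
            return reject

        rng = np.random.default_rng(2)
        p = np.concatenate([rng.uniform(size=900),           # true nulls
                            rng.beta(0.1, 10, size=100)])    # signals
        print(benjamini_hochberg(p).sum(), "rejections at FDR 0.05")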

  6. Enhancement of Visual Field Predictions with Pointwise Exponential Regression (PER) and Pointwise Linear Regression (PLR).

    Science.gov (United States)

    Morales, Esteban; de Leon, John Mark S; Abdollahi, Niloufar; Yu, Fei; Nouri-Mahdavi, Kouros; Caprioli, Joseph

    2016-03-01

    The study was conducted to evaluate threshold smoothing algorithms to enhance prediction of the rates of visual field (VF) worsening in glaucoma. We studied 798 patients with primary open-angle glaucoma and 6 or more years of follow-up who underwent 8 or more VF examinations. Thresholds at each VF location for the first 4 years or the first half of the follow-up time (whichever was greater) were smoothed with clusters defined by the nearest neighbor (NN), Garway-Heath, and Glaucoma Hemifield Test (GHT) schemes, and by weighting by the correlation of rates at all other VF locations. Thresholds were regressed with a pointwise exponential regression (PER) model and a pointwise linear regression (PLR) model. Smaller root mean square error (RMSE) values for the differences between the observed and predicted thresholds at the last two follow-ups indicated better model predictions. The mean (SD) follow-up times for the smoothing and prediction phases were 5.3 (1.5) and 10.5 (3.9) years. The mean RMSE values for the PER and PLR models were: unsmoothed data, 6.09 and 6.55; NN, 3.40 and 3.42; Garway-Heath, 3.47 and 3.48; GHT, 3.57 and 3.74; and correlation of rates, 3.59 and 3.64. Smoothed VF data predicted better than unsmoothed data. Nearest neighbor provided the best predictions, and PER predicted consistently more accurately than PLR. Smoothing algorithms should be used when forecasting VF results with PER or PLR. The application of smoothing algorithms to VF data can improve forecasting at VF points to assist treatment decisions.
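
    The two pointwise models can be sketched at a single VF location; the exponential form below is one common parameterization, not necessarily the authors' exact model, and the data are synthetic:

        # PER vs. PLR at one visual-field location, forecast compared.
        import numpy as np
        from scipy.optimize import curve_fit

        t = np.arange(0, 8.0, 0.5)                    # years of follow-up
        true = 30 * np.exp(-0.08 * t)                 # decaying threshold (dB)
        y = true + np.random.default_rng(3).normal(scale=1.0, size=t.size)

        # PER: threshold = a * exp(b * t)
        (a, b), _ = curve_fit(lambda t, a, b: a * np.exp(b * t), t, y, p0=(30, -0.1))
        # PLR: threshold = c + d * t
        d, c = np.polyfit(t, y, 1)

        t_future = 10.0
        print(f"PER forecast at year {t_future}: {a * np.exp(b * t_future):.1f} dB")
        print(f"PLR forecast at year {t_future}: {c + d * t_future:.1f} dB")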

  7. Coalitions and family problem solving with preadolescents in referred, at-risk, and comparison families.

    Science.gov (United States)

    Vuchinich, S; Wood, B; Vuchinich, R

    1994-12-01

    This study tested the hypothesis that the mother-father coalition, parent-child coalitions, and parental warmth expressed toward the child are associated with family problem solving in families with a preadolescent child referred for treatment of behavior problems (n = 30), families with a child at risk for conduct disorder (n = 68), and a sample of comparison families (n = 90). Referred and at-risk families displayed less effective problem solving. A regression analysis, which controlled for gender of the child, family structure, family income, marital satisfaction, and severity of child problems, showed that strong parental coalitions were linked to low levels of family problem solving in at-risk and referred families. Parent-child coalitions had little apparent impact, while parental warmth was highly correlated with better family problem solving. The results may be interpreted as evidence of a tendency for parents in at-risk and referred families to "scapegoat" a preadolescent during family problem-solving sessions. This may undermine progress on family problem solutions and complicate family-based prevention and treatment programs that use family problem-solving sessions.

  8. ISP 22 OECD/NEA/CSNI International standard problem n. 22. Evaluation of post-test analyses

    International Nuclear Information System (INIS)

    1992-07-01

    The present report deals with the open re-evaluation of the originally double-blind CSNI International Standard Problem 22, based on the test SP-FW-02 performed in the SPES facility. The SPES apparatus is an experimental simulator of the Westinghouse PWR-PUN plant. The test SP-FW-02 (ISP22) simulates a complete loss of feedwater with delayed injection of auxiliary feedwater. The main parts of the report are: an outline of the test facility and of the SP-FW-02 experiment; an overview of pre-test activities; an overview of the input models used by post-test participants; an evaluation of participant predictions; and an evaluation of the qualitative and quantitative code accuracy of pre-test and post-test calculations.

  9. Lesion mapping of social problem solving.

    Science.gov (United States)

    Barbey, Aron K; Colom, Roberto; Paul, Erick J; Chau, Aileen; Solomon, Jeffrey; Grafman, Jordan H

    2014-10-01

    Accumulating neuroscience evidence indicates that human intelligence is supported by a distributed network of frontal and parietal regions that enable complex, goal-directed behaviour. However, the contributions of this network to social aspects of intellectual function remain to be well characterized. Here, we report a human lesion study (n = 144) that investigates the neural bases of social problem solving (measured by the Everyday Problem Solving Inventory) and examine the degree to which individual differences in performance are predicted by a broad spectrum of psychological variables, including psychometric intelligence (measured by the Wechsler Adult Intelligence Scale), emotional intelligence (measured by the Mayer, Salovey, Caruso Emotional Intelligence Test), and personality traits (measured by the Neuroticism-Extraversion-Openness Personality Inventory). Scores for each variable were obtained, followed by voxel-based lesion-symptom mapping. Stepwise regression analyses revealed that working memory, processing speed, and emotional intelligence predict individual differences in everyday problem solving. A targeted analysis of specific everyday problem solving domains (involving friends, home management, consumerism, work, information management, and family) revealed psychological variables that selectively contribute to each. Lesion mapping results indicated that social problem solving, psychometric intelligence, and emotional intelligence are supported by a shared network of frontal, temporal, and parietal regions, including white matter association tracts that bind these areas into a coordinated system. The results support an integrative framework for understanding social intelligence and make specific recommendations for the application of the Everyday Problem Solving Inventory to the study of social problem solving in health and disease. © The Author (2014). Published by Oxford University Press on behalf of the Guarantors of Brain. All rights reserved

  10. Predicting Dropouts of University Freshmen: A Logit Regression Analysis.

    Science.gov (United States)

    Lam, Y. L. Jack

    1984-01-01

    Stepwise discriminant analysis coupled with logit regression analysis of freshmen data from Brandon University (Manitoba) indicated that six tested variables drawn from research on university dropouts were useful in predicting attrition: student status, residence, financial sources, distance from home town, goal fulfillment, and satisfaction with…

  11. Building optimal regression tree by ant colony system-genetic algorithm: Application to modeling of melting points

    Energy Technology Data Exchange (ETDEWEB)

    Hemmateenejad, Bahram, E-mail: hemmatb@sums.ac.ir [Department of Chemistry, Shiraz University, Shiraz (Iran, Islamic Republic of); Medicinal and Natural Products Chemistry Research Center, Shiraz University of Medical Sciences, Shiraz (Iran, Islamic Republic of); Shamsipur, Mojtaba [Department of Chemistry, Razi University, Kermanshah (Iran, Islamic Republic of); Zare-Shahabadi, Vali [Young Researchers Club, Mahshahr Branch, Islamic Azad University, Mahshahr (Iran, Islamic Republic of); Akhond, Morteza [Department of Chemistry, Shiraz University, Shiraz (Iran, Islamic Republic of)

    2011-10-17

    Highlights: → Ant colony systems help to build optimal classification and regression trees. → Using genetic algorithm operators in ant colony systems resulted in more appropriate models. → Variable selection in each terminal node of the tree gives promising results. → CART-ACS-GA could model the melting points of organic materials with prediction errors lower than previous models. - Abstract: Classification and regression trees (CART) possess the advantage of being able to handle large data sets and yield readily interpretable models. A conventional method of building a regression tree is recursive partitioning, which results in a good, but not optimal, tree. Ant colony system (ACS), a meta-heuristic algorithm derived from the observation of real ants, can be used to overcome this problem. The purpose of this study was to explore the use of CART, and its combination with ACS, for modeling the melting points of a large variety of chemical compounds. Genetic algorithm (GA) operators (e.g., crossover and mutation) were combined with the ACS algorithm to select the best solution model. In addition, at each terminal node of the resulting tree, variable selection was performed by the ACS-GA algorithm to build an appropriate partial least squares (PLS) model. To test the ability of the resulting tree, a set of 4173 structures and their melting points were used (3000 compounds as a training set and 1173 as a validation set). Furthermore, an external test set containing 277 drugs was used to validate the predictive ability of the tree. Comparison of the results obtained from both trees showed that the tree constructed by the ACS-GA algorithm performs better than that produced by the recursive partitioning procedure.
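
    The ACS-GA tree search is too involved for a short sketch, but the recursive-partitioning baseline the authors compare against looks roughly like this on hypothetical descriptors:

        # Recursive-partitioning regression tree (scikit-learn), standing in
        # for the conventional CART baseline; descriptors and targets are
        # synthetic, not the paper's melting-point data.
        import numpy as np
        from sklearn.model_selection import train_test_split
        from sklearn.tree import DecisionTreeRegressor

        rng = np.random.default_rng(4)
        X = rng.normal(size=(3000, 20))               # hypothetical descriptors
        y = X[:, 0] * 40 + np.sin(X[:, 1]) * 25 + rng.normal(scale=10, size=3000)

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
        tree = DecisionTreeRegressor(max_depth=6, min_samples_leaf=20).fit(X_tr, y_tr)
        rmse = np.sqrt(np.mean((tree.predict(X_te) - y_te) ** 2))
        print(f"recursive-partitioning baseline RMSE: {rmse:.1f}")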

  12. Strategies for Testing Statistical and Practical Significance in Detecting DIF with Logistic Regression Models

    Science.gov (United States)

    Fidalgo, Angel M.; Alavi, Seyed Mohammad; Amirian, Seyed Mohammad Reza

    2014-01-01

    This study examines three controversial aspects in differential item functioning (DIF) detection by logistic regression (LR) models: first, the relative effectiveness of different analytical strategies for detecting DIF; second, the suitability of the Wald statistic for determining the statistical significance of the parameters of interest; and…

  13. Discontinuous Petrov–Galerkin method with optimal test functions for thin-body problems in solid mechanics

    KAUST Repository

    Niemi, Antti H.

    2011-02-01

    We study the applicability of the discontinuous Petrov-Galerkin (DPG) variational framework for thin-body problems in structural mechanics. Our numerical approach is based on discontinuous piecewise polynomial finite element spaces for the trial functions and approximate, local computation of the corresponding 'optimal' test functions. In the Timoshenko beam problem, the proposed method is shown to provide the best approximation in an energy-type norm which is equivalent to the L2-norm for all the unknowns, uniformly with respect to the thickness parameter. The same formulation remains valid for the asymptotic Euler-Bernoulli solution. As another one-dimensional model problem we consider the modelling of the so-called basic edge effect in shell deformations. In particular, we derive a special norm for the test space which leads to a robust method in terms of the shell thickness. Finally, we demonstrate how an a posteriori error estimator arising directly from the discontinuous variational framework can be utilized to generate an optimal hp-mesh for resolving the boundary layer. © 2010 Elsevier B.V.
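
    For orientation, the standard DPG definition of the 'optimal' test function (the general construction, not this paper's shell-specific norms) is: given the bilinear form b(·,·) and a test-space inner product (·,·)_V, the test function t_u attached to a trial function u solves

        \[
          (t_u, v)_V = b(u, v) \quad \forall v \in V,
          \qquad
          \|u\|_E := \sup_{0 \neq v \in V} \frac{|b(u, v)|}{\|v\|_V},
        \]

    so that the discrete solution is the best approximation in the energy norm ||·||_E; in practice t_u is computed approximately and element-locally, which is what makes the method tractable.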

  14. Regression analysis for LED color detection of visual-MIMO system

    Science.gov (United States)

    Banik, Partha Pratim; Saha, Rappy; Kim, Ki-Doo

    2018-04-01

    Color detection from a light-emitting diode (LED) array using a smartphone camera is very difficult in a visual multiple-input multiple-output (visual-MIMO) system. In this paper, we propose a method to determine the LED color from a smartphone camera image by applying regression analysis. We employ a multivariate regression model to identify the LED color. After taking a picture of an LED array, we select the LED array region and detect each LED using an image processing algorithm. We then apply the k-means clustering algorithm to determine the number of potential colors for feature extraction of each LED. Finally, we apply the multivariate regression model to predict the color of the transmitted LEDs. We present results for three environmental light conditions: room light, low light (560 lux), and strong light (2450 lux). We compare the results of the proposed algorithm in terms of training and test R-squared (%) values and the percentage of closeness between transmitted and predicted colors, and we also examine the number of distorted test data points using a distortion bar graph in the CIE 1931 color space.
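
    The shape of the pipeline, sketched on synthetic colors (the paper's image-processing and feature-extraction details are omitted, and all names are illustrative):

        # Per-LED dominant color via k-means, then a multivariate linear
        # regression from observed to transmitted RGB.
        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.linear_model import LinearRegression

        rng = np.random.default_rng(5)
        true_rgb = rng.uniform(0, 255, size=(200, 3))        # transmitted colors

        features = []
        for rgb in true_rgb:
            # simulate a noisy pixel patch for this LED
            patch = rgb * 0.8 + 20 + rng.normal(scale=8, size=(50, 3))
            km = KMeans(n_clusters=2, n_init=10).fit(patch)
            dominant = km.cluster_centers_[np.bincount(km.labels_).argmax()]
            features.append(dominant)
        features = np.array(features)

        model = LinearRegression().fit(features, true_rgb)   # multivariate output
        print("train R^2:", round(model.score(features, true_rgb), 3))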

  15. A consistent framework for Horton regression statistics that leads to a modified Hack's law

    Science.gov (United States)

    Furey, P.R.; Troutman, B.M.

    2008-01-01

    A statistical framework is introduced that resolves important problems with the interpretation and use of traditional Horton regression statistics. The framework is based on a univariate regression model that leads to an alternative expression for the Horton ratio, connects Horton regression statistics to distributional simple scaling, and improves the accuracy in estimating Horton plot parameters. The model is used to examine data for drainage area A and mainstream length L from two groups of basins located in different physiographic settings. Results show that confidence intervals for the Horton plot regression statistics are quite wide. Nonetheless, an analysis of covariance shows that regression intercepts, but not regression slopes, can be used to distinguish between basin groups. The univariate model is generalized to include n > 1 dependent variables. For the case where the dependent variables represent ln A and ln L, the generalized model performs somewhat better at distinguishing between basin groups than two separate univariate models. The generalized model leads to a modification of Hack's law where L depends on both A and Strahler order ω. Data show that ω plays a statistically significant role in the modified Hack's law expression. © 2008 Elsevier B.V.
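
    An illustrative log-log fit of Hack's law; the extra Strahler-order term below is only a plausible stand-in for the modified expression, whose exact form the abstract does not give:

        # Estimate Hack's-law parameters by least squares on synthetic basins.
        import numpy as np

        rng = np.random.default_rng(6)
        A = 10 ** rng.uniform(0, 4, size=200)                 # drainage area
        omega = rng.integers(1, 6, size=200)                  # Strahler order
        L = 1.4 * A ** 0.6 * np.exp(0.05 * omega) \
            * np.exp(rng.normal(scale=0.1, size=200))

        # classic Hack's law: ln L = ln c + h ln A
        X1 = np.column_stack([np.ones(200), np.log(A)])
        coef1, *_ = np.linalg.lstsq(X1, np.log(L), rcond=None)
        # hypothetical modified form: ln L = ln c + h ln A + beta * omega
        X2 = np.column_stack([X1, omega])
        coef2, *_ = np.linalg.lstsq(X2, np.log(L), rcond=None)
        print("h (classic):", round(coef1[1], 3),
              " h, beta (modified):", np.round(coef2[1:], 3))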

  16. Western Regional Conference on Testing Problems (7th, Los Angeles, California, March 14, 1958). Testing for the Discovery and Development of Human Talent.

    Science.gov (United States)

    Educational Testing Service, Los Angeles, CA.

    At the seventh Western Regional Conference on Testing Problems, the following speeches were given: (1) "A Guidance Person's Approach to Testing for the Discovery and Development of Human Talent" by Frances D. McGill; (2) "The Instructional Uses of Measurement in the Discovery and Development of Human Talent" by Roy P. Wahle; (3) "New Frontiers of…

  17. Logic regression and its extensions.

    Science.gov (United States)

    Schwender, Holger; Ruczinski, Ingo

    2010-01-01

    Logic regression is an adaptive classification and regression procedure, initially developed to reveal interacting single nucleotide polymorphisms (SNPs) in genetic association studies. In general, this approach can be used in any setting with binary predictors, when the interaction of these covariates is of primary interest. Logic regression searches for Boolean (logic) combinations of binary variables that best explain the variability in the outcome variable, and thus, reveals variables and interactions that are associated with the response and/or have predictive capabilities. The logic expressions are embedded in a generalized linear regression framework, and thus, logic regression can handle a variety of outcome types, such as binary responses in case-control studies, numeric responses, and time-to-event data. In this chapter, we provide an introduction to the logic regression methodology, list some applications in public health and medicine, and summarize some of the direct extensions and modifications of logic regression that have been proposed in the literature. Copyright © 2010 Elsevier Inc. All rights reserved.
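
    A toy version of the idea, exhaustively scoring two-variable AND/OR terms by logistic log-likelihood; real logic regression (e.g., the R package LogicReg) searches much larger logic trees with simulated annealing:

        # Score Boolean combinations of binary predictors on synthetic SNPs.
        import itertools
        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(7)
        snps = rng.binomial(1, 0.3, size=(500, 6))
        y = rng.binomial(1, np.where(snps[:, 1] & snps[:, 4], 0.7, 0.2))

        best = None
        for i, j in itertools.combinations(range(6), 2):
            for op, name in [(np.logical_and, "AND"), (np.logical_or, "OR")]:
                term = op(snps[:, i], snps[:, j]).astype(float)
                fit = sm.Logit(y, sm.add_constant(term)).fit(disp=0)
                if best is None or fit.llf > best[0]:
                    best = (fit.llf, f"X{i} {name} X{j}")
        print("best logic term:", best[1])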

  18. Tumor regression patterns in retinoblastoma

    International Nuclear Information System (INIS)

    Zafar, S.N.; Siddique, S.N.; Zaheer, N.

    2016-01-01

    To observe the types of tumor regression after treatment and to identify the common pattern of regression in our patients. Study Design: Descriptive study. Place and Duration of Study: Department of Pediatric Ophthalmology and Strabismus, Al-Shifa Trust Eye Hospital, Rawalpindi, Pakistan, from October 2011 to October 2014. Methodology: Children with unilateral and bilateral retinoblastoma were included in the study. Patients were referred to Pakistan Institute of Medical Sciences, Islamabad, for chemotherapy. After every cycle of chemotherapy, a dilated fundus examination under anesthesia was performed to record the response to treatment. Regression patterns were recorded on RetCam II. Results: Seventy-four tumors were included in the study. Of the 74 tumors, 3 were ICRB group A tumors, 43 were ICRB group B tumors, 14 belonged to ICRB group C, and the remaining 14 were ICRB group D tumors. Type IV regression was seen in 39.1% (n=29) of tumors, type II in 29.7% (n=22), type III in 25.6% (n=19), and type I in 5.4% (n=4). All group A tumors (100%) showed type IV regression. Seventeen (39.5%) group B tumors showed type IV regression. In group C, 5 tumors (35.7%) showed type II regression and 5 tumors (35.7%) showed type IV regression. In group D, 6 tumors (42.9%) regressed to type II non-calcified remnants. Conclusion: The response and success of the focal and systemic treatment, as judged by the appearance of different patterns of tumor regression, varies with the ICRB grouping of the tumor. (author)

  19. Influence of regression model and incremental test protocol on the relationship between lactate threshold using the maximal-deviation method and performance in female runners.

    Science.gov (United States)

    Machado, Fabiana Andrade; Nakamura, Fábio Yuzo; Moraes, Solange Marta Franzói De

    2012-01-01

    This study examined the influence of the regression model and the initial intensity of an incremental test on the relationship between the lactate threshold estimated by the maximal-deviation method and endurance performance. Sixteen non-competitive, recreational female runners performed a discontinuous incremental treadmill test. The initial speed was set at 7 km · h⁻¹ and increased by 1 km · h⁻¹ every 3 min, with a 30-s rest between stages used for earlobe capillary blood sample collection. Lactate-speed data were fitted with an exponential-plus-constant and a third-order polynomial equation. The lactate threshold was determined for both regression equations using all the coordinates, excluding the first point, and excluding the first and second points. The mean speed of a 10-km road race was the performance index (3.04 ± 0.22 m · s⁻¹). The exponentially-derived lactate threshold had a higher correlation (0.98 ≤ r ≤ 0.99) and a smaller standard error of estimate (SEE) (0.04 ≤ SEE ≤ 0.05 m · s⁻¹) with performance than the polynomially-derived equivalent (0.83 ≤ r ≤ 0.89; 0.10 ≤ SEE ≤ 0.13 m · s⁻¹). The exponential lactate threshold was greater than the polynomial equivalent, and provides a performance index that is independent of the initial intensity of the incremental test and better than the polynomial equivalent.
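
    A sketch of the maximal-deviation method with an exponential-plus-constant fit, on synthetic stage data: the point on the fitted curve farthest from the chord joining its endpoints is taken as the threshold.

        # Dmax lactate threshold from an exponential-plus-constant fit.
        import numpy as np
        from scipy.optimize import curve_fit

        speed = np.arange(7, 15)                              # km/h stages
        lactate = 1.2 + 0.08 * np.exp(0.55 * (speed - 7))     # synthetic mmol/L

        f = lambda x, a, b, c: a + b * np.exp(c * x)
        (a, b, c), _ = curve_fit(f, speed, lactate, p0=(1, 0.01, 0.5), maxfev=10000)

        x = np.linspace(speed[0], speed[-1], 500)
        yfit = f(x, a, b, c)
        p0, p1 = np.array([x[0], yfit[0]]), np.array([x[-1], yfit[-1]])
        v = p1 - p0
        # perpendicular distance from each curve point to the endpoint chord
        d = np.abs(v[0] * (yfit - p0[1]) - v[1] * (x - p0[0])) / np.linalg.norm(v)
        print(f"Dmax lactate threshold ≈ {x[np.argmax(d)]:.2f} km/h")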

  20. Combining Alphas via Bounded Regression

    Directory of Open Access Journals (Sweden)

    Zura Kakushadze

    2015-11-01

    We give an explicit algorithm and source code for combining alpha streams via bounded regression. In practical applications there is typically insufficient history to compute a sample covariance matrix (SCM) for a large number of alphas. To compute alpha allocation weights, one then resorts to (weighted) regression over the SCM principal components. Regression often produces alpha weights with insufficient diversification and/or a distribution skewed against, e.g., turnover. This can be rectified by imposing bounds on the alpha weights within the regression procedure. Bounded regression can also be applied to stock and other asset portfolio construction. We discuss illustrative examples.
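
    A minimal sketch of the bounds idea using scipy's lsq_linear on simulated alpha returns; the paper's actual algorithm and weighting scheme are not reproduced here:

        # Ordinary least-squares weights vs. weights constrained to a band.
        import numpy as np
        from scipy.optimize import lsq_linear

        rng = np.random.default_rng(8)
        R = rng.normal(scale=0.01, size=(60, 10))      # 60 days x 10 alpha streams
        target = R @ np.full(10, 0.1) + rng.normal(scale=0.002, size=60)

        w_ols = np.linalg.lstsq(R, target, rcond=None)[0]
        res = lsq_linear(R, target, bounds=(0.0, 0.2)) # long-only, capped weights
        print("OLS weight range:    ", w_ols.min().round(3), w_ols.max().round(3))
        print("bounded weight range:", res.x.min().round(3), res.x.max().round(3))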