WorldWideScience

Sample records for linear regression multivariate

  1. High-throughput quantitative biochemical characterization of algal biomass by NIR spectroscopy; multiple linear regression and multivariate linear regression analysis.

    Science.gov (United States)

    Laurens, L M L; Wolfrum, E J

    2013-12-18

    One of the challenges associated with microalgal biomass characterization and the comparison of microalgal strains and conversion processes is the rapid determination of the composition of algae. We have developed and applied a high-throughput screening technology based on near-infrared (NIR) spectroscopy for the rapid and accurate determination of algal biomass composition. We show that NIR spectroscopy can accurately predict the full composition using multivariate linear regression analysis of varying lipid, protein, and carbohydrate content of algal biomass samples from three strains. We also demonstrate a high quality of predictions of an independent validation set. A high-throughput 96-well configuration for spectroscopy gives equally good prediction relative to a ring-cup configuration, and thus, spectra can be obtained from as little as 10-20 mg of material. We found that lipids exhibit a dominant, distinct, and unique fingerprint in the NIR spectrum that allows for the use of single and multiple linear regression of respective wavelengths for the prediction of the biomass lipid content. This is not the case for carbohydrate and protein content, and thus, the use of multivariate statistical modeling approaches remains necessary.

  2. Multivariate sparse group lasso for the multivariate multiple linear regression with an arbitrary group structure.

    Science.gov (United States)

    Li, Yanming; Nan, Bin; Zhu, Ji

    2015-06-01

    We propose a multivariate sparse group lasso variable selection and estimation method for data with high-dimensional predictors as well as high-dimensional response variables. The method is carried out through a penalized multivariate multiple linear regression model with an arbitrary group structure for the regression coefficient matrix. It suits many biology studies well in detecting associations between multiple traits and multiple predictors, with each trait and each predictor embedded in some biological functional groups such as genes, pathways or brain regions. The method is able to effectively remove unimportant groups as well as unimportant individual coefficients within important groups, particularly for large p small n problems, and is flexible in handling various complex group structures such as overlapping or nested or multilevel hierarchical structures. The method is evaluated through extensive simulations with comparisons to the conventional lasso and group lasso methods, and is applied to an eQTL association study. © 2015, The International Biometric Society.

  3. Boosted regression trees, multivariate adaptive regression splines and their two-step combinations with multiple linear regression or partial least squares to predict blood-brain barrier passage: a case study.

    Science.gov (United States)

    Deconinck, E; Zhang, M H; Petitet, F; Dubus, E; Ijjaali, I; Coomans, D; Vander Heyden, Y

    2008-02-18

    The use of some unconventional non-linear modeling techniques, i.e. classification and regression trees and multivariate adaptive regression splines-based methods, was explored to model the blood-brain barrier (BBB) passage of drugs and drug-like molecules. The data set contains BBB passage values for 299 structural and pharmacological diverse drugs, originating from a structured knowledge-based database. Models were built using boosted regression trees (BRT) and multivariate adaptive regression splines (MARS), as well as their respective combinations with stepwise multiple linear regression (MLR) and partial least squares (PLS) regression in two-step approaches. The best models were obtained using combinations of MARS with either stepwise MLR or PLS. It could be concluded that the use of combinations of a linear with a non-linear modeling technique results in some improved properties compared to the individual linear and non-linear models and that, when the use of such a combination is appropriate, combinations using MARS as non-linear technique should be preferred over those with BRT, due to some serious drawbacks of the BRT approaches.

  4. Advanced statistics: linear regression, part II: multiple linear regression.

    Science.gov (United States)

    Marill, Keith A

    2004-01-01

    The applications of simple linear regression in medical research are limited, because in most situations, there are multiple relevant predictor variables. Univariate statistical techniques such as simple linear regression use a single predictor variable, and they often may be mathematically correct but clinically misleading. Multiple linear regression is a mathematical technique used to model the relationship between multiple independent predictor variables and a single dependent outcome variable. It is used in medical research to model observational data, as well as in diagnostic and therapeutic studies in which the outcome is dependent on more than one factor. Although the technique generally is limited to data that can be expressed with a linear function, it benefits from a well-developed mathematical framework that yields unique solutions and exact confidence intervals for regression coefficients. Building on Part I of this series, this article acquaints the reader with some of the important concepts in multiple regression analysis. These include multicollinearity, interaction effects, and an expansion of the discussion of inference testing, leverage, and variable transformations to multivariate models. Examples from the first article in this series are expanded on using a primarily graphic, rather than mathematical, approach. The importance of the relationships among the predictor variables and the dependence of the multivariate model coefficients on the choice of these variables are stressed. Finally, concepts in regression model building are discussed.

  5. Linear regression

    CERN Document Server

    Olive, David J

    2017-01-01

    This text covers both multiple linear regression and some experimental design models. The text uses the response plot to visualize the model and to detect outliers, does not assume that the error distribution has a known parametric distribution, develops prediction intervals that work when the error distribution is unknown, suggests bootstrap hypothesis tests that may be useful for inference after variable selection, and develops prediction regions and large sample theory for the multivariate linear regression model that has m response variables. A relationship between multivariate prediction regions and confidence regions provides a simple way to bootstrap confidence regions. These confidence regions often provide a practical method for testing hypotheses. There is also a chapter on generalized linear models and generalized additive models. There are many R functions to produce response and residual plots, to simulate prediction intervals and hypothesis tests, to detect outliers, and to choose response trans...

  6. Multivariate Linear Regression and CART Regression Analysis of TBM Performance at Abu Hamour Phase-I Tunnel

    Science.gov (United States)

    Jakubowski, J.; Stypulkowski, J. B.; Bernardeau, F. G.

    2017-12-01

    The first phase of the Abu Hamour drainage and storm tunnel was completed in early 2017. The 9.5 km long, 3.7 m diameter tunnel was excavated with two Earth Pressure Balance (EPB) Tunnel Boring Machines from Herrenknecht. TBM operation processes were monitored and recorded by Data Acquisition and Evaluation System. The authors coupled collected TBM drive data with available information on rock mass properties, cleansed, completed with secondary variables and aggregated by weeks and shifts. Correlations and descriptive statistics charts were examined. Multivariate Linear Regression and CART regression tree models linking TBM penetration rate (PR), penetration per revolution (PPR) and field penetration index (FPI) with TBM operational and geotechnical characteristics were performed for the conditions of the weak/soft rock of Doha. Both regression methods are interpretable and the data were screened with different computational approaches allowing enriched insight. The primary goal of the analysis was to investigate empirical relations between multiple explanatory and responding variables, to search for best subsets of explanatory variables and to evaluate the strength of linear and non-linear relations. For each of the penetration indices, a predictive model coupling both regression methods was built and validated. The resultant models appeared to be stronger than constituent ones and indicated an opportunity for more accurate and robust TBM performance predictions.

  7. Multivariate linear regression of high-dimensional fMRI data with multiple target variables.

    Science.gov (United States)

    Valente, Giancarlo; Castellanos, Agustin Lage; Vanacore, Gianluca; Formisano, Elia

    2014-05-01

    Multivariate regression is increasingly used to study the relation between fMRI spatial activation patterns and experimental stimuli or behavioral ratings. With linear models, informative brain locations are identified by mapping the model coefficients. This is a central aspect in neuroimaging, as it provides the sought-after link between the activity of neuronal populations and subject's perception, cognition or behavior. Here, we show that mapping of informative brain locations using multivariate linear regression (MLR) may lead to incorrect conclusions and interpretations. MLR algorithms for high dimensional data are designed to deal with targets (stimuli or behavioral ratings, in fMRI) separately, and the predictive map of a model integrates information deriving from both neural activity patterns and experimental design. Not accounting explicitly for the presence of other targets whose associated activity spatially overlaps with the one of interest may lead to predictive maps of troublesome interpretation. We propose a new model that can correctly identify the spatial patterns associated with a target while achieving good generalization. For each target, the training is based on an augmented dataset, which includes all remaining targets. The estimation on such datasets produces both maps and interaction coefficients, which are then used to generalize. The proposed formulation is independent of the regression algorithm employed. We validate this model on simulated fMRI data and on a publicly available dataset. Results indicate that our method achieves high spatial sensitivity and good generalization and that it helps disentangle specific neural effects from interaction with predictive maps associated with other targets. Copyright © 2013 Wiley Periodicals, Inc.

  8. Regularized multivariate regression models with skew-t error distributions

    KAUST Repository

    Chen, Lianfu; Pourahmadi, Mohsen; Maadooliat, Mehdi

    2014-01-01

    We consider regularization of the parameters in multivariate linear regression models with the errors having a multivariate skew-t distribution. An iterative penalized likelihood procedure is proposed for constructing sparse estimators of both

  9. Linear Multivariable Regression Models for Prediction of Eddy Dissipation Rate from Available Meteorological Data

    Science.gov (United States)

    MCKissick, Burnell T. (Technical Monitor); Plassman, Gerald E.; Mall, Gerald H.; Quagliano, John R.

    2005-01-01

    Linear multivariable regression models for predicting day and night Eddy Dissipation Rate (EDR) from available meteorological data sources are defined and validated. Model definition is based on a combination of 1997-2000 Dallas/Fort Worth (DFW) data sources, EDR from Aircraft Vortex Spacing System (AVOSS) deployment data, and regression variables primarily from corresponding Automated Surface Observation System (ASOS) data. Model validation is accomplished through EDR predictions on a similar combination of 1994-1995 Memphis (MEM) AVOSS and ASOS data. Model forms include an intercept plus a single term of fixed optimal power for each of these regression variables; 30-minute forward averaged mean and variance of near-surface wind speed and temperature, variance of wind direction, and a discrete cloud cover metric. Distinct day and night models, regressing on EDR and the natural log of EDR respectively, yield best performance and avoid model discontinuity over day/night data boundaries.

  10. Regression Models For Multivariate Count Data.

    Science.gov (United States)

    Zhang, Yiwen; Zhou, Hua; Zhou, Jin; Sun, Wei

    2017-01-01

    Data with multivariate count responses frequently occur in modern applications. The commonly used multinomial-logit model is limiting due to its restrictive mean-variance structure. For instance, analyzing count data from the recent RNA-seq technology by the multinomial-logit model leads to serious errors in hypothesis testing. The ubiquity of over-dispersion and complicated correlation structures among multivariate counts calls for more flexible regression models. In this article, we study some generalized linear models that incorporate various correlation structures among the counts. Current literature lacks a treatment of these models, partly due to the fact that they do not belong to the natural exponential family. We study the estimation, testing, and variable selection for these models in a unifying framework. The regression models are compared on both synthetic and real RNA-seq data.

  11. Bayesian Inference of a Multivariate Regression Model

    Directory of Open Access Journals (Sweden)

    Marick S. Sinay

    2014-01-01

    Full Text Available We explore Bayesian inference of a multivariate linear regression model with use of a flexible prior for the covariance structure. The commonly adopted Bayesian setup involves the conjugate prior, multivariate normal distribution for the regression coefficients and inverse Wishart specification for the covariance matrix. Here we depart from this approach and propose a novel Bayesian estimator for the covariance. A multivariate normal prior for the unique elements of the matrix logarithm of the covariance matrix is considered. Such structure allows for a richer class of prior distributions for the covariance, with respect to strength of beliefs in prior location hyperparameters, as well as the added ability, to model potential correlation amongst the covariance structure. The posterior moments of all relevant parameters of interest are calculated based upon numerical results via a Markov chain Monte Carlo procedure. The Metropolis-Hastings-within-Gibbs algorithm is invoked to account for the construction of a proposal density that closely matches the shape of the target posterior distribution. As an application of the proposed technique, we investigate a multiple regression based upon the 1980 High School and Beyond Survey.

  12. Higher-order Multivariable Polynomial Regression to Estimate Human Affective States

    Science.gov (United States)

    Wei, Jie; Chen, Tong; Liu, Guangyuan; Yang, Jiemin

    2016-03-01

    From direct observations, facial, vocal, gestural, physiological, and central nervous signals, estimating human affective states through computational models such as multivariate linear-regression analysis, support vector regression, and artificial neural network, have been proposed in the past decade. In these models, linear models are generally lack of precision because of ignoring intrinsic nonlinearities of complex psychophysiological processes; and nonlinear models commonly adopt complicated algorithms. To improve accuracy and simplify model, we introduce a new computational modeling method named as higher-order multivariable polynomial regression to estimate human affective states. The study employs standardized pictures in the International Affective Picture System to induce thirty subjects’ affective states, and obtains pure affective patterns of skin conductance as input variables to the higher-order multivariable polynomial model for predicting affective valence and arousal. Experimental results show that our method is able to obtain efficient correlation coefficients of 0.98 and 0.96 for estimation of affective valence and arousal, respectively. Moreover, the method may provide certain indirect evidences that valence and arousal have their brain’s motivational circuit origins. Thus, the proposed method can serve as a novel one for efficiently estimating human affective states.

  13. Advanced statistics: linear regression, part I: simple linear regression.

    Science.gov (United States)

    Marill, Keith A

    2004-01-01

    Simple linear regression is a mathematical technique used to model the relationship between a single independent predictor variable and a single dependent outcome variable. In this, the first of a two-part series exploring concepts in linear regression analysis, the four fundamental assumptions and the mechanics of simple linear regression are reviewed. The most common technique used to derive the regression line, the method of least squares, is described. The reader will be acquainted with other important concepts in simple linear regression, including: variable transformations, dummy variables, relationship to inference testing, and leverage. Simplified clinical examples with small datasets and graphic models are used to illustrate the points. This will provide a foundation for the second article in this series: a discussion of multiple linear regression, in which there are multiple predictor variables.

  14. Regularized multivariate regression models with skew-t error distributions

    KAUST Repository

    Chen, Lianfu

    2014-06-01

    We consider regularization of the parameters in multivariate linear regression models with the errors having a multivariate skew-t distribution. An iterative penalized likelihood procedure is proposed for constructing sparse estimators of both the regression coefficient and inverse scale matrices simultaneously. The sparsity is introduced through penalizing the negative log-likelihood by adding L1-penalties on the entries of the two matrices. Taking advantage of the hierarchical representation of skew-t distributions, and using the expectation conditional maximization (ECM) algorithm, we reduce the problem to penalized normal likelihood and develop a procedure to minimize the ensuing objective function. Using a simulation study the performance of the method is assessed, and the methodology is illustrated using a real data set with a 24-dimensional response vector. © 2014 Elsevier B.V.

  15. Distributed Monitoring of the R2 Statistic for Linear Regression

    Data.gov (United States)

    National Aeronautics and Space Administration — The problem of monitoring a multivariate linear regression model is relevant in studying the evolving relationship between a set of input variables (features) and...

  16. A land use regression model for ambient ultrafine particles in Montreal, Canada: A comparison of linear regression and a machine learning approach.

    Science.gov (United States)

    Weichenthal, Scott; Ryswyk, Keith Van; Goldstein, Alon; Bagg, Scott; Shekkarizfard, Maryam; Hatzopoulou, Marianne

    2016-04-01

    Existing evidence suggests that ambient ultrafine particles (UFPs) (regression model for UFPs in Montreal, Canada using mobile monitoring data collected from 414 road segments during the summer and winter months between 2011 and 2012. Two different approaches were examined for model development including standard multivariable linear regression and a machine learning approach (kernel-based regularized least squares (KRLS)) that learns the functional form of covariate impacts on ambient UFP concentrations from the data. The final models included parameters for population density, ambient temperature and wind speed, land use parameters (park space and open space), length of local roads and rail, and estimated annual average NOx emissions from traffic. The final multivariable linear regression model explained 62% of the spatial variation in ambient UFP concentrations whereas the KRLS model explained 79% of the variance. The KRLS model performed slightly better than the linear regression model when evaluated using an external dataset (R(2)=0.58 vs. 0.55) or a cross-validation procedure (R(2)=0.67 vs. 0.60). In general, our findings suggest that the KRLS approach may offer modest improvements in predictive performance compared to standard multivariable linear regression models used to estimate spatial variations in ambient UFPs. However, differences in predictive performance were not statistically significant when evaluated using the cross-validation procedure. Crown Copyright © 2015. Published by Elsevier Inc. All rights reserved.

  17. Multivariate and semiparametric kernel regression

    OpenAIRE

    Härdle, Wolfgang; Müller, Marlene

    1997-01-01

    The paper gives an introduction to theory and application of multivariate and semiparametric kernel smoothing. Multivariate nonparametric density estimation is an often used pilot tool for examining the structure of data. Regression smoothing helps in investigating the association between covariates and responses. We concentrate on kernel smoothing using local polynomial fitting which includes the Nadaraya-Watson estimator. Some theory on the asymptotic behavior and bandwidth selection is pro...

  18. Multivariate covariance generalized linear models

    DEFF Research Database (Denmark)

    Bonat, W. H.; Jørgensen, Bent

    2016-01-01

    are fitted by using an efficient Newton scoring algorithm based on quasi-likelihood and Pearson estimating functions, using only second-moment assumptions. This provides a unified approach to a wide variety of types of response variables and covariance structures, including multivariate extensions......We propose a general framework for non-normal multivariate data analysis called multivariate covariance generalized linear models, designed to handle multivariate response variables, along with a wide range of temporal and spatial correlation structures defined in terms of a covariance link...... function combined with a matrix linear predictor involving known matrices. The method is motivated by three data examples that are not easily handled by existing methods. The first example concerns multivariate count data, the second involves response variables of mixed types, combined with repeated...

  19. Retro-regression--another important multivariate regression improvement.

    Science.gov (United States)

    Randić, M

    2001-01-01

    We review the serious problem associated with instabilities of the coefficients of regression equations, referred to as the MRA (multivariate regression analysis) "nightmare of the first kind". This is manifested when in a stepwise regression a descriptor is included or excluded from a regression. The consequence is an unpredictable change of the coefficients of the descriptors that remain in the regression equation. We follow with consideration of an even more serious problem, referred to as the MRA "nightmare of the second kind", arising when optimal descriptors are selected from a large pool of descriptors. This process typically causes at different steps of the stepwise regression a replacement of several previously used descriptors by new ones. We describe a procedure that resolves these difficulties. The approach is illustrated on boiling points of nonanes which are considered (1) by using an ordered connectivity basis; (2) by using an ordering resulting from application of greedy algorithm; and (3) by using an ordering derived from an exhaustive search for optimal descriptors. A novel variant of multiple regression analysis, called retro-regression (RR), is outlined showing how it resolves the ambiguities associated with both "nightmares" of the first and the second kind of MRA.

  20. AN APPLICATION OF FUNCTIONAL MULTIVARIATE REGRESSION MODEL TO MULTICLASS CLASSIFICATION

    OpenAIRE

    Krzyśko, Mirosław; Smaga, Łukasz

    2017-01-01

    In this paper, the scale response functional multivariate regression model is considered. By using the basis functions representation of functional predictors and regression coefficients, this model is rewritten as a multivariate regression model. This representation of the functional multivariate regression model is used for multiclass classification for multivariate functional data. Computational experiments performed on real labelled data sets demonstrate the effectiveness of the proposed ...

  1. Applied linear regression

    CERN Document Server

    Weisberg, Sanford

    2013-01-01

    Praise for the Third Edition ""...this is an excellent book which could easily be used as a course text...""-International Statistical Institute The Fourth Edition of Applied Linear Regression provides a thorough update of the basic theory and methodology of linear regression modeling. Demonstrating the practical applications of linear regression analysis techniques, the Fourth Edition uses interesting, real-world exercises and examples. Stressing central concepts such as model building, understanding parameters, assessing fit and reliability, and drawing conclusions, the new edition illus

  2. Multivariate Regression Analysis and Slaughter Livestock,

    Science.gov (United States)

    AGRICULTURE, *ECONOMICS), (*MEAT, PRODUCTION), MULTIVARIATE ANALYSIS, REGRESSION ANALYSIS , ANIMALS, WEIGHT, COSTS, PREDICTIONS, STABILITY, MATHEMATICAL MODELS, STORAGE, BEEF, PORK, FOOD, STATISTICAL DATA, ACCURACY

  3. Admissible Estimators in the General Multivariate Linear Model with Respect to Inequality Restricted Parameter Set

    Directory of Open Access Journals (Sweden)

    Shangli Zhang

    2009-01-01

    Full Text Available By using the methods of linear algebra and matrix inequality theory, we obtain the characterization of admissible estimators in the general multivariate linear model with respect to inequality restricted parameter set. In the classes of homogeneous and general linear estimators, the necessary and suffcient conditions that the estimators of regression coeffcient function are admissible are established.

  4. Multivariate generalized linear mixed models using R

    CERN Document Server

    Berridge, Damon Mark

    2011-01-01

    Multivariate Generalized Linear Mixed Models Using R presents robust and methodologically sound models for analyzing large and complex data sets, enabling readers to answer increasingly complex research questions. The book applies the principles of modeling to longitudinal data from panel and related studies via the Sabre software package in R. A Unified Framework for a Broad Class of Models The authors first discuss members of the family of generalized linear models, gradually adding complexity to the modeling framework by incorporating random effects. After reviewing the generalized linear model notation, they illustrate a range of random effects models, including three-level, multivariate, endpoint, event history, and state dependence models. They estimate the multivariate generalized linear mixed models (MGLMMs) using either standard or adaptive Gaussian quadrature. The authors also compare two-level fixed and random effects linear models. The appendices contain additional information on quadrature, model...

  5. Research on refugees and immigrants social integration in Yunnan Border Area: An empirical analysis on the multivariable linear regression model

    Directory of Open Access Journals (Sweden)

    Peng Nai

    2016-03-01

    Full Text Available A great number of immigration populations resident permanently in Yunnan Border Area of China. To some extent, these people belong to refugees or immigrants in accordance with International Rules, which significantly features the social diversity of this area. However, this kind of social diversity always impairs the social order. Therefore, there will be a positive influence to the local society governance by a research on local immigration integration. This essay hereby attempts to acquire the data of the living situation of these border area immigration and refugees. The analysis of the social integration of refugees and immigration in Yunnan border area in China will be deployed through the modeling of multivariable linear regression based on these data in order to propose some more achievable resolutions.

  6. Linear regression and sensitivity analysis in nuclear reactor design

    International Nuclear Information System (INIS)

    Kumar, Akansha; Tsvetkov, Pavel V.; McClarren, Ryan G.

    2015-01-01

    Highlights: • Presented a benchmark for the applicability of linear regression to complex systems. • Applied linear regression to a nuclear reactor power system. • Performed neutronics, thermal–hydraulics, and energy conversion using Brayton’s cycle for the design of a GCFBR. • Performed detailed sensitivity analysis to a set of parameters in a nuclear reactor power system. • Modeled and developed reactor design using MCNP, regression using R, and thermal–hydraulics in Java. - Abstract: The paper presents a general strategy applicable for sensitivity analysis (SA), and uncertainity quantification analysis (UA) of parameters related to a nuclear reactor design. This work also validates the use of linear regression (LR) for predictive analysis in a nuclear reactor design. The analysis helps to determine the parameters on which a LR model can be fit for predictive analysis. For those parameters, a regression surface is created based on trial data and predictions are made using this surface. A general strategy of SA to determine and identify the influential parameters those affect the operation of the reactor is mentioned. Identification of design parameters and validation of linearity assumption for the application of LR of reactor design based on a set of tests is performed. The testing methods used to determine the behavior of the parameters can be used as a general strategy for UA, and SA of nuclear reactor models, and thermal hydraulics calculations. A design of a gas cooled fast breeder reactor (GCFBR), with thermal–hydraulics, and energy transfer has been used for the demonstration of this method. MCNP6 is used to simulate the GCFBR design, and perform the necessary criticality calculations. Java is used to build and run input samples, and to extract data from the output files of MCNP6, and R is used to perform regression analysis and other multivariate variance, and analysis of the collinearity of data

  7. Use of multivariate extensions of generalized linear models in the analysis of data from clinical trials

    OpenAIRE

    ALONSO ABAD, Ariel; Rodriguez, O.; TIBALDI, Fabian; CORTINAS ABRAHANTES, Jose

    2002-01-01

    In medical studies the categorical endpoints are quite often. Even though nowadays some models for handling this multicategorical variables have been developed their use is not common. This work shows an application of the Multivariate Generalized Linear Models to the analysis of Clinical Trials data. After a theoretical introduction models for ordinal and nominal responses are applied and the main results are discussed. multivariate analysis; multivariate logistic regression; multicategor...

  8. OPLS statistical model versus linear regression to assess sonographic predictors of stroke prognosis.

    Science.gov (United States)

    Vajargah, Kianoush Fathi; Sadeghi-Bazargani, Homayoun; Mehdizadeh-Esfanjani, Robab; Savadi-Oskouei, Daryoush; Farhoudi, Mehdi

    2012-01-01

    The objective of the present study was to assess the comparable applicability of orthogonal projections to latent structures (OPLS) statistical model vs traditional linear regression in order to investigate the role of trans cranial doppler (TCD) sonography in predicting ischemic stroke prognosis. The study was conducted on 116 ischemic stroke patients admitted to a specialty neurology ward. The Unified Neurological Stroke Scale was used once for clinical evaluation on the first week of admission and again six months later. All data was primarily analyzed using simple linear regression and later considered for multivariate analysis using PLS/OPLS models through the SIMCA P+12 statistical software package. The linear regression analysis results used for the identification of TCD predictors of stroke prognosis were confirmed through the OPLS modeling technique. Moreover, in comparison to linear regression, the OPLS model appeared to have higher sensitivity in detecting the predictors of ischemic stroke prognosis and detected several more predictors. Applying the OPLS model made it possible to use both single TCD measures/indicators and arbitrarily dichotomized measures of TCD single vessel involvement as well as the overall TCD result. In conclusion, the authors recommend PLS/OPLS methods as complementary rather than alternative to the available classical regression models such as linear regression.

  9. Recursive Algorithm For Linear Regression

    Science.gov (United States)

    Varanasi, S. V.

    1988-01-01

    Order of model determined easily. Linear-regression algorithhm includes recursive equations for coefficients of model of increased order. Algorithm eliminates duplicative calculations, facilitates search for minimum order of linear-regression model fitting set of data satisfactory.

  10. Depth-weighted robust multivariate regression with application to sparse data

    KAUST Repository

    Dutta, Subhajit; Genton, Marc G.

    2017-01-01

    A robust method for multivariate regression is developed based on robust estimators of the joint location and scatter matrix of the explanatory and response variables using the notion of data depth. The multivariate regression estimator possesses desirable affine equivariance properties, achieves the best breakdown point of any affine equivariant estimator, and has an influence function which is bounded in both the response as well as the predictor variable. To increase the efficiency of this estimator, a re-weighted estimator based on robust Mahalanobis distances of the residual vectors is proposed. In practice, the method is more stable than existing methods that are constructed using subsamples of the data. The resulting multivariate regression technique is computationally feasible, and turns out to perform better than several popular robust multivariate regression methods when applied to various simulated data as well as a real benchmark data set. When the data dimension is quite high compared to the sample size it is still possible to use meaningful notions of data depth along with the corresponding depth values to construct a robust estimator in a sparse setting.

  11. Depth-weighted robust multivariate regression with application to sparse data

    KAUST Repository

    Dutta, Subhajit

    2017-04-05

    A robust method for multivariate regression is developed based on robust estimators of the joint location and scatter matrix of the explanatory and response variables using the notion of data depth. The multivariate regression estimator possesses desirable affine equivariance properties, achieves the best breakdown point of any affine equivariant estimator, and has an influence function which is bounded in both the response as well as the predictor variable. To increase the efficiency of this estimator, a re-weighted estimator based on robust Mahalanobis distances of the residual vectors is proposed. In practice, the method is more stable than existing methods that are constructed using subsamples of the data. The resulting multivariate regression technique is computationally feasible, and turns out to perform better than several popular robust multivariate regression methods when applied to various simulated data as well as a real benchmark data set. When the data dimension is quite high compared to the sample size it is still possible to use meaningful notions of data depth along with the corresponding depth values to construct a robust estimator in a sparse setting.

  12. Aortic and Hepatic Contrast Enhancement During Hepatic-Arterial and Portal Venous Phase Computed Tomography Scanning: Multivariate Linear Regression Analysis Using Age, Sex, Total Body Weight, Height, and Cardiac Output.

    Science.gov (United States)

    Masuda, Takanori; Nakaura, Takeshi; Funama, Yoshinori; Higaki, Toru; Kiguchi, Masao; Imada, Naoyuki; Sato, Tomoyasu; Awai, Kazuo

    We evaluated the effect of the age, sex, total body weight (TBW), height (HT) and cardiac output (CO) of patients on aortic and hepatic contrast enhancement during hepatic-arterial phase (HAP) and portal venous phase (PVP) computed tomography (CT) scanning. This prospective study received institutional review board approval; prior informed consent to participate was obtained from all 168 patients. All were examined using our routine protocol; the contrast material was 600 mg/kg iodine. Cardiac output was measured with a portable electrical velocimeter within 5 minutes of starting the CT scan. We calculated contrast enhancement (per gram of iodine: [INCREMENT]HU/gI) of the abdominal aorta during the HAP and of the liver parenchyma during the PVP. We performed univariate and multivariate linear regression analysis between all patient characteristics and the [INCREMENT]HU/gI of aortic- and liver parenchymal enhancement. Univariate linear regression analysis demonstrated statistically significant correlations between the [INCREMENT]HU/gI and the age, sex, TBW, HT, and CO (all P linear regression analysis showed that only the TBW and CO were of independent predictive value (P linear regression analysis only the TBW and CO were significantly correlated with aortic and liver parenchymal enhancement; the age, sex, and HT were not. The CO was the only independent factor affecting aortic and liver parenchymal enhancement at hepatic CT when the protocol was adjusted for the TBW.

  13. Improving sub-pixel imperviousness change prediction by ensembling heterogeneous non-linear regression models

    Directory of Open Access Journals (Sweden)

    Drzewiecki Wojciech

    2016-12-01

    Full Text Available In this work nine non-linear regression models were compared for sub-pixel impervious surface area mapping from Landsat images. The comparison was done in three study areas both for accuracy of imperviousness coverage evaluation in individual points in time and accuracy of imperviousness change assessment. The performance of individual machine learning algorithms (Cubist, Random Forest, stochastic gradient boosting of regression trees, k-nearest neighbors regression, random k-nearest neighbors regression, Multivariate Adaptive Regression Splines, averaged neural networks, and support vector machines with polynomial and radial kernels was also compared with the performance of heterogeneous model ensembles constructed from the best models trained using particular techniques.

  14. An Efficient Local Algorithm for Distributed Multivariate Regression

    Data.gov (United States)

    National Aeronautics and Space Administration — This paper offers a local distributed algorithm for multivariate regression in large peer-to-peer environments. The algorithm is designed for distributed...

  15. Linear regression analysis: part 14 of a series on evaluation of scientific publications.

    Science.gov (United States)

    Schneider, Astrid; Hommel, Gerhard; Blettner, Maria

    2010-11-01

    Regression analysis is an important statistical method for the analysis of medical data. It enables the identification and characterization of relationships among multiple factors. It also enables the identification of prognostically relevant risk factors and the calculation of risk scores for individual prognostication. This article is based on selected textbooks of statistics, a selective review of the literature, and our own experience. After a brief introduction of the uni- and multivariable regression models, illustrative examples are given to explain what the important considerations are before a regression analysis is performed, and how the results should be interpreted. The reader should then be able to judge whether the method has been used correctly and interpret the results appropriately. The performance and interpretation of linear regression analysis are subject to a variety of pitfalls, which are discussed here in detail. The reader is made aware of common errors of interpretation through practical examples. Both the opportunities for applying linear regression analysis and its limitations are presented.

  16. A Scalable Local Algorithm for Distributed Multivariate Regression

    Data.gov (United States)

    National Aeronautics and Space Administration — This paper offers a local distributed algorithm for multivariate regression in large peer-to-peer environments. The algorithm can be used for distributed...

  17. Discriminative Elastic-Net Regularized Linear Regression.

    Science.gov (United States)

    Zhang, Zheng; Lai, Zhihui; Xu, Yong; Shao, Ling; Wu, Jian; Xie, Guo-Sen

    2017-03-01

    In this paper, we aim at learning compact and discriminative linear regression models. Linear regression has been widely used in different problems. However, most of the existing linear regression methods exploit the conventional zero-one matrix as the regression targets, which greatly narrows the flexibility of the regression model. Another major limitation of these methods is that the learned projection matrix fails to precisely project the image features to the target space due to their weak discriminative capability. To this end, we present an elastic-net regularized linear regression (ENLR) framework, and develop two robust linear regression models which possess the following special characteristics. First, our methods exploit two particular strategies to enlarge the margins of different classes by relaxing the strict binary targets into a more feasible variable matrix. Second, a robust elastic-net regularization of singular values is introduced to enhance the compactness and effectiveness of the learned projection matrix. Third, the resulting optimization problem of ENLR has a closed-form solution in each iteration, which can be solved efficiently. Finally, rather than directly exploiting the projection matrix for recognition, our methods employ the transformed features as the new discriminate representations to make final image classification. Compared with the traditional linear regression model and some of its variants, our method is much more accurate in image classification. Extensive experiments conducted on publicly available data sets well demonstrate that the proposed framework can outperform the state-of-the-art methods. The MATLAB codes of our methods can be available at http://www.yongxu.org/lunwen.html.

  18. Linear regression in astronomy. II

    Science.gov (United States)

    Feigelson, Eric D.; Babu, Gutti J.

    1992-01-01

    A wide variety of least-squares linear regression procedures used in observational astronomy, particularly investigations of the cosmic distance scale, are presented and discussed. The classes of linear models considered are (1) unweighted regression lines, with bootstrap and jackknife resampling; (2) regression solutions when measurement error, in one or both variables, dominates the scatter; (3) methods to apply a calibration line to new data; (4) truncated regression models, which apply to flux-limited data sets; and (5) censored regression models, which apply when nondetections are present. For the calibration problem we develop two new procedures: a formula for the intercept offset between two parallel data sets, which propagates slope errors from one regression to the other; and a generalization of the Working-Hotelling confidence bands to nonstandard least-squares lines. They can provide improved error analysis for Faber-Jackson, Tully-Fisher, and similar cosmic distance scale relations.

  19. Correlation and simple linear regression.

    Science.gov (United States)

    Zou, Kelly H; Tuncali, Kemal; Silverman, Stuart G

    2003-06-01

    In this tutorial article, the concepts of correlation and regression are reviewed and demonstrated. The authors review and compare two correlation coefficients, the Pearson correlation coefficient and the Spearman rho, for measuring linear and nonlinear relationships between two continuous variables. In the case of measuring the linear relationship between a predictor and an outcome variable, simple linear regression analysis is conducted. These statistical concepts are illustrated by using a data set from published literature to assess a computed tomography-guided interventional technique. These statistical methods are important for exploring the relationships between variables and can be applied to many radiologic studies.

  20. Sunspot Cycle Prediction Using Multivariate Regression and Binary ...

    Indian Academy of Sciences (India)

    49

    Multivariate regression model has been derived based on the available cycles 1 .... The flare index correlates well with various parameters of the solar activity. ...... 32) Sabarinath A and Anilkumar A K 2011 A stochastic prediction model for the.

  1. Sparse Linear Identifiable Multivariate Modeling

    DEFF Research Database (Denmark)

    Henao, Ricardo; Winther, Ole

    2011-01-01

    and bench-marked on artificial and real biological data sets. SLIM is closest in spirit to LiNGAM (Shimizu et al., 2006), but differs substantially in inference, Bayesian network structure learning and model comparison. Experimentally, SLIM performs equally well or better than LiNGAM with comparable......In this paper we consider sparse and identifiable linear latent variable (factor) and linear Bayesian network models for parsimonious analysis of multivariate data. We propose a computationally efficient method for joint parameter and model inference, and model comparison. It consists of a fully...

  2. A Matlab program for stepwise regression

    Directory of Open Access Journals (Sweden)

    Yanhong Qi

    2016-03-01

    Full Text Available The stepwise linear regression is a multi-variable regression for identifying statistically significant variables in the linear regression equation. In present study, we presented the Matlab program of stepwise regression.

  3. Piecewise linear regression splines with hyperbolic covariates

    International Nuclear Information System (INIS)

    Cologne, John B.; Sposto, Richard

    1992-09-01

    Consider the problem of fitting a curve to data that exhibit a multiphase linear response with smooth transitions between phases. We propose substituting hyperbolas as covariates in piecewise linear regression splines to obtain curves that are smoothly joined. The method provides an intuitive and easy way to extend the two-phase linear hyperbolic response model of Griffiths and Miller and Watts and Bacon to accommodate more than two linear segments. The resulting regression spline with hyperbolic covariates may be fit by nonlinear regression methods to estimate the degree of curvature between adjoining linear segments. The added complexity of fitting nonlinear, as opposed to linear, regression models is not great. The extra effort is particularly worthwhile when investigators are unwilling to assume that the slope of the response changes abruptly at the join points. We can also estimate the join points (the values of the abscissas where the linear segments would intersect if extrapolated) if their number and approximate locations may be presumed known. An example using data on changing age at menarche in a cohort of Japanese women illustrates the use of the method for exploratory data analysis. (author)

  4. Multivariate Regression of Liver on Intestine of Mice: A ...

    African Journals Online (AJOL)

    Multivariate Regression of Liver on Intestine of Mice: A Chemotherapeutic Evaluation of Plant ... Using an analysis of covariance model, the effects ... The findings revealed, with the aid of likelihood-ratio statistic, a marked improvement in

  5. [From clinical judgment to linear regression model.

    Science.gov (United States)

    Palacios-Cruz, Lino; Pérez, Marcela; Rivas-Ruiz, Rodolfo; Talavera, Juan O

    2013-01-01

    When we think about mathematical models, such as linear regression model, we think that these terms are only used by those engaged in research, a notion that is far from the truth. Legendre described the first mathematical model in 1805, and Galton introduced the formal term in 1886. Linear regression is one of the most commonly used regression models in clinical practice. It is useful to predict or show the relationship between two or more variables as long as the dependent variable is quantitative and has normal distribution. Stated in another way, the regression is used to predict a measure based on the knowledge of at least one other variable. Linear regression has as it's first objective to determine the slope or inclination of the regression line: Y = a + bx, where "a" is the intercept or regression constant and it is equivalent to "Y" value when "X" equals 0 and "b" (also called slope) indicates the increase or decrease that occurs when the variable "x" increases or decreases in one unit. In the regression line, "b" is called regression coefficient. The coefficient of determination (R 2 ) indicates the importance of independent variables in the outcome.

  6. Post-processing through linear regression

    Science.gov (United States)

    van Schaeybroeck, B.; Vannitsem, S.

    2011-03-01

    Various post-processing techniques are compared for both deterministic and ensemble forecasts, all based on linear regression between forecast data and observations. In order to evaluate the quality of the regression methods, three criteria are proposed, related to the effective correction of forecast error, the optimal variability of the corrected forecast and multicollinearity. The regression schemes under consideration include the ordinary least-square (OLS) method, a new time-dependent Tikhonov regularization (TDTR) method, the total least-square method, a new geometric-mean regression (GM), a recently introduced error-in-variables (EVMOS) method and, finally, a "best member" OLS method. The advantages and drawbacks of each method are clarified. These techniques are applied in the context of the 63 Lorenz system, whose model version is affected by both initial condition and model errors. For short forecast lead times, the number and choice of predictors plays an important role. Contrarily to the other techniques, GM degrades when the number of predictors increases. At intermediate lead times, linear regression is unable to provide corrections to the forecast and can sometimes degrade the performance (GM and the best member OLS with noise). At long lead times the regression schemes (EVMOS, TDTR) which yield the correct variability and the largest correlation between ensemble error and spread, should be preferred.

  7. Distributed Monitoring of the R(sup 2) Statistic for Linear Regression

    Science.gov (United States)

    Bhaduri, Kanishka; Das, Kamalika; Giannella, Chris R.

    2011-01-01

    The problem of monitoring a multivariate linear regression model is relevant in studying the evolving relationship between a set of input variables (features) and one or more dependent target variables. This problem becomes challenging for large scale data in a distributed computing environment when only a subset of instances is available at individual nodes and the local data changes frequently. Data centralization and periodic model recomputation can add high overhead to tasks like anomaly detection in such dynamic settings. Therefore, the goal is to develop techniques for monitoring and updating the model over the union of all nodes data in a communication-efficient fashion. Correctness guarantees on such techniques are also often highly desirable, especially in safety-critical application scenarios. In this paper we develop DReMo a distributed algorithm with very low resource overhead, for monitoring the quality of a regression model in terms of its coefficient of determination (R2 statistic). When the nodes collectively determine that R2 has dropped below a fixed threshold, the linear regression model is recomputed via a network-wide convergecast and the updated model is broadcast back to all nodes. We show empirically, using both synthetic and real data, that our proposed method is highly communication-efficient and scalable, and also provide theoretical guarantees on correctness.

  8. Forecasting the daily power output of a grid-connected photovoltaic system based on multivariate adaptive regression splines

    International Nuclear Information System (INIS)

    Li, Yanting; He, Yong; Su, Yan; Shu, Lianjie

    2016-01-01

    Highlights: • Suggests a nonparametric model based on MARS for output power prediction. • Compare the MARS model with a wide variety of prediction models. • Show that the MARS model is able to provide an overall good performance in both the training and testing stages. - Abstract: Both linear and nonlinear models have been proposed for forecasting the power output of photovoltaic systems. Linear models are simple to implement but less flexible. Due to the stochastic nature of the power output of PV systems, nonlinear models tend to provide better forecast than linear models. Motivated by this, this paper suggests a fairly simple nonlinear regression model known as multivariate adaptive regression splines (MARS), as an alternative to forecasting of solar power output. The MARS model is a data-driven modeling approach without any assumption about the relationship between the power output and predictors. It maintains simplicity of the classical multiple linear regression (MLR) model while possessing the capability of handling nonlinearity. It is simpler in format than other nonlinear models such as ANN, k-nearest neighbors (KNN), classification and regression tree (CART), and support vector machine (SVM). The MARS model was applied on the daily output of a grid-connected 2.1 kW PV system to provide the 1-day-ahead mean daily forecast of the power output. The comparisons with a wide variety of forecast models show that the MARS model is able to provide reliable forecast performance.

  9. Post-processing through linear regression

    Directory of Open Access Journals (Sweden)

    B. Van Schaeybroeck

    2011-03-01

    Full Text Available Various post-processing techniques are compared for both deterministic and ensemble forecasts, all based on linear regression between forecast data and observations. In order to evaluate the quality of the regression methods, three criteria are proposed, related to the effective correction of forecast error, the optimal variability of the corrected forecast and multicollinearity. The regression schemes under consideration include the ordinary least-square (OLS method, a new time-dependent Tikhonov regularization (TDTR method, the total least-square method, a new geometric-mean regression (GM, a recently introduced error-in-variables (EVMOS method and, finally, a "best member" OLS method. The advantages and drawbacks of each method are clarified.

    These techniques are applied in the context of the 63 Lorenz system, whose model version is affected by both initial condition and model errors. For short forecast lead times, the number and choice of predictors plays an important role. Contrarily to the other techniques, GM degrades when the number of predictors increases. At intermediate lead times, linear regression is unable to provide corrections to the forecast and can sometimes degrade the performance (GM and the best member OLS with noise. At long lead times the regression schemes (EVMOS, TDTR which yield the correct variability and the largest correlation between ensemble error and spread, should be preferred.

  10. Multivariate statistical modelling based on generalized linear models

    CERN Document Server

    Fahrmeir, Ludwig

    1994-01-01

    This book is concerned with the use of generalized linear models for univariate and multivariate regression analysis. Its emphasis is to provide a detailed introductory survey of the subject based on the analysis of real data drawn from a variety of subjects including the biological sciences, economics, and the social sciences. Where possible, technical details and proofs are deferred to an appendix in order to provide an accessible account for non-experts. Topics covered include: models for multi-categorical responses, model checking, time series and longitudinal data, random effects models, and state-space models. Throughout, the authors have taken great pains to discuss the underlying theoretical ideas in ways that relate well to the data at hand. As a result, numerous researchers whose work relies on the use of these models will find this an invaluable account to have on their desks. "The basic aim of the authors is to bring together and review a large part of recent advances in statistical modelling of m...

  11. Asymptotics of Multivariate Regression with Consecutively Added Dependent Varibles

    NARCIS (Netherlands)

    Raats, V.M.; van der Genugten, B.B.; Moors, J.J.A.

    2004-01-01

    We consider multivariate regression where new dependent variables are consecutively added during the experiment (or in time).So, viewed at the end of the experiment, the number of observations decreases with each added variable. The explanatory variables are observed throughout.In a previous paper

  12. Supremum Norm Posterior Contraction and Credible Sets for Nonparametric Multivariate Regression

    NARCIS (Netherlands)

    Yoo, W.W.; Ghosal, S

    2016-01-01

    In the setting of nonparametric multivariate regression with unknown error variance, we study asymptotic properties of a Bayesian method for estimating a regression function f and its mixed partial derivatives. We use a random series of tensor product of B-splines with normal basis coefficients as a

  13. Learning a Nonnegative Sparse Graph for Linear Regression.

    Science.gov (United States)

    Fang, Xiaozhao; Xu, Yong; Li, Xuelong; Lai, Zhihui; Wong, Wai Keung

    2015-09-01

    Previous graph-based semisupervised learning (G-SSL) methods have the following drawbacks: 1) they usually predefine the graph structure and then use it to perform label prediction, which cannot guarantee an overall optimum and 2) they only focus on the label prediction or the graph structure construction but are not competent in handling new samples. To this end, a novel nonnegative sparse graph (NNSG) learning method was first proposed. Then, both the label prediction and projection learning were integrated into linear regression. Finally, the linear regression and graph structure learning were unified within the same framework to overcome these two drawbacks. Therefore, a novel method, named learning a NNSG for linear regression was presented, in which the linear regression and graph learning were simultaneously performed to guarantee an overall optimum. In the learning process, the label information can be accurately propagated via the graph structure so that the linear regression can learn a discriminative projection to better fit sample labels and accurately classify new samples. An effective algorithm was designed to solve the corresponding optimization problem with fast convergence. Furthermore, NNSG provides a unified perceptiveness for a number of graph-based learning methods and linear regression methods. The experimental results showed that NNSG can obtain very high classification accuracy and greatly outperforms conventional G-SSL methods, especially some conventional graph construction methods.

  14. Removing Malmquist bias from linear regressions

    Science.gov (United States)

    Verter, Frances

    1993-01-01

    Malmquist bias is present in all astronomical surveys where sources are observed above an apparent brightness threshold. Those sources which can be detected at progressively larger distances are progressively more limited to the intrinsically luminous portion of the true distribution. This bias does not distort any of the measurements, but distorts the sample composition. We have developed the first treatment to correct for Malmquist bias in linear regressions of astronomical data. A demonstration of the corrected linear regression that is computed in four steps is presented.

  15. Remote-sensing data processing with the multivariate regression analysis method for iron mineral resource potential mapping: a case study in the Sarvian area, central Iran

    Science.gov (United States)

    Mansouri, Edris; Feizi, Faranak; Jafari Rad, Alireza; Arian, Mehran

    2018-03-01

    This paper uses multivariate regression to create a mathematical model for iron skarn exploration in the Sarvian area, central Iran, using multivariate regression for mineral prospectivity mapping (MPM). The main target of this paper is to apply multivariate regression analysis (as an MPM method) to map iron outcrops in the northeastern part of the study area in order to discover new iron deposits in other parts of the study area. Two types of multivariate regression models using two linear equations were employed to discover new mineral deposits. This method is one of the reliable methods for processing satellite images. ASTER satellite images (14 bands) were used as unique independent variables (UIVs), and iron outcrops were mapped as dependent variables for MPM. According to the results of the probability value (p value), coefficient of determination value (R2) and adjusted determination coefficient (Radj2), the second regression model (which consistent of multiple UIVs) fitted better than other models. The accuracy of the model was confirmed by iron outcrops map and geological observation. Based on field observation, iron mineralization occurs at the contact of limestone and intrusive rocks (skarn type).

  16. A generalized multivariate regression model for modelling ocean wave heights

    Science.gov (United States)

    Wang, X. L.; Feng, Y.; Swail, V. R.

    2012-04-01

    In this study, a generalized multivariate linear regression model is developed to represent the relationship between 6-hourly ocean significant wave heights (Hs) and the corresponding 6-hourly mean sea level pressure (MSLP) fields. The model is calibrated using the ERA-Interim reanalysis of Hs and MSLP fields for 1981-2000, and is validated using the ERA-Interim reanalysis for 2001-2010 and ERA40 reanalysis of Hs and MSLP for 1958-2001. The performance of the fitted model is evaluated in terms of Pierce skill score, frequency bias index, and correlation skill score. Being not normally distributed, wave heights are subjected to a data adaptive Box-Cox transformation before being used in the model fitting. Also, since 6-hourly data are being modelled, lag-1 autocorrelation must be and is accounted for. The models with and without Box-Cox transformation, and with and without accounting for autocorrelation, are inter-compared in terms of their prediction skills. The fitted MSLP-Hs relationship is then used to reconstruct historical wave height climate from the 6-hourly MSLP fields taken from the Twentieth Century Reanalysis (20CR, Compo et al. 2011), and to project possible future wave height climates using CMIP5 model simulations of MSLP fields. The reconstructed and projected wave heights, both seasonal means and maxima, are subject to a trend analysis that allows for non-linear (polynomial) trends.

  17. Multivariate Local Polynomial Regression with Application to Shenzhen Component Index

    Directory of Open Access Journals (Sweden)

    Liyun Su

    2011-01-01

    Full Text Available This study attempts to characterize and predict stock index series in Shenzhen stock market using the concepts of multivariate local polynomial regression. Based on nonlinearity and chaos of the stock index time series, multivariate local polynomial prediction methods and univariate local polynomial prediction method, all of which use the concept of phase space reconstruction according to Takens' Theorem, are considered. To fit the stock index series, the single series changes into bivariate series. To evaluate the results, the multivariate predictor for bivariate time series based on multivariate local polynomial model is compared with univariate predictor with the same Shenzhen stock index data. The numerical results obtained by Shenzhen component index show that the prediction mean squared error of the multivariate predictor is much smaller than the univariate one and is much better than the existed three methods. Even if the last half of the training data are used in the multivariate predictor, the prediction mean squared error is smaller than the univariate predictor. Multivariate local polynomial prediction model for nonsingle time series is a useful tool for stock market price prediction.

  18. Linear regression in astronomy. I

    Science.gov (United States)

    Isobe, Takashi; Feigelson, Eric D.; Akritas, Michael G.; Babu, Gutti Jogesh

    1990-01-01

    Five methods for obtaining linear regression fits to bivariate data with unknown or insignificant measurement errors are discussed: ordinary least-squares (OLS) regression of Y on X, OLS regression of X on Y, the bisector of the two OLS lines, orthogonal regression, and 'reduced major-axis' regression. These methods have been used by various researchers in observational astronomy, most importantly in cosmic distance scale applications. Formulas for calculating the slope and intercept coefficients and their uncertainties are given for all the methods, including a new general form of the OLS variance estimates. The accuracy of the formulas was confirmed using numerical simulations. The applicability of the procedures is discussed with respect to their mathematical properties, the nature of the astronomical data under consideration, and the scientific purpose of the regression. It is found that, for problems needing symmetrical treatment of the variables, the OLS bisector performs significantly better than orthogonal or reduced major-axis regression.

  19. On The Structure of The Inverse of a Linear Constant Multivariable ...

    African Journals Online (AJOL)

    On The Structure of The Inverse of a Linear Constant Multivariable System. ... It is shown that the use of this representation has certain advantages in the design of multivariable feedback systems. typical examples were considered to indicate the corresponding application. Keywords: Stability Functions, multivariable ...

  20. Improving sub-pixel imperviousness change prediction by ensembling heterogeneous non-linear regression models

    Science.gov (United States)

    Drzewiecki, Wojciech

    2016-12-01

    In this work nine non-linear regression models were compared for sub-pixel impervious surface area mapping from Landsat images. The comparison was done in three study areas both for accuracy of imperviousness coverage evaluation in individual points in time and accuracy of imperviousness change assessment. The performance of individual machine learning algorithms (Cubist, Random Forest, stochastic gradient boosting of regression trees, k-nearest neighbors regression, random k-nearest neighbors regression, Multivariate Adaptive Regression Splines, averaged neural networks, and support vector machines with polynomial and radial kernels) was also compared with the performance of heterogeneous model ensembles constructed from the best models trained using particular techniques. The results proved that in case of sub-pixel evaluation the most accurate prediction of change may not necessarily be based on the most accurate individual assessments. When single methods are considered, based on obtained results Cubist algorithm may be advised for Landsat based mapping of imperviousness for single dates. However, Random Forest may be endorsed when the most reliable evaluation of imperviousness change is the primary goal. It gave lower accuracies for individual assessments, but better prediction of change due to more correlated errors of individual predictions. Heterogeneous model ensembles performed for individual time points assessments at least as well as the best individual models. In case of imperviousness change assessment the ensembles always outperformed single model approaches. It means that it is possible to improve the accuracy of sub-pixel imperviousness change assessment using ensembles of heterogeneous non-linear regression models.

  1. Use of probabilistic weights to enhance linear regression myoelectric control.

    Science.gov (United States)

    Smith, Lauren H; Kuiken, Todd A; Hargrove, Levi J

    2015-12-01

    Clinically available prostheses for transradial amputees do not allow simultaneous myoelectric control of degrees of freedom (DOFs). Linear regression methods can provide simultaneous myoelectric control, but frequently also result in difficulty with isolating individual DOFs when desired. This study evaluated the potential of using probabilistic estimates of categories of gross prosthesis movement, which are commonly used in classification-based myoelectric control, to enhance linear regression myoelectric control. Gaussian models were fit to electromyogram (EMG) feature distributions for three movement classes at each DOF (no movement, or movement in either direction) and used to weight the output of linear regression models by the probability that the user intended the movement. Eight able-bodied and two transradial amputee subjects worked in a virtual Fitts' law task to evaluate differences in controllability between linear regression and probability-weighted regression for an intramuscular EMG-based three-DOF wrist and hand system. Real-time and offline analyses in able-bodied subjects demonstrated that probability weighting improved performance during single-DOF tasks (p linear regression control. Use of probability weights can improve the ability to isolate individual during linear regression myoelectric control, while maintaining the ability to simultaneously control multiple DOFs.

  2. Linear regression crash prediction models : issues and proposed solutions.

    Science.gov (United States)

    2010-05-01

    The paper develops a linear regression model approach that can be applied to : crash data to predict vehicle crashes. The proposed approach involves novice data aggregation : to satisfy linear regression assumptions; namely error structure normality ...

  3. Multivariate Regression of Liver on Intestine of Mice: A ...

    African Journals Online (AJOL)

    FIRST LADY

    pairs recovered. Linear, semi-logarithmic and logarithmic-logarithmic (log- log) regressions were performed. He chose the log-log curves because its variance was more uniform. The statistical comparison of .... E(U1| U2 = u2) is the regression function of U1 on U2, and Var (U1|U2 = u2) is the conditional covariance matrix.

  4. Biostatistics Series Module 6: Correlation and Linear Regression.

    Science.gov (United States)

    Hazra, Avijit; Gogtay, Nithya

    2016-01-01

    Correlation and linear regression are the most commonly used techniques for quantifying the association between two numeric variables. Correlation quantifies the strength of the linear relationship between paired variables, expressing this as a correlation coefficient. If both variables x and y are normally distributed, we calculate Pearson's correlation coefficient ( r ). If normality assumption is not met for one or both variables in a correlation analysis, a rank correlation coefficient, such as Spearman's rho (ρ) may be calculated. A hypothesis test of correlation tests whether the linear relationship between the two variables holds in the underlying population, in which case it returns a P correlation coefficient can also be calculated for an idea of the correlation in the population. The value r 2 denotes the proportion of the variability of the dependent variable y that can be attributed to its linear relation with the independent variable x and is called the coefficient of determination. Linear regression is a technique that attempts to link two correlated variables x and y in the form of a mathematical equation ( y = a + bx ), such that given the value of one variable the other may be predicted. In general, the method of least squares is applied to obtain the equation of the regression line. Correlation and linear regression analysis are based on certain assumptions pertaining to the data sets. If these assumptions are not met, misleading conclusions may be drawn. The first assumption is that of linear relationship between the two variables. A scatter plot is essential before embarking on any correlation-regression analysis to show that this is indeed the case. Outliers or clustering within data sets can distort the correlation coefficient value. Finally, it is vital to remember that though strong correlation can be a pointer toward causation, the two are not synonymous.

  5. Linear regression and the normality assumption.

    Science.gov (United States)

    Schmidt, Amand F; Finan, Chris

    2017-12-16

    Researchers often perform arbitrary outcome transformations to fulfill the normality assumption of a linear regression model. This commentary explains and illustrates that in large data settings, such transformations are often unnecessary, and worse may bias model estimates. Linear regression assumptions are illustrated using simulated data and an empirical example on the relation between time since type 2 diabetes diagnosis and glycated hemoglobin levels. Simulation results were evaluated on coverage; i.e., the number of times the 95% confidence interval included the true slope coefficient. Although outcome transformations bias point estimates, violations of the normality assumption in linear regression analyses do not. The normality assumption is necessary to unbiasedly estimate standard errors, and hence confidence intervals and P-values. However, in large sample sizes (e.g., where the number of observations per variable is >10) violations of this normality assumption often do not noticeably impact results. Contrary to this, assumptions on, the parametric model, absence of extreme observations, homoscedasticity, and independency of the errors, remain influential even in large sample size settings. Given that modern healthcare research typically includes thousands of subjects focusing on the normality assumption is often unnecessary, does not guarantee valid results, and worse may bias estimates due to the practice of outcome transformations. Copyright © 2017 Elsevier Inc. All rights reserved.

  6. Controlling attribute effect in linear regression

    KAUST Repository

    Calders, Toon; Karim, Asim A.; Kamiran, Faisal; Ali, Wasif Mohammad; Zhang, Xiangliang

    2013-01-01

    In data mining we often have to learn from biased data, because, for instance, data comes from different batches or there was a gender or racial bias in the collection of social data. In some applications it may be necessary to explicitly control this bias in the models we learn from the data. This paper is the first to study learning linear regression models under constraints that control the biasing effect of a given attribute such as gender or batch number. We show how propensity modeling can be used for factoring out the part of the bias that can be justified by externally provided explanatory attributes. Then we analytically derive linear models that minimize squared error while controlling the bias by imposing constraints on the mean outcome or residuals of the models. Experiments with discrimination-aware crime prediction and batch effect normalization tasks show that the proposed techniques are successful in controlling attribute effects in linear regression models. © 2013 IEEE.

  7. Controlling attribute effect in linear regression

    KAUST Repository

    Calders, Toon

    2013-12-01

    In data mining we often have to learn from biased data, because, for instance, data comes from different batches or there was a gender or racial bias in the collection of social data. In some applications it may be necessary to explicitly control this bias in the models we learn from the data. This paper is the first to study learning linear regression models under constraints that control the biasing effect of a given attribute such as gender or batch number. We show how propensity modeling can be used for factoring out the part of the bias that can be justified by externally provided explanatory attributes. Then we analytically derive linear models that minimize squared error while controlling the bias by imposing constraints on the mean outcome or residuals of the models. Experiments with discrimination-aware crime prediction and batch effect normalization tasks show that the proposed techniques are successful in controlling attribute effects in linear regression models. © 2013 IEEE.

  8. Return-Volatility Relationship: Insights from Linear and Non-Linear Quantile Regression

    NARCIS (Netherlands)

    D.E. Allen (David); A.K. Singh (Abhay); R.J. Powell (Robert); M.J. McAleer (Michael); J. Taylor (James); L. Thomas (Lyn)

    2013-01-01

    textabstractThe purpose of this paper is to examine the asymmetric relationship between price and implied volatility and the associated extreme quantile dependence using linear and non linear quantile regression approach. Our goal in this paper is to demonstrate that the relationship between the

  9. Determination of regression laws: Linear and nonlinear

    International Nuclear Information System (INIS)

    Onishchenko, A.M.

    1994-01-01

    A detailed mathematical determination of regression laws is presented in the article. Particular emphasis is place on determining the laws of X j on X l to account for source nuclei decay and detector errors in nuclear physics instrumentation. Both linear and nonlinear relations are presented. Linearization of 19 functions is tabulated, including graph, relation, variable substitution, obtained linear function, and remarks. 6 refs., 1 tab

  10. REGSTEP - stepwise multivariate polynomial regression with singular extensions

    International Nuclear Information System (INIS)

    Davierwalla, D.M.

    1977-09-01

    The program REGSTEP determines a polynomial approximation, in the least squares sense, to tabulated data. The polynomial may be univariate or multivariate. The computational method is that of stepwise regression. A variable is inserted into the regression basis if it is significant with respect to an appropriate F-test at a preselected risk level. In addition, should a variable already in the basis, become nonsignificant (again with respect to an appropriate F-test) after the entry of a new variable, it is expelled from the model. Thus only significant variables are retained in the model. Although written expressly to be incorporated into CORCOD, a code for predicting nuclear cross sections for given values of power, temperature, void fractions, Boron content etc. there is nothing to limit the use of REGSTEP to nuclear applications, as the examples demonstrate. A separate version has been incorporated into RSYST for the general user. (Auth.)

  11. Evaluation of Linear Regression Simultaneous Myoelectric Control Using Intramuscular EMG.

    Science.gov (United States)

    Smith, Lauren H; Kuiken, Todd A; Hargrove, Levi J

    2016-04-01

    The objective of this study was to evaluate the ability of linear regression models to decode patterns of muscle coactivation from intramuscular electromyogram (EMG) and provide simultaneous myoelectric control of a virtual 3-DOF wrist/hand system. Performance was compared to the simultaneous control of conventional myoelectric prosthesis methods using intramuscular EMG (parallel dual-site control)-an approach that requires users to independently modulate individual muscles in the residual limb, which can be challenging for amputees. Linear regression control was evaluated in eight able-bodied subjects during a virtual Fitts' law task and was compared to performance of eight subjects using parallel dual-site control. An offline analysis also evaluated how different types of training data affected prediction accuracy of linear regression control. The two control systems demonstrated similar overall performance; however, the linear regression method demonstrated improved performance for targets requiring use of all three DOFs, whereas parallel dual-site control demonstrated improved performance for targets that required use of only one DOF. Subjects using linear regression control could more easily activate multiple DOFs simultaneously, but often experienced unintended movements when trying to isolate individual DOFs. Offline analyses also suggested that the method used to train linear regression systems may influence controllability. Linear regression myoelectric control using intramuscular EMG provided an alternative to parallel dual-site control for 3-DOF simultaneous control at the wrist and hand. The two methods demonstrated different strengths in controllability, highlighting the tradeoff between providing simultaneous control and the ability to isolate individual DOFs when desired.

  12. A Model for Shovel Capital Cost Estimation, Using a Hybrid Model of Multivariate Regression and Neural Networks

    Directory of Open Access Journals (Sweden)

    Abdolreza Yazdani-Chamzini

    2017-12-01

    Full Text Available Cost estimation is an essential issue in feasibility studies in civil engineering. Many different methods can be applied to modelling costs. These methods can be divided into several main groups: (1 artificial intelligence, (2 statistical methods, and (3 analytical methods. In this paper, the multivariate regression (MVR method, which is one of the most popular linear models, and the artificial neural network (ANN method, which is widely applied to solving different prediction problems with a high degree of accuracy, have been combined to provide a cost estimate model for a shovel machine. This hybrid methodology is proposed, taking the advantages of MVR and ANN models in linear and nonlinear modelling, respectively. In the proposed model, the unique advantages of the MVR model in linear modelling are used first to recognize the existing linear structure in data, and, then, the ANN for determining nonlinear patterns in preprocessed data is applied. The results with three indices indicate that the proposed model is efficient and capable of increasing the prediction accuracy.

  13. Augmenting Data with Published Results in Bayesian Linear Regression

    Science.gov (United States)

    de Leeuw, Christiaan; Klugkist, Irene

    2012-01-01

    In most research, linear regression analyses are performed without taking into account published results (i.e., reported summary statistics) of similar previous studies. Although the prior density in Bayesian linear regression could accommodate such prior knowledge, formal models for doing so are absent from the literature. The goal of this…

  14. Extending the linear model with R generalized linear, mixed effects and nonparametric regression models

    CERN Document Server

    Faraway, Julian J

    2005-01-01

    Linear models are central to the practice of statistics and form the foundation of a vast range of statistical methodologies. Julian J. Faraway''s critically acclaimed Linear Models with R examined regression and analysis of variance, demonstrated the different methods available, and showed in which situations each one applies. Following in those footsteps, Extending the Linear Model with R surveys the techniques that grow from the regression model, presenting three extensions to that framework: generalized linear models (GLMs), mixed effect models, and nonparametric regression models. The author''s treatment is thoroughly modern and covers topics that include GLM diagnostics, generalized linear mixed models, trees, and even the use of neural networks in statistics. To demonstrate the interplay of theory and practice, throughout the book the author weaves the use of the R software environment to analyze the data of real examples, providing all of the R commands necessary to reproduce the analyses. All of the ...

  15. Polynomial regression analysis and significance test of the regression function

    International Nuclear Information System (INIS)

    Gao Zhengming; Zhao Juan; He Shengping

    2012-01-01

    In order to analyze the decay heating power of a certain radioactive isotope per kilogram with polynomial regression method, the paper firstly demonstrated the broad usage of polynomial function and deduced its parameters with ordinary least squares estimate. Then significance test method of polynomial regression function is derived considering the similarity between the polynomial regression model and the multivariable linear regression model. Finally, polynomial regression analysis and significance test of the polynomial function are done to the decay heating power of the iso tope per kilogram in accord with the authors' real work. (authors)

  16. Multivariate nonparametric regression and visualization with R and applications to finance

    CERN Document Server

    Klemelä, Jussi

    2014-01-01

    A modern approach to statistical learning and its applications through visualization methods With a unique and innovative presentation, Multivariate Nonparametric Regression and Visualization provides readers with the core statistical concepts to obtain complete and accurate predictions when given a set of data. Focusing on nonparametric methods to adapt to the multiple types of data generatingmechanisms, the book begins with an overview of classification and regression. The book then introduces and examines various tested and proven visualization techniques for learning samples and functio

  17. MIDAS: Regionally linear multivariate discriminative statistical mapping.

    Science.gov (United States)

    Varol, Erdem; Sotiras, Aristeidis; Davatzikos, Christos

    2018-07-01

    Statistical parametric maps formed via voxel-wise mass-univariate tests, such as the general linear model, are commonly used to test hypotheses about regionally specific effects in neuroimaging cross-sectional studies where each subject is represented by a single image. Despite being informative, these techniques remain limited as they ignore multivariate relationships in the data. Most importantly, the commonly employed local Gaussian smoothing, which is important for accounting for registration errors and making the data follow Gaussian distributions, is usually chosen in an ad hoc fashion. Thus, it is often suboptimal for the task of detecting group differences and correlations with non-imaging variables. Information mapping techniques, such as searchlight, which use pattern classifiers to exploit multivariate information and obtain more powerful statistical maps, have become increasingly popular in recent years. However, existing methods may lead to important interpretation errors in practice (i.e., misidentifying a cluster as informative, or failing to detect truly informative voxels), while often being computationally expensive. To address these issues, we introduce a novel efficient multivariate statistical framework for cross-sectional studies, termed MIDAS, seeking highly sensitive and specific voxel-wise brain maps, while leveraging the power of regional discriminant analysis. In MIDAS, locally linear discriminative learning is applied to estimate the pattern that best discriminates between two groups, or predicts a variable of interest. This pattern is equivalent to local filtering by an optimal kernel whose coefficients are the weights of the linear discriminant. By composing information from all neighborhoods that contain a given voxel, MIDAS produces a statistic that collectively reflects the contribution of the voxel to the regional classifiers as well as the discriminative power of the classifiers. Critically, MIDAS efficiently assesses the

  18. Use of probabilistic weights to enhance linear regression myoelectric control

    Science.gov (United States)

    Smith, Lauren H.; Kuiken, Todd A.; Hargrove, Levi J.

    2015-12-01

    Objective. Clinically available prostheses for transradial amputees do not allow simultaneous myoelectric control of degrees of freedom (DOFs). Linear regression methods can provide simultaneous myoelectric control, but frequently also result in difficulty with isolating individual DOFs when desired. This study evaluated the potential of using probabilistic estimates of categories of gross prosthesis movement, which are commonly used in classification-based myoelectric control, to enhance linear regression myoelectric control. Approach. Gaussian models were fit to electromyogram (EMG) feature distributions for three movement classes at each DOF (no movement, or movement in either direction) and used to weight the output of linear regression models by the probability that the user intended the movement. Eight able-bodied and two transradial amputee subjects worked in a virtual Fitts’ law task to evaluate differences in controllability between linear regression and probability-weighted regression for an intramuscular EMG-based three-DOF wrist and hand system. Main results. Real-time and offline analyses in able-bodied subjects demonstrated that probability weighting improved performance during single-DOF tasks (p < 0.05) by preventing extraneous movement at additional DOFs. Similar results were seen in experiments with two transradial amputees. Though goodness-of-fit evaluations suggested that the EMG feature distributions showed some deviations from the Gaussian, equal-covariance assumptions used in this experiment, the assumptions were sufficiently met to provide improved performance compared to linear regression control. Significance. Use of probability weights can improve the ability to isolate individual during linear regression myoelectric control, while maintaining the ability to simultaneously control multiple DOFs.

  19. Collision prediction models using multivariate Poisson-lognormal regression.

    Science.gov (United States)

    El-Basyouny, Karim; Sayed, Tarek

    2009-07-01

    This paper advocates the use of multivariate Poisson-lognormal (MVPLN) regression to develop models for collision count data. The MVPLN approach presents an opportunity to incorporate the correlations across collision severity levels and their influence on safety analyses. The paper introduces a new multivariate hazardous location identification technique, which generalizes the univariate posterior probability of excess that has been commonly proposed and applied in the literature. In addition, the paper presents an alternative approach for quantifying the effect of the multivariate structure on the precision of expected collision frequency. The MVPLN approach is compared with the independent (separate) univariate Poisson-lognormal (PLN) models with respect to model inference, goodness-of-fit, identification of hot spots and precision of expected collision frequency. The MVPLN is modeled using the WinBUGS platform which facilitates computation of posterior distributions as well as providing a goodness-of-fit measure for model comparisons. The results indicate that the estimates of the extra Poisson variation parameters were considerably smaller under MVPLN leading to higher precision. The improvement in precision is due mainly to the fact that MVPLN accounts for the correlation between the latent variables representing property damage only (PDO) and injuries plus fatalities (I+F). This correlation was estimated at 0.758, which is highly significant, suggesting that higher PDO rates are associated with higher I+F rates, as the collision likelihood for both types is likely to rise due to similar deficiencies in roadway design and/or other unobserved factors. In terms of goodness-of-fit, the MVPLN model provided a superior fit than the independent univariate models. The multivariate hazardous location identification results demonstrated that some hazardous locations could be overlooked if the analysis was restricted to the univariate models.

  20. Fourier transform infrared spectroscopic imaging and multivariate regression for prediction of proteoglycan content of articular cartilage.

    Directory of Open Access Journals (Sweden)

    Lassi Rieppo

    Full Text Available Fourier Transform Infrared (FT-IR spectroscopic imaging has been earlier applied for the spatial estimation of the collagen and the proteoglycan (PG contents of articular cartilage (AC. However, earlier studies have been limited to the use of univariate analysis techniques. Current analysis methods lack the needed specificity for collagen and PGs. The aim of the present study was to evaluate the suitability of partial least squares regression (PLSR and principal component regression (PCR methods for the analysis of the PG content of AC. Multivariate regression models were compared with earlier used univariate methods and tested with a sample material consisting of healthy and enzymatically degraded steer AC. Chondroitinase ABC enzyme was used to increase the variation in PG content levels as compared to intact AC. Digital densitometric measurements of Safranin O-stained sections provided the reference for PG content. The results showed that multivariate regression models predict PG content of AC significantly better than earlier used absorbance spectrum (i.e. the area of carbohydrate region with or without amide I normalization or second derivative spectrum univariate parameters. Increased molecular specificity favours the use of multivariate regression models, but they require more knowledge of chemometric analysis and extended laboratory resources for gathering reference data for establishing the models. When true molecular specificity is required, the multivariate models should be used.

  1. Preference learning with evolutionary Multivariate Adaptive Regression Spline model

    DEFF Research Database (Denmark)

    Abou-Zleikha, Mohamed; Shaker, Noor; Christensen, Mads Græsbøll

    2015-01-01

    This paper introduces a novel approach for pairwise preference learning through combining an evolutionary method with Multivariate Adaptive Regression Spline (MARS). Collecting users' feedback through pairwise preferences is recommended over other ranking approaches as this method is more appealing...... for function approximation as well as being relatively easy to interpret. MARS models are evolved based on their efficiency in learning pairwise data. The method is tested on two datasets that collectively provide pairwise preference data of five cognitive states expressed by users. The method is analysed...

  2. Finite Algorithms for Robust Linear Regression

    DEFF Research Database (Denmark)

    Madsen, Kaj; Nielsen, Hans Bruun

    1990-01-01

    The Huber M-estimator for robust linear regression is analyzed. Newton type methods for solution of the problem are defined and analyzed, and finite convergence is proved. Numerical experiments with a large number of test problems demonstrate efficiency and indicate that this kind of approach may...

  3. Direction of Effects in Multiple Linear Regression Models.

    Science.gov (United States)

    Wiedermann, Wolfgang; von Eye, Alexander

    2015-01-01

    Previous studies analyzed asymmetric properties of the Pearson correlation coefficient using higher than second order moments. These asymmetric properties can be used to determine the direction of dependence in a linear regression setting (i.e., establish which of two variables is more likely to be on the outcome side) within the framework of cross-sectional observational data. Extant approaches are restricted to the bivariate regression case. The present contribution extends the direction of dependence methodology to a multiple linear regression setting by analyzing distributional properties of residuals of competing multiple regression models. It is shown that, under certain conditions, the third central moments of estimated regression residuals can be used to decide upon direction of effects. In addition, three different approaches for statistical inference are discussed: a combined D'Agostino normality test, a skewness difference test, and a bootstrap difference test. Type I error and power of the procedures are assessed using Monte Carlo simulations, and an empirical example is provided for illustrative purposes. In the discussion, issues concerning the quality of psychological data, possible extensions of the proposed methods to the fourth central moment of regression residuals, and potential applications are addressed.

  4. Interpretability of Multivariate Brain Maps in Linear Brain Decoding: Definition, and Heuristic Quantification in Multivariate Analysis of MEG Time-Locked Effects.

    Science.gov (United States)

    Kia, Seyed Mostafa; Vega Pons, Sandro; Weisz, Nathan; Passerini, Andrea

    2016-01-01

    Brain decoding is a popular multivariate approach for hypothesis testing in neuroimaging. Linear classifiers are widely employed in the brain decoding paradigm to discriminate among experimental conditions. Then, the derived linear weights are visualized in the form of multivariate brain maps to further study spatio-temporal patterns of underlying neural activities. It is well known that the brain maps derived from weights of linear classifiers are hard to interpret because of high correlations between predictors, low signal to noise ratios, and the high dimensionality of neuroimaging data. Therefore, improving the interpretability of brain decoding approaches is of primary interest in many neuroimaging studies. Despite extensive studies of this type, at present, there is no formal definition for interpretability of multivariate brain maps. As a consequence, there is no quantitative measure for evaluating the interpretability of different brain decoding methods. In this paper, first, we present a theoretical definition of interpretability in brain decoding; we show that the interpretability of multivariate brain maps can be decomposed into their reproducibility and representativeness. Second, as an application of the proposed definition, we exemplify a heuristic for approximating the interpretability in multivariate analysis of evoked magnetoencephalography (MEG) responses. Third, we propose to combine the approximated interpretability and the generalization performance of the brain decoding into a new multi-objective criterion for model selection. Our results, for the simulated and real MEG data, show that optimizing the hyper-parameters of the regularized linear classifier based on the proposed criterion results in more informative multivariate brain maps. More importantly, the presented definition provides the theoretical background for quantitative evaluation of interpretability, and hence, facilitates the development of more effective brain decoding algorithms

  5. Simple and multiple linear regression: sample size considerations.

    Science.gov (United States)

    Hanley, James A

    2016-11-01

    The suggested "two subjects per variable" (2SPV) rule of thumb in the Austin and Steyerberg article is a chance to bring out some long-established and quite intuitive sample size considerations for both simple and multiple linear regression. This article distinguishes two of the major uses of regression models that imply very different sample size considerations, neither served well by the 2SPV rule. The first is etiological research, which contrasts mean Y levels at differing "exposure" (X) values and thus tends to focus on a single regression coefficient, possibly adjusted for confounders. The second research genre guides clinical practice. It addresses Y levels for individuals with different covariate patterns or "profiles." It focuses on the profile-specific (mean) Y levels themselves, estimating them via linear compounds of regression coefficients and covariates. By drawing on long-established closed-form variance formulae that lie beneath the standard errors in multiple regression, and by rearranging them for heuristic purposes, one arrives at quite intuitive sample size considerations for both research genres. Copyright © 2016 Elsevier Inc. All rights reserved.

  6. Who Will Win?: Predicting the Presidential Election Using Linear Regression

    Science.gov (United States)

    Lamb, John H.

    2007-01-01

    This article outlines a linear regression activity that engages learners, uses technology, and fosters cooperation. Students generated least-squares linear regression equations using TI-83 Plus[TM] graphing calculators, Microsoft[C] Excel, and paper-and-pencil calculations using derived normal equations to predict the 2004 presidential election.…

  7. Quantum algorithm for linear regression

    Science.gov (United States)

    Wang, Guoming

    2017-07-01

    We present a quantum algorithm for fitting a linear regression model to a given data set using the least-squares approach. Differently from previous algorithms which yield a quantum state encoding the optimal parameters, our algorithm outputs these numbers in the classical form. So by running it once, one completely determines the fitted model and then can use it to make predictions on new data at little cost. Moreover, our algorithm works in the standard oracle model, and can handle data sets with nonsparse design matrices. It runs in time poly( log2(N ) ,d ,κ ,1 /ɛ ) , where N is the size of the data set, d is the number of adjustable parameters, κ is the condition number of the design matrix, and ɛ is the desired precision in the output. We also show that the polynomial dependence on d and κ is necessary. Thus, our algorithm cannot be significantly improved. Furthermore, we also give a quantum algorithm that estimates the quality of the least-squares fit (without computing its parameters explicitly). This algorithm runs faster than the one for finding this fit, and can be used to check whether the given data set qualifies for linear regression in the first place.

  8. Comparison of Linear and Non-linear Regression Analysis to Determine Pulmonary Pressure in Hyperthyroidism.

    Science.gov (United States)

    Scarneciu, Camelia C; Sangeorzan, Livia; Rus, Horatiu; Scarneciu, Vlad D; Varciu, Mihai S; Andreescu, Oana; Scarneciu, Ioan

    2017-01-01

    This study aimed at assessing the incidence of pulmonary hypertension (PH) at newly diagnosed hyperthyroid patients and at finding a simple model showing the complex functional relation between pulmonary hypertension in hyperthyroidism and the factors causing it. The 53 hyperthyroid patients (H-group) were evaluated mainly by using an echocardiographical method and compared with 35 euthyroid (E-group) and 25 healthy people (C-group). In order to identify the factors causing pulmonary hypertension the statistical method of comparing the values of arithmetical means is used. The functional relation between the two random variables (PAPs and each of the factors determining it within our research study) can be expressed by linear or non-linear function. By applying the linear regression method described by a first-degree equation the line of regression (linear model) has been determined; by applying the non-linear regression method described by a second degree equation, a parabola-type curve of regression (non-linear or polynomial model) has been determined. We made the comparison and the validation of these two models by calculating the determination coefficient (criterion 1), the comparison of residuals (criterion 2), application of AIC criterion (criterion 3) and use of F-test (criterion 4). From the H-group, 47% have pulmonary hypertension completely reversible when obtaining euthyroidism. The factors causing pulmonary hypertension were identified: previously known- level of free thyroxin, pulmonary vascular resistance, cardiac output; new factors identified in this study- pretreatment period, age, systolic blood pressure. According to the four criteria and to the clinical judgment, we consider that the polynomial model (graphically parabola- type) is better than the linear one. The better model showing the functional relation between the pulmonary hypertension in hyperthyroidism and the factors identified in this study is given by a polynomial equation of second

  9. A Monte Carlo simulation study comparing linear regression, beta regression, variable-dispersion beta regression and fractional logit regression at recovering average difference measures in a two sample design.

    Science.gov (United States)

    Meaney, Christopher; Moineddin, Rahim

    2014-01-24

    In biomedical research, response variables are often encountered which have bounded support on the open unit interval--(0,1). Traditionally, researchers have attempted to estimate covariate effects on these types of response data using linear regression. Alternative modelling strategies may include: beta regression, variable-dispersion beta regression, and fractional logit regression models. This study employs a Monte Carlo simulation design to compare the statistical properties of the linear regression model to that of the more novel beta regression, variable-dispersion beta regression, and fractional logit regression models. In the Monte Carlo experiment we assume a simple two sample design. We assume observations are realizations of independent draws from their respective probability models. The randomly simulated draws from the various probability models are chosen to emulate average proportion/percentage/rate differences of pre-specified magnitudes. Following simulation of the experimental data we estimate average proportion/percentage/rate differences. We compare the estimators in terms of bias, variance, type-1 error and power. Estimates of Monte Carlo error associated with these quantities are provided. If response data are beta distributed with constant dispersion parameters across the two samples, then all models are unbiased and have reasonable type-1 error rates and power profiles. If the response data in the two samples have different dispersion parameters, then the simple beta regression model is biased. When the sample size is small (N0 = N1 = 25) linear regression has superior type-1 error rates compared to the other models. Small sample type-1 error rates can be improved in beta regression models using bias correction/reduction methods. In the power experiments, variable-dispersion beta regression and fractional logit regression models have slightly elevated power compared to linear regression models. Similar results were observed if the

  10. Healthcare Expenditures Associated with Depression Among Individuals with Osteoarthritis: Post-Regression Linear Decomposition Approach.

    Science.gov (United States)

    Agarwal, Parul; Sambamoorthi, Usha

    2015-12-01

    Depression is common among individuals with osteoarthritis and leads to increased healthcare burden. The objective of this study was to examine excess total healthcare expenditures associated with depression among individuals with osteoarthritis in the US. Adults with self-reported osteoarthritis (n = 1881) were identified using data from the 2010 Medical Expenditure Panel Survey (MEPS). Among those with osteoarthritis, chi-square tests and ordinary least square regressions (OLS) were used to examine differences in healthcare expenditures between those with and without depression. Post-regression linear decomposition technique was used to estimate the relative contribution of different constructs of the Anderson's behavioral model, i.e., predisposing, enabling, need, personal healthcare practices, and external environment factors, to the excess expenditures associated with depression among individuals with osteoarthritis. All analysis accounted for the complex survey design of MEPS. Depression coexisted among 20.6 % of adults with osteoarthritis. The average total healthcare expenditures were $13,684 among adults with depression compared to $9284 among those without depression. Multivariable OLS regression revealed that adults with depression had 38.8 % higher healthcare expenditures (p regression linear decomposition analysis indicated that 50 % of differences in expenditures among adults with and without depression can be explained by differences in need factors. Among individuals with coexisting osteoarthritis and depression, excess healthcare expenditures associated with depression were mainly due to comorbid anxiety, chronic conditions and poor health status. These expenditures may potentially be reduced by providing timely intervention for need factors or by providing care under a collaborative care model.

  11. A test for the parameters of multiple linear regression models ...

    African Journals Online (AJOL)

    A test for the parameters of multiple linear regression models is developed for conducting tests simultaneously on all the parameters of multiple linear regression models. The test is robust relative to the assumptions of homogeneity of variances and absence of serial correlation of the classical F-test. Under certain null and ...

  12. Testing hypotheses for differences between linear regression lines

    Science.gov (United States)

    Stanley J. Zarnoch

    2009-01-01

    Five hypotheses are identified for testing differences between simple linear regression lines. The distinctions between these hypotheses are based on a priori assumptions and illustrated with full and reduced models. The contrast approach is presented as an easy and complete method for testing for overall differences between the regressions and for making pairwise...

  13. Seasonal variation of benzo(a)pyrene in the Spanish airborne PM10. Multivariate linear regression model applied to estimate BaP concentrations.

    Science.gov (United States)

    Callén, M S; López, J M; Mastral, A M

    2010-08-15

    The estimation of benzo(a)pyrene (BaP) concentrations in ambient air is very important from an environmental point of view especially with the introduction of the Directive 2004/107/EC and due to the carcinogenic character of this pollutant. A sampling campaign of particulate matter less or equal than 10 microns (PM10) carried out during 2008-2009 in four locations of Spain was collected to determine experimentally BaP concentrations by gas chromatography mass-spectrometry mass-spectrometry (GC-MS-MS). Multivariate linear regression models (MLRM) were used to predict BaP air concentrations in two sampling places, taking PM10 and meteorological variables as possible predictors. The model obtained with data from two sampling sites (all sites model) (R(2)=0.817, PRESS/SSY=0.183) included the significant variables like PM10, temperature, solar radiation and wind speed and was internally and externally validated. The first validation was performed by cross validation and the last one by BaP concentrations from previous campaigns carried out in Zaragoza from 2001-2004. The proposed model constitutes a first approximation to estimate BaP concentrations in urban atmospheres with very good internal prediction (Q(CV)(2)=0.813, PRESS/SSY=0.187) and with the maximal external prediction for the 2001-2002 campaign (Q(ext)(2)=0.679 and PRESS/SSY=0.321) versus the 2001-2004 campaign (Q(ext)(2)=0.551, PRESS/SSY=0.449). Copyright 2010 Elsevier B.V. All rights reserved.

  14. Seasonal variation of benzo(a)pyrene in the Spanish airborne PM10. Multivariate linear regression model applied to estimate BaP concentrations

    International Nuclear Information System (INIS)

    Callen, M.S.; Lopez, J.M.; Mastral, A.M.

    2010-01-01

    The estimation of benzo(a)pyrene (BaP) concentrations in ambient air is very important from an environmental point of view especially with the introduction of the Directive 2004/107/EC and due to the carcinogenic character of this pollutant. A sampling campaign of particulate matter less or equal than 10 microns (PM10) carried out during 2008-2009 in four locations of Spain was collected to determine experimentally BaP concentrations by gas chromatography mass-spectrometry mass-spectrometry (GC-MS-MS). Multivariate linear regression models (MLRM) were used to predict BaP air concentrations in two sampling places, taking PM10 and meteorological variables as possible predictors. The model obtained with data from two sampling sites (all sites model) (R 2 = 0.817, PRESS/SSY = 0.183) included the significant variables like PM10, temperature, solar radiation and wind speed and was internally and externally validated. The first validation was performed by cross validation and the last one by BaP concentrations from previous campaigns carried out in Zaragoza from 2001-2004. The proposed model constitutes a first approximation to estimate BaP concentrations in urban atmospheres with very good internal prediction (Q CV 2 =0.813, PRESS/SSY = 0.187) and with the maximal external prediction for the 2001-2002 campaign (Q ext 2 =0.679 and PRESS/SSY = 0.321) versus the 2001-2004 campaign (Q ext 2 =0.551, PRESS/SSY = 0.449).

  15. Multiple Linear Regression: A Realistic Reflector.

    Science.gov (United States)

    Nutt, A. T.; Batsell, R. R.

    Examples of the use of Multiple Linear Regression (MLR) techniques are presented. This is done to show how MLR aids data processing and decision-making by providing the decision-maker with freedom in phrasing questions and by accurately reflecting the data on hand. A brief overview of the rationale underlying MLR is given, some basic definitions…

  16. Identification of Influential Points in a Linear Regression Model

    Directory of Open Access Journals (Sweden)

    Jan Grosz

    2011-03-01

    Full Text Available The article deals with the detection and identification of influential points in the linear regression model. Three methods of detection of outliers and leverage points are described. These procedures can also be used for one-sample (independentdatasets. This paper briefly describes theoretical aspects of several robust methods as well. Robust statistics is a powerful tool to increase the reliability and accuracy of statistical modelling and data analysis. A simulation model of the simple linear regression is presented.

  17. SPLINE LINEAR REGRESSION USED FOR EVALUATING FINANCIAL ASSETS 1

    Directory of Open Access Journals (Sweden)

    Liviu GEAMBAŞU

    2010-12-01

    Full Text Available One of the most important preoccupations of financial markets participants was and still is the problem of determining more precise the trend of financial assets prices. For solving this problem there were written many scientific papers and were developed many mathematical and statistical models in order to better determine the financial assets price trend. If until recently the simple linear models were largely used due to their facile utilization, the financial crises that affected the world economy starting with 2008 highlight the necessity of adapting the mathematical models to variation of economy. A simple to use model but adapted to economic life realities is the spline linear regression. This type of regression keeps the continuity of regression function, but split the studied data in intervals with homogenous characteristics. The characteristics of each interval are highlighted and also the evolution of market over all the intervals, resulting reduced standard errors. The first objective of the article is the theoretical presentation of the spline linear regression, also referring to scientific national and international papers related to this subject. The second objective is applying the theoretical model to data from the Bucharest Stock Exchange

  18. The microcomputer scientific software series 2: general linear model--regression.

    Science.gov (United States)

    Harold M. Rauscher

    1983-01-01

    The general linear model regression (GLMR) program provides the microcomputer user with a sophisticated regression analysis capability. The output provides a regression ANOVA table, estimators of the regression model coefficients, their confidence intervals, confidence intervals around the predicted Y-values, residuals for plotting, a check for multicollinearity, a...

  19. Using the Ridge Regression Procedures to Estimate the Multiple Linear Regression Coefficients

    Science.gov (United States)

    Gorgees, HazimMansoor; Mahdi, FatimahAssim

    2018-05-01

    This article concerns with comparing the performance of different types of ordinary ridge regression estimators that have been already proposed to estimate the regression parameters when the near exact linear relationships among the explanatory variables is presented. For this situations we employ the data obtained from tagi gas filling company during the period (2008-2010). The main result we reached is that the method based on the condition number performs better than other methods since it has smaller mean square error (MSE) than the other stated methods.

  20. Answers to selected problems in multivariable calculus with linear algebra and series

    CERN Document Server

    Trench, William F

    1972-01-01

    Answers to Selected Problems in Multivariable Calculus with Linear Algebra and Series contains the answers to selected problems in linear algebra, the calculus of several variables, and series. Topics covered range from vectors and vector spaces to linear matrices and analytic geometry, as well as differential calculus of real-valued functions. Theorems and definitions are included, most of which are followed by worked-out illustrative examples.The problems and corresponding solutions deal with linear equations and matrices, including determinants; vector spaces and linear transformations; eig

  1. Bayesian linear regression with skew-symmetric error distributions with applications to survival analysis

    KAUST Repository

    Rubio, Francisco J.

    2016-02-09

    We study Bayesian linear regression models with skew-symmetric scale mixtures of normal error distributions. These kinds of models can be used to capture departures from the usual assumption of normality of the errors in terms of heavy tails and asymmetry. We propose a general noninformative prior structure for these regression models and show that the corresponding posterior distribution is proper under mild conditions. We extend these propriety results to cases where the response variables are censored. The latter scenario is of interest in the context of accelerated failure time models, which are relevant in survival analysis. We present a simulation study that demonstrates good frequentist properties of the posterior credible intervals associated with the proposed priors. This study also sheds some light on the trade-off between increased model flexibility and the risk of over-fitting. We illustrate the performance of the proposed models with real data. Although we focus on models with univariate response variables, we also present some extensions to the multivariate case in the Supporting Information.

  2. Applied multivariate statistics with R

    CERN Document Server

    Zelterman, Daniel

    2015-01-01

    This book brings the power of multivariate statistics to graduate-level practitioners, making these analytical methods accessible without lengthy mathematical derivations. Using the open source, shareware program R, Professor Zelterman demonstrates the process and outcomes for a wide array of multivariate statistical applications. Chapters cover graphical displays, linear algebra, univariate, bivariate and multivariate normal distributions, factor methods, linear regression, discrimination and classification, clustering, time series models, and additional methods. Zelterman uses practical examples from diverse disciplines to welcome readers from a variety of academic specialties. Those with backgrounds in statistics will learn new methods while they review more familiar topics. Chapters include exercises, real data sets, and R implementations. The data are interesting, real-world topics, particularly from health and biology-related contexts. As an example of the approach, the text examines a sample from the B...

  3. Multivariate Frequency-Severity Regression Models in Insurance

    Directory of Open Access Journals (Sweden)

    Edward W. Frees

    2016-02-01

    Full Text Available In insurance and related industries including healthcare, it is common to have several outcome measures that the analyst wishes to understand using explanatory variables. For example, in automobile insurance, an accident may result in payments for damage to one’s own vehicle, damage to another party’s vehicle, or personal injury. It is also common to be interested in the frequency of accidents in addition to the severity of the claim amounts. This paper synthesizes and extends the literature on multivariate frequency-severity regression modeling with a focus on insurance industry applications. Regression models for understanding the distribution of each outcome continue to be developed yet there now exists a solid body of literature for the marginal outcomes. This paper contributes to this body of literature by focusing on the use of a copula for modeling the dependence among these outcomes; a major advantage of this tool is that it preserves the body of work established for marginal models. We illustrate this approach using data from the Wisconsin Local Government Property Insurance Fund. This fund offers insurance protection for (i property; (ii motor vehicle; and (iii contractors’ equipment claims. In addition to several claim types and frequency-severity components, outcomes can be further categorized by time and space, requiring complex dependency modeling. We find significant dependencies for these data; specifically, we find that dependencies among lines are stronger than the dependencies between the frequency and average severity within each line.

  4. Multivariate analysis with LISREL

    CERN Document Server

    Jöreskog, Karl G; Y Wallentin, Fan

    2016-01-01

    This book traces the theory and methodology of multivariate statistical analysis and shows how it can be conducted in practice using the LISREL computer program. It presents not only the typical uses of LISREL, such as confirmatory factor analysis and structural equation models, but also several other multivariate analysis topics, including regression (univariate, multivariate, censored, logistic, and probit), generalized linear models, multilevel analysis, and principal component analysis. It provides numerous examples from several disciplines and discusses and interprets the results, illustrated with sections of output from the LISREL program, in the context of the example. The book is intended for masters and PhD students and researchers in the social, behavioral, economic and many other sciences who require a basic understanding of multivariate statistical theory and methods for their analysis of multivariate data. It can also be used as a textbook on various topics of multivariate statistical analysis.

  5. Synthesis of linear regression coefficients by recovering the within-study covariance matrix from summary statistics.

    Science.gov (United States)

    Yoneoka, Daisuke; Henmi, Masayuki

    2017-06-01

    Recently, the number of regression models has dramatically increased in several academic fields. However, within the context of meta-analysis, synthesis methods for such models have not been developed in a commensurate trend. One of the difficulties hindering the development is the disparity in sets of covariates among literature models. If the sets of covariates differ across models, interpretation of coefficients will differ, thereby making it difficult to synthesize them. Moreover, previous synthesis methods for regression models, such as multivariate meta-analysis, often have problems because covariance matrix of coefficients (i.e. within-study correlations) or individual patient data are not necessarily available. This study, therefore, proposes a brief explanation regarding a method to synthesize linear regression models under different covariate sets by using a generalized least squares method involving bias correction terms. Especially, we also propose an approach to recover (at most) threecorrelations of covariates, which is required for the calculation of the bias term without individual patient data. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.

  6. Prediction of retention indices for frequently reported compounds of plant essential oils using multiple linear regression, partial least squares, and support vector machine.

    Science.gov (United States)

    Yan, Jun; Huang, Jian-Hua; He, Min; Lu, Hong-Bing; Yang, Rui; Kong, Bo; Xu, Qing-Song; Liang, Yi-Zeng

    2013-08-01

    Retention indices for frequently reported compounds of plant essential oils on three different stationary phases were investigated. Multivariate linear regression, partial least squares, and support vector machine combined with a new variable selection approach called random-frog recently proposed by our group, were employed to model quantitative structure-retention relationships. Internal and external validations were performed to ensure the stability and predictive ability. All the three methods could obtain an acceptable model, and the optimal results by support vector machine based on a small number of informative descriptors with the square of correlation coefficient for cross validation, values of 0.9726, 0.9759, and 0.9331 on the dimethylsilicone stationary phase, the dimethylsilicone phase with 5% phenyl groups, and the PEG stationary phase, respectively. The performances of two variable selection approaches, random-frog and genetic algorithm, are compared. The importance of the variables was found to be consistent when estimated from correlation coefficients in multivariate linear regression equations and selection probability in model spaces. © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  7. Stochastic development regression on non-linear manifolds

    DEFF Research Database (Denmark)

    Kühnel, Line; Sommer, Stefan Horst

    2017-01-01

    We introduce a regression model for data on non-linear manifolds. The model describes the relation between a set of manifold valued observations, such as shapes of anatomical objects, and Euclidean explanatory variables. The approach is based on stochastic development of Euclidean diffusion...... processes to the manifold. Defining the data distribution as the transition distribution of the mapped stochastic process, parameters of the model, the non-linear analogue of design matrix and intercept, are found via maximum likelihood. The model is intrinsically related to the geometry encoded...

  8. Comparison between Linear and Nonlinear Regression in a Laboratory Heat Transfer Experiment

    Science.gov (United States)

    Gonçalves, Carine Messias; Schwaab, Marcio; Pinto, José Carlos

    2013-01-01

    In order to interpret laboratory experimental data, undergraduate students are used to perform linear regression through linearized versions of nonlinear models. However, the use of linearized models can lead to statistically biased parameter estimates. Even so, it is not an easy task to introduce nonlinear regression and show for the students…

  9. Linear regression methods a ccording to objective functions

    OpenAIRE

    Yasemin Sisman; Sebahattin Bektas

    2012-01-01

    The aim of the study is to explain the parameter estimation methods and the regression analysis. The simple linear regressionmethods grouped according to the objective function are introduced. The numerical solution is achieved for the simple linear regressionmethods according to objective function of Least Squares and theLeast Absolute Value adjustment methods. The success of the appliedmethods is analyzed using their objective function values.

  10. Estimating monotonic rates from biological data using local linear regression.

    Science.gov (United States)

    Olito, Colin; White, Craig R; Marshall, Dustin J; Barneche, Diego R

    2017-03-01

    Accessing many fundamental questions in biology begins with empirical estimation of simple monotonic rates of underlying biological processes. Across a variety of disciplines, ranging from physiology to biogeochemistry, these rates are routinely estimated from non-linear and noisy time series data using linear regression and ad hoc manual truncation of non-linearities. Here, we introduce the R package LoLinR, a flexible toolkit to implement local linear regression techniques to objectively and reproducibly estimate monotonic biological rates from non-linear time series data, and demonstrate possible applications using metabolic rate data. LoLinR provides methods to easily and reliably estimate monotonic rates from time series data in a way that is statistically robust, facilitates reproducible research and is applicable to a wide variety of research disciplines in the biological sciences. © 2017. Published by The Company of Biologists Ltd.

  11. Linear models of coregionalization for multivariate lattice data: Order-dependent and order-free cMCARs.

    Science.gov (United States)

    MacNab, Ying C

    2016-08-01

    This paper concerns with multivariate conditional autoregressive models defined by linear combination of independent or correlated underlying spatial processes. Known as linear models of coregionalization, the method offers a systematic and unified approach for formulating multivariate extensions to a broad range of univariate conditional autoregressive models. The resulting multivariate spatial models represent classes of coregionalized multivariate conditional autoregressive models that enable flexible modelling of multivariate spatial interactions, yielding coregionalization models with symmetric or asymmetric cross-covariances of different spatial variation and smoothness. In the context of multivariate disease mapping, for example, they facilitate borrowing strength both over space and cross variables, allowing for more flexible multivariate spatial smoothing. Specifically, we present a broadened coregionalization framework to include order-dependent, order-free, and order-robust multivariate models; a new class of order-free coregionalized multivariate conditional autoregressives is introduced. We tackle computational challenges and present solutions that are integral for Bayesian analysis of these models. We also discuss two ways of computing deviance information criterion for comparison among competing hierarchical models with or without unidentifiable prior parameters. The models and related methodology are developed in the broad context of modelling multivariate data on spatial lattice and illustrated in the context of multivariate disease mapping. The coregionalization framework and related methods also present a general approach for building spatially structured cross-covariance functions for multivariate geostatistics. © The Author(s) 2016.

  12. Comparison between linear and non-parametric regression models for genome-enabled prediction in wheat.

    Science.gov (United States)

    Pérez-Rodríguez, Paulino; Gianola, Daniel; González-Camacho, Juan Manuel; Crossa, José; Manès, Yann; Dreisigacker, Susanne

    2012-12-01

    In genome-enabled prediction, parametric, semi-parametric, and non-parametric regression models have been used. This study assessed the predictive ability of linear and non-linear models using dense molecular markers. The linear models were linear on marker effects and included the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B. The non-linear models (this refers to non-linearity on markers) were reproducing kernel Hilbert space (RKHS) regression, Bayesian regularized neural networks (BRNN), and radial basis function neural networks (RBFNN). These statistical models were compared using 306 elite wheat lines from CIMMYT genotyped with 1717 diversity array technology (DArT) markers and two traits, days to heading (DTH) and grain yield (GY), measured in each of 12 environments. It was found that the three non-linear models had better overall prediction accuracy than the linear regression specification. Results showed a consistent superiority of RKHS and RBFNN over the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B models.

  13. Regression Analysis for Multivariate Dependent Count Data Using Convolved Gaussian Processes

    OpenAIRE

    Sofro, A'yunin; Shi, Jian Qing; Cao, Chunzheng

    2017-01-01

    Research on Poisson regression analysis for dependent data has been developed rapidly in the last decade. One of difficult problems in a multivariate case is how to construct a cross-correlation structure and at the meantime make sure that the covariance matrix is positive definite. To address the issue, we propose to use convolved Gaussian process (CGP) in this paper. The approach provides a semi-parametric model and offers a natural framework for modeling common mean structure and covarianc...

  14. Privacy-Preserving Distributed Linear Regression on High-Dimensional Data

    Directory of Open Access Journals (Sweden)

    Gascón Adrià

    2017-10-01

    Full Text Available We propose privacy-preserving protocols for computing linear regression models, in the setting where the training dataset is vertically distributed among several parties. Our main contribution is a hybrid multi-party computation protocol that combines Yao’s garbled circuits with tailored protocols for computing inner products. Like many machine learning tasks, building a linear regression model involves solving a system of linear equations. We conduct a comprehensive evaluation and comparison of different techniques for securely performing this task, including a new Conjugate Gradient Descent (CGD algorithm. This algorithm is suitable for secure computation because it uses an efficient fixed-point representation of real numbers while maintaining accuracy and convergence rates comparable to what can be obtained with a classical solution using floating point numbers. Our technique improves on Nikolaenko et al.’s method for privacy-preserving ridge regression (S&P 2013, and can be used as a building block in other analyses. We implement a complete system and demonstrate that our approach is highly scalable, solving data analysis problems with one million records and one hundred features in less than one hour of total running time.

  15. Prognostic factorsin inoperable adenocarcinoma of the lung: A multivariate regression analysis of 259 patiens

    DEFF Research Database (Denmark)

    Sørensen, Jens Benn; Badsberg, Jens Henrik; Olsen, Jens

    1989-01-01

    The prognostic factors for survival in advanced adenocarcinoma of the lung were investigated in a consecutive series of 259 patients treated with chemotherapy. Twenty-eight pretreatment variables were investigated by use of Cox's multivariate regression model, including histological subtypes and ...

  16. A comparison of random forest regression and multiple linear regression for prediction in neuroscience.

    Science.gov (United States)

    Smith, Paul F; Ganesh, Siva; Liu, Ping

    2013-10-30

    Regression is a common statistical tool for prediction in neuroscience. However, linear regression is by far the most common form of regression used, with regression trees receiving comparatively little attention. In this study, the results of conventional multiple linear regression (MLR) were compared with those of random forest regression (RFR), in the prediction of the concentrations of 9 neurochemicals in the vestibular nucleus complex and cerebellum that are part of the l-arginine biochemical pathway (agmatine, putrescine, spermidine, spermine, l-arginine, l-ornithine, l-citrulline, glutamate and γ-aminobutyric acid (GABA)). The R(2) values for the MLRs were higher than the proportion of variance explained values for the RFRs: 6/9 of them were ≥ 0.70 compared to 4/9 for RFRs. Even the variables that had the lowest R(2) values for the MLRs, e.g. ornithine (0.50) and glutamate (0.61), had much lower proportion of variance explained values for the RFRs (0.27 and 0.49, respectively). The RSE values for the MLRs were lower than those for the RFRs in all but two cases. In general, MLRs seemed to be superior to the RFRs in terms of predictive value and error. In the case of this data set, MLR appeared to be superior to RFR in terms of its explanatory value and error. This result suggests that MLR may have advantages over RFR for prediction in neuroscience with this kind of data set, but that RFR can still have good predictive value in some cases. Copyright © 2013 Elsevier B.V. All rights reserved.

  17. A Technique of Fuzzy C-Mean in Multiple Linear Regression Model toward Paddy Yield

    Science.gov (United States)

    Syazwan Wahab, Nur; Saifullah Rusiman, Mohd; Mohamad, Mahathir; Amira Azmi, Nur; Che Him, Norziha; Ghazali Kamardan, M.; Ali, Maselan

    2018-04-01

    In this paper, we propose a hybrid model which is a combination of multiple linear regression model and fuzzy c-means method. This research involved a relationship between 20 variates of the top soil that are analyzed prior to planting of paddy yields at standard fertilizer rates. Data used were from the multi-location trials for rice carried out by MARDI at major paddy granary in Peninsular Malaysia during the period from 2009 to 2012. Missing observations were estimated using mean estimation techniques. The data were analyzed using multiple linear regression model and a combination of multiple linear regression model and fuzzy c-means method. Analysis of normality and multicollinearity indicate that the data is normally scattered without multicollinearity among independent variables. Analysis of fuzzy c-means cluster the yield of paddy into two clusters before the multiple linear regression model can be used. The comparison between two method indicate that the hybrid of multiple linear regression model and fuzzy c-means method outperform the multiple linear regression model with lower value of mean square error.

  18. A simple linear regression method for quantitative trait loci linkage analysis with censored observations.

    Science.gov (United States)

    Anderson, Carl A; McRae, Allan F; Visscher, Peter M

    2006-07-01

    Standard quantitative trait loci (QTL) mapping techniques commonly assume that the trait is both fully observed and normally distributed. When considering survival or age-at-onset traits these assumptions are often incorrect. Methods have been developed to map QTL for survival traits; however, they are both computationally intensive and not available in standard genome analysis software packages. We propose a grouped linear regression method for the analysis of continuous survival data. Using simulation we compare this method to both the Cox and Weibull proportional hazards models and a standard linear regression method that ignores censoring. The grouped linear regression method is of equivalent power to both the Cox and Weibull proportional hazards methods and is significantly better than the standard linear regression method when censored observations are present. The method is also robust to the proportion of censored individuals and the underlying distribution of the trait. On the basis of linear regression methodology, the grouped linear regression model is computationally simple and fast and can be implemented readily in freely available statistical software.

  19. Analysis of γ spectra in airborne radioactivity measurements using multiple linear regressions

    International Nuclear Information System (INIS)

    Bao Min; Shi Quanlin; Zhang Jiamei

    2004-01-01

    This paper describes the net peak counts calculating of nuclide 137 Cs at 662 keV of γ spectra in airborne radioactivity measurements using multiple linear regressions. Mathematic model is founded by analyzing every factor that has contribution to Cs peak counts in spectra, and multiple linear regression function is established. Calculating process adopts stepwise regression, and the indistinctive factors are eliminated by F check. The regression results and its uncertainty are calculated using Least Square Estimation, then the Cs peak net counts and its uncertainty can be gotten. The analysis results for experimental spectrum are displayed. The influence of energy shift and energy resolution on the analyzing result is discussed. In comparison with the stripping spectra method, multiple linear regression method needn't stripping radios, and the calculating result has relation with the counts in Cs peak only, and the calculating uncertainty is reduced. (authors)

  20. Correcting for multivariate measurement error by regression calibration in meta-analyses of epidemiological studies.

    NARCIS (Netherlands)

    Kromhout, D.

    2009-01-01

    Within-person variability in measured values of multiple risk factors can bias their associations with disease. The multivariate regression calibration (RC) approach can correct for such measurement error and has been applied to studies in which true values or independent repeat measurements of the

  1. Generating linear regression model to predict motor functions by use of laser range finder during TUG.

    Science.gov (United States)

    Adachi, Daiki; Nishiguchi, Shu; Fukutani, Naoto; Hotta, Takayuki; Tashiro, Yuto; Morino, Saori; Shirooka, Hidehiko; Nozaki, Yuma; Hirata, Hinako; Yamaguchi, Moe; Yorozu, Ayanori; Takahashi, Masaki; Aoyama, Tomoki

    2017-05-01

    The purpose of this study was to investigate which spatial and temporal parameters of the Timed Up and Go (TUG) test are associated with motor function in elderly individuals. This study included 99 community-dwelling women aged 72.9 ± 6.3 years. Step length, step width, single support time, variability of the aforementioned parameters, gait velocity, cadence, reaction time from starting signal to first step, and minimum distance between the foot and a marker placed to 3 in front of the chair were measured using our analysis system. The 10-m walk test, five times sit-to-stand (FTSTS) test, and one-leg standing (OLS) test were used to assess motor function. Stepwise multivariate linear regression analysis was used to determine which TUG test parameters were associated with each motor function test. Finally, we calculated a predictive model for each motor function test using each regression coefficient. In stepwise linear regression analysis, step length and cadence were significantly associated with the 10-m walk test, FTSTS and OLS test. Reaction time was associated with the FTSTS test, and step width was associated with the OLS test. Each predictive model showed a strong correlation with the 10-m walk test and OLS test (P motor function test. Moreover, the TUG test time regarded as the lower extremity function and mobility has strong predictive ability in each motor function test. Copyright © 2017 The Japanese Orthopaedic Association. Published by Elsevier B.V. All rights reserved.

  2. Reduction of interferences in graphite furnace atomic absorption spectrometry by multiple linear regression modelling

    Science.gov (United States)

    Grotti, Marco; Abelmoschi, Maria Luisa; Soggia, Francesco; Tiberiade, Christian; Frache, Roberto

    2000-12-01

    The multivariate effects of Na, K, Mg and Ca as nitrates on the electrothermal atomisation of manganese, cadmium and iron were studied by multiple linear regression modelling. Since the models proved to efficiently predict the effects of the considered matrix elements in a wide range of concentrations, they were applied to correct the interferences occurring in the determination of trace elements in seawater after pre-concentration of the analytes. In order to obtain a statistically significant number of samples, a large volume of the certified seawater reference materials CASS-3 and NASS-3 was treated with Chelex-100 resin; then, the chelating resin was separated from the solution, divided into several sub-samples, each of them was eluted with nitric acid and analysed by electrothermal atomic absorption spectrometry (for trace element determinations) and inductively coupled plasma optical emission spectrometry (for matrix element determinations). To minimise any other systematic error besides that due to matrix effects, accuracy of the pre-concentration step and contamination levels of the procedure were checked by inductively coupled plasma mass spectrometric measurements. Analytical results obtained by applying the multiple linear regression models were compared with those obtained with other calibration methods, such as external calibration using acid-based standards, external calibration using matrix-matched standards and the analyte addition technique. Empirical models proved to efficiently reduce interferences occurring in the analysis of real samples, allowing an improvement of accuracy better than for other calibration methods.

  3. Generalised Partially Linear Regression with Misclassified Data and an Application to Labour Market Transitions

    DEFF Research Database (Denmark)

    Dlugosz, Stephan; Mammen, Enno; Wilke, Ralf

    We consider the semiparametric generalised linear regression model which has mainstream empirical models such as the (partially) linear mean regression, logistic and multinomial regression as special cases. As an extension to related literature we allow a misclassified covariate to be interacted...

  4. Neutrosophic Correlation and Simple Linear Regression

    Directory of Open Access Journals (Sweden)

    A. A. Salama

    2014-09-01

    Full Text Available Since the world is full of indeterminacy, the neutrosophics found their place into contemporary research. The fundamental concepts of neutrosophic set, introduced by Smarandache. Recently, Salama et al., introduced the concept of correlation coefficient of neutrosophic data. In this paper, we introduce and study the concepts of correlation and correlation coefficient of neutrosophic data in probability spaces and study some of their properties. Also, we introduce and study the neutrosophic simple linear regression model. Possible applications to data processing are touched upon.

  5. Comparison of Classical Linear Regression and Orthogonal Regression According to the Sum of Squares Perpendicular Distances

    OpenAIRE

    KELEŞ, Taliha; ALTUN, Murat

    2016-01-01

    Regression analysis is a statistical technique for investigating and modeling the relationship between variables. The purpose of this study was the trivial presentation of the equation for orthogonal regression (OR) and the comparison of classical linear regression (CLR) and OR techniques with respect to the sum of squared perpendicular distances. For that purpose, the analyses were shown by an example. It was found that the sum of squared perpendicular distances of OR is smaller. Thus, it wa...

  6. Single image super-resolution using locally adaptive multiple linear regression.

    Science.gov (United States)

    Yu, Soohwan; Kang, Wonseok; Ko, Seungyong; Paik, Joonki

    2015-12-01

    This paper presents a regularized superresolution (SR) reconstruction method using locally adaptive multiple linear regression to overcome the limitation of spatial resolution of digital images. In order to make the SR problem better-posed, the proposed method incorporates the locally adaptive multiple linear regression into the regularization process as a local prior. The local regularization prior assumes that the target high-resolution (HR) pixel is generated by a linear combination of similar pixels in differently scaled patches and optimum weight parameters. In addition, we adapt a modified version of the nonlocal means filter as a smoothness prior to utilize the patch redundancy. Experimental results show that the proposed algorithm better restores HR images than existing state-of-the-art methods in the sense of the most objective measures in the literature.

  7. Parameter estimation of multivariate multiple regression model using bayesian with non-informative Jeffreys’ prior distribution

    Science.gov (United States)

    Saputro, D. R. S.; Amalia, F.; Widyaningsih, P.; Affan, R. C.

    2018-05-01

    Bayesian method is a method that can be used to estimate the parameters of multivariate multiple regression model. Bayesian method has two distributions, there are prior and posterior distributions. Posterior distribution is influenced by the selection of prior distribution. Jeffreys’ prior distribution is a kind of Non-informative prior distribution. This prior is used when the information about parameter not available. Non-informative Jeffreys’ prior distribution is combined with the sample information resulting the posterior distribution. Posterior distribution is used to estimate the parameter. The purposes of this research is to estimate the parameters of multivariate regression model using Bayesian method with Non-informative Jeffreys’ prior distribution. Based on the results and discussion, parameter estimation of β and Σ which were obtained from expected value of random variable of marginal posterior distribution function. The marginal posterior distributions for β and Σ are multivariate normal and inverse Wishart. However, in calculation of the expected value involving integral of a function which difficult to determine the value. Therefore, approach is needed by generating of random samples according to the posterior distribution characteristics of each parameter using Markov chain Monte Carlo (MCMC) Gibbs sampling algorithm.

  8. Teaching the Concept of Breakdown Point in Simple Linear Regression.

    Science.gov (United States)

    Chan, Wai-Sum

    2001-01-01

    Most introductory textbooks on simple linear regression analysis mention the fact that extreme data points have a great influence on ordinary least-squares regression estimation; however, not many textbooks provide a rigorous mathematical explanation of this phenomenon. Suggests a way to fill this gap by teaching students the concept of breakdown…

  9. Establishment of regression dependences. Linear and nonlinear dependences

    International Nuclear Information System (INIS)

    Onishchenko, A.M.

    1994-01-01

    The main problems of determination of linear and 19 types of nonlinear regression dependences are completely discussed. It is taken into consideration that total dispersions are the sum of measurement dispersions and parameter variation dispersions themselves. Approaches to all dispersions determination are described. It is shown that the least square fit gives inconsistent estimation for industrial objects and processes. The correction methods by taking into account comparable measurement errors for both variable give an opportunity to obtain consistent estimation for the regression equation parameters. The condition of the correction technique application expediency is given. The technique for determination of nonlinear regression dependences taking into account the dependence form and comparable errors of both variables is described. 6 refs., 1 tab

  10. application of multilinear regression analysis in modeling of soil

    African Journals Online (AJOL)

    Windows User

    Accordingly [1, 3] in their work, they applied linear regression ... (MLRA) is a statistical technique that uses several explanatory ... order to check this, they adopted bivariate correlation analysis .... groups, namely A-1 through A-7, based on their relative expected ..... Multivariate Regression in Gorgan Province North of Iran” ...

  11. The PIT-trap-A "model-free" bootstrap procedure for inference about regression models with discrete, multivariate responses.

    Science.gov (United States)

    Warton, David I; Thibaut, Loïc; Wang, Yi Alice

    2017-01-01

    Bootstrap methods are widely used in statistics, and bootstrapping of residuals can be especially useful in the regression context. However, difficulties are encountered extending residual resampling to regression settings where residuals are not identically distributed (thus not amenable to bootstrapping)-common examples including logistic or Poisson regression and generalizations to handle clustered or multivariate data, such as generalised estimating equations. We propose a bootstrap method based on probability integral transform (PIT-) residuals, which we call the PIT-trap, which assumes data come from some marginal distribution F of known parametric form. This method can be understood as a type of "model-free bootstrap", adapted to the problem of discrete and highly multivariate data. PIT-residuals have the key property that they are (asymptotically) pivotal. The PIT-trap thus inherits the key property, not afforded by any other residual resampling approach, that the marginal distribution of data can be preserved under PIT-trapping. This in turn enables the derivation of some standard bootstrap properties, including second-order correctness of pivotal PIT-trap test statistics. In multivariate data, bootstrapping rows of PIT-residuals affords the property that it preserves correlation in data without the need for it to be modelled, a key point of difference as compared to a parametric bootstrap. The proposed method is illustrated on an example involving multivariate abundance data in ecology, and demonstrated via simulation to have improved properties as compared to competing resampling methods.

  12. Comparing treatment effects after adjustment with multivariable Cox proportional hazards regression and propensity score methods

    NARCIS (Netherlands)

    Martens, Edwin P; de Boer, Anthonius; Pestman, Wiebe R; Belitser, Svetlana V; Stricker, Bruno H Ch; Klungel, Olaf H

    PURPOSE: To compare adjusted effects of drug treatment for hypertension on the risk of stroke from propensity score (PS) methods with a multivariable Cox proportional hazards (Cox PH) regression in an observational study with censored data. METHODS: From two prospective population-based cohort

  13. The number of subjects per variable required in linear regression analyses.

    Science.gov (United States)

    Austin, Peter C; Steyerberg, Ewout W

    2015-06-01

    To determine the number of independent variables that can be included in a linear regression model. We used a series of Monte Carlo simulations to examine the impact of the number of subjects per variable (SPV) on the accuracy of estimated regression coefficients and standard errors, on the empirical coverage of estimated confidence intervals, and on the accuracy of the estimated R(2) of the fitted model. A minimum of approximately two SPV tended to result in estimation of regression coefficients with relative bias of less than 10%. Furthermore, with this minimum number of SPV, the standard errors of the regression coefficients were accurately estimated and estimated confidence intervals had approximately the advertised coverage rates. A much higher number of SPV were necessary to minimize bias in estimating the model R(2), although adjusted R(2) estimates behaved well. The bias in estimating the model R(2) statistic was inversely proportional to the magnitude of the proportion of variation explained by the population regression model. Linear regression models require only two SPV for adequate estimation of regression coefficients, standard errors, and confidence intervals. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.

  14. On the Optimality of Multivariate S-Estimators

    NARCIS (Netherlands)

    Croux, C.; Dehon, C.; Yadine, A.

    2010-01-01

    In this paper we maximize the efficiency of a multivariate S-estimator under a constraint on the breakdown point. In the linear regression model, it is known that the highest possible efficiency of a maximum breakdown S-estimator is bounded above by 33% for Gaussian errors. We prove the surprising

  15. How Robust Is Linear Regression with Dummy Variables?

    Science.gov (United States)

    Blankmeyer, Eric

    2006-01-01

    Researchers in education and the social sciences make extensive use of linear regression models in which the dependent variable is continuous-valued while the explanatory variables are a combination of continuous-valued regressors and dummy variables. The dummies partition the sample into groups, some of which may contain only a few observations.…

  16. Common pitfalls in statistical analysis: Linear regression analysis

    Directory of Open Access Journals (Sweden)

    Rakesh Aggarwal

    2017-01-01

    Full Text Available In a previous article in this series, we explained correlation analysis which describes the strength of relationship between two continuous variables. In this article, we deal with linear regression analysis which predicts the value of one continuous variable from another. We also discuss the assumptions and pitfalls associated with this analysis.

  17. Tightness of M-estimators for multiple linear regression in time series

    DEFF Research Database (Denmark)

    Johansen, Søren; Nielsen, Bent

    We show tightness of a general M-estimator for multiple linear regression in time series. The positive criterion function for the M-estimator is assumed lower semi-continuous and sufficiently large for large argument: Particular cases are the Huber-skip and quantile regression. Tightness requires...

  18. Non-proportional odds multivariate logistic regression of ordinal family data.

    Science.gov (United States)

    Zaloumis, Sophie G; Scurrah, Katrina J; Harrap, Stephen B; Ellis, Justine A; Gurrin, Lyle C

    2015-03-01

    Methods to examine whether genetic and/or environmental sources can account for the residual variation in ordinal family data usually assume proportional odds. However, standard software to fit the non-proportional odds model to ordinal family data is limited because the correlation structure of family data is more complex than for other types of clustered data. To perform these analyses we propose the non-proportional odds multivariate logistic regression model and take a simulation-based approach to model fitting using Markov chain Monte Carlo methods, such as partially collapsed Gibbs sampling and the Metropolis algorithm. We applied the proposed methodology to male pattern baldness data from the Victorian Family Heart Study. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  19. BRGLM, Interactive Linear Regression Analysis by Least Square Fit

    International Nuclear Information System (INIS)

    Ringland, J.T.; Bohrer, R.E.; Sherman, M.E.

    1985-01-01

    1 - Description of program or function: BRGLM is an interactive program written to fit general linear regression models by least squares and to provide a variety of statistical diagnostic information about the fit. Stepwise and all-subsets regression can be carried out also. There are facilities for interactive data management (e.g. setting missing value flags, data transformations) and tools for constructing design matrices for the more commonly-used models such as factorials, cubic Splines, and auto-regressions. 2 - Method of solution: The least squares computations are based on the orthogonal (QR) decomposition of the design matrix obtained using the modified Gram-Schmidt algorithm. 3 - Restrictions on the complexity of the problem: The current release of BRGLM allows maxima of 1000 observations, 99 variables, and 3000 words of main memory workspace. For a problem with N observations and P variables, the number of words of main memory storage required is MAX(N*(P+6), N*P+P*P+3*N, and 3*P*P+6*N). Any linear model may be fit although the in-memory workspace will have to be increased for larger problems

  20. Prediction of Mind-Wandering with Electroencephalogram and Non-linear Regression Modeling.

    Science.gov (United States)

    Kawashima, Issaku; Kumano, Hiroaki

    2017-01-01

    Mind-wandering (MW), task-unrelated thought, has been examined by researchers in an increasing number of articles using models to predict whether subjects are in MW, using numerous physiological variables. However, these models are not applicable in general situations. Moreover, they output only binary classification. The current study suggests that the combination of electroencephalogram (EEG) variables and non-linear regression modeling can be a good indicator of MW intensity. We recorded EEGs of 50 subjects during the performance of a Sustained Attention to Response Task, including a thought sampling probe that inquired the focus of attention. We calculated the power and coherence value and prepared 35 patterns of variable combinations and applied Support Vector machine Regression (SVR) to them. Finally, we chose four SVR models: two of them non-linear models and the others linear models; two of the four models are composed of a limited number of electrodes to satisfy model usefulness. Examination using the held-out data indicated that all models had robust predictive precision and provided significantly better estimations than a linear regression model using single electrode EEG variables. Furthermore, in limited electrode condition, non-linear SVR model showed significantly better precision than linear SVR model. The method proposed in this study helps investigations into MW in various little-examined situations. Further, by measuring MW with a high temporal resolution EEG, unclear aspects of MW, such as time series variation, are expected to be revealed. Furthermore, our suggestion that a few electrodes can also predict MW contributes to the development of neuro-feedback studies.

  1. Prediction of Mind-Wandering with Electroencephalogram and Non-linear Regression Modeling

    Directory of Open Access Journals (Sweden)

    Issaku Kawashima

    2017-07-01

    Full Text Available Mind-wandering (MW, task-unrelated thought, has been examined by researchers in an increasing number of articles using models to predict whether subjects are in MW, using numerous physiological variables. However, these models are not applicable in general situations. Moreover, they output only binary classification. The current study suggests that the combination of electroencephalogram (EEG variables and non-linear regression modeling can be a good indicator of MW intensity. We recorded EEGs of 50 subjects during the performance of a Sustained Attention to Response Task, including a thought sampling probe that inquired the focus of attention. We calculated the power and coherence value and prepared 35 patterns of variable combinations and applied Support Vector machine Regression (SVR to them. Finally, we chose four SVR models: two of them non-linear models and the others linear models; two of the four models are composed of a limited number of electrodes to satisfy model usefulness. Examination using the held-out data indicated that all models had robust predictive precision and provided significantly better estimations than a linear regression model using single electrode EEG variables. Furthermore, in limited electrode condition, non-linear SVR model showed significantly better precision than linear SVR model. The method proposed in this study helps investigations into MW in various little-examined situations. Further, by measuring MW with a high temporal resolution EEG, unclear aspects of MW, such as time series variation, are expected to be revealed. Furthermore, our suggestion that a few electrodes can also predict MW contributes to the development of neuro-feedback studies.

  2. Stochastic development regression on non-linear manifolds

    DEFF Research Database (Denmark)

    Kühnel, Line; Sommer, Stefan Horst

    2017-01-01

    We introduce a regression model for data on non-linear manifolds. The model describes the relation between a set of manifold valued observations, such as shapes of anatomical objects, and Euclidean explanatory variables. The approach is based on stochastic development of Euclidean diffusion...... processes to the manifold. Defining the data distribution as the transition distribution of the mapped stochastic process, parameters of the model, the non-linear analogue of design matrix and intercept, are found via maximum likelihood. The model is intrinsically related to the geometry encoded...... in the connection of the manifold. We propose an estimation procedure which applies the Laplace approximation of the likelihood function. A simulation study of the performance of the model is performed and the model is applied to a real dataset of Corpus Callosum shapes....

  3. Comparing lagged linear correlation, lagged regression, Granger causality, and vector autoregression for uncovering associations in EHR data.

    Science.gov (United States)

    Levine, Matthew E; Albers, David J; Hripcsak, George

    2016-01-01

    Time series analysis methods have been shown to reveal clinical and biological associations in data collected in the electronic health record. We wish to develop reliable high-throughput methods for identifying adverse drug effects that are easy to implement and produce readily interpretable results. To move toward this goal, we used univariate and multivariate lagged regression models to investigate associations between twenty pairs of drug orders and laboratory measurements. Multivariate lagged regression models exhibited higher sensitivity and specificity than univariate lagged regression in the 20 examples, and incorporating autoregressive terms for labs and drugs produced more robust signals in cases of known associations among the 20 example pairings. Moreover, including inpatient admission terms in the model attenuated the signals for some cases of unlikely associations, demonstrating how multivariate lagged regression models' explicit handling of context-based variables can provide a simple way to probe for health-care processes that confound analyses of EHR data.

  4. EPMLR: sequence-based linear B-cell epitope prediction method using multiple linear regression.

    Science.gov (United States)

    Lian, Yao; Ge, Meng; Pan, Xian-Ming

    2014-12-19

    B-cell epitopes have been studied extensively due to their immunological applications, such as peptide-based vaccine development, antibody production, and disease diagnosis and therapy. Despite several decades of research, the accurate prediction of linear B-cell epitopes has remained a challenging task. In this work, based on the antigen's primary sequence information, a novel linear B-cell epitope prediction model was developed using the multiple linear regression (MLR). A 10-fold cross-validation test on a large non-redundant dataset was performed to evaluate the performance of our model. To alleviate the problem caused by the noise of negative dataset, 300 experiments utilizing 300 sub-datasets were performed. We achieved overall sensitivity of 81.8%, precision of 64.1% and area under the receiver operating characteristic curve (AUC) of 0.728. We have presented a reliable method for the identification of linear B cell epitope using antigen's primary sequence information. Moreover, a web server EPMLR has been developed for linear B-cell epitope prediction: http://www.bioinfo.tsinghua.edu.cn/epitope/EPMLR/ .

  5. Analysis of Multivariate Experimental Data Using A Simplified Regression Model Search Algorithm

    Science.gov (United States)

    Ulbrich, Norbert Manfred

    2013-01-01

    A new regression model search algorithm was developed in 2011 that may be used to analyze both general multivariate experimental data sets and wind tunnel strain-gage balance calibration data. The new algorithm is a simplified version of a more complex search algorithm that was originally developed at the NASA Ames Balance Calibration Laboratory. The new algorithm has the advantage that it needs only about one tenth of the original algorithm's CPU time for the completion of a search. In addition, extensive testing showed that the prediction accuracy of math models obtained from the simplified algorithm is similar to the prediction accuracy of math models obtained from the original algorithm. The simplified algorithm, however, cannot guarantee that search constraints related to a set of statistical quality requirements are always satisfied in the optimized regression models. Therefore, the simplified search algorithm is not intended to replace the original search algorithm. Instead, it may be used to generate an alternate optimized regression model of experimental data whenever the application of the original search algorithm either fails or requires too much CPU time. Data from a machine calibration of NASA's MK40 force balance is used to illustrate the application of the new regression model search algorithm.

  6. Multivariate regression analysis for determining short-term values of radon and its decay products from filter measurements

    International Nuclear Information System (INIS)

    Kraut, W.; Schwarz, W.; Wilhelm, A.

    1994-01-01

    A multivariate regression analysis is applied to decay measurements of α-resp. β-filter activcity. Activity concentrations for Po-218, Pb-214 and Bi-214, resp. for the Rn-222 equilibrium equivalent concentration are obtained explicitly. The regression analysis takes into account properly the variances of the measured count rates and their influence on the resulting activity concentrations. (orig.) [de

  7. The number of subjects per variable required in linear regression analyses

    NARCIS (Netherlands)

    P.C. Austin (Peter); E.W. Steyerberg (Ewout)

    2015-01-01

    textabstractObjectives To determine the number of independent variables that can be included in a linear regression model. Study Design and Setting We used a series of Monte Carlo simulations to examine the impact of the number of subjects per variable (SPV) on the accuracy of estimated regression

  8. Treating experimental data of inverse kinetic method by unitary linear regression analysis

    International Nuclear Information System (INIS)

    Zhao Yusen; Chen Xiaoliang

    2009-01-01

    The theory of treating experimental data of inverse kinetic method by unitary linear regression analysis was described. Not only the reactivity, but also the effective neutron source intensity could be calculated by this method. Computer code was compiled base on the inverse kinetic method and unitary linear regression analysis. The data of zero power facility BFS-1 in Russia were processed and the results were compared. The results show that the reactivity and the effective neutron source intensity can be obtained correctly by treating experimental data of inverse kinetic method using unitary linear regression analysis and the precision of reactivity measurement is improved. The central element efficiency can be calculated by using the reactivity. The result also shows that the effect to reactivity measurement caused by external neutron source should be considered when the reactor power is low and the intensity of external neutron source is strong. (authors)

  9. Experimental variability and data pre-processing as factors affecting the discrimination power of some chemometric approaches (PCA, CA and a new algorithm based on linear regression) applied to (+/-)ESI/MS and RPLC/UV data: Application on green tea extracts.

    Science.gov (United States)

    Iorgulescu, E; Voicu, V A; Sârbu, C; Tache, F; Albu, F; Medvedovici, A

    2016-08-01

    The influence of the experimental variability (instrumental repeatability, instrumental intermediate precision and sample preparation variability) and data pre-processing (normalization, peak alignment, background subtraction) on the discrimination power of multivariate data analysis methods (Principal Component Analysis -PCA- and Cluster Analysis -CA-) as well as a new algorithm based on linear regression was studied. Data used in the study were obtained through positive or negative ion monitoring electrospray mass spectrometry (+/-ESI/MS) and reversed phase liquid chromatography/UV spectrometric detection (RPLC/UV) applied to green tea extracts. Extractions in ethanol and heated water infusion were used as sample preparation procedures. The multivariate methods were directly applied to mass spectra and chromatograms, involving strictly a holistic comparison of shapes, without assignment of any structural identity to compounds. An alternative data interpretation based on linear regression analysis mutually applied to data series is also discussed. Slopes, intercepts and correlation coefficients produced by the linear regression analysis applied on pairs of very large experimental data series successfully retain information resulting from high frequency instrumental acquisition rates, obviously better defining the profiles being compared. Consequently, each type of sample or comparison between samples produces in the Cartesian space an ellipsoidal volume defined by the normal variation intervals of the slope, intercept and correlation coefficient. Distances between volumes graphically illustrates (dis)similarities between compared data. The instrumental intermediate precision had the major effect on the discrimination power of the multivariate data analysis methods. Mass spectra produced through ionization from liquid state in atmospheric pressure conditions of bulk complex mixtures resulting from extracted materials of natural origins provided an excellent data

  10. An improved multiple linear regression and data analysis computer program package

    Science.gov (United States)

    Sidik, S. M.

    1972-01-01

    NEWRAP, an improved version of a previous multiple linear regression program called RAPIER, CREDUC, and CRSPLT, allows for a complete regression analysis including cross plots of the independent and dependent variables, correlation coefficients, regression coefficients, analysis of variance tables, t-statistics and their probability levels, rejection of independent variables, plots of residuals against the independent and dependent variables, and a canonical reduction of quadratic response functions useful in optimum seeking experimentation. A major improvement over RAPIER is that all regression calculations are done in double precision arithmetic.

  11. PM10 modeling in the Oviedo urban area (Northern Spain) by using multivariate adaptive regression splines

    Science.gov (United States)

    Nieto, Paulino José García; Antón, Juan Carlos Álvarez; Vilán, José Antonio Vilán; García-Gonzalo, Esperanza

    2014-10-01

    The aim of this research work is to build a regression model of the particulate matter up to 10 micrometers in size (PM10) by using the multivariate adaptive regression splines (MARS) technique in the Oviedo urban area (Northern Spain) at local scale. This research work explores the use of a nonparametric regression algorithm known as multivariate adaptive regression splines (MARS) which has the ability to approximate the relationship between the inputs and outputs, and express the relationship mathematically. In this sense, hazardous air pollutants or toxic air contaminants refer to any substance that may cause or contribute to an increase in mortality or serious illness, or that may pose a present or potential hazard to human health. To accomplish the objective of this study, the experimental dataset of nitrogen oxides (NOx), carbon monoxide (CO), sulfur dioxide (SO2), ozone (O3) and dust (PM10) were collected over 3 years (2006-2008) and they are used to create a highly nonlinear model of the PM10 in the Oviedo urban nucleus (Northern Spain) based on the MARS technique. One main objective of this model is to obtain a preliminary estimate of the dependence between PM10 pollutant in the Oviedo urban area at local scale. A second aim is to determine the factors with the greatest bearing on air quality with a view to proposing health and lifestyle improvements. The United States National Ambient Air Quality Standards (NAAQS) establishes the limit values of the main pollutants in the atmosphere in order to ensure the health of healthy people. Firstly, this MARS regression model captures the main perception of statistical learning theory in order to obtain a good prediction of the dependence among the main pollutants in the Oviedo urban area. Secondly, the main advantages of MARS are its capacity to produce simple, easy-to-interpret models, its ability to estimate the contributions of the input variables, and its computational efficiency. Finally, on the basis of

  12. On macroeconomic values investigation using fuzzy linear regression analysis

    Directory of Open Access Journals (Sweden)

    Richard Pospíšil

    2017-06-01

    Full Text Available The theoretical background for abstract formalization of the vague phenomenon of complex systems is the fuzzy set theory. In the paper, vague data is defined as specialized fuzzy sets - fuzzy numbers and there is described a fuzzy linear regression model as a fuzzy function with fuzzy numbers as vague parameters. To identify the fuzzy coefficients of the model, the genetic algorithm is used. The linear approximation of the vague function together with its possibility area is analytically and graphically expressed. A suitable application is performed in the tasks of the time series fuzzy regression analysis. The time-trend and seasonal cycles including their possibility areas are calculated and expressed. The examples are presented from the economy field, namely the time-development of unemployment, agricultural production and construction respectively between 2009 and 2011 in the Czech Republic. The results are shown in the form of the fuzzy regression models of variables of time series. For the period 2009-2011, the analysis assumptions about seasonal behaviour of variables and the relationship between them were confirmed; in 2010, the system behaved fuzzier and the relationships between the variables were vaguer, that has a lot of causes, from the different elasticity of demand, through state interventions to globalization and transnational impacts.

  13. Fuzzy Linear Regression for the Time Series Data which is Fuzzified with SMRGT Method

    Directory of Open Access Journals (Sweden)

    Seçil YALAZ

    2016-10-01

    Full Text Available Our work on regression and classification provides a new contribution to the analysis of time series used in many areas for years. Owing to the fact that convergence could not obtained with the methods used in autocorrelation fixing process faced with time series regression application, success is not met or fall into obligation of changing the models’ degree. Changing the models’ degree may not be desirable in every situation. In our study, recommended for these situations, time series data was fuzzified by using the simple membership function and fuzzy rule generation technique (SMRGT and to estimate future an equation has created by applying fuzzy least square regression (FLSR method which is a simple linear regression method to this data. Although SMRGT has success in determining the flow discharge in open channels and can be used confidently for flow discharge modeling in open canals, as well as in pipe flow with some modifications, there is no clue about that this technique is successful in fuzzy linear regression modeling. Therefore, in order to address the luck of such a modeling, a new hybrid model has been described within this study. In conclusion, to demonstrate our methods’ efficiency, classical linear regression for time series data and linear regression for fuzzy time series data were applied to two different data sets, and these two approaches performances were compared by using different measures.

  14. Multivariate analysis of nystatin and metronidazole in a semi-solid matrix by means of diffuse reflectance NIR spectroscopy and PLS regression.

    Science.gov (United States)

    Baratieri, Sabrina C; Barbosa, Juliana M; Freitas, Matheus P; Martins, José A

    2006-01-23

    A multivariate method of analysis of nystatin and metronidazole in a semi-solid matrix, based on diffuse reflectance NIR measurements and partial least squares regression, is reported. The product, a vaginal cream used in the antifungal and antibacterial treatment, is usually, quantitatively analyzed through microbiological tests (nystatin) and HPLC technique (metronidazole), according to pharmacopeial procedures. However, near infrared spectroscopy has demonstrated to be a valuable tool for content determination, given the rapidity and scope of the method. In the present study, it was successfully applied in the prediction of nystatin (even in low concentrations, ca. 0.3-0.4%, w/w, which is around 100,000 IU/5g) and metronidazole contents, as demonstrated by some figures of merit, namely linearity, precision (mean and repeatability) and accuracy.

  15. Alzheimer's Disease Detection by Pseudo Zernike Moment and Linear Regression Classification.

    Science.gov (United States)

    Wang, Shui-Hua; Du, Sidan; Zhang, Yin; Phillips, Preetha; Wu, Le-Nan; Chen, Xian-Qing; Zhang, Yu-Dong

    2017-01-01

    This study presents an improved method based on "Gorji et al. Neuroscience. 2015" by introducing a relatively new classifier-linear regression classification. Our method selects one axial slice from 3D brain image, and employed pseudo Zernike moment with maximum order of 15 to extract 256 features from each image. Finally, linear regression classification was harnessed as the classifier. The proposed approach obtains an accuracy of 97.51%, a sensitivity of 96.71%, and a specificity of 97.73%. Our method performs better than Gorji's approach and five other state-of-the-art approaches. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  16. The simultaneous use of several pseudo-random binary sequences in the identification of linear multivariable dynamic systems

    International Nuclear Information System (INIS)

    Cummins, J.D.

    1965-02-01

    With several white noise sources the various transmission paths of a linear multivariable system may be determined simultaneously. This memorandum considers the restrictions on pseudo-random two state sequences to effect simultaneous identification of several transmission paths and the consequential rejection of cross-coupled signals in linear multivariable systems. The conditions for simultaneous identification are established by an example, which shows that the integration time required is large i.e. tends to infinity, as it does when white noise sources are used. (author)

  17. Transmission of linear regression patterns between time series: from relationship in time series to complex networks.

    Science.gov (United States)

    Gao, Xiangyun; An, Haizhong; Fang, Wei; Huang, Xuan; Li, Huajiao; Zhong, Weiqiong; Ding, Yinghui

    2014-07-01

    The linear regression parameters between two time series can be different under different lengths of observation period. If we study the whole period by the sliding window of a short period, the change of the linear regression parameters is a process of dynamic transmission over time. We tackle fundamental research that presents a simple and efficient computational scheme: a linear regression patterns transmission algorithm, which transforms linear regression patterns into directed and weighted networks. The linear regression patterns (nodes) are defined by the combination of intervals of the linear regression parameters and the results of the significance testing under different sizes of the sliding window. The transmissions between adjacent patterns are defined as edges, and the weights of the edges are the frequency of the transmissions. The major patterns, the distance, and the medium in the process of the transmission can be captured. The statistical results of weighted out-degree and betweenness centrality are mapped on timelines, which shows the features of the distribution of the results. Many measurements in different areas that involve two related time series variables could take advantage of this algorithm to characterize the dynamic relationships between the time series from a new perspective.

  18. Modelling subject-specific childhood growth using linear mixed-effect models with cubic regression splines.

    Science.gov (United States)

    Grajeda, Laura M; Ivanescu, Andrada; Saito, Mayuko; Crainiceanu, Ciprian; Jaganath, Devan; Gilman, Robert H; Crabtree, Jean E; Kelleher, Dermott; Cabrera, Lilia; Cama, Vitaliano; Checkley, William

    2016-01-01

    Childhood growth is a cornerstone of pediatric research. Statistical models need to consider individual trajectories to adequately describe growth outcomes. Specifically, well-defined longitudinal models are essential to characterize both population and subject-specific growth. Linear mixed-effect models with cubic regression splines can account for the nonlinearity of growth curves and provide reasonable estimators of population and subject-specific growth, velocity and acceleration. We provide a stepwise approach that builds from simple to complex models, and account for the intrinsic complexity of the data. We start with standard cubic splines regression models and build up to a model that includes subject-specific random intercepts and slopes and residual autocorrelation. We then compared cubic regression splines vis-à-vis linear piecewise splines, and with varying number of knots and positions. Statistical code is provided to ensure reproducibility and improve dissemination of methods. Models are applied to longitudinal height measurements in a cohort of 215 Peruvian children followed from birth until their fourth year of life. Unexplained variability, as measured by the variance of the regression model, was reduced from 7.34 when using ordinary least squares to 0.81 (p linear mixed-effect models with random slopes and a first order continuous autoregressive error term. There was substantial heterogeneity in both the intercept (p modeled with a first order continuous autoregressive error term as evidenced by the variogram of the residuals and by a lack of association among residuals. The final model provides a parametric linear regression equation for both estimation and prediction of population- and individual-level growth in height. We show that cubic regression splines are superior to linear regression splines for the case of a small number of knots in both estimation and prediction with the full linear mixed effect model (AIC 19,352 vs. 19

  19. Analysis of dental caries using generalized linear and count regression models

    Directory of Open Access Journals (Sweden)

    Javali M. Phil

    2013-11-01

    Full Text Available Generalized linear models (GLM are generalization of linear regression models, which allow fitting regression models to response data in all the sciences especially medical and dental sciences that follow a general exponential family. These are flexible and widely used class of such models that can accommodate response variables. Count data are frequently characterized by overdispersion and excess zeros. Zero-inflated count models provide a parsimonious yet powerful way to model this type of situation. Such models assume that the data are a mixture of two separate data generation processes: one generates only zeros, and the other is either a Poisson or a negative binomial data-generating process. Zero inflated count regression models such as the zero-inflated Poisson (ZIP, zero-inflated negative binomial (ZINB regression models have been used to handle dental caries count data with many zeros. We present an evaluation framework to the suitability of applying the GLM, Poisson, NB, ZIP and ZINB to dental caries data set where the count data may exhibit evidence of many zeros and over-dispersion. Estimation of the model parameters using the method of maximum likelihood is provided. Based on the Vuong test statistic and the goodness of fit measure for dental caries data, the NB and ZINB regression models perform better than other count regression models.

  20. Regression of non-linear coupling of noise in LIGO detectors

    Science.gov (United States)

    Da Silva Costa, C. F.; Billman, C.; Effler, A.; Klimenko, S.; Cheng, H.-P.

    2018-03-01

    In 2015, after their upgrade, the advanced Laser Interferometer Gravitational-Wave Observatory (LIGO) detectors started acquiring data. The effort to improve their sensitivity has never stopped since then. The goal to achieve design sensitivity is challenging. Environmental and instrumental noise couple to the detector output with different, linear and non-linear, coupling mechanisms. The noise regression method we use is based on the Wiener–Kolmogorov filter, which uses witness channels to make noise predictions. We present here how this method helped to determine complex non-linear noise couplings in the output mode cleaner and in the mirror suspension system of the LIGO detector.

  1. Fitting program for linear regressions according to Mahon (1996)

    Energy Technology Data Exchange (ETDEWEB)

    2018-01-09

    This program takes the users' Input data and fits a linear regression to it using the prescription presented by Mahon (1996). Compared to the commonly used York fit, this method has the correct prescription for measurement error propagation. This software should facilitate the proper fitting of measurements with a simple Interface.

  2. Correcting for multivariate measurement error by regression calibration in meta-analyses of epidemiological studies

    DEFF Research Database (Denmark)

    Tybjærg-Hansen, Anne

    2009-01-01

    Within-person variability in measured values of multiple risk factors can bias their associations with disease. The multivariate regression calibration (RC) approach can correct for such measurement error and has been applied to studies in which true values or independent repeat measurements...... of the risk factors are observed on a subsample. We extend the multivariate RC techniques to a meta-analysis framework where multiple studies provide independent repeat measurements and information on disease outcome. We consider the cases where some or all studies have repeat measurements, and compare study......-specific, averaged and empirical Bayes estimates of RC parameters. Additionally, we allow for binary covariates (e.g. smoking status) and for uncertainty and time trends in the measurement error corrections. Our methods are illustrated using a subset of individual participant data from prospective long-term studies...

  3. Comparison of l₁-Norm SVR and Sparse Coding Algorithms for Linear Regression.

    Science.gov (United States)

    Zhang, Qingtian; Hu, Xiaolin; Zhang, Bo

    2015-08-01

    Support vector regression (SVR) is a popular function estimation technique based on Vapnik's concept of support vector machine. Among many variants, the l1-norm SVR is known to be good at selecting useful features when the features are redundant. Sparse coding (SC) is a technique widely used in many areas and a number of efficient algorithms are available. Both l1-norm SVR and SC can be used for linear regression. In this brief, the close connection between the l1-norm SVR and SC is revealed and some typical algorithms are compared for linear regression. The results show that the SC algorithms outperform the Newton linear programming algorithm, an efficient l1-norm SVR algorithm, in efficiency. The algorithms are then used to design the radial basis function (RBF) neural networks. Experiments on some benchmark data sets demonstrate the high efficiency of the SC algorithms. In particular, one of the SC algorithms, the orthogonal matching pursuit is two orders of magnitude faster than a well-known RBF network designing algorithm, the orthogonal least squares algorithm.

  4. Reporting quality of multivariable logistic regression in selected Indian medical journals.

    Science.gov (United States)

    Kumar, R; Indrayan, A; Chhabra, P

    2012-01-01

    Use of multivariable logistic regression (MLR) modeling has steeply increased in the medical literature over the past few years. Testing of model assumptions and adequate reporting of MLR allow the reader to interpret results more accurately. To review the fulfillment of assumptions and reporting quality of MLR in selected Indian medical journals using established criteria. Analysis of published literature. Medknow.com publishes 68 Indian medical journals with open access. Eight of these journals had at least five articles using MLR between the years 1994 to 2008. Articles from each of these journals were evaluated according to the previously established 10-point quality criteria for reporting and to test the MLR model assumptions. SPSS 17 software and non-parametric test (Kruskal-Wallis H, Mann Whitney U, Spearman Correlation). One hundred and nine articles were finally found using MLR for analyzing the data in the selected eight journals. The number of such articles gradually increased after year 2003, but quality score remained almost similar over time. P value, odds ratio, and 95% confidence interval for coefficients in MLR was reported in 75.2% and sufficient cases (>10) per covariate of limiting sample size were reported in the 58.7% of the articles. No article reported the test for conformity of linear gradient for continuous covariates. Total score was not significantly different across the journals. However, involvement of statistician or epidemiologist as a co-author improved the average quality score significantly (P=0.014). Reporting of MLR in many Indian journals is incomplete. Only one article managed to score 8 out of 10 among 109 articles under review. All others scored less. Appropriate guidelines in instructions to authors, and pre-publication review of articles using MLR by a qualified statistician may improve quality of reporting.

  5. Structural brain connectivity and cognitive ability differences: A multivariate distance matrix regression analysis.

    Science.gov (United States)

    Ponsoda, Vicente; Martínez, Kenia; Pineda-Pardo, José A; Abad, Francisco J; Olea, Julio; Román, Francisco J; Barbey, Aron K; Colom, Roberto

    2017-02-01

    Neuroimaging research involves analyses of huge amounts of biological data that might or might not be related with cognition. This relationship is usually approached using univariate methods, and, therefore, correction methods are mandatory for reducing false positives. Nevertheless, the probability of false negatives is also increased. Multivariate frameworks have been proposed for helping to alleviate this balance. Here we apply multivariate distance matrix regression for the simultaneous analysis of biological and cognitive data, namely, structural connections among 82 brain regions and several latent factors estimating cognitive performance. We tested whether cognitive differences predict distances among individuals regarding their connectivity pattern. Beginning with 3,321 connections among regions, the 36 edges better predicted by the individuals' cognitive scores were selected. Cognitive scores were related to connectivity distances in both the full (3,321) and reduced (36) connectivity patterns. The selected edges connect regions distributed across the entire brain and the network defined by these edges supports high-order cognitive processes such as (a) (fluid) executive control, (b) (crystallized) recognition, learning, and language processing, and (c) visuospatial processing. This multivariate study suggests that one widespread, but limited number, of regions in the human brain, supports high-level cognitive ability differences. Hum Brain Mapp 38:803-816, 2017. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.

  6. A SOCIOLOGICAL ANALYSIS OF THE CHILDBEARING COEFFICIENT IN THE ALTAI REGION BASED ON METHOD OF FUZZY LINEAR REGRESSION

    Directory of Open Access Journals (Sweden)

    Sergei Vladimirovich Varaksin

    2017-06-01

    Full Text Available Purpose. Construction of a mathematical model of the dynamics of childbearing change in the Altai region in 2000–2016, analysis of the dynamics of changes in birth rates for multiple age categories of women of childbearing age. Methodology. A auxiliary analysis element is the construction of linear mathematical models of the dynamics of childbearing by using fuzzy linear regression method based on fuzzy numbers. Fuzzy linear regression is considered as an alternative to standard statistical linear regression for short time series and unknown distribution law. The parameters of fuzzy linear and standard statistical regressions for childbearing time series were defined with using the built in language MatLab algorithm. Method of fuzzy linear regression is not used in sociological researches yet. Results. There are made the conclusions about the socio-demographic changes in society, the high efficiency of the demographic policy of the leadership of the region and the country, and the applicability of the method of fuzzy linear regression for sociological analysis.

  7. A primer for biomedical scientists on how to execute model II linear regression analysis.

    Science.gov (United States)

    Ludbrook, John

    2012-04-01

    1. There are two very different ways of executing linear regression analysis. One is Model I, when the x-values are fixed by the experimenter. The other is Model II, in which the x-values are free to vary and are subject to error. 2. I have received numerous complaints from biomedical scientists that they have great difficulty in executing Model II linear regression analysis. This may explain the results of a Google Scholar search, which showed that the authors of articles in journals of physiology, pharmacology and biochemistry rarely use Model II regression analysis. 3. I repeat my previous arguments in favour of using least products linear regression analysis for Model II regressions. I review three methods for executing ordinary least products (OLP) and weighted least products (WLP) regression analysis: (i) scientific calculator and/or computer spreadsheet; (ii) specific purpose computer programs; and (iii) general purpose computer programs. 4. Using a scientific calculator and/or computer spreadsheet, it is easy to obtain correct values for OLP slope and intercept, but the corresponding 95% confidence intervals (CI) are inaccurate. 5. Using specific purpose computer programs, the freeware computer program smatr gives the correct OLP regression coefficients and obtains 95% CI by bootstrapping. In addition, smatr can be used to compare the slopes of OLP lines. 6. When using general purpose computer programs, I recommend the commercial programs systat and Statistica for those who regularly undertake linear regression analysis and I give step-by-step instructions in the Supplementary Information as to how to use loss functions. © 2011 The Author. Clinical and Experimental Pharmacology and Physiology. © 2011 Blackwell Publishing Asia Pty Ltd.

  8. Geoelectrical parameter-based multivariate regression borehole yield model for predicting aquifer yield in managing groundwater resource sustainability

    Directory of Open Access Journals (Sweden)

    Kehinde Anthony Mogaji

    2016-07-01

    Full Text Available This study developed a GIS-based multivariate regression (MVR yield rate prediction model of groundwater resource sustainability in the hard-rock geology terrain of southwestern Nigeria. This model can economically manage the aquifer yield rate potential predictions that are often overlooked in groundwater resources development. The proposed model relates the borehole yield rate inventory of the area to geoelectrically derived parameters. Three sets of borehole yield rate conditioning geoelectrically derived parameters—aquifer unit resistivity (ρ, aquifer unit thickness (D and coefficient of anisotropy (λ—were determined from the acquired and interpreted geophysical data. The extracted borehole yield rate values and the geoelectrically derived parameter values were regressed to develop the MVR relationship model by applying linear regression and GIS techniques. The sensitivity analysis results of the MVR model evaluated at P ⩽ 0.05 for the predictors ρ, D and λ provided values of 2.68 × 10−05, 2 × 10−02 and 2.09 × 10−06, respectively. The accuracy and predictive power tests conducted on the MVR model using the Theil inequality coefficient measurement approach, coupled with the sensitivity analysis results, confirmed the model yield rate estimation and prediction capability. The MVR borehole yield prediction model estimates were processed in a GIS environment to model an aquifer yield potential prediction map of the area. The information on the prediction map can serve as a scientific basis for predicting aquifer yield potential rates relevant in groundwater resources sustainability management. The developed MVR borehole yield rate prediction mode provides a good alternative to other methods used for this purpose.

  9. Order Selection for General Expression of Nonlinear Autoregressive Model Based on Multivariate Stepwise Regression

    Science.gov (United States)

    Shi, Jinfei; Zhu, Songqing; Chen, Ruwen

    2017-12-01

    An order selection method based on multiple stepwise regressions is proposed for General Expression of Nonlinear Autoregressive model which converts the model order problem into the variable selection of multiple linear regression equation. The partial autocorrelation function is adopted to define the linear term in GNAR model. The result is set as the initial model, and then the nonlinear terms are introduced gradually. Statistics are chosen to study the improvements of both the new introduced and originally existed variables for the model characteristics, which are adopted to determine the model variables to retain or eliminate. So the optimal model is obtained through data fitting effect measurement or significance test. The simulation and classic time-series data experiment results show that the method proposed is simple, reliable and can be applied to practical engineering.

  10. A graphical method to evaluate spectral preprocessing in multivariate regression calibrations: example with Savitzky-Golay filters and partial least squares regression.

    Science.gov (United States)

    Delwiche, Stephen R; Reeves, James B

    2010-01-01

    In multivariate regression analysis of spectroscopy data, spectral preprocessing is often performed to reduce unwanted background information (offsets, sloped baselines) or accentuate absorption features in intrinsically overlapping bands. These procedures, also known as pretreatments, are commonly smoothing operations or derivatives. While such operations are often useful in reducing the number of latent variables of the actual decomposition and lowering residual error, they also run the risk of misleading the practitioner into accepting calibration equations that are poorly adapted to samples outside of the calibration. The current study developed a graphical method to examine this effect on partial least squares (PLS) regression calibrations of near-infrared (NIR) reflection spectra of ground wheat meal with two analytes, protein content and sodium dodecyl sulfate sedimentation (SDS) volume (an indicator of the quantity of the gluten proteins that contribute to strong doughs). These two properties were chosen because of their differing abilities to be modeled by NIR spectroscopy: excellent for protein content, fair for SDS sedimentation volume. To further demonstrate the potential pitfalls of preprocessing, an artificial component, a randomly generated value, was included in PLS regression trials. Savitzky-Golay (digital filter) smoothing, first-derivative, and second-derivative preprocess functions (5 to 25 centrally symmetric convolution points, derived from quadratic polynomials) were applied to PLS calibrations of 1 to 15 factors. The results demonstrated the danger of an over reliance on preprocessing when (1) the number of samples used in a multivariate calibration is low (<50), (2) the spectral response of the analyte is weak, and (3) the goodness of the calibration is based on the coefficient of determination (R(2)) rather than a term based on residual error. The graphical method has application to the evaluation of other preprocess functions and various

  11. Nonparametric Regression Estimation for Multivariate Null Recurrent Processes

    Directory of Open Access Journals (Sweden)

    Biqing Cai

    2015-04-01

    Full Text Available This paper discusses nonparametric kernel regression with the regressor being a \\(d\\-dimensional \\(\\beta\\-null recurrent process in presence of conditional heteroscedasticity. We show that the mean function estimator is consistent with convergence rate \\(\\sqrt{n(Th^{d}}\\, where \\(n(T\\ is the number of regenerations for a \\(\\beta\\-null recurrent process and the limiting distribution (with proper normalization is normal. Furthermore, we show that the two-step estimator for the volatility function is consistent. The finite sample performance of the estimate is quite reasonable when the leave-one-out cross validation method is used for bandwidth selection. We apply the proposed method to study the relationship of Federal funds rate with 3-month and 5-year T-bill rates and discover the existence of nonlinearity of the relationship. Furthermore, the in-sample and out-of-sample performance of the nonparametric model is far better than the linear model.

  12. Linear Regression on Sparse Features for Single-Channel Speech Separation

    DEFF Research Database (Denmark)

    Schmidt, Mikkel N.; Olsson, Rasmus Kongsgaard

    2007-01-01

    In this work we address the problem of separating multiple speakers from a single microphone recording. We formulate a linear regression model for estimating each speaker based on features derived from the mixture. The employed feature representation is a sparse, non-negative encoding of the speech...... mixture in terms of pre-learned speaker-dependent dictionaries. Previous work has shown that this feature representation by itself provides some degree of separation. We show that the performance is significantly improved when regression analysis is performed on the sparse, non-negative features, both...

  13. The Relationship between Economic Growth and Money Laundering – a Linear Regression Model

    Directory of Open Access Journals (Sweden)

    Daniel Rece

    2009-09-01

    Full Text Available This study provides an overview of the relationship between economic growth and money laundering modeled by a least squares function. The report analyzes statistically data collected from USA, Russia, Romania and other eleven European countries, rendering a linear regression model. The study illustrates that 23.7% of the total variance in the regressand (level of money laundering is “explained” by the linear regression model. In our opinion, this model will provide critical auxiliary judgment and decision support for anti-money laundering service systems.

  14. Inverse estimation of multiple muscle activations based on linear logistic regression.

    Science.gov (United States)

    Sekiya, Masashi; Tsuji, Toshiaki

    2017-07-01

    This study deals with a technology to estimate the muscle activity from the movement data using a statistical model. A linear regression (LR) model and artificial neural networks (ANN) have been known as statistical models for such use. Although ANN has a high estimation capability, it is often in the clinical application that the lack of data amount leads to performance deterioration. On the other hand, the LR model has a limitation in generalization performance. We therefore propose a muscle activity estimation method to improve the generalization performance through the use of linear logistic regression model. The proposed method was compared with the LR model and ANN in the verification experiment with 7 participants. As a result, the proposed method showed better generalization performance than the conventional methods in various tasks.

  15. Data Transformations for Inference with Linear Regression: Clarifications and Recommendations

    Science.gov (United States)

    Pek, Jolynn; Wong, Octavia; Wong, C. M.

    2017-01-01

    Data transformations have been promoted as a popular and easy-to-implement remedy to address the assumption of normally distributed errors (in the population) in linear regression. However, the application of data transformations introduces non-ignorable complexities which should be fully appreciated before their implementation. This paper adds to…

  16. Alpins and thibos vectorial astigmatism analyses: proposal of a linear regression model between methods

    Directory of Open Access Journals (Sweden)

    Giuliano de Oliveira Freitas

    2013-10-01

    Full Text Available PURPOSE: To determine linear regression models between Alpins descriptive indices and Thibos astigmatic power vectors (APV, assessing the validity and strength of such correlations. METHODS: This case series prospectively assessed 62 eyes of 31 consecutive cataract patients with preoperative corneal astigmatism between 0.75 and 2.50 diopters in both eyes. Patients were randomly assorted among two phacoemulsification groups: one assigned to receive AcrySof®Toric intraocular lens (IOL in both eyes and another assigned to have AcrySof Natural IOL associated with limbal relaxing incisions, also in both eyes. All patients were reevaluated postoperatively at 6 months, when refractive astigmatism analysis was performed using both Alpins and Thibos methods. The ratio between Thibos postoperative APV and preoperative APV (APVratio and its linear regression to Alpins percentage of success of astigmatic surgery, percentage of astigmatism corrected and percentage of astigmatism reduction at the intended axis were assessed. RESULTS: Significant negative correlation between the ratio of post- and preoperative Thibos APVratio and Alpins percentage of success (%Success was found (Spearman's ρ=-0.93; linear regression is given by the following equation: %Success = (-APVratio + 1.00x100. CONCLUSION: The linear regression we found between APVratio and %Success permits a validated mathematical inference concerning the overall success of astigmatic surgery.

  17. On the null distribution of Bayes factors in linear regression

    Science.gov (United States)

    We show that under the null, the 2 log (Bayes factor) is asymptotically distributed as a weighted sum of chi-squared random variables with a shifted mean. This claim holds for Bayesian multi-linear regression with a family of conjugate priors, namely, the normal-inverse-gamma prior, the g-prior, and...

  18. Causal correlation of foliar biochemical concentrations with AVIRIS spectra using forced entry linear regression

    Science.gov (United States)

    Dawson, Terence P.; Curran, Paul J.; Kupiec, John A.

    1995-01-01

    A major goal of airborne imaging spectrometry is to estimate the biochemical composition of vegetation canopies from reflectance spectra. Remotely-sensed estimates of foliar biochemical concentrations of forests would provide valuable indicators of ecosystem function at regional and eventually global scales. Empirical research has shown a relationship exists between the amount of radiation reflected from absorption features and the concentration of given biochemicals in leaves and canopies (Matson et al., 1994, Johnson et al., 1994). A technique commonly used to determine which wavelengths have the strongest correlation with the biochemical of interest is unguided (stepwise) multiple regression. Wavelengths are entered into a multivariate regression equation, in their order of importance, each contributing to the reduction of the variance in the measured biochemical concentration. A significant problem with the use of stepwise regression for determining the correlation between biochemical concentration and spectra is that of 'overfitting' as there are significantly more wavebands than biochemical measurements. This could result in the selection of wavebands which may be more accurately attributable to noise or canopy effects. In addition, there is a real problem of collinearity in that the individual biochemical concentrations may covary. A strong correlation between the reflectance at a given wavelength and the concentration of a biochemical of interest, therefore, may be due to the effect of another biochemical which is closely related. Furthermore, it is not always possible to account for potentially suitable waveband omissions in the stepwise selection procedure. This concern about the suitability of stepwise regression has been identified and acknowledged in a number of recent studies (Wessman et al., 1988, Curran, 1989, Curran et al., 1992, Peterson and Hubbard, 1992, Martine and Aber, 1994, Kupiec, 1994). These studies have pointed to the lack of a physical

  19. Implementing fuzzy polynomial interpolation (FPI and fuzzy linear regression (LFR

    Directory of Open Access Journals (Sweden)

    Maria Cristina Floreno

    1996-05-01

    Full Text Available This paper presents some preliminary results arising within a general framework concerning the development of software tools for fuzzy arithmetic. The program is in a preliminary stage. What has been already implemented consists of a set of routines for elementary operations, optimized functions evaluation, interpolation and regression. Some of these have been applied to real problems.This paper describes a prototype of a library in C++ for polynomial interpolation of fuzzifying functions, a set of routines in FORTRAN for fuzzy linear regression and a program with graphical user interface allowing the use of such routines.

  20. Application of range-test in multiple linear regression analysis in ...

    African Journals Online (AJOL)

    Application of range-test in multiple linear regression analysis in the presence of outliers is studied in this paper. First, the plot of the explanatory variables (i.e. Administration, Social/Commercial, Economic services and Transfer) on the dependent variable (i.e. GDP) was done to identify the statistical trend over the years.

  1. Evaluation of linear regression techniques for atmospheric applications: the importance of appropriate weighting

    Directory of Open Access Journals (Sweden)

    C. Wu

    2018-03-01

    Full Text Available Linear regression techniques are widely used in atmospheric science, but they are often improperly applied due to lack of consideration or inappropriate handling of measurement uncertainty. In this work, numerical experiments are performed to evaluate the performance of five linear regression techniques, significantly extending previous works by Chu and Saylor. The five techniques are ordinary least squares (OLS, Deming regression (DR, orthogonal distance regression (ODR, weighted ODR (WODR, and York regression (YR. We first introduce a new data generation scheme that employs the Mersenne twister (MT pseudorandom number generator. The numerical simulations are also improved by (a refining the parameterization of nonlinear measurement uncertainties, (b inclusion of a linear measurement uncertainty, and (c inclusion of WODR for comparison. Results show that DR, WODR and YR produce an accurate slope, but the intercept by WODR and YR is overestimated and the degree of bias is more pronounced with a low R2 XY dataset. The importance of a properly weighting parameter λ in DR is investigated by sensitivity tests, and it is found that an improper λ in DR can lead to a bias in both the slope and intercept estimation. Because the λ calculation depends on the actual form of the measurement error, it is essential to determine the exact form of measurement error in the XY data during the measurement stage. If a priori error in one of the variables is unknown, or the measurement error described cannot be trusted, DR, WODR and YR can provide the least biases in slope and intercept among all tested regression techniques. For these reasons, DR, WODR and YR are recommended for atmospheric studies when both X and Y data have measurement errors. An Igor Pro-based program (Scatter Plot was developed to facilitate the implementation of error-in-variables regressions.

  2. Evaluation of linear regression techniques for atmospheric applications: the importance of appropriate weighting

    Science.gov (United States)

    Wu, Cheng; Zhen Yu, Jian

    2018-03-01

    Linear regression techniques are widely used in atmospheric science, but they are often improperly applied due to lack of consideration or inappropriate handling of measurement uncertainty. In this work, numerical experiments are performed to evaluate the performance of five linear regression techniques, significantly extending previous works by Chu and Saylor. The five techniques are ordinary least squares (OLS), Deming regression (DR), orthogonal distance regression (ODR), weighted ODR (WODR), and York regression (YR). We first introduce a new data generation scheme that employs the Mersenne twister (MT) pseudorandom number generator. The numerical simulations are also improved by (a) refining the parameterization of nonlinear measurement uncertainties, (b) inclusion of a linear measurement uncertainty, and (c) inclusion of WODR for comparison. Results show that DR, WODR and YR produce an accurate slope, but the intercept by WODR and YR is overestimated and the degree of bias is more pronounced with a low R2 XY dataset. The importance of a properly weighting parameter λ in DR is investigated by sensitivity tests, and it is found that an improper λ in DR can lead to a bias in both the slope and intercept estimation. Because the λ calculation depends on the actual form of the measurement error, it is essential to determine the exact form of measurement error in the XY data during the measurement stage. If a priori error in one of the variables is unknown, or the measurement error described cannot be trusted, DR, WODR and YR can provide the least biases in slope and intercept among all tested regression techniques. For these reasons, DR, WODR and YR are recommended for atmospheric studies when both X and Y data have measurement errors. An Igor Pro-based program (Scatter Plot) was developed to facilitate the implementation of error-in-variables regressions.

  3. Research on the multiple linear regression in non-invasive blood glucose measurement.

    Science.gov (United States)

    Zhu, Jianming; Chen, Zhencheng

    2015-01-01

    A non-invasive blood glucose measurement sensor and the data process algorithm based on the metabolic energy conservation (MEC) method are presented in this paper. The physiological parameters of human fingertip can be measured by various sensing modalities, and blood glucose value can be evaluated with the physiological parameters by the multiple linear regression analysis. Five methods such as enter, remove, forward, backward and stepwise in multiple linear regression were compared, and the backward method had the best performance. The best correlation coefficient was 0.876 with the standard error of the estimate 0.534, and the significance was 0.012 (sig. regression equation was valid. The Clarke error grid analysis was performed to compare the MEC method with the hexokinase method, using 200 data points. The correlation coefficient R was 0.867 and all of the points were located in Zone A and Zone B, which shows the MEC method provides a feasible and valid way for non-invasive blood glucose measurement.

  4. Multivariate time series with linear state space structure

    CERN Document Server

    Gómez, Víctor

    2016-01-01

    This book presents a comprehensive study of multivariate time series with linear state space structure. The emphasis is put on both the clarity of the theoretical concepts and on efficient algorithms for implementing the theory. In particular, it investigates the relationship between VARMA and state space models, including canonical forms. It also highlights the relationship between Wiener-Kolmogorov and Kalman filtering both with an infinite and a finite sample. The strength of the book also lies in the numerous algorithms included for state space models that take advantage of the recursive nature of the models. Many of these algorithms can be made robust, fast, reliable and efficient. The book is accompanied by a MATLAB package called SSMMATLAB and a webpage presenting implemented algorithms with many examples and case studies. Though it lays a solid theoretical foundation, the book also focuses on practical application, and includes exercises in each chapter. It is intended for researchers and students wor...

  5. Estimating Loess Plateau Average Annual Precipitation with Multiple Linear Regression Kriging and Geographically Weighted Regression Kriging

    Directory of Open Access Journals (Sweden)

    Qiutong Jin

    2016-06-01

    Full Text Available Estimating the spatial distribution of precipitation is an important and challenging task in hydrology, climatology, ecology, and environmental science. In order to generate a highly accurate distribution map of average annual precipitation for the Loess Plateau in China, multiple linear regression Kriging (MLRK and geographically weighted regression Kriging (GWRK methods were employed using precipitation data from the period 1980–2010 from 435 meteorological stations. The predictors in regression Kriging were selected by stepwise regression analysis from many auxiliary environmental factors, such as elevation (DEM, normalized difference vegetation index (NDVI, solar radiation, slope, and aspect. All predictor distribution maps had a 500 m spatial resolution. Validation precipitation data from 130 hydrometeorological stations were used to assess the prediction accuracies of the MLRK and GWRK approaches. Results showed that both prediction maps with a 500 m spatial resolution interpolated by MLRK and GWRK had a high accuracy and captured detailed spatial distribution data; however, MLRK produced a lower prediction error and a higher variance explanation than GWRK, although the differences were small, in contrast to conclusions from similar studies.

  6. On the interpretation of weight vectors of linear models in multivariate neuroimaging.

    Science.gov (United States)

    Haufe, Stefan; Meinecke, Frank; Görgen, Kai; Dähne, Sven; Haynes, John-Dylan; Blankertz, Benjamin; Bießmann, Felix

    2014-02-15

    The increase in spatiotemporal resolution of neuroimaging devices is accompanied by a trend towards more powerful multivariate analysis methods. Often it is desired to interpret the outcome of these methods with respect to the cognitive processes under study. Here we discuss which methods allow for such interpretations, and provide guidelines for choosing an appropriate analysis for a given experimental goal: For a surgeon who needs to decide where to remove brain tissue it is most important to determine the origin of cognitive functions and associated neural processes. In contrast, when communicating with paralyzed or comatose patients via brain-computer interfaces, it is most important to accurately extract the neural processes specific to a certain mental state. These equally important but complementary objectives require different analysis methods. Determining the origin of neural processes in time or space from the parameters of a data-driven model requires what we call a forward model of the data; such a model explains how the measured data was generated from the neural sources. Examples are general linear models (GLMs). Methods for the extraction of neural information from data can be considered as backward models, as they attempt to reverse the data generating process. Examples are multivariate classifiers. Here we demonstrate that the parameters of forward models are neurophysiologically interpretable in the sense that significant nonzero weights are only observed at channels the activity of which is related to the brain process under study. In contrast, the interpretation of backward model parameters can lead to wrong conclusions regarding the spatial or temporal origin of the neural signals of interest, since significant nonzero weights may also be observed at channels the activity of which is statistically independent of the brain process under study. As a remedy for the linear case, we propose a procedure for transforming backward models into forward

  7. The role of chemometrics in single and sequential extraction assays: a review. Part II. Cluster analysis, multiple linear regression, mixture resolution, experimental design and other techniques.

    Science.gov (United States)

    Giacomino, Agnese; Abollino, Ornella; Malandrino, Mery; Mentasti, Edoardo

    2011-03-04

    Single and sequential extraction procedures are used for studying element mobility and availability in solid matrices, like soils, sediments, sludge, and airborne particulate matter. In the first part of this review we reported an overview on these procedures and described the applications of chemometric uni- and bivariate techniques and of multivariate pattern recognition techniques based on variable reduction to the experimental results obtained. The second part of the review deals with the use of chemometrics not only for the visualization and interpretation of data, but also for the investigation of the effects of experimental conditions on the response, the optimization of their values and the calculation of element fractionation. We will describe the principles of the multivariate chemometric techniques considered, the aims for which they were applied and the key findings obtained. The following topics will be critically addressed: pattern recognition by cluster analysis (CA), linear discriminant analysis (LDA) and other less common techniques; modelling by multiple linear regression (MLR); investigation of spatial distribution of variables by geostatistics; calculation of fractionation patterns by a mixture resolution method (Chemometric Identification of Substrates and Element Distributions, CISED); optimization and characterization of extraction procedures by experimental design; other multivariate techniques less commonly applied. Copyright © 2010 Elsevier B.V. All rights reserved.

  8. Sparse multivariate factor analysis regression models and its applications to integrative genomics analysis.

    Science.gov (United States)

    Zhou, Yan; Wang, Pei; Wang, Xianlong; Zhu, Ji; Song, Peter X-K

    2017-01-01

    The multivariate regression model is a useful tool to explore complex associations between two kinds of molecular markers, which enables the understanding of the biological pathways underlying disease etiology. For a set of correlated response variables, accounting for such dependency can increase statistical power. Motivated by integrative genomic data analyses, we propose a new methodology-sparse multivariate factor analysis regression model (smFARM), in which correlations of response variables are assumed to follow a factor analysis model with latent factors. This proposed method not only allows us to address the challenge that the number of association parameters is larger than the sample size, but also to adjust for unobserved genetic and/or nongenetic factors that potentially conceal the underlying response-predictor associations. The proposed smFARM is implemented by the EM algorithm and the blockwise coordinate descent algorithm. The proposed methodology is evaluated and compared to the existing methods through extensive simulation studies. Our results show that accounting for latent factors through the proposed smFARM can improve sensitivity of signal detection and accuracy of sparse association map estimation. We illustrate smFARM by two integrative genomics analysis examples, a breast cancer dataset, and an ovarian cancer dataset, to assess the relationship between DNA copy numbers and gene expression arrays to understand genetic regulatory patterns relevant to the disease. We identify two trans-hub regions: one in cytoband 17q12 whose amplification influences the RNA expression levels of important breast cancer genes, and the other in cytoband 9q21.32-33, which is associated with chemoresistance in ovarian cancer. © 2016 WILEY PERIODICALS, INC.

  9. Investigation of linear regression of EPR dosimetric signal of the man tooth enamel

    International Nuclear Information System (INIS)

    Pivovarov, S.P.; Rukhin, A.B.; Zhakparov, R.K.; Vasilevskaya, L.A.

    2001-01-01

    The experimental relations of the EPR radiation signal in samples of man tooth enamel of three donors of different age up to doses 1350 Gy are examined. To all of them the linear regression is applicable. The considerable errors leading to apparent non-linearity are eliminated most. (author)

  10. Linear regression metamodeling as a tool to summarize and present simulation model results.

    Science.gov (United States)

    Jalal, Hawre; Dowd, Bryan; Sainfort, François; Kuntz, Karen M

    2013-10-01

    Modelers lack a tool to systematically and clearly present complex model results, including those from sensitivity analyses. The objective was to propose linear regression metamodeling as a tool to increase transparency of decision analytic models and better communicate their results. We used a simplified cancer cure model to demonstrate our approach. The model computed the lifetime cost and benefit of 3 treatment options for cancer patients. We simulated 10,000 cohorts in a probabilistic sensitivity analysis (PSA) and regressed the model outcomes on the standardized input parameter values in a set of regression analyses. We used the regression coefficients to describe measures of sensitivity analyses, including threshold and parameter sensitivity analyses. We also compared the results of the PSA to deterministic full-factorial and one-factor-at-a-time designs. The regression intercept represented the estimated base-case outcome, and the other coefficients described the relative parameter uncertainty in the model. We defined simple relationships that compute the average and incremental net benefit of each intervention. Metamodeling produced outputs similar to traditional deterministic 1-way or 2-way sensitivity analyses but was more reliable since it used all parameter values. Linear regression metamodeling is a simple, yet powerful, tool that can assist modelers in communicating model characteristics and sensitivity analyses.

  11. Optimal choice of basis functions in the linear regression analysis

    International Nuclear Information System (INIS)

    Khotinskij, A.M.

    1988-01-01

    Problem of optimal choice of basis functions in the linear regression analysis is investigated. Step algorithm with estimation of its efficiency, which holds true at finite number of measurements, is suggested. Conditions, providing the probability of correct choice close to 1 are formulated. Application of the step algorithm to analysis of decay curves is substantiated. 8 refs

  12. Evaluation of Logistic Regression and Multivariate Adaptive Regression Spline Models for Groundwater Potential Mapping Using R and GIS

    Directory of Open Access Journals (Sweden)

    Soyoung Park

    2017-07-01

    Full Text Available This study mapped and analyzed groundwater potential using two different models, logistic regression (LR and multivariate adaptive regression splines (MARS, and compared the results. A spatial database was constructed for groundwater well data and groundwater influence factors. Groundwater well data with a high potential yield of ≥70 m3/d were extracted, and 859 locations (70% were used for model training, whereas the other 365 locations (30% were used for model validation. We analyzed 16 groundwater influence factors including altitude, slope degree, slope aspect, plan curvature, profile curvature, topographic wetness index, stream power index, sediment transport index, distance from drainage, drainage density, lithology, distance from fault, fault density, distance from lineament, lineament density, and land cover. Groundwater potential maps (GPMs were constructed using LR and MARS models and tested using a receiver operating characteristics curve. Based on this analysis, the area under the curve (AUC for the success rate curve of GPMs created using the MARS and LR models was 0.867 and 0.838, and the AUC for the prediction rate curve was 0.836 and 0.801, respectively. This implies that the MARS model is useful and effective for groundwater potential analysis in the study area.

  13. Using Patterns for Multivariate Monitoring and Feedback Control of Linear Accelerator Performance: Proof-of-Concept Research

    International Nuclear Information System (INIS)

    Cordes, Gail Adele; Van Ausdeln, Leo Anthony; Velasquez, Maria Elena

    2002-01-01

    The report discusses preliminary proof-of-concept research for using the Advanced Data Validation and Verification System (ADVVS), a new INEEL software package, to add validation and verification and multivariate feedback control to the operation of non-destructive analysis (NDA) equipment. The software is based on human cognition, the recognition of patterns and changes in patterns in time-related data. The first project applied ADVVS to monitor operations of a selectable energy linear electron accelerator, and showed how the software recognizes in real time any deviations from the optimal tune of the machine. The second project extended the software method to provide model-based multivariate feedback control for the same linear electron accelerator. The projects successfully demonstrated proof-of-concept for the applications and focused attention on the common application of intelligent information processing techniques

  14. Selecting minimum dataset soil variables using PLSR as a regressive multivariate method

    Science.gov (United States)

    Stellacci, Anna Maria; Armenise, Elena; Castellini, Mirko; Rossi, Roberta; Vitti, Carolina; Leogrande, Rita; De Benedetto, Daniela; Ferrara, Rossana M.; Vivaldi, Gaetano A.

    2017-04-01

    Long-term field experiments and science-based tools that characterize soil status (namely the soil quality indices, SQIs) assume a strategic role in assessing the effect of agronomic techniques and thus in improving soil management especially in marginal environments. Selecting key soil variables able to best represent soil status is a critical step for the calculation of SQIs. Current studies show the effectiveness of statistical methods for variable selection to extract relevant information deriving from multivariate datasets. Principal component analysis (PCA) has been mainly used, however supervised multivariate methods and regressive techniques are progressively being evaluated (Armenise et al., 2013; de Paul Obade et al., 2016; Pulido Moncada et al., 2014). The present study explores the effectiveness of partial least square regression (PLSR) in selecting critical soil variables, using a dataset comparing conventional tillage and sod-seeding on durum wheat. The results were compared to those obtained using PCA and stepwise discriminant analysis (SDA). The soil data derived from a long-term field experiment in Southern Italy. On samples collected in April 2015, the following set of variables was quantified: (i) chemical: total organic carbon and nitrogen (TOC and TN), alkali-extractable C (TEC and humic substances - HA-FA), water extractable N and organic C (WEN and WEOC), Olsen extractable P, exchangeable cations, pH and EC; (ii) physical: texture, dry bulk density (BD), macroporosity (Pmac), air capacity (AC), and relative field capacity (RFC); (iii) biological: carbon of the microbial biomass quantified with the fumigation-extraction method. PCA and SDA were previously applied to the multivariate dataset (Stellacci et al., 2016). PLSR was carried out on mean centered and variance scaled data of predictors (soil variables) and response (wheat yield) variables using the PLS procedure of SAS/STAT. In addition, variable importance for projection (VIP

  15. LINEAR REGRESSION MODEL ESTİMATİON FOR RIGHT CENSORED DATA

    Directory of Open Access Journals (Sweden)

    Ersin Yılmaz

    2016-05-01

    Full Text Available In this study, firstly we will define a right censored data. If we say shortly right-censored data is censoring values that above the exact line. This may be related with scaling device. And then  we will use response variable acquainted from right-censored explanatory variables. Then the linear regression model will be estimated. For censored data’s existence, Kaplan-Meier weights will be used for  the estimation of the model. With the weights regression model  will be consistent and unbiased with that.   And also there is a method for the censored data that is a semi parametric regression and this method also give  useful results  for censored data too. This study also might be useful for the health studies because of the censored data used in medical issues generally.

  16. Simultaneous determination of estrogens (ethinylestradiol and norgestimate) concentrations in human and bovine serum albumin by use of fluorescence spectroscopy and multivariate regression analysis.

    Science.gov (United States)

    Hordge, LaQuana N; McDaniel, Kiara L; Jones, Derick D; Fakayode, Sayo O

    2016-05-15

    The endocrine disruption property of estrogens necessitates the immediate need for effective monitoring and development of analytical protocols for their analyses in biological and human specimens. This study explores the first combined utility of a steady-state fluorescence spectroscopy and multivariate partial-least-square (PLS) regression analysis for the simultaneous determination of two estrogens (17α-ethinylestradiol (EE) and norgestimate (NOR)) concentrations in bovine serum albumin (BSA) and human serum albumin (HSA) samples. The influence of EE and NOR concentrations and temperature on the emission spectra of EE-HSA EE-BSA, NOR-HSA, and NOR-BSA complexes was also investigated. The binding of EE with HSA and BSA resulted in increase in emission characteristics of HSA and BSA and a significant blue spectra shift. In contrast, the interaction of NOR with HSA and BSA quenched the emission characteristics of HSA and BSA. The observed emission spectral shifts preclude the effective use of traditional univariate regression analysis of fluorescent data for the determination of EE and NOR concentrations in HSA and BSA samples. Multivariate partial-least-squares (PLS) regression analysis was utilized to correlate the changes in emission spectra with EE and NOR concentrations in HSA and BSA samples. The figures-of-merit of the developed PLS regression models were excellent, with limits of detection as low as 1.6×10(-8) M for EE and 2.4×10(-7) M for NOR and good linearity (R(2)>0.994985). The PLS models correctly predicted EE and NOR concentrations in independent validation HSA and BSA samples with a root-mean-square-percent-relative-error (RMS%RE) of less than 6.0% at physiological condition. On the contrary, the use of univariate regression resulted in poor predictions of EE and NOR in HSA and BSA samples, with RMS%RE larger than 40% at physiological conditions. High accuracy, low sensitivity, simplicity, low-cost with no prior analyte extraction or separation

  17. Substituting random forest for multiple linear regression improves binding affinity prediction of scoring functions: Cyscore as a case study.

    Science.gov (United States)

    Li, Hongjian; Leung, Kwong-Sak; Wong, Man-Hon; Ballester, Pedro J

    2014-08-27

    State-of-the-art protein-ligand docking methods are generally limited by the traditionally low accuracy of their scoring functions, which are used to predict binding affinity and thus vital for discriminating between active and inactive compounds. Despite intensive research over the years, classical scoring functions have reached a plateau in their predictive performance. These assume a predetermined additive functional form for some sophisticated numerical features, and use standard multivariate linear regression (MLR) on experimental data to derive the coefficients. In this study we show that such a simple functional form is detrimental for the prediction performance of a scoring function, and replacing linear regression by machine learning techniques like random forest (RF) can improve prediction performance. We investigate the conditions of applying RF under various contexts and find that given sufficient training samples RF manages to comprehensively capture the non-linearity between structural features and measured binding affinities. Incorporating more structural features and training with more samples can both boost RF performance. In addition, we analyze the importance of structural features to binding affinity prediction using the RF variable importance tool. Lastly, we use Cyscore, a top performing empirical scoring function, as a baseline for comparison study. Machine-learning scoring functions are fundamentally different from classical scoring functions because the former circumvents the fixed functional form relating structural features with binding affinities. RF, but not MLR, can effectively exploit more structural features and more training samples, leading to higher prediction performance. The future availability of more X-ray crystal structures will further widen the performance gap between RF-based and MLR-based scoring functions. This further stresses the importance of substituting RF for MLR in scoring function development.

  18. Weighted functional linear regression models for gene-based association analysis.

    Science.gov (United States)

    Belonogova, Nadezhda M; Svishcheva, Gulnara R; Wilson, James F; Campbell, Harry; Axenovich, Tatiana I

    2018-01-01

    Functional linear regression models are effectively used in gene-based association analysis of complex traits. These models combine information about individual genetic variants, taking into account their positions and reducing the influence of noise and/or observation errors. To increase the power of methods, where several differently informative components are combined, weights are introduced to give the advantage to more informative components. Allele-specific weights have been introduced to collapsing and kernel-based approaches to gene-based association analysis. Here we have for the first time introduced weights to functional linear regression models adapted for both independent and family samples. Using data simulated on the basis of GAW17 genotypes and weights defined by allele frequencies via the beta distribution, we demonstrated that type I errors correspond to declared values and that increasing the weights of causal variants allows the power of functional linear models to be increased. We applied the new method to real data on blood pressure from the ORCADES sample. Five of the six known genes with P models. Moreover, we found an association between diastolic blood pressure and the VMP1 gene (P = 8.18×10-6), when we used a weighted functional model. For this gene, the unweighted functional and weighted kernel-based models had P = 0.004 and 0.006, respectively. The new method has been implemented in the program package FREGAT, which is freely available at https://cran.r-project.org/web/packages/FREGAT/index.html.

  19. Genomic prediction based on data from three layer lines using non-linear regression models

    NARCIS (Netherlands)

    Huang, H.; Windig, J.J.; Vereijken, A.; Calus, M.P.L.

    2014-01-01

    Background - Most studies on genomic prediction with reference populations that include multiple lines or breeds have used linear models. Data heterogeneity due to using multiple populations may conflict with model assumptions used in linear regression methods. Methods - In an attempt to alleviate

  20. A method for fitting regression splines with varying polynomial order in the linear mixed model.

    Science.gov (United States)

    Edwards, Lloyd J; Stewart, Paul W; MacDougall, James E; Helms, Ronald W

    2006-02-15

    The linear mixed model has become a widely used tool for longitudinal analysis of continuous variables. The use of regression splines in these models offers the analyst additional flexibility in the formulation of descriptive analyses, exploratory analyses and hypothesis-driven confirmatory analyses. We propose a method for fitting piecewise polynomial regression splines with varying polynomial order in the fixed effects and/or random effects of the linear mixed model. The polynomial segments are explicitly constrained by side conditions for continuity and some smoothness at the points where they join. By using a reparameterization of this explicitly constrained linear mixed model, an implicitly constrained linear mixed model is constructed that simplifies implementation of fixed-knot regression splines. The proposed approach is relatively simple, handles splines in one variable or multiple variables, and can be easily programmed using existing commercial software such as SAS or S-plus. The method is illustrated using two examples: an analysis of longitudinal viral load data from a study of subjects with acute HIV-1 infection and an analysis of 24-hour ambulatory blood pressure profiles.

  1. Computer software for linear and nonlinear regression in organic NMR

    International Nuclear Information System (INIS)

    Canto, Eduardo Leite do; Rittner, Roberto

    1991-01-01

    Calculation involving two variable linear regressions, require specific procedures generally not familiar to chemist. For attending the necessity of fast and efficient handling of NMR data, a self explained and Pc portable software has been developed, which allows user to produce and use diskette recorded tables, containing chemical shift or any other substituent physical-chemical measurements and constants (σ T , σ o R , E s , ...)

  2. Do clinical and translational science graduate students understand linear regression? Development and early validation of the REGRESS quiz.

    Science.gov (United States)

    Enders, Felicity

    2013-12-01

    Although regression is widely used for reading and publishing in the medical literature, no instruments were previously available to assess students' understanding. The goal of this study was to design and assess such an instrument for graduate students in Clinical and Translational Science and Public Health. A 27-item REsearch on Global Regression Expectations in StatisticS (REGRESS) quiz was developed through an iterative process. Consenting students taking a course on linear regression in a Clinical and Translational Science program completed the quiz pre- and postcourse. Student results were compared to practicing statisticians with a master's or doctoral degree in statistics or a closely related field. Fifty-two students responded precourse, 59 postcourse , and 22 practicing statisticians completed the quiz. The mean (SD) score was 9.3 (4.3) for students precourse and 19.0 (3.5) postcourse (P REGRESS quiz was internally reliable (Cronbach's alpha 0.89). The initial validation is quite promising with statistically significant and meaningful differences across time and study populations. Further work is needed to validate the quiz across multiple institutions. © 2013 Wiley Periodicals, Inc.

  3. Power properties of invariant tests for spatial autocorrelation in linear regression

    NARCIS (Netherlands)

    Martellosio, F.

    2006-01-01

    Many popular tests for residual spatial autocorrelation in the context of the linear regression model belong to the class of invariant tests. This paper derives a number of exact properties of the power function of such tests. In particular, we extend the work of Krämer (2005, Journal of Statistical

  4. Predicting recovery of cognitive function soon after stroke: differential modeling of logarithmic and linear regression.

    Science.gov (United States)

    Suzuki, Makoto; Sugimura, Yuko; Yamada, Sumio; Omori, Yoshitsugu; Miyamoto, Masaaki; Yamamoto, Jun-ichi

    2013-01-01

    Cognitive disorders in the acute stage of stroke are common and are important independent predictors of adverse outcome in the long term. Despite the impact of cognitive disorders on both patients and their families, it is still difficult to predict the extent or duration of cognitive impairments. The objective of the present study was, therefore, to provide data on predicting the recovery of cognitive function soon after stroke by differential modeling with logarithmic and linear regression. This study included two rounds of data collection comprising 57 stroke patients enrolled in the first round for the purpose of identifying the time course of cognitive recovery in the early-phase group data, and 43 stroke patients in the second round for the purpose of ensuring that the correlation of the early-phase group data applied to the prediction of each individual's degree of cognitive recovery. In the first round, Mini-Mental State Examination (MMSE) scores were assessed 3 times during hospitalization, and the scores were regressed on the logarithm and linear of time. In the second round, calculations of MMSE scores were made for the first two scoring times after admission to tailor the structures of logarithmic and linear regression formulae to fit an individual's degree of functional recovery. The time course of early-phase recovery for cognitive functions resembled both logarithmic and linear functions. However, MMSE scores sampled at two baseline points based on logarithmic regression modeling could estimate prediction of cognitive recovery more accurately than could linear regression modeling (logarithmic modeling, R(2) = 0.676, PLogarithmic modeling based on MMSE scores could accurately predict the recovery of cognitive function soon after the occurrence of stroke. This logarithmic modeling with mathematical procedures is simple enough to be adopted in daily clinical practice.

  5. Tutorial on Biostatistics: Linear Regression Analysis of Continuous Correlated Eye Data.

    Science.gov (United States)

    Ying, Gui-Shuang; Maguire, Maureen G; Glynn, Robert; Rosner, Bernard

    2017-04-01

    To describe and demonstrate appropriate linear regression methods for analyzing correlated continuous eye data. We describe several approaches to regression analysis involving both eyes, including mixed effects and marginal models under various covariance structures to account for inter-eye correlation. We demonstrate, with SAS statistical software, applications in a study comparing baseline refractive error between one eye with choroidal neovascularization (CNV) and the unaffected fellow eye, and in a study determining factors associated with visual field in the elderly. When refractive error from both eyes were analyzed with standard linear regression without accounting for inter-eye correlation (adjusting for demographic and ocular covariates), the difference between eyes with CNV and fellow eyes was 0.15 diopters (D; 95% confidence interval, CI -0.03 to 0.32D, p = 0.10). Using a mixed effects model or a marginal model, the estimated difference was the same but with narrower 95% CI (0.01 to 0.28D, p = 0.03). Standard regression for visual field data from both eyes provided biased estimates of standard error (generally underestimated) and smaller p-values, while analysis of the worse eye provided larger p-values than mixed effects models and marginal models. In research involving both eyes, ignoring inter-eye correlation can lead to invalid inferences. Analysis using only right or left eyes is valid, but decreases power. Worse-eye analysis can provide less power and biased estimates of effect. Mixed effects or marginal models using the eye as the unit of analysis should be used to appropriately account for inter-eye correlation and maximize power and precision.

  6. Efficient Determination of Free Energy Landscapes in Multiple Dimensions from Biased Umbrella Sampling Simulations Using Linear Regression.

    Science.gov (United States)

    Meng, Yilin; Roux, Benoît

    2015-08-11

    The weighted histogram analysis method (WHAM) is a standard protocol for postprocessing the information from biased umbrella sampling simulations to construct the potential of mean force with respect to a set of order parameters. By virtue of the WHAM equations, the unbiased density of state is determined by satisfying a self-consistent condition through an iterative procedure. While the method works very effectively when the number of order parameters is small, its computational cost grows rapidly in higher dimension. Here, we present a simple and efficient alternative strategy, which avoids solving the self-consistent WHAM equations iteratively. An efficient multivariate linear regression framework is utilized to link the biased probability densities of individual umbrella windows and yield an unbiased global free energy landscape in the space of order parameters. It is demonstrated with practical examples that free energy landscapes that are comparable in accuracy to WHAM can be generated at a small fraction of the cost.

  7. truncSP: An R Package for Estimation of Semi-Parametric Truncated Linear Regression Models

    Directory of Open Access Journals (Sweden)

    Maria Karlsson

    2014-05-01

    Full Text Available Problems with truncated data occur in many areas, complicating estimation and inference. Regarding linear regression models, the ordinary least squares estimator is inconsistent and biased for these types of data and is therefore unsuitable for use. Alternative estimators, designed for the estimation of truncated regression models, have been developed. This paper presents the R package truncSP. The package contains functions for the estimation of semi-parametric truncated linear regression models using three different estimators: the symmetrically trimmed least squares, quadratic mode, and left truncated estimators, all of which have been shown to have good asymptotic and ?nite sample properties. The package also provides functions for the analysis of the estimated models. Data from the environmental sciences are used to illustrate the functions in the package.

  8. COLOR IMAGE RETRIEVAL BASED ON FEATURE FUSION THROUGH MULTIPLE LINEAR REGRESSION ANALYSIS

    Directory of Open Access Journals (Sweden)

    K. Seetharaman

    2015-08-01

    Full Text Available This paper proposes a novel technique based on feature fusion using multiple linear regression analysis, and the least-square estimation method is employed to estimate the parameters. The given input query image is segmented into various regions according to the structure of the image. The color and texture features are extracted on each region of the query image, and the features are fused together using the multiple linear regression model. The estimated parameters of the model, which is modeled based on the features, are formed as a vector called a feature vector. The Canberra distance measure is adopted to compare the feature vectors of the query and target images. The F-measure is applied to evaluate the performance of the proposed technique. The obtained results expose that the proposed technique is comparable to the other existing techniques.

  9. Assessment of Genetic Heterogeneity in Structured Plant Populations Using Multivariate Whole-Genome Regression Models.

    Science.gov (United States)

    Lehermeier, Christina; Schön, Chris-Carolin; de Los Campos, Gustavo

    2015-09-01

    Plant breeding populations exhibit varying levels of structure and admixture; these features are likely to induce heterogeneity of marker effects across subpopulations. Traditionally, structure has been dealt with as a potential confounder, and various methods exist to "correct" for population stratification. However, these methods induce a mean correction that does not account for heterogeneity of marker effects. The animal breeding literature offers a few recent studies that consider modeling genetic heterogeneity in multibreed data, using multivariate models. However, these methods have received little attention in plant breeding where population structure can have different forms. In this article we address the problem of analyzing data from heterogeneous plant breeding populations, using three approaches: (a) a model that ignores population structure [A-genome-based best linear unbiased prediction (A-GBLUP)], (b) a stratified (i.e., within-group) analysis (W-GBLUP), and (c) a multivariate approach that uses multigroup data and accounts for heterogeneity (MG-GBLUP). The performance of the three models was assessed on three different data sets: a diversity panel of rice (Oryza sativa), a maize (Zea mays L.) half-sib panel, and a wheat (Triticum aestivum L.) data set that originated from plant breeding programs. The estimated genomic correlations between subpopulations varied from null to moderate, depending on the genetic distance between subpopulations and traits. Our assessment of prediction accuracy features cases where ignoring population structure leads to a parsimonious more powerful model as well as others where the multivariate and stratified approaches have higher predictive power. In general, the multivariate approach appeared slightly more robust than either the A- or the W-GBLUP. Copyright © 2015 by the Genetics Society of America.

  10. Analysis of interactive fixed effects dynamic linear panel regression with measurement error

    OpenAIRE

    Nayoung Lee; Hyungsik Roger Moon; Martin Weidner

    2011-01-01

    This paper studies a simple dynamic panel linear regression model with interactive fixed effects in which the variable of interest is measured with error. To estimate the dynamic coefficient, we consider the least-squares minimum distance (LS-MD) estimation method.

  11. Multicollinearity in applied economics research and the Bayesian linear regression

    OpenAIRE

    EISENSTAT, Eric

    2016-01-01

    This article revises the popular issue of collinearity amongst explanatory variables in the context of a multiple linear regression analysis, particularly in empirical studies within social science related fields. Some important interpretations and explanations are highlighted from the econometrics literature with respect to the effects of multicollinearity on statistical inference, as well as the general shortcomings of the once fervent search for methods intended to detect and mitigate thes...

  12. Robust best linear estimation for regression analysis using surrogate and instrumental variables.

    Science.gov (United States)

    Wang, C Y

    2012-04-01

    We investigate methods for regression analysis when covariates are measured with errors. In a subset of the whole cohort, a surrogate variable is available for the true unobserved exposure variable. The surrogate variable satisfies the classical measurement error model, but it may not have repeated measurements. In addition to the surrogate variables that are available among the subjects in the calibration sample, we assume that there is an instrumental variable (IV) that is available for all study subjects. An IV is correlated with the unobserved true exposure variable and hence can be useful in the estimation of the regression coefficients. We propose a robust best linear estimator that uses all the available data, which is the most efficient among a class of consistent estimators. The proposed estimator is shown to be consistent and asymptotically normal under very weak distributional assumptions. For Poisson or linear regression, the proposed estimator is consistent even if the measurement error from the surrogate or IV is heteroscedastic. Finite-sample performance of the proposed estimator is examined and compared with other estimators via intensive simulation studies. The proposed method and other methods are applied to a bladder cancer case-control study.

  13. COMPARISON OF PARTIAL LEAST SQUARES REGRESSION METHOD ALGORITHMS: NIPALS AND PLS-KERNEL AND AN APPLICATION

    Directory of Open Access Journals (Sweden)

    ELİF BULUT

    2013-06-01

    Full Text Available Partial Least Squares Regression (PLSR is a multivariate statistical method that consists of partial least squares and multiple linear regression analysis. Explanatory variables, X, having multicollinearity are reduced to components which explain the great amount of covariance between explanatory and response variable. These components are few in number and they don’t have multicollinearity problem. Then multiple linear regression analysis is applied to those components to model the response variable Y. There are various PLSR algorithms. In this study NIPALS and PLS-Kernel algorithms will be studied and illustrated on a real data set.

  14. Sparse reduced-rank regression with covariance estimation

    KAUST Repository

    Chen, Lisha

    2014-12-08

    Improving the predicting performance of the multiple response regression compared with separate linear regressions is a challenging question. On the one hand, it is desirable to seek model parsimony when facing a large number of parameters. On the other hand, for certain applications it is necessary to take into account the general covariance structure for the errors of the regression model. We assume a reduced-rank regression model and work with the likelihood function with general error covariance to achieve both objectives. In addition we propose to select relevant variables for reduced-rank regression by using a sparsity-inducing penalty, and to estimate the error covariance matrix simultaneously by using a similar penalty on the precision matrix. We develop a numerical algorithm to solve the penalized regression problem. In a simulation study and real data analysis, the new method is compared with two recent methods for multivariate regression and exhibits competitive performance in prediction and variable selection.

  15. Sparse reduced-rank regression with covariance estimation

    KAUST Repository

    Chen, Lisha; Huang, Jianhua Z.

    2014-01-01

    Improving the predicting performance of the multiple response regression compared with separate linear regressions is a challenging question. On the one hand, it is desirable to seek model parsimony when facing a large number of parameters. On the other hand, for certain applications it is necessary to take into account the general covariance structure for the errors of the regression model. We assume a reduced-rank regression model and work with the likelihood function with general error covariance to achieve both objectives. In addition we propose to select relevant variables for reduced-rank regression by using a sparsity-inducing penalty, and to estimate the error covariance matrix simultaneously by using a similar penalty on the precision matrix. We develop a numerical algorithm to solve the penalized regression problem. In a simulation study and real data analysis, the new method is compared with two recent methods for multivariate regression and exhibits competitive performance in prediction and variable selection.

  16. Real estate value prediction using multivariate regression models

    Science.gov (United States)

    Manjula, R.; Jain, Shubham; Srivastava, Sharad; Rajiv Kher, Pranav

    2017-11-01

    The real estate market is one of the most competitive in terms of pricing and the same tends to vary significantly based on a lot of factors, hence it becomes one of the prime fields to apply the concepts of machine learning to optimize and predict the prices with high accuracy. Therefore in this paper, we present various important features to use while predicting housing prices with good accuracy. We have described regression models, using various features to have lower Residual Sum of Squares error. While using features in a regression model some feature engineering is required for better prediction. Often a set of features (multiple regressions) or polynomial regression (applying a various set of powers in the features) is used for making better model fit. For these models are expected to be susceptible towards over fitting ridge regression is used to reduce it. This paper thus directs to the best application of regression models in addition to other techniques to optimize the result.

  17. Relative Importance for Linear Regression in R: The Package relaimpo

    Directory of Open Access Journals (Sweden)

    Ulrike Gromping

    2006-09-01

    Full Text Available Relative importance is a topic that has seen a lot of interest in recent years, particularly in applied work. The R package relaimpo implements six different metrics for assessing relative importance of regressors in the linear model, two of which are recommended - averaging over orderings of regressors and a newly proposed metric (Feldman 2005 called pmvd. Apart from delivering the metrics themselves, relaimpo also provides (exploratory bootstrap confidence intervals. This paper offers a brief tutorial introduction to the package. The methods and relaimpo’s functionality are illustrated using the data set swiss that is generally available in R. The paper targets readers who have a basic understanding of multiple linear regression. For the background of more advanced aspects, references are provided.

  18. Extending multivariate distance matrix regression with an effect size measure and the asymptotic null distribution of the test statistic.

    Science.gov (United States)

    McArtor, Daniel B; Lubke, Gitta H; Bergeman, C S

    2017-12-01

    Person-centered methods are useful for studying individual differences in terms of (dis)similarities between response profiles on multivariate outcomes. Multivariate distance matrix regression (MDMR) tests the significance of associations of response profile (dis)similarities and a set of predictors using permutation tests. This paper extends MDMR by deriving and empirically validating the asymptotic null distribution of its test statistic, and by proposing an effect size for individual outcome variables, which is shown to recover true associations. These extensions alleviate the computational burden of permutation tests currently used in MDMR and render more informative results, thus making MDMR accessible to new research domains.

  19. Recursive and non-linear logistic regression: moving on from the original EuroSCORE and EuroSCORE II methodologies.

    Science.gov (United States)

    Poullis, Michael

    2014-11-01

    EuroSCORE II, despite improving on the original EuroSCORE system, has not solved all the calibration and predictability issues. Recursive, non-linear and mixed recursive and non-linear regression analysis were assessed with regard to sensitivity, specificity and predictability of the original EuroSCORE and EuroSCORE II systems. The original logistic EuroSCORE, EuroSCORE II and recursive, non-linear and mixed recursive and non-linear regression analyses of these risk models were assessed via receiver operator characteristic curves (ROC) and Hosmer-Lemeshow statistic analysis with regard to the accuracy of predicting in-hospital mortality. Analysis was performed for isolated coronary artery bypass grafts (CABGs) (n = 2913), aortic valve replacement (AVR) (n = 814), mitral valve surgery (n = 340), combined AVR and CABG (n = 517), aortic (n = 350), miscellaneous cases (n = 642), and combinations of the above cases (n = 5576). The original EuroSCORE had an ROC below 0.7 for isolated AVR and combined AVR and CABG. None of the methods described increased the ROC above 0.7. The EuroSCORE II risk model had an ROC below 0.7 for isolated AVR only. Recursive regression, non-linear regression, and mixed recursive and non-linear regression all increased the ROC above 0.7 for isolated AVR. The original EuroSCORE had a Hosmer-Lemeshow statistic that was above 0.05 for all patients and the subgroups analysed. All of the techniques markedly increased the Hosmer-Lemeshow statistic. The EuroSCORE II risk model had a Hosmer-Lemeshow statistic that was significant for all patients (P linear regression failed to improve on the original Hosmer-Lemeshow statistic. The mixed recursive and non-linear regression using the EuroSCORE II risk model was the only model that produced an ROC of 0.7 or above for all patients and procedures and had a Hosmer-Lemeshow statistic that was highly non-significant. The original EuroSCORE and the EuroSCORE II risk models do not have adequate ROC and Hosmer

  20. [Prediction model of health workforce and beds in county hospitals of Hunan by multiple linear regression].

    Science.gov (United States)

    Ling, Ru; Liu, Jiawang

    2011-12-01

    To construct prediction model for health workforce and hospital beds in county hospitals of Hunan by multiple linear regression. We surveyed 16 counties in Hunan with stratified random sampling according to uniform questionnaires,and multiple linear regression analysis with 20 quotas selected by literature view was done. Independent variables in the multiple linear regression model on medical personnels in county hospitals included the counties' urban residents' income, crude death rate, medical beds, business occupancy, professional equipment value, the number of devices valued above 10 000 yuan, fixed assets, long-term debt, medical income, medical expenses, outpatient and emergency visits, hospital visits, actual available bed days, and utilization rate of hospital beds. Independent variables in the multiple linear regression model on county hospital beds included the the population of aged 65 and above in the counties, disposable income of urban residents, medical personnel of medical institutions in county area, business occupancy, the total value of professional equipment, fixed assets, long-term debt, medical income, medical expenses, outpatient and emergency visits, hospital visits, actual available bed days, utilization rate of hospital beds, and length of hospitalization. The prediction model shows good explanatory and fitting, and may be used for short- and mid-term forecasting.

  1. Determination of boiling point of petrochemicals by gas chromatography-mass spectrometry and multivariate regression analysis of structural activity relationship.

    Science.gov (United States)

    Fakayode, Sayo O; Mitchell, Breanna S; Pollard, David A

    2014-08-01

    Accurate understanding of analyte boiling points (BP) is of critical importance in gas chromatographic (GC) separation and crude oil refinery operation in petrochemical industries. This study reported the first combined use of GC separation and partial-least-square (PLS1) multivariate regression analysis of petrochemical structural activity relationship (SAR) for accurate BP determination of two commercially available (D3710 and MA VHP) calibration gas mix samples. The results of the BP determination using PLS1 multivariate regression were further compared with the results of traditional simulated distillation method of BP determination. The developed PLS1 regression was able to correctly predict analytes BP in D3710 and MA VHP calibration gas mix samples, with a root-mean-square-%-relative-error (RMS%RE) of 6.4%, and 10.8% respectively. In contrast, the overall RMS%RE of 32.9% and 40.4%, respectively obtained for BP determination in D3710 and MA VHP using a traditional simulated distillation method were approximately four times larger than the corresponding RMS%RE of BP prediction using MRA, demonstrating the better predictive ability of MRA. The reported method is rapid, robust, and promising, and can be potentially used routinely for fast analysis, pattern recognition, and analyte BP determination in petrochemical industries. Copyright © 2014 Elsevier B.V. All rights reserved.

  2. Linear and support vector regressions based on geometrical correlation of data

    Directory of Open Access Journals (Sweden)

    Kaijun Wang

    2007-10-01

    Full Text Available Linear regression (LR and support vector regression (SVR are widely used in data analysis. Geometrical correlation learning (GcLearn was proposed recently to improve the predictive ability of LR and SVR through mining and using correlations between data of a variable (inner correlation. This paper theoretically analyzes prediction performance of the GcLearn method and proves that GcLearn LR and SVR will have better prediction performance than traditional LR and SVR for prediction tasks when good inner correlations are obtained and predictions by traditional LR and SVR are far away from their neighbor training data under inner correlation. This gives the applicable condition of GcLearn method.

  3. Inference for multivariate regression model based on multiply imputed synthetic data generated via posterior predictive sampling

    Science.gov (United States)

    Moura, Ricardo; Sinha, Bimal; Coelho, Carlos A.

    2017-06-01

    The recent popularity of the use of synthetic data as a Statistical Disclosure Control technique has enabled the development of several methods of generating and analyzing such data, but almost always relying in asymptotic distributions and in consequence being not adequate for small sample datasets. Thus, a likelihood-based exact inference procedure is derived for the matrix of regression coefficients of the multivariate regression model, for multiply imputed synthetic data generated via Posterior Predictive Sampling. Since it is based in exact distributions this procedure may even be used in small sample datasets. Simulation studies compare the results obtained from the proposed exact inferential procedure with the results obtained from an adaptation of Reiters combination rule to multiply imputed synthetic datasets and an application to the 2000 Current Population Survey is discussed.

  4. Regularized Label Relaxation Linear Regression.

    Science.gov (United States)

    Fang, Xiaozhao; Xu, Yong; Li, Xuelong; Lai, Zhihui; Wong, Wai Keung; Fang, Bingwu

    2018-04-01

    Linear regression (LR) and some of its variants have been widely used for classification problems. Most of these methods assume that during the learning phase, the training samples can be exactly transformed into a strict binary label matrix, which has too little freedom to fit the labels adequately. To address this problem, in this paper, we propose a novel regularized label relaxation LR method, which has the following notable characteristics. First, the proposed method relaxes the strict binary label matrix into a slack variable matrix by introducing a nonnegative label relaxation matrix into LR, which provides more freedom to fit the labels and simultaneously enlarges the margins between different classes as much as possible. Second, the proposed method constructs the class compactness graph based on manifold learning and uses it as the regularization item to avoid the problem of overfitting. The class compactness graph is used to ensure that the samples sharing the same labels can be kept close after they are transformed. Two different algorithms, which are, respectively, based on -norm and -norm loss functions are devised. These two algorithms have compact closed-form solutions in each iteration so that they are easily implemented. Extensive experiments show that these two algorithms outperform the state-of-the-art algorithms in terms of the classification accuracy and running time.

  5. Assessing risk factors for periodontitis using regression

    Science.gov (United States)

    Lobo Pereira, J. A.; Ferreira, Maria Cristina; Oliveira, Teresa

    2013-10-01

    Multivariate statistical analysis is indispensable to assess the associations and interactions between different factors and the risk of periodontitis. Among others, regression analysis is a statistical technique widely used in healthcare to investigate and model the relationship between variables. In our work we study the impact of socio-demographic, medical and behavioral factors on periodontal health. Using regression, linear and logistic models, we can assess the relevance, as risk factors for periodontitis disease, of the following independent variables (IVs): Age, Gender, Diabetic Status, Education, Smoking status and Plaque Index. The multiple linear regression analysis model was built to evaluate the influence of IVs on mean Attachment Loss (AL). Thus, the regression coefficients along with respective p-values will be obtained as well as the respective p-values from the significance tests. The classification of a case (individual) adopted in the logistic model was the extent of the destruction of periodontal tissues defined by an Attachment Loss greater than or equal to 4 mm in 25% (AL≥4mm/≥25%) of sites surveyed. The association measures include the Odds Ratios together with the correspondent 95% confidence intervals.

  6. Generalized multivariate Fokker-Planck equations derived from kinetic transport theory and linear nonequilibrium thermodynamics

    International Nuclear Information System (INIS)

    Frank, T.D.

    2002-01-01

    We study many particle systems in the context of mean field forces, concentration-dependent diffusion coefficients, generalized equilibrium distributions, and quantum statistics. Using kinetic transport theory and linear nonequilibrium thermodynamics we derive for these systems a generalized multivariate Fokker-Planck equation. It is shown that this Fokker-Planck equation describes relaxation processes, has stationary maximum entropy distributions, can have multiple stationary solutions and stationary solutions that differ from Boltzmann distributions

  7. Evaluation of accuracy of linear regression models in predicting urban stormwater discharge characteristics.

    Science.gov (United States)

    Madarang, Krish J; Kang, Joo-Hyon

    2014-06-01

    Stormwater runoff has been identified as a source of pollution for the environment, especially for receiving waters. In order to quantify and manage the impacts of stormwater runoff on the environment, predictive models and mathematical models have been developed. Predictive tools such as regression models have been widely used to predict stormwater discharge characteristics. Storm event characteristics, such as antecedent dry days (ADD), have been related to response variables, such as pollutant loads and concentrations. However it has been a controversial issue among many studies to consider ADD as an important variable in predicting stormwater discharge characteristics. In this study, we examined the accuracy of general linear regression models in predicting discharge characteristics of roadway runoff. A total of 17 storm events were monitored in two highway segments, located in Gwangju, Korea. Data from the monitoring were used to calibrate United States Environmental Protection Agency's Storm Water Management Model (SWMM). The calibrated SWMM was simulated for 55 storm events, and the results of total suspended solid (TSS) discharge loads and event mean concentrations (EMC) were extracted. From these data, linear regression models were developed. R(2) and p-values of the regression of ADD for both TSS loads and EMCs were investigated. Results showed that pollutant loads were better predicted than pollutant EMC in the multiple regression models. Regression may not provide the true effect of site-specific characteristics, due to uncertainty in the data. Copyright © 2014 The Research Centre for Eco-Environmental Sciences, Chinese Academy of Sciences. Published by Elsevier B.V. All rights reserved.

  8. On the degrees of freedom of reduced-rank estimators in multivariate regression.

    Science.gov (United States)

    Mukherjee, A; Chen, K; Wang, N; Zhu, J

    We study the effective degrees of freedom of a general class of reduced-rank estimators for multivariate regression in the framework of Stein's unbiased risk estimation. A finite-sample exact unbiased estimator is derived that admits a closed-form expression in terms of the thresholded singular values of the least-squares solution and hence is readily computable. The results continue to hold in the high-dimensional setting where both the predictor and the response dimensions may be larger than the sample size. The derived analytical form facilitates the investigation of theoretical properties and provides new insights into the empirical behaviour of the degrees of freedom. In particular, we examine the differences and connections between the proposed estimator and a commonly-used naive estimator. The use of the proposed estimator leads to efficient and accurate prediction risk estimation and model selection, as demonstrated by simulation studies and a data example.

  9. Modeling Pan Evaporation for Kuwait by Multiple Linear Regression

    Science.gov (United States)

    Almedeij, Jaber

    2012-01-01

    Evaporation is an important parameter for many projects related to hydrology and water resources systems. This paper constitutes the first study conducted in Kuwait to obtain empirical relations for the estimation of daily and monthly pan evaporation as functions of available meteorological data of temperature, relative humidity, and wind speed. The data used here for the modeling are daily measurements of substantial continuity coverage, within a period of 17 years between January 1993 and December 2009, which can be considered representative of the desert climate of the urban zone of the country. Multiple linear regression technique is used with a procedure of variable selection for fitting the best model forms. The correlations of evaporation with temperature and relative humidity are also transformed in order to linearize the existing curvilinear patterns of the data by using power and exponential functions, respectively. The evaporation models suggested with the best variable combinations were shown to produce results that are in a reasonable agreement with observation values. PMID:23226984

  10. A Cross-Domain Collaborative Filtering Algorithm Based on Feature Construction and Locally Weighted Linear Regression.

    Science.gov (United States)

    Yu, Xu; Lin, Jun-Yu; Jiang, Feng; Du, Jun-Wei; Han, Ji-Zhong

    2018-01-01

    Cross-domain collaborative filtering (CDCF) solves the sparsity problem by transferring rating knowledge from auxiliary domains. Obviously, different auxiliary domains have different importance to the target domain. However, previous works cannot evaluate effectively the significance of different auxiliary domains. To overcome this drawback, we propose a cross-domain collaborative filtering algorithm based on Feature Construction and Locally Weighted Linear Regression (FCLWLR). We first construct features in different domains and use these features to represent different auxiliary domains. Thus the weight computation across different domains can be converted as the weight computation across different features. Then we combine the features in the target domain and in the auxiliary domains together and convert the cross-domain recommendation problem into a regression problem. Finally, we employ a Locally Weighted Linear Regression (LWLR) model to solve the regression problem. As LWLR is a nonparametric regression method, it can effectively avoid underfitting or overfitting problem occurring in parametric regression methods. We conduct extensive experiments to show that the proposed FCLWLR algorithm is effective in addressing the data sparsity problem by transferring the useful knowledge from the auxiliary domains, as compared to many state-of-the-art single-domain or cross-domain CF methods.

  11. Estimate the contribution of incubation parameters influence egg hatchability using multiple linear regression analysis.

    Science.gov (United States)

    Khalil, Mohamed H; Shebl, Mostafa K; Kosba, Mohamed A; El-Sabrout, Karim; Zaki, Nesma

    2016-08-01

    This research was conducted to determine the most affecting parameters on hatchability of indigenous and improved local chickens' eggs. Five parameters were studied (fertility, early and late embryonic mortalities, shape index, egg weight, and egg weight loss) on four strains, namely Fayoumi, Alexandria, Matrouh, and Montazah. Multiple linear regression was performed on the studied parameters to determine the most influencing one on hatchability. The results showed significant differences in commercial and scientific hatchability among strains. Alexandria strain has the highest significant commercial hatchability (80.70%). Regarding the studied strains, highly significant differences in hatching chick weight among strains were observed. Using multiple linear regression analysis, fertility made the greatest percent contribution (71.31%) to hatchability, and the lowest percent contributions were made by shape index and egg weight loss. A prediction of hatchability using multiple regression analysis could be a good tool to improve hatchability percentage in chickens.

  12. User's Guide to the Weighted-Multiple-Linear Regression Program (WREG version 1.0)

    Science.gov (United States)

    Eng, Ken; Chen, Yin-Yu; Kiang, Julie.E.

    2009-01-01

    Streamflow is not measured at every location in a stream network. Yet hydrologists, State and local agencies, and the general public still seek to know streamflow characteristics, such as mean annual flow or flood flows with different exceedance probabilities, at ungaged basins. The goals of this guide are to introduce and familiarize the user with the weighted multiple-linear regression (WREG) program, and to also provide the theoretical background for program features. The program is intended to be used to develop a regional estimation equation for streamflow characteristics that can be applied at an ungaged basin, or to improve the corresponding estimate at continuous-record streamflow gages with short records. The regional estimation equation results from a multiple-linear regression that relates the observable basin characteristics, such as drainage area, to streamflow characteristics.

  13. Linear Regression Analysis

    CERN Document Server

    Seber, George A F

    2012-01-01

    Concise, mathematically clear, and comprehensive treatment of the subject.* Expanded coverage of diagnostics and methods of model fitting.* Requires no specialized knowledge beyond a good grasp of matrix algebra and some acquaintance with straight-line regression and simple analysis of variance models.* More than 200 problems throughout the book plus outline solutions for the exercises.* This revision has been extensively class-tested.

  14. Introduction to statistical modelling 2: categorical variables and interactions in linear regression.

    Science.gov (United States)

    Lunt, Mark

    2015-07-01

    In the first article in this series we explored the use of linear regression to predict an outcome variable from a number of predictive factors. It assumed that the predictive factors were measured on an interval scale. However, this article shows how categorical variables can also be included in a linear regression model, enabling predictions to be made separately for different groups and allowing for testing the hypothesis that the outcome differs between groups. The use of interaction terms to measure whether the effect of a particular predictor variable differs between groups is also explained. An alternative approach to testing the difference between groups of the effect of a given predictor, which consists of measuring the effect in each group separately and seeing whether the statistical significance differs between the groups, is shown to be misleading. © The Author 2013. Published by Oxford University Press on behalf of the British Society for Rheumatology. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  15. Pleiotropy analysis of quantitative traits at gene level by multivariate functional linear models.

    Science.gov (United States)

    Wang, Yifan; Liu, Aiyi; Mills, James L; Boehnke, Michael; Wilson, Alexander F; Bailey-Wilson, Joan E; Xiong, Momiao; Wu, Colin O; Fan, Ruzong

    2015-05-01

    In genetics, pleiotropy describes the genetic effect of a single gene on multiple phenotypic traits. A common approach is to analyze the phenotypic traits separately using univariate analyses and combine the test results through multiple comparisons. This approach may lead to low power. Multivariate functional linear models are developed to connect genetic variant data to multiple quantitative traits adjusting for covariates for a unified analysis. Three types of approximate F-distribution tests based on Pillai-Bartlett trace, Hotelling-Lawley trace, and Wilks's Lambda are introduced to test for association between multiple quantitative traits and multiple genetic variants in one genetic region. The approximate F-distribution tests provide much more significant results than those of F-tests of univariate analysis and optimal sequence kernel association test (SKAT-O). Extensive simulations were performed to evaluate the false positive rates and power performance of the proposed models and tests. We show that the approximate F-distribution tests control the type I error rates very well. Overall, simultaneous analysis of multiple traits can increase power performance compared to an individual test of each trait. The proposed methods were applied to analyze (1) four lipid traits in eight European cohorts, and (2) three biochemical traits in the Trinity Students Study. The approximate F-distribution tests provide much more significant results than those of F-tests of univariate analysis and SKAT-O for the three biochemical traits. The approximate F-distribution tests of the proposed functional linear models are more sensitive than those of the traditional multivariate linear models that in turn are more sensitive than SKAT-O in the univariate case. The analysis of the four lipid traits and the three biochemical traits detects more association than SKAT-O in the univariate case. © 2015 WILEY PERIODICALS, INC.

  16. A note on the use of multiple linear regression in molecular ecology.

    Science.gov (United States)

    Frasier, Timothy R

    2016-03-01

    Multiple linear regression analyses (also often referred to as generalized linear models--GLMs, or generalized linear mixed models--GLMMs) are widely used in the analysis of data in molecular ecology, often to assess the relative effects of genetic characteristics on individual fitness or traits, or how environmental characteristics influence patterns of genetic differentiation. However, the coefficients resulting from multiple regression analyses are sometimes misinterpreted, which can lead to incorrect interpretations and conclusions within individual studies, and can propagate to wider-spread errors in the general understanding of a topic. The primary issue revolves around the interpretation of coefficients for independent variables when interaction terms are also included in the analyses. In this scenario, the coefficients associated with each independent variable are often interpreted as the independent effect of each predictor variable on the predicted variable. However, this interpretation is incorrect. The correct interpretation is that these coefficients represent the effect of each predictor variable on the predicted variable when all other predictor variables are zero. This difference may sound subtle, but the ramifications cannot be overstated. Here, my goals are to raise awareness of this issue, to demonstrate and emphasize the problems that can result and to provide alternative approaches for obtaining the desired information. © 2015 John Wiley & Sons Ltd.

  17. An evaluation of bias in propensity score-adjusted non-linear regression models.

    Science.gov (United States)

    Wan, Fei; Mitra, Nandita

    2018-03-01

    Propensity score methods are commonly used to adjust for observed confounding when estimating the conditional treatment effect in observational studies. One popular method, covariate adjustment of the propensity score in a regression model, has been empirically shown to be biased in non-linear models. However, no compelling underlying theoretical reason has been presented. We propose a new framework to investigate bias and consistency of propensity score-adjusted treatment effects in non-linear models that uses a simple geometric approach to forge a link between the consistency of the propensity score estimator and the collapsibility of non-linear models. Under this framework, we demonstrate that adjustment of the propensity score in an outcome model results in the decomposition of observed covariates into the propensity score and a remainder term. Omission of this remainder term from a non-collapsible regression model leads to biased estimates of the conditional odds ratio and conditional hazard ratio, but not for the conditional rate ratio. We further show, via simulation studies, that the bias in these propensity score-adjusted estimators increases with larger treatment effect size, larger covariate effects, and increasing dissimilarity between the coefficients of the covariates in the treatment model versus the outcome model.

  18. Improving the Prediction of Total Surgical Procedure Time Using Linear Regression Modeling

    Directory of Open Access Journals (Sweden)

    Eric R. Edelman

    2017-06-01

    Full Text Available For efficient utilization of operating rooms (ORs, accurate schedules of assigned block time and sequences of patient cases need to be made. The quality of these planning tools is dependent on the accurate prediction of total procedure time (TPT per case. In this paper, we attempt to improve the accuracy of TPT predictions by using linear regression models based on estimated surgeon-controlled time (eSCT and other variables relevant to TPT. We extracted data from a Dutch benchmarking database of all surgeries performed in six academic hospitals in The Netherlands from 2012 till 2016. The final dataset consisted of 79,983 records, describing 199,772 h of total OR time. Potential predictors of TPT that were included in the subsequent analysis were eSCT, patient age, type of operation, American Society of Anesthesiologists (ASA physical status classification, and type of anesthesia used. First, we computed the predicted TPT based on a previously described fixed ratio model for each record, multiplying eSCT by 1.33. This number is based on the research performed by van Veen-Berkx et al., which showed that 33% of SCT is generally a good approximation of anesthesia-controlled time (ACT. We then systematically tested all possible linear regression models to predict TPT using eSCT in combination with the other available independent variables. In addition, all regression models were again tested without eSCT as a predictor to predict ACT separately (which leads to TPT by adding SCT. TPT was most accurately predicted using a linear regression model based on the independent variables eSCT, type of operation, ASA classification, and type of anesthesia. This model performed significantly better than the fixed ratio model and the method of predicting ACT separately. Making use of these more accurate predictions in planning and sequencing algorithms may enable an increase in utilization of ORs, leading to significant financial and productivity related

  19. Improving the Prediction of Total Surgical Procedure Time Using Linear Regression Modeling.

    Science.gov (United States)

    Edelman, Eric R; van Kuijk, Sander M J; Hamaekers, Ankie E W; de Korte, Marcel J M; van Merode, Godefridus G; Buhre, Wolfgang F F A

    2017-01-01

    For efficient utilization of operating rooms (ORs), accurate schedules of assigned block time and sequences of patient cases need to be made. The quality of these planning tools is dependent on the accurate prediction of total procedure time (TPT) per case. In this paper, we attempt to improve the accuracy of TPT predictions by using linear regression models based on estimated surgeon-controlled time (eSCT) and other variables relevant to TPT. We extracted data from a Dutch benchmarking database of all surgeries performed in six academic hospitals in The Netherlands from 2012 till 2016. The final dataset consisted of 79,983 records, describing 199,772 h of total OR time. Potential predictors of TPT that were included in the subsequent analysis were eSCT, patient age, type of operation, American Society of Anesthesiologists (ASA) physical status classification, and type of anesthesia used. First, we computed the predicted TPT based on a previously described fixed ratio model for each record, multiplying eSCT by 1.33. This number is based on the research performed by van Veen-Berkx et al., which showed that 33% of SCT is generally a good approximation of anesthesia-controlled time (ACT). We then systematically tested all possible linear regression models to predict TPT using eSCT in combination with the other available independent variables. In addition, all regression models were again tested without eSCT as a predictor to predict ACT separately (which leads to TPT by adding SCT). TPT was most accurately predicted using a linear regression model based on the independent variables eSCT, type of operation, ASA classification, and type of anesthesia. This model performed significantly better than the fixed ratio model and the method of predicting ACT separately. Making use of these more accurate predictions in planning and sequencing algorithms may enable an increase in utilization of ORs, leading to significant financial and productivity related benefits.

  20. Predicting Fuel Ignition Quality Using 1H NMR Spectroscopy and Multiple Linear Regression

    KAUST Repository

    Abdul Jameel, Abdul Gani; Naser, Nimal; Emwas, Abdul-Hamid M.; Dooley, Stephen; Sarathy, Mani

    2016-01-01

    An improved model for the prediction of ignition quality of hydrocarbon fuels has been developed using 1H nuclear magnetic resonance (NMR) spectroscopy and multiple linear regression (MLR) modeling. Cetane number (CN) and derived cetane number (DCN

  1. Modeling Fire Occurrence at the City Scale: A Comparison between Geographically Weighted Regression and Global Linear Regression.

    Science.gov (United States)

    Song, Chao; Kwan, Mei-Po; Zhu, Jiping

    2017-04-08

    An increasing number of fires are occurring with the rapid development of cities, resulting in increased risk for human beings and the environment. This study compares geographically weighted regression-based models, including geographically weighted regression (GWR) and geographically and temporally weighted regression (GTWR), which integrates spatial and temporal effects and global linear regression models (LM) for modeling fire risk at the city scale. The results show that the road density and the spatial distribution of enterprises have the strongest influences on fire risk, which implies that we should focus on areas where roads and enterprises are densely clustered. In addition, locations with a large number of enterprises have fewer fire ignition records, probably because of strict management and prevention measures. A changing number of significant variables across space indicate that heterogeneity mainly exists in the northern and eastern rural and suburban areas of Hefei city, where human-related facilities or road construction are only clustered in the city sub-centers. GTWR can capture small changes in the spatiotemporal heterogeneity of the variables while GWR and LM cannot. An approach that integrates space and time enables us to better understand the dynamic changes in fire risk. Thus governments can use the results to manage fire safety at the city scale.

  2. Predicting recycling behaviour: Comparison of a linear regression model and a fuzzy logic model.

    Science.gov (United States)

    Vesely, Stepan; Klöckner, Christian A; Dohnal, Mirko

    2016-03-01

    In this paper we demonstrate that fuzzy logic can provide a better tool for predicting recycling behaviour than the customarily used linear regression. To show this, we take a set of empirical data on recycling behaviour (N=664), which we randomly divide into two halves. The first half is used to estimate a linear regression model of recycling behaviour, and to develop a fuzzy logic model of recycling behaviour. As the first comparison, the fit of both models to the data included in estimation of the models (N=332) is evaluated. As the second comparison, predictive accuracy of both models for "new" cases (hold-out data not included in building the models, N=332) is assessed. In both cases, the fuzzy logic model significantly outperforms the regression model in terms of fit. To conclude, when accurate predictions of recycling and possibly other environmental behaviours are needed, fuzzy logic modelling seems to be a promising technique. Copyright © 2015 Elsevier Ltd. All rights reserved.

  3. Reduced Rank Regression

    DEFF Research Database (Denmark)

    Johansen, Søren

    2008-01-01

    The reduced rank regression model is a multivariate regression model with a coefficient matrix with reduced rank. The reduced rank regression algorithm is an estimation procedure, which estimates the reduced rank regression model. It is related to canonical correlations and involves calculating...

  4. Computational Tools for Probing Interactions in Multiple Linear Regression, Multilevel Modeling, and Latent Curve Analysis

    Science.gov (United States)

    Preacher, Kristopher J.; Curran, Patrick J.; Bauer, Daniel J.

    2006-01-01

    Simple slopes, regions of significance, and confidence bands are commonly used to evaluate interactions in multiple linear regression (MLR) models, and the use of these techniques has recently been extended to multilevel or hierarchical linear modeling (HLM) and latent curve analysis (LCA). However, conducting these tests and plotting the…

  5. A Cross-Domain Collaborative Filtering Algorithm Based on Feature Construction and Locally Weighted Linear Regression

    Directory of Open Access Journals (Sweden)

    Xu Yu

    2018-01-01

    Full Text Available Cross-domain collaborative filtering (CDCF solves the sparsity problem by transferring rating knowledge from auxiliary domains. Obviously, different auxiliary domains have different importance to the target domain. However, previous works cannot evaluate effectively the significance of different auxiliary domains. To overcome this drawback, we propose a cross-domain collaborative filtering algorithm based on Feature Construction and Locally Weighted Linear Regression (FCLWLR. We first construct features in different domains and use these features to represent different auxiliary domains. Thus the weight computation across different domains can be converted as the weight computation across different features. Then we combine the features in the target domain and in the auxiliary domains together and convert the cross-domain recommendation problem into a regression problem. Finally, we employ a Locally Weighted Linear Regression (LWLR model to solve the regression problem. As LWLR is a nonparametric regression method, it can effectively avoid underfitting or overfitting problem occurring in parametric regression methods. We conduct extensive experiments to show that the proposed FCLWLR algorithm is effective in addressing the data sparsity problem by transferring the useful knowledge from the auxiliary domains, as compared to many state-of-the-art single-domain or cross-domain CF methods.

  6. Carbon 13 nuclear magnetic resonance chemical shifts empiric calculations of polymers by multi linear regression and molecular modeling

    International Nuclear Information System (INIS)

    Da Silva Pinto, P.S.; Eustache, R.P.; Audenaert, M.; Bernassau, J.M.

    1996-01-01

    This work deals with carbon 13 nuclear magnetic resonance chemical shifts empiric calculations by multi linear regression and molecular modeling. The multi linear regression is indeed one way to obtain an equation able to describe the behaviour of the chemical shift for some molecules which are in the data base (rigid molecules with carbons). The methodology consists of structures describer parameters definition which can be bound to carbon 13 chemical shift known for these molecules. Then, the linear regression is used to determine the equation significant parameters. This one can be extrapolated to molecules which presents some resemblances with those of the data base. (O.L.). 20 refs., 4 figs., 1 tab

  7. SOME STATISTICAL ISSUES RELATED TO MULTIPLE LINEAR REGRESSION MODELING OF BEACH BACTERIA CONCENTRATIONS

    Science.gov (United States)

    As a fast and effective technique, the multiple linear regression (MLR) method has been widely used in modeling and prediction of beach bacteria concentrations. Among previous works on this subject, however, several issues were insufficiently or inconsistently addressed. Those is...

  8. Multiple regression technique for Pth degree polynominals with and without linear cross products

    Science.gov (United States)

    Davis, J. W.

    1973-01-01

    A multiple regression technique was developed by which the nonlinear behavior of specified independent variables can be related to a given dependent variable. The polynomial expression can be of Pth degree and can incorporate N independent variables. Two cases are treated such that mathematical models can be studied both with and without linear cross products. The resulting surface fits can be used to summarize trends for a given phenomenon and provide a mathematical relationship for subsequent analysis. To implement this technique, separate computer programs were developed for the case without linear cross products and for the case incorporating such cross products which evaluate the various constants in the model regression equation. In addition, the significance of the estimated regression equation is considered and the standard deviation, the F statistic, the maximum absolute percent error, and the average of the absolute values of the percent of error evaluated. The computer programs and their manner of utilization are described. Sample problems are included to illustrate the use and capability of the technique which show the output formats and typical plots comparing computer results to each set of input data.

  9. Multivariate Multiple Regression Models for a Big Data-Empowered SON Framework in Mobile Wireless Networks

    Directory of Open Access Journals (Sweden)

    Yoonsu Shin

    2016-01-01

    Full Text Available In the 5G era, the operational cost of mobile wireless networks will significantly increase. Further, massive network capacity and zero latency will be needed because everything will be connected to mobile networks. Thus, self-organizing networks (SON are needed, which expedite automatic operation of mobile wireless networks, but have challenges to satisfy the 5G requirements. Therefore, researchers have proposed a framework to empower SON using big data. The recent framework of a big data-empowered SON analyzes the relationship between key performance indicators (KPIs and related network parameters (NPs using machine-learning tools, and it develops regression models using a Gaussian process with those parameters. The problem, however, is that the methods of finding the NPs related to the KPIs differ individually. Moreover, the Gaussian process regression model cannot determine the relationship between a KPI and its various related NPs. In this paper, to solve these problems, we proposed multivariate multiple regression models to determine the relationship between various KPIs and NPs. If we assume one KPI and multiple NPs as one set, the proposed models help us process multiple sets at one time. Also, we can find out whether some KPIs are conflicting or not. We implement the proposed models using MapReduce.

  10. Improvement of Storm Forecasts Using Gridded Bayesian Linear Regression for Northeast United States

    Science.gov (United States)

    Yang, J.; Astitha, M.; Schwartz, C. S.

    2017-12-01

    Bayesian linear regression (BLR) is a post-processing technique in which regression coefficients are derived and used to correct raw forecasts based on pairs of observation-model values. This study presents the development and application of a gridded Bayesian linear regression (GBLR) as a new post-processing technique to improve numerical weather prediction (NWP) of rain and wind storm forecasts over northeast United States. Ten controlled variables produced from ten ensemble members of the National Center for Atmospheric Research (NCAR) real-time prediction system are used for a GBLR model. In the GBLR framework, leave-one-storm-out cross-validation is utilized to study the performances of the post-processing technique in a database composed of 92 storms. To estimate the regression coefficients of the GBLR, optimization procedures that minimize the systematic and random error of predicted atmospheric variables (wind speed, precipitation, etc.) are implemented for the modeled-observed pairs of training storms. The regression coefficients calculated for meteorological stations of the National Weather Service are interpolated back to the model domain. An analysis of forecast improvements based on error reductions during the storms will demonstrate the value of GBLR approach. This presentation will also illustrate how the variances are optimized for the training partition in GBLR and discuss the verification strategy for grid points where no observations are available. The new post-processing technique is successful in improving wind speed and precipitation storm forecasts using past event-based data and has the potential to be implemented in real-time.

  11. Multivariate strategies in functional magnetic resonance imaging

    DEFF Research Database (Denmark)

    Hansen, Lars Kai

    2007-01-01

    We discuss aspects of multivariate fMRI modeling, including the statistical evaluation of multivariate models and means for dimensional reduction. In a case study we analyze linear and non-linear dimensional reduction tools in the context of a `mind reading' predictive multivariate fMRI model....

  12. [Comparison of application of Cochran-Armitage trend test and linear regression analysis for rate trend analysis in epidemiology study].

    Science.gov (United States)

    Wang, D Z; Wang, C; Shen, C F; Zhang, Y; Zhang, H; Song, G D; Xue, X D; Xu, Z L; Zhang, S; Jiang, G H

    2017-05-10

    We described the time trend of acute myocardial infarction (AMI) from 1999 to 2013 in Tianjin incidence rate with Cochran-Armitage trend (CAT) test and linear regression analysis, and the results were compared. Based on actual population, CAT test had much stronger statistical power than linear regression analysis for both overall incidence trend and age specific incidence trend (Cochran-Armitage trend P valuelinear regression P value). The statistical power of CAT test decreased, while the result of linear regression analysis remained the same when population size was reduced by 100 times and AMI incidence rate remained unchanged. The two statistical methods have their advantages and disadvantages. It is necessary to choose statistical method according the fitting degree of data, or comprehensively analyze the results of two methods.

  13. [Use of multiple regression models in observational studies (1970-2013) and requirements of the STROBE guidelines in Spanish scientific journals].

    Science.gov (United States)

    Real, J; Cleries, R; Forné, C; Roso-Llorach, A; Martínez-Sánchez, J M

    In medicine and biomedical research, statistical techniques like logistic, linear, Cox and Poisson regression are widely known. The main objective is to describe the evolution of multivariate techniques used in observational studies indexed in PubMed (1970-2013), and to check the requirements of the STROBE guidelines in the author guidelines in Spanish journals indexed in PubMed. A targeted PubMed search was performed to identify papers that used logistic linear Cox and Poisson models. Furthermore, a review was also made of the author guidelines of journals published in Spain and indexed in PubMed and Web of Science. Only 6.1% of the indexed manuscripts included a term related to multivariate analysis, increasing from 0.14% in 1980 to 12.3% in 2013. In 2013, 6.7, 2.5, 3.5, and 0.31% of the manuscripts contained terms related to logistic, linear, Cox and Poisson regression, respectively. On the other hand, 12.8% of journals author guidelines explicitly recommend to follow the STROBE guidelines, and 35.9% recommend the CONSORT guideline. A low percentage of Spanish scientific journals indexed in PubMed include the STROBE statement requirement in the author guidelines. Multivariate regression models in published observational studies such as logistic regression, linear, Cox and Poisson are increasingly used both at international level, as well as in journals published in Spanish. Copyright © 2015 Sociedad Española de Médicos de Atención Primaria (SEMERGEN). Publicado por Elsevier España, S.L.U. All rights reserved.

  14. Multivariate research in areas of phosphorus cast-iron brake shoes manufacturing using the statistical analysis and the multiple regression equations

    Science.gov (United States)

    Kiss, I.; Cioată, V. G.; Alexa, V.; Raţiu, S. A.

    2017-05-01

    The braking system is one of the most important and complex subsystems of railway vehicles, especially when it comes for safety. Therefore, installing efficient safe brakes on the modern railway vehicles is essential. Nowadays is devoted attention to solving problems connected with using high performance brake materials and its impact on thermal and mechanical loading of railway wheels. The main factor that influences the selection of a friction material for railway applications is the performance criterion, due to the interaction between the brake block and the wheel produce complex thermos-mechanical phenomena. In this work, the investigated subjects are the cast-iron brake shoes, which are still widely used on freight wagons. Therefore, the cast-iron brake shoes - with lamellar graphite and with a high content of phosphorus (0.8-1.1%) - need a special investigation. In order to establish the optimal condition for the cast-iron brake shoes we proposed a mathematical modelling study by using the statistical analysis and multiple regression equations. Multivariate research is important in areas of cast-iron brake shoes manufacturing, because many variables interact with each other simultaneously. Multivariate visualization comes to the fore when researchers have difficulties in comprehending many dimensions at one time. Technological data (hardness and chemical composition) obtained from cast-iron brake shoes were used for this purpose. In order to settle the multiple correlation between the hardness of the cast-iron brake shoes, and the chemical compositions elements several model of regression equation types has been proposed. Because a three-dimensional surface with variables on three axes is a common way to illustrate multivariate data, in which the maximum and minimum values are easily highlighted, we plotted graphical representation of the regression equations in order to explain interaction of the variables and locate the optimal level of each variable for

  15. USE OF THE SIMPLE LINEAR REGRESSION MODEL IN MACRO-ECONOMICAL ANALYSES

    Directory of Open Access Journals (Sweden)

    Constantin ANGHELACHE

    2011-10-01

    Full Text Available The article presents the fundamental aspects of the linear regression, as a toolbox which can be used in macroeconomic analyses. The article describes the estimation of the parameters, the statistical tests used, the homoscesasticity and heteroskedasticity. The use of econometrics instrument in macroeconomics is an important factor that guarantees the quality of the models, analyses, results and possible interpretation that can be drawn at this level.

  16. Single camera multi-view anthropometric measurement of human height and mid-upper arm circumference using linear regression.

    Science.gov (United States)

    Liu, Yingying; Sowmya, Arcot; Khamis, Heba

    2018-01-01

    Manually measured anthropometric quantities are used in many applications including human malnutrition assessment. Training is required to collect anthropometric measurements manually, which is not ideal in resource-constrained environments. Photogrammetric methods have been gaining attention in recent years, due to the availability and affordability of digital cameras. The primary goal is to demonstrate that height and mid-upper arm circumference (MUAC)-indicators of malnutrition-can be accurately estimated by applying linear regression to distance measurements from photographs of participants taken from five views, and determine the optimal view combinations. A secondary goal is to observe the effect on estimate error of two approaches which reduce complexity of the setup, computational requirements and the expertise required of the observer. Thirty-one participants (11 female, 20 male; 18-37 years) were photographed from five views. Distances were computed using both camera calibration and reference object techniques from manually annotated photos. To estimate height, linear regression was applied to the distances between the top of the participants head and the floor, as well as the height of a bounding box enclosing the participant's silhouette which eliminates the need to identify the floor. To estimate MUAC, linear regression was applied to the mid-upper arm width. Estimates were computed for all view combinations and performance was compared to other photogrammetric methods from the literature-linear distance method for height, and shape models for MUAC. The mean absolute difference (MAD) between the linear regression estimates and manual measurements were smaller compared to other methods. For the optimal view combinations (smallest MAD), the technical error of measurement and coefficient of reliability also indicate the linear regression methods are more reliable. The optimal view combination was the front and side views. When estimating height by linear

  17. Robust linear registration of CT images using random regression forests

    Science.gov (United States)

    Konukoglu, Ender; Criminisi, Antonio; Pathak, Sayan; Robertson, Duncan; White, Steve; Haynor, David; Siddiqui, Khan

    2011-03-01

    Global linear registration is a necessary first step for many different tasks in medical image analysis. Comparing longitudinal studies1, cross-modality fusion2, and many other applications depend heavily on the success of the automatic registration. The robustness and efficiency of this step is crucial as it affects all subsequent operations. Most common techniques cast the linear registration problem as the minimization of a global energy function based on the image intensities. Although these algorithms have proved useful, their robustness in fully automated scenarios is still an open question. In fact, the optimization step often gets caught in local minima yielding unsatisfactory results. Recent algorithms constrain the space of registration parameters by exploiting implicit or explicit organ segmentations, thus increasing robustness4,5. In this work we propose a novel robust algorithm for automatic global linear image registration. Our method uses random regression forests to estimate posterior probability distributions for the locations of anatomical structures - represented as axis aligned bounding boxes6. These posterior distributions are later integrated in a global linear registration algorithm. The biggest advantage of our algorithm is that it does not require pre-defined segmentations or regions. Yet it yields robust registration results. We compare the robustness of our algorithm with that of the state of the art Elastix toolbox7. Validation is performed via 1464 pair-wise registrations in a database of very diverse 3D CT images. We show that our method decreases the "failure" rate of the global linear registration from 12.5% (Elastix) to only 1.9%.

  18. Using the fuzzy linear regression method to benchmark the energy efficiency of commercial buildings

    International Nuclear Information System (INIS)

    Chung, William

    2012-01-01

    Highlights: ► Fuzzy linear regression method is used for developing benchmarking systems. ► The systems can be used to benchmark energy efficiency of commercial buildings. ► The resulting benchmarking model can be used by public users. ► The resulting benchmarking model can capture the fuzzy nature of input–output data. -- Abstract: Benchmarking systems from a sample of reference buildings need to be developed to conduct benchmarking processes for the energy efficiency of commercial buildings. However, not all benchmarking systems can be adopted by public users (i.e., other non-reference building owners) because of the different methods in developing such systems. An approach for benchmarking the energy efficiency of commercial buildings using statistical regression analysis to normalize other factors, such as management performance, was developed in a previous work. However, the field data given by experts can be regarded as a distribution of possibility. Thus, the previous work may not be adequate to handle such fuzzy input–output data. Consequently, a number of fuzzy structures cannot be fully captured by statistical regression analysis. This present paper proposes the use of fuzzy linear regression analysis to develop a benchmarking process, the resulting model of which can be used by public users. An illustrative example is given as well.

  19. Least median of squares and iteratively re-weighted least squares as robust linear regression methods for fluorimetric determination of α-lipoic acid in capsules in ideal and non-ideal cases of linearity.

    Science.gov (United States)

    Korany, Mohamed A; Gazy, Azza A; Khamis, Essam F; Ragab, Marwa A A; Kamal, Miranda F

    2018-03-26

    This study outlines two robust regression approaches, namely least median of squares (LMS) and iteratively re-weighted least squares (IRLS) to investigate their application in instrument analysis of nutraceuticals (that is, fluorescence quenching of merbromin reagent upon lipoic acid addition). These robust regression methods were used to calculate calibration data from the fluorescence quenching reaction (∆F and F-ratio) under ideal or non-ideal linearity conditions. For each condition, data were treated using three regression fittings: Ordinary Least Squares (OLS), LMS and IRLS. Assessment of linearity, limits of detection (LOD) and quantitation (LOQ), accuracy and precision were carefully studied for each condition. LMS and IRLS regression line fittings showed significant improvement in correlation coefficients and all regression parameters for both methods and both conditions. In the ideal linearity condition, the intercept and slope changed insignificantly, but a dramatic change was observed for the non-ideal condition and linearity intercept. Under both linearity conditions, LOD and LOQ values after the robust regression line fitting of data were lower than those obtained before data treatment. The results obtained after statistical treatment indicated that the linearity ranges for drug determination could be expanded to lower limits of quantitation by enhancing the regression equation parameters after data treatment. Analysis results for lipoic acid in capsules, using both fluorimetric methods, treated by parametric OLS and after treatment by robust LMS and IRLS were compared for both linearity conditions. Copyright © 2018 John Wiley & Sons, Ltd.

  20. Piecewise linear regression techniques to analyze the timing of head coach dismissals in Dutch soccer clubs

    NARCIS (Netherlands)

    Schryver, T. de; Eisinga, R.

    2010-01-01

    The key question in research on dismissals of head coaches in sports clubs is not whether they should happen but when they will happen. This paper applies piecewise linear regression to advance our understanding of the timing of head coach dismissals. Essentially, the regression sacrifices degrees

  1. An Introduction to Graphical and Mathematical Methods for Detecting Heteroscedasticity in Linear Regression.

    Science.gov (United States)

    Thompson, Russel L.

    Homoscedasticity is an important assumption of linear regression. This paper explains what it is and why it is important to the researcher. Graphical and mathematical methods for testing the homoscedasticity assumption are demonstrated. Sources of homoscedasticity and types of homoscedasticity are discussed, and methods for correction are…

  2. Error analysis of dimensionless scaling experiments with multiple points using linear regression

    International Nuclear Information System (INIS)

    Guercan, Oe.D.; Vermare, L.; Hennequin, P.; Bourdelle, C.

    2010-01-01

    A general method of error estimation in the case of multiple point dimensionless scaling experiments, using linear regression and standard error propagation, is proposed. The method reduces to the previous result of Cordey (2009 Nucl. Fusion 49 052001) in the case of a two-point scan. On the other hand, if the points follow a linear trend, it explains how the estimated error decreases as more points are added to the scan. Based on the analytical expression that is derived, it is argued that for a low number of points, adding points to the ends of the scanned range, rather than the middle, results in a smaller error estimate. (letter)

  3. Comparison of two-concentration with multi-concentration linear regressions: Retrospective data analysis of multiple regulated LC-MS bioanalytical projects.

    Science.gov (United States)

    Musuku, Adrien; Tan, Aimin; Awaiye, Kayode; Trabelsi, Fethi

    2013-09-01

    Linear calibration is usually performed using eight to ten calibration concentration levels in regulated LC-MS bioanalysis because a minimum of six are specified in regulatory guidelines. However, we have previously reported that two-concentration linear calibration is as reliable as or even better than using multiple concentrations. The purpose of this research is to compare two-concentration with multiple-concentration linear calibration through retrospective data analysis of multiple bioanalytical projects that were conducted in an independent regulated bioanalytical laboratory. A total of 12 bioanalytical projects were randomly selected: two validations and two studies for each of the three most commonly used types of sample extraction methods (protein precipitation, liquid-liquid extraction, solid-phase extraction). When the existing data were retrospectively linearly regressed using only the lowest and the highest concentration levels, no extra batch failure/QC rejection was observed and the differences in accuracy and precision between the original multi-concentration regression and the new two-concentration linear regression are negligible. Specifically, the differences in overall mean apparent bias (square root of mean individual bias squares) are within the ranges of -0.3% to 0.7% and 0.1-0.7% for the validations and studies, respectively. The differences in mean QC concentrations are within the ranges of -0.6% to 1.8% and -0.8% to 2.5% for the validations and studies, respectively. The differences in %CV are within the ranges of -0.7% to 0.9% and -0.3% to 0.6% for the validations and studies, respectively. The average differences in study sample concentrations are within the range of -0.8% to 2.3%. With two-concentration linear regression, an average of 13% of time and cost could have been saved for each batch together with 53% of saving in the lead-in for each project (the preparation of working standard solutions, spiking, and aliquoting). Furthermore

  4. Remote sensing and GIS-based landslide hazard analysis and cross-validation using multivariate logistic regression model on three test areas in Malaysia

    Science.gov (United States)

    Pradhan, Biswajeet

    2010-05-01

    This paper presents the results of the cross-validation of a multivariate logistic regression model using remote sensing data and GIS for landslide hazard analysis on the Penang, Cameron, and Selangor areas in Malaysia. Landslide locations in the study areas were identified by interpreting aerial photographs and satellite images, supported by field surveys. SPOT 5 and Landsat TM satellite imagery were used to map landcover and vegetation index, respectively. Maps of topography, soil type, lineaments and land cover were constructed from the spatial datasets. Ten factors which influence landslide occurrence, i.e., slope, aspect, curvature, distance from drainage, lithology, distance from lineaments, soil type, landcover, rainfall precipitation, and normalized difference vegetation index (ndvi), were extracted from the spatial database and the logistic regression coefficient of each factor was computed. Then the landslide hazard was analysed using the multivariate logistic regression coefficients derived not only from the data for the respective area but also using the logistic regression coefficients calculated from each of the other two areas (nine hazard maps in all) as a cross-validation of the model. For verification of the model, the results of the analyses were then compared with the field-verified landslide locations. Among the three cases of the application of logistic regression coefficient in the same study area, the case of Selangor based on the Selangor logistic regression coefficients showed the highest accuracy (94%), where as Penang based on the Penang coefficients showed the lowest accuracy (86%). Similarly, among the six cases from the cross application of logistic regression coefficient in other two areas, the case of Selangor based on logistic coefficient of Cameron showed highest (90%) prediction accuracy where as the case of Penang based on the Selangor logistic regression coefficients showed the lowest accuracy (79%). Qualitatively, the cross

  5. Evaluation of a multiple linear regression model and SARIMA model in forecasting heat demand for district heating system

    International Nuclear Information System (INIS)

    Fang, Tingting; Lahdelma, Risto

    2016-01-01

    Highlights: • Social factor is considered for the linear regression models besides weather file. • Simultaneously optimize all the coefficients for linear regression models. • SARIMA combined with linear regression is used to forecast the heat demand. • The accuracy for both linear regression and time series models are evaluated. - Abstract: Forecasting heat demand is necessary for production and operation planning of district heating (DH) systems. In this study we first propose a simple regression model where the hourly outdoor temperature and wind speed forecast the heat demand. Weekly rhythm of heat consumption as a social component is added to the model to significantly improve the accuracy. The other type of model is the seasonal autoregressive integrated moving average (SARIMA) model with exogenous variables as a combination to take weather factors, and the historical heat consumption data as depending variables. One outstanding advantage of the model is that it peruses the high accuracy for both long-term and short-term forecast by considering both exogenous factors and time series. The forecasting performance of both linear regression models and time series model are evaluated based on real-life heat demand data for the city of Espoo in Finland by out-of-sample tests for the last 20 full weeks of the year. The results indicate that the proposed linear regression model (T168h) using 168-h demand pattern with midweek holidays classified as Saturdays or Sundays gives the highest accuracy and strong robustness among all the tested models based on the tested forecasting horizon and corresponding data. Considering the parsimony of the input, the ease of use and the high accuracy, the proposed T168h model is the best in practice. The heat demand forecasting model can also be developed for individual buildings if automated meter reading customer measurements are available. This would allow forecasting the heat demand based on more accurate heat consumption

  6. Convergence diagnostics for Eigenvalue problems with linear regression model

    International Nuclear Information System (INIS)

    Shi, Bo; Petrovic, Bojan

    2011-01-01

    Although the Monte Carlo method has been extensively used for criticality/Eigenvalue problems, a reliable, robust, and efficient convergence diagnostics method is still desired. Most methods are based on integral parameters (multiplication factor, entropy) and either condense the local distribution information into a single value (e.g., entropy) or even disregard it. We propose to employ the detailed cycle-by-cycle local flux evolution obtained by using mesh tally mechanism to assess the source and flux convergence. By applying a linear regression model to each individual mesh in a mesh tally for convergence diagnostics, a global convergence criterion can be obtained. We exemplify this method on two problems and obtain promising diagnostics results. (author)

  7. Regressão linear geograficamente ponderada em ambiente SIG

    Directory of Open Access Journals (Sweden)

    Luís Eduardo Ximenes Carvalho

    2009-10-01

    Full Text Available

    Este artigo aborda considerações teóricas e resultados da implementação em ambiente SIG de um modelo confirmatório de estatística espacial — regressão linear geograficamente ponderada (RGP — não disponível em ambiente livre. Os aspectos teóricos deste modelo local de regressão espacial foram amplamente discutidos em virtude da escassa bibliografia existente. O modelo RGP foi implementado na linguagem de programação GISDK do SIG-T TransCAD, utilizando compreensivamente as ferramentas de manipulação, tratamento georreferenciado dos dados e rotinas de análise espacial disponibilizadas em plataformas SIG. Ao final, espera-se ter desenvolvido, ainda que de maneira parcial, uma importante ferramenta que contribuirá para a compreensão e refinamento da modelagem de fenômenos geográficos tão amplamente analisados em estudos de Planejamento de Transportes.

  8. Enhancement of Visual Field Predictions with Pointwise Exponential Regression (PER) and Pointwise Linear Regression (PLR).

    Science.gov (United States)

    Morales, Esteban; de Leon, John Mark S; Abdollahi, Niloufar; Yu, Fei; Nouri-Mahdavi, Kouros; Caprioli, Joseph

    2016-03-01

    The study was conducted to evaluate threshold smoothing algorithms to enhance prediction of the rates of visual field (VF) worsening in glaucoma. We studied 798 patients with primary open-angle glaucoma and 6 or more years of follow-up who underwent 8 or more VF examinations. Thresholds at each VF location for the first 4 years or first half of the follow-up time (whichever was greater) were smoothed with clusters defined by the nearest neighbor (NN), Garway-Heath, Glaucoma Hemifield Test (GHT), and weighting by the correlation of rates at all other VF locations. Thresholds were regressed with a pointwise exponential regression (PER) model and a pointwise linear regression (PLR) model. Smaller root mean square error (RMSE) values of the differences between the observed and the predicted thresholds at last two follow-ups indicated better model predictions. The mean (SD) follow-up times for the smoothing and prediction phase were 5.3 (1.5) and 10.5 (3.9) years. The mean RMSE values for the PER and PLR models were unsmoothed data, 6.09 and 6.55; NN, 3.40 and 3.42; Garway-Heath, 3.47 and 3.48; GHT, 3.57 and 3.74; and correlation of rates, 3.59 and 3.64. Smoothed VF data predicted better than unsmoothed data. Nearest neighbor provided the best predictions; PER also predicted consistently more accurately than PLR. Smoothing algorithms should be used when forecasting VF results with PER or PLR. The application of smoothing algorithms on VF data can improve forecasting in VF points to assist in treatment decisions.

  9. Multivariate mixed linear model analysis of longitudinal data: an information-rich statistical technique for analyzing disease resistance data

    Science.gov (United States)

    The mixed linear model (MLM) is currently among the most advanced and flexible statistical modeling techniques and its use in tackling problems in plant pathology has begun surfacing in the literature. The longitudinal MLM is a multivariate extension that handles repeatedly measured data, such as r...

  10. [Multiple linear regression analysis of X-ray measurement and WOMAC scores of knee osteoarthritis].

    Science.gov (United States)

    Ma, Yu-Feng; Wang, Qing-Fu; Chen, Zhao-Jun; Du, Chun-Lin; Li, Jun-Hai; Huang, Hu; Shi, Zong-Ting; Yin, Yue-Shan; Zhang, Lei; A-Di, Li-Jiang; Dong, Shi-Yu; Wu, Ji

    2012-05-01

    To perform Multiple Linear Regression analysis of X-ray measurement and WOMAC scores of knee osteoarthritis, and to analyze their relationship with clinical and biomechanical concepts. From March 2011 to July 2011, 140 patients (250 knees) were reviewed, including 132 knees in the left and 118 knees in the right; ranging in age from 40 to 71 years, with an average of 54.68 years. The MB-RULER measurement software was applied to measure femoral angle, tibial angle, femorotibial angle, joint gap angle from antero-posterir and lateral position of X-rays. The WOMAC scores were also collected. Then multiple regression equations was applied for the linear regression analysis of correlation between the X-ray measurement and WOMAC scores. There was statistical significance in the regression equation of AP X-rays value and WOMAC scores (Pregression equation of lateral X-ray value and WOMAC scores (P>0.05). 1) X-ray measurement of knee joint can reflect the WOMAC scores to a certain extent. 2) It is necessary to measure the X-ray mechanical axis of knee, which is important for diagnosis and treatment of osteoarthritis. 3) The correlation between tibial angle,joint gap angle on antero-posterior X-ray and WOMAC scores is significant, which can be used to assess the functional recovery of patients before and after treatment.

  11. Single Image Super-Resolution Using Global Regression Based on Multiple Local Linear Mappings.

    Science.gov (United States)

    Choi, Jae-Seok; Kim, Munchurl

    2017-03-01

    Super-resolution (SR) has become more vital, because of its capability to generate high-quality ultra-high definition (UHD) high-resolution (HR) images from low-resolution (LR) input images. Conventional SR methods entail high computational complexity, which makes them difficult to be implemented for up-scaling of full-high-definition input images into UHD-resolution images. Nevertheless, our previous super-interpolation (SI) method showed a good compromise between Peak-Signal-to-Noise Ratio (PSNR) performances and computational complexity. However, since SI only utilizes simple linear mappings, it may fail to precisely reconstruct HR patches with complex texture. In this paper, we present a novel SR method, which inherits the large-to-small patch conversion scheme from SI but uses global regression based on local linear mappings (GLM). Thus, our new SR method is called GLM-SI. In GLM-SI, each LR input patch is divided into 25 overlapped subpatches. Next, based on the local properties of these subpatches, 25 different local linear mappings are applied to the current LR input patch to generate 25 HR patch candidates, which are then regressed into one final HR patch using a global regressor. The local linear mappings are learned cluster-wise in our off-line training phase. The main contribution of this paper is as follows: Previously, linear-mapping-based conventional SR methods, including SI only used one simple yet coarse linear mapping to each patch to reconstruct its HR version. On the contrary, for each LR input patch, our GLM-SI is the first to apply a combination of multiple local linear mappings, where each local linear mapping is found according to local properties of the current LR patch. Therefore, it can better approximate nonlinear LR-to-HR mappings for HR patches with complex texture. Experiment results show that the proposed GLM-SI method outperforms most of the state-of-the-art methods, and shows comparable PSNR performance with much lower

  12. Using the classical linear regression model in analysis of the dependences of conveyor belt life

    Directory of Open Access Journals (Sweden)

    Miriam Andrejiová

    2013-12-01

    Full Text Available The paper deals with the classical linear regression model of the dependence of conveyor belt life on some selected parameters: thickness of paint layer, width and length of the belt, conveyor speed and quantity of transported material. The first part of the article is about regression model design, point and interval estimation of parameters, verification of statistical significance of the model, and about the parameters of the proposed regression model. The second part of the article deals with identification of influential and extreme values that can have an impact on estimation of regression model parameters. The third part focuses on assumptions of the classical regression model, i.e. on verification of independence assumptions, normality and homoscedasticity of residuals.

  13. Building a new predictor for multiple linear regression technique-based corrective maintenance turnaround time.

    Science.gov (United States)

    Cruz, Antonio M; Barr, Cameron; Puñales-Pozo, Elsa

    2008-01-01

    This research's main goals were to build a predictor for a turnaround time (TAT) indicator for estimating its values and use a numerical clustering technique for finding possible causes of undesirable TAT values. The following stages were used: domain understanding, data characterisation and sample reduction and insight characterisation. Building the TAT indicator multiple linear regression predictor and clustering techniques were used for improving corrective maintenance task efficiency in a clinical engineering department (CED). The indicator being studied was turnaround time (TAT). Multiple linear regression was used for building a predictive TAT value model. The variables contributing to such model were clinical engineering department response time (CE(rt), 0.415 positive coefficient), stock service response time (Stock(rt), 0.734 positive coefficient), priority level (0.21 positive coefficient) and service time (0.06 positive coefficient). The regression process showed heavy reliance on Stock(rt), CE(rt) and priority, in that order. Clustering techniques revealed the main causes of high TAT values. This examination has provided a means for analysing current technical service quality and effectiveness. In doing so, it has demonstrated a process for identifying areas and methods of improvement and a model against which to analyse these methods' effectiveness.

  14. Modeling the frequency of opposing left-turn conflicts at signalized intersections using generalized linear regression models.

    Science.gov (United States)

    Zhang, Xin; Liu, Pan; Chen, Yuguang; Bai, Lu; Wang, Wei

    2014-01-01

    The primary objective of this study was to identify whether the frequency of traffic conflicts at signalized intersections can be modeled. The opposing left-turn conflicts were selected for the development of conflict predictive models. Using data collected at 30 approaches at 20 signalized intersections, the underlying distributions of the conflicts under different traffic conditions were examined. Different conflict-predictive models were developed to relate the frequency of opposing left-turn conflicts to various explanatory variables. The models considered include a linear regression model, a negative binomial model, and separate models developed for four traffic scenarios. The prediction performance of different models was compared. The frequency of traffic conflicts follows a negative binominal distribution. The linear regression model is not appropriate for the conflict frequency data. In addition, drivers behaved differently under different traffic conditions. Accordingly, the effects of conflicting traffic volumes on conflict frequency vary across different traffic conditions. The occurrences of traffic conflicts at signalized intersections can be modeled using generalized linear regression models. The use of conflict predictive models has potential to expand the uses of surrogate safety measures in safety estimation and evaluation.

  15. Multivariate Bonferroni-type inequalities theory and applications

    CERN Document Server

    Chen, John

    2014-01-01

    Multivariate Bonferroni-Type Inequalities: Theory and Applications presents a systematic account of research discoveries on multivariate Bonferroni-type inequalities published in the past decade. The emergence of new bounding approaches pushes the conventional definitions of optimal inequalities and demands new insights into linear and Fréchet optimality. The book explores these advances in bounding techniques with corresponding innovative applications. It presents the method of linear programming for multivariate bounds, multivariate hybrid bounds, sub-Markovian bounds, and bounds using Hamil

  16. Non-linear multivariable predictive control of an alcoholic fermentation process using functional link networks

    Directory of Open Access Journals (Sweden)

    Luiz Augusto da Cruz Meleiro

    2005-06-01

    Full Text Available In this work a MIMO non-linear predictive controller was developed for an extractive alcoholic fermentation process. The internal model of the controller was represented by two MISO Functional Link Networks (FLNs, identified using simulated data generated from a deterministic mathematical model whose kinetic parameters were determined experimentally. The FLN structure presents as advantages fast training and guaranteed convergence, since the estimation of the weights is a linear optimization problem. Besides, the elimination of non-significant weights generates parsimonious models, which allows for fast execution in an MPC-based algorithm. The proposed algorithm showed good potential in identification and control of non-linear processes.Neste trabalho um controlador preditivo não linear multivariável foi desenvolvido para um processo de fermentação alcoólica extrativa. O modelo interno do controlador foi representado por duas redes do tipo Functional Link (FLN, identificadas usando dados de simulação gerados a partir de um modelo validado experimentalmente. A estrutura FLN apresenta como vantagem o treinamento rápido e convergência garantida, já que a estimação dos seus pesos é um problema de otimização linear. Além disso, a eliminação de pesos não significativos gera modelos parsimoniosos, o que permite a rápida execução em algoritmos de controle preditivo baseado em modelo. Os resultados mostram que o algoritmo proposto tem grande potencial para identificação e controle de processos não lineares.

  17. A Feature-Free 30-Disease Pathological Brain Detection System by Linear Regression Classifier.

    Science.gov (United States)

    Chen, Yi; Shao, Ying; Yan, Jie; Yuan, Ti-Fei; Qu, Yanwen; Lee, Elizabeth; Wang, Shuihua

    2017-01-01

    Alzheimer's disease patients are increasing rapidly every year. Scholars tend to use computer vision methods to develop automatic diagnosis system. (Background) In 2015, Gorji et al. proposed a novel method using pseudo Zernike moment. They tested four classifiers: learning vector quantization neural network, pattern recognition neural network trained by Levenberg-Marquardt, by resilient backpropagation, and by scaled conjugate gradient. This study presents an improved method by introducing a relatively new classifier-linear regression classification. Our method selects one axial slice from 3D brain image, and employed pseudo Zernike moment with maximum order of 15 to extract 256 features from each image. Finally, linear regression classification was harnessed as the classifier. The proposed approach obtains an accuracy of 97.51%, a sensitivity of 96.71%, and a specificity of 97.73%. Our method performs better than Gorji's approach and five other state-of-the-art approaches. Therefore, it can be used to detect Alzheimer's disease. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  18. MULTIPLE LINEAR REGRESSION ANALYSIS FOR PREDICTION OF BOILER LOSSES AND BOILER EFFICIENCY

    OpenAIRE

    Chayalakshmi C.L

    2018-01-01

    MULTIPLE LINEAR REGRESSION ANALYSIS FOR PREDICTION OF BOILER LOSSES AND BOILER EFFICIENCY ABSTRACT Calculation of boiler efficiency is essential if its parameters need to be controlled for either maintaining or enhancing its efficiency. But determination of boiler efficiency using conventional method is time consuming and very expensive. Hence, it is not recommended to find boiler efficiency frequently. The work presented in this paper deals with establishing the statistical mo...

  19. [Multivariate ordinal logistic regression analysis on the association between consumption of fried food and both esophageal cancer and precancerous lesions].

    Science.gov (United States)

    Guo, L W; Liu, S Z; Zhang, M; Chen, Q; Zhang, S K; Sun, X B

    2017-12-10

    Objective: To investigate the effect of fried food intake on the pathogenesis of esophageal cancer and precancerous lesions. Methods: From 2005 to 2013, all the residents aged 40-69 years from 11 counties (cities) where cancer screening of upper gastrointestinal cancer had been conducted in rural areas of Henan province, were recruited as the subjects of study. Information on demography and lifestyle was collected. The residents under study were screened with iodine staining endoscopic examination and biopsy samples were diagnosed pathologically, under standardized criteria. Subjects with high risk were divided into the groups based on their different pathological degrees. Multivariate ordinal logistic regression analysis was used to analyze the relationship between the frequency of fried food intake and esophageal cancer and precancerous lesions. Results: A total number of 8 792 cases with normal esophagus, 3 680 with mild hyperplasia, 972 with moderate hyperplasia, 413 with severe hyperplasia carcinoma in situ, and 336 cases of esophageal cancer were recruited. Results from multivariate logistic regression analysis showed that, when compared with those who did not eat fried food, the intake of fried food (food appeared a risk factor for both esophageal cancer and precancerous lesions.

  20. Bivariate least squares linear regression: Towards a unified analytic formalism. I. Functional models

    Science.gov (United States)

    Caimmi, R.

    2011-08-01

    Concerning bivariate least squares linear regression, the classical approach pursued for functional models in earlier attempts ( York, 1966, 1969) is reviewed using a new formalism in terms of deviation (matrix) traces which, for unweighted data, reduce to usual quantities leaving aside an unessential (but dimensional) multiplicative factor. Within the framework of classical error models, the dependent variable relates to the independent variable according to the usual additive model. The classes of linear models considered are regression lines in the general case of correlated errors in X and in Y for weighted data, and in the opposite limiting situations of (i) uncorrelated errors in X and in Y, and (ii) completely correlated errors in X and in Y. The special case of (C) generalized orthogonal regression is considered in detail together with well known subcases, namely: (Y) errors in X negligible (ideally null) with respect to errors in Y; (X) errors in Y negligible (ideally null) with respect to errors in X; (O) genuine orthogonal regression; (R) reduced major-axis regression. In the limit of unweighted data, the results determined for functional models are compared with their counterparts related to extreme structural models i.e. the instrumental scatter is negligible (ideally null) with respect to the intrinsic scatter ( Isobe et al., 1990; Feigelson and Babu, 1992). While regression line slope and intercept estimators for functional and structural models necessarily coincide, the contrary holds for related variance estimators even if the residuals obey a Gaussian distribution, with the exception of Y models. An example of astronomical application is considered, concerning the [O/H]-[Fe/H] empirical relations deduced from five samples related to different stars and/or different methods of oxygen abundance determination. For selected samples and assigned methods, different regression models yield consistent results within the errors (∓ σ) for both

  1. Improving sensitivity of linear regression-based cell type-specific differential expression deconvolution with per-gene vs. global significance threshold.

    Science.gov (United States)

    Glass, Edmund R; Dozmorov, Mikhail G

    2016-10-06

    The goal of many human disease-oriented studies is to detect molecular mechanisms different between healthy controls and patients. Yet, commonly used gene expression measurements from blood samples suffer from variability of cell composition. This variability hinders the detection of differentially expressed genes and is often ignored. Combined with cell counts, heterogeneous gene expression may provide deeper insights into the gene expression differences on the cell type-specific level. Published computational methods use linear regression to estimate cell type-specific differential expression, and a global cutoff to judge significance, such as False Discovery Rate (FDR). Yet, they do not consider many artifacts hidden in high-dimensional gene expression data that may negatively affect linear regression. In this paper we quantify the parameter space affecting the performance of linear regression (sensitivity of cell type-specific differential expression detection) on a per-gene basis. We evaluated the effect of sample sizes, cell type-specific proportion variability, and mean squared error on sensitivity of cell type-specific differential expression detection using linear regression. Each parameter affected variability of cell type-specific expression estimates and, subsequently, the sensitivity of differential expression detection. We provide the R package, LRCDE, which performs linear regression-based cell type-specific differential expression (deconvolution) detection on a gene-by-gene basis. Accounting for variability around cell type-specific gene expression estimates, it computes per-gene t-statistics of differential detection, p-values, t-statistic-based sensitivity, group-specific mean squared error, and several gene-specific diagnostic metrics. The sensitivity of linear regression-based cell type-specific differential expression detection differed for each gene as a function of mean squared error, per group sample sizes, and variability of the proportions

  2. Multivariate Matrix-Exponential Distributions

    DEFF Research Database (Denmark)

    Bladt, Mogens; Nielsen, Bo Friis

    2010-01-01

    be written as linear combinations of the elements in the exponential of a matrix. For this reason we shall refer to multivariate distributions with rational Laplace transform as multivariate matrix-exponential distributions (MVME). The marginal distributions of an MVME are univariate matrix......-exponential distributions. We prove a characterization that states that a distribution is an MVME distribution if and only if all non-negative, non-null linear combinations of the coordinates have a univariate matrix-exponential distribution. This theorem is analog to a well-known characterization theorem...

  3. On the Relationship Between Confidence Sets and Exchangeable Weights in Multiple Linear Regression.

    Science.gov (United States)

    Pek, Jolynn; Chalmers, R Philip; Monette, Georges

    2016-01-01

    When statistical models are employed to provide a parsimonious description of empirical relationships, the extent to which strong conclusions can be drawn rests on quantifying the uncertainty in parameter estimates. In multiple linear regression (MLR), regression weights carry two kinds of uncertainty represented by confidence sets (CSs) and exchangeable weights (EWs). Confidence sets quantify uncertainty in estimation whereas the set of EWs quantify uncertainty in the substantive interpretation of regression weights. As CSs and EWs share certain commonalities, we clarify the relationship between these two kinds of uncertainty about regression weights. We introduce a general framework describing how CSs and the set of EWs for regression weights are estimated from the likelihood-based and Wald-type approach, and establish the analytical relationship between CSs and sets of EWs. With empirical examples on posttraumatic growth of caregivers (Cadell et al., 2014; Schneider, Steele, Cadell & Hemsworth, 2011) and on graduate grade point average (Kuncel, Hezlett & Ones, 2001), we illustrate the usefulness of CSs and EWs for drawing strong scientific conclusions. We discuss the importance of considering both CSs and EWs as part of the scientific process, and provide an Online Appendix with R code for estimating Wald-type CSs and EWs for k regression weights.

  4. Significance tests to determine the direction of effects in linear regression models.

    Science.gov (United States)

    Wiedermann, Wolfgang; Hagmann, Michael; von Eye, Alexander

    2015-02-01

    Previous studies have discussed asymmetric interpretations of the Pearson correlation coefficient and have shown that higher moments can be used to decide on the direction of dependence in the bivariate linear regression setting. The current study extends this approach by illustrating that the third moment of regression residuals may also be used to derive conclusions concerning the direction of effects. Assuming non-normally distributed variables, it is shown that the distribution of residuals of the correctly specified regression model (e.g., Y is regressed on X) is more symmetric than the distribution of residuals of the competing model (i.e., X is regressed on Y). Based on this result, 4 one-sample tests are discussed which can be used to decide which variable is more likely to be the response and which one is more likely to be the explanatory variable. A fifth significance test is proposed based on the differences of skewness estimates, which leads to a more direct test of a hypothesis that is compatible with direction of dependence. A Monte Carlo simulation study was performed to examine the behaviour of the procedures under various degrees of associations, sample sizes, and distributional properties of the underlying population. An empirical example is given which illustrates the application of the tests in practice. © 2014 The British Psychological Society.

  5. Linear multivariate evaluation models for spatial perception of soundscape.

    Science.gov (United States)

    Deng, Zhiyong; Kang, Jian; Wang, Daiwei; Liu, Aili; Kang, Joe Zhengyu

    2015-11-01

    Soundscape is a sound environment that emphasizes the awareness of auditory perception and social or cultural understandings. The case of spatial perception is significant to soundscape. However, previous studies on the auditory spatial perception of the soundscape environment have been limited. Based on 21 native binaural-recorded soundscape samples and a set of auditory experiments for subjective spatial perception (SSP), a study of the analysis among semantic parameters, the inter-aural-cross-correlation coefficient (IACC), A-weighted-equal sound-pressure-level (L(eq)), dynamic (D), and SSP is introduced to verify the independent effect of each parameter and to re-determine some of their possible relationships. The results show that the more noisiness the audience perceived, the worse spatial awareness they received, while the closer and more directional the sound source image variations, dynamics, and numbers of sound sources in the soundscape are, the better the spatial awareness would be. Thus, the sensations of roughness, sound intensity, transient dynamic, and the values of Leq and IACC have a suitable range for better spatial perception. A better spatial awareness seems to promote the preference slightly for the audience. Finally, setting SSPs as functions of the semantic parameters and Leq-D-IACC, two linear multivariate evaluation models of subjective spatial perception are proposed.

  6. Multiple linear regression analysis

    Science.gov (United States)

    Edwards, T. R.

    1980-01-01

    Program rapidly selects best-suited set of coefficients. User supplies only vectors of independent and dependent data and specifies confidence level required. Program uses stepwise statistical procedure for relating minimal set of variables to set of observations; final regression contains only most statistically significant coefficients. Program is written in FORTRAN IV for batch execution and has been implemented on NOVA 1200.

  7. Estimating a graphical intra-class correlation coefficient (GICC) using multivariate probit-linear mixed models.

    Science.gov (United States)

    Yue, Chen; Chen, Shaojie; Sair, Haris I; Airan, Raag; Caffo, Brian S

    2015-09-01

    Data reproducibility is a critical issue in all scientific experiments. In this manuscript, the problem of quantifying the reproducibility of graphical measurements is considered. The image intra-class correlation coefficient (I2C2) is generalized and the graphical intra-class correlation coefficient (GICC) is proposed for such purpose. The concept for GICC is based on multivariate probit-linear mixed effect models. A Markov Chain Monte Carlo EM (mcm-cEM) algorithm is used for estimating the GICC. Simulation results with varied settings are demonstrated and our method is applied to the KIRBY21 test-retest dataset.

  8. Dynamic Optimization for IPS2 Resource Allocation Based on Improved Fuzzy Multiple Linear Regression

    Directory of Open Access Journals (Sweden)

    Maokuan Zheng

    2017-01-01

    Full Text Available The study mainly focuses on resource allocation optimization for industrial product-service systems (IPS2. The development of IPS2 leads to sustainable economy by introducing cooperative mechanisms apart from commodity transaction. The randomness and fluctuation of service requests from customers lead to the volatility of IPS2 resource utilization ratio. Three basic rules for resource allocation optimization are put forward to improve system operation efficiency and cut unnecessary costs. An approach based on fuzzy multiple linear regression (FMLR is developed, which integrates the strength and concision of multiple linear regression in data fitting and factor analysis and the merit of fuzzy theory in dealing with uncertain or vague problems, which helps reduce those costs caused by unnecessary resource transfer. The iteration mechanism is introduced in the FMLR algorithm to improve forecasting accuracy. A case study of human resource allocation optimization in construction machinery industry is implemented to test and verify the proposed model.

  9. A Linear Regression Model for Global Solar Radiation on Horizontal Surfaces at Warri, Nigeria

    Directory of Open Access Journals (Sweden)

    Michael S. Okundamiya

    2013-10-01

    Full Text Available The growing anxiety on the negative effects of fossil fuels on the environment and the global emission reduction targets call for a more extensive use of renewable energy alternatives. Efficient solar energy utilization is an essential solution to the high atmospheric pollution caused by fossil fuel combustion. Global solar radiation (GSR data, which are useful for the design and evaluation of solar energy conversion system, are not measured at the forty-five meteorological stations in Nigeria. The dearth of the measured solar radiation data calls for accurate estimation. This study proposed a temperature-based linear regression, for predicting the monthly average daily GSR on horizontal surfaces, at Warri (latitude 5.020N and longitude 7.880E an oil city located in the south-south geopolitical zone, in Nigeria. The proposed model is analyzed based on five statistical indicators (coefficient of correlation, coefficient of determination, mean bias error, root mean square error, and t-statistic, and compared with the existing sunshine-based model for the same study. The results indicate that the proposed temperature-based linear regression model could replace the existing sunshine-based model for generating global solar radiation data. Keywords: air temperature; empirical model; global solar radiation; regression analysis; renewable energy; Warri

  10. Assessing the response of area burned to changing climate in western boreal North America using a Multivariate Adaptive Regression Splines (MARS) approach

    Science.gov (United States)

    Michael S. Balshi; A. David McGuire; Paul Duffy; Mike Flannigan; John Walsh; Jerry Melillo

    2009-01-01

    We developed temporally and spatially explicit relationships between air temperature and fuel moisture codes derived from the Canadian Fire Weather Index System to estimate annual area burned at 2.5o (latitude x longitude) resolution using a Multivariate Adaptive Regression Spline (MARS) approach across Alaska and Canada. Burned area was...

  11. Partitioning of late gestation energy expenditure in ewes using indirect calorimetry and a linear regression approach

    DEFF Research Database (Denmark)

    Kiani, Alishir; Chwalibog, André; Nielsen, Mette O

    2007-01-01

    Late gestation energy expenditure (EE(gest)) originates from energy expenditure (EE) of development of conceptus (EE(conceptus)) and EE of homeorhetic adaptation of metabolism (EE(homeorhetic)). Even though EE(gest) is relatively easy to quantify, its partitioning is problematic. In the present...... study metabolizable energy (ME) intake ranges for twin-bearing ewes were 220-440, 350- 700, 350-900 kJ per metabolic body weight (W0.75) at week seven, five, two pre-partum respectively. Indirect calorimetry and a linear regression approach were used to quantify EE(gest) and then partition to EE......(conceptus) and EE(homeorhetic). Energy expenditure of basal metabolism of the non-gravid tissues (EE(bmng)), derived from the intercept of the linear regression equation of retained energy [kJ/W0.75] and ME intake [kJ/W(0.75)], was 298 [kJ/ W0.75]. Values of the intercepts of the regression equations at week seven...

  12. Suppression Situations in Multiple Linear Regression

    Science.gov (United States)

    Shieh, Gwowen

    2006-01-01

    This article proposes alternative expressions for the two most prevailing definitions of suppression without resorting to the standardized regression modeling. The formulation provides a simple basis for the examination of their relationship. For the two-predictor regression, the author demonstrates that the previous results in the literature are…

  13. Multiple linear regression and regression with time series error models in forecasting PM10 concentrations in Peninsular Malaysia.

    Science.gov (United States)

    Ng, Kar Yong; Awang, Norhashidah

    2018-01-06

    Frequent haze occurrences in Malaysia have made the management of PM 10 (particulate matter with aerodynamic less than 10 μm) pollution a critical task. This requires knowledge on factors associating with PM 10 variation and good forecast of PM 10 concentrations. Hence, this paper demonstrates the prediction of 1-day-ahead daily average PM 10 concentrations based on predictor variables including meteorological parameters and gaseous pollutants. Three different models were built. They were multiple linear regression (MLR) model with lagged predictor variables (MLR1), MLR model with lagged predictor variables and PM 10 concentrations (MLR2) and regression with time series error (RTSE) model. The findings revealed that humidity, temperature, wind speed, wind direction, carbon monoxide and ozone were the main factors explaining the PM 10 variation in Peninsular Malaysia. Comparison among the three models showed that MLR2 model was on a same level with RTSE model in terms of forecasting accuracy, while MLR1 model was the worst.

  14. Bayesian linear regression : different conjugate models and their (in)sensitivity to prior-data conflict

    NARCIS (Netherlands)

    Walter, G.M.; Augustin, Th.; Kneib, Thomas; Tutz, Gerhard

    2010-01-01

    The paper is concerned with Bayesian analysis under prior-data conflict, i.e. the situation when observed data are rather unexpected under the prior (and the sample size is not large enough to eliminate the influence of the prior). Two approaches for Bayesian linear regression modeling based on

  15. A Simple and Convenient Method of Multiple Linear Regression to Calculate Iodine Molecular Constants

    Science.gov (United States)

    Cooper, Paul D.

    2010-01-01

    A new procedure using a student-friendly least-squares multiple linear-regression technique utilizing a function within Microsoft Excel is described that enables students to calculate molecular constants from the vibronic spectrum of iodine. This method is advantageous pedagogically as it calculates molecular constants for ground and excited…

  16. Robust multivariate analysis

    CERN Document Server

    J Olive, David

    2017-01-01

    This text presents methods that are robust to the assumption of a multivariate normal distribution or methods that are robust to certain types of outliers. Instead of using exact theory based on the multivariate normal distribution, the simpler and more applicable large sample theory is given.  The text develops among the first practical robust regression and robust multivariate location and dispersion estimators backed by theory.   The robust techniques  are illustrated for methods such as principal component analysis, canonical correlation analysis, and factor analysis.  A simple way to bootstrap confidence regions is also provided. Much of the research on robust multivariate analysis in this book is being published for the first time. The text is suitable for a first course in Multivariate Statistical Analysis or a first course in Robust Statistics. This graduate text is also useful for people who are familiar with the traditional multivariate topics, but want to know more about handling data sets with...

  17. Comparing Regression Coefficients between Nested Linear Models for Clustered Data with Generalized Estimating Equations

    Science.gov (United States)

    Yan, Jun; Aseltine, Robert H., Jr.; Harel, Ofer

    2013-01-01

    Comparing regression coefficients between models when one model is nested within another is of great practical interest when two explanations of a given phenomenon are specified as linear models. The statistical problem is whether the coefficients associated with a given set of covariates change significantly when other covariates are added into…

  18. Using the Coefficient of Determination "R"[superscript 2] to Test the Significance of Multiple Linear Regression

    Science.gov (United States)

    Quinino, Roberto C.; Reis, Edna A.; Bessegato, Lupercio F.

    2013-01-01

    This article proposes the use of the coefficient of determination as a statistic for hypothesis testing in multiple linear regression based on distributions acquired by beta sampling. (Contains 3 figures.)

  19. INTRODUCTION TO A COMBINED MULTIPLE LINEAR REGRESSION AND ARMA MODELING APPROACH FOR BEACH BACTERIA PREDICTION

    Science.gov (United States)

    Due to the complexity of the processes contributing to beach bacteria concentrations, many researchers rely on statistical modeling, among which multiple linear regression (MLR) modeling is most widely used. Despite its ease of use and interpretation, there may be time dependence...

  20. Daily Suspended Sediment Discharge Prediction Using Multiple Linear Regression and Artificial Neural Network

    Science.gov (United States)

    Uca; Toriman, Ekhwan; Jaafar, Othman; Maru, Rosmini; Arfan, Amal; Saleh Ahmar, Ansari

    2018-01-01

    Prediction of suspended sediment discharge in a catchments area is very important because it can be used to evaluation the erosion hazard, management of its water resources, water quality, hydrology project management (dams, reservoirs, and irrigation) and to determine the extent of the damage that occurred in the catchments. Multiple Linear Regression analysis and artificial neural network can be used to predict the amount of daily suspended sediment discharge. Regression analysis using the least square method, whereas artificial neural networks using Radial Basis Function (RBF) and feedforward multilayer perceptron with three learning algorithms namely Levenberg-Marquardt (LM), Scaled Conjugate Descent (SCD) and Broyden-Fletcher-Goldfarb-Shanno Quasi-Newton (BFGS). The number neuron of hidden layer is three to sixteen, while in output layer only one neuron because only one output target. The mean absolute error (MAE), root mean square error (RMSE), coefficient of determination (R2 ) and coefficient of efficiency (CE) of the multiple linear regression (MLRg) value Model 2 (6 input variable independent) has the lowest the value of MAE and RMSE (0.0000002 and 13.6039) and highest R2 and CE (0.9971 and 0.9971). When compared between LM, SCG and RBF, the BFGS model structure 3-7-1 is the better and more accurate to prediction suspended sediment discharge in Jenderam catchment. The performance value in testing process, MAE and RMSE (13.5769 and 17.9011) is smallest, meanwhile R2 and CE (0.9999 and 0.9998) is the highest if it compared with the another BFGS Quasi-Newton model (6-3-1, 9-10-1 and 12-12-1). Based on the performance statistics value, MLRg, LM, SCG, BFGS and RBF suitable and accurately for prediction by modeling the non-linear complex behavior of suspended sediment responses to rainfall, water depth and discharge. The comparison between artificial neural network (ANN) and MLRg, the MLRg Model 2 accurately for to prediction suspended sediment discharge (kg

  1. Computer software for linear and nonlinear regression in organic NMR; Programa de computador para regressao linear e nao linear em R.M.N. organica

    Energy Technology Data Exchange (ETDEWEB)

    Canto, Eduardo Leite do; Rittner, Roberto [Universidade Estadual de Campinas, SP (Brazil). Inst. de Quimica

    1992-12-31

    Calculation involving two variable linear regressions, require specific procedures generally not familiar to chemist. For attending the necessity of fast and efficient handling of NMR data, a self explained and Pc portable software has been developed, which allows user to produce and use diskette recorded tables, containing chemical shift or any other substituent physical-chemical measurements and constants ({sigma}{sub T}, {sigma}{sup o}{sub R}, E{sub s}, ...) 9 refs., 1 fig.

  2. Straight line fitting and predictions: On a marginal likelihood approach to linear regression and errors-in-variables models

    Science.gov (United States)

    Christiansen, Bo

    2015-04-01

    Linear regression methods are without doubt the most used approaches to describe and predict data in the physical sciences. They are often good first order approximations and they are in general easier to apply and interpret than more advanced methods. However, even the properties of univariate regression can lead to debate over the appropriateness of various models as witnessed by the recent discussion about climate reconstruction methods. Before linear regression is applied important choices have to be made regarding the origins of the noise terms and regarding which of the two variables under consideration that should be treated as the independent variable. These decisions are often not easy to make but they may have a considerable impact on the results. We seek to give a unified probabilistic - Bayesian with flat priors - treatment of univariate linear regression and prediction by taking, as starting point, the general errors-in-variables model (Christiansen, J. Clim., 27, 2014-2031, 2014). Other versions of linear regression can be obtained as limits of this model. We derive the likelihood of the model parameters and predictands of the general errors-in-variables model by marginalizing over the nuisance parameters. The resulting likelihood is relatively simple and easy to analyze and calculate. The well known unidentifiability of the errors-in-variables model is manifested as the absence of a well-defined maximum in the likelihood. However, this does not mean that probabilistic inference can not be made; the marginal likelihoods of model parameters and the predictands have, in general, well-defined maxima. We also include a probabilistic version of classical calibration and show how it is related to the errors-in-variables model. The results are illustrated by an example from the coupling between the lower stratosphere and the troposphere in the Northern Hemisphere winter.

  3. The detection of influential subsets in linear regression using an influence matrix

    OpenAIRE

    Peña, Daniel; Yohai, Víctor J.

    1991-01-01

    This paper presents a new method to identify influential subsets in linear regression problems. The procedure uses the eigenstructure of an influence matrix which is defined as the matrix of uncentered covariance of the effect on the whole data set of deleting each observation, normalized to include the univariate Cook's statistics in the diagonal. It is shown that points in an influential subset will appear with large weight in at least one of the eigenvector linked to the largest eigenvalue...

  4. An Investigation of the Fit of Linear Regression Models to Data from an SAT[R] Validity Study. Research Report 2011-3

    Science.gov (United States)

    Kobrin, Jennifer L.; Sinharay, Sandip; Haberman, Shelby J.; Chajewski, Michael

    2011-01-01

    This study examined the adequacy of a multiple linear regression model for predicting first-year college grade point average (FYGPA) using SAT[R] scores and high school grade point average (HSGPA). A variety of techniques, both graphical and statistical, were used to examine if it is possible to improve on the linear regression model. The results…

  5. Risk factors for pedicled flap necrosis in hand soft tissue reconstruction: a multivariate logistic regression analysis.

    Science.gov (United States)

    Gong, Xu; Cui, Jianli; Jiang, Ziping; Lu, Laijin; Li, Xiucun

    2018-03-01

    Few clinical retrospective studies have reported the risk factors of pedicled flap necrosis in hand soft tissue reconstruction. The aim of this study was to identify non-technical risk factors associated with pedicled flap perioperative necrosis in hand soft tissue reconstruction via a multivariate logistic regression analysis. For patients with hand soft tissue reconstruction, we carefully reviewed hospital records and identified 163 patients who met the inclusion criteria. The characteristics of these patients, flap transfer procedures and postoperative complications were recorded. Eleven predictors were identified. The correlations between pedicled flap necrosis and risk factors were analysed using a logistic regression model. Of 163 skin flaps, 125 flaps survived completely without any complications. The pedicled flap necrosis rate in hands was 11.04%, which included partial flap necrosis (7.36%) and total flap necrosis (3.68%). Soft tissue defects in fingers were noted in 68.10% of all cases. The logistic regression analysis indicated that the soft tissue defect site (P = 0.046, odds ratio (OR) = 0.079, confidence interval (CI) (0.006, 0.959)), flap size (P = 0.020, OR = 1.024, CI (1.004, 1.045)) and postoperative wound infection (P < 0.001, OR = 17.407, CI (3.821, 79.303)) were statistically significant risk factors for pedicled flap necrosis of the hand. Soft tissue defect site, flap size and postoperative wound infection were risk factors associated with pedicled flap necrosis in hand soft tissue defect reconstruction. © 2017 Royal Australasian College of Surgeons.

  6. Uncertainty of pesticide residue concentration determined from ordinary and weighted linear regression curve.

    Science.gov (United States)

    Yolci Omeroglu, Perihan; Ambrus, Árpad; Boyacioglu, Dilek

    2018-03-28

    Determination of pesticide residues is based on calibration curves constructed for each batch of analysis. Calibration standard solutions are prepared from a known amount of reference material at different concentration levels covering the concentration range of the analyte in the analysed samples. In the scope of this study, the applicability of both ordinary linear and weighted linear regression (OLR and WLR) for pesticide residue analysis was investigated. We used 782 multipoint calibration curves obtained for 72 different analytical batches with high-pressure liquid chromatography equipped with an ultraviolet detector, and gas chromatography with electron capture, nitrogen phosphorus or mass spectrophotometer detectors. Quality criteria of the linear curves including regression coefficient, standard deviation of relative residuals and deviation of back calculated concentrations were calculated both for WLR and OLR methods. Moreover, the relative uncertainty of the predicted analyte concentration was estimated for both methods. It was concluded that calibration curve based on WLR complies with all the quality criteria set by international guidelines compared to those calculated with OLR. It means that all the data fit well with WLR for pesticide residue analysis. It was estimated that, regardless of the actual concentration range of the calibration, relative uncertainty at the lowest calibrated level ranged between 0.3% and 113.7% for OLR and between 0.2% and 22.1% for WLR. At or above 1/3 of the calibrated range, uncertainty of calibration curve ranged between 0.1% and 16.3% for OLR and 0% and 12.2% for WLR, and therefore, the two methods gave comparable results.

  7. The regression-calibration method for fitting generalized linear models with additive measurement error

    OpenAIRE

    James W. Hardin; Henrik Schmeidiche; Raymond J. Carroll

    2003-01-01

    This paper discusses and illustrates the method of regression calibration. This is a straightforward technique for fitting models with additive measurement error. We present this discussion in terms of generalized linear models (GLMs) following the notation defined in Hardin and Carroll (2003). Discussion will include specified measurement error, measurement error estimated by replicate error-prone proxies, and measurement error estimated by instrumental variables. The discussion focuses on s...

  8. Non-linear multivariate and multiscale monitoring and signal denoising strategy using Kernel Principal Component Analysis combined with Ensemble Empirical Mode Decomposition method

    Science.gov (United States)

    Žvokelj, Matej; Zupan, Samo; Prebil, Ivan

    2011-10-01

    The article presents a novel non-linear multivariate and multiscale statistical process monitoring and signal denoising method which combines the strengths of the Kernel Principal Component Analysis (KPCA) non-linear multivariate monitoring approach with the benefits of Ensemble Empirical Mode Decomposition (EEMD) to handle multiscale system dynamics. The proposed method which enables us to cope with complex even severe non-linear systems with a wide dynamic range was named the EEMD-based multiscale KPCA (EEMD-MSKPCA). The method is quite general in nature and could be used in different areas for various tasks even without any really deep understanding of the nature of the system under consideration. Its efficiency was first demonstrated by an illustrative example, after which the applicability for the task of bearing fault detection, diagnosis and signal denosing was tested on simulated as well as actual vibration and acoustic emission (AE) signals measured on purpose-built large-size low-speed bearing test stand. The positive results obtained indicate that the proposed EEMD-MSKPCA method provides a promising tool for tackling non-linear multiscale data which present a convolved picture of many events occupying different regions in the time-frequency plane.

  9. Time-Frequency Analysis of Non-Stationary Biological Signals with Sparse Linear Regression Based Fourier Linear Combiner

    Directory of Open Access Journals (Sweden)

    Yubo Wang

    2017-06-01

    Full Text Available It is often difficult to analyze biological signals because of their nonlinear and non-stationary characteristics. This necessitates the usage of time-frequency decomposition methods for analyzing the subtle changes in these signals that are often connected to an underlying phenomena. This paper presents a new approach to analyze the time-varying characteristics of such signals by employing a simple truncated Fourier series model, namely the band-limited multiple Fourier linear combiner (BMFLC. In contrast to the earlier designs, we first identified the sparsity imposed on the signal model in order to reformulate the model to a sparse linear regression model. The coefficients of the proposed model are then estimated by a convex optimization algorithm. The performance of the proposed method was analyzed with benchmark test signals. An energy ratio metric is employed to quantify the spectral performance and results show that the proposed method Sparse-BMFLC has high mean energy (0.9976 ratio and outperforms existing methods such as short-time Fourier transfrom (STFT, continuous Wavelet transform (CWT and BMFLC Kalman Smoother. Furthermore, the proposed method provides an overall 6.22% in reconstruction error.

  10. Time-Frequency Analysis of Non-Stationary Biological Signals with Sparse Linear Regression Based Fourier Linear Combiner.

    Science.gov (United States)

    Wang, Yubo; Veluvolu, Kalyana C

    2017-06-14

    It is often difficult to analyze biological signals because of their nonlinear and non-stationary characteristics. This necessitates the usage of time-frequency decomposition methods for analyzing the subtle changes in these signals that are often connected to an underlying phenomena. This paper presents a new approach to analyze the time-varying characteristics of such signals by employing a simple truncated Fourier series model, namely the band-limited multiple Fourier linear combiner (BMFLC). In contrast to the earlier designs, we first identified the sparsity imposed on the signal model in order to reformulate the model to a sparse linear regression model. The coefficients of the proposed model are then estimated by a convex optimization algorithm. The performance of the proposed method was analyzed with benchmark test signals. An energy ratio metric is employed to quantify the spectral performance and results show that the proposed method Sparse-BMFLC has high mean energy (0.9976) ratio and outperforms existing methods such as short-time Fourier transfrom (STFT), continuous Wavelet transform (CWT) and BMFLC Kalman Smoother. Furthermore, the proposed method provides an overall 6.22% in reconstruction error.

  11. A spline-based regression parameter set for creating customized DARTEL MRI brain templates from infancy to old age

    Directory of Open Access Journals (Sweden)

    Marko Wilke

    2018-02-01

    Full Text Available This dataset contains the regression parameters derived by analyzing segmented brain MRI images (gray matter and white matter from a large population of healthy subjects, using a multivariate adaptive regression splines approach. A total of 1919 MRI datasets ranging in age from 1–75 years from four publicly available datasets (NIH, C-MIND, fCONN, and IXI were segmented using the CAT12 segmentation framework, writing out gray matter and white matter images normalized using an affine-only spatial normalization approach. These images were then subjected to a six-step DARTEL procedure, employing an iterative non-linear registration approach and yielding increasingly crisp intermediate images. The resulting six datasets per tissue class were then analyzed using multivariate adaptive regression splines, using the CerebroMatic toolbox. This approach allows for flexibly modelling smoothly varying trajectories while taking into account demographic (age, gender as well as technical (field strength, data quality predictors. The resulting regression parameters described here can be used to generate matched DARTEL or SHOOT templates for a given population under study, from infancy to old age. The dataset and the algorithm used to generate it are publicly available at https://irc.cchmc.org/software/cerebromatic.php. Keywords: MRI template creation, Multivariate adaptive regression splines, DARTEL, Structural MRI

  12. Heteroscedasticity as a Basis of Direction Dependence in Reversible Linear Regression Models.

    Science.gov (United States)

    Wiedermann, Wolfgang; Artner, Richard; von Eye, Alexander

    2017-01-01

    Heteroscedasticity is a well-known issue in linear regression modeling. When heteroscedasticity is observed, researchers are advised to remedy possible model misspecification of the explanatory part of the model (e.g., considering alternative functional forms and/or omitted variables). The present contribution discusses another source of heteroscedasticity in observational data: Directional model misspecifications in the case of nonnormal variables. Directional misspecification refers to situations where alternative models are equally likely to explain the data-generating process (e.g., x → y versus y → x). It is shown that the homoscedasticity assumption is likely to be violated in models that erroneously treat true nonnormal predictors as response variables. Recently, Direction Dependence Analysis (DDA) has been proposed as a framework to empirically evaluate the direction of effects in linear models. The present study links the phenomenon of heteroscedasticity with DDA and describes visual diagnostics and nine homoscedasticity tests that can be used to make decisions concerning the direction of effects in linear models. Results of a Monte Carlo simulation that demonstrate the adequacy of the approach are presented. An empirical example is provided, and applicability of the methodology in cases of violated assumptions is discussed.

  13. Multivariate return periods of sea storms for coastal erosion risk assessment

    Directory of Open Access Journals (Sweden)

    S. Corbella

    2012-08-01

    Full Text Available The erosion of a beach depends on various storm characteristics. Ideally, the risk associated with a storm would be described by a single multivariate return period that is also representative of the erosion risk, i.e. a 100 yr multivariate storm return period would cause a 100 yr erosion return period. Unfortunately, a specific probability level may be associated with numerous combinations of storm characteristics. These combinations, despite having the same multivariate probability, may cause very different erosion outcomes. This paper explores this ambiguity problem in the context of copula based multivariate return periods and using a case study at Durban on the east coast of South Africa. Simulations were used to correlate multivariate return periods of historical events to return periods of estimated storm induced erosion volumes. In addition, the relationship of the most-likely design event (Salvadori et al., 2011 to coastal erosion was investigated. It was found that the multivariate return periods for wave height and duration had the highest correlation to erosion return periods. The most-likely design event was found to be an inadequate design method in its current form. We explore the inclusion of conditions based on the physical realizability of wave events and the use of multivariate linear regression to relate storm parameters to erosion computed from a process based model. Establishing a link between storm statistics and erosion consequences can resolve the ambiguity between multivariate storm return periods and associated erosion return periods.

  14. Two Paradoxes in Linear Regression Analysis

    Science.gov (United States)

    FENG, Ge; PENG, Jing; TU, Dongke; ZHENG, Julia Z.; FENG, Changyong

    2016-01-01

    Summary Regression is one of the favorite tools in applied statistics. However, misuse and misinterpretation of results from regression analysis are common in biomedical research. In this paper we use statistical theory and simulation studies to clarify some paradoxes around this popular statistical method. In particular, we show that a widely used model selection procedure employed in many publications in top medical journals is wrong. Formal procedures based on solid statistical theory should be used in model selection. PMID:28638214

  15. Regression model of support vector machines for least squares prediction of crystallinity of cracking catalysts by infrared spectroscopy

    International Nuclear Information System (INIS)

    Comesanna Garcia, Yumirka; Dago Morales, Angel; Talavera Bustamante, Isneri

    2010-01-01

    The recently introduction of the least squares support vector machines method for regression purposes in the field of Chemometrics has provided several advantages to linear and nonlinear multivariate calibration methods. The objective of the paper was to propose the use of the least squares support vector machine as an alternative multivariate calibration method for the prediction of the percentage of crystallinity of fluidized catalytic cracking catalysts, by means of Fourier transform mid-infrared spectroscopy. A linear kernel was used in the calculations of the regression model. The optimization of its gamma parameter was carried out using the leave-one-out cross-validation procedure. The root mean square error of prediction was used to measure the performance of the model. The accuracy of the results obtained with the application of the method is in accordance with the uncertainty of the X-ray powder diffraction reference method. To compare the generalization capability of the developed method, a comparison study was carried out, taking into account the results achieved with the new model and those reached through the application of linear calibration methods. The developed method can be easily implemented in refinery laboratories

  16. How to deal with continuous and dichotomic outcomes in epidemiological research: linear and logistic regression analyses

    NARCIS (Netherlands)

    Tripepi, Giovanni; Jager, Kitty J.; Stel, Vianda S.; Dekker, Friedo W.; Zoccali, Carmine

    2011-01-01

    Because of some limitations of stratification methods, epidemiologists frequently use multiple linear and logistic regression analyses to address specific epidemiological questions. If the dependent variable is a continuous one (for example, systolic pressure and serum creatinine), the researcher

  17. BFLCRM: A BAYESIAN FUNCTIONAL LINEAR COX REGRESSION MODEL FOR PREDICTING TIME TO CONVERSION TO ALZHEIMER'S DISEASE.

    Science.gov (United States)

    Lee, Eunjee; Zhu, Hongtu; Kong, Dehan; Wang, Yalin; Giovanello, Kelly Sullivan; Ibrahim, Joseph G

    2015-12-01

    The aim of this paper is to develop a Bayesian functional linear Cox regression model (BFLCRM) with both functional and scalar covariates. This new development is motivated by establishing the likelihood of conversion to Alzheimer's disease (AD) in 346 patients with mild cognitive impairment (MCI) enrolled in the Alzheimer's Disease Neuroimaging Initiative 1 (ADNI-1) and the early markers of conversion. These 346 MCI patients were followed over 48 months, with 161 MCI participants progressing to AD at 48 months. The functional linear Cox regression model was used to establish that functional covariates including hippocampus surface morphology and scalar covariates including brain MRI volumes, cognitive performance (ADAS-Cog), and APOE status can accurately predict time to onset of AD. Posterior computation proceeds via an efficient Markov chain Monte Carlo algorithm. A simulation study is performed to evaluate the finite sample performance of BFLCRM.

  18. Toward Customer-Centric Organizational Science: A Common Language Effect Size Indicator for Multiple Linear Regressions and Regressions With Higher-Order Terms.

    Science.gov (United States)

    Krasikova, Dina V; Le, Huy; Bachura, Eric

    2018-01-22

    To address a long-standing concern regarding a gap between organizational science and practice, scholars called for more intuitive and meaningful ways of communicating research results to users of academic research. In this article, we develop a common language effect size index (CLβ) that can help translate research results to practice. We demonstrate how CLβ can be computed and used to interpret the effects of continuous and categorical predictors in multiple linear regression models. We also elaborate on how the proposed CLβ index is computed and used to interpret interactions and nonlinear effects in regression models. In addition, we test the robustness of the proposed index to violations of normality and provide means for computing standard errors and constructing confidence intervals around its estimates. (PsycINFO Database Record (c) 2018 APA, all rights reserved).

  19. Real time computer control of a nonlinear Multivariable System via Linearization and Stability Analysis

    International Nuclear Information System (INIS)

    Raza, K.S.M.

    2004-01-01

    This paper demonstrates that if a complicated nonlinear, non-square, state-coupled multi variable system is smartly linearized and subjected to a thorough stability analysis then we can achieve our design objectives via a controller which will be quite simple (in term of resource usage and execution time) and very efficient (in terms of robustness). Further the aim is to implement this controller via computer in a real time environment. Therefore first a nonlinear mathematical model of the system is achieved. An intelligent work is done to decouple the multivariable system. Linearization and stability analysis techniques are employed for the development of a linearized and mathematically sound control law. Nonlinearities like the saturation in actuators are also been catered. The controller is then discretized using Runge-Kutta integration. Finally the discretized control law is programmed in a computer in a real time environment. The programme is done in RT -Linux using GNU C for the real time realization of the control scheme. The real time processes, like sampling and controlled actuation, and the non real time processes, like graphical user interface and display, are programmed as different tasks. The issue of inter process communication, between real time and non real time task is addressed quite carefully. The results of this research pursuit are presented graphically. (author)

  20. A simple bias correction in linear regression for quantitative trait association under two-tail extreme selection.

    Science.gov (United States)

    Kwan, Johnny S H; Kung, Annie W C; Sham, Pak C

    2011-09-01

    Selective genotyping can increase power in quantitative trait association. One example of selective genotyping is two-tail extreme selection, but simple linear regression analysis gives a biased genetic effect estimate. Here, we present a simple correction for the bias.

  1. Introduction to regression graphics

    CERN Document Server

    Cook, R Dennis

    2009-01-01

    Covers the use of dynamic and interactive computer graphics in linear regression analysis, focusing on analytical graphics. Features new techniques like plot rotation. The authors have composed their own regression code, using Xlisp-Stat language called R-code, which is a nearly complete system for linear regression analysis and can be utilized as the main computer program in a linear regression course. The accompanying disks, for both Macintosh and Windows computers, contain the R-code and Xlisp-Stat. An Instructor's Manual presenting detailed solutions to all the problems in the book is ava

  2. Endpoint in plasma etch process using new modified w-multivariate charts and windowed regression

    Science.gov (United States)

    Zakour, Sihem Ben; Taleb, Hassen

    2017-09-01

    Endpoint detection is very important undertaking on the side of getting a good understanding and figuring out if a plasma etching process is done in the right way, especially if the etched area is very small (0.1%). It truly is a crucial part of supplying repeatable effects in every single wafer. When the film being etched has been completely cleared, the endpoint is reached. To ensure the desired device performance on the produced integrated circuit, the high optical emission spectroscopy (OES) sensor is employed. The huge number of gathered wavelengths (profiles) is then analyzed and pre-processed using a new proposed simple algorithm named Spectra peak selection (SPS) to select the important wavelengths, then we employ wavelet analysis (WA) to enhance the performance of detection by suppressing noise and redundant information. The selected and treated OES wavelengths are then used in modified multivariate control charts (MEWMA and Hotelling) for three statistics (mean, SD and CV) and windowed polynomial regression for mean. The employ of three aforementioned statistics is motivated by controlling mean shift, variance shift and their ratio (CV) if both mean and SD are not stable. The control charts show their performance in detecting endpoint especially W-mean Hotelling chart and the worst result is given by CV statistic. As the best detection of endpoint is given by the W-Hotelling mean statistic, this statistic will be used to construct a windowed wavelet Hotelling polynomial regression. This latter can only identify the window containing endpoint phenomenon.

  3. Hippocampal atrophy and developmental regression as first sign of linear scleroderma "en coup de sabre".

    Science.gov (United States)

    Verhelst, Helene E; Beele, Hilde; Joos, Rik; Vanneuville, Benedicte; Van Coster, Rudy N

    2008-11-01

    An 8-year-old girl with linear scleroderma "en coup de sabre" is reported who, at preschool age, presented with intractable simple partial seizures more than 1 year before skin lesions were first noticed. MRI revealed hippocampal atrophy, controlaterally to the seizures and ipsilaterally to the skin lesions. In the following months, a mental and motor regression was noticed. Cerebral CT scan showed multiple foci of calcifications in the affected hemisphere. In previously reported patients the skin lesions preceded the neurological signs. To the best of our knowledge, hippocampal atrophy was not earlier reported as presenting symptom of linear scleroderma. Linear scleroderma should be included in the differential diagnosis in patients with unilateral hippocampal atrophy even when the typical skin lesions are not present.

  4. Bayesian quantile regression-based partially linear mixed-effects joint models for longitudinal data with multiple features.

    Science.gov (United States)

    Zhang, Hanze; Huang, Yangxin; Wang, Wei; Chen, Henian; Langland-Orban, Barbara

    2017-01-01

    In longitudinal AIDS studies, it is of interest to investigate the relationship between HIV viral load and CD4 cell counts, as well as the complicated time effect. Most of common models to analyze such complex longitudinal data are based on mean-regression, which fails to provide efficient estimates due to outliers and/or heavy tails. Quantile regression-based partially linear mixed-effects models, a special case of semiparametric models enjoying benefits of both parametric and nonparametric models, have the flexibility to monitor the viral dynamics nonparametrically and detect the varying CD4 effects parametrically at different quantiles of viral load. Meanwhile, it is critical to consider various data features of repeated measurements, including left-censoring due to a limit of detection, covariate measurement error, and asymmetric distribution. In this research, we first establish a Bayesian joint models that accounts for all these data features simultaneously in the framework of quantile regression-based partially linear mixed-effects models. The proposed models are applied to analyze the Multicenter AIDS Cohort Study (MACS) data. Simulation studies are also conducted to assess the performance of the proposed methods under different scenarios.

  5. Multivariate methods and forecasting with IBM SPSS statistics

    CERN Document Server

    Aljandali, Abdulkader

    2017-01-01

    This is the second of a two-part guide to quantitative analysis using the IBM SPSS Statistics software package; this volume focuses on multivariate statistical methods and advanced forecasting techniques. More often than not, regression models involve more than one independent variable. For example, forecasting methods are commonly applied to aggregates such as inflation rates, unemployment, exchange rates, etc., that have complex relationships with determining variables. This book introduces multivariate regression models and provides examples to help understand theory underpinning the model. The book presents the fundamentals of multivariate regression and then moves on to examine several related techniques that have application in business-orientated fields such as logistic and multinomial regression. Forecasting tools such as the Box-Jenkins approach to time series modeling are introduced, as well as exponential smoothing and naïve techniques. This part also covers hot topics such as Factor Analysis, Dis...

  6. Application of genetic algorithm - multiple linear regressions to predict the activity of RSK inhibitors

    Directory of Open Access Journals (Sweden)

    Avval Zhila Mohajeri

    2015-01-01

    Full Text Available This paper deals with developing a linear quantitative structure-activity relationship (QSAR model for predicting the RSK inhibition activity of some new compounds. A dataset consisting of 62 pyrazino [1,2-α] indole, diazepino [1,2-α] indole, and imidazole derivatives with known inhibitory activities was used. Multiple linear regressions (MLR technique combined with the stepwise (SW and the genetic algorithm (GA methods as variable selection tools was employed. For more checking stability, robustness and predictability of the proposed models, internal and external validation techniques were used. Comparison of the results obtained, indicate that the GA-MLR model is superior to the SW-MLR model and that it isapplicable for designing novel RSK inhibitors.

  7. A computer tool for a minimax criterion in binary response and heteroscedastic simple linear regression models.

    Science.gov (United States)

    Casero-Alonso, V; López-Fidalgo, J; Torsney, B

    2017-01-01

    Binary response models are used in many real applications. For these models the Fisher information matrix (FIM) is proportional to the FIM of a weighted simple linear regression model. The same is also true when the weight function has a finite integral. Thus, optimal designs for one binary model are also optimal for the corresponding weighted linear regression model. The main objective of this paper is to provide a tool for the construction of MV-optimal designs, minimizing the maximum of the variances of the estimates, for a general design space. MV-optimality is a potentially difficult criterion because of its nondifferentiability at equal variance designs. A methodology for obtaining MV-optimal designs where the design space is a compact interval [a, b] will be given for several standard weight functions. The methodology will allow us to build a user-friendly computer tool based on Mathematica to compute MV-optimal designs. Some illustrative examples will show a representation of MV-optimal designs in the Euclidean plane, taking a and b as the axes. The applet will be explained using two relevant models. In the first one the case of a weighted linear regression model is considered, where the weight function is directly chosen from a typical family. In the second example a binary response model is assumed, where the probability of the outcome is given by a typical probability distribution. Practitioners can use the provided applet to identify the solution and to know the exact support points and design weights. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  8. Regional trends in short-duration precipitation extremes: a flexible multivariate monotone quantile regression approach

    Science.gov (United States)

    Cannon, Alex

    2017-04-01

    univariate technique, and cannot incorporate information from additional covariates, for example ENSO state or physiographic controls on extreme rainfall within a region. Here, the univariate MQR model is extended to allow the use of multiple covariates. Multivariate monotone quantile regression (MMQR) is based on a single hidden-layer feedforward network with the quantile regression error function and partial monotonicity constraints. The MMQR model is demonstrated via Monte Carlo simulations and the estimation and visualization of regional trends in moderate rainfall extremes based on homogenized sub-daily precipitation data at stations in Canada.

  9. Two-Sample Tests for High-Dimensional Linear Regression with an Application to Detecting Interactions.

    Science.gov (United States)

    Xia, Yin; Cai, Tianxi; Cai, T Tony

    2018-01-01

    Motivated by applications in genomics, we consider in this paper global and multiple testing for the comparisons of two high-dimensional linear regression models. A procedure for testing the equality of the two regression vectors globally is proposed and shown to be particularly powerful against sparse alternatives. We then introduce a multiple testing procedure for identifying unequal coordinates while controlling the false discovery rate and false discovery proportion. Theoretical justifications are provided to guarantee the validity of the proposed tests and optimality results are established under sparsity assumptions on the regression coefficients. The proposed testing procedures are easy to implement. Numerical properties of the procedures are investigated through simulation and data analysis. The results show that the proposed tests maintain the desired error rates under the null and have good power under the alternative at moderate sample sizes. The procedures are applied to the Framingham Offspring study to investigate the interactions between smoking and cardiovascular related genetic mutations important for an inflammation marker.

  10. An Exact Confidence Region in Multivariate Calibration

    OpenAIRE

    Mathew, Thomas; Kasala, Subramanyam

    1994-01-01

    In the multivariate calibration problem using a multivariate linear model, an exact confidence region is constructed. It is shown that the region is always nonempty and is invariant under nonsingular transformations.

  11. A simple bias correction in linear regression for quantitative trait association under two-tail extreme selection

    OpenAIRE

    Kwan, Johnny S. H.; Kung, Annie W. C.; Sham, Pak C.

    2011-01-01

    Selective genotyping can increase power in quantitative trait association. One example of selective genotyping is two-tail extreme selection, but simple linear regression analysis gives a biased genetic effect estimate. Here, we present a simple correction for the bias. © The Author(s) 2011.

  12. Method validation using weighted linear regression models for quantification of UV filters in water samples.

    Science.gov (United States)

    da Silva, Claudia Pereira; Emídio, Elissandro Soares; de Marchi, Mary Rosa Rodrigues

    2015-01-01

    This paper describes the validation of a method consisting of solid-phase extraction followed by gas chromatography-tandem mass spectrometry for the analysis of the ultraviolet (UV) filters benzophenone-3, ethylhexyl salicylate, ethylhexyl methoxycinnamate and octocrylene. The method validation criteria included evaluation of selectivity, analytical curve, trueness, precision, limits of detection and limits of quantification. The non-weighted linear regression model has traditionally been used for calibration, but it is not necessarily the optimal model in all cases. Because the assumption of homoscedasticity was not met for the analytical data in this work, a weighted least squares linear regression was used for the calibration method. The evaluated analytical parameters were satisfactory for the analytes and showed recoveries at four fortification levels between 62% and 107%, with relative standard deviations less than 14%. The detection limits ranged from 7.6 to 24.1 ng L(-1). The proposed method was used to determine the amount of UV filters in water samples from water treatment plants in Araraquara and Jau in São Paulo, Brazil. Copyright © 2014 Elsevier B.V. All rights reserved.

  13. Linear and evolutionary polynomial regression models to forecast coastal dynamics: Comparison and reliability assessment

    Science.gov (United States)

    Bruno, Delia Evelina; Barca, Emanuele; Goncalves, Rodrigo Mikosz; de Araujo Queiroz, Heithor Alexandre; Berardi, Luigi; Passarella, Giuseppe

    2018-01-01

    In this paper, the Evolutionary Polynomial Regression data modelling strategy has been applied to study small scale, short-term coastal morphodynamics, given its capability for treating a wide database of known information, non-linearly. Simple linear and multilinear regression models were also applied to achieve a balance between the computational load and reliability of estimations of the three models. In fact, even though it is easy to imagine that the more complex the model, the more the prediction improves, sometimes a "slight" worsening of estimations can be accepted in exchange for the time saved in data organization and computational load. The models' outcomes were validated through a detailed statistical, error analysis, which revealed a slightly better estimation of the polynomial model with respect to the multilinear model, as expected. On the other hand, even though the data organization was identical for the two models, the multilinear one required a simpler simulation setting and a faster run time. Finally, the most reliable evolutionary polynomial regression model was used in order to make some conjecture about the uncertainty increase with the extension of extrapolation time of the estimation. The overlapping rate between the confidence band of the mean of the known coast position and the prediction band of the estimated position can be a good index of the weakness in producing reliable estimations when the extrapolation time increases too much. The proposed models and tests have been applied to a coastal sector located nearby Torre Colimena in the Apulia region, south Italy.

  14. Improving ASTER GDEM Accuracy Using Land Use-Based Linear Regression Methods: A Case Study of Lianyungang, East China

    Directory of Open Access Journals (Sweden)

    Xiaoyan Yang

    2018-04-01

    Full Text Available The Advanced Spaceborne Thermal-Emission and Reflection Radiometer Global Digital Elevation Model (ASTER GDEM is important to a wide range of geographical and environmental studies. Its accuracy, to some extent associated with land-use types reflecting topography, vegetation coverage, and human activities, impacts the results and conclusions of these studies. In order to improve the accuracy of ASTER GDEM prior to its application, we investigated ASTER GDEM errors based on individual land-use types and proposed two linear regression calibration methods, one considering only land use-specific errors and the other considering the impact of both land-use and topography. Our calibration methods were tested on the coastal prefectural city of Lianyungang in eastern China. Results indicate that (1 ASTER GDEM is highly accurate for rice, wheat, grass and mining lands but less accurate for scenic, garden, wood and bare lands; (2 despite improvements in ASTER GDEM2 accuracy, multiple linear regression calibration requires more data (topography and a relatively complex calibration process; (3 simple linear regression calibration proves a practicable and simplified means to systematically investigate and improve the impact of land-use on ASTER GDEM accuracy. Our method is applicable to areas with detailed land-use data based on highly accurate field-based point-elevation measurements.

  15. Estimating integrated variance in the presence of microstructure noise using linear regression

    Science.gov (United States)

    Holý, Vladimír

    2017-07-01

    Using financial high-frequency data for estimation of integrated variance of asset prices is beneficial but with increasing number of observations so-called microstructure noise occurs. This noise can significantly bias the realized variance estimator. We propose a method for estimation of the integrated variance robust to microstructure noise as well as for testing the presence of the noise. Our method utilizes linear regression in which realized variances estimated from different data subsamples act as dependent variable while the number of observations act as explanatory variable. We compare proposed estimator with other methods on simulated data for several microstructure noise structures.

  16. Face Hallucination with Linear Regression Model in Semi-Orthogonal Multilinear PCA Method

    Science.gov (United States)

    Asavaskulkiet, Krissada

    2018-04-01

    In this paper, we propose a new face hallucination technique, face images reconstruction in HSV color space with a semi-orthogonal multilinear principal component analysis method. This novel hallucination technique can perform directly from tensors via tensor-to-vector projection by imposing the orthogonality constraint in only one mode. In our experiments, we use facial images from FERET database to test our hallucination approach which is demonstrated by extensive experiments with high-quality hallucinated color faces. The experimental results assure clearly demonstrated that we can generate photorealistic color face images by using the SO-MPCA subspace with a linear regression model.

  17. Multivariate regression models for the simultaneous quantitative analysis of calcium and magnesium carbonates and magnesium oxide through drifts data

    Directory of Open Access Journals (Sweden)

    Marder Luciano

    2006-01-01

    Full Text Available In the present work multivariate regression models were developed for the quantitative analysis of ternary systems using Diffuse Reflectance Infrared Fourier Transform Spectroscopy (DRIFTS to determine the concentration in weight of calcium carbonate, magnesium carbonate and magnesium oxide. Nineteen spectra of standard samples previously defined in ternary diagram by mixture design were prepared and mid-infrared diffuse reflectance spectra were recorded. The partial least squares (PLS regression method was applied to the model. The spectra set was preprocessed by either mean-centered and variance-scaled (model 2 or mean-centered only (model 1. The results based on the prediction performance of the external validation set expressed by RMSEP (root mean square error of prediction demonstrated that it is possible to develop good models to simultaneously determine calcium carbonate, magnesium carbonate and magnesium oxide content in powdered samples that can be used in the study of the thermal decomposition of dolomite rocks.

  18. Genomic prediction based on data from three layer lines using non-linear regression models.

    Science.gov (United States)

    Huang, Heyun; Windig, Jack J; Vereijken, Addie; Calus, Mario P L

    2014-11-06

    Most studies on genomic prediction with reference populations that include multiple lines or breeds have used linear models. Data heterogeneity due to using multiple populations may conflict with model assumptions used in linear regression methods. In an attempt to alleviate potential discrepancies between assumptions of linear models and multi-population data, two types of alternative models were used: (1) a multi-trait genomic best linear unbiased prediction (GBLUP) model that modelled trait by line combinations as separate but correlated traits and (2) non-linear models based on kernel learning. These models were compared to conventional linear models for genomic prediction for two lines of brown layer hens (B1 and B2) and one line of white hens (W1). The three lines each had 1004 to 1023 training and 238 to 240 validation animals. Prediction accuracy was evaluated by estimating the correlation between observed phenotypes and predicted breeding values. When the training dataset included only data from the evaluated line, non-linear models yielded at best a similar accuracy as linear models. In some cases, when adding a distantly related line, the linear models showed a slight decrease in performance, while non-linear models generally showed no change in accuracy. When only information from a closely related line was used for training, linear models and non-linear radial basis function (RBF) kernel models performed similarly. The multi-trait GBLUP model took advantage of the estimated genetic correlations between the lines. Combining linear and non-linear models improved the accuracy of multi-line genomic prediction. Linear models and non-linear RBF models performed very similarly for genomic prediction, despite the expectation that non-linear models could deal better with the heterogeneous multi-population data. This heterogeneity of the data can be overcome by modelling trait by line combinations as separate but correlated traits, which avoids the occasional

  19. Temperature uniformity control in RTP using multivariable adaptive control

    Energy Technology Data Exchange (ETDEWEB)

    Morales, S.; Dahhou, B.; Dilhac, J.M. [Centre National de la Recherche Scientifique (CNRS), 31 - Toulouse (France); Morales, S.

    1995-12-31

    In Rapid Thermal Processing (RTP) control of the wafer temperature during all processing to get good trajectory following, together with spatial temperature uniformity, is essential. It is well know as RTP process is nonlinear, classical control laws are not very efficient. In this work, the authors aim at studying the applicability of MIMO (Multiple Inputs Multiple Outputs) adaptive techniques to solve the temperature control problems in RTP. A multivariable linear discrete time CARIMA (Controlled Auto Regressive Integrating Moving Average) model of the highly non-linear process is identified on-line using a robust identification technique. The identified model is used to compute an infinite time LQ (Linear Quadratic) based control law, with a partial state reference model. This reference model smooths the original setpoint sequence, and at the same time gives a tracking capability to the LQ control law. After an experimental open-loop investigation, the results of the application of the adaptive control law are presented. Finally, some comments on the future difficulties and developments of the application of adaptive control in RTP are given. (author) 13 refs.

  20. Multivariate calibration on NIR data: development of a model for the rapid evaluation of ethanol content in bakery products.

    Science.gov (United States)

    Bello, Alessandra; Bianchi, Federica; Careri, Maria; Giannetto, Marco; Mori, Giovanni; Musci, Marilena

    2007-11-05

    A new NIR method based on multivariate calibration for determination of ethanol in industrially packed wholemeal bread was developed and validated. GC-FID was used as reference method for the determination of actual ethanol concentration of different samples of wholemeal bread with proper content of added ethanol, ranging from 0 to 3.5% (w/w). Stepwise discriminant analysis was carried out on the NIR dataset, in order to reduce the number of original variables by selecting those that were able to discriminate between the samples of different ethanol concentrations. With the so selected variables a multivariate calibration model was then obtained by multiple linear regression. The prediction power of the linear model was optimized by a new "leave one out" method, so that the number of original variables resulted further reduced.

  1. Linear regression based on Minimum Covariance Determinant (MCD) and TELBS methods on the productivity of phytoplankton

    Science.gov (United States)

    Gusriani, N.; Firdaniza

    2018-03-01

    The existence of outliers on multiple linear regression analysis causes the Gaussian assumption to be unfulfilled. If the Least Square method is forcedly used on these data, it will produce a model that cannot represent most data. For that, we need a robust regression method against outliers. This paper will compare the Minimum Covariance Determinant (MCD) method and the TELBS method on secondary data on the productivity of phytoplankton, which contains outliers. Based on the robust determinant coefficient value, MCD method produces a better model compared to TELBS method.

  2. U.S. Army Armament Research, Development and Engineering Center Grain Evaluation Software to Numerically Predict Linear Burn Regression for Solid Propellant Grain Geometries

    Science.gov (United States)

    2017-10-01

    ENGINEERING CENTER GRAIN EVALUATION SOFTWARE TO NUMERICALLY PREDICT LINEAR BURN REGRESSION FOR SOLID PROPELLANT GRAIN GEOMETRIES Brian...distribution is unlimited. AD U.S. ARMY ARMAMENT RESEARCH, DEVELOPMENT AND ENGINEERING CENTER Munitions Engineering Technology Center Picatinny...U.S. ARMY ARMAMENT RESEARCH, DEVELOPMENT AND ENGINEERING CENTER GRAIN EVALUATION SOFTWARE TO NUMERICALLY PREDICT LINEAR BURN REGRESSION FOR SOLID

  3. Isotherms and thermodynamics by linear and non-linear regression analysis for the sorption of methylene blue onto activated carbon: Comparison of various error functions

    International Nuclear Information System (INIS)

    Kumar, K. Vasanth; Porkodi, K.; Rocha, F.

    2008-01-01

    A comparison of linear and non-linear regression method in selecting the optimum isotherm was made to the experimental equilibrium data of methylene blue sorption by activated carbon. The r 2 was used to select the best fit linear theoretical isotherm. In the case of non-linear regression method, six error functions, namely coefficient of determination (r 2 ), hybrid fractional error function (HYBRID), Marquardt's percent standard deviation (MPSD), average relative error (ARE), sum of the errors squared (ERRSQ) and sum of the absolute errors (EABS) were used to predict the parameters involved in the two and three parameter isotherms and also to predict the optimum isotherm. For two parameter isotherm, MPSD was found to be the best error function in minimizing the error distribution between the experimental equilibrium data and predicted isotherms. In the case of three parameter isotherm, r 2 was found to be the best error function to minimize the error distribution structure between experimental equilibrium data and theoretical isotherms. The present study showed that the size of the error function alone is not a deciding factor to choose the optimum isotherm. In addition to the size of error function, the theory behind the predicted isotherm should be verified with the help of experimental data while selecting the optimum isotherm. A coefficient of non-determination, K 2 was explained and was found to be very useful in identifying the best error function while selecting the optimum isotherm

  4. Multivariate linear models and repeated measurements revisited

    DEFF Research Database (Denmark)

    Dalgaard, Peter

    2009-01-01

    Methods for generalized analysis of variance based on multivariate normal theory have been known for many years. In a repeated measurements context, it is most often of interest to consider transformed responses, typically within-subject contrasts or averages. Efficiency considerations leads...... to sphericity assumptions, use of F tests and the Greenhouse-Geisser and Huynh-Feldt adjustments to compensate for deviations from sphericity. During a recent implementation of such methods in the R language, the general structure of such transformations was reconsidered, leading to a flexible specification...

  5. Aspects of robust linear regression

    NARCIS (Netherlands)

    Davies, P.L.

    1993-01-01

    Section 1 of the paper contains a general discussion of robustness. In Section 2 the influence function of the Hampel-Rousseeuw least median of squares estimator is derived. Linearly invariant weak metrics are constructed in Section 3. It is shown in Section 4 that $S$-estimators satisfy an exact

  6. A novel simple QSAR model for the prediction of anti-HIV activity using multiple linear regression analysis.

    Science.gov (United States)

    Afantitis, Antreas; Melagraki, Georgia; Sarimveis, Haralambos; Koutentis, Panayiotis A; Markopoulos, John; Igglessi-Markopoulou, Olga

    2006-08-01

    A quantitative-structure activity relationship was obtained by applying Multiple Linear Regression Analysis to a series of 80 1-[2-hydroxyethoxy-methyl]-6-(phenylthio) thymine (HEPT) derivatives with significant anti-HIV activity. For the selection of the best among 37 different descriptors, the Elimination Selection Stepwise Regression Method (ES-SWR) was utilized. The resulting QSAR model (R (2) (CV) = 0.8160; S (PRESS) = 0.5680) proved to be very accurate both in training and predictive stages.

  7. Fast Detection of Copper Content in Rice by Laser-Induced Breakdown Spectroscopy with Uni- and Multivariate Analysis

    Science.gov (United States)

    Ye, Lanhan; Song, Kunlin; Shen, Tingting

    2018-01-01

    Fast detection of heavy metals is very important for ensuring the quality and safety of crops. Laser-induced breakdown spectroscopy (LIBS), coupled with uni- and multivariate analysis, was applied for quantitative analysis of copper in three kinds of rice (Jiangsu rice, regular rice, and Simiao rice). For univariate analysis, three pre-processing methods were applied to reduce fluctuations, including background normalization, the internal standard method, and the standard normal variate (SNV). Linear regression models showed a strong correlation between spectral intensity and Cu content, with an R2 more than 0.97. The limit of detection (LOD) was around 5 ppm, lower than the tolerance limit of copper in foods. For multivariate analysis, partial least squares regression (PLSR) showed its advantage in extracting effective information for prediction, and its sensitivity reached 1.95 ppm, while support vector machine regression (SVMR) performed better in both calibration and prediction sets, where Rc2 and Rp2 reached 0.9979 and 0.9879, respectively. This study showed that LIBS could be considered as a constructive tool for the quantification of copper contamination in rice. PMID:29495445

  8. Partitioning of Multivariate Phenotypes using Regression Trees Reveals Complex Patterns of Adaptation to Climate across the Range of Black Cottonwood (Populus trichocarpa

    Directory of Open Access Journals (Sweden)

    Regis Wendpouire Oubida

    2015-03-01

    Full Text Available Local adaptation to climate in temperate forest trees involves the integration of multiple physiological, morphological, and phenological traits. Latitudinal clines are frequently observed for these traits, but environmental constraints also track longitude and altitude. We combined extensive phenotyping of 12 candidate adaptive traits, multivariate regression trees, quantitative genetics, and a genome-wide panel of SNP markers to better understand the interplay among geography, climate, and adaptation to abiotic factors in Populus trichocarpa. Heritabilities were low to moderate (0.13 to 0.32 and population differentiation for many traits exceeded the 99th percentile of the genome-wide distribution of FST, suggesting local adaptation. When climate variables were taken as predictors and the 12 traits as response variables in a multivariate regression tree analysis, evapotranspiration (Eref explained the most variation, with subsequent splits related to mean temperature of the warmest month, frost-free period (FFP, and mean annual precipitation (MAP. These grouping matched relatively well the splits using geographic variables as predictors: the northernmost groups (short FFP and low Eref had the lowest growth, and lowest cold injury index; the southern British Columbia group (low Eref and intermediate temperatures had average growth and cold injury index; the group from the coast of California and Oregon (high Eref and FFP had the highest growth performance and the highest cold injury index; and the southernmost, high-altitude group (with high Eref and low FFP performed poorly, had high cold injury index, and lower water use efficiency. Taken together, these results suggest variation in both temperature and water availability across the range shape multivariate adaptive traits in poplar.

  9. Characteristics and Properties of a Simple Linear Regression Model

    Directory of Open Access Journals (Sweden)

    Kowal Robert

    2016-12-01

    Full Text Available A simple linear regression model is one of the pillars of classic econometrics. Despite the passage of time, it continues to raise interest both from the theoretical side as well as from the application side. One of the many fundamental questions in the model concerns determining derivative characteristics and studying the properties existing in their scope, referring to the first of these aspects. The literature of the subject provides several classic solutions in that regard. In the paper, a completely new design is proposed, based on the direct application of variance and its properties, resulting from the non-correlation of certain estimators with the mean, within the scope of which some fundamental dependencies of the model characteristics are obtained in a much more compact manner. The apparatus allows for a simple and uniform demonstration of multiple dependencies and fundamental properties in the model, and it does it in an intuitive manner. The results were obtained in a classic, traditional area, where everything, as it might seem, has already been thoroughly studied and discovered.

  10. Exhaustive Search for Sparse Variable Selection in Linear Regression

    Science.gov (United States)

    Igarashi, Yasuhiko; Takenaka, Hikaru; Nakanishi-Ohno, Yoshinori; Uemura, Makoto; Ikeda, Shiro; Okada, Masato

    2018-04-01

    We propose a K-sparse exhaustive search (ES-K) method and a K-sparse approximate exhaustive search method (AES-K) for selecting variables in linear regression. With these methods, K-sparse combinations of variables are tested exhaustively assuming that the optimal combination of explanatory variables is K-sparse. By collecting the results of exhaustively computing ES-K, various approximate methods for selecting sparse variables can be summarized as density of states. With this density of states, we can compare different methods for selecting sparse variables such as relaxation and sampling. For large problems where the combinatorial explosion of explanatory variables is crucial, the AES-K method enables density of states to be effectively reconstructed by using the replica-exchange Monte Carlo method and the multiple histogram method. Applying the ES-K and AES-K methods to type Ia supernova data, we confirmed the conventional understanding in astronomy when an appropriate K is given beforehand. However, we found the difficulty to determine K from the data. Using virtual measurement and analysis, we argue that this is caused by data shortage.

  11. Soil moisture estimation using multi linear regression with terraSAR-X data

    Directory of Open Access Journals (Sweden)

    G. García

    2016-06-01

    Full Text Available The first five centimeters of soil form an interface where the main heat fluxes exchanges between the land surface and the atmosphere occur. Besides ground measurements, remote sensing has proven to be an excellent tool for the monitoring of spatial and temporal distributed data of the most relevant Earth surface parameters including soil’s parameters. Indeed, active microwave sensors (Synthetic Aperture Radar - SAR offer the opportunity to monitor soil moisture (HS at global, regional and local scales by monitoring involved processes. Several inversion algorithms, that derive geophysical information as HS from SAR data, were developed. Many of them use electromagnetic models for simulating the backscattering coefficient and are based on statistical techniques, such as neural networks, inversion methods and regression models. Recent studies have shown that simple multiple regression techniques yield satisfactory results. The involved geophysical variables in these methodologies are descriptive of the soil structure, microwave characteristics and land use. Therefore, in this paper we aim at developing a multiple linear regression model to estimate HS on flat agricultural regions using TerraSAR-X satellite data and data from a ground weather station. The results show that the backscatter, the precipitation and the relative humidity are the explanatory variables of HS. The results obtained presented a RMSE of 5.4 and a R2  of about 0.6

  12. Linear Regression Based Real-Time Filtering

    Directory of Open Access Journals (Sweden)

    Misel Batmend

    2013-01-01

    Full Text Available This paper introduces real time filtering method based on linear least squares fitted line. Method can be used in case that a filtered signal is linear. This constraint narrows a band of potential applications. Advantage over Kalman filter is that it is computationally less expensive. The paper further deals with application of introduced method on filtering data used to evaluate a position of engraved material with respect to engraving machine. The filter was implemented to the CNC engraving machine control system. Experiments showing its performance are included.

  13. A unified framework for testing in the linear regression model under unknown order of fractional integration

    DEFF Research Database (Denmark)

    Christensen, Bent Jesper; Kruse, Robinson; Sibbertsen, Philipp

    We consider hypothesis testing in a general linear time series regression framework when the possibly fractional order of integration of the error term is unknown. We show that the approach suggested by Vogelsang (1998a) for the case of integer integration does not apply to the case of fractional...

  14. Use of multiple linear regression and logistic regression models to investigate changes in birthweight for term singleton infants in Scotland.

    Science.gov (United States)

    Bonellie, Sandra R

    2012-10-01

    To illustrate the use of regression and logistic regression models to investigate changes over time in size of babies particularly in relation to social deprivation, age of the mother and smoking. Mean birthweight has been found to be increasing in many countries in recent years, but there are still a group of babies who are born with low birthweights. Population-based retrospective cohort study. Multiple linear regression and logistic regression models are used to analyse data on term 'singleton births' from Scottish hospitals between 1994-2003. Mothers who smoke are shown to give birth to lighter babies on average, a difference of approximately 0.57 Standard deviations lower (95% confidence interval. 0.55-0.58) when adjusted for sex and parity. These mothers are also more likely to have babies that are low birthweight (odds ratio 3.46, 95% confidence interval 3.30-3.63) compared with non-smokers. Low birthweight is 30% more likely where the mother lives in the most deprived areas compared with the least deprived, (odds ratio 1.30, 95% confidence interval 1.21-1.40). Smoking during pregnancy is shown to have a detrimental effect on the size of infants at birth. This effect explains some, though not all, of the observed socioeconomic birthweight. It also explains much of the observed birthweight differences by the age of the mother.   Identifying mothers at greater risk of having a low birthweight baby as important implications for the care and advice this group receives. © 2012 Blackwell Publishing Ltd.

  15. Comparing Machine Learning Classifiers and Linear/Logistic Regression to Explore the Relationship between Hand Dimensions and Demographic Characteristics.

    Science.gov (United States)

    Miguel-Hurtado, Oscar; Guest, Richard; Stevenage, Sarah V; Neil, Greg J; Black, Sue

    2016-01-01

    Understanding the relationship between physiological measurements from human subjects and their demographic data is important within both the biometric and forensic domains. In this paper we explore the relationship between measurements of the human hand and a range of demographic features. We assess the ability of linear regression and machine learning classifiers to predict demographics from hand features, thereby providing evidence on both the strength of relationship and the key features underpinning this relationship. Our results show that we are able to predict sex, height, weight and foot size accurately within various data-range bin sizes, with machine learning classification algorithms out-performing linear regression in most situations. In addition, we identify the features used to provide these relationships applicable across multiple applications.

  16. Credit Scoring Problem Based on Regression Analysis

    OpenAIRE

    Khassawneh, Bashar Suhil Jad Allah

    2014-01-01

    ABSTRACT: This thesis provides an explanatory introduction to the regression models of data mining and contains basic definitions of key terms in the linear, multiple and logistic regression models. Meanwhile, the aim of this study is to illustrate fitting models for the credit scoring problem using simple linear, multiple linear and logistic regression models and also to analyze the found model functions by statistical tools. Keywords: Data mining, linear regression, logistic regression....

  17. A Simple Linear Regression Method for Quantitative Trait Loci Linkage Analysis With Censored Observations

    OpenAIRE

    Anderson, Carl A.; McRae, Allan F.; Visscher, Peter M.

    2006-01-01

    Standard quantitative trait loci (QTL) mapping techniques commonly assume that the trait is both fully observed and normally distributed. When considering survival or age-at-onset traits these assumptions are often incorrect. Methods have been developed to map QTL for survival traits; however, they are both computationally intensive and not available in standard genome analysis software packages. We propose a grouped linear regression method for the analysis of continuous survival data. Using...

  18. Linear Regression between CIE-Lab Color Parameters and Organic Matter in Soils of Tea Plantations

    Science.gov (United States)

    Chen, Yonggen; Zhang, Min; Fan, Dongmei; Fan, Kai; Wang, Xiaochang

    2018-02-01

    To quantify the relationship between the soil organic matter and color parameters using the CIE-Lab system, 62 soil samples (0-10 cm, Ferralic Acrisols) from tea plantations were collected from southern China. After air-drying and sieving, numerical color information and reflectance spectra of soil samples were measured under laboratory conditions using an UltraScan VIS (HunterLab) spectrophotometer equipped with CIE-Lab color models. We found that soil total organic carbon (TOC) and nitrogen (TN) contents were negatively correlated with the L* value (lightness) ( r = -0.84 and -0.80, respectively), a* value (correlation coefficient r = -0.51 and -0.46, respectively) and b* value ( r = -0.76 and -0.70, respectively). There were also linear regressions between TOC and TN contents with the L* value and b* value. Results showed that color parameters from a spectrophotometer equipped with CIE-Lab color models can predict TOC contents well for soils in tea plantations. The linear regression model between color values and soil organic carbon contents showed it can be used as a rapid, cost-effective method to evaluate content of soil organic matter in Chinese tea plantations.

  19. Estimation of error components in a multi-error linear regression model, with an application to track fitting

    International Nuclear Information System (INIS)

    Fruehwirth, R.

    1993-01-01

    We present an estimation procedure of the error components in a linear regression model with multiple independent stochastic error contributions. After solving the general problem we apply the results to the estimation of the actual trajectory in track fitting with multiple scattering. (orig.)

  20. Calculation of U, Ra, Th and K contents in uranium ore by multiple linear regression method

    International Nuclear Information System (INIS)

    Lin Chao; Chen Yingqiang; Zhang Qingwen; Tan Fuwen; Peng Guanghui

    1991-01-01

    A multiple linear regression method was used to compute γ spectra of uranium ore samples and to calculate contents of U, Ra, Th, and K. In comparison with the inverse matrix method, its advantage is that no standard samples of pure U, Ra, Th and K are needed for obtaining response coefficients

  1. Fuzzy multiple linear regression: A computational approach

    Science.gov (United States)

    Juang, C. H.; Huang, X. H.; Fleming, J. W.

    1992-01-01

    This paper presents a new computational approach for performing fuzzy regression. In contrast to Bardossy's approach, the new approach, while dealing with fuzzy variables, closely follows the conventional regression technique. In this approach, treatment of fuzzy input is more 'computational' than 'symbolic.' The following sections first outline the formulation of the new approach, then deal with the implementation and computational scheme, and this is followed by examples to illustrate the new procedure.

  2. Multivariate Adaptative Regression Splines (MARS, una alternativa para el análisis de series de tiempo

    Directory of Open Access Journals (Sweden)

    Jairo Vanegas

    2017-05-01

    Full Text Available Multivariate Adaptative Regression Splines (MARS es un método de modelación no paramétrico que extiende el modelo lineal incorporando no linealidades e interacciones de variables. Es una herramienta flexible que automatiza la construcción de modelos de predicción, seleccionando variables relevantes, transformando las variables predictoras, tratando valores perdidos y previniendo sobreajustes mediante un autotest. También permite predecir tomando en cuenta factores estructurales que pudieran tener influencia sobre la variable respuesta, generando modelos hipotéticos. El resultado final serviría para identificar puntos de corte relevantes en series de datos. En el área de la salud es poco utilizado, por lo que se propone como una herramienta más para la evaluación de indicadores relevantes en salud pública. Para efectos demostrativos se utilizaron series de datos de mortalidad de menores de 5 años de Costa Rica en el periodo 1978-2008.

  3. Improving validation methods for molecular diagnostics: application of Bland-Altman, Deming and simple linear regression analyses in assay comparison and evaluation for next-generation sequencing.

    Science.gov (United States)

    Misyura, Maksym; Sukhai, Mahadeo A; Kulasignam, Vathany; Zhang, Tong; Kamel-Reid, Suzanne; Stockley, Tracy L

    2018-02-01

    A standard approach in test evaluation is to compare results of the assay in validation to results from previously validated methods. For quantitative molecular diagnostic assays, comparison of test values is often performed using simple linear regression and the coefficient of determination (R 2 ), using R 2 as the primary metric of assay agreement. However, the use of R 2 alone does not adequately quantify constant or proportional errors required for optimal test evaluation. More extensive statistical approaches, such as Bland-Altman and expanded interpretation of linear regression methods, can be used to more thoroughly compare data from quantitative molecular assays. We present the application of Bland-Altman and linear regression statistical methods to evaluate quantitative outputs from next-generation sequencing assays (NGS). NGS-derived data sets from assay validation experiments were used to demonstrate the utility of the statistical methods. Both Bland-Altman and linear regression were able to detect the presence and magnitude of constant and proportional error in quantitative values of NGS data. Deming linear regression was used in the context of assay comparison studies, while simple linear regression was used to analyse serial dilution data. Bland-Altman statistical approach was also adapted to quantify assay accuracy, including constant and proportional errors, and precision where theoretical and empirical values were known. The complementary application of the statistical methods described in this manuscript enables more extensive evaluation of performance characteristics of quantitative molecular assays, prior to implementation in the clinical molecular laboratory. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.

  4. Predicting Fuel Ignition Quality Using 1H NMR Spectroscopy and Multiple Linear Regression

    KAUST Repository

    Abdul Jameel, Abdul Gani

    2016-09-14

    An improved model for the prediction of ignition quality of hydrocarbon fuels has been developed using 1H nuclear magnetic resonance (NMR) spectroscopy and multiple linear regression (MLR) modeling. Cetane number (CN) and derived cetane number (DCN) of 71 pure hydrocarbons and 54 hydrocarbon blends were utilized as a data set to study the relationship between ignition quality and molecular structure. CN and DCN are functional equivalents and collectively referred to as D/CN, herein. The effect of molecular weight and weight percent of structural parameters such as paraffinic CH3 groups, paraffinic CH2 groups, paraffinic CH groups, olefinic CH–CH2 groups, naphthenic CH–CH2 groups, and aromatic C–CH groups on D/CN was studied. A particular emphasis on the effect of branching (i.e., methyl substitution) on the D/CN was studied, and a new parameter denoted as the branching index (BI) was introduced to quantify this effect. A new formula was developed to calculate the BI of hydrocarbon fuels using 1H NMR spectroscopy. Multiple linear regression (MLR) modeling was used to develop an empirical relationship between D/CN and the eight structural parameters. This was then used to predict the DCN of many hydrocarbon fuels. The developed model has a high correlation coefficient (R2 = 0.97) and was validated with experimentally measured DCN of twenty-two real fuel mixtures (e.g., gasolines and diesels) and fifty-nine blends of known composition, and the predicted values matched well with the experimental data.

  5. Modeling the potential risk factors of bovine viral diarrhea prevalence in Egypt using univariable and multivariable logistic regression analyses

    Directory of Open Access Journals (Sweden)

    Abdelfattah M. Selim

    2018-03-01

    Full Text Available Aim: The present cross-sectional study was conducted to determine the seroprevalence and potential risk factors associated with Bovine viral diarrhea virus (BVDV disease in cattle and buffaloes in Egypt, to model the potential risk factors associated with the disease using logistic regression (LR models, and to fit the best predictive model for the current data. Materials and Methods: A total of 740 blood samples were collected within November 2012-March 2013 from animals aged between 6 months and 3 years. The potential risk factors studied were species, age, sex, and herd location. All serum samples were examined with indirect ELIZA test for antibody detection. Data were analyzed with different statistical approaches such as Chi-square test, odds ratios (OR, univariable, and multivariable LR models. Results: Results revealed a non-significant association between being seropositive with BVDV and all risk factors, except for species of animal. Seroprevalence percentages were 40% and 23% for cattle and buffaloes, respectively. OR for all categories were close to one with the highest OR for cattle relative to buffaloes, which was 2.237. Likelihood ratio tests showed a significant drop of the -2LL from univariable LR to multivariable LR models. Conclusion: There was an evidence of high seroprevalence of BVDV among cattle as compared with buffaloes with the possibility of infection in different age groups of animals. In addition, multivariable LR model was proved to provide more information for association and prediction purposes relative to univariable LR models and Chi-square tests if we have more than one predictor.

  6. Using Multiple Linear Regression Techniques to Quantify Carbon ...

    African Journals Online (AJOL)

    komla

    Process and statistical models of productivity, though useful, are often ... The carbon balance of terrestrial ecosystems is uncertain, in part due to discrepancies and errors in .... The ecological data were collected through field work to include both .... Computer-aided Multivariate Analysis, Life Learning Publications, Belmont,.

  7. Spatially resolved regression analysis of pre-treatment FDG, FLT and Cu-ATSM PET from post-treatment FDG PET: an exploratory study

    Science.gov (United States)

    Bowen, Stephen R; Chappell, Richard J; Bentzen, Søren M; Deveau, Michael A; Forrest, Lisa J; Jeraj, Robert

    2012-01-01

    Purpose To quantify associations between pre-radiotherapy and post-radiotherapy PET parameters via spatially resolved regression. Materials and methods Ten canine sinonasal cancer patients underwent PET/CT scans of [18F]FDG (FDGpre), [18F]FLT (FLTpre), and [61Cu]Cu-ATSM (Cu-ATSMpre). Following radiotherapy regimens of 50 Gy in 10 fractions, veterinary patients underwent FDG PET/CT scans at three months (FDGpost). Regression of standardized uptake values in baseline FDGpre, FLTpre and Cu-ATSMpre tumour voxels to those in FDGpost images was performed for linear, log-linear, generalized-linear and mixed-fit linear models. Goodness-of-fit in regression coefficients was assessed by R2. Hypothesis testing of coefficients over the patient population was performed. Results Multivariate linear model fits of FDGpre to FDGpost were significantly positive over the population (FDGpost~0.17 FDGpre, p=0.03), and classified slopes of RECIST non-responders and responders to be different (0.37 vs. 0.07, p=0.01). Generalized-linear model fits related FDGpre to FDGpost by a linear power law (FDGpost~FDGpre0.93, pregression analysis indicates that pre-treatment FDG PET uptake is most strongly associated with three-month post-treatment FDG PET uptake in this patient population, though associations are histopathology-dependent. PMID:22682748

  8. New strategy for determination of anthocyanins, polyphenols and antioxidant capacity of Brassica oleracea liquid extract using infrared spectroscopies and multivariate regression

    Science.gov (United States)

    de Oliveira, Isadora R. N.; Roque, Jussara V.; Maia, Mariza P.; Stringheta, Paulo C.; Teófilo, Reinaldo F.

    2018-04-01

    A new method was developed to determine the antioxidant properties of red cabbage extract (Brassica oleracea) by mid (MID) and near (NIR) infrared spectroscopies and partial least squares (PLS) regression. A 70% (v/v) ethanolic extract of red cabbage was concentrated to 9° Brix and further diluted (12 to 100%) in water. The dilutions were used as external standards for the building of PLS models. For the first time, this strategy was applied for building multivariate regression models. Reference analyses and spectral data were obtained from diluted extracts. The determinate properties were total and monomeric anthocyanins, total polyphenols and antioxidant capacity by ABTS (2,2-azino-bis(3-ethyl-benzothiazoline-6-sulfonate)) and DPPH (2,2-diphenyl-1-picrylhydrazyl) methods. Ordered predictors selection (OPS) and genetic algorithm (GA) were used for feature selection before PLS regression (PLS-1). In addition, a PLS-2 regression was applied to all properties simultaneously. PLS-1 models provided more predictive models than did PLS-2 regression. PLS-OPS and PLS-GA models presented excellent prediction results with a correlation coefficient higher than 0.98. However, the best models were obtained using PLS and variable selection with the OPS algorithm and the models based on NIR spectra were considered more predictive for all properties. Then, these models provided a simple, rapid and accurate method for determination of red cabbage extract antioxidant properties and its suitability for use in the food industry.

  9. Construction of multiple linear regression models using blood biomarkers for selecting against abdominal fat traits in broilers.

    Science.gov (United States)

    Dong, J Q; Zhang, X Y; Wang, S Z; Jiang, X F; Zhang, K; Ma, G W; Wu, M Q; Li, H; Zhang, H

    2018-01-01

    Plasma very low-density lipoprotein (VLDL) can be used to select for low body fat or abdominal fat (AF) in broilers, but its correlation with AF is limited. We investigated whether any other biochemical indicator can be used in combination with VLDL for a better selective effect. Nineteen plasma biochemical indicators were measured in male chickens from the Northeast Agricultural University broiler lines divergently selected for AF content (NEAUHLF) in the fed state at 46 and 48 d of age. The average concentration of every parameter for the 2 d was used for statistical analysis. Levels of these 19 plasma biochemical parameters were compared between the lean and fat lines. The phenotypic correlations between these plasma biochemical indicators and AF traits were analyzed. Then, multiple linear regression models were constructed to select the best model used for selecting against AF content. and the heritabilities of plasma indicators contained in the best models were estimated. The results showed that 11 plasma biochemical indicators (triglycerides, total bile acid, total protein, globulin, albumin/globulin, aspartate transaminase, alanine transaminase, gamma-glutamyl transpeptidase, uric acid, creatinine, and VLDL) differed significantly between the lean and fat lines (P linear regression models based on albumin/globulin, VLDL, triglycerides, globulin, total bile acid, and uric acid, had higher R2 (0.73) than the model based only on VLDL (0.21). The plasma parameters included in the best models had moderate heritability estimates (0.21 ≤ h2 ≤ 0.43). These results indicate that these multiple linear regression models can be used to select for lean broiler chickens. © 2017 Poultry Science Association Inc.

  10. Simultaneous chemometric determination of pyridoxine hydrochloride and isoniazid in tablets by multivariate regression methods.

    Science.gov (United States)

    Dinç, Erdal; Ustündağ, Ozgür; Baleanu, Dumitru

    2010-08-01

    The sole use of pyridoxine hydrochloride during treatment of tuberculosis gives rise to pyridoxine deficiency. Therefore, a combination of pyridoxine hydrochloride and isoniazid is used in pharmaceutical dosage form in tuberculosis treatment to reduce this side effect. In this study, two chemometric methods, partial least squares (PLS) and principal component regression (PCR), were applied to the simultaneous determination of pyridoxine (PYR) and isoniazid (ISO) in their tablets. A concentration training set comprising binary mixtures of PYR and ISO consisting of 20 different combinations were randomly prepared in 0.1 M HCl. Both multivariate calibration models were constructed using the relationships between the concentration data set (concentration data matrix) and absorbance data matrix in the spectral region 200-330 nm. The accuracy and the precision of the proposed chemometric methods were validated by analyzing synthetic mixtures containing the investigated drugs. The recovery results obtained by applying PCR and PLS calibrations to the artificial mixtures were found between 100.0 and 100.7%. Satisfactory results obtained by applying the PLS and PCR methods to both artificial and commercial samples were obtained. The results obtained in this manuscript strongly encourage us to use them for the quality control and the routine analysis of the marketing tablets containing PYR and ISO drugs. Copyright © 2010 John Wiley & Sons, Ltd.

  11. Fast Detection of Copper Content in Rice by Laser-Induced Breakdown Spectroscopy with Uni- and Multivariate Analysis

    Directory of Open Access Journals (Sweden)

    Fei Liu

    2018-02-01

    Full Text Available Fast detection of heavy metals is very important for ensuring the quality and safety of crops. Laser-induced breakdown spectroscopy (LIBS, coupled with uni- and multivariate analysis, was applied for quantitative analysis of copper in three kinds of rice (Jiangsu rice, regular rice, and Simiao rice. For univariate analysis, three pre-processing methods were applied to reduce fluctuations, including background normalization, the internal standard method, and the standard normal variate (SNV. Linear regression models showed a strong correlation between spectral intensity and Cu content, with an R 2 more than 0.97. The limit of detection (LOD was around 5 ppm, lower than the tolerance limit of copper in foods. For multivariate analysis, partial least squares regression (PLSR showed its advantage in extracting effective information for prediction, and its sensitivity reached 1.95 ppm, while support vector machine regression (SVMR performed better in both calibration and prediction sets, where R c 2 and R p 2 reached 0.9979 and 0.9879, respectively. This study showed that LIBS could be considered as a constructive tool for the quantification of copper contamination in rice.

  12. Stock price forecasting for companies listed on Tehran stock exchange using multivariate adaptive regression splines model and semi-parametric splines technique

    Science.gov (United States)

    Rounaghi, Mohammad Mahdi; Abbaszadeh, Mohammad Reza; Arashi, Mohammad

    2015-11-01

    One of the most important topics of interest to investors is stock price changes. Investors whose goals are long term are sensitive to stock price and its changes and react to them. In this regard, we used multivariate adaptive regression splines (MARS) model and semi-parametric splines technique for predicting stock price in this study. The MARS model as a nonparametric method is an adaptive method for regression and it fits for problems with high dimensions and several variables. semi-parametric splines technique was used in this study. Smoothing splines is a nonparametric regression method. In this study, we used 40 variables (30 accounting variables and 10 economic variables) for predicting stock price using the MARS model and using semi-parametric splines technique. After investigating the models, we select 4 accounting variables (book value per share, predicted earnings per share, P/E ratio and risk) as influencing variables on predicting stock price using the MARS model. After fitting the semi-parametric splines technique, only 4 accounting variables (dividends, net EPS, EPS Forecast and P/E Ratio) were selected as variables effective in forecasting stock prices.

  13. Modelling lecturer performance index of private university in Tulungagung by using survival analysis with multivariate adaptive regression spline

    Science.gov (United States)

    Hasyim, M.; Prastyo, D. D.

    2018-03-01

    Survival analysis performs relationship between independent variables and survival time as dependent variable. In fact, not all survival data can be recorded completely by any reasons. In such situation, the data is called censored data. Moreover, several model for survival analysis requires assumptions. One of the approaches in survival analysis is nonparametric that gives more relax assumption. In this research, the nonparametric approach that is employed is Multivariate Regression Adaptive Spline (MARS). This study is aimed to measure the performance of private university’s lecturer. The survival time in this study is duration needed by lecturer to obtain their professional certificate. The results show that research activities is a significant factor along with developing courses material, good publication in international or national journal, and activities in research collaboration.

  14. Effects of measurement errors on psychometric measurements in ergonomics studies: Implications for correlations, ANOVA, linear regression, factor analysis, and linear discriminant analysis.

    Science.gov (United States)

    Liu, Yan; Salvendy, Gavriel

    2009-05-01

    This paper aims to demonstrate the effects of measurement errors on psychometric measurements in ergonomics studies. A variety of sources can cause random measurement errors in ergonomics studies and these errors can distort virtually every statistic computed and lead investigators to erroneous conclusions. The effects of measurement errors on five most widely used statistical analysis tools have been discussed and illustrated: correlation; ANOVA; linear regression; factor analysis; linear discriminant analysis. It has been shown that measurement errors can greatly attenuate correlations between variables, reduce statistical power of ANOVA, distort (overestimate, underestimate or even change the sign of) regression coefficients, underrate the explanation contributions of the most important factors in factor analysis and depreciate the significance of discriminant function and discrimination abilities of individual variables in discrimination analysis. The discussions will be restricted to subjective scales and survey methods and their reliability estimates. Other methods applied in ergonomics research, such as physical and electrophysiological measurements and chemical and biomedical analysis methods, also have issues of measurement errors, but they are beyond the scope of this paper. As there has been increasing interest in the development and testing of theories in ergonomics research, it has become very important for ergonomics researchers to understand the effects of measurement errors on their experiment results, which the authors believe is very critical to research progress in theory development and cumulative knowledge in the ergonomics field.

  15. Neck-focused panic attacks among Cambodian refugees; a logistic and linear regression analysis.

    Science.gov (United States)

    Hinton, Devon E; Chhean, Dara; Pich, Vuth; Um, Khin; Fama, Jeanne M; Pollack, Mark H

    2006-01-01

    Consecutive Cambodian refugees attending a psychiatric clinic were assessed for the presence and severity of current--i.e., at least one episode in the last month--neck-focused panic. Among the whole sample (N=130), in a logistic regression analysis, the Anxiety Sensitivity Index (ASI; odds ratio=3.70) and the Clinician-Administered PTSD Scale (CAPS; odds ratio=2.61) significantly predicted the presence of current neck panic (NP). Among the neck panic patients (N=60), in the linear regression analysis, NP severity was significantly predicted by NP-associated flashbacks (beta=.42), NP-associated catastrophic cognitions (beta=.22), and CAPS score (beta=.28). Further analysis revealed the effect of the CAPS score to be significantly mediated (Sobel test [Baron, R. M., & Kenny, D. A. (1986). The moderator-mediator variable distinction in social psychological research: conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51, 1173-1182]) by both NP-associated flashbacks and catastrophic cognitions. In the care of traumatized Cambodian refugees, NP severity, as well as NP-associated flashbacks and catastrophic cognitions, should be specifically assessed and treated.

  16. Descriptor Learning via Supervised Manifold Regularization for Multioutput Regression.

    Science.gov (United States)

    Zhen, Xiantong; Yu, Mengyang; Islam, Ali; Bhaduri, Mousumi; Chan, Ian; Li, Shuo

    2017-09-01

    Multioutput regression has recently shown great ability to solve challenging problems in both computer vision and medical image analysis. However, due to the huge image variability and ambiguity, it is fundamentally challenging to handle the highly complex input-target relationship of multioutput regression, especially with indiscriminate high-dimensional representations. In this paper, we propose a novel supervised descriptor learning (SDL) algorithm for multioutput regression, which can establish discriminative and compact feature representations to improve the multivariate estimation performance. The SDL is formulated as generalized low-rank approximations of matrices with a supervised manifold regularization. The SDL is able to simultaneously extract discriminative features closely related to multivariate targets and remove irrelevant and redundant information by transforming raw features into a new low-dimensional space aligned to targets. The achieved discriminative while compact descriptor largely reduces the variability and ambiguity for multioutput regression, which enables more accurate and efficient multivariate estimation. We conduct extensive evaluation of the proposed SDL on both synthetic data and real-world multioutput regression tasks for both computer vision and medical image analysis. Experimental results have shown that the proposed SDL can achieve high multivariate estimation accuracy on all tasks and largely outperforms the algorithms in the state of the arts. Our method establishes a novel SDL framework for multioutput regression, which can be widely used to boost the performance in different applications.

  17. Heterogeneity index evaluated by slope of linear regression on 18F-FDG PET/CT as a prognostic marker for predicting tumor recurrence in pancreatic ductal adenocarcinoma

    International Nuclear Information System (INIS)

    Kim, Yong-il; Kim, Yong Joong; Paeng, Jin Chul; Cheon, Gi Jeong; Lee, Dong Soo; Chung, June-Key; Kang, Keon Wook

    2017-01-01

    18 F-Fluorodeoxyglucose (FDG) positron emission tomography (PET)/computed tomography (CT) has been investigated as a method to predict pancreatic cancer recurrence after pancreatic surgery. We evaluated the recently introduced heterogeneity indices of 18 F-FDG PET/CT used for predicting pancreatic cancer recurrence after surgery and compared them with current clinicopathologic and 18 F-FDG PET/CT parameters. A total of 93 pancreatic ductal adenocarcinoma patients (M:F = 60:33, mean age = 64.2 ± 9.1 years) who underwent preoperative 18 F-FDG PET/CT following pancreatic surgery were retrospectively enrolled. The standardized uptake values (SUVs) and tumor-to-background ratios (TBR) were measured on each 18 F-FDG PET/CT, as metabolic parameters. Metabolic tumor volume (MTV) and total lesion glycolysis (TLG) were examined as volumetric parameters. The coefficient of variance (heterogeneity index-1; SUVmean divided by the standard deviation) and linear regression slopes (heterogeneity index-2) of the MTV, according to SUV thresholds of 2.0, 2.5 and 3.0, were evaluated as heterogeneity indices. Predictive values of clinicopathologic and 18 F-FDG PET/CT parameters and heterogeneity indices were compared in terms of pancreatic cancer recurrence. Seventy patients (75.3%) showed recurrence after pancreatic cancer surgery (mean recurrence = 9.4 ± 8.4 months). Comparing the recurrence and no recurrence patients, all of the 18 F-FDG PET/CT parameters and heterogeneity indices demonstrated significant differences. In univariate Cox-regression analyses, MTV (P = 0.013), TLG (P = 0.007), and heterogeneity index-2 (P = 0.027) were significant. Among the clinicopathologic parameters, CA19-9 (P = 0.025) and venous invasion (P = 0.002) were selected as significant parameters. In multivariate Cox-regression analyses, MTV (P = 0.005), TLG (P = 0.004), and heterogeneity index-2 (P = 0.016) with venous invasion (P < 0.001, 0.001, and 0.001, respectively) demonstrated significant results

  18. pKa prediction for acidic phosphorus-containing compounds using multiple linear regression with computational descriptors.

    Science.gov (United States)

    Yu, Donghai; Du, Ruobing; Xiao, Ji-Chang

    2016-07-05

    Ninety-six acidic phosphorus-containing molecules with pKa 1.88 to 6.26 were collected and divided into training and test sets by random sampling. Structural parameters were obtained by density functional theory calculation of the molecules. The relationship between the experimental pKa values and structural parameters was obtained by multiple linear regression fitting for the training set, and tested with the test set; the R(2) values were 0.974 and 0.966 for the training and test sets, respectively. This regression equation, which quantitatively describes the influence of structural parameters on pKa , and can be used to predict pKa values of similar structures, is significant for the design of new acidic phosphorus-containing extractants. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.

  19. A Regularized Linear Dynamical System Framework for Multivariate Time Series Analysis.

    Science.gov (United States)

    Liu, Zitao; Hauskrecht, Milos

    2015-01-01

    Linear Dynamical System (LDS) is an elegant mathematical framework for modeling and learning Multivariate Time Series (MTS). However, in general, it is difficult to set the dimension of an LDS's hidden state space. A small number of hidden states may not be able to model the complexities of a MTS, while a large number of hidden states can lead to overfitting. In this paper, we study learning methods that impose various regularization penalties on the transition matrix of the LDS model and propose a regularized LDS learning framework (rLDS) which aims to (1) automatically shut down LDSs' spurious and unnecessary dimensions, and consequently, address the problem of choosing the optimal number of hidden states; (2) prevent the overfitting problem given a small amount of MTS data; and (3) support accurate MTS forecasting. To learn the regularized LDS from data we incorporate a second order cone program and a generalized gradient descent method into the Maximum a Posteriori framework and use Expectation Maximization to obtain a low-rank transition matrix of the LDS model. We propose two priors for modeling the matrix which lead to two instances of our rLDS. We show that our rLDS is able to recover well the intrinsic dimensionality of the time series dynamics and it improves the predictive performance when compared to baselines on both synthetic and real-world MTS datasets.

  20. Comparison of multiple linear regression and artificial neural network in developing the objective functions of the orthopaedic screws.

    Science.gov (United States)

    Hsu, Ching-Chi; Lin, Jinn; Chao, Ching-Kong

    2011-12-01

    Optimizing the orthopaedic screws can greatly improve their biomechanical performances. However, a methodical design optimization approach requires a long time to search the best design. Thus, the surrogate objective functions of the orthopaedic screws should be accurately developed. To our knowledge, there is no study to evaluate the strengths and limitations of the surrogate methods in developing the objective functions of the orthopaedic screws. Three-dimensional finite element models for both the tibial locking screws and the spinal pedicle screws were constructed and analyzed. Then, the learning data were prepared according to the arrangement of the Taguchi orthogonal array, and the verification data were selected with use of a randomized selection. Finally, the surrogate objective functions were developed by using either the multiple linear regression or the artificial neural network. The applicability and accuracy of those surrogate methods were evaluated and discussed. The multiple linear regression method could successfully construct the objective function of the tibial locking screws, but it failed to develop the objective function of the spinal pedicle screws. The artificial neural network method showed a greater capacity of prediction in developing the objective functions for the tibial locking screws and the spinal pedicle screws than the multiple linear regression method. The artificial neural network method may be a useful option for developing the objective functions of the orthopaedic screws with a greater structural complexity. The surrogate objective functions of the orthopaedic screws could effectively decrease the time and effort required for the design optimization process. Copyright © 2010 Elsevier Ireland Ltd. All rights reserved.

  1. A spline-based regression parameter set for creating customized DARTEL MRI brain templates from infancy to old age.

    Science.gov (United States)

    Wilke, Marko

    2018-02-01

    This dataset contains the regression parameters derived by analyzing segmented brain MRI images (gray matter and white matter) from a large population of healthy subjects, using a multivariate adaptive regression splines approach. A total of 1919 MRI datasets ranging in age from 1-75 years from four publicly available datasets (NIH, C-MIND, fCONN, and IXI) were segmented using the CAT12 segmentation framework, writing out gray matter and white matter images normalized using an affine-only spatial normalization approach. These images were then subjected to a six-step DARTEL procedure, employing an iterative non-linear registration approach and yielding increasingly crisp intermediate images. The resulting six datasets per tissue class were then analyzed using multivariate adaptive regression splines, using the CerebroMatic toolbox. This approach allows for flexibly modelling smoothly varying trajectories while taking into account demographic (age, gender) as well as technical (field strength, data quality) predictors. The resulting regression parameters described here can be used to generate matched DARTEL or SHOOT templates for a given population under study, from infancy to old age. The dataset and the algorithm used to generate it are publicly available at https://irc.cchmc.org/software/cerebromatic.php.

  2. Multivariable control in nuclear power stations

    International Nuclear Information System (INIS)

    Parent, M.; McMorran, P.D.

    1982-11-01

    Multivariable methods have the potential to improve the control of large systems such as nuclear power stations. Linear-quadratic optimal control is a multivariable method based on the minimization of a cost function. A related technique leads to the Kalman filter for estimation of plant state from noisy measurements. A design program for optimal control and Kalman filtering has been developed as part of a computer-aided design package for multivariable control systems. The method is demonstrated on a model of a nuclear steam generator, and simulated results are presented

  3. Testing for marginal linear effects in quantile regression

    KAUST Repository

    Wang, Huixia Judy

    2017-10-23

    The paper develops a new marginal testing procedure to detect significant predictors that are associated with the conditional quantiles of a scalar response. The idea is to fit the marginal quantile regression on each predictor one at a time, and then to base the test on the t-statistics that are associated with the most predictive predictors. A resampling method is devised to calibrate this test statistic, which has non-regular limiting behaviour due to the selection of the most predictive variables. Asymptotic validity of the procedure is established in a general quantile regression setting in which the marginal quantile regression models can be misspecified. Even though a fixed dimension is assumed to derive the asymptotic results, the test proposed is applicable and computationally feasible for large dimensional predictors. The method is more flexible than existing marginal screening test methods based on mean regression and has the added advantage of being robust against outliers in the response. The approach is illustrated by using an application to a human immunodeficiency virus drug resistance data set.

  4. Testing for marginal linear effects in quantile regression

    KAUST Repository

    Wang, Huixia Judy; McKeague, Ian W.; Qian, Min

    2017-01-01

    The paper develops a new marginal testing procedure to detect significant predictors that are associated with the conditional quantiles of a scalar response. The idea is to fit the marginal quantile regression on each predictor one at a time, and then to base the test on the t-statistics that are associated with the most predictive predictors. A resampling method is devised to calibrate this test statistic, which has non-regular limiting behaviour due to the selection of the most predictive variables. Asymptotic validity of the procedure is established in a general quantile regression setting in which the marginal quantile regression models can be misspecified. Even though a fixed dimension is assumed to derive the asymptotic results, the test proposed is applicable and computationally feasible for large dimensional predictors. The method is more flexible than existing marginal screening test methods based on mean regression and has the added advantage of being robust against outliers in the response. The approach is illustrated by using an application to a human immunodeficiency virus drug resistance data set.

  5. QSAR models for prediction study of HIV protease inhibitors using support vector machines, neural networks and multiple linear regression

    Directory of Open Access Journals (Sweden)

    Rachid Darnag

    2017-02-01

    Full Text Available Support vector machines (SVM represent one of the most promising Machine Learning (ML tools that can be applied to develop a predictive quantitative structure–activity relationship (QSAR models using molecular descriptors. Multiple linear regression (MLR and artificial neural networks (ANNs were also utilized to construct quantitative linear and non linear models to compare with the results obtained by SVM. The prediction results are in good agreement with the experimental value of HIV activity; also, the results reveal the superiority of the SVM over MLR and ANN model. The contribution of each descriptor to the structure–activity relationships was evaluated.

  6. A comparison between univariate probabilistic and multivariate (logistic regression) methods for landslide susceptibility analysis: the example of the Febbraro valley (Northern Alps, Italy)

    Science.gov (United States)

    Rossi, M.; Apuani, T.; Felletti, F.

    2009-04-01

    The aim of this paper is to compare the results of two statistical methods for landslide susceptibility analysis: 1) univariate probabilistic method based on landslide susceptibility index, 2) multivariate method (logistic regression). The study area is the Febbraro valley, located in the central Italian Alps, where different types of metamorphic rocks croup out. On the eastern part of the studied basin a quaternary cover represented by colluvial and secondarily, by glacial deposits, is dominant. In this study 110 earth flows, mainly located toward NE portion of the catchment, were analyzed. They involve only the colluvial deposits and their extension mainly ranges from 36 to 3173 m2. Both statistical methods require to establish a spatial database, in which each landslide is described by several parameters that can be assigned using a main scarp central point of landslide. The spatial database is constructed using a Geographical Information System (GIS). Each landslide is described by several parameters corresponding to the value of main scarp central point of the landslide. Based on bibliographic review a total of 15 predisposing factors were utilized. The width of the intervals, in which the maps of the predisposing factors have to be reclassified, has been defined assuming constant intervals to: elevation (100 m), slope (5 °), solar radiation (0.1 MJ/cm2/year), profile curvature (1.2 1/m), tangential curvature (2.2 1/m), drainage density (0.5), lineament density (0.00126). For the other parameters have been used the results of the probability-probability plots analysis and the statistical indexes of landslides site. In particular slope length (0 ÷ 2, 2 ÷ 5, 5 ÷ 10, 10 ÷ 20, 20 ÷ 35, 35 ÷ 260), accumulation flow (0 ÷ 1, 1 ÷ 2, 2 ÷ 5, 5 ÷ 12, 12 ÷ 60, 60 ÷27265), Topographic Wetness Index 0 ÷ 0.74, 0.74 ÷ 1.94, 1.94 ÷ 2.62, 2.62 ÷ 3.48, 3.48 ÷ 6,00, 6.00 ÷ 9.44), Stream Power Index (0 ÷ 0.64, 0.64 ÷ 1.28, 1.28 ÷ 1.81, 1.81 ÷ 4.20, 4.20 ÷ 9

  7. EXPLORATORY DATA ANALYSIS AND MULTIVARIATE STRATEGIES FOR REVEALING MULTIVARIATE STRUCTURES IN CLIMATE DATA

    Directory of Open Access Journals (Sweden)

    2016-12-01

    Full Text Available This paper is on data analysis strategy in a complex, multidimensional, and dynamic domain. The focus is on the use of data mining techniques to explore the importance of multivariate structures; using climate variables which influences climate change. Techniques involved in data mining exercise vary according to the data structures. The multivariate analysis strategy considered here involved choosing an appropriate tool to analyze a process. Factor analysis is introduced into data mining technique in order to reveal the influencing impacts of factors involved as well as solving for multicolinearity effect among the variables. The temporal nature and multidimensionality of the target variables is revealed in the model using multidimensional regression estimates. The strategy of integrating the method of several statistical techniques, using climate variables in Nigeria was employed. R2 of 0.518 was obtained from the ordinary least square regression analysis carried out and the test was not significant at 5% level of significance. However, factor analysis regression strategy gave a good fit with R2 of 0.811 and the test was significant at 5% level of significance. Based on this study, model building should go beyond the usual confirmatory data analysis (CDA, rather it should be complemented with exploratory data analysis (EDA in order to achieve a desired result.

  8. FIRE: an SPSS program for variable selection in multiple linear regression analysis via the relative importance of predictors.

    Science.gov (United States)

    Lorenzo-Seva, Urbano; Ferrando, Pere J

    2011-03-01

    We provide an SPSS program that implements currently recommended techniques and recent developments for selecting variables in multiple linear regression analysis via the relative importance of predictors. The approach consists of: (1) optimally splitting the data for cross-validation, (2) selecting the final set of predictors to be retained in the equation regression, and (3) assessing the behavior of the chosen model using standard indices and procedures. The SPSS syntax, a short manual, and data files related to this article are available as supplemental materials from brm.psychonomic-journals.org/content/supplemental.

  9. An Introduction to the Hybrid Approach of Neural Networks and the Linear Regression Model : An Illustration in the Hedonic Pricing Model of Building Costs

    OpenAIRE

    浅野, 美代子; マーコ, ユー K.W.

    2007-01-01

    This paper introduces the hybrid approach of neural networks and linear regression model proposed by Asano and Tsubaki (2003). Neural networks are often credited with its superiority in data consistency whereas the linear regression model provides simple interpretation of the data enabling researchers to verify their hypotheses. The hybrid approach aims at combing the strengths of these two well-established statistical methods. A step-by-step procedure for performing the hybrid approach is pr...

  10. Association of footprint measurements with plantar kinetics: a linear regression model.

    Science.gov (United States)

    Fascione, Jeanna M; Crews, Ryan T; Wrobel, James S

    2014-03-01

    The use of foot measurements to classify morphology and interpret foot function remains one of the focal concepts of lower-extremity biomechanics. However, only 27% to 55% of midfoot variance in foot pressures has been determined in the most comprehensive models. We investigated whether dynamic walking footprint measurements are associated with inter-individual foot loading variability. Thirty individuals (15 men and 15 women; mean ± SD age, 27.17 ± 2.21 years) walked at a self-selected speed over an electronic pedography platform using the midgait technique. Kinetic variables (contact time, peak pressure, pressure-time integral, and force-time integral) were collected for six masked regions. Footprints were digitized for area and linear boundaries using digital photo planimetry software. Six footprint measurements were determined: contact area, footprint index, arch index, truncated arch index, Chippaux-Smirak index, and Staheli index. Linear regression analysis with a Bonferroni adjustment was performed to determine the association between the footprint measurements and each of the kinetic variables. The findings demonstrate that a relationship exists between increased midfoot contact and increased kinetic values in respective locations. Many of these variables produced large effect sizes while describing 38% to 71% of the common variance of select plantar kinetic variables in the medial midfoot region. In addition, larger footprints were associated with larger kinetic values at the medial heel region and both masked forefoot regions. Dynamic footprint measurements are associated with dynamic plantar loading kinetics, with emphasis on the midfoot region.

  11. Describing Growth Pattern of Bali Cows Using Non-linear Regression Models

    Directory of Open Access Journals (Sweden)

    Mohd. Hafiz A.W

    2016-12-01

    Full Text Available The objective of this study was to evaluate the best fit non-linear regression model to describe the growth pattern of Bali cows. Estimates of asymptotic mature weight, rate of maturing and constant of integration were derived from Brody, von Bertalanffy, Gompertz and Logistic models which were fitted to cross-sectional data of body weight taken from 74 Bali cows raised in MARDI Research Station Muadzam Shah Pahang. Coefficient of determination (R2 and residual mean squares (MSE were used to determine the best fit model in describing the growth pattern of Bali cows. Von Bertalanffy model was the best model among the four growth functions evaluated to determine the mature weight of Bali cattle as shown by the highest R2 and lowest MSE values (0.973 and 601.9, respectively, followed by Gompertz (0.972 and 621.2, respectively, Logistic (0.971 and 648.4, respectively and Brody (0.932 and 660.5, respectively models. The correlation between rate of maturing and mature weight was found to be negative in the range of -0.170 to -0.929 for all models, indicating that animals of heavier mature weight had lower rate of maturing. The use of non-linear model could summarize the weight-age relationship into several biologically interpreted parameters compared to the entire lifespan weight-age data points that are difficult and time consuming to interpret.

  12. Relationships between the structure of wheat gluten and ACE inhibitory activity of hydrolysate: stepwise multiple linear regression analysis.

    Science.gov (United States)

    Zhang, Yanyan; Ma, Haile; Wang, Bei; Qu, Wenjuan; Wali, Asif; Zhou, Cunshan

    2016-08-01

    Ultrasound pretreatment of wheat gluten (WG) before enzymolysis can improve the angiotensin converting enzyme (ACE) inhibitory activity of the hydrolysates by alerting the structure of substrate proteins. Establishment of a relationship between the structure of WG and ACE inhibitory activity of the hydrolysates to judge the end point of the ultrasonic pretreatment is vital. The results of stepwise multiple linear regression (MLR) showed that the contents of free sulfhydryl, α-helix, disulfide bond, surface hydrophobicity and random coil were significantly correlated to ACE Inhibitory activity of the hydrolysate, with the standard partial regression coefficients were 3.729, -0.676, -0.252, 0.022 and 0.156, respectively. The R(2) of this model was 0.970. External validation showed that the stepwise MLR model could well predict the ACE inhibitory activity of hydrolysate based on the content of free sulfhydryl, α-helix, disulfide bond, surface hydrophobicity and random coil of WG before hydrolysis. A stepwise multiple linear regression model describing the quantitative relationships between the structure of WG and the ACE Inhibitory activity of the hydrolysates was established. This model can be used to predict the endpoint of the ultrasonic pretreatment. © 2015 Society of Chemical Industry. © 2015 Society of Chemical Industry.

  13. Modified Regression Correlation Coefficient for Poisson Regression Model

    Science.gov (United States)

    Kaengthong, Nattacha; Domthong, Uthumporn

    2017-09-01

    This study gives attention to indicators in predictive power of the Generalized Linear Model (GLM) which are widely used; however, often having some restrictions. We are interested in regression correlation coefficient for a Poisson regression model. This is a measure of predictive power, and defined by the relationship between the dependent variable (Y) and the expected value of the dependent variable given the independent variables [E(Y|X)] for the Poisson regression model. The dependent variable is distributed as Poisson. The purpose of this research was modifying regression correlation coefficient for Poisson regression model. We also compare the proposed modified regression correlation coefficient with the traditional regression correlation coefficient in the case of two or more independent variables, and having multicollinearity in independent variables. The result shows that the proposed regression correlation coefficient is better than the traditional regression correlation coefficient based on Bias and the Root Mean Square Error (RMSE).

  14. Radioligand assays - methods and applications. IV. Uniform regression of hyperbolic and linear radioimmunoassay calibration curves

    Energy Technology Data Exchange (ETDEWEB)

    Keilacker, H; Becker, G; Ziegler, M; Gottschling, H D [Zentralinstitut fuer Diabetes, Karlsburg (German Democratic Republic)

    1980-10-01

    In order to handle all types of radioimmunoassay (RIA) calibration curves obtained in the authors' laboratory in the same way, they tried to find a non-linear expression for their regression which allows calibration curves with different degrees of curvature to be fitted. Considering the two boundary cases of the incubation protocol they derived a hyperbolic inverse regression function: x = a/sub 1/y + a/sub 0/ + asub(-1)y/sup -1/, where x is the total concentration of antigen, asub(i) are constants, and y is the specifically bound radioactivity. An RIA evaluation procedure based on this function is described providing a fitted inverse RIA calibration curve and some statistical quality parameters. The latter are of an order which is normal for RIA systems. There is an excellent agreement between fitted and experimentally obtained calibration curves having a different degree of curvature.

  15. Trend analysis by a piecewise linear regression model applied to surface air temperatures in Southeastern Spain (1973–2014)

    OpenAIRE

    Campra, Pablo; Morales, Maria

    2016-01-01

    The magnitude of the trends of environmental and climatic changes is mostly derived from the slopes of the linear trends using ordinary least-square fitting. An alternative flexible fitting model, piecewise regression, has been applied here to surface air temperature records in southeastern Spain for the recent warming period (1973–2014) to gain accuracy in the description of the inner structure of change, dividing the time series into linear segments with different slopes. Breakpoint y...

  16. A study on direct determination of uranium in ore by analyzing γ-ray spectrum with dual linear regression

    International Nuclear Information System (INIS)

    Liu Chunkui

    1996-01-01

    The method introduced is based on different energy of γ-ray emitted from radionuclide in the uranium-radium decay series in ore. The pulse counting rates of two spectra bands, i.e. N 1 (55∼193 keV) and N 2 (260∼1500 keV), are measured by portable type HYX-3 400-channel γ-ray spectrometer. On the other side, the uranium content (Q U ) is obtained by chemical analysis of channel sampling. Then the regression coefficients (b 0 , b 1 ,b 2 ) can be determined through dual linear regression by using Q U and N 1 , N 2 . The direct determination of uranium can be made with the regression equation Q U = b 0 + b 1 N 1 + b 2 N 2

  17. Multiple linear stepwise regression of liver lipid levels: proton MR spectroscopy study in vivo at 3.0 T

    International Nuclear Information System (INIS)

    Xu Li; Liang Changhong; Xiao Yuanqiu; Zhang Zhonglin

    2010-01-01

    Objective: To analyze the correlations between liver lipid level determined by liver 3.0 T 1 H-MRS in vivo and influencing factors using multiple linear stepwise regression. Methods: The prospective study of liver 1 H-MRS was performed with 3.0 T system and eight-channel torso phased-array coils using PRESS sequence. Forty-four volunteers were enrolled in this study. Liver spectra were collected with a TR of 1500 ms, TE of 30 ms, volume of interest of 2 cm×2 cm×2 cm, NSA of 64 times. The acquired raw proton MRS data were processed by using a software program SAGE. For each MRS measurement, using water as the internal reference, the amplitude of the lipid signal was normalized to the sum of the signal from lipid and water to obtain percentage lipid within the liver. The statistical description of height, weight, age and BMI, Line width and water suppression were recorded, and Pearson analysis was applied to test their relationships. Multiple linear stepwise regression was used to set the statistical model for the prediction of Liver lipid content. Results: Age (39.1±12.6) years, body weight (64.4±10.4) kg, BMI (23.3±3.1) kg/m 2 , linewidth (18.9±4.4) and the water suppression (90.7±6.5)% had significant correlation with liver lipid content (0.00 to 0.96%, median 0.02%), r were 0.11, 0.44, 0.40, 0.52, -0.73 respectively (P<0.05). But only age, BMI, line width, and the water suppression entered into the multiple linear regression equation. Liver lipid content prediction equation was as follows: Y= 1.395 - (0.021×water suppression) + (0.022×BMI) + (0.014×line width) - (0.004×age), and the coefficient of determination was 0. 613, corrected coefficient of determination was 0.59. Conclusion: The regression model fitted well, since the variables of age, BMI, width, and water suppression can explain about 60% of liver lipid content changes. (authors)

  18. Hierarchical Cluster-based Partial Least Squares Regression (HC-PLSR is an efficient tool for metamodelling of nonlinear dynamic models

    Directory of Open Access Journals (Sweden)

    Omholt Stig W

    2011-06-01

    Full Text Available Abstract Background Deterministic dynamic models of complex biological systems contain a large number of parameters and state variables, related through nonlinear differential equations with various types of feedback. A metamodel of such a dynamic model is a statistical approximation model that maps variation in parameters and initial conditions (inputs to variation in features of the trajectories of the state variables (outputs throughout the entire biologically relevant input space. A sufficiently accurate mapping can be exploited both instrumentally and epistemically. Multivariate regression methodology is a commonly used approach for emulating dynamic models. However, when the input-output relations are highly nonlinear or non-monotone, a standard linear regression approach is prone to give suboptimal results. We therefore hypothesised that a more accurate mapping can be obtained by locally linear or locally polynomial regression. We present here a new method for local regression modelling, Hierarchical Cluster-based PLS regression (HC-PLSR, where fuzzy C-means clustering is used to separate the data set into parts according to the structure of the response surface. We compare the metamodelling performance of HC-PLSR with polynomial partial least squares regression (PLSR and ordinary least squares (OLS regression on various systems: six different gene regulatory network models with various types of feedback, a deterministic mathematical model of the mammalian circadian clock and a model of the mouse ventricular myocyte function. Results Our results indicate that multivariate regression is well suited for emulating dynamic models in systems biology. The hierarchical approach turned out to be superior to both polynomial PLSR and OLS regression in all three test cases. The advantage, in terms of explained variance and prediction accuracy, was largest in systems with highly nonlinear functional relationships and in systems with positive feedback

  19. Hierarchical cluster-based partial least squares regression (HC-PLSR) is an efficient tool for metamodelling of nonlinear dynamic models.

    Science.gov (United States)

    Tøndel, Kristin; Indahl, Ulf G; Gjuvsland, Arne B; Vik, Jon Olav; Hunter, Peter; Omholt, Stig W; Martens, Harald

    2011-06-01

    Deterministic dynamic models of complex biological systems contain a large number of parameters and state variables, related through nonlinear differential equations with various types of feedback. A metamodel of such a dynamic model is a statistical approximation model that maps variation in parameters and initial conditions (inputs) to variation in features of the trajectories of the state variables (outputs) throughout the entire biologically relevant input space. A sufficiently accurate mapping can be exploited both instrumentally and epistemically. Multivariate regression methodology is a commonly used approach for emulating dynamic models. However, when the input-output relations are highly nonlinear or non-monotone, a standard linear regression approach is prone to give suboptimal results. We therefore hypothesised that a more accurate mapping can be obtained by locally linear or locally polynomial regression. We present here a new method for local regression modelling, Hierarchical Cluster-based PLS regression (HC-PLSR), where fuzzy C-means clustering is used to separate the data set into parts according to the structure of the response surface. We compare the metamodelling performance of HC-PLSR with polynomial partial least squares regression (PLSR) and ordinary least squares (OLS) regression on various systems: six different gene regulatory network models with various types of feedback, a deterministic mathematical model of the mammalian circadian clock and a model of the mouse ventricular myocyte function. Our results indicate that multivariate regression is well suited for emulating dynamic models in systems biology. The hierarchical approach turned out to be superior to both polynomial PLSR and OLS regression in all three test cases. The advantage, in terms of explained variance and prediction accuracy, was largest in systems with highly nonlinear functional relationships and in systems with positive feedback loops. HC-PLSR is a promising approach for

  20. Standardizing effect size from linear regression models with log-transformed variables for meta-analysis.

    Science.gov (United States)

    Rodríguez-Barranco, Miguel; Tobías, Aurelio; Redondo, Daniel; Molina-Portillo, Elena; Sánchez, María José

    2017-03-17

    Meta-analysis is very useful to summarize the effect of a treatment or a risk factor for a given disease. Often studies report results based on log-transformed variables in order to achieve the principal assumptions of a linear regression model. If this is the case for some, but not all studies, the effects need to be homogenized. We derived a set of formulae to transform absolute changes into relative ones, and vice versa, to allow including all results in a meta-analysis. We applied our procedure to all possible combinations of log-transformed independent or dependent variables. We also evaluated it in a simulation based on two variables either normally or asymmetrically distributed. In all the scenarios, and based on different change criteria, the effect size estimated by the derived set of formulae was equivalent to the real effect size. To avoid biased estimates of the effect, this procedure should be used with caution in the case of independent variables with asymmetric distributions that significantly differ from the normal distribution. We illustrate an application of this procedure by an application to a meta-analysis on the potential effects on neurodevelopment in children exposed to arsenic and manganese. The procedure proposed has been shown to be valid and capable of expressing the effect size of a linear regression model based on different change criteria in the variables. Homogenizing the results from different studies beforehand allows them to be combined in a meta-analysis, independently of whether the transformations had been performed on the dependent and/or independent variables.

  1. Efficient Semiparametric Marginal Estimation for the Partially Linear Additive Model for Longitudinal/Clustered Data

    KAUST Repository

    Carroll, Raymond; Maity, Arnab; Mammen, Enno; Yu, Kyusang

    2009-01-01

    We consider the efficient estimation of a regression parameter in a partially linear additive nonparametric regression model from repeated measures data when the covariates are multivariate. To date, while there is some literature in the scalar covariate case, the problem has not been addressed in the multivariate additive model case. Ours represents a first contribution in this direction. As part of this work, we first describe the behavior of nonparametric estimators for additive models with repeated measures when the underlying model is not additive. These results are critical when one considers variants of the basic additive model. We apply them to the partially linear additive repeated-measures model, deriving an explicit consistent estimator of the parametric component; if the errors are in addition Gaussian, the estimator is semiparametric efficient. We also apply our basic methods to a unique testing problem that arises in genetic epidemiology; in combination with a projection argument we develop an efficient and easily computed testing scheme. Simulations and an empirical example from nutritional epidemiology illustrate our methods.

  2. Efficient Semiparametric Marginal Estimation for the Partially Linear Additive Model for Longitudinal/Clustered Data

    KAUST Repository

    Carroll, Raymond

    2009-04-23

    We consider the efficient estimation of a regression parameter in a partially linear additive nonparametric regression model from repeated measures data when the covariates are multivariate. To date, while there is some literature in the scalar covariate case, the problem has not been addressed in the multivariate additive model case. Ours represents a first contribution in this direction. As part of this work, we first describe the behavior of nonparametric estimators for additive models with repeated measures when the underlying model is not additive. These results are critical when one considers variants of the basic additive model. We apply them to the partially linear additive repeated-measures model, deriving an explicit consistent estimator of the parametric component; if the errors are in addition Gaussian, the estimator is semiparametric efficient. We also apply our basic methods to a unique testing problem that arises in genetic epidemiology; in combination with a projection argument we develop an efficient and easily computed testing scheme. Simulations and an empirical example from nutritional epidemiology illustrate our methods.

  3. Heterogeneity index evaluated by slope of linear regression on {sup 18}F-FDG PET/CT as a prognostic marker for predicting tumor recurrence in pancreatic ductal adenocarcinoma

    Energy Technology Data Exchange (ETDEWEB)

    Kim, Yong-il [CHA University, Department of Nuclear Medicine, CHA Bundang Medical Center, Seongnam (Korea, Republic of); Seoul National University Hospital, Department of Nuclear Medicine, Seoul (Korea, Republic of); Kim, Yong Joong [Veterans Health Service Medical Center, Seoul (Korea, Republic of); Paeng, Jin Chul; Cheon, Gi Jeong; Lee, Dong Soo [Seoul National University Hospital, Department of Nuclear Medicine, Seoul (Korea, Republic of); Chung, June-Key [Seoul National University Hospital, Department of Nuclear Medicine, Seoul (Korea, Republic of); Seoul National University, Cancer Research Institute, Seoul (Korea, Republic of); Kang, Keon Wook [Seoul National University Hospital, Department of Nuclear Medicine, Seoul (Korea, Republic of); Seoul National University, Cancer Research Institute, Seoul (Korea, Republic of); Seoul National University College of Medicine, Department of Biomedical Sciences, Seoul (Korea, Republic of); Seoul National University College of Medicine, Department of Nuclear Medicine, Seoul (Korea, Republic of)

    2017-11-15

    {sup 18}F-Fluorodeoxyglucose (FDG) positron emission tomography (PET)/computed tomography (CT) has been investigated as a method to predict pancreatic cancer recurrence after pancreatic surgery. We evaluated the recently introduced heterogeneity indices of {sup 18}F-FDG PET/CT used for predicting pancreatic cancer recurrence after surgery and compared them with current clinicopathologic and {sup 18}F-FDG PET/CT parameters. A total of 93 pancreatic ductal adenocarcinoma patients (M:F = 60:33, mean age = 64.2 ± 9.1 years) who underwent preoperative {sup 18}F-FDG PET/CT following pancreatic surgery were retrospectively enrolled. The standardized uptake values (SUVs) and tumor-to-background ratios (TBR) were measured on each {sup 18}F-FDG PET/CT, as metabolic parameters. Metabolic tumor volume (MTV) and total lesion glycolysis (TLG) were examined as volumetric parameters. The coefficient of variance (heterogeneity index-1; SUVmean divided by the standard deviation) and linear regression slopes (heterogeneity index-2) of the MTV, according to SUV thresholds of 2.0, 2.5 and 3.0, were evaluated as heterogeneity indices. Predictive values of clinicopathologic and {sup 18}F-FDG PET/CT parameters and heterogeneity indices were compared in terms of pancreatic cancer recurrence. Seventy patients (75.3%) showed recurrence after pancreatic cancer surgery (mean recurrence = 9.4 ± 8.4 months). Comparing the recurrence and no recurrence patients, all of the {sup 18}F-FDG PET/CT parameters and heterogeneity indices demonstrated significant differences. In univariate Cox-regression analyses, MTV (P = 0.013), TLG (P = 0.007), and heterogeneity index-2 (P = 0.027) were significant. Among the clinicopathologic parameters, CA19-9 (P = 0.025) and venous invasion (P = 0.002) were selected as significant parameters. In multivariate Cox-regression analyses, MTV (P = 0.005), TLG (P = 0.004), and heterogeneity index-2 (P = 0.016) with venous invasion (P < 0.001, 0.001, and 0

  4. A MULTIVARIATE ANALYSIS OF CROATIAN COUNTIES ENTREPRENEURSHIP

    Directory of Open Access Journals (Sweden)

    Elza Jurun

    2012-12-01

    Full Text Available In the focus of this paper is a multivariate analysis of Croatian Counties entrepreneurship. Complete data base available by official statistic institutions at national and regional level is used. Modern econometric methodology starting from a comparative analysis via multiple regression to multivariate cluster analysis is carried out as well as the analysis of successful or inefficacious entrepreneurship measured by indicators of efficiency, profitability and productivity. Time horizons of the comparative analysis are in 2004 and 2010. Accelerators of socio-economic development - number of entrepreneur investors, investment in fixed assets and current assets ratio in multiple regression model are analytically filtered between twenty-six independent variables as variables of the dominant influence on GDP per capita in 2010 as dependent variable. Results of multivariate cluster analysis of twentyone Croatian Counties are interpreted also in the sense of three Croatian NUTS 2 regions according to European nomenclature of regional territorial division of Croatia.

  5. Estimating traffic volume on Wyoming low volume roads using linear and logistic regression methods

    Directory of Open Access Journals (Sweden)

    Dick Apronti

    2016-12-01

    Full Text Available Traffic volume is an important parameter in most transportation planning applications. Low volume roads make up about 69% of road miles in the United States. Estimating traffic on the low volume roads is a cost-effective alternative to taking traffic counts. This is because traditional traffic counts are expensive and impractical for low priority roads. The purpose of this paper is to present the development of two alternative means of cost-effectively estimating traffic volumes for low volume roads in Wyoming and to make recommendations for their implementation. The study methodology involves reviewing existing studies, identifying data sources, and carrying out the model development. The utility of the models developed were then verified by comparing actual traffic volumes to those predicted by the model. The study resulted in two regression models that are inexpensive and easy to implement. The first regression model was a linear regression model that utilized pavement type, access to highways, predominant land use types, and population to estimate traffic volume. In verifying the model, an R2 value of 0.64 and a root mean square error of 73.4% were obtained. The second model was a logistic regression model that identified the level of traffic on roads using five thresholds or levels. The logistic regression model was verified by estimating traffic volume thresholds and determining the percentage of roads that were accurately classified as belonging to the given thresholds. For the five thresholds, the percentage of roads classified correctly ranged from 79% to 88%. In conclusion, the verification of the models indicated both model types to be useful for accurate and cost-effective estimation of traffic volumes for low volume Wyoming roads. The models developed were recommended for use in traffic volume estimations for low volume roads in pavement management and environmental impact assessment studies.

  6. Multivariable Feedback Control of Nuclear Reactors

    Directory of Open Access Journals (Sweden)

    Rune Moen

    1982-07-01

    Full Text Available Multivariable feedback control has been adapted for optimal control of the spatial power distribution in nuclear reactor cores. Two design techniques, based on the theory of automatic control, were developed: the State Variable Feedback (SVF is an application of the linear optimal control theory, and the Multivariable Frequency Response (MFR is based on a generalization of the traditional frequency response approach to control system design.

  7. Improvement of a Robotic Manipulator Model Based on Multivariate Residual Modeling

    Directory of Open Access Journals (Sweden)

    Serge Gale

    2017-07-01

    Full Text Available A new method is presented for extending a dynamic model of a six degrees of freedom robotic manipulator. A non-linear multivariate calibration of input–output training data from several typical motion trajectories is carried out with the aim of predicting the model systematic output error at time (t + 1 from known input reference up till and including time (t. A new partial least squares regression (PLSR based method, nominal PLSR with interactions was developed and used to handle, unmodelled non-linearities. The performance of the new method is compared with least squares (LS. Different cross-validation schemes were compared in order to assess the sampling of the state space based on conventional trajectories. The method developed in the paper can be used as fault monitoring mechanism and early warning system for sensor failure. The results show that the suggested methods improves trajectory tracking performance of the robotic manipulator by extending the initial dynamic model of the manipulator.

  8. Predicting the multi-domain progression of Parkinson's disease: a Bayesian multivariate generalized linear mixed-effect model.

    Science.gov (United States)

    Wang, Ming; Li, Zheng; Lee, Eun Young; Lewis, Mechelle M; Zhang, Lijun; Sterling, Nicholas W; Wagner, Daymond; Eslinger, Paul; Du, Guangwei; Huang, Xuemei

    2017-09-25

    It is challenging for current statistical models to predict clinical progression of Parkinson's disease (PD) because of the involvement of multi-domains and longitudinal data. Past univariate longitudinal or multivariate analyses from cross-sectional trials have limited power to predict individual outcomes or a single moment. The multivariate generalized linear mixed-effect model (GLMM) under the Bayesian framework was proposed to study multi-domain longitudinal outcomes obtained at baseline, 18-, and 36-month. The outcomes included motor, non-motor, and postural instability scores from the MDS-UPDRS, and demographic and standardized clinical data were utilized as covariates. The dynamic prediction was performed for both internal and external subjects using the samples from the posterior distributions of the parameter estimates and random effects, and also the predictive accuracy was evaluated based on the root of mean square error (RMSE), absolute bias (AB) and the area under the receiver operating characteristic (ROC) curve. First, our prediction model identified clinical data that were differentially associated with motor, non-motor, and postural stability scores. Second, the predictive accuracy of our model for the training data was assessed, and improved prediction was gained in particularly for non-motor (RMSE and AB: 2.89 and 2.20) compared to univariate analysis (RMSE and AB: 3.04 and 2.35). Third, the individual-level predictions of longitudinal trajectories for the testing data were performed, with ~80% observed values falling within the 95% credible intervals. Multivariate general mixed models hold promise to predict clinical progression of individual outcomes in PD. The data was obtained from Dr. Xuemei Huang's NIH grant R01 NS060722 , part of NINDS PD Biomarker Program (PDBP). All data was entered within 24 h of collection to the Data Management Repository (DMR), which is publically available ( https://pdbp.ninds.nih.gov/data-management ).

  9. Interpretation of commonly used statistical regression models.

    Science.gov (United States)

    Kasza, Jessica; Wolfe, Rory

    2014-01-01

    A review of some regression models commonly used in respiratory health applications is provided in this article. Simple linear regression, multiple linear regression, logistic regression and ordinal logistic regression are considered. The focus of this article is on the interpretation of the regression coefficients of each model, which are illustrated through the application of these models to a respiratory health research study. © 2013 The Authors. Respirology © 2013 Asian Pacific Society of Respirology.

  10. Detection of epistatic effects with logic regression and a classical linear regression model.

    Science.gov (United States)

    Malina, Magdalena; Ickstadt, Katja; Schwender, Holger; Posch, Martin; Bogdan, Małgorzata

    2014-02-01

    To locate multiple interacting quantitative trait loci (QTL) influencing a trait of interest within experimental populations, usually methods as the Cockerham's model are applied. Within this framework, interactions are understood as the part of the joined effect of several genes which cannot be explained as the sum of their additive effects. However, if a change in the phenotype (as disease) is caused by Boolean combinations of genotypes of several QTLs, this Cockerham's approach is often not capable to identify them properly. To detect such interactions more efficiently, we propose a logic regression framework. Even though with the logic regression approach a larger number of models has to be considered (requiring more stringent multiple testing correction) the efficient representation of higher order logic interactions in logic regression models leads to a significant increase of power to detect such interactions as compared to a Cockerham's approach. The increase in power is demonstrated analytically for a simple two-way interaction model and illustrated in more complex settings with simulation study and real data analysis.

  11. Design of multivariable controller for a 600 MWe CANDU nuclear power plant

    International Nuclear Information System (INIS)

    Mensah, S.; McMorran, P.D.

    1982-04-01

    This paper reports the results of a case study on the design of a multivariable regulator for a nuclear power station of the Gentilly-2 type. In this study, a design model was derived by simplifying and linearizing equations in the G2SIM non-linear model. Open-loop simulation showed good agreement between transient responses of both models. After a critical review of multivariable design techniques, the authors explored pole shifting with output feedback. A comprehensive set of application-oriented algorithms for closed-loop pole shifting, implemented via modules in the MVPACK computer-aided design package were derived. A controller was designed for the linear model, then implemented on the non-linear simulation. After adjustment of controller gains, mainly in the dynanamic section of the feedback, simulation results showed that the performance of the multivariable controller on G2SIM is satisfactory. The results demonstrate the relative superiority of the multi-variable controller over the existing conventional controller

  12. Adaptive Linear and Normalized Combination of Radial Basis Function Networks for Function Approximation and Regression

    Directory of Open Access Journals (Sweden)

    Yunfeng Wu

    2014-01-01

    Full Text Available This paper presents a novel adaptive linear and normalized combination (ALNC method that can be used to combine the component radial basis function networks (RBFNs to implement better function approximation and regression tasks. The optimization of the fusion weights is obtained by solving a constrained quadratic programming problem. According to the instantaneous errors generated by the component RBFNs, the ALNC is able to perform the selective ensemble of multiple leaners by adaptively adjusting the fusion weights from one instance to another. The results of the experiments on eight synthetic function approximation and six benchmark regression data sets show that the ALNC method can effectively help the ensemble system achieve a higher accuracy (measured in terms of mean-squared error and the better fidelity (characterized by normalized correlation coefficient of approximation, in relation to the popular simple average, weighted average, and the Bagging methods.

  13. Application of multivariate adaptive regression spine-assisted objective function on optimization of heat transfer rate around a cylinder

    Energy Technology Data Exchange (ETDEWEB)

    Dey, Prasenjit; Dad, Ajoy K. [Mechanical Engineering Department, National Institute of Technology, Agartala (India)

    2016-12-15

    The present study aims to predict the heat transfer characteristics around a square cylinder with different corner radii using multivariate adaptive regression splines (MARS). Further, the MARS-generated objective function is optimized by particle swarm optimization. The data for the prediction are taken from the recently published article by the present authors [P. Dey, A. Sarkar, A.K. Das, Development of GEP and ANN model to predict the unsteady forced convection over a cylinder, Neural Comput. Appl. (2015). Further, the MARS model is compared with artificial neural network and gene expression programming. It has been found that the MARS model is very efficient in predicting the heat transfer characteristics. It has also been found that MARS is more efficient than artificial neural network and gene expression programming in predicting the forced convection data, and also particle swarm optimization can efficiently optimize the heat transfer rate.

  14. Multi-disease analysis of maternal antibody decay using non-linear mixed models accounting for censoring.

    Science.gov (United States)

    Goeyvaerts, Nele; Leuridan, Elke; Faes, Christel; Van Damme, Pierre; Hens, Niel

    2015-09-10

    Biomedical studies often generate repeated measures of multiple outcomes on a set of subjects. It may be of interest to develop a biologically intuitive model for the joint evolution of these outcomes while assessing inter-subject heterogeneity. Even though it is common for biological processes to entail non-linear relationships, examples of multivariate non-linear mixed models (MNMMs) are still fairly rare. We contribute to this area by jointly analyzing the maternal antibody decay for measles, mumps, rubella, and varicella, allowing for a different non-linear decay model for each infectious disease. We present a general modeling framework to analyze multivariate non-linear longitudinal profiles subject to censoring, by combining multivariate random effects, non-linear growth and Tobit regression. We explore the hypothesis of a common infant-specific mechanism underlying maternal immunity using a pairwise correlated random-effects approach and evaluating different correlation matrix structures. The implied marginal correlation between maternal antibody levels is estimated using simulations. The mean duration of passive immunity was less than 4 months for all diseases with substantial heterogeneity between infants. The maternal antibody levels against rubella and varicella were found to be positively correlated, while little to no correlation could be inferred for the other disease pairs. For some pairs, computational issues occurred with increasing correlation matrix complexity, which underlines the importance of further developing estimation methods for MNMMs. Copyright © 2015 John Wiley & Sons, Ltd.

  15. Least Squares Adjustment: Linear and Nonlinear Weighted Regression Analysis

    DEFF Research Database (Denmark)

    Nielsen, Allan Aasbjerg

    2007-01-01

    This note primarily describes the mathematics of least squares regression analysis as it is often used in geodesy including land surveying and satellite positioning applications. In these fields regression is often termed adjustment. The note also contains a couple of typical land surveying...... and satellite positioning application examples. In these application areas we are typically interested in the parameters in the model typically 2- or 3-D positions and not in predictive modelling which is often the main concern in other regression analysis applications. Adjustment is often used to obtain...... the clock error) and to obtain estimates of the uncertainty with which the position is determined. Regression analysis is used in many other fields of application both in the natural, the technical and the social sciences. Examples may be curve fitting, calibration, establishing relationships between...

  16. Prosthetic alignment after total knee replacement is not associated with dissatisfaction or change in Oxford Knee Score: A multivariable regression analysis.

    Science.gov (United States)

    Huijbregts, Henricus J T A M; Khan, Riaz J K; Fick, Daniel P; Jarrett, Olivia M; Haebich, Samantha

    2016-06-01

    Approximately 18% of the patients are dissatisfied with the result of total knee replacement. However, the relation between dissatisfaction and prosthetic alignment has not been investigated before. We retrospectively analysed prospectively gathered data of all patients who had a primary TKR, preoperative and one-year postoperative Oxford Knee Scores (OKS) and postoperative computed tomography (CT). The CT protocol measures hip-knee-ankle (HKA) angle, and coronal, sagittal and axial component alignment. Satisfaction was defined using a five-item Likert scale. We dichotomised dissatisfaction by combining '(very) dissatisfied' and 'neutral/not sure'. Associations with dissatisfaction and change in OKS were calculated using multivariable logistic and linear regression models. 230 TKRs were implanted in 105 men and 106 women. At one year, 12% were (very) dissatisfied and 10% neutral. Coronal alignment of the femoral component was 0.5 degrees more accurate in patients who were satisfied at one year. The other alignment measurements were not different between satisfied and dissatisfied patients. All radiographic measurements had a P-value>0.10 on univariate analyses. At one year, dissatisfaction was associated with the three-months OKS. Change in OKS was associated with three-months OKS, preoperative physical SF-12, preoperative pain and cruciate retaining design. Neither mechanical axis, nor component alignment, is associated with dissatisfaction at one year following TKR. Patients get the best outcome when pain reduction and function improvement are optimal during the first three months and when the indication to embark on surgery is based on physical limitations rather than on a high pain score. 2. Copyright © 2016 Elsevier B.V. All rights reserved.

  17. SU-G-BRA-08: Diaphragm Motion Tracking Based On KV CBCT Projections with a Constrained Linear Regression Optimization

    Energy Technology Data Exchange (ETDEWEB)

    Wei, J [City College of New York, New York, NY (United States); Chao, M [The Mount Sinai Medical Center, New York, NY (United States)

    2016-06-15

    Purpose: To develop a novel strategy to extract the respiratory motion of the thoracic diaphragm from kilovoltage cone beam computed tomography (CBCT) projections by a constrained linear regression optimization technique. Methods: A parabolic function was identified as the geometric model and was employed to fit the shape of the diaphragm on the CBCT projections. The search was initialized by five manually placed seeds on a pre-selected projection image. Temporal redundancies, the enabling phenomenology in video compression and encoding techniques, inherent in the dynamic properties of the diaphragm motion together with the geometrical shape of the diaphragm boundary and the associated algebraic constraint that significantly reduced the searching space of viable parabolic parameters was integrated, which can be effectively optimized by a constrained linear regression approach on the subsequent projections. The innovative algebraic constraints stipulating the kinetic range of the motion and the spatial constraint preventing any unphysical deviations was able to obtain the optimal contour of the diaphragm with minimal initialization. The algorithm was assessed by a fluoroscopic movie acquired at anteriorposterior fixed direction and kilovoltage CBCT projection image sets from four lung and two liver patients. The automatic tracing by the proposed algorithm and manual tracking by a human operator were compared in both space and frequency domains. Results: The error between the estimated and manual detections for the fluoroscopic movie was 0.54mm with standard deviation (SD) of 0.45mm, while the average error for the CBCT projections was 0.79mm with SD of 0.64mm for all enrolled patients. The submillimeter accuracy outcome exhibits the promise of the proposed constrained linear regression approach to track the diaphragm motion on rotational projection images. Conclusion: The new algorithm will provide a potential solution to rendering diaphragm motion and ultimately

  18. SU-G-BRA-08: Diaphragm Motion Tracking Based On KV CBCT Projections with a Constrained Linear Regression Optimization

    International Nuclear Information System (INIS)

    Wei, J; Chao, M

    2016-01-01

    Purpose: To develop a novel strategy to extract the respiratory motion of the thoracic diaphragm from kilovoltage cone beam computed tomography (CBCT) projections by a constrained linear regression optimization technique. Methods: A parabolic function was identified as the geometric model and was employed to fit the shape of the diaphragm on the CBCT projections. The search was initialized by five manually placed seeds on a pre-selected projection image. Temporal redundancies, the enabling phenomenology in video compression and encoding techniques, inherent in the dynamic properties of the diaphragm motion together with the geometrical shape of the diaphragm boundary and the associated algebraic constraint that significantly reduced the searching space of viable parabolic parameters was integrated, which can be effectively optimized by a constrained linear regression approach on the subsequent projections. The innovative algebraic constraints stipulating the kinetic range of the motion and the spatial constraint preventing any unphysical deviations was able to obtain the optimal contour of the diaphragm with minimal initialization. The algorithm was assessed by a fluoroscopic movie acquired at anteriorposterior fixed direction and kilovoltage CBCT projection image sets from four lung and two liver patients. The automatic tracing by the proposed algorithm and manual tracking by a human operator were compared in both space and frequency domains. Results: The error between the estimated and manual detections for the fluoroscopic movie was 0.54mm with standard deviation (SD) of 0.45mm, while the average error for the CBCT projections was 0.79mm with SD of 0.64mm for all enrolled patients. The submillimeter accuracy outcome exhibits the promise of the proposed constrained linear regression approach to track the diaphragm motion on rotational projection images. Conclusion: The new algorithm will provide a potential solution to rendering diaphragm motion and ultimately

  19. Multiple linear combination (MLC) regression tests for common variants adapted to linkage disequilibrium structure.

    Science.gov (United States)

    Yoo, Yun Joo; Sun, Lei; Poirier, Julia G; Paterson, Andrew D; Bull, Shelley B

    2017-02-01

    By jointly analyzing multiple variants within a gene, instead of one at a time, gene-based multiple regression can improve power, robustness, and interpretation in genetic association analysis. We investigate multiple linear combination (MLC) test statistics for analysis of common variants under realistic trait models with linkage disequilibrium (LD) based on HapMap Asian haplotypes. MLC is a directional test that exploits LD structure in a gene to construct clusters of closely correlated variants recoded such that the majority of pairwise correlations are positive. It combines variant effects within the same cluster linearly, and aggregates cluster-specific effects in a quadratic sum of squares and cross-products, producing a test statistic with reduced degrees of freedom (df) equal to the number of clusters. By simulation studies of 1000 genes from across the genome, we demonstrate that MLC is a well-powered and robust choice among existing methods across a broad range of gene structures. Compared to minimum P-value, variance-component, and principal-component methods, the mean power of MLC is never much lower than that of other methods, and can be higher, particularly with multiple causal variants. Moreover, the variation in gene-specific MLC test size and power across 1000 genes is less than that of other methods, suggesting it is a complementary approach for discovery in genome-wide analysis. The cluster construction of the MLC test statistics helps reveal within-gene LD structure, allowing interpretation of clustered variants as haplotypic effects, while multiple regression helps to distinguish direct and indirect associations. © 2016 The Authors Genetic Epidemiology Published by Wiley Periodicals, Inc.

  20. Plateletpheresis efficiency and mathematical correction of software-derived platelet yield prediction: A linear regression and ROC modeling approach.

    Science.gov (United States)

    Jaime-Pérez, José Carlos; Jiménez-Castillo, Raúl Alberto; Vázquez-Hernández, Karina Elizabeth; Salazar-Riojas, Rosario; Méndez-Ramírez, Nereida; Gómez-Almaguer, David

    2017-10-01

    Advances in automated cell separators have improved the efficiency of plateletpheresis and the possibility of obtaining double products (DP). We assessed cell processor accuracy of predicted platelet (PLT) yields with the goal of a better prediction of DP collections. This retrospective proof-of-concept study included 302 plateletpheresis procedures performed on a Trima Accel v6.0 at the apheresis unit of a hematology department. Donor variables, software predicted yield and actual PLT yield were statistically evaluated. Software prediction was optimized by linear regression analysis and its optimal cut-off to obtain a DP assessed by receiver operating characteristic curve (ROC) modeling. Three hundred and two plateletpheresis procedures were performed; in 271 (89.7%) occasions, donors were men and in 31 (10.3%) women. Pre-donation PLT count had the best direct correlation with actual PLT yield (r = 0.486. P Simple correction derived from linear regression analysis accurately corrected this underestimation and ROC analysis identified a precise cut-off to reliably predict a DP. © 2016 Wiley Periodicals, Inc.

  1. Modeling the kinetics of essential oil hydrodistillation from juniper berries (Juniperus communis L. using non-linear regression

    Directory of Open Access Journals (Sweden)

    Radosavljević Dragana B.

    2017-01-01

    Full Text Available This paper presents kinetics modeling of essential oil hydrodistillation from juniper berries (Juniperus communis L. by using a non-linear regression methodology. The proposed model has the polynomial-logarithmic form. The initial equation of the proposed non-linear model is q = q∞•(a•(logt2 + b•logt + c and by substituting a1=q∞•a, b1 = q∞•b and c1 = q∞•c, the final equation is obtained as q = a1•(logt2 + b1•logt + c1. In this equation q is the quantity of the obtained oil at time t, while a1, b1 and c1 are parameters to be determined for each sample. From the final equation it can be seen that the key parameter q∞, which presents the maximal oil quantity obtained after infinite time, is already included in parameters a1, b1 and c1. In this way, experimental determination of this parameter is avoided. Using the proposed model with parameters obtained by regression, the values of oil hydrodistillation in time are calculated for each sample and compared to the experimental values. In addition, two kinetic models previously proposed in literature were applied to the same experimental results. The developed model provided better agreements with the experimental values than the two, generally accepted kinetic models of this process. The average values of error measures (RSS, RSE, AIC and MRPD obtained for our model (0.005; 0.017; –84.33; 1.65 were generally lower than the corresponding values of the other two models (0.025; 0.041; –53.20; 3.89 and (0.0035; 0.015; –86.83; 1.59. Also, parameter estimation for the proposed model was significantly simpler (maximum 2 iterations per sample using the non-linear regression than that for the existing models (maximum 9 iterations per sample. [Project of the Serbian Ministry of Education, Science and Technological Development, Grant no. TR-35026

  2. Reliability of the Load-Velocity Relationship Obtained Through Linear and Polynomial Regression Models to Predict the One-Repetition Maximum Load.

    Science.gov (United States)

    Pestaña-Melero, Francisco Luis; Haff, G Gregory; Rojas, Francisco Javier; Pérez-Castilla, Alejandro; García-Ramos, Amador

    2017-12-18

    This study aimed to compare the between-session reliability of the load-velocity relationship between (1) linear vs. polynomial regression models, (2) concentric-only vs. eccentric-concentric bench press variants, as well as (3) the within-participants vs. the between-participants variability of the velocity attained at each percentage of the one-repetition maximum (%1RM). The load-velocity relationship of 30 men (age: 21.2±3.8 y; height: 1.78±0.07 m, body mass: 72.3±7.3 kg; bench press 1RM: 78.8±13.2 kg) were evaluated by means of linear and polynomial regression models in the concentric-only and eccentric-concentric bench press variants in a Smith Machine. Two sessions were performed with each bench press variant. The main findings were: (1) first-order-polynomials (CV: 4.39%-4.70%) provided the load-velocity relationship with higher reliability than second-order-polynomials (CV: 4.68%-5.04%); (2) the reliability of the load-velocity relationship did not differ between the concentric-only and eccentric-concentric bench press variants; (3) the within-participants variability of the velocity attained at each %1RM was markedly lower than the between-participants variability. Taken together, these results highlight that, regardless of the bench press variant considered, the individual determination of the load-velocity relationship by a linear regression model could be recommended to monitor and prescribe the relative load in the Smith machine bench press exercise.

  3. Multivariable controller for a 600 MWe CANDU nuclear power plant

    International Nuclear Information System (INIS)

    Mensah, S.

    1982-11-01

    The problems of designing a multivariable regulator for a nuclear power station of the Gentilly-2 type are studied. A reduced model, G2LDM, linearized around steady state operating conditions, is derived from the non-linear model G2SIM. The resulting linear model is described by state-space equations. Good agreement is demonstrated between the transient responses of both models. Properties of G2LDM are assessed by performing controllability and observability tests, cyclicity and rank tests, and eigenanalysis. A comprehensive set of application-orinented algorithms which allow multivariable controller design with closed-loop pole-assignment techniques are implemented in a computer-aided design package via several modules. A general scheme for the implementation of a multivariable controller in G2SIM is designed, and simulation tests show satisfactory performance of the controller [fr

  4. Modeling daily soil temperature over diverse climate conditions in Iran—a comparison of multiple linear regression and support vector regression techniques

    Science.gov (United States)

    Delbari, Masoomeh; Sharifazari, Salman; Mohammadi, Ehsan

    2018-02-01

    The knowledge of soil temperature at different depths is important for agricultural industry and for understanding climate change. The aim of this study is to evaluate the performance of a support vector regression (SVR)-based model in estimating daily soil temperature at 10, 30 and 100 cm depth at different climate conditions over Iran. The obtained results were compared to those obtained from a more classical multiple linear regression (MLR) model. The correlation sensitivity for the input combinations and periodicity effect were also investigated. Climatic data used as inputs to the models were minimum and maximum air temperature, solar radiation, relative humidity, dew point, and the atmospheric pressure (reduced to see level), collected from five synoptic stations Kerman, Ahvaz, Tabriz, Saghez, and Rasht located respectively in the hyper-arid, arid, semi-arid, Mediterranean, and hyper-humid climate conditions. According to the results, the performance of both MLR and SVR models was quite well at surface layer, i.e., 10-cm depth. However, SVR performed better than MLR in estimating soil temperature at deeper layers especially 100 cm depth. Moreover, both models performed better in humid climate condition than arid and hyper-arid areas. Further, adding a periodicity component into the modeling process considerably improved the models' performance especially in the case of SVR.

  5. A STATISTICAL ANALYSIS OF GDP AND FINAL CONSUMPTION USING SIMPLE LINEAR REGRESSION. THE CASE OF ROMANIA 1990–2010

    OpenAIRE

    Aniela Balacescu; Marian Zaharia

    2011-01-01

    This paper aims to examine the causal relationship between GDP and final consumption. The authors used linear regression model in which GDP is considered variable results, and final consumption variable factor. In drafting article we used Excel software application that is a modern computing and statistical data analysis.

  6. Using Logistic Regression To Predict the Probability of Debris Flows Occurring in Areas Recently Burned By Wildland Fires

    Science.gov (United States)

    Rupert, Michael G.; Cannon, Susan H.; Gartner, Joseph E.

    2003-01-01

    Logistic regression was used to predict the probability of debris flows occurring in areas recently burned by wildland fires. Multiple logistic regression is conceptually similar to multiple linear regression because statistical relations between one dependent variable and several independent variables are evaluated. In logistic regression, however, the dependent variable is transformed to a binary variable (debris flow did or did not occur), and the actual probability of the debris flow occurring is statistically modeled. Data from 399 basins located within 15 wildland fires that burned during 2000-2002 in Colorado, Idaho, Montana, and New Mexico were evaluated. More than 35 independent variables describing the burn severity, geology, land surface gradient, rainfall, and soil properties were evaluated. The models were developed as follows: (1) Basins that did and did not produce debris flows were delineated from National Elevation Data using a Geographic Information System (GIS). (2) Data describing the burn severity, geology, land surface gradient, rainfall, and soil properties were determined for each basin. These data were then downloaded to a statistics software package for analysis using logistic regression. (3) Relations between the occurrence/non-occurrence of debris flows and burn severity, geology, land surface gradient, rainfall, and soil properties were evaluated and several preliminary multivariate logistic regression models were constructed. All possible combinations of independent variables were evaluated to determine which combination produced the most effective model. The multivariate model that best predicted the occurrence of debris flows was selected. (4) The multivariate logistic regression model was entered into a GIS, and a map showing the probability of debris flows was constructed. The most effective model incorporates the percentage of each basin with slope greater than 30 percent, percentage of land burned at medium and high burn severity

  7. Effective Surfactants Blend Concentration Determination for O/W Emulsion Stabilization by Two Nonionic Surfactants by Simple Linear Regression.

    Science.gov (United States)

    Hassan, A K

    2015-01-01

    In this work, O/W emulsion sets were prepared by using different concentrations of two nonionic surfactants. The two surfactants, tween 80(HLB=15.0) and span 80(HLB=4.3) were used in a fixed proportions equal to 0.55:0.45 respectively. HLB value of the surfactants blends were fixed at 10.185. The surfactants blend concentration is starting from 3% up to 19%. For each O/W emulsion set the conductivity was measured at room temperature (25±2°), 40, 50, 60, 70 and 80°. Applying the simple linear regression least squares method statistical analysis to the temperature-conductivity obtained data determines the effective surfactants blend concentration required for preparing the most stable O/W emulsion. These results were confirmed by applying the physical stability centrifugation testing and the phase inversion temperature range measurements. The results indicated that, the relation which represents the most stable O/W emulsion has the strongest direct linear relationship between temperature and conductivity. This relationship is linear up to 80°. This work proves that, the most stable O/W emulsion is determined via the determination of the maximum R² value by applying of the simple linear regression least squares method to the temperature-conductivity obtained data up to 80°, in addition to, the true maximum slope is represented by the equation which has the maximum R² value. Because the conditions would be changed in a more complex formulation, the method of the determination of the effective surfactants blend concentration was verified by applying it for more complex formulations of 2% O/W miconazole nitrate cream and the results indicate its reproducibility.

  8. Development of a Multiple Linear Regression Model to Forecast Facility Electrical Consumption at an Air Force Base.

    Science.gov (United States)

    1981-09-01

    corresponds to the same square footage that consumed the electrical energy. 3. The basic assumptions of multiple linear regres- sion, as enumerated in...7. Data related to the sample of bases is assumed to be representative of bases in the population. Limitations Basic limitations on this research were... Ratemaking --Overview. Rand Report R-5894, Santa Monica CA, May 1977. Chatterjee, Samprit, and Bertram Price. Regression Analysis by Example. New York: John

  9. Mixture of Regression Models with Single-Index

    OpenAIRE

    Xiang, Sijia; Yao, Weixin

    2016-01-01

    In this article, we propose a class of semiparametric mixture regression models with single-index. We argue that many recently proposed semiparametric/nonparametric mixture regression models can be considered special cases of the proposed model. However, unlike existing semiparametric mixture regression models, the new pro- posed model can easily incorporate multivariate predictors into the nonparametric components. Backfitting estimates and the corresponding algorithms have been proposed for...

  10. Local bilinear multiple-output quantile/depth regression

    Czech Academy of Sciences Publication Activity Database

    Hallin, M.; Lu, Z.; Paindaveine, D.; Šiman, Miroslav

    2015-01-01

    Roč. 21, č. 3 (2015), s. 1435-1466 ISSN 1350-7265 R&D Projects: GA MŠk(CZ) 1M06047 Institutional support: RVO:67985556 Keywords : conditional depth * growth chart * halfspace depth * local bilinear regression * multivariate quantile * quantile regression * regression depth Subject RIV: BA - General Mathematics Impact factor: 1.372, year: 2015 http://library.utia.cas.cz/separaty/2015/SI/siman-0446857.pdf

  11. Implicit collinearity effect in linear regression: Application to basal ...

    African Journals Online (AJOL)

    Collinearity of predictor variables is a severe problem in the least square regression analysis. It contributes to the instability of regression coefficients and leads to a wrong prediction accuracy. Despite these problems, studies are conducted with a large number of observed and derived variables linked with a response ...

  12. On the analysis of clonogenic survival data: Statistical alternatives to the linear-quadratic model

    International Nuclear Information System (INIS)

    Unkel, Steffen; Belka, Claus; Lauber, Kirsten

    2016-01-01

    the extraction of scores of radioresistance, which displayed significant correlations with the estimated parameters of the regression models. Undoubtedly, LQ regression is a robust method for the analysis of clonogenic survival data. Nevertheless, alternative approaches including non-linear regression and multivariate techniques such as cluster analysis and principal component analysis represent versatile tools for the extraction of parameters and/or scores of the cellular response towards ionizing irradiation with a more intuitive biological interpretation. The latter are highly informative for correlation analyses with other types of data, including functional genomics data that are increasingly beinggenerated

  13. Relationship between rice yield and climate variables in southwest Nigeria using multiple linear regression and support vector machine analysis

    Science.gov (United States)

    Oguntunde, Philip G.; Lischeid, Gunnar; Dietrich, Ottfried

    2018-03-01

    This study examines the variations of climate variables and rice yield and quantifies the relationships among them using multiple linear regression, principal component analysis, and support vector machine (SVM) analysis in southwest Nigeria. The climate and yield data used was for a period of 36 years between 1980 and 2015. Similar to the observed decrease ( P 1 and explained 83.1% of the total variance of predictor variables. The SVM regression function using the scores of the first principal component explained about 75% of the variance in rice yield data and linear regression about 64%. SVM regression between annual solar radiation values and yield explained 67% of the variance. Only the first component of the principal component analysis (PCA) exhibited a clear long-term trend and sometimes short-term variance similar to that of rice yield. Short-term fluctuations of the scores of the PC1 are closely coupled to those of rice yield during the 1986-1993 and the 2006-2013 periods thereby revealing the inter-annual sensitivity of rice production to climate variability. Solar radiation stands out as the climate variable of highest influence on rice yield, and the influence was especially strong during monsoon and post-monsoon periods, which correspond to the vegetative, booting, flowering, and grain filling stages in the study area. The outcome is expected to provide more in-depth regional-specific climate-rice linkage for screening of better cultivars that can positively respond to future climate fluctuations as well as providing information that may help optimized planting dates for improved radiation use efficiency in the study area.

  14. Performance of an Axisymmetric Rocket Based Combined Cycle Engine During Rocket Only Operation Using Linear Regression Analysis

    Science.gov (United States)

    Smith, Timothy D.; Steffen, Christopher J., Jr.; Yungster, Shaye; Keller, Dennis J.

    1998-01-01

    The all rocket mode of operation is shown to be a critical factor in the overall performance of a rocket based combined cycle (RBCC) vehicle. An axisymmetric RBCC engine was used to determine specific impulse efficiency values based upon both full flow and gas generator configurations. Design of experiments methodology was used to construct a test matrix and multiple linear regression analysis was used to build parametric models. The main parameters investigated in this study were: rocket chamber pressure, rocket exit area ratio, injected secondary flow, mixer-ejector inlet area, mixer-ejector area ratio, and mixer-ejector length-to-inlet diameter ratio. A perfect gas computational fluid dynamics analysis, using both the Spalart-Allmaras and k-omega turbulence models, was performed with the NPARC code to obtain values of vacuum specific impulse. Results from the multiple linear regression analysis showed that for both the full flow and gas generator configurations increasing mixer-ejector area ratio and rocket area ratio increase performance, while increasing mixer-ejector inlet area ratio and mixer-ejector length-to-diameter ratio decrease performance. Increasing injected secondary flow increased performance for the gas generator analysis, but was not statistically significant for the full flow analysis. Chamber pressure was found to be not statistically significant.

  15. Determination of sulfamethoxazole and trimethoprim mixtures by multivariate electronic spectroscopy

    OpenAIRE

    Cordeiro, Gilcélia A.; Peralta-Zamora, Patricio; Nagata, Noemi; Pontarollo, Roberto

    2008-01-01

    In this work a multivariate spectroscopic methodology is proposed for quantitative determination of sulfamethoxazole and trimethoprim in pharmaceutical associations. The multivariate model was developed by partial least-squares regression, using twenty synthetic mixtures and the spectral region between 190 and 350 nm. In the validation stage, which involved the analysis of five synthetic mixtures, prediction errors lower that 3% were observed. The predictive capacity of the multivariate model...

  16. Linear regression models and k-means clustering for statistical analysis of fNIRS data.

    Science.gov (United States)

    Bonomini, Viola; Zucchelli, Lucia; Re, Rebecca; Ieva, Francesca; Spinelli, Lorenzo; Contini, Davide; Paganoni, Anna; Torricelli, Alessandro

    2015-02-01

    We propose a new algorithm, based on a linear regression model, to statistically estimate the hemodynamic activations in fNIRS data sets. The main concern guiding the algorithm development was the minimization of assumptions and approximations made on the data set for the application of statistical tests. Further, we propose a K-means method to cluster fNIRS data (i.e. channels) as activated or not activated. The methods were validated both on simulated and in vivo fNIRS data. A time domain (TD) fNIRS technique was preferred because of its high performances in discriminating cortical activation and superficial physiological changes. However, the proposed method is also applicable to continuous wave or frequency domain fNIRS data sets.

  17. The estimation and prediction of the inventories for the liquid and gaseous radwaste systems using the linear regression analysis

    International Nuclear Information System (INIS)

    Kim, J. Y.; Shin, C. H.; Kim, J. K.; Lee, J. K.; Park, Y. J.

    2003-01-01

    The variation transitions of the inventories for the liquid radwaste system and the radioactive gas have being released in containment, and their predictive values according to the operation histories of Yonggwang(YGN) 3 and 4 were analyzed by linear regression analysis methodology. The results show that the variation transitions of the inventories for those systems are linearly increasing according to the operation histories but the inventories released to the environment are considerably lower than the recommended values based on the FSAR suggestions. It is considered that some conservation were presented in the estimation methodology in preparing stage of FSAR

  18. Modeling of Soil Aggregate Stability using Support Vector Machines and Multiple Linear Regression

    Directory of Open Access Journals (Sweden)

    Ali Asghar Besalatpour

    2016-02-01

    Full Text Available Introduction: Soil aggregate stability is a key factor in soil resistivity to mechanical stresses, including the impacts of rainfall and surface runoff, and thus to water erosion (Canasveras et al., 2010. Various indicators have been proposed to characterize and quantify soil aggregate stability, for example percentage of water-stable aggregates (WSA, mean weight diameter (MWD, geometric mean diameter (GMD of aggregates, and water-dispersible clay (WDC content (Calero et al., 2008. Unfortunately, the experimental methods available to determine these indicators are laborious, time-consuming and difficult to standardize (Canasveras et al., 2010. Therefore, it would be advantageous if aggregate stability could be predicted indirectly from more easily available data (Besalatpour et al., 2014. The main objective of this study is to investigate the potential use of support vector machines (SVMs method for estimating soil aggregate stability (as quantified by GMD as compared to multiple linear regression approach. Materials and Methods: The study area was part of the Bazoft watershed (31° 37′ to 32° 39′ N and 49° 34′ to 50° 32′ E, which is located in the Northern part of the Karun river basin in central Iran. A total of 160 soil samples were collected from the top 5 cm of soil surface. Some easily available characteristics including topographic, vegetation, and soil properties were used as inputs. Soil organic matter (SOM content was determined by the Walkley-Black method (Nelson & Sommers, 1986. Particle size distribution in the soil samples (clay, silt, sand, fine sand, and very fine sand were measured using the procedure described by Gee & Bauder (1986 and calcium carbonate equivalent (CCE content was determined by the back-titration method (Nelson, 1982. The modified Kemper & Rosenau (1986 method was used to determine wet-aggregate stability (GMD. The topographic attributes of elevation, slope, and aspect were characterized using a 20-m

  19. Genomic-Enabled Prediction Based on Molecular Markers and Pedigree Using the Bayesian Linear Regression Package in R

    Directory of Open Access Journals (Sweden)

    Paulino Pérez

    2010-09-01

    Full Text Available The availability of dense molecular markers has made possible the use of genomic selection in plant and animal breeding. However, models for genomic selection pose several computational and statistical challenges and require specialized computer programs, not always available to the end user and not implemented in standard statistical software yet. The R-package BLR (Bayesian Linear Regression implements several statistical procedures (e.g., Bayesian Ridge Regression, Bayesian LASSO in a unified framework that allows including marker genotypes and pedigree data jointly. This article describes the classes of models implemented in the BLR package and illustrates their use through examples. Some challenges faced when applying genomic-enabled selection, such as model choice, evaluation of predictive ability through cross-validation, and choice of hyper-parameters, are also addressed.

  20. Multivariate analysis of DSC-XRD simultaneous measurement data: a study of multistage crystalline structure changes in a linear poly(ethylene imine) thin film.

    Science.gov (United States)

    Kakuda, Hiroyuki; Okada, Tetsuo; Otsuka, Makoto; Katsumoto, Yukiteru; Hasegawa, Takeshi

    2009-01-01

    A multivariate analytical technique has been applied to the analysis of simultaneous measurement data from differential scanning calorimetry (DSC) and X-ray diffraction (XRD) in order to study thermal changes in crystalline structure of a linear poly(ethylene imine) (LPEI) film. A large number of XRD patterns generated from the simultaneous measurements were subjected to an augmented alternative least-squares (ALS) regression analysis, and the XRD patterns were readily decomposed into chemically independent XRD patterns and their thermal profiles were also obtained at the same time. The decomposed XRD patterns and the profiles were useful in discussing the minute peaks in the DSC. The analytical results revealed the following changes of polymorphisms in detail: An LPEI film prepared by casting an aqueous solution was composed of sesquihydrate and hemihydrate crystals. The sesquihydrate one was lost at an early stage of heating, and the film changed into an amorphous state. Once the sesquihydrate was lost by heating, it was not recovered even when it was cooled back to room temperature. When the sample was heated again, structural changes were found between the hemihydrate and the amorphous components. In this manner, the simultaneous DSC-XRD measurements combined with ALS analysis proved to be powerful for obtaining a better understanding of the thermally induced changes of the crystalline structure in a polymer film.

  1. Application of multivariate splines to discrete mathematics

    OpenAIRE

    Xu, Zhiqiang

    2005-01-01

    Using methods developed in multivariate splines, we present an explicit formula for discrete truncated powers, which are defined as the number of non-negative integer solutions of linear Diophantine equations. We further use the formula to study some classical problems in discrete mathematics as follows. First, we extend the partition function of integers in number theory. Second, we exploit the relation between the relative volume of convex polytopes and multivariate truncated powers and giv...

  2. pulver: an R package for parallel ultra-rapid p-value computation for linear regression interaction terms.

    Science.gov (United States)

    Molnos, Sophie; Baumbach, Clemens; Wahl, Simone; Müller-Nurasyid, Martina; Strauch, Konstantin; Wang-Sattler, Rui; Waldenberger, Melanie; Meitinger, Thomas; Adamski, Jerzy; Kastenmüller, Gabi; Suhre, Karsten; Peters, Annette; Grallert, Harald; Theis, Fabian J; Gieger, Christian

    2017-09-29

    Genome-wide association studies allow us to understand the genetics of complex diseases. Human metabolism provides information about the disease-causing mechanisms, so it is usual to investigate the associations between genetic variants and metabolite levels. However, only considering genetic variants and their effects on one trait ignores the possible interplay between different "omics" layers. Existing tools only consider single-nucleotide polymorphism (SNP)-SNP interactions, and no practical tool is available for large-scale investigations of the interactions between pairs of arbitrary quantitative variables. We developed an R package called pulver to compute p-values for the interaction term in a very large number of linear regression models. Comparisons based on simulated data showed that pulver is much faster than the existing tools. This is achieved by using the correlation coefficient to test the null-hypothesis, which avoids the costly computation of inversions. Additional tricks are a rearrangement of the order, when iterating through the different "omics" layers, and implementing this algorithm in the fast programming language C++. Furthermore, we applied our algorithm to data from the German KORA study to investigate a real-world problem involving the interplay among DNA methylation, genetic variants, and metabolite levels. The pulver package is a convenient and rapid tool for screening huge numbers of linear regression models for significant interaction terms in arbitrary pairs of quantitative variables. pulver is written in R and C++, and can be downloaded freely from CRAN at https://cran.r-project.org/web/packages/pulver/ .

  3. Monopole and dipole estimation for multi-frequency sky maps by linear regression

    Science.gov (United States)

    Wehus, I. K.; Fuskeland, U.; Eriksen, H. K.; Banday, A. J.; Dickinson, C.; Ghosh, T.; Górski, K. M.; Lawrence, C. R.; Leahy, J. P.; Maino, D.; Reich, P.; Reich, W.

    2017-01-01

    We describe a simple but efficient method for deriving a consistent set of monopole and dipole corrections for multi-frequency sky map data sets, allowing robust parametric component separation with the same data set. The computational core of this method is linear regression between pairs of frequency maps, often called T-T plots. Individual contributions from monopole and dipole terms are determined by performing the regression locally in patches on the sky, while the degeneracy between different frequencies is lifted whenever the dominant foreground component exhibits a significant spatial spectral index variation. Based on this method, we present two different, but each internally consistent, sets of monopole and dipole coefficients for the nine-year WMAP, Planck 2013, SFD 100 μm, Haslam 408 MHz and Reich & Reich 1420 MHz maps. The two sets have been derived with different analysis assumptions and data selection, and provide an estimate of residual systematic uncertainties. In general, our values are in good agreement with previously published results. Among the most notable results are a relative dipole between the WMAP and Planck experiments of 10-15μK (depending on frequency), an estimate of the 408 MHz map monopole of 8.9 ± 1.3 K, and a non-zero dipole in the 1420 MHz map of 0.15 ± 0.03 K pointing towards Galactic coordinates (l,b) = (308°,-36°) ± 14°. These values represent the sum of any instrumental and data processing offsets, as well as any Galactic or extra-Galactic component that is spectrally uniform over the full sky.

  4. Aggregation-cokriging for highly multivariate spatial data

    KAUST Repository

    Furrer, R.; Genton, M. G.

    2011-01-01

    Best linear unbiased prediction of spatially correlated multivariate random processes, often called cokriging in geostatistics, requires the solution of a large linear system based on the covariance and cross-covariance matrix of the observations. For many problems of practical interest, it is impossible to solve the linear system with direct methods. We propose an efficient linear unbiased predictor based on a linear aggregation of the covariables. The primary variable together with this single meta-covariable is used to perform cokriging. We discuss the optimality of the approach under different covariance structures, and use it to create reanalysis type high-resolution historical temperature fields. © 2011 Biometrika Trust.

  5. Aggregation-cokriging for highly multivariate spatial data

    KAUST Repository

    Furrer, R.

    2011-08-26

    Best linear unbiased prediction of spatially correlated multivariate random processes, often called cokriging in geostatistics, requires the solution of a large linear system based on the covariance and cross-covariance matrix of the observations. For many problems of practical interest, it is impossible to solve the linear system with direct methods. We propose an efficient linear unbiased predictor based on a linear aggregation of the covariables. The primary variable together with this single meta-covariable is used to perform cokriging. We discuss the optimality of the approach under different covariance structures, and use it to create reanalysis type high-resolution historical temperature fields. © 2011 Biometrika Trust.

  6. A multiple linear regression analysis of hot corrosion attack on a series of nickel base turbine alloys

    Science.gov (United States)

    Barrett, C. A.

    1985-01-01

    Multiple linear regression analysis was used to determine an equation for estimating hot corrosion attack for a series of Ni base cast turbine alloys. The U transform (i.e., 1/sin (% A/100) to the 1/2) was shown to give the best estimate of the dependent variable, y. A complete second degree equation is described for the centered" weight chemistries for the elements Cr, Al, Ti, Mo, W, Cb, Ta, and Co. In addition linear terms for the minor elements C, B, and Zr were added for a basic 47 term equation. The best reduced equation was determined by the stepwise selection method with essentially 13 terms. The Cr term was found to be the most important accounting for 60 percent of the explained variability hot corrosion attack.

  7. Relationship between rice yield and climate variables in southwest Nigeria using multiple linear regression and support vector machine analysis.

    Science.gov (United States)

    Oguntunde, Philip G; Lischeid, Gunnar; Dietrich, Ottfried

    2018-03-01

    This study examines the variations of climate variables and rice yield and quantifies the relationships among them using multiple linear regression, principal component analysis, and support vector machine (SVM) analysis in southwest Nigeria. The climate and yield data used was for a period of 36 years between 1980 and 2015. Similar to the observed decrease (P  1 and explained 83.1% of the total variance of predictor variables. The SVM regression function using the scores of the first principal component explained about 75% of the variance in rice yield data and linear regression about 64%. SVM regression between annual solar radiation values and yield explained 67% of the variance. Only the first component of the principal component analysis (PCA) exhibited a clear long-term trend and sometimes short-term variance similar to that of rice yield. Short-term fluctuations of the scores of the PC1 are closely coupled to those of rice yield during the 1986-1993 and the 2006-2013 periods thereby revealing the inter-annual sensitivity of rice production to climate variability. Solar radiation stands out as the climate variable of highest influence on rice yield, and the influence was especially strong during monsoon and post-monsoon periods, which correspond to the vegetative, booting, flowering, and grain filling stages in the study area. The outcome is expected to provide more in-depth regional-specific climate-rice linkage for screening of better cultivars that can positively respond to future climate fluctuations as well as providing information that may help optimized planting dates for improved radiation use efficiency in the study area.

  8. Quantile Regression Methods

    DEFF Research Database (Denmark)

    Fitzenberger, Bernd; Wilke, Ralf Andreas

    2015-01-01

    if the mean regression model does not. We provide a short informal introduction into the principle of quantile regression which includes an illustrative application from empirical labor market research. This is followed by briefly sketching the underlying statistical model for linear quantile regression based......Quantile regression is emerging as a popular statistical approach, which complements the estimation of conditional mean models. While the latter only focuses on one aspect of the conditional distribution of the dependent variable, the mean, quantile regression provides more detailed insights...... by modeling conditional quantiles. Quantile regression can therefore detect whether the partial effect of a regressor on the conditional quantiles is the same for all quantiles or differs across quantiles. Quantile regression can provide evidence for a statistical relationship between two variables even...

  9. Comparison of multiple linear regression, partial least squares and artificial neural networks for prediction of gas chromatographic relative retention times of trimethylsilylated anabolic androgenic steroids.

    Science.gov (United States)

    Fragkaki, A G; Farmaki, E; Thomaidis, N; Tsantili-Kakoulidou, A; Angelis, Y S; Koupparis, M; Georgakopoulos, C

    2012-09-21

    The comparison among different modelling techniques, such as multiple linear regression, partial least squares and artificial neural networks, has been performed in order to construct and evaluate models for prediction of gas chromatographic relative retention times of trimethylsilylated anabolic androgenic steroids. The performance of the quantitative structure-retention relationship study, using the multiple linear regression and partial least squares techniques, has been previously conducted. In the present study, artificial neural networks models were constructed and used for the prediction of relative retention times of anabolic androgenic steroids, while their efficiency is compared with that of the models derived from the multiple linear regression and partial least squares techniques. For overall ranking of the models, a novel procedure [Trends Anal. Chem. 29 (2010) 101-109] based on sum of ranking differences was applied, which permits the best model to be selected. The suggested models are considered useful for the estimation of relative retention times of designer steroids for which no analytical data are available. Copyright © 2012 Elsevier B.V. All rights reserved.

  10. QSAR Study of Insecticides of Phthalamide Derivatives Using Multiple Linear Regression and Artificial Neural Network Methods

    Directory of Open Access Journals (Sweden)

    Adi Syahputra

    2014-03-01

    Full Text Available Quantitative structure activity relationship (QSAR for 21 insecticides of phthalamides containing hydrazone (PCH was studied using multiple linear regression (MLR, principle component regression (PCR and artificial neural network (ANN. Five descriptors were included in the model for MLR and ANN analysis, and five latent variables obtained from principle component analysis (PCA were used in PCR analysis. Calculation of descriptors was performed using semi-empirical PM6 method. ANN analysis was found to be superior statistical technique compared to the other methods and gave a good correlation between descriptors and activity (r2 = 0.84. Based on the obtained model, we have successfully designed some new insecticides with higher predicted activity than those of previously synthesized compounds, e.g.2-(decalinecarbamoyl-5-chloro-N’-((5-methylthiophen-2-ylmethylene benzohydrazide, 2-(decalinecarbamoyl-5-chloro-N’-((thiophen-2-yl-methylene benzohydrazide and 2-(decaline carbamoyl-N’-(4-fluorobenzylidene-5-chlorobenzohydrazide with predicted log LC50 of 1.640, 1.672, and 1.769 respectively.

  11. Predicting hyperketonemia by logistic and linear regression using test-day milk and performance variables in early-lactation Holstein and Jersey cows.

    Science.gov (United States)

    Chandler, T L; Pralle, R S; Dórea, J R R; Poock, S E; Oetzel, G R; Fourdraine, R H; White, H M

    2018-03-01

    Although cowside testing strategies for diagnosing hyperketonemia (HYK) are available, many are labor intensive and costly, and some lack sufficient accuracy. Predicting milk ketone bodies by Fourier transform infrared spectrometry during routine milk sampling may offer a more practical monitoring strategy. The objectives of this study were to (1) develop linear and logistic regression models using all available test-day milk and performance variables for predicting HYK and (2) compare prediction methods (Fourier transform infrared milk ketone bodies, linear regression models, and logistic regression models) to determine which is the most predictive of HYK. Given the data available, a secondary objective was to evaluate differences in test-day milk and performance variables (continuous measurements) between Holsteins and Jerseys and between cows with or without HYK within breed. Blood samples were collected on the same day as milk sampling from 658 Holstein and 468 Jersey cows between 5 and 20 d in milk (DIM). Diagnosis of HYK was at a serum β-hydroxybutyrate (BHB) concentration ≥1.2 mmol/L. Concentrations of milk BHB and acetone were predicted by Fourier transform infrared spectrometry (Foss Analytical, Hillerød, Denmark). Thresholds of milk BHB and acetone were tested for diagnostic accuracy, and logistic models were built from continuous variables to predict HYK in primiparous and multiparous cows within breed. Linear models were constructed from continuous variables for primiparous and multiparous cows within breed that were 5 to 11 DIM or 12 to 20 DIM. Milk ketone body thresholds diagnosed HYK with 64.0 to 92.9% accuracy in Holsteins and 59.1 to 86.6% accuracy in Jerseys. Logistic models predicted HYK with 82.6 to 97.3% accuracy. Internally cross-validated multiple linear regression models diagnosed HYK of Holstein cows with 97.8% accuracy for primiparous and 83.3% accuracy for multiparous cows. Accuracy of Jersey models was 81.3% in primiparous and 83

  12. Extending Local Canonical Correlation Analysis to Handle General Linear Contrasts for fMRI Data

    Directory of Open Access Journals (Sweden)

    Mingwu Jin

    2012-01-01

    Full Text Available Local canonical correlation analysis (CCA is a multivariate method that has been proposed to more accurately determine activation patterns in fMRI data. In its conventional formulation, CCA has several drawbacks that limit its usefulness in fMRI. A major drawback is that, unlike the general linear model (GLM, a test of general linear contrasts of the temporal regressors has not been incorporated into the CCA formalism. To overcome this drawback, a novel directional test statistic was derived using the equivalence of multivariate multiple regression (MVMR and CCA. This extension will allow CCA to be used for inference of general linear contrasts in more complicated fMRI designs without reparameterization of the design matrix and without reestimating the CCA solutions for each particular contrast of interest. With the proper constraints on the spatial coefficients of CCA, this test statistic can yield a more powerful test on the inference of evoked brain regional activations from noisy fMRI data than the conventional t-test in the GLM. The quantitative results from simulated and pseudoreal data and activation maps from fMRI data were used to demonstrate the advantage of this novel test statistic.

  13. A step-by-step guide to non-linear regression analysis of experimental data using a Microsoft Excel spreadsheet.

    Science.gov (United States)

    Brown, A M

    2001-06-01

    The objective of this present study was to introduce a simple, easily understood method for carrying out non-linear regression analysis based on user input functions. While it is relatively straightforward to fit data with simple functions such as linear or logarithmic functions, fitting data with more complicated non-linear functions is more difficult. Commercial specialist programmes are available that will carry out this analysis, but these programmes are expensive and are not intuitive to learn. An alternative method described here is to use the SOLVER function of the ubiquitous spreadsheet programme Microsoft Excel, which employs an iterative least squares fitting routine to produce the optimal goodness of fit between data and function. The intent of this paper is to lead the reader through an easily understood step-by-step guide to implementing this method, which can be applied to any function in the form y=f(x), and is well suited to fast, reliable analysis of data in all fields of biology.

  14. Applied Statistics: From Bivariate through Multivariate Techniques [with CD-ROM

    Science.gov (United States)

    Warner, Rebecca M.

    2007-01-01

    This book provides a clear introduction to widely used topics in bivariate and multivariate statistics, including multiple regression, discriminant analysis, MANOVA, factor analysis, and binary logistic regression. The approach is applied and does not require formal mathematics; equations are accompanied by verbal explanations. Students are asked…

  15. Regression and regression analysis time series prediction modeling on climate data of quetta, pakistan

    International Nuclear Information System (INIS)

    Jafri, Y.Z.; Kamal, L.

    2007-01-01

    Various statistical techniques was used on five-year data from 1998-2002 of average humidity, rainfall, maximum and minimum temperatures, respectively. The relationships to regression analysis time series (RATS) were developed for determining the overall trend of these climate parameters on the basis of which forecast models can be corrected and modified. We computed the coefficient of determination as a measure of goodness of fit, to our polynomial regression analysis time series (PRATS). The correlation to multiple linear regression (MLR) and multiple linear regression analysis time series (MLRATS) were also developed for deciphering the interdependence of weather parameters. Spearman's rand correlation and Goldfeld-Quandt test were used to check the uniformity or non-uniformity of variances in our fit to polynomial regression (PR). The Breusch-Pagan test was applied to MLR and MLRATS, respectively which yielded homoscedasticity. We also employed Bartlett's test for homogeneity of variances on a five-year data of rainfall and humidity, respectively which showed that the variances in rainfall data were not homogenous while in case of humidity, were homogenous. Our results on regression and regression analysis time series show the best fit to prediction modeling on climatic data of Quetta, Pakistan. (author)

  16. A non-linear regression analysis program for describing electrophysiological data with multiple functions using Microsoft Excel.

    Science.gov (United States)

    Brown, Angus M

    2006-04-01

    The objective of this present study was to demonstrate a method for fitting complex electrophysiological data with multiple functions using the SOLVER add-in of the ubiquitous spreadsheet Microsoft Excel. SOLVER minimizes the difference between the sum of the squares of the data to be fit and the function(s) describing the data using an iterative generalized reduced gradient method. While it is a straightforward procedure to fit data with linear functions, and we have previously demonstrated a method of non-linear regression analysis of experimental data based upon a single function, it is more complex to fit data with multiple functions, usually requiring specialized expensive computer software. In this paper we describe an easily understood program for fitting experimentally acquired data, in this case the stimulus-evoked compound action potential from the mouse optic nerve, with multiple Gaussian functions. The program is flexible and can be applied to describe data with a wide variety of user-input functions.

  17. Enforcing Co-expression Within a Brain-Imaging Genomics Regression Framework.

    Science.gov (United States)

    Zille, Pascal; Calhoun, Vince D; Wang, Yu-Ping

    2017-06-28

    Among the challenges arising in brain imaging genetic studies, estimating the potential links between neurological and genetic variability within a population is key. In this work, we propose a multivariate, multimodal formulation for variable selection that leverages co-expression patterns across various data modalities. Our approach is based on an intuitive combination of two widely used statistical models: sparse regression and canonical correlation analysis (CCA). While the former seeks multivariate linear relationships between a given phenotype and associated observations, the latter searches to extract co-expression patterns between sets of variables belonging to different modalities. In the following, we propose to rely on a 'CCA-type' formulation in order to regularize the classical multimodal sparse regression problem (essentially incorporating both CCA and regression models within a unified formulation). The underlying motivation is to extract discriminative variables that are also co-expressed across modalities. We first show that the simplest formulation of such model can be expressed as a special case of collaborative learning methods. After discussing its limitation, we propose an extended, more flexible formulation, and introduce a simple and efficient alternating minimization algorithm to solve the associated optimization problem.We explore the parameter space and provide some guidelines regarding parameter selection. Both the original and extended versions are then compared on a simple toy dataset and a more advanced simulated imaging genomics dataset in order to illustrate the benefits of the latter. Finally, we validate the proposed formulation using single nucleotide polymorphisms (SNP) data and functional magnetic resonance imaging (fMRI) data from a population of adolescents (n = 362 subjects, age 16.9 ± 1.9 years from the Philadelphia Neurodevelopmental Cohort) for the study of learning ability. Furthermore, we carry out a significance

  18. Trochanteric entry femoral nails yield better femoral version and lower revision rates-A large cohort multivariate regression analysis.

    Science.gov (United States)

    Yoon, Richard S; Gage, Mark J; Galos, David K; Donegan, Derek J; Liporace, Frank A

    2017-06-01

    Intramedullary nailing (IMN) has become the standard of care for the treatment of most femoral shaft fractures. Different IMN options include trochanteric and piriformis entry as well as retrograde nails, which may result in varying degrees of femoral rotation. The objective of this study was to analyze postoperative femoral version between three types of nails and to delineate any significant differences in femoral version (DFV) and revision rates. Over a 10-year period, 417 patients underwent IMN of a diaphyseal femur fracture (AO/OTA 32A-C). Of these patients, 316 met inclusion criteria and obtained postoperative computed tomography (CT) scanograms to calculate femoral version and were thus included in the study. In this study, our main outcome measure was the difference in femoral version (DFV) between the uninjured limb and the injured limb. The effect of the following variables on DFV and revision rates were determined via univariate, multivariate, and ordinal regression analyses: gender, age, BMI, ethnicity, mechanism of injury, operative side, open fracture, and table type/position. Statistical significance was set at pregression analysis revealed that a lower BMI was significantly associated with a lower DFV (p=0.006). Controlling for possible covariables, multivariate analysis yielded a significantly lower DFV for trochanteric entry nails than piriformis or retrograde nails (7.9±6.10 vs. 9.5±7.4 vs. 9.4±7.8°, pregression analysis. However, this is not to state that the other nail types exhibited abnormal DFV. Translation to the clinical impact of a few degrees of DFV is also unknown. Future studies to more in-depth study the intricacies of femoral version may lead to improved technology in addition to potentially improved clinical outcomes. Copyright © 2017 Elsevier Ltd. All rights reserved.

  19. Prediction of minimum temperatures in an alpine region by linear and non-linear post-processing of meteorological models

    Directory of Open Access Journals (Sweden)

    R. Barbiero

    2007-05-01

    Full Text Available Model Output Statistics (MOS refers to a method of post-processing the direct outputs of numerical weather prediction (NWP models in order to reduce the biases introduced by a coarse horizontal resolution. This technique is especially useful in orographically complex regions, where large differences can be found between the NWP elevation model and the true orography. This study carries out a comparison of linear and non-linear MOS methods, aimed at the prediction of minimum temperatures in a fruit-growing region of the Italian Alps, based on the output of two different NWPs (ECMWF T511–L60 and LAMI-3. Temperature, of course, is a particularly important NWP output; among other roles it drives the local frost forecast, which is of great interest to agriculture. The mechanisms of cold air drainage, a distinctive aspect of mountain environments, are often unsatisfactorily captured by global circulation models. The simplest post-processing technique applied in this work was a correction for the mean bias, assessed at individual model grid points. We also implemented a multivariate linear regression on the output at the grid points surrounding the target area, and two non-linear models based on machine learning techniques: Neural Networks and Random Forest. We compare the performance of all these techniques on four different NWP data sets. Downscaling the temperatures clearly improved the temperature forecasts with respect to the raw NWP output, and also with respect to the basic mean bias correction. Multivariate methods generally yielded better results, but the advantage of using non-linear algorithms was small if not negligible. RF, the best performing method, was implemented on ECMWF prognostic output at 06:00 UTC over the 9 grid points surrounding the target area. Mean absolute errors in the prediction of 2 m temperature at 06:00 UTC were approximately 1.2°C, close to the natural variability inside the area itself.

  20. Evaluation of Multiple Linear Regression-Based Limited Sampling Strategies for Enteric-Coated Mycophenolate Sodium in Adult Kidney Transplant Recipients.

    Science.gov (United States)

    Brooks, Emily K; Tett, Susan E; Isbel, Nicole M; McWhinney, Brett; Staatz, Christine E

    2018-04-01

    Although multiple linear regression-based limited sampling strategies (LSSs) have been published for enteric-coated mycophenolate sodium, none have been evaluated for the prediction of subsequent mycophenolic acid (MPA) exposure. This study aimed to examine the predictive performance of the published LSS for the estimation of future MPA area under the concentration-time curve from 0 to 12 hours (AUC0-12) in renal transplant recipients. Total MPA plasma concentrations were measured in 20 adult renal transplant patients on 2 occasions a week apart. All subjects received concomitant tacrolimus and were approximately 1 month after transplant. Samples were taken at 0, 0.33, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 6, and 8 hours and 0, 0.25, 0.5, 0.75, 1, 1.25, 1.5, 2, 3, 4, 6, 9, and 12 hours after dose on the first and second sampling occasion, respectively. Predicted MPA AUC0-12 was calculated using 19 published LSSs and data from the first or second sampling occasion for each patient and compared with the second occasion full MPA AUC0-12 calculated using the linear trapezoidal rule. Bias (median percentage prediction error) and imprecision (median absolute prediction error) were determined. Median percentage prediction error and median absolute prediction error for the prediction of full MPA AUC0-12 were multiple linear regression-based LSS was not possible without concentrations up to at least 8 hours after the dose.

  1. Multivariable adaptive control of bio process

    Energy Technology Data Exchange (ETDEWEB)

    Maher, M.; Bahhou, B.; Roux, G. [Centre National de la Recherche Scientifique (CNRS), 31 - Toulouse (France); Maher, M. [Faculte des Sciences, Rabat (Morocco). Lab. de Physique

    1995-12-31

    This paper presents a multivariable adaptive control of a continuous-flow fermentation process for the alcohol production. The linear quadratic control strategy is used for the regulation of substrate and ethanol concentrations in the bioreactor. The control inputs are the dilution rate and the influent substrate concentration. A robust identification algorithm is used for the on-line estimation of linear MIMO model`s parameters. Experimental results of a pilot-plant fermenter application are reported and show the control performances. (authors) 8 refs.

  2. Vector regression introduced

    Directory of Open Access Journals (Sweden)

    Mok Tik

    2014-06-01

    Full Text Available This study formulates regression of vector data that will enable statistical analysis of various geodetic phenomena such as, polar motion, ocean currents, typhoon/hurricane tracking, crustal deformations, and precursory earthquake signals. The observed vector variable of an event (dependent vector variable is expressed as a function of a number of hypothesized phenomena realized also as vector variables (independent vector variables and/or scalar variables that are likely to impact the dependent vector variable. The proposed representation has the unique property of solving the coefficients of independent vector variables (explanatory variables also as vectors, hence it supersedes multivariate multiple regression models, in which the unknown coefficients are scalar quantities. For the solution, complex numbers are used to rep- resent vector information, and the method of least squares is deployed to estimate the vector model parameters after transforming the complex vector regression model into a real vector regression model through isomorphism. Various operational statistics for testing the predictive significance of the estimated vector parameter coefficients are also derived. A simple numerical example demonstrates the use of the proposed vector regression analysis in modeling typhoon paths.

  3. TMVA - Toolkit for Multivariate Data Analysis with ROOT Users guide

    CERN Document Server

    Höcker, A; Tegenfeldt, F; Voss, H; Voss, K; Christov, A; Henrot-Versillé, S; Jachowski, M; Krasznahorkay, A; Mahalalel, Y; Prudent, X; Speckmayer, P

    2007-01-01

    Multivariate machine learning techniques for the classification of data from high-energy physics (HEP) experiments have become standard tools in most HEP analyses. The multivariate classifiers themselves have significantly evolved in recent years, also driven by developments in other areas inside and outside science. TMVA is a toolkit integrated in ROOT which hosts a large variety of multivariate classification algorithms. They range from rectangular cut optimisation (using a genetic algorithm) and likelihood estimators, over linear and non-linear discriminants (neural networks), to sophisticated recent developments like boosted decision trees and rule ensemble fitting. TMVA organises the simultaneous training, testing, and performance evaluation of all these classifiers with a user-friendly interface, and expedites the application of the trained classifiers to the analysis of data sets with unknown sample composition.

  4. Combined genetic algorithm and multiple linear regression (GA-MLR) optimizer: Application to multi-exponential fluorescence decay surface.

    Science.gov (United States)

    Fisz, Jacek J

    2006-12-07

    The optimization approach based on the genetic algorithm (GA) combined with multiple linear regression (MLR) method, is discussed. The GA-MLR optimizer is designed for the nonlinear least-squares problems in which the model functions are linear combinations of nonlinear functions. GA optimizes the nonlinear parameters, and the linear parameters are calculated from MLR. GA-MLR is an intuitive optimization approach and it exploits all advantages of the genetic algorithm technique. This optimization method results from an appropriate combination of two well-known optimization methods. The MLR method is embedded in the GA optimizer and linear and nonlinear model parameters are optimized in parallel. The MLR method is the only one strictly mathematical "tool" involved in GA-MLR. The GA-MLR approach simplifies and accelerates considerably the optimization process because the linear parameters are not the fitted ones. Its properties are exemplified by the analysis of the kinetic biexponential fluorescence decay surface corresponding to a two-excited-state interconversion process. A short discussion of the variable projection (VP) algorithm, designed for the same class of the optimization problems, is presented. VP is a very advanced mathematical formalism that involves the methods of nonlinear functionals, algebra of linear projectors, and the formalism of Fréchet derivatives and pseudo-inverses. Additional explanatory comments are added on the application of recently introduced the GA-NR optimizer to simultaneous recovery of linear and weakly nonlinear parameters occurring in the same optimization problem together with nonlinear parameters. The GA-NR optimizer combines the GA method with the NR method, in which the minimum-value condition for the quadratic approximation to chi(2), obtained from the Taylor series expansion of chi(2), is recovered by means of the Newton-Raphson algorithm. The application of the GA-NR optimizer to model functions which are multi-linear

  5. Time series linear regression of half-hourly radon levels in a residence

    International Nuclear Information System (INIS)

    Hull, D.A.

    1990-01-01

    This paper uses time series linear regression modelling to assess the impact of temperature and pressure differences on the radon measured in the basement and in the basement drain of a research house in the Princeton area of New Jersey. The models examine half-hour averages of several climate and house parameters for several periods of up to 11 days. The drain radon concentrations follow a strong diurnal pattern that shifts 12 hours in phase between the summer and the fall seasons. This shift can be linked both to the change in temperature differences between seasons and to an experiment which involved sealing the connection between the drain and the basement. We have found that both the basement and the drain radon concentrations are correlated to basement-outdoor and soil-outdoor temperature differences (the coefficient of determination varies between 0.6 and 0.8). The statistical models for the summer periods clearly describe a physical system where the basement drain pumps radon in during the night and sucks radon out during the day

  6. An Ionospheric Index Model based on Linear Regression and Neural Network Approaches

    Science.gov (United States)

    Tshisaphungo, Mpho; McKinnell, Lee-Anne; Bosco Habarulema, John

    2017-04-01

    The ionosphere is well known to reflect radio wave signals in the high frequency (HF) band due to the present of electron and ions within the region. To optimise the use of long distance HF communications, it is important to understand the drivers of ionospheric storms and accurately predict the propagation conditions especially during disturbed days. This paper presents the development of an ionospheric storm-time index over the South African region for the application of HF communication users. The model will result into a valuable tool to measure the complex ionospheric behaviour in an operational space weather monitoring and forecasting environment. The development of an ionospheric storm-time index is based on a single ionosonde station data over Grahamstown (33.3°S,26.5°E), South Africa. Critical frequency of the F2 layer (foF2) measurements for a period 1996-2014 were considered for this study. The model was developed based on linear regression and neural network approaches. In this talk validation results for low, medium and high solar activity periods will be discussed to demonstrate model's performance.

  7. Development of statistical linear regression model for metals from transportation land uses.

    Science.gov (United States)

    Maniquiz, Marla C; Lee, Soyoung; Lee, Eunju; Kim, Lee-Hyung

    2009-01-01

    The transportation landuses possessing impervious surfaces such as highways, parking lots, roads, and bridges were recognized as the highly polluted non-point sources (NPSs) in the urban areas. Lots of pollutants from urban transportation are accumulating on the paved surfaces during dry periods and are washed-off during a storm. In Korea, the identification and monitoring of NPSs still represent a great challenge. Since 2004, the Ministry of Environment (MOE) has been engaged in several researches and monitoring to develop stormwater management policies and treatment systems for future implementation. The data over 131 storm events during May 2004 to September 2008 at eleven sites were analyzed to identify correlation relationships between particulates and metals, and to develop simple linear regression (SLR) model to estimate event mean concentration (EMC). Results indicate that there was no significant relationship between metals and TSS EMC. However, the SLR estimation models although not providing useful results are valuable indicators of high uncertainties that NPS pollution possess. Therefore, long term monitoring employing proper methods and precise statistical analysis of the data should be undertaken to eliminate these uncertainties.

  8. Local Strategy Combined with a Wavelength Selection Method for Multivariate Calibration

    Directory of Open Access Journals (Sweden)

    Haitao Chang

    2016-06-01

    Full Text Available One of the essential factors influencing the prediction accuracy of multivariate calibration models is the quality of the calibration data. A local regression strategy, together with a wavelength selection approach, is proposed to build the multivariate calibration models based on partial least squares regression. The local algorithm is applied to create a calibration set of spectra similar to the spectrum of an unknown sample; the synthetic degree of grey relation coefficient is used to evaluate the similarity. A wavelength selection method based on simple-to-use interactive self-modeling mixture analysis minimizes the influence of noisy variables, and the most informative variables of the most similar samples are selected to build the multivariate calibration model based on partial least squares regression. To validate the performance of the proposed method, ultraviolet-visible absorbance spectra of mixed solutions of food coloring analytes in a concentration range of 20–200 µg/mL is measured. Experimental results show that the proposed method can not only enhance the prediction accuracy of the calibration model, but also greatly reduce its complexity.

  9. Modelos de regressão não linear aplicados a grupos de acessos de alho

    OpenAIRE

    Reis, Renata M; Cecon, Paulo R; Puiatti, Mário; Finger, Fernando L; Nascimento, Moysés; Silva, Fabyano F; Carneiro, Antônio PS; Silva, Anderson R

    2014-01-01

    O principal objetivo deste estudo foi comparar modelos de regressão não linear aptos a descreverem o acúmulo de massa seca de diferentes partes da planta do alho ao longo do tempo (60, 90, 120 e 150 dias após plantio). Objetivou-se também identificar acessos semelhantes em relação às características avaliadas por meio de análises de agrupamento. Foram utilizados 20 acessos de alho pertencentes ao Banco de Germoplasma de Hortaliças da Universidade Federal de Viçosa (BGH/UFV). O teor de massa s...

  10. Predicting blood β-hydroxybutyrate using milk Fourier transform infrared spectrum, milk composition, and producer-reported variables with multiple linear regression, partial least squares regression, and artificial neural network.

    Science.gov (United States)

    Pralle, R S; Weigel, K W; White, H M

    2018-05-01

    Prediction of postpartum hyperketonemia (HYK) using Fourier transform infrared (FTIR) spectrometry analysis could be a practical diagnostic option for farms because these data are now available from routine milk analysis during Dairy Herd Improvement testing. The objectives of this study were to (1) develop and evaluate blood β-hydroxybutyrate (BHB) prediction models using multivariate linear regression (MLR), partial least squares regression (PLS), and artificial neural network (ANN) methods and (2) evaluate whether milk FTIR spectrum (mFTIR)-based models are improved with the inclusion of test-day variables (mTest; milk composition and producer-reported data). Paired blood and milk samples were collected from multiparous cows 5 to 18 d postpartum at 3 Wisconsin farms (3,629 observations from 1,013 cows). Blood BHB concentration was determined by a Precision Xtra meter (Abbot Diabetes Care, Alameda, CA), and milk samples were analyzed by a privately owned laboratory (AgSource, Menomonie, WI) for components and FTIR spectrum absorbance. Producer-recorded variables were extracted from farm management software. A blood BHB ≥1.2 mmol/L was considered HYK. The data set was divided into a training set (n = 3,020) and an external testing set (n = 609). Model fitting was implemented with JMP 12 (SAS Institute, Cary, NC). A 5-fold cross-validation was performed on the training data set for the MLR, PLS, and ANN prediction methods, with square root of blood BHB as the dependent variable. Each method was fitted using 3 combinations of variables: mFTIR, mTest, or mTest + mFTIR variables. Models were evaluated based on coefficient of determination, root mean squared error, and area under the receiver operating characteristic curve. Four models (PLS-mTest + mFTIR, ANN-mFTIR, ANN-mTest, and ANN-mTest + mFTIR) were chosen for further evaluation in the testing set after fitting to the full training set. In the cross-validation analysis, model fit was greatest for ANN, followed

  11. Multivariate calculus and geometry

    CERN Document Server

    Dineen, Seán

    2014-01-01

    Multivariate calculus can be understood best by combining geometric insight, intuitive arguments, detailed explanations and mathematical reasoning. This textbook has successfully followed this programme. It additionally provides a solid description of the basic concepts, via familiar examples, which are then tested in technically demanding situations. In this new edition the introductory chapter and two of the chapters on the geometry of surfaces have been revised. Some exercises have been replaced and others provided with expanded solutions. Familiarity with partial derivatives and a course in linear algebra are essential prerequisites for readers of this book. Multivariate Calculus and Geometry is aimed primarily at higher level undergraduates in the mathematical sciences. The inclusion of many practical examples involving problems of several variables will appeal to mathematics, science and engineering students.

  12. Generalized Partially Linear Regression with Misclassified Data and an Application to Labour Market Transitions

    DEFF Research Database (Denmark)

    Dlugosz, Stephan; Mammen, Enno; Wilke, Ralf

    2017-01-01

    Large data sets that originate from administrative or operational activity are increasingly used for statistical analysis as they often contain very precise information and a large number of observations. But there is evidence that some variables can be subject to severe misclassification...... or contain missing values. Given the size of the data, a flexible semiparametric misclassification model would be good choice but their use in practise is scarce. To close this gap a semiparametric model for the probability of observing labour market transitions is estimated using a sample of 20 m...... observations from Germany. It is shown that estimated marginal effects of a number of covariates are sizeably affected by misclassification and missing values in the analysis data. The proposed generalized partially linear regression extends existing models by allowing a misclassified discrete covariate...

  13. hMuLab: A Biomedical Hybrid MUlti-LABel Classifier Based on Multiple Linear Regression.

    Science.gov (United States)

    Wang, Pu; Ge, Ruiquan; Xiao, Xuan; Zhou, Manli; Zhou, Fengfeng

    2017-01-01

    Many biomedical classification problems are multi-label by nature, e.g., a gene involved in a variety of functions and a patient with multiple diseases. The majority of existing classification algorithms assumes each sample with only one class label, and the multi-label classification problem remains to be a challenge for biomedical researchers. This study proposes a novel multi-label learning algorithm, hMuLab, by integrating both feature-based and neighbor-based similarity scores. The multiple linear regression modeling techniques make hMuLab capable of producing multiple label assignments for a query sample. The comparison results over six commonly-used multi-label performance measurements suggest that hMuLab performs accurately and stably for the biomedical datasets, and may serve as a complement to the existing literature.

  14. Simulations of full multivariate Tweedie with flexible dependence structure

    DEFF Research Database (Denmark)

    Cuenin, Johann; Jørgensen, Bent; Kokonendji, Célestin C.

    2016-01-01

    The paper introduces a variables-in-common method for constructing and simulating multivariate Tweedie distribution, based on linear combinations of independent univariate Tweedie variables. The method is facilitated by the convolution and scaling properties of the Tweedie distributions, using....... The method allows simulation of multivariate distributions from many known, including the Gaussian, Poisson, non-central gamma, gamma and inverse Gaussian distributions....

  15. A note on the relationships between multiple imputation, maximum likelihood and fully Bayesian methods for missing responses in linear regression models.

    Science.gov (United States)

    Chen, Qingxia; Ibrahim, Joseph G

    2014-07-01

    Multiple Imputation, Maximum Likelihood and Fully Bayesian methods are the three most commonly used model-based approaches in missing data problems. Although it is easy to show that when the responses are missing at random (MAR), the complete case analysis is unbiased and efficient, the aforementioned methods are still commonly used in practice for this setting. To examine the performance of and relationships between these three methods in this setting, we derive and investigate small sample and asymptotic expressions of the estimates and standard errors, and fully examine how these estimates are related for the three approaches in the linear regression model when the responses are MAR. We show that when the responses are MAR in the linear model, the estimates of the regression coefficients using these three methods are asymptotically equivalent to the complete case estimates under general conditions. One simulation and a real data set from a liver cancer clinical trial are given to compare the properties of these methods when the responses are MAR.

  16. A New Predictive Model Based on the ABC Optimized Multivariate Adaptive Regression Splines Approach for Predicting the Remaining Useful Life in Aircraft Engines

    Directory of Open Access Journals (Sweden)

    Paulino José García Nieto

    2016-05-01

    Full Text Available Remaining useful life (RUL estimation is considered as one of the most central points in the prognostics and health management (PHM. The present paper describes a nonlinear hybrid ABC–MARS-based model for the prediction of the remaining useful life of aircraft engines. Indeed, it is well-known that an accurate RUL estimation allows failure prevention in a more controllable way so that the effective maintenance can be carried out in appropriate time to correct impending faults. The proposed hybrid model combines multivariate adaptive regression splines (MARS, which have been successfully adopted for regression problems, with the artificial bee colony (ABC technique. This optimization technique involves parameter setting in the MARS training procedure, which significantly influences the regression accuracy. However, its use in reliability applications has not yet been widely explored. Bearing this in mind, remaining useful life values have been predicted here by using the hybrid ABC–MARS-based model from the remaining measured parameters (input variables for aircraft engines with success. A correlation coefficient equal to 0.92 was obtained when this hybrid ABC–MARS-based model was applied to experimental data. The agreement of this model with experimental data confirmed its good performance. The main advantage of this predictive model is that it does not require information about the previous operation states of the aircraft engine.

  17. Emulating facial biomechanics using multivariate partial least squares surrogate models.

    Science.gov (United States)

    Wu, Tim; Martens, Harald; Hunter, Peter; Mithraratne, Kumar

    2014-11-01

    A detailed biomechanical model of the human face driven by a network of muscles is a useful tool in relating the muscle activities to facial deformations. However, lengthy computational times often hinder its applications in practical settings. The objective of this study is to replace precise but computationally demanding biomechanical model by a much faster multivariate meta-model (surrogate model), such that a significant speedup (to real-time interactive speed) can be achieved. Using a multilevel fractional factorial design, the parameter space of the biomechanical system was probed from a set of sample points chosen to satisfy maximal rank optimality and volume filling. The input-output relationship at these sampled points was then statistically emulated using linear and nonlinear, cross-validated, partial least squares regression models. It was demonstrated that these surrogate models can mimic facial biomechanics efficiently and reliably in real-time. Copyright © 2014 John Wiley & Sons, Ltd.

  18. PWR control system design using advanced linear and non-linear methodologies

    International Nuclear Information System (INIS)

    Rabindran, N.; Whitmarsh-Everiss, M.J.

    2004-01-01

    Consideration is here given to the methodology deployed for non-linear heuristic analysis in the time domain supported by multi-variable linear control system design methods for the purposes of operational dynamics and control system analysis. This methodology is illustrated by the application of structural singular value μ analysis to Pressurised Water Reactor control system design. (author)

  19. A Seemingly Unrelated Poisson Regression Model

    OpenAIRE

    King, Gary

    1989-01-01

    This article introduces a new estimator for the analysis of two contemporaneously correlated endogenous event count variables. This seemingly unrelated Poisson regression model (SUPREME) estimator combines the efficiencies created by single equation Poisson regression model estimators and insights from "seemingly unrelated" linear regression models.

  20. Using Multivariate Adaptive Regression Spline and Artificial Neural Network to Simulate Urbanization in Mumbai, India

    Science.gov (United States)

    Ahmadlou, M.; Delavar, M. R.; Tayyebi, A.; Shafizadeh-Moghadam, H.

    2015-12-01

    Land use change (LUC) models used for modelling urban growth are different in structure and performance. Local models divide the data into separate subsets and fit distinct models on each of the subsets. Non-parametric models are data driven and usually do not have a fixed model structure or model structure is unknown before the modelling process. On the other hand, global models perform modelling using all the available data. In addition, parametric models have a fixed structure before the modelling process and they are model driven. Since few studies have compared local non-parametric models with global parametric models, this study compares a local non-parametric model called multivariate adaptive regression spline (MARS), and a global parametric model called artificial neural network (ANN) to simulate urbanization in Mumbai, India. Both models determine the relationship between a dependent variable and multiple independent variables. We used receiver operating characteristic (ROC) to compare the power of the both models for simulating urbanization. Landsat images of 1991 (TM) and 2010 (ETM+) were used for modelling the urbanization process. The drivers considered for urbanization in this area were distance to urban areas, urban density, distance to roads, distance to water, distance to forest, distance to railway, distance to central business district, number of agricultural cells in a 7 by 7 neighbourhoods, and slope in 1991. The results showed that the area under the ROC curve for MARS and ANN was 94.77% and 95.36%, respectively. Thus, ANN performed slightly better than MARS to simulate urban areas in Mumbai, India.

  1. Railway Crossing Risk Area Detection Using Linear Regression and Terrain Drop Compensation Techniques

    Science.gov (United States)

    Chen, Wen-Yuan; Wang, Mei; Fu, Zhou-Xing

    2014-01-01

    Most railway accidents happen at railway crossings. Therefore, how to detect humans or objects present in the risk area of a railway crossing and thus prevent accidents are important tasks. In this paper, three strategies are used to detect the risk area of a railway crossing: (1) we use a terrain drop compensation (TDC) technique to solve the problem of the concavity of railway crossings; (2) we use a linear regression technique to predict the position and length of an object from image processing; (3) we have developed a novel strategy called calculating local maximum Y-coordinate object points (CLMYOP) to obtain the ground points of the object. In addition, image preprocessing is also applied to filter out the noise and successfully improve the object detection. From the experimental results, it is demonstrated that our scheme is an effective and corrective method for the detection of railway crossing risk areas. PMID:24936948

  2. Railway Crossing Risk Area Detection Using Linear Regression and Terrain Drop Compensation Techniques

    Directory of Open Access Journals (Sweden)

    Wen-Yuan Chen

    2014-06-01

    Full Text Available Most railway accidents happen at railway crossings. Therefore, how to detect humans or objects present in the risk area of a railway crossing and thus prevent accidents are important tasks. In this paper, three strategies are used to detect the risk area of a railway crossing: (1 we use a terrain drop compensation (TDC technique to solve the problem of the concavity of railway crossings; (2 we use a linear regression technique to predict the position and length of an object from image processing; (3 we have developed a novel strategy called calculating local maximum Y-coordinate object points (CLMYOP to obtain the ground points of the object. In addition, image preprocessing is also applied to filter out the noise and successfully improve the object detection. From the experimental results, it is demonstrated that our scheme is an effective and corrective method for the detection of railway crossing risk areas.

  3. Using Apparent Density of Paper from Hardwood Kraft Pulps to Predict Sheet Properties, based on Unsupervised Classification and Multivariable Regression Techniques

    Directory of Open Access Journals (Sweden)

    Ofélia Anjos

    2015-07-01

    Full Text Available Paper properties determine the product application potential and depend on the raw material, pulping conditions, and pulp refining. The aim of this study was to construct mathematical models that predict quantitative relations between the paper density and various mechanical and optical properties of the paper. A dataset of properties of paper handsheets produced with pulps of Acacia dealbata, Acacia melanoxylon, and Eucalyptus globulus beaten at 500, 2500, and 4500 revolutions was used. Unsupervised classification techniques were combined to assess the need to perform separated prediction models for each species, and multivariable regression techniques were used to establish such prediction models. It was possible to develop models with a high goodness of fit using paper density as the independent variable (or predictor for all variables except tear index and zero-span tensile strength, both dry and wet.

  4. Weibull and lognormal Taguchi analysis using multiple linear regression

    International Nuclear Information System (INIS)

    Piña-Monarrez, Manuel R.; Ortiz-Yañez, Jesús F.

    2015-01-01

    The paper provides to reliability practitioners with a method (1) to estimate the robust Weibull family when the Taguchi method (TM) is applied, (2) to estimate the normal operational Weibull family in an accelerated life testing (ALT) analysis to give confidence to the extrapolation and (3) to perform the ANOVA analysis to both the robust and the normal operational Weibull family. On the other hand, because the Weibull distribution neither has the normal additive property nor has a direct relationship with the normal parameters (µ, σ), in this paper, the issues of estimating a Weibull family by using a design of experiment (DOE) are first addressed by using an L_9 (3"4) orthogonal array (OA) in both the TM and in the Weibull proportional hazard model approach (WPHM). Then, by using the Weibull/Gumbel and the lognormal/normal relationships and multiple linear regression, the direct relationships between the Weibull and the lifetime parameters are derived and used to formulate the proposed method. Moreover, since the derived direct relationships always hold, the method is generalized to the lognormal and ALT analysis. Finally, the method’s efficiency is shown through its application to the used OA and to a set of ALT data. - Highlights: • It gives the statistical relations and steps to use the Taguchi Method (TM) to analyze Weibull data. • It gives the steps to determine the unknown Weibull family to both the robust TM setting and the normal ALT level. • It gives a method to determine the expected lifetimes and to perform its ANOVA analysis in TM and ALT analysis. • It gives a method to give confidence to the extrapolation in an ALT analysis by using the Weibull family of the normal level.

  5. Introduction to generalized linear models

    CERN Document Server

    Dobson, Annette J

    2008-01-01

    Introduction Background Scope Notation Distributions Related to the Normal Distribution Quadratic Forms Estimation Model Fitting Introduction Examples Some Principles of Statistical Modeling Notation and Coding for Explanatory Variables Exponential Family and Generalized Linear Models Introduction Exponential Family of Distributions Properties of Distributions in the Exponential Family Generalized Linear Models Examples Estimation Introduction Example: Failure Times for Pressure Vessels Maximum Likelihood Estimation Poisson Regression Example Inference Introduction Sampling Distribution for Score Statistics Taylor Series Approximations Sampling Distribution for MLEs Log-Likelihood Ratio Statistic Sampling Distribution for the Deviance Hypothesis Testing Normal Linear Models Introduction Basic Results Multiple Linear Regression Analysis of Variance Analysis of Covariance General Linear Models Binary Variables and Logistic Regression Probability Distributions ...

  6. On directional multiple-output quantile regression

    Czech Academy of Sciences Publication Activity Database

    Paindaveine, D.; Šiman, Miroslav

    2011-01-01

    Roč. 102, č. 2 (2011), s. 193-212 ISSN 0047-259X R&D Projects: GA MŠk(CZ) 1M06047 Grant - others:Commision EC(BE) Fonds National de la Recherche Scientifique Institutional research plan: CEZ:AV0Z10750506 Keywords : multivariate quantile * quantile regression * multiple-output regression * halfspace depth * portfolio optimization * value-at risk Subject RIV: BA - General Mathematics Impact factor: 0.879, year: 2011 http://library.utia.cas.cz/separaty/2011/SI/siman-0364128.pdf

  7. Linear regression models for quantitative assessment of left ...

    African Journals Online (AJOL)

    Changes in left ventricular structures and function have been reported in cardiomyopathies. No prediction models have been established in this environment. This study established regression models for prediction of left ventricular structures in normal subjects. A sample of normal subjects was drawn from a large urban ...

  8. Elliptical multiple-output quantile regression and convex optimization

    Czech Academy of Sciences Publication Activity Database

    Hallin, M.; Šiman, Miroslav

    2016-01-01

    Roč. 109, č. 1 (2016), s. 232-237 ISSN 0167-7152 R&D Projects: GA ČR GA14-07234S Institutional support: RVO:67985556 Keywords : quantile regression * elliptical quantile * multivariate quantile * multiple-output regression Subject RIV: BA - General Mathematics Impact factor: 0.540, year: 2016 http://library.utia.cas.cz/separaty/2016/SI/siman-0458243.pdf

  9. Using multiple linear regression techniques to quantify carbon ...

    African Journals Online (AJOL)

    Fallow ecosystems provide a significant carbon stock that can be quantified for inclusion in the accounts of global carbon budgets. Process and statistical models of productivity, though useful, are often technically rigid as the conditions for their application are not easy to satisfy. Multiple regression techniques have been ...

  10. Early Parallel Activation of Semantics and Phonology in Picture Naming: Evidence from a Multiple Linear Regression MEG Study.

    Science.gov (United States)

    Miozzo, Michele; Pulvermüller, Friedemann; Hauk, Olaf

    2015-10-01

    The time course of brain activation during word production has become an area of increasingly intense investigation in cognitive neuroscience. The predominant view has been that semantic and phonological processes are activated sequentially, at about 150 and 200-400 ms after picture onset. Although evidence from prior studies has been interpreted as supporting this view, these studies were arguably not ideally suited to detect early brain activation of semantic and phonological processes. We here used a multiple linear regression approach to magnetoencephalography (MEG) analysis of picture naming in order to investigate early effects of variables specifically related to visual, semantic, and phonological processing. This was combined with distributed minimum-norm source estimation and region-of-interest analysis. Brain activation associated with visual image complexity appeared in occipital cortex at about 100 ms after picture presentation onset. At about 150 ms, semantic variables became physiologically manifest in left frontotemporal regions. In the same latency range, we found an effect of phonological variables in the left middle temporal gyrus. Our results demonstrate that multiple linear regression analysis is sensitive to early effects of multiple psycholinguistic variables in picture naming. Crucially, our results suggest that access to phonological information might begin in parallel with semantic processing around 150 ms after picture onset. © The Author 2014. Published by Oxford University Press.

  11. Modelling daily dissolved oxygen concentration using least square support vector machine, multivariate adaptive regression splines and M5 model tree

    Science.gov (United States)

    Heddam, Salim; Kisi, Ozgur

    2018-04-01

    In the present study, three types of artificial intelligence techniques, least square support vector machine (LSSVM), multivariate adaptive regression splines (MARS) and M5 model tree (M5T) are applied for modeling daily dissolved oxygen (DO) concentration using several water quality variables as inputs. The DO concentration and water quality variables data from three stations operated by the United States Geological Survey (USGS) were used for developing the three models. The water quality data selected consisted of daily measured of water temperature (TE, °C), pH (std. unit), specific conductance (SC, μS/cm) and discharge (DI cfs), are used as inputs to the LSSVM, MARS and M5T models. The three models were applied for each station separately and compared to each other. According to the results obtained, it was found that: (i) the DO concentration could be successfully estimated using the three models and (ii) the best model among all others differs from one station to another.

  12. Direct integral linear least square regression method for kinetic evaluation of hepatobiliary scintigraphy

    International Nuclear Information System (INIS)

    Shuke, Noriyuki

    1991-01-01

    In hepatobiliary scintigraphy, kinetic model analysis, which provides kinetic parameters like hepatic extraction or excretion rate, have been done for quantitative evaluation of liver function. In this analysis, unknown model parameters are usually determined using nonlinear least square regression method (NLS method) where iterative calculation and initial estimate for unknown parameters are required. As a simple alternative to NLS method, direct integral linear least square regression method (DILS method), which can determine model parameters by a simple calculation without initial estimate, is proposed, and tested the applicability to analysis of hepatobiliary scintigraphy. In order to see whether DILS method could determine model parameters as good as NLS method, or to determine appropriate weight for DILS method, simulated theoretical data based on prefixed parameters were fitted to 1 compartment model using both DILS method with various weightings and NLS method. The parameter values obtained were then compared with prefixed values which were used for data generation. The effect of various weights on the error of parameter estimate was examined, and inverse of time was found to be the best weight to make the error minimum. When using this weight, DILS method could give parameter values close to those obtained by NLS method and both parameter values were very close to prefixed values. With appropriate weighting, the DILS method could provide reliable parameter estimate which is relatively insensitive to the data noise. In conclusion, the DILS method could be used as a simple alternative to NLS method, providing reliable parameter estimate. (author)

  13. Estimating leaf photosynthetic pigments information by stepwise multiple linear regression analysis and a leaf optical model

    Science.gov (United States)

    Liu, Pudong; Shi, Runhe; Wang, Hong; Bai, Kaixu; Gao, Wei

    2014-10-01

    Leaf pigments are key elements for plant photosynthesis and growth. Traditional manual sampling of these pigments is labor-intensive and costly, which also has the difficulty in capturing their temporal and spatial characteristics. The aim of this work is to estimate photosynthetic pigments at large scale by remote sensing. For this purpose, inverse model were proposed with the aid of stepwise multiple linear regression (SMLR) analysis. Furthermore, a leaf radiative transfer model (i.e. PROSPECT model) was employed to simulate the leaf reflectance where wavelength varies from 400 to 780 nm at 1 nm interval, and then these values were treated as the data from remote sensing observations. Meanwhile, simulated chlorophyll concentration (Cab), carotenoid concentration (Car) and their ratio (Cab/Car) were taken as target to build the regression model respectively. In this study, a total of 4000 samples were simulated via PROSPECT with different Cab, Car and leaf mesophyll structures as 70% of these samples were applied for training while the last 30% for model validation. Reflectance (r) and its mathematic transformations (1/r and log (1/r)) were all employed to build regression model respectively. Results showed fair agreements between pigments and simulated reflectance with all adjusted coefficients of determination (R2) larger than 0.8 as 6 wavebands were selected to build the SMLR model. The largest value of R2 for Cab, Car and Cab/Car are 0.8845, 0.876 and 0.8765, respectively. Meanwhile, mathematic transformations of reflectance showed little influence on regression accuracy. We concluded that it was feasible to estimate the chlorophyll and carotenoids and their ratio based on statistical model with leaf reflectance data.

  14. Standards for Standardized Logistic Regression Coefficients

    Science.gov (United States)

    Menard, Scott

    2011-01-01

    Standardized coefficients in logistic regression analysis have the same utility as standardized coefficients in linear regression analysis. Although there has been no consensus on the best way to construct standardized logistic regression coefficients, there is now sufficient evidence to suggest a single best approach to the construction of a…

  15. Area under the curve predictions of dalbavancin, a new lipoglycopeptide agent, using the end of intravenous infusion concentration data point by regression analyses such as linear, log-linear and power models.

    Science.gov (United States)

    Bhamidipati, Ravi Kanth; Syed, Muzeeb; Mullangi, Ramesh; Srinivas, Nuggehally

    2018-02-01

    1. Dalbavancin, a lipoglycopeptide, is approved for treating gram-positive bacterial infections. Area under plasma concentration versus time curve (AUC inf ) of dalbavancin is a key parameter and AUC inf /MIC ratio is a critical pharmacodynamic marker. 2. Using end of intravenous infusion concentration (i.e. C max ) C max versus AUC inf relationship for dalbavancin was established by regression analyses (i.e. linear, log-log, log-linear and power models) using 21 pairs of subject data. 3. The predictions of the AUC inf were performed using published C max data by application of regression equations. The quotient of observed/predicted values rendered fold difference. The mean absolute error (MAE)/root mean square error (RMSE) and correlation coefficient (r) were used in the assessment. 4. MAE and RMSE values for the various models were comparable. The C max versus AUC inf exhibited excellent correlation (r > 0.9488). The internal data evaluation showed narrow confinement (0.84-1.14-fold difference) with a RMSE models predicted AUC inf with a RMSE of 3.02-27.46% with fold difference largely contained within 0.64-1.48. 5. Regardless of the regression models, a single time point strategy of using C max (i.e. end of 30-min infusion) is amenable as a prospective tool for predicting AUC inf of dalbavancin in patients.

  16. Understanding logistic regression analysis

    OpenAIRE

    Sperandei, Sandro

    2014-01-01

    Logistic regression is used to obtain odds ratio in the presence of more than one explanatory variable. The procedure is quite similar to multiple linear regression, with the exception that the response variable is binomial. The result is the impact of each variable on the odds ratio of the observed event of interest. The main advantage is to avoid confounding effects by analyzing the association of all variables together. In this article, we explain the logistic regression procedure using ex...

  17. Semiparametric Allelic Tests for Mapping Multiple Phenotypes: Binomial Regression and Mahalanobis Distance.

    Science.gov (United States)

    Majumdar, Arunabha; Witte, John S; Ghosh, Saurabh

    2015-12-01

    Binary phenotypes commonly arise due to multiple underlying quantitative precursors and genetic variants may impact multiple traits in a pleiotropic manner. Hence, simultaneously analyzing such correlated traits may be more powerful than analyzing individual traits. Various genotype-level methods, e.g., MultiPhen (O'Reilly et al. []), have been developed to identify genetic factors underlying a multivariate phenotype. For univariate phenotypes, the usefulness and applicability of allele-level tests have been investigated. The test of allele frequency difference among cases and controls is commonly used for mapping case-control association. However, allelic methods for multivariate association mapping have not been studied much. In this article, we explore two allelic tests of multivariate association: one using a Binomial regression model based on inverted regression of genotype on phenotype (Binomial regression-based Association of Multivariate Phenotypes [BAMP]), and the other employing the Mahalanobis distance between two sample means of the multivariate phenotype vector for two alleles at a single-nucleotide polymorphism (Distance-based Association of Multivariate Phenotypes [DAMP]). These methods can incorporate both discrete and continuous phenotypes. Some theoretical properties for BAMP are studied. Using simulations, the power of the methods for detecting multivariate association is compared with the genotype-level test MultiPhen's. The allelic tests yield marginally higher power than MultiPhen for multivariate phenotypes. For one/two binary traits under recessive mode of inheritance, allelic tests are found to be substantially more powerful. All three tests are applied to two different real data and the results offer some support for the simulation study. We propose a hybrid approach for testing multivariate association that implements MultiPhen when Hardy-Weinberg Equilibrium (HWE) is violated and BAMP otherwise, because the allelic approaches assume HWE

  18. Time-adaptive quantile regression

    DEFF Research Database (Denmark)

    Møller, Jan Kloppenborg; Nielsen, Henrik Aalborg; Madsen, Henrik

    2008-01-01

    and an updating procedure are combined into a new algorithm for time-adaptive quantile regression, which generates new solutions on the basis of the old solution, leading to savings in computation time. The suggested algorithm is tested against a static quantile regression model on a data set with wind power......An algorithm for time-adaptive quantile regression is presented. The algorithm is based on the simplex algorithm, and the linear optimization formulation of the quantile regression problem is given. The observations have been split to allow a direct use of the simplex algorithm. The simplex method...... production, where the models combine splines and quantile regression. The comparison indicates superior performance for the time-adaptive quantile regression in all the performance parameters considered....

  19. Application of single-step genomic best linear unbiased prediction with a multiple-lactation random regression test-day model for Japanese Holsteins.

    Science.gov (United States)

    Baba, Toshimi; Gotoh, Yusaku; Yamaguchi, Satoshi; Nakagawa, Satoshi; Abe, Hayato; Masuda, Yutaka; Kawahara, Takayoshi

    2017-08-01

    This study aimed to evaluate a validation reliability of single-step genomic best linear unbiased prediction (ssGBLUP) with a multiple-lactation random regression test-day model and investigate an effect of adding genotyped cows on the reliability. Two data sets for test-day records from the first three lactations were used: full data from February 1975 to December 2015 (60 850 534 records from 2 853 810 cows) and reduced data cut off in 2011 (53 091 066 records from 2 502 307 cows). We used marker genotypes of 4480 bulls and 608 cows. Genomic enhanced breeding values (GEBV) of 305-day milk yield in all the lactations were estimated for at least 535 young bulls using two marker data sets: bull genotypes only and both bulls and cows genotypes. The realized reliability (R 2 ) from linear regression analysis was used as an indicator of validation reliability. Using only genotyped bulls, R 2 was ranged from 0.41 to 0.46 and it was always higher than parent averages. The very similar R 2 were observed when genotyped cows were added. An application of ssGBLUP to a multiple-lactation random regression model is feasible and adding a limited number of genotyped cows has no significant effect on reliability of GEBV for genotyped bulls. © 2016 Japanese Society of Animal Science.

  20. Precision Index in the Multivariate Context

    Czech Academy of Sciences Publication Activity Database

    Šiman, Miroslav

    2014-01-01

    Roč. 43, č. 2 (2014), s. 377-387 ISSN 0361-0926 R&D Projects: GA MŠk(CZ) 1M06047 Institutional support: RVO:67985556 Keywords : data depth * multivariate quantile * process capability index * precision index * regression quantile Subject RIV: BA - General Mathematics Impact factor: 0.274, year: 2014 http://library.utia.cas.cz/separaty/2014/SI/siman-0425059.pdf