Freund, Rudolf J; Sa, Ping
2006-01-01
The book provides complete coverage of the classical methods of statistical analysis. It is designed to give students an understanding of the purpose of statistical analyses, to allow the student to determine, at least to some degree, the correct type of statistical analyses to be performed in a given situation, and have some appreciation of what constitutes good experimental design
Psoriasis regression analysis of MHC loci identifies shared genetic variants with vitiligo.
Kun-Ju Zhu
Full Text Available Psoriasis is a common inflammatory skin disease with genetic components of both immune system and the epidermis. PSOR1 locus (6q21 has been strongly associated with psoriasis; however, it is difficult to identify additional independent association due to strong linkage disequilibrium in the MHC region. We performed stepwise regression analyses of more than 3,000 SNPs in the MHC region genotyped using Human 610-Quad (Illumina in 1,139 cases with psoriasis and 1,132 controls of Han Chinese population to search for additional independent association. With four regression models obtained, two SNPs rs9468925 in HLA-C/HLA-B and rs2858881 in HLA-DQA2 were repeatedly selected in all models, suggesting that multiple loci outside PSOR1 locus were associated with psoriasis. More importantly we find that rs9468925 in HLA-C/HLA-B is associated with both psoriasis and vitiligo, providing first important evidence that two major skin diseases share a common genetic locus in the MHC, and a basis for elucidating the molecular mechanism of skin disorders.
Regression analysis by example
Chatterjee, Samprit; Hadi, Ali S
2012-01-01
.... The emphasis continues to be on exploratory data analysis rather than statistical theory. The coverage offers in-depth treatment of regression diagnostics, transformation, multicollinearity, logistic regression, and robust regression...
Regression analysis by example
Chatterjee, Samprit
2012-01-01
Praise for the Fourth Edition: ""This book is . . . an excellent source of examples for regression analysis. It has been and still is readily readable and understandable."" -Journal of the American Statistical Association Regression analysis is a conceptually simple method for investigating relationships among variables. Carrying out a successful application of regression analysis, however, requires a balance of theoretical results, empirical rules, and subjective judgment. Regression Analysis by Example, Fifth Edition has been expanded
Knox, Matthew C; Edye, Michael
2016-04-01
Surgical antibiotic prophylaxis is frequently reported in the literature to be suboptimal, a finding having both clinical and public health implications. This study aimed to calculate rates and patterns of adherence to guidelines at two sites and identify extrinsic contributing factors. A retrospective analysis was conducted over two 12-mo periods during 2013-2014 at the metropolitan Blacktown Hospital and regional Lismore Base Hospital, New South Wales, Australia. A group of 400 patients undergoing abdominal general surgery was selected via simple random sampling (n = 200 per site). Medical records were reviewed, and prophylactic antibiotic regimens were compared with the Australian guideline, Therapeutic Guidelines: Antibiotic (v. 14) with respect to drug choice, dosage, timing of administration, and duration of administration. The overall rate of adherence to the guidelines was 16.5% at Blacktown Hospital and 19.5% at Lismore Base Hospital. At each site, prophylaxis was administered to more than 95% of patients and was inappropriately withheld in 4%. Drug choice was the most frequent error type, specifically involving inappropriate omission of metronidazole and use of newer-generation cephalosporins. Errors in the timing of administration also were frequent, with prophylaxis typically occurring excessively early. Logistic regression identified emergency surgery as independently associated with prophylactic errors in both the Blacktown Hospital (p antibiotic prophylactic guidelines was poor at both the metropolitan and regional sites. Choice of antibiotic and timing of administration were identified as major error types. Consideration should be given to multidisciplinary involvement of anesthetists, implementation of focused interventions with an emphasis on emergency settings, and further research correlating antibiotic use with clinical significance.
Seber, George A F
2012-01-01
Concise, mathematically clear, and comprehensive treatment of the subject.* Expanded coverage of diagnostics and methods of model fitting.* Requires no specialized knowledge beyond a good grasp of matrix algebra and some acquaintance with straight-line regression and simple analysis of variance models.* More than 200 problems throughout the book plus outline solutions for the exercises.* This revision has been extensively class-tested.
Batson, Sarah; Sutton, Alex; Abrams, Keith
2016-01-01
Background Patients with atrial fibrillation are at a greater risk of stroke and therefore the main goal for treatment of patients with atrial fibrillation is to prevent stroke from occurring. There are a number of different stroke prevention treatments available to include warfarin and novel oral anticoagulants. Previous network meta-analyses of novel oral anticoagulants for stroke prevention in atrial fibrillation acknowledge the limitation of heterogeneity across the included trials but have not explored the impact of potentially important treatment modifying covariates. Objectives To explore potentially important treatment modifying covariates using network meta-regression analyses for stroke prevention in atrial fibrillation. Methods We performed a network meta-analysis for the outcome of ischaemic stroke and conducted an exploratory regression analysis considering potentially important treatment modifying covariates. These covariates included the proportion of patients with a previous stroke, proportion of males, mean age, the duration of study follow-up and the patients underlying risk of ischaemic stroke. Results None of the covariates explored impacted relative treatment effects relative to placebo. Notably, the exploration of ‘study follow-up’ as a covariate supported the assumption that difference in trial durations is unimportant in this indication despite the variation across trials in the network. Conclusion This study is limited by the quantity of data available. Further investigation is warranted, and, as justifying further trials may be difficult, it would be desirable to obtain individual patient level data (IPD) to facilitate an effort to relate treatment effects to IPD covariates in order to investigate heterogeneity. Observational data could also be examined to establish if there are potential trends elsewhere. The approach and methods presented have potentially wide applications within any indication as to highlight the potential benefit
Multiple linear regression analysis
Edwards, T. R.
1980-01-01
Program rapidly selects best-suited set of coefficients. User supplies only vectors of independent and dependent data and specifies confidence level required. Program uses stepwise statistical procedure for relating minimal set of variables to set of observations; final regression contains only most statistically significant coefficients. Program is written in FORTRAN IV for batch execution and has been implemented on NOVA 1200.
Using Regression Mixture Analysis in Educational Research
Cody S. Ding
2006-11-01
Full Text Available Conventional regression analysis is typically used in educational research. Usually such an analysis implicitly assumes that a common set of regression parameter estimates captures the population characteristics represented in the sample. In some situations, however, this implicit assumption may not be realistic, and the sample may contain several subpopulations such as high math achievers and low math achievers. In these cases, conventional regression models may provide biased estimates since the parameter estimates are constrained to be the same across subpopulations. This paper advocates the applications of regression mixture models, also known as latent class regression analysis, in educational research. Regression mixture analysis is more flexible than conventional regression analysis in that latent classes in the data can be identified and regression parameter estimates can vary within each latent class. An illustration of regression mixture analysis is provided based on a dataset of authentic data. The strengths and limitations of the regression mixture models are discussed in the context of educational research.
Kauhl, Boris; Pieper, Jonas; Schweikart, Jürgen; Keste, Andrea; Moskwyn, Marita
2017-02-16
Understanding which population groups in which locations are at higher risk for type 2 diabetes mellitus (T2DM) allows efficient and cost-effective interventions targeting these risk-populations in great need in specific locations. The goal of this study was to analyze the spatial distribution of T2DM and to identify the location-specific, population-based risk factors using global and local spatial regression models. To display the spatial heterogeneity of T2DM, bivariate kernel density estimation was applied. An ordinary least squares regression model (OLS) was applied to identify population-based risk factors of T2DM. A geographically weighted regression model (GWR) was then constructed to analyze the spatially varying association between the identified risk factors and T2DM. T2DM is especially concentrated in the east and outskirts of Berlin. The OLS model identified proportions of persons aged 80 and older, persons without migration background, long-term unemployment, households with children and a negative association with single-parenting households as socio-demographic risk groups. The results of the GWR model point out important local variations of the strength of association between the identified risk factors and T2DM. The risk factors for T2DM depend largely on the socio-demographic composition of the neighborhoods in Berlin and highlight that a one-size-fits-all approach is not appropriate for the prevention of T2DM. Future prevention strategies should be tailored to target location-specific risk-groups. © Georg Thieme Verlag KG Stuttgart · New York.
Using Logistic Regression to Identify Risk Factors Causing Rollover Collisions
Essam Dabbour
2012-12-01
Full Text Available Rollover collisions are among the most serious collisions that usually result in severe injuries or fatalities. In 2009, there were 8,732 fatal rollover collisions in the United States of America that resulted in the death of 9,833 persons. Those numbers represent approximately 28% and 29% of the total numbers of fatal collisions and fatalities, respectively. The main objective of this paper is to examine the impact of different risk factors that may contribute to this type of serious collisions to help develop countermeasures that limit them. To avoid the bias that may be caused by interactions among different drivers, this analysis focuses on rollover related to single-vehicle collisions so that the behavior of the driver of the collided vehicle can be analyzed more effectively. Logistic regression technique is utilized to analyze single-vehicle rollover collisions that occurred on state and interstate highways in the states of Ohio and Washington in 2009. The results obtained from this analysis have the potential to help decision makers identify different strategies to limit the severity of this type of collisions.
Identifying Interacting Genetic Variations by Fish-Swarm Logic Regression
Yang, Aiyuan; Yan, Chunxia; Zhu, Feng; Zhao, Zhongmeng; Cao, Zhi
2013-01-01
Understanding associations between genotypes and complex traits is a fundamental problem in human genetics. A major open problem in mapping phenotypes is that of identifying a set of interacting genetic variants, which might contribute to complex traits. Logic regression (LR) is a powerful multivariant association tool. Several LR-based approaches have been successfully applied to different datasets. However, these approaches are not adequate with regard to accuracy and efficiency. In this paper, we propose a new LR-based approach, called fish-swarm logic regression (FSLR), which improves the logic regression process by incorporating swarm optimization. In our approach, a school of fish agents are conducted in parallel. Each fish agent holds a regression model, while the school searches for better models through various preset behaviors. A swarm algorithm improves the accuracy and the efficiency by speeding up the convergence and preventing it from dropping into local optimums. We apply our approach on a real screening dataset and a series of simulation scenarios. Compared to three existing LR-based approaches, our approach outperforms them by having lower type I and type II error rates, being able to identify more preset causal sites, and performing at faster speeds. PMID:23984382
Identifying predictors of physics item difficulty: A linear regression approach
Vanes Mesic
2011-06-01
Full Text Available Large-scale assessments of student achievement in physics are often approached with an intention to discriminate students based on the attained level of their physics competencies. Therefore, for purposes of test design, it is important that items display an acceptable discriminatory behavior. To that end, it is recommended to avoid extraordinary difficult and very easy items. Knowing the factors that influence physics item difficulty makes it possible to model the item difficulty even before the first pilot study is conducted. Thus, by identifying predictors of physics item difficulty, we can improve the test-design process. Furthermore, we get additional qualitative feedback regarding the basic aspects of student cognitive achievement in physics that are directly responsible for the obtained, quantitative test results. In this study, we conducted a secondary analysis of data that came from two large-scale assessments of student physics achievement at the end of compulsory education in Bosnia and Herzegovina. Foremost, we explored the concept of “physics competence” and performed a content analysis of 123 physics items that were included within the above-mentioned assessments. Thereafter, an item database was created. Items were described by variables which reflect some basic cognitive aspects of physics competence. For each of the assessments, Rasch item difficulties were calculated in separate analyses. In order to make the item difficulties from different assessments comparable, a virtual test equating procedure had to be implemented. Finally, a regression model of physics item difficulty was created. It has been shown that 61.2% of item difficulty variance can be explained by factors which reflect the automaticity, complexity, and modality of the knowledge structure that is relevant for generating the most probable correct solution, as well as by the divergence of required thinking and interference effects between intuitive and formal
Identifying predictors of physics item difficulty: A linear regression approach
Mesic, Vanes; Muratovic, Hasnija
2011-06-01
Large-scale assessments of student achievement in physics are often approached with an intention to discriminate students based on the attained level of their physics competencies. Therefore, for purposes of test design, it is important that items display an acceptable discriminatory behavior. To that end, it is recommended to avoid extraordinary difficult and very easy items. Knowing the factors that influence physics item difficulty makes it possible to model the item difficulty even before the first pilot study is conducted. Thus, by identifying predictors of physics item difficulty, we can improve the test-design process. Furthermore, we get additional qualitative feedback regarding the basic aspects of student cognitive achievement in physics that are directly responsible for the obtained, quantitative test results. In this study, we conducted a secondary analysis of data that came from two large-scale assessments of student physics achievement at the end of compulsory education in Bosnia and Herzegovina. Foremost, we explored the concept of “physics competence” and performed a content analysis of 123 physics items that were included within the above-mentioned assessments. Thereafter, an item database was created. Items were described by variables which reflect some basic cognitive aspects of physics competence. For each of the assessments, Rasch item difficulties were calculated in separate analyses. In order to make the item difficulties from different assessments comparable, a virtual test equating procedure had to be implemented. Finally, a regression model of physics item difficulty was created. It has been shown that 61.2% of item difficulty variance can be explained by factors which reflect the automaticity, complexity, and modality of the knowledge structure that is relevant for generating the most probable correct solution, as well as by the divergence of required thinking and interference effects between intuitive and formal physics knowledge
Common pitfalls in statistical analysis: Logistic regression.
Ranganathan, Priya; Pramesh, C S; Aggarwal, Rakesh
2017-01-01
Logistic regression analysis is a statistical technique to evaluate the relationship between various predictor variables (either categorical or continuous) and an outcome which is binary (dichotomous). In this article, we discuss logistic regression analysis and the limitations of this technique.
Principal component regression analysis with SPSS.
Liu, R X; Kuang, J; Gong, Q; Hou, X L
2003-06-01
The paper introduces all indices of multicollinearity diagnoses, the basic principle of principal component regression and determination of 'best' equation method. The paper uses an example to describe how to do principal component regression analysis with SPSS 10.0: including all calculating processes of the principal component regression and all operations of linear regression, factor analysis, descriptives, compute variable and bivariate correlations procedures in SPSS 10.0. The principal component regression analysis can be used to overcome disturbance of the multicollinearity. The simplified, speeded up and accurate statistical effect is reached through the principal component regression analysis with SPSS.
Regression Analysis by Example. 5th Edition
Chatterjee, Samprit; Hadi, Ali S.
2012-01-01
Regression analysis is a conceptually simple method for investigating relationships among variables. Carrying out a successful application of regression analysis, however, requires a balance of theoretical results, empirical rules, and subjective judgment. "Regression Analysis by Example, Fifth Edition" has been expanded and thoroughly…
Using Mixture Regression to Identify Varying Effects: A Demonstration with Paternal Incarceration
Dyer, W. Justin; Pleck, Joseph; McBride, Brent
2012-01-01
The most widely used techniques for identifying the varying effects of stressors involve testing moderator effects via interaction terms in regression or multiple-group analysis in structural equation modeling. The authors present mixture regression as an alternative approach. In contrast to more widely used approaches, mixture regression…
Paula Roberta Mendes
2004-04-01
Full Text Available No presente estudo, ajustou-se um modelo de regressão logística para prever a probabilidade de óbito de cães acometidos por gastroenterite hemorrágica. O modelo Logístico é recomendado para variáveis-resposta dicotômicas em estudo de Coorte. Registraram-se 176 animais censitariamente atendidos com gastroenterite hemorrágica em quatro clínicas veterinárias da cidade de Lavras, sul de Minas Gerais, entre os anos de 1992 e 1999. Após terem sido selecionadas por meio do teste de de Pearson ou teste exato de Fisher, ajustou-se o modelo considerando-se as variáveis sexo, idade, diárias de internação e número de atendimentos. A estimação dos parâmetros foi feita pelo método da máxima verossimilhança. Conclui-se que quando os cães acometidos por gastroenterite hemorrágica são atendidos apenas uma vez, aqueles com idade superior a 6 meses possuem 15,45 vezes mais chances de morrerem (PThis paper presents a study of how to fit a logistic regression model to predict the death probability of dogs with hemorrhagic gastroenteritis. A logistic model is recommended to treat dichotomic variables in Coorte study. Using a census procedure from 1992 to 1999 four veterinary clinic in Lavras, MG, registered 176 infected animals. The variables of the model have been chosen to be sex, age internment days rates and number of clinical treatments by the or Fisher’s exact test. The parameters were estimated by the maximum likelihood method. The results showed that if the infected dogs were clinically treated only once then the animals older than six months had their mortality chances 15.45 times (P<0.05 larger than those younger than six months. If the infected animals younger than six months were clinically treated only once then their mortality chances were 20.251 (P<0.05 higher than if they had received two to seven medical treatments.
Functional linear regression via canonical analysis
He, Guozhong; Wang, Jane-Ling; Yang, Wenjing; 10.3150/09-BEJ228
2011-01-01
We study regression models for the situation where both dependent and independent variables are square-integrable stochastic processes. Questions concerning the definition and existence of the corresponding functional linear regression models and some basic properties are explored for this situation. We derive a representation of the regression parameter function in terms of the canonical components of the processes involved. This representation establishes a connection between functional regression and functional canonical analysis and suggests alternative approaches for the implementation of functional linear regression analysis. A specific procedure for the estimation of the regression parameter function using canonical expansions is proposed and compared with an established functional principal component regression approach. As an example of an application, we present an analysis of mortality data for cohorts of medflies, obtained in experimental studies of aging and longevity.
Applied regression analysis a research tool
Pantula, Sastry; Dickey, David
1998-01-01
Least squares estimation, when used appropriately, is a powerful research tool. A deeper understanding of the regression concepts is essential for achieving optimal benefits from a least squares analysis. This book builds on the fundamentals of statistical methods and provides appropriate concepts that will allow a scientist to use least squares as an effective research tool. Applied Regression Analysis is aimed at the scientist who wishes to gain a working knowledge of regression analysis. The basic purpose of this book is to develop an understanding of least squares and related statistical methods without becoming excessively mathematical. It is the outgrowth of more than 30 years of consulting experience with scientists and many years of teaching an applied regression course to graduate students. Applied Regression Analysis serves as an excellent text for a service course on regression for non-statisticians and as a reference for researchers. It also provides a bridge between a two-semester introduction to...
Zewude, Bereket Tessema; Ashine, Kidus Meskele
2016-01-01
An attempt has been made to assess and identify the major variables that influence student academic achievement at college of natural and computational science of Wolaita Sodo University in Ethiopia. Study time, peer influence, securing first choice of department, arranging study time outside class, amount of money received from family, good life…
Regression Analysis and the Sociological Imagination
De Maio, Fernando
2014-01-01
Regression analysis is an important aspect of most introductory statistics courses in sociology but is often presented in contexts divorced from the central concerns that bring students into the discipline. Consequently, we present five lesson ideas that emerge from a regression analysis of income inequality and mortality in the USA and Canada.
Regression Analysis and the Sociological Imagination
De Maio, Fernando
2014-01-01
Regression analysis is an important aspect of most introductory statistics courses in sociology but is often presented in contexts divorced from the central concerns that bring students into the discipline. Consequently, we present five lesson ideas that emerge from a regression analysis of income inequality and mortality in the USA and Canada.
Heteroscedastic regression analysis method for mixed data
FU Hui-min; YUE Xiao-rui
2011-01-01
The heteroscedastic regression model was established and the heteroscedastic regression analysis method was presented for mixed data composed of complete data, type- I censored data and type- Ⅱ censored data from the location-scale distribution. The best unbiased estimations of regression coefficients, as well as the confidence limits of the location parameter and scale parameter were given. Furthermore, the point estimations and confidence limits of percentiles were obtained. Thus, the traditional multiple regression analysis method which is only suitable to the complete data from normal distribution can be extended to the cases of heteroscedastic mixed data and the location-scale distribution. So the presented method has a broad range of promising applications.
Relative risk regression analysis of epidemiologic data.
Prentice, R L
1985-11-01
Relative risk regression methods are described. These methods provide a unified approach to a range of data analysis problems in environmental risk assessment and in the study of disease risk factors more generally. Relative risk regression methods are most readily viewed as an outgrowth of Cox's regression and life model. They can also be viewed as a regression generalization of more classical epidemiologic procedures, such as that due to Mantel and Haenszel. In the context of an epidemiologic cohort study, relative risk regression methods extend conventional survival data methods and binary response (e.g., logistic) regression models by taking explicit account of the time to disease occurrence while allowing arbitrary baseline disease rates, general censorship, and time-varying risk factors. This latter feature is particularly relevant to many environmental risk assessment problems wherein one wishes to relate disease rates at a particular point in time to aspects of a preceding risk factor history. Relative risk regression methods also adapt readily to time-matched case-control studies and to certain less standard designs. The uses of relative risk regression methods are illustrated and the state of development of these procedures is discussed. It is argued that asymptotic partial likelihood estimation techniques are now well developed in the important special case in which the disease rates of interest have interpretations as counting process intensity functions. Estimation of relative risks processes corresponding to disease rates falling outside this class has, however, received limited attention. The general area of relative risk regression model criticism has, as yet, not been thoroughly studied, though a number of statistical groups are studying such features as tests of fit, residuals, diagnostics and graphical procedures. Most such studies have been restricted to exponential form relative risks as have simulation studies of relative risk estimation
Regression analysis using dependent Polya trees.
Schörgendorfer, Angela; Branscum, Adam J
2013-11-30
Many commonly used models for linear regression analysis force overly simplistic shape and scale constraints on the residual structure of data. We propose a semiparametric Bayesian model for regression analysis that produces data-driven inference by using a new type of dependent Polya tree prior to model arbitrary residual distributions that are allowed to evolve across increasing levels of an ordinal covariate (e.g., time, in repeated measurement studies). By modeling residual distributions at consecutive covariate levels or time points using separate, but dependent Polya tree priors, distributional information is pooled while allowing for broad pliability to accommodate many types of changing residual distributions. We can use the proposed dependent residual structure in a wide range of regression settings, including fixed-effects and mixed-effects linear and nonlinear models for cross-sectional, prospective, and repeated measurement data. A simulation study illustrates the flexibility of our novel semiparametric regression model to accurately capture evolving residual distributions. In an application to immune development data on immunoglobulin G antibodies in children, our new model outperforms several contemporary semiparametric regression models based on a predictive model selection criterion. Copyright © 2013 John Wiley & Sons, Ltd.
Robust Mediation Analysis Based on Median Regression
Yuan, Ying; MacKinnon, David P.
2014-01-01
Mediation analysis has many applications in psychology and the social sciences. The most prevalent methods typically assume that the error distribution is normal and homoscedastic. However, this assumption may rarely be met in practice, which can affect the validity of the mediation analysis. To address this problem, we propose robust mediation analysis based on median regression. Our approach is robust to various departures from the assumption of homoscedasticity and normality, including heavy-tailed, skewed, contaminated, and heteroscedastic distributions. Simulation studies show that under these circumstances, the proposed method is more efficient and powerful than standard mediation analysis. We further extend the proposed robust method to multilevel mediation analysis, and demonstrate through simulation studies that the new approach outperforms the standard multilevel mediation analysis. We illustrate the proposed method using data from a program designed to increase reemployment and enhance mental health of job seekers. PMID:24079925
Functional data analysis of generalized regression quantiles
Guo, Mengmeng
2013-11-05
Generalized regression quantiles, including the conditional quantiles and expectiles as special cases, are useful alternatives to the conditional means for characterizing a conditional distribution, especially when the interest lies in the tails. We develop a functional data analysis approach to jointly estimate a family of generalized regression quantiles. Our approach assumes that the generalized regression quantiles share some common features that can be summarized by a small number of principal component functions. The principal component functions are modeled as splines and are estimated by minimizing a penalized asymmetric loss measure. An iterative least asymmetrically weighted squares algorithm is developed for computation. While separate estimation of individual generalized regression quantiles usually suffers from large variability due to lack of sufficient data, by borrowing strength across data sets, our joint estimation approach significantly improves the estimation efficiency, which is demonstrated in a simulation study. The proposed method is applied to data from 159 weather stations in China to obtain the generalized quantile curves of the volatility of the temperature at these stations. © 2013 Springer Science+Business Media New York.
3D Regression Heat Map Analysis of Population Study Data.
Klemm, Paul; Lawonn, Kai; Glaßer, Sylvia; Niemann, Uli; Hegenscheid, Katrin; Völzke, Henry; Preim, Bernhard
2016-01-01
Epidemiological studies comprise heterogeneous data about a subject group to define disease-specific risk factors. These data contain information (features) about a subject's lifestyle, medical status as well as medical image data. Statistical regression analysis is used to evaluate these features and to identify feature combinations indicating a disease (the target feature). We propose an analysis approach of epidemiological data sets by incorporating all features in an exhaustive regression-based analysis. This approach combines all independent features w.r.t. a target feature. It provides a visualization that reveals insights into the data by highlighting relationships. The 3D Regression Heat Map, a novel 3D visual encoding, acts as an overview of the whole data set. It shows all combinations of two to three independent features with a specific target disease. Slicing through the 3D Regression Heat Map allows for the detailed analysis of the underlying relationships. Expert knowledge about disease-specific hypotheses can be included into the analysis by adjusting the regression model formulas. Furthermore, the influences of features can be assessed using a difference view comparing different calculation results. We applied our 3D Regression Heat Map method to a hepatic steatosis data set to reproduce results from a data mining-driven analysis. A qualitative analysis was conducted on a breast density data set. We were able to derive new hypotheses about relations between breast density and breast lesions with breast cancer. With the 3D Regression Heat Map, we present a visual overview of epidemiological data that allows for the first time an interactive regression-based analysis of large feature sets with respect to a disease.
Credit Scoring Problem Based on Regression Analysis
Khassawneh, Bashar Suhil Jad Allah
2014-01-01
ABSTRACT: This thesis provides an explanatory introduction to the regression models of data mining and contains basic definitions of key terms in the linear, multiple and logistic regression models. Meanwhile, the aim of this study is to illustrate fitting models for the credit scoring problem using simple linear, multiple linear and logistic regression models and also to analyze the found model functions by statistical tools. Keywords: Data mining, linear regression, logistic regression....
Remaining Phosphorus Estimate Through Multiple Regression Analysis
M. E. ALVES; A. LAVORENTI
2006-01-01
The remaining phosphorus (Prem), P concentration that remains in solution after shaking soil with 0.01 mol L-1 CaCl2 containing 60 μg mL-1 P, is a very useful index for studies related to the chemistry of variable charge soils. Although the Prem determination is a simple procedure, the possibility of estimating accurate values of this index from easily and/or routinely determined soil properties can be very useful for practical purposes. The present research evaluated the Premestimation through multiple regression analysis in which routinely determined soil chemical data, soil clay content and soil pH measured in 1 mol L-1 NaF (pHNaF) figured as Prem predictor variables. The Prem can be estimated with acceptable accuracy using the above-mentioned approach, and PHNaF not only substitutes for clay content as a predictor variable but also confers more accuracy to the Prem estimates.
Common pitfalls in statistical analysis: Linear regression analysis.
Aggarwal, Rakesh; Ranganathan, Priya
2017-01-01
In a previous article in this series, we explained correlation analysis which describes the strength of relationship between two continuous variables. In this article, we deal with linear regression analysis which predicts the value of one continuous variable from another. We also discuss the assumptions and pitfalls associated with this analysis.
Common pitfalls in statistical analysis: Linear regression analysis
Rakesh Aggarwal
2017-01-01
Full Text Available In a previous article in this series, we explained correlation analysis which describes the strength of relationship between two continuous variables. In this article, we deal with linear regression analysis which predicts the value of one continuous variable from another. We also discuss the assumptions and pitfalls associated with this analysis.
Sliced Inverse Regression for Time Series Analysis
Chen, Li-Sue
1995-11-01
In this thesis, general nonlinear models for time series data are considered. A basic form is x _{t} = f(beta_sp{1} {T}X_{t-1},beta_sp {2}{T}X_{t-1},... , beta_sp{k}{T}X_ {t-1},varepsilon_{t}), where x_{t} is an observed time series data, X_{t } is the first d time lag vector, (x _{t},x_{t-1},... ,x _{t-d-1}), f is an unknown function, beta_{i}'s are unknown vectors, varepsilon_{t }'s are independent distributed. Special cases include AR and TAR models. We investigate the feasibility applying SIR/PHD (Li 1990, 1991) (the sliced inverse regression and principal Hessian methods) in estimating beta _{i}'s. PCA (Principal component analysis) is brought in to check one critical condition for SIR/PHD. Through simulation and a study on 3 well -known data sets of Canadian lynx, U.S. unemployment rate and sunspot numbers, we demonstrate how SIR/PHD can effectively retrieve the interesting low-dimension structures for time series data.
MULTIPLE REGRESSION ANALYSIS OF MAIN ECONOMIC INDICATORS IN TOURISM
Erika KULCSÁR
2009-12-01
Full Text Available This paper analysis the measure between GDP dependent variable in the sector of hotels and restaurants and the following independent variables: overnight stays in the establishments of touristic reception, arrivals in the establishments of touristic reception and investments in hotels and restaurants sector in the period of analysis 1995-2007. With the multiple regression analysis I found that investments and tourist arrivals are significant predictors for the GDP dependent variable. Based on these results, I identified those components of the marketing mix, which in my opinion require investment, which could contribute to the positive development of tourist arrivals in the establishments of touristic reception.
MULTIPLE REGRESSION ANALYSIS OF MAIN ECONOMIC INDICATORS IN TOURISM
Erika KULCSÁR
2009-12-01
Full Text Available This paper analysis the measure between GDP dependent variable in the sector of hotels and restaurants and the following independent variables: overnight stays in the establishments of touristic reception, arrivals in the establishments of touristic reception and investments in hotels and restaurants sector in the period of analysis 1995-2007. With the multiple regression analysis I found that investments and tourist arrivals are significant predictors for the GDP dependent variable. Based on these results, I identified those components of the marketing mix, which in my opinion require investment, which could contribute to the positive development of tourist arrivals in the establishments of touristic reception.
Using Dominance Analysis to Determine Predictor Importance in Logistic Regression
Azen, Razia; Traxel, Nicole
2009-01-01
This article proposes an extension of dominance analysis that allows researchers to determine the relative importance of predictors in logistic regression models. Criteria for choosing logistic regression R[superscript 2] analogues were determined and measures were selected that can be used to perform dominance analysis in logistic regression. A…
Using Dominance Analysis to Determine Predictor Importance in Logistic Regression
Azen, Razia; Traxel, Nicole
2009-01-01
This article proposes an extension of dominance analysis that allows researchers to determine the relative importance of predictors in logistic regression models. Criteria for choosing logistic regression R[superscript 2] analogues were determined and measures were selected that can be used to perform dominance analysis in logistic regression. A…
Forecasting urban water demand: A meta-regression analysis.
Sebri, Maamar
2016-12-01
Water managers and planners require accurate water demand forecasts over the short-, medium- and long-term for many purposes. These range from assessing water supply needs over spatial and temporal patterns to optimizing future investments and planning future allocations across competing sectors. This study surveys the empirical literature on the urban water demand forecasting using the meta-analytical approach. Specifically, using more than 600 estimates, a meta-regression analysis is conducted to identify explanations of cross-studies variation in accuracy of urban water demand forecasting. Our study finds that accuracy depends significantly on study characteristics, including demand periodicity, modeling method, forecasting horizon, model specification and sample size. The meta-regression results remain robust to different estimators employed as well as to a series of sensitivity checks performed. The importance of these findings lies in the conclusions and implications drawn out for regulators and policymakers and for academics alike. Copyright © 2016. Published by Elsevier Ltd.
Quantile regression and Bayesian cluster detection to identify radon prone areas.
Sarra, Annalina; Fontanella, Lara; Valentini, Pasquale; Palermi, Sergio
2016-11-01
Albeit the dominant source of radon in indoor environments is the geology of the territory, many studies have demonstrated that indoor radon concentrations also depend on dwelling-specific characteristics. Following a stepwise analysis, in this study we propose a combined approach to delineate radon prone areas. We first investigate the impact of various building covariates on indoor radon concentrations. To achieve a more complete picture of this association, we exploit the flexible formulation of a Bayesian spatial quantile regression, which is also equipped with parameters that controls the spatial dependence across data. The quantitative knowledge of the influence of each significant building-specific factor on the measured radon levels is employed to predict the radon concentrations that would have been found if the sampled buildings had possessed standard characteristics. Those normalised radon measures should reflect the geogenic radon potential of the underlying ground, which is a quantity directly related to the geological environment. The second stage of the analysis is aimed at identifying radon prone areas, and to this end, we adopt a Bayesian model for spatial cluster detection using as reference unit the building with standard characteristics. The case study is based on a data set of more than 2000 indoor radon measures, available for the Abruzzo region (Central Italy) and collected by the Agency of Environmental Protection of Abruzzo, during several indoor radon monitoring surveys. Copyright © 2016 Elsevier Ltd. All rights reserved.
Stability Analysis for Regularized Least Squares Regression
Rudin, Cynthia
2005-01-01
We discuss stability for a class of learning algorithms with respect to noisy labels. The algorithms we consider are for regression, and they involve the minimization of regularized risk functionals, such as L(f) := 1/N sum_i (f(x_i)-y_i)^2+ lambda ||f||_H^2. We shall call the algorithm `stable' if, when y_i is a noisy version of f*(x_i) for some function f* in H, the output of the algorithm converges to f* as the regularization term and noise simultaneously vanish. We consider two flavors of...
Poisson Regression Analysis of Illness and Injury Surveillance Data
Frome E.L., Watkins J.P., Ellis E.D.
2012-12-12
The Department of Energy (DOE) uses illness and injury surveillance to monitor morbidity and assess the overall health of the work force. Data collected from each participating site include health events and a roster file with demographic information. The source data files are maintained in a relational data base, and are used to obtain stratified tables of health event counts and person time at risk that serve as the starting point for Poisson regression analysis. The explanatory variables that define these tables are age, gender, occupational group, and time. Typical response variables of interest are the number of absences due to illness or injury, i.e., the response variable is a count. Poisson regression methods are used to describe the effect of the explanatory variables on the health event rates using a log-linear main effects model. Results of fitting the main effects model are summarized in a tabular and graphical form and interpretation of model parameters is provided. An analysis of deviance table is used to evaluate the importance of each of the explanatory variables on the event rate of interest and to determine if interaction terms should be considered in the analysis. Although Poisson regression methods are widely used in the analysis of count data, there are situations in which over-dispersion occurs. This could be due to lack-of-fit of the regression model, extra-Poisson variation, or both. A score test statistic and regression diagnostics are used to identify over-dispersion. A quasi-likelihood method of moments procedure is used to evaluate and adjust for extra-Poisson variation when necessary. Two examples are presented using respiratory disease absence rates at two DOE sites to illustrate the methods and interpretation of the results. In the first example the Poisson main effects model is adequate. In the second example the score test indicates considerable over-dispersion and a more detailed analysis attributes the over-dispersion to extra
Takagi, Daisuke; Ikeda, Ken'ichi; Kawachi, Ichiro
2012-11-01
Crime is an important determinant of public health outcomes, including quality of life, mental well-being, and health behavior. A body of research has documented the association between community social capital and crime victimization. The association between social capital and crime victimization has been examined at multiple levels of spatial aggregation, ranging from entire countries, to states, metropolitan areas, counties, and neighborhoods. In multilevel analysis, the spatial boundaries at level 2 are most often drawn from administrative boundaries (e.g., Census tracts in the U.S.). One problem with adopting administrative definitions of neighborhoods is that it ignores spatial spillover. We conducted a study of social capital and crime victimization in one ward of Tokyo city, using a spatial Durbin model with an inverse-distance weighting matrix that assigned each respondent a unique level of "exposure" to social capital based on all other residents' perceptions. The study is based on a postal questionnaire sent to 20-69 years old residents of Arakawa Ward, Tokyo. The response rate was 43.7%. We examined the contextual influence of generalized trust, perceptions of reciprocity, two types of social network variables, as well as two principal components of social capital (constructed from the above four variables). Our outcome measure was self-reported crime victimization in the last five years. In the spatial Durbin model, we found that neighborhood generalized trust, reciprocity, supportive networks and two principal components of social capital were each inversely associated with crime victimization. By contrast, a multilevel regression performed with the same data (using administrative neighborhood boundaries) found generally null associations between neighborhood social capital and crime. Spatial regression methods may be more appropriate for investigating the contextual influence of social capital in homogeneous cultural settings such as Japan.
Paap, M.C.S.; Eggen, T.J.H.M.; Veldkamp, B.P.
2012-01-01
Standardized tests often group items around a common stimulus. Such groupings of items are called testlets. The potential dependency among items within a testlet is generally ignored in practice, even though a basic assumption of item response theory (IRT) is that individual items are independent of one another. A technique called tree-based regression (TBR) was applied to identify key features of stimuli that could properly predict the dependence structure of testlet data. Knowledge about th...
Hecht, Jeffrey B.
The analysis of regression residuals and detection of outliers are discussed, with emphasis on determining how deviant an individual data point must be to be considered an outlier and the impact that multiple suspected outlier data points have on the process of outlier determination and treatment. Only bivariate (one dependent and one independent)…
Nishanee Rampersad
2017-01-01
Full Text Available Background: Assessment of intraocular pressure (IOP is an important test in glaucoma. In addition, anterior segment variables may be useful in screening for glaucoma risk. Studies have investigated the associations between IOP and anterior segment variables using traditional statistical methods. The classification and regression tree (CART method provides another dimension to detect important variables in a relationship automatically.Aim: To identify the critical factors that influence IOP using a regression tree.Methods: A quantitative cross-sectional research design was used. Anterior segment variables were measured in 700 participants using the iVue100 optical coherence tomographer, Oculus Keratograph and Nidek US-500 ultrasonographer. A Goldmann applanation tonometer was used to measure IOP. Data from only the right eyes were analysed because of high levels of interocular symmetry. A regression tree model was generated with the CART method and Pearson’s correlation coefficients were used to assess the relationships between the ocular variables.Results: The mean IOP for the entire sample was 14.63 mmHg ± 2.40 mmHg. The CART method selected three anterior segment variables in the regression tree model. Central corneal thickness was the most important variable with a cut-off value of 527 µm. The other important variables included average paracentral corneal thickness and axial anterior chamber depth. Corneal thickness measurements increased towards the periphery and were significantly correlated with IOP (r ≥ 0.50, p ≤ 0.001.Conclusion: The CART method identified the anterior segment variables that influenced IOP. Understanding the relationship between IOP and anterior segment variables may help to clinically identify patients with ocular risk factors associated with elevated IOPs.
Regression Commonality Analysis: A Technique for Quantitative Theory Building
Nimon, Kim; Reio, Thomas G., Jr.
2011-01-01
When it comes to multiple linear regression analysis (MLR), it is common for social and behavioral science researchers to rely predominately on beta weights when evaluating how predictors contribute to a regression model. Presenting an underutilized statistical technique, this article describes how organizational researchers can use commonality…
General Nature of Multicollinearity in Multiple Regression Analysis.
Liu, Richard
1981-01-01
Discusses multiple regression, a very popular statistical technique in the field of education. One of the basic assumptions in regression analysis requires that independent variables in the equation should not be highly correlated. The problem of multicollinearity and some of the solutions to it are discussed. (Author)
Multivariate Regression Analysis of Gravitational Waves from Rotating Core Collapse
Engels, William J; Ott, Christian D
2014-01-01
We present a new multivariate regression model for analysis and parameter estimation of gravitational waves observed from well but not perfectly modeled sources such as core-collapse supernovae. Our approach is based on a principal component decomposition of simulated waveform catalogs. Instead of reconstructing waveforms by direct linear combination of physically meaningless principal components, we solve via least squares for the relationship that encodes the connection between chosen physical parameters and the principal component basis. Although our approach is linear, the waveforms' parameter dependence may be non-linear. For the case of gravitational waves from rotating core collapse, we show, using statistical hypothesis testing, that our method is capable of identifying the most important physical parameters that govern waveform morphology in the presence of simulated detector noise. We also demonstrate our method's ability to predict waveforms from a principal component basis given a set of physical ...
Identifying of risks in pricing using a regression model of demand on price dependence
O.I. Yashkina
2016-09-01
Full Text Available The aim of the article. The main purpose of the article is to describe scientific and methodological approaches of determining the price elasticity of demand as a regression model based on the price and risk assessment of price variations on the received model. The results of the analysis. The study is based on the assumption that the index of price elasticity of demand on high-tech innovation is not constant as it is commonly understood in the classical sense. On the stage of commodity market release and subsequent sales growth, the index of price elasticity of demand may vary within certain limits. Index value and thereafter market response are closely related to the current price. Achieving the stated purpose of the article is possible when having factual information about prices and corresponding volumes of sales of new high-tech products for a short period of time, on the basis of which types of demand and prices interrelation are modeled. Risk assessment of pricing and profit optimization by the regression of demand depending on price consists of three stages: a obtaining of a regression model of the demand on the price; b obtaining of function of demand price elasticity and risk assessment of pricing depending on behavior of the function; c determination of the price of company to receive a maximum operating profit based on the specific model of price to demand function. To receive the regression model of dependence of demand on price it is recommended to use specific reference models. The article includes linear, hyperbolic and parabolic models. The regression dependence of price elasticity of demand on price for each of the reference models of demand is obtained on the basis of the function elasticity concept in mathematical analysis. The concept of «function of price elasticity of demand» expresses this dependence. For the received functions of price elasticity of demand, the article provides intervals with the highest and lowest
Predicting Nigeria budget allocation using regression analysis: A ...
Predicting Nigeria budget allocation using regression analysis: A data mining approach. ... Open Access DOWNLOAD FULL TEXT ... Budget is used by the Government as a guiding tool for planning and management of its resources to aid in ...
An Original Stepwise Multilevel Logistic Regression Analysis of Discriminatory Accuracy
Merlo, Juan; Wagner, Philippe; Ghith, Nermin
2016-01-01
BACKGROUND AND AIM: Many multilevel logistic regression analyses of "neighbourhood and health" focus on interpreting measures of associations (e.g., odds ratio, OR). In contrast, multilevel analysis of variance is rarely considered. We propose an original stepwise analytical approach that disting......BACKGROUND AND AIM: Many multilevel logistic regression analyses of "neighbourhood and health" focus on interpreting measures of associations (e.g., odds ratio, OR). In contrast, multilevel analysis of variance is rarely considered. We propose an original stepwise analytical approach...
Least Squares Adjustment: Linear and Nonlinear Weighted Regression Analysis
Nielsen, Allan Aasbjerg
2007-01-01
This note primarily describes the mathematics of least squares regression analysis as it is often used in geodesy including land surveying and satellite positioning applications. In these fields regression is often termed adjustment. The note also contains a couple of typical land surveying...... and satellite positioning application examples. In these application areas we are typically interested in the parameters in the model typically 2- or 3-D positions and not in predictive modelling which is often the main concern in other regression analysis applications. Adjustment is often used to obtain...
Ahn, Kuk-Hyun; Palmer, Richard
2016-09-01
Despite wide use of regression-based regional flood frequency analysis (RFFA) methods, the majority are based on either ordinary least squares (OLS) or generalized least squares (GLS). This paper proposes 'spatial proximity' based RFFA methods using the spatial lagged model (SLM) and spatial error model (SEM). The proposed methods are represented by two frameworks: the quantile regression technique (QRT) and parameter regression technique (PRT). The QRT develops prediction equations for flooding quantiles in average recurrence intervals (ARIs) of 2, 5, 10, 20, and 100 years whereas the PRT provides prediction of three parameters for the selected distribution. The proposed methods are tested using data incorporating 30 basin characteristics from 237 basins in Northeastern United States. Results show that generalized extreme value (GEV) distribution properly represents flood frequencies in the study gages. Also, basin area, stream network, and precipitation seasonality are found to be the most effective explanatory variables in prediction modeling by the QRT and PRT. 'Spatial proximity' based RFFA methods provide reliable flood quantile estimates compared to simpler methods. Compared to the QRT, the PRT may be recommended due to its accuracy and computational simplicity. The results presented in this paper may serve as one possible guidepost for hydrologists interested in flood analysis at ungaged sites.
Anke Hüls
2017-05-01
Full Text Available Antimicrobial resistance in livestock is a matter of general concern. To develop hygiene measures and methods for resistance prevention and control, epidemiological studies on a population level are needed to detect factors associated with antimicrobial resistance in livestock holdings. In general, regression models are used to describe these relationships between environmental factors and resistance outcome. Besides the study design, the correlation structures of the different outcomes of antibiotic resistance and structural zero measurements on the resistance outcome as well as on the exposure side are challenges for the epidemiological model building process. The use of appropriate regression models that acknowledge these complexities is essential to assure valid epidemiological interpretations. The aims of this paper are (i to explain the model building process comparing several competing models for count data (negative binomial model, quasi-Poisson model, zero-inflated model, and hurdle model and (ii to compare these models using data from a cross-sectional study on antibiotic resistance in animal husbandry. These goals are essential to evaluate which model is most suitable to identify potential prevention measures. The dataset used as an example in our analyses was generated initially to study the prevalence and associated factors for the appearance of cefotaxime-resistant Escherichia coli in 48 German fattening pig farms. For each farm, the outcome was the count of samples with resistant bacteria. There was almost no overdispersion and only moderate evidence of excess zeros in the data. Our analyses show that it is essential to evaluate regression models in studies analyzing the relationship between environmental factors and antibiotic resistances in livestock. After model comparison based on evaluation of model predictions, Akaike information criterion, and Pearson residuals, here the hurdle model was judged to be the most appropriate
Research and analyze of physical health using multiple regression analysis
T. S. Kyi
2014-01-01
Full Text Available This paper represents the research which is trying to create a mathematical model of the "healthy people" using the method of regression analysis. The factors are the physical parameters of the person (such as heart rate, lung capacity, blood pressure, breath holding, weight height coefficient, flexibility of the spine, muscles of the shoulder belt, abdominal muscles, squatting, etc.., and the response variable is an indicator of physical working capacity. After performing multiple regression analysis, obtained useful multiple regression models that can predict the physical performance of boys the aged of fourteen to seventeen years. This paper represents the development of regression model for the sixteen year old boys and analyzed results.
Regression Model Optimization for the Analysis of Experimental Data
Ulbrich, N.
2009-01-01
A candidate math model search algorithm was developed at Ames Research Center that determines a recommended math model for the multivariate regression analysis of experimental data. The search algorithm is applicable to classical regression analysis problems as well as wind tunnel strain gage balance calibration analysis applications. The algorithm compares the predictive capability of different regression models using the standard deviation of the PRESS residuals of the responses as a search metric. This search metric is minimized during the search. Singular value decomposition is used during the search to reject math models that lead to a singular solution of the regression analysis problem. Two threshold dependent constraints are also applied. The first constraint rejects math models with insignificant terms. The second constraint rejects math models with near-linear dependencies between terms. The math term hierarchy rule may also be applied as an optional constraint during or after the candidate math model search. The final term selection of the recommended math model depends on the regressor and response values of the data set, the user s function class combination choice, the user s constraint selections, and the result of the search metric minimization. A frequently used regression analysis example from the literature is used to illustrate the application of the search algorithm to experimental data.
Simulation Experiments in Practice : Statistical Design and Regression Analysis
Kleijnen, J.P.C.
2007-01-01
In practice, simulation analysts often change only one factor at a time, and use graphical analysis of the resulting Input/Output (I/O) data. Statistical theory proves that more information is obtained when applying Design Of Experiments (DOE) and linear regression analysis. Unfortunately, classic t
Evaluation Applications of Regression Analysis with Time-Series Data.
Veney, James E.
1993-01-01
The application of time series analysis is described, focusing on the use of regression analysis for analyzing time series in a way that may make it more readily available to an evaluation practice audience. Practical guidelines are suggested for decision makers in government, health, and social welfare agencies. (SLD)
The Analysis of the Regression-Discontinuity Design in R
Thoemmes, Felix; Liao, Wang; Jin, Ze
2017-01-01
This article describes the analysis of regression-discontinuity designs (RDDs) using the R packages rdd, rdrobust, and rddtools. We discuss similarities and differences between these packages and provide directions on how to use them effectively. We use real data from the Carolina Abecedarian Project to show how an analysis of an RDD can be…
Joint regression analysis and AMMI model applied to oat improvement
Oliveira, A.; Oliveira, T. A.; Mejza, S.
2012-09-01
In our work we present an application of some biometrical methods useful in genotype stability evaluation, namely AMMI model, Joint Regression Analysis (JRA) and multiple comparison tests. A genotype stability analysis of oat (Avena Sativa L.) grain yield was carried out using data of the Portuguese Plant Breeding Board, sample of the 22 different genotypes during the years 2002, 2003 and 2004 in six locations. In Ferreira et al. (2006) the authors state the relevance of the regression models and of the Additive Main Effects and Multiplicative Interactions (AMMI) model, to study and to estimate phenotypic stability effects. As computational techniques we use the Zigzag algorithm to estimate the regression coefficients and the agricolae-package available in R software for AMMI model analysis.
Ratio Versus Regression Analysis: Some Empirical Evidence in Brazil
Newton Carneiro Affonso da Costa Jr.
2004-06-01
Full Text Available This work compares the traditional methodology for ratio analysis, applied to a sample of Brazilian firms, with the alternative one of regression analysis both to cross-industry and intra-industry samples. It was tested the structural validity of the traditional methodology through a model that represents its analogous regression format. The data are from 156 Brazilian public companies in nine industrial sectors for the year 1997. The results provide weak empirical support for the traditional ratio methodology as it was verified that the validity of this methodology may differ between ratios.
Time series analysis using semiparametric regression on oil palm production
Yundari, Pasaribu, U. S.; Mukhaiyar, U.
2016-04-01
This paper presents semiparametric kernel regression method which has shown its flexibility and easiness in mathematical calculation, especially in estimating density and regression function. Kernel function is continuous and it produces a smooth estimation. The classical kernel density estimator is constructed by completely nonparametric analysis and it is well reasonable working for all form of function. Here, we discuss about parameter estimation in time series analysis. First, we consider the parameters are exist, then we use nonparametrical estimation which is called semiparametrical. The selection of optimum bandwidth is obtained by considering the approximation of Mean Integrated Square Root Error (MISE).
Analysis of Sting Balance Calibration Data Using Optimized Regression Models
Ulbrich, N.; Bader, Jon B.
2010-01-01
Calibration data of a wind tunnel sting balance was processed using a candidate math model search algorithm that recommends an optimized regression model for the data analysis. During the calibration the normal force and the moment at the balance moment center were selected as independent calibration variables. The sting balance itself had two moment gages. Therefore, after analyzing the connection between calibration loads and gage outputs, it was decided to choose the difference and the sum of the gage outputs as the two responses that best describe the behavior of the balance. The math model search algorithm was applied to these two responses. An optimized regression model was obtained for each response. Classical strain gage balance load transformations and the equations of the deflection of a cantilever beam under load are used to show that the search algorithm s two optimized regression models are supported by a theoretical analysis of the relationship between the applied calibration loads and the measured gage outputs. The analysis of the sting balance calibration data set is a rare example of a situation when terms of a regression model of a balance can directly be derived from first principles of physics. In addition, it is interesting to note that the search algorithm recommended the correct regression model term combinations using only a set of statistical quality metrics that were applied to the experimental data during the algorithm s term selection process.
SU-E-J-212: Identifying Bones From MRI: A Dictionary Learnign and Sparse Regression Approach
Ruan, D; Yang, Y; Cao, M; Hu, P; Low, D [UCLA, Los Angeles, CA (United States)
2014-06-01
Purpose: To develop an efficient and robust scheme to identify bony anatomy based on MRI-only simulation images. Methods: MRI offers important soft tissue contrast and functional information, yet its lack of correlation to electron-density has placed it as an auxiliary modality to CT in radiotherapy simulation and adaptation. An effective scheme to identify bony anatomy is an important first step towards MR-only simulation/treatment paradigm and would satisfy most practical purposes. We utilize a UTE acquisition sequence to achieve visibility of the bone. By contrast to manual + bulk or registration-to identify bones, we propose a novel learning-based approach for improved robustness to MR artefacts and environmental changes. Specifically, local information is encoded with MR image patch, and the corresponding label is extracted (during training) from simulation CT aligned to the UTE. Within each class (bone vs. nonbone), an overcomplete dictionary is learned so that typical patches within the proper class can be represented as a sparse combination of the dictionary entries. For testing, an acquired UTE-MRI is divided to patches using a sliding scheme, where each patch is sparsely regressed against both bone and nonbone dictionaries, and subsequently claimed to be associated with the class with the smaller residual. Results: The proposed method has been applied to the pilot site of brain imaging and it has showed general good performance, with dice similarity coefficient of greater than 0.9 in a crossvalidation study using 4 datasets. Importantly, it is robust towards consistent foreign objects (e.g., headset) and the artefacts relates to Gibbs and field heterogeneity. Conclusion: A learning perspective has been developed for inferring bone structures based on UTE MRI. The imaging setting is subject to minimal motion effects and the post-processing is efficient. The improved efficiency and robustness enables a first translation to MR-only routine. The scheme
Sparse Regression by Projection and Sparse Discriminant Analysis
Qi, Xin
2015-04-03
© 2015, © American Statistical Association, Institute of Mathematical Statistics, and Interface Foundation of North America. Recent years have seen active developments of various penalized regression methods, such as LASSO and elastic net, to analyze high-dimensional data. In these approaches, the direction and length of the regression coefficients are determined simultaneously. Due to the introduction of penalties, the length of the estimates can be far from being optimal for accurate predictions. We introduce a new framework, regression by projection, and its sparse version to analyze high-dimensional data. The unique nature of this framework is that the directions of the regression coefficients are inferred first, and the lengths and the tuning parameters are determined by a cross-validation procedure to achieve the largest prediction accuracy. We provide a theoretical result for simultaneous model selection consistency and parameter estimation consistency of our method in high dimension. This new framework is then generalized such that it can be applied to principal components analysis, partial least squares, and canonical correlation analysis. We also adapt this framework for discriminant analysis. Compared with the existing methods, where there is relatively little control of the dependency among the sparse components, our method can control the relationships among the components. We present efficient algorithms and related theory for solving the sparse regression by projection problem. Based on extensive simulations and real data analysis, we demonstrate that our method achieves good predictive performance and variable selection in the regression setting, and the ability to control relationships between the sparse components leads to more accurate classification. In supplementary materials available online, the details of the algorithms and theoretical proofs, and R codes for all simulation studies are provided.
A Novel Multiobjective Evolutionary Algorithm Based on Regression Analysis
Zhiming Song
2015-01-01
Full Text Available As is known, the Pareto set of a continuous multiobjective optimization problem with m objective functions is a piecewise continuous (m-1-dimensional manifold in the decision space under some mild conditions. However, how to utilize the regularity to design multiobjective optimization algorithms has become the research focus. In this paper, based on this regularity, a model-based multiobjective evolutionary algorithm with regression analysis (MMEA-RA is put forward to solve continuous multiobjective optimization problems with variable linkages. In the algorithm, the optimization problem is modelled as a promising area in the decision space by a probability distribution, and the centroid of the probability distribution is (m-1-dimensional piecewise continuous manifold. The least squares method is used to construct such a model. A selection strategy based on the nondominated sorting is used to choose the individuals to the next generation. The new algorithm is tested and compared with NSGA-II and RM-MEDA. The result shows that MMEA-RA outperforms RM-MEDA and NSGA-II on the test instances with variable linkages. At the same time, MMEA-RA has higher efficiency than the other two algorithms. A few shortcomings of MMEA-RA have also been identified and discussed in this paper.
A novel multiobjective evolutionary algorithm based on regression analysis.
Song, Zhiming; Wang, Maocai; Dai, Guangming; Vasile, Massimiliano
2015-01-01
As is known, the Pareto set of a continuous multiobjective optimization problem with m objective functions is a piecewise continuous (m - 1)-dimensional manifold in the decision space under some mild conditions. However, how to utilize the regularity to design multiobjective optimization algorithms has become the research focus. In this paper, based on this regularity, a model-based multiobjective evolutionary algorithm with regression analysis (MMEA-RA) is put forward to solve continuous multiobjective optimization problems with variable linkages. In the algorithm, the optimization problem is modelled as a promising area in the decision space by a probability distribution, and the centroid of the probability distribution is (m - 1)-dimensional piecewise continuous manifold. The least squares method is used to construct such a model. A selection strategy based on the nondominated sorting is used to choose the individuals to the next generation. The new algorithm is tested and compared with NSGA-II and RM-MEDA. The result shows that MMEA-RA outperforms RM-MEDA and NSGA-II on the test instances with variable linkages. At the same time, MMEA-RA has higher efficiency than the other two algorithms. A few shortcomings of MMEA-RA have also been identified and discussed in this paper.
Regression analysis for solving diagnosis problem of children's health
Cherkashina, Yu A.; Gerget, O. M.
2016-04-01
The paper includes results of scientific researches. These researches are devoted to the application of statistical techniques, namely, regression analysis, to assess the health status of children in the neonatal period based on medical data (hemostatic parameters, parameters of blood tests, the gestational age, vascular-endothelial growth factor) measured at 3-5 days of children's life. In this paper a detailed description of the studied medical data is given. A binary logistic regression procedure is discussed in the paper. Basic results of the research are presented. A classification table of predicted values and factual observed values is shown, the overall percentage of correct recognition is determined. Regression equation coefficients are calculated, the general regression equation is written based on them. Based on the results of logistic regression, ROC analysis was performed, sensitivity and specificity of the model are calculated and ROC curves are constructed. These mathematical techniques allow carrying out diagnostics of health of children providing a high quality of recognition. The results make a significant contribution to the development of evidence-based medicine and have a high practical importance in the professional activity of the author.
Visual category recognition using Spectral Regression and Kernel Discriminant Analysis
Tahir, M.A.; Kittler, J.; Mikolajczyk, K.; Yan, F.; van de Sande, K.E.A.; Gevers, T.
2009-01-01
Visual category recognition (VCR) is one of the most important tasks in image and video indexing. Spectral methods have recently emerged as a powerful tool for dimensionality reduction and manifold learning. Recently, Spectral Regression combined with Kernel Discriminant Analysis (SR-KDA) has been s
Controlling the Type I Error Rate in Stepwise Regression Analysis.
Pohlmann, John T.
Three procedures used to control Type I error rate in stepwise regression analysis are forward selection, backward elimination, and true stepwise. In the forward selection method, a model of the dependent variable is formed by choosing the single best predictor; then the second predictor which makes the strongest contribution to the prediction of…
Phenotype Similarity Regression for Identifying the Genetic Determinants of Rare Diseases
Greene, Daniel; Richardson, Sylvia; Turro, Ernest
2016-01-01
Rare genetic disorders, which can now be studied systematically with affordable genome sequencing, are often caused by high-penetrance rare variants. Such disorders are often heterogeneous and characterized by abnormalities spanning multiple organ systems ascertained with variable clinical precision. Existing methods for identifying genes with variants responsible for rare diseases summarize phenotypes with unstructured binary or quantitative variables. The Human Phenotype Ontology (HPO) allows composite phenotypes to be represented systematically but association methods accounting for the ontological relationship between HPO terms do not exist. We present a Bayesian method to model the association between an HPO-coded patient phenotype and genotype. Our method estimates the probability of an association together with an HPO-coded phenotype characteristic of the disease. We thus formalize a clinical approach to phenotyping that is lacking in standard regression techniques for rare disease research. We demonstrate the power of our method by uncovering a number of true associations in a large collection of genome-sequenced and HPO-coded cases with rare diseases. PMID:26924528
Phenotype Similarity Regression for Identifying the Genetic Determinants of Rare Diseases.
Greene, Daniel; Richardson, Sylvia; Turro, Ernest
2016-03-01
Rare genetic disorders, which can now be studied systematically with affordable genome sequencing, are often caused by high-penetrance rare variants. Such disorders are often heterogeneous and characterized by abnormalities spanning multiple organ systems ascertained with variable clinical precision. Existing methods for identifying genes with variants responsible for rare diseases summarize phenotypes with unstructured binary or quantitative variables. The Human Phenotype Ontology (HPO) allows composite phenotypes to be represented systematically but association methods accounting for the ontological relationship between HPO terms do not exist. We present a Bayesian method to model the association between an HPO-coded patient phenotype and genotype. Our method estimates the probability of an association together with an HPO-coded phenotype characteristic of the disease. We thus formalize a clinical approach to phenotyping that is lacking in standard regression techniques for rare disease research. We demonstrate the power of our method by uncovering a number of true associations in a large collection of genome-sequenced and HPO-coded cases with rare diseases.
Principal regression analysis and the index leverage effect
Reigneron, Pierre-Alain; Allez, Romain; Bouchaud, Jean-Philippe
2011-09-01
We revisit the index leverage effect, that can be decomposed into a volatility effect and a correlation effect. We investigate the latter using a matrix regression analysis, that we call ‘Principal Regression Analysis' (PRA) and for which we provide some analytical (using Random Matrix Theory) and numerical benchmarks. We find that downward index trends increase the average correlation between stocks (as measured by the most negative eigenvalue of the conditional correlation matrix), and makes the market mode more uniform. Upward trends, on the other hand, also increase the average correlation between stocks but rotates the corresponding market mode away from uniformity. There are two time scales associated to these effects, a short one on the order of a month (20 trading days), and a longer time scale on the order of a year. We also find indications of a leverage effect for sectorial correlations as well, which reveals itself in the second and third mode of the PRA.
Parodi, Stefano; Dosi, Corrado; Zambon, Antonella; Ferrari, Enrico; Muselli, Marco
2017-03-02
Identifying potential risk factors for problem gambling (PG) is of primary importance for planning preventive and therapeutic interventions. We illustrate a new approach based on the combination of standard logistic regression and an innovative method of supervised data mining (Logic Learning Machine or LLM). Data were taken from a pilot cross-sectional study to identify subjects with PG behaviour, assessed by two internationally validated scales (SOGS and Lie/Bet). Information was obtained from 251 gamblers recruited in six betting establishments. Data on socio-demographic characteristics, lifestyle and cognitive-related factors, and type, place and frequency of preferred gambling were obtained by a self-administered questionnaire. The following variables associated with PG were identified: instant gratification games, alcohol abuse, cognitive distortion, illegal behaviours and having started gambling with a relative or a friend. Furthermore, the combination of LLM and LR indicated the presence of two different types of PG, namely: (a) daily gamblers, more prone to illegal behaviour, with poor money management skills and who started gambling at an early age, and (b) non-daily gamblers, characterised by superstitious beliefs and a higher preference for immediate reward games. Finally, instant gratification games were strongly associated with the number of games usually played. Studies on gamblers habitually frequently betting shops are rare. The finding of different types of PG by habitual gamblers deserves further analysis in larger studies. Advanced data mining algorithms, like LLM, are powerful tools and potentially useful in identifying risk factors for PG.
A regressed phase analysis for coupled joint systems.
Wininger, Michael
2011-01-01
This study aims to address shortcomings of the relative phase analysis, a widely used method for assessment of coupling among joints of the lower limb. Goniometric data from 15 individuals with spastic diplegic cerebral palsy were recorded from the hip and knee joints during ambulation on a flat surface, and from a single healthy individual with no known motor impairment, over at least 10 gait cycles. The minimum relative phase (MRP) revealed substantial disparity in the timing and severity of the instance of maximum coupling, depending on which reference frame was selected: MRP(knee-hip) differed from MRP(hip-knee) by 16.1±14% of gait cycle and 50.6±77% difference in scale. Additionally, several relative phase portraits contained discontinuities which may contribute to error in phase feature extraction. These vagaries can be attributed to the predication of relative phase analysis on a transformation into the velocity-position phase plane, and the extraction of phase angle by the discontinuous arc-tangent operator. Here, an alternative phase analysis is proposed, wherein kinematic data is transformed into a profile of joint coupling across the entire gait cycle. By comparing joint velocities directly via a standard linear regression in the velocity-velocity phase plane, this regressed phase analysis provides several key advantages over relative phase analysis including continuity, commutativity between reference frames, and generalizability to many-joint systems.
Buchner, Florian; Wasem, Jürgen; Schillo, Sonja
2017-01-01
Risk equalization formulas have been refined since their introduction about two decades ago. Because of the complexity and the abundance of possible interactions between the variables used, hardly any interactions are considered. A regression tree is used to systematically search for interactions, a methodologically new approach in risk equalization. Analyses are based on a data set of nearly 2.9 million individuals from a major German social health insurer. A two-step approach is applied: In the first step a regression tree is built on the basis of the learning data set. Terminal nodes characterized by more than one morbidity-group-split represent interaction effects of different morbidity groups. In the second step the 'traditional' weighted least squares regression equation is expanded by adding interaction terms for all interactions detected by the tree, and regression coefficients are recalculated. The resulting risk adjustment formula shows an improvement in the adjusted R(2) from 25.43% to 25.81% on the evaluation data set. Predictive ratios are calculated for subgroups affected by the interactions. The R(2) improvement detected is only marginal. According to the sample level performance measures used, not involving a considerable number of morbidity interactions forms no relevant loss in accuracy. Copyright © 2015 John Wiley & Sons, Ltd.
Epistasis analysis for quantitative traits by functional regression model.
Zhang, Futao; Boerwinkle, Eric; Xiong, Momiao
2014-06-01
The critical barrier in interaction analysis for rare variants is that most traditional statistical methods for testing interactions were originally designed for testing the interaction between common variants and are difficult to apply to rare variants because of their prohibitive computational time and poor ability. The great challenges for successful detection of interactions with next-generation sequencing (NGS) data are (1) lack of methods for interaction analysis with rare variants, (2) severe multiple testing, and (3) time-consuming computations. To meet these challenges, we shift the paradigm of interaction analysis between two loci to interaction analysis between two sets of loci or genomic regions and collectively test interactions between all possible pairs of SNPs within two genomic regions. In other words, we take a genome region as a basic unit of interaction analysis and use high-dimensional data reduction and functional data analysis techniques to develop a novel functional regression model to collectively test interactions between all possible pairs of single nucleotide polymorphisms (SNPs) within two genome regions. By intensive simulations, we demonstrate that the functional regression models for interaction analysis of the quantitative trait have the correct type 1 error rates and a much better ability to detect interactions than the current pairwise interaction analysis. The proposed method was applied to exome sequence data from the NHLBI's Exome Sequencing Project (ESP) and CHARGE-S study. We discovered 27 pairs of genes showing significant interactions after applying the Bonferroni correction (P-values < 4.58 × 10(-10)) in the ESP, and 11 were replicated in the CHARGE-S study.
Isolating the Effects of Training Using Simple Regression Analysis: An Example of the Procedure.
Waugh, C. Keith
This paper provides a case example of simple regression analysis, a forecasting procedure used to isolate the effects of training from an identified extraneous variable. This case example focuses on results of a three-day sales training program to improve bank loan officers' knowledge, skill-level, and attitude regarding solicitation and sale of…
Sandow Mark Yidana; Bruce Banoeng-Yakubo; Patrick Asamoah Sakyi
2012-04-01
An innovative technique of measuring the intensities of major sources of variation in the hydrochemistry of (ground) water in a basin has been developed. This technique, which is based on the combination of R-mode factor and multiple regression analyses, can be used to measure the degrees of influence of the major sources of variation in the hydrochemistry without measuring the concentrations of the entire set of physico-chemical parameters which are often used to characterize water systems. R-mode factor analysis was applied to the data of 13 physico-chemical parameters and 50 samples in order to determine the major sources of variation in the hydrochemistry of some aquifers in the western region of Ghana. In this study, three sources of variation in the hydrochemistry were distinguished: the dissolution of chlorides and sulfates of the major cations, carbonate mineral dissolution, and silicate mineral weathering. Two key parameters were identified with each of the processes and multiple regression models were developed for each process. These models were tested and found to predict these processes quite accurately, and can be applied anywhere within the terrain. This technique can be reliably applied in areas where logistical constraints limit water sampling for whole basin hydrochemical characterization. Q-mode hierarchical cluster analysis (HCA) applied to the data revealed three major groundwater associations distinguished on the basis of the major causes of variation in the hydrochemistry. The three groundwater types represent Na–HCO3, Ca–HCO3, and Na–Cl groundwater types. Silicate stability diagrams suggest that all these groundwater types are mainly stable in the kaolinite and montmorillonite fields suggesting moderately restricted flow conditions.
Residential behavioural energy savings : a meta-regression analysis
Tiedemann, K.H. [BC Hydro, Burnaby, BC (Canada)
2009-07-01
Increasing attention is being given to opportunities for residential energy behavioural savings, as developed countries attempt to reduce energy use and greenhouse gas emissions. Several utility companies have undertaken pilot programs geared at understanding which interventions are most effective in reducing residential energy consumption through behavioural change. This paper presented the first metaregression analysis of residential energy behavioural savings. This study focused on interventions which affected household energy-related behaviours and as a result, affected household energy use. The paper described rational choice theory, the theory of planned behaviour, and the integration of rational choice theory and the adjusted expectancy values theory in a simple framework. The paper also discussed the review of various social, psychological and economics journals and databases. The results of the studies were presented. A basic concept in meta-regression analysis is the effects size which is defined as the program effect divided by the standard error of the program effect. A lengthy review of the literature found twenty-eight treatments from ten experiments for which an effect size could be calculated. The experiments involved classifying treatments according to whether the interventions were information, goal setting, feedback, rewards or combinations of these interventions. The impact of these alternative interventions on the effect size was then modelled using White's robust regression. Five regression models were compared on the basis of the Akaike's information criterion. It was found that model 5, which used all of the regressors, was the preferred model. It was concluded that the theory of planned behaviour is more appropriate in the context of analysis of behavioural change and energy use. 21 refs., 4 tabs.
Meta-regression Analysis of the Chinese Labor Reallocation Effect
Longhua; YUE; Shiyan; YANG; Rongtai; SHEN
2013-01-01
Meta regression analysis method was applied to study 23 papers about the effect of Chinese labor reallocation on the economic growth. The results showed that both the method of the World Bank (1996) or M.Syrquin(1986) had little impact on the results, while the calculation of the stock of physical capital had a positive impact on the results. The result by using panel data study was bigger than results obtained in the time series data. The time span had little influences on the results. Therefore, it was necessary to measure the exact stock of physical capital in China, so as to evaluate the Chinese labor reallocation effect
Multivariate study and regression analysis of gluten-free granola
Lilian Maria Pagamunici
2014-03-01
Full Text Available This study developed a gluten-free granola and evaluated it during storage with the application of multivariate and regression analysis of the sensory and instrumental parameters. The physicochemical, sensory, and nutritional characteristics of a product containing quinoa, amaranth and linseed were evaluated. The crude protein and lipid contents ranged from 97.49 and 122.72 g kg-1 of food, respectively. The polyunsaturated/saturated, and n-6:n-3 fatty acid ratios ranged from 2.82 and 2.59:1, respectively. Granola had the best alpha-linolenic acid content, nutritional indices in the lipid fraction, and mineral content. There were good hygienic and sanitary conditions during storage; probably due to the low water activity of the formulation, which contributed to inhibit microbial growth. The sensory attributes ranged from 'like very much' to 'like slightly', and the regression models were highly fitted and correlated during the storage period. A reduction in the sensory attribute levels and in the product physical stabilisation was verified by principal component analysis. The use of the affective test acceptance and instrumental analysis combined with statistical methods allowed us to obtain promising results about the characteristics of gluten-free granola.
Matt Silver
2013-11-01
Full Text Available Standard approaches to data analysis in genome-wide association studies (GWAS ignore any potential functional relationships between gene variants. In contrast gene pathways analysis uses prior information on functional structure within the genome to identify pathways associated with a trait of interest. In a second step, important single nucleotide polymorphisms (SNPs or genes may be identified within associated pathways. The pathways approach is motivated by the fact that genes do not act alone, but instead have effects that are likely to be mediated through their interaction in gene pathways. Where this is the case, pathways approaches may reveal aspects of a trait's genetic architecture that would otherwise be missed when considering SNPs in isolation. Most pathways methods begin by testing SNPs one at a time, and so fail to capitalise on the potential advantages inherent in a multi-SNP, joint modelling approach. Here, we describe a dual-level, sparse regression model for the simultaneous identification of pathways and genes associated with a quantitative trait. Our method takes account of various factors specific to the joint modelling of pathways with genome-wide data, including widespread correlation between genetic predictors, and the fact that variants may overlap multiple pathways. We use a resampling strategy that exploits finite sample variability to provide robust rankings for pathways and genes. We test our method through simulation, and use it to perform pathways-driven gene selection in a search for pathways and genes associated with variation in serum high-density lipoprotein cholesterol levels in two separate GWAS cohorts of Asian adults. By comparing results from both cohorts we identify a number of candidate pathways including those associated with cardiomyopathy, and T cell receptor and PPAR signalling. Highlighted genes include those associated with the L-type calcium channel, adenylate cyclase, integrin, laminin, MAPK
Regression analysis application for designing the vibration dampers
A. V. Ivanov
2014-01-01
Full Text Available Multi-frequency vibration dampers protect air power lines and fiber optic communication channels against Aeolian vibrations. To have a maximum efficiency the natural frequencies of dampers should be evenly distributed over the entire operating frequency range from 3 to 150 Hz. A traditional approach to damper design is to investigate damper features using the fullscale models. As a result, a conclusion on the damper capabilities is drawn, and design changes are made to achieve the required natural frequencies. The article describes a direct optimization method to design dampers.This method leads to a clear-cut definition of geometrical and mass parameters of dampers by their natural frequencies. The direct designing method is based on the active plan and design experiment.Based on regression analysis, a regression model is obtained as a second order polynomial to establish unique relation between the input (element dimensions, the weights of cargos and the output (natural frequencies design parameters. Different problems of designing dampers are considered using developed regression models.As a result, it has been found that a satisfactory accuracy of mathematical models, relating the input designing parameters to the output ones, is achieved. Depending on the number of input parameters and the nature of the restrictions a statement of designing purpose, including an optimization one, can be different when restrictions for design parameters are to meet the conflicting requirements.A proposed optimization method to solve a direct designing problem allows us to determine directly the damper element dimensions for any natural frequencies, and at the initial stage of the analysis, based on the methods of nonlinear programming, to disclose problems with no solution.The developed approach can be successfully applied to design various mechanical systems with complicated nonlinear interactions between the input and output parameters.
Optimum short-time polynomial regression for signal analysis
A SREENIVASA MURTHY; CHANDRA SEKHAR SEELAMANTULA; T V SREENIVAS
2016-11-01
We propose a short-time polynomial regression (STPR) for time-varying signal analysis. The advantage of using polynomials is that the notion of a spectrum is not needed and the signals can be analyzed in the time domain over short durations. In the presence of noise, such modeling becomes important, because the polynomial approximation performs smoothing leading to noise suppression. The problem of optimal smoothingdepends on the duration over which a fixed-order polynomial regression is performed. Considering the STPR of a noisy signal, we derive the optimal smoothing window by minimizing the mean-square error (MSE). For a fixed polynomial order, the smoothing window duration depends on the rate of signal variation, which, in turn,depends on its derivatives. Since the derivatives are not available a priori, exact optimization is not feasible.However, approximate optimization can be achieved using only the variance expressions and the intersection-ofconfidence-intervals (ICI) technique. The ICI technique is based on a consistency measure across confidence intervals corresponding to different window lengths. An approximate asymptotic analysis to determine the optimal confidence interval width shows that the asymptotic expressions are the same irrespective of whether one starts with a uniform sampling grid or a nonuniform one. Simulation results on sinusoids, chirps, and electrocardiogram (ECG) signals, and comparisons with standard wavelet denoising techniques, show that theproposed method is robust particularly in the low signal-to-noise ratio regime.
Distance Based Root Cause Analysis and Change Impact Analysis of Performance Regressions
Junzan Zhou
2015-01-01
Full Text Available Performance regression testing is applied to uncover both performance and functional problems of software releases. A performance problem revealed by performance testing can be high response time, low throughput, or even being out of service. Mature performance testing process helps systematically detect software performance problems. However, it is difficult to identify the root cause and evaluate the potential change impact. In this paper, we present an approach leveraging server side logs for identifying root causes of performance problems. Firstly, server side logs are used to recover call tree of each business transaction. We define a novel distance based metric computed from call trees for root cause analysis and apply inverted index from methods to business transactions for change impact analysis. Empirical studies show that our approach can effectively and efficiently help developers diagnose root cause of performance problems.
Polygraph Test Results Assessment by Regression Analysis Methods
K. A. Leontiev
2014-01-01
Full Text Available The paper considers a problem of defining the importance of asked questions for the examinee under judicial and psychophysiological polygraph examination by methods of mathematical statistics. It offers the classification algorithm based on the logistic regression as an optimum Bayesian classifier, considering weight coefficients of information for the polygraph-recorded physiological parameters with no condition for independence of the measured signs.Actually, binary classification is executed by results of polygraph examination with preliminary normalization and standardization of primary results, with check of a hypothesis that distribution of obtained data is normal, as well as with calculation of coefficients of linear regression between input values and responses by method of maximum likelihood. Further, the logistic curve divided signs into two classes of the "significant" and "insignificant" type.Efficiency of model is estimated by means of the ROC analysis (Receiver Operator Characteristics. It is shown that necessary minimum sample has to contain results of 45 measurements at least. This approach ensures a reliable result provided that an expert-polygraphologist possesses sufficient qualification and follows testing techniques.
A Visual Analytics Approach for Correlation, Classification, and Regression Analysis
Steed, Chad A [ORNL; SwanII, J. Edward [Mississippi State University (MSU); Fitzpatrick, Patrick J. [Mississippi State University (MSU); Jankun-Kelly, T.J. [Mississippi State University (MSU)
2012-02-01
New approaches that combine the strengths of humans and machines are necessary to equip analysts with the proper tools for exploring today's increasing complex, multivariate data sets. In this paper, a novel visual data mining framework, called the Multidimensional Data eXplorer (MDX), is described that addresses the challenges of today's data by combining automated statistical analytics with a highly interactive parallel coordinates based canvas. In addition to several intuitive interaction capabilities, this framework offers a rich set of graphical statistical indicators, interactive regression analysis, visual correlation mining, automated axis arrangements and filtering, and data classification techniques. The current work provides a detailed description of the system as well as a discussion of key design aspects and critical feedback from domain experts.
Cardiorespiratory fitness and laboratory stress: a meta-regression analysis.
Jackson, Erica M; Dishman, Rod K
2006-01-01
We performed a meta-regression analysis of 73 studies that examined whether cardiorespiratory fitness mitigates cardiovascular responses during and after acute laboratory stress in humans. The cumulative evidence indicates that fitness is related to slightly greater reactivity, but better recovery. However, effects varied according to several study features and were smallest in the better controlled studies. Fitness did not mitigate integrated stress responses such as heart rate and blood pressure, which were the focus of most of the studies we reviewed. Nonetheless, potentially important areas, particularly hemodynamic and vascular responses, have been understudied. Women, racial/ethnic groups, and cardiovascular patients were underrepresented. Randomized controlled trials, including naturalistic studies of real-life responses, are needed to clarify whether a change in fitness alters putative stress mechanisms linked with cardiovascular health.
Spatial regression analysis of traffic crashes in Seoul.
Rhee, Kyoung-Ah; Kim, Joon-Ki; Lee, Young-ihn; Ulfarsson, Gudmundur F
2016-06-01
Traffic crashes can be spatially correlated events and the analysis of the distribution of traffic crash frequency requires evaluation of parameters that reflect spatial properties and correlation. Typically this spatial aspect of crash data is not used in everyday practice by planning agencies and this contributes to a gap between research and practice. A database of traffic crashes in Seoul, Korea, in 2010 was developed at the traffic analysis zone (TAZ) level with a number of GIS developed spatial variables. Practical spatial models using available software were estimated. The spatial error model was determined to be better than the spatial lag model and an ordinary least squares baseline regression. A geographically weighted regression model provided useful insights about localization of effects. The results found that an increased length of roads with speed limit below 30 km/h and a higher ratio of residents below age of 15 were correlated with lower traffic crash frequency, while a higher ratio of residents who moved to the TAZ, more vehicle-kilometers traveled, and a greater number of access points with speed limit difference between side roads and mainline above 30 km/h all increased the number of traffic crashes. This suggests, for example, that better control or design for merging lower speed roads with higher speed roads is important. A key result is that the length of bus-only center lanes had the largest effect on increasing traffic crashes. This is important as bus-only center lanes with bus stop islands have been increasingly used to improve transit times. Hence the potential negative safety impacts of such systems need to be studied further and mitigated through improved design of pedestrian access to center bus stop islands.
Fisher, Charles K; Mehta, Pankaj
2014-01-01
Human associated microbial communities exert tremendous influence over human health and disease. With modern metagenomic sequencing methods it is now possible to follow the relative abundance of microbes in a community over time. These microbial communities exhibit rich ecological dynamics and an important goal of microbial ecology is to infer the ecological interactions between species directly from sequence data. Any algorithm for inferring ecological interactions must overcome three major obstacles: 1) a correlation between the abundances of two species does not imply that those species are interacting, 2) the sum constraint on the relative abundances obtained from metagenomic studies makes it difficult to infer the parameters in timeseries models, and 3) errors due to experimental uncertainty, or mis-assignment of sequencing reads into operational taxonomic units, bias inferences of species interactions due to a statistical problem called "errors-in-variables". Here we introduce an approach, Learning Interactions from MIcrobial Time Series (LIMITS), that overcomes these obstacles. LIMITS uses sparse linear regression with boostrap aggregation to infer a discrete-time Lotka-Volterra model for microbial dynamics. We tested LIMITS on synthetic data and showed that it could reliably infer the topology of the inter-species ecological interactions. We then used LIMITS to characterize the species interactions in the gut microbiomes of two individuals and found that the interaction networks varied significantly between individuals. Furthermore, we found that the interaction networks of the two individuals are dominated by distinct "keystone species", Bacteroides fragilis and Bacteroided stercosis, that have a disproportionate influence on the structure of the gut microbiome even though they are only found in moderate abundance. Based on our results, we hypothesize that the abundances of certain keystone species may be responsible for individuality in the human gut
Charles K Fisher
Full Text Available Human associated microbial communities exert tremendous influence over human health and disease. With modern metagenomic sequencing methods it is now possible to follow the relative abundance of microbes in a community over time. These microbial communities exhibit rich ecological dynamics and an important goal of microbial ecology is to infer the ecological interactions between species directly from sequence data. Any algorithm for inferring ecological interactions must overcome three major obstacles: 1 a correlation between the abundances of two species does not imply that those species are interacting, 2 the sum constraint on the relative abundances obtained from metagenomic studies makes it difficult to infer the parameters in timeseries models, and 3 errors due to experimental uncertainty, or mis-assignment of sequencing reads into operational taxonomic units, bias inferences of species interactions due to a statistical problem called "errors-in-variables". Here we introduce an approach, Learning Interactions from MIcrobial Time Series (LIMITS, that overcomes these obstacles. LIMITS uses sparse linear regression with boostrap aggregation to infer a discrete-time Lotka-Volterra model for microbial dynamics. We tested LIMITS on synthetic data and showed that it could reliably infer the topology of the inter-species ecological interactions. We then used LIMITS to characterize the species interactions in the gut microbiomes of two individuals and found that the interaction networks varied significantly between individuals. Furthermore, we found that the interaction networks of the two individuals are dominated by distinct "keystone species", Bacteroides fragilis and Bacteroided stercosis, that have a disproportionate influence on the structure of the gut microbiome even though they are only found in moderate abundance. Based on our results, we hypothesize that the abundances of certain keystone species may be responsible for individuality in
Fisher, Charles K.; Mehta, Pankaj
2014-01-01
Human associated microbial communities exert tremendous influence over human health and disease. With modern metagenomic sequencing methods it is now possible to follow the relative abundance of microbes in a community over time. These microbial communities exhibit rich ecological dynamics and an important goal of microbial ecology is to infer the ecological interactions between species directly from sequence data. Any algorithm for inferring ecological interactions must overcome three major obstacles: 1) a correlation between the abundances of two species does not imply that those species are interacting, 2) the sum constraint on the relative abundances obtained from metagenomic studies makes it difficult to infer the parameters in timeseries models, and 3) errors due to experimental uncertainty, or mis-assignment of sequencing reads into operational taxonomic units, bias inferences of species interactions due to a statistical problem called “errors-in-variables”. Here we introduce an approach, Learning Interactions from MIcrobial Time Series (LIMITS), that overcomes these obstacles. LIMITS uses sparse linear regression with boostrap aggregation to infer a discrete-time Lotka-Volterra model for microbial dynamics. We tested LIMITS on synthetic data and showed that it could reliably infer the topology of the inter-species ecological interactions. We then used LIMITS to characterize the species interactions in the gut microbiomes of two individuals and found that the interaction networks varied significantly between individuals. Furthermore, we found that the interaction networks of the two individuals are dominated by distinct “keystone species”, Bacteroides fragilis and Bacteroided stercosis, that have a disproportionate influence on the structure of the gut microbiome even though they are only found in moderate abundance. Based on our results, we hypothesize that the abundances of certain keystone species may be responsible for individuality in the human
Henrard, S; Speybroeck, N; Hermans, C
2015-11-01
Haemophilia is a rare genetic haemorrhagic disease characterized by partial or complete deficiency of coagulation factor VIII, for haemophilia A, or IX, for haemophilia B. As in any other medical research domain, the field of haemophilia research is increasingly concerned with finding factors associated with binary or continuous outcomes through multivariable models. Traditional models include multiple logistic regressions, for binary outcomes, and multiple linear regressions for continuous outcomes. Yet these regression models are at times difficult to implement, especially for non-statisticians, and can be difficult to interpret. The present paper sought to didactically explain how, why, and when to use classification and regression tree (CART) analysis for haemophilia research. The CART method is non-parametric and non-linear, based on the repeated partitioning of a sample into subgroups based on a certain criterion. Breiman developed this method in 1984. Classification trees (CTs) are used to analyse categorical outcomes and regression trees (RTs) to analyse continuous ones. The CART methodology has become increasingly popular in the medical field, yet only a few examples of studies using this methodology specifically in haemophilia have to date been published. Two examples using CART analysis and previously published in this field are didactically explained in details. There is increasing interest in using CART analysis in the health domain, primarily due to its ease of implementation, use, and interpretation, thus facilitating medical decision-making. This method should be promoted for analysing continuous or categorical outcomes in haemophilia, when applicable. © 2015 John Wiley & Sons Ltd.
Online Statistical Modeling (Regression Analysis) for Independent Responses
Made Tirta, I.; Anggraeni, Dian; Pandutama, Martinus
2017-06-01
Regression analysis (statistical analmodelling) are among statistical methods which are frequently needed in analyzing quantitative data, especially to model relationship between response and explanatory variables. Nowadays, statistical models have been developed into various directions to model various type and complex relationship of data. Rich varieties of advanced and recent statistical modelling are mostly available on open source software (one of them is R). However, these advanced statistical modelling, are not very friendly to novice R users, since they are based on programming script or command line interface. Our research aims to developed web interface (based on R and shiny), so that most recent and advanced statistical modelling are readily available, accessible and applicable on web. We have previously made interface in the form of e-tutorial for several modern and advanced statistical modelling on R especially for independent responses (including linear models/LM, generalized linier models/GLM, generalized additive model/GAM and generalized additive model for location scale and shape/GAMLSS). In this research we unified them in the form of data analysis, including model using Computer Intensive Statistics (Bootstrap and Markov Chain Monte Carlo/ MCMC). All are readily accessible on our online Virtual Statistics Laboratory. The web (interface) make the statistical modeling becomes easier to apply and easier to compare them in order to find the most appropriate model for the data.
Analysing count data of Butterflies communities in Jasin, Melaka: A Poisson regression analysis
Afiqah Muhamad Jamil, Siti; Asrul Affendi Abdullah, M.; Kek, Sie Long; Nor, Maria Elena; Mohamed, Maryati; Ismail, Norradihah
2017-09-01
Counting outcomes normally have remaining values highly skewed toward the right as they are often characterized by large values of zeros. The data of butterfly communities, had been taken from Jasin, Melaka and consists of 131 number of subject visits in Jasin, Melaka. In this paper, considering the count data of butterfly communities, an analysis is considered Poisson regression analysis as it is assumed to be an alternative way on better suited to the counting process. This research paper is about analysing count data from zero observation ecological inference of butterfly communities in Jasin, Melaka by using Poisson regression analysis. The software for Poisson regression is readily available and it is becoming more widely used in many field of research and the data was analysed by using SAS software. The purpose of analysis comprised the framework of identifying the concerns. Besides, by using Poisson regression analysis, the study determines the fitness of data for accessing the reliability on using the count data. The finding indicates that the highest and lowest number of subject comes from the third family (Nymphalidae) family and fifth (Hesperidae) family and the Poisson distribution seems to fit the zero values.
Regression Model Term Selection for the Analysis of Strain-Gage Balance Calibration Data
Ulbrich, Norbert Manfred; Volden, Thomas R.
2010-01-01
The paper discusses the selection of regression model terms for the analysis of wind tunnel strain-gage balance calibration data. Different function class combinations are presented that may be used to analyze calibration data using either a non-iterative or an iterative method. The role of the intercept term in a regression model of calibration data is reviewed. In addition, useful algorithms and metrics originating from linear algebra and statistics are recommended that will help an analyst (i) to identify and avoid both linear and near-linear dependencies between regression model terms and (ii) to make sure that the selected regression model of the calibration data uses only statistically significant terms. Three different tests are suggested that may be used to objectively assess the predictive capability of the final regression model of the calibration data. These tests use both the original data points and regression model independent confirmation points. Finally, data from a simplified manual calibration of the Ames MK40 balance is used to illustrate the application of some of the metrics and tests to a realistic calibration data set.
An Effect Size for Regression Predictors in Meta-Analysis
Aloe, Ariel M.; Becker, Betsy Jane
2012-01-01
A new effect size representing the predictive power of an independent variable from a multiple regression model is presented. The index, denoted as r[subscript sp], is the semipartial correlation of the predictor with the outcome of interest. This effect size can be computed when multiple predictor variables are included in the regression model…
No rationale for 1 variable per 10 events criterion for binary logistic regression analysis
Maarten van Smeden
2016-11-01
Full Text Available Abstract Background Ten events per variable (EPV is a widely advocated minimal criterion for sample size considerations in logistic regression analysis. Of three previous simulation studies that examined this minimal EPV criterion only one supports the use of a minimum of 10 EPV. In this paper, we examine the reasons for substantial differences between these extensive simulation studies. Methods The current study uses Monte Carlo simulations to evaluate small sample bias, coverage of confidence intervals and mean square error of logit coefficients. Logistic regression models fitted by maximum likelihood and a modified estimation procedure, known as Firth’s correction, are compared. Results The results show that besides EPV, the problems associated with low EPV depend on other factors such as the total sample size. It is also demonstrated that simulation results can be dominated by even a few simulated data sets for which the prediction of the outcome by the covariates is perfect (‘separation’. We reveal that different approaches for identifying and handling separation leads to substantially different simulation results. We further show that Firth’s correction can be used to improve the accuracy of regression coefficients and alleviate the problems associated with separation. Conclusions The current evidence supporting EPV rules for binary logistic regression is weak. Given our findings, there is an urgent need for new research to provide guidance for supporting sample size considerations for binary logistic regression analysis.
An Analysis of Bank Service Satisfaction Based on Quantile Regression and Grey Relational Analysis
Wen-Tsao Pan
2016-01-01
Full Text Available Bank service satisfaction is vital to the success of a bank. In this paper, we propose to use the grey relational analysis to gauge the levels of service satisfaction of the banks. With the grey relational analysis, we compared the effects of different variables on service satisfaction. We gave ranks to the banks according to their levels of service satisfaction. We further used the quantile regression model to find the variables that affected the satisfaction of a customer at a specific quantile of satisfaction level. The result of the quantile regression analysis provided a bank manager with information to formulate policies to further promote satisfaction of the customers at different quantiles of satisfaction level. We also compared the prediction accuracies of the regression models at different quantiles. The experiment result showed that, among the seven quantile regression models, the median regression model has the best performance in terms of RMSE, RTIC, and CE performance measures.
Regression Analysis of Restricted Mean Survival Time Based on Pseudo-Observations
Andersen, Per Kragh; Hansen, Mette Gerster; Klein, John P.
2004-01-01
censoring; hazard function; health economics; mean survival time; pseudo-observations; regression model; restricted mean survival time; survival analysis......censoring; hazard function; health economics; mean survival time; pseudo-observations; regression model; restricted mean survival time; survival analysis...
Regression analysis of restricted mean survival time based on pseudo-observations
Andersen, Per Kragh; Hansen, Mette Gerster; Klein, John P.
censoring; hazard function; health economics; regression model; survival analysis; mean survival time; restricted mean survival time; pseudo-observations......censoring; hazard function; health economics; regression model; survival analysis; mean survival time; restricted mean survival time; pseudo-observations...
Regression and kriging analysis for grid power factor estimation
Rajesh Guntaka
2014-12-01
Full Text Available The measurement of power factor (PF in electrical utility grids is a mainstay of load balancing and is also a critical element of transmission and distribution efficiency. The measurement of PF dates back to the earliest periods of electrical power distribution to public grids. In the wide-area distribution grid, measurement of current waveforms is trivial and may be accomplished at any point in the grid using a current tap transformer. However, voltage measurement requires reference to ground and so is more problematic and measurements are normally constrained to points that have ready and easy access to a ground source. We present two mathematical analysis methods based on kriging and linear least square estimation (LLSE (regression to derive PF at nodes with unknown voltages that are within a perimeter of sample nodes with ground reference across a selected power grid. Our results indicate an error average of 1.884% that is within acceptable tolerances for PF measurements that are used in load balancing tasks.
A simplified procedure of linear regression in a preliminary analysis
Silvia Facchinetti
2013-05-01
Full Text Available The analysis of a statistical large data-set can be led by the study of a particularly interesting variable Y – regressed – and an explicative variable X, chosen among the remained variables, conjointly observed. The study gives a simplified procedure to obtain the functional link of the variables y=y(x by a partition of the data-set into m subsets, in which the observations are synthesized by location indices (mean or median of X and Y. Polynomial models for y(x of order r are considered to verify the characteristics of the given procedure, in particular we assume r= 1 and 2. The distributions of the parameter estimators are obtained by simulation, when the fitting is done for m= r + 1. Comparisons of the results, in terms of distribution and efficiency, are made with the results obtained by the ordinary least square methods. The study also gives some considerations on the consistency of the estimated parameters obtained by the given procedure.
Fast nonlinear regression method for CT brain perfusion analysis.
Bennink, Edwin; Oosterbroek, Jaap; Kudo, Kohsuke; Viergever, Max A; Velthuis, Birgitta K; de Jong, Hugo W A M
2016-04-01
Although computed tomography (CT) perfusion (CTP) imaging enables rapid diagnosis and prognosis of ischemic stroke, current CTP analysis methods have several shortcomings. We propose a fast nonlinear regression method with a box-shaped model (boxNLR) that has important advantages over the current state-of-the-art method, block-circulant singular value decomposition (bSVD). These advantages include improved robustness to attenuation curve truncation, extensibility, and unified estimation of perfusion parameters. The method is compared with bSVD and with a commercial SVD-based method. The three methods were quantitatively evaluated by means of a digital perfusion phantom, described by Kudo et al. and qualitatively with the aid of 50 clinical CTP scans. All three methods yielded high Pearson correlation coefficients ([Formula: see text]) with the ground truth in the phantom. The boxNLR perfusion maps of the clinical scans showed higher correlation with bSVD than the perfusion maps from the commercial method. Furthermore, it was shown that boxNLR estimates are robust to noise, truncation, and tracer delay. The proposed method provides a fast and reliable way of estimating perfusion parameters from CTP scans. This suggests it could be a viable alternative to current commercial and academic methods.
Influencing Academic Library Use in Tanzania: A Multiple Regression Analysis
Leocardia L Juventus
2016-12-01
Full Text Available Library use is influenced by many factors. This study uses a multiple regression analysis to ascertain the connection between the level of library use and a few of these factors based on the questionnaire responses from 158 undergraduate students who use academic libraries in two Tanzania’s universities: Muhimbili University of Health and Allied Sciences (MUHAS, and Hubert Kairuki Memorial University (HKMU. It has been discovered that users of academic libraries in Tanzania are influenced by the need to: search and access online materials, check for new books or other resources, check out books and other materials, and enjoy a friendly environment for study. However, their library use is not influenced by either the free wireless network, or consultation from librarians. It is argued that, academic libraries need to devise and implement plans that can make these libraries better learning environment and platforms to drive socio-economic developmentparticularly in developing nations such as Tanzania. It is further argued that, this can be enhanced through investment in modern academic library infrastructures.
Standardized Regression Coefficients as Indices of Effect Sizes in Meta-Analysis
Kim, Rae Seon
2011-01-01
When conducting a meta-analysis, it is common to find many collected studies that report regression analyses, because multiple regression analysis is widely used in many fields. Meta-analysis uses effect sizes drawn from individual studies as a means of synthesizing a collection of results. However, indices of effect size from regression analyses…
Tejera-Vaquerizo, A; Martín-Cuevas, P; Gallego, E; Herrera-Acosta, E; Traves, V; Herrera-Ceballos, E; Nagore, E
2015-04-01
The main aim of this study was to identify predictors of sentinel lymph node (SN) metastasis in cutaneous melanoma. This was a retrospective cohort study of 818 patients in 2 tertiary-level hospitals. The primary outcome variable was SN involvement. Independent predictors were identified using multiple logistic regression and a classification and regression tree (CART) analysis. Ulceration, tumor thickness, and a high mitotic rate (≥6 mitoses/mm(2)) were independently associated with SN metastasis in the multiple regression analysis. The most important predictor in the CART analysis was Breslow thickness. Absence of an inflammatory infiltrate, patient age, and tumor location were predictive of SN metastasis in patients with tumors thicker than 2mm. In the case of thinner melanomas, the predictors were mitotic rate (>6 mitoses/mm(2)), presence of ulceration, and tumor thickness. Patient age, mitotic rate, and tumor thickness and location were predictive of survival. A high mitotic rate predicts a higher risk of SN involvement and worse survival. CART analysis improves the prediction of regional metastasis, resulting in better clinical management of melanoma patients. It may also help select suitable candidates for inclusion in clinical trials. Copyright © 2014 Elsevier España, S.L.U. and AEDV. All rights reserved.
Harrell , Jr , Frank E
2015-01-01
This highly anticipated second edition features new chapters and sections, 225 new references, and comprehensive R software. In keeping with the previous edition, this book is about the art and science of data analysis and predictive modeling, which entails choosing and using multiple tools. Instead of presenting isolated techniques, this text emphasizes problem solving strategies that address the many issues arising when developing multivariable models using real data and not standard textbook examples. It includes imputation methods for dealing with missing data effectively, methods for fitting nonlinear relationships and for making the estimation of transformations a formal part of the modeling process, methods for dealing with "too many variables to analyze and not enough observations," and powerful model validation techniques based on the bootstrap. The reader will gain a keen understanding of predictive accuracy, and the harm of categorizing continuous predictors or outcomes. This text realistically...
Design and analysis of experiments classical and regression approaches with SAS
Onyiah, Leonard C
2008-01-01
Introductory Statistical Inference and Regression Analysis Elementary Statistical Inference Regression Analysis Experiments, the Completely Randomized Design (CRD)-Classical and Regression Approaches Experiments Experiments to Compare Treatments Some Basic Ideas Requirements of a Good Experiment One-Way Experimental Layout or the CRD: Design and Analysis Analysis of Experimental Data (Fixed Effects Model) Expected Values for the Sums of Squares The Analysis of Variance (ANOVA) Table Follow-Up Analysis to Check fo
Buffalos milk yield analysis using random regression models
A.S. Schierholt
2010-02-01
Full Text Available Data comprising 1,719 milk yield records from 357 females (predominantly Murrah breed, daughters of 110 sires, with births from 1974 to 2004, obtained from the Programa de Melhoramento Genético de Bubalinos (PROMEBUL and from records of EMBRAPA Amazônia Oriental - EAO herd, located in Belém, Pará, Brazil, were used to compare random regression models for estimating variance components and predicting breeding values of the sires. The data were analyzed by different models using the Legendre’s polynomial functions from second to fourth orders. The random regression models included the effects of herd-year, month of parity date of the control; regression coefficients for age of females (in order to describe the fixed part of the lactation curve and random regression coefficients related to the direct genetic and permanent environment effects. The comparisons among the models were based on the Akaike Infromation Criterion. The random effects regression model using third order Legendre’s polynomials with four classes of the environmental effect were the one that best described the additive genetic variation in milk yield. The heritability estimates varied from 0.08 to 0.40. The genetic correlation between milk yields in younger ages was close to the unit, but in older ages it was low.
Quantile regression provides a fuller analysis of speed data.
Hewson, Paul
2008-03-01
Considerable interest already exists in terms of assessing percentiles of speed distributions, for example monitoring the 85th percentile speed is a common feature of the investigation of many road safety interventions. However, unlike the mean, where t-tests and ANOVA can be used to provide evidence of a statistically significant change, inference on these percentiles is much less common. This paper examines the potential role of quantile regression for modelling the 85th percentile, or any other quantile. Given that crash risk may increase disproportionately with increasing relative speed, it may be argued these quantiles are of more interest than the conditional mean. In common with the more usual linear regression, quantile regression admits a simple test as to whether the 85th percentile speed has changed following an intervention in an analogous way to using the t-test to determine if the mean speed has changed by considering the significance of parameters fitted to a design matrix. Having briefly outlined the technique and briefly examined an application with a widely published dataset concerning speed measurements taken around the introduction of signs in Cambridgeshire, this paper will demonstrate the potential for quantile regression modelling by examining recent data from Northamptonshire collected in conjunction with a "community speed watch" programme. Freely available software is used to fit these models and it is hoped that the potential benefits of using quantile regression methods when examining and analysing speed data are demonstrated.
Analysis of retirement income adequacy using quantile regression: A case study in Malaysia
Alaudin, Ros Idayuwati; Ismail, Noriszura; Isa, Zaidi
2015-09-01
Quantile regression is a statistical analysis that does not restrict attention to the conditional mean and therefore, permitting the approximation of the whole conditional distribution of a response variable. Quantile regression is a robust regression to outliers compared to mean regression models. In this paper, we demonstrate how quantile regression approach can be used to analyze the ratio of projected wealth to needs (wealth-needs ratio) during retirement.
Multiple regression for physiological data analysis: the problem of multicollinearity.
Slinker, B K; Glantz, S A
1985-07-01
Multiple linear regression, in which several predictor variables are related to a response variable, is a powerful statistical tool for gaining quantitative insight into complex in vivo physiological systems. For these insights to be correct, all predictor variables must be uncorrelated. However, in many physiological experiments the predictor variables cannot be precisely controlled and thus change in parallel (i.e., they are highly correlated). There is a redundancy of information about the response, a situation called multicollinearity, that leads to numerical problems in estimating the parameters in regression equations; the parameters are often of incorrect magnitude or sign or have large standard errors. Although multicollinearity can be avoided with good experimental design, not all interesting physiological questions can be studied without encountering multicollinearity. In these cases various ad hoc procedures have been proposed to mitigate multicollinearity. Although many of these procedures are controversial, they can be helpful in applying multiple linear regression to some physiological problems.
Analysis of some methods for reduced rank Gaussian process regression
Quinonero-Candela, J.; Rasmussen, Carl Edward
2005-01-01
While there is strong motivation for using Gaussian Processes (GPs) due to their excellent performance in regression and classification problems, their computational complexity makes them impractical when the size of the training set exceeds a few thousand cases. This has motivated the recent...... proliferation of a number of cost-effective approximations to GPs, both for classification and for regression. In this paper we analyze one popular approximation to GPs for regression: the reduced rank approximation. While generally GPs are equivalent to infinite linear models, we show that Reduced Rank...... Gaussian Processes (RRGPs) are equivalent to finite sparse linear models. We also introduce the concept of degenerate GPs and show that they correspond to inappropriate priors. We show how to modify the RRGP to prevent it from being degenerate at test time. Training RRGPs consists both in learning...
Regression analysis of censored data using pseudo-observations
Parner, Erik T.; Andersen, Per Kragh
2010-01-01
We draw upon a series of articles in which a method based on pseu- dovalues is proposed for direct regression modeling of the survival function, the restricted mean, and the cumulative incidence function in competing risks with right-censored data. The models, once the pseudovalues have been...
Measuring Habituation in Infants: An Approach Using Regression Analysis.
Ashmead, Daniel H.; Davis, DeFord L.
1996-01-01
Used computer simulations to examine effectiveness of different criteria for measuring infant visual habituation. Found that a criterion based on fitting a second-order polynomial regression function to looking-time data produced more accurate estimation of looking times and higher power for detecting novelty effects than did the traditional…
Grades, Gender, and Encouragement: A Regression Discontinuity Analysis
Owen, Ann L.
2010-01-01
The author employs a regression discontinuity design to provide direct evidence on the effects of grades earned in economics principles classes on the decision to major in economics and finds a differential effect for male and female students. Specifically, for female students, receiving an A for a final grade in the first economics class is…
REGRESSION ANALYSIS OF PRODUCTIVITY USING MIXED EFFECT MODEL
Siana Halim
2007-01-01
Full Text Available Production plants of a company are located in several areas that spread across Middle and East Java. As the production process employs mostly manpower, we suspected that each location has different characteristics affecting the productivity. Thus, the production data may have a spatial and hierarchical structure. For fitting a linear regression using the ordinary techniques, we are required to make some assumptions about the nature of the residuals i.e. independent, identically and normally distributed. However, these assumptions were rarely fulfilled especially for data that have a spatial and hierarchical structure. We worked out the problem using mixed effect model. This paper discusses the model construction of productivity and several characteristics in the production line by taking location as a random effect. The simple model with high utility that satisfies the necessary regression assumptions was built using a free statistic software R version 2.6.1.
Academic Achievement, Intelligence, and Creativity: A Regression Surface Analysis.
Marjoribanks, K
1976-01-01
Data collected on 400 12-year-old English school children were used to examine relations between measures of intelligence, creativity and academic achievement. Complex multiple regression models, which included terms to account for the possible interaction and curvilinear relations between intelligence, creativity and academic achievement were used to construct regression surfaces. The surfaces showed that the traditional threshold hypothesis, which suggests that beyond a certain level of intelligence academic achievement is related increasingly to creativity and ceases to be related strongly to intelligence, was not supported. For some areas of academic performance the results suggest an alternate proposition, that creativity ceases to be related to achievement after a threshold level of intelligence has been reached. It was also found that at high levels of verbal ability, non-verbal ability and creativity appeared to have differential relations with academic achievement.
Model performance analysis and model validation in logistic regression
Rosa Arboretti Giancristofaro
2007-10-01
Full Text Available In this paper a new model validation procedure for a logistic regression model is presented. At first, we illustrate a brief review of different techniques of model validation. Next, we define a number of properties required for a model to be considered "good", and a number of quantitative performance measures. Lastly, we describe a methodology for the assessment of the performance of a given model by using an example taken from a management study.
A New Approach in Regression Analysis for Modeling Adsorption Isotherms
Dana D. Marković
2014-01-01
Full Text Available Numerous regression approaches to isotherm parameters estimation appear in the literature. The real insight into the proper modeling pattern can be achieved only by testing methods on a very big number of cases. Experimentally, it cannot be done in a reasonable time, so the Monte Carlo simulation method was applied. The objective of this paper is to introduce and compare numerical approaches that involve different levels of knowledge about the noise structure of the analytical method used for initial and equilibrium concentration determination. Six levels of homoscedastic noise and five types of heteroscedastic noise precision models were considered. Performance of the methods was statistically evaluated based on median percentage error and mean absolute relative error in parameter estimates. The present study showed a clear distinction between two cases. When equilibrium experiments are performed only once, for the homoscedastic case, the winning error function is ordinary least squares, while for the case of heteroscedastic noise the use of orthogonal distance regression or Margart’s percent standard deviation is suggested. It was found that in case when experiments are repeated three times the simple method of weighted least squares performed as well as more complicated orthogonal distance regression method.
Kleijnen, J.P.C.
1995-01-01
This tutorial discusses what-if analysis and optimization of System Dynamics models. These problems are solved, using the statistical techniques of regression analysis and design of experiments (DOE). These issues are illustrated by applying the statistical techniques to a System Dynamics model for
Liu, Xiang; Saat, M Rapik; Qin, Xiao; Barkan, Christopher P L
2013-10-01
Derailments are the most common type of freight-train accidents in the United States. Derailments cause damage to infrastructure and rolling stock, disrupt services, and may cause casualties and harm the environment. Accordingly, derailment analysis and prevention has long been a high priority in the rail industry and government. Despite the low probability of a train derailment, the potential for severe consequences justify the need to better understand the factors influencing train derailment severity. In this paper, a zero-truncated negative binomial (ZTNB) regression model is developed to estimate the conditional mean of train derailment severity. Recognizing that the mean is not the only statistic describing data distribution, a quantile regression (QR) model is also developed to estimate derailment severity at different quantiles. The two regression models together provide a better understanding of train derailment severity distribution. Results of this work can be used to estimate train derailment severity under various operational conditions and by different accident causes. This research is intended to provide insights regarding development of cost-efficient train safety policies. Copyright © 2013 Elsevier Ltd. All rights reserved.
Analysis of some methods for reduced rank Gaussian process regression
Quinonero-Candela, J.; Rasmussen, Carl Edward
2005-01-01
While there is strong motivation for using Gaussian Processes (GPs) due to their excellent performance in regression and classification problems, their computational complexity makes them impractical when the size of the training set exceeds a few thousand cases. This has motivated the recent...... Gaussian Processes (RRGPs) are equivalent to finite sparse linear models. We also introduce the concept of degenerate GPs and show that they correspond to inappropriate priors. We show how to modify the RRGP to prevent it from being degenerate at test time. Training RRGPs consists both in learning...
Parsons, Vickie s.
2009-01-01
The request to conduct an independent review of regression models, developed for determining the expected Launch Commit Criteria (LCC) External Tank (ET)-04 cycle count for the Space Shuttle ET tanking process, was submitted to the NASA Engineering and Safety Center NESC on September 20, 2005. The NESC team performed an independent review of regression models documented in Prepress Regression Analysis, Tom Clark and Angela Krenn, 10/27/05. This consultation consisted of a peer review by statistical experts of the proposed regression models provided in the Prepress Regression Analysis. This document is the consultation's final report.
Păniţă, Ovidiu
2015-09-01
In the years 2012-2014 on Banu-Maracine DRS there were tested an assortment of 25 isogenic lines of wheat (Triticum aestivum ssp.vulgare), the analyzed characters being the number of seeds/spike, seeds weight/spike (g), no. of spikes/m2, weight of a thousand seeds (WTS) (g) and no. of emerged plants/m2. Based on recorded data and statistical processing of those, they were identified a numbers of links between these characters. Also available regression models were identified between some of the studied characters. Based on component analysis, no. of seeds/spike and seeds weight/spike are components that influence in excess of 88% variance analysis, a total of seven genotypes with positive scores for both factors.
Selenium Exposure and Cancer Risk: an Updated Meta-analysis and Meta-regression.
Cai, Xianlei; Wang, Chen; Yu, Wanqi; Fan, Wenjie; Wang, Shan; Shen, Ning; Wu, Pengcheng; Li, Xiuyang; Wang, Fudi
2016-01-20
The objective of this study was to investigate the associations between selenium exposure and cancer risk. We identified 69 studies and applied meta-analysis, meta-regression and dose-response analysis to obtain available evidence. The results indicated that high selenium exposure had a protective effect on cancer risk (pooled OR = 0.78; 95%CI: 0.73-0.83). The results of linear and nonlinear dose-response analysis indicated that high serum/plasma selenium and toenail selenium had the efficacy on cancer prevention. However, we did not find a protective efficacy of selenium supplement. High selenium exposure may have different effects on specific types of cancer. It decreased the risk of breast cancer, lung cancer, esophageal cancer, gastric cancer, and prostate cancer, but it was not associated with colorectal cancer, bladder cancer, and skin cancer.
Simulation Experiments in Practice : Statistical Design and Regression Analysis
Kleijnen, J.P.C.
2007-01-01
In practice, simulation analysts often change only one factor at a time, and use graphical analysis of the resulting Input/Output (I/O) data. The goal of this article is to change these traditional, naïve methods of design and analysis, because statistical theory proves that more information is obta
Modeling Information Content Via Dirichlet-Multinomial Regression Analysis.
Ferrari, Alberto
2017-02-16
Shannon entropy is being increasingly used in biomedical research as an index of complexity and information content in sequences of symbols, e.g. languages, amino acid sequences, DNA methylation patterns and animal vocalizations. Yet, distributional properties of information entropy as a random variable have seldom been the object of study, leading to researchers mainly using linear models or simulation-based analytical approach to assess differences in information content, when entropy is measured repeatedly in different experimental conditions. Here a method to perform inference on entropy in such conditions is proposed. Building on results coming from studies in the field of Bayesian entropy estimation, a symmetric Dirichlet-multinomial regression model, able to deal efficiently with the issue of mean entropy estimation, is formulated. Through a simulation study the model is shown to outperform linear modeling in a vast range of scenarios and to have promising statistical properties. As a practical example, the method is applied to a data set coming from a real experiment on animal communication.
Air Pollution Analysis using Ontologies and Regression Models
Parul Choudhary
2016-07-01
Full Text Available Rapidly throughout the world economy, "the expansive Web" in the "world" explosive growth, rapidly growing market characterized by short product cycles exists and the demand for increased flexibility as well as the extensive use of a new data vision managed data society. A new socio-economic system that relies more and more on movement and allocation results in data whose daily existence, refinement, economy and adjust the exchange industry. Cooperative Engineering Co -operation and multi -disciplinary installed on people's cooperation is a good example. Semantic Web is a new form of Web content that is meaningful to computers and additional approved another example. Communication, vision sharing and exchanging data Society's are new commercial bet. Urban air pollution modeling and data processing techniques need elevated Association. Artificial intelligence in countless ways and breakthrough technologies can solve environmental problems from uneven offers. A method for data to formal ontology means a true meaning and lack of ambiguity to allow us to portray memo. In this work we survey regression model for ontologies and air pollution.
Survival analysis of cervical cancer using stratified Cox regression
Purnami, S. W.; Inayati, K. D.; Sari, N. W. Wulan; Chosuvivatwong, V.; Sriplung, H.
2016-04-01
Cervical cancer is one of the mostly widely cancer cause of the women death in the world including Indonesia. Most cervical cancer patients come to the hospital already in an advanced stadium. As a result, the treatment of cervical cancer becomes more difficult and even can increase the death's risk. One of parameter that can be used to assess successfully of treatment is the probability of survival. This study raises the issue of cervical cancer survival patients at Dr. Soetomo Hospital using stratified Cox regression based on six factors such as age, stadium, treatment initiation, companion disease, complication, and anemia. Stratified Cox model is used because there is one independent variable that does not satisfy the proportional hazards assumption that is stadium. The results of the stratified Cox model show that the complication variable is significant factor which influent survival probability of cervical cancer patient. The obtained hazard ratio is 7.35. It means that cervical cancer patient who has complication is at risk of dying 7.35 times greater than patient who did not has complication. While the adjusted survival curves showed that stadium IV had the lowest probability of survival.
Meta-Analysis: An Introduction Using Regression Models
Rhodes, William
2012-01-01
Research synthesis of evaluation findings is a multistep process. An investigator identifies a research question, acquires the relevant literature, codes findings from that literature, and analyzes the coded data to estimate the average treatment effect and its distribution in a population of interest. The process of estimating the average…
Multiple Regression Analysis of mRNA-miRNA Associations in Colorectal Cancer Pathway
Fengfeng Wang
2014-01-01
Full Text Available Background. MicroRNA (miRNA is a short and endogenous RNA molecule that regulates posttranscriptional gene expression. It is an important factor for tumorigenesis of colorectal cancer (CRC, and a potential biomarker for diagnosis, prognosis, and therapy of CRC. Our objective is to identify the related miRNAs and their associations with genes frequently involved in CRC microsatellite instability (MSI and chromosomal instability (CIN signaling pathways. Results. A regression model was adopted to identify the significantly associated miRNAs targeting a set of candidate genes frequently involved in colorectal cancer MSI and CIN pathways. Multiple linear regression analysis was used to construct the model and find the significant mRNA-miRNA associations. We identified three significantly associated mRNA-miRNA pairs: BCL2 was positively associated with miR-16 and SMAD4 was positively associated with miR-567 in the CRC tissue, while MSH6 was positively associated with miR-142-5p in the normal tissue. As for the whole model, BCL2 and SMAD4 models were not significant, and MSH6 model was significant. The significant associations were different in the normal and the CRC tissues. Conclusion. Our results have laid down a solid foundation in exploration of novel CRC mechanisms, and identification of miRNA roles as oncomirs or tumor suppressor mirs in CRC.
Introduction to mixed modelling beyond regression and analysis of variance
Galwey, N W
2007-01-01
Mixed modelling is one of the most promising and exciting areas of statistical analysis, enabling more powerful interpretation of data through the recognition of random effects. However, many perceive mixed modelling as an intimidating and specialized technique.
A regression analysis on the green olives debittering
Kopsidas, Gerassimos C.
1991-12-01
Full Text Available In this paper, a regression model, which gives the debittering time t as a function of the sodium hydroxide concentration 0 and the debittering temperature T, at the debittering of medium size green olive fruit of the Conservolea variety, is fitted. This model has the simple form t=a_{o}C^{a1} ∙ e^{a2/T}, where a_{o}, a_{1}, and a_{2} are constants. The values of a_{o}, a_{1}, and a_{2} are determined by the method of least squares from a set of experimental data. The determined model is very satisfactory for the conditions in which Greek green olives are debittered.
En este artículo se ajusta un modelo de regresión, que da el tiempo de endulzamiento t en función de la concentración de hidróxido sódico C y la temperatura de endulzamiento T, en el endulzamiento de aceitunas verdes de tamaño mediano de la variedad Conservolea. Este modelo tiene la forma simple t=a_{o}C^{a1} ∙ e^{a2/T}, donde a_{1} y a_{2} son constantes. Los valores de a_{o}, a_{1}, y a_{2} son determinados por el método de los mínimos cuadrados a partir de un grupo de datos experimentales. El modelo determinado es muy satisfactorio para las condiciones en las que las aceitunas verdes griegas son endulzadas.
A Quality Assessment Tool for Non-Specialist Users of Regression Analysis
Argyrous, George
2015-01-01
This paper illustrates the use of a quality assessment tool for regression analysis. It is designed for non-specialist "consumers" of evidence, such as policy makers. The tool provides a series of questions such consumers of evidence can ask to interrogate regression analysis, and is illustrated with reference to a recent study published…
Mears, Lisa; Nørregaard, Rasmus; Sin, Gürkan;
2016-01-01
process operating at Novozymes A/S. Following the FUPCR methodology, the final product concentration could be predicted with an average prediction error of 7.4%. Multiple iterations of preprocessing were applied by implementing the methodology to identify the best data handling methods for the model....... It is shown that application of functional data analysis and the choice of variance scaling method have the greatest impact on the prediction accuracy. Considering the vast amount of batch process data continuously generated in industry, this methodology can potentially contribute as a tool to identify......This work proposes a methodology utilizing functional unfold principal component regression (FUPCR), for application to industrial batch process data as a process modeling and optimization tool. The methodology is applied to an industrial fermentation dataset, containing 30 batches of a production...
Methods of Detecting Outliers in A Regression Analysis Model ...
PROF. O. E. OSUAGWU
2013-06-01
Jun 1, 2013 ... Capacity), X2 (Design Pressure), X3 (Boiler Type), X4 (Drum Type) were used. The analysis of the ... 1.2 Identification Of Outliers. There is no such thing as a simple test. However, there are many ..... Psychological. Bulletin, 95 ...
A Noncentral "t" Regression Model for Meta-Analysis
Camilli, Gregory; de la Torre, Jimmy; Chiu, Chia-Yi
2010-01-01
In this article, three multilevel models for meta-analysis are examined. Hedges and Olkin suggested that effect sizes follow a noncentral "t" distribution and proposed several approximate methods. Raudenbush and Bryk further refined this model; however, this procedure is based on a normal approximation. In the current research literature, this…
An improved multiple linear regression and data analysis computer program package
Sidik, S. M.
1972-01-01
NEWRAP, an improved version of a previous multiple linear regression program called RAPIER, CREDUC, and CRSPLT, allows for a complete regression analysis including cross plots of the independent and dependent variables, correlation coefficients, regression coefficients, analysis of variance tables, t-statistics and their probability levels, rejection of independent variables, plots of residuals against the independent and dependent variables, and a canonical reduction of quadratic response functions useful in optimum seeking experimentation. A major improvement over RAPIER is that all regression calculations are done in double precision arithmetic.
Wiggins, Lisa D.; Rice, Catherine E.; Baio, Jon
2009-01-01
This study evaluated the phenomenon of autistic regression using population-based data. The sample comprised 285 children who met the autism spectrum disorder (ASD) case definition within an ongoing surveillance program. Results indicated that children with a previously documented ASD diagnosis had higher rates of autistic regression than children…
Random Decrement and Regression Analysis of Traffic Responses of Bridges
Asmussen, J. C.; Ibrahim, S. R.; Brincker, Rune
The topic of this paper is the estimation of modal parameters from ambient data by applying the Random Decrement technique. The data from the Queensborough Bridge over the Fraser River in Vancouver, Canada have been applied. The loads producing the dynamic response are ambient, e.g. wind, traffic...... and small ground motion. The Random Decrement technique is used to estimate the correlation function or the free decays from the ambient data. From these functions, the modal parameters are extracted using the Ibrahim Time Domain method. The possible influence of the traffic mass load on the bridge...... of the analysis using the Random Decrement technique are compared with results from an analysis based on fast Fourier transformations....
Random Decrement and Regression Analysis of Traffic Responses of Bridges
Asmussen, J. C.; Ibrahim, S. R.; Brincker, Rune
1996-01-01
The topic of this paper is the estimation of modal parameters from ambient data by applying the Random Decrement technique. The data fro the Queensborough Bridge over the Fraser River in Vancouver, Canada have been applied. The loads producing the dynamic response are ambient, e. g. wind, traffic...... and small ground motion. The random Decrement technique is used to estimate the correlation function or the free decays from the ambient data. From these functions, the modal parameters are extracted using the Ibrahim Time domain method. The possible influence of the traffic mass load on the bridge...... of the analysis using the Random decrement technique are compared with results from an analysis based on fast Fourier transformations....
Analysis of cost regression and post-accident absence
Wojciech, Drozd
2017-07-01
The article presents issues related with costs of work safety. It proves the thesis that economic aspects cannot be overlooked in effective management of occupational health and safety and that adequate expenditures on safety can bring tangible benefits to the company. Reliable analysis of this problem is essential for the description the problem of safety the work. In the article attempts to carry it out using the procedures of mathematical statistics [1, 2, 3].
The Use of Nonparametric Kernel Regression Methods in Econometric Production Analysis
Czekaj, Tomasz Gerard
This PhD thesis addresses one of the fundamental problems in applied econometric analysis, namely the econometric estimation of regression functions. The conventional approach to regression analysis is the parametric approach, which requires the researcher to specify the form of the regression...... to avoid this problem. The main objective is to investigate the applicability of the nonparametric kernel regression method in applied production analysis. The focus of the empirical analyses included in this thesis is the agricultural sector in Poland. Data on Polish farms are used to investigate...... practically and politically relevant problems and to illustrate how nonparametric regression methods can be used in applied microeconomic production analysis both in panel data and cross-section data settings. The thesis consists of four papers. The first paper addresses problems of parametric...
Measurement and Analysis of Test Suite Volume Metrics for Regression Testing
S Raju
2014-01-01
Full Text Available Regression testing intends to ensure that a software applications works as specified after changes made to it during maintenance. It is an important phase in software development lifecycle. Regression testing is the re-execution of some subset of test cases that has already been executed. It is an expensive process used to detect defects due to regressions. Regression testing has been used to support software-testing activities and assure acquiring an appropriate quality through several versions of a software product during its development and maintenance. Regression testing assures the quality of modified applications. In this proposed work, a study and analysis of metrics related to test suite volume was undertaken. It was shown that the software under test needs more test cases after changes were made to it. A comparative analysis was performed for finding the change in test suite size before and after the regression test.
Factor analysis identifies subgroups of constipation
Philip G Dinning; Mike Jones; Linda Hunt; Sergio E Fuentealba; Jamshid Kalanter; Denis W King; David Z Lubowski; Nicholas J Talley; Ian J Cook
2011-01-01
AIM: To determine whether distinct symptom groupings exist in a constipated population and whether such grouping might correlate with quantifiable pathophysiological measures of colonic dysfunction. METHODS: One hundred and ninety-one patients presenting to a Gastroenterology clinic with constipation and 32 constipated patients responding to a newspaper advertisement completed a 53-item, wide-ranging selfreport questionnaire. One hundred of these patients had colonic transit measured scintigraphically. Factor analysis determined whether constipation-related symptoms grouped into distinct aspects of symptomatology. Cluster analysis was used to determine whether individual patients naturally group into distinct subtypes. RESULTS: Cluster analysis yielded a 4 cluster solution with the presence or absence of pain and laxative unresponsiveness providing the main descriptors. Amongst all clusters there was a considerable proportion of patients with demonstrable delayed colon transit, irritable bowel syndrome positive criteria and regular stool frequency. The majority of patients with these characteristics also reported regular laxative use. CONCLUSION: Factor analysis identified four constipation subgroups, based on severity and laxative unresponsiveness, in a constipated population. However, clear stratification into clinically identifiable groups remains imprecise.
Selection of higher order regression models in the analysis of multi-factorial transcription data.
Olivia Prazeres da Costa
Full Text Available INTRODUCTION: Many studies examine gene expression data that has been obtained under the influence of multiple factors, such as genetic background, environmental conditions, or exposure to diseases. The interplay of multiple factors may lead to effect modification and confounding. Higher order linear regression models can account for these effects. We present a new methodology for linear model selection and apply it to microarray data of bone marrow-derived macrophages. This experiment investigates the influence of three variable factors: the genetic background of the mice from which the macrophages were obtained, Yersinia enterocolitica infection (two strains, and a mock control, and treatment/non-treatment with interferon-γ. RESULTS: We set up four different linear regression models in a hierarchical order. We introduce the eruption plot as a new practical tool for model selection complementary to global testing. It visually compares the size and significance of effect estimates between two nested models. Using this methodology we were able to select the most appropriate model by keeping only relevant factors showing additional explanatory power. Application to experimental data allowed us to qualify the interaction of factors as either neutral (no interaction, alleviating (co-occurring effects are weaker than expected from the single effects, or aggravating (stronger than expected. We find a biologically meaningful gene cluster of putative C2TA target genes that appear to be co-regulated with MHC class II genes. CONCLUSIONS: We introduced the eruption plot as a tool for visual model comparison to identify relevant higher order interactions in the analysis of expression data obtained under the influence of multiple factors. We conclude that model selection in higher order linear regression models should generally be performed for the analysis of multi-factorial microarray data.
Identifying marker typing incompatibilities in linkage analysis
Stringham, H.M.; Boehnke, M. [Univ. of Michigan, Ann Arbor, MI (United States)
1996-10-01
A common problem encountered in linkage analyses is that execution of the computer program is halted because of genotypes in the data that are inconsistent with Mendelian inheritance. Such inconsistencies may arise because of pedigree errors or errors in typing. In some cases, the source of the inconsistencies is easily identified by examining the pedigree. In others, the error is not obvious, and substantial time and effort are required to identify the responsible genotypes. We have developed two methods for automatically identifying those individuals whose genotypes are most likely the cause of the inconsistencies. First, we calculate the posterior probability of genotyping error for each member of the pedigree, given the marker data on all pedigree members and allowing anyone in the pedigree to have an error. Second, we identify those individuals whose genotypes could be solely responsible for the inconsistency in the pedigree. We illustrate these methods with two examples: one a pedigree error, the second a genotyping error. These methods have been implemented as a module of the pedigree analysis program package MENDEL. 9 refs., 2 figs., 2 tabs.
Robust analysis of trends in noisy tokamak confinement data using geodesic least squares regression
Verdoolaege, G.; Shabbir, A.; Hornung, G.
2016-11-01
Regression analysis is a very common activity in fusion science for unveiling trends and parametric dependencies, but it can be a difficult matter. We have recently developed the method of geodesic least squares (GLS) regression that is able to handle errors in all variables, is robust against data outliers and uncertainty in the regression model, and can be used with arbitrary distribution models and regression functions. We here report on first results of application of GLS to estimation of the multi-machine scaling law for the energy confinement time in tokamaks, demonstrating improved consistency of the GLS results compared to standard least squares.
Development of a User Interface for a Regression Analysis Software Tool
Ulbrich, Norbert Manfred; Volden, Thomas R.
2010-01-01
An easy-to -use user interface was implemented in a highly automated regression analysis tool. The user interface was developed from the start to run on computers that use the Windows, Macintosh, Linux, or UNIX operating system. Many user interface features were specifically designed such that a novice or inexperienced user can apply the regression analysis tool with confidence. Therefore, the user interface s design minimizes interactive input from the user. In addition, reasonable default combinations are assigned to those analysis settings that influence the outcome of the regression analysis. These default combinations will lead to a successful regression analysis result for most experimental data sets. The user interface comes in two versions. The text user interface version is used for the ongoing development of the regression analysis tool. The official release of the regression analysis tool, on the other hand, has a graphical user interface that is more efficient to use. This graphical user interface displays all input file names, output file names, and analysis settings for a specific software application mode on a single screen which makes it easier to generate reliable analysis results and to perform input parameter studies. An object-oriented approach was used for the development of the graphical user interface. This choice keeps future software maintenance costs to a reasonable limit. Examples of both the text user interface and graphical user interface are discussed in order to illustrate the user interface s overall design approach.
Denli, H. H.; Koc, Z.
2015-12-01
Estimation of real properties depending on standards is difficult to apply in time and location. Regression analysis construct mathematical models which describe or explain relationships that may exist between variables. The problem of identifying price differences of properties to obtain a price index can be converted into a regression problem, and standard techniques of regression analysis can be used to estimate the index. Considering regression analysis for real estate valuation, which are presented in real marketing process with its current characteristics and quantifiers, the method will help us to find the effective factors or variables in the formation of the value. In this study, prices of housing for sale in Zeytinburnu, a district in Istanbul, are associated with its characteristics to find a price index, based on information received from a real estate web page. The associated variables used for the analysis are age, size in m2, number of floors having the house, floor number of the estate and number of rooms. The price of the estate represents the dependent variable, whereas the rest are independent variables. Prices from 60 real estates have been used for the analysis. Same price valued locations have been found and plotted on the map and equivalence curves have been drawn identifying the same valued zones as lines.
Generalized multilevel function-on-scalar regression and principal component analysis.
Goldsmith, Jeff; Zipunnikov, Vadim; Schrack, Jennifer
2015-06-01
This manuscript considers regression models for generalized, multilevel functional responses: functions are generalized in that they follow an exponential family distribution and multilevel in that they are clustered within groups or subjects. This data structure is increasingly common across scientific domains and is exemplified by our motivating example, in which binary curves indicating physical activity or inactivity are observed for nearly 600 subjects over 5 days. We use a generalized linear model to incorporate scalar covariates into the mean structure, and decompose subject-specific and subject-day-specific deviations using multilevel functional principal components analysis. Thus, functional fixed effects are estimated while accounting for within-function and within-subject correlations, and major directions of variability within and between subjects are identified. Fixed effect coefficient functions and principal component basis functions are estimated using penalized splines; model parameters are estimated in a Bayesian framework using Stan, a programming language that implements a Hamiltonian Monte Carlo sampler. Simulations designed to mimic the application have good estimation and inferential properties with reasonable computation times for moderate datasets, in both cross-sectional and multilevel scenarios; code is publicly available. In the application we identify effects of age and BMI on the time-specific change in probability of being active over a 24-hour period; in addition, the principal components analysis identifies the patterns of activity that distinguish subjects and days within subjects.
The Use of Nonparametric Kernel Regression Methods in Econometric Production Analysis
Czekaj, Tomasz Gerard
This PhD thesis addresses one of the fundamental problems in applied econometric analysis, namely the econometric estimation of regression functions. The conventional approach to regression analysis is the parametric approach, which requires the researcher to specify the form of the regression...... function. However, the a priori specification of a functional form involves the risk of choosing one that is not similar to the “true” but unknown relationship between the regressors and the dependent variable. This problem, known as parametric misspecification, can result in biased parameter estimates...... and nonparametric estimations of production functions in order to evaluate the optimal firm size. The second paper discusses the use of parametric and nonparametric regression methods to estimate panel data regression models. The third paper analyses production risk, price uncertainty, and farmers' risk preferences...
Telmo, C; Lousada, J; Moreira, N
2010-06-01
The gross calorific value (GCV), proximate, ultimate and chemical analysis of debark wood in Portugal were studied, for future utilization in wood pellets industry and the results compared with CEN/TS 14961. The relationship between GCV, ultimate and chemical analysis were determined by multiple regression stepwise backward. The treatment between hardwoods-softwoods did not result in significant statistical differences for proximate, ultimate and chemical analysis. Significant statistical differences were found in carbon for National (hardwoods-softwoods) and (National-tropical) hardwoods in volatile matter, fixed carbon, carbon and oxygen and also for chemical analysis in National (hardwoods-softwoods) for F and (National-tropical) hardwoods for Br. GCV was highly positively related to C (0.79 * * *) and negatively to O (-0.71 * * *). The final independent variables of the model were (C, O, S, Zn, Ni, Br) with R(2)=0.86; F=27.68 * * *. The hydrogen did not contribute statistically to the energy content.
Integrative analysis of multiple diverse omics datasets by sparse group multitask regression
Dongdong eLin
2014-10-01
Full Text Available A variety of high throughput genome-wide assays enable the exploration of genetic risk factors underlying complex traits. Although these studies have remarkable impact on identifying susceptible biomarkers, they suffer from issues such as limited sample size and low reproducibility. Combining individual studies of different genetic levels/platforms has the promise to improve the power and consistency of biomarker identification. In this paper, we propose a novel integrative method, namely sparse group multitask regression, for integrating diverse omics datasets, platforms and populations to identify risk genes/factors of complex diseases. This method combines multitask learning with sparse group regularization, which will: 1 treat the biomarker identification in each single study as a task and then combine them by multitask learning; 2 group variables from all studies for identifying significant genes; 3 enforce sparse constraint on groups of variables to overcome the ‘small sample, but large variables’ problem. We introduce two sparse group penalties: sparse group lasso and sparse group ridge in our multitask model, and provide an effective algorithm for each model. In addition, we propose a significance test for the identification of potential risk genes. Two simulation studies are performed to evaluate the performance of our integrative method by comparing it with conventional meta-analysis method. The results show that our sparse group multitask method outperforms meta-analysis method significantly. In an application to our osteoporosis studies, 7 genes are identified as significant genes by our method and are found to have significant effects in other three independent studies for validation. The most significant gene SOD2 has been identified in our previous osteoporosis study involving the same expression dataset. Several other genes such as TREML2, HTR1E and GLO1 are shown to be novel susceptible genes for osteoporosis, as confirmed
Gayou, Olivier; Das, Shiva K; Zhou, Su-Min; Marks, Lawrence B; Parda, David S; Miften, Moyed
2008-12-01
A given outcome of radiotherapy treatment can be modeled by analyzing its correlation with a combination of dosimetric, physiological, biological, and clinical factors, through a logistic regression fit of a large patient population. The quality of the fit is measured by the combination of the predictive power of this particular set of factors and the statistical significance of the individual factors in the model. We developed a genetic algorithm (GA), in which a small sample of all the possible combinations of variables are fitted to the patient data. New models are derived from the best models, through crossover and mutation operations, and are in turn fitted. The process is repeated until the sample converges to the combination of factors that best predicts the outcome. The GA was tested on a data set that investigated the incidence of lung injury in NSCLC patients treated with 3DCRT. The GA identified a model with two variables as the best predictor of radiation pneumonitis: the V30 (p=0.048) and the ongoing use of tobacco at the time of referral (p=0.074). This two-variable model was confirmed as the best model by analyzing all possible combinations of factors. In conclusion, genetic algorithms provide a reliable and fast way to select significant factors in logistic regression analysis of large clinical studies.
Accumulated feedlot manure negatively affects the environment. The objective was to test the validity of using EMI mapping methods combined with predictive-based sampling and ordinary linear regression for measuring spatially variable manure accumulation. A Dualem-1S EMI meter also recording GPS c...
Gelfusa, M; Gaudio, P; Malizia, A; Murari, A; Vega, J; Richetta, M; Gonzalez, S
2014-06-01
Recently, surveying large areas in an automatic way, for early detection of both harmful chemical agents and forest fires, has become a strategic objective of defence and public health organisations. The Lidar and Dial techniques are widely recognized as a cost-effective alternative to monitor large portions of the atmosphere. To maximize the effectiveness of the measurements and to guarantee reliable monitoring of large areas, new data analysis techniques are required. In this paper, an original tool, the Universal Multi Event Locator, is applied to the problem of automatically identifying the time location of peaks in Lidar and Dial measurements for environmental physics applications. This analysis technique improves various aspects of the measurements, ranging from the resilience to drift in the laser sources to the increase of the system sensitivity. The method is also fully general, purely software, and can therefore be applied to a large variety of problems without any additional cost. The potential of the proposed technique is exemplified with the help of data of various instruments acquired during several experimental campaigns in the field.
Das Sumonkanti; Rahman Rajwanur M
2011-01-01
Abstract Background The study attempts to develop an ordinal logistic regression (OLR) model to identify the determinants of child malnutrition instead of developing traditional binary logistic regression (BLR) model using the data of Bangladesh Demographic and Health Survey 2004. Methods Based on weight-for-age anthropometric index (Z-score) child nutrition status is categorized into three groups-severely undernourished (< -3.0), moderately undernourished (-3.0 to -2.01) and nourished (≥-2.0...
A factor analysis-multiple regression model for source apportionment of suspended particulate matter
Okamoto, Shin'ichi; Hayashi, Masayuki; Nakajima, Masaomi; Kainuma, Yasutaka; Shiozawa, Kiyoshige
A factor analysis-multiple regression (FA-MR) model has been used for a source apportionment study in the Tokyo metropolitan area. By a varimax rotated factor analysis, five source types could be identified: refuse incineration, soil and automobile, secondary particles, sea salt and steel mill. Quantitative estimations using the FA-MR model corresponded to the calculated contributing concentrations determined by using a weighted least-squares CMB model. However, the source type of refuse incineration identified by the FA-MR model was similar to that of biomass burning, rather than that produced by an incineration plant. The estimated contributions of sea salt and steel mill by the FA-MR model contained those of other sources, which have the same temporal variation of contributing concentrations. This symptom was caused by a multicollinearity problem. Although this result shows the limitation of the multivariate receptor model, it gives useful information concerning source types and their distribution by comparing with the results of the CMB model. In the Tokyo metropolitan area, the contributions from soil (including road dust), automobile, secondary particles and refuse incineration (biomass burning) were larger than industrial contributions: fuel oil combustion and steel mill. However, since vanadium is highly correlated with SO 42- and other secondary particle related elements, a major portion of secondary particles is considered to be related to fuel oil combustion.
Lamichhane, Archana P.; Liese, Angela D.; Urbina, Elaine M.; Crandell, Jamie L.; Jaacks, Lindsay M.; Dabelea, Dana; Black, Mary Helen; Merchant, Anwar T.; Mayer-Davis, Elizabeth J.
2014-01-01
BACKGROUND/OBJECTIVES Youth with type 1 diabetes (T1DM) are at substantially increased risk for adverse vascular outcomes, but little is known about the influence of dietary behavior on cardiovascular disease (CVD) risk profile. We aimed to identify dietary intake patterns associated with CVD risk factors and evaluate their impact on arterial stiffness (AS) measures collected thereafter in a cohort of youth with T1DM. SUBJECTS/METHODS Baseline diet data from a food frequency questionnaire and CVD risk factors (triglycerides, LDL-cholesterol, systolic BP, HbA1c, C-reactive protein and waist circumference) were available for 1,153 youth aged ≥10 years with T1DM from the SEARCH for Diabetes in Youth Study. A dietary intake pattern was identified using 33 food-groups as predictors and six CVD risk factors as responses in reduced rank regression (RRR) analysis. Associations of this RRR-derived dietary pattern with AS measures [augmentation index(AIx75), n=229; pulse wave velocity(PWV), n=237; and brachial distensibility(BrachD), n=228] were then assessed using linear regression. RESULTS The RRR-derived pattern was characterized by high intakes of sugar-sweetened beverages (SSB) and diet soda, eggs, potatoes and high-fat meats, and low intakes of sweets/desserts and low-fat dairy; major contributors were SSB and diet soda. This pattern captured the largest variability in adverse CVD risk profile and was subsequently associated with AIx75 (β=0.47; p<0.01). The mean difference in AIx75 concentration between the highest and the lowest dietary pattern quartiles was 4.3% in fully adjusted model. CONCLUSIONS Intervention strategies to reduce consumption of unhealthful foods and beverages among youth with T1DM may significantly improve CVD risk profile and ultimately reduce the risk for AS. PMID:24865480
Detecting overdispersion in count data: A zero-inflated Poisson regression analysis
Afiqah Muhamad Jamil, Siti; Asrul Affendi Abdullah, M.; Kek, Sie Long; Nor, Maria Elena; Mohamed, Maryati; Ismail, Norradihah
2017-09-01
This study focusing on analysing count data of butterflies communities in Jasin, Melaka. In analysing count dependent variable, the Poisson regression model has been known as a benchmark model for regression analysis. Continuing from the previous literature that used Poisson regression analysis, this study comprising the used of zero-inflated Poisson (ZIP) regression analysis to gain acute precision on analysing the count data of butterfly communities in Jasin, Melaka. On the other hands, Poisson regression should be abandoned in the favour of count data models, which are capable of taking into account the extra zeros explicitly. By far, one of the most popular models include ZIP regression model. The data of butterfly communities which had been called as the number of subjects in this study had been taken in Jasin, Melaka and consisted of 131 number of subjects visits Jasin, Melaka. Since the researchers are considering the number of subjects, this data set consists of five families of butterfly and represent the five variables involve in the analysis which are the types of subjects. Besides, the analysis of ZIP used the SAS procedure of overdispersion in analysing zeros value and the main purpose of continuing the previous study is to compare which models would be better than when exists zero values for the observation of the count data. The analysis used AIC, BIC and Voung test of 5% level significance in order to achieve the objectives. The finding indicates that there is a presence of over-dispersion in analysing zero value. The ZIP regression model is better than Poisson regression model when zero values exist.
A primer for biomedical scientists on how to execute model II linear regression analysis.
Ludbrook, John
2012-04-01
1. There are two very different ways of executing linear regression analysis. One is Model I, when the x-values are fixed by the experimenter. The other is Model II, in which the x-values are free to vary and are subject to error. 2. I have received numerous complaints from biomedical scientists that they have great difficulty in executing Model II linear regression analysis. This may explain the results of a Google Scholar search, which showed that the authors of articles in journals of physiology, pharmacology and biochemistry rarely use Model II regression analysis. 3. I repeat my previous arguments in favour of using least products linear regression analysis for Model II regressions. I review three methods for executing ordinary least products (OLP) and weighted least products (WLP) regression analysis: (i) scientific calculator and/or computer spreadsheet; (ii) specific purpose computer programs; and (iii) general purpose computer programs. 4. Using a scientific calculator and/or computer spreadsheet, it is easy to obtain correct values for OLP slope and intercept, but the corresponding 95% confidence intervals (CI) are inaccurate. 5. Using specific purpose computer programs, the freeware computer program smatr gives the correct OLP regression coefficients and obtains 95% CI by bootstrapping. In addition, smatr can be used to compare the slopes of OLP lines. 6. When using general purpose computer programs, I recommend the commercial programs systat and Statistica for those who regularly undertake linear regression analysis and I give step-by-step instructions in the Supplementary Information as to how to use loss functions.
Wang, Ming; Flanders, W Dana; Bostick, Roberd M; Long, Qi
2012-12-20
Measurement error is common in epidemiological and biomedical studies. When biomarkers are measured in batches or groups, measurement error is potentially correlated within each batch or group. In regression analysis, most existing methods are not applicable in the presence of batch-specific measurement error in predictors. We propose a robust conditional likelihood approach to account for batch-specific error in predictors when batch effect is additive and the predominant source of error, which requires no assumptions on the distribution of measurement error. Although a regression model with batch as a categorical covariable yields the same parameter estimates as the proposed conditional likelihood approach for linear regression, this result does not hold in general for all generalized linear models, in particular, logistic regression. Our simulation studies show that the conditional likelihood approach achieves better finite sample performance than the regression calibration approach or a naive approach without adjustment for measurement error. In the case of logistic regression, our proposed approach is shown to also outperform the regression approach with batch as a categorical covariate. In addition, we also examine a 'hybrid' approach combining the conditional likelihood method and the regression calibration method, which is shown in simulations to achieve good performance in the presence of both batch-specific and measurement-specific errors. We illustrate our method by using data from a colorectal adenoma study.
Deng, Yangyang; Parajuli, Prem B.
2011-08-10
Evaluation of economic feasibility of a bio-gasification facility needs understanding of its unit cost under different production capacities. The objective of this study was to evaluate the unit cost of syngas production at capacities from 60 through 1800Nm 3/h using an economic model with three regression analysis techniques (simple regression, reciprocal regression, and log-log regression). The preliminary result of this study showed that reciprocal regression analysis technique had the best fit curve between per unit cost and production capacity, with sum of error squares (SES) lower than 0.001 and coefficient of determination of (R 2) 0.996. The regression analysis techniques determined the minimum unit cost of syngas production for micro-scale bio-gasification facilities of $0.052/Nm 3, under the capacity of 2,880 Nm 3/h. The results of this study suggest that to reduce cost, facilities should run at a high production capacity. In addition, the contribution of this technique could be the new categorical criterion to evaluate micro-scale bio-gasification facility from the perspective of economic analysis.
Regression analysis understanding and building business and economic models using Excel
Wilson, J Holton
2012-01-01
The technique of regression analysis is used so often in business and economics today that an understanding of its use is necessary for almost everyone engaged in the field. This book will teach you the essential elements of building and understanding regression models in a business/economic context in an intuitive manner. The authors take a non-theoretical treatment that is accessible even if you have a limited statistical background. It is specifically designed to teach the correct use of regression, while advising you of its limitations and teaching about common pitfalls. This book describe
Zhang, Hong-guang; Lu, Jian-gang
2016-02-01
Abstract To overcome the problems of significant difference among samples and nonlinearity between the property and spectra of samples in spectral quantitative analysis, a local regression algorithm is proposed in this paper. In this algorithm, net signal analysis method(NAS) was firstly used to obtain the net analyte signal of the calibration samples and unknown samples, then the Euclidean distance between net analyte signal of the sample and net analyte signal of calibration samples was calculated and utilized as similarity index. According to the defined similarity index, the local calibration sets were individually selected for each unknown sample. Finally, a local PLS regression model was built on each local calibration sets for each unknown sample. The proposed method was applied to a set of near infrared spectra of meat samples. The results demonstrate that the prediction precision and model complexity of the proposed method are superior to global PLS regression method and conventional local regression algorithm based on spectral Euclidean distance.
Jacob, Benjamin G; Novak, Robert J; Toe, Laurent; Sanfo, Moussa S; Afriyie, Abena N; Ibrahim, Mohammed A; Griffith, Daniel A; Unnasch, Thomas R
2012-01-01
The standard methods for regression analyses of clustered riverine larval habitat data of Simulium damnosum s.l. a major black-fly vector of Onchoceriasis, postulate models relating observational ecological-sampled parameter estimators to prolific habitats without accounting for residual intra-cluster error correlation effects. Generally, this correlation comes from two sources: (1) the design of the random effects and their assumed covariance from the multiple levels within the regression model; and, (2) the correlation structure of the residuals. Unfortunately, inconspicuous errors in residual intra-cluster correlation estimates can overstate precision in forecasted S.damnosum s.l. riverine larval habitat explanatory attributes regardless how they are treated (e.g., independent, autoregressive, Toeplitz, etc). In this research, the geographical locations for multiple riverine-based S. damnosum s.l. larval ecosystem habitats sampled from 2 pre-established epidemiological sites in Togo were identified and recorded from July 2009 to June 2010. Initially the data was aggregated into proc genmod. An agglomerative hierarchical residual cluster-based analysis was then performed. The sampled clustered study site data was then analyzed for statistical correlations using Monthly Biting Rates (MBR). Euclidean distance measurements and terrain-related geomorphological statistics were then generated in ArcGIS. A digital overlay was then performed also in ArcGIS using the georeferenced ground coordinates of high and low density clusters stratified by Annual Biting Rates (ABR). This data was overlain onto multitemporal sub-meter pixel resolution satellite data (i.e., QuickBird 0.61m wavbands ). Orthogonal spatial filter eigenvectors were then generated in SAS/GIS. Univariate and non-linear regression-based models (i.e., Logistic, Poisson and Negative Binomial) were also employed to determine probability distributions and to identify statistically significant parameter
Quantile regression for the statistical analysis of immunological data with many non-detects
Eilers Paul HC; Röder Esther; Savelkoul Huub FJ; van Wijk Roy
2012-01-01
Abstract Background Immunological parameters are hard to measure. A well-known problem is the occurrence of values below the detection limit, the non-detects. Non-detects are a nuisance, because classical statistical analyses, like ANOVA and regression, cannot be applied. The more advanced statistical techniques currently available for the analysis of datasets with non-detects can only be used if a small percentage of the data are non-detects. Methods and results Quantile regression, a genera...
Exploratory regression analysis: a tool for selecting models and determining predictor importance.
Braun, Michael T; Oswald, Frederick L
2011-06-01
Linear regression analysis is one of the most important tools in a researcher's toolbox for creating and testing predictive models. Although linear regression analysis indicates how strongly a set of predictor variables, taken together, will predict a relevant criterion (i.e., the multiple R), the analysis cannot indicate which predictors are the most important. Although there is no definitive or unambiguous method for establishing predictor variable importance, there are several accepted methods. This article reviews those methods for establishing predictor importance and provides a program (in Excel) for implementing them (available for direct download at http://dl.dropbox.com/u/2480715/ERA.xlsm?dl=1) . The program investigates all 2(p) - 1 submodels and produces several indices of predictor importance. This exploratory approach to linear regression, similar to other exploratory data analysis techniques, has the potential to yield both theoretical and practical benefits.
Frndak, Seth E; Smerbeck, Audrey M; Irwin, Lauren N; Drake, Allison S; Kordovski, Victoria M; Kunker, Katrina A; Khan, Anjum L; Benedict, Ralph H B
2016-10-01
We endeavored to clarify how distinct co-occurring symptoms relate to the presence of negative work events in employed multiple sclerosis (MS) patients. Latent profile analysis (LPA) was utilized to elucidate common disability patterns by isolating patient subpopulations. Samples of 272 employed MS patients and 209 healthy controls (HC) were administered neuroperformance tests of ambulation, hand dexterity, processing speed, and memory. Regression-based norms were created from the HC sample. LPA identified latent profiles using the regression-based z-scores. Finally, multinomial logistic regression tested for negative work event differences among the latent profiles. Four profiles were identified via LPA: a common profile (55%) characterized by slightly below average performance in all domains, a broadly low-performing profile (18%), a poor motor abilities profile with average cognition (17%), and a generally high-functioning profile (9%). Multinomial regression analysis revealed that the uniformly low-performing profile demonstrated a higher likelihood of reported negative work events. Employed MS patients with co-occurring motor, memory and processing speed impairments were most likely to report a negative work event, classifying them as uniquely at risk for job loss.
Suzuki, Hideaki; Tabata, Takahisa; Koizumi, Hiroki; Hohchi, Nobusuke; Takeuchi, Shoko; Kitamura, Takuro; Fujino, Yoshihisa; Ohbuchi, Toyoaki
2014-12-01
This study aimed to create a multiple regression model for predicting hearing outcomes of idiopathic sudden sensorineural hearing loss (ISSNHL). The participants were 205 consecutive patients (205 ears) with ISSNHL (hearing level ≥ 40 dB, interval between onset and treatment ≤ 30 days). They received systemic steroid administration combined with intratympanic steroid injection. Data were examined by simple and multiple regression analyses. Three hearing indices (percentage hearing improvement, hearing gain, and posttreatment hearing level [HLpost]) and 7 prognostic factors (age, days from onset to treatment, initial hearing level, initial hearing level at low frequencies, initial hearing level at high frequencies, presence of vertigo, and contralateral hearing level) were included in the multiple regression analysis as dependent and explanatory variables, respectively. In the simple regression analysis, the percentage hearing improvement, hearing gain, and HLpost showed significant correlation with 2, 5, and 6 of the 7 prognostic factors, respectively. The multiple correlation coefficients were 0.396, 0.503, and 0.714 for the percentage hearing improvement, hearing gain, and HLpost, respectively. Predicted values of HLpost calculated by the multiple regression equation were reliable with 70% probability with a 40-dB-width prediction interval. Prediction of HLpost by the multiple regression model may be useful to estimate the hearing prognosis of ISSNHL. © The Author(s) 2014.
Modeling of retardance in ferrofluid with Taguchi-based multiple regression analysis
Lin, Jing-Fung; Wu, Jyh-Shyang; Sheu, Jer-Jia
2015-03-01
The citric acid (CA) coated Fe3O4 ferrofluids are prepared by a co-precipitation method and the magneto-optical retardance property is measured by a Stokes polarimeter. Optimization and multiple regression of retardance in ferrofluids are executed by combining Taguchi method and Excel. From the nine tests for four parameters, including pH of suspension, molar ratio of CA to Fe3O4, volume of CA, and coating temperature, influence sequence and excellent program are found. Multiple regression analysis and F-test on the significance of regression equation are performed. It is found that the model F value is much larger than Fcritical and significance level P <0.0001. So it can be concluded that the regression model has statistically significant predictive ability. Substituting excellent program into equation, retardance is obtained as 32.703°, higher than the highest value in tests by 11.4%.
Risk Factors of Falls in Community-Dwelling Older Adults: Logistic Regression Tree Analysis
Yamashita, Takashi; Noe, Douglas A.; Bailer, A. John
2012-01-01
Purpose of the Study: A novel logistic regression tree-based method was applied to identify fall risk factors and possible interaction effects of those risk factors. Design and Methods: A nationally representative sample of American older adults aged 65 years and older (N = 9,592) in the Health and Retirement Study 2004 and 2006 modules was used.…
Risk Factors of Falls in Community-Dwelling Older Adults: Logistic Regression Tree Analysis
Yamashita, Takashi; Noe, Douglas A.; Bailer, A. John
2012-01-01
Purpose of the Study: A novel logistic regression tree-based method was applied to identify fall risk factors and possible interaction effects of those risk factors. Design and Methods: A nationally representative sample of American older adults aged 65 years and older (N = 9,592) in the Health and Retirement Study 2004 and 2006 modules was used.…
Trend Analysis of Cancer Mortality and Incidence in Panama, Using Joinpoint Regression Analysis
Politis, Michael; Higuera, Gladys; Chang, Lissette Raquel; Gomez, Beatriz; Bares, Juan; Motta, Jorge
2015-01-01
Abstract Cancer is one of the leading causes of death worldwide and its incidence is expected to increase in the future. In Panama, cancer is also one of the leading causes of death. In 1964, a nationwide cancer registry was started and it was restructured and improved in 2012. The aim of this study is to utilize Joinpoint regression analysis to study the trends of the incidence and mortality of cancer in Panama in the last decade. Cancer mortality was estimated from the Panamanian National Institute of Census and Statistics Registry for the period 2001 to 2011. Cancer incidence was estimated from the Panamanian National Cancer Registry for the period 2000 to 2009. The Joinpoint Regression Analysis program, version 4.0.4, was used to calculate trends by age-adjusted incidence and mortality rates for selected cancers. Overall, the trend of age-adjusted cancer mortality in Panama has declined over the last 10 years (−1.12% per year). The cancers for which there was a significant increase in the trend of mortality were female breast cancer and ovarian cancer; while the highest increases in incidence were shown for breast cancer, liver cancer, and prostate cancer. Significant decrease in the trend of mortality was evidenced for the following: prostate cancer, lung and bronchus cancer, and cervical cancer; with respect to incidence, only oral and pharynx cancer in both sexes had a significant decrease. Some cancers showed no significant trends in incidence or mortality. This study reveals contrasting trends in cancer incidence and mortality in Panama in the last decade. Although Panama is considered an upper middle income nation, this study demonstrates that some cancer mortality trends, like the ones seen in cervical and lung cancer, behave similarly to the ones seen in high income countries. In contrast, other types, like breast cancer, follow a pattern seen in countries undergoing a transition to a developed economy with its associated lifestyle, nutrition, and
Zhang, Kun; Huang, Feifei; Chen, Jie; Cai, Qingqing; Wang, Tong; Zou, Rong; Zuo, Zhiyi; Wang, Jingfeng; Huang, Hui
2014-11-01
Overweight and obesity are associated with adverse cardiovascular outcomes. However, the role of overweight and obesity in left ventricular hypertrophy (LVH) of hypertensive patients is controversial. The aim of the current meta-analysis was to evaluate the influence of overweight and obesity on LVH regression in the hypertensive population.Twenty-eight randomized controlled trials comprising 2403 hypertensive patients (mean age range: 43.8-66.7 years) were identified. Three groups were divided according to body mass index: normal weight, overweight, and obesity groups.Compared with the normal-weight group, LVH regression in the overweight and obesity groups was more obvious with less reduction of systolic blood pressure after antihypertensive therapies (P regressing LVH in overweight and obese hypertensive patients (19.27 g/m, 95% confidence interval [15.25, 23.29], P regression was found in 24-h ambulatory blood pressure monitoring (ABPM) group and in relatively young patients (40-60 years' old) group (P Overweight and obesity are independent risk factors for LVH in hypertensive patients. Intervention at an early age and monitoring by ABPM may facilitate therapy-induced LVH regression in overweight and obese hypertensive patients.
Barndorff-Nielsen, Ole Eiler; Shephard, N.
2004-01-01
This paper analyses multivariate high frequency financial data using realized covariation. We provide a new asymptotic distribution theory for standard methods such as regression, correlation analysis, and covariance. It will be based on a fixed interval of time (e.g., a day or week), allowing...... the number of high frequency returns during this period to go to infinity. Our analysis allows us to study how high frequency correlations, regressions, and covariances change through time. In particular we provide confidence intervals for each of these quantities....
Analysis of Functional Data with Focus on Multinomial Regression and Multilevel Data
Mousavi, Seyed Nourollah
Functional data analysis (FDA) is a fast growing area in statistical research with increasingly diverse range of application from economics, medicine, agriculture, chemometrics, etc. Functional regression is an area of FDA which has received the most attention both in aspects of application...... and methodological development. Our main Functional data analysis (FDA) is a fast growing area in statistical research with increasingly diverse range of application from economics, medicine, agriculture, chemometrics, etc. Functional regression is an area of FDA which has received the most attention both in aspects...
Factor analysis and multiple regression between topography and precipitation on Jeju Island, Korea
Um, Myoung-Jin; Yun, Hyeseon; Jeong, Chang-Sam; Heo, Jun-Haeng
2011-11-01
SummaryIn this study, new factors that influence precipitation were extracted from geographic variables using factor analysis, which allow for an accurate estimation of orographic precipitation. Correlation analysis was also used to examine the relationship between nine topographic variables from digital elevation models (DEMs) and the precipitation in Jeju Island. In addition, a spatial analysis was performed in order to verify the validity of the regression model. From the results of the correlation analysis, it was found that all of the topographic variables had a positive correlation with the precipitation. The relations between the variables also changed in accordance with a change in the precipitation duration. However, upon examining the correlation matrix, no significant relationship between the latitude and the aspect was found. According to the factor analysis, eight topographic variables (latitude being the exception) were found to have a direct influence on the precipitation. Three factors were then extracted from the eight topographic variables. By directly comparing the multiple regression model with the factors (model 1) to the multiple regression model with the topographic variables (model 3), it was found that model 1 did not violate the limits of statistical significance and multicollinearity. As such, model 1 was considered to be appropriate for estimating the precipitation when taking into account the topography. In the study of model 1, the multiple regression model using factor analysis was found to be the best method for estimating the orographic precipitation on Jeju Island.
A general framework for the use of logistic regression models in meta-analysis.
Simmonds, Mark C; Higgins, Julian Pt
2016-12-01
Where individual participant data are available for every randomised trial in a meta-analysis of dichotomous event outcomes, "one-stage" random-effects logistic regression models have been proposed as a way to analyse these data. Such models can also be used even when individual participant data are not available and we have only summary contingency table data. One benefit of this one-stage regression model over conventional meta-analysis methods is that it maximises the correct binomial likelihood for the data and so does not require the common assumption that effect estimates are normally distributed. A second benefit of using this model is that it may be applied, with only minor modification, in a range of meta-analytic scenarios, including meta-regression, network meta-analyses and meta-analyses of diagnostic test accuracy. This single model can potentially replace the variety of often complex methods used in these areas. This paper considers, with a range of meta-analysis examples, how random-effects logistic regression models may be used in a number of different types of meta-analyses. This one-stage approach is compared with widely used meta-analysis methods including Bayesian network meta-analysis and the bivariate and hierarchical summary receiver operating characteristic (ROC) models for meta-analyses of diagnostic test accuracy.
Lv, Jianshu; Liu, Yang; Zhang, Zulu; Dai, Jierui
2013-10-15
The knowledge about spatial variations of heavy metals in soils and their relationships with environmental factors is important for human impact assessment and soil management. Surface soils from Rizhao city, Eastern China with rapid urbanization and industrialization were analyzed for six key heavy metals and characterized by parent material and land use using GIS-based data. Factorial kriging analysis and stepwise multiple regression were applied to examine the scale-dependent relationships among heavy metals and to identify environmental factors affecting spatial variability at each spatial scale. Linear model of coregionalization fitting showed that spatial multi-scale variation of heavy metals in soils consisted of nugget effect, an exponential structure with the range of 12 km (short-range scale), as well as a spherical structure with the range of 36 km (long-range scale). The short-range variation of Cd, Pb and Zn were controlled by land use, with higher values in urban areas as well as cultivated land in mountain area, and were related to human influence; while parent material dominated the long structure variations of these elements. Spatial variations of Cr and Ni were associated with natural geochemical sources at short- and long-range scales. At both two scales, Hg dominated by land use, corresponded well to spatial distributions of urban areas, and was attributed to anthropic emissions and atmosphere deposition.
Baghi, Q; Bergé, J; Christophe, B; Touboul, P; Rodrigues, M
2015-01-01
The analysis of physical measurements often copes with highly correlated noises and interruptions caused by outliers, saturation events or transmission losses. We assess the impact of missing data on the performance of linear regression analysis involving the fit of modeled or measured time series. We show that data gaps can significantly alter the precision of the regression parameter estimation in the presence of colored noise, due to the frequency leakage of the noise power. We present a regression method which cancels this effect and estimates the parameters of interest with a precision comparable to the complete data case, even if the noise power spectral density (PSD) is not known a priori. The method is based on an autoregressive (AR) fit of the noise, which allows us to build an approximate generalized least squares estimator approaching the minimal variance bound. The method, which can be applied to any similar data processing, is tested on simulated measurements of the MICROSCOPE space mission, whos...
Shi, K-Q; Zhou, Y-Y; Yan, H-D; Li, H; Wu, F-L; Xie, Y-Y; Braddock, M; Lin, X-Y; Zheng, M-H
2017-02-01
At present, there is no ideal model for predicting the short-term outcome of patients with acute-on-chronic hepatitis B liver failure (ACHBLF). This study aimed to establish and validate a prognostic model by using the classification and regression tree (CART) analysis. A total of 1047 patients from two separate medical centres with suspected ACHBLF were screened in the study, which were recognized as derivation cohort and validation cohort, respectively. CART analysis was applied to predict the 3-month mortality of patients with ACHBLF. The accuracy of the CART model was tested using the area under the receiver operating characteristic curve, which was compared with the model for end-stage liver disease (MELD) score and a new logistic regression model. CART analysis identified four variables as prognostic factors of ACHBLF: total bilirubin, age, serum sodium and INR, and three distinct risk groups: low risk (4.2%), intermediate risk (30.2%-53.2%) and high risk (81.4%-96.9%). The new logistic regression model was constructed with four independent factors, including age, total bilirubin, serum sodium and prothrombin activity by multivariate logistic regression analysis. The performances of the CART model (0.896), similar to the logistic regression model (0.914, P=.382), exceeded that of MELD score (0.667, P<.001). The results were confirmed in the validation cohort. We have developed and validated a novel CART model superior to MELD for predicting three-month mortality of patients with ACHBLF. Thus, the CART model could facilitate medical decision-making and provide clinicians with a validated practical bedside tool for ACHBLF risk stratification.
Spline Nonparametric Regression Analysis of Stress-Strain Curve of Confined Concrete
Tavio Tavio
2008-01-01
Full Text Available Due to enormous uncertainties in confinement models associated with the maximum compressive strength and ductility of concrete confined by rectilinear ties, the implementation of spline nonparametric regression analysis is proposed herein as an alternative approach. The statistical evaluation is carried out based on 128 large-scale column specimens of either normal-or high-strength concrete tested under uniaxial compression. The main advantage of this kind of analysis is that it can be applied when the trend of relation between predictor and response variables are not obvious. The error in the analysis can, therefore, be minimized so that it does not depend on the assumption of a particular shape of the curve. This provides higher flexibility in the application. The results of the statistical analysis indicates that the stress-strain curves of confined concrete obtained from the spline nonparametric regression analysis proves to be in good agreement with the experimental curves available in literatures
Uchimoto, Takeaki; Iwao, Yasunori; Hattori, Hiroaki; Noguchi, Shuji; Itai, Shigeru
2013-01-01
The interaction of the effects of the triglycerin full behenate (TR-FB) concentration and the mixing time on lubrication and tablet properties were analyzed under a two-factor central composite design, and compared with those of magnesium stearate (Mg-St). Various amounts of lubricant (0.07-3.0%) were added to granules and mixed for 1-30 min. A multiple linear regression analysis was performed to identify the effect of the mixing conditions on each physicochemical property. The mixing conditions did not significantly affect the lubrication properties of TR-FB. For tablet properties, tensile strength decreased and disintegration time increased when the lubricant concentration and the mixing time were increased for Mg-St. The direct interaction of the Mg-St concentration and the mixing time had a significant negative effect on the disintegration time. In contrast, any mixing conditions of TR-FB did not affect the tablet properties. In addition, the range of mixing conditions which satisfied the lubrication and tablet property criteria was broader for TR-FB than that for Mg-St, suggesting that TR-FB allows tablets with high quality attributes to be produced consistently. Therefore, TR-FB is a potential lubricant alternative to Mg-St.
Cronk, Ryan; Bartram, Jamie
2017-09-21
Sufficient, safe, and continuously available water services are important for human development and health yet many water systems in low- and middle-income countries are nonfunctional. Monitoring data were analyzed using regression and Bayesian networks (BNs) to explore factors influencing the functionality of 82 503 water systems in Nigeria and Tanzania. Functionality varied by system type. In Tanzania, Nira handpumps were more functional than Afridev and India Mark II handpumps. Higher functionality was associated with fee collection in Nigeria. In Tanzania, functionality was higher if fees were collected monthly rather than in response to system breakdown. Systems in Nigeria were more likely to be functional if they were used for both human and livestock consumption. In Tanzania, systems managed by private operators were more functional than community-managed systems. The BNs found strong dependencies between functionality and system type and administrative unit (e.g., district). The BNs predicted functionality increased from 68% to 89% in Nigeria and from 53% to 68% in Tanzania when best observed conditions were in place. Improvements to water system monitoring and analysis of monitoring data with different modeling techniques may be useful for identifying water service improvement opportunities and informing evidence-based decision-making for better management, policy, programming, and practice.
Family background variables as instruments for education in income regressions: A Bayesian analysis
L.F. Hoogerheide (Lennart); J.H. Block (Jörn); A.R. Thurik (Roy)
2012-01-01
textabstractThe validity of family background variables instrumenting education in income regressions has been much criticized. In this paper, we use data from the 2004 German Socio-Economic Panel and Bayesian analysis to analyze to what degree violations of the strict validity assumption affect the
A systematic review and meta-regression analysis of mivacurium for tracheal intubation
Vanlinthout, L.E.H.; Mesfin, S.H.; Hens, N.; Vanacker, B.F.; Robertson, E.N.; Booij, L.H.D.J.
2014-01-01
We systematically reviewed factors associated with intubation conditions in randomised controlled trials of mivacurium, using random-effects meta-regression analysis. We included 29 studies of 1050 healthy participants. Four factors explained 72.9% of the variation in the probability of excellent in
Schulz, Wolfram
Differences in student knowledge about democracy, institutions, and citizenship and students skills in interpreting political communication were studied through multilevel regression analysis of results from the second International Education Association (IEA) Study. This study provides data on 14-year-old students from 28 countries in Europe,…
Multiple Logistic Regression Analysis of Cigarette Use among High School Students
Adwere-Boamah, Joseph
2011-01-01
A binary logistic regression analysis was performed to predict high school students' cigarette smoking behavior from selected predictors from 2009 CDC Youth Risk Behavior Surveillance Survey. The specific target student behavior of interest was frequent cigarette use. Five predictor variables included in the model were: a) race, b) frequency of…
Bates, Reid A.; Holton, Elwood F., III; Burnett, Michael F.
1999-01-01
A case study of learning transfer demonstrates the possible effect of influential observation on linear regression analysis. A diagnostic method that tests for violation of assumptions, multicollinearity, and individual and multiple influential observations helps determine which observation to delete to eliminate bias. (SK)
Declining Bias and Gender Wage Discrimination? A Meta-Regression Analysis
Jarrell, Stephen B.; Stanley, T. D.
2004-01-01
The meta-regression analysis reveals that there is a strong tendency for discrimination estimates to fall and wage discrimination exist against the woman. The biasing effect of researchers' gender of not correcting for selection bias has weakened and changes in labor market have made it less important.
Identifiable Data Files - Medicare Provider Analysis and ...
U.S. Department of Health & Human Services — The Medicare Provider Analysis and Review (MEDPAR) File contains data from claims for services provided to beneficiaries admitted to Medicare certified inpatient...
Identifying Proper Names Based on Association Analysis
无
2007-01-01
The issue of proper names recognition in Chinese text was discussed. An automatic approach based on association analysis to extract rules from corpus was presented. The method tries to discover rules relevant to external evidence by association analysis, without additional manual effort. These rules can be used to recognize the proper nouns in Chinese texts. The experimental result shows that our method is practical in some applications.Moreover, the method is language independent.
Expert Involvement Predicts mHealth App Downloads: Multivariate Regression Analysis of Urology Apps
Osório, Luís; Cavadas, Vitor; Fraga, Avelino; Carrasquinho, Eduardo; Cardoso de Oliveira, Eduardo; Castelo-Branco, Miguel; Roobol, Monique J
2016-01-01
Background Urological mobile medical (mHealth) apps are gaining popularity with both clinicians and patients. mHealth is a rapidly evolving and heterogeneous field, with some urology apps being downloaded over 10,000 times and others not at all. The factors that contribute to medical app downloads have yet to be identified, including the hypothetical influence of expert involvement in app development. Objective The objective of our study was to identify predictors of the number of urology app downloads. Methods We reviewed urology apps available in the Google Play Store and collected publicly available data. Multivariate ordinal logistic regression evaluated the effect of publicly available app variables on the number of apps being downloaded. Results Of 129 urology apps eligible for study, only 2 (1.6%) had >10,000 downloads, with half having ≤100 downloads and 4 (3.1%) having none at all. Apps developed with expert urologist involvement (P=.003), optional in-app purchases (P=.01), higher user rating (P<.001), and more user reviews (P<.001) were more likely to be installed. App cost was inversely related to the number of downloads (P<.001). Only data from the Google Play Store and the developers’ websites, but not other platforms, were publicly available for analysis, and the level and nature of expert involvement was not documented. Conclusions The explicit participation of urologists in app development is likely to enhance its chances to have a higher number of downloads. This finding should help in the design of better apps and further promote urologist involvement in mHealth. Official certification processes are required to ensure app quality and user safety. PMID:27421338
Analysis of designed experiments by stabilised PLS Regression and jack-knifing
Martens, Harald; Høy, M.; Westad, F.
2001-01-01
Pragmatical, visually oriented methods for assessing and optimising bi-linear regression models are described, and applied to PLS Regression (PLSR) analysis of multi-response data from controlled experiments. The paper outlines some ways to stabilise the PLSR method to extend its range...... the reliability of the linear and bi-linear model parameter estimates. The paper illustrates how the obtained PLSR "significance" probabilities are similar to those from conventional factorial ANOVA, but the PLSR is shown to give important additional overview plots of the main relevant structures in the multi...
Automatic regression analysis for use in a complex system of evaluation of plant genetic resources
Cs. ARKOSSY
1984-08-01
Full Text Available In accordance with the general requirements regarding computerization in gene banks and germplasm research a computer program has been compiled for the analysis of univariate response in crop germplasm evaluation. The program is compiled in COBOL and run on a FELIX C-256 computer. The different modules of the program allows for: (1. data control and error listing; (2 computation of the regression function; (3 listing of the difference between the values measured and computed; (4 sorting of the individuals samples; (5 construction of scattergrams in two dimensions for measured values with the simultaneous representation of the regression line; (6 listing of examined samples in a sequence required in evaluation.
Methods and applications of linear models regression and the analysis of variance
Hocking, Ronald R
2013-01-01
Praise for the Second Edition"An essential desktop reference book . . . it should definitely be on your bookshelf." -Technometrics A thoroughly updated book, Methods and Applications of Linear Models: Regression and the Analysis of Variance, Third Edition features innovative approaches to understanding and working with models and theory of linear regression. The Third Edition provides readers with the necessary theoretical concepts, which are presented using intuitive ideas rather than complicated proofs, to describe the inference that is appropriate for the methods being discussed. The book
Wang, D Z; Wang, C; Shen, C F; Zhang, Y; Zhang, H; Song, G D; Xue, X D; Xu, Z L; Zhang, S; Jiang, G H
2017-05-10
We described the time trend of acute myocardial infarction (AMI) from 1999 to 2013 in Tianjin incidence rate with Cochran-Armitage trend (CAT) test and linear regression analysis, and the results were compared. Based on actual population, CAT test had much stronger statistical power than linear regression analysis for both overall incidence trend and age specific incidence trend (Cochran-Armitage trend P valuelinear regression P value). The statistical power of CAT test decreased, while the result of linear regression analysis remained the same when population size was reduced by 100 times and AMI incidence rate remained unchanged. The two statistical methods have their advantages and disadvantages. It is necessary to choose statistical method according the fitting degree of data, or comprehensively analyze the results of two methods.
Greensmith, David J
2014-01-01
Here I present an Excel based program for the analysis of intracellular Ca transients recorded using fluorescent indicators. The program can perform all the necessary steps which convert recorded raw voltage changes into meaningful physiological information. The program performs two fundamental processes. (1) It can prepare the raw signal by several methods. (2) It can then be used to analyze the prepared data to provide information such as absolute intracellular Ca levels. Also, the rates of change of Ca can be measured using multiple, simultaneous regression analysis. I demonstrate that this program performs equally well as commercially available software, but has numerous advantages, namely creating a simplified, self-contained analysis workflow.
Cirulli, N; Ballini, A; Cantore, S; Farronato, D; Inchingolo, F; Dipalma, G; Gatto, M R; Alessandri Bonetti, G
2015-01-01
Mixed dentition analysis forms a critical aspect of early orthodontic treatment. In fact an accurate space analysis is one of the important criteria in determining whether the treatment plan may involve serial extraction, guidance of eruption, space maintenance, space regaining or just periodic observation of the patients. The aim of the present study was to calculate linear regression equations in mixed dentition space analysis, measuring 230 dental casts mesiodistal tooth widths, obtained from southern Italian patients (118 females, 112 males, mean age 15±3 years). Students t-test or Wilcoxon test for independent and paired samples were used to determine right/left side and male/female differences. On the basis of the sum of the mesiodistal diameters of the 4 mandibular incisors as predictors for the sum of the widths of the canines and premolars in the mandibular mixed dentition, a new linear regression equation was found: y = 0.613x+7.294 (r= 0.701) for both genders in a southern Italian population. To better estimate the size of leeway space, a new regression equation was found to calculate the mesiodistal size of the second premolar using the sum of the four mandibular incisors, canine and first premolar as a predictor. The equation is y = 0.241x+1.224 (r= 0.732). In conclusion, new regression equations were derived for a southern Italian population.
Wen-Cheng Wang
2014-01-01
Full Text Available It is estimated that mainland Chinese tourists travelling to Taiwan can bring annual revenues of 400 billion NTD to the Taiwan economy. Thus, how the Taiwanese Government formulates relevant measures to satisfy both sides is the focus of most concern. Taiwan must improve the facilities and service quality of its tourism industry so as to attract more mainland tourists. This paper conducted a questionnaire survey of mainland tourists and used grey relational analysis in grey mathematics to analyze the satisfaction performance of all satisfaction question items. The first eight satisfaction items were used as independent variables, and the overall satisfaction performance was used as a dependent variable for quantile regression model analysis to discuss the relationship between the dependent variable under different quantiles and independent variables. Finally, this study further discussed the predictive accuracy of the least mean regression model and each quantile regression model, as a reference for research personnel. The analysis results showed that other variables could also affect the overall satisfaction performance of mainland tourists, in addition to occupation and age. The overall predictive accuracy of quantile regression model Q0.25 was higher than that of the other three models.
Wang, Wen-Cheng; Cho, Wen-Chien; Chen, Yin-Jen
2014-01-01
It is estimated that mainland Chinese tourists travelling to Taiwan can bring annual revenues of 400 billion NTD to the Taiwan economy. Thus, how the Taiwanese Government formulates relevant measures to satisfy both sides is the focus of most concern. Taiwan must improve the facilities and service quality of its tourism industry so as to attract more mainland tourists. This paper conducted a questionnaire survey of mainland tourists and used grey relational analysis in grey mathematics to analyze the satisfaction performance of all satisfaction question items. The first eight satisfaction items were used as independent variables, and the overall satisfaction performance was used as a dependent variable for quantile regression model analysis to discuss the relationship between the dependent variable under different quantiles and independent variables. Finally, this study further discussed the predictive accuracy of the least mean regression model and each quantile regression model, as a reference for research personnel. The analysis results showed that other variables could also affect the overall satisfaction performance of mainland tourists, in addition to occupation and age. The overall predictive accuracy of quantile regression model Q0.25 was higher than that of the other three models. PMID:24574916
Ussery, David; Bohlin, Jon; Skjerve, Eystein
2009-01-01
Recently there has been an explosion in the availability of bacterial genomic sequences, making possible now an analysis of genomic signatures across more than 800 hundred different bacterial chromosomes, from a wide variety of environments. Using genomic signatures, we pair-wise compared 867...... different genomic DNA sequences, taken from chromosomes and plasmids more than 100,000 base-pairs in length. Hierarchical clustering was performed on the outcome of the comparisons before a multinomial regression model was fitted. The regression model included the cluster groups as the response variable...... AT content. Small improvements to the regression model, although significant, were also obtained by factors such as sequence size, habitat, growth temperature, selective pressure measured as oligonucleotide usage variance, and oxygen requirement.The statistics obtained using hierarchical clustering...
Bayesian Method of Moments (BMOM) Analysis of Mean and Regression Models
Zellner, Arnold
2008-01-01
A Bayesian method of moments/instrumental variable (BMOM/IV) approach is developed and applied in the analysis of the important mean and multiple regression models. Given a single set of data, it is shown how to obtain posterior and predictive moments without the use of likelihood functions, prior densities and Bayes' Theorem. The posterior and predictive moments, based on a few relatively weak assumptions, are then used to obtain maximum entropy densities for parameters, realized error terms and future values of variables. Posterior means for parameters and realized error terms are shown to be equal to certain well known estimates and rationalized in terms of quadratic loss functions. Conditional maxent posterior densities for means and regression coefficients given scale parameters are in the normal form while scale parameters' maxent densities are in the exponential form. Marginal densities for individual regression coefficients, realized error terms and future values are in the Laplace or double-exponenti...
Souadka Amine
2010-04-01
Full Text Available Abstract Background Incidence of liver hydatid cyst (LHC rupture ranged 15%-40% of all cases and most of them concern the bile duct tree. Patients with biliocystic communication (BCC had specific clinic and therapeutic aspect. The purpose of this study was to determine witch patients with LHC may develop BCC using classification and regression tree (CART analysis Methods A retrospective study of 672 patients with liver hydatid cyst treated at the surgery department "A" at Ibn Sina University Hospital, Rabat Morocco. Four-teen risk factors for BCC occurrence were entered into CART analysis to build an algorithm that can predict at the best way the occurrence of BCC. Results Incidence of BCC was 24.5%. Subgroups with high risk were patients with jaundice and thick pericyst risk at 73.2% and patients with thick pericyst, with no jaundice 36.5 years and younger with no past history of LHC risk at 40.5%. Our developed CART model has sensitivity at 39.6%, specificity at 93.3%, positive predictive value at 65.6%, a negative predictive value at 82.6% and accuracy of good classification at 80.1%. Discriminating ability of the model was good 82%. Conclusion we developed a simple classification tool to identify LHC patients with high risk BCC during a routine clinic visit (only on clinical history and examination followed by an ultrasonography. Predictive factors were based on pericyst aspect, jaundice, age, past history of liver hydatidosis and morphological Gharbi cyst aspect. We think that this classification can be useful with efficacy to direct patients at appropriated medical struct's.
Regression Analysis of Top of Descent Location for Idle-thrust Descents
Stell, Laurel; Bronsvoort, Jesper; McDonald, Greg
2013-01-01
In this paper, multiple regression analysis is used to model the top of descent (TOD) location of user-preferred descent trajectories computed by the flight management system (FMS) on over 1000 commercial flights into Melbourne, Australia. The independent variables cruise altitude, final altitude, cruise Mach, descent speed, wind, and engine type were also recorded or computed post-operations. Both first-order and second-order models are considered, where cross-validation, hypothesis testing, and additional analysis are used to compare models. This identifies the models that should give the smallest errors if used to predict TOD location for new data in the future. A model that is linear in TOD altitude, final altitude, descent speed, and wind gives an estimated standard deviation of 3.9 nmi for TOD location given the trajec- tory parameters, which means about 80% of predictions would have error less than 5 nmi in absolute value. This accuracy is better than demonstrated by other ground automation predictions using kinetic models. Furthermore, this approach would enable online learning of the model. Additional data or further knowl- edge of algorithms is necessary to conclude definitively that no second-order terms are appropriate. Possible applications of the linear model are described, including enabling arriving aircraft to fly optimized descents computed by the FMS even in congested airspace. In particular, a model for TOD location that is linear in the independent variables would enable decision support tool human-machine interfaces for which a kinetic approach would be computationally too slow.
Pineda, Silvia; Real, Francisco X; Kogevinas, Manolis; Carrato, Alfredo; Chanock, Stephen J; Malats, Núria; Van Steen, Kristel
2015-12-01
Omics data integration is becoming necessary to investigate the genomic mechanisms involved in complex diseases. During the integration process, many challenges arise such as data heterogeneity, the smaller number of individuals in comparison to the number of parameters, multicollinearity, and interpretation and validation of results due to their complexity and lack of knowledge about biological processes. To overcome some of these issues, innovative statistical approaches are being developed. In this work, we propose a permutation-based method to concomitantly assess significance and correct by multiple testing with the MaxT algorithm. This was applied with penalized regression methods (LASSO and ENET) when exploring relationships between common genetic variants, DNA methylation and gene expression measured in bladder tumor samples. The overall analysis flow consisted of three steps: (1) SNPs/CpGs were selected per each gene probe within 1Mb window upstream and downstream the gene; (2) LASSO and ENET were applied to assess the association between each expression probe and the selected SNPs/CpGs in three multivariable models (SNP, CPG, and Global models, the latter integrating SNPs and CPGs); and (3) the significance of each model was assessed using the permutation-based MaxT method. We identified 48 genes whose expression levels were significantly associated with both SNPs and CPGs. Importantly, 36 (75%) of them were replicated in an independent data set (TCGA) and the performance of the proposed method was checked with a simulation study. We further support our results with a biological interpretation based on an enrichment analysis. The approach we propose allows reducing computational time and is flexible and easy to implement when analyzing several types of omics data. Our results highlight the importance of integrating omics data by applying appropriate statistical strategies to discover new insights into the complex genetic mechanisms involved in disease
Identifying Peer Institutions Using Cluster Analysis
Boronico, Jess; Choksi, Shail S.
2012-01-01
The New York Institute of Technology's (NYIT) School of Management (SOM) wishes to develop a list of peer institutions for the purpose of benchmarking and monitoring/improving performance against other business schools. The procedure utilizes relevant criteria for the purpose of establishing this peer group by way of a cluster analysis. The…
Groping Toward Linear Regression Analysis: Newton's Analysis of Hipparchus' Equinox Observations
Belenkiy, Ari
2008-01-01
In 1700, Newton, in designing a new universal calendar contained in the manuscripts known as Yahuda MS 24 from Jewish National and University Library at Jerusalem and analyzed in our recent article in Notes & Records Royal Society (59 (3), Sept 2005, pp. 223-54), attempted to compute the length of the tropical year using the ancient equinox observations reported by a famous Greek astronomer Hipparchus of Rhodes, ten in number. Though Newton had a very thin sample of data, he obtained a tropical year only a few seconds longer than the correct length. The reason lies in Newton's application of a technique similar to modern regression analysis. Actually he wrote down the first of the two so-called "normal equations" known from the Ordinary Least Squares method. Newton also had a vague understanding of qualitative variables. This paper concludes by discussing open historico-astronomical problems related to the inclination of the Earth's axis of rotation. In particular, ignorance about the long-range variation...
Identifying MMORPG Bots: A Traffic Analysis Approach
Chen, Kuan-Ta; Jiang, Jhih-Wei; Huang, Polly; Chu, Hao-Hua; Lei, Chin-Laung; Chen, Wen-Chin
2008-12-01
Massively multiplayer online role playing games (MMORPGs) have become extremely popular among network gamers. Despite their success, one of MMORPG's greatest challenges is the increasing use of game bots, that is, autoplaying game clients. The use of game bots is considered unsportsmanlike and is therefore forbidden. To keep games in order, game police, played by actual human players, often patrol game zones and question suspicious players. This practice, however, is labor-intensive and ineffective. To address this problem, we analyze the traffic generated by human players versus game bots and propose general solutions to identify game bots. Taking Ragnarok Online as our subject, we study the traffic generated by human players and game bots. We find that their traffic is distinguishable by 1) the regularity in the release time of client commands, 2) the trend and magnitude of traffic burstiness in multiple time scales, and 3) the sensitivity to different network conditions. Based on these findings, we propose four strategies and two ensemble schemes to identify bots. Finally, we discuss the robustness of the proposed methods against countermeasures of bot developers, and consider a number of possible ways to manage the increasingly serious bot problem.
Identifying MMORPG Bots: A Traffic Analysis Approach
Wen-Chin Chen
2008-11-01
Full Text Available Massively multiplayer online role playing games (MMORPGs have become extremely popular among network gamers. Despite their success, one of MMORPG's greatest challenges is the increasing use of game bots, that is, autoplaying game clients. The use of game bots is considered unsportsmanlike and is therefore forbidden. To keep games in order, game police, played by actual human players, often patrol game zones and question suspicious players. This practice, however, is labor-intensive and ineffective. To address this problem, we analyze the traffic generated by human players versus game bots and propose general solutions to identify game bots. Taking Ragnarok Online as our subject, we study the traffic generated by human players and game bots. We find that their traffic is distinguishable by 1 the regularity in the release time of client commands, 2 the trend and magnitude of traffic burstiness in multiple time scales, and 3 the sensitivity to different network conditions. Based on these findings, we propose four strategies and two ensemble schemes to identify bots. Finally, we discuss the robustness of the proposed methods against countermeasures of bot developers, and consider a number of possible ways to manage the increasingly serious bot problem.
Identifying nonlinear biomechanical models by multicriteria analysis
Srdjevic, Zorica; Cveticanin, Livija
2012-02-01
In this study, the methodology developed by Srdjevic and Cveticanin (International Journal of Industrial Ergonomics 34 (2004) 307-318) for the nonbiased (objective) parameter identification of the linear biomechanical model exposed to vertical vibrations is extended to the identification of n-degree of freedom (DOF) nonlinear biomechanical models. The dynamic performance of the n-DOF nonlinear model is described in terms of response functions in the frequency domain, such as the driving-point mechanical impedance and seat-to-head transmissibility function. For randomly generated parameters of the model, nonlinear equations of motion are solved using the Runge-Kutta method. The appropriate data transformation from the time-to-frequency domain is performed by a discrete Fourier transformation. Squared deviations of the response functions from the target values are used as the model performance evaluation criteria, thus shifting the problem into the multicriteria framework. The objective weights of criteria are obtained by applying the Shannon entropy concept. The suggested methodology is programmed in Pascal and tested on a 4-DOF nonlinear lumped parameter biomechanical model. The identification process over the 2000 generated sets of parameters lasts less than 20 s. The model response obtained with the imbedded identified parameters correlates well with the target values, therefore, justifying the use of the underlying concept and the mathematical instruments and numerical tools applied. It should be noted that the identified nonlinear model has an improved accuracy of the biomechanical response compared to the accuracy of a linear model.
Multiple Regression Analysis of mRNA-miRNA Associations in Colorectal Cancer Pathway
Fengfeng Wang; S. C. Cesar Wong; Lawrence W. C. Chan; Cho, William C. S.; S. P. Yip; Yung, Benjamin Y. M.
2014-01-01
Background. MicroRNA (miRNA) is a short and endogenous RNA molecule that regulates posttranscriptional gene expression. It is an important factor for tumorigenesis of colorectal cancer (CRC), and a potential biomarker for diagnosis, prognosis, and therapy of CRC. Our objective is to identify the related miRNAs and their associations with genes frequently involved in CRC microsatellite instability (MSI) and chromosomal instability (CIN) signaling pathways. Results. A regression model was adopt...
Scarneciu, Camelia C; Sangeorzan, Livia; Rus, Horatiu; Scarneciu, Vlad D; Varciu, Mihai S; Andreescu, Oana; Scarneciu, Ioan
2017-01-01
This study aimed at assessing the incidence of pulmonary hypertension (PH) at newly diagnosed hyperthyroid patients and at finding a simple model showing the complex functional relation between pulmonary hypertension in hyperthyroidism and the factors causing it. The 53 hyperthyroid patients (H-group) were evaluated mainly by using an echocardiographical method and compared with 35 euthyroid (E-group) and 25 healthy people (C-group). In order to identify the factors causing pulmonary hypertension the statistical method of comparing the values of arithmetical means is used. The functional relation between the two random variables (PAPs and each of the factors determining it within our research study) can be expressed by linear or non-linear function. By applying the linear regression method described by a first-degree equation the line of regression (linear model) has been determined; by applying the non-linear regression method described by a second degree equation, a parabola-type curve of regression (non-linear or polynomial model) has been determined. We made the comparison and the validation of these two models by calculating the determination coefficient (criterion 1), the comparison of residuals (criterion 2), application of AIC criterion (criterion 3) and use of F-test (criterion 4). From the H-group, 47% have pulmonary hypertension completely reversible when obtaining euthyroidism. The factors causing pulmonary hypertension were identified: previously known- level of free thyroxin, pulmonary vascular resistance, cardiac output; new factors identified in this study- pretreatment period, age, systolic blood pressure. According to the four criteria and to the clinical judgment, we consider that the polynomial model (graphically parabola- type) is better than the linear one. The better model showing the functional relation between the pulmonary hypertension in hyperthyroidism and the factors identified in this study is given by a polynomial equation of second
Replica analysis of overfitting in regression models for time-to-event data
Coolen, A. C. C.; Barrett, J. E.; Paga, P.; Perez-Vicente, C. J.
2017-09-01
Overfitting, which happens when the number of parameters in a model is too large compared to the number of data points available for determining these parameters, is a serious and growing problem in survival analysis. While modern medicine presents us with data of unprecedented dimensionality, these data cannot yet be used effectively for clinical outcome prediction. Standard error measures in maximum likelihood regression, such as p-values and z-scores, are blind to overfitting, and even for Cox’s proportional hazards model (the main tool of medical statisticians), one finds in literature only rules of thumb on the number of samples required to avoid overfitting. In this paper we present a mathematical theory of overfitting in regression models for time-to-event data, which aims to increase our quantitative understanding of the problem and provide practical tools with which to correct regression outcomes for the impact of overfitting. It is based on the replica method, a statistical mechanical technique for the analysis of heterogeneous many-variable systems that has been used successfully for several decades in physics, biology, and computer science, but not yet in medical statistics. We develop the theory initially for arbitrary regression models for time-to-event data, and verify its predictions in detail for the popular Cox model.
FRICTION MODELING OF Al-Mg ALLOY SHEETS BASED ON MULTIPLE REGRESSION ANALYSIS AND NEURAL NETWORKS
Hirpa G. Lemu
2017-03-01
Full Text Available This article reports a proposed approach to a frictional resistance description in sheet metal forming processes that enables determination of the friction coefficient value under a wide range of friction conditions without performing time-consuming experiments. The motivation for this proposal is the fact that there exists a considerable amount of factors affect the friction coefficient value and as a result building analytical friction model for specified process conditions is practically impossible. In this proposed approach, a mathematical model of friction behaviour is created using multiple regression analysis and artificial neural networks. The regression analysis was performed using a subroutine in MATLAB programming code and STATISTICA Neural Networks was utilized to build an artificial neural networks model. The effect of different training strategies on the quality of neural networks was studied. As input variables for regression model and training of radial basis function networks, generalized regression neural networks and multilayer networks the results of strip drawing friction test were utilized. Four kinds of Al-Mg alloy sheets were used as a test material.
Multilayer perceptron for robust nonlinear interval regression analysis using genetic algorithms.
Hu, Yi-Chung
2014-01-01
On the basis of fuzzy regression, computational models in intelligence such as neural networks have the capability to be applied to nonlinear interval regression analysis for dealing with uncertain and imprecise data. When training data are not contaminated by outliers, computational models perform well by including almost all given training data in the data interval. Nevertheless, since training data are often corrupted by outliers, robust learning algorithms employed to resist outliers for interval regression analysis have been an interesting area of research. Several approaches involving computational intelligence are effective for resisting outliers, but the required parameters for these approaches are related to whether the collected data contain outliers or not. Since it seems difficult to prespecify the degree of contamination beforehand, this paper uses multilayer perceptron to construct the robust nonlinear interval regression model using the genetic algorithm. Outliers beyond or beneath the data interval will impose slight effect on the determination of data interval. Simulation results demonstrate that the proposed method performs well for contaminated datasets.
Multilayer Perceptron for Robust Nonlinear Interval Regression Analysis Using Genetic Algorithms
2014-01-01
On the basis of fuzzy regression, computational models in intelligence such as neural networks have the capability to be applied to nonlinear interval regression analysis for dealing with uncertain and imprecise data. When training data are not contaminated by outliers, computational models perform well by including almost all given training data in the data interval. Nevertheless, since training data are often corrupted by outliers, robust learning algorithms employed to resist outliers for interval regression analysis have been an interesting area of research. Several approaches involving computational intelligence are effective for resisting outliers, but the required parameters for these approaches are related to whether the collected data contain outliers or not. Since it seems difficult to prespecify the degree of contamination beforehand, this paper uses multilayer perceptron to construct the robust nonlinear interval regression model using the genetic algorithm. Outliers beyond or beneath the data interval will impose slight effect on the determination of data interval. Simulation results demonstrate that the proposed method performs well for contaminated datasets. PMID:25110755
Tutorial on Biostatistics: Linear Regression Analysis of Continuous Correlated Eye Data.
Ying, Gui-Shuang; Maguire, Maureen G; Glynn, Robert; Rosner, Bernard
2017-04-01
To describe and demonstrate appropriate linear regression methods for analyzing correlated continuous eye data. We describe several approaches to regression analysis involving both eyes, including mixed effects and marginal models under various covariance structures to account for inter-eye correlation. We demonstrate, with SAS statistical software, applications in a study comparing baseline refractive error between one eye with choroidal neovascularization (CNV) and the unaffected fellow eye, and in a study determining factors associated with visual field in the elderly. When refractive error from both eyes were analyzed with standard linear regression without accounting for inter-eye correlation (adjusting for demographic and ocular covariates), the difference between eyes with CNV and fellow eyes was 0.15 diopters (D; 95% confidence interval, CI -0.03 to 0.32D, p = 0.10). Using a mixed effects model or a marginal model, the estimated difference was the same but with narrower 95% CI (0.01 to 0.28D, p = 0.03). Standard regression for visual field data from both eyes provided biased estimates of standard error (generally underestimated) and smaller p-values, while analysis of the worse eye provided larger p-values than mixed effects models and marginal models. In research involving both eyes, ignoring inter-eye correlation can lead to invalid inferences. Analysis using only right or left eyes is valid, but decreases power. Worse-eye analysis can provide less power and biased estimates of effect. Mixed effects or marginal models using the eye as the unit of analysis should be used to appropriately account for inter-eye correlation and maximize power and precision.
Significant drivers of the virtual water trade evaluated with a multivariate regression analysis
Tamea, Stefania; Laio, Francesco; Ridolfi, Luca
2014-05-01
International trade of food is vital for the food security of many countries, which rely on trade to compensate for an agricultural production insufficient to feed the population. At the same time, food trade has implications on the distribution and use of water resources, because through the international trade of food commodities, countries virtually displace the water used for food production, known as "virtual water". Trade thus implies a network of virtual water fluxes from exporting to importing countries, which has been estimated to displace more than 2 billions of m3 of water per year, or about the 2% of the annual global precipitation above land. It is thus important to adequately identify the dynamics and the controlling factors of the virtual water trade in that it supports and enables the world food security. Using the FAOSTAT database of international trade and the virtual water content available from the Water Footprint Network, we reconstructed 25 years (1986-2010) of virtual water fluxes. We then analyzed the dependence of exchanged fluxes on a set of major relevant factors, that includes: population, gross domestic product, arable land, virtual water embedded in agricultural production and dietary consumption, and geographical distance between countries. Significant drivers have been identified by means of a multivariate regression analysis, applied separately to the export and import fluxes of each country; temporal trends are outlined and the relative importance of drivers is assessed by a commonality analysis. Results indicate that population, gross domestic product and geographical distance are the major drivers of virtual water fluxes, with a minor (but non-negligible) contribution given by the agricultural production of exporting countries. Such drivers have become relevant for an increasing number of countries throughout the years, with an increasing variance explained by the distance between countries and a decreasing role of the gross
Akkaya, Ali Volkan [Department of Mechanical Engineering, Yildiz Technical University, 34349 Besiktas, Istanbul (Turkey)
2009-02-15
In this paper, multiple nonlinear regression models for estimation of higher heating value of coals are developed using proximate analysis data obtained generally from the low rank coal samples as-received basis. In this modeling study, three main model structures depended on the number of proximate analysis parameters, which are named the independent variables, such as moisture, ash, volatile matter and fixed carbon, are firstly categorized. Secondly, sub-model structures with different arrangements of the independent variables are considered. Each sub-model structure is analyzed with a number of model equations in order to find the best fitting model using multiple nonlinear regression method. Based on the results of nonlinear regression analysis, the best model for each sub-structure is determined. Among them, the models giving highest correlation for three main structures are selected. Although the selected all three models predicts HHV rather accurately, the model involving four independent variables provides the most accurate estimation of HHV. Additionally, when the chosen model with four independent variables and a literature model are tested with extra proximate analysis data, it is seen that that the developed model in this study can give more accurate prediction of HHV of coals. It can be concluded that the developed model is effective tool for HHV estimation of low rank coals. (author)
Analysis of the Evolution of the Gross Domestic Product by Means of Cyclic Regressions
Catalin Angelo Ioan
2011-08-01
Full Text Available In this article, we will carry out an analysis on the regularity of the Gross Domestic Product of a country, in our case the United States. The method of analysis is based on a new method of analysis – the cyclic regressions based on the Fourier series of a function. Another point of view is that of considering instead the growth rate of GDP the speed of variation of this rate, computed as a numerical derivative. The obtained results show a cycle for this indicator for 71 years, the mean square error being 0.93%. The method described allows an prognosis on short-term trends in GDP.
Multiple Regression Analysis of Unconfined Compression Strength of Mine Tailings Matrices
Mahmood Ali A.
2017-01-01
Full Text Available As part of a novel approach of sustainable development of mine tailings, experimental and numerical analysis is carried out on newly formulated tailings matrices. Several physical characteristic tests are carried out including the unconfined compression strength test to ascertain the integrity of these matrices when subjected to loading. The current paper attempts a multiple regression analysis of the unconfined compressive strength test results of these matrices to investigate the most pertinent factors affecting their strength. Results of this analysis showed that the suggested equation is reasonably applicable to the range of binder combinations used.
2012-06-15
correlations of the individual variables with each other in a one to one relationship. Variance Inflation Factor ( VIF ) analysis was also conducted to further...accuracy of the model. 31 VIF determines if the variances of the estimated coefficients in the regression model are inflated due to...larger the variance of bk (Simon, 2004). The VIF itself is comprised of the last portion of the equation: 1 1 According to Dr. Simon, VIF values
Witt, Katrina; van Dorn, Richard; Fazel, Seena
2013-01-01
Previous reviews on risk and protective factors for violence in psychosis have produced contrasting findings. There is therefore a need to clarify the direction and strength of association of risk and protective factors for violent outcomes in individuals with psychosis. We conducted a systematic review and meta-analysis using 6 electronic databases (CINAHL, EBSCO, EMBASE, Global Health, PsycINFO, PUBMED) and Google Scholar. Studies were identified that reported factors associated with violence in adults diagnosed, using DSM or ICD criteria, with schizophrenia and other psychoses. We considered non-English language studies and dissertations. Risk and protective factors were meta-analysed if reported in three or more primary studies. Meta-regression examined sources of heterogeneity. A novel meta-epidemiological approach was used to group similar risk factors into one of 10 domains. Sub-group analyses were then used to investigate whether risk domains differed for studies reporting severe violence (rather than aggression or hostility) and studies based in inpatient (rather than outpatient) settings. There were 110 eligible studies reporting on 45,533 individuals, 8,439 (18.5%) of whom were violent. A total of 39,995 (87.8%) were diagnosed with schizophrenia, 209 (0.4%) were diagnosed with bipolar disorder, and 5,329 (11.8%) were diagnosed with other psychoses. Dynamic (or modifiable) risk factors included hostile behaviour, recent drug misuse, non-adherence with psychological therapies (p valuesviolence, these associations did not change materially. In studies investigating inpatient violence, associations differed in strength but not direction. Certain dynamic risk factors are strongly associated with increased violence risk in individuals with psychosis and their role in risk assessment and management warrants further examination.
Zhang, Yiwei; Pan, Wei
2015-03-01
Genome-wide association studies (GWAS) have been established as a major tool to identify genetic variants associated with complex traits, such as common diseases. However, GWAS may suffer from false positives and false negatives due to confounding population structures, including known or unknown relatedness. Another important issue is unmeasured environmental risk factors. Among many methods for adjusting for population structures, two approaches stand out: one is principal component regression (PCR) based on principal component analysis, which is perhaps the most popular due to its early appearance, simplicity, and general effectiveness; the other is based on a linear mixed model (LMM) that has emerged recently as perhaps the most flexible and effective, especially for samples with complex structures as in model organisms. As shown previously, the PCR approach can be regarded as an approximation to an LMM; such an approximation depends on the number of the top principal components (PCs) used, the choice of which is often difficult in practice. Hence, in the presence of population structure, the LMM appears to outperform the PCR method. However, due to the different treatments of fixed vs. random effects in the two approaches, we show an advantage of PCR over LMM: in the presence of an unknown but spatially confined environmental confounder (e.g., environmental pollution or lifestyle), the PCs may be able to implicitly and effectively adjust for the confounder whereas the LMM cannot. Accordingly, to adjust for both population structures and nongenetic confounders, we propose a hybrid method combining the use and, thus, strengths of PCR and LMM. We use real genotype data and simulated phenotypes to confirm the above points, and establish the superior performance of the hybrid method across all scenarios.
Hüseyin BUDAK
2012-11-01
Full Text Available Credit scoring is a vital topic for Banks since there is a need to use limited financial sources more effectively. There are several credit scoring methods that are used by Banks. One of them is to estimate whether a credit demanding customer’s repayment order will be regular or not. In this study, artificial neural networks and logistic regression analysis have been used to provide a support to the Banks’ credit risk prediction and to estimate whether a credit demanding customers’ repayment order will be regular or not. The results of the study showed that artificial neural networks method is more reliable than logistic regression analysis while estimating a credit demanding customer’s repayment order.
Analysis of Functional Data with Focus on Multinomial Regression and Multilevel Data
Mousavi, Seyed Nourollah
Functional data analysis (FDA) is a fast growing area in statistical research with increasingly diverse range of application from economics, medicine, agriculture, chemometrics, etc. Functional regression is an area of FDA which has received the most attention both in aspects of application...... and methodological development. Our main Functional data analysis (FDA) is a fast growing area in statistical research with increasingly diverse range of application from economics, medicine, agriculture, chemometrics, etc. Functional regression is an area of FDA which has received the most attention both in aspects...... and the prediction of the response at time t only depends on th concurrently observed predictor. We introduce a version of this model for multilevel functional data of the type subjectunit, with the unit-level data being functional observations. Finally, in the fourth paper we show how registration can be applied...
Forecasting municipal solid waste generation using prognostic tools and regression analysis.
Ghinea, Cristina; Drăgoi, Elena Niculina; Comăniţă, Elena-Diana; Gavrilescu, Marius; Câmpean, Teofil; Curteanu, Silvia; Gavrilescu, Maria
2016-11-01
For an adequate planning of waste management systems the accurate forecast of waste generation is an essential step, since various factors can affect waste trends. The application of predictive and prognosis models are useful tools, as reliable support for decision making processes. In this paper some indicators such as: number of residents, population age, urban life expectancy, total municipal solid waste were used as input variables in prognostic models in order to predict the amount of solid waste fractions. We applied Waste Prognostic Tool, regression analysis and time series analysis to forecast municipal solid waste generation and composition by considering the Iasi Romania case study. Regression equations were determined for six solid waste fractions (paper, plastic, metal, glass, biodegradable and other waste). Accuracy Measures were calculated and the results showed that S-curve trend model is the most suitable for municipal solid waste (MSW) prediction.
Regression analysis of growth responses to water depth in three wetland plant species
Sorrell, Brian K; Tanner, Chris C; Brix, Hans
2012-01-01
) differing in depth preferences in wetlands, using non-linear and quantile regression analyses to establish how flooding tolerance can explain field zonation. Methodology Plants were established for 8 months in outdoor cultures in waterlogged soil without standing water, and then randomly allocated to water...... depths from 0 – 0.5 m. Morphological and growth responses to depth were followed for 54 days before harvest, and then analysed by repeated measures analysis of covariance, and non-linear and quantile regression analysis (QRA), to compare flooding tolerances. Principal results Growth responses to depth...... differed between the three species, and were non-linear. P. tenax growth rapidly decreased in standing water > 0.25 m depth, C. secta growth increased initially with depth but then decreased at depths > 0.30 m, accompanied by increased shoot height and decreased shoot density, and T. orientalis...
Regression And Time Series Analysis Of Loan Default At Minescho Cooperative Credit Union Tarkwa
Otoo
2015-08-01
Full Text Available Abstract Lending in the form of loans is a principal business activity for banks credit unions and other financial institutions. This forms a substantial amount of the banks assets. However when these loans are defaulted it tends to have serious effects on the financial institutions. This study sought to determine the trend and forecast loan default at Minescho CreditUnion Tarkwa. A secondary data from the Credit Union was analyzed using Regression Analysis and the Box-Jenkins method of Time Series. From the Regression Analysis there was a moderately strong relationship between the amount of loan default and time. Also the amount of loan default had an increasing trend. The two years forecast of the amount of loan default oscillated initially and remained constant from 2016 onwards.
Regression Analysis of Right-censored Failure Time Data with Missing Censoring Indicators
Ping Chen; Ren He; Jun-shan Shen; Jian-guo Sun
2009-01-01
This paper discusses regression analysis of right-censored failure time data when censoring indicators are missing for some subjects. Several methods have been developed for the analysis under different situations and especially, Goetghebeur and Ryan[4] considered the situation where both the failure time and the censoring time follow the proportional hazards models marginally and developed an estimating equation approach. One limitation of their approach is that the two baseline hazard functions were assumed to be proportional to each other. We consider the same problem and present an efficient estimation procedure for regression parameters that does not require the proportionality assumption. An EM algorithm is developed and the method is evaluated by a simulation study, which indicates that the proposed methodology performs well for practical situations. An illustrative example is provided.
[Regression analysis of an instrumental conditioned tentacular reflex in the edible snail].
Stepanov, I I; Kuntsevich, S V; Lokhov, M I
1989-01-01
Regression analysis revealed the opportunity of approximation with exponential mathematical model of the learning curves of conditioned tentacle reflex. Retention of the reflex persisted for more than three weeks. There were some quantitative differences between conditioning of the right and the left tentacle. There was formation of the reflex in every session during spring period, but there was no retention between sessions. The conditioned tentacle reflex may be employed in neuropharmacological studies.
Nop Sopipan
2013-01-01
Full Text Available The aim of this study was to forecast the returns for the Stock Exchange of Thailand (SET Index by adding some explanatory variables and stationary Autoregressive order p (AR (p in the mean equation of returns. In addition, we used Principal Component Analysis (PCA to remove possible complications caused by multicollinearity. Results showed that the multiple regressions based on PCA, has the best performance.
Yun Joo eYoo
2013-11-01
Full Text Available Multi-marker methods for genetic association analysis can be performed for common and low frequency SNPs to improve power. Regression models are an intuitive way to formulate multi-marker tests. In previous studies we evaluated regression-based multi-marker tests for common SNPs, and through identification of bins consisting of correlated SNPs, developed a multi-bin linear combination (MLC test that is a compromise between a 1df linear combination test and a multi-df global test. Bins of SNPs in high linkage disequilibrium (LD are identified, and a linear combination of individual SNP statistics is constructed within each bin. Then association with the phenotype is represented by an overall statistic with df as many or few as the number of bins. In this report we evaluate multi-marker tests for SNPs that occur at low frequencies. There are many linear and quadratic multi-marker tests that are suitable for common or low frequency variant analysis. We compared the performance of the MLC tests with various linear and quadratic statistics in joint or marginal regressions. For these comparisons, we performed a simulation study of genotypes and quantitative traits for 85 genes with many low frequency SNPs based on HapMap Phase III. We compared the tests using 1 set of all SNPs in a gene, 2 set of common SNPs in a gene (MAF≥5%, 3 set of low frequency SNPs (1%≤MAF
Monitoring heavy metal Cr in soil based on hyperspectral data using regression analysis
Zhang, Ningyu; Xu, Fuyun; Zhuang, Shidong; He, Changwei
2016-10-01
Heavy metal pollution in soils is one of the most critical problems in the global ecology and environment safety nowadays. Hyperspectral remote sensing and its application is capable of high speed, low cost, less risk and less damage, and provides a good method for detecting heavy metals in soil. This paper proposed a new idea of applying regression analysis of stepwise multiple regression between the spectral data and monitoring the amount of heavy metal Cr by sample points in soil for environmental protection. In the measurement, a FieldSpec HandHeld spectroradiometer is used to collect reflectance spectra of sample points over the wavelength range of 325-1075 nm. Then the spectral data measured by the spectroradiometer is preprocessed to reduced the influence of the external factors, and the preprocessed methods include first-order differential equation, second-order differential equation and continuum removal method. The algorithms of stepwise multiple regression are established accordingly, and the accuracy of each equation is tested. The results showed that the accuracy of first-order differential equation works best, which makes it feasible to predict the content of heavy metal Cr by using stepwise multiple regression.
Robust estimation for homoscedastic regression in the secondary analysis of case-control data
Wei, Jiawei
2012-12-04
Primary analysis of case-control studies focuses on the relationship between disease D and a set of covariates of interest (Y, X). A secondary application of the case-control study, which is often invoked in modern genetic epidemiologic association studies, is to investigate the interrelationship between the covariates themselves. The task is complicated owing to the case-control sampling, where the regression of Y on X is different from what it is in the population. Previous work has assumed a parametric distribution for Y given X and derived semiparametric efficient estimation and inference without any distributional assumptions about X. We take up the issue of estimation of a regression function when Y given X follows a homoscedastic regression model, but otherwise the distribution of Y is unspecified. The semiparametric efficient approaches can be used to construct semiparametric efficient estimates, but they suffer from a lack of robustness to the assumed model for Y given X. We take an entirely different approach. We show how to estimate the regression parameters consistently even if the assumed model for Y given X is incorrect, and thus the estimates are model robust. For this we make the assumption that the disease rate is known or well estimated. The assumption can be dropped when the disease is rare, which is typically so for most case-control studies, and the estimation algorithm simplifies. Simulations and empirical examples are used to illustrate the approach.
Hiroyuki Nakamoto
2014-01-01
Full Text Available The human is covered with soft skin and has tactile receptors inside. The skin deforms along a contact surface. The tactile receptors detect the mechanical deformation. The detection of the mechanical deformation is essential for the tactile sensation. We propose a magnetic type tactile sensor which has a soft surface and eight magnetoresistive elements. The soft surface has a permanent magnet inside and the magnetoresistive elements under the soft surface measure the magnetic flux density of the magnet. The tactile sensor estimates the displacement and the rotation on the surface based on the change of the magnetic flux density. Determination of an estimate equation is difficult because the displacement and the rotation are not geometrically decided based on the magnetic flux density. In this paper, a stepwise regression analysis determines the estimate equation. The outputs of the magnetoresistive elements are used as explanatory variables, and the three-axis displacement and the two-axis rotation are response variables in the regression analysis. We confirm the regression analysis is effective for determining the estimate equations through simulation and experiment. The results show the tactile sensor measures both the displacement and the rotation generated on the surface by using the determined equation.
Non-Stationary Hydrologic Frequency Analysis using B-Splines Quantile Regression
Nasri, B.; St-Hilaire, A.; Bouezmarni, T.; Ouarda, T.
2015-12-01
Hydrologic frequency analysis is commonly used by engineers and hydrologists to provide the basic information on planning, design and management of hydraulic structures and water resources system under the assumption of stationarity. However, with increasing evidence of changing climate, it is possible that the assumption of stationarity would no longer be valid and the results of conventional analysis would become questionable. In this study, we consider a framework for frequency analysis of extreme flows based on B-Splines quantile regression, which allows to model non-stationary data that have a dependence on covariates. Such covariates may have linear or nonlinear dependence. A Markov Chain Monte Carlo (MCMC) algorithm is used to estimate quantiles and their posterior distributions. A coefficient of determination for quantiles regression is proposed to evaluate the estimation of the proposed model for each quantile level. The method is applied on annual maximum and minimum streamflow records in Ontario, Canada. Climate indices are considered to describe the non-stationarity in these variables and to estimate the quantiles in this case. The results show large differences between the non-stationary quantiles and their stationary equivalents for annual maximum and minimum discharge with high annual non-exceedance probabilities. Keywords: Quantile regression, B-Splines functions, MCMC, Streamflow, Climate indices, non-stationarity.
Das Sumonkanti
2011-11-01
Full Text Available Abstract Background The study attempts to develop an ordinal logistic regression (OLR model to identify the determinants of child malnutrition instead of developing traditional binary logistic regression (BLR model using the data of Bangladesh Demographic and Health Survey 2004. Methods Based on weight-for-age anthropometric index (Z-score child nutrition status is categorized into three groups-severely undernourished ( Results All the models determine that age of child, birth interval, mothers' education, maternal nutrition, household wealth status, child feeding index, and incidence of fever, ARI & diarrhoea were the significant predictors of child malnutrition; however, results of PPOM were more precise than those of other models. Conclusion These findings clearly justify that OLR models (POM and PPOM are appropriate to find predictors of malnutrition instead of BLR models.
Engvall, Karin; Hult, M; Corner, R; Lampa, E; Norbäck, D; Emenius, G
2010-01-01
The aim was to develop a new model to identify residential buildings with higher frequencies of "SBS" than expected, "risk buildings". In 2005, 481 multi-family buildings with 10,506 dwellings in Stockholm were studied by a new stratified random sampling. A standardised self-administered questionnaire was used to assess "SBS", atopy and personal factors. The response rate was 73%. Statistical analysis was performed by multiple logistic regressions. Dwellers owning their building reported less "SBS" than those renting. There was a strong relationship between socio-economic factors and ownership. The regression model, ended up with high explanatory values for age, gender, atopy and ownership. Applying our model, 9% of all residential buildings in Stockholm were classified as "risk buildings" with the highest proportion in houses built 1961-1975 (26%) and lowest in houses built 1985-1990 (4%). To identify "risk buildings", it is necessary to adjust for ownership and population characteristics.
Bo, Yan-Chen; Song, Chao; Wang, Jin-Feng; Li, Xiao-Wen
2014-04-14
There have been large-scale outbreaks of hand, foot and mouth disease (HFMD) in Mainland China over the last decade. These events varied greatly across the country. It is necessary to identify the spatial risk factors and spatial distribution patterns of HFMD for public health control and prevention. Climate risk factors associated with HFMD occurrence have been recognized. However, few studies discussed the socio-economic determinants of HFMD risk at a space scale. HFMD records in Mainland China in May 2008 were collected. Both climate and socio-economic factors were selected as potential risk exposures of HFMD. Odds ratio (OR) was used to identify the spatial risk factors. A spatial autologistic regression model was employed to get OR values of each exposures and model the spatial distribution patterns of HFMD risk. Results showed that both climate and socio-economic variables were spatial risk factors for HFMD transmission in Mainland China. The statistically significant risk factors are monthly average precipitation (OR = 1.4354), monthly average temperature (OR = 1.379), monthly average wind speed (OR = 1.186), the number of industrial enterprises above designated size (OR = 17.699), the population density (OR = 1.953), and the proportion of student population (OR = 1.286). The spatial autologistic regression model has a good goodness of fit (ROC = 0.817) and prediction accuracy (Correct ratio = 78.45%) of HFMD occurrence. The autologistic regression model also reduces the contribution of the residual term in the ordinary logistic regression model significantly, from 17.25 to 1.25 for the odds ratio. Based on the prediction results of the spatial model, we obtained a map of the probability of HFMD occurrence that shows the spatial distribution pattern and local epidemic risk over Mainland China. The autologistic regression model was used to identify spatial risk factors and model spatial risk patterns of HFMD. HFMD occurrences were found to be spatially
Ermarth, Anna; Bryce, Matthew; Woodward, Stephanie; Stoddard, Gregory; Book, Linda; Jensen, M Kyle
2017-03-01
Celiac disease is detected using serology and endoscopy analyses. We used multiple statistical analyses of a geographically isolated population in the United States to determine whether a single serum screening can identify individuals with celiac disease. We performed a retrospective study of 3555 pediatric patients (18 years old or younger) in the intermountain West region of the United States from January 1, 2008, through September 30, 2013. All patients had undergone serologic analyses for celiac disease, including measurement of antibodies to tissue transglutaminase (TTG) and/or deamidated gliadin peptide (DGP), and had duodenal biopsies collected within the following year. Modified Marsh criteria were used to identify patients with celiac disease. We developed models to identify patients with celiac disease using logistic regression and classification and regression tree (CART) analysis. Single use of a test for serum level of IgA against TTG identified patients with celiac disease with 90% sensitivity, 90% specificity, a 61% positive predictive value (PPV), a 90% negative predictive value, and an area under the receiver operating characteristic curve value of 0.91; these values were higher than those obtained from assays for IgA against DGP or IgG against TTG plus DGP. Not including the test for DGP antibody caused only 0.18% of celiac disease cases to be missed. Level of TTG IgA 7-fold the upper limit of normal (ULN) identified patients with celiac disease with a 96% PPV and 100% specificity. Using CART analysis, we found a level of TTG IgA 3.2-fold the ULN and higher to most accurately identify patients with celiac disease (PPV, 89%). Multivariable CART analysis showed that a level of TTG IgA 2.5-fold the ULN and higher was sufficient to identify celiac disease in patients with type 1 diabetes (PPV, 88%). Serum level of IgA against TTG in patients with versus those without trisomy 21 did not affect diagnosis predictability in CART analysis. In a population
Deni Memić
2015-01-01
Full Text Available This article has an aim to assess credit default prediction on the banking market in Bosnia and Herzegovina nationwide as well as on its constitutional entities (Federation of Bosnia and Herzegovina and Republika Srpska. Ability to classify companies info different predefined groups or finding an appropriate tool which would replace human assessment in classifying companies into good and bad buckets has been one of the main interests on risk management researchers for a long time. We investigated the possibility and accuracy of default prediction using traditional statistical methods logistic regression (logit and multiple discriminant analysis (MDA and compared their predictive abilities. The results show that the created models have high predictive ability. For logit models, some variables are more influential on the default prediction than the others. Return on assets (ROA is statistically significant in all four periods prior to default, having very high regression coefficients, or high impact on the model's ability to predict default. Similar results are obtained for MDA models. It is also found that predictive ability differs between logistic regression and multiple discriminant analysis.
Correlation Study and Regression Analysis of Drinking Water Quality in Kashan City, Iran
Mohammad Mehdi HEYDARI
2013-06-01
Full Text Available Chemical and statistical regression analysis on drinking water samples at five fields (21 sampling wells with hot and dry climate in Kashan city, central Iran was carried out. Samples were collected during October 2006 to May 2007 (25 - 30 °C. Comparing the results with drinking water quality standards issued by World Health Organization (WHO, it is found that some of the water samples are not potable. Hydrochemical facies using a Piper diagram indicate that in most parts of the city, the chemical character of water is dominated by NaCl. All samples showed sulfate and sodium ion higher and K+ and F- content lower than the permissible limit. A strongly positive correlation is observed between TDS and EC (R = 0.995 and Ca2+ and TH (R = 0.948. The results showed that regression relations have the same correlation coefficients: (I pH -TH, EC -TH (R = 0.520, (II NO3- -pH, TH-pH (R = 0.520, (III Ca2+-SO42-, TH-SO42-, Cl- -SO42- (R = 0.630. The results revealed that systematic calculations of correlation coefficients between water parameters and regression analysis provide a useful means for rapid monitoring of water quality.
Analysis of ontogenetic spectra of populations of plants and lichens via ordinal regression
Sofronov, G. Yu.; Glotov, N. V.; Ivanov, S. M.
2015-03-01
Ontogenetic spectra of plants and lichens tend to vary across the populations. This means that if several subsamples within a sample (or a population) were collected, then the subsamples would not be homogeneous. Consequently, the statistical analysis of the aggregated data would not be correct, which could potentially lead to false biological conclusions. In order to take into account the heterogeneity of the subsamples, we propose to use ordinal regression, which is a type of generalized linear regression. In this paper, we study the populations of cowberry Vaccinium vitis-idaea L. and epiphytic lichens Hypogymnia physodes (L.) Nyl. and Pseudevernia furfuracea (L.) Zopf. We obtain estimates for the proportions of between-sample variability in the total variability of the ontogenetic spectra of the populations.
Lingling; TAN
2013-01-01
This article selects some major factors influencing the agricultural economic growth are selected,such as labor,capital input,farmland area,fertilizer input and information input.And it selects some factors to explain information input,such as the number of website ownership,types of books,magazines and newspapers published,the number of telephone ownership per 100 households,the number of home computers ownership per 100 households,farmers’ spending on transportation and communication,culture,education,entertainment and services, and the total number of agricultural science and technology service personnel.Using regression model,this article conducts regression analysis of the cross-section data on 31 provinces,autonomous regions and municipalities in 2010.The results show that the building of information infrastructure,the use of means of information,the popularization and promotion of knowledge of agricultural science and technology,play an important role in promoting agricultural economic growth.
Sub-pixel estimation of tree cover and bare surface densities using regression tree analysis
Carlos Augusto Zangrando Toneli
2011-09-01
Full Text Available Sub-pixel analysis is capable of generating continuous fields, which represent the spatial variability of certain thematic classes. The aim of this work was to develop numerical models to represent the variability of tree cover and bare surfaces within the study area. This research was conducted in the riparian buffer within a watershed of the São Francisco River in the North of Minas Gerais, Brazil. IKONOS and Landsat TM imagery were used with the GUIDE algorithm to construct the models. The results were two index images derived with regression trees for the entire study area, one representing tree cover and the other representing bare surface. The use of non-parametric and non-linear regression tree models presented satisfactory results to characterize wetland, deciduous and savanna patterns of forest formation.
THE PROGNOSIS OF RUSSIAN DEFENSE INDUSTRY DEVELOPMENT IMPLEMENTED THROUGH REGRESSION ANALYSIS
L.M. Kapustina
2007-03-01
Full Text Available The article illustrates the results of investigation the major internal and external factors which influence the development of the defense industry, as well as the results of regression analysis which quantitatively displays the factorial contribution in the growth rate of Russian defense industry. On the basis of calculated regression dependences the authors fulfilled the medium-term prognosis of defense industry. Optimistic and inertial versions of defense product growth rate for the period up to 2009 are based on scenario conditions in Russian economy worked out by the Ministry of economy and development. In conclusion authors point out which factors and conditions have the largest impact on successful and stable operation of Russian defense industry.
Tso, Geoffrey K.F.; Yau, Kelvin K.W. [City University of Hong Kong, Kowloon, Hong Kong (China). Department of Management Sciences
2007-09-15
This study presents three modeling techniques for the prediction of electricity energy consumption. In addition to the traditional regression analysis, decision tree and neural networks are considered. Model selection is based on the square root of average squared error. In an empirical application to an electricity energy consumption study, the decision tree and neural network models appear to be viable alternatives to the stepwise regression model in understanding energy consumption patterns and predicting energy consumption levels. With the emergence of the data mining approach for predictive modeling, different types of models can be built in a unified platform: to implement various modeling techniques, assess the performance of different models and select the most appropriate model for future prediction. (author)
COLOR IMAGE RETRIEVAL BASED ON FEATURE FUSION THROUGH MULTIPLE LINEAR REGRESSION ANALYSIS
K. Seetharaman
2015-08-01
Full Text Available This paper proposes a novel technique based on feature fusion using multiple linear regression analysis, and the least-square estimation method is employed to estimate the parameters. The given input query image is segmented into various regions according to the structure of the image. The color and texture features are extracted on each region of the query image, and the features are fused together using the multiple linear regression model. The estimated parameters of the model, which is modeled based on the features, are formed as a vector called a feature vector. The Canberra distance measure is adopted to compare the feature vectors of the query and target images. The F-measure is applied to evaluate the performance of the proposed technique. The obtained results expose that the proposed technique is comparable to the other existing techniques.
A refined method for multivariate meta-analysis and meta-regression.
Jackson, Daniel; Riley, Richard D
2014-02-20
Making inferences about the average treatment effect using the random effects model for meta-analysis is problematic in the common situation where there is a small number of studies. This is because estimates of the between-study variance are not precise enough to accurately apply the conventional methods for testing and deriving a confidence interval for the average effect. We have found that a refined method for univariate meta-analysis, which applies a scaling factor to the estimated effects' standard error, provides more accurate inference. We explain how to extend this method to the multivariate scenario and show that our proposal for refined multivariate meta-analysis and meta-regression can provide more accurate inferences than the more conventional approach. We explain how our proposed approach can be implemented using standard output from multivariate meta-analysis software packages and apply our methodology to two real examples. Copyright © 2013 John Wiley & Sons, Ltd.
Rhodes, Kirsty M; Turner, Rebecca M; White, Ian R; Jackson, Dan; Spiegelhalter, David J; Higgins, Julian P T
2016-12-20
Many meta-analyses combine results from only a small number of studies, a situation in which the between-study variance is imprecisely estimated when standard methods are applied. Bayesian meta-analysis allows incorporation of external evidence on heterogeneity, providing the potential for more robust inference on the effect size of interest. We present a method for performing Bayesian meta-analysis using data augmentation, in which we represent an informative conjugate prior for between-study variance by pseudo data and use meta-regression for estimation. To assist in this, we derive predictive inverse-gamma distributions for the between-study variance expected in future meta-analyses. These may serve as priors for heterogeneity in new meta-analyses. In a simulation study, we compare approximate Bayesian methods using meta-regression and pseudo data against fully Bayesian approaches based on importance sampling techniques and Markov chain Monte Carlo (MCMC). We compare the frequentist properties of these Bayesian methods with those of the commonly used frequentist DerSimonian and Laird procedure. The method is implemented in standard statistical software and provides a less complex alternative to standard MCMC approaches. An importance sampling approach produces almost identical results to standard MCMC approaches, and results obtained through meta-regression and pseudo data are very similar. On average, data augmentation provides closer results to MCMC, if implemented using restricted maximum likelihood estimation rather than DerSimonian and Laird or maximum likelihood estimation. The methods are applied to real datasets, and an extension to network meta-analysis is described. The proposed method facilitates Bayesian meta-analysis in a way that is accessible to applied researchers. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
Weidmann, C; Schneider, S; Litaker, D; Weck, E; Klüter, H
2012-01-01
Previous studies have shown substantial geographical variation in blood donation within developed countries. To understand this issue better, we identified community characteristics associated with blood donor rates in German municipalities in an ecological analysis. We calculated an aggregated rate of voluntary blood donors from each of 1533 municipalities in south-west Germany in 2007 from a database of the German Red Cross Blood Service. A multiple linear regression model estimated the association between the municipality-specific donor rate and several community characteristics. Finally, a spatial lag regression model was used to control for spatial autocorrelation that occurs when neighbouring units are related to each other. The spatial lag regression model showed that a relatively larger population, a higher percentage of inhabitants older than 30 years, a higher percentage of non-German citizens and a higher percentage of unemployed persons were associated with lower municipality-specific donor rates. Conversely, a higher donor rate was correlated with higher voter turnout, a higher percentage of inhabitants between 18 and 24 years and more frequent mobile donation sites. Blood donation appears to be a highly clustered regional phenomenon, suggesting the need for regionally targeted recruiting efforts and careful consideration of the value of mobile donation sites. Our model further suggests that municipalities with a decreasing percentage of 18- to 24-year-olds and an increasing percentage of older inhabitants may experience substantial declines in future blood donations. © 2011 The Author(s). Vox Sanguinis © 2011 International Society of Blood Transfusion.
Dixon, Barnali
2009-09-01
Accurate and inexpensive identification of potentially contaminated wells is critical for water resources protection and management. The objectives of this study are to 1) assess the suitability of approximation tools such as neural networks (NN) and support vector machines (SVM) integrated in a geographic information system (GIS) for identifying contaminated wells and 2) use logistic regression and feature selection methods to identify significant variables for transporting contaminants in and through the soil profile to the groundwater. Fourteen GIS derived soil hydrogeologic and landuse parameters were used as initial inputs in this study. Well water quality data (nitrate-N) from 6,917 wells provided by Florida Department of Environmental Protection (USA) were used as an output target class. The use of the logistic regression and feature selection methods reduced the number of input variables to nine. Receiver operating characteristics (ROC) curves were used for evaluation of these approximation tools. Results showed superior performance with the NN as compared to SVM especially on training data while testing results were comparable. Feature selection did not improve accuracy; however, it helped increase the sensitivity or true positive rate (TPR). Thus, a higher TPR was obtainable with fewer variables.
Dhanya, S; Kumari Roshni, V S
2016-01-01
Textures play an important role in image classification. This paper proposes a high performance texture classification method using a combination of multiresolution analysis tool and linear regression modelling by channel elimination. The correlation between different frequency regions has been validated as a sort of effective texture characteristic. This method is motivated by the observation that there exists a distinctive correlation between the image samples belonging to the same kind of texture, at different frequency regions obtained by a wavelet transform. Experimentally, it is observed that this correlation differs across textures. The linear regression modelling is employed to analyze this correlation and extract texture features that characterize the samples. Our method considers not only the frequency regions but also the correlation between these regions. This paper primarily focuses on applying the Dual Tree Complex Wavelet Packet Transform and the Linear Regression model for classification of the obtained texture features. Additionally the paper also presents a comparative assessment of the classification results obtained from the above method with two more types of wavelet transform methods namely the Discrete Wavelet Transform and the Discrete Wavelet Packet Transform.
Ryu, Duchwan
2010-09-28
We consider nonparametric regression analysis in a generalized linear model (GLM) framework for data with covariates that are the subject-specific random effects of longitudinal measurements. The usual assumption that the effects of the longitudinal covariate processes are linear in the GLM may be unrealistic and if this happens it can cast doubt on the inference of observed covariate effects. Allowing the regression functions to be unknown, we propose to apply Bayesian nonparametric methods including cubic smoothing splines or P-splines for the possible nonlinearity and use an additive model in this complex setting. To improve computational efficiency, we propose the use of data-augmentation schemes. The approach allows flexible covariance structures for the random effects and within-subject measurement errors of the longitudinal processes. The posterior model space is explored through a Markov chain Monte Carlo (MCMC) sampler. The proposed methods are illustrated and compared to other approaches, the "naive" approach and the regression calibration, via simulations and by an application that investigates the relationship between obesity in adulthood and childhood growth curves. © 2010, The International Biometric Society.
Ryu, Duchwan; Li, Erning; Mallick, Bani K
2011-06-01
We consider nonparametric regression analysis in a generalized linear model (GLM) framework for data with covariates that are the subject-specific random effects of longitudinal measurements. The usual assumption that the effects of the longitudinal covariate processes are linear in the GLM may be unrealistic and if this happens it can cast doubt on the inference of observed covariate effects. Allowing the regression functions to be unknown, we propose to apply Bayesian nonparametric methods including cubic smoothing splines or P-splines for the possible nonlinearity and use an additive model in this complex setting. To improve computational efficiency, we propose the use of data-augmentation schemes. The approach allows flexible covariance structures for the random effects and within-subject measurement errors of the longitudinal processes. The posterior model space is explored through a Markov chain Monte Carlo (MCMC) sampler. The proposed methods are illustrated and compared to other approaches, the "naive" approach and the regression calibration, via simulations and by an application that investigates the relationship between obesity in adulthood and childhood growth curves.
Evaluation of a LASSO regression approach on the unrelated samples of Genetic Analysis Workshop 17.
Guo, Wei; Elston, Robert C; Zhu, Xiaofeng
2011-11-29
The Genetic Analysis Workshop 17 data we used comprise 697 unrelated individuals genotyped at 24,487 single-nucleotide polymorphisms (SNPs) from a mini-exome scan, using real sequence data for 3,205 genes annotated by the 1000 Genomes Project and simulated phenotypes. We studied 200 sets of simulated phenotypes of trait Q2. An important feature of this data set is that most SNPs are rare, with 87% of the SNPs having a minor allele frequency less than 0.05. For rare SNP detection, in this study we performed a least absolute shrinkage and selection operator (LASSO) regression and F tests at the gene level and calculated the generalized degrees of freedom to avoid any selection bias. For comparison, we also carried out linear regression and the collapsing method, which sums the rare SNPs, modified for a quantitative trait and with two different allele frequency thresholds. The aim of this paper is to evaluate these four approaches in this mini-exome data and compare their performance in terms of power and false positive rates. In most situations the LASSO approach is more powerful than linear regression and collapsing methods. We also note the difficulty in determining the optimal threshold for the collapsing method and the significant role that linkage disequilibrium plays in detecting rare causal SNPs. If a rare causal SNP is in strong linkage disequilibrium with a common marker in the same gene, power will be much improved.
A transcriptome analysis by lasso penalized Cox regression for pancreatic cancer survival.
Wu, Tong Tong; Gong, Haijun; Clarke, Edmund M
2011-12-01
Pancreatic cancer is the fourth leading cause of cancer deaths in the United States with five-year survival rates less than 5% due to rare detection in early stages. Identification of genes that are directly correlated to pancreatic cancer survival is crucial for pancreatic cancer diagnostics and treatment. However, no existing GWAS or transcriptome studies are available for addressing this problem. We apply lasso penalized Cox regression to a transcriptome study to identify genes that are directly related to pancreatic cancer survival. This method is capable of handling the right censoring effect of survival times and the ultrahigh dimensionality of genetic data. A cyclic coordinate descent algorithm is employed to rapidly select the most relevant genes and eliminate the irrelevant ones. Twelve genes have been identified and verified to be directly correlated to pancreatic cancer survival time and can be used for the prediction of future patient's survival.
Katrina Witt
Full Text Available BACKGROUND: Previous reviews on risk and protective factors for violence in psychosis have produced contrasting findings. There is therefore a need to clarify the direction and strength of association of risk and protective factors for violent outcomes in individuals with psychosis. METHOD: We conducted a systematic review and meta-analysis using 6 electronic databases (CINAHL, EBSCO, EMBASE, Global Health, PsycINFO, PUBMED and Google Scholar. Studies were identified that reported factors associated with violence in adults diagnosed, using DSM or ICD criteria, with schizophrenia and other psychoses. We considered non-English language studies and dissertations. Risk and protective factors were meta-analysed if reported in three or more primary studies. Meta-regression examined sources of heterogeneity. A novel meta-epidemiological approach was used to group similar risk factors into one of 10 domains. Sub-group analyses were then used to investigate whether risk domains differed for studies reporting severe violence (rather than aggression or hostility and studies based in inpatient (rather than outpatient settings. FINDINGS: There were 110 eligible studies reporting on 45,533 individuals, 8,439 (18.5% of whom were violent. A total of 39,995 (87.8% were diagnosed with schizophrenia, 209 (0.4% were diagnosed with bipolar disorder, and 5,329 (11.8% were diagnosed with other psychoses. Dynamic (or modifiable risk factors included hostile behaviour, recent drug misuse, non-adherence with psychological therapies (p values<0.001, higher poor impulse control scores, recent substance misuse, recent alcohol misuse (p values<0.01, and non-adherence with medication (p value <0.05. We also examined a number of static factors, the strongest of which were criminal history factors. When restricting outcomes to severe violence, these associations did not change materially. In studies investigating inpatient violence, associations differed in strength but not
Javali Shivalingappa
2010-01-01
Full Text Available Aim: The study aimed to determine the factors associated with periodontal disease (different levels of severity by using different regression models for ordinal data. Design: A cross-sectional design was employed using clinical examination and ′questionnaire with interview′ method. Materials and Methods: The study was conducted during June 2008 to October 2008 in Dharwad, Karnataka, India. It involved a systematic random sample of 1760 individuals aged 18-40 years. The periodontal disease examination was conducted by using Community Periodontal Index for Treatment Needs (CPITN. Statistical Analysis Used: Regression models for ordinal data with different built-in link functions were used in determination of factors associated with periodontal disease. Results: The study findings indicated that, the ordinal regression models with four built-in link functions (logit, probit, Clog-log and nlog-log displayed similar results with negligible differences in significant factors associated with periodontal disease. The factors such as religion, caste, sources of drinking water, Timings for sweet consumption, Timings for cleaning or brushing the teeth and materials used for brushing teeth were significantly associated with periodontal disease in all ordinal models. Conclusions: The ordinal regression model with Clog-log is a better fit in determination of significant factors associated with periodontal disease as compared to models with logit, probit and nlog-log built-in link functions. The factors such as caste and time for sweet consumption are negatively associated with periodontal disease. But religion, sources of drinking water, Timings for cleaning or brushing the teeth and materials used for brushing teeth are significantly and positively associated with periodontal disease.
Analysis of sparse data in logistic regression in medical research: A newer approach
S Devika
2016-01-01
Full Text Available Background and Objective: In the analysis of dichotomous type response variable, logistic regression is usually used. However, the performance of logistic regression in the presence of sparse data is questionable. In such a situation, a common problem is the presence of high odds ratios (ORs with very wide 95% confidence interval (CI (OR: >999.999, 95% CI: 999.999. In this paper, we addressed this issue by using penalized logistic regression (PLR method. Materials and Methods: Data from case-control study on hyponatremia and hiccups conducted in Christian Medical College, Vellore, Tamil Nadu, India was used. The outcome variable was the presence/absence of hiccups and the main exposure variable was the status of hyponatremia. Simulation dataset was created with different sample sizes and with a different number of covariates. Results: A total of 23 cases and 50 controls were used for the analysis of ordinary and PLR methods. The main exposure variable hyponatremia was present in nine (39.13% of the cases and in four (8.0% of the controls. Of the 23 hiccup cases, all were males and among the controls, 46 (92.0% were males. Thus, the complete separation between gender and the disease group led into an infinite OR with 95% CI (OR: >999.999, 95% CI: 999.999 whereas there was a finite and consistent regression coefficient for gender (OR: 5.35; 95% CI: 0.42, 816.48 using PLR. After adjusting for all the confounding variables, hyponatremia entailed 7.9 (95% CI: 2.06, 38.86 times higher risk for the development of hiccups as was found using PLR whereas there was an overestimation of risk OR: 10.76 (95% CI: 2.17, 53.41 using the conventional method. Simulation experiment shows that the estimated coverage probability of this method is near the nominal level of 95% even for small sample sizes and for a large number of covariates. Conclusions: PLR is almost equal to the ordinary logistic regression when the sample size is large and is superior in small cell
Hofland, G.S.; Barton, C.C.
1990-10-01
The computer program FREQFIT is designed to perform regression and statistical chi-squared goodness of fit analysis on one-dimensional or two-dimensional data. The program features an interactive user dialogue, numerous help messages, an option for screen or line printer output, and the flexibility to use practically any commercially available graphics package to create plots of the program`s results. FREQFIT is written in Microsoft QuickBASIC, for IBM-PC compatible computers. A listing of the QuickBASIC source code for the FREQFIT program, a user manual, and sample input data, output, and plots are included. 6 refs., 1 fig.
Gunay, Ahmet [Deparment of Environmental Engineering, Faculty of Engineering and Architecture, Balikesir University (Turkey)], E-mail: ahmetgunay2@gmail.com
2007-09-30
The experimental data of ammonium exchange by natural Bigadic clinoptilolite was evaluated using nonlinear regression analysis. Three two-parameters isotherm models (Langmuir, Freundlich and Temkin) and three three-parameters isotherm models (Redlich-Peterson, Sips and Khan) were used to analyse the equilibrium data. Fitting of isotherm models was determined using values of standard normalization error procedure (SNE) and coefficient of determination (R{sup 2}). HYBRID error function provided lowest sum of normalized error and Khan model had better performance for modeling the equilibrium data. Thermodynamic investigation indicated that ammonium removal by clinoptilolite was favorable at lower temperatures and exothermic in nature.
Zhigao Zeng
2016-01-01
Full Text Available This paper proposes a novel algorithm to solve the challenging problem of classifying error-diffused halftone images. We firstly design the class feature matrices, after extracting the image patches according to their statistics characteristics, to classify the error-diffused halftone images. Then, the spectral regression kernel discriminant analysis is used for feature dimension reduction. The error-diffused halftone images are finally classified using an idea similar to the nearest centroids classifier. As demonstrated by the experimental results, our method is fast and can achieve a high classification accuracy rate with an added benefit of robustness in tackling noise.
A systematic review and meta-regression analysis of mivacurium for tracheal intubation
Vanlinthout, L.E.H.; Mesfin, S.H.; Hens, Niel; Vanacker, B. F.; Robertson, E. N.; Booij, L. H. D. J.
2014-01-01
We systematically reviewed factors associated with intubation conditions in randomised controlled trials of mivacurium, using random-effects meta-regression analysis. We included 29 studies of 1050 healthy participants. Four factors explained 72.9% of the variation in the probability of excellent intubation conditions: mivacurium dose, 24.4%; opioid use, 29.9%; time to intubation and age together, 18.6%. The odds ratio (95% CI) for excellent intubation was 3.14 (1.65–5.73) for doubling the mi...
Mehdi Najafi; Seyed Mohammad Esmaiel Jalali; Reza KhaloKakaie; Farrokh Forouhandeh
2015-01-01
During underground coal gasification (UCG), whereby coal is converted to syngas in situ, a cavity is formed in the coal seam. The cavity growth rate (CGR) or the moving rate of the gasification face is affected by controllable (operation pressure, gasification time, geometry of UCG panel) and uncontrollable (coal seam properties) factors. The CGR is usually predicted by mathematical models and laboratory experiments, which are time consuming, cumbersome and expensive. In this paper, a new simple model for CGR is developed using non-linear regression analysis, based on data from 11 UCG field trials. The empirical model compares satisfactorily with Perkins model and can reliably predict CGR.
Hofland, G.S.; Barton, C.C.
1990-10-01
The computer program FREQFIT is designed to perform regression and statistical chi-squared goodness of fit analysis on one-dimensional or two-dimensional data. The program features an interactive user dialogue, numerous help messages, an option for screen or line printer output, and the flexibility to use practically any commercially available graphics package to create plots of the program`s results. FREQFIT is written in Microsoft QuickBASIC, for IBM-PC compatible computers. A listing of the QuickBASIC source code for the FREQFIT program, a user manual, and sample input data, output, and plots are included. 6 refs., 1 fig.
Kinnebrock, Silja; Podolskij, Mark
This paper introduces a new estimator to measure the ex-post covariation between high-frequency financial time series under market microstructure noise. We provide an asymptotic limit theory (including feasible central limit theorems) for standard methods such as regression, correlation analysis...... and covariance, for which we obtain the optimal rate of convergence. We demonstrate some positive semidefinite estimators of the covariation and construct a positive semidefinite estimator of the conditional covariance matrix in the central limit theorem. Furthermore, we indicate how the assumptions on the noise...
Regression analysis as an objective tool of economic management of rolling mill
Š. Vilamová
2015-07-01
Full Text Available The ability to optimize costs plays a key role in maintaining competitiveness of the company, because without detailed knowledge of costs, companies are not able to make the right decisions that will ensure their long-term growth. The aim of this article is to outline the problematic areas related to company costs and to contribute to a debate on the method used to determine the amount of fixed and variable costs, their monitoring and follow-up control. This article presents a potential use of regression analysis as an objective tool of economic management in metallurgical companies, as these companies have several specific features
Kinnebrock, Silja; Podolskij, Mark
and covariance, for which we obtain the optimal rate of convergence. We demonstrate some positive semidefinite estimators of the covariation and construct a positive semidefinite estimator of the conditional covariance matrix in the central limit theorem. Furthermore, we indicate how the assumptions on the noise......This paper introduces a new estimator to measure the ex-post covariation between high-frequency financial time series under market microstructure noise. We provide an asymptotic limit theory (including feasible central limit theorems) for standard methods such as regression, correlation analysis...
Estimating the causes of traffic accidents using logistic regression and discriminant analysis.
Karacasu, Murat; Ergül, Barış; Altin Yavuz, Arzu
2014-01-01
Factors that affect traffic accidents have been analysed in various ways. In this study, we use the methods of logistic regression and discriminant analysis to determine the damages due to injury and non-injury accidents in the Eskisehir Province. Data were obtained from the accident reports of the General Directorate of Security in Eskisehir; 2552 traffic accidents between January and December 2009 were investigated regarding whether they resulted in injury. According to the results, the effects of traffic accidents were reflected in the variables. These results provide a wealth of information that may aid future measures toward the prevention of undesired results.
Neck-focused panic attacks among Cambodian refugees; a logistic and linear regression analysis.
Hinton, Devon E; Chhean, Dara; Pich, Vuth; Um, Khin; Fama, Jeanne M; Pollack, Mark H
2006-01-01
Consecutive Cambodian refugees attending a psychiatric clinic were assessed for the presence and severity of current--i.e., at least one episode in the last month--neck-focused panic. Among the whole sample (N=130), in a logistic regression analysis, the Anxiety Sensitivity Index (ASI; odds ratio=3.70) and the Clinician-Administered PTSD Scale (CAPS; odds ratio=2.61) significantly predicted the presence of current neck panic (NP). Among the neck panic patients (N=60), in the linear regression analysis, NP severity was significantly predicted by NP-associated flashbacks (beta=.42), NP-associated catastrophic cognitions (beta=.22), and CAPS score (beta=.28). Further analysis revealed the effect of the CAPS score to be significantly mediated (Sobel test [Baron, R. M., & Kenny, D. A. (1986). The moderator-mediator variable distinction in social psychological research: conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51, 1173-1182]) by both NP-associated flashbacks and catastrophic cognitions. In the care of traumatized Cambodian refugees, NP severity, as well as NP-associated flashbacks and catastrophic cognitions, should be specifically assessed and treated.
Mandal, Nilrudra; Doloi, Biswanath; Mondal, Biswanath
2016-01-01
In the present study, an attempt has been made to apply the Taguchi parameter design method and regression analysis for optimizing the cutting conditions on surface finish while machining AISI 4340 steel with the help of the newly developed yttria based Zirconia Toughened Alumina (ZTA) inserts. These inserts are prepared through wet chemical co-precipitation route followed by powder metallurgy process. Experiments have been carried out based on an orthogonal array L9 with three parameters (cutting speed, depth of cut and feed rate) at three levels (low, medium and high). Based on the mean response and signal to noise ratio (SNR), the best optimal cutting condition has been arrived at A3B1C1 i.e. cutting speed is 420 m/min, depth of cut is 0.5 mm and feed rate is 0.12 m/min considering the condition smaller is the better approach. Analysis of Variance (ANOVA) is applied to find out the significance and percentage contribution of each parameter. The mathematical model of surface roughness has been developed using regression analysis as a function of the above mentioned independent variables. The predicted values from the developed model and experimental values are found to be very close to each other justifying the significance of the model. A confirmation run has been carried out with 95 % confidence level to verify the optimized result and the values obtained are within the prescribed limit.
管军; 杨兴易; 赵良; 林兆奋; 郭昌星; 李文放
2003-01-01
Objective To investigate the incidence, crude mortality and independent risk factors of ventilator-associated pneumonia (VAP) in comprehensive ICU in China.Methods The clinical and microbiological data were retrospectively collected and analysed of all the 97 patients receiving mechanical ventilation (>48hr) in our comprehensive ICU during 1999. 1 - 2000. 12. Firstly several statistically significant risk factors were screened out with univariate analysis, then independent risk factors were determined with multivariate stepwise logistic regression analysis.Results The incidence of VAP was 54. 64% (15. 60 cases per 1000 ventilation days), the crude mortality 47.42% . Interval between the establishment of artificial airway and diagnosis of VAP was 6.9 ± 4.3 d. Univariate analysis suggested that indwelling naso-gastric tube, corticosteroid, acid inhibitor, third-generation cephalosporin/ imipenem, non - infection lung disease, and extrapulmonary infection were the statistically significant risk factors of
Automated particle identification through regression analysis of size, shape and colour
Rodriguez Luna, J. C.; Cooper, J. M.; Neale, S. L.
2016-04-01
Rapid point of care diagnostic tests and tests to provide therapeutic information are now available for a range of specific conditions from the measurement of blood glucose levels for diabetes to card agglutination tests for parasitic infections. Due to a lack of specificity these test are often then backed up by more conventional lab based diagnostic methods for example a card agglutination test may be carried out for a suspected parasitic infection in the field and if positive a blood sample can then be sent to a lab for confirmation. The eventual diagnosis is often achieved by microscopic examination of the sample. In this paper we propose a computerized vision system for aiding in the diagnostic process; this system used a novel particle recognition algorithm to improve specificity and speed during the diagnostic process. We will show the detection and classification of different types of cells in a diluted blood sample using regression analysis of their size, shape and colour. The first step is to define the objects to be tracked by a Gaussian Mixture Model for background subtraction and binary opening and closing for noise suppression. After subtracting the objects of interest from the background the next challenge is to predict if a given object belongs to a certain category or not. This is a classification problem, and the output of the algorithm is a Boolean value (true/false). As such the computer program should be able to "predict" with reasonable level of confidence if a given particle belongs to the kind we are looking for or not. We show the use of a binary logistic regression analysis with three continuous predictors: size, shape and color histogram. The results suggest this variables could be very useful in a logistic regression equation as they proved to have a relatively high predictive value on their own.
Ya-Nan Ma
Full Text Available BACKGROUND: There have been few published studies on spirometric reference values for healthy children in China. We hypothesize that there would have been changes in lung function that would not have been precisely predicted by the existing spirometric reference equations. The objective of the study was to develop more accurate predictive equations for spirometric reference values for children aged 9 to 15 years in Northeast China. METHODOLOGY/PRINCIPAL FINDINGS: Spirometric measurements were obtained from 3,922 children, including 1,974 boys and 1,948 girls, who were randomly selected from five cities of Liaoning province, Northeast China, using the ATS (American Thoracic Society and ERS (European Respiratory Society standards. The data was then randomly split into a training subset containing 2078 cases and a validation subset containing 1844 cases. Predictive equations used multiple linear regression techniques with three predictor variables: height, age and weight. Model goodness of fit was examined using the coefficient of determination or the R(2 and adjusted R(2. The predicted values were compared with those obtained from the existing spirometric reference equations. The results showed the prediction equations using linear regression analysis performed well for most spirometric parameters. Paired t-tests were used to compare the predicted values obtained from the developed and existing spirometric reference equations based on the validation subset. The t-test for males was not statistically significant (p>0.01. The predictive accuracy of the developed equations was higher than the existing equations and the predictive ability of the model was also validated. CONCLUSION/SIGNIFICANCE: We developed prediction equations using linear regression analysis of spirometric parameters for children aged 9-15 years in Northeast China. These equations represent the first attempt at predicting lung function for Chinese children following the ATS
The analysis of internet addiction scale using multivariate adaptive regression splines.
Kayri, M
2010-01-01
Determining real effects on internet dependency is too crucial with unbiased and robust statistical method. MARS is a new non-parametric method in use in the literature for parameter estimations of cause and effect based research. MARS can both obtain legible model curves and make unbiased parametric predictions. In order to examine the performance of MARS, MARS findings will be compared to Classification and Regression Tree (C&RT) findings, which are considered in the literature to be efficient in revealing correlations between variables. The data set for the study is taken from "The Internet Addiction Scale" (IAS), which attempts to reveal addiction levels of individuals. The population of the study consists of 754 secondary school students (301 female, 443 male students with 10 missing data). MARS 2.0 trial version is used for analysis by MARS method and C&RT analysis was done by SPSS. MARS obtained six base functions of the model. As a common result of these six functions, regression equation of the model was found. Over the predicted variable, MARS showed that the predictors of daily Internet-use time on average, the purpose of Internet-use, grade of students and occupations of mothers had a significant effect (Pdependency level prediction. The fact that MARS revealed extent to which the variable, which was considered significant, changes the character of the model was observed in this study.
Julia Gasch
Full Text Available BACKGROUND: Differences in spontaneous and drug-induced baroreflex sensitivity (BRS have been attributed to its different operating ranges. The current study attempted to compare BRS estimates during cardiovascular steady-state and pharmacologically stimulation using an innovative algorithm for dynamic determination of baroreflex gain. METHODOLOGY/PRINCIPAL FINDINGS: Forty-five volunteers underwent the modified Oxford maneuver in supine and 60° tilted position with blood pressure and heart rate being continuously recorded. Drug-induced BRS-estimates were calculated from data obtained by bolus injections of nitroprusside and phenylephrine. Spontaneous indices were derived from data obtained during rest (stationary and under pharmacological stimulation (non-stationary using the algorithm of trigonometric regressive spectral analysis (TRS. Spontaneous and drug-induced BRS values were significantly correlated and display directionally similar changes under different situations. Using the Bland-Altman method, systematic differences between spontaneous and drug-induced estimates were found and revealed that the discrepancy can be as large as the gain itself. Fixed bias was not evident with ordinary least products regression. The correlation and agreement between the estimates increased significantly when BRS was calculated by TRS in non-stationary mode during the drug injection period. TRS-BRS significantly increased during phenylephrine and decreased under nitroprusside. CONCLUSIONS/SIGNIFICANCE: The TRS analysis provides a reliable, non-invasive assessment of human BRS not only under static steady state conditions, but also during pharmacological perturbation of the cardiovascular system.
Buck, J. A.; Underhill, P. R.; Morelli, J.; Krause, T. W.
2016-02-01
Nuclear steam generators (SGs) are a critical component for ensuring safe and efficient operation of a reactor. Life management strategies are implemented in which SG tubes are regularly inspected by conventional eddy current testing (ECT) and ultrasonic testing (UT) technologies to size flaws, and safe operating life of SGs is predicted based on growth models. ECT, the more commonly used technique, due to the rapidity with which full SG tube wall inspection can be performed, is challenged when inspecting ferromagnetic support structure materials in the presence of magnetite sludge and multiple overlapping degradation modes. In this work, an emerging inspection method, pulsed eddy current (PEC), is being investigated to address some of these particular inspection conditions. Time-domain signals were collected by an 8 coil array PEC probe in which ferromagnetic drilled support hole diameter, depth of rectangular tube frets and 2D tube off-centering were varied. Data sets were analyzed with a modified principal components analysis (MPCA) to extract dominant signal features. Multiple linear regression models were applied to MPCA scores to size hole diameter as well as size rectangular outer diameter tube frets. Models were improved through exploratory factor analysis, which was applied to MPCA scores to refine selection for regression models inputs by removing nonessential information.
Czekaj, Tomasz Gerard; Henningsen, Arne
The estimation of the technical efficiency comprises a vast literature in the field of applied production economics. There are two predominant approaches: the non-parametric and non-stochastic Data Envelopment Analysis (DEA) and the parametric Stochastic Frontier Analysis (SFA). The DEA...... of specifying an unsuitable functional form and thus, model misspecification and biased parameter estimates. Given these problems of the DEA and the SFA, Fan, Li and Weersink (1996) proposed a semi-parametric stochastic frontier model that estimates the production function (frontier) by non-parametric......), Kumbhakar et al. (2007), and Henningsen and Kumbhakar (2009). The aim of this paper and its main contribution to the existing literature is the estimation semi-parametric stochastic frontier models using a different non-parametric estimation technique: spline regression (Ma et al. 2011). We apply...
Within-session analysis of the extinction of pavlovian fear-conditioning using robust regression
Vargas-Irwin, Cristina
2010-06-01
Full Text Available Traditionally , the analysis of extinction data in fear conditioning experiments has involved the use of standard linear models, mostly ANOVA of between-group differences of subjects that have undergone different extinction protocols, pharmacological manipulations or some other treatment. Although some studies report individual differences in quantities such as suppression rates or freezing percentages, these differences are not included in the statistical modeling. Withinsubject response patterns are then averaged using coarse-grain time windows which can overlook these individual performance dynamics. Here we illustrate an alternative analytical procedure consisting of 2 steps: the estimation of a trend for within-session data and analysis of group differences in trend as main outcome. This procedure is tested on real fear-conditioning extinction data, comparing trend estimates via Ordinary Least Squares (OLS and robust Least Median of Squares (LMS regression estimates, as well as comparing between-group differences and analyzing mean freezing percentage versus LMS slopes as outcomes
A frailty model approach for regression analysis of multivariate current status data.
Chen, Man-Hua; Tong, Xingwei; Sun, Jianguo
2009-11-30
This paper discusses regression analysis of multivariate current status failure time data (The Statistical Analysis of Interval-censoring Failure Time Data. Springer: New York, 2006), which occur quite often in, for example, tumorigenicity experiments and epidemiologic investigations of the natural history of a disease. For the problem, several marginal approaches have been proposed that model each failure time of interest individually (Biometrics 2000; 56:940-943; Statist. Med. 2002; 21:3715-3726). In this paper, we present a full likelihood approach based on the proportional hazards frailty model. For estimation, an Expectation Maximization (EM) algorithm is developed and simulation studies suggest that the presented approach performs well for practical situations. The approach is applied to a set of bivariate current status data arising from a tumorigenicity experiment.
Roseane Cavalcanti dos Santos
2012-08-01
Full Text Available The objective of this work was to estimate the stability and adaptability of pod and seed yield in runner peanut genotypes based on the nonlinear regression and AMMI analysis. Yield data from 11 trials, distributed in six environments and three harvests, carried out in the Northeast region of Brazil during the rainy season were used. Significant effects of genotypes (G, environments (E, and GE interactions were detected in the analysis, indicating different behaviors among genotypes in favorable and unfavorable environmental conditions. The genotypes BRS Pérola Branca and LViPE‑06 are more stable and adapted to the semiarid environment, whereas LGoPE‑06 is a promising material for pod production, despite being highly dependent on favorable environments.
Lamm, Steven H; Ferdosi, Hamid; Dissen, Elisabeth K; Li, Ji; Ahn, Jaeil
2015-12-07
High levels (> 200 µg/L) of inorganic arsenic in drinking water are known to be a cause of human lung cancer, but the evidence at lower levels is uncertain. We have sought the epidemiological studies that have examined the dose-response relationship between arsenic levels in drinking water and the risk of lung cancer over a range that includes both high and low levels of arsenic. Regression analysis, based on six studies identified from an electronic search, examined the relationship between the log of the relative risk and the log of the arsenic exposure over a range of 1-1000 µg/L. The best-fitting continuous meta-regression model was sought and found to be a no-constant linear-quadratic analysis where both the risk and the exposure had been logarithmically transformed. This yielded both a statistically significant positive coefficient for the quadratic term and a statistically significant negative coefficient for the linear term. Sub-analyses by study design yielded results that were similar for both ecological studies and non-ecological studies. Statistically significant X-intercepts consistently found no increased level of risk at approximately 100-150 µg/L arsenic.
Das, Sumonkanti; Rahman, Rajwanur M
2011-11-14
The study attempts to develop an ordinal logistic regression (OLR) model to identify the determinants of child malnutrition instead of developing traditional binary logistic regression (BLR) model using the data of Bangladesh Demographic and Health Survey 2004. Based on weight-for-age anthropometric index (Z-score) child nutrition status is categorized into three groups-severely undernourished (malnutrition and severe malnutrition if the proportional odds assumption satisfies. The assumption is satisfied with low p-value (0.144) due to violation of the assumption for one co-variate. So partial proportional odds model (PPOM) and two BLR models have also been developed to check the applicability of the OLR model. Graphical test has also been adopted for checking the proportional odds assumption. All the models determine that age of child, birth interval, mothers' education, maternal nutrition, household wealth status, child feeding index, and incidence of fever, ARI & diarrhoea were the significant predictors of child malnutrition; however, results of PPOM were more precise than those of other models. These findings clearly justify that OLR models (POM and PPOM) are appropriate to find predictors of malnutrition instead of BLR models.
Huybrechts, Inge; Lioret, Sandrine; Mouratidou, Theodora; Gunter, Marc J; Manios, Yannis; Kersting, Mathilde; Gottrand, Frederic; Kafatos, Anthony; De Henauw, Stefaan; Cuenca-García, Magdalena; Widhalm, Kurt; Gonzales-Gross, Marcela; Molnar, Denes; Moreno, Luis A; McNaughton, Sarah A
2017-01-01
This study aims to examine repeatability of reduced rank regression (RRR) methods in calculating dietary patterns (DP) and cross-sectional associations with overweight (OW)/obesity across European and Australian samples of adolescents. Data from two cross-sectional surveys in Europe (2006/2007 Healthy Lifestyle in Europe by Nutrition in Adolescence study, including 1954 adolescents, 12-17 years) and Australia (2007 National Children's Nutrition and Physical Activity Survey, including 1498 adolescents, 12-16 years) were used. Dietary intake was measured using two non-consecutive, 24-h recalls. RRR was used to identify DP using dietary energy density, fibre density and percentage of energy intake from fat as the intermediate variables. Associations between DP scores and body mass/fat were examined using multivariable linear and logistic regression as appropriate, stratified by sex. The first DP extracted (labelled 'energy dense, high fat, low fibre') explained 47 and 31 % of the response variation in Australian and European adolescents, respectively. It was similar for European and Australian adolescents and characterised by higher consumption of biscuits/cakes, chocolate/confectionery, crisps/savoury snacks, sugar-sweetened beverages, and lower consumption of yogurt, high-fibre bread, vegetables and fresh fruit. DP scores were inversely associated with BMI z-scores in Australian adolescent boys and borderline inverse in European adolescent boys (so as with %BF). Similarly, a lower likelihood for OW in boys was observed with higher DP scores in both surveys. No such relationships were observed in adolescent girls. In conclusion, the DP identified in this cross-country study was comparable for European and Australian adolescents, demonstrating robustness of the RRR method in calculating DP among populations. However, longitudinal designs are more relevant when studying diet-obesity associations, to prevent reverse causality.
Evaluation of Visual Field Progression in Glaucoma: Quasar Regression Program and Event Analysis.
Díaz-Alemán, Valentín T; González-Hernández, Marta; Perera-Sanz, Daniel; Armas-Domínguez, Karintia
2016-01-01
To determine the sensitivity, specificity and agreement between the Quasar program, glaucoma progression analysis (GPA II) event analysis and expert opinion in the detection of glaucomatous progression. The Quasar program is based on linear regression analysis of both mean defect (MD) and pattern standard deviation (PSD). Each series of visual fields was evaluated by three methods; Quasar, GPA II and four experts. The sensitivity, specificity and agreement (kappa) for each method was calculated, using expert opinion as the reference standard. The study included 439 SITA Standard visual fields of 56 eyes of 42 patients, with a mean of 7.8 ± 0.8 visual fields per eye. When suspected cases of progression were considered stable, sensitivity and specificity of Quasar, GPA II and the experts were 86.6% and 70.7%, 26.6% and 95.1%, and 86.6% and 92.6% respectively. When suspected cases of progression were considered as progressing, sensitivity and specificity of Quasar, GPA II and the experts were 79.1% and 81.2%, 45.8% and 90.6%, and 85.4% and 90.6% respectively. The agreement between Quasar and GPA II when suspected cases were considered stable or progressing was 0.03 and 0.28 respectively. The degree of agreement between Quasar and the experts when suspected cases were considered stable or progressing was 0.472 and 0.507. The degree of agreement between GPA II and the experts when suspected cases were considered stable or progressing was 0.262 and 0.342. The combination of MD and PSD regression analysis in the Quasar program showed better agreement with the experts and higher sensitivity than GPA II.
Rodríguez-Barranco, Miguel; Tobías, Aurelio; Redondo, Daniel; Molina-Portillo, Elena; Sánchez, María José
2017-03-17
Meta-analysis is very useful to summarize the effect of a treatment or a risk factor for a given disease. Often studies report results based on log-transformed variables in order to achieve the principal assumptions of a linear regression model. If this is the case for some, but not all studies, the effects need to be homogenized. We derived a set of formulae to transform absolute changes into relative ones, and vice versa, to allow including all results in a meta-analysis. We applied our procedure to all possible combinations of log-transformed independent or dependent variables. We also evaluated it in a simulation based on two variables either normally or asymmetrically distributed. In all the scenarios, and based on different change criteria, the effect size estimated by the derived set of formulae was equivalent to the real effect size. To avoid biased estimates of the effect, this procedure should be used with caution in the case of independent variables with asymmetric distributions that significantly differ from the normal distribution. We illustrate an application of this procedure by an application to a meta-analysis on the potential effects on neurodevelopment in children exposed to arsenic and manganese. The procedure proposed has been shown to be valid and capable of expressing the effect size of a linear regression model based on different change criteria in the variables. Homogenizing the results from different studies beforehand allows them to be combined in a meta-analysis, independently of whether the transformations had been performed on the dependent and/or independent variables.
Meta-CART: A tool to identify interactions between moderators in meta-analysis.
Li, Xinru; Dusseldorp, Elise; Meulman, Jacqueline J
2017-02-01
In the framework of meta-analysis, moderator analysis is usually performed only univariately. When several study characteristics are available that may account for treatment effect, standard meta-regression has difficulties in identifying interactions between them. To overcome this problem, meta-CART has been proposed: an approach that applies classification and regression trees (CART) to identify interactions, and then subgroup meta-analysis to test the significance of moderator effects. The previous version of meta-CART has its shortcomings: when applying CART, the sample sizes of studies are not taken into account, and the effect sizes are dichotomized around the median value. Therefore, this article proposes new meta-CART extensions, weighting study effect sizes by their accuracy, and using a regression tree to avoid dichotomization. In addition, new pruning rules are proposed. The performance of all versions of meta-CART was evaluated via a Monte Carlo simulation study. The simulation results revealed that meta-regression trees with random-effects weights and a 0.5-standard-error pruning rule perform best. The required sample size for meta-CART to achieve satisfactory performance depends on the number of study characteristics, the magnitude of the interactions, and the residual heterogeneity.
Askelöf, P; Korsfeldt, M; Mannervik, B
1976-10-01
Knowledge of the error structure of a given set of experimental data is a necessary prerequisite for incisive analysis and for discrimination between alternative mathematical models of the data set. A reaction system consisting of glutathione S-transferase A (glutathione S-aryltransferase), glutathione, and 3,4-dichloro-1-nitrobenzene was investigated under steady-state conditions. It was found that the experimental error increased with initial velocity, v, and that the variance (estimated by replicates) could be described by a polynomial in v Var (v) = K0 + K1 - v + K2 - v2 or by a power function Var (v) = K0 + K1 - vK2. These equations were good approximations irrespective of whether different v values were generated by changing substrate or enzyme concentrations. The selection of these models was based mainly on experiments involving varying enzyme concentration, which, unlike v, is not considered a stochastic variable. Different models of the variance, expressed as functions of enzyme concentration, were examined by regression analysis, and the models could then be transformed to functions in which velocity is substituted for enzyme concentration owing to the proportionality between these variables. Thus, neither the absolute nor the relative error was independent of velocity, a result previously obtained for glutathione reductase in this laboratory [BioSystems 7, 101-119 (1975)]. If the experimental errors or velocities were standardized by division with their corresponding mean velocity value they showed a normal (Gaussian) distribution provided that the coefficient of variation was approximately constant for the data considered. Furthermore, it was established that the errors in the independent variables (enzyme and substrate concentrations) were small in comparison with the error in the velocity determinations. For weighting in regression analysis the inverted value of the local variance in each experimental point should be used. It was found that the
Assessment of participation bias in cohort studies: systematic review and meta-regression analysis.
Silva Junior, Sérgio Henrique Almeida da; Santos, Simone M; Coeli, Cláudia Medina; Carvalho, Marilia Sá
2015-11-01
The proportion of non-participation in cohort studies, if associated with both the exposure and the probability of occurrence of the event, can introduce bias in the estimates of interest. The aim of this study is to evaluate the impact of participation and its characteristics in longitudinal studies. A systematic review (MEDLINE, Scopus and Web of Science) for articles describing the proportion of participation in the baseline of cohort studies was performed. Among the 2,964 initially identified, 50 were selected. The average proportion of participation was 64.7%. Using a meta-regression model with mixed effects, only age, year of baseline contact and study region (borderline) were associated with participation. Considering the decrease in participation in recent years, and the cost of cohort studies, it is essential to gather information to assess the potential for non-participation, before committing resources. Finally, journals should require the presentation of this information in the papers.
Assessment of participation bias in cohort studies: systematic review and meta-regression analysis
Sérgio Henrique Almeida da Silva Junior
2015-11-01
Full Text Available Abstract The proportion of non-participation in cohort studies, if associated with both the exposure and the probability of occurrence of the event, can introduce bias in the estimates of interest. The aim of this study is to evaluate the impact of participation and its characteristics in longitudinal studies. A systematic review (MEDLINE, Scopus and Web of Science for articles describing the proportion of participation in the baseline of cohort studies was performed. Among the 2,964 initially identified, 50 were selected. The average proportion of participation was 64.7%. Using a meta-regression model with mixed effects, only age, year of baseline contact and study region (borderline were associated with participation. Considering the decrease in participation in recent years, and the cost of cohort studies, it is essential to gather information to assess the potential for non-participation, before committing resources. Finally, journals should require the presentation of this information in the papers.
Measuring treatment and scale bias effects by linear regression in the analysis of OHI-S scores.
Moore, B J
1977-05-01
A linear regression model is presented for estimating unbiased treatment effects from OHI-S scores. An example is given to illustrate an analysis and to compare results of an unbiased regression estimator with those based on a biased simple difference estimator.
The Impact of Outliers on Net-Benefit Regression Model in Cost-Effectiveness Analysis.
Wen, Yu-Wen; Tsai, Yi-Wen; Wu, David Bin-Chia; Chen, Pei-Fen
2013-01-01
Ordinary least square (OLS) in regression has been widely used to analyze patient-level data in cost-effectiveness analysis (CEA). However, the estimates, inference and decision making in the economic evaluation based on OLS estimation may be biased by the presence of outliers. Instead, robust estimation can remain unaffected and provide result which is resistant to outliers. The objective of this study is to explore the impact of outliers on net-benefit regression (NBR) in CEA using OLS and to propose a potential solution by using robust estimations, i.e. Huber M-estimation, Hampel M-estimation, Tukey's bisquare M-estimation, MM-estimation and least trimming square estimation. Simulations under different outlier-generating scenarios and an empirical example were used to obtain the regression estimates of NBR by OLS and five robust estimations. Empirical size and empirical power of both OLS and robust estimations were then compared in the context of hypothesis testing. Simulations showed that the five robust approaches compared with OLS estimation led to lower empirical sizes and achieved higher empirical powers in testing cost-effectiveness. Using real example of antiplatelet therapy, the estimated incremental net-benefit by OLS estimation was lower than those by robust approaches because of outliers in cost data. Robust estimations demonstrated higher probability of cost-effectiveness compared to OLS estimation. The presence of outliers can bias the results of NBR and its interpretations. It is recommended that the use of robust estimation in NBR can be an appropriate method to avoid such biased decision making.
Stature estimation from footprint measurements in Indian Tamils by regression analysis
T. Nataraja Moorthy
2014-03-01
Full Text Available Stature estimation is of particular interest to forensic scientists for its importance in human identification. Footprint is one piece of valuable physical evidence encountered at crime scenes and its identification can facilitate narrowing down the suspects and establishing the identity of the criminals. Analysis of footprints helps in estimation of an individual’s stature because of the existence of the strong correlation between footprint and height. Foot impressions are still found at crime scenes, since offenders often tend to remove their footwear either to avoid noise or to gain a better grip in climbing walls, etc., while entering or exiting. In Asian countries like India, there are people who still have the habit of walking barefoot. The present study aims to estimate the stature in a sample of 2,040 bilateral footprints collected from 1,020 healthy adult male Indian Tamils, an ethnic group in Tamilnadu State, India, who consented to participate in the study and who range in age from 19 to 42 years old; this study will help to generate population-specific equations using a simple linear regression statistical method. All footprint lengths exhibit a statistically positive significant correlation with stature (p-value < 0.01 and the correlation coefficient (r ranges from 0.546 to 0.578. The accuracy of the regression equations was verified by comparing the estimated stature with the actual stature. Regression equations derived in this research can be used to estimate stature from the complete or even partial footprints among Indian Tamils.
Fu, Yuan-Yuan; Wang, Ji-Hua; Yang, Gui-Jun; Song, Xiao-Yu; Xu, Xin-Gang; Feng, Hai-Kuan
2013-05-01
The major limitation of using existing vegetation indices for crop biomass estimation is that it approaches a saturation level asymptotically for a certain range of biomass. In order to resolve this problem, band depth analysis and partial least square regression (PLSR) were combined to establish winter wheat biomass estimation model in the present study. The models based on the combination of band depth analysis and PLSR were compared with the models based on common vegetation indexes from the point of view of estimation accuracy, subsequently. Band depth analysis was conducted in the visible spectral domain (550-750 nm). Band depth, band depth ratio (BDR), normalized band depth index, and band depth normalized to area were utilized to represent band depth information. Among the calibrated estimation models, the models based on the combination of band depth analysis and PLSR reached higher accuracy than those based on the vegetation indices. Among them, the combination of BDR and PLSR got the highest accuracy (R2 = 0.792, RMSE = 0.164 kg x m(-2)). The results indicated that the combination of band depth analysis and PLSR could well overcome the saturation problem and improve the biomass estimation accuracy when winter wheat biomass is large.
Shuai Wang
2014-10-01
Full Text Available Accurate prediction of the remaining useful life (RUL of lithium-ion batteries is important for battery management systems. Traditional empirical data-driven approaches for RUL prediction usually require multidimensional physical characteristics including the current, voltage, usage duration, battery temperature, and ambient temperature. From a capacity fading analysis of lithium-ion batteries, it is found that the energy efficiency and battery working temperature are closely related to the capacity degradation, which account for all performance metrics of lithium-ion batteries with regard to the RUL and the relationships between some performance metrics. Thus, we devise a non-iterative prediction model based on flexible support vector regression (F-SVR and an iterative multi-step prediction model based on support vector regression (SVR using the energy efficiency and battery working temperature as input physical characteristics. The experimental results show that the proposed prognostic models have high prediction accuracy by using fewer dimensions for the input data than the traditional empirical models.
An innovative land use regression model incorporating meteorology for exposure analysis.
Su, Jason G; Brauer, Michael; Ainslie, Bruce; Steyn, Douw; Larson, Timothy; Buzzelli, Michael
2008-02-15
The advent of spatial analysis and geographic information systems (GIS) has led to studies of chronic exposure and health effects based on the rationale that intra-urban variations in ambient air pollution concentrations are as great as inter-urban differences. Such studies typically rely on local spatial covariates (e.g., traffic, land use type) derived from circular areas (buffers) to predict concentrations/exposures at receptor sites, as a means of averaging the annual net effect of meteorological influences (i.e., wind speed, wind direction and insolation). This is the approach taken in the now popular land use regression (LUR) method. However spatial studies of chronic exposures and temporal studies of acute exposures have not been adequately integrated. This paper presents an innovative LUR method implemented in a GIS environment that reflects both temporal and spatial variability and considers the role of meteorology. The new source area LUR integrates wind speed, wind direction and cloud cover/insolation to estimate hourly nitric oxide (NO) and nitrogen dioxide (NO(2)) concentrations from land use types (i.e., road network, commercial land use) and these concentrations are then used as covariates to regress against NO and NO(2) measurements at various receptor sites across the Vancouver region and compared directly with estimates from a regular LUR. The results show that, when variability in seasonal concentration measurements is present, the source area LUR or SA-LUR model is a better option for concentration estimation.
Improved Regression Analysis of Temperature-Dependent Strain-Gage Balance Calibration Data
Ulbrich, N.
2015-01-01
An improved approach is discussed that may be used to directly include first and second order temperature effects in the load prediction algorithm of a wind tunnel strain-gage balance. The improved approach was designed for the Iterative Method that fits strain-gage outputs as a function of calibration loads and uses a load iteration scheme during the wind tunnel test to predict loads from measured gage outputs. The improved approach assumes that the strain-gage balance is at a constant uniform temperature when it is calibrated and used. First, the method introduces a new independent variable for the regression analysis of the balance calibration data. The new variable is designed as the difference between the uniform temperature of the balance and a global reference temperature. This reference temperature should be the primary calibration temperature of the balance so that, if needed, a tare load iteration can be performed. Then, two temperature{dependent terms are included in the regression models of the gage outputs. They are the temperature difference itself and the square of the temperature difference. Simulated temperature{dependent data obtained from Triumph Aerospace's 2013 calibration of NASA's ARC-30K five component semi{span balance is used to illustrate the application of the improved approach.
Rubio, Francisco J.
2016-02-09
We study Bayesian linear regression models with skew-symmetric scale mixtures of normal error distributions. These kinds of models can be used to capture departures from the usual assumption of normality of the errors in terms of heavy tails and asymmetry. We propose a general noninformative prior structure for these regression models and show that the corresponding posterior distribution is proper under mild conditions. We extend these propriety results to cases where the response variables are censored. The latter scenario is of interest in the context of accelerated failure time models, which are relevant in survival analysis. We present a simulation study that demonstrates good frequentist properties of the posterior credible intervals associated with the proposed priors. This study also sheds some light on the trade-off between increased model flexibility and the risk of over-fitting. We illustrate the performance of the proposed models with real data. Although we focus on models with univariate response variables, we also present some extensions to the multivariate case in the Supporting Information.
Abdul Ghafoor Memon
2014-03-01
Full Text Available In this study, thermodynamic and statistical analyses were performed on a gas turbine system, to assess the impact of some important operating parameters like CIT (Compressor Inlet Temperature, PR (Pressure Ratio and TIT (Turbine Inlet Temperature on its performance characteristics such as net power output, energy efficiency, exergy efficiency and fuel consumption. Each performance characteristic was enunciated as a function of operating parameters, followed by a parametric study and optimization. The results showed that the performance characteristics increase with an increase in the TIT and a decrease in the CIT, except fuel consumption which behaves oppositely. The net power output and efficiencies increase with the PR up to certain initial values and then start to decrease, whereas the fuel consumption always decreases with an increase in the PR. The results of exergy analysis showed the combustion chamber as a major contributor to the exergy destruction, followed by stack gas. Subsequently, multiple regression models were developed to correlate each of the response variables (performance characteristic with the predictor variables (operating parameters. The regression model equations showed a significant statistical relationship between the predictor and response variables.
Exergy Analysis of a Subcritical Reheat Steam Power Plant with Regression Modeling and Optimization
MUHIB ALI RAJPER
2016-07-01
Full Text Available In this paper, exergy analysis of a 210 MW SPP (Steam Power Plant is performed. Firstly, the plant is modeled and validated, followed by a parametric study to show the effects of various operating parameters on the performance parameters. The net power output, energy efficiency, and exergy efficiency are taken as the performance parameters, while the condenser pressure, main steam pressure, bled steam pressures, main steam temperature, and reheat steam temperature isnominated as the operating parameters. Moreover, multiple polynomial regression models are developed to correlate each performance parameter with the operating parameters. The performance is then optimizedby using Direct-searchmethod. According to the results, the net power output, energy efficiency, and exergy efficiency are calculated as 186.5 MW, 31.37 and 30.41%, respectively under normal operating conditions as a base case. The condenser is a major contributor towards the energy loss, followed by the boiler, whereas the highest irreversibilities occur in the boiler and turbine. According to the parametric study, variation in the operating parameters greatly influences the performance parameters. The regression models have appeared to be a good estimator of the performance parameters. The optimum net power output, energy efficiency and exergy efficiency are obtained as 227.6 MW, 37.4 and 36.4, respectively, which have been calculated along with optimal values of selected operating parameters.
Classification of Effective Soil Depth by Using Multinomial Logistic Regression Analysis
Chang, C. H.; Chan, H. C.; Chen, B. A.
2016-12-01
Classification of effective soil depth is a task of determining the slopeland utilizable limitation in Taiwan. The "Slopeland Conservation and Utilization Act" categorizes the slopeland into agriculture and husbandry land, land suitable for forestry and land for enhanced conservation according to the factors including average slope, effective soil depth, soil erosion and parental rock. However, sit investigation of the effective soil depth requires a cost-effective field work. This research aimed to classify the effective soil depth by using multinomial logistic regression with the environmental factors. The Wen-Shui Watershed located at the central Taiwan was selected as the study areas. The analysis of multinomial logistic regression is performed by the assistance of a Geographic Information Systems (GIS). The effective soil depth was categorized into four levels including deeper, deep, shallow and shallower. The environmental factors of slope, aspect, digital elevation model (DEM), curvature and normalized difference vegetation index (NDVI) were selected for classifying the soil depth. An Error Matrix was then used to assess the model accuracy. The results showed an overall accuracy of 75%. At the end, a map of effective soil depth was produced to help planners and decision makers in determining the slopeland utilizable limitation in the study areas.
Yu, Rongqin; Geddes, John R; Fazel, Seena
2012-10-01
The risk of antisocial outcomes in individuals with personality disorder (PD) remains uncertain. The authors synthesize the current evidence on the risks of antisocial behavior, violence, and repeat offending in PD, and they explore sources of heterogeneity in risk estimates through a systematic review and meta-regression analysis of observational studies comparing antisocial outcomes in personality disordered individuals with controls groups. Fourteen studies examined risk of antisocial and violent behavior in 10,007 individuals with PD, compared with over 12 million general population controls. There was a substantially increased risk of violent outcomes in studies with all PDs (random-effects pooled odds ratio [OR] = 3.0, 95% CI = 2.6 to 3.5). Meta-regression revealed that antisocial PD and gender were associated with higher risks (p = .01 and .07, respectively). The odds of all antisocial outcomes were also elevated. Twenty-five studies reported the risk of repeat offending in PD compared with other offenders. The risk of a repeat offense was also increased (fixed-effects pooled OR = 2.4, 95% CI = 2.2 to 2.7) in offenders with PD. The authors conclude that although PD is associated with antisocial outcomes and repeat offending, the risk appears to differ by PD category, gender, and whether individuals are offenders or not.
A quantile regression approach to the analysis of the quality of life determinants in the elderly
Serena Broccoli
2013-05-01
Full Text Available Objective. The aim of this study is to explain the effect of important covariates on the health-related quality of life (HRQol in elderly subjects. Methods. Data were collected within a longitudinal study that involves 5256 subject, aged +or= 65. The Visual Analogue Scale inclused in the EQ-5D Questionnaire, tha EQ-VAS, was used to obtain a synthetic measure of quality of life. To model EQ-VAS Score a quantile regression analysis was employed. This methodological approach was preferred to an OLS regression becouse of the EQ-VAS Score typical distribution. The main covariates are: amount of weekly physical activity, reported problems in Activity of Daily Living, presence of cardiovascular diseases, diabetes, hypercolesterolemia, hypertension, joints pains, as well as socio-demographic information. Main Results. 1 Even a low level of physical activity significantly influences quality of life in a positive way; 2 ADL problems, at least one cardiovascular disease and joint pain strongly decrease the quality of life.
Factors predicting the failure of Bernese periacetabular osteotomy: a meta-regression analysis.
Sambandam, Senthil Nathan; Hull, Jason; Jiranek, William A
2009-12-01
There is no clear evidence regarding the outcome of Bernese periacetabular osteotomy (PAO) in different patient populations. We performed systematic meta-regression analysis of 23 eligible studies. There were 1,113 patients of which 61 patients had total hip arthroplasty (THA) (endpoint) as a result of failed Bernese PAO. Univariate analysis revealed significant correlation between THA and presence of grade 2/grade 3 arthritis, Merle de'Aubigne score (MDS), Harris hip score and Tonnis angle, change in lateral centre edge (LCE) angle, late proximal femoral osteotomies, and heterotrophic ossification (HO) resection. Multivariate analysis showed that the odds of having THA increases with grade 2/grade 3 osteoarthritis (3.36 times), joint penetration (3.12 times), low preoperative MDS (1.59 times), late PFO (1.59 times), presence of preoperative subluxation (1.22 times), previous hip operations (1.14 times), and concomitant PFO (1.09 times). In the absence of randomised controlled studies, the findings of this analysis can help the surgeon to make treatment decisions.
A Performance Study of Data Mining Techniques: Multiple Linear Regression vs. Factor Analysis
Taneja, Abhishek
2011-01-01
The growing volume of data usually creates an interesting challenge for the need of data analysis tools that discover regularities in these data. Data mining has emerged as disciplines that contribute tools for data analysis, discovery of hidden knowledge, and autonomous decision making in many application domains. The purpose of this study is to compare the performance of two data mining techniques viz., factor analysis and multiple linear regression for different sample sizes on three unique sets of data. The performance of the two data mining techniques is compared on following parameters like mean square error (MSE), R-square, R-Square adjusted, condition number, root mean square error(RMSE), number of variables included in the prediction model, modified coefficient of efficiency, F-value, and test of normality. These parameters have been computed using various data mining tools like SPSS, XLstat, Stata, and MS-Excel. It is seen that for all the given dataset, factor analysis outperform multiple linear re...
Wu, X. B.
2006-06-01
Full Text Available Four body-size and fourteen head-size measurements were taken from each Chinese alligator (Alligator sinensis according to the measurements adapted from Verdade. Regression equations between body-size and head-size variables were presented to predict body size from head dimension. The coefficients of determination of captive animals concerning body- and head-size variables can be considered extremely high, which means most of the head-size variables studied can be useful for predicting body length. The result of multivariate allometric analysis indicated that the head elongates as in most other species of crocodilians. The allometric coefficients of snout length (SL and lower ramus (LM were greater than those of other variables of head, which was considered to be possibly correlated to fights and prey. On the contrary, allometric coefficients for the variables of obita (OW, OL and postorbital cranial roof (LCR, were lower than those of other variables.
Deterministic Assessment of Continuous Flight Auger Construction Durations Using Regression Analysis
Hossam E. Hosny
2015-07-01
Full Text Available One of the primary functions of construction equipment management is to calculate the production rate of equipment which will be a major input to the processes of time estimates, cost estimates and the overall project planning. Accordingly, it is crucial to stakeholders to be able to compute equipment production rates. This may be achieved using an accurate, reliable and easy tool. The objective of this research is to provide a simple model that can be used by specialists to predict the duration of a proposed Continuous Flight Auger job. The model was obtained using a prioritizing technique based on expert judgment then using multi-regression analysis based on a representative sample. The model was then validated on a selected sample of projects. The average error of the model was calculated to be about (3%-6%.
Biological stability in drinking water: a regression analysis of influencing factors
LU Wei; ZHANG Xiao-jian
2005-01-01
Some parameters, such as assimilable organic carbon(AOC), chloramine residual, water temperature, and water residence time, were measured in drinking water from distribution systems in a northern city of China. The measurement results illustrate that when chloramine residual is more than 0.3 mg/L or AOC content is below 50 tμg/L, the biological stability of drinking water can be controlled.Both chloramine residual and AOC have a good relationship with Heterotrophic Plate Counts(HPC)(log value), the correlation coefficient was -0.64 and 0.33, respectively. By regression analysis of the survey data, a statistical equation is presented and it is concluded that disinfectant residual exerts the strongest influence on bacterial growth and AOC is a suitable index to assess the biological stability in the drinking water.
Logistic Regression Analysis on Factors Affecting Adoption of RiceFish Farming in North Iran
Seyyed Ali NOORHOSSEINI-NIYAKI; Mohammad Sadegh ALLAHYARI
2012-01-01
We evaluated the factors influencing the adoption of rice-fish farming in the Tavalesh region near the Caspian Sea in northern Iran.We conducted a survey with open-ended questions.Data were collected from 184 respondents (61 adopters and 123 non-adopters) randomly sampled from selected villages and analyzed using logistic regression and multiresponse analysis.Family size,number of contacts with an extension agent,participation in extension-education activities,membership in social institutions and the presence of farm workers were the most important socioeconomic factors for the adoption of rice-fish farming system.In addition,economic problems were the most common issue reported by adopters.Other issues such as lack of access to appropriate fish food,losses of fish,lack of access to high quality fish fingerlings and dehydration and poor water quality were also important to a number of farmers.
ANALYSIS OF TUITION GROWTH RATES BASED ON CLUSTERING AND REGRESSION MODELS
Long Cheng
2016-07-01
Full Text Available Tuition plays a significant role in determining whether a student could afford higher education, which is one of the major driving forces for country development and social prosperity. So it is necessary to fully understand what factors might affect the tuition and how they affect it. However, many existing studies on the tuition growth rate either lack sufficient real data and proper quantitative models to support their conclusions, or are limited to focus on only a few factors that might affect the tuition growth rate, failing to make a comprehensive analysis. In this paper, we explore a wide variety of factors that might affect the tuition growth rate by use of large amounts of authentic data and different quantitative methods such as clustering and regression models.
A generalized Defries-Fulker regression framework for the analysis of twin data.
Lazzeroni, Laura C; Ray, Amrita
2013-01-01
Twin studies compare the similarity between monozygotic twins to that between dizygotic twins in order to investigate the relative contributions of latent genetic and environmental factors influencing a phenotype. Statistical methods for twin data include likelihood estimation and Defries-Fulker regression. We propose a new generalization of the Defries-Fulker model that fully incorporates the effects of observed covariates on both members of a twin pair and is robust to violations of the Normality assumption. A simulation study demonstrates that the method is competitive with likelihood analysis. The Defries-Fulker strategy yields new insight into the parameter space of the twin model and provides a novel, prediction-based interpretation of twin study results that unifies continuous and binary traits. Due to the simplicity of its structure, extensions of the model have the potential to encompass generalized linear models, censored and truncated data; and gene by environment interactions.
牛东晓; 刘达; 邢棉
2008-01-01
A combined model based on principal components analysis (PCA) and generalized regression neural network (GRNN) was adopted to forecast electricity price in day-ahead electricity market. PCA was applied to mine the main influence on day-ahead price, avoiding the strong correlation between the input factors that might influence electricity price, such as the load of the forecasting hour, other history loads and prices, weather and temperature; then GRNN was employed to forecast electricity price according to the main information extracted by PCA. To prove the efficiency of the combined model, a case from PJM (Pennsylvania-New Jersey-Maryland) day-ahead electricity market was evaluated. Compared to back-propagation (BP) neural network and standard GRNN, the combined method reduces the mean absolute percentage error about 3%.
Sensitivity Analysis to Select the Most Influential Risk Factors in a Logistic Regression Model
Jassim N. Hussain
2008-01-01
Full Text Available The traditional variable selection methods for survival data depend on iteration procedures, and control of this process assumes tuning parameters that are problematic and time consuming, especially if the models are complex and have a large number of risk factors. In this paper, we propose a new method based on the global sensitivity analysis (GSA to select the most influential risk factors. This contributes to simplification of the logistic regression model by excluding the irrelevant risk factors, thus eliminating the need to fit and evaluate a large number of models. Data from medical trials are suggested as a way to test the efficiency and capability of this method and as a way to simplify the model. This leads to construction of an appropriate model. The proposed method ranks the risk factors according to their importance.
Melanin and blood concentration in human skin studied by multiple regression analysis: experiments
Shimada, M.; Yamada, Y.; Itoh, M.; Yatagai, T.
2001-09-01
Knowledge of the mechanism of human skin colour and measurement of melanin and blood concentration in human skin are needed in the medical and cosmetic fields. The absorbance spectrum from reflectance at the visible wavelength of human skin increases under several conditions such as a sunburn or scalding. The change of the absorbance spectrum from reflectance including the scattering effect does not correspond to the molar absorption spectrum of melanin and blood. The modified Beer-Lambert law is applied to the change in the absorbance spectrum from reflectance of human skin as the change in melanin and blood is assumed to be small. The concentration of melanin and blood was estimated from the absorbance spectrum reflectance of human skin using multiple regression analysis. Estimated concentrations were compared with the measured one in a phantom experiment and this method was applied to in vivo skin.
Shen, Chung-Wei; Chen, Yi-Hau
2015-10-01
Missing observations and covariate measurement error commonly arise in longitudinal data. However, existing methods for model selection in marginal regression analysis of longitudinal data fail to address the potential bias resulting from these issues. To tackle this problem, we propose a new model selection criterion, the Generalized Longitudinal Information Criterion, which is based on an approximately unbiased estimator for the expected quadratic error of a considered marginal model accounting for both data missingness and covariate measurement error. The simulation results reveal that the proposed method performs quite well in the presence of missing data and covariate measurement error. On the contrary, the naive procedures without taking care of such complexity in data may perform quite poorly. The proposed method is applied to data from the Taiwan Longitudinal Study on Aging to assess the relationship of depression with health and social status in the elderly, accommodating measurement error in the covariate as well as missing observations.
A Note on Penalized Regression Spline Estimation in the Secondary Analysis of Case-Control Data
Gazioglu, Suzan
2013-05-25
Primary analysis of case-control studies focuses on the relationship between disease (D) and a set of covariates of interest (Y, X). A secondary application of the case-control study, often invoked in modern genetic epidemiologic association studies, is to investigate the interrelationship between the covariates themselves. The task is complicated due to the case-control sampling, and to avoid the biased sampling that arises from the design, it is typical to use the control data only. In this paper, we develop penalized regression spline methodology that uses all the data, and improves precision of estimation compared to using only the controls. A simulation study and an empirical example are used to illustrate the methodology.
Upender Manne
2007-01-01
Full Text Available Background: Although a majority of studies in cancer biomarker discovery claim to use proportional hazards regression (PHREG to the study the ability of a biomarker to predict survival, few studies use the predicted probabilities obtained from the model to test the quality of the model. In this paper, we compared the quality of predictions by a PHREG model to that of a linear discriminant analysis (LDA in both training and test set settings. Methods: The PHREG and LDA models were built on a 491 colorectal cancer (CRC patient dataset comprised of demographic and clinicopathologic variables, and phenotypic expression of p53 and Bcl-2. Two variable selection methods, stepwise discriminant analysis and the backward selection, were used to identify the final models. The endpoint of prediction in these models was five-year post-surgery survival. We also used linear regression model to examine the effect of bin size in the training set on the accuracy of prediction in the test set.Results: The two variable selection techniques resulted in different models when stage was included in the list of variables available for selection. However, the proportion of survivors and non-survivors correctly identified was identical in both of these models. When stage was excluded from the variable list, the error rate for the LDA model was 42% as compared to an error rate of 34% for the PHREG model.Conclusions: This study suggests that a PHREG model can perform as well or better than a traditional classifier such as LDA to classify patients into prognostic classes. Also, this study suggests that in the absence of the tumor stage as a variable, Bcl-2 expression is a strong prognostic molecular marker of CRC.
Chang-zhi CHENG
2011-06-01
Full Text Available Objective To explore the risk factors of complication of acute renal failure(ARF in war injuries of limbs.Methods The clinical data of 352 patients with limb injuries admitted to 303 Hospital of PLA from 1968 to 2002 were retrospectively analyzed.The patients were divided into ARF group(n=9 and non-ARF group(n=343 according to the occurrence of ARF,and the case-control study was carried out.Ten factors which might lead to death were analyzed by logistic regression to screen the risk factors for ARF,including causes of trauma,shock after injury,time of admission to hospital after injury,injured sites,combined trauma,number of surgical procedures,presence of foreign matters,features of fractures,amputation,and tourniquet time.Results Fifteen of the 352 patients died(4.3%,among them 7 patients(46.7% died of ARF,3(20.0% of pulmonary embolism,3(20.0% of gas gangrene,and 2(13.3% of multiple organ failure.Univariate analysis revealed that the shock,time before admitted to hospital,amputation and tourniquet time were the risk factors for ARF in the wounded with limb injuries,while the logistic regression analysis showed only amputation was the risk factor for ARF(P < 0.05.Conclusion ARF is the primary cause-of-death in the wounded with limb injury.Prompt and accurate treatment and optimal time for amputation may be beneficial to decreasing the incidence and mortality of ARF in the wounded with severe limb injury and ischemic necrosis.
Identifying clinical course patterns in SMS data using cluster analysis
Kent, Peter; Kongsted, Alice
2012-01-01
ABSTRACT: BACKGROUND: Recently, there has been interest in using the short message service (SMS or text messaging), to gather frequent information on the clinical course of individual patients. One possible role for identifying clinical course patterns is to assist in exploring clinically importa...... of cluster analysis. More research is needed, especially head-to-head studies, to identify which technique is best to use under what circumstances.......ABSTRACT: BACKGROUND: Recently, there has been interest in using the short message service (SMS or text messaging), to gather frequent information on the clinical course of individual patients. One possible role for identifying clinical course patterns is to assist in exploring clinically important...... by spline analysis. However, cluster analysis of SMS data in its original untransformed form may be simpler and offer other advantages. Therefore, the aim of this study was to determine whether cluster analysis could be used for identifying clinical course patterns distinct from the pattern of the whole...
Expert Involvement Predicts mHealth App Downloads: Multivariate Regression Analysis of Urology Apps.
Pereira-Azevedo, Nuno; Osório, Luís; Cavadas, Vitor; Fraga, Avelino; Carrasquinho, Eduardo; Cardoso de Oliveira, Eduardo; Castelo-Branco, Miguel; Roobol, Monique J
2016-07-15
Urological mobile medical (mHealth) apps are gaining popularity with both clinicians and patients. mHealth is a rapidly evolving and heterogeneous field, with some urology apps being downloaded over 10,000 times and others not at all. The factors that contribute to medical app downloads have yet to be identified, including the hypothetical influence of expert involvement in app development. The objective of our study was to identify predictors of the number of urology app downloads. We reviewed urology apps available in the Google Play Store and collected publicly available data. Multivariate ordinal logistic regression evaluated the effect of publicly available app variables on the number of apps being downloaded. Of 129 urology apps eligible for study, only 2 (1.6%) had >10,000 downloads, with half having ≤100 downloads and 4 (3.1%) having none at all. Apps developed with expert urologist involvement (P=.003), optional in-app purchases (P=.01), higher user rating (PApp cost was inversely related to the number of downloads (Papp development is likely to enhance its chances to have a higher number of downloads. This finding should help in the design of better apps and further promote urologist involvement in mHealth. Official certification processes are required to ensure app quality and user safety.
Liu, Bilan; Qiu, Xing; Zhu, Tong; Tian, Wei; Hu, Rui; Ekholm, Sven; Schifitto, Giovanni; Zhong, Jianhui
2016-03-01
Subject-specific longitudinal DTI study is vital for investigation of pathological changes of lesions and disease evolution. Spatial Regression Analysis of Diffusion tensor imaging (SPREAD) is a non-parametric permutation-based statistical framework that combines spatial regression and resampling techniques to achieve effective detection of localized longitudinal diffusion changes within the whole brain at individual level without a priori hypotheses. However, boundary blurring and dislocation limit its sensitivity, especially towards detecting lesions of irregular shapes. In the present study, we propose an improved SPREAD (dubbed improved SPREAD, or iSPREAD) method by incorporating a three-dimensional (3D) nonlinear anisotropic diffusion filtering method, which provides edge-preserving image smoothing through a nonlinear scale space approach. The statistical inference based on iSPREAD was evaluated and compared with the original SPREAD method using both simulated and in vivo human brain data. Results demonstrated that the sensitivity and accuracy of the SPREAD method has been improved substantially by adapting nonlinear anisotropic filtering. iSPREAD identifies subject-specific longitudinal changes in the brain with improved sensitivity, accuracy, and enhanced statistical power, especially when the spatial correlation is heterogeneous among neighboring image pixels in DTI.
Sugimoto, Dai; Myer, Gregory D; Barber Foss, Kim D; Pepin, Michael J; Micheli, Lyle J; Hewett, Timothy E
2017-01-01
Objective The aim of this study was to determine key components in neuromuscular training that optimise ACL injury reduction in female athletes using meta-regression analyses. Design Systematic review and meta-regression. Data sources The literature search was performed in PubMed and EBSCO. Eligibility criteria Inclusion criteria for the current analysis were: (1) documented the number of ACL injuries, (2) employed a neuromuscular training intervention that aimed to reduce ACL injuries, (3) had a comparison group, (4) used a prospective control study design and (5) recruited female athletes as participants. Two independent reviewers extracted studies which met the inclusion criteria. Methodological quality of included study and strength of recommendation were evaluated. Number of ACL injuries and participants in control and intervention groups, age of participants, dosage of neuromuscular training, exercise variations within neuromuscular training and status of verbal feedback were extracted. Results The meta-regression analyses identified age of participants, dosage of neuromuscular training, exercise variations within neuromuscular training and utilisation of verbal feedback as significant predictors of ACL injury reduction (p=0.01 in fixed-effects model, p=0.03 in random-effects model). Inclusion of 1 of the 4 components in neuromuscular training could reduce ACL injury risk by 17.2–17.7% in female athletes. No significant heterogeneity and publication bias effects were detected. Strength of recommendation was rated as A (recommendation based on consistent and good-quality patient-oriented study evidence). Conclusions Age of participants, dosage of neuromuscular training, exercise variations within neuromuscular training and utilisation of verbal feedback are predictors that influence the optimisation of prophylactic effects of neuromuscular training and the resultant ACL injury reduction in female athletes. PMID:27251898
Noor Zaitun Yahaya
2017-01-01
Full Text Available This paper investigated the use of boosted regression trees (BRTs to draw an inference about daytime and nighttime ozone formation in a coastal environment. Hourly ground-level ozone data for a full calendar year in 2010 were obtained from the Kemaman (CA 002 air quality monitoring station. A BRT model was developed using hourly ozone data as a response variable and nitric oxide (NO, Nitrogen Dioxide (NO2 and Nitrogen Dioxide (NOx and meteorological parameters as explanatory variables. The ozone BRT algorithm model was constructed from multiple regression models, and the 'best iteration' of BRT model was performed by optimizing prediction performance. Sensitivity testing of the BRT model was conducted to determine the best parameters and good explanatory variables. Using the number of trees between 2,500-3,500, learning rate of 0.01, and interaction depth of 5 were found to be the best setting for developing the ozone boosting model. The performance of the O3 boosting models were assessed, and the fraction of predictions within two factor (FAC2, coefficient of determination (R2 and the index of agreement (IOA of the model developed for day and nighttime are 0.93, 0.69 and 0.73 for daytime and 0.79, 0.55 and 0.69 for nighttime respectively. Results showed that the model developed was within the acceptable range and could be used to understand ozone formation and identify potential sources of ozone for estimating O3 concentrations during daytime and nighttime. Results indicated that the wind speed, wind direction, relative humidity, and temperature were the most dominant variables in terms of influencing ozone formation. Finally, empirical evidence of the production of a high ozone level by wind blowing from coastal areas towards the interior region, especially from industrial areas, was obtained.
Josef Smolle
2001-01-01
Full Text Available Objective: To evaluate the feasibility of the CART (Classification and Regression Tree procedure for the recognition of microscopic structures in tissue counter analysis. Methods: Digital microscopic images of H&E stained slides of normal human skin and of primary malignant melanoma were overlayed with regularly distributed square measuring masks (elements and grey value, texture and colour features within each mask were recorded. In the learning set, elements were interactively labeled as representing either connective tissue of the reticular dermis, other tissue components or background. Subsequently, CART models were based on these data sets. Results: Implementation of the CART classification rules into the image analysis program showed that in an independent test set 94.1% of elements classified as connective tissue of the reticular dermis were correctly labeled. Automated measurements of the total amount of tissue and of the amount of connective tissue within a slide showed high reproducibility (r=0.97 and r=0.94, respectively; p < 0.001. Conclusions: CART procedure in tissue counter analysis yields simple and reproducible classification rules for tissue elements.
Zwinderman Aeilko H
2009-09-01
Full Text Available Abstract Background We generalized penalized canonical correlation analysis for analyzing microarray gene-expression measurements for checking completeness of known metabolic pathways and identifying candidate genes for incorporation in the pathway. We used Wold's method for calculation of the canonical variates, and we applied ridge penalization to the regression of pathway genes on canonical variates of the non-pathway genes, and the elastic net to the regression of non-pathway genes on the canonical variates of the pathway genes. Results We performed a small simulation to illustrate the model's capability to identify new candidate genes to incorporate in the pathway: in our simulations it appeared that a gene was correctly identified if the correlation with the pathway genes was 0.3 or more. We applied the methods to a gene-expression microarray data set of 12, 209 genes measured in 45 patients with glioblastoma, and we considered genes to incorporate in the glioma-pathway: we identified more than 25 genes that correlated > 0.9 with canonical variates of the pathway genes. Conclusion We concluded that penalized canonical correlation analysis is a powerful tool to identify candidate genes in pathway analysis.
Genetic analysis of somatic cell score in Norwegian cattle using random regression test-day models.
Odegård, J; Jensen, J; Klemetsdal, G; Madsen, P; Heringstad, B
2003-12-01
The dataset used in this analysis contained a total of 341,736 test-day observations of somatic cell scores from 77,110 primiparous daughters of 1965 Norwegian Cattle sires. Initial analyses, using simple random regression models without genetic effects, indicated that use of homogeneous residual variance was appropriate. Further analyses were carried out by use of a repeatability model and 12 random regression sire models. Legendre polynomials of varying order were used to model both permanent environmental and sire effects, as did the Wilmink function, the Lidauer-Mäntysaari function, and the Ali-Schaeffer function. For all these models, heritability estimates were lowest at the beginning (0.05 to 0.07) and higher at the end (0.09 to 0.12) of lactation. Genetic correlations between somatic cell scores early and late in lactation were moderate to high (0.38 to 0.71), whereas genetic correlations for adjacent DIM were near unity. Models were compared based on likelihood ratio tests, Bayesian information criterion, Akaike information criterion, residual variance, and predictive ability. Based on prediction of randomly excluded observations, models with 4 coefficients for permanent environmental effect were preferred over simpler models. More highly parameterized models did not substantially increase predictive ability. Evaluation of the different model selection criteria indicated that a reduced order of fit for sire effects was desireable. Models with zeroth- or first-order of fit for sire effects and higher order of fit for permanent environmental effects probably underestimated sire variance. The chosen model had Legendre polynomials with 3 coefficients for sire, and 4 coefficients for permanent environmental effects. For this model, trajectories of sire variance and heritability were similar assuming either homogeneous or heterogeneous residual variance structure.
The value of a statistical life: a meta-analysis with a mixed effects regression model.
Bellavance, François; Dionne, Georges; Lebeau, Martin
2009-03-01
The value of a statistical life (VSL) is a very controversial topic, but one which is essential to the optimization of governmental decisions. We see a great variability in the values obtained from different studies. The source of this variability needs to be understood, in order to offer public decision-makers better guidance in choosing a value and to set clearer guidelines for future research on the topic. This article presents a meta-analysis based on 39 observations obtained from 37 studies (from nine different countries) which all use a hedonic wage method to calculate the VSL. Our meta-analysis is innovative in that it is the first to use the mixed effects regression model [Raudenbush, S.W., 1994. Random effects models. In: Cooper, H., Hedges, L.V. (Eds.), The Handbook of Research Synthesis. Russel Sage Foundation, New York] to analyze studies on the value of a statistical life. We conclude that the variability found in the values studied stems in large part from differences in methodologies.
Yuliana Yuliana
2010-06-01
Full Text Available Quantitative Electronic Structure Activity Relationship (QSAR analysis of a series of benzalacetones has been investigated based on semi empirical PM3 calculation data using Principal Components Regression (PCR. Investigation has been done based on antimutagen activity from benzalacetone compounds (presented by log 1/IC50 and was studied as linear correlation with latent variables (Tx resulted from transformation of atomic net charges using Principal Component Analysis (PCA. QSAR equation was determinated based on distribution of selected components and then was analysed with PCR. The result was described by the following QSAR equation : log 1/IC50 = 6.555 + (2.177.T1 + (2.284.T2 + (1.933.T3 The equation was significant on the 95% level with statistical parameters : n = 28 r = 0.766 SE = 0.245 Fcalculation/Ftable = 3.780 and gave the PRESS result 0.002. It means that there were only a relatively few deviations between the experimental and theoretical data of antimutagenic activity. New types of benzalacetone derivative compounds were designed and their theoretical activity were predicted based on the best QSAR equation. It was found that compounds number 29, 30, 31, 32, 33, 35, 36, 37, 38, 40, 41, 42, 44, 47, 48, 49 and 50 have a relatively high antimutagenic activity. Keywords: QSAR; antimutagenic activity; benzalaceton; atomic net charge
Binary Logistic Regression Analysis of Foramen Magnum Dimensions for Sex Determination
Kamath, Venkatesh Gokuldas
2015-01-01
Purpose. The structural integrity of foramen magnum is usually preserved in fire accidents and explosions due to its resistant nature and secluded anatomical position and this study attempts to determine its sexing potential. Methods. The sagittal and transverse diameters and area of foramen magnum of seventy-two skulls (41 male and 31 female) from south Indian population were measured. The analysis was done using Student's t-test, linear correlation, histogram, Q-Q plot, and Binary Logistic Regression (BLR) to obtain a model for sex determination. The predicted probabilities of BLR were analysed using Receiver Operating Characteristic (ROC) curve. Result. BLR analysis and ROC curve revealed that the predictability of the dimensions in sexing the crania was 69.6% for sagittal diameter, 66.4% for transverse diameter, and 70.3% for area of foramen. Conclusion. The sexual dimorphism of foramen magnum dimensions is established. However, due to considerable overlapping of male and female values, it is unwise to singularly rely on the foramen measurements. However, considering the high sex predictability percentage of its dimensions in the present study and the studies preceding it, the foramen measurements can be used to supplement other sexing evidence available so as to precisely ascertain the sex of the skeleton. PMID:26346917
A calibration method of Argo floats based on multiple regression analysis
无
2006-01-01
Argo floats are free-moving floats that report vertical profiles of salinity, temperature and pressure at regular time intervals. These floats give good measurements of temperature and pressure, but salinity measurements may show significant sensor drifting with time. It is found that sensor drifting with time is not purely linear as presupposed by Wong (2003). A new method is developed to calibrate conductivity data measured by Argo floats. In this method, Wong's objective analysis method was adopted to estimate the background climatological salinity field on potential temperature surfaces from nearby historical data in WOD01. Furthermore, temperature and time factors are taken into account, and stepwise regression was used for a time-varying or temperature-varying slope in potential conductivity space to correct the drifting in these profiling float salinity data. The result shows salinity errors using this method are smaller than that of Wong's method, the quantitative and qualitative analysis of the conductivity sensor can be carried out with our method.
Rajab, Jasim Mohammed; Jafri, Mohd. Zubir Mat; Lim, Hwee San; Abdullah, Khiruddin
2012-10-01
This study encompasses air surface temperature (AST) modeling in the lower atmosphere. Data of four atmosphere pollutant gases (CO, O3, CH4, and H2O) dataset, retrieved from the National Aeronautics and Space Administration Atmospheric Infrared Sounder (AIRS), from 2003 to 2008 was employed to develop a model to predict AST value in the Malaysian peninsula using the multiple regression method. For the entire period, the pollutants were highly correlated (R=0.821) with predicted AST. Comparisons among five stations in 2009 showed close agreement between the predicted AST and the observed AST from AIRS, especially in the southwest monsoon (SWM) season, within 1.3 K, and for in situ data, within 1 to 2 K. The validation results of AST with AST from AIRS showed high correlation coefficient (R=0.845 to 0.918), indicating the model's efficiency and accuracy. Statistical analysis in terms of β showed that H2O (0.565 to 1.746) tended to contribute significantly to high AST values during the northeast monsoon season. Generally, these results clearly indicate the advantage of using the satellite AIRS data and a correlation analysis study to investigate the impact of atmospheric greenhouse gases on AST over the Malaysian peninsula. A model was developed that is capable of retrieving the Malaysian peninsulan AST in all weather conditions, with total uncertainties ranging between 1 and 2 K.
Machine learning of swimming data via wisdom of crowd and regression analysis.
Xie, Jiang; Xu, Junfu; Nie, Celine; Nie, Qing
2017-04-01
Every performance, in an officially sanctioned meet, by a registered USA swimmer is recorded into an online database with times dating back to 1980. For the first time, statistical analysis and machine learning methods are systematically applied to 4,022,631 swim records. In this study, we investigate performance features for all strokes as a function of age and gender. The variances in performance of males and females for different ages and strokes were studied, and the correlations of performances for different ages were estimated using the Pearson correlation. Regression analysis show the performance trends for both males and females at different ages and suggest critical ages for peak training. Moreover, we assess twelve popular machine learning methods to predict or classify swimmer performance. Each method exhibited different strengths or weaknesses in different cases, indicating no one method could predict well for all strokes. To address this problem, we propose a new method by combining multiple inference methods to derive Wisdom of Crowd Classifier (WoCC). Our simulation experiments demonstrate that the WoCC is a consistent method with better overall prediction accuracy. Our study reveals several new age-dependent trends in swimming and provides an accurate method for classifying and predicting swimming times.
VanEngelsdorp, Dennis; Speybroeck, Niko; Evans, Jay D; Nguyen, Bach Kim; Mullin, Chris; Frazier, Maryann; Frazier, Jim; Cox-Foster, Diana; Chen, Yanping; Tarpy, David R; Haubruge, Eric; Pettis, Jeffrey S; Saegerman, Claude
2010-10-01
Colony collapse disorder (CCD), a syndrome whose defining trait is the rapid loss of adult worker honey bees, Apis mellifera L., is thought to be responsible for a minority of the large overwintering losses experienced by U.S. beekeepers since the winter 2006-2007. Using the same data set developed to perform a monofactorial analysis (PloS ONE 4: e6481, 2009), we conducted a classification and regression tree (CART) analysis in an attempt to better understand the relative importance and interrelations among different risk variables in explaining CCD. Fifty-five exploratory variables were used to construct two CART models: one model with and one model without a cost of misclassifying a CCD-diagnosed colony as a non-CCD colony. The resulting model tree that permitted for misclassification had a sensitivity and specificity of 85 and 74%, respectively. Although factors measuring colony stress (e.g., adult bee physiological measures, such as fluctuating asymmetry or mass of head) were important discriminating values, six of the 19 variables having the greatest discriminatory value were pesticide levels in different hive matrices. Notably, coumaphos levels in brood (a miticide commonly used by beekeepers) had the highest discriminatory value and were highest in control (healthy) colonies. Our CART analysis provides evidence that CCD is probably the result of several factors acting in concert, making afflicted colonies more susceptible to disease. This analysis highlights several areas that warrant further attention, including the effect of sublethal pesticide exposure on pathogen prevalence and the role of variability in bee tolerance to pesticides on colony survivorship.
Pudji Ismartini
2010-08-01
Full Text Available One of the major problem facing the data modelling at social area is multicollinearity. Multicollinearity can have significant impact on the quality and stability of the fitted regression model. Common classical regression technique by using Least Squares estimate is highly sensitive to multicollinearity problem. In such a problem area, Partial Least Squares Regression (PLSR is a useful and flexible tool for statistical model building; however, PLSR can only yields point estimations. This paper will construct the interval estimations for PLSR regression parameters by implementing Jackknife technique to poverty data. A SAS macro programme is developed to obtain the Jackknife interval estimator for PLSR.
Naghshpour, Shahdad
2012-01-01
Regression analysis is the most commonly used statistical method in the world. Although few would characterize this technique as simple, regression is in fact both simple and elegant. The complexity that many attribute to regression analysis is often a reflection of their lack of familiarity with the language of mathematics. But regression analysis can be understood even without a mastery of sophisticated mathematical concepts. This book provides the foundation and will help demystify regression analysis using examples from economics and with real data to show the applications of the method. T
Hart, Harvi F; Barton, Christian J; Khan, Karim M; Riel, Henrik; Crossley, Kay M
2017-05-01
Patellofemoral pain (PFP) occurs frequently, and may be related to patellofemoral osteoarthritis (PFOA). Obesity is associated with increased risk of knee OA. This systematic review involves a meta-regression and analysis to determine the relationship between body mass index (BMI) and PFP and PFOA, and to determine the link between BMI and interventional outcomes. We searched seven electronic databases and reference lists of relevant papers and systematic reviews, for cross-sectional, prospective, human-based observational and interventional studies reporting BMI in individuals with PFP or PFOA compared to healthy controls. Two independent reviewers appraised methodological quality (epidemiological appraisal instrument). Where possible, data from prospective studies were pooled to conduct meta-regression and case-control, and intervention studies to conduct meta-analysis using the following categories: adolescents with PFP, adults with PFP and PFOA. 52 studies were included. We found greater BMI in adults with PFP (standardised mean difference: 0.24, 95% CI 0.12 to 0.36) and PFOA (0.73, 0.46 to 0.99) compared to healthy controls, but not in adolescents with PFP (-0.19, -0.56 to 0.18). We also observed statistical trends (pBMI being a predictor for development of PFP in adults (0.34, -0.04 to 0.71). No significant link between BMI and intervention outcomes in adults with PFP was identified. Higher BMI is present in PFP and PFOA, but not in adolescents with PFP. CRD42015024812. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/.
Comparative analysis of regression and artificial neural network models for wind speed prediction
Bilgili, Mehmet; Sahin, Besir
2010-11-01
In this study, wind speed was modeled by linear regression (LR), nonlinear regression (NLR) and artificial neural network (ANN) methods. A three-layer feedforward artificial neural network structure was constructed and a backpropagation algorithm was used for the training of ANNs. To get a successful simulation, firstly, the correlation coefficients between all of the meteorological variables (wind speed, ambient temperature, atmospheric pressure, relative humidity and rainfall) were calculated taking two variables in turn for each calculation. All independent variables were added to the simple regression model. Then, the method of stepwise multiple regression was applied for the selection of the “best” regression equation (model). Thus, the best independent variables were selected for the LR and NLR models and also used in the input layer of the ANN. The results obtained by all methods were compared to each other. Finally, the ANN method was found to provide better performance than the LR and NLR methods.
Using the classical linear regression model in analysis of the dependences of conveyor belt life
Miriam Andrejiová
2013-12-01
Full Text Available The paper deals with the classical linear regression model of the dependence of conveyor belt life on some selected parameters: thickness of paint layer, width and length of the belt, conveyor speed and quantity of transported material. The first part of the article is about regression model design, point and interval estimation of parameters, verification of statistical significance of the model, and about the parameters of the proposed regression model. The second part of the article deals with identification of influential and extreme values that can have an impact on estimation of regression model parameters. The third part focuses on assumptions of the classical regression model, i.e. on verification of independence assumptions, normality and homoscedasticity of residuals.
Pradhan, Biswajeet
2010-05-01
This paper presents the results of the cross-validation of a multivariate logistic regression model using remote sensing data and GIS for landslide hazard analysis on the Penang, Cameron, and Selangor areas in Malaysia. Landslide locations in the study areas were identified by interpreting aerial photographs and satellite images, supported by field surveys. SPOT 5 and Landsat TM satellite imagery were used to map landcover and vegetation index, respectively. Maps of topography, soil type, lineaments and land cover were constructed from the spatial datasets. Ten factors which influence landslide occurrence, i.e., slope, aspect, curvature, distance from drainage, lithology, distance from lineaments, soil type, landcover, rainfall precipitation, and normalized difference vegetation index (ndvi), were extracted from the spatial database and the logistic regression coefficient of each factor was computed. Then the landslide hazard was analysed using the multivariate logistic regression coefficients derived not only from the data for the respective area but also using the logistic regression coefficients calculated from each of the other two areas (nine hazard maps in all) as a cross-validation of the model. For verification of the model, the results of the analyses were then compared with the field-verified landslide locations. Among the three cases of the application of logistic regression coefficient in the same study area, the case of Selangor based on the Selangor logistic regression coefficients showed the highest accuracy (94%), where as Penang based on the Penang coefficients showed the lowest accuracy (86%). Similarly, among the six cases from the cross application of logistic regression coefficient in other two areas, the case of Selangor based on logistic coefficient of Cameron showed highest (90%) prediction accuracy where as the case of Penang based on the Selangor logistic regression coefficients showed the lowest accuracy (79%). Qualitatively, the cross
Structural parameter identifiability analysis for dynamic reaction networks
Davidescu, Florin Paul; Jørgensen, Sten Bay
2008-01-01
. The proposed analysis is performed in two phases. The first phase determines the structurally identifiable reaction rates based on reaction network stoichiometry. The second phase assesses the structural parameter identifiability of the specific kinetic rate expressions using a generating series expansion...... method based on Lie derivatives. The proposed systematic two phase methodology is illustrated on a mass action based model for an enzymatically catalyzed reaction pathway network where only a limited set of variables is measured. The methodology clearly pinpoints the structurally identifiable parameters...
Effect of acute hypoxia on cognition: A systematic review and meta-regression analysis.
McMorris, Terry; Hale, Beverley J; Barwood, Martin; Costello, Joseph; Corbett, Jo
2017-03-01
A systematic meta-regression analysis of the effects of acute hypoxia on the performance of central executive and non-executive tasks, and the effects of the moderating variables, arterial partial pressure of oxygen (PaO2) and hypobaric versus normobaric hypoxia, was undertaken. Studies were included if they were performed on healthy humans; within-subject design was used; data were reported giving the PaO2 or that allowed the PaO2 to be estimated (e.g. arterial oxygen saturation and/or altitude); and the duration of being in a hypoxic state prior to cognitive testing was ≤6days. Twenty-two experiments met the criteria for inclusion and demonstrated a moderate, negative mean effect size (g=-0.49, 95% CI -0.64 to -0.34, p<0.001). There were no significant differences between central executive and non-executive, perception/attention and short-term memory, tasks. Low (35-60mmHg) PaO2 was the key predictor of cognitive performance (R(2)=0.45, p<0.001) and this was independent of whether the exposure was in hypobaric hypoxic or normobaric hypoxic conditions.
Fernández-Fernández, Mario; Rodríguez-González, Pablo; García Alonso, J Ignacio
2016-10-01
We have developed a novel, rapid and easy calculation procedure for Mass Isotopomer Distribution Analysis based on multiple linear regression which allows the simultaneous calculation of the precursor pool enrichment and the fraction of newly synthesized labelled proteins (fractional synthesis) using linear algebra. To test this approach, we used the peptide RGGGLK as a model tryptic peptide containing three subunits of glycine. We selected glycine labelled in two (13) C atoms ((13) C2 -glycine) as labelled amino acid to demonstrate that spectral overlap is not a problem in the proposed methodology. The developed methodology was tested first in vitro by changing the precursor pool enrichment from 10 to 40% of (13) C2 -glycine. Secondly, a simulated in vivo synthesis of proteins was designed by combining the natural abundance RGGGLK peptide and 10 or 20% (13) C2 -glycine at 1 : 1, 1 : 3 and 3 : 1 ratios. Precursor pool enrichments and fractional synthesis values were calculated with satisfactory precision and accuracy using a simple spreadsheet. This novel approach can provide a relatively rapid and easy means to measure protein turnover based on stable isotope tracers. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.
A New Global Regression Analysis Method for the Prediction of Wind Tunnel Model Weight Corrections
Ulbrich, Norbert Manfred; Bridge, Thomas M.; Amaya, Max A.
2014-01-01
A new global regression analysis method is discussed that predicts wind tunnel model weight corrections for strain-gage balance loads during a wind tunnel test. The method determines corrections by combining "wind-on" model attitude measurements with least squares estimates of the model weight and center of gravity coordinates that are obtained from "wind-off" data points. The method treats the least squares fit of the model weight separate from the fit of the center of gravity coordinates. Therefore, it performs two fits of "wind- off" data points and uses the least squares estimator of the model weight as an input for the fit of the center of gravity coordinates. Explicit equations for the least squares estimators of the weight and center of gravity coordinates are derived that simplify the implementation of the method in the data system software of a wind tunnel. In addition, recommendations for sets of "wind-off" data points are made that take typical model support system constraints into account. Explicit equations of the confidence intervals on the model weight and center of gravity coordinates and two different error analyses of the model weight prediction are also discussed in the appendices of the paper.
Boldizsar Nagy
2017-05-01
Full Text Available In the present study the biosorption characteristics of Cd (II and Zn (II ions from monocomponent aqueous solutions by Agaricus bisporus macrofungus were investigated. The initial metal ion concentrations, contact time, initial pH and temperature were parameters that influence the biosorption. Maximum removal efficiencies up to 76.10% and 70.09% (318 K for Cd (II and Zn (II, respectively and adsorption capacities up to 3.49 and 2.39 mg/g for Cd (II and Zn (II, respectively at the highest concentration, were calculated. The experimental data were analyzed using pseudo-first- and pseudo-second-order kinetic models, various isotherm models in linear and nonlinear (CMA-ES optimization algorithm regression and thermodynamic parameters were calculated. The results showed that the biosorption process of both studied metal ions, followed pseudo second-order kinetics, while equilibrium is best described by Sips isotherm. The changes in morphological structure after heavy metal-biomass interactions were evaluated by SEM analysis. Our results confirmed that macrofungus A. bisporus could be used as a cost effective, efficient biosorbent for the removal of Cd (II and Zn (II from aqueous synthetic solutions.
A systematic review and meta-regression analysis of mivacurium for tracheal intubation.
Vanlinthout, L E H; Mesfin, S H; Hens, N; Vanacker, B F; Robertson, E N; Booij, L H D J
2014-12-01
We systematically reviewed factors associated with intubation conditions in randomised controlled trials of mivacurium, using random-effects meta-regression analysis. We included 29 studies of 1050 healthy participants. Four factors explained 72.9% of the variation in the probability of excellent intubation conditions: mivacurium dose, 24.4%; opioid use, 29.9%; time to intubation and age together, 18.6%. The odds ratio (95% CI) for excellent intubation was 3.14 (1.65-5.73) for doubling the mivacurium dose, 5.99 (2.14-15.18) for adding opioids to the intubation sequence, and 6.55 (6.01-7.74) for increasing the delay between mivacurium injection and airway insertion from 1 to 2 min in subjects aged 25 years and 2.17 (2.01-2.69) for subjects aged 70 years, p < 0.001 for all. We conclude that good conditions for tracheal intubation are more likely by delaying laryngoscopy after injecting a higher dose of mivacurium with an opioid, particularly in older people.
C. Makendran
2015-01-01
Full Text Available Prediction models for low volume village roads in India are developed to evaluate the progression of different types of distress such as roughness, cracking, and potholes. Even though the Government of India is investing huge quantum of money on road construction every year, poor control over the quality of road construction and its subsequent maintenance is leading to the faster road deterioration. In this regard, it is essential that scientific maintenance procedures are to be evolved on the basis of performance of low volume flexible pavements. Considering the above, an attempt has been made in this research endeavor to develop prediction models to understand the progression of roughness, cracking, and potholes in flexible pavements exposed to least or nil routine maintenance. Distress data were collected from the low volume rural roads covering about 173 stretches spread across Tamil Nadu state in India. Based on the above collected data, distress prediction models have been developed using multiple linear regression analysis. Further, the models have been validated using independent field data. It can be concluded that the models developed in this study can serve as useful tools for the practicing engineers maintaining flexible pavements on low volume roads.
Shayan, Zahra; Mohammad Gholi Mezerji, Naser; Shayan, Leila; Naseri, Parisa
2015-11-03
Logistic regression (LR) and linear discriminant analysis (LDA) are two popular statistical models for prediction of group membership. Although they are very similar, the LDA makes more assumptions about the data. When categorical and continuous variables used simultaneously, the optimal choice between the two models is questionable. In most studies, classification error (CE) is used to discriminate between subjects in several groups, but this index is not suitable to predict the accuracy of the outcome. The present study compared LR and LDA models using classification indices. This cross-sectional study selected 243 cancer patients. Sample sets of different sizes (n = 50, 100, 150, 200, 220) were randomly selected and the CE, B, and Q classification indices were calculated by the LR and LDA models. CE revealed the a lack of superiority for one model over the other, but the results showed that LR performed better than LDA for the B and Q indices in all situations. No significant effect for sample size on CE was noted for selection of an optimal model. Assessment of the accuracy of prediction of real data indicated that the B and Q indices are appropriate for selection of an optimal model. The results of this study showed that LR performs better in some cases and LDA in others when based on CE. The CE index is not appropriate for classification, although the B and Q indices performed better and offered more efficient criteria for comparison and discrimination between groups.
Dai, Wensheng; Wu, Jui-Yu; Lu, Chi-Jie
2014-01-01
Sales forecasting is one of the most important issues in managing information technology (IT) chain store sales since an IT chain store has many branches. Integrating feature extraction method and prediction tool, such as support vector regression (SVR), is a useful method for constructing an effective sales forecasting scheme. Independent component analysis (ICA) is a novel feature extraction technique and has been widely applied to deal with various forecasting problems. But, up to now, only the basic ICA method (i.e., temporal ICA model) was applied to sale forecasting problem. In this paper, we utilize three different ICA methods including spatial ICA (sICA), temporal ICA (tICA), and spatiotemporal ICA (stICA) to extract features from the sales data and compare their performance in sales forecasting of IT chain store. Experimental results from a real sales data show that the sales forecasting scheme by integrating stICA and SVR outperforms the comparison models in terms of forecasting error. The stICA is a promising tool for extracting effective features from branch sales data and the extracted features can improve the prediction performance of SVR for sales forecasting.
Comparison of Bayesian and Classical Analysis of Weibull Regression Model: A Simulation Study
İmran KURT ÖMÜRLÜ
2011-01-01
Full Text Available Objective: The purpose of this study was to compare performances of classical Weibull Regression Model (WRM and Bayesian-WRM under varying conditions using Monte Carlo simulations. Material and Methods: It was simulated the generated data by running for each of classical WRM and Bayesian-WRM under varying informative priors and sample sizes using our simulation algorithm. In simulation studies, n=50, 100 and 250 were for sample sizes, and informative prior values using a normal prior distribution with was selected for b1. For each situation, 1000 simulations were performed. Results: Bayesian-WRM with proper informative prior showed a good performance with too little bias. It was found out that bias of Bayesian-WRM increased while priors were becoming distant from reliability in all sample sizes. Furthermore, Bayesian-WRM obtained predictions with more little standard error than the classical WRM in both of small and big samples in the light of proper priors. Conclusion: In this simulation study, Bayesian-WRM showed better performance than classical method, when subjective data analysis performed by considering of expert opinions and historical knowledge about parameters. Consequently, Bayesian-WRM should be preferred in existence of reliable informative priors, in the contrast cases, classical WRM should be preferred.
A Vehicle Traveling Time Prediction Method Based on Grey Theory and Linear Regression Analysis
TU Jun; LI Yan-ming; LIU Cheng-liang
2009-01-01
Vehicle traveling time prediction is an important part of the research of intelligent transportation system. By now, there have been various kinds of methods for vehicle traveling time prediction. But few consider both aspects of time and space. In this paper, a vehicle traveling time prediction method based on grey theory (GT) and linear regression analysis (LRA) is presented. In aspects of time, we use the history data sequence of bus speed on a certain road to predict the future bus speed on that road by GT. And in aspects of space, we calculate the traffic affecting factors between various roads by LRA. Using these factors we can predict the vehicle's speed at the lower road if the vehicle's speed at the current road is known. Finally we use time factor and space factor as the weighting factors of the two results predicted by GT and LRA respectively to find the fina0l result, thus calculating the vehicle's travehng time. The method also considers such factors as dwell time, thus making the prediction more accurate.
Variable Selection for Functional Logistic Regression in fMRI Data Analysis
Nedret BILLOR
2015-03-01
Full Text Available This study was motivated by classification problem in Functional Magnetic Resonance Imaging (fMRI, a noninvasive imaging technique which allows an experimenter to take images of a subject's brain over time. As fMRI studies usually have a small number of subjects and we assume that there is a smooth, underlying curve describing the observations in fMRI data, this results in incredibly high-dimensional datasets that are functional in nature. High dimensionality is one of the biggest problems in statistical analysis of fMRI data. There is also a need for the development of better classification methods. One of the best things about fMRI technique is its noninvasiveness. If statistical classification methods are improved, it could aid the advancement of noninvasive diagnostic techniques for mental illness or even degenerative diseases such as Alzheimer's. In this paper, we develop a variable selection technique, which tackles high dimensionality and correlation problems in fMRI data, based on L1 regularization-group lasso for the functional logistic regression model where the response is binary and represent two separate classes; the predictors are functional. We assess our method with a simulation study and an application to a real fMRI dataset.
Magura, Stephen; Cleland, Charles M.; Tonigan, J. Scott
2013-01-01
Objective: The objective of the study is to determine whether Alcoholics Anonymous (AA) participation leads to reduced drinking and problems related to drinking within Project MATCH (Matching Alcoholism Treatments to Client Heterogeneity), an existing national alcoholism treatment data set. Method: The method used is structural equation modeling of panel data with cross-lagged partial regression coefficients. The main advantage of this technique for the analysis of AA outcomes is that potential reciprocal causation between AA participation and drinking behavior can be explicitly modeled through the specification of finite causal lags. Results: For the outpatient subsample (n = 952), the results strongly support the hypothesis that AA attendance leads to increases in alcohol abstinence and reduces drinking/problems, whereas a causal effect in the reverse direction is unsupported. For the aftercare subsample (n = 774), the results are not as clear but also suggest that AA attendance leads to better outcomes. Conclusions: Although randomized controlled trials are the surest means of establishing causal relations between interventions and outcomes, such trials are rare in AA research for practical reasons. The current study successfully exploited the multiple data waves in Project MATCH to examine evidence of causality between AA participation and drinking outcomes. The study obtained unique statistical results supporting the effectiveness of AA primarily in the context of primary outpatient treatment for alcoholism. PMID:23490566
Wensheng Dai
2014-01-01
Full Text Available Sales forecasting is one of the most important issues in managing information technology (IT chain store sales since an IT chain store has many branches. Integrating feature extraction method and prediction tool, such as support vector regression (SVR, is a useful method for constructing an effective sales forecasting scheme. Independent component analysis (ICA is a novel feature extraction technique and has been widely applied to deal with various forecasting problems. But, up to now, only the basic ICA method (i.e., temporal ICA model was applied to sale forecasting problem. In this paper, we utilize three different ICA methods including spatial ICA (sICA, temporal ICA (tICA, and spatiotemporal ICA (stICA to extract features from the sales data and compare their performance in sales forecasting of IT chain store. Experimental results from a real sales data show that the sales forecasting scheme by integrating stICA and SVR outperforms the comparison models in terms of forecasting error. The stICA is a promising tool for extracting effective features from branch sales data and the extracted features can improve the prediction performance of SVR for sales forecasting.
Sih, A M; Kopp, D M; Tang, J H; Rosenberg, N E; Chipungu, E; Harfouche, M; Moyo, M; Mwale, M; Wilkinson, J P
2016-04-01
To compare primiparous and multiparous women who develop obstetric fistula (OF) and to assess predictors of fistula location. Cross-sectional study. Fistula Care Centre at Bwaila Hospital, Lilongwe, Malawi. Women with OF who presented between September 2011 and July 2014 with a complete obstetric history were eligible for the study. Women with OF were surveyed for their obstetric history. Women were classified as multiparous if prior vaginal or caesarean delivery was reported. The location of the fistula was determined at operation: OF involving the urethra, bladder neck, and midvagina were classified as low; OF involving the vaginal apex, cervix, uterus, and ureters were classified as high. Demographic information was compared between primiparous and multiparous women using chi-squared and Mann-Whitney U-tests. Multivariate logistic regression models were implemented to assess the relationship between variables of interest and fistula location. During the study period, 533 women presented for repair, of which 452 (84.8%) were included in the analysis. The majority (56.6%) were multiparous when the fistula formed. Multiparous women were more likely to have laboured fistula location (37.5 versus 11.2%, P fistula. Multiparity was common in our cohort, and these women were more likely to have a high fistula. Additional research is needed to understand the aetiology of high fistula including potential iatrogenic causes. Multiparity and caesarean delivery were associated with a high tract fistula in our Malawian cohort. © 2016 Royal College of Obstetricians and Gynaecologists.
Sih, Allison M.; Kopp, Dawn M.; Tang, Jennifer H.; Rosenberg, Nora E.; Chipungu, Ennet; Harfouche, Melike; Moyo, Margaret; Mwale, Mwawi; Wilkinson, Jeffrey P.
2016-01-01
Objective To compare primiparous and multiparous women who develop obstetric fistula (OF) and to assess predictors of fistula location Design Cross-sectional study Setting Fistula Care Center at Bwaila Hospital, Lilongwe, Malawi Population Women with OF who presented between September 2011 and July 2014 with a complete obstetric history were eligible for the study. Methods Women with OF were surveyed for their obstetric history. Women were classified as multiparous if prior vaginal or cesarean delivery was reported. Location of fistula was determined at operation. OF involving the urethra, bladder neck, and midvagina were classified as low; OF involving the vaginal apex, cervix, uterus, and ureters were classified as high. Main Outcome Measures Demographic information was compared between primiparous and multiparous women using Chi-squared and Mann-Whitney U tests. Multivariate logistic regression models were implemented to assess the relationship between variables of interest and fistula location. Results During the study period, 533 women presented for repair, of which 452 (84.8%) were included in the analysis. The majority (56.6%) were multiparous when the fistula formed. Multiparous women were more likely to have labored less than a day (62.4% vs 44.5%, pfistula location (37.5% vs 11.2%, pfistula. Conclusions Multiparity was common in our cohort, and these women were more likely to have a high fistula. Additional research is needed to understand the etiology of high fistula including potential iatrogenic causes. PMID:26853525
Margherita Velucchi
2014-09-01
Full Text Available Labor productivity is very complex to analyze across time, sectors and countries. In particular, in Italy, labor productivity has shown a prolonged slowdown but sector analyses highlight the presence of specific niches that have good levels of productivity and performance. This paper investigates how firms' characteristics might have affected the dynamics of the Italian service and manufacturing firms labor productivity in recent years (1998-2007, comparing them and focusing on some relevant sectors. We use a micro level original panel from the Italian National Institute of Statistics (ISTAT and a longitudinal quantile regression approach that allow us to show that labor productivity is highly heterogeneous across sectors and that the links between labor productivity and firms' characteristics are not constant across quantiles. We show that average estimates obtained via GLS do not capture the complex dynamics and heterogeneity of the service and manufacturing firms' labor productivity. Using this approach, we show that innovativeness and human capital, in particular, have a very strong impact on fostering labor productivity of lower productive firms. From the sector analysis on four service' sectors (restaurants & hotels, trade distributors, trade shops and legal & accountants we show that heterogeneity is more intense at a sector level and we derive some common features that may be useful in terms of policy implications.
Juan Merlo
Full Text Available Many multilevel logistic regression analyses of "neighbourhood and health" focus on interpreting measures of associations (e.g., odds ratio, OR. In contrast, multilevel analysis of variance is rarely considered. We propose an original stepwise analytical approach that distinguishes between "specific" (measures of association and "general" (measures of variance contextual effects. Performing two empirical examples we illustrate the methodology, interpret the results and discuss the implications of this kind of analysis in public health.We analyse 43,291 individuals residing in 218 neighbourhoods in the city of Malmö, Sweden in 2006. We study two individual outcomes (psychotropic drug use and choice of private vs. public general practitioner, GP for which the relative importance of neighbourhood as a source of individual variation differs substantially. In Step 1 of the analysis, we evaluate the OR and the area under the receiver operating characteristic (AUC curve for individual-level covariates (i.e., age, sex and individual low income. In Step 2, we assess general contextual effects using the AUC. Finally, in Step 3 the OR for a specific neighbourhood characteristic (i.e., neighbourhood income is interpreted jointly with the proportional change in variance (i.e., PCV and the proportion of ORs in the opposite direction (POOR statistics.For both outcomes, information on individual characteristics (Step 1 provide a low discriminatory accuracy (AUC = 0.616 for psychotropic drugs; = 0.600 for choosing a private GP. Accounting for neighbourhood of residence (Step 2 only improved the AUC for choosing a private GP (+0.295 units. High neighbourhood income (Step 3 was strongly associated to choosing a private GP (OR = 3.50 but the PCV was only 11% and the POOR 33%.Applying an innovative stepwise multilevel analysis, we observed that, in Malmö, the neighbourhood context per se had a negligible influence on individual use of psychotropic drugs, but
An, Xin; Xu, Shuo; Zhang, Lu-Da; Su, Shi-Guang
2009-01-01
In the present paper, on the basis of LS-SVM algorithm, we built a multiple dependent variables LS-SVM (MLS-SVM) regression model whose weights can be optimized, and gave the corresponding algorithm. Furthermore, we theoretically explained the relationship between MLS-SVM and LS-SVM. Sixty four broomcorn samples were taken as experimental material, and the sample ratio of modeling set to predicting set was 51 : 13. We first selected randomly and uniformly five weight groups in the interval [0, 1], and then in the way of leave-one-out (LOO) rule determined one appropriate weight group and parameters including penalizing parameters and kernel parameters in the model according to the criterion of the minimum of average relative error. Then a multiple dependent variables quantitative analysis model was built with NIR spectrum and simultaneously analyzed three chemical constituents containing protein, lysine and starch. Finally, the average relative errors between actual values and predicted ones by the model of three components for the predicting set were 1.65%, 6.47% and 1.37%, respectively, and the correlation coefficients were 0.9940, 0.8392 and 0.8825, respectively. For comparison, LS-SVM was also utilized, for which the average relative errors were 1.68%, 6.25% and 1.47%, respectively, and the correlation coefficients were 0.9941, 0.8310 and 0.8800, respectively. It is obvious that MLS-SVM algorithm is comparable to LS-SVM algorithm in modeling analysis performance, and both of them can give satisfying results. The result shows that the model with MLS-SVM algorithm is capable of doing multi-components NIR quantitative analysis synchronously. Thus MLS-SVM algorithm offers a new multiple dependent variables quantitative analysis approach for chemometrics. In addition, the weights have certain effect on the prediction performance of the model with MLS-SVM, which is consistent with our intuition and is validated in this study. Therefore, it is necessary to optimize
Levy, Jonathan I; Clougherty, Jane E; Baxter, Lisa K; Houseman, E Andres; Paciorek, Christopher J
2010-12-01
Previous studies have identified associations between traffic exposures and a variety of adverse health effects, but many of these studies relied on proximity measures rather than measured or modeled concentrations of specific air pollutants, complicating interpretability of the findings. An increasing number of studies have used land-use regression (LUR) or other techniques to model small-scale variability in concentrations of specific air pollutants. However, these studies have generally considered a limited number of pollutants, focused on outdoor concentrations (or indoor concentrations of ambient origin) when indoor concentrations are better proxies for personal exposures, and have not taken full advantage of statistical methods for source apportionment that may have provided insight about the structure of the LUR models and the interpretability of model results. Given these issues, the primary objective of our study was to determine predictors of indoor and outdoor residential concentrations of multiple traffic-related air pollutants within an urban area, based on a combination of central site monitoring data; geographic information system (GIS) covariates reflecting traffic and other outdoor sources; questionnaire data reflecting indoor sources and activities that affect ventilation rates; and factor-analytic methods to better infer source contributions. As part of a prospective birth cohort study assessing asthma etiology in urban Boston, we collected indoor and/or outdoor 3-to-4 day samples of nitrogen dioxide (NO2) and fine particulate matter with an aerodynamic diameter or = 2.5 pm (PM2.5) at 44 residences during multiple seasons of the year from 2003 through 2005. We performed reflectance analysis, x-ray fluorescence spectroscopy (XRF), and high-resolution inductively coupled plasma-mass spectrometry (ICP-MS) on particle filters to estimate the concentrations of elemental carbon (EC), trace elements, and water-soluble metals, respectively. We derived
Hao, Lingxin
2007-01-01
Quantile Regression, the first book of Hao and Naiman's two-book series, establishes the seldom recognized link between inequality studies and quantile regression models. Though separate methodological literature exists for each subject, the authors seek to explore the natural connections between this increasingly sought-after tool and research topics in the social sciences. Quantile regression as a method does not rely on assumptions as restrictive as those for the classical linear regression; though more traditional models such as least squares linear regression are more widely utilized, Hao
Kudo, Takamasa; Uda, Shinsuke; Tsuchiya, Takaho; Wada, Takumi; Karasawa, Yasuaki; Fujii, Masashi; Saito, Takeshi H; Kuroda, Shinya
2016-01-01
Signaling networks are made up of limited numbers of molecules and yet can code information that controls different cellular states through temporal patterns and a combination of signaling molecules. In this study, we used a data-driven modeling approach, the Laguerre filter with partial least square regression, to describe how temporal and combinatorial patterns of signaling molecules are decoded by their downstream targets. The Laguerre filter is a time series model used to represent a nonlinear system based on Volterra series expansion. Furthermore, with this approach, each component of the Volterra series expansion is expanded by Laguerre basis functions. We combined two approaches, application of a Laguerre filter and partial least squares (PLS) regression, and applied the combined approach to analysis of a signal transduction network. We applied the Laguerre filter with PLS regression to identify input and output (IO) relationships between MAP kinases and the products of immediate early genes (IEGs). We found that Laguerre filter with PLS regression performs better than Laguerre filter with ordinary regression for the reproduction of a time series of IEGs. Analysis of the nonlinear characteristics extracted using the Laguerre filter revealed a priming effect of ERK and CREB on c-FOS induction. Specifically, we found that the effects of a first pulse of ERK enhance the subsequent effects on c-FOS induction of treatment with a second pulse of ERK, a finding consistent with prior molecular biological knowledge. The variable importance of projections and output loadings in PLS regression predicted the upstream dependency of each IEG. Thus, a Laguerre filter with partial least square regression approach appears to be a powerful method to find the processing mechanism of temporal patterns and combination of signaling molecules by their downstream gene expression.
Saadah, Nicholas H; van Hout, Fabienne M A; Schipperus, Martin R; le Cessie, Saskia; Middelburg, Rutger A; Wiersum-Osselton, Johanna C; van der Bom, Johanna G
2017-09-01
We estimated rates for common plasma-associated transfusion reactions and compared reported rates for various plasma types. We performed a systematic review and meta-analysis of peer-reviewed articles that reported plasma transfusion reaction rates. Random-effects pooled rates were calculated and compared between plasma types. Meta-regression was used to compare various plasma types with regard to their reported plasma transfusion reaction rates. Forty-eight studies reported transfusion reaction rates for fresh-frozen plasma (FFP; mixed-sex and male-only), amotosalen INTERCEPT FFP, methylene blue-treated FFP, and solvent/detergent-treated pooled plasma. Random-effects pooled average rates for FFP were: allergic reactions, 92/10(5) units transfused (95% confidence interval [CI], 46-184/10(5) units transfused); febrile nonhemolytic transfusion reactions (FNHTRs), 12/10(5) units transfused (95% CI, 7-22/10(5) units transfused); transfusion-associated circulatory overload (TACO), 6/10(5) units transfused (95% CI, 1-30/10(5) units transfused); transfusion-related acute lung injury (TRALI), 1.8/10(5) units transfused (95% CI, 1.2-2.7/10(5) units transfused); and anaphylactic reactions, 0.8/10(5) units transfused (95% CI, 0-45.7/10(5) units transfused). Risk differences between plasma types were not significant for allergic reactions, TACO, or anaphylactic reactions. Methylene blue-treated FFP led to fewer FNHTRs than FFP (risk difference = -15.3 FNHTRs/10(5) units transfused; 95% CI, -24.7 to -7.1 reactions/10(5) units transfused); and male-only FFP led to fewer cases of TRALI than mixed-sex FFP (risk difference = -0.74 TRALI/10(5) units transfused; 95% CI, -2.42 to -0.42 injuries/10(5) units transfused). Meta-regression demonstrates that the rate of FNHTRs is lower for methylene blue-treated compared with FFP, and the rate of TRALI is lower for male-only than for mixed-sex FFP; whereas no significant differences are observed between plasma types for allergic
Gizaw, Mesgana Seyoum; Gan, Thian Yew
2016-07-01
Regional Flood Frequency Analysis (RFFA) is a statistical method widely used to estimate flood quantiles of catchments with limited streamflow data. In addition, to estimate the flood quantile of ungauged sites, there could be only a limited number of stations with complete dataset are available from hydrologically similar, surrounding catchments. Besides traditional regression based RFFA methods, recent applications of machine learning algorithms such as the artificial neural network (ANN) have shown encouraging results in regional flood quantile estimations. Another novel machine learning technique that is becoming widely applicable in the hydrologic community is the Support Vector Regression (SVR). In this study, an RFFA model based on SVR was developed to estimate regional flood quantiles for two study areas, one with 26 catchments located in southeastern British Columbia (BC) and another with 23 catchments located in southern Ontario (ON), Canada. The SVR-RFFA model for both study sites was developed from 13 sets of physiographic and climatic predictors for the historical period. The Ef (Nash Sutcliffe coefficient) and R2 of the SVR-RFFA model was about 0.7 when estimating flood quantiles of 10, 25, 50 and 100 year return periods which indicate satisfactory model performance in both study areas. In addition, the SVR-RFFA model also performed well based on other goodness-of-fit statistics such as BIAS (mean bias) and BIASr (relative BIAS). If the amount of data available for training RFFA models is limited, the SVR-RFFA model was found to perform better than an ANN based RFFA model, and with significantly lower median CV (coefficient of variation) of the estimated flood quantiles. The SVR-RFFA model was then used to project changes in flood quantiles over the two study areas under the impact of climate change using the RCP4.5 and RCP8.5 climate projections of five Coupled Model Intercomparison Project (CMIP5) GCMs (Global Climate Models) for the 2041
Lançon Christophe
2006-07-01
Full Text Available Abstract Background Data comparing duloxetine with existing antidepressant treatments is limited. A comparison of duloxetine with fluoxetine has been performed but no comparison with venlafaxine, the other antidepressant in the same therapeutic class with a significant market share, has been undertaken. In the absence of relevant data to assess the place that duloxetine should occupy in the therapeutic arsenal, indirect comparisons are the most rigorous way to go. We conducted a systematic review of the efficacy of duloxetine, fluoxetine and venlafaxine versus placebo in the treatment of Major Depressive Disorder (MDD, and performed indirect comparisons through meta-regressions. Methods The bibliography of the Agency for Health Care Policy and Research and the CENTRAL, Medline, and Embase databases were interrogated using advanced search strategies based on a combination of text and index terms. The search focused on randomized placebo-controlled clinical trials involving adult patients treated for acute phase Major Depressive Disorder. All outcomes were derived to take account for varying placebo responses throughout studies. Primary outcome was treatment efficacy as measured by Hedge's g effect size. Secondary outcomes were response and dropout rates as measured by log odds ratios. Meta-regressions were run to indirectly compare the drugs. Sensitivity analysis, assessing the influence of individual studies over the results, and the influence of patients' characteristics were run. Results 22 studies involving fluoxetine, 9 involving duloxetine and 8 involving venlafaxine were selected. Using indirect comparison methodology, estimated effect sizes for efficacy compared with duloxetine were 0.11 [-0.14;0.36] for fluoxetine and 0.22 [0.06;0.38] for venlafaxine. Response log odds ratios were -0.21 [-0.44;0.03], 0.70 [0.26;1.14]. Dropout log odds ratios were -0.02 [-0.33;0.29], 0.21 [-0.13;0.55]. Sensitivity analyses showed that results were
Byers, John A
2013-08-01
Dose-response curves of the effects of semiochemicals on neurophysiology and behavior are reported in many articles in insect chemical ecology. Most curves are shown in figures representing points connected by straight lines, in which the x-axis has order of magnitude increases in dosage vs. responses on the y-axis. The lack of regression curves indicates that the nature of the dose-response relationship is not well understood. Thus, a computer model was developed to simulate a flux of various numbers of pheromone molecules (10(3) to 5 × 10(6)) passing by 10(4) receptors distributed among 10(6) positions along an insect antenna. Each receptor was depolarized by at least one strike by a molecule, and subsequent strikes had no additional effect. The simulations showed that with an increase in pheromone release rate, the antennal response would increase in a convex fashion and not in a logarithmic relation as suggested previously. Non-linear regression showed that a family of kinetic formation functions fit the simulated data nearly perfectly (R(2) >0.999). This is reasonable because olfactory receptors have proteins that bind to the pheromone molecule and are expected to exhibit enzyme kinetics. Over 90 dose-response relationships reported in the literature of electroantennographic and behavioral bioassays in the laboratory and field were analyzed by the logarithmic and kinetic formation functions. This analysis showed that in 95% of the cases, the kinetic functions explained the relationships better than the logarithmic (mean of about 20% better). The kinetic curves become sigmoid when graphed on a log scale on the x-axis. Dose-catch relationships in the field are similar to dose-EAR (effective attraction radius, in which a spherical radius indicates the trapping effect of a lure) and the circular EARc in two dimensions used in mass trapping models. The use of kinetic formation functions for dose-response curves of attractants, and kinetic decay curves for
A Bayesian ridge regression analysis of congestion's impact on urban expressway safety.
Shi, Qi; Abdel-Aty, Mohamed; Lee, Jaeyoung
2016-03-01
With the rapid growth of traffic in urban areas, concerns about congestion and traffic safety have been heightened. This study leveraged both Automatic Vehicle Identification (AVI) system and Microwave Vehicle Detection System (MVDS) installed on an expressway in Central Florida to explore how congestion impacts the crash occurrence in urban areas. Multiple congestion measures from the two systems were developed. To ensure more precise estimates of the congestion's effects, the traffic data were aggregated into peak and non-peak hours. Multicollinearity among traffic parameters was examined. The results showed the presence of multicollinearity especially during peak hours. As a response, ridge regression was introduced to cope with this issue. Poisson models with uncorrelated random effects, correlated random effects, and both correlated random effects and random parameters were constructed within the Bayesian framework. It was proven that correlated random effects could significantly enhance model performance. The random parameters model has similar goodness-of-fit compared with the model with only correlated random effects. However, by accounting for the unobserved heterogeneity, more variables were found to be significantly related to crash frequency. The models indicated that congestion increased crash frequency during peak hours while during non-peak hours it was not a major crash contributing factor. Using the random parameter model, the three congestion measures were compared. It was found that all congestion indicators had similar effects while Congestion Index (CI) derived from MVDS data was a better congestion indicator for safety analysis. Also, analyses showed that the segments with higher congestion intensity could not only increase property damage only (PDO) crashes, but also more severe crashes. In addition, the issues regarding the necessity to incorporate specific congestion indicator for congestion's effects on safety and to take care of the
Pradhan, Biswajeet
Recently, in 2006 and 2007 heavy monsoons rainfall have triggered floods along Malaysia's east coast as well as in southern state of Johor. The hardest hit areas are along the east coast of peninsular Malaysia in the states of Kelantan, Terengganu and Pahang. The city of Johor was particularly hard hit in southern side. The flood cost nearly billion ringgit of property and many lives. The extent of damage could have been reduced or minimized if an early warning system would have been in place. This paper deals with flood susceptibility analysis using logistic regression model. We have evaluated the flood susceptibility and the effect of flood-related factors along the Kelantan river basin using the Geographic Information System (GIS) and remote sensing data. Previous flooded areas were extracted from archived radarsat images using image processing tools. Flood susceptibility mapping was conducted in the study area along the Kelantan River using radarsat imagery and then enlarged to 1:25,000 scales. Topographical, hydrological, geological data and satellite images were collected, processed, and constructed into a spatial database using GIS and image processing. The factors chosen that influence flood occurrence were: topographic slope, topographic aspect, topographic curvature, DEM and distance from river drainage, all from the topographic database; flow direction, flow accumulation, extracted from hydrological database; geology and distance from lineament, taken from the geologic database; land use from SPOT satellite images; soil texture from soil database; and the vegetation index value from SPOT satellite images. Flood susceptible areas were analyzed and mapped using the probability-logistic regression model. Results indicate that flood prone areas can be performed at 1:25,000 which is comparable to some conventional flood hazard map scales. The flood prone areas delineated on these maps correspond to areas that would be inundated by significant flooding
The Analysis of Internet Addiction Scale Using Multivariate Adaptive Regression Splines
M Kayri
2010-12-01
Full Text Available "nBackground: Determining real effects on internet dependency is too crucial with unbiased and robust statistical method. MARS is a new non-parametric method in use in the literature for parameter estimations of cause and effect based research. MARS can both obtain legible model curves and make unbiased parametric predictions."nMethods: In order to examine the performance of MARS, MARS findings will be compared to Classification and Regression Tree (C&RT findings, which are considered in the literature to be efficient in revealing correlations between variables. The data set for the study is taken from "The Internet Addiction Scale" (IAS, which attempts to reveal addiction levels of individuals. The population of the study consists of 754 secondary school students (301 female, 443 male students with 10 missing data. MARS 2.0 trial version is used for analysis by MARS method and C&RT analysis was done by SPSS."nResults: MARS obtained six base functions of the model. As a common result of these six functions, regression equation of the model was found. Over the predicted variable, MARS showed that the predictors of daily Internet-use time on average, the purpose of Internet- use, grade of students and occupations of mothers had a significant effect (P< 0.05. In this comparative study, MARS obtained different findings from C&RT in dependency level prediction."nConclusion: The fact that MARS revealed extent to which the variable, which was considered significant, changes the character of the model was observed in this study.
Cecchini Diego M
2009-11-01
Full Text Available Abstract Background The central nervous system is considered a sanctuary site for HIV-1 replication. Variables associated with HIV cerebrospinal fluid (CSF viral load in the context of opportunistic CNS infections are poorly understood. Our objective was to evaluate the relation between: (1 CSF HIV-1 viral load and CSF cytological and biochemical characteristics (leukocyte count, protein concentration, cryptococcal antigen titer; (2 CSF HIV-1 viral load and HIV-1 plasma viral load; and (3 CSF leukocyte count and the peripheral blood CD4+ T lymphocyte count. Methods Our approach was to use a prospective collection and analysis of pre-treatment, paired CSF and plasma samples from antiretroviral-naive HIV-positive patients with cryptococcal meningitis and assisted at the Francisco J Muñiz Hospital, Buenos Aires, Argentina (period: 2004 to 2006. We measured HIV CSF and plasma levels by polymerase chain reaction using the Cobas Amplicor HIV-1 Monitor Test version 1.5 (Roche. Data were processed with Statistix 7.0 software (linear regression analysis. Results Samples from 34 patients were analyzed. CSF leukocyte count showed statistically significant correlation with CSF HIV-1 viral load (r = 0.4, 95% CI = 0.13-0.63, p = 0.01. No correlation was found with the plasma viral load, CSF protein concentration and cryptococcal antigen titer. A positive correlation was found between peripheral blood CD4+ T lymphocyte count and the CSF leukocyte count (r = 0.44, 95% CI = 0.125-0.674, p = 0.0123. Conclusion Our study suggests that CSF leukocyte count influences CSF HIV-1 viral load in patients with meningitis caused by Cryptococcus neoformans.
Nakasone, Yutaka; Ikeda, Osamu; Yamashita, Yasuyuki; Kudoh, Kouichi; Shigematsu, Yoshinori; Harada, Kazunori
2007-01-01
We applied multivariate analysis to the clinical findings in patients with acute gastrointestinal (GI) hemorrhage and compared the relationship between these findings and angiographic evidence of extravasation. Our study population consisted of 46 patients with acute GI bleeding. They were divided into two groups. In group 1 we retrospectively analyzed 41 angiograms obtained in 29 patients (age range, 25-91 years; average, 71 years). Their clinical findings including the shock index (SI), diastolic blood pressure, hemoglobin, platelet counts, and age, which were quantitatively analyzed. In group 2, consisting of 17 patients (age range, 21-78 years; average, 60 years), we prospectively applied statistical analysis by a logistics regression model to their clinical findings and then assessed 21 angiograms obtained in these patients to determine whether our model was useful for predicting the presence of angiographic evidence of extravasation. On 18 of 41 (43.9%) angiograms in group 1 there was evidence of extravasation; in 3 patients it was demonstrated only by selective angiography. Factors significantly associated with angiographic visualization of extravasation were the SI and patient age. For differentiation between cases with and cases without angiographic evidence of extravasation, the maximum cutoff point was between 0.51 and 0.0.53. Of the 21 angiograms obtained in group 2, 13 (61.9%) showed evidence of extravasation; in 1 patient it was demonstrated only on selective angiograms. We found that in 90% of the cases, the prospective application of our model correctly predicted the angiographically confirmed presence or absence of extravasation. We conclude that in patients with GI hemorrhage, angiographic visualization of extravasation is associated with the pre-embolization SI. Patients with a high SI value should undergo study to facilitate optimal treatment planning.
Stegeman, J.A.; Vernooij, J.C.M.; Khalifa, O.A.; Broek, van den J.; Mevius, D.J.
2006-01-01
In this study, we investigated the change in the resistance of Enterococcus faecium strains isolated from Dutch broilers against erythromycin and virginiamycin in 1998, 1999 and 2001 by logistic regression analysis and survival analysis. The E. faecium strains were isolated from caecal samples that
Maternal heavy alcohol use and toddler behavior problems: a fixed effects regression analysis.
Knudsen, Ann Kristin; Ystrom, Eivind; Skogen, Jens Christoffer; Torgersen, Leila
2015-10-01
Using data from the longitudinal Norwegian Mother and Child Cohort Study, the aims of the current study were to examine associations between postnatal maternal heavy alcohol use and toddler behavior problems, taking both observed and unobserved confounding factors into account by employing fixed effects regression models. Postnatal maternal heavy alcohol use (defined as drinking alcohol 4 or more times a week, or drinking 7 units or more per alcohol use episode) and toddler internalizing and externalizing behavior problems were assessed when the toddlers were aged 18 and 36 months. Maternal psychopathology, civil status and negative life events last year were included as time-variant covariates. Maternal heavy alcohol use was associated with toddler internalizing and externalizing behavior problems (p < 0.001) in the population when examined with generalized estimating equation models. The associations disappeared when observed and unobserved sources of confounding were taken into account in the fixed effects models [(p = 0.909 for externalizing behaviors (b = 0.002, SE = 0.021), p = 0.928 for internalizing behaviors (b = 0.002, SE = 0.023)], with an even further reduction of the estimates with the inclusion of time-variant confounders. No causal effect was found between postnatal maternal heavy alcohol use and toddler behavior problems. Increased levels of behavior problems among toddlers of heavy drinking mothers should therefore be attributed to other adverse characteristics associated with these mothers, toddlers and families. This should be taken into account when interventions aimed at at-risk families identified by maternal heavy alcohol use are planned and conducted.
Velo Suthar
2010-01-01
Full Text Available Problem statement: At present, after almost more than 20-decades, Malaysia can boast of a solid national philosophy of education, despite tremendous struggles and hopes. The professional learning opportunities are necessary to enhance, support and sustain student's mathematics achievement. Approach: Empirical evidence had shown that student's belief in mathematics is crucial in meeting career aspiration. In addition mathematical beliefs are closely correlated to their mathematics achievement among university students. Results: The literature exposed that a few studies had been done on university undergraduates. The present study involves a sample of eighty-six university undergraduate students, who had completed a self-reported questionnaire related to student mathematical beliefs on three dimensions, viz-a-viz beliefs about mathematics, beliefs about importance of mathematics and beliefs on one's ability in mathematics. The reliability index, using the Cronbach's alpha was 0.86, indicating a high level of internal consistency. Records of achievement (GPA were obtained from the academic division, University Putra Malaysia. Based on these records, students were classified into the minor and major mathematics group. The authors examined student's mathematical beliefs based on a three dimensional logistic regression model estimation technique, appropriate for a survey design study. Conclusion/Recommendations: The results illustrated and identified significant relationships between student beliefs about importance of mathematics and beliefs on one's ability in mathematics with mathematics achievement. In addition, the Hosmer and Lemeshow test was non-significant with a chi-square of 8.46, p = 0.3, which indicated that there is a good model fit as the data did not significantly deviate from the model. The overall model, 77.9% of the sample was classified correctly.
Regression models for air pollution and daily mortality: analysis of data from Birmingham, Alabama
Smith, R.L. [University of North Carolina, Chapel Hill, NC (United States). Dept. of Statistics; Davis, J.M. [North Carolina State University, Raleigh, NC (United States). Dept. of Marine, Earth and Atmospheric Sciences; Sacks, J. [National Institute of Statistical Sciences, Research Triangle Park, NC (United States); Speckman, P. [University of Missouri, Columbia, MO (United States). Dept. of Statistics; Styer, P.
2000-11-01
In recent years, a very large literature has built up on the human health effects of air pollution. Many studies have been based on time series analyses in which daily mortality counts, or some other measure such as hospital admissions, have been decomposed through regression analysis into contributions based on long-term trend and seasonality, meteorological effects, and air pollution. There has been a particular focus on particulate air pollution represented by PM{sub 10} (particulate matter of aerodynamic diameter 10 {mu}m or less), though in recent years more attention has been given to very small particles of diameter 2.5 {mu}m or less. Most of the existing data studies, however, are based on PM{sub 10} because of the wide availability of monitoring data for this variable. The persistence of the resulting effects across many different studies is widely cited as evidence that this is not mere statistical association, but indeed establishes a causal relationship. These studies have been cited by the United States Environmental Protection Agency (USEPA) as justification for a tightening on particulate matter standards in the 1997 revision of the National Ambient Air Quality Standard (NAAQS), which is the basis for air pollution regulation in the United States. The purpose of the present paper is to propose a systematic approach to the regression analyses that are central to this kind of research. We argue that the results may depend on a number of ad hoc features of the analysis, including which meteorological variables to adjust for, and the manner in which different lagged values of particulate matter are combined into a single 'exposure measure'. We also examine the question of whether the effects are linear or nonlinear, with particular attention to the possibility of a 'threshold effect', i.e. that significant effects occur only above some threshold. These points are illustrated with a data set from Birmingham, Alabama, first cited by
Ingunn Fride Tvete
Full Text Available Rheumatoid arthritis patients have been treated with disease modifying anti-rheumatic drugs (DMARDs and the newer biologic drugs. We sought to compare and rank the biologics with respect to efficacy. We performed a literature search identifying 54 publications encompassing 9 biologics. We conducted a multiple treatment comparison regression analysis letting the number experiencing a 50% improvement on the ACR score be dependent upon dose level and disease duration for assessing the comparable relative effect between biologics and placebo or DMARD. The analysis embraced all treatment and comparator arms over all publications. Hence, all measured effects of any biologic agent contributed to the comparison of all biologic agents relative to each other either given alone or combined with DMARD. We found the drug effect to be dependent on dose level, but not on disease duration, and the impact of a high versus low dose level was the same for all drugs (higher doses indicated a higher frequency of ACR50 scores. The ranking of the drugs when given without DMARD was certolizumab (ranked highest, etanercept, tocilizumab/ abatacept and adalimumab. The ranking of the drugs when given with DMARD was certolizumab (ranked highest, tocilizumab, anakinra/rituximab, golimumab/ infliximab/ abatacept, adalimumab/ etanercept [corrected]. Still, all drugs were effective. All biologic agents were effective compared to placebo, with certolizumab the most effective and adalimumab (without DMARD treatment and adalimumab/ etanercept (combined with DMARD treatment the least effective. The drugs were in general more effective, except for etanercept, when given together with DMARDs.
BIOELECTRICAL IMPEDANCE VECTOR ANALYSIS IDENTIFIES SARCOPENIA IN NURSING HOME RESIDENTS
Loss of muscle mass and water shifts between body compartments are contributing factors to frailty in the elderly. The body composition changes are especially pronounced in institutionalized elderly. We investigated the ability of single-frequency bioelectrical impedance analysis (BIA) to identify b...
Identifying subgroups of patients using latent class analysis
Nielsen, Anne Mølgaard; Kent, Peter; Hestbæk, Lise
2017-01-01
BACKGROUND: Heterogeneity in patients with low back pain (LBP) is well recognised and different approaches to subgrouping have been proposed. Latent Class Analysis (LCA) is a statistical technique that is increasingly being used to identify subgroups based on patient characteristics. However, as ...
A Vector Auto Regression Model Applied to Real Estate Development Investment: A Statistic Analysis
Liu, Fengyun; Matsuno, Shuji; Malekian, Reza; Yu, Jin; Li, Zhixiong
2016-01-01
.... The above theoretical model is empirically evidenced with VAR (Vector Auto Regression) methodology. A panel VAR model shows that land leasing and real estate price appreciation positively affect local government general fiscal revenue...
Regression analysis to predict growth performance from dietary net energy in growing-finishing pigs.
Nitikanchana, S; Dritz, S S; Tokach, M D; DeRouchey, J M; Goodband, R D; White, B J
2015-06-01
Data from 41 trials with multiple energy levels (285 observations) were used in a meta-analysis to predict growth performance based on dietary NE concentration. Nutrient and energy concentrations in all diets were estimated using the NRC ingredient library. Predictor variables examined for best fit models using Akaike information criteria included linear and quadratic terms of NE, BW, CP, standardized ileal digestible (SID) Lys, crude fiber, NDF, ADF, fat, ash, and their interactions. The initial best fit models included interactions between NE and CP or SID Lys. After removal of the observations that fed SID Lys below the suggested requirement, these terms were no longer significant. Including dietary fat in the model with NE and BW significantly improved the G:F prediction model, indicating that NE may underestimate the influence of fat on G:F. The meta-analysis indicated that, as long as diets are adequate for other nutrients (i.e., Lys), dietary NE is adequate to predict changes in ADG across different dietary ingredients and conditions. The analysis indicates that ADG increases with increasing dietary NE and BW but decreases when BW is above 87 kg. The G:F ratio improves with increasing dietary NE and fat but decreases with increasing BW. The regression equations were then evaluated by comparing the actual and predicted performance of 543 finishing pigs in 2 trials fed 5 dietary treatments, included 3 different levels of NE by adding wheat middlings, soybean hulls, dried distillers grains with solubles (DDGS; 8 to 9% oil), or choice white grease (CWG) to a corn-soybean meal-based diet. Diets were 1) 30% DDGS, 20% wheat middlings, and 4 to 5% soybean hulls (low energy); 2) 20% wheat middlings and 4 to 5% soybean hulls (low energy); 3) a corn-soybean meal diet (medium energy); 4) diet 2 supplemented with 3.7% CWG to equalize the NE level to diet 3 (medium energy); and 5) a corn-soybean meal diet with 3.7% CWG (high energy). Only small differences were observed
LI; XinTian; TIAN; Hui; CAI; GuoBiao
2013-01-01
This paper presents three-dimensional numerical simulations of the hybrid rocket motor with hydrogen peroxide (HP) and hy-droxyl terminated polybutadiene (HTPB) propellant combination and investigates the fuel regression rate distribution charac-teristics of different fuel types. The numerical models are established to couple the Navier-Stokes equations with turbulence,chemical reactions, solid fuel pyrolysis and solid-gas interfacial boundary conditions. Simulation results including the temper-ature contours and fuel regression rate distributions are presented for the tube, star and wagon wheel grains. The results demonstrate that the changing trends of the regression rate along the axis are similar for all kinds of fuel types, which decrease sharply near the leading edges of the fuels and then gradually increase with increasing axial locations. The regression rates of the star and wagon wheel grains show apparent three-dimensional characteristics, and they are higher in the regions of fuel surfaces near the central core oxidizer flow. The average regression rates increase as the oxidizer mass fluxes rise for all of the fuel types. However, under same oxidizer mass flux, the average regression rates of the star and wagon wheel grains are much larger than that of the tube grain due to their lower hydraulic diameters.
Effects of dietary fat on fertility of dairy cattle: A meta-analysis and meta-regression.
Rodney, R M; Celi, P; Scott, W; Breinhild, K; Lean, I J
2015-08-01
Evidence is increasing of positive effects of feeding fats during transition on fertility and the adaptation to lactation. This study used meta-analytic methods to explore the effects of including fats in the transition diet on the risk of pregnancy to service (proportion pregnant) and calving to pregnancy interval. Meta-analysis was used to integrate smaller studies and increase the statistical power over that of any single study and explore new hypotheses. We explored the effect of fats and diet composition on fertility using meta-regression methods. Relatively few highly controlled studies are available providing detailed descriptions of the diets used that examined interactions between fat nutrition and reproductive outcomes. Only 17 studies containing 26 comparisons were suitable for inclusion in statistical evaluations. Reproductive variables evaluated were risk of pregnancy (proportion pregnant), primarily to first service, and calving to pregnancy interval. Production variables examined were milk yield, milk composition, and body weight. The sources of heterogeneity in these studies were also explored. A 27% overall increase in pregnancy to service was observed (relative risk=1.27; 95% confidence interval Knapp Hartung 1.09 to 1.45), and results were relatively consistent (I(2)=19.9%). A strong indication of a reduction in calving to pregnancy interval was also identified, which was consistent across studies (I(2)=0.0%), supporting a conclusion that, overall, the inclusion of fats does improve fertility. Further exploration of the factors contributing to proportion pregnant using bivariate meta-regression identified variables that reflected changes in diet composition or animal response resulting from inclusion of the fat interventions in the experimental diets fed. Increased fermentable neutral detergent fiber and soluble fiber intakes increased the proportion pregnant, whereas increased milk yield of the treatment group decreased this measure
Diabetes mortality in Serbia, 1991-2015 (a nationwide study): A joinpoint regression analysis.
Ilic, Milena; Ilic, Irena
2017-02-01
The aim of this study was to analyze the mortality trends of diabetes mellitus in Serbia (excluding the Autonomous Province of Kosovo and Metohia). A population-based cross sectional study analyzing diabetes mortality in Serbia in the period 1991-2015 was carried out based on official data. The age-standardized mortality rates (per 100,000) were calculated by direct standardization, using the European Standard Population. Average annual percentage of change (AAPC) and the corresponding 95% confidence interval (CI) were computed using the joinpoint regression analysis. More than 63,000 (about 27,000 of men and 36,000 of women) diabetes deaths occurred in Serbia from 1991 to 2015. Death rates from diabetes were almost equal in men and in women (about 24.0 per 100,000) and places Serbia among the countries with the highest diabetes mortality rates in Europe. Since 1991, mortality from diabetes in men significantly increased by +1.2% per year (95% CI 0.7-1.7), but non-significantly increased in women by +0.2% per year (95% CI -0.4 to 0.7). Increased trends in diabetes type 1 mortality rates were significant in both genders in Serbia. Trends in mortality for diabetes type 2 showed a significant decrease in both genders since 2010. Given that diabetes mortality trends showed different patterns during the studied period, our results imply that further observation of trend is needed. Copyright © 2016 Primary Care Diabetes Europe. Published by Elsevier Ltd. All rights reserved.
Fushimi, Akihiro; Kawashima, Hiroto; Kajihara, Hideo
Understanding the contribution of each emission source of air pollutants to ambient concentrations is important to establish effective measures for risk reduction. We have developed a source apportionment method based on an atmospheric dispersion model and multiple linear regression analysis (MLR) in conjunction with ambient concentrations simultaneously measured at points in a grid network. We used a Gaussian plume dispersion model developed by the US Environmental Protection Agency called the Industrial Source Complex model (ISC) in the method. Our method does not require emission amounts or source profiles. The method was applied to the case of benzene in the vicinity of the Keiyo Central Coastal Industrial Complex (KCCIC), one of the biggest industrial complexes in Japan. Benzene concentrations were simultaneously measured from December 2001 to July 2002 at sites in a grid network established in the KCCIC and the surrounding residential area. The method was used to estimate benzene emissions from the factories in the KCCIC and from automobiles along a section of a road, and then the annual average contribution of the KCCIC to the ambient concentrations was estimated based on the estimated emissions. The estimated contributions of the KCCIC were 65% inside the complex, 49% at 0.5-km sites, 35% at 1.5-km sites, 20% at 3.3-km sites, and 9% at a 5.6-km site. The estimated concentrations agreed well with the measured values. The estimated emissions from the factories and the road were slightly larger than those reported in the first Pollutant Release and Transfer Register (PRTR). These results support the reliability of our method. This method can be applied to other chemicals or regions to achieve reasonable source apportionments.
Milena Ilic
Full Text Available BACKGROUND: Limited data on mortality from malignant lymphatic and hematopoietic neoplasms have been published for Serbia. METHODS: The study covered population of Serbia during the 1991-2010 period. Mortality trends were assessed using the joinpoint regression analysis. RESULTS: Trend for overall death rates from malignant lymphoid and haematopoietic neoplasms significantly decreased: by -2.16% per year from 1991 through 1998, and then significantly increased by +2.20% per year for the 1998-2010 period. The growth during the entire period was on average +0.8% per year (95% CI 0.3 to 1.3. Mortality was higher among males than among females in all age groups. According to the comparability test, mortality trends from malignant lymphoid and haematopoietic neoplasms in men and women were parallel (final selected model failed to reject parallelism, P = 0.232. Among younger Serbian population (0-44 years old in both sexes: trends significantly declined in males for the entire period, while in females 15-44 years of age mortality rates significantly declined only from 2003 onwards. Mortality trend significantly increased in elderly in both genders (by +1.7% in males and +1.5% in females in the 60-69 age group, and +3.8% in males and +3.6% in females in the 70+ age group. According to the comparability test, mortality trend for Hodgkin's lymphoma differed significantly from mortality trends for all other types of malignant lymphoid and haematopoietic neoplasms (P<0.05. CONCLUSION: Unfavourable mortality trend in Serbia requires targeted intervention for risk factors control, early diagnosis and modern therapy.
Spalj, Stjepan; Spalj, Vedrana Tudor; Ivanković, Luida; Plancak, Darije
2014-03-01
The aim of this study was to explore the patterns of oral health-related risk behaviours in relation to dental status, attitudes, motivation and knowledge among Croatian adolescents. The assessment was conducted in the sample of 750 male subjects - military recruits aged 18-28 in Croatia using the questionnaire and clinical examination. Mean number of decayed, missing and filled teeth (DMFT) and Significant Caries Index (SIC) were calculated. Multiple logistic regression models were crated for analysis. Although models of risk behaviours were statistically significant their explanatory values were quite low. Five of them--rarely toothbrushing, not using hygiene auxiliaries, rarely visiting dentist, toothache as a primary reason to visit dentist, and demand for tooth extraction due to toothache--had the highest explanatory values ranging from 21-29% and correctly classified 73-89% of subjects. Toothache as a primary reason to visit dentist, extraction as preferable therapy when toothache occurs, not having brushing education in school and frequent gingival bleeding were significantly related to population with high caries experience (DMFT > or = 14 according to SiC) producing Odds ratios of 1.6 (95% CI 1.07-2.46), 2.1 (95% CI 1.29-3.25), 1.8 (95% CI 1.21-2.74) and 2.4 (95% CI 1.21-2.74) respectively. DMFT> or = 14 model had low explanatory value of 6.5% and correctly classified 83% of subjects. It can be concluded that oral health-related risk behaviours are interrelated. Poor association was seen between attitudes concerning oral health and oral health-related risk behaviours, indicating insufficient motivation to change lifestyle and habits. Self-reported oral hygiene habits were not strongly related to dental status.
Regression analysis of time trends in perinatal mortality in Germany 1980-1993.
Scherb, H; Weigelt, E; Brüske-Hohlfeld, I
2000-02-01
Numerous investigations have been carried out on the possible impact of the Chernobyl accident on the prevalence of anomalies at birth and on perinatal mortality. In many cases the studies were aimed at the detection of differences of pregnancy outcome measurements between regions or time periods. Most authors conclude that there is no evidence of a detrimental physical effect on congenital anomalies or other outcomes of pregnancy following the accident. In this paper, we report on statistical analyses of time trends of perinatal mortality in Germany. Our main intention is to investigate whether perinatal mortality, as reflected in official records, was increased in 1987 as a possible effect of the Chernobyl accident. We show that, in Germany as a whole, there was a significantly elevated perinatal mortality proportion in 1987 as compared to the trend function. The increase is 4.8% (p = 0.0046) of the expected perinatal death proportion for 1987. Even more pronounced levels of 8.2% (p = 0. 0458) and 8.5% (p = 0.0702) may be found in the higher contaminated areas of the former German Democratic Republic (GDR), including West Berlin, and of Bavaria, respectively. To investigate the impact of statistical models on results, we applied three standard regression techniques. The observed significant increase in 1987 is independent of the statistical model used. Stillbirth proportions show essentially the same behavior as perinatal death proportions, but the results for all of Germany are nonsignificant due to the smaller numbers involved. Analysis of the association of stillbirth proportions with the (137)Cs deposition on a district level in Bavaria discloses a significant relationship. Our results are in contrast to those of many analyses of the health consequences of the Chernobyl accident and contradict the present radiobiologic knowledge. As we are dealing with highly aggregated data, other causes or artifacts may explain the observed effects. Hence, the findings
Flexible survival regression modelling
Cortese, Giuliana; Scheike, Thomas H; Martinussen, Torben
2009-01-01
Regression analysis of survival data, and more generally event history data, is typically based on Cox's regression model. We here review some recent methodology, focusing on the limitations of Cox's regression model. The key limitation is that the model is not well suited to represent time-varyi...
Identifying glioblastoma gene networks based on hypergeometric test analysis.
Vasileios Stathias
Full Text Available Patient specific therapy is emerging as an important possibility for many cancer patients. However, to identify such therapies it is essential to determine the genomic and transcriptional alterations present in one tumor relative to control samples. This presents a challenge since use of a single sample precludes many standard statistical analysis techniques. We reasoned that one means of addressing this issue is by comparing transcriptional changes in one tumor with those observed in a large cohort of patients analyzed by The Cancer Genome Atlas (TCGA. To test this directly, we devised a bioinformatics pipeline to identify differentially expressed genes in tumors resected from patients suffering from the most common malignant adult brain tumor, glioblastoma (GBM. We performed RNA sequencing on tumors from individual GBM patients and filtered the results through the TCGA database in order to identify possible gene networks that are overrepresented in GBM samples relative to controls. Importantly, we demonstrate that hypergeometric-based analysis of gene pairs identifies gene networks that validate experimentally. These studies identify a putative workflow for uncovering differentially expressed patient specific genes and gene networks for GBM and other cancers.
Zhang, Man; Liu, Xu-Hua; He, Xiong-Kui; Zhang, Lu-Da; Zhao, Long-Lian; Li, Jun-Hui
2010-05-01
In the present paper, taking 66 wheat samples for testing materials, ridge regression technology in near-infrared (NIR) spectroscopy quantitative analysis was researched. The NIR-ridge regression model for determination of protein content was established by NIR spectral data of 44 wheat samples to predict the protein content of the other 22 samples. The average relative error was 0.015 18 between the predictive results and Kjeldahl's values (chemical analysis values). And the predictive results were compared with those values derived through partial least squares (PLS) method, showing that ridge regression method was deserved to be chosen for NIR spectroscopy quantitative analysis. Furthermore, in order to reduce the disturbance to predictive capacity of the quantitative analysis model resulting from irrelevant information, one effective way is to screen the wavelength information. In order to select the spectral information with more content information and stronger relativity with the composition or the nature of the samples to improve the model's predictive accuracy, ridge regression was used to select wavelength information in this paper. The NIR-ridge regression model was established with the spectral information at 4 wavelength points, which were selected from 1 297 wavelength points, to predict the protein content of the 22 samples. The average relative error was 0.013 7 and the correlation coefficient reached 0.981 7 between the predictive results and Kjeldahl's values. The results showed that ridge regression was able to screen the essential wavelength information from a large amount of spectral information. It not only can simplify the model and effectively reduce the disturbance resulting from collinearity information, but also has practical significance for designing special NIR analysis instrument for analyzing specific component in some special samples.
Keerthiprasad.K
2014-08-01
Full Text Available In recent years, alloy steels have been widely usedin aerospace and automotive industries. Machining of these materials requires better understanding of cutting processes regarding accuracy and efficiency. This study addresses the modelling of the machinability of EN353 and 20mncr5 materials. In this study, multiple regression analysis (MRA is used to investigate the influence of some parameters on the thrust force and torque in the drilling processes of alloy steel materials. The model were identified by using cutting speed, feed rate, and depth as input data and the thrust force and torque as the output data. The statistical analysis accompanied with results showed that cutting feed (f were the most significant parameters on the drilling process, while spindle speed seemed insignificant. Since the spindle speed was insignificant, it directed us to set it either at the highest spindle speed to obtain high material removal rate or at the lowest spindle speed to prolong the tool life depending on the need for the application. The mathematical model is based on a power regression modelling, dependent on the three above mentioned parameters.
Mobley Lee R
2010-09-01
Full Text Available Abstract Background Colorectal cancer (CRC is the second leading cause of cancer death in the United States, and endoscopic screening can both detect and prevent cancer, but utilization is suboptimal and varies across geographic regions. We use multilevel regression to examine the various predictors of individuals' decisions to utilize endoscopic CRC screening. Study subjects are a 100% population cohort of Medicare beneficiaries identified in 2001 and followed through 2005. The outcome variable is a binary indicator of any sigmoidoscopy or colonoscopy use over this period. We analyze each state separately and map the findings for all states together to reveal patterns in the observed heterogeneity across states. Results We estimate a fully adjusted model for each state, based on a comprehensive socio-ecological model. We focus the discussion on the independent contributions of each of three community contextual variables that are amenable to policy intervention. Prevalence of Medicare managed care in one's neighborhood was associated with lower probability of screening in 12 states and higher probability in 19 states. Prevalence of poor English language ability among elders in one's neighborhood was associated with lower probability of screening in 15 states and higher probability in 6 states. Prevalence of poverty in one's neighborhood was associated with lower probability of screening in 36 states and higher probability in 5 states. Conclusions There are considerable differences across states in the socio-ecological context of CRC screening by endoscopy, suggesting that the current decentralized configuration of state-specific comprehensive cancer control programs is well suited to respond to the observed heterogeneity. We find that interventions to mediate language barriers are more critically needed in some states than in others. Medicare managed care penetration, hypothesized to affect information about and diffusion of new endoscopic
Fuzzy multinomial logistic regression analysis: A multi-objective programming approach
Abdalla, Hesham A.; El-Sayed, Amany A.; Hamed, Ramadan
2017-05-01
Parameter estimation for multinomial logistic regression is usually based on maximizing the likelihood function. For large well-balanced datasets, Maximum Likelihood (ML) estimation is a satisfactory approach. Unfortunately, ML can fail completely or at least produce poor results in terms of estimated probabilities and confidence intervals of parameters, specially for small datasets. In this study, a new approach based on fuzzy concepts is proposed to estimate parameters of the multinomial logistic regression. The study assumes that the parameters of multinomial logistic regression are fuzzy. Based on the extension principle stated by Zadeh and Bárdossy's proposition, a multi-objective programming approach is suggested to estimate these fuzzy parameters. A simulation study is used to evaluate the performance of the new approach versus Maximum likelihood (ML) approach. Results show that the new proposed model outperforms ML in cases of small datasets.
Application of Robust Regression and Bootstrap in Poductivity Analysis of GERD Variable in EU27
Dagmar Blatná
2014-06-01
Full Text Available The GERD is one of Europe 2020 headline indicators being tracked within the Europe 2020 strategy. The headline indicator is the 3% target for the GERD to be reached within the EU by 2020. Eurostat defi nes “GERD” as total gross domestic expenditure on research and experimental development in a percentage of GDP. GERD depends on numerous factors of a general economic background, namely of employment, innovation and research, science and technology. The values of these indicators vary among the European countries, and consequently the occurrence of outliers can be anticipated in corresponding analyses. In such a case, a classical statistical approach – the least squares method – can be highly unreliable, the robust regression methods representing an acceptable and useful tool. The aim of the present paper is to demonstrate the advantages of robust regression and applicability of the bootstrap approach in regression based on both classical and robust methods.
Latent cluster analysis of ALS phenotypes identifies prognostically differing groups.
Jeban Ganesalingam
Full Text Available BACKGROUND: Amyotrophic lateral sclerosis (ALS is a degenerative disease predominantly affecting motor neurons and manifesting as several different phenotypes. Whether these phenotypes correspond to different underlying disease processes is unknown. We used latent cluster analysis to identify groupings of clinical variables in an objective and unbiased way to improve phenotyping for clinical and research purposes. METHODS: Latent class cluster analysis was applied to a large database consisting of 1467 records of people with ALS, using discrete variables which can be readily determined at the first clinic appointment. The model was tested for clinical relevance by survival analysis of the phenotypic groupings using the Kaplan-Meier method. RESULTS: The best model generated five distinct phenotypic classes that strongly predicted survival (p<0.0001. Eight variables were used for the latent class analysis, but a good estimate of the classification could be obtained using just two variables: site of first symptoms (bulbar or limb and time from symptom onset to diagnosis (p<0.00001. CONCLUSION: The five phenotypic classes identified using latent cluster analysis can predict prognosis. They could be used to stratify patients recruited into clinical trials and generating more homogeneous disease groups for genetic, proteomic and risk factor research.
Identifying influential factors of business process performance using dependency analysis
Wetzstein, Branimir; Leitner, Philipp; Rosenberg, Florian; Dustdar, Schahram; Leymann, Frank
2011-02-01
We present a comprehensive framework for identifying influential factors of business process performance. In particular, our approach combines monitoring of process events and Quality of Service (QoS) measurements with dependency analysis to effectively identify influential factors. The framework uses data mining techniques to construct tree structures to represent dependencies of a key performance indicator (KPI) on process and QoS metrics. These dependency trees allow business analysts to determine how process KPIs depend on lower-level process metrics and QoS characteristics of the IT infrastructure. The structure of the dependencies enables a drill-down analysis of single factors of influence to gain a deeper knowledge why certain KPI targets are not met.
Spatial Quantile Regression In Analysis Of Healthy Life Years In The European Union Countries
Trzpiot Grażyna
2016-12-01
Full Text Available The paper investigates the impact of the selected factors on the healthy life years of men and women in the EU countries. The multiple quantile spatial autoregression models are used in order to account for substantial differences in the healthy life years and life quality across the EU members. Quantile regression allows studying dependencies between variables in different quantiles of the response distribution. Moreover, this statistical tool is robust against violations of the classical regression assumption about the distribution of the error term. Parameters of the models were estimated using instrumental variable method (Kim, Muller 2004, whereas the confidence intervals and p-values were bootstrapped.
JT-60 configuration parameters for feedback control determined by regression analysis
Matsukawa, Makoto; Hosogane, Nobuyuki; Ninomiya, Hiromasa
1991-12-01
The stepwise regression procedure was applied to obtain measurement formulas for equilibrium parameters used in the feedback control of JT-60. This procedure automatically selects variables necessary for the measurements, and selects a set of variables which are not likely to be picked up by physical considerations. Regression equations with stable and small multicollinearity were obtained and it was experimentally confirmed that the measurement formulas obtained through this procedure were accurate enough to be applicable to the feedback control of plasma configurations in JT-60.
Wilson, Asa B; Kerr, Bernard J; Bastian, Nathaniel D; Fulton, Lawrence V
2012-01-01
From 1980 to 1999, rural designated hospitals closed at a disproportionally high rate. In response to this emergent threat to healthcare access in rural settings, the Balanced Budget Act of 1997 made provisions for the creation of a new rural hospital--the critical access hospital (CAH). The conversion to CAH and the associated cost-based reimbursement scheme significantly slowed the closure rate of rural hospitals. This work investigates which methods can ensure the long-term viability of small hospitals. This article uses a two-step design to focus on a hypothesized relationship between technical efficiency of CAHs and a recently developed set of financial monitors for these entities. The goal is to identify the financial performance measures associated with efficiency. The first step uses data envelopment analysis (DEA) to differentiate efficient from inefficient facilities within a data set of 183 CAHs. Determining DEA efficiency is an a priori categorization of hospitals in the data set as efficient or inefficient. In the second step, DEA efficiency is the categorical dependent variable (efficient = 0, inefficient = 1) in the subsequent binary logistic regression (LR) model. A set of six financial monitors selected from the array of 20 measures were the LR independent variables. We use a binary LR to test the null hypothesis that recently developed CAH financial indicators had no predictive value for categorizing a CAH as efficient or inefficient, (i.e., there is no relationship between DEA efficiency and fiscal performance).
Kamruzzaman, Md; Mamun, A S M A; Bakar, Sheikh Muhammad Abu; Saw, Aik; Kamarul, T; Islam, Md Nurul; Hossain, Md Golam
2016-11-21
The aim of this study was to investigate the socioeconomic and demographic factors influencing the body mass index (BMI) of non-pregnant married Bangladeshi women of reproductive age. Secondary (Hierarchy) data from the 2011 Bangladesh Demographic and Health Survey, collected using two-stage stratified cluster sampling, were used. Two-level linear regression analysis was performed to remove the cluster effect of the variables. The mean BMI of married non-pregnant Bangladeshi women was 21.60±3.86 kg/m2, and the prevalence of underweight, overweight and obesity was 22.8%, 14.9% and 3.2%, respectively. After removing the cluster effect, age and age at first marriage were found to be positively (pchildren was negatively related with women's BMI. Lower BMI was especially found among women from rural areas and poor families, with an uneducated husband, with no television at home and who were currently breast-feeding. Age, total children ever born, age at first marriage, type of residence, education level, level of husband's education, wealth index, having a television at home and practising breast-feeding were found to be important predictors for the BMI of married Bangladeshi non-pregnant women of reproductive age. This information could be used to identify sections of the Bangladeshi population that require special attention, and to develop more effective strategies to resolve the problem of malnutrition.
Yu, Shuang; Liu, Guo-hai; Xia, Rong-sheng; Jiang, Hui
2016-01-01
In order to achieve the rapid monitoring of process state of solid state fermentation (SSF), this study attempted to qualitative identification of process state of SSF of feed protein by use of Fourier transform near infrared (FT-NIR) spectroscopy analysis technique. Even more specifically, the FT-NIR spectroscopy combined with Adaboost-SRDA-NN integrated learning algorithm as an ideal analysis tool was used to accurately and rapidly monitor chemical and physical changes in SSF of feed protein without the need for chemical analysis. Firstly, the raw spectra of all the 140 fermentation samples obtained were collected by use of Fourier transform near infrared spectrometer (Antaris II), and the raw spectra obtained were preprocessed by use of standard normal variate transformation (SNV) spectral preprocessing algorithm. Thereafter, the characteristic information of the preprocessed spectra was extracted by use of spectral regression discriminant analysis (SRDA). Finally, nearest neighbors (NN) algorithm as a basic classifier was selected and building state recognition model to identify different fermentation samples in the validation set. Experimental results showed as follows: the SRDA-NN model revealed its superior performance by compared with other two different NN models, which were developed by use of the feature information form principal component analysis (PCA) and linear discriminant analysis (LDA), and the correct recognition rate of SRDA-NN model achieved 94.28% in the validation set. In this work, in order to further improve the recognition accuracy of the final model, Adaboost-SRDA-NN ensemble learning algorithm was proposed by integrated the Adaboost and SRDA-NN methods, and the presented algorithm was used to construct the online monitoring model of process state of SSF of feed protein. Experimental results showed as follows: the prediction performance of SRDA-NN model has been further enhanced by use of Adaboost lifting algorithm, and the correct
Tomczyk, Aleksandra; Ewertowski, Marek; White, Piran; Kasprzak, Leszek
2016-04-01
The dual role of many Protected Natural Areas in providing benefits for both conservation and recreation poses challenges for management. Although recreation-based damage to ecosystems can occur very quickly, restoration can take many years. The protection of conservation interests at the same as providing for recreation requires decisions to be made about how to prioritise and direct management actions. Trails are commonly used to divert visitors from the most important areas of a site, but high visitor pressure can lead to increases in trail width and a concomitant increase in soil erosion. Here we use detailed field data on condition of recreational trails in Gorce National Park, Poland, as the basis for a regression tree analysis to determine the factors influencing trail deterioration, and link specific trail impacts with environmental, use related and managerial factors. We distinguished 12 types of trails, characterised by four levels of degradation: (1) trails with an acceptable level of degradation; (2) threatened trails; (3) damaged trails; and (4) heavily damaged trails. Damaged trails were the most vulnerable of all trails and should be prioritised for appropriate conservation and restoration. We also proposed five types of monitoring of recreational trail conditions: (1) rapid inventory of negative impacts; (2) monitoring visitor numbers and variation in type of use; (3) change-oriented monitoring focusing on sections of trail which were subjected to changes in type or level of use or subjected to extreme weather events; (4) monitoring of dynamics of trail conditions; and (5) full assessment of trail conditions, to be carried out every 10-15 years. The application of the proposed framework can enhance the ability of Park managers to prioritise their trail management activities, enhancing trail conditions and visitor safety, while minimising adverse impacts on the conservation value of the ecosystem. A.M.T. was supported by the Polish Ministry of
Salas, M.M.; Nascimento, G.G.; Vargas-Ferreira, F.; Tarquinio, S.B.; Huysmans, M.C.D.N.J.M.; Demarco, F.F.
2015-01-01
OBJECTIVE: The aim of the present study was to assess the influence of diet in tooth erosion presence in children and adolescents by meta-analysis and meta-regression. DATA: Two reviewers independently performed the selection process and the quality of studies was assessed. SOURCES: Studies publishe
Muller, Veronica; Brooks, Jessica; Tu, Wei-Mo; Moser, Erin; Lo, Chu-Ling; Chan, Fong
2015-01-01
Purpose: The main objective of this study was to determine the extent to which physical and cognitive-affective factors are associated with fibromyalgia (FM) fatigue. Method: A quantitative descriptive design using correlation techniques and multiple regression analysis. The participants consisted of 302 members of the National Fibromyalgia &…
Salas, M.M.; Nascimento, G.G.; Vargas-Ferreira, F.; Tarquinio, S.B.; Huysmans, M.C.D.N.J.M.; Demarco, F.F.
2015-01-01
OBJECTIVE: The aim of the present study was to assess the influence of diet in tooth erosion presence in children and adolescents by meta-analysis and meta-regression. DATA: Two reviewers independently performed the selection process and the quality of studies was assessed. SOURCES: Studies publishe
Swets, Marije; Dekker, Jack; van Emmerik-van Oortmerssen, Katelijne; Smid, Geert E.; Smit, Filip; de Haan, Lieuwe; Schoevers, Robert A.
Aims: The aims of this study were to conduct a meta-analysis and meta-regression to estimate the prevalence rates for obsessive compulsive symptoms (OCS) and obsessive compulsive disorder (OCD) in schizophrenia, and to investigate what influences these prevalence rates. Method: Studies were
Brabant, Marie-Eve; Hebert, Martine; Chagnon, Francois
2013-01-01
This study explored the clinical profiles of 77 female teenager survivors of sexual abuse and examined the association of abuse-related and personal variables with suicidal ideations. Analyses revealed that 64% of participants experienced suicidal ideations. Findings from classification and regression tree analysis indicated that depression,…
On the Usefulness of a Multilevel Logistic Regression Approach to Person-Fit Analysis
Conijn, Judith M.; Emons, Wilco H. M.; van Assen, Marcel A. L. M.; Sijtsma, Klaas
2011-01-01
The logistic person response function (PRF) models the probability of a correct response as a function of the item locations. Reise (2000) proposed to use the slope parameter of the logistic PRF as a person-fit measure. He reformulated the logistic PRF model as a multilevel logistic regression model and estimated the PRF parameters from this…
Li, T.; Sun, L.; Zou, L.
2009-01-01
This study assesses the impact of government shareholding on corporate performance using a sample of 643 non-financial companies listed on the Chinese stock exchanges. In view of the controversial empirical findings in the literature and the limitations of the least squares regressions, we adopt the
Genetic analysis of tolerance to infections using random regressions: a simulation study
Kause, A.
2011-01-01
Tolerance to infections is the ability of a host to limit the impact of a given pathogen burden on host performance. This simulation study demonstrated the merit of using random regressions to estimate unbiased genetic variances for tolerance slope and its genetic correlations with other traits,
The Analysis of Nonstationary Time Series Using Regression, Correlation and Cointegration
Johansen, Søren
2012-01-01
There are simple well-known conditions for the validity of regression and correlation as statistical tools. We analyse by examples the effect of nonstationarity on inference using these methods and compare them to model based inference using the cointegrated vector autoregressive model. Finally we...
An Analysis of the Indicator Saturation Estimator as a Robust Regression Estimator
Johansen, Søren; Nielsen, Bent
An algorithm suggested by Hendry (1999) for estimation in a regression with more regressors than observations, is analyzed with the purpose of finding an estimator that is robust to outliers and structural breaks. This estimator is an example of a one-step M-estimator based on Huber's skip functi...
Schlechtingen, Meik; Santos, Ilmar
2011-01-01
This paper presents the research results of a comparison of three different model based approaches for wind turbine fault detection in online SCADA data, by applying developed models to five real measured faults and anomalies. The regression based model as the simplest approach to build a normal ...
Hoeflinger, Jennifer L; Hoeflinger, Daniel E; Miller, Michael J
2017-01-01
Herein, an open-source method to generate quantitative bacterial growth data from high-throughput microplate assays is described. The bacterial lag time, maximum specific growth rate, doubling time and delta OD are reported. Our method was validated by carbohydrate utilization of lactobacilli, and visual inspection revealed 94% of regressions were deemed excellent.
Quantile regression for the statistical analysis of immunological data with many non-detects
Eilers, P.H.C.; Roder, E.; Savelkoul, H.F.J.; Wijk, van R.G.
2012-01-01
Background Immunological parameters are hard to measure. A well-known problem is the occurrence of values below the detection limit, the non-detects. Non-detects are a nuisance, because classical statistical analyses, like ANOVA and regression, cannot be applied. The more advanced statistical techni
Quantile regression for the statistical analysis of immunological data with many non-detects
P.H.C. Eilers (Paul); E. Röder (Esther); H.F.J. Savelkoul (Huub); R. Gerth van Wijk (Roy)
2012-01-01
textabstractBackground: Immunological parameters are hard to measure. A well-known problem is the occurrence of values below the detection limit, the non-detects. Non-detects are a nuisance, because classical statistical analyses, like ANOVA and regression, cannot be applied. The more advanced stati
On the Usefulness of a Multilevel Logistic Regression Approach to Person-Fit Analysis
Conijn, Judith M.; Emons, Wilco H. M.; van Assen, Marcel A. L. M.; Sijtsma, Klaas
2011-01-01
The logistic person response function (PRF) models the probability of a correct response as a function of the item locations. Reise (2000) proposed to use the slope parameter of the logistic PRF as a person-fit measure. He reformulated the logistic PRF model as a multilevel logistic regression model and estimated the PRF parameters from this…
Analysis of Multivariate Experimental Data Using A Simplified Regression Model Search Algorithm
Ulbrich, Norbert Manfred
2013-01-01
A new regression model search algorithm was developed in 2011 that may be used to analyze both general multivariate experimental data sets and wind tunnel strain-gage balance calibration data. The new algorithm is a simplified version of a more complex search algorithm that was originally developed at the NASA Ames Balance Calibration Laboratory. The new algorithm has the advantage that it needs only about one tenth of the original algorithm's CPU time for the completion of a search. In addition, extensive testing showed that the prediction accuracy of math models obtained from the simplified algorithm is similar to the prediction accuracy of math models obtained from the original algorithm. The simplified algorithm, however, cannot guarantee that search constraints related to a set of statistical quality requirements are always satisfied in the optimized regression models. Therefore, the simplified search algorithm is not intended to replace the original search algorithm. Instead, it may be used to generate an alternate optimized regression model of experimental data whenever the application of the original search algorithm either fails or requires too much CPU time. Data from a machine calibration of NASA's MK40 force balance is used to illustrate the application of the new regression model search algorithm.
Wiley, Kristofor R.
2013-01-01
Many of the social and emotional needs that have historically been associated with gifted students have been questioned on the basis of recent empirical evidence. Research on the topic, however, is often limited by sample size, selection bias, or definition. This study addressed these limitations by applying linear regression methodology to data…
Baylor, Carolyn; Yorkston, Kathryn; Bamer, Alyssa; Britton, Deanna; Amtmann, Dagmar
2010-01-01
Purpose: To explore variables associated with self-reported communicative participation in a sample (n = 498) of community-dwelling adults with multiple sclerosis (MS). Method: A battery of questionnaires was administered online or on paper per participant preference. Data were analyzed using multiple linear backward stepwise regression. The…
Identifying Organizational Inefficiencies with Pictorial Process Analysis (PPA
David John Patrishkoff
2013-11-01
Full Text Available Pictorial Process Analysis (PPA was created by the author in 2004. PPA is a unique methodology which offers ten layers of additional analysis when compared to standard process mapping techniques. The goal of PPA is to identify and eliminate waste, inefficiencies and risk in manufacturing or transactional business processes at 5 levels in an organization. The highest level being assessed is the process management, followed by the process work environment, detailed work habits, process performance metrics and general attitudes towards the process. This detailed process assessment and analysis is carried out during process improvement brainstorming efforts and Kaizen events. PPA creates a detailed visual efficiency rating for each step of the process under review. A selection of 54 pictorial Inefficiency Icons (cards are available for use to highlight major inefficiencies and risks that are present in the business process under review. These inefficiency icons were identified during the author's independent research on the topic of why things go wrong in business. This paper will highlight how PPA was developed and show the steps required to conduct Pictorial Process Analysis on a sample manufacturing process. The author has successfully used PPA to dramatically improve business processes in over 55 different industries since 2004.
Kahane, Leo H
2007-01-01
Using a friendly, nontechnical approach, the Second Edition of Regression Basics introduces readers to the fundamentals of regression. Accessible to anyone with an introductory statistics background, this book builds from a simple two-variable model to a model of greater complexity. Author Leo H. Kahane weaves four engaging examples throughout the text to illustrate not only the techniques of regression but also how this empirical tool can be applied in creative ways to consider a broad array of topics. New to the Second Edition Offers greater coverage of simple panel-data estimation:
Erener, Arzu; Sivas, A. Abdullah; Selcuk-Kestel, A. Sevtap; Düzgün, H. Sebnem
2017-07-01
All of the quantitative landslide susceptibility mapping (QLSM) methods requires two basic data types, namely, landslide inventory and factors that influence landslide occurrence (landslide influencing factors, LIF). Depending on type of landslides, nature of triggers and LIF, accuracy of the QLSM methods differs. Moreover, how to balance the number of 0 (nonoccurrence) and 1 (occurrence) in the training set obtained from the landslide inventory and how to select which one of the 1's and 0's to be included in QLSM models play critical role in the accuracy of the QLSM. Although performance of various QLSM methods is largely investigated in the literature, the challenge of training set construction is not adequately investigated for the QLSM methods. In order to tackle this challenge, in this study three different training set selection strategies along with the original data set is used for testing the performance of three different regression methods namely Logistic Regression (LR), Bayesian Logistic Regression (BLR) and Fuzzy Logistic Regression (FLR). The first sampling strategy is proportional random sampling (PRS), which takes into account a weighted selection of landslide occurrences in the sample set. The second method, namely non-selective nearby sampling (NNS), includes randomly selected sites and their surrounding neighboring points at certain preselected distances to include the impact of clustering. Selective nearby sampling (SNS) is the third method, which concentrates on the group of 1's and their surrounding neighborhood. A randomly selected group of landslide sites and their neighborhood are considered in the analyses similar to NNS parameters. It is found that LR-PRS, FLR-PRS and BLR-Whole Data set-ups, with order, yield the best fits among the other alternatives. The results indicate that in QLSM based on regression models, avoidance of spatial correlation in the data set is critical for the model's performance.
Investigations upon the indefinite rolls quality assurance in multiple regression analysis
Kiss, I.
2012-04-01
Full Text Available The rolling rolls quality has been enhanced mainly due to the improvements of the chemical compositions of rolls materials. The realization of an optimal chemical composition can constitute a technical efficient mode to assure the exploitation properties, the material from which the rolling mills rolls are manufactured having a higher importance in this sense. This paper continues to present the scientifically results of our experimental research in the area of the rolling rolls. The basic research contains concrete elements of immediate practical utilities in the metallurgical enterprises, for the quality improvements of rolls, having in last as the aim the durability growth and the safety in exploitation. This paper presents an analysis of the chemical composition, the influences upon the mechanical properties of the indefinite cast iron rolls. We present some mathematical correlations and graphical interpretations between the hardness (on the working surface and on necks and the chemical composition. Using the double and triple correlations which is really helpful in the foundry practice, as it allows us to determine variation boundaries for the chemical composition, in view the obtaining the optimal values of the hardness. We suggest a mathematical interpretation of the influence of the chemical composition over the hardness of these indefinite rolling rolls. In this sense we use the multiple regression analysis which can be an important statistical tool for the investigation of relationships between variables. The enunciation of some mathematically modeling results can be described through a number of multi-component equations determined for the spaces with 3 and 4 dimensions. Also, the regression surfaces, curves of levels and volumes of variations can be represented and interpreted by technologists considering these as correlation diagrams between the analyzed variables. In this sense, these researches results can be used in the engineers
Zhou, Jinzhe; Zhou, Yanbing; Cao, Shougen; Li, Shikuan; Wang, Hao; Niu, Zhaojian; Chen, Dong; Wang, Dongsheng; Lv, Liang; Zhang, Jian; Li, Yu; Jiao, Xuelong; Tan, Xiaojie; Zhang, Jianli; Wang, Haibo; Zhang, Bingyuan; Lu, Yun; Sun, Zhenqing
2016-01-01
Reporting of surgical complications is common, but few provide information about the severity and estimate risk factors of complications. If have, but lack of specificity. We retrospectively analyzed data on 2795 gastric cancer patients underwent surgical procedure at the Affiliated Hospital of Qingdao University between June 2007 and June 2012, established multivariate logistic regression model to predictive risk factors related to the postoperative complications according to the Clavien-Dindo classification system. Twenty-four out of 86 variables were identified statistically significant in univariate logistic regression analysis, 11 significant variables entered multivariate analysis were employed to produce the risk model. Liver cirrhosis, diabetes mellitus, Child classification, invasion of neighboring organs, combined resection, introperative transfusion, Billroth II anastomosis of reconstruction, malnutrition, surgical volume of surgeons, operating time and age were independent risk factors for postoperative complications after gastrectomy. Based on logistic regression equation, p=Exp∑BiXi / (1+Exp∑BiXi), multivariate logistic regression predictive model that calculated the risk of postoperative morbidity was developed, p = 1/(1 + e((4.810-1.287X1-0.504X2-0.500X3-0.474X4-0.405X5-0.318X6-0.316X7-0.305X8-0.278X9-0.255X10-0.138X11))). The accuracy, sensitivity and specificity of the model to predict the postoperative complications were 86.7%, 76.2% and 88.6%, respectively. This risk model based on Clavien-Dindo grading severity of complications system and logistic regression analysis can predict severe morbidity specific to an individual patient's risk factors, estimate patients' risks and benefits of gastric surgery as an accurate decision-making tool and may serve as a template for the development of risk models for other surgical groups.
Lorenzo-Seva, Urbano; Ferrando, Pere J
2011-03-01
We provide an SPSS program that implements currently recommended techniques and recent developments for selecting variables in multiple linear regression analysis via the relative importance of predictors. The approach consists of: (1) optimally splitting the data for cross-validation, (2) selecting the final set of predictors to be retained in the equation regression, and (3) assessing the behavior of the chosen model using standard indices and procedures. The SPSS syntax, a short manual, and data files related to this article are available as supplemental materials from brm.psychonomic-journals.org/content/supplemental.
Rolling Regressions with Stata
Kit Baum
2004-01-01
This talk will describe some work underway to add a "rolling regression" capability to Stata's suite of time series features. Although commands such as "statsby" permit analysis of non-overlapping subsamples in the time domain, they are not suited to the analysis of overlapping (e.g. "moving window") samples. Both moving-window and widening-window techniques are often used to judge the stability of time series regression relationships. We will present an implementation of a rolling regression...
Liu, Jian; Gao, Yun-Hua; Li, Ding-Dong; Gao, Yan-Chun; Hou, Ling-Mi; Xie, Ting
2014-01-01
To compare the value of contrast-enhanced ultrasound (CEUS) qualitative and quantitative analysis in the identification of breast tumor lumps. Qualitative and quantitative indicators of CEUS for 73 cases of breast tumor lumps were retrospectively analyzed by univariate and multivariate approaches. Logistic regression was applied and ROC curves were drawn for evaluation and comparison. The CEUS qualitative indicator-generated regression equation contained three indicators, namely enhanced homogeneity, diameter line expansion and peak intensity grading, which demonstrated prediction accuracy for benign and malignant breast tumor lumps of 91.8%; the quantitative indicator-generated regression equation only contained one indicator, namely the relative peak intensity, and its prediction accuracy was 61.5%. The corresponding areas under the ROC curve for qualitative and quantitative analyses were 91.3% and 75.7%, respectively, which exhibited a statistically significant difference by the Z test (Pqualitative analysis to identify breast tumor lumps is better than with quantitative analysis.
Mohammad Ali HORMOZI
2015-06-01
Full Text Available We analyzed the effect of chemical fertilizer, seed, biocide, farm machinery and labor hours on production of paddy (paddy rice in the Khuzestan province in the South Western part of Iran. Here we test two methods (linear regression and neural network. We conclude that the results gotten by neural network with no hidden layer and linear regression are closed to each other. We insist that for a data set of this type the regression analysis yields more reliable results compared to a neural network. They suggest that machinery has a very clear positive effect on yield while fertilizer and labor doesn't affect on it. One can say that there is no necessity that increasing the amount of some "useful input" increase paddy production.
Veilleux, Andrea G.; Stedinger, Jery R.; Eash, David A.
2012-01-01
This paper summarizes methodological advances in regional log-space skewness analyses that support flood-frequency analysis with the log Pearson Type III (LP3) distribution. A Bayesian Weighted Least Squares/Generalized Least Squares (B-WLS/B-GLS) methodology that relates observed skewness coefficient estimators to basin characteristics in conjunction with diagnostic statistics represents an extension of the previously developed B-GLS methodology. B-WLS/B-GLS has been shown to be effective in two California studies. B-WLS/B-GLS uses B-WLS to generate stable estimators of model parameters and B-GLS to estimate the precision of those B-WLS regression parameters, as well as the precision of the model. The study described here employs this methodology to develop a regional skewness model for the State of Iowa. To provide cost effective peak-flow data for smaller drainage basins in Iowa, the U.S. Geological Survey operates a large network of crest stage gages (CSGs) that only record flow values above an identified recording threshold (thus producing a censored data record). CSGs are different from continuous-record gages, which record almost all flow values and have been used in previous B-GLS and B-WLS/B-GLS regional skewness studies. The complexity of analyzing a large CSG network is addressed by using the B-WLS/B-GLS framework along with the Expected Moments Algorithm (EMA). Because EMA allows for the censoring of low outliers, as well as the use of estimated interval discharges for missing, censored, and historic data, it complicates the calculations of effective record length (and effective concurrent record length) used to describe the precision of sample estimators because the peak discharges are no longer solely represented by single values. Thus new record length calculations were developed. The regional skewness analysis for the State of Iowa illustrates the value of the new B-WLS/BGLS methodology with these new extensions.
Salas, M M S; Nascimento, G G; Vargas-Ferreira, F; Tarquinio, S B C; Huysmans, M C D N J M; Demarco, F F
2015-08-01
The aim of the present study was to assess the influence of diet in tooth erosion presence in children and adolescents by meta-analysis and meta-regression. Two reviewers independently performed the selection process and the quality of studies was assessed. Studies published until May 2014 were identified in electronic databases: Pubmed, EBSHost, Scopus, Science direct, Web of Science and Scielo, using keywords. Criteria used included: observational studies, tooth erosion and diet, subject age range 8-19 years old, permanent dentition and index. Meta-analysis was performed and in case of heterogeneity a random-effects model was used. Thirteen studies that fulfilled the inclusion criteria were selected. Higher consumption of carbonated drinks (p=0.001) or acid snacks/sweets (p=0.01 and for acid fruit juices (p=0.03)) increased the odds for tooth erosion, while higher intake of milk (p=0.028) and yogurt (p=0.002) reduced the erosion occurrence. Heterogeneity was observed in soft drinks, confectionary and snacks and acidic fruit juices models. Methodological issues regarding the questionnaires administration and the inclusion of other variables, such as food groups and tooth brushing, explained partially the heterogeneity observed. Some dietary components (carbonated drinks, acid snacks/sweets and natural acidic fruits juice) increased erosion occurrence while milk and yogurt had a protective effect. Methods to assess diet could influence the homogeneity of the studies and should be considered during the study design. The method to assess diet should be carefully considered and well conducted as part of the clinical assessment of tooth erosion, since diet could influence the occurrence of tooth erosion. Copyright © 2015 Elsevier Ltd. All rights reserved.
Evaluating Non-Linear Regression Models in Analysis of Persian Walnut Fruit Growth
I. Karamatlou
2016-02-01
Full Text Available Introduction: Persian walnut (Juglans regia L. is a large, wind-pollinated, monoecious, dichogamous, long lived, perennial tree cultivated for its high quality wood and nuts throughout the temperate regions of the world. Growth model methodology has been widely used in the modeling of plant growth. Mathematical models are important tools to study the plant growth and agricultural systems. These models can be applied for decision-making anddesigning management procedures in horticulture. Through growth analysis, planning for planting systems, fertilization, pruning operations, harvest time as well as obtaining economical yield can be more accessible.Non-linear models are more difficult to specify and estimate than linear models. This research was aimed to studynon-linear regression models based on data obtained from fruit weight, length and width. Selecting the best models which explain that fruit inherent growth pattern of Persian walnut was a further goal of this study. Materials and Methods: The experimental material comprising 14 Persian walnut genotypes propagated by seed collected from a walnut orchard in Golestan province, Minoudasht region, Iran, at latitude 37◦04’N; longitude 55◦32’E; altitude 1060 m, in a silt loam soil type. These genotypes were selected as a representative sampling of the many walnut genotypes available throughout the Northeastern Iran. The age range of walnut trees was 30 to 50 years. The annual mean temperature at the location is16.3◦C, with annual mean rainfall of 690 mm.The data used here is the average of walnut fresh fruit and measured withgram/millimeter/day in2011.According to the data distribution pattern, several equations have been proposed to describesigmoidal growth patterns. Here, we used double-sigmoid and logistic–monomolecular models to evaluate fruit growth based on fruit weight and4different regression models in cluding Richards, Gompertz, Logistic and Exponential growth for evaluation
Rice Transcriptome Analysis to Identify Possible Herbicide Quinclorac Detoxification Genes
Wenying eXu
2015-09-01
Full Text Available Quinclorac is a highly selective auxin-type herbicide, and is widely used in the effective control of barnyard grass in paddy rice fields, improving the world’s rice yield. The herbicide mode of action of quinclorac has been proposed and hormone interactions affect quinclorac signaling. Because of widespread use, quinclorac may be transported outside rice fields with the drainage waters, leading to soil and water pollution and environmental health problems.In this study, we used 57K Affymetrix rice whole-genome array to identify quinclorac signaling response genes to study the molecular mechanisms of action and detoxification of quinclorac in rice plants. Overall, 637 probe sets were identified with differential expression levels under either 6 or 24 h of quinclorac treatment. Auxin-related genes such as GH3 and OsIAAs responded to quinclorac treatment. Gene Ontology analysis showed that genes of detoxification-related family genes were significantly enriched, including cytochrome P450, GST, UGT, and ABC and drug transporter genes. Moreover, real-time RT-PCR analysis showed that top candidate P450 families such as CYP81, CYP709C and CYP72A genes were universally induced by different herbicides. Some Arabidopsis genes for the same P450 family were up-regulated under quinclorac treatment.We conduct rice whole-genome GeneChip analysis and the first global identification of quinclorac response genes. This work may provide potential markers for detoxification of quinclorac and biomonitors of environmental chemical pollution.
Lee, Soo Min; Lee, Jae-Won
2014-11-01
In this study, the optimal conditions for biomass torrefaction were determined by comparing the gain of energy content to the weight loss of biomass from the final products. Torrefaction experiments were performed at temperatures ranging from 220 to 280°C using 20-80min reaction times. Polynomial regression models ranging from the 1st to the 3rd order were used to determine a relationship between the severity factor (SF) and calorific value or weight loss. The intersection of two regression models for calorific value and weight loss was determined and assumed to be the optimized SF. The optimized SFs on each biomass ranged from 6.056 to 6.372. Optimized torrefaction conditions were determined at various reaction times of 15, 30, and 60min. The average optimized temperature was 248.55°C in the studied biomass when torrefaction was performed for 60min.
Gmur, Stephan; Vogt, Daniel; Zabowski, Darlene; Moskal, L Monika
2012-01-01
The characterization of soil attributes using hyperspectral sensors has revealed patterns in soil spectra that are known to respond to mineral composition, organic matter, soil moisture and particle size distribution. Soil samples from different soil horizons of replicated soil series from sites located within Washington and Oregon were analyzed with the FieldSpec Spectroradiometer to measure their spectral signatures across the electromagnetic range of 400 to 1,000 nm. Similarity rankings of individual soil samples reveal differences between replicate series as well as samples within the same replicate series. Using classification and regression tree statistical methods, regression trees were fitted to each spectral response using concentrations of nitrogen, carbon, carbonate and organic matter as the response variables. Statistics resulting from fitted trees were: nitrogen R(2) 0.91 (p organic matter R(2) 0.98 (p organic matter for upper soil horizons in a nondestructive method.
Identifying Sources of Difference in Reliability in Content Analysis
Elizabeth Murphy
2005-07-01
Full Text Available This paper reports on a case study which identifies and illustrates sources of difference in agreement in relation to reliability in a context of quantitative content analysis of a transcript of an online asynchronous discussion (OAD. Transcripts of 10 students in a month-long online asynchronous discussion were coded by two coders using an instrument with two categories, five processes, and 19 indicators of Problem Formulation and Resolution (PFR. Sources of difference were identified in relation to: coders; tasks; and students. Reliability values were calculated at the levels of categories, processes, and indicators. At the most detailed level of coding on the basis of the indicator, findings revealed that the overall level of reliability between coders was .591 when measured with Cohen’s kappa. The difference between tasks at the same level ranged from .349 to .664, and the difference between participants ranged from .390 to .907. Implications for training and research are discussed.
Proteogenomic Analysis Identifies a Novel Human SHANK3 Isoform
Fahad Benthani
2015-05-01
Full Text Available Mutations of the SHANK3 gene have been associated with autism spectrum disorder. Individuals harboring different SHANK3 mutations display considerable heterogeneity in their cognitive impairment, likely due to the high SHANK3 transcriptional diversity. In this study, we report a novel interaction between the Mutated in colorectal cancer (MCC protein and a newly identified SHANK3 protein isoform in human colon cancer cells and mouse brain tissue. Hence, our proteogenomic analysis identifies a new human long isoform of the key synaptic protein SHANK3 that was not predicted by the human reference genome. Taken together, our findings describe a potential new role for MCC in neurons, a new human SHANK3 long isoform and, importantly, highlight the use of proteomic data towards the re-annotation of GC-rich genomic regions.
Regional variation in the prevalence of E. coli O157 in cattle: a meta-analysis and meta-regression.
Md Zohorul Islam
Full Text Available Escherichia coli O157 (EcO157 infection has been recognized as an important global public health concern. But information on the prevalence of EcO157 in cattle at the global and at the wider geographical levels is limited, if not absent. This is the first meta-analysis to investigate the point prevalence of EcO157 in cattle at the global level and to explore the factors contributing to variation in prevalence estimates.Seven electronic databases- CAB Abstracts, PubMed, Biosis Citation Index, Medline, Web of Knowledge, Scirus and Scopus were searched for relevant publications from 1980 to 2012. A random effect meta-analysis model was used to produce the pooled estimates. The potential sources of between study heterogeneity were identified using meta-regression.A total of 140 studies consisting 220,427 cattle were included in the meta-analysis. The prevalence estimate of EcO157 in cattle at the global level was 5.68% (95% CI, 5.16-6.20. The random effects pooled prevalence estimates in Africa, Northern America, Oceania, Europe, Asia and Latin America-Caribbean were 31.20% (95% CI, 12.35-50.04, 7.35% (95% CI, 6.44-8.26, 6.85% (95% CI, 2.41-11.29, 5.15% (95% CI, 4.21-6.09, 4.69% (95% CI, 3.05-6.33 and 1.65% (95% CI, 0.77-2.53, respectively. Between studies heterogeneity was evidenced in most regions. World region (p<0.001, type of cattle (p<0.001 and to some extent, specimens (p = 0.074 as well as method of pre-enrichment (p = 0.110, were identified as factors for variation in the prevalence estimates of EcO157 in cattle.The prevalence of the organism seems to be higher in the African and Northern American regions. The important factors that might have influence in the estimates of EcO157 are type of cattle and kind of screening specimen. Their roles need to be determined and they should be properly handled in any survey to estimate the true prevalence of EcO157.
Predicting Student Success on the Texas Chemistry STAAR Test: A Logistic Regression Analysis
Johnson, William L.; Johnson, Annabel M.; Johnson, Jared
2012-01-01
Background: The context is the new Texas STAAR end-of-course testing program. Purpose: The authors developed a logistic regression model to predict who would pass-or-fail the new Texas chemistry STAAR end-of-course exam. Setting: Robert E. Lee High School (5A) with an enrollment of 2700 students, Tyler, Texas. Date of the study was the 2011-2012…
Javali Shivalingappa; Pandit Parameshwar
2010-01-01
Aim: The study aimed to determine the factors associated with periodontal disease (different levels of severity) by using different regression models for ordinal data. Design: A cross-sectional design was employed using clinical examination and ′questionnaire with interview′ method. Materials and Methods: The study was conducted during June 2008 to October 2008 in Dharwad, Karnataka, India. It involved a systematic random sample of 1760 individuals aged 18-40 years. The periodon...
Unification of regression-based methods for the analysis of natural selection.
Morrissey, Michael B; Sakrejda, Krzysztof
2013-07-01
Regression analyses are central to characterization of the form and strength of natural selection in nature. Two common analyses that are currently used to characterize selection are (1) least squares-based approximation of the individual relative fitness surface for the purpose of obtaining quantitatively useful selection gradients, and (2) spline-based estimation of (absolute) fitness functions to obtain flexible inference of the shape of functions by which fitness and phenotype are related. These two sets of methodologies are often implemented in parallel to provide complementary inferences of the form of natural selection. We unify these two analyses, providing a method whereby selection gradients can be obtained for a given observed distribution of phenotype and characterization of a function relating phenotype to fitness. The method allows quantitatively useful selection gradients to be obtained from analyses of selection that adequately model nonnormal distributions of fitness, and provides unification of the two previously separate regression-based fitness analyses. We demonstrate the method by calculating directional and quadratic selection gradients associated with a smooth regression-based generalized additive model of the relationship between neonatal survival and the phenotypic traits of gestation length and birth mass in humans.
SHAO, Xueguang; CHEN, Da; XU, Heng; LIU, Zhichao; CAI, Wensheng
2009-01-01
Partial least-squares (PLS) regression has been presented as a powerful tool for spectral quantitative measure- ment. However, the improvement of the robustness and stability of PLS models is still needed, because it is difficult to build a stable model when complex samples are analyzed or outliers are contained in the calibration data set. To achieve the purpose, a robust ensemble PLS technique based on probability resampling was proposed, which is named RE-PLS. In the proposed method, a probability is firstly obtained for each calibration sample from its resid- ual in a robust regression. Then, multiple PLS models are constructed based on probability resampling. At last, the multiple PLS models are used to predict unknown samples by taking the average of the predictions from the multi- ple models as final prediction result. To validate the effectiveness and universality of the proposed method, it was applied to two different sets of NIR spectra. The results show that RE-PLS can not only effectively avoid the inter- ference of outliers but also enhance the precision of prediction and the stability of PLS regression. Thus, it may pro- vide a useful tool for multivariate calibration with multiple outliers.
Yang, Yan; Simpson, Douglas G
2012-08-01
Health and safety studies that entail both incidence and magnitude of effects produce semi-continuous outcomes, in which the response is either zero or a continuous positive value. Zero-inflated left-censored models typically employ latent mixture constructions to allow different covariate processes to impact the incidence versus the magnitude. Assessment of the model, however, requires a focus on the observable characteristics. We employ a conditional decomposition approach, in which the model assessment is partitioned into two observable components: the adequacy of the marginal probability model for the boundary value and the adequacy of the conditional model for values strictly above the boundary. A conditional likelihood decomposition facilitates the statistical assessment. For corresponding residual and graphical analysis, the conditional mean and quantile functions for events above the boundary and the marginal probabilities of boundary events are investigated. Large sample standard errors for these quantities are derived for enhanced graphical assessment, and simulation is conducted to investigate the finite-sample behaviour. The methods are illustrated with data from two health-related safety studies. In each case, the conditional assessments identify the source for lack of fit of the previously considered model and thus lead to an improved model.
Julio Cesar de Oliveira
2014-04-01
Full Text Available MODerate resolution Imaging Spectroradiometer (MODIS data are largely used in multitemporal analysis of various Earth-related phenomena, such as vegetation phenology, land use/land cover change, deforestation monitoring, and time series analysis. In general, the MODIS products used to undertake multitemporal analysis are composite mosaics of the best pixels over a certain period of time. However, it is common to find bad pixels in the composition that affect the time series analysis. We present a filtering methodology that considers the pixel position (location in space and time (position in the temporal data series to define a new value for the bad pixel. This methodology, called Window Regression (WR, estimates the value of the point of interest, based on the regression analysis of the data selected by a spatial-temporal window. The spatial window is represented by eight pixels neighboring the pixel under evaluation, and the temporal window selects a set of dates close to the date of interest (either earlier or later. Intensities of noises were simulated over time and space, using the MOD13Q1 product. The method presented and other techniques (4253H twice, Mean Value Iteration (MVI and Savitzky–Golay were evaluated using the Mean Absolute Percentage Error (MAPE and Akaike Information Criteria (AIC. The tests revealed the consistently superior performance of the Window Regression approach to estimate new Normalized Difference Vegetation Index (NDVI values irrespective of the intensity of the noise simulated.
Haghighi, Mona; Johnson, Suzanne Bennett; Qian, Xiaoning; Lynch, Kristian F; Vehik, Kendra; Huang, Shuai
2016-01-01
Regression models are extensively used in many epidemiological studies to understand the linkage between specific outcomes of interest and their risk factors. However, regression models in general examine the average effects of the risk factors and ignore subgroups with different risk profiles. As a result, interventions are often geared towards the average member of the population, without consideration of the special health needs of different subgroups within the population. This paper demonstrates the value of using rule-based analysis methods that can identify subgroups with heterogeneous risk profiles in a population without imposing assumptions on the subgroups or method. The rules define the risk pattern of subsets of individuals by not only considering the interactions between the risk factors but also their ranges. We compared the rule-based analysis results with the results from a logistic regression model in The Environmental Determinants of Diabetes in the Young (TEDDY) study. Both methods detected a similar suite of risk factors, but the rule-based analysis was superior at detecting multiple interactions between the risk factors that characterize the subgroups. A further investigation of the particular characteristics of each subgroup may detect the special health needs of the subgroup and lead to tailored interventions.
Lidar point density analysis: implications for identifying water bodies
Worstell, Bruce B.; Poppenga, Sandra; Evans, Gayla A.; Prince, Sandra
2014-01-01
Most airborne topographic light detection and ranging (lidar) systems operate within the near-infrared spectrum. Laser pulses from these systems frequently are absorbed by water and therefore do not generate reflected returns on water bodies in the resulting void regions within the lidar point cloud. Thus, an analysis of lidar voids has implications for identifying water bodies. Data analysis techniques to detect reduced lidar return densities were evaluated for test sites in Blackhawk County, Iowa, and Beltrami County, Minnesota, to delineate contiguous areas that have few or no lidar returns. Results from this study indicated a 5-meter radius moving window with fewer than 23 returns (28 percent of the moving window) was sufficient for delineating void regions. Techniques to provide elevation values for void regions to flatten water features and to force channel flow in the downstream direction also are presented.
2013-01-01
Background In the treatment of multiple sclerosis (MS), the most important therapeutic aim of disease-modifying treatments (DMTs) is to prevent or postpone long-term disability. Given the typically slow progression observed in the majority of relapsing-remitting MS (RRMS) patients, the primary endpoint for most randomized clinical trials (RCTs) is a reduction in relapse rate. It is widely assumed that reducing relapse rate will slow disability progression. Similarly, MRI studies suggest that reducing T2 lesions will be associated with slowing long-term disability in MS. The objective of this study was to evaluate the relationship between treatment effects on relapse rates and active T2 lesions to differences in disease progression (as measured by the Expanded Disability Status Scale [EDSS]) in trials evaluating patients with clinically isolated syndrome (CIS), RRMS, and secondary progressive MS (SPMS). Methods A systematic literature review was conducted in Medline, Embase, CENTRAL, and PsycINFO to identify randomized trials published in English from January 1, 1993-June 3, 2013 evaluating DMTs in adult MS patients using keywords for CIS, RRMS, and SPMS combined with keywords for relapse and recurrence. Eligible studies were required to report outcomes of relapse and T2 lesion changes or disease progression in CIS, RRMS, or SPMS patients receiving DMTs and have a follow-up duration of at least 22 months. Ultimately, 40 studies satisfied these criteria for inclusion. Regression analyses were conducted on RCTs to relate differences between the effect of treatments on relapse rates and on active T2 lesions to differences between the effects of treatments on disease progression (as measured by EDSS). Results Regression analysis determined there is a substantive clinically and statistically significant association between concurrent treatment effects in relapse rate and EDSS; p EDSS measures also were found (p < 0.05), with some suggestion that the strength of
Use of discriminant analysis to identify propensity for purchasing properties
Ricardo Floriani
2015-03-01
Full Text Available Properties usually represent a milestone for people and families due to the high added-value when compared with family income. The objective of this study is the proposition of a discrimination model, by a discriminant analysis of people with characteristics (according to independent variables classified as potential buyers of properties, as well as to identify the interest in the use of such property, if it will be assigned to housing or leisure activities such as a cottage or beach house, and/or for investment. Thus, the following research question is proposed: What are the characteristics that better describe the profile of people which intend to acquire properties? The study justifies itself by its economic relevance in the real estate industry, as well as to the players of the real estate Market that may develop products based on the profile of potential customers. As a statistical technique, discriminant analysis was applied to the data gathered by questionnaire, which was sent via e-mail. Three hundred and thirty four responses were gathered. Based on this study, it was observed that it is possible to identify the intention for acquired properties, as well the purpose for acquiring it, for housing or investments.
Cluster analysis of clinical data identifies fibromyalgia subgroups.
Elisa Docampo
Full Text Available INTRODUCTION: Fibromyalgia (FM is mainly characterized by widespread pain and multiple accompanying symptoms, which hinder FM assessment and management. In order to reduce FM heterogeneity we classified clinical data into simplified dimensions that were used to define FM subgroups. MATERIAL AND METHODS: 48 variables were evaluated in 1,446 Spanish FM cases fulfilling 1990 ACR FM criteria. A partitioning analysis was performed to find groups of variables similar to each other. Similarities between variables were identified and the variables were grouped into dimensions. This was performed in a subset of 559 patients, and cross-validated in the remaining 887 patients. For each sample and dimension, a composite index was obtained based on the weights of the variables included in the dimension. Finally, a clustering procedure was applied to the indexes, resulting in FM subgroups. RESULTS: VARIABLES CLUSTERED INTO THREE INDEPENDENT DIMENSIONS: "symptomatology", "comorbidities" and "clinical scales". Only the two first dimensions were considered for the construction of FM subgroups. Resulting scores classified FM samples into three subgroups: low symptomatology and comorbidities (Cluster 1, high symptomatology and comorbidities (Cluster 2, and high symptomatology but low comorbidities (Cluster 3, showing differences in measures of disease severity. CONCLUSIONS: We have identified three subgroups of FM samples in a large cohort of FM by clustering clinical data. Our analysis stresses the importance of family and personal history of FM comorbidities. Also, the resulting patient clusters could indicate different forms of the disease, relevant to future research, and might have an impact on clinical assessment.
Introduction to regression graphics
Cook, R Dennis
2009-01-01
Covers the use of dynamic and interactive computer graphics in linear regression analysis, focusing on analytical graphics. Features new techniques like plot rotation. The authors have composed their own regression code, using Xlisp-Stat language called R-code, which is a nearly complete system for linear regression analysis and can be utilized as the main computer program in a linear regression course. The accompanying disks, for both Macintosh and Windows computers, contain the R-code and Xlisp-Stat. An Instructor's Manual presenting detailed solutions to all the problems in the book is ava
Analysis of extreme drinking in patients with alcohol dependence using Pareto regression.
Das, Sourish; Harel, Ofer; Dey, Dipak K; Covault, Jonathan; Kranzler, Henry R
2010-05-20
We developed a novel Pareto regression model with an unknown shape parameter to analyze extreme drinking in patients with Alcohol Dependence (AD). We used the generalized linear model (GLM) framework and the log-link to include the covariate information through the scale parameter of the generalized Pareto distribution. We proposed a Bayesian method based on Ridge prior and Zellner's g-prior for the regression coefficients. Simulation study indicated that the proposed Bayesian method performs better than the existing likelihood-based inference for the Pareto regression.We examined two issues of importance in the study of AD. First, we tested whether a single nucleotide polymorphism within GABRA2 gene, which encodes a subunit of the GABA(A) receptor, and that has been associated with AD, influences 'extreme' alcohol intake and second, the efficacy of three psychotherapies for alcoholism in treating extreme drinking behavior. We found an association between extreme drinking behavior and GABRA2. We also found that, at baseline, men with a high-risk GABRA2 allele had a significantly higher probability of extreme drinking than men with no high-risk allele. However, men with a high-risk allele responded to the therapy better than those with two copies of the low-risk allele. Women with high-risk alleles also responded to the therapy better than those with two copies of the low-risk allele, while women who received the cognitive behavioral therapy had better outcomes than those receiving either of the other two therapies. Among men, motivational enhancement therapy was the best for the treatment of the extreme drinking behavior.
L. Monika Moskal
2012-08-01
Full Text Available The characterization of soil attributes using hyperspectral sensors has revealed patterns in soil spectra that are known to respond to mineral composition, organic matter, soil moisture and particle size distribution. Soil samples from different soil horizons of replicated soil series from sites located within Washington and Oregon were analyzed with the FieldSpec Spectroradiometer to measure their spectral signatures across the electromagnetic range of 400 to 1,000 nm. Similarity rankings of individual soil samples reveal differences between replicate series as well as samples within the same replicate series. Using classification and regression tree statistical methods, regression trees were fitted to each spectral response using concentrations of nitrogen, carbon, carbonate and organic matter as the response variables. Statistics resulting from fitted trees were: nitrogen R^{2} 0.91 (p < 0.01 at 403, 470, 687, and 846 nm spectral band widths, carbonate R^{2} 0.95 (p < 0.01 at 531 and 898 nm band widths, total carbon R^{2} 0.93 (p < 0.01 at 400, 409, 441 and 907 nm band widths, and organic matter R^{2} 0.98 (p < 0.01 at 300, 400, 441, 832 and 907 nm band widths. Use of the 400 to 1,000 nm electromagnetic range utilizing regression trees provided a powerful, rapid and inexpensive method for assessing nitrogen, carbon, carbonate and organic matter for upper soil horizons in a nondestructive method.
REGRESSIVE ANALYSIS OF BRAKING EFFICIENCY OF M1 CATEGORY VEHICLES WITH ANTI-BLOCKING BRAKE SYSTEM
О. Sarayev
2015-07-01
Full Text Available The problematics of assessing the effectiveness of vehicle braking after road accidentoccurrence is considered. For the first time in relation to the modern models of vehicles equipped with anti-lock brakes there were obtained regression models describing the relationship between the coefficient of traction and a random variable of steady deceleration. This does not contradict the essence of the stochastic physical object, which is the process of vehicle braking, unlike the previously adopted method of formalizing this process, using a deterministic function.
Vesnin, V. L.; Muradov, V. G.
2012-09-01
Absorption spectra of multicomponent hydrocarbon mixtures based on n-heptane and isooctane with addition of benzene (up to 1%) and toluene and o-xylene (up to 20%) were investigated experimentally in the region of the first overtones of the hydrocarbon groups (λ = 1620-1780 nm). It was shown that their concentrations could be determined separately by using a multiple linear regression method. The optimum result was obtained by including four wavelengths at 1671, 1680, 1685, and 1695 nm, which took into account absorption of CH groups in benzene, toluene, and o-xylene and CH3 groups, respectively.
Metabolomic analysis identifies altered metabolic pathways in Multiple Sclerosis.
Poddighe, Simone; Murgia, Federica; Lorefice, Lorena; Liggi, Sonia; Cocco, Eleonora; Marrosu, Maria Giovanna; Atzori, Luigi
2017-07-16
Multiple sclerosis (MS) is a chronic, demyelinating disease that affects the central nervous system and is characterized by a complex pathogenesis and difficult management. The identification of new biomarkers would be clinically useful for more accurate diagnoses and disease monitoring. Metabolomics, the identification of small endogenous molecules, offers an instantaneous molecular snapshot of the MS phenotype. Here the metabolomic profiles (utilizing plasma from patients with MS) were characterized with a Gas cromatography-mass spectrometry-based platform followed by a multivariate statistical analysis and comparison with a healthy control (HC) population. The obtained partial least square discriminant analysis (PLS-DA) model identified and validated significant metabolic differences between individuals with MS and HC (R2X=0.223, R2Y=0.82, Q2=0.562; p<0.001). Among discriminant metabolites phosphate, fructose, myo-inositol, pyroglutamate, threonate, l-leucine, l-asparagine, l-ornithine, l-glutamine, and l-glutamate were correctly identified, and some resulted as unknown. A receiver operating characteristic (ROC) curve with AUC 0.84 (p=0.01; CI: 0.75-1) generated with the concentrations of the discriminant metabolites, supported the strength of the model. Pathway analysis indicated asparagine and citrulline biosynthesis as the main canonical pathways involved in MS. Changes in the citrulline biosynthesis pathway suggests the involvement of oxidative stress during neuronal damage. The results confirmed metabolomics as a useful approach to better understand the pathogenesis of MS and to provide new biomarkers for the disease to be used together with clinical data. Copyright © 2017 Elsevier Ltd. All rights reserved.
Cluster Analysis to Identify Possible Subgroups in Tinnitus Patients.
van den Berge, Minke J C; Free, Rolien H; Arnold, Rosemarie; de Kleine, Emile; Hofman, Rutger; van Dijk, J Marc C; van Dijk, Pim
2017-01-01
In tinnitus treatment, there is a tendency to shift from a "one size fits all" to a more individual, patient-tailored approach. Insight in the heterogeneity of the tinnitus spectrum might improve the management of tinnitus patients in terms of choice of treatment and identification of patients with severe mental distress. The goal of this study was to identify subgroups in a large group of tinnitus patients. Data were collected from patients with severe tinnitus complaints visiting our tertiary referral tinnitus care group at the University Medical Center Groningen. Patient-reported and physician-reported variables were collected during their visit to our clinic. Cluster analyses were used to characterize subgroups. For the selection of the right variables to enter in the cluster analysis, two approaches were used: (1) variable reduction with principle component analysis and (2) variable selection based on expert opinion. Various variables of 1,783 tinnitus patients were included in the analyses. Cluster analysis (1) included 976 patients and resulted in a four-cluster solution. The effect of external influences was the most discriminative between the groups, or clusters, of patients. The "silhouette measure" of the cluster outcome was low (0.2), indicating a "no substantial" cluster structure. Cluster analysis (2) included 761 patients and resulted in a three-cluster solution, comparable to the first analysis. Again, a "no substantial" cluster structure was found (0.2). Two cluster analyses on a large database of tinnitus patients revealed that clusters of patients are mostly formed by a different response of external influences on their disease. However, both cluster outcomes based on this dataset showed a poor stability, suggesting that our tinnitus population comprises a continuum rather than a number of clearly defined subgroups.
Walker, Mary Ellen; Anonson, June; Szafron, Michael
2015-01-01
The relationship between political environment and health services accessibility (HSA) has not been the focus of any specific studies. The purpose of this study was to address this gap in the literature by examining the relationship between political environment and HSA. This relationship that HSA indicators (physicians, nurses and hospital beds per 10 000 people) has with political environment was analyzed with multiple least-squares regression using the components of democracy (electoral processes and pluralism, functioning of government, political participation, political culture, and civil liberties). The components of democracy were represented by the 2011 Economist Intelligence Unit Democracy Index (EIUDI) sub-scores. The EIUDI sub-scores and the HSA indicators were evaluated for significant relationships with multiple least-squares regression. While controlling for a country's geographic location and level of democracy, we found that two components of a nation's political environment: functioning of government and political participation, and their interaction had significant relationships with the three HSA indicators. These study findings are of significance to health professionals because they examine the political contexts in which citizens access health services, they come from research that is the first of its kind, and they help explain the effect political environment has on health. © The Author 2014. Published by Oxford University Press on behalf of Royal Society of Tropical Medicine and Hygiene. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Patnaik, Surya N.; Guptill, James D.; Hopkins, Dale A.; Lavelle, Thomas M.
2000-01-01
The NASA Engine Performance Program (NEPP) can configure and analyze almost any type of gas turbine engine that can be generated through the interconnection of a set of standard physical components. In addition, the code can optimize engine performance by changing adjustable variables under a set of constraints. However, for engine cycle problems at certain operating points, the NEPP code can encounter difficulties: nonconvergence in the currently implemented Powell's optimization algorithm and deficiencies in the Newton-Raphson solver during engine balancing. A project was undertaken to correct these deficiencies. Nonconvergence was avoided through a cascade optimization strategy, and deficiencies associated with engine balancing were eliminated through neural network and linear regression methods. An approximation-interspersed cascade strategy was used to optimize the engine's operation over its flight envelope. Replacement of Powell's algorithm by the cascade strategy improved the optimization segment of the NEPP code. The performance of the linear regression and neural network methods as alternative engine analyzers was found to be satisfactory. This report considers two examples-a supersonic mixed-flow turbofan engine and a subsonic waverotor-topped engine-to illustrate the results, and it discusses insights gained from the improved version of the NEPP code.
Panagiotis Kasteridis
Full Text Available To test the impact of a UK pay-for-performance indicator, the Quality and Outcomes Framework (QOF dementia review, on three types of hospital admission for people with dementia: emergency admissions where dementia was the primary diagnosis; emergency admissions for ambulatory care sensitive conditions (ACSCs; and elective admissions for cataract, hip replacement, hernia, prostate disease, or hearing loss.Count data regression analyses of hospital admissions from 8,304 English general practices from 2006/7 to 2010/11. We identified relevant admissions from national Hospital Episode Statistics and aggregated them to practice level. We merged these with practice-level data on the QOF dementia review. In the base case, the exposure measure was the reported QOF register. As dementia is commonly under-diagnosed, we tested a predicted practice register based on consensus estimates. We adjusted for practice characteristics including measures of deprivation and uptake of a social benefit to purchase care services (Attendance Allowance.In the base case analysis, higher QOF achievement had no significant effect on any type of hospital admission. However, when the predicted register was used to account for under-diagnosis, a one-percentage point improvement in QOF achievement was associated with a small reduction in emergency admissions for both dementia (-0.1%; P=0.011 and ACSCs (-0.1%; P=0.001. In areas of greater deprivation, uptake of Attendance Allowance was consistently associated with significantly lower emergency admissions. In all analyses, practices with a higher proportion of nursing home patients had significantly lower admission rates for elective and emergency care.In one of three analyses at practice level, the QOF review for dementia was associated with a small but significant reduction in unplanned hospital admissions. Given the rising prevalence of dementia, increasing pressures on acute hospital beds and poor outcomes associated with
Batis, Carolina; Mendez, Michelle A; Gordon-Larsen, Penny; Sotres-Alvarez, Daniela; Adair, Linda; Popkin, Barry
2016-02-01
We examined the association between dietary patterns and diabetes using the strengths of two methods: principal component analysis (PCA) to identify the eating patterns of the population and reduced rank regression (RRR) to derive a pattern that explains the variation in glycated Hb (HbA1c), homeostasis model assessment of insulin resistance (HOMA-IR) and fasting glucose. We measured diet over a 3 d period with 24 h recalls and a household food inventory in 2006 and used it to derive PCA and RRR dietary patterns. The outcomes were measured in 2009. Adults (n 4316) from the China Health and Nutrition Survey. The adjusted odds ratio for diabetes prevalence (HbA1c≥6·5 %), comparing the highest dietary pattern score quartile with the lowest, was 1·26 (95 % CI 0·76, 2·08) for a modern high-wheat pattern (PCA; wheat products, fruits, eggs, milk, instant noodles and frozen dumplings), 0·76 (95 % CI 0·49, 1·17) for a traditional southern pattern (PCA; rice, meat, poultry and fish) and 2·37 (95 % CI 1·56, 3·60) for the pattern derived with RRR. By comparing the dietary pattern structures of RRR and PCA, we found that the RRR pattern was also behaviourally meaningful. It combined the deleterious effects of the modern high-wheat pattern (high intakes of wheat buns and breads, deep-fried wheat and soya milk) with the deleterious effects of consuming the opposite of the traditional southern pattern (low intakes of rice, poultry and game, fish and seafood). Our findings suggest that using both PCA and RRR provided useful insights when studying the association of dietary patterns with diabetes.
Excel Implementation of Principal Component Regression Analysis%主成分回归分析的EXCEL实现
林建华
2015-01-01
EXCEL是一款功能非常强大的办公软件，文章利用其内置的公式和函数给出了主成分回归分析的完整算法和详细过程，得到的计算结果与专业统计软件给出的相同。因此，应用EXCEL实现主成分回归分析是可行的。%Excel is a very powerful software. This paper presents the algorithm and process of the principal component regression analysis using Excel’s built-in formulas and functions. Moreover, the results of the calculation are the same with the professional statistics softwares. Therefore, the application of EXCEL to realize the principal component regression analysis is feasible.
Matson, Johnny L.; Kozlowski, Alison M.
2010-01-01
Autistic regression is one of the many mysteries in the developmental course of autism and pervasive developmental disorders not otherwise specified (PDD-NOS). Various definitions of this phenomenon have been used, further clouding the study of the topic. Despite this problem, some efforts at establishing prevalence have been made. The purpose of…
Nick, Todd G; Campbell, Kathleen M
2007-01-01
The Medical Subject Headings (MeSH) thesaurus used by the National Library of Medicine defines logistic regression models as "statistical models which describe the relationship between a qualitative dependent variable (that is, one which can take only certain discrete values, such as the presence or absence of a disease) and an independent variable." Logistic regression models are used to study effects of predictor variables on categorical outcomes and normally the outcome is binary, such as presence or absence of disease (e.g., non-Hodgkin's lymphoma), in which case the model is called a binary logistic model. When there are multiple predictors (e.g., risk factors and treatments) the model is referred to as a multiple or multivariable logistic regression model and is one of the most frequently used statistical model in medical journals. In this chapter, we examine both simple and multiple binary logistic regression models and present related issues, including interaction, categorical predictor variables, continuous predictor variables, and goodness of fit.
Ghosh, Debarchana; Manson, Steven M.
2008-01-01
In this paper, we present a hybrid approach, robust principal component geographically weighted regression (RPCGWR), in examining urbanization as a function of both extant urban land use and the effect of social and environmental factors in the Twin Cities Metropolitan Area (TCMA) of Minnesota. We used remotely sensed data to treat urbanization via the proxy of impervious surface. We then integrated two different methods, robust principal component analysis (RPCA) and geographically weighted ...
Brahma, K.C.; Pal, B.K.; Das, C. [CMPDI, Bhubaneswar (India)
2005-07-01
Different models of vibration studies are examined. A case analysis to determine the parameters governing the prediction of blast vibration in an opencast coal mine is described. A regression model was developed to evaluate peak particle velocity (PPV) of the blast. The results are applicable to forecasting ground vibration before blasting and to the design of various parameters in controlled blasting. 16 refs., 1 fig., 1 tab.
Stauffer ME; Weisenfluh L; Morrison A
2013-01-01
Melissa E Stauffer, Lauren Weisenfluh, Alan MorrisonSCRIBCO, Effort, PA, USABackground: Triglyceride levels were found to be independently predictive of the development of primary coronary heart disease in epidemiologic studies. The objective of this study was to determine whether triglyceride levels were predictive of cardiovascular events in randomized controlled trials (RCTs) of lipid-modifying drugs.Methods: We performed a systematic review and meta-regression analysis of 40 RCTs of lipid...
Parallel Approach for Time Series Analysis with General Regression Neural Networks
J.C. Cuevas-Tello
2012-04-01
Full Text Available The accuracy on time delay estimation given pairs of irregularly sampled time series is of great relevance in astrophysics. However the computational time is also important because the study of large data sets is needed. Besides introducing a new approach for time delay estimation, this paper presents a parallel approach to obtain a fast algorithm for time delay estimation. The neural network architecture that we use is general Regression Neural Network (GRNN. For the parallel approach, we use Message Passing Interface (MPI on a beowulf-type cluster and on a Cray supercomputer and we also use the Compute Unified Device Architecture (CUDA™ language on Graphics Processing Units (GPUs. We demonstrate that, with our approach, fast algorithms can be obtained for time delay estimation on large data sets with the same accuracy as state-of-the-art methods.
Linear regression analysis of oxygen ionic conductivity in co-doped electrolyte
XIE Guang-yuan; LI Jian; PU Jian; GUO Mi
2006-01-01
A mathematical model for the estimation of oxygen-ion conductivity of doped ZrO2 and CeO2 electrolytes was established based on the assumptions that the electronic conduction and defect association can be neglected. A linear regression method was employed to determine the parameters in the model. This model was confirmed by the published conductivity data of the doped ZrO2 and CeO2 electrolytes. In addition,a series of compositions in Ce0.8Gd0.2-xMxO1.9-δ system (M is the co-dopant) was prepared,their high temperature conductivity were measured. The model was further validated by the measured conductivity data.
Sørensen, Jens Benn; Badsberg, Jens Henrik; Olsen, Jens
1989-01-01
The prognostic factors for survival in advanced adenocarcinoma of the lung were investigated in a consecutive series of 259 patients treated with chemotherapy. Twenty-eight pretreatment variables were investigated by use of Cox's multivariate regression model, including histological subtypes...... and degree of differentiation, the new international staging system for lung cancer, and seven laboratory parameters. Staging of the patients included bone marrow examination but were otherwise nonextensive without routine bone, liver, and brain scans. Factors predicting poor survival were low performance...... status, stage IV disease, no prior nonradical resection, liver metastases, high values of white blood cell count, and lactate dehydrogenase, and low values of aspartate aminotransaminase. The nonradical resection may not be a prognostic factor because of the resection itself but may rather serve...
The Long-Term Impact of Human Capital Investment on GDP: A Panel Cointegrated Regression Analysis
Ahmet Gökçe Akpolat
2014-01-01
Full Text Available This study aims to determine the long-run impact of physical and human capital on GDP by using the panel data set of 13 developed and 11 developing countries over the period 1970–2010. Gross fixed capital formation is used as physical capital indicator while education expenditures and life expectancy at birth are used as human capital indicators. Panel DOLS and FMOLS panel cointegrated regression models are exploited to detect the magnitude and sign of the cointegration relationship and compare the effect of these physical and human capital variables according to these two different country groups. As a consequence of panels DOLS and FMOLS models, the impact of physical capital and education expenditures on GDP in the developed countries is determined as higher than the impact in the developing countries. On the other hand, the impact of life expectancy at birth on GDP is determined as higher in the developing countries.
Słania J.
2014-10-01
Full Text Available The article presents the process of production of coated electrodes and their welding properties. The factors concerning the welding properties and the currently applied method of assessing are given. The methodology of the testing based on the measuring and recording of instantaneous values of welding current and welding arc voltage is discussed. Algorithm for creation of reference data base of the expert system is shown, aiding the assessment of covered electrodes welding properties. The stability of voltage–current characteristics was discussed. Statistical factors of instantaneous values of welding current and welding arc voltage waveforms used for determining of welding process stability are presented. The results of coated electrodes welding properties are compared. The article presents the results of linear regression as well as the impact of the independent variables on the welding process performance. Finally the conclusions drawn from the research are given.
Jonathan E. Leightner
2012-01-01
Full Text Available The omitted variables problem is one of regression analysis’ most serious problems. The standard approach to the omitted variables problem is to find instruments, or proxies, for the omitted variables, but this approach makes strong assumptions that are rarely met in practice. This paper introduces best projection reiterative truncated projected least squares (BP-RTPLS, the third generation of a technique that solves the omitted variables problem without using proxies or instruments. This paper presents a theoretical argument that BP-RTPLS produces unbiased reduced form estimates when there are omitted variables. This paper also provides simulation evidence that shows OLS produces between 250% and 2450% more errors than BP-RTPLS when there are omitted variables and when measurement and round-off error is 1 percent or less. In an example, the government spending multiplier, , is estimated using annual data for the USA between 1929 and 2010.
Regression analysis based on conditional likelihood approach under semi-competing risks data.
Hsieh, Jin-Jian; Huang, Yu-Ting
2012-07-01
Medical studies often involve semi-competing risks data, which consist of two types of events, namely terminal event and non-terminal event. Because the non-terminal event may be dependently censored by the terminal event, it is not possible to make inference on the non-terminal event without extra assumptions. Therefore, this study assumes that the dependence structure on the non-terminal event and the terminal event follows a copula model, and lets the marginal regression models of the non-terminal event and the terminal event both follow time-varying effect models. This study uses a conditional likelihood approach to estimate the time-varying coefficient of the non-terminal event, and proves the large sample properties of the proposed estimator. Simulation studies show that the proposed estimator performs well. This study also uses the proposed method to analyze AIDS Clinical Trial Group (ACTG 320).
A Vector Auto Regression Model Applied to Real Estate Development Investment: A Statistic Analysis
Fengyun Liu
2016-10-01
Full Text Available This study analyzes the economic system dynamics of investment in real estate from mainly four participants in China. Local governments limit the supply of commercial and residential land to raise fiscal revenue, and expand debts by land mortgage to develop industrial zones and parks. Led by local government, banks and real estate development enterprises forge a coalition on real estate investment and facilitate real estate price appreciation. The above theoretical model is empirically evidenced with VAR (Vector Auto Regression methodology. A panel VAR model shows that land leasing and real estate price appreciation positively affect local government general fiscal revenue. Additional VAR models find that bank credit in addition to private and foreign funds respectively have strong positive dynamic effects on housing prices. Housing prices also have a strong positive impact on speculation from private funds and hot money.
Kang, Seung-Wan; Byun, Gukdo; Park, Hun-Joon
2014-12-01
This paper presents empirical research into the relationship between leader-follower value congruence in social responsibility and the level of ethical satisfaction for employees in the workplace. 163 dyads were analyzed, each consisting of a team leader and an employee working at a large manufacturing company in South Korea. Following current methodological recommendations for congruence research, polynomial regression and response surface modeling methodologies were used to determine the effects of value congruence. Results indicate that leader-follower value congruence in social responsibility was positively related to the ethical satisfaction of employees. Furthermore, employees' ethical satisfaction was stronger when aligned with a leader with high social responsibility. The theoretical and practical implications are discussed.
Spatial Growth Regressions for the convergence analysis of renewable energy consumption in Europe
Lara Fontanella
2013-10-01
Full Text Available In recent years there has been an increasing awareness on problems related to the economic growth and on the conditions under which some socio-economic variables measured on European countries tend to converge over time towards a common level. This paper is concerned with the use of energy from renewable sources and considers the extent to which EU countries meet the binding commitment to reach a fifth of energy consumption from renewable sources by 2020. By discussing empirical results on the economic growth pattern of 28 countries in the period 1995-2010, we make use of several spatial growth regression models. We show that the proposed models are able to capture the complexity of the phenomenon including the possibility of estimating sitespecific convergence parameters and the identification of convergence clubs.
Hall, Rob; Fienberg, Stephen
2011-01-01
Preserving the privacy of individual databases when carrying out statistical calculations has a long history in statistics and had been the focus of much recent attention in machine learning In this paper, we present a protocol for computing logistic regression when the data are held by separate parties without actually combining information sources by exploiting results from the literature on multi-party secure computation. We provide only the final result of the calculation compared with other methods that share intermediate values and thus present an opportunity for compromise of values in the combined database. Our paper has two themes: (1) the development of a secure protocol for computing the logistic parameters, and a demonstration of its performances in practice, and (2) and amended protocol that speeds up the computation of the logistic function. We illustrate the nature of the calculations and their accuracy using an extract of data from the Current Population Survey divided between two parties.
Archetypal TRMM Radar Profiles Identified Through Cluster Analysis
Boccippio, Dennis J.
2003-01-01
It is widely held that identifiable 'convective regimes' exist in nature, although precise definitions of these are elusive. Examples include land / Ocean distinctions, break / monsoon beahvior, seasonal differences in the Amazon (SON vs DJF), etc. These regimes are often described by differences in the realized local convective spectra, and measured by various metrics of convective intensity, depth, areal coverage and rainfall amount. Objective regime identification may be valuable in several ways: regimes may serve as natural 'branch points' in satellite retrieval algorithms or data assimilation efforts; one example might be objective identification of regions that 'should' share a similar 2-R relationship. Similarly, objectively defined regimes may provide guidance on optimal siting of ground validation efforts. Objectively defined regimes could also serve as natural (rather than arbitrary geographic) domain 'controls' in studies of convective response to environmental forcing. Quantification of convective vertical structure has traditionally involved parametric study of prescribed quantities thought to be important to convective dynamics: maximum radar reflectivity, cloud top height, 30-35 dBZ echo top height, rain rate, etc. Individually, these parameters are somewhat deficient as their interpretation is often nonunique (the same metric value may signify different physics in different storm realizations). Individual metrics also fail to capture the coherence and interrelationships between vertical levels available in full 3-D radar datasets. An alternative approach is discovery of natural partitions of vertical structure in a globally representative dataset, or 'archetypal' reflectivity profiles. In this study, this is accomplished through cluster analysis of a very large sample (0[107) of TRMM-PR reflectivity columns. Once achieved, the rainconditional and unconditional 'mix' of archetypal profile types in a given location and/or season provides a description
Cluster Analysis of Clinical Data Identifies Fibromyalgia Subgroups
Docampo, Elisa; Collado, Antonio; Escaramís, Geòrgia; Carbonell, Jordi; Rivera, Javier; Vidal, Javier; Alegre, José
2013-01-01
Introduction Fibromyalgia (FM) is mainly characterized by widespread pain and multiple accompanying symptoms, which hinder FM assessment and management. In order to reduce FM heterogeneity we classified clinical data into simplified dimensions that were used to define FM subgroups. Material and Methods 48 variables were evaluated in 1,446 Spanish FM cases fulfilling 1990 ACR FM criteria. A partitioning analysis was performed to find groups of variables similar to each other. Similarities between variables were identified and the variables were grouped into dimensions. This was performed in a subset of 559 patients, and cross-validated in the remaining 887 patients. For each sample and dimension, a composite index was obtained based on the weights of the variables included in the dimension. Finally, a clustering procedure was applied to the indexes, resulting in FM subgroups. Results Variables clustered into three independent dimensions: “symptomatology”, “comorbidities” and “clinical scales”. Only the two first dimensions were considered for the construction of FM subgroups. Resulting scores classified FM samples into three subgroups: low symptomatology and comorbidities (Cluster 1), high symptomatology and comorbidities (Cluster 2), and high symptomatology but low comorbidities (Cluster 3), showing differences in measures of disease severity. Conclusions We have identified three subgroups of FM samples in a large cohort of FM by clustering clinical data. Our analysis stresses the importance of family and personal history of FM comorbidities. Also, the resulting patient clusters could indicate different forms of the disease, relevant to future research, and might have an impact on clinical assessment. PMID:24098674
Archetypal TRMM Radar Profiles Identified Through Cluster Analysis
Boccippio, Dennis J.
2003-01-01
It is widely held that identifiable 'convective regimes' exist in nature, although precise definitions of these are elusive. Examples include land / Ocean distinctions, break / monsoon beahvior, seasonal differences in the Amazon (SON vs DJF), etc. These regimes are often described by differences in the realized local convective spectra, and measured by various metrics of convective intensity, depth, areal coverage and rainfall amount. Objective regime identification may be valuable in several ways: regimes may serve as natural 'branch points' in satellite retrieval algorithms or data assimilation efforts; one example might be objective identification of regions that 'should' share a similar 2-R relationship. Similarly, objectively defined regimes may provide guidance on optimal siting of ground validation efforts. Objectively defined regimes could also serve as natural (rather than arbitrary geographic) domain 'controls' in studies of convective response to environmental forcing. Quantification of convective vertical structure has traditionally involved parametric study of prescribed quantities thought to be important to convective dynamics: maximum radar reflectivity, cloud top height, 30-35 dBZ echo top height, rain rate, etc. Individually, these parameters are somewhat deficient as their interpretation is often nonunique (the same metric value may signify different physics in different storm realizations). Individual metrics also fail to capture the coherence and interrelationships between vertical levels available in full 3-D radar datasets. An alternative approach is discovery of natural partitions of vertical structure in a globally representative dataset, or 'archetypal' reflectivity profiles. In this study, this is accomplished through cluster analysis of a very large sample (0[107) of TRMM-PR reflectivity columns. Once achieved, the rainconditional and unconditional 'mix' of archetypal profile types in a given location and/or season provides a description
Identifying avian sources of faecal contamination using sterol analysis.
Devane, Megan L; Wood, David; Chappell, Andrew; Robson, Beth; Webster-Brown, Jenny; Gilpin, Brent J
2015-10-01
Discrimination of the source of faecal pollution in water bodies is an important step in the assessment and mitigation of public health risk. One tool for faecal source tracking is the analysis of faecal sterols which are present in faeces of animals in a range of distinctive ratios. Published ratios are able to discriminate between human and herbivore mammal faecal inputs but are of less value for identifying pollution from wildfowl, which can be a common cause of elevated bacterial indicators in rivers and streams. In this study, the sterol profiles of 50 avian-derived faecal specimens (seagulls, ducks and chickens) were examined alongside those of 57 ruminant faeces and previously published sterol profiles of human wastewater, chicken effluent and animal meatwork effluent. Two novel sterol ratios were identified as specific to avian faecal scats, which, when incorporated into a decision tree with human and herbivore mammal indicative ratios, were able to identify sterols from avian-polluted waterways. For samples where the sterol profile was not consistent with herbivore mammal or human pollution, avian pollution is indicated when the ratio of 24-ethylcholestanol/(24-ethylcholestanol + 24-ethylcoprostanol + 24-ethylepicoprostanol) is ≥0.4 (avian ratio 1) and the ratio of cholestanol/(cholestanol + coprostanol + epicoprostanol) is ≥0.5 (avian ratio 2). When avian pollution is indicated, further confirmation by targeted PCR specific markers can be employed if greater confidence in the pollution source is required. A 66% concordance between sterol ratios and current avian PCR markers was achieved when 56 water samples from polluted waterways were analysed.
Boucher, Thomas F.; Ozanne, Marie V.; Carmosino, Marco L.; Dyar, M. Darby; Mahadevan, Sridhar; Breves, Elly A.; Lepore, Kate H.; Clegg, Samuel M.
2015-05-01
The ChemCam instrument on the Mars Curiosity rover is generating thousands of LIBS spectra and bringing interest in this technique to public attention. The key to interpreting Mars or any other types of LIBS data are calibrations that relate laboratory standards to unknowns examined in other settings and enable predictions of chemical composition. Here, LIBS spectral data are analyzed using linear regression methods including partial least squares (PLS-1 and PLS-2), principal component regression (PCR), least absolute shrinkage and selection operator (lasso), elastic net, and linear support vector regression (SVR-Lin). These were compared against results from nonlinear regression methods including kernel principal component regression (K-PCR), polynomial kernel support vector regression (SVR-Py) and k-nearest neighbor (kNN) regression to discern the most effective models for interpreting chemical abundances from LIBS spectra of geological samples. The results were evaluated for 100 samples analyzed with 50 laser pulses at each of five locations averaged together. Wilcoxon signed-rank tests were employed to evaluate the statistical significance of differences among the nine models using their predicted residual sum of squares (PRESS) to make comparisons. For MgO, SiO2, Fe2O3, CaO, and MnO, the sparse models outperform all the others except for linear SVR, while for Na2O, K2O, TiO2, and P2O5, the sparse methods produce inferior results, likely because their emission lines in this energy range have lower transition probabilities. The strong performance of the sparse methods in this study suggests that use of dimensionality-reduction techniques as a preprocessing step may improve the performance of the linear models. Nonlinear methods tend to overfit the data and predict less accurately, while the linear methods proved to be more generalizable with better predictive performance. These results are attributed to the high dimensionality of the data (6144 channels
Network Analysis Identifies Disease-Specific Pathways for Parkinson's Disease.
Monti, Chiara; Colugnat, Ilaria; Lopiano, Leonardo; Chiò, Adriano; Alberio, Tiziana
2016-12-21
Neurodegenerative diseases are characterized by the progressive loss of specific neurons in selected regions of the central nervous system. The main clinical manifestation (movement disorders, cognitive impairment, and/or psychiatric disturbances) depends on the neuron population being primarily affected. Parkinson's disease is a common movement disorder, whose etiology remains mostly unknown. Progressive loss of dopaminergic neurons in the substantia nigra causes an impairment of the motor control. Some of the pathogenetic mechanisms causing the progressive deterioration of these neurons are not specific for Parkinson's disease but are shared by other neurodegenerative diseases, like Alzheimer's disease and amyotrophic lateral sclerosis. Here, we performed a meta-analysis of the literature of all the quantitative proteomic investigations of neuronal alterations in different models of Parkinson's disease, Alzheimer's disease, and amyotrophic lateral sclerosis to distinguish between general and Parkinson's disease-specific pattern of neurodegeneration. Then, we merged proteomics data with genetics information from the DisGeNET database. The comparison of gene and protein information allowed us to identify 25 proteins involved uniquely in Parkinson's disease and we verified the alteration of one of them, i.e., transaldolase 1 (TALDO1), in the substantia nigra of 5 patients. By using open-source bioinformatics tools, we identified the biological processes specifically affected in Parkinson's disease, i.e., proteolysis, mitochondrion organization, and mitophagy. Eventually, we highlighted four cellular component complexes mostly involved in the pathogenesis: the proteasome complex, the protein phosphatase 2A, the chaperonins CCT complex, and the complex III of the respiratory chain.
Social network analysis in identifying influential webloggers: A preliminary study
Hasmuni, Noraini; Sulaiman, Nor Intan Saniah; Zaibidi, Nerda Zura
2014-12-01
In recent years, second generation of internet-based services such as weblog has become an effective communication tool to publish information on the Web. Weblogs have unique characteristics that deserve users' attention. Some of webloggers have seen weblogs as appropriate medium to initiate and expand business. These webloggers or also known as direct profit-oriented webloggers (DPOWs) communicate and share knowledge with each other through social interaction. However, survivability is the main issue among DPOW. Frequent communication with influential webloggers is one of the way to keep survive as DPOW. This paper aims to understand the network structure and identify influential webloggers within the network. Proper understanding of the network structure can assist us in knowing how the information is exchanged among members and enhance survivability among DPOW. 30 DPOW were involved in this study. Degree centrality and betweenness centrality measurement in Social Network Analysis (SNA) were used to examine the strength relation and identify influential webloggers within the network. Thus, webloggers with the highest value of these measurements are considered as the most influential webloggers in the network.
Gender roles and binge drinking among Latino emerging adults: a latent class regression analysis.
Vaughan, Ellen L; Wong, Y Joel; Middendorf, Katharine G
2014-09-01
Gender roles are often cited as a culturally specific predictor of drinking among Latino populations. This study used latent class regression to test the relationships between gender roles and binge drinking in a sample of Latino emerging adults. Participants were Latino emerging adults who participated in Wave III of the National Longitudinal Study of Adolescent Health (N = 2,442). A subsample of these participants (n = 660) completed the Bem Sex Role Inventory--Short. We conducted latent class regression using 3 dimensions of gender roles (femininity, social masculinity, and personal masculinity) to predict binge drinking. Results indicated a 3-class solution. In Class 1, the protective personal masculinity class, personal masculinity (e.g., being a leader, defending one's own beliefs) was associated with a reduction in the odds of binge drinking. In Class 2, the nonsignificant class, gender roles were not related to binge drinking. In Class 3, the mixed masculinity class, personal masculinity was associated with a reduction in the odds of binge drinking, whereas social masculinity (e.g., forceful, dominant) was associated with an increase in the odds of binge drinking. Post hoc analyses found that females, those born outside the United States, and those with greater English language usage were at greater odds of being in Class 1 (vs. Class 2). Males, those born outside the United States, and those with greater Spanish language usage were at greater odds of being in Class 3 (vs. Class 2). Directions for future research and implications for practice with Latino emerging adults are discussed.
Arch Index: An Easier Approach for Arch Height (A Regression Analysis
Hironmoy Roy
2012-04-01
Full Text Available Background: Arch-height estimation though practiced usually in supine posture; is neither correct nor scientific as referred in literature, which favour for standing x-rays or arch-index as yardstick. In fact the standing x-rays can be excused for being troublesome in busy OPD, but an ink-footprint on simple graph-sheet can be documented, as it is easier, cheaper and requires almost no machineries and expertisation. Objective: So this study aimed to redefine the inter-relationship of the radiological standing arch-heights with the arch-index for correlation and regression so that from the later we can derive the radiographical standing arch-height values indirectly, avoiding the actual maneuver. Methods: The study involved 103 adult subjects attending at a tertiary care hospital of North Bengal. From the standing x-rays of foot, the standing navicular, talar heights were measured, and ‘normalised’ with the foot length. In parallel foot-prints also been obtained for arch-index. Finally variables analysed by SPSS software. Result: The arch-index showed significant negative correlations and simple linear regressions with standing navicular height, standing talar height as well as standing normalised navicular and talar heights analysed in both sexes separately with supporting mathematical equations. Conclusion: To measure the standing arch-height in a busy OPD, it is wise to have the foot-print first. Arch-index once get known, can be put in the equations as derived here, to predict the preferred standing arch-heights in either sex.
Zardo, Pauline; Collie, Alex
2014-10-09
Use of research evidence in public health policy decision-making is affected by a range of contextual factors operating at the individual, organisational and external levels. Context-specific research is needed to target and tailor research translation intervention design and implementation to ensure that factors affecting research in a specific context are addressed. Whilst such research is increasing, there remain relatively few studies that have quantitatively assessed the factors that predict research use in specific public health policy environments. A quantitative survey was designed and implemented within two public health policy agencies in the Australian state of Victoria. Binary logistic regression analyses were conducted on survey data provided by 372 participants. Univariate logistic regression analyses of 49 factors revealed 26 factors that significantly predicted research use independently. The 26 factors were then tested in a single model and five factors emerged as significant predictors of research over and above all other factors. The five key factors that significantly predicted research use were the following: relevance of research to day-to-day decision-making, skills for research use, internal prompts for use of research, intention to use research within the next 12 months and the agency for which the individual worked. These findings suggest that individual- and organisational-level factors are the critical factors to target in the design of interventions aiming to increase research use in this context. In particular, relevance of research and skills for research use would be necessary to target. The likelihood for research use increased 11- and 4-fold for those who rated highly on these factors. This study builds on previous research and contributes to the currently limited number of quantitative studies that examine use of research evidence in a large sample of public health policy and program decision-makers within a specific context. The
An analysis of first-time blood donors return behaviour using regression models.
Kheiri, S; Alibeigi, Z
2015-08-01
Blood products have a vital role in saving many patients' lives. The aim of this study was to analyse blood donor return behaviour. Using a cross-sectional follow-up design of 5-year duration, 864 first-time donors who had donated blood were selected using a systematic sampling. The behaviours of donors via three response variables, return to donation, frequency of return to donation and the time interval between donations, were analysed based on logistic regression, negative binomial regression and Cox's shared frailty model for recurrent events respectively. Successful return to donation rated at 49·1% and the deferral rate was 13·3%. There was a significant reverse relationship between the frequency of return to donation and the time interval between donations. Sex, body weight and job had an effect on return to donation; weight and frequency of donation during the first year had a direct effect on the total frequency of donations. Age, weight and job had a significant effect on the time intervals between donations. Aging decreases the chances of return to donation and increases the time interval between donations. Body weight affects the three response variables, i.e. the higher the weight, the more the chances of return to donation and the shorter the time interval between donations. There is a positive correlation between the frequency of donations in the first year and the total number of return to donations. Also, the shorter the time interval between donations is, the higher the frequency of donations. © 2015 British Blood Transfusion Society.
Optimization of Game Formats in U-10 Soccer Using Logistic Regression Analysis
Amatria Mario
2016-12-01
Full Text Available Small-sided games provide young soccer players with better opportunities to develop their skills and progress as individual and team players. There is, however, little evidence on the effectiveness of different game formats in different age groups, and furthermore, these formats can vary between and even within countries. The Royal Spanish Soccer Association replaced the traditional grassroots 7-a-side format (F-7 with the 8-a-side format (F-8 in the 2011-12 season and the country’s regional federations gradually followed suit. The aim of this observational methodology study was to investigate which of these formats best suited the learning needs of U-10 players transitioning from 5-aside futsal. We built a multiple logistic regression model to predict the success of offensive moves depending on the game format and the area of the pitch in which the move was initiated. Success was defined as a shot at the goal. We also built two simple logistic regression models to evaluate how the game format influenced the acquisition of technicaltactical skills. It was found that the probability of a shot at the goal was higher in F-7 than in F-8 for moves initiated in the Creation Sector-Own Half (0.08 vs 0.07 and the Creation Sector-Opponent's Half (0.18 vs 0.16. The probability was the same (0.04 in the Safety Sector. Children also had more opportunities to control the ball and pass or take a shot in the F-7 format (0.24 vs 0.20, and these were also more likely to be successful in this format (0.28 vs 0.19.
Liu, Shujie; Kawamoto, Taisuke; Morita, Osamu; Yoshinari, Kouichi; Honda, Hiroshi
2017-03-01
Chemical exposure often results in liver hypertrophy in animal tests, characterized by increased liver weight, hepatocellular hypertrophy, and/or cell proliferation. While most of these changes are considered adaptive responses, there is concern that they may be associated with carcinogenesis. In this study, we have employed a toxicogenomic approach using a logistic ridge regression model to identify genes responsible for liver hypertrophy and hypertrophic hepatocarcinogenesis and to develop a predictive model for assessing hypertrophy-inducing compounds. Logistic regression models have previously been used in the quantification of epidemiological risk factors. DNA microarray data from the Toxicogenomics Project-Genomics Assisted Toxicity Evaluation System were used to identify hypertrophy-related genes that are expressed differently in hypertrophy induced by carcinogens and non-carcinogens. Data were collected for 134 chemicals (72 non-hypertrophy-inducing chemicals, 27 hypertrophy-inducing non-carcinogenic chemicals, and 15 hypertrophy-inducing carcinogenic compounds). After applying logistic ridge regression analysis, 35 genes for liver hypertrophy (e.g., Acot1 and Abcc3) and 13 genes for hypertrophic hepatocarcinogenesis (e.g., Asns and Gpx2) were selected. The predictive models built using these genes were 94.8% and 82.7% accurate, respectively. Pathway analysis of the genes indicates that, aside from a xenobiotic metabolism-related pathway as an adaptive response for liver hypertrophy, amino acid biosynthesis and oxidative responses appear to be involved in hypertrophic hepatocarcinogenesis. Early detection and toxicogenomic characterization of liver hypertrophy using our models may be useful for predicting carcinogenesis. In addition, the identified genes provide novel insight into discrimination between adverse hypertrophy associated with carcinogenesis and adaptive hypertrophy in risk assessment. Copyright © 2017 Elsevier Inc. All rights reserved.
Daphne I Ling
Full Text Available BACKGROUND: Hundreds of studies have evaluated the diagnostic accuracy of nucleic-acid amplification tests (NAATs for tuberculosis (TB. Commercial tests have been shown to give more consistent results than in-house assays. Previous meta-analyses have found high specificity but low and highly variable estimates of sensitivity. However, reasons for variability in study results have not been adequately explored. We performed a meta-analysis on the accuracy of commercial NAATs to diagnose pulmonary TB and meta-regression to identify factors that are associated with higher accuracy. METHODOLOGY/PRINCIPAL FINDINGS: We identified 2948 citations from searching the literature. We found 402 articles that met our eligibility criteria. In the final analysis, 125 separate studies from 105 articles that reported NAAT results from respiratory specimens were included. The pooled sensitivity was 0.85 (range 0.36-1.00 and the pooled specificity was 0.97 (range 0.54-1.00. However, both measures were significantly heterogeneous (p<.001. We performed subgroup and meta-regression analyses to identify sources of heterogeneity. Even after stratifying by type of commercial test, we could not account for the variability. In the meta-regression, the threshold effect was significant (p = .01 and the use of other respiratory specimens besides sputum was associated with higher accuracy. CONCLUSIONS/SIGNIFICANCE: The sensitivity and specificity estimates for commercial NAATs in respiratory specimens were highly variable, with sensitivity lower and more inconsistent than specificity. Thus, summary measures of diagnostic accuracy are not clinically meaningful. The use of different cut-off values and the use of specimens other than sputum could explain some of the observed heterogeneity. Based on these observations, commercial NAATs alone cannot be recommended to replace conventional tests for diagnosing pulmonary TB. Improvements in diagnostic accuracy, particularly sensitivity