Multiple Linear Regressions Analysis
Arsham, Hossein
This online calculator allows users to enter sixteen observations with up to four independent variables and calculates the regression equation, the fitted values, R-Squared, the F-Statistic, mean, variance, first order serial-correlation, second order serial-correlation, the Durbin-Watson statistic, and the mean absolute errors. It also tests normality and gives the i-th residuals.
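Several of the residual diagnostics listed in this record can be computed directly from a vector of residuals. The following is a minimal sketch, not the calculator's own code: the function name `residual_diagnostics` is hypothetical, and the serial-correlation estimator shown (lagged cross-products of mean-centered residuals divided by their sum of squares) is one common convention that the calculator may or may not use.

```python
def residual_diagnostics(residuals):
    """Return serial correlations (lag 1 and 2), Durbin-Watson, and mean absolute error."""
    n = len(residuals)
    mean = sum(residuals) / n
    centered = [e - mean for e in residuals]
    denom = sum(c * c for c in centered)

    def serial_corr(lag):
        # Assumed estimator: lagged cross-products over the total sum of squares.
        return sum(centered[t] * centered[t - lag] for t in range(lag, n)) / denom

    # Durbin-Watson: squared successive differences over the residual sum of squares.
    dw = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, n)) / sum(
        e * e for e in residuals
    )
    mae = sum(abs(e) for e in residuals) / n
    return {
        "serial_corr_1": serial_corr(1),
        "serial_corr_2": serial_corr(2),
        "durbin_watson": dw,
        "mean_abs_error": mae,
    }
```

A Durbin-Watson value near 2 is usually read as little first-order autocorrelation; values toward 0 or 4 suggest positive or negative autocorrelation respectively.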
Multiple linear regression analysis
Edwards, T. R.
1980-01-01
Program rapidly selects best-suited set of coefficients. User supplies only vectors of independent and dependent data and specifies confidence level required. Program uses stepwise statistical procedure for relating minimal set of variables to set of observations; final regression contains only most statistically significant coefficients. Program is written in FORTRAN IV for batch execution and has been implemented on NOVA 1200.
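The stepwise idea described in this record can be illustrated with a forward-selection sketch. This is not the FORTRAN IV program: it only adds variables (never drops them), it uses a fixed, assumed F-to-enter threshold (`f_to_enter=4.0`) in place of a threshold derived from a user-specified confidence level, and all function names are hypothetical. It also assumes the candidate predictors are not collinear.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols_sse(columns, y):
    """Fit y on an intercept plus the given columns; return the residual sum of squares."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(design, y)) for p in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2 for row, yi in zip(design, y))


def forward_stepwise(predictors, y, f_to_enter=4.0):
    """predictors maps name -> list of values. Returns the names in order of entry."""
    selected = []
    sse_current = ols_sse([], y)
    while len(selected) < len(predictors):
        # Residual degrees of freedom after adding one more variable (plus intercept).
        df_resid = len(y) - (len(selected) + 2)
        if df_resid <= 0:
            break
        best_name, best_f, best_sse = None, f_to_enter, None
        for name, column in predictors.items():
            if name in selected:
                continue
            sse_new = ols_sse([predictors[s] for s in selected] + [column], y)
            # Partial F statistic for adding this single variable.
            if sse_new <= 1e-12:
                f_stat = float("inf")
            else:
                f_stat = (sse_current - sse_new) / (sse_new / df_resid)
            if f_stat > best_f:
                best_name, best_f, best_sse = name, f_stat, sse_new
        if best_name is None:
            break  # no remaining candidate clears the threshold
        selected.append(best_name)
        sse_current = best_sse
    return selected
```

The normal equations are solved directly to keep the sketch dependency-free; a production implementation would more likely use a QR-based least-squares routine.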
Multiple Correlation versus Multiple Regression.
Huberty, Carl J.
2003-01-01
Describes differences between multiple correlation analysis (MCA) and multiple regression analysis (MRA), showing how these approaches involve different research questions and study designs, different inferential approaches, different analysis strategies, and different reported information. (SLD)
MULTIPLE REGRESSION ANALYSIS OF MAIN ECONOMIC INDICATORS IN TOURISM
Kulcsár, Erika
2009-01-01
This paper analyzes the relationship between GDP in the hotels and restaurants sector (the dependent variable) and the following independent variables: overnight stays in tourist reception establishments, arrivals in tourist reception establishments, and investments in the hotels and restaurants sector, over the period 1995-2007. Using multiple regression analysis, I found that investments and tourist arrivals are significant predictors of the GDP dependent variable. Based on...
MULTIPLE REGRESSION ANALYSIS OF MAIN ECONOMIC INDICATORS IN TOURISM
Erika KULCSÁR
2009-12-01
This paper analyzes the relationship between GDP in the hotels and restaurants sector (the dependent variable) and the following independent variables: overnight stays in tourist reception establishments, arrivals in tourist reception establishments, and investments in the hotels and restaurants sector, over the period 1995-2007. Using multiple regression analysis, I found that investments and tourist arrivals are significant predictors of the GDP dependent variable. Based on these results, I identified those components of the marketing mix that, in my opinion, require investment and could contribute to the positive development of tourist arrivals in tourist reception establishments.
Lowry, Richard, 1940-
This page will perform basic multiple regression analysis for the case where there are several independent predictor variables, X1, X2, etc., and one dependent or criterion variable, Y. Requires import of data from a spreadsheet.
Applied multiple regression correlation analysis for the behavioral sciences
Cohen, Patricia; Aiken, Leona S
2014-01-01
This classic text on multiple regression is noted for its nonmathematical, applied, and data-analytic approach. Readers profit from its verbal-conceptual exposition and frequent use of examples. The applied emphasis provides clear illustrations of the principles and provides worked examples of the types of applications that are possible. Researchers learn how to specify regression models that directly address their research questions. An overview of the fundamental ideas of multiple regression and a review of bivariate correlation and regression and other elementary statistical concepts provide a strong foundation for understanding the rest of the text. The third edition features an increased emphasis on graphics and the use of confidence intervals and effect size measures, and an accompanying CD with data for most of the numerical examples along with the computer code for SPSS, SAS, and SYSTAT. Applied Multiple Regression serves as both a textbook for graduate students and as a reference tool for researche...
Multiple Regression Analysis Using ANCOVA in University Model
Maneesha
2013-09-01
The government of the UAE is promoting Dubai as an academic hub. Dubai International Academic City (DIAC) is a free zone with many national and international universities promoting higher education in almost all disciplines. The aspiration of every graduating student is to get a good placement, and Dubai offers diverse job opportunities in national and multinational organizations. The objective of the paper is to review the placement opportunities in Dubai for universities offering programs in engineering. The paper studies the effect of three independent variables, namely cumulative grade point average (CGPA), engineering discipline, and type of job offered to graduating students, on the dependent variable, salary. The engineering disciplines under study are Mechanical, Electronics and Communication, Computer Science, and Electrical and Electronics Engineering. The types of jobs considered are marketing, technical marketing, design, and logistics. The concepts of analysis of covariance (ANCOVA) and multiple regression are used to review placement opportunities vis-à-vis the salary structure.
Using Robust Standard Errors to Combine Multiple Regression Estimates with Meta-Analysis
Williams, Ryan T.
2012-01-01
Combining multiple regression estimates with meta-analysis has continued to be a difficult task. A variety of methods have been proposed and used to combine multiple regression slope estimates with meta-analysis; however, most of these methods have serious methodological and practical limitations. The purpose of this study was to explore the use…
Regression analysis for multiple-disease group testing data.
Zhang, Boan; Bilder, Christopher R; Tebbs, Joshua M
2013-12-10
Group testing, where individual specimens are composited into groups to test for the presence of a disease (or other binary characteristic), is a procedure commonly used to reduce the costs of screening a large number of individuals. Group testing data are unique in that only group responses may be available, but inferences are needed at the individual level. A further methodological challenge arises when individuals are tested in groups for multiple diseases simultaneously, because unobserved individual disease statuses are likely correlated. In this paper, we propose new regression techniques for multiple-disease group testing data. We develop an expectation-solution based algorithm that provides consistent parameter estimates and natural large-sample inference procedures. We apply our proposed methodology to chlamydia and gonorrhea screening data collected in Nebraska as part of the Infertility Prevention Project and to prenatal infectious disease screening data from Kenya. PMID:23703944
An improved multiple linear regression and data analysis computer program package
Sidik, S. M.
1972-01-01
NEWRAP, an improved version of a previous multiple linear regression program called RAPIER, CREDUC, and CRSPLT, allows for a complete regression analysis including cross plots of the independent and dependent variables, correlation coefficients, regression coefficients, analysis of variance tables, t-statistics and their probability levels, rejection of independent variables, plots of residuals against the independent and dependent variables, and a canonical reduction of quadratic response functions useful in optimum seeking experimentation. A major improvement over RAPIER is that all regression calculations are done in double precision arithmetic.
Analysis of γ spectra in airborne radioactivity measurements using multiple linear regressions
This paper describes the calculation of net peak counts of the nuclide 137Cs at 662 keV in γ spectra from airborne radioactivity measurements using multiple linear regression. A mathematical model is built by analyzing every factor that contributes to the Cs peak counts in the spectra, and a multiple linear regression function is established. The calculation uses stepwise regression, and insignificant factors are eliminated by an F test. The regression results and their uncertainty are calculated by least-squares estimation, from which the net Cs peak counts and their uncertainty are obtained. The analysis results for an experimental spectrum are presented, and the influence of energy shift and energy resolution on the results is discussed. Compared with the spectrum-stripping method, the multiple linear regression method does not need stripping ratios, its result depends only on the counts in the Cs peak, and the calculated uncertainty is reduced. (authors)
Lacey, Michelle
This site, created by Michelle Lacey of Yale University, gives an explanation, a definition and an example of multiple linear regression. Topics include: confidence intervals, tests of significance, and squared multiple correlation. While brief, this is still a valuable site for anyone interested in statistics.
Quantitative electron microscope autoradiography: application of multiple linear regression analysis
A new method for the analysis of high resolution EM autoradiographs is described. It identifies labelled cell organelle profiles in sections on a strictly statistical basis and provides accurate estimates for their radioactivity without the need to make any assumptions about their size, shape and spatial arrangement. (author)
Multiple Regression Analysis of Sib-Pair Data on Reading to Detect Quantitative Trait Loci.
Fulker, D. W.; And Others
1991-01-01
Applies an extension of an earlier multiple regression model for twin analysis to the problem of detecting linkage in a quantitative trait. Detects a number of possible linkages, indicating that the approach is effective. Discusses detecting genotype-environment interaction and the issue of power. (RS)
PUMA: A Unified Framework for Penalized Multiple Regression Analysis of GWAS Data
Hoffman, Gabriel E.; Logsdon, Benjamin A.; Mezey, Jason G
2013-01-01
Penalized Multiple Regression (PMR) can be used to discover novel disease associations in GWAS datasets. In practice, proposed PMR methods have not been able to identify well-supported associations in GWAS that are undetectable by standard association tests and thus these methods are not widely applied. Here, we present a combined algorithmic and heuristic framework for PUMA (Penalized Unified Multiple-locus Association) analysis that solves the problems of previously proposed methods includi...
Regression Models for the Analysis of Longitudinal Gaussian Data from Multiple Sources
O’Brien, Liam M; Fitzmaurice, Garrett M.
2005-01-01
We present a regression model for the joint analysis of longitudinal multiple source Gaussian data. Longitudinal multiple source data arise when repeated measurements are taken from two or more sources, and each source provides a measure of the same underlying variable and on the same scale. This type of data generally produces a relatively large number of observations per subject; thus estimation of an unstructured covariance matrix often may not be possible. We consider two methods by which...
A Performance Study of Data Mining Techniques: Multiple Linear Regression vs. Factor Analysis
Chauhan, R. K.; Abhishek Taneja
2011-01-01
The growing volume of data creates an interesting challenge: the need for data analysis tools that discover regularities in these data. Data mining has emerged as a discipline that contributes tools for data analysis, discovery of hidden knowledge, and autonomous decision making in many application domains. The purpose of this study is to compare the performance of two data mining techniques, viz. factor analysis and multiple linear regression, for different sample si...
Abdelrafe Elzamly; Burairah Hussin
2014-01-01
Risk is not always avoidable, but it is controllable. The aim of this study is to identify whether those techniques are effective in reducing software failure. This motivates the authors to continue the effort to enrich the management of software project risks by considering mining and quantitative approaches with a large data set. In this study, two new techniques are introduced, namely stepwise multiple regression analysis and fuzzy multiple regression, to manage software risks. Two evaluation proc...
Fungible Weights in Multiple Regression
Waller, Niels G.
2008-01-01
Every set of alternate weights (i.e., nonleast squares weights) in a multiple regression analysis with three or more predictors is associated with an infinite class of weights. All members of a given class can be deemed "fungible" because they yield identical SSE (sum of squared errors) and R² values. Equations for generating…
Beyond Multiple Regression: Using Commonality Analysis to Better Understand R² Results
Warne, Russell T.
2011-01-01
Multiple regression is one of the most common statistical methods used in quantitative educational research. Despite the versatility and easy interpretability of multiple regression, it has some shortcomings in the detection of suppressor variables and for somewhat arbitrarily assigning values to the structure coefficients of correlated…
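For readers unfamiliar with commonality analysis, the two-predictor case can be sketched as follows. It partitions the full-model R² into a part unique to each predictor and a part common to both. This is an illustrative sketch, not code from the article: the function names are hypothetical, and the closed-form R² for two predictors (computed from pairwise correlations) assumes the predictors are not perfectly correlated.

```python
from math import sqrt


def pearson(u, v):
    """Pearson correlation between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / sqrt(suu * svv)


def commonality_two_predictors(x1, x2, y):
    """Partition R^2 of y ~ x1 + x2 into unique and common components."""
    r_y1, r_y2, r_12 = pearson(y, x1), pearson(y, x2), pearson(x1, x2)
    # R^2 of the two-predictor model, from the pairwise correlations.
    r2_full = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return {
        "r2_full": r2_full,
        "unique_x1": r2_full - r_y2 ** 2,  # gain from adding x1 to a model with x2
        "unique_x2": r2_full - r_y1 ** 2,  # gain from adding x2 to a model with x1
        "common": r_y1 ** 2 + r_y2 ** 2 - r2_full,
    }
```

The three components always sum to the full-model R². A negative common component is often taken as a sign of suppression, which is the kind of effect the abstract says ordinary multiple regression output makes hard to see.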
Various ultrasonic techniques have been used to estimate the compressive strength of concrete structures. However, the conventional ultrasonic velocity method, which uses only the longitudinal wave, cannot determine the compressive strength of concrete structures accurately. In this paper, multiple regression analysis was applied to the determination of compressive strength using multiple parameters, e.g. shear wave velocity, longitudinal wave velocity, shear wave attenuation coefficient, longitudinal wave attenuation coefficient, combination condition, age, and preservation method. The experimental results show that shear wave velocity estimates the compressive strength of concrete more accurately than longitudinal wave velocity, and that the estimation error for the compressive strength of concrete structures can be kept within approximately ±10%.
ANALYSIS OF THE FINANCIAL PERFORMANCES OF THE FIRM, BY USING THE MULTIPLE REGRESSION MODEL
Constantin Anghelache
2011-11-01
The information obtained through simple linear regression is not always enough to characterize the evolution of an economic phenomenon or to identify its possible future evolution. To remedy these drawbacks, the specialized literature includes multiple regression models, in which the evolution of the dependent variable is defined in terms of two or more factorial variables.
A. Shirvani
2005-10-01
Since fluctuations of the Persian Gulf sea surface temperature (PGSST) have a significant effect on winter precipitation, water resources, and agricultural production in the southwestern parts of Iran, the possibility of predicting winter SST was evaluated with a multiple regression model. The time series of PGSSTs for all seasons during 1947-1992 were considered as predictors, and the time series of MSSTs during 1948-1993 as the predictand. Principal components analysis was applied for data reduction and extraction of principal components. Only the scores of the first four PCs (PC1 to PC4), which accounted for the total variance in the predictor field, were used as input for the regression analysis. To find the dependency of each principal component on the initial PGSST time series, Varimax rotation was applied. The results indicated that PC1 to PC4 are indicators of temperature changes during winter, autumn, spring, and summer, respectively. According to the regression model, PC1, PC2, and PC4 were significant at the 5% level, whereas PC3 was not. The significant variables account for 33.5% of the total variance in the winter PGSSTs. For the prediction of winter PGSST, the PGSST of the previous winter is of particular importance, followed by autumn and summer temperatures.
A Performance Study of Data Mining Techniques: Multiple Linear Regression vs. Factor Analysis
Taneja, Abhishek
2011-01-01
The growing volume of data creates an interesting challenge: the need for data analysis tools that discover regularities in these data. Data mining has emerged as a discipline that contributes tools for data analysis, discovery of hidden knowledge, and autonomous decision making in many application domains. The purpose of this study is to compare the performance of two data mining techniques, viz. factor analysis and multiple linear regression, for different sample sizes on three unique sets of data. The performance of the two techniques is compared on the following parameters: mean square error (MSE), R-square, adjusted R-square, condition number, root mean square error (RMSE), number of variables included in the prediction model, modified coefficient of efficiency, F-value, and test of normality. These parameters have been computed using various data mining tools such as SPSS, XLstat, Stata, and MS-Excel. It is seen that for all the given datasets, factor analysis outperforms multiple linear re...
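Four of the comparison parameters named in this abstract can be computed from observed and fitted values alone. The sketch below is illustrative and not taken from the study: the function name `fit_metrics` is hypothetical, and MSE is taken here as the residual sum of squares divided by n, whereas some packages divide by the residual degrees of freedom instead.

```python
from math import sqrt


def fit_metrics(y, y_hat, n_predictors):
    """Return MSE, RMSE, R-square, and adjusted R-square for a fitted model."""
    n = len(y)
    mean_y = sum(y) / n
    sse = sum((obs - fit) ** 2 for obs, fit in zip(y, y_hat))
    sst = sum((obs - mean_y) ** 2 for obs in y)
    mse = sse / n  # assumed convention: divide by n, not by n - p - 1
    r2 = 1.0 - sse / sst
    # Adjusted R-square penalizes the number of predictors in the model.
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)
    return {"mse": mse, "rmse": sqrt(mse), "r2": r2, "adj_r2": adj_r2}
```

Because adjusted R-square falls when a predictor adds little explanatory power, it is the more useful of the two R-square figures when models with different numbers of variables are compared.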
Melanin and blood concentration in human skin studied by multiple regression analysis: experiments
Shimada, M.; Yamada, Y.; Itoh, M.; Yatagai, T.
2001-09-01
Knowledge of the mechanism of human skin colour and measurement of melanin and blood concentration in human skin are needed in the medical and cosmetic fields. The absorbance spectrum from reflectance at the visible wavelength of human skin increases under several conditions such as a sunburn or scalding. The change of the absorbance spectrum from reflectance including the scattering effect does not correspond to the molar absorption spectrum of melanin and blood. The modified Beer-Lambert law is applied to the change in the absorbance spectrum from reflectance of human skin as the change in melanin and blood is assumed to be small. The concentration of melanin and blood was estimated from the absorbance spectrum reflectance of human skin using multiple regression analysis. Estimated concentrations were compared with the measured one in a phantom experiment and this method was applied to in vivo skin.
The content rate of scattered radiation is affected by tube voltage, object thickness, size of the radiation field, and the presence or absence of a grid. We tried to formalize the relationship between the content rate and these factors. Radiographs were taken while varying the tube voltage, object thickness, radiation field, and grid ratio; from the resulting film densities we calculated the content rate of scattered radiation and derived two approximate equations by multiple regression analysis. One equation used the actual values X of the explanatory variables; the other used the square-root values √X for all explanatory variables except tube voltage. The latter was more accurate. With this approximate equation, when each explanatory variable is within the boundary area, the error does not exceed 10 percent and is mostly within 5 percent. (author)
Multiple Regression and Its Discontents
Snell, Joel C.; Marsh, Mitchell
2012-01-01
Multiple regression is part of a larger statistical strategy originated by Gauss. The authors raise questions about the theory and suggest some changes that would make room for Mandelbrot and Serendipity.
In this study, thermodynamic and statistical analyses were performed on a gas turbine system, to assess the impact of some important operating parameters like CIT (Compressor Inlet Temperature), PR (Pressure Ratio) and TIT (Turbine Inlet Temperature) on its performance characteristics such as net power output, energy efficiency, exergy efficiency and fuel consumption. Each performance characteristic was enunciated as a function of operating parameters, followed by a parametric study and optimization. The results showed that the performance characteristics increase with an increase in the TIT and a decrease in the CIT, except fuel consumption which behaves oppositely. The net power output and efficiencies increase with the PR up to certain initial values and then start to decrease, whereas the fuel consumption always decreases with an increase in the PR. The results of exergy analysis showed the combustion chamber as a major contributor to the exergy destruction, followed by stack gas. Subsequently, multiple regression models were developed to correlate each of the response variables (performance characteristic) with the predictor variables (operating parameters). The regression model equations showed a significant statistical relationship between the predictor and response variables. (author)
PUMA: a unified framework for penalized multiple regression analysis of GWAS data.
Hoffman, Gabriel E; Logsdon, Benjamin A; Mezey, Jason G
2013-01-01
Penalized Multiple Regression (PMR) can be used to discover novel disease associations in GWAS datasets. In practice, proposed PMR methods have not been able to identify well-supported associations in GWAS that are undetectable by standard association tests and thus these methods are not widely applied. Here, we present a combined algorithmic and heuristic framework for PUMA (Penalized Unified Multiple-locus Association) analysis that solves the problems of previously proposed methods including computational speed, poor performance on genome-scale simulated data, and identification of too many associations for real data to be biologically plausible. The framework includes a new minorize-maximization (MM) algorithm for generalized linear models (GLM) combined with heuristic model selection and testing methods for identification of robust associations. The PUMA framework implements the penalized maximum likelihood penalties previously proposed for GWAS analysis (i.e. Lasso, Adaptive Lasso, NEG, MCP), as well as a penalty that has not been previously applied to GWAS (i.e. LOG). Using simulations that closely mirror real GWAS data, we show that our framework has high performance and reliably increases power to detect weak associations, while existing PMR methods can perform worse than single marker testing in overall performance. To demonstrate the empirical value of PUMA, we analyzed GWAS data for type 1 diabetes, Crohn's disease, and rheumatoid arthritis, three autoimmune diseases from the original Wellcome Trust Case Control Consortium.
Our analysis replicates known associations for these diseases and we discover novel etiologically relevant susceptibility loci that are invisible to standard single marker tests, including six novel associations implicating genes involved in pancreatic function, insulin pathways and immune-cell function in type 1 diabetes; three novel associations implicating genes in pro- and anti-inflammatory pathways in Crohn's disease; and one novel association implicating a gene involved in apoptosis pathways in rheumatoid arthritis. We provide software for applying our PUMA analysis framework. PMID:23825936
Multiple Regression Analysis of Factors that May Influence Middle School Science Scores
Glover, Judith
2012-01-01
The purpose of this quantitative multiple regression study was to determine whether a relationship existed between Maryland State Assessment (MSA) reading scores, MSA math scores, gender, ethnicity, age, and MSA science scores. Also examined was if MSA reading scores, MSA math scores, gender, ethnicity, and age can be used in combination or alone…
The calculated >1-MeV pressure vessel fluence is used to determine the fracture toughness and integrity of the reactor pressure vessel. It is therefore of the utmost importance to ensure that the fluence prediction is accurate and unbiased. In practice, this assurance is provided by comparing the predictions of the calculational methodology with an extensive set of accurate benchmarks. A benchmarking database is used to provide an estimate of the overall average measurement-to-calculation (M/C) bias in the calculations. This average is used as an ad-hoc multiplicative adjustment to the calculations to correct for the observed calculational bias. However, this average only provides a well-defined and valid adjustment of the fluence if the M/C data are homogeneous; i.e., the data are statistically independent and there is no correlation between subsets of M/C data. Typically, the identification of correlations between the errors in the database M/C values is difficult because the correlation is of the same magnitude as the random errors in the M/C data and varies substantially over the database. In this paper, an evaluation of a reactor dosimetry benchmark database is performed to determine the statistical validity of the adjustment to the calculated pressure vessel fluence. Physical mechanisms that could potentially introduce a correlation between the subsets of M/C ratios are identified and included in a multiple regression analysis of the M/C data. Rigorous statistical criteria are used to evaluate the homogeneity of the M/C data and determine the validity of the adjustment. For the database evaluated, the M/C data are found to be strongly correlated with dosimeter response threshold energy and dosimeter location (e.g., cavity versus in-vessel).
It is shown that because of the inhomogeneity in the M/C data, for this database, the benchmark data do not provide a valid basis for adjusting the pressure vessel fluence. The statistical criteria and methods employed in this analysis are generic and may be applied in benchmarking applications where the M/C comparisons are used to determine an adjustment of the calculations.
Fraas, John W.; Newman, Isadore
1996-01-01
In a conjoint-analysis consumer-preference study, researchers must determine whether the product factor estimates, which measure consumer preferences, should be calculated and interpreted for each respondent or collectively. Multiple regression models can determine whether to aggregate data by examining factor-respondent interaction effects. This…
Anomalous particle pinch and scaling of vin/D based on transport analysis and multiple regression
Predictions of density profiles in current tokamaks and ITER require a validated scaling relation for $v_{in}/D$, where $v_{in}$ is the anomalous inward drift velocity and $D$ is the anomalous diffusion coefficient. Transport analysis is necessary for determining the anomalous particle pinch from measured density profiles and for separating the impact of particle sources. A set of discharges in ASDEX Upgrade, DIII-D, JET and ASDEX is analysed using a special version of the 1.5-D BALDUR transport code. Profiles of $\rho_s v_{in}/D$, with $\rho_s$ the effective separatrix radius, five other dimensionless parameters and many further quantities in the confinement zone are compiled, resulting in the dataset VIND1.dat, which covers a wide parameter range. Weighted multiple regression is applied to the ASDEX Upgrade subset, which leads to a two-term scaling $\rho_s v_{in}(x')/D(x') = 0.0432\,[(L_{Te}(\bar{x}')/\rho_s)^{-2.58} + 7.13\,U_L^{1.55}\,\nu_e^*(\bar{x}')^{-0.42}]\,x'$ with $x' = \rho/\rho_s$, effective radius $\rho$ and average value $\bar{x}'$. The rmse value of the scaling equals 15.2%. The electron temperature gradient length $L_{Te}$ is the key parameter of the anomalous particle pinch and yields the main contribution. A further parameter is the loop voltage $U_L$, which introduces the electron collisionality parameter $\nu_e^*$. All exponents are statistically significant. The parameters $U_L$ and $\nu_e^*$ suggest a new anomalous particle pinch term driven by the Ohmic inductive electric field. The nonlinearities in the two-term scaling show that quasilinear theory is disproved by experiment. Regression analysis of the whole dataset VIND1.dat from four tokamaks shows that the $L_{Te}/\rho_s$ scaling covers the dependence of $\rho_s v_{in}/D$ on the effective plasma radius. It is further found that the $\rho_s v_{in}/D$ values from transport analysis do not respond to a change in collisionality regime and are not clearly related to the prevailing turbulence type.
The new scaling law predicts for ITER high values of $\rho_s v_{in}/D$ and peaked density profiles, caused by the $L_{Te}/\rho_s$ term and central heating due to alpha particles. The density peaking improves the energy confinement by some 20%.
Polycyclic aromatic hydrocarbons (PAHs) are contaminants that reside mainly in surface soils. Dietary intake of plant-based foods can make a major contribution to total PAH exposure. Little information is available on the relationship between root morphology and plant uptake of PAHs. An understanding of plant root morphologic and compositional factors that affect root uptake of contaminants is important and can inform both agricultural (chemical contamination of crops) and engineering (phytoremediation) applications. Five crop plant species are grown hydroponically in solutions containing the PAH phenanthrene. Measurements are taken for 1) phenanthrene uptake, 2) root morphology – specific surface area, volume, surface area, tip number and total root length and 3) root tissue composition – water, lipid, protein and carbohydrate content. These factors are compared through Pearson's correlation and multiple linear regression analysis. The major factors which promote phenanthrene uptake are specific surface area and lipid content. Highlights: • There is no correlation between phenanthrene uptake and total root length or water content. • Specific surface area and lipid content are the most crucial factors for phenanthrene uptake. • The contribution of specific surface area is greater than that of lipid content.
Investigations upon the indefinite rolls quality assurance in multiple regression analysis
Kiss, I.
2012-11-01
The quality of rolling-mill rolls has been enhanced mainly through improvements in the chemical composition of roll materials. Achieving an optimal chemical composition is a technically efficient way to ensure the service properties, since the material from which the rolls are manufactured is of high importance in this respect. This paper continues to present the scientific results of our experimental research in the area of rolling rolls. The basic research contains concrete elements of immediate practical use in metallurgical enterprises for improving roll quality, with the ultimate aim of increasing durability and safety in service. This paper presents an analysis of the chemical composition and its influence on the mechanical properties of indefinite cast iron rolls. We present some mathematical correlations and graphical interpretations between the hardness (on the working surface and on the necks) and the chemical composition. The double and triple correlations are helpful in foundry practice, as they allow us to determine variation boundaries for the chemical composition with a view to obtaining optimal hardness values. We suggest a mathematical interpretation of the influence of the chemical composition on the hardness of these indefinite rolling rolls, using multiple regression analysis, which can be an important statistical tool for investigating relationships between variables. The modeling results can be described through a number of multi-component equations determined for spaces with 3 and 4 dimensions. The regression surfaces, level curves, and volumes of variation can also be represented and interpreted by technologists as correlation diagrams between the analyzed variables.
In this sense, the results of this research can be used by the engineering teams of foundries and rolling mills for quality assurance of rolls from the production phase onward, as well as in service, which inevitably leads to quality assurance of the rolled products. (Author) 16 refs.
Courvoisier, Delphine S.; Renaud, Olivier
2010-01-01
After much exertion and care to run an experiment in social science, the results should not be ruined by an improper analysis of the data. Often, classical methods, like the mean, the usual simple and multiple linear regressions, and the ANOVA require normality and absence of outliers, which rarely occurs in data coming from experiments. To mitigate this problem, researchers often use some ad hoc methods like the detection and deletion of outliers. In this tutorial, we will show the shortcomi...
Multiple Logistic Regression Analysis of Cigarette Use among High School Students
Adwere-Boamah, Joseph
2011-01-01
A binary logistic regression analysis was performed to predict high school students' cigarette smoking behavior from selected predictors from 2009 CDC Youth Risk Behavior Surveillance Survey. The specific target student behavior of interest was frequent cigarette use. Five predictor variables included in the model were: a) race, b) frequency of…
Regression analysis by example
Chatterjee, Samprit
2012-01-01
Praise for the Fourth Edition: "This book is . . . an excellent source of examples for regression analysis. It has been and still is readily readable and understandable." -Journal of the American Statistical Association. Regression analysis is a conceptually simple method for investigating relationships among variables. Carrying out a successful application of regression analysis, however, requires a balance of theoretical results, empirical rules, and subjective judgment. Regression Analysis by Example, Fifth Edition has been expanded
S. Saaidpour
2012-03-01
Multiple linear regression (MLR) was used to build a linear quantitative structure-property relationship (QSPR) model for predicting the molar diamagnetic susceptibility (χm) of 140 diverse organic compounds, using three significant descriptors calculated from the molecular structures alone and selected by a stepwise regression method. Stepwise regression was employed to develop a regression equation based on 100 training compounds, and predictive ability was tested on 40 compounds reserved for that purpose. The stability of the proposed model was validated using leave-one-out cross-validation and a randomization test. Application of the developed model to a test set of 40 organic compounds demonstrates that the new model is reliable, with good predictive accuracy and a simple formulation. With the MLR method, the test set (40 compounds) is predicted with a Q2ext of 0.9894 and an average root mean square error (RMSE) of 2.2550. The model applicability domain was verified by the leverage approach in order to propose reliable predicted data. The prediction results are in good agreement with the experimental values.
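The validation scheme described in this abstract (fit on 100 training compounds, leave-one-out cross-validation, external test on 40 compounds) can be sketched as follows. This is only an illustration under stated assumptions: the data are synthetic stand-ins rather than the paper's compounds or descriptors, the stepwise descriptor selection is not reproduced, and the helper names `fit_ols`, `predict`, and `q2_leave_one_out` are hypothetical.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients for y ~ intercept + X."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    """Predictions from coefficients returned by fit_ols."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef

def q2_leave_one_out(X, y):
    """Q2 = 1 - PRESS / total sum of squares, refitting with each row left out."""
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        coef = fit_ols(X[keep], y[keep])
        press += (y[i] - predict(coef, X[i:i + 1])[0]) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

# Synthetic stand-in data (an assumption, not the paper's data):
# 140 rows, 3 descriptors, known linear relation plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(140, 3))
y = 5.0 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.3, size=140)

X_train, y_train = X[:100], y[:100]   # 100 training rows
X_test, y_test = X[100:], y[100:]     # 40 rows reserved for external testing

coef = fit_ols(X_train, y_train)
q2_loo = q2_leave_one_out(X_train, y_train)
rmse_test = np.sqrt(np.mean((y_test - predict(coef, X_test)) ** 2))
print(f"LOO Q2 = {q2_loo:.3f}, external RMSE = {rmse_test:.3f}")
```

The external test rows never enter the fit, which is what makes the RMSE on them an honest estimate of predictive accuracy.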
Lorenzo-Seva, Urbano; Ferrando, Pere J
2011-03-01
We provide an SPSS program that implements currently recommended techniques and recent developments for selecting variables in multiple linear regression analysis via the relative importance of predictors. The approach consists of: (1) optimally splitting the data for cross-validation, (2) selecting the final set of predictors to be retained in the regression equation, and (3) assessing the behavior of the chosen model using standard indices and procedures. The SPSS syntax, a short manual, and data files related to this article are available as supplemental materials from brm.psychonomic-journals.org/content/supplemental. PMID:21287104
Error analysis of dimensionless scaling experiments with multiple points using linear regression
International Nuclear Information System (INIS)
A general method of error estimation in the case of multiple point dimensionless scaling experiments, using linear regression and standard error propagation, is proposed. The method reduces to the previous result of Cordey (2009 Nucl. Fusion 49 052001) in the case of a two-point scan. On the other hand, if the points follow a linear trend, it explains how the estimated error decreases as more points are added to the scan. Based on the analytical expression that is derived, it is argued that for a low number of points, adding points to the ends of the scanned range, rather than the middle, results in a smaller error estimate. (letter)
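The claim that points added at the ends of the scanned range reduce the error more than points added in the middle can be illustrated with the textbook standard error of an ordinary least-squares slope, sigma / sqrt(sum((x - mean(x))^2)). This is a sketch under simplifying assumptions (equal, independent errors on y only); it is not the analytical expression derived in the letter, and `slope_std_error` is a hypothetical helper name.

```python
import numpy as np

def slope_std_error(x, sigma=1.0):
    """Standard error of the OLS slope for equal, independent errors sigma on y."""
    x = np.asarray(x, dtype=float)
    return sigma / np.sqrt(np.sum((x - x.mean()) ** 2))

two_point = [0.0, 1.0]                 # the original two-point scan
ends_added = [0.0, 0.0, 1.0, 1.0]      # two extra points at the ends of the range
middle_added = [0.0, 0.5, 0.5, 1.0]    # two extra points in the middle

for name, x in [("two-point", two_point),
                ("ends", ends_added),
                ("middle", middle_added)]:
    print(f"{name:10s} slope s.e. = {slope_std_error(x):.3f}")
```

Points placed at the mean of x contribute nothing to the sum of squared deviations, so in this simplified setting the "middle" design leaves the slope error unchanged while the "ends" design lowers it.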
Nishida, Keiichiro; Honda, Mitsugi; Hashizume, Hiroyuki; Arita, Seizaburo; Watanabe, Masutaka; Ozaki, Toshifumi
2013-01-01
The purpose of this study was to quantitatively evaluate Akahori's preoperative classification of cubital tunnel syndrome. We analyzed the results for 57 elbows that were treated by a simple decompression procedure from 1997 to 2004. The relationship between each item of Akahori's preoperative classification and clinical stage was investigated based on the parameter distribution. We evaluated Akahori's classification system using multiple regression analysis, and investigated the association ...
Ani Shabri; Ruhaidah Samsudin
2014-01-01
Crude oil prices play a significant role in the global economy and are a key input into option pricing formulas, portfolio allocation, and risk measurement. In this paper, a hybrid model integrating wavelet and multiple linear regression (MLR) is proposed for crude oil price forecasting. In this model, the Mallat wavelet transform is first selected to decompose an original time series into several subseries with different scales. Then, principal component analysis (PCA) is used in processing...
Understanding logistic regression analysis
Sperandei, Sandro
2014-01-01
Logistic regression is used to obtain odds ratio in the presence of more than one explanatory variable. The procedure is quite similar to multiple linear regression, with the exception that the response variable is binomial. The result is the impact of each variable on the odds ratio of the observed event of interest. The main advantage is to avoid confounding effects by analyzing the association of all variables together. In this article, we explain the logistic regression procedure using ex...
Simms, Laura E.; Pilipenko, Viacheslav; Engebretson, Mark J.; Reeves, Geoffrey D.; Smith, A. J.; Clilverd, Mark
2014-09-01
Many solar wind and magnetosphere parameters correlate with relativistic electron flux following storms. These include relativistic electron flux before the storm; seed electron flux; solar wind velocity and number density (and their variation); interplanetary magnetic field Bz, AE and Kp indices; and ultra low frequency (ULF) and very low frequency (VLF) wave power. However, as all these variables are intercorrelated, we use multiple regression analyses to determine which are the most predictive of flux when other variables are controlled. Using 219 storms (1992-2002), we obtained hourly averaged electron fluxes for outer radiation belt relativistic electrons (>1.5 MeV) and seed electrons (100 keV) from Los Alamos National Laboratory spacecraft (geosynchronous orbit). For each storm, we found the log10 maximum relativistic electron flux 48-120 h after the end of the main phase of each storm. Each predictor variable was averaged over the 12 h before the storm, the main phase, and the 48 h following minimum Dst. High levels of flux following storms are best modeled by a set of variables. In decreasing influence, ULF, seed electron flux, Vsw and its variation, and after-storm Bz were the most significant explanatory variables. Kp can be added to the model, but it adds no further explanatory power. Although we included ground-based VLF power from Halley, Antarctica, it shows little predictive ability. We produced predictive models using the coefficients from the regression models and assessed their effectiveness in predicting novel observations. The correlation between observed values and those predicted by these empirical models ranged from 0.645 to 0.795.
Application of multiple regression analysis to forecasting South Africa's electricity demand
Scientific Electronic Library Online (English)
Koen, Renee; Holloway, Jennifer
2014-11-01
In a developing country such as South Africa, understanding the expected future demand for electricity is very important in various planning contexts. It is specifically important to understand how expected scenarios regarding population or economic growth can be translated into corresponding future electricity usage patterns. This paper discusses a methodology for forecasting long-term electricity demand that was specifically developed for applying to such scenarios. The methodology uses a series of multiple regression models to quantify historical patterns of electricity usage per sector in relation to patterns observed in certain economic and demographic variables, and uses these relationships to derive expected future electricity usage patterns. The methodology has been used successfully to derive forecasts used for strategic planning within a private company as well as to provide forecasts to aid planning in the public sector. This paper discusses the development of the modelling methodology, provides details regarding the extensive data collection and validation processes followed during the model development, and reports on the relevant model fit statistics. The paper also shows that the forecasting methodology has to some extent been able to match the actual patterns, and therefore concludes that the methodology can be used to support planning by translating changes relating to economic and demographic growth, for a range of scenarios, into a corresponding electricity demand. The methodology therefore fills a particular gap within the South African long-term electricity forecasting domain.
Understanding logistic regression analysis.
Sperandei, Sandro
2014-01-01
Logistic regression is used to obtain odds ratio in the presence of more than one explanatory variable. The procedure is quite similar to multiple linear regression, with the exception that the response variable is binomial. The result is the impact of each variable on the odds ratio of the observed event of interest. The main advantage is to avoid confounding effects by analyzing the association of all variables together. In this article, we explain the logistic regression procedure using examples to make it as simple as possible. After definition of the technique, the basic interpretation of the results is highlighted and then some special issues are discussed. PMID:24627710
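As a small illustration of how logistic regression coefficients translate into odds ratios, the sketch below fits a logistic model by Newton-Raphson and exponentiates the slope. The 2x2 table is hypothetical (not from the article), and `fit_logistic` is an illustrative helper written for this example rather than a library routine. A single binary predictor is used so the result can be checked against the odds ratio computed directly from the table; the same function accepts several predictor columns.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Maximum-likelihood logistic regression via Newton-Raphson (IRLS)."""
    A = np.column_stack([np.ones(len(X)), X])      # add intercept column
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(A @ beta)))      # fitted probabilities
        W = p * (1.0 - p)                          # IRLS weights
        grad = A.T @ (y - p)                       # score vector
        hess = A.T @ (A * W[:, None])              # observed information
        beta = beta + np.linalg.solve(hess, grad)  # Newton step
    return beta

# Hypothetical 2x2 table: exposed 30 events / 70 non-events,
# unexposed 10 events / 90 non-events.
exposed = np.repeat([1, 1, 0, 0], [30, 70, 10, 90]).astype(float)
event = np.repeat([1, 0, 1, 0], [30, 70, 10, 90]).astype(float)

beta = fit_logistic(exposed.reshape(-1, 1), event)
odds_ratio_model = np.exp(beta[1])            # exp(coefficient) = odds ratio
odds_ratio_table = (30 * 90) / (70 * 10)      # cross-product ratio from the table
print(odds_ratio_model, odds_ratio_table)
```

With more than one explanatory variable, each exponentiated coefficient is the odds ratio for that variable with the others held fixed, which is how the adjustment for confounding described in the abstract arises.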
Nishida, Keiichiro
2013-02-01
The purpose of this study was to quantitatively evaluate Akahori's preoperative classification of cubital tunnel syndrome. We analyzed the results for 57 elbows that were treated by a simple decompression procedure from 1997 to 2004. The relationship between each item of Akahori's preoperative classification and clinical stage was investigated based on the parameter distribution. We evaluated Akahori's classification system using multiple regression analysis, and investigated the association between the stage and treatment results. The usefulness of the regression equation was evaluated by analysis of variance of the expected and observed scores. In the parameter distribution, each item of Akahori's classification was mostly associated with the stage, but it was difficult to judge the severity of palsy. In the mathematical evaluation, the most effective item in determining the stage was sensory conduction velocity. It was demonstrated that the established regression equation was highly reliable (R = 0.922). Akahori's preoperative classification can also be used in postoperative classification, and this classification was correlated with postoperative prognosis. Our results indicate that Akahori's preoperative classification is a suitable system. It is reliable, reproducible and well-correlated with the postoperative prognosis. In addition, the established prediction formula is useful to reduce the diagnostic complexity of Akahori's classification.
Keith, Timothy Z
2014-01-01
Multiple Regression and Beyond offers a conceptually oriented introduction to multiple regression (MR) analysis and structural equation modeling (SEM), along with analyses that flow naturally from those methods. By focusing on the concepts and purposes of MR and related methods, rather than the derivation and calculation of formulae, this book introduces material to students more clearly, and in a less threatening way. In addition to illuminating content necessary for coursework, the accessibility of this approach means students are more likely to be able to conduct research using MR or SEM--a
Multiple Instance Regression with Structured Data
Wagstaff, Kiri L.; Lane, Terran; Roper, Alex
2008-01-01
This slide presentation reviews the use of multiple instance regression with structured data from multiple and related data sets. It applies the concept to a practical problem, that of estimating crop yield using remote sensed country wide weekly observations.
Reliability and Regression Analysis
Lane, David M.
This applet, by David M. Lane of Rice University, demonstrates how the reliability of X and Y affects various aspects of the regression of Y on X. Java 1.1 is required, and a full set of instructions is given in order to get the full value from the applet. Exercises and definitions of key terms are also given to help students understand reliability and regression analysis.
Keat, Sim Chong; Chun, Beh Boon; San, Lim Hwee; Jafri, Mohd Zubir Mat
2015-04-01
Climate change due to carbon dioxide (CO2) emissions is one of the most complex challenges threatening our planet. This issue is considered a major international concern and is attributed primarily to the use of different fossil fuels. In this paper, a regression model is used to analyze the causal relationship between CO2 emissions and energy consumption in Malaysia using time series data for the period 1980-2010. The equations were developed using a regression model based on the eight major sources that contribute to CO2 emissions: non-energy use, Liquefied Petroleum Gas (LPG), diesel, kerosene, refinery gas, Aviation Turbine Fuel (ATF) and Aviation Gasoline (AV Gas), fuel oil, and motor petrol. Part of the data (1980-2000) was used to fit the regression model and the remainder (2001-2010) was used to validate it. Comparison of the model predictions with the measured data showed a high correlation coefficient (R2=0.9544), indicating the model's accuracy and efficiency. These results are accurate and can be used for early warning of the population to comply with air quality standards.
Sanford, Ward E.; Nelms, David L.; Pope, Jason P.; Selnick, David L.
2012-01-01
This study by the U.S. Geological Survey, prepared in cooperation with the Virginia Department of Environmental Quality, quantifies the components of the hydrologic cycle across the Commonwealth of Virginia. Long-term, mean fluxes were calculated for precipitation, surface runoff, infiltration, total evapotranspiration (ET), riparian ET, recharge, base flow (or groundwater discharge) and net total outflow. Fluxes of these components were first estimated on a number of real-time-gaged watersheds across Virginia. Specific conductance was used to distinguish and separate surface runoff from base flow. Specific-conductance data were collected every 15 minutes at 75 real-time gages for approximately 18 months between March 2007 and August 2008. Precipitation was estimated for 1971–2000 using PRISM climate data. Precipitation and temperature from the PRISM data were used to develop a regression-based relation to estimate total ET. The proportion of watershed precipitation that becomes surface runoff was related to physiographic province and rock type in a runoff regression equation. Component flux estimates from the watersheds were transferred to flux estimates for counties and independent cities using the ET and runoff regression equations. Only 48 of the 75 watersheds yielded sufficient data, and data from these 48 were used in the final runoff regression equation. The base-flow proportion for the 48 watersheds averaged 72 percent using specific conductance, a value that was substantially higher than the 61 percent average calculated using a graphical-separation technique (the USGS program PART). Final results for the study are presented as component flux estimates for all counties and independent cities in Virginia.
Barrett, C. A.
1985-01-01
Multiple linear regression analysis was used to determine an equation for estimating hot corrosion attack for a series of Ni-base cast turbine alloys. The U transform (i.e., sin^-1 of (% A/100) to the 1/2 power) was shown to give the best estimate of the dependent variable, y. A complete second-degree equation is described for the "centered" weight chemistries for the elements Cr, Al, Ti, Mo, W, Cb, Ta, and Co. In addition, linear terms for the minor elements C, B, and Zr were added for a basic 47-term equation. The best reduced equation was determined by the stepwise selection method with essentially 13 terms. The Cr term was found to be the most important, accounting for 60 percent of the explained variability in hot corrosion attack.
International Nuclear Information System (INIS)
A two-layer perceptron with backpropagation of error is used for quantitative analysis in ICP-AES. The network was trained on emission spectra of two interfering lines of Cd and As, and the concentrations of both elements were subsequently estimated from mixture spectra. The spectra of the Cd and As lines were also used to perform multiple linear regression (MLR) via the calculation of the pseudoinverse S+ of the sensitivity matrix S. In the present paper it is shown that there exist close relations between the operation of the perceptron and the MLR procedure. These are most clearly apparent in the correlation between the weights of the backpropagation network and the elements of the pseudoinverse. Using MLR, the confidence intervals over the predictions are exploited to correct for the wavelength shift of the optical device. (orig.)
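The MLR step mentioned in this abstract, estimating concentrations through the pseudoinverse S+ of a sensitivity matrix S, can be sketched with NumPy. Everything numerical here is an assumption made for illustration: the two overlapping Gaussian line shapes, the wavelength grid, the noise level, and the "true" concentrations are synthetic and are not the Cd and As spectra of the paper.

```python
import numpy as np

wavelength = np.linspace(-1.0, 1.0, 41)   # arbitrary units (assumed grid)

def line(center, width=0.3):
    """Hypothetical Gaussian emission line profile on the wavelength grid."""
    return np.exp(-0.5 * ((wavelength - center) / width) ** 2)

# Sensitivity matrix S: one column per element, two strongly overlapping lines.
S = np.column_stack([line(-0.1), line(0.1)])

# Synthetic mixture spectrum: S @ concentrations plus a little noise.
true_conc = np.array([2.0, 0.5])
rng = np.random.default_rng(1)
mixture = S @ true_conc + rng.normal(scale=0.01, size=wavelength.size)

S_plus = np.linalg.pinv(S)        # pseudoinverse S+ of the sensitivity matrix
estimated = S_plus @ mixture      # least-squares concentration estimates
print(estimated)
```

Each row of S+ acts as a fixed set of linear weights applied to the measured spectrum, which is the point of contact with the weights of a linear network noted in the abstract.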
Zhang, Chen; Li, Xiaoming; Su, Shaobing; Hong, Yan; Zhou, Yuejiao; Tang, Zhenzhu; Shen, Zhiyong
2015-07-01
Limited data are available regarding risk factors that are related to intimate partner violence (IPV) against female sex workers (FSWs) in the context of stable partnerships. Out of the 1,022 FSWs, 743 reported ever having a stable partnership and 430 (more than half) of those reported experiencing IPV. Hierarchical multivariate regression revealed that some characteristics of stable partners (e.g., low education, alcohol use) and relationship stressors (e.g., frequent friction, concurrent partnerships) were independently predictive of IPV against FSWs. Public health professionals who design future violence prevention interventions targeting FSWs need to consider the influence of their stable partners. PMID:24730642
Seber, George A F
2012-01-01
Concise, mathematically clear, and comprehensive treatment of the subject. Expanded coverage of diagnostics and methods of model fitting. Requires no specialized knowledge beyond a good grasp of matrix algebra and some acquaintance with straight-line regression and simple analysis of variance models. More than 200 problems throughout the book plus outline solutions for the exercises. This revision has been extensively class-tested.
M. Cholewa
2011-07-01
In this article the authors show the influence of technological parameters and modification treatment on the structural properties of closed skeleton castings. The approach aimed at maximal refinement of the structure and minimal structural diversification. Skeleton castings were manufactured in accordance with the elaborated production technology. Experimental castings were manufactured under variable technological conditions: pouring temperature 953 ÷ 1013 K, mould temperature 293 ÷ 373 K, and height of the gating system above casting level 105 ÷ 175 mm. Analysis of metallographic specimens, quantitative analysis of silicon crystals, and secondary dendrite-arm spacing analysis of the α solution were performed. Average values of stereological parameters were determined for all castings. The (B/L) and (P/A) factors were determined. On the basis of the microstructural analysis, the authors compare the samples. The aim of the analysis was to select the samples with the least diversification of the degree of structure refinement and the smallest silicon crystals. On the basis of the microstructural analysis, the authors state that sample 5 (AlSi11, Tpour 1013 K, Tmould 333 K, h – 265 mm) has the best structural properties (least diversification of the degree of structure refinement and the smallest silicon crystals). Statistical analysis of the structural results was then carried out. On the basis of the statistical analysis, the authors state that the best structural properties are obtained for the technological parameters Tpour = 1013 K, Tmould = 373 K, and h = 230 mm [4]. The results of the statistical analysis are the prerequisite for optimization studies.
Correlation Weights in Multiple Regression
Waller, Niels G.; Jones, Jeff A.
2010-01-01
A general theory on the use of correlation weights in linear prediction has yet to be proposed. In this paper we take initial steps in developing such a theory by describing the conditions under which correlation weights perform well in population regression models. Using OLS weights as a comparison, we define cases in which the two weighting…
Assumptions of Multiple Regression: Correcting Two Misconceptions
Williams, Matt N.; Gomez Grajales, Carlos Alberto; Kurkiewicz, Dason
2013-01-01
In 2002, an article entitled "Four assumptions of multiple regression that researchers should always test" by Osborne and Waters was published in "PARE." This article has gone on to be viewed more than 275,000 times (as of August 2013), and it is one of the first results displayed in a Google search for "regression…
Multiple Output Regression with Latent Noise
Gillberg, Jussi; Marttinen, Pekka; Pirinen, Matti; Kangas, Antti J; Soininen, Pasi; Ali, Mehreen; Havulinna, Aki S; Järvelin, Marjo-Riitta; Ala-Korpela, Mika; Kaski, Samuel
2014-01-01
In high-dimensional data, structured noise, caused by observed and unobserved factors affecting multiple target variables simultaneously, imposes a serious challenge for modeling, by masking the often weak signal. Therefore, (1) explaining away the structured noise in multiple-output regression is of paramount importance. Additionally, (2) assumptions about the correlation structure of the regression weights are needed. We note that both can be formulated in a natural way in...
The Geometry of Enhancement in Multiple Regression
Waller, Niels G.
2011-01-01
In linear multiple regression, "enhancement" is said to occur when R² = b′r > r′r, where b is a p × 1 vector of standardized regression coefficients and r is a p × 1 vector of correlations between a criterion y and a set of standardized regressors, x. When p = 1 then b ≅ r and enhancement cannot…
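A small numerical case may make the definition of enhancement concrete. The correlations below are hypothetical values chosen for illustration (a classic suppressor pattern), not numbers from the article; the standardized coefficients are obtained as b = R_xx^{-1} r, where R_xx is the correlation matrix of the regressors.

```python
import numpy as np

# Hypothetical correlations (assumed for illustration):
R_xx = np.array([[1.0, 0.5],
                 [0.5, 1.0]])   # correlations among the two regressors
r = np.array([0.5, 0.0])        # correlation of each regressor with y

b = np.linalg.solve(R_xx, r)    # standardized regression coefficients
r_squared = b @ r               # R^2 = b'r
sum_sq_corr = r @ r             # r'r, the sum of squared validities

print(f"R^2 = {r_squared:.4f}, r'r = {sum_sq_corr:.4f}")
print("enhancement" if r_squared > sum_sq_corr else "no enhancement")
```

Here the second regressor is uncorrelated with y but correlated with the first regressor, so including it raises R² above the sum of the squared correlations.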
Computing multiple-output regression quantile regions.
Czech Academy of Sciences Publication Activity Database
Paindaveine, D.; Šiman, Miroslav
2012-01-01
Regression analysis in quantum language
Ishikawa, Shiro
2014-01-01
Although regression analysis has a great history, we consider that it has always continued being confused. For example, the fundamental terms in regression analysis (e.g., "regression", "least-squares method", "explanatory variable", "response variable", etc.) seem to be historically conventional, that is, these words do not express the essence of regression analysis. Recently, we proposed quantum language (or, classical and quantum measurement theory), which is characterize...
Multiple-Instance Regression with Structured Data
Wagstaff, Kiri L.; Lane, Terran; Roper, Alex
2008-01-01
We present a multiple-instance regression algorithm that models internal bag structure to identify the items most relevant to the bag labels. Multiple-instance regression (MIR) operates on a set of bags with real-valued labels, each containing a set of unlabeled items, in which the relevance of each item to its bag label is unknown. The goal is to predict the labels of new bags from their contents. Unlike previous MIR methods, MI-ClusterRegress can operate on bags that are structured in that they contain items drawn from a number of distinct (but unknown) distributions. MI-ClusterRegress simultaneously learns a model of the bag's internal structure, the relevance of each item, and a regression model that accurately predicts labels for new bags. We evaluated this approach on the challenging MIR problem of crop yield prediction from remote sensing data. MI-ClusterRegress provided predictions that were more accurate than those obtained with non-multiple-instance approaches or MIR methods that do not model the bag structure.
Yoo, Yun Joo; Sun, Lei; Bull, Shelley B
2013-01-01
Multi-marker methods for genetic association analysis can be performed for common and low frequency SNPs to improve power. Regression models are an intuitive way to formulate multi-marker tests. In previous studies we evaluated regression-based multi-marker tests for common SNPs, and through identification of bins consisting of correlated SNPs, developed a multi-bin linear combination (MLC) test that is a compromise between a 1 df linear combination test and a multi-df global test. Bins of SN...
Kokaly, R.F.; Clark, R.N.
1999-01-01
We develop a new method for estimating the biochemistry of plant material using spectroscopy. Normalized band depths calculated from the continuum-removed reflectance spectra of dried and ground leaves were used to estimate their concentrations of nitrogen, lignin, and cellulose. Stepwise multiple linear regression was used to select wavelengths in the broad absorption features centered at 1.73 µm, 2.10 µm, and 2.30 µm that were highly correlated with the chemistry of samples from eastern U.S. forests. Band depths of absorption features at these wavelengths were found to also be highly correlated with the chemistry of four other sites. A subset of data from the eastern U.S. forest sites was used to derive linear equations that were applied to the remaining data to successfully estimate their nitrogen, lignin, and cellulose concentrations. Correlations were highest for nitrogen (R2 from 0.75 to 0.94). The consistent results indicate the possibility of establishing a single equation capable of estimating the chemical concentrations in a wide variety of species from the reflectance spectra of dried leaves. The extension of this method to remote sensing was investigated. The effects of leaf water content, sensor signal-to-noise and bandpass, atmospheric effects, and background soil exposure were examined. Leaf water was found to be the greatest challenge to extending this empirical method to the analysis of fresh whole leaves and complete vegetation canopies. The influence of leaf water on reflectance spectra must be removed to within 10%. Other effects were reduced by continuum removal and normalization of band depths.
If the effects of leaf water can be compensated for, it might be possible to extend this method to remote sensing data acquired by imaging spectrometers to give estimates of nitrogen, lignin, and cellulose concentrations over large areas for use in ecosystem studies.
Multiple Outliers Detection Procedures in Linear Regression
Robiah Adnan; Mohd Nor Mohamad; Halim Setan
2003-01-01
This paper describes a procedure for identifying multiple outliers in linear regression. This procedure uses a robust fit, least trimmed squares (LTS), and the single linkage clustering method to obtain the potential outliers. Then multiple-case diagnostics are used to obtain the outliers from these potential outliers. The performance of this procedure is also compared to Sebert's method. Monte Carlo simulations are used in determining which procedure performed best in all...
Yoo, Yun Joo; Sun, Lei; Bull, Shelley B
2013-01-01
Multi-marker methods for genetic association analysis can be performed for common and low frequency SNPs to improve power. Regression models are an intuitive way to formulate multi-marker tests. In previous studies we evaluated regression-based multi-marker tests for common SNPs, and through identification of bins consisting of correlated SNPs, developed a multi-bin linear combination (MLC) test that is a compromise between a 1 df linear combination test and a multi-df global test. Bins of SNPs in high linkage disequilibrium (LD) are identified, and a linear combination of individual SNP statistics is constructed within each bin. Then association with the phenotype is represented by an overall statistic with df as many or few as the number of bins. In this report we evaluate multi-marker tests for SNPs that occur at low frequencies. There are many linear and quadratic multi-marker tests that are suitable for common or low frequency variant analysis. We compared the performance of the MLC tests with various linear and quadratic statistics in joint or marginal regressions. For these comparisons, we performed a simulation study of genotypes and quantitative traits for 85 genes with many low frequency SNPs based on HapMap Phase III. We compared the tests using (1) set of all SNPs in a gene, (2) set of common SNPs in a gene (MAF ≥ 5%), (3) set of low frequency SNPs (1% ≤ MAF [...] regression Wald test showed consistently good performance whereas other statistics including the ones based on marginal regression had lower power for some situations. PMID:24273553
Multiple Regression Analyses in Clinical Child and Adolescent Psychology
Jaccard, James; Guilamo-Ramos, Vincent; Johansson, Margaret; Bouris, Alida
2006-01-01
A major form of data analysis in clinical child and adolescent psychology is multiple regression. This article reviews issues in the application of such methods in light of the research designs typical of this field. Issues addressed include controlling covariates, evaluation of predictor relevance, comparing predictors, analysis of moderation,…
On directional multiple-output quantile regression.
Paindaveine, D.; Šiman, Miroslav
2011-01-01
Vol. 102, No. 2 (2011), pp. 193-212. ISSN 0047-259X. R&D Projects: GA MŠk(CZ) 1M06047. Other grants: Commission EC(BE) Fonds National de la Recherche Scientifique. Institutional research plan: CEZ:AV0Z10750506. Keywords: multivariate quantile * quantile regression * multiple-output regression * halfspace depth * portfolio optimization * value-at-risk. Subject RIV: BA - General Mathematics. Impact factor: 0.879, year: 2011. http://library.utia.cas.cz/separaty/2011/SI/siman-0364128.pdf
Salience Assignment for Multiple-Instance Regression
Wagstaff, Kiri L.; Lane, Terran
2007-01-01
We present a Multiple-Instance Learning (MIL) algorithm for determining the salience of each item in each bag with respect to the bag's real-valued label. We use an alternating-projections constrained optimization approach to simultaneously learn a regression model and estimate all salience values. We evaluate this algorithm on a significant real-world problem, crop yield modeling, and demonstrate that it provides more extensive, intuitive, and stable salience models than Primary-Instance Regression, which selects a single relevant item from each bag.
Multiple-Regression Hidden Markov Model
Fujinaga, Katsuhisa; Nakai, Mitsuru; Shimodaira, Hiroshi; Sagayama, Shigeki
2001-01-01
This paper proposes a new class of hidden Markov model (HMM) called multiple-regression HMM (MR-HMM) that utilizes auxiliary features such as fundamental frequency (F0) and speaking styles that affect spectral parameters to better model the acoustic features of phonemes. Though such auxiliary features are considered to be factors that degrade the performance of speech recognizers, the proposed MR-HMM adapts its model parameters, i.e. mean vectors of output probabili...
Flexible Model Selection Criterion for Multiple Regression
Kunio Takezawa
2012-01-01
Predictors of a multiple linear regression equation selected by GCV (Generalized Cross Validation) may include undesirable predictors that have no linear functional relationship with the target variable and are chosen only by accident. This is because GCV estimates prediction error, but does not control the probability of selecting irrelevant predictors of the target variable. To take this possibility into account, a new statistic “GCV_{f}
Tarpey, Thaddeus; Petkova, Eva
2010-07-01
Finite mixture models have come to play a very prominent role in modelling data. The finite mixture model is predicated on the assumption that distinct latent groups exist in the population. The finite mixture model therefore is based on a categorical latent variable that distinguishes the different groups. Often in practice distinct sub-populations do not actually exist. For example, disease severity (e.g. depression) may vary continuously and therefore, a distinction of diseased and not-diseased may not be based on the existence of distinct sub-populations. Thus, what is needed is a generalization of the finite mixture's discrete latent predictor to a continuous latent predictor. We cast the finite mixture model as a regression model with a latent Bernoulli predictor. A latent regression model is proposed by replacing the discrete Bernoulli predictor by a continuous latent predictor with a beta distribution. Motivation for the latent regression model arises from applications where distinct latent classes do not exist, but instead individuals vary according to a continuous latent variable. The shapes of the beta density are very flexible and can approximate the discrete Bernoulli distribution. Examples and a simulation are provided to illustrate the latent regression model. In particular, the latent regression model is used to model placebo effect among drug treated subjects in a depression study. PMID:20625443
Lee, L.; Helsel, D.
2005-01-01
Trace contaminants in water, including metals and organics, often are measured at sufficiently low concentrations to be reported only as values below the instrument detection limit. Interpretation of these "less thans" is complicated when multiple detection limits occur. Statistical methods for multiply censored, or multiple-detection limit, datasets have been developed for medical and industrial statistics, and can be employed to estimate summary statistics or model the distributions of trace-level environmental data. We describe S-language-based software tools that perform robust linear regression on order statistics (ROS). The ROS method has been evaluated as one of the most reliable procedures for developing summary statistics of multiply censored data. It is applicable to any dataset that has 0 to 80% of its values censored. These tools are a part of a software library, or add-on package, for the R environment for statistical computing. This library can be used to generate ROS models and associated summary statistics, plot modeled distributions, and predict exceedance probabilities of water-quality standards. © 2005 Elsevier Ltd. All rights reserved.
A Dirty Model for Multiple Sparse Regression
Jalali, Ali; Sanghavi, Sujay
2011-01-01
Sparse linear regression -- finding an unknown vector from linear measurements -- is now known to be possible with fewer samples than variables, via methods like the LASSO. We consider the multiple sparse linear regression problem, where several related vectors -- with partially shared support sets -- have to be recovered. A natural question in this setting is whether one can use the sharing to further decrease the overall number of samples required. A line of recent research has studied the use of \ell_1/\ell_q norm block-regularizations with q>1 for such problems; however these could actually perform worse in sample complexity -- vis-à-vis solving each problem separately ignoring sharing -- depending on the level of sharing. We present a new method for multiple sparse linear regression that can leverage support and parameter overlap when it exists, but not pay a penalty when it does not. The idea is very simple: we decompose the parameters into two components and regularize these differently. We show both theore...
International Nuclear Information System (INIS)
A novel statistical method, namely Regression-Estimated Input Function (REIF), is proposed in this study for the purpose of non-invasive estimation of the input function for fluorine-18 2-fluoro-2-deoxy-d-glucose positron emission tomography (FDG-PET) quantitative analysis. We collected data from 44 patients who had undergone a blood sampling procedure during their FDG-PET scans. First, we generated tissue time-activity curves of the grey matter and the whole brain with a segmentation technique for every subject. Summations of different intervals of these two curves were used as a feature vector, which also included the net injection dose. Multiple linear regression analysis was then applied to find the correlation between the input function and the feature vector. After a simulation study with in vivo data, the data of 29 patients were used to calculate the regression coefficients, which were then used to estimate the input functions of the other 15 subjects. Comparing the estimated input functions with the corresponding real input functions, the averaged error percentages of the area under the curve and the cerebral metabolic rate of glucose (CMRGlc) were 12.13±8.85 and 16.60±9.61, respectively. Regression analysis of the CMRGlc values derived from the real and estimated input functions revealed a high correlation (r=0.91). No significant difference was found between the real CMRGlc and that derived from our regression-estimated input function (Student's t test, P>0.05). The proposed REIF method demonstrated good abilities for input function and CMRGlc estimation, and represents a reliable replacement for the blood sampling procedures in FDG-PET quantification. (orig.)
Multiple linear regression for isotopic measurements
Garcia Alonso, J. I.
2012-04-01
There are two typical applications of isotopic measurements: the detection of natural variations in isotopic systems and the detection of man-made variations using enriched isotopes as indicators. For both types of measurements, accurate and precise isotope ratio measurements are required. For the so-called non-traditional stable isotopes, multicollector ICP-MS instruments are usually applied. In many cases, chemical separation procedures are required before accurate isotope measurements can be performed. The off-line separation of Rb and Sr or Nd and Sm is the classical procedure employed to eliminate isobaric interferences before multicollector ICP-MS measurement of Sr and Nd isotope ratios. This procedure also allows matrix separation, so that precise and accurate Sr and Nd isotope ratios can be obtained. In our laboratory we have evaluated the separation of Rb-Sr and Nd-Sm isobars by liquid chromatography and on-line multicollector ICP-MS detection. The combination of this chromatographic procedure with multiple linear regression of the raw chromatographic data resulted in Sr and Nd isotope ratios with precisions and accuracies typical of off-line sample preparation procedures. On the other hand, methods for the labelling of individual organisms (such as a given plant, fish or animal) are required for population studies. We have developed a dual isotope labelling procedure which can be unique for a given individual, can be inherited in living organisms and is stable. The detection of the isotopic signature is also based on multiple linear regression. The labelling of fish and its detection in otoliths by Laser Ablation ICP-MS will be discussed using trout and salmon as examples. In conclusion, isotope measurement procedures based on multiple linear regression can be a viable alternative in multicollector ICP-MS measurements.
Omnibus Hypothesis Testing in Dominance-Based Ordinal Multiple Regression
Long, Jeffrey D.
2005-01-01
Often quantitative data in the social sciences have only ordinal justification. Problems of interpretation can arise when least squares multiple regression (LSMR) is used with ordinal data. Two ordinal alternatives are discussed, dominance-based ordinal multiple regression (DOMR) and proportional odds multiple regression. The Q[superscript 2]…
S. CONDON
2014-06-01
The thermal inactivation of Enterococcus faecium under isothermal conditions in tryptic soy broth of different pH (4.0, 5.5 and 7.4) was studied. The bacterial cells were more sensitive at higher temperatures and in media of low pH. Decimal reduction times at 71 °C were 2.56, 0.39 and 0.03 min at pH 7.4, 5.5 and 4.0, respectively. At all temperatures and pH values assayed, the survival curves obtained were linear. A mathematical model based on first-order kinetics accurately described these survival curves. The relationship between DT values and temperature was also linear. A mean z-value of 5 °C was established. A multiple linear regression model using four predictor variables (pH, T, pH² and T²) related the log of the DT value to pH and treatment temperature. The developed tertiary model satisfactorily predicted the heat inactivation of Enterococcus faecium under the treatment conditions investigated.
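Illustrative note: the first-order survival model and the D- and z-values in the abstract above can be sketched in a few lines of Python. This is a minimal sketch, not the authors' tertiary model. The D-values at 71 °C and the mean z-value of 5 °C are taken from the abstract. The function names are hypothetical, and the log-linear dependence of D on temperature is the textbook thermal-death-time relation, assumed here to be the one the authors describe.

```python
# D-values (min) at 71 °C reported in the abstract, keyed by pH.
D_71 = {7.4: 2.56, 5.5: 0.39, 4.0: 0.03}
Z_VALUE = 5.0  # mean z-value (°C) reported in the abstract


def log_reductions(t_min, d_value):
    """First-order kinetics: log10(N0 / N) = t / D."""
    return t_min / d_value


def time_for_reduction(n_logs, d_value):
    """Heating time (min) needed for n_logs decimal reductions."""
    return n_logs * d_value


def d_at_temperature(d_ref, t_ref, temp, z=Z_VALUE):
    """Assumed log-linear D-T relation: D falls tenfold per z °C increase."""
    return d_ref * 10 ** ((t_ref - temp) / z)
```

Under these assumptions, `time_for_reduction(5, D_71[7.4])` gives the holding time at 71 °C and pH 7.4 for a five-log reduction, and `d_at_temperature(D_71[7.4], 71.0, 76.0)` extrapolates the D-value to 76 °C.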
Paulo Canas, Rodrigues; Dulce Gamito Santinhos, Pereira; João Tiago, Mexia.
2011-12-01
This paper joins the main properties of joint regression analysis (JRA), a model based on the Finlay-Wilkinson regression to analyse multi-environment trials, and of the additive main effects and multiplicative interaction (AMMI) model. The study compares JRA and AMMI with particular focus on robustness with increasing amounts of randomly selected missing data. The application uses a data set from a durum wheat (Triticum turgidum L., Durum Group) breeding program conducted in Portugal. The two models yield similar dominant cultivars (JRA) and winners of mega-environments (AMMI) for the same environments. However, JRA had more stable results as the incidence of missing values increased.
Hukharnsusatrue, A.
2005-11-01
The objective of this research is to compare methods for estimating multiple regression coefficients in the presence of multicollinearity among the independent variables. The estimation methods are the Ordinary Least Squares method (OLS), the Restricted Least Squares method (RLS), the Restricted Ridge Regression method (RRR) and the Restricted Liu method (RL), both when the restrictions are true and when they are not true. The study used the Monte Carlo simulation method, and the experiment was repeated 1,000 times under each situation. The results are as follows. CASE 1: The restrictions are true. In all cases, the RRR and RL methods have a smaller Average Mean Square Error (AMSE) than the OLS and RLS methods, respectively. The RRR method provides the smallest AMSE when the level of correlation is high, and also for all levels of correlation and all sample sizes when the standard deviation is equal to 5. The RL method provides the smallest AMSE when the level of correlation is low or medium, except when the standard deviation is equal to 3 and the sample size is small, in which case the RRR method provides the smallest AMSE. The AMSE increases with, from most to least influential, the level of correlation, the standard deviation and the number of independent variables, but varies inversely with sample size. CASE 2: The restrictions are not true. In all cases, the RRR method provides the smallest AMSE, except when the standard deviation is equal to 1 and the error of restrictions is equal to 5%: there the OLS method provides the smallest AMSE when the level of correlation is low or medium and the sample size is large, while for small sample sizes the RL method provides the smallest AMSE. In addition, when the error of restrictions is increased, the OLS method provides the smallest AMSE for all levels of correlation and all sample sizes, except when the level of correlation is high and the sample size is small.
Moreover, in the cases where the OLS method provides the smallest AMSE, the RLS method mostly has a smaller AMSE than the RRR and RL methods when the level of correlation is low or medium and the sample size is large. The AMSE increases with, from most to least influential, the error of restrictions, the level of correlation, the standard deviation and the number of independent variables, but varies inversely with sample size, except that the error of restrictions does not affect the AMSE of the OLS method.
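Illustrative note: the effect of multicollinearity on estimation error that the study above examines can be reproduced in miniature. The sketch below is not the authors' experiment. It compares only unrestricted OLS with an ordinary ridge estimator (the restricted variants RLS, RRR and RL are not implemented), and the correlation level, sample size, standard deviation, ridge parameter `k` and true coefficients are all assumed values chosen for illustration.

```python
import numpy as np


def ols(X, y):
    """Ordinary least squares via a least-squares solver."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def ridge(X, y, k):
    """Ridge estimator (X'X + kI)^{-1} X'y with ridge parameter k >= 0."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)


def simulate_mse(rho=0.99, n=30, sigma=5.0, k=1.0, reps=1000, seed=0):
    """Average squared estimation error of OLS and ridge over `reps` samples."""
    rng = np.random.default_rng(seed)
    beta = np.array([1.0, 1.0])               # assumed true coefficients
    cov = np.array([[1.0, rho], [rho, 1.0]])  # two predictors, correlation rho
    err_ols = err_ridge = 0.0
    for _ in range(reps):
        X = rng.multivariate_normal(np.zeros(2), cov, size=n)
        y = X @ beta + rng.normal(0.0, sigma, size=n)
        err_ols += np.sum((ols(X, y) - beta) ** 2)
        err_ridge += np.sum((ridge(X, y, k) - beta) ** 2)
    return err_ols / reps, err_ridge / reps
```

Calling `simulate_mse()` returns the pair (OLS error, ridge error). With highly correlated predictors and noisy responses, as in these assumed settings, the ridge error should be markedly smaller, which is the kind of pattern the abstract reports for the ridge-type estimators.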
Multiple Response Regression for Gaussian Mixture Models with Known Labels
Lee, Wonyul; Ying DU; Sun, Wei; Hayes, D. Neil; Liu, Yufeng
2012-01-01
Multiple response regression is a useful regression technique to model multiple response variables using the same set of predictor variables. Most existing methods for multiple response regression are designed for modeling homogeneous data. In many applications, however, one may have heterogeneous data where the samples are divided into multiple groups. Our motivating example is a cancer dataset where the samples belong to multiple cancer subtypes. In this paper, we consider modeling the data...
Lochner, Christine; Seedat, Soraya; Hemmings, Sian M J; Moolman-Smook, Johanna C; Kidd, Martin; Stein, Dan J
2007-01-01
Dissociation is defined as the disruption of the usually integrated functions of consciousness, such as memory, identity, and perceptions of the environment. Causes include various psychological, neurological and neurobiological mechanisms, none of which have been consistently supported. To our knowledge, the role of gene-environment interactions in dissociative experiences in obsessive-compulsive disorder (OCD) has not previously been investigated. Eighty-three Caucasian patients (29 male, 54 female) with a principal diagnosis of OCD were included. The Dissociative Experiences Scale was used to assess dissociation. The role of childhood trauma (assessed with the Childhood Trauma Questionnaire), and a functional 44-bp insertion/deletion polymorphism in the promoter region of the serotonin transporter, or 5-HTT, in mediating dissociation, was investigated using multiple regression analysis and path analysis using the partial least squares model. Both analyses indicated that an interaction between physical neglect and the S/S genotype of the 5-HTT gene significantly predicted dissociation in patients with OCD. Dissociation may be a predictor of poorer treatment outcome in patients with OCD; therefore, a better understanding of the mechanisms that underlie this phenomenon may be useful. Here, two different but related statistical techniques (multiple regression and partial least squares), confirmed that physical neglect and the 5-HTT genotype jointly play a role in predicting dissociation in OCD. PMID:17943026
Dimension Reduction of the Explanatory Variables in Multiple Linear Regression
Filzmoser, P.; Croux, C.
2003-01-01
In classical multiple linear regression analysis, problems will occur if the regressors are multicollinear or if the number of regressors is larger than the number of observations. In this note a new method is introduced which constructs orthogonal predictor variables in a way to have a maximal correlation with the dependent variable. The predictor variables are linear combinations of the original regressors. This method allows a major reduction of the number of predictor...
Shrinkage Estimation and Selection for Multiple Functional Regression
Lian, Heng
2011-01-01
Functional linear regression is a useful extension of simple linear regression and has been investigated by many researchers. However, functional variable selection problems when multiple functional observations exist, the counterpart in the functional context of multiple linear regression, are seldom studied. Here we propose a method using the group smoothly clipped absolute deviation penalty (gSCAD) which can perform regression estimation and variable selection simulta...
Polynomial regression analysis and significance test of the regression function
International Nuclear Information System (INIS)
In order to analyze the decay heating power per kilogram of a certain radioactive isotope with the polynomial regression method, the paper first demonstrates the broad usage of polynomial functions and derives their parameters by ordinary least squares estimation. A significance test for the polynomial regression function is then derived from the similarity between the polynomial regression model and the multivariable linear regression model. Finally, polynomial regression analysis and the significance test of the polynomial function are applied to the decay heating power of the isotope per kilogram, based on the authors' real work. (authors)
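Illustrative note: the link between polynomial regression and multiple linear regression mentioned in the abstract above can be shown as follows. This is a minimal sketch and not the authors' derivation. The powers of x are treated as ordinary regressors, the coefficients are obtained by least squares, and the overall F statistic of the regression is computed in the usual way. The function name `polyfit_f_test` is hypothetical.

```python
import numpy as np


def polyfit_f_test(x, y, degree):
    """Fit y = b0 + b1*x + ... + bd*x^d by OLS; return (coef, F, df1, df2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Powers of x act as the regressors of a multiple linear regression.
    X = np.vander(x, degree + 1, increasing=True)
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    fitted = X @ coef
    ss_reg = np.sum((fitted - y.mean()) ** 2)  # regression sum of squares
    ss_res = np.sum((y - fitted) ** 2)         # residual sum of squares
    df1, df2 = degree, len(y) - degree - 1
    f_stat = (ss_reg / df1) / (ss_res / df2)
    return coef, f_stat, df1, df2
```

The returned F statistic would be compared with the F distribution on (df1, df2) degrees of freedom, for instance through `scipy.stats.f.sf(f_stat, df1, df2)` if SciPy is available. A perfect fit gives a zero residual sum of squares, which this sketch does not guard against.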
AN EFFECTIVE TECHNIQUE OF MULTIPLE IMPUTATION IN NONPARAMETRIC QUANTILE REGRESSION
Yanan Hu; Qianqian Zhu; Maozai Tian
2014-01-01
In this study, we consider the nonparametric quantile regression model with the covariates Missing at Random (MAR). Multiple imputation is becoming an increasingly popular approach for analyzing missing data, which combined with quantile regression is not well-developed. We propose an effective and accurate two-stage multiple imputation method for the model based on the quantile regression, which consists of initial imputation in the first stage and multiple imputation in the second stage. Th...
Four Assumptions of Multiple Regression That Researchers Should Always Test.
Osborne, Jason W.; Waters, Elaine
2002-01-01
Discusses assumptions of multiple regression that are not robust to violation: linearity, reliability of measurement, homoscedasticity, and normality. Stresses the importance of checking assumptions. (SLD)
Torrecilla, José S; García, Julián; Rojo, Ester; Rodríguez, Francisco
2009-05-15
Multiple linear regression (MLR), radial basis network (RB), and multilayer perceptron (MLP) neural network (NN) models have been explored for the estimation of the toxicity of ammonium, imidazolium, morpholinium, phosphonium, piperidinium, pyridinium, pyrrolidinium and quinolinium ionic liquid salts in the Leukemia Rat Cell Line (IPC-81) and Acetylcholinesterase (AChE) using only their empirical formulas (elemental composition) and molecular weights. The toxicity values were estimated as decadic logarithms of the half maximal effective concentration (EC50) in µM (log10 EC50). The models' performance was analyzed by statistical parameters, analysis of residuals, and central tendency and statistical dispersion tests. The MLP model estimates the log10 EC50 in IPC-81 and AChE with a mean prediction error of less than 2.2 and 3.8%, respectively. PMID:18805639
Chelgani, S.C.; Hart, B.; Grady, W.C.; Hower, J.C.
2011-01-01
The relationship between maceral content plus mineral matter and gross calorific value (GCV) for a wide range of West Virginia coal samples (from 6518 to 15330 BTU/lb; 15.16 to 35.66 MJ/kg) has been investigated by multivariable regression and an adaptive neuro-fuzzy inference system (ANFIS). The stepwise least squares comparison between liptinite, vitrinite, plus mineral matter as input data sets and the measured GCV gave a nonlinear correlation coefficient (R²) of 0.83. Using the same data set, the correlation between the GCV predicted by the ANFIS model and the actual GCV gave an R² value of 0.96. It was determined that the GCV-based prediction methods, as used in this article, can provide a reasonable estimation of GCV. Copyright © Taylor & Francis Group, LLC.
Suppression Situations in Multiple Linear Regression
Shieh, Gwowen
2006-01-01
This article proposes alternative expressions for the two most prevailing definitions of suppression without resorting to the standardized regression modeling. The formulation provides a simple basis for the examination of their relationship. For the two-predictor regression, the author demonstrates that the previous results in the literature are…
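Illustrative note: for the two-predictor case mentioned above, suppression can be checked numerically from the three pairwise correlations. The sketch below uses the textbook correlation-based formulas for R² and the standardized coefficients, not the alternative expressions proposed in the article, which the truncated abstract does not show. The criterion in `shows_suppression` (R² exceeding the sum of the squared zero-order correlations) is one definition commonly used in the literature, and treating it as one of the article's two definitions is an assumption. Function names are hypothetical.

```python
def two_predictor_summary(r_y1, r_y2, r_12):
    """Return (R^2, beta1, beta2) for a two-predictor regression.

    r_y1, r_y2: correlations of each predictor with the criterion.
    r_12: correlation between the two predictors (|r_12| < 1).
    """
    denom = 1.0 - r_12 ** 2
    r2 = (r_y1 ** 2 + r_y2 ** 2 - 2.0 * r_y1 * r_y2 * r_12) / denom
    beta1 = (r_y1 - r_y2 * r_12) / denom
    beta2 = (r_y2 - r_y1 * r_12) / denom
    return r2, beta1, beta2


def shows_suppression(r_y1, r_y2, r_12):
    """Assumed criterion: R^2 exceeds the sum of squared zero-order correlations."""
    r2, _, _ = two_predictor_summary(r_y1, r_y2, r_12)
    return r2 > r_y1 ** 2 + r_y2 ** 2
```

For example, with r_y1 = 0.5, r_y2 = 0 and r_12 = 0.5, the second predictor is uncorrelated with the criterion yet raises R² from 0.25 to one third, the classical suppression pattern.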
Riccardi, M.; Mele, G.
2014-01-01
Leaf chlorophyll content provides valuable information about the physiological status of plants; it is directly linked to photosynthetic potential and primary production. In vitro assessment by wet chemical extraction is the standard method for leaf chlorophyll determination. This measurement is expensive, laborious, and time consuming. Over the years alternative methods, rapid and non-destructive, have been explored. The aim of this work was to evaluate the applicability of a fast and non-invasive field method for estimation of chlorophyll content in quinoa and amaranth leaves based on RGB components analysis of digital images acquired with a standard SLR camera. Digital images of leaves from different genotypes of quinoa and amaranth were acquired directly in the field. Mean values of each RGB component were evaluated via image analysis software and correlated to leaf chlorophyll provided by standard laboratory procedure. Single and multiple regression models using RGB color components as independent variables have been tested and validated. The performance of the proposed method was compared to that of the widely used non-destructive SPAD method. Sensitivity of the best regression models for different genotypes of quinoa and amaranth was also checked. Color data acquisition of the leaves in the field with a digital camera was quick, more effective, and lower cost than SPAD. The proposed RGB models provided better correlation (highest R²) and prediction (lowest RMSEP) of the true value of foliar chlorophyll content and had a lower amount of noise in the whole range of chlorophyll studied compared with SPAD and other leaf image processing based models when applied to quinoa and amaranth.
Fuzzy multiple linear regression: A computational approach
Juang, C. H.; Huang, X. H.; Fleming, J. W.
1992-01-01
This paper presents a new computational approach for performing fuzzy regression. In contrast to Bardossy's approach, the new approach, while dealing with fuzzy variables, closely follows the conventional regression technique. In this approach, treatment of fuzzy input is more 'computational' than 'symbolic.' The following sections first outline the formulation of the new approach, then deal with the implementation and computational scheme, and this is followed by examples to illustrate the new procedure.
Isolating and Examining Sources of Suppression and Multicollinearity in Multiple Linear Regression
Beckstead, Jason W.
2012-01-01
The presence of suppression (and multicollinearity) in multiple regression analysis complicates interpretation of predictor-criterion relationships. The mathematical conditions that produce suppression in regression analysis have received considerable attention in the methodological literature but until now nothing in the way of an analytic…
Principal component regression analysis with SPSS.
Liu, R X; Kuang, J; Gong, Q; Hou, X L
2003-06-01
The paper introduces the indices used to diagnose multicollinearity, the basic principle of principal component regression, and the method for determining the 'best' equation. An example shows how to carry out principal component regression analysis with SPSS 10.0, including all calculation steps of the principal component regression and all operations of the linear regression, factor analysis, descriptives, compute variable and bivariate correlations procedures in SPSS 10.0. Principal component regression analysis can be used to overcome the disturbance caused by multicollinearity. With SPSS, the analysis is simplified, faster and accurate. PMID:12758135
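Illustrative note: the principle of principal component regression described above can be sketched outside SPSS. The NumPy code below is an assumed, minimal implementation and does not reproduce the SPSS 10.0 procedures from the paper. It standardizes the predictors, extracts components from their correlation matrix, regresses the centered response on the retained component scores, and maps the result back to the original predictor scale. The function name `pcr_fit` and the choice of how many components to keep are hypothetical.

```python
import numpy as np


def pcr_fit(X, y, n_components):
    """Principal component regression on standardized predictors.

    Returns (intercept, coefficients) on the original predictor scale.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mean, std = X.mean(axis=0), X.std(axis=0, ddof=1)
    Z = (X - mean) / std
    # Eigen-decomposition of the correlation matrix of the predictors.
    eigval, eigvec = np.linalg.eigh(np.corrcoef(X, rowvar=False))
    order = np.argsort(eigval)[::-1][:n_components]
    V = eigvec[:, order]        # loadings of the retained components
    scores = Z @ V              # component scores (already centered)
    gamma = np.linalg.lstsq(scores, y - y.mean(), rcond=None)[0]
    beta_std = V @ gamma        # coefficients for standardized predictors
    beta = beta_std / std       # coefficients on the original scale
    intercept = y.mean() - mean @ beta
    return intercept, beta
```

Keeping all components reproduces the ordinary least squares fit. Dropping the components with the smallest eigenvalues is what stabilizes the coefficients when the predictors are multicollinear, at the price of some bias.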
Estimation of transport airplane aerodynamics using multiple stepwise regression
Keskar, D. A.; Klein, V.; Batterson, J. G.
1985-01-01
This paper presents an application of multiple stepwise regression to the flight test data of a typical transport airplane. The flight test data was carefully preprocessed to eliminate aliasing, time skews and high frequency noise. The data consisted both of basic certification maneuvers, such as wind-up-turns and maneuvers suitable for parameter estimation, such as responses to elevator pulses and doublets. It is shown that the results of multiple stepwise regression techniques compare favorably with the results obtained from maximum likelihood estimation. Finally, it is concluded that multiple stepwise regression could be a fast economical way to estimate transport airplane aerodynamics.
Regression Commonality Analysis: A Technique for Quantitative Theory Building
Nimon, Kim; Reio, Thomas G., Jr.
2011-01-01
When it comes to multiple linear regression analysis (MLR), it is common for social and behavioral science researchers to rely predominately on beta weights when evaluating how predictors contribute to a regression model. Presenting an underutilized statistical technique, this article describes how organizational researchers can use commonality…
A Constrained Linear Estimator for Multiple Regression
Davis-Stober, Clintin P.; Dana, Jason; Budescu, David V.
2010-01-01
"Improper linear models" (see Dawes, Am. Psychol. 34:571-582, "1979"), such as equal weighting, have garnered interest as alternatives to standard regression models. We analyze the general circumstances under which these models perform well by recasting a class of "improper" linear models as "proper" statistical models with a single predictor. We…
Bry, Xavier; Verron, Thomas; Cazes, Pierre
2008-01-01
A variable group Y is assumed to depend upon R thematic variable groups X_1, ..., X_R. We assume that components in Y depend linearly upon components in the X_r's. In this work, we propose a multiple covariance criterion which extends that of PLS regression to this multiple predictor groups situation. On this criterion, we build a PLS-type exploratory method - Structural Equation Exploratory Regression (SEER) - that makes it possible to simultaneously perform dimension reduction in gr...
Regression analysis with categorized regression calibrated exposure: some interesting findings
Hjartåker Anette
2006-07-01
Background: Regression calibration as a method for handling measurement error is becoming increasingly well-known and used in epidemiologic research. However, the standard version of the method is not appropriate for exposure analyzed on a categorical (e.g. quintile) scale, an approach commonly used in epidemiologic studies. A tempting solution could then be to use the predicted continuous exposure obtained through the regression calibration method and treat it as an approximation to the true exposure, that is, include the categorized calibrated exposure in the main regression analysis. Methods: We use semi-analytical calculations and simulations to evaluate the performance of the proposed approach compared to the naive approach of not correcting for measurement error, in situations where analyses are performed on quintile scale and when incorporating the original scale into the categorical variables, respectively. We also present analyses of real data, containing measures of folate intake and depression, from the Norwegian Women and Cancer study (NOWAC). Results: In cases where extra information is available through replicated measurements and not validation data, regression calibration does not maintain important qualities of the true exposure distribution, thus estimates of variance and percentiles can be severely biased. We show that the outlined approach maintains much, in some cases all, of the misclassification found in the observed exposure. For that reason, regression analysis with the corrected variable included on a categorical scale is still biased. In some cases the corrected estimates are analytically equal to those obtained by the naive approach. Regression calibration is however vastly superior to the naive method when applying the medians of each category in the analysis.
Conclusion Regression calibration in its most well-known form is not appropriate for measurement error correction when the exposure is analyzed on a percentile scale. Relating back to the original scale of the exposure solves the problem. The conclusion regards all regression models.
G. Ugrasen
2014-05-01
Wire Electrical Discharge Machining (WEDM) is a specialized thermal machining process capable of accurately machining parts with varying hardness or complex shapes, which have sharp edges that are very difficult to machine by mainstream machining processes. In WEDM a specific wire run-off speed is applied to compensate for wear and avoid wire breakage. Since the workpiece generally stays stationary and short discharge durations are applied, the relative displacement between wire and workpiece during one single discharge is very small. This study outlines the development of a model and its application to optimize WEDM machining parameters using Taguchi's technique, which is based on robust design. The present study outlines electrode wear estimation in wire EDM. EN-8 and EN-19 were machined using different process parameters based on an L16 orthogonal array. Among the process parameters, voltage and flush rate were kept constant, while bed speed, current, pulse-on and pulse-off were varied. A molybdenum wire with a diameter of 0.18 mm was used as the electrode. Electrode wear was measured using a universal measuring machine. Estimation and comparison of electrode wear were done using multiple regression analysis (MRA) and the group method of data handling (GMDH) technique. The results showed that the measured and estimated electrode wear correlate better for MRA than for GMDH.
International Nuclear Information System (INIS)
Various statistical techniques were used on five-year data from 1998-2002 of average humidity, rainfall, and maximum and minimum temperatures. The relationships to regression analysis time series (RATS) were developed for determining the overall trend of these climate parameters, on the basis of which forecast models can be corrected and modified. We computed the coefficient of determination as a measure of goodness of fit to our polynomial regression analysis time series (PRATS). The correlations to multiple linear regression (MLR) and multiple linear regression analysis time series (MLRATS) were also developed for deciphering the interdependence of weather parameters. Spearman's rank correlation and the Goldfeld-Quandt test were used to check the uniformity or non-uniformity of variances in our fit to polynomial regression (PR). The Breusch-Pagan test was applied to MLR and MLRATS, and indicated homoscedasticity in both. We also employed Bartlett's test for homogeneity of variances on five-year data of rainfall and humidity, which showed that the variances were not homogeneous for rainfall but were homogeneous for humidity. Our results on regression and regression analysis time series show the best fit to prediction modeling on climatic data of Quetta, Pakistan. (author)
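One way to read the Spearman-based variance check mentioned above is as a rank correlation between the predictor and the absolute residuals of a fit. The sketch below assumes a simple straight-line fit rather than the polynomial fit used in the study, and all names and data are hypothetical.

```python
def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))


def abs_residuals(xs, ys):
    """Absolute residuals from a least-squares straight line (an assumption)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return [abs(y - (intercept + slope * x)) for x, y in zip(xs, ys)]


# Hypothetical data whose scatter grows with x.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.0, 4.1, 5.8, 8.4, 9.5, 12.9, 13.0, 17.2]
rho = spearman(xs, abs_residuals(xs, ys))
print(rho)
```

A rank correlation far from zero would suggest that residual spread changes with the predictor. A formal test would also need a significance calculation, which is omitted here.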
Fuzzy Multiple Regression Model for Estimating Software Development Time
Venus Marza; Mir Ali Seyyedi
2009-01-01
As software becomes more complex and its scope dramatically increases, the importance of research on developing methods for estimating software development time has steadily increased, so accurate estimation is the main goal of software managers for reducing the risks of projects. The purpose of this article is to introduce a new Fuzzy Multiple Regression approach, which has higher accuracy than other estimation methods. Furthermore, we compare Fuzzy Multiple Regression model with Fuzzy...
Using Cigarette Data for An Introduction to Multiple Regression
McIntyre, Lauren
This article, created by Lauren McIntyre of North Carolina State University, describes a dataset containing information for twenty-five brands of domestic cigarettes. The information collected includes measurements of weight, tar, nicotine and carbon monoxide. The dataset can be used to illustrate multiple regression, outliers, and collinearity. Speaking to this, the author states: "The dataset is useful for introducing the ideas of multiple regression and provides examples of an outlier and a pair of collinear variables."
Effects of associated kernels in nonparametric multiple regressions
Somé, Sobom M.; Kokonendji, Célestin C.
2015-01-01
Associated kernels have been introduced to improve the classical continuous kernels for smoothing any functional on several kinds of supports such as bounded continuous and discrete sets. This work deals with the effects of combined associated kernels on nonparametric multiple regression functions. Using the Nadaraya-Watson estimator with optimal bandwidth matrices selected by cross-validation procedure, different behaviours of multiple regression estimations are pointed out...
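As a rough illustration of the estimator named in this abstract, the sketch below implements a univariate Nadaraya-Watson estimator with a leave-one-out cross-validated bandwidth. It assumes a classical Gaussian kernel and a scalar bandwidth, not the associated kernels or bandwidth matrices studied in the paper; the data and candidate bandwidths are hypothetical.

```python
import math


def gaussian_kernel(u):
    """Standard Gaussian kernel (a stand-in for the paper's associated kernels)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nadaraya_watson(x0, xs, ys, h):
    """Kernel-weighted average of ys, evaluated at x0 with bandwidth h."""
    weights = [gaussian_kernel((x0 - x) / h) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)


def loo_cv_score(xs, ys, h):
    """Mean squared leave-one-out prediction error for bandwidth h."""
    total = 0.0
    for i in range(len(xs)):
        x_rest = xs[:i] + xs[i + 1:]
        y_rest = ys[:i] + ys[i + 1:]
        total += (ys[i] - nadaraya_watson(xs[i], x_rest, y_rest, h)) ** 2
    return total / len(xs)


# Hypothetical sample and candidate bandwidths.
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
ys = [0.1, 0.4, 0.9, 1.0, 0.8, 0.6, 0.1]
candidates = [0.2, 0.4, 0.8, 1.6]
best_h = min(candidates, key=lambda h: loo_cv_score(xs, ys, h))
print(best_h, nadaraya_watson(1.2, xs, ys, best_h))
```

The estimate at any point is a weighted average of the observed responses, so it always lies between the smallest and largest observed value.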
Schaeck, S.; Karspeck, T.; Ott, C.; Weirather-Koestner, D.; Stoermer, A. O.
2011-03-01
In the first part of this work [1] a field operational test (FOT) on micro-HEVs (hybrid electric vehicles) and conventional vehicles was introduced. Valve-regulated lead-acid (VRLA) batteries in absorbent glass mat (AGM) technology and flooded batteries were applied. The FOT data were analyzed by kernel density estimation. In this publication multiple regression analysis is applied to the same data. Square regression models without interdependencies are used. Hereby, capacity loss serves as dependent parameter and several battery-related and vehicle-related parameters as independent variables. Battery temperature is found to be the most critical parameter. It is proven that flooded batteries operated in the conventional power system (CPS) degrade faster than VRLA-AGM batteries in the micro-hybrid power system (MHPS). A smaller number of FOT batteries were applied in a vehicle-assigned test design where the test battery is repeatedly mounted in a unique test vehicle. Thus, vehicle category and specific driving profiles can be taken into account in multiple regression. Both parameters have only secondary influence on battery degradation, instead, extended vehicle rest time linked to low mileage performance is more serious. A tear-down analysis was accomplished for selected VRLA-AGM batteries operated in the MHPS. Clear indications are found that pSoC-operation with periodically fully charging the battery (refresh charging) does not result in sulphation of the negative electrode. Instead, the batteries show corrosion of the positive grids and weak adhesion of the positive active mass.
Sample Sizes when Using Multiple Linear Regression for Prediction
Knofczynski, Gregory T.; Mundfrom, Daniel
2008-01-01
When using multiple regression for prediction purposes, the issue of minimum required sample size often needs to be addressed. Using a Monte Carlo simulation, models with varying numbers of independent variables were examined and minimum sample sizes were determined for multiple scenarios at each number of independent variables. The scenarios…
Vehicle Travel Time Predication based on Multiple Kernel Regression
Wenjing Xu
2014-07-01
With the rapid development of transportation and the logistics economy, vehicle travel time prediction and planning have become an important topic in logistics. Travel time prediction, which is indispensable for traffic guidance, has become a key issue for researchers in this field. At present, the prediction of travel time is mainly short-term prediction, and the prediction methods include artificial neural networks, the Kalman filter and support vector regression (SVR). However, these algorithms still have some shortcomings, such as high computational complexity and slow convergence rate. This paper exploits the learning ability of multiple kernel learning regression (MKLR) in nonlinear prediction and applies MKLR-based logistics planning to vehicle travel time prediction. The method for vehicle travel time prediction includes the following steps: (1) preprocessing historical data; (2) selecting an appropriate kernel function, training on the historical data and performing analysis; (3) predicting the vehicle travel time based on the trained model. The experimental results show that, compared with other prediction methods, the vehicle travel time prediction method proposed in this paper achieves higher accuracy. They also illustrate the feasibility and effectiveness of the proposed prediction method.
On relationship between regression models and interpretation of multiple regression coefficients
Varaksin, A. N.; Panov, V. G.
2012-01-01
In this paper, we consider the problem of treating linear regression equation coefficients in the case of correlated predictors. It is shown that in general there are no natural ways of interpreting these coefficients similar to the case of single predictor. Nevertheless we suggest linear transformations of predictors, reducing multiple regression to a simple one and retaining the coefficient at variable of interest. The new variable can be treated as the part of the old var...
Functional linear regression via canonical analysis
He, Guozhong; Wang, Jane-Ling; Yang, Wenjing; 10.3150/09-BEJ228
2011-01-01
We study regression models for the situation where both dependent and independent variables are square-integrable stochastic processes. Questions concerning the definition and existence of the corresponding functional linear regression models and some basic properties are explored for this situation. We derive a representation of the regression parameter function in terms of the canonical components of the processes involved. This representation establishes a connection between functional regression and functional canonical analysis and suggests alternative approaches for the implementation of functional linear regression analysis. A specific procedure for the estimation of the regression parameter function using canonical expansions is proposed and compared with an established functional principal component regression approach. As an example of an application, we present an analysis of mortality data for cohorts of medflies, obtained in experimental studies of aging and longevity.
Steganalysis of LSB Image Steganography using Multiple Regression and Auto Regressive (AR) Model
Directory of Open Access Journals (Sweden)
Souvik Bhattacharyya
2011-07-01
The staggering growth in communication technology and usage of public domain channels (i.e. the Internet) has greatly facilitated the transfer of data. However, such open communication channels have greater vulnerability to security threats causing unauthorized information access. Traditionally, encryption is used to realize the communication security. However, important information is not protected once decoded. Steganography is the art and science of communicating in a way which hides the existence of the communication. Important information is firstly hidden in host data, such as a digital image, text, video or audio, and then transmitted secretly to the receiver. Steganalysis is another important topic in information hiding, which is the art of detecting the presence of steganography. In this paper a novel technique for the steganalysis of images is presented. The proposed technique uses an auto-regressive model to detect the presence of hidden messages, as well as to estimate the relative length of the embedded messages. Various auto-regressive parameters are used to classify cover images as well as stego images with the help of an SVM classifier. Multiple regression analysis of the cover carrier along with the stego carrier has been carried out in order to find out the existence of a negligible amount of the secret message. Experimental results demonstrate the effectiveness and accuracy of the proposed technique.
Analysing Conjoint Analysis Data by a Random Coefficient Regression Model
Furlan, Roberto; Corradetti, Roberto
2005-01-01
Since the late 1960s, conjoint analysis has been applied in estimating consumer preferences in marketing research. This article discusses how to model the data coming from a full or a fractional factorial design within a unique regression model, as an alternative to the estimation done by n independent multiple linear regression models, one for each subject. The advantage of the method presented here resides in the possibility of computing correct standard errors for the conjoint analysis utility ...
Significant Tests of Coefficient Multiple Regressions by using Permutation Methods
Ali Shadrokh
2011-01-01
Tests of significance of a single partial regression coefficient in a multiple regression model are often made in situations where the standard assumptions underlying the probability calculation (for example, the assumption of normality of the random error term) do not hold. When the random error term fails to fulfill some of these assumptions, one needs to resort to other nonparametric methods to carry out statistical inference. Permutation methods are a branch of nonparametric methods. This study c...
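The idea of a permutation test for a regression coefficient can be sketched in its simplest form. The example below assumes a single predictor, so it tests an ordinary slope rather than a partial coefficient in a multiple regression, and it is not the procedure of the study. The data, the number of permutations and the seed are hypothetical.

```python
import random


def slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def permutation_p_value(xs, ys, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the hypothesis of zero slope."""
    rng = random.Random(seed)
    observed = abs(slope(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        # Shuffling the responses breaks any association with xs.
        rng.shuffle(shuffled)
        if abs(slope(xs, shuffled)) >= observed:
            hits += 1
    # Adding one to both counts keeps the p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


# Hypothetical data with a strong linear trend.
xs = list(range(1, 11))
ys = [2.1, 3.9, 6.2, 8.1, 9.8, 12.3, 13.9, 16.2, 17.8, 20.1]
p_value = permutation_p_value(xs, ys)
print(p_value)
```

No normality assumption enters the calculation: the reference distribution is built from the data by reshuffling, which is what makes the approach attractive when the error term is not normal.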
Hierarchical regression for epidemiologic analyses of multiple exposures.
Greenland, S
1994-01-01
Many epidemiologic investigations are designed to study the effects of multiple exposures. Most of these studies are analyzed either by fitting a risk-regression model with all exposures forced in the model, or by using a preliminary-testing algorithm, such as stepwise regression, to produce a smaller model. Research indicates that hierarchical modeling methods can outperform these conventional approaches. These methods are reviewed and compared to two hierarchical methods, empirical-Bayes re...
Computing multiple-output regression quantile regions from projection quantiles.
Paindaveine, D.; Šiman, Miroslav
2012-01-01
Vol. 27, No. 1 (2012), pp. 29-49. ISSN 0943-4062 R&D Projects: GA MŠk(CZ) 1M06047 Institutional research plan: CEZ:AV0Z10750506 Keywords: directional quantile * halfspace depth * multiple-output regression * parametric programming * quantile regression Subject RIV: BA - General Mathematics Impact factor: 0.482, year: 2012 http://library.utia.cas.cz/separaty/2012/SI/siman-0376414.pdf
Bry, Xavier; Cazes, Pierre
2008-01-01
A variable group Y is assumed to depend upon R thematic variable groups X_1, ..., X_R. We assume that components in Y depend linearly upon components in the X_r's. In this work, we propose a multiple covariance criterion which extends that of PLS regression to this multiple predictor groups situation. On this criterion, we build a PLS-type exploratory method - Structural Equation Exploratory Regression (SEER) - that allows one to simultaneously perform dimension reduction in groups and investigate the linear model of the components. SEER uses the multidimensional structure of each group. An application example is given.
Applied regression analysis a research tool
Pantula, Sastry; Dickey, David
1998-01-01
Least squares estimation, when used appropriately, is a powerful research tool. A deeper understanding of the regression concepts is essential for achieving optimal benefits from a least squares analysis. This book builds on the fundamentals of statistical methods and provides appropriate concepts that will allow a scientist to use least squares as an effective research tool. Applied Regression Analysis is aimed at the scientist who wishes to gain a working knowledge of regression analysis. The basic purpose of this book is to develop an understanding of least squares and related statistical methods without becoming excessively mathematical. It is the outgrowth of more than 30 years of consulting experience with scientists and many years of teaching an applied regression course to graduate students. Applied Regression Analysis serves as an excellent text for a service course on regression for non-statisticians and as a reference for researchers. It also provides a bridge between a two-semester introduction to...
Guoli Wang; Jianhui Wu; Jianhua Wu; Xiaohong Wang
2011-01-01
Both linear neural network and multiple linear regression models can be used for multi-factor analysis and forecasting, but the data of the multiple linear regression model are required to meet such conditions as independence and normality, while the data of the linear neural network are only required to have a linear relationship. This article uses the same set of data to establish respectively a linear neural network model and a multiple linear regression model, compares the abilities of fi...
Chen Su-Fen
2013-01-01
Unified Multiple Linear Regression (UMLR) is a nonlinear programming model that unifies all kinds of multiple linear regression models, such as Principal Components Regression, Ridge Regression, Robust Regression and constrained regression. Although UMLR has exhibited excellent performance in some real applications, the optimization procedure is not yet satisfactory. This study proposes a novel Granular Computing-Particle Swarm Optimization (Grc-PSO) algorithm by ...
Standardized Regression Coefficients as Indices of Effect Sizes in Meta-Analysis
Kim, Rae Seon
2011-01-01
When conducting a meta-analysis, it is common to find many collected studies that report regression analyses, because multiple regression analysis is widely used in many fields. Meta-analysis uses effect sizes drawn from individual studies as a means of synthesizing a collection of results. However, indices of effect size from regression analyses…
Interpreting Multiple Linear Regression: A Guidebook of Variable Importance
Nathans, Laura L.; Oswald, Frederick L.; Nimon, Kim
2012-01-01
Multiple regression (MR) analyses are commonly employed in social science fields. It is also common for interpretation of results to typically reflect overreliance on beta weights, often resulting in very limited interpretations of variable importance. It appears that few researchers employ other methods to obtain a fuller understanding of what…
Halil Ibrahim Cebeci
2009-12-01
This study explores the relationship between student performance and instructional design. The research was conducted at the E-Learning School at a university in Turkey. A list of design factors that had potential influence on student success was created through a review of the literature and interviews with relevant experts. From this, the five most important design factors were chosen. The experts scored 25 university courses on the extent to which they demonstrated the chosen design factors. Multiple-regression and supervised artificial neural network (ANN) models were used to examine the relationship between student grade point averages and the scores on the five design factors. The results indicated that there is no statistical difference between the two models. Both models identified the use of examples and applications as the most influential factor. The ANN model provided more information and was used to predict the course-specific factor values required for a desired level of success.
Whitlock, C. H., III
1977-01-01
Constituents with linear radiance gradients with concentration may be quantified from signals which contain nonlinear atmospheric and surface reflection effects for both homogeneous and non-homogeneous water bodies provided accurate data can be obtained and nonlinearities are constant with wavelength. Statistical parameters must be used which give an indication of bias as well as total squared error to ensure that an equation with an optimum combination of bands is selected. It is concluded that the effect of error in upwelled radiance measurements is to reduce the accuracy of the least squares fitting process and to increase the number of points required to obtain a satisfactory fit. The problem of obtaining a multiple regression equation that is extremely sensitive to error is discussed.
Li, Spencer D.
2011-01-01
Mediation analysis in child and adolescent development research is possible using large secondary data sets. This article provides an overview of two statistical methods commonly used to test mediated effects in secondary analysis: multiple regression and structural equation modeling (SEM). Two empirical studies are presented to illustrate the…
Linear regression analysis theory and computing
Yan, Xin
2009-01-01
This volume presents in detail the fundamental theories of linear regression analysis and diagnosis, as well as the relevant statistical computing techniques so that readers are able to actually model the data using the methods and techniques described in the book. It covers the fundamental theories in linear regression analysis and is extremely useful for future research in this area. The examples of regression analysis using the Statistical Application System (SAS) are also included. This book is suitable for graduate students who are either majoring in statistics/biostatistics or using line
Joint regression analysis and AMMI model applied to oat improvement
Oliveira, A.; Oliveira, T. A.; Mejza, S.
2012-09-01
In our work we present an application of some biometrical methods useful in genotype stability evaluation, namely AMMI model, Joint Regression Analysis (JRA) and multiple comparison tests. A genotype stability analysis of oat (Avena Sativa L.) grain yield was carried out using data of the Portuguese Plant Breeding Board, sample of the 22 different genotypes during the years 2002, 2003 and 2004 in six locations. In Ferreira et al. (2006) the authors state the relevance of the regression models and of the Additive Main Effects and Multiplicative Interactions (AMMI) model, to study and to estimate phenotypic stability effects. As computational techniques we use the Zigzag algorithm to estimate the regression coefficients and the agricolae-package available in R software for AMMI model analysis.
Modeling Oil Palm Yield Using Multiple Linear Regression and Robust M-regression
Azme Khamis; Zuhaimy Ismail; Khalid Haron; Ahmad Tarmizi Mohammed
2006-01-01
This study shows how a multiple linear regression model can be used to model palm oil yield. The methods are illustrated by examining the time series data of foliar nutrient compositions as one of the independent variables and fresh fruit bunch as the dependent variable. Other independent variables include the nutrient balance ratio and major nutrient composition. This modeling approach is capable of identifying the significant contribution of each independent variable in improving the modelin...
The incremental diagnostic yield of clinical data, exercise ECG, stress thallium scintigraphy, and cardiac fluoroscopy to predict coronary and multivessel disease was assessed in 171 symptomatic men by means of multiple logistic regression analyses. When clinical variables alone were analyzed, chest pain type and age were predictive of coronary disease, whereas chest pain type, age, a family history of premature coronary disease before age 55 years, and abnormal ST-T wave changes on the rest ECG were predictive of multivessel disease. The percentage of patients correctly classified by cardiac fluoroscopy (presence or absence of coronary artery calcification), exercise ECG, and thallium scintigraphy was 9%, 25%, and 50%, respectively, greater than for clinical variables, when the presence or absence of coronary disease was the outcome, and 13%, 25%, and 29%, respectively, when multivessel disease was studied; 5% of patients were misclassified. When the 37 clinical and noninvasive test variables were analyzed jointly, the most significant variable predictive of coronary disease was an abnormal thallium scan and for multivessel disease, the amount of exercise performed. The data from this study provide a quantitative model and confirm previous reports that optimal diagnostic efficacy is obtained when noninvasive tests are ordered sequentially. In symptomatic men, cardiac fluoroscopy is a relatively ineffective test when compared to exercise ECG and thallium scintigraphy.
Introducing Evolutionary Computing in Regression Analysis
Olcay Akman
A typical upper level undergraduate or first year graduate level regression course syllabus treats model selection with various stepwise regression methods. Here we implement evolutionary computing for subset model selection and accomplish two goals: i) introduce students to the powerful optimization method of genetic algorithms, and ii) transform a regression analysis course into a regression and modeling course without requiring any additional time or software commitment. Furthermore, we also employed the Akaike Information Criterion (AIC) as a measure of model fitness instead of another commonly used measure, R-square. The model selection tool uses Excel, which makes the procedure accessible to a very wide spectrum of interdisciplinary students with no specialized software requirement. An Excel macro, to be used as an instructional tool, is freely available through the author's website.
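To make the role of AIC as a fitness measure concrete, the sketch below scores every predictor subset of a small regression problem. It deliberately swaps the genetic algorithm described in the abstract for an exhaustive search, which is only feasible for a handful of predictors, and it is written in Python rather than Excel. It uses the Gaussian-model form AIC = n ln(RSS/n) + 2k up to an additive constant; all names and data are hypothetical.

```python
import itertools
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Assumes a is square and non-singular; no singularity check is made.
    """
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols_rss(columns, y):
    """Residual sum of squares of a least-squares fit with an intercept."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(design[0])
    xtx = [[sum(row[j] * row[k] for row in design) for k in range(p)]
           for j in range(p)]
    xty = [sum(row[j] * yi for row, yi in zip(design, y)) for j in range(p)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2
               for row, yi in zip(design, y))


def aic(rss, n, n_params):
    """AIC of a Gaussian linear model, up to an additive constant."""
    return n * math.log(rss / n) + 2 * n_params


def best_subset(predictors, y):
    """Return (AIC, subset) for the predictor subset with the lowest AIC."""
    best = None
    for size in range(len(predictors) + 1):
        for subset in itertools.combinations(list(predictors), size):
            rss = ols_rss([predictors[name] for name in subset], y)
            score = aic(rss, len(y), len(subset) + 1)
            if best is None or score < best[0]:
                best = (score, subset)
    return best


# Hypothetical data: y depends on x1 plus small noise; x2 is unrelated.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [2.0, 7.0, 1.0, 8.0, 2.0, 8.0, 1.0, 8.0]
noise = [0.1, -0.2, 0.15, -0.05, 0.2, -0.1, -0.15, 0.05]
y = [3.0 + 2.0 * a + e for a, e in zip(x1, noise)]
score, subset = best_subset({"x1": x1, "x2": x2}, y)
print(score, subset)
```

A genetic algorithm would replace the exhaustive loop in `best_subset` with a population of candidate subsets evolved toward lower AIC, which matters once the number of predictors makes enumeration impractical.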
Two SPSS programs for interpreting multiple regression results.
Lorenzo-Seva, Urbano; Ferrando, Pere J; Chico, Eliseo
2010-02-01
When multiple regression is used in explanation-oriented designs, it is very important to determine both the usefulness of the predictor variables and their relative importance. Standardized regression coefficients are routinely provided by commercial programs. However, they generally function rather poorly as indicators of relative importance, especially in the presence of substantially correlated predictors. We provide two user-friendly SPSS programs that implement currently recommended techniques and recent developments for assessing the relevance of the predictors. The programs also allow the user to take into account the effects of measurement error. The first program, MIMR-Corr.sps, uses a correlation matrix as input, whereas the second program, MIMR-Raw.sps, uses the raw data and computes bootstrap confidence intervals of different statistics. The SPSS syntax, a short manual, and data files related to this article are available as supplemental materials from http://brm.psychonomic-journals.org/content/supplemental. PMID:20160283
A multiple regression model for the Ft. Calhoun reactor coolant pump system
International Nuclear Information System (INIS)
Multiple regression analysis is one of the most widely used of all statistical tools. In this research paper, we introduce an application of fitting a multiple regression model on reactor coolant pump (RCP) data. The primary purpose of this research is to correlate the results obtained by Design of Experiments (DOE) and regression model fitting. Also, the idea behind using a regression model is to gain more detailed information from the RCP data than is provided by DOE. In engineering science, statistical quality control techniques have traditionally been applied to control manufacturing processes. An application to commercial nuclear power plant maintenance and control is presented that can greatly improve plant safety and reliability. The results obtained show that six out of ten parameters are under control specification limits and four parameters are not in a state of statistical control. The four parameters that are out of control adversely affect the regression model fitting, and the final prediction equation therefore does not accurately predict future responses. The analysis concludes that in order to fit the best regression model, one has to remove all out-of-control points from the data set, including dropping a variable from the model to obtain better prediction of the response variable. (author)
Multiple predictor smoothing methods for sensitivity analysis
International Nuclear Information System (INIS)
The use of multiple predictor smoothing methods in sampling-based sensitivity analyses of complex models is investigated. Specifically, sensitivity analysis procedures based on smoothing methods employing the stepwise application of the following nonparametric regression techniques are described: (1) locally weighted regression (LOESS), (2) additive models, (3) projection pursuit regression, and (4) recursive partitioning regression. The indicated procedures are illustrated with both simple test problems and results from a performance assessment for a radioactive waste disposal facility (i.e., the Waste Isolation Pilot Plant). As shown by the example illustrations, the use of smoothing procedures based on nonparametric regression techniques can yield more informative sensitivity analysis results than can be obtained with more traditional sensitivity analysis procedures based on linear regression, rank regression or quadratic regression when nonlinear relationships between model inputs and model predictions are present
Multiple predictor smoothing methods for sensitivity analysis.
Helton, Jon Craig; Storlie, Curtis B.
2006-08-01
The use of multiple predictor smoothing methods in sampling-based sensitivity analyses of complex models is investigated. Specifically, sensitivity analysis procedures based on smoothing methods employing the stepwise application of the following nonparametric regression techniques are described: (1) locally weighted regression (LOESS), (2) additive models, (3) projection pursuit regression, and (4) recursive partitioning regression. The indicated procedures are illustrated with both simple test problems and results from a performance assessment for a radioactive waste disposal facility (i.e., the Waste Isolation Pilot Plant). As shown by the example illustrations, the use of smoothing procedures based on nonparametric regression techniques can yield more informative sensitivity analysis results than can be obtained with more traditional sensitivity analysis procedures based on linear regression, rank regression or quadratic regression when nonlinear relationships between model inputs and model predictions are present.
Regression analysis using dependent Polya trees.
Schörgendorfer, Angela; Branscum, Adam J
2013-11-30
Many commonly used models for linear regression analysis force overly simplistic shape and scale constraints on the residual structure of data. We propose a semiparametric Bayesian model for regression analysis that produces data-driven inference by using a new type of dependent Polya tree prior to model arbitrary residual distributions that are allowed to evolve across increasing levels of an ordinal covariate (e.g., time, in repeated measurement studies). By modeling residual distributions at consecutive covariate levels or time points using separate, but dependent Polya tree priors, distributional information is pooled while allowing for broad pliability to accommodate many types of changing residual distributions. We can use the proposed dependent residual structure in a wide range of regression settings, including fixed-effects and mixed-effects linear and nonlinear models for cross-sectional, prospective, and repeated measurement data. A simulation study illustrates the flexibility of our novel semiparametric regression model to accurately capture evolving residual distributions. In an application to immune development data on immunoglobulin G antibodies in children, our new model outperforms several contemporary semiparametric regression models based on a predictive model selection criterion. PMID:23839794
Determining School Effectiveness Following a Regression Analysis.
Convey, John J.
Three methods that can be used subsequent to a regression analysis to determine the relative effectiveness of schools are Dyer's Performance Indices, Scheffe's hyperbolic confidence bands, and Gafarian's linear confidence bands. These methods were applied to data from 54 hypothetical schools randomly generated from a multivariate normal…
Tools to Support Interpreting Multiple Regression in the Face of Multicollinearity
AmandaKraha; LindaZientek
2012-01-01
While multicollinearity may increase the difficulty of interpreting multiple regression results, it should not cause undue problems for the knowledgeable researcher. In the current paper, we argue that rather than using one technique to investigate regression results, researchers should consider multiple indices to understand the contributions that predictors make not only to a regression model, but to each other as well. Some of the techniques to interpret multiple regression effects include...
Fiscal Multipliers: A Meta Regression Analysis
Gechert, Sebastian; Will, Henner
2012-01-01
Since the fiscal expansion during the Great Recession 2008-2009 and the current European consolidation and austerity measures, the analysis of fiscal multiplier effects is back on the scientific agenda. The number of empirical studies is growing fast, tackling the issue with manifold model classes, identification strategies, and specifications. While plurality of methods seems to be a good idea to address a complicated issue, the results are far off consensus. We apply meta regression analysi...
FRATS: Functional Regression Analysis of DTI Tract Statistics
Zhu, Hongtu.; Styner, Martin; Tang, Niansheng; Liu, Zhexing; Lin, Weili; Gilmore, John H
2010-01-01
Diffusion tensor imaging (DTI) provides important information on the structure of white matter fiber bundles as well as detailed tissue properties along these fiber bundles in vivo. This paper presents a functional regression framework, called FRATS, for the analysis of multiple diffusion properties along fiber bundle as functions in an infinite dimensional space and their association with a set of covariates of interest, such as age, diagnostic status and gender, in real applications. The fu...
M. Srinivasan
2012-01-01
Problem statement: This study presents a novel method for determining the average winding temperature rise of transformers under predetermined field operating conditions. The rise in winding temperature was determined from the estimated values of winding resistance during the heat run test conducted as per the IEC standard. Approach: The estimation of hot resistance was modeled using Multiple Variable Regression (MVR), Multiple Polynomial Regression (MPR) and soft computing techniques such as Artificial Neural Network (ANN) and Adaptive Neuro Fuzzy Inference System (ANFIS). The modeled hot resistance will help to find the load losses at any load situation without using a complicated measurement setup in transformers. Results: These techniques were applied to hot resistance estimation for a dry-type transformer using the input variables cold resistance, ambient temperature and temperature rise. The results are compared and show good agreement between measured and computed values. Conclusion: The proposed methods are verified using experimental results obtained from a temperature rise test performed on a 55 kVA dry-type transformer.
Precipitation interpolation in mountainous regions using multiple linear regression
Hay, L.; Viger, R.; McCabe, G.
1998-01-01
Multiple linear regression (MLR) was used to spatially interpolate precipitation for simulating runoff in the Animas River basin of southwestern Colorado. MLR equations were defined for each time step using measured precipitation as dependent variables. Explanatory variables used in each MLR were derived for the dependent variable locations from a digital elevation model (DEM) using a geographic information system. The same explanatory variables were defined for a 5 × 5 km grid of the DEM. For each time step, the best MLR equation was chosen and used to interpolate precipitation onto the 5 × 5 km grid. The gridded values of precipitation provide a physically-based estimate of the spatial distribution of precipitation and result in reliable simulations of daily runoff in the Animas River basin.
Multiple regression models for energy use in air-conditioned office buildings in different climates
International Nuclear Information System (INIS)
An attempt was made to develop multiple regression models for office buildings in the five major climates in China: severe cold, cold, hot summer and cold winter, mild, and hot summer and warm winter. A total of 12 key building design variables were identified through parametric and sensitivity analysis, and considered as inputs in the regression models. The coefficient of determination R² varies from 0.89 in Harbin to 0.97 in Kunming, indicating that 89-97% of the variation in annual building energy use can be explained by changes in the 12 parameters. A pseudo-random number generator based on three simple multiplicative congruential generators was employed to generate random designs for evaluation of the regression models. The differences between regression-predicted and DOE-simulated annual building energy use are largely within 10%. It is envisaged that the regression models developed can be used to estimate the likely energy savings/penalty during the initial design stage when different building schemes and design concepts are being considered.
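As a small illustration of how a coefficient of determination such as the one quoted in this abstract is computed, the sketch below evaluates R² as 1 − SS_res/SS_tot. The function name `r_squared` and the sample numbers are hypothetical and are not taken from the study.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: what the model fails to explain.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: variation of the observations around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical values, for illustration only (not data from the study).
energy_simulated = [100.0, 120.0, 140.0, 160.0]
energy_predicted = [102.0, 118.0, 141.0, 159.0]
print(r_squared(energy_simulated, energy_predicted))
```

A value of 0.89 would mean that 89% of the variation of the simulated energy use around its mean is reproduced by the regression predictions.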
Multiple Retrieval Models and Regression Models for Prior Art Search
Lopez, Patrice; Romary, Laurent
2009-01-01
This paper presents the system called PATATRAS (PATent and Article Tracking, Retrieval and AnalysiS) realized for the IP track of CLEF 2009. Our approach presents three main characteristics: 1. The usage of multiple retrieval models (KL, Okapi) and term index definitions (lemma, phrase, concept) for the three languages considered in the present track (English, French, German) producing ten different sets of ranked results. 2. The merging of the different results based on mul...
Stukel's Extended Logistic Regression Analysis with R
Vilda PURUTÇUOĞLU
2011-01-01
Objective: For a logistic regression model, the degree to which predicted probabilities agree with actual outcomes can be expressed as a classification table. Although crucial for checking model adequacy, such tables may differ slightly when the same data are modeled with different statistical packages. The underlying reason is that, when classifying a set of binary data, if the observations used to fit the model are also used to estimate the classification error, the resulting error-count estimate is biased. To cope with this problem, SAS suggests an algorithm, but the software is not publicly available. R is a freely downloadable program designed for statistical computation, including logistic regression analysis. The purpose of this study is to present a new function in R that carries out an extended logistic regression analysis of binary data, from the construction of its reduced-bias classification table to the inference of its model parameters, calling the lrm() function in the Design package where necessary. Material and Methods: The performance of ext.logreg() is evaluated in terms of the accuracy of estimates and computational cost. Results: From the results of two binary datasets, it is observed that ext.logreg() in R estimates the model parameters and constructs the unbiased classification table as accurately as the SAS PROC LOGISTIC procedure, without sacrificing computational efficiency. Conclusion: The freely downloadable ext.logreg() function can be seen as an alternative computational tool for logistic regression analysis when validation of predicted probabilities is essential.
Functional linear regression analysis for longitudinal data
Yao, Fang; Müller, Hans-Georg; Wang, Jane-Ling
2005-01-01
We propose nonparametric methods for functional linear regression which are designed for sparse longitudinal data, where both the predictor and response are functions of a covariate such as time. Predictor and response processes have smooth random trajectories, and the data consist of a small number of noisy repeated measurements made at irregular times for a sample of subjects. In longitudinal studies, the number of repeated measurements per subject is often small and may be modeled as a discrete random number and, accordingly, only a finite and asymptotically nonincreasing number of measurements are available for each subject or experimental unit. We propose a functional regression approach for this situation, using functional principal component analysis, where we estimate the functional principal component scores through conditional expectations. This allows the prediction of an unobserved response trajectory from sparse measurements of a predictor trajectory. The resulting technique is flexible and allow...
On connectivity of fibers with positive marginals in multiple logistic regression
Hara, Hisayuki; Takemura, Akimichi; Yoshida, Ruriko
2008-01-01
In this paper we consider exact tests of a multiple logistic regression, where the levels of covariates are equally spaced, via Markov bases. In the usual application of multiple logistic regression, the sample size is positive for each combination of levels of the covariates. In this case we do not need a whole Markov basis, which guarantees connectivity of all fibers. We first give an explicit Markov basis for multiple Poisson regression. By the Lawrence lifting of this basis,...
An Effect Size for Regression Predictors in Meta-Analysis
Aloe, Ariel M.; Becker, Betsy Jane
2012-01-01
A new effect size representing the predictive power of an independent variable from a multiple regression model is presented. The index, denoted as r_sp, is the semipartial correlation of the predictor with the outcome of interest. This effect size can be computed when multiple predictor variables are included in the regression model…
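To make the semipartial correlation concrete, here is a minimal sketch restricted to the two-predictor case (an assumption made for brevity; the abstract concerns models with any number of predictors). It computes the index in two equivalent ways: by correlating the outcome with the part of x1 that x2 does not explain, and by the textbook closed form. All function names are illustrative.

```python
import math


def pearson(u, v):
    """Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def semipartial_by_residual(y, x1, x2):
    """Correlate y with the residual of x1 after regressing x1 on x2."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    slope = (sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
             / sum((b - m2) ** 2 for b in x2))
    resid = [(a - m1) - slope * (b - m2) for a, b in zip(x1, x2)]
    return pearson(y, resid)


def semipartial_by_formula(y, x1, x2):
    """Closed form for two predictors: (r_y1 - r_y2 r_12) / sqrt(1 - r_12^2)."""
    ry1, ry2, r12 = pearson(y, x1), pearson(y, x2), pearson(x1, x2)
    return (ry1 - ry2 * r12) / math.sqrt(1.0 - r12 ** 2)
```

In this two-predictor setting the squared semipartial correlation equals the increase in R² obtained by adding x1 to a model that already contains x2, which is why it reads naturally as a measure of a predictor's unique contribution.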
Regression Analysis for the Social Sciences
Gordon, Rachel A.
2012-01-01
The book provides graduate students in the social sciences with the basic skills that they need to estimate, interpret, present, and publish basic regression models using contemporary standards. Key features of the book include: interweaving the teaching of statistical concepts with examples developed for the course from publicly available social science data or drawn from the literature; thorough integration of teaching statistical theory with teaching data processing and analysis; teaching of both SAS and Stata "side-by-side" and use of chapter exercises in which students practice programming
A method for nonlinear exponential regression analysis
A computer-oriented technique is presented for performing a nonlinear exponential regression analysis on decay-type experimental data. The technique involves the least squares procedure wherein the nonlinear problem is linearized by expansion in a Taylor series. A linear curve fitting procedure for determining the initial nominal estimates for the unknown exponential model parameters is included as an integral part of the technique. A correction matrix was derived and then applied to the nominal estimate to produce an improved set of model parameters. The solution cycle is repeated until some predetermined criterion is satisfied.
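The procedure described in this abstract (a linear fit for the starting values, then repeated first-order Taylor corrections) can be sketched as a Gauss-Newton iteration. The model form y = a·exp(−b·t), the function names, and the stopping rule below are assumptions chosen for illustration; the abstract does not specify them.

```python
import math


def initial_estimate(t, y):
    """Nominal estimates from a straight-line fit of ln(y) = ln(a) - b*t."""
    n = len(t)
    log_y = [math.log(v) for v in y]          # requires y > 0
    t_mean, l_mean = sum(t) / n, sum(log_y) / n
    slope = (sum((ti - t_mean) * (li - l_mean) for ti, li in zip(t, log_y))
             / sum((ti - t_mean) ** 2 for ti in t))
    return math.exp(l_mean - slope * t_mean), -slope


def fit_exponential(t, y, tol=1e-10, max_iter=50):
    """Least-squares fit of a*exp(-b*t) by repeated linearization."""
    a, b = initial_estimate(t, y)
    for _ in range(max_iter):
        e = [math.exp(-b * ti) for ti in t]
        resid = [yi - a * ei for yi, ei in zip(y, e)]
        ja = e                                          # d(model)/da
        jb = [-a * ti * ei for ti, ei in zip(t, e)]     # d(model)/db
        # Normal equations (J^T J) delta = J^T r for the 2x2 case.
        saa = sum(u * u for u in ja)
        sab = sum(u * v for u, v in zip(ja, jb))
        sbb = sum(v * v for v in jb)
        ga = sum(u * r for u, r in zip(ja, resid))
        gb = sum(v * r for v, r in zip(jb, resid))
        det = saa * sbb - sab * sab
        da = (sbb * ga - sab * gb) / det
        db = (saa * gb - sab * ga) / det
        a, b = a + da, b + db                           # apply the correction
        if abs(da) + abs(db) < tol:                     # assumed stopping criterion
            break
    return a, b
```

The log-linear start is cheap and usually lands close enough for the linearized corrections to converge in a few cycles, though it weights small observations more heavily than the final least-squares fit does.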
On connectivity of fibers with positive marginals in multiple logistic regression
In this paper we consider exact tests of a multiple logistic regression, where the levels of covariates are equally spaced, via Markov bases. In the usual application of multiple logistic regression, the sample size is positive for each combination of levels of the covariates. In this case we do not need a whole Markov basis, which guarantees connectivity of all fibers. We first give an explicit Markov basis for multiple Poisson regression. By the Lawrence lifting of this basis, in the case of bivariate logistic regression, we show a simple subset of the Markov basis which connects all fibers with a positive sample size for each combination of levels of covariates.
Using Dominance Analysis to Determine Predictor Importance in Logistic Regression
Azen, Razia; Traxel, Nicole
2009-01-01
This article proposes an extension of dominance analysis that allows researchers to determine the relative importance of predictors in logistic regression models. Criteria for choosing logistic regression R² analogues were determined and measures were selected that can be used to perform dominance analysis in logistic regression. A…
Type I error rates in multiple regression, and hence the chance for false positive research findings, can be drastically inflated when multiple regression models are used to analyze data that contain random measurement error. This article shows the potential for inflated Type I error rates in commonly encountered scenarios and provides new…
Tools to Support Interpreting Multiple Regression in the Face of Multicollinearity
Kraha, Amanda; Turner, Heather; Nimon, Kim; Zientek, Linda Reichwein; Henson, Robin K.
2012-01-01
While multicollinearity may increase the difficulty of interpreting multiple regression (MR) results, it should not cause undue problems for the knowledgeable researcher. In the current paper, we argue that rather than using one technique to investigate regression results, researchers should consider multiple indices to understand the contributions that predictors make not only to a regression model, but to each other as well. Some of the techniques to interpret MR effects include, but are no...
Throughput Prediction of Fishing Goods Based on the Grey Multiple Linear Regression Method
Based on the grey prediction method and multiple linear regression method, the grey multiple linear regression method was presented. This method was applied to the throughput prediction of fishing goods according to five fishing ports’ actual throughput data. The result of comparing the calculating conclusion to the time series one-dimensional linear regression method and grey prediction method proved that the method of calculation and analyzing was more effective and the forecasting precis...
Sliced Inverse Regression for big data analysis
Modern advances in computing power have greatly widened scientists' scope in gathering and investigating information from many variables. We describe sliced inverse regression (SIR), for reducing the dimension of the input variable x without going through any parametric or nonparametric model-fitting process. This method explores the simplicity of the inverse view of regression. Instead of regressing the univariate output variable y against the multivariate x, we regress x against y. Forward r...
A Software Tool for Regression Analysis and its Assumptions
Nowadays, regression analysis is the most important of the forecasting methods. In this method, the aim is to estimate the population regression model as accurately as possible, taking the sample regression function as a basis. Its results are valid under certain assumptions, and violations of these assumptions invalidate some properties of the estimators. In this study, a new object-oriented program concentrated only on the regression analysis and its assumptions has be...
Forecasting Gold Prices Using Multiple Linear Regression Method
Problem statement: Forecasting is a management function that assists decision making. It is also described as the process of estimation in unknown future situations. More generally, it is commonly known as prediction, which refers to the estimation of time series or longitudinal data. Gold is a precious yellow commodity once used as money. It was made illegal in the USA 41 years ago, but is now once again accepted as a potential currency. The demand for this commodity is on the rise. Approach: The objective of this study was to develop a forecasting model for predicting gold prices based on economic factors such as inflation, currency price movements and others. Following the melt-down of the US dollar, investors are putting their money into gold because gold plays an important role as a stabilizing influence for investment portfolios. Due to the increase in demand for gold in Malaysia and other parts of the world, it is necessary to develop a model that reflects the structure and pattern of the gold market and forecasts the movement of the gold price. The most appropriate approach to understanding gold prices is the Multiple Linear Regression (MLR) model. MLR studies the relationship between a single dependent variable and one or more independent variables, in this case with the gold price as the single dependent variable. The fitted MLR model will be used to predict future gold prices. A naive model known as "forecast-1" was considered as a benchmark model in order to evaluate the performance of the model. Results: Many factors determine the price of gold and, based on "a hunch of experts", several economic factors were identified as influencing gold prices.
Variables such as the Commodity Research Bureau future index (CRB); USD/Euro Foreign Exchange Rate (EUROUSD); Inflation rate (INF); Money Supply (M1); New York Stock Exchange (NYSE); Standard and Poor 500 (SPX); Treasury Bill (T-BILL) and US Dollar index (USDX) were considered to have influence on the prices. Parameter estimation for the MLR was carried out using the Statistical Package for the Social Sciences (SPSS), with Mean Square Error (MSE) as the fitness function to determine forecast accuracy. Conclusion: Two models were considered. The first model considered all possible independent variables. The model appeared to be useful for predicting the price of gold, with 85.2% of the sample variation in monthly gold prices explained by the model. The second model considered the following four independent variables to be significant: CRB lagged one, EUROUSD lagged one, INF lagged two and M1 lagged two. In terms of prediction, the second model achieved a high level of predictive accuracy. The amount of variance explained was about 70%, and the regression coefficients also provide a means of assessing the relative importance of individual variables in the overall prediction of gold price.
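The two data-handling steps this abstract relies on, aligning lagged predictors with the target and scoring a naive benchmark by MSE, can be sketched as follows. This is an illustration only: the helper names are hypothetical, and treating the naive benchmark as "predict each price by the previous one" is an assumption, since the abstract does not define it.

```python
def build_lagged_rows(predictors, lags, target):
    """Align target[t] with predictors[name][t - lags[name]] for every name."""
    max_lag = max(lags.values())
    names = sorted(predictors)
    rows = [[predictors[name][t - lags[name]] for name in names]
            for t in range(max_lag, len(target))]
    # The first max_lag targets have no complete predictor row and are dropped.
    return names, rows, target[max_lag:]


def mean_square_error(actual, predicted):
    """Average squared forecast error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def naive_benchmark_mse(prices):
    """MSE of forecasting each price by the previous observed price (assumed benchmark)."""
    return mean_square_error(prices[1:], prices[:-1])
```

A fitted regression is only worth keeping if its out-of-sample MSE is clearly below the value returned by `naive_benchmark_mse` on the same period.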
Multiple nonparametric regression and model validation for mixed regressors
Schnurbus, Joachim
2014-01-01
The dissertation covers four essays on nonparametric (kernel and/or spline) regression, where tools and methods for the corresponding model validation are provided and discussed for a setup of mixed (discrete and continuous) covariates.
Forecasting Electrical Load using ANN Combined with Multiple Regression Method
This paper combined artificial neural network and regression modeling methods to predict electrical load. We propose an approach for specific day, week and/or month load forecasting for electrical companies taking into account the historical load. Therefore, a modified technique, based on artificial neural network (ANN) combined with linear regression, is applied on the KSA electrical network dependent on its historical data to predict the electrical load demand forecasting up to year 2020. T...
Functional linear regression via canonical analysis
We study regression models for the situation where both dependent and independent variables are square-integrable stochastic processes. Questions concerning the definition and existence of the corresponding functional linear regression models and some basic properties are explored for this situation. We derive a representation of the regression parameter function in terms of the canonical components of the processes involved. This representation establishes a connection betw...
Self-concordant analysis for logistic regression
Most of the non-asymptotic theoretical work in regression is carried out for the square loss, where estimators can be obtained through closed-form expressions. In this paper, we use and extend tools from the convex optimization literature, namely self-concordant functions, to provide simple extensions of theoretical results for the square loss to the logistic loss. We apply the extension techniques to logistic regression with regularization by the $\ell_2$-norm and regularization by the $\ell_1$-norm, showing that new results for binary classification through logistic regression can be easily derived from corresponding results for least-squares regression.
The determination of Eu(III) doping levels in various oxide matrices was carried out through x-ray fluorescence analysis. The use of fundamental parameters calculations was investigated as a potentially fast and accurate method, and comparison was made to results obtained by using an intensity model multiple regression method. By use of the fundamental parameters method, results were obtained that differed by less than ±2% relative to those obtained through multiple regression. The fundamental parameters method worked well with the use of only two concentration standards (which bracketed the unknown concentrations) and when the sample stoichiometry and matrix composition were specified. The fundamental parameters method is far easier to use than the multiple regression method, since one can obtain accurate results with the use of significantly fewer concentration standards. 20 references, 3 figures, 2 tables
Analysis of genome-wide association data by large-scale Bayesian logistic regression
Wang Yuanjia; Sha Nanshi; Fang Yixin
2009-01-01
Single-locus analysis is often used to analyze genome-wide association (GWA) data, but such analysis is subject to severe multiple comparisons adjustment. Multivariate logistic regression is proposed to fit a multi-locus model for case-control data. However, when the sample size is much smaller than the number of single-nucleotide polymorphisms (SNPs) or when correlation among SNPs is high, traditional multivariate logistic regression breaks down. To accommodate the scale of data fro...
A first-hitting-time (FHT) survival model postulates a health status process for a patient that gradually declines until the patient dies when the level first reaches a critical threshold. Threshold regression (TR) is a new regression methodology that incorporates the effects of covariates on the threshold and process parameters of this FHT model. In this study, we use TR to analyze data from a randomized clinical trial of treatment for multiple myeloma. The trial compares VELCADE and high-dose dexamethasone, the former a new therapy and the latter an established therapy for this disease. Patients are switched between the two drugs based on patient response. The novel contribution of this work is the modeling of this clinical trial design using a mixture of TR models. Specifically, we propose a mixture FHT model to fit the survival distribution. The model includes a composite time scale that differentiates the rate of disease progression before and after switching. The analysis shows significant benefit from initial treatment by VELCADE. A comparison is made with a Cox proportional hazards regression analysis of the same data. PMID:18991113
Latent class regression: inference and estimation with two-stage multiple imputation
Harel, Ofer; Chung, Hwan; Miglioretti, Diana
2013-01-01
Latent class regression (LCR) is a popular method for analyzing multiple categorical outcomes. While non-response to the manifest items is a common complication, inferences of LCR can be evaluated using maximum likelihood, multiple imputation, and two-stage multiple imputation. Under similar missing data assumptions, the estimates and variances from all three procedures are quite close. However, multiple imputation and two-stage multiple imputation can provide additional information: estimate...
Modeling Lateral and Longitudinal Control of Human Drivers with Multiple Linear Regression Models
In this paper, we describe results to model lateral and longitudinal control behavior of drivers with simple linear multiple regression models. This approach fits into the Bayesian Programming (BP) approach (Bessi
MULTIPLE REGRESSION MODELS FOR HINDCASTING AND FORECASTING MIDSUMMER HYPOXIA IN THE GULF OF MEXICO
A new suite of multiple regression models was developed that describes the relationship between the area of bottom water hypoxia along the northern Gulf of Mexico and Mississippi-Atchafalaya River nitrate concentration, total phosphorus (TP) concentration, and discharge. Variabil...
Multiple functional regression with both discrete and continuous covariates
Kadri, Hachem; Preux, Philippe; Duflos, Emmanuel; Canu, Stéphane
2013-01-01
In this paper we present a nonparametric method for extending functional regression methodology to the situation where more than one functional covariate is used to predict a functional response. Borrowing the idea from Kadri et al. (2010a), the method, which supports mixed discrete and continuous explanatory variables, is based on estimating a function-valued function in reproducing kernel Hilbert spaces by virtue of positive operator-valued kernels.
Vehicle Travel Time Predication based on Multiple Kernel Regression
With the rapid development of the transportation and logistics economy, vehicle travel time prediction and planning have become an important topic in logistics. Travel time prediction, which is indispensable for traffic guidance, has become a key issue for researchers in this field. At present, travel time prediction is mainly short-term, and the prediction methods include artificial neural networks, the Kalman filter and support vector regression (SVR). However, these algo...
Landslides in the hilly terrain along the Kansas and Missouri rivers in northeastern Kansas have caused millions of dollars in property damage during the last decade. To address this problem, a statistical method called multiple logistic regression has been used to create a landslide-hazard map for Atchison, Kansas, and surrounding areas. Data included digitized geology, slopes, and landslides, manipulated using ArcView GIS. Logistic regression relates predictor variables to the occurrence or nonoccurrence of landslides within geographic cells and uses the relationship to produce a map showing the probability of future landslides, given local slopes and geologic units. Results indicated that slope is the most important variable for estimating landslide hazard in the study area. Geologic units consisting mostly of shale, siltstone, and sandstone were most susceptible to landslides. Soil type and aspect ratio were considered but excluded from the final analysis because these variables did not significantly add to the predictive power of the logistic regression. Soil types were highly correlated with the geologic units, and no significant relationships existed between landslides and slope aspect. © 2003 Elsevier Science B.V. All rights reserved.
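As a rough illustration of how logistic regression turns a predictor into an occurrence probability, the sketch below fits a one-predictor model by Newton's method. It is deliberately reduced to a single covariate (slope angle), whereas the study also used geologic units; the function names and the sample cells are hypothetical and do not come from the study.

```python
import math


def fit_logistic(x, y, iterations=25):
    """Maximum-likelihood fit of P(y=1) = 1 / (1 + exp(-(b0 + b1*x)))."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        # Gradient of the log-likelihood.
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for yi, pi, xi in zip(y, p, x))
        # Fisher information (negative Hessian), a 2x2 matrix.
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system in closed form.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def landslide_probability(b0, b1, slope_degrees):
    """Predicted probability for one cell with the given slope."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * slope_degrees)))


# Hypothetical cells: slope in degrees and whether a landslide was mapped.
slopes = [5.0, 10.0, 15.0, 20.0, 25.0, 30.0, 35.0, 40.0]
observed = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic(slopes, observed)
```

Applying `landslide_probability` to every cell of a grid gives the kind of probability surface from which a hazard map is drawn.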
Multiple regression technique for Pth degree polynomials with and without linear cross products
A multiple regression technique was developed by which the nonlinear behavior of specified independent variables can be related to a given dependent variable. The polynomial expression can be of Pth degree and can incorporate N independent variables. Two cases are treated such that mathematical models can be studied both with and without linear cross products. The resulting surface fits can be used to summarize trends for a given phenomenon and provide a mathematical relationship for subsequent analysis. To implement this technique, separate computer programs were developed for the case without linear cross products and for the case incorporating such cross products which evaluate the various constants in the model regression equation. In addition, the significance of the estimated regression equation is considered and the standard deviation, the F statistic, the maximum absolute percent error, and the average of the absolute values of the percent of error evaluated. The computer programs and their manner of utilization are described. Sample problems are included to illustrate the use and capability of the technique which show the output formats and typical plots comparing computer results to each set of input data.
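The two cases treated in this abstract can be sketched by building the regression columns explicitly. The interpretation of "linear cross products" as pairwise products x_i·x_j is an assumption, as are the function names; the original programs are not reproduced here.

```python
from itertools import combinations


def design_row(x, degree, cross_products):
    """Columns: 1, then x_i^1..x_i^P for each variable, then optional x_i*x_j."""
    row = [1.0]
    for xi in x:
        row.extend(xi ** k for k in range(1, degree + 1))
    if cross_products:
        row.extend(xi * xj for xi, xj in combinations(x, 2))
    return row


def solve(a, b):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_polynomial(points, y, degree, cross_products):
    """Least-squares coefficients via the normal equations (X^T X) c = X^T y."""
    rows = [design_row(p, degree, cross_products) for p in points]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)
```

Solving the normal equations directly is adequate for small, well-conditioned problems like this sketch; for high degrees or many variables a QR-based least-squares routine would be the more robust choice.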
Regression tree approach to studying factors influencing acoustic voice analysis.
Multiple factors influence voice quality measurements (VQM) obtained during an acoustic voice assessment including: gender, intrasubject variability, microphone, environmental noise (type and level), data acquisition (DA) system, and analysis software. This study used regression trees to investigate the order and relative importance of these factors on VQM including interaction effects of the factors and how the outcome differs when the acoustic environment is controlled for noise. Twenty normophonic participants provided 20 voice samples each, which were recorded synchronously on five DA systems combined with six different microphones. The samples were mixed with five noise types at eight signal-to-noise ratio (SNR) levels. The resulting 80,000 audio samples were analyzed for fundamental frequency (F(0)), jitter and shimmer using three software analysis systems: MDVP, PRAAT, and TF32 (CSpeech). Fifteen regression trees and their Variable Importance Measures were utilized to analyze the data. The analyses confirmed that all of the factors listed above were influential. The results suggest that gender, intrasubject variability, and microphone were significant influences on F(0). Software systems and gender were highly influential on measurements of jitter and shimmer. Environmental noise was shown to be the prominent factor that affects VQM when SNR levels are below 30 dB. PMID:16825780
Several statistical packages are capable of estimating generalized linear mixed models and these packages provide one or more of three estimation methods: penalized quasi-likelihood, Laplace, and Gauss-Hermite. Many studies have investigated these methods' performance for the mixed-effects logistic regression model. However, the authors focused on models with one or two random effects and assumed a simple covariance structure between them, which may not be realistic. When there are multiple correlated random effects in a model, the computation becomes intensive, and often an algorithm fails to converge. Moreover, in our analysis of smoking status and exposure to anti-tobacco advertisements, we have observed that when a model included multiple random effects, parameter estimates varied considerably from one statistical package to another even when using the same estimation method. This article presents a comprehensive review of the advantages and disadvantages of each estimation method. In addition, we compare the performances of the three methods across statistical packages via simulation, which involves two- and three-level logistic regression models with at least three correlated random effects. We apply our findings to a real dataset. Our results suggest that two packages (SAS GLIMMIX Laplace and SuperMix Gaussian quadrature) perform well in terms of accuracy, precision, convergence rates, and computing speed. We also discuss the strengths and weaknesses of the two packages in regard to sample sizes. PMID:24288415
Neutron multiplicity analysis tool
I describe the capabilities of the EXCOM (EXcel based COincidence and Multiplicity) calculation tool which is used to analyze experimental data or simulated neutron multiplicity data. The input to the program is the count-rate data (including the multiplicity distribution) for a measurement, the isotopic composition of the sample and relevant dates. The program carries out deadtime correction and background subtraction and then performs a number of analyses. These are: passive calibration curve, known alpha and multiplicity analysis. The latter is done with both the point model and with the weighted point model. In the current application EXCOM carries out the rapid analysis of Monte Carlo calculated quantities and allows the user to determine the magnitude of sample perturbations that lead to systematic errors. Neutron multiplicity counting is an assay method used in the analysis of plutonium for safeguards applications. It is widely used in nuclear material accountancy by international (IAEA) and national inspectors. The method uses the measurement of the correlations in a pulse train to extract information on the spontaneous fission rate in the presence of neutrons from (α,n) reactions and induced fission. The measurement is relatively simple to perform and gives results very quickly (≤ 1 hour). By contrast, destructive analysis techniques are extremely costly and time consuming (several days). By improving the achievable accuracy of neutron multiplicity counting, a nondestructive analysis technique, it could be possible to reduce the use of destructive analysis measurements required in safeguards applications. The accuracy of a neutron multiplicity measurement can be affected by a number of variables such as density, isotopic composition, chemical composition and moisture in the material.
In order to determine the magnitude of these effects on the measured plutonium mass a calculational tool, EXCOM, has been produced using VBA within Excel. This program was developed to help speed the analysis of Monte Carlo neutron transport simulation (MCNP) data, and only requires the count-rate data to calculate the mass of material using INCC's analysis methods instead of the full neutron multiplicity distribution required to run analysis in INCC. This paper describes what is implemented within EXCOM, including the methods used, how the program corrects for deadtime, and how uncertainty is calculated. This paper also describes how to use EXCOM within Excel.
Multiple predictor smoothing methods for sensitivity analysis: Example results
International Nuclear Information System (INIS)
The use of multiple predictor smoothing methods in sampling-based sensitivity analyses of complex models is investigated. Specifically, sensitivity analysis procedures based on smoothing methods employing the stepwise application of the following nonparametric regression techniques are described in the first part of this presentation: (i) locally weighted regression (LOESS), (ii) additive models, (iii) projection pursuit regression, and (iv) recursive partitioning regression. In this, the second and concluding part of the presentation, the indicated procedures are illustrated with both simple test problems and results from a performance assessment for a radioactive waste disposal facility (i.e., the Waste Isolation Pilot Plant). As shown by the example illustrations, the use of smoothing procedures based on nonparametric regression techniques can yield more informative sensitivity analysis results than can be obtained with more traditional sensitivity analysis procedures based on linear regression, rank regression or quadratic regression when nonlinear relationships between model inputs and model predictions are present
Simulation Experiments in Practice: Statistical Design and Regression Analysis
Kleijnen, J.P.C.
2007-01-01
In practice, simulation analysts often change only one factor at a time, and use graphical analysis of the resulting Input/Output (I/O) data. The goal of this article is to change these traditional, naïve methods of design and analysis, because statistical theory proves that more information is obtained when applying Design Of Experiments (DOE) and linear regression analysis. Unfortunately, classic DOE and regression analysis assume a single simulation response that is normally and independen...
The paper deals with the mutual influence between the share of government expenditures in GDP and the value of real GDP as an indicator of economic growth. The regression models constructed for the relationship between these indicators for the economies of China and Ukraine characterize the degree of effectiveness with which government funds are used.
Multiple-spell regression models for duration data
Hamerle, Alfred
1989-01-01
General models for multiple-spell duration data are considered. A general theory which indicates how the successive spells of an individual are generated by an underlying stochastic process is presented. Various special cases of the general model are discussed. The implications of different timescales are investigated: different timescales lead to different underlying stochastic processes such as Markov processes or semi-Markov processes. Occasionally common computer programs for duration dat...
Polyethylene glycol (PEG) is the most common preservative in use for bulking and maintaining structural integrity in waterlogged wood. Conservators therefore have a need to be able to determine PEG concentrations in wood in a non-destructive manner. We present a study highlighting the application of infrared spectroscopy coupled with multivariate analysis techniques to predict the concentration of polyethylene glycol 400 (PEG-400) and water simultaneously. This technique uses attenuated total reflectance (ATR) spectroscopy and unconstrained stepwise multiple linear regression (SMLR) analysis for prediction of multiple components in archaeological wood. Using this model we have calculated the concentration of PEG-400 and water in treated archaeological waterlogged wood samples.
The triggers of forest area loss in Cameroon have not been properly understood. The measures used to curb forest area loss have been simplistic and generalized, with no clear-cut knowledge of the specific role of different potential factors. This study aims at investigating the hypothesis that population growth is the main cause of loss in forest area. This study will be able to identify what factors are of more significance in the causal equation. The open R programming software has been used to produce multiple linear regression models. The correlation between the dependent variable and the independent variables was established by a correlation matrix and the strength of the models tested by power analysis. The results support the hypothesis that population growth is the most dominant cause of deforestation in Cameroon while arable production and permanent crop land and arable production per capita index are second and third respectively.
2011-01-01
Quantile regression (QR) is a very useful statistical tool for learning the relationship between the response variable and covariates. For many applications, one often needs to estimate multiple conditional quantile functions of the response variable given covariates. Although one can estimate multiple quantiles separately, it is of great interest to estimate them simultaneously. One advantage of simultaneous estimation is that multiple quantiles can share strength among them to gain better e...
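As a rough illustration of what estimating a quantile involves, the sketch below uses the check (pinball) loss, the usual objective in quantile regression, in the simplest intercept-only setting. It is not the simultaneous estimator discussed in the abstract above; the function names and the toy data are assumptions made for illustration only.

```python
def pinball_loss(residual, tau):
    """Check loss: weight tau on positive residuals, (1 - tau) on negative ones."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def best_constant(ys, tau):
    """Intercept-only quantile fit.

    The minimiser of the summed check loss over constants is attained at
    one of the data points, so a search over the observations suffices.
    """
    return min(ys, key=lambda c: sum(pinball_loss(y - c, tau) for y in ys))


# Toy data with one large outlier (hypothetical values).
ys = [1.0, 2.0, 3.0, 4.0, 100.0]

median_fit = best_constant(ys, 0.5)  # tau = 0.5 recovers the sample median
upper_fit = best_constant(ys, 0.9)   # tau = 0.9 targets the upper tail
```

Fitting several values of `tau` one at a time, as done here, is the "separate" strategy that the abstract contrasts with simultaneous estimation.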
A method for the analysis of capillary column polychlorinated biphenyl (PCB) data using regression analysis with outlier checking and elimination, COMSTAR, is presented and evaluated. This algorithm determines the best combination of the commercial PCB mixtures which best fits the...
Ordinal regression revisited: multiple criteria ranking with a set of additive value functions
Greco, Salvatore; Mousseau, Vincent; Slowinski, Roman
2007-01-01
We present a new method (called UTAGMS) for multiple criteria ranking using strongly and weakly established weak preference relations which result from an ordinal regression. The preference information supplied by the decision maker is a set of pairwise comparisons of reference alternatives. The preference model built via ordinal regression is a set of general additive value functions. The method provides two final rankings: a strong ranking identifying "sure" preference statements, and a w...
Catchment Area Analysis Using Bayesian Regression Modeling
A catchment area (CA) is the geographic area and population from which a cancer center draws patients. Defining a CA allows a cancer center to describe its primary patient population and assess how well it meets the needs of cancer patients within the CA. A CA definition is required for cancer centers applying for National Cancer Institute (NCI)-designated cancer center status. In this research, we constructed both diagnosis and diagnosis/treatment CAs for the Massey Cancer Center (MCC) at Virginia Commonwealth University. We constructed diagnosis CAs for all cancers based on Virginia state cancer registry data and Bayesian hierarchical logistic regression models. We constructed a diagnosis/treatment CA using billing data from MCC and a Bayesian hierarchical Poisson regression model. To define CAs, we used exceedance probabilities for county random effects to assess unusual spatial clustering of patients diagnosed or treated at MCC after adjusting for important demographic covariates. We used the MCC CAs to compare patient characteristics inside and outside the CAs. Among cancer patients living within the MCC CA, patients diagnosed at MCC were more likely to be minority, female, uninsured, or on Medicaid. PMID:25983542
The Study on Technology Innovation of Chinese Enterprises by Regression Analysis
Using China Science and Technology data from recent years, we apply multiple regression to analyze the factors influencing technology innovation, and demonstrate the impact of significant and non-significant factors concerning China's investment, expenditure, and related policies for technological innovation. The results are intended to provide guidance and reference for enhancing China's technological innovation capability and promoting domestic economic development.
Egg hatchability prediction by multiple linear regression and artificial neural networks
An artificial neural network (ANN) was compared with a multiple linear regression statistical method to predict hatchability in an artificial incubation process. A feedforward neural network architecture was applied. The network was trained with the backpropagation algorithm based on data obtained [...] from industrial incubations. The ANN model was chosen because it fit the experimental data better than the multiple linear regression model, whose coefficients were determined by the least-squares method. The simulation results of these approaches indicate that this ANN can be used for incubation performance prediction.
Biplots in Reduced-Rank Regression
Regression problems with a number of related response variables are typically analyzed by separate multiple regressions. This paper shows how these regressions can be visualized jointly in a biplot based on reduced-rank regression. Reduced-rank regression combines multiple regression and principal components analysis and can therefore be carried out with standard statistical packages. The proposed biplot highlights the major aspects of the regressions by displaying the least-squares approxima...
Stability Analysis for Regularized Least Squares Regression
Rudin, C
2005-01-01
We discuss stability for a class of learning algorithms with respect to noisy labels. The algorithms we consider are for regression, and they involve the minimization of regularized risk functionals, such as L(f) := 1/N sum_i (f(x_i)-y_i)^2+ lambda ||f||_H^2. We shall call the algorithm `stable' if, when y_i is a noisy version of f*(x_i) for some function f* in H, the output of the algorithm converges to f* as the regularization term and noise simultaneously vanish. We consider two flavors of this problem, one where a data set of N points remains fixed, and the other where N -> infinity. For the case where N -> infinity, we give conditions for convergence to f_E (the function which is the expectation of y(x) for each x), as lambda -> 0. For the fixed N case, we describe the limiting 'non-noisy', 'non-regularized' function f*, and give conditions for convergence. In the process, we develop a set of tools for dealing with functionals such as L(f), which are applicable to many other problems in learning theory.
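To make the regularized risk functional L(f) in the abstract above concrete, here is a minimal sketch for the special case of linear functions f(x) = w·x, where the norm ||f||_H is taken to be the Euclidean norm of w. That choice of hypothesis space, the helper names, and the synthetic data are all assumptions for illustration; the paper itself treats a general space H.

```python
import numpy as np


def regularized_risk(w, X, y, lam):
    """L(f) = 1/N sum_i (f(x_i) - y_i)^2 + lambda ||f||^2 for f(x) = w . x."""
    n = len(y)
    return np.sum((X @ w - y) ** 2) / n + lam * np.dot(w, w)


def fit_regularized(X, y, lam):
    """Minimise the risk above in closed form.

    Setting the gradient to zero gives (X^T X / N + lambda I) w = X^T y / N.
    """
    n, d = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)


# Synthetic data: y_i is a noisy version of f*(x_i) = w_true . x_i.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=200)

w_small = fit_regularized(X, y, 1e-6)  # weak regularization
w_large = fit_regularized(X, y, 10.0)  # strong regularization shrinks w
```

With small noise and a small `lam`, the fitted weights land close to `w_true`, which is the flavor of convergence the abstract refers to; the precise conditions are in the paper, not in this sketch.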
Joint regression analysis of correlated data using Gaussian copulas.
Song, Peter X-K; Li, Mingyao; Yuan, Ying
2009-03-01
This article concerns a new joint modeling approach for correlated data analysis. Utilizing Gaussian copulas, we present a unified and flexible machinery to integrate separate one-dimensional generalized linear models (GLMs) into a joint regression analysis of continuous, discrete, and mixed correlated outcomes. This essentially leads to a multivariate analogue of the univariate GLM theory and hence an efficiency gain in the estimation of regression coefficients. The availability of joint probability models enables us to develop a full maximum likelihood inference. Numerical illustrations are focused on regression models for discrete correlated data, including multidimensional logistic regression models and a joint model for mixed normal and binary outcomes. In the simulation studies, the proposed copula-based joint model is compared to the popular generalized estimating equations, which is a moment-based estimating equation method to join univariate GLMs. Two real-world data examples are used in the illustration. PMID:18510653
Multi-Step-Ahead Time Series Prediction using Multiple-Output Support Vector Regression
Accurate time series prediction over long future horizons is challenging and of great interest to both practitioners and academics. As a well-known intelligent algorithm, the standard formulation of Support Vector Regression (SVR) could be taken for multi-step-ahead time series prediction, only relying either on iterated strategy or direct strategy. This study proposes a novel multiple-step-ahead time series prediction approach which employs multiple-output support vector re...
Background stratified Poisson regression analysis of cohort data
Richardson, David B.; Langholz, Bryan
2011-01-01
Background stratified Poisson regression is an approach that has been used in the analysis of data derived from a variety of epidemiologically important studies of radiation-exposed populations, including uranium miners, nuclear industry workers, and atomic bomb survivors. We describe a novel approach to fit Poisson regression models that adjust for a set of covariates through background stratification while directly estimating the radiation-disease association of primary interest. The approa...
Unions and Profits: A meta-regression Analysis
The effect of unions on profits continues to be an unresolved theoretical and empirical issue. In this paper, clustered data analysis and hierarchical linear meta-regression models are applied to the population of forty-five econometric studies that report 532 estimates of the direct effect of unions on profits. Unions have a significant negative effect on profits in the United States, and this effect is larger when market-based measures of profits are used. Separate meta-regression analyses ...
A Spreadsheet Tool for Learning the Multiple Regression F-Test, T-Tests, and Multicollinearity
Martin, David
2008-01-01
This note presents a spreadsheet tool that allows teachers the opportunity to guide students towards answering on their own questions related to the multiple regression F-test, the t-tests, and multicollinearity. The note demonstrates approaches for using the spreadsheet that might be appropriate for three different levels of statistics classes,…
A Simple and Convenient Method of Multiple Linear Regression to Calculate Iodine Molecular Constants
A new procedure using a student-friendly least-squares multiple linear-regression technique utilizing a function within Microsoft Excel is described that enables students to calculate molecular constants from the vibronic spectrum of iodine. This method is advantageous pedagogically as it calculates molecular constants for ground and excited…
Regression Model Optimization for the Analysis of Experimental Data
A candidate math model search algorithm was developed at Ames Research Center that determines a recommended math model for the multivariate regression analysis of experimental data. The search algorithm is applicable to classical regression analysis problems as well as wind tunnel strain gage balance calibration analysis applications. The algorithm compares the predictive capability of different regression models using the standard deviation of the PRESS residuals of the responses as a search metric. This search metric is minimized during the search. Singular value decomposition is used during the search to reject math models that lead to a singular solution of the regression analysis problem. Two threshold dependent constraints are also applied. The first constraint rejects math models with insignificant terms. The second constraint rejects math models with near-linear dependencies between terms. The math term hierarchy rule may also be applied as an optional constraint during or after the candidate math model search. The final term selection of the recommended math model depends on the regressor and response values of the data set, the user's function class combination choice, the user's constraint selections, and the result of the search metric minimization. A frequently used regression analysis example from the literature is used to illustrate the application of the search algorithm to experimental data.
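The search metric named above, the standard deviation of the PRESS residuals, can be sketched as follows. This is only an illustration of the metric using the standard leave-one-out identity for ordinary least squares; it is not the Ames search algorithm, and the function names, candidate models, and data are hypothetical.

```python
import numpy as np


def press_residuals(X, y):
    """Leave-one-out prediction errors of an ordinary least-squares fit.

    Uses the identity e_i / (1 - h_ii), where h_ii is the i-th diagonal
    entry of the hat matrix, so no model has to be refitted.
    """
    H = X @ np.linalg.pinv(X)  # hat matrix (projection onto columns of X)
    resid = y - H @ y
    return resid / (1.0 - np.diag(H))


def press_std(X, y):
    """Standard deviation of the PRESS residuals, used here as the metric."""
    return float(np.std(press_residuals(X, y), ddof=1))


# Synthetic response with a genuine quadratic term plus small noise.
rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 25)
y = 1.0 + 2.0 * x + 3.0 * x**2 + 0.05 * rng.normal(size=x.size)

# Two candidate math models: linear only, and linear plus quadratic.
X_lin = np.column_stack([np.ones_like(x), x])
X_quad = np.column_stack([np.ones_like(x), x, x**2])

scores = {"linear": press_std(X_lin, y), "quadratic": press_std(X_quad, y)}
recommended = min(scores, key=scores.get)
```

A smaller value means better predictive capability on held-out points, so the candidate with the lowest score would be the one a metric-minimizing search prefers.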
Applying Multiple Linear Regression and Neural Network to Predict Bank Performance
Globalization and technological advancement have created a highly competitive market in the banking and finance industry. Performance of the industry depends heavily on the accuracy of the decisions made at managerial level. This study uses the multiple linear regression technique and a feed forward artificial neural network to predict bank performance. The study then evaluates the performance of the two techniques with the goal of finding a powerful tool for predicting bank performance. Data of thirteen banks for the period 2001-2006 were used in the study. ROA was used as a measure of bank performance, and hence is the dependent variable for the multiple linear regression. Seven variables including liquidity, credit risk, cost to income ratio, size, concentration ratio, inflation and GDP were used as independent variables. Under supervised learning, the dependent variable, ROA, was used as the target output for the artificial neural network. Seven inputs corresponding to the seven predictor variables were used for pattern recognition at the training phase. Experimental results from the multiple linear regression show that two variables, credit risk and cost to income ratio, are significant in determining bank performance. These two variables were found to explain about 60.9 percent of the total variation in the data with a mean square error (MSE) of 0.330. The artificial neural network was found to give optimal results by using thirteen hidden neurons. Testing results show that the seven inputs explain about 66.9 percent of the total variation in the data with a very low MSE of 0.00687. Performance of both methods is measured by mean square prediction error (MSPR) at the validation stage. The MSPR value for the neural network is lower than the MSPR value for multiple linear regression (0.0061 against 0.6190).
The study concludes that artificial neural network is the more powerful tool in predicting bank performance.
ON PARTITIONING THE EXPLAINED VARIATION IN A REGRESSION ANALYSIS.
WISLER, CARL E.
AN ANALYSIS OF THE NONUNIQUE PORTION OF EXPLAINED VARIATION WHICH CAN BE ATTRIBUTED TO THE REGRESSOR VARIABLES AS A GROUP IN REGRESSION ANALYSIS IS DISCUSSED. THE UNIQUE SUMS OF SQUARES IS PRESENTED AS A BASIS FOR UNDERSTANDING THE PROCEDURE FOR PARTITIONING THE NONUNIQUE PART. A METHOD OF PARTITIONING NONUNIQUE VARIATION IS DEFINED IN TERMS OF A…
The Sage handbook of regression analysis and causal inference
Best, Henning
2014-01-01
Covering both general and advanced aspects of multivariate methods, this handbook focuses on regression analysis of cross-sectional and longitudinal data with an emphasis on causal analysis and provides readers with an introduction to and exploration of a large range of techniques.
International Nuclear Information System (INIS)
Fundamental parameters calculations are used for the analysis of europium in the concentration range of 0.1 wt% to 30.0 wt% in the oxidic catalyst supports alumina, calcia, magnesia, lanthania, and thoria. The precision and accuracy of this method is dependent on how the sample matrix is defined in the fundamental parameters program and the number and concentration of the standards used. Results comparable to the multiple regression method are obtained when the matrix stoichiometry is defined as Eu2O3 and the catalyst oxide (i.e. Al2O3 etc). It is also necessary to use standards which bracket the europium concentration in the samples. When these conditions are met, the results are comparable to those obtained from a ten point multiple regression calibration curve but with a considerable saving of standard preparation time. The precision is better than ±2% relative. The % relative difference between the fundamental parameters and multiple regression results is also 2%. Data is presented which illustrates the effect of defining the sample stoichiometry in the XRF11 computer program.
Ratio Versus Regression Analysis: Some Empirical Evidence in Brazil
Analysis of Sting Balance Calibration Data Using Optimized Regression Models
Calibration data of a wind tunnel sting balance was processed using a candidate math model search algorithm that recommends an optimized regression model for the data analysis. During the calibration the normal force and the moment at the balance moment center were selected as independent calibration variables. The sting balance itself had two moment gages. Therefore, after analyzing the connection between calibration loads and gage outputs, it was decided to choose the difference and the sum of the gage outputs as the two responses that best describe the behavior of the balance. The math model search algorithm was applied to these two responses. An optimized regression model was obtained for each response. Classical strain gage balance load transformations and the equations of the deflection of a cantilever beam under load are used to show that the search algorithm's two optimized regression models are supported by a theoretical analysis of the relationship between the applied calibration loads and the measured gage outputs. The analysis of the sting balance calibration data set is a rare example of a situation when terms of a regression model of a balance can directly be derived from first principles of physics. In addition, it is interesting to note that the search algorithm recommended the correct regression model term combinations using only a set of statistical quality metrics that were applied to the experimental data during the algorithm's term selection process.
Gene-environment (G × E) interactions are biologically important for a wide range of environmental exposures and clinical outcomes. Because of the large number of potential interactions in genomewide association data, the standard approach fits one model per G × E interaction with multiple hypothesis correction (MHC) used to control the type I error rate. Although sometimes effective, using one model per candidate G × E interaction test has two important limitations: low power due to MHC and omitted variable bias. To avoid the coefficient estimation bias associated with independent models, researchers have used penalized regression methods to jointly test all main effects and interactions in a single regression model. Although penalized regression supports joint analysis of all interactions, can be used with hierarchical constraints, and offers excellent predictive performance, it cannot assess the statistical significance of G × E interactions or compute meaningful estimates of effect size. To address the challenge of low power, researchers have separately explored screening-testing, or two-stage, methods in which the set of potential G × E interactions is first filtered and then tested for interactions with MHC only applied to the tests actually performed in the second stage. Although two-stage methods are statistically valid and effective at improving power, they still test multiple separate models and so are impacted by MHC and biased coefficient estimation. To remedy the challenges of both poor power and omitted variable bias encountered with traditional G × E interaction detection methods, we propose a novel approach that combines elements of screening-testing and hierarchical penalized regression. 
Specifically, our proposed method uses, in the first stage, an elastic net-penalized multiple logistic regression model to jointly estimate either the marginal association filter statistic or the gene-environment correlation filter statistic for all candidate genetic markers. In the second stage, a single multiple logistic regression model is used to jointly assess marginal terms and G × E interactions for all genetic markers that pass the first stage filter. A single likelihood-ratio test is used to determine whether any of the interactions are statistically significant. We demonstrate the efficacy of our method relative to alternative G × E detection methods on a bladder cancer data set. PMID:25592580
Multiple regression as a preventive tool for determining the risk of Legionella spp.
To determine the interrelationship between health and hygiene conditions for prevention of legionellosis, the composition of materials used in water distribution systems, the water origin and Legionella pneumophila risk. Material and methods. A descriptive study and multiple regression analysis were carried out on a sample of golf course sprinkler irrigation systems (n=31) pertaining to hotels located on the Costa del Sol (Malaga, Spain). The study was carried out in 2009. Results. A significant linear relationship was found, with all the independent variables contributing significantly (p<0.05) to the model's fit. The relationship between water type and the risk of Legionella, as well as between material composition and that risk, is linear and positive. In contrast, the relationship between health-hygiene conditions and Legionella risk is linear and negative. Conclusion. The characterization of Legionella pneumophila concentration, as defined by the risk in water and through use of the predictive method, can contribute to the consideration of new influence variables in the development of the agent, resulting in improved control and prevention of the disease.
An adequate design for regression analysis of yield trials.
Gusmão, L
1985-12-01
Based on theoretical demonstrations and illustrated with a numerical example from triticale yield trials in Portugal, the Completely Randomized Design is proposed as the one suited for Regression Analysis. When trials are designed in Complete Randomized Blocks the regression of plot production on block mean instead of the regression of cultivar mean on the overall mean of the trial is proposed as the correct procedure for regression analysis. These proposed procedures, in addition to providing a better agreement with the assumptions for regression and the philosophy of the method, induce narrower confidence intervals and attenuation of the hyperbolic effect. The increase in precision is brought about by both a decrease in the t Student values by an increased number of degrees of freedom, and by a decrease in standard error by a non proportional increase of residual variance and non proportional increase of the sum of squares of the assumed independent variable. The new procedures seem to be promising for a better understanding of the mechanism of specific instability. PMID:24247400
Application of stepwise multiple regression techniques to inversion of Nimbus 'IRIS' observations.
Exploratory studies with Nimbus-3 infrared interferometer-spectrometer (IRIS) data indicate that, in addition to temperature, such meteorological parameters as geopotential heights of pressure surfaces, tropopause pressure, and tropopause temperature can be inferred from the observed spectra with the use of simple regression equations. The technique of screening the IRIS spectral data by means of stepwise regression to obtain the best radiation predictors of meteorological parameters is validated. The simplicity of application of the technique and the simplicity of the derived linear regression equations - which contain only a few terms - suggest usefulness for this approach. Based upon the results obtained, suggestions are made for further development and exploitation of the stepwise regression analysis technique.
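The screening idea described above, picking a few best predictors out of many by stepwise regression, can be sketched with a simple forward-selection loop. This is a generic illustration on synthetic data, not the procedure applied to the IRIS spectra; in particular the stopping rule (a minimum relative drop in the residual sum of squares) is an assumption standing in for whatever significance test the original study used, and all names are hypothetical.

```python
import numpy as np


def rss(X, y):
    """Residual sum of squares of an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)


def forward_stepwise(X, y, max_terms=4, min_gain=0.2):
    """Greedy forward selection of predictor columns.

    At each step, add the column that lowers the residual sum of squares
    the most; stop when the relative improvement falls below min_gain.
    """
    n, p = X.shape
    ones = np.ones((n, 1))
    chosen = []
    current = rss(ones, y)  # intercept-only baseline
    while len(chosen) < max_terms:
        best = None
        for j in range(p):
            if j in chosen:
                continue
            cand = rss(np.hstack([ones, X[:, chosen + [j]]]), y)
            if best is None or cand < best[1]:
                best = (j, cand)
        if best is None or (current - best[1]) / current < min_gain:
            break
        chosen.append(best[0])
        current = best[1]
    return chosen


# Six candidate predictors, of which only columns 1 and 4 drive the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
y = 3.0 * X[:, 1] - 2.0 * X[:, 4] + 0.1 * rng.normal(size=100)

selected = forward_stepwise(X, y)
```

The resulting regression equation contains only the selected columns plus an intercept, which matches the abstract's remark that the derived equations contain only a few terms.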
International Nuclear Information System (INIS)
Much attention is focused on increasing the energy efficiency to decrease fuel costs and CO2 emissions throughout industrial sectors. The ORC (organic Rankine cycle) is a relatively simple but efficient process that can be used for this purpose by converting low and medium temperature waste heat to power. In this study we propose four linear regression models to predict the maximum obtainable thermal efficiency for simple and recuperated ORCs. A previously derived methodology is able to determine the maximum thermal efficiency among many combinations of fluids and processes, given the boundary conditions of the process. Hundreds of optimised cases with varied design parameters are used as observations in four multiple regression analyses. We analyse the model assumptions, prediction abilities and extrapolations, and compare the results with recent studies in the literature. The models are in agreement with the literature, and they present an opportunity for accurate prediction of the potential of an ORC to convert heat sources with temperatures from 80 to 360 °C, without detailed knowledge or need for simulation of the process.
Highlights:
• The maximum thermal efficiency of ORCs in hundreds of cases was analysed.
• Multiple regression models were derived to predict the maximum obtainable efficiency of ORCs.
• Using only key design parameters, the maximum obtainable efficiency can be evaluated.
• The regression models decrease the resources needed to evaluate the maximum potential.
• The models are statistically strong and in good agreement with the literature.
User's Guide to the Weighted-Multiple-Linear Regression Program (WREG version 1.0)
Eng, Ken; Chen, Yin-Yu; Kiang, Julie E.
2009-01-01
Streamflow is not measured at every location in a stream network. Yet hydrologists, State and local agencies, and the general public still seek to know streamflow characteristics, such as mean annual flow or flood flows with different exceedance probabilities, at ungaged basins. The goals of this guide are to introduce and familiarize the user with the weighted multiple-linear regression (WREG) program, and to also provide the theoretical background for program features. The program is intended to be used to develop a regional estimation equation for streamflow characteristics that can be applied at an ungaged basin, or to improve the corresponding estimate at continuous-record streamflow gages with short records. The regional estimation equation results from a multiple-linear regression that relates the observable basin characteristics, such as drainage area, to streamflow characteristics.
Mass estimation of loose parts in nuclear power plant based on multiple regression
International Nuclear Information System (INIS)
According to the application of the Hilbert–Huang transform to the non-stationary signal and the relation between the mass of loose parts in nuclear power plant and corresponding frequency content, a new method for loose part mass estimation based on the marginal Hilbert–Huang spectrum (MHS) and multiple regression is proposed in this paper. The frequency spectrum of a loose part in a nuclear power plant can be expressed by the MHS. The multiple regression model that is constructed by the MHS feature of the impact signals for mass estimation is used to predict the unknown masses of a loose part. A simulated experiment verified that the method is feasible and the errors of the results are acceptable. (paper)
Moderated multiple regression (MMR) arguably is the most popular statistical technique for investigating regression slope differences (interactions) across groups (e.g., aptitude-treatment interactions in training and differential test score-job performance prediction in selection testing). However, heterogeneous error variances can greatly bias the typical MMR analysis, and the conditions that cause heterogeneity are not uncommon. Statistical corrections that have been developed require special calculations and are not conducive to follow-up analyses that describe an interaction effect in depth. A weighted least squares (WLS) approach is recommended for 2-group studies. For 2-group studies, WLS is statistically accurate, is readily executed through popular software packages (e.g., SAS Institute, 1999; SPSS, 1999), and allows follow-up tests. PMID:11570229
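The WLS recommendation for 2-group studies can be illustrated with a minimal sketch. It assumes that each observation is weighted by the inverse of its own group's residual variance, estimated from a separate OLS fit per group; the function name `wls_two_group` and the simulated data are hypothetical and are not taken from the cited article.

```python
import numpy as np

def wls_two_group(x, y, group):
    """Fit y = b0 + b1*x + b2*g + b3*x*g by weighted least squares.

    Assumption for illustration: weights are 1 / s_g^2, where s_g^2 is the
    residual variance of an OLS fit within group g (g is coded 0 or 1).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    group = np.asarray(group, dtype=int)
    weights = np.empty_like(y)
    for g in (0, 1):
        mask = group == g
        Xg = np.column_stack([np.ones(mask.sum()), x[mask]])
        bg, *_ = np.linalg.lstsq(Xg, y[mask], rcond=None)
        resid = y[mask] - Xg @ bg
        s2 = np.sum(resid ** 2) / (mask.sum() - 2)  # within-group error variance
        weights[mask] = 1.0 / s2
    # Moderated model: intercept, predictor, group dummy, interaction
    X = np.column_stack([np.ones_like(x), x, group, x * group])
    XtW = X.T * weights
    cov = np.linalg.inv(XtW @ X)       # approximate covariance of the estimates
    beta = cov @ (XtW @ y)
    se = np.sqrt(np.diag(cov))
    return beta, se

# Simulated two-group data with very different error variances
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=2 * n)
group = np.repeat([0, 1], n)
noise_sd = np.where(group == 0, 0.5, 3.0)
y = 1.0 + 0.5 * x + 1.0 * group + 1.0 * x * group + rng.normal(size=2 * n) * noise_sd
beta, se = wls_two_group(x, y, group)
print("coefficients:", beta)
print("standard errors:", se)
```

In this fully interacted two-group model the coefficient estimates coincide with those from separate per-group OLS fits; the weighting changes the standard errors, and therefore the test of the interaction term `b3`.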
Several statistical packages are capable of estimating generalized linear mixed models and these packages provide one or more of three estimation methods: penalized quasi-likelihood, Laplace, and Gauss-Hermite. Many studies have investigated these methods’ performance for the mixed-effects logistic regression model. However, the authors focused on models with one or two random effects and assumed a simple covariance structure between them, which may not be realistic. When there are multiple...
Multiple-regression equations for estimating low flows at ungaged stream sites in Ohio
Koltun, G.F.; Schwartz, R.R.
1987-01-01
This report presents multiple-regression equations for estimating selected low-flow characteristics for most unregulated Ohio streams at sites where little or no discharge data are available. The equations relate combinations of drainage area, main-channel length, main-channel slope, average basin elevation, forested area, average annual precipitation, and an index of infiltration to low flows with durations of 7 and 30 days and average recurrence intervals of 2 and 10 years. Data from 132 long-term continuous-record gaging stations and partial-record sites in Ohio were used in the analyses. Multiple-regression analyses were first performed by using data from all 132 sites in an attempt to develop equations that would be applicable statewide. Standard errors for the statewide equations were too high (111 to 189 percent) for them to be of practical use in estimating low streamflows. Data for the state were then subdivided into five regions, and multiple-regression equations were developed for each region. Standard errors for four of the five regions improved, and ranged from 43 to 106 percent. Standard errors for region 5 remained high (74 to 129 percent). The multiple-regression equations presented in this report are not applicable to streams with significant low-flow regulation. The equations also are not applicable if (1) the site has been gaged and low-flow estimates have been developed from gaging-station records, (2) low flow can be estimated by the drainage-area transference method from data for a nearby gaged site, or (3) a sufficient number of partial-record measurements made at the site can be adequately correlated with concurrent base flows at a suitable index station.
Estimation of Saturation Percentage of Soil Using Multiple Regression, ANN, and ANFIS Techniques
Khaled Ahmad Aali; Masoud Parsinejad; Bizhan Rahmani
2009-01-01
The saturation percentage (SP) of soils is an important index in hydrological studies. In this paper, artificial neural networks (ANNs), multiple regression (MR), and adaptive neural-based fuzzy inference system (ANFIS) were used for estimation of saturation percentage of soils collected from Boukan region in the northwestern part of Iran. Percent clay, silt, sand and organic carbon (OC) were used to develop the applied methods. In addition, contributions of each input variable were asse...
Multiple linear regression MOS for short-term wind power forecast
Short-term (0 - 36 h ahead) wind power forecast is a central issue for the correct management of a grid connected wind farm. A combination of physical and statistical treatments to post-process Numerical Weather Predictions (NWP) outputs is needed for successful short-term wind power forecasts. One of the most promising and effective approaches for statistical treatment is the Model Output Statistics (MOS) technique. In this study a MOS based on multiple linear regression is proposed: the mod...
Least Squares Adjustment: Linear and Nonlinear Weighted Regression Analysis
DEFF Research Database (Denmark)
Nielsen, Allan Aasbjerg
2007-01-01
This note primarily describes the mathematics of least squares regression analysis as it is often used in geodesy, including land surveying and satellite positioning applications. In these fields regression is often termed adjustment. The note also contains a couple of typical land surveying and satellite positioning application examples. In these application areas we are typically interested in the parameters in the model, typically 2- or 3-D positions, and not in predictive modelling, which is often the main concern in other regression analysis applications. Adjustment is often used to obtain estimates of relevant parameters in an over-determined system of equations, which may arise from deliberately carrying out more measurements than actually needed to determine the set of desired parameters. An example may be the determination of a geographical position based on information from a number of Global Navigation Satellite System (GNSS) satellites, also known as space vehicles (SV). It takes at least four SVs to determine the position (and the clock error) of a GNSS receiver. Often more than four SVs are used, and we use adjustment to obtain a better estimate of the geographical position (and the clock error) and to obtain estimates of the uncertainty with which the position is determined. Regression analysis is used in many other fields of application in the natural, the technical and the social sciences. Examples may be curve fitting, calibration, establishing relationships between different variables in an experiment or in a survey, etc. Regression analysis is probably one of the most used statistical techniques around. Dr. Anna B. O. Jensen provided insight and data for the Global Positioning System (GPS) example. Matlab code and sections that are considered as either traditional land surveying material or as advanced material are typeset with smaller fonts. Comments in general or on, for example, unavoidable typos, shortcomings and errors are most welcome.
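The idea of adjusting an over-determined system can be sketched with a small levelling network. The example below is hypothetical (it is not taken from the note, which uses Matlab): three measured height differences determine two unknown heights above a fixed benchmark, and the weighted normal equations give the estimates, the residuals and an uncertainty estimate. The names `A`, `ell` and `P` are illustrative choices.

```python
import numpy as np

# Unknowns: heights H1, H2 above a benchmark fixed at 0 (hypothetical network).
# Observations (metres): H1 - 0, H2 - H1, H2 - 0  -> 3 equations, 2 unknowns.
A = np.array([[1.0, 0.0],
              [-1.0, 1.0],
              [0.0, 1.0]])
ell = np.array([1.02, 0.49, 1.50])   # measured height differences
P = np.eye(3)                        # weight matrix (equal weights assumed here)

# Weighted normal equations: (A^T P A) x = A^T P ell
N = A.T @ P @ A
x_hat = np.linalg.solve(N, A.T @ P @ ell)

# Residuals and a posteriori variance factor with n - p degrees of freedom
v = A @ x_hat - ell
dof = A.shape[0] - A.shape[1]
sigma0_sq = (v @ P @ v) / dof

# Covariance matrix of the estimated heights
cov_x = sigma0_sq * np.linalg.inv(N)

print("adjusted heights:", x_hat)
print("residuals:", v)
print("std. deviations:", np.sqrt(np.diag(cov_x)))
```

The loop misclosure of 0.01 m (1.02 + 0.49 - 1.50) is spread evenly over the three equally weighted observations, which is why each residual has magnitude 0.01/3.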
In this study, the application of Artificial Neural Networks (ANN) and Multiple regression analysis (MR) to forecast long-term seasonal spring rainfall in Victoria, Australia was investigated using lagged El Nino Southern Oscillation (ENSO) and Indian Ocean Dipole (IOD) as potential predictors. The use of dual (combined lagged ENSO-IOD) input sets for calibrating and validating ANN and MR Models is proposed to investigate the simultaneous effect of past values of these two major climate modes on long-term spring rainfall prediction. The MR models that did not violate the limits of statistical significance and multicollinearity were selected for future spring rainfall forecast. The ANN was developed in the form of multilayer perceptron using Levenberg-Marquardt algorithm. Both MR and ANN modelling were assessed statistically using mean square error (MSE), mean absolute error (MAE), Pearson correlation (r) and Willmott index of agreement (d). The developed MR and ANN models were tested on out-of-sample test sets; the MR models showed very poor generalisation ability for east Victoria with correlation coefficients of -0.99 to -0.90 compared to ANN with correlation coefficients of 0.42-0.93; ANN models also showed better generalisation ability for central and west Victoria with correlation coefficients of 0.68-0.85 and 0.58-0.97 respectively. The ability of multiple regression models to forecast out-of-sample sets is compatible with ANN for Daylesford in central Victoria and Kaniva in west Victoria (r = 0.92 and 0.67 respectively). The errors of the testing sets for ANN models are generally lower compared to multiple regression models. The statistical analysis suggests the potential of ANN over MR models for rainfall forecasting using large scale climate modes.
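The four skill measures named in this abstract can be computed as in the sketch below. It assumes the original form of the Willmott index, d = 1 - Σ(P - O)² / Σ(|P - Ō| + |O - Ō|)²; the abstract does not state which variant was used, and the function name and sample values are hypothetical.

```python
import numpy as np

def forecast_scores(obs, pred):
    """Return MSE, MAE, Pearson r and Willmott's index of agreement d."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    err = pred - obs
    mse = np.mean(err ** 2)
    mae = np.mean(np.abs(err))
    r = np.corrcoef(obs, pred)[0, 1]
    obs_mean = obs.mean()
    # Willmott (original form): 1 - squared error / potential error
    potential = np.sum((np.abs(pred - obs_mean) + np.abs(obs - obs_mean)) ** 2)
    d = 1.0 - np.sum(err ** 2) / potential
    return {"mse": mse, "mae": mae, "r": r, "d": d}

# Hypothetical observed and forecast values (not data from the study)
scores = forecast_scores([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])
print(scores)
```

A strongly negative r on a test set, as reported for the MR models in east Victoria, can coexist with moderate MSE and MAE values, which is one reason to report several measures side by side.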
Bayesian Method of Moments (BMOM) Analysis of Mean and Regression Models
Zellner, Arnold
2008-01-01
A Bayesian method of moments/instrumental variable (BMOM/IV) approach is developed and applied in the analysis of the important mean and multiple regression models. Given a single set of data, it is shown how to obtain posterior and predictive moments without the use of likelihood functions, prior densities and Bayes' Theorem. The posterior and predictive moments, based on a few relatively weak assumptions, are then used to obtain maximum entropy densities for parameters, realized error terms and future values of variables. Posterior means for parameters and realized error terms are shown to be equal to certain well known estimates and rationalized in terms of quadratic loss functions. Conditional maxent posterior densities for means and regression coefficients given scale parameters are in the normal form while scale parameters' maxent densities are in the exponential form. Marginal densities for individual regression coefficients, realized error terms and future values are in the Laplace or double-exponenti...
Regression equations have many useful roles in psychological assessment. Moreover, there is a large reservoir of published data that could be used to build regression equations; these equations could then be employed to test a wide variety of hypotheses concerning the functioning of individual cases. This resource is currently underused because…
Lidauer, M H; Emmerling, R; Mäntysaari, E A
2008-06-01
A multiplicative random regression (M-RRM) test-day (TD) model was used to analyse daily milk yields from all available parities of German and Austrian Simmental dairy cattle. The method to account for heterogeneous variance (HV) was based on the multiplicative mixed model approach of Meuwissen. The variance model for the heterogeneity parameters included a fixed region x year x month x parity effect and a random herd x test-month effect with a within-herd first-order autocorrelation between test-months. Acceleration of variance model solutions after each multiplicative model cycle enabled fast convergence of adjustment factors and reduced total computing time significantly. Maximum Likelihood estimation of within-strata residual variances was enhanced by inclusion of approximated information on loss in degrees of freedom due to estimation of location parameters. This improved heterogeneity estimates for very small herds. The multiplicative model was compared with a model that assumed homogeneous variance. Re-estimated genetic variances, based on Mendelian sampling deviations, were homogeneous for the M-RRM TD model but heterogeneous for the homogeneous random regression TD model. Accounting for HV had a large effect on cow ranking but a moderate effect on bull ranking. PMID:18479265
What fiscal policy is most effective? A Meta Regression Analysis
Gechert, Sebastian
2013-01-01
We apply meta regression analysis to a unique data set of 104 studies on multiplier effects with 1069 reported multipliers in order to derive stylized facts and to quantify the differing effectiveness of the composition of fiscal impulses, adjusted for the interference of study-design characteristics and sample specifics. As a major result, we find that public spending multipliers are close to one and about 0.3 to 0.4 units larger than tax and transfer multipliers. Public investment multiplie...
2005-01-01
Typical alternative hypotheses in the analysis of residuals of a standard regression model are considered, and for each one a Bayesian diagnostic based on a symmetric form of the Kullback-Leibler divergence is determined. The results include an explicit expression for the diagnostic when the alternative hypothesis is that the errors are generated by an unknown distribution function with a Dirichlet process prior. This expression is immediately interpretable, exactly computable and endowed wit...
Entrepreneurship programs in developing countries: A meta regression analysis
This paper provides a synthetic and systematic review on the effectiveness of various entrepreneurship programs in developing countries. We adopt a meta-regression analysis using 37 impact evaluation studies that were in the public domain by March 2012, and draw out several lessons on the design of the programs. We observe a wide variation in program effectiveness across different interventions depending on outcomes, types of beneficiaries, and country context. Overall, entrepreneurship progr...
A Logistic Regression Analysis of the Ischemic Heart Disease Risk
The main objective of the present study is to investigate factors that contribute significantly to enhancing the risk of ischemic heart disease. The dependent variable of the study is diagnosis - whether the patient has the disease or does not have the disease. Logistic regression analysis is applied for exploring the factors affecting the disease. The results of the study show that the factors that contribute significantly to enhancing the risk of ischemic heart disease are the use of banaspati gh...
Globalisation and the welfare state - A meta-regression analysis
The effect of economic globalisation on the welfare state is a widely polarised debate in the scholarly literature. In essence, there are three possible effects of this relationship: economic globalisation increases welfare, decreases welfare or it has no effect. By applying meta-regression analysis to 33 empirical studies, this thesis concludes that globalisation has a positive effect on the welfare state, although it is quite small. Moreover, the thesis finds that publication bias is not ...
Poisson Regression Analysis of Illness and Injury Surveillance Data
The Department of Energy (DOE) uses illness and injury surveillance to monitor morbidity and assess the overall health of the work force. Data collected from each participating site include health events and a roster file with demographic information. The source data files are maintained in a relational data base, and are used to obtain stratified tables of health event counts and person time at risk that serve as the starting point for Poisson regression analysis. The explanatory variables that define these tables are age, gender, occupational group, and time. Typical response variables of interest are the number of absences due to illness or injury, i.e., the response variable is a count. Poisson regression methods are used to describe the effect of the explanatory variables on the health event rates using a log-linear main effects model. Results of fitting the main effects model are summarized in a tabular and graphical form and interpretation of model parameters is provided. An analysis of deviance table is used to evaluate the importance of each of the explanatory variables on the event rate of interest and to determine if interaction terms should be considered in the analysis. Although Poisson regression methods are widely used in the analysis of count data, there are situations in which over-dispersion occurs. This could be due to lack-of-fit of the regression model, extra-Poisson variation, or both. A score test statistic and regression diagnostics are used to identify over-dispersion. A quasi-likelihood method of moments procedure is used to evaluate and adjust for extra-Poisson variation when necessary. Two examples are presented using respiratory disease absence rates at two DOE sites to illustrate the methods and interpretation of the results. In the first example the Poisson main effects model is adequate. 
In the second example the score test indicates considerable over-dispersion and a more detailed analysis attributes the over-dispersion to extra-Poisson variation. The R open source software environment for statistical computing and graphics is used for analysis. Additional details about R and the data that were used in this report are provided in an Appendix. Information on how to obtain R and utility functions that can be used to duplicate results in this report are provided.
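The log-linear rate model and the check for extra-Poisson variation can be sketched as follows. The report itself uses R; this is a Python illustration under stated assumptions: the model is fitted by iteratively reweighted least squares with log person-time as an offset, and over-dispersion is gauged by the Pearson chi-square divided by its degrees of freedom, a common moment estimator that may differ from the report's exact procedure. The counts and person-time values are invented.

```python
import numpy as np

def poisson_irls(X, y, offset, n_iter=50, tol=1e-10):
    """Fit log(mu) = offset + X @ beta by IRLS; return beta and Pearson dispersion."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta + offset
        mu = np.exp(eta)
        z = (eta - offset) + (y - mu) / mu    # working response without the offset
        XtW = X.T * mu                        # Poisson working weights equal mu
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    mu = np.exp(X @ beta + offset)
    pearson_chi2 = np.sum((y - mu) ** 2 / mu)
    dispersion = pearson_chi2 / (len(y) - X.shape[1])
    return beta, dispersion

# Hypothetical stratified table: event counts and person-time in four strata,
# with one binary explanatory variable (e.g. an occupational group indicator).
counts = np.array([12.0, 18.0, 28.0, 62.0])
person_time = np.array([100.0, 200.0, 100.0, 200.0])
indicator = np.array([0.0, 0.0, 1.0, 1.0])
X = np.column_stack([np.ones(4), indicator])

beta, dispersion = poisson_irls(X, counts, np.log(person_time))
print("baseline rate:", np.exp(beta[0]))
print("rate ratio:", np.exp(beta[1]))
print("Pearson dispersion:", dispersion)
```

A dispersion value well above 1 would point to over-dispersion; in that case standard errors are commonly inflated by the square root of the dispersion estimate.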
Data from the Interagency Monitoring of Protected Visual Environments (IMPROVE) network are used to estimate organic mass to organic carbon (OM/OC) ratios across the United States by extending previously published multiple regression techniques. Our new methodology addresses com...
This article proposes the use of the coefficient of determination as a statistic for hypothesis testing in multiple linear regression based on distributions acquired by beta sampling. (Contains 3 figures.)
Regression analysis of radiological parameters in nuclear power plants
International Nuclear Information System (INIS)
Indian Pressurized Heavy Water Reactors (PHWRs) have now attained maturity in their operations. Indian PHWR operation started in the year 1972. At present there are 12 operating PHWRs collectively producing nearly 2400 MWe. Sufficient radiological data are available for analysis to draw inferences which may be utilised for better understanding of radiological parameters influencing the collective internal dose. Tritium is the main contributor to the occupational internal dose originating in PHWRs. An attempt has been made to establish the relationship between radiological parameters, which may be useful to draw inferences about the internal dose. Regression analysis has been done to find out the relationship, if it exists, among the following variables: A. Specific tritium activity of heavy water (Moderator and PHT) and tritium concentration in air at various work locations. B. Internal collective occupational dose and tritium release to environment through air route. C. Specific tritium activity of heavy water (Moderator and PHT) and collective internal occupational dose. For this purpose multivariate regression analysis has been carried out. D. Tritium concentration in air at various work locations and tritium release to environment through air route. For this purpose multivariate regression analysis has been carried out. This analysis reveals that collective internal dose has a very good correlation with the tritium activity release to the environment through the air route, whereas no correlation has been found between specific tritium activity in the heavy water systems and collective internal occupational dose. A good correlation has been found in case D, and the F test reveals that it is not due to chance. (author)
Lacey, Michelle
This site, created by Michelle Lacey of Yale University, gives an explanation, a definition and an example of ANOVA for regression. Topics include analysis of variance calculations for simple and multiple regression, and F-statistics. This is a great overview of this topic.
International Nuclear Information System (INIS)
Highlights: • We obtained models for estimation of cetane number of biodiesel. • Twenty-four neural networks using two topologies were evaluated. • The best neural network to predict the cetane number was selected. • The best accuracy was obtained for the selected neural network. - Abstract: Models for estimation of cetane number of biodiesel from their fatty acid methyl ester composition using multiple linear regression and artificial neural networks were obtained in this work. To obtain the models, experimental data from literature reports covering 48 and 15 biodiesels in the modeling-training step and validation step, respectively, were used. Twenty-four neural networks using two topologies and different algorithms for the second training step were evaluated. The model obtained using multiple regression was compared with two other models from literature and it was able to predict cetane number with 89% accuracy, with one outlier. A model to predict cetane number using an artificial neural network was obtained with accuracy better than 92%, except for one outlier. The best neural network to predict the cetane number was a backpropagation network (11:5:1) using the Levenberg–Marquardt algorithm for the second step of the network training, showing R = 0.9544 for the validation data.
International Nuclear Information System (INIS)
Objective: To analyze the correlations between liver lipid level determined by liver 3.0 T 1H-MRS in vivo and influencing factors using multiple linear stepwise regression. Methods: The prospective study of liver 1H-MRS was performed with a 3.0 T system and eight-channel torso phased-array coils using the PRESS sequence. Forty-four volunteers were enrolled in this study. Liver spectra were collected with a TR of 1500 ms, TE of 30 ms, volume of interest of 2 cm×2 cm×2 cm, NSA of 64 times. The acquired raw proton MRS data were processed by using the software program SAGE. For each MRS measurement, using water as the internal reference, the amplitude of the lipid signal was normalized to the sum of the signal from lipid and water to obtain percentage lipid within the liver. The statistical description of height, weight, age, BMI, line width and water suppression was recorded, and Pearson analysis was applied to test their relationships. Multiple linear stepwise regression was used to set the statistical model for the prediction of liver lipid content. Results: Age (39.1±12.6) years, body weight (64.4±10.4) kg, BMI (23.3±3.1) kg/m2, line width (18.9±4.4) and the water suppression (90.7±6.5)% had significant correlation with liver lipid content (0.00 to 0.96%, median 0.02%); r values were 0.11, 0.44, 0.40, 0.52, -0.73 respectively (P<0.05). But only age, BMI, line width, and the water suppression entered into the multiple linear regression equation. The liver lipid content prediction equation was as follows: Y = 1.395 - (0.021×water suppression) + (0.022×BMI) + (0.014×line width) - (0.004×age), and the coefficient of determination was 0.613; the corrected coefficient of determination was 0.59. Conclusion: The regression model fitted well, since the variables of age, BMI, line width, and water suppression can explain about 60% of liver lipid content changes. (authors)
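The reported prediction equation can be evaluated directly, as in this small sketch. The coefficients are those quoted in the abstract; the function name is hypothetical, and plugging in the sample means is only an illustration of the arithmetic, not a result from the study.

```python
def predict_liver_lipid(water_suppression, bmi, line_width, age):
    """Evaluate the regression equation quoted in the abstract.

    Inputs use the abstract's units: water suppression in %, BMI in kg/m2,
    line width as reported there, age in years.
    """
    return (1.395
            - 0.021 * water_suppression
            + 0.022 * bmi
            + 0.014 * line_width
            - 0.004 * age)

# Illustration: evaluate the equation at the reported sample means
y_at_means = predict_liver_lipid(water_suppression=90.7, bmi=23.3,
                                 line_width=18.9, age=39.1)
print(round(y_at_means, 4))
```

The signs show the direction of each association in the fitted model: higher water suppression and age lower the predicted value, while higher BMI and line width raise it.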
In animal breeding, when there is a relationship between the dependent (Y) and independent (X) variables, regression analysis is applied. But when one of the variables has one or more missing observations regression analysis cannot be applied. This paper illustrates and discusses a regression analysis in which the independent variable (X) has a missing observation.
Multivariate study and regression analysis of gluten-free granola
Scientific Electronic Library Online (English)
Hu, L; Zhang, Z G; Mouraux, A; Iannetti, G D
2015-05-01
Transient sensory, motor or cognitive events elicit not only phase-locked event-related potentials (ERPs) in the ongoing electroencephalogram (EEG), but also induce non-phase-locked modulations of ongoing EEG oscillations. These modulations can be detected when single-trial waveforms are analysed in the time-frequency domain, and consist in stimulus-induced decreases (event-related desynchronization, ERD) or increases (event-related synchronization, ERS) of synchrony in the activity of the underlying neuronal populations. ERD and ERS reflect changes in the parameters that control oscillations in neuronal networks and, depending on the frequency at which they occur, represent neuronal mechanisms involved in cortical activation, inhibition and binding. ERD and ERS are commonly estimated by averaging the time-frequency decomposition of single trials. However, their trial-to-trial variability, which can reflect physiologically important information, is lost by across-trial averaging. Here, we aim to (1) develop novel approaches to explore single-trial parameters (including latency, frequency and magnitude) of ERP/ERD/ERS; (2) disclose the relationship between estimated single-trial parameters and other experimental factors (e.g., perceived intensity). We found that (1) stimulus-elicited ERP/ERD/ERS can be correctly separated using principal component analysis (PCA) decomposition with Varimax rotation on the single-trial time-frequency distributions; (2) time-frequency multiple linear regression with dispersion term (TF-MLRd) enhances the signal-to-noise ratio of ERP/ERD/ERS in single trials, and provides an unbiased estimation of their latency, frequency, and magnitude at single-trial level; (3) these estimates can be meaningfully correlated with each other and with other experimental factors at single-trial level (e.g., perceived stimulus intensity and ERP magnitude). 
The methods described in this article allow exploring fully non-phase-locked stimulus-induced cortical oscillations, obtaining single-trial estimate of response latency, frequency, and magnitude. This permits within-subject statistical comparisons, correlation with pre-stimulus features, and integration of simultaneously-recorded EEG and fMRI. PMID:25665966
Isolated Area Load Forecasting using Linear Regression Analysis: Practical Approach
Spontaneous regression of multiple pulmonary metastatic nodules of hepatocarcinoma: a case report
International Nuclear Information System (INIS)
Although rare, spontaneous regression of either primary or metastatic malignant tumors in the absence of therapy, or with inadequate therapy, has been well documented. Since the earliest days of this century various malignant tumors have been reported to spontaneously disappear or to have their growth arrested, but cases of hepatocarcinoma have been very rare. From the literature, we were able to find 5 previously reported cases of hepatocarcinoma which showed spontaneous regression at the primary site. Recently we have seen a case of multiple pulmonary metastatic nodules of hepatocarcinoma which completely regressed spontaneously, and this forms the basis of the present case report. The patient was a 55-year-old male admitted to St. Mary's Hospital, Catholic Medical College because of a hard palpable mass in the epigastrium on April 26, 1978. The admission PA chest roentgenogram revealed multiple small nodular densities scattered throughout both lung fields, especially in the lower zones and toward the peripheral portion. A hepatoscintigram revealed a large cold area involving the left lobe and intermediate zone of the liver. Alpha-fetoprotein and hepatitis B serum antigen tests were positive whereas many other standard liver function tests turned out to be negative. A needle biopsy of the tumor revealed well differentiated hepatocellular carcinoma. The patient was put under chemotherapy which consisted of 5 FU 500 mg intravenously for 6 days from April 28 to May 3, 1978. The patient was discharged after this single course of 5 FU treatment and was on a herb medicine, the nature and quantity of which are obscure. No other specific treatment was given. The second admission took place on Dec. 3, 1980 because of irregularity in bowel habits and dyspepsia. A follow up PA chest roentgenogram obtained on the second admission revealed complete disappearance of previously noted multiple pulmonary nodular lesions (Fig. 3). 
Follow up liver scan revealed persistence of the cold area in the left lobe with slight decrease in size. The patient was discharged again without any specific prescription after confirming negative results of various clinical studies including upper GI series and colon study. At the time of finishing this paper the patient is doing well without apparent medical problems
International Nuclear Information System (INIS)
Parameters derived from computer analysis of digital radio-frequency (rf) ultrasound scan data of untreated uveal malignant melanomas were examined for correlations with tumor regression following cobalt-60 plaque. Parameters included tumor height, normalized power spectrum and acoustic tissue type (ATT). Acoustic tissue type was based upon discriminant analysis of tumor power spectra, with spectra of tumors of known pathology serving as a model. Results showed ATT to be correlated with tumor regression during the first 18 months following treatment. Tumors with ATT associated with spindle cell malignant melanoma showed over twice the percentage reduction in height as those with ATT associated with mixed/epithelioid melanomas. Pre-treatment height was only weakly correlated with regression. Additionally, significant spectral changes were observed following treatment. Ultrasonic spectrum analysis thus provides a noninvasive tool for classification, prediction and monitoring of tumor response to cobalt-60 plaque
Spontaneous regression of a large hepatocellular carcinoma with multiple lung metastases.
Saito, Tamiko; Naito, Masafumi; Matsumura, Yuki; Kita, Hisaaki; Kanno, Tomoyo; Nakada, Yuki; Hamano, Mina; Chiba, Miho; Maeda, Kosaku; Michida, Tomoki; Ito, Toshifumi
2014-09-01
A 75-year-old Japanese man with chronic hepatitis C was found to have a large liver tumor and multiple nodules in the bilateral lungs. We diagnosed the tumor as hepatocellular carcinoma (HCC) with multiple lung metastases based on imaging studies and high titers of HCC tumor markers. Remarkably, without any anticancer treatment or medication, including herbal preparations, the liver tumor decreased in size, and the tumor markers diminished. Moreover, after 1 year, the multiple nodules in the bilateral lungs had disappeared. Fifteen months after the first medical examination, transcatheter arterial chemoembolization (TACE) was performed for the residual HCC. Because local relapse was observed on follow-up computed tomography, a second TACE was performed 13 months after the first one. At 4 years after the second TACE (7 years after the initial medical examination), there was no recurrence of primary or metastatic lesions. Spontaneous regression of HCC is very rare, and its mechanism remains unclear. Understanding the underlying mechanism of this rare phenomenon may offer some hope of finding new therapies, even in advanced metastatic cases. PMID:25228980
Identifying the Factors that Influence Change in SEBD Using Logistic Regression Analysis
Directory of Open Access Journals (Sweden)
Liberato Camilleri
2013-07-01
Multiple linear regression and ANOVA models are widely used in applications since they provide effective statistical tools for assessing the relationship between a continuous dependent variable and several predictors. However, these models rely heavily on linearity and normality assumptions and they do not accommodate categorical dependent variables. The seminal contribution of John Nelder and Robert Wedderburn (1972) introduced the concept of Generalized Linear Models. GLMs overcome the limitations of Normal regression models and accommodate any distribution which is a member of the exponential family. Moreover, these models relate the dependent variable to the linear predictor (non-random component) through any invertible link function. Logistic regression models are GLMs that accommodate categorical dependent variables. They assume a Binomial distribution and Logit canonical link function. The iteratively re-weighted least squares algorithm using the Fisher scoring technique is employed to maximize the log-likelihood function in GLMs and estimate the model parameters. In this paper, Logistic regression analysis was used to identify the dominant factors that influence change in social, emotional and behaviour difficulties (SEBD) of Maltese children. The study comprised 486 pupils whose SEBD was assessed by both teachers and parents using the Strengths and Difficulties Questionnaire (Goodman 1997) when the children were aged 6 and 9 years old.
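The iteratively re-weighted least squares idea for a Binomial model with logit link can be sketched in a few lines. This is a generic illustration with invented data, not the paper's analysis; the function name `logistic_irls` is hypothetical. For the canonical logit link, Fisher scoring and Newton-Raphson give the same update.

```python
import numpy as np

def logistic_irls(X, y, n_iter=50, tol=1e-10):
    """Maximise the Binomial log-likelihood with logit link by IRLS."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta                     # linear predictor
        p = 1.0 / (1.0 + np.exp(-eta))     # inverse logit link
        w = p * (1.0 - p)                  # Fisher scoring weights
        z = eta + (y - p) / w              # working response
        XtW = X.T * w
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta

# Hypothetical 2x2 data: 10 of 40 positives when x = 0, 30 of 40 when x = 1
x = np.repeat([0.0, 1.0], 40)
y = np.concatenate([np.repeat([1.0, 0.0], [10, 30]),
                    np.repeat([1.0, 0.0], [30, 10])])
X = np.column_stack([np.ones_like(x), x])

beta = logistic_irls(X, y)
print("intercept (log odds at x = 0):", beta[0])
print("odds ratio for x:", np.exp(beta[1]))
```

With a single binary predictor the fitted odds ratio equals the cross-product ratio of the 2x2 table, here (30 × 30) / (10 × 10) = 9, which gives a quick check on the iteration.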
Confidence intervals for distinguishing ordinal and disordinal interactions in multiple regression.
Lee, Sunbok; Lei, Man-Kit; Brody, Gene H
2015-06-01
Distinguishing between ordinal and disordinal interaction in multiple regression is useful in testing many interesting theoretical hypotheses. Because the distinction is made based on the location of a crossover point of 2 simple regression lines, confidence intervals of the crossover point can be used to distinguish ordinal and disordinal interactions. This study examined 2 factors that need to be considered in constructing confidence intervals of the crossover point: (a) the assumption about the sampling distribution of the crossover point, and (b) the possibility of abnormally wide confidence intervals for the crossover point. A Monte Carlo simulation study was conducted to compare 6 different methods for constructing confidence intervals of the crossover point in terms of the coverage rate, the proportion of true values that fall to the left or right of the confidence intervals, and the average width of the confidence intervals. The methods include the reparameterization, delta, Fieller, basic bootstrap, percentile bootstrap, and bias-corrected accelerated bootstrap methods. The results of our Monte Carlo simulation study suggest that statistical inference using confidence intervals to distinguish ordinal and disordinal interaction requires sample sizes greater than 500 to provide sufficiently narrow confidence intervals to identify the location of the crossover point. (PsycINFO Database Record PMID:25844629
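As a rough illustration of the delta method named in the abstract above, the sketch below assumes the usual interaction model y = b0 + b1*x + b2*z + b3*x*z, in which the simple regression lines of y on z cross at z = -b1/b3 when x is the moderator, or equivalently the lines of y on x cross at x = -b2/b3 when z is the moderator. The second form is used here. The function name, the argument names, and the normal-approximation multiplier 1.96 are assumptions made for illustration and are not taken from the study.

```python
import math

def crossover_delta_ci(b2, b3, var_b2, var_b3, cov_b2_b3, z_crit=1.96):
    """Crossover point c = -b2/b3 with a delta-method confidence interval.

    b2, b3       : estimated coefficients of z and x*z (hypothetical names)
    var_*, cov_* : their estimated sampling variances and covariance
    """
    if b3 == 0:
        raise ValueError("no interaction term: the lines never cross")
    c = -b2 / b3
    # Gradient of c with respect to (b2, b3).
    d_b2 = -1.0 / b3
    d_b3 = b2 / b3 ** 2
    # First-order (delta-method) approximation of Var(c).
    var_c = (d_b2 ** 2) * var_b2 + (d_b3 ** 2) * var_b3 \
        + 2.0 * d_b2 * d_b3 * cov_b2_b3
    se_c = math.sqrt(var_c)
    return c, se_c, (c - z_crit * se_c, c + z_crit * se_c)

# Hypothetical estimates, chosen only to exercise the function.
point, se, (low, high) = crossover_delta_ci(3.0, -1.5, 0.04, 0.01, 0.0)
print(point, se, low, high)
```

If the whole interval lies inside the observed range of x, the interaction would be read as disordinal; if it lies outside, as ordinal. An interval that straddles the boundary of the range leaves the question open, which is the wide-interval problem the abstract describes.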
International Nuclear Information System (INIS)
We report two cases of spontaneous regression of multiple pulmonary metastases occurring after radiofrequency ablation (RFA) of a single lung metastasis. To the best of our knowledge, these are the first such cases reported. These two patients presented with lung metastases progressive despite treatment with interleukin-2, interferon, or sorafenib but were safely ablated with percutaneous RFA under computed tomography guidance. Percutaneous RFA allowed control of the targeted tumors for >1 year. Distant lung metastases presented an objective response despite the fact that they received no targeted local treatment. Local ablative techniques, such as RFA, induce the release of tumor-degradation product, which is probably responsible for an immunologic reaction that is able to produce a response in distant tumors.
Estimation of Saturation Percentage of Soil Using Multiple Regression, ANN, and ANFIS Techniques
Directory of Open Access Journals (Sweden)
Khaled Ahmad Aali
2009-07-01
The saturation percentage (SP) of soils is an important index in hydrological studies. In this paper, artificial neural networks (ANNs), multiple regression (MR), and adaptive neural-based fuzzy inference system (ANFIS) were used for estimation of saturation percentage of soils collected from the Boukan region in the northwestern part of Iran. Percent clay, silt, sand and organic carbon (OC) were used to develop the applied methods. In addition, the contribution of each input variable to the estimation of the SP index was assessed. Two performance functions, namely root mean square error (RMSE) and determination coefficient (R2), were used to evaluate the adequacy of the models. The ANFIS method was found to be superior to the other methods. It is therefore proposed that the ANFIS model can be used for reasonable estimation of SP values of soils.
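The two performance functions named in the abstract above can be sketched as follows. This is a minimal illustration that assumes the determination coefficient is defined as 1 - SS_res/SS_tot; the abstract does not say which definition the authors used, and the sample values are invented.

```python
import math

def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def r_squared(observed, predicted):
    """Determination coefficient, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Invented values, used only to show the calls.
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.5, 2.5, 2.5, 3.5]
print(rmse(obs, pred), r_squared(obs, pred))
```

A lower RMSE and a higher R2 on held-out samples would favour one model over another, which is the kind of comparison the abstract reports for ANFIS, ANN and MR.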
A multiple regression equation for calculated k_eff bias errors by criticality code system
International Nuclear Information System (INIS)
Some 500 cases of benchmark calculations on criticality problems for homogeneous experimental systems have been made with the KENO-IV Monte Carlo calculation code using the MGCL cross-section data library. The calculation results have been analyzed to classify the experimental systems so as to make the variance of calculated k_eff bias as small as possible in each classified system. The trends of bias are identified and shown to be optimally expressed by a multiple variable regression equation in terms of several variables, which adequately correlate with the bias value of k_eff calculated for the experiments. The uncertainty accompanying bias correction for calculated k_eff is clearly determined, and the margin set aside for the experimental error is assessed. Finally, the procedure to estimate nuclear criticality safety is proposed.
Regression analysis exploring teacher impact on student FCI post scores
Mahadeo, Jonathan V.; Manthey, Seth R.; Brewe, Eric
2013-01-01
High School Modeling Workshops are designed to improve high school physics teachers' understanding of physics and how to teach using the Modeling method. The basic assumption is that the teacher plays a critical role in their students' physics education. This study investigated teacher impacts on students' Force Concept Inventory (FCI) scores, with the hope of identifying quantitative differences between teachers. This study examined student FCI scores from 18 teachers with at least a year of teaching high school physics. These data were then evaluated using a General Linear Model (GLM), which allowed for a regression equation to be fitted to the data. This regression equation was used to predict student post FCI scores, based on: teacher ID, student pre FCI score, gender, and representation. The results show that 12 out of 18 teachers significantly impact their students' post FCI scores. The GLM further revealed that of the 12 teachers only five have a positive impact on student post FCI scores. Given these differences among teachers, it is our intention to extend our analysis to investigate pedagogical differences between them.
A Quantile Regression Analysis of Micro-lending's Poverty Impact
Node-Mapping EIT Method Based on Regression Analysis
Low-Cost Housing in Sabah, Malaysia: A Regression Analysis
Directory of Open Access Journals (Sweden)
International Nuclear Information System (INIS)
In environmental epidemiology, trace and toxic substance concentrations frequently have very highly skewed distributions ranging over one or more orders of magnitude, and prediction by conventional regression is often poor. Classification and Regression Tree Analysis (CART) is an alternative in such contexts. To compare the techniques, two Pennsylvania data sets and three independent variables are used: house radon progeny (RnD) and gamma levels as predicted by construction characteristics in 1330 houses; and ~200 house radon (Rn) measurements as predicted by topographic parameters. CART may identify structural variables of interest not identified by conventional regression, and vice versa, but in general the regression models are similar. CART has major advantages in dealing with other common characteristics of environmental data sets, such as missing values, continuous variables requiring transformations, and large sets of potential independent variables. CART is most useful in the identification and screening of independent variables, greatly reducing the need for cross-tabulations and nested breakdown analyses. There is no need to discard cases with missing values for the independent variables because surrogate variables are intrinsic to CART. The tree-structured approach is also independent of the scale on which the independent variables are measured, so that transformations are unnecessary. CART identifies important interactions as well as main effects. The major advantages of CART appear to be in exploring data. Once the important variables are identified, conventional regressions seem to lead to results similar but more interpretable by most audiences. 12 refs., 8 figs., 10 tabs
Regression Analysis of Variables Describing Poultry Meat Supply in European Countries
Use of generalized regression models for the analysis of stress-rupture data
International Nuclear Information System (INIS)
The design of components for operation in an elevated-temperature environment often requires a detailed consideration of the creep and creep-rupture properties of the construction materials involved. Techniques for the analysis and extrapolation of creep data have been widely discussed. The paper presents a generalized regression approach to the analysis of such data. This approach has been applied to multiple heat data sets for types 304 and 316 austenitic stainless steel, ferritic 2 1/4 Cr-1 Mo steel, and the high-nickel austenitic alloy 800H. Analyses of data for single heats of several materials are also presented. All results appear good. The techniques presented represent a simple yet flexible and powerful means for the analysis and extrapolation of creep and creep-rupture data.
The analysis of kernel ridge regression learning algorithm.
Framing an Nuclear Emergency Plan using Qualitative Regression Analysis
International Nuclear Information System (INIS)
Safety maintenance issues have arisen since the Fukushima disaster, and the literature on disaster scenario investigation and theory development is lacking. This study deals with the difficulty of initiating research in this setting, which relates to the content and problem setting of the phenomenon. The research design therefore follows an inductive approach, in which primary findings and written reports are interpreted and codified qualitatively. These data are classified inductively through thematic analysis in order to develop a conceptual framework related to several theoretical lenses. Moreover, framing the expected emergency plan as improvised business process models involves abundant abstraction and simplification of unstructured data. The structural methods of Qualitative Regression Analysis (QRA) and the Work System snapshot were applied to shape the data into the proposed model conceptualization using rigorous analyses. These methods were helpful in organising and summarizing the snapshot into an 'as-is' work system that is recommended as a 'to-be' work system for business process modelling. We conclude that these methods are useful for developing a comprehensive and structured research framework for future enhancement in business process simulation. (author)
Mixed-effects Poisson regression analysis of adverse event reports
Gibbons, Robert D.; Segawa, Eisuke; Karabatsos, George; Amatya, Anup K.; Bhaumik, Dulal K.; Brown, C. Hendricks; Kapur, Kush; Marcus, Sue M.; Hur, Kwan; Mann, J. John
2008-01-01
SUMMARY A new statistical methodology is developed for the analysis of spontaneous adverse event (AE) reports from post-marketing drug surveillance data. The method involves both empirical Bayes (EB) and fully Bayes estimation of rate multipliers for each drug within a class of drugs, for a particular AE, based on a mixed-effects Poisson regression model. Both parametric and semiparametric models for the random-effect distribution are examined. The method is applied to data from Food and Drug Administration (FDA)’s Adverse Event Reporting System (AERS) on the relationship between antidepressants and suicide. We obtain point estimates and 95 per cent confidence (posterior) intervals for the rate multiplier for each drug (e.g. antidepressants), which can be used to determine whether a particular drug has an increased risk of association with a particular AE (e.g. suicide). Confidence (posterior) intervals that do not include 1.0 provide evidence for either significant protective or harmful associations of the drug and the adverse effect. We also examine EB, parametric Bayes, and semiparametric Bayes estimators of the rate multipliers and associated confidence (posterior) intervals. Results of our analysis of the FDA AERS data revealed that newer antidepressants are associated with lower rates of suicide adverse event reports compared with older antidepressants. We recommend improvements to the existing AERS system, which are likely to improve its public health value as an early warning system. PMID:18404622
International Nuclear Information System (INIS)
Several MRI features of supratentorial astrocytomas are associated with high histologic grade by statistically significant p values. We sought to apply this information prospectively to a group of astrocytomas in the prediction of tumor grade. We used 10 MRI features of fibrillary astrocytomas from 52 patient studies to develop neural network and multiple linear regression models for practical use in predicting tumor grade. The models were tested prospectively on MR images from 29 patient studies. The performance of the models was compared against that of a radiologist. Neural network accuracy was 61 % in distinguishing between low and high grade tumors. Multiple linear regression achieved an accuracy of 59 %. Assessment of the images by a radiologist yielded 57 % accuracy. We conclude that while certain MRI parameters may be statistically related to astrocytoma histologic grade, neural network and linear regression models cannot reliably use them to predict tumor grade. (orig.)
Vennebusch, Markus; Nothnagel, Axel; Kutterer, Hansjörg
Le, Huy; Marcus, Justin
2012-01-01
This study used Monte Carlo simulation to examine the properties of the overall odds ratio (OOR), which was recently introduced as an index for overall effect size in multiple logistic regression. It was found that the OOR was relatively independent of study base rate and performed better than most commonly used R-square analogs in indexing model…
International Nuclear Information System (INIS)
Risk associated with power generation must be identified to make intelligent choices between alternative power technologies. Radionuclide air stack emissions for a single coal plant and a single nuclear plant are used to compute the single plant leukemia incidence risk and total industry leukemia incidence risk. Leukemia incidence is the response variable as a function of radionuclide bone dose for the six proposed dose response curves considered. During normal operation a coal plant has higher radionuclide emissions than a nuclear plant and the coal industry has a higher leukemia incidence risk than the nuclear industry, unless a nuclear accident occurs. Variation of nuclear accident size allows quantification of the impact of accidents on the total industry leukemia incidence risk comparison. The leukemia incidence risk is quantified as the number of accidents of a given size for the nuclear industry leukemia incidence risk to equal the coal industry leukemia incidence risk. The general linear model is used to develop equations that relate the accident frequency required for equal industry risks to the magnitude of the nuclear emission. Exploratory data analysis revealed that the relationship between the natural log of accident number and the natural log of accident size is linear. (Author)
Adaptive regression analysis: theory and applications in econometrics
Directory of Open Access Journals (Sweden)
Supply and Demand of Jeneberang River Aggregate Using Multiple Regression Model
Directory of Open Access Journals (Sweden)
Aryanti Virtanti Anas
2013-07-01
Aggregate plays an important role in developing infrastructure because it is the major raw material used in the construction of roads, hospitals, schools, factories, homes and other buildings. Sand and gravel are essential sources of aggregate and are often exploited from the active channels of river systems. The Jeneberang River is one of the main rivers in South Sulawesi Province; it is located in Gowa Regency and is mined to fulfill the aggregate demand of Gowa Regency and Makassar City. Supply and demand are economic occurrences that are affected by several factors, so this research aims to (1) determine the factors influencing aggregate supply and demand, and (2) develop a supply and demand model. Data were obtained from the Central Bureau of Statistics of Gowa Regency and Makassar City, and the Department of Mines and Energy, Gowa Regency, for eleven years (2001 – 2011). In this research, aggregate supply and demand were modeled using the multiple regression method. First, the relationship between supply and its influencing factors was established, followed by demand and its factors. Second, the supply and demand model was established using SPSS. The results showed that the model can be used to estimate the supply and demand of aggregate accurately using the established relationship among the influencing factors. Supply of aggregate was affected by several factors including price, number of trucks, number of mining companies and mining permit area, while price, GDP, income per capita, length of road, number of buildings and economic growth had a high influence on the demand rate.
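The study above fitted its models in SPSS. As a language-neutral illustration of what a multiple regression fit does, the sketch below solves the ordinary least squares normal equations with the Python standard library. The function name and the two-predictor data set are hypothetical and do not reproduce the Jeneberang data or the authors' procedure.

```python
def fit_ols(predictors, response):
    """Least-squares coefficients [intercept, b1, ..., bp].

    predictors: list of rows, one row of predictor values per observation
    response  : list of observed outcomes, same length as predictors
    """
    rows = [[1.0] + [float(v) for v in row] for row in predictors]
    k = len(rows[0])
    # Augmented normal equations: (A^T A | A^T y).
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * y for r, y in zip(rows, response))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] for i in range(k)]

# Hypothetical data generated from y = 10 + 2*x1 - 3*x2 (no noise).
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6]]
y = [10 + 2 * x1 - 3 * x2 for x1, x2 in X]
print(fit_ols(X, y))
```

With eleven annual observations and four to six predictors, as in the abstract, a fit of this kind leaves few degrees of freedom, so the estimated coefficients would be expected to carry wide uncertainty.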
Dental malocclusion and body posture in young subjects: A multiple regression study
Scientific Electronic Library Online (English)
Giuseppe, Perinetti; Luca, Contardo; Armando, Silvestrini-Biavati; Lucia, Perdoni; Attilio, Castaldo.
OBJECTIVES: Controversial results have been reported on potential correlations between the stomatognathic system and body posture. We investigated whether malocclusal traits correlate with body posture alterations in young subjects to determine possible clinical applications. METHODS: A total of 122 subjects, including 86 males and 36 females (age range of 10.8-16.3 years), were enrolled. All subjects tested negative for temporomandibular disorders or other conditions affecting the stomatognathic systems, except malocclusion. A dental occlusion assessment included phase of dentition, molar class, overjet, overbite, anterior and posterior crossbite, scissorbite, mandibular crowding and dental midline deviation. In addition, body posture was recorded through static posturography using a vertical force platform. Recordings were performed under two conditions, namely, i) mandibular rest position (RP) and ii) dental intercuspidal position (ICP). Posturographic parameters included the projected sway area and velocity and the antero-posterior and right-left load differences. Multiple regression models were run for both recording conditions to evaluate associations between each malocclusal trait and posturographic parameters. RESULTS: All of the posturographic parameters had large variability and were very similar between the two recording conditions. Moreover, a limited number of weakly significant correlations were observed, mainly for overbite and dentition phase, when using multivariate models. CONCLUSION: Our current findings, particularly with regard to the use of posturography as a diagnostic aid for subjects affected by dental malocclusion, do not support the existence of clinically relevant correlations between malocclusal traits and body posture.
Comparison of neural network and multiple linear regression as dissolution predictors.
Varying-coefficient functional linear regression
Wu, Yichao; Fan, Jianqing; Müller, Hans-Georg
2011-01-01
Functional linear regression analysis aims to model regression relations which include a functional predictor. The analog of the regression parameter vector or matrix in conventional multivariate or multiple-response linear regression models is a regression parameter function in one or two arguments. If, in addition, one has scalar predictors, as is often the case in applications to longitudinal studies, the question arises how to incorporate these into a functional regressi...
Scientific Electronic Library Online (English)
Kosuke, Kawai; Donna, Spiegelman; Anuraj H, Shankar; Wafaie W, Fawzi.
2011-06-01
OBJECTIVE: To systematically review randomized controlled trials comparing the effect of supplementation with multiple micronutrients versus iron and folic acid on pregnancy outcomes in developing countries. METHODS: MEDLINE and EMBASE were searched. Outcomes of interest were birth weight, low birth weight, small size for gestational age, perinatal mortality and neonatal mortality. Pooled relative risks (RRs) were estimated by random effects models. Sources of heterogeneity were explored through subgroup meta-analyses and meta-regression. FINDINGS: Multiple micronutrient supplementation was more effective than iron and folic acid supplementation at reducing the risk of low birth weight (RR: 0.86; 95% confidence interval, CI: 0.79-0.93) and of small size for gestational age (RR: 0.85; 95% CI: 0.78-0.93). Micronutrient supplementation had no overall effect on perinatal mortality (RR: 1.05; 95% CI: 0.90-1.22), although substantial heterogeneity was evident (I²=58%; P for heterogeneity=0.008). Subgroup and meta-regression analyses suggested that micronutrient supplementation was associated with a lower risk of perinatal mortality in trials in which >50% of mothers had formal education (RR: 0.93; 95% CI: 0.82-1.06) or in which supplementation was initiated after a mean of 20 weeks of gestation (RR: 0.88; 95% CI: 0.80-0.97). CONCLUSION: Maternal education or gestational age at initiation of supplementation may have contributed to the observed heterogeneous effects on perinatal mortality. The safety, efficacy and effective delivery of maternal micronutrient supplementation require further research.
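The abstract above reports pooled relative risks from random effects models together with an I² heterogeneity statistic. The sketch below shows one common way such figures are obtained, the DerSimonian-Laird estimator. The abstract does not name the estimator the authors used, so this choice, the function name, and the trial values in the usage lines are assumptions.

```python
import math

def pool_relative_risks(rrs, ci_lows, ci_highs, z_crit=1.96):
    """DerSimonian-Laird random-effects pooling of per-trial relative risks."""
    y = [math.log(rr) for rr in rrs]
    # Standard error of log(RR), recovered from the width of each 95% CI.
    se = [(math.log(hi) - math.log(lo)) / (2 * z_crit)
          for lo, hi in zip(ci_lows, ci_highs)]
    w = [1.0 / s ** 2 for s in se]
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    # Cochran's Q and the between-trial variance tau^2.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1.0 / (s ** 2 + tau2) for s in se]
    pooled = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {
        "rr": math.exp(pooled),
        "ci": (math.exp(pooled - z_crit * se_pooled),
               math.exp(pooled + z_crit * se_pooled)),
        "i2_percent": i2,
    }

# Two hypothetical trials that disagree strongly, so I^2 is high.
print(pool_relative_risks([0.5, 1.5], [0.4, 1.2], [0.625, 1.875]))
```

A pooled interval that excludes 1.0 corresponds to findings such as the low birth weight result, while a high I² flags the kind of heterogeneity that the authors explored through subgroup analysis and meta-regression.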
A simplified procedure of linear regression in a preliminary analysis
Directory of Open Access Journals (Sweden)
Silvia Facchinetti
2013-05-01
The analysis of a large statistical data-set can be led by the study of a particularly interesting variable Y (the regressed variable) and an explicative variable X, chosen among the remaining variables, conjointly observed. The study gives a simplified procedure to obtain the functional link of the variables y=y(x) by a partition of the data-set into m subsets, in which the observations are synthesized by location indices (mean or median) of X and Y. Polynomial models for y(x) of order r are considered to verify the characteristics of the given procedure; in particular we assume r = 1 and 2. The distributions of the parameter estimators are obtained by simulation, when the fitting is done for m = r + 1. Comparisons of the results, in terms of distribution and efficiency, are made with the results obtained by the ordinary least squares method. The study also gives some considerations on the consistency of the estimated parameters obtained by the given procedure.
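For the linear case (r = 1, m = 2) the procedure described above can be sketched as follows. The abstract does not state how the data-set is partitioned, so splitting the observations into a lower and an upper half after sorting by X is an assumption, as are the function name and the use of the mean as the location index.

```python
def two_group_line(x, y):
    """Fit y = a + b*x through the means of two subsets (r = 1, m = 2).

    Assumption: observations are sorted by x and split into a lower
    and an upper half; each half is summarized by its mean point.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    pairs = sorted(zip(x, y))
    half = len(pairs) // 2
    lower, upper = pairs[:half], pairs[half:]

    def mean_point(group):
        n = len(group)
        return sum(p[0] for p in group) / n, sum(p[1] for p in group) / n

    (x1, y1), (x2, y2) = mean_point(lower), mean_point(upper)
    if x1 == x2:
        raise ValueError("group means of x coincide; slope is undefined")
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return intercept, slope

# Noise-free data on the line y = 1 + 2x.
xs = list(range(10))
ys = [1 + 2 * v for v in xs]
print(two_group_line(xs, ys))
```

Because the line is forced through exactly m = r + 1 summary points, no minimization is needed. The abstract's simulation compares the distribution and efficiency of such estimators with ordinary least squares.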
A novel multiobjective evolutionary algorithm based on regression analysis.
International Nuclear Information System (INIS)
The gamma/beta TLD badge used by OPPD consists of two TLD-700 chips (Harshaw G7 card), one of which (chip #2) is shielded by a 0.102 cm-thick aluminum filter, and the other (chip #1) is unshielded, as shown in Fig. 1. Standard procedure had been to determine the beta dose to the badge by subtracting the response of chip #2 from that of chip #1 and then dividing by a calibrated beta-sensitivity factor; the gamma dose was taken to be the response of chip #2 divided by the chip's gamma-sensitivity factor followed by the subtraction of the background dose. A problem with this procedure is penetration of energetic beta particles through the aluminum filter on chip #2, which causes an over-response. Due to the technique used to obtain the beta dose, this also results in an under-estimate of the beta dose. This problem has been corrected through application of multiple linear regression analysis on a large data base of pure gamma (137Cs), pure beta (90Sr), and mixed exposures. The outcome of the analysis is an algorithm that automatically corrects for penetration effects. Performance tests using the ANSI N13.11 standard are presented to show the improvement.
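The earlier standard procedure described in the abstract above reduces to two lines of arithmetic, sketched here. The function name, the argument names, and all numeric values are hypothetical. The regression-based correction algorithm itself is not given in the abstract and is not reproduced.

```python
def legacy_badge_doses(chip1, chip2, beta_sensitivity, gamma_sensitivity,
                       background):
    """Beta and gamma dose under the earlier two-chip procedure.

    chip1: response of the unshielded chip
    chip2: response of the chip behind the aluminum filter
    """
    beta_dose = (chip1 - chip2) / beta_sensitivity
    gamma_dose = chip2 / gamma_sensitivity - background
    return beta_dose, gamma_dose

# Hypothetical readings, with and without beta penetration of the filter.
ideal = legacy_badge_doses(150.0, 100.0, 0.5, 1.0, 10.0)
leaky = legacy_badge_doses(150.0, 110.0, 0.5, 1.0, 10.0)
print(ideal, leaky)
```

In the second call the extra response on the shielded chip raises the computed gamma dose and lowers the computed beta dose, which is the over-response and under-estimate pair that the abstract attributes to beta penetration.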
Multiple Regression Analysis Using ANCOVA in University Model
Linear Maximum Likelihood Regression Analysis for Untransformed Log-Normally Distributed Data
Sara M. Gustavsson; Sandra Johannesson; Gerd Sallsten; Andersson, Eva M
2012-01-01
Medical research data are often skewed and heteroscedastic. It has therefore become practice to log-transform data in regression analysis, in order to stabilize the variance. Regression analysis on log-transformed data estimates the relative effect, whereas it is often the absolute effect of a predictor that is of interest. We propose a maximum likelihood (ML)-based approach to estimate a linear regression model on log-normal, heteroscedastic data. The new method was evaluated with a large si...
Irrechukwu, Onyi N.; Reiter, David A.; Lin, Ping-Chang; Roque, Remigio A.; Fishbein, Kenneth W.
2012-01-01
Increased sensitivity in the characterization of cartilage matrix status by magnetic resonance (MR) imaging, through the identification of surrogate markers for tissue quality, would be of great use in the noninvasive evaluation of engineered cartilage. Recent advances in MR evaluation of cartilage include multiexponential and multiparametric analysis, which we now extend to engineered cartilage. We studied constructs which developed from chondrocytes seeded in collagen hydrogels. MR measurements of transverse relaxation times were performed on samples after 1, 2, 3, and 4 weeks of development. Corresponding biochemical measurements of sulfated glycosaminoglycan (sGAG) were also performed. sGAG per wet weight increased from 7.74±1.34 µg/mg in week 1 to 21.06±4.14 µg/mg in week 4. Using multiexponential T2 analysis, we detected at least three distinct water compartments, with T2 values and weight fractions of (45 ms, 3%), (200 ms, 4%), and (500 ms, 97%), respectively. These values are consistent with known properties of engineered cartilage and previous studies of native cartilage. Correlations between sGAG and MR measurements were examined using conventional univariate analysis with T2 data from monoexponential fits with individual multiexponential compartment fractions and sums of these fractions, through multiple linear regression based on linear combinations of fractions, and, finally, with multivariate analysis using the support vector regression (SVR) formalism. The phenomenological relationship between T2 from monoexponential fitting and sGAG exhibited a correlation coefficient of r² = 0.56, comparable to the more physically motivated correlations between individual fractions or sums of fractions and sGAG; the correlation based on the sum of the two proteoglycan-associated fractions was r² = 0.58. Correlations between measured sGAG and those calculated using standard linear regression were more modest, with r² in the range 0.43–0.54. However, correlations using SVR exhibited r² values in the range 0.68–0.93. These results indicate that the SVR-based multivariate approach was able to determine tissue sGAG with substantially higher accuracy than conventional monoexponential T2 measurements or conventional regression modeling based on water fractions. This combined technique, in which the results of multiexponential analysis are examined with multivariate statistical techniques, holds the potential to greatly improve the accuracy of cartilage matrix characterization in engineered constructs using noninvasive MR data. PMID:22166112
Stone, Wesley W.; Crawford, Charles G.; Gilliom, Robert J.
2013-01-01
Watershed Regressions for Pesticides for multiple pesticides (WARP-MP) are statistical models developed to predict concentration statistics for a wide range of pesticides in unmonitored streams. The WARP-MP models use the national atrazine WARP models in conjunction with an adjustment factor for each additional pesticide. The WARP-MP models perform best for pesticides with application timing and methods similar to those used with atrazine. For other pesticides, WARP-MP models tend to overpredict concentration statistics for the model development sites. For WARP and WARP-MP, the less-than-ideal sampling frequency for the model development sites leads to underestimation of the shorter-duration concentration; hence, the WARP models tend to underpredict 4- and 21-d maximum moving-average concentrations, with median errors ranging from 9 to 38%. As a result of this sampling bias, pesticides that performed well with the model development sites are expected to have predictions that are biased low for these shorter-duration concentration statistics. The overprediction by WARP-MP apparent for some of the pesticides is variably offset by underestimation of the model development concentration statistics. Of the 112 pesticides used in the WARP-MP application to stream segments nationwide, 25 were predicted to have concentration statistics with a 50% or greater probability of exceeding one or more aquatic life benchmarks in one or more stream segments. Geographically, many of the modeled streams in the Corn Belt Region were predicted to have one or more pesticides that exceeded an aquatic life benchmark during 2009, indicating the potential vulnerability of streams in this region.
Morrison, D G; Humes, P E; Keith, N K; Godke, R A
1985-03-01
Data from 131 calvings of Chianina crossbred cows (2 to 5 yr old) bred to Chianina bulls were used to compare stepwise multiple regression analysis (RA) and stepwise, two-group discriminant analysis (DA) for predicting dystocia. Variables (21) studied in relation to dystocia included both prebreeding and precalving cow and calf effects. Calving was categorized as either unassisted or assisted without regard to the severity of dystocia. During this study, 30 (22.9%) assisted births occurred. All variables were standardized to a mean of zero and a variance of one before statistical analyses. Models were developed based on precalving variables and with both precalving and postcalving variables with both RA and DA. Average discriminant scores (centroids) were different (P < .01) between assisted and unassisted cows. Significant precalving DA variables were cow age and precalving pelvic height. This model correctly predicted 26 of 30 (86.7%) of the occurrences of dystocia. Significant precalving RA variables were prebreeding pelvic width and precalving pelvic height. The amount of variation accounted for by these two factors was 31.5%. Calf birth weight, calf chest depth, calf height, precalving pelvic area, cow age and precalving cow weight were selected by DA for use in the combined precalving and postcalving prediction model. Calf birth weight was 58% more important than either pelvic size or cow age. Percentage correctly classified with this model was 87.4. Significant postcalving variables selected by RA in order of importance were prebreeding pelvic width, calf birth weight and calf shoulder width (R² = .399). (ABSTRACT TRUNCATED AT 250 WORDS) PMID:3988637
Regression analysis of technical parameters affecting nuclear power plant performances
International Nuclear Information System (INIS)
Since the 80's many studies have been conducted in order to explain good and bad performances of commercial nuclear power plants (NPPs), but no defined correlation has yet been found to be totally representative of plant operational experience. In early works, data availability and the number of operating power stations were both limited; therefore, results showed that specific technical characteristics of NPPs were supposed to be the main causal factors for successful plant operation. Although these aspects continue to play a significant role, later studies and observations showed that other factors concerning management and organization of the plant could instead be predominant when comparing utilities' operational and economic results. Utility quality, in a word, can be used to summarize all the managerial and operational aspects that seem to be effective in determining plant performance. In this paper operational data of a consistent sample of commercial nuclear power stations, out of the total 433 operating NPPs, are analyzed, mainly focusing on the last decade of operational experience. The sample consists of PWR and BWR technology, operated by utilities located in different countries, including the U.S., Japan, France, Germany, and Finland. Multivariate regression is performed using Unit Capability Factor (UCF) as the dependent variable; this factor reflects the effectiveness of plant programs and practices in maximizing the available electrical generation and consequently provides an overall indication of how well plants are operated and maintained. Aspects that may not be real causal factors but which can have a consistent impact on the UCF, such as technology design, supplier, size and age, are included in the analysis as independent variables. (authors)
Multiple logistic regression model of signalling practices of drivers on urban highways
Buffalos milk yield analysis using random regression models
Directory of Open Access Journals (Sweden)
A.S. Schierholt
2010-02-01
Full Text Available Data comprising 1,719 milk yield records from 357 females (predominantly Murrah breed), daughters of 110 sires, with births from 1974 to 2004, obtained from the Programa de Melhoramento Genético de Bubalinos (PROMEBUL) and from records of the EMBRAPA Amazônia Oriental - EAO herd, located in Belém, Pará, Brazil, were used to compare random regression models for estimating variance components and predicting breeding values of the sires. The data were analyzed by different models using Legendre polynomial functions from second to fourth orders. The random regression models included the effects of herd-year, month of parity date of the control; regression coefficients for age of females (in order to describe the fixed part of the lactation curve) and random regression coefficients related to the direct genetic and permanent environment effects. The comparisons among the models were based on the Akaike Information Criterion. The random effects regression model using third order Legendre polynomials with four classes of the environmental effect was the one that best described the additive genetic variation in milk yield. The heritability estimates varied from 0.08 to 0.40. The genetic correlation between milk yields at younger ages was close to unity, but at older ages it was low.
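The Legendre covariates that such random regression models are built on can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the covariate is called t generically, and the polynomials are left unnormalized (animal breeding software often uses a normalized variant, which is an assumption not checked here).

```python
def legendre_row(x, order):
    """Return [P_0(x), ..., P_order(x)] using Bonnet's recursion."""
    vals = [1.0, x]
    for k in range(1, order):
        # (k + 1) P_{k+1}(x) = (2k + 1) x P_k(x) - k P_{k-1}(x)
        vals.append(((2 * k + 1) * x * vals[k] - k * vals[k - 1]) / (k + 1))
    return vals[: order + 1]


def legendre_covariates(ts, order):
    """Rescale the covariate to [-1, 1] and evaluate the polynomials row by row."""
    lo, hi = min(ts), max(ts)
    return [legendre_row(-1.0 + 2.0 * (t - lo) / (hi - lo), order) for t in ts]
```

Each row of legendre_covariates(ts, 3) would then serve as the regressors for one record in a third order model; the same columns can be reused for the fixed curve and for the random genetic and permanent environment curves.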
Scientific Electronic Library Online (English)
H., Jang; E., Topal; Y., Kawamura.
2015-05-01
Full Text Available Unplanned dilution and ore loss directly influence not only the productivity of underground stopes, but also the profitability of the entire mining process. Stope dilution is a result of complex interactions between a number of factors, and cannot be predicted prior to mining. In this study, unplanned dilution and ore loss prediction models were established using multiple linear and nonlinear regression analysis (MLRA and MNRA), as well as an artificial neural network (ANN) method based on 1067 datasets with ten causative factors from three underground longhole stoping mines in Western Australia. Models were established for individual mines, as well as a general model that includes all of the mine data-sets. The correlation coefficient (R) was used to evaluate the methods, and the values for MLRA, MNRA, and ANN compared with the general model were 0.419, 0.438, and 0.719, respectively. Considering that the current unplanned dilution and ore loss prediction for the mines investigated yielded an R of 0.088, the ANN model results are noteworthy. The proposed ANN model can be used directly as a practical tool to predict unplanned dilution and ore loss in mines, which will not only enhance productivity, but will also be beneficial for stope planning and design.
INFLUENCE OF TOURISM SECTOR IN ALBANIAN GDP: ESTIMATION USING MULTIPLE REGRESSION METHOD
A Bayesian Quantile Regression Analysis of Potential Risk Factors for Violent Crimes in USA
Application of a multiple least-squares regression program to dual energy NaI-CsI(Tl) measurements
International Nuclear Information System (INIS)
In conjunction with the development of an optimum background subtraction routine, a multiple least-squares regression program for simultaneous utilization of both the NaI(Tl) and CsI(Tl) energy ranges of a dual anti-coincidence detection system was applied. To experimentally evaluate the program for whole body counting purposes, an Am-241 contaminated subject was measured in the whole body counter using the standard three phoswich detector array surrounding the head.
Regression Analysis with Block Missing Values and Variables Selection
Analysis on Train Stopping Accuracy based on Regression Algorithms
International Nuclear Information System (INIS)
A plot of lung-cancer rates versus radon exposures in 965 US counties, or in all US states, has a strong negative slope, b, in sharp contrast to the strong positive slope predicted by linear/no-threshold theory. The discrepancy between these slopes exceeds 20 standard deviations (SD). Including smoking frequency in the analysis substantially improves fits to a linear relationship but has little effect on the discrepancy in b, because correlations between smoking frequency and radon levels are quite weak. Including 17 socioeconomic variables (SEV) in multiple regression analysis reduces the discrepancy to 15 SD. Data were divided into segments by stratifying on each SEV in turn, and on geography, and on both simultaneously, giving over 300 data sets to be analyzed individually, but negative slopes predominated. The slope is negative whether one considers only the most urban counties or only the most rural; only the richest or only the poorest; only the richest in the South Atlantic region or only the poorest in that region, etc.; and for all the strata in between. Since this is an ecological study, the well-known problems with ecological studies were investigated and found not to be applicable here. The "ecological fallacy" was shown not to apply in testing a linear/no-threshold theory, and the vulnerability to confounding is greatly reduced when confounding factors are only weakly correlated with radon levels, as is generally the case here. All confounding factors known to correlate with radon and with lung cancer were investigated quantitatively and found to have little effect on the discrepancy.
Detrended fluctuation analysis as a regression framework: Estimating dependence at different scales
We propose a novel framework combining detrended fluctuation analysis with standard regression methodology. The method is built on detrended variances and covariances and it is designed to estimate regression parameters at different scales and under potential non-stationarity and power-law correlations. Selected examples from physics, finance and environmental sciences illustrate usefulness of the framework.
REGRESSION ANALYSIS OF PRODUCTIVITY USING MIXED EFFECT MODEL
Siana Halim
2007-01-01
Full Text Available Production plants of a company are located in several areas that spread across Middle and East Java. As the production process employs mostly manpower, we suspected that each location has different characteristics affecting the productivity. Thus, the production data may have a spatial and hierarchical structure. For fitting a linear regression using the ordinary techniques, we are required to make some assumptions about the nature of the residuals i.e. independent, identically and normally distributed. However, these assumptions were rarely fulfilled especially for data that have a spatial and hierarchical structure. We worked out the problem using mixed effect model. This paper discusses the model construction of productivity and several characteristics in the production line by taking location as a random effect. The simple model with high utility that satisfies the necessary regression assumptions was built using a free statistic software R version 2.6.1.
Regression analysis of censored data using pseudo-observations
Parner, Erik T.; Andersen, Per Kragh
2010-01-01
We draw upon a series of articles in which a method based on pseudovalues is proposed for direct regression modeling of the survival function, the restricted mean, and the cumulative incidence function in competing risks with right-censored data. The models, once the pseudovalues have been computed, can be fit using standard generalized estimating equation software. Here we present Stata procedures for computing these pseudo-observations. An example from a bone marrow transplantation study is used to illustrate the method.
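To make the idea of a pseudo-observation concrete, the following is a minimal sketch for the survival function at a fixed time t. It is written in Python for illustration and is not the Stata procedures the article presents; the function names are hypothetical. It uses the usual jackknife-style definition, n times the full-sample estimate minus (n - 1) times the leave-one-out estimate, with the Kaplan-Meier estimator as the base estimator.

```python
def km_survival(times, events, t):
    """Kaplan-Meier estimate of S(t); events[i] is 1 for an event, 0 for right-censoring."""
    surv = 1.0
    event_times = sorted({ti for ti, ei in zip(times, events) if ei == 1 and ti <= t})
    for u in event_times:
        at_risk = sum(1 for ti in times if ti >= u)
        deaths = sum(1 for ti, ei in zip(times, events) if ti == u and ei == 1)
        surv *= 1.0 - deaths / at_risk
    return surv


def pseudo_observations(times, events, t):
    """Pseudo-observation for subject i: n * S_hat(t) - (n - 1) * S_hat_without_i(t)."""
    n = len(times)
    full = km_survival(times, events, t)
    pseudo = []
    for i in range(n):
        loo_times = times[:i] + times[i + 1:]
        loo_events = events[:i] + events[i + 1:]
        pseudo.append(n * full - (n - 1) * km_survival(loo_times, loo_events, t))
    return pseudo
```

Without censoring, each pseudo-observation reduces to the indicator that the subject survived beyond t, which is a convenient sanity check. With censoring, the values are no longer restricted to 0 and 1, and they are what would be passed as the response to generalized estimating equation software.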
Are Fiscal Multipliers Regime-Dependent? A Meta Regression Analysis
Gechert, Sebastian; Rannenberg, Ansgar
2014-01-01
The study examines whether fiscal multiplier effects are systematically larger in downturns than in upturns. To this end, a meta-regression analysis is conducted that evaluates a novel data set of 98 empirical studies with over 1800 observations of multiplier effects and controls for the regime dependence of multipliers. It shows that spending-side multipliers are 0.6 to 0.8 points higher in downturns. In addition, ...
Model performance analysis and model validation in logistic regression
Rosa Arboretti Giancristofaro
2007-10-01
Full Text Available In this paper a new model validation procedure for a logistic regression model is presented. At first, we illustrate a brief review of different techniques of model validation. Next, we define a number of properties required for a model to be considered "good", and a number of quantitative performance measures. Lastly, we describe a methodology for the assessment of the performance of a given model by using an example taken from a management study.
Analysis of some methods for reduced rank Gaussian process regression
DEFF Research Database (Denmark)
Quinonero-Candela, J.; Rasmussen, Carl Edward
2005-01-01
While there is strong motivation for using Gaussian Processes (GPs) due to their excellent performance in regression and classification problems, their computational complexity makes them impractical when the size of the training set exceeds a few thousand cases. This has motivated the recent proliferation of a number of cost-effective approximations to GPs, both for classification and for regression. In this paper we analyze one popular approximation to GPs for regression: the reduced rank approximation. While generally GPs are equivalent to infinite linear models, we show that Reduced Rank Gaussian Processes (RRGPs) are equivalent to finite sparse linear models. We also introduce the concept of degenerate GPs and show that they correspond to inappropriate priors. We show how to modify the RRGP to prevent it from being degenerate at test time. Training RRGPs consists both in learning the covariance function hyperparameters and the support set. We propose a method for learning hyperparameters for a given support set. We also review the Sparse Greedy GP (SGGP) approximation (Smola and Bartlett, 2001), which is a way of learning the support set for given hyperparameters based on approximating the posterior. We propose an alternative method to the SGGP that has better generalization capabilities. Finally we make experiments to compare the different ways of training a RRGP. We provide some Matlab code for learning RRGPs.
BRGLM, Interactive Linear Regression Analysis by Least Square Fit
1 - Description of program or function: BRGLM is an interactive program written to fit general linear regression models by least squares and to provide a variety of statistical diagnostic information about the fit. Stepwise and all-subsets regression can be carried out also. There are facilities for interactive data management (e.g. setting missing value flags, data transformations) and tools for constructing design matrices for the more commonly-used models such as factorials, cubic Splines, and auto-regressions. 2 - Method of solution: The least squares computations are based on the orthogonal (QR) decomposition of the design matrix obtained using the modified Gram-Schmidt algorithm. 3 - Restrictions on the complexity of the problem: The current release of BRGLM allows maxima of 1000 observations, 99 variables, and 3000 words of main memory workspace. For a problem with N observations and P variables, the number of words of main memory storage required is MAX(N*(P+6), N*P+P*P+3*N, and 3*P*P+6*N). Any linear model may be fit although the in-memory workspace will have to be increased for larger problems
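The method of solution described above (least squares via a QR decomposition obtained with modified Gram-Schmidt) can be sketched in a few lines. This is an illustrative Python sketch, not BRGLM's own code; the function names are hypothetical, and the design matrix is assumed to have full column rank. The last function simply evaluates the workspace expression as it is quoted in the abstract.

```python
def mgs_qr(A):
    """Modified Gram-Schmidt: return Q as a list of orthonormal columns and upper-triangular R."""
    n, p = len(A), len(A[0])
    V = [[A[i][j] for i in range(n)] for j in range(p)]  # working copy, stored by column
    Q = []
    R = [[0.0] * p for _ in range(p)]
    for j in range(p):
        R[j][j] = sum(v * v for v in V[j]) ** 0.5
        q = [v / R[j][j] for v in V[j]]
        Q.append(q)
        # Immediately remove the new direction from every remaining column.
        for k in range(j + 1, p):
            R[j][k] = sum(qi * vi for qi, vi in zip(q, V[k]))
            V[k] = [vi - R[j][k] * qi for vi, qi in zip(V[k], q)]
    return Q, R


def least_squares(A, y):
    """Solve min ||A b - y|| by back-substitution on R b = Q^T y."""
    Q, R = mgs_qr(A)
    p = len(R)
    qty = [sum(qi * yi for qi, yi in zip(q, y)) for q in Q]
    beta = [0.0] * p
    for j in reversed(range(p)):
        beta[j] = (qty[j] - sum(R[j][k] * beta[k] for k in range(j + 1, p))) / R[j][j]
    return beta


def workspace_words(n, p):
    """Workspace expression quoted in the abstract for N observations and P variables."""
    return max(n * (p + 6), n * p + p * p + 3 * n, 3 * p * p + 6 * n)
```

For example, least_squares([[1, 0], [1, 1], [1, 2], [1, 3]], [1, 3, 5, 7]) recovers an intercept of 1 and a slope of 2 up to rounding. Working from the QR factors avoids forming the normal equations, which is the usual reason for preferring this route numerically.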
A New Approach in Regression Analysis for Modeling Adsorption Isotherms
Onjia, Antonije E.
2014-01-01
Numerous regression approaches to isotherm parameter estimation appear in the literature. Real insight into the proper modeling pattern can be achieved only by testing methods on a very large number of cases. Experimentally, this cannot be done in a reasonable time, so the Monte Carlo simulation method was applied. The objective of this paper is to introduce and compare numerical approaches that involve different levels of knowledge about the noise structure of the analytical method used for initial and equilibrium concentration determination. Six levels of homoscedastic noise and five types of heteroscedastic noise precision models were considered. Performance of the methods was statistically evaluated based on median percentage error and mean absolute relative error in parameter estimates. The present study showed a clear distinction between two cases. When equilibrium experiments are performed only once, for the homoscedastic case, the winning error function is ordinary least squares, while for the case of heteroscedastic noise the use of orthogonal distance regression or Marquardt's percent standard deviation is suggested. It was found that when experiments are repeated three times, the simple method of weighted least squares performed as well as the more complicated orthogonal distance regression method. PMID:24672394
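Two of the error functions named above can be written down compactly. The sketch below is illustrative only: the function names are hypothetical, and the form of Marquardt's percent standard deviation (MPSD) is the one commonly quoted in the adsorption literature, since the abstract does not state the exact definition the paper uses.

```python
import math


def sse(measured, predicted):
    """Ordinary least squares objective: sum of squared absolute errors."""
    return sum((m - c) ** 2 for m, c in zip(measured, predicted))


def mpsd(measured, predicted, n_params):
    """Marquardt's percent standard deviation: relative errors, scaled by degrees of freedom."""
    n = len(measured)
    rel = sum(((m - c) / m) ** 2 for m, c in zip(measured, predicted))
    return 100.0 * math.sqrt(rel / (n - n_params))
```

Because MPSD divides each residual by the measured value, it weights small and large uptakes more evenly than the plain sum of squares, which is consistent with it being suggested for heteroscedastic noise.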
Hardt Jochen
2012-12-01
Full Text Available Abstract Background Multiple imputation is becoming increasingly popular. Theoretical considerations as well as simulation studies have shown that the inclusion of auxiliary variables is generally of benefit. Methods A simulation study of a linear regression with a response Y and two predictors X1 and X2 was performed on data with n = 50, 100 and 200 using complete cases or multiple imputation with 0, 10, 20, 40 and 80 auxiliary variables. Mechanisms of missingness were either 100% MCAR or 50% MAR + 50% MCAR. Auxiliary variables had low (r=.10) vs. moderate (r=.50) correlations with X’s and Y. Results The inclusion of auxiliary variables can improve a multiple imputation model. However, inclusion of too many variables leads to downward bias of regression coefficients and decreases precision. When the correlations are low, inclusion of auxiliary variables is not useful. Conclusion More research on auxiliary variables in multiple imputation should be performed. A preliminary rule of thumb could be that the ratio of variables to cases with complete data should not go below 1 : 3.
Development of a User Interface for a Regression Analysis Software Tool
Ulbrich, Norbert Manfred; Volden, Thomas R.
2010-01-01
An easy-to-use user interface was implemented in a highly automated regression analysis tool. The user interface was developed from the start to run on computers that use the Windows, Macintosh, Linux, or UNIX operating system. Many user interface features were specifically designed such that a novice or inexperienced user can apply the regression analysis tool with confidence. Therefore, the user interface's design minimizes interactive input from the user. In addition, reasonable default combinations are assigned to those analysis settings that influence the outcome of the regression analysis. These default combinations will lead to a successful regression analysis result for most experimental data sets. The user interface comes in two versions. The text user interface version is used for the ongoing development of the regression analysis tool. The official release of the regression analysis tool, on the other hand, has a graphical user interface that is more efficient to use. This graphical user interface displays all input file names, output file names, and analysis settings for a specific software application mode on a single screen which makes it easier to generate reliable analysis results and to perform input parameter studies. An object-oriented approach was used for the development of the graphical user interface. This choice keeps future software maintenance costs to a reasonable limit. Examples of both the text user interface and graphical user interface are discussed in order to illustrate the user interface's overall design approach.
Schilling, K.E.; Wolter, C.F.
2005-01-01
Nineteen variables, including precipitation, soils and geology, land use, and basin morphologic characteristics, were evaluated to develop Iowa regression models to predict total streamflow (Q), base flow (Qb), storm flow (Qs) and base flow percentage (%Qb) in gauged and ungauged watersheds in the state. Discharge records from a set of 33 watersheds across the state for the 1980 to 2000 period were separated into Qb and Qs. Multiple linear regression found that 75.5 percent of long term average Q was explained by rainfall, sand content, and row crop percentage variables, whereas 88.5 percent of Qb was explained by these three variables plus permeability and floodplain area variables. Qs was explained by average rainfall and %Qb was a function of row crop percentage, permeability, and basin slope variables. Regional regression models developed for long term average Q and Qb were adapted to annual rainfall and showed good correlation between measured and predicted values. Combining the regression model for Q with an estimate of mean annual nitrate concentration, a map of potential nitrate loads in the state was produced. Results from this study have important implications for understanding geomorphic and land use controls on streamflow and base flow in Iowa watersheds and similar agriculture dominated watersheds in the glaciated Midwest. (JAWRA) (Copyright © 2005).
Point Estimates and Confidence Intervals for Variable Importance in Multiple Linear Regression
Thomas, D. Roland; Zhu, PengCheng; Decady, Yves J.
2007-01-01
The topic of variable importance in linear regression is reviewed, and a measure first justified theoretically by Pratt (1987) is examined in detail. Asymptotic variance estimates are used to construct individual and simultaneous confidence intervals for these importance measures. A simulation study of their coverage properties is reported, and an…
W., Sun; G. X., Meng; Q., Ye; H. L., Jin; J. Z., Zhang.
2012-04-01
Full Text Available Carrying out regression analysis for gas leakage of a pressure-relief valve (PRV) to obtain the accurate leakage flow and the changing trend of leakage is helpful in assessing the reliability of the PRV. Classic support vector regression (SVR) is an excellent regression model, and has been widely used in various fields. However, the standard SVR model performs regression using only the leakage data, without considering elements closely related to the leakage. In this paper a regression model based on support vector regression plus (SVR+) is put forward to perform leakage regression of PRV, in which particle swarm optimization (PSO) is used to select optimum parameters of SVR+, termed PSO_SVR+. The experimental results demonstrate that the proposed model, taking the difference between the inlet pressure and outlet pressure of the PRV as hidden information, can achieve better regression precision than SVR can provide. This article also investigates the effects of PSO and the Genetic Algorithm on the performance of the regression model (SVR+ or SVR).
Li, Yang
2010-01-01
Quantile regression has advantageous properties compared with OLS regression: full measurement of the effects of a covariate on the response, robustness, and equivariance. In this paper, I use survey data from Belgium and apply a linear model to demonstrate these advantageous properties of quantile regression. I also use a quantile regression model with the raw data to analyze the different costs to families with different numbers of children and apply a Wald test. The result shows that ...
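The robustness property mentioned above follows from the loss that quantile regression minimizes. As a minimal sketch (not taken from the paper; the function names are hypothetical and only an intercept-only model is fitted), the check or "pinball" loss and its minimizer over constants look like this:

```python
def pinball_loss(y, pred, tau):
    """Average check loss: tau * r for r >= 0, (tau - 1) * r for r < 0, with r = y - pred."""
    total = 0.0
    for yi in y:
        r = yi - pred
        total += tau * r if r >= 0 else (tau - 1) * r
    return total / len(y)


def best_constant(y, tau):
    """Intercept-only quantile fit: a minimizer is always attained at one of the data points."""
    return min(sorted(y), key=lambda c: pinball_loss(y, c, tau))
```

For y = [1, 2, 3, 4, 100], best_constant(y, 0.5) is the median 3, while the least squares fit of a constant would be the mean 22, pulled upward by the single outlier. A full quantile regression replaces the constant with a linear predictor and minimizes the same loss, typically by linear programming.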
Spontaneous Regression of a Large Hepatocellular Carcinoma with Multiple Lung Metastases
Saito, Tamiko; Naito, Masafumi; Matsumura, Yuki; Kita, Hisaaki; Kanno, Tomoyo; Nakada, Yuki; Hamano, Mina; Chiba, Miho; Maeda, Kosaku; Michida, Tomoki; Ito, Toshifumi
2014-01-01
A 75-year-old Japanese man with chronic hepatitis C was found to have a large liver tumor and multiple nodules in the bilateral lungs. We diagnosed the tumor as hepatocellular carcinoma (HCC) with multiple lung metastases based on imaging studies and high titers of HCC tumor markers. Remarkably, without any anticancer treatment or medication, including herbal preparations, the liver tumor decreased in size, and the tumor markers diminished. Moreover, after 1 year, the multiple nodules in the b...
Chang-zhi CHENG
2011-06-01
Full Text Available Objective To explore the risk factors for acute renal failure (ARF) as a complication of war injuries of the limbs. Methods The clinical data of 352 patients with limb injuries admitted to 303 Hospital of PLA from 1968 to 2002 were retrospectively analyzed. The patients were divided into an ARF group (n=9) and a non-ARF group (n=343) according to the occurrence of ARF, and a case-control study was carried out. Ten factors which might lead to death were analyzed by logistic regression to screen the risk factors for ARF, including causes of trauma, shock after injury, time of admission to hospital after injury, injured sites, combined trauma, number of surgical procedures, presence of foreign matters, features of fractures, amputation, and tourniquet time. Results Fifteen of the 352 patients died (4.3%); among them, 7 patients (46.7%) died of ARF, 3 (20.0%) of pulmonary embolism, 3 (20.0%) of gas gangrene, and 2 (13.3%) of multiple organ failure. Univariate analysis revealed that shock, time before admission to hospital, amputation, and tourniquet time were the risk factors for ARF in the wounded with limb injuries, while the logistic regression analysis showed that only amputation was a risk factor for ARF (P < 0.05). Conclusion ARF is the primary cause of death in the wounded with limb injury. Prompt and accurate treatment and optimal timing of amputation may help decrease the incidence and mortality of ARF in the wounded with severe limb injury and ischemic necrosis.
A comparison of multiple regression and neural network techniques for mapping in situ pCO2 data
International Nuclear Information System (INIS)
Using about 138,000 measurements of surface pCO2 in the Atlantic subpolar gyre (50-70 deg N, 60-10 deg W) during 1995-1997, we compare two methods of interpolation in space and time: a monthly distribution of surface pCO2 constructed using multiple linear regressions on position and temperature, and a self-organizing neural network approach. Both methods confirm characteristics of the region found in previous work, i.e. the subpolar gyre is a sink for atmospheric CO2 throughout the year, and exhibits a strong seasonal variability with the highest undersaturations occurring in spring and summer due to biological activity. As an annual average the surface pCO2 is higher than estimates based on available syntheses of surface pCO2. This supports earlier suggestions that the sink of CO2 in the Atlantic subpolar gyre has decreased over the last decade instead of increasing as previously assumed. The neural network is able to capture a more complex distribution than can be well represented by linear regressions, but both techniques agree relatively well on the average values of pCO2 and derived fluxes. However, when both techniques are used with a subset of the data, the neural network predicts the remaining data to a much better accuracy than the regressions, with a residual standard deviation ranging from 3 to 11 µatm. The subpolar gyre is a net sink of CO2 of 0.13 Gt-C/yr using the multiple linear regressions and 0.15 Gt-C/yr using the neural network, on average between 1995 and 1997. Both calculations were made with the NCEP monthly wind speeds converted to 10 m height and averaged between 1995 and 1997, and using the gas exchange coefficient of Wanninkhof.
Regression analysis of country effects using multilevel data: A cautionary tale
Bryan, Mark L.; Jenkins, Stephen P.
2013-01-01
Cross-national differences in outcomes are often analysed using regression analysis of multilevel country datasets, examples of which include the ECHP, ESS, EU-SILC, EVS, ISSP, and SHARE. We review the regression methods applicable to this data structure, pointing out problems with the assessment of country-level factors that appear not to be widely appreciated, and illustrate our arguments using Monte-Carlo simulations and analysis of women's employment probabilities and work hours using EU ...
Smith, Timothy D.; Steffen, Christopher J., Jr.; Yungster, Shaye; Keller, Dennis J.
1998-01-01
The all rocket mode of operation is shown to be a critical factor in the overall performance of a rocket based combined cycle (RBCC) vehicle. An axisymmetric RBCC engine was used to determine specific impulse efficiency values based upon both full flow and gas generator configurations. Design of experiments methodology was used to construct a test matrix and multiple linear regression analysis was used to build parametric models. The main parameters investigated in this study were: rocket chamber pressure, rocket exit area ratio, injected secondary flow, mixer-ejector inlet area, mixer-ejector area ratio, and mixer-ejector length-to-inlet diameter ratio. A perfect gas computational fluid dynamics analysis, using both the Spalart-Allmaras and k-omega turbulence models, was performed with the NPARC code to obtain values of vacuum specific impulse. Results from the multiple linear regression analysis showed that for both the full flow and gas generator configurations increasing mixer-ejector area ratio and rocket area ratio increase performance, while increasing mixer-ejector inlet area ratio and mixer-ejector length-to-diameter ratio decrease performance. Increasing injected secondary flow increased performance for the gas generator analysis, but was not statistically significant for the full flow analysis. Chamber pressure was found to be not statistically significant.
Toutkoushian, Robert K.
This paper proposes a five-step process by which to analyze whether the salary ratio between junior and senior college faculty exhibits salary compression, a term used to describe an unusually small differential between faculty with different levels of experience. The procedure utilizes commonly used statistical techniques (multiple regression…
Multiple factor analysis by example using R
Pagès, Jérôme
2014-01-01
Multiple factor analysis (MFA) enables users to analyze tables of individuals and variables in which the variables are structured into quantitative, qualitative, or mixed groups. Written by the co-developer of this methodology, Multiple Factor Analysis by Example Using R brings together the theoretical and methodological aspects of MFA. It also includes examples of applications and details of how to implement MFA using an R package (FactoMineR). The first two chapters cover the basic factorial analysis methods of principal component analysis (PCA) and multiple correspondence analysis (MCA). The
Economic growth and electricity consumption: Auto regressive distributed lag analysis
Scientific Electronic Library Online (English)
Melike E, Bildirici; Tahsin, Bakirtas; Fazil, Kayikci.
Full Text Available Knowledge of the direction of causality between electricity consumption and economic growth is of primary importance if appropriate energy policies and energy conservation measures are to be devised. This study estimates the causality relationship between electricity consumption and economic growth in per capita and aggregate levels. The study uses the price and income elasticities of total electricity demand and industrial demand by using the auto regressive distributed lag (ARDL) method for some developed and developing countries, including the US, UK, Canada, Japan, China, India, Brazil, Italy, France, Turkey and South Africa. There is evidence to support the growth hypothesis for the US, China, Canada and Brazil. There is evidence to support the conservation hypothesis for India, Turkey, South Africa, Japan, UK, France and Italy.
Angela Radünz Lazzari
2011-01-01
Full Text Available Air is an efficient means of dispersing atmospheric pollutants, and its behavior depends on the atmospheric movements that occur in the troposphere. In Porto Alegre, Rio Grande do Sul State, there is heavy daily traffic and a concentration of industries that may be responsible for atmospheric emissions. In the present work we studied the behavior of daily concentrations of particulate matter (PM10) in this city, considering the influence of meteorological variables. Data analysis was performed using descriptive statistics, linear correlation and multiple regression. Data were provided by the State Foundation of Environmental Protection Henrique Luiz Roessler - RS (FEPAM) and the National Institute of Meteorology (INMET). Based on the analysis it was possible to verify that: the concentrations of PM10, measured every day at 4:00 p.m., did not exceed national standards for air quality; the meteorological elements that influenced the concentrations of PM10 were the daily average wind speed and average daily radiation, with negative relations, and the daily average air temperature and the north and northwest wind directions, with positive relations. The wind directions that contribute significantly to lower concentrations at the measured sites are east and southeast.
Walters, Deborah K. W.; Linn, Richard T.; Kulas, Margaret; Cuddihy, Elisabeth; Wu, Chonghua; Granger, Carl V.
1999-01-01
A multitude of techniques exists for modeling medical outcomes. One problem for the researcher is how to select an appropriate modeling technique for a given task. This paper addresses the problem through an analysis of the strengths and weaknesses of three techniques and a case study in which the three techniques are applied to the task of predicting medical rehabilitation outcomes. The three techniques selected were linear regression analysis (LRA), classification and regression trees (...
Regression analysis for a bottom-up approach to analyzing semi-prompt fission gamma yields
International Nuclear Information System (INIS)
Highlights: • Fitting the semi-prompt non-resolved photon spectrum after fission. • Energy–time dependence can be factorized. • Physical model, statistical model, sampling procedure. • The best fit is: lognormal for energy and F for time. - Abstract: We present an empirical model that describes the yield of gamma rays emitted by fission in the time interval from 20 to 958 ns following a fission event. The analysis is based on experimental data from neutron-induced fission of 235U and 239Pu. The model is devised by first using regression analysis to identify likely patterns in the data and to choose plausible fitting functions. We provide statistical and physical arguments in support of time and energy independence. The intensity of the emitted gamma rays can be described as a bivariate distribution that is the product of independent variates for energy and time. We test several plausible distribution families for the energy and time variates and use maximum likelihood and minimum χ² to estimate distribution parameters. Because of the uncertainty in the experimental data, multiple combinations of variate pairs give rise to a surface that fits the observations plausibly well. The best fit turns out to be lognormal in energy and F in time. The findings illustrated in this paper can be used to simulate gamma ray de-excitation from fission in Monte Carlo codes.
International Nuclear Information System (INIS)
Highlights: • Thermodynamic models of simple and regenerative cycles are defined. • Exergy destruction rate of different components was determined. • Impact of important operating parameters on cycles' characteristics was determined. • Multiple polynomial regression models were developed. • Optimization for optimal operating parameters was performed. - Abstract: In this paper, thermo-environmental, economic and regression analyses of simple and regenerative gas turbine cycles are presented. First, thermodynamic models for both cycles are defined, the exergy destruction rate of different components is determined, and a parametric study is carried out to investigate the effects of compressor inlet temperature, turbine inlet temperature and compressor pressure ratio on the parameters that measure the cycles' performance, environmental impact and costs. Subsequently, multiple polynomial regression (MPR) models are developed to correlate important response variables with predictor variables, and finally optimization is performed for optimal operating conditions. The results of the parametric study show a significant impact of operating parameters on the performance parameters, environmental impact and costs. According to the exergy analysis, the combustion chamber and exhaust stack are the two major sites where the largest exergy destruction/losses occur. Also, the total exergy destruction in the regenerative cycle is relatively lower, resulting in a higher exergy efficiency of the cycle. The MPR models also appear to be good estimators of the response variables, as indicated by their very high R² values. Finally, these models are used to determine the optimal operating parameters, which maximize the cycles' performance and minimize CO2 emissions and costs.
Deng, Yangyang; Parajuli, Prem B.
2011-08-10
Evaluating the economic feasibility of a bio-gasification facility requires an understanding of its unit cost under different production capacities. The objective of this study was to evaluate the unit cost of syngas production at capacities from 60 through 1800 Nm³/h using an economic model with three regression analysis techniques (simple regression, reciprocal regression, and log-log regression). The preliminary result of this study showed that the reciprocal regression technique gave the best-fit curve between unit cost and production capacity, with a sum of error squares (SES) lower than 0.001 and a coefficient of determination (R²) of 0.996. The regression analysis techniques determined a minimum unit cost of syngas production for micro-scale bio-gasification facilities of $0.052/Nm³, at a capacity of 2,880 Nm³/h. The results of this study suggest that, to reduce cost, facilities should run at a high production capacity. In addition, this technique could contribute a new categorical criterion for evaluating micro-scale bio-gasification facilities from the perspective of economic analysis.
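The comparison of the three functional forms can be sketched as follows. This is a minimal illustration, not the authors' economic model: the capacity/cost pairs are hypothetical and were generated from an exactly reciprocal curve, so the reciprocal form wins here by construction. "Simple regression" is assumed to mean a straight line in capacity, and the helper names `fit_line` and `sse` are made up for this example.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def sse(ys, preds):
    """Sum of squared errors between observed and predicted values."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds))

# Hypothetical data (NOT from the study): unit cost = 0.05 + 3 / capacity
capacity = [60, 120, 300, 600, 1200, 1800]
cost = [0.05 + 3.0 / q for q in capacity]

# Simple regression (assumed form): cost = a + b * q
a_s, b_s = fit_line(capacity, cost)
sse_simple = sse(cost, [a_s + b_s * q for q in capacity])

# Reciprocal regression: cost = a + b / q
a_r, b_r = fit_line([1.0 / q for q in capacity], cost)
sse_recip = sse(cost, [a_r + b_r / q for q in capacity])

# Log-log regression: ln(cost) = a + b * ln(q), errors measured on the cost scale
a_l, b_l = fit_line([math.log(q) for q in capacity], [math.log(c) for c in cost])
sse_loglog = sse(cost, [math.exp(a_l) * q ** b_l for q in capacity])

print(sse_simple, sse_recip, sse_loglog)
```

In the reciprocal form the intercept is the cost floor approached at large capacity, which is one way to read the abstract's conclusion that facilities should run at high production capacity.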
Factors associated with methadone treatment duration: a Cox regression analysis.
Lin, Chao-Kuang; Hung, Chia-Chun; Peng, Ching-Yi; Chao, En; Lee, Tony Szu-Hsien
2015-01-01
This study examined retention rates and associated predictors of methadone maintenance treatment (MMT) duration among 128 newly admitted patients in Taiwan. A semi-structured questionnaire was used to obtain demographic and drug use history. Daily records of methadone taken and test results for HIV, HCV, and morphine toxicology were taken from a computerized medical registry. Cox regression analyses were performed to examine factors associated with MMT duration. MMT retention rates were 80.5%, 68.8%, 53.9%, and 41.4% for 3, 6, 12, and 18 months, respectively. Excluding 38 patients incarcerated during the study period, retention rates were 81.1%, 73.3%, 61.1%, and 48.9% for 3, 6, 12, and 18 months, respectively. No participant seroconverted to HIV and 1 died during the 18-month follow-up. Results showed that being female, imprisonment, a longer distance from house to clinic, having a lower methadone dose after 30 days, being HCV positive, and being in the New Taipei City program predicted early patient dropout. The findings suggest favorable MMT outcomes in terms of HIV seroincidence and mortality. Results indicate that the need to minimize travel distance and to provide programs that meet women's requirements justifies expansion of MMT clinics in Taiwan. PMID:25875531
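Retention rates at fixed follow-up times, like those reported above, can be estimated from treatment durations with censoring. The sketch below uses the Kaplan-Meier estimator, which is a swapped-in, simpler technique than the Cox regression used in the study and does not model covariates. The durations, the dropout flags and the function names are hypothetical.

```python
def kaplan_meier(durations, dropped_out):
    """Kaplan-Meier retention curve.

    durations:   months each patient was observed in treatment
    dropped_out: True if the patient left treatment at that time,
                 False if the observation was censored (still in treatment)
    Returns a list of (time, retention probability) at each dropout time.
    """
    data = sorted(zip(durations, dropped_out))
    at_risk = len(data)
    retained = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = 0
        removed = 0
        # Group all patients whose observation ends at time t
        while i < len(data) and data[i][0] == t:
            events += 1 if data[i][1] else 0
            removed += 1
            i += 1
        if events:
            retained *= 1 - events / at_risk
            curve.append((t, retained))
        at_risk -= removed
    return curve

def retention_at(curve, month):
    """Retention probability at `month` (step function, 1.0 before the first dropout)."""
    prob = 1.0
    for t, s in curve:
        if t <= month:
            prob = s
    return prob

# Hypothetical follow-up data for ten patients (NOT the study's data)
durations = [2, 4, 5, 7, 9, 12, 14, 18, 18, 18]
dropped = [True, True, False, True, True, False, True, False, False, False]

curve = kaplan_meier(durations, dropped)
for month in (3, 6, 12, 18):
    print(month, round(retention_at(curve, month), 3))
```

Censored patients leave the risk set without lowering the curve, which is why excluding or including a subgroup (for example, incarcerated patients) can change the reported rates.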
Additive Intensity Regression Models in Corporate Default Analysis
Lando, David; Medhat, Mamdouh
2013-01-01
We consider additive intensity (Aalen) models as an alternative to the multiplicative intensity (Cox) models for analyzing the default risk of a sample of rated, nonfinancial U.S. firms. The setting allows for estimating and testing the significance of time-varying effects. We use a variety of model checking techniques to identify misspecifications. In our final model, we find evidence of time-variation in the effects of distance-to-default and short-to-long term debt. Also we identify interactions between distance-to-default and other covariates, and the quick ratio covariate is significant. None of our macroeconomic covariates are significant.
Jalilvand, H.
2008-01-01
This study was carried out in the Forest Park of Noor. To determine the tree-ring response to climatic variations, 35 cores were taken from a dominant natural stand of common ash (Fraxinus excelsior L.). The aim of the study was to use multiple regression models to find which climatic variables affect the ring-width growth of ash in the current growing year and in previous years (one, two and three years before the current growing year) in the north of Iran. In total, 85 annual, monthly and growing-season climatic variables of precipitation, temperature, heat index, evapotranspiration and water balance were analyzed. The best multiple regression models explained 83 percent of the total variance in the growth of common ash. The results show that the growth of common ash was related more to the previous years' climatic variations than to those of the current year. The most effective role of climatic variations was due to the first and second preceding years (55%). Evapotranspiration in July and September and precipitation in May of the second previous year, and precipitation in March of the third previous year, all positively affected the growth of this species. The study revealed that ash favors warmer conditions in the early and middle parts of the growing season when moisture is available, and precipitation in the months of the early growing season (Ordibehesht-Khordad) of the two previous years.
Multiple correspondence analysis and related methods
Greenacre, Michael
2006-01-01
CORRESPONDENCE ANALYSIS AND RELATED METHODS IN PRACTICE, Jörg Blasius and Michael Greenacre: A simple example; Basic method; Concepts of correspondence analysis; Stacked tables; Multiple correspondence analysis; Categorical principal components analysis; Active and supplementary variables; Multiway data; Content of the book. FROM SIMPLE TO MULTIPLE CORRESPONDENCE ANALYSIS, Michael Greenacre: Canonical correlation analysis; Geometric approach; Supplementary points; Discussion and conclusions. DIVIDED BY A COMMON LANGUAGE: ANALYZING AND VISUALIZING TWO-WAY ARRAYS, John C. Gower: Introduction: two-way tables and data matri
The primary treatment goal of radiotherapy for paragangliomas of the head and neck region (HNPGLs) is local control of the tumor, i.e. stabilization of tumor volume. Interestingly, regression of tumor volume has also been reported. To date, no meta-analysis has been performed giving an overview of regression rates after radiotherapy in HNPGLs. The main objective was to perform a systematic review and meta-analysis to assess regression of tumor volume in HNPGL patients after radiotherapy. A second outcome was local tumor control. The study design is a systematic review and meta-analysis. PubMed, EMBASE, Web of Science, COCHRANE and Academic Search Premier and references of key articles were searched in March 2012 to identify potentially relevant studies. Considering the indolent course of HNPGLs, only studies with ≥12 months of follow-up were eligible. Main outcomes were the pooled proportions of regression and local control after radiotherapy as initial, combined (i.e. directly post-operatively or post-embolization) or salvage treatment (i.e. after initial treatment has failed) for HNPGLs. A meta-analysis was performed with an exact likelihood approach using a logistic regression with a random effect at the study level. Pooled proportions with 95% confidence intervals (CI) were reported. Fifteen studies were included, concerning a total of 283 jugulotympanic HNPGLs in 276 patients. Pooled regression proportions for initial, combined and salvage treatment were respectively 21%, 33% and 52% in radiosurgery studies and 4%, 0% and 64% in external beam radiotherapy studies. Pooled local control proportions for radiotherapy as initial, combined and salvage treatment ranged from 79% to 100%. Radiotherapy for jugulotympanic paragangliomas results in excellent local tumor control and therefore is a valuable treatment for these types of tumors. The effects of radiotherapy on regression of tumor volume remain ambiguous, although the data suggest that regression can be achieved at least in some patients. More research is needed to identify predictors for treatment success.
Quantile regression for the statistical analysis of immunological data with many non-detects
Eilers, Paul HC; Röder, Esther; Savelkoul, Huub FJ; van Wijk, Roy
2012-01-01
Background: Immunological parameters are hard to measure. A well-known problem is the occurrence of values below the detection limit, the non-detects. Non-detects are a nuisance, because classical statistical analyses, like ANOVA and regression, cannot be applied. The more advanced statistical techniques currently available for the analysis of datasets with non-detects can only be used if a small percentage of the data are non-detects. Methods and results: Quantile regression, a genera...
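One general reason quantiles cope well with non-detects (a standard property of order statistics, not a detail taken from the truncated abstract) is that a quantile lying above the non-detect fraction does not depend on what value is substituted for the non-detects, whereas the mean does. The sketch below illustrates this with hypothetical measurements; `None` is used here as an assumed marker for a non-detect.

```python
import statistics

detection_limit = 1.0
# Hypothetical measurements; None marks a value below the detection limit
measurements = [None, None, 1.4, 2.1, 2.8, 3.5, 5.0]

def fill_nondetects(values, limit, fill):
    """Replace every non-detect by `fill`, which must lie below `limit`."""
    if fill >= limit:
        raise ValueError("fill value must be below the detection limit")
    return [fill if v is None else v for v in values]

low = fill_nondetects(measurements, detection_limit, 0.0)
high = fill_nondetects(measurements, detection_limit, 0.99)

# The median sits above the non-detect fraction (2 of 7), so it is unchanged
median_a = statistics.median(low)
median_b = statistics.median(high)

# The mean depends on the arbitrary substituted value
mean_a = statistics.mean(low)
mean_b = statistics.mean(high)

print(median_a, median_b, mean_a, mean_b)
```

The same invariance fails once more than half the values are non-detects, in which case a higher quantile would have to be used instead of the median.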
Multilayer Perceptron for Robust Nonlinear Interval Regression Analysis Using Genetic Algorithms
Yi-Chung Hu
2014-01-01
On the basis of fuzzy regression, computational models in intelligence such as neural networks have the capability to be applied to nonlinear interval regression analysis for dealing with uncertain and imprecise data. When training data are not contaminated by outliers, computational models perform well by including almost all given training data in the data interval. Nevertheless, since training data are often corrupted by outliers, robust learning algorithms employed to resist outliers for ...
A regression analysis on the green olives debittering
Kopsidas, Gerassimos C.
1991-12-01
In this paper, a regression model is fitted that gives the debittering time t as a function of the sodium hydroxide concentration C and the debittering temperature T for the debittering of medium-size green olive fruit of the Conservolea variety. The model has the simple form $t = a_0 C^{a_1} e^{a_2/T}$, where $a_0$, $a_1$ and $a_2$ are constants. The values of $a_0$, $a_1$ and $a_2$ are determined by the method of least squares from a set of experimental data. The fitted model is very satisfactory for the conditions under which Greek green olives are debittered.
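A model of the form t = a0 · C^a1 · e^(a2/T) becomes linear in its constants after taking logarithms: ln t = ln a0 + a1 ln C + a2 (1/T), so it can be fitted by ordinary multiple least squares. The sketch below shows one way to do this; whether the paper fitted the log-transformed or the original model is not stated, so the log-linear route is an assumption. The data are synthetic, generated from hypothetical constants, T is assumed to be an absolute temperature, and `solve` and `least_squares` are illustrative helpers.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def least_squares(X, y):
    """Solve the normal equations (X^T X) beta = X^T y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

# Hypothetical constants and conditions (NOT the paper's values)
a0_true, a1_true, a2_true = 0.002, -1.2, 2500.0
conditions = [(C, T) for C in (1.0, 1.5, 2.0, 2.5) for T in (288.0, 298.0, 308.0)]
times = [a0_true * C ** a1_true * math.exp(a2_true / T) for C, T in conditions]

# Regressors: intercept, ln C, and 1000/T (scaled to keep the system well conditioned)
X = [[1.0, math.log(C), 1000.0 / T] for C, T in conditions]
y = [math.log(t) for t in times]

ln_a0, a1, a2_scaled = least_squares(X, y)
a0 = math.exp(ln_a0)
a2 = 1000.0 * a2_scaled  # undo the scaling of the 1/T regressor

print(a0, a1, a2)
```

Fitting in log space minimizes relative rather than absolute errors in t, which matters when debittering times span a wide range.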
The Use of Nonparametric Kernel Regression Methods in Econometric Production Analysis
Czekaj, Tomasz Gerard
2013-01-01
This PhD thesis addresses one of the fundamental problems in applied econometric analysis, namely the econometric estimation of regression functions. The conventional approach to regression analysis is the parametric approach, which requires the researcher to specify the form of the regression function. However, the a priori specification of a functional form involves the risk of choosing one that is not similar to the “true” but unknown relationship between the regressors and the dependent variable. This problem, known as parametric misspecification, can result in biased parameter estimates and hence, also in biased measures, which are derived from the estimated parameters. This, in turn, can result in incorrect economic conclusions and recommendations for managers, politicians and decision makers in general. This PhD thesis focuses on a nonparametric econometric approach that can be used to avoid this problem. The main objective is to investigate the applicability of the nonparametric kernel regression method in applied production analysis. The focus of the empirical analyses included in this thesis is the agricultural sector in Poland. Data on Polish farms are used to investigate practically and politically relevant problems and to illustrate how nonparametric regression methods can be used in applied microeconomic production analysis both in panel data and cross-section data settings. The thesis consists of four papers. The first paper addresses problems of parametric and nonparametric estimations of production functions in order to evaluate the optimal firm size. The second paper discusses the use of parametric and nonparametric regression methods to estimate panel data regression models. The third paper analyses production risk, price uncertainty, and farmers' risk preferences within a nonparametric panel data regression framework. 
The fourth paper analyses the technical efficiency of dairy farms with environmental output using nonparametric kernel regression in a semiparametric stochastic frontier analysis. The results provided in this PhD thesis show that nonparametric kernel methods are well-suited to econometric production analysis and can outperform traditional parametric methods. Although the empirical focus of this thesis is on the application of nonparametric kernel regression in applied production analysis, the findings are also applicable to econometric estimations in general.
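The basic idea of nonparametric kernel regression can be sketched with the Nadaraya-Watson (local constant) estimator, which predicts the output at a point as a kernel-weighted average of nearby observations without assuming a functional form. The abstract does not say which kernel estimator or bandwidth rule the thesis uses, so the Gaussian kernel, the fixed bandwidth and the synthetic input/output data below are all assumptions made for illustration.

```python
import math

def nadaraya_watson(x_train, y_train, x0, bandwidth):
    """Kernel-weighted average of y_train around x0, using a Gaussian kernel."""
    weights = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in x_train]
    return sum(w * y for w, y in zip(weights, y_train)) / sum(weights)

# Hypothetical production data with diminishing returns: output = sqrt(input)
inputs = [0.5 * i for i in range(1, 41)]  # 0.5, 1.0, ..., 20.0
outputs = [math.sqrt(x) for x in inputs]

# Estimate the regression function at input = 10 without specifying its shape
estimate = nadaraya_watson(inputs, outputs, 10.0, bandwidth=1.0)
print(estimate, math.sqrt(10.0))
```

A smaller bandwidth follows the data more closely at the price of higher variance; a larger one smooths more and increases bias, especially near the edges of the data.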
Multiple Regression as a Practical Tool for Teacher Preparation Program Evaluation
Williams, Cynthia
2012-01-01
In response to No Child Left Behind mandates, budget cuts and various accountability demands aimed at improving programs, colleges and schools of education are in need of practical, quantitative evaluation methods which can be utilized internally to meaningfully examine teacher preparation programs and related coursework. The utility of multiple…
The application of a multiple regression model for aero radiometric data
International Nuclear Information System (INIS)
The data observed in the total channel of high-sensitivity airborne γ-ray spectrometric surveys are selected as the dependent variable, while those of the Th, K and U channels are considered as independent variables, and a linear statistical model is assumed to relate them: $(\mathrm{Total})_i = \beta_0 + \beta_1 (\mathrm{U})_i + \beta_2 (\mathrm{Th})_i + \beta_3 (\mathrm{K})_i + \varepsilon_i$, where $\beta_1$, $\beta_2$, $\beta_3$ are the partial regression coefficients and $\varepsilon_i$ is the error term. The estimated coefficients ($\beta_1$, $\beta_2$, $\beta_3$) are used to check the data acquisition system on board, as well as to predict, on occasion, a more appropriate value of the data when a single data item is not recorded correctly. (author)
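The estimation and prediction steps described above can be written out as a short worked formula. This is the standard ordinary least squares solution for a model with an intercept and three predictors, given as a sketch; the symbols β and ε are assumed here because the original Greek letters were lost in this record, and n denotes the number of survey records used in the fit.

```latex
% Design matrix and response for n records (intercept, U, Th, K channels)
\[
X =
\begin{pmatrix}
1 & \mathrm{U}_1 & \mathrm{Th}_1 & \mathrm{K}_1 \\
\vdots & \vdots & \vdots & \vdots \\
1 & \mathrm{U}_n & \mathrm{Th}_n & \mathrm{K}_n
\end{pmatrix},
\qquad
\mathbf{y} =
\begin{pmatrix}
\mathrm{Total}_1 \\ \vdots \\ \mathrm{Total}_n
\end{pmatrix}
\]

% Least squares estimate of the coefficients
\[
\hat{\boldsymbol{\beta}}
= (\hat\beta_0, \hat\beta_1, \hat\beta_2, \hat\beta_3)^{\top}
= (X^{\top} X)^{-1} X^{\top} \mathbf{y}
\]

% Replacement value for a record j whose total-channel value was misrecorded
\[
\widehat{\mathrm{Total}}_j
= \hat\beta_0 + \hat\beta_1 \mathrm{U}_j + \hat\beta_2 \mathrm{Th}_j + \hat\beta_3 \mathrm{K}_j
\]
```

A recorded total that differs from this predicted value by much more than the typical residual is a candidate for a recording fault.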
Multifractal Analysis of Multiple Ergodic Averages
Fan, Ai-hua; Schmeling, Joerg; Wu, Meng
2011-01-01
In this paper we present a complete solution to the problem of multifractal analysis of multiple ergodic averages in the case of symbolic dynamics for functions of two variables depending on the first coordinate.
Analysis of Impacted Classes and Regression Test Suite Generation
Aprna Tripathi; Dharmendra Singh Kushwaha; Arun Kumar Misra
2013-01-01
Software needs to be changed over time to deal with new requirements, existing faults and change requests. Changes made to software will inevitably have some unforeseen and undesirable effects on other parts of the software. Software Change Impact Analysis (SCIA) is an approach used to identify the potential effects caused by changes made to software. When a change is requested by the client or user, the software project team has not only the objective to incorporate that change in the existing ...
Treating experimental data of inverse kinetic method by unitary linear regression analysis
International Nuclear Information System (INIS)
The theory of treating experimental data from the inverse kinetic method by unitary linear regression analysis is described. Not only the reactivity but also the effective neutron source intensity can be calculated by this method. A computer code was written based on the inverse kinetic method and unitary linear regression analysis. The data of the zero-power facility BFS-1 in Russia were processed and the results were compared. The results show that the reactivity and the effective neutron source intensity can be obtained correctly by treating experimental data of the inverse kinetic method using unitary linear regression analysis, and that the precision of the reactivity measurement is improved. The central element efficiency can be calculated by using the reactivity. The results also show that the effect of the external neutron source on the reactivity measurement should be considered when the reactor power is low and the intensity of the external neutron source is strong. (authors)
Gopal Rajendiran; Kavandappa-Goundar Mayilsamy; Ramasamy Subramanian; Natarajan Nedunchezhian; Ramasamy Venkatachalam
2014-01-01
The purpose of this research work is to build a multiple linear regression model for the characteristics of a multicylinder diesel engine using multicomponent blends (diesel-pungamia methyl ester-ethanol) as fuel. Nine blends were tested by varying diesel (100 to 10% by vol.) and biodiesel (80 to 10% by vol.) and keeping ethanol constant at 10%. The brake thermal efficiency, smoke, oxides of nitrogen, carbon dioxide, maximum cylinder pressure, angle of maximum ...
Identifying the Factors that Influence Change in SEBD Using Logistic Regression Analysis
Liberato Camilleri; Carmel Cefai
2013-01-01
Multiple linear regression and ANOVA models are widely used in applications since they provide effective statistical tools for assessing the relationship between a continuous dependent variable and several predictors. However these models rely heavily on linearity and normality assumptions and they do not accommodate categorical dependent variables. The seminal contribution of John Nelder and Robert Wedderburn (1972) introduced the concept of Generalized Linear Models. GLMs overcome the limit...
Fauser, Patrik; Thomsen, Marianne
2010-01-01
This paper proposes a simple method for estimating emissions and predicted environmental concentrations (PECs) in water and air for organic chemicals that are used in household products and industrial processes. The method has been tested on existing data for 63 organic high-production volume chemicals available in the European Chemicals Bureau risk assessment reports (RARs). The method suggests a simple linear relationship between Henry's Law constant, octanol-water coefficient, use and production volumes, and emissions and PECs on a regional scale in the European Union. Emissions and PECs are a result of a complex interaction between chemical properties, production and use patterns and geographical characteristics. A linear relationship cannot capture these complexities; however, it may be applied at a cost-efficient screening level for suggesting critical chemicals that are candidates for an in-depth risk assessment. Uncertainty measures are not available for the RAR data; however, uncertainties for the applied regression models are given in the paper. Evaluation of the methods reveals that between 79% and 93% of all emission and PEC estimates are within one order of magnitude of the reported RAR values. Bearing in mind that the domain of the method comprises organic industrial high-production volume chemicals, four chemicals, prioritized in the Water Framework Directive and the Stockholm Convention on Persistent Organic Pollutants, were used to test the method for estimated emissions and PECs, with corresponding uncertainty intervals, in air and water at regional EU level.
Random Decrement and Regression Analysis of Traffic Responses of Bridges
Asmussen, J. C.; Ibrahim, S. R.
1995-01-01
The topic of this paper is the estimation of modal parameters from ambient data by applying the Random Decrement technique. The data from the Queensborough Bridge over the Fraser River in Vancouver, Canada have been applied. The loads producing the dynamic response are ambient, e.g. wind, traffic and small ground motion. The Random Decrement technique is used to estimate the correlation function or the free decays from the ambient data. From these functions, the modal parameters are extracted using the Ibrahim Time Domain method. The possible influence of the traffic mass load on the bridge is investigated by assuming that the response level of the bridge is dependent on the mass of the vehicle load. The eigenfrequencies of the bridge are estimated as a function of the response level. This indicates the degree of influence of the mass load on the estimated eigenfrequencies. The results of the analysis using the Random Decrement technique are compared with results from an analysis based on fast Fourier transformations.
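The core of the Random Decrement technique can be sketched as follows: segments of the measured response that begin each time the signal satisfies a trigger condition are averaged, so that the random part cancels and a free-decay-like signature remains. The triggering condition used in the paper is not stated; an upward level crossing at one RMS is assumed here. The response is simulated (a lightly damped second-order autoregressive process driven by white noise), not bridge data, and all names are illustrative.

```python
import math
import random

def random_decrement(signal, trigger, length):
    """Average all segments of `length` samples starting at an upward crossing of `trigger`."""
    starts = [i for i in range(1, len(signal) - length + 1)
              if signal[i - 1] < trigger <= signal[i]]
    if not starts:
        raise ValueError("no trigger crossings found")
    signature = [sum(signal[s + k] for s in starts) / len(starts)
                 for k in range(length)]
    return signature, len(starts)

# Simulated ambient response: damped oscillation with a period of 20 samples
random.seed(0)
r = 0.98                      # pole radius (sets the damping)
omega = 2 * math.pi / 20      # angular frequency per sample
c1, c2 = 2 * r * math.cos(omega), -r * r
x = [0.0, 0.0]
for _ in range(20000):
    x.append(c1 * x[-1] + c2 * x[-2] + random.gauss(0.0, 1.0))

rms = math.sqrt(sum(v * v for v in x) / len(x))
signature, n_segments = random_decrement(x, trigger=rms, length=60)

print(n_segments, signature[0], signature[10], signature[20])
```

Repeating the estimate with different trigger levels is one way to study how modal estimates change with response level, in the spirit of the analysis described in the abstract.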
Lee, Sungyoung; Kwon, Min-Seok; Park, Taesung
2014-01-01
In genome-wide association studies (GWAS), regression analysis has been most commonly used to establish an association between a phenotype and genetic variants, such as single nucleotide polymorphism (SNP). However, most applications of regression analysis have been restricted to the investigation of single marker because of the large computational burden. Thus, there have been limited applications of regression analysis to multiple SNPs, including gene–gene interaction (GGI) in large-scale GWAS data. In order to overcome this limitation, we propose CARAT-GxG, a GPU computing system-oriented toolkit, for performing regression analysis with GGI using CUDA (compute unified device architecture). Compared to other methods, CARAT-GxG achieved almost 700-fold execution speed and delivered highly reliable results through our GPU-specific optimization techniques. In addition, it was possible to achieve almost-linear speed acceleration with the application of a GPU computing system, which is implemented by the TORQUE Resource Manager. We expect that CARAT-GxG will enable large-scale regression analysis with GGI for GWAS data. PMID:25574130
Partially linear censored quantile regression
Neocleous, T.; Portnoy, S
2009-01-01
Censored regression quantile (CRQ) methods provide a powerful and flexible approach to the analysis of censored survival data when standard linear models are felt to be appropriate. In many cases however, greater flexibility is desired to go beyond the usual multiple regression paradigm. One area of common interest is that of partially linear models: one (or more) of the explanatory covariates are assumed to act on the response through a non-linear function. Here the CRQ approach of Portnoy (...
Nose, Takashi; Kobayashi, Takao
In this paper, we propose a technique for estimating the degree or intensity of emotional expressions and speaking styles appearing in speech. The key idea is based on a style control technique for speech synthesis using a multiple regression hidden semi-Markov model (MRHSMM), and the proposed technique can be viewed as the inverse of the style control. In the proposed technique, the acoustic features of spectrum, power, fundamental frequency, and duration are simultaneously modeled using the MRHSMM. We derive an algorithm for estimating explanatory variables of the MRHSMM, each of which represents the degree or intensity of emotional expressions and speaking styles appearing in acoustic features of speech, based on a maximum likelihood criterion. We show experimental results to demonstrate the ability of the proposed technique using two types of speech data, simulated emotional speech and spontaneous speech with different speaking styles. It is found that the estimated values have correlation with human perception.
The interelectronic repulsion and spin-orbit interaction parameters for some Nd³⁺ β-diketone complexes have been computed using the partial and multiple regression method from the observed absorption spectra in the region 1000-23500 cm⁻¹. A brief outline of this method, which is an alternative to a computer programming method, is given. The energy parameters (Slater-Condon and Landé) derived from intra-f^N transitions of the lanthanide ion are important for predicting the covalent tendency of the metal-ligand bond in the complex on the basis of the decrease in the value of these parameters. The complexes have been arranged in increasing order of covalency, as indicated by the value of β or b^{1/2}. (author)
Díaz-Contreras, Carlos A.; Aguilera-Rojas, Alejandra; Guillén-Barrientos, Nathaly
2014-10-01
The incorporation of new personnel or the reassignment of existing personnel to specific tasks is an important decision, because getting it right determines the very survival of the company. In this context, it becomes relevant to have a personnel selection model that takes into account the ambiguous information and degrees of uncertainty associated with evaluating the qualitative assessments of applicants, and that can deliver accurate and precise results, thus ensuring good performance in the position and reducing the risk involved in incorporating new people. In this work, a personnel selection model under conditions of uncertainty was developed by applying fuzzy logic, using as input data the job descriptions of a retail company, with overlapping triangular fuzzy variables. It was compared with a classical multiple regression model. The results showed that, in this case, the multiple regression model is more efficient than the chosen fuzzy logic model.
E, CORNWELL.
2006-03-01
Full Text Available In the QSRR discipline, a novel, easy-to-use parameter (Vc) was designed to evaluate classical topological indices (W, ¹chi, Z, MTI) and two new-generation ones (Xu, ¹chih). Regression between Vc and ¹chih gave a correlation coefficient (r) of 0.9992, a surprisingly high value compared with those commonly found in the QSPR/QSAR discipline. Using the Vc parameter, an approach to multiple regression with three independent variables is presented. A set of 35 saturated hydrocarbons was used
Baghi, Q; Bergé, J; Christophe, B; Touboul, P; Rodrigues, M
2015-01-01
The analysis of physical measurements often copes with highly correlated noises and interruptions caused by outliers, saturation events or transmission losses. We assess the impact of missing data on the performance of linear regression analysis involving the fit of modeled or measured time series. We show that data gaps can significantly alter the precision of the regression parameter estimation in the presence of colored noise, due to the frequency leakage of the noise power. We present a regression method which cancels this effect and estimates the parameters of interest with a precision comparable to the complete data case, even if the noise power spectral density (PSD) is not known a priori. The method is based on an autoregressive (AR) fit of the noise, which allows us to build an approximate generalized least squares estimator approaching the minimal variance bound. The method, which can be applied to any similar data processing, is tested on simulated measurements of the MICROSCOPE space mission, whos...
The empirical model of turbine efficiency is necessary for the control- and/or diagnosis-oriented simulation and useful for the simulation and analysis of dynamic performances of the turbine equipment and systems, such as air cycle refrigeration systems, power plants, turbine engines, and turbochargers. Existing empirical models of turbine efficiency are insufficient because there is no suitable form available for air cycle refrigeration turbines. This work performs a critical review of empirical models (called mean value models in some literature) of turbine efficiency and develops an empirical model in the desired form for air cycle refrigeration, the dominant cooling approach in aircraft environmental control systems. The Taylor series and regression analysis are used to build the model, with the Taylor series being used to expand functions with the polytropic exponent and the regression analysis to finalize the model. The measured data of a turbocharger turbine and two air cycle refrigeration turbines are used for the regression analysis. The proposed model is compact and able to present the turbine efficiency map. Its predictions agree with the measured data very well, with the corrected coefficient of determination Rc2 ? 0.96 and the mean absolute percentage deviation = 1.19% for the three turbines. Highlights: performed a critical review of empirical models of turbine efficiency; developed an empirical model in the desired form for air cycle refrigeration, using the Taylor expansion and regression analysis; verified the method for developing the empirical model; verified the model.
Linear Maximum Likelihood Regression Analysis for Untransformed Log-Normally Distributed Data
Sara M. Gustavsson
2012-10-01
Full Text Available Medical research data are often skewed and heteroscedastic. It has therefore become practice to log-transform data in regression analysis, in order to stabilize the variance. Regression analysis on log-transformed data estimates the relative effect, whereas it is often the absolute effect of a predictor that is of interest. We propose a maximum likelihood (ML)-based approach to estimate a linear regression model on log-normal, heteroscedastic data. The new method was evaluated with a large simulation study. Log-normal observations were generated according to the simulation models and parameters were estimated using the new ML method, ordinary least-squares regression (LS), and weighted least-squares regression (WLS). All three methods produced unbiased estimates of parameters and expected response, and ML and WLS yielded smaller standard errors than LS. The approximate normality of the Wald statistic, used for tests of the ML estimates, in most situations produced correct type I error risk. Only ML and WLS produced correct confidence intervals for the estimated expected value. ML had the highest power for tests regarding β_{1}.
Han, Bing; Jing, Hongyuan; Liu, Jianping; Wu, Zhangzhong [PetroChina Pipeline R&D Center, Langfang, Hebei (China)]; Hao, Jianbin [School of Petroleum Engineering, Southwest Petroleum University, Chengdu, Sichuan (China)]
2010-07-01
Landslides have a serious impact on the integrity of oil and gas pipelines in the tough terrain of Western China. This paper introduces a method for determining the axial stress in pipelines subjected to landslides, using numerical simulation and regression analysis. Numerical simulation is performed to analyze how pipe stresses vary with five vulnerability assessment indexes: the distance between pipeline and landslide tail; the thickness of the landslide; the inclination angle of the landslide; the pipeline length passing through the landslide; and the buried depth of the pipeline. A pipeline passing through a certain landslide in southwest China was selected as an example to verify the feasibility and effectiveness of this method. This method has practical applicability, but it would need large numbers of examples to better verify its reliability and should be modified accordingly. Also, it only considers the case where the direction of the pipeline is perpendicular to the primary slip direction of the landslide.
Catching up with Harvard: Results from Regression Analysis of World Universities League Tables
Li, Mei; Shankar, Sriram; Tang, Kam Ki
2011-01-01
This paper uses regression analysis to test if the universities performing less well according to Shanghai Jiao Tong University's world universities league tables are able to catch up with the top performers, and to identify national and institutional factors that could affect this catching up process. We have constructed a dataset of 461…
Cognitive Differentiation Analysis: A Regression Extension of the Reynolds-Sutrick Model.
Reynolds, Thomas J.; Sutrick, Kenneth H.
1988-01-01
Cognitive Differentiation Analysis (CDA) represents a method to measure the correspondence of an individual vector or a composite vector of descriptor ratings to a matrix of pair-wise dissimilarity judgments where both sets of judgments are assumed to be ordinal. The zero intercept regression extension of CDA is described. (TJH)
Van Keilegom, Ingrid; Wang, Lan
2010-01-01
We consider the problem of modeling heteroscedasticity in semiparametric regression analysis of cross-sectional data. Existing work in this setting is rather limited and mostly adopts a fully nonparametric variance structure. This approach is hampered by the curse of dimensionality in practical applications. Moreover, the corresponding asymptotic theory is largely restricted to estimators that minimize certain smooth objective functions. The asymptotic derivation thus excludes semiparametric quant...
Isolating the Effects of Training Using Simple Regression Analysis: An Example of the Procedure.
Waugh, C. Keith
This paper provides a case example of simple regression analysis, a forecasting procedure used to isolate the effects of training from an identified extraneous variable. This case example focuses on results of a three-day sales training program to improve bank loan officers' knowledge, skill-level, and attitude regarding solicitation and sale of…
Methods and applications of linear models regression and the analysis of variance
Hocking, Ronald R
2013-01-01
Praise for the Second Edition: "An essential desktop reference book . . . it should definitely be on your bookshelf." -Technometrics. A thoroughly updated book, Methods and Applications of Linear Models: Regression and the Analysis of Variance, Third Edition features innovative approaches to understanding and working with models and theory of linear regression. The Third Edition provides readers with the necessary theoretical concepts, which are presented using intuitive ideas rather than complicated proofs, to describe the inference that is appropriate for the methods being discussed. The book
Automatic regression analysis for use in a complex system of evaluation of plant genetic resources
Attila T. SZABO
1984-08-01
Full Text Available In accordance with the general requirements regarding computerization in gene banks and germplasm research, a computer program has been compiled for the analysis of univariate response in crop germplasm evaluation. The program is written in COBOL and runs on a FELIX C-256 computer. The different modules of the program allow for: (1) data control and error listing; (2) computation of the regression function; (3) listing of the difference between the values measured and computed; (4) sorting of the individual samples; (5) construction of scattergrams in two dimensions for measured values with the simultaneous representation of the regression line; (6) listing of examined samples in a sequence required in evaluation.
Fast algorithm of the robust Gaussian regression filter for areal surface analysis
International Nuclear Information System (INIS)
In this paper, the general model of the Gaussian regression filter for areal surface analysis is explored. The intrinsic relationships between the linear Gaussian filter and the robust filter are addressed. A general mathematical solution for this model is presented. Based on this technique, a fast algorithm is created. Both simulated and practical engineering data (stochastic and structured) have been used in the testing of the fast algorithm. Results show that with the same accuracy, the processing time of the second-order nonlinear regression filters for a dataset of 1024*1024 points has been reduced to several seconds from the several hours of traditional algorithms
Statistical methods in regression and calibration analysis of chromosome aberration data
International Nuclear Information System (INIS)
The method of iteratively reweighted least squares for the regression analysis of Poisson distributed chromosome aberration data is reviewed in the context of other fit procedures used in the cytogenetic literature. As an application of the resulting regression curves, methods for calculating confidence intervals on dose from aberration yield are described and compared, and, for the linear quadratic model, a confidence interval is given. Emphasis is placed on the rational interpretation and the limitations of various methods from a statistical point of view. (orig./MG)
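The iteratively reweighted least squares idea named in this abstract can be illustrated with a minimal sketch. It assumes a linear-quadratic yield per cell, c + alpha*D + beta*D^2, with Poisson counts and an identity link, so each weight is the number of cells scored divided by the current fitted yield. The function names, the starting weights, and the stopping rule are illustrative assumptions, not the procedure of the reviewed paper.

```python
def solve(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def irls_linear_quadratic(doses, counts, cells, n_iter=50, tol=1e-10):
    """Fit yield = c + alpha*D + beta*D**2 to Poisson counts (hypothetical helper).

    With an identity link the working response is the observed yield itself,
    and the weight of each dose point is cells / fitted_yield.
    """
    design = [[1.0, d, d * d] for d in doses]
    yields = [k / n for k, n in zip(counts, cells)]
    weights = [float(n) for n in cells]  # assumed start: weight by cells scored
    coef = [0.0, 0.0, 0.0]
    for _ in range(n_iter):
        # Weighted normal equations (X' W X) coef = X' W y
        xtwx = [[sum(w * x[r] * x[c] for w, x in zip(weights, design))
                 for c in range(3)] for r in range(3)]
        xtwy = [sum(w * x[r] * y for w, x, y in zip(weights, design, yields))
                for r in range(3)]
        new_coef = solve(xtwx, xtwy)
        fitted = [sum(b * v for b, v in zip(new_coef, x)) for x in design]
        # Guard against non-positive fitted yields before reweighting
        weights = [n / max(mu, 1e-12) for n, mu in zip(cells, fitted)]
        converged = max(abs(p - q) for p, q in zip(new_coef, coef)) < tol
        coef = new_coef
        if converged:
            break
    return coef
```

Calling `irls_linear_quadratic(doses, counts, cells)` with equal-length lists returns the three coefficients in the order (c, alpha, beta). Confidence intervals on dose, which the abstract discusses, are outside the scope of this sketch.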
Detrended fluctuation analysis as a regression framework: Estimating dependence at different scales
Kristoufek, Ladislav
2015-02-01
We propose a framework combining detrended fluctuation analysis with standard regression methodology. The method is built on detrended variances and covariances and it is designed to estimate regression parameters at different scales and under potential nonstationarity and power-law correlations. The former feature allows for distinguishing between effects for a pair of variables from different temporal perspectives. The latter ones make the method a significant improvement over the standard least squares estimation. Theoretical claims are supported by Monte Carlo simulations. The method is then applied to selected examples from physics, finance, environmental science, and epidemiology. For most of the studied cases, the relationship between variables of interest varies strongly across scales.
Multiple Factorial Analysis of Symbolic Data
2012-12-01
Full Text Available This document presents an extension of multiple factorial analysis to symbolic data and especially to space data. The analysis makes use of the characteristic coding method to obtain active individuals and the reconstitutive coding method for additional individuals in order to conserve the variability of assertion objects. Traditional principal component analysis methods are applied to the coded objects. Certain interpretation aids are presented after the coding process. This method was applied to poverty data.
Factors predicting the failure of Bernese periacetabular osteotomy: a meta-regression analysis
Sambandam, Senthil Nathan; Hull, Jason; Jiranek, William A.
2008-01-01
There is no clear evidence regarding the outcome of Bernese periacetabular osteotomy (PAO) in different patient populations. We performed a systematic meta-regression analysis of 23 eligible studies. There were 1,113 patients, of whom 61 had total hip arthroplasty (THA) (endpoint) as a result of failed Bernese PAO. Univariate analysis revealed significant correlation between THA and presence of grade 2/grade 3 arthritis, Merle d'Aubigné score (MDS), Harris hip score and Tonnis angle...
Gao, Jun; Johnston, Grace M.; Lavergne, M. Ruth; Mcintyre, Paul
2011-01-01
Classification and regression tree (CART) analysis was used to identify subpopulations with lower palliative care program (PCP) enrolment rates. CART analysis uses recursive partitioning to group predictors. The PCP enrolment rate was 72 percent for the 6,892 adults who died of cancer from 2000 to 2005 in two counties in Nova Scotia, Canada. The lowest PCP enrolment rates were for nursing home residents over 82 years (27 percent), a group residing more than 43 kilometres from the PCP (31 per...
Anderson, Carl A.; McRae, Allan F.; Visscher, Peter M.
2006-01-01
Standard quantitative trait loci (QTL) mapping techniques commonly assume that the trait is both fully observed and normally distributed. When considering survival or age-at-onset traits these assumptions are often incorrect. Methods have been developed to map QTL for survival traits; however, they are both computationally intensive and not available in standard genome analysis software packages. We propose a grouped linear regression method for the analysis of continuous survival data. Using...
Anwar Fitrianto; Lee Ceng Yik
2014-01-01
When the independent variables in a multiple linear regression model are highly linearly correlated, an analysis based on the common Ordinary Least Squares (OLS) method can be misleading. In this situation, the ridge regression estimator is recommended. We conduct a simulation study to compare the performance of the ridge regression estimator and OLS. We found that Hoerl and Kennard ridge regression estimation method has better performan...
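The contrast between OLS and ridge estimation under near-collinearity can be sketched for the two-predictor case. This is a minimal illustration that assumes centered data with no intercept and a ridge constant `k` supplied by hand. The Hoerl and Kennard rule for choosing `k` mentioned in the abstract is not reproduced here, and the function name is hypothetical.

```python
def ridge_2d(x1, x2, y, k=0.0):
    """Ridge estimate for y ~ b1*x1 + b2*x2 on centered data; k = 0 gives OLS.

    Solves (X'X + k*I) b = X'y in closed form for the 2 x 2 case.
    """
    s11 = sum(a * a for a in x1) + k
    s22 = sum(b * b for b in x2) + k
    s12 = sum(a * b for a, b in zip(x1, x2))
    t1 = sum(a * c for a, c in zip(x1, y))
    t2 = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12  # close to zero when x1 and x2 are nearly collinear
    b1 = (s22 * t1 - s12 * t2) / det
    b2 = (s11 * t2 - s12 * t1) / det
    return b1, b2
```

With `k = 0` the determinant shrinks toward zero as the predictors become collinear, which is what makes OLS coefficients unstable. Any `k > 0` increases the determinant and shrinks the coefficient vector, at the cost of some bias.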
DEFF Research Database (Denmark)
Nørgaard, Trine; Møldrup, Per
2014-01-01
Colloids are potential carriers for strongly sorbing chemicals in macroporous soils, but predicting the amount of colloids readily available for facilitated chemical transport is an unsolved challenge. This study addresses potential key parameters and predictive indicators when assessing colloid dispersibility and transport at the field scale. Samples representing three measurement scales (1-2 mm aggregates, intact 100 cm3 rings, and intact 6283 cm3 columns) were retrieved from the topsoil of a 1.69 ha agricultural field in a 15 m × 15 m grid (65 locations) to determine soil dispersibility as well as 24 comparison parameters including textural, chemical, and structural (e.g. air permeability) soil properties. The soil dispersibility was determined (i) using a laser diffraction method on 1-2 mm aggregates equilibrated to an initial matric potential of -100 cm H2O, (ii) using end-over-end shaking on 6.06 cm (diam.) × 3.48 cm (height) intact soil rings equilibrated to an initial matric potential of -5 cm H2O, and (iii) as the accumulated amount of particles leached from 20 cm × 20 cm intact soil columns after 6.5 hr (60 mm accumulated outflow). At all three scales, soil dispersibility was higher in samples collected from the northern part of the field where the greatest leaching of pesticides was observed in a horizontal well at ~ 3.5 m depth during a 9-year monitoring program. This suggests that the three dispersibility methods used are all relevant for field-scale mapping of areas with enhanced risk of colloid-facilitated transport. Subsequently, using multiple linear regression (MLR) analyses, soil dispersibility was predicted at all three sample scales from the 24 measured, geo-referenced parameters to produce sets of only a few promising indicator parameters for evaluating soil stability and particle mobilization on field scale. The MLR analyses at each scale were separated in predictions using all, only north, and only south locations in the field.
We found that different independent variables were included in the regression models when the sample scale increased from aggregate to column level. Generally, the predictive power of the regression models was better on the 1-2 mm aggregate scale than on the intact 100 cm3 and 20 cm × 20 cm scales. Overall, results suggested that different drivers controlled soil dispersibility at the three scales and the two sub-areas of the field. Predictions of soil dispersibility and the risk of colloid-facilitated chemical transport will therefore need to be highly scale- and area-specific.
Saeideh Ebrahimiasl
2014-02-01
Full Text Available A nanocrystalline SnO2 thin film was synthesized by a chemical bath method. The parameters affecting the energy band gap and surface morphology of the deposited SnO2 thin film were optimized using a semi-empirical method. Four parameters, including deposition time, pH, bath temperature and tin chloride (SnCl2·2H2O) concentration, were optimized by a factorial method. The factorial used a Taguchi OA (TOA) design method to estimate certain interactions and obtain the actual responses. Statistical evidence in the analysis of variance, including high F-values (4,112.2 and 20.27), very low P-values (<0.012 and 0.0478), non-significant lack of fit, determination coefficients (R2) of 0.978 and 0.977, and adequate precision (170.96 and 12.57), validated the suggested model. The optima of the suggested model were verified in the laboratory and results were quite close to the predicted values, indicating that the model successfully simulated the optimum conditions of SnO2 thin film synthesis.
Baxter Lisa K
2008-05-01
Full Text Available Background: There is a growing body of literature linking GIS-based measures of traffic density to asthma and other respiratory outcomes. However, no consensus exists on which traffic indicators best capture variability in different pollutants or within different settings. As part of a study on childhood asthma etiology, we examined variability in outdoor concentrations of multiple traffic-related air pollutants within urban communities, using a range of GIS-based predictors and land use regression techniques. Methods: We measured fine particulate matter (PM2.5), nitrogen dioxide (NO2), and elemental carbon (EC) outside 44 homes representing a range of traffic densities and neighborhoods across Boston, Massachusetts and nearby communities. Multiple three to four-day average samples were collected at each home during winters and summers from 2003 to 2005. Traffic indicators were derived using Massachusetts Highway Department data and direct traffic counts. Multivariate regression analyses were performed separately for each pollutant, using traffic indicators, land use, meteorology, site characteristics, and central site concentrations. Results: PM2.5 was strongly associated with the central site monitor (R2 = 0.68). Additional variability was explained by total roadway length within 100 m of the home, smoking or grilling near the monitor, and block-group population density (R2 = 0.76). EC showed greater spatial variability, especially during winter months, and was predicted by roadway length within 200 m of the home. The influence of traffic was greater under low wind speed conditions, and concentrations were lower during summer (R2 = 0.52). NO2 showed significant spatial variability, predicted by population density and roadway length within 50 m of the home, modified by site characteristics (obstruction), and with higher concentrations during summer (R2 = 0.56).
Conclusion: Each pollutant examined displayed somewhat different spatial patterns within urban neighborhoods and was differently related to local traffic and meteorology. Our results indicate a need for multi-pollutant exposure modeling to disentangle causal agents in epidemiological studies, and further investigation of site-specific and meteorological modification of the traffic-concentration relationship in urban neighborhoods.
Regression analysis to predict growth performance from dietary net energy in growing-finishing pigs.
Nitikanchana, S; Dritz, S S; Tokach, M D; DeRouchey, J M; Goodband, R D; White, B J
2015-06-01
Data from 41 trials with multiple energy levels (285 observations) were used in a meta-analysis to predict growth performance based on dietary NE concentration. Nutrient and energy concentrations in all diets were estimated using the NRC ingredient library. Predictor variables examined for best fit models using Akaike information criteria included linear and quadratic terms of NE, BW, CP, standardized ileal digestible (SID) Lys, crude fiber, NDF, ADF, fat, ash, and their interactions. The initial best fit models included interactions between NE and CP or SID Lys. After removal of the observations that fed SID Lys below the suggested requirement, these terms were no longer significant. Including dietary fat in the model with NE and BW significantly improved the G:F prediction model, indicating that NE may underestimate the influence of fat on G:F. The meta-analysis indicated that, as long as diets are adequate for other nutrients (i.e., Lys), dietary NE is adequate to predict changes in ADG across different dietary ingredients and conditions. The analysis indicates that ADG increases with increasing dietary NE and BW but decreases when BW is above 87 kg. The G:F ratio improves with increasing dietary NE and fat but decreases with increasing BW. The regression equations were then evaluated by comparing the actual and predicted performance of 543 finishing pigs in 2 trials fed 5 dietary treatments, including 3 different levels of NE obtained by adding wheat middlings, soybean hulls, dried distillers grains with solubles (DDGS; 8 to 9% oil), or choice white grease (CWG) to a corn-soybean meal-based diet. Diets were 1) 30% DDGS, 20% wheat middlings, and 4 to 5% soybean hulls (low energy); 2) 20% wheat middlings and 4 to 5% soybean hulls (low energy); 3) a corn-soybean meal diet (medium energy); 4) diet 2 supplemented with 3.7% CWG to equalize the NE level to diet 3 (medium energy); and 5) a corn-soybean meal diet with 3.7% CWG (high energy).
Only small differences were observed between predicted and observed values of ADG and G:F except for the low-energy diet containing the greatest fiber content (30% DDGS diet), where ADG and G:F were overpredicted by 3 to 6%. Therefore, the prediction equations provided a good estimation of the growth rate and feed efficiency of growing-finishing pigs fed different levels of dietary NE except for the pigs fed the low-energy diet containing the greatest fiber content. PMID:26115270
MULTIPLE PROBIT ANALYSIS WITH A NONZERO BACKGROUND
The 'EM' (Expectation-Maximization) algorithm is applied to probit analysis with multiple independent variables and a nonzero response rate. The equations for the maximum likelihood estimators are relatively simple, and converge in all the cases so far examined. An animal bioassa...
International Nuclear Information System (INIS)
Predicting the amount of hospital waste produced is helpful for the storage, transportation and disposal stages of hospital waste management. Based on this fact, two predictor models, artificial neural networks (ANNs) and multiple linear regression (MLR), were applied to predict the rate of medical waste generation in total and by type (sharp, infectious and general). In this study, a 5-fold cross-validation procedure on a database containing a total of 50 hospitals of Fars province (Iran) was used to verify the performance of the models. Three performance measures, MAR, RMSE and R2, were used to evaluate the performance of the models. The MLR, as a conventional model, obtained poor prediction performance measure values. However, MLR distinguished hospital capacity and bed occupancy as more significant parameters. On the other hand, ANNs, as a more powerful model that had not previously been introduced for predicting the rate of medical waste generation, showed high performance measure values, especially an R2 value of 0.99, confirming the good fit of the data. Such satisfactory results could be attributed to the non-linear nature of ANNs in problem solving, which provides the opportunity for relating independent variables to dependent ones non-linearly. In conclusion, the obtained results showed that our ANN-based model approach is very promising and may play a useful role in developing a better cost-effective strategy for waste management in future.
Linard, Joshua I.
2013-01-01
Mitigating the effects of salt and selenium on water quality in the Grand Valley and lower Gunnison River Basin in western Colorado is a major concern for land managers. Previous modeling indicated means to improve the models by including more detailed geospatial data and a more rigorous method for developing the models. After evaluating all possible combinations of geospatial variables, four multiple linear regression models resulted that could estimate irrigation-season salt yield, nonirrigation-season salt yield, irrigation-season selenium yield, and nonirrigation-season selenium yield. The adjusted r-squared and the residual standard error (in units of log-transformed yield) of the models were, respectively, 0.87 and 2.03 for the irrigation-season salt model, 0.90 and 1.25 for the nonirrigation-season salt model, 0.85 and 2.94 for the irrigation-season selenium model, and 0.93 and 1.75 for the nonirrigation-season selenium model. The four models were used to estimate yields and loads from contributing areas corresponding to 12-digit hydrologic unit codes in the lower Gunnison River Basin study area. Each of the 175 contributing areas was ranked according to its estimated mean seasonal yield of salt and selenium.
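The two fit statistics quoted in this abstract, adjusted r-squared and residual standard error, can be computed from observed and fitted values as in the sketch below. It assumes the usual definitions for a least-squares model with an intercept and `n_predictors` explanatory variables. The function name is hypothetical, and the report's own computation is not shown in the abstract.

```python
import math


def fit_summary(observed, fitted, n_predictors):
    """Return (r_squared, adjusted_r_squared, residual_standard_error).

    Assumes a least-squares model with an intercept, so the residual
    degrees of freedom are n - n_predictors - 1.
    """
    n = len(observed)
    dof = n - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more observations than fitted parameters")
    mean_obs = sum(observed) / n
    rss = sum((o - f) ** 2 for o, f in zip(observed, fitted))  # residual sum of squares
    tss = sum((o - mean_obs) ** 2 for o in observed)           # total sum of squares
    r2 = 1.0 - rss / tss
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / dof
    rse = math.sqrt(rss / dof)
    return r2, adj_r2, rse
```

Because the models in the abstract were fitted to log-transformed yields, the residual standard error from such a computation would be in log units, which matches how the abstract reports it.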
Fereshteh Shiri
2010-08-01
Full Text Available In the present work, support vector machines (SVMs) and multiple linear regression (MLR) techniques were used for quantitative structure–property relationship (QSPR) studies of retention time (tR) in standardized liquid chromatography–UV–mass spectrometry of 67 mycotoxins (aflatoxins, trichothecenes, roquefortines and ochratoxins) based on molecular descriptors calculated from the optimized 3D structures. By applying missing value, zero and multicollinearity tests with a cutoff value of 0.95, and a genetic algorithm method of variable selection, the most relevant descriptors were selected to build QSPR models. MLR and SVM methods were employed to build the QSPR models. The robustness of the QSPR models was characterized by statistical validation and the applicability domain (AD). The prediction results from the MLR and SVM models are in good agreement with the experimental values. The correlation and predictability measures r2 and q2 are 0.931 and 0.932, respectively, for SVM and 0.923 and 0.915, respectively, for MLR. The applicability domain of the model was investigated using the Williams plot. The effects of different descriptors on the retention times are described.
Mehrjoo, Saeed; Bashiri, Mahdi
2013-05-01
Production planning and control (PPC) systems have to deal with rising complexity and dynamics. The complexity of planning tasks is due to multiple variables and dynamic factors derived from uncertainties surrounding the PPC. Although the literature on exact scheduling algorithms, simulation approaches, and heuristic methods in production planning is extensive, these methods seem to be inefficient because of daily fluctuations in real factories. Decision support systems can provide productive tools for production planners to offer a feasible and prompt decision in effective and robust production planning. In this paper, we propose a robust decision support tool for detailed production planning based on multivariate statistical methods, including principal component analysis and logistic regression. The proposed approach has been used in a real case in the Iranian automotive industry. In the presence of existing multisource uncertainties, the results of applying the proposed method in the selected case show that the accuracy of daily production planning increases in comparison with the existing method.
Analysis of the Evolution of the Gross Domestic Product by Means of Cyclic Regressions
Catalin Angelo Ioan
2011-08-01
Full Text Available In this article, we carry out an analysis of the regularity of the Gross Domestic Product of a country, in our case the United States. The analysis is based on a new method, cyclic regressions based on the Fourier series of a function. Another point of view is that of considering, instead of the growth rate of GDP, the speed of variation of this rate, computed as a numerical derivative. The obtained results show a cycle of 71 years for this indicator, the mean square error being 0.93%. The method described allows a prognosis of short-term trends in GDP.
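One common way to fit a truncated Fourier series to a cyclic indicator is sketched below. It assumes the samples are equally spaced and cover exactly one period, in which case the least-squares coefficients reduce to simple projections onto the cosine and sine terms. The function names and the choice of harmonics are illustrative assumptions, and the article's own cyclic regression procedure may differ.

```python
import math


def fourier_fit(values, n_harmonics):
    """Least-squares Fourier coefficients for samples t = 0..n-1 spanning one period.

    Returns (a0, [(a_1, b_1), ..., (a_K, b_K)]) for the model
    a0 + sum_k a_k*cos(2*pi*k*t/n) + b_k*sin(2*pi*k*t/n).
    """
    n = len(values)
    if n_harmonics < 0 or 2 * n_harmonics >= n:
        raise ValueError("n_harmonics must satisfy 0 <= n_harmonics < n / 2")
    a0 = sum(values) / n
    coeffs = []
    for k in range(1, n_harmonics + 1):
        a_k = 2.0 / n * sum(v * math.cos(2 * math.pi * k * t / n)
                            for t, v in enumerate(values))
        b_k = 2.0 / n * sum(v * math.sin(2 * math.pi * k * t / n)
                            for t, v in enumerate(values))
        coeffs.append((a_k, b_k))
    return a0, coeffs


def fourier_predict(a0, coeffs, t, period):
    """Evaluate the fitted series at time t, for example beyond the sample for a short-term outlook."""
    return a0 + sum(a_k * math.cos(2 * math.pi * k * t / period)
                    + b_k * math.sin(2 * math.pi * k * t / period)
                    for k, (a_k, b_k) in enumerate(coeffs, start=1))
```

When the period is unknown, as in the GDP application, one option is to repeat such a fit over candidate periods and keep the one with the smallest mean square error. That search would need a general least-squares solve, because the projection shortcut only holds when the sample spans whole periods.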
Hüseyin BUDAK
2012-11-01
Full Text Available Credit scoring is a vital topic for banks, since limited financial resources need to be used more effectively. Banks use several credit scoring methods. One of them is to estimate whether a customer applying for credit will make repayments regularly. In this study, artificial neural networks and logistic regression analysis were used to support banks' credit risk prediction and to estimate whether a customer applying for credit will make repayments regularly. The results of the study showed that the artificial neural network method is more reliable than logistic regression analysis in estimating a customer's repayment behavior.
Groping Toward Linear Regression Analysis: Newton's Analysis of Hipparchus' Equinox Observations
Belenkiy, Ari
2008-01-01
In 1700, Newton, in designing a new universal calendar contained in the manuscripts known as Yahuda MS 24 from Jewish National and University Library at Jerusalem and analyzed in our recent article in Notes & Records Royal Society (59 (3), Sept 2005, pp. 223-54), attempted to compute the length of the tropical year using the ancient equinox observations reported by a famous Greek astronomer Hipparchus of Rhodes, ten in number. Though Newton had a very thin sample of data, he obtained a tropical year only a few seconds longer than the correct length. The reason lies in Newton's application of a technique similar to modern regression analysis. Actually he wrote down the first of the two so-called "normal equations" known from the Ordinary Least Squares method. Newton also had a vague understanding of qualitative variables. This paper concludes by discussing open historico-astronomical problems related to the inclination of the Earth's axis of rotation. In particular, ignorance about the long-range variation...
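For readers unfamiliar with the term, the two normal equations of ordinary least squares for a straight-line model can be written out as follows. The notation (observations x_i, y_i, intercept a, slope b) is modern and chosen here for illustration; it is not Newton's own.

```latex
% Model: y_i = a + b x_i + e_i, fitted by minimizing S(a, b) = \sum_i (y_i - a - b x_i)^2
\begin{align}
  \frac{\partial S}{\partial a} = 0 \;&\Longrightarrow\; \sum_{i=1}^{n} \left( y_i - a - b x_i \right) = 0, \\
  \frac{\partial S}{\partial b} = 0 \;&\Longrightarrow\; \sum_{i=1}^{n} x_i \left( y_i - a - b x_i \right) = 0.
\end{align}
```

The first equation states that the residuals sum to zero, which is the condition the abstract attributes to Newton.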
Chamroukhi, Faicel; Glotin, Hervé; Samé, Allou
2013-01-01
In this paper, we study the modeling and the classification of functional data presenting regime changes over time. We propose a new model-based functional mixture discriminant analysis approach based on a specific hidden process regression model that governs the regime changes over time. Our approach is particularly adapted to handle the problem of complex-shaped classes of curves, where each class is potentially composed of several sub-classes, and to deal with the regime ...
High-Dimensional Heteroscedastic Regression with an Application to eQTL Data Analysis
Daye, Z. John; Chen, Jinbo; Li, Hongzhe
2011-01-01
We consider the problem of high-dimensional regression under non-constant error variances. Despite being a common phenomenon in biological applications, heteroscedasticity has, so far, been largely ignored in high-dimensional analysis of genomic data sets. We propose a new methodology that allows non-constant error variances for high-dimensional estimation and model selection. Our method incorporates heteroscedasticity by simultaneously modeling both the mean and variance components via a nov...
Soil colour and spectral analysis employing linear regression models I. Effect of organic matter
Moustakas N.K.; Barouchas P.E.
2004-01-01
This work comprises an investigation into whether soil reflectance spectral analysis which is employed to calculate the colour characteristics (hue, value, chroma) of soil can be carried out using linear regression models, so that comparison of colour characteristics subsequently becomes possible, and also statistically documented. To this end the colour of soil samples was calculated through spectrum reflectance in the visible region of dry smooth-rubbed soil samples smaller than 250 mm. The...
Yu, R.; Geddes, Jr; Fazel, S.
2012-01-01
The risk of antisocial outcomes in individuals with personality disorder (PD) remains uncertain. The authors synthesize the current evidence on the risks of antisocial behavior, violence, and repeat offending in PD, and they explore sources of heterogeneity in risk estimates through a systematic review and meta-regression analysis of observational studies comparing antisocial outcomes in personality disordered individuals with control groups. Fourteen studies examined risk of antisocial and ...
Wan-dong Hong; Le-mei Dong; Zen-cai Jiang; Qi-huai Zhu; Shu-Qing Jin
2011-01-01
OBJECTIVES: Recent guidelines recommend that all cirrhotic patients should undergo endoscopic screening for esophageal varices. Identifying cirrhotic patients with esophageal varices by noninvasive predictors would allow endoscopy to be restricted to patients with a high risk of having varices. This study aimed to develop a decision model based on classification and regression tree analysis for the prediction of large esophageal varices in cirrhotic patients. MET...
Ibrahim Fayad
2014-11-01
Estimating forest canopy height from large-footprint satellite LiDAR waveforms is challenging given the complex interaction between LiDAR waveforms, terrain, and vegetation, especially in dense tropical and equatorial forests. In this study, canopy height in French Guiana was estimated using multiple linear regression models and the Random Forest (RF) technique. This analysis was based either on LiDAR waveform metrics extracted from the GLAS (Geoscience Laser Altimeter System) spaceborne LiDAR data and terrain information derived from the SRTM (Shuttle Radar Topography Mission) DEM (Digital Elevation Model), or on Principal Component Analysis (PCA) of GLAS waveforms. Results show that the best statistical model for estimating forest height based on waveform metrics and digital elevation data is a linear regression of waveform extent, trailing edge extent, and terrain index (RMSE of 3.7 m). For the PCA-based models, better canopy height estimation results were observed using a regression model that incorporated both the first 13 principal components (PCs) and the waveform extent (RMSE = 3.8 m). Random Forest regressions revealed that the best configuration for canopy height estimation used all of the following metrics: waveform extent, leading edge, trailing edge, and terrain index (RMSE = 3.4 m). Waveform extent was the variable that best explained canopy height, with an importance factor almost three times higher than those of the other three metrics (leading edge, trailing edge, and terrain index). Furthermore, the Random Forest regression incorporating the first 13 PCs and the waveform extent gave a slightly improved canopy height estimation in comparison to the linear model, with an RMSE of 3.6 m. In conclusion, multiple linear regressions and RF regressions provided canopy height estimations with similar precision using either LiDAR metrics or PCs. However, a regression model (linear regression or RF) based on the PCA of waveform samples with waveform extent information is an interesting alternative for canopy height estimation, as it does not require several metrics that are difficult to derive from GLAS waveforms in dense forests, such as those in French Guiana.
Wan-dong, Hong; Le-mei, Dong; Zen-cai, Jiang; Qi-huai, Zhu; Shu-Qing, Jin.
OBJECTIVES: Recent guidelines recommend that all cirrhotic patients should undergo endoscopic screening for esophageal varices. Identifying cirrhotic patients with esophageal varices by noninvasive predictors would allow endoscopy to be restricted to patients with a high risk of having varices. This study aimed to develop a decision model based on classification and regression tree analysis for the prediction of large esophageal varices in cirrhotic patients. METHODS: 309 cirrhotic patients (training sample, 187 patients; test sample, 122 patients) were included. Within the training sample, classification and regression tree analysis was used to identify predictors and a prediction model of large esophageal varices. The prediction model was then further evaluated in the test sample and in different Child-Pugh classes. RESULTS: The prevalence of large esophageal varices in cirrhotic patients was 50.8%. A tree model consisting of spleen width, portal vein diameter and prothrombin time, developed by classification and regression tree analysis, achieved a diagnostic accuracy of 84% for prediction of large esophageal varices. When reconstructed into two groups, the rate of varices was 83.2% for the high-risk group and 15.2% for the low-risk group. Accuracy of the tree model was maintained in the test sample and in different Child-Pugh classes. CONCLUSIONS: A decision tree model that consists of spleen width, portal vein diameter and prothrombin time may be useful for prediction of large esophageal varices in cirrhotic patients.
Ruman, M; Olkowska, E; Kozioł, K; Absalon, D; Matysik, M; Polkowska, Ż
2014-03-01
Monitoring contamination in river water is an expensive procedure, particularly for developing countries where pollution is a significant problem. This study was conducted to provide a pollution monitoring strategy that reduces the cost of laboratory analysis. The new monitoring strategy was designed as a result of cluster and regression analysis on field data collected from an industrially influenced river. Pollution sources in the study site were coal mining, metallurgy, chemical industry, and metropolitan sewage. This river resembles those in other areas of the world, including developing countries where environmental monitoring is financially constrained. Data were collected on variability of contaminant concentrations during four seasons at the same points on tributaries of the river. The variables described in the study are pH, electrical conductivity, inorganic ions, trace elements, and selected organic pollutants. These variables were divided into groups using cluster analysis. These groups were then tested using regression models to identify how the behavior of one variable changes in relation to another. It was found that up to 86.8% of variability of one parameter could be determined by another in the dataset. We adopted 60, 65, and 70% determination levels for accepting a regression model. As a result, monitoring could be reduced by 15 (60% level) and 10 variables (65 and 70%) out of 43, which comprises 35 and 23% of the monitored variable total. Cost reduction would be most effective if trace elements or organic pollutants were excluded from monitoring because these are the constituents most expensive to analyze. PMID:25602676
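The screening idea in this abstract (accept a pairwise regression when one variable explains enough of another's variability) can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function names, the variable names and the sample values are hypothetical, and only simple one-predictor regressions are considered.

```python
from itertools import combinations


def r_squared(x, y):
    """Coefficient of determination of a simple linear regression of y on x.

    For a one-predictor least-squares fit this equals the squared
    Pearson correlation between x and y.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series cannot explain or be explained
    return (sxy * sxy) / (sxx * syy)


def redundant_pairs(data, threshold):
    """Return (name_a, name_b, R^2) for variable pairs with R^2 >= threshold."""
    hits = []
    for name_a, name_b in combinations(sorted(data), 2):
        r2 = r_squared(data[name_a], data[name_b])
        if r2 >= threshold:
            hits.append((name_a, name_b, r2))
    return hits


# Synthetic, illustrative values only (not data from the study).
samples = {
    "conductivity": [410.0, 520.0, 610.0, 700.0, 830.0, 950.0],
    "chloride": [41.0, 55.0, 60.0, 72.0, 80.0, 97.0],
    "pH": [7.9, 7.2, 8.1, 7.4, 7.8, 7.5],
}
for a, b, r2 in redundant_pairs(samples, threshold=0.70):
    print(f"{a} ~ {b}: R^2 = {r2:.3f}")
```

A pair that passes the threshold suggests that one of the two variables could be predicted from the other instead of being measured; which one to drop would depend on analysis cost.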
Repeated-measures regression designs and analysis for environmental effects monitoring programs
Paine, Michael D.; Skinner, Marc A.; Kilgour, Bruce W.; DeBlois, Elisabeth M.; Tracy, Ellen
2014-12-01
This paper provides a general overview of repeated-measures (RM) regression designs and analysis for marine monitoring programs, in support of sediment chemistry, particle size and benthic macroinvertebrate community analyses provided as part of this series. In RM regression designs, the same n replicates (usually stations in monitoring programs) are re-sampled (i.e., repeatedly measured) at t>1 times (usually years). The stations provide variation in the predictor, or X variables. In the Terra Nova environmental effects monitoring (EEM) program, n=48 stations were sampled in each of t=7 years from 2000 to 2010. Two distance measures from five drill centres (sources of drilling wastes) were fixed predictor variables. RM regression designs are rarely used in environmental monitoring programs, but are often suitable and would be appropriate if applied to data from many monitoring programs. For the Terra Nova EEM program, carry-over effects, or persistent and usually small-scale variations among stations unrelated to distance, were strong for most sediment quality variables. Whenever natural carry-over effects are strong, RM designs and analysis will usually be more powerful and suitable than alternative approaches to the analysis.
Analysis of electrical resistance tomography (ERT) data using least-squares regression modelling in industrial process tomographs has been tested. Potential differences measured between electrodes in rings have been used to carry out the regression modelling to investigate the location and size of a disturbance present in the system. Extensive experiments have been carried out with ERT to test a suitable regression algorithm to extract the disturbance. The current analysis has been performed for a single disturbance known to be present in the system. For the environment considered, the least-squares regression reported in this paper demonstrates an alternative approach for analysis of tomography data in industrial applications. The position (concentric or off-centre) and the size of the disturbance (in concentric cases) can be well defined by the reported regression modelling approach. However, it is still a challenge to define the size of an off-centre disturbance.
Buston, Peter M; Elith, Jane
2011-05-01
1. Central questions of behavioural and evolutionary ecology are what factors influence the reproductive success of dominant breeders and subordinate nonbreeders within animal societies? A complete understanding of any society requires that these questions be answered for all individuals. 2. The clown anemonefish, Amphiprion percula, forms simple societies that live in close association with sea anemones, Heteractis magnifica. Here, we use data from a well-studied population of A. percula to determine the major predictors of reproductive success of dominant pairs in this species. 3. We analyse the effect of multiple predictors on four components of reproductive success, using a relatively new technique from the field of statistical learning: boosted regression trees (BRTs). BRTs have the potential to model complex relationships in ways that give powerful insight. 4. We show that the reproductive success of dominant pairs is unrelated to the presence, number or phenotype of nonbreeders. This is consistent with the observation that nonbreeders do not help or hinder breeders in any way, confirming and extending the results of a previous study. 5. Primarily, reproductive success is negatively related to male growth and positively related to breeding experience. It is likely that these effects are interrelated because males that grow a lot have little breeding experience. These effects are indicative of a trade-off between male growth and parental investment. 6. Secondarily, reproductive success is positively related to female growth and size. In this population, female size is positively related to group size and anemone size, also. These positive correlations among traits likely are caused by variation in site quality and are suggestive of a silver-spoon effect. 7. Noteworthily, whereas reproductive success is positively related to female size, it is unrelated to male size. 
This observation provides support for the size advantage hypothesis for sex change: both individuals maximize their reproductive success when the larger individual adopts the female tactic. 8. This study provides the most complete picture to date of the factors that predict the reproductive success of dominant pairs of clown anemonefish and illustrates the utility of BRTs for analysis of complex behavioural and evolutionary ecology data. PMID:21284624
Baghi, Quentin; Métris, Gilles; Bergé, Joël; Christophe, Bruno; Touboul, Pierre; Rodrigues, Manuel
2015-03-01
The analysis of physical measurements often copes with highly correlated noises and interruptions caused by outliers, saturation events, or transmission losses. We assess the impact of missing data on the performance of linear regression analysis involving the fit of modeled or measured time series. We show that data gaps can significantly alter the precision of the regression parameter estimation in the presence of colored noise, due to the frequency leakage of the noise power. We present a regression method that cancels this effect and estimates the parameters of interest with a precision comparable to the complete data case, even if the noise power spectral density (PSD) is not known a priori. The method is based on an autoregressive fit of the noise, which allows us to build an approximate generalized least squares estimator approaching the minimal variance bound. The method, which can be applied to any similar data processing, is tested on simulated measurements of the MICROSCOPE space mission, whose goal is to test the weak equivalence principle (WEP) with a precision of 10^-15. In this particular context the signal of interest is the WEP violation signal expected to be found around a well defined frequency. We test our method with different gap patterns and noise of known PSD and find that the results agree with the mission requirements, decreasing the uncertainty by a factor of 60 with respect to ordinary least squares methods. We show that it also provides a test of significance to assess the uncertainty of the measurement.
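The general idea of using an autoregressive noise fit to build an approximate generalized least squares (GLS) estimator can be sketched in a heavily simplified form. The example below assumes gap-free data, a single regressor without intercept, and first-order autoregressive (AR(1)) noise; it is a textbook two-step whitening scheme, not the authors' method, which handles data gaps and a more general noise model. All names and numerical values are hypothetical.

```python
import random


def ols_slope(x, y):
    """Least-squares slope for the no-intercept model y = beta * x + noise."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def ar1_coefficient(residuals):
    """Lag-1 autoregressive coefficient estimated from a residual series."""
    num = sum(residuals[t] * residuals[t - 1] for t in range(1, len(residuals)))
    den = sum(r * r for r in residuals[:-1])
    return num / den


def feasible_gls_slope(x, y):
    """Two-step estimate: OLS fit, AR(1) fit of its residuals, whitened re-fit."""
    beta_ols = ols_slope(x, y)
    resid = [b - beta_ols * a for a, b in zip(x, y)]
    phi = ar1_coefficient(resid)
    # Whitening filter: subtract phi times the previous sample from each sample.
    x_w = [x[t] - phi * x[t - 1] for t in range(1, len(x))]
    y_w = [y[t] - phi * y[t - 1] for t in range(1, len(y))]
    return ols_slope(x_w, y_w), phi


# Synthetic demonstration (assumed values, not mission data).
rng = random.Random(0)
n, true_beta, true_phi = 2000, 2.0, 0.8
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
noise = [0.0] * n
for t in range(1, n):
    noise[t] = true_phi * noise[t - 1] + rng.gauss(0.0, 1.0)
y = [true_beta * a + e for a, e in zip(x, noise)]

beta_gls, phi_hat = feasible_gls_slope(x, y)
print(f"OLS slope: {ols_slope(x, y):.3f}")
print(f"GLS slope: {beta_gls:.3f}, estimated phi: {phi_hat:.3f}")
```

Whitening the data with the estimated noise model makes the remaining errors approximately uncorrelated, which is what allows the second least-squares fit to approach the GLS estimate.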
D'Souza, Sonia; Rasmussen, John
2012-01-01
Background: The next fifty years will see a drastic increase in the older population. Among other effects, ageing causes a decrease in strength. It is necessary to provide safe and comfortable environments for the elderly. To achieve this, digital human modelling has proved to be a useful and valuable ergonomic tool. Objective: To investigate age and gender effects on the torque-producing ability in the knee and elbow in older adults. To create strength scaled equations based on age, gender, upper/lower limb lengths and masses using multiple linear regression. To reduce the number of dependent parameters based on statistical redundancies, and then validate these equations. Methods: 283 subjects (141 males, 142 females) aged 50-59 years (54.9 +/- 2.9), 60-69 years (65.4 +/- 2.9) and 70-79 years (73.7 +/- 2.7) were tested for maximal voluntary isometric torque of right knee extensors and elbow flexors. Results: Males were significantly stronger than females across all age groups. Elbow peak torque (EPT) was better preserved from the 60s to the 70s, whereas knee peak torque (KPT) reduced significantly (P<0.05) across all age groups. This held true for males and females. Gender, thigh mass and age best predicted KPT (R2=0.60). Gender, forearm mass and age best predicted EPT (R2=0.75). Good cross-validation was established for both elbow and knee models. Conclusion: This cross-sectional study of muscle strength created and validated strength scaled equations of EPT and KPT using only gender, segment mass and age.
Kohei Arai
2012-12-01
In order to evaluate the skin surface temperature (SSST) estimation accuracy with MODIS data, 84 MODIS scenes together with match-up data from NCEP/GDAS are used. Through regression analysis, it is found that an RMSE of 0.305 to 0.417 K can be achieved. Furthermore, band 29 is found to be effective for atmospheric correction (a 30.6 to 38.8% improvement in estimation accuracy). If a single coefficient set for the regression equation is used for all cases, the SSST estimation accuracy is around 1.969 K, so a specific coefficient set has to be determined for each of the five different cases.
Sub-pixel estimation of tree cover and bare surface densities using regression tree analysis
Carlos Augusto Zangrando Toneli
2011-09-01
Sub-pixel analysis is capable of generating continuous fields, which represent the spatial variability of certain thematic classes. The aim of this work was to develop numerical models to represent the variability of tree cover and bare surfaces within the study area. This research was conducted in the riparian buffer within a watershed of the São Francisco River in the North of Minas Gerais, Brazil. IKONOS and Landsat TM imagery were used with the GUIDE algorithm to construct the models. The results were two index images derived with regression trees for the entire study area, one representing tree cover and the other representing bare surface. The use of non-parametric and non-linear regression tree models presented satisfactory results to characterize wetland, deciduous and savanna patterns of forest formation.
Identification of cotton properties to improve yarn count quality by using regression analysis
International Nuclear Information System (INIS)
Identification of raw material characteristics affecting yarn count variation was studied using statistical techniques. Regression analysis is used to meet the objective. Stepwise regression is used for model selection, and the coefficient of determination and mean squared error (MSE) criteria are used to identify the cotton properties that contribute to yarn count. The statistical assumptions of normality, autocorrelation and multicollinearity are evaluated using a probability plot, the Durbin-Watson test and the variance inflation factor (VIF), and then model fitting is carried out. It is found that invisible (INV), nepness (Nep), grayness (RD), cotton trash (TR) and uniformity index (VI) are the main cotton properties contributing to yarn count variation. The results are also verified by a Pareto chart. (author)
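Two of the diagnostics named in this abstract have simple closed forms: the Durbin-Watson statistic is the sum of squared successive residual differences divided by the residual sum of squares, and the VIF of predictor j is 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the remaining predictors. The sketch below implements both in plain Python under those standard definitions; the helper names and the sample values are hypothetical and are not taken from the study.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fitted_values(columns, y):
    """OLS fit of y on the given predictor columns plus an intercept."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(design, y)) for p in range(k)]
    beta = solve(xtx, xty)
    return [sum(b * v for b, v in zip(beta, row)) for row in design]


def durbin_watson(residuals):
    """DW statistic; values near 2 indicate little first-order autocorrelation."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    return num / sum(e * e for e in residuals)


def vif(columns, j):
    """Variance inflation factor of predictor j, i.e. 1 / (1 - R_j^2).

    Computed as SS_total / SS_residual of predictor j regressed on the others;
    this fails (division by zero) if predictor j is perfectly collinear.
    """
    target = columns[j]
    others = [c for i, c in enumerate(columns) if i != j]
    fit = fitted_values(others, target)
    mean_t = sum(target) / len(target)
    ss_res = sum((t - f) ** 2 for t, f in zip(target, fit))
    ss_tot = sum((t - mean_t) ** 2 for t in target)
    return ss_tot / ss_res


# Synthetic, illustrative values only (not the cotton data from the study).
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [1.0, 2.0, 3.0, 5.0]
print("VIF of x1:", round(vif([x1, x2], 0), 2))
print("DW:", durbin_watson([1.0, -1.0, 1.0, -1.0]))
```

A large VIF flags a predictor that is nearly a linear combination of the others; a Durbin-Watson value far from 2 flags serially correlated residuals.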
Kinnebrock, Silja; Podolskij, Mark
2008-01-01
This paper introduces a new estimator to measure the ex-post covariation between high-frequency financial time series under market microstructure noise. We provide an asymptotic limit theory (including feasible central limit theorems) for standard methods such as regression, correlation analysis and covariance, for which we obtain the optimal rate of convergence. We demonstrate some positive semidefinite estimators of the covariation and construct a positive semidefinite estimator of the conditional covariance matrix in the central limit theorem. Furthermore, we indicate how the assumptions on the noise process can be relaxed and how our method can be applied to non-synchronous observations. We also present an empirical study of how high-frequency correlations, regressions and covariances change through time.
Analysis of ontogenetic spectra of populations of plants and lichens via ordinal regression
Sofronov, G. Yu.; Glotov, N. V.; Ivanov, S. M.
2015-03-01
Ontogenetic spectra of plants and lichens tend to vary across the populations. This means that if several subsamples within a sample (or a population) were collected, then the subsamples would not be homogeneous. Consequently, the statistical analysis of the aggregated data would not be correct, which could potentially lead to false biological conclusions. In order to take into account the heterogeneity of the subsamples, we propose to use ordinal regression, which is a type of generalized linear regression. In this paper, we study the populations of cowberry Vaccinium vitis-idaea L. and epiphytic lichens Hypogymnia physodes (L.) Nyl. and Pseudevernia furfuracea (L.) Zopf. We obtain estimates for the proportions of between-sample variability in the total variability of the ontogenetic spectra of the populations.
Roberto, Baeza-Serrato; José Antonio, Vázquez-López.
2014-06-01
One of the main assumptions of linear regression analysis is the existence of a causal relationship between the variables analyzed, which regression analysis itself cannot demonstrate. This paper demonstrates the causality between the variables analyzed through the construction and analysis of the feedback between the variables under study, expressed in a causal diagram and validated through dynamic simulation. A major contribution of this research is the proposal to use a system dynamics approach to develop a method of transition from a predictive multiple linear regression model to an explanatory simple nonlinear regression model, which increases the level of prediction of the model. The mean square error (MSE) is taken as the criterion for prediction. The validation of the transition model was performed with three linear regression models obtained experimentally in a textile company, showing a method for increasing the reliability of prediction models.
A quantitative structure-property relationship (QSPR) study was performed to develop models that relate the structures of 150 organic drug compounds to their n-octanol-water partition coefficients (log Po/w). Molecular descriptors were derived solely from the 3D structures of the drug molecules. A genetic algorithm was also applied as a variable selection tool in the QSPR analysis. The models were constructed using 110 molecules as the training set, and predictive ability was tested using 40 compounds. Modeling of log Po/w of these compounds as a function of the theoretically derived descriptors was established by multiple linear regression (MLR). Four descriptors for these compounds, molecular volume (MV) (geometrical), hydrophilic-lipophilic balance (HLB) (constitutional), hydrogen bond forming ability (HB) (electronic) and polar surface area (PSA) (electrostatic), are taken as inputs for the model. The use of descriptors calculated only from molecular structure eliminates the need for experimental determination of properties for use in the correlation and allows for the estimation of log Po/w for molecules not yet synthesized. Application of the developed model to a testing set of 40 organic drug compounds demonstrates that the model is reliable, with good predictive accuracy and a simple formulation. The prediction results are in good agreement with the experimental values. The root mean square error of prediction (RMSEP) and squared correlation coefficient (R2) for the MLR model were 0.22 and 0.99 for the prediction set log Po/w.
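The two figures of merit quoted for the prediction set can be computed as in the sketch below, assuming the usual definitions: RMSEP as the square root of the mean squared difference between observed and predicted values, and R2 as the squared Pearson correlation between them. The function names and the four sample values are hypothetical and do not come from the study.

```python
import math


def rmsep(observed, predicted):
    """Root mean square error of prediction over a held-out set."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def squared_correlation(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)


# Hypothetical log P values for a four-compound prediction set.
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.1, 1.9, 3.2, 3.8]
print(f"RMSEP = {rmsep(obs, pred):.3f}")
print(f"R^2   = {squared_correlation(obs, pred):.3f}")
```

Both quantities are evaluated only on compounds that were not used to fit the model, which is what makes them measures of predictive rather than fitting accuracy.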
International Nuclear Information System (INIS)
Computation of distance to fault on an electrical transmission line is affected by many sources of uncertainty, including parameter setting errors, measurement errors, as well as absence of information and incomplete modelling of a system under fault condition. In this paper we propose an application of the variance-based global sensitivity measures for evaluation of fault location algorithms. The main goal of the evaluation is to identify factors and their interactions that contribute to the fault locator output variability. This analysis is based on the results of Sparse Grid Regression. The method compiles the Functional ANOVA model to represent fault locator output as a function of uncertain factors. The ANOVA model provides a tool for interpretation and sensitivity analysis. In practice, such analysis can help in functional performance tests, especially in: selection of the optimal fault location algorithm (device) for a specific application, calibration process and building confidence in a fault location function result. The paper concludes with an application example which demonstrates use of the proposed methodology in testing and comparing some commonly used fault location algorithms. This example is also used to demonstrate numerical efficiency for this type of application of the proposed Sparse Grid Regression method in comparison to the Quasi-Monte Carlo approach. Highlights: The Sparse Grid Regression (SGR) method has been developed and presented in the paper. The SGR method is able to fit an ANOVA model to input/output data of a black-box function. The SGR provides variance-based sensitivities to be used for Global Sensitivity Analysis (GSA). The SGR algorithm relies on numerical multi-dimensional integration on a sparse grid. The application example presented is GSA of fault-locating algorithms used in electrical networks.
Diversity Performance Analysis on Multiple HAP Networks
Feihong Dong
2015-06-01
One of the main design challenges in wireless sensor networks (WSNs) is achieving a high-data-rate transmission for individual sensor devices. The high altitude platform (HAP) is an important communication relay platform for WSNs and next-generation wireless networks. Multiple-input multiple-output (MIMO) techniques provide the diversity and multiplexing gain, which can improve the network performance effectively. In this paper, a virtual MIMO (V-MIMO) model is proposed by networking multiple HAPs with the concept of multiple assets in view (MAV). In a shadowed Rician fading channel, the diversity performance is investigated. The probability density function (PDF) and cumulative distribution function (CDF) of the received signal-to-noise ratio (SNR) are derived. In addition, the average symbol error rate (ASER) with BPSK and QPSK is given for the V-MIMO model. The system capacity is studied for both perfect channel state information (CSI) and unknown CSI individually. The ergodic capacity with various SNR and Rician factors for different network configurations is also analyzed. The simulation results validate the effectiveness of the performance analysis. It is shown that the performance of the HAPs network in WSNs can be significantly improved by utilizing the MAV to achieve overlapping coverage, with the help of the V-MIMO techniques.
Analysis of designed experiments by stabilised PLS Regression and jack-knifing
Martens, Harald; Høy, M.
2001-01-01
Pragmatical, visually oriented methods for assessing and optimising bi-linear regression models are described, and applied to PLS Regression (PLSR) analysis of multi-response data from controlled experiments. The paper outlines some ways to stabilise the PLSR method to extend its range of applicability to the analysis of effects in designed experiments. Two ways of passifying unreliable variables are shown. A method for estimating the reliability of the cross-validated prediction error RMSEP is demonstrated. Some recently developed jack-knifing extensions are illustrated, for estimating the reliability of the linear and bi-linear model parameter estimates. The paper illustrates how the obtained PLSR "significance" probabilities are similar to those from conventional factorial ANOVA, but the PLSR is shown to give important additional overview plots of the main relevant structures in the multi-response data. The study is part of an ongoing effort to establish a cognitively simple and versatile approach to multivariate data analysis, with reliability assessment based on the data at hand, and with little need for abstract distribution theory [H. Martens, M. Martens, Multivariate Analysis of Quality. An Introduction, Wiley, Chichester, UK, 2001].
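The principle behind jack-knifing a regression parameter (refit with each observation left out, then use the spread of the refitted estimates as a reliability measure) can be sketched with an ordinary straight-line fit. This is a simplified stand-in: the paper applies jack-knifing to PLSR within cross-validation, whereas the example below uses the classical delete-one jackknife on a least-squares slope. The function names and data values are hypothetical.

```python
import math


def slope(x, y):
    """Least-squares slope of y on x for a straight-line fit with intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def jackknife_slope(x, y):
    """Delete-one jackknife: returns (full-data slope, jackknife standard error)."""
    n = len(x)
    # Refit the slope n times, each time leaving one observation out.
    loo = [slope(x[:i] + x[i + 1:], y[:i] + y[i + 1:]) for i in range(n)]
    mean_loo = sum(loo) / n
    variance = (n - 1) / n * sum((b - mean_loo) ** 2 for b in loo)
    return slope(x, y), math.sqrt(variance)


# Synthetic, illustrative values only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
estimate, std_err = jackknife_slope(xs, ys)
print(f"slope = {estimate:.3f} +/- {std_err:.3f} (jackknife SE)")
```

The appeal of the approach, as the abstract notes, is that the reliability estimate comes from the data at hand rather than from distributional assumptions.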
Mackley, Rob D.; Spane, Frank A.; Pulsipher, Trenton C.; Allwardt, Craig H.
2010-09-01
A software tool was created in Fiscal Year 2010 (FY10) that enables multiple-regression correction of well water levels for river-stage effects. This task was conducted as part of the Remediation Science and Technology project of CH2MHILL Plateau Remediation Company (CHPRC). This document contains an overview of the correction methodology and a user's manual for Multiple Regression in Excel (MRCX) v.1.1. It also contains a step-by-step tutorial that shows users how to use MRCX to correct river effects in two different wells. This report is accompanied by an enclosed CD that contains the MRCX installer application and files used in the tutorial exercises.
Quantitative analysis of multiple isotope autoradiography
International Nuclear Information System (INIS)
Recently, in nuclear medicine, many new gamma- and positron-emitting radiopharmaceuticals have been introduced, and their distribution and metabolism need to be evaluated. The use of whole body autoradiography (ARG) provides the high spatial resolution required to determine radiopharmaceutical biodistribution in small animals. The quantitative digital film analysis system using videodensitometry permits analysis of multiple isotope ARG in the same sections of the same animals. The system, the method used and an illustrative example of application of quantitative multiple isotope ARG are described. Simultaneous injection of two tracers can differentiate two physiological processes, for example, blood flow and metabolism, in the same animal, and sequential injection of two tracers can identify differences in a process in normal and diseased states, or differences in the same process sampled at two times.
Torres-Valencia, Cristian A; Alvarez, Mauricio A; Orozco-Gutierrez, Alvaro A
2014-08-01
Human emotion recognition (HER) allows the assessment of an affective state of a subject. Until recently, such emotional states were described in terms of discrete emotions, like happiness or contempt. In order to cover a wide range of emotions, researchers in the field have introduced different dimensional spaces for emotion description that allow the characterization of affective states in terms of several variables or dimensions that measure distinct aspects of the emotion. One of the most common of such dimensional spaces is the bidimensional Arousal/Valence space. To the best of our knowledge, all HER systems so far have modelled the dimensions in these dimensional spaces independently. In this paper, we study the effect of modelling the output dimensions simultaneously and show experimentally the advantages of modelling them in this way. We consider a multimodal approach by including features from the Electroencephalogram and a few physiological signals. For modelling the multiple outputs, we employ a multiple output regressor based on support vector machines. We also include a stage of feature selection that is developed within an embedded approach known as Recursive Feature Elimination (RFE), proposed initially for SVM. The results show that several features can be eliminated using the multiple output support vector regressor with RFE without affecting the performance of the regressor. From the analysis of the features selected in smaller subsets via RFE, it can be observed that the signals that are most informative for discrimination in the arousal and valence space are the EEG, Electrooculogram/Electromyogram (EOG/EMG) and the Galvanic Skin Response (GSR). PMID:25570122
Waller, M. C.
1976-01-01
An electro-optical device called an oculometer which tracks a subject's lookpoint as a time function has been used to collect data in a real-time simulation study of instrument landing system (ILS) approaches. The data describing the scanning behavior of a pilot during the instrument approaches have been analyzed by use of a stepwise regression analysis technique. A statistically significant correlation between pilot workload, as indicated by pilot ratings, and scanning behavior has been established. In addition, it was demonstrated that parameters derived from the scanning behavior data can be combined in a mathematical equation to provide a good representation of pilot workload.
Relating principal component analysis on merged data sets to a regression approach
Meyners, Michael; Qannari, El Mostafa
2001-01-01
A method for calculating a consensus of several data matrices on the same samples using a PCA is based on a mathematical background. We propose a model to describe the data which might be obtained e.g. by means of a free choice profiling or a fixed vocabulary in a sensory profiling framework. A regression approach for this model leads to a Principal Component Analysis on Merged Data sets (PCAMD), which provides a simple method to calculate a consensus from the data. Since we use less restric...
Analysis of reactor noise by multi-variate auto-regressive model
International Nuclear Information System (INIS)
The multi-variate auto-regressive model has recently been applied to the noise analysis of nuclear reactor systems. From such a standpoint a system identification study was performed at the Japan Power Demonstration Reactor-2 (JPDR-2), 45 MWt, using pseudo-random signals. The aim of this paper is to further extend and refine this identification problem based on the measured data. Emphasis is on the fact that the results obtained by the non-parametric method can be justified by the parametric one. The feedback map is also elucidated by estimating the noise contribution rate. Results of computation show the effectiveness of the procedure. (author)
International Nuclear Information System (INIS)
Statistical analysis of the properties of powder, compacts and wire of tungsten VA was made to determine optimum conditions of plastic working of tungsten and its alloys. Data were collected on 29 parameters and processed on a ''Minsk-22'' computer. Correlations were found between wire structure and such factors as hardness and density of compacts, fractional composition and volume weight of powder, and others. A regression equation was obtained which connects the structure of 0.52 mm wire with a number of parameters of the initial material.
Mandal, Nilrudra; Doloi, Biswanath; Mondal, Biswanath
2015-05-01
In the present study, an attempt has been made to apply the Taguchi parameter design method and regression analysis for optimizing the cutting conditions on surface finish while machining AISI 4340 steel with the help of the newly developed yttria based Zirconia Toughened Alumina (ZTA) inserts. These inserts are prepared through wet chemical co-precipitation route followed by powder metallurgy process. Experiments have been carried out based on an orthogonal array L9 with three parameters (cutting speed, depth of cut and feed rate) at three levels (low, medium and high). Based on the mean response and signal to noise ratio (SNR), the best optimal cutting condition has been arrived at A3B1C1 i.e. cutting speed is 420 m/min, depth of cut is 0.5 mm and feed rate is 0.12 m/min considering the condition smaller is the better approach. Analysis of Variance (ANOVA) is applied to find out the significance and percentage contribution of each parameter. The mathematical model of surface roughness has been developed using regression analysis as a function of the above mentioned independent variables. The predicted values from the developed model and experimental values are found to be very close to each other justifying the significance of the model. A confirmation run has been carried out with 95 % confidence level to verify the optimized result and the values obtained are within the prescribed limit.
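The "smaller is the better" criterion mentioned above corresponds to the standard Taguchi signal-to-noise ratio, S/N = -10 log10(mean(y^2)). A minimal sketch follows; the roughness values are hypothetical and are not the study's measurements.

```python
import math

def sn_smaller_is_better(values):
    """Taguchi signal-to-noise ratio in dB for a smaller-the-better response."""
    mean_square = sum(v * v for v in values) / len(values)
    return -10.0 * math.log10(mean_square)

# Hypothetical surface-roughness replicates (micrometres) for two trial settings.
trial_a = [1.0, 1.0, 1.0]
trial_b = [2.0, 2.0, 2.0]

sn_a = sn_smaller_is_better(trial_a)
sn_b = sn_smaller_is_better(trial_b)
```

The setting with the higher S/N ratio (here `trial_a`) is preferred, since a higher ratio means a smaller and less variable response.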
Spady, Richard; Stouli, Sami
2012-01-01
We propose an alternative ('dual regression') to the quantile regression process for the global estimation of conditional distribution functions under minimal assumptions. Dual regression provides all the interpretational power of the quantile regression process while largely avoiding the need for 'rearrangement' to repair the intersecting conditional quantile surfaces that quantile regression often produces in practice. Dual regression can be appropriately modified to provi...
International Nuclear Information System (INIS)
An uncertainty analysis method is proposed here, which uses Fourier Amplitude Sensitivity Test (FAST) and Stepwise Regression Technique (SRT). This method is a compromise between the approximation method [response surface method (RSM) or moments method] and Monte Carlo method (MCM). It is concluded that: 1. FAST gives the partial variance for each input parameter, which can be used as global sensitivity ranking between input parameters, with moderate sampling point compared to crude MCM. 2. SRT is a good tool to construct the later-used first- or second-order response surface model consisting of comparatively important parameters. 3. The combined uncertainty analysis method using FAST and SRT can be used for uncertainty/sensitivity analysis of the large computer codes with moderate cost and it will be a useful tool to analyze the feasibility of the newly developed, highly uncertain system models
A Bayesian Quantile Regression Analysis of Potential Risk Factors for Violent Crimes in USA
Ming Wang; Lijun Zhang
2012-01-01
Bayesian quantile regression has drawn more attention in widespread applications recently. Yu and Moyeed (2001) proposed an asymmetric Laplace distribution to provide likelihood based mechanism for Bayesian inference of quantile regression models. In this work, the primary objective is to evaluate the performance of Bayesian quantile regression compared with simple regression and quantile regression through simulation and with application to a crime dataset from 50 USA states for assessing th...
Doreswamy; Vastrad, Chanabasayya M.
2013-01-01
Regularized regression techniques for linear regression have been developed over the last few decades to reduce the flaws of ordinary least squares regression with regard to prediction accuracy. In this paper, new methods for using regularized regression in model choice are introduced, and we distinguish the conditions in which regularized regression improves our ability to discriminate models. We applied all the five methods that use penalty-based (regularization) shrinkage to h...
Rosana de Cassia de Souza Schneider
2011-03-01
Air is an efficient medium for dispersing atmospheric pollutants, and its behavior depends on the atmospheric movements that occur in the troposphere. In Porto Alegre, Rio Grande do Sul State, there is heavy daily traffic and a concentration of industries that may be responsible for atmospheric emissions. In the present work we studied the behavior of daily concentrations of particulate matter (PM10) in this city, considering the influence of meteorological variables. Data analysis was based on descriptive statistics, linear correlation and multiple regression. Data were provided by the State Foundation of Environmental Protection Henrique Luiz Roessler - RS (FEPAM) and the National Institute of Meteorology (INMET). The analysis showed that the concentrations of PM10, measured daily at 4:00 p.m., did not exceed national air quality standards; the meteorological elements that influenced PM10 concentrations were the daily average wind speed and the daily average radiation, with negative relations, and the daily average air temperature and the north and northwest wind directions, with positive relations. The wind directions that contribute significantly to lower concentrations at the measured sites are east and southeast.
Keerthiprasad.K
2014-08-01
In recent years, alloy steels have been widely used in aerospace and automotive industries. Machining of these materials requires better understanding of cutting processes regarding accuracy and efficiency. This study addresses the modelling of the machinability of EN353 and 20MnCr5 materials. In this study, multiple regression analysis (MRA) is used to investigate the influence of some parameters on the thrust force and torque in the drilling processes of alloy steel materials. The models were identified by using cutting speed, feed rate, and depth as input data and the thrust force and torque as the output data. The statistical analysis showed that cutting feed (f) was the most significant parameter in the drilling process, while spindle speed seemed insignificant. Since the spindle speed was insignificant, it can be set either at the highest value to obtain a high material removal rate or at the lowest value to prolong tool life, depending on the needs of the application. The mathematical model is based on power regression modelling, dependent on the three above-mentioned parameters.
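A power regression of the kind described can be fitted by taking logarithms, which turns response = C * speed^a * feed^b * depth^c into an ordinary linear least-squares problem. The sketch below assumes that form; the function name `fit_power_model`, the exponents, and the synthetic data are illustrative choices, not values from the study.

```python
import numpy as np

def fit_power_model(speed, feed, depth, response):
    """Fit response = C * speed**a * feed**b * depth**c via least squares on logs."""
    X = np.column_stack([
        np.ones(len(speed)),
        np.log(speed),
        np.log(feed),
        np.log(depth),
    ])
    coef, *_ = np.linalg.lstsq(X, np.log(response), rcond=None)
    return np.exp(coef[0]), coef[1], coef[2], coef[3]

# Synthetic, noise-free drilling data generated from known (hypothetical) exponents.
rng = np.random.default_rng(0)
speed = rng.uniform(10.0, 40.0, 30)
feed = rng.uniform(0.05, 0.30, 30)
depth = rng.uniform(5.0, 20.0, 30)
thrust = 500.0 * speed**-0.02 * feed**0.8 * depth**0.3

C, a, b, c = fit_power_model(speed, feed, depth, thrust)
```

Because the synthetic data are noise-free, the fit recovers C = 500 and the exponents -0.02, 0.8 and 0.3. With measured data the fit minimizes error in log space, which weights relative rather than absolute deviations.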
Jalal, Hawre; Goldhaber-Fiebert, Jeremy D; Kuntz, Karen M
2015-07-01
Decision makers often desire both guidance on the most cost-effective interventions given current knowledge and also the value of collecting additional information to improve the decisions made (i.e., from value of information [VOI] analysis). Unfortunately, VOI analysis remains underused due to the conceptual, mathematical, and computational challenges of implementing Bayesian decision-theoretic approaches in models of sufficient complexity for real-world decision making. In this study, we propose a novel practical approach for conducting VOI analysis using a combination of probabilistic sensitivity analysis, linear regression metamodeling, and the unit normal loss integral function, a parametric approach to VOI analysis. We adopt a linear approximation and leverage a fundamental assumption of VOI analysis, which requires that all sources of prior uncertainties be accurately specified. We provide examples of the approach and show that the assumptions we make do not induce substantial bias but greatly reduce the computational time needed to perform VOI analysis. Our approach avoids the need to analytically solve or approximate joint Bayesian updating, requires only one set of probabilistic sensitivity analysis simulations, and can be applied in models with correlated input parameters. PMID:25840900
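To illustrate how a linear metamodel and the unit normal loss integral combine, here is a minimal sketch for the simplest case: two strategies and one parameter of interest. It is not the authors' implementation. The incremental net benefit (INB) from a probabilistic sensitivity analysis is regressed on the parameter, and the fitted values are treated as normal with mean m and standard deviation s, so that the expected value of partial perfect information is s * L(|m|/s), where L(z) = pdf(z) - z * (1 - cdf(z)). All names and numbers below are hypothetical.

```python
import math
import numpy as np

def unit_normal_loss(z):
    """Unit normal loss integral L(z) = pdf(z) - z * (1 - cdf(z))."""
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return pdf - z * (1.0 - cdf)

def evppi_linear(theta, inb):
    """EVPPI for one parameter, two strategies, via a linear metamodel.

    Returns the parametric (normal-approximation) value and, for comparison,
    the direct sample estimate computed from the same fitted values.
    """
    slope, intercept = np.polyfit(theta, inb, 1)
    fitted = intercept + slope * theta
    m = float(fitted.mean())
    s = float(fitted.std())
    parametric = s * unit_normal_loss(abs(m) / s)
    empirical = float(np.maximum(fitted, 0.0).mean() - max(m, 0.0))
    return parametric, empirical

# Hypothetical PSA output: INB depends linearly on two uncertain parameters.
rng = np.random.default_rng(1)
n = 20000
theta1 = rng.normal(0.0, 1.0, n)
theta2 = rng.normal(0.0, 1.0, n)
inb = 500.0 + 1000.0 * theta1 + 300.0 * theta2

parametric, empirical = evppi_linear(theta1, inb)
```

The two estimates should agree closely here because `theta1` is normal and the model is linear. The sketch needs only one set of PSA draws, which is the practical advantage the abstract emphasizes.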
Regression analysis of growth responses to water depth in three wetland plant species
DEFF Research Database (Denmark)
Sorrell, Brian K; Tanner, Chris C
2012-01-01
Background and aims Plant species composition in wetlands and on lakeshores often shows dramatic zonation which is frequently ascribed to differences in flooding tolerance. This study compared the growth responses to water depth of three species (Phormium tenax, Carex secta, Typha orientalis) differing in depth preferences in wetlands, using non-linear and quantile regression analyses to establish how flooding tolerance can explain field zonation. Methodology Plants were established for 8 months in outdoor cultures in waterlogged soil without standing water, and then randomly allocated to water depths from 0 – 0.5 m. Morphological and growth responses to depth were followed for 54 days before harvest, and then analysed by repeated measures analysis of covariance, and non-linear and quantile regression analysis (QRA), to compare flooding tolerances. Principal results Growth responses to depth differed between the three species, and were non-linear. P. tenax growth rapidly decreased in standing water > 0.25 m depth, C. secta growth increased initially with depth but then decreased at depths > 0.30 m, accompanied by increased shoot height and decreased shoot density, and T. orientalis was unaffected by the 0 – 0.50 m depth range. In P. tenax the decrease in growth was associated with a decrease in the number of leaves produced per ramet and in C. secta the effect of water depth was greatest for the tallest shoots. Allocation patterns were unaffected by depth. Conclusions The responses are consistent with the principle that zonation in the field is primarily structured by competition in shallow water and by physiological flooding tolerance in deep water. Regression analyses, especially QRA, proved to be powerful tools in distinguishing genuine phenotypic responses to water depth from non-phenotypic variation due to size and developmental differences.
M.I. Rizwan Jamal
2012-01-01
Medium Density Fiberboard (MDF) panels are appropriate for many exterior and interior industrial applications. The degree of surface roughness of MDF plays an important role, since any surface irregularities will affect the final quality of the product. In the present study, regression models were developed to predict surface roughness in drilling MDF panels with carbide step drills. In the development of predictive models, the drilling parameters of spindle speed, feed rate and drill diameter were considered as model variables. For this purpose, Taguchi’s design of experiments was carried out in order to collect surface roughness values. The Orthogonal Array (OA) and Analysis of Variance (ANOVA) are employed to study the surface roughness characteristics in the drilling of MDF panels. The objective is to establish a correlation between spindle speed, feed rate and drill diameter and surface roughness in an MDF panel. The experiments are conducted as per the Taguchi L27 orthogonal array with different cutting conditions. ANOVA and the F-test were used to check the validity of the regression model and to determine the significant parameters affecting the surface roughness. The statistical analysis showed that the feed rate was the most influential parameter on surface roughness. The microstructure of drilled surfaces was also studied by scanning electron microscopy (SEM). The SEM investigations revealed that drilling MDF panels with a step drill produces surface striations and waviness, which increased significantly with feed rate.
International Nuclear Information System (INIS)
Non-invasively reconstructing the transmembrane potentials (TMPs) from body surface potentials (BSPs) constitutes one form of the inverse ECG problem that can be treated as a regression problem with multi-inputs and multi-outputs, and which can be solved using the support vector regression (SVR) method. In developing an effective SVR model, feature extraction is an important task for pre-processing the original input data. This paper proposes the application of principal component analysis (PCA) and kernel principal component analysis (KPCA) to the SVR method for feature extraction. Also, the genetic algorithm and simplex optimization method are invoked to determine the hyper-parameters of the SVR. Based on the realistic heart-torso model, the equivalent double-layer source method is applied to generate the data set for training and testing the SVR model. The experimental results show that the SVR method with feature extraction (PCA-SVR and KPCA-SVR) can perform better than that without feature extraction (single SVR) in terms of the reconstruction of the TMPs on epi- and endocardial surfaces. Moreover, compared with the PCA-SVR, the KPCA-SVR features good approximation and generalization ability when reconstructing the TMPs.
Survival regression analysis: a powerful tool for evaluating fighting and assessment.
Moya-Laraño; Wise
2000-09-01
Theoretical models of animal contests frequently generate predictions about how asymmetries (e.g. differences in size, residence status) between contestants affect fight duration. Linear regression and nonparametric correlation analyses are commonly used to test the fit of data to such models. We show how survival regression analysis (SRA) is a powerful technique for studying the effect of asymmetries on the duration of contests. SRA, which is under-utilized by students of animal behaviour, offers several advantages over more frequently used procedures. It provides unbiased parameter estimates even when including censored data (i.e. results of contests that have not ended at the time when observations are stopped). The analysis of hazard functions, which is a component of SRA, is an easy way to test for consistency with predictions of the sequential assessment game model. These and other advantages of SRA are illustrated by using SRA and more conventional methods to analyse the effect of asymmetries on contest duration for encounters between female Mediterranean tarantulas, Lycosa tarentula (L.). It is hoped that this example of the advantages of SRA will encourage more widespread use of this powerful technique. Copyright 2000 The Association for the Study of Animal Behaviour. PMID:11007639
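The point about censored data can be illustrated with the simplest survival model, a constant hazard (exponential durations) with no covariates. This is a sketch under that assumption, not the analysis used in the paper. The maximum-likelihood hazard is the number of contests observed to end divided by the total observed time, so unfinished contests still contribute their observed duration. The data below are simulated.

```python
import numpy as np

def exponential_rate_mle(durations, ended):
    """Constant-hazard MLE: observed endings divided by total observed time."""
    durations = np.asarray(durations, dtype=float)
    ended = np.asarray(ended, dtype=bool)
    return ended.sum() / durations.sum()

# Simulated contest durations with a true ending rate of 0.1 per minute;
# observation stops at 15 minutes, so longer contests are censored.
rng = np.random.default_rng(2)
true_rate = 0.1
full = rng.exponential(1.0 / true_rate, 5000)
cutoff = 15.0
observed = np.minimum(full, cutoff)
ended = full <= cutoff

rate_censor_aware = exponential_rate_mle(observed, ended)
# Naive alternative: discard unfinished contests, which overstates the rate.
rate_naive = 1.0 / observed[ended].mean()
```

The censoring-aware estimate stays close to the true rate, while the naive estimate is biased upward because it sees only the shorter contests. A regression version would let the rate depend on covariates such as size asymmetry.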
Integrative Data Analysis: The Simultaneous Analysis of Multiple Data Sets
Curran, Patrick J.; Hussong, Andrea M.
2009-01-01
There are both quantitative and methodological techniques that foster the development and maintenance of a cumulative knowledge base within the psychological sciences. Most noteworthy of these techniques is meta-analysis, which allows for the synthesis of summary statistics drawn from multiple studies when the original data are not available.…
International Nuclear Information System (INIS)
The observation of the equipment and piping systems installed in an operating nuclear power plant during earthquakes is very important for evaluating and confirming the adequacy and the safety margin expected in the design stage. By analyzing observed earthquake records, valuable data can be obtained concerning the behavior of these systems in earthquakes, along with information about their aseismic design parameters. From these viewpoints, an earthquake observation system was installed in a reactor building in an operating plant. Up to now, the records of three earthquakes have been obtained with this system. In this paper, an example of the analysis of earthquake records is shown; the main purpose of the analysis was the evaluation of the vibration mode, natural frequency and damping factor of the piping system. Prior to the earthquake record analysis, the eigenvalue analysis for this piping system was performed. Auto-regressive analysis was applied to the observed acceleration time history obtained from a piping system installed in an operating BWR. The results of the earthquake record analysis agreed well with the results of the eigenvalue analysis. (Kako, I.)
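As a rough illustration of how an auto-regressive fit yields a natural frequency and damping factor, the sketch below fits an AR(2) model to a synthetic free-decay signal and converts the discrete-time poles to continuous-time modal parameters. It assumes a single mode and no input excitation, which is far simpler than a real earthquake record. The function name and signal parameters are hypothetical.

```python
import numpy as np

def ar2_modal_parameters(x, dt):
    """Fit x[n] = a1*x[n-1] + a2*x[n-2]; return (natural frequency in Hz, damping ratio)."""
    X = np.column_stack([x[1:-1], x[:-2]])
    a1, a2 = np.linalg.lstsq(X, x[2:], rcond=None)[0]
    # Poles are the roots of z**2 - a1*z - a2 = 0.
    pole = complex(np.roots([1.0, -a1, -a2])[0])
    # Map the discrete-time pole to the continuous-time s-plane.
    s = np.log(pole) / dt
    omega_n = abs(s)
    zeta = -s.real / omega_n
    return omega_n / (2.0 * np.pi), zeta

# Synthetic free-decay response: 5 Hz mode with 3 % damping, sampled at 100 Hz.
dt = 0.01
t = np.arange(400) * dt
f_n, zeta_true = 5.0, 0.03
omega_n = 2.0 * np.pi * f_n
omega_d = omega_n * np.sqrt(1.0 - zeta_true**2)
signal = np.exp(-zeta_true * omega_n * t) * np.cos(omega_d * t)

freq_hz, zeta_est = ar2_modal_parameters(signal, dt)
```

A noise-free damped cosine satisfies the AR(2) recursion exactly, so the fit recovers 5 Hz and 3 % damping. Measured records with several modes and measurement noise would need a higher model order and some care in selecting the physical poles.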
Metin Akay
2006-01-01
In this study, the economic factors affecting the import of forest industry products were investigated. Data used in the time series analysis covered a period of 18 years from 1985 to 2002. A double-log linear function was used to analyze the import model. The imported forest industry products in Turkey were considered to be a function of domestic production value, domestic prices, national income per capita, lagged import value (t-1), exchange rate (TL/$) and export values. The parameters were evaluated using a regression analysis. The results indicated that imported forest industry products in Turkey have largely been affected by national income per capita, domestic prices, export values and exchange-rate variables.
Statistical learning method in regression analysis of simulated positron spectral data
Positron lifetime spectroscopy is a non-destructive tool for detection of radiation induced defects in nuclear reactor materials. This work concerns the applicability of the support vector machines method for the input data compression in the neural network analysis of positron lifetime spectra. It has been demonstrated that the SVM technique can be successfully applied to regression analysis of positron spectra. A substantial data compression of about 50 % and 8 % of the whole training set with two and three spectral components respectively has been achieved including a high accuracy of the spectra approximation. However, some parameters in the SVM approach such as the insensitivity zone ε and the penalty parameter C have to be chosen carefully to obtain a good performance. (author)
Junek, W. N.; Jones, W. L.; Woods, M. T.
2011-12-01
An automated event tree analysis system for estimating the probability of short term volcanic activity is presented. The algorithm is driven by a suite of empirical statistical models that are derived through logistic regression. Each model is constructed from a multidisciplinary dataset that was assembled from a collection of historic volcanic unrest episodes. The dataset consists of monitoring measurements (e.g. InSAR, seismic), source modeling results, and historic eruption activity. This provides a simple mechanism for simultaneously accounting for the geophysical changes occurring within the volcano and the historic behavior of analog volcanoes. The algorithm is extensible and can be easily recalibrated to include new or additional monitoring, modeling, or historic information. Standard cross validation techniques are employed to optimize its forecasting capabilities. Analysis results from several recent volcanic unrest episodes are presented.
Within-session analysis of the extinction of pavlovian fear-conditioning using robust regression
Directory of Open Access Journals (Sweden)
Vargas-Irwin, Cristina
2010-06-01
Traditionally, the analysis of extinction data in fear conditioning experiments has involved the use of standard linear models, mostly ANOVA of between-group differences of subjects that have undergone different extinction protocols, pharmacological manipulations or some other treatment. Although some studies report individual differences in quantities such as suppression rates or freezing percentages, these differences are not included in the statistical modeling. Within-subject response patterns are then averaged using coarse-grain time windows which can overlook these individual performance dynamics. Here we illustrate an alternative analytical procedure consisting of 2 steps: the estimation of a trend for within-session data and analysis of group differences in trend as main outcome. This procedure is tested on real fear-conditioning extinction data, comparing trend estimates via Ordinary Least Squares (OLS) and robust Least Median of Squares (LMS) regression estimates, as well as comparing between-group differences and analyzing mean freezing percentage versus LMS slopes as outcomes.
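The contrast between OLS and LMS trend estimates can be sketched as follows. For a straight line, a common approximate LMS algorithm tries the line through every pair of points and keeps the one with the smallest median squared residual. The implementation and the freezing percentages below are illustrative assumptions, not the study's procedure or data.

```python
from itertools import combinations
import numpy as np

def lms_line(x, y):
    """Approximate Least Median of Squares line via exhaustive point pairs."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # vertical line, slope undefined
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        criterion = np.median((y - (intercept + slope * x)) ** 2)
        if best is None or criterion < best[0]:
            best = (criterion, slope, intercept)
    return best[1], best[2]

# Hypothetical within-session freezing percentages over 12 trials:
# a steady decline of 5 points per trial, with two aberrant late trials.
trials = np.arange(12, dtype=float)
freezing = 80.0 - 5.0 * trials
freezing[[10, 11]] = [90.0, 95.0]

lms_slope, lms_intercept = lms_line(trials, freezing)
ols_slope, ols_intercept = np.polyfit(trials, freezing, 1)
```

The LMS slope recovers the underlying decline of 5 points per trial because more than half of the trials lie on that line, while the OLS slope is pulled close to zero by the two outlying trials.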
Modelling and analysis of turbulent datasets using Auto Regressive Moving Average processes
Energy Technology Data Exchange (ETDEWEB)
Faranda, Davide, E-mail: davide.faranda@cea.fr; Dubrulle, Bérengère; Daviaud, François [Laboratoire SPHYNX, Service de Physique de l' Etat Condensé, DSM, CEA Saclay, CNRS URA 2464, 91191 Gif-sur-Yvette (France); Pons, Flavio Maria Emanuele [Dipartimento di Scienze Statistiche, Universitá di Bologna, Via delle Belle Arti 41, 40126 Bologna (Italy); Saint-Michel, Brice [Institut de Recherche sur les Phénomènes Hors Equilibre, Technopole de Chateau Gombert, 49 rue Frédéric Joliot Curie, B.P. 146, 13 384 Marseille (France); Herbert, Éric [Université Paris Diderot - LIED - UMR 8236, Laboratoire Interdisciplinaire des Énergies de Demain, Paris (France); Cortet, Pierre-Philippe [Laboratoire FAST, CNRS, Université Paris-Sud (France)
2014-10-15
We introduce a novel way to extract information from turbulent datasets by applying an Auto Regressive Moving Average (ARMA) statistical analysis. Such analysis goes well beyond the analysis of the mean flow and of the fluctuations and links the behavior of the recorded time series to a discrete version of a stochastic differential equation which is able to describe the correlation structure in the dataset. We introduce a new index Υ that measures the difference between the resulting analysis and the Obukhov model of turbulence, the simplest stochastic model reproducing both Richardson law and the Kolmogorov spectrum. We test the method on datasets measured in a von Kármán swirling flow experiment. We found that the ARMA analysis is well correlated with spatial structures of the flow, and can discriminate between two different flows with comparable mean velocities, obtained by changing the forcing. Moreover, we show that Υ is highest in regions where shear layer vortices are present, thereby establishing a link between deviations from the Kolmogorov model and coherent structures. These deviations are consistent with the ones observed by computing the Hurst exponents for the same time series. We show that some salient features of the analysis are preserved when considering global instead of local observables. Finally, we analyze flow configurations with multistability features where the ARMA technique is efficient in discriminating different stability branches of the system.
Jakšić, Uroš G.; Arsić, Nebojša B.; Fetahović, Irfan S.; Stanković, Koviljka Đ.
2014-01-01
This paper deals with the analysis of correlation and regression between the parameters of particle ionizing radiation and the stability characteristics of the irradiated monocrystalline silicon film. Based on the presented theoretical model of correlation and linear regression between two random variables, numeric and real experiments were performed. In the numeric experiment, a simulation of the effect of alpha radiation on a thin layer of monocrystalline...
Dasgupta, Abhijit; Sun, Yan V.; König, Inke R.; Bailey-Wilson, Joan E.; Malley, James D.
2011-01-01
Genetics Analysis Workshop 17 provided common and rare genetic variants from exome sequencing data and simulated binary and quantitative traits in 200 replicates. We provide a brief review of the machine learning and regression-based methods used in the analyses of these data. Several regression and machine learning methods were used to address different problems inherent in the analyses of these data, which are high-dimension, low-sample-size data typical of many genetic association studies....
Shuai Wang
2014-10-01
Accurate prediction of the remaining useful life (RUL) of lithium-ion batteries is important for battery management systems. Traditional empirical data-driven approaches for RUL prediction usually require multidimensional physical characteristics including the current, voltage, usage duration, battery temperature, and ambient temperature. A capacity fading analysis of lithium-ion batteries shows that the energy efficiency and battery working temperature are closely related to capacity degradation, and that together they account for all performance metrics of lithium-ion batteries with regard to the RUL and for the relationships between some performance metrics. Thus, we devise a non-iterative prediction model based on flexible support vector regression (F-SVR) and an iterative multi-step prediction model based on support vector regression (SVR), using the energy efficiency and battery working temperature as input physical characteristics. The experimental results show that the proposed prognostic models achieve high prediction accuracy while using fewer input dimensions than the traditional empirical models.
Regression Analysis of Thermal Conductivity Based on Measurements of Compacted Graphite Irons
Selin, Martin; König, Mathias
2009-12-01
A model describing the thermal conductivity of compacted graphite iron (CGI) was created based on the microstructure analysis and thermal conductivity measurements of 76 compacted graphite samples. The thermal conductivity was measured using a laser flash apparatus for seven temperatures ranging between 35 °C and 600 °C. The model was created by solving a linear regression model taking into account the influence of carbon and silicon additions, nodularity, and fractions of ferrite and carbide constituents. Observations and the results from the model indicated a positive influence of the fraction of ferrite in the metal matrix on the thermal conductivity. Increasing the amount of carbon addition while keeping the CE value constant, i.e., at the same time reducing the silicon addition, had a positive effect on the thermal conductivity value. Nodularity is known to reduce the thermal conductivity and this was also confirmed. The fraction of carbides was low in the samples, making their influence slight. A comparison of the thermal conductivity values calculated from the model with measured values showed a good agreement, even on materials not used to solve the linear regression model.
???
2012-06-01
Prediction of water consumption structure on the basis of the relationship between water consumption structure and industrial structure is essential to the exploitation and utilization of water resources. Based on the symmetrical logratio transformation and partial least-squares (PLS) regression, a linear regression model for water consumption structure and industrial structure in Fujian Province is developed in this study. Analysis of the model showed that the compositional data of water consumption structure and industrial structure in Fujian Province had an obvious linear relationship. The model fit the data very well with high accuracy and can be used to predict water consumption structure. Agricultural water was highly correlated with primary industry, and so was industrial water with secondary industry. Agricultural water showed a significantly negative correlation with secondary industry and tertiary industry. The variation of domestic water had an insignificant correlation with industrial structure. The capacity of the industrial structure factors to explain water consumption structure was in the order primary industry > secondary industry > tertiary industry.
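The logratio step can be sketched as follows, assuming the "symmetrical logratio transformation" refers to the centred log-ratio commonly used for compositional data, in which each share is compared with the geometric mean of all shares. The function names and the three shares are hypothetical and are not taken from the Fujian data.

```python
import math


def centered_logratio(parts):
    """Map a composition with positive parts to log-ratios against its geometric mean."""
    if any(p <= 0 for p in parts):
        raise ValueError("all parts must be strictly positive")
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def inverse_clr(coords):
    """Map log-ratio coordinates back to shares that sum to one."""
    exps = [math.exp(c) for c in coords]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical water-use shares: agricultural, industrial, domestic.
shares = [0.55, 0.30, 0.15]
coords = centered_logratio(shares)
recovered = inverse_clr(coords)
print(coords, recovered)
```

The transformed coordinates sum to zero and are unconstrained, which is what makes them usable as ordinary regression variables before mapping predictions back to shares.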
Significance Test Algorithm of Crowd Flow in Public Fitness Areas Based On Regression Analysis
Guanghong Liu
2013-01-01
An increase in the number of people entering and leaving fitness venues indirectly reflects an increase in the number of people exercising. Fitness venues often have cameras installed, and their daily video recordings can serve as raw data on the health situation in the area. To count people more reliably in densely populated fitness venues, video analysis methods should become both more accurate and easier to apply. In venues with higher population density, mutual occlusion between targets is more pronounced, which makes it harder to detect and track individuals in a crowded area and to acquire the body's movement trajectory precisely. After studying the characteristics of the video subject (crowd flow), this study establishes a linear regression model to estimate crowd flow. The study first introduces the principle of video motion segmentation and the extraction method for eight categories of image features, then discusses the principles of regression estimation and the significance test approach, and finally verifies the reasonableness of the theoretical model against data, which provides a theoretical basis for video analysis and a better technical foundation for regional public fitness studies.
The Use of Logistic Regression in the Analysis of Data Concerning Good Medical Practice
Directory of Open Access Journals (Sweden)
2002-06-01
Logistic regression is one of the commonly used models of explanatory multivariate analysis in epidemiology. Its use, which has become easier with modern statistical software, allows researchers to control for confounding bias. It measures the odds ratio, a quantification of the association between a given occurrence, represented by a dichotomous variable, and factors likely to influence it, represented by explanatory variables. The choice of explanatory variables integrated into the model is based on previous information on the study subject and is aimed at avoiding the confounding factors which have already been identified. The authors explain the fundamental principles of logistic regression and the steps involved in its application. By using two examples (the quality of the follow-up care given to diabetics and in-hospital mortality after acute myocardial infarction), they demonstrate the value this statistical tool can have in studies performed by the medical service of the national health care fund, particularly in studies designed to evaluate professional practice.
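To make the odds ratio concrete, the sketch below computes it from a 2x2 table together with a Wald confidence interval on the log scale. In a logistic model with a single binary explanatory variable, the exponential of the fitted coefficient equals this ratio. The counts and function names are hypothetical and do not come from the two examples in the abstract.

```python
import math


def odds_ratio(exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases):
    """Odds of the outcome among the exposed divided by the odds among the unexposed."""
    odds_exposed = exposed_cases / exposed_noncases
    odds_unexposed = unexposed_cases / unexposed_noncases
    return odds_exposed / odds_unexposed


def wald_interval(a, b, c, d, z=1.96):
    """Approximate 95% interval for the odds ratio, built on the log-odds scale."""
    log_or = math.log(odds_ratio(a, b, c, d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical counts: 30 of 100 exposed and 15 of 100 unexposed patients had the outcome.
ratio = odds_ratio(30, 70, 15, 85)
low, high = wald_interval(30, 70, 15, 85)
print(f"OR = {ratio:.2f}, 95% CI {low:.2f} to {high:.2f}")
```

A multivariable logistic model generalizes this by reporting one adjusted odds ratio per explanatory variable, each conditional on the others.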
A cautionary note on the use of EESC-based regression analysis for ozone trend studies
Kuttippurath, J.; Bodeker, G. E.; Roscoe, H. K.; Nair, P. J.
2015-01-01
Equivalent effective stratospheric chlorine (EESC) construct of ozone regression models attributes ozone changes to EESC changes using a single value of the sensitivity of ozone to EESC over the whole period. Using space-based total column ozone (TCO) measurements, and a synthetic TCO time series constructed such that EESC does not fall below its late 1990s maximum, we demonstrate that the EESC-based estimates of ozone changes in the polar regions (70-90°) after 2000 may, falsely, suggest an EESC-driven increase in ozone over this period. An EESC-based regression of our synthetic "failed Montreal Protocol with constant EESC" time series suggests a positive TCO trend that is statistically significantly different from zero over 2001-2012 when, in fact, no recovery has taken place. Our analysis demonstrates that caution needs to be exercised when using explanatory variables, with a single fit coefficient, fitted to the entire data record, to interpret changes in only part of the record.
Poisson regression analysis of the mortality among a cohort of World War II nuclear industry workers
International Nuclear Information System (INIS)
A historical cohort mortality study was conducted among 28,008 white male employees who had worked for at least 1 month in Oak Ridge, Tennessee, during World War II. The workers were employed at two plants that were producing enriched uranium and a research and development laboratory. Vital status was ascertained through 1980 for 98.1% of the cohort members and death certificates were obtained for 96.8% of the 11,671 decedents. A modified version of the traditional standardized mortality ratio (SMR) analysis was used to compare the cause-specific mortality experience of the World War II workers with the U.S. white male population. An SMR and a trend statistic were computed for each cause-of-death category for the 30-year interval from 1950 to 1980. The SMR for all causes was 1.11, and there was a significant upward trend of 0.74% per year. The excess mortality was primarily due to lung cancer and diseases of the respiratory system. Poisson regression methods were used to evaluate the influence of duration of employment, facility of employment, socioeconomic status, birth year, period of follow-up, and radiation exposure on cause-specific mortality. Maximum likelihood estimates of the parameters in a main-effects model were obtained to describe the joint effects of these six factors on cause-specific mortality of the World War II workers. We show that these multivariate regression techniques provide a useful extension of conventional SMR analysis and illustrate their effective use in a large occupational cohort study.
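For readers unfamiliar with the SMR, the sketch below shows the traditional form of the calculation: observed deaths divided by the deaths expected if the cohort had experienced reference-population rates in each stratum. The study used a modified version that is not reproduced here, and every number and name below is hypothetical.

```python
def expected_deaths(person_years, reference_rates):
    """Deaths expected if each stratum experienced the reference-population rate."""
    return sum(py * rate for py, rate in zip(person_years, reference_rates))


def standardized_mortality_ratio(observed, person_years, reference_rates):
    """Traditional SMR: observed deaths divided by expected deaths."""
    return observed / expected_deaths(person_years, reference_rates)


# Hypothetical strata (for example, three age bands); not data from the study.
person_years = [10000.0, 20000.0, 5000.0]
reference_rates = [0.002, 0.005, 0.02]  # deaths per person-year in the reference population
observed = 253

expected = expected_deaths(person_years, reference_rates)
smr = standardized_mortality_ratio(observed, person_years, reference_rates)
print(f"expected = {expected:.1f}, SMR = {smr:.2f}")
```

An SMR above 1 indicates more deaths than the reference rates predict; Poisson regression extends this by modelling the observed counts with the expected counts as an offset.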
International Nuclear Information System (INIS)
Yuichi Sarusawa; Kohei Arai
2013-01-01
Regression analysis based method for turbidity and ocean current velocity estimation with remote sensing satellite data is proposed. Through regressive analysis with MODIS data and measured data of turbidity and ocean current velocity, regressive equation which allows estimation of turbidity and ocean current velocity is obtained. With the regressive equation as well as long term MODIS data, turbidity and ocean current velocity trends in Ariake Sea area are clarified. It is also confirmed tha...
Quantile Regression in the Study of Developmental Sciences
Petscher, Yaacov; Logan, Jessica A. R.
2014-01-01
Linear regression analysis is one of the most common techniques applied in developmental research, but only allows for an estimate of the average relations between the predictor(s) and the outcome. This study describes quantile regression, which provides estimates of the relations between the predictor(s) and outcome, but across multiple points of…
Analysis of the Multiple SGTR of SMART
An advanced integral pressurized water reactor (PWR), SMART (System-integrated Modular Advanced ReacTor), with a rated thermal power of 330 MW, is under development by KAERI. SMART adopts a helical once-through steam generator that produces superheated steam in normal operation and a passive residual heat removal system (PRHRS) for decay heat removal after reactor shutdown, as shown in Fig. 1. As a design basis event, the single steam generator tube rupture (SGTR) has been analyzed. Recently, the licensing body has required that the plant's capability to cope with a multiple steam generator tube rupture (MSGTR) be shown for advanced reactors. Therefore, in this study, an analysis of the MSGTR in SMART was carried out to show that the plant responds properly.
Filoso, Valerio
2010-01-01
The Regression Anatomy (RA) theorem (Angrist and Pischke 2009) is an alternative formulation of the Frisch-Waugh-Lovell (FWL) theorem (Frisch and Waugh 1933; Lovell 1963), a key finding in the algebra of OLS multiple regression models. In this paper, we present a command, reganat, to implement graphically the method of RA. This addition complements the built-in Stata command avplot in the validation of linear models, producing bidimensional scatterplots and regression lines obtained controlli...
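The FWL and Regression Anatomy results can be checked numerically on a small example. The sketch below, in plain Python rather than Stata, compares the coefficient on x1 from a two-regressor least squares fit with the slope obtained after partialling x2 out. The data and function names are hypothetical, and the code is unrelated to the reganat command itself.

```python
def mean(values):
    return sum(values) / len(values)


def simple_ols(x, y):
    """Return (intercept, slope) of the least squares line of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def residuals(x, y):
    """Residuals of y after regressing it on x with an intercept."""
    intercept, slope = simple_ols(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def two_regressor_slope(x1, x2, y):
    """Coefficient on x1 in y ~ 1 + x1 + x2, from the centred normal equations."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    return (s1y * s22 - s2y * s12) / (s11 * s22 - s12 ** 2)


# Hypothetical data.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [3.1, 3.9, 7.2, 7.8, 11.1, 12.0]

direct = two_regressor_slope(x1, x2, y)
x1_tilde = residuals(x2, x1)                        # x1 with x2 partialled out
fwl = simple_ols(x1_tilde, residuals(x2, y))[1]     # FWL: residual on residual
anatomy = simple_ols(x1_tilde, y)[1]                # RA: raw y on residualised x1
print(direct, fwl, anatomy)
```

All three numbers agree up to rounding, which is why a scatterplot of y against the residualised regressor displays the multiple regression coefficient as a simple slope.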
G. Selvaraju
2013-12-01
Aim: A study was undertaken to develop a forecasting model for predicting bluetongue outbreaks in the North-west agroclimatic zone of Tamil Nadu, India. Materials and Methods: Eleven bluetongue outbreaks were characterised by active and passive surveillance over a period of twelve years and used in this study. Meteorological data comprising maximum and minimum temperatures, relative humidity, rainfall and wind speed were collected and used as the predictor variables in the multiple linear regression model. Results: A multiple linear regression model was developed for the North-west zone of Tamil Nadu. Values of the dependent variable less than or greater than one indicated remote or greater chances of bluetongue outbreaks, respectively. The monthly mean maximum and minimum temperatures, relative humidity at 8.30 h and at 17.00 h IST, wind speed, and monthly total rainfall of 29.1 - 31.0°C, 20.1 - 22.0°C, 80.1 - 85.0%, 65.1 - 70.0%, 3.1 - 5.0 km/h and < 200 mm, respectively, were identified as the ideal climatic conditions for increased numbers of bluetongue outbreaks in this zone. Conclusion: Based on the values obtained from the prediction model, stakeholders can be warned in good time through the media to institute suitable prophylactic measures against bluetongue and avoid economic losses due to the disease. [Vet World 2013; 6(6): 321-324]
Davidson, Russell; Mackinnon, James
2001-01-01
Associated with every popular nonlinear estimation method is at least one 'artificial' linear regression. We define an artificial regression in terms of three conditions that it must satisfy. Then we show how artificial regressions can be useful for numerical optimization, testing hypotheses, and computing parameter estimates. Several existing artificial regressions are discussed and are shown to satisfy the defining conditions, and a new artificial regression for regression models with heter...
A Unified Approach to Power Calculation and Sample Size Determination for Random Regression Models
Shieh, Gwowen
2007-01-01
Ordinal Logistic Regression for the Estimate of the Response Functions in the Conjoint Analysis
Directory of Open Access Journals (Sweden)
Amedeo De Luca
2011-12-01
In the Conjoint Analysis (COA) model proposed here, a new approach that estimates more than one response function and extends the traditional COA, the polytomous response variable (i.e. evaluation of the overall desirability of alternative product profiles) is described by a sequence of binary variables. To link the categories of overall evaluation to the factor levels, we adopt, at the aggregate level, an ordinal logistic regression based on a main effects experimental design. The model provides several overall desirability functions (aggregated part-worths sets), as many as there are ordered overall categories, unlike the traditional metric and non-metric COA, which gives only one response function. We provide an application of the model and an interpretation of the main effects.
Research of NiMH Battery Modeling and Simulation Based on Linear Regression Analysis Method
Yong-sheng Zhang
2013-11-01
Battery State-Of-Charge estimation is one of the core issues in the development of electric vehicle battery management systems, and a more accurate model is needed to estimate State-Of-Charge correctly. Therefore, accurate battery modeling and simulation were studied here. Because traditional models have poor accuracy, a Thevenin equivalent circuit model of the NiMH battery was established. Based on data from hybrid pulse cycling test experiments on a 6V 6Ah NiMH battery, the Thevenin model parameters were identified by means of linear regression analysis. Then, the battery equivalent circuit simulation model was built in the MATLAB/Simulink environment. The simulation and experimental results showed that the model has better accuracy and can be used to guide battery State-Of-Charge estimation.
Estimation of Output Disturbance in Auto-Regressive Model via Independent Component Analysis
R. Tanaka
2013-02-01
This paper explains and demonstrates how to estimate an output disturbance in an auto-regressive model. The method uses the independent component analysis (ICA) technique, which restores source signals from their linear mixtures under the assumption that the source signals are mutually independent. The estimation is achieved by a model whose source signals consist of the input and the output disturbance, and whose observed signals consist of the input and the output. To solve the ICA problem, a natural gradient method based on mutual information is adopted. As a result, in this simulation, the NRR of our proposed method shows an improvement of about 4.0 [dB] compared with that of a conventional method.
International Nuclear Information System (INIS)
The monitoring of the detailed 3-dimensional (3D) reactor core power distribution is a prerequisite in the operation of nuclear power reactors to ensure that the various safety limits imposed on the LPD and DNBR are not violated during nuclear power reactor operation. The LPD and DNBR should be calculated in order to perform the two major functions of the core protection calculator system (CPCS) and the core operation limit supervisory system (COLSS). The LPD at the hottest part of a hot fuel rod, which is related to the power peaking factor (PPF, Fq), is more important than the LPD at any other position in a reactor core. The LPD needs to be estimated accurately to prevent nuclear fuel rods from melting. In this study, support vector regression (SVR) and uncertainty analysis have been applied to the estimation of the reactor core power peaking factor.
Ashok Kumar Sahoo
2014-04-01
The objective of the study is to assess the performance of a multilayer coated carbide insert in the machining of hardened AISI D2 steel (53 HRC) using Taguchi design of experiments. The experiment was designed on the basis of the Taguchi L27 orthogonal array to predict surface roughness. The S/N ratio and the optimum parametric condition are analysed. An analysis of variance has also been carried out to identify the significant factors affecting surface roughness. Based on the Taguchi S/N ratio and ANOVA, feed is the most influential parameter for surface roughness, followed by cutting speed, whereas depth of cut is the least significant in these experiments. In the regression model, an R2 value of 0.98 indicates that 98 % of the total variation is explained by the model. This indicates that the developed model can be used effectively to predict the surface roughness in the machining of D2 steel with 95% confidence intervals.
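The statement that R2 = 0.98 means 98 % of the variation is explained follows from the definition R2 = 1 - SS_res / SS_tot. A minimal sketch, with made-up roughness values chosen only to give a round number (they are not the study's measurements):

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured and model-predicted surface roughness values.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

r2 = r_squared(observed, predicted)
print(f"R^2 = {r2:.2f}")
```

Here the residual sum of squares is 0.10 against a total of 5.0, so 2 % of the variation is left unexplained.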
Factors predicting the failure of Bernese periacetabular osteotomy: a meta-regression analysis.
Sambandam, Senthil Nathan; Hull, Jason; Jiranek, William A
2009-12-01
There is no clear evidence regarding the outcome of Bernese periacetabular osteotomy (PAO) in different patient populations. We performed a systematic meta-regression analysis of 23 eligible studies. There were 1,113 patients, of whom 61 had total hip arthroplasty (THA) (endpoint) as a result of failed Bernese PAO. Univariate analysis revealed significant correlation between THA and presence of grade 2/grade 3 arthritis, Merle d'Aubigné score (MDS), Harris hip score and Tönnis angle, change in lateral centre edge (LCE) angle, late proximal femoral osteotomies (PFO), and heterotopic ossification (HO) resection. Multivariate analysis showed that the odds of having THA increase with grade 2/grade 3 osteoarthritis (3.36 times), joint penetration (3.12 times), low preoperative MDS (1.59 times), late PFO (1.59 times), presence of preoperative subluxation (1.22 times), previous hip operations (1.14 times), and concomitant PFO (1.09 times). In the absence of randomised controlled studies, the findings of this analysis can help the surgeon to make treatment decisions. PMID:18719916
Comparison of Some Estimation Methods in Linear Regression
İlkay Altındağ; Ümran M. Tekşen; Aşır Genç
2010-01-01
In this study, we present some methods that serve as alternatives to the classical least squares methods used for simple linear and multiple linear regression analysis. In short, the linear regression model is written in matrix form as Y = Xβ + ε, where Y is the vector of the dependent variable, X is the design matrix of independent variables, β is the parameter vector, and ε is the vector of error terms, so the least squares estimator of the linear regression is given by β̂ = (X^{...
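For reference, the classical least squares estimate in matrix form solves the normal equations (X'X) b = X'Y. The sketch below does this in plain Python with a small Gauss-Jordan solver; it illustrates the textbook estimator only, not the alternative methods compared in the study, and the helper names and data are hypothetical.

```python
def transpose(matrix):
    return [list(row) for row in zip(*matrix)]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def ols(design, response):
    """Least squares coefficients from the normal equations (X'X) b = X'y."""
    xt = transpose(design)
    xtx = [[sum(u * v for u, v in zip(row_i, row_j)) for row_j in xt] for row_i in xt]
    xty = [sum(u * v for u, v in zip(row, response)) for row in xt]
    return solve(xtx, xty)


# Hypothetical data lying exactly on y = 1 + 2x; the first column of X is the intercept.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
Y = [1.0, 3.0, 5.0, 7.0]
beta_hat = ols(X, Y)
print(beta_hat)
```

Solving the normal equations directly is fine for small, well-conditioned problems; production code usually relies on a QR or SVD based routine instead.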
K.Satyanarayana
2013-06-01
The present work deals with the cutting forces and cutting temperature produced during turning of titanium alloy Ti-6Al-4V with PVD TiN coated tungsten carbide inserts in a dry environment. First-order mathematical models are developed using multiple regression analysis, and the process parameters are optimized using contour plots. The models showed high determination coefficients (R2 = 0.964 and 0.989), explaining 96.4 % and 98.9 % of the variability in the cutting force and cutting temperature, respectively, which indicates a good fit and high significance of the models. The developed mathematical models relate the cutting force and temperature to the process parameters with a good degree of approximation. From the contour plots, the optimal parametric combination for the lowest cutting force is v 3 (75 m/min) and f 1 (0.25 mm/rev). Similarly, the optimal parametric combination for minimum temperature is v 1 (45 m/min) and f 1 (0.25 mm/rev). Cutting speed is found to be the most significant parameter for cutting forces, followed by feed. Similarly, for cutting temperature, feed is found to be the most influential parameter, followed by cutting speed.
Ridge Regression Analysis on the Influential Factors of FDI in Jiangsu Province
Yang CAO
2008-08-01
As one of China's developed eastern coastal areas, Jiangsu Province has used foreign capital not only to promote rapid economic growth, enhance regional comprehensive competitiveness, and promote employment, but also to create a well-known new mode of economic development called Sunan. Based on a qualitative analysis of the factors affecting the inflow of foreign capital in Jiangsu, the paper establishes a mathematical model relating FDI to major economic indicators in Jiangsu, in accordance with the province's own characteristics. Then, using time-series data for 1992-2006, the paper applies ridge regression to analyze the influential factors of FDI in Jiangsu.
Key words: foreign direct investment, ridge regression, factors, Jiangsu
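Ridge regression is typically chosen when economic indicators are strongly correlated with one another. A minimal sketch for two centred predictors follows; it solves (X'X + lambda*I) b = X'y in closed form. The data, the penalty value of 1.0, and the function name are assumptions for illustration and have nothing to do with the Jiangsu series.

```python
def ridge_two_predictors(x1, x2, y, lam):
    """Ridge coefficients for centred data: solve (X'X + lam*I) b = X'y for two predictors."""
    s11 = sum(a * a for a in x1) + lam
    s22 = sum(b * b for b in x2) + lam
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


# Hypothetical centred data with two nearly collinear predictors.
x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x2 = [-2.1, -0.9, 0.0, 1.1, 1.9]
y = [-4.0, -2.1, 0.1, 1.9, 4.1]

ols_fit = ridge_two_predictors(x1, x2, y, lam=0.0)    # lam = 0 is ordinary least squares
ridge_fit = ridge_two_predictors(x1, x2, y, lam=1.0)  # a small penalty stabilises the fit
print(ols_fit, ridge_fit)
```

With no penalty the two coefficients take large values of opposite sign, a common symptom of collinearity; the penalised fit shares the effect between the two predictors with coefficients of similar size.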
Applying support vector regression analysis on grip force level-related corticomuscular coherence
DEFF Research Database (Denmark)
Rong, Yao; Han, Xixuan
2014-01-01
Voluntary motor performance is the result of cortical commands driving muscle actions. Corticomuscular coherence can be used to examine the functional coupling or communication between human brain and muscles. To investigate the effects of grip force level on corticomuscular coherence in an accessory muscle, this study proposed an expanded support vector regression (ESVR) algorithm to quantify the coherence between electroencephalogram (EEG) from sensorimotor cortex and surface electromyogram (EMG) from brachioradialis in upper limb. A measure called coherence proportion was introduced to compare the corticomuscular coherence in the alpha (7–15Hz), beta (15–30Hz) and gamma (30–45Hz) band at 25 % maximum grip force (MGF) and 75 % MGF. Results show that ESVR could reduce the influence of deflected signals and summarize the overall behavior of multiple coherence curves. Coherence proportion is more sensitive to grip force level than coherence area. The significantly higher corticomuscular coherence occurred in the alpha (p<0.01) and beta band (p<0.01) during 75 % MGF, but in the gamma band (p<0.01) during 25 % MGF. The results suggest that sensorimotor cortex might control the activity of an accessory muscle for hand grip with increased grip intensity by changing functional corticomuscular coupling at certain frequency bands (alpha, beta and gamma bands).
De la Cruz, Rolando; Branco, Márcia D
2009-08-01
We have considered a Bayesian approach for the nonlinear regression model in which the normal distribution of the error term is replaced by skewed distributions that account for both skewness and heavy tails, or for skewness alone. The data considered in this paper are repeated measurements taken over time on a set of individuals. Such multiple observations on the same individual generally produce serially correlated outcomes. Thus, our model additionally allows for correlation between observations made on the same individual. We have illustrated the procedure using a data set on the growth curves of a clinical measurement for a group of pregnant women from an obstetrics clinic in Santiago, Chile. Parameter estimation and prediction were carried out using appropriate posterior simulation schemes based on Markov Chain Monte Carlo methods. Besides the deviance information criterion (DIC) and the conditional predictive ordinate (CPO), we suggest the use of proper scoring rules based on the posterior predictive distribution for comparing models. For our data set, all these criteria chose the skew-t model as the best model for the errors. The DIC and CPO criteria are also validated, for the model proposed here, through a simulation study. As a conclusion of this study, the DIC criterion is not reliable for this kind of complex model. PMID:19629998
Fitzenberger, Bernd; Wilke, Ralf Andreas
2015-01-01
Quantile regression is emerging as a popular statistical approach, which complements the estimation of conditional mean models. While the latter only focuses on one aspect of the conditional distribution of the dependent variable, the mean, quantile regression provides more detailed insights by modeling conditional quantiles. Quantile regression can therefore detect whether the partial effect of a regressor on the conditional quantiles is the same for all quantiles or differs across quantiles. Quantile regression can provide evidence for a statistical relationship between two variables even if the mean regression model does not. We provide a short informal introduction into the principle of quantile regression which includes an illustrative application from empirical labor market research. This is followed by briefly sketching the underlying statistical model for linear quantile regression based on a cross-section sample. We summarize various important extensions of the model including the nonlinear quantile regression model, censored quantile regression, and quantile regression for time-series data. We also discuss a number of more recent extensions of the quantile regression model to censored data, duration data, and endogeneity, and we describe how quantile regression can be used for decomposition analysis. Finally, we identify several key issues, which should be addressed by future research, and we provide an overview of quantile regression implementations in major statistics software. Our treatment of the topic is based on the perspective of applied researchers using quantile regression in their empirical work.
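The principle behind quantile regression can be illustrated with its loss function. Instead of squared error, the fit minimizes an asymmetric absolute loss (the check or pinball loss) at level tau; for a model with only a constant, the minimizer is the sample tau-quantile. The sketch below brute-forces that constant; the data and function names are hypothetical.

```python
def pinball_loss(tau, observed, predicted):
    """Average check loss at quantile level tau (0 < tau < 1)."""
    total = 0.0
    for y, q in zip(observed, predicted):
        diff = y - q
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(observed)


def best_constant(tau, observed):
    """Search the observed values for the constant prediction with the smallest loss."""
    n = len(observed)
    return min(observed, key=lambda c: pinball_loss(tau, observed, [c] * n))


# Hypothetical sample with one large outlier.
data = [1.0, 2.0, 3.0, 4.0, 100.0]
median_fit = best_constant(0.5, data)
upper_fit = best_constant(0.9, data)
print(median_fit, upper_fit)
```

The fit at tau = 0.5 is the median, which ignores the outlier that would pull a mean-based fit upward; a linear quantile regression replaces the constant with a linear function of the regressors and minimizes the same loss.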
Analysis of the variability of auditory brainstem response components through linear regression
Kheline F. P. Naves
2012-09-01
The analysis of the Auditory Brainstem Response (ABR) is of fundamental importance to the investigation of auditory system behavior, though its interpretation is subjective because of the manual process employed in its study and the clinical experience required for its analysis. When analyzing the ABR, clinicians are often interested in identifying the ABR signal components referred to as Jewett waves. In particular, the detection and study of the time at which these waves occur (i.e., the wave latency) is a practical tool for the diagnosis of disorders affecting the auditory system. In this context, the aim of this research is to compare the manual/visual ABR analyses provided by different examiners. Methods: The ABR data were collected from 10 normal-hearing subjects (5 men and 5 women, from 20 to 52 years). A total of 160 data samples were analyzed and a pairwise comparison between four distinct examiners was executed. We carried out a statistical study aiming to identify significant differences between the assessments provided by the examiners. For this, we used linear regression in conjunction with the bootstrap as a method for evaluating the relation between the responses given by the examiners. Results: The analysis suggests agreement among examiners but reveals differences between their assessments of the variability of the waves. We quantified the magnitude of the obtained wave latency differences: 18% of the investigated waves presented substantial differences (large and moderate), and of these 3.79% were considered not acceptable for clinical practice. Conclusions: Our results characterize the variability of the manual analysis of ABR data and the necessity of establishing unified standards and protocols for the analysis of these data. These results may also contribute to the validation and development of automatic systems that are employed in the early diagnosis of hearing loss.
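One way to combine linear regression with the bootstrap for a pair of examiners is to resample the paired latency readings and refit the slope each time, which yields a percentile interval for the slope. The sketch below assumes that setup; the latencies are invented, and the resample count and interval method are illustrative choices rather than the procedure reported in the paper.

```python
import random


def slope(x, y):
    """Least squares slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def bootstrap_slopes(x, y, n_resamples=1000, seed=0):
    """Refit the slope on paired resamples drawn with replacement; returns sorted slopes."""
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    while len(slopes) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        if max(xs) == min(xs):  # degenerate resample: slope undefined, draw again
            continue
        slopes.append(slope(xs, [y[i] for i in idx]))
    return sorted(slopes)


def percentile_interval(sorted_values, level=0.95):
    """Percentile bootstrap interval read off the sorted resampled statistics."""
    lo = int((1.0 - level) / 2.0 * len(sorted_values))
    hi = int((1.0 + level) / 2.0 * len(sorted_values)) - 1
    return sorted_values[lo], sorted_values[hi]


# Hypothetical wave latencies (ms) marked by two examiners on the same recordings.
examiner_a = [1.5, 1.6, 3.7, 3.8, 5.6, 5.7, 1.4, 3.9, 5.5, 5.8]
examiner_b = [1.6, 1.6, 3.8, 3.7, 5.7, 5.6, 1.5, 3.8, 5.6, 5.7]

slopes = bootstrap_slopes(examiner_a, examiner_b)
low, high = percentile_interval(slopes)
print(f"slope interval: {low:.3f} to {high:.3f}")
```

An interval concentrated near 1 is consistent with the two examiners marking latencies on the same scale; systematic disagreement would show up as a slope interval away from 1 or a non-zero intercept.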
Barlin, Joyce N.; Zhou, Qin; St. Clair, Caryn M.; Iasonos, Alexia; Soslow, Robert A.; Alektiar, Kaled M.; Hensley, Martee L.; Leitao, Mario M.; Barakat, Richard R.; Abu-Rustum, Nadeem R.
2013-01-01
Objective: To evaluate which clinicopathologic factors influenced overall survival (OS) in endometrial carcinoma and to determine if the surgical effort to assess para-aortic (PA) lymph nodes (LNs) at initial staging surgery impacts OS. Methods: All patients diagnosed with endometrial cancer from 1/1993-12/2011 who had LNs excised were included. PALN assessment was defined by the identification of one or more PALNs on final pathology. A multivariate analysis was performed to assess the effect of PALNs on OS. A form of recursive partitioning called classification and regression tree (CART) analysis was implemented. Variables included: age, stage, tumor subtype, grade, myometrial invasion, total LNs removed, evaluation of PALNs, and adjuvant chemotherapy. Results: The cohort included 1920 patients, with a median age of 62 years. The median number of LNs removed was 16 (range, 1-99). The removal of PALNs was not associated with OS (P=0.450). Using the CART hierarchically, stage I vs. stages II-IV and grade 1-2 vs. grade 3 emerged as predictors of OS. If the tree was allowed to grow, further branching was based on age and myometrial invasion. Total number of LNs removed and assessment of PALNs as defined in this study were not predictive of OS. Conclusion: This innovative CART analysis emphasized the importance of proper stage assignment and a binary grading system in impacting OS. Notably, the total number of LNs removed and specific evaluation of PALNs as defined in this study were not important predictors of OS. PMID:23774300
International Nuclear Information System (INIS)
Meloun, Milan; Burkonova, Dominika; Syrovy, Tomas; Vrana, Ales
2003-06-11
Mixed dissociation constants of four drug acids, i.e. silychristin, silybinin, silydianin and mycophenolate, at various ionic strengths I in the range 0.01 to 0.30 and at temperatures of 25 and 37 deg. C were determined using the SQUAD(84) regression analysis program applied to pH-spectrophotometric titration data. The proposed strategy of an efficient experimentation in a protonation constants determination, followed by a computational strategy for the chemical model with a protonation constants determination, is presented on the protonation equilibria of silychristin. The thermodynamic dissociation constant $pK_{a}^{T}$ was estimated by non-linear regression of $\{pK_{a}, I\}$ data at 25 and 37 deg. C: for silychristin $pK_{a,1}^{T}$=6.52(16) and 6.62(1), $pK_{a,2}^{T}$=7.22(13) and 7.41(5), $pK_{a,3}^{T}$=8.96(9) and 8.94(9), $pK_{a,4}^{T}$=10.17(7) and 10.03(8), $pK_{a,5}^{T}$=11.89(4) and 11.63(7); for silybin $pK_{a,1}^{T}$=7.00(4) and 6.86(5), $pK_{a,2}^{T}$=8.77(11) and 8.77(3), $pK_{a,3}^{T}$=9.57(8) and 9.62(1), $pK_{a,4}^{T}$=11.66(3) and 11.38(1); for silydianin $pK_{a,1}^{T}$=6.64(7) and 7.10(6), $pK_{a,2}^{T}$=7.78(5) and 8.93(1), $pK_{a,3}^{T}$=9.66(9) and 10.06(11), $pK_{a,4}^{T}$=10.71(7) and 10.77(7), $pK_{a,5}^{T}$=12.26(5) and 12.14(5); for mycophenolate $pK_{a}^{T}$=8.32(1) and 8.14(1). Goodness-of-fit tests for various regression diagnostics enabled the reliability of parameter estimates to be found.
Alternative Methods of Regression
Birkes, David
2011-01-01
Of related interest. Nonlinear Regression Analysis and its Applications, Douglas M. Bates and Donald G. Watts: "...an extraordinary presentation of concepts and methods concerning the use and analysis of nonlinear regression models...highly recommend[ed]...for anyone needing to use and/or understand issues concerning the analysis of nonlinear regression models." --Technometrics. This book provides a balance between theory and practice supported by extensive displays of instructive geometrical constructs. Numerous in-depth case studies illustrate the use of nonlinear regression analysis--with all data s
International Nuclear Information System (INIS)
Breast density (the percentage of fibroglandular tissue in the breast) has been suggested to be a useful surrogate marker for breast cancer risk. It is conventionally measured using screen-film mammographic images by a labor-intensive histogram segmentation method (HSM). We have adapted and modified the HSM for measuring breast density from raw digital mammograms acquired by full-field digital mammography. Multiple regression model analyses showed that many of the instrument parameters for acquiring the screening mammograms (e.g. breast compression thickness, radiological thickness, radiation dose, compression force, etc.) and image pixel intensity statistics of the imaged breasts were strong predictors of the observed threshold values (model R² = 0.93) and %-density (R² = 0.84). The intra-class correlation coefficient of the %-density for duplicate images was estimated to be 0.80, using the regression model-derived threshold values, and 0.94 if estimated directly from the parameter estimates of the %-density prediction regression model. Therefore, with additional research, these mathematical models could be used to compute breast density objectively, automatically bypassing the HSM step, and could greatly facilitate breast cancer research studies.
Bry, X; Verron, T; Cazes, P
2009-05-29
In this work, we consider chemical and physical variable groups describing a common set of observations (cigarettes). One of the groups, minor smoke compounds (minSC), is assumed to depend on the others (minSC predictors). PLS regression (PLSR) of minSC on the set of all predictors appears not to lead to a satisfactory analytic model, because it does not take into account the expert's knowledge. PLS path modeling (PLSPM) does not use the multidimensional structure of predictor groups. Indeed, the expert needs to separate the influence of several pre-designed predictor groups on minSC, in order to see what dimensions this influence involves. To meet these needs, we consider a multi-group component-regression model, and propose a method to extract from each group several strong uncorrelated components that fit the model. Estimation is based on a global multiple covariance criterion, used in combination with an appropriate nesting approach. Compared to PLSR and PLSPM, the structural equation exploratory regression (SEER) we propose fully uses predictor group complementarity, both conceptually and statistically, to predict the dependent group. PMID:19427458
Bunch, N. L.; Spasojevic, M.; Shprits, Y.; Golden, D. I.
2011-12-01
Outer radiation belt fluxes vary by orders of magnitude on time scales of hours to days (Li et al., 2001). Wave-particle interactions involving lower band chorus waves are thought to play a major role in acceleration and loss of energetic electrons in the outer belt. Wave-particle interactions involving chorus and the highest energy electrons (>MeV) are possible only at latitudes above about 20° (Shprits and Ni, 2009). Despite the perceived importance of chorus in controlling energetic electron populations in the radiation belts, relatively little statistical characterization exists on which to base radiation belt model inputs for chorus. Recent investigations employing a database of chorus events observed by the Polar spacecraft have begun to characterize chorus waves at 20° magnetic latitude and above (Bunch et al., 2011). This study utilizes the Polar wave database to parameterize wave intensities as a function of spatial location and geomagnetic driving conditions (e.g. AE, Vsw, Kp, etc.). The relative correlation of chorus occurrence and amplitude with geomagnetic conditions is also examined using an autoregressive moving average (ARMA) technique for non-independent observations, such as those made by orbiting spacecraft. Regression analysis shows significant correlation of chorus with increased AE, Vsw, and Kp, and much lower correlations with proton density and pressure. Wave parameterizations show, for a fixed range in L, an increase in chorus amplitude with magnetic latitude in the dawn sector. Amplitudes appear more constant over a range of latitudes at noon, particularly for increased activity levels. These results represent significant steps toward a more complete characterization of the chorus wave environment and an understanding of the role chorus plays in regulation of the radiation belt environment.
Borquis, Rusbel Raul Aspilcueta; Neto, Francisco Ribeiro de Araujo; Baldi, Fernando; Hurtado-Lugo, Naudin; de Camargo, Gregório M F; Muñoz-Berrocal, Milthon; Tonhati, Humberto
2013-09-01
In this study, genetic parameters for test-day milk, fat, and protein yield were estimated for the first lactation. The data analyzed consisted of 1,433 first lactations of Murrah buffaloes, daughters of 113 sires from 12 herds in the state of São Paulo, Brazil, with calvings from 1985 to 2007. Ten monthly classes of lactation days were considered for the test-day yields. The (co)variance components for the 3 traits were estimated using regression analyses by Bayesian inference, applying an animal model with Gibbs sampling. The contemporary groups were defined as herd-year-month of the test day. In the model, the random effects were additive genetic, permanent environment, and residual. The fixed effects were contemporary group and number of milkings (1 or 2), the linear and quadratic effects of the covariable age of the buffalo at calving, as well as the mean lactation curve of the population, which was modeled by orthogonal Legendre polynomials of fourth order. The random effects for the traits studied were modeled by Legendre polynomials of third and fourth order for additive genetic and permanent environment, respectively; the residual variances were modeled considering 4 residual classes. The heritability estimates for the traits were moderate (from 0.21 to 0.38), with higher estimates in the intermediate lactation phase. The genetic correlation estimates within and among the traits varied from 0.05 to 0.99. The results indicate that selection for any test-day trait will result in an indirect genetic gain for milk, fat, and protein yield in all periods of the lactation curve. The accuracy associated with estimated breeding values obtained using multi-trait random regression was slightly higher (around 8%) compared with single-trait random regression. This difference may be due to the greater amount of information available per animal. PMID:23831097
Bryanton, Mark; Makis, William
2015-07-01
Pseudomyogenic (epithelioid sarcoma-like) hemangioendothelioma is a rare, recently described vascular neoplasm that occurs predominantly in the distal extremities of young to middle-aged adult males. In this report, we describe a patient who presented with numerous lytic bone lesions which were intensely 18F-FDG avid on PET/CT and presumed to be metastatic. Pathology revealed pseudomyogenic hemangioendothelioma. Follow-up CT showed enlargement and increasing sclerosis of several lesions, believed to represent progression; however, follow-up PET/CT confirmed a spontaneous regression of the disease. PMID:26018681
High-Dimensional Heteroscedastic Regression with an Application to eQTL Data Analysis.
Daye, Z John; Chen, Jinbo; Li, Hongzhe
2012-03-01
We consider the problem of high-dimensional regression under non-constant error variances. Despite being a common phenomenon in biological applications, heteroscedasticity has, so far, been largely ignored in high-dimensional analysis of genomic data sets. We propose a new methodology that allows non-constant error variances for high-dimensional estimation and model selection. Our method incorporates heteroscedasticity by simultaneously modeling both the mean and variance components via a novel doubly regularized approach. Extensive Monte Carlo simulations indicate that our proposed procedure can result in better estimation and variable selection than existing methods when heteroscedasticity arises from the presence of predictors explaining error variances and outliers. Further, we demonstrate the presence of heteroscedasticity in and apply our method to an expression quantitative trait loci (eQTLs) study of 112 yeast segregants. The new procedure can automatically account for heteroscedasticity in identifying the eQTLs that are associated with gene expression variations and lead to smaller prediction errors. These results demonstrate the importance of considering heteroscedasticity in eQTL data analysis. PMID:22547833
NEELEY, E. Shannon; BIGLER, Erin D.; KRASNY, Lori; OZONOFF, Sally; McMAHON, William; LAINHART, Janet E.
2015-01-01
Multiple Criteria Analysis for Energy Storage Selection
Alexandre Barin
2011-09-01
In view of the current and predictable energy shortage and environmental concerns, the exploitation of renewable energy sources offers great potential to meet increasing energy demands and to decrease dependence on fossil fuels. However, introducing these sources will be more attractive provided they operate in conjunction with energy storage systems (ESS). Furthermore, effective energy storage management is essential to achieve a balance between power quality, efficiency, costs and environmental constraints. This paper presents a method based on the analytic hierarchy process and fuzzy multi-rules and multi-sets. By exploiting a multiple criteria analysis, the proposed method evaluates the operation of energy storage systems such as pumped hydro and compressed air energy storage, hydrogen (H2), flywheels, super-capacitors and lithium-ion storage, as well as NaS advanced batteries and VRB flow batteries. The main objective of the study is to find the most appropriate ESS consistent with a power quality priority. Several parameters are used for the investigation: efficiency, load management, technical maturity, costs, environmental impact and power quality.
Akimoto, Yuki; Yugi, Katsuyuki; Uda, Shinsuke; Kudo, Takamasa; Komori, Yasunori; Kubota, Hiroyuki; Kuroda, Shinya
2013-01-01
Cells use common signaling molecules for the selective control of downstream gene expression and cell-fate decisions. The relationship between signaling molecules and downstream gene expression and cellular phenotypes is a multiple-input and multiple-output (MIMO) system and is difficult to understand due to its complexity. For example, it has been reported that, in PC12 cells, different types of growth factors activate MAP kinases (MAPKs) including ERK, JNK, and p38, and CREB, for selective ...
Czekaj, Tomasz Gerard; Henningsen, Arne
The estimation of technical efficiency comprises a vast literature in the field of applied production economics. There are two predominant approaches: the non-parametric and non-stochastic Data Envelopment Analysis (DEA) and the parametric Stochastic Frontier Analysis (SFA). The DEA is criticised because it cannot account for statistical noise such as random production shocks and measurement errors, which are inherent in more or less all production data sets. In contrast, the SFA is criticised because it requires the specification of a functional form, which involves the risk of specifying an unsuitable functional form and thus model misspecification and biased parameter estimates. Given these problems of the DEA and the SFA, Fan, Li and Weersink (1996) proposed a semi-parametric stochastic frontier model that estimates the production function (frontier) by non-parametric regression based on kernel estimators. This approach combines the virtues of the DEA and the SFA while avoiding their drawbacks: it avoids the specification of a functional form and at the same time accounts for statistical noise. More recently, this approach was used by Henderson and Simar (2005), Kumbhakar et al. (2007), and Henningsen and Kumbhakar (2009). The aim of this paper and its main contribution to the existing literature is the estimation of semi-parametric stochastic frontier models using a different non-parametric estimation technique: spline regression (Ma et al. 2011). We apply this approach to the Polish dairy sector and use a panel data set of Polish dairy farms from the years 2004-2010. The Polish dairy sector has changed considerably since the integration of Poland in the European Union: the number of dairy producers decreased by one third and the average herd size increased from 3.8 to 5.7 cows per farm within the period 2004-2010.
It is expected that farms with small herds (less than 30 dairy cows) will quit and that the number of large farms (with more than 100 dairy cows) will increase. Therefore, a thorough empirical study of the technical efficiency and scale efficiency of Polish dairy farms contributes to the insight into this dynamic process. Furthermore, we compare and evaluate the results of this spline-based semi-parametric stochastic frontier model with results of other semi-parametric stochastic frontier models and of traditional parametric stochastic frontier models. References: Fan, Y., Li, Q., Weersink, A. (1996), Semiparametric Estimation of Stochastic Production Frontier Models, Journal of Business and Economic Statistics. Henderson, D. J., Simar, L. (2005), A Fully Nonparametric Stochastic Frontier Model for Panel Data, University of New York. Henningsen, A., Kumbhakar, S. C. (2009), Semiparametric Stochastic Frontier Analysis: An Application to Polish Farms During Transition, Paper presented at the (EWEPA) in Pisa, Italy. Kumbhakar, S. C., Park, B. U., Simar, L., Tsionas, E. G. (2007), Nonparametric Stochastic Frontiers: A Local Maximum Likelihood Approach, Journal of Econometrics. Ma, S., Racine, J. S., Yang, L. (2011), Spline regression in the presence of categorical predictors, Working Paper.
Kügler, S. D.; Polsterer, K.; Hoecker, M.
2015-04-01
Context. In astronomy, new approaches to process and analyze the exponentially increasing amount of data are inevitable. For spectra, such as in the Sloan Digital Sky Survey spectral database, templates of well-known classes are usually used for classification. If the fitting of a template fails, wrong spectral properties (e.g. redshift) are derived. Validation of the derived properties is the key to understanding the caveats of the template-based method. Aims. In this paper we present a method for statistically computing the redshift z based on a similarity approach. This allows us to determine redshifts in spectra for emission and absorption features without using any predefined model. Additionally, we show how to determine the redshift based on single features. As a consequence we are, for example, able to filter objects that show multiple redshift components. Methods. The redshift calculation is performed by comparing predefined regions in the spectra and individually applying a nearest neighbor regression model to each predefined emission and absorption region. Results. The choice of the model parameters controls the quality and the completeness of the redshifts. For ≈90% of the analyzed 16 000 spectra of our reference and test sample, a certain redshift can be computed that is comparable to the completeness of SDSS (96%). The redshift calculation yields a precision for every individually tested feature that is comparable to the overall precision of the redshifts of SDSS. Using the new method to compute redshifts, we could also identify 14 spectra with a significant shift between emission and absorption or between emission and emission lines. The results already show the immense power of this simple machine-learning approach for investigating huge databases such as the SDSS.
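The nearest neighbor regression step mentioned in the Methods can be sketched as follows. This is only an illustration of the generic technique: the distance metric, the value of k, the three-pixel "regions" and all numbers are assumptions, not details taken from the paper.

```python
import math

def knn_regress(train_x, train_y, query, k=3):
    """Predict a target as the mean of the k nearest training targets."""
    # Pair each training vector's Euclidean distance to the query with its target.
    dists = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    nearest = dists[:k]
    return sum(y for _, y in nearest) / len(nearest)

# Hypothetical toy data: each "spectral region" is a short flux vector and the
# target is a known redshift. Real spectra would have many more pixels.
train_regions = [
    [1.0, 0.2, 0.1], [0.9, 0.3, 0.1], [0.2, 1.0, 0.2],
    [0.1, 0.9, 0.3], [0.1, 0.2, 1.0], [0.2, 0.1, 0.9],
]
train_redshift = [0.10, 0.12, 0.50, 0.52, 0.90, 0.88]

z_est = knn_regress(train_regions, train_redshift, [0.15, 0.95, 0.25], k=2)
print(f"estimated z = {z_est:.2f}")
```

Applying one such model per emission or absorption region, as the abstract describes, yields one redshift estimate per feature; disagreement between those estimates is what would flag objects with multiple redshift components.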
Kaplan, David
2005-01-01
This article considers the problem of estimating dynamic linear regression models when the data are generated from finite mixture probability density function where the mixture components are characterized by different dynamic regression model parameters. Specifically, conventional linear models assume that the data are generated by a single…
Simultaneous Two-Way Clustering of Multiple Correspondence Analysis
Hwang, Heungsun; Dillon, William R.
2010-01-01
A 2-way clustering approach to multiple correspondence analysis is proposed to account for cluster-level heterogeneity of both respondents and variable categories in multivariate categorical data. Specifically, in the proposed method, multiple correspondence analysis is combined with k-means in a unified framework in which k-means is applied…
Vozinaki, Anthi Eirini K.; Karatzas, George P.; Sibetheros, Ioannis A.; Varouchakis, Emmanouil A.
2014-05-01
Damage curves are the most significant component of the flood loss estimation models. Their development is quite complex. Two types of damage curves exist, historical and synthetic curves. Historical curves are developed from historical loss data from actual flood events. However, due to the scarcity of historical data, synthetic damage curves can be alternatively developed. Synthetic curves rely on the analysis of expected damage under certain hypothetical flooding conditions. A synthetic approach was developed and presented in this work for the development of damage curves, which are subsequently used as the basic input to a flood loss estimation model. A questionnaire-based survey took place among practicing and research agronomists, in order to generate rural loss data based on the responders' loss estimates, for several flood condition scenarios. In addition, a similar questionnaire-based survey took place among building experts, i.e. civil engineers and architects, in order to generate loss data for the urban sector. By answering the questionnaire, the experts were in essence expressing their opinion on how damage to various crop types or building types is related to a range of values of flood inundation parameters, such as floodwater depth and velocity. However, the loss data compiled from the completed questionnaires were not sufficient for the construction of workable damage curves; to overcome this problem, a Weighted Monte Carlo method was implemented, in order to generate extra synthetic datasets with statistical properties identical to those of the questionnaire-based data. The data generated by the Weighted Monte Carlo method were processed via Logistic Regression techniques in order to develop accurate logistic damage curves for the rural and the urban sectors. A Python-based code was developed, which combines the Weighted Monte Carlo method and the Logistic Regression analysis into a single code (WMCLR Python code). 
Each WMCLR code execution provided a flow velocity-depth damage curve for a specific land use. More specifically, each WMCLR code execution for the agricultural sector generated a damage curve for a specific crop and for every month of the year, thus relating the damage to any crop with floodwater depth, flow velocity and the growth phase of the crop at the time of flooding. Similarly, each WMCLR code execution for the urban sector developed a damage curve for a specific building type, relating structural damage to floodwater depth and velocity. Furthermore, two techno-economic models were developed in the Python programming language, in order to estimate monetary values of flood damages to the rural and the urban sector, respectively. A new Monte Carlo simulation was performed, consisting of multiple executions of the techno-economic code, which generated multiple damage cost estimates. Each execution used the proper WMCLR simulated damage curve. The uncertainty analysis of the damage estimates established the accuracy and reliability of the proposed methodology for the development of synthetic damage curves.
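The general idea of generating synthetic samples from expert estimates and fitting a logistic damage curve can be sketched as below. This is not the authors' WMCLR code: the abstract does not specify the Weighted Monte Carlo scheme, so plain binomial sampling is used as a stand-in, only floodwater depth is modeled, and the expert estimates are hypothetical.

```python
import math
import random

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def fit_logistic(x, y, lr=0.5, epochs=2000):
    """Fit P(y=1|x) = sigmoid(b0 + b1*x) by batch gradient ascent on the log-likelihood."""
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            err = yi - sigmoid(b0 + b1 * xi)
            g0 += err
            g1 += err * xi
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# Hypothetical expert estimates: (floodwater depth in m, probability of damage).
expert = [(0.25, 0.05), (0.5, 0.15), (1.0, 0.45), (1.5, 0.75), (2.0, 0.92)]

# Stand-in Monte Carlo step: draw binary damage outcomes for each scenario.
rng = random.Random(42)
depths, damaged = [], []
for depth, p in expert:
    for _ in range(100):
        depths.append(depth)
        damaged.append(1 if rng.random() < p else 0)

b0, b1 = fit_logistic(depths, damaged)
curve = {d: sigmoid(b0 + b1 * d) for d in (0.5, 1.0, 2.0)}
print({d: round(v, 2) for d, v in curve.items()})
```

Adding flow velocity as a second predictor would turn the curve into the velocity-depth damage surface the abstract describes.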
Lombardo, L.; Cama, M.; Maerker, M.; Parisi, L.; Rotigliano, E.
2014-12-01
This study aims at comparing the performances of Binary Logistic Regression (BLR) and Boosted Regression Trees (BRT) methods in assessing landslide susceptibility for multiple-occurrence regional landslide events within the Mediterranean region. A test area was selected in the north-eastern sector of Sicily (southern Italy), corresponding to the catchments of the Briga and the Giampilieri streams, both stretching for a few kilometres from the Peloritan ridge (eastern Sicily, Italy) to the Ionian sea. This area was struck on 1 October 2009 by an extreme climatic event resulting in thousands of rapid shallow landslides, mainly of the debris flow and debris avalanche types, involving the weathered layer of a low to high grade metamorphic bedrock. Exploiting the same set of predictors and the 2009 landslide archive, BLR- and BRT-based susceptibility models were obtained for the two catchments separately, adopting a random partition (RP) technique for validation; in addition, the models trained in one of the two catchments (Briga) were tested in predicting the landslide distribution in the other (Giampilieri), adopting a spatial partition (SP) based validation procedure. All the validation procedures were based on multi-fold tests so as to evaluate and compare the reliability of the fitting, the prediction skill, the coherence in the predictor selection and the precision of the susceptibility estimates. All the obtained models for the two methods produced very high predictive performances, with a general congruence between BLR and BRT in the predictor importance. In particular, the research highlighted that BRT models reached a higher prediction performance than BLR models for RP-based modelling, whilst for the SP-based models the difference in predictive skill between the two methods dropped drastically, converging to an analogous excellent performance.
However, when looking at the precision of the probability estimates, BLR proved to produce more robust models in terms of selected predictors and coefficients, as well as of dispersion of the estimated probabilities around the mean value for each mapped pixel. The difference in behaviour could be interpreted as the result of overfitting effects, which affect decision tree classification more heavily than logistic regression techniques.
Hoff, Peter D.; Niu, Xiaoyue
2011-01-01
Classical regression analysis relates the expectation of a response variable to a linear combination of explanatory variables. In this article, we propose a covariance regression model that parameterizes the covariance matrix of a multivariate response vector as a parsimonious quadratic function of explanatory variables. The approach is analogous to the mean regression model, and is similar to a factor analysis model in which the factor loadings depend on the explanatory var...
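One way to write a covariance matrix that is quadratic in the explanatory variables, consistent with the description above, is shown below. The symbols $\Psi$ (a baseline covariance matrix) and $B$ (a coefficient matrix) are notation assumed here for illustration; the truncated abstract does not state them.

```latex
\operatorname{Cov}[\,y \mid x\,] \;=\; \Psi + B\,x\,x^{\top} B^{\top}
```

Because $B x x^{\top} B^{\top}$ is positive semidefinite for every $x$, the result is a valid covariance matrix whenever $\Psi$ is positive definite, in the same way that a mean regression $\operatorname{E}[y \mid x] = A x$ is linear in $x$.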
Li, Y; Graubard, B I; Huang, P; Gastwirth, J L
2015-02-20
Determining the extent of a disparity, if any, between groups of people, for example, race or gender, is of interest in many fields, including public health for medical treatment and prevention of disease. An observed difference in the mean outcome between an advantaged group (AG) and disadvantaged group (DG) can be due to differences in the distribution of relevant covariates. The Peters-Belson (PB) method fits a regression model with covariates to the AG to predict, for each DG member, their outcome measure as if they had been from the AG. The difference between the mean predicted and the mean observed outcomes of DG members is the (unexplained) disparity of interest. We focus on applying the PB method to estimate the disparity based on binary/multinomial/proportional odds logistic regression models using data collected from complex surveys with more than one DG. Estimators of the unexplained disparity, an analytic variance-covariance estimator that is based on the Taylor linearization variance-covariance estimation method, as well as a Wald test for testing a joint null hypothesis of zero for unexplained disparities between two or more minority groups and a majority group, are provided. Simulation studies with data selected from simple random sampling and cluster sampling, as well as the analyses of disparity in body mass index in the National Health and Nutrition Examination Survey 1999-2004, are conducted. Empirical results indicate that the Taylor linearization variance-covariance estimation is accurate and that the proposed Wald test maintains the nominal level. PMID:25382235
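The core Peters-Belson step (fit on the AG, predict for the DG, compare means) can be sketched in a few lines. This is a deliberately simplified illustration: it uses ordinary least squares with one covariate instead of the logistic models and survey weights discussed in the abstract, and the data and the sign convention (observed minus predicted) are assumptions.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
        (xi - mx) ** 2 for xi in x
    )
    return my - b * mx, b

# Hypothetical data: one covariate (e.g. age) and a continuous outcome per group.
ag_x = [20, 30, 40, 50, 60]
ag_y = [22.0, 23.0, 24.0, 25.0, 26.0]
dg_x = [25, 35, 45, 55]
dg_y = [24.0, 25.5, 26.0, 27.5]

# Step 1: fit the regression model on the advantaged group only.
a, b = fit_line(ag_x, ag_y)

# Step 2: predict each DG member's outcome as if they belonged to the AG.
predicted = [a + b * x for x in dg_x]

# Step 3: unexplained disparity = mean observed minus mean predicted for the DG.
disparity = sum(dg_y) / len(dg_y) - sum(predicted) / len(predicted)
print(f"unexplained disparity = {disparity:.2f}")
```

The part of the raw AG-DG gap that is not captured by `disparity` is the portion attributable to differences in covariate distributions between the groups.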
International Nuclear Information System (INIS)
This study aims to perform a regression analysis that leads to the optimization of the operating conditions of ionic liquid (IL) 1-ethyl-3-methylimidazolium acetate ([EMIM]oAc) pretreatment of sugarcane bagasse (SCB). The structural changes in SCB during pretreatment were also examined. The effects of temperature, time and solid loading on the reducing sugar (RS) yield obtained from enzymatic hydrolysis of pretreated SCB were investigated by applying the Central Composite Design (CCD) of Response Surface Methodology (RSM). Results from the CCD were modeled with a second-order polynomial equation, and the model shows a good correlation between predicted and experimental values. The optimized conditions for [EMIM]oAc pretreatment were 145 °C, 15 min and 14 wt% solid loading, with an optimum RS yield of 69.7%. Characterization of SCB was carried out and there was no significant difference between the chemical composition of untreated and [EMIM]oAc-pretreated SCB. Pretreated SCB was found to be porous, less crystalline and favorable to enzymatic hydrolysis, as shown by Scanning Electron Microscopy (SEM), X-ray Powder Diffraction (XRD) analysis and Fourier Transform Infrared (FTIR) analysis. In short, [EMIM]oAc pretreatment performs well in improving the RS yield after enzymatic hydrolysis, besides giving desirable structural modification of pretreated SCB. These are of great benefit to the subsequent downstream processes. Highlights: Reliable model prediction of reducing sugar yield. Temperature has the most significant effect on [EMIM]oAc pretreatment. High solid loading in [EMIM]oAc pretreatment is feasible. Amorphous and porous structure in pretreated bagasse was confirmed. No significant variation in chemical composition of untreated and pretreated bagasse.
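For three factors, the second-order polynomial typically fitted to central composite design data has the general form below. The symbols are assumed for illustration ($x_1$ temperature, $x_2$ time, $x_3$ solid loading, $y$ the RS yield); the fitted coefficient values are not given in the abstract and are not reproduced here.

```latex
y = \beta_0
  + \sum_{i=1}^{3} \beta_i x_i
  + \sum_{i=1}^{3} \beta_{ii} x_i^{2}
  + \sum_{1 \le i < j \le 3} \beta_{ij} x_i x_j
  + \varepsilon
```

The model has ten coefficients (one intercept, three linear, three quadratic, three interaction terms), and the optimum operating point is found where the fitted surface is maximal within the experimental region.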
Ou, Dongshu
2010-01-01
The high school exit exam (HSEE) is rapidly becoming a standardized assessment procedure for educational accountability in the United States. I use a unique, state-specific dataset to identify the effects of failing the HSEE on the likelihood of dropping out of high school based on a regression discontinuity design. The analysis shows that…
Reynolds, Cecil R.; Gutkin, Terry B.
1980-01-01
Regression lines for the prediction of achievement were compared across race through the Potthoff analysis, which provides a simultaneous test of slope and intercept values. Results of these comparisons generally supported the predictive validity of the Wechsler Intelligence Scale for Children-Revised across race with this referral sample of young…
Long, Nguyen Phuoc; Huy, Nguyen Tien; Trang, Nguyen Thi Huyen; Luan, Nguyen Thien; Anh, Nguyen Hoang; Nghi, Tran Diem; Hieu, Mai Van; Hirayama, Kenji; Karbwang, Juntra
2014-01-01
BACKGROUND: Ethics is one of the main pillars in the development of science. We performed a JoinPoint regression analysis to analyze the trends of ethical issue research over the past half century. The question is whether ethical issues are neglected despite their importance in modern research.
Elnasir, Selma; Shamsuddin, Siti Mariyam; Farokhi, Sajad
2015-01-01
Palm vein recognition (PVR) is a promising new biometric that has been applied successfully as a method of access control by many organizations, which has even further potential in the field of forensics. The palm vein pattern has highly discriminative features that are difficult to forge because of its subcutaneous position in the palm. Despite considerable progress and a few practical issues, providing accurate palm vein readings has remained an unsolved issue in biometrics. We propose a robust and more accurate PVR method based on the combination of wavelet scattering (WS) with spectral regression kernel discriminant analysis (SRKDA). As the dimension of WS generated features is quite large, SRKDA is required to reduce the extracted features to enhance the discrimination. The results based on two public databases (the PolyU Hyper Spectral Palmprint public database and PolyU Multi Spectral Palmprint) show the high performance of the proposed scheme in comparison with state-of-the-art methods. The proposed approach scored a 99.44% identification rate and a 99.90% verification rate [equal error rate (EER)=0.1%] for the hyperspectral database and a 99.97% identification rate and a 99.98% verification rate (EER=0.019%) for the multispectral database.
Dai, Wensheng
2014-01-01
Sales forecasting is one of the most important issues in managing information technology (IT) chain store sales, since an IT chain store has many branches. Integrating a feature extraction method with a prediction tool, such as support vector regression (SVR), is a useful way to construct an effective sales forecasting scheme. Independent component analysis (ICA) is a novel feature extraction technique that has been widely applied to various forecasting problems. Up to now, however, only the basic ICA method (i.e., the temporal ICA model) has been applied to the sales forecasting problem. In this paper, we use three different ICA methods, spatial ICA (sICA), temporal ICA (tICA), and spatiotemporal ICA (stICA), to extract features from the sales data and compare their performance in sales forecasting for an IT chain store. Experimental results on real sales data show that the sales forecasting scheme integrating stICA and SVR outperforms the comparison models in terms of forecasting error. The stICA is a promising tool for extracting effective features from branch sales data, and the extracted features can improve the prediction performance of SVR for sales forecasting. PMID:25165740
Power Law Regression Analysis of Heat Flux Width in Type I ELMs
Stephens, C. D.; Makowski, M. A.; Leonard, A. W.; Osborne, T. H.
2014-10-01
In this project, a database of Type I ELM characteristics has been assembled and will be used to investigate possible dependencies of the heat flux width on physics and engineering parameters. At the edge near the divertor, high impulsive heat loads are imparted onto the surface. If the heat flux is great enough, the impact of these ELMs can reduce divertor lifetime through material erosion. A program will be used to analyze the data, extract relevant, measurable quantities, and record them in the table. Care is taken to accurately capture the complex space/time structure of the ELM. Correlations between discharge and equilibrium parameters will then be investigated. Power law regression analysis will be used to help determine the dependence of the heat flux width on these measurable quantities and parameters. This will enable us to better understand the physics of heat flux at the edge. Work supported in part by the National Undergraduate Fellowship Program in Plasma Physics and Fusion Energy Sciences and the US DOE under DE-FG02-04ER54761, DE-AC52-07NA27344, DE-FC02-04ER54698.
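As an illustration of what a power law regression involves, the following minimal sketch fits y = a * x**b by ordinary least squares on the logarithms of the data. It is not the authors' analysis program: the function name `fit_power_law`, the single-predictor form, and the synthetic data are assumptions made only for this example, and an actual study of heat flux width would likely involve several predictors.

```python
import math

def fit_power_law(x, y):
    """Fit y = a * x**b by least squares on (log x, log y).

    Hypothetical helper for illustration; requires strictly positive data.
    """
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    b = sxy / sxx                        # exponent: slope in log-log space
    a = math.exp(mean_y - b * mean_x)    # prefactor: exp of the intercept
    return a, b

# Synthetic data (not measurements) generated from y = 3 * x**-0.5.
x = [1.0, 2.0, 4.0, 8.0]
y = [3.0 * v ** -0.5 for v in x]
a, b = fit_power_law(x, y)
```

Fitting in log space weights relative rather than absolute errors, which is a common choice when a quantity spans several orders of magnitude.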
International Nuclear Information System (INIS)
There are many opinions on the cause of hypothyroidism following 131I treatment of hyperthyroidism, but only a few scientific analyses and reports address it. Unconditional logistic regression solved this problem successfully, offering higher scientific value and confidence in risk factor analysis. Follow-up data from 748 patients were analysed by unconditional logistic regression. The results showed that the half-life and the 131I dose were the main causes of the incidence of hypothyroidism. The degree of confidence is 92.4%.
Analysis of a multiple dispatch algorithm
Holmberg, Johannes
2004-01-01
The development of the new programming language Scream, within the project Software Renaissance, created the need for a good multiple dispatch algorithm. A multiple dispatch algorithm called Compressed n-dimensional table with row sharing (CNT-RS) was developed from the algorithm Compressed n-dimensional table (CNT). The purpose of CNT-RS was to create a more efficient algorithm. This report is the result of the work to analyse the CNT-RS algorithm. In this report the domain of multiple dispat...
Mauro, Alessandro
2006-01-01
PURPOSE: Quality of life in multiple sclerosis has often been measured through the SF-36 questionnaire. In this study, validation of the SF-36 summary scores, its 'physical' component, and its 'mental' component was attempted by exploring the joint predictive power of disability (EDSS score), of anxiety and depression (HADS-A and -D scores, respectively), and of disease duration, progression type, age, gender and marital status. METHOD: The sample consisted of 75 patients suffering from multi...
Lançon Christophe
2006-07-01
Background: Data comparing duloxetine with existing antidepressant treatments are limited. A comparison of duloxetine with fluoxetine has been performed, but no comparison with venlafaxine, the other antidepressant in the same therapeutic class with a significant market share, has been undertaken. In the absence of relevant data to assess the place that duloxetine should occupy in the therapeutic arsenal, indirect comparisons are the most rigorous way to go. We conducted a systematic review of the efficacy of duloxetine, fluoxetine and venlafaxine versus placebo in the treatment of Major Depressive Disorder (MDD), and performed indirect comparisons through meta-regressions. Methods: The bibliography of the Agency for Health Care Policy and Research and the CENTRAL, Medline, and Embase databases were interrogated using advanced search strategies based on a combination of text and index terms. The search focused on randomized placebo-controlled clinical trials involving adult patients treated for acute phase Major Depressive Disorder. All outcomes were derived to account for varying placebo responses across studies. The primary outcome was treatment efficacy as measured by Hedges' g effect size. Secondary outcomes were response and dropout rates as measured by log odds ratios. Meta-regressions were run to indirectly compare the drugs. Sensitivity analyses assessing the influence of individual studies and of patients' characteristics on the results were run. Results: 22 studies involving fluoxetine, 9 involving duloxetine and 8 involving venlafaxine were selected. Using indirect comparison methodology, estimated effect sizes for efficacy compared with duloxetine were 0.11 [-0.14;0.36] for fluoxetine and 0.22 [0.06;0.38] for venlafaxine. Response log odds ratios were -0.21 [-0.44;0.03] and 0.70 [0.26;1.14], respectively. Dropout log odds ratios were -0.02 [-0.33;0.29] and 0.21 [-0.13;0.55], respectively. Sensitivity analyses showed that results were consistent. Conclusion: Fluoxetine was not statistically different from duloxetine in either tolerability or efficacy. Venlafaxine was significantly superior to duloxetine in all analyses except dropout rate. In the absence of relevant data from head-to-head comparison trials, results suggest that venlafaxine is superior to duloxetine and that duloxetine does not differentiate from fluoxetine.
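To make the quantities in this abstract concrete, the sketch below computes a Hedges' g effect size from two group summaries and then forms a simple adjusted indirect comparison of two drugs through their common placebo comparator. This is a swapped-in simplification, not the meta-regression the authors ran. The function names, the 1.96 normal quantile for a 95% interval, and all numeric inputs are assumptions chosen for illustration only.

```python
import math

def hedges_g(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Standardized mean difference with the usual small-sample correction."""
    pooled_sd = math.sqrt(
        ((n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2) / (n_1 + n_2 - 2)
    )
    d = (mean_1 - mean_2) / pooled_sd
    correction = 1 - 3 / (4 * (n_1 + n_2) - 9)   # approximate bias correction
    return correction * d

def indirect_difference(effect_a, se_a, effect_b, se_b, z=1.96):
    """Compare A with B via a shared comparator (both effects are vs placebo).

    Assumes the two estimates are independent, so their variances add.
    """
    diff = effect_a - effect_b
    se = math.sqrt(se_a ** 2 + se_b ** 2)
    return diff, (diff - z * se, diff + z * se)

# Illustrative inputs only; these are not values from the review.
g = hedges_g(10.0, 2.0, 20, 8.0, 2.0, 20)
diff, (low, high) = indirect_difference(0.5, 0.3, 0.2, 0.4)
```

An interval that excludes zero would indicate a statistically significant indirect difference at the chosen level, which is how the bracketed intervals in the abstract can be read.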
Pradhan, Biswajeet
In 2006 and 2007, heavy monsoon rainfall triggered floods along Malaysia's east coast as well as in the southern state of Johor. The hardest hit areas were along the east coast of peninsular Malaysia in the states of Kelantan, Terengganu and Pahang. In the south, the city of Johor was particularly hard hit. The floods cost nearly a billion ringgit in property damage and many lives. The extent of damage could have been reduced or minimized if an early warning system had been in place. This paper deals with flood susceptibility analysis using a logistic regression model. We have evaluated the flood susceptibility and the effect of flood-related factors along the Kelantan river basin using the Geographic Information System (GIS) and remote sensing data. Previously flooded areas were extracted from archived RADARSAT images using image processing tools. Flood susceptibility mapping was conducted in the study area along the Kelantan River using RADARSAT imagery and then enlarged to 1:25,000 scale. Topographical, hydrological, geological data and satellite images were collected, processed, and constructed into a spatial database using GIS and image processing. The factors chosen that influence flood occurrence were: topographic slope, topographic aspect, topographic curvature, DEM and distance from river drainage, all from the topographic database; flow direction and flow accumulation, extracted from the hydrological database; geology and distance from lineament, taken from the geologic database; land use from SPOT satellite images; soil texture from the soil database; and the vegetation index value from SPOT satellite images. Flood susceptible areas were analyzed and mapped using the probability-logistic regression model. Results indicate that flood-prone areas can be mapped at 1:25,000, which is comparable to some conventional flood hazard map scales. The flood-prone areas delineated on these maps correspond to areas that would be inundated by significant flooding (approximately the 100-year flood). The flood-prone area boundaries were generally in agreement with flood hazard maps produced by the Department of Irrigation and Drainage, although the latter are somewhat more detailed because of their larger scale.
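The way a fitted logistic regression model turns conditioning factors into a susceptibility value can be sketched as follows. The function name `flood_probability`, the two factor names, and every coefficient are hypothetical placeholders; the abstract does not report the fitted coefficients, and the real model uses many more factors.

```python
import math

def flood_probability(factors, coefficients, intercept):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    z = intercept + sum(coefficients[name] * value
                        for name, value in factors.items())
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients for two factors (assumed, not from the paper).
coefficients = {"slope_deg": -0.10, "distance_to_river_km": -0.80}
intercept = 1.0

near_river = flood_probability(
    {"slope_deg": 2.0, "distance_to_river_km": 0.5}, coefficients, intercept)
far_from_river = flood_probability(
    {"slope_deg": 2.0, "distance_to_river_km": 5.0}, coefficients, intercept)
neutral = flood_probability(
    {"slope_deg": 0.0, "distance_to_river_km": 0.0}, coefficients, 0.0)
```

Evaluating such a function for every grid cell of the spatial database would yield a probability surface that can then be classified into susceptibility zones.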
Nowrouzi, Behdin; Souza, Renan P; Zai, Clement; Shinkai, Takahiro; Monda, Marcellino; Lieberman, Jeffrey; Volvaka, Jan; Meltzer, Herbert Y; Kennedy, James L; De Luca, Vincenzo
2013-03-01
Antipsychotic-induced weight gain is a complex phenomenon with a relevant underlying genetic basis. Polymorphisms of serotonin receptors and related proteins were genotyped in 139 schizophrenia patients and incorporated as covariates in a mixture regression model of weight gain in combination with clinical covariates. The HTR1D rs6300 polymorphism showed slight significance in conferring risk for obesity (heavy weight gain group) under an additive model. After correcting for multiple testing, all the genetic predictors were non-significant; however, the clinical predictors were associated with the risk of heavy weight gain. These findings suggest a role of ethnicity and olanzapine in increasing the risk for obesity in the heavy weight gain group, and of haloperidol in protecting against heavy weight gain. The mixture regression model appears to be a useful strategy to highlight different weight gain subgroups that are affected differently by clinical and genetic predictors. PMID:22840963
International Nuclear Information System (INIS)
Spontaneous regression of cerebral arteriovenous malformation (AVM) is rare and poorly understood. We reviewed the clinical and angiographic findings in patients who had spontaneous regression of cerebral AVMs to determine whether common features were present. The clinical and angiographic findings of four cases from our series and 29 cases from the literature were retrospectively reviewed. The clinical and angiographic features analyzed were: age at diagnosis, initial presentation, venous drainage pattern, number of draining veins, location of the AVM, number of arterial feeders, clinical events during the interval period to thrombosis, and interval period to spontaneous thrombosis. Common clinical and angiographic features of spontaneous regression of cerebral AVMs are: intracranial hemorrhage as an initial presentation, small AVMs, and a single draining vein. Spontaneous regression of cerebral AVMs cannot be predicted by clinical or angiographic features; therefore, it should not be considered as an option in cerebral AVM management, despite its proven occurrence. (orig.)