WorldWideScience

Sample records for multiple regression analysis

  1. Multiple linear regression analysis

    Science.gov (United States)

    Edwards, T. R.

    1980-01-01

    Program rapidly selects best-suited set of coefficients. User supplies only vectors of independent and dependent data and specifies confidence level required. Program uses stepwise statistical procedure for relating minimal set of variables to set of observations; final regression contains only most statistically significant coefficients. Program is written in FORTRAN IV for batch execution and has been implemented on NOVA 1200.
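
    The record does not reproduce the FORTRAN source. As a rough illustration of a stepwise procedure of this general kind, the Python sketch below performs forward selection with a partial F-statistic. The function names, the use of NumPy, and the fixed entry threshold `f_enter` (standing in here for the user-specified confidence level) are assumptions made for this example, not details of the original program.

```python
import numpy as np


def residual_sum_of_squares(A, y):
    """Least-squares fit of y on the columns of A; returns the residual sum of squares."""
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)


def forward_stepwise(X, y, f_enter=4.0):
    """Forward selection: repeatedly add the predictor with the largest
    partial F-statistic, stopping when no candidate reaches f_enter.

    f_enter is a hypothetical threshold; a real program would derive it
    from the requested confidence level via the F distribution.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    ones = np.ones((n, 1))
    selected = []
    current_rss = residual_sum_of_squares(ones, y)  # intercept-only model

    while len(selected) < p:
        best = None
        for j in range(p):
            if j in selected:
                continue
            cols = selected + [j]
            new_rss = residual_sum_of_squares(np.hstack([ones, X[:, cols]]), y)
            df_resid = n - len(cols) - 1
            if df_resid <= 0 or new_rss <= 0.0:
                continue  # no degrees of freedom left, or an exact fit
            f_stat = (current_rss - new_rss) / (new_rss / df_resid)
            if best is None or f_stat > best[0]:
                best = (f_stat, j, new_rss)
        if best is None or best[0] < f_enter:
            break
        selected.append(best[1])
        current_rss = best[2]
    return selected
```

    The function returns the column indices in the order they entered the model. This sketch only adds variables; a full stepwise procedure would also re-test variables already in the model for removal.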

  2. Remaining Phosphorus Estimate Through Multiple Regression Analysis

    Institute of Scientific and Technical Information of China (English)

    M. E. ALVES; A. LAVORENTI

    2006-01-01

    The remaining phosphorus (Prem), P concentration that remains in solution after shaking soil with 0.01 mol L-1 CaCl2 containing 60 μg mL-1 P, is a very useful index for studies related to the chemistry of variable charge soils. Although the Prem determination is a simple procedure, the possibility of estimating accurate values of this index from easily and/or routinely determined soil properties can be very useful for practical purposes. The present research evaluated the Prem estimation through multiple regression analysis in which routinely determined soil chemical data, soil clay content and soil pH measured in 1 mol L-1 NaF (pHNaF) figured as Prem predictor variables. The Prem can be estimated with acceptable accuracy using the above-mentioned approach, and pHNaF not only substitutes for clay content as a predictor variable but also confers more accuracy to the Prem estimates.

  3. General Nature of Multicollinearity in Multiple Regression Analysis.

    Science.gov (United States)

    Liu, Richard

    1981-01-01

    Discusses multiple regression, a very popular statistical technique in the field of education. One of the basic assumptions in regression analysis requires that independent variables in the equation should not be highly correlated. The problem of multicollinearity and some of the solutions to it are discussed. (Author)

  4. Research and analyze of physical health using multiple regression analysis

    Directory of Open Access Journals (Sweden)

    T. S. Kyi

    2014-01-01

    This paper presents research that attempts to build a mathematical model of "healthy people" using regression analysis. The factors are physical parameters of the person (such as heart rate, lung capacity, blood pressure, breath holding, weight-height coefficient, flexibility of the spine, muscles of the shoulder belt, abdominal muscles, squatting, etc.), and the response variable is an indicator of physical working capacity. Multiple regression analysis yielded useful models that can predict the physical performance of boys aged fourteen to seventeen years. The paper describes the development of the regression model for sixteen-year-old boys and analyzes the results.

  5. MULTIPLE REGRESSION ANALYSIS OF MAIN ECONOMIC INDICATORS IN TOURISM

    Directory of Open Access Journals (Sweden)

    Erika KULCSÁR

    2009-12-01

    This paper analyzes the relationship between the dependent variable, GDP in the hotels and restaurants sector, and the following independent variables: overnight stays in tourist reception establishments, arrivals in tourist reception establishments, and investments in the hotels and restaurants sector, over the period 1995-2007. Using multiple regression analysis, I found that investments and tourist arrivals are significant predictors of the GDP dependent variable. Based on these results, I identified those components of the marketing mix which, in my opinion, require investment and could contribute to the positive development of tourist arrivals in tourist reception establishments.

  6. Multiple regression for physiological data analysis: the problem of multicollinearity.

    Science.gov (United States)

    Slinker, B K; Glantz, S A

    1985-07-01

    Multiple linear regression, in which several predictor variables are related to a response variable, is a powerful statistical tool for gaining quantitative insight into complex in vivo physiological systems. For these insights to be correct, all predictor variables must be uncorrelated. However, in many physiological experiments the predictor variables cannot be precisely controlled and thus change in parallel (i.e., they are highly correlated). There is a redundancy of information about the response, a situation called multicollinearity, that leads to numerical problems in estimating the parameters in regression equations; the parameters are often of incorrect magnitude or sign or have large standard errors. Although multicollinearity can be avoided with good experimental design, not all interesting physiological questions can be studied without encountering multicollinearity. In these cases various ad hoc procedures have been proposed to mitigate multicollinearity. Although many of these procedures are controversial, they can be helpful in applying multiple linear regression to some physiological problems.
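
    One common diagnostic for the redundancy described above is the variance inflation factor (VIF), obtained by regressing each predictor on all the others. The abstract does not say which diagnostics the authors use, so the following Python sketch is only an illustration; the function name `vif` and the use of NumPy are assumptions of this example.

```python
import numpy as np


def vif(X):
    """Variance inflation factor for each column of X (n_samples x n_predictors).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j
    on the remaining columns plus an intercept.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    factors = []
    for j in range(p):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        ss_res = float(resid @ resid)
        ss_tot = float(((target - target.mean()) ** 2).sum())
        unexplained = ss_res / ss_tot  # equals 1 - R_j^2
        factors.append(float("inf") if unexplained <= 0.0 else 1.0 / unexplained)
    return factors
```

    A VIF near 1 indicates a predictor that is nearly uncorrelated with the others; large values flag predictors whose coefficients are likely to have inflated standard errors.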

  7. Business applications of multiple regression

    CERN Document Server

    Richardson, Ronny

    2015-01-01

    This second edition of Business Applications of Multiple Regression describes the use of the statistical procedure called multiple regression in business situations, including forecasting and understanding the relationships between variables. The book assumes a basic understanding of statistics but reviews correlation analysis and simple regression to prepare the reader to understand and use multiple regression. The techniques described in the book are illustrated using both Microsoft Excel and a professional statistical program. Along the way, several real-world data sets are analyzed in deta

  8. An improved multiple linear regression and data analysis computer program package

    Science.gov (United States)

    Sidik, S. M.

    1972-01-01

    NEWRAP, an improved version of a previous multiple linear regression program called RAPIER, CREDUC, and CRSPLT, allows for a complete regression analysis including cross plots of the independent and dependent variables, correlation coefficients, regression coefficients, analysis of variance tables, t-statistics and their probability levels, rejection of independent variables, plots of residuals against the independent and dependent variables, and a canonical reduction of quadratic response functions useful in optimum seeking experimentation. A major improvement over RAPIER is that all regression calculations are done in double precision arithmetic.

  9. Modeling of retardance in ferrofluid with Taguchi-based multiple regression analysis

    Science.gov (United States)

    Lin, Jing-Fung; Wu, Jyh-Shyang; Sheu, Jer-Jia

    2015-03-01

    Citric acid (CA) coated Fe3O4 ferrofluids are prepared by a co-precipitation method, and the magneto-optical retardance is measured by a Stokes polarimeter. Optimization and multiple regression of retardance in the ferrofluids are carried out by combining the Taguchi method and Excel. From nine tests on four parameters (pH of suspension, molar ratio of CA to Fe3O4, volume of CA, and coating temperature), the influence sequence and the optimal parameter combination are found. Multiple regression analysis and an F-test on the significance of the regression equation are performed. The model F value is found to be much larger than Fcritical, with significance level P < 0.0001, so it can be concluded that the regression model has statistically significant predictive ability. Substituting the optimal combination into the regression equation gives a retardance of 32.703°, which is 11.4% higher than the highest value in the tests.
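
    The overall F-test mentioned above compares the variance explained by the regression with the residual variance. The abstract does not give the regression equation or its R-squared, so the Python sketch below only shows the standard formula; the function name and the example values (R-squared of 0.8, nine observations, four predictors) are hypothetical.

```python
def overall_f_statistic(r_squared, n_obs, n_predictors):
    """F-statistic for the null hypothesis that all slope coefficients are zero.

    F = (R^2 / k) / ((1 - R^2) / (n - k - 1)), with k predictors and
    n observations; it is compared against the critical value of the
    F distribution with (k, n - k - 1) degrees of freedom.
    """
    df_model = n_predictors
    df_resid = n_obs - n_predictors - 1
    if df_resid <= 0:
        raise ValueError("need more observations than parameters")
    return (r_squared / df_model) / ((1.0 - r_squared) / df_resid)


# Hypothetical values, not taken from the paper.
f_value = overall_f_statistic(r_squared=0.8, n_obs=9, n_predictors=4)
```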

  10. Assessing the Impact of Influential Observations on Multiple Regression Analysis on Human Resource Research.

    Science.gov (United States)

    Bates, Reid A.; Holton, Elwood F., III; Burnett, Michael F.

    1999-01-01

    A case study of learning transfer demonstrates the possible effect of influential observation on linear regression analysis. A diagnostic method that tests for violation of assumptions, multicollinearity, and individual and multiple influential observations helps determine which observation to delete to eliminate bias. (SK)
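
    The abstract does not name the diagnostic used to flag influential observations. Cook's distance is one widely used measure, and the Python sketch below computes it as an illustration; the function name and the use of NumPy are assumptions of this example.

```python
import numpy as np


def cooks_distance(X, y):
    """Cook's distance for each observation in an ordinary least-squares fit.

    X is the full design matrix (include a column of ones for the intercept).
    D_i = e_i^2 * h_i / (p * s^2 * (1 - h_i)^2), where e_i is the residual,
    h_i the leverage, p the number of columns of X, and s^2 the residual
    variance estimate.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    Q, _ = np.linalg.qr(X)            # reduced QR: Q is n x p
    leverage = np.sum(Q ** 2, axis=1)  # diagonal of the hat matrix
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    s2 = float(resid @ resid) / (n - p)
    return resid ** 2 * leverage / (p * s2 * (1.0 - leverage) ** 2)
```

    Observations with a distance much larger than the rest are candidates for closer inspection; whether to delete them remains a judgment call.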

  11. Factor analysis and multiple regression between topography and precipitation on Jeju Island, Korea

    Science.gov (United States)

    Um, Myoung-Jin; Yun, Hyeseon; Jeong, Chang-Sam; Heo, Jun-Haeng

    2011-11-01

    Summary: In this study, new factors that influence precipitation were extracted from geographic variables using factor analysis, which allow for an accurate estimation of orographic precipitation. Correlation analysis was also used to examine the relationship between nine topographic variables from digital elevation models (DEMs) and the precipitation in Jeju Island. In addition, a spatial analysis was performed in order to verify the validity of the regression model. From the results of the correlation analysis, it was found that all of the topographic variables had a positive correlation with the precipitation. The relations between the variables also changed in accordance with a change in the precipitation duration. However, upon examining the correlation matrix, no significant relationship between the latitude and the aspect was found. According to the factor analysis, eight topographic variables (latitude being the exception) were found to have a direct influence on the precipitation. Three factors were then extracted from the eight topographic variables. By directly comparing the multiple regression model with the factors (model 1) to the multiple regression model with the topographic variables (model 3), it was found that model 1 did not violate the limits of statistical significance and multicollinearity. As such, model 1 was considered to be appropriate for estimating the precipitation when taking into account the topography. In the study of model 1, the multiple regression model using factor analysis was found to be the best method for estimating the orographic precipitation on Jeju Island.

  12. FORECASTING THE FINANCIAL RETURNS FOR USING MULTIPLE REGRESSION BASED ON PRINCIPAL COMPONENT ANALYSIS

    Directory of Open Access Journals (Sweden)

    Nop Sopipan

    2013-01-01

    The aim of this study was to forecast the returns of the Stock Exchange of Thailand (SET) Index by adding some explanatory variables and a stationary autoregressive term of order p (AR(p)) to the mean equation of returns. In addition, we used Principal Component Analysis (PCA) to remove possible complications caused by multicollinearity. Results showed that the multiple regression based on PCA has the best performance.
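
    Regressing on principal components rather than on the original correlated predictors is often called principal component regression. The abstract gives no implementation details, so the Python sketch below is a generic version under stated assumptions: predictors are standardized, the first `k` components are kept, and the AR(p) term of the paper is not modelled. The function names and the use of NumPy are choices made for this example.

```python
import numpy as np


def pcr_fit(X, y, k):
    """Principal component regression keeping the first k components."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mean = X.mean(axis=0)
    scale = X.std(axis=0, ddof=1)
    Z = (X - mean) / scale                       # standardized predictors
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt[:k].T                          # p x k
    scores = Z @ loadings                        # n x k, mutually orthogonal
    y_mean = y.mean()
    # Orthogonal scores make the regression a set of independent projections:
    # scores.T @ scores is diagonal with entries s[:k] ** 2.
    gamma = (scores.T @ (y - y_mean)) / (s[:k] ** 2)
    beta_std = loadings @ gamma                  # coefficients on standardized X
    return mean, scale, beta_std, y_mean


def pcr_predict(model, X):
    """Predict with a model returned by pcr_fit."""
    mean, scale, beta_std, y_mean = model
    Z = (np.asarray(X, dtype=float) - mean) / scale
    return y_mean + Z @ beta_std
```

    With `k` equal to the number of predictors the fitted values coincide with ordinary least squares; dropping the low-variance components is what removes the near-collinear directions.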

  13. COLOR IMAGE RETRIEVAL BASED ON FEATURE FUSION THROUGH MULTIPLE LINEAR REGRESSION ANALYSIS

    Directory of Open Access Journals (Sweden)

    K. Seetharaman

    2015-08-01

    This paper proposes a novel technique based on feature fusion using multiple linear regression analysis, and the least-square estimation method is employed to estimate the parameters. The given input query image is segmented into various regions according to the structure of the image. The color and texture features are extracted on each region of the query image, and the features are fused together using the multiple linear regression model. The estimated parameters of the model, which is built from the features, are assembled into a vector called a feature vector. The Canberra distance measure is adopted to compare the feature vectors of the query and target images. The F-measure is applied to evaluate the performance of the proposed technique. The obtained results show that the proposed technique is comparable to the other existing techniques.

  14. Proximate analysis based multiple regression models for higher heating value estimation of low rank coals

    Energy Technology Data Exchange (ETDEWEB)

    Akkaya, Ali Volkan [Department of Mechanical Engineering, Yildiz Technical University, 34349 Besiktas, Istanbul (Turkey)]

    2009-02-15

    In this paper, multiple nonlinear regression models for estimating the higher heating value (HHV) of coals are developed using proximate analysis data obtained generally from low rank coal samples on an as-received basis. In this modeling study, three main model structures, which depend on the number of proximate analysis parameters used as independent variables (moisture, ash, volatile matter and fixed carbon), are first categorized. Secondly, sub-model structures with different arrangements of the independent variables are considered. Each sub-model structure is analyzed with a number of model equations in order to find the best fitting model using the multiple nonlinear regression method. Based on the results of the nonlinear regression analysis, the best model for each sub-structure is determined. Among them, the models giving the highest correlation for the three main structures are selected. Although all three selected models predict HHV rather accurately, the model involving four independent variables provides the most accurate estimation of HHV. Additionally, when the chosen model with four independent variables and a literature model are tested with extra proximate analysis data, it is seen that the model developed in this study can give more accurate prediction of HHV of coals. It can be concluded that the developed model is an effective tool for HHV estimation of low rank coals. (author)

  15. Assessing Credit Default using Logistic Regression and Multiple Discriminant Analysis: Empirical Evidence from Bosnia and Herzegovina

    Directory of Open Access Journals (Sweden)

    Deni Memić

    2015-01-01

    This article aims to assess credit default prediction on the banking market in Bosnia and Herzegovina nationwide as well as in its constitutional entities (Federation of Bosnia and Herzegovina and Republika Srpska). The ability to classify companies into different predefined groups, or to find an appropriate tool which would replace human assessment in classifying companies into good and bad buckets, has been one of the main interests of risk management researchers for a long time. We investigated the possibility and accuracy of default prediction using the traditional statistical methods logistic regression (logit) and multiple discriminant analysis (MDA) and compared their predictive abilities. The results show that the created models have high predictive ability. For logit models, some variables are more influential on the default prediction than others. Return on assets (ROA) is statistically significant in all four periods prior to default, having very high regression coefficients, or high impact on the model's ability to predict default. Similar results are obtained for MDA models. It is also found that predictive ability differs between logistic regression and multiple discriminant analysis.

  16. FRICTION MODELING OF Al-Mg ALLOY SHEETS BASED ON MULTIPLE REGRESSION ANALYSIS AND NEURAL NETWORKS

    Directory of Open Access Journals (Sweden)

    Hirpa G. Lemu

    2017-03-01

    This article reports a proposed approach to describing frictional resistance in sheet metal forming processes that enables determination of the friction coefficient value under a wide range of friction conditions without performing time-consuming experiments. The motivation for this proposal is that a considerable number of factors affect the friction coefficient value, so building an analytical friction model for specified process conditions is practically impossible. In this proposed approach, a mathematical model of friction behaviour is created using multiple regression analysis and artificial neural networks. The regression analysis was performed using a subroutine in MATLAB programming code, and STATISTICA Neural Networks was utilized to build an artificial neural network model. The effect of different training strategies on the quality of the neural networks was studied. The results of strip drawing friction tests were used as input variables for the regression model and for training radial basis function networks, generalized regression neural networks and multilayer networks. Four kinds of Al-Mg alloy sheets were used as test material.

  17. Ca analysis: an Excel based program for the analysis of intracellular calcium transients including multiple, simultaneous regression analysis.

    Science.gov (United States)

    Greensmith, David J

    2014-01-01

    Here I present an Excel based program for the analysis of intracellular Ca transients recorded using fluorescent indicators. The program can perform all the necessary steps which convert recorded raw voltage changes into meaningful physiological information. The program performs two fundamental processes. (1) It can prepare the raw signal by several methods. (2) It can then be used to analyze the prepared data to provide information such as absolute intracellular Ca levels. Also, the rates of change of Ca can be measured using multiple, simultaneous regression analysis. I demonstrate that this program performs as well as commercially available software, but has numerous advantages, namely creating a simplified, self-contained analysis workflow.

  18. Regression Analysis

    CERN Document Server

    Freund, Rudolf J; Sa, Ping

    2006-01-01

    The book provides complete coverage of the classical methods of statistical analysis. It is designed to give students an understanding of the purpose of statistical analyses, to allow the student to determine, at least to some degree, the correct type of statistical analyses to be performed in a given situation, and have some appreciation of what constitutes good experimental design

  19. Multiple Regression and Its Discontents

    Science.gov (United States)

    Snell, Joel C.; Marsh, Mitchell

    2012-01-01

    Multiple regression is part of a larger statistical strategy originated by Gauss. The authors raise questions about the theory and suggest some changes that would make room for Mandelbrot and Serendipity.

  20. A Performance Study of Data Mining Techniques: Multiple Linear Regression vs. Factor Analysis

    CERN Document Server

    Taneja, Abhishek

    2011-01-01

    The growing volume of data creates an interesting challenge: the need for data analysis tools that discover regularities in these data. Data mining has emerged as a discipline that contributes tools for data analysis, discovery of hidden knowledge, and autonomous decision making in many application domains. The purpose of this study is to compare the performance of two data mining techniques, viz. factor analysis and multiple linear regression, for different sample sizes on three unique sets of data. The performance of the two data mining techniques is compared on the following parameters: mean square error (MSE), R-square, adjusted R-square, condition number, root mean square error (RMSE), number of variables included in the prediction model, modified coefficient of efficiency, F-value, and test of normality. These parameters have been computed using various data mining tools such as SPSS, XLstat, Stata, and MS-Excel. It is seen that for all the given datasets, factor analysis outperforms multiple linear re...
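
    Several of the comparison measures listed above follow directly from the observed and fitted values. The Python sketch below computes four of them using their textbook definitions; the function name is hypothetical, and note that MSE here divides by the number of observations, whereas some packages divide by the residual degrees of freedom.

```python
import math


def fit_metrics(y, y_hat, n_predictors):
    """MSE, RMSE, R-square and adjusted R-square for fitted values y_hat."""
    n = len(y)
    mean_y = sum(y) / n
    ss_res = sum((obs - fit) ** 2 for obs, fit in zip(y, y_hat))
    ss_tot = sum((obs - mean_y) ** 2 for obs in y)
    mse = ss_res / n  # assumption: divide by n, not by n - k - 1
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)
    return {"mse": mse, "rmse": math.sqrt(mse), "r2": r2, "adj_r2": adj_r2}
```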

  1. Variables Associated with Communicative Participation in People with Multiple Sclerosis: A Regression Analysis

    Science.gov (United States)

    Baylor, Carolyn; Yorkston, Kathryn; Bamer, Alyssa; Britton, Deanna; Amtmann, Dagmar

    2010-01-01

    Purpose: To explore variables associated with self-reported communicative participation in a sample (n = 498) of community-dwelling adults with multiple sclerosis (MS). Method: A battery of questionnaires was administered online or on paper per participant preference. Data were analyzed using multiple linear backward stepwise regression. The…

  2. Melanin and blood concentration in human skin studied by multiple regression analysis: experiments

    Science.gov (United States)

    Shimada, M.; Yamada, Y.; Itoh, M.; Yatagai, T.

    2001-09-01

    Knowledge of the mechanism of human skin colour and measurement of melanin and blood concentration in human skin are needed in the medical and cosmetic fields. The absorbance spectrum from reflectance at the visible wavelength of human skin increases under several conditions such as a sunburn or scalding. The change of the absorbance spectrum from reflectance including the scattering effect does not correspond to the molar absorption spectrum of melanin and blood. The modified Beer-Lambert law is applied to the change in the absorbance spectrum from reflectance of human skin as the change in melanin and blood is assumed to be small. The concentration of melanin and blood was estimated from the absorbance spectrum from reflectance of human skin using multiple regression analysis. Estimated concentrations were compared with the measured ones in a phantom experiment and this method was applied to in vivo skin.

  3. [Multiple dependent variables LS-SVM regression algorithm and its application in NIR spectral quantitative analysis].

    Science.gov (United States)

    An, Xin; Xu, Shuo; Zhang, Lu-Da; Su, Shi-Guang

    2009-01-01

    In the present paper, on the basis of LS-SVM algorithm, we built a multiple dependent variables LS-SVM (MLS-SVM) regression model whose weights can be optimized, and gave the corresponding algorithm. Furthermore, we theoretically explained the relationship between MLS-SVM and LS-SVM. Sixty four broomcorn samples were taken as experimental material, and the sample ratio of modeling set to predicting set was 51 : 13. We first selected randomly and uniformly five weight groups in the interval [0, 1], and then in the way of leave-one-out (LOO) rule determined one appropriate weight group and parameters including penalizing parameters and kernel parameters in the model according to the criterion of the minimum of average relative error. Then a multiple dependent variables quantitative analysis model was built with NIR spectrum and simultaneously analyzed three chemical constituents containing protein, lysine and starch. Finally, the average relative errors between actual values and predicted ones by the model of three components for the predicting set were 1.65%, 6.47% and 1.37%, respectively, and the correlation coefficients were 0.9940, 0.8392 and 0.8825, respectively. For comparison, LS-SVM was also utilized, for which the average relative errors were 1.68%, 6.25% and 1.47%, respectively, and the correlation coefficients were 0.9941, 0.8310 and 0.8800, respectively. It is obvious that MLS-SVM algorithm is comparable to LS-SVM algorithm in modeling analysis performance, and both of them can give satisfying results. The result shows that the model with MLS-SVM algorithm is capable of doing multi-components NIR quantitative analysis synchronously. Thus MLS-SVM algorithm offers a new multiple dependent variables quantitative analysis approach for chemometrics. In addition, the weights have certain effect on the prediction performance of the model with MLS-SVM, which is consistent with our intuition and is validated in this study. Therefore, it is necessary to optimize

  4. A factor analysis-multiple regression model for source apportionment of suspended particulate matter

    Science.gov (United States)

    Okamoto, Shin'ichi; Hayashi, Masayuki; Nakajima, Masaomi; Kainuma, Yasutaka; Shiozawa, Kiyoshige

    A factor analysis-multiple regression (FA-MR) model has been used for a source apportionment study in the Tokyo metropolitan area. By a varimax rotated factor analysis, five source types could be identified: refuse incineration, soil and automobile, secondary particles, sea salt and steel mill. Quantitative estimations using the FA-MR model corresponded to the calculated contributing concentrations determined by using a weighted least-squares CMB model. However, the source type of refuse incineration identified by the FA-MR model was similar to that of biomass burning, rather than that produced by an incineration plant. The estimated contributions of sea salt and steel mill by the FA-MR model contained those of other sources, which have the same temporal variation of contributing concentrations. This symptom was caused by a multicollinearity problem. Although this result shows the limitation of the multivariate receptor model, it gives useful information concerning source types and their distribution by comparing with the results of the CMB model. In the Tokyo metropolitan area, the contributions from soil (including road dust), automobile, secondary particles and refuse incineration (biomass burning) were larger than industrial contributions: fuel oil combustion and steel mill. However, since vanadium is highly correlated with SO₄²⁻ and other secondary particle related elements, a major portion of secondary particles is considered to be related to fuel oil combustion.

  5. Performance Prediction Modelling for Flexible Pavement on Low Volume Roads Using Multiple Linear Regression Analysis

    Directory of Open Access Journals (Sweden)

    C. Makendran

    2015-01-01

    Prediction models for low volume village roads in India are developed to evaluate the progression of different types of distress such as roughness, cracking, and potholes. Even though the Government of India invests a large amount of money in road construction every year, poor control over the quality of road construction and its subsequent maintenance leads to faster road deterioration. In this regard, it is essential that scientific maintenance procedures be developed on the basis of the performance of low volume flexible pavements. Considering the above, this research attempts to develop prediction models to understand the progression of roughness, cracking, and potholes in flexible pavements that receive little or no routine maintenance. Distress data were collected from low volume rural roads covering about 173 stretches spread across Tamil Nadu state in India. Based on the collected data, distress prediction models have been developed using multiple linear regression analysis. Further, the models have been validated using independent field data. It can be concluded that the models developed in this study can serve as useful tools for practicing engineers maintaining flexible pavements on low volume roads.

  6. A simplified calculation procedure for mass isotopomer distribution analysis (MIDA) based on multiple linear regression.

    Science.gov (United States)

    Fernández-Fernández, Mario; Rodríguez-González, Pablo; García Alonso, J Ignacio

    2016-10-01

    We have developed a novel, rapid and easy calculation procedure for Mass Isotopomer Distribution Analysis based on multiple linear regression which allows the simultaneous calculation of the precursor pool enrichment and the fraction of newly synthesized labelled proteins (fractional synthesis) using linear algebra. To test this approach, we used the peptide RGGGLK as a model tryptic peptide containing three subunits of glycine. We selected glycine labelled in two 13C atoms (13C2-glycine) as labelled amino acid to demonstrate that spectral overlap is not a problem in the proposed methodology. The developed methodology was tested first in vitro by changing the precursor pool enrichment from 10 to 40% of 13C2-glycine. Secondly, a simulated in vivo synthesis of proteins was designed by combining the natural abundance RGGGLK peptide and 10 or 20% 13C2-glycine at 1:1, 1:3 and 3:1 ratios. Precursor pool enrichments and fractional synthesis values were calculated with satisfactory precision and accuracy using a simple spreadsheet. This novel approach can provide a relatively rapid and easy means to measure protein turnover based on stable isotope tracers. Copyright © 2016 John Wiley & Sons, Ltd.

  7. Oral health-related risk behaviours and attitudes among Croatian adolescents--multiple logistic regression analysis.

    Science.gov (United States)

    Spalj, Stjepan; Spalj, Vedrana Tudor; Ivanković, Luida; Plancak, Darije

    2014-03-01

    The aim of this study was to explore the patterns of oral health-related risk behaviours in relation to dental status, attitudes, motivation and knowledge among Croatian adolescents. The assessment was conducted in a sample of 750 male subjects (military recruits aged 18-28) in Croatia using a questionnaire and clinical examination. The mean number of decayed, missing and filled teeth (DMFT) and the Significant Caries Index (SiC) were calculated. Multiple logistic regression models were created for analysis. Although the models of risk behaviours were statistically significant, their explanatory values were quite low. Five of them (rarely toothbrushing, not using hygiene auxiliaries, rarely visiting the dentist, toothache as a primary reason to visit the dentist, and demand for tooth extraction due to toothache) had the highest explanatory values, ranging from 21-29%, and correctly classified 73-89% of subjects. Toothache as a primary reason to visit the dentist, extraction as preferable therapy when toothache occurs, not having brushing education in school and frequent gingival bleeding were significantly related to the population with high caries experience (DMFT ≥ 14 according to SiC), producing odds ratios of 1.6 (95% CI 1.07-2.46), 2.1 (95% CI 1.29-3.25), 1.8 (95% CI 1.21-2.74) and 2.4 (95% CI 1.21-2.74), respectively. The DMFT ≥ 14 model had a low explanatory value of 6.5% and correctly classified 83% of subjects. It can be concluded that oral health-related risk behaviours are interrelated. Poor association was seen between attitudes concerning oral health and oral health-related risk behaviours, indicating insufficient motivation to change lifestyle and habits. Self-reported oral hygiene habits were not strongly related to dental status.
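
    Odds ratios and their confidence intervals, as reported above, are obtained by exponentiating a logistic regression coefficient and its Wald interval. The abstract does not report the coefficients themselves, so the Python sketch below only shows the standard conversion in both directions; the function names are hypothetical and a Wald-type interval is assumed.

```python
import math

Z95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def odds_ratio_ci(beta, se, z=Z95):
    """Odds ratio and confidence limits from a logit coefficient and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def beta_se_from_ci(lower, upper, z=Z95):
    """Recover the coefficient and standard error implied by a Wald interval.

    On the log scale the interval is symmetric, so the coefficient is the
    midpoint of log(lower) and log(upper).
    """
    beta = (math.log(lower) + math.log(upper)) / 2.0
    se = (math.log(upper) - math.log(lower)) / (2.0 * z)
    return beta, se


# The first interval quoted in the abstract: 95% CI 1.07-2.46.
beta, se = beta_se_from_ci(1.07, 2.46)
implied_or, lower, upper = odds_ratio_ci(beta, se)
```

    Because the interval is symmetric on the log scale, the implied odds ratio is the geometric mean of the two limits, which gives a quick consistency check on a reported interval.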

  8. Physical and Cognitive-Affective Factors Associated with Fatigue in Individuals with Fibromyalgia: A Multiple Regression Analysis

    Science.gov (United States)

    Muller, Veronica; Brooks, Jessica; Tu, Wei-Mo; Moser, Erin; Lo, Chu-Ling; Chan, Fong

    2015-01-01

    Purpose: The main objective of this study was to determine the extent to which physical and cognitive-affective factors are associated with fibromyalgia (FM) fatigue. Method: A quantitative descriptive design using correlation techniques and multiple regression analysis. The participants consisted of 302 members of the National Fibromyalgia &…

  9. Analysis of aromatic constituents in multicomponent hydrocarbon mixtures by infrared spectroscopy using multiple linear regression

    Science.gov (United States)

    Vesnin, V. L.; Muradov, V. G.

    2012-09-01

    Absorption spectra of multicomponent hydrocarbon mixtures based on n-heptane and isooctane with addition of benzene (up to 1%) and toluene and o-xylene (up to 20%) were investigated experimentally in the region of the first overtones of the hydrocarbon groups (λ = 1620-1780 nm). It was shown that their concentrations could be determined separately by using a multiple linear regression method. The optimum result was obtained by including four wavelengths at 1671, 1680, 1685, and 1695 nm, which took into account absorption of CH groups in benzene, toluene, and o-xylene and CH3 groups, respectively.
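
    Determining several concentrations from absorbances at a few wavelengths amounts to solving an overdetermined linear system by least squares. The abstract does not give the calibration coefficients, so the matrix in the Python sketch below is made up purely for illustration; the function name and the use of NumPy are likewise assumptions of this example.

```python
import numpy as np


def estimate_concentrations(calibration, absorbance):
    """Least-squares estimate of component concentrations.

    calibration: (n_wavelengths x n_components) matrix whose entry [i, j]
        is the absorbance per unit concentration of component j at
        wavelength i.
    absorbance: measured absorbance of the mixture at each wavelength.
    """
    calibration = np.asarray(calibration, dtype=float)
    absorbance = np.asarray(absorbance, dtype=float)
    conc, *_ = np.linalg.lstsq(calibration, absorbance, rcond=None)
    return conc


# Hypothetical calibration for four wavelengths and three components.
calibration = np.array([
    [0.9, 0.2, 0.1],
    [0.3, 0.8, 0.2],
    [0.2, 0.3, 0.7],
    [0.1, 0.1, 0.2],
])
true_conc = np.array([0.01, 0.20, 0.15])
mixture = calibration @ true_conc      # noise-free synthetic measurement
estimated = estimate_concentrations(calibration, mixture)
```

    Using more wavelengths than components, as in the four-wavelength model described above, leaves spare degrees of freedom that help average out measurement noise.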

  10. Regression analysis by example

    CERN Document Server

    Chatterjee, Samprit

    2012-01-01

    Praise for the Fourth Edition: "This book is . . . an excellent source of examples for regression analysis. It has been and still is readily readable and understandable." -Journal of the American Statistical Association Regression analysis is a conceptually simple method for investigating relationships among variables. Carrying out a successful application of regression analysis, however, requires a balance of theoretical results, empirical rules, and subjective judgment. Regression Analysis by Example, Fifth Edition has been expanded

  11. FIRE: an SPSS program for variable selection in multiple linear regression analysis via the relative importance of predictors.

    Science.gov (United States)

    Lorenzo-Seva, Urbano; Ferrando, Pere J

    2011-03-01

    We provide an SPSS program that implements currently recommended techniques and recent developments for selecting variables in multiple linear regression analysis via the relative importance of predictors. The approach consists of: (1) optimally splitting the data for cross-validation, (2) selecting the final set of predictors to be retained in the regression equation, and (3) assessing the behavior of the chosen model using standard indices and procedures. The SPSS syntax, a short manual, and data files related to this article are available as supplemental materials from brm.psychonomic-journals.org/content/supplemental.

  12. Prediction of cavity growth rate during underground coal gasification using multiple regression analysis

    Institute of Scientific and Technical Information of China (English)

    Mehdi Najafi; Seyed Mohammad Esmaiel Jalali; Reza KhaloKakaie; Farrokh Forouhandeh

    2015-01-01

    During underground coal gasification (UCG), whereby coal is converted to syngas in situ, a cavity is formed in the coal seam. The cavity growth rate (CGR) or the moving rate of the gasification face is affected by controllable (operation pressure, gasification time, geometry of UCG panel) and uncontrollable (coal seam properties) factors. The CGR is usually predicted by mathematical models and laboratory experiments, which are time consuming, cumbersome and expensive. In this paper, a new simple model for CGR is developed using non-linear regression analysis, based on data from 11 UCG field trials. The empirical model compares satisfactorily with Perkins model and can reliably predict CGR.

  13. Investigations upon the indefinite rolls quality assurance in multiple regression analysis

    Directory of Open Access Journals (Sweden)

    Kiss, I.

    2012-04-01

    The quality of rolling rolls has been enhanced mainly through improvements in the chemical composition of roll materials. Achieving an optimal chemical composition can be a technically efficient way to assure the exploitation properties, since the material from which rolling-mill rolls are manufactured is of great importance in this respect. This paper continues to present the scientific results of our experimental research in the area of rolling rolls. The basic research contains concrete elements of immediate practical utility for metallurgical enterprises, for the quality improvement of rolls, with the ultimate aim of increasing durability and safety in exploitation. This paper presents an analysis of the chemical composition and its influence upon the mechanical properties of indefinite cast iron rolls. We present some mathematical correlations and graphical interpretations between the hardness (on the working surface and on the necks) and the chemical composition. Using double and triple correlations is really helpful in foundry practice, as it allows us to determine variation boundaries for the chemical composition with a view to obtaining optimal hardness values. We suggest a mathematical interpretation of the influence of the chemical composition on the hardness of these indefinite rolling rolls. To this end we use multiple regression analysis, which can be an important statistical tool for investigating relationships between variables. Some of the mathematical modeling results can be described through a number of multi-component equations determined for spaces with 3 and 4 dimensions. Also, the regression surfaces, level curves and volumes of variation can be represented and interpreted by technologists as correlation diagrams between the analyzed variables. In this sense, these research results can be used in the engineers

  14. Practical Session: Multiple Linear Regression

    Science.gov (United States)

    Clausel, M.; Grégoire, G.

    2014-12-01

    Three exercises are proposed to illustrate the simple linear regression. The first one investigates the influence of several factors on atmospheric pollution. It was proposed by D. Chessel and A.B. Dufour in Lyon 1 (see Sect. 6 of http://pbil.univ-lyon1.fr/R/pdf/tdr33.pdf) and is based on data from 20 U.S. cities. Exercise 2 is an introduction to model selection, whereas Exercise 3 provides a first example of analysis of variance. Exercises 2 and 3 were proposed by A. Dalalyan at ENPC (see Exercises 2 and 3 of http://certis.enpc.fr/~dalalyan/Download/TP_ENPC_5.pdf).

  15. A calibration method of Argo floats based on multiple regression analysis

    Institute of Scientific and Technical Information of China (English)

    2006-01-01

    Argo floats are free-moving floats that report vertical profiles of salinity, temperature and pressure at regular time intervals. These floats give good measurements of temperature and pressure, but salinity measurements may show significant sensor drift with time. It is found that sensor drift with time is not purely linear, as presupposed by Wong (2003). A new method is developed to calibrate conductivity data measured by Argo floats. In this method, Wong's objective analysis method was adopted to estimate the background climatological salinity field on potential temperature surfaces from nearby historical data in WOD01. Furthermore, temperature and time factors are taken into account, and stepwise regression was used for a time-varying or temperature-varying slope in potential conductivity space to correct the drift in these profiling float salinity data. The results show that salinity errors using this method are smaller than those of Wong's method, and that quantitative and qualitative analysis of the conductivity sensor can be carried out with our method.

  16. Integrative analysis of multiple diverse omics datasets by sparse group multitask regression

    Directory of Open Access Journals (Sweden)

    Dongdong Lin

    2014-10-01

    A variety of high throughput genome-wide assays enable the exploration of genetic risk factors underlying complex traits. Although these studies have a remarkable impact on identifying susceptible biomarkers, they suffer from issues such as limited sample size and low reproducibility. Combining individual studies of different genetic levels/platforms has the promise to improve the power and consistency of biomarker identification. In this paper, we propose a novel integrative method, namely sparse group multitask regression, for integrating diverse omics datasets, platforms and populations to identify risk genes/factors of complex diseases. This method combines multitask learning with sparse group regularization, which will: (1) treat the biomarker identification in each single study as a task and then combine them by multitask learning; (2) group variables from all studies for identifying significant genes; (3) enforce a sparse constraint on groups of variables to overcome the 'small sample, but large variables' problem. We introduce two sparse group penalties, sparse group lasso and sparse group ridge, in our multitask model, and provide an effective algorithm for each model. In addition, we propose a significance test for the identification of potential risk genes. Two simulation studies are performed to evaluate the performance of our integrative method by comparing it with the conventional meta-analysis method. The results show that our sparse group multitask method outperforms the meta-analysis method significantly. In an application to our osteoporosis studies, 7 genes are identified as significant genes by our method and are found to have significant effects in three other independent studies for validation. The most significant gene SOD2 has been identified in our previous osteoporosis study involving the same expression dataset. Several other genes such as TREML2, HTR1E and GLO1 are shown to be novel susceptible genes for osteoporosis, as confirmed

  17. Multiple Regression Analysis for Grading and Prognosis of Cubital Tunnel Syndrome: Assessment of Akahori's Classification

    Directory of Open Access Journals (Sweden)

    Nishida, Keiichiro

    2013-02-01

    The purpose of this study was to quantitatively evaluate Akahori's preoperative classification of cubital tunnel syndrome. We analyzed the results for 57 elbows that were treated by a simple decompression procedure from 1997 to 2004. The relationship between each item of Akahori's preoperative classification and clinical stage was investigated based on the parameter distribution. We evaluated Akahori's classification system using multiple regression analysis, and investigated the association between the stage and treatment results. The usefulness of the regression equation was evaluated by analysis of variance of the expected and observed scores. In the parameter distribution, each item of Akahori's classification was mostly associated with the stage, but it was difficult to judge the severity of palsy. In the mathematical evaluation, the most effective item in determining the stage was sensory conduction velocity. It was demonstrated that the established regression equation was highly reliable (R=0.922). Akahori's preoperative classification can also be used in postoperative classification, and this classification was correlated with postoperative prognosis. Our results indicate that Akahori's preoperative classification is a suitable system. It is reliable, reproducible and well-correlated with the postoperative prognosis. In addition, the established prediction formula is useful to reduce the diagnostic complexity of Akahori's classification.

  18. Multiple Regression Analysis of mRNA-miRNA Associations in Colorectal Cancer Pathway

    OpenAIRE

    Fengfeng Wang; S. C. Cesar Wong; Lawrence W. C. Chan; Cho, William C. S.; S. P. Yip; Yung, Benjamin Y. M.

    2014-01-01

    Background. MicroRNA (miRNA) is a short and endogenous RNA molecule that regulates posttranscriptional gene expression. It is an important factor for tumorigenesis of colorectal cancer (CRC), and a potential biomarker for diagnosis, prognosis, and therapy of CRC. Our objective is to identify the related miRNAs and their associations with genes frequently involved in CRC microsatellite instability (MSI) and chromosomal instability (CIN) signaling pathways. Results. A regression model was adopt...

  19. Assumptions of Multiple Regression: Correcting Two Misconceptions

    Directory of Open Access Journals (Sweden)

    Matt N. Williams

    2013-09-01

    In 2002, an article entitled "Four assumptions of multiple regression that researchers should always test" by Osborne and Waters was published in PARE. This article has gone on to be viewed more than 275,000 times (as of August 2013), and it is one of the first results displayed in a Google search for "regression assumptions". While Osborne and Waters' efforts in raising awareness of the need to check assumptions when using regression are laudable, we note that the original article contained at least two fairly important misconceptions about the assumptions of multiple regression: firstly, that multiple regression requires the assumption of normally distributed variables; and secondly, that measurement errors necessarily cause underestimation of simple regression coefficients. In this article, we clarify that multiple regression models estimated using ordinary least squares require the assumption of normally distributed errors for inferences to be trustworthy, at least in small samples, but not the assumption of normally distributed response or predictor variables. Secondly, we point out that regression coefficients in simple regression models will be biased (toward zero) estimates of the relationships between variables of interest when measurement error is uncorrelated across those variables, but that when correlated measurement error is present, regression coefficients may be either upwardly or downwardly biased. We conclude with a brief corrected summary of the assumptions of multiple regression when using ordinary least squares.
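
    The attenuation effect described in the abstract above can be illustrated with a small simulation. This is only a minimal sketch, assuming NumPy is available; the sample size, true slope, and noise variances are illustrative choices and are not taken from the article. It covers only the uncorrelated-error case.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000          # illustrative sample size (assumption)
true_slope = 2.0     # illustrative true coefficient (assumption)

x_true = rng.normal(0.0, 1.0, n)                     # error-free predictor
y = true_slope * x_true + rng.normal(0.0, 1.0, n)    # response
x_obs = x_true + rng.normal(0.0, 1.0, n)             # predictor with uncorrelated measurement error


def ols_slope(x, y):
    """Slope of the simple least-squares regression of y on x."""
    return np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)


slope_clean = ols_slope(x_true, y)   # close to the true slope
slope_noisy = ols_slope(x_obs, y)    # shrunk toward zero

# Classical errors-in-variables result: the expected slope is multiplied by
# var(x_true) / (var(x_true) + var(error)), which is 0.5 with these settings.
reliability = 1.0 / (1.0 + 1.0)
print(slope_clean, slope_noisy, true_slope * reliability)
```

    With these settings the slope estimated from the error-laden predictor should land near half the true value, which is the bias toward zero the abstract refers to.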

  20. FORECASTING RETURNS FOR THE STOCK EXCHANGE OF THAILAND INDEX USING MULTIPLE REGRESSION BASED ON PRINCIPAL COMPONENT ANALYSIS

    Directory of Open Access Journals (Sweden)

    Nop Sopipan

    2013-01-01

    The aim of this study was to forecast the returns for the Stock Exchange of Thailand (SET) Index by adding some explanatory variables and stationary Autoregressive Moving-Average order p and q (ARMA (p, q)) in the mean equation of returns. In addition, we used Principal Component Analysis (PCA) to remove possible complications caused by multicollinearity. Afterwards, we forecast the volatility of the returns for the SET Index. Results showed that the ARMA (1,1), which includes multiple regression based on PCA, has the best performance. In forecasting the volatility of returns, the GARCH model performs best for one day ahead; and the EGARCH model performs best for five days, ten days and twenty-two days ahead.
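
    The idea of regressing on principal components to sidestep multicollinearity can be sketched as follows. This is a generic principal component regression on synthetic data, assuming NumPy; it does not reproduce the study's ARMA or GARCH modelling, and the function names `pcr_fit` and `pcr_predict` are hypothetical.

```python
import numpy as np


def pcr_fit(X, y, n_components):
    """Principal component regression: OLS of y on the leading PC scores of X."""
    x_mean = X.mean(axis=0)
    x_std = X.std(axis=0, ddof=1)
    Z = (X - x_mean) / x_std                      # standardize predictors
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    loadings = vt[:n_components].T                # principal directions
    scores = Z @ loadings                         # mutually uncorrelated scores
    design = np.column_stack([np.ones(len(y)), scores])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return x_mean, x_std, loadings, coef


def pcr_predict(model, X):
    """Project new data onto the stored components and apply the fitted coefficients."""
    x_mean, x_std, loadings, coef = model
    scores = ((X - x_mean) / x_std) @ loadings
    return coef[0] + scores @ coef[1:]


# Synthetic, strongly collinear predictors driven by one common factor (assumption).
rng = np.random.default_rng(1)
n = 500
factor = rng.normal(size=n)
X = np.column_stack([factor + 0.05 * rng.normal(size=n) for _ in range(3)])
y = 1.0 + factor + 0.1 * rng.normal(size=n)

model = pcr_fit(X, y, n_components=1)
pred = pcr_predict(model, X)
r2 = 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)
print(round(r2, 3))
```

    Because the component scores are uncorrelated by construction, the regression step is well conditioned even though the three raw predictors are nearly identical.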

  1. A multiple linear regression analysis of hot corrosion attack on a series of nickel base turbine alloys

    Science.gov (United States)

    Barrett, C. A.

    1985-01-01

    Multiple linear regression analysis was used to determine an equation for estimating hot corrosion attack for a series of Ni base cast turbine alloys. The U transform (i.e., 1/sin (% A/100) to the 1/2) was shown to give the best estimate of the dependent variable, y. A complete second degree equation is described for the "centered" weight chemistries for the elements Cr, Al, Ti, Mo, W, Cb, Ta, and Co. In addition, linear terms for the minor elements C, B, and Zr were added for a basic 47 term equation. The best reduced equation was determined by the stepwise selection method with essentially 13 terms. The Cr term was found to be the most important, accounting for 60 percent of the explained variability in hot corrosion attack.

  2. Linear Regression Analysis

    CERN Document Server

    Seber, George A F

    2012-01-01

    Concise, mathematically clear, and comprehensive treatment of the subject. Expanded coverage of diagnostics and methods of model fitting. Requires no specialized knowledge beyond a good grasp of matrix algebra and some acquaintance with straight-line regression and simple analysis of variance models. More than 200 problems throughout the book plus outline solutions for the exercises. This revision has been extensively class-tested.

  3. Risk Assessment and Prediction of Flyrock Distance by Combined Multiple Regression Analysis and Monte Carlo Simulation of Quarry Blasting

    Science.gov (United States)

    Armaghani, Danial Jahed; Mahdiyar, Amir; Hasanipanah, Mahdi; Faradonbeh, Roohollah Shirani; Khandelwal, Manoj; Amnieh, Hassan Bakhshandeh

    2016-09-01

    Flyrock is considered one of the main causes of human injury, fatalities, and structural damage among all undesirable environmental impacts of blasting. Therefore, proper prediction/simulation of flyrock is essential, especially in order to determine the blast safety area. If proper control measures are taken, then the flyrock distance can be controlled, and, in return, the risk of damage can be reduced or eliminated. The first objective of this study was to develop a predictive model for flyrock estimation based on multiple regression (MR) analyses, and after that, using the developed MR model, the flyrock phenomenon was simulated by the Monte Carlo (MC) approach. In order to achieve the objectives of this study, 62 blasting operations were investigated in Ulu Tiram quarry, Malaysia, and some controllable and uncontrollable factors were carefully recorded/calculated. The obtained results of MC modeling indicated that this approach is capable of simulating flyrock ranges with a good level of accuracy. The mean of the flyrock simulated by MC was 236.3 m, while the mean of the measured flyrock was 238.6 m. Furthermore, a sensitivity analysis was also conducted to investigate the effects of model inputs on the output of the system. The analysis demonstrated that powder factor is the most influential parameter on flyrock among all model inputs. It should be noted that the proposed MR and MC models should be utilized only in the studied area, and their direct use in other conditions is not recommended.
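
    The two-step workflow in the abstract above (fit a multiple regression, then push randomly drawn inputs through it) can be sketched in a few lines. This is only an illustration under stated assumptions: the data are synthetic, the two input names and their uniform ranges are hypothetical, and nothing here reproduces the study's model or its quarry measurements.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic "blasts": two hypothetical inputs and a response (not the study's data).
n_obs = 62
powder_factor = rng.uniform(0.4, 1.0, n_obs)
burden = rng.uniform(2.0, 4.0, n_obs)
flyrock = 50.0 + 300.0 * powder_factor - 20.0 * burden + rng.normal(0.0, 10.0, n_obs)

# Step 1: multiple regression by ordinary least squares.
design = np.column_stack([np.ones(n_obs), powder_factor, burden])
coef, *_ = np.linalg.lstsq(design, flyrock, rcond=None)

# Step 2: Monte Carlo. Inputs are drawn from distributions assumed here to be
# uniform over the observed ranges, then passed through the fitted equation.
n_sim = 10_000
sim_inputs = np.column_stack([
    np.ones(n_sim),
    rng.uniform(0.4, 1.0, n_sim),
    rng.uniform(2.0, 4.0, n_sim),
])
simulated = sim_inputs @ coef
sim_mean = simulated.mean()
p05, p95 = np.percentile(simulated, [5, 95])
print(coef, sim_mean, p05, p95)
```

    Comparing `sim_mean` with the mean of the observed response is the same kind of check the abstract reports, and the percentile band gives a rough range of simulated outcomes.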

  4. Comparing Effects of Biologic Agents in Treating Patients with Rheumatoid Arthritis: A Multiple Treatment Comparison Regression Analysis.

    Directory of Open Access Journals (Sweden)

    Ingunn Fride Tvete

    Rheumatoid arthritis patients have been treated with disease modifying anti-rheumatic drugs (DMARDs) and the newer biologic drugs. We sought to compare and rank the biologics with respect to efficacy. We performed a literature search identifying 54 publications encompassing 9 biologics. We conducted a multiple treatment comparison regression analysis letting the number experiencing a 50% improvement on the ACR score be dependent upon dose level and disease duration for assessing the comparable relative effect between biologics and placebo or DMARD. The analysis embraced all treatment and comparator arms over all publications. Hence, all measured effects of any biologic agent contributed to the comparison of all biologic agents relative to each other either given alone or combined with DMARD. We found the drug effect to be dependent on dose level, but not on disease duration, and the impact of a high versus low dose level was the same for all drugs (higher doses indicated a higher frequency of ACR50 scores). The ranking of the drugs when given without DMARD was certolizumab (ranked highest), etanercept, tocilizumab/abatacept and adalimumab. The ranking of the drugs when given with DMARD was certolizumab (ranked highest), tocilizumab, anakinra/rituximab, golimumab/infliximab/abatacept, adalimumab/etanercept [corrected]. Still, all drugs were effective. All biologic agents were effective compared to placebo, with certolizumab the most effective and adalimumab (without DMARD treatment) and adalimumab/etanercept (combined with DMARD treatment) the least effective. The drugs were in general more effective, except for etanercept, when given together with DMARDs.

  5. Crude oil price forecasting based on hybridizing wavelet multiple linear regression model, particle swarm optimization techniques, and principal component analysis.

    Science.gov (United States)

    Shabri, Ani; Samsudin, Ruhaidah

    2014-01-01

    Crude oil prices play a significant role in the global economy and are a key input into option pricing formulas, portfolio allocation, and risk measurement. In this paper, a hybrid model integrating wavelet and multiple linear regressions (MLR) is proposed for crude oil price forecasting. In this model, the Mallat wavelet transform is first selected to decompose an original time series into several subseries with different scales. Then, principal component analysis (PCA) is used in processing the subseries data in MLR for crude oil price forecasting. Particle swarm optimization (PSO) is used to adopt the optimal parameters of the MLR model. To assess the effectiveness of this model, the daily West Texas Intermediate (WTI) crude oil market has been used as the case study. The time series prediction performance of the WMLR model is compared with the MLR, ARIMA, and GARCH models using various statistical measures. The experimental results show that the proposed model outperforms the individual models in forecasting the crude oil price series.

  6. Estimation of the retention behaviour of s-triazine derivatives applying multiple regression analysis of selected molecular descriptors

    Directory of Open Access Journals (Sweden)

    Jevrić Lidija R.

    2013-01-01

    The estimation of retention factors by correlation equations with physico-chemical properties can be of great help in chromatographic studies. The retention factors were experimentally measured by RP-HPTLC on silica gel impregnated with paraffin oil using two-component solvent systems. The relationships between solute retention and modifier concentration were described by Snyder's linear equation. A quantitative structure-retention relationship was developed for a series of s-triazine compounds by multiple linear regression (MLR) analysis. The MLR procedure was used to model the relationships between the molecular descriptors and retention of s-triazine derivatives. The physicochemical molecular descriptors were calculated from the optimized structures. The physico-chemical properties were the lipophilicity (log P), connectivity indices (χ), total energy (Et), water solubility (log W), dissociation constant (pKa), molar refractivity (MR), and Gibbs energy (GibbsE) of s-triazines. A high agreement between the experimental and predicted retention parameters was obtained when the dissociation constant and the hydrophilic-lipophilic balance were used as the molecular descriptors. The empirical equations may be successfully used for the prediction of the various chromatographic characteristics of substances with a similar chemical structure. [Projects of the Ministry of Science of the Republic of Serbia, Nos. 31055, 172012, 172013 and 172014]

  7. Multiple Linear Regression Analysis of Factors Affecting Real Property Price Index From Case Study Research In Istanbul/Turkey

    Science.gov (United States)

    Denli, H. H.; Koc, Z.

    2015-12-01

    Estimation of real properties depending on standards is difficult to apply in time and location. Regression analysis constructs mathematical models which describe or explain relationships that may exist between variables. The problem of identifying price differences of properties to obtain a price index can be converted into a regression problem, and standard techniques of regression analysis can be used to estimate the index. When regression analysis is applied to real estate valuation, using properties presented in the real market with their current characteristics and quantifiers, the method helps to find the effective factors or variables in the formation of the value. In this study, prices of housing for sale in Zeytinburnu, a district in Istanbul, are associated with its characteristics to find a price index, based on information received from a real estate web page. The associated variables used for the analysis are age, size in m2, number of floors in the building, floor number of the estate and number of rooms. The price of the estate represents the dependent variable, whereas the rest are independent variables. Prices of 60 properties have been used for the analysis. Locations with the same price value have been found and plotted on the map, and equivalence curves have been drawn identifying the same valued zones as lines.

  8. The M Word: Multicollinearity in Multiple Regression.

    Science.gov (United States)

    Morrow-Howell, Nancy

    1994-01-01

    Notes that existence of substantial correlation between two or more independent variables creates problems of multicollinearity in multiple regression. Discusses multicollinearity problem in social work research in which independent variables are usually intercorrelated. Clarifies problems created by multicollinearity, explains detection of…
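
    One widely used way to detect the kind of intercorrelation described above is the variance inflation factor (VIF). The sketch below is a generic illustration on synthetic data, assuming NumPy; it is not claimed to be the detection procedure the article itself explains, and the function name is hypothetical.

```python
import numpy as np


def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j on the others."""
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs


# Synthetic predictors (assumption): x2 is nearly a copy of x1, x3 is unrelated.
rng = np.random.default_rng(2)
n = 1000
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)
x3 = rng.normal(size=n)

vifs = variance_inflation_factors(np.column_stack([x1, x2, x3]))
print(vifs)
```

    A common rule of thumb treats VIF values above roughly 10 as a sign of problematic collinearity; here the first two predictors should far exceed that while the third stays near 1.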

  9. Spectroscopic determination of leaf biochemistry using band-depth analysis of absorption features and stepwise multiple linear regression

    Science.gov (United States)

    Kokaly, R.F.; Clark, R.N.

    1999-01-01

    We develop a new method for estimating the biochemistry of plant material using spectroscopy. Normalized band depths calculated from the continuum-removed reflectance spectra of dried and ground leaves were used to estimate their concentrations of nitrogen, lignin, and cellulose. Stepwise multiple linear regression was used to select wavelengths in the broad absorption features centered at 1.73 μm, 2.10 μm, and 2.30 μm that were highly correlated with the chemistry of samples from eastern U.S. forests. Band depths of absorption features at these wavelengths were found to also be highly correlated with the chemistry of four other sites. A subset of data from the eastern U.S. forest sites was used to derive linear equations that were applied to the remaining data to successfully estimate their nitrogen, lignin, and cellulose concentrations. Correlations were highest for nitrogen (R² from 0.75 to 0.94). The consistent results indicate the possibility of establishing a single equation capable of estimating the chemical concentrations in a wide variety of species from the reflectance spectra of dried leaves. The extension of this method to remote sensing was investigated. The effects of leaf water content, sensor signal-to-noise and bandpass, atmospheric effects, and background soil exposure were examined. Leaf water was found to be the greatest challenge to extending this empirical method to the analysis of fresh whole leaves and complete vegetation canopies. The influence of leaf water on reflectance spectra must be removed to within 10%. Other effects were reduced by continuum removal and normalization of band depths. If the effects of leaf water can be compensated for, it might be possible to extend this method to remote sensing data acquired by imaging spectrometers to give estimates of nitrogen, lignin, and cellulose concentrations over large areas for use in ecosystem studies.

  10. Predicting punching acceleration from selected strength and power variables in elite karate athletes: a multiple regression analysis.

    Science.gov (United States)

    Loturco, Irineu; Artioli, Guilherme Giannini; Kobal, Ronaldo; Gil, Saulo; Franchini, Emerson

    2014-07-01

    This study investigated the relationship between punching acceleration and selected strength and power variables in 19 professional karate athletes from the Brazilian National Team (9 men and 10 women; age, 23 ± 3 years; height, 1.71 ± 0.09 m; and body mass [BM], 67.34 ± 13.44 kg). Punching acceleration was assessed under 4 different conditions in a randomized order: (a) fixed distance aiming to attain maximum speed (FS), (b) fixed distance aiming to attain maximum impact (FI), (c) self-selected distance aiming to attain maximum speed, and (d) self-selected distance aiming to attain maximum impact. The selected strength and power variables were as follows: maximal dynamic strength in bench press and squat-machine, squat and countermovement jump height, mean propulsive power in bench throw and jump squat, and mean propulsive velocity in jump squat with 40% of BM. Upper- and lower-body power and maximal dynamic strength variables were positively correlated to punch acceleration in all conditions. Multiple regression analysis also revealed predictive variables: relative mean propulsive power in squat jump (W·kg-1), and maximal dynamic strength 1 repetition maximum in both bench press and squat-machine exercises. An impact-oriented instruction and a self-selected distance to start the movement seem to be crucial to reach the highest acceleration during punching execution. This investigation, while demonstrating strong correlations between punching acceleration and strength-power variables, also provides important information for coaches, especially for designing better training strategies to improve punching speed.

  11. Multiple Kernel Spectral Regression for Dimensionality Reduction

    Directory of Open Access Journals (Sweden)

    Bing Liu

    2013-01-01

    Traditional manifold learning algorithms, such as locally linear embedding, Isomap, and Laplacian eigenmap, only provide the embedding results of the training samples. To solve the out-of-sample extension problem, spectral regression (SR) solves the problem of learning an embedding function by establishing a regression framework, which can avoid eigen-decomposition of dense matrices. Motivated by the effectiveness of SR, we incorporate multiple kernel learning (MKL) into SR for dimensionality reduction. The proposed approach (termed MKL-SR) seeks an embedding function in the Reproducing Kernel Hilbert Space (RKHS) induced by the multiple base kernels. An MKL-SR algorithm is proposed to improve the performance of kernel-based SR (KSR) further. Furthermore, the proposed MKL-SR algorithm can be performed in the supervised, unsupervised, and semi-supervised situation. Experimental results on supervised classification and semi-supervised classification demonstrate the effectiveness and efficiency of our algorithm.

  12. Gene-based multiple regression association testing for combined examination of common and low frequency variants in quantitative trait analysis

    Directory of Open Access Journals (Sweden)

    Yun Joo Yoo

    2013-11-01

    Multi-marker methods for genetic association analysis can be performed for common and low frequency SNPs to improve power. Regression models are an intuitive way to formulate multi-marker tests. In previous studies we evaluated regression-based multi-marker tests for common SNPs, and through identification of bins consisting of correlated SNPs, developed a multi-bin linear combination (MLC) test that is a compromise between a 1df linear combination test and a multi-df global test. Bins of SNPs in high linkage disequilibrium (LD) are identified, and a linear combination of individual SNP statistics is constructed within each bin. Then association with the phenotype is represented by an overall statistic with df as many or few as the number of bins. In this report we evaluate multi-marker tests for SNPs that occur at low frequencies. There are many linear and quadratic multi-marker tests that are suitable for common or low frequency variant analysis. We compared the performance of the MLC tests with various linear and quadratic statistics in joint or marginal regressions. For these comparisons, we performed a simulation study of genotypes and quantitative traits for 85 genes with many low frequency SNPs based on HapMap Phase III. We compared the tests using (1) the set of all SNPs in a gene, (2) the set of common SNPs in a gene (MAF≥5%), (3) the set of low frequency SNPs (1%≤MAF

  13. A Dirty Model for Multiple Sparse Regression

    CERN Document Server

    Jalali, Ali; Sanghavi, Sujay

    2011-01-01

    Sparse linear regression -- finding an unknown vector from linear measurements -- is now known to be possible with fewer samples than variables, via methods like the LASSO. We consider the multiple sparse linear regression problem, where several related vectors -- with partially shared support sets -- have to be recovered. A natural question in this setting is whether one can use the sharing to further decrease the overall number of samples required. A line of recent research has studied the use of \ell_1/\ell_q norm block-regularizations with q>1 for such problems; however these could actually perform worse in sample complexity -- vis a vis solving each problem separately ignoring sharing -- depending on the level of sharing. We present a new method for multiple sparse linear regression that can leverage support and parameter overlap when it exists, but not pay a penalty when it does not. A very simple idea: we decompose the parameters into two components and regularize these differently. We show both theore...

  14. Statistical analysis of water-quality data containing multiple detection limits: S-language software for regression on order statistics

    Science.gov (United States)

    Lee, L.; Helsel, D.

    2005-01-01

    Trace contaminants in water, including metals and organics, often are measured at sufficiently low concentrations to be reported only as values below the instrument detection limit. Interpretation of these "less thans" is complicated when multiple detection limits occur. Statistical methods for multiply censored, or multiple-detection limit, datasets have been developed for medical and industrial statistics, and can be employed to estimate summary statistics or model the distributions of trace-level environmental data. We describe S-language-based software tools that perform robust linear regression on order statistics (ROS). The ROS method has been evaluated as one of the most reliable procedures for developing summary statistics of multiply censored data. It is applicable to any dataset that has 0 to 80% of its values censored. These tools are a part of a software library, or add-on package, for the R environment for statistical computing. This library can be used to generate ROS models and associated summary statistics, plot modeled distributions, and predict exceedance probabilities of water-quality standards. © 2005 Elsevier Ltd. All rights reserved.

  15. Melanin and blood concentration in a human skin model studied by multiple regression analysis: assessment by Monte Carlo simulation

    Science.gov (United States)

    Shimada, M.; Yamada, Y.; Itoh, M.; Yatagai, T.

    2001-09-01

    Measurement of melanin and blood concentration in human skin is needed in the medical and the cosmetic fields because human skin colour is mainly determined by the colours of melanin and blood. It is difficult to measure these concentrations in human skin because skin has a multi-layered structure and scatters light strongly throughout the visible spectrum. The Monte Carlo simulation currently used for the analysis of skin colour requires long calculation times and knowledge of the specific optical properties of each skin layer. A regression analysis based on the modified Beer-Lambert law is presented as a method of measuring melanin and blood concentration in human skin in a shorter period of time and with fewer calculations. The accuracy of this method is assessed using Monte Carlo simulations.
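
    The regression idea can be sketched as follows, assuming (this is not stated in the abstract) that the absorbance spectrum is modelled as a linear combination of a melanin spectrum, a blood spectrum, and a constant offset. The spectra below are synthetic placeholders, not measured extinction coefficients, and the function name is hypothetical.

```python
import numpy as np

def fit_chromophores(absorbance, eps_melanin, eps_blood):
    """Least-squares fit of A(lambda) = c_m*eps_m(lambda) + c_b*eps_b(lambda) + offset."""
    design = np.column_stack([eps_melanin, eps_blood, np.ones_like(eps_melanin)])
    coef, *_ = np.linalg.lstsq(design, absorbance, rcond=None)
    return coef  # [c_melanin, c_blood, offset]

# Synthetic, purely illustrative spectra (assumed shapes, not real skin data).
wavelengths = np.linspace(500.0, 700.0, 41)
eps_melanin = np.exp(-(wavelengths - 500.0) / 150.0)        # smooth decay
eps_blood = np.exp(-((wavelengths - 560.0) / 25.0) ** 2)    # single absorption band

true_coef = np.array([0.8, 0.3, 0.1])                       # melanin, blood, offset
absorbance = (true_coef[0] * eps_melanin
              + true_coef[1] * eps_blood
              + true_coef[2])

estimated = fit_chromophores(absorbance, eps_melanin, eps_blood)
```

    With noise-free synthetic input the fit recovers the coefficients used to build the spectrum; with real measurements the residuals would indicate how well the linear model holds.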

  16. Prediction of coal grindability based on petrography, proximate and ultimate analysis using multiple regression and artificial neural network models

    Energy Technology Data Exchange (ETDEWEB)

    Chelgani, S. Chehreh; Jorjani, E.; Mesroghli, Sh.; Bagherieh, A.H. [Department of Mining Engineering, Research and Science Campus, Islamic Azad University, Poonak, Hesarak Tehran (Iran); Hower, James C. [Center for Applied Energy Research, University of Kentucky, 2540 Research Park Drive, Lexington, KY 40511 (United States)

    2008-01-15

    The effects of proximate and ultimate analysis, maceral content, and coal rank (R_max) on the Hardgrove Grindability Index (HGI) have been investigated by multivariable regression and artificial neural network (ANN) methods for a wide range of Kentucky coal samples with calorific values from 4320 to 14960 BTU/lb (10.05 to 34.80 MJ/kg). Stepwise least-squares regression shows that three input sets, (a) moisture, ash, volatile matter, and total sulfur; (b) ln(total sulfur), hydrogen, ash, ln((oxygen + nitrogen)/carbon), and moisture; and (c) ln(exinite), semifusinite, micrinite, macrinite, resinite, and R_max, relate linearly to HGI with correlation coefficients (R²) of 0.77, 0.75, and 0.81, respectively. The ANN, which adequately recognized the characteristics of the coal samples, can predict HGI with correlation coefficients of 0.89, 0.89, and 0.95, respectively, in the testing process. The set ln(exinite), semifusinite, micrinite, macrinite, resinite, and R_max was determined to be the best predictor of HGI for both multivariable regression (R² = 0.81) and the ANN (R² = 0.95). The ANN-based prediction method, as used in this paper, can be further employed as a reliable and accurate method for Hardgrove Grindability Index prediction. (author)

  17. Improved spatial regression analysis of diffusion tensor imaging for lesion detection during longitudinal progression of multiple sclerosis in individual subjects

    Science.gov (United States)

    Liu, Bilan; Qiu, Xing; Zhu, Tong; Tian, Wei; Hu, Rui; Ekholm, Sven; Schifitto, Giovanni; Zhong, Jianhui

    2016-03-01

    Subject-specific longitudinal DTI study is vital for investigation of pathological changes of lesions and disease evolution. Spatial Regression Analysis of Diffusion tensor imaging (SPREAD) is a non-parametric permutation-based statistical framework that combines spatial regression and resampling techniques to achieve effective detection of localized longitudinal diffusion changes within the whole brain at individual level without a priori hypotheses. However, boundary blurring and dislocation limit its sensitivity, especially towards detecting lesions of irregular shapes. In the present study, we propose an improved SPREAD method (iSPREAD) that incorporates a three-dimensional (3D) nonlinear anisotropic diffusion filtering method, which provides edge-preserving image smoothing through a nonlinear scale space approach. The statistical inference based on iSPREAD was evaluated and compared with the original SPREAD method using both simulated and in vivo human brain data. Results demonstrated that the sensitivity and accuracy of the SPREAD method have been improved substantially by adapting nonlinear anisotropic filtering. iSPREAD identifies subject-specific longitudinal changes in the brain with improved sensitivity, accuracy, and enhanced statistical power, especially when the spatial correlation is heterogeneous among neighboring image pixels in DTI.

  18. A comparative study of multiple regression analysis and back propagation neural network approaches on plain carbon steel in submerged-arc welding

    Indian Academy of Sciences (India)

    ABHIJIT SARKAR; PRASENJIT DEY; R N RAI; SUBHAS CHANDRA SAHA

    2016-05-01

    Weld bead plays an important role in determining the quality of welding, particularly in high heat input processes. This research paper presents the development of multiple regression analysis (MRA) and artificial neural network (ANN) models to predict weld bead geometry and HAZ width in the submerged arc welding process. Design of experiments is based on Taguchi's L16 orthogonal array by varying wire feed rate, transverse speed and stick out to develop a multiple regression model, which has been checked for adequacy and significance. Also, an ANN model was built with the back propagation approach in MATLAB to predict bead geometry and HAZ width. Finally, the results of the two prediction models were compared and analyzed. It is found that the error related to the prediction of bead geometry and HAZ width is smaller in ANN than MRA.

  19. A comparison on parameter-estimation methods in multiple regression analysis with existence of multicollinearity among independent variables

    Directory of Open Access Journals (Sweden)

    Hukharnsusatrue, A.

    2005-11-01

    Full Text Available The objective of this research is to compare methods for estimating multiple regression coefficients when multicollinearity exists among the independent variables. The estimation methods are the Ordinary Least Squares (OLS), Restricted Least Squares (RLS), Restricted Ridge Regression (RRR), and Restricted Liu (RL) methods, compared both when the restrictions are true and when they are not. The study used Monte Carlo simulation, with the experiment repeated 1,000 times under each situation. The results are as follows. CASE 1: The restrictions are true. In all cases, the RRR and RL methods have a smaller Average Mean Square Error (AMSE) than the OLS and RLS methods, respectively. The RRR method provides the smallest AMSE when the level of correlation is high, and also for all levels of correlation and all sample sizes when the standard deviation is equal to 5. However, the RL method provides the smallest AMSE when the level of correlation is low or medium, except that with a standard deviation equal to 3 and small sample sizes the RRR method provides the smallest AMSE. The AMSE varies directly with, from most to least influential, the level of correlation, the standard deviation, and the number of independent variables, but inversely with sample size. CASE 2: The restrictions are not true. In all cases the RRR method provides the smallest AMSE, except when the standard deviation is equal to 1 and the error of restrictions is equal to 5%: there the OLS method provides the smallest AMSE when the level of correlation is low or medium and the sample size is large, whereas for small sample sizes the RL method provides the smallest AMSE. In addition, when the error of restrictions is increased, the OLS method provides the smallest AMSE for all levels of correlation and all sample sizes, except when the level of correlation is high and the sample size is small. Moreover, in the cases where the OLS method provides the smallest AMSE, the RLS method mostly has a smaller AMSE than
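
    A much reduced sketch of this kind of Monte Carlo comparison is given below. It covers only OLS and an ordinary (unrestricted) ridge estimator, not the restricted RLS, RRR, or RL estimators studied in the record, and every setting (three equicorrelated predictors, true coefficients of 1, n = 30, standard deviation 3, ridge constant k = 1) is an assumption chosen for illustration.

```python
import numpy as np

def ols(X, y):
    """Ordinary least squares via the normal equations."""
    return np.linalg.solve(X.T @ X, X.T @ y)

def ridge(X, y, k):
    """Ordinary ridge estimator with ridge constant k (k = 0 gives OLS)."""
    return np.linalg.solve(X.T @ X + k * np.eye(X.shape[1]), X.T @ y)

def average_mse(estimator, rho, n=30, sigma=3.0, reps=1000, seed=0):
    """Average squared error of the coefficient estimates over `reps` simulated datasets."""
    rng = np.random.default_rng(seed)
    beta = np.array([1.0, 1.0, 1.0])            # assumed true coefficients
    p = beta.size
    cov = np.full((p, p), rho)                   # equicorrelated predictors
    np.fill_diagonal(cov, 1.0)
    chol = np.linalg.cholesky(cov)
    total = 0.0
    for _ in range(reps):
        X = rng.standard_normal((n, p)) @ chol.T
        y = X @ beta + sigma * rng.standard_normal(n)
        total += np.sum((estimator(X, y) - beta) ** 2)
    return total / reps

# Same seed for both estimators, so they see identical simulated datasets.
amse_ols = average_mse(ols, rho=0.99)
amse_ridge = average_mse(lambda X, y: ridge(X, y, k=1.0), rho=0.99)
```

    Under strong collinearity the OLS estimates have very large variance, so the ridge estimator trades a small bias for a much smaller average error in this particular setup. The ranking can change with other coefficient vectors, correlation levels, or ridge constants.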

  20. Diplotype Trend Regression Analysis of the ADH Gene Cluster and the ALDH2 Gene: Multiple Significant Associations with Alcohol Dependence

    Science.gov (United States)

    Luo, Xingguang; Kranzler, Henry R.; Zuo, Lingjun; Wang, Shuang; Schork, Nicholas J.; Gelernter, Joel

    2006-01-01

    The set of alcohol-metabolizing enzymes has considerable genetic and functional complexity. The relationships between some alcohol dehydrogenase (ADH) and aldehyde dehydrogenase (ALDH) genes and alcohol dependence (AD) have long been studied in many populations, but not comprehensively. In the present study, we genotyped 16 markers within the ADH gene cluster (including the ADH1A, ADH1B, ADH1C, ADH5, ADH6, and ADH7 genes), 4 markers within the ALDH2 gene, and 38 unlinked ancestry-informative markers in a case-control sample of 801 individuals. Associations between markers and disease were analyzed by a Hardy-Weinberg equilibrium (HWE) test, a conventional case-control comparison, a structured association analysis, and a novel diplotype trend regression (DTR) analysis. Finally, the disease alleles were fine mapped by a Hardy-Weinberg disequilibrium (HWD) measure (J). All markers were found to be in HWE in controls, but some markers showed HWD in cases. Genotypes of many markers were associated with AD. DTR analysis showed that ADH5 genotypes and diplotypes of ADH1A, ADH1B, ADH7, and ALDH2 were associated with AD in European Americans and/or African Americans. The risk-influencing alleles were fine mapped from among the markers studied and were found to coincide with some well-known functional variants. We demonstrated that DTR was more powerful than many other conventional association methods. We also found that several ADH genes and the ALDH2 gene were susceptibility loci for AD, and the associations were best explained by several independent risk genes. PMID:16685648

  1. [Multiple imputation and complete case analysis in logistic regression models: a practical assessment of the impact of incomplete covariate data].

    Science.gov (United States)

    Camargos, Vitor Passos; César, Cibele Comini; Caiaffa, Waleska Teixeira; Xavier, Cesar Coelho; Proietti, Fernando Augusto

    2011-12-01

    Researchers in the health field often deal with the problem of incomplete databases. Complete Case Analysis (CCA), which restricts the analysis to subjects with complete data, reduces the sample size and may result in biased estimates. Based on statistical grounds, Multiple Imputation (MI) uses all collected data and is recommended as an alternative to CCA. Data from the study Saúde em Beagá, attended by 4,048 adults from two of nine health districts in the city of Belo Horizonte, Minas Gerais State, Brazil, in 2008-2009, were used to evaluate CCA and different MI approaches in the context of logistic models with incomplete covariate data. Peculiarities in some variables in this study allowed analyzing a situation in which the missing covariate data are recovered and thus the results before and after recovery are compared. Based on the analysis, even the more simplistic MI approach performed better than CCA, since it was closer to the post-recovery results.

  2. A comparison between Joint Regression Analysis and the Additive Main and Multiplicative Interaction model: the robustness with increasing amounts of missing data

    Directory of Open Access Journals (Sweden)

    Paulo Canas Rodrigues

    2011-12-01

    Full Text Available This paper brings together the main properties of joint regression analysis (JRA), a model based on the Finlay-Wilkinson regression to analyse multi-environment trials, and of the additive main effects and multiplicative interaction (AMMI) model. The study compares JRA and AMMI with particular focus on robustness with increasing amounts of randomly selected missing data. The application is made using a data set from a breeding program of durum wheat (Triticum turgidum L., Durum Group) conducted in Portugal. The two models yield similar dominant cultivars (JRA) and winners of mega-environments (AMMI) for the same environments. However, JRA had more stable results with the increase in the incidence rates of missing values.

  3. Multiple Regression Analysis of Reading Performance Data from Twin Pairs with Reading Difficulties and Non-twin Siblings: The Augmented Model

    OpenAIRE

    Wadsworth, S J; Olson, R. K.; Willcutt, E.G.; DeFries, J. C.

    2012-01-01

    The augmented multiple regression model for the analysis of data from selected twin pairs was extended to facilitate analyses of data from twin pairs and non-twin siblings. Fitting this extended model to data from both selected twin pairs and siblings yields direct estimates of heritability (h2) and the difference between environmental influences shared by members of twin pairs and those of sib or twin/sib pairs [i.e., c2(t) − c2(s)]. When this model was fitted to reading performance data fro...

  4. Entrepreneurial intention modeling using hierarchical multiple regression

    Directory of Open Access Journals (Sweden)

    Marina Jeger

    2014-12-01

    Full Text Available The goal of this study is to identify the contribution of effectuation dimensions to the predictive power of the entrepreneurial intention model over and above that which can be accounted for by other predictors selected and confirmed in previous studies. As is often the case in social and behavioral studies, some variables are likely to be highly correlated with each other. Therefore, the relative amount of variance in the criterion variable explained by each of the predictors depends on several factors such as the order of variable entry and sample specifics. The results show the modest predictive power of two dimensions of effectuation prior to the introduction of the theory of planned behavior elements. The article highlights the main advantages of applying hierarchical regression in social sciences as well as in the specific context of entrepreneurial intention formation, and addresses some of the potential pitfalls that this type of analysis entails.
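
    The dependence of explained variance on the order of entry can be sketched as follows. The data are synthetic and the two predictor blocks are hypothetical stand-ins (they are not the effectuation or planned-behavior variables of the study); the point is only how the R² increment of a block changes when a correlated block is entered before it.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an OLS fit of y on X with an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def hierarchical_r2(blocks, y):
    """Enter predictor blocks in order; return (cumulative R^2, delta R^2) per step."""
    steps, entered, previous = [], None, 0.0
    for block in blocks:
        entered = block if entered is None else np.column_stack([entered, block])
        r2 = r_squared(entered, y)
        steps.append((r2, r2 - previous))
        previous = r2
    return steps

# Two correlated predictors (correlation about 0.8) built from a shared component.
rng = np.random.default_rng(7)
n = 200
shared = rng.standard_normal(n)
block_a = 0.9 * shared + 0.45 * rng.standard_normal(n)
block_b = 0.9 * shared + 0.45 * rng.standard_normal(n)
y = block_a + block_b + rng.standard_normal(n)

a_first = hierarchical_r2([block_a, block_b], y)   # block A entered first
b_first = hierarchical_r2([block_b, block_a], y)   # block A entered second
```

    The final cumulative R² is the same for both orders, but the increment credited to block A is much larger when it is entered first, because the variance it shares with block B is attributed to whichever block enters earlier.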

  5. Multiple Retrieval Models and Regression Models for Prior Art Search

    CERN Document Server

    Lopez, Patrice

    2009-01-01

    This paper presents the system called PATATRAS (PATent and Article Tracking, Retrieval and AnalysiS) realized for the IP track of CLEF 2009. Our approach presents three main characteristics: 1. The usage of multiple retrieval models (KL, Okapi) and term index definitions (lemma, phrase, concept) for the three languages considered in the present track (English, French, German) producing ten different sets of ranked results. 2. The merging of the different results based on multiple regression models using an additional validation set created from the patent collection. 3. The exploitation of patent metadata and of the citation structures for creating restricted initial working sets of patents and for producing a final re-ranking regression model. As we exploit specific metadata of the patent documents and the citation relations only at the creation of initial working sets and during the final post ranking step, our architecture remains generic and easy to extend.

  6. Heteroscedastic regression analysis method for mixed data

    Institute of Scientific and Technical Information of China (English)

    FU Hui-min; YUE Xiao-rui

    2011-01-01

    The heteroscedastic regression model was established and the heteroscedastic regression analysis method was presented for mixed data composed of complete data, type-I censored data and type-II censored data from the location-scale distribution. The best unbiased estimates of the regression coefficients, as well as the confidence limits of the location parameter and scale parameter, were given. Furthermore, the point estimates and confidence limits of percentiles were obtained. Thus, the traditional multiple regression analysis method, which is only suitable for complete data from the normal distribution, can be extended to the cases of heteroscedastic mixed data and the location-scale distribution. So the presented method has a broad range of promising applications.

  7. Fuzzy multiple linear regression: A computational approach

    Science.gov (United States)

    Juang, C. H.; Huang, X. H.; Fleming, J. W.

    1992-01-01

    This paper presents a new computational approach for performing fuzzy regression. In contrast to Bardossy's approach, the new approach, while dealing with fuzzy variables, closely follows the conventional regression technique. In this approach, treatment of fuzzy input is more 'computational' than 'symbolic.' The following sections first outline the formulation of the new approach, then deal with the implementation and computational scheme, and this is followed by examples to illustrate the new procedure.

  8. Radiologic assessment of third molar tooth and spheno-occipital synchondrosis for age estimation: a multiple regression analysis study.

    Science.gov (United States)

    Demirturk Kocasarac, Husniye; Sinanoglu, Alper; Noujeim, Marcel; Helvacioglu Yigit, Dilek; Baydemir, Canan

    2016-05-01

    For forensic age estimation, radiographic assessment of third molar mineralization is important between 14 and 21 years, which coincides with the legal age in most countries. The spheno-occipital synchondrosis (SOS) is an important growth site during development, and its use for age estimation is beneficial when combined with other markers. In this study, we aimed to develop a regression model to estimate and narrow the age range based on the radiologic assessment of third molar and SOS in a Turkish subpopulation. Panoramic radiographs and cone beam CT scans of 349 subjects (182 males, 167 females) aged between 8 and 25 were evaluated. A four-stage system was used to evaluate the fusion degree of SOS, and Demirjian's eight stages of development for calcification were used for third molars. The Pearson correlation indicated a strong positive relationship between age and third molar calcification for both sexes (r = 0.850 for females, r = 0.839 for males, P [...] age and SOS fusion for females (r = 0.814), but a moderate relationship was found for males (r = 0.599), P [...] age determination formula using these scores was established.

  9. Multiple Linear Regression Analysis Indicates Association of P-Glycoprotein Substrate or Inhibitor Character with Bitterness Intensity, Measured with a Sensor.

    Science.gov (United States)

    Yano, Kentaro; Mita, Suzune; Morimoto, Kaori; Haraguchi, Tamami; Arakawa, Hiroshi; Yoshida, Miyako; Yamashita, Fumiyoshi; Uchida, Takahiro; Ogihara, Takuo

    2015-09-01

    P-glycoprotein (P-gp) regulates absorption of many drugs in the gastrointestinal tract and their accumulation in tumor tissues, but the basis of substrate recognition by P-gp remains unclear. Bitter-tasting phenylthiocarbamide, which stimulates taste receptor 2 member 38 (T2R38), increases P-gp activity and is a substrate of P-gp. This led us to hypothesize that bitterness intensity might be a predictor of P-gp-inhibitor/substrate status. Here, we measured the bitterness intensity of a panel of P-gp substrates and nonsubstrates with various taste sensors, and used multiple linear regression analysis to examine the relationship between P-gp-inhibitor/substrate status and various physical properties, including intensity of bitter taste measured with the taste sensor. We calculated the first principal component analysis score (PC1) as the representative value of bitterness, as all the taste sensors' outputs were significantly correlated. The P-gp substrates showed remarkably greater mean bitterness intensity than non-P-gp substrates. We found that the Km values of P-gp substrates were correlated with molecular weight, log P, and PC1 value, and the coefficient of determination (R²) of the linear regression equation was 0.63. This relationship might be useful as an aid to predict P-gp substrate status at an early stage of drug discovery.

  10. Quantifying TiO2 Abundance of Lunar Soils: Partial Least Squares and Stepwise Multiple Regression Analysis for Determining Causal Effect

    Institute of Scientific and Technical Information of China (English)

    Lin Li

    2011-01-01

    Partial least squares (PLS) regression was applied to the Lunar Soil Characterization Consortium (LSCC) dataset for spectral estimation of TiO2. The LSCC dataset was split into a number of subsets including the low-Ti, high-Ti, total mare soils, total highland, Apollo 16, and Apollo 14 soils to investigate the effects of interfering minerals and nonlinearity on the PLS performance. The PLS weight loading vectors were analyzed through stepwise multiple regression analysis (SMRA) to identify mineral species driving and interfering with the PLS performance. PLS exhibits high performance for estimating TiO2 for the LSCC low-Ti and high-Ti mare samples and both groups analyzed together. The results suggest that while the dominant TiO2-bearing minerals are few, additional PLS factors are required to compensate for the effects on the important PLS factors of minerals that are not highly correlated to TiO2, to accommodate nonlinear relationships between reflectance and TiO2, and to correct inconsistent mineral-TiO2 correlations between the high-Ti and low-Ti mare samples. Analysis of the LSCC highland soil samples indicates that the Apollo 16 soils are responsible for the large errors of TiO2 estimates when the soils are modeled with other subgroups. For the LSCC Apollo 16 samples, the dominant spectral effects of plagioclase over other dark minerals are primarily responsible for large errors of estimated TiO2. For the Apollo 14 soils, more accurate estimation for TiO2 is attributed to the positive correlation between a major TiO2-bearing component and TiO2, explaining why the Apollo 14 soils follow the regression trend when analyzed with other soil groups.

  11. Credit Scoring Problem Based on Regression Analysis

    OpenAIRE

    Khassawneh, Bashar Suhil Jad Allah

    2014-01-01

    ABSTRACT: This thesis provides an explanatory introduction to the regression models of data mining and contains basic definitions of key terms in the linear, multiple and logistic regression models. Meanwhile, the aim of this study is to illustrate fitting models for the credit scoring problem using simple linear, multiple linear and logistic regression models and also to analyze the found model functions by statistical tools. Keywords: Data mining, linear regression, logistic regression....

  12. Exploring the equity of GP practice prescribing rates for selected coronary heart disease drugs: a multiple regression analysis with proxies of healthcare need

    Directory of Open Access Journals (Sweden)

    St Leger Antony S

    2005-02-01

    Full Text Available Abstract. Background: There is a small, but growing body of literature highlighting inequities in GP practice prescribing rates for many drug therapies. The aim of this paper is to further explore the equity of prescribing for five major CHD drug groups and to explain the amount of variation in GP practice prescribing rates that can be explained by a range of healthcare needs indicators (HCNIs). Methods: The study involved a cross-sectional secondary analysis in four primary care trusts (PCTs 1–4) in the North West of England, including 132 GP practices. Prescribing rates (average daily quantities per registered patient aged over 35 years) and HCNIs were developed for all GP practices. Analysis was undertaken using multiple linear regression. Results: Between 22–25% of the variation in prescribing rates for statins, beta-blockers and bendrofluazide was explained in the multiple regression models. Slightly more variation was explained for ACE inhibitors (31.6%) and considerably more for aspirin (51.2%). Prescribing rates were positively associated with CHD hospital diagnoses and procedures for all drug groups other than ACE inhibitors. The proportion of patients aged 55–74 years was positively related to all prescribing rates other than aspirin, where they were positively related to the proportion of patients aged >75 years. However, prescribing rates for statins and ACE inhibitors were negatively associated with the proportion of patients aged >75 years in addition to the proportion of patients from minority ethnic groups. Prescribing rates for aspirin, bendrofluazide and all CHD drugs combined were negatively associated with deprivation. Conclusion: Although around 25–50% of the variation in prescribing rates was explained by HCNIs, this varied markedly between PCTs and drug groups. Prescribing rates were generally characterised by both positive and negative associations with HCNIs, suggesting possible inequities in prescribing rates on the basis

  13. Abnormal behavior of the least squares estimate of multiple regression

    Institute of Scientific and Technical Information of China (English)

    陈希孺; 安鸿志

    1997-01-01

    An example is given to reveal the abnormal behavior of the least squares estimate of multiple regression. It is shown that the least squares estimate of the multiple linear regression may be "improved" in the sense of weak consistency when nuisance parameters are introduced into the model. A discussion on the implications of this finding is given.

  14. Estimating Loess Plateau Average Annual Precipitation with Multiple Linear Regression Kriging and Geographically Weighted Regression Kriging

    Directory of Open Access Journals (Sweden)

    Qiutong Jin

    2016-06-01

    Full Text Available Estimating the spatial distribution of precipitation is an important and challenging task in hydrology, climatology, ecology, and environmental science. In order to generate a highly accurate distribution map of average annual precipitation for the Loess Plateau in China, multiple linear regression Kriging (MLRK) and geographically weighted regression Kriging (GWRK) methods were employed using precipitation data from the period 1980–2010 from 435 meteorological stations. The predictors in regression Kriging were selected by stepwise regression analysis from many auxiliary environmental factors, such as elevation (DEM), normalized difference vegetation index (NDVI), solar radiation, slope, and aspect. All predictor distribution maps had a 500 m spatial resolution. Validation precipitation data from 130 hydrometeorological stations were used to assess the prediction accuracies of the MLRK and GWRK approaches. Results showed that both prediction maps with a 500 m spatial resolution interpolated by MLRK and GWRK had a high accuracy and captured detailed spatial distribution data; however, MLRK produced a lower prediction error and a higher variance explanation than GWRK, although the differences were small, in contrast to conclusions from similar studies.

  15. Non-destructive evaluation of chlorophyll content in quinoa and amaranth leaves by simple and multiple regression analysis of RGB image components.

    Science.gov (United States)

    Riccardi, M; Mele, G; Pulvento, C; Lavini, A; d'Andria, R; Jacobsen, S-E

    2014-06-01

    Leaf chlorophyll content provides valuable information about physiological status of plants; it is directly linked to photosynthetic potential and primary production. In vitro assessment by wet chemical extraction is the standard method for leaf chlorophyll determination. This measurement is expensive, laborious, and time consuming. Over the years alternative methods, rapid and non-destructive, have been explored. The aim of this work was to evaluate the applicability of a fast and non-invasive field method for estimation of chlorophyll content in quinoa and amaranth leaves based on RGB components analysis of digital images acquired with a standard SLR camera. Digital images of leaves from different genotypes of quinoa and amaranth were acquired directly in the field. Mean values of each RGB component were evaluated via image analysis software and correlated to leaf chlorophyll provided by standard laboratory procedure. Single and multiple regression models using RGB color components as independent variables have been tested and validated. The performance of the proposed method was compared to that of the widely used non-destructive SPAD method. Sensitivity of the best regression models for different genotypes of quinoa and amaranth was also checked. Color data acquisition of the leaves in the field with a digital camera was quick, more effective, and lower cost than SPAD. The proposed RGB models provided better correlation (highest R²) and prediction (lowest RMSEP) of the true value of foliar chlorophyll content and had a lower amount of noise in the whole range of chlorophyll studied compared with SPAD and other leaf image processing based models when applied to quinoa and amaranth.
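
    The comparison of simple and multiple regression on RGB components, scored by R² on calibration data and RMSEP on held-out data, can be sketched as below. All numbers are synthetic: the assumed linear relation between mean R, G, B values and chlorophyll, the noise level, and the 80/40 calibration/validation split are illustrative choices, not values from the study.

```python
import numpy as np

def fit_linear(X, y):
    """OLS coefficients (intercept first) for y regressed on the columns of X."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict(coef, X):
    return np.column_stack([np.ones(X.shape[0]), X]) @ coef

def r2(y, y_hat):
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

def rmsep(y, y_hat):
    """Root mean square error of prediction."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

# Synthetic leaf data: mean R, G, B per image and a made-up chlorophyll response.
rng = np.random.default_rng(42)
n = 120
rgb = rng.uniform(40.0, 200.0, size=(n, 3))
chlorophyll = (60.0 - 0.10 * rgb[:, 0] - 0.15 * rgb[:, 1] + 0.05 * rgb[:, 2]
               + rng.normal(0.0, 1.5, n))

cal, val = slice(0, 80), slice(80, n)             # calibration / validation split

simple = fit_linear(rgb[cal, 1:2], chlorophyll[cal])   # green channel only
multiple = fit_linear(rgb[cal], chlorophyll[cal])      # R, G and B together

r2_simple = r2(chlorophyll[cal], predict(simple, rgb[cal, 1:2]))
r2_multiple = r2(chlorophyll[cal], predict(multiple, rgb[cal]))
rmsep_simple = rmsep(chlorophyll[val], predict(simple, rgb[val, 1:2]))
rmsep_multiple = rmsep(chlorophyll[val], predict(multiple, rgb[val]))
```

    On calibration data the multiple model can never have a lower R² than the nested single-channel model, which is why the held-out RMSEP is the more informative figure when deciding whether the extra channels help.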

  16. Non-destructive evaluation of chlorophyll content in quinoa and amaranth leaves by simple and multiple regression analysis of RGB image components

    DEFF Research Database (Denmark)

    Riccardi, M.; Mele, G.; Pulvento, C.

    2014-01-01

    by standard laboratory procedure. Single and multiple regression models using RGB color components as independent variables have been tested and validated. The performance of the proposed method was compared to that of the widely used non-destructive SPAD method. Sensitivity of the best regression models...... is expensive, laborious, and time consuming. Over the years alternative methods, rapid and non-destructive, have been explored. The aim of this work was to evaluate the applicability of a fast and non-invasive field method for estimation of chlorophyll content in quinoa and amaranth leaves based on RGB...

  17. The Detection and Interpretation of Interaction Effects between Continuous Variables in Multiple Regression.

    Science.gov (United States)

    Jaccard, James; And Others

    1990-01-01

    Issues in the detection and interpretation of interaction effects between quantitative variables in multiple regression analysis are discussed. Recent discussions associated with problems of multicollinearity are reviewed in the context of the conditional nature of multiple regression with product terms. (TJH)
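
    The conditional reading of coefficients in a model with a product term, and the effect of mean-centering on collinearity, can be illustrated with the sketch below. The data and coefficient values are synthetic assumptions; the model is y = b0 + b_x·x + b_z·z + b_xz·x·z.

```python
import numpy as np

def fit_with_product(x, z, y):
    """OLS fit of y on x, z and the product term x*z (intercept first)."""
    design = np.column_stack([np.ones_like(x), x, z, x * z])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef  # [b0, b_x, b_z, b_xz]

rng = np.random.default_rng(1)
n = 500
x = rng.normal(10.0, 2.0, n)
z = rng.normal(5.0, 1.0, n)
y = 2.0 + 0.5 * x + 1.0 * z + 0.3 * x * z + rng.normal(0.0, 1.0, n)

raw = fit_with_product(x, z, y)

# Mean-center the predictors before forming the product term.
xc, zc = x - x.mean(), z - z.mean()
centered = fit_with_product(xc, zc, y)

# Correlation between a predictor and the product term, before and after centering.
corr_raw = np.corrcoef(x, x * z)[0, 1]
corr_centered = np.corrcoef(xc, xc * zc)[0, 1]
```

    Centering leaves the interaction coefficient unchanged and lowers the correlation between x and the product term. What changes is the meaning of the lower-order coefficient: in the raw fit b_x is the slope of x when z = 0, while in the centered fit it is the slope of x at the mean of z, which equals the raw b_x plus b_xz times the mean of z.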

  18. Determination of the acid dissociation constant of bromocresol green and cresol red in water/AOT/isooctane reverse micelles by multiple linear regression and extended principal component analysis.

    Science.gov (United States)

    Caselli, Maurizio; Mangone, Annarosa; Paolillo, Paola; Traini, Angela

    2002-01-01

    The pKa of 3',3'',5',5''-tetrabromo-m-cresolsulfonephthalein (Bromocresol Green) and o-cresolsulfonephthalein (Cresol Red) was spectrophotometrically measured in a water/AOT/isooctane microemulsion in the presence of a series of buffers carrying different charges at different water/surfactant ratios. Extended Principal Component Analysis was used for a precise determination of the apparent pKa and of the spectra of the acid and base forms of the dye. The apparent pKa of dyes in water-in-oil microemulsions depends on the charge of the acid and base forms of the buffers present in the water pool. Combination with multiple linear regression increases the precision. Results are discussed taking into account the profile of the electrostatic potential in the water pool and the possible partition of the indicator between the aqueous core and the surfactant. The pKa values corrected for these effects are independent of w0 and are close to the value of the pKa in bulk water. On the basis of a tentative hypothesis it is possible to calculate the true pKa of the buffer in the pool.

  19. Application of Multiple Linear Regression and Extended Principal-Component Analysis to Determination of the Acid Dissociation Constant of 7-Hydroxycoumarin in Water/AOT/Isooctane Reverse Micelles.

    Science.gov (United States)

    Caselli; Daniele; Mangone; Paolillo

    2000-01-15

    The apparent pK(a) of dyes in water-in-oil microemulsions depends on the charge of the acid and base forms of the buffers present in the water pool. Extended principal-component analysis allows the precise determination of the apparent pK(a) and of the spectra of the acid and base forms of the dye. Combination with multiple linear regression increases the precision. The pK(a) of 7-hydroxycoumarin (umbelliferone) was spectrophotometrically measured in a water/AOT/isooctane microemulsion in the presence of a series of buffers carrying different charges at various water/surfactant ratios. The spectra of the acid and base forms of the dye in the microemulsion are very similar to those in bulk water in the presence of Tris and ammonia. The presence of carbonate somewhat changes the spectrum of the acid form. Results are discussed taking into account the profile of the electrostatic potential drop in the water pool and the possible partition of umbelliferone between the aqueous core and the surfactant. The pK(a) values corrected for these effects are independent of w(0) and are close to the value of the pK(a) in bulk water. Copyright 2000 Academic Press.

  20. Clearness index in cloudy days estimated with meteorological information by multiple regression analysis; Kisho joho wo riyoshita kaiki bunseki ni yoru dontenbi no seiten shisu no suitei

    Energy Technology Data Exchange (ETDEWEB)

    Nakagawa, S. [Maizuru National College of Technology, Kyoto (Japan); Kenmoku, Y.; Sakakibara, T. [Toyohashi University of Technology, Aichi (Japan); Kawamoto, T. [Shizuoka University, Shizuoka (Japan). Faculty of Engineering

    1996-10-27

    A study is under way on more accurate prediction of solar radiation to improve the efficiency of solar energy utilization. Building on a technique for roughly estimating the day's clearness index from the weather forecast, the forecast (made up of weather conditions such as 'clear' and 'cloudy' and modifiers such as 'afterward', 'temporary', and 'intermittent') has been quantified relative to the clearness index. This quantity is named the 'weather index' for the purpose of this article. The weather index has a high error rate on cloudy days, that is, when it falls in the range 0.2-0.5. It has also been found that the clearness index is highly correlated with the north-south wind direction component. A multiple regression analysis has therefore been carried out to estimate the clearness index from the maximum temperature and the north-south wind direction component. Compared with estimation of the clearness index from the weather index alone, estimation using the weather index and maximum temperature achieves a 3% improvement throughout the year. Estimation using the weather index and the north-south wind direction component yields a 2% improvement for summer and a 5% or higher improvement for winter. 2 refs., 6 figs., 4 tabs.

  1. Principal component regression analysis with SPSS.

    Science.gov (United States)

    Liu, R X; Kuang, J; Gong, Q; Hou, X L

    2003-06-01

    The paper introduces the indices used to diagnose multicollinearity, the basic principle of principal component regression, and a method for determining the 'best' equation. It uses an example to describe how to carry out principal component regression analysis with SPSS 10.0, including all calculation steps of the principal component regression and all operations of the linear regression, factor analysis, descriptives, compute variable and bivariate correlations procedures in SPSS 10.0. Principal component regression analysis can be used to overcome the disturbance caused by multicollinearity. With SPSS, the analysis becomes simpler and faster while remaining accurate.
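
    The basic principle of principal component regression can be sketched outside SPSS as well. The following is a minimal NumPy sketch, not the paper's SPSS procedure: the function name `pcr_fit`, the synthetic data, and the choice of two retained components are assumptions made for illustration. Predictors are standardized, the response is regressed on the leading principal component scores, and the coefficients are mapped back to the original predictor scale.

```python
import numpy as np

def pcr_fit(X, y, n_components):
    """Principal component regression; returns (intercept, coefficients) on the original scale."""
    x_mean = X.mean(axis=0)
    x_std = X.std(axis=0, ddof=1)
    Z = (X - x_mean) / x_std                      # standardized predictors
    # Rows of Vt are the principal directions of the standardized predictors.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    V = Vt[:n_components].T                       # keep the leading components
    scores = Z @ V                                # principal component scores
    y_mean = y.mean()
    gamma, *_ = np.linalg.lstsq(scores, y - y_mean, rcond=None)
    beta = (V @ gamma) / x_std                    # back to the original predictor scale
    intercept = y_mean - x_mean @ beta
    return intercept, beta

# Synthetic example (assumed): x2 is nearly a copy of x1, so the two are strongly collinear.
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 2.0 + 1.0 * x1 + 1.0 * x2 - 0.5 * x3 + rng.normal(scale=0.5, size=n)

# Ordinary least squares for comparison.
A = np.column_stack([np.ones(n), X])
ols, *_ = np.linalg.lstsq(A, y, rcond=None)

b0_full, b_full = pcr_fit(X, y, n_components=3)   # all components: reproduces OLS
b0_two, b_two = pcr_fit(X, y, n_components=2)     # drops the smallest-variance direction

print("OLS coefficients:        ", ols[1:])
print("PCR, 3 components:       ", b_full)
print("PCR, 2 components:       ", b_two)
```

    Keeping every component reproduces ordinary least squares; dropping the smallest-variance component trades a little bias for more stable coefficients on the collinear pair.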

  2. Vehicle Travel Time Predication based on Multiple Kernel Regression

    Directory of Open Access Journals (Sweden)

    Wenjing Xu

    2014-07-01

    With the rapid development of the transportation and logistics economy, vehicle travel time prediction and planning have become an important topic in logistics. Travel time prediction, which is indispensable for traffic guidance, has become a key issue for researchers in this field. At present, travel time prediction is mainly short-term, and the prediction methods include artificial neural networks, the Kalman filter and support vector regression (SVR). However, these algorithms still have shortcomings, such as high computational complexity and slow convergence. This paper exploits the learning ability of multiple kernel learning regression (MKLR) in nonlinear prediction and applies MKLR to vehicle travel time prediction for logistics planning. The method includes the following steps: (1) preprocessing the historical data; (2) selecting an appropriate kernel function, training on the historical data and performing analysis; (3) predicting the vehicle travel time with the trained model. The experimental results show that, compared with other prediction methods, the proposed method achieves higher accuracy. They also illustrate the feasibility and effectiveness of the proposed prediction method.

  3. Isolating and Examining Sources of Suppression and Multicollinearity in Multiple Linear Regression

    Science.gov (United States)

    Beckstead, Jason W.

    2012-01-01

    The presence of suppression (and multicollinearity) in multiple regression analysis complicates interpretation of predictor-criterion relationships. The mathematical conditions that produce suppression in regression analysis have received considerable attention in the methodological literature but until now nothing in the way of an analytic…

  4. A Multiple Regression Approach to Normalization of Spatiotemporal Gait Features.

    Science.gov (United States)

    Wahid, Ferdous; Begg, Rezaul; Lythgo, Noel; Hass, Chris J; Halgamuge, Saman; Ackland, David C

    2016-04-01

    Normalization of gait data is performed to reduce the effects of intersubject variations due to physical characteristics. This study reports a multiple regression normalization approach for spatiotemporal gait data that takes into account intersubject variations in self-selected walking speed and physical properties including age, height, body mass, and sex. Spatiotemporal gait data including stride length, cadence, stance time, double support time, and stride time were obtained from healthy subjects including 782 children, 71 adults, 29 elderly subjects, and 28 elderly Parkinson's disease (PD) patients. Data were normalized using standard dimensionless equations, a detrending method, and a multiple regression approach. After normalization using dimensionless equations and the detrending method, weak to moderate correlations between walking speed, physical properties, and spatiotemporal gait features were observed (0.01 …); normalization using the multiple regression method reduced these correlations to weak values (|r| …). Normalization using dimensionless equations and detrending resulted in significant differences in stride length and double support time of PD patients; however, the multiple regression approach revealed significant differences in these features as well as in cadence, stance time, and stride time. The proposed multiple regression normalization may be useful in machine learning, gait classification, and clinical evaluation of pathological gait patterns.

  5. New Insights into Trace Element Partitioning in Amphibole from Multiple Regression Analysis, with Application to the Magma Plumbing System of Mt. Lamington (Papua New Guinea)

    Science.gov (United States)

    Zhang, J.; Humphreys, M.; Cooper, G.; Davidson, J.; Macpherson, C.

    2015-12-01

    We present a new multiple regression (MR) analysis of published amphibole-melt trace element partitioning data, with the aim of retrieving robust relationships between amphibole crystal-chemical compositions and trace element partition coefficients (D). We examined experimental data for calcic amphiboles of kaersutite, pargasite, tschermakite (Tsch), magnesiohornblende (MgHbl) and magnesiohastingsite (MgHst) compositions crystallized from basanitic-rhyolitic melts (n = 150). The MR analysis demonstrates the varying significance of amphibole major element components assigned to different crystallographic sites (T, M1-3, M4, A) as independent variables in controlling D, and it allows us to retrieve statistically significant relationships for REE, Y, Rb, Sr, Pb, Ti, Zr, Nb (n > 25, R2 > 0.6, p-value < 0.05). For example, DLREE are controlled by SiT, M1-3 site components and CaM4, whereas DMREE-HREE are controlled solely by M1-3 site components. Our overall results for the REE are supported by application of the lattice strain model (Blundy & Wood, 1994). A significant advantage of our study over previous work linking D to melt polymerization (e.g. Tiepolo et al., 2007) is the ability to reconstruct melt compositions from in situ amphibole compositional analyses and published D data. We applied our MR analysis to Mt. Lamington (PNG), where Mg-Hst in quenched mafic enclaves are juxtaposed with MgHbl-Tsch phenocrysts from andesitic host lavas. The results indicate that MgHbl-Tsch are crystallized from a cool, rhyolitic melt (800-900±50 ºC, 70-77±5 wt % SiO2; Ridolfi & Renzulli 2012) with lower Rb and Sr and higher Pb, relative to a hot, andesitic-dacitic melt (950-1,000±50 ºC; 60-70±5 wt % SiO2) where MgHst are crystallized. REE and Nb contents are similar in both types of melts despite higher REE and Nb in MgHbl-Tsch. Therefore, the REE compositional disparity between MgHst and MgHbl-Tsch is driven by the difference in the DREE, rather than the melt REE

  6. Direction of Effects in Multiple Linear Regression Models.

    Science.gov (United States)

    Wiedermann, Wolfgang; von Eye, Alexander

    2015-01-01

    Previous studies analyzed asymmetric properties of the Pearson correlation coefficient using higher than second order moments. These asymmetric properties can be used to determine the direction of dependence in a linear regression setting (i.e., establish which of two variables is more likely to be on the outcome side) within the framework of cross-sectional observational data. Extant approaches are restricted to the bivariate regression case. The present contribution extends the direction of dependence methodology to a multiple linear regression setting by analyzing distributional properties of residuals of competing multiple regression models. It is shown that, under certain conditions, the third central moments of estimated regression residuals can be used to decide upon direction of effects. In addition, three different approaches for statistical inference are discussed: a combined D'Agostino normality test, a skewness difference test, and a bootstrap difference test. Type I error and power of the procedures are assessed using Monte Carlo simulations, and an empirical example is provided for illustrative purposes. In the discussion, issues concerning the quality of psychological data, possible extensions of the proposed methods to the fourth central moment of regression residuals, and potential applications are addressed.

  7. Interpreting Multiple Linear Regression: A Guidebook of Variable Importance

    Science.gov (United States)

    Nathans, Laura L.; Oswald, Frederick L.; Nimon, Kim

    2012-01-01

    Multiple regression (MR) analyses are commonly employed in social science fields. Interpretation of results also commonly reflects overreliance on beta weights, often resulting in very limited interpretations of variable importance. It appears that few researchers employ other methods to obtain a fuller understanding of what…

  8. Using Regression Mixture Analysis in Educational Research

    Directory of Open Access Journals (Sweden)

    Cody S. Ding

    2006-11-01

    Conventional regression analysis is typically used in educational research. Usually such an analysis implicitly assumes that a common set of regression parameter estimates captures the population characteristics represented in the sample. In some situations, however, this implicit assumption may not be realistic, and the sample may contain several subpopulations such as high math achievers and low math achievers. In these cases, conventional regression models may provide biased estimates since the parameter estimates are constrained to be the same across subpopulations. This paper advocates the application of regression mixture models, also known as latent class regression analysis, in educational research. Regression mixture analysis is more flexible than conventional regression analysis in that latent classes in the data can be identified and regression parameter estimates can vary within each latent class. An illustration of regression mixture analysis is provided using an authentic dataset. The strengths and limitations of the regression mixture models are discussed in the context of educational research.

  9. Applied regression analysis a research tool

    CERN Document Server

    Pantula, Sastry; Dickey, David

    1998-01-01

    Least squares estimation, when used appropriately, is a powerful research tool. A deeper understanding of the regression concepts is essential for achieving optimal benefits from a least squares analysis. This book builds on the fundamentals of statistical methods and provides appropriate concepts that will allow a scientist to use least squares as an effective research tool. Applied Regression Analysis is aimed at the scientist who wishes to gain a working knowledge of regression analysis. The basic purpose of this book is to develop an understanding of least squares and related statistical methods without becoming excessively mathematical. It is the outgrowth of more than 30 years of consulting experience with scientists and many years of teaching an applied regression course to graduate students. Applied Regression Analysis serves as an excellent text for a service course on regression for non-statisticians and as a reference for researchers. It also provides a bridge between a two-semester introduction to...

  10. Fundamental Analysis of the Linear Multiple Regression Technique for Quantification of Water Quality Parameters from Remote Sensing Data. Ph.D. Thesis - Old Dominion Univ.

    Science.gov (United States)

    Whitlock, C. H., III

    1977-01-01

    Constituents with linear radiance gradients with concentration may be quantified from signals which contain nonlinear atmospheric and surface reflection effects for both homogeneous and non-homogeneous water bodies, provided accurate data can be obtained and nonlinearities are constant with wavelength. Statistical parameters must be used which give an indication of bias as well as total squared error to ensure that an equation with an optimum combination of bands is selected. It is concluded that the effect of error in upwelled radiance measurements is to reduce the accuracy of the least-squares fitting process and to increase the number of points required to obtain a satisfactory fit. The problem of obtaining a multiple regression equation that is extremely sensitive to error is discussed.

  11. A comparative analysis of the effects of instructional design factors on student success in e-learning: multiple-regression versus neural networks

    Directory of Open Access Journals (Sweden)

    Halil Ibrahim Cebeci

    2009-12-01

    This study explores the relationship between student performance and instructional design. The research was conducted at the E-Learning School at a university in Turkey. A list of design factors that had potential influence on student success was created through a review of the literature and interviews with relevant experts. From this, the five most important design factors were chosen. The experts scored 25 university courses on the extent to which they demonstrated the chosen design factors. Multiple-regression and supervised artificial neural network (ANN) models were used to examine the relationship between student grade point averages and the scores on the five design factors. The results indicated that there is no statistical difference between the two models. Both models identified the use of examples and applications as the most influential factor. The ANN model provided more information and was used to predict the course-specific factor values required for a desired level of success.

  12. Least-Squares Linear Regression and Schrodinger's Cat: Perspectives on the Analysis of Regression Residuals.

    Science.gov (United States)

    Hecht, Jeffrey B.

    The analysis of regression residuals and detection of outliers are discussed, with emphasis on determining how deviant an individual data point must be to be considered an outlier and the impact that multiple suspected outlier data points have on the process of outlier determination and treatment. Only bivariate (one dependent and one independent)…

  13. Regression Analysis and the Sociological Imagination

    Science.gov (United States)

    De Maio, Fernando

    2014-01-01

    Regression analysis is an important aspect of most introductory statistics courses in sociology but is often presented in contexts divorced from the central concerns that bring students into the discipline. Consequently, we present five lesson ideas that emerge from a regression analysis of income inequality and mortality in the USA and Canada.

  14. Prediction on adsorption ratio of carbon dioxide to methane on coals with multiple linear regression

    Institute of Scientific and Technical Information of China (English)

    YU Hong-guan; MENG Xian-ming; FAN Wei-tang; YE Jian-ping

    2007-01-01

    Multiple linear regression equations relating the CO2/CH4 adsorption ratio to coal quality indexes were built with SPSS software on the basis of existing coal quality data and the corresponding CO2 and CH4 adsorption amounts. The regression equations were tested with data collected from some S, and the influences of the coal quality indexes on the CO2/CH4 adsorption ratio were studied by examining the regression equations. The results show that a regression equation relating the CO2/CH4 adsorption ratio to the volatile matter, ash and moisture in coal can be obtained with multiple linear regression analysis, and that the influence of a given coal quality index across degrees of metamorphosis, or of the coal quality indexes for a given coal rank, on the adsorption ratio is not consistent.

  15. A Solution to Separation and Multicollinearity in Multiple Logistic Regression.

    Science.gov (United States)

    Shen, Jianzhao; Gao, Sujuan

    2008-10-01

    In dementia screening tests, item selection for shortening an existing screening test can be achieved using multiple logistic regression. However, maximum likelihood estimates for such logistic regression models often experience serious bias or even non-existence because of separation and multicollinearity problems resulting from a large number of highly correlated items. Firth (1993, Biometrika, 80(1), 27-38) proposed a penalized likelihood estimator for generalized linear models, and it was shown to reduce bias and the non-existence problems. Ridge regression has been used in logistic regression to stabilize the estimates in cases of multicollinearity. However, neither method solves both problems. In this paper, we propose a double penalized maximum likelihood estimator combining Firth's penalized likelihood equation with a ridge parameter. We present a simulation study evaluating the empirical performance of the double penalized likelihood estimator in small to moderate sample sizes. We demonstrate the proposed approach using current screening data from a community-based dementia study.
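
    The effect of a ridge parameter on logistic regression can be shown on a toy example. The sketch below covers the ridge part only: it does not implement Firth's penalty or the double penalized estimator proposed in the record above. The function name `fit_ridge_logistic`, the six data points, and the penalty values are assumptions for illustration. The data are perfectly separated, so the unpenalized maximum likelihood slope does not exist, whereas the ridge-penalized slope is finite.

```python
import numpy as np

def fit_ridge_logistic(x, y, lam=1.0, n_iter=50):
    """Newton iterations for one-predictor logistic regression with an L2 penalty on the slope."""
    X = np.column_stack([np.ones_like(x), x])
    P = np.diag([0.0, lam])                 # penalize the slope, not the intercept
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (p - y) + P @ beta     # gradient of the penalized negative log-likelihood
        hess = X.T @ (X * (p * (1.0 - p))[:, None]) + P
        step = np.linalg.solve(hess, grad)
        beta = beta - step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta

# Perfectly separated toy data (assumed): every x < 0 has y = 0 and every x > 0 has y = 1.
x = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

beta_weak = fit_ridge_logistic(x, y, lam=1.0)
beta_strong = fit_ridge_logistic(x, y, lam=10.0)
print("slope with lam=1: ", beta_weak[1])
print("slope with lam=10:", beta_strong[1])
```

    A larger ridge parameter shrinks the slope further toward zero; the ridge penalty alone does not address the small-sample bias that Firth's penalty targets.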

  16. Two SPSS programs for interpreting multiple regression results.

    Science.gov (United States)

    Lorenzo-Seva, Urbano; Ferrando, Pere J; Chico, Eliseo

    2010-02-01

    When multiple regression is used in explanation-oriented designs, it is very important to determine both the usefulness of the predictor variables and their relative importance. Standardized regression coefficients are routinely provided by commercial programs. However, they generally function rather poorly as indicators of relative importance, especially in the presence of substantially correlated predictors. We provide two user-friendly SPSS programs that implement currently recommended techniques and recent developments for assessing the relevance of the predictors. The programs also allow the user to take into account the effects of measurement error. The first program, MIMR-Corr.sps, uses a correlation matrix as input, whereas the second program, MIMR-Raw.sps, uses the raw data and computes bootstrap confidence intervals of different statistics. The SPSS syntax, a short manual, and data files related to this article are available as supplemental materials from http://brm.psychonomic-journals.org/content/supplemental.

  17. Testing Mediation Using Multiple Regression and Structural Equation Modeling Analyses in Secondary Data

    Science.gov (United States)

    Li, Spencer D.

    2011-01-01

    Mediation analysis in child and adolescent development research is possible using large secondary data sets. This article provides an overview of two statistical methods commonly used to test mediated effects in secondary analysis: multiple regression and structural equation modeling (SEM). Two empirical studies are presented to illustrate the…

  18. Interpret with caution: multicollinearity in multiple regression of cognitive data.

    Science.gov (United States)

    Morrison, Catriona M

    2003-08-01

    Shibihara and Kondo in 2002 reported a reanalysis of the 1997 Kanji picture-naming data of Yamazaki, Ellis, Morrison, and Lambon-Ralph in which independent variables were highly correlated. Their addition of the variable visual familiarity altered the previously reported pattern of results, indicating that visual familiarity, but not age of acquisition, was important in predicting Kanji naming speed. The present paper argues that caution should be taken when drawing conclusions from multiple regression analyses in which the independent variables are so highly correlated, as such multicollinearity can lead to unreliable output.

  19. Forecasting relativistic electron flux using dynamic multiple regression models

    Directory of Open Access Journals (Sweden)

    H.-L. Wei

    2011-02-01

    The forecast of high energy electron fluxes in the radiation belts is important because the exposure of modern spacecraft to high energy particles can result in significant damage to onboard systems. A comprehensive physical model of processes related to electron energisation that can be used for such a forecast has not yet been developed. In the present paper a systems identification approach is exploited to deduce a dynamic multiple regression model that can be used to predict the daily maximum of high energy electron fluxes at geosynchronous orbit from data. It is shown that the model developed provides reliable predictions.

  20. Joint regression analysis and AMMI model applied to oat improvement

    Science.gov (United States)

    Oliveira, A.; Oliveira, T. A.; Mejza, S.

    2012-09-01

    In our work we present an application of some biometrical methods useful in genotype stability evaluation, namely the AMMI model, Joint Regression Analysis (JRA) and multiple comparison tests. A genotype stability analysis of oat (Avena sativa L.) grain yield was carried out using data from the Portuguese Plant Breeding Board for a sample of 22 different genotypes grown during 2002, 2003 and 2004 at six locations. Ferreira et al. (2006) state the relevance of the regression models and of the Additive Main Effects and Multiplicative Interactions (AMMI) model for studying and estimating phenotypic stability effects. As computational techniques we use the Zigzag algorithm to estimate the regression coefficients and the agricolae package available in R for AMMI model analysis.

  1. Multiple regression analysis of influencing factors in FibroScan detecting liver stiffness

    Institute of Scientific and Technical Information of China (English)

    王爱光; 崔蕾; 王桂玲; 李建志; 张梅芳; 赵敏; 孙爱华; 单容

    2013-01-01

    Objective: To determine the independent predictors influencing liver stiffness detection with FibroScan by multiple regression analysis. Methods: One hundred and eighty-one inpatients and outpatients who required liver biopsy were enrolled. Liver stiffness measurement (LSM) was performed with FibroScan on the day of liver biopsy, and clinical information and routine biochemical test data (± 3 days) were gathered. Ten factors were chosen for the multiple regression analysis. Results: Multiple regression analysis showed that platelet (PLT), serum albumin (ALB), prothrombin activity (PTA) and body mass index (BMI) were independent predictors. Conclusion: In FibroScan detection of liver stiffness, PLT, ALB, PTA and BMI might be influencing factors.

  2. Relative risk regression analysis of epidemiologic data.

    Science.gov (United States)

    Prentice, R L

    1985-11-01

    Relative risk regression methods are described. These methods provide a unified approach to a range of data analysis problems in environmental risk assessment and in the study of disease risk factors more generally. Relative risk regression methods are most readily viewed as an outgrowth of Cox's regression and life model. They can also be viewed as a regression generalization of more classical epidemiologic procedures, such as that due to Mantel and Haenszel. In the context of an epidemiologic cohort study, relative risk regression methods extend conventional survival data methods and binary response (e.g., logistic) regression models by taking explicit account of the time to disease occurrence while allowing arbitrary baseline disease rates, general censorship, and time-varying risk factors. This latter feature is particularly relevant to many environmental risk assessment problems wherein one wishes to relate disease rates at a particular point in time to aspects of a preceding risk factor history. Relative risk regression methods also adapt readily to time-matched case-control studies and to certain less standard designs. The uses of relative risk regression methods are illustrated and the state of development of these procedures is discussed. It is argued that asymptotic partial likelihood estimation techniques are now well developed in the important special case in which the disease rates of interest have interpretations as counting process intensity functions. Estimation of relative risks processes corresponding to disease rates falling outside this class has, however, received limited attention. The general area of relative risk regression model criticism has, as yet, not been thoroughly studied, though a number of statistical groups are studying such features as tests of fit, residuals, diagnostics and graphical procedures. Most such studies have been restricted to exponential form relative risks as have simulation studies of relative risk estimation

  3. Hot Resistance Estimation for Dry Type Transformer Using Multiple Variable Regression, Multiple Polynomial Regression and Soft Computing Techniques

    Directory of Open Access Journals (Sweden)

    M. Srinivasan

    2012-01-01

    Problem statement: This study presents a novel method for determining the average winding temperature rise of transformers under predetermined field operating conditions. The rise in winding temperature was determined from the estimated values of winding resistance during the heat run test conducted as per the IEC standard. Approach: The estimation of hot resistance was modeled using Multiple Variable Regression (MVR), Multiple Polynomial Regression (MPR) and soft computing techniques such as Artificial Neural Network (ANN) and Adaptive Neuro Fuzzy Inference System (ANFIS). The modeled hot resistance helps to find the load losses at any load situation without using a complicated measurement setup in transformers. Results: These techniques were applied to hot resistance estimation for a dry-type transformer using cold resistance, ambient temperature and temperature rise as input variables. The results are compared and show good agreement between measured and computed values. Conclusion: The proposed methods are verified using experimental results obtained from a temperature rise test performed on a 55 kVA dry-type transformer.

  4. Contiguous Uniform Deviation for Multiple Linear Regression in Pattern Recognition

    Science.gov (United States)

    Andriana, A. S.; Prihatmanto, D.; Hidaya, E. M. I.; Supriana, I.; Machbub, C.

    2017-01-01

    Understanding images by recognizing their objects is still a challenging task. Face element detection has been developed by researchers but does not yet provide enough information (low resolution in information) for recognizing objects. Available face recognition methods still make classification errors and need a huge number of examples, which may still be incomplete. Another approach, still rare in image understanding, uses pattern structures or syntactic grammars that describe detailed shape features. Image pixel values are also processed as signal patterns, which are approximated by fitting mathematical function curves. This paper attempts to add a contiguous uniform deviation method to the curve fitting algorithm to increase its applicability in image recognition systems related to object movement. The combination of multiple linear regression and the contiguous uniform deviation method is applied to the function of image pixel values and yields a higher-resolution (more informative) description of visual object detail in object movement.

  5. Novel applications of multitask learning and multiple output regression to multiple genetic trait prediction

    Science.gov (United States)

    Kuhn, David; Parida, Laxmi

    2016-01-01

    Given a set of biallelic molecular markers, such as SNPs, with genotype values encoded numerically on a collection of plant, animal or human samples, the goal of genetic trait prediction is to predict the quantitative trait values by simultaneously modeling all marker effects. Genetic trait prediction is usually represented as linear regression models. In many cases, for the same set of samples and markers, multiple traits are observed. Some of these traits might be correlated with each other. Therefore, modeling all the multiple traits together may improve the prediction accuracy. In this work, we view the multitrait prediction problem from a machine learning angle: as either a multitask learning problem or a multiple output regression problem, depending on whether different traits share the same genotype matrix or not. We then adapted multitask learning algorithms and multiple output regression algorithms to solve the multitrait prediction problem. We proposed a few strategies to improve the least squares error of the prediction from these algorithms. Our experiments show that modeling multiple traits together could improve the prediction accuracy for correlated traits. Availability and implementation: The programs we used are either public or directly from the referred authors, such as the MALSAR package (http://www.public.asu.edu/~jye02/Software/MALSAR/). The Avocado data set has not been published yet and is available upon request. Contact: dhe@us.ibm.com PMID:27307640

  6. Modeling Pan Evaporation for Kuwait by Multiple Linear Regression

    Directory of Open Access Journals (Sweden)

    Jaber Almedeij

    2012-01-01

    Evaporation is an important parameter for many projects related to hydrology and water resources systems. This paper constitutes the first study conducted in Kuwait to obtain empirical relations for the estimation of daily and monthly pan evaporation as functions of available meteorological data on temperature, relative humidity, and wind speed. The data used for the modeling are daily measurements with substantially continuous coverage over a period of 17 years between January 1993 and December 2009, which can be considered representative of the desert climate of the urban zone of the country. A multiple linear regression technique is used with a variable selection procedure for fitting the best model forms. The relations of evaporation with temperature and relative humidity are also transformed, using power and exponential functions respectively, in order to linearize the curvilinear patterns in the data. The evaporation models suggested with the best variable combinations were shown to produce results that are in reasonable agreement with observed values.
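
    The linearizing transformations mentioned in this abstract can be sketched as follows. This is a minimal illustration only: the abstract does not give the fitted model forms, so the functions `fit_exponential` and `fit_power`, the variable names, and the noise-free data are all assumptions made here. An exponential curve y = a * exp(b * x) becomes a straight line after taking ln(y), and a power curve y = a * x**b becomes a straight line after taking the logarithm of both variables.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) by regressing ln(y) on x (requires y > 0)."""
    intercept, slope = fit_line(xs, [math.log(y) for y in ys])
    return math.exp(intercept), slope

def fit_power(xs, ys):
    """Fit y = a * x**b by regressing ln(y) on ln(x) (requires x, y > 0)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    intercept, slope = fit_line(log_x, log_y)
    return math.exp(intercept), slope

# Hypothetical, noise-free illustration data (not taken from the paper).
humidity = [10.0, 20.0, 30.0, 40.0, 50.0]
evaporation = [12.0 * math.exp(-0.03 * h) for h in humidity]
a, b = fit_exponential(humidity, evaporation)
```

    Once the transformed variables are linear in the coefficients, they can enter a multiple linear regression alongside untransformed predictors such as wind speed.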

  7. Overcoming multicollinearity in multiple regression using correlation coefficient

    Science.gov (United States)

    Zainodin, H. J.; Yap, S. J.

    2013-09-01

    Multicollinearity occurs when independent variables are highly correlated with one another. In that case it is difficult to separate the contributions of the individual independent variables to the dependent variable, because they compete to explain much of the same variance. Multicollinearity also violates an assumption of multiple regression, namely that there is no collinearity among the candidate independent variables. An alternative approach is therefore introduced for overcoming the multicollinearity problem and arriving at a well-represented model. The approach removes the variables that are the source of multicollinearity on the basis of their correlation coefficients in the full correlation matrix. Using the full correlation matrix makes it straightforward to implement the removal with Excel functions. The procedure is found to be easier and time-saving, especially when dealing with a large number of independent variables in a model and a large number of possible models. The paper presents the procedure in detail, compares it, and implements it.
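
    A correlation-based removal step of the kind described here might look like the sketch below. The abstract does not state the authors' exact removal rule or threshold, so the rule used in `drop_collinear` (drop, from the most correlated pair, the variable that is less correlated with the response), the 0.95 threshold, and the data are assumptions for illustration.

```python
import math

def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)

def drop_collinear(predictors, y, threshold=0.95):
    """Return the predictor names kept after removing highly correlated ones.

    predictors: dict mapping name -> list of values.
    Rule (an assumption of this sketch): while any pair has |r| > threshold,
    drop the member of the most correlated pair that is less correlated with y.
    """
    kept = list(predictors)
    while True:
        worst = None
        for i, a in enumerate(kept):
            for b in kept[i + 1:]:
                r = abs(pearson(predictors[a], predictors[b]))
                if r > threshold and (worst is None or r > worst[0]):
                    worst = (r, a, b)
        if worst is None:
            return kept
        _, a, b = worst
        ra = abs(pearson(predictors[a], y))
        rb = abs(pearson(predictors[b], y))
        kept.remove(a if ra < rb else b)

# Hypothetical data: x2 is nearly 2 * x1, x3 is unrelated to both.
data = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "x2": [2.1, 3.9, 6.2, 7.8, 10.1, 12.0],
    "x3": [5.0, 1.0, 4.0, 2.0, 6.0, 3.0],
}
y = [1.2, 1.9, 3.2, 3.8, 5.1, 6.1]
kept = drop_collinear(data, y)
```

    Only one of the two nearly collinear predictors survives, while the unrelated predictor is always kept.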

  8. Functional data analysis of generalized regression quantiles

    KAUST Repository

    Guo, Mengmeng

    2013-11-05

    Generalized regression quantiles, including the conditional quantiles and expectiles as special cases, are useful alternatives to the conditional means for characterizing a conditional distribution, especially when the interest lies in the tails. We develop a functional data analysis approach to jointly estimate a family of generalized regression quantiles. Our approach assumes that the generalized regression quantiles share some common features that can be summarized by a small number of principal component functions. The principal component functions are modeled as splines and are estimated by minimizing a penalized asymmetric loss measure. An iterative least asymmetrically weighted squares algorithm is developed for computation. While separate estimation of individual generalized regression quantiles usually suffers from large variability due to lack of sufficient data, by borrowing strength across data sets, our joint estimation approach significantly improves the estimation efficiency, which is demonstrated in a simulation study. The proposed method is applied to data from 159 weather stations in China to obtain the generalized quantile curves of the volatility of the temperature at these stations. © 2013 Springer Science+Business Media New York.
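
    The "iterative least asymmetrically weighted squares" idea can be illustrated in its simplest form, a single expectile of a plain sample with no covariates, splines, or principal components. This is a sketch under that simplification, not the joint functional estimator of the paper; the function name `expectile` and the sample are hypothetical.

```python
def expectile(values, tau, tol=1e-12, max_iter=1000):
    """tau-expectile of a sample by iteratively reweighted least squares.

    Minimizes sum_i w_i * (y_i - m)**2 with w_i = tau if y_i > m else 1 - tau.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    m = sum(values) / len(values)  # tau = 0.5 gives the ordinary mean
    for _ in range(max_iter):
        weights = [tau if y > m else 1.0 - tau for y in values]
        new_m = sum(w * y for w, y in zip(weights, values)) / sum(weights)
        if abs(new_m - m) < tol:
            return new_m
        m = new_m
    return m

# Hypothetical sample with one large value in the upper tail.
sample = [1.0, 2.0, 3.0, 4.0, 10.0]
center = expectile(sample, 0.5)
upper = expectile(sample, 0.9)
lower = expectile(sample, 0.1)
```

    Each pass is an ordinary weighted mean; the asymmetry enters only through the weights, which is why values of tau near 1 pull the estimate toward the upper tail.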

  9. Neighborhood social capital and crime victimization: comparison of spatial regression analysis and hierarchical regression analysis.

    Science.gov (United States)

    Takagi, Daisuke; Ikeda, Ken'ichi; Kawachi, Ichiro

    2012-11-01

    Crime is an important determinant of public health outcomes, including quality of life, mental well-being, and health behavior. A body of research has documented the association between community social capital and crime victimization. The association between social capital and crime victimization has been examined at multiple levels of spatial aggregation, ranging from entire countries, to states, metropolitan areas, counties, and neighborhoods. In multilevel analysis, the spatial boundaries at level 2 are most often drawn from administrative boundaries (e.g., Census tracts in the U.S.). One problem with adopting administrative definitions of neighborhoods is that it ignores spatial spillover. We conducted a study of social capital and crime victimization in one ward of Tokyo city, using a spatial Durbin model with an inverse-distance weighting matrix that assigned each respondent a unique level of "exposure" to social capital based on all other residents' perceptions. The study is based on a postal questionnaire sent to 20-69 years old residents of Arakawa Ward, Tokyo. The response rate was 43.7%. We examined the contextual influence of generalized trust, perceptions of reciprocity, two types of social network variables, as well as two principal components of social capital (constructed from the above four variables). Our outcome measure was self-reported crime victimization in the last five years. In the spatial Durbin model, we found that neighborhood generalized trust, reciprocity, supportive networks and two principal components of social capital were each inversely associated with crime victimization. By contrast, a multilevel regression performed with the same data (using administrative neighborhood boundaries) found generally null associations between neighborhood social capital and crime. Spatial regression methods may be more appropriate for investigating the contextual influence of social capital in homogeneous cultural settings such as Japan.

  10. Multiple Regression Analysis on Milk Yield and Milk Composition of Dairy Cow

    Institute of Scientific and Technical Information of China (English)

    张巧娥; 吴学荣; 马水鱼; 邢燕

    2011-01-01

    Twenty lactating Holstein cows of the same parity and at similar stages of lactation were selected. Multiple regression analysis between milk yield and milk protein percentage, milk fat percentage, dry matter content, somatic cell count, and milk urea nitrogen was carried out with SAS 8.2. According to single-index regression analysis between milk yield and the milk components, milk yield was significantly negatively correlated with milk fat percentage, somatic cell count, and dry matter content, whereas it had no significant correlation with milk protein percentage or milk urea nitrogen. According to multiple regression analysis, the effects of milk protein percentage, milk fat percentage, and dry matter content on milk yield were larger than those of somatic cell count and milk urea nitrogen; milk protein percentage, milk fat percentage, somatic cell count, and milk urea nitrogen were inversely related to milk yield.

  11. Forecasting Gold Prices Using Multiple Linear Regression Method

    Directory of Open Access Journals (Sweden)

    Z. Ismail

    2009-01-01

    Problem statement: Forecasting is a management function that assists decision making. It is also described as the process of estimating unknown future situations. More generally it is known as prediction, which refers to the estimation of time series or longitudinal data. Gold is a precious yellow commodity once used as money. It was made illegal in the USA 41 years ago, but is now once again accepted as a potential currency. The demand for this commodity is on the rise. Approach: The objective of this study was to develop a forecasting model for predicting gold prices based on economic factors such as inflation, currency price movements and others. Following the melt-down of the US dollar, investors are putting their money into gold because gold plays an important role as a stabilizing influence for investment portfolios. Due to the increase in demand for gold in Malaysia and other parts of the world, it is necessary to develop a model that reflects the structure and pattern of the gold market and forecasts movements of the gold price. The most appropriate approach to the understanding of gold prices is the Multiple Linear Regression (MLR) model. MLR is a study of the relationship between a single dependent variable and one or more independent variables, in this case with gold price as the single dependent variable. The fitted MLR model will be used to predict future gold prices. A naive model known as “forecast-1” was considered as a benchmark model in order to evaluate the performance of the model. Results: Many factors determine the price of gold and, based on “a hunch of experts”, several economic factors had been identified as influencing gold prices. Variables such as the Commodity Research Bureau future index (CRB); USD/Euro Foreign Exchange Rate (EUROUSD); Inflation rate (INF); Money Supply (M1); New York Stock Exchange (NYSE); Standard and Poor 500 (SPX); Treasury Bill (T-BILL) and US Dollar index (USDX) were considered to

  12. Regression Discontinuity Designs with Multiple Rating-Score Variables

    Science.gov (United States)

    Reardon, Sean F.; Robinson, Joseph P.

    2012-01-01

    In the absence of a randomized control trial, regression discontinuity (RD) designs can produce plausible estimates of the treatment effect on an outcome for individuals near a cutoff score. In the standard RD design, individuals with rating scores higher than some exogenously determined cutoff score are assigned to one treatment condition; those…

  13. The Meta-analysis of Multiple Regression and Age Shifting Algorithm in the Aging Population Research

    Institute of Scientific and Technical Information of China (English)

    吴启凡

    2015-01-01

    Population aging in China is becoming increasingly evident, yet current modeling of population aging remains problematic. When multiple regression is used alone to study the aging of China's population, multicollinearity has to be considered. Avoiding it requires careful variable selection, but stepwise regression can then discard partially correlated disturbance terms that may have a significant effect, and forecasting with a regression model alone produces large errors over long time series. Combining the regression with the age-shifting algorithm, which forecasts each regression factor individually and in detail before the regression equation is applied for the aggregate calculation, substantially improves forecast accuracy. This paper studies factors such as the male, female, urban, and rural populations dynamically. Influencing factors are first screened by correlation analysis, and multiple linear regression is then used to find the quantitative relationship between population aging and the related factors of population structure. The stepwise regression happened to produce a partially correlated disturbance term that could not pass the test, so two standardization methods combined with the Mann-Whitney U test were used for verification. Finally, the age-shifting model and the regression matrix are used to forecast the trend of population aging, and the forecasts are analyzed and evaluated.

  14. An Effect Size for Regression Predictors in Meta-Analysis

    Science.gov (United States)

    Aloe, Ariel M.; Becker, Betsy Jane

    2012-01-01

    A new effect size representing the predictive power of an independent variable from a multiple regression model is presented. The index, denoted as r[subscript sp], is the semipartial correlation of the predictor with the outcome of interest. This effect size can be computed when multiple predictor variables are included in the regression model…
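
    For the special case of two predictors, the semipartial correlation can be computed directly from the three pairwise correlations. The sketch below uses the standard textbook formula for that case; it is not the meta-analytic estimator of the article, and the correlation values are hypothetical.

```python
import math

def semipartial_r(r_y1, r_y2, r_12):
    """Semipartial correlation of predictor 1 with y, with predictor 2 removed from predictor 1."""
    return (r_y1 - r_y2 * r_12) / math.sqrt(1.0 - r_12 ** 2)

def r_squared_two_predictors(r_y1, r_y2, r_12):
    """R^2 of the regression of y on both predictors, from the three correlations."""
    return (r_y1 ** 2 + r_y2 ** 2 - 2.0 * r_y1 * r_y2 * r_12) / (1.0 - r_12 ** 2)

# Hypothetical correlations, chosen only for illustration.
r_y1, r_y2, r_12 = 0.5, 0.4, 0.3
r_sp = semipartial_r(r_y1, r_y2, r_12)
# Increase in R^2 when predictor 1 is added to a model that already has predictor 2.
gain = r_squared_two_predictors(r_y1, r_y2, r_12) - r_y2 ** 2
```

    The squared semipartial correlation equals the increase in R^2 obtained by adding the predictor to the model, which is what makes it interpretable as the predictor's unique contribution.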

  15. Multiple linear regression analysis of X-ray measurement and WOMAC scores of knee osteoarthritis

    Institute of Scientific and Technical Information of China (English)

    马玉峰; 阿迪力江; 董士宇; 吴忌; 王庆甫; 陈兆军; 杜春林; 李俊海; 黄沪; 时宗庭; 殷岳杉; 张雷

    2012-01-01

    Objective: To perform multiple linear regression analysis of X-ray measurements and WOMAC scores in knee osteoarthritis, and to analyze their relationship in clinical and biomechanical terms. Methods: From March 2011 to July 2011, 140 patients (250 knees) were reviewed, including 132 left knees and 118 right knees; patients ranged in age from 40 to 71 years, with an average of 54.68 years. The MB-RULER measurement software was used to measure the femoral angle, tibial angle, femorotibial angle, and joint gap angle on antero-posterior (AP) and lateral X-rays. WOMAC scores were also collected. Multiple linear regression equations were then fitted to analyze the correlation between the X-ray measurements and the WOMAC scores. Results: The regression equation relating the AP X-ray values to the WOMAC scores was statistically significant (P<0.05), whereas the regression equation relating the lateral X-ray values to the WOMAC scores was not (P>0.05). Conclusion: ① X-ray measurement of the knee joint can reflect the WOMAC scores to a certain extent. ② It is necessary to measure the mechanical axis of the knee on X-ray, which is important for the diagnosis and treatment of osteoarthritis. ③ The correlation between the tibial angle and joint gap angle on the antero-posterior X-ray and the WOMAC scores is significant and can be used to assess the functional recovery of patients before and after treatment.

  16. Multiple predictor smoothing methods for sensitivity analysis.

    Energy Technology Data Exchange (ETDEWEB)

    Helton, Jon Craig; Storlie, Curtis B.

    2006-08-01

    The use of multiple predictor smoothing methods in sampling-based sensitivity analyses of complex models is investigated. Specifically, sensitivity analysis procedures based on smoothing methods employing the stepwise application of the following nonparametric regression techniques are described: (1) locally weighted regression (LOESS), (2) additive models, (3) projection pursuit regression, and (4) recursive partitioning regression. The indicated procedures are illustrated with both simple test problems and results from a performance assessment for a radioactive waste disposal facility (i.e., the Waste Isolation Pilot Plant). As shown by the example illustrations, the use of smoothing procedures based on nonparametric regression techniques can yield more informative sensitivity analysis results than can be obtained with more traditional sensitivity analysis procedures based on linear regression, rank regression or quadratic regression when nonlinear relationships between model inputs and model predictions are present.

  17. Research on the multiple linear regression in non-invasive blood glucose measurement.

    Science.gov (United States)

    Zhu, Jianming; Chen, Zhencheng

    2015-01-01

    A non-invasive blood glucose measurement sensor and the data process algorithm based on the metabolic energy conservation (MEC) method are presented in this paper. The physiological parameters of human fingertip can be measured by various sensing modalities, and blood glucose value can be evaluated with the physiological parameters by the multiple linear regression analysis. Five methods such as enter, remove, forward, backward and stepwise in multiple linear regression were compared, and the backward method had the best performance. The best correlation coefficient was 0.876 with the standard error of the estimate 0.534, and the significance was 0.012 (sig. regression equation was valid. The Clarke error grid analysis was performed to compare the MEC method with the hexokinase method, using 200 data points. The correlation coefficient R was 0.867 and all of the points were located in Zone A and Zone B, which shows the MEC method provides a feasible and valid way for non-invasive blood glucose measurement.

  18. Multiple linear combination (MLC) regression tests for common variants adapted to linkage disequilibrium structure

    Science.gov (United States)

    Yoo, Yun Joo; Sun, Lei; Poirier, Julia G.; Paterson, Andrew D.

    2016-01-01

    By jointly analyzing multiple variants within a gene, instead of one at a time, gene-based multiple regression can improve power, robustness, and interpretation in genetic association analysis. We investigate multiple linear combination (MLC) test statistics for analysis of common variants under realistic trait models with linkage disequilibrium (LD) based on HapMap Asian haplotypes. MLC is a directional test that exploits LD structure in a gene to construct clusters of closely correlated variants recoded such that the majority of pairwise correlations are positive. It combines variant effects within the same cluster linearly, and aggregates cluster-specific effects in a quadratic sum of squares and cross-products, producing a test statistic with reduced degrees of freedom (df) equal to the number of clusters. By simulation studies of 1000 genes from across the genome, we demonstrate that MLC is a well-powered and robust choice among existing methods across a broad range of gene structures. Compared to minimum P-value, variance-component, and principal-component methods, the mean power of MLC is never much lower than that of other methods, and can be higher, particularly with multiple causal variants. Moreover, the variation in gene-specific MLC test size and power across 1000 genes is less than that of other methods, suggesting it is a complementary approach for discovery in genome-wide analysis. The cluster construction of the MLC test statistics helps reveal within-gene LD structure, allowing interpretation of clustered variants as haplotypic effects, while multiple regression helps to distinguish direct and indirect associations. PMID:27885705

  19. On empirical analysis of factors affecting commercial house prices based on multiple linear regression

    Institute of Scientific and Technical Information of China (English)

    刘枬; 梁晨

    2014-01-01

    Drawing on Keynesian theory and real-estate bubble theory, the paper takes statistical yearbook data for 2000-2012 as its sample and selects three factors that influence house prices: per capita annual income in the current year, newly added housing area, and commercial house prices in the previous year. It measures their influence on house prices using correlation analysis and multiple linear regression, identifies the main factors behind house price fluctuations, and offers suggestions for controlling house prices.

  20. Solution to problems concerning AFSA-based multiple linear regression analysis

    Institute of Scientific and Technical Information of China (English)

    李媛

    2011-01-01

    The artificial fish swarm algorithm (AFSA) is a new intelligent bionic algorithm, built on the characteristics of fish activity, that follows an autonomous-agent optimization model based on animal behavior. The basic principles of AFSA are briefly introduced, and the steps and results of using AFSA to solve multiple linear regression analysis problems are described. Simulation experiments show that AFSA is a simple and efficient algorithm for handling multiple linear regression analysis problems.

  1. MULTIPLE LOGISTIC REGRESSION MODEL TO PREDICT RISK FACTORS OF ORAL HEALTH DISEASES

    Directory of Open Access Journals (Sweden)

    Parameshwar V. Pandit

    2012-06-01

    Purpose: To analyse the dependence of oral health diseases, i.e. dental caries and periodontal disease, on a number of risk factors through the application of a logistic regression model. Method: The cross-sectional study involves a systematic random sample of 1760 subjects with permanent dentition, aged 18-40 years, in Dharwad, Karnataka, India. Dharwad is situated in North Karnataka. The mean age was 34.26±7.28. The risk factors of dental caries and periodontal disease were established by a multiple logistic regression model using SPSS statistical software. Results: Factors such as frequency of brushing, timing of cleaning teeth and type of toothpaste are significant persistent predictors of dental caries and periodontal disease. For dental caries, the log likelihood of the full model is –1013.1364 and Akaike’s Information Criterion (AIC) is 1.1752, compared with -1019.8106 and 1.1748, respectively, for the reduced regression model. For periodontal disease, the log likelihood of the full model is –1085.7876 and the AIC is 1.2577, compared with -1019.8106 and 1.1748, respectively, for the reduced regression model. The area under the Receiver Operating Characteristic (ROC) curve for dental caries is 0.7509 (full model) and 0.7447 (reduced model); for periodontal disease it is 0.6128 (full model) and 0.5821 (reduced model). Conclusions: The frequency of brushing, timing of cleaning teeth and type of toothpaste are the main significant risk factors of dental caries and periodontal disease. The reduced logistic regression model fits slightly better than the full logistic regression model in identifying these risk factors for both dichotomous dental caries and periodontal disease.

  2. Multiple linear regression with correlations among the predictor variables. Theory and computer algorithm ridge (FORTRAN 77)

    Science.gov (United States)

    van Gaans, P. F. M.; Vriend, S. P.

    Application of ridge regression in geoscience usually is a more appropriate technique than ordinary least-squares regression, especially in the situation of highly intercorrelated predictor variables. A FORTRAN 77 program RIDGE for ridged multiple linear regression is presented. The theory of linear regression and ridge regression is treated, to allow for a careful interpretation of the results and to understand the structure of the program. The program gives various parameters to evaluate the extent of multicollinearity within a given regression problem, such as the correlation matrix, multiple correlations among the predictors, variance inflation factors, eigenvalues, condition number, and the determinant of the predictors correlation matrix. The best method for the optimum choice of the ridge parameter with ridge regression has not been established yet. Estimates of the ridge bias, ridged variance inflation factors, estimates, and norms for the ridge parameter therefore are given as output by RIDGE and should complement inspection of the ridge traces. Application within the earth sciences is discussed.
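
    Two of the quantities named in this abstract, the variance inflation factor and the ridge estimate, can be sketched in a few lines. This is not the RIDGE program (which is FORTRAN 77 and is not reproduced in the record); the helper names, the ridge parameter values, and the data are assumptions. Predictors are centered and scaled to unit length so that X'X is their correlation matrix R, and the ridge coefficients solve (R + kI) b = X'y.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def standardize(col):
    """Center a column and scale it to unit length."""
    mean = sum(col) / len(col)
    centered = [v - mean for v in col]
    norm = sum(v * v for v in centered) ** 0.5
    return [v / norm for v in centered]

def ridge(columns, y, k):
    """Ridge coefficients on standardized predictors and centered y."""
    z = [standardize(c) for c in columns]
    mean_y = sum(y) / len(y)
    yc = [v - mean_y for v in y]
    p = len(z)
    gram = [[sum(a * b for a, b in zip(z[i], z[j])) + (k if i == j else 0.0)
             for j in range(p)] for i in range(p)]
    rhs = [sum(a * b for a, b in zip(z[i], yc)) for i in range(p)]
    return solve(gram, rhs)

def vif_two_predictors(x1, x2):
    """Variance inflation factor for exactly two predictors: 1 / (1 - r^2)."""
    z1, z2 = standardize(x1), standardize(x2)
    r = sum(a * b for a, b in zip(z1, z2))
    return 1.0 / (1.0 - r * r)

# Hypothetical data with two nearly collinear predictors.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]
y = [2.0, 4.1, 6.2, 7.9, 10.1, 12.0]
vif = vif_two_predictors(x1, x2)
ols = ridge([x1, x2], y, k=0.0)     # k = 0 reduces to ordinary least squares
shrunk = ridge([x1, x2], y, k=0.1)  # k > 0 shrinks the coefficient vector
```

    Recomputing `ridge` over a grid of k values and plotting each coefficient against k gives the ridge trace that the abstract recommends inspecting.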

  3. Using Dominance Analysis to Determine Predictor Importance in Logistic Regression

    Science.gov (United States)

    Azen, Razia; Traxel, Nicole

    2009-01-01

    This article proposes an extension of dominance analysis that allows researchers to determine the relative importance of predictors in logistic regression models. Criteria for choosing logistic regression R[superscript 2] analogues were determined and measures were selected that can be used to perform dominance analysis in logistic regression. A…

  4. A critical assessment of shrinkage-based regression approaches for estimating the adverse health effects of multiple air pollutants

    Science.gov (United States)

    Roberts, Steven; Martin, Michael

    Most investigations of the adverse health effects of multiple air pollutants analyse the time series involved by simultaneously entering the multiple pollutants into a Poisson log-linear model. Concerns have been raised about this type of analysis, and it has been stated that new methodology or models should be developed for investigating the adverse health effects of multiple air pollutants. In this paper, we introduce the use of the lasso for this purpose and compare its statistical properties to those of ridge regression and the Poisson log-linear model. Ridge regression has been used in time series analyses on the adverse health effects of multiple air pollutants but its properties for this purpose have not been investigated. A series of simulation studies was used to compare the performance of the lasso, ridge regression, and the Poisson log-linear model. In these simulations, realistic mortality time series were generated with known air pollution mortality effects permitting the performance of the three models to be compared. Both the lasso and ridge regression produced more accurate estimates of the adverse health effects of the multiple air pollutants than those produced using the Poisson log-linear model. This increase in accuracy came at the expense of increased bias. Ridge regression produced more accurate estimates than the lasso, but the lasso produced more interpretable models. The lasso and ridge regression offer a flexible way of obtaining more accurate estimation of pollutant effects than that provided by the standard Poisson log-linear model.

  5. Problems of correlations between explanatory variables in multiple regression analyses in the dental literature.

    Science.gov (United States)

    Tu, Y-K; Kellett, M; Clerehugh, V; Gilthorpe, M S

    2005-10-01

    Multivariable analysis is a widely used statistical methodology for investigating associations amongst clinical variables. However, the problems of collinearity and multicollinearity, which can give rise to spurious results, have in the past frequently been disregarded in dental research. This article illustrates and explains the problems which may be encountered, in the hope of increasing awareness and understanding of these issues, thereby improving the quality of the statistical analyses undertaken in dental research. Three examples from different clinical dental specialties are used to demonstrate how to diagnose the problem of collinearity/multicollinearity in multiple regression analyses and to illustrate how collinearity/multicollinearity can seriously distort the model development process. Lack of awareness of these problems can give rise to misleading results and erroneous interpretations. Multivariable analysis is a useful tool for dental research, though only if its users thoroughly understand the assumptions and limitations of these methods. It would benefit evidence-based dentistry enormously if researchers were more aware of both the complexities involved in multiple regression when using these methods and of the need for expert statistical consultation in developing study design and selecting appropriate statistical methodologies.

  6. Factors Influencing Marginal Fit of Crowns after Cementation: A Multiple Linear Regression Analysis.

    Institute of Scientific and Technical Information of China (English)

    章少萍; 马守治; 陈熙; 童新文; 李秀容; 张维文

    2011-01-01

    evaluated before and after cementation. Multiple linear regression analysis was used to determine whether the independent variables mentioned above had an impact on the MDAC. Results: Marginal discrepancies increased significantly after cementation. The backward multiple regression analysis showed that the FLP, TAAWP, HP, MDBC, and PLRC were jointly predictive of the MDAC. Conclusion: The FLP and MDBC may have a weak influence on the MDAC, while the TAAWP, HP and PLRC affect the MDAC more significantly.

  7. Tightness of M-estimators for multiple linear regression in time series

    DEFF Research Database (Denmark)

    Johansen, Søren; Nielsen, Bent

    We show tightness of a general M-estimator for multiple linear regression in time series. The positive criterion function for the M-estimator is assumed lower semi-continuous and sufficiently large for large argument: Particular cases are the Huber-skip and quantile regression. Tightness requires...

  8. Multiple regression technique for Pth degree polynomials with and without linear cross products

    Science.gov (United States)

    Davis, J. W.

    1973-01-01

    A multiple regression technique was developed by which the nonlinear behavior of specified independent variables can be related to a given dependent variable. The polynomial expression can be of Pth degree and can incorporate N independent variables. Two cases are treated such that mathematical models can be studied both with and without linear cross products. The resulting surface fits can be used to summarize trends for a given phenomenon and provide a mathematical relationship for subsequent analysis. To implement this technique, separate computer programs were developed for the case without linear cross products and for the case incorporating such cross products which evaluate the various constants in the model regression equation. In addition, the significance of the estimated regression equation is considered and the standard deviation, the F statistic, the maximum absolute percent error, and the average of the absolute values of the percent of error evaluated. The computer programs and their manner of utilization are described. Sample problems are included to illustrate the use and capability of the technique which show the output formats and typical plots comparing computer results to each set of input data.
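
    The two model cases can be pictured as two ways of building a design-matrix row. The record does not define "linear cross products" precisely, so the sketch below assumes they are the pairwise products x_i * x_j of the independent variables; the function `design_row` is hypothetical and is not taken from the report's programs.

```python
from itertools import combinations

def design_row(x, degree, cross_products=False):
    """Build one design-matrix row for a polynomial model in the variables x.

    Always includes the constant and the powers x_i**1 ... x_i**degree.
    With cross_products=True it also appends the pairwise products x_i * x_j
    (i < j); this reading of "linear cross products" is an assumption.
    """
    row = [1.0]
    for value in x:
        row.extend(value ** power for power in range(1, degree + 1))
    if cross_products:
        row.extend(a * b for a, b in combinations(x, 2))
    return row

# Two independent variables (N = 2), second-degree polynomial (P = 2).
without = design_row([2.0, 3.0], degree=2)
with_cp = design_row([2.0, 3.0], degree=2, cross_products=True)
```

    Stacking one such row per observation gives a matrix that is linear in the unknown constants, so the fit itself remains an ordinary multiple linear regression.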

  9. A note on the use of multiple linear regression in molecular ecology.

    Science.gov (United States)

    Frasier, Timothy R

    2016-03-01

    Multiple linear regression analyses (also often referred to as generalized linear models--GLMs, or generalized linear mixed models--GLMMs) are widely used in the analysis of data in molecular ecology, often to assess the relative effects of genetic characteristics on individual fitness or traits, or how environmental characteristics influence patterns of genetic differentiation. However, the coefficients resulting from multiple regression analyses are sometimes misinterpreted, which can lead to incorrect interpretations and conclusions within individual studies, and can propagate to wider-spread errors in the general understanding of a topic. The primary issue revolves around the interpretation of coefficients for independent variables when interaction terms are also included in the analyses. In this scenario, the coefficients associated with each independent variable are often interpreted as the independent effect of each predictor variable on the predicted variable. However, this interpretation is incorrect. The correct interpretation is that these coefficients represent the effect of each predictor variable on the predicted variable when all other predictor variables are zero. This difference may sound subtle, but the ramifications cannot be overstated. Here, my goals are to raise awareness of this issue, to demonstrate and emphasize the problems that can result and to provide alternative approaches for obtaining the desired information.
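
    The point about interaction terms can be made concrete with a small worked example. In the model y = b0 + b1*x1 + b2*x2 + b3*x1*x2, the slope with respect to x1 is b1 + b3*x2, so b1 alone is the slope only where x2 = 0. The coefficients below are hypothetical, and re-centering x2 is shown as one common remedy, not necessarily the alternative the article proposes.

```python
def predict(coefs, x1, x2):
    """Evaluate y = b0 + b1*x1 + b2*x2 + b3*x1*x2."""
    b0, b1, b2, b3 = coefs
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2

def effect_of_x1(b1, b3, x2):
    """Slope of y with respect to x1 at a given value of x2."""
    return b1 + b3 * x2

def recenter_x2(b0, b1, b2, b3, c):
    """Coefficients of the same model rewritten in terms of (x2 - c)."""
    return b0 + b2 * c, b1 + b3 * c, b2, b3

# Hypothetical coefficients, for illustration only.
b0, b1, b2, b3 = 1.0, 2.0, 3.0, 0.5
at_zero = effect_of_x1(b1, b3, 0.0)   # equals b1
at_four = effect_of_x1(b1, b3, 4.0)   # differs from b1 because b3 != 0
centered = recenter_x2(b0, b1, b2, b3, 4.0)
```

    After re-centering x2 at 4, the coefficient on x1 becomes the slope at x2 = 4, which shows that this coefficient depends on where zero sits on the other predictor's scale.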

  10. Epistasis analysis for quantitative traits by functional regression model.

    Science.gov (United States)

    Zhang, Futao; Boerwinkle, Eric; Xiong, Momiao

    2014-06-01

    The critical barrier in interaction analysis for rare variants is that most traditional statistical methods for testing interactions were originally designed for testing the interaction between common variants and are difficult to apply to rare variants because of their prohibitive computational time and poor ability. The great challenges for successful detection of interactions with next-generation sequencing (NGS) data are (1) lack of methods for interaction analysis with rare variants, (2) severe multiple testing, and (3) time-consuming computations. To meet these challenges, we shift the paradigm of interaction analysis between two loci to interaction analysis between two sets of loci or genomic regions and collectively test interactions between all possible pairs of SNPs within two genomic regions. In other words, we take a genome region as a basic unit of interaction analysis and use high-dimensional data reduction and functional data analysis techniques to develop a novel functional regression model to collectively test interactions between all possible pairs of single nucleotide polymorphisms (SNPs) within two genome regions. By intensive simulations, we demonstrate that the functional regression models for interaction analysis of the quantitative trait have the correct type 1 error rates and a much better ability to detect interactions than the current pairwise interaction analysis. The proposed method was applied to exome sequence data from the NHLBI's Exome Sequencing Project (ESP) and CHARGE-S study. We discovered 27 pairs of genes showing significant interactions after applying the Bonferroni correction (P-values < 4.58 × 10^-10) in the ESP, and 11 were replicated in the CHARGE-S study.

  11. The comparison between several robust ridge regression estimators in the presence of multicollinearity and multiple outliers

    Science.gov (United States)

    Zahari, Siti Meriam; Ramli, Norazan Mohamed; Moktar, Balkiah; Zainol, Mohammad Said

    2014-09-01

    In the presence of multicollinearity and multiple outliers, statistical inference for a linear regression model using ordinary least squares (OLS) estimators is severely affected and produces misleading results. To overcome this, many approaches have been investigated. These include robust methods, which have been reported to be less sensitive to the presence of outliers. In addition, the ridge regression technique has been employed to tackle the multicollinearity problem. In order to mitigate both problems, a combination of ridge regression and robust methods is discussed in this study. The superiority of this approach was examined when multicollinearity and multiple outliers occurred simultaneously in multiple linear regression. This study looked at the performance of several well-known estimators (M, MM, and RIDGE) and of robust ridge regression estimators, namely the Weighted Ridge M-estimator (WRM), Weighted Ridge MM (WRMM), and Ridge MM (RMM), in such a situation. The results showed that in the presence of simultaneous multicollinearity and multiple outliers (in both the x- and y-directions), RMM and RIDGE are more or less similar in terms of superiority over the other estimators, regardless of the number of observations, level of collinearity and percentage of outliers used. However, when outliers occurred in only a single direction (the y-direction), the WRMM estimator was the best among the robust ridge regression estimators, producing the least variance. In conclusion, robust ridge regression is the best alternative compared to robust and conventional least squares estimators when dealing with the simultaneous presence of multicollinearity and outliers.
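
    As a rough illustration of the ridge component only, the sketch below compares OLS and plain ridge estimates on two nearly collinear predictors. It does not implement the robust M, MM, WRM, WRMM, or RMM estimators studied in the paper; the data, the ridge constant k = 1.0, and the function name `ridge_coefficients` are assumptions made for this example.

```python
import numpy as np

def ridge_coefficients(X, y, k):
    """Ridge estimate (X'X + k*I)^(-1) X'y; k = 0 reduces to ordinary least squares."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

# Hypothetical data with two nearly collinear predictors.
rng = np.random.default_rng(1)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)          # almost a copy of x1
X = np.column_stack([x1, x2])
X = X - X.mean(axis=0)                       # center so no intercept is needed
y = x1 + x2 + rng.normal(scale=0.5, size=n)
y = y - y.mean()

beta_ols = ridge_coefficients(X, y, 0.0)     # unstable under collinearity
beta_ridge = ridge_coefficients(X, y, 1.0)   # shrunk toward zero

print("OLS coefficients:  ", beta_ols)
print("Ridge coefficients:", beta_ridge)
```

    The ridge constant trades a small bias for lower variance; outliers would still distort this plain ridge fit, which is the motivation for combining it with robust estimators.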

  12. Stability Analysis for Regularized Least Squares Regression

    OpenAIRE

    Rudin, Cynthia

    2005-01-01

    We discuss stability for a class of learning algorithms with respect to noisy labels. The algorithms we consider are for regression, and they involve the minimization of regularized risk functionals, such as L(f) := 1/N sum_i (f(x_i)-y_i)^2+ lambda ||f||_H^2. We shall call the algorithm `stable' if, when y_i is a noisy version of f*(x_i) for some function f* in H, the output of the algorithm converges to f* as the regularization term and noise simultaneously vanish. We consider two flavors of...

  13. Modeling the Philippines' real gross domestic product: A normal estimation equation for multiple linear regression

    Science.gov (United States)

    Urrutia, Jackie D.; Tampis, Razzcelle L.; Mercado, Joseph; Baygan, Aaron Vito M.; Baccay, Edcon B.

    2016-02-01

    The objective of this research is to formulate a mathematical model for the Philippines' Real Gross Domestic Product (Real GDP). The following factors are considered: Consumers' Spending (x1), Government's Spending (x2), Capital Formation (x3) and Imports (x4) as the Independent Variables that can influence the Real GDP in the Philippines (y). The researchers used a Normal Estimation Equation using Matrices to create the model for Real GDP and used α = 0.01. The researchers analyzed quarterly data from 1990 to 2013. The data were acquired from the National Statistical Coordination Board (NSCB), resulting in a total of 96 observations for each variable. The data, particularly the Dependent Variable (y), underwent a logarithmic transformation to satisfy all the assumptions of the Multiple Linear Regression Analysis. The mathematical model for Real GDP was formulated using Matrices through MATLAB. Based on the results, only three of the Independent Variables are significant to the Dependent Variable, namely Consumers' Spending (x1), Capital Formation (x3) and Imports (x4), and hence can predict Real GDP (y). The coefficient of determination is 98.7%, i.e., the Independent Variables explain 98.7% of the variation in the Dependent Variable. With 97.6% of the result in Paired T-Test, the Predicted Values obtained from the model showed no significant difference from the Actual Values of Real GDP. This research will be essential in appraising the forthcoming changes to aid the Government in implementing policies for the development of the economy.
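
    The matrix form of the normal equations mentioned above can be sketched as follows. The study used MATLAB and NSCB data; this sketch uses Python with hypothetical, noise-free numbers, and the function name `normal_equation_fit` and the coefficient values are assumptions made for illustration only.

```python
import numpy as np

def normal_equation_fit(X, y):
    """Solve the normal equations (X'X) b = X'y for the coefficient vector b."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# Hypothetical stand-in for 96 quarterly observations of four predictors.
rng = np.random.default_rng(2)
n = 96
predictors = rng.uniform(1.0, 5.0, size=(n, 4))   # x1..x4 (made-up values)
X = np.column_stack([np.ones(n), predictors])      # prepend intercept column

# Made-up "true" coefficients; the response is modeled on a log scale,
# mirroring the log transformation of the dependent variable.
true_b = np.array([0.5, 0.2, 0.0, 0.1, -0.05])
log_y = X @ true_b                                  # noise-free for clarity

b_hat = normal_equation_fit(X, log_y)

# Coefficient of determination: 1 - SS_residual / SS_total.
residuals = log_y - X @ b_hat
r_squared = 1.0 - np.sum(residuals ** 2) / np.sum((log_y - log_y.mean()) ** 2)

print("estimated coefficients:", b_hat)
print("R-squared:", r_squared)
```

    For ill-conditioned design matrices, a least-squares solver based on QR or SVD is usually preferred over forming X'X explicitly.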

  14. Multiple Regression Prediction Model for Cutting Forces in Turning Carbon-Reinforced PEEK CF30

    Directory of Open Access Journals (Sweden)

    Francisco Mata

    2010-01-01

    Among the thermoplastic polymers available, polyetheretherketone reinforced with 30% carbon fibres (PEEK CF30) demonstrates a particularly good combination of strength, rigidity, and hardness, which proves ideal for industrial applications. Considering these properties and potential areas of application, it is necessary to investigate the machining of PEEK CF30. In this study, response surface methodology was applied to predict the cutting forces in turning operations using TiN-coated cutting tools under dry conditions, where the machining parameters are cutting speed ranges, feed rate, and depth of cut. For this study, the experiments were conducted using a full factorial design in the design of experiments (DOE) on a CNC turning machine. Based on statistical analysis, a multiple quadratic regression model for cutting forces was derived with a satisfactory R-squared correlation. This model proved to be highly performant for predicting cutting forces.

  15. Transformation of nitrogen dioxide into ozone and prediction of ozone concentrations using multiple linear regression techniques.

    Science.gov (United States)

    Ghazali, Nurul Adyani; Ramli, Nor Azam; Yahaya, Ahmad Shukri; Yusof, Noor Faizah Fitri M D; Sansuddin, Nurulilyana; Al Madhoun, Wesam Ahmed

    2010-06-01

    Analysis and forecasting of air quality parameters are important topics of atmospheric and environmental research today due to the health impact caused by air pollution. This study examines the transformation of nitrogen dioxide (NO2) into ozone (O3) in an urban environment using time series plots. Data on the concentration of environmental pollutants and meteorological variables were employed to predict the concentration of O3 in the atmosphere. The possibility of employing multiple linear regression models as a tool for prediction of O3 concentration was tested. Results indicated that the presence of NO2 and sunshine influences the concentration of O3 in Malaysia. The influence of the previous hour's ozone on the next hour's concentration was also demonstrated.

  16. SPECIFICS OF THE APPLICATIONS OF MULTIPLE REGRESSION MODEL IN THE ANALYSES OF THE EFFECTS OF GLOBAL FINANCIAL CRISES

    Directory of Open Access Journals (Sweden)

    Željko V. Račić

    2010-12-01

    This paper aims to present the specifics of the application of the multiple linear regression model. The economic (financial) crisis is analyzed in terms of gross domestic product, which is a function of the foreign trade balance (on one hand) and credit cards, i.e. the indebtedness of the population on this basis (on the other hand), in the USA from 1999 to 2008. We used the extended application model, which shows how the analyst should run the whole development process of a regression model. This process began with simple statistical features and the application of regression procedures, and ended with residual analysis, intended for the study of the compatibility of data and model settings. This paper also analyzes the values of some standard statistics used in the selection of an appropriate regression model. Testing of the model is carried out with the use of the Statistics PASW 17 program.

  17. Tools to support interpreting multiple regression in the face of multicollinearity.

    Science.gov (United States)

    Kraha, Amanda; Turner, Heather; Nimon, Kim; Zientek, Linda Reichwein; Henson, Robin K

    2012-01-01

    While multicollinearity may increase the difficulty of interpreting multiple regression (MR) results, it should not cause undue problems for the knowledgeable researcher. In the current paper, we argue that rather than using one technique to investigate regression results, researchers should consider multiple indices to understand the contributions that predictors make not only to a regression model, but to each other as well. Some of the techniques to interpret MR effects include, but are not limited to, correlation coefficients, beta weights, structure coefficients, all possible subsets regression, commonality coefficients, dominance weights, and relative importance weights. This article will review a set of techniques to interpret MR effects, identify the elements of the data on which the methods focus, and identify statistical software to support such analyses.

  18. Improving the accuracies of bathymetric models based on multiple regression for calibration (case study: Sarca River, Italy)

    Science.gov (United States)

    Niroumand-Jadidi, Milad; Vitti, Alfonso

    2016-10-01

    Optical imagery has the potential for extraction of spatially and temporally explicit bathymetric information in inland/coastal waters. Lyzenga's model and optimal band ratio analysis (OBRA) are the main bathymetric models, and both provide linear relations with water depths. The former model is sensitive and the latter is quite robust to substrate variability. Simple regression is the widely used approach for calibration of bathymetric models, whether Lyzenga's model or the OBRA model. In this research, multiple regression is examined for empirical calibration of the models in order to take advantage of all spectral channels of the imagery. This method is applied to both Lyzenga's model and the OBRA model for the bathymetry of a shallow Alpine river in Italy, using WorldView-2 (WV-2) and GeoEye images. In situ depths are recorded using RTK GPS in two reaches. One-half of the data is used for calibration of the models and the remaining half as independent check-points for accuracy assessment. In addition, a radiative transfer model is used to simulate a set of spectra in a range of depths, substrate types, and water column properties. The simulated spectra are convolved to the sensors' spectral bands for further bathymetric analysis. Investigating the simulated spectra, it is concluded that multiple regression improves the robustness of Lyzenga's model with respect to substrate variability. The improvements from the multiple regression approach are much more pronounced for Lyzenga's model than for the OBRA model. This is in line with findings from real imagery; for instance, multiple regression applied for calibration of Lyzenga's and OBRA models demonstrated, respectively, 22% and 9% higher determination coefficients (R²) as well as 3 cm and 1 cm better RMSEs compared to simple regression using the WV-2 image.

  19. On asymptotics of t-type regression estimation in multiple linear model

    Institute of Scientific and Technical Information of China (English)

    2004-01-01

    We consider a robust estimator (t-type regression estimator) of multiple linear regression model by maximizing marginal likelihood of a scaled t-type error t-distribution. The marginal likelihood can also be applied to the de-correlated response when the within-subject correlation can be consistently estimated from an initial estimate of the model based on the independent working assumption. This paper shows that such a t-type estimator is consistent.

  20. Multiple linear and principal component regressions for modelling ecotoxicity bioassay response.

    Science.gov (United States)

    Gomes, Ana I; Pires, José C M; Figueiredo, Sónia A; Boaventura, Rui A R

    2014-01-01

    The ecotoxicological response of the living organisms in an aquatic system depends on the physical, chemical and bacteriological variables, as well as the interactions between them. An important challenge to scientists is to understand the interaction and behaviour of factors involved in a multidimensional process such as the ecotoxicological response. With this aim, multiple linear regression (MLR) and principal component regression were applied to the ecotoxicity bioassay response of Chlorella vulgaris and Vibrio fischeri in water collected at seven sites of Leça river during five monitoring campaigns (February, May, June, August and September of 2006). The river water characterization included the analysis of 22 physicochemical and 3 microbiological parameters. The model that best fitted the data was MLR, which shows: (i) a negative correlation with dissolved organic carbon, zinc and manganese, and a positive one with turbidity and arsenic, regarding C. vulgaris toxic response; (ii) a negative correlation with conductivity and turbidity and a positive one with phosphorus, hardness, iron, mercury, arsenic and faecal coliforms, concerning V. fischeri toxic response. This integrated assessment may allow the evaluation of the effect of future pollution abatement measures over the water quality of Leça River.
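
    For readers unfamiliar with principal component regression, a minimal sketch is given below. It is not the authors' implementation: the data are hypothetical, and the function name `pcr_fit` and the choice of two retained components are assumptions. The response is regressed on the leading principal-component scores of the centered predictors, and the result is mapped back to coefficients on the original variables.

```python
import numpy as np

def pcr_fit(X, y, n_components):
    """Principal component regression; returns (intercept, coefficients on original X)."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    # Rows of Vt are the principal directions of the centered predictors.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    V_k = Vt[:n_components].T                  # loadings, shape (p, k)
    scores = Xc @ V_k                          # component scores, shape (n, k)
    # Scores are orthogonal, so each regression coefficient is a simple ratio.
    gamma = (scores.T @ (y - y_mean)) / (s[:n_components] ** 2)
    beta = V_k @ gamma                         # back to the original variables
    intercept = y_mean - x_mean @ beta
    return intercept, beta

# Hypothetical data: 30 samples, 4 predictors.
rng = np.random.default_rng(3)
X = rng.normal(size=(30, 4))
y = 3.0 + X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.1 * rng.normal(size=30)

intercept_2, beta_2 = pcr_fit(X, y, n_components=2)   # reduced model
intercept_4, beta_4 = pcr_fit(X, y, n_components=4)   # all components kept

print("PCR with 2 components:", intercept_2, beta_2)
print("PCR with 4 components:", intercept_4, beta_4)
```

    When every component is retained, the fit coincides with ordinary multiple linear regression; dropping low-variance components is what distinguishes the two approaches compared in the abstract.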

  1. Applying Least Absolute Shrinkage Selection Operator and Akaike Information Criterion Analysis to Find the Best Multiple Linear Regression Models between Climate Indices and Components of Cow’s Milk

    Directory of Open Access Journals (Sweden)

    Mohammad Reza Marami Milani

    2016-07-01

    This study focuses on multiple linear regression models relating six climate indices (temperature humidity index THI, environmental stress index ESI, equivalent temperature index ETI, heat load index HLI, modified HLI (HLI new), and respiratory rate predictor RRP) with three main components of cow's milk (yield, fat, and protein) for cows in Iran. The least absolute shrinkage selection operator (LASSO) and the Akaike information criterion (AIC) techniques are applied to select the best model for milk predictands with the smallest number of climate predictors. Uncertainty estimation is employed by applying bootstrapping through resampling. Cross validation is used to avoid over-fitting. Climatic parameters are calculated from the NASA-MERRA global atmospheric reanalysis. Milk data for the months from April to September, 2002 to 2010, are used. The best linear regression models are found in spring between milk yield as the predictand and THI, ESI, ETI, HLI, and RRP as predictors with p-value < 0.001 and R² (0.50, 0.49, respectively). In summer, milk yield with independent variables of THI, ETI, and ESI shows the highest relation (p-value < 0.001) with R² (0.69). For fat and protein the results are only marginal. This method is suggested for impact studies of climate variability/change in the agriculture and food science fields when short time series or data with large uncertainty are available.
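
    The AIC part of the model selection can be sketched with an exhaustive search over predictor subsets. This is an illustration under stated assumptions, not the authors' procedure: it omits the LASSO, bootstrapping, and cross-validation steps, uses hypothetical data, and the function names `gaussian_aic` and `best_subset_by_aic` are made up. The AIC is computed in its Gaussian least-squares form, n*ln(RSS/n) + 2k, up to an additive constant.

```python
import itertools
import numpy as np

def gaussian_aic(X, y):
    """AIC (up to a constant) of a least-squares fit: n*ln(RSS/n) + 2*k."""
    n = len(y)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ coef) ** 2))
    k = X.shape[1]                     # number of fitted coefficients
    return n * np.log(rss / n) + 2 * k

def best_subset_by_aic(predictors, y):
    """Try every subset of predictor columns; return (lowest AIC, column indices)."""
    n, p = predictors.shape
    best = (np.inf, ())
    for size in range(p + 1):
        for subset in itertools.combinations(range(p), size):
            columns = [np.ones(n)] + [predictors[:, j] for j in subset]
            aic = gaussian_aic(np.column_stack(columns), y)
            if aic < best[0]:
                best = (aic, subset)
    return best

# Hypothetical data: only predictors 0 and 2 actually drive the response.
rng = np.random.default_rng(4)
predictors = rng.normal(size=(100, 4))
y = 1.0 + 3.0 * predictors[:, 0] - 2.0 * predictors[:, 2] + 0.1 * rng.normal(size=100)

best_aic, best_subset = best_subset_by_aic(predictors, y)
print("selected predictor columns:", best_subset)
```

    Exhaustive search is feasible for six indices (64 subsets) but grows exponentially, which is one reason a shrinkage method such as LASSO is attractive for larger predictor sets.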

  2. Stepwise Logistic Regression Analysis of Risk Factors of Multiple Organ Dysfunction Syndrome in Elderly

    Institute of Scientific and Technical Information of China (English)

    谭清武; 李庆华

    2009-01-01

    Objective: To study the risk factors of multiple organ dysfunction syndrome in the elderly (MODSE). Methods: A retrospective study was conducted on data from 393 patients aged over 60 (retired military cadres of division rank or above stationed in the Shijiazhuang area) who were hospitalized for lung infection or developed lung infection in hospital from 2001 to 2006. According to whether the lung infection induced MODSE, the patients were divided into a MODSE group (n=169) and a non-MODSE group (n=224). Risk factors of statistical significance were first screened by single-factor analysis, and independent risk factors were then identified by stepwise logistic regression analysis. Results: Single-factor analysis showed that age, chronic obstructive pulmonary disease, chronic respiratory failure, pulmonary interstitial fibrosis, pulmonary heart disease, coronary heart disease, chronic cardiac insufficiency, cerebrovascular disease, cervical spondylosis, chronic hepatitis and cirrhosis, diabetes, hyperuricemia, chronic renal failure, malignant tumor, hemoglobin, albumin, urea nitrogen, creatinine and fasting blood glucose were risk factors of MODSE. Stepwise logistic regression analysis showed that chronic obstructive pulmonary disease, chronic respiratory failure, pulmonary fibrosis, chronic cardiac insufficiency, cerebrovascular disease, diabetes, chronic renal failure, low hemoglobin, low albumin, high urea nitrogen and high fasting blood glucose were independent risk factors of MODSE. Conclusion: Chronic obstructive pulmonary disease, chronic respiratory failure, pulmonary fibrosis, chronic cardiac insufficiency, cerebrovascular disease, diabetes, chronic renal failure, low hemoglobin, low albumin, high urea nitrogen and high fasting blood glucose were independent risk factors of MODSE.

  3. Multiple sources and multiple measures based traffic flow prediction using the chaos theory and support vector regression method

    Science.gov (United States)

    Cheng, Anyu; Jiang, Xiao; Li, Yongfu; Zhang, Chao; Zhu, Hao

    2017-01-01

    This study proposes a multiple sources and multiple measures based traffic flow prediction algorithm using the chaos theory and support vector regression method. In particular, first, the chaotic characteristics of traffic flow associated with the speed, occupancy, and flow are identified using the maximum Lyapunov exponent. Then, the phase space of multiple measures chaotic time series are reconstructed based on the phase space reconstruction theory and fused into a same multi-dimensional phase space using the Bayesian estimation theory. In addition, the support vector regression (SVR) model is designed to predict the traffic flow. Numerical experiments are performed using the data from multiple sources. The results show that, compared with the single measure, the proposed method has better performance for the short-term traffic flow prediction in terms of the accuracy and timeliness.

  4. Logistic regression analysis of risk factors of stress complications in patients with multiple trauma

    Institute of Scientific and Technical Information of China (English)

    徐文鹏; 应佑国; 方玉明; 秦宗和

    2014-01-01

    Objective: To investigate the risk factors of stress complications in patients with multiple trauma. Methods: Indicators related to risk factors for stress complications in 64 patients with multiple trauma were analyzed by univariate and multivariate logistic regression. The data collected included routine blood tests, C-reactive protein, blood glucose, blood electrolytes, blood gas analysis, and liver and kidney function, together with measurements of serum total triiodothyronine (TT3), serum free triiodothyronine (FT3), serum total thyroxine (TT4), serum free thyroxine, thyroid-stimulating hormone, adrenocorticotropic hormone (ACTH), blood cortisol (COR), growth hormone, the cytokines interleukin (IL)-1, IL-6, IL-8, and IL-10, and tumor necrosis factor (TNF-α). Results: Of the 64 patients, 48 developed stress complications such as stress ulcer and acute respiratory distress syndrome. Between patients with and without stress complications, the differences in Acute Physiology and Chronic Health Evaluation (APACHE) II score, COR, ACTH, TT3, FT3, TT4, IL-6, IL-8, IL-10, and TNF-α were all statistically significant (P<0.05 to P<0.01). Multivariate logistic analysis showed that APACHE II score, TT4, COR, IL-6, and IL-10 were independent risk factors for early stress complications (P<0.05 to P<0.01). Conclusion: In patients with multiple trauma, the APACHE II score and post-traumatic changes in TT4, COR, IL-6, and IL-10 are closely related to the occurrence of stress complications such as stress-related gastrointestinal bleeding and respiratory distress syndrome.

  5. Multiple Factors Logistic Regression Analysis on the Basic Syndromes Related Factors in Patients with Chronic Prostatitis

    Institute of Scientific and Technical Information of China (English)

    李兰群; 张强; 李海松; 郭军; 孙松; 邢建民; 周强; 谢春雨; 杨杰; 王彬

    2011-01-01

    Objective: To explore the factors related to the basic syndromes in patients with chronic prostatitis (CP). Methods: A questionnaire was designed, and consecutive CP cases were collected from the traditional Chinese medicine andrology departments of three hospitals in Beijing. A database was established with Epidata 3.02, and univariate and multivariate logistic regression analyses of the possible factors related to the basic syndromes were performed with SPSS 17.0 software. Results: Engagement in brainwork was the main risk factor for dampness-heat downward-flow syndrome; CP of type IIIA (classified by Western medicine), employment in brainwork or physical work, and working time ≤ 8 h were the main risk factors for qi-stagnancy and blood-stasis syndrome; illness duration > 12 months, uncomfortable habitat, not drinking irritative beverages, and poor digestive function were the main risk factors for Gan-qi stagnation syndrome; and increasing age, decreased work pressure, and winter onset of the illness were the main risk factors for Shen-yang deficiency syndrome. Conclusions: Age, illness duration, Western medicine classification, occupational type, working time, work pressure, season of onset, living comfort, indigestion, and consumption of irritative beverages are related to the basic syndromes of CP.

  6. Multiple Linear Regression Analysis of Quality of Life in Children with Cerebral Palsy

    Institute of Scientific and Technical Information of China (English)

    万瑞平; 刘振寰; 林青梅

    2011-01-01

    Objective: To analyze the factors influencing quality of life (QOL) in children with cerebral palsy (CP). Methods: Eighty children with CP (CP group) and 80 healthy children (healthy control group) were evaluated with the Pediatric Quality of Life Inventory Version 4 (PedsQL4.0) to assess their QOL, and the differences in QOL were compared between the 2 groups. Children with CP were also assessed using the Gesell Developmental Scale (GDS) and the Gross Motor Function Classification System (GMFCS) to determine their developmental quotient and severity, and the relationships between QOL and sex, family income, clinical type, GMFCS level, and intelligence capacity were analyzed by multiple regression analysis. Results: There were significant differences in physical function/aspect, emotional function, social function, psychological aspect and total QOL between the CP group and the healthy control group (all P < 0.01). Intelligence degree was positively correlated with the total QOL score. Severity degree and intelligence degree were positively correlated with the physical aspect, and age was negatively correlated with the physical aspect, with severity degree affecting the physical aspect most. Intelligence degree was positively correlated with the psychological aspect. Conclusions: QOL of children with CP was impaired across all domains. Intelligence capacity and physical function are important factors that influence the QOL of children with CP.

  7. Research on the impact factors of domestic old people's tourism consumption through multiple stepwise regression analysis

    Institute of Scientific and Technical Information of China (English)

    章杰宽

    2011-01-01

    As population aging in China becomes more and more pronounced, the senior tourism industry is rapidly becoming an important part of the tourism market. Experience and theory of tourism behavior have shown that travel frequency, residence time and amount of tourism consumption are the main indicators for measuring the attractiveness of a tourism destination. Over more than two months, the author carried out an empirical study through extensive interviews and questionnaires among elderly tourists at 12 main tourist attractions in Xi'an. Based on 800 questionnaires, this paper analyses the factors influencing the consumption behavior of domestic elderly tourists and employs multiple stepwise regression analysis to study the degree of influence of each factor. The results show that 13 main factors affect the travel behavior of older people: physical condition, income, attitude toward tourism, spouse, attitude of sons and daughters, related groups, tourism prices, distance, security, climatic conditions, food and accommodation, transport, and tourism attraction. Among these factors, income and tourism attraction are the common factors affecting elderly tourists' travel frequency, residence time, and amount of consumption per day, with income being the most critical. Specifically, elderly tourists' travel frequency is directly proportional to income, attitude toward tourism, attitude of sons and daughters, physical condition, and tourism attraction, and is inversely proportional to distance. The elderly tourists' residence...

  8. Least Squares Adjustment: Linear and Nonlinear Weighted Regression Analysis

    DEFF Research Database (Denmark)

    Nielsen, Allan Aasbjerg

    2007-01-01

    This note primarily describes the mathematics of least squares regression analysis as it is often used in geodesy including land surveying and satellite positioning applications. In these fields regression is often termed adjustment. The note also contains a couple of typical land surveying...... and satellite positioning application examples. In these application areas we are typically interested in the parameters in the model typically 2- or 3-D positions and not in predictive modelling which is often the main concern in other regression analysis applications. Adjustment is often used to obtain...

  9. An Original Stepwise Multilevel Logistic Regression Analysis of Discriminatory Accuracy

    DEFF Research Database (Denmark)

    Merlo, Juan; Wagner, Philippe; Ghith, Nermin

    2016-01-01

    BACKGROUND AND AIM: Many multilevel logistic regression analyses of "neighbourhood and health" focus on interpreting measures of associations (e.g., odds ratio, OR). In contrast, multilevel analysis of variance is rarely considered. We propose an original stepwise analytical approach that disting...

  10. 3D Regression Heat Map Analysis of Population Study Data.

    Science.gov (United States)

    Klemm, Paul; Lawonn, Kai; Glaßer, Sylvia; Niemann, Uli; Hegenscheid, Katrin; Völzke, Henry; Preim, Bernhard

    2016-01-01

    Epidemiological studies comprise heterogeneous data about a subject group to define disease-specific risk factors. These data contain information (features) about a subject's lifestyle, medical status as well as medical image data. Statistical regression analysis is used to evaluate these features and to identify feature combinations indicating a disease (the target feature). We propose an analysis approach of epidemiological data sets by incorporating all features in an exhaustive regression-based analysis. This approach combines all independent features w.r.t. a target feature. It provides a visualization that reveals insights into the data by highlighting relationships. The 3D Regression Heat Map, a novel 3D visual encoding, acts as an overview of the whole data set. It shows all combinations of two to three independent features with a specific target disease. Slicing through the 3D Regression Heat Map allows for the detailed analysis of the underlying relationships. Expert knowledge about disease-specific hypotheses can be included into the analysis by adjusting the regression model formulas. Furthermore, the influences of features can be assessed using a difference view comparing different calculation results. We applied our 3D Regression Heat Map method to a hepatic steatosis data set to reproduce results from a data mining-driven analysis. A qualitative analysis was conducted on a breast density data set. We were able to derive new hypotheses about relations between breast density and breast lesions with breast cancer. With the 3D Regression Heat Map, we present a visual overview of epidemiological data that allows for the first time an interactive regression-based analysis of large feature sets with respect to a disease.

  11. Prediction of flow characteristics using multiple regression and neural networks: A case study in Zimbabwe

    NARCIS (Netherlands)

    Mazvimavi, D.; Meijerink, A.M.J.; Savenije, H.H.G.; Stein, A.

    2005-01-01

    The feasibility of predicting flow characteristics from basin descriptors using multiple regression and neural networks has been investigated on 52 basins in Zimbabwe. Flow characteristics considered were average annual runoff, base flow index, flow duration curve, and average monthly runoff. Mean

  12. A Spreadsheet Tool for Learning the Multiple Regression F-Test, T-Tests, and Multicollinearity

    Science.gov (United States)

    Martin, David

    2008-01-01

    This note presents a spreadsheet tool that allows teachers the opportunity to guide students towards answering on their own questions related to the multiple regression F-test, the t-tests, and multicollinearity. The note demonstrates approaches for using the spreadsheet that might be appropriate for three different levels of statistics classes,…

  13. A Simple and Convenient Method of Multiple Linear Regression to Calculate Iodine Molecular Constants

    Science.gov (United States)

    Cooper, Paul D.

    2010-01-01

    A new procedure using a student-friendly least-squares multiple linear-regression technique utilizing a function within Microsoft Excel is described that enables students to calculate molecular constants from the vibronic spectrum of iodine. This method is advantageous pedagogically as it calculates molecular constants for ground and excited…

  14. The Performance of the Full Information Maximum Likelihood Estimator in Multiple Regression Models with Missing Data.

    Science.gov (United States)

    Enders, Craig K.

    2001-01-01

    Examined the performance of a recently available full information maximum likelihood (FIML) estimator in a multiple regression model with missing data using Monte Carlo simulation and considering the effects of four independent variables. Results indicate that FIML estimation was superior to that of three ad hoc techniques, with less bias and less…

  15. Multivariate sparse group lasso for the multivariate multiple linear regression with an arbitrary group structure.

    Science.gov (United States)

    Li, Yanming; Nan, Bin; Zhu, Ji

    2015-06-01

    We propose a multivariate sparse group lasso variable selection and estimation method for data with high-dimensional predictors as well as high-dimensional response variables. The method is carried out through a penalized multivariate multiple linear regression model with an arbitrary group structure for the regression coefficient matrix. It suits many biology studies well in detecting associations between multiple traits and multiple predictors, with each trait and each predictor embedded in some biological functional groups such as genes, pathways or brain regions. The method is able to effectively remove unimportant groups as well as unimportant individual coefficients within important groups, particularly for large p small n problems, and is flexible in handling various complex group structures such as overlapping or nested or multilevel hierarchical structures. The method is evaluated through extensive simulations with comparisons to the conventional lasso and group lasso methods, and is applied to an eQTL association study.

  16. Regional flood frequency analysis using spatial proximity and basin characteristics: Quantile regression vs. parameter regression technique

    Science.gov (United States)

    Ahn, Kuk-Hyun; Palmer, Richard

    2016-09-01

    Despite wide use of regression-based regional flood frequency analysis (RFFA) methods, the majority are based on either ordinary least squares (OLS) or generalized least squares (GLS). This paper proposes 'spatial proximity' based RFFA methods using the spatial lagged model (SLM) and spatial error model (SEM). The proposed methods are represented by two frameworks: the quantile regression technique (QRT) and parameter regression technique (PRT). The QRT develops prediction equations for flooding quantiles in average recurrence intervals (ARIs) of 2, 5, 10, 20, and 100 years whereas the PRT provides prediction of three parameters for the selected distribution. The proposed methods are tested using data incorporating 30 basin characteristics from 237 basins in Northeastern United States. Results show that generalized extreme value (GEV) distribution properly represents flood frequencies in the study gages. Also, basin area, stream network, and precipitation seasonality are found to be the most effective explanatory variables in prediction modeling by the QRT and PRT. 'Spatial proximity' based RFFA methods provide reliable flood quantile estimates compared to simpler methods. Compared to the QRT, the PRT may be recommended due to its accuracy and computational simplicity. The results presented in this paper may serve as one possible guidepost for hydrologists interested in flood analysis at ungaged sites.

  17. Statistical Downscaling: A Comparison of Multiple Linear Regression and k-Nearest Neighbor Approaches

    Science.gov (United States)

    Gangopadhyay, S.; Clark, M. P.; Rajagopalan, B.

    2002-12-01

    The success of short term (days to fortnight) streamflow forecasting largely depends on the skill of surface climate (e.g., precipitation and temperature) forecasts at local scales in the individual river basins. The surface climate forecasts are used to drive the hydrologic models for streamflow forecasting. Typically, Medium Range Forecast (MRF) models provide forecasts of large scale circulation variables (e.g. pressures, wind speed, relative humidity etc.) at different levels in the atmosphere on a regular grid - which are then used to "downscale" to the surface climate at locations within the model grid box. Several statistical and dynamical methods are available for downscaling. This paper compares the utility of two statistical downscaling methodologies: (1) multiple linear regression (MLR) and (2) a nonparametric approach based on k-nearest neighbor (k-NN) bootstrap method, in providing local-scale information of precipitation and temperature at a network of stations in the Upper Colorado River Basin. Downscaling to the stations is based on output of large scale circulation variables (i.e. predictors) from the NCEP Medium Range Forecast (MRF) database. Fourteen-day six hourly forecasts are developed using these two approaches, and their forecast skill evaluated. A stepwise regression is performed at each location to select the predictors for the MLR. The k-NN bootstrap technique resamples historical data based on their "nearness" to the current pattern in the predictor space. Prior to resampling a Principal Component Analysis (PCA) is performed on the predictor set to identify a small subset of predictors. Preliminary results using the MLR technique indicate a significant value in the downscaled MRF output in predicting runoff in the Upper Colorado Basin. 
    It is expected that the k-NN approach will match the skill of the MLR approach at individual stations, and will have the added advantage of preserving the spatial co-variability between stations, capturing

  18. Multiple-time correlation functions for non-Markovian interaction: Beyond the Quantum Regression Theorem

    CERN Document Server

    Alonso, D; Alonso, Daniel; Vega, Inés de

    2004-01-01

    Multiple-time correlation functions are found in the dynamical description of different phenomena. They encode and describe the fluctuations of the dynamical variables of a system. In this paper we formulate a theory of non-Markovian multiple-time correlation functions (MTCF) for a wide class of systems. We derive the dynamical equation of the reduced propagator, an object that evolves state vectors of the system conditioned to the dynamics of its environment, which is not necessarily in the vacuum state at the initial time. Such a reduced propagator is the essential piece to obtain multiple-time correlation functions. An average over the different environmental histories of the reduced propagator permits us to obtain the evolution equations of the multiple-time correlation functions. We also study the evolution of MTCF within the weak coupling limit, and it is shown that the multiple-time correlation functions of some observables satisfy the Quantum Regression Theorem (QRT), whereas other correlations do no...

  19. Simulation Experiments in Practice : Statistical Design and Regression Analysis

    NARCIS (Netherlands)

    Kleijnen, J.P.C.

    2007-01-01

    In practice, simulation analysts often change only one factor at a time, and use graphical analysis of the resulting Input/Output (I/O) data. Statistical theory proves that more information is obtained when applying Design Of Experiments (DOE) and linear regression analysis. Unfortunately, classic t

  20. Multiple regression as a preventive tool for determining the risk of Legionella spp.

    Directory of Open Access Journals (Sweden)

    Enrique Gea-Izquierdo

    2012-04-01

    To determine the interrelationship between health and hygiene conditions for prevention of legionellosis, the composition of materials used in water distribution systems, the water origin, and Legionella pneumophila risk. Material and methods. A descriptive study and multiple regression analysis on a sample of golf course sprinkler irrigation systems (n=31) pertaining to hotels located on the Costa del Sol (Malaga, Spain). The study was carried out in 2009. Results. The model showed a significant linear relation, with all the independent variables contributing significantly (p<0.05) to the model's fit. The relationship between water type and the risk of Legionella, as well as between material composition and that risk, is linear and positive. In contrast, the relationship between health-hygiene conditions and Legionella risk is linear and negative. Conclusion. The characterization of Legionella pneumophila concentration, as defined by the risk in water and through use of the predictive method, can contribute to the consideration of new influence variables in the development of the agent, resulting in improved control and prevention of the disease.

  1. Multiple regression method to determine aerosol optical depth in atmospheric column in Penang, Malaysia

    Science.gov (United States)

    Tan, F.; Lim, H. S.; Abdullah, K.; Yoon, T. L.; Zubir Matjafri, M.; Holben, B.

    2014-02-01

    Aerosol optical depth (AOD) from AERONET data has a very fine resolution, but air pollution index (API), visibility and relative humidity from ground truth measurements are coarse. To obtain the local AOD in the atmosphere, the relationship between these three parameters was determined using multiple regression analysis. Data from the southwest monsoon period (August to September 2012) taken in Penang, Malaysia, were used to establish a quantitative relationship in which the AOD is modeled as a function of API, relative humidity, and visibility. The most highly correlated model was used to predict AOD values during the southwest monsoon period. When aerosol is not uniformly distributed in the atmosphere, the predicted AOD can deviate strongly from the measured values. These deviating data can therefore be removed by comparing the predicted AOD values with the actual AERONET data, which helps to investigate whether the non-uniform source of the aerosol is at the ground surface or at a higher altitude. The model can predict AOD accurately only if the aerosol is uniformly distributed in the atmosphere. However, further study is needed to determine whether this model is suitable for predicting AOD not only in Penang but also in other states of Malaysia or even globally.

  2. QSAR study of prolylcarboxypeptidase inhibitors by genetic algorithm: Multiple linear regressions

    Indian Academy of Sciences (India)

    Eslam Pourbasheer; Saadat Vahdani; Reza Aalizadeh; Alireza Banaei; Mohammad Reza Ganjali

    2015-07-01

    A predictive analysis based on quantitative structure activity relationships (QSAR) was performed on benzimidazolepyrrolidinyl amides as prolylcarboxypeptidase (PrCP) inhibitors. Molecules were represented by chemical descriptors that encode constitutional, topological, geometrical, and electronic structure features. The hierarchical clustering method was used to classify the dataset into training and test subsets. The important descriptors were selected with the aid of the genetic algorithm method. The QSAR model was constructed using multiple linear regression (MLR), and its robustness and predictability were verified by internal and external cross-validation methods. Furthermore, the calculation of the domain of applicability defines the area of reliable predictions. The root mean square errors (RMSE) of the training set and the test set for the GA-MLR model were calculated to be 0.176 and 0.279, and the correlation coefficients (R²) were 0.839 and 0.923, respectively. The proposed model has good stability, robustness and predictability when verified by internal and external validation.
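
    The two fit statistics quoted in this record can be computed as in the short sketch below. It assumes R² is taken as the coefficient of determination (1 - SS_res/SS_tot); the record does not say which definition the authors used, and the data values are hypothetical and are not taken from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination, computed as 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical activity values, for illustration only (not from the study).
observed = [5.0, 6.0, 7.0, 8.0]
predicted = [5.5, 5.5, 7.5, 7.5]

print(rmse(observed, predicted), r_squared(observed, predicted))
```

    Reporting both statistics separately for the training set and the test set, as the record does, makes it possible to see whether a model fits only the data it was built on.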

  3. MANUFACTURING AND CONTINUOUS IMPROVEMENT AREAS USING PARTIAL LEAST SQUARE PATH MODELING WITH MULTIPLE REGRESSION COMPARISON

    Directory of Open Access Journals (Sweden)

    Carlos Monge Perry

    2014-07-01

    Structural equation modeling (SEM) has traditionally been deployed in areas of marketing, consumer satisfaction and preferences, human behavior, and recently in strategic planning. These areas are considered its niches; however, there is a remarkable tendency in empirical research studies toward a more diversified use of the technique. This paper shows the application of structural equation modeling using partial least squares (PLS-SEM) in areas of manufacturing, quality, continuous improvement, operational efficiency, and environmental responsibility in Mexico's medium and large manufacturing plants, using a small sample (n = 40). The results obtained from the PLS-SEM model are highly positive, relevant, and statistically significant. Also shown in this paper, to confirm the validity, reliability, and statistical power of PLS-SEM, is a comparative analysis against multiple regression, showing very similar results to those obtained by PLS-SEM. This fact validates the use of PLS-SEM in nontraditional areas of scientific research, and suggests and invites the use of the technique in diversified fields of scientific research.

  4. Neural network and multiple linear regression to predict school children dimensions for ergonomic school furniture design.

    Science.gov (United States)

    Agha, Salah R; Alnahhal, Mohammed J

    2012-11-01

    The current study investigates the possibility of obtaining the anthropometric dimensions, critical to school furniture design, without measuring all of them. The study first selects some anthropometric dimensions that are easy to measure. Two methods are then used to check if these easy-to-measure dimensions can predict the dimensions critical to the furniture design. These methods are multiple linear regression and neural networks. Each dimension that is deemed necessary to ergonomically design school furniture is expressed as a function of some other measured anthropometric dimensions. Results show that out of the five dimensions needed for chair design, four can be related to other dimensions that can be measured while children are standing. Therefore, the method suggested here would definitely save time and effort and avoid the difficulty of dealing with students while measuring these dimensions. In general, it was found that neural networks perform better than multiple linear regression in the current study.

  5. Prediction of blast boulders in open pit mines via multiple regression and artificial neural networks

    Institute of Scientific and Technical Information of China (English)

    Ghiasi Majid; Askarnejad Nematollah; Dindarloo Saeid R.; Shamsoddini Hamed

    2016-01-01

    The most important objective of blasting in open pit mines is rock fragmentation. Prediction of produced boulders (oversized crushed rocks) is a key parameter in designing blast patterns. In this study, the amount of boulder produced in blasting operations of Golegohar iron ore open pit mine, Iran was predicted via multiple regression method and artificial neural networks. Results of 33 blasts in the mine were collected for modeling. Input variables were: joints spacing, density and uniaxial compressive strength of the intact rock, burden, spacing, stemming, bench height to burden ratio, and specific charge. The dependent variable was ratio of boulder volume to pattern volume. Both techniques were successful in predicting the ratio. In this study, the multiple regression method was superior with coefficient of determination and root mean squared error values of 0.89 and 0.19, respectively.

  6. Multiple Linear Regression Application on the Inter-Network Settlement of Internet

    Institute of Scientific and Technical Information of China (English)

    YANG Qing-feng; ZHANG Qi-xiang; LÜ Ting-jie

    2006-01-01

    This paper develops an analytical framework to explain Internet interconnection settlement issues. The paper shows that multiple linear regression can be used in assessing the network value of Internet Backbone Providers (IBPs). By using the exchange rate of each network, we can define a rate of network value, which reflects the contribution of each network to interconnection and the interconnected network resource usage by each network.

  7. Time series analysis using semiparametric regression on oil palm production

    Science.gov (United States)

    Yundari, Pasaribu, U. S.; Mukhaiyar, U.

    2016-04-01

    This paper presents a semiparametric kernel regression method, which has shown flexibility and ease of mathematical calculation, especially in estimating density and regression functions. The kernel function is continuous and produces a smooth estimate. The classical kernel density estimator is constructed by completely nonparametric analysis and works reasonably well for all forms of function. Here, we discuss parameter estimation in time series analysis. First, we assume that the parameters exist; then we use nonparametric estimation, an approach called semiparametric. The optimum bandwidth is selected by considering an approximation of the Mean Integrated Squared Error (MISE).

  8. Ratio Versus Regression Analysis: Some Empirical Evidence in Brazil

    Directory of Open Access Journals (Sweden)

    Newton Carneiro Affonso da Costa Jr.

    2004-06-01

    This work compares the traditional methodology for ratio analysis, applied to a sample of Brazilian firms, with the alternative of regression analysis, for both cross-industry and intra-industry samples. The structural validity of the traditional methodology was tested through a model that represents its analogous regression format. The data are from 156 Brazilian public companies in nine industrial sectors for the year 1997. The results provide weak empirical support for the traditional ratio methodology, as it was verified that the validity of this methodology may differ between ratios.

  9. Multivariate Multiple Regression Models for a Big Data-Empowered SON Framework in Mobile Wireless Networks

    Directory of Open Access Journals (Sweden)

    Yoonsu Shin

    2016-01-01

    In the 5G era, the operational cost of mobile wireless networks will significantly increase. Further, massive network capacity and zero latency will be needed because everything will be connected to mobile networks. Thus, self-organizing networks (SON) are needed, which expedite automatic operation of mobile wireless networks but face challenges in satisfying the 5G requirements. Therefore, researchers have proposed a framework to empower SON using big data. The recent framework of a big data-empowered SON analyzes the relationship between key performance indicators (KPIs) and related network parameters (NPs) using machine-learning tools, and it develops regression models using a Gaussian process with those parameters. The problem, however, is that the methods of finding the NPs related to the KPIs differ individually. Moreover, the Gaussian process regression model cannot determine the relationship between a KPI and its various related NPs. In this paper, to solve these problems, we propose multivariate multiple regression models to determine the relationship between various KPIs and NPs. If we assume one KPI and multiple NPs as one set, the proposed models help us process multiple sets at one time. Also, we can find out whether some KPIs are conflicting or not. We implement the proposed models using MapReduce.

  10. Analysis of stroke incidence impact factors based on stepwise multiple regression

    Institute of Scientific and Technical Information of China (English)

    王建芳

    2013-01-01

    This paper analyzes the factors influencing stroke incidence. First, the large body of case information was statistically analyzed; then a mathematical model was built by regression fitting, establishing the relationship between stroke incidence and air temperature, barometric pressure and humidity. Finally, some suggestions are made for high-risk groups. This gives a complete answer to the questions of Problem C of the 2012 Higher Education Press Cup National Mathematical Contest in Modeling.

  11. Multiple regression analysis of the factors affecting the sinking performance of large-scale tuna purse seine

    Institute of Scientific and Technical Information of China (English)

    周成; 许柳雄; 张新峰; 朱国平; 唐浩; 王学昉

    2013-01-01

    We evaluated the relationship between sinking depth and a number of factors using multiple regression to determine the external patterns of sinking a purse seine. We collected data on sinking depth, gear operation, shooting duration (T), shooting velocity (S), current speed (Cs), current direction (Cd), and purse line length (L) from tuna purse seiners owned by Shanghai Fisheries General Corp. between September and December 2011. Current direction and shooting velocity had no effect on sinking depth (P>0.05, df=53). Conversely, the shooting duration, current speed, and length of the purse line had a significant effect on sinking depth (P<0.05, df=53). The resulting multiple regression model of sinking depth (D) on shooting duration (T), current speed (Cs) and purse line length (L) was D = 0.069T − 144.5Cs² + 0.022L + 158. Principal component analysis indicated that, to some extent, Chinese tuna purse seine operations do not use information on sea conditions at the fishing grounds to adapt fishing operations beneficially. Under typical operating conditions (shooting duration of 550 s, purse line length of 2000 m), the model predicts a mean sinking depth of 236.78 m (95% confidence interval [211.51, 262.04]) with no wind and no current, and a mean sinking depth of 92.27 m (95% confidence interval [60.56, 123.97]) in a strong current (current speed of 1 kn).
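
    The fitted model quoted in this record, D = 0.069T − 144.5Cs2 + 0.022L + 158, can be evaluated directly, as in the sketch below. Two assumptions are made here: the printed term "Cs2" is read as current speed squared, and the units are those given for the typical operating conditions (T in seconds, Cs in knots, L in metres, D in metres). The function name is hypothetical.

```python
def sinking_depth(shooting_duration_s, current_speed_kn, purse_line_m):
    """Evaluate D = 0.069*T - 144.5*Cs**2 + 0.022*L + 158 (depth in metres).

    Assumption: the "Cs2" term in the record means Cs squared.
    """
    return (0.069 * shooting_duration_s
            - 144.5 * current_speed_kn ** 2
            + 0.022 * purse_line_m
            + 158.0)


# Typical operating conditions from the record: T = 550 s, L = 2000 m.
still_water = sinking_depth(550, 0.0, 2000)     # no current
strong_current = sinking_depth(550, 1.0, 2000)  # current speed of 1 kn

print(round(still_water, 2), round(strong_current, 2))
```

    With the rounded coefficients as printed, the sketch gives roughly 240 m and 95 m, a few metres above the reported means of 236.78 m and 92.27 m. The gap is presumably due to rounding of the published coefficients, although the record does not say so.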

  12. Sparse Regression by Projection and Sparse Discriminant Analysis

    KAUST Repository

    Qi, Xin

    2015-04-03

    Recent years have seen active developments of various penalized regression methods, such as LASSO and elastic net, to analyze high-dimensional data. In these approaches, the direction and length of the regression coefficients are determined simultaneously. Due to the introduction of penalties, the length of the estimates can be far from being optimal for accurate predictions. We introduce a new framework, regression by projection, and its sparse version to analyze high-dimensional data. The unique nature of this framework is that the directions of the regression coefficients are inferred first, and the lengths and the tuning parameters are determined by a cross-validation procedure to achieve the largest prediction accuracy. We provide a theoretical result for simultaneous model selection consistency and parameter estimation consistency of our method in high dimension. This new framework is then generalized such that it can be applied to principal components analysis, partial least squares, and canonical correlation analysis. We also adapt this framework for discriminant analysis. Compared with the existing methods, where there is relatively little control of the dependency among the sparse components, our method can control the relationships among the components. We present efficient algorithms and related theory for solving the sparse regression by projection problem. Based on extensive simulations and real data analysis, we demonstrate that our method achieves good predictive performance and variable selection in the regression setting, and the ability to control relationships between the sparse components leads to more accurate classification. In supplementary materials available online, the details of the algorithms and theoretical proofs, and R codes for all simulation studies are provided.

  13. Early Parallel Activation of Semantics and Phonology in Picture Naming: Evidence from a Multiple Linear Regression MEG Study.

    Science.gov (United States)

    Miozzo, Michele; Pulvermüller, Friedemann; Hauk, Olaf

    2015-10-01

    The time course of brain activation during word production has become an area of increasingly intense investigation in cognitive neuroscience. The predominant view has been that semantic and phonological processes are activated sequentially, at about 150 and 200-400 ms after picture onset. Although evidence from prior studies has been interpreted as supporting this view, these studies were arguably not ideally suited to detect early brain activation of semantic and phonological processes. We here used a multiple linear regression approach to magnetoencephalography (MEG) analysis of picture naming in order to investigate early effects of variables specifically related to visual, semantic, and phonological processing. This was combined with distributed minimum-norm source estimation and region-of-interest analysis. Brain activation associated with visual image complexity appeared in occipital cortex at about 100 ms after picture presentation onset. At about 150 ms, semantic variables became physiologically manifest in left frontotemporal regions. In the same latency range, we found an effect of phonological variables in the left middle temporal gyrus. Our results demonstrate that multiple linear regression analysis is sensitive to early effects of multiple psycholinguistic variables in picture naming. Crucially, our results suggest that access to phonological information might begin in parallel with semantic processing around 150 ms after picture onset.

  14. Regression analysis for solving diagnosis problem of children's health

    Science.gov (United States)

    Cherkashina, Yu A.; Gerget, O. M.

    2016-04-01

    The paper presents research on the application of a statistical technique, regression analysis, to assess the health status of children in the neonatal period based on medical data (hemostatic parameters, blood test parameters, gestational age, vascular-endothelial growth factor) measured at 3-5 days of life. A detailed description of the studied medical data is given, and a binary logistic regression procedure is discussed. Basic results of the research are presented. A classification table of predicted and observed values is shown, and the overall percentage of correct recognition is determined. Regression equation coefficients are calculated, and the general regression equation is written based on them. Based on the results of the logistic regression, ROC analysis was performed, the sensitivity and specificity of the model were calculated, and ROC curves were constructed. These mathematical techniques allow diagnostics of children's health with a high quality of recognition. The results make a significant contribution to the development of evidence-based medicine and have high practical importance in the professional activity of the author.
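
    The classification table, overall percentage of correct recognition, sensitivity and specificity mentioned in this record are related as in the sketch below. The labels are hypothetical (1 = condition present, 0 = absent) and are not data from the paper; the function name is likewise an illustration.

```python
def classification_metrics(actual, predicted):
    """Build a 2x2 classification table and derive the usual summary rates."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # true negatives
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
    return {
        "table": {"tp": tp, "fn": fn, "tn": tn, "fp": fp},
        "sensitivity": tp / (tp + fn),      # share of true cases detected
        "specificity": tn / (tn + fp),      # share of non-cases cleared
        "accuracy": (tp + tn) / len(pairs)  # overall correct recognition
    }


# Hypothetical observed and predicted classes, for illustration only.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 0, 0, 1, 0, 1]

metrics = classification_metrics(actual, predicted)
print(metrics)
```

    A ROC curve is obtained by repeating this calculation while varying the probability cutoff used to turn the logistic regression output into a predicted class.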

  15. Controlling the Type I Error Rate in Stepwise Regression Analysis.

    Science.gov (United States)

    Pohlmann, John T.

    Three procedures used to control Type I error rate in stepwise regression analysis are forward selection, backward elimination, and true stepwise. In the forward selection method, a model of the dependent variable is formed by choosing the single best predictor; then the second predictor which makes the strongest contribution to the prediction of…

  16. Visual category recognition using Spectral Regression and Kernel Discriminant Analysis

    NARCIS (Netherlands)

    Tahir, M.A.; Kittler, J.; Mikolajczyk, K.; Yan, F.; van de Sande, K.E.A.; Gevers, T.

    2009-01-01

    Visual category recognition (VCR) is one of the most important tasks in image and video indexing. Spectral methods have recently emerged as a powerful tool for dimensionality reduction and manifold learning. Recently, Spectral Regression combined with Kernel Discriminant Analysis (SR-KDA) has been s

  17. Energy production through organic fraction of municipal solid waste-A multiple regression modeling approach.

    Science.gov (United States)

    Ramesh, N; Ramesh, S; Vennila, G; Abdul Bari, J; MageshKumar, P

    2016-12-01

    In the 21st century, people have migrated from rural to urban areas for several reasons. As a result, the populations of Indian cities are increasing day by day. On one hand, the country is developing in the field of science and technology; on the other, it is encountering a serious problem, environmental degradation. With the growing population, the generation of solid waste has also increased, and the waste is being disposed of in open dumps and landfills, which leads to air and land pollution. This study attempts to generate energy from organic solid waste by the bio-fermentation process. The study was conducted for a period of 7 months at Erode, Tamilnadu, and readings of various parameters, namely hydraulic retention time (HRT), organic loading rate (OLR), sludge loading rate (SLR), influent pH, effluent pH, inlet volatile fatty acids (VFA), outlet volatile fatty acids, inlet VSS/TS ratio, outlet VSS/TS ratio, influent COD, effluent COD and % of COD removal, were recorded every 10 days. The aim of the present study is to develop a model through multiple linear regression analysis with COD as the dependent variable and parameters such as HRT, OLR, SLR, influent, effluent, VSS/TS ratio, influent COD and effluent COD as independent variables, and to analyze the impact of these parameters on COD. The model developed through the stepwise regression method revealed that only four parameters, influent COD, effluent COD, VSS/TS and influent pH, were the main influencers of COD removal. Influent COD and VSS/TS have a positive impact on COD removal, whereas effluent COD and influent pH have a negative impact. Influent COD has the highest order of impact, followed by effluent COD, VSS/TS and influent pH. The other parameters (HRT, OLR, SLR, inlet VFA and outlet VFA) did not contribute significantly to the removal of COD. The implementation of the process suggested through this study might bring in dual benefit to the community, viz treatment of solid

  18. Application of Granger Causality and Multiple Regression Analysis in Air Quality Monitoring

    Institute of Scientific and Technical Information of China (English)

    金江强; 张怀相

    2016-01-01

    To improve the accuracy of air quality measurement, this paper proposes an algorithm that exploits causal relationships between air pollutants, based on the associations among the various pollutants. First, an autoregressive integrated moving average (ARIMA) model with exogenous variables is established for the time series of air pollutants. Second, Granger causality between air pollutants is tested with F-statistics. Then, a stepwise linear regression model is trained to establish a quantitative relationship between air pollutants that have a causal relationship. Finally, the accuracy and effectiveness of the algorithm are validated by analysis of experimental data.

  19. Exploratory regression analysis: a tool for selecting models and determining predictor importance.

    Science.gov (United States)

    Braun, Michael T; Oswald, Frederick L

    2011-06-01

    Linear regression analysis is one of the most important tools in a researcher's toolbox for creating and testing predictive models. Although linear regression analysis indicates how strongly a set of predictor variables, taken together, will predict a relevant criterion (i.e., the multiple R), the analysis cannot indicate which predictors are the most important. Although there is no definitive or unambiguous method for establishing predictor variable importance, there are several accepted methods. This article reviews those methods for establishing predictor importance and provides a program (in Excel) for implementing them (available for direct download at http://dl.dropbox.com/u/2480715/ERA.xlsm?dl=1). The program investigates all 2^p - 1 submodels and produces several indices of predictor importance. This exploratory approach to linear regression, similar to other exploratory data analysis techniques, has the potential to yield both theoretical and practical benefits.
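
    The submodel count in this record comes from taking every non-empty subset of the p predictors. The sketch below only enumerates those subsets; it does not reproduce the Excel program, and the predictor names are hypothetical.

```python
from itertools import combinations


def all_submodels(predictors):
    """Return every non-empty subset of the predictors, smallest first."""
    subsets = []
    for size in range(1, len(predictors) + 1):
        subsets.extend(combinations(predictors, size))
    return subsets


predictors = ["x1", "x2", "x3"]  # hypothetical predictor names
models = all_submodels(predictors)

for model in models:
    print(model)
print(len(models) == 2 ** len(predictors) - 1)
```

    Because the count doubles with each added predictor, an all-subsets search of this kind is practical only for a modest number of predictors.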

  20. Hierarchical Multiple Regression in Counseling Research: Common Problems and Possible Remedies.

    Science.gov (United States)

    Petrocelli, John V.

    2003-01-01

    A brief content analysis was conducted on the use of hierarchical regression in counseling research published in the "Journal of Counseling Psychology" and the "Journal of Counseling & Development" during the years 1997-2001. Common problems are cited and possible remedies are described. (Contains 43 references and 3 tables.) (Author)

  1. Early cost estimating for road construction projects using multiple regression techniques

    Directory of Open Access Journals (Sweden)

    Ibrahim Mahamid

    2011-12-01

    The objective of this study is to develop early cost estimating models for road construction projects using multiple regression techniques, based on 131 sets of data collected in the West Bank in Palestine. As cost estimates are required at early stages of a project, consideration was given to the fact that the input data for the required regression model could be easily extracted from sketches or the scope definition of the project. Eleven regression models are developed to estimate the total cost of a road construction project in US dollars; 5 of them include bid quantities as input variables and 6 include road length and road width. The coefficient of determination (r²) for the developed models ranges from 0.92 to 0.98, which indicates that the predicted values from the forecast models fit the real-life data. The mean absolute percentage error (MAPE) of the developed regression models ranges from 13% to 31%; the results compare favorably with past research, which has shown that estimate accuracy in the early stages of a project is between ±25% and ±50%.
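
    The mean absolute percentage error used in this record to judge the cost models can be computed as sketched below. The cost figures are hypothetical and serve only to show the arithmetic; they are not project data from the study.

```python
def mape(actual, estimated):
    """Mean absolute percentage error, in percent.

    Each error is scaled by the actual value, so actual values must be nonzero.
    """
    ratios = [abs((a - e) / a) for a, e in zip(actual, estimated)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical project costs in US dollars, for illustration only.
actual_costs = [100_000.0, 200_000.0, 400_000.0]
estimated_costs = [110_000.0, 180_000.0, 400_000.0]

print(round(mape(actual_costs, estimated_costs), 2))
```

    Because every error is expressed relative to the actual cost, MAPE lets small and large projects be compared on the same scale, which suits a data set of road projects of different sizes.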

  2. Multiple Regression Analysis of Relation Between Behavior Factors and Chronic Diseases in Heishui County, Sichuan Province

    Institute of Scientific and Technical Information of China (English)

    汪凯; 王国庆; 李钋; 黄建生; 朱迎

    2001-01-01

    Purpose: To understand the relation between behavioral factors and chronic diseases among residents of Heishui County. Methods: Using stratified cluster sampling, 1,483 residents of Heishui County aged 15 and over were interviewed. Results: Logistic regression analysis identified eight categories of factors affecting chronic diseases among Heishui residents, five of which were behavioral. The most important were occupation, alcohol drinking and harmful dietary habits (factor 1), for which the ratios of the probability of having a chronic disease to the probability of not having one were 1.6627, 1.4063 and 1.3986, respectively. Among drinkers, the drinking factors related to chronic diseases were the type of alcohol, the number of years of drinking, sharing a straw with others to drink Za liquor, and the amount drunk over 3 days; sharing a straw to drink Za liquor had the largest relative risk (1.390). Stepwise regression showed five categories of factors affecting digestive system diseases, three of which were behavioral; the three most important were harmful dietary hygiene habits (factor 1), alcohol drinking, and personal hygiene habits (factor 2). Conclusion: The factors affecting chronic diseases among residents of Heishui County are multifaceted, and self-inflicted behavioral factors play an important part, especially alcohol drinking and harmful ...

  3. Multiple Linear Regression for Reconstruction of Gene Regulatory Networks in Solving Cascade Error Problems

    Directory of Open Access Journals (Sweden)

    Faridah Hani Mohamed Salleh

    2017-01-01

    Gene regulatory network (GRN) reconstruction is the process of identifying regulatory gene interactions from experimental data through computational analysis. One of the main reasons for the reduced performance of previous GRN methods had been inaccurate prediction of cascade motifs. Cascade error is defined as the wrong prediction of cascade motifs, where an indirect interaction is misinterpreted as a direct interaction. Despite the active research on various GRN prediction methods, the discussion on specific methods to solve problems related to cascade errors is still lacking. In fact, the experiments conducted by the past studies were not specifically geared towards proving the ability of GRN prediction methods in avoiding the occurrences of cascade errors. Hence, this research aims to propose Multiple Linear Regression (MLR) to infer GRN from gene expression data and to avoid wrongly inferring an indirect interaction (A → B → C) as a direct interaction (A → C). Since the number of observations of the real experiment datasets was far less than the number of predictors, some predictors were eliminated by extracting the random subnetworks from global interaction networks via an established extraction method. In addition, the experiment was extended to assess the effectiveness of MLR in dealing with cascade error by using a novel experimental procedure that had been proposed in this work. The experiment revealed that the number of cascade errors had been very minimal. Apart from that, the Belsley collinearity test proved that multicollinearity did affect the datasets used in this experiment greatly. All the tested subnetworks obtained satisfactory results, with AUROC values above 0.5.

  4. Multivariate quantiles and multiple-output regression quantiles: From $L_1$ optimization to halfspace depth

    CERN Document Server

    Hallin, Marc; Šiman, Miroslav (doi: 10.1214/09-AOS723)

    2010-01-01

    A new multivariate concept of quantile, based on a directional version of Koenker and Bassett's traditional regression quantiles, is introduced for multivariate location and multiple-output regression problems. In their empirical version, those quantiles can be computed efficiently via linear programming techniques. Consistency, Bahadur representation and asymptotic normality results are established. Most importantly, the contours generated by those quantiles are shown to coincide with the classical halfspace depth contours associated with the name of Tukey. This relation does not only allow for efficient depth contour computations by means of parametric linear programming, but also for transferring from the quantile to the depth universe such asymptotic results as Bahadur representations. Finally, linear programming duality opens the way to promising developments in depth-related multivariate rank-based inference.

  5. Multiple regression analysis of urinary fluoride, saliva and plaque fluoride levels of adolescents with dental fluorosis

    Institute of Scientific and Technical Information of China (English)

    于阳阳; 赵伟; 刘晓燕; 邹冬荣; 杨晓昀; 刘荣; 于晓峰; 营杰

    2016-01-01

    Objective: To study the correlation between the degree of dental fluorosis, saliva and plaque fluoride levels, and urinary fluoride values in adolescents with dental fluorosis. Methods: A middle school was chosen as the survey point. Two hundred adolescents were examined for the degree of dental fluorosis by Dean's method and divided into four groups according to the severity of fluorosis (n = 52, 40, 28 and 80). A fluoride ion-specific electrode was used to measure the fluoride levels in dental plaque, saliva, urine and drinking water. Differences were analyzed by ANOVA. The relationships among the fluoride levels in dental plaque, saliva and urine and the degree of dental fluorosis were analyzed by multiple linear regression. Results: The average fluoride content of drinking water was (2.20 ± 0.40) mg/L. Compared with controls, the fluoride concentrations in dental plaque, saliva and urine were higher in the light, medium and severe dental fluorosis groups [(1.55 ± 0.88), (1.94 ± 0.77), (2.74 ± 0.83) vs (0.32 ± 0.20) mg/L; (4.44 ± 1.62), (8.09 ± 0.93), (10.72 ± 0.99) vs (0.02 ± 0.01) mg/L; (31.77 ± 6.09), (57.98 ± 1.83), (65.98 ± 2.78) vs (13.06 ± 2.11) μg/g, all P < 0.05]. Urinary fluoride was correlated with fluoride in saliva and dental plaque (r = 0.245, 0.440, all P < 0.05). Saliva fluoride was correlated with fluoride in dental plaque (r = 0.849, P < 0.01). The degree of dental fluorosis was correlated with fluoride in urine and saliva (r = 0.497, 0.896, 0.924, all P < 0.01). The multiple linear regression equation relating fluoride in urine to the degree of dental fluorosis and to fluoride in dental plaque and saliva was as follows: y = 1.357 + 1.618x1 + 0.001x2 - 0.331x3 ± 0.69. Conclusions: The metabolism of fluoride in the body is related to the oral fluoride repository in adolescents with dental fluorosis. Fluoride in urine is influenced by plaque fluoride level, saliva fluoride concentration and the degree of dental

  6. Optimization of rheological parameter for micro-bubble drilling fluids by multiple regression experimental design

    Institute of Scientific and Technical Information of China (English)

    郑力会; 王金凤; 李潇鹏; 张燕; 李都

    2008-01-01

    In order to optimize the formula of a circulating micro-bubble drilling fluid with a plastic viscosity of 18 mPa·s, orthogonal and uniform experimental design methods were applied, and the plastic viscosities of 36 and 24 groups of agents were tested, respectively. Both experimental design methods showed drawbacks: the amount of each agent is difficult to determine, and the results are not fully optimized. Therefore, a multiple regression experimental method was used to design the experimental formulas. By randomly selecting arbitrary agents with amounts within the recommended range, 17 groups of drilling fluid formulas were designed, and the plastic viscosity of each experimental formula was measured. With plastic viscosity set as the objective function, a quadratic regression model was obtained through multiple regression, and its correlation coefficient met the requirement. With target plastic viscosities set to 18, 20 and 22 mPa·s, respectively, 5 drilling fluid formulas were obtained by the trial method with accuracies of 0.0003, 0.0001 and 0.0003. Two groups of formulas were arbitrarily selected for each target value for experimental verification, and the errors between the theoretical and measured plastic viscosities were less than 5%, confirming that the regression model can be applied to optimizing the plastic viscosity of circulating micro-bubble drilling fluid. Depending on the precision required and on other constraints for different drilling fluid formulations, the method can be used to optimize the parameters of circulating micro-bubble drilling fluid.

  7. Multiple trait model combining random regressions for daily feed intake with single measured performance traits of growing pigs

    Directory of Open Access Journals (Sweden)

    Künzi Niklaus

    2002-01-01

    A random regression model for daily feed intake and a conventional multiple trait animal model for the four traits average daily gain on test (ADG), feed conversion ratio (FCR), carcass lean content and meat quality index were combined to analyse data from 1 449 castrated male Large White pigs performance tested in two French central testing stations in 1997. Group housed pigs fed ad libitum with electronic feed dispensers were tested from 35 to 100 kg live body weight. A quadratic polynomial in days on test was used as a regression function for weekly means of daily feed intake and to describe its residual variance. The same fixed (batch) and random (additive genetic, pen and individual permanent environmental) effects were used for regression coefficients of feed intake and single measured traits. Variance components were estimated by means of a Bayesian analysis using Gibbs sampling. Four Gibbs chains were run for 550 000 rounds each, from which 50 000 rounds were discarded as the burn-in period. Estimates of posterior means of covariance matrices were calculated from the remaining two million samples. Low heritabilities of linear and quadratic regression coefficients and their unfavourable genetic correlations with other performance traits reveal that altering the shape of the feed intake curve by direct or indirect selection is difficult.

  8. Variable selection in multiple linear regression: The influence of individual cases

    Directory of Open Access Journals (Sweden)

    SJ Steel

    2007-12-01

    The influence of individual cases in a data set is studied when variable selection is applied in multiple linear regression. Two different influence measures, based on the C_p criterion and Akaike's information criterion, are introduced. The relative change in the selection criterion when an individual case is omitted is proposed as the selection influence of the specific omitted case. Four standard examples from the literature are considered and the selection influence of the cases is calculated. It is argued that the selection procedure may be improved by taking the selection influence of individual data cases into account.
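    One possible reading of "relative change in the selection criterion when an individual case is omitted" is sketched below in Python. This is an assumption-laden illustration, not the paper's definition: it uses a Gaussian AIC of the form n·log(RSS/n) + 2k with constants dropped, an exhaustive search over subsets, and the hypothetical helper names `aic`, `best_subset` and `selection_influence`. The data are simulated with one planted outlier.

```python
from itertools import combinations

import numpy as np


def aic(X, y, subset):
    """Gaussian AIC (up to a constant) for an intercept plus the given predictors."""
    n = len(y)
    A = np.column_stack([np.ones(n)] + [X[:, j] for j in subset])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ beta) ** 2))
    return n * np.log(rss / n) + 2 * A.shape[1]


def best_subset(X, y):
    """Return the predictor subset with the smallest AIC and that AIC value."""
    p = X.shape[1]
    subsets = [s for k in range(p + 1) for s in combinations(range(p), k)]
    scores = [aic(X, y, s) for s in subsets]
    i = int(np.argmin(scores))
    return subsets[i], scores[i]


def selection_influence(X, y):
    """Relative change in the minimised AIC when each case is left out (assumed form)."""
    _, full_score = best_subset(X, y)
    influence = np.empty(len(y))
    for i in range(len(y)):
        _, score_i = best_subset(np.delete(X, i, axis=0), np.delete(y, i))
        influence[i] = (score_i - full_score) / abs(full_score)
    return influence


# Simulated data (hypothetical) with a gross outlier planted at case 5.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 3))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(size=30)
y[5] += 15.0

influence = selection_influence(X, y)
print(int(np.argmax(np.abs(influence))))
```

    Cases whose omission changes the criterion far more than the rest are candidates for closer inspection before the selected model is trusted.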

  9. Multiple regression and Artificial Neural Network for long-term rainfall forecasting using large scale climate modes

    Science.gov (United States)

    Mekanik, F.; Imteaz, M. A.; Gato-Trinidad, S.; Elmahdi, A.

    2013-10-01

    In this study, the application of Artificial Neural Networks (ANN) and Multiple regression analysis (MR) to forecast long-term seasonal spring rainfall in Victoria, Australia was investigated using lagged El Nino Southern Oscillation (ENSO) and Indian Ocean Dipole (IOD) as potential predictors. The use of dual (combined lagged ENSO-IOD) input sets for calibrating and validating ANN and MR models is proposed to investigate the simultaneous effect of past values of these two major climate modes on long-term spring rainfall prediction. The MR models that did not violate the limits of statistical significance and multicollinearity were selected for future spring rainfall forecast. The ANN was developed in the form of a multilayer perceptron using the Levenberg-Marquardt algorithm. Both MR and ANN modelling were assessed statistically using mean square error (MSE), mean absolute error (MAE), Pearson correlation (r) and Willmott index of agreement (d). The developed MR and ANN models were tested on out-of-sample test sets; the MR models showed very poor generalisation ability for east Victoria with correlation coefficients of -0.99 to -0.90 compared to ANN with correlation coefficients of 0.42-0.93; ANN models also showed better generalisation ability for central and west Victoria with correlation coefficients of 0.68-0.85 and 0.58-0.97 respectively. The ability of multiple regression models to forecast out-of-sample sets is comparable to that of ANN for Daylesford in central Victoria and Kaniva in west Victoria (r = 0.92 and 0.67 respectively). The errors of the testing sets for ANN models are generally lower compared to multiple regression models. The statistical analysis suggests the potential of ANN over MR models for rainfall forecasting using large scale climate modes.

  10. Proximate analysis, backwards stepwise regression between gross calorific value, ultimate and chemical analysis of wood.

    Science.gov (United States)

    Telmo, C; Lousada, J; Moreira, N

    2010-06-01

    The gross calorific value (GCV) and the proximate, ultimate and chemical analyses of debarked wood in Portugal were studied for future utilization in the wood pellets industry, and the results were compared with CEN/TS 14961. The relationship between GCV and the ultimate and chemical analyses was determined by backward stepwise multiple regression. The comparison between hardwoods and softwoods did not result in statistically significant differences for the proximate, ultimate and chemical analyses. Statistically significant differences were found in carbon for National (hardwoods-softwoods) and (National-tropical) hardwoods in volatile matter, fixed carbon, carbon and oxygen and also for chemical analysis in National (hardwoods-softwoods) for F and (National-tropical) hardwoods for Br. GCV was highly positively related to C (0.79***) and negatively to O (-0.71***). The final independent variables of the model were (C, O, S, Zn, Ni, Br) with R² = 0.86; F = 27.68***. Hydrogen did not contribute statistically to the energy content.

  11. Principal regression analysis and the index leverage effect

    Science.gov (United States)

    Reigneron, Pierre-Alain; Allez, Romain; Bouchaud, Jean-Philippe

    2011-09-01

    We revisit the index leverage effect, which can be decomposed into a volatility effect and a correlation effect. We investigate the latter using a matrix regression analysis that we call 'Principal Regression Analysis' (PRA) and for which we provide some analytical (using Random Matrix Theory) and numerical benchmarks. We find that downward index trends increase the average correlation between stocks (as measured by the most negative eigenvalue of the conditional correlation matrix), and make the market mode more uniform. Upward trends, on the other hand, also increase the average correlation between stocks but rotate the corresponding market mode away from uniformity. There are two time scales associated with these effects, a short one on the order of a month (20 trading days), and a longer time scale on the order of a year. We also find indications of a leverage effect for sectorial correlations, which reveals itself in the second and third modes of the PRA.

  12. Poisson Regression Analysis of Illness and Injury Surveillance Data

    Energy Technology Data Exchange (ETDEWEB)

    Frome E.L., Watkins J.P., Ellis E.D.

    2012-12-12

    The Department of Energy (DOE) uses illness and injury surveillance to monitor morbidity and assess the overall health of the work force. Data collected from each participating site include health events and a roster file with demographic information. The source data files are maintained in a relational data base, and are used to obtain stratified tables of health event counts and person time at risk that serve as the starting point for Poisson regression analysis. The explanatory variables that define these tables are age, gender, occupational group, and time. Typical response variables of interest are the number of absences due to illness or injury, i.e., the response variable is a count. Poisson regression methods are used to describe the effect of the explanatory variables on the health event rates using a log-linear main effects model. Results of fitting the main effects model are summarized in a tabular and graphical form and interpretation of model parameters is provided. An analysis of deviance table is used to evaluate the importance of each of the explanatory variables on the event rate of interest and to determine if interaction terms should be considered in the analysis. Although Poisson regression methods are widely used in the analysis of count data, there are situations in which over-dispersion occurs. This could be due to lack-of-fit of the regression model, extra-Poisson variation, or both. A score test statistic and regression diagnostics are used to identify over-dispersion. A quasi-likelihood method of moments procedure is used to evaluate and adjust for extra-Poisson variation when necessary. Two examples are presented using respiratory disease absence rates at two DOE sites to illustrate the methods and interpretation of the results. In the first example the Poisson main effects model is adequate. In the second example the score test indicates considerable over-dispersion and a more detailed analysis attributes the over-dispersion to extra
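    The log-linear main effects model with person-time at risk described in this abstract can be sketched in Python as follows. The sketch is illustrative only: the counts and person-time are invented, the fit uses a hand-written iteratively reweighted least squares loop rather than any particular package, and over-dispersion is gauged with the Pearson chi-square divided by its degrees of freedom, a common moment estimate that stands in for the score test and quasi-likelihood procedure the authors actually use.

```python
import numpy as np


def fit_poisson(X, y, offset, n_iter=50, tol=1e-10):
    """Fit log(mu) = offset + X @ beta by iteratively reweighted least squares."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = offset + X @ beta
        mu = np.exp(eta)
        z = (eta - offset) + (y - mu) / mu  # working response with offset removed
        XtWX = X.T @ (mu[:, None] * X)      # Poisson working weights equal mu
        XtWz = X.T @ (mu * z)
        new_beta = np.linalg.solve(XtWX, XtWz)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta


def pearson_dispersion(X, y, offset, beta):
    """Pearson chi-square / residual df; values well above 1 hint at over-dispersion."""
    mu = np.exp(offset + X @ beta)
    return float(np.sum((y - mu) ** 2 / mu) / (len(y) - X.shape[1]))


# Hypothetical stratified table: event counts and person-time for two groups.
events = np.array([5.0, 12.0, 15.0, 4.0])
person_time = np.array([100.0, 200.0, 150.0, 50.0])
group = np.array([0.0, 0.0, 1.0, 1.0])

X = np.column_stack([np.ones(4), group])  # intercept + group indicator
offset = np.log(person_time)              # makes the model a model for rates

beta = fit_poisson(X, events, offset)
print("baseline rate:", np.exp(beta[0]))
print("rate ratio (group 1 vs 0):", np.exp(beta[1]))
print("dispersion:", pearson_dispersion(X, events, offset, beta))
```

    With a single two-level factor, the fitted rates reduce to total events divided by total person-time within each group, which gives a simple check on the fit.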

  13. Multiple linear regression to estimate time-frequency electrophysiological responses in single trials.

    Science.gov (United States)

    Hu, L; Zhang, Z G; Mouraux, A; Iannetti, G D

    2015-05-01

    Transient sensory, motor or cognitive events not only elicit phase-locked event-related potentials (ERPs) in the ongoing electroencephalogram (EEG), but also induce non-phase-locked modulations of ongoing EEG oscillations. These modulations can be detected when single-trial waveforms are analysed in the time-frequency domain, and consist in stimulus-induced decreases (event-related desynchronization, ERD) or increases (event-related synchronization, ERS) of synchrony in the activity of the underlying neuronal populations. ERD and ERS reflect changes in the parameters that control oscillations in neuronal networks and, depending on the frequency at which they occur, represent neuronal mechanisms involved in cortical activation, inhibition and binding. ERD and ERS are commonly estimated by averaging the time-frequency decomposition of single trials. However, their trial-to-trial variability, which can reflect physiologically important information, is lost by across-trial averaging. Here, we aim to (1) develop novel approaches to explore single-trial parameters (including latency, frequency and magnitude) of ERP/ERD/ERS; (2) disclose the relationship between estimated single-trial parameters and other experimental factors (e.g., perceived intensity). We found that (1) stimulus-elicited ERP/ERD/ERS can be correctly separated using principal component analysis (PCA) decomposition with Varimax rotation on the single-trial time-frequency distributions; (2) time-frequency multiple linear regression with dispersion term (TF-MLRd) enhances the signal-to-noise ratio of ERP/ERD/ERS in single trials, and provides an unbiased estimation of their latency, frequency, and magnitude at single-trial level; (3) these estimates can be meaningfully correlated with each other and with other experimental factors at single-trial level (e.g., perceived stimulus intensity and ERP magnitude). The methods described in this article allow exploring fully non-phase-locked stimulus-induced cortical

  14. Regression modeling strategies with applications to linear models, logistic and ordinal regression, and survival analysis

    CERN Document Server

    Harrell, Jr., Frank E.

    2015-01-01

    This highly anticipated second edition features new chapters and sections, 225 new references, and comprehensive R software. In keeping with the previous edition, this book is about the art and science of data analysis and predictive modeling, which entails choosing and using multiple tools. Instead of presenting isolated techniques, this text emphasizes problem solving strategies that address the many issues arising when developing multivariable models using real data and not standard textbook examples. It includes imputation methods for dealing with missing data effectively, methods for fitting nonlinear relationships and for making the estimation of transformations a formal part of the modeling process, methods for dealing with "too many variables to analyze and not enough observations," and powerful model validation techniques based on the bootstrap.  The reader will gain a keen understanding of predictive accuracy, and the harm of categorizing continuous predictors or outcomes.  This text realistically...

  15. A regressed phase analysis for coupled joint systems.

    Science.gov (United States)

    Wininger, Michael

    2011-01-01

    This study aims to address shortcomings of the relative phase analysis, a widely used method for assessment of coupling among joints of the lower limb. Goniometric data from 15 individuals with spastic diplegic cerebral palsy were recorded from the hip and knee joints during ambulation on a flat surface, and from a single healthy individual with no known motor impairment, over at least 10 gait cycles. The minimum relative phase (MRP) revealed substantial disparity in the timing and severity of the instance of maximum coupling, depending on which reference frame was selected: MRP(knee-hip) differed from MRP(hip-knee) by 16.1±14% of gait cycle and 50.6±77% difference in scale. Additionally, several relative phase portraits contained discontinuities which may contribute to error in phase feature extraction. These vagaries can be attributed to the predication of relative phase analysis on a transformation into the velocity-position phase plane, and the extraction of phase angle by the discontinuous arc-tangent operator. Here, an alternative phase analysis is proposed, wherein kinematic data is transformed into a profile of joint coupling across the entire gait cycle. By comparing joint velocities directly via a standard linear regression in the velocity-velocity phase plane, this regressed phase analysis provides several key advantages over relative phase analysis including continuity, commutativity between reference frames, and generalizability to many-joint systems.

  16. Using the Coefficient of Determination R² to Test the Significance of Multiple Linear Regression

    Science.gov (United States)

    Quinino, Roberto C.; Reis, Edna A.; Bessegato, Lupercio F.

    2013-01-01

    This article proposes the use of the coefficient of determination as a statistic for hypothesis testing in multiple linear regression based on distributions acquired by beta sampling. (Contains 3 figures.)
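    A standard result helps make this abstract concrete: under the null hypothesis that all k slope coefficients are zero, and with normally distributed errors, R² from a regression on n observations follows a Beta(k/2, (n - k - 1)/2) distribution. The Python sketch below estimates a p-value by drawing from that beta distribution. It is an illustration of the general idea rather than the article's exact procedure; the function name `r2_p_value`, the number of draws and the example figures are assumptions.

```python
import numpy as np


def r2_p_value(r2, n, k, draws=200_000, seed=0):
    """Monte Carlo p-value for an observed R^2 with n observations and k predictors.

    Draws from the null distribution Beta(k/2, (n - k - 1)/2) and returns the
    share of draws that are at least as large as the observed R^2.
    """
    rng = np.random.default_rng(seed)
    null_r2 = rng.beta(k / 2.0, (n - k - 1) / 2.0, size=draws)
    return float(np.mean(null_r2 >= r2))


# Hypothetical example: R^2 = 0.30 from 23 observations and 2 predictors.
p_value = r2_p_value(0.30, n=23, k=2)
print(p_value)
```

    For k = 2 the null distribution is Beta(1, (n - 3)/2), so the exact tail probability is (1 - R²)^((n - 3)/2), which offers a closed-form check on the simulated value.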

  17. Monitoring heavy metal Cr in soil based on hyperspectral data using regression analysis

    Science.gov (United States)

    Zhang, Ningyu; Xu, Fuyun; Zhuang, Shidong; He, Changwei

    2016-10-01

    Heavy metal pollution in soils is one of the most critical problems in global ecology and environmental safety today. Hyperspectral remote sensing offers high speed, low cost, low risk and little damage, and provides a good method for detecting heavy metals in soil. This paper proposes applying stepwise multiple regression between spectral data and the measured content of the heavy metal Cr at soil sample points, for use in environmental protection. In the measurement, a FieldSpec HandHeld spectroradiometer is used to collect reflectance spectra of sample points over the wavelength range of 325-1075 nm. The spectral data measured by the spectroradiometer are then preprocessed to reduce the influence of external factors; the preprocessing methods include the first-order differential, the second-order differential and the continuum removal method. Stepwise multiple regression equations are established accordingly, and the accuracy of each equation is tested. The results show that the equation based on the first-order differential is the most accurate, which makes it feasible to predict the content of the heavy metal Cr by using stepwise multiple regression.

  18. An Investigation of the Relationship of Intellective and Personality Variables to Success in an Independent Study Science Course Through the Use of a Modified Multiple Regression Model.

    Science.gov (United States)

    Szabo, Michael; Feldhusen, John F.

    This is an empirical study of selected learner characteristics and their relation to academic success, as indicated by course grades, in a structured independent study learning program. This program, called the Audio-Tutorial System, was utilized in an undergraduate college course in the biological sciences. By use of multiple regression analysis,…

  19. Multiple linear regression models of urban runoff pollutant load and event mean concentration considering rainfall variables.

    Science.gov (United States)

    Maniquiz, Marla C; Lee, Soyoung; Kim, Lee-Hyung

    2010-01-01

    Rainfall is an important factor in estimating the event mean concentration (EMC), which is used to quantify the washed-off pollutant concentrations from non-point sources (NPSs). Pollutant loads can also be calculated using rainfall, catchment area and runoff coefficient. In this study, runoff quantity and quality data gathered from 28 months of monitoring at road and parking lot sites in Korea were evaluated using multiple linear regression (MLR) to develop equations for estimating pollutant loads and EMCs as a function of rainfall variables. The results revealed that total event rainfall and average rainfall intensity are possible predictors of pollutant loads. Overall, the models reflect the high uncertainties of NPSs; accurate estimates of EMCs and loads may require water quality sampling, or long-term monitoring to gather more data that can be used for the development of estimation models.

  20. Genetic-algorithm-based multiple regression with fuzzy inference system for detection of nocturnal hypoglycemic episodes.

    Science.gov (United States)

    Ling, Steve S H; Nguyen, Hung T

    2011-03-01

    Hypoglycemia, or low blood glucose, is dangerous and can result in unconsciousness, seizures, and even death. It is a common and serious side effect of insulin therapy in patients with diabetes. The hypoglycemic monitor is a noninvasive monitor that measures some physiological parameters continuously to provide detection of hypoglycemic episodes in patients with type 1 diabetes mellitus (T1DM). Based on heart rate (HR), corrected QT interval of the ECG signal, change of HR, and change of corrected QT interval, we develop a genetic algorithm (GA)-based multiple regression with fuzzy inference system (FIS) to classify the presence of hypoglycemic episodes. GA is used to find the optimal fuzzy rules and membership functions of the FIS and the model parameters of the regression method. In a clinical study of 16 children with T1DM, the natural occurrence of nocturnal hypoglycemic episodes is associated with HRs and corrected QT intervals. The overall data were organized into a training set (eight patients) and a testing set (another eight patients), randomly selected. The results show that the proposed algorithm achieves good sensitivity with acceptable specificity.

  1. Predicting Fuel Ignition Quality Using 1H NMR Spectroscopy and Multiple Linear Regression

    KAUST Repository

    Abdul Jameel, Abdul Gani

    2016-09-14

    An improved model for the prediction of ignition quality of hydrocarbon fuels has been developed using 1H nuclear magnetic resonance (NMR) spectroscopy and multiple linear regression (MLR) modeling. Cetane number (CN) and derived cetane number (DCN) of 71 pure hydrocarbons and 54 hydrocarbon blends were utilized as a data set to study the relationship between ignition quality and molecular structure. CN and DCN are functional equivalents and are collectively referred to as D/CN herein. The effect of molecular weight and weight percent of structural parameters such as paraffinic CH3 groups, paraffinic CH2 groups, paraffinic CH groups, olefinic CH–CH2 groups, naphthenic CH–CH2 groups, and aromatic C–CH groups on D/CN was studied. Particular emphasis was placed on the effect of branching (i.e., methyl substitution) on the D/CN, and a new parameter denoted as the branching index (BI) was introduced to quantify this effect. A new formula was developed to calculate the BI of hydrocarbon fuels using 1H NMR spectroscopy. Multiple linear regression (MLR) modeling was used to develop an empirical relationship between D/CN and the eight structural parameters. This was then used to predict the DCN of many hydrocarbon fuels. The developed model has a high correlation coefficient (R² = 0.97) and was validated with experimentally measured DCN of twenty-two real fuel mixtures (e.g., gasolines and diesels) and fifty-nine blends of known composition, and the predicted values matched well with the experimental data.

  2. Meta-regression Analysis of the Chinese Labor Reallocation Effect

    Institute of Scientific and Technical Information of China (English)

    Longhua; YUE; Shiyan; YANG; Rongtai; SHEN

    2013-01-01

    Meta-regression analysis was applied to 23 papers on the effect of Chinese labor reallocation on economic growth. The results showed that the choice between the method of the World Bank (1996) and that of M. Syrquin (1986) had little impact on the results, while the calculation of the stock of physical capital had a positive impact on the results. Results obtained from panel data were larger than those obtained from time series data. The time span had little influence on the results. Therefore, it is necessary to measure the stock of physical capital in China accurately in order to evaluate the Chinese labor reallocation effect.

  3. Multivariate study and regression analysis of gluten-free granola

    Directory of Open Access Journals (Sweden)

    Lilian Maria Pagamunici

    2014-03-01

    This study developed a gluten-free granola and evaluated it during storage with the application of multivariate and regression analysis of the sensory and instrumental parameters. The physicochemical, sensory, and nutritional characteristics of a product containing quinoa, amaranth and linseed were evaluated. The crude protein and lipid contents were 97.49 and 122.72 g kg-1 of food, respectively. The polyunsaturated/saturated and n-6:n-3 fatty acid ratios were 2.82 and 2.59:1, respectively. Granola had the best alpha-linolenic acid content, nutritional indices in the lipid fraction, and mineral content. There were good hygienic and sanitary conditions during storage, probably due to the low water activity of the formulation, which helped to inhibit microbial growth. The sensory attributes ranged from 'like very much' to 'like slightly', and the regression models were highly fitted and correlated during the storage period. A reduction in the sensory attribute levels and in the product physical stabilisation was verified by principal component analysis. The use of the affective acceptance test and instrumental analysis combined with statistical methods allowed us to obtain promising results about the characteristics of gluten-free granola.

  4. The Comparison between Multiple Linear Regression Analysis and Path Analysis on the Influencing Factors of Subjective Wellbeing among Urban and Rural Residents

    Institute of Scientific and Technical Information of China (English)

    徐曼; 柴云; 李涛; 卢丽; 刘冰

    2015-01-01

    Objective: To explore in depth the factors influencing subjective well-being among urban and rural residents, and the interactions among those factors, using multiple linear regression and path analysis. Methods: Using simple random sampling, 1480 urban and rural residents were surveyed with the Campbell subjective well-being index scale, and the data were analyzed and compared using multiple linear regression and path analysis. Results: The mean overall well-being index score of urban and rural residents was 11.17 ± 1.99. Multiple linear regression analysis showed that future goals, stress coping style, self-rated health status, hobbies, urban or rural residence, and leisure time were predictors of the well-being index, with standardized partial regression coefficients of 0.261, 0.182, 0.152, 0.066, 0.071 and 0.051, respectively. Path analysis found that future goals, stress coping style and self-rated health status acted directly on the well-being index, with path coefficients of 0.285, 0.191 and 0.160, while hobbies, personal monthly income, education level and age acted indirectly on the well-being index, with indirect effects of 0.08, -0.04, 0.10 and -0.07. Conclusion: Subjective well-being is related to multiple internal and external factors, including individual physiological, psychological and social factors. Multiple linear regression and path analysis each have their own emphasis and complement each other in exploring the factors that influence residents' subjective well-being and the relationships among them.

  5. Demand Forecast of the Chinese Luxury Consumption Based on the Multiple Linear Regression Analysis

    Institute of Scientific and Technical Information of China (English)

    张海宁

    2013-01-01

    With the rapid growth of Chinese luxury consumption, the Chinese luxury goods market has gradually become a battleground for international brands. Meeting market demand must be based on forecasting. This paper studies the factors that influence luxury consumption: GDP, the household consumption level, total tourism spending by domestic tourists, and tax revenue. Based on the sample data, a multiple linear regression forecasting model is built and its predictions are tested. The results show that the model's goodness of fit and significance meet the requirements, indicating that the model forecasts Chinese luxury consumption accurately.

  6. Optimum short-time polynomial regression for signal analysis

    Indian Academy of Sciences (India)

    A SREENIVASA MURTHY; CHANDRA SEKHAR SEELAMANTULA; T V SREENIVAS

    2016-11-01

    We propose a short-time polynomial regression (STPR) for time-varying signal analysis. The advantage of using polynomials is that the notion of a spectrum is not needed and the signals can be analyzed in the time domain over short durations. In the presence of noise, such modeling becomes important, because the polynomial approximation performs smoothing leading to noise suppression. The problem of optimal smoothing depends on the duration over which a fixed-order polynomial regression is performed. Considering the STPR of a noisy signal, we derive the optimal smoothing window by minimizing the mean-square error (MSE). For a fixed polynomial order, the smoothing window duration depends on the rate of signal variation, which, in turn, depends on its derivatives. Since the derivatives are not available a priori, exact optimization is not feasible. However, approximate optimization can be achieved using only the variance expressions and the intersection-of-confidence-intervals (ICI) technique. The ICI technique is based on a consistency measure across confidence intervals corresponding to different window lengths. An approximate asymptotic analysis to determine the optimal confidence interval width shows that the asymptotic expressions are the same irrespective of whether one starts with a uniform sampling grid or a nonuniform one. Simulation results on sinusoids, chirps, and electrocardiogram (ECG) signals, and comparisons with standard wavelet denoising techniques, show that the proposed method is robust particularly in the low signal-to-noise ratio regime.
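
    The core smoothing step described in this abstract can be sketched in a few lines. This is only a minimal illustration under stated assumptions: the window half-width is fixed by hand rather than chosen by the ICI rule, and the function name `short_time_polyfit`, the test signal and the noise level are hypothetical choices, not taken from the paper.

```python
import numpy as np

def short_time_polyfit(t, x, half_width=10, order=2):
    """Fit a polynomial of the given order in a sliding window around each
    sample and keep the fitted value at the window centre."""
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    smoothed = np.empty_like(x)
    for i in range(len(x)):
        lo = max(0, i - half_width)           # window is truncated at the edges
        hi = min(len(x), i + half_width + 1)
        # Centre the time axis on t[i] so the fit stays well conditioned.
        coeffs = np.polyfit(t[lo:hi] - t[i], x[lo:hi], order)
        smoothed[i] = np.polyval(coeffs, 0.0)
    return smoothed

# Hypothetical test signal: a 3 Hz sinusoid in white Gaussian noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 400)
clean = np.sin(2 * np.pi * 3 * t)
noisy = clean + 0.5 * rng.standard_normal(t.size)
smoothed = short_time_polyfit(t, noisy, half_width=10, order=2)

mse_noisy = np.mean((noisy - clean) ** 2)
mse_smoothed = np.mean((smoothed - clean) ** 2)
```

    A longer window lowers the variance of the estimate but increases the bias when the signal changes quickly, which is the trade-off the optimal window in the abstract is meant to balance.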

  7. A New Measurement Equivalence Technique Based on Latent Class Regression as Compared with Multiple Indicators Multiple Causes

    Science.gov (United States)

    Jamali, Jamshid; Ayatollahi, Seyyed Mohammad Taghi; Jafari, Peyman

    2016-01-01

    Background: Measurement equivalence is an essential prerequisite for making valid comparisons in mental health questionnaires across groups. In most methods used for assessing measurement equivalence, which is known as Differential Item Functioning (DIF), latent variables are assumed to be continuous. Objective: To compare a new method called Latent Class Regression (LCR), designed for discrete latent variables, with the multiple indicators multiple causes (MIMIC) model, a continuous latent variable technique, in assessing the measurement equivalence of the 12-item General Health Questionnaire (GHQ-12) across different subgroups of Iranian nurses. Methods: A cross-sectional survey was conducted in 2014 among 771 nurses working in the hospitals of Fars and Bushehr provinces of southern Iran. To identify Minor Psychiatric Disorders (MPD), the nurses completed self-report GHQ-12 questionnaires and sociodemographic questions. Two uniform-DIF detection methods, LCR and MIMIC, were applied for comparability when the GHQ-12 score was assumed to be discrete and continuous, respectively. Results: The result of fitting LCR with 2 classes indicated that 27.4% of the nurses had MPD. Gender was identified as an influential factor of the level of MPD. LCR and MIMIC agreed on the detection of DIF and DIF-free items by gender, age, education and marital status in 83.3, 100.0, 91.7 and 83.3% of cases, respectively. Conclusions: The results indicated that the GHQ-12 is, to a great degree, an invariant measure for the assessment of MPD among nurses. High convergence between the two methods suggests using the LCR approach in cases of a discrete latent variable, e.g. GHQ-12, and adequate sample size. PMID:27482129

  8. QSRR Study of GC Retention Indices of Volatile Compounds Emitted from Mosla chinensis Maxim by Multiple Linear Regression

    Institute of Scientific and Technical Information of China (English)

    曹慧; 李祖光; 陈小珍

    2011-01-01

    The volatile compounds emitted from Mosla chinensis Maxim were analyzed by headspace solid-phase microextraction (HS-SPME) and headspace liquid-phase microextraction (HS-LPME) combined with gas chromatography-mass spectrometry (GC-MS). The main volatiles from Mosla chinensis Maxim were studied in this paper. It can be seen that 61 compounds were separated and identified. Forty-nine volatile compounds were identified by SPME method, mainly including myrcene, α-terpinene, p-cymene, (E)-ocimene, thymol, thymol acetate and (E)-β-farnesene. Forty-five major volatile compounds were identified by LPME method, including α-thujene, α-pinene, camphene, butanoic acid, 2-methylpropyl ester, myrcene, butanoic acid, butyl ester, α-terpinene, p-cymene, (E)-ocimene, butane, 1,1-dibutoxy-, thymol, thymol acetate and (E)-β-farnesene. After analyzing the volatile compounds, multiple linear regression (MLR) method was used for building the regression model. Then the quantitative structure-retention relationship (QSRR) model was validated by predictive-ability test. The prediction results were in good agreement with the experimental values. The results demonstrated that headspace SPME-GC-MS and LPME-GC-MS are simple, rapid and easy sample enrichment techniques suitable for analysis of volatile compounds. This investigation provided an effective method for predicting the retention indices of new compounds even in the absence of the standard candidates.

  9. Effect size and power in assessing moderating effects of categorical variables using multiple regression: a 30-year review.

    Science.gov (United States)

    Aguinis, Herman; Beaty, James C; Boik, Robert J; Pierce, Charles A

    2005-01-01

    The authors conducted a 30-year review (1969-1998) of the size of moderating effects of categorical variables as assessed using multiple regression. The median observed effect size (f²) is only .002, but 72% of the moderator tests reviewed had power of .80 or greater to detect a targeted effect conventionally defined as small. Results suggest the need to minimize the influence of artifacts that produce a downward bias in the observed effect size and put into question the use of conventional definitions of moderating effect sizes. As long as an effect has a meaningful impact, the authors advise researchers to conduct a power analysis and plan future research designs on the basis of smaller and more realistic targeted effect sizes.

  10. Artificial neural networks and multiple linear regression model using principal components to estimate rainfall over South America

    Science.gov (United States)

    Soares dos Santos, T.; Mendes, D.; Rodrigues Torres, R.

    2016-01-01

    Several studies have been devoted to dynamic and statistical downscaling for analysis of both climate variability and climate change. This paper introduces an application of artificial neural networks (ANNs) and multiple linear regression (MLR) by principal components to estimate rainfall in South America. This method is proposed for downscaling monthly precipitation time series over South America for three regions: the Amazon; northeastern Brazil; and the La Plata Basin, which is one of the regions of the planet that will be most affected by the climate change projected for the end of the 21st century. The downscaling models were developed and validated using CMIP5 model output and observed monthly precipitation. We used general circulation model (GCM) experiments for the 20th century (RCP historical; 1970-1999) and two scenarios (RCP 2.6 and 8.5; 2070-2100). The model test results indicate that the ANNs significantly outperform the MLR downscaling of monthly precipitation variability.
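
    The "MLR by principal components" idea mentioned in this abstract can be illustrated with a small principal component regression sketch. It is only an assumption-laden illustration: the function name `pcr_fit_predict` and the synthetic predictors are hypothetical, and the paper's actual predictors, preprocessing and number of retained components are not given in the abstract.

```python
import numpy as np

def pcr_fit_predict(X_train, y_train, X_new, n_components):
    """Principal component regression: project centred predictors onto the
    leading principal directions, then run least squares on the scores."""
    mean_x = X_train.mean(axis=0)
    mean_y = y_train.mean()
    Xc = X_train - mean_x
    # Rows of Vt are the principal directions of the centred predictors.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T
    scores = Xc @ V
    gamma, *_ = np.linalg.lstsq(scores, y_train - mean_y, rcond=None)
    return mean_y + (X_new - mean_x) @ V @ gamma

# Synthetic stand-in for large-scale predictors and a local rainfall series.
rng = np.random.default_rng(1)
X = rng.standard_normal((50, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.1 * rng.standard_normal(50)

pred_all = pcr_fit_predict(X, y, X, n_components=4)   # keeps every component
pred_one = pcr_fit_predict(X, y, X, n_components=1)   # keeps only the first

# Ordinary least squares with an intercept, for comparison.
A = np.column_stack([np.ones(len(y)), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
pred_ols = A @ beta
```

    When every component is kept, the predictions coincide with ordinary multiple linear regression; dropping trailing components trades some fit for stability when predictors are strongly correlated.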

  11. Polygraph Test Results Assessment by Regression Analysis Methods

    Directory of Open Access Journals (Sweden)

    K. A. Leontiev

    2014-01-01

    Full Text Available The paper considers a problem of defining the importance of asked questions for the examinee under judicial and psychophysiological polygraph examination by methods of mathematical statistics. It offers the classification algorithm based on the logistic regression as an optimum Bayesian classifier, considering weight coefficients of information for the polygraph-recorded physiological parameters with no condition for independence of the measured signs. Actually, binary classification is executed by results of polygraph examination with preliminary normalization and standardization of primary results, with check of a hypothesis that distribution of obtained data is normal, as well as with calculation of coefficients of linear regression between input values and responses by method of maximum likelihood. Further, the logistic curve divided signs into two classes of the "significant" and "insignificant" type. Efficiency of the model is estimated by means of ROC analysis (Receiver Operator Characteristics). It is shown that the necessary minimum sample has to contain results of at least 45 measurements. This approach ensures a reliable result provided that an expert-polygraphologist possesses sufficient qualification and follows testing techniques.
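
    The ROC analysis used here to assess a binary classifier is often summarized by the area under the curve (AUC). The sketch below computes it from pairwise comparisons; the function name `roc_auc` and the four example scores are hypothetical and are not data from the paper.

```python
import numpy as np

def roc_auc(scores, labels):
    """Area under the ROC curve: the fraction of (positive, negative) pairs
    in which the positive case gets the higher score, ties counted as 0.5."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos = scores[labels]
    neg = scores[~labels]
    diff = pos[:, None] - neg[None, :]   # every positive against every negative
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (pos.size * neg.size)

# Hypothetical classifier outputs for two "insignificant" (0) and
# two "significant" (1) responses.
auc = roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

    An AUC of 0.5 corresponds to chance-level separation and 1.0 to perfect separation of the two classes.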

  12. Logistic Regression Analysis Using Multiple Risk Factors for Neonatal Hypoglycemia

    Institute of Scientific and Technical Information of China (English)

    宝凌云; 易欣; 高瑾; 胡熙; 杜琨

    2016-01-01

    Objective To explore the risk factors for neonatal hypoglycemia. Methods The clinical data of 340 neonates admitted to our hospital from July 2013 to July 2015 were retrospectively analyzed. Neonates were grouped according to blood glucose level, with blood glucose < 2.2 mmol/L defined as hypoglycemia, giving 32 neonates with hypoglycemia and 308 normal neonates. Pearson single-factor analysis and a multivariate logistic regression model were used to analyze the risk factors related to neonatal hypoglycemia. Results Between the normal and hypoglycemia groups, differences in neonatal condition (birth weight, prematurity and full-term SGA), maternal perinatal condition (maternal age, pregnancy-induced hypertension) and neonatal complications (neonatal respiratory disease, asphyxia, congenital heart disease, hemorrhagic disease, infectious disease, hyperbilirubinemia, hypothyroidism) were statistically significant (P < 0.05 to 0.01). The risk factors for neonatal hypoglycemia were neonatal condition (birth weight, prematurity and full-term SGA), maternal perinatal condition (maternal age, pregnancy-induced hypertension) and neonatal complications (neonatal asphyxia, congenital heart disease and hyperbilirubinemia). Conclusion The risk factors related to neonatal hypoglycemia were birth weight, prematurity and full-term small for gestational age, maternal age, pregnancy-induced hypertension, neonatal asphyxia, congenital heart disease and hyperbilirubinemia; controlling these factors can provide a scientific basis for the effective prevention of neonatal hypoglycemia.

  13. A Visual Analytics Approach for Correlation, Classification, and Regression Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Steed, Chad A [ORNL; SwanII, J. Edward [Mississippi State University (MSU); Fitzpatrick, Patrick J. [Mississippi State University (MSU); Jankun-Kelly, T.J. [Mississippi State University (MSU)

    2012-02-01

    New approaches that combine the strengths of humans and machines are necessary to equip analysts with the proper tools for exploring today's increasingly complex, multivariate data sets. In this paper, a novel visual data mining framework, called the Multidimensional Data eXplorer (MDX), is described that addresses the challenges of today's data by combining automated statistical analytics with a highly interactive parallel-coordinates-based canvas. In addition to several intuitive interaction capabilities, this framework offers a rich set of graphical statistical indicators, interactive regression analysis, visual correlation mining, automated axis arrangements and filtering, and data classification techniques. The current work provides a detailed description of the system as well as a discussion of key design aspects and critical feedback from domain experts.

  14. Multivariate Regression Analysis of Gravitational Waves from Rotating Core Collapse

    CERN Document Server

    Engels, William J; Ott, Christian D

    2014-01-01

    We present a new multivariate regression model for analysis and parameter estimation of gravitational waves observed from well but not perfectly modeled sources such as core-collapse supernovae. Our approach is based on a principal component decomposition of simulated waveform catalogs. Instead of reconstructing waveforms by direct linear combination of physically meaningless principal components, we solve via least squares for the relationship that encodes the connection between chosen physical parameters and the principal component basis. Although our approach is linear, the waveforms' parameter dependence may be non-linear. For the case of gravitational waves from rotating core collapse, we show, using statistical hypothesis testing, that our method is capable of identifying the most important physical parameters that govern waveform morphology in the presence of simulated detector noise. We also demonstrate our method's ability to predict waveforms from a principal component basis given a set of physical ...

  15. Cardiorespiratory fitness and laboratory stress: a meta-regression analysis.

    Science.gov (United States)

    Jackson, Erica M; Dishman, Rod K

    2006-01-01

    We performed a meta-regression analysis of 73 studies that examined whether cardiorespiratory fitness mitigates cardiovascular responses during and after acute laboratory stress in humans. The cumulative evidence indicates that fitness is related to slightly greater reactivity, but better recovery. However, effects varied according to several study features and were smallest in the better controlled studies. Fitness did not mitigate integrated stress responses such as heart rate and blood pressure, which were the focus of most of the studies we reviewed. Nonetheless, potentially important areas, particularly hemodynamic and vascular responses, have been understudied. Women, racial/ethnic groups, and cardiovascular patients were underrepresented. Randomized controlled trials, including naturalistic studies of real-life responses, are needed to clarify whether a change in fitness alters putative stress mechanisms linked with cardiovascular health.

  16. Spatial regression analysis of traffic crashes in Seoul.

    Science.gov (United States)

    Rhee, Kyoung-Ah; Kim, Joon-Ki; Lee, Young-ihn; Ulfarsson, Gudmundur F

    2016-06-01

    Traffic crashes can be spatially correlated events and the analysis of the distribution of traffic crash frequency requires evaluation of parameters that reflect spatial properties and correlation. Typically this spatial aspect of crash data is not used in everyday practice by planning agencies and this contributes to a gap between research and practice. A database of traffic crashes in Seoul, Korea, in 2010 was developed at the traffic analysis zone (TAZ) level with a number of GIS developed spatial variables. Practical spatial models using available software were estimated. The spatial error model was determined to be better than the spatial lag model and an ordinary least squares baseline regression. A geographically weighted regression model provided useful insights about localization of effects. The results found that an increased length of roads with speed limit below 30 km/h and a higher ratio of residents below age of 15 were correlated with lower traffic crash frequency, while a higher ratio of residents who moved to the TAZ, more vehicle-kilometers traveled, and a greater number of access points with speed limit difference between side roads and mainline above 30 km/h all increased the number of traffic crashes. This suggests, for example, that better control or design for merging lower speed roads with higher speed roads is important. A key result is that the length of bus-only center lanes had the largest effect on increasing traffic crashes. This is important as bus-only center lanes with bus stop islands have been increasingly used to improve transit times. Hence the potential negative safety impacts of such systems need to be studied further and mitigated through improved design of pedestrian access to center bus stop islands.

  17. Optimization of end-members used in multiple linear regression geochemical mixing models

    Science.gov (United States)

    Dunlea, Ann G.; Murray, Richard W.

    2015-11-01

    Tracking marine sediment provenance (e.g., of dust, ash, hydrothermal material, etc.) provides insight into contemporary ocean processes and helps construct paleoceanographic records. In a simple system with only a few end-members that can be easily quantified by a unique chemical or isotopic signal, chemical ratios and normative calculations can help quantify the flux of sediment from the few sources. In a more complex system (e.g., each element comes from multiple sources), more sophisticated mixing models are required. MATLAB codes published in Pisias et al. solidified the foundation for application of a Constrained Least Squares (CLS) multiple linear regression technique that can use many elements and several end-members in a mixing model. However, rigorous sensitivity testing to check the robustness of the CLS model is time and labor intensive. MATLAB codes provided in this paper reduce the time and labor involved and facilitate finding a robust and stable CLS model. By quickly comparing the goodness of fit between thousands of different end-member combinations, users are able to identify trends in the results that reveal the CLS solution uniqueness and the end-member composition precision required for a good fit. Users can also rapidly check that they have the appropriate number and type of end-members in their model. In the end, these codes improve the user's confidence that the final CLS model(s) they select are the most reliable solutions. These advantages are demonstrated by application of the codes in two case studies of well-studied datasets (Nazca Plate and South Pacific Gyre).

  18. Multiple linear regression models for shear strength prediction and design of simply supported deep beams subjected to symmetrical point loads

    Directory of Open Access Journals (Sweden)

    Panatchai Chetchotisak

    2015-09-01

    Full Text Available Because of nonlinear strain distributions caused by abrupt changes either in geometry or in loading in deep beams, the approach for conventional beams is not applicable. Consequently, the strut-and-tie model (STM) has been applied as the most rational and simple method for strength prediction and design of reinforced concrete deep beams. A deep beam is idealized by the STM as a truss-like structure consisting of diagonal concrete struts and tension ties. Numerous works have proposed STMs for deep beams. However, uncertainty and complexity in shear strength computations of deep beams can be found in some STMs. Therefore, improved methods for predicting the shear strength of deep beams are still needed. Using a large experimental database of 406 deep beam test results covering a wide range of influencing parameters, several shapes and geometries of STM, and six state-of-the-art formulations of the efficiency factors found in the design codes and literature, new STMs for predicting the shear strength of simply supported reinforced concrete deep beams using multiple linear regression analysis are proposed in this paper. Furthermore, the regression diagnostics and the validation process are included in this study. Finally, two numerical examples are also provided for illustration.

  19. Multiple linear regression model for predicting biomass digestibility from structural features.

    Science.gov (United States)

    Zhu, Li; O'Dwyer, Jonathan P; Chang, Vincent S; Granda, Cesar B; Holtzapple, Mark T

    2010-07-01

    A total of 147 model lignocellulose samples with a broad spectrum of structural features (lignin contents, acetyl contents, and crystallinity indices) were hydrolyzed with a wide range of cellulase loadings during 1-, 6-, and 72-h hydrolysis periods. Carbohydrate conversions at 1, 6, and 72 h were linearly proportional to the logarithm of cellulase loadings from approximately 10% to 90% conversion, indicating that the simplified HCH-1 model is valid for predicting lignocellulose digestibility. The HCH-1 model is a modified Michaelis-Menten model that accounts for the fraction of insoluble substrate available to bind with enzyme. The slopes and intercepts of a simplified HCH-1 model were correlated with structural features using multiple linear regression (MLR) models. The agreement between the measured and predicted 1-, 6-, and 72-h slopes and intercepts of glucan, xylan, and total sugar hydrolyses indicates that lignin content, acetyl content, and cellulose crystallinity are key factors that determine biomass digestibility. The 1-, 6-, and 72-h glucan, xylan, and total sugar conversions predicted from structural features using MLR models and the simplified HCH-1 model fit satisfactorily with the measured data (R² approximately 1.0). The parameter selection suggests that lignin content and cellulose crystallinity affect digestibility more strongly than acetyl content. Cellulose crystallinity has greater influence during short hydrolysis periods whereas lignin content has more influence during longer hydrolysis periods. Cellulose crystallinity shows more influence on glucan hydrolysis whereas lignin content affects xylan hydrolysis to a greater extent.

  20. [Clinical research XX. From clinical judgment to multiple logistic regression model].

    Science.gov (United States)

    Berea-Baltierra, Ricardo; Rivas-Ruiz, Rodolfo; Pérez-Rodríguez, Marcela; Palacios-Cruz, Lino; Moreno, Jorge; Talavera, Juan O

    2014-01-01

    The complexity of the causality phenomenon in clinical practice implies that the result of a maneuver is not solely caused by the maneuver, but by the interaction among the maneuver and other baseline factors or variables occurring during the maneuver. This requires methodological designs that allow the evaluation of these variables. When the outcome is a binary variable, we use the multiple logistic regression model (MLRM). This multivariate model is useful when we want to predict or explain the effect of a maneuver or exposure on the outcome while adjusting for the effect of several risk factors. In order to perform an MLRM, the outcome or dependent variable must be a binary variable and both categories must be mutually exclusive (i.e. alive/dead, healthy/ill); on the other hand, independent variables or risk factors may be either qualitative or quantitative. The effect measure obtained from this model is the odds ratio (OR) with 95% confidence intervals (CI), from which we can estimate the proportion of the outcome's variability explained through the risk factors. For these reasons, the MLRM is used in clinical research, since one of the main objectives in clinical practice comprises the ability to predict or explain an event where different risk or prognostic factors are taken into account.
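
    How an odds ratio with a 95% confidence interval comes out of a logistic regression can be sketched as follows. This is a minimal illustration, assuming a plain Newton-Raphson fit and a Wald interval; the function names `fit_logistic` and `odds_ratios` and the 2x2 example counts are hypothetical, not taken from the article.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Maximum-likelihood logistic regression by Newton-Raphson.
    Returns coefficients (intercept first) and their standard errors."""
    A = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-A @ beta))
        hessian = A.T @ (A * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hessian, A.T @ (y - p))
    p = 1.0 / (1.0 + np.exp(-A @ beta))
    cov = np.linalg.inv(A.T @ (A * (p * (1.0 - p))[:, None]))
    return beta, np.sqrt(np.diag(cov))

def odds_ratios(beta, se, z=1.96):
    """Odds ratios with Wald 95% confidence limits (lower, upper)."""
    return np.exp(beta), np.exp(beta - z * se), np.exp(beta + z * se)

# Hypothetical 2x2 table: exposed 30 events / 20 non-events,
# unexposed 10 events / 40 non-events.
exposure = np.repeat([1, 1, 0, 0], [30, 20, 10, 40])
outcome = np.repeat([1, 0, 1, 0], [30, 20, 10, 40])

beta, se = fit_logistic(exposure, outcome)
or_est, or_low, or_high = odds_ratios(beta, se)
# Index 1 holds the exposure effect; index 0 is the intercept.
```

    With a single binary exposure the fitted odds ratio equals the cross-product ratio of the table, (30 x 40) / (20 x 10) = 6; passing a matrix with several columns as `X` gives odds ratios adjusted for the other risk factors.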

  1. QSAR modeling of antimalarial activity of urea derivatives using genetic algorithm–multiple linear regressions

    Directory of Open Access Journals (Sweden)

    Abolghasem Beheshti

    2016-05-01

    Full Text Available A quantitative structure–activity relationship (QSAR) study was performed to analyze antimalarial activities of 68 urea derivatives using multiple linear regression (MLR). QSAR analyses were performed on the available 68 IC50 oral data based on theoretical molecular descriptors. A suitable set of molecular descriptors was calculated to represent the molecular structures of compounds, such as constitutional, topological, geometrical, electrostatic and quantum-chemical descriptors. The important descriptors were selected with the aid of the genetic algorithm (GA) method. The obtained model was validated using leave-one-out (LOO) cross-validation, an external test set and a Y-randomization test. The root mean square errors (RMSE) of the training set and the test set for the GA–MLR model were calculated to be 0.314 and 0.486, and the squared correlation coefficients (R2) were 0.801 and 0.803, respectively. Results showed that the predictive ability of the model was satisfactory, and it can be used for designing a similar group of antimalarial compounds.
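
    Leave-one-out cross-validation of an MLR model, as used for validation in this abstract, can be sketched as below. The function name `loo_rmse` and the synthetic descriptor matrix are hypothetical; the paper's descriptors and GA selection step are not reproduced here.

```python
import numpy as np

def loo_rmse(X, y):
    """Leave-one-out RMSE of a multiple linear regression with intercept:
    refit without each sample in turn and predict the held-out sample."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        beta, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        errors[i] = y[i] - A[i] @ beta
    return np.sqrt(np.mean(errors ** 2))

# Synthetic stand-in for a descriptor matrix and activity values.
rng = np.random.default_rng(2)
X = rng.standard_normal((40, 3))
y = 1.0 + X @ np.array([0.8, -0.5, 0.3]) + 0.2 * rng.standard_normal(40)

cv_rmse = loo_rmse(X, y)

# Training RMSE of the model fitted on all samples, for comparison.
A = np.column_stack([np.ones(len(y)), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
train_rmse = np.sqrt(np.mean((y - A @ beta) ** 2))
```

    The leave-one-out error is never smaller than the training error for a least-squares fit, so a large gap between the two is a common sign of overfitting.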

  2. DESIGNING A FORECAST MODEL FOR ECONOMIC GROWTH OF JAPAN USING COMPETITIVE (HYBRID ANN VS MULTIPLE REGRESSION) MODELS

    Directory of Open Access Journals (Sweden)

    Ahmet DEMIR

    2015-07-01

    Full Text Available Artificial neural network (ANN) models have already been used successfully in many different fields. Many studies show that ANN models provide better results than competing models. But does an ANN still provide optimum solutions when it is proposed as a hybrid model? This research answers that question by using these models to forecast the GDP growth of Japan. Multiple regression models were used as competing models against hybrid ANN (ANN + multiple regression) models. The results show that the hybrid model gives better results than the multiple regression models. In addition, the variables that significantly affected GDP growth were determined, and some of the variables that were assumed to affect the GDP growth of Japan were eliminated statistically.

  3. Regression Analysis of Restricted Mean Survival Time Based on Pseudo-Observations

    DEFF Research Database (Denmark)

    Andersen, Per Kragh; Hansen, Mette Gerster; Klein, John P.

    2004-01-01

    censoring; hazard function; health economics; mean survival time; pseudo-observations; regression model; restricted mean survival time; survival analysis

  4. Regression analysis of restricted mean survival time based on pseudo-observations

    DEFF Research Database (Denmark)

    Andersen, Per Kragh; Hansen, Mette Gerster; Klein, John P.

    censoring; hazard function; health economics; regression model; survival analysis; mean survival time; restricted mean survival time; pseudo-observations

  5. An Analysis of Bank Service Satisfaction Based on Quantile Regression and Grey Relational Analysis

    Directory of Open Access Journals (Sweden)

    Wen-Tsao Pan

    2016-01-01

    Full Text Available Bank service satisfaction is vital to the success of a bank. In this paper, we propose to use the grey relational analysis to gauge the levels of service satisfaction of the banks. With the grey relational analysis, we compared the effects of different variables on service satisfaction. We gave ranks to the banks according to their levels of service satisfaction. We further used the quantile regression model to find the variables that affected the satisfaction of a customer at a specific quantile of satisfaction level. The result of the quantile regression analysis provided a bank manager with information to formulate policies to further promote satisfaction of the customers at different quantiles of satisfaction level. We also compared the prediction accuracies of the regression models at different quantiles. The experiment result showed that, among the seven quantile regression models, the median regression model has the best performance in terms of RMSE, RTIC, and CE performance measures.

  6. Watershed Regressions for Pesticides (WARP) models for predicting stream concentrations of multiple pesticides

    Science.gov (United States)

    Stone, Wesley W.; Crawford, Charles G.; Gilliom, Robert J.

    2013-01-01

    Watershed Regressions for Pesticides for multiple pesticides (WARP-MP) are statistical models developed to predict concentration statistics for a wide range of pesticides in unmonitored streams. The WARP-MP models use the national atrazine WARP models in conjunction with an adjustment factor for each additional pesticide. The WARP-MP models perform best for pesticides with application timing and methods similar to those used with atrazine. For other pesticides, WARP-MP models tend to overpredict concentration statistics for the model development sites. For WARP and WARP-MP, the less-than-ideal sampling frequency for the model development sites leads to underestimation of the shorter-duration concentration; hence, the WARP models tend to underpredict 4- and 21-d maximum moving-average concentrations, with median errors ranging from 9 to 38%. As a result of this sampling bias, pesticides that performed well with the model development sites are expected to have predictions that are biased low for these shorter-duration concentration statistics. The overprediction by WARP-MP apparent for some of the pesticides is variably offset by underestimation of the model development concentration statistics. Of the 112 pesticides used in the WARP-MP application to stream segments nationwide, 25 were predicted to have concentration statistics with a 50% or greater probability of exceeding one or more aquatic life benchmarks in one or more stream segments. Geographically, many of the modeled streams in the Corn Belt Region were predicted to have one or more pesticides that exceeded an aquatic life benchmark during 2009, indicating the potential vulnerability of streams in this region.

  7. This research studies the factors that influence the business success of small ‘processed rotan’ (rattan) businesses. The data are primary data collected from July to August 2013, comprising 30 observations obtained by census. The analysis uses multiple linear regression. The results show that labor, innovation, and promotion jointly have a positive and significant influence on business success. Taken individually, labor has a positive and significant influence, whereas innovation and promotion have positive but insignificant influences.

    OpenAIRE

    Nasution, Inggrita Gusti Sari; Muchtar, Yasmin Chairunnisa

    2013-01-01

    This research is to study the factors which influence the business success of small business ‘processed rotan’. The data employed in the study are primary data within the period of July to August 2013, 30 research observations through census method. Method of analysis used in the study is multiple linear regressions. The results of analysis showed that the factors of labor, innovation and promotion have positive and significant influence on the business success of small busine...

  8. Research on analysis of influencing factors and prediction for scientific and technological outputs: an approach based on multiple linear regression and BP neural network

    Institute of Scientific and Technical Information of China (English)

    胡泽文; 武夷山

    2012-01-01

    First, qualitative methods such as literature research and network investigation are used to identify all possible factors influencing scientific and technological (S&T) outputs, and, subject to data availability, data on S&T output and its influencing factors are collected for the period 1996-2008. Then, based on the collected data, bivariate correlation analysis is used to analyze the relationships between S&T outputs and their influencing factors, and multiple linear regression is used to select the most influential factors and construct a model for analyzing influencing factors and predicting S&T outputs. Finally, based on the results of the bivariate correlation analysis, the highly correlated factors are selected and the currently prevalent BP neural network method is used to predict S&T outputs, and its predictive performance is compared with that of the multiple linear regression model.

  9. Analyzing Regression-Discontinuity Designs with Multiple Assignment Variables: A Comparative Study of Four Estimation Methods

    Science.gov (United States)

    Wong, Vivian C.; Steiner, Peter M.; Cook, Thomas D.

    2013-01-01

    In a traditional regression-discontinuity design (RDD), units are assigned to treatment on the basis of a cutoff score and a continuous assignment variable. The treatment effect is measured at a single cutoff location along the assignment variable. This article introduces the multivariate regression-discontinuity design (MRDD), where multiple…

  10. Design and analysis of experiments classical and regression approaches with SAS

    CERN Document Server

    Onyiah, Leonard C

    2008-01-01

    Introductory Statistical Inference and Regression Analysis Elementary Statistical Inference Regression Analysis Experiments, the Completely Randomized Design (CRD)-Classical and Regression Approaches Experiments Experiments to Compare Treatments Some Basic Ideas Requirements of a Good Experiment One-Way Experimental Layout or the CRD: Design and Analysis Analysis of Experimental Data (Fixed Effects Model) Expected Values for the Sums of Squares The Analysis of Variance (ANOVA) Table Follow-Up Analysis to Check fo

  11. Regression and kriging analysis for grid power factor estimation

    Directory of Open Access Journals (Sweden)

    Rajesh Guntaka

    2014-12-01

    The measurement of power factor (PF) in electrical utility grids is a mainstay of load balancing and is also a critical element of transmission and distribution efficiency. The measurement of PF dates back to the earliest periods of electrical power distribution to public grids. In the wide-area distribution grid, measurement of current waveforms is trivial and may be accomplished at any point in the grid using a current tap transformer. However, voltage measurement requires reference to ground and so is more problematic, and measurements are normally constrained to points that have ready and easy access to a ground source. We present two mathematical analysis methods, based on kriging and on linear least squares estimation (LLSE) (regression), to derive PF at nodes with unknown voltages that are within a perimeter of sample nodes with ground reference across a selected power grid. Our results indicate an average error of 1.884%, which is within acceptable tolerances for PF measurements that are used in load balancing tasks.

  12. Fast nonlinear regression method for CT brain perfusion analysis.

    Science.gov (United States)

    Bennink, Edwin; Oosterbroek, Jaap; Kudo, Kohsuke; Viergever, Max A; Velthuis, Birgitta K; de Jong, Hugo W A M

    2016-04-01

    Although computed tomography (CT) perfusion (CTP) imaging enables rapid diagnosis and prognosis of ischemic stroke, current CTP analysis methods have several shortcomings. We propose a fast nonlinear regression method with a box-shaped model (boxNLR) that has important advantages over the current state-of-the-art method, block-circulant singular value decomposition (bSVD). These advantages include improved robustness to attenuation curve truncation, extensibility, and unified estimation of perfusion parameters. The method is compared with bSVD and with a commercial SVD-based method. The three methods were evaluated quantitatively by means of a digital perfusion phantom, described by Kudo et al., and qualitatively with the aid of 50 clinical CTP scans. All three methods yielded high Pearson correlation coefficients ([Formula: see text]) with the ground truth in the phantom. The boxNLR perfusion maps of the clinical scans showed higher correlation with bSVD than the perfusion maps from the commercial method. Furthermore, it was shown that boxNLR estimates are robust to noise, truncation, and tracer delay. The proposed method provides a fast and reliable way of estimating perfusion parameters from CTP scans. This suggests it could be a viable alternative to current commercial and academic methods.

  13. A simplified procedure of linear regression in a preliminary analysis

    Directory of Open Access Journals (Sweden)

    Silvia Facchinetti

    2013-05-01

    The analysis of a large statistical data set can be guided by the study of a variable of particular interest, Y (the regressed variable), and an explanatory variable X, chosen among the remaining variables that were jointly observed. The study gives a simplified procedure to obtain the functional link y=y(x) between the variables by partitioning the data set into m subsets, in which the observations are summarized by location indices (mean or median) of X and Y. Polynomial models for y(x) of order r are considered to verify the characteristics of the procedure; in particular, we assume r = 1 and 2. The distributions of the parameter estimators are obtained by simulation when the fitting is done for m = r + 1. The results are compared, in terms of distribution and efficiency, with those obtained by ordinary least squares. The study also offers some considerations on the consistency of the parameter estimates obtained by the procedure.
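
    The procedure in this abstract can be illustrated for the simplest case, r = 1 with m = r + 1 = 2 subsets. The sketch below is not the author's code: it assumes the two subsets are formed by ordering the observations on X and splitting them in half, and it uses the mean as the location index. The function name and the sample data are hypothetical.

```python
def group_means_line(x, y):
    """Fit y = a + b*x through the mean points of two subsets (r = 1, m = 2)."""
    # Assumption: subsets are formed by sorting on x and splitting in half.
    pairs = sorted(zip(x, y))
    half = len(pairs) // 2
    lower, upper = pairs[:half], pairs[half:]

    def mean(values):
        return sum(values) / len(values)

    # Location indices (here: means) of X and Y in each subset.
    x1, y1 = mean([p[0] for p in lower]), mean([p[1] for p in lower])
    x2, y2 = mean([p[0] for p in upper]), mean([p[1] for p in upper])

    # With m = r + 1 points, the polynomial of order r passes through them exactly.
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return intercept, slope


# Hypothetical, noise-free data on the line y = 1 + 2x.
xs = [1, 2, 3, 4, 5, 6]
ys = [2 * v + 1 for v in xs]
intercept, slope = group_means_line(xs, ys)
```

    On noise-free linear data the two mean points lie on the line, so the sketch recovers the same coefficients as ordinary least squares; with noisy data the two estimators generally differ, which is the comparison the abstract describes.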

  14. A Novel Multiobjective Evolutionary Algorithm Based on Regression Analysis

    Directory of Open Access Journals (Sweden)

    Zhiming Song

    2015-01-01

    As is known, the Pareto set of a continuous multiobjective optimization problem with m objective functions is a piecewise continuous (m-1)-dimensional manifold in the decision space under some mild conditions. However, how to utilize the regularity to design multiobjective optimization algorithms has become the research focus. In this paper, based on this regularity, a model-based multiobjective evolutionary algorithm with regression analysis (MMEA-RA) is put forward to solve continuous multiobjective optimization problems with variable linkages. In the algorithm, the optimization problem is modelled as a promising area in the decision space by a probability distribution, and the centroid of the probability distribution is an (m-1)-dimensional piecewise continuous manifold. The least squares method is used to construct such a model. A selection strategy based on the nondominated sorting is used to choose the individuals to the next generation. The new algorithm is tested and compared with NSGA-II and RM-MEDA. The result shows that MMEA-RA outperforms RM-MEDA and NSGA-II on the test instances with variable linkages. At the same time, MMEA-RA has higher efficiency than the other two algorithms. A few shortcomings of MMEA-RA have also been identified and discussed in this paper.

  15. A novel multiobjective evolutionary algorithm based on regression analysis.

    Science.gov (United States)

    Song, Zhiming; Wang, Maocai; Dai, Guangming; Vasile, Massimiliano

    2015-01-01

    As is known, the Pareto set of a continuous multiobjective optimization problem with m objective functions is a piecewise continuous (m - 1)-dimensional manifold in the decision space under some mild conditions. However, how to utilize the regularity to design multiobjective optimization algorithms has become the research focus. In this paper, based on this regularity, a model-based multiobjective evolutionary algorithm with regression analysis (MMEA-RA) is put forward to solve continuous multiobjective optimization problems with variable linkages. In the algorithm, the optimization problem is modelled as a promising area in the decision space by a probability distribution, and the centroid of the probability distribution is (m - 1)-dimensional piecewise continuous manifold. The least squares method is used to construct such a model. A selection strategy based on the nondominated sorting is used to choose the individuals to the next generation. The new algorithm is tested and compared with NSGA-II and RM-MEDA. The result shows that MMEA-RA outperforms RM-MEDA and NSGA-II on the test instances with variable linkages. At the same time, MMEA-RA has higher efficiency than the other two algorithms. A few shortcomings of MMEA-RA have also been identified and discussed in this paper.

  16. The Overall Odds Ratio as an Intuitive Effect Size Index for Multiple Logistic Regression: Examination of Further Refinements

    Science.gov (United States)

    Le, Huy; Marcus, Justin

    2012-01-01

    This study used Monte Carlo simulation to examine the properties of the overall odds ratio (OOR), which was recently introduced as an index for overall effect size in multiple logistic regression. It was found that the OOR was relatively independent of study base rate and performed better than most commonly used R-square analogs in indexing model…

  17. Multiple linear regression to develop strength scaled equations for knee and elbow joints based on age, gender and segment mass

    DEFF Research Database (Denmark)

    D'Souza, Sonia; Rasmussen, John; Schwirtz, Ansgar

    2012-01-01

    and valuable ergonomic tool. Objective: To investigate age and gender effects on the torque-producing ability in the knee and elbow in older adults. To create strength scaled equations based on age, gender, upper/lower limb lengths and masses using multiple linear regression. To reduce the number of dependent...

  18. Multiple regression analysis of PM10 concentration in relation to meteorological elements for Porto Alegre, Rio Grande do Sul State, in 2005 and 2006

    Directory of Open Access Journals (Sweden)

    Angela Radünz Lazzari

    2011-01-01

    Air is an efficient medium for dispersing atmospheric pollutants, and its behavior depends on the atmospheric movements that occur in the troposphere. In Porto Alegre, Rio Grande do Sul State, there is heavy daily traffic and a concentration of industries that may be responsible for atmospheric emissions. This work studied the behavior of daily concentrations of particulate matter (PM10) in this city, considering the influence of meteorological elements. The data were analyzed using descriptive statistics, linear correlation, and multiple regression. Data were provided by the Fundação Estadual de Proteção Ambiental Henrique Luiz Roessler - RS (FEPAM) and by the Instituto Nacional de Meteorologia (INMET). The analyses showed that PM10 concentrations, measured daily at 16:00, did not exceed national air quality standards. The meteorological elements that influenced PM10 concentrations were daily mean wind speed and daily mean radiation, with negative relationships, and daily mean air temperature and north and northwest wind directions, with positive relationships. The wind directions that contribute significantly to lowering concentrations at the measured sites are east and southeast.

  19. Selection of higher order regression models in the analysis of multi-factorial transcription data.

    Directory of Open Access Journals (Sweden)

    Olivia Prazeres da Costa

    INTRODUCTION: Many studies examine gene expression data that has been obtained under the influence of multiple factors, such as genetic background, environmental conditions, or exposure to diseases. The interplay of multiple factors may lead to effect modification and confounding. Higher order linear regression models can account for these effects. We present a new methodology for linear model selection and apply it to microarray data of bone marrow-derived macrophages. This experiment investigates the influence of three variable factors: the genetic background of the mice from which the macrophages were obtained, Yersinia enterocolitica infection (two strains, and a mock control), and treatment/non-treatment with interferon-γ. RESULTS: We set up four different linear regression models in a hierarchical order. We introduce the eruption plot as a new practical tool for model selection complementary to global testing. It visually compares the size and significance of effect estimates between two nested models. Using this methodology we were able to select the most appropriate model by keeping only relevant factors showing additional explanatory power. Application to experimental data allowed us to qualify the interaction of factors as either neutral (no interaction), alleviating (co-occurring effects are weaker than expected from the single effects), or aggravating (stronger than expected). We find a biologically meaningful gene cluster of putative C2TA target genes that appear to be co-regulated with MHC class II genes. CONCLUSIONS: We introduced the eruption plot as a tool for visual model comparison to identify relevant higher order interactions in the analysis of expression data obtained under the influence of multiple factors. We conclude that model selection in higher order linear regression models should generally be performed for the analysis of multi-factorial microarray data.

  20. Detection and parameter estimation for quantitative trait loci using regression models and multiple markers

    Directory of Open Access Journals (Sweden)

    Schook Lawrence B

    2000-07-01

    A strategy of multi-step minimal conditional regression analysis has been developed to determine the existence of statistical testing and parameter estimation for a quantitative trait locus (QTL) that are unaffected by linked QTLs. The estimation of marker-QTL recombination frequency needs to consider only three cases: (1) the chromosome has only one QTL, (2) one side of the target QTL has one or more QTLs, and (3) either side of the target QTL has one or more QTLs. An analytical formula was derived to estimate marker-QTL recombination frequency for each of the three cases. The formula involves two flanking markers for case 1, two flanking markers plus a conditional marker for case 2, and two flanking markers plus two conditional markers for case 3. Each QTL variance and effect, and the total QTL variance, were also estimated using analytical formulae. Simulation data show that the formulae for estimating marker-QTL recombination frequency could be a useful statistical tool for fine QTL mapping. With 1,000 observations, a QTL could be mapped to a narrow chromosome region of 1.5 cM if no linked QTL is present, and to a 2.8 cM chromosome region if either side of the target QTL has at least one linked QTL.

  1. INFLUENCE OF TOURISM SECTOR IN ALBANIAN GDP: ESTIMATION USING MULTIPLE REGRESSION METHOD

    Directory of Open Access Journals (Sweden)

    Eglantina HYSA

    2012-06-01

    In recent years, the tourism sector in Albania has grown significantly, as the country has moved since 1990 from a centralized economy to a liberal one. The tourism sector plays an important role in economic and social development, and its contributions are reflected directly in the generation of national income. The two main components of tourism movements are the number of tourists and the number of overnight stays in hotels. Investments in this sector can be expected to have a strong positive influence on the country's GDP. This study seeks to identify the influence of the number of tourists, their overnight stays in hotels, and capital investment spending by all sectors directly involved in tourism on tourism's total contribution to Albania's gross domestic product during 1996-2009. A regression analysis was performed with GDP generated by the tourism sector as the dependent variable and capital investment, number of tourists, and overnight stays in hotels as independent variables. Although all the variables were found to be positively related, the variable ‘overnights of foreigners and Albanians in hotels' was found to be insignificant.

  2. Multiple linear regression analysis of the X-ray measurement and WOMAC, KUJALA, MELBOURNE scores of patellofemoral pain syndrome

    Institute of Scientific and Technical Information of China (English)

    薛刚; 朱庆生; 朱锦宇; 姜炜

    2013-01-01

    Objective: To measure radiographic parameters of knees with patellofemoral pain syndrome (PFPS) on X-ray and to perform multiple linear regression analysis of these parameters against the WOMAC, KUJALA, and MELBOURNE scoring systems. Methods: A total of 49 patients (51 knees) were reviewed according to inclusion and exclusion criteria. Ten parameters related to PFPS were measured: distal femoral valgus angle (DFVA, X1), proximal tibial varus angle (PTVA, X2), femoral angle (FA, X3), tibial angle (TA, X4), tibiofemoral angle (TFA, X5), Insall-Salvati ratio (ISR, X6), sulcus angle (SA, X7), lateral patellofemoral angle (LPA, X8), congruence angle (CA, X9), and patellofemoral index (PI, X10). WOMAC, KUJALA, and MELBOURNE scores were recorded, and multiple linear regression equations were used to analyze the relationship between the radiographic parameters and the scores. Results: All three multiple linear regression equations were statistically significant (P < 0.05). WOMAC score: Y = -213.742 + 2.011 X5, F = 3.960, R2 = 0.494. KUJALA score: Y = 125.835 - 24.475 X6 - 0.341 X7 - 0.992 X8, F = 32.732, R2 = 0.891. MELBOURNE score: Y = 51.66 - 16.329 X6 - 5.47 X10, F = 22.178, R2 = 0.856. Conclusions: (1) Knee X-ray measurements reflect, to some extent, the three scores and knee function. (2) The KUJALA score assesses PFPS relatively comprehensively; the Insall-Salvati ratio, sulcus angle, and lateral patellofemoral angle on axial radiographs are the more important parameters and can be used clinically to assess functional recovery of PFPS patients before and after treatment. (3) Because the coefficients of determination for the KUJALA and MELBOURNE scores are large and the standard errors of the regression coefficients are small, score values determined through statistical control can be used clinically to evaluate the radiographic parameters.
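
    The three fitted equations reported in this abstract can be written out as plain functions, which makes it easier to see which radiographic parameter enters which score. This is only a sketch: the coefficients are copied from the abstract, the last KUJALA term is assumed to refer to X8 (the lateral patellofemoral angle, as the conclusions suggest), and the function names and input values are hypothetical, not patient data.

```python
def womac_score(tfa):
    """WOMAC equation from the abstract: depends on the tibiofemoral angle (X5)."""
    return -213.742 + 2.011 * tfa


def kujala_score(isr, sa, lpa):
    """KUJALA equation: Insall-Salvati ratio (X6), sulcus angle (X7), and
    lateral patellofemoral angle (assumed to be X8)."""
    return 125.835 - 24.475 * isr - 0.341 * sa - 0.992 * lpa


def melbourne_score(isr, pi):
    """MELBOURNE equation: Insall-Salvati ratio (X6) and patellofemoral index (X10)."""
    return 51.66 - 16.329 * isr - 5.47 * pi


# Hypothetical measurements, chosen only to exercise the functions.
example_kujala = kujala_score(isr=1.0, sa=140.0, lpa=10.0)
example_womac = womac_score(tfa=175.0)
example_melbourne = melbourne_score(isr=1.0, pi=1.0)
```

    The negative coefficients on the Insall-Salvati ratio in both the KUJALA and MELBOURNE equations mean that, with the other inputs held fixed, a higher ratio lowers the predicted score.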

  3. Relationship of push-ups and sit-ups tests to selected anthropometric variables and performance results: a multiple regression study.

    Science.gov (United States)

    Esco, Michael R; Olson, Michele S; Williford, Henry

    2008-11-01

    The purpose of this study was to explore whether selected anthropometric measures such as specific skinfold sites, along with weight, height, body mass index (BMI), waist and hip circumferences, and waist/hip ratio (WHR) were associated with sit-ups (SU) and push-ups (PU) performance, and to build a regression model for SU and PU tests. One hundred apparently healthy adults (40 men and 60 women) served as the subjects for test validation. The subjects performed 60-second SU and PU tests. The variables analyzed via multiple regression included weight, height, BMI, hip and waist circumferences, WHR, skinfolds at the abdomen (SFAB), thigh (SFTH), and subscapularis (SFSS), and sex. An additional cohort of 40 subjects (17 men and 23 women) was used to cross-validate the regression models. Validity was confirmed by correlation and paired t-tests. The regression analysis yielded a four-variable (PU, height, SFAB, and SFTH) multiple regression equation for estimating SU (R2 = 0.64, SEE = 7.5 repetitions). For PU, only SU was loaded into the regression equation (R2 = 0.43, SEE = 9.4 repetitions). Thus, the variables in the regression models accounted for 64% and 43% of the variation in SU and PU, respectively. The cross-validation sample elicited a high correlation for SU (r = 0.87) and PU (r = 0.79) scores. Moreover, paired-samples t-tests revealed that there were no significant differences between actual and predicted SU and PU scores. Therefore, this study shows that there are a number of selected, health-related anthropometric variables that account significantly for, and are predictive of, SU and PU tests.

  4. Analysis of pulsed eddy current data using regression models for steam generator tube support structure inspection

    Science.gov (United States)

    Buck, J. A.; Underhill, P. R.; Morelli, J.; Krause, T. W.

    2016-02-01

    Nuclear steam generators (SGs) are a critical component for ensuring safe and efficient operation of a reactor. Life management strategies are implemented in which SG tubes are regularly inspected by conventional eddy current testing (ECT) and ultrasonic testing (UT) technologies to size flaws, and safe operating life of SGs is predicted based on growth models. ECT, the more commonly used technique, due to the rapidity with which full SG tube wall inspection can be performed, is challenged when inspecting ferromagnetic support structure materials in the presence of magnetite sludge and multiple overlapping degradation modes. In this work, an emerging inspection method, pulsed eddy current (PEC), is being investigated to address some of these particular inspection conditions. Time-domain signals were collected by an 8 coil array PEC probe in which ferromagnetic drilled support hole diameter, depth of rectangular tube frets and 2D tube off-centering were varied. Data sets were analyzed with a modified principal components analysis (MPCA) to extract dominant signal features. Multiple linear regression models were applied to MPCA scores to size hole diameter as well as size rectangular outer diameter tube frets. Models were improved through exploratory factor analysis, which was applied to MPCA scores to refine selection for regression models inputs by removing nonessential information.

  5. Quantile regression provides a fuller analysis of speed data.

    Science.gov (United States)

    Hewson, Paul

    2008-03-01

    Considerable interest already exists in terms of assessing percentiles of speed distributions, for example monitoring the 85th percentile speed is a common feature of the investigation of many road safety interventions. However, unlike the mean, where t-tests and ANOVA can be used to provide evidence of a statistically significant change, inference on these percentiles is much less common. This paper examines the potential role of quantile regression for modelling the 85th percentile, or any other quantile. Given that crash risk may increase disproportionately with increasing relative speed, it may be argued these quantiles are of more interest than the conditional mean. In common with the more usual linear regression, quantile regression admits a simple test as to whether the 85th percentile speed has changed following an intervention in an analogous way to using the t-test to determine if the mean speed has changed by considering the significance of parameters fitted to a design matrix. Having briefly outlined the technique and briefly examined an application with a widely published dataset concerning speed measurements taken around the introduction of signs in Cambridgeshire, this paper will demonstrate the potential for quantile regression modelling by examining recent data from Northamptonshire collected in conjunction with a "community speed watch" programme. Freely available software is used to fit these models and it is hoped that the potential benefits of using quantile regression methods when examining and analysing speed data are demonstrated.
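
    The idea of testing for a change in the 85th percentile can be sketched with the check (pinball) loss that quantile regression minimizes. When the design matrix holds only an intercept and a before/after indicator, the fit reduces to the 85th percentile of each group, and the indicator coefficient is their difference. The code below is a minimal illustration under that assumption; it uses made-up speeds, not the Cambridgeshire or Northamptonshire data, and it omits the significance test the abstract refers to.

```python
def pinball_loss(q, values, tau):
    """Total check loss of predicting the constant q for the tau-th quantile."""
    return sum(tau * (v - q) if v >= q else (1 - tau) * (q - v) for v in values)


def fit_quantile(values, tau):
    """Constant that minimizes the check loss; a minimizer always lies on an observation."""
    return min(sorted(values), key=lambda q: pinball_loss(q, values, tau))


# Hypothetical speeds: every vehicle is 5 units slower after the intervention.
before = [float(s) for s in range(1, 101)]
after = [s - 5.0 for s in before]

q85_before = fit_quantile(before, 0.85)
q85_after = fit_quantile(after, 0.85)
change = q85_after - q85_before   # plays the role of the indicator coefficient
```

    In practice a dedicated quantile regression routine would be used so that covariates and standard errors are handled properly; the sketch only shows what quantity is being estimated.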

  6. Buffalos milk yield analysis using random regression models

    Directory of Open Access Journals (Sweden)

    A.S. Schierholt

    2010-02-01

    Data comprising 1,719 milk yield records from 357 females (predominantly Murrah breed), daughters of 110 sires, with births from 1974 to 2004, obtained from the Programa de Melhoramento Genético de Bubalinos (PROMEBUL) and from records of the EMBRAPA Amazônia Oriental (EAO) herd, located in Belém, Pará, Brazil, were used to compare random regression models for estimating variance components and predicting breeding values of the sires. The data were analyzed by different models using Legendre polynomial functions from second to fourth order. The random regression models included the effects of herd-year, month of parity date of the control; regression coefficients for age of females (in order to describe the fixed part of the lactation curve); and random regression coefficients related to the direct genetic and permanent environment effects. The comparisons among the models were based on the Akaike Information Criterion. The random regression model using third-order Legendre polynomials with four classes of the environmental effect was the one that best described the additive genetic variation in milk yield. The heritability estimates varied from 0.08 to 0.40. The genetic correlation between milk yields at younger ages was close to unity, but at older ages it was low.
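
    Random regression models of this kind use Legendre polynomials of a standardized time covariate as regressors. The sketch below shows how such covariates might be generated; it is an illustration, not the authors' code. It assumes the covariate is rescaled linearly to the interval [-1, 1] and uses unnormalized Legendre polynomials, whereas animal breeding software often uses a normalized form. The function names and the age range are hypothetical.

```python
def standardize(value, lower, upper):
    """Rescale a covariate linearly from [lower, upper] to [-1, 1]."""
    return 2.0 * (value - lower) / (upper - lower) - 1.0


def legendre_basis(t, order):
    """Return [P0(t), ..., P_order(t)] using Bonnet's recurrence:
    (n + 1) P_{n+1}(t) = (2n + 1) t P_n(t) - n P_{n-1}(t)."""
    values = [1.0, t]
    for n in range(1, order):
        values.append(((2 * n + 1) * t * values[n] - n * values[n - 1]) / (n + 1))
    return values[: order + 1]


# Hypothetical example: a record at age 60 on an age range of 20 to 100 months.
t = standardize(60.0, 20.0, 100.0)
third_order_covariates = legendre_basis(t, 3)
```

    Each animal's random regression coefficients multiply these covariates, so a third-order model carries four coefficients per random effect.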

  7. Multiple linear regression analysis of the correlation between psychological behavior characteristics and diet behavior in school-age children

    Institute of Scientific and Technical Information of China (English)

    沈洁; 张朋; 刘福康; 王婷婷; 孙桂菊; 刘江红

    2011-01-01

    Objective: To analyze the effect of children's behavior characteristics on diet behavior, providing a scientific basis for nutrition instruction. Methods: A total of 302 fifth-grade primary school children from Jintan were selected. A questionnaire survey on nutritional behavior and psychological behavior was conducted among them from June to July 2010. The factors affecting diet behavior were analyzed with univariate linear regression. Variables with P < 0.05 in the univariate regression model were selected to establish a multivariate regression model. Results: The univariate linear regression analysis showed that the anxious/depressed, social problems, thought problems, attention problems, and aggressive behavior scores and the total score in boys and girls were negatively correlated with the diet behavior score. Multiple linear regression showed that attention problem scores in boys and thought problem scores in girls were negatively correlated with the diet behavior score. Conclusion: The findings demonstrate that the psychological behaviors of school-age children are closely associated with diet behaviors. It is necessary to add a health-related curriculum on risk behavior prevention into quality education, carry out comprehensive behavior surveillance on psychology, nutrition, and diet, and conduct early intervention in adolescents.

  8. Multiple Regression and Mediator Variables can be used to Avoid Double Counting when Economic Values are Derived using Stochastic Herd Simulation

    DEFF Research Database (Denmark)

    Østergaard, Søren; Ettema, Jehan Frans; Hjortø, Line;

    Multiple regression and model building with mediator variables was addressed to avoid double counting when economic values are estimated from data simulated with herd simulation modeling (using the SimHerd model). The simulated incidence of metritis was analyzed statistically as the independent...... in multiparous cows. The merit of using this approach was demonstrated since the economic value of metritis was estimated to be 81% higher when no mediator variables were included in the multiple regression analysis...... variable, while using the traits representing the direct effects of metritis on yield, fertility and occurrence of other diseases as mediator variables. The economic value of metritis was estimated to be €78 per 100 cow-years for each 1% increase of metritis in the period of 1-100 days in milk...

  9. Energy Consumption Forecasting and Energy Saving Analysis of Urban Buildings Based on Multiple Linear Regression Model

    Institute of Scientific and Technical Information of China (English)

    樊丽军

    2016-01-01

    To address the conservation and effective use of energy in urban buildings, a model for building energy consumption forecasting and energy-saving analysis based on a multiple linear regression model (MLRP) is proposed. Taking natural gas and electricity as the energy consumption targets, the model uses building type, building age, floor area, and number of residents as input features, applies multiple linear regression to identify the factors that significantly influence energy consumption, and forecasts energy consumption for the whole region. In addition, the prediction model can be used to evaluate the energy-saving potential of buildings after improvement measures are implemented. The experiments give the energy-saving potential of buildings under various scenarios, and the results show that the proposed model can accurately predict regional energy consumption.
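
    A multiple linear regression of energy use on building features, as described here, can be sketched with ordinary least squares solved through the normal equations. This is a minimal illustration and not the author's model: it uses only two numeric features (floor area and number of residents), leaves out the categorical building type and the building age, and runs on made-up, noise-free data. All names are hypothetical.

```python
def fit_ols(rows, y):
    """Least-squares coefficients [intercept, b1, ..., bk] via the normal equations."""
    X = [[1.0] + list(r) for r in rows]   # prepend the intercept column
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(xi[a] * xi[b] for xi in X) for b in range(k)]
         + [sum(xi[a] * yi for xi, yi in zip(X, y))]
         for a in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [v - factor * p for v, p in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(coefficients, row):
    """Predicted energy use for one building."""
    return coefficients[0] + sum(c * v for c, v in zip(coefficients[1:], row))


# Hypothetical buildings: (floor area, residents); energy = 10 + 0.5*area + 20*residents.
buildings = [(100, 2), (150, 3), (80, 1), (200, 4), (120, 4), (90, 2)]
energy = [10 + 0.5 * area + 20 * residents for area, residents in buildings]

coefficients = fit_ols(buildings, energy)
regional_total = sum(predict(coefficients, b) for b in buildings)
```

    Summing the per-building predictions gives a regional estimate, and re-running `predict` with modified feature values is one simple way to express a what-if scenario for an improvement measure.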

  10. Research on the High Star Hotel Staff Satisfaction Based on Multiple Regression Analysis: Taking the High Star Hotels in Changsha for Example

    Institute of Scientific and Technical Information of China (English)

    王华丽

    2014-01-01

    Hotel staff satisfaction has been watched keenly by the hotel industry and academia. In this paper, basic data are obtained through a survey of high star hotels in Changsha, and multiple regression analysis is used to study the factors influencing hotel staff satisfaction. The results indicate that promotion prospects have the largest impact on employee satisfaction, followed by compensation, while the influence of the work itself is not statistically significant.

  11. Measuring Habituation in Infants: An Approach Using Regression Analysis.

    Science.gov (United States)

    Ashmead, Daniel H.; Davis, DeFord L.

    1996-01-01

    Used computer simulations to examine effectiveness of different criteria for measuring infant visual habituation. Found that a criterion based on fitting a second-order polynomial regression function to looking-time data produced more accurate estimation of looking times and higher power for detecting novelty effects than did the traditional…
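
    The curve-fitting step mentioned in this record can be sketched as follows. This is only a minimal illustration of fitting a second-order polynomial to looking-time data with NumPy; the looking times are made-up values, the function name `fit_quadratic` is hypothetical, and the habituation criterion the authors build on top of the fit is not given in the abstract and is not reproduced here.

```python
import numpy as np

def fit_quadratic(trials, looking_times):
    """Least-squares fit of looking_time = a*t**2 + b*t + c; returns (a, b, c)."""
    a, b, c = np.polyfit(trials, looking_times, deg=2)
    return a, b, c

# Hypothetical looking times (seconds) over eight trials, invented for illustration.
trials = np.arange(1, 9, dtype=float)
looking = np.array([12.0, 10.5, 8.9, 8.1, 6.8, 6.5, 5.9, 5.7])

a, b, c = fit_quadratic(trials, looking)

# Smoothed estimate of looking time on each trial from the fitted curve.
fitted = np.polyval([a, b, c], trials)
```

    The fitted values, rather than the raw trial-by-trial times, would then feed whatever habituation criterion is being evaluated.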

  12. Spontaneous Regression of Hepatocellular Carcinoma with Multiple Lung Metastases: A Case Report and Review of the Literature.

    Science.gov (United States)

    Pectasides, Eirini; Miksad, Rebecca; Pyatibrat, Sergey; Srivastava, Amogh; Bullock, Andrea

    2016-09-01

    Spontaneous regression of hepatocellular carcinoma (HCC) is a rare event. Here we present a case of spontaneous regression of metastatic HCC. A 53-year-old man with hepatitis C and alcoholic cirrhosis was found to have a large liver mass consistent with HCC based on its radiographic features. Imaging also revealed left portal and hepatic vein thrombosis, as well as multiple lung nodules concerning for metastases. Approximately 2 months after the initial diagnosis, both the primary liver lesion and the lung metastases decreased in size and eventually resolved without any intervention. Thereafter, the left hepatic vein thrombus progressed into the inferior vena cava and the right atrium, and the patient died due to right heart failure. In this case report and literature review, we discuss the potential mechanisms for and review the literature on spontaneous regression of metastatic HCC.

  13. The Generalized Regression Discontinuity Design: Using Multiple Assignment Variables and Cutoffs to Estimate Treatment Effects

    Science.gov (United States)

    Wong, Vivian C.; Steiner, Peter M.; Cook, Thomas D.

    2009-01-01

    This paper introduces a generalization of the regression-discontinuity design (RDD). Traditionally, RDD is considered in a two-dimensional framework, with a single assignment variable and cutoff. Treatment effects are measured at a single location along the assignment variable. However, this represents a specialized (and straight-forward)…

  14. Analyzing Regression-Discontinuity Designs with Multiple Assignment Variables: A Comparative Study of Four Estimation Methods

    Science.gov (United States)

    Wong, Vivian C.; Steiner, Peter M.; Cook, Thomas D.

    2012-01-01

    In a traditional regression-discontinuity design (RDD), units are assigned to treatment and comparison conditions solely on the basis of a single cutoff score on a continuous assignment variable. The discontinuity in the functional form of the outcome at the cutoff represents the treatment effect, or the average treatment effect at the cutoff.…

  15. Multiple regression models for the prediction of the maximum obtainable thermal efficiency of organic Rankine cycles

    DEFF Research Database (Denmark)

    Larsen, Ulrik; Pierobon, Leonardo; Wronski, Jorrit;

    2014-01-01

    to power. In this study we propose four linear regression models to predict the maximum obtainable thermal efficiency for simple and recuperated ORCs. A previously derived methodology is able to determine the maximum thermal efficiency among many combinations of fluids and processes, given the boundary...

  16. Point Estimates and Confidence Intervals for Variable Importance in Multiple Linear Regression

    Science.gov (United States)

    Thomas, D. Roland; Zhu, PengCheng; Decady, Yves J.

    2007-01-01

    The topic of variable importance in linear regression is reviewed, and a measure first justified theoretically by Pratt (1987) is examined in detail. Asymptotic variance estimates are used to construct individual and simultaneous confidence intervals for these importance measures. A simulation study of their coverage properties is reported, and an…

  17. Sensitivity analysis and optimization of system dynamics models : Regression analysis and statistical design of experiments

    NARCIS (Netherlands)

    Kleijnen, J.P.C.

    1995-01-01

    This tutorial discusses what-if analysis and optimization of System Dynamics models. These problems are solved, using the statistical techniques of regression analysis and design of experiments (DOE). These issues are illustrated by applying the statistical techniques to a System Dynamics model for

  18. REGRESSION ANALYSIS OF PRODUCTIVITY USING MIXED EFFECT MODEL

    Directory of Open Access Journals (Sweden)

    Siana Halim

    2007-01-01

    Production plants of a company are located in several areas spread across Middle and East Java. As the production process employs mostly manpower, we suspected that each location has different characteristics affecting productivity. Thus, the production data may have a spatial and hierarchical structure. To fit a linear regression using ordinary techniques, we are required to make some assumptions about the nature of the residuals, i.e. that they are independent, identically and normally distributed. However, these assumptions are rarely fulfilled, especially for data that have a spatial and hierarchical structure. We worked out the problem using a mixed effect model. This paper discusses the construction of a model of productivity and several characteristics of the production line, taking location as a random effect. A simple model with high utility that satisfies the necessary regression assumptions was built using the free statistical software R, version 2.6.1.

  19. Model performance analysis and model validation in logistic regression

    Directory of Open Access Journals (Sweden)

    Rosa Arboretti Giancristofaro

    2007-10-01

    In this paper a new model validation procedure for a logistic regression model is presented. First, we give a brief review of different techniques of model validation. Next, we define a number of properties required for a model to be considered "good", and a number of quantitative performance measures. Lastly, we describe a methodology for the assessment of the performance of a given model by using an example taken from a management study.

  20. A New Approach in Regression Analysis for Modeling Adsorption Isotherms

    Directory of Open Access Journals (Sweden)

    Dana D. Marković

    2014-01-01

    Numerous regression approaches to isotherm parameter estimation appear in the literature. Real insight into the proper modeling pattern can be achieved only by testing methods on a very large number of cases. Experimentally, this cannot be done in a reasonable time, so the Monte Carlo simulation method was applied. The objective of this paper is to introduce and compare numerical approaches that involve different levels of knowledge about the noise structure of the analytical method used for initial and equilibrium concentration determination. Six levels of homoscedastic noise and five types of heteroscedastic noise precision models were considered. Performance of the methods was statistically evaluated based on median percentage error and mean absolute relative error in parameter estimates. The present study showed a clear distinction between two cases. When equilibrium experiments are performed only once, for the homoscedastic case, the winning error function is ordinary least squares, while for the case of heteroscedastic noise the use of orthogonal distance regression or Marquardt's percent standard deviation is suggested. It was found that when experiments are repeated three times, the simple method of weighted least squares performed as well as the more complicated orthogonal distance regression method.
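
    Two of the error functions named in this record can be written down compactly. The sketch below assumes the commonly quoted form of Marquardt's percent standard deviation (MPSD), 100 * sqrt(sum(((q_meas - q_calc) / q_meas)^2) / (n - p)), with n data points and p fitted parameters; the abstract itself does not state the formula, and the function names and sample numbers are hypothetical.

```python
import numpy as np

def sse(q_meas, q_calc):
    """Ordinary least squares objective: sum of squared residuals."""
    q_meas = np.asarray(q_meas, dtype=float)
    q_calc = np.asarray(q_calc, dtype=float)
    return float(np.sum((q_meas - q_calc) ** 2))

def mpsd(q_meas, q_calc, n_params):
    """Marquardt's percent standard deviation (assumed textbook form)."""
    q_meas = np.asarray(q_meas, dtype=float)
    q_calc = np.asarray(q_calc, dtype=float)
    relative = (q_meas - q_calc) / q_meas  # residuals scaled by the measured value
    dof = len(q_meas) - n_params           # degrees of freedom, n - p
    return float(100.0 * np.sqrt(np.sum(relative ** 2) / dof))

# Made-up measured and model-calculated equilibrium uptakes.
q_meas = [1.0, 2.0, 4.0]
q_calc = [1.1, 1.8, 4.4]

ols_error = sse(q_meas, q_calc)
mpsd_error = mpsd(q_meas, q_calc, n_params=2)
```

    Because MPSD divides each residual by the measured value, it weights relative rather than absolute deviations, which is consistent with its being suggested for heteroscedastic noise.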

  1. Partial least squares regression can aid in detecting differential abundance of multiple features in sets of metagenomic samples

    Directory of Open Access Journals (Sweden)

    Ondrej Libiger

    2015-12-01

    It is now feasible to examine the composition and diversity of microbial communities (i.e., 'microbiomes') that populate different human organs and orifices using DNA sequencing and related technologies. To explore the potential links between changes in microbial communities and various diseases in the human body, it is essential to test associations involving different species within and across microbiomes, environmental settings and disease states. Although a number of statistical techniques exist for carrying out relevant analyses, it is unclear which of these techniques exhibit the greatest statistical power to detect associations given the complexity of most microbiome datasets. We compared the statistical power of principal component regression, partial least squares regression, regularized regression, distance-based regression, Hill's diversity measures, and a modified test implemented in the popular and widely used microbiome analysis methodology 'Metastats' across a wide range of simulated scenarios involving changes in feature abundance between two sets of metagenomic samples. For this purpose, simulation studies were used to change the abundance of microbial species in a real dataset from a published study examining human hands. Each technique was applied to the same data, and its ability to detect the simulated change in abundance was assessed. We hypothesized that a small subset of methods would outperform the rest in terms of the statistical power. Indeed, we found that the Metastats technique modified to accommodate multivariate analysis and partial least squares regression yielded high power under the models and data sets we studied. The statistical power of diversity measure-based tests, distance-based regression and regularized regression was significantly lower. Our results provide insight into powerful analysis strategies that utilize information on species counts from large microbiome data sets exhibiting skewed frequency

  2. A methodology for the design of experiments in computational intelligence with multiple regression models.

    Science.gov (United States)

    Fernandez-Lozano, Carlos; Gestal, Marcos; Munteanu, Cristian R; Dorado, Julian; Pazos, Alejandro

    2016-01-01

    The design of experiments and the validation of the results achieved with them are vital in any research study. This paper focuses on the use of different Machine Learning approaches for regression tasks in the field of Computational Intelligence and especially on a correct comparison between the different results provided for different methods, as those techniques are complex systems that require further study to be fully understood. A methodology commonly accepted in Computational Intelligence is implemented in an R package called RRegrs. This package includes ten simple and complex regression models to carry out predictive modeling using Machine Learning and well-known regression algorithms. The framework for experimental design presented herein is evaluated and validated against RRegrs. Our results are different for three out of five state-of-the-art simple datasets and it can be stated that the selection of the best model according to our proposal is statistically significant and relevant. It is of relevance to use a statistical approach to indicate whether the differences are statistically significant when using this kind of algorithm. Furthermore, our results with three real complex datasets report different best models than with the previously published methodology. Our final goal is to provide a complete methodology for the use of different steps in order to compare the results obtained in Computational Intelligence problems, as well as in other fields, such as bioinformatics, cheminformatics, etc., given that our proposal is open and modifiable.

  3. Regression analysis of censored data using pseudo-observations

    DEFF Research Database (Denmark)

    Parner, Erik T.; Andersen, Per Kragh

    2010-01-01

    We draw upon a series of articles in which a method based on pseudovalues is proposed for direct regression modeling of the survival function, the restricted mean, and the cumulative incidence function in competing risks with right-censored data. The models, once the pseudovalues have been computed, can be fit using standard generalized estimating equation software. Here we present Stata procedures for computing these pseudo-observations. An example from a bone marrow transplantation study is used to illustrate the method....

  4. Analysis of some methods for reduced rank Gaussian process regression

    DEFF Research Database (Denmark)

    Quinonero-Candela, J.; Rasmussen, Carl Edward

    2005-01-01

    While there is strong motivation for using Gaussian Processes (GPs) due to their excellent performance in regression and classification problems, their computational complexity makes them impractical when the size of the training set exceeds a few thousand cases. This has motivated the recent...... Gaussian Processes (RRGPs) are equivalent to finite sparse linear models. We also introduce the concept of degenerate GPs and show that they correspond to inappropriate priors. We show how to modify the RRGP to prevent it from being degenerate at test time. Training RRGPs consists both in learning...

  5. Multivariate linear regression of high-dimensional fMRI data with multiple target variables.

    Science.gov (United States)

    Valente, Giancarlo; Castellanos, Agustin Lage; Vanacore, Gianluca; Formisano, Elia

    2014-05-01

    Multivariate regression is increasingly used to study the relation between fMRI spatial activation patterns and experimental stimuli or behavioral ratings. With linear models, informative brain locations are identified by mapping the model coefficients. This is a central aspect in neuroimaging, as it provides the sought-after link between the activity of neuronal populations and subject's perception, cognition or behavior. Here, we show that mapping of informative brain locations using multivariate linear regression (MLR) may lead to incorrect conclusions and interpretations. MLR algorithms for high dimensional data are designed to deal with targets (stimuli or behavioral ratings, in fMRI) separately, and the predictive map of a model integrates information deriving from both neural activity patterns and experimental design. Not accounting explicitly for the presence of other targets whose associated activity spatially overlaps with the one of interest may lead to predictive maps of troublesome interpretation. We propose a new model that can correctly identify the spatial patterns associated with a target while achieving good generalization. For each target, the training is based on an augmented dataset, which includes all remaining targets. The estimation on such datasets produces both maps and interaction coefficients, which are then used to generalize. The proposed formulation is independent of the regression algorithm employed. We validate this model on simulated fMRI data and on a publicly available dataset. Results indicate that our method achieves high spatial sensitivity and good generalization and that it helps disentangle specific neural effects from interaction with predictive maps associated with other targets.

  6. Using Multiple Regression in Estimating (semi) VOC Emissions and Concentrations at the European Scale

    DEFF Research Database (Denmark)

    Fauser, Patrik; Thomsen, Marianne; Pistocchi, Alberto

    2010-01-01

    for an in-depth risk assessment. Uncertainty measures are not available for the RAR data; however, uncertainties for the applied regression models are given in the paper. Evaluation of the methods reveals that between 79% and 93% of all emission and PEC estimates are within one order of magnitude of the reported RAR values. Bearing in mind that the domain of the method comprises organic industrial high-production volume chemicals, four chemicals, prioritized in the Water Framework Directive and the Stockholm Convention on Persistent Organic Pollutants, were used to test the method for estimated emissions...

  7. [Inversion of the lake total nitrogen concentration by multiple regression kriging model based on hyperspectral data of HJ-1A].

    Science.gov (United States)

    Pan, Bang-long; Yi, Wei-ning; Wang, Xian-hua; Qin, Hui-ping; Wang, Jia-cheng; Qiao, Yan-li

    2011-07-01

    The total nitrogen content of water is an important index of lake water quality, and remote sensing plays a large role in quantitatively monitoring dynamic change and promptly tracking the status of lake pollution. Taking Chaohu as an example, quantitative inversion models of total nitrogen were established by multivariable regression Kriging, based on an analysis of the correlation between total nitrogen and chlorophyll-a or suspended solids, using HSI hyperspectral remote sensing data from the HJ-1A satellite. The results show a correlation of 0.76 between total nitrogen and the multiple combination of band 72, band 79 and band 97, which increased to 0.83 when a combined model of multiple linear regression and ordinary Kriging was applied. Optimizing the residuals of the conventional regression model can effectively improve the accuracy of the inversion. These results also provide a useful exploration toward establishing a common model for the quantitative inversion of lake total nitrogen concentration.

  8. Additive Intensity Regression Models in Corporate Default Analysis

    DEFF Research Database (Denmark)

    Lando, David; Medhat, Mamdouh; Nielsen, Mads Stenbo

    2013-01-01

    We consider additive intensity (Aalen) models as an alternative to the multiplicative intensity (Cox) models for analyzing the default risk of a sample of rated, nonfinancial U.S. firms. The setting allows for estimating and testing the significance of time-varying effects. We use a variety of mo...

  9. Predicting agility performance with other performance variables in pubescent boys: a multiple-regression approach.

    Science.gov (United States)

    Sekulic, Damir; Spasic, Miodrag; Esco, Michael R

    2014-04-01

    The goal was to investigate the influence of balance, jumping power, reactive-strength, speed, and morphological variables on five different agility performances in early pubescent boys (N = 71). The predictors included body height and mass, countermovement and broad jumps, overall stability index, 5 m sprint, and bilateral side jumps test of reactive strength. Forward stepwise regressions calculated on 36 randomly selected participants explained 47% of the variance in performance of the forward-backward running test, 50% of the 180 degrees turn test, 55% of the 20 yd. shuttle test, 62% of the T-shaped course test, and 44% of the zig-zag test, with the bilateral side jumps as the single best predictor. Regression models were cross-validated using the second half of the sample (n = 35). Correlation between predicted and achieved scores did not provide statistically significant validation statistics for the continuous-movement zig-zag test. Further study is needed to assess other predictors of agility in early pubescent boys.

  10. Predictive equations using regression analysis of pulmonary function for healthy children in Northeast China.

    Directory of Open Access Journals (Sweden)

    Ya-Nan Ma

    BACKGROUND: There have been few published studies on spirometric reference values for healthy children in China. We hypothesize that there would have been changes in lung function that would not have been precisely predicted by the existing spirometric reference equations. The objective of the study was to develop more accurate predictive equations for spirometric reference values for children aged 9 to 15 years in Northeast China. METHODOLOGY/PRINCIPAL FINDINGS: Spirometric measurements were obtained from 3,922 children, including 1,974 boys and 1,948 girls, who were randomly selected from five cities of Liaoning province, Northeast China, using the ATS (American Thoracic Society) and ERS (European Respiratory Society) standards. The data were then randomly split into a training subset containing 2,078 cases and a validation subset containing 1,844 cases. Predictive equations used multiple linear regression techniques with three predictor variables: height, age and weight. Model goodness of fit was examined using the coefficient of determination (R²) and adjusted R². The predicted values were compared with those obtained from the existing spirometric reference equations. The results showed the prediction equations using linear regression analysis performed well for most spirometric parameters. Paired t-tests were used to compare the predicted values obtained from the developed and existing spirometric reference equations based on the validation subset. The t-test for males was not statistically significant (p>0.01). The predictive accuracy of the developed equations was higher than the existing equations and the predictive ability of the model was also validated. CONCLUSION/SIGNIFICANCE: We developed prediction equations using linear regression analysis of spirometric parameters for children aged 9-15 years in Northeast China. These equations represent the first attempt at predicting lung function for Chinese children following the ATS
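
    The workflow this record describes (fit a multiple linear regression on a training subset, then check goodness of fit on a validation subset) can be sketched as below. This is a minimal illustration on synthetic data, not the study's equations: the coefficients, noise level, sample size and helper names (`fit_ols`, `predict`, `r2_scores`) are all assumptions made for the example.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ..., bp]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    """Predicted response for the rows of X."""
    return coef[0] + X @ coef[1:]

def r2_scores(y, y_hat, n_predictors):
    """Coefficient of determination R^2 and adjusted R^2."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    n = len(y)
    adj = 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)
    return float(r2), float(adj)

# Synthetic stand-ins for height (cm), age (years) and weight (kg).
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([
    rng.normal(150, 10, n),
    rng.uniform(9, 15, n),
    rng.normal(40, 8, n),
])
# Hypothetical "lung function" response; the coefficients are invented.
y = -3.0 + 0.04 * X[:, 0] + 0.05 * X[:, 1] + 0.01 * X[:, 2] + rng.normal(0, 0.2, n)

# Hold out part of the data for validation.
train = np.arange(n) < 120
valid = ~train

coef = fit_ols(X[train], y[train])
r2, adj_r2 = r2_scores(y[valid], predict(coef, X[valid]), n_predictors=3)
```

    Adjusted R² penalizes the number of predictors, so it never exceeds R² when R² is at most 1 and the sample is larger than the number of fitted coefficients.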

  11. Functional Unfold Principal Component Regression Methodology for Analysis of Industrial Batch Process Data

    DEFF Research Database (Denmark)

    Mears, Lisa; Nørregaard, Rasmus; Sin, Gürkan;

    2016-01-01

    This work proposes a methodology utilizing functional unfold principal component regression (FUPCR), for application to industrial batch process data as a process modeling and optimization tool. The methodology is applied to an industrial fermentation dataset, containing 30 batches of a production process operating at Novozymes A/S. Following the FUPCR methodology, the final product concentration could be predicted with an average prediction error of 7.4%. Multiple iterations of preprocessing were applied by implementing the methodology to identify the best data handling methods for the model. It is shown that application of functional data analysis and the choice of variance scaling method have the greatest impact on the prediction accuracy. Considering the vast amount of batch process data continuously generated in industry, this methodology can potentially contribute as a tool to identify...

  12. Normalization Ridge Regression in Practice II: The Estimation of Multiple Feedback Linkages.

    Science.gov (United States)

    Bulcock, J. W.

    The use of the two-stage least squares (2SLS) procedure for estimating nonrecursive social science models is often impractical when multiple feedback linkages are required. This is because 2SLS is extremely sensitive to multicollinearity. The standard statistical solution to the multicollinearity problem is a biased, variance reduced procedure…

  13. Multiple Linear Regression Model Based on Neural Network and Its Application in the MBR Simulation

    Directory of Open Access Journals (Sweden)

    Chunqing Li

    2012-01-01

    Computer simulation of the membrane bioreactor (MBR) has become a focus of MBR research. To compensate for drawbacks such as long test periods, high cost, and sealed equipment that cannot be observed, this paper proposes a three-dimensional simulation system for MBR wastewater treatment that is fast, efficient, and well visualized, building on an in-depth study of the mathematical model of the MBR combined with neural network theory. The system was developed with hybrid programming in VC++ and OpenGL, using a neural-network-based multifactor linear regression model of the factors affecting MBR membrane flux, and applying a modeling method that uses integers instead of floats together with quadtree recursion. The experiments show that the three-dimensional simulation system, using the above models and methods, offers inspiration and reference for future research and application of MBR simulation technology.

  14. Exergy Analysis of a Subcritical Reheat Steam Power Plant with Regression Modeling and Optimization

    Directory of Open Access Journals (Sweden)

    MUHIB ALI RAJPER

    2016-07-01

    In this paper, exergy analysis of a 210 MW SPP (Steam Power Plant) is performed. Firstly, the plant is modeled and validated, followed by a parametric study to show the effects of various operating parameters on the performance parameters. The net power output, energy efficiency, and exergy efficiency are taken as the performance parameters, while the condenser pressure, main steam pressure, bled steam pressures, main steam temperature, and reheat steam temperature are nominated as the operating parameters. Moreover, multiple polynomial regression models are developed to correlate each performance parameter with the operating parameters. The performance is then optimized by using the direct-search method. According to the results, the net power output, energy efficiency, and exergy efficiency are calculated as 186.5 MW, 31.37% and 30.41%, respectively, under normal operating conditions as a base case. The condenser is a major contributor towards the energy loss, followed by the boiler, whereas the highest irreversibilities occur in the boiler and turbine. According to the parametric study, variation in the operating parameters greatly influences the performance parameters. The regression models appear to be good estimators of the performance parameters. The optimum net power output, energy efficiency and exergy efficiency are obtained as 227.6 MW, 37.4% and 36.4%, respectively, and have been calculated along with the optimal values of the selected operating parameters.

  15. Comparative analysis of regression and artificial neural network models for wind speed prediction

    Science.gov (United States)

    Bilgili, Mehmet; Sahin, Besir

    2010-11-01

    In this study, wind speed was modeled by linear regression (LR), nonlinear regression (NLR) and artificial neural network (ANN) methods. A three-layer feedforward artificial neural network structure was constructed and a backpropagation algorithm was used for the training of ANNs. To get a successful simulation, firstly, the correlation coefficients between all of the meteorological variables (wind speed, ambient temperature, atmospheric pressure, relative humidity and rainfall) were calculated taking two variables in turn for each calculation. All independent variables were added to the simple regression model. Then, the method of stepwise multiple regression was applied for the selection of the “best” regression equation (model). Thus, the best independent variables were selected for the LR and NLR models and also used in the input layer of the ANN. The results obtained by all methods were compared to each other. Finally, the ANN method was found to provide better performance than the LR and NLR methods.

  16. Assessment of the expected construction company’s net profit using neural network and multiple regression models

    Directory of Open Access Journals (Sweden)

    H.H. Mohamad

    2013-09-01

    This research aims to develop a mathematical model for assessing the expected net profit of any construction company. To achieve the research objective, four steps were performed. First, the main factors affecting firms’ net profit were identified. Second, pertinent data regarding the net profit factors were collected. Third, two different net profit models were developed using the Multiple Regression (MR) and the Neural Network (NN) techniques. The validity of the proposed models was also investigated. Finally, the results of both MR and NN models were compared to investigate the predictive capabilities of the two models.

  17. Prediction of Rotor Spun Yarn Strength Using Adaptive Neuro-fuzzy Inference System and Linear Multiple Regression Methods

    Institute of Scientific and Technical Information of China (English)

    NURWAHA Deogratias; WANG Xin-hou

    2008-01-01

    This paper presents a comparison study of two models for predicting the strength of rotor spun cotton yarns from fiber properties. The adaptive neuro-fuzzy inference system (ANFIS) and multiple linear regression models are used to predict the rotor spun yarn strength. Fiber properties and yarn count are used as inputs to train the two models, and the count-strength-product (CSP) is the target. The predictive performances of the two models are estimated and compared. We found that the ANFIS has better predictive power than the multiple linear regression model. The impact of each fiber property is also illustrated.

  18. Modeling Information Content Via Dirichlet-Multinomial Regression Analysis.

    Science.gov (United States)

    Ferrari, Alberto

    2017-02-16

    Shannon entropy is being increasingly used in biomedical research as an index of complexity and information content in sequences of symbols, e.g. languages, amino acid sequences, DNA methylation patterns and animal vocalizations. Yet, distributional properties of information entropy as a random variable have seldom been the object of study, leading researchers to rely mainly on linear models or simulation-based analytical approaches to assess differences in information content when entropy is measured repeatedly in different experimental conditions. Here a method to perform inference on entropy in such conditions is proposed. Building on results coming from studies in the field of Bayesian entropy estimation, a symmetric Dirichlet-multinomial regression model, able to deal efficiently with the issue of mean entropy estimation, is formulated. Through a simulation study the model is shown to outperform linear modeling in a vast range of scenarios and to have promising statistical properties. As a practical example, the method is applied to a data set coming from a real experiment on animal communication.

  19. Air Pollution Analysis using Ontologies and Regression Models

    Directory of Open Access Journals (Sweden)

    Parul Choudhary

    2016-07-01

    Rapidly throughout the world economy, "the expansive Web" in the "world" explosive growth, rapidly growing market characterized by short product cycles exists and the demand for increased flexibility as well as the extensive use of a new data vision managed data society. A new socio-economic system that relies more and more on movement and allocation results in data whose daily existence, refinement, economy and adjust the exchange industry. Cooperative Engineering Co-operation and multi-disciplinary installed on people's cooperation is a good example. Semantic Web is a new form of Web content that is meaningful to computers and additional approved another example. Communication, vision sharing and exchanging data Society's are new commercial bet. Urban air pollution modeling and data processing techniques need elevated Association. Artificial intelligence in countless ways and breakthrough technologies can solve environmental problems from uneven offers. A method for data to formal ontology means a true meaning and lack of ambiguity to allow us to portray memo. In this work we survey regression models for ontologies and air pollution.

  20. Survival analysis of cervical cancer using stratified Cox regression

    Science.gov (United States)

    Purnami, S. W.; Inayati, K. D.; Sari, N. W. Wulan; Chosuvivatwong, V.; Sriplung, H.

    2016-04-01

    Cervical cancer is one of the leading causes of cancer death among women worldwide, including in Indonesia. Most cervical cancer patients come to the hospital already at an advanced stage. As a result, treatment becomes more difficult and the risk of death can even increase. One parameter that can be used to assess the success of treatment is the probability of survival. This study examines the survival of cervical cancer patients at Dr. Soetomo Hospital using stratified Cox regression based on six factors: age, stage, treatment initiation, companion disease, complication, and anemia. The stratified Cox model is used because one independent variable, stage, does not satisfy the proportional hazards assumption. The results of the stratified Cox model show that the complication variable is a significant factor influencing the survival probability of cervical cancer patients. The estimated hazard ratio is 7.35, meaning that a cervical cancer patient with complications has a risk of dying 7.35 times that of a patient without complications. The adjusted survival curves show that stage IV has the lowest probability of survival.
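
    The link between a Cox regression coefficient and the reported hazard ratio can be sketched as follows. In a Cox model the hazard ratio for a binary covariate is exp(beta). Only the value 7.35 comes from the abstract; the standard error used below is a hypothetical number for illustration, and the function names are assumptions.

```python
import math


def hazard_ratio(beta):
    """Hazard ratio implied by a Cox regression coefficient."""
    return math.exp(beta)


def hazard_ratio_ci(beta, se, z=1.96):
    """Approximate (Wald) confidence interval for the hazard ratio."""
    return math.exp(beta - z * se), math.exp(beta + z * se)


if __name__ == "__main__":
    beta = math.log(7.35)    # coefficient implied by the reported hazard ratio
    se = 0.5                 # hypothetical standard error, not from the study
    low, high = hazard_ratio_ci(beta, se)
    print(round(hazard_ratio(beta), 2), round(low, 2), round(high, 2))
```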

  1. Simulation Experiments in Practice : Statistical Design and Regression Analysis

    NARCIS (Netherlands)

    Kleijnen, J.P.C.

    2007-01-01

    In practice, simulation analysts often change only one factor at a time, and use graphical analysis of the resulting Input/Output (I/O) data. The goal of this article is to change these traditional, naïve methods of design and analysis, because statistical theory proves that more information is obta

  2. A comparative study between the use of artificial neural networks and multiple linear regression for caustic concentration prediction in a stage of alumina production

    Directory of Open Access Journals (Sweden)

    Giovanni Leopoldo Rozza

    2015-09-01

    With the world becoming more of a global village each day, enterprises continuously seek to optimize their internal processes to maintain or improve their competitiveness and make better use of natural resources. In this context, decision support tools are an underlying requirement. Such tools help predict operational issues and avoid rising costs, loss of productivity, work-related accident leave, or environmental disasters. This paper focuses on predicting the spent liquor caustic concentration in the Bayer process for alumina production. Measuring caustic concentration is essential to keep it at expected levels; otherwise quality issues might arise. The organization requests the caustic concentration from the chemical analysis laboratory once a day; this information is not enough to trigger preventive actions against process inefficiencies, which become known only after a new measurement the next day. This paper therefore proposes a mathematical model, built with Multiple Linear Regression and Artificial Neural Network techniques, to predict the spent liquor's caustic concentration, so that preventive actions can occur in real time. The models were built using a software tool for numerical computation (MATLAB) and a statistical analysis software package (SPSS). The model outputs (predicted caustic concentration) were compared with the real lab data. We found evidence suggesting superior results with Artificial Neural Networks over the Multiple Linear Regression model. The results demonstrate that replacing laboratory analysis with the forecasting model to support technical staff in decision making could be feasible.

  3. A regression analysis on the green olives debittering

    Directory of Open Access Journals (Sweden)

    Kopsidas, Gerassimos C.

    1991-12-01

    In this paper, a regression model is fitted that gives the debittering time t as a function of the sodium hydroxide concentration C and the debittering temperature T for medium-size green olive fruit of the Conservolea variety. The model has the simple form $t = a_0 C^{a_1} e^{a_2/T}$, where $a_0$, $a_1$, and $a_2$ are constants. The values of $a_0$, $a_1$, and $a_2$ are determined by the method of least squares from a set of experimental data. The fitted model is very satisfactory for the conditions under which Greek green olives are debittered.
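
    A model of the form t = a0 · C^a1 · e^(a2/T) becomes linear after taking logarithms: ln t = ln a0 + a1 ln C + a2/T. The sketch below fits it by ordinary least squares on that log scale. Whether the authors fitted on the log scale or directly is not stated in the abstract, so this is an assumption, as are the function names, the use of absolute temperature, and the synthetic data in the usage example.

```python
import math


def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_debittering_model(conc, temp, time):
    """Least-squares fit of ln t = ln a0 + a1 ln C + a2 / T.

    Returns (a0, a1, a2). The temperature regressor is scaled to 1000/T
    internally so the normal equations stay well conditioned.
    """
    rows = [[1.0, math.log(c), 1000.0 / T] for c, T in zip(conc, temp)]
    y = [math.log(t) for t in time]
    # Normal equations: (X^T X) b = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    ln_a0, a1, b2 = solve_linear(xtx, xty)
    return math.exp(ln_a0), a1, 1000.0 * b2


if __name__ == "__main__":
    # Synthetic, noise-free data generated from hypothetical constants
    conc = [c for c in (1.0, 2.0, 3.0) for _ in range(3)]
    temp = [T for _ in range(3) for T in (288.0, 298.0, 308.0)]
    time = [0.002 * c ** -0.8 * math.exp(2500.0 / T) for c, T in zip(conc, temp)]
    print(fit_debittering_model(conc, temp, time))
```

    Fitting on the log scale minimizes relative rather than absolute errors in t; a direct nonlinear fit would weight long debittering times more heavily.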

  4. Introduction to mixed modelling beyond regression and analysis of variance

    CERN Document Server

    Galwey, N W

    2007-01-01

    Mixed modelling is one of the most promising and exciting areas of statistical analysis, enabling more powerful interpretation of data through the recognition of random effects. However, many perceive mixed modelling as an intimidating and specialized technique.

  5. Robust analysis of trends in noisy tokamak confinement data using geodesic least squares regression

    Science.gov (United States)

    Verdoolaege, G.; Shabbir, A.; Hornung, G.

    2016-11-01

    Regression analysis is a very common activity in fusion science for unveiling trends and parametric dependencies, but it can be a difficult matter. We have recently developed the method of geodesic least squares (GLS) regression that is able to handle errors in all variables, is robust against data outliers and uncertainty in the regression model, and can be used with arbitrary distribution models and regression functions. We here report on first results of application of GLS to estimation of the multi-machine scaling law for the energy confinement time in tokamaks, demonstrating improved consistency of the GLS results compared to standard least squares.

  6. Sequential Monte Carlo tracking of the marginal artery by multiple cue fusion and random forest regression.

    Science.gov (United States)

    Cherry, Kevin M; Peplinski, Brandon; Kim, Lauren; Wang, Shijun; Lu, Le; Zhang, Weidong; Liu, Jianfei; Wei, Zhuoshi; Summers, Ronald M

    2015-01-01

    Given the potential importance of marginal artery localization in automated registration in computed tomography colonography (CTC), we have devised a semi-automated method of marginal vessel detection employing sequential Monte Carlo tracking (also known as particle filtering tracking) by multiple cue fusion based on intensity, vesselness, organ detection, and minimum spanning tree information for poorly enhanced vessel segments. We then employed a random forest algorithm for intelligent cue fusion and decision making which achieved high sensitivity and robustness. After applying a vessel pruning procedure to the tracking results, we achieved statistically significantly improved precision compared to a baseline Hessian detection method (2.7% versus 75.2%, prandom forest) with a sequential Monte Carlo tracking mechanism. In so doing, we present the effective application of an anatomical probability map to vessel pruning as well as a supplementary spatial coordinate system for colonic segmentation and registration when this task has been confounded by colon lumen collapse.

  7. Relative accuracy of spatial predictive models for lynx Lynx canadensis derived using logistic regression-AIC, multiple criteria evaluation and Bayesian approaches

    Directory of Open Access Journals (Sweden)

    Shelley M. ALEXANDER

    2009-02-01

    We compared probability surfaces derived using one set of environmental variables in three Geographic Information Systems (GIS)-based approaches: logistic regression and Akaike’s Information Criterion (AIC), Multiple Criteria Evaluation (MCE), and Bayesian Analysis (specifically Dempster-Shafer theory). We used lynx Lynx canadensis as our focal species, and developed our environment relationship model using track data collected in Banff National Park, Alberta, Canada, during winters from 1997 to 2000. The accuracy of the three spatial models was compared using a contingency table method. We determined the percentage of cases in which both presence and absence points were correctly classified (overall accuracy), the failure to predict a species where it occurred (omission error) and the prediction of presence where there was absence (commission error). Our overall accuracy showed the logistic regression approach was the most accurate (74.51%). The multiple criteria evaluation was intermediate (39.22%), while the Dempster-Shafer (D-S) theory model was the poorest (29.90%). However, omission and commission error tell us a different story: logistic regression had the lowest commission error, while D-S theory produced the lowest omission error. Our results provide evidence that habitat modellers should evaluate all three error measures when ascribing confidence in their model. We suggest that for our study area at least, the logistic regression model is optimal. However, where sample size is small or the species is very rare, it may also be useful to explore and/or use a more ecologically cautious modelling approach (e.g. Dempster-Shafer) that would over-predict, protect more sites, and thereby minimize the risk of missing critical habitat in conservation plans [Current Zoology 55(1): 28–40, 2009].
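
    The three error measures named in the abstract can be sketched from a presence/absence contingency table as below. The exact denominators used by the authors are not given; this sketch assumes omission error is the share of true presences predicted absent and commission error is the share of true absences predicted present. The function name and the example vectors are hypothetical.

```python
def classification_errors(observed, predicted):
    """Overall accuracy, omission error and commission error.

    `observed` and `predicted` are equal-length sequences of truthy
    (presence) or falsy (absence) values.
    """
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o and p)          # presence correctly predicted
    tn = sum(1 for o, p in pairs if not o and not p)  # absence correctly predicted
    fp = sum(1 for o, p in pairs if not o and p)      # presence predicted, absent
    fn = sum(1 for o, p in pairs if o and not p)      # absence predicted, present
    n = len(pairs)
    nan = float("nan")
    return {
        "overall_accuracy": (tp + tn) / n if n else nan,
        "omission_error": fn / (tp + fn) if (tp + fn) else nan,
        "commission_error": fp / (fp + tn) if (fp + tn) else nan,
    }


if __name__ == "__main__":
    observed = [1, 1, 1, 0, 0, 0, 0, 1]    # hypothetical survey points
    predicted = [1, 0, 1, 0, 1, 0, 0, 1]   # hypothetical model output
    print(classification_errors(observed, predicted))
```

    A model that over-predicts presence lowers omission error at the cost of commission error, which is why the abstract recommends reporting all three measures.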

  8. Relative accuracy of spatial predictive models for lynx Lynx canadensis derived using logistic regression-AIC, multiple criteria evaluation and Bayesian approaches

    Institute of Scientific and Technical Information of China (English)

    Hejun KANG; Shelley M.ALEXANDER

    2009-01-01

    We compared probability surfaces derived using one set of environmental variables in three Geographic Information Systems (GIS)-based approaches: logistic regression and Akaike's Information Criterion (AIC), Multiple Criteria Evaluation (MCE), and Bayesian Analysis (specifically Dempster-Shafer theory). We used lynx Lynx canadensis as our focal species, and developed our environment relationship model using track data collected in Banff National Park, Alberta, Canada, during winters from 1997 to 2000. The accuracy of the three spatial models was compared using a contingency table method. We determined the percentage of cases in which both presence and absence points were correctly classified (overall accuracy), the failure to predict a species where it occurred (omission error) and the prediction of presence where there was absence (commission error). Our overall accuracy showed the logistic regression approach was the most accurate (74.51%). The multiple criteria evaluation was intermediate (39.22%), while the Dempster-Shafer (D-S) theory model was the poorest (29.90%). However, omission and commission error tell us a different story: logistic regression had the lowest commission error, while D-S theory produced the lowest omission error. Our results provide evidence that habitat modellers should evaluate all three error measures when ascribing confidence in their model. We suggest that for our study area at least, the logistic regression model is optimal. However, where sample size is small or the species is very rare, it may also be useful to explore and/or use a more ecologically cautious modelling approach (e.g. Dempster-Shafer) that would over-predict, protect more sites, and thereby minimize the risk of missing critical habitat in conservation plans.

  9. A Noncentral "t" Regression Model for Meta-Analysis

    Science.gov (United States)

    Camilli, Gregory; de la Torre, Jimmy; Chiu, Chia-Yi

    2010-01-01

    In this article, three multilevel models for meta-analysis are examined. Hedges and Olkin suggested that effect sizes follow a noncentral "t" distribution and proposed several approximate methods. Raudenbush and Bryk further refined this model; however, this procedure is based on a normal approximation. In the current research literature, this…

  10. Development of a User Interface for a Regression Analysis Software Tool

    Science.gov (United States)

    Ulbrich, Norbert Manfred; Volden, Thomas R.

    2010-01-01

    An easy-to-use user interface was implemented in a highly automated regression analysis tool. The user interface was developed from the start to run on computers that use the Windows, Macintosh, Linux, or UNIX operating system. Many user interface features were specifically designed such that a novice or inexperienced user can apply the regression analysis tool with confidence. Therefore, the user interface's design minimizes interactive input from the user. In addition, reasonable default combinations are assigned to those analysis settings that influence the outcome of the regression analysis. These default combinations will lead to a successful regression analysis result for most experimental data sets. The user interface comes in two versions. The text user interface version is used for the ongoing development of the regression analysis tool. The official release of the regression analysis tool, on the other hand, has a graphical user interface that is more efficient to use. This graphical user interface displays all input file names, output file names, and analysis settings for a specific software application mode on a single screen, which makes it easier to generate reliable analysis results and to perform input parameter studies. An object-oriented approach was used for the development of the graphical user interface. This choice keeps future software maintenance costs to a reasonable limit. Examples of both the text user interface and graphical user interface are discussed in order to illustrate the user interface's overall design approach.

  11. Random Decrement and Regression Analysis of Traffic Responses of Bridges

    DEFF Research Database (Denmark)

    Asmussen, J. C.; Ibrahim, S. R.; Brincker, Rune

    The topic of this paper is the estimation of modal parameters from ambient data by applying the Random Decrement technique. The data from the Queensborough Bridge over the Fraser River in Vancouver, Canada have been applied. The loads producing the dynamic response are ambient, e.g. wind, traffic...... and small ground motion. The Random Decrement technique is used to estimate the correlation function or the free decays from the ambient data. From these functions, the modal parameters are extracted using the Ibrahim Time Domain method. The possible influence of the traffic mass load on the bridge...... of the analysis using the Random Decrement technique are compared with results from an analysis based on fast Fourier transformations....

  12. Random Decrement and Regression Analysis of Traffic Responses of Bridges

    DEFF Research Database (Denmark)

    Asmussen, J. C.; Ibrahim, S. R.; Brincker, Rune

    1996-01-01

    The topic of this paper is the estimation of modal parameters from ambient data by applying the Random Decrement technique. The data from the Queensborough Bridge over the Fraser River in Vancouver, Canada have been applied. The loads producing the dynamic response are ambient, e.g. wind, traffic...... and small ground motion. The Random Decrement technique is used to estimate the correlation function or the free decays from the ambient data. From these functions, the modal parameters are extracted using the Ibrahim Time Domain method. The possible influence of the traffic mass load on the bridge...... of the analysis using the Random Decrement technique are compared with results from an analysis based on fast Fourier transformations....
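
    The core averaging step of the Random Decrement technique can be sketched as follows: segments of the measured response that start at a triggering condition are averaged, and the average approximates a free decay (or, up to scaling, the correlation function). The abstract does not state which triggering condition was used; the level up-crossing trigger below, the function name, and the test signal are assumptions for illustration. The subsequent Ibrahim Time Domain step is not shown.

```python
import math


def random_decrement(signal, level, n_lags):
    """Random decrement signature from a level up-crossing trigger.

    Returns (signature, number_of_segments). Each segment starts at a
    sample where the signal crosses `level` from below and has length
    `n_lags`; the signature is the sample-wise mean of all segments.
    """
    starts = [
        i
        for i in range(1, len(signal) - n_lags + 1)
        if signal[i - 1] < level <= signal[i]
    ]
    if not starts:
        raise ValueError("no triggering points found")
    signature = [
        sum(signal[s + k] for s in starts) / len(starts) for k in range(n_lags)
    ]
    return signature, len(starts)


if __name__ == "__main__":
    # Hypothetical response: a pure sinusoid with a period of 50 samples
    response = [math.sin(2 * math.pi * i / 50) for i in range(1000)]
    signature, count = random_decrement(response, level=0.5, n_lags=50)
    print(count, round(signature[0], 3))
```

    With real ambient data the random part of the response averages out as more segments are collected, so the number of triggering points governs how smooth the estimated decay is.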

  13. The Use of Nonparametric Kernel Regression Methods in Econometric Production Analysis

    DEFF Research Database (Denmark)

    Czekaj, Tomasz Gerard

    This PhD thesis addresses one of the fundamental problems in applied econometric analysis, namely the econometric estimation of regression functions. The conventional approach to regression analysis is the parametric approach, which requires the researcher to specify the form of the regression...... function. However, the a priori specification of a functional form involves the risk of choosing one that is not similar to the “true” but unknown relationship between the regressors and the dependent variable. This problem, known as parametric misspecification, can result in biased parameter estimates...... and nonparametric estimations of production functions in order to evaluate the optimal firm size. The second paper discusses the use of parametric and nonparametric regression methods to estimate panel data regression models. The third paper analyses production risk, price uncertainty, and farmers' risk preferences...

  14. Logistic regression analysis of the risk factors of acute renal failure complicating limb war injuries

    Directory of Open Access Journals (Sweden)

    Chang-zhi CHENG

    2011-06-01

    Objective: To explore the risk factors for acute renal failure (ARF) complicating war injuries of the limbs. Methods: The clinical data of 352 patients with limb injuries admitted to 303 Hospital of PLA from 1968 to 2002 were retrospectively analyzed. The patients were divided into an ARF group (n=9) and a non-ARF group (n=343) according to the occurrence of ARF, and a case-control study was carried out. Ten factors which might lead to death were analyzed by logistic regression to screen the risk factors for ARF: cause of trauma, shock after injury, time of admission to hospital after injury, injured sites, combined trauma, number of surgical procedures, presence of foreign matter, features of fractures, amputation, and tourniquet time. Results: Fifteen of the 352 patients died (4.3%); among them, 7 patients (46.7%) died of ARF, 3 (20.0%) of pulmonary embolism, 3 (20.0%) of gas gangrene, and 2 (13.3%) of multiple organ failure. Univariate analysis revealed that shock, time before admission to hospital, amputation, and tourniquet time were risk factors for ARF in the wounded with limb injuries, while the logistic regression analysis showed that only amputation was a risk factor for ARF (P < 0.05). Conclusion: ARF is the primary cause of death in the wounded with limb injury. Prompt and accurate treatment and optimal timing of amputation may help decrease the incidence and mortality of ARF in the wounded with severe limb injury and ischemic necrosis.

  15. Multiple linear regression analysis of hip function and vitamin D levels before and after hip arthroplasty

    Institute of Scientific and Technical Information of China (English)

    张炜; 汤在祥; 耿德春; 朱锋; 董汉青; 王熠军; 徐耀增

    2016-01-01

    OBJECTIVE: To determine the prevalence of low serum vitamin D levels in patients before total hip arthroplasty and its relationship with hip function scores. METHODS: Forty-eight hips from 48 patients undergoing primary hip arthroplasty from July 2013 to August 2014 in the First Affiliated Hospital of Suzhou University were enrolled. According to the serum level of vitamin D, patients were assigned to low-level (<20 μg/L) and high-level (≥20 μg/L) groups. The general information of the patients and the hip function scores before replacement and at the last follow-up were observed and compared between the two groups. The relationship between the serum level of vitamin D and the hip function scores before and after replacement was analyzed by multiple linear regression analysis. The average follow-up was 12 months (11-14 months). RESULTS AND CONCLUSION: (1) The incidence of low vitamin D level was 82% (with 20 ng/mL serving as the standard). (2) Compared with patients with a high vitamin D level, patients with a low vitamin D level had lower Harris scores and Merle d'Aubigné-Postel scores both preoperatively (P<0.05) and at the last follow-up (P<0.05). (3) Based on the preoperative and postoperative Harris scores, the multiple linear regression analysis showed a positive correlation between the level of vitamin D and the Harris score both preoperatively and postoperatively (P<0.05). (4) These results suggest that low vitamin D levels are common in patients undergoing arthroplasty, and that hip function scores before and after replacement are lower in patients with low vitamin D levels than in patients with high levels. Moreover, there is a positive correlation between the level of vitamin D and the hip joint function scores. Therefore, it is advisable to supplement vitamin D and calcium preoperatively, and the level of vitamin D will be helpful for disease assessment and prognosis.

  16. A review of the most relevant multiple regression models for sales forecasting in gas stations

    Energy Technology Data Exchange (ETDEWEB)

    Wanke, Peter [Universidade Federal do Rio de Janeiro (UFRJ), RJ (Brazil). Instituto de Pesquisa e Pos-Graduacao em Administracao de Empresas (COPPEAD). Centro de Estudos em Logistica]

    2004-07-01

    In this paper, the most relevant multiple regression models for sales forecasting of gas stations, developed over the past ten years, are reviewed. The most significant variables related to gas station sales, the types of the multiple regression models (linear or non-linear), the most common uses in supporting decision making and its limits are presented. The predictive power of each model and its impact on decision-making, such as sensitivity analysis and confidence intervals for independent variables, are also commented. Four models are presented, based on studies conducted in South Africa, Portugal and Brazil. In conclusion, suggestions for future developments are presented based on past developments. (author)

  17. Key To Effective English Remedial Education: Intimation Derived From Multiple Regression

    Science.gov (United States)

    Zhang, Rong; Ishino, Fukuya

    2009-05-01

    With the rapid decrease in the younger population, Japanese universities and colleges face the challenging task of reaching their annual quota of incoming students. Admission criteria have been lowered, and students with a broad variety of scholastic abilities are being accepted by higher education institutions. The decline in freshmen's academic performance is said to be the most crucial factor hindering the implementation of an effective curriculum. Many universities and colleges have had to establish remedial education programs to deal with this problem, which arises from the limited room for student selection. This paper reports on an English remedial education program carried out at Nishinippon Institute of Technology, Japan, examining the validity of its course setting and optimizing the prediction models for students' post-course score changes. The analysis focuses on the determinants shown to be responsible for improving students' English proficiency, supporting the argument that more effective English remedial education can be realized through appropriate instruction and teaching methodology in courses at different levels.

  18. Seasonal forecasting of Bangladesh summer monsoon rainfall using simple multiple regression model

    Indian Academy of Sciences (India)

    Md Mizanur Rahman; M Rafiuddin; Md Mahbub Alam

    2013-04-01

    In this paper, the development of a statistical forecasting method for summer monsoon rainfall over Bangladesh is described. Predictors for Bangladesh summer monsoon (June–September) rainfall were identified from the large scale ocean–atmospheric circulation variables (i.e., sea-surface temperature, surface air temperature and sea level pressure). The predictors exhibited a significant relationship with Bangladesh summer monsoon rainfall during the period 1961–2007. After a detailed analysis of various global climate datasets, three predictors were selected. The model performance was evaluated during the period 1977–2007. The model showed better performance in its hindcasts of seasonal monsoon rainfall over Bangladesh. The RMSE and Heidke skill score for the 31 years were 8.13 and 0.37, respectively, and the correlation between the predicted and observed rainfall was 0.74. The BIAS of the forecasts (% of long period average, LPA) was −0.85 and the Hit score was 58%. The experimental forecasts for the year 2008 summer monsoon rainfall based on the model were also found to be in good agreement with the observation.

  19. A comparison of neural network models, fuzzy logic, and multiple linear regression for prediction of hatchability.

    Science.gov (United States)

    Mehri, M

    2013-04-01

    Application of appropriate models to approximate the performance function warrants more precise prediction and helps to make the best decisions in the poultry industry. This study reevaluated the factors affecting hatchability in laying hens from 29 to 56 wk of age. Twenty-eight data lines representing 4 inputs consisting of egg weight, eggshell thickness, egg sphericity, and yolk/albumin ratio and 1 output, hatchability, were obtained from the literature and used to train an artificial neural network (ANN). The prediction ability of ANN was compared with that of fuzzy logic to evaluate the fitness of these 2 methods. The models were compared using R², mean absolute deviation (MAD), mean squared error (MSE), mean absolute percentage error (MAPE), and bias. The developed model was used to assess the relative importance of each variable on the hatchability by calculating the variable sensitivity ratio. The statistical evaluations showed that the ANN-based model predicted hatchability more accurately than fuzzy logic. The ANN-based model had a higher coefficient of determination (R² = 0.99) and lower residual distribution (MAD = 0.005; MSE = 0.00004; MAPE = 0.732; bias = 0.0012) than fuzzy logic (R² = 0.87; MAD = 0.014; MSE = 0.0004; MAPE = 2.095; bias = 0.0046). The sensitivity analysis revealed that the most important variable in the ANN-based model of hatchability was egg weight (variable sensitivity ratio, VSR = 283.11), followed by yolk/albumin ratio (VSR = 113.16), eggshell thickness (VSR = 16.23), and egg sphericity (VSR = 3.63). The results of this research showed that the universal approximation capability of ANN made it a powerful tool to approximate complex functions such as hatchability in the incubation process.
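
    The five goodness-of-fit measures used to compare the models can be sketched as below. The abstract does not spell out its formulas, so common definitions are assumed: R² as 1 minus the ratio of residual to total sum of squares, MAPE in percent, and bias as the mean of predicted minus observed. The function name and the example values are hypothetical.

```python
def regression_metrics(observed, predicted):
    """R^2, MAD, MSE, MAPE (percent) and bias for paired observations."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mean_obs = sum(observed) / n
    ss_res = sum(e * e for e in errors)                   # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)   # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mad": sum(abs(e) for e in errors) / n,
        "mse": ss_res / n,
        # MAPE assumes no observed value is zero
        "mape": 100.0 * sum(abs(e / o) for e, o in zip(errors, observed)) / n,
        "bias": sum(errors) / n,
    }


if __name__ == "__main__":
    observed = [2.0, 4.0, 6.0, 8.0]     # hypothetical measurements
    predicted = [2.5, 3.5, 6.5, 7.5]    # hypothetical model output
    for name, value in regression_metrics(observed, predicted).items():
        print(name, round(value, 4))
```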

  20. Regression analysis in modeling of air surface temperature and factors affecting its value in Peninsular Malaysia

    Science.gov (United States)

    Rajab, Jasim Mohammed; Jafri, Mohd. Zubir Mat; Lim, Hwee San; Abdullah, Khiruddin

    2012-10-01

    This study encompasses air surface temperature (AST) modeling in the lower atmosphere. Data on four atmospheric pollutant gases (CO, O3, CH4, and H2O), retrieved from the National Aeronautics and Space Administration Atmospheric Infrared Sounder (AIRS) for 2003 to 2008, were used to develop a model to predict AST values in the Malaysian peninsula using the multiple regression method. For the entire period, the pollutants were highly correlated (R=0.821) with predicted AST. Comparisons among five stations in 2009 showed close agreement between the predicted AST and the observed AST from AIRS, especially in the southwest monsoon (SWM) season, within 1.3 K, and for in situ data, within 1 to 2 K. Validation of the predicted AST against AST from AIRS showed high correlation coefficients (R=0.845 to 0.918), indicating the model's efficiency and accuracy. Statistical analysis in terms of β showed that H2O (0.565 to 1.746) tended to contribute significantly to high AST values during the northeast monsoon season. Generally, these results clearly indicate the advantage of using the satellite AIRS data and a correlation analysis study to investigate the impact of atmospheric greenhouse gases on AST over the Malaysian peninsula. A model was developed that is capable of retrieving AST over the Malaysian peninsula in all weather conditions, with total uncertainties ranging between 1 and 2 K.

  1. A primer for biomedical scientists on how to execute model II linear regression analysis.

    Science.gov (United States)

    Ludbrook, John

    2012-04-01

    1. There are two very different ways of executing linear regression analysis. One is Model I, when the x-values are fixed by the experimenter. The other is Model II, in which the x-values are free to vary and are subject to error. 2. I have received numerous complaints from biomedical scientists that they have great difficulty in executing Model II linear regression analysis. This may explain the results of a Google Scholar search, which showed that the authors of articles in journals of physiology, pharmacology and biochemistry rarely use Model II regression analysis. 3. I repeat my previous arguments in favour of using least products linear regression analysis for Model II regressions. I review three methods for executing ordinary least products (OLP) and weighted least products (WLP) regression analysis: (i) scientific calculator and/or computer spreadsheet; (ii) specific purpose computer programs; and (iii) general purpose computer programs. 4. Using a scientific calculator and/or computer spreadsheet, it is easy to obtain correct values for OLP slope and intercept, but the corresponding 95% confidence intervals (CI) are inaccurate. 5. Using specific purpose computer programs, the freeware computer program smatr gives the correct OLP regression coefficients and obtains 95% CI by bootstrapping. In addition, smatr can be used to compare the slopes of OLP lines. 6. When using general purpose computer programs, I recommend the commercial programs systat and Statistica for those who regularly undertake linear regression analysis and I give step-by-step instructions in the Supplementary Information as to how to use loss functions.
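
    The point estimates of ordinary least products (OLP) regression, also known as geometric mean or reduced major axis regression, can be sketched as follows: the slope is the ratio of the standard deviations of y and x, carrying the sign of their covariance, and the line passes through the means. This is only a minimal illustration; it does not compute the confidence intervals discussed in the abstract, and the function name and data are hypothetical.

```python
import math


def olp_regression(x, y):
    """Ordinary least products (geometric mean) regression of y on x.

    Returns (slope, intercept).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0 or sxy == 0:
        raise ValueError("slope is undefined for constant or uncorrelated data")
    # Slope magnitude is sd(y) / sd(x); its sign follows the covariance.
    slope = math.copysign(math.sqrt(syy / sxx), sxy)
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0]      # hypothetical measurements, method A
    y = [2.1, 3.9, 6.2, 7.8, 10.1]     # hypothetical measurements, method B
    print(olp_regression(x, y))
```

    Unlike ordinary least squares, the OLP line is symmetric in x and y: regressing x on y gives the reciprocal slope, which suits Model II data where both variables carry error.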

  2. Principal Component and Multiple Regression Analyses for the Estimation of Suspended Sediment Yield in Ungauged Basins of Northern Thailand

    Directory of Open Access Journals (Sweden)

    Piyawat Wuttichaikitcharoen

    2014-08-01

    Predicting sediment yield is necessary for good land and water management in any river basin. However, sediment data are sometimes either unavailable or sparse, which renders estimating sediment yield a daunting task. The present study investigates the factors influencing suspended sediment yield using principal component analysis (PCA). Additionally, regression relationships for estimating suspended sediment yield, based on the key factors selected from the PCA, are developed. The PCA shows six components of key factors that can explain at least up to 86.7% of the variation of all variables. The regression models show that basin size, channel network characteristics, land use, basin steepness and rainfall distribution are the key factors affecting sediment yield. The validation of regression relationships for estimating suspended sediment yield shows the error of estimation ranging from −55% to +315% and −59% to +259% for suspended sediment yield and for area-specific suspended sediment yield, respectively. The proposed relationships may be considered useful for predicting suspended sediment yield in ungauged basins of Northern Thailand that have geologic, climatic and hydrologic conditions similar to the study area.

  3. Comparison of Multiple Linear Regressions and Neural Networks based QSAR models for the design of new antitubercular compounds.

    Science.gov (United States)

    Ventura, Cristina; Latino, Diogo A R S; Martins, Filomena

    2013-01-01

    The performance of two QSAR methodologies, namely Multiple Linear Regressions (MLR) and Neural Networks (NN), towards the modeling and prediction of antitubercular activity was evaluated and compared. A data set of 173 potentially active compounds belonging to the hydrazide family and represented by 96 descriptors was analyzed. Models were built with Multiple Linear Regressions (MLR), single Feed-Forward Neural Networks (FFNNs), ensembles of FFNNs and Associative Neural Networks (AsNNs) using four different data sets and different types of descriptors. The predictive ability of the different techniques was assessed and discussed on the basis of different validation criteria, and the results show in general a better performance of AsNNs in terms of learning ability and prediction of antitubercular behaviors when compared with all other methods. MLR has, however, the advantage of pinpointing the most relevant molecular characteristics responsible for the behavior of these compounds against Mycobacterium tuberculosis. The best results for the larger data set (94 compounds in training set and 18 in test set) were obtained with AsNNs using seven descriptors (R² of 0.874 and RMSE of 0.437 against R² of 0.845 and RMSE of 0.472 in MLRs, for test set). Counter-Propagation Neural Networks (CPNNs) were trained with the same data sets and descriptors. From the scrutiny of the weight levels in each CPNN and the information retrieved from MLRs, a rational design of potentially active compounds was attempted. Two new compounds were synthesized and tested against M. tuberculosis showing an activity close to that predicted by the majority of the models.

  4. A conditional likelihood approach for regression analysis using biomarkers measured with batch-specific error.

    Science.gov (United States)

    Wang, Ming; Flanders, W Dana; Bostick, Roberd M; Long, Qi

    2012-12-20

    Measurement error is common in epidemiological and biomedical studies. When biomarkers are measured in batches or groups, measurement error is potentially correlated within each batch or group. In regression analysis, most existing methods are not applicable in the presence of batch-specific measurement error in predictors. We propose a robust conditional likelihood approach to account for batch-specific error in predictors when batch effect is additive and the predominant source of error, which requires no assumptions on the distribution of measurement error. Although a regression model with batch as a categorical covariable yields the same parameter estimates as the proposed conditional likelihood approach for linear regression, this result does not hold in general for all generalized linear models, in particular, logistic regression. Our simulation studies show that the conditional likelihood approach achieves better finite sample performance than the regression calibration approach or a naive approach without adjustment for measurement error. In the case of logistic regression, our proposed approach is shown to also outperform the regression approach with batch as a categorical covariate. In addition, we also examine a 'hybrid' approach combining the conditional likelihood method and the regression calibration method, which is shown in simulations to achieve good performance in the presence of both batch-specific and measurement-specific errors. We illustrate our method by using data from a colorectal adenoma study.

  5. Evaluation of syngas production unit cost of bio-gasification facility using regression analysis techniques

    Energy Technology Data Exchange (ETDEWEB)

    Deng, Yangyang; Parajuli, Prem B.

    2011-08-10

    Evaluating the economic feasibility of a bio-gasification facility requires an understanding of its unit cost under different production capacities. The objective of this study was to evaluate the unit cost of syngas production at capacities from 60 through 1800 Nm³/h using an economic model with three regression analysis techniques (simple regression, reciprocal regression, and log-log regression). The preliminary result of this study showed that the reciprocal regression analysis technique gave the best-fit curve between per-unit cost and production capacity, with a sum of error squares (SES) lower than 0.001 and a coefficient of determination (R²) of 0.996. The regression analysis techniques determined the minimum unit cost of syngas production for micro-scale bio-gasification facilities of $0.052/Nm³, under the capacity of 2,880 Nm³/h. The results of this study suggest that to reduce cost, facilities should run at a high production capacity. In addition, the contribution of this technique could be the new categorical criterion to evaluate micro-scale bio-gasification facility from the perspective of economic analysis.
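The reciprocal and log-log fits named in the abstract above can both be reduced to a straight-line fit on transformed data. The sketch below assumes the model forms cost = a + b / capacity and cost = k * capacity ** b; the abstract does not give the exact equations, so these forms and all function names are illustrative assumptions.

```python
import math


def simple_ols(u, v):
    """Least-squares line v = a + b * u; returns (a, b)."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    b = sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v)) / sum(
        (ui - mean_u) ** 2 for ui in u
    )
    a = mean_v - b * mean_u
    return a, b


def fit_reciprocal(capacity, cost):
    """Fit cost = a + b / capacity (assumed form); returns (a, b)."""
    return simple_ols([1.0 / c for c in capacity], cost)


def fit_log_log(capacity, cost):
    """Fit cost = k * capacity ** b (assumed form); returns (k, b)."""
    log_k, b = simple_ols([math.log(c) for c in capacity], [math.log(y) for y in cost])
    return math.exp(log_k), b
```

In the reciprocal form, the intercept `a` is the cost floor approached at very large capacity, which matches the abstract's conclusion that unit cost falls as capacity rises.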

  6. Performance of an Axisymmetric Rocket Based Combined Cycle Engine During Rocket Only Operation Using Linear Regression Analysis

    Science.gov (United States)

    Smith, Timothy D.; Steffen, Christopher J., Jr.; Yungster, Shaye; Keller, Dennis J.

    1998-01-01

    The all rocket mode of operation is shown to be a critical factor in the overall performance of a rocket based combined cycle (RBCC) vehicle. An axisymmetric RBCC engine was used to determine specific impulse efficiency values based upon both full flow and gas generator configurations. Design of experiments methodology was used to construct a test matrix and multiple linear regression analysis was used to build parametric models. The main parameters investigated in this study were: rocket chamber pressure, rocket exit area ratio, injected secondary flow, mixer-ejector inlet area, mixer-ejector area ratio, and mixer-ejector length-to-inlet diameter ratio. A perfect gas computational fluid dynamics analysis, using both the Spalart-Allmaras and k-omega turbulence models, was performed with the NPARC code to obtain values of vacuum specific impulse. Results from the multiple linear regression analysis showed that for both the full flow and gas generator configurations increasing mixer-ejector area ratio and rocket area ratio increase performance, while increasing mixer-ejector inlet area ratio and mixer-ejector length-to-diameter ratio decrease performance. Increasing injected secondary flow increased performance for the gas generator analysis, but was not statistically significant for the full flow analysis. Chamber pressure was found to be not statistically significant.

  7. Regression analysis understanding and building business and economic models using Excel

    CERN Document Server

    Wilson, J Holton

    2012-01-01

    The technique of regression analysis is used so often in business and economics today that an understanding of its use is necessary for almost everyone engaged in the field. This book will teach you the essential elements of building and understanding regression models in a business/economic context in an intuitive manner. The authors take a non-theoretical treatment that is accessible even if you have a limited statistical background. It is specifically designed to teach the correct use of regression, while advising you of its limitations and teaching about common pitfalls. This book describe

  8. [Local Regression Algorithm Based on Net Analyte Signal and Its Application in Near Infrared Spectral Analysis].

    Science.gov (United States)

    Zhang, Hong-guang; Lu, Jian-gang

    2016-02-01

    To overcome the problems of significant differences among samples and nonlinearity between the property and spectra of samples in spectral quantitative analysis, a local regression algorithm is proposed in this paper. In this algorithm, the net analyte signal (NAS) method was first used to obtain the net analyte signal of the calibration samples and unknown samples; then the Euclidean distance between the net analyte signal of the unknown sample and the net analyte signals of the calibration samples was calculated and used as a similarity index. According to the defined similarity index, a local calibration set was individually selected for each unknown sample. Finally, a local PLS regression model was built on the local calibration set of each unknown sample. The proposed method was applied to a set of near infrared spectra of meat samples. The results demonstrate that the prediction precision and model complexity of the proposed method are superior to those of the global PLS regression method and of the conventional local regression algorithm based on spectral Euclidean distance.

  9. Quantile regression for the statistical analysis of immunological data with many non-detects

    OpenAIRE

    Eilers Paul HC; Röder Esther; Savelkoul Huub FJ; van Wijk Roy

    2012-01-01

    Abstract Background Immunological parameters are hard to measure. A well-known problem is the occurrence of values below the detection limit, the non-detects. Non-detects are a nuisance, because classical statistical analyses, like ANOVA and regression, cannot be applied. The more advanced statistical techniques currently available for the analysis of datasets with non-detects can only be used if a small percentage of the data are non-detects. Methods and results Quantile regression, a genera...

  10. Multiple factor analysis by example using R

    CERN Document Server

    Pagès, Jérôme

    2014-01-01

    Multiple factor analysis (MFA) enables users to analyze tables of individuals and variables in which the variables are structured into quantitative, qualitative, or mixed groups. Written by the co-developer of this methodology, Multiple Factor Analysis by Example Using R brings together the theoretical and methodological aspects of MFA. It also includes examples of applications and details of how to implement MFA using an R package (FactoMineR). The first two chapters cover the basic factorial analysis methods of principal component analysis (PCA) and multiple correspondence analysis (MCA). The

  11. Regularized Multiple-Set Canonical Correlation Analysis

    Science.gov (United States)

    Takane, Yoshio; Hwang, Heungsun; Abdi, Herve

    2008-01-01

    Multiple-set canonical correlation analysis (Generalized CANO or GCANO for short) is an important technique because it subsumes a number of interesting multivariate data analysis techniques as special cases. More recently, it has also been recognized as an important technique for integrating information from multiple sources. In this paper, we…

  12. Distance Based Root Cause Analysis and Change Impact Analysis of Performance Regressions

    Directory of Open Access Journals (Sweden)

    Junzan Zhou

    2015-01-01

    Performance regression testing is applied to uncover both performance and functional problems of software releases. A performance problem revealed by performance testing can be high response time, low throughput, or even being out of service. A mature performance testing process helps systematically detect software performance problems. However, it is difficult to identify the root cause and evaluate the potential change impact. In this paper, we present an approach leveraging server-side logs for identifying root causes of performance problems. First, server-side logs are used to recover the call tree of each business transaction. We define a novel distance-based metric computed from call trees for root cause analysis and apply an inverted index from methods to business transactions for change impact analysis. Empirical studies show that our approach can effectively and efficiently help developers diagnose the root cause of performance problems.

  13. Biosensors and multiple mycotoxin analysis

    NARCIS (Netherlands)

    Gaag, B. van der; Spath, S.; Dietrich, H.; Stigter, E.; Boonzaaijer, G.; Osenbruggen, T. van; Koopal, K.

    2003-01-01

    An immunochemical biosensor assay for the detection of multiple mycotoxins in a sample is described. The inhibition assay is designed to measure four different mycotoxins in a single measurement, following extraction, sample clean-up and incubation with an appropriate cocktail of anti-mycotoxin antib

  14. Combining different functions to describe milk, fat, and protein yield in goats using Bayesian multiple-trait random regression models.

    Science.gov (United States)

    Oliveira, H R; Silva, F F; Siqueira, O H G B D; Souza, N O; Junqueira, V S; Resende, M D V; Borquis, R R A; Rodrigues, M T

    2016-05-01

    We proposed multiple-trait random regression models (MTRRM) combining different functions to describe milk yield (MY) and fat (FP) and protein (PP) percentage in dairy goat genetic evaluation by using Bayesian inference. A total of 3,856 MY, FP, and PP test-day records, measured between 2000 and 2014, from 535 first lactations of Saanen and Alpine goats, including their cross, were used in this study. The initial analyses were performed using the following single-trait random regression models (STRRM): third- and fifth-order Legendre polynomials (Leg3 and Leg5), linear B-splines with 3 and 5 knots, the Ali and Schaeffer function (Ali), and Wilmink function. Heterogeneity of residual variances was modeled considering 3 classes. After the selection of the best STRRM to describe each trait on the basis of the deviance information criterion (DIC) and posterior model probabilities (PMP), the functions were combined to compose the MTRRM. All combined MTRRM presented lower DIC values and higher PMP, showing the superiority of these models when compared to other MTRRM based only on the same function assumed for all traits. Among the combined MTRRM, those considering Ali to describe MY and PP and Leg5 to describe FP (Ali_Leg5_Ali model) presented the best fit. From the Ali_Leg5_Ali model, heritability estimates over time for MY, FP, and PP ranged from 0.25 to 0.54, 0.27 to 0.48, and 0.35 to 0.51, respectively. Genetic correlation between MY and FP, MY and PP, and FP and PP ranged from -0.58 to 0.03, -0.46 to 0.12, and 0.37 to 0.64, respectively. We concluded that combining different functions under an MTRRM approach can be a plausible alternative for joint genetic evaluation of milk yield and milk constituents in goats.

  15. The severity of Minamata disease declined in 25 years: temporal profile of the neurological findings analyzed by multiple logistic regression model.

    Science.gov (United States)

    Uchino, Makoto; Hirano, Teruyuki; Satoh, Hiroshi; Arimura, Kimiyoshi; Nakagawa, Masanori; Wakamiya, Jyunji

    2005-01-01

    Minamata disease (MD) was caused by ingestion of seafood from the methylmercury-contaminated areas. Although 50 years have passed since the discovery of MD, there have been only a few studies on the temporal profile of neurological findings in certified MD patients. Thus, we evaluated changes in neurological symptoms and signs of MD using discriminants by multiple logistic regression analysis. The severity of predictive index declined in 25 years in most of the patients. Only a few patients showed aggravation of neurological findings, which was due to complications such as spino-cerebellar degeneration. Patients with chronic MD aged over 45 years had several concomitant diseases so that their clinical pictures were complicated. It was difficult to differentiate chronic MD using statistically established discriminants based on sensory disturbance alone. In conclusion, the severity of MD declined in 25 years along with the modification by age-related concomitant disorders.

  16. Land use regression modeling of intra-urban residential variability in multiple traffic-related air pollutants

    Directory of Open Access Journals (Sweden)

    Baxter Lisa K

    2008-05-01

    Background: There is a growing body of literature linking GIS-based measures of traffic density to asthma and other respiratory outcomes. However, no consensus exists on which traffic indicators best capture variability in different pollutants or within different settings. As part of a study on childhood asthma etiology, we examined variability in outdoor concentrations of multiple traffic-related air pollutants within urban communities, using a range of GIS-based predictors and land use regression techniques. Methods: We measured fine particulate matter (PM2.5), nitrogen dioxide (NO2), and elemental carbon (EC) outside 44 homes representing a range of traffic densities and neighborhoods across Boston, Massachusetts and nearby communities. Multiple three to four-day average samples were collected at each home during winters and summers from 2003 to 2005. Traffic indicators were derived using Massachusetts Highway Department data and direct traffic counts. Multivariate regression analyses were performed separately for each pollutant, using traffic indicators, land use, meteorology, site characteristics, and central site concentrations. Results: PM2.5 was strongly associated with the central site monitor (R² = 0.68). Additional variability was explained by total roadway length within 100 m of the home, smoking or grilling near the monitor, and block-group population density (R² = 0.76). EC showed greater spatial variability, especially during winter months, and was predicted by roadway length within 200 m of the home. The influence of traffic was greater under low wind speed conditions, and concentrations were lower during summer (R² = 0.52). NO2 showed significant spatial variability, predicted by population density and roadway length within 50 m of the home, modified by site characteristics (obstruction), and with higher concentrations during summer (R² = 0.56). Conclusion: Each pollutant examined displayed somewhat different spatial patterns

  17. Improving the Robustness and Stability of Partial Least Squares Regression for Near-infrared Spectral Analysis

    Institute of Scientific and Technical Information of China (English)

    SHAO, Xueguang; CHEN, Da; XU, Heng; LIU, Zhichao; CAI, Wensheng

    2009-01-01

    Partial least-squares (PLS) regression has been presented as a powerful tool for spectral quantitative measurement. However, the improvement of the robustness and stability of PLS models is still needed, because it is difficult to build a stable model when complex samples are analyzed or outliers are contained in the calibration data set. To achieve the purpose, a robust ensemble PLS technique based on probability resampling was proposed, which is named RE-PLS. In the proposed method, a probability is firstly obtained for each calibration sample from its residual in a robust regression. Then, multiple PLS models are constructed based on probability resampling. At last, the multiple PLS models are used to predict unknown samples by taking the average of the predictions from the multiple models as final prediction result. To validate the effectiveness and universality of the proposed method, it was applied to two different sets of NIR spectra. The results show that RE-PLS can not only effectively avoid the interference of outliers but also enhance the precision of prediction and the stability of PLS regression. Thus, it may provide a useful tool for multivariate calibration with multiple outliers.

  18. Analysis of the influence factors and clinical indices for halitosis in end-stage renal disease patients through multiple logistic regression

    Institute of Scientific and Technical Information of China (English)

    赵颖; 孙玉华; 冯锦红; 韩建国

    2015-01-01

    Objective: To study the relationship between influence factors and halitosis in end-stage renal disease (ESRD) patients. Methods: Organoleptic scores (OS) and the concentration of ammonia in the breath were measured independently by organoleptic assessment and ninhydrin spectrophotometry. Factors such as gender, age, education, duration of dialysis, smoking history, ammonia, urea nitrogen, creatinine, salivary pH, and area and thickness of tongue coating were recorded. The logistic regression relationship between organoleptic scores and the other clinical indices was studied. Results: At the level of α = 0.05, single-factor logistic regression analysis showed that four factors (ammonia, urea nitrogen, thickness of tongue coating and salivary pH) were risk factors for halitosis in ESRD patients. Further multivariate logistic regression analysis showed that ammonia was the most likely risk factor for halitosis in these patients. Conclusion: Ammonia that cannot be completely metabolized in the body of ESRD patients is the most likely risk factor for ammonia-type halitosis.

  19. Econometric analysis of realized covariation: high frequency based covariance, regression, and correlation in financial economics

    DEFF Research Database (Denmark)

    Barndorff-Nielsen, Ole Eiler; Shephard, N.

    2004-01-01

    This paper analyses multivariate high frequency financial data using realized covariation. We provide a new asymptotic distribution theory for standard methods such as regression, correlation analysis, and covariance. It will be based on a fixed interval of time (e.g., a day or week), allowing the number of high frequency returns during this period to go to infinity. Our analysis allows us to study how high frequency correlations, regressions, and covariances change through time. In particular we provide confidence intervals for each of these quantities.
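The realized quantities described in the abstract above are built from sums of products of high frequency returns over one fixed interval. The sketch below assumes the usual definitions (realized covariance as the sum of cross-products of returns, realized regression as covariance over variance); the function names are illustrative and the asymptotic confidence intervals from the paper are not reproduced.

```python
import math


def realized_covariance(returns_x, returns_y):
    """Sum of cross-products of intra-interval returns (e.g. within one day)."""
    return sum(rx * ry for rx, ry in zip(returns_x, returns_y))


def realized_beta(returns_x, returns_y):
    """Realized regression of y on x: realized covariance over realized variance of x."""
    return realized_covariance(returns_x, returns_y) / realized_covariance(
        returns_x, returns_x
    )


def realized_correlation(returns_x, returns_y):
    """Realized covariance scaled by the two realized volatilities."""
    var_x = realized_covariance(returns_x, returns_x)
    var_y = realized_covariance(returns_y, returns_y)
    return realized_covariance(returns_x, returns_y) / math.sqrt(var_x * var_y)
```

Computing these once per day (or week) yields a time series of betas and correlations, which is how the abstract's "change through time" can be studied.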

  20. Analysis of Functional Data with Focus on Multinomial Regression and Multilevel Data

    DEFF Research Database (Denmark)

    Mousavi, Seyed Nourollah

    Functional data analysis (FDA) is a fast growing area in statistical research with increasingly diverse range of application from economics, medicine, agriculture, chemometrics, etc. Functional regression is an area of FDA which has received the most attention both in aspects of application and methodological development. Our main...

  1. Integration Analysis of Three Omics Data Using Penalized Regression Methods: An Application to Bladder Cancer.

    Science.gov (United States)

    Pineda, Silvia; Real, Francisco X; Kogevinas, Manolis; Carrato, Alfredo; Chanock, Stephen J; Malats, Núria; Van Steen, Kristel

    2015-12-01

    Omics data integration is becoming necessary to investigate the genomic mechanisms involved in complex diseases. During the integration process, many challenges arise such as data heterogeneity, the smaller number of individuals in comparison to the number of parameters, multicollinearity, and interpretation and validation of results due to their complexity and lack of knowledge about biological processes. To overcome some of these issues, innovative statistical approaches are being developed. In this work, we propose a permutation-based method to concomitantly assess significance and correct by multiple testing with the MaxT algorithm. This was applied with penalized regression methods (LASSO and ENET) when exploring relationships between common genetic variants, DNA methylation and gene expression measured in bladder tumor samples. The overall analysis flow consisted of three steps: (1) SNPs/CpGs were selected per each gene probe within 1Mb window upstream and downstream the gene; (2) LASSO and ENET were applied to assess the association between each expression probe and the selected SNPs/CpGs in three multivariable models (SNP, CPG, and Global models, the latter integrating SNPs and CPGs); and (3) the significance of each model was assessed using the permutation-based MaxT method. We identified 48 genes whose expression levels were significantly associated with both SNPs and CPGs. Importantly, 36 (75%) of them were replicated in an independent data set (TCGA) and the performance of the proposed method was checked with a simulation study. We further support our results with a biological interpretation based on an enrichment analysis. The approach we propose allows reducing computational time and is flexible and easy to implement when analyzing several types of omics data. Our results highlight the importance of integrating omics data by applying appropriate statistical strategies to discover new insights into the complex genetic mechanisms involved in disease
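The permutation-based MaxT idea in the abstract above can be illustrated with a much simpler test statistic. The sketch below is not the authors' pipeline: it swaps the penalized regression models (LASSO, ENET) for the absolute Pearson correlation of each feature with the outcome, and applies a single-step MaxT adjustment. All names are hypothetical.

```python
import math
import random


def abs_corr(u, v):
    """Absolute Pearson correlation between two equal-length sequences."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    suv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    suu = sum((a - mean_u) ** 2 for a in u)
    svv = sum((b - mean_v) ** 2 for b in v)
    return abs(suv) / math.sqrt(suu * svv)


def maxt_adjusted_pvalues(features, outcome, n_perm=200, seed=0):
    """Single-step MaxT: compare each observed statistic with the
    permutation distribution of the maximum statistic over all features."""
    rng = random.Random(seed)
    observed = [abs_corr(f, outcome) for f in features]
    exceed = [0] * len(features)
    shuffled = list(outcome)
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the feature-outcome link
        max_stat = max(abs_corr(f, shuffled) for f in features)
        for j, obs in enumerate(observed):
            if max_stat >= obs:
                exceed[j] += 1
    # Add-one correction keeps p-values strictly positive.
    return [(count + 1) / (n_perm + 1) for count in exceed]
```

Because every feature is compared against the same maximum, significance assessment and multiple-testing correction happen in one pass, which is the property the abstract highlights.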

  2. Development and application of a multiple linear regression model to consider the impact of weekly waste container capacity on the yield from kerbside recycling programmes in Scotland.

    Science.gov (United States)

    Baird, Jim; Curry, Robin; Reid, Tim

    2013-03-01

    This article describes the development and application of a multiple linear regression model to identify how the key elements of waste and recycling infrastructure, namely container capacity and frequency of collection, affect the yield from municipal kerbside recycling programmes. The overall aim of the research was to gain an understanding of the factors affecting the yield from municipal kerbside recycling programmes in Scotland with an underlying objective to evaluate the efficacy of the model as a decision-support tool for informing the design of kerbside recycling programmes. The study isolates the principal kerbside collection service offered by all 32 councils across Scotland, eliminating those recycling programmes associated with flatted properties or multi-occupancies. The results of the regression analysis model have identified three principal factors which explain 80% of the variability in the average yield of the principal dry recyclate services: weekly residual waste capacity, number of materials collected and the weekly recycling capacity. The use of the model has been evaluated and recommendations made on ongoing methodological development and the use of the results in informing the design of kerbside recycling programmes. We hope that the research can provide insights for the further development of methods to optimise the design and operation of kerbside recycling programmes.

  3. 2D Quantitative Structure-Property Relationship Study of Mycotoxins by Multiple Linear Regression and Support Vector Machine

    Directory of Open Access Journals (Sweden)

    Fereshteh Shiri

    2010-08-01

    In the present work, support vector machine (SVM) and multiple linear regression (MLR) techniques were used for quantitative structure–property relationship (QSPR) studies of retention time (tR) in standardized liquid chromatography–UV–mass spectrometry of 67 mycotoxins (aflatoxins, trichothecenes, roquefortines and ochratoxins) based on molecular descriptors calculated from the optimized 3D structures. By applying missing value, zero and multicollinearity tests with a cutoff value of 0.95, and a genetic algorithm method of variable selection, the most relevant descriptors were selected to build QSPR models. MLR and SVM methods were employed to build the QSPR models. The robustness of the QSPR models was characterized by statistical validation and applicability domain (AD). The prediction results from the MLR and SVM models are in good agreement with the experimental values. The correlation and predictability measures r² and q² are 0.931 and 0.932, respectively, for SVM and 0.923 and 0.915, respectively, for MLR. The applicability domain of the model was investigated using a Williams plot. The effects of different descriptors on the retention times are described.
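The descriptor pre-screening described in the abstract above can be sketched as a simple filter. This is an illustration under stated assumptions: the "zero test" is interpreted here as dropping constant (zero-variance) descriptors, and the multicollinearity test as greedily dropping any descriptor whose absolute Pearson correlation with an already kept descriptor exceeds the 0.95 cutoff. The function names and the order-dependent greedy rule are not taken from the paper.

```python
import math


def pearson(u, v):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    suv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    suu = sum((a - mean_u) ** 2 for a in u)
    svv = sum((b - mean_v) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def filter_descriptors(table, cutoff=0.95):
    """Return names of descriptors that pass all three screening tests.

    `table` maps descriptor name -> list of values (None marks a missing value).
    """
    kept = {}
    for name, column in table.items():
        if any(value is None for value in column):
            continue  # missing value test
        if max(column) == min(column):
            continue  # constant descriptor (assumed meaning of the "zero test")
        if all(abs(pearson(column, other)) <= cutoff for other in kept.values()):
            kept[name] = column  # not collinear with anything kept so far
    return list(kept)
```

A variable-selection step such as the genetic algorithm mentioned in the abstract would then run on the reduced descriptor set.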

  4. A parallel implementation of the network identification by multiple regression (NIR) algorithm to reverse-engineer regulatory gene networks.

    Directory of Open Access Journals (Sweden)

    Francesco Gregoretti

    The reverse engineering of gene regulatory networks using gene expression profile data has become crucial to gain novel biological knowledge. Large amounts of data that need to be analyzed are currently being produced due to advances in microarray technologies. Using current reverse engineering algorithms to analyze large data sets can be very computationally intensive. These emerging computational requirements can be met using parallel computing techniques. It has been shown that the Network Identification by multiple Regression (NIR) algorithm performs better than the other ready-to-use reverse engineering software. However, it cannot be used with large networks with thousands of nodes, as is the case in biological networks, due to its high time and space complexity. In this work we overcome this limitation by designing and developing a parallel version of the NIR algorithm. The new implementation of the algorithm reaches a very good accuracy even for large gene networks, improving our understanding of the gene regulatory networks that is crucial for a wide range of biomedical applications.

  5. Ranking contributing areas of salt and selenium in the Lower Gunnison River Basin, Colorado, using multiple linear regression models

    Science.gov (United States)

    Linard, Joshua I.

    2013-01-01

    Mitigating the effects of salt and selenium on water quality in the Grand Valley and lower Gunnison River Basin in western Colorado is a major concern for land managers. Previous modeling indicated means to improve the models by including more detailed geospatial data and a more rigorous method for developing the models. After evaluating all possible combinations of geospatial variables, four multiple linear regression models resulted that could estimate irrigation-season salt yield, nonirrigation-season salt yield, irrigation-season selenium yield, and nonirrigation-season selenium yield. The adjusted r-squared and the residual standard error (in units of log-transformed yield) of the models were, respectively, 0.87 and 2.03 for the irrigation-season salt model, 0.90 and 1.25 for the nonirrigation-season salt model, 0.85 and 2.94 for the irrigation-season selenium model, and 0.93 and 1.75 for the nonirrigation-season selenium model. The four models were used to estimate yields and loads from contributing areas corresponding to 12-digit hydrologic unit codes in the lower Gunnison River Basin study area. Each of the 175 contributing areas was ranked according to its estimated mean seasonal yield of salt and selenium.

  6. Prediction of Currency Volume Issued in Taiwan Using a Hybrid Artificial Neural Network and Multiple Regression Approach

    Directory of Open Access Journals (Sweden)

    Yuehjen E. Shao

    2013-01-01

    Because the volume of currency issued by a country always affects its interest rate, price index, income levels, and many other important macroeconomic variables, the prediction of currency volume issued has attracted considerable attention in recent years. In contrast to the typical single-stage forecast model, this study proposes a hybrid forecasting approach to predict the volume of currency issued in Taiwan. The proposed hybrid models consist of artificial neural network (ANN) and multiple regression (MR) components. The MR component of the hybrid models is established for a selection of fewer explanatory variables, wherein the selected variables are of higher importance. The ANN component is then designed to generate forecasts based on those important explanatory variables. Subsequently, the model is used to analyze a real dataset of Taiwan's currency from 1996 to 2011 and twenty associated explanatory variables. The prediction results reveal that the proposed hybrid scheme exhibits superior forecasting performance for predicting the volume of currency issued in Taiwan.

  7. Assessing the impact of local meteorological variables on surface ozone in Hong Kong during 2000-2015 using quantile and multiple line regression models

    Science.gov (United States)

    Zhao, Wei; Fan, Shaojia; Guo, Hai; Gao, Bo; Sun, Jiaren; Chen, Laiguo

    2016-11-01

    The quantile regression (QR) method has been increasingly introduced to atmospheric environmental studies to explore the non-linear relationship between local meteorological conditions and ozone mixing ratios. In this study, we applied QR for the first time, together with multiple linear regression (MLR), to analyze the dominant meteorological parameters influencing the mean, 10th percentile, 90th percentile and 99th percentile of maximum daily 8-h average (MDA8) ozone concentrations in 2000-2015 in Hong Kong. Dominance analysis (DA) was used to assess the relative importance of meteorological variables in the regression models. Results showed that the MLR models worked better at suburban and rural sites than at urban sites, and worked better in winter than in summer. QR models performed better in summer for the 99th and 90th percentiles and performed better in autumn and winter for the 10th percentile. QR models also performed better in suburban and rural areas for the 10th percentile. The top 3 dominant variables associated with MDA8 ozone concentrations, changing with seasons and regions, were frequently associated with the six meteorological parameters: boundary layer height, humidity, wind direction, surface solar radiation, total cloud cover and sea level pressure. Temperature rarely became a significant variable in any season, which could partly explain the peak of monthly average ozone concentrations in October in Hong Kong. We also found that the effect of solar radiation was enhanced during extreme ozone pollution episodes (i.e., the 99th percentile). Finally, meteorological effects on MDA8 ozone showed no significant changes before and after the 2010 Asian Games.
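
    As a rough illustration of what fitting a given percentile means, the sketch below uses the quantile (pinball) loss, the standard objective minimised in quantile regression. It is not the authors' code: the function names and the ozone values are hypothetical, and the model is reduced to a single constant prediction so that only the standard library is needed.

```python
def pinball_loss(y_true, y_pred, tau):
    """Average quantile (pinball) loss at quantile level tau in (0, 1)."""
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        # Under-predictions are weighted by tau, over-predictions by (1 - tau).
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(y_true)


def best_constant(y, tau):
    """Observed value that minimises the pinball loss of a constant prediction."""
    candidates = sorted(set(y))
    return min(candidates, key=lambda c: pinball_loss(y, [c] * len(y), tau))


# Hypothetical MDA8 ozone values; NOT data from the study.
ozone = [20, 25, 30, 35, 40, 45, 50, 60, 80, 100, 120]
low = best_constant(ozone, 0.10)   # constant fit for the 10th percentile
high = best_constant(ozone, 0.90)  # constant fit for the 90th percentile
```

    Minimising the same loss over regression coefficients instead of a single constant gives a quantile regression line for the chosen percentile, which is why the dominant predictors can differ between the 10th and 99th percentiles.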

  8. Regression analysis with missing data and unknown colored noise: application to the MICROSCOPE space mission

    CERN Document Server

    Baghi, Q; Bergé, J; Christophe, B; Touboul, P; Rodrigues, M

    2015-01-01

    The analysis of physical measurements often copes with highly correlated noises and interruptions caused by outliers, saturation events or transmission losses. We assess the impact of missing data on the performance of linear regression analysis involving the fit of modeled or measured time series. We show that data gaps can significantly alter the precision of the regression parameter estimation in the presence of colored noise, due to the frequency leakage of the noise power. We present a regression method which cancels this effect and estimates the parameters of interest with a precision comparable to the complete data case, even if the noise power spectral density (PSD) is not known a priori. The method is based on an autoregressive (AR) fit of the noise, which allows us to build an approximate generalized least squares estimator approaching the minimal variance bound. The method, which can be applied to any similar data processing, is tested on simulated measurements of the MICROSCOPE space mission, whos...

  9. A general framework for the use of logistic regression models in meta-analysis.

    Science.gov (United States)

    Simmonds, Mark C; Higgins, Julian Pt

    2016-12-01

    Where individual participant data are available for every randomised trial in a meta-analysis of dichotomous event outcomes, "one-stage" random-effects logistic regression models have been proposed as a way to analyse these data. Such models can also be used even when individual participant data are not available and we have only summary contingency table data. One benefit of this one-stage regression model over conventional meta-analysis methods is that it maximises the correct binomial likelihood for the data and so does not require the common assumption that effect estimates are normally distributed. A second benefit of using this model is that it may be applied, with only minor modification, in a range of meta-analytic scenarios, including meta-regression, network meta-analyses and meta-analyses of diagnostic test accuracy. This single model can potentially replace the variety of often complex methods used in these areas. This paper considers, with a range of meta-analysis examples, how random-effects logistic regression models may be used in a number of different types of meta-analyses. This one-stage approach is compared with widely used meta-analysis methods including Bayesian network meta-analysis and the bivariate and hierarchical summary receiver operating characteristic (ROC) models for meta-analyses of diagnostic test accuracy.

  10. A Bayesian ridge regression analysis of congestion's impact on urban expressway safety.

    Science.gov (United States)

    Shi, Qi; Abdel-Aty, Mohamed; Lee, Jaeyoung

    2016-03-01

    With the rapid growth of traffic in urban areas, concerns about congestion and traffic safety have been heightened. This study leveraged both Automatic Vehicle Identification (AVI) system and Microwave Vehicle Detection System (MVDS) installed on an expressway in Central Florida to explore how congestion impacts the crash occurrence in urban areas. Multiple congestion measures from the two systems were developed. To ensure more precise estimates of the congestion's effects, the traffic data were aggregated into peak and non-peak hours. Multicollinearity among traffic parameters was examined. The results showed the presence of multicollinearity especially during peak hours. As a response, ridge regression was introduced to cope with this issue. Poisson models with uncorrelated random effects, correlated random effects, and both correlated random effects and random parameters were constructed within the Bayesian framework. It was proven that correlated random effects could significantly enhance model performance. The random parameters model has similar goodness-of-fit compared with the model with only correlated random effects. However, by accounting for the unobserved heterogeneity, more variables were found to be significantly related to crash frequency. The models indicated that congestion increased crash frequency during peak hours while during non-peak hours it was not a major crash contributing factor. Using the random parameter model, the three congestion measures were compared. It was found that all congestion indicators had similar effects while Congestion Index (CI) derived from MVDS data was a better congestion indicator for safety analysis. Also, analyses showed that the segments with higher congestion intensity could not only increase property damage only (PDO) crashes, but also more severe crashes. In addition, the issues regarding the necessity to incorporate specific congestion indicator for congestion's effects on safety and to take care of the

  11. No rationale for 1 variable per 10 events criterion for binary logistic regression analysis

    Directory of Open Access Journals (Sweden)

    Maarten van Smeden

    2016-11-01

    Background: Ten events per variable (EPV) is a widely advocated minimal criterion for sample size considerations in logistic regression analysis. Of three previous simulation studies that examined this minimal EPV criterion, only one supports the use of a minimum of 10 EPV. In this paper, we examine the reasons for substantial differences between these extensive simulation studies. Methods: The current study uses Monte Carlo simulations to evaluate small sample bias, coverage of confidence intervals and mean square error of logit coefficients. Logistic regression models fitted by maximum likelihood and a modified estimation procedure, known as Firth’s correction, are compared. Results: The results show that besides EPV, the problems associated with low EPV depend on other factors such as the total sample size. It is also demonstrated that simulation results can be dominated by even a few simulated data sets for which the prediction of the outcome by the covariates is perfect (‘separation’). We reveal that different approaches for identifying and handling separation lead to substantially different simulation results. We further show that Firth’s correction can be used to improve the accuracy of regression coefficients and alleviate the problems associated with separation. Conclusions: The current evidence supporting EPV rules for binary logistic regression is weak. Given our findings, there is an urgent need for new research to provide guidance for supporting sample size considerations for binary logistic regression analysis.
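
    For readers unfamiliar with the criterion under discussion, the sketch below shows how EPV is usually computed: the size of the less frequent outcome class divided by the number of candidate model parameters. The function name and the example counts are hypothetical and are not taken from the paper.

```python
def events_per_variable(n_events, n_nonevents, n_parameters):
    """EPV = size of the less frequent outcome class / number of candidate parameters."""
    if n_parameters <= 0:
        raise ValueError("n_parameters must be positive")
    return min(n_events, n_nonevents) / n_parameters


# Hypothetical study: 40 events, 460 non-events, 8 candidate parameters.
epv = events_per_variable(40, 460, 8)
meets_rule_of_ten = epv >= 10
```

    Here the EPV is 5, so the conventional rule of 10 would flag the model as too large; the abstract's point is that such a threshold alone is weakly supported, since total sample size and separation also matter.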

  12. Spline Nonparametric Regression Analysis of Stress-Strain Curve of Confined Concrete

    Directory of Open Access Journals (Sweden)

    Tavio Tavio

    2008-01-01

    Due to enormous uncertainties in confinement models associated with the maximum compressive strength and ductility of concrete confined by rectilinear ties, the implementation of spline nonparametric regression analysis is proposed herein as an alternative approach. The statistical evaluation is carried out based on 128 large-scale column specimens of either normal- or high-strength concrete tested under uniaxial compression. The main advantage of this kind of analysis is that it can be applied when the trend of the relation between predictor and response variables is not obvious. The error in the analysis can, therefore, be minimized so that it does not depend on the assumption of a particular shape of the curve. This provides higher flexibility in the application. The results of the statistical analysis indicate that the stress-strain curves of confined concrete obtained from the spline nonparametric regression analysis are in good agreement with the experimental curves available in the literature.

  13. Isolating the Effects of Training Using Simple Regression Analysis: An Example of the Procedure.

    Science.gov (United States)

    Waugh, C. Keith

    This paper provides a case example of simple regression analysis, a forecasting procedure used to isolate the effects of training from an identified extraneous variable. This case example focuses on results of a three-day sales training program to improve bank loan officers' knowledge, skill-level, and attitude regarding solicitation and sale of…

  14. Explaining Differences in Civic Knowledge: Multi-Level Regression Analysis of Student Data from 27 Countries.

    Science.gov (United States)

    Schulz, Wolfram

    Differences in student knowledge about democracy, institutions, and citizenship and students' skills in interpreting political communication were studied through multilevel regression analysis of results from the second International Education Association (IEA) Study. This study provides data on 14-year-old students from 28 countries in Europe,…

  15. Family background variables as instruments for education in income regressions: A Bayesian analysis

    NARCIS (Netherlands)

    L.F. Hoogerheide (Lennart); J.H. Block (Jörn); A.R. Thurik (Roy)

    2012-01-01

    The validity of family background variables instrumenting education in income regressions has been much criticized. In this paper, we use data from the 2004 German Socio-Economic Panel and Bayesian analysis to analyze to what degree violations of the strict validity assumption affect the

  16. A systematic review and meta-regression analysis of mivacurium for tracheal intubation

    NARCIS (Netherlands)

    Vanlinthout, L.E.H.; Mesfin, S.H.; Hens, N.; Vanacker, B.F.; Robertson, E.N.; Booij, L.H.D.J.

    2014-01-01

    We systematically reviewed factors associated with intubation conditions in randomised controlled trials of mivacurium, using random-effects meta-regression analysis. We included 29 studies of 1050 healthy participants. Four factors explained 72.9% of the variation in the probability of excellent in

  17. The Theoretical Study on Self-perceived Academic Burden of Primary and Middle School Students: A Multiple Logistic Regression Model Analysis Based on the Sample of Beijing Survey

    Institute of Scientific and Technical Information of China (English)

    王东

    2016-01-01

    The "objective theory" and the "construction theory" are the two orientations in research on the causes of academic burden. Based on empirical survey data from primary and middle school students in Beijing in 2014, a multiple logistic regression model covering both the "objective theory" and the "construction theory" was established. The model results confirm some hypotheses based on the objective theory: students with good academic performance report a lower self-perceived burden; where teacher quality is high, students' self-perceived burden is light; and where the school curriculum offers more choice, students' academic burden is also relatively light. The model also confirms some hypotheses of the construction theory: the higher the level of education a student expects to attain, the heavier the self-perceived burden; students under stronger examination pressure report a heavier burden; and the sense of learning value and the enjoyment of learning, which reflect attitudes toward learning, also significantly affect the degree of self-perceived burden. The regression results indicate that academic burden does exist among primary and middle school students as an objective reality (objective theory), while students' perception of it varies in strength (construction theory). On this basis, the author proposes the concept of a "sense of academic burden" in an attempt to integrate the two views. Compared with the traditional concept of "academic burden", the author argues that the "sense of academic burden" offers broader room for theoretical research and is also valuable for policy-oriented research on burden-reduction strategies.

  18. Analysis of genomic signatures in prokaryotes using multinomial regression and hierarchical clustering

    DEFF Research Database (Denmark)

    Ussery, David; Bohlin, Jon; Skjerve, Eystein

    2009-01-01

    with AT content, phyla, growth temperature, selective pressure, habitat, sequence size, oxygen requirement and pathogenicity as predictors. Many significant factors were associated with the genomic signature, most notably AT content. Phyla was also an important factor, although considerably less so than...... AT content. Small improvements to the regression model, although significant, were also obtained by factors such as sequence size, habitat, growth temperature, selective pressure measured as oligonucleotide usage variance, and oxygen requirement.The statistics obtained using hierarchical clustering...... and multinomial regression analysis indicate that the genomic signature is shaped by many factors, and this may explain the varying ability to classify prokaryotic organisms below genus level....

  19. Automatic regression analysis for use in a complex system of evaluation of plant genetic resources

    Directory of Open Access Journals (Sweden)

    Cs. ARKOSSY

    1984-08-01

    In accordance with the general requirements regarding computerization in gene banks and germplasm research, a computer program has been compiled for the analysis of univariate response in crop germplasm evaluation. The program is written in COBOL and runs on a FELIX C-256 computer. The different modules of the program allow for: (1) data control and error listing; (2) computation of the regression function; (3) listing of the differences between the measured and computed values; (4) sorting of the individual samples; (5) construction of two-dimensional scattergrams of the measured values with simultaneous representation of the regression line; (6) listing of the examined samples in the sequence required for evaluation.

  20. Methods and applications of linear models regression and the analysis of variance

    CERN Document Server

    Hocking, Ronald R

    2013-01-01

    Praise for the Second Edition: "An essential desktop reference book . . . it should definitely be on your bookshelf." -Technometrics. A thoroughly updated book, Methods and Applications of Linear Models: Regression and the Analysis of Variance, Third Edition features innovative approaches to understanding and working with models and theory of linear regression. The Third Edition provides readers with the necessary theoretical concepts, which are presented using intuitive ideas rather than complicated proofs, to describe the inference that is appropriate for the methods being discussed. The book

  1. Predicting strength of recycled aggregate concrete using Artificial Neural Network, Adaptive Neuro-Fuzzy Inference System and Multiple Linear Regression

    Directory of Open Access Journals (Sweden)

    Faezehossadat Khademi

    2016-12-01

    Compressive strength of concrete, recognized as one of the most significant mechanical properties of concrete, is identified as one of the most essential factors for the quality assurance of concrete. In the current study, three different data-driven models, i.e., Artificial Neural Network (ANN), Adaptive Neuro-Fuzzy Inference System (ANFIS), and Multiple Linear Regression (MLR), were used to predict the 28-day compressive strength of recycled aggregate concrete (RAC). Recycled aggregate is currently in demand because re-using construction waste is environmentally friendly. Fourteen different input parameters, including both dimensional and non-dimensional parameters, were used in this study for predicting the 28-day compressive strength of concrete. The present study concluded that ANN and ANFIS estimated the 28-day compressive strength of recycled aggregate concrete better than MLR. In other words, comparing the test step of all three models, it can be concluded that the MLR model is better suited to the preliminary mix design of concrete, while the ANN and ANFIS models are suggested for mix design optimization and for cases that require higher accuracy. In addition, the performance of data-driven models with and without the non-dimensional parameters is explored. It was observed that the data-driven models show better accuracy when the non-dimensional parameters were used as additional input parameters. Furthermore, the effect of each non-dimensional parameter on the performance of each data-driven model is investigated. Finally, the effect of the number of input parameters on the 28-day compressive strength of concrete is examined.

  2. Analysis of the Influence of Quantile Regression Model on Mainland Tourists’ Service Satisfaction Performance

    Directory of Open Access Journals (Sweden)

    Wen-Cheng Wang

    2014-01-01

    It is estimated that mainland Chinese tourists travelling to Taiwan can bring annual revenues of 400 billion NTD to the Taiwan economy. Thus, how the Taiwanese Government formulates relevant measures to satisfy both sides is the focus of most concern. Taiwan must improve the facilities and service quality of its tourism industry so as to attract more mainland tourists. This paper conducted a questionnaire survey of mainland tourists and used grey relational analysis in grey mathematics to analyze the satisfaction performance of all satisfaction question items. The first eight satisfaction items were used as independent variables, and the overall satisfaction performance was used as a dependent variable for quantile regression model analysis to discuss the relationship between the dependent variable under different quantiles and independent variables. Finally, this study further discussed the predictive accuracy of the least mean regression model and each quantile regression model, as a reference for research personnel. The analysis results showed that other variables could also affect the overall satisfaction performance of mainland tourists, in addition to occupation and age. The overall predictive accuracy of quantile regression model Q0.25 was higher than that of the other three models.

  3. MIXED DENTITION SPACE ANALYSIS OF A SOUTHERN ITALIAN POPULATION: NEW REGRESSION EQUATIONS FOR UNERUPTED TEETH.

    Science.gov (United States)

    Cirulli, N; Ballini, A; Cantore, S; Farronato, D; Inchingolo, F; Dipalma, G; Gatto, M R; Alessandri Bonetti, G

    2015-01-01

    Mixed dentition analysis forms a critical aspect of early orthodontic treatment. In fact an accurate space analysis is one of the important criteria in determining whether the treatment plan may involve serial extraction, guidance of eruption, space maintenance, space regaining or just periodic observation of the patients. The aim of the present study was to calculate linear regression equations in mixed dentition space analysis, measuring 230 dental casts mesiodistal tooth widths, obtained from southern Italian patients (118 females, 112 males, mean age 15±3 years). Student’s t-test or Wilcoxon test for independent and paired samples were used to determine right/left side and male/female differences. On the basis of the sum of the mesiodistal diameters of the 4 mandibular incisors as predictors for the sum of the widths of the canines and premolars in the mandibular mixed dentition, a new linear regression equation was found: y = 0.613x+7.294 (r= 0.701) for both genders in a southern Italian population. To better estimate the size of leeway space, a new regression equation was found to calculate the mesiodistal size of the second premolar using the sum of the four mandibular incisors, canine and first premolar as a predictor. The equation is y = 0.241x+1.224 (r= 0.732). In conclusion, new regression equations were derived for a southern Italian population.
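
    The two regression equations reported above can be applied directly, as in the sketch below. The coefficients are those given in the abstract; the function names, the assumption that widths are in millimetres, and the example inputs are hypothetical.

```python
def predict_canine_premolar_sum(incisor_sum):
    """y = 0.613x + 7.294, where x is the summed mesiodistal width of the
    4 mandibular incisors and y the summed width of the canine and premolars."""
    return 0.613 * incisor_sum + 7.294


def predict_second_premolar(predictor_sum):
    """y = 0.241x + 1.224, where x is the summed width of the four mandibular
    incisors, canine and first premolar and y the second premolar width."""
    return 0.241 * predictor_sum + 1.224


# Hypothetical measurements, assumed to be in millimetres.
canine_premolar_estimate = predict_canine_premolar_sum(23.0)
second_premolar_estimate = predict_second_premolar(36.0)
```

    With these illustrative inputs the estimates come to about 21.4 and 9.9. The reported correlations (r = 0.701 and r = 0.732) imply that individual predictions carry considerable scatter, so the values are best read as estimates rather than exact sizes.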

  4. Understanding child stunting in India: a comprehensive analysis of socio-economic, nutritional and environmental determinants using additive quantile regression.

    Directory of Open Access Journals (Sweden)

    Nora Fenske

    BACKGROUND: Most attempts to address undernutrition, responsible for one third of global child deaths, have fallen behind expectations. This suggests that the assumptions underlying current modelling and intervention practices should be revisited. OBJECTIVE: We undertook a comprehensive analysis of the determinants of child stunting in India, and explored whether the established focus on linear effects of single risks is appropriate. DESIGN: Using cross-sectional data for children aged 0-24 months from the Indian National Family Health Survey for 2005/2006, we populated an evidence-based diagram of immediate, intermediate and underlying determinants of stunting. We modelled linear, non-linear, spatial and age-varying effects of these determinants using additive quantile regression for four quantiles of the Z-score of standardized height-for-age and logistic regression for stunting and severe stunting. RESULTS: At least one variable within each of eleven groups of determinants was significantly associated with height-for-age in the 35% Z-score quantile regression. The non-modifiable risk factors child age and sex, and the protective factors household wealth, maternal education and BMI showed the largest effects. Being a twin or multiple birth was associated with dramatically decreased height-for-age. Maternal age, maternal BMI, birth order and number of antenatal visits influenced child stunting in non-linear ways. Findings across the four quantile and two logistic regression models were largely comparable. CONCLUSIONS: Our analysis confirms the multifactorial nature of child stunting. It emphasizes the need to pursue a systems-based approach and to consider non-linear effects, and suggests that differential effects across the height-for-age distribution do not play a major role.

  5. Regression analysis to predict growth performance from dietary net energy in growing-finishing pigs.

    Science.gov (United States)

    Nitikanchana, S; Dritz, S S; Tokach, M D; DeRouchey, J M; Goodband, R D; White, B J

    2015-06-01

    Data from 41 trials with multiple energy levels (285 observations) were used in a meta-analysis to predict growth performance based on dietary NE concentration. Nutrient and energy concentrations in all diets were estimated using the NRC ingredient library. Predictor variables examined for best fit models using Akaike information criteria included linear and quadratic terms of NE, BW, CP, standardized ileal digestible (SID) Lys, crude fiber, NDF, ADF, fat, ash, and their interactions. The initial best fit models included interactions between NE and CP or SID Lys. After removal of the observations that fed SID Lys below the suggested requirement, these terms were no longer significant. Including dietary fat in the model with NE and BW significantly improved the G:F prediction model, indicating that NE may underestimate the influence of fat on G:F. The meta-analysis indicated that, as long as diets are adequate for other nutrients (i.e., Lys), dietary NE is adequate to predict changes in ADG across different dietary ingredients and conditions. The analysis indicates that ADG increases with increasing dietary NE and BW but decreases when BW is above 87 kg. The G:F ratio improves with increasing dietary NE and fat but decreases with increasing BW. The regression equations were then evaluated by comparing the actual and predicted performance of 543 finishing pigs in 2 trials fed 5 dietary treatments, which included 3 different levels of NE obtained by adding wheat middlings, soybean hulls, dried distillers grains with solubles (DDGS; 8 to 9% oil), or choice white grease (CWG) to a corn-soybean meal-based diet. Diets were 1) 30% DDGS, 20% wheat middlings, and 4 to 5% soybean hulls (low energy); 2) 20% wheat middlings and 4 to 5% soybean hulls (low energy); 3) a corn-soybean meal diet (medium energy); 4) diet 2 supplemented with 3.7% CWG to equalize the NE level to diet 3 (medium energy); and 5) a corn-soybean meal diet with 3.7% CWG (high energy). Only small differences were observed

  6. Multilayer Perceptron for Robust Nonlinear Interval Regression Analysis Using Genetic Algorithms

    Science.gov (United States)

    2014-01-01

    On the basis of fuzzy regression, computational models in intelligence such as neural networks have the capability to be applied to nonlinear interval regression analysis for dealing with uncertain and imprecise data. When training data are not contaminated by outliers, computational models perform well by including almost all given training data in the data interval. Nevertheless, since training data are often corrupted by outliers, robust learning algorithms employed to resist outliers for interval regression analysis have been an interesting area of research. Several approaches involving computational intelligence are effective for resisting outliers, but the required parameters for these approaches are related to whether the collected data contain outliers or not. Since it seems difficult to prespecify the degree of contamination beforehand, this paper uses multilayer perceptron to construct the robust nonlinear interval regression model using the genetic algorithm. Outliers beyond or beneath the data interval will impose slight effect on the determination of data interval. Simulation results demonstrate that the proposed method performs well for contaminated datasets. PMID:25110755

  7. Multilayer perceptron for robust nonlinear interval regression analysis using genetic algorithms.

    Science.gov (United States)

    Hu, Yi-Chung

    2014-01-01

    On the basis of fuzzy regression, computational models in intelligence such as neural networks have the capability to be applied to nonlinear interval regression analysis for dealing with uncertain and imprecise data. When training data are not contaminated by outliers, computational models perform well by including almost all given training data in the data interval. Nevertheless, since training data are often corrupted by outliers, robust learning algorithms employed to resist outliers for interval regression analysis have been an interesting area of research. Several approaches involving computational intelligence are effective for resisting outliers, but the required parameters for these approaches are related to whether the collected data contain outliers or not. Since it seems difficult to prespecify the degree of contamination beforehand, this paper uses multilayer perceptron to construct the robust nonlinear interval regression model using the genetic algorithm. Outliers beyond or beneath the data interval will impose slight effect on the determination of data interval. Simulation results demonstrate that the proposed method performs well for contaminated datasets.

  8. Stepwise and hierarchical multiple regression in organizational psychology: Applications, problems and solutions

    Directory of Open Access Journals (Sweden)

    Gardênia Abbad

    2002-01-01

    This article discusses applications of stepwise and hierarchical multiple regression analyses to research in organizational psychology. Strategies for identifying type I and II errors, and solutions to potential problems that may arise from such errors are proposed. In addition, phenomena such as suppression, complementarity, and redundancy are reviewed. The article presents examples of research where these phenomena occurred, and the manner in which they were explained by researchers. Some applications of multiple regression analyses to studies involving between-variable interactions are presented, along with tests used to analyze the presence of linearity among variables. Finally, some suggestions are provided for dealing with limitations implicit in multiple regression analyses (stepwise and hierarchical).

  9. Boolean analysis of addition and multiplication

    Energy Technology Data Exchange (ETDEWEB)

    Faltin, F. (Cornell Univ., Ithaca, NY); Metropolis, N.; Ross, B.; Rota, G.-C.

    1977-01-01

    The notions of binary string and binary symmetric function are introduced, and basic results presented. Boolean algorithms are given for binary addition and multiplication. An analysis of the redundancies involved is straightforward. The examination of carry propagation which arises in the Boolean analysis of functions may lead to a new interpretation of the notion of computational complexity.
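
    To make the idea of Boolean addition with carry propagation concrete, the sketch below builds a textbook ripple-carry adder from AND, OR and XOR. It is a standard construction offered for illustration only; it is not claimed to be the algorithm of the report, and all function names are hypothetical.

```python
def full_adder(a, b, carry_in):
    """One-bit addition expressed with Boolean operations only."""
    sum_bit = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return sum_bit, carry_out


def add_binary(x_bits, y_bits):
    """Add two little-endian bit lists; the carry ripples from low to high bits."""
    width = max(len(x_bits), len(y_bits))
    x = x_bits + [0] * (width - len(x_bits))
    y = y_bits + [0] * (width - len(y_bits))
    carry = 0
    result = []
    for a, b in zip(x, y):
        sum_bit, carry = full_adder(a, b, carry)
        result.append(sum_bit)
    result.append(carry)  # final carry becomes the top bit
    return result


def to_bits(n):
    """Non-negative integer to little-endian bit list."""
    bits = []
    while n:
        bits.append(n & 1)
        n >>= 1
    return bits or [0]


def from_bits(bits):
    """Little-endian bit list back to an integer."""
    return sum(bit << i for i, bit in enumerate(bits))
```

    Because each carry depends on the one before it, the chain of `carry` updates in the loop is the carry propagation whose length the abstract links to computational complexity.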

  10. Multiple-output support vector machine regression with feature selection for arousal/valence space emotion assessment.

    Science.gov (United States)

    Torres-Valencia, Cristian A; Álvarez, Mauricio A; Orozco-Gutiérrez, Alvaro A

    2014-01-01

    Human emotion recognition (HER) allows the assessment of an affective state of a subject. Until recently, such emotional states were described in terms of discrete emotions, like happiness or contempt. In order to cover a wide range of emotions, researchers in the field have introduced different dimensional spaces for emotion description that allow the characterization of affective states in terms of several variables or dimensions that measure distinct aspects of the emotion. One of the most common of such dimensional spaces is the bidimensional Arousal/Valence space. To the best of our knowledge, all HER systems so far have modelled the dimensions in these dimensional spaces independently. In this paper, we study the effect of modelling the output dimensions simultaneously and show experimentally the advantages of modelling them in this way. We consider a multimodal approach by including features from the Electroencephalogram and a few physiological signals. For modelling the multiple outputs, we employ a multiple output regressor based on support vector machines. We also include a stage of feature selection that is developed within an embedded approach known as Recursive Feature Elimination (RFE), proposed initially for SVM. The results show that several features can be eliminated using the multiple output support vector regressor with RFE without affecting the performance of the regressor. From the analysis of the features selected in smaller subsets via RFE, it can be observed that the signals that are most informative for discrimination in the arousal and valence space are the EEG, Electrooculogram/Electromyogram (EOG/EMG) and the Galvanic Skin Response (GSR).

  11. APPLICATION OF GENETIC ALGORITHM-MULTIPLE LINEAR REGRESSION (GA-MLR FOR PREDICTION OF ANTI-FUNGAL ACTIVITY

    Directory of Open Access Journals (Sweden)

    Stephen Eyije Abechi

    2016-04-01

    Aim: To develop good and rational Quantitative Structure Activity Relationship (QSAR) mathematical models that can predict to a significant level the anti-tyrosinase and anti-Candida Albicans Minimum inhibitory concentration (MIC) of ketone and tetra-ketone derivatives. Place and Duration of Study: Department of Chemistry (Mathieson Laboratory (3-Physical Chemistry unit)), Ahmadu Bello University, Zaria, Nigeria, between December 2015 and March 2016. Methodology: A set of 44 ketone and tetra-ketone derivatives with their anti-tyrosinase and anti-Candida Albicans activities in terms of minimum inhibitory concentration (MIC) against the gram-positive fungal and hyperpigmentation were selected for 1D-3D quantitative structure activity relationship (QSAR) analysis using the parameterization method 6 (PM6) basis set. The computed descriptors were correlated with their experimental MIC. The Genetic Function Approximation (GFA) method and Multi-Linear Regression analysis (MLR) were used to derive the most statistically significant QSAR model. Results: The result obtained indicates that the most statistically significant QSAR model was a five-parametric linear equation with a squared correlation coefficient (R2) value of 0.9914, an adjusted squared correlation coefficient (R2 adj) value of 0.9896 and a leave-one-out (LOO) cross-validation coefficient (Q2) value of 0.9853. An external set was used for confirming the predictive power of the model, with R2 pred = 0.9618 and rm^2 = 0.8981. Conclusion: The QSAR results reveal that molecular mass, atomic mass, polarity, electronic and topological descriptors predominantly influence the anti-tyrosinase and anti-Candida Albicans activity of the complexes. The wealth of information in this study will provide an insight into designing novel bioactive ketone and tetra-ketone compounds that will curb the emerging trend of multi-drug resistant strains of fungal and hyperpigmentation.

  12. Deriving percentage study weights in multi-parameter meta-analysis models: with application to meta-regression, network meta-analysis and one-stage individual participant data models.

    Science.gov (United States)

    Riley, Richard D; Ensor, Joie; Jackson, Dan; Burke, Danielle L

    2017-01-01

    Many meta-analysis models contain multiple parameters, for example due to multiple outcomes, multiple treatments or multiple regression coefficients. In particular, meta-regression models may contain multiple study-level covariates, and one-stage individual participant data meta-analysis models may contain multiple patient-level covariates and interactions. Here, we propose how to derive percentage study weights for such situations, in order to reveal the (otherwise hidden) contribution of each study toward the parameter estimates of interest. We assume that studies are independent, and utilise a decomposition of Fisher's information matrix to decompose the total variance matrix of parameter estimates into study-specific contributions, from which percentage weights are derived. This approach generalises how percentage weights are calculated in a traditional, single parameter meta-analysis model. Application is made to one- and two-stage individual participant data meta-analyses, meta-regression and network (multivariate) meta-analysis of multiple treatments. These reveal percentage study weights toward clinically important estimates, such as summary treatment effects and treatment-covariate interactions, and are especially useful when some studies are potential outliers or at high risk of bias. We also derive percentage study weights toward methodologically interesting measures, such as the magnitude of ecological bias (difference between within-study and across-study associations) and the amount of inconsistency (difference between direct and indirect evidence in a network meta-analysis).
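    The traditional single-parameter case that the abstract says it generalises can be sketched as follows. This is a minimal illustration, assuming an inverse-variance (fixed-effect) pooled estimate; the function names and the study values are hypothetical, and the sketch does not implement the Fisher-information decomposition the authors propose for multi-parameter models.

```python
def percentage_weights(variances):
    """Percentage weight of each study in an inverse-variance pooled estimate."""
    inverse = [1.0 / v for v in variances]   # weight is the reciprocal of the variance
    total = sum(inverse)
    return [100.0 * w / total for w in inverse]


def pooled_estimate(estimates, variances):
    """Weighted average of the study estimates using the percentage weights."""
    weights = percentage_weights(variances)
    return sum(w * y for w, y in zip(weights, estimates)) / 100.0


# Hypothetical study-level estimates and their variances (illustration only).
estimates = [0.10, 0.30, 0.50]
variances = [0.04, 0.01, 0.04]

weights = percentage_weights(variances)
pooled = pooled_estimate(estimates, variances)
print(weights, pooled)
```

    In this made-up example the middle study has the smallest variance, so it carries most of the weight in the pooled value.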

  13. Analysis of the Evolution of the Gross Domestic Product by Means of Cyclic Regressions

    Directory of Open Access Journals (Sweden)

    Catalin Angelo Ioan

    2011-08-01

    In this article, we carry out an analysis of the regularity of the Gross Domestic Product of a country, in our case the United States. The analysis is based on a new method: cyclic regressions based on the Fourier series of a function. Another point of view is to consider, instead of the growth rate of GDP, the speed of variation of this rate, computed as a numerical derivative. The results show a 71-year cycle for this indicator, with a mean square error of 0.93%. The method described allows a prognosis of short-term trends in GDP.
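    The general idea of regressing a series on Fourier terms can be sketched as below. This is only an illustration under stated assumptions: a single harmonic with a known period, ordinary least squares via the normal equations, and synthetic data. The helper names (solve, fit_harmonic) are hypothetical, and nothing here reproduces the author's actual procedure or GDP data.

```python
import math


def solve(a, b):
    """Solve the small linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_harmonic(t, y, period):
    """Least-squares fit of y ~ a0 + a1*cos(w*t) + b1*sin(w*t), with w = 2*pi/period."""
    w = 2.0 * math.pi / period
    rows = [[1.0, math.cos(w * ti), math.sin(w * ti)] for ti in t]
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    return solve(xtx, xty)


# Synthetic series with a known 10-step cycle (not real GDP data).
t = list(range(40))
y = [2.0 + 1.5 * math.cos(2 * math.pi * ti / 10) - 0.5 * math.sin(2 * math.pi * ti / 10)
     for ti in t]

a0, a1, b1 = fit_harmonic(t, y, period=10)
print(a0, a1, b1)
```

    With noise-free synthetic input the fit recovers the generating coefficients; on real data one would also compare candidate periods by their mean square error.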

  14. Comparison of Artificial Neural Networks and Logistic Regression Analysis in the Credit Risk Prediction

    Directory of Open Access Journals (Sweden)

    Hüseyin BUDAK

    2012-11-01

    Credit scoring is a vital topic for banks since there is a need to use limited financial sources more effectively. There are several credit scoring methods that are used by banks. One of them is to estimate whether a credit-demanding customer's repayment order will be regular or not. In this study, artificial neural networks and logistic regression analysis have been used to support the banks' credit risk prediction and to estimate whether a credit-demanding customer's repayment order will be regular or not. The results of the study showed that the artificial neural networks method is more reliable than logistic regression analysis when estimating a credit-demanding customer's repayment order.

  15. Analysis of designed experiments by stabilised PLS Regression and jack-knifing

    DEFF Research Database (Denmark)

    Martens, Harald; Høy, M.; Westad, F.

    2001-01-01

    Pragmatical, visually oriented methods for assessing and optimising bi-linear regression models are described, and applied to PLS Regression (PLSR) analysis of multi-response data from controlled experiments. The paper outlines some ways to stabilise the PLSR method to extend its range of applicability to the analysis of effects in designed experiments. Two ways of passifying unreliable variables are shown. A method for estimating the reliability of the cross-validated prediction error RMSEP is demonstrated. Some recently developed jack-knifing extensions are illustrated, for estimating the reliability of the linear and bi-linear model parameter estimates. The paper illustrates how the obtained PLSR "significance" probabilities are similar to those from conventional factorial ANOVA, but the PLSR is shown to give important additional overview plots of the main relevant structures in the multi...

  16. Regression And Time Series Analysis Of Loan Default At Minescho Cooperative Credit Union Tarkwa

    Directory of Open Access Journals (Sweden)

    Otoo

    2015-08-01

    Lending in the form of loans is a principal business activity for banks, credit unions and other financial institutions. This forms a substantial amount of the banks' assets. However, when these loans are defaulted, it tends to have serious effects on the financial institutions. This study sought to determine the trend and forecast loan default at Minescho Credit Union, Tarkwa. Secondary data from the Credit Union were analyzed using Regression Analysis and the Box-Jenkins method of Time Series. From the Regression Analysis, there was a moderately strong relationship between the amount of loan default and time. Also, the amount of loan default had an increasing trend. The two-year forecast of the amount of loan default oscillated initially and remained constant from 2016 onwards.

  17. Forecasting municipal solid waste generation using prognostic tools and regression analysis.

    Science.gov (United States)

    Ghinea, Cristina; Drăgoi, Elena Niculina; Comăniţă, Elena-Diana; Gavrilescu, Marius; Câmpean, Teofil; Curteanu, Silvia; Gavrilescu, Maria

    2016-11-01

    For adequate planning of waste management systems, the accurate forecast of waste generation is an essential step, since various factors can affect waste trends. Predictive and prognosis models are useful tools, as reliable support for decision making processes. In this paper some indicators, such as number of residents, population age, urban life expectancy and total municipal solid waste, were used as input variables in prognostic models in order to predict the amount of solid waste fractions. We applied Waste Prognostic Tool, regression analysis and time series analysis to forecast municipal solid waste generation and composition by considering the Iasi, Romania case study. Regression equations were determined for six solid waste fractions (paper, plastic, metal, glass, biodegradable and other waste). Accuracy measures were calculated and the results showed that the S-curve trend model is the most suitable for municipal solid waste (MSW) prediction.

  18. Regression analysis of growth responses to water depth in three wetland plant species

    DEFF Research Database (Denmark)

    Sorrell, Brian K; Tanner, Chris C; Brix, Hans

    2012-01-01

    ) differing in depth preferences in wetlands, using non-linear and quantile regression analyses to establish how flooding tolerance can explain field zonation. Methodology: Plants were established for 8 months in outdoor cultures in waterlogged soil without standing water, and then randomly allocated to water depths from 0 – 0.5 m. Morphological and growth responses to depth were followed for 54 days before harvest, and then analysed by repeated measures analysis of covariance, and non-linear and quantile regression analysis (QRA), to compare flooding tolerances. Principal results: Growth responses to depth differed between the three species, and were non-linear. P. tenax growth rapidly decreased in standing water > 0.25 m depth, C. secta growth increased initially with depth but then decreased at depths > 0.30 m, accompanied by increased shoot height and decreased shoot density, and T. orientalis...

  19. Regression Analysis of Right-censored Failure Time Data with Missing Censoring Indicators

    Institute of Scientific and Technical Information of China (English)

    Ping Chen; Ren He; Jun-shan Shen; Jian-guo Sun

    2009-01-01

    This paper discusses regression analysis of right-censored failure time data when censoring indicators are missing for some subjects. Several methods have been developed for the analysis under different situations and especially, Goetghebeur and Ryan[4] considered the situation where both the failure time and the censoring time follow the proportional hazards models marginally and developed an estimating equation approach. One limitation of their approach is that the two baseline hazard functions were assumed to be proportional to each other. We consider the same problem and present an efficient estimation procedure for regression parameters that does not require the proportionality assumption. An EM algorithm is developed and the method is evaluated by a simulation study, which indicates that the proposed methodology performs well for practical situations. An illustrative example is provided.

  20. CARAT-GxG: CUDA-Accelerated Regression Analysis Toolkit for Large-Scale Gene-Gene Interaction with GPU Computing System.

    Science.gov (United States)

    Lee, Sungyoung; Kwon, Min-Seok; Park, Taesung

    2014-01-01

    In genome-wide association studies (GWAS), regression analysis has been most commonly used to establish an association between a phenotype and genetic variants, such as single nucleotide polymorphism (SNP). However, most applications of regression analysis have been restricted to the investigation of single marker because of the large computational burden. Thus, there have been limited applications of regression analysis to multiple SNPs, including gene-gene interaction (GGI) in large-scale GWAS data. In order to overcome this limitation, we propose CARAT-GxG, a GPU computing system-oriented toolkit, for performing regression analysis with GGI using CUDA (compute unified device architecture). Compared to other methods, CARAT-GxG achieved almost 700-fold execution speed and delivered highly reliable results through our GPU-specific optimization techniques. In addition, it was possible to achieve almost-linear speed acceleration with the application of a GPU computing system, which is implemented by the TORQUE Resource Manager. We expect that CARAT-GxG will enable large-scale regression analysis with GGI for GWAS data.

  1. [Regression analysis of an instrumental conditioned tentacular reflex in the edible snail].

    Science.gov (United States)

    Stepanov, I I; Kuntsevich, S V; Lokhov, M I

    1989-01-01

    Regression analysis showed that the learning curves of the conditioned tentacle reflex can be approximated by an exponential mathematical model. The reflex was retained for more than three weeks. There were some quantitative differences between conditioning of the right and the left tentacle. During the spring period the reflex formed in every session, but it was not retained between sessions. The conditioned tentacle reflex may be employed in neuropharmacological studies.

  2. Robust estimation for homoscedastic regression in the secondary analysis of case-control data

    KAUST Repository

    Wei, Jiawei

    2012-12-04

    Primary analysis of case-control studies focuses on the relationship between disease D and a set of covariates of interest (Y, X). A secondary application of the case-control study, which is often invoked in modern genetic epidemiologic association studies, is to investigate the interrelationship between the covariates themselves. The task is complicated owing to the case-control sampling, where the regression of Y on X is different from what it is in the population. Previous work has assumed a parametric distribution for Y given X and derived semiparametric efficient estimation and inference without any distributional assumptions about X. We take up the issue of estimation of a regression function when Y given X follows a homoscedastic regression model, but otherwise the distribution of Y is unspecified. The semiparametric efficient approaches can be used to construct semiparametric efficient estimates, but they suffer from a lack of robustness to the assumed model for Y given X. We take an entirely different approach. We show how to estimate the regression parameters consistently even if the assumed model for Y given X is incorrect, and thus the estimates are model robust. For this we make the assumption that the disease rate is known or well estimated. The assumption can be dropped when the disease is rare, which is typically so for most case-control studies, and the estimation algorithm simplifies. Simulations and empirical examples are used to illustrate the approach.

  3. Non-Stationary Hydrologic Frequency Analysis using B-Splines Quantile Regression

    Science.gov (United States)

    Nasri, B.; St-Hilaire, A.; Bouezmarni, T.; Ouarda, T.

    2015-12-01

    Hydrologic frequency analysis is commonly used by engineers and hydrologists to provide the basic information on planning, design and management of hydraulic structures and water resources systems under the assumption of stationarity. However, with increasing evidence of changing climate, it is possible that the assumption of stationarity would no longer be valid and the results of conventional analysis would become questionable. In this study, we consider a framework for frequency analysis of extreme flows based on B-Splines quantile regression, which allows modelling of non-stationary data that have a dependence on covariates. Such covariates may have linear or nonlinear dependence. A Markov Chain Monte Carlo (MCMC) algorithm is used to estimate quantiles and their posterior distributions. A coefficient of determination for quantile regression is proposed to evaluate the estimation of the proposed model for each quantile level. The method is applied to annual maximum and minimum streamflow records in Ontario, Canada. Climate indices are considered to describe the non-stationarity in these variables and to estimate the quantiles in this case. The results show large differences between the non-stationary quantiles and their stationary equivalents for annual maximum and minimum discharge with high annual non-exceedance probabilities. Keywords: Quantile regression, B-Splines functions, MCMC, Streamflow, Climate indices, non-stationarity.
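    The loss function that underlies quantile regression in general can be sketched in a few lines. This is a minimal illustration, assuming a constant (covariate-free) model and hypothetical flow values; it shows only the check (pinball) loss, not the B-spline basis or the MCMC estimation described in the abstract.

```python
def pinball_loss(y, q, tau):
    """Average check (pinball) loss of predicting the constant q at quantile level tau."""
    total = 0.0
    for yi in y:
        u = yi - q
        # Under-predictions are weighted by tau, over-predictions by (1 - tau).
        total += u * tau if u >= 0 else u * (tau - 1.0)
    return total / len(y)


def best_constant(y, tau):
    """Observed value that minimises the pinball loss, i.e. an empirical tau-quantile."""
    return min(sorted(y), key=lambda q: pinball_loss(y, q, tau))


# Hypothetical annual maximum flows (illustration only).
flows = [12.0, 15.0, 9.0, 30.0, 18.0, 11.0, 14.0, 22.0, 16.0, 13.0, 20.0]

q50 = best_constant(flows, 0.5)
q90 = best_constant(flows, 0.9)
print(q50, q90)
```

    Replacing the constant q with a function of covariates, fitted under the same loss, gives a quantile regression model.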

  4. A robust multiple-locus method for quantitative trait locus analysis of non-normally distributed multiple traits.

    Science.gov (United States)

    Li, Z; Möttönen, J; Sillanpää, M J

    2015-12-01

    Linear regression-based quantitative trait loci/association mapping methods such as least squares commonly assume normality of residuals. In genetics studies of plants or animals, some quantitative traits may not follow a normal distribution because the data include outlying observations or data that are collected from multiple sources, and in such cases the normal regression methods may lose some statistical power to detect quantitative trait loci. In this work, we propose a robust multiple-locus regression approach for analyzing multiple quantitative traits without a normality assumption. In our method, the objective function is least absolute deviation (LAD), which corresponds to the assumption of multivariate Laplace distributed residual errors. This distribution has heavier tails than the normal distribution. In addition, we adopt a group LASSO penalty to produce shrinkage estimation of the marker effects and to describe the genetic correlation among phenotypes. Our LAD-LASSO approach is less sensitive to outliers and is more appropriate for the analysis of data with skewedly distributed phenotypes. Another application of our robust approach is to the missing-phenotype problem in multiple-trait analysis, where the missing phenotype items can simply be filled with some extreme values and be treated as outliers. The efficiency of the LAD-LASSO approach is illustrated on both simulated and real data sets.

  5. Correlation Study and Regression Analysis of Drinking Water Quality in Kashan City, Iran

    Directory of Open Access Journals (Sweden)

    Mohammad Mehdi HEYDARI

    2013-06-01

    Chemical and statistical regression analysis was carried out on drinking water samples from five fields (21 sampling wells) in Kashan city, central Iran, which has a hot and dry climate. Samples were collected from October 2006 to May 2007 (25 - 30 °C). Comparing the results with drinking water quality standards issued by the World Health Organization (WHO), it is found that some of the water samples are not potable. Hydrochemical facies using a Piper diagram indicate that in most parts of the city, the chemical character of water is dominated by NaCl. All samples showed sulfate and sodium ion content higher and K+ and F- content lower than the permissible limit. A strongly positive correlation is observed between TDS and EC (R = 0.995) and between Ca2+ and TH (R = 0.948). The results showed that the following regression relations have the same correlation coefficients: (I) pH-TH, EC-TH (R = 0.520); (II) NO3- -pH, TH-pH (R = 0.520); (III) Ca2+-SO42-, TH-SO42-, Cl- -SO42- (R = 0.630). The results revealed that systematic calculations of correlation coefficients between water parameters and regression analysis provide a useful means for rapid monitoring of water quality.
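    A correlation coefficient R and the matching simple regression line between two water parameters can be computed as sketched below. The EC and TDS readings are hypothetical values chosen for illustration, not data from the study, and the function names are assumptions.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx


# Hypothetical electrical conductivity (EC) and total dissolved solids (TDS) readings.
ec = [500.0, 750.0, 1000.0, 1250.0, 1500.0]
tds = [325.0, 470.0, 650.0, 790.0, 965.0]

r = pearson_r(ec, tds)
slope, intercept = fit_line(ec, tds)
print(r, slope, intercept)
```

    Once such a line is fitted, a quantity that is slow to measure can be estimated from one that is quick to measure, which is the sense in which correlation and regression support rapid monitoring.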

  6. Reducing a spatial database to its effective dimensionality for logistic-regression analysis of incidence of livestock disease.

    Science.gov (United States)

    Duchateau, L; Kruska, R L; Perry, B D

    1997-10-01

    Large databases with multiple variables, selected because they are available and might provide an insight into establishing causal relationships, are often difficult to analyse and interpret because of multicollinearity. The objective of this study was to reduce the dimensionality of a multivariable spatial database of Zimbabwe, containing many environmental variables that were collected to predict the distribution of outbreaks of theileriosis (the tick-borne infection of cattle caused by Theileria parva and transmitted by the brown ear tick). Principal-component analysis and varimax rotation of the principal components were first used to select a reduced number of variables. The logistic-regression model was evaluated by appropriate goodness-of-fit tests.

  7. THE PROGNOSIS OF RUSSIAN DEFENSE INDUSTRY DEVELOPMENT IMPLEMENTED THROUGH REGRESSION ANALYSIS

    Directory of Open Access Journals (Sweden)

    L.M. Kapustina

    2007-03-01

    The article presents the results of an investigation of the major internal and external factors that influence the development of the defense industry, as well as the results of a regression analysis that quantifies the contribution of each factor to the growth rate of the Russian defense industry. On the basis of the calculated regression dependences, the authors produce a medium-term prognosis for the defense industry. Optimistic and inertial versions of the defense product growth rate for the period up to 2009 are based on scenario conditions for the Russian economy worked out by the Ministry of economy and development. In conclusion, the authors point out which factors and conditions have the largest impact on the successful and stable operation of the Russian defense industry.

  8. Predicting electricity energy consumption: A comparison of regression analysis, decision tree and neural networks

    Energy Technology Data Exchange (ETDEWEB)

    Tso, Geoffrey K.F.; Yau, Kelvin K.W. [City University of Hong Kong, Kowloon, Hong Kong (China). Department of Management Sciences

    2007-09-15

    This study presents three modeling techniques for the prediction of electricity energy consumption. In addition to the traditional regression analysis, decision tree and neural networks are considered. Model selection is based on the square root of average squared error. In an empirical application to an electricity energy consumption study, the decision tree and neural network models appear to be viable alternatives to the stepwise regression model in understanding energy consumption patterns and predicting energy consumption levels. With the emergence of the data mining approach for predictive modeling, different types of models can be built in a unified platform: to implement various modeling techniques, assess the performance of different models and select the most appropriate model for future prediction. (author)
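    The selection criterion named in the abstract, the square root of the average squared error, can be sketched as follows. The model names and all numbers are hypothetical placeholders, not results from the study.

```python
import math


def rmse(actual, predicted):
    """Square root of the average squared error between observations and predictions."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def select_model(actual, predictions_by_model):
    """Name of the model whose predictions give the smallest RMSE."""
    return min(predictions_by_model,
               key=lambda name: rmse(actual, predictions_by_model[name]))


# Hypothetical consumption values and model predictions (illustration only).
actual = [10.0, 12.0, 14.0, 16.0]
predictions = {
    "regression": [11.0, 11.0, 15.0, 15.0],
    "decision_tree": [10.0, 12.0, 14.0, 20.0],
    "neural_network": [10.5, 12.5, 13.5, 16.5],
}

best = select_model(actual, predictions)
print(best, rmse(actual, predictions[best]))
```

    In practice the errors would be computed on held-out data rather than on the data used to fit the models.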

  9. Sub-pixel estimation of tree cover and bare surface densities using regression tree analysis

    Directory of Open Access Journals (Sweden)

    Carlos Augusto Zangrando Toneli

    2011-09-01

    Sub-pixel analysis is capable of generating continuous fields, which represent the spatial variability of certain thematic classes. The aim of this work was to develop numerical models to represent the variability of tree cover and bare surfaces within the study area. This research was conducted in the riparian buffer within a watershed of the São Francisco River in the North of Minas Gerais, Brazil. IKONOS and Landsat TM imagery were used with the GUIDE algorithm to construct the models. The results were two index images derived with regression trees for the entire study area, one representing tree cover and the other representing bare surface. The use of non-parametric and non-linear regression tree models presented satisfactory results to characterize wetland, deciduous and savanna patterns of forest formation.

  10. The Effects of Agricultural Informatization on Agricultural Economic Growth: An Empirical Analysis Based on Regression Model

    Institute of Scientific and Technical Information of China (English)

    Lingling; TAN

    2013-01-01

    This article selects some major factors influencing agricultural economic growth, such as labor, capital input, farmland area, fertilizer input and information input. It also selects some factors to explain information input, such as the number of websites owned, the types of books, magazines and newspapers published, the number of telephones owned per 100 households, the number of home computers owned per 100 households, farmers' spending on transportation and communication, culture, education, entertainment and services, and the total number of agricultural science and technology service personnel. Using a regression model, this article conducts regression analysis of the cross-section data on 31 provinces, autonomous regions and municipalities in 2010. The results show that the building of information infrastructure, the use of means of information, and the popularization and promotion of knowledge of agricultural science and technology play an important role in promoting agricultural economic growth.

  11. Analysis of ontogenetic spectra of populations of plants and lichens via ordinal regression

    Science.gov (United States)

    Sofronov, G. Yu.; Glotov, N. V.; Ivanov, S. M.

    2015-03-01

    Ontogenetic spectra of plants and lichens tend to vary across the populations. This means that if several subsamples within a sample (or a population) were collected, then the subsamples would not be homogeneous. Consequently, the statistical analysis of the aggregated data would not be correct, which could potentially lead to false biological conclusions. In order to take into account the heterogeneity of the subsamples, we propose to use ordinal regression, which is a type of generalized linear regression. In this paper, we study the populations of cowberry Vaccinium vitis-idaea L. and epiphytic lichens Hypogymnia physodes (L.) Nyl. and Pseudevernia furfuracea (L.) Zopf. We obtain estimates for the proportions of between-sample variability in the total variability of the ontogenetic spectra of the populations.

  12. Artificial neural networks environmental forecasting in comparison with multiple linear regression technique: From heavy metals to organic micropollutants screening in agricultural soils

    Science.gov (United States)

    Bonelli, Maria Grazia; Ferrini, Mauro; Manni, Andrea

    2016-12-01

    The assessment of metal and organic micropollutant contamination in agricultural soils is a difficult challenge due to the extensive area over which a very large number of samples must be collected and analyzed. For Dioxins and dioxin-like PCBs measurement methods and the subsequent treatment of data, the European Community advises the development of low-cost and fast methods allowing routine analysis of a great number of samples, providing rapid measurement of these compounds in the environment, feeds and food. The aim of the present work has been to find a method suitable to describe the relations occurring between organic and inorganic contaminants and use the value of the latter in order to forecast the former. In practice, the use of a metal portable soil analyzer coupled with an efficient statistical procedure enables the required objective to be achieved. Compared to Multiple Linear Regression, the Artificial Neural Networks technique has proved to be an excellent forecasting method, even though there is no linear correlation between the variables to be analyzed.

  13. Reduced Rank Regression

    DEFF Research Database (Denmark)

    Johansen, Søren

    2008-01-01

    The reduced rank regression model is a multivariate regression model with a coefficient matrix with reduced rank. The reduced rank regression algorithm is an estimation procedure, which estimates the reduced rank regression model. It is related to canonical correlations and involves calculating eigenvalues and eigenvectors. We give a number of different applications to regression and time series analysis, and show how the reduced rank regression estimator can be derived as a Gaussian maximum likelihood estimator. We briefly mention asymptotic results...

  14. Using Correspondence Analysis in Multiple Case Studies

    NARCIS (Netherlands)

    Kienstra, Natascha; van der Heijden, Peter G.M.

    2015-01-01

    In qualitative research of multiple case studies, Miles and Huberman proposed to summarize the separate cases in a so-called meta-matrix that consists of cases by variables. Yin discusses cross-case synthesis to study this matrix. We propose correspondence analysis (CA) as a useful tool to study thi

  15. Using correspondence analysis in multiple case studies

    NARCIS (Netherlands)

    Kienstra, N.H.H.; van der Heijden, P.G.M.

    2015-01-01

    In qualitative research of multiple case studies, Miles and Huberman proposed to summarize the separate cases in a so-called meta-matrix that consists of cases by variables. Yin discusses cross-case synthesis to study this matrix. We propose correspondence analysis (CA) as a useful tool to study thi

  16. Uso de regressões logísticas múltiplas para mapeamento digital de solos no Planalto Médio do RS / Multiple logistic regression applied to soil survey in Rio Grande do Sul state, Brazil

    Directory of Open Access Journals (Sweden)

    Samuel Ribeiro Figueiredo

    2008-12-01

    hydrographic variables (distance to rivers, flow length, topographical wetness index, and stream power index). Multiple logistic regressions were established between the soil classes mapped on the basis of a traditional survey at a scale of 1:80,000 and the land variables calculated using the DEM. The regressions were used to calculate the probability of occurrence of each soil class. The final estimated soil map was drawn by assigning the soil class with highest probability of occurrence to each cell. The general accuracy was evaluated at 58 % and the Kappa coefficient at 38 % in a comparison of the original soil map with the map estimated at the original scale. Legend simplification did little to increase the general accuracy of the map (general accuracy of 61 % and Kappa coefficient of 39 %). It was concluded that multiple logistic regressions have predictive potential as a tool for supervised soil mapping.
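    General accuracy and the Kappa coefficient are both derived from a confusion matrix that cross-tabulates the reference map against the estimated map. The sketch below uses the standard Cohen's kappa formula with a hypothetical two-class matrix; the counts are made up and the function name is an assumption.

```python
def accuracy_and_kappa(matrix):
    """Overall accuracy and Cohen's kappa from a square confusion matrix.

    matrix[i][j] counts cells of reference class i assigned to estimated class j.
    """
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(size)) / total
    # Chance agreement: product of row and column marginals, summed over classes.
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(size)
    ) / total ** 2
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical confusion matrix for two soil classes (illustration only).
confusion = [
    [40, 10],
    [20, 30],
]

accuracy, kappa = accuracy_and_kappa(confusion)
print(accuracy, kappa)
```

    Kappa is lower than accuracy because it discounts the agreement expected by chance, which is why the two figures reported in the abstract differ.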

  17. Multiple Regression with Varying Levels of Correlation among Predictors: Monte Carlo Sampling from Normal and Non-Normal Populations.

    Science.gov (United States)

    Vasu, Ellen Storey

    1978-01-01

    The effects of the violation of the assumption of normality in the conditional distributions of the dependent variable, coupled with the condition of multicollinearity upon the outcome of testing the hypothesis that the regression coefficient equals zero, are investigated via a Monte Carlo study. (Author/JKS)

  18. Guide to using Multiple Regression in Excel (MRCX v.1.1) for Removal of River Stage Effects from Well Water Levels

    Energy Technology Data Exchange (ETDEWEB)

    Mackley, Rob D.; Spane, Frank A.; Pulsipher, Trenton C.; Allwardt, Craig H.

    2010-09-01

    A software tool was created in Fiscal Year 2010 (FY10) that enables multiple-regression correction of well water levels for river-stage effects. This task was conducted as part of the Remediation Science and Technology project of CH2MHILL Plateau Remediation Company (CHPRC). This document contains an overview of the correction methodology and a user’s manual for Multiple Regression in Excel (MRCX) v.1.1. It also contains a step-by-step tutorial that shows users how to use MRCX to correct river effects in two different wells. This report is accompanied by an enclosed CD that contains the MRCX installer application and files used in the tutorial exercises.

  19. Implementing informative priors for heterogeneity in meta-analysis using meta-regression and pseudo data.

    Science.gov (United States)

    Rhodes, Kirsty M; Turner, Rebecca M; White, Ian R; Jackson, Dan; Spiegelhalter, David J; Higgins, Julian P T

    2016-12-20

    Many meta-analyses combine results from only a small number of studies, a situation in which the between-study variance is imprecisely estimated when standard methods are applied. Bayesian meta-analysis allows incorporation of external evidence on heterogeneity, providing the potential for more robust inference on the effect size of interest. We present a method for performing Bayesian meta-analysis using data augmentation, in which we represent an informative conjugate prior for between-study variance by pseudo data and use meta-regression for estimation. To assist in this, we derive predictive inverse-gamma distributions for the between-study variance expected in future meta-analyses. These may serve as priors for heterogeneity in new meta-analyses. In a simulation study, we compare approximate Bayesian methods using meta-regression and pseudo data against fully Bayesian approaches based on importance sampling techniques and Markov chain Monte Carlo (MCMC). We compare the frequentist properties of these Bayesian methods with those of the commonly used frequentist DerSimonian and Laird procedure. The method is implemented in standard statistical software and provides a less complex alternative to standard MCMC approaches. An importance sampling approach produces almost identical results to standard MCMC approaches, and results obtained through meta-regression and pseudo data are very similar. On average, data augmentation provides closer results to MCMC, if implemented using restricted maximum likelihood estimation rather than DerSimonian and Laird or maximum likelihood estimation. The methods are applied to real datasets, and an extension to network meta-analysis is described. The proposed method facilitates Bayesian meta-analysis in a way that is accessible to applied researchers. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.

  20. Psoriasis regression analysis of MHC loci identifies shared genetic variants with vitiligo.

    Directory of Open Access Journals (Sweden)

    Kun-Ju Zhu

    Psoriasis is a common inflammatory skin disease with genetic components of both the immune system and the epidermis. PSOR1 locus (6q21) has been strongly associated with psoriasis; however, it is difficult to identify additional independent association due to strong linkage disequilibrium in the MHC region. We performed stepwise regression analyses of more than 3,000 SNPs in the MHC region genotyped using Human 610-Quad (Illumina) in 1,139 cases with psoriasis and 1,132 controls of Han Chinese population to search for additional independent association. With four regression models obtained, two SNPs, rs9468925 in HLA-C/HLA-B and rs2858881 in HLA-DQA2, were repeatedly selected in all models, suggesting that multiple loci outside the PSOR1 locus were associated with psoriasis. More importantly, we find that rs9468925 in HLA-C/HLA-B is associated with both psoriasis and vitiligo, providing first important evidence that two major skin diseases share a common genetic locus in the MHC, and a basis for elucidating the molecular mechanism of skin disorders.

  1. Do drug treatment variables predict cognitive performance in multidrug-treated opioid-dependent patients? A regression analysis study

    Directory of Open Access Journals (Sweden)

    Rapeli Pekka

    2012-11-01

    Background: Cognitive deficits and multiple psychoactive drug regimens are both common in patients treated for opioid dependence. Therefore, we examined whether the cognitive performance of patients in opioid-substitution treatment (OST) is associated with their drug treatment variables. Methods: Opioid-dependent patients (N = 104) who were treated with either buprenorphine or methadone (n = 52 in both groups) were given attention, working memory, verbal, and visual memory tests after they had been in treatment for a minimum of six months. Group-wise results were analysed by analysis of variance. Predictors of cognitive performance were examined by hierarchical regression analysis. Results: Buprenorphine-treated patients performed statistically significantly better in a simple reaction time test than methadone-treated ones. No other significant differences between groups in cognitive performance were found. In each OST drug group, approximately 10% of the attention performance could be predicted by drug treatment variables. Use of benzodiazepine (BZD) medication predicted about 10% of performance variance in working memory. Treatment with more than one other psychoactive drug (other than opioid or BZD) and frequent substance abuse during the past month predicted about 20% of verbal memory performance. Conclusions: Although this study does not prove a causal relationship between multiple prescription drug use and poor cognitive functioning, the results are relevant for psychosocial recovery, vocational rehabilitation, and psychological treatment of OST patients. Especially for patients with BZD treatment, other treatment options should be actively sought.

  2. Bayesian nonparametric regression analysis of data with random effects covariates from longitudinal measurements.

    Science.gov (United States)

    Ryu, Duchwan; Li, Erning; Mallick, Bani K

    2011-06-01

    We consider nonparametric regression analysis in a generalized linear model (GLM) framework for data with covariates that are the subject-specific random effects of longitudinal measurements. The usual assumption that the effects of the longitudinal covariate processes are linear in the GLM may be unrealistic and if this happens it can cast doubt on the inference of observed covariate effects. Allowing the regression functions to be unknown, we propose to apply Bayesian nonparametric methods including cubic smoothing splines or P-splines for the possible nonlinearity and use an additive model in this complex setting. To improve computational efficiency, we propose the use of data-augmentation schemes. The approach allows flexible covariance structures for the random effects and within-subject measurement errors of the longitudinal processes. The posterior model space is explored through a Markov chain Monte Carlo (MCMC) sampler. The proposed methods are illustrated and compared to other approaches, the "naive" approach and the regression calibration, via simulations and by an application that investigates the relationship between obesity in adulthood and childhood growth curves.

  3. Evaluation of a LASSO regression approach on the unrelated samples of Genetic Analysis Workshop 17.

    Science.gov (United States)

    Guo, Wei; Elston, Robert C; Zhu, Xiaofeng

    2011-11-29

    The Genetic Analysis Workshop 17 data we used comprise 697 unrelated individuals genotyped at 24,487 single-nucleotide polymorphisms (SNPs) from a mini-exome scan, using real sequence data for 3,205 genes annotated by the 1000 Genomes Project and simulated phenotypes. We studied 200 sets of simulated phenotypes of trait Q2. An important feature of this data set is that most SNPs are rare, with 87% of the SNPs having a minor allele frequency less than 0.05. For rare SNP detection, in this study we performed a least absolute shrinkage and selection operator (LASSO) regression and F tests at the gene level and calculated the generalized degrees of freedom to avoid any selection bias. For comparison, we also carried out linear regression and the collapsing method, which sums the rare SNPs, modified for a quantitative trait and with two different allele frequency thresholds. The aim of this paper is to evaluate these four approaches in this mini-exome data and compare their performance in terms of power and false positive rates. In most situations the LASSO approach is more powerful than linear regression and collapsing methods. We also note the difficulty in determining the optimal threshold for the collapsing method and the significant role that linkage disequilibrium plays in detecting rare causal SNPs. If a rare causal SNP is in strong linkage disequilibrium with a common marker in the same gene, power will be much improved.

  4. Comparison of various texture classification methods using multiresolution analysis and linear regression modelling.

    Science.gov (United States)

    Dhanya, S; Kumari Roshni, V S

    2016-01-01

    Textures play an important role in image classification. This paper proposes a high performance texture classification method using a combination of multiresolution analysis tool and linear regression modelling by channel elimination. The correlation between different frequency regions has been validated as a sort of effective texture characteristic. This method is motivated by the observation that there exists a distinctive correlation between the image samples belonging to the same kind of texture, at different frequency regions obtained by a wavelet transform. Experimentally, it is observed that this correlation differs across textures. The linear regression modelling is employed to analyze this correlation and extract texture features that characterize the samples. Our method considers not only the frequency regions but also the correlation between these regions. This paper primarily focuses on applying the Dual Tree Complex Wavelet Packet Transform and the Linear Regression model for classification of the obtained texture features. Additionally the paper also presents a comparative assessment of the classification results obtained from the above method with two more types of wavelet transform methods namely the Discrete Wavelet Transform and the Discrete Wavelet Packet Transform.

  5. A PANEL REGRESSION ANALYSIS OF HUMAN CAPITAL RELEVANCE IN SELECTED SCANDINAVIAN AND SE EUROPEAN COUNTRIES

    Directory of Open Access Journals (Sweden)

    Filip Kokotovic

    2016-06-01

    The study of human capital relevance to economic growth is becoming increasingly important, taking into account its relevance in many of the Sustainable Development Goals proposed by the UN. This paper conducted a panel regression analysis of selected SE European countries and Scandinavian countries using the Granger causality test and pooled panel regression. In order to test the relevance of human capital on economic growth, several human capital proxy variables were identified. Aside from the human capital proxy variables, other explanatory variables were selected using stepwise regression, while the dependent variable was GDP. This paper concludes that there are significant structural differences in the economies of the two observed panels. Of the human capital proxy variables observed, for the panel of SE European countries only life expectancy was statistically significant and it had a negative impact on economic growth, while in the panel of Scandinavian countries total public expenditure on education had a statistically significant positive effect on economic growth. Based upon these results and existing studies, this paper concludes that human capital has a far more significant impact on economic growth in more developed economies.

  6. Bayesian Nonparametric Regression Analysis of Data with Random Effects Covariates from Longitudinal Measurements

    KAUST Repository

    Ryu, Duchwan

    2010-09-28

    We consider nonparametric regression analysis in a generalized linear model (GLM) framework for data with covariates that are the subject-specific random effects of longitudinal measurements. The usual assumption that the effects of the longitudinal covariate processes are linear in the GLM may be unrealistic and if this happens it can cast doubt on the inference of observed covariate effects. Allowing the regression functions to be unknown, we propose to apply Bayesian nonparametric methods including cubic smoothing splines or P-splines for the possible nonlinearity and use an additive model in this complex setting. To improve computational efficiency, we propose the use of data-augmentation schemes. The approach allows flexible covariance structures for the random effects and within-subject measurement errors of the longitudinal processes. The posterior model space is explored through a Markov chain Monte Carlo (MCMC) sampler. The proposed methods are illustrated and compared to other approaches, the "naive" approach and the regression calibration, via simulations and by an application that investigates the relationship between obesity in adulthood and childhood growth curves. © 2010, The International Biometric Society.

  7. A comparison of ordinal regression models in an analysis of factors associated with periodontal disease

    Directory of Open Access Journals (Sweden)

    Javali Shivalingappa

    2010-01-01

    Aim: The study aimed to determine the factors associated with periodontal disease (different levels of severity) by using different regression models for ordinal data. Design: A cross-sectional design was employed using clinical examination and the "questionnaire with interview" method. Materials and Methods: The study was conducted from June 2008 to October 2008 in Dharwad, Karnataka, India. It involved a systematic random sample of 1760 individuals aged 18-40 years. The periodontal disease examination was conducted by using the Community Periodontal Index for Treatment Needs (CPITN). Statistical Analysis Used: Regression models for ordinal data with different built-in link functions were used to determine the factors associated with periodontal disease. Results: The study findings indicated that the ordinal regression models with four built-in link functions (logit, probit, Clog-log and nlog-log) displayed similar results with negligible differences in significant factors associated with periodontal disease. Factors such as religion, caste, sources of drinking water, timings for sweet consumption, timings for cleaning or brushing the teeth and materials used for brushing teeth were significantly associated with periodontal disease in all ordinal models. Conclusions: The ordinal regression model with Clog-log is a better fit in determining significant factors associated with periodontal disease as compared to models with logit, probit and nlog-log built-in link functions. Factors such as caste and time for sweet consumption are negatively associated with periodontal disease, but religion, sources of drinking water, timings for cleaning or brushing the teeth and materials used for brushing teeth are significantly and positively associated with periodontal disease.

  8. Analysis of sparse data in logistic regression in medical research: A newer approach

    Directory of Open Access Journals (Sweden)

    S Devika

    2016-01-01

    Background and Objective: In the analysis of a dichotomous response variable, logistic regression is usually used. However, the performance of logistic regression in the presence of sparse data is questionable. In such a situation, a common problem is the presence of high odds ratios (ORs) with very wide 95% confidence intervals (CIs) (OR: >999.999, 95% CI: 999.999). In this paper, we addressed this issue by using the penalized logistic regression (PLR) method. Materials and Methods: Data from a case-control study on hyponatremia and hiccups conducted in Christian Medical College, Vellore, Tamil Nadu, India were used. The outcome variable was the presence/absence of hiccups and the main exposure variable was the status of hyponatremia. A simulation dataset was created with different sample sizes and with a different number of covariates. Results: A total of 23 cases and 50 controls were used for the analysis of ordinary and PLR methods. The main exposure variable hyponatremia was present in nine (39.13%) of the cases and in four (8.0%) of the controls. Of the 23 hiccup cases, all were males and among the controls, 46 (92.0%) were males. Thus, the complete separation between gender and the disease group led to an infinite OR with 95% CI (OR: >999.999, 95% CI: 999.999), whereas there was a finite and consistent regression coefficient for gender (OR: 5.35; 95% CI: 0.42, 816.48) using PLR. After adjusting for all the confounding variables, hyponatremia entailed 7.9 (95% CI: 2.06, 38.86) times higher risk for the development of hiccups as found using PLR, whereas the conventional method overestimated the risk (OR: 10.76; 95% CI: 2.17, 53.41). The simulation experiment shows that the estimated coverage probability of this method is near the nominal level of 95% even for small sample sizes and for a large number of covariates.
Conclusions: PLR is almost equal to the ordinary logistic regression when the sample size is large and is superior in small cell
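
    The separation problem described in this abstract can be seen directly in the crude 2x2 tables built from the counts it reports (9 of 23 cases and 4 of 50 controls with hyponatremia; 23 of 23 cases and 46 of 50 controls male). The sketch below computes unadjusted odds ratios and, as an assumption for illustration, applies a simple 0.5 continuity correction to every cell. That correction is a stand-in chosen for brevity; it is not the penalized likelihood method the abstract refers to, and it does not reproduce the adjusted estimates quoted above.

```python
import math


def odds_ratio(exposed_cases, unexposed_cases, exposed_controls,
               unexposed_controls, correction=0.0):
    """Crude odds ratio from a 2x2 table; `correction` is added to every cell."""
    a = exposed_cases + correction
    b = unexposed_cases + correction
    c = exposed_controls + correction
    d = unexposed_controls + correction
    if b == 0 or c == 0:
        # A zero cell in the denominator makes the estimate infinite.
        return math.inf
    return (a * d) / (b * c)


# Hyponatremia: 9 of 23 cases exposed, 4 of 50 controls exposed.
or_hyponatremia = odds_ratio(9, 14, 4, 46)

# Gender: all 23 cases male, 46 of 50 controls male (no female cases).
or_gender_raw = odds_ratio(23, 0, 46, 4)
or_gender_corrected = odds_ratio(23, 0, 46, 4, correction=0.5)

print(round(or_hyponatremia, 2), or_gender_raw, round(or_gender_corrected, 2))
```

    With no female cases the uncorrected estimate for gender is infinite, which is the behaviour the abstract attributes to ordinary logistic regression; any form of shrinkage or correction pulls it back to a finite value.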

  9. FREQFIT: Computer program which performs numerical regression and statistical chi-squared goodness of fit analysis

    Energy Technology Data Exchange (ETDEWEB)

    Hofland, G.S.; Barton, C.C.

    1990-10-01

    The computer program FREQFIT is designed to perform regression and statistical chi-squared goodness of fit analysis on one-dimensional or two-dimensional data. The program features an interactive user dialogue, numerous help messages, an option for screen or line printer output, and the flexibility to use practically any commercially available graphics package to create plots of the program's results. FREQFIT is written in Microsoft QuickBASIC, for IBM-PC compatible computers. A listing of the QuickBASIC source code for the FREQFIT program, a user manual, and sample input data, output, and plots are included. 6 refs., 1 fig.

  10. An Econometric Analysis of Modulated Realised Covariance, Regression and Correlation in Noisy Diffusion Models

    DEFF Research Database (Denmark)

    Kinnebrock, Silja; Podolskij, Mark

    This paper introduces a new estimator to measure the ex-post covariation between high-frequency financial time series under market microstructure noise. We provide an asymptotic limit theory (including feasible central limit theorems) for standard methods such as regression, correlation analysis...... and covariance, for which we obtain the optimal rate of convergence. We demonstrate some positive semidefinite estimators of the covariation and construct a positive semidefinite estimator of the conditional covariance matrix in the central limit theorem. Furthermore, we indicate how the assumptions on the noise...

  11. Application of nonlinear regression analysis for ammonium exchange by natural (Bigadic) clinoptilolite

    Energy Technology Data Exchange (ETDEWEB)

    Gunay, Ahmet [Department of Environmental Engineering, Faculty of Engineering and Architecture, Balikesir University (Turkey)], E-mail: ahmetgunay2@gmail.com

    2007-09-30

    The experimental data of ammonium exchange by natural Bigadic clinoptilolite were evaluated using nonlinear regression analysis. Three two-parameter isotherm models (Langmuir, Freundlich and Temkin) and three three-parameter isotherm models (Redlich-Peterson, Sips and Khan) were used to analyse the equilibrium data. Fitting of isotherm models was determined using values of the standard normalization error procedure (SNE) and the coefficient of determination (R²). The HYBRID error function provided the lowest sum of normalized error and the Khan model had better performance for modeling the equilibrium data. Thermodynamic investigation indicated that ammonium removal by clinoptilolite was favorable at lower temperatures and exothermic in nature.

  12. Regression analysis as an objective tool of economic management of rolling mill

    Directory of Open Access Journals (Sweden)

    Š. Vilamová

    2015-07-01

    The ability to optimize costs plays a key role in maintaining competitiveness of the company, because without detailed knowledge of costs, companies are not able to make the right decisions that will ensure their long-term growth. The aim of this article is to outline the problematic areas related to company costs and to contribute to a debate on the method used to determine the amount of fixed and variable costs, their monitoring and follow-up control. This article presents a potential use of regression analysis as an objective tool of economic management in metallurgical companies, as these companies have several specific features

  13. Classification of Error-Diffused Halftone Images Based on Spectral Regression Kernel Discriminant Analysis

    Directory of Open Access Journals (Sweden)

    Zhigao Zeng

    2016-01-01

    This paper proposes a novel algorithm to solve the challenging problem of classifying error-diffused halftone images. We first design the class feature matrices, after extracting the image patches according to their statistical characteristics, to classify the error-diffused halftone images. Then, spectral regression kernel discriminant analysis is used for feature dimension reduction. The error-diffused halftone images are finally classified using an idea similar to the nearest centroids classifier. As demonstrated by the experimental results, our method is fast and can achieve a high classification accuracy rate with an added benefit of robustness in tackling noise.

  14. A systematic review and meta-regression analysis of mivacurium for tracheal intubation

    OpenAIRE

    Vanlinthout, L.E.H.; Mesfin, S.H.; Hens, Niel; Vanacker, B. F.; Robertson, E. N.; Booij, L. H. D. J.

    2014-01-01

    We systematically reviewed factors associated with intubation conditions in randomised controlled trials of mivacurium, using random-effects meta-regression analysis. We included 29 studies of 1050 healthy participants. Four factors explained 72.9% of the variation in the probability of excellent intubation conditions: mivacurium dose, 24.4%; opioid use, 29.9%; time to intubation and age together, 18.6%. The odds ratio (95% CI) for excellent intubation was 3.14 (1.65–5.73) for doubling the mi...

  15. Estimating the causes of traffic accidents using logistic regression and discriminant analysis.

    Science.gov (United States)

    Karacasu, Murat; Ergül, Barış; Altin Yavuz, Arzu

    2014-01-01

    Factors that affect traffic accidents have been analysed in various ways. In this study, we use the methods of logistic regression and discriminant analysis to determine the damages due to injury and non-injury accidents in the Eskisehir Province. Data were obtained from the accident reports of the General Directorate of Security in Eskisehir; 2552 traffic accidents between January and December 2009 were investigated regarding whether they resulted in injury. According to the results, the effects of traffic accidents were reflected in the variables. These results provide a wealth of information that may aid future measures toward the prevention of undesired results.

  16. Multiple comparison analysis testing in ANOVA.

    Science.gov (United States)

    McHugh, Mary L

    2011-01-01

    The Analysis of Variance (ANOVA) test has long been an important tool for researchers conducting studies on multiple experimental groups and one or more control groups. However, ANOVA cannot provide detailed information on differences among the various study groups, or on complex combinations of study groups. To fully understand group differences in an ANOVA, researchers must conduct tests of the differences between particular pairs of experimental and control groups. Tests conducted on subsets of data tested previously in another analysis are called post hoc tests. A class of post hoc tests that provides this type of detailed information for ANOVA results is called "multiple comparison analysis" tests. The most commonly used multiple comparison analysis statistics include the following tests: Tukey, Newman-Keuls, Scheffé, Bonferroni and Dunnett. These statistical tools each have specific uses, advantages and disadvantages. Some are best used for testing theory while others are useful in generating new theory. Selection of the appropriate post hoc test will provide researchers with the most detailed information while limiting Type 1 errors due to alpha inflation.
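
    The "alpha inflation" mentioned at the end of this abstract, and the Bonferroni adjustment it lists, can be illustrated with a short calculation. The sketch assumes four study groups, a nominal alpha of 0.05 and independent pairwise tests; all three are illustrative assumptions rather than values from the article, and pairwise tests in a real ANOVA are not strictly independent.

```python
def familywise_error_rate(alpha, comparisons):
    """Chance of at least one Type 1 error across independent tests at level alpha."""
    return 1 - (1 - alpha) ** comparisons


def bonferroni_alpha(alpha, comparisons):
    """Per-test significance level that keeps the family-wise rate at or below alpha."""
    return alpha / comparisons


groups = 4                            # hypothetical number of study groups
pairs = groups * (groups - 1) // 2    # number of pairwise comparisons
inflated = familywise_error_rate(0.05, pairs)
per_test = bonferroni_alpha(0.05, pairs)
controlled = familywise_error_rate(per_test, pairs)

print(pairs, round(inflated, 3), round(per_test, 4), round(controlled, 4))
```

    Testing every pair at the unadjusted level lets the family-wise error rate grow well beyond the nominal 0.05, whereas testing each pair at the Bonferroni level keeps it just under 0.05.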

  17. Surface Roughness Prediction Model using Zirconia Toughened Alumina (ZTA) Turning Inserts: Taguchi Method and Regression Analysis

    Science.gov (United States)

    Mandal, Nilrudra; Doloi, Biswanath; Mondal, Biswanath

    2016-01-01

    In the present study, an attempt has been made to apply the Taguchi parameter design method and regression analysis for optimizing the cutting conditions on surface finish while machining AISI 4340 steel with the help of the newly developed yttria based Zirconia Toughened Alumina (ZTA) inserts. These inserts are prepared through wet chemical co-precipitation route followed by powder metallurgy process. Experiments have been carried out based on an orthogonal array L9 with three parameters (cutting speed, depth of cut and feed rate) at three levels (low, medium and high). Based on the mean response and signal to noise ratio (SNR), the best optimal cutting condition has been arrived at A3B1C1 i.e. cutting speed is 420 m/min, depth of cut is 0.5 mm and feed rate is 0.12 m/min considering the condition smaller is the better approach. Analysis of Variance (ANOVA) is applied to find out the significance and percentage contribution of each parameter. The mathematical model of surface roughness has been developed using regression analysis as a function of the above mentioned independent variables. The predicted values from the developed model and experimental values are found to be very close to each other justifying the significance of the model. A confirmation run has been carried out with 95 % confidence level to verify the optimized result and the values obtained are within the prescribed limit.
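
    The "smaller is the better" criterion used in this abstract corresponds to the standard Taguchi signal-to-noise ratio, SNR = -10 log10(mean of y²). A minimal sketch follows; the surface roughness readings are hypothetical and are not data from the study.

```python
import math


def snr_smaller_is_better(values):
    """Taguchi signal-to-noise ratio (dB) for a smaller-the-better response."""
    mean_square = sum(v * v for v in values) / len(values)
    return -10 * math.log10(mean_square)


# Hypothetical repeated roughness readings (micrometres) for two parameter settings.
setting_a = [0.8, 0.9, 0.85]
setting_b = [1.6, 1.7, 1.5]

snr_a = snr_smaller_is_better(setting_a)
snr_b = snr_smaller_is_better(setting_b)
print(round(snr_a, 2), round(snr_b, 2))
```

    The setting with the lower roughness yields the higher SNR, so the optimal level of each parameter is the one with the largest mean SNR.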

  18. Neck-focused panic attacks among Cambodian refugees; a logistic and linear regression analysis.

    Science.gov (United States)

    Hinton, Devon E; Chhean, Dara; Pich, Vuth; Um, Khin; Fama, Jeanne M; Pollack, Mark H

    2006-01-01

    Consecutive Cambodian refugees attending a psychiatric clinic were assessed for the presence and severity of current--i.e., at least one episode in the last month--neck-focused panic. Among the whole sample (N=130), in a logistic regression analysis, the Anxiety Sensitivity Index (ASI; odds ratio=3.70) and the Clinician-Administered PTSD Scale (CAPS; odds ratio=2.61) significantly predicted the presence of current neck panic (NP). Among the neck panic patients (N=60), in the linear regression analysis, NP severity was significantly predicted by NP-associated flashbacks (beta=.42), NP-associated catastrophic cognitions (beta=.22), and CAPS score (beta=.28). Further analysis revealed the effect of the CAPS score to be significantly mediated (Sobel test [Baron, R. M., & Kenny, D. A. (1986). The moderator-mediator variable distinction in social psychological research: conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51, 1173-1182]) by both NP-associated flashbacks and catastrophic cognitions. In the care of traumatized Cambodian refugees, NP severity, as well as NP-associated flashbacks and catastrophic cognitions, should be specifically assessed and treated.
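
    The Sobel test cited in this abstract checks whether an indirect effect a·b differs from zero, where a is the path from predictor to mediator and b the path from mediator to outcome. A minimal sketch using the usual first-order standard error is given below; the coefficients and standard errors are hypothetical, since the abstract does not report them.

```python
import math


def sobel_z(a, se_a, b, se_b):
    """Sobel z statistic for the indirect effect a*b (first-order standard error)."""
    return (a * b) / math.sqrt(b ** 2 * se_a ** 2 + a ** 2 * se_b ** 2)


def two_sided_p(z):
    """Two-sided p-value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical path estimates: predictor -> mediator (a), mediator -> outcome (b).
z = sobel_z(a=0.5, se_a=0.1, b=0.4, se_b=0.1)
p = two_sided_p(z)
print(round(z, 3), round(p, 4))
```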

  19. MULTIVARIATE STEPWISE LOGISTIC REGRESSION ANALYSIS ON RISK FACTORS OF VENTILATOR-ASSOCIATED PNEUMONIA IN COMPREHENSIVE ICU

    Institute of Scientific and Technical Information of China (English)

    管军; 杨兴易; 赵良; 林兆奋; 郭昌星; 李文放

    2003-01-01

    Objective: To investigate the incidence, crude mortality and independent risk factors of ventilator-associated pneumonia (VAP) in a comprehensive ICU in China. Methods: The clinical and microbiological data of all 97 patients receiving mechanical ventilation (>48 h) in our comprehensive ICU from January 1999 to December 2000 were retrospectively collected and analysed. First, several statistically significant risk factors were screened out with univariate analysis; then independent risk factors were determined with multivariate stepwise logistic regression analysis. Results: The incidence of VAP was 54.64% (15.60 cases per 1000 ventilation days) and the crude mortality 47.42%. The interval between the establishment of an artificial airway and the diagnosis of VAP was 6.9 ± 4.3 d. Univariate analysis suggested that indwelling naso-gastric tube, corticosteroid, acid inhibitor, third-generation cephalosporin/imipenem, non-infection lung disease, and extrapulmonary infection were the statistically significant risk factors of

  20. Selenium Exposure and Cancer Risk: an Updated Meta-analysis and Meta-regression.

    Science.gov (United States)

    Cai, Xianlei; Wang, Chen; Yu, Wanqi; Fan, Wenjie; Wang, Shan; Shen, Ning; Wu, Pengcheng; Li, Xiuyang; Wang, Fudi

    2016-01-20

    The objective of this study was to investigate the associations between selenium exposure and cancer risk. We identified 69 studies and applied meta-analysis, meta-regression and dose-response analysis to obtain available evidence. The results indicated that high selenium exposure had a protective effect on cancer risk (pooled OR = 0.78; 95%CI: 0.73-0.83). The results of linear and nonlinear dose-response analysis indicated that high serum/plasma selenium and toenail selenium had the efficacy on cancer prevention. However, we did not find a protective efficacy of selenium supplement. High selenium exposure may have different effects on specific types of cancer. It decreased the risk of breast cancer, lung cancer, esophageal cancer, gastric cancer, and prostate cancer, but it was not associated with colorectal cancer, bladder cancer, and skin cancer.
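
    A pooled odds ratio and its 95% confidence interval, as reported in this abstract (OR = 0.78; 95% CI: 0.73-0.83), can be converted back to the log scale on which meta-analysis and meta-regression operate. The sketch assumes the interval is symmetric on the log scale and was built with the normal quantile 1.96; neither assumption is stated in the abstract.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def log_or_and_se(odds_ratio, lower, upper):
    """Log odds ratio and its standard error recovered from a 95% CI."""
    log_or = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * Z_95)
    return log_or, se


log_or, se = log_or_and_se(0.78, 0.73, 0.83)
z = log_or / se
# Geometric midpoint of the interval; close to the reported OR if the CI is log-symmetric.
midpoint = math.exp((math.log(0.73) + math.log(0.83)) / 2)
print(round(log_or, 4), round(se, 4), round(z, 2), round(midpoint, 3))
```

    The geometric midpoint of the interval lands close to the reported point estimate, which supports the log-symmetry assumption for these figures.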

  1. Multiple-factor Non-conditional Logistic Regression Analysis of Influential Factors on Embryo Implantation of in Vitro Fertilization and Embryo Transfer

    Institute of Scientific and Technical Information of China (English)

    龚瑜; 白晓红; 吕睿; 宋学茹; 随笑琳; 赵晓徽

    2011-01-01

    Objective: To investigate influential factors on embryo transfer (ET) of in vitro fertilization and embryo transfer (IVF-ET). Methods: From January 2007 to September 2009, people who received in vitro fertilization and embryo transfer and intracytoplasmic sperm injection (ICSI) were recruited into this study. They were divided into two groups according to implantation rate: an implanted group (implantation rate was 100%, n=47) and a non-implanted group (implantation rate was 0, n=131). Influential factors on embryo implantation were analyzed by multivariate non-conditional logistic regression analysis, with whether or not the embryo implanted as the dependent variable and age, pathogeny, gonadotropin (Gn) dose, thickness of endometrium [on the day of injecting human chorionic gonadotrophin (hCG)], number of oocytes, in vitro fertilization and embryo transfer or intracytoplasmic sperm injection, fertilization rate, normal fertilization rate, polyspermic fertilization rate, high quality embryo rate, embryo transfer time (menstrual cycle), and cell number and fragmentation of the transferred embryos as the independent variables. The study protocol was approved by the Ethical Review Board of Investigation in Human Being of Tianjin Medical University General Hospital. Informed consent was obtained from all participants. Results: Variables in the equation of logistic regression were age (OR=0.844, 95%CI 23.10-35.44), normal fertilization rate (OR=1.019, 95%CI 46.95-100.00), high quality embryo rate (OR=1.018, 95%CI 28.18-100.00), embryo transfer time (menstrual cycle) (OR=1.143, 95%CI 8.03-29.09), and cell number of embryo transferred (OR=1.775, 95%CI 6.78-8.86). Within a certain extent (95%CI), embryo implantation rate decreased by 15.6% with each additional year of age; embryo implantation rate increased by 1.9% when normal fertilization rate increased by 1.0%; embryo implantation rate increased by 1.8% when high quality embryo

  2. Investigating the Quantitative Structure-Activity Relationships for Antibody Recognition of Two Immunoassays for Polycyclic Aromatic Hydrocarbons by Multiple Regression Methods

    Directory of Open Access Journals (Sweden)

    Yan-Feng Zhang

    2012-07-01

    Polycyclic aromatic hydrocarbons (PAHs) are ubiquitous contaminants found in the environment. Immunoassays represent useful analytical methods to complement traditional analytical procedures for PAHs. Cross-reactivity (CR) is a very useful characteristic for evaluating the extent of cross-reaction of a cross-reactant in immunoreactions and immunoassays. The quantitative relationships between the molecular properties and the CR of PAHs were established by stepwise multiple linear regression, principal component regression and partial least squares regression, using the data of two commercial enzyme-linked immunosorbent assay (ELISA) kits. The objective is to find the most important molecular properties that affect the CR, and to predict the CR by multiple regression methods. The results show that the physicochemical, electronic and topological properties of the PAH molecules have an integrated effect on the CR properties for the two ELISAs, among which molar solubility (Sm) and valence molecular connectivity index (3χv) are the most important factors. The obtained regression equations for the RisC kit are all statistically significant (p < 0.005) and show satisfactory ability for predicting CR values, while the equations for the RaPID kit are all not significant (p > 0.05) and not suitable for prediction. This is probably because the RisC immunoassay employs a monoclonal antibody, while the RaPID kit is based on a polyclonal antibody. Considering the important effect of solubility on the CR values, cross-reaction potential (CRP) is calculated and used as a complement to CR for the evaluation of cross-reactions in immunoassays. Only the compounds with both high CR and high CRP can cause intense cross-reactions in immunoassays.

  3. Measuring treatment and scale bias effects by linear regression in the analysis of OHI-S scores.

    Science.gov (United States)

    Moore, B J

    1977-05-01

    A linear regression model is presented for estimating unbiased treatment effects from OHI-S scores. An example is given to illustrate an analysis and to compare results of an unbiased regression estimator with those based on a biased simple difference estimator.

  4. Automated particle identification through regression analysis of size, shape and colour

    Science.gov (United States)

    Rodriguez Luna, J. C.; Cooper, J. M.; Neale, S. L.

    2016-04-01

    Rapid point-of-care diagnostic tests and tests that provide therapeutic information are now available for a range of specific conditions, from the measurement of blood glucose levels for diabetes to card agglutination tests for parasitic infections. Due to a lack of specificity, these tests are often backed up by more conventional lab-based diagnostic methods; for example, a card agglutination test may be carried out in the field for a suspected parasitic infection and, if positive, a blood sample can then be sent to a lab for confirmation. The eventual diagnosis is often achieved by microscopic examination of the sample. In this paper we propose a computerized vision system for aiding the diagnostic process; this system uses a novel particle recognition algorithm to improve specificity and speed during the diagnostic process. We show the detection and classification of different types of cells in a diluted blood sample using regression analysis of their size, shape and colour. The first step is to define the objects to be tracked, using a Gaussian Mixture Model for background subtraction and binary opening and closing for noise suppression. After separating the objects of interest from the background, the next challenge is to predict whether a given object belongs to a certain category. This is a classification problem, and the output of the algorithm is a Boolean value (true/false). As such, the computer program should be able to "predict" with a reasonable level of confidence whether a given particle belongs to the kind we are looking for. We show the use of a binary logistic regression analysis with three continuous predictors: size, shape and colour histogram. The results suggest these variables could be very useful in a logistic regression equation, as they proved to have a relatively high predictive value on their own.
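
    The classification step described in this abstract can be sketched as a binary logistic regression on three continuous predictors. This is only an illustration under stated assumptions: the feature values, their scaling to [0, 1], the gradient-descent fitting routine and all function names below are hypothetical and are not taken from the paper.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit [bias, w_size, w_shape, w_colour] by batch gradient descent on the log-loss."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi                      # derivative of the log-loss w.r.t. the logit
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def probability(w, x):
    """Predicted probability that particle x belongs to the target class."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def is_target(w, x, threshold=0.5):
    """Boolean output: True if the particle is classified as the target kind."""
    return probability(w, x) >= threshold

# Hypothetical training particles: (size, shape, colour score), each scaled to [0, 1].
X = [(0.90, 0.80, 0.70), (0.80, 0.90, 0.80), (0.85, 0.70, 0.90), (0.70, 0.85, 0.75),
     (0.20, 0.30, 0.10), (0.10, 0.20, 0.30), (0.30, 0.10, 0.20), (0.25, 0.35, 0.15)]
y = [1, 1, 1, 1, 0, 0, 0, 0]   # 1 = particle of interest, 0 = anything else

weights = fit_logistic(X, y)
print(is_target(weights, (0.8, 0.8, 0.8)), is_target(weights, (0.2, 0.2, 0.2)))
```

    In practice a library routine with regularization and proper diagnostics would normally replace the hand-written fit; the sketch only shows how three predictors map to a true/false decision.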

  5. Evaluating heterogeneity in indoor and outdoor air pollution using land-use regression and constrained factor analysis.

    Science.gov (United States)

    Levy, Jonathan I; Clougherty, Jane E; Baxter, Lisa K; Houseman, E Andres; Paciorek, Christopher J

    2010-12-01

    Previous studies have identified associations between traffic exposures and a variety of adverse health effects, but many of these studies relied on proximity measures rather than measured or modeled concentrations of specific air pollutants, complicating interpretability of the findings. An increasing number of studies have used land-use regression (LUR) or other techniques to model small-scale variability in concentrations of specific air pollutants. However, these studies have generally considered a limited number of pollutants, focused on outdoor concentrations (or indoor concentrations of ambient origin) when indoor concentrations are better proxies for personal exposures, and have not taken full advantage of statistical methods for source apportionment that may have provided insight about the structure of the LUR models and the interpretability of model results. Given these issues, the primary objective of our study was to determine predictors of indoor and outdoor residential concentrations of multiple traffic-related air pollutants within an urban area, based on a combination of central site monitoring data; geographic information system (GIS) covariates reflecting traffic and other outdoor sources; questionnaire data reflecting indoor sources and activities that affect ventilation rates; and factor-analytic methods to better infer source contributions. As part of a prospective birth cohort study assessing asthma etiology in urban Boston, we collected indoor and/or outdoor 3-to-4 day samples of nitrogen dioxide (NO2) and fine particulate matter with an aerodynamic diameter ≤ 2.5 μm (PM2.5) at 44 residences during multiple seasons of the year from 2003 through 2005. We performed reflectance analysis, x-ray fluorescence spectroscopy (XRF), and high-resolution inductively coupled plasma-mass spectrometry (ICP-MS) on particle filters to estimate the concentrations of elemental carbon (EC), trace elements, and water-soluble metals, respectively. We derived

  6. Quantitative structure-property relationship modeling of water-to-wet butyl acetate partition coefficient of 76 organic solutes using multiple linear regression and artificial neural network.

    Science.gov (United States)

    Dashtbozorgi, Zahra; Golmohammadi, Hassan

    2010-12-01

    The main aim of this study was the development of a quantitative structure-property relationship method using an artificial neural network (ANN) for predicting the water-to-wet butyl acetate partition coefficients of organic solutes. As a first step, a genetic algorithm-multiple linear regression model was developed; the descriptors appearing in this model were considered as inputs for the ANN. These descriptors are principal moment of inertia C (I(C)), area-weighted surface charge of hydrogen-bonding donor atoms (HACA-2), Kier and Hall index (order 2) ((2)χ), Balaban index (J), minimum bond order of a C atom (P(C)) and relative negative-charged SA (RNCS). Then a 6-4-1 neural network was generated for the prediction of water-to-wet butyl acetate partition coefficients of 76 organic solutes. Comparing the results obtained from the multiple linear regression and ANN models shows that the statistical parameters (Fisher ratio, correlation coefficient and standard error) of the ANN model are better than those of the regression model, which indicates that the nonlinear model can simulate the relationship between the structural descriptors and the partition coefficients of the investigated molecules more accurately.

  7. Application of least squares support vector regression and linear multiple regression for modeling removal of methyl orange onto tin oxide nanoparticles loaded on activated carbon and activated carbon prepared from Pistacia atlantica wood.

    Science.gov (United States)

    Ghaedi, M; Rahimi, Mahmoud Reza; Ghaedi, A M; Tyagi, Inderjeet; Agarwal, Shilpi; Gupta, Vinod Kumar

    2016-01-01

    Two novel and eco-friendly adsorbents, namely tin oxide nanoparticles loaded on activated carbon (SnO2-NP-AC) and activated carbon prepared from the wood of the tree Pistacia atlantica (AC-PAW), were used for the rapid removal and fast adsorption of methyl orange (MO) from the aqueous phase. The dependency of MO removal on various influential adsorption parameters was modeled and optimized using multiple linear regression (MLR) and least squares support vector regression (LSSVR). The optimal parameters for the LSSVR model were a γ value of 0.76 and a σ² of 0.15. For the testing data set, a mean square error (MSE) of 0.0010 and a coefficient of determination (R²) of 0.976 were obtained for the LSSVR model, and an MSE of 0.0037 and an R² of 0.897 were obtained for the MLR model. The adsorption equilibrium and kinetic data were found to be in good agreement with the Langmuir isotherm model and with the second-order equation and intra-particle diffusion models, respectively. A small amount of the proposed SnO2-NP-AC and AC-PAW (0.015 g and 0.08 g) is sufficient for successful rapid removal of methyl orange (>95%). The maximum adsorption capacity for SnO2-NP-AC and AC-PAW was 250 mg g⁻¹ and 125 mg g⁻¹, respectively.

  8. Stability and adaptability of runner peanut genotypes based on nonlinear regression and AMMI analysis

    Directory of Open Access Journals (Sweden)

    Roseane Cavalcanti dos Santos

    2012-08-01

    The objective of this work was to estimate the stability and adaptability of pod and seed yield in runner peanut genotypes based on the nonlinear regression and AMMI analysis. Yield data from 11 trials, distributed in six environments and three harvests, carried out in the Northeast region of Brazil during the rainy season were used. Significant effects of genotypes (G), environments (E), and GE interactions were detected in the analysis, indicating different behaviors among genotypes in favorable and unfavorable environmental conditions. The genotypes BRS Pérola Branca and LViPE‑06 are more stable and adapted to the semiarid environment, whereas LGoPE‑06 is a promising material for pod production, despite being highly dependent on favorable environments.

  9. Within-session analysis of the extinction of pavlovian fear-conditioning using robust regression

    Directory of Open Access Journals (Sweden)

    Vargas-Irwin, Cristina

    2010-06-01

    Traditionally, the analysis of extinction data in fear conditioning experiments has involved the use of standard linear models, mostly ANOVA of between-group differences of subjects that have undergone different extinction protocols, pharmacological manipulations or some other treatment. Although some studies report individual differences in quantities such as suppression rates or freezing percentages, these differences are not included in the statistical modeling. Within-subject response patterns are then averaged using coarse-grain time windows, which can overlook these individual performance dynamics. Here we illustrate an alternative analytical procedure consisting of 2 steps: the estimation of a trend for within-session data and the analysis of group differences in trend as the main outcome. This procedure is tested on real fear-conditioning extinction data, comparing trend estimates via Ordinary Least Squares (OLS) and robust Least Median of Squares (LMS) regression estimates, as well as comparing between-group differences and analyzing mean freezing percentage versus LMS slopes as outcomes.
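
    The first step of the procedure, estimating a within-session trend per subject, can be sketched by comparing an OLS slope with an approximate LMS slope. The freezing percentages below are hypothetical, and the LMS fit uses a common approximation (searching only lines through pairs of observed points) rather than an exact solver; neither is taken from the paper.

```python
from itertools import combinations
from statistics import median

def ols_line(x, y):
    """Ordinary least squares slope and intercept for a single predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def lms_line(x, y):
    """Approximate Least Median of Squares fit.

    Candidate lines are those passing through each pair of data points;
    the one with the smallest median squared residual is returned.
    """
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # vertical line, no finite slope
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        crit = median((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
        if best is None or crit < best[0]:
            best = (crit, slope, intercept)
    return best[1], best[2]

# Hypothetical freezing percentages for one subject over 10 extinction trials.
# Trials 9 and 10 are outliers that break an otherwise steady decline.
trials = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
freezing = [90, 82, 74, 66, 58, 50, 42, 34, 95, 90]

ols_slope, _ = ols_line(trials, freezing)
lms_slope, _ = lms_line(trials, freezing)
print(f"OLS slope: {ols_slope:.2f}  LMS slope: {lms_slope:.2f}")
```

    With these made-up data the OLS slope is pulled toward zero by the two outlying trials, while the LMS slope follows the decline shown by the majority of trials; the per-subject slopes would then serve as the outcome in the between-group comparison.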

  10. Using Spline Regression in Semi-Parametric Stochastic Frontier Analysis: An Application to Polish Dairy Farms

    DEFF Research Database (Denmark)

    Czekaj, Tomasz Gerard; Henningsen, Arne

    The estimation of the technical efficiency comprises a vast literature in the field of applied production economics. There are two predominant approaches: the non-parametric and non-stochastic Data Envelopment Analysis (DEA) and the parametric Stochastic Frontier Analysis (SFA). The DEA...... of specifying an unsuitable functional form and thus, model misspecification and biased parameter estimates. Given these problems of the DEA and the SFA, Fan, Li and Weersink (1996) proposed a semi-parametric stochastic frontier model that estimates the production function (frontier) by non-parametric......), Kumbhakar et al. (2007), and Henningsen and Kumbhakar (2009). The aim of this paper and its main contribution to the existing literature is the estimation of semi-parametric stochastic frontier models using a different non-parametric estimation technique: spline regression (Ma et al. 2011). We apply...

  11. A frailty model approach for regression analysis of multivariate current status data.

    Science.gov (United States)

    Chen, Man-Hua; Tong, Xingwei; Sun, Jianguo

    2009-11-30

    This paper discusses regression analysis of multivariate current status failure time data (The Statistical Analysis of Interval-censoring Failure Time Data. Springer: New York, 2006), which occur quite often in, for example, tumorigenicity experiments and epidemiologic investigations of the natural history of a disease. For the problem, several marginal approaches have been proposed that model each failure time of interest individually (Biometrics 2000; 56:940-943; Statist. Med. 2002; 21:3715-3726). In this paper, we present a full likelihood approach based on the proportional hazards frailty model. For estimation, an Expectation Maximization (EM) algorithm is developed and simulation studies suggest that the presented approach performs well for practical situations. The approach is applied to a set of bivariate current status data arising from a tumorigenicity experiment.

  12. The Analysis Of The Correlations And Regressions Between Some Characters On A Wheat Isogenic Varities Assortment

    Science.gov (United States)

    Păniţă, Ovidiu

    2015-09-01

    In the years 2012-2014, an assortment of 25 isogenic lines of wheat (Triticum aestivum ssp. vulgare) was tested at Banu-Maracine DRS. The analyzed characters were the number of seeds/spike, seed weight/spike (g), no. of spikes/m2, weight of a thousand seeds (WTS) (g) and no. of emerged plants/m2. Based on the recorded data and their statistical processing, a number of links between these characters were identified. Regression models were also identified between some of the studied characters. Based on component analysis, no. of seeds/spike and seed weight/spike are the components that account for more than 88% of the variance; a total of seven genotypes had positive scores for both factors.

  13. Diversity Performance Analysis on Multiple HAP Networks

    Directory of Open Access Journals (Sweden)

    Feihong Dong

    2015-06-01

    One of the main design challenges in wireless sensor networks (WSNs) is achieving a high-data-rate transmission for individual sensor devices. The high altitude platform (HAP) is an important communication relay platform for WSNs and next-generation wireless networks. Multiple-input multiple-output (MIMO) techniques provide the diversity and multiplexing gain, which can improve the network performance effectively. In this paper, a virtual MIMO (V-MIMO) model is proposed by networking multiple HAPs with the concept of multiple assets in view (MAV). In a shadowed Rician fading channel, the diversity performance is investigated. The probability density function (PDF) and cumulative distribution function (CDF) of the received signal-to-noise ratio (SNR) are derived. In addition, the average symbol error rate (ASER) with BPSK and QPSK is given for the V-MIMO model. The system capacity is studied for both perfect channel state information (CSI) and unknown CSI individually. The ergodic capacity with various SNR and Rician factors for different network configurations is also analyzed. The simulation results validate the effectiveness of the performance analysis. It is shown that the performance of the HAPs network in WSNs can be significantly improved by utilizing the MAV to achieve overlapping coverage, with the help of the V-MIMO techniques.

  14. Canopy Height Estimation in French Guiana with LiDAR ICESat/GLAS Data Using Principal Component Analysis and Random Forest Regressions

    Directory of Open Access Journals (Sweden)

    Ibrahim Fayad

    2014-11-01

    Estimating forest canopy height from large-footprint satellite LiDAR waveforms is challenging given the complex interaction between LiDAR waveforms, terrain, and vegetation, especially in dense tropical and equatorial forests. In this study, canopy height in French Guiana was estimated using multiple linear regression models and the Random Forest technique (RF). This analysis was either based on LiDAR waveform metrics extracted from the GLAS (Geoscience Laser Altimeter System) spaceborne LiDAR data and terrain information derived from the SRTM (Shuttle Radar Topography Mission) DEM (Digital Elevation Model) or on Principal Component Analysis (PCA) of GLAS waveforms. Results show that the best statistical model for estimating forest height based on waveform metrics and digital elevation data is a linear regression of waveform extent, trailing edge extent, and terrain index (RMSE of 3.7 m). For the PCA based models, better canopy height estimation results were observed using a regression model that incorporated both the first 13 principal components (PCs) and the waveform extent (RMSE = 3.8 m). Random Forest regressions revealed that the best configuration for canopy height estimation used all the following metrics: waveform extent, leading edge, trailing edge, and terrain index (RMSE = 3.4 m). Waveform extent was the variable that best explained canopy height, with an importance factor almost three times higher than those for the other three metrics (leading edge, trailing edge, and terrain index). Furthermore, the Random Forest regression incorporating the first 13 PCs and the waveform extent had a slightly-improved canopy height estimation in comparison to the linear model, with an RMSE of 3.6 m. In conclusion, multiple linear regressions and RF regressions provided canopy height estimations with similar precision using either LiDAR metrics or PCs. However, a regression model (linear regression or RF) based on the PCA of waveform samples with waveform

  15. Optimization of Game Formats in U-10 Soccer Using Logistic Regression Analysis

    Directory of Open Access Journals (Sweden)

    Amatria Mario

    2016-12-01

    Small-sided games provide young soccer players with better opportunities to develop their skills and progress as individual and team players. There is, however, little evidence on the effectiveness of different game formats in different age groups, and furthermore, these formats can vary between and even within countries. The Royal Spanish Soccer Association replaced the traditional grassroots 7-a-side format (F-7) with the 8-a-side format (F-8) in the 2011-12 season and the country’s regional federations gradually followed suit. The aim of this observational methodology study was to investigate which of these formats best suited the learning needs of U-10 players transitioning from 5-a-side futsal. We built a multiple logistic regression model to predict the success of offensive moves depending on the game format and the area of the pitch in which the move was initiated. Success was defined as a shot at the goal. We also built two simple logistic regression models to evaluate how the game format influenced the acquisition of technical-tactical skills. It was found that the probability of a shot at the goal was higher in F-7 than in F-8 for moves initiated in the Creation Sector-Own Half (0.08 vs 0.07) and the Creation Sector-Opponent's Half (0.18 vs 0.16). The probability was the same (0.04) in the Safety Sector. Children also had more opportunities to control the ball and pass or take a shot in the F-7 format (0.24 vs 0.20), and these were also more likely to be successful in this format (0.28 vs 0.19).
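
    The predicted probabilities reported in this abstract can be re-expressed on the odds scale that a logistic regression model works with. The short sketch below does that arithmetic; the function names are illustrative, and the odds ratios it yields are derived here from the rounded probabilities in the abstract, so they are approximations rather than figures reported by the authors.

```python
import math

def odds(p):
    """Convert a probability into odds."""
    return p / (1.0 - p)

def odds_ratio(p_a, p_b):
    """Odds ratio of outcome probability p_a relative to p_b."""
    return odds(p_a) / odds(p_b)

def logit(p):
    """Log-odds: the scale on which logistic regression is linear."""
    return math.log(odds(p))

def inv_logit(z):
    """Map a log-odds value back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Probabilities of a shot at goal, F-7 vs F-8, as quoted in the abstract.
sectors = {
    "Creation Sector-Own Half": (0.08, 0.07),
    "Creation Sector-Opponent's Half": (0.18, 0.16),
    "Safety Sector": (0.04, 0.04),
}
for name, (p_f7, p_f8) in sectors.items():
    print(f"{name}: OR (F-7 vs F-8) = {odds_ratio(p_f7, p_f8):.2f}")
```

    An odds ratio above 1 means that the odds of a shot are higher in F-7; a value of exactly 1, as in the Safety Sector, means the format makes no difference on this measure.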

  16. Analysis of Daytime and Nighttime Ground Level Ozone Concentrations Using Boosted Regression Tree Technique

    Directory of Open Access Journals (Sweden)

    Noor Zaitun Yahaya

    2017-01-01

    This paper investigated the use of boosted regression trees (BRTs) to draw an inference about daytime and nighttime ozone formation in a coastal environment. Hourly ground-level ozone data for a full calendar year in 2010 were obtained from the Kemaman (CA 002) air quality monitoring station. A BRT model was developed using hourly ozone data as a response variable and nitric oxide (NO), nitrogen dioxide (NO2), nitrogen oxides (NOx) and meteorological parameters as explanatory variables. The ozone BRT algorithm model was constructed from multiple regression models, and the 'best iteration' of the BRT model was obtained by optimizing prediction performance. Sensitivity testing of the BRT model was conducted to determine the best parameters and good explanatory variables. A number of trees between 2,500 and 3,500, a learning rate of 0.01, and an interaction depth of 5 were found to be the best settings for developing the ozone boosting model. The performance of the O3 boosting models was assessed: the fraction of predictions within a factor of two (FAC2), the coefficient of determination (R2) and the index of agreement (IOA) of the models were 0.93, 0.69 and 0.73 for daytime and 0.79, 0.55 and 0.69 for nighttime, respectively. Results showed that the model developed was within the acceptable range and could be used to understand ozone formation and identify potential sources of ozone for estimating O3 concentrations during daytime and nighttime. Results indicated that wind speed, wind direction, relative humidity, and temperature were the most dominant variables in terms of influencing ozone formation. Finally, empirical evidence of the production of a high ozone level by wind blowing from coastal areas towards the interior region, especially from industrial areas, was obtained.
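
    The three evaluation statistics named in this abstract can be computed as sketched below. The abstract does not give its formulas, so the definitions are assumptions: FAC2 as the share of predictions with 0.5 <= predicted/observed <= 2, R2 as the squared Pearson correlation, and IOA in Willmott's original form. The ozone values are hypothetical.

```python
def fac2(obs, pred):
    """Fraction of predictions within a factor of two of the observations."""
    pairs = [(o, p) for o, p in zip(obs, pred) if o > 0]  # ratio undefined for o == 0
    return sum(1 for o, p in pairs if 0.5 <= p / o <= 2.0) / len(pairs)

def r_squared(obs, pred):
    """Coefficient of determination, taken here as the squared Pearson correlation."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    sxy = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    sxx = sum((o - mo) ** 2 for o in obs)
    syy = sum((p - mp) ** 2 for p in pred)
    return sxy ** 2 / (sxx * syy)

def index_of_agreement(obs, pred):
    """Willmott's index of agreement (assumed variant), between 0 and 1."""
    mo = sum(obs) / len(obs)
    num = sum((p - o) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - mo) + abs(o - mo)) ** 2 for o, p in zip(obs, pred))
    return 1.0 - num / den

# Hypothetical hourly ozone observations and model predictions (ppb).
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 100.0]

print(fac2(observed, predicted))
print(round(r_squared(observed, predicted), 3))
print(round(index_of_agreement(observed, predicted), 3))
```

    All three statistics equal 1 for a perfect model, which makes a quick sanity check when adapting the functions to real monitoring data.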

  17. Investigation of the relationship between very warm days in Romania and large-scale atmospheric circulation using multiple linear regression approach

    Science.gov (United States)

    Barbu, N.; Cuculeanu, V.; Stefan, S.

    2016-10-01

    The aim of this study is to investigate the relationship between the frequency of very warm days (TX90p) in Romania and large-scale atmospheric circulation for winter (December-February) and summer (June-August) between 1962 and 2010. In order to achieve this, two catalogues from COST733Action were used to derive daily circulation types. Seasonal occurrence frequencies of the circulation types were calculated and used as predictors within the multiple linear regression model (MLRM) for the estimation of winter and summer TX90p values for 85 synoptic stations covering all of Romania. A forward selection procedure was used to find adequate predictor combinations, and those predictor combinations were tested for collinearity. The performance of the MLRMs was quantified based on the explained variance. Furthermore, the leave-one-out cross-validation procedure was applied and the root-mean-squared error skill score was calculated at station level in order to obtain reliable evidence of MLRM robustness. From this analysis, it can be stated that the MLRM performance is higher in winter than in summer. This is due to the annual cycle of incoming insolation and to local factors such as orography and surface albedo variations. The MLRM performances exhibit distinct variations between regions, with high performance in wintertime for the eastern and southern parts of the country and in summertime for the western part of the country. One can conclude that the MLRM generally captures the TX90p variability quite well and reveals the potential for statistical downscaling of TX90p values based on circulation types.
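
    The validation step mentioned in this abstract, leave-one-out cross-validation with an RMSE skill score, can be sketched as follows. Several simplifications are assumed: a single predictor stands in for the multiple regression, the reference forecast is the mean of the training years (the abstract does not name its reference), the skill score is taken as 1 - RMSE_model / RMSE_reference, and the data are hypothetical.

```python
import math

def fit_simple(x, y):
    """Least squares slope and intercept for one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def loocv_rmse_skill(x, y):
    """RMSE skill score from leave-one-out cross-validation.

    Each year is predicted from a model fitted to all other years; the
    reference prediction is the mean of those other years.
    """
    n = len(x)
    se_model = se_ref = 0.0
    for i in range(n):
        x_train, y_train = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
        slope, intercept = fit_simple(x_train, y_train)
        se_model += (y[i] - (intercept + slope * x[i])) ** 2
        se_ref += (y[i] - sum(y_train) / len(y_train)) ** 2
    rmse_model = math.sqrt(se_model / n)
    rmse_ref = math.sqrt(se_ref / n)
    return 1.0 - rmse_model / rmse_ref

# Hypothetical station data: seasonal frequency of one circulation type (days)
# and the number of very warm days (TX90p) in the same season.
circulation_freq = [5, 8, 12, 15, 20, 22, 25, 30]
tx90p = [2, 3, 5, 6, 9, 9, 11, 13]

skill = loocv_rmse_skill(circulation_freq, tx90p)
print(f"RMSE skill score: {skill:.2f}")
```

    A score near 1 indicates that the regression predicts held-out years much better than the reference, while a score at or below 0 indicates no added value.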

  18. Multivariate Regression Approach To Integrate Multiple Satellite And Tide Gauge Data For Real Time Sea Level Prediction

    DEFF Research Database (Denmark)

    Cheng, Yongcun; Andersen, Ole Baltazar; Knudsen, Per

    2010-01-01

    The Sea Level Thematic Assembly Center in the EU FP7 MyOcean project aims to build a sea level service for multiple satellite sea level observations at a European level for GMES marine applications. It aims to improve the sea level related products to guarantee the sustainability and the quality o...... stations with satellite altimetry....

  19. [Band depth analysis and partial least square regression based winter wheat biomass estimation using hyperspectral measurements].

    Science.gov (United States)

    Fu, Yuan-Yuan; Wang, Ji-Hua; Yang, Gui-Jun; Song, Xiao-Yu; Xu, Xin-Gang; Feng, Hai-Kuan

    2013-05-01

    The major limitation of using existing vegetation indices for crop biomass estimation is that they approach a saturation level asymptotically over a certain range of biomass. In order to resolve this problem, band depth analysis and partial least squares regression (PLSR) were combined to establish a winter wheat biomass estimation model in the present study. The models based on the combination of band depth analysis and PLSR were then compared with the models based on common vegetation indices in terms of estimation accuracy. Band depth analysis was conducted in the visible spectral domain (550-750 nm). Band depth, band depth ratio (BDR), normalized band depth index, and band depth normalized to area were utilized to represent band depth information. Among the calibrated estimation models, the models based on the combination of band depth analysis and PLSR reached higher accuracy than those based on the vegetation indices. Among them, the combination of BDR and PLSR achieved the highest accuracy (R² = 0.792, RMSE = 0.164 kg m⁻²). The results indicated that the combination of band depth analysis and PLSR could overcome the saturation problem and improve the biomass estimation accuracy when winter wheat biomass is large.

  20. An innovative land use regression model incorporating meteorology for exposure analysis.

    Science.gov (United States)

    Su, Jason G; Brauer, Michael; Ainslie, Bruce; Steyn, Douw; Larson, Timothy; Buzzelli, Michael

    2008-02-15

    The advent of spatial analysis and geographic information systems (GIS) has led to studies of chronic exposure and health effects based on the rationale that intra-urban variations in ambient air pollution concentrations are as great as inter-urban differences. Such studies typically rely on local spatial covariates (e.g., traffic, land use type) derived from circular areas (buffers) to predict concentrations/exposures at receptor sites, as a means of averaging the annual net effect of meteorological influences (i.e., wind speed, wind direction and insolation). This is the approach taken in the now popular land use regression (LUR) method. However, spatial studies of chronic exposures and temporal studies of acute exposures have not been adequately integrated. This paper presents an innovative LUR method implemented in a GIS environment that reflects both temporal and spatial variability and considers the role of meteorology. The new source area LUR integrates wind speed, wind direction and cloud cover/insolation to estimate hourly nitric oxide (NO) and nitrogen dioxide (NO2) concentrations from land use types (i.e., road network, commercial land use); these concentrations are then used as covariates to regress against NO and NO2 measurements at various receptor sites across the Vancouver region and are compared directly with estimates from a regular LUR. The results show that, when variability in seasonal concentration measurements is present, the source area LUR or SA-LUR model is a better option for concentration estimation.

  1. Personality disorders, violence, and antisocial behavior: a systematic review and meta-regression analysis.

    Science.gov (United States)

    Yu, Rongqin; Geddes, John R; Fazel, Seena

    2012-10-01

    The risk of antisocial outcomes in individuals with personality disorder (PD) remains uncertain. The authors synthesize the current evidence on the risks of antisocial behavior, violence, and repeat offending in PD, and they explore sources of heterogeneity in risk estimates through a systematic review and meta-regression analysis of observational studies comparing antisocial outcomes in personality disordered individuals with control groups. Fourteen studies examined risk of antisocial and violent behavior in 10,007 individuals with PD, compared with over 12 million general population controls. There was a substantially increased risk of violent outcomes in studies with all PDs (random-effects pooled odds ratio [OR] = 3.0, 95% CI = 2.6 to 3.5). Meta-regression revealed that antisocial PD and gender were associated with higher risks (p = .01 and .07, respectively). The odds of all antisocial outcomes were also elevated. Twenty-five studies reported the risk of repeat offending in PD compared with other offenders. The risk of a repeat offense was also increased (fixed-effects pooled OR = 2.4, 95% CI = 2.2 to 2.7) in offenders with PD. The authors conclude that although PD is associated with antisocial outcomes and repeat offending, the risk appears to differ by PD category, gender, and whether individuals are offenders or not.
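
    A fixed-effects pooled odds ratio of the kind quoted in this abstract can be illustrated with inverse-variance weighting of log odds ratios. The abstract does not state which estimator the authors used, so the inverse-variance method is an assumption, and the three 2x2 tables below are hypothetical, not data from the review.

```python
import math

def log_or_and_variance(a, b, c, d):
    """Log odds ratio and its variance from a 2x2 table.

    a, b: events / non-events in the exposed group (e.g. offenders with PD)
    c, d: events / non-events in the comparison group
    """
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d

def pooled_or_fixed(tables, z=1.96):
    """Inverse-variance fixed-effects pooled OR with a 95% confidence interval."""
    weighted_sum = total_weight = 0.0
    for a, b, c, d in tables:
        log_or, var = log_or_and_variance(a, b, c, d)
        weight = 1.0 / var            # more precise studies get more weight
        weighted_sum += weight * log_or
        total_weight += weight
    pooled_log = weighted_sum / total_weight
    se = math.sqrt(1.0 / total_weight)
    return (math.exp(pooled_log),
            math.exp(pooled_log - z * se),
            math.exp(pooled_log + z * se))

# Hypothetical studies: (reoffended, did not) for PD, then for other offenders.
studies = [(30, 70, 15, 85), (45, 155, 20, 180), (12, 38, 8, 42)]

pooled, ci_low, ci_high = pooled_or_fixed(studies)
print(f"Pooled OR = {pooled:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

    A random-effects model, as used for the violence outcomes in the abstract, would add a between-study variance term to each weight and typically widens the interval.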

  2. Bayesian linear regression with skew-symmetric error distributions with applications to survival analysis

    KAUST Repository

    Rubio, Francisco J.

    2016-02-09

    We study Bayesian linear regression models with skew-symmetric scale mixtures of normal error distributions. These kinds of models can be used to capture departures from the usual assumption of normality of the errors in terms of heavy tails and asymmetry. We propose a general noninformative prior structure for these regression models and show that the corresponding posterior distribution is proper under mild conditions. We extend these propriety results to cases where the response variables are censored. The latter scenario is of interest in the context of accelerated failure time models, which are relevant in survival analysis. We present a simulation study that demonstrates good frequentist properties of the posterior credible intervals associated with the proposed priors. This study also sheds some light on the trade-off between increased model flexibility and the risk of over-fitting. We illustrate the performance of the proposed models with real data. Although we focus on models with univariate response variables, we also present some extensions to the multivariate case in the Supporting Information.

  3. A genetic algorithm for variable selection in logistic regression analysis of radiotherapy treatment outcomes.

    Science.gov (United States)

    Gayou, Olivier; Das, Shiva K; Zhou, Su-Min; Marks, Lawrence B; Parda, David S; Miften, Moyed

    2008-12-01

    A given outcome of radiotherapy treatment can be modeled by analyzing its correlation with a combination of dosimetric, physiological, biological, and clinical factors, through a logistic regression fit of a large patient population. The quality of the fit is measured by the combination of the predictive power of this particular set of factors and the statistical significance of the individual factors in the model. We developed a genetic algorithm (GA), in which a small sample of all the possible combinations of variables is fitted to the patient data. New models are derived from the best models, through crossover and mutation operations, and are in turn fitted. The process is repeated until the sample converges to the combination of factors that best predicts the outcome. The GA was tested on a data set that investigated the incidence of lung injury in NSCLC patients treated with 3DCRT. The GA identified a model with two variables as the best predictor of radiation pneumonitis: the V30 (p=0.048) and the ongoing use of tobacco at the time of referral (p=0.074). This two-variable model was confirmed as the best model by analyzing all possible combinations of factors. In conclusion, genetic algorithms provide a reliable and fast way to select significant factors in logistic regression analysis of large clinical studies.

  4. Prognostics of Lithium-Ion Batteries Based on Battery Performance Analysis and Flexible Support Vector Regression

    Directory of Open Access Journals (Sweden)

    Shuai Wang

    2014-10-01

    Accurate prediction of the remaining useful life (RUL) of lithium-ion batteries is important for battery management systems. Traditional empirical data-driven approaches for RUL prediction usually require multidimensional physical characteristics including the current, voltage, usage duration, battery temperature, and ambient temperature. From a capacity fading analysis of lithium-ion batteries, it is found that the energy efficiency and battery working temperature are closely related to the capacity degradation, which account for all performance metrics of lithium-ion batteries with regard to the RUL and the relationships between some performance metrics. Thus, we devise a non-iterative prediction model based on flexible support vector regression (F-SVR) and an iterative multi-step prediction model based on support vector regression (SVR), using the energy efficiency and battery working temperature as input physical characteristics. The experimental results show that the proposed prognostic models achieve high prediction accuracy while using fewer input dimensions than the traditional empirical models.

  5. A quantile regression approach to the analysis of the quality of life determinants in the elderly

    Directory of Open Access Journals (Sweden)

    Serena Broccoli

    2013-05-01

    Objective. The aim of this study is to explain the effect of important covariates on the health-related quality of life (HRQoL) in elderly subjects. Methods. Data were collected within a longitudinal study that involves 5256 subjects aged 65 or older. The Visual Analogue Scale included in the EQ-5D Questionnaire, the EQ-VAS, was used to obtain a synthetic measure of quality of life. To model the EQ-VAS score, a quantile regression analysis was employed. This methodological approach was preferred to an OLS regression because of the typical distribution of the EQ-VAS score. The main covariates are: amount of weekly physical activity, reported problems in Activities of Daily Living (ADL), presence of cardiovascular diseases, diabetes, hypercholesterolemia, hypertension, and joint pain, as well as socio-demographic information. Main Results. (1) Even a low level of physical activity significantly influences quality of life in a positive way; (2) ADL problems, at least one cardiovascular disease, and joint pain strongly decrease the quality of life.
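
    The preference for quantile regression over OLS in the record above can be illustrated with a minimal sketch. It assumes the simplest possible model, a single constant with no covariates, and uses made-up scores on a 0-100 scale; the function names are hypothetical and nothing here comes from the study itself. Minimizing the check (pinball) loss recovers a sample quantile, which is less affected by a skewed distribution than the mean that OLS targets.

```python
def pinball_loss(y, q, tau):
    """Average check (pinball) loss of predicting the constant q at quantile level tau."""
    total = 0.0
    for yi in y:
        r = yi - q
        # Under-predictions are weighted by tau, over-predictions by (1 - tau).
        total += tau * r if r >= 0 else (tau - 1.0) * r
    return total / len(y)


def best_constant(y, tau):
    """Return the observed value that minimizes the pinball loss (a sample tau-quantile)."""
    return min(sorted(set(y)), key=lambda q: pinball_loss(y, q, tau))


# Hypothetical, left-skewed scores on a 0-100 scale (not data from the study).
scores = [20, 45, 60, 70, 75, 80, 85, 90, 95, 100, 100]

median_fit = best_constant(scores, 0.5)   # intercept-only median regression
lower_fit = best_constant(scores, 0.1)    # intercept-only 10th-percentile regression
mean_fit = sum(scores) / len(scores)      # what an intercept-only OLS fit would return

print(median_fit, lower_fit, round(mean_fit, 2))
```

    With covariates, the same loss is minimized over regression coefficients rather than a single constant, typically by linear programming; that step is omitted here.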

  6. Prediction of acute in vivo toxicity of some amine and amide drugs to rats by multiple linear regression, partial least squares and an artificial neural network.

    Science.gov (United States)

    Mahani, Mohamad Khayatzadeh; Chaloosi, Marzieh; Maragheh, Mohamad Ghanadi; Khanchi, Ali Reza; Afzali, Daryoush

    2007-09-01

    The oral acute in vivo toxicity of 32 amine and amide drugs was related to their structure-dependent properties. Genetic algorithm-partial least-squares and stepwise variable selection were applied to select meaningful descriptors. Multiple linear regression (MLR), artificial neural network (ANN) and partial least squares (PLS) models were created with the selected descriptors. The predictive ability of all three models was evaluated and compared on a set of five drugs that were not used in the modeling steps. Average errors of 0.168, 0.169 and 0.259 were obtained for MLR, ANN and PLS, respectively.

  7. Regression for economics

    CERN Document Server

    Naghshpour, Shahdad

    2012-01-01

    Regression analysis is the most commonly used statistical method in the world. Although few would characterize this technique as simple, regression is in fact both simple and elegant. The complexity that many attribute to regression analysis is often a reflection of their lack of familiarity with the language of mathematics. But regression analysis can be understood even without a mastery of sophisticated mathematical concepts. This book provides the foundation and will help demystify regression analysis using examples from economics and with real data to show the applications of the method. T

  8. ANALYSIS OF TUITION GROWTH RATES BASED ON CLUSTERING AND REGRESSION MODELS

    Directory of Open Access Journals (Sweden)

    Long Cheng

    2016-07-01

    Tuition plays a significant role in determining whether a student could afford higher education, which is one of the major driving forces for country development and social prosperity. So it is necessary to fully understand what factors might affect the tuition and how they affect it. However, many existing studies on the tuition growth rate either lack sufficient real data and proper quantitative models to support their conclusions, or are limited to focus on only a few factors that might affect the tuition growth rate, failing to make a comprehensive analysis. In this paper, we explore a wide variety of factors that might affect the tuition growth rate by use of large amounts of authentic data and different quantitative methods such as clustering and regression models.

  9. Electricity price forecasting using generalized regression neural network based on principal components analysis

    Institute of Scientific and Technical Information of China (English)

    牛东晓; 刘达; 邢棉

    2008-01-01

    A combined model based on principal components analysis (PCA) and generalized regression neural network (GRNN) was adopted to forecast electricity price in the day-ahead electricity market. PCA was applied to extract the main influences on the day-ahead price, avoiding the strong correlation between the input factors that might influence electricity price, such as the load of the forecasting hour, other historical loads and prices, weather and temperature; then GRNN was employed to forecast electricity price according to the main information extracted by PCA. To prove the efficiency of the combined model, a case from the PJM (Pennsylvania-New Jersey-Maryland) day-ahead electricity market was evaluated. Compared to the back-propagation (BP) neural network and the standard GRNN, the combined method reduces the mean absolute percentage error by about 3%.

  10. A generalized Defries-Fulker regression framework for the analysis of twin data.

    Science.gov (United States)

    Lazzeroni, Laura C; Ray, Amrita

    2013-01-01

    Twin studies compare the similarity between monozygotic twins to that between dizygotic twins in order to investigate the relative contributions of latent genetic and environmental factors influencing a phenotype. Statistical methods for twin data include likelihood estimation and Defries-Fulker regression. We propose a new generalization of the Defries-Fulker model that fully incorporates the effects of observed covariates on both members of a twin pair and is robust to violations of the Normality assumption. A simulation study demonstrates that the method is competitive with likelihood analysis. The Defries-Fulker strategy yields new insight into the parameter space of the twin model and provides a novel, prediction-based interpretation of twin study results that unifies continuous and binary traits. Due to the simplicity of its structure, extensions of the model have the potential to encompass generalized linear models, censored and truncated data; and gene by environment interactions.

  11. Model selection for marginal regression analysis of longitudinal data with missing observations and covariate measurement error.

    Science.gov (United States)

    Shen, Chung-Wei; Chen, Yi-Hau

    2015-10-01

    Missing observations and covariate measurement error commonly arise in longitudinal data. However, existing methods for model selection in marginal regression analysis of longitudinal data fail to address the potential bias resulting from these issues. To tackle this problem, we propose a new model selection criterion, the Generalized Longitudinal Information Criterion, which is based on an approximately unbiased estimator for the expected quadratic error of a considered marginal model accounting for both data missingness and covariate measurement error. The simulation results reveal that the proposed method performs quite well in the presence of missing data and covariate measurement error. On the contrary, the naive procedures without taking care of such complexity in data may perform quite poorly. The proposed method is applied to data from the Taiwan Longitudinal Study on Aging to assess the relationship of depression with health and social status in the elderly, accommodating measurement error in the covariate as well as missing observations.

  12. Logistic Regression Analysis on Factors Affecting Adoption of Rice-Fish Farming in North Iran

    Institute of Scientific and Technical Information of China (English)

    Seyyed Ali NOORHOSSEINI-NIYAKI; Mohammad Sadegh ALLAHYARI

    2012-01-01

    We evaluated the factors influencing the adoption of rice-fish farming in the Tavalesh region near the Caspian Sea in northern Iran. We conducted a survey with open-ended questions. Data were collected from 184 respondents (61 adopters and 123 non-adopters) randomly sampled from selected villages and analyzed using logistic regression and multiresponse analysis. Family size, number of contacts with an extension agent, participation in extension-education activities, membership in social institutions and the presence of farm workers were the most important socioeconomic factors for the adoption of the rice-fish farming system. In addition, economic problems were the most common issue reported by adopters. Other issues such as lack of access to appropriate fish food, losses of fish, lack of access to high quality fish fingerlings, and dehydration and poor water quality were also important to a number of farmers.

  13. Deterministic Assessment of Continuous Flight Auger Construction Durations Using Regression Analysis

    Directory of Open Access Journals (Sweden)

    Hossam E. Hosny

    2015-07-01

    One of the primary functions of construction equipment management is to calculate the production rate of equipment, which is a major input to time estimates, cost estimates and overall project planning. Accordingly, it is crucial for stakeholders to be able to compute equipment production rates. This may be achieved using an accurate, reliable and easy tool. The objective of this research is to provide a simple model that can be used by specialists to predict the duration of a proposed Continuous Flight Auger job. The model was obtained using a prioritizing technique based on expert judgment, followed by multiple regression analysis based on a representative sample. The model was then validated on a selected sample of projects. The average error of the model was calculated to be about 3%-6%.

  14. Biological stability in drinking water: a regression analysis of influencing factors

    Institute of Scientific and Technical Information of China (English)

    LU Wei; ZHANG Xiao-jian

    2005-01-01

    Parameters such as assimilable organic carbon (AOC), chloramine residual, water temperature, and water residence time were measured in drinking water from distribution systems in a northern city of China. The measurement results illustrate that when chloramine residual is more than 0.3 mg/L or AOC content is below 50 μg/L, the biological stability of drinking water can be controlled. Both chloramine residual and AOC are correlated with the log value of the Heterotrophic Plate Count (HPC); the correlation coefficients were -0.64 and 0.33, respectively. By regression analysis of the survey data, a statistical equation is presented, and it is concluded that disinfectant residual exerts the strongest influence on bacterial growth and that AOC is a suitable index to assess the biological stability of drinking water.

  15. Sensitivity Analysis to Select the Most Influential Risk Factors in a Logistic Regression Model

    Directory of Open Access Journals (Sweden)

    Jassim N. Hussain

    2008-01-01

    The traditional variable selection methods for survival data depend on iteration procedures, and control of this process assumes tuning parameters that are problematic and time consuming, especially if the models are complex and have a large number of risk factors. In this paper, we propose a new method based on global sensitivity analysis (GSA) to select the most influential risk factors. This contributes to simplification of the logistic regression model by excluding the irrelevant risk factors, thus eliminating the need to fit and evaluate a large number of models. Data from medical trials are suggested as a way to test the efficiency and capability of this method and as a way to simplify the model. This leads to construction of an appropriate model. The proposed method ranks the risk factors according to their importance.

  16. A Note on Penalized Regression Spline Estimation in the Secondary Analysis of Case-Control Data

    KAUST Repository

    Gazioglu, Suzan

    2013-05-25

    Primary analysis of case-control studies focuses on the relationship between disease (D) and a set of covariates of interest (Y, X). A secondary application of the case-control study, often invoked in modern genetic epidemiologic association studies, is to investigate the interrelationship between the covariates themselves. The task is complicated due to the case-control sampling, and to avoid the biased sampling that arises from the design, it is typical to use the control data only. In this paper, we develop penalized regression spline methodology that uses all the data, and improves precision of estimation compared to using only the controls. A simulation study and an empirical example are used to illustrate the methodology.

  17. Regression analysis between body and head measurements of Chinese alligators (Alligator sinensis) in the captive population

    Directory of Open Access Journals (Sweden)

    Wu, X. B.

    2006-06-01

    Four body-size and fourteen head-size measurements were taken from each Chinese alligator (Alligator sinensis) according to the measurements adapted from Verdade. Regression equations between body-size and head-size variables were presented to predict body size from head dimension. The coefficients of determination of captive animals concerning body- and head-size variables can be considered extremely high, which means most of the head-size variables studied can be useful for predicting body length. The result of multivariate allometric analysis indicated that the head elongates as in most other species of crocodilians. The allometric coefficients of snout length (SL) and lower ramus (LM) were greater than those of other head variables, which was considered to be possibly correlated to fights and prey. On the contrary, allometric coefficients for the variables of the orbit (OW, OL) and postorbital cranial roof (LCR) were lower than those of other variables.

  18. Multivariate Regression Analysis and Statistical Modeling for Summer Extreme Precipitation over the Yangtze River Basin, China

    Directory of Open Access Journals (Sweden)

    Tao Gao

    2014-01-01

    Extreme precipitation is likely to be one of the most severe meteorological disasters in China; however, studies on the physical factors affecting precipitation extremes and on corresponding prediction models remain limited. From a new point of view, the sensible heat flux (SHF) and latent heat flux (LHF), which have significant impacts on summer extreme rainfall in the Yangtze River basin (YRB), have been quantified, and the impact factors are then selected. Firstly, a regional extreme precipitation index was applied to determine Regions of Significant Correlation (RSC) by analyzing the spatial distribution of correlation coefficients between this index and SHF, LHF, and sea surface temperature (SST) on the global ocean scale; then the time series of SHF, LHF, and SST in the RSCs during 1967–2010 were selected. Furthermore, other factors that significantly affect variations in precipitation extremes over the YRB were also selected. Multiple stepwise regression and leave-one-out cross-validation (LOOCV) were used to analyze and test the influencing factors and the statistical prediction model. The correlation coefficient between the observed regional extreme index and the model simulation result is 0.85, significant at the 99% confidence level. This suggests that the forecast skill was acceptable, although many aspects of the prediction model should be improved.
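
    The leave-one-out cross-validation (LOOCV) step mentioned in the record above can be sketched as follows. This is a simplified illustration under stated assumptions: it uses a single predictor and ordinary least squares instead of the multiple stepwise regression of the study, and the data and function names are hypothetical. Each observation is predicted from a model fitted to all the other observations, and the held-out predictions are then correlated with the observed values.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def loocv_predictions(x, y):
    """Predict each y[i] from a line fitted without observation i."""
    preds = []
    for i in range(len(x)):
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        a, b = fit_line(x_train, y_train)
        preds.append(a + b * x[i])
    return preds


def pearson_r(u, v):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(u)
    mu = sum(u) / n
    mv = sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5


# Hypothetical predictor and response series (not data from the study).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

held_out = loocv_predictions(x, y)
skill = pearson_r(y, held_out)  # correlation between observed and held-out predictions
print(round(skill, 3))
```

    Because every prediction comes from a model that never saw the corresponding observation, the resulting correlation is a more honest measure of forecast skill than the in-sample fit.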

  19. Bayesian hierarchical regression analysis of variations in sea surface temperature change over the past million years

    Science.gov (United States)

    Snyder, Carolyn W.

    2016-09-01

    Statistical challenges often preclude comparisons among different sea surface temperature (SST) reconstructions over the past million years. Inadequate consideration of uncertainty can result in misinterpretation, overconfidence, and biased conclusions. Here I apply Bayesian hierarchical regressions to analyze local SST responsiveness to climate changes for 54 SST reconstructions from across the globe over the past million years. I develop methods to account for multiple sources of uncertainty, including the quantification of uncertainty introduced from absolute dating into interrecord comparisons. The estimates of local SST responsiveness explain 64% (62% to 77%, 95% interval) of the total variation within each SST reconstruction with a single number. There is remarkable agreement between SST proxy methods, with the exception of Mg/Ca proxy methods estimating muted responses at high latitudes. The Indian Ocean exhibits a muted response in comparison to other oceans. I find a stable estimate of the proposed "universal curve" of change in local SST responsiveness to climate changes as a function of sin²(latitude) over the past 400,000 years: SST change at 45°N/S is larger than the average tropical response by a factor of 1.9 (1.5 to 2.6, 95% interval) and explains 50% (35% to 58%, 95% interval) of the total variation between each SST reconstruction. These uncertainty and statistical methods are well suited for application across paleoclimate and environmental data series intercomparisons.

  20. Biplots in Reduced-Rank Regression

    NARCIS (Netherlands)

    Braak, ter C.J.F.; Looman, C.W.N.

    1994-01-01

    Regression problems with a number of related response variables are typically analyzed by separate multiple regressions. This paper shows how these regressions can be visualized jointly in a biplot based on reduced-rank regression. Reduced-rank regression combines multiple regression and principal c

  1. Automated Detection of Connective Tissue by Tissue Counter Analysis and Classification and Regression Trees

    Directory of Open Access Journals (Sweden)

    Josef Smolle

    2001-01-01

    Objective: To evaluate the feasibility of the CART (Classification and Regression Tree) procedure for the recognition of microscopic structures in tissue counter analysis. Methods: Digital microscopic images of H&E stained slides of normal human skin and of primary malignant melanoma were overlaid with regularly distributed square measuring masks (elements), and grey value, texture and colour features within each mask were recorded. In the learning set, elements were interactively labeled as representing either connective tissue of the reticular dermis, other tissue components or background. Subsequently, CART models were based on these data sets. Results: Implementation of the CART classification rules into the image analysis program showed that in an independent test set 94.1% of elements classified as connective tissue of the reticular dermis were correctly labeled. Automated measurements of the total amount of tissue and of the amount of connective tissue within a slide showed high reproducibility (r=0.97 and r=0.94, respectively; p < 0.001). Conclusions: The CART procedure in tissue counter analysis yields simple and reproducible classification rules for tissue elements.

  2. Computing mammographic density from a multiple regression model constructed with image-acquisition parameters from a full-field digital mammographic unit

    Energy Technology Data Exchange (ETDEWEB)

    Lu, Lee-Jane W [Department of Preventive Medicine and Community Health, University of Texas Medical Branch, Galveston, TX 77555-1109 (United States); Nishino, Thomas K [Department of Radiology, University of Texas Medical Branch, Galveston, TX 77555-0709 (United States); Khamapirad, Tuenchit [Department of Radiology, University of Texas Medical Branch, Galveston, TX 77555-0709 (United States); Grady, James J [Department of Preventive Medicine and Community Health, University of Texas Medical Branch, Galveston, TX 77555-1109 (United States); Jr, Morton H Leonard [Department of Radiology, University of Texas Medical Branch, Galveston, TX 77555-0709 (United States); Brunder, Donald G [Department of Academic Computing/Academic Resources, University of Texas Medical Branch, Galveston, TX 77555-1035 (United States)

    2007-08-21

    Breast density (the percentage of fibroglandular tissue in the breast) has been suggested to be a useful surrogate marker for breast cancer risk. It is conventionally measured using screen-film mammographic images by a labor-intensive histogram segmentation method (HSM). We have adapted and modified the HSM for measuring breast density from raw digital mammograms acquired by full-field digital mammography. Multiple regression model analyses showed that many of the instrument parameters for acquiring the screening mammograms (e.g. breast compression thickness, radiological thickness, radiation dose, compression force, etc) and image pixel intensity statistics of the imaged breasts were strong predictors of the observed threshold values (model R² = 0.93) and %-density (R² = 0.84). The intra-class correlation coefficient of the %-density for duplicate images was estimated to be 0.80, using the regression model-derived threshold values, and 0.94 if estimated directly from the parameter estimates of the %-density prediction regression model. Therefore, with additional research, these mathematical models could be used to compute breast density objectively, automatically bypassing the HSM step, and could greatly facilitate breast cancer research studies.

  3. Development of a regression model to predict copper toxicity to Daphnia magna and site-specific copper criteria across multiple surface-water drainages in an arid landscape.

    Science.gov (United States)

    Fulton, Barry A; Meyer, Joseph S

    2014-08-01

    The water effect ratio (WER) procedure developed by the US Environmental Protection Agency is commonly used to derive site-specific criteria for point-source metal discharges into perennial waters. However, experience is limited with this method in the ephemeral and intermittent systems typical of arid climates. The present study presents a regression model to develop WER-based site-specific criteria for a network of ephemeral and intermittent streams influenced by nonpoint sources of Cu in the southwestern United States. Acute (48-h) Cu toxicity tests were performed concurrently with Daphnia magna in site water samples and hardness-matched laboratory waters. Median effect concentrations (EC50s) for Cu in site water samples (n=17) varied by more than 12-fold, and the range of calculated WER values was similar. Statistically significant (α=0.05) univariate predictors of site-specific Cu toxicity included (in sequence of decreasing significance) dissolved organic carbon (DOC), hardness/alkalinity ratio, alkalinity, K, and total dissolved solids. A multiple-regression model developed from a combination of DOC and alkalinity explained 85% of the toxicity variability in site water samples, providing a strong predictive tool that can be used in the WER framework when site-specific criteria values are derived. The biotic ligand model (BLM) underpredicted toxicity in site waters by more than 2-fold. Adjustments to the default BLM parameters improved the model's performance but did not provide a better predictive tool compared with the regression model developed from DOC and alkalinity.

  4. Computing mammographic density from a multiple regression model constructed with image-acquisition parameters from a full-field digital mammographic unit

    Science.gov (United States)

    Lu, Lee-Jane W.; Nishino, Thomas K.; Khamapirad, Tuenchit; Grady, James J.; Leonard, Morton H., Jr.; Brunder, Donald G.

    2007-08-01

    Breast density (the percentage of fibroglandular tissue in the breast) has been suggested to be a useful surrogate marker for breast cancer risk. It is conventionally measured using screen-film mammographic images by a labor-intensive histogram segmentation method (HSM). We have adapted and modified the HSM for measuring breast density from raw digital mammograms acquired by full-field digital mammography. Multiple regression model analyses showed that many of the instrument parameters for acquiring the screening mammograms (e.g. breast compression thickness, radiological thickness, radiation dose, compression force, etc) and image pixel intensity statistics of the imaged breasts were strong predictors of the observed threshold values (model R2 = 0.93) and %-density (R2 = 0.84). The intra-class correlation coefficient of the %-density for duplicate images was estimated to be 0.80, using the regression model-derived threshold values, and 0.94 if estimated directly from the parameter estimates of the %-density prediction regression model. Therefore, with additional research, these mathematical models could be used to compute breast density objectively, automatically bypassing the HSM step, and could greatly facilitate breast cancer research studies.

  5. Binary Logistic Regression Versus Boosted Regression Trees in Assessing Landslide Susceptibility for Multiple-Occurring Regional Landslide Events: Application to the 2009 Storm Event in Messina (Sicily, southern Italy).

    Science.gov (United States)

    Lombardo, L.; Cama, M.; Maerker, M.; Parisi, L.; Rotigliano, E.

    2014-12-01

    This study aims at comparing the performances of Binary Logistic Regression (BLR) and Boosted Regression Trees (BRT) methods in assessing landslide susceptibility for multiple-occurrence regional landslide events within the Mediterranean region. A test area was selected in the north-eastern sector of Sicily (southern Italy), corresponding to the catchments of the Briga and the Giampilieri streams both stretching for few kilometres from the Peloritan ridge (eastern Sicily, Italy) to the Ionian sea. This area was struck on the 1st October 2009 by an extreme climatic event resulting in thousands of rapid shallow landslides, mainly of debris flows and debris avalanches types involving the weathered layer of a low to high grade metamorphic bedrock. Exploiting the same set of predictors and the 2009 landslide archive, BLR- and BRT-based susceptibility models were obtained for the two catchments separately, adopting a random partition (RP) technique for validation; besides, the models trained in one of the two catchments (Briga) were tested in predicting the landslide distribution in the other (Giampilieri), adopting a spatial partition (SP) based validation procedure. All the validation procedures were based on multi-folds tests so to evaluate and compare the reliability of the fitting, the prediction skill, the coherence in the predictor selection and the precision of the susceptibility estimates. All the obtained models for the two methods produced very high predictive performances, with a general congruence between BLR and BRT in the predictor importance. In particular, the research highlighted that BRT-models reached a higher prediction performance with respect to BLR-models, for RP based modelling, whilst for the SP-based models the difference in predictive skills between the two methods dropped drastically, converging to an analogous excellent performance. 
However, when looking at the precision of the probability estimates, BLR demonstrated to produce more robust

  6. Regressing Multiple Viral Plaques and Skin Fragility Syndrome in a Cat Coinfected with FcaPV2 and FcaPV3

    Directory of Open Access Journals (Sweden)

    Alberto Alberti

    2015-01-01

    Feline viral plaques are uncommon skin lesions clinically characterized by multiple, often pigmented, and slightly raised lesions. Numerous reports suggest that papillomaviruses (PVs) are involved in their development. Immunosuppressed and immunocompetent cats are both affected, the biological behavior is variable, and regression is possible but rarely documented. Here we report a case of a FIV-positive cat with skin fragility syndrome and regressing multiple viral plaques in which the concurrent presence of two PV types (FcaPV2 and FcaPV3) was demonstrated by combining a quantitative molecular approach with histopathology. The cat, under glucocorticoid therapy for stomatitis and pruritus, developed skin fragility and numerous grouped, slightly raised, nonulcerated pigmented macules and plaques with histological features of epidermal thickening, mild dysplasia, and presence of koilocytes. Absolute quantification of the viral DNA copies (4555 copies/microliter of FcaPV2 and 8655 copies/microliter of FcaPV3) was obtained. Eighteen months after discontinuation of glucocorticoid therapy, skin fragility and viral plaques had resolved. The role of the two viruses cannot be established, and it remains undetermined how each of the viruses contributed to the onset of the viral plaques; the spontaneous remission of skin lesions might have been induced by a change in FIV status over time due to glucocorticoid withdrawal and by the glucocorticoid withdrawal itself.

  7. The Jackknife Interval Estimation of Parameters in Partial Least Squares Regression Model for Poverty Data Analysis

    Directory of Open Access Journals (Sweden)

    Pudji Ismartini

    2010-08-01

    One of the major problems facing data modelling in the social sciences is multicollinearity. Multicollinearity can have a significant impact on the quality and stability of the fitted regression model. The common classical regression technique based on the Least Squares estimate is highly sensitive to the multicollinearity problem. In such a problem area, Partial Least Squares Regression (PLSR) is a useful and flexible tool for statistical model building; however, PLSR can only yield point estimates. This paper constructs interval estimates for the PLSR regression parameters by applying the Jackknife technique to poverty data. A SAS macro programme is developed to obtain the Jackknife interval estimator for PLSR.
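
    The jackknife idea in the record above can be illustrated with a short sketch. It is not the SAS macro from the paper and does not implement PLSR: as a labeled simplification, the estimator here is an ordinary least-squares slope with one predictor, the interval uses a normal approximation, and the data and function names are hypothetical. The procedure itself is generic: recompute the estimate with each observation left out, turn the spread of those estimates into a standard error, and form an interval around the full-sample estimate.

```python
from statistics import NormalDist


def ols_slope(x, y):
    """Least-squares slope of y on x (stand-in for a PLSR coefficient)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def jackknife_interval(x, y, estimator, level=0.95):
    """Return (estimate, lower, upper) using the delete-one jackknife standard error."""
    n = len(x)
    full = estimator(x, y)
    # Re-estimate n times, each time leaving one observation out.
    loo = [estimator(x[:i] + x[i + 1:], y[:i] + y[i + 1:]) for i in range(n)]
    loo_mean = sum(loo) / n
    se = ((n - 1) / n * sum((t - loo_mean) ** 2 for t in loo)) ** 0.5
    # Normal critical value; an assumption made here for simplicity.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return full, full - z * se, full + z * se


# Hypothetical data (not the poverty data analyzed in the paper).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.3, 3.8, 6.1, 8.2, 9.7, 12.4, 13.9, 16.2]

estimate, lower, upper = jackknife_interval(x, y, ols_slope)
print(round(estimate, 3), round(lower, 3), round(upper, 3))
```

    Since `jackknife_interval` only needs a function that maps data to a number, a PLSR coefficient estimator could be passed in place of `ols_slope` without changing the resampling logic.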

  8. The value of a statistical life: a meta-analysis with a mixed effects regression model.

    Science.gov (United States)

    Bellavance, François; Dionne, Georges; Lebeau, Martin

    2009-03-01

    The value of a statistical life (VSL) is a very controversial topic, but one which is essential to the optimization of governmental decisions. We see a great variability in the values obtained from different studies. The source of this variability needs to be understood, in order to offer public decision-makers better guidance in choosing a value and to set clearer guidelines for future research on the topic. This article presents a meta-analysis based on 39 observations obtained from 37 studies (from nine different countries) which all use a hedonic wage method to calculate the VSL. Our meta-analysis is innovative in that it is the first to use the mixed effects regression model [Raudenbush, S.W., 1994. Random effects models. In: Cooper, H., Hedges, L.V. (Eds.), The Handbook of Research Synthesis. Russel Sage Foundation, New York] to analyze studies on the value of a statistical life. We conclude that the variability found in the values studied stems in large part from differences in methodologies.

  9. Generalized multilevel function-on-scalar regression and principal component analysis.

    Science.gov (United States)

    Goldsmith, Jeff; Zipunnikov, Vadim; Schrack, Jennifer

    2015-06-01

    This manuscript considers regression models for generalized, multilevel functional responses: functions are generalized in that they follow an exponential family distribution and multilevel in that they are clustered within groups or subjects. This data structure is increasingly common across scientific domains and is exemplified by our motivating example, in which binary curves indicating physical activity or inactivity are observed for nearly 600 subjects over 5 days. We use a generalized linear model to incorporate scalar covariates into the mean structure, and decompose subject-specific and subject-day-specific deviations using multilevel functional principal components analysis. Thus, functional fixed effects are estimated while accounting for within-function and within-subject correlations, and major directions of variability within and between subjects are identified. Fixed effect coefficient functions and principal component basis functions are estimated using penalized splines; model parameters are estimated in a Bayesian framework using Stan, a programming language that implements a Hamiltonian Monte Carlo sampler. Simulations designed to mimic the application have good estimation and inferential properties with reasonable computation times for moderate datasets, in both cross-sectional and multilevel scenarios; code is publicly available. In the application we identify effects of age and BMI on the time-specific change in probability of being active over a 24-hour period; in addition, the principal components analysis identifies the patterns of activity that distinguish subjects and days within subjects.

  10. Integrative Data Analysis: The Simultaneous Analysis of Multiple Data Sets

    Science.gov (United States)

    Curran, Patrick J.; Hussong, Andrea M.

    2009-01-01

    There are both quantitative and methodological techniques that foster the development and maintenance of a cumulative knowledge base within the psychological sciences. Most noteworthy of these techniques is meta-analysis, which allows for the synthesis of summary statistics drawn from multiple studies when the original data are not available.…

  11. Quantile regression

    CERN Document Server

    Hao, Lingxin

    2007-01-01

    Quantile Regression, the first book of Hao and Naiman's two-book series, establishes the seldom recognized link between inequality studies and quantile regression models. Though separate methodological literature exists for each subject, the authors seek to explore the natural connections between this increasingly sought-after tool and research topics in the social sciences. Quantile regression as a method does not rely on assumptions as restrictive as those for the classical linear regression; though more traditional models such as least squares linear regression are more widely utilized, Hao

  12. Introduction to the use of regression models in epidemiology.

    Science.gov (United States)

    Bender, Ralf

    2009-01-01

    Regression modeling is one of the most important statistical techniques used in analytical epidemiology. By means of regression models the effect of one or several explanatory variables (e.g., exposures, subject characteristics, risk factors) on a response variable such as mortality or cancer can be investigated. From multiple regression models, adjusted effect estimates can be obtained that take the effect of potential confounders into account. Regression methods can be applied in all epidemiologic study designs so that they represent a universal tool for data analysis in epidemiology. Different kinds of regression models have been developed in dependence on the measurement scale of the response variable and the study design. The most important methods are linear regression for continuous outcomes, logistic regression for binary outcomes, Cox regression for time-to-event data, and Poisson regression for frequencies and rates. This chapter provides a nontechnical introduction to these regression models with illustrating examples from cancer research.

  13. Structural model analysis of multiple quantitative traits.

    Directory of Open Access Journals (Sweden)

    Renhua Li

    2006-07-01

    Full Text Available We introduce a method for the analysis of multilocus, multitrait genetic data that provides an intuitive and precise characterization of genetic architecture. We show that it is possible to infer the magnitude and direction of causal relationships among multiple correlated phenotypes and illustrate the technique using body composition and bone density data from mouse intercross populations. Using these techniques we are able to distinguish genetic loci that affect adiposity from those that affect overall body size and thus reveal a shortcoming of standardized measures such as body mass index that are widely used in obesity research. The identification of causal networks sheds light on the nature of genetic heterogeneity and pleiotropy in complex genetic systems.

  14. Establishing the change in antibiotic resistance of Enterococcus faecium strains isolated from Dutch broilers by logistic regression and survival analysis

    NARCIS (Netherlands)

    Stegeman, J.A.; Vernooij, J.C.M.; Khalifa, O.A.; Broek, van den J.; Mevius, D.J.

    2006-01-01

    In this study, we investigated the change in the resistance of Enterococcus faecium strains isolated from Dutch broilers against erythromycin and virginiamycin in 1998, 1999 and 2001 by logistic regression analysis and survival analysis. The E. faecium strains were isolated from caecal samples that

  15. PERFORMANCE OF RIDGE REGRESSION ESTIMATOR METHODS ON SMALL SAMPLE SIZE BY VARYING CORRELATION COEFFICIENTS: A SIMULATION STUDY

    Directory of Open Access Journals (Sweden)

    Anwar Fitrianto

    2014-01-01

    Full Text Available When the independent variables in a multiple linear regression model are highly linearly correlated, an analysis based on ordinary least squares (OLS) can give misleading results. In this situation, a ridge regression estimator is recommended. We conduct a simulation study to compare the performance of ridge regression estimators and OLS. We found that the Hoerl and Kennard ridge regression estimation method performs better than the other approaches.
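
    A small simulation can illustrate why ridge estimators are suggested for correlated predictors in small samples. The sketch below is not the study's design: the sample size, the noise levels, and the fixed ridge constant lam = 1.0 are arbitrary assumptions, and it does not implement the Hoerl and Kennard rule for choosing the constant. It compares only how much the first coefficient varies across replications.

```python
import random
import statistics

def fit_two_predictors(x1, x2, y, lam=0.0):
    """Solve (X'X + lam*I) b = X'y for two predictors without intercept.
    lam = 0 gives OLS; lam > 0 gives a ridge estimate."""
    s11 = sum(a * a for a in x1) + lam
    s22 = sum(b * b for b in x2) + lam
    s12 = sum(a * b for a, b in zip(x1, x2))
    t1 = sum(a * c for a, c in zip(x1, y))
    t2 = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    return (s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det

def simulate(n=15, reps=500, lam=1.0, seed=0):
    """Return the standard deviation of the first coefficient under OLS and ridge."""
    rng = random.Random(seed)
    ols_b1, ridge_b1 = [], []
    for _ in range(reps):
        x1 = [rng.gauss(0, 1) for _ in range(n)]
        x2 = [a + 0.05 * rng.gauss(0, 1) for a in x1]          # nearly collinear with x1
        y = [a + b + rng.gauss(0, 1) for a, b in zip(x1, x2)]  # true b1 = b2 = 1
        ols_b1.append(fit_two_predictors(x1, x2, y)[0])
        ridge_b1.append(fit_two_predictors(x1, x2, y, lam)[0])
    return statistics.stdev(ols_b1), statistics.stdev(ridge_b1)

sd_ols, sd_ridge = simulate()
print(f"sd of b1 across replications: OLS = {sd_ols:.2f}, ridge = {sd_ridge:.2f}")
```

    The ridge estimate is biased, so a lower spread alone does not settle the comparison; a fuller study would also report bias or mean squared error.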

  16. Quantitative laser-induced breakdown spectroscopy data using peak area step-wise regression analysis: an alternative method for interpretation of Mars science laboratory results

    Energy Technology Data Exchange (ETDEWEB)

    Clegg, Samuel M [Los Alamos National Laboratory; Barefield, James E [Los Alamos National Laboratory; Wiens, Roger C [Los Alamos National Laboratory; Dyar, Melinda D [MT HOLYOKE COLLEGE; Schafer, Martha W [LSU; Tucker, Jonathan M [MT HOLYOKE COLLEGE

    2008-01-01

    The ChemCam instrument on the Mars Science Laboratory (MSL) will include a laser-induced breakdown spectrometer (LIBS) to quantify major and minor elemental compositions. The traditional analytical chemistry approach to calibration curves for these data regresses a single diagnostic peak area against concentration for each element. This approach contrasts with a new multivariate method in which elemental concentrations are predicted by step-wise multiple regression analysis based on areas of a specific set of diagnostic peaks for each element. The method is tested on LIBS data from igneous and metamorphosed rocks. Between 4 and 13 partial regression coefficients are needed to describe each elemental abundance accurately (i.e., with a regression line of R² > 0.9995 for the relationship between predicted and measured elemental concentration) for all major and minor elements studied. Validation plots suggest that the method is limited at present by the small data set, and will work best for prediction of concentration when a wide variety of compositions and rock types has been analyzed.

  17. Application of BP network and multiple regression in wine quality model

    Institute of Scientific and Technical Information of China (English)

    孙文兵; 曾祥燕; 杨立君

    2014-01-01

    In order to determine the independent variables of the multiple linear regression and the input-layer neurons of the BP network, factor analysis is used to select the 12 physical and chemical indicators with the greatest impact on wine quality. Two models relating the main physical-chemical indicators of the wine and the wine grapes to wine quality are established, using multiple linear regression and an improved BP neural network, respectively. A comparison of the generalization performance of the two models shows that the average relative error of the multiple linear regression model in predicting new samples is 1.93%, while that of the BP neural network model is 0.37%. The simulations show that the generalization capability and stability of the BP neural network are better than those of the multiple regression model.

  18. A Comparison of Rule-based Analysis with Regression Methods in Understanding the Risk Factors for Study Withdrawal in a Pediatric Study.

    Science.gov (United States)

    Haghighi, Mona; Johnson, Suzanne Bennett; Qian, Xiaoning; Lynch, Kristian F; Vehik, Kendra; Huang, Shuai

    2016-01-01

    Regression models are extensively used in many epidemiological studies to understand the linkage between specific outcomes of interest and their risk factors. However, regression models in general examine the average effects of the risk factors and ignore subgroups with different risk profiles. As a result, interventions are often geared towards the average member of the population, without consideration of the special health needs of different subgroups within the population. This paper demonstrates the value of using rule-based analysis methods that can identify subgroups with heterogeneous risk profiles in a population without imposing assumptions on the subgroups or method. The rules define the risk pattern of subsets of individuals by not only considering the interactions between the risk factors but also their ranges. We compared the rule-based analysis results with the results from a logistic regression model in The Environmental Determinants of Diabetes in the Young (TEDDY) study. Both methods detected a similar suite of risk factors, but the rule-based analysis was superior at detecting multiple interactions between the risk factors that characterize the subgroups. A further investigation of the particular characteristics of each subgroup may detect the special health needs of the subgroup and lead to tailored interventions.

  19. Variable Selection for Functional Logistic Regression in fMRI Data Analysis

    Directory of Open Access Journals (Sweden)

    Nedret BILLOR

    2015-03-01

    Full Text Available This study was motivated by the classification problem in Functional Magnetic Resonance Imaging (fMRI), a noninvasive imaging technique which allows an experimenter to take images of a subject's brain over time. As fMRI studies usually have a small number of subjects and we assume that there is a smooth, underlying curve describing the observations in fMRI data, this results in incredibly high-dimensional datasets that are functional in nature. High dimensionality is one of the biggest problems in statistical analysis of fMRI data. There is also a need for the development of better classification methods. One of the best things about the fMRI technique is its noninvasiveness. If statistical classification methods are improved, it could aid the advancement of noninvasive diagnostic techniques for mental illness or even degenerative diseases such as Alzheimer's. In this paper, we develop a variable selection technique, which tackles high dimensionality and correlation problems in fMRI data, based on L1 regularization-group lasso for the functional logistic regression model where the response is binary and represents two separate classes; the predictors are functional. We assess our method with a simulation study and an application to a real fMRI dataset.

  20. A Vehicle Traveling Time Prediction Method Based on Grey Theory and Linear Regression Analysis

    Institute of Scientific and Technical Information of China (English)

    TU Jun; LI Yan-ming; LIU Cheng-liang

    2009-01-01

    Vehicle traveling time prediction is an important part of research on intelligent transportation systems. To date, there have been various kinds of methods for vehicle traveling time prediction, but few consider both time and space. In this paper, a vehicle traveling time prediction method based on grey theory (GT) and linear regression analysis (LRA) is presented. With respect to time, we use the historical data sequence of bus speed on a certain road to predict the future bus speed on that road by GT. With respect to space, we calculate the traffic affecting factors between various roads by LRA. Using these factors we can predict the vehicle's speed at the lower road if the vehicle's speed at the current road is known. Finally we use the time factor and the space factor as the weighting factors of the two results predicted by GT and LRA respectively to find the final result, thus calculating the vehicle's traveling time. The method also considers such factors as dwell time, thus making the prediction more accurate.

  1. Comparison of Bayesian and Classical Analysis of Weibull Regression Model: A Simulation Study

    Directory of Open Access Journals (Sweden)

    İmran KURT ÖMÜRLÜ

    2011-01-01

    Full Text Available Objective: The purpose of this study was to compare the performance of the classical Weibull Regression Model (WRM) and the Bayesian-WRM under varying conditions using Monte Carlo simulations. Material and Methods: Data were generated and analysed with both the classical WRM and the Bayesian-WRM under varying informative priors and sample sizes using our simulation algorithm. In the simulation studies, the sample sizes were n=50, 100 and 250, and informative prior values for b1 were selected using a normal prior distribution. For each situation, 1000 simulations were performed. Results: The Bayesian-WRM with a proper informative prior performed well, with very little bias. The bias of the Bayesian-WRM increased as the priors moved away from reliable values, at all sample sizes. Furthermore, given proper priors, the Bayesian-WRM produced predictions with smaller standard errors than the classical WRM in both small and large samples. Conclusion: In this simulation study, the Bayesian-WRM performed better than the classical method when the analysis incorporated expert opinion and historical knowledge about the parameters. Consequently, the Bayesian-WRM should be preferred when reliable informative priors exist; otherwise, the classical WRM should be preferred.

  2. Applying Different Independent Component Analysis Algorithms and Support Vector Regression for IT Chain Store Sales Forecasting

    Directory of Open Access Journals (Sweden)

    Wensheng Dai

    2014-01-01

    Full Text Available Sales forecasting is one of the most important issues in managing information technology (IT) chain store sales since an IT chain store has many branches. Integrating feature extraction method and prediction tool, such as support vector regression (SVR), is a useful method for constructing an effective sales forecasting scheme. Independent component analysis (ICA) is a novel feature extraction technique and has been widely applied to deal with various forecasting problems. But, up to now, only the basic ICA method (i.e., temporal ICA model) was applied to sale forecasting problem. In this paper, we utilize three different ICA methods including spatial ICA (sICA), temporal ICA (tICA), and spatiotemporal ICA (stICA) to extract features from the sales data and compare their performance in sales forecasting of IT chain store. Experimental results from a real sales data show that the sales forecasting scheme by integrating stICA and SVR outperforms the comparison models in terms of forecasting error. The stICA is a promising tool for extracting effective features from branch sales data and the extracted features can improve the prediction performance of SVR for sales forecasting.

  3. Applying different independent component analysis algorithms and support vector regression for IT chain store sales forecasting.

    Science.gov (United States)

    Dai, Wensheng; Wu, Jui-Yu; Lu, Chi-Jie

    2014-01-01

    Sales forecasting is one of the most important issues in managing information technology (IT) chain store sales since an IT chain store has many branches. Integrating feature extraction method and prediction tool, such as support vector regression (SVR), is a useful method for constructing an effective sales forecasting scheme. Independent component analysis (ICA) is a novel feature extraction technique and has been widely applied to deal with various forecasting problems. But, up to now, only the basic ICA method (i.e., temporal ICA model) was applied to sale forecasting problem. In this paper, we utilize three different ICA methods including spatial ICA (sICA), temporal ICA (tICA), and spatiotemporal ICA (stICA) to extract features from the sales data and compare their performance in sales forecasting of IT chain store. Experimental results from a real sales data show that the sales forecasting scheme by integrating stICA and SVR outperforms the comparison models in terms of forecasting error. The stICA is a promising tool for extracting effective features from branch sales data and the extracted features can improve the prediction performance of SVR for sales forecasting.

  4. Effect of acute hypoxia on cognition: A systematic review and meta-regression analysis.

    Science.gov (United States)

    McMorris, Terry; Hale, Beverley J; Barwood, Martin; Costello, Joseph; Corbett, Jo

    2017-03-01

    A systematic meta-regression analysis of the effects of acute hypoxia on the performance of central executive and non-executive tasks, and the effects of the moderating variables, arterial partial pressure of oxygen (PaO2) and hypobaric versus normobaric hypoxia, was undertaken. Studies were included if they were performed on healthy humans; within-subject design was used; data were reported giving the PaO2 or that allowed the PaO2 to be estimated (e.g. arterial oxygen saturation and/or altitude); and the duration of being in a hypoxic state prior to cognitive testing was ≤6 days. Twenty-two experiments met the criteria for inclusion and demonstrated a moderate, negative mean effect size (g = -0.49, 95% CI -0.64 to -0.34, p < 0.001). There were no significant differences between central executive and non-executive, perception/attention and short-term memory, tasks. Low (35-60 mmHg) PaO2 was the key predictor of cognitive performance (R² = 0.45, p < 0.001) and this was independent of whether the exposure was in hypobaric hypoxic or normobaric hypoxic conditions.

  5. A systematic review and meta-regression analysis of mivacurium for tracheal intubation.

    Science.gov (United States)

    Vanlinthout, L E H; Mesfin, S H; Hens, N; Vanacker, B F; Robertson, E N; Booij, L H D J

    2014-12-01

    We systematically reviewed factors associated with intubation conditions in randomised controlled trials of mivacurium, using random-effects meta-regression analysis. We included 29 studies of 1050 healthy participants. Four factors explained 72.9% of the variation in the probability of excellent intubation conditions: mivacurium dose, 24.4%; opioid use, 29.9%; time to intubation and age together, 18.6%. The odds ratio (95% CI) for excellent intubation was 3.14 (1.65-5.73) for doubling the mivacurium dose, 5.99 (2.14-15.18) for adding opioids to the intubation sequence, and 6.55 (6.01-7.74) for increasing the delay between mivacurium injection and airway insertion from 1 to 2 min in subjects aged 25 years and 2.17 (2.01-2.69) for subjects aged 70 years, p < 0.001 for all. We conclude that good conditions for tracheal intubation are more likely by delaying laryngoscopy after injecting a higher dose of mivacurium with an opioid, particularly in older people.

  6. An Original Stepwise Multilevel Logistic Regression Analysis of Discriminatory Accuracy: The Case of Neighbourhoods and Health.

    Directory of Open Access Journals (Sweden)

    Juan Merlo

    Full Text Available Many multilevel logistic regression analyses of "neighbourhood and health" focus on interpreting measures of association (e.g., odds ratio, OR). In contrast, multilevel analysis of variance is rarely considered. We propose an original stepwise analytical approach that distinguishes between "specific" (measures of association) and "general" (measures of variance) contextual effects. Performing two empirical examples we illustrate the methodology, interpret the results and discuss the implications of this kind of analysis in public health. We analyse 43,291 individuals residing in 218 neighbourhoods in the city of Malmö, Sweden in 2006. We study two individual outcomes (psychotropic drug use and choice of private vs. public general practitioner, GP) for which the relative importance of neighbourhood as a source of individual variation differs substantially. In Step 1 of the analysis, we evaluate the OR and the area under the receiver operating characteristic (AUC) curve for individual-level covariates (i.e., age, sex and individual low income). In Step 2, we assess general contextual effects using the AUC. Finally, in Step 3 the OR for a specific neighbourhood characteristic (i.e., neighbourhood income) is interpreted jointly with the proportional change in variance (i.e., PCV) and the proportion of ORs in the opposite direction (POOR) statistics. For both outcomes, information on individual characteristics (Step 1) provides a low discriminatory accuracy (AUC = 0.616 for psychotropic drugs; AUC = 0.600 for choosing a private GP). Accounting for neighbourhood of residence (Step 2) only improved the AUC for choosing a private GP (+0.295 units). High neighbourhood income (Step 3) was strongly associated with choosing a private GP (OR = 3.50) but the PCV was only 11% and the POOR 33%. Applying an innovative stepwise multilevel analysis, we observed that, in Malmö, the neighbourhood context per se had a negligible influence on individual use of psychotropic drugs, but

  7. Regional Flood Frequency Analysis using Support Vector Regression under historical and future climate

    Science.gov (United States)

    Gizaw, Mesgana Seyoum; Gan, Thian Yew

    2016-07-01

    Regional Flood Frequency Analysis (RFFA) is a statistical method widely used to estimate flood quantiles of catchments with limited streamflow data. In addition, when estimating the flood quantiles of ungauged sites, only a limited number of stations with complete datasets may be available from hydrologically similar, surrounding catchments. Besides traditional regression based RFFA methods, recent applications of machine learning algorithms such as the artificial neural network (ANN) have shown encouraging results in regional flood quantile estimations. Another novel machine learning technique that is becoming widely applicable in the hydrologic community is the Support Vector Regression (SVR). In this study, an RFFA model based on SVR was developed to estimate regional flood quantiles for two study areas, one with 26 catchments located in southeastern British Columbia (BC) and another with 23 catchments located in southern Ontario (ON), Canada. The SVR-RFFA model for both study sites was developed from 13 sets of physiographic and climatic predictors for the historical period. The Ef (Nash Sutcliffe coefficient) and R² of the SVR-RFFA model were about 0.7 when estimating flood quantiles of 10, 25, 50 and 100 year return periods, which indicates satisfactory model performance in both study areas. In addition, the SVR-RFFA model also performed well based on other goodness-of-fit statistics such as BIAS (mean bias) and BIASr (relative BIAS). If the amount of data available for training RFFA models is limited, the SVR-RFFA model was found to perform better than an ANN based RFFA model, and with significantly lower median CV (coefficient of variation) of the estimated flood quantiles. The SVR-RFFA model was then used to project changes in flood quantiles over the two study areas under the impact of climate change using the RCP4.5 and RCP8.5 climate projections of five Coupled Model Intercomparison Project (CMIP5) GCMs (Global Climate Models) for the 2041

  8. Modeling and regression analysis of semiochemical dose-response curves of insect antennal reception and behavior.

    Science.gov (United States)

    Byers, John A

    2013-08-01

    Dose-response curves of the effects of semiochemicals on neurophysiology and behavior are reported in many articles in insect chemical ecology. Most curves are shown in figures representing points connected by straight lines, in which the x-axis has order of magnitude increases in dosage vs. responses on the y-axis. The lack of regression curves indicates that the nature of the dose-response relationship is not well understood. Thus, a computer model was developed to simulate a flux of various numbers of pheromone molecules (10³ to 5 × 10⁶) passing by 10⁴ receptors distributed among 10⁶ positions along an insect antenna. Each receptor was depolarized by at least one strike by a molecule, and subsequent strikes had no additional effect. The simulations showed that with an increase in pheromone release rate, the antennal response would increase in a convex fashion and not in a logarithmic relation as suggested previously. Non-linear regression showed that a family of kinetic formation functions fit the simulated data nearly perfectly (R² > 0.999). This is reasonable because olfactory receptors have proteins that bind to the pheromone molecule and are expected to exhibit enzyme kinetics. Over 90 dose-response relationships reported in the literature of electroantennographic and behavioral bioassays in the laboratory and field were analyzed by the logarithmic and kinetic formation functions. This analysis showed that in 95% of the cases, the kinetic functions explained the relationships better than the logarithmic (mean of about 20% better). The kinetic curves become sigmoid when graphed on a log scale on the x-axis. Dose-catch relationships in the field are similar to dose-EAR (effective attraction radius, in which a spherical radius indicates the trapping effect of a lure) and the circular EARc in two dimensions used in mass trapping models. The use of kinetic formation functions for dose-response curves of attractants, and kinetic decay curves for
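
    The hit-once rule described in this abstract (a receptor responds to its first strike and ignores later ones) already implies a saturating dose-response curve. The sketch below is not the author's simulation: it assumes each molecule lands on one uniformly random position, independently of the others, and computes the expected number of receptors struck at least once. The function name is hypothetical; the receptor, position, and molecule counts are taken from the abstract.

```python
def expected_hit_receptors(molecules, receptors=10**4, positions=10**6):
    """Expected number of receptors struck at least once, assuming every
    molecule lands on one uniformly random position (an assumption of this
    sketch, not a detail given in the abstract)."""
    p_missed = (1.0 - 1.0 / positions) ** molecules  # chance a given position is never hit
    return receptors * (1.0 - p_missed)

doses = [10**3, 10**4, 10**5, 10**6, 5 * 10**6]
responses = [expected_hit_receptors(n) for n in doses]
for n, r in zip(doses, responses):
    print(f"{n:>9d} molecules -> {r:8.1f} receptors depolarized")
```

    Under this assumption the response grows almost linearly at low doses and levels off as the receptors are used up, which is the general shape a saturating kinetic function describes.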

  9. Duloxetine compared with fluoxetine and venlafaxine: use of meta-regression analysis for indirect comparisons

    Directory of Open Access Journals (Sweden)

    Lançon Christophe

    2006-07-01

    Full Text Available Background: Data comparing duloxetine with existing antidepressant treatments are limited. A comparison of duloxetine with fluoxetine has been performed but no comparison with venlafaxine, the other antidepressant in the same therapeutic class with a significant market share, has been undertaken. In the absence of relevant data to assess the place that duloxetine should occupy in the therapeutic arsenal, indirect comparisons are the most rigorous way to go. We conducted a systematic review of the efficacy of duloxetine, fluoxetine and venlafaxine versus placebo in the treatment of Major Depressive Disorder (MDD), and performed indirect comparisons through meta-regressions. Methods: The bibliography of the Agency for Health Care Policy and Research and the CENTRAL, Medline, and Embase databases were interrogated using advanced search strategies based on a combination of text and index terms. The search focused on randomized placebo-controlled clinical trials involving adult patients treated for acute phase Major Depressive Disorder. All outcomes were derived to take account of varying placebo responses across studies. Primary outcome was treatment efficacy as measured by Hedges' g effect size. Secondary outcomes were response and dropout rates as measured by log odds ratios. Meta-regressions were run to indirectly compare the drugs. Sensitivity analyses assessing the influence of individual studies on the results, and the influence of patients' characteristics, were run. Results: 22 studies involving fluoxetine, 9 involving duloxetine and 8 involving venlafaxine were selected. Using indirect comparison methodology, estimated effect sizes for efficacy compared with duloxetine were 0.11 [-0.14;0.36] for fluoxetine and 0.22 [0.06;0.38] for venlafaxine. Response log odds ratios were -0.21 [-0.44;0.03], 0.70 [0.26;1.14]. Dropout log odds ratios were -0.02 [-0.33;0.29], 0.21 [-0.13;0.55]. Sensitivity analyses showed that results were

  10. Facilitating Analysis of Multiple Partial Data Streams

    Science.gov (United States)

    Maimone, Mark W.; Liebersbach, Robert R.

    2008-01-01

    Robotic Operations Automation: Mechanisms, Imaging, Navigation report Generation (ROAMING) is a set of computer programs that facilitates and accelerates both tactical and strategic analysis of time-sampled data, especially the disparate and often incomplete streams of Mars Exploration Rover (MER) telemetry data described in the immediately preceding article. As used here, tactical refers to the activities over a relatively short time (one Martian day in the original MER application) and strategic refers to a longer time (the entire multi-year MER missions in the original application). Prior to installation, ROAMING must be configured with the types of data of interest, and parsers must be modified to understand the format of the input data (many example parsers are provided, including for general CSV files). Thereafter, new data from multiple disparate sources are automatically resampled into a single common annotated spreadsheet stored in a readable space-separated format, and these data can be processed or plotted at any time scale. Such processing or plotting makes it possible to study not only the details of a particular activity spanning only a few seconds, but also longer-term trends. ROAMING makes it possible to generate mission-wide plots of multiple engineering quantities [e.g., vehicle tilt as in Figure 1(a), motor current, numbers of images] that heretofore could be found only in thousands of separate files. ROAMING also supports automatic annotation of both images and graphs. In the MER application, labels given to terrain features by rover scientists and engineers are automatically plotted in all received images based on their associated camera models (see Figure 2), times measured in seconds are mapped to Mars local time, and command names or arbitrary time-labeled events can be used to label engineering plots, as in Figure 1(b).

  11. Variables that influence HIV-1 cerebrospinal fluid viral load in cryptococcal meningitis: a linear regression analysis

    Directory of Open Access Journals (Sweden)

    Cecchini Diego M

    2009-11-01

    Full Text Available Background: The central nervous system is considered a sanctuary site for HIV-1 replication. Variables associated with HIV cerebrospinal fluid (CSF) viral load in the context of opportunistic CNS infections are poorly understood. Our objective was to evaluate the relation between: (1) CSF HIV-1 viral load and CSF cytological and biochemical characteristics (leukocyte count, protein concentration, cryptococcal antigen titer); (2) CSF HIV-1 viral load and HIV-1 plasma viral load; and (3) CSF leukocyte count and the peripheral blood CD4+ T lymphocyte count. Methods: We prospectively collected and analysed pre-treatment, paired CSF and plasma samples from antiretroviral-naive HIV-positive patients with cryptococcal meningitis treated at the Francisco J Muñiz Hospital, Buenos Aires, Argentina (period: 2004 to 2006). We measured HIV CSF and plasma levels by polymerase chain reaction using the Cobas Amplicor HIV-1 Monitor Test version 1.5 (Roche). Data were processed with Statistix 7.0 software (linear regression analysis). Results: Samples from 34 patients were analyzed. CSF leukocyte count showed a statistically significant correlation with CSF HIV-1 viral load (r = 0.4, 95% CI = 0.13-0.63, p = 0.01). No correlation was found with the plasma viral load, CSF protein concentration or cryptococcal antigen titer. A positive correlation was found between peripheral blood CD4+ T lymphocyte count and the CSF leukocyte count (r = 0.44, 95% CI = 0.125-0.674, p = 0.0123). Conclusion: Our study suggests that CSF leukocyte count influences CSF HIV-1 viral load in patients with meningitis caused by Cryptococcus neoformans.
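
    The abstract reports correlation coefficients with 95% confidence intervals but does not say how the intervals were computed. One common approach is the Fisher z-transformation, sketched below as an assumption rather than as the authors' method. Applied to r = 0.44 with n = 34 (assuming all 34 patients contributed to that correlation), it gives an interval close to, though not identical with, the reported one.

```python
import math

def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% confidence interval for a Pearson correlation r
    from n pairs, using the Fisher z-transformation."""
    z = math.atanh(r)              # transform r to an approximately normal scale
    se = 1.0 / math.sqrt(n - 3)    # standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

lo, hi = fisher_ci(0.44, 34)
print(f"r = 0.44, n = 34: approx. 95% CI = ({lo:.3f}, {hi:.3f})")
```

    Small differences from the published bounds could come from rounding of r or from a different effective sample size; the abstract does not give enough detail to tell.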

  12. The Analysis of Internet Addiction Scale Using Multivariate Adaptive Regression Splines

    Directory of Open Access Journals (Sweden)

    M Kayri

    2010-12-01

    Full Text Available Background: Determining the real effects on Internet dependency with an unbiased and robust statistical method is crucial. MARS is a new non-parametric method used in the literature for parameter estimation in cause-and-effect research. MARS can both obtain legible model curves and make unbiased parametric predictions. Methods: In order to examine the performance of MARS, MARS findings are compared to Classification and Regression Tree (C&RT) findings, which are considered in the literature to be efficient in revealing correlations between variables. The data set for the study is taken from "The Internet Addiction Scale" (IAS), which attempts to reveal addiction levels of individuals. The population of the study consists of 754 secondary school students (301 female, 443 male students with 10 missing data). The MARS 2.0 trial version was used for the MARS analysis, and the C&RT analysis was done with SPSS. Results: MARS obtained six basis functions of the model. As a common result of these six functions, the regression equation of the model was found. For the predicted variable, MARS showed that average daily Internet-use time, the purpose of Internet use, grade of students and occupations of mothers had a significant effect (P < 0.05). In this comparative study, MARS obtained different findings from C&RT in dependency level prediction. Conclusion: This study observed that MARS revealed the extent to which a variable considered significant changes the character of the model.

  13. Predictive model of biliocystic communication in liver hydatid cysts using classification and regression tree analysis

    Directory of Open Access Journals (Sweden)

    Souadka Amine

    2010-04-01

    Full Text Available Abstract Background: The incidence of liver hydatid cyst (LHC) rupture ranges from 15% to 40% of all cases, and most ruptures involve the bile duct tree. Patients with biliocystic communication (BCC) have specific clinical and therapeutic aspects. The purpose of this study was to determine which patients with LHC may develop BCC using classification and regression tree (CART) analysis. Methods: A retrospective study of 672 patients with liver hydatid cyst treated at the surgery department "A" at Ibn Sina University Hospital, Rabat, Morocco. Fourteen risk factors for BCC occurrence were entered into CART analysis to build an algorithm that best predicts the occurrence of BCC. Results: The incidence of BCC was 24.5%. High-risk subgroups were patients with jaundice and thick pericyst (risk 73.2%) and patients with thick pericyst, no jaundice, aged 36.5 years or younger, and no past history of LHC (risk 40.5%). Our CART model has a sensitivity of 39.6%, specificity of 93.3%, positive predictive value of 65.6%, negative predictive value of 82.6% and classification accuracy of 80.1%. The discriminating ability of the model was good (82%). Conclusion: We developed a simple classification tool to identify LHC patients at high risk of BCC during a routine clinic visit (based only on clinical history and examination followed by an ultrasonography). Predictive factors were based on pericyst aspect, jaundice, age, past history of liver hydatidosis and morphological Gharbi cyst aspect. We think that this classification can be useful for directing patients efficiently to appropriate medical structures.

  14. Verifying the performance of artificial neural network and multiple linear regression in predicting the mean seasonal municipal solid waste generation rate: A case study of Fars province, Iran.

    Science.gov (United States)

    Azadi, Sama; Karimi-Jashni, Ayoub

    2016-02-01

    Predicting the mass of solid waste generation plays an important role in integrated solid waste management plans. In this study, the performance of two predictive models, Artificial Neural Network (ANN) and Multiple Linear Regression (MLR) was verified to predict mean Seasonal Municipal Solid Waste Generation (SMSWG) rate. The accuracy of the proposed models is illustrated through a case study of 20 cities located in Fars Province, Iran. Four performance measures, MAE, MAPE, RMSE and R were used to evaluate the performance of these models. The MLR, as a conventional model, showed poor prediction performance. On the other hand, the results indicated that the ANN model, as a non-linear model, has a higher predictive accuracy when it comes to prediction of the mean SMSWG rate. As a result, in order to develop a more cost-effective strategy for waste management in the future, the ANN model could be used to predict the mean SMSWG rate.
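
    The four performance measures named above can be computed as in the following minimal sketch. It assumes that R denotes Pearson's correlation coefficient between observed and predicted values and that MAPE is expressed in percent; the function name and the sample numbers are hypothetical and are not taken from the study.

```python
import numpy as np

def performance_measures(observed, predicted):
    """Return MAE, MAPE (percent), RMSE and Pearson's R for two equal-length series."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    errors = predicted - observed
    mae = np.mean(np.abs(errors))
    # MAPE divides by the observed values, so none of them may be zero.
    mape = 100.0 * np.mean(np.abs(errors / observed))
    rmse = np.sqrt(np.mean(errors ** 2))
    r = np.corrcoef(observed, predicted)[0, 1]
    return {"MAE": mae, "MAPE": mape, "RMSE": rmse, "R": r}

# Hypothetical values for illustration only (not data from the study).
observed = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
measures = performance_measures(observed, predicted)
```

    Lower MAE, MAPE and RMSE and a higher R indicate better agreement, which is the sense in which two models such as ANN and MLR would be compared.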

  15. An improved approach for measuring the impact of multiple CO2 conductances on the apparent photorespiratory CO2 compensation point through slope-intercept regression.

    Science.gov (United States)

    Walker, Berkley J; Skabelund, Dane C; Busch, Florian A; Ort, Donald R

    2016-06-01

    Biochemical models of leaf photosynthesis, which are essential for understanding the impact of photosynthesis to changing environments, depend on accurate parameterizations. One such parameter, the photorespiratory CO2 compensation point can be measured from the intersection of several CO2 response curves measured under sub-saturating illumination. However, determining the actual intersection while accounting for experimental noise can be challenging. Additionally, leaf photosynthesis model outcomes are sensitive to the diffusion paths of CO2 released from the mitochondria. This diffusion path of CO2 includes both chloroplastic as well as cell wall resistances to CO2 , which are not readily measurable. Both the difficulties of determining the photorespiratory CO2 compensation point and the impact of multiple intercellular resistances to CO2 can be addressed through application of slope-intercept regression. This technical report summarizes an improved framework for implementing slope-intercept regression to evaluate measurements of the photorespiratory CO2 compensation point. This approach extends past work to include the cases of both Rubisco and Ribulose-1,5-bisphosphate (RuBP)-limited photosynthesis. This report further presents two interactive graphical applications and a spreadsheet-based tool to allow users to apply slope-intercept theory to their data.
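
    The geometric idea behind slope-intercept regression can be sketched as follows. If straight lines y = m*x + b fitted to several response curves all pass through one common point (x0, y0), then b = y0 - x0*m, so regressing the intercepts on the slopes recovers the common point without locating the intersection graphically. The sketch covers only this basic step and not the report's treatment of CO2 conductances; the function name and the synthetic numbers are assumptions made for illustration.

```python
import numpy as np

def common_intersection(slopes, intercepts):
    """Estimate the point (x0, y0) shared by lines y = m*x + b.

    Lines through a common point satisfy b = y0 - x0*m, so a straight-line
    fit of intercepts against slopes has slope -x0 and intercept y0.
    """
    fit_slope, fit_intercept = np.polyfit(slopes, intercepts, 1)
    return -fit_slope, fit_intercept

# Synthetic, noise-free lines through (x0, y0) = (4.0, -1.0); illustrative numbers only.
slopes = np.array([0.05, 0.10, 0.15, 0.20])
intercepts = -1.0 - 4.0 * slopes
x0, y0 = common_intersection(slopes, intercepts)
```

    With noisy measurements the same fit gives a least-squares estimate of the common point, which is why the approach helps when the curves do not intersect cleanly.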

  16. Regression models for air pollution and daily mortality: analysis of data from Birmingham, Alabama

    Energy Technology Data Exchange (ETDEWEB)

    Smith, R.L. [University of North Carolina, Chapel Hill, NC (United States). Dept. of Statistics; Davis, J.M. [North Carolina State University, Raleigh, NC (United States). Dept. of Marine, Earth and Atmospheric Sciences; Sacks, J. [National Institute of Statistical Sciences, Research Triangle Park, NC (United States); Speckman, P. [University of Missouri, Columbia, MO (United States). Dept. of Statistics; Styer, P.

    2000-11-01

    In recent years, a very large literature has built up on the human health effects of air pollution. Many studies have been based on time series analyses in which daily mortality counts, or some other measure such as hospital admissions, have been decomposed through regression analysis into contributions based on long-term trend and seasonality, meteorological effects, and air pollution. There has been a particular focus on particulate air pollution represented by PM10 (particulate matter of aerodynamic diameter 10 μm or less), though in recent years more attention has been given to very small particles of diameter 2.5 μm or less. Most of the existing data studies, however, are based on PM10 because of the wide availability of monitoring data for this variable. The persistence of the resulting effects across many different studies is widely cited as evidence that this is not mere statistical association, but indeed establishes a causal relationship. These studies have been cited by the United States Environmental Protection Agency (USEPA) as justification for a tightening on particulate matter standards in the 1997 revision of the National Ambient Air Quality Standard (NAAQS), which is the basis for air pollution regulation in the United States. The purpose of the present paper is to propose a systematic approach to the regression analyses that are central to this kind of research. We argue that the results may depend on a number of ad hoc features of the analysis, including which meteorological variables to adjust for, and the manner in which different lagged values of particulate matter are combined into a single 'exposure measure'. We also examine the question of whether the effects are linear or nonlinear, with particular attention to the possibility of a 'threshold effect', i.e. that significant effects occur only above some threshold. These points are illustrated with a data set from Birmingham, Alabama, first cited by

  17. Multiple regression and inverse moments improve the characterization of the spatial scaling behavior of daily streamflows in the Southeast United States

    Science.gov (United States)

    Farmer, William H.; Over, Thomas M.; Vogel, Richard M.

    2015-01-01

    Understanding the spatial structure of daily streamflow is essential for managing freshwater resources, especially in poorly-gaged regions. Spatial scaling assumptions are common in flood frequency prediction (e.g., index-flood method) and the prediction of continuous streamflow at ungaged sites (e.g. drainage-area ratio), with simple scaling by drainage area being the most common assumption. In this study, scaling analyses of daily streamflow from 173 streamgages in the southeastern US resulted in three important findings. First, the use of only positive integer moment orders, as has been done in most previous studies, captures only the probabilistic and spatial scaling behavior of flows above an exceedance probability near the median; negative moment orders (inverse moments) are needed for lower streamflows. Second, assessing scaling by using drainage area alone is shown to result in a high degree of omitted-variable bias, masking the true spatial scaling behavior. Multiple regression is shown to mitigate this bias, controlling for regional heterogeneity of basin attributes, especially those correlated with drainage area. Previous univariate scaling analyses have neglected the scaling of low-flow events and may have produced biased estimates of the spatial scaling exponent. Third, the multiple regression results show that mean flows scale with an exponent of one, low flows scale with spatial scaling exponents greater than one, and high flows scale with exponents less than one. The relationship between scaling exponents and exceedance probabilities may be a fundamental signature of regional streamflow. This signature may improve our understanding of the physical processes generating streamflow at different exceedance probabilities. 
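
    The omitted-variable bias described above can be illustrated with a small synthetic sketch. It assumes power-law scaling, so that the logarithm of a flow statistic is linear in the logarithm of drainage area and of a second basin attribute correlated with area. All variable names, exponents and noise levels are hypothetical and were chosen only to make the effect visible; none come from the study.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
log_area = rng.normal(5.0, 1.0, n)
# A second basin attribute correlated with drainage area (hypothetical).
log_attr = 0.5 * log_area + rng.normal(0.0, 0.5, n)
# Synthetic flow statistic: true area exponent 1.0, attribute exponent 0.8.
log_flow = 0.2 + 1.0 * log_area + 0.8 * log_attr + rng.normal(0.0, 0.1, n)

def ols(predictors, response):
    """Ordinary least squares with an intercept; returns [intercept, coefficients...]."""
    design = np.column_stack([np.ones(len(response))] + list(predictors))
    coefficients, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coefficients

# Scaling exponent from drainage area alone: it absorbs the omitted attribute.
theta_univariate = ols([log_area], log_flow)[1]
# Scaling exponent from multiple regression: it controls for the attribute.
theta_multiple = ols([log_area, log_attr], log_flow)[1]
```

    In this construction the univariate exponent lands near 1.4 (the true 1.0 plus 0.8 times the 0.5 dependence of the attribute on area), while the multiple-regression exponent stays near 1.0.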

  18. Variable precision rough set for multiple decision attribute analysis

    Institute of Scientific and Technical Information of China (English)

    Lai Kin Keung

    2008-01-01

    A variable precision rough set (VPRS) model is used to solve the multi-attribute decision analysis (MADA) problem with multiple conflicting decision attributes and multiple condition attributes. By introducing confidence measures and a β-reduct, the VPRS model can rationally solve the conflicting decision analysis problem with multiple decision attributes and multiple condition attributes. For illustration, a medical diagnosis example is utilized to show the feasibility of the VPRS model in solving the MADA...

  19. Numerical analysis of fuel regression rate distribution characteristics in hybrid rocket motors with different fuel types

    Institute of Scientific and Technical Information of China (English)

    LI XinTian; TIAN Hui; CAI GuoBiao

    2013-01-01

    This paper presents three-dimensional numerical simulations of the hybrid rocket motor with hydrogen peroxide (HP) and hydroxyl terminated polybutadiene (HTPB) propellant combination and investigates the fuel regression rate distribution characteristics of different fuel types. The numerical models are established to couple the Navier-Stokes equations with turbulence, chemical reactions, solid fuel pyrolysis and solid-gas interfacial boundary conditions. Simulation results including the temperature contours and fuel regression rate distributions are presented for the tube, star and wagon wheel grains. The results demonstrate that the changing trends of the regression rate along the axis are similar for all kinds of fuel types, which decrease sharply near the leading edges of the fuels and then gradually increase with increasing axial locations. The regression rates of the star and wagon wheel grains show apparent three-dimensional characteristics, and they are higher in the regions of fuel surfaces near the central core oxidizer flow. The average regression rates increase as the oxidizer mass fluxes rise for all of the fuel types. However, under same oxidizer mass flux, the average regression rates of the star and wagon wheel grains are much larger than that of the tube grain due to their lower hydraulic diameters.

  20. Quantifying image distortion based on Gabor filter bank and multiple regression analysis

    Science.gov (United States)

    Ortiz-Jaramillo, B.; Garcia-Alvarez, J. C.; Führ, H.; Castellanos-Dominguez, G.; Philips, W.

    2012-01-01

    Image quality assessment is indispensable for image-based applications. The approaches towards image quality assessment fall into two main categories: subjective and objective methods. Subjective assessment has been widely used. However, careful subjective assessments are experimentally difficult and lengthy, and the results obtained may vary depending on the test conditions. On the other hand, objective image quality assessment would not only alleviate the difficulties described above but would also help to expand the application field. Therefore, several works have been developed for quantifying the distortion present in an image, achieving goodness of fit between subjective and objective scores of up to 92%. Nevertheless, current methodologies are designed assuming that the nature of the distortion is known. Generally, this is a limiting assumption for practical applications, since in a majority of cases the distortions in the image are unknown. Therefore, we believe that the current methods of image quality assessment should be adapted in order to identify and quantify the distortion of images at the same time. That combination can improve processes such as enhancement, restoration, compression, and transmission, among others. We present an approach based on the power of the experimental design and the joint localization of the Gabor filters for studying the influence of the spatial/frequencies on image quality assessment. With it, we achieve a correct identification and quantification of the distortion affecting images. This method provides accurate scores and differentiability between distortions.

  1. Study on traffic noise level of Sylhet by multiple regression analysis associated with health hazards

    Directory of Open Access Journals (Sweden)

    J. B. Alam, M. Jobair Bin Alam, M. M. Rahman, A. K. Dikshit, S. K. Khan

    Full Text Available The study reports the level of traffic-induced noise pollution in Sylhet City. For this purpose, noise levels were measured at thirty-seven major locations of the city from 7 am to 11 pm during working days. It was observed that at all the locations the level of noise remains far above the acceptable limit at all times. The noise levels on the main road near the residential area, hospital area and educational area were above the recommended level (65 dBA). It was found that the predictive equations are 60-70% correlated with the measured noise level. The study suggests that vulnerable institutions like schools and hospitals should be located about 60 m away from the roadside unless a special arrangement to alleviate sound is used.

  2. Market Value Estimation Models for Marine Surface Vessels with the Use of Multiple Regression Analysis.

    Science.gov (United States)

    1982-12-01

    which should be used when quantifying "value." Some common alternative measures may include terms such as book value, net realizable value, current...brokerage fee) [Ref. 1: p. 9-6]. Depending on whether or not there are any preparation or brokerage costs, the net realizable value may be equivalent...of petroleum-carrying vessels to the realizable value of their scrap steel. For example, the Motor Vessel EXXON FLORENCE was recently sold in Taiwan

  3. Development of QSAR Model of substituted Benzene Sulphonamide using Multiple Regression Analysis

    Directory of Open Access Journals (Sweden)

    R.G.Varma

    2014-01-01

    Full Text Available In continuation of our earlier work, in this paper we studied 50 substituted benzenesulphonamides using the substituents nonafluorobutyl sulphonyl chloride (C4F9SO2Cl) and pentafluoro benzene sulphonyl chloride (C6F5SO2Cl). Accordingly, we have developed QSAR models of the studied compounds. These models were derived using the parameters Balaban index, Balaban-type indices (Jhetz, Jhetm, Jhetv, Jhete, Jhetp), Balaban related indices (F, G), and Randic connectivity index (χ1). The most suitable model is chosen on the basis of maximum R2 (R-squared).

  4. Analysis of multiple linear regression algorithms used for respiratory mechanics monitoring during artificial ventilation.

    Science.gov (United States)

    Polak, Adam G

    2011-02-01

    Many patients undergo long-term artificial ventilation and their respiratory system mechanics should be monitored to detect changes in the patient's state and to optimize ventilator settings. In this work the most popular algorithms for tracking variations of respiratory resistance (R(rs)) and elastance (E(rs)) over a ventilatory cycle were analysed in terms of systematic and random errors. Additionally, a new approach was proposed and compared to the previous ones. It takes into account an exact description of flow integration by volume-dependent lung compliance. The results of the analyses showed advantages of this new approach and led to several suggestions. Algorithms including R(rs) and E(rs) dependencies on airflow and lung volume can be effectively applied only at low levels of noise present in measurement data; otherwise the use of the simplest model with constant parameters is preferable. Additionally, one should avoid including the resistance dependence on airflow alone, since this considerably degrades the retrieved trace of R(rs). Finally, the estimated cyclic trajectories of R(rs) and E(rs) are more sensitive to noise present in the pressure signal than in the flow signal, and the elastance traces are estimated more accurately than the resistance ones.
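
    A minimal sketch of the constant-parameter case follows. It assumes that "the simplest model" is the usual first-order equation of motion, P = R*flow + E*volume + P0, fitted by multiple linear regression; the abstract does not spell out the equation, so this form, the function name and all numerical values are assumptions used only for illustration.

```python
import numpy as np

def fit_constant_rs(pressure, flow, volume):
    """Least-squares fit of P = R*flow + E*volume + P0; returns (R, E, P0)."""
    design = np.column_stack([flow, volume, np.ones_like(flow)])
    (resistance, elastance, p0), *_ = np.linalg.lstsq(design, pressure, rcond=None)
    return resistance, elastance, p0

# Synthetic inspiration with hypothetical values (R = 10, E = 25, P0 = 5), not patient data.
t = np.linspace(0.0, 1.0, 200)
flow = 0.5 * np.sin(np.pi * t)                      # airflow
volume = 0.5 * (1.0 - np.cos(np.pi * t)) / np.pi    # time integral of the airflow
pressure = 10.0 * flow + 25.0 * volume + 5.0        # noise-free airway pressure
R_hat, E_hat, P0_hat = fit_constant_rs(pressure, flow, volume)
```

    Adding noise to `pressure` or `flow` before fitting is a simple way to explore the sensitivity to measurement noise that the abstract discusses.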

  5. Rolling Regressions with Stata

    OpenAIRE

    Kit Baum

    2004-01-01

    This talk will describe some work underway to add a "rolling regression" capability to Stata's suite of time series features. Although commands such as "statsby" permit analysis of non-overlapping subsamples in the time domain, they are not suited to the analysis of overlapping (e.g. "moving window") samples. Both moving-window and widening-window techniques are often used to judge the stability of time series regression relationships. We will present an implementation of a rolling regression...
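
    The difference between moving-window and widening-window estimation can be sketched in a few lines. This is a Python illustration of the idea, not the Stata implementation the talk describes; the function name and the synthetic series are hypothetical.

```python
import numpy as np

def window_slopes(x, y, width, widening=False):
    """OLS slope of y on x for each window ending at index width-1, ..., n-1.

    widening=False: moving window of fixed length `width`.
    widening=True: the window always starts at index 0 and grows.
    """
    slopes = []
    for end in range(width, len(x) + 1):
        start = 0 if widening else end - width
        slope, _intercept = np.polyfit(x[start:end], y[start:end], 1)
        slopes.append(slope)
    return np.array(slopes)

# Synthetic series whose slope changes from 1 to 3 halfway through (illustrative only).
x = np.arange(40, dtype=float)
y = np.where(x < 20, 1.0 * x, 20.0 + 3.0 * (x - 20))
moving = window_slopes(x, y, width=10)
widening = window_slopes(x, y, width=10, widening=True)
```

    The moving-window slopes switch fully to the new regime once the window has passed the break, whereas the widening-window slopes drift only gradually, which is why both are used to judge the stability of a regression relationship.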

  6. Malignant lymphatic and hematopoietic neoplasms mortality in Serbia, 1991-2010: a joinpoint regression analysis.

    Directory of Open Access Journals (Sweden)

    Milena Ilic

    Full Text Available BACKGROUND: Limited data on mortality from malignant lymphatic and hematopoietic neoplasms have been published for Serbia. METHODS: The study covered the population of Serbia during the 1991-2010 period. Mortality trends were assessed using joinpoint regression analysis. RESULTS: The trend for overall death rates from malignant lymphoid and haematopoietic neoplasms decreased significantly by 2.16% per year from 1991 through 1998, and then increased significantly by 2.20% per year over the 1998-2010 period. The growth during the entire period was on average +0.8% per year (95% CI 0.3 to 1.3). Mortality was higher among males than among females in all age groups. According to the comparability test, mortality trends from malignant lymphoid and haematopoietic neoplasms in men and women were parallel (final selected model failed to reject parallelism, P = 0.232). Among the younger Serbian population (0-44 years old) of both sexes, trends declined significantly in males over the entire period, while in females 15-44 years of age mortality rates declined significantly only from 2003 onwards. The mortality trend increased significantly in the elderly of both genders (by +1.7% in males and +1.5% in females in the 60-69 age group, and +3.8% in males and +3.6% in females in the 70+ age group). According to the comparability test, the mortality trend for Hodgkin's lymphoma differed significantly from mortality trends for all other types of malignant lymphoid and haematopoietic neoplasms (P<0.05). CONCLUSION: The unfavourable mortality trend in Serbia requires targeted intervention for risk factor control, early diagnosis and modern therapy.

  7. Regression analysis of time trends in perinatal mortality in Germany 1980-1993.

    Science.gov (United States)

    Scherb, H; Weigelt, E; Brüske-Hohlfeld, I

    2000-02-01

    Numerous investigations have been carried out on the possible impact of the Chernobyl accident on the prevalence of anomalies at birth and on perinatal mortality. In many cases the studies were aimed at the detection of differences of pregnancy outcome measurements between regions or time periods. Most authors conclude that there is no evidence of a detrimental physical effect on congenital anomalies or other outcomes of pregnancy following the accident. In this paper, we report on statistical analyses of time trends of perinatal mortality in Germany. Our main intention is to investigate whether perinatal mortality, as reflected in official records, was increased in 1987 as a possible effect of the Chernobyl accident. We show that, in Germany as a whole, there was a significantly elevated perinatal mortality proportion in 1987 as compared to the trend function. The increase is 4.8% (p = 0.0046) of the expected perinatal death proportion for 1987. Even more pronounced levels of 8.2% (p = 0.0458) and 8.5% (p = 0.0702) may be found in the higher contaminated areas of the former German Democratic Republic (GDR), including West Berlin, and of Bavaria, respectively. To investigate the impact of statistical models on results, we applied three standard regression techniques. The observed significant increase in 1987 is independent of the statistical model used. Stillbirth proportions show essentially the same behavior as perinatal death proportions, but the results for all of Germany are nonsignificant due to the smaller numbers involved. Analysis of the association of stillbirth proportions with the 137Cs deposition on a district level in Bavaria discloses a significant relationship. Our results are in contrast to those of many analyses of the health consequences of the Chernobyl accident and contradict the present radiobiologic knowledge. As we are dealing with highly aggregated data, other causes or artifacts may explain the observed effects. Hence, the findings

  8. Expert Involvement Predicts mHealth App Downloads: Multivariate Regression Analysis of Urology Apps

    Science.gov (United States)

    Osório, Luís; Cavadas, Vitor; Fraga, Avelino; Carrasquinho, Eduardo; Cardoso de Oliveira, Eduardo; Castelo-Branco, Miguel; Roobol, Monique J

    2016-01-01

    Background Urological mobile medical (mHealth) apps are gaining popularity with both clinicians and patients. mHealth is a rapidly evolving and heterogeneous field, with some urology apps being downloaded over 10,000 times and others not at all. The factors that contribute to medical app downloads have yet to be identified, including the hypothetical influence of expert involvement in app development. Objective The objective of our study was to identify predictors of the number of urology app downloads. Methods We reviewed urology apps available in the Google Play Store and collected publicly available data. Multivariate ordinal logistic regression evaluated the effect of publicly available app variables on the number of apps being downloaded. Results Of 129 urology apps eligible for study, only 2 (1.6%) had >10,000 downloads, with half having ≤100 downloads and 4 (3.1%) having none at all. Apps developed with expert urologist involvement (P=.003), optional in-app purchases (P=.01), higher user rating (P<.001), and more user reviews (P<.001) were more likely to be installed. App cost was inversely related to the number of downloads (P<.001). Only data from the Google Play Store and the developers’ websites, but not other platforms, were publicly available for analysis, and the level and nature of expert involvement was not documented. Conclusions The explicit participation of urologists in app development is likely to enhance its chances to have a higher number of downloads. This finding should help in the design of better apps and further promote urologist involvement in mHealth. Official certification processes are required to ensure app quality and user safety. PMID:27421338

  9. Principal component regression and linear mixed model in association analysis of structured samples: competitors or complements?

    Science.gov (United States)

    Zhang, Yiwei; Pan, Wei

    2015-03-01

    Genome-wide association studies (GWAS) have been established as a major tool to identify genetic variants associated with complex traits, such as common diseases. However, GWAS may suffer from false positives and false negatives due to confounding population structures, including known or unknown relatedness. Another important issue is unmeasured environmental risk factors. Among many methods for adjusting for population structures, two approaches stand out: one is principal component regression (PCR) based on principal component analysis, which is perhaps the most popular due to its early appearance, simplicity, and general effectiveness; the other is based on a linear mixed model (LMM) that has emerged recently as perhaps the most flexible and effective, especially for samples with complex structures as in model organisms. As shown previously, the PCR approach can be regarded as an approximation to an LMM; such an approximation depends on the number of the top principal components (PCs) used, the choice of which is often difficult in practice. Hence, in the presence of population structure, the LMM appears to outperform the PCR method. However, due to the different treatments of fixed vs. random effects in the two approaches, we show an advantage of PCR over LMM: in the presence of an unknown but spatially confined environmental confounder (e.g., environmental pollution or lifestyle), the PCs may be able to implicitly and effectively adjust for the confounder whereas the LMM cannot. Accordingly, to adjust for both population structures and nongenetic confounders, we propose a hybrid method combining the use and, thus, strengths of PCR and LMM. We use real genotype data and simulated phenotypes to confirm the above points, and establish the superior performance of the hybrid method across all scenarios.

  10. Determining the Relationship between U.S. County-Level Adult Obesity Rate and Multiple Risk Factors by PLS Regression and SVM Modeling Approaches

    Directory of Open Access Journals (Sweden)

    Chau-Kuang Chen

    2015-02-01

    Full Text Available Data from the Center for Disease Control (CDC) have shown that the obesity rate doubled among adults within the past two decades. This upsurge was the result of changes in human behavior and environment. Partial least squares (PLS) regression and support vector machine (SVM) models were used to determine the relationship between U.S. county-level adult obesity rate and multiple risk factors. The outcome variable was the adult obesity rate. The 23 risk factors were categorized into four domains of the social ecological model: biological/behavioral factor, socioeconomic status, food environment, and physical environment. Of the 23 risk factors related to adult obesity, the top eight significant risk factors with high normalized importance were identified, including physical inactivity, natural amenity, percent of households receiving SNAP benefits, and percent of all restaurants being fast food. The study results were consistent with those in the literature. The study showed that adult obesity rate was influenced by biological/behavioral factor, socioeconomic status, food environment, and physical environment embedded in the social ecological theory. Analyzing multiple risk factors of obesity in the communities may lead to the proposal of more comprehensive and integrated policies and intervention programs to solve the population-based problem.

  11. Diet influenced tooth erosion prevalence in children and adolescents: Results of a meta-analysis and meta-regression

    NARCIS (Netherlands)

    Salas, M.M.; Nascimento, G.G.; Vargas-Ferreira, F.; Tarquinio, S.B.; Huysmans, M.C.D.N.J.M.; Demarco, F.F.

    2015-01-01

    OBJECTIVE: The aim of the present study was to assess the influence of diet in tooth erosion presence in children and adolescents by meta-analysis and meta-regression. DATA: Two reviewers independently performed the selection process and the quality of studies was assessed. SOURCES: Studies publishe

  12. Significant drivers of the virtual water trade evaluated with a multivariate regression analysis

    Science.gov (United States)

    Tamea, Stefania; Laio, Francesco; Ridolfi, Luca

    2014-05-01

    International trade of food is vital for the food security of many countries, which rely on trade to compensate for an agricultural production insufficient to feed the population. At the same time, food trade has implications for the distribution and use of water resources, because through the international trade of food commodities, countries virtually displace the water used for food production, known as "virtual water". Trade thus implies a network of virtual water fluxes from exporting to importing countries, which has been estimated to displace more than 2 billion m3 of water per year, or about 2% of the annual global precipitation above land. It is thus important to adequately identify the dynamics and the controlling factors of the virtual water trade, in that it supports and enables the world food security. Using the FAOSTAT database of international trade and the virtual water content available from the Water Footprint Network, we reconstructed 25 years (1986-2010) of virtual water fluxes. We then analyzed the dependence of exchanged fluxes on a set of major relevant factors that includes: population, gross domestic product, arable land, virtual water embedded in agricultural production and dietary consumption, and geographical distance between countries. Significant drivers have been identified by means of a multivariate regression analysis, applied separately to the export and import fluxes of each country; temporal trends are outlined and the relative importance of drivers is assessed by a commonality analysis. Results indicate that population, gross domestic product and geographical distance are the major drivers of virtual water fluxes, with a minor (but non-negligible) contribution given by the agricultural production of exporting countries. Such drivers have become relevant for an increasing number of countries throughout the years, with an increasing variance explained by the distance between countries and a decreasing role of the gross

  13. Application of SAS macro to evaluate multiplicative and additive interaction in logistic and Cox regression in clinical practice

    Institute of Scientific and Technical Information of China (English)

    聂志强; 欧艳秋; 庄建; 曲艳吉; 麦劲壮; 陈寄梅; 刘小清

    2016-01-01

    Conditional or unconditional logistic regression is commonly used in case-control studies, and the Cox proportional hazards model is commonly used for survival data, but most published analyses include only main-effect models. Generalized linear models differ from general linear models in that interaction can be either multiplicative or additive; the former has only statistical meaning, whereas the latter is more consistent with biological meaning. The authors wrote macros in SAS 9.4 that, while computing the multiplicative interaction terms of logistic and Cox regression, also compute the interaction contrast, the attributable proportion due to interaction and the synergy index, with confidence intervals obtained by the Wald, delta and profile likelihood (PL) methods, in order to evaluate additive interaction. The macros are intended as a reference for analyzing multiplicative and additive interactions in big data in clinical epidemiology and genetics.
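
    The three additive-interaction indices mentioned in this abstract are commonly defined from the relative risks (or odds/hazard ratios) of the exposure groups. The following is a minimal Python sketch of those textbook definitions, not the authors' SAS macro; the function name, the coefficient names b1, b2, b3 and the example values are assumptions made for illustration, and the confidence intervals (Wald, delta, profile likelihood) are not computed here.

```python
import math


def additive_interaction(b1: float, b2: float, b3: float) -> dict:
    """Additive-interaction measures for a model with terms b1*A + b2*B + b3*A*B.

    b1, b2, b3 are log odds ratios (logistic) or log hazard ratios (Cox)
    for two binary exposures A and B and their product term.
    """
    rr10 = math.exp(b1)            # exposed to A only
    rr01 = math.exp(b2)            # exposed to B only
    rr11 = math.exp(b1 + b2 + b3)  # exposed to both A and B
    reri = rr11 - rr10 - rr01 + 1.0                   # interaction contrast (RERI)
    ap = reri / rr11                                  # attributable proportion
    s = (rr11 - 1.0) / ((rr10 - 1.0) + (rr01 - 1.0))  # synergy index (undefined if the denominator is 0)
    return {"RERI": reri, "AP": ap, "S": s}


# Hypothetical coefficients: ratio 2 for A, ratio 3 for B, no multiplicative interaction.
measures = additive_interaction(math.log(2), math.log(3), 0.0)
print(measures)
```

    With these hypothetical inputs the product term is zero, yet the interaction contrast is positive, which illustrates why the absence of multiplicative interaction does not imply the absence of additive interaction.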

  14. Analysis of Product Sampling for New Product Diffusion Incorporating Multiple-Unit Ownership

    Directory of Open Access Journals (Sweden)

    Zhineng Hu

    2014-01-01

    Full Text Available Multiple-unit ownership of nondurable products is an important component of sales in many product categories. Based on the Bass model, this paper develops a new model that treats multiple-unit adoptions as a diffusion process under the influence of product sampling. The analysis aims to determine the optimal dynamic sampling effort for a firm; the results demonstrate that experience sampling can accelerate the diffusion process and that the best time to send free samples is just before the product is launched. Multiple-unit purchasing behavior can increase sales and profit for a firm, and it requires more samples to make the product better known. The local sensitivity analysis shows that increases in both the external and internal coefficients have a negative influence on the sampling level, whereas the internal influence on subsequent multiple-unit adoptions has little influence on sampling. Using logistic regression along with linear regression, the global sensitivity analysis examines the interaction of all factors; it shows that the external influence and the multiunit purchase rate are the two most important factors affecting the sampling level and the net present value of the new product, and it yields a two-stage method for determining the sampling level.

  15. Application of Robust Regression and Bootstrap in Productivity Analysis of GERD Variable in EU27

    Directory of Open Access Journals (Sweden)

    Dagmar Blatná

    2014-06-01

    Full Text Available GERD is one of the Europe 2020 headline indicators tracked within the Europe 2020 strategy; the headline target is for GERD to reach 3% within the EU by 2020. Eurostat defines “GERD” as total gross domestic expenditure on research and experimental development as a percentage of GDP. GERD depends on numerous factors of the general economic background, namely employment, innovation and research, and science and technology. The values of these indicators vary among European countries, so outliers can be anticipated in the corresponding analyses. In such a case, the classical statistical approach, the least squares method, can be highly unreliable, and robust regression methods represent an acceptable and useful tool. The aim of the present paper is to demonstrate the advantages of robust regression and the applicability of the bootstrap approach in regression based on both classical and robust methods.
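
    The contrast between least squares and a robust fit in the presence of an outlier, together with a bootstrap confidence interval, can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the abstract does not say which robust estimators the paper uses, so the Theil-Sen slope (median of pairwise slopes) is swapped in here as a simple robust estimator, the data are invented, and the pairs bootstrap with a percentile interval is only one of several bootstrap schemes.

```python
import random
from statistics import median


def ols_slope(xs, ys):
    """Least squares slope of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def theil_sen_slope(xs, ys):
    """Robust slope: median of the slopes over all pairs with distinct x."""
    slopes = [
        (ys[j] - ys[i]) / (xs[j] - xs[i])
        for i in range(len(xs))
        for j in range(i + 1, len(xs))
        if xs[j] != xs[i]
    ]
    return median(slopes)


def bootstrap_ci(xs, ys, estimator, n_boot=2000, alpha=0.05, seed=0):
    """Percentile confidence interval from a pairs (case-resampling) bootstrap."""
    rng = random.Random(seed)
    n = len(xs)
    estimates = []
    while len(estimates) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(bx)) < 2:  # degenerate resample: slope undefined, draw again
            continue
        estimates.append(estimator(bx, by))
    estimates.sort()
    lo = estimates[int((alpha / 2) * n_boot)]
    hi = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Invented data: y = 2x + 1 with one gross outlier in the last observation.
xs = [float(x) for x in range(10)]
ys = [2.0 * x + 1.0 for x in xs]
ys[-1] = 100.0

print("OLS slope:", ols_slope(xs, ys))
print("Theil-Sen slope:", theil_sen_slope(xs, ys))
print("Bootstrap CI (Theil-Sen):", bootstrap_ci(xs, ys, theil_sen_slope))
```

    On this invented data set the single outlier pulls the least squares slope far from 2, while the median-based slope is unaffected; the same bootstrap routine accepts either estimator.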

  16. Future daily PM10 concentrations prediction by combining regression models and feedforward backpropagation models with principle component analysis (PCA)

    Science.gov (United States)

    Ul-Saufie, Ahmad Zia; Yahaya, Ahmad Shukri; Ramli, Nor Azam; Rosaida, Norrimi; Hamid, Hazrul Abdul

    2013-10-01

    Future PM10 concentration prediction is very important because it can help local authorities to enact preventative measures to reduce the impact of air pollution. The aim of this study is to improve the predictions of Multiple Linear Regression (MLR) and Feedforward backpropagation (FFBP) models by combining them with principal component analysis (PCA) for predicting future (next day, next two-day and next three-day) PM10 concentration in Negeri Sembilan, Malaysia. Annual hourly observations for PM10 in Negeri Sembilan, Malaysia from January 2003 to December 2010 were selected for predicting PM10 concentration level. Eighty percent of the monitoring records were used for training and twenty percent were used for validation of the models. Three accuracy measures - Prediction Accuracy (PA), Coefficient of Determination (R2) and Index of Agreement (IA), as well as two error measures - Normalized Absolute Error (NAE) and Root Mean Square Error (RMSE) were used to evaluate the performance of the models. Results show that combining PCA with MLR and with FFBP improved on the plain MLR and FFBP models for all three prediction horizons, reducing errors by as much as 18.1% (PCA-MLR) and 17.68% (PCA-FFBP) for next day, 19.2% (PCA-MLR) and 22.1% (PCA-FFBP) for next two-day and 18.7% (PCA-MLR) and 22.79% (PCA-FFBP) for next three-day predictions. Including PCA improved the accuracy of the models by as much as 12.9% (PCA-MLR) and 13.3% (PCA-FFBP) for next day, 32.3% (PCA-MLR) and 14.7% (PCA-FFBP) for next two-day and 46.1% (PCA-MLR) and 19.3% (PCA-FFBP) for next three-day predictions.
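
    Some of the performance measures named in this abstract can be sketched in a few lines of Python. The formulas below are commonly used definitions and are an assumption here: the abstract does not state the exact formulas (for example, whether RMSE divides by N or N - 1), and the observed and predicted values are invented for illustration.

```python
import math


def rmse(obs, pred):
    """Root Mean Square Error (dividing by N)."""
    n = len(obs)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / n)


def nae(obs, pred):
    """Normalized Absolute Error: total absolute error over total observed."""
    return sum(abs(p - o) for o, p in zip(obs, pred)) / sum(obs)


def index_of_agreement(obs, pred):
    """Index of Agreement (Willmott form): 1 means perfect agreement."""
    mean_obs = sum(obs) / len(obs)
    num = sum((p - o) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - mean_obs) + abs(o - mean_obs)) ** 2 for o, p in zip(obs, pred))
    return 1.0 - num / den


# Invented PM10 values (observed vs. predicted), for illustration only.
observed = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 33.0]

print("RMSE:", rmse(observed, predicted))
print("NAE:", nae(observed, predicted))
print("IA:", index_of_agreement(observed, predicted))
```

    Lower RMSE and NAE indicate smaller errors, while an IA closer to 1 indicates better agreement between predictions and observations.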

  17. Regression Basics

    CERN Document Server

    Kahane, Leo H

    2007-01-01

    Using a friendly, nontechnical approach, the Second Edition of Regression Basics introduces readers to the fundamentals of regression. Accessible to anyone with an introductory statistics background, this book builds from a simple two-variable model to a model of greater complexity. Author Leo H. Kahane weaves four engaging examples throughout the text to illustrate not only the techniques of regression but also how this empirical tool can be applied in creative ways to consider a broad array of topics. New to the Second Edition: offers greater coverage of simple panel-data estimation:

  18. JT-60 configuration parameters for feedback control determined by regression analysis

    Science.gov (United States)

    Matsukawa, Makoto; Hosogane, Nobuyuki; Ninomiya, Hiromasa

    1991-12-01

    The stepwise regression procedure was applied to obtain measurement formulas for equilibrium parameters used in the feedback control of JT-60. This procedure automatically selects variables necessary for the measurements, and selects a set of variables which are not likely to be picked up by physical considerations. Regression equations with stable and small multicollinearity were obtained and it was experimentally confirmed that the measurement formulas obtained through this procedure were accurate enough to be applicable to the feedback control of plasma configurations in JT-60.

  19. Introduction to regression graphics

    CERN Document Server

    Cook, R Dennis

    2009-01-01

    Covers the use of dynamic and interactive computer graphics in linear regression analysis, focusing on analytical graphics. Features new techniques such as plot rotation. The authors have written their own regression code, called R-code, in the Xlisp-Stat language; it is a nearly complete system for linear regression analysis and can be used as the main computer program in a linear regression course. The accompanying disks, for both Macintosh and Windows computers, contain the R-code and Xlisp-Stat. An Instructor's Manual presenting detailed solutions to all the problems in the book is ava

  20. EFFECT OF VARIOUS INPUTS ON PADDY PRODUCTION - A COMPARISON OF ARTIFICIAL NEURAL NETWORKS AND LINEAR REGRESSION ANALYSIS

    Directory of Open Access Journals (Sweden)

    Mohammad Ali HORMOZI

    2015-06-01

    Full Text Available We analyzed the effect of chemical fertilizer, seed, biocide, farm machinery and labor hours on the production of paddy (paddy rice) in the Khuzestan province in the southwestern part of Iran. We tested two methods, linear regression and a neural network. We conclude that the results obtained by a neural network with no hidden layer and by linear regression are close to each other. We argue that for a data set of this type, regression analysis yields more reliable results than a neural network. The results suggest that machinery has a very clear positive effect on yield, while fertilizer and labor do not affect it. Increasing the amount of a "useful input" therefore does not necessarily increase paddy production.
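
    One reason a neural network with no hidden layer can give results close to linear regression is that, with an identity output activation and a squared-error loss, it is the same linear model fitted by a different procedure. The Python sketch below illustrates this under those assumptions; the abstract does not describe the network's activation, loss or training method, and the data, learning rate and number of steps are invented for illustration.

```python
def fit_ols(xs, ys):
    """Closed-form least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


def fit_linear_neuron(xs, ys, lr=0.05, steps=5000):
    """Single linear neuron (no hidden layer) trained by gradient descent on MSE."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Invented input/yield pairs, for illustration only.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]

print("OLS (slope, intercept):", fit_ols(xs, ys))
print("Linear neuron (weight, bias):", fit_linear_neuron(xs, ys))
```

    Both procedures minimize the same squared-error objective, so on this small example the trained weight and bias approach the least squares slope and intercept.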