Remaining Phosphorus Estimate Through Multiple Regression Analysis
Institute of Scientific and Technical Information of China (English)
M. E. ALVES; A. LAVORENTI
2006-01-01
The remaining phosphorus (Prem), the P concentration that remains in solution after shaking soil with 0.01 mol L-1 CaCl2 containing 60 μg mL-1 P, is a very useful index for studies related to the chemistry of variable charge soils. Although the Prem determination is a simple procedure, the possibility of estimating accurate values of this index from easily and/or routinely determined soil properties can be very useful for practical purposes. The present research evaluated the Prem estimation through multiple regression analysis in which routinely determined soil chemical data, soil clay content and soil pH measured in 1 mol L-1 NaF (pHNaF) figured as Prem predictor variables. The Prem can be estimated with acceptable accuracy using the above-mentioned approach, and pHNaF not only substitutes for clay content as a predictor variable but also confers more accuracy to the Prem estimates.
General Nature of Multicollinearity in Multiple Regression Analysis.
Liu, Richard
1981-01-01
Discusses multiple regression, a very popular statistical technique in the field of education. One of the basic assumptions in regression analysis requires that independent variables in the equation should not be highly correlated. The problem of multicollinearity and some of the solutions to it are discussed. (Author)
Multiple regression analysis of cancer incidence around nuclear plant
International Nuclear Information System (INIS)
The results of a multiple regression analysis of cancer incidence in the vicinity of a nuclear plant are presented. No dependence on radiation factors (natural background, radioactive releases, total dose of all types of medical examinations) is established. At the same time, a relationship is revealed between releases of dangerous chemical substances and general cancer incidence, the incidence of tumors of the lungs, trachea and bronchi, and the incidence of hematopoietic tissue carcinoma.
MULTIPLE REGRESSION ANALYSIS OF MAIN ECONOMIC INDICATORS IN TOURISM
Directory of Open Access Journals (Sweden)
Erika KULCSÁR
2009-12-01
This paper analyses the relationship between GDP in the hotels and restaurants sector, the dependent variable, and the following independent variables: overnight stays in tourist reception establishments, arrivals in tourist reception establishments and investments in the hotels and restaurants sector over the period 1995-2007. Using multiple regression analysis, I found that investments and tourist arrivals are significant predictors of the GDP dependent variable. Based on these results, I identified those components of the marketing mix which in my opinion require investment and which could contribute to the positive development of tourist arrivals in tourist reception establishments.
Multiple regression for physiological data analysis: the problem of multicollinearity.
Slinker, B K; Glantz, S A
1985-07-01
Multiple linear regression, in which several predictor variables are related to a response variable, is a powerful statistical tool for gaining quantitative insight into complex in vivo physiological systems. For these insights to be correct, all predictor variables must be uncorrelated. However, in many physiological experiments the predictor variables cannot be precisely controlled and thus change in parallel (i.e., they are highly correlated). There is a redundancy of information about the response, a situation called multicollinearity, that leads to numerical problems in estimating the parameters in regression equations; the parameters are often of incorrect magnitude or sign or have large standard errors. Although multicollinearity can be avoided with good experimental design, not all interesting physiological questions can be studied without encountering multicollinearity. In these cases various ad hoc procedures have been proposed to mitigate multicollinearity. Although many of these procedures are controversial, they can be helpful in applying multiple linear regression to some physiological problems.
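The redundancy described in this abstract is commonly quantified with the variance inflation factor (VIF). The following is a minimal sketch, not the authors' procedure: it assumes a model with exactly two predictors, for which the VIF reduces to 1 / (1 - r^2), where r is the Pearson correlation between the predictors. The function names and the heart-rate and cardiac-output values are hypothetical and chosen only for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors of a two-predictor model: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


# Hypothetical measurements of two predictors that change in parallel.
heart_rate = [60, 72, 85, 90, 110, 125]
cardiac_output = [4.1, 4.9, 5.6, 6.2, 7.4, 8.1]

print("r   =", pearson_r(heart_rate, cardiac_output))
print("VIF =", vif_two_predictors(heart_rate, cardiac_output))
```

A VIF of 1 means the predictors are uncorrelated. A widely quoted rule of thumb treats values above roughly 10 as a sign of serious multicollinearity, although the threshold is a convention rather than a test. With more than two predictors, each VIF would instead come from regressing that predictor on all the others.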
Estimating Commercial Property rentals in the Greek Market using Multiple regression analysis
A. Karytinos; V. Vlachostergiou
2001-01-01
The application of multiple regression analysis as a tool for modelling aspects of real estate has been the subject of several research papers. These papers mainly aim to explain commercial and residential property values and rental prices through the statistical analysis of their characteristics. The aim of this paper is to examine the possibility of using multiple regression analysis for explaining rental prices of commercial property in the Greek property market....
Business applications of multiple regression
Richardson, Ronny
2015-01-01
This second edition of Business Applications of Multiple Regression describes the use of the statistical procedure called multiple regression in business situations, including forecasting and understanding the relationships between variables. The book assumes a basic understanding of statistics but reviews correlation analysis and simple regression to prepare the reader to understand and use multiple regression. The techniques described in the book are illustrated using both Microsoft Excel and a professional statistical program. Along the way, several real-world data sets are analyzed in deta
Khalil, Mohamed H.; Shebl, Mostafa K.; Kosba, Mohamed A.; El-Sabrout, Karim; Zaki, Nesma
2016-01-01
Aim: This research was conducted to determine the parameters that most affect the hatchability of eggs from indigenous and improved local chickens. Materials and Methods: Five parameters were studied (fertility, early and late embryonic mortalities, shape index, egg weight, and egg weight loss) on four strains, namely Fayoumi, Alexandria, Matrouh, and Montazah. Multiple linear regression was performed on the studied parameters to determine which one most influences hatchability. Results: The results showed significant differences in commercial and scientific hatchability among strains. The Alexandria strain had the highest significant commercial hatchability (80.70%). Highly significant differences in hatching chick weight were observed among the studied strains. Using multiple linear regression analysis, fertility made the greatest percent contribution (71.31%) to hatchability, and the lowest percent contributions were made by shape index and egg weight loss. Conclusion: A prediction of hatchability using multiple regression analysis could be a good tool to improve hatchability percentage in chickens. PMID:27651666
Bates, Reid A.; Holton, Elwood F., III; Burnett, Michael F.
1999-01-01
A case study of learning transfer demonstrates the possible effect of influential observation on linear regression analysis. A diagnostic method that tests for violation of assumptions, multicollinearity, and individual and multiple influential observations helps determine which observation to delete to eliminate bias. (SK)
Factor analysis and multiple regression between topography and precipitation on Jeju Island, Korea
Um, Myoung-Jin; Yun, Hyeseon; Jeong, Chang-Sam; Heo, Jun-Haeng
2011-11-01
In this study, new factors that influence precipitation were extracted from geographic variables using factor analysis, which allows for an accurate estimation of orographic precipitation. Correlation analysis was also used to examine the relationship between nine topographic variables from digital elevation models (DEMs) and the precipitation on Jeju Island. In addition, a spatial analysis was performed in order to verify the validity of the regression model. From the results of the correlation analysis, it was found that all of the topographic variables had a positive correlation with the precipitation. The relations between the variables also changed in accordance with a change in the precipitation duration. However, upon examining the correlation matrix, no significant relationship between the latitude and the aspect was found. According to the factor analysis, eight topographic variables (latitude being the exception) were found to have a direct influence on the precipitation. Three factors were then extracted from the eight topographic variables. By directly comparing the multiple regression model with the factors (model 1) to the multiple regression model with the topographic variables (model 3), it was found that model 1 did not violate the limits of statistical significance and multicollinearity. As such, model 1 was considered to be appropriate for estimating the precipitation when taking into account the topography. In the study of model 1, the multiple regression model using factor analysis was found to be the best method for estimating the orographic precipitation on Jeju Island.
Multiple Regression Analysis of Aroma Components and Sensory Evaluation of Miso
Sugawara, Etsuko; SAIGA, Suguru; Kobayashi, Akio
1994-01-01
Among the several sensory characteristics used to evaluate the quality of miso (fermented bean paste), aroma is the most difficult one to assess. If the results of chemical analysis of miso aroma could be transformed into numerical terms, the evaluation of miso might become easier. Therefore, we investigated the relationship between aroma components and sensory scores of rice-miso by multiple regression analysis. Thirty-four rice-miso exhibited at the National Miso Competition were used as the samples. Each peak area of t...
Directory of Open Access Journals (Sweden)
Nop Sopipan
2013-01-01
The aim of this study was to forecast the returns for the Stock Exchange of Thailand (SET) Index by adding some explanatory variables and a stationary autoregressive process of order p (AR(p)) in the mean equation of returns. In addition, we used Principal Component Analysis (PCA) to remove possible complications caused by multicollinearity. Results showed that the multiple regression based on PCA has the best performance.
COLOR IMAGE RETRIEVAL BASED ON FEATURE FUSION THROUGH MULTIPLE LINEAR REGRESSION ANALYSIS
Directory of Open Access Journals (Sweden)
K. Seetharaman
2015-08-01
This paper proposes a novel technique based on feature fusion using multiple linear regression analysis, and the least-square estimation method is employed to estimate the parameters. The given input query image is segmented into various regions according to the structure of the image. The color and texture features are extracted on each region of the query image, and the features are fused together using the multiple linear regression model. The estimated parameters of the model, which is built on these features, are assembled into a vector called a feature vector. The Canberra distance measure is adopted to compare the feature vectors of the query and target images. The F-measure is applied to evaluate the performance of the proposed technique. The obtained results show that the proposed technique is comparable to the other existing techniques.
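The Canberra distance named in this abstract can be sketched in a few lines. This is an illustrative sketch only and is not taken from the paper: the function name and the two short feature vectors are hypothetical, and terms whose denominator is zero are skipped, which is one common convention for the 0/0 case.

```python
def canberra_distance(u, v):
    """Canberra distance: sum over i of |u_i - v_i| / (|u_i| + |v_i|).

    Terms where both components are zero are skipped (treated as 0),
    which is an assumed convention for the undefined 0/0 case.
    """
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    total = 0.0
    for a, b in zip(u, v):
        denom = abs(a) + abs(b)
        if denom > 0:
            total += abs(a - b) / denom
    return total


# Hypothetical feature vectors (e.g. estimated regression parameters).
query_features = [0.42, 1.30, -0.15, 0.00]
target_features = [0.40, 1.10, -0.20, 0.00]

print(canberra_distance(query_features, target_features))
```

Because each term is normalised by the magnitudes of its own components, the measure is sensitive to differences in small-valued features. A smaller distance indicates a closer match between query and target.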
Energy Technology Data Exchange (ETDEWEB)
Akkaya, Ali Volkan [Department of Mechanical Engineering, Yildiz Technical University, 34349 Besiktas, Istanbul (Turkey)
2009-02-15
In this paper, multiple nonlinear regression models for estimating the higher heating value (HHV) of coals are developed using proximate analysis data obtained generally from low rank coal samples on an as-received basis. In this modeling study, three main model structures are first categorized according to the number of proximate analysis parameters used as independent variables, such as moisture, ash, volatile matter and fixed carbon. Secondly, sub-model structures with different arrangements of the independent variables are considered. Each sub-model structure is analyzed with a number of model equations in order to find the best fitting model using the multiple nonlinear regression method. Based on the results of the nonlinear regression analysis, the best model for each sub-structure is determined. Among them, the models giving the highest correlation for the three main structures are selected. Although all three selected models predict HHV rather accurately, the model involving four independent variables provides the most accurate estimation of HHV. Additionally, when the chosen model with four independent variables and a literature model are tested with extra proximate analysis data, the model developed in this study gives a more accurate prediction of the HHV of coals. It can be concluded that the developed model is an effective tool for HHV estimation of low rank coals. (author)
Directory of Open Access Journals (Sweden)
Deni Memić
2015-01-01
This article aims to assess credit default prediction on the banking market in Bosnia and Herzegovina nationwide as well as in its constitutional entities (Federation of Bosnia and Herzegovina and Republika Srpska). The ability to classify companies into different predefined groups, or to find an appropriate tool that would replace human assessment in classifying companies into good and bad buckets, has been one of the main interests of risk management researchers for a long time. We investigated the possibility and accuracy of default prediction using the traditional statistical methods of logistic regression (logit) and multiple discriminant analysis (MDA) and compared their predictive abilities. The results show that the created models have high predictive ability. For logit models, some variables are more influential on the default prediction than others. Return on assets (ROA) is statistically significant in all four periods prior to default, having very high regression coefficients, or high impact on the model's ability to predict default. Similar results are obtained for MDA models. It is also found that predictive ability differs between logistic regression and multiple discriminant analysis.
Greensmith, David J
2014-01-01
Here I present an Excel-based program for the analysis of intracellular Ca transients recorded using fluorescent indicators. The program can perform all the necessary steps which convert recorded raw voltage changes into meaningful physiological information. The program performs two fundamental processes. (1) It can prepare the raw signal by several methods. (2) It can then be used to analyze the prepared data to provide information such as absolute intracellular Ca levels. Also, the rates of change of Ca can be measured using multiple, simultaneous regression analysis. I demonstrate that this program performs as well as commercially available software, but has numerous advantages, namely creating a simplified, self-contained analysis workflow.
Freund, Rudolf J; Sa, Ping
2006-01-01
The book provides complete coverage of the classical methods of statistical analysis. It is designed to give students an understanding of the purpose of statistical analyses, to allow the student to determine, at least to some degree, the correct type of statistical analyses to be performed in a given situation, and have some appreciation of what constitutes good experimental design
Rajab, Jasim M.; MatJafri, M. Z.; Lim, H. S.
2013-06-01
This study encompasses columnar ozone modelling in peninsular Malaysia. A data set of eight atmospheric parameters [air surface temperature (AST), carbon monoxide (CO), methane (CH4), water vapour (H2Ovapour), skin surface temperature (SSKT), atmosphere temperature (AT), relative humidity (RH), and mean surface pressure (MSP)], retrieved from NASA's Atmospheric Infrared Sounder (AIRS) for the entire period (2003-2008), was employed to develop models to predict the value of columnar ozone (O3) in the study area. A combined method, based on multiple regression together with principal component analysis (PCA) modelling, was used to predict columnar ozone. This combined approach was utilized to improve the prediction accuracy of columnar ozone. Separate analyses were carried out for the north east monsoon (NEM) and south west monsoon (SWM) seasons. The O3 was negatively correlated with CH4, H2Ovapour, RH, and MSP, whereas it was positively correlated with CO, AST, SSKT, and AT during both the NEM and SWM season periods. Multiple regression analysis was used to fit the columnar ozone data using the atmospheric parameters as predictors. A variable selection method based on high loadings of varimax rotated principal components was used to obtain subsets of the predictor variables to be included in the linear regression model. It was found that an increase in columnar O3 value is associated with an increase in the values of AST, SSKT, AT, and CO and with a drop in the levels of CH4, H2Ovapour, RH, and MSP. Fitting the best models for the columnar O3 value using eight of the independent variables gave about the same values of R (≈0.93) and R2 (≈0.86) for both the NEM and SWM seasons. The common variables that appeared in both regression equations were SSKT, CH4 and RH, and the principal precursor of the columnar O3 value in both the NEM and SWM seasons was SSKT.
A Performance Study of Data Mining Techniques: Multiple Linear Regression vs. Factor Analysis
Taneja, Abhishek
2011-01-01
The growing volume of data creates a need for data analysis tools that discover regularities in these data. Data mining has emerged as a discipline that contributes tools for data analysis, discovery of hidden knowledge, and autonomous decision making in many application domains. The purpose of this study is to compare the performance of two data mining techniques, viz., factor analysis and multiple linear regression, for different sample sizes on three unique sets of data. The performance of the two data mining techniques is compared on the following parameters: mean square error (MSE), R-square, adjusted R-square, condition number, root mean square error (RMSE), number of variables included in the prediction model, modified coefficient of efficiency, F-value, and test of normality. These parameters have been computed using various data mining tools like SPSS, XLstat, Stata, and MS-Excel. It is seen that for all the given datasets, factor analysis outperforms multiple linear re...
Baylor, Carolyn; Yorkston, Kathryn; Bamer, Alyssa; Britton, Deanna; Amtmann, Dagmar
2010-01-01
Purpose: To explore variables associated with self-reported communicative participation in a sample (n = 498) of community-dwelling adults with multiple sclerosis (MS). Method: A battery of questionnaires was administered online or on paper per participant preference. Data were analyzed using multiple linear backward stepwise regression. The…
Simms, Laura E.; Engebretson, Mark J.; Pilipenko, Viacheslav; Reeves, Geoffrey D.; Clilverd, Mark
2016-04-01
The daily maximum relativistic electron flux at geostationary orbit can be predicted well with a set of daily averaged predictor variables including previous day's flux, seed electron flux, solar wind velocity and number density, AE index, IMF Bz, Dst, and ULF and VLF wave power. As predictor variables are intercorrelated, we used multiple regression analyses to determine which are the most predictive of flux when other variables are controlled. Empirical models produced from regressions of flux on measured predictors from 1 day previous were reasonably effective at predicting novel observations. Adding previous flux to the parameter set improves the prediction of the peak of the increases but delays its anticipation of an event. Previous day's solar wind number density and velocity, AE index, and ULF wave activity are the most significant explanatory variables; however, the AE index, measuring substorm processes, shows a negative correlation with flux when other parameters are controlled. This may be due to the triggering of electromagnetic ion cyclotron waves by substorms that cause electron precipitation. VLF waves show lower, but significant, influence. The combined effect of ULF and VLF waves shows a synergistic interaction, where each increases the influence of the other on flux enhancement. Correlations between observations and predictions for this 1 day lag model ranged from 0.71 to 0.89 (average: 0.78). A path analysis of correlations between predictors suggests that solar wind and IMF parameters affect flux through intermediate processes such as ring current (Dst), AE, and wave activity.
Directory of Open Access Journals (Sweden)
Abdul Ghafoor Memon
2014-03-01
In this study, thermodynamic and statistical analyses were performed on a gas turbine system to assess the impact of some important operating parameters like CIT (Compressor Inlet Temperature), PR (Pressure Ratio) and TIT (Turbine Inlet Temperature) on its performance characteristics such as net power output, energy efficiency, exergy efficiency and fuel consumption. Each performance characteristic was expressed as a function of operating parameters, followed by a parametric study and optimization. The results showed that the performance characteristics increase with an increase in the TIT and a decrease in the CIT, except fuel consumption, which behaves oppositely. The net power output and efficiencies increase with the PR up to certain initial values and then start to decrease, whereas the fuel consumption always decreases with an increase in the PR. The results of exergy analysis showed the combustion chamber as a major contributor to the exergy destruction, followed by stack gas. Subsequently, multiple regression models were developed to correlate each of the response variables (performance characteristics) with the predictor variables (operating parameters). The regression model equations showed a significant statistical relationship between the predictor and response variables.
Directory of Open Access Journals (Sweden)
Abdelrafe Elzamly
2014-01-01
Risk is not always avoidable, but it is controllable. The aim of this study is to identify whether those techniques are effective in reducing software failure. This motivates the authors to continue the effort to enrich software project risk management by considering mining and quantitative approaches with a large data set. In this study, two new techniques are introduced, namely stepwise multiple regression analysis and fuzzy multiple regression, to manage software risks. Two evaluation measures, MMRE and Pred(25), are used to compare the accuracy of the techniques. The model's accuracy is slightly better with stepwise multiple regression than with fuzzy multiple regression. This study will guide software managers in applying software risk management practices in real-world software development organizations and in verifying the effectiveness of the new techniques and approaches on a software project. The study was conducted on a group of software projects using a survey questionnaire. It is hoped that this will enable software managers to improve their decisions and increase the probability of software project success.
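The two accuracy measures named in this abstract have standard definitions that can be sketched briefly. The code below is an illustration under assumptions, not the study's implementation: MMRE is taken as the mean of |actual - predicted| / |actual|, and Pred(25) as the fraction of cases whose relative error is at most 25%. The function names and the sample values are hypothetical.

```python
def magnitude_relative_error(actual, predicted):
    """MRE for one case: |actual - predicted| / |actual| (actual must be nonzero)."""
    return abs(actual - predicted) / abs(actual)


def mmre(actuals, predictions):
    """Mean magnitude of relative error over all cases (lower is better)."""
    errors = [magnitude_relative_error(a, p) for a, p in zip(actuals, predictions)]
    return sum(errors) / len(errors)


def pred(actuals, predictions, level=25):
    """Pred(level): fraction of cases with MRE <= level percent (higher is better)."""
    threshold = level / 100.0
    hits = sum(
        1
        for a, p in zip(actuals, predictions)
        if magnitude_relative_error(a, p) <= threshold
    )
    return hits / len(actuals)


# Hypothetical observed values and model predictions.
observed = [100.0, 200.0, 400.0]
estimated = [110.0, 150.0, 400.0]

print("MMRE     =", mmre(observed, estimated))
print("Pred(25) =", pred(observed, estimated, level=25))
```

Comparing two models on the same cases then amounts to preferring the one with the lower MMRE and the higher Pred(25), which is how an accuracy comparison between techniques of this kind is usually read.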
An, Xin; Xu, Shuo; Zhang, Lu-Da; Su, Shi-Guang
2009-01-01
In the present paper, on the basis of LS-SVM algorithm, we built a multiple dependent variables LS-SVM (MLS-SVM) regression model whose weights can be optimized, and gave the corresponding algorithm. Furthermore, we theoretically explained the relationship between MLS-SVM and LS-SVM. Sixty four broomcorn samples were taken as experimental material, and the sample ratio of modeling set to predicting set was 51 : 13. We first selected randomly and uniformly five weight groups in the interval [0, 1], and then in the way of leave-one-out (LOO) rule determined one appropriate weight group and parameters including penalizing parameters and kernel parameters in the model according to the criterion of the minimum of average relative error. Then a multiple dependent variables quantitative analysis model was built with NIR spectrum and simultaneously analyzed three chemical constituents containing protein, lysine and starch. Finally, the average relative errors between actual values and predicted ones by the model of three components for the predicting set were 1.65%, 6.47% and 1.37%, respectively, and the correlation coefficients were 0.9940, 0.8392 and 0.8825, respectively. For comparison, LS-SVM was also utilized, for which the average relative errors were 1.68%, 6.25% and 1.47%, respectively, and the correlation coefficients were 0.9941, 0.8310 and 0.8800, respectively. It is obvious that MLS-SVM algorithm is comparable to LS-SVM algorithm in modeling analysis performance, and both of them can give satisfying results. The result shows that the model with MLS-SVM algorithm is capable of doing multi-components NIR quantitative analysis synchronously. Thus MLS-SVM algorithm offers a new multiple dependent variables quantitative analysis approach for chemometrics. In addition, the weights have certain effect on the prediction performance of the model with MLS-SVM, which is consistent with our intuition and is validated in this study. Therefore, it is necessary to optimize
PUMA: a unified framework for penalized multiple regression analysis of GWAS data.
Directory of Open Access Journals (Sweden)
Gabriel E Hoffman
Penalized Multiple Regression (PMR) can be used to discover novel disease associations in GWAS datasets. In practice, proposed PMR methods have not been able to identify well-supported associations in GWAS that are undetectable by standard association tests and thus these methods are not widely applied. Here, we present a combined algorithmic and heuristic framework for PUMA (Penalized Unified Multiple-locus Association) analysis that solves the problems of previously proposed methods including computational speed, poor performance on genome-scale simulated data, and identification of too many associations for real data to be biologically plausible. The framework includes a new minorize-maximization (MM) algorithm for generalized linear models (GLM) combined with heuristic model selection and testing methods for identification of robust associations. The PUMA framework implements the penalized maximum likelihood penalties previously proposed for GWAS analysis (i.e. Lasso, Adaptive Lasso, NEG, MCP), as well as a penalty that has not been previously applied to GWAS (i.e. LOG). Using simulations that closely mirror real GWAS data, we show that our framework has high performance and reliably increases power to detect weak associations, while existing PMR methods can perform worse than single marker testing in overall performance. To demonstrate the empirical value of PUMA, we analyzed GWAS data for type 1 diabetes, Crohn's disease, and rheumatoid arthritis, three autoimmune diseases from the original Wellcome Trust Case Control Consortium. Our analysis replicates known associations for these diseases and we discover novel etiologically relevant susceptibility loci that are invisible to standard single marker tests, including six novel associations implicating genes involved in pancreatic function, insulin pathways and immune-cell function in type 1 diabetes; three novel associations implicating genes in pro- and anti-inflammatory pathways in Crohn
Using Robust Variance Estimation to Combine Multiple Regression Estimates with Meta-Analysis
Williams, Ryan
2013-01-01
The purpose of this study was to explore the use of robust variance estimation for combining commonly specified multiple regression models and for combining sample-dependent focal slope estimates from diversely specified models. The proposed estimator obviates traditionally required information about the covariance structure of the dependent…
A factor analysis-multiple regression model for source apportionment of suspended particulate matter
Okamoto, Shin'ichi; Hayashi, Masayuki; Nakajima, Masaomi; Kainuma, Yasutaka; Shiozawa, Kiyoshige
A factor analysis-multiple regression (FA-MR) model has been used for a source apportionment study in the Tokyo metropolitan area. By a varimax rotated factor analysis, five source types could be identified: refuse incineration, soil and automobile, secondary particles, sea salt and steel mill. Quantitative estimations using the FA-MR model corresponded to the calculated contributing concentrations determined by using a weighted least-squares CMB model. However, the source type of refuse incineration identified by the FA-MR model was similar to that of biomass burning, rather than that produced by an incineration plant. The estimated contributions of sea salt and steel mill by the FA-MR model contained those of other sources, which have the same temporal variation of contributing concentrations. This symptom was caused by a multicollinearity problem. Although this result shows the limitation of the multivariate receptor model, it gives useful information concerning source types and their distribution by comparing with the results of the CMB model. In the Tokyo metropolitan area, the contributions from soil (including road dust), automobile, secondary particles and refuse incineration (biomass burning) were larger than industrial contributions: fuel oil combustion and steel mill. However, since vanadium is highly correlated with SO₄²⁻ and other secondary particle related elements, a major portion of secondary particles is considered to be related to fuel oil combustion.
Anomalous particle pinch and scaling of vin/D based on transport analysis and multiple regression
Becker, G.; Kardaun, O.
2007-01-01
Predictions of density profiles in current tokamaks and ITER require a validated scaling relation for v_in/D, where v_in is the anomalous inward drift velocity and D is the anomalous diffusion coefficient. Transport analysis is necessary for determining the anomalous particle pinch from measured density profiles and for separating the impact of particle sources. A set of discharges in ASDEX Upgrade, DIII-D, JET and ASDEX is analysed using a special version of the 1.5-D BALDUR transport code. Profiles of ρ_s v_in/D, with ρ_s the effective separatrix radius, five other dimensionless parameters and many further quantities in the confinement zone are compiled, resulting in the dataset VIND1.dat, which covers a wide parameter range. Weighted multiple regression is applied to the ASDEX Upgrade subset, which leads to a two-term scaling $\rho_s v_{in}(x')/D(x') = 0.0432\,[\,(L_{T_e}(\bar{x}')/\rho_s)^{-2.58} + 7.13\,U_L^{1.55}$ ...
Directory of Open Access Journals (Sweden)
C. Makendran
2015-01-01
Prediction models for low volume village roads in India are developed to evaluate the progression of different types of distress such as roughness, cracking, and potholes. Even though the Government of India invests large sums of money in road construction every year, poor control over the quality of road construction and its subsequent maintenance is leading to faster road deterioration. In this regard, it is essential that scientific maintenance procedures be developed on the basis of the performance of low volume flexible pavements. Considering the above, an attempt has been made in this research to develop prediction models to understand the progression of roughness, cracking, and potholes in flexible pavements exposed to little or no routine maintenance. Distress data were collected from low volume rural roads covering about 173 stretches spread across Tamil Nadu state in India. Based on the collected data, distress prediction models have been developed using multiple linear regression analysis. Further, the models have been validated using independent field data. It can be concluded that the models developed in this study can serve as useful tools for practicing engineers maintaining flexible pavements on low volume roads.
Spalj, Stjepan; Spalj, Vedrana Tudor; Ivanković, Luida; Plancak, Darije
2014-03-01
The aim of this study was to explore the patterns of oral health-related risk behaviours in relation to dental status, attitudes, motivation and knowledge among Croatian adolescents. The assessment was conducted in a sample of 750 male subjects (military recruits aged 18-28 in Croatia) using a questionnaire and clinical examination. The mean number of decayed, missing and filled teeth (DMFT) and the Significant Caries Index (SiC) were calculated. Multiple logistic regression models were created for analysis. Although the models of risk behaviours were statistically significant, their explanatory values were quite low. Five of them (rarely toothbrushing, not using hygiene auxiliaries, rarely visiting the dentist, toothache as the primary reason to visit the dentist, and demand for tooth extraction due to toothache) had the highest explanatory values, ranging from 21-29%, and correctly classified 73-89% of subjects. Toothache as the primary reason to visit the dentist, extraction as the preferred therapy when toothache occurs, not having brushing education in school and frequent gingival bleeding were significantly related to the population with high caries experience (DMFT ≥ 14 according to SiC), producing odds ratios of 1.6 (95% CI 1.07-2.46), 2.1 (95% CI 1.29-3.25), 1.8 (95% CI 1.21-2.74) and 2.4 (95% CI 1.21-2.74) respectively. The DMFT ≥ 14 model had a low explanatory value of 6.5% and correctly classified 83% of subjects. It can be concluded that oral health-related risk behaviours are interrelated. A poor association was seen between attitudes concerning oral health and oral health-related risk behaviours, indicating insufficient motivation to change lifestyle and habits. Self-reported oral hygiene habits were not strongly related to dental status.
Muller, Veronica; Brooks, Jessica; Tu, Wei-Mo; Moser, Erin; Lo, Chu-Ling; Chan, Fong
2015-01-01
Purpose: The main objective of this study was to determine the extent to which physical and cognitive-affective factors are associated with fibromyalgia (FM) fatigue. Method: A quantitative descriptive design using correlation techniques and multiple regression analysis. The participants consisted of 302 members of the National Fibromyalgia &…
Bakker, D.P.; Busscher, H.J.; Zanten, J. van; Vries, J. de; Klijnstra, J.W.; Mei, H.C. van der
2004-01-01
Many studies have shown relationships of substratum hydrophobicity, charge or roughness with bacterial adhesion, although bacterial adhesion is governed by interplay of different physico-chemical properties and multiple regression analysis would be more suitable to reveal mechanisms of bacterial adh
Johns, Thomas D.
1982-01-01
Approved for public release; distribution unlimited. In order to determine more scientifically the value of property assisted by the Coast Guard in search and rescue incidents, regression analysis was conducted on various characteristics of vessels to estimate their fair market values. Data for this research were collected from the U.S. Maritime Administration, the U.S. Coast Guard, and numerous oil and steel companies. Mathematical models were developed for merch...
Regression analysis by example
Chatterjee, Samprit
2012-01-01
Praise for the Fourth Edition: "This book is . . . an excellent source of examples for regression analysis. It has been and still is readily readable and understandable." -Journal of the American Statistical Association. Regression analysis is a conceptually simple method for investigating relationships among variables. Carrying out a successful application of regression analysis, however, requires a balance of theoretical results, empirical rules, and subjective judgment. Regression Analysis by Example, Fifth Edition has been expanded
Investigations upon the indefinite rolls quality assurance in multiple regression analysis
Directory of Open Access Journals (Sweden)
Kiss, I.
2012-04-01
The quality of rolling-mill rolls has been enhanced mainly through improvements in the chemical composition of the roll materials. Achieving an optimal chemical composition is a technically efficient way to assure the service properties of the rolls, since the material from which the rolls are manufactured is of primary importance in this respect. This paper continues to present the scientific results of our experimental research on rolling-mill rolls. The basic research contains concrete elements of immediate practical use in metallurgical enterprises for improving roll quality, with the ultimate aim of increasing durability and operational safety. This paper presents an analysis of the chemical composition and its influence on the mechanical properties of indefinite cast iron rolls. We present some mathematical correlations and graphical interpretations between the hardness (on the working surface and on the necks) and the chemical composition. Double and triple correlations are really helpful in foundry practice, as they allow us to determine variation boundaries for the chemical composition with a view to obtaining optimal hardness values. We suggest a mathematical interpretation of the influence of the chemical composition on the hardness of these indefinite rolls. To this end we use multiple regression analysis, which can be an important statistical tool for investigating relationships between variables. The mathematical modelling results can be described through a number of multi-component equations determined for spaces with 3 and 4 dimensions. The regression surfaces, level curves and volumes of variation can also be represented and interpreted by technologists as correlation diagrams between the analysed variables. In this sense, these research results can be used in the engineers
Multiple regression and principal components analysis of puberty and growth in cattle.
Baker, J F; Stewart, T S; Long, C R; Cartwright, T C
1988-09-01
Multiple regression and principal components analyses were employed to examine relationships among pubertal and growth characters. Records used were from 424 bulls and 475 heifers produced by a diallel mating of Angus, Brahman, Hereford, Holstein and Jersey breeds. Characters studied were age, weight and height at puberty and measurements of weight and hip height from 9 to 21 mo of age; pelvic measurements of heifers also were included. Measurements of weight and height near 1 yr of age were related most highly to pubertal age, weight and height. Larger size near 1 yr of age was associated with younger, larger animals at puberty. Growth rate was associated with pubertal characters before, but not after, adjustment for effects of breed-type. Principal components of the variation of pubertal and growth characters among animals were strongly related to both weight and height. The majority of the variation among breed-types was due to height. Characteristic vectors of principal components describing the variation of bulls and heifers were strikingly similar. The variance-covariance structure of pubertal characters was essentially the same for both sexes even though the mean values of the characters differed. PMID:3170369
Directory of Open Access Journals (Sweden)
Vujić Zorica B.
2012-01-01
This article presents the possibility of using multiple regression analysis (MRA) and a dynamic neural network (DNN) to predict the stability of Hydrocortisone 100 mg (in the form of hydrocortisone sodium succinate) freeze-dried powder for injection packed in a dual chamber container. Degradation products of hydrocortisone sodium succinate, namely free hydrocortisone and related substances (impurities A, B, C, D and E; unspecified impurities and total impurities), were followed during stress and formal stability studies. All data obtained during the stability studies were used for in silico modeling with both multiple regression models and dynamic neural networks, in order to compare predicted and observed results. High values of the coefficient of determination (0.95-0.99) were obtained using MRA and DNN, so both methods are powerful tools for in silico stability studies, but the superiority of DNN over mathematical modeling of degradation was also confirmed.
A calibration method of Argo floats based on multiple regression analysis
Institute of Scientific and Technical Information of China (English)
(author not listed)
2006-01-01
Argo floats are free-moving floats that report vertical profiles of salinity, temperature and pressure at regular time intervals. These floats give good measurements of temperature and pressure, but salinity measurements may show significant sensor drift with time. It is found that sensor drift with time is not purely linear, as presupposed by Wong (2003). A new method is developed to calibrate conductivity data measured by Argo floats. In this method, Wong's objective analysis method was adopted to estimate the background climatological salinity field on potential temperature surfaces from nearby historical data in WOD01. Furthermore, temperature and time factors are taken into account, and stepwise regression was used for a time-varying or temperature-varying slope in potential conductivity space to correct the drift in these profiling float salinity data. The results show that salinity errors using this method are smaller than those of Wong's method, and that quantitative and qualitative analysis of the conductivity sensor can be carried out with our method.
El-Ansary, Afaf
2016-06-01
This work presents data from multiple regression analysis of nine biomarkers related to glutamate excitotoxicity and impaired detoxification, two mechanisms recently recorded as autism phenotypes. The data were obtained by measuring a panel of markers in 20 autistic patients aged 3-15 years and 20 age- and gender-matched healthy controls. Levels of GSH, glutathione status (GSH/GSSG), glutathione reductase (GR), glutathione-s-transferase (GST), thioredoxin (Trx), thioredoxin reductase (TrxR) and peroxidoxins (Prxs I and III), glutamate, glutamine, glutamate/glutamine ratio and glutamate dehydrogenase (GDH) in plasma and mercury (Hg) in red blood cells were determined in both groups. In the multiple regression analysis, R² values, which describe the proportion of variance in the dependent variable attributed to the variance in the independent variables together, were calculated. Moreover, β coefficient values, which show the direction (positive or negative) and the contribution of each independent variable relative to the other independent variables in explaining the variation of the dependent variable, were determined. A panel of inter-related markers was recorded. This paper contains data related to and supporting the published research articles entitled "Mechanism of nitrogen metabolism-related parameters and enzyme activities in the pathophysiology of autism" [1], "Novel metabolic biomarkers related to sulfur-dependent detoxification pathways in autistic patients of Saudi Arabia" [2], and "A key role for an impaired detoxification mechanism in the etiology and severity of autism spectrum disorders" [3]. PMID:26933667
Some Simple Computational Formulas for Multiple Regression
Aiken, Lewis R., Jr.
1974-01-01
Short-cut formulas are presented for direct computation of the beta weights, the standard errors of the beta weights, and the multiple correlation coefficient for multiple regression problems involving three independent variables and one dependent variable. (Author)
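The abstract does not reproduce the short-cut formulas themselves, so the following is only a minimal sketch of the standard matrix results they abbreviate: standardized (beta) weights, their approximate standard errors, and the multiple correlation coefficient, all computed from correlations alone for three predictors. The function name `standardized_weights` and the correlation values are hypothetical, chosen for illustration.

```python
import numpy as np

def standardized_weights(R_xx, r_xy, n):
    """Beta weights, approximate standard errors, and multiple R.

    R_xx : (k, k) correlation matrix among the predictors
    r_xy : (k,) correlations of each predictor with the criterion
    n    : sample size on which the correlations are based
    """
    R_xx = np.asarray(R_xx, dtype=float)
    r_xy = np.asarray(r_xy, dtype=float)
    k = R_xx.shape[0]
    R_inv = np.linalg.inv(R_xx)
    beta = R_inv @ r_xy                    # standardized regression weights
    r_squared = float(beta @ r_xy)         # squared multiple correlation
    # Textbook approximation: SE_j^2 = (1 - R^2) / (n - k - 1) * [R_xx^-1]_jj
    se = np.sqrt((1.0 - r_squared) / (n - k - 1) * np.diag(R_inv))
    return beta, se, np.sqrt(r_squared)

# Hypothetical correlations among three predictors and one criterion
R_xx = np.array([[1.0, 0.3, 0.2],
                 [0.3, 1.0, 0.4],
                 [0.2, 0.4, 1.0]])
r_xy = np.array([0.5, 0.6, 0.4])
beta, se, multiple_r = standardized_weights(R_xx, r_xy, n=100)
print(beta, se, multiple_r)
```

When the predictors are uncorrelated, `R_xx` is the identity matrix and each beta weight reduces to the simple correlation of that predictor with the criterion.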
Pevná, Hana; Jeníček, Michal
2014-05-01
Snow is an important component of the hydrological cycle in central Europe. A large quantity of water is accumulated as snow during the winter period, and this water runs off into rivers in a relatively short time during spring. An increased risk of floods in central Europe exists particularly in alpine and pre-alpine catchments, which have a pluvio-nival flow regime. Research on snow accumulation and snowmelt processes is important for runoff forecasting and reservoir management. The research is carried out in small mountain catchments in the Czech Republic. The experimental catchments differ in elevation range, aspect, slope and type of vegetation cover. Automatic and field measurements of snow depth and snow water equivalent (SWE) have been carried out at specific localities since 2008. Each locality is characterized by elevation, aspect, slope and vegetation type (open area, clearing, young forest, sparse mature forest and dense mature forest). Measurements of snow depth and SWE are carried out at 19 localities during both the snow accumulation and the snowmelt period. Snow depth and SWE data were assessed using simple statistical analysis as well as multiple regression and cluster analysis in order to describe the spatial distribution of snow accumulation and snowmelt. The correlation of SWE with vegetation type, elevation, aspect and slope was tested. The main findings show that vegetation type has the most significant influence on the snowpack distribution and on the dynamics of snow accumulation and snowmelt. Significant correlations were also found for aspect (especially for southern slopes). The study complements similar results obtained in different study areas and climatic conditions, and in addition it shows how the importance of the governing factors changes between the snow accumulation and snowmelt periods. The results demonstrate the good applicability of cluster analysis and multiple regression for describing snowpack distribution.
Directory of Open Access Journals (Sweden)
Nop Sopipan
2013-01-01
The aim of this study was to forecast the returns for the Stock Exchange of Thailand (SET) Index by adding some explanatory variables and a stationary Autoregressive Moving-Average process of order p and q (ARMA(p, q)) in the mean equation of returns. In addition, we used Principal Component Analysis (PCA) to remove possible complications caused by multicollinearity. Afterwards, we forecast the volatility of the returns for the SET Index. Results showed that the ARMA(1,1) model, which includes multiple regression based on PCA, has the best performance. In forecasting the volatility of returns, the GARCH model performs best for one day ahead, and the EGARCH model performs best for five days, ten days and twenty-two days ahead.
Bayesian logistic regression analysis
Van Erp, H.R.N.; Van Gelder, P.H.A.J.M.
2012-01-01
In this paper we present a Bayesian logistic regression analysis. It is found that if one wishes to derive the posterior distribution of the probability of some event, then, together with the traditional Bayes Theorem and the integrating out of nuisance parameters, the Jacobian transformation is an
Multiple Linear Regression Models in Outlier Detection
Directory of Open Access Journals (Sweden)
S.M.A.Khaleelur Rahman
2012-02-01
Identifying anomalous values in real-world databases is important both for improving the quality of the original data and for reducing the impact of anomalous values in the process of knowledge discovery in databases. Such anomalous values give useful information to the data analyst in discovering useful patterns. Through isolation, these data may be separated and analyzed. The analysis of outliers and influential points is an important step in regression diagnostics. In this paper, our aim is to detect the points that are very different from the other points. They do not seem to belong to a particular population and behave differently. If these influential points are removed, a different model results. The distinction between these points is not always obvious and clear. Hence several indicators are used for identifying and analyzing outliers. Existing methods of outlier detection are based on manual inspection of graphically represented data. In this paper, we present a new approach to automating the process of detecting and isolating outliers. The impact of anomalous values on the dataset has been established by using two indicators, DFFITS and Cook's D. The process is based on modeling the human perception of exceptional values by using multiple linear regression analysis.
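The two indicators named in the abstract have standard closed forms for an ordinary least-squares fit, sketched below with NumPy. This is a minimal illustration on hypothetical data, not the paper's automated procedure; the function name `influence_measures` and the contaminated sample are assumptions made for the example.

```python
import numpy as np

def influence_measures(X, y):
    """Cook's D and DFFITS for an OLS fit of y on X (intercept added here)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])   # design matrix with intercept
    p = A.shape[1]                         # number of fitted parameters
    H = A @ np.linalg.pinv(A)              # hat (projection) matrix
    h = np.diag(H)                         # leverages
    resid = y - H @ y                      # ordinary residuals
    s2 = resid @ resid / (n - p)           # residual variance estimate
    cooks_d = resid**2 / (p * s2) * h / (1.0 - h) ** 2
    # Residual variance with observation i left out (for external studentizing)
    s2_loo = ((n - p) * s2 - resid**2 / (1.0 - h)) / (n - p - 1)
    t_ext = resid / np.sqrt(s2_loo * (1.0 - h))
    dffits = t_ext * np.sqrt(h / (1.0 - h))
    return cooks_d, dffits

# Hypothetical data: a straight line with small alternating noise,
# plus one deliberately contaminated point at index 9.
x = np.arange(10.0)
y = 1.0 + 2.0 * x + 0.1 * (-1.0) ** np.arange(10)
y[9] += 10.0
cooks_d, dffits = influence_measures(x.reshape(-1, 1), y)
print(int(np.argmax(cooks_d)), int(np.argmax(np.abs(dffits))))
```

Common rules of thumb flag points with Cook's D above 4/n or |DFFITS| above 2·sqrt(p/n); these cutoffs are conventions rather than thresholds taken from the paper.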
Directory of Open Access Journals (Sweden)
Ingunn Fride Tvete
Rheumatoid arthritis patients have been treated with disease modifying anti-rheumatic drugs (DMARDs) and the newer biologic drugs. We sought to compare and rank the biologics with respect to efficacy. We performed a literature search identifying 54 publications encompassing 9 biologics. We conducted a multiple treatment comparison regression analysis, letting the number of patients experiencing a 50% improvement on the ACR score depend on dose level and disease duration, to assess the comparable relative effect between biologics and placebo or DMARD. The analysis embraced all treatment and comparator arms over all publications. Hence, all measured effects of any biologic agent contributed to the comparison of all biologic agents relative to each other, either given alone or combined with DMARD. We found the drug effect to be dependent on dose level, but not on disease duration, and the impact of a high versus low dose level was the same for all drugs (higher doses indicated a higher frequency of ACR50 scores). The ranking of the drugs when given without DMARD was certolizumab (ranked highest), etanercept, tocilizumab/abatacept and adalimumab. The ranking of the drugs when given with DMARD was certolizumab (ranked highest), tocilizumab, anakinra/rituximab, golimumab/infliximab/abatacept, adalimumab/etanercept [corrected]. All biologic agents were effective compared to placebo, with certolizumab the most effective and adalimumab (without DMARD treatment) and adalimumab/etanercept (combined with DMARD treatment) the least effective. The drugs were in general more effective, except for etanercept, when given together with DMARDs.
Armaghani, Danial Jahed; Mahdiyar, Amir; Hasanipanah, Mahdi; Faradonbeh, Roohollah Shirani; Khandelwal, Manoj; Amnieh, Hassan Bakhshandeh
2016-09-01
Flyrock is considered one of the main causes of human injury, fatalities, and structural damage among all the undesirable environmental impacts of blasting. Proper prediction/simulation of flyrock is therefore essential, especially in order to determine the blast safety area. If proper control measures are taken, the flyrock distance can be controlled and, in turn, the risk of damage can be reduced or eliminated. The first objective of this study was to develop a predictive model for flyrock estimation based on multiple regression (MR) analyses; after that, using the developed MR model, the flyrock phenomenon was simulated by the Monte Carlo (MC) approach. To achieve the objectives of this study, 62 blasting operations were investigated in Ulu Tiram quarry, Malaysia, and some controllable and uncontrollable factors were carefully recorded/calculated. The results of the MC modeling indicated that this approach is capable of simulating flyrock ranges with a good level of accuracy. The mean flyrock distance simulated by MC was 236.3 m, while the measured mean was 238.6 m. Furthermore, a sensitivity analysis was conducted to investigate the effects of the model inputs on the output of the system. The analysis demonstrated that powder factor is the most influential parameter on flyrock among all model inputs. It should be noted that the proposed MR and MC models should be utilized only in the studied area, and their direct use in other conditions is not recommended.
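The two-stage workflow described above (fit a multiple regression, then propagate input uncertainty through it by Monte Carlo sampling) can be sketched as follows. Everything numeric here is an assumption: the data are synthetic stand-ins rather than the Ulu Tiram measurements, `burden` is a hypothetical second predictor, and the uniform input distributions are chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-in for 62 blast records (NOT data from the study).
n = 62
powder_factor = rng.uniform(0.3, 0.9, n)   # hypothetical range
burden = rng.uniform(2.0, 4.0, n)          # hypothetical second input
flyrock = 50.0 + 300.0 * powder_factor - 10.0 * burden + rng.normal(0.0, 5.0, n)

# Stage 1: multiple regression by least squares (intercept + two inputs).
A = np.column_stack([np.ones(n), powder_factor, burden])
coef, *_ = np.linalg.lstsq(A, flyrock, rcond=None)

# Stage 2: Monte Carlo simulation. Draw inputs from assumed distributions
# and push them through the fitted regression equation.
m = 10_000
pf_draws = rng.uniform(0.3, 0.9, m)
burden_draws = rng.uniform(2.0, 4.0, m)
simulated = coef[0] + coef[1] * pf_draws + coef[2] * burden_draws

print("fitted coefficients:", coef)
print("simulated mean distance:", simulated.mean())
print("simulated 95th percentile:", np.percentile(simulated, 95))
```

An upper percentile of the simulated distances is one plausible way to derive a conservative safety distance from such a simulation; the abstract itself reports only the simulated and measured means.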
Directory of Open Access Journals (Sweden)
Ani Shabri
2014-01-01
Crude oil prices play a significant role in the global economy and are a key input into option pricing formulas, portfolio allocation, and risk measurement. In this paper, a hybrid model integrating wavelet and multiple linear regression (MLR) is proposed for crude oil price forecasting. In this model, the Mallat wavelet transform is first used to decompose an original time series into several subseries with different scales. Then, principal component analysis (PCA) is used in processing the subseries data in MLR for crude oil price forecasting. Particle swarm optimization (PSO) is used to obtain the optimal parameters of the MLR model. To assess the effectiveness of this model, the daily West Texas Intermediate (WTI) crude oil market has been used as the case study. The time series prediction performance of the WMLR model is compared with the MLR, ARIMA, and GARCH models using various statistical measures. The experimental results show that the proposed model outperforms the individual models in forecasting the crude oil price series.
Directory of Open Access Journals (Sweden)
Jevrić Lidija R.
2013-01-01
The estimation of retention factors by correlation equations with physico-chemical properties can be of great help in chromatographic studies. The retention factors were experimentally measured by RP-HPTLC on silica gel impregnated with paraffin oil, using two-component solvent systems. The relationships between solute retention and modifier concentration were described by Snyder's linear equation. A quantitative structure-retention relationship was developed for a series of s-triazine compounds by multiple linear regression (MLR) analysis. The MLR procedure was used to model the relationships between the molecular descriptors and the retention of s-triazine derivatives. The physicochemical molecular descriptors were calculated from the optimized structures. The physico-chemical properties were the lipophilicity (log P), connectivity indices (χ), total energy (Et), water solubility (log W), dissociation constant (pKa), molar refractivity (MR), and Gibbs energy (GibbsE) of the s-triazines. A high agreement between the experimental and predicted retention parameters was obtained when the dissociation constant and the hydrophilic-lipophilic balance were used as the molecular descriptors. The empirical equations may be successfully used for the prediction of the various chromatographic characteristics of substances with a similar chemical structure. [Project of the Ministry of Science of the Republic of Serbia, Nos. 31055, 172012, 172013 and 172014]
Denli, H. H.; Koc, Z.
2015-12-01
Valuing real properties against fixed standards is difficult to apply across time and location. Regression analysis constructs mathematical models that describe or explain relationships that may exist between variables. The problem of identifying price differences between properties to obtain a price index can be converted into a regression problem, and standard techniques of regression analysis can be used to estimate the index. When regression analysis is applied to real estate valuation, using properties on the market together with their current characteristics and quantifiers, the method helps to identify the factors or variables that are effective in forming the value. In this study, prices of housing for sale in Zeytinburnu, a district of Istanbul, are related to the characteristics of the properties to find a price index, based on information obtained from a real estate web page. The variables used for the analysis are age, size in m², number of floors in the building, floor number of the property and number of rooms. The price of the property is the dependent variable, whereas the rest are independent variables. Prices from 60 properties were used for the analysis. Locations with the same value were found and plotted on the map, and equivalence curves were drawn identifying zones of equal value as lines.
The M Word: Multicollinearity in Multiple Regression.
Morrow-Howell, Nancy
1994-01-01
Notes that existence of substantial correlation between two or more independent variables creates problems of multicollinearity in multiple regression. Discusses multicollinearity problem in social work research in which independent variables are usually intercorrelated. Clarifies problems created by multicollinearity, explains detection of…
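One widely used way to detect the problem described above is the variance inflation factor (VIF), which for predictor j equals 1/(1 - R_j²), where R_j² comes from regressing predictor j on the remaining predictors. The sketch below uses the equivalent shortcut that the VIFs are the diagonal of the inverse predictor correlation matrix. The data are synthetic and the function name is hypothetical; the abstract does not say which diagnostics the article covers.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF for each column of X, via the inverse correlation matrix."""
    X = np.asarray(X, dtype=float)
    R = np.corrcoef(X, rowvar=False)       # predictor correlation matrix
    return np.diag(np.linalg.inv(R))       # VIF_j = [R^-1]_jj = 1 / (1 - R_j^2)

# Synthetic predictors: x2 is nearly a copy of x1, x3 is independent.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.1 * rng.normal(size=200)
x3 = rng.normal(size=200)
vif = variance_inflation_factors(np.column_stack([x1, x2, x3]))
print(vif)
```

A VIF near 1 indicates a predictor that is almost uncorrelated with the others; values above roughly 10 are often treated as a warning sign, although that cutoff is a convention rather than a rule.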
Kokaly, R.F.; Clark, R.N.
1999-01-01
We develop a new method for estimating the biochemistry of plant material using spectroscopy. Normalized band depths calculated from the continuum-removed reflectance spectra of dried and ground leaves were used to estimate their concentrations of nitrogen, lignin, and cellulose. Stepwise multiple linear regression was used to select wavelengths in the broad absorption features centered at 1.73 μm, 2.10 μm, and 2.30 μm that were highly correlated with the chemistry of samples from eastern U.S. forests. Band depths of absorption features at these wavelengths were found to also be highly correlated with the chemistry of four other sites. A subset of data from the eastern U.S. forest sites was used to derive linear equations that were applied to the remaining data to successfully estimate their nitrogen, lignin, and cellulose concentrations. Correlations were highest for nitrogen (R² from 0.75 to 0.94). The consistent results indicate the possibility of establishing a single equation capable of estimating the chemical concentrations in a wide variety of species from the reflectance spectra of dried leaves. The extension of this method to remote sensing was investigated. The effects of leaf water content, sensor signal-to-noise and bandpass, atmospheric effects, and background soil exposure were examined. Leaf water was found to be the greatest challenge to extending this empirical method to the analysis of fresh whole leaves and complete vegetation canopies. The influence of leaf water on reflectance spectra must be removed to within 10%. Other effects were reduced by continuum removal and normalization of band depths. If the effects of leaf water can be compensated for, it might be possible to extend this method to remote sensing data acquired by imaging spectrometers to give estimates of nitrogen, lignin, and cellulose concentrations over large areas for use in ecosystem studies.
Loturco, Irineu; Artioli, Guilherme Giannini; Kobal, Ronaldo; Gil, Saulo; Franchini, Emerson
2014-07-01
This study investigated the relationship between punching acceleration and selected strength and power variables in 19 professional karate athletes from the Brazilian National Team (9 men and 10 women; age, 23 ± 3 years; height, 1.71 ± 0.09 m; and body mass [BM], 67.34 ± 13.44 kg). Punching acceleration was assessed under 4 different conditions in a randomized order: (a) fixed distance aiming to attain maximum speed (FS), (b) fixed distance aiming to attain maximum impact (FI), (c) self-selected distance aiming to attain maximum speed, and (d) self-selected distance aiming to attain maximum impact. The selected strength and power variables were as follows: maximal dynamic strength in bench press and squat-machine, squat and countermovement jump height, mean propulsive power in bench throw and jump squat, and mean propulsive velocity in jump squat with 40% of BM. Upper- and lower-body power and maximal dynamic strength variables were positively correlated to punch acceleration in all conditions. Multiple regression analysis also revealed predictive variables: relative mean propulsive power in squat jump (W·kg-1), and maximal dynamic strength 1 repetition maximum in both bench press and squat-machine exercises. An impact-oriented instruction and a self-selected distance to start the movement seem to be crucial to reach the highest acceleration during punching execution. This investigation, while demonstrating strong correlations between punching acceleration and strength-power variables, also provides important information for coaches, especially for designing better training strategies to improve punching speed.
Hierarchical regression for analyses of multiple outcomes.
Richardson, David B; Hamra, Ghassan B; MacLehose, Richard F; Cole, Stephen R; Chu, Haitao
2015-09-01
In cohort mortality studies, there often is interest in associations between an exposure of primary interest and mortality due to a range of different causes. A standard approach to such analyses involves fitting a separate regression model for each type of outcome. However, the statistical precision of some estimated associations may be poor because of sparse data. In this paper, we describe a hierarchical regression model for estimation of parameters describing outcome-specific relative rate functions and associated credible intervals. The proposed model uses background stratification to provide flexible control for the outcome-specific associations of potential confounders, and it employs a hierarchical "shrinkage" approach to stabilize estimates of an exposure's associations with mortality due to different causes of death. The approach is illustrated in analyses of cancer mortality in 2 cohorts: a cohort of dioxin-exposed US chemical workers and a cohort of radiation-exposed Japanese atomic bomb survivors. Compared with standard regression estimates of associations, hierarchical regression yielded estimates with improved precision that tended to have less extreme values. The hierarchical regression approach also allowed the fitting of models with effect-measure modification. The proposed hierarchical approach can yield estimates of association that are more precise than conventional estimates when one wishes to estimate associations with multiple outcomes. PMID:26232395
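The "shrinkage" idea can be illustrated with a much simpler stand-in than the model in the paper: a normal-normal combination in which each outcome-specific estimate is pulled toward a common mean, with noisier estimates pulled further. The sketch assumes the between-outcome variance `tau2` is known and that each estimate comes with a sampling variance; the numbers and the function name `shrink_estimates` are hypothetical, and the background stratification the authors describe is not reproduced.

```python
import numpy as np

def shrink_estimates(b, v, tau2):
    """Shrink estimates b (sampling variances v) toward a common mean.

    tau2 is the assumed between-outcome variance of the true effects.
    Returns the shrunken estimates and the common mean used.
    """
    b = np.asarray(b, dtype=float)
    v = np.asarray(v, dtype=float)
    w = 1.0 / (v + tau2)                       # marginal precision weights
    mu = np.sum(w * b) / np.sum(w)             # precision-weighted common mean
    # Posterior mean: precision-weighted blend of the estimate and mu
    shrunk = (b / v + mu / tau2) / (1.0 / v + 1.0 / tau2)
    return shrunk, mu

# Hypothetical log rate ratios for three causes of death: the first is
# precisely estimated, the other two come from sparse data.
b = np.array([0.10, 0.80, -0.50])
v = np.array([0.01, 0.25, 0.25])
shrunk, mu = shrink_estimates(b, v, tau2=0.04)
print(mu, shrunk)
```

In this toy example the precise first estimate barely moves, while the two sparse-data estimates are drawn strongly toward the common mean, which mirrors the less extreme values reported in the abstract.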
Regression Analysis A Constructive Critique
Berk, Richard A
2003-01-01
Regression Analysis: A Constructive Critique identifies a wide variety of problems with regression analysis as it is commonly used and then provides a number of ways in which practice could be improved. Regression is most useful for data reduction, leading to relatively simple but rich and precise descriptions of patterns in a data set. The emphasis on description provides readers with an insightful rethinking from the ground up of what regression analysis can do, so that readers can better match regression analysis with useful empirical questions and improved policy-related research. "An
A Dirty Model for Multiple Sparse Regression
Jalali, Ali; Sanghavi, Sujay
2011-01-01
Sparse linear regression -- finding an unknown vector from linear measurements -- is now known to be possible with fewer samples than variables, via methods like the LASSO. We consider the multiple sparse linear regression problem, where several related vectors -- with partially shared support sets -- have to be recovered. A natural question in this setting is whether one can use the sharing to further decrease the overall number of samples required. A line of recent research has studied the use of ℓ1/ℓq norm block-regularizations with q>1 for such problems; however these could actually perform worse in sample complexity -- vis-à-vis solving each problem separately ignoring sharing -- depending on the level of sharing. We present a new method for multiple sparse linear regression that can leverage support and parameter overlap when it exists, but not pay a penalty when it does not. A very simple idea: we decompose the parameters into two components and regularize these differently. We show both theore...
Multiple Regression Analyses in Clinical Child and Adolescent Psychology
Jaccard, James; Guilamo-Ramos, Vincent; Johansson, Margaret; Bouris, Alida
2006-01-01
A major form of data analysis in clinical child and adolescent psychology is multiple regression. This article reviews issues in the application of such methods in light of the research designs typical of this field. Issues addressed include controlling covariates, evaluation of predictor relevance, comparing predictors, analysis of moderation,…
Multiple linear regression for isotopic measurements
Garcia Alonso, J. I.
2012-04-01
There are two typical applications of isotopic measurements: the detection of natural variations in isotopic systems and the detection of man-made variations using enriched isotopes as indicators. For both types of measurement, accurate and precise isotope ratio measurements are required. For the so-called non-traditional stable isotopes, multicollector ICP-MS instruments are usually applied. In many cases, chemical separation procedures are required before accurate isotope measurements can be performed. The off-line separation of Rb and Sr or Nd and Sm is the classical procedure employed to eliminate isobaric interferences before multicollector ICP-MS measurement of Sr and Nd isotope ratios. This procedure also allows matrix separation, so that precise and accurate Sr and Nd isotope ratios can be obtained. In our laboratory we have evaluated the separation of Rb-Sr and Nd-Sm isobars by liquid chromatography and on-line multicollector ICP-MS detection. The combination of this chromatographic procedure with multiple linear regression of the raw chromatographic data resulted in Sr and Nd isotope ratios with precisions and accuracies typical of off-line sample preparation procedures. On the other hand, methods for the labelling of individual organisms (such as a given plant, fish or animal) are required for population studies. We have developed a dual isotope labelling procedure which can be unique for a given individual, can be inherited in living organisms and is stable. The detection of the isotopic signature is also based on multiple linear regression. The labelling of fish and its detection in otoliths by Laser Ablation ICP-MS will be discussed using trout and salmon as examples. In conclusion, isotope measurement procedures based on multiple linear regression can be a viable alternative in multicollector ICP-MS measurements.
Persson, Bertil
2014-01-01
The aim of the study was to examine relationships between psychosocial family- and school environment and personality as assessed by the Junior Eysenck Personality Questionnaire (EPQ-J) and possible personality interactional effects. The study was based on 244 Swedish girls and boys, 10-19 years old, who filled in the Family- and School Psychosocial Environment (FSPE) questionnaire and the EPQ-J. A multiple regression analysis showed that the FSPE-factor Family conflicts and school discipline...
Liu, Bilan; Qiu, Xing; Zhu, Tong; Tian, Wei; Hu, Rui; Ekholm, Sven; Schifitto, Giovanni; Zhong, Jianhui
2016-03-01
Subject-specific longitudinal DTI study is vital for investigation of pathological changes of lesions and disease evolution. Spatial Regression Analysis of Diffusion tensor imaging (SPREAD) is a non-parametric permutation-based statistical framework that combines spatial regression and resampling techniques to achieve effective detection of localized longitudinal diffusion changes within the whole brain at individual level without a priori hypotheses. However, boundary blurring and dislocation limit its sensitivity, especially towards detecting lesions of irregular shapes. In the present study, we propose an improved SPREAD (dubbed improved SPREAD, or iSPREAD) method by incorporating a three-dimensional (3D) nonlinear anisotropic diffusion filtering method, which provides edge-preserving image smoothing through a nonlinear scale space approach. The statistical inference based on iSPREAD was evaluated and compared with the original SPREAD method using both simulated and in vivo human brain data. Results demonstrated that the sensitivity and accuracy of the SPREAD method has been improved substantially by adapting nonlinear anisotropic filtering. iSPREAD identifies subject-specific longitudinal changes in the brain with improved sensitivity, accuracy, and enhanced statistical power, especially when the spatial correlation is heterogeneous among neighboring image pixels in DTI.
Energy Technology Data Exchange (ETDEWEB)
Chelgani, S. Chehreh; Jorjani, E.; Mesroghli, Sh.; Bagherieh, A.H. [Department of Mining Engineering, Research and Science Campus, Islamic Azad University, Poonak, Hesarak Tehran (Iran); Hower, James C. [Center for Applied Energy Research, University of Kentucky, 2540 Research Park Drive, Lexington, KY 40511 (United States)
2008-01-15
The effects of proximate and ultimate analysis, maceral content, and coal rank (Rmax) for a wide range of Kentucky coal samples from calorific value of 4320 to 14960 (BTU/lb) (10.05 to 34.80 MJ/kg) on Hardgrove Grindability Index (HGI) have been investigated by multivariable regression and artificial neural network methods (ANN). The stepwise least square mathematical method shows that the relationship between (a) Moisture, ash, volatile matter, and total sulfur; (b) ln (total sulfur), hydrogen, ash, ln ((oxygen + nitrogen)/carbon) and moisture; (c) ln (exinite), semifusinite, micrinite, macrinite, resinite, and Rmax input sets with HGI in linear condition can achieve the correlation coefficients (R²) of 0.77, 0.75, and 0.81, respectively. The ANN, which adequately recognized the characteristics of the coal samples, can predict HGI with correlation coefficients of 0.89, 0.89 and 0.95 respectively in testing process. It was determined that ln (exinite), semifusinite, micrinite, macrinite, resinite, and Rmax can be used as the best predictor for the estimation of HGI on multivariable regression (R² = 0.81) and also artificial neural network methods (R² = 0.95). The ANN based prediction method, as used in this paper, can be further employed as a reliable and accurate method, in the Hardgrove Grindability Index prediction.
Directory of Open Access Journals (Sweden)
Aline Gomes da Silva
2014-01-01
In the current context of climate change discussions, predictions of future scenarios of weather and climate are crucial for the generation of information of interest to the global community. Because the atmosphere is a chaotic system, errors in predictions of future scenarios are systematically observed. Therefore, numerous techniques have been tested in order to generate more reliable predictions, and two techniques have excelled: dynamic downscaling, through regional models, and ensemble prediction, which combines different outputs of climate models through the arithmetic average, in other words, a postprocessing of the output data. This paper proposes a method of postprocessing the outputs of regional climate models. The method consists in using multiple linear regression by principal components to combine different simulations obtained by dynamic downscaling with the regional climate model (RegCM4). Tests for the Amazon and Northeast regions of Brazil (South America) showed that the method provided a more realistic prediction in terms of average daily rainfall for the analyzed period than the ensemble prediction made through the arithmetic average of the simulations. The method captured the extreme events (outliers) that the prediction by averaging missed. Data from the Tropical Rainfall Measuring Mission (TRMM) were used to evaluate the method.
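The combination step described in this abstract can be illustrated with a minimal principal component regression sketch in Python. This is not the authors' implementation: the function name `pcr_fit`, the synthetic "rainfall" series and the three synthetic ensemble members are all assumptions made for illustration only.

```python
import numpy as np

def pcr_fit(X, y, n_components):
    """Principal component regression: regress y on the leading PCs of X."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    # Rows of Vt are the principal directions of the centred predictors.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T
    scores = Xc @ V
    gamma, *_ = np.linalg.lstsq(scores, y - y_mean, rcond=None)
    beta = V @ gamma                      # coefficients on the original predictors
    intercept = y_mean - x_mean @ beta
    return intercept, beta

rng = np.random.default_rng(1)
# Synthetic stand-in for observed daily rainfall (hypothetical data).
truth = rng.gamma(shape=2.0, scale=3.0, size=120)
# Three synthetic ensemble members: scaled truth plus noise, hence strongly collinear.
members = np.column_stack(
    [s * truth + rng.normal(scale=1.0, size=120) for s in (0.8, 1.0, 1.3)]
)

intercept, beta = pcr_fit(members, truth, n_components=1)
combined = intercept + members @ beta     # regression-based combination
simple_mean = members.mean(axis=1)        # arithmetic-average ensemble, for comparison
```

Keeping fewer components than predictors is what distinguishes this from ordinary least squares; when all components are kept, the two fits coincide.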
Indian Academy of Sciences (India)
ABHIJIT SARKAR; PRASENJIT DEY; R N RAI; SUBHAS CHANDRA SAHA
2016-05-01
Weld bead plays an important role in determining the quality of welding, particularly in high heat input processes. This research paper presents the development of multiple regression analysis (MRA) and artificial neural network (ANN) models to predict weld bead geometry and HAZ width in the submerged arc welding process. The design of experiments is based on Taguchi’s L16 orthogonal array, varying wire feed rate, transverse speed and stick out to develop a multiple regression model, which has been checked for adequacy and significance. An ANN model was also built with the back propagation approach in MATLAB to predict bead geometry and HAZ width. Finally, the results of the two prediction models were compared and analyzed. It is found that the error in the prediction of bead geometry and HAZ width is smaller with ANN than with MRA.
Interpretation of Standardized Regression Coefficients in Multiple Regression.
Thayer, Jerome D.
The extent to which standardized regression coefficients (beta values) can be used to determine the importance of a variable in an equation was explored. The beta value and the part correlation coefficient--also called the semi-partial correlation coefficient and reported in squared form as the incremental "r squared"--were compared for variables…
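The two quantities compared in this abstract can be computed side by side. The following Python sketch is an illustration on synthetic data, not the author's analysis; the function names and the simulated predictors are assumptions. It computes beta weights by z-scoring all variables, and the squared part correlation as the drop in R² when one predictor is removed.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an OLS fit of y on X with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def standardized_betas(X, y):
    """OLS slopes after z-scoring predictors and outcome (beta weights)."""
    Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    yz = (y - y.mean()) / y.std(ddof=1)
    # No intercept is needed: every z-scored variable has mean zero.
    betas, *_ = np.linalg.lstsq(Xz, yz, rcond=None)
    return betas

def incremental_r2(X, y, j):
    """Squared part (semi-partial) correlation of predictor j:
    the drop in R^2 when column j is removed from the full model."""
    return r_squared(X, y) - r_squared(np.delete(X, j, axis=1), y)

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=200)   # correlated with x1
X = np.column_stack([x1, x2])
y = x1 + 0.5 * x2 + rng.normal(size=200)

betas = standardized_betas(X, y)
part_sq = [incremental_r2(X, y, j) for j in range(X.shape[1])]
```

The squared part correlation equals the squared beta weight multiplied by the predictor's tolerance (one minus its R² on the other predictors), so the two measures rank predictors differently only when tolerances differ.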
Directory of Open Access Journals (Sweden)
Hukharnsusatrue, A.
2005-11-01
The objective of this research is to compare methods of estimating multiple regression coefficients in the presence of multicollinearity among independent variables. The estimation methods are the Ordinary Least Squares method (OLS), Restricted Least Squares method (RLS), Restricted Ridge Regression method (RRR) and Restricted Liu method (RL), when the restrictions are true and when they are not true. The study used the Monte Carlo simulation method. The experiment was repeated 1,000 times under each situation. The results are as follows. CASE 1: The restrictions are true. In all cases, the RRR and RL methods have a smaller Average Mean Square Error (AMSE) than the OLS and RLS methods, respectively. The RRR method provides the smallest AMSE when the level of correlations is high and also provides the smallest AMSE for all levels of correlations and all sample sizes when the standard deviation is equal to 5. However, the RL method provides the smallest AMSE when the level of correlations is low or medium, except in the case of a standard deviation equal to 3 and small sample sizes, where the RRR method provides the smallest AMSE. The AMSE varies directly with, from most to least, the level of correlations, the standard deviation and the number of independent variables, but inversely with the sample size. CASE 2: The restrictions are not true. In all cases, the RRR method provides the smallest AMSE, except in the case of a standard deviation equal to 1 and an error of restrictions equal to 5%, where the OLS method provides the smallest AMSE when the level of correlations is low or medium and the sample size is large, but for small sample sizes the RL method provides the smallest AMSE. In addition, when the error of restrictions is increased, the OLS method provides the smallest AMSE for all levels of correlations and all sample sizes, except when the level of correlations is high and the sample sizes are small. Moreover, in the cases where the OLS method provides the smallest AMSE, the RLS method mostly has a smaller AMSE than
Luo, Xingguang; Kranzler, Henry R.; Zuo, Lingjun; Wang, Shuang; Schork, Nicholas J.; Gelernter, Joel
2006-01-01
The set of alcohol-metabolizing enzymes has considerable genetic and functional complexity. The relationships between some alcohol dehydrogenase (ADH) and aldehyde dehydrogenase (ALDH) genes and alcohol dependence (AD) have long been studied in many populations, but not comprehensively. In the present study, we genotyped 16 markers within the ADH gene cluster (including the ADH1A, ADH1B, ADH1C, ADH5, ADH6, and ADH7 genes), 4 markers within the ALDH2 gene, and 38 unlinked ancestry-informative markers in a case-control sample of 801 individuals. Associations between markers and disease were analyzed by a Hardy-Weinberg equilibrium (HWE) test, a conventional case-control comparison, a structured association analysis, and a novel diplotype trend regression (DTR) analysis. Finally, the disease alleles were fine mapped by a Hardy-Weinberg disequilibrium (HWD) measure (J). All markers were found to be in HWE in controls, but some markers showed HWD in cases. Genotypes of many markers were associated with AD. DTR analysis showed that ADH5 genotypes and diplotypes of ADH1A, ADH1B, ADH7, and ALDH2 were associated with AD in European Americans and/or African Americans. The risk-influencing alleles were fine mapped from among the markers studied and were found to coincide with some well-known functional variants. We demonstrated that DTR was more powerful than many other conventional association methods. We also found that several ADH genes and the ALDH2 gene were susceptibility loci for AD, and the associations were best explained by several independent risk genes. PMID:16685648
Directory of Open Access Journals (Sweden)
Paulo Canas Rodrigues
2011-12-01
This paper joins the main properties of joint regression analysis (JRA), a model based on the Finlay-Wilkinson regression to analyse multi-environment trials, and of the additive main effects and multiplicative interaction (AMMI) model. The study compares JRA and AMMI with particular focus on robustness with increasing amounts of randomly selected missing data. The application is made using a data set from a breeding program of durum wheat (Triticum turgidum L., Durum Group) conducted in Portugal. The two models yield similar dominant cultivars (JRA) and winners of mega-environments (AMMI) for the same environments. However, JRA had more stable results as the incidence rate of missing values increased.
Entrepreneurial intention modeling using hierarchical multiple regression
Directory of Open Access Journals (Sweden)
Marina Jeger
2014-12-01
The goal of this study is to identify the contribution of effectuation dimensions to the predictive power of the entrepreneurial intention model over and above that which can be accounted for by other predictors selected and confirmed in previous studies. As is often the case in social and behavioral studies, some variables are likely to be highly correlated with each other. Therefore, the relative amount of variance in the criterion variable explained by each of the predictors depends on several factors such as the order of variable entry and sample specifics. The results show the modest predictive power of two dimensions of effectuation prior to the introduction of the theory of planned behavior elements. The article highlights the main advantages of applying hierarchical regression in social sciences as well as in the specific context of entrepreneurial intention formation, and addresses some of the potential pitfalls that this type of analysis entails.
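The dependence on the order of variable entry noted in this abstract can be seen in a short Python sketch of hierarchical (blockwise) regression. The data are synthetic and the variable names (`attitude`, `norms`, `effectuation`, `intention`) are hypothetical labels chosen for illustration; they are not the study's measures or results.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an OLS fit of y on X with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def hierarchical_r2(blocks, y):
    """Cumulative R^2 after each block is entered, and the increment per step."""
    cumulative = [
        r_squared(np.column_stack(blocks[:k]), y) for k in range(1, len(blocks) + 1)
    ]
    increments = [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]
    return cumulative, increments

rng = np.random.default_rng(7)
n = 250
attitude = rng.normal(size=n)
norms = 0.6 * attitude + 0.8 * rng.normal(size=n)
effectuation = 0.5 * attitude + 0.5 * norms + rng.normal(size=n)
intention = attitude + 0.5 * norms + 0.3 * effectuation + rng.normal(size=n)

tpb_block = np.column_stack([attitude, norms])   # entered as one block
eff_block = effectuation.reshape(-1, 1)

# Same predictors, two entry orders: the increments differ, the total does not.
_, tpb_first = hierarchical_r2([tpb_block, eff_block], intention)
_, eff_first = hierarchical_r2([eff_block, tpb_block], intention)
```

Because the predictors are correlated, the variance credited to a block depends on what was entered before it, which is why the entry order should follow theory rather than convenience.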
Relationship between Multiple Regression and Selected Multivariable Methods.
Schumacker, Randall E.
The relationship of multiple linear regression to various multivariate statistical techniques is discussed. The importance of the standardized partial regression coefficient (beta weight) in multiple linear regression as it is applied in path, factor, LISREL, and discriminant analyses is emphasized. The multivariate methods discussed in this paper…
Multiple Retrieval Models and Regression Models for Prior Art Search
Lopez, Patrice
2009-01-01
This paper presents the system called PATATRAS (PATent and Article Tracking, Retrieval and AnalysiS) realized for the IP track of CLEF 2009. Our approach presents three main characteristics: 1. The usage of multiple retrieval models (KL, Okapi) and term index definitions (lemma, phrase, concept) for the three languages considered in the present track (English, French, German) producing ten different sets of ranked results. 2. The merging of the different results based on multiple regression models using an additional validation set created from the patent collection. 3. The exploitation of patent metadata and of the citation structures for creating restricted initial working sets of patents and for producing a final re-ranking regression model. As we exploit specific metadata of the patent documents and the citation relations only at the creation of initial working sets and during the final post ranking step, our architecture remains generic and easy to extend.
Commonality Analysis for the Regression Case.
Murthy, Kavita
Commonality analysis is a procedure for decomposing the coefficient of determination (R²) in multiple regression analyses into the percent of variance in the dependent variable associated with each independent variable uniquely, and the proportion of explained variance associated with the common effects of predictors in various…
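For the two-predictor case, the decomposition described here reduces to three differences of R² values. The Python sketch below is a minimal illustration on synthetic data; the function names are assumptions, and the general case with more predictors requires all subsets of predictors, which is not shown.

```python
import numpy as np

def r2(predictors, y):
    """R^2 of y on the given predictor columns, from the correlation matrix."""
    R = np.corrcoef(np.column_stack(list(predictors) + [y]), rowvar=False)
    Rxx, rxy = R[:-1, :-1], R[:-1, -1]
    return float(rxy @ np.linalg.solve(Rxx, rxy))

def commonality_two(x1, x2, y):
    """Unique and common variance components for two predictors."""
    r2_1, r2_2, r2_12 = r2([x1], y), r2([x2], y), r2([x1, x2], y)
    return {
        "unique_x1": r2_12 - r2_2,        # what x1 adds beyond x2
        "unique_x2": r2_12 - r2_1,        # what x2 adds beyond x1
        "common": r2_1 + r2_2 - r2_12,    # variance shared by both predictors
        "total": r2_12,
    }

rng = np.random.default_rng(42)
x1 = rng.normal(size=300)
x2 = 0.7 * x1 + rng.normal(scale=0.7, size=300)   # correlated with x1
y = x1 + x2 + rng.normal(size=300)

parts = commonality_two(x1, x2, y)
```

The three components sum to the full-model R² by construction; the common component can be negative when suppression is present.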
Pricing Single Malt Whisky : A Regression Analysis
Bjartmar Hylta, Sanna; Lundquist, Emma
2016-01-01
This thesis examines the factors that affect the price of whisky. Multiple regression analysis is used to model the relationship between the identified covariates that are believed to impact the price of whisky. The optimal marketing strategy for whisky producers in the regions Islay and Campbeltown is discussed. This analysis is based on the Marketing Mix. Furthermore, a Porter’s five forces analysis, focusing on the regions Campbeltown and Islay, is carried out. Finally the findings are summar...
Heteroscedastic regression analysis method for mixed data
Institute of Scientific and Technical Information of China (English)
FU Hui-min; YUE Xiao-rui
2011-01-01
The heteroscedastic regression model was established and the heteroscedastic regression analysis method was presented for mixed data composed of complete data, type-I censored data and type-II censored data from the location-scale distribution. The best unbiased estimates of the regression coefficients, as well as the confidence limits of the location parameter and scale parameter, were given. Furthermore, the point estimates and confidence limits of percentiles were obtained. Thus, the traditional multiple regression analysis method, which is suitable only for complete data from the normal distribution, can be extended to heteroscedastic mixed data and the location-scale distribution. The presented method therefore has a broad range of promising applications.
Chelgani, S.C.; Hart, B.; Grady, W.C.; Hower, J.C.
2011-01-01
The relationship between maceral content plus mineral matter and gross calorific value (GCV) for a wide range of West Virginia coal samples (from 6518 to 15330 BTU/lb; 15.16 to 35.66 MJ/kg) has been investigated by multivariable regression and adaptive neuro-fuzzy inference system (ANFIS). The stepwise least square mathematical method comparison between liptinite, vitrinite, plus mineral matter as input data sets with measured GCV reported a nonlinear correlation coefficient (R²) of 0.83. Using the same data set the correlation between the predicted GCV from the ANFIS model and the actual GCV reported a R² value of 0.96. It was determined that the GCV-based prediction methods, as used in this article, can provide a reasonable estimation of GCV. Copyright © Taylor & Francis Group, LLC.
Demirturk Kocasarac, Husniye; Sinanoglu, Alper; Noujeim, Marcel; Helvacioglu Yigit, Dilek; Baydemir, Canan
2016-05-01
For forensic age estimation, radiographic assessment of third molar mineralization is important between 14 and 21 years which coincides with the legal age in most countries. The spheno-occipital synchondrosis (SOS) is an important growth site during development, and its use for age estimation is beneficial when combined with other markers. In this study, we aimed to develop a regression model to estimate and narrow the age range based on the radiologic assessment of third molar and SOS in a Turkish subpopulation. Panoramic radiographs and cone beam CT scans of 349 subjects (182 males, 167 females) with age between 8 and 25 were evaluated. Four-stage system was used to evaluate the fusion degree of SOS, and Demirjian's eight stages of development for calcification for third molars. The Pearson correlation indicated a strong positive relationship between age and third molar calcification for both sexes (r = 0.850 for females, r = 0.839 for males, P age and SOS fusion for females (r = 0.814), but a moderate relationship was found for males (r = 0.599), P age determination formula using these scores was established.
Yano, Kentaro; Mita, Suzune; Morimoto, Kaori; Haraguchi, Tamami; Arakawa, Hiroshi; Yoshida, Miyako; Yamashita, Fumiyoshi; Uchida, Takahiro; Ogihara, Takuo
2015-09-01
P-glycoprotein (P-gp) regulates absorption of many drugs in the gastrointestinal tract and their accumulation in tumor tissues, but the basis of substrate recognition by P-gp remains unclear. Bitter-tasting phenylthiocarbamide, which stimulates taste receptor 2 member 38 (T2R38), increases P-gp activity and is a substrate of P-gp. This led us to hypothesize that bitterness intensity might be a predictor of P-gp-inhibitor/substrate status. Here, we measured the bitterness intensity of a panel of P-gp substrates and nonsubstrates with various taste sensors, and used multiple linear regression analysis to examine the relationship between P-gp-inhibitor/substrate status and various physical properties, including intensity of bitter taste measured with the taste sensor. We calculated the first principal component analysis score (PC1) as the representative value of bitterness, as all taste sensors' outputs were significantly correlated. The P-gp substrates showed remarkably greater mean bitterness intensity than non-P-gp substrates. We found that the Km values of P-gp substrates were correlated with molecular weight, log P, and PC1 value, and the coefficient of determination (R²) of the linear regression equation was 0.63. This relationship might be useful as an aid to predict P-gp substrate status at an early stage of drug discovery.
Institute of Scientific and Technical Information of China (English)
Lin Li
2011-01-01
Partial least squares (PLS) regression was applied to the Lunar Soil Characterization Consortium (LSCC) dataset for spectral estimation of TiO2. The LSCC dataset was split into a number of subsets, including the low-Ti, high-Ti, total mare, total highland, Apollo 16, and Apollo 14 soils, to investigate the effects of interfering minerals and nonlinearity on PLS performance. The PLS weight loading vectors were analyzed through stepwise multiple regression analysis (SMRA) to identify the mineral species driving and interfering with PLS performance. PLS exhibits high performance for estimating TiO2 for the LSCC low-Ti and high-Ti mare samples and for both groups analyzed together. The results suggest that while the dominant TiO2-bearing minerals are few, additional PLS factors are required to compensate for the effects on the important PLS factors of minerals that are not highly correlated with TiO2, to accommodate nonlinear relationships between reflectance and TiO2, and to correct inconsistent mineral-TiO2 correlations between the high-Ti and low-Ti mare samples. Analysis of the LSCC highland soil samples indicates that the Apollo 16 soils are responsible for the large errors of TiO2 estimates when the soils are modeled with other subgroups. For the LSCC Apollo 16 samples, the dominant spectral effects of plagioclase over other dark minerals are primarily responsible for large errors of estimated TiO2. For the Apollo 14 soils, more accurate estimation of TiO2 is attributed to the positive correlation between a major TiO2-bearing component and TiO2, explaining why the Apollo 14 soils follow the regression trend when analyzed with other soil groups.
Credit Scoring Problem Based on Regression Analysis
Khassawneh, Bashar Suhil Jad Allah
2014-01-01
ABSTRACT: This thesis provides an explanatory introduction to the regression models of data mining and contains basic definitions of key terms in the linear, multiple and logistic regression models. Meanwhile, the aim of this study is to illustrate fitting models for the credit scoring problem using simple linear, multiple linear and logistic regression models and also to analyze the found model functions by statistical tools. Keywords: Data mining, linear regression, logistic regression....
Directory of Open Access Journals (Sweden)
St Leger Antony S
2005-02-01
Background: There is a small but growing body of literature highlighting inequities in GP practice prescribing rates for many drug therapies. The aim of this paper is to further explore the equity of prescribing for five major CHD drug groups and to determine how much of the variation in GP practice prescribing rates can be explained by a range of healthcare needs indicators (HCNIs). Methods: The study involved a cross-sectional secondary analysis in four primary care trusts (PCTs 1–4) in the North West of England, including 132 GP practices. Prescribing rates (average daily quantities per registered patient aged over 35 years) and HCNIs were developed for all GP practices. Analysis was undertaken using multiple linear regression. Results: Between 22% and 25% of the variation in prescribing rates for statins, beta-blockers and bendrofluazide was explained in the multiple regression models. Slightly more variation was explained for ACE inhibitors (31.6%) and considerably more for aspirin (51.2%). Prescribing rates were positively associated with CHD hospital diagnoses and procedures for all drug groups other than ACE inhibitors. The proportion of patients aged 55–74 years was positively related to all prescribing rates other than aspirin, where they were positively related to the proportion of patients aged >75 years. However, prescribing rates for statins and ACE inhibitors were negatively associated with the proportion of patients aged >75 years in addition to the proportion of patients from minority ethnic groups. Prescribing rates for aspirin, bendrofluazide and all CHD drugs combined were negatively associated with deprivation. Conclusion: Although around 25–50% of the variation in prescribing rates was explained by HCNIs, this varied markedly between PCTs and drug groups. Prescribing rates were generally characterised by both positive and negative associations with HCNIs, suggesting possible inequities in prescribing rates on the basis
An Additive-Multiplicative Cox-Aalen Regression Model
DEFF Research Database (Denmark)
Scheike, Thomas H.; Zhang, Mei-Jie
2002-01-01
Aalen model; additive risk model; counting processes; Cox regression; survival analysis; time-varying effects
Energy Technology Data Exchange (ETDEWEB)
Diercks, D R; Raske, D T
1976-12-01
The available elevated-temperature, strain-controlled, uniaxial fatigue data on Type 304 stainless steel (474 data points) are tabulated, and variables that influence cyclic life are divided into first- and second-order categories. The first-order variables, which include strain range, strain rate, temperature, and hold time, were used in a multiple linear regression analysis to describe the observed variation in fatigue life for zero and tension hold-time data. Goodness of fit, with respect to these variables, as well as the appropriateness of the transformations used are discussed. Prediction intervals are estimated, and comparisons between the regression equation curves and the data from which they were obtained are made. The second-order variables include the laboratories at which the data were generated, the different heats from which the test specimens were fabricated, and the heat treatments that preceded testing. These variables were statistically analyzed to determine their effect on fatigue life. The results are discussed, and the heats and heat treatments that are most resistant to fatigue damage under these loading and environmental conditions are identified.
Directory of Open Access Journals (Sweden)
Qiutong Jin
2016-06-01
Estimating the spatial distribution of precipitation is an important and challenging task in hydrology, climatology, ecology, and environmental science. In order to generate a highly accurate distribution map of average annual precipitation for the Loess Plateau in China, multiple linear regression Kriging (MLRK) and geographically weighted regression Kriging (GWRK) methods were employed using precipitation data from the period 1980–2010 from 435 meteorological stations. The predictors in regression Kriging were selected by stepwise regression analysis from many auxiliary environmental factors, such as elevation (DEM), normalized difference vegetation index (NDVI), solar radiation, slope, and aspect. All predictor distribution maps had a 500 m spatial resolution. Validation precipitation data from 130 hydrometeorological stations were used to assess the prediction accuracies of the MLRK and GWRK approaches. Results showed that both prediction maps with a 500 m spatial resolution interpolated by MLRK and GWRK had a high accuracy and captured detailed spatial distribution data; however, MLRK produced a lower prediction error and a higher variance explanation than GWRK, although the differences were small, in contrast to conclusions from similar studies.
Retail sales forecasting with application the multiple regression
Directory of Open Access Journals (Sweden)
Kuzhda, Tetyana
2012-05-01
The article begins with a formulation of predictive learning called the multiple regression model. The theoretical approach to constructing regression models is described. The key information in the article is the mathematical formulation of the linear forecast equation that estimates the multiple regression model. The calculation of the forecast value of the dependent variable under the influence of the independent variables is explained. The paper presents retail sales forecasting with multiple-model estimation. Information obtained from multiple regression can inform some of the most important decisions a retailer makes. Recent changes in the retail environment are attributed to expected consumer income and advertising costs. Checks of the model's goodness of fit and statistical significance are explored. Finally, the quantitative value of the retail sales forecast based on the multiple regression model is calculated.
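The forecasting workflow outlined in this abstract (fit a linear equation, check goodness of fit, then compute a forecast) can be sketched in Python as follows. The figures below are hypothetical illustrative values, not data from the article, and the names `income`, `advertising`, `sales` and `forecast` are assumptions.

```python
import numpy as np

# Hypothetical quarterly data (illustrative only, not taken from the article).
income = np.array([40.0, 42.0, 45.0, 47.0, 50.0, 53.0, 55.0, 58.0])
advertising = np.array([5.0, 5.5, 5.2, 6.0, 6.8, 6.5, 7.2, 8.0])
sales = np.array([112.0, 118.0, 121.0, 130.0, 139.0, 142.0, 150.0, 160.0])

# Design matrix with an intercept column: sales = b0 + b1*income + b2*advertising.
A = np.column_stack([np.ones_like(income), income, advertising])
coef, *_ = np.linalg.lstsq(A, sales, rcond=None)

# Goodness of fit: share of the variance in sales explained by the model.
fitted = A @ coef
residuals = sales - fitted
r2 = 1.0 - (residuals @ residuals) / np.sum((sales - sales.mean()) ** 2)

def forecast(income_next, advertising_next):
    """Point forecast of sales for given predictor values."""
    return float(coef @ np.array([1.0, income_next, advertising_next]))

sales_forecast = forecast(60.0, 8.5)
```

A point forecast like this extrapolates slightly beyond the observed predictor range, so in practice it would be reported with a prediction interval and a significance check of the coefficients, neither of which is shown here.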
The use of multiple linear regression in property valuation
Directory of Open Access Journals (Sweden)
Marko Pejić
2013-05-01
Property appraisal is of great importance for a country and its economy. Nowadays, a successful land management system cannot be imagined without a subsystem related to the market economy. Having information about land and its value offers broad possibilities for the market economy and strongly influences the development of the real estate market. Special attention should be paid to mass appraisal methods and their use in developing the tax system and a framework for an appropriate property appraisal system. Multiple regression analysis is just one of the methods used for this purpose, and this article focuses on its characteristics and advantages in the development of a mass appraisal system.
Riccardi, M; Mele, G; Pulvento, C; Lavini, A; d'Andria, R; Jacobsen, S-E
2014-06-01
Leaf chlorophyll content provides valuable information about physiological status of plants; it is directly linked to photosynthetic potential and primary production. In vitro assessment by wet chemical extraction is the standard method for leaf chlorophyll determination. This measurement is expensive, laborious, and time consuming. Over the years alternative methods, rapid and non-destructive, have been explored. The aim of this work was to evaluate the applicability of a fast and non-invasive field method for estimation of chlorophyll content in quinoa and amaranth leaves based on RGB components analysis of digital images acquired with a standard SLR camera. Digital images of leaves from different genotypes of quinoa and amaranth were acquired directly in the field. Mean values of each RGB component were evaluated via image analysis software and correlated to leaf chlorophyll provided by standard laboratory procedure. Single and multiple regression models using RGB color components as independent variables have been tested and validated. The performance of the proposed method was compared to that of the widely used non-destructive SPAD method. Sensitivity of the best regression models for different genotypes of quinoa and amaranth was also checked. Color data acquisition of the leaves in the field with a digital camera was quick, more effective, and lower cost than SPAD. The proposed RGB models provided better correlation (highest R²) and prediction (lowest RMSEP) of the true value of foliar chlorophyll content and had a lower amount of noise in the whole range of chlorophyll studied compared with SPAD and other leaf image processing based models when applied to quinoa and amaranth. PMID:24442792
REPRESENTATIVE VARIABLES IN A MULTIPLE REGRESSION MODEL
Directory of Open Access Journals (Sweden)
Barbu Bogdan POPESCU
2013-02-01
Econometric models developed for the analysis of banking exclusion in the economic crisis are presented. Access to public goods and services is a "sine qua non" condition for an open and efficient society. In our opinion, the availability of banking and payment services to the entire population without discrimination should be the primary objective of public service policy.
Jaccard, James; And Others
1990-01-01
Issues in the detection and interpretation of interaction effects between quantitative variables in multiple regression analysis are discussed. Recent discussions associated with problems of multicollinearity are reviewed in the context of the conditional nature of multiple regression with product terms. (TJH)
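A small sketch of why a product term tends to be collinear with its components, and how mean-centering, one device commonly discussed in this literature, reduces that correlation. The abstract does not name centering, so treating it as the relevant remedy is an assumption; the data and the helper names `pearson` and `center` are illustrative.

```python
def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5


def center(values):
    """Subtract the mean from every value."""
    m = sum(values) / len(values)
    return [v - m for v in values]


# Hypothetical quantitative predictors.
x = [1, 2, 3, 4, 5]
z = [2, 1, 4, 3, 5]

# Product term built from raw scores, and its correlation with x.
raw_product = [a * b for a, b in zip(x, z)]
r_raw = pearson(x, raw_product)

# Product term built from mean-centered scores.
xc, zc = center(x), center(z)
centered_product = [a * b for a, b in zip(xc, zc)]
r_centered = pearson(xc, centered_product)
```

Centering changes the meaning of the lower-order coefficients (they become effects at the mean of the other variable), which is the conditional interpretation the abstract refers to; it does not change the interaction coefficient itself.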
Caselli; Daniele; Mangone; Paolillo
2000-01-15
The apparent pK(a) of dyes in water-in-oil microemulsions depends on the charge of the acid and base forms of the buffers present in the water pool. Extended principal-component analysis allows the precise determination of the apparent pK(a) and of the spectra of the acid and base forms of the dye. Combination with multiple linear regression increases the precision. The pK(a) of 7-hydroxycoumarin (umbelliferone) was spectrophotometrically measured in a water/AOT/isooctane microemulsion in the presence of a series of buffers carrying different charges at various different water/surfactant ratios. The spectra of the acid and base forms of the dye in the microemulsion are very similar to those in bulk water in the presence of Tris and ammonia. The presence of carbonate changes somewhat the spectrum of the acid form. Results are discussed taking into account the profile of the electrostatic potential drop in the water pool and the possible partition of umbelliferone between the aqueous core and the surfactant. The pK(a) values corrected for these effects are independent of w(0) and are close to the value of the pK(a) in bulk water. Copyright 2000 Academic Press.
Caselli, Maurizio; Mangone, Annarosa; Paolillo, Paola; Traini, Angela
2002-01-01
The pKa of 3',3",5',5"tetrabromo-m-cresolsulfonephtalein (Bromocresol Green) and o-cresolsulphonephtalein (Cresol Red) was spectrophotometrically measured in a water/AOT/isooctane microemulsion in the presence of a series of buffers carrying different charges at different water/surfactant ratios. Extended Principal Component Analysis was used for a precise determination of the apparent pKa and of the spectra of the acid and base forms of the dye. The apparent pKa of dyes in water-in-oil microemulsions depends on the charge of the acid and base forms of the buffers present in the water pool. Combination with multiple linear regression increases the precision. Results are discussed taking into account the profile of the electrostatic potential in the water pool and the possible partition of the indicator between the aqueous core and the surfactant. The pKa corrected for these effects are independent of w0 and are close to the value of the pKa in bulk water. On the basis of a tentative hypothesis it is possible to calculate the true pKa of the buffer in the pool.
Energy Technology Data Exchange (ETDEWEB)
Nakagawa, S. [Maizuru National College of Technology, Kyoto (Japan); Kenmoku, Y.; Sakakibara, T. [Toyohashi University of Technology, Aichi (Japan); Kawamoto, T. [Shizuoka University, Shizuoka (Japan). Faculty of Engineering
1996-10-27
Study is under way for a more accurate solar radiation quantity prediction for the enhancement of solar energy utilization efficiency. Utilizing the technique of roughly estimating the day's clearness index from forecast weather, the forecast weather (constituted of weather conditions such as 'clear,' 'cloudy,' etc., and adverbs or adjectives such as 'afterward,' 'temporary,' and 'intermittent') has been quantified relative to the clearness index. This index is named the 'weather index' for the purpose of this article. The error rate of the weather index is high on cloudy days, that is, when the weather index falls in the range 0.2-0.5. It has also been found that there is a high correlation between the clearness index and the north-south wind direction component. A multiple regression analysis has been carried out, under the circumstances, for the estimation of clearness index from the maximum temperature and the north-south wind direction component. As compared with estimation of the clearness index on the basis only of the weather index, estimation using the weather index and maximum temperature achieves a 3% improvement throughout the year. It has also been learned that estimation by use of the weather index and north-south wind direction component enables a 2% improvement for summer and a 5% or higher improvement for winter. 2 refs., 6 figs., 4 tabs.
Multiple kernel support vector regression for pricing nifty option
Directory of Open Access Journals (Sweden)
Neetu Verma
2015-09-01
The goal of the present experiments is to investigate the use of multiple kernel learning as a tool for pricing options in the context of the Indian stock market for Nifty index options. In this paper, the fair price of an option is predicted by Multiple Kernel Support Vector Regression (MKLSVR), using linear combinations of kernels, and by Single Kernel Support Vector Regression (SKSVR). Option prices depend strongly on moneyness conditions such as deep-in-the-money, in-the-money, at-the-money, out-of-money and deep-out-of-money. The experimental study attempts to identify the forecasting errors with the help of mean square error, root mean square error, and normalized root mean square error between the market option prices and the option prices calculated by the model for all market conditions. The results reflect that multiple kernel support vector regression performed fairly well in comparison to support vector regression with a single kernel.
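The three error measures named in the abstract can be sketched as follows. The normalization used for NRMSE varies between authors; dividing by the range of the observed prices is an assumption here, and the price values are made up for illustration.

```python
import math


def mse(observed, predicted):
    """Mean square error between observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean square error."""
    return math.sqrt(mse(observed, predicted))


def nrmse(observed, predicted):
    """RMSE divided by the observed range (one of several conventions)."""
    return rmse(observed, predicted) / (max(observed) - min(observed))


# Hypothetical market prices and model prices.
market = [10.0, 12.0, 14.0]
model = [11.0, 12.0, 13.0]

errors = (mse(market, model), rmse(market, model), nrmse(market, model))
```

Because NRMSE is scale-free, it allows error comparison across moneyness groups whose option prices differ by orders of magnitude.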
Virués-Ortega, Javier
2010-06-01
A number of clinical trials and single-subject studies have been published measuring the effectiveness of long-term, comprehensive applied behavior analytic (ABA) intervention for young children with autism. However, the overall appreciation of this literature through standardized measures has been hampered by the varying methods, designs, treatment features and quality standards of published studies. In an attempt to fill this gap in the literature, state-of-the-art meta-analytical methods were implemented, including quality assessment, sensitivity analysis, meta-regression, dose-response meta-analysis and meta-analysis of studies of different metrics. Results suggested that long-term, comprehensive ABA intervention leads to (positive) medium to large effects in terms of intellectual functioning, language development, acquisition of daily living skills and social functioning in children with autism. Although favorable effects were apparent across all outcomes, language-related outcomes (IQ, receptive and expressive language, communication) were superior to non-verbal IQ, social functioning and daily living skills, with effect sizes approaching 1.5 for receptive and expressive language and communication skills. Dose-dependant effect sizes were apparent by levels of total treatment hours for language and adaptation composite scores. Methodological issues relating ABA clinical trials for autism are discussed. PMID:20223569
Vehicle Travel Time Predication based on Multiple Kernel Regression
Directory of Open Access Journals (Sweden)
Wenjing Xu
2014-07-01
With the rapid development of the transportation and logistics economy, vehicle travel time prediction and planning have become an important topic in logistics. Travel time prediction, which is indispensable for traffic guidance, has become a key issue for researchers in this field. At present, travel time prediction is mainly short-term, and the prediction methods include artificial neural networks, the Kalman filter and support vector regression (SVR). However, these algorithms still have some shortcomings, such as high computational complexity and slow convergence rate. This paper exploits the learning ability of multiple kernel learning regression (MKLR) in nonlinear prediction and applies MKLR-based logistics planning to vehicle travel time prediction. The method includes the following steps: (1) preprocessing historical data; (2) selecting an appropriate kernel function, training on the historical data and performing analysis; (3) predicting the vehicle travel time based on the trained model. The experimental results show that, compared with other prediction methods, the vehicle travel time prediction method proposed in this paper achieves higher accuracy. They also illustrate the feasibility and effectiveness of the proposed prediction method.
Gaussian process regression analysis for functional data
Shi, Jian Qing
2011-01-01
Gaussian Process Regression Analysis for Functional Data presents nonparametric statistical methods for functional regression analysis, specifically the methods based on a Gaussian process prior in a functional space. The authors focus on problems involving functional response variables and mixed covariates of functional and scalar variables. Covering the basics of Gaussian process regression, the first several chapters discuss functional data analysis, theoretical aspects based on the asymptotic properties of Gaussian process regression models, and new methodological developments for high dime
Steganalysis of LSB Image Steganography using Multiple Regression and Auto Regressive (AR) Model
Directory of Open Access Journals (Sweden)
Souvik Bhattacharyya
2011-07-01
The staggering growth in communication technology and usage of public domain channels (i.e. the Internet) has greatly facilitated transfer of data. However, such open communication channels have greater vulnerability to security threats causing unauthorized information access. Traditionally, encryption is used to realize the communication security. However, important information is not protected once decoded. Steganography is the art and science of communicating in a way which hides the existence of the communication. Important information is firstly hidden in a host data, such as digital image, text, video or audio, etc., and then transmitted secretly to the receiver. Steganalysis is another important topic in information hiding, which is the art of detecting the presence of steganography. In this paper a novel technique for the steganalysis of images has been presented. The proposed technique uses an auto-regressive model to detect the presence of the hidden messages, as well as to estimate the relative length of the embedded messages. Various auto-regressive parameters are used to classify cover images as well as stego images with the help of an SVM classifier. Multiple regression analysis of the cover carrier along with the stego carrier has been carried out in order to find out the existence of the negligible amount of the secret message. Experimental results demonstrate the effectiveness and accuracy of the proposed technique.
Isolating and Examining Sources of Suppression and Multicollinearity in Multiple Linear Regression
Beckstead, Jason W.
2012-01-01
The presence of suppression (and multicollinearity) in multiple regression analysis complicates interpretation of predictor-criterion relationships. The mathematical conditions that produce suppression in regression analysis have received considerable attention in the methodological literature but until now nothing in the way of an analytic…
Teasing out the effect of tutorials via multiple regression
Chasteen, Stephanie V.
2012-02-01
We transformed an upper-division physics course using a variety of elements, including homework help sessions, tutorials, clicker questions with peer instruction, and explicit learning goals. Overall, the course transformations improved student learning, as measured by our conceptual assessment. Since these transformations were multi-faceted, we would like to understand the impact of individual course elements. Attendance at tutorials and homework help sessions was optional, and occurred outside the class environment. In order to identify the impact of these optional out-of-class sessions, given self-selection effects in student attendance, we performed a multiple regression analysis. Even when background variables are taken into account, tutorial attendance is positively correlated with student conceptual understanding of the material - though not with performance on course exams. Other elements that increase student time-on-task, such as homework help sessions and lectures, do not achieve the same impacts.
Schaeck, S.; Karspeck, T.; Ott, C.; Weirather-Koestner, D.; Stoermer, A. O.
2011-03-01
In the first part of this work [1] a field operational test (FOT) on micro-HEVs (hybrid electric vehicles) and conventional vehicles was introduced. Valve-regulated lead-acid (VRLA) batteries in absorbent glass mat (AGM) technology and flooded batteries were applied. The FOT data were analyzed by kernel density estimation. In this publication multiple regression analysis is applied to the same data. Square regression models without interdependencies are used. Hereby, capacity loss serves as dependent parameter and several battery-related and vehicle-related parameters as independent variables. Battery temperature is found to be the most critical parameter. It is proven that flooded batteries operated in the conventional power system (CPS) degrade faster than VRLA-AGM batteries in the micro-hybrid power system (MHPS). A smaller number of FOT batteries were applied in a vehicle-assigned test design where the test battery is repeatedly mounted in a unique test vehicle. Thus, vehicle category and specific driving profiles can be taken into account in multiple regression. Both parameters have only secondary influence on battery degradation, instead, extended vehicle rest time linked to low mileage performance is more serious. A tear-down analysis was accomplished for selected VRLA-AGM batteries operated in the MHPS. Clear indications are found that pSoC-operation with periodically fully charging the battery (refresh charging) does not result in sulphation of the negative electrode. Instead, the batteries show corrosion of the positive grids and weak adhesion of the positive active mass.
Direction of Effects in Multiple Linear Regression Models.
Wiedermann, Wolfgang; von Eye, Alexander
2015-01-01
Previous studies analyzed asymmetric properties of the Pearson correlation coefficient using higher than second order moments. These asymmetric properties can be used to determine the direction of dependence in a linear regression setting (i.e., establish which of two variables is more likely to be on the outcome side) within the framework of cross-sectional observational data. Extant approaches are restricted to the bivariate regression case. The present contribution extends the direction of dependence methodology to a multiple linear regression setting by analyzing distributional properties of residuals of competing multiple regression models. It is shown that, under certain conditions, the third central moments of estimated regression residuals can be used to decide upon direction of effects. In addition, three different approaches for statistical inference are discussed: a combined D'Agostino normality test, a skewness difference test, and a bootstrap difference test. Type I error and power of the procedures are assessed using Monte Carlo simulations, and an empirical example is provided for illustrative purposes. In the discussion, issues concerning the quality of psychological data, possible extensions of the proposed methods to the fourth central moment of regression residuals, and potential applications are addressed. PMID:26609741
Confidence Intervals for an Effect Size Measure in Multiple Linear Regression
Algina, James; Keselman, H. J.; Penfield, Randall D.
2007-01-01
The increase in the squared multiple correlation coefficient ([Delta]R[squared]) associated with a variable in a regression equation is a commonly used measure of importance in regression analysis. The coverage probability that an asymptotic and percentile bootstrap confidence interval includes [Delta][rho][squared] was investigated. As expected,…
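A sketch of the quantity under study: the increase in R² when a second predictor joins a one-predictor model, with a percentile bootstrap interval around it. The two-predictor R² is computed from pairwise correlations, a standard identity; the simulated data, the number of resamples and all function names are illustrative assumptions, not the authors' simulation design.

```python
import random


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5


def delta_r2(y, x1, x2):
    """R^2 of y on (x1, x2) minus R^2 of y on x1 alone."""
    ry1, ry2, r12 = pearson(y, x1), pearson(y, x2), pearson(x1, x2)
    r2_full = (ry1 ** 2 + ry2 ** 2 - 2 * ry1 * ry2 * r12) / (1 - r12 ** 2)
    return r2_full - ry1 ** 2


def percentile_ci(y, x1, x2, reps=500, level=0.95, seed=0):
    """Percentile bootstrap interval: resample cases with replacement."""
    rng = random.Random(seed)
    n = len(y)
    stats = []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(delta_r2([y[i] for i in idx],
                              [x1[i] for i in idx],
                              [x2[i] for i in idx]))
    stats.sort()
    lo = stats[int((1 - level) / 2 * reps)]
    hi = stats[int((1 + level) / 2 * reps) - 1]
    return lo, hi


# Simulated data with correlated predictors (illustrative only).
gen = random.Random(1)
x1 = [gen.gauss(0, 1) for _ in range(60)]
x2 = [0.5 * a + gen.gauss(0, 1) for a in x1]
y = [a + 0.8 * b + gen.gauss(0, 1) for a, b in zip(x1, x2)]

estimate = delta_r2(y, x1, x2)
lower, upper = percentile_ci(y, x1, x2)
```

Since the statistic cannot be negative, a percentile interval can never include values below zero; whether such intervals attain nominal coverage is the question the abstract says was investigated.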
Interpreting Multiple Linear Regression: A Guidebook of Variable Importance
Nathans, Laura L.; Oswald, Frederick L.; Nimon, Kim
2012-01-01
Multiple regression (MR) analyses are commonly employed in social science fields. It is also common for interpretation of results to typically reflect overreliance on beta weights, often resulting in very limited interpretations of variable importance. It appears that few researchers employ other methods to obtain a fuller understanding of what…
Functional linear regression via canonical analysis
He, Guozhong; Wang, Jane-Ling; Yang, Wenjing; 10.3150/09-BEJ228
2011-01-01
We study regression models for the situation where both dependent and independent variables are square-integrable stochastic processes. Questions concerning the definition and existence of the corresponding functional linear regression models and some basic properties are explored for this situation. We derive a representation of the regression parameter function in terms of the canonical components of the processes involved. This representation establishes a connection between functional regression and functional canonical analysis and suggests alternative approaches for the implementation of functional linear regression analysis. A specific procedure for the estimation of the regression parameter function using canonical expansions is proposed and compared with an established functional principal component regression approach. As an example of an application, we present an analysis of mortality data for cohorts of medflies, obtained in experimental studies of aging and longevity.
Applied regression analysis a research tool
Pantula, Sastry; Dickey, David
1998-01-01
Least squares estimation, when used appropriately, is a powerful research tool. A deeper understanding of the regression concepts is essential for achieving optimal benefits from a least squares analysis. This book builds on the fundamentals of statistical methods and provides appropriate concepts that will allow a scientist to use least squares as an effective research tool. Applied Regression Analysis is aimed at the scientist who wishes to gain a working knowledge of regression analysis. The basic purpose of this book is to develop an understanding of least squares and related statistical methods without becoming excessively mathematical. It is the outgrowth of more than 30 years of consulting experience with scientists and many years of teaching an applied regression course to graduate students. Applied Regression Analysis serves as an excellent text for a service course on regression for non-statisticians and as a reference for researchers. It also provides a bridge between a two-semester introduction to...
Directory of Open Access Journals (Sweden)
Halil Ibrahim Cebeci
2009-12-01
This study explores the relationship between student performance and instructional design. The research was conducted at the E-Learning School at a university in Turkey. A list of design factors that had potential influence on student success was created through a review of the literature and interviews with relevant experts. From this, the five most important design factors were chosen. The experts scored 25 university courses on the extent to which they demonstrated the chosen design factors. Multiple-regression and supervised artificial neural network (ANN) models were used to examine the relationship between student grade point averages and the scores on the five design factors. The results indicated that there is no statistical difference between the two models. Both models identified the use of examples and applications as the most influential factor. The ANN model provided more information and was used to predict the course-specific factor values required for a desired level of success.
Whitlock, C. H., III
1977-01-01
Constituents with linear radiance gradients with concentration may be quantified from signals which contain nonlinear atmospheric and surface reflection effects for both homogeneous and non-homogeneous water bodies provided accurate data can be obtained and nonlinearities are constant with wavelength. Statistical parameters must be used which give an indication of bias as well as total squared error to insure that an equation with an optimum combination of bands is selected. It is concluded that the effect of error in upwelled radiance measurements is to reduce the accuracy of the least square fitting process and to increase the number of points required to obtain a satisfactory fit. The problem of obtaining a multiple regression equation that is extremely sensitive to error is discussed.
Regression Analysis and the Sociological Imagination
De Maio, Fernando
2014-01-01
Regression analysis is an important aspect of most introductory statistics courses in sociology but is often presented in contexts divorced from the central concerns that bring students into the discipline. Consequently, we present five lesson ideas that emerge from a regression analysis of income inequality and mortality in the USA and Canada.
Prediction on adsorption ratio of carbon dioxide to methane on coals with multiple linear regression
Institute of Scientific and Technical Information of China (English)
YU Hong-guan; MENG Xian-ming; FAN Wei-tang; YE Jian-ping
2007-01-01
The multiple linear regression equations for adsorption ratio of CO2/CH4 and its coal quality indexes were built with SPSS software on the basis of existing coal quality data and its adsorption amount of CO2 and CH4. The regression equations built were tested with data collected from some S, and the influences of coal quality indexes on adsorption ratio of CO2/CH4 were studied with investigation of regression equations. The study results show that the regression equation for adsorption ratio of CO2/CH4 and volatile matter, ash and moisture in coal can be obtained with multiple linear regression analysis, and that the influence of the same coal quality index with the degree of metamorphosis, or the influence of coal quality indexes for the same coal rank, on adsorption ratio is not consistent.
A Solution to Separation and Multicollinearity in Multiple Logistic Regression.
Shen, Jianzhao; Gao, Sujuan
2008-10-01
In dementia screening tests, item selection for shortening an existing screening test can be achieved using multiple logistic regression. However, maximum likelihood estimates for such logistic regression models often experience serious bias or even non-existence because of separation and multicollinearity problems resulting from a large number of highly correlated items. Firth (1993, Biometrika, 80(1), 27-38) proposed a penalized likelihood estimator for generalized linear models and it was shown to reduce bias and the non-existence problems. The ridge regression has been used in logistic regression to stabilize the estimates in cases of multicollinearity. However, neither solves the problems for each other. In this paper, we propose a double penalized maximum likelihood estimator combining Firth's penalized likelihood equation with a ridge parameter. We present a simulation study evaluating the empirical performance of the double penalized likelihood estimator in small to moderate sample sizes. We demonstrate the proposed approach using a current screening data from a community-based dementia study.
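One way to write the combination the abstract describes, as a sketch: Firth's penalty adds half the log-determinant of the Fisher information to the log-likelihood, and a ridge term subtracts a multiple of the squared coefficient norm. The exact scaling of the ridge term and which coefficients it covers are assumptions here and may differ from the authors' formulation; $\lambda$ denotes the ridge parameter and $I(\beta)$ the Fisher information matrix.

```latex
\ell^{*}(\beta) \;=\; \ell(\beta)
  \;+\; \tfrac{1}{2}\,\log\bigl|I(\beta)\bigr|
  \;-\; \lambda \sum_{j=1}^{p} \beta_j^{2},
\qquad
\ell(\beta) \;=\; \sum_{i=1}^{n}\Bigl[\,y_i\,x_i^{\top}\beta
  \;-\; \log\bigl(1 + e^{x_i^{\top}\beta}\bigr)\Bigr].
```

Read this way, the log-determinant term keeps the estimates finite under separation, while the quadratic term stabilizes them when items are highly correlated.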
Multiple regression analysis on fire and socioeconomic environment
Institute of Scientific and Technical Information of China (English)
蔡晶菁
2012-01-01
Using mathematical software such as SPSS, Excel and MATLAB, fire and the socioeconomic environment were analyzed by scatter plot, correlation analysis, principal component analysis and regression analysis. Taking the national fire situation in 2009 as an example, the influence of socioeconomic environment indicators on fire was studied, which can provide a reference for fire prevention and for the coordinated development of the socioeconomic environment.
Interpret with caution: multicollinearity in multiple regression of cognitive data.
Morrison, Catriona M
2003-08-01
Shibihara and Kondo in 2002 reported a reanalysis of the 1997 Kanji picture-naming data of Yamazaki, Ellis, Morrison, and Lambon-Ralph in which independent variables were highly correlated. Their addition of the variable visual familiarity altered the previously reported pattern of results, indicating that visual familiarity, but not age of acquisition, was important in predicting Kanji naming speed. The present paper argues that caution should be taken when drawing conclusions from multiple regression analyses in which the independent variables are so highly correlated, as such multicollinearity can lead to unreliable output.
Analysis of Inflation in Turkey via Ridge Regression
Directory of Open Access Journals (Sweden)
Duygu Tunalı
2015-12-01
The aim of this study is to analyze inflation in Turkey between the years 2003-2014 and to compare it with inflation in Turkey in the years 1963-1983. When multiple linear regression modeling is used for inflation analysis, a multicollinearity problem occurs between the independent variables. In this study, ridge regression, one of the biased estimation methods, is used to eliminate this problem. The ridge regression method gives β parameter estimates with a smaller mean square error than the least squares method, without removing variables from the model.
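A minimal sketch of the ridge estimator for two nearly collinear predictors, solving (X'X + kI)b = X'y in closed form after centering. The data, the ridge constant k = 1.0 and the function name are illustrative assumptions; the study's variables are not reproduced, and in practice predictors are usually standardized as well as centered.

```python
def ridge_two_predictors(x1, x2, y, k):
    """Ridge coefficients (b1, b2) for centered data; k = 0 gives least squares."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    cy = [v - my for v in y]
    # Entries of X'X + kI and X'y for the centered predictors.
    s11 = sum(a * a for a in c1) + k
    s22 = sum(b * b for b in c2) + k
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * v for a, v in zip(c1, cy))
    s2y = sum(b * v for b, v in zip(c2, cy))
    # Solve the 2x2 system by Cramer's rule.
    det = s11 * s22 - s12 * s12
    return ((s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det)


# Hypothetical, strongly correlated predictors.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]
y = [2.0, 4.1, 6.2, 7.9, 10.1, 11.8]

ols = ridge_two_predictors(x1, x2, y, 0.0)
ridge = ridge_two_predictors(x1, x2, y, 1.0)

# Squared length of each coefficient vector; ridge shrinks it.
ols_norm = sum(b * b for b in ols)
ridge_norm = sum(b * b for b in ridge)
```

The shrinkage introduces bias in exchange for lower variance, and both predictors stay in the model, which matches the trade-off described in the abstract.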
Joint regression analysis and AMMI model applied to oat improvement
Oliveira, A.; Oliveira, T. A.; Mejza, S.
2012-09-01
In our work we present an application of some biometrical methods useful in genotype stability evaluation, namely AMMI model, Joint Regression Analysis (JRA) and multiple comparison tests. A genotype stability analysis of oat (Avena Sativa L.) grain yield was carried out using data of the Portuguese Plant Breeding Board, sample of the 22 different genotypes during the years 2002, 2003 and 2004 in six locations. In Ferreira et al. (2006) the authors state the relevance of the regression models and of the Additive Main Effects and Multiplicative Interactions (AMMI) model, to study and to estimate phenotypic stability effects. As computational techniques we use the Zigzag algorithm to estimate the regression coefficients and the agricolae-package available in R software for AMMI model analysis.
Institute of Scientific and Technical Information of China (English)
王爱光; 崔蕾; 王桂玲; 李建志; 张梅芳; 赵敏; 孙爱华; 单容
2013-01-01
Objective: To determine the independent predictors influencing liver stiffness detection with FibroScan by multiple regression analysis. Methods: One hundred and eighty-one inpatients and outpatients who required liver biopsy were enrolled. Liver stiffness measurement (LSM) was performed with FibroScan on the day of the liver biopsy, and clinical information and routine biochemical test data (± 3 days) were gathered. Ten factors were chosen for the multiple regression analysis. Results: Multiple regression analysis showed that platelet count (PLT), serum albumin (ALB), prothrombin activity (PTA) and body mass index (BMI) were independent predictors. Conclusion: In FibroScan detection, PLT, ALB, PTA and BMI might be influencing factors.
Directory of Open Access Journals (Sweden)
M. Srinivasan
2012-01-01
Problem statement: This study presents a novel method for the determination of average winding temperature rise of transformers under its predetermined field operating conditions. Rise in the winding temperature was determined from the estimated values of winding resistance during the heat run test conducted as per IEC standard. Approach: The estimation of hot resistance was modeled using Multiple Variable Regression (MVR), Multiple Polynomial Regression (MPR) and soft computing techniques such as Artificial Neural Network (ANN) and Adaptive Neuro Fuzzy Inference System (ANFIS). The modeled hot resistance will help to find the load losses at any load situation without using complicated measurement set up in transformers. Results: These techniques were applied for the hot resistance estimation for dry type transformer by using the input variables cold resistance, ambient temperature and temperature rise. The results are compared and they show a good agreement between measured and computed values. Conclusion: According to our experiments, the proposed methods are verified using experimental results, which have been obtained from temperature rise test performed on a 55 kVA dry-type transformer.
A Regression Analysis Model Based on Wavelet Networks
Institute of Scientific and Technical Information of China (English)
XIONG Zheng-feng
2002-01-01
In this paper, an approach is proposed to combine wavelet networks and techniques of regression analysis. The resulting wavelet regression estimator is well suited for regression estimation of moderately large dimension, in particular for regressions with localized irregularities.
He, Dan; Kuhn, David; Parida, Laxmi
2016-01-01
Given a set of biallelic molecular markers, such as SNPs, with genotype values encoded numerically on a collection of plant, animal or human samples, the goal of genetic trait prediction is to predict the quantitative trait values by simultaneously modeling all marker effects. Genetic trait prediction is usually represented as linear regression models. In many cases, for the same set of samples and markers, multiple traits are observed. Some of these traits might be correlated with each other. Therefore, modeling all the multiple traits together may improve the prediction accuracy. In this work, we view the multitrait prediction problem from a machine learning angle: as either a multitask learning problem or a multiple output regression problem, depending on whether different traits share the same genotype matrix or not. We then adapted multitask learning algorithms and multiple output regression algorithms to solve the multitrait prediction problem. We proposed a few strategies to improve the least square error of the prediction from these algorithms. Our experiments show that modeling multiple traits together could improve the prediction accuracy for correlated traits. Availability and implementation: The programs we used are either public or directly from the referred authors, such as MALSAR (http://www.public.asu.edu/~jye02/Software/MALSAR/) package. The Avocado data set has not been published yet and is available upon request. Contact: dhe@us.ibm.com PMID:27307640
Modeling Pan Evaporation for Kuwait by Multiple Linear Regression
Directory of Open Access Journals (Sweden)
Jaber Almedeij
2012-01-01
Evaporation is an important parameter for many projects related to hydrology and water resources systems. This paper constitutes the first study conducted in Kuwait to obtain empirical relations for the estimation of daily and monthly pan evaporation as functions of available meteorological data: temperature, relative humidity, and wind speed. The data used for the modeling are daily measurements with substantially continuous coverage over a period of 17 years between January 1993 and December 2009, which can be considered representative of the desert climate of the urban zone of the country. A multiple linear regression technique is used with a variable selection procedure to fit the best model forms. The relations of evaporation with temperature and relative humidity are also transformed, using power and exponential functions respectively, in order to linearize the curvilinear patterns in the data. The evaporation models suggested with the best variable combinations were shown to produce results in reasonable agreement with observed values.
Overcoming multicollinearity in multiple regression using correlation coefficient
Zainodin, H. J.; Yap, S. J.
2013-09-01
Multicollinearity occurs when independent variables are highly correlated. It then becomes difficult to separate the contributions of these variables to the dependent variable, because they compete to explain much of the same variance. Multicollinearity also violates an assumption of multiple regression: that there is no collinearity among the candidate independent variables. An alternative approach to overcoming multicollinearity and arriving at a well-represented model is therefore introduced. The approach removes the variables that are the source of multicollinearity on the basis of correlation coefficient values from the full correlation matrix. Using the full correlation matrix makes it straightforward to implement the removal with Excel functions. The procedure is found to be easier and time-saving, especially when dealing with a greater number of independent variables in a model and a large number of possible models. The paper presents the procedure in detail, with comparison and implementation.
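The correlation-based screening described in this abstract can be sketched roughly as follows. This is only an illustration, not the authors' Excel procedure: the function names, the toy data and the 0.95 cutoff are assumptions made here, and the abstract does not state which threshold or tie-breaking rule the paper uses.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def high_correlation_pairs(columns, threshold=0.95):
    """Return (name_i, name_j, r) for predictor pairs with |r| >= threshold.

    `threshold` is a hypothetical cutoff chosen for this sketch.
    """
    names = list(columns)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= threshold:
                flagged.append((a, b, r))
    return flagged


# Toy predictors: x2 is almost exactly 2 * x1, x3 is unrelated to both.
predictors = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "x2": [2.1, 3.9, 6.2, 8.0, 9.9],
    "x3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
pairs = high_correlation_pairs(predictors)
for a, b, r in pairs:
    print(f"{a} and {b} are highly correlated (r = {r:.4f})")
```

One variable from each flagged pair would then be a candidate for removal before the regression is refitted; which one to drop is a modeling decision that this sketch leaves open.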
Functional data analysis of generalized regression quantiles
Guo, Mengmeng
2013-11-05
Generalized regression quantiles, including the conditional quantiles and expectiles as special cases, are useful alternatives to the conditional means for characterizing a conditional distribution, especially when the interest lies in the tails. We develop a functional data analysis approach to jointly estimate a family of generalized regression quantiles. Our approach assumes that the generalized regression quantiles share some common features that can be summarized by a small number of principal component functions. The principal component functions are modeled as splines and are estimated by minimizing a penalized asymmetric loss measure. An iterative least asymmetrically weighted squares algorithm is developed for computation. While separate estimation of individual generalized regression quantiles usually suffers from large variability due to lack of sufficient data, by borrowing strength across data sets, our joint estimation approach significantly improves the estimation efficiency, which is demonstrated in a simulation study. The proposed method is applied to data from 159 weather stations in China to obtain the generalized quantile curves of the volatility of the temperature at these stations. © 2013 Springer Science+Business Media New York.
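To make the idea of an iterative least asymmetrically weighted squares computation concrete, here is a minimal sketch for the simplest possible case: the tau-expectile of a single sample (a constant-only model). It is not the functional, spline-based estimator of the paper; the function name and the toy sample are assumptions for illustration.

```python
def expectile(values, tau, tol=1e-12, max_iter=1000):
    """tau-expectile of a sample by iterated asymmetrically weighted least squares."""
    m = sum(values) / len(values)  # start from the mean (the 0.5-expectile)
    for _ in range(max_iter):
        # Weight tau for observations above the current estimate, 1 - tau otherwise.
        weights = [tau if v > m else 1.0 - tau for v in values]
        new_m = sum(w * v for w, v in zip(weights, values)) / sum(weights)
        if abs(new_m - m) < tol:
            return new_m
        m = new_m
    return m


sample = [1.0, 2.0, 3.0, 4.0, 10.0]
mean_like = expectile(sample, 0.5)  # equal weights, so this is the sample mean
upper = expectile(sample, 0.9)      # pulled toward the upper tail
print(mean_like, upper)
```

Each pass is an ordinary weighted least-squares fit with weights fixed from the previous estimate, which is why the same loop extends to regression settings by replacing the weighted mean with a weighted regression fit.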
He, Dan; Kuhn, David; Parida, Laxmi
2016-01-01
Given a set of biallelic molecular markers, such as SNPs, with genotype values encoded numerically on a collection of plant, animal or human samples, the goal of genetic trait prediction is to predict the quantitative trait values by simultaneously modeling all marker effects. Genetic trait prediction is usually represented as linear regression models. In many cases, for the same set of samples and markers, multiple traits are observed. Some of these traits might be correlated with each other...
Takagi, Daisuke; Ikeda, Ken'ichi; Kawachi, Ichiro
2012-11-01
Crime is an important determinant of public health outcomes, including quality of life, mental well-being, and health behavior. A body of research has documented the association between community social capital and crime victimization. The association between social capital and crime victimization has been examined at multiple levels of spatial aggregation, ranging from entire countries, to states, metropolitan areas, counties, and neighborhoods. In multilevel analysis, the spatial boundaries at level 2 are most often drawn from administrative boundaries (e.g., Census tracts in the U.S.). One problem with adopting administrative definitions of neighborhoods is that it ignores spatial spillover. We conducted a study of social capital and crime victimization in one ward of Tokyo city, using a spatial Durbin model with an inverse-distance weighting matrix that assigned each respondent a unique level of "exposure" to social capital based on all other residents' perceptions. The study is based on a postal questionnaire sent to 20-69 years old residents of Arakawa Ward, Tokyo. The response rate was 43.7%. We examined the contextual influence of generalized trust, perceptions of reciprocity, two types of social network variables, as well as two principal components of social capital (constructed from the above four variables). Our outcome measure was self-reported crime victimization in the last five years. In the spatial Durbin model, we found that neighborhood generalized trust, reciprocity, supportive networks and two principal components of social capital were each inversely associated with crime victimization. By contrast, a multilevel regression performed with the same data (using administrative neighborhood boundaries) found generally null associations between neighborhood social capital and crime. Spatial regression methods may be more appropriate for investigating the contextual influence of social capital in homogeneous cultural settings such as Japan.
DEFF Research Database (Denmark)
Riccardi, M.; Mele, G.; Pulvento, C.;
2014-01-01
is expensive, laborious, and time consuming. Over the years alternative methods, rapid and non-destructive, have been explored. The aim of this work was to evaluate the applicability of a fast and non-invasive field method for estimation of chlorophyll content in quinoa and amaranth leaves based on RGB...... components analysis of digital images acquired with a standard SLR camera. Digital images of leaves from different genotypes of quinoa and amaranth were acquired directly in the field. Mean values of each RGB component were evaluated via image analysis software and correlated to leaf chlorophyll provided...... for different genotypes of quinoa and amaranth was also checked. Color data acquisition of the leaves in the field with a digital camera was quick, more effective, and lower cost than SPAD. The proposed RGB models provided better correlation (highest R²) and prediction (lowest RMSEP) of the true value...
Multiple Regression Analysis of Milk Yield and Milk Composition in Dairy Cows
Institute of Scientific and Technical Information of China (English)
张巧娥; 吴学荣; 马水鱼; 邢燕
2011-01-01
Twenty lactating Holstein cows of the same parity and at a similar stage of lactation were selected. Multiple regression analysis of milk yield against milk protein percentage, milk fat percentage, dry matter content, somatic cell count and milk urea nitrogen was carried out with SAS 8.2. Single-index regression between milk yield and each milk component showed that milk yield was significantly negatively correlated with milk fat percentage, somatic cell count and dry matter content, whereas its correlations with milk protein percentage and milk urea nitrogen were not significant. Multiple regression showed that milk protein percentage, milk fat percentage and dry matter content had larger effects on milk yield than somatic cell count and milk urea nitrogen, and that milk protein percentage, milk fat percentage, somatic cell count and milk urea nitrogen were inversely related to milk yield.
Forecasting Gold Prices Using Multiple Linear Regression Method
Directory of Open Access Journals (Sweden)
Z. Ismail
2009-01-01
Problem statement: Forecasting is a management function that assists decision making. It is also described as the process of estimation in unknown future situations. More generally it is known as prediction, which refers to the estimation of time series or longitudinal data. Gold is a precious yellow commodity once used as money. It was made illegal in USA 41 years ago, but is now once again accepted as a potential currency. The demand for this commodity is on the rise. Approach: The objective of this study was to develop a forecasting model for predicting gold prices based on economic factors such as inflation, currency price movements and others. Following the melt-down of the US dollar, investors are putting their money into gold because gold plays an important role as a stabilizing influence for investment portfolios. Due to the increase in demand for gold in Malaysia and other parts of the world, it is necessary to develop a model that reflects the structure and pattern of the gold market and forecasts the movement of the gold price. The most appropriate approach to understanding gold prices is the Multiple Linear Regression (MLR) model. MLR studies the relationship between a single dependent variable and one or more independent variables, in this case with gold price as the single dependent variable. The fitted MLR model will be used to predict future gold prices. A naive model known as forecast-1 was considered as a benchmark in order to evaluate the performance of the model. Results: Many factors determine the price of gold and, based on a hunch of experts, several economic factors had been identified as influencing gold prices. Variables such as the Commodity Research Bureau future index (CRB); USD/Euro Foreign Exchange Rate (EUROUSD); Inflation rate (INF); Money Supply (M1); New York Stock Exchange (NYSE); Standard and Poor 500 (SPX); Treasury Bill (T-BILL) and US Dollar index (USDX) were considered to
Optimization of DWDM Demultiplexer Using Regression Analysis
Directory of Open Access Journals (Sweden)
Venkatachalam Rajarajan Balaji
2016-01-01
We propose a novel twelve-channel Dense Wavelength Division Multiplexing (DWDM) demultiplexer using a two-dimensional photonic crystal (2D PC) with a square resonant cavity (SRC), conforming to the ITU-T G.694.1 standard. The DWDM demultiplexer consists of an input waveguide, the SRC, and an output waveguide. The SRC in the proposed demultiplexer consists of a square resonator and a microcavity. The microcavity center rod radius (Rm) is proportional to refractive index. The refractive index property of the rods filters the wavelengths of odd and even channels. The proposed microcavity can filter twelve ITU-T G.694.1 standard wavelengths with 0.2 nm/25 GHz channel spacing between the wavelengths. From the simulation, we optimize the rod radius and wavelength with linear regression analysis. The regression analysis achieves 95% accuracy with an average quality factor of 7890, a uniform spectral line-width of 0.2 nm, a transmission efficiency of 90%, crosstalk of −42 dB, and a footprint of about 784 μm².
Institute of Scientific and Technical Information of China (English)
江家靖; 郭文斌(通讯作者)
2014-01-01
Objective: To explore the factors influencing depression in patients with neurosis, and their mechanism of action, by multiple linear regression and path analysis. Methods: Fifty-five inpatients with a specialist-confirmed diagnosis of neurosis in the open wards of the First Affiliated Hospital of Guangxi Medical University were surveyed. The CES-D, ATQ, CTQ, SAS, SCSQ and SSRS were used to explore the factors influencing depression. Multiple linear regression and path analysis were used to examine the extent and mode of influence of life events, coping style and social support on depression. Results: Stress response, automatic thoughts, life events, subjective support, objective support, and active or negative coping were the main factors in depression. The factors differed in their influence on depression. Automatic thoughts and coping style affected depression directly. Social support had no direct relation to depression; its path coefficient was not statistically significant. Life events influenced the occurrence of depression indirectly, mediated by coping style. Conclusions: Depression arises from the combined effect of life events, coping style and social support. Multiple regression analysis and path analysis each contribute to exploring the mechanism of depression, and their results are mutually complementary.
Regression Discontinuity Designs with Multiple Rating-Score Variables
Reardon, Sean F.; Robinson, Joseph P.
2012-01-01
In the absence of a randomized control trial, regression discontinuity (RD) designs can produce plausible estimates of the treatment effect on an outcome for individuals near a cutoff score. In the standard RD design, individuals with rating scores higher than some exogenously determined cutoff score are assigned to one treatment condition; those…
Institute of Scientific and Technical Information of China (English)
吴启凡
2015-01-01
The aging of China's population is becoming increasingly evident, yet current models of population aging remain problematic. When studying population aging in China, using multiple regression alone raises the problem of multicollinearity. Avoiding it requires variable selection, but stepwise regression may then discard partially correlated disturbance terms that could have a significant effect. Moreover, forecasting with a regression model alone can produce large errors over long time series. We therefore combine the age-shift algorithm, which produces fine-grained forecasts of the individual regression factors, with the regression equation for the aggregate calculation, which substantially improves prediction accuracy. This paper studies the male population, female population, urban population, rural population and other factors dynamically. Influencing factors are first screened by correlation analysis; multiple linear regression is then used to quantify the relationship between population aging and the relevant factors of population structure. Because stepwise regression here produced partially correlated disturbance terms that could not pass the test, we use two standardization methods together with the Mann-Whitney U test for verification. Finally, the age-shift model and the regression matrix are used to forecast the trend of population aging, and the forecasts are analyzed and evaluated.
Multiple predictor smoothing methods for sensitivity analysis.
Energy Technology Data Exchange (ETDEWEB)
Helton, Jon Craig; Storlie, Curtis B.
2006-08-01
The use of multiple predictor smoothing methods in sampling-based sensitivity analyses of complex models is investigated. Specifically, sensitivity analysis procedures based on smoothing methods employing the stepwise application of the following nonparametric regression techniques are described: (1) locally weighted regression (LOESS), (2) additive models, (3) projection pursuit regression, and (4) recursive partitioning regression. The indicated procedures are illustrated with both simple test problems and results from a performance assessment for a radioactive waste disposal facility (i.e., the Waste Isolation Pilot Plant). As shown by the example illustrations, the use of smoothing procedures based on nonparametric regression techniques can yield more informative sensitivity analysis results than can be obtained with more traditional sensitivity analysis procedures based on linear regression, rank regression or quadratic regression when nonlinear relationships between model inputs and model predictions are present.
An Effect Size for Regression Predictors in Meta-Analysis
Aloe, Ariel M.; Becker, Betsy Jane
2012-01-01
A new effect size representing the predictive power of an independent variable from a multiple regression model is presented. The index, denoted as r[subscript sp], is the semipartial correlation of the predictor with the outcome of interest. This effect size can be computed when multiple predictor variables are included in the regression model…
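For readers unfamiliar with the semipartial correlation, the following sketch computes it for the two-predictor case from ordinary correlations and checks the textbook identity that its square equals the gain in R² from adding the predictor. The function names and the example correlations are assumptions for illustration; the abstract does not show how the authors recover r_sp from published regression results.

```python
from math import sqrt


def semipartial_r(r_y1, r_y2, r_12):
    """Semipartial correlation of x1 with y, with x2 partialled out of x1 only."""
    return (r_y1 - r_y2 * r_12) / sqrt(1.0 - r_12 ** 2)


def r_squared_two_predictors(r_y1, r_y2, r_12):
    """R^2 of the regression of y on x1 and x2, from pairwise correlations."""
    return (r_y1 ** 2 + r_y2 ** 2 - 2.0 * r_y1 * r_y2 * r_12) / (1.0 - r_12 ** 2)


# Hypothetical correlations: y with x1, y with x2, x1 with x2.
r_y1, r_y2, r_12 = 0.5, 0.4, 0.3
r_sp = semipartial_r(r_y1, r_y2, r_12)
r2_full = r_squared_two_predictors(r_y1, r_y2, r_12)

# The squared semipartial equals the increase in R^2 when x1 joins a model with x2.
print(r_sp ** 2, r2_full - r_y2 ** 2)
```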
Institute of Scientific and Technical Information of China (English)
陈诚; 吴华瑞; 秦向阳
2014-01-01
Agriculture is regional and seasonal, so the information needs of the users of agricultural information services are relatively complex. Constructing an evaluation model of information service quality makes it possible to analyze the factors that influence service quality and to provide guidance for agricultural information services. This paper constructs an index system for evaluating the information service quality of agricultural websites, and combines factor analysis and multiple regression to build an evaluation model that is close to practice and has relatively high credibility. Experiments show that the model built with factor analysis and multiple regression has good accuracy, and that it offers some guidance for improving the information service quality of agricultural information websites.
Institute of Scientific and Technical Information of China (English)
李媛
2011-01-01
The Artificial Fish Swarm Algorithm (AFSA) is a new intelligent bionic algorithm with an autonomous-agent optimization mode based on animal behavior, constructed from the activity patterns of fish. This paper briefly introduces the basic principles of AFSA and describes the steps and results of using AFSA to solve multiple linear regression analysis problems. Simulation experiments show that AFSA is a simple and efficient algorithm for multiple linear regression analysis problems.
Institute of Scientific and Technical Information of China (English)
刘枬; 梁晨
2014-01-01
Drawing on Keynesian theory and real estate bubble theory, the paper takes statistical yearbook data for 2000 to 2012 as the sample, selects three factors that affect house prices (per capita annual income in the current year, newly added housing area, and commercial housing price in the previous year), measures their influence on house prices using correlation analysis and multiple linear regression, identifies the main factors behind house price fluctuations, and offers suggestions for controlling house prices.
van Gaans, P. F. M.; Vriend, S. P.
In geoscience, ridge regression is usually a more appropriate technique than ordinary least-squares regression, especially when the predictor variables are highly intercorrelated. A FORTRAN 77 program, RIDGE, for ridged multiple linear regression is presented. The theory of linear regression and ridge regression is treated to allow careful interpretation of the results and to explain the structure of the program. The program gives various parameters for evaluating the extent of multicollinearity within a given regression problem, such as the correlation matrix, multiple correlations among the predictors, variance inflation factors, eigenvalues, condition number, and the determinant of the predictor correlation matrix. The best method for choosing the optimum ridge parameter has not yet been established. Estimates of the ridge bias, ridged variance inflation factors, and estimates and norms for the ridge parameter are therefore given as output by RIDGE and should complement inspection of the ridge traces. Application within the earth sciences is discussed.
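A small numerical sketch may help show what a variance inflation factor and a ridge trace look like. It is written in Python rather than FORTRAN 77 and is not the RIDGE program: it covers only two standardized predictors, where the ridge solution (R + kI)^-1 r has a closed form, and the correlations and ridge parameters k are invented for illustration.

```python
def vif_two_predictors(r_12):
    """Variance inflation factor shared by both predictors when there are only two."""
    return 1.0 / (1.0 - r_12 ** 2)


def ridge_standardized(r_1y, r_2y, r_12, k):
    """Standardized ridge coefficients for two predictors: solve (R + kI) b = r."""
    det = (1.0 + k) ** 2 - r_12 ** 2
    b1 = ((1.0 + k) * r_1y - r_12 * r_2y) / det
    b2 = ((1.0 + k) * r_2y - r_12 * r_1y) / det
    return b1, b2


# Hypothetical, strongly collinear predictors (r_12 = 0.98).
r_1y, r_2y, r_12 = 0.80, 0.75, 0.98
vif = vif_two_predictors(r_12)

# A crude ridge trace: coefficients as the ridge parameter k grows from 0 (OLS).
trace = {k: ridge_standardized(r_1y, r_2y, r_12, k) for k in (0.0, 0.05, 0.1, 0.5)}
print(f"VIF = {vif:.2f}")
for k, (b1, b2) in trace.items():
    print(f"k = {k:4.2f}  b1 = {b1:7.4f}  b2 = {b2:7.4f}")
```

With these made-up numbers the least-squares solution (k = 0) gives a large positive and a negative coefficient even though both predictors correlate positively with the response, and a small k already shrinks both toward moderate positive values. This is the kind of instability a ridge trace is meant to reveal.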
Nie, Z Q; Ou, Y Q; Zhuang, J; Qu, Y J; Mai, J Z; Chen, J M; Liu, X Q
2016-05-10
Conditional and unconditional logistic regression are commonly used in case-control studies, whereas the Cox proportional hazards model is often used in survival data analysis. Most of the literature refers only to main-effect models. However, generalized linear models differ from general linear models, and interaction comprises multiplicative interaction and additive interaction. The former has only statistical significance, whereas the latter has biological significance. In this paper, macros were written in SAS 9.4 to calculate the contrast ratio, the attributable proportion due to interaction and the synergy index alongside the interaction terms of logistic and Cox regression, and Wald, delta and profile likelihood confidence intervals were used to evaluate additive interaction, as a reference for big data analysis in clinical epidemiology and for the analysis of genetic multiplicative and additive interactions. PMID:27188374
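The three additive-interaction measures named in this abstract have simple point-estimate formulas, sketched below in Python rather than SAS. The sketch assumes that "contrast ratio" refers to the interaction contrast ratio, also known as the relative excess risk due to interaction (RERI); the function name and the relative risks are hypothetical, and the confidence intervals discussed in the abstract are not reproduced.

```python
def additive_interaction(rr10, rr01, rr11):
    """Point estimates of additive interaction from three relative risks.

    rr10: exposure A only, rr01: exposure B only, rr11: both exposures,
    each relative to the doubly unexposed group.
    """
    reri = rr11 - rr10 - rr01 + 1.0                 # relative excess risk due to interaction
    ap = reri / rr11                                # attributable proportion due to interaction
    s = (rr11 - 1.0) / ((rr10 - 1.0) + (rr01 - 1.0))  # synergy index
    return reri, ap, s


# Hypothetical relative risks (odds or hazard ratios are often used as approximations).
reri, ap, s = additive_interaction(rr10=2.0, rr01=3.0, rr11=7.0)
print(reri, ap, s)
```

No additive interaction corresponds to RERI = 0, AP = 0 and S = 1; the values here (RERI = 3, S = 2) would indicate a joint effect larger than the sum of the separate effects.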
Regression Analysis with a Stochastic Design Variable
Sazak, Hakan S.; Tiku, Moti L.; Qamarul Islam, M.
2006-01-01
In regression models, the design variable has primarily been treated as a nonstochastic variable. In numerous situations, however, the design variable is stochastic. The estimation and hypothesis testing problems in such situations are considered. Real life examples are given.
Tu, Y-K; Kellett, M; Clerehugh, V; Gilthorpe, M S
2005-10-01
Multivariable analysis is a widely used statistical methodology for investigating associations amongst clinical variables. However, the problems of collinearity and multicollinearity, which can give rise to spurious results, have in the past frequently been disregarded in dental research. This article illustrates and explains the problems which may be encountered, in the hope of increasing awareness and understanding of these issues, thereby improving the quality of the statistical analyses undertaken in dental research. Three examples from different clinical dental specialties are used to demonstrate how to diagnose the problem of collinearity/multicollinearity in multiple regression analyses and to illustrate how collinearity/multicollinearity can seriously distort the model development process. Lack of awareness of these problems can give rise to misleading results and erroneous interpretations. Multivariable analysis is a useful tool for dental research, though only if its users thoroughly understand the assumptions and limitations of these methods. It would benefit evidence-based dentistry enormously if researchers were more aware of both the complexities involved in multiple regression when using these methods and of the need for expert statistical consultation in developing study design and selecting appropriate statistical methodologies.
Institute of Scientific and Technical Information of China (English)
章少萍; 马守治; 陈熙; 童新文; 李秀容; 张维文
2011-01-01
evaluated before and after cementation. Multiple linear regression analysis was used to determine whether the independent variables mentioned above had an impact on the MDAC. Results: Marginal discrepancies increased significantly after cementation. The backward multiple regression analysis showed that the FLP, TAAWP, HP, MDBC, and PLRC were joint predictors of the MDAC. Conclusion: The FLP and MDBC may have a weak influence on the MDAC, while the TAAWP, HP and PLRC affect the MDAC more significantly.
Institute of Scientific and Technical Information of China (English)
黄双萍; 洪添胜; 岳学军; 吴伟斌; 蔡坤; 徐兴
2013-01-01
experimental results: First, compared with various transformations of the spectral data (first-derivative spectrum, second-derivative spectrum, reciprocal spectrum, logarithmic spectrum, and logarithm of the reciprocal spectrum), the original hyperspectral reflectance data, used as the vector descriptor of the samples, gave the best results with the approach in this paper. Second, the model performs best and is most robust when the Radial Basis Function (RBF) is used as the SVR kernel and PCA retains principal components up to a cumulative contribution rate of 99.9%. Third, comparative experiments between our method and other mainstream multivariate regression algorithms demonstrate the validity of modeling with SVR and PCA: our method is clearly superior to Partial Least Squares (PLS), Back Propagation (BP) and Stepwise Multiple Linear Regression (SMLR). Finally, building the regression model with SVR on PCA-processed data achieved the desired performance, which indicates the effectiveness of the proposed method and provides a theoretical basis for applying hyperspectral reflectance to non-destructive detection of nitrogen levels. Rapid, accurate and non-destructive detection of nitrogen (N) content in citrus leaves is of great practical significance for the precise, dynamic management of N fertilizer application to citrus trees. Taking 117 orchard-grown Luogang orange trees as the study objects, hyperspectral reflectance of healthy leaves was collected at different growth stages with an ASD FieldSpec3, and the hyperspectral reflectance data or their transformations served as the multivariate vector description of the tree samples. The true N content of the leaves was measured over the same period by the Kjeldahl method. After reducing the dimension of the high-dimensional spectral vectors with PCA, support vector regression (SVR) was used to establish the mapping between the hyperspectral multivariate representation and N content, so as to predict the N content of any citrus tree. The results show that on the test set the coefficient of determination R² between predicted and true values was 0.9730 and the mean relative error was 0.9033%,
Tightness of M-estimators for multiple linear regression in time series
DEFF Research Database (Denmark)
Johansen, Søren; Nielsen, Bent
We show tightness of a general M-estimator for multiple linear regression in time series. The positive criterion function for the M-estimator is assumed lower semi-continuous and sufficiently large for large argument: Particular cases are the Huber-skip and quantile regression. Tightness requires...
Herring, Jennifer C.
This study reviewed the statistical practices in published research articles in the Journal of Education for Students Placed at Risk to determine the reporting of effect sizes and structure coefficients. Of the 12 quantitative studies found in the last 3 volumes of the journal, only 3 were identified as using multiple regression analysis. Two of…
Seeboonruang, U.
2013-12-01
Time series techniques have been extensively applied to research works of many academic disciplines, particularly those concerned with economics and environment. This paper presents application of a time series multiple linear regression technique to a groundwater system to predict groundwater level and salinity fluctuations in a saline area in the northeastern part of Thailand. Surface and groundwater interaction is the major mechanism controlling the shallow subsurface system and salinity of the area. The basic technique is based on the lagged correlation between hydrologic, and hydrogeological and environmental parameters. As a result of a large irrigation project in the area, several regulating gates have been installed to control flooding to the downstream rivers and to provide the upstream areas with sufficient irrigating water. From the lagged correlation analysis, the shallow groundwater and groundwater salinity fluctuation in the irrigating area are shown to be dependent upon the surface water levels at the installed regulated gates and prior rainfall. A set of multiple linear regression equations with lagged time dependent function are then formulated. The dependent variables are groundwater level and groundwater salinity while the independent variables are rainfall rates and water levels measured at the regulating gates. After calibration and verification, the model, as an alternative to the conventional method which requires detailed and continuous variables and is costlier, can be used to forecast and manage future groundwater systems.
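The lagged-correlation step that precedes the regression in this abstract can be illustrated with a short sketch. The series, the function names and the maximum lag are invented for illustration; the abstract does not give the actual lags or data used in the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def lagged_correlation(driver, response, lag):
    """Correlation between driver[t - lag] and response[t], for lag >= 0."""
    if lag == 0:
        return pearson(driver, response)
    return pearson(driver[:-lag], response[lag:])


def best_lag(driver, response, max_lag):
    """Lag in 0..max_lag with the largest absolute lagged correlation."""
    return max(range(max_lag + 1),
               key=lambda k: abs(lagged_correlation(driver, response, k)))


# Synthetic example: the groundwater level follows rainfall two time steps later.
rain = [0.0, 5.0, 1.0, 8.0, 2.0, 0.0, 6.0, 3.0, 0.0, 4.0]
level = [10.0, 10.0] + [10.0 + 0.5 * r for r in rain[:-2]]
lag = best_lag(rain, level, max_lag=4)
print("rainfall leads groundwater level by", lag, "steps")
```

The lag selected for each driver would then determine which shifted copy of that series enters the multiple linear regression as an independent variable.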
Epistasis analysis for quantitative traits by functional regression model.
Zhang, Futao; Boerwinkle, Eric; Xiong, Momiao
2014-06-01
The critical barrier in interaction analysis for rare variants is that most traditional statistical methods for testing interactions were originally designed for testing the interaction between common variants and are difficult to apply to rare variants because of their prohibitive computational time and poor ability. The great challenges for successful detection of interactions with next-generation sequencing (NGS) data are (1) lack of methods for interaction analysis with rare variants, (2) severe multiple testing, and (3) time-consuming computations. To meet these challenges, we shift the paradigm of interaction analysis between two loci to interaction analysis between two sets of loci or genomic regions and collectively test interactions between all possible pairs of SNPs within two genomic regions. In other words, we take a genome region as a basic unit of interaction analysis and use high-dimensional data reduction and functional data analysis techniques to develop a novel functional regression model to collectively test interactions between all possible pairs of single nucleotide polymorphisms (SNPs) within two genome regions. By intensive simulations, we demonstrate that the functional regression models for interaction analysis of the quantitative trait have the correct type 1 error rates and a much better ability to detect interactions than the current pairwise interaction analysis. The proposed method was applied to exome sequence data from the NHLBI's Exome Sequencing Project (ESP) and CHARGE-S study. We discovered 27 pairs of genes showing significant interactions after applying the Bonferroni correction (P-values < 4.58 × 10⁻¹⁰) in the ESP, and 11 were replicated in the CHARGE-S study.
Zahari, Siti Meriam; Ramli, Norazan Mohamed; Moktar, Balkiah; Zainol, Mohammad Said
2014-09-01
In the presence of multicollinearity and multiple outliers, statistical inference for a linear regression model based on ordinary least squares (OLS) estimators is severely affected and produces misleading results. Many approaches have been investigated to overcome this. They include robust methods, which are reported to be less sensitive to outliers, and ridge regression, which is used to tackle multicollinearity. To mitigate both problems, this study discusses a combination of ridge regression and robust methods. The performance of this approach is examined when multicollinearity and multiple outliers occur simultaneously in multiple linear regression. The study compares several well-known estimators: M, MM and RIDGE, together with the robust ridge regression estimators Weighted Ridge M-estimator (WRM), Weighted Ridge MM (WRMM) and Ridge MM (RMM). The results show that, with simultaneous multicollinearity and multiple outliers in both the x- and y-directions, RMM and RIDGE perform similarly and are superior to the other estimators, regardless of the number of observations, the level of collinearity and the percentage of outliers. However, when outliers occur in the y-direction only, WRMM is the best of the robust ridge regression estimators, producing the least variance. In conclusion, robust ridge regression is the best alternative to robust and conventional least squares estimators when multicollinearity and outliers are present simultaneously.
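The ridge component of this comparison can be illustrated with a small sketch. It covers plain ridge regression only, not the robust variants (WRM, WRMM, RMM) studied in the paper; the data and the ridge constant `k = 1` are made up, and `ridge_2d` is a hypothetical helper. For centred predictors the ridge estimate solves (X'X + kI) b = X'y, and k = 0 gives OLS.

```python
def ridge_2d(x1, x2, y, k):
    """Ridge coefficients for two centred predictors with ridge constant k.

    Solves (X'X + k I) b = X'y for the 2x2 case by Cramer's rule.
    k = 0 reproduces ordinary least squares (no intercept, centred data).
    """
    a = sum(u * u for u in x1) + k          # (X'X + kI)[0][0]
    b = sum(u * v for u, v in zip(x1, x2))  # off-diagonal element
    d = sum(v * v for v in x2) + k          # (X'X + kI)[1][1]
    g1 = sum(u * t for u, t in zip(x1, y))  # first element of X'y
    g2 = sum(v * t for v, t in zip(x2, y))  # second element of X'y
    det = a * d - b * b
    return ((d * g1 - b * g2) / det, (a * g2 - b * g1) / det)


# Made-up, centred data: x2 is almost a copy of x1, and y is roughly x1 + x2.
x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x2 = [-2.1, -0.9, 0.0, 1.1, 1.9]
y = [-4.0, -2.2, 0.3, 1.9, 4.0]

ols = ridge_2d(x1, x2, y, k=0.0)
ridge = ridge_2d(x1, x2, y, k=1.0)
print("OLS:  ", [round(c, 3) for c in ols])
print("ridge:", [round(c, 3) for c in ridge])
```

With these numbers OLS splits the effect into one large positive and one negative coefficient, while the ridge estimate keeps both close to 1, which is the stabilising behaviour that motivates ridge under collinearity.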
Elzamly, Abdelrafe; Hussin, Burairah
2014-01-01
The aim of this paper is to propose new mining techniques by which we can study the impact of different risk management techniques and different software risk factors on software analysis development projects. The new mining technique uses the fuzzy multiple regression analysis techniques with fuzzy concepts to manage the software risks in a software project and mitigating risk with software process improvement. Top ten software risk factors in analysis phase and thirty risk management techni...
Survival Analysis with Multivariate adaptive Regression Splines
Kriner, Monika
2007-01-01
Multivariate adaptive regression splines (MARS) are a useful tool to identify linear and nonlinear effects and interactions between two covariates. In this dissertation a new proposal to model survival type data with MARS is introduced. Martingale and deviance residuals of a Cox PH model are used as response in a common MARS approach to model functional forms of covariate effects as well as possible interactions in a data-driven way. Simulation studies prove that the new method yields a bett...
Institute of Scientific and Technical Information of China (English)
王晨羽; 徐骞; 陈紫薇; 林育芳
2015-01-01
Objective: To explore the impact of social networking services (SNS) and various factors on emotions, depression, and self-esteem. Methods: The Chinese Affect Scale, the Center for Epidemiologic Studies Depression Scale (CES-D) and the Self-Esteem Scale (SES) were used to collect the data, which were described statistically and analyzed by multiple logistic regression in SPSS for Windows 19.0. Results: ① The 512 college students were 20.50 ± 1.49 years old on average. They had used SNS for 7.16 ± 2.67 years on average, logged in to SNS 14.31 ± 15.96 times per day, spent 2.81 ± 2.04 hours per day on SNS, and their longest single session lasted 2.98 ± 2.76 hours on average. ② Logistic regression analysis of positive emotions showed that the OR of "engineering students" was 0.53 (P=0.079) compared with "arts students". ③ Logistic regression analysis of negative emotions showed that the ORs of "age", "years of using SNS" and "time spent on SNS per day" were 1.14 (P=0.063), 0.90 (P=0.008) and 1.09 (P=0.080). The OR of students who could not stand to stop using SNS for a month was 2.41 (P=0.003) compared with students who would feel more relaxed. ④ Logistic regression analysis of CES-D showed that the OR of "years of using SNS" was 0.89 (P=0.007). The ORs of junior and senior students were 1.69 (P=0.086) and 2.74 (P=0.002) compared with freshmen. The ORs of students who could not stand to stop using SNS for a month and of students who did not care were 2.62 (P=0.002) and 1.87 (P=0.023) compared with students who would feel more relaxed. ⑤ Logistic regression analysis of SES showed that the OR of "times logging in to SNS per day" was 1.01 (P=0.056). The ORs of engineering students and science students were 0.56 (P=0.046) and 0.49 (P=0.028) compared with arts students. The OR of students from cities was 1.27 (P=0.032) compared with students from towns and villages. Conclusion: ① Major has an effect on positive emotions. Age, the years of using SNS, the time
Simple multiple regression model for long range forecasting of Indian summer monsoon rainfall
Digital Repository Service at National Institute of Oceanography (India)
Sadhuram, Y.; Murthy, T.V.R.
) and ISMR is found to be 0.62. The multiple correlation using the above two parameters is 0.85 which explains 72% variance in ISMR. Using the above two parameters a linear multiple regression model to predict ISMR is developed. The results are comparable...
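The two figures quoted in this fragment are linked by squaring the multiple correlation coefficient. As a quick check, using only the value 0.85 reported above:

```latex
R = 0.85 \quad\Longrightarrow\quad R^{2} = 0.85^{2} = 0.7225 \approx 72\%
```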
Urrutia, Jackie D.; Tampis, Razzcelle L.; Mercado, Joseph; Baygan, Aaron Vito M.; Baccay, Edcon B.
2016-02-01
The objective of this research is to formulate a mathematical model for the Philippines' Real Gross Domestic Product (Real GDP). Four factors are considered as independent variables that may influence Real GDP in the Philippines (y): Consumers' Spending (x1), Government's Spending (x2), Capital Formation (x3) and Imports (x4). The researchers used a Normal Estimation Equation in matrix form to create the model for Real GDP, with α = 0.01. They analyzed quarterly data from 1990 to 2013, acquired from the National Statistical Coordination Board (NSCB), giving a total of 96 observations for each variable. The data, in particular the dependent variable (y), underwent a logarithmic transformation to satisfy the assumptions of multiple linear regression analysis. The model was formulated using matrices in MATLAB. Based on the results, only three of the independent variables are significant for the dependent variable, namely Consumers' Spending (x1), Capital Formation (x3) and Imports (x4), and hence can predict Real GDP (y). The coefficient of determination is 98.7%, i.e. the independent variables explain 98.7% of the variation in the dependent variable. With a result of 97.6% in the paired t-test, the predicted values obtained from the model showed no significant difference from the actual values of Real GDP. This research will be useful in appraising forthcoming changes and in helping the Government implement policies for the development of the economy.
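The matrix approach named in this abstract is presumably the normal equations, (X'X) b = X'y. The sketch below shows that calculation in plain Python on toy data; it is not the authors' MATLAB code, it uses two predictors instead of four, it omits the logarithmic transformation, and all function names are illustrative.

```python
def transpose(m):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Matrix product of two list-of-rows matrices."""
    bt = transpose(b)
    return [[sum(p * q for p, q in zip(row, col)) for col in bt] for row in a]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [v - f * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_normal_equations(rows, y):
    """Return b solving (X'X) b = X'y, with an intercept column prepended."""
    X = [[1.0] + list(r) for r in rows]
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(p * q for p, q in zip(col, y)) for col in Xt]
    return solve(XtX, Xty)


# Toy data generated exactly from y = 1 + 2*x1 + 3*x2 (not the GDP data).
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
y = [1 + 2 * a + 3 * b for a, b in rows]

coef = fit_normal_equations(rows, y)
print([round(c, 6) for c in coef])  # intercept, then one slope per predictor
```

Because the toy response is an exact linear function of the predictors, the recovered coefficients match the generating values up to rounding. For real data a numerically stabler solver (QR or SVD) is usually preferred over forming X'X explicitly.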
Multiple Regression Prediction Model for Cutting Forces in Turning Carbon-Reinforced PEEK CF30
Directory of Open Access Journals (Sweden)
Francisco Mata
2010-01-01
Among the thermoplastic polymers available, polyetheretherketone reinforced with 30% carbon fibres (PEEK CF30) demonstrates a particularly good combination of strength, rigidity, and hardness, which is ideal for industrial applications. Considering these properties and potential areas of application, it is necessary to investigate the machining of PEEK CF30. In this study, response surface methodology was applied to predict the cutting forces in turning operations using TiN-coated cutting tools under dry conditions, where the machining parameters are cutting speed, feed rate, and depth of cut. The experiments were conducted using a full factorial design of experiments (DOE) on a CNC turning machine. Based on statistical analysis, a multiple quadratic regression model for cutting forces was derived with a satisfactory R-squared correlation. This model proved to be highly performant for predicting cutting forces.
Multiattribute shopping models and ridge regression analysis
Timmermans, HJP Harry
1981-01-01
Policy decisions regarding retailing facilities essentially involve multiple attributes of shopping centres. If mathematical shopping models are to contribute to these decision processes, their structure should reflect the multiattribute character of retailing planning. Examination of existing models shows that most operational shopping models include only two policy variables. A serious problem in the calibration of the existing multiattribute shopping models is that of multicollinearity ari...
Directory of Open Access Journals (Sweden)
Željko V. Račić
2010-12-01
This paper aims to present the specifics of applying the multiple linear regression model. The economic (financial) crisis is analyzed in terms of gross domestic product as a function of the foreign trade balance on the one hand and credit cards, i.e. the indebtedness of the population on this basis, on the other, in the USA from 1999 to 2008. We used the extended application model, which shows how the analyst should run the whole development process of a regression model. This process began with simple statistical features and the application of regression procedures, and ended with residual analysis, intended to study the compatibility of the data and the model settings. The paper also analyzes the values of some standard statistics used in selecting an appropriate regression model. The model was tested using the PASW Statistics 17 program.
Tools to support interpreting multiple regression in the face of multicollinearity.
Kraha, Amanda; Turner, Heather; Nimon, Kim; Zientek, Linda Reichwein; Henson, Robin K
2012-01-01
While multicollinearity may increase the difficulty of interpreting multiple regression (MR) results, it should not cause undue problems for the knowledgeable researcher. In the current paper, we argue that rather than using one technique to investigate regression results, researchers should consider multiple indices to understand the contributions that predictors make not only to a regression model, but to each other as well. Some of the techniques to interpret MR effects include, but are not limited to, correlation coefficients, beta weights, structure coefficients, all possible subsets regression, commonality coefficients, dominance weights, and relative importance weights. This article will review a set of techniques to interpret MR effects, identify the elements of the data on which the methods focus, and identify statistical software to support such analyses.
Nonparametric survival analysis using Bayesian Additive Regression Trees (BART).
Sparapani, Rodney A; Logan, Brent R; McCulloch, Robert E; Laud, Purushottam W
2016-07-20
Bayesian additive regression trees (BART) provide a framework for flexible nonparametric modeling of relationships of covariates to outcomes. Recently, BART models have been shown to provide excellent predictive performance, for both continuous and binary outcomes, and exceeding that of its competitors. Software is also readily available for such outcomes. In this article, we introduce modeling that extends the usefulness of BART in medical applications by addressing needs arising in survival analysis. Simulation studies of one-sample and two-sample scenarios, in comparison with long-standing traditional methods, establish face validity of the new approach. We then demonstrate the model's ability to accommodate data from complex regression models with a simulation study of a nonproportional hazards scenario with crossing survival functions and survival function estimation in a scenario where hazards are multiplicatively modified by a highly nonlinear function of the covariates. Using data from a recently published study of patients undergoing hematopoietic stem cell transplantation, we illustrate the use and some advantages of the proposed method in medical investigations. Copyright © 2016 John Wiley & Sons, Ltd. PMID:26854022
On asymptotics of t-type regression estimation in multiple linear model
Institute of Scientific and Technical Information of China (English)
无
2004-01-01
We consider a robust estimator (t-type regression estimator) of multiple linear regression model by maximizing marginal likelihood of a scaled t-type error t-distribution. The marginal likelihood can also be applied to the de-correlated response when the within-subject correlation can be consistently estimated from an initial estimate of the model based on the independent working assumption. This paper shows that such a t-type estimator is consistent.
Directory of Open Access Journals (Sweden)
Mohammad Reza Marami Milani
2016-07-01
This study focuses on multiple linear regression models relating six climate indices (temperature humidity index THI, environmental stress index ESI, equivalent temperature index ETI, heat load index HLI, modified HLI (HLI new), and respiratory rate predictor RRP) to three main components of cow's milk (yield, fat, and protein) for cows in Iran. The least absolute shrinkage and selection operator (LASSO) and the Akaike information criterion (AIC) are applied to select the best model for the milk predictands with the smallest number of climate predictors. Uncertainty is estimated by bootstrapping through resampling. Cross validation is used to avoid over-fitting. Climatic parameters are calculated from the NASA-MERRA global atmospheric reanalysis. Milk data for the months April to September, 2002 to 2010, are used. The best linear regression models are found in spring between milk yield as the predictand and THI, ESI, ETI, HLI, and RRP as predictors, with p-value < 0.001 and R2 (0.50, 0.49 respectively). In summer, milk yield with the independent variables THI, ETI, and ESI shows the highest relation (p-value < 0.001) with R2 (0.69). For fat and protein the results are only marginal. This method is suggested for impact studies of climate variability/change in agriculture and food science when short time series or data with large uncertainty are available.
Institute of Scientific and Technical Information of China (English)
谭清武; 李庆华
2009-01-01
Objective: To study the risk factors of multiple organ dysfunction syndrome in the elderly (MODSE). Methods: A retrospective study was conducted on the records of 393 patients aged over 60 (retired military cadres of division rank or above in the Shijiazhuang area) who were hospitalized for lung infection, or developed lung infection in hospital, from 2001 to 2006. According to whether the lung infection induced MODSE, the patients were divided into an MODSE group (n=169) and a non-MODSE group (n=224). Statistically significant risk factors were first screened by single factor analysis, and independent risk factors were then identified by stepwise logistic regression analysis. Results: Single factor analysis showed that age, chronic obstructive pulmonary disease, chronic respiratory failure, pulmonary interstitial fibrosis, pulmonary heart disease, coronary heart disease, chronic cardiac insufficiency, cerebrovascular disease, cervical spondylosis, chronic hepatitis and cirrhosis, diabetes, hyperuricemia, chronic renal failure, malignant tumor, hemoglobin, albumin, urea nitrogen, creatinine and fasting blood glucose were risk factors of MODSE. Stepwise logistic regression analysis showed that chronic obstructive pulmonary disease, chronic respiratory failure, pulmonary fibrosis, chronic cardiac insufficiency, cerebrovascular disease, diabetes, chronic renal failure, low hemoglobin, low albumin, high urea nitrogen and high fasting blood glucose were independent risk factors of MODSE. Conclusion: Chronic obstructive pulmonary disease, chronic respiratory failure, pulmonary fibrosis, chronic cardiac insufficiency, cerebrovascular disease, diabetes, chronic renal failure, low hemoglobin, low albumin, high urea nitrogen and high fasting blood glucose are independent risk factors of MODSE.
Neutron multiplicity analysis tool
Energy Technology Data Exchange (ETDEWEB)
Stewart, Scott L [Los Alamos National Laboratory
2010-01-01
I describe the capabilities of the EXCOM (EXcel based COincidence and Multiplicity) calculation tool which is used to analyze experimental data or simulated neutron multiplicity data. The input to the program is the count-rate data (including the multiplicity distribution) for a measurement, the isotopic composition of the sample and relevant dates. The program carries out deadtime correction and background subtraction and then performs a number of analyses. These are: passive calibration curve, known alpha and multiplicity analysis. The latter is done with both the point model and with the weighted point model. In the current application EXCOM carries out the rapid analysis of Monte Carlo calculated quantities and allows the user to determine the magnitude of sample perturbations that lead to systematic errors. Neutron multiplicity counting is an assay method used in the analysis of plutonium for safeguards applications. It is widely used in nuclear material accountancy by international (IAEA) and national inspectors. The method uses the measurement of the correlations in a pulse train to extract information on the spontaneous fission rate in the presence of neutrons from (α,n) reactions and induced fission. The measurement is relatively simple to perform and gives results very quickly (≤ 1 hour). By contrast, destructive analysis techniques are extremely costly and time consuming (several days). By improving the achievable accuracy of neutron multiplicity counting, a nondestructive analysis technique, it could be possible to reduce the use of destructive analysis measurements required in safeguards applications. The accuracy of a neutron multiplicity measurement can be affected by a number of variables such as density, isotopic composition, chemical composition and moisture in the material. In order to determine the magnitude of these effects on the measured plutonium mass a calculational tool, EXCOM, has been produced using VBA within Excel. This
Institute of Scientific and Technical Information of China (English)
万瑞平; 刘振寰; 林青梅
2011-01-01
Objective: To analyze the factors correlated with quality of life (QOL) in children with cerebral palsy (CP). Methods: Eighty children with CP (CP group) and 80 healthy children of the same age (healthy control group) were evaluated with the Pediatric Quality of Life Inventory Version 4.0 (PedsQL4.0), and the differences in QOL between the 2 groups were compared. Children with CP were also assessed with the Gesell Developmental Scale (GDS) and the Gross Motor Function Classification System (GMFCS) to determine their developmental quotient and severity, and the correlation of QOL with sex, family income, clinical type, GMFCS level and intelligence was analyzed by multiple regression analysis. Results: There were significant differences in physical function/aspect, emotional function, social function, psychological aspect and total QOL between the CP group and the healthy control group (all P < 0.01). Intelligence degree was positively correlated with the total QOL score. Severity degree and intelligence degree were positively correlated with the physical aspect, and age was negatively correlated with it, with severity degree affecting the physical aspect most. Intelligence degree was positively correlated with the psychological aspect. Conclusions: QOL of children with CP is impaired across the full scale. Intelligence and physical function are important factors influencing the QOL of children with CP.
Institute of Scientific and Technical Information of China (English)
章杰宽
2011-01-01
As population aging in China becomes more pronounced, elderly tourism is rapidly becoming an important part of the tourism market. Experience and theory of tourism behavior show that travel frequency, length of stay and amount of tourism consumption are the main indicators of a tourism destination's attractiveness. Over more than two months, the author carried out extensive interviews and a questionnaire survey among elderly tourists at 12 main tourist attractions in Xi'an. Based on 800 questionnaires, the paper analyses the factors influencing domestic elderly tourists' consumption behavior and uses multiple stepwise regression analysis to study the degree of influence of each factor. The results show that 13 main factors affect the travel behavior of older people: physical condition, income, attitude towards tourism, spouse, attitude of sons and daughters, related groups, tourism prices, distance, security, climatic conditions, food and accommodation, transport and tourism attraction. Among these, income and tourism attraction are the common factors affecting travel frequency, length of stay and daily consumption, and income level is the most critical. Specifically, elderly tourists' travel frequency is directly proportional to income, attitude towards tourism, attitude of sons and daughters, physical condition and tourism attraction, and is inversely proportional to distance. The old tourists' residence
3D Regression Heat Map Analysis of Population Study Data.
Klemm, Paul; Lawonn, Kai; Glaßer, Sylvia; Niemann, Uli; Hegenscheid, Katrin; Völzke, Henry; Preim, Bernhard
2016-01-01
Epidemiological studies comprise heterogeneous data about a subject group to define disease-specific risk factors. These data contain information (features) about a subject's lifestyle, medical status as well as medical image data. Statistical regression analysis is used to evaluate these features and to identify feature combinations indicating a disease (the target feature). We propose an analysis approach of epidemiological data sets by incorporating all features in an exhaustive regression-based analysis. This approach combines all independent features w.r.t. a target feature. It provides a visualization that reveals insights into the data by highlighting relationships. The 3D Regression Heat Map, a novel 3D visual encoding, acts as an overview of the whole data set. It shows all combinations of two to three independent features with a specific target disease. Slicing through the 3D Regression Heat Map allows for the detailed analysis of the underlying relationships. Expert knowledge about disease-specific hypotheses can be included into the analysis by adjusting the regression model formulas. Furthermore, the influences of features can be assessed using a difference view comparing different calculation results. We applied our 3D Regression Heat Map method to a hepatic steatosis data set to reproduce results from a data mining-driven analysis. A qualitative analysis was conducted on a breast density data set. We were able to derive new hypotheses about relations between breast density and breast lesions with breast cancer. With the 3D Regression Heat Map, we present a visual overview of epidemiological data that allows for the first time an interactive regression-based analysis of large feature sets with respect to a disease. PMID:26529689
A Spreadsheet Tool for Learning the Multiple Regression F-Test, T-Tests, and Multicollinearity
Martin, David
2008-01-01
This note presents a spreadsheet tool that allows teachers the opportunity to guide students towards answering on their own questions related to the multiple regression F-test, the t-tests, and multicollinearity. The note demonstrates approaches for using the spreadsheet that might be appropriate for three different levels of statistics classes,…
International Nuclear Information System (INIS)
We report a case of tumor regression of multiple bone metastases from breast carcinoma after administration of strontium-89 chloride. This case suggests that strontium-89 chloride can not only relieve bone metastases pain not responsive to analgesics, but may also have a tumoricidal effect on bone metastases
Study for DV-HOP localization algorithm based on multiple regression analysis
Institute of Scientific and Technical Information of China (English)
胡燕; 单志龙
2011-01-01
In wireless sensor networks, the localization accuracy of the existing DV-HOP algorithm is poor. This paper proposes a new localization algorithm, RSDV-HOP (RSSI and statistics DV-HOP). The algorithm applies multiple regression analysis to the anchor-node information and uses the resulting regression model across the whole network to locate unknown nodes. Simulation results show that the localization accuracy of RSDV-HOP is markedly better than that of DV-HOP.
Ahn, Kuk-Hyun; Palmer, Richard
2016-09-01
Despite wide use of regression-based regional flood frequency analysis (RFFA) methods, the majority are based on either ordinary least squares (OLS) or generalized least squares (GLS). This paper proposes 'spatial proximity' based RFFA methods using the spatial lagged model (SLM) and spatial error model (SEM). The proposed methods are represented by two frameworks: the quantile regression technique (QRT) and parameter regression technique (PRT). The QRT develops prediction equations for flooding quantiles in average recurrence intervals (ARIs) of 2, 5, 10, 20, and 100 years whereas the PRT provides prediction of three parameters for the selected distribution. The proposed methods are tested using data incorporating 30 basin characteristics from 237 basins in Northeastern United States. Results show that generalized extreme value (GEV) distribution properly represents flood frequencies in the study gages. Also, basin area, stream network, and precipitation seasonality are found to be the most effective explanatory variables in prediction modeling by the QRT and PRT. 'Spatial proximity' based RFFA methods provide reliable flood quantile estimates compared to simpler methods. Compared to the QRT, the PRT may be recommended due to its accuracy and computational simplicity. The results presented in this paper may serve as one possible guidepost for hydrologists interested in flood analysis at ungaged sites.
Background stratified Poisson regression analysis of cohort data
Richardson, David B.; Langholz, Bryan
2011-01-01
Background stratified Poisson regression is an approach that has been used in the analysis of data derived from a variety of epidemiologically important studies of radiation-exposed populations, including uranium miners, nuclear industry workers, and atomic bomb survivors. We describe a novel approach to fit Poisson regression models that adjust for a set of covariates through background stratification while directly estimating the radiation-disease association of primary interest. The approa...
Innovation and market value: a quantile regression analysis
Alex Coad; Rekha Rao
2006-01-01
We construct a new database by matching firm-level Compustat data to NBER patent data, for four 2-digit complex technology sectors. Whilst conventional regression estimators show that the stock market does recognise efforts at innovation, quantile regression analysis adds a new dimension to the literature, suggesting that the influence of innovation on market value varies dramatically across the market value distribution. For firms with a low value of Tobin's q, the stock market will barely r...
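Quantile regression, as used in this abstract, replaces the squared-error loss of least squares with the check (pinball) loss, so that the fit targets a chosen quantile rather than the mean. The sketch below shows only the intercept-only case on made-up numbers (it is not the Compustat/NBER data, and the function names are illustrative): the constant that minimises the total pinball loss at level tau is a sample quantile.

```python
def pinball_loss(residual, tau):
    """Check (pinball) loss for one residual at quantile level tau."""
    return tau * residual if residual >= 0 else (tau - 1) * residual


def total_loss(values, q, tau):
    """Total pinball loss when every observation is predicted by the constant q."""
    return sum(pinball_loss(v - q, tau) for v in values)


def best_constant(values, tau):
    """Constant minimising total pinball loss, searched over the data points."""
    return min(sorted(values), key=lambda q: total_loss(values, q, tau))


values = [1, 2, 3, 4, 100]  # hypothetical skewed sample with one large value
print(best_constant(values, 0.5))  # prints 3, the median
print(best_constant(values, 0.9))  # prints 100, an upper quantile
```

A full quantile regression minimises the same loss over slope coefficients as well, which is how effects can differ between the lower and upper parts of the outcome distribution.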
Linear regression and sensitivity analysis in nuclear reactor design
International Nuclear Information System (INIS)
Highlights: • Presented a benchmark for the applicability of linear regression to complex systems. • Applied linear regression to a nuclear reactor power system. • Performed neutronics, thermal-hydraulics, and energy conversion using the Brayton cycle for the design of a GCFBR. • Performed detailed sensitivity analysis on a set of parameters in a nuclear reactor power system. • Modeled and developed the reactor design using MCNP, regression using R, and thermal-hydraulics in Java. Abstract: The paper presents a general strategy for sensitivity analysis (SA) and uncertainty quantification analysis (UA) of parameters related to a nuclear reactor design. This work also validates the use of linear regression (LR) for predictive analysis in nuclear reactor design. The analysis helps to determine the parameters on which an LR model can be fit for predictive analysis. For those parameters, a regression surface is created from trial data and predictions are made using this surface. A general SA strategy for identifying the influential parameters that affect the operation of the reactor is described. Design parameters are identified, and the linearity assumption required to apply LR to reactor design is validated with a set of tests. The testing methods used to determine the behavior of the parameters can serve as a general strategy for UA and SA of nuclear reactor models and thermal-hydraulics calculations. A design of a gas cooled fast breeder reactor (GCFBR), with thermal-hydraulics and energy transfer, is used to demonstrate the method. MCNP6 is used to simulate the GCFBR design and perform the necessary criticality calculations. Java is used to build and run input samples and to extract data from the MCNP6 output files, and R is used to perform regression analysis, other multivariate variance analysis, and analysis of the collinearity of the data.
Institute of Scientific and Technical Information of China (English)
崔利宏; 何裕民; 倪红梅
2013-01-01
Objective: To study the factors affecting allergy in sub-health people and to prevent the occurrence of allergy. Methods: Possible factors affecting allergy in 6 975 sub-health people were screened using multiple stepwise regression. Thirteen factors entered the multiple stepwise regression model, namely: education level, age, fatigue, digestion, sleep, autonomic nervous function, immunity, aging, constipation, depression, learning, memory, self-realization and sex life. Results: Allergy was correlated with 19 aspects of four areas (physical performance, psychological performance, social adaptation, and sex life) and with age and education level. Allergy was positively correlated with the 19 aspects of the four areas and with education level, and negatively correlated with age; all correlations were statistically significant (P < 0.01). Conclusion: The prevention of allergy should focus on whole-body adjustment and on strengthening people's physique. In clinical practice, the factors affecting allergy should be fully considered in order to avoid missed diagnosis, misdiagnosis, delayed treatment and reduced quality of life.
Simulation Experiments in Practice : Statistical Design and Regression Analysis
Kleijnen, J.P.C.
2007-01-01
In practice, simulation analysts often change only one factor at a time, and use graphical analysis of the resulting Input/Output (I/O) data. Statistical theory proves that more information is obtained when applying Design Of Experiments (DOE) and linear regression analysis. Unfortunately, classic t
International Nuclear Information System (INIS)
Aerosol optical depth (AOD) from AERONET data has a very fine resolution, but the air pollution index (API), visibility and relative humidity from ground truth measurements are coarse. To obtain the local AOD in the atmosphere, the relationship between these three parameters was determined using multiple regression analysis. Data from the southwest monsoon period (August to September, 2012) taken in Penang, Malaysia, were used to establish a quantitative relationship in which the AOD is modeled as a function of API, relative humidity, and visibility. The model with the highest correlation was used to predict AOD values during the southwest monsoon period. When aerosol is not uniformly distributed in the atmosphere, the predicted AOD can deviate strongly from the measured values. These deviating data can be removed by comparing the predicted AOD values with the actual AERONET data, which helps to investigate whether the non-uniform source of the aerosol is at the ground surface or at a higher altitude. The model can predict AOD accurately only if the aerosol is uniformly distributed in the atmosphere. Further study is needed to determine whether the model is suitable for predicting AOD not only in Penang but also in other states of Malaysia, or even globally.
QSAR study of prolylcarboxypeptidase inhibitors by genetic algorithm: Multiple linear regressions
Indian Academy of Sciences (India)
Eslam Pourbasheer; Saadat Vahdani; Reza Aalizadeh; Alireza Banaei; Mohammad Reza Ganjali
2015-07-01
A predictive analysis based on quantitative structure activity relationships (QSAR) was performed on benzimidazolepyrrolidinyl amides as prolylcarboxypeptidase (PrCP) inhibitors. Molecules were represented by chemical descriptors that encode constitutional, topological, geometrical, and electronic structure features. The hierarchical clustering method was used to divide the dataset into training and test subsets. The important descriptors were selected with the aid of the genetic algorithm (GA) method. The QSAR model was constructed using multiple linear regression (MLR), and its robustness and predictability were verified by internal and external cross-validation methods. Furthermore, the calculated domain of applicability defines the area of reliable predictions. The root mean square errors (RMSE) of the training set and the test set for the GA-MLR model were 0.176 and 0.279, and the correlation coefficients (R2) were 0.839 and 0.923, respectively. The proposed model shows good stability, robustness and predictability when verified by internal and external validation.
Multiple regression as a preventive tool for determining the risk of Legionella spp.
Directory of Open Access Journals (Sweden)
Enrique Gea-Izquierdo
2012-04-01
To determine the interrelationship between health and hygiene conditions for prevention of legionellosis, the composition of materials used in water distribution systems, the water origin and Legionella pneumophila risk. Material and methods. A descriptive study and multiple regression analysis were carried out on a sample of golf course sprinkler irrigation systems (n=31) pertaining to hotels located on the Costa del Sol (Malaga, Spain). The study was carried out in 2009. Results. A significant linear relationship was found, with all the independent variables contributing significantly (p<0.05) to the model’s fit. The relationship between water type and the risk of Legionella, as well as between material composition and that risk, is linear and positive. In contrast, the relationship between health-hygiene conditions and Legionella risk is linear and negative. Conclusion. The characterization of Legionella pneumophila concentration, as defined by the risk in water and through use of the predictive method, can contribute to the consideration of new variables influencing the development of the agent, resulting in improved control and prevention of the disease.
Directory of Open Access Journals (Sweden)
Carlos Monge Perry
2014-07-01
Structural equation modeling (SEM) has traditionally been deployed in marketing, consumer satisfaction and preferences, human behavior, and recently in strategic planning. These areas are considered its niches; however, empirical research studies show a marked tendency toward more diversified use of the technique. This paper shows the application of structural equation modeling using partial least squares (PLS-SEM) in the areas of manufacturing, quality, continuous improvement, operational efficiency, and environmental responsibility in Mexico’s medium and large manufacturing plants, using a small sample (n = 40). The results obtained from this PLS-SEM application are highly positive, relevant, and statistically significant. To confirm the validity, reliability, and statistical power of PLS-SEM, the paper also presents a comparative analysis against multiple regression, which gives results very similar to those obtained by PLS-SEM. This validates the use of PLS-SEM in non-traditional areas of scientific research and invites the use of the technique in diversified fields of scientific research.
Institute of Scientific and Technical Information of China (English)
Ghiasi Majid; Askarnejad Nematollah; Dindarloo Saeid R.; Shamsoddini Hamed
2016-01-01
The most important objective of blasting in open pit mines is rock fragmentation. Prediction of produced boulders (oversized crushed rocks) is a key parameter in designing blast patterns. In this study, the amount of boulder produced in blasting operations of Golegohar iron ore open pit mine, Iran, was predicted via the multiple regression method and artificial neural networks. Results of 33 blasts in the mine were collected for modeling. Input variables were: joints spacing, density and uniaxial compressive strength of the intact rock, burden, spacing, stemming, bench height to burden ratio, and specific charge. The dependent variable was the ratio of boulder volume to pattern volume. Both techniques were successful in predicting the ratio. In this study, the multiple regression method was superior, with coefficient of determination and root mean squared error values of 0.89 and 0.19, respectively.
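The two fit statistics quoted in this record, the coefficient of determination and the root mean squared error, can be computed as in the minimal sketch below. It assumes ordinary least squares with an intercept; the data are synthetic placeholders rather than the 33 blast records, and the name `fit_multiple_regression` is hypothetical.

```python
import numpy as np

def fit_multiple_regression(X, y):
    """Ordinary least squares with an intercept; returns (coefficients, R^2, RMSE)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept.
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ beta
    ss_res = float(residuals @ residuals)          # residual sum of squares
    ss_tot = float(((y - y.mean()) ** 2).sum())    # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    rmse = float(np.sqrt(ss_res / len(y)))
    return beta, r2, rmse

# Synthetic, noise-free illustration (hypothetical data): y = 1 + 2*x1 - 3*x2.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(33, 2))
y_demo = 1.0 + 2.0 * X_demo[:, 0] - 3.0 * X_demo[:, 1]
beta_demo, r2_demo, rmse_demo = fit_multiple_regression(X_demo, y_demo)
```

With noise-free data the fit recovers the generating coefficients and R² is essentially 1; on real measurements both statistics should be reported on held-out data where possible.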
Multiple Linear Regression Application on the Inter-Network Settlement of Internet
Institute of Scientific and Technical Information of China (English)
YANG Qing-feng; ZHANG Qi-xiang; LÜ Ting-jie
2006-01-01
This paper develops an analytical framework to explain Internet interconnection settlement issues. The paper shows that multiple linear regression can be used to assess the network value of Internet Backbone Providers (IBPs). By using the exchange rate of each network, we can define a rate of network value, which reflects the contribution of each network to interconnection and the usage of interconnected network resources by each network.
Variable selection in multiple linear regression: The influence of individual cases
SJ Steel; DW Uys
2007-01-01
The influence of individual cases in a data set is studied when variable selection is applied in multiple linear regression. Two different influence measures, based on the C_p criterion and Akaike's information criterion, are introduced. The relative change in the selection criterion when an individual case is omitted is proposed as the selection influence of the specific omitted case. Four standard examples from the literature are considered and the selection influence of the cases is calcul...
Time series analysis using semiparametric regression on oil palm production
Yundari, Pasaribu, U. S.; Mukhaiyar, U.
2016-04-01
This paper presents a semiparametric kernel regression method, which is flexible and computationally simple, especially for estimating density and regression functions. The kernel function is continuous and produces a smooth estimate. The classical kernel density estimator is constructed by completely nonparametric analysis and works reasonably well for all forms of function. Here, we discuss parameter estimation in time series analysis. First, we assume that the parameters exist; then we apply nonparametric estimation, and the combined approach is called semiparametric. The optimum bandwidth is selected by considering an approximation of the mean integrated squared error (MISE).
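As a rough illustration of the kernel smoothing idea in this record, the sketch below implements the classical Nadaraya-Watson estimator with a Gaussian kernel. It is not the authors' semiparametric method; the function names and the demo data are assumptions made for illustration, and the bandwidth is supplied by hand rather than chosen by minimizing an MISE approximation.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, a common smooth kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def nadaraya_watson(x0, xs, ys, bandwidth):
    """Kernel-weighted average of ys, evaluated at the query point x0."""
    weights = [gaussian_kernel((x0 - x) / bandwidth) for x in xs]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, ys)) / total

# Hypothetical demo: a constant series must be reproduced exactly.
xs_demo = [0.0, 1.0, 2.0, 3.0, 4.0]
ys_demo = [5.0, 5.0, 5.0, 5.0, 5.0]
flat_estimate = nadaraya_watson(2.5, xs_demo, ys_demo, bandwidth=1.0)
```

A smaller bandwidth follows the data more closely and a larger one smooths more, which is why bandwidth selection is the central tuning step.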
Ratio Versus Regression Analysis: Some Empirical Evidence in Brazil
Directory of Open Access Journals (Sweden)
Newton Carneiro Affonso da Costa Jr.
2004-06-01
This work compares the traditional methodology for ratio analysis, applied to a sample of Brazilian firms, with the alternative of regression analysis, for both cross-industry and intra-industry samples. The structural validity of the traditional methodology was tested through a model that represents its analogous regression format. The data are from 156 Brazilian public companies in nine industrial sectors for the year 1997. The results provide weak empirical support for the traditional ratio methodology, as its validity may differ between ratios.
Institute of Scientific and Technical Information of China (English)
王建芳
2013-01-01
In this paper, the factors influencing stroke incidence were analyzed. First, a large volume of case information was analyzed statistically; then a mathematical model was built by regression fitting, establishing the relationship between stroke incidence and air temperature, barometric pressure and humidity. Finally, some suggestions were made for high-risk groups. In this way, a complete answer is given to Problem C of the 2012 Higher Education Press Cup National Mathematical Contest in Modeling.
Directory of Open Access Journals (Sweden)
Yoonsu Shin
2016-01-01
In the 5G era, the operational cost of mobile wireless networks will increase significantly. Further, massive network capacity and zero latency will be needed because everything will be connected to mobile networks. Thus, self-organizing networks (SON) are needed, which expedite automatic operation of mobile wireless networks but face challenges in satisfying the 5G requirements. Therefore, researchers have proposed a framework to empower SON using big data. The recent framework of a big data-empowered SON analyzes the relationship between key performance indicators (KPIs) and related network parameters (NPs) using machine-learning tools, and it develops regression models using a Gaussian process with those parameters. The problem, however, is that the methods of finding the NPs related to the KPIs differ individually. Moreover, the Gaussian process regression model cannot determine the relationship between a KPI and its various related NPs. In this paper, to solve these problems, we propose multivariate multiple regression models to determine the relationship between various KPIs and NPs. If we treat one KPI and multiple NPs as one set, the proposed models help us process multiple sets at one time. We can also find out whether some KPIs conflict. We implement the proposed models using MapReduce.
Analysis of Sting Balance Calibration Data Using Optimized Regression Models
Ulbrich, N.; Bader, Jon B.
2010-01-01
Calibration data of a wind tunnel sting balance was processed using a candidate math model search algorithm that recommends an optimized regression model for the data analysis. During the calibration the normal force and the moment at the balance moment center were selected as independent calibration variables. The sting balance itself had two moment gages. Therefore, after analyzing the connection between calibration loads and gage outputs, it was decided to choose the difference and the sum of the gage outputs as the two responses that best describe the behavior of the balance. The math model search algorithm was applied to these two responses. An optimized regression model was obtained for each response. Classical strain gage balance load transformations and the equations of the deflection of a cantilever beam under load are used to show that the search algorithm's two optimized regression models are supported by a theoretical analysis of the relationship between the applied calibration loads and the measured gage outputs. The analysis of the sting balance calibration data set is a rare example of a situation when terms of a regression model of a balance can directly be derived from first principles of physics. In addition, it is interesting to note that the search algorithm recommended the correct regression model term combinations using only a set of statistical quality metrics that were applied to the experimental data during the algorithm's term selection process.
Sparse Regression by Projection and Sparse Discriminant Analysis
Qi, Xin
2015-04-03
Recent years have seen active developments of various penalized regression methods, such as LASSO and elastic net, to analyze high-dimensional data. In these approaches, the direction and length of the regression coefficients are determined simultaneously. Due to the introduction of penalties, the length of the estimates can be far from being optimal for accurate predictions. We introduce a new framework, regression by projection, and its sparse version to analyze high-dimensional data. The unique nature of this framework is that the directions of the regression coefficients are inferred first, and the lengths and the tuning parameters are determined by a cross-validation procedure to achieve the largest prediction accuracy. We provide a theoretical result for simultaneous model selection consistency and parameter estimation consistency of our method in high dimension. This new framework is then generalized such that it can be applied to principal components analysis, partial least squares, and canonical correlation analysis. We also adapt this framework for discriminant analysis. Compared with the existing methods, where there is relatively little control of the dependency among the sparse components, our method can control the relationships among the components. We present efficient algorithms and related theory for solving the sparse regression by projection problem. Based on extensive simulations and real data analysis, we demonstrate that our method achieves good predictive performance and variable selection in the regression setting, and the ability to control relationships between the sparse components leads to more accurate classification. In supplementary materials available online, the details of the algorithms and theoretical proofs, and R codes for all simulation studies are provided.
Miozzo, Michele; Pulvermüller, Friedemann; Hauk, Olaf
2015-10-01
The time course of brain activation during word production has become an area of increasingly intense investigation in cognitive neuroscience. The predominant view has been that semantic and phonological processes are activated sequentially, at about 150 and 200-400 ms after picture onset. Although evidence from prior studies has been interpreted as supporting this view, these studies were arguably not ideally suited to detect early brain activation of semantic and phonological processes. We here used a multiple linear regression approach to magnetoencephalography (MEG) analysis of picture naming in order to investigate early effects of variables specifically related to visual, semantic, and phonological processing. This was combined with distributed minimum-norm source estimation and region-of-interest analysis. Brain activation associated with visual image complexity appeared in occipital cortex at about 100 ms after picture presentation onset. At about 150 ms, semantic variables became physiologically manifest in left frontotemporal regions. In the same latency range, we found an effect of phonological variables in the left middle temporal gyrus. Our results demonstrate that multiple linear regression analysis is sensitive to early effects of multiple psycholinguistic variables in picture naming. Crucially, our results suggest that access to phonological information might begin in parallel with semantic processing around 150 ms after picture onset.
Regression analysis for solving diagnosis problem of children's health
Cherkashina, Yu A.; Gerget, O. M.
2016-04-01
The paper presents the results of research on the application of statistical techniques, namely regression analysis, to assess the health status of children in the neonatal period based on medical data (hemostatic parameters, blood test parameters, gestational age, vascular-endothelial growth factor) measured at 3-5 days of life. A detailed description of the studied medical data is given, and a binary logistic regression procedure is discussed. The basic results of the research are presented. A classification table of predicted and observed values is shown, and the overall percentage of correct recognition is determined. The regression equation coefficients are calculated, and the general regression equation is written from them. Based on the results of the logistic regression, ROC analysis was performed: the sensitivity and specificity of the model were calculated and ROC curves were constructed. These mathematical techniques allow the health of children to be diagnosed with a high quality of recognition. The results make a significant contribution to the development of evidence-based medicine and have high practical importance in the professional activity of the author.
Regression Analysis: Instructional Resource for Cost/Managerial Accounting
Stout, David E.
2015-01-01
This paper describes a classroom-tested instructional resource, grounded in principles of active learning and constructivism, that embraces two primary objectives: "demystify" for accounting students technical material from statistics regarding ordinary least-squares (OLS) regression analysis--material that students may find obscure or…
Early cost estimating for road construction projects using multiple regression techniques
Directory of Open Access Journals (Sweden)
Ibrahim Mahamid
2011-12-01
The objective of this study is to develop early cost estimating models for road construction projects using multiple regression techniques, based on 131 sets of data collected in the West Bank in Palestine. Because cost estimates are required at the early stages of a project, care was taken that the input data for the regression models could be easily extracted from sketches or the scope definition of the project. Eleven regression models are developed to estimate the total cost of a road construction project in US dollars; 5 of them include bid quantities as input variables and 6 include road length and road width. The coefficient of determination (R2) of the developed models ranges from 0.92 to 0.98, which indicates that the values predicted by the models fit the real-life data well. The mean absolute percentage error (MAPE) of the developed regression models ranges from 13% to 31%; these results compare favorably with past research, which has shown that estimate accuracy in the early stages of a project is between ±25% and ±50%.
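The MAPE figure reported in this record can be computed as in the short sketch below. The function name `mape` and the cost figures are hypothetical and chosen only for illustration; they are not taken from the study's data.

```python
def mape(actual, predicted):
    """Mean absolute percentage error in percent; actual values must be nonzero."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)

# Hypothetical project costs (US dollars) and model estimates.
actual_costs = [100.0, 200.0, 400.0]
estimated_costs = [110.0, 180.0, 400.0]
demo_mape = mape(actual_costs, estimated_costs)  # relative errors 10%, 10%, 0%
```

Because each error is divided by the actual value, MAPE weights a given absolute error more heavily on small projects than on large ones.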
Institute of Scientific and Technical Information of China (English)
于阳阳; 赵伟; 刘晓燕; 邹冬荣; 杨晓昀; 刘荣; 于晓峰; 营杰
2016-01-01
Objective The purpose of this study was to examine the correlation between dental fluorosis, saliva and plaque fluoride levels and urinary fluoride values in adolescents with dental fluorosis. Methods A middle school was chosen as the survey point for the study. Two hundred adolescents were examined for the degree of dental fluorosis by Dean's method. These adolescents were divided into four groups according to the severity of fluorosis (n = 52, 40, 28 and 80). A fluoride ion specific electrode was used to measure the fluoride levels in dental plaque, saliva, urine and drinking water. The differences were analyzed by ANOVA. Correlations between the fluoride levels in dental plaque, saliva and urine and the degree of dental fluorosis were analyzed by multiple linear regression. Results The average fluoride content of drinking water was (2.20 ± 0.40) mg/L. Compared with controls, the fluoride concentrations in dental plaque, saliva and urine were higher in the light, medium and severe dental fluorosis groups [(1.55 ± 0.88), (1.94 ± 0.77), (2.74 ± 0.83) vs. (0.32 ± 0.20) mg/L; (4.44 ± 1.62), (8.09 ± 0.93), (10.72 ± 0.99) vs. (0.02 ± 0.01) mg/L; (31.77 ± 6.09), (57.98 ± 1.83), (65.98 ± 2.78) vs. (13.06 ± 2.11) μg/g, all P < 0.05]. Urinary fluoride was correlated with fluoride in saliva and dental plaque (r = 0.245, 0.440, all P < 0.05). Saliva fluoride was correlated with fluoride in dental plaque (r = 0.849, P < 0.01). The degree of dental fluorosis was correlated with fluoride in urine and saliva (r = 0.497, 0.896, 0.924, all P < 0.01). The multiple linear regression equation between fluoride in urine and the degree of dental fluorosis, fluoride in dental plaque and saliva was as follows: y = 1.357 + 1.618x1 + 0.001x2 - 0.331x3 ± 0.69. Conclusions The metabolism of fluoride in the body is related to the oral fluoride reservoir in adolescents with dental fluorosis. Fluoride in urine is influenced by plaque fluoride level, saliva fluoride concentration and the degree of dental
Institute of Scientific and Technical Information of China (English)
郑力会; 王金凤; 李潇鹏; 张燕; 李都
2008-01-01
In order to optimize a circulating micro-bubble drilling fluid formula with a plastic viscosity of 18 mPa·s, orthogonal and uniform experimental design methods were applied, and the plastic viscosities of 36 and 24 groups of agents were tested, respectively. These two experimental design methods show drawbacks: the amount of each agent is difficult to determine, and the results are not fully optimized. Therefore, a multiple regression experimental method was used to design the experimental formulas. By randomly selecting agents with amounts within the recommended range, 17 groups of drilling fluid formulas were designed, and the plastic viscosity of each experimental formula was measured. With plastic viscosity as the objective function, a quadratic regression model was obtained through multiple regression, and its correlation coefficient meets the requirement. With target plastic viscosities set to 18, 20 and 22 mPa·s, respectively, 5 drilling fluid formulas were obtained by the trial method, with accuracies of 0.000 3, 0.000 1 and 0.000 3. Two groups of formulas were arbitrarily selected for each target value for experimental verification; the errors between the theoretical and measured plastic viscosities are less than 5%, confirming that the regression model can be applied to optimizing the plastic viscosity of the circulating micro-bubble drilling fluid. Depending on the precision required and on other constraints for different drilling fluid formulations, the method leads to the optimization of the circulating micro-bubble drilling fluid parameters.
Estimation of mass flow of seeds using fibre sensor and multiple linear regression modelling
Al-Mallahi, A. A.; Kataoka, T
2013-01-01
A new methodology to estimate the mass of grain seeds, which flow in the shape of clumps, was suggested in this paper. The methodology used an off-the-shelf digital fibre sensor to detect the behaviour of the clumps and multiple linear regression modelling to estimate the mass by the parameters detected by the sensor which were the length and the density of the clumps. An indoor apparatus was used for modelling which resembled the sowing process using the grain drill. A fluted roller was inst...
Mekanik, F.; Imteaz, M. A.; Gato-Trinidad, S.; Elmahdi, A.
2013-10-01
In this study, the application of Artificial Neural Networks (ANN) and Multiple Regression analysis (MR) to forecast long-term seasonal spring rainfall in Victoria, Australia, was investigated using lagged El Nino Southern Oscillation (ENSO) and Indian Ocean Dipole (IOD) as potential predictors. The use of dual (combined lagged ENSO-IOD) input sets for calibrating and validating ANN and MR models is proposed to investigate the simultaneous effect of past values of these two major climate modes on long-term spring rainfall prediction. The MR models that did not violate the limits of statistical significance and multicollinearity were selected for future spring rainfall forecasts. The ANN was developed in the form of a multilayer perceptron using the Levenberg-Marquardt algorithm. Both MR and ANN modelling were assessed statistically using mean square error (MSE), mean absolute error (MAE), Pearson correlation (r) and the Willmott index of agreement (d). The developed MR and ANN models were tested on out-of-sample test sets; the MR models showed very poor generalisation ability for east Victoria, with correlation coefficients of -0.99 to -0.90, compared to ANN with correlation coefficients of 0.42-0.93; ANN models also showed better generalisation ability for central and west Victoria, with correlation coefficients of 0.68-0.85 and 0.58-0.97 respectively. The ability of multiple regression models to forecast out-of-sample sets is comparable to that of ANN for Daylesford in central Victoria and Kaniva in west Victoria (r = 0.92 and 0.67 respectively). The errors of the testing sets for ANN models are generally lower than those for multiple regression models. The statistical analysis suggests the potential of ANN over MR models for rainfall forecasting using large scale climate modes.
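The four assessment statistics named in this record (MSE, MAE, Pearson r and the Willmott index of agreement d) can be computed together, as in the sketch below. The function name `forecast_scores` and the sample values are assumptions for illustration; d follows Willmott's original definition, one minus the squared error divided by the "potential error" about the observed mean.

```python
import math

def forecast_scores(observed, predicted):
    """Return MSE, MAE, Pearson r and Willmott's index of agreement d."""
    n = len(observed)
    o_mean = sum(observed) / n
    p_mean = sum(predicted) / n
    errors = [p - o for o, p in zip(observed, predicted)]
    sq_err = sum(e * e for e in errors)
    mse = sq_err / n
    mae = sum(abs(e) for e in errors) / n
    cov = sum((o - o_mean) * (p - p_mean) for o, p in zip(observed, predicted))
    var_o = sum((o - o_mean) ** 2 for o in observed)
    var_p = sum((p - p_mean) ** 2 for p in predicted)
    r = cov / math.sqrt(var_o * var_p)
    # Potential error: distances of both series from the observed mean.
    potential = sum((abs(p - o_mean) + abs(o - o_mean)) ** 2
                    for o, p in zip(observed, predicted))
    d = 1.0 - sq_err / potential
    return {"mse": mse, "mae": mae, "r": r, "d": d}

# Hypothetical example: a forecast that is perfectly correlated but biased by +1.
scores = forecast_scores([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

In this example r equals 1 although every forecast is off by one unit, while d falls below 1; this is one reason correlation alone is a weak measure of forecast skill.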
Telmo, C; Lousada, J; Moreira, N
2010-06-01
The gross calorific value (GCV) and the proximate, ultimate and chemical analyses of debarked wood in Portugal were studied for future use in the wood pellet industry, and the results were compared with CEN/TS 14961. The relationship between GCV and the ultimate and chemical analyses was determined by backward stepwise multiple regression. The comparison between hardwoods and softwoods did not show statistically significant differences for the proximate, ultimate and chemical analyses. Statistically significant differences were found in carbon for National (hardwoods-softwoods); in volatile matter, fixed carbon, carbon and oxygen for (National-tropical) hardwoods; and, in the chemical analysis, in F for National (hardwoods-softwoods) and in Br for (National-tropical) hardwoods. GCV was highly positively related to C (0.79***) and negatively related to O (-0.71***). The final independent variables of the model were (C, O, S, Zn, Ni, Br), with R2 = 0.86 and F = 27.68***. Hydrogen did not contribute statistically to the energy content.
Principal regression analysis and the index leverage effect
Reigneron, Pierre-Alain; Allez, Romain; Bouchaud, Jean-Philippe
2011-09-01
We revisit the index leverage effect, which can be decomposed into a volatility effect and a correlation effect. We investigate the latter using a matrix regression analysis, which we call ‘Principal Regression Analysis' (PRA) and for which we provide some analytical (using Random Matrix Theory) and numerical benchmarks. We find that downward index trends increase the average correlation between stocks (as measured by the most negative eigenvalue of the conditional correlation matrix) and make the market mode more uniform. Upward trends, on the other hand, also increase the average correlation between stocks but rotate the corresponding market mode away from uniformity. There are two time scales associated with these effects, a short one on the order of a month (20 trading days), and a longer one on the order of a year. We also find indications of a leverage effect for sectorial correlations, which reveals itself in the second and third modes of the PRA.
Poisson Regression Analysis of Illness and Injury Surveillance Data
Energy Technology Data Exchange (ETDEWEB)
Frome E.L., Watkins J.P., Ellis E.D.
2012-12-12
The Department of Energy (DOE) uses illness and injury surveillance to monitor morbidity and assess the overall health of the work force. Data collected from each participating site include health events and a roster file with demographic information. The source data files are maintained in a relational data base, and are used to obtain stratified tables of health event counts and person time at risk that serve as the starting point for Poisson regression analysis. The explanatory variables that define these tables are age, gender, occupational group, and time. Typical response variables of interest are the number of absences due to illness or injury, i.e., the response variable is a count. Poisson regression methods are used to describe the effect of the explanatory variables on the health event rates using a log-linear main effects model. Results of fitting the main effects model are summarized in a tabular and graphical form and interpretation of model parameters is provided. An analysis of deviance table is used to evaluate the importance of each of the explanatory variables on the event rate of interest and to determine if interaction terms should be considered in the analysis. Although Poisson regression methods are widely used in the analysis of count data, there are situations in which over-dispersion occurs. This could be due to lack-of-fit of the regression model, extra-Poisson variation, or both. A score test statistic and regression diagnostics are used to identify over-dispersion. A quasi-likelihood method of moments procedure is used to evaluate and adjust for extra-Poisson variation when necessary. Two examples are presented using respiratory disease absence rates at two DOE sites to illustrate the methods and interpretation of the results. In the first example the Poisson main effects model is adequate. In the second example the score test indicates considerable over-dispersion and a more detailed analysis attributes the over-dispersion to extra
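The over-dispersion adjustment mentioned in this record can be illustrated with the usual moment estimator of the dispersion parameter: the Pearson chi-square statistic divided by the residual degrees of freedom. The sketch below is a common textbook form and is not necessarily the exact procedure used by the authors; the function name and the counts are hypothetical, and the fitted means are assumed to come from an already fitted Poisson model.

```python
def pearson_dispersion(counts, fitted_means, n_params):
    """Moment estimate of dispersion: Pearson chi-square / residual degrees of freedom."""
    df = len(counts) - n_params
    if df <= 0:
        raise ValueError("need more observations than fitted parameters")
    # Under a Poisson model the variance equals the mean, so each term has expectation near 1.
    chi2 = sum((y - mu) ** 2 / mu for y, mu in zip(counts, fitted_means))
    return chi2 / df

# Hypothetical event counts and fitted means from a two-parameter model.
phi = pearson_dispersion([2, 4, 6, 8], [3.0, 3.0, 7.0, 7.0], n_params=2)
```

A value well above 1 suggests extra-Poisson variation; in a quasi-likelihood analysis the standard errors of the coefficients are then multiplied by the square root of this estimate.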
COX MULTIVARIATE REGRESSION ANALYSIS OF RECURRENCE FACTORS FOR COLONIC CARCINOMA
Institute of Scientific and Technical Information of China (English)
杜寒松; 王国斌; 秦青平; 夏玉春; 司徒光伟
2004-01-01
Objective: To determine the independent prognostic factors in the recurrence of colonic carcinoma after curative resection. Methods: Two hundred and one patients undergoing curative resections for colonic carcinoma were investigated by univariate and Cox multivariate regression analyses. Ten factors that might contribute to the recurrence rate were analyzed. Results: Dukes stage, obstruction, postoperative chemotherapy and the growth pattern of the tumor were significantly associated with the recurrence rate of colonic carcinoma (P<0.05) by univariate analysis, while Dukes stage, obstruction, and postoperative chemotherapy were significant factors by multivariate analysis. Conclusion: Dukes stage, obstruction, and postoperative chemotherapy are independent prognostic factors in the recurrence of colonic carcinoma.
Hu, L; Zhang, Z G; Mouraux, A; Iannetti, G D
2015-05-01
Transient sensory, motor or cognitive events elicit not only phase-locked event-related potentials (ERPs) in the ongoing electroencephalogram (EEG), but also induce non-phase-locked modulations of ongoing EEG oscillations. These modulations can be detected when single-trial waveforms are analysed in the time-frequency domain, and consist of stimulus-induced decreases (event-related desynchronization, ERD) or increases (event-related synchronization, ERS) of synchrony in the activity of the underlying neuronal populations. ERD and ERS reflect changes in the parameters that control oscillations in neuronal networks and, depending on the frequency at which they occur, represent neuronal mechanisms involved in cortical activation, inhibition and binding. ERD and ERS are commonly estimated by averaging the time-frequency decomposition of single trials. However, their trial-to-trial variability, which can reflect physiologically important information, is lost by across-trial averaging. Here, we aim to (1) develop novel approaches to explore single-trial parameters (including latency, frequency and magnitude) of ERP/ERD/ERS; (2) disclose the relationship between estimated single-trial parameters and other experimental factors (e.g., perceived intensity). We found that (1) stimulus-elicited ERP/ERD/ERS can be correctly separated using principal component analysis (PCA) decomposition with Varimax rotation on the single-trial time-frequency distributions; (2) time-frequency multiple linear regression with dispersion term (TF-MLRd) enhances the signal-to-noise ratio of ERP/ERD/ERS in single trials, and provides an unbiased estimation of their latency, frequency, and magnitude at single-trial level; (3) these estimates can be meaningfully correlated with each other and with other experimental factors at single-trial level (e.g., perceived stimulus intensity and ERP magnitude). The methods described in this article allow exploring fully non-phase-locked stimulus-induced cortical
Harrell, Jr., Frank E.
2015-01-01
This highly anticipated second edition features new chapters and sections, 225 new references, and comprehensive R software. In keeping with the previous edition, this book is about the art and science of data analysis and predictive modeling, which entails choosing and using multiple tools. Instead of presenting isolated techniques, this text emphasizes problem solving strategies that address the many issues arising when developing multivariable models using real data and not standard textbook examples. It includes imputation methods for dealing with missing data effectively, methods for fitting nonlinear relationships and for making the estimation of transformations a formal part of the modeling process, methods for dealing with "too many variables to analyze and not enough observations," and powerful model validation techniques based on the bootstrap. The reader will gain a keen understanding of predictive accuracy, and the harm of categorizing continuous predictors or outcomes. This text realistically...
Quinino, Roberto C.; Reis, Edna A.; Bessegato, Lupercio F.
2013-01-01
This article proposes the use of the coefficient of determination as a statistic for hypothesis testing in multiple linear regression based on distributions acquired by beta sampling. (Contains 3 figures.)
Digital Repository Service at National Institute of Oceanography (India)
Balachandran, K.K.; Jayalakshmy, K.V.; Laluraj, C.M.; Nair, M.; Joseph, T.; Sheeba, P.
The interaction effects of abiotic processes in the production of phytoplankton in a coastal marine region off Cochin are evaluated using multiple regression models. The study shows that chlorophyll production is not limited by nutrients...
A regressed phase analysis for coupled joint systems.
Wininger, Michael
2011-01-01
This study aims to address shortcomings of the relative phase analysis, a widely used method for assessment of coupling among joints of the lower limb. Goniometric data from 15 individuals with spastic diplegic cerebral palsy were recorded from the hip and knee joints during ambulation on a flat surface, and from a single healthy individual with no known motor impairment, over at least 10 gait cycles. The minimum relative phase (MRP) revealed substantial disparity in the timing and severity of the instance of maximum coupling, depending on which reference frame was selected: MRP(knee-hip) differed from MRP(hip-knee) by 16.1±14% of gait cycle and 50.6±77% difference in scale. Additionally, several relative phase portraits contained discontinuities which may contribute to error in phase feature extraction. These vagaries can be attributed to the predication of relative phase analysis on a transformation into the velocity-position phase plane, and the extraction of phase angle by the discontinuous arc-tangent operator. Here, an alternative phase analysis is proposed, wherein kinematic data is transformed into a profile of joint coupling across the entire gait cycle. By comparing joint velocities directly via a standard linear regression in the velocity-velocity phase plane, this regressed phase analysis provides several key advantages over relative phase analysis including continuity, commutativity between reference frames, and generalizability to many-joint systems.
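The core step described in this abstract, a linear regression in the velocity-velocity plane, can be sketched roughly as follows. This is only an illustration under stated assumptions: joint velocities are approximated by forward differences of goniometric angles, the fit is ordinary least squares, and all function names are hypothetical rather than taken from the paper.

```python
def finite_difference(angles, dt):
    """Approximate angular velocity by forward differences (assumed scheme)."""
    return [(b - a) / dt for a, b in zip(angles, angles[1:])]


def linear_fit(x, y):
    """Ordinary least-squares fit y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined regression")
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical hip and knee angle samples (degrees) at 0.1 s spacing.
hip = [10.0, 12.0, 15.0, 19.0, 22.0, 23.0]
knee = [5.0, 9.0, 16.0, 22.0, 30.0, 31.0]
v_hip = finite_difference(hip, 0.1)
v_knee = finite_difference(knee, 0.1)
slope, intercept, r2 = linear_fit(v_hip, v_knee)
```

In this sketch the coefficient of determination is the same whichever joint is treated as the predictor, which is one simple sense in which a regression-based coupling measure does not depend on the choice of reference joint.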
Spontaneous regression of multiple pulmonary metastatic nodules of hepatocarcinoma: a case report
International Nuclear Information System (INIS)
Although rare, spontaneous regression of either primary or metastatic malignant tumors in the absence of therapy or with inadequate therapy has been well documented. Since the early days of this century, various malignant tumors have been reported to disappear spontaneously or to be arrested in their growth, but such cases of hepatocarcinoma have been very rare. In the literature, we were able to find 5 previously reported cases of hepatocarcinoma which showed spontaneous regression at the primary site. Recently we have seen a case of multiple pulmonary metastatic nodules of hepatocarcinoma which completely regressed spontaneously, and this forms the basis of the present case report. The patient was a 55-year-old male admitted to St. Mary's Hospital, Catholic Medical College because of a hard palpable mass in the epigastrium on April 26, 1978. The admission PA chest roentgenogram revealed multiple small nodular densities scattered throughout both lung fields, especially in the lower zones and toward the peripheral portions. A hepatoscintigram revealed a large cold area involving the left lobe and intermediate zone of the liver. Alpha-fetoprotein and hepatitis B serum antigen tests were positive, whereas many other standard liver function tests turned out to be negative. A needle biopsy of the tumor revealed well differentiated hepatocellular carcinoma. The patient was put on chemotherapy, which consisted of 5 FU 500 mg intravenously for 6 days from April 28 to May 3, 1978. The patient was discharged after this single course of 5 FU treatment and was on a herbal medicine, the nature and quantity of which are obscure. No other specific treatment was given. The second admission took place on Dec. 3, 1980 because of irregularity in bowel habits and dyspepsia. A follow up PA chest roentgenogram obtained on the second admission revealed complete disappearance of the previously noted multiple pulmonary nodular lesions (Fig. 3). Follow up liver scan revealed persistence of the cold area in the left lobe
Szabo, Michael; Feldhusen, John F.
This is an empirical study of selected learner characteristics and their relation to academic success, as indicated by course grades, in a structured independent study learning program. This program, called the Audio-Tutorial System, was utilized in an undergraduate college course in the biological sciences. By use of multiple regression analysis,…
Maniquiz, Marla C; Lee, Soyoung; Kim, Lee-Hyung
2010-01-01
Rainfall is an important factor in estimating the event mean concentration (EMC), which is used to quantify the washed-off pollutant concentrations from non-point sources (NPSs). Pollutant loads could also be calculated using rainfall, catchment area and runoff coefficient. In this study, runoff quantity and quality data gathered from 28 months of monitoring at road and parking lot sites in Korea were evaluated using multiple linear regression (MLR) to develop equations for estimating pollutant loads and EMCs as a function of rainfall variables. The results revealed that total event rainfall and average rainfall intensity are possible predictors of pollutant loads. Overall, the models indicate the high uncertainties of NPSs; accurate estimates of EMCs and loads may require water quality sampling, or long-term monitoring may be needed to gather more data that can be used for the development of estimation models.
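The statement that loads can be calculated from rainfall, catchment area and runoff coefficient can be made concrete with a small sketch. It assumes the widely used relation in which runoff volume is rainfall depth times area times runoff coefficient, and event load is EMC times runoff volume; the abstract does not state the authors' exact formula, and the function names, units and example values here are illustrative assumptions.

```python
def runoff_volume_m3(rainfall_mm, area_m2, runoff_coeff):
    """Event runoff volume: depth (converted to m) x area x runoff coefficient."""
    if not 0.0 <= runoff_coeff <= 1.0:
        raise ValueError("runoff coefficient must lie in [0, 1]")
    return (rainfall_mm / 1000.0) * area_m2 * runoff_coeff


def pollutant_load_g(emc_mg_per_l, rainfall_mm, area_m2, runoff_coeff):
    """Event pollutant load in grams.

    1 mg/L equals 1 g/m^3, so EMC (mg/L) x volume (m^3) gives grams.
    """
    return emc_mg_per_l * runoff_volume_m3(rainfall_mm, area_m2, runoff_coeff)


# Hypothetical event: 10 mm of rain on a 1000 m^2 parking lot,
# runoff coefficient 0.9, EMC of 50 mg/L.
load = pollutant_load_g(50.0, 10.0, 1000.0, 0.9)
```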
Directory of Open Access Journals (Sweden)
Avval Zhila Mohajeri
2015-01-01
This paper deals with developing a linear quantitative structure-activity relationship (QSAR) model for predicting the RSK inhibition activity of some new compounds. A dataset consisting of 62 pyrazino [1,2-α] indole, diazepino [1,2-α] indole, and imidazole derivatives with known inhibitory activities was used. The multiple linear regression (MLR) technique, combined with the stepwise (SW) and genetic algorithm (GA) methods as variable selection tools, was employed. To further check the stability, robustness and predictability of the proposed models, internal and external validation techniques were used. Comparison of the results obtained indicates that the GA-MLR model is superior to the SW-MLR model and that it is applicable for designing novel RSK inhibitors.
Hema, M; Srinivasan, K
2011-07-01
The nickel removal efficiency of powdered activated carbons of coconut oilcake, neem oilcake and commercial carbon was investigated using an artificial neural network (ANN). The effective parameters for the removal of nickel (%R) by the adsorption process, which included the pH, contact time (T), distinctiveness of activated carbon (Cn), amount of activated carbon (Cw) and initial concentration of nickel (Co), were investigated. The Levenberg-Marquardt (LM) back-propagation algorithm was used to train the network. The network topology was optimized by varying the number of hidden layers and the number of neurons in each hidden layer. The model was developed using training, validation and testing subsets containing 60%, 20% and 20% of the total experimental data, respectively. A multiple regression equation was developed for the nickel adsorption system and its output was compared with both simulated and experimental outputs. The standard deviation (SD) with respect to the experimental output was considerably higher for the regression model than for the ANN model. The experimental data were best fitted by the artificial neural network. PMID:23029923
Multiple Regression (MR) and Artificial Neural Network (ANN) models for prediction of soil suction
Erzin, Yusuf; Yilmaz, Isik
2010-05-01
This article presents a comparison of multiple regression (MR) and artificial neural network (ANN) models for prediction of soil suction of clayey soils. The results of soil suction tests utilizing thermocouple psychrometers on statically compacted specimens of Bentonite-Kaolinite clay mixtures with varying soil properties were used to develop the models. The results obtained from both models were then compared with the experimental results. Performance indices such as the coefficient of determination (R2), root mean square error (RMSE), mean absolute error (MAE), and variance accounted for (VAF) were used to assess the prediction capacity of the models developed in this study. The ANN model showed higher prediction performance than the regression model according to the performance indices. It is shown that ANN models provide significant improvements in prediction accuracy over statistical models. The potential benefits of soft computing models extend beyond the high computation rates. The higher performance of the soft computing models stems from a greater degree of robustness and fault tolerance than traditional statistical models, because there are many more processing neurons, each with primarily local connections. It appears that there is a possibility of estimating soil suction by using the proposed empirical relationships and soft computing models. The population of the analyzed data is relatively limited in this study. Therefore, the practical outcomes of the proposed equations and models could be used with acceptable accuracy.
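The four performance indices named in this abstract can be computed as in the following sketch. It uses common textbook definitions (R2 as one minus the ratio of residual to total sum of squares, VAF as one minus the ratio of residual variance to observed variance, in percent); the paper's exact formulas are not given in the abstract, so these are assumptions, and the function names are hypothetical.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def _variance(values):
    """Population variance (the VAF ratio is the same for sample variance)."""
    m = _mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def rmse(observed, predicted):
    """Root mean square error."""
    return math.sqrt(_mean([(o - p) ** 2 for o, p in zip(observed, predicted)]))


def mae(observed, predicted):
    """Mean absolute error."""
    return _mean([abs(o - p) for o, p in zip(observed, predicted)])


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    m = _mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - m) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def vaf(observed, predicted):
    """Variance accounted for, in percent."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    return (1.0 - _variance(residuals) / _variance(observed)) * 100.0


# Hypothetical data: predictions carry a constant bias of +1.
obs = [1.0, 2.0, 3.0, 4.0]
pred = [2.0, 3.0, 4.0, 5.0]
scores = (r_squared(obs, pred), rmse(obs, pred), mae(obs, pred), vaf(obs, pred))
```

With these definitions a constant bias leaves VAF at 100% while lowering R2 and raising RMSE and MAE, which is one reason several indices are usually reported together.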
Majumdar, Arunabha; Witte, John S.; Ghosh, Saurabh
2016-01-01
Binary phenotypes commonly arise due to multiple underlying quantitative precursors. Genetic variants may impact multiple traits in a pleiotropic manner. Hence, simultaneously analyzing such correlated traits may be more powerful than analyzing individual traits. Various genotype-level methods, e.g. MultiPhen [O'Reilly et al., 2012], have been developed to identify genetic factors underlying a multivariate phenotype. For univariate phenotypes, the usefulness and applicability of allele-level tests have been investigated. The test of allele frequency difference among cases and controls is commonly used for mapping case-control association. However, allelic methods for multivariate association mapping have not been studied much. We explore two allelic tests of multivariate association: one using a Binomial regression model based on inverted regression of genotype on phenotype (BAMP), and the other employing the Mahalanobis distance between two sample means of the multivariate phenotype vector for two alleles at a SNP (DAMP). These methods can incorporate both discrete and continuous phenotypes. Some theoretical properties for BAMP are studied. Using simulations, the power of the methods for detecting multivariate association is compared with that of the genotype-level test MultiPhen. The allelic tests yield marginally higher power than MultiPhen for multivariate phenotypes. For one/two binary traits under a recessive mode of inheritance, allelic tests are found to be substantially more powerful. All three tests are applied to two real data sets and the results offer some support for the simulation study. Since the allelic approaches assume Hardy-Weinberg Equilibrium (HWE), we propose a hybrid approach for testing multivariate association that implements MultiPhen when HWE is violated and BAMP otherwise. PMID:26493781
Finding determinants of audit delay by pooled OLS regression analysis
Tina Vuko; Marko Čular
2014-01-01
The aim of this paper is to investigate determinants of audit delay. Audit delay is measured as the length of time (i.e. the number of calendar days) from the fiscal year-end to the audit report date. It is important to understand factors that influence audit delay since it directly affects the timeliness of financial reporting. The research is conducted on a sample of Croatian listed companies, covering the period of four years (from 2008 to 2011). We use pooled OLS regression analysis, mode...
Meta-regression Analysis of the Chinese Labor Reallocation Effect
Institute of Scientific and Technical Information of China (English)
Longhua YUE; Shiyan YANG; Rongtai SHEN
2013-01-01
Meta-regression analysis was applied to 23 papers on the effect of Chinese labor reallocation on economic growth. The results showed that whether the method of the World Bank (1996) or that of M. Syrquin (1986) was used had little impact on the results, while the calculation of the stock of physical capital had a positive impact on the results. Estimates from panel data studies were larger than those obtained from time series data. The time span had little influence on the results. Therefore, it is necessary to measure the stock of physical capital in China accurately in order to evaluate the Chinese labor reallocation effect.
Multivariate study and regression analysis of gluten-free granola
Directory of Open Access Journals (Sweden)
Lilian Maria Pagamunici
2014-03-01
This study developed a gluten-free granola and evaluated it during storage with the application of multivariate and regression analysis of the sensory and instrumental parameters. The physicochemical, sensory, and nutritional characteristics of a product containing quinoa, amaranth and linseed were evaluated. The crude protein and lipid contents were 97.49 and 122.72 g kg-1 of food, respectively. The polyunsaturated/saturated and n-6:n-3 fatty acid ratios were 2.82 and 2.59:1, respectively. Granola had the best alpha-linolenic acid content, nutritional indices in the lipid fraction, and mineral content. There were good hygienic and sanitary conditions during storage, probably due to the low water activity of the formulation, which contributed to inhibiting microbial growth. The sensory attributes ranged from 'like very much' to 'like slightly', and the regression models were highly fitted and correlated during the storage period. A reduction in the sensory attribute levels and in the product's physical stabilisation was verified by principal component analysis. The use of the affective acceptance test and instrumental analysis combined with statistical methods allowed us to obtain promising results about the characteristics of gluten-free granola.
Multiple Time Scales and Longitudinal Measurements in Event History Analysis
Danardono
2005-01-01
A general time-to-event data analysis known as event history analysis is considered. The focus is on the analysis of time-to-event data using Cox's regression model when the time to the event may be measured from different origins giving several observable time scales and when longitudinal measurements are involved. For the multiple time scales problem, procedures to choose a basic time scale in Cox's regression model are proposed. The connections between piecewise constant hazards, time-depe...
Da, Yang; VanRaden, Paul; Schook, Lawrence
2000-01-01
A strategy of multi-step minimal conditional regression analysis has been developed to determine the existence of statistical testing and parameter estimation for a quantitative trait locus (QTL) that are unaffected by linked QTLs. The estimation of marker-QTL recombination frequency needs to consider only three cases: 1) the chromosome has only one QTL, 2) one side of the target QTL has one or more QTLs, and 3) either side of the target QTL has one or more QTLs. Ana...
Determining Balıkesir’s Energy Potential Using a Regression Analysis Computer Program
Directory of Open Access Journals (Sweden)
Bedri Yüksel
2014-01-01
Solar power and wind energy are used concurrently during specific periods, while at other times only the more efficient is used, and hybrid systems make this possible. When establishing a hybrid system, the extent to which these two energy sources support each other needs to be taken into account. This paper is a study of the effects of wind speed, insolation levels, and the meteorological parameters of temperature and humidity on the energy potential in Balıkesir, in the Marmara region of Turkey. The relationship between the parameters was studied using a multiple linear regression method. Using a designed-for-purpose computer program, two different regression equations were derived, with wind speed being the dependent variable in the first and insolation levels in the second. The regression equations yielded accurate results. The computer program allowed for the rapid calculation of different acceptance rates. The results of the statistical analysis proved the reliability of the equations. An estimate of identified meteorological parameters and unknown parameters could be produced with a specified precision by using the regression analysis method. The regression equations also worked for the evaluation of energy potential.
Institute of Scientific and Technical Information of China (English)
徐曼; 柴云; 李涛; 卢丽; 刘冰
2015-01-01
Objective: To further explore subjective well-being among urban and rural residents, its influencing factors and the interactions among those factors, using multiple linear regression and path analysis. Methods: Using simple random sampling, a questionnaire survey was conducted among 1480 urban and rural residents with the Campbell subjective well-being index scale, and the data were compared and analyzed by multiple linear regression and path analysis. Results: The mean overall well-being index score of urban and rural residents was (11.17±1.99). Multiple linear regression analysis showed that future goals, stress coping styles, self-rated health status, hobbies, urban or rural place of residence and leisure time were predictors of the well-being index, with standardized partial regression coefficients of 0.261, 0.182, 0.152, 0.066, 0.071 and 0.051, respectively. Path analysis found that future goals, stress coping styles and self-rated health status acted directly on the well-being index, with path coefficients of 0.285, 0.191 and 0.160; hobbies, personal monthly income, education level and age acted indirectly on the well-being index, with indirect effects of 0.08, -0.04, 0.10 and -0.07. Conclusion: Subjective well-being is related to multiple internal and external factors, including individual physiological, psychological and social factors. Multiple linear regression and path analysis each have their own emphasis in exploring the factors influencing residents' subjective well-being and their relationships, and they complement each other.
International Nuclear Information System (INIS)
Objective: To analyze the correlations between liver lipid level determined by in vivo 3.0 T 1H-MRS of the liver and influencing factors using multiple linear stepwise regression. Methods: The prospective study of liver 1H-MRS was performed with a 3.0 T system and eight-channel torso phased-array coils using a PRESS sequence. Forty-four volunteers were enrolled in this study. Liver spectra were collected with a TR of 1500 ms, TE of 30 ms, volume of interest of 2 cm×2 cm×2 cm, and NSA of 64. The acquired raw proton MRS data were processed using the software program SAGE. For each MRS measurement, using water as the internal reference, the amplitude of the lipid signal was normalized to the sum of the signal from lipid and water to obtain the percentage of lipid within the liver. Descriptive statistics for height, weight, age, BMI, line width and water suppression were recorded, and Pearson analysis was applied to test their relationships. Multiple linear stepwise regression was used to build the statistical model for the prediction of liver lipid content. Results: Age (39.1±12.6) years, body weight (64.4±10.4) kg, BMI (23.3±3.1) kg/m2, line width (18.9±4.4) and water suppression (90.7±6.5)% were significantly correlated with liver lipid content (0.00 to 0.96%, median 0.02%); the correlation coefficients (r) were 0.11, 0.44, 0.40, 0.52 and -0.73, respectively (P<0.05). However, only age, BMI, line width, and water suppression entered the multiple linear regression equation. The liver lipid content prediction equation was as follows: Y = 1.395 - (0.021×water suppression) + (0.022×BMI) + (0.014×line width) - (0.004×age); the coefficient of determination was 0.613 and the corrected coefficient of determination was 0.59. Conclusion: The regression model fitted well, since the variables of age, BMI, line width, and water suppression can explain about 60% of the variation in liver lipid content. (authors)
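The prediction equation reported in this abstract can be written as a small function for illustration. The coefficients are copied from the abstract; the function and parameter names are hypothetical, and the input units (water suppression in percent, BMI in kg/m2, age in years) are assumed from the reported descriptive statistics.

```python
def predict_liver_lipid(water_suppression_pct, bmi, line_width, age_years):
    """Evaluate the reported regression equation for liver lipid content.

    Y = 1.395 - 0.021*water suppression + 0.022*BMI
        + 0.014*line width - 0.004*age
    """
    return (
        1.395
        - 0.021 * water_suppression_pct
        + 0.022 * bmi
        + 0.014 * line_width
        - 0.004 * age_years
    )


# Evaluated at the sample means reported in the abstract
# (water suppression 90.7, BMI 23.3, line width 18.9, age 39.1).
y_at_means = predict_liver_lipid(90.7, 23.3, 18.9, 39.1)  # about 0.111
```

Because the equation is linear, it can return negative values for some inputs (for example very high water suppression), so predictions outside the range of the study sample should be treated with caution.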
Regression analysis of reported earthquake precursors. I. Presentation of data
Niazi, Mansour
1984-11-01
Around 700 reported precursors of about 350 earthquakes, including the negative observations, have been compiled in 11 categories with 31 subdivisions. The data base is subjected to an initial sorting and screening by imposing three restrictions on the ranges of main shock magnitude ($M \ge 4.0$), precursory time ($t \le 20$ years), and the epicentral distance of observation points ($X_m \le 4 \times 10^{0.3M}$). Of the 31 subcategories of precursory phenomena, 18 with 9 data points or more are independently studied by regressing their precursory times against magnitude. The preliminary results tend to classify the precursors into three groups: 1. The precursors which show weak or no correlation between time and the magnitude of the eventual main shock. Examples of this group are foreshocks and precursory tilt. 2. The precursors which show clear scaling with magnitude. These include seismic velocity ratio ($V_p/V_s$), travel time delay, duration of seismic quiescence, and, to some degree, the variation of b-value, and anomalous seismicity. 3. The precursors which display clustering of precursory times around a mean value, which differs for different precursors from a few hours to a few years. Examples include the conductivity rate, geoelectric current and potential, strain, water well level, geochemical anomalies, change of focal mechanism, and the enhancement of seismicity reported only for larger earthquakes. Some of the precursors in this category, such as leveling changes and the occurrence of microseismicity, show bimodal patterns of precursory times and may partially be coseismic. In addition, each category with a sufficient number of reported estimates of distance and signal amplitude is subjected to multiple linear regression. The usefulness of these regressions at this stage appears to be limited to specifying which of the parameters shows a more significant correlation. Standard deviations of residuals of precursory time against magnitude are generally reduced when
Jamali, Jamshid; Ayatollahi, Seyyed Mohammad Taghi; Jafari, Peyman
2016-01-01
Background: Measurement equivalence is an essential prerequisite for making valid comparisons in mental health questionnaires across groups. In most methods used for assessing measurement equivalence, which is known as Differential Item Functioning (DIF), latent variables are assumed to be continuous. Objective: To compare a new method called Latent Class Regression (LCR), designed for discrete latent variables, with the multiple indicators multiple cause (MIMIC) model, a continuous latent variable technique, in assessing the measurement equivalence of the 12-item General Health Questionnaire (GHQ-12) across different subgroups of Iranian nurses. Methods: A cross-sectional survey was conducted in 2014 among 771 nurses working in the hospitals of Fars and Bushehr provinces of southern Iran. To identify Minor Psychiatric Disorders (MPD), the nurses completed self-report GHQ-12 questionnaires and sociodemographic questions. Two uniform-DIF detection methods, LCR and MIMIC, were applied for comparability when the GHQ-12 score was assumed to be discrete and continuous, respectively. Results: The result of fitting LCR with 2 classes indicated that 27.4% of the nurses had MPD. Gender was identified as an influential factor in the level of MPD. LCR and MIMIC agreed on the detection of DIF and DIF-free items by gender, age, education and marital status in 83.3, 100.0, 91.7 and 83.3% of cases, respectively. Conclusions: The results indicated that the GHQ-12 is, to a great degree, an invariant measure for the assessment of MPD among nurses. High convergence between the two methods suggests using the LCR approach in cases of a discrete latent variable, e.g. GHQ-12, and adequate sample size. PMID:27482129
Soares dos Santos, T.; Mendes, D.; Rodrigues Torres, R.
2016-01-01
Several studies have been devoted to dynamic and statistical downscaling for analysis of both climate variability and climate change. This paper introduces an application of artificial neural networks (ANNs) and multiple linear regression (MLR) by principal components to estimate rainfall in South America. This method is proposed for downscaling monthly precipitation time series over South America for three regions: the Amazon; northeastern Brazil; and the La Plata Basin, which is one of the regions of the planet that will be most affected by the climate change projected for the end of the 21st century. The downscaling models were developed and validated using CMIP5 model output and observed monthly precipitation. We used general circulation model (GCM) experiments for the 20th century (RCP historical; 1970-1999) and two scenarios (RCP 2.6 and 8.5; 2070-2100). The model test results indicate that the ANNs significantly outperform the MLR downscaling of monthly precipitation variability.
Aguinis, Herman; Beaty, James C; Boik, Robert J; Pierce, Charles A
2005-01-01
The authors conducted a 30-year review (1969-1998) of the size of moderating effects of categorical variables as assessed using multiple regression. The median observed effect size ($f^2$) is only .002, but 72% of the moderator tests reviewed had power of .80 or greater to detect a targeted effect conventionally defined as small. Results suggest the need to minimize the influence of artifacts that produce a downward bias in the observed effect size and put into question the use of conventional definitions of moderating effect sizes. As long as an effect has a meaningful impact, the authors advise researchers to conduct a power analysis and plan future research designs on the basis of smaller and more realistic targeted effect sizes.
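For readers unfamiliar with the effect size discussed here, the sketch below computes Cohen's incremental f-squared from the R-squared values of a model without and with the moderator (interaction) term. This is the standard textbook formula and is offered only as an assumption; the review may have used a modified version of the statistic. The function name and the example R-squared values are hypothetical.

```python
def cohens_f2(r2_full, r2_reduced):
    """Cohen's f-squared for the increment in R-squared.

    r2_full    -- R-squared of the model including the interaction term
    r2_reduced -- R-squared of the model without the interaction term
    """
    if not 0.0 <= r2_reduced <= r2_full < 1.0:
        raise ValueError("require 0 <= r2_reduced <= r2_full < 1")
    return (r2_full - r2_reduced) / (1.0 - r2_full)


# Hypothetical example: adding the interaction raises R-squared from .25 to .30.
effect = cohens_f2(0.30, 0.25)  # about 0.071
```

By the conventional benchmarks, values of about .02, .15 and .35 are labelled small, medium and large, which puts the reported median of .002 well below "small".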
Regression analysis exploring teacher impact on student FCI post scores
Mahadeo, Jonathan V.; Manthey, Seth R.; Brewe, Eric
2013-01-01
High School Modeling Workshops are designed to improve high school physics teachers' understanding of physics and how to teach using the Modeling method. The basic assumption is that the teacher plays a critical role in their students' physics education. This study investigated teacher impacts on students' Force Concept Inventory (FCI) scores, with the hope of identifying quantitative differences between teachers. This study examined student FCI scores from 18 teachers with at least a year of teaching high school physics. These data were then evaluated using a General Linear Model (GLM), which allowed for a regression equation to be fitted to the data. This regression equation was used to predict student post FCI scores based on teacher ID, student pre FCI score, gender, and representation. The results show that 12 out of 18 teachers significantly impact their students' post FCI scores. The GLM further revealed that of the 12 teachers only five have a positive impact on student post FCI scores. Given these differences among teachers, it is our intention to extend our analysis to investigate pedagogical differences between them.
Institute of Scientific and Technical Information of China (English)
宝凌云; 易欣; 高瑾; 胡熙; 杜琨
2016-01-01
Objective: To explore the risk factors for neonatal hypoglycemia. Methods: The clinical data of 340 neonates admitted to our hospital from July 2013 to July 2015 were retrospectively analyzed. Neonates were grouped according to blood glucose level, with hypoglycemia defined as blood glucose <2.2 mmol/L, giving 32 cases of neonatal hypoglycemia and 308 normal neonates. Pearson univariate analysis and a multivariate logistic regression model were used to analyze the risk factors related to neonatal hypoglycemia. Results: Between the normal and hypoglycemia groups, the differences in neonatal condition (birth weight, premature infants and full-term SGA), maternal perinatal condition (maternal age, pregnancy-induced hypertension) and neonatal complications (neonatal respiratory disease, asphyxia, congenital heart disease, hemorrhagic disease, infectious disease, hyperbilirubinemia, hypothyroidism) were statistically significant (P<0.05 to 0.01). The risk factors for neonatal hypoglycemia were neonatal condition (birth weight, premature infants and full-term SGA), maternal perinatal condition (maternal age, pregnancy-induced hypertension) and neonatal complications (neonatal asphyxia, congenital heart disease and hyperbilirubinemia). Conclusion: The risk factors related to neonatal hypoglycemia were birth weight, prematurity, full-term small for gestational age, maternal age, pregnancy-induced hypertension, neonatal asphyxia, congenital heart disease and hyperbilirubinemia; controlling these factors can provide a scientific basis for the effective prevention of neonatal hypoglycemia.
Finding determinants of audit delay by pooled OLS regression analysis
Directory of Open Access Journals (Sweden)
Tina Vuko
2014-03-01
The aim of this paper is to investigate determinants of audit delay. Audit delay is measured as the length of time (i.e. the number of calendar days) from the fiscal year-end to the audit report date. It is important to understand factors that influence audit delay since it directly affects the timeliness of financial reporting. The research is conducted on a sample of Croatian listed companies, covering a period of four years (from 2008 to 2011). We use pooled OLS regression analysis, modelling audit delay as a function of the following explanatory variables: audit firm type, audit opinion, profitability, leverage, inventory and receivables to total assets, absolute value of total accruals, company size and audit committee existence. Our results indicate that audit committee existence, profitability and leverage are statistically significant determinants of audit delay in Croatia.
A Visual Analytics Approach for Correlation, Classification, and Regression Analysis
Energy Technology Data Exchange (ETDEWEB)
Steed, Chad A [ORNL; SwanII, J. Edward [Mississippi State University (MSU); Fitzpatrick, Patrick J. [Mississippi State University (MSU); Jankun-Kelly, T.J. [Mississippi State University (MSU)
2012-02-01
New approaches that combine the strengths of humans and machines are necessary to equip analysts with the proper tools for exploring today's increasingly complex, multivariate data sets. In this paper, a novel visual data mining framework, called the Multidimensional Data eXplorer (MDX), is described that addresses the challenges of today's data by combining automated statistical analytics with a highly interactive parallel coordinates based canvas. In addition to several intuitive interaction capabilities, this framework offers a rich set of graphical statistical indicators, interactive regression analysis, visual correlation mining, automated axis arrangements and filtering, and data classification techniques. The current work provides a detailed description of the system as well as a discussion of key design aspects and critical feedback from domain experts.
International Nuclear Information System (INIS)
Risk associated with power generation must be identified to make intelligent choices between alternative power technologies. Radionuclide air stack emissions for a single coal plant and a single nuclear plant are used to compute the single plant leukemia incidence risk and total industry leukemia incidence risk. Leukemia incidence is the response variable as a function of radionuclide bone dose for the six proposed dose response curves considered. During normal operation a coal plant has higher radionuclide emissions than a nuclear plant, and the coal industry has a higher leukemia incidence risk than the nuclear industry, unless a nuclear accident occurs. Variation of nuclear accident size allows quantification of the impact of accidents on the total industry leukemia incidence risk comparison. The leukemia incidence risk is quantified as the number of accidents of a given size required for the nuclear industry leukemia incidence risk to equal the coal industry leukemia incidence risk. The general linear model is used to develop equations that relate the accident frequency required for equal industry risks to the magnitude of the nuclear emission. Exploratory data analysis revealed that the relationship between the natural log of accident number and the natural log of accident size is linear. (Author)
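A linear relationship between the natural logs of two quantities is equivalent to a power law, N = a * S^b, so it can be fitted by ordinary least squares on the logged values. The following is a minimal sketch of that idea only; the helper names `fit_line` and `fit_power_law` and the numbers are illustrative assumptions, not data or code from the study.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def fit_power_law(sizes, counts):
    """Fit counts = a * sizes**b by regressing ln(counts) on ln(sizes)."""
    intercept, slope = fit_line([math.log(s) for s in sizes],
                                [math.log(c) for c in counts])
    return math.exp(intercept), slope

# Made-up values generated from counts = 50 * size**-1 (hypothetical).
sizes = [1.0, 10.0, 100.0, 1000.0]
counts = [50.0 / s for s in sizes]
a, b = fit_power_law(sizes, counts)
print(f"a = {a:.3f}, b = {b:.3f}")
```

The slope of the logged fit is the exponent b, and the exponentiated intercept is the scale factor a.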
Optimization of end-members used in multiple linear regression geochemical mixing models
Dunlea, Ann G.; Murray, Richard W.
2015-11-01
Tracking marine sediment provenance (e.g., of dust, ash, hydrothermal material, etc.) provides insight into contemporary ocean processes and helps construct paleoceanographic records. In a simple system with only a few end-members that can be easily quantified by a unique chemical or isotopic signal, chemical ratios and normative calculations can help quantify the flux of sediment from the few sources. In a more complex system (e.g., each element comes from multiple sources), more sophisticated mixing models are required. MATLAB codes published in Pisias et al. solidified the foundation for application of a Constrained Least Squares (CLS) multiple linear regression technique that can use many elements and several end-members in a mixing model. However, rigorous sensitivity testing to check the robustness of the CLS model is time and labor intensive. MATLAB codes provided in this paper reduce the time and labor involved and facilitate finding a robust and stable CLS model. By quickly comparing the goodness of fit between thousands of different end-member combinations, users are able to identify trends in the results that reveal the CLS solution uniqueness and the end-member composition precision required for a good fit. Users can also rapidly check that they have the appropriate number and type of end-members in their model. In the end, these codes improve the user's confidence that the final CLS model(s) they select are the most reliable solutions. These advantages are demonstrated by application of the codes in two case studies of well-studied datasets (Nazca Plate and South Pacific Gyre).
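The idea of comparing goodness of fit across end-member combinations can be illustrated with a deliberately small case. The sketch below is not the MATLAB code from the paper: it assumes only two end-members per model, for which the constrained least-squares mixing fraction has a closed form (the unconstrained solution clipped to [0, 1]), and all compositions and names (`mixing_fraction`, `candidates`) are hypothetical.

```python
from itertools import combinations

def mixing_fraction(sample, end_a, end_b):
    """Least-squares fraction f of end-member A in sample ~ f*A + (1-f)*B, with 0 <= f <= 1."""
    diff = [a - b for a, b in zip(end_a, end_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("end-members are identical")
    f = sum((s - b) * d for s, b, d in zip(sample, end_b, diff)) / denom
    # For a single bounded unknown, clipping gives the constrained optimum.
    return min(1.0, max(0.0, f))

def rss(sample, end_a, end_b, f):
    """Residual sum of squares of the two end-member mixture."""
    return sum((s - (f * a + (1 - f) * b)) ** 2
               for s, a, b in zip(sample, end_a, end_b))

# Hypothetical element concentrations for three candidate end-members.
candidates = {
    "dust": [10.0, 2.0, 0.5],
    "ash": [2.0, 8.0, 1.5],
    "hydrothermal": [1.0, 1.0, 9.0],
}
# Synthetic sample: 30% dust + 70% ash.
sample = [4.4, 6.2, 1.2]

results = []
for name_a, name_b in combinations(candidates, 2):
    f = mixing_fraction(sample, candidates[name_a], candidates[name_b])
    fit = rss(sample, candidates[name_a], candidates[name_b], f)
    results.append((fit, name_a, name_b, f))

best_fit, best_a, best_b, best_f = min(results)
print(best_a, best_b, round(best_f, 3), best_fit)
```

With more than two end-members the bounded problem no longer has this closed form and needs a proper constrained solver.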
Directory of Open Access Journals (Sweden)
Panatchai Chetchotisak
2015-09-01
Because of nonlinear strain distributions caused by abrupt changes in geometry or loading in deep beams, the approach for conventional beams is not applicable. Consequently, the strut-and-tie model (STM) has been applied as the most rational and simple method for strength prediction and design of reinforced concrete deep beams. A deep beam is idealized by the STM as a truss-like structure consisting of diagonal concrete struts and tension ties. Numerous works have proposed STMs for deep beams. However, uncertainty and complexity in shear strength computations of deep beams can be found in some STMs. Therefore, improved methods for predicting the shear strength of deep beams are still needed. Using a large experimental database of 406 deep beam test results covering a wide range of influencing parameters, several STM shapes and geometries, and six state-of-the-art formulations of the efficiency factors found in design codes and the literature, this paper proposes new STMs for predicting the shear strength of simply supported reinforced concrete deep beams by multiple linear regression analysis. Regression diagnostics and the validation process are also included in this study. Finally, two numerical examples are provided for illustration.
Oh, Eunsong; Kim, Sug-Whan; Cho, Seongick; Ryu, Joo-Hyung
2011-10-01
In our earlier study [12], we suggested a new alignment algorithm called the Multiple Design Configuration Optimization (MDCO hereafter) method, combining the merit function regression (MFR) computation with the differential wavefront sampling (DWS) method. In this study, we report the alignment state estimation performance of the method for three target optical systems: (i) a two-mirror Cassegrain telescope of 58 mm in diameter for deep space earth observation, (ii) a three-mirror anastigmat of 210 mm in aperture for ocean monitoring from geostationary orbit, and (iii) on-axis/off-axis pairs of an extremely large telescope of 27.4 m in aperture. First, we introduced known amounts of alignment state disturbances to the target optical system elements. Example alignment parameter ranges include, but are not limited to, 800 microns to 10 mm in decenter and 0.1 to 1.0 degree in tilt. We then ran alignment state estimation simulations using MDCO, MFR and DWS. The simulation results show that MDCO yields much better estimation performance than MFR and DWS for alignment disturbance levels up to 150 times larger than the required tolerances. In particular, with its simple single field measurement, MDCO exhibits greater practicality and application potential for the shop floor optical testing environment than MFR and DWS.
Dental malocclusion and body posture in young subjects: A multiple regression study
Directory of Open Access Journals (Sweden)
Giuseppe Perinetti
2010-01-01
OBJECTIVES: Controversial results have been reported on potential correlations between the stomatognathic system and body posture. We investigated whether malocclusal traits correlate with body posture alterations in young subjects to determine possible clinical applications. METHODS: A total of 122 subjects, including 86 males and 36 females (age range of 10.8-16.3 years), were enrolled. All subjects tested negative for temporomandibular disorders or other conditions affecting the stomatognathic system, except malocclusion. A dental occlusion assessment included phase of dentition, molar class, overjet, overbite, anterior and posterior crossbite, scissorbite, mandibular crowding and dental midline deviation. In addition, body posture was recorded through static posturography using a vertical force platform. Recordings were performed under two conditions, namely, (i) mandibular rest position (RP) and (ii) dental intercuspidal position (ICP). Posturographic parameters included the projected sway area and velocity and the antero-posterior and right-left load differences. Multiple regression models were run for both recording conditions to evaluate associations between each malocclusal trait and posturographic parameters. RESULTS: All of the posturographic parameters had large variability and were very similar between the two recording conditions. Moreover, a limited number of weakly significant correlations were observed, mainly for overbite and dentition phase, when using multivariate models. CONCLUSION: Our current findings, particularly with regard to the use of posturography as a diagnostic aid for subjects affected by dental malocclusion, do not support the existence of clinically relevant correlations between malocclusal traits and body posture.
Regression Analysis of Restricted Mean Survival Time Based on Pseudo-Observations
DEFF Research Database (Denmark)
Andersen, Per Kragh; Hansen, Mette Gerster; Klein, John P.
2004-01-01
censoring; hazard function; health economics; mean survival time; pseudo-observations; regression model; restricted mean survival time; survival analysis
Regression analysis of restricted mean survival time based on pseudo-observations
DEFF Research Database (Denmark)
Andersen, Per Kragh; Hansen, Mette Gerster; Klein, John P.
censoring; hazard function; health economics; regression model; survival analysis; mean survival time; restricted mean survival time; pseudo-observations
An Analysis of Bank Service Satisfaction Based on Quantile Regression and Grey Relational Analysis
Directory of Open Access Journals (Sweden)
Wen-Tsao Pan
2016-01-01
Bank service satisfaction is vital to the success of a bank. In this paper, we propose using grey relational analysis to gauge the levels of service satisfaction of banks. With grey relational analysis, we compared the effects of different variables on service satisfaction and ranked the banks according to their levels of service satisfaction. We further used the quantile regression model to find the variables that affected the satisfaction of a customer at a specific quantile of satisfaction level. The result of the quantile regression analysis provides a bank manager with information for formulating policies to further promote the satisfaction of customers at different quantiles of satisfaction level. We also compared the prediction accuracies of the regression models at different quantiles. The experimental results showed that, among the seven quantile regression models, the median regression model has the best performance in terms of the RMSE, RTIC, and CE performance measures.
Kayaalp, G.Tamer
1999-01-01
In animal breeding, when there is a relationship between the dependent (Y) and independent (X) variables, regression analysis is applied. But when one of the variables has one or more missing observations, regression analysis cannot be applied. This paper illustrates and discusses a regression analysis in which the independent variable (X) has a missing observation.
Regression Analysis between Properties of Subgrade Lateritic Soil
Directory of Open Access Journals (Sweden)
Afeez Adefemi BELLO
2012-12-01
This study presents the results of using regression analysis to examine correlations between index properties and the California Bearing Ratio (CBR) of lateritic soils within Osogbo town in South Western Nigeria. Lateritic soil samples were collected from eight (8) different borrow pits within the town, and laboratory tests including Atterberg limits, gradation analysis, California Bearing Ratio, compaction and specific gravity were performed on the samples. Various linear relationships between index properties and CBR of the samples were investigated, and predictive equations estimating CBR from the experimental index values were developed. The findings indicate that good correlation exists between the two groups (i.e. index properties and CBR values). However, CBR values computed from the models should be used only for preliminary purposes, in view of their simplicity and economy, and are not acceptable alternatives to laboratory testing because of the anisotropic nature of lateritic soil and its heterogeneity.
Framing an Nuclear Emergency Plan using Qualitative Regression Analysis
International Nuclear Information System (INIS)
Safety maintenance issues have arisen following the Fukushima disaster, and the literature on disaster scenario investigation and theory development is lacking. This study addresses the difficulty of initiating research whose purpose relates to the content and problem setting of the phenomenon. The research design therefore follows an inductive approach, in which primary findings and written reports are interpreted and codified qualitatively. These data are classified inductively through thematic analysis to develop a conceptual framework related to several theoretical lenses. Moreover, framing the expected framework of the respective emergency plan as improvised business process models involves abundant abstraction and simplification of unstructured data. The structural methods of Qualitative Regression Analysis (QRA) and the Work System snapshot were applied to form the data into the proposed model conceptualization using rigorous analyses. These methods were helpful in organising and summarizing the snapshot into an 'as-is' work system that is recommended as a 'to-be' work system for business process modelling. We conclude that these methods are useful for developing a comprehensive and structured research framework for future enhancement in business process simulation. (author)
Institute of Scientific and Technical Information of China (English)
袁小红; 于之锋; 毛端谦
2011-01-01
Urban hotels have entered a new period of vigorous development in recent years, and star-rated hotels, as the main body of the tourist hotel industry, have played a positive leading and exemplary role for the hotel industry and for tourism as a whole. Building on previous research and using 1990-2007 statistical data for the municipal districts of Nanchang, such as economic and transportation data, the paper applies multiple stepwise linear regression and path analysis to 12 factors that influence the scale and spatial distribution of star-rated hotels. The analysis identifies urban traffic conditions, tourism income and urban air quality as the main driving factors of the spatial distribution of star-rated hotels in Nanchang. In addition, a favorable investment environment and economic development policy will influence the redistribution of star-rated hotels in Nanchang to some extent.
Energy Technology Data Exchange (ETDEWEB)
Janssen, I.; Stebbings, J.H.
1990-01-01
In environmental epidemiology, trace and toxic substance concentrations frequently have very highly skewed distributions ranging over one or more orders of magnitude, and prediction by conventional regression is often poor. Classification and Regression Tree Analysis (CART) is an alternative in such contexts. To compare the techniques, two Pennsylvania data sets and three independent variables are used: house radon progeny (RnD) and gamma levels as predicted by construction characteristics in 1330 houses; and approximately 200 house radon (Rn) measurements as predicted by topographic parameters. CART may identify structural variables of interest not identified by conventional regression, and vice versa, but in general the regression models are similar. CART has major advantages in dealing with other common characteristics of environmental data sets, such as missing values, continuous variables requiring transformations, and large sets of potential independent variables. CART is most useful in the identification and screening of independent variables, greatly reducing the need for cross-tabulations and nested breakdown analyses. There is no need to discard cases with missing values for the independent variables because surrogate variables are intrinsic to CART. The tree-structured approach is also independent of the scale on which the independent variables are measured, so that transformations are unnecessary. CART identifies important interactions as well as main effects. The major advantages of CART appear to be in exploring data. Once the important variables are identified, conventional regressions seem to lead to results similar but more interpretable by most audiences. 12 refs., 8 figs., 10 tabs.
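The scale-independence claim can be checked on a single regression-tree split. The sketch below is an illustrative assumption, not the CART software used in the study: a hypothetical `best_split` helper searches for the threshold on one predictor that minimizes the summed squared error of the two group means, and is run on made-up skewed data before and after a log transform.

```python
import math

def sse(values):
    """Sum of squared deviations from the mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(xs, ys):
    """Return (total_sse, threshold, left_indices) for the best single split on x."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best = None
    for k in range(1, len(order)):
        lo, hi = order[k - 1], order[k]
        if xs[lo] == xs[hi]:
            continue  # cannot place a threshold between tied x values
        left = [ys[i] for i in order[:k]]
        right = [ys[i] for i in order[k:]]
        total = sse(left) + sse(right)
        if best is None or total < best[0]:
            best = (total, (xs[lo] + xs[hi]) / 2, frozenset(order[:k]))
    return best

# Made-up, highly skewed predictor spanning several orders of magnitude.
x = [0.1, 0.5, 1.0, 50.0, 200.0, 1000.0]
y = [1.0, 1.2, 0.9, 5.1, 4.8, 5.3]

raw = best_split(x, y)
logged = best_split([math.log(v) for v in x], y)
print(raw[1], sorted(raw[2]), sorted(logged[2]))
```

The threshold value differs between the raw and logged scales, but the cases sent to each side of the split are the same, because the split depends only on the ordering of the predictor.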
Analysis of some methods for reduced rank Gaussian process regression
DEFF Research Database (Denmark)
Quinonero-Candela, J.; Rasmussen, Carl Edward
2005-01-01
proliferation of a number of cost-effective approximations to GPs, both for classification and for regression. In this paper we analyze one popular approximation to GPs for regression: the reduced rank approximation. While generally GPs are equivalent to infinite linear models, we show that Reduced Rank...
Stone, Wesley W.; Crawford, Charles G.; Gilliom, Robert J.
2013-01-01
Watershed Regressions for Pesticides for multiple pesticides (WARP-MP) are statistical models developed to predict concentration statistics for a wide range of pesticides in unmonitored streams. The WARP-MP models use the national atrazine WARP models in conjunction with an adjustment factor for each additional pesticide. The WARP-MP models perform best for pesticides with application timing and methods similar to those used with atrazine. For other pesticides, WARP-MP models tend to overpredict concentration statistics for the model development sites. For WARP and WARP-MP, the less-than-ideal sampling frequency for the model development sites leads to underestimation of the shorter-duration concentration; hence, the WARP models tend to underpredict 4- and 21-d maximum moving-average concentrations, with median errors ranging from 9 to 38%. As a result of this sampling bias, pesticides that performed well with the model development sites are expected to have predictions that are biased low for these shorter-duration concentration statistics. The overprediction by WARP-MP apparent for some of the pesticides is variably offset by underestimation of the model development concentration statistics. Of the 112 pesticides used in the WARP-MP application to stream segments nationwide, 25 were predicted to have concentration statistics with a 50% or greater probability of exceeding one or more aquatic life benchmarks in one or more stream segments. Geographically, many of the modeled streams in the Corn Belt Region were predicted to have one or more pesticides that exceeded an aquatic life benchmark during 2009, indicating the potential vulnerability of streams in this region.
Analysis of retirement income adequacy using quantile regression: A case study in Malaysia
Alaudin, Ros Idayuwati; Ismail, Noriszura; Isa, Zaidi
2015-09-01
Quantile regression is a statistical analysis that does not restrict attention to the conditional mean and therefore permits approximation of the whole conditional distribution of a response variable. Quantile regression is also more robust to outliers than mean regression models. In this paper, we demonstrate how the quantile regression approach can be used to analyze the ratio of projected wealth to needs (wealth-needs ratio) during retirement.
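Quantile regression replaces the squared-error loss of mean regression with the check (pinball) loss, whose minimizer is a quantile rather than the mean. The sketch below illustrates only that loss for a constant predictor, not a full regression on covariates; the function names and the wealth-needs ratios are made up for illustration.

```python
def pinball_loss(y_true, y_pred, tau):
    """Average check (pinball) loss at quantile level tau (0 < tau < 1)."""
    total = 0.0
    for y, q in zip(y_true, y_pred):
        err = y - q
        # Under-predictions are weighted by tau, over-predictions by (1 - tau).
        total += tau * err if err >= 0 else (tau - 1) * err
    return total / len(y_true)

def best_constant(y, tau, candidates):
    """Candidate value that minimizes the pinball loss when used as a constant prediction."""
    return min(candidates, key=lambda c: pinball_loss(y, [c] * len(y), tau))

# Hypothetical wealth-needs ratios with one extreme household.
ratios = [0.4, 0.6, 0.8, 1.0, 1.2, 1.5, 9.0]

mean_ratio = sum(ratios) / len(ratios)
median_fit = best_constant(ratios, 0.5, ratios)
lower_fit = best_constant(ratios, 0.25, ratios)
print(round(mean_ratio, 3), median_fit, lower_fit)
```

In this toy sample the single extreme value pulls the mean above 2, while the tau = 0.5 minimizer stays at the sample median, which is the robustness property described above.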
Institute of Scientific and Technical Information of China (English)
胡泽文; 武夷山
2012-01-01
First, qualitative methods such as literature research and network investigation are applied to identify all possible factors influencing scientific and technological (S&T) output capacity and, subject to data availability, data on S&T output capacity and its influencing factors are collected for the period 1996-2008. Then, bivariate correlation analysis is used to examine the relationships between S&T output capacity and its influencing factors, and multiple linear regression is used to select the most influential factors and construct a model for analyzing influencing factors and predicting S&T output capacity. Finally, based on the results of the bivariate correlation analysis, the highly correlated factors are selected, the currently popular BP neural network method is used to predict S&T output capacity, and its predictive performance is compared with that of the multiple linear regression model.
Wong, Vivian C.; Steiner, Peter M.; Cook, Thomas D.
2013-01-01
In a traditional regression-discontinuity design (RDD), units are assigned to treatment on the basis of a cutoff score and a continuous assignment variable. The treatment effect is measured at a single cutoff location along the assignment variable. This article introduces the multivariate regression-discontinuity design (MRDD), where multiple…
Design and analysis of experiments classical and regression approaches with SAS
Onyiah, Leonard C
2008-01-01
Introductory Statistical Inference and Regression Analysis: Elementary Statistical Inference; Regression Analysis. Experiments, the Completely Randomized Design (CRD)-Classical and Regression Approaches: Experiments; Experiments to Compare Treatments; Some Basic Ideas; Requirements of a Good Experiment; One-Way Experimental Layout or the CRD: Design and Analysis; Analysis of Experimental Data (Fixed Effects Model); Expected Values for the Sums of Squares; The Analysis of Variance (ANOVA) Table; Follow-Up Analysis to Check fo
A simplified procedure of linear regression in a preliminary analysis
Directory of Open Access Journals (Sweden)
Silvia Facchinetti
2013-05-01
The analysis of a large statistical data set can be guided by the study of a particularly interesting variable Y (the regressed variable) and an explanatory variable X, chosen among the remaining variables that are jointly observed. The study gives a simplified procedure for obtaining the functional link between the variables, y = y(x), by partitioning the data set into m subsets in which the observations are summarized by location indices (mean or median) of X and Y. Polynomial models for y(x) of order r are considered to verify the characteristics of the procedure; in particular, we assume r = 1 and 2. The distributions of the parameter estimators are obtained by simulation when the fitting is done for m = r + 1. The results are compared, in terms of distribution and efficiency, with those obtained by ordinary least squares. The study also gives some considerations on the consistency of the parameters estimated by the given procedure.
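For the linear case (r = 1, so m = 2 subsets), the procedure reduces to drawing a line through two summary points. The sketch below is one possible reading, with assumptions labeled: the data are split into two equal-size groups after sorting by X, each group is summarized by its means, and the helper name `grouped_line` is hypothetical. The paper's exact partition rule is not given in the abstract.

```python
def grouped_line(xs, ys):
    """Fit y = intercept + slope * x through the (mean X, mean Y) points of two groups."""
    pairs = sorted(zip(xs, ys))  # order observations by X
    half = len(pairs) // 2
    groups = [pairs[:half], pairs[half:]]  # assumed partition: lower and upper half
    points = [
        (sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g))
        for g in groups
    ]
    (x1, y1), (x2, y2) = points
    slope = (y2 - y1) / (x2 - x1)
    return y1 - slope * x1, slope

# Made-up data lying exactly on y = 2 + 3x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0 + 3.0 * x for x in xs]
intercept, slope = grouped_line(xs, ys)
print(intercept, slope)
```

Replacing the group means with medians gives the other location index mentioned above; for a quadratic (r = 2) three groups and a three-point interpolation would be needed.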
Fast nonlinear regression method for CT brain perfusion analysis.
Bennink, Edwin; Oosterbroek, Jaap; Kudo, Kohsuke; Viergever, Max A; Velthuis, Birgitta K; de Jong, Hugo W A M
2016-04-01
Although computed tomography (CT) perfusion (CTP) imaging enables rapid diagnosis and prognosis of ischemic stroke, current CTP analysis methods have several shortcomings. We propose a fast nonlinear regression method with a box-shaped model (boxNLR) that has important advantages over the current state-of-the-art method, block-circulant singular value decomposition (bSVD). These advantages include improved robustness to attenuation curve truncation, extensibility, and unified estimation of perfusion parameters. The method is compared with bSVD and with a commercial SVD-based method. The three methods were evaluated quantitatively by means of a digital perfusion phantom, described by Kudo et al., and qualitatively with the aid of 50 clinical CTP scans. All three methods yielded high Pearson correlation coefficients with the ground truth in the phantom. The boxNLR perfusion maps of the clinical scans showed higher correlation with bSVD than the perfusion maps from the commercial method. Furthermore, it was shown that boxNLR estimates are robust to noise, truncation, and tracer delay. The proposed method provides a fast and reliable way of estimating perfusion parameters from CTP scans. This suggests it could be a viable alternative to current commercial and academic methods.
Power analysis of principal components regression in genetic association studies
Institute of Scientific and Technical Information of China (English)
Yan-feng SHEN; Jun ZHU
2009-01-01
Association analysis provides an opportunity to find genetic variants underlying complex traits. A principal components regression (PCR)-based approach was shown to outperform some competing approaches. However, a limitation of this method is that the principal components (PCs) selected from single nucleotide polymorphisms (SNPs) may be unrelated to the phenotype. In this article, we investigate the theoretical properties of such a method in more detail. We first derive the exact power function of the test based on PCR, and hence clarify the relationship between the test power and the degrees of freedom (DF). Next, we extend the PCR test to a general weighted PCs test, which provides a unified framework for understanding the properties of some related statistics. We then compare the performance of these tests. We also introduce several data-driven adaptive alternatives to overcome difficulties in the PCR approach. Finally, we illustrate our results using simulations based on real genotype data. The simulation study shows the risk of using the unsupervised rule to determine the number of PCs, and demonstrates that there is no single uniformly powerful method for detecting genetic variants.
Directory of Open Access Journals (Sweden)
Angela Radünz Lazzari
2011-01-01
Air is an efficient medium for dispersing atmospheric pollutants, and its behavior depends on the atmospheric movements that occur in the troposphere. In Porto Alegre, Rio Grande do Sul State, heavy daily traffic and a concentration of industries may be responsible for atmospheric emissions. This work studied the behavior of daily concentrations of particulate matter (PM10) in this city, considering the influence of meteorological elements. The data were analyzed using descriptive statistics, linear correlation and multiple regression. Data were provided by the Fundação Estadual de Proteção Ambiental Henrique Luiz Roessler - RS (FEPAM) and the Instituto Nacional de Meteorologia (INMET). The analyses showed that PM10 concentrations, measured daily at 16:00, did not exceed the national air quality standards. The meteorological elements influencing PM10 concentrations were the daily mean wind speed and daily mean radiation, with negative relationships, and the daily mean air temperature and the north and northwest wind directions, with positive relationships. The wind directions that contribute significantly to lowering concentrations at the measured sites are east and southeast.
Selection of higher order regression models in the analysis of multi-factorial transcription data.
Directory of Open Access Journals (Sweden)
Olivia Prazeres da Costa
INTRODUCTION: Many studies examine gene expression data that have been obtained under the influence of multiple factors, such as genetic background, environmental conditions, or exposure to diseases. The interplay of multiple factors may lead to effect modification and confounding. Higher order linear regression models can account for these effects. We present a new methodology for linear model selection and apply it to microarray data of bone marrow-derived macrophages. This experiment investigates the influence of three variable factors: the genetic background of the mice from which the macrophages were obtained, Yersinia enterocolitica infection (two strains, and a mock control), and treatment/non-treatment with interferon-γ. RESULTS: We set up four different linear regression models in a hierarchical order. We introduce the eruption plot as a new practical tool for model selection complementary to global testing. It visually compares the size and significance of effect estimates between two nested models. Using this methodology we were able to select the most appropriate model by keeping only relevant factors showing additional explanatory power. Application to experimental data allowed us to qualify the interaction of factors as either neutral (no interaction), alleviating (co-occurring effects are weaker than expected from the single effects), or aggravating (stronger than expected). We find a biologically meaningful gene cluster of putative C2TA target genes that appear to be co-regulated with MHC class II genes. CONCLUSIONS: We introduced the eruption plot as a tool for visual model comparison to identify relevant higher order interactions in the analysis of expression data obtained under the influence of multiple factors. We conclude that model selection in higher order linear regression models should generally be performed for the analysis of multi-factorial microarray data.
INFLUENCE OF TOURISM SECTOR IN ALBANIAN GDP: STIMATION USING MULTIPLE REGRESSION METHOD
Directory of Open Access Journals (Sweden)
Eglantina HYSA
2012-06-01
In recent years, the tourism sector has grown significantly in Albania, since after 1990 Albania passed from a centralized economy to a liberal one. The tourism sector plays an important role in economic and social development. The contributions of this sector are reflected directly in the generation of national income. The two main components of tourism movements are the number of tourists and the number of overnight stays in hotels. Investments made in this sector could be expected to have a strong positive influence on the country's GDP. This study seeks to identify the influence of tourists, their overnight stays in hotels and capital investment spending by all sectors directly involved in tourism on the total contribution of tourism to the gross domestic product of Albania during 1996-2009. A regression analysis was performed taking GDP generated by the tourism sector as the dependent variable and capital investment, number of tourists and overnight stays in hotels as independent variables. Although all the variables were found to be positively related, the variable 'overnights of foreigners and Albanians in hotels' was found to be insignificant.
Institute of Scientific and Technical Information of China (English)
薛刚; 朱庆生; 朱锦宇; 姜炜
2013-01-01
Objective: To measure, on X-ray images, the radiographic parameters of knee joints with patellofemoral pain syndrome (PFPS) and to perform multiple linear regression analysis of these parameters against the WOMAC, KUJALA and MELBOURNE scoring systems. Methods: A total of 49 patients (51 knees) were selected according to inclusion and exclusion criteria. Ten parameters related to PFPS were measured: distal femoral valgus angle (DFVA, X1), proximal tibial varus angle (PTVA, X2), femoral angle (FA, X3), tibial angle (TA, X4), tibiofemoral angle (TFA, X5), Insall-Salvati ratio (ISR, X6), sulcus angle (SA, X7), lateral patellofemoral angle (LPA, X8), congruence angle (CA, X9) and patellofemoral index (PI, X10). The knees were scored with WOMAC, KUJALA and MELBOURNE, and multiple linear regression equations were used to analyze the correlation between the radiographic parameters and the scores. Results: All three multiple linear regression equations were statistically significant (P < 0.05). WOMAC score: Y = -213.742 + 2.011 X5, F = 3.960, R2 = 0.494; KUJALA score: Y = 125.835 - 24.475 X6 - 0.341 X7 - 0.992 X8, F = 32.732, R2 = 0.891; MELBOURNE score: Y = 51.66 - 16.329 X6 - 5.47 X10, F = 22.178, R2 = 0.856. Conclusions: (1) X-ray measurements of the knee joint reflect, to a certain extent, the three scores and the functional status of the knee. (2) The KUJALA score evaluates PFPS fairly comprehensively; the Insall-Salvati ratio, sulcus angle and lateral patellofemoral angle on axial radiographs are the more important parameters and can be used clinically to assess functional recovery of PFPS patients before and after treatment. (3) Because the KUJALA and MELBOURNE scores have larger coefficients of determination and smaller standard errors of the regression coefficients, score values determined through statistical control can be used clinically to evaluate the radiographic parameters.
The Evolution of GDP in USA Using Polynomial Regression Analysis
Directory of Open Access Journals (Sweden)
Catalin Angelo Ioan
2013-10-01
The paper deals with the problem of statistical forecasts in terms of polynomial regression. It compares actual results with predicted values, using data subsets that sequentially run through the whole initial data set.
The Evolution of GDP in USA Using Polynomial Regression Analysis
Catalin Angelo Ioan; Gina Ioan
2013-01-01
The paper deals with the problem of statistical forecasts in terms of polynomial regression. It compares actual results with predicted values, using data subsets that sequentially run through the whole initial data set.
Regression analysis of technical parameters affecting nuclear power plant performances
International Nuclear Information System (INIS)
Since the 1980s many studies have been conducted to explain the good and bad performance of commercial nuclear power plants (NPPs), but no defined correlation has yet been found that is fully representative of plant operational experience. In early works, data availability and the number of operating power stations were both limited; therefore, results suggested that specific technical characteristics of NPPs were the main causal factors for successful plant operation. Although these aspects continue to play a significant role, later studies and observations showed that other factors concerning management and organization of the plant could instead be predominant when comparing utilities' operational and economic results. Utility quality, in a word, can be used to summarize all the managerial and operational aspects that seem to be effective in determining plant performance. In this paper operational data of a consistent sample of commercial nuclear power stations, out of the total 433 operating NPPs, are analyzed, mainly focusing on the operational experience of the last decade. The sample consists of PWR and BWR technology, operated by utilities located in different countries, including the U.S., Japan, France, Germany and Finland. Multivariate regression is performed using Unit Capability Factor (UCF) as the dependent variable; this factor reflects the effectiveness of plant programs and practices in maximizing the available electrical generation and consequently provides an overall indication of how well plants are operated and maintained. Aspects that may not be real causal factors but which can have a consistent impact on the UCF, such as technology design, supplier, size and age, are included in the analysis as independent variables. (authors)
Institute of Scientific and Technical Information of China (English)
汪文生; 张娟
2016-01-01
The logistics development level of a country or region is closely related to the share of GDP accounted for by its total social logistics costs. In theory, this ratio should not be too high, and finding effective measures to reduce it has become a top priority. Based on a multiple regression model, this paper uses Eviews software to carry out regression analysis and ADF and AEG cointegration tests for four factors that may influence the ratio. The results show that industrial structure has the most significant influence on the share of total social logistics costs in GDP; employment in the logistics industry takes second place; logistics infrastructure investment and the level of economic development have no significant influence. On this basis, the economic reasons behind the regression results are analyzed, and feasible policy suggestions are given in light of actual conditions.
Buck, J. A.; Underhill, P. R.; Morelli, J.; Krause, T. W.
2016-02-01
Nuclear steam generators (SGs) are a critical component for ensuring safe and efficient operation of a reactor. Life management strategies are implemented in which SG tubes are regularly inspected by conventional eddy current testing (ECT) and ultrasonic testing (UT) technologies to size flaws, and safe operating life of SGs is predicted based on growth models. ECT, the more commonly used technique, due to the rapidity with which full SG tube wall inspection can be performed, is challenged when inspecting ferromagnetic support structure materials in the presence of magnetite sludge and multiple overlapping degradation modes. In this work, an emerging inspection method, pulsed eddy current (PEC), is being investigated to address some of these particular inspection conditions. Time-domain signals were collected by an 8 coil array PEC probe in which ferromagnetic drilled support hole diameter, depth of rectangular tube frets and 2D tube off-centering were varied. Data sets were analyzed with a modified principal components analysis (MPCA) to extract dominant signal features. Multiple linear regression models were applied to MPCA scores to size hole diameter as well as size rectangular outer diameter tube frets. Models were improved through exploratory factor analysis, which was applied to MPCA scores to refine selection for regression models inputs by removing nonessential information.
Buffalos milk yield analysis using random regression models
Directory of Open Access Journals (Sweden)
A.S. Schierholt
2010-02-01
Data comprising 1,719 milk yield records from 357 females (predominantly Murrah breed), daughters of 110 sires, with births from 1974 to 2004, obtained from the Programa de Melhoramento Genético de Bubalinos (PROMEBUL) and from records of the EMBRAPA Amazônia Oriental (EAO) herd, located in Belém, Pará, Brazil, were used to compare random regression models for estimating variance components and predicting breeding values of the sires. The data were analyzed by different models using Legendre polynomial functions from second to fourth order. The random regression models included the effects of herd-year and month of parity date of the control; regression coefficients for age of females (in order to describe the fixed part of the lactation curve); and random regression coefficients related to the direct genetic and permanent environment effects. The comparisons among the models were based on the Akaike Information Criterion. The random effects regression model using third order Legendre polynomials with four classes of the environmental effect was the one that best described the additive genetic variation in milk yield. The heritability estimates varied from 0.08 to 0.40. The genetic correlation between milk yields at younger ages was close to unity, but at older ages it was low.
Quantile regression provides a fuller analysis of speed data.
Hewson, Paul
2008-03-01
Considerable interest already exists in terms of assessing percentiles of speed distributions, for example monitoring the 85th percentile speed is a common feature of the investigation of many road safety interventions. However, unlike the mean, where t-tests and ANOVA can be used to provide evidence of a statistically significant change, inference on these percentiles is much less common. This paper examines the potential role of quantile regression for modelling the 85th percentile, or any other quantile. Given that crash risk may increase disproportionately with increasing relative speed, it may be argued these quantiles are of more interest than the conditional mean. In common with the more usual linear regression, quantile regression admits a simple test as to whether the 85th percentile speed has changed following an intervention in an analogous way to using the t-test to determine if the mean speed has changed by considering the significance of parameters fitted to a design matrix. Having briefly outlined the technique and briefly examined an application with a widely published dataset concerning speed measurements taken around the introduction of signs in Cambridgeshire, this paper will demonstrate the potential for quantile regression modelling by examining recent data from Northamptonshire collected in conjunction with a "community speed watch" programme. Freely available software is used to fit these models and it is hoped that the potential benefits of using quantile regression methods when examining and analysing speed data are demonstrated. PMID:18329400
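The idea of modelling the 85th percentile directly can be illustrated with a minimal sketch. It assumes made-up speed readings (the `before` and `after` lists are hypothetical, not the Cambridgeshire or Northamptonshire data) and a design with only an intercept and an intervention indicator, in which case the quantile regression fit reduces to the two group quantiles and the indicator coefficient is their difference. It does not reproduce the significance test described in the abstract.

```python
def pinball_loss(residual, tau):
    """Check (pinball) loss that quantile regression minimises."""
    return tau * residual if residual >= 0 else (tau - 1) * residual


def constant_quantile_fit(values, tau):
    """Best constant under pinball loss; a minimiser always lies on a data point."""
    return min(values, key=lambda c: sum(pinball_loss(v - c, tau) for v in values))


# Hypothetical spot speeds (mph) before and after an intervention.
before = [28, 30, 31, 33, 35, 36, 38, 40, 44, 52]
after = [26, 28, 30, 31, 32, 34, 35, 37, 40, 45]

tau = 0.85
q85_before = constant_quantile_fit(before, tau)
q85_after = constant_quantile_fit(after, tau)

# With an intercept plus a 0/1 intervention indicator, the fitted
# coefficients are the "before" quantile and the change in that quantile.
intercept = q85_before
intervention_effect = q85_after - q85_before
print(intercept, intervention_effect)
```

With continuous covariates the minimisation is usually handed to a linear-programming solver in a statistics package rather than searched over data points as here.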
DEFF Research Database (Denmark)
Østergaard, Søren; Ettema, Jehan Frans; Hjortø, Line;
Multiple regression and model building with mediator variables were addressed to avoid double counting when economic values are estimated from data simulated with herd simulation modeling (using the SimHerd model). The simulated incidence of metritis was analyzed statistically as the independent variable, while using the traits representing the direct effects of metritis on yield, fertility and occurrence of other diseases as mediator variables. The economic value of metritis was estimated to be €78 per 100 cow-years for each 1% increase of metritis in the period of 1-100 days in milk in multiparous cows. The merit of using this approach was demonstrated since the economic value of metritis was estimated to be 81% higher when no mediator variables were included in the multiple regression analysis...
Institute of Scientific and Technical Information of China (English)
樊丽军
2016-01-01
To address the saving and effective use of urban building energy, a model for energy consumption forecasting and energy saving analysis of urban buildings based on a multiple linear regression model (MLRP) is proposed. The paper takes natural gas and electric power as the energy consumption targets and uses building type, building age, floor area and number of residents as input features. The multiple linear regression model is used to identify the factors that have a significant influence on energy consumption and to forecast the energy consumption of the whole region. In addition, the prediction model can be used to evaluate the energy saving potential of buildings after improvement measures are implemented. The experiments give the building energy saving potential under various scenarios, and the analysis shows that the proposed model can predict regional energy consumption accurately.
Regression Analysis of Variables Describing Poultry Meat Supply in European Countries
Directory of Open Access Journals (Sweden)
Simonič Miro
2012-11-01
In this paper, based on the analysis of official FAOSTAT and EUROSTAT data on poultry meat for 38 European countries for the years 2007 and 2009, two hypotheses were examined. Firstly, considering four clustering variables on poultry meat, i.e. production, export and import in kg/capita, as well as the producer price in US $/t, and using descriptive exploratory and cluster analysis, the hypothesis that clusters of countries may be recognized was confirmed. As a result, six clusters of similar countries were distinguished. Secondly, based on multiple regression analysis, this paper shows that there is a statistically significant dependence of poultry meat production on export and import of that kind of meat, all measured in kg/capita. There is also a high correlation between production, as the dependent variable, and each of the two independent variables.
Directory of Open Access Journals (Sweden)
Renfu Jia
2016-01-01
This paper introduces an integrated approach to identify the major factors influencing the efficiency of irrigation water use in China. It combines multiple stepwise regression (MSR) and principal component analysis (PCA) to obtain more realistic results. In real-world case studies, the classical linear regression model often involves too many explanatory variables, and linear correlation among the variables cannot be eliminated. Linearly correlated variables invalidate the results of the factor analysis. To overcome this issue and reduce the number of variables, PCA has been used in combination with MSR. The irrigation water use status in China was thus analyzed to identify the five major factors that have significant impacts on irrigation water use efficiency. To illustrate the performance of the proposed approach, a calculation based on real data was conducted, and the results are shown in this paper.
Institute of Scientific and Technical Information of China (English)
王华丽
2014-01-01
Hotel staff satisfaction has long attracted attention from both the hotel industry and academia. In this paper, basic data are obtained through a survey of high-star hotels in Changsha, and multiple regression analysis is used to study the factors influencing hotel staff satisfaction. The results indicate that promotion prospects have the largest impact on employee satisfaction, followed by compensation, while the influence of the work itself is not statistically significant.
Grades, Gender, and Encouragement: A Regression Discontinuity Analysis
Owen, Ann L.
2010-01-01
The author employs a regression discontinuity design to provide direct evidence on the effects of grades earned in economics principles classes on the decision to major in economics and finds a differential effect for male and female students. Specifically, for female students, receiving an A for a final grade in the first economics class is…
Teaching Quantitative Literacy through a Regression Analysis of Exam Performance
Lindner, Andrew M.
2012-01-01
Quantitative literacy is increasingly essential for both informed citizenship and a variety of careers. Though regression is one of the most common methods in quantitative sociology, it is rarely taught until late in students' college careers. In this article, the author describes a classroom-based activity introducing students to regression…
Parsons, Vickie S.
2009-01-01
The request to conduct an independent review of regression models, developed for determining the expected Launch Commit Criteria (LCC) External Tank (ET)-04 cycle count for the Space Shuttle ET tanking process, was submitted to the NASA Engineering and Safety Center (NESC) on September 20, 2005. The NESC team performed an independent review of regression models documented in Prepress Regression Analysis, Tom Clark and Angela Krenn, 10/27/05. This consultation consisted of a peer review by statistical experts of the proposed regression models provided in the Prepress Regression Analysis. This document is the consultation's final report.
Kleijnen, J.P.C.
1995-01-01
This tutorial discusses what-if analysis and optimization of System Dynamics models. These problems are solved, using the statistical techniques of regression analysis and design of experiments (DOE). These issues are illustrated by applying the statistical techniques to a System Dynamics model for
Use of Structure Coefficients in Published Multiple Regression Articles: Beta Is Not Enough.
Courville, Troy; Thompson, Bruce
2001-01-01
Reviewed articles published in the "Journal of Applied Psychology" (JAP) to determine how interpretations might have differed if standardized regression coefficients and structure coefficients (or bivariate "r"s of predictors with the criterion) had been interpreted. Summarizes some dramatic misinterpretations or incomplete interpretations.…
DEFF Research Database (Denmark)
Larsen, Ulrik; Pierobon, Leonardo; Wronski, Jorrit;
2014-01-01
to power. In this study we propose four linear regression models to predict the maximum obtainable thermal efficiency for simple and recuperated ORCs. A previously derived methodology is able to determine the maximum thermal efficiency among many combinations of fluids and processes, given the boundary...
Wong, Vivian C.; Steiner, Peter M.; Cook, Thomas D.
2009-01-01
This paper introduces a generalization of the regression-discontinuity design (RDD). Traditionally, RDD is considered in a two-dimensional framework, with a single assignment variable and cutoff. Treatment effects are measured at a single location along the assignment variable. However, this represents a specialized (and straight-forward)…
Wong, Vivian C.; Steiner, Peter M.; Cook, Thomas D.
2012-01-01
In a traditional regression-discontinuity design (RDD), units are assigned to treatment and comparison conditions solely on the basis of a single cutoff score on a continuous assignment variable. The discontinuity in the functional form of the outcome at the cutoff represents the treatment effect, or the average treatment effect at the cutoff.…
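The single-cutoff estimate described above can be sketched as follows. This is a minimal illustration under assumptions that are not in the abstract: a sharp design, a linear functional form on each side of the cutoff, and noise-free synthetic data with a built-in jump of 3. The names `fit_line` and `sharp_rdd_effect` are hypothetical helpers written for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def sharp_rdd_effect(scores, outcomes, cutoff):
    """Difference between the two fitted lines, evaluated at the cutoff."""
    left = [(s, y) for s, y in zip(scores, outcomes) if s < cutoff]
    right = [(s, y) for s, y in zip(scores, outcomes) if s >= cutoff]
    a0, b0 = fit_line([s for s, _ in left], [y for _, y in left])
    a1, b1 = fit_line([s for s, _ in right], [y for _, y in right])
    return (a1 + b1 * cutoff) - (a0 + b0 * cutoff)


# Synthetic data: units with score >= 0 are treated and gain 3 outcome points.
scores = list(range(-10, 11))
outcomes = [2 + 0.5 * s + (3 if s >= 0 else 0) for s in scores]

effect = sharp_rdd_effect(scores, outcomes, cutoff=0)
print(round(effect, 6))
```

Applied work would typically restrict the fit to a bandwidth around the cutoff and check sensitivity to the functional form; both are omitted here for brevity.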
Point Estimates and Confidence Intervals for Variable Importance in Multiple Linear Regression
Thomas, D. Roland; Zhu, PengCheng; Decady, Yves J.
2007-01-01
The topic of variable importance in linear regression is reviewed, and a measure first justified theoretically by Pratt (1987) is examined in detail. Asymptotic variance estimates are used to construct individual and simultaneous confidence intervals for these importance measures. A simulation study of their coverage properties is reported, and an…
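As a point of reference for the importance measure discussed above, here is a minimal sketch of Pratt's measure for the two-predictor case, where the share attributed to predictor j is its standardized coefficient times its correlation with the response, divided by R². The data are hypothetical, and the confidence intervals studied in the paper are not reproduced.

```python
from math import sqrt


def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    saa = sum((x - mean_a) ** 2 for x in a)
    sbb = sum((y - mean_b) ** 2 for y in b)
    return sab / sqrt(saa * sbb)


def pratt_two_predictors(x1, x2, y):
    """Return (d1, d2, R^2) where d_j = beta_j * r_j / R^2."""
    r1, r2, r12 = corr(x1, y), corr(x2, y), corr(x1, x2)
    # Standardized regression coefficients for two predictors (closed form).
    beta1 = (r1 - r12 * r2) / (1 - r12 ** 2)
    beta2 = (r2 - r12 * r1) / (1 - r12 ** 2)
    r_squared = beta1 * r1 + beta2 * r2
    return beta1 * r1 / r_squared, beta2 * r2 / r_squared, r_squared


# Hypothetical data, for illustration only.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [3, 4, 8, 8, 13, 12]

d1, d2, r_squared = pratt_two_predictors(x1, x2, y)
print(d1, d2, r_squared)
```

The shares sum to one by construction, which is what makes the measure read as a partition of R²; individual shares can be negative when a coefficient and its correlation differ in sign.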
REGRESSION ANALYSIS OF PRODUCTIVITY USING MIXED EFFECT MODEL
Directory of Open Access Journals (Sweden)
Siana Halim
2007-01-01
The production plants of a company are located in several areas spread across Middle and East Java. As the production process relies mostly on manpower, we suspected that each location has different characteristics affecting productivity. Thus, the production data may have a spatial and hierarchical structure. To fit a linear regression using ordinary techniques, we are required to make some assumptions about the nature of the residuals, i.e. that they are independent, identically and normally distributed. However, these assumptions are rarely fulfilled, especially for data that have a spatial and hierarchical structure. We worked out the problem using a mixed effect model. This paper discusses the construction of a model of productivity and several characteristics of the production line, taking location as a random effect. A simple model with high utility that satisfies the necessary regression assumptions was built using the free statistical software R, version 2.6.1.
Zhi, Shuai; Li, Qiaozhi; Yasui, Yutaka; Banting, Graham; Edge, Thomas A; Topp, Edward; McAllister, Tim A; Neumann, Norman F
2016-10-01
Several studies have demonstrated that E. coli appears to display some level of host adaptation and specificity. Recent studies in our laboratory support these findings as determined by logic regression modeling of single nucleotide polymorphisms (SNP) in intergenic regions (ITGRs). We sought to determine the degree of host-specific information encoded in various ITGRs across a library of animal E. coli isolates using both whole genome analysis and a targeted ITGR sequencing approach. Our findings demonstrated that ITGRs across the genome encode various degrees of host-specific information. Incorporating multiple ITGRs (i.e., concatenation) into logic regression model building resulted in greater host-specificity and sensitivity outcomes in biomarkers, but the overall level of polymorphism in an ITGR did not correlate with the degree of host-specificity encoded in the ITGR. This suggests that distinct SNPs in ITGRs may be more important in defining host-specificity than overall sequence variation, explaining why traditional unsupervised learning phylogenetic approaches may be less informative in terms of revealing host-specific information encoded in DNA sequence. In silico analysis of 80 candidate ITGRs from publically available E. coli genomes was performed as a tool for discovering highly host-specific ITGRs. In one ITGR (ydeR-yedS) we identified a SNP biomarker that was 98% specific for cattle and for which 92% of all E. coli isolates originating from cattle carried this unique biomarker. In the case of humans, a host-specific biomarker (98% specificity) was identified in the concatenated ITGR sequences of rcsD-ompC, ydeR-yedS, and rclR-ykgE, and for which 78% of E. coli originating from humans carried this biomarker. Interestingly, human-specific biomarkers were dominant in ITGRs regulating antibiotic resistance, whereas in cattle host-specific biomarkers were found in ITGRs involved in stress regulation. These data suggest that evolution towards host
Model performance analysis and model validation in logistic regression
Directory of Open Access Journals (Sweden)
Rosa Arboretti Giancristofaro
2007-10-01
In this paper a new model validation procedure for a logistic regression model is presented. At first, we illustrate a brief review of different techniques of model validation. Next, we define a number of properties required for a model to be considered "good", and a number of quantitative performance measures. Lastly, we describe a methodology for the assessment of the performance of a given model by using an example taken from a management study.
Retirement patterns in Hong Kong: A censored regression analysis
Wing Suen
1997-01-01
This paper provides an overview of retirement patterns in Hong Kong on the basis of limited data. A censored regression model is used to infer the retirement age from people's current retirement status and their current age. This model is equivalent to a restricted probit model, and the interpretation of parameters is straightforward. The results clearly show a negative income effect on the retirement decision. The retirement age seems to be positively related to lifetime earnings but negativ...
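One way to read the stated equivalence between the censored regression and a restricted probit is sketched below. The notation is an assumption made for illustration and is not taken from the paper: $R_i^*$ is the latent retirement age, $a_i$ the current age, $x_i$ the covariates, and $d_i$ the observed retirement status.

```latex
\begin{align}
R_i^* &= x_i'\beta + \sigma\varepsilon_i, \qquad \varepsilon_i \sim N(0,1), \\
d_i &= \mathbf{1}\{R_i^* \le a_i\}, \\
\Pr(d_i = 1 \mid a_i, x_i) &= \Phi\!\left(\frac{a_i - x_i'\beta}{\sigma}\right)
  = \Phi\!\left(\tfrac{1}{\sigma}\,a_i - x_i'\tfrac{\beta}{\sigma}\right).
\end{align}
```

Under this reading the probit coefficient on current age equals $1/\sigma$, so $\beta$ can be recovered in units of years by dividing the remaining probit coefficients by the age coefficient and reversing the sign, which would explain why the parameters are described as straightforward to interpret.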
Hermite Regression Analysis of Multi-Modal Count Data
David E. Giles
2010-01-01
We discuss the modeling of count data whose empirical distribution is both multi-modal and over-dispersed, and propose the Hermite distribution with covariates introduced through the conditional mean. The model is readily estimated by maximum likelihood, and nests the Poisson model as a special case. The Hermite regression model is applied to data for the number of banking and currency crises in IMF-member countries, and is found to out-perform the Poisson and negative binomial models.
Directory of Open Access Journals (Sweden)
Ondrej eLibiger
2015-12-01
Full Text Available It is now feasible to examine the composition and diversity of microbial communities (i.e., `microbiomes‘ that populate different human organs and orifices using DNA sequencing and related technologies. To explore the potential links between changes in microbial communities and various diseases in the human body, it is essential to test associations involving different species within and across microbiomes, environmental settings and disease states. Although a number of statistical techniques exist for carrying out relevant analyses, it is unclear which of these techniques exhibit the greatest statistical power to detect associations given the complexity of most microbiome datasets. We compared the statistical power of principal component regression, partial least squares regression, regularized regression, distance-based regression, Hill's diversity measures, and a modified test implemented in the popular and widely used microbiome analysis methodology 'Metastats‘ across a wide range of simulated scenarios involving changes in feature abundance between two sets of metagenomic samples. For this purpose, simulation studies were used to change the abundance of microbial species in a real dataset from a published study examining human hands. Each technique was applied to the same data, and its ability to detect the simulated change in abundance was assessed. We hypothesized that a small subset of methods would outperform the rest in terms of the statistical power. Indeed, we found that the Metastats technique modified to accommodate multivariate analysis and partial least squares regression yielded high power under the models and data sets we studied. The statistical power of diversity measure-based tests, distance-based regression and regularized regression was significantly lower. Our results provide insight into powerful analysis strategies that utilize information on species counts from large microbiome data sets exhibiting skewed frequency
Elvio Giasson; Robin Thomas Clarke; Alberto Vasconcellos Inda Junior; Gustavo Henrique Merten; Carlos Gustavo Tornquist
2006-01-01
Soil surveys are necessary sources of information for land use planning, but they are not always available. This study proposes the use of multiple logistic regressions on the prediction of occurrence of soil types based on reference areas. From a digitalized soil map and terrain parameters derived from the digital elevation model in ArcView environment, several sets of multiple logistic regressions were defined using statistical software Minitab, establishing relationship between explanatory...
Regression analysis of censored data using pseudo-observations
DEFF Research Database (Denmark)
Parner, Erik T.; Andersen, Per Kragh
2010-01-01
We draw upon a series of articles in which a method based on pseudovalues is proposed for direct regression modeling of the survival function, the restricted mean, and the cumulative incidence function in competing risks with right-censored data. The models, once the pseudovalues have been computed, can be fit using standard generalized estimating equation software. Here we present Stata procedures for computing these pseudo-observations. An example from a bone marrow transplantation study is used to illustrate the method....
Directory of Open Access Journals (Sweden)
Ya-Nan Ma
BACKGROUND: There have been few published studies on spirometric reference values for healthy children in China. We hypothesize that there would have been changes in lung function that would not have been precisely predicted by the existing spirometric reference equations. The objective of the study was to develop more accurate predictive equations for spirometric reference values for children aged 9 to 15 years in Northeast China. METHODOLOGY/PRINCIPAL FINDINGS: Spirometric measurements were obtained from 3,922 children, including 1,974 boys and 1,948 girls, who were randomly selected from five cities of Liaoning province, Northeast China, using the ATS (American Thoracic Society) and ERS (European Respiratory Society) standards. The data were then randomly split into a training subset containing 2,078 cases and a validation subset containing 1,844 cases. Predictive equations used multiple linear regression techniques with three predictor variables: height, age and weight. Model goodness of fit was examined using the coefficient of determination (R²) and adjusted R². The predicted values were compared with those obtained from the existing spirometric reference equations. The results showed that the prediction equations using linear regression analysis performed well for most spirometric parameters. Paired t-tests were used to compare the predicted values obtained from the developed and existing spirometric reference equations based on the validation subset. The t-test for males was not statistically significant (p>0.01). The predictive accuracy of the developed equations was higher than that of the existing equations, and the predictive ability of the model was also validated. CONCLUSION/SIGNIFICANCE: We developed prediction equations using linear regression analysis of spirometric parameters for children aged 9-15 years in Northeast China. These equations represent the first attempt at predicting lung function for Chinese children following the ATS
DEFF Research Database (Denmark)
Mears, Lisa; Nørregaard, Rasmus; Sin, Gürkan;
2016-01-01
This work proposes a methodology utilizing functional unfold principal component regression (FUPCR), for application to industrial batch process data as a process modeling and optimization tool. The methodology is applied to an industrial fermentation dataset, containing 30 batches of a production process operating at Novozymes A/S. Following the FUPCR methodology, the final product concentration could be predicted with an average prediction error of 7.4%. Multiple iterations of preprocessing were applied by implementing the methodology to identify the best data handling methods for the model. It is shown that application of functional data analysis and the choice of variance scaling method have the greatest impact on the prediction accuracy. Considering the vast amount of batch process data continuously generated in industry, this methodology can potentially contribute as a tool to identify...
Normalization Ridge Regression in Practice II: The Estimation of Multiple Feedback Linkages.
Bulcock, J. W.
The use of the two-stage least squares (2 SLS) procedure for estimating nonrecursive social science models is often impractical when multiple feedback linkages are required. This is because 2 SLS is extremely sensitive to multicollinearity. The standard statistical solution to the multicollinearity problem is a biased, variance reduced procedure…
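Judging by the title, the "biased, variance reduced procedure" referred to here is ridge regression. The sketch below is not taken from the paper; it shows the ordinary ridge estimator (X'X + kI)^-1 X'y on standardized predictors, with synthetic, nearly collinear data and a hypothetical function name, to illustrate how a small ridge constant k stabilizes coefficients that least squares estimates poorly.

```python
import numpy as np

def ridge(X, y, k):
    """Ridge coefficients (X'X + k I)^-1 X'y on standardized X; k = 0 gives OLS."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = y - y.mean()
    p = Xs.shape[1]
    return np.linalg.solve(Xs.T @ Xs + k * np.eye(p), Xs.T @ yc)

# Synthetic example: x2 is almost a copy of x1, so the predictors are
# highly multicollinear.
rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.01, size=200)
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.normal(scale=0.5, size=200)

b_ols = ridge(X, y, k=0.0)    # unbiased but unstable under collinearity
b_ridge = ridge(X, y, k=5.0)  # biased, with a smaller coefficient norm
print("OLS:  ", b_ols)
print("ridge:", b_ridge)
```

The ridge constant trades a small bias for a large reduction in variance; the value k = 5.0 here is arbitrary and would normally be chosen by cross-validation or a ridge trace.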
An Analysis of Bank Service Satisfaction Based on Quantile Regression and Grey Relational Analysis
Wen-Tsao Pan; Yungho Leu
2016-01-01
Bank service satisfaction is vital to the success of a bank. In this paper, we propose to use the grey relational analysis to gauge the levels of service satisfaction of the banks. With the grey relational analysis, we compared the effects of different variables on service satisfaction. We gave ranks to the banks according to their levels of service satisfaction. We further used the quantile regression model to find the variables that affected the satisfaction of a customer at a specific quan...
McIntosh, Chris; Purdie, Thomas G
2016-04-01
Radiation therapy is an integral part of cancer treatment, but to date it remains highly manual. Plans are created through optimization of dose volume objectives that specify intent to minimize, maximize, or achieve a prescribed dose level to clinical targets and organs. Optimization is NP-hard, requiring highly iterative and manual initialization procedures. We present a proof-of-concept for a method to automatically infer the radiation dose directly from the patient's treatment planning image based on a database of previous patients with corresponding clinical treatment plans. Our method uses regression forests augmented with density estimation over the most informative features to learn an automatic atlas-selection metric that is tailored to dose prediction. We validate our approach on 276 patients from 3 clinical treatment plan sites (whole breast, breast cavity, and prostate), with overall dose prediction accuracies of 78.68%, 64.76%, and 86.83% under the Gamma metric. PMID:26660888
Functional Multiple-Set Canonical Correlation Analysis
Hwang, Heungsun; Jung, Kwanghee; Takane, Yoshio; Woodward, Todd S.
2012-01-01
We propose functional multiple-set canonical correlation analysis for exploring associations among multiple sets of functions. The proposed method includes functional canonical correlation analysis as a special case when only two sets of functions are considered. As in classical multiple-set canonical correlation analysis, computationally, the…
Exergy Analysis of a Subcritical Reheat Steam Power Plant with Regression Modeling and Optimization
Directory of Open Access Journals (Sweden)
MUHIB ALI RAJPER
2016-07-01
In this paper, exergy analysis of a 210 MW SPP (Steam Power Plant) is performed. Firstly, the plant is modeled and validated, followed by a parametric study to show the effects of various operating parameters on the performance parameters. The net power output, energy efficiency, and exergy efficiency are taken as the performance parameters, while the condenser pressure, main steam pressure, bled steam pressures, main steam temperature, and reheat steam temperature are nominated as the operating parameters. Moreover, multiple polynomial regression models are developed to correlate each performance parameter with the operating parameters. The performance is then optimized using the direct-search method. According to the results, the net power output, energy efficiency, and exergy efficiency are calculated as 186.5 MW, 31.37 and 30.41%, respectively, under normal operating conditions as a base case. The condenser is the major contributor to the energy loss, followed by the boiler, whereas the highest irreversibilities occur in the boiler and turbine. According to the parametric study, variation in the operating parameters greatly influences the performance parameters. The regression models proved to be good estimators of the performance parameters. The optimum net power output, energy efficiency and exergy efficiency are obtained as 227.6 MW, 37.4 and 36.4%, respectively, calculated along with the optimal values of the selected operating parameters.
Institute of Scientific and Technical Information of China (English)
NURWAHA Deogratias; WANG Xin-hou
2008-01-01
This paper presents a comparison study of two models for predicting the strength of rotor spun cotton yarns from fiber properties. The adaptive neuro-fuzzy inference system (ANFIS) and multiple linear regression models are used to predict the rotor spun yarn strength. Fiber properties and yarn count are used as inputs to train the two models, and the count-strength-product (CSP) is the target. The predictive performances of the two models are estimated and compared. We found that the ANFIS has better predictive power than the multiple linear regression model. The impact of each fiber property is also illustrated.
Survival analysis of cervical cancer using stratified Cox regression
Purnami, S. W.; Inayati, K. D.; Sari, N. W. Wulan; Chosuvivatwong, V.; Sriplung, H.
2016-04-01
Cervical cancer is one of the most common causes of cancer death among women worldwide, including in Indonesia. Most cervical cancer patients come to the hospital already at an advanced stage. As a result, treatment becomes more difficult and the risk of death can increase. One parameter that can be used to assess the success of treatment is the probability of survival. This study examines the survival of cervical cancer patients at Dr. Soetomo Hospital using stratified Cox regression based on six factors: age, stage, treatment initiation, companion disease, complication, and anemia. The stratified Cox model is used because one independent variable, stage, does not satisfy the proportional hazards assumption. The results of the stratified Cox model show that complication is a significant factor influencing the survival probability of cervical cancer patients. The estimated hazard ratio is 7.35, meaning that a cervical cancer patient with a complication has a risk of dying 7.35 times greater than a patient without one. The adjusted survival curves showed that stage IV had the lowest probability of survival.
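The reported hazard ratio can be related to the underlying Cox coefficient with a short calculation. In a Cox model the hazard ratio for a binary covariate is exp(beta), so the reported 7.35 corresponds to beta = ln(7.35). The sketch below also shows the proportional-hazards identity S1(t) = S0(t)^HR; the baseline survival value of 0.90 is purely hypothetical and is not from the study.

```python
import math

hazard_ratio = 7.35             # reported in the abstract (complication vs. none)
beta = math.log(hazard_ratio)   # implied Cox regression coefficient

def relative_hazard(beta, x):
    """Hazard relative to baseline for covariate value x (0 = no complication)."""
    return math.exp(beta * x)

# Under proportional hazards, survival with the covariate present is the
# baseline survival raised to the power of the hazard ratio.
baseline_survival = 0.90        # hypothetical value, for illustration only
survival_with_complication = baseline_survival ** hazard_ratio

print(f"beta = {beta:.4f}")
print(f"survival with complication = {survival_with_complication:.3f}")
```

The example makes the size of the effect concrete: a modest drop in baseline survival becomes a large drop once the hazard is multiplied by more than seven.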
Factors associated with methadone treatment duration: a Cox regression analysis.
Directory of Open Access Journals (Sweden)
Chao-Kuang Lin
This study examined retention rates and associated predictors of methadone maintenance treatment (MMT) duration among 128 newly admitted patients in Taiwan. A semi-structured questionnaire was used to obtain demographic data and drug use history. Daily records of methadone taken and test results for HIV, HCV, and morphine toxicology were taken from a computerized medical registry. Cox regression analyses were performed to examine factors associated with MMT duration. MMT retention rates were 80.5%, 68.8%, 53.9%, and 41.4% at 3, 6, 12, and 18 months, respectively. Excluding 38 patients incarcerated during the study period, retention rates were 81.1%, 73.3%, 61.1%, and 48.9% at 3, 6, 12, and 18 months, respectively. No participant seroconverted to HIV and 1 died during the 18-month follow-up. Results showed that being female, imprisonment, a longer distance from home to clinic, having a lower methadone dose after 30 days, being HCV positive, and being in the New Taipei City program predicted early patient dropout. The findings suggest favorable MMT outcomes in terms of HIV seroincidence and mortality. The results indicate that the need to minimize travel distance and to provide programs that meet women's requirements justifies expansion of MMT clinics in Taiwan.
Additive Intensity Regression Models in Corporate Default Analysis
DEFF Research Database (Denmark)
Lando, David; Medhat, Mamdouh; Nielsen, Mads Stenbo;
2013-01-01
We consider additive intensity (Aalen) models as an alternative to the multiplicative intensity (Cox) models for analyzing the default risk of a sample of rated, nonfinancial U.S. firms. The setting allows for estimating and testing the significance of time-varying effects. We use a variety of model checking techniques to identify misspecifications. In our final model, we find evidence of time-variation in the effects of distance-to-default and short-to-long term debt. Also we identify interactions between distance-to-default and other covariates, and the quick ratio covariate is significant...
Institute of Scientific and Technical Information of China (English)
鲍治诚; 夏雪龙; 王万华; 吴亚平
2014-01-01
Objective: To investigate the risk factors for cerebral hemorrhage complicated by multiple organ dysfunction syndrome (MODS) in the elderly. Methods: The clinical data of 80 elderly patients with cerebral hemorrhage (54 of them with MODS) admitted to the Department of Neurology of the First People's Hospital of Kunshan from February 2012 to February 2014 were retrospectively analyzed. Risk factors for cerebral hemorrhage with MODS were examined by univariate and multivariate logistic regression analysis. Results: Univariate analysis showed that patient age, pre-existing underlying disease, number of failed organs, nutritional status, immune function, and infection were associated with cerebral hemorrhage with MODS (P<0.05). Multivariate logistic regression analysis showed that pre-existing underlying disease, number of failed organs, and infection were independent risk factors for cerebral hemorrhage with MODS (P<0.05). Conclusion: The main risk factors for MODS secondary to cerebral hemorrhage in the elderly are pre-existing underlying disease, number of failed organs, and infection. In clinical practice, these risk factors should be given close attention and treated effectively in order to prevent the occurrence of MODS.
Simulation Experiments in Practice : Statistical Design and Regression Analysis
Kleijnen, J.P.C.
2007-01-01
In practice, simulation analysts often change only one factor at a time, and use graphical analysis of the resulting Input/Output (I/O) data. The goal of this article is to change these traditional, naïve methods of design and analysis, because statistical theory proves that more information is obta
International Nuclear Information System (INIS)
desired power peaking limits, desired effective and infinite neutron multiplication factors, high fast fission factor, high thermal efficiency in the conversion from thermal energy to electrical energy using the Brayton cycle, and high fuel burn-up. It is to be noted that we have kept the total mass of the fuel constant. In this work, we present a module based (modular) approach to perform the optimization, in which we have defined the following modules: single fuel pin cell, whole core, thermal–hydraulics, and energy conversion. In each of the modules we have defined a specific set of parameters and optimization objectives. The GA system (GAS) and RS together play the role of optimizing each of the individual modules and integrating the modules to determine the final nuclear reactor core. However, implementation of GA could lead to a local minimum or a non-unique set of parameters that meet the specific optimization objectives. The GA code is built using Java, the neutronic analysis uses MCNP6, the thermal–hydraulics calculations use Java, and the regression analysis uses R.
Analysis of Filariasis Through Zero Inflated Poisson (ZIP) Regression Approach
Mohammad Setyo Pramono; Herti Maryani; Sri Pingit Wulandari
2014-01-01
Background: Indonesia is an endemic area for tropical diseases, one of which is elephantiasis (filariasis). Filariasis is a filarial worm infection transmitted by mosquito bites. The Baseline Health Survey (Riskesdas) 2007 showed that the percentage of patients with filariasis in the province of Nanggroe Aceh Darussalam (NAD) was the largest in Indonesia. Methods: Secondary data analysis from Riskesdas 2007. The unit of analysis is the individual in NAD Province. Research focused on the r...
Robust analysis of trends in noisy tokamak confinement data using geodesic least squares regression
Verdoolaege, G.; Shabbir, A.; Hornung, G.
2016-11-01
Regression analysis is a very common activity in fusion science for unveiling trends and parametric dependencies, but it can be a difficult matter. We have recently developed the method of geodesic least squares (GLS) regression that is able to handle errors in all variables, is robust against data outliers and uncertainty in the regression model, and can be used with arbitrary distribution models and regression functions. We here report on first results of application of GLS to estimation of the multi-machine scaling law for the energy confinement time in tokamaks, demonstrating improved consistency of the GLS results compared to standard least squares.
Institute of Scientific and Technical Information of China (English)
Hejun KANG; Shelley M.ALEXANDER
2009-01-01
We compared probability surfaces derived using one set of environmental variables in three Geographic Information Systems (GIS)-based approaches: logistic regression and Akaike's Information Criterion (AIC), Multiple Criteria Evaluation (MCE), and Bayesian Analysis (specifically Dempster-Shafer theory). We used lynx (Lynx canadensis) as our focal species, and developed our environment relationship model using track data collected in Banff National Park, Alberta, Canada, during winters from 1997 to 2000. The accuracy of the three spatial models was compared using a contingency table method. We determined the percentage of cases in which both presence and absence points were correctly classified (overall accuracy), the failure to predict a species where it occurred (omission error) and the prediction of presence where there was absence (commission error). Our overall accuracy showed the logistic regression approach was the most accurate (74.51%). The multiple criteria evaluation was intermediate (39.22%), while the Dempster-Shafer (D-S) theory model was the poorest (29.90%). However, omission and commission error tell us a different story: logistic regression had the lowest commission error, while D-S theory produced the lowest omission error. Our results provide evidence that habitat modellers should evaluate all three error measures when ascribing confidence in their model. We suggest that for our study area at least, the logistic regression model is optimal. However, where sample size is small or the species is very rare, it may also be useful to explore and/or use a more ecologically cautious modelling approach (e.g. Dempster-Shafer) that would over-predict, protect more sites, and thereby minimize the risk of missing critical habitat in conservation plans.
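The three error measures named in this abstract can be computed from a 2x2 presence/absence contingency table, as in the sketch below. The function name and the cell counts are hypothetical, and the denominators are an assumption: omission error is taken relative to observed presences and commission error relative to observed absences, which matches the wording of the abstract but may differ from the exact convention the authors used.

```python
def accuracy_measures(tp, fn, fp, tn):
    """Error measures from a presence/absence contingency table.

    tp: presence predicted, presence observed
    fn: absence predicted, presence observed
    fp: presence predicted, absence observed
    tn: absence predicted, absence observed
    """
    total = tp + fn + fp + tn
    return {
        # share of all points classified correctly
        "overall_accuracy": (tp + tn) / total,
        # failure to predict the species where it occurred
        "omission_error": fn / (tp + fn),
        # presence predicted where the species was absent
        "commission_error": fp / (fp + tn),
    }

# Hypothetical counts, for illustration only.
measures = accuracy_measures(tp=40, fn=10, fp=20, tn=30)
for name, value in measures.items():
    print(f"{name}: {value:.2f}")
```

A model that over-predicts presence lowers omission error at the cost of commission error, which is why the three measures can rank the same models differently.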
Multivariate Regression Analysis of Prognostic Factors in Colorectal Cancer
Institute of Scientific and Technical Information of China (English)
YANGZuli; WANGJianping; WANGLei; DONGWenguang; HUANGYihua; QINJianzhang; ZHANWenhua
2003-01-01
Objective: To evaluate the relationship between clinicopathologic features and prognosis of colorectal cancer after surgical treatment. Methods: The relationship between clinicopathological characteristics and prognosis of 941 patients with colorectal cancer after surgical treatment was investigated by univariate and multivariate analysis. Results: The overall 3- and 5-year survival rates of patients with colorectal cancer after surgical treatment were 63.2% and 60.8% respectively, with a median survival of 1841 days. Univariate analysis revealed that such factors as gross findings, degree of differentiation, depth of infiltration, nodal and distant metastasis and neoplastic intestinal obstruction were correlated with the survival rate. Dukes stage, gross tumor configuration, intramural spread and degree of differentiation were shown to be independent prognostic factors by multivariate analysis. Conclusion: Dukes stage, as the most important independent prognostic factor for colorectal cancer (P<0.0005), can be used to assess postoperative survival.
Steiner, Genevieve Z.; Barry, Robert J.; Gonsalvez, Craig J.
2016-01-01
In oddball tasks, increasing the time between stimuli within a particular condition (target-to-target interval, TTI; nontarget-to-nontarget interval, NNI) systematically enhances N1, P2, and P300 event-related potential (ERP) component amplitudes. This study examined the mechanism underpinning these effects in ERP components recorded from 28 adults who completed a conventional three-tone oddball task. Bivariate correlations, partial correlations and multiple regression explored component changes due to preceding ERP component amplitudes and intervals found within the stimulus series, rather than constraining the task with experimentally constructed intervals, which has been adequately explored in prior studies. Multiple regression showed that for targets, N1 and TTI predicted N2, TTI predicted P3a and P3b, and Processing Negativity (PN), P3b, and TTI predicted reaction time. For rare nontargets, P1 predicted N1, NNI predicted N2, and N1 predicted Slow Wave (SW). Findings show that the mechanism is operating on separate stages of stimulus-processing, suggestive of either increased activation within a number of stimulus-specific pathways, or very long component generator recovery cycles. These results demonstrate the extent to which matching-stimulus intervals influence ERP component amplitudes and behavior in a three-tone oddball task, and should be taken into account when designing similar studies. PMID:27445774
A Noncentral "t" Regression Model for Meta-Analysis
Camilli, Gregory; de la Torre, Jimmy; Chiu, Chia-Yi
2010-01-01
In this article, three multilevel models for meta-analysis are examined. Hedges and Olkin suggested that effect sizes follow a noncentral "t" distribution and proposed several approximate methods. Raudenbush and Bryk further refined this model; however, this procedure is based on a normal approximation. In the current research literature, this…
Al-Maamari, Faisal
2015-01-01
It is important to consider the question of whether teacher-, course-, and student-related factors affect student ratings of instructors in Student Evaluation of Teaching (SET) in English Language Teaching (ELT). This paper reports on a statistical analysis of SET in two large EFL programmes at a university setting in the Sultanate of Oman. I…
Directory of Open Access Journals (Sweden)
Jaworski Janusz
2016-03-01
Purpose. The aim of the study was to evaluate somatic and functional determinants of sports skill level in badminton players at three consecutive stages of training. Methods. The study examined 96 badminton players aged 11 to 19 years. The scope of the study included somatic characteristics, physical abilities and neurosensory abilities. Thirty-nine variables were analysed in each athlete. Coefficients of multiple determination were used to evaluate the effect of structural and functional parameters on sports skill level in badminton players. Results. In the group of younger cadets, quality and effectiveness of playing were mostly determined by the level of physical abilities. In the group of cadets, the most important determinants were physical abilities, followed by somatic characteristics. In this group, coordination abilities were also important. In juniors, the most pronounced was a set of the variables that reflect physical abilities. Conclusions. Models of determination of sports skill level are most noticeable in the group of cadets. In all three groups of badminton players, the dominant effect on the quality of playing is due to a set of the variables that determine physical abilities.
The Use of Nonparametric Kernel Regression Methods in Econometric Production Analysis
DEFF Research Database (Denmark)
Czekaj, Tomasz Gerard
This PhD thesis addresses one of the fundamental problems in applied econometric analysis, namely the econometric estimation of regression functions. The conventional approach to regression analysis is the parametric approach, which requires the researcher to specify the form of the regression... and hence, also in biased measures, which are derived from the estimated parameters. This, in turn, can result in incorrect economic conclusions and recommendations for managers, politicians and decision makers in general. This PhD thesis focuses on a nonparametric econometric approach that can be used... and nonparametric estimations of production functions in order to evaluate the optimal firm size. The second paper discusses the use of parametric and nonparametric regression methods to estimate panel data regression models. The third paper analyses production risk, price uncertainty, and farmers' risk preferences...
Energy Technology Data Exchange (ETDEWEB)
Wanke, Peter [Universidade Federal do Rio de Janeiro (UFRJ), RJ (Brazil). Instituto de Pesquisa e Pos-Graduacao em Administracao de Empresas (COPPEAD). Centro de Estudos em Logistica
2004-07-01
In this paper, the most relevant multiple regression models for sales forecasting of gas stations, developed over the past ten years, are reviewed. The most significant variables related to gas station sales, the types of the multiple regression models (linear or non-linear), the most common uses in supporting decision making and its limits are presented. The predictive power of each model and its impact on decision-making, such as sensitivity analysis and confidence intervals for independent variables, are also commented. Four models are presented, based on studies conducted in South Africa, Portugal and Brazil. In conclusion, suggestions for future developments are presented based on past developments. (author)
Directory of Open Access Journals (Sweden)
Hui Wang
2014-01-01
Immunoglobulin A nephropathy (IgAN) is a complex trait regulated by the interaction among multiple physiologic regulatory systems and probably involving numerous genes, which leads to inconsistent findings in genetic studies. One possible reason for the failure to replicate some single-locus results is that the underlying genetics of IgAN is based on multiple genes with minor effects. To examine the association between 23 single nucleotide polymorphisms (SNPs) in 14 genes predisposing to chronic glomerular diseases and IgAN in Han males, the genotypes of the 23 SNPs in 21 Han males were detected with a BaiO gene chip, and their associations were analyzed with univariate analysis and multiple linear regression analysis. The analysis showed that CTLA4 rs231726 and CR2 rs1048971 were significantly associated with IgAN. These findings support the multi-gene nature of the etiology of IgAN and propose a potential gene-gene interaction model for future studies.
Institute of Scientific and Technical Information of China (English)
张镓; 霍佳震; 胡军; 黄志明
2012-01-01
The producer service industry is a key development industry for Shanghai in the "12th Five-Year" period, which is also a critical period for Shanghai's automotive industry. Developing producer services based on the automotive industry will therefore effectively promote both of these key industries. Starting from the practice of producer service and automobile industry development in Shanghai, and based on a conceptual model of the relevance between the producer service industry and the automotive industry, the paper applies multiple linear regression to Shanghai's producer service industry, automobile industry and external environmental factors, and studies the quantitative relationships among them and their meaning.
Seasonal forecasting of Bangladesh summer monsoon rainfall using simple multiple regression model
Indian Academy of Sciences (India)
Md Mizanur Rahman; M Rafiuddin; Md Mahbub Alam
2013-04-01
In this paper, the development of a statistical forecasting method for summer monsoon rainfall over Bangladesh is described. Predictors for Bangladesh summer monsoon (June–September) rainfall were identified from the large scale ocean–atmospheric circulation variables (i.e., sea-surface temperature, surface air temperature and sea level pressure). The predictors exhibited a significant relationship with Bangladesh summer monsoon rainfall during the period 1961–2007. After a detailed analysis of various global climate datasets, three predictors were selected. The model performance was evaluated during the period 1977–2007. The model performed well in hindcasting seasonal monsoon rainfall over Bangladesh. The RMSE and Heidke skill score for 31 years were 8.13 and 0.37, respectively, and the correlation between the predicted and observed rainfall was 0.74. The BIAS of the forecasts (% of long period average, LPA) was −0.85 and the Hit score was 58%. The experimental forecasts for the 2008 summer monsoon rainfall based on the model were also found to be in good agreement with the observations.
Mehri, M
2013-04-01
Application of appropriate models to approximate the performance function warrants more precise prediction and helps to make the best decisions in the poultry industry. This study reevaluated the factors affecting hatchability in laying hens from 29 to 56 wk of age. Twenty-eight data lines representing 4 inputs consisting of egg weight, eggshell thickness, egg sphericity, and yolk/albumin ratio and 1 output, hatchability, were obtained from the literature and used to train an artificial neural network (ANN). The prediction ability of ANN was compared with that of fuzzy logic to evaluate the fitness of these 2 methods. The models were compared using R², mean absolute deviation (MAD), mean squared error (MSE), mean absolute percentage error (MAPE), and bias. The developed model was used to assess the relative importance of each variable on the hatchability by calculating the variable sensitivity ratio. The statistical evaluations showed that the ANN-based model predicted hatchability more accurately than fuzzy logic. The ANN-based model had a higher coefficient of determination (R² = 0.99) and lower residual distribution (MAD = 0.005; MSE = 0.00004; MAPE = 0.732; bias = 0.0012) than fuzzy logic (R² = 0.87; MAD = 0.014; MSE = 0.0004; MAPE = 2.095; bias = 0.0046). The sensitivity analysis revealed that the most important variable in the ANN-based model of hatchability was egg weight (variable sensitivity ratio, VSR = 283.11), followed by yolk/albumin ratio (VSR = 113.16), eggshell thickness (VSR = 16.23), and egg sphericity (VSR = 3.63). The results of this research showed that the universal approximation capability of ANN made it a powerful tool to approximate complex functions such as hatchability in the incubation process.
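The five comparison statistics listed in this abstract have standard definitions, sketched below. The function name is hypothetical and the example values are made up, not the study's data. Two conventions are assumptions: MAPE is expressed in percent, and bias is taken as the mean of (predicted minus observed), since the abstract does not state the sign convention.

```python
import numpy as np

def fit_metrics(observed, predicted):
    """Return R^2, MAD, MSE, MAPE (percent) and bias for paired values."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    err = pred - obs
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mad": np.mean(np.abs(err)),
        "mse": np.mean(err ** 2),
        "mape": 100.0 * np.mean(np.abs(err / obs)),
        "bias": np.mean(err),  # assumed sign: predicted minus observed
    }

# Made-up hatchability fractions, for illustration only.
observed = [0.80, 0.90, 1.00]
predicted = [0.90, 0.90, 0.90]
metrics = fit_metrics(observed, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

In this toy case the predictions are a constant equal to the observed mean, so the bias is zero and R² is zero even though MAD and MAPE are not, which shows why the abstract reports several measures rather than one.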
Rajab, Jasim Mohammed; Jafri, Mohd. Zubir Mat; Lim, Hwee San; Abdullah, Khiruddin
2012-10-01
This study models air surface temperature (AST) in the lower atmosphere. Data on four atmospheric pollutant gases (CO, O3, CH4, and H2O), retrieved from the National Aeronautics and Space Administration Atmospheric Infrared Sounder (AIRS) for 2003 to 2008, were used to develop a multiple regression model to predict AST over the Malaysian peninsula. For the entire period, the pollutants were highly correlated (R=0.821) with predicted AST. Comparisons among five stations in 2009 showed close agreement between the predicted AST and the observed AST from AIRS, especially in the southwest monsoon (SWM) season, within 1.3 K, and for in situ data, within 1 to 2 K. The validation of predicted AST against AST from AIRS showed a high correlation coefficient (R=0.845 to 0.918), indicating the model's efficiency and accuracy. Statistical analysis in terms of β showed that H2O (0.565 to 1.746) tended to contribute significantly to high AST values during the northeast monsoon season. Generally, these results clearly indicate the advantage of using the satellite AIRS data and a correlation analysis study to investigate the impact of atmospheric greenhouse gases on AST over the Malaysian peninsula. A model was developed that is capable of retrieving AST over the Malaysian peninsula in all weather conditions, with total uncertainties ranging between 1 and 2 K.
Ventura, Cristina; Latino, Diogo A R S; Martins, Filomena
2013-01-01
The performance of two QSAR methodologies, namely Multiple Linear Regressions (MLR) and Neural Networks (NN), towards the modeling and prediction of antitubercular activity was evaluated and compared. A data set of 173 potentially active compounds belonging to the hydrazide family and represented by 96 descriptors was analyzed. Models were built with MLR, single Feed-Forward Neural Networks (FFNNs), ensembles of FFNNs and Associative Neural Networks (AsNNs) using four different data sets and different types of descriptors. The predictive ability of the different techniques was assessed and discussed on the basis of different validation criteria, and the results show in general a better performance of AsNNs in terms of learning ability and prediction of antitubercular behaviors when compared with all other methods. MLR have, however, the advantage of pinpointing the most relevant molecular characteristics responsible for the behavior of these compounds against Mycobacterium tuberculosis. The best results for the larger data set (94 compounds in training set and 18 in test set) were obtained with AsNNs using seven descriptors (R² of 0.874 and RMSE of 0.437 against R² of 0.845 and RMSE of 0.472 in MLRs, for test set). Counter-Propagation Neural Networks (CPNNs) were trained with the same data sets and descriptors. From the scrutiny of the weight levels in each CPNN and the information retrieved from MLRs, a rational design of potentially active compounds was attempted. Two new compounds were synthesized and tested against M. tuberculosis, showing an activity close to that predicted by the majority of the models.
Jolly, William H.
1992-01-01
Relationships defining the ballistic limit of Space Station Freedom's (SSF) dual wall protection systems have been determined. These functions were regressed from empirical data found in Marshall Space Flight Center's (MSFC) Hypervelocity Impact Testing Summary (HITS) for the velocity range between three and seven kilometers per second. A stepwise linear least squares regression was used to determine the coefficients of several expressions that define a ballistic limit surface. Using statistical significance indicators and graphical comparisons to other limit curves, a final set of expressions is recommended for potential use in Probability of No Critical Flaw (PNCF) calculations for Space Station. The three equations listed below represent the mean curves for normal, 45 degree, and 65 degree obliquity ballistic limits, respectively, for a dual wall protection system consisting of a thin 6061-T6 aluminum bumper spaced 4.0 inches from a 0.125 inch thick 2219-T87 rear wall with multiple layer thermal insulation installed between the two walls. Normal obliquity is $d_c = 1.0514\, v^{0.2983}\, t_1^{0.5228}$. Forty-five degree obliquity is $d_c = 0.8591\, v^{0.0428}\, t_1^{0.2063}$. Sixty-five degree obliquity is $d_c = 0.2824\, v^{0.1986}\, t_1^{-0.3874}$. Plots of these curves are provided. A sensitivity study on the effects of using these new equations in the probability of no critical flaw analysis indicated a negligible increase in the performance of the dual wall protection system for SSF over the current baseline. The magnitude of the increase was 0.17 percent over 25 years on the MB-7 configuration run with the Bumper II program code.
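The three fitted expressions quoted above share one power-law form, d_c = a · v^b · t_1^c, so they can be evaluated with a single function. The sketch below uses only the coefficients given in the abstract; the function name is hypothetical, and the interpretation of the symbols (v as impact velocity, t_1 as bumper thickness, d_c as critical projectile diameter) and their units are assumptions, since the abstract does not define them. The input values are arbitrary.

```python
# (a, b, c) for d_c = a * v**b * t1**c, keyed by obliquity in degrees.
COEFFS = {
    0: (1.0514, 0.2983, 0.5228),    # normal obliquity
    45: (0.8591, 0.0428, 0.2063),
    65: (0.2824, 0.1986, -0.3874),
}

def critical_diameter(v, t1, obliquity_deg):
    """Evaluate the fitted mean ballistic limit curve for one obliquity.

    v and t1 must be in the units used by the original regression,
    which the abstract does not state (assumed: km/s for v).
    """
    a, b, c = COEFFS[obliquity_deg]
    return a * v ** b * t1 ** c

# Arbitrary inputs inside the fitted velocity range of 3 to 7 km/s.
for angle in (0, 45, 65):
    print(angle, critical_diameter(v=5.0, t1=0.08, obliquity_deg=angle))
```

Because these are regressions over the 3 to 7 km/s range, evaluating them outside that range would be extrapolation.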
Comparative Analysis of MOGA, NSGA-II and MOPSO for Regression Test Suite Optimization
Directory of Open Access Journals (Sweden)
Zeeshan Anwar
2014-01-01
In software engineering, regression testing is a mandatory activity. Whenever a change occurs in an existing system and a new version appears, the unchanged portions need to be regression tested for any resulting undesirable effects. During regression testing, the same test cases are executed repeatedly for the unmodified portion of the software. This activity is an overhead and consumes considerable resources and budget. To save time and resources, researchers have proposed various techniques for regression test suite optimization. In this research, regression test suites are minimized using three computational intelligence multi-objective techniques for black box testing methods. These are (1) Multi-Objective Genetic Algorithms (MOGA), (2) Non-Dominated Sorting Genetic Algorithm (NSGA-II), and (3) Multi-Objective Particle Swarm Optimization (MOPSO). These techniques are applied to two published case studies and, through experimentation, their quality is analyzed. Four quality metrics are defined to perform this analysis. The results show that MOGA is better than MOPSO and NSGA-II at reducing the size, and thus the execution time, of the regression test suites. It was also found that MOGA, NSGA-II and MOPSO are not safe for regression test suite optimization, because the fault detection rate and requirement coverage are reduced after optimization.
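All three techniques named above rest on the idea of Pareto dominance between candidate test suites. The sketch below illustrates only that shared building block, not any of the three algorithms as the authors implemented them. The objective tuple (suite size, uncovered requirements, missed faults) is a hypothetical choice in which every objective is minimized.

```python
def dominates(a, b):
    """True if candidate a is no worse than b on every objective and
    strictly better on at least one (all objectives are minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    better = any(x < y for x, y in zip(a, b))
    return no_worse and better


def pareto_front(candidates):
    """Return the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Hypothetical reduced suites: (size, uncovered requirements, missed faults).
suites = [(10, 0, 0), (6, 1, 0), (6, 2, 1), (4, 3, 2), (8, 1, 0)]
front = pareto_front(suites)
```

In this toy data every suite smaller than the full one leaves some requirement uncovered, which mirrors the abstract's observation that shrinking a suite can cost coverage and fault detection.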
Energy Technology Data Exchange (ETDEWEB)
Deng, Yangyang; Parajuli, Prem B.
2011-08-10
Evaluating the economic feasibility of a bio-gasification facility requires an understanding of its unit cost under different production capacities. The objective of this study was to evaluate the unit cost of syngas production at capacities from 60 through 1800 Nm3/h using an economic model with three regression analysis techniques (simple regression, reciprocal regression, and log-log regression). The preliminary result of this study showed that the reciprocal regression technique gave the best fit between per unit cost and production capacity, with a sum of error squares (SES) lower than 0.001 and a coefficient of determination (R2) of 0.996. The regression analysis techniques determined the minimum unit cost of syngas production for micro-scale bio-gasification facilities of $0.052/Nm3, under the capacity of 2,880 Nm3/h. The results of this study suggest that, to reduce cost, facilities should run at a high production capacity. In addition, this technique could contribute a new categorical criterion for evaluating micro-scale bio-gasification facilities from the perspective of economic analysis.
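A reciprocal regression of the kind named above fits unit cost as a linear function of 1/capacity. The following is a minimal sketch under that assumed model form, cost = a + b / capacity. The abstract does not give the fitted equation or the data, so the function names and any numbers passed in are illustrative only.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_reciprocal(capacities, unit_costs):
    """Fit unit_cost = a + b / capacity by regressing on 1/capacity."""
    return fit_line([1.0 / q for q in capacities], unit_costs)


def sum_of_squared_errors(capacities, unit_costs, a, b):
    """Sum of squared residuals of the reciprocal model."""
    return sum((c - (a + b / q)) ** 2
               for q, c in zip(capacities, unit_costs))
```

Under this form the fitted cost falls toward the asymptote a as capacity grows, which is consistent with the abstract's conclusion that facilities should run at high capacity.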
Regression analysis understanding and building business and economic models using Excel
Wilson, J Holton
2012-01-01
The technique of regression analysis is used so often in business and economics today that an understanding of its use is necessary for almost everyone engaged in the field. This book will teach you the essential elements of building and understanding regression models in a business/economic context in an intuitive manner. The authors take a non-theoretical treatment that is accessible even if you have a limited statistical background. It is specifically designed to teach the correct use of regression, while advising you of its limitations and teaching about common pitfalls. This book describe
Zhang, Hong-guang; Lu, Jian-gang
2016-02-01
To overcome the problems of significant differences among samples and nonlinearity between the property and the spectra of samples in spectral quantitative analysis, a local regression algorithm is proposed in this paper. In this algorithm, the net analyte signal (NAS) method is first used to obtain the net analyte signal of the calibration samples and the unknown samples; the Euclidean distance between the net analyte signal of each unknown sample and that of the calibration samples is then calculated and used as a similarity index. According to this similarity index, a local calibration set is selected individually for each unknown sample. Finally, a local PLS regression model is built on the local calibration set of each unknown sample. The proposed method was applied to a set of near infrared spectra of meat samples. The results demonstrate that the prediction precision and model complexity of the proposed method are superior to those of the global PLS regression method and of the conventional local regression algorithm based on spectral Euclidean distance.
Joint Analysis of Multiple Traits in Rare Variant Association Studies.
Wang, Zhenchuan; Wang, Xuexia; Sha, Qiuying; Zhang, Shuanglin
2016-05-01
The joint analysis of multiple traits has recently become popular since it can increase statistical power to detect genetic variants and there is increasing evidence showing that pleiotropy is a widespread phenomenon in complex diseases. Currently, the majority of existing methods for the joint analysis of multiple traits test association between one common variant and multiple traits. However, the variant-by-variant methods for common variant association studies may not be optimal for rare variant association studies due to the allelic heterogeneity as well as the extreme rarity of individual variants. Current statistical methods for rare variant association studies are for one single trait only. In this paper, we propose an adaptive weighting reverse regression (AWRR) method to test association between multiple traits and rare variants in a genomic region. AWRR is robust to the directions of effects of causal variants and is also robust to the directions of association of traits. Using extensive simulation studies, we compare the performance of AWRR with canonical correlation analysis (CCA), Single-TOW, and the weighted sum reverse regression (WSRR). Our results show that, in all of the simulation scenarios, AWRR is consistently more powerful than CCA. In most scenarios, AWRR is more powerful than Single-TOW and WSRR. PMID:26990300
International Nuclear Information System (INIS)
The primary treatment goal of radiotherapy for paragangliomas of the head and neck region (HNPGLs) is local control of the tumor, i.e. stabilization of tumor volume. Interestingly, regression of tumor volume has also been reported. Up to the present, no meta-analysis has been performed giving an overview of regression rates after radiotherapy in HNPGLs. The main objective was to perform a systematic review and meta-analysis to assess regression of tumor volume in HNPGL-patients after radiotherapy. A second outcome was local tumor control. Design of the study is systematic review and meta-analysis. PubMed, EMBASE, Web of Science, COCHRANE and Academic Search Premier and references of key articles were searched in March 2012 to identify potentially relevant studies. Considering the indolent course of HNPGLs, only studies with ⩾12 months follow-up were eligible. Main outcomes were the pooled proportions of regression and local control after radiotherapy as initial, combined (i.e. directly post-operatively or post-embolization) or salvage treatment (i.e. after initial treatment has failed) for HNPGLs. A meta-analysis was performed with an exact likelihood approach using a logistic regression with a random effect at the study level. Pooled proportions with 95% confidence intervals (CI) were reported. Fifteen studies were included, concerning a total of 283 jugulotympanic HNPGLs in 276 patients. Pooled regression proportions for initial, combined and salvage treatment were respectively 21%, 33% and 52% in radiosurgery studies and 4%, 0% and 64% in external beam radiotherapy studies. Pooled local control proportions for radiotherapy as initial, combined and salvage treatment ranged from 79% to 100%. Radiotherapy for jugulotympanic paragangliomas results in excellent local tumor control and therefore is a valuable treatment for these types of tumors. The effects of radiotherapy on regression of tumor volume remain ambiguous, although the data suggest that regression can
Detrended fluctuation analysis as a regression framework: Estimating dependence at different scales
Ladislav Kristoufek
2014-01-01
We propose a framework combining detrended fluctuation analysis with standard regression methodology. The method is built on detrended variances and covariances and it is designed to estimate regression parameters at different scales and under potential non-stationarity and power-law correlations. The former feature allows for distinguishing between effects for a pair of variables from different temporal perspectives. The latter ones make the method a significant improvement over the standard...
Filip Kokotovic
2016-01-01
The study of human capital relevance to economic growth is becoming increasingly important taking into account its relevance in many of the Sustainable Development Goals proposed by the UN. This paper conducted a panel regression analysis of selected SE European countries and Scandinavian countries using the Granger causality test and pooled panel regression. In order to test the relevance of human capital on economic growth, several human capital proxy variables were identified. ...
Quantile regression for the statistical analysis of immunological data with many non-detects
Eilers Paul HC; Röder Esther; Savelkoul Huub FJ; Van Wijk Roy
2012-01-01
Background: Immunological parameters are hard to measure. A well-known problem is the occurrence of values below the detection limit, the non-detects. Non-detects are a nuisance, because classical statistical analyses, like ANOVA and regression, cannot be applied. The more advanced statistical techniques currently available for the analysis of datasets with non-detects can only be used if a small percentage of the data are non-detects. Methods and results: Quantile regression, a genera...
Regularized Multiple-Set Canonical Correlation Analysis
Takane, Yoshio; Hwang, Heungsun; Abdi, Herve
2008-01-01
Multiple-set canonical correlation analysis (Generalized CANO or GCANO for short) is an important technique because it subsumes a number of interesting multivariate data analysis techniques as special cases. More recently, it has also been recognized as an important technique for integrating information from multiple sources. In this paper, we…
Tahsin, Subrina; Chang, Ni-Bin
2016-02-01
Stormwater wet detention ponds have been a commonly employed best management practice for stormwater management throughout the world for many years. In the past, the trophic state index values have been used to evaluate seasonal changes in water quality and rank lakes within a region or between several regions; yet, to date, there is no similar index for stormwater wet detention ponds. This study aimed to develop a new multivariate trophic state index (MTSI) suitable for conducting a rapid eutrophication assessment of stormwater wet detention ponds under uncertainty with respect to three typical physical and chemical properties. Six stormwater wet detention ponds in Florida were selected for demonstration of the new MTSI with respect to total phosphorus (TP), total nitrogen (TN), and Secchi disk depth (SDD) as cognitive assessment metrics to sense eutrophication potential collectively and inform the environmental impact holistically. Due to the involvement of multiple endogenous variables (i.e., TN, TP, and SDD) for the eutrophication assessment simultaneously under uncertainty, fuzzy synthetic evaluation was applied to first standardize and synchronize the sources of uncertainty in the decision analysis. The ordered probit regression model was then formulated for assessment based on the concept of MTSI with the inputs from the fuzzy synthetic evaluation. It is indicative that the severe eutrophication condition is present during fall, which might be due to frequent heavy summer storm events contributing to high-nutrient inputs in these six ponds. PMID:26733470
Savescu, Roxana Florenta; Laba, Marian
2016-06-01
This paper highlights the statistical methodology used in a dissection experiment carried out in Romania to calibrate and standardize two classification devices, OptiGrade PRO (OGP) and Fat-o-Meat'er (FOM). One hundred forty-five carcasses were measured using the two probes and dissected according to the European reference method. To derive prediction formulas for each device, multiple linear regression analysis was performed on the relationship between the reference lean meat percentage and the back fat and muscle thicknesses, using the ordinary least squares technique. The root mean squared error of prediction calculated using the leave-one-out cross validation met European Commission (EC) requirements. The application of the new prediction equations reduced the gap between the lean meat percentage measured with the OGP and FOM from 2.43% (average for the period Q3/2006-Q2/2008) to 0.10% (average for the period Q3/2008-Q4/2014), providing the basis for a fair payment system for the pig producers. PMID:26835835
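The validation step described above, a root mean squared error of prediction (RMSEP) from leave-one-out cross-validation of an ordinary least squares model, can be sketched as follows. This is an illustrative stdlib-only implementation, not the authors' procedure: the helper names are hypothetical, and the predictors are assumed to be passed as one row of values (for example back fat and muscle thickness) per carcass.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, ys):
    """Least squares coefficients [intercept, b1, b2, ...] obtained
    from the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(x[i] * x[j] for x in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(x[i] * y for x, y in zip(design, ys)) for i in range(p)]
    return solve(xtx, xty)


def predict(coefs, row):
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


def loo_rmsep(rows, ys):
    """Refit the model n times, each time predicting the one
    observation that was left out of the fit."""
    squared = []
    for i in range(len(ys)):
        coefs = ols(rows[:i] + rows[i + 1:], ys[:i] + ys[i + 1:])
        squared.append((ys[i] - predict(coefs, rows[i])) ** 2)
    return math.sqrt(sum(squared) / len(squared))
```

Solving the normal equations directly is adequate for two or three well-scaled predictors; for larger or ill-conditioned problems a QR-based solver would be the safer choice.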
Biosensors and multiple mycotoxin analysis
Gaag, B. van der; Spath, S.; Dietrich, H.; Stigter, E.; Boonzaaijer, G.; Osenbruggen, T. van; Koopal, K.
2003-01-01
An immunochemical biosensor assay for the detection of multiple mycotoxins in a sample is described. The inhibition assay is designed to measure four different mycotoxins in a single measurement, following extraction, sample clean-up and incubation with an appropriate cocktail of anti-mycotoxin antib
Directory of Open Access Journals (Sweden)
Long Cheng
2016-05-01
Since higher education is one of the major driving forces of national development and social prosperity, and tuition plays a significant role in determining whether a person can afford higher education, rising tuition is a topic of great concern today. It is therefore essential to understand which factors affect tuition and how they increase or decrease it. Many existing studies on rising tuition either lack large amounts of real data and proper quantitative models to support their conclusions, or focus on only a few of the factors that might affect tuition and so fail to provide a comprehensive analysis. In this paper, we explore a wide variety of factors that might affect the tuition growth rate, using large amounts of authentic data and different quantitative methods such as clustering analysis and regression models.
Distance Based Root Cause Analysis and Change Impact Analysis of Performance Regressions
Directory of Open Access Journals (Sweden)
Junzan Zhou
2015-01-01
Performance regression testing is applied to uncover both performance and functional problems of software releases. A performance problem revealed by performance testing can be high response time, low throughput, or even the service being unavailable. A mature performance testing process helps detect software performance problems systematically. However, it is difficult to identify the root cause and evaluate the potential change impact. In this paper, we present an approach leveraging server side logs for identifying root causes of performance problems. First, server side logs are used to recover the call tree of each business transaction. We define a novel distance based metric computed from call trees for root cause analysis and apply an inverted index from methods to business transactions for change impact analysis. Empirical studies show that our approach can effectively and efficiently help developers diagnose the root cause of performance problems.
Oliveira, H R; Silva, F F; Siqueira, O H G B D; Souza, N O; Junqueira, V S; Resende, M D V; Borquis, R R A; Rodrigues, M T
2016-05-01
We proposed multiple-trait random regression models (MTRRM) combining different functions to describe milk yield (MY) and fat (FP) and protein (PP) percentage in dairy goat genetic evaluation by using Bayesian inference. A total of 3,856 MY, FP, and PP test-day records, measured between 2000 and 2014, from 535 first lactations of Saanen and Alpine goats, including their cross, were used in this study. The initial analyses were performed using the following single-trait random regression models (STRRM): third- and fifth-order Legendre polynomials (Leg3 and Leg5), linear B-splines with 3 and 5 knots, the Ali and Schaeffer function (Ali), and the Wilmink function. Heterogeneity of residual variances was modeled considering 3 classes. After the selection of the best STRRM to describe each trait on the basis of the deviance information criterion (DIC) and posterior model probabilities (PMP), the functions were combined to compose the MTRRM. All combined MTRRM presented lower DIC values and higher PMP, showing the superiority of these models when compared to other MTRRM based only on the same function assumed for all traits. Among the combined MTRRM, those considering Ali to describe MY and PP and Leg5 to describe FP (Ali_Leg5_Ali model) presented the best fit. From the Ali_Leg5_Ali model, heritability estimates over time for MY, FP, and PP ranged from 0.25 to 0.54, 0.27 to 0.48, and 0.35 to 0.51, respectively. Genetic correlations between MY and FP, MY and PP, and FP and PP ranged from -0.58 to 0.03, -0.46 to 0.12, and 0.37 to 0.64, respectively. We concluded that combining different functions under an MTRRM approach can be a plausible alternative for joint genetic evaluation of milk yield and milk constituents in goats.
Analysis and application of partial least square regression in arc welding process
Institute of Scientific and Technical Information of China (English)
YANG Hai-lan; CAI Yan; BAO Ye-feng; ZHOU Yun
2005-01-01
Because the process parameters are correlated, partial least squares regression (PLSR) was applied to build the model and obtain the regression equation. The improved algorithm greatly simplified the calculation. An orthogonal design was adopted in the experiment, so every sample was strongly representative, which reduced the experimental time while still yielding comprehensive test data. For the weld formation problem in high-current gas metal arc welding, the auxiliary analysis technique of PLSR was discussed and the regression equation relating form factors (i.e., surface width, weld penetration and weld reinforcement) to process parameters (i.e., wire feed rate, wire extension, welding speed, gas flow, welding voltage and welding current) was given. The correlation structure among variables was analyzed, and a certain correlation was found between the independent variable matrix X and the dependent variable matrix Y. The regression analysis shows that the welding speed mainly influences the weld formation, while variation of gas flow within a certain range has little influence on it. The fitting plot of regression accuracy is given. The fitting quality of the regression equation is basically satisfactory.
An Analysis of Transit Bus Driver Distraction Using Multinomial Logistic Regression Models
D'Souza, Kelwyn
2012-01-01
This paper explores the problem of distracted driving at a regional bus transit agency to identify the sources of distraction and provide an understanding of factors responsible for driver distraction. A risk range system was developed to classify the distracting activities into four risk zones. The high risk zone distracting activities were analyzed using multinomial logistic regression models to determine the impact of various factors on the multiple categorical levels of driver distraction...
Directory of Open Access Journals (Sweden)
Baxter Lisa K
2008-05-01
Background: There is a growing body of literature linking GIS-based measures of traffic density to asthma and other respiratory outcomes. However, no consensus exists on which traffic indicators best capture variability in different pollutants or within different settings. As part of a study on childhood asthma etiology, we examined variability in outdoor concentrations of multiple traffic-related air pollutants within urban communities, using a range of GIS-based predictors and land use regression techniques. Methods: We measured fine particulate matter (PM2.5), nitrogen dioxide (NO2), and elemental carbon (EC) outside 44 homes representing a range of traffic densities and neighborhoods across Boston, Massachusetts and nearby communities. Multiple three to four-day average samples were collected at each home during winters and summers from 2003 to 2005. Traffic indicators were derived using Massachusetts Highway Department data and direct traffic counts. Multivariate regression analyses were performed separately for each pollutant, using traffic indicators, land use, meteorology, site characteristics, and central site concentrations. Results: PM2.5 was strongly associated with the central site monitor (R2 = 0.68). Additional variability was explained by total roadway length within 100 m of the home, smoking or grilling near the monitor, and block-group population density (R2 = 0.76). EC showed greater spatial variability, especially during winter months, and was predicted by roadway length within 200 m of the home. The influence of traffic was greater under low wind speed conditions, and concentrations were lower during summer (R2 = 0.52). NO2 showed significant spatial variability, predicted by population density and roadway length within 50 m of the home, modified by site characteristics (obstruction), and with higher concentrations during summer (R2 = 0.56). Conclusion: Each pollutant examined displayed somewhat different spatial patterns
Institute of Scientific and Technical Information of China (English)
SHAO, Xueguang; CHEN, Da; XU, Heng; LIU, Zhichao; CAI, Wensheng
2009-01-01
Partial least-squares (PLS) regression has been presented as a powerful tool for spectral quantitative measurement. However, the robustness and stability of PLS models still need improvement, because it is difficult to build a stable model when complex samples are analyzed or outliers are contained in the calibration data set. To this end, a robust ensemble PLS technique based on probability resampling, named RE-PLS, was proposed. In the proposed method, a probability is first obtained for each calibration sample from its residual in a robust regression. Then, multiple PLS models are constructed based on probability resampling. Finally, the multiple PLS models are used to predict unknown samples by taking the average of the predictions from the multiple models as the final prediction result. To validate the effectiveness and universality of the proposed method, it was applied to two different sets of NIR spectra. The results show that RE-PLS can not only effectively avoid the interference of outliers but also enhance the precision of prediction and the stability of PLS regression. Thus, it may provide a useful tool for multivariate calibration with multiple outliers.
Grégoire, G.
2014-12-01
Logistic regression was originally intended to explain the relationship between the probability of an event and a set of covariates. The model's coefficients can be interpreted via the odds and the odds ratio, which are presented in the introduction of the chapter. When the observations are obtained individually, we speak of binary logistic regression; when they are grouped, the logistic regression is said to be binomial. Our presentation focuses mainly on the binary case. For statistical inference the main tool is the maximum likelihood methodology: we present the Wald, Rao and likelihood ratio results and their use to compare nested models. The problems we intend to deal with are essentially the same as in multiple linear regression: testing global effects, testing individual effects, selecting variables to build a model, measuring the fit of the model, and predicting new values. The methods are demonstrated on data sets using R. Finally, we briefly consider the binomial case and the situation where we are interested in several events, that is, polytomous (multinomial) logistic regression and the particular case of ordinal logistic regression.
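The odds and odds-ratio interpretation mentioned above follows from the logit link: a one-unit increase in a covariate multiplies the odds by exp(beta). The chapter demonstrates its methods in R; the sketch below uses Python purely as an illustration, and the function names and coefficient values are hypothetical.

```python
import math


def logistic(z):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def odds(p):
    """Odds of an event that has probability p."""
    return p / (1.0 - p)


def odds_ratio(beta, delta=1.0):
    """Multiplicative change in the odds when the covariate whose
    coefficient is beta increases by delta units."""
    return math.exp(beta * delta)


# Hypothetical fitted model: logit(p) = b0 + b1 * x
b0, b1 = -1.5, 0.7
ratio = odds(logistic(b0 + b1 * 3.0)) / odds(logistic(b0 + b1 * 2.0))
# ratio agrees with odds_ratio(b1) whatever starting value of x is used
```

The ratio does not depend on the starting value of x, which is what makes the coefficient interpretable on the odds scale.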
Institute of Scientific and Technical Information of China (English)
孙伟; 林芳琦
2012-01-01
To study the industrial capital requirements of the Fengcheng industrial park from 2012 to 2016, the authors use the Cobb-Douglas production function, taking industrial added value, employment, and fixed asset investment from 2005 to 2011 as the basic data, and establish a forecasting model by multiple regression in SPSS. The grey forecasting model in the Grey Forecasting System GM2008 is then used to forecast industrial added value and employment, from which the capital requirements for 2012 to 2016 are forecast and feasible policy suggestions are put forward.
DEFF Research Database (Denmark)
Barndorff-Nielsen, Ole Eiler; Shephard, N.
2004-01-01
This paper analyses multivariate high frequency financial data using realized covariation. We provide a new asymptotic distribution theory for standard methods such as regression, correlation analysis, and covariance. It will be based on a fixed interval of time (e.g., a day or week), allowing...... the number of high frequency returns during this period to go to infinity. Our analysis allows us to study how high frequency correlations, regressions, and covariances change through time. In particular we provide confidence intervals for each of these quantities....
Pineda, Silvia; Real, Francisco X; Kogevinas, Manolis; Carrato, Alfredo; Chanock, Stephen J; Malats, Núria; Van Steen, Kristel
2015-12-01
Omics data integration is becoming necessary to investigate the genomic mechanisms involved in complex diseases. During the integration process, many challenges arise, such as data heterogeneity, the small number of individuals in comparison to the number of parameters, multicollinearity, and the interpretation and validation of results, given their complexity and the lack of knowledge about biological processes. To overcome some of these issues, innovative statistical approaches are being developed. In this work, we propose a permutation-based method to concomitantly assess significance and correct for multiple testing with the MaxT algorithm. This was applied with penalized regression methods (LASSO and ENET) when exploring relationships between common genetic variants, DNA methylation and gene expression measured in bladder tumor samples. The overall analysis flow consisted of three steps: (1) SNPs/CpGs were selected for each gene probe within a 1 Mb window upstream and downstream of the gene; (2) LASSO and ENET were applied to assess the association between each expression probe and the selected SNPs/CpGs in three multivariable models (SNP, CPG, and Global models, the latter integrating SNPs and CPGs); and (3) the significance of each model was assessed using the permutation-based MaxT method. We identified 48 genes whose expression levels were significantly associated with both SNPs and CPGs. Importantly, 36 (75%) of them were replicated in an independent data set (TCGA), and the performance of the proposed method was checked with a simulation study. We further support our results with a biological interpretation based on an enrichment analysis. The approach we propose reduces computational time and is flexible and easy to implement when analyzing several types of omics data. Our results highlight the importance of integrating omics data by applying appropriate statistical strategies to discover new insights into the complex genetic mechanisms involved in disease
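The permutation-based MaxT idea used in step (3) can be illustrated with a single-step sketch. This is not the authors' pipeline: as a simplifying assumption, the test statistic here is the absolute Pearson correlation between each feature and the outcome, whereas the study assessed penalized regression models. All names are hypothetical.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def maxt_adjusted_pvalues(features, outcome, n_perm=500, seed=0):
    """Single-step maxT: compare each observed statistic with the
    permutation distribution of the maximum statistic over features."""
    rng = random.Random(seed)
    observed = [abs(pearson(f, outcome)) for f in features]
    exceed = [0] * len(features)
    for _ in range(n_perm):
        shuffled = list(outcome)
        rng.shuffle(shuffled)  # breaks any feature-outcome association
        max_stat = max(abs(pearson(f, shuffled)) for f in features)
        for j, obs in enumerate(observed):
            if max_stat >= obs:
                exceed[j] += 1
    # The +1 terms count the observed data as one of the permutations.
    return [(e + 1) / (n_perm + 1) for e in exceed]
```

Because every feature is compared against the maximum over all features, the returned p-values are already adjusted for multiple testing, and the permutations preserve the correlation among features.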
Baird, Jim; Curry, Robin; Reid, Tim
2013-03-01
This article describes the development and application of a multiple linear regression model to identify how the key elements of waste and recycling infrastructure, namely container capacity and frequency of collection, affect the yield from municipal kerbside recycling programmes. The overall aim of the research was to gain an understanding of the factors affecting the yield from municipal kerbside recycling programmes in Scotland with an underlying objective to evaluate the efficacy of the model as a decision-support tool for informing the design of kerbside recycling programmes. The study isolates the principal kerbside collection service offered by all 32 councils across Scotland, eliminating those recycling programmes associated with flatted properties or multi-occupancies. The results of the regression analysis model have identified three principal factors which explain 80% of the variability in the average yield of the principal dry recyclate services: weekly residual waste capacity, number of materials collected and the weekly recycling capacity. The use of the model has been evaluated and recommendations made on ongoing methodological development and the use of the results in informing the design of kerbside recycling programmes. We hope that the research can provide insights for the further development of methods to optimise the design and operation of kerbside recycling programmes.
Directory of Open Access Journals (Sweden)
Francesco Gregoretti
The reverse engineering of gene regulatory networks using gene expression profile data has become crucial to gain novel biological knowledge. Large amounts of data that need to be analyzed are currently being produced due to advances in microarray technologies. Using current reverse engineering algorithms to analyze large data sets can be very computationally intensive. These emerging computational requirements can be met using parallel computing techniques. It has been shown that the Network Identification by multiple Regression (NIR) algorithm performs better than the other ready-to-use reverse engineering software. However, it cannot be used with large networks with thousands of nodes, as is the case in biological networks, due to its high time and space complexity. In this work we overcome this limitation by designing and developing a parallel version of the NIR algorithm. The new implementation of the algorithm reaches very good accuracy even for large gene networks, improving our understanding of the gene regulatory networks that is crucial for a wide range of biomedical applications.
Directory of Open Access Journals (Sweden)
Fereshteh Shiri
2010-08-01
In the present work, support vector machines (SVMs) and multiple linear regression (MLR) techniques were used for quantitative structure–property relationship (QSPR) studies of retention time (tR) in standardized liquid chromatography–UV–mass spectrometry of 67 mycotoxins (aflatoxins, trichothecenes, roquefortines and ochratoxins) based on molecular descriptors calculated from the optimized 3D structures. By applying missing value, zero and multicollinearity tests with a cutoff value of 0.95, and a genetic algorithm method of variable selection, the most relevant descriptors were selected to build QSPR models. MLR and SVM methods were employed to build the QSPR models. The robustness of the QSPR models was characterized by statistical validation and the applicability domain (AD). The prediction results from the MLR and SVM models are in good agreement with the experimental values. The correlation and predictability, measured by r2 and q2, are 0.931 and 0.932, respectively, for SVM and 0.923 and 0.915, respectively, for MLR. The applicability domain of the model was investigated using the Williams plot. The effects of different descriptors on the retention times are described.
Zhao, Wei; Fan, Shaojia; Guo, Hai; Gao, Bo; Sun, Jiaren; Chen, Laiguo
2016-11-01
The quantile regression (QR) method has been increasingly introduced to atmospheric environmental studies to explore the non-linear relationship between local meteorological conditions and ozone mixing ratios. In this study, we applied QR for the first time, together with multiple linear regression (MLR), to analyze the dominant meteorological parameters influencing the mean, 10th percentile, 90th percentile and 99th percentile of maximum daily 8-h average (MDA8) ozone concentrations in 2000-2015 in Hong Kong. Dominance analysis (DA) was used to assess the relative importance of meteorological variables in the regression models. Results showed that the MLR models worked better at suburban and rural sites than at urban sites, and worked better in winter than in summer. QR models performed better in summer for the 99th and 90th percentiles, and better in autumn and winter for the 10th percentile. QR models also performed better in suburban and rural areas for the 10th percentile. The top 3 dominant variables associated with MDA8 ozone concentrations, which changed with season and region, were most frequently drawn from six meteorological parameters: boundary layer height, humidity, wind direction, surface solar radiation, total cloud cover and sea level pressure. Temperature rarely became a significant variable in any season, which could partly explain the peak of monthly average ozone concentrations in October in Hong Kong. We also found that the effect of solar radiation was enhanced during extreme ozone pollution episodes (i.e., the 99th percentile). Finally, meteorological effects on MDA8 ozone showed no significant changes before and after the 2010 Asian Games.
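Quantile regression differs from MLR in the loss it minimizes: instead of squared error it uses the asymmetric pinball (check) loss, whose minimizer is a chosen quantile rather than the mean. The sketch below shows this for the simplest case of a constant prediction; it is an illustration with hypothetical names and toy numbers, not the models fitted in the study.

```python
def pinball_loss(y_true, y_pred, tau):
    """Mean pinball loss for quantile level tau (0 < tau < 1).

    Under-predictions are weighted by tau and over-predictions by
    (1 - tau), so a high tau pushes the fit toward the upper tail.
    """
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(y_true)


def best_constant(values, tau):
    """Sample value that minimizes the pinball loss when used as a
    constant prediction; this is an empirical tau-quantile."""
    return min(values,
               key=lambda c: pinball_loss(values, [c] * len(values), tau))


# Toy concentrations 1..11 (arbitrary units, not data from the study)
toy = list(range(1, 12))
upper = best_constant(toy, 0.9)
median = best_constant(toy, 0.5)
```

A full quantile regression replaces the constant with a linear function of the predictors (here, the meteorological variables) and minimizes the same loss, which is how separate effects can be estimated for the 10th, 90th and 99th percentiles.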
Baghi, Q; Bergé, J; Christophe, B; Touboul, P; Rodrigues, M
2015-01-01
The analysis of physical measurements often copes with highly correlated noises and interruptions caused by outliers, saturation events or transmission losses. We assess the impact of missing data on the performance of linear regression analysis involving the fit of modeled or measured time series. We show that data gaps can significantly alter the precision of the regression parameter estimation in the presence of colored noise, due to the frequency leakage of the noise power. We present a regression method which cancels this effect and estimates the parameters of interest with a precision comparable to the complete data case, even if the noise power spectral density (PSD) is not known a priori. The method is based on an autoregressive (AR) fit of the noise, which allows us to build an approximate generalized least squares estimator approaching the minimal variance bound. The method, which can be applied to any similar data processing, is tested on simulated measurements of the MICROSCOPE space mission, whos...
A Bayesian ridge regression analysis of congestion's impact on urban expressway safety.
Shi, Qi; Abdel-Aty, Mohamed; Lee, Jaeyoung
2016-03-01
With the rapid growth of traffic in urban areas, concerns about congestion and traffic safety have been heightened. This study leveraged both Automatic Vehicle Identification (AVI) system and Microwave Vehicle Detection System (MVDS) installed on an expressway in Central Florida to explore how congestion impacts the crash occurrence in urban areas. Multiple congestion measures from the two systems were developed. To ensure more precise estimates of the congestion's effects, the traffic data were aggregated into peak and non-peak hours. Multicollinearity among traffic parameters was examined. The results showed the presence of multicollinearity especially during peak hours. As a response, ridge regression was introduced to cope with this issue. Poisson models with uncorrelated random effects, correlated random effects, and both correlated random effects and random parameters were constructed within the Bayesian framework. It was proven that correlated random effects could significantly enhance model performance. The random parameters model has similar goodness-of-fit compared with the model with only correlated random effects. However, by accounting for the unobserved heterogeneity, more variables were found to be significantly related to crash frequency. The models indicated that congestion increased crash frequency during peak hours while during non-peak hours it was not a major crash contributing factor. Using the random parameter model, the three congestion measures were compared. It was found that all congestion indicators had similar effects while Congestion Index (CI) derived from MVDS data was a better congestion indicator for safety analysis. Also, analyses showed that the segments with higher congestion intensity could not only increase property damage only (PDO) crashes, but also more severe crashes. In addition, the issues regarding the necessity to incorporate specific congestion indicator for congestion's effects on safety and to take care of the
Family Background Variables as Instruments for Education in Income Regressions: A Bayesian Analysis
Hoogerheide, Lennart; Block, Joern H.; Thurik, Roy
2012-01-01
The validity of family background variables instrumenting education in income regressions has been much criticized. In this paper, we use data from the 2004 German Socio-Economic Panel and Bayesian analysis to analyze to what degree violations of the strict validity assumption affect the estimation results. We show that, in case of moderate direct…
DEFF Research Database (Denmark)
Kinnebrock, Silja; Podolskij, Mark
This paper introduces a new estimator to measure the ex-post covariation between high-frequency financial time series under market microstructure noise. We provide an asymptotic limit theory (including feasible central limit theorems) for standard methods such as regression, correlation analysis ...
Isolating the Effects of Training Using Simple Regression Analysis: An Example of the Procedure.
Waugh, C. Keith
This paper provides a case example of simple regression analysis, a forecasting procedure used to isolate the effects of training from an identified extraneous variable. This case example focuses on results of a three-day sales training program to improve bank loan officers' knowledge, skill-level, and attitude regarding solicitation and sale of…
Catching up with Harvard: Results from Regression Analysis of World Universities League Tables
Li, Mei; Shankar, Sriram; Tang, Kam Ki
2011-01-01
This paper uses regression analysis to test if the universities performing less well according to Shanghai Jiao Tong University's world universities league tables are able to catch up with the top performers, and to identify national and institutional factors that could affect this catching up process. We have constructed a dataset of 461…
Fast algorithm of the robust Gaussian regression filter for areal surface analysis
International Nuclear Information System (INIS)
In this paper, the general model of the Gaussian regression filter for areal surface analysis is explored. The intrinsic relationships between the linear Gaussian filter and the robust filter are addressed. A general mathematical solution for this model is presented. Based on this technique, a fast algorithm is created. Both simulated and practical engineering data (stochastic and structured) have been used in the testing of the fast algorithm. Results show that with the same accuracy, the processing time of the second-order nonlinear regression filters for a dataset of 1024*1024 points has been reduced to several seconds from the several hours of traditional algorithms
Regression analysis of non-contact acousto-thermal signature data
Criner, Amanda; Schehl, Norman
2016-05-01
The non-contact acousto-thermal signature (NCATS) is a nondestructive evaluation technique with potential to detect fatigue in materials such as noisy titanium and polymer matrix composites. The underlying physical mechanisms and properties may be determined by parameter estimation via nonlinear regression. The nonlinear regression analysis formulation, including the underlying models, is discussed. Several models and associated data analyses are given along with the assumptions implicit in the underlying model. The results are anomalous. These anomalous results are evaluated with respect to the accuracy of the implicit assumptions.
Methods and applications of linear models regression and the analysis of variance
Hocking, Ronald R
2013-01-01
Praise for the Second Edition: "An essential desktop reference book . . . it should definitely be on your bookshelf." -Technometrics. A thoroughly updated book, Methods and Applications of Linear Models: Regression and the Analysis of Variance, Third Edition features innovative approaches to understanding and working with models and theory of linear regression. The Third Edition provides readers with the necessary theoretical concepts, which are presented using intuitive ideas rather than complicated proofs, to describe the inference that is appropriate for the methods being discussed. The book
Pradhan, B.; Buchroithner, M. F.; Mansor, S.
2009-04-01
This paper presents the assessment results of three spatially based probabilistic models using Geoinformation Techniques (GIT) for landslide susceptibility analysis at Penang Island in Malaysia. Landslide locations within the study area were identified by interpreting aerial photographs and satellite images, supported by field surveys. Maps of the topography, soil type, lineaments and land cover were constructed from the spatial data sets. Nine landslide-related factors were extracted from the spatial database, and the neural network, frequency ratio and logistic regression coefficients of each factor were computed. Landslide susceptibility maps were drawn for the study area using neural network, frequency ratio and logistic regression models. For verification, the results of the analyses were compared with actual landslide locations in the study area. The verification results show that the frequency ratio model provides higher prediction accuracy than the ANN and regression models.
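The frequency ratio named in this abstract is commonly defined, for each class of a factor map, as the share of landslide cells falling in that class divided by the share of the study area the class occupies. The sketch below assumes that definition (the abstract does not spell it out). The toy raster, class labels and function name are invented for illustration.

```python
import numpy as np

def frequency_ratio(class_ids, landslide_mask):
    """Per-class frequency ratio: (share of landslide cells in the class)
    divided by (share of all cells in the class). Hypothetical helper."""
    class_ids = np.asarray(class_ids)
    landslide_mask = np.asarray(landslide_mask, dtype=bool)
    total_landslides = landslide_mask.sum()
    ratios = {}
    for c in np.unique(class_ids):
        in_class = class_ids == c
        landslide_share = landslide_mask[in_class].sum() / total_landslides
        area_share = in_class.sum() / class_ids.size
        ratios[int(c)] = float(landslide_share / area_share)
    return ratios

# Toy data: 10 cells, two classes of one factor (made-up values).
classes = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
landslides = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
fr = frequency_ratio(classes, landslides)
print(fr)
```

A ratio above 1 marks a class where landslides are over-represented relative to its area; summing the ratios of all factors per cell is one common way to form a susceptibility index.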
Quantile Regression Analysis on Convergence of China’s Regional Economic Growth
Institute of Scientific and Technical Information of China (English)
Kun; HE
2014-01-01
Using the quantile regression method, this paper made an empirical analysis of the convergence of China’s regional economic growth since the reform and opening-up. It first introduced the principle of the quantile regression method and related theories of convergence of economic growth. Through discussing the interprovincial variation coefficient of GDP per capita, it carried out σ convergence analysis on economic growth and divided the 3 decades since the reform and opening-up into 3 stages. Then, it made a comparative analysis of absolute β convergence in the 3 stages using least-squares estimation and the quantile regression method, and also stressed the advantage of the quantile regression method. On this basis, it made an in-depth study of conditional β convergence in the 3 stages. Empirical results indicate that there is absolute and conditional convergence at the first stage, no convergence at the second stage, and weak convergence at the third stage. Finally, it discussed weak points in this study and came up with recommendations for future studies.
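The σ convergence check described in this abstract rests on the coefficient of variation of GDP per capita across provinces: convergence is indicated when that coefficient falls over time. A minimal sketch, assuming the population standard deviation is used and with entirely made-up figures (not data from the paper):

```python
import numpy as np

def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    values = np.asarray(values, dtype=float)
    return float(values.std(ddof=0) / values.mean())

# Hypothetical GDP per capita for four regions in two years.
gdp_start = [100.0, 200.0, 300.0, 400.0]
gdp_end = [200.0, 300.0, 400.0, 500.0]

cv_start = coefficient_of_variation(gdp_start)
cv_end = coefficient_of_variation(gdp_end)
print(round(cv_start, 3), round(cv_end, 3))
print("sigma convergence" if cv_end < cv_start else "no sigma convergence")
```

In this toy case every region gains the same absolute amount, so the spread is unchanged while the mean rises, and the coefficient of variation falls.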
Directory of Open Access Journals (Sweden)
Wen-Cheng Wang
2014-01-01
It is estimated that mainland Chinese tourists travelling to Taiwan can bring annual revenues of 400 billion NTD to the Taiwan economy. Thus, how the Taiwanese Government formulates relevant measures to satisfy both sides is the focus of most concern. Taiwan must improve the facilities and service quality of its tourism industry so as to attract more mainland tourists. This paper conducted a questionnaire survey of mainland tourists and used grey relational analysis in grey mathematics to analyze the satisfaction performance of all satisfaction question items. The first eight satisfaction items were used as independent variables, and the overall satisfaction performance was used as a dependent variable for quantile regression model analysis to discuss the relationship between the dependent variable under different quantiles and independent variables. Finally, this study further discussed the predictive accuracy of the least mean regression model and each quantile regression model, as a reference for research personnel. The analysis results showed that other variables could also affect the overall satisfaction performance of mainland tourists, in addition to occupation and age. The overall predictive accuracy of quantile regression model Q0.25 was higher than that of the other three models.
DEFF Research Database (Denmark)
Ussery, David; Bohlin, Jon; Skjerve, Eystein
2009-01-01
Recently there has been an explosion in the availability of bacterial genomic sequences, making possible now an analysis of genomic signatures across more than 800 different bacterial chromosomes, from a wide variety of environments. Using genomic signatures, we pair-wise compared 867 dif...... and multinomial regression analysis indicate that the genomic signature is shaped by many factors, and this may explain the varying ability to classify prokaryotic organisms below genus level....
Directory of Open Access Journals (Sweden)
Nora Fenske
Full Text Available BACKGROUND: Most attempts to address undernutrition, responsible for one third of global child deaths, have fallen behind expectations. This suggests that the assumptions underlying current modelling and intervention practices should be revisited. OBJECTIVE: We undertook a comprehensive analysis of the determinants of child stunting in India, and explored whether the established focus on linear effects of single risks is appropriate. DESIGN: Using cross-sectional data for children aged 0-24 months from the Indian National Family Health Survey for 2005/2006, we populated an evidence-based diagram of immediate, intermediate and underlying determinants of stunting. We modelled linear, non-linear, spatial and age-varying effects of these determinants using additive quantile regression for four quantiles of the Z-score of standardized height-for-age and logistic regression for stunting and severe stunting. RESULTS: At least one variable within each of eleven groups of determinants was significantly associated with height-for-age in the 35% Z-score quantile regression. The non-modifiable risk factors child age and sex, and the protective factors household wealth, maternal education and BMI showed the largest effects. Being a twin or multiple birth was associated with dramatically decreased height-for-age. Maternal age, maternal BMI, birth order and number of antenatal visits influenced child stunting in non-linear ways. Findings across the four quantile and two logistic regression models were largely comparable. CONCLUSIONS: Our analysis confirms the multifactorial nature of child stunting. It emphasizes the need to pursue a systems-based approach and to consider non-linear effects, and suggests that differential effects across the height-for-age distribution do not play a major role.
Regression analysis to predict growth performance from dietary net energy in growing-finishing pigs.
Nitikanchana, S; Dritz, S S; Tokach, M D; DeRouchey, J M; Goodband, R D; White, B J
2015-06-01
Data from 41 trials with multiple energy levels (285 observations) were used in a meta-analysis to predict growth performance based on dietary NE concentration. Nutrient and energy concentrations in all diets were estimated using the NRC ingredient library. Predictor variables examined for best fit models using Akaike information criteria included linear and quadratic terms of NE, BW, CP, standardized ileal digestible (SID) Lys, crude fiber, NDF, ADF, fat, ash, and their interactions. The initial best fit models included interactions between NE and CP or SID Lys. After removal of the observations that fed SID Lys below the suggested requirement, these terms were no longer significant. Including dietary fat in the model with NE and BW significantly improved the G:F prediction model, indicating that NE may underestimate the influence of fat on G:F. The meta-analysis indicated that, as long as diets are adequate for other nutrients (i.e., Lys), dietary NE is adequate to predict changes in ADG across different dietary ingredients and conditions. The analysis indicates that ADG increases with increasing dietary NE and BW but decreases when BW is above 87 kg. The G:F ratio improves with increasing dietary NE and fat but decreases with increasing BW. The regression equations were then evaluated by comparing the actual and predicted performance of 543 finishing pigs in 2 trials fed 5 dietary treatments, included 3 different levels of NE by adding wheat middlings, soybean hulls, dried distillers grains with solubles (DDGS; 8 to 9% oil), or choice white grease (CWG) to a corn-soybean meal-based diet. Diets were 1) 30% DDGS, 20% wheat middlings, and 4 to 5% soybean hulls (low energy); 2) 20% wheat middlings and 4 to 5% soybean hulls (low energy); 3) a corn-soybean meal diet (medium energy); 4) diet 2 supplemented with 3.7% CWG to equalize the NE level to diet 3 (medium energy); and 5) a corn-soybean meal diet with 3.7% CWG (high energy). Only small differences were observed
Applying support vector regression analysis on grip force level-related corticomuscular coherence
DEFF Research Database (Denmark)
Rong, Yao; Han, Xixuan; Hao, Dongmei;
2014-01-01
in an accessory muscle, this study proposed an expanded support vector regression (ESVR) algorithm to quantify the coherence between electroencephalogram (EEG) from sensorimotor cortex and surface electromyogram (EMG) from brachioradialis in upper limb. A measure called coherence proportion was introduced...... to compare the corticomuscular coherence in the alpha (7–15Hz), beta (15–30Hz) and gamma (30–45Hz) band at 25 % maximum grip force (MGF) and 75 % MGF. Results show that ESVR could reduce the influence of deflected signals and summarize the overall behavior of multiple coherence curves. Coherence proportion...
Multi-dimensional regression analysis of the process of dewatering frozen peat
Energy Technology Data Exchange (ETDEWEB)
Aleksandrov, B.M.
1986-05-01
Studies dewatering of good-quality, frozen peat with a low decomposition level frozen at -5 to -10 C using regression analysis in order to assess the feasibility of dewatering peat with a cryonic texture to provide a product with standard moisture content. Factors analyzed include: specific charge of dried peat relative to pressure applied per unit of area; initial peat moisture content; pressure applied to the peat; time of peat in press; final frozen peat moisture content after compression. Experimental data are processed using the Jordan-Gaussian method of statistical analysis. It is concluded that regression analysis can be used under certain circumstances to forecast the value of one of the random variables in the dewatering process if the other variables are known; accuracy, however, depends on the statistical distribution of the data available.
Multilayer Perceptron for Robust Nonlinear Interval Regression Analysis Using Genetic Algorithms
2014-01-01
On the basis of fuzzy regression, computational models in intelligence such as neural networks have the capability to be applied to nonlinear interval regression analysis for dealing with uncertain and imprecise data. When training data are not contaminated by outliers, computational models perform well by including almost all given training data in the data interval. Nevertheless, since training data are often corrupted by outliers, robust learning algorithms employed to resist outliers for interval regression analysis have been an interesting area of research. Several approaches involving computational intelligence are effective for resisting outliers, but the required parameters for these approaches are related to whether the collected data contain outliers or not. Since it seems difficult to prespecify the degree of contamination, this paper uses a multilayer perceptron to construct the robust nonlinear interval regression model using the genetic algorithm. Outliers beyond or beneath the data interval will have only a slight effect on the determination of the data interval. Simulation results demonstrate that the proposed method performs well for contaminated datasets. PMID:25110755
Directory of Open Access Journals (Sweden)
Gardênia Abbad
2002-01-01
This article discusses applications of stepwise and hierarchical multiple regression analyses, which are widely used in research in organizational psychology. Strategies for identifying type I and II errors, and solutions to potential problems that may arise from such errors, are proposed. In addition, phenomena such as suppression, complementarity, and redundancy in multiple regression equations are reviewed. The article presents examples of research where these phenomena occurred, and the manner in which they were explained by researchers. Some applications of multiple regression analyses to studies involving between-variable interactions are presented, along with tests used to assess the linearity of the relationship among variables. Finally, some suggestions are provided for dealing with limitations implicit in multiple regression analyses (stepwise and hierarchical).
Torres-Valencia, Cristian A; Álvarez, Mauricio A; Orozco-Gutiérrez, Alvaro A
2014-01-01
Human emotion recognition (HER) allows the assessment of an affective state of a subject. Until recently, such emotional states were described in terms of discrete emotions, like happiness or contempt. In order to cover a wide range of emotions, researchers in the field have introduced different dimensional spaces for emotion description that allow the characterization of affective states in terms of several variables or dimensions that measure distinct aspects of the emotion. One of the most common of such dimensional spaces is the bidimensional Arousal/Valence space. To the best of our knowledge, all HER systems so far have modelled the dimensions in these dimensional spaces independently. In this paper, we study the effect of modelling the output dimensions simultaneously and show experimentally the advantages of modelling them in this way. We consider a multimodal approach by including features from the Electroencephalogram and a few physiological signals. For modelling the multiple outputs, we employ a multiple output regressor based on support vector machines. We also include a stage of feature selection that is developed within an embedded approach known as Recursive Feature Elimination (RFE), proposed initially for SVM. The results show that several features can be eliminated using the multiple output support vector regressor with RFE without affecting the performance of the regressor. From the analysis of the features selected in smaller subsets via RFE, it can be observed that the signals that are more informative for discrimination in the arousal and valence space are the EEG, Electrooculogram/Electromyogram (EOG/EMG) and the Galvanic Skin Response (GSR).
Lo, Benjamin W. Y.; Fukuda, Hitoshi; Angle, Mark; Teitelbaum, Jeanne; Macdonald, R. Loch; Farrokhyar, Forough; Thabane, Lehana; Levine, Mitchell A. H.
2016-01-01
Background: Classification and regression tree analysis involves the creation of a decision tree by recursive partitioning of a dataset into more homogeneous subgroups. Thus far, there is scarce literature on using this technique to create clinical prediction tools for aneurysmal subarachnoid hemorrhage (SAH). Methods: The classification and regression tree analysis technique was applied to the multicenter Tirilazad database (3551 patients) in order to create the decision-making algorithm. In order to elucidate prognostic subgroups in aneurysmal SAH, neurologic, systemic, and demographic factors were taken into account. The dependent variable used for analysis was the dichotomized Glasgow Outcome Score at 3 months. Results: Classification and regression tree analysis revealed seven prognostic subgroups. Neurological grade, occurrence of post-admission stroke, occurrence of post-admission fever, and age represented the explanatory nodes of this decision tree. Split sample validation revealed classification accuracy of 79% for the training dataset and 77% for the testing dataset. In addition, the occurrence of fever at 1-week post-aneurysmal SAH is associated with increased odds of post-admission stroke (odds ratio: 1.83, 95% confidence interval: 1.56–2.45, P decision making. This prognostic decision-making algorithm also shed light on the complex interactions between a number of risk factors in determining outcome after aneurysmal SAH. PMID:27512607
Two-variance regression analysis method
Institute of Scientific and Technical Information of China (English)
傅惠民; 吴琼
2011-01-01
The two-variance regression model is put forward, and on its basis the two-variance regression analysis method is established. The regression equation and confidence limit curves with high confidence level and high reliability are also given. The two-variance regression model involves totally correlated random variables and mutually independent random variables, which are characterized by the correlated variance and the independent variance, respectively. Traditional regression analysis is mainly suited to test data fluctuating around a single curve; the presented method extends it to test data fluctuating around several different curves, which is very common in engineering. Compared with the group test method, the presented method not only has higher precision but also solves the problem of reliability assessment with very small samples.
Forecasting municipal solid waste generation using prognostic tools and regression analysis.
Ghinea, Cristina; Drăgoi, Elena Niculina; Comăniţă, Elena-Diana; Gavrilescu, Marius; Câmpean, Teofil; Curteanu, Silvia; Gavrilescu, Maria
2016-11-01
For adequate planning of waste management systems, the accurate forecast of waste generation is an essential step, since various factors can affect waste trends. Predictive and prognosis models are useful tools that provide reliable support for decision-making processes. In this paper, indicators such as number of residents, population age, urban life expectancy and total municipal solid waste were used as input variables in prognostic models in order to predict the amount of solid waste fractions. We applied the Waste Prognostic Tool, regression analysis and time series analysis to forecast municipal solid waste generation and composition, considering the Iasi, Romania case study. Regression equations were determined for six solid waste fractions (paper, plastic, metal, glass, biodegradable and other waste). Accuracy measures were calculated and the results showed that the S-curve trend model is the most suitable for municipal solid waste (MSW) prediction. PMID:27454099
Regression Analysis of Right-censored Failure Time Data with Missing Censoring Indicators
Institute of Scientific and Technical Information of China (English)
Ping Chen; Ren He; Jun-shan Shen; Jian-guo Sun
2009-01-01
This paper discusses regression analysis of right-censored failure time data when censoring indicators are missing for some subjects. Several methods have been developed for the analysis under different situations and especially, Goetghebeur and Ryan[4] considered the situation where both the failure time and the censoring time follow the proportional hazards models marginally and developed an estimating equation approach. One limitation of their approach is that the two baseline hazard functions were assumed to be proportional to each other. We consider the same problem and present an efficient estimation procedure for regression parameters that does not require the proportionality assumption. An EM algorithm is developed and the method is evaluated by a simulation study, which indicates that the proposed methodology performs well for practical situations. An illustrative example is provided.
Regression And Time Series Analysis Of Loan Default At Minescho Cooperative Credit Union Tarkwa
Directory of Open Access Journals (Sweden)
Otoo
2015-08-01
Lending in the form of loans is a principal business activity for banks, credit unions and other financial institutions, and loans form a substantial part of these institutions' assets. However, when these loans are defaulted on, the default tends to have serious effects on the financial institutions. This study sought to determine the trend of, and to forecast, loan default at Minescho Credit Union, Tarkwa. Secondary data from the Credit Union were analyzed using Regression Analysis and the Box-Jenkins method of Time Series. From the Regression Analysis, there was a moderately strong relationship between the amount of loan default and time. Also, the amount of loan default had an increasing trend. The two-year forecast of the amount of loan default oscillated initially and remained constant from 2016 onwards.
Li, Caiyan; Li, Hongzhe
2010-01-01
Graphs and networks are common ways of depicting biological information. In biology, many different biological processes are represented by graphs, such as regulatory networks, metabolic pathways and protein--protein interaction networks. This kind of a priori use of graphs is a useful supplement to the standard numerical data such as microarray gene expression data. In this paper we consider the problem of regression analysis and variable selection when the covariates are linked on a graph. ...
Mehmet AKSARAYLI; SAYGIN, Özge
2011-01-01
In this study, students' perceived service quality level of Dokuz Eylul University (DEU) Buca Girl Dormitory Service is investigated by using SERVQUAL scale, which is a common service quality measure. Impacts of the dimensions of perceived service quality, which are tangibles, reliability, responsiveness, assurance, empathy, on preference and recommendation are investigated by logistic regression analysis. As a result, it is concluded that perceived service quality has impacts on preference a...
Gibbons, Robert D.; Segawa, Eisuke; Karabatsos, George; Amatya, Anup K.; Bhaumik, Dulal K.; Brown, C Hendricks; Kapur, Kush; Marcus, Sue M.; Hur, Kwan; Mann, J. John
2008-01-01
A new statistical methodology is developed for the analysis of spontaneous adverse event (AE) reports from post-marketing drug surveillance data. The method involves both empirical Bayes (EB) and fully Bayes estimation of rate multipliers for each drug within a class of drugs, for a particular AE, based on a mixed-effects Poisson regression model. Both parametric and semiparametric models for the random-effect distribution are examined. The method is applied to data from Food and Drug Adminis...
Robust estimation for homoscedastic regression in the secondary analysis of case-control data
Wei, Jiawei
2012-12-04
Primary analysis of case-control studies focuses on the relationship between disease D and a set of covariates of interest (Y, X). A secondary application of the case-control study, which is often invoked in modern genetic epidemiologic association studies, is to investigate the interrelationship between the covariates themselves. The task is complicated owing to the case-control sampling, where the regression of Y on X is different from what it is in the population. Previous work has assumed a parametric distribution for Y given X and derived semiparametric efficient estimation and inference without any distributional assumptions about X. We take up the issue of estimation of a regression function when Y given X follows a homoscedastic regression model, but otherwise the distribution of Y is unspecified. The semiparametric efficient approaches can be used to construct semiparametric efficient estimates, but they suffer from a lack of robustness to the assumed model for Y given X. We take an entirely different approach. We show how to estimate the regression parameters consistently even if the assumed model for Y given X is incorrect, and thus the estimates are model robust. For this we make the assumption that the disease rate is known or well estimated. The assumption can be dropped when the disease is rare, which is typically so for most case-control studies, and the estimation algorithm simplifies. Simulations and empirical examples are used to illustrate the approach.
Non-Stationary Hydrologic Frequency Analysis using B-Splines Quantile Regression
Nasri, B.; St-Hilaire, A.; Bouezmarni, T.; Ouarda, T.
2015-12-01
Hydrologic frequency analysis is commonly used by engineers and hydrologists to provide the basic information for planning, design and management of hydraulic structures and water resources systems under the assumption of stationarity. However, with increasing evidence of a changing climate, it is possible that the assumption of stationarity would no longer be valid and the results of conventional analysis would become questionable. In this study, we consider a framework for frequency analysis of extreme flows based on B-splines quantile regression, which allows modelling of non-stationary data that depend on covariates. Such covariates may have linear or nonlinear dependence. A Markov Chain Monte Carlo (MCMC) algorithm is used to estimate quantiles and their posterior distributions. A coefficient of determination for quantile regression is proposed to evaluate the estimation of the proposed model for each quantile level. The method is applied to annual maximum and minimum streamflow records in Ontario, Canada. Climate indices are considered to describe the non-stationarity in these variables and to estimate the quantiles in this case. The results show large differences between the non-stationary quantiles and their stationary equivalents for annual maximum and minimum discharge with high annual non-exceedance probabilities. Keywords: Quantile regression, B-Splines functions, MCMC, Streamflow, Climate indices, non-stationarity.
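Quantile regression in general rests on the check (pinball) loss. The sketch below only illustrates that loss: it shows that the constant minimising it lands at the corresponding sample quantile. It does not reproduce the B-spline basis or the MCMC estimation of the abstract above; the synthetic "flows" and the name `pinball_loss` are assumptions made for illustration.

```python
import numpy as np


def pinball_loss(residuals, tau):
    """Mean check loss: tau * r for r >= 0, (tau - 1) * r for r < 0."""
    r = np.asarray(residuals, dtype=float)
    return float(np.mean(np.where(r >= 0, tau * r, (tau - 1.0) * r)))


# Synthetic annual maxima (invented; Gumbel-distributed for illustration).
rng = np.random.default_rng(0)
flows = rng.gumbel(loc=100.0, scale=20.0, size=500)

tau = 0.9
# Grid search over constant predictors; the minimiser approximates the
# tau-th sample quantile of the data.
candidates = np.linspace(flows.min(), flows.max(), 2001)
losses = [pinball_loss(flows - c, tau) for c in candidates]
best = float(candidates[int(np.argmin(losses))])

print(f"pinball minimiser: {best:.1f}")
print(f"sample 0.9 quantile: {np.quantile(flows, tau):.1f}")
```

In a regression setting the constant would be replaced by a function of covariates (for example a spline expansion), with the same loss applied to its residuals.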
Genetic analysis of carcass traits in beef cattle using random regression models.
Englishby, T M; Banos, G; Moore, K L; Coffey, M P; Evans, R D; Berry, D P
2016-04-01
Livestock mature at different rates depending, in part, on their genetic merit; therefore, the optimal age at slaughter for progeny of certain sires may differ. The objective of the present study was to examine sire-level genetic profiles for carcass weight, carcass conformation, and carcass fat in cattle of multiple beef and dairy breeds, including crossbreeds. Slaughter records from 126,214 heifers and 124,641 steers aged between 360 and 1,200 d and from 86,089 young bulls aged between 360 and 720 d were used in the analysis; animals were from 15,127 sires. Variance components for each trait across age at slaughter were generated using sire random regression models that included quadratic polynomials for fixed and random effects; heterogeneous residual variances were assumed across ages. Heritability estimates across genders ranged from 0.08 (±0.02) to 0.34 (±0.02) for carcass weight, from 0.24 (±0.02) to 0.42 (±0.01) for conformation, and from 0.16 (±0.03) to 0.40 (±0.02) for fat score. Genetic correlations within each trait across ages weakened as the interval between ages compared lengthened but were all >0.64, suggesting a similar genetic background for each trait across different ages. Eigenvalues and eigenfunctions of the additive genetic covariance matrix revealed genetic variability among animals in their growth profiles for carcass traits, although most of the genetic variability was associated with the height of the growth profile. At the same age, a positive genetic correlation (0.60 to 0.78; SE ranged from 0.01 to 0.04) existed between carcass weight and conformation, whereas negative genetic correlations existed between fatness and both conformation (-0.46 to 0.08; SE ranged from 0.02 to 0.09) and carcass weight (-0.48 to -0.16; SE ranged from 0.02 to 0.14) at the same age. The estimated genetic parameters in the present study indicate genetic variability in the growth trajectory in cattle, which can be exploited through breeding programs and
Baghi, Quentin; Métris, Gilles; Bergé, Joël; Christophe, Bruno; Touboul, Pierre; Rodrigues, Manuel
2015-03-01
The analysis of physical measurements often copes with highly correlated noises and interruptions caused by outliers, saturation events, or transmission losses. We assess the impact of missing data on the performance of linear regression analysis involving the fit of modeled or measured time series. We show that data gaps can significantly alter the precision of the regression parameter estimation in the presence of colored noise, due to the frequency leakage of the noise power. We present a regression method that cancels this effect and estimates the parameters of interest with a precision comparable to the complete data case, even if the noise power spectral density (PSD) is not known a priori. The method is based on an autoregressive fit of the noise, which allows us to build an approximate generalized least squares estimator approaching the minimal variance bound. The method, which can be applied to any similar data processing, is tested on simulated measurements of the MICROSCOPE space mission, whose goal is to test the weak equivalence principle (WEP) with a precision of 10^-15. In this particular context the signal of interest is the WEP violation signal expected to be found around a well defined frequency. We test our method with different gap patterns and noise of known PSD and find that the results agree with the mission requirements, decreasing the uncertainty by a factor of 60 with respect to ordinary least squares methods. We show that it also provides a test of significance to assess the uncertainty of the measurement.
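The idea of building an approximate generalized least squares estimator from an autoregressive fit of the noise can be sketched in its simplest form as follows. This is a simplified stand-in, not the authors' method: it assumes first-order autoregressive noise and complete (gap-free) data, whereas the abstract above addresses gaps and an unknown PSD. The function name and the simulated signal are hypothetical.

```python
import numpy as np


def ar1_feasible_gls(X, y):
    """OLS, then AR(1) fit of the residuals, then OLS on whitened data."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_ols
    # Lag-1 regression of the residuals gives the AR(1) coefficient.
    phi = float(resid[1:] @ resid[:-1]) / float(resid[:-1] @ resid[:-1])
    # Whitening filter for AR(1) noise: z_t - phi * z_{t-1}.
    X_w = X[1:] - phi * X[:-1]
    y_w = y[1:] - phi * y[:-1]
    beta_gls, *_ = np.linalg.lstsq(X_w, y_w, rcond=None)
    return beta_ols, beta_gls, phi


# Simulated data: constant plus a sinusoid at a known frequency, with
# coloured AR(1) noise (all values invented for illustration).
rng = np.random.default_rng(1)
n = 2000
t = np.arange(n)
X = np.column_stack([np.ones(n), np.sin(2 * np.pi * 0.05 * t)])
eps = rng.normal(scale=0.1, size=n)
noise = np.zeros(n)
for i in range(1, n):
    noise[i] = 0.9 * noise[i - 1] + eps[i]
y = X @ np.array([1.0, 0.5]) + noise

beta_ols, beta_gls, phi = ar1_feasible_gls(X, y)
print("estimated AR(1) coefficient:", round(phi, 3))
print("OLS amplitude:", round(float(beta_ols[1]), 3))
print("GLS amplitude:", round(float(beta_gls[1]), 3))
```

A higher-order autoregressive model would replace the single coefficient `phi` with a longer whitening filter; the structure of the estimator stays the same.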
Periodic regression: a new approach to the analysis of two dimensional grain shape
Energy Technology Data Exchange (ETDEWEB)
Harrell, J.
1985-01-01
Fourier Shape Analysis has become a popular technique for describing the shape of two-dimensional outlines of sediment grains, fossils and other objects. The two-dimensional shape of an object is described by establishing the goodness-of-fit to the outline of each of a series of closed-form sinusoidal curves with frequencies that are harmonically related to some function of the outline length. This approach has two problems: (1) it requires outlines to be sampled at equal intervals with respect to the polar angle about the centroid; and (2) it fits an arbitrary, harmonic series of sinusoids to the outline and does not resolve the typically nonharmonic sinusoidal components that are actually present in the outline. This paper introduces a new but mathematically related technique, Periodic Regression, that has all of the strengths and none of the drawbacks of Fourier Shape Analysis. Periodic Regression operates on an outline sampled at either equal or unequal intervals, and decomposes it into a set of nonharmonic waveforms that correspond to the sinusoidal components actually present in the outline. The resulting residual outline is reanalyzed in the same manner in the next treatment cycle. The process is repeated until the original outline has been reduced to a circle. A plot of frequency vs. contribution for the set of extracted sinusoids defines the outline shape. Periodic Regression has the advantage of greater convenience of outline sampling, and greater resolution of shape components.
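A least-squares sinusoid fit does not need equally spaced samples, which is the property the abstract above relies on. The sketch below fits one sinusoid at each candidate frequency to an unevenly sampled radius-versus-angle outline and keeps the frequency with the smallest residual. It is an illustration under stated assumptions, not the author's Periodic Regression algorithm: the candidate grid, the synthetic outline and the name `fit_sinusoid` are all invented here.

```python
import numpy as np


def fit_sinusoid(theta, r, freq):
    """Least-squares fit of r = c0 + a*cos(freq*theta) + b*sin(freq*theta)."""
    design = np.column_stack(
        [np.ones_like(theta), np.cos(freq * theta), np.sin(freq * theta)]
    )
    coef, *_ = np.linalg.lstsq(design, r, rcond=None)
    resid = r - design @ coef
    return coef, float(resid @ resid)


# Unevenly sampled synthetic outline: mean radius 10 plus one nonharmonic
# component of frequency 2.75 and amplitude 1.5 (invented values).
rng = np.random.default_rng(2)
theta = np.sort(rng.uniform(0.0, 2.0 * np.pi, 200))
r = 10.0 + 1.5 * np.cos(2.75 * theta - 0.4)

# Scan candidate frequencies, including non-integer (nonharmonic) ones.
freqs = np.arange(1.0, 8.01, 0.25)
sse = [fit_sinusoid(theta, r, f)[1] for f in freqs]
best_freq = float(freqs[int(np.argmin(sse))])

coef, _ = fit_sinusoid(theta, r, best_freq)
amplitude = float(np.hypot(coef[1], coef[2]))
print(f"best frequency: {best_freq}, amplitude: {amplitude:.3f}")
```

Subtracting the fitted component and repeating the scan on the residual would mimic the cycle-by-cycle extraction the abstract describes.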
Using correspondence analysis in multiple case studies
Kienstra, N.H.H.; van der Heijden, P.G.M.
2015-01-01
In qualitative research of multiple case studies, Miles and Huberman proposed to summarize the separate cases in a so-called meta-matrix that consists of cases by variables. Yin discusses cross-case synthesis to study this matrix. We propose correspondence analysis (CA) as a useful tool to study thi
Using Correspondence Analysis in Multiple Case Studies
Kienstra, Natascha; van der Heijden, Peter G.M.
2015-01-01
In qualitative research of multiple case studies, Miles and Huberman proposed to summarize the separate cases in a so-called meta-matrix that consists of cases by variables. Yin discusses cross-case synthesis to study this matrix. We propose correspondence analysis (CA) as a useful tool to study thi
Duchateau, L; Kruska, R L; Perry, B D
1997-10-01
Large databases with multiple variables, selected because they are available and might provide an insight into establishing causal relationships, are often difficult to analyse and interpret because of multicollinearity. The objective of this study was to reduce the dimensionality of a multivariable spatial database of Zimbabwe, containing many environmental variables that were collected to predict the distribution of outbreaks of theileriosis (the tick-borne infection of cattle caused by Theileria parva and transmitted by the brown ear tick). Principal-component analysis and varimax rotation of the principal components were first used to select a reduced number of variables. The logistic-regression model was evaluated by appropriate goodness-of-fit tests.
Institute of Scientific and Technical Information of China (English)
Lingling; TAN
2013-01-01
This article selects some major factors influencing agricultural economic growth, such as labor, capital input, farmland area, fertilizer input and information input. It also selects some factors to explain information input, such as the number of websites owned; the types of books, magazines and newspapers published; the number of telephones owned per 100 households; the number of home computers owned per 100 households; farmers' spending on transportation and communication, culture, education, entertainment and services; and the total number of agricultural science and technology service personnel. Using a regression model, this article conducts a regression analysis of cross-section data on 31 provinces, autonomous regions and municipalities in 2010. The results show that the building of information infrastructure, the use of means of information, and the popularization and promotion of knowledge of agricultural science and technology play an important role in promoting agricultural economic growth.
Sub-pixel estimation of tree cover and bare surface densities using regression tree analysis
Directory of Open Access Journals (Sweden)
Carlos Augusto Zangrando Toneli
2011-09-01
Sub-pixel analysis is capable of generating continuous fields, which represent the spatial variability of certain thematic classes. The aim of this work was to develop numerical models to represent the variability of tree cover and bare surfaces within the study area. This research was conducted in the riparian buffer within a watershed of the São Francisco River in the North of Minas Gerais, Brazil. IKONOS and Landsat TM imagery were used with the GUIDE algorithm to construct the models. The results were two index images derived with regression trees for the entire study area, one representing tree cover and the other representing bare surface. The use of non-parametric and non-linear regression tree models presented satisfactory results to characterize wetland, deciduous and savanna patterns of forest formation.
Directory of Open Access Journals (Sweden)
Samuel Ribeiro Figueiredo
2008-12-01
hydrographic variables (distance to rivers, flow length, topographical wetness index, and stream power index). Multiple logistic regressions were established between the soil classes mapped on the basis of a traditional survey at a scale of 1:80,000 and the land variables calculated using the DEM. The regressions were used to calculate the probability of occurrence of each soil class. The final estimated soil map was drawn by assigning the soil class with the highest probability of occurrence to each cell. In a comparison of the original soil map with the map estimated at the original scale, the general accuracy was 58 % and the Kappa coefficient 38 %. Simplifying the legend increased the general accuracy of the map only slightly (general accuracy of 61 % and Kappa coefficient of 39 %). It was concluded that multiple logistic regressions have predictive potential as a tool for supervised soil mapping.
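The two map-agreement measures quoted in the abstract above, general accuracy and the Kappa coefficient, can both be computed from a confusion matrix, and the class assignment step is a row-wise arg-max over predicted probabilities. The sketch below shows the arithmetic on invented numbers; the function names are hypothetical and the matrix is not the study's data.

```python
import numpy as np


def accuracy_and_kappa(confusion):
    """Overall accuracy and Cohen's kappa from a square confusion matrix."""
    c = np.asarray(confusion, dtype=float)
    total = c.sum()
    observed = np.trace(c) / total
    # Agreement expected by chance, from the row and column marginals.
    expected = float(np.sum(c.sum(axis=0) * c.sum(axis=1))) / total**2
    kappa = (observed - expected) / (1.0 - expected)
    return float(observed), float(kappa)


def most_probable_class(probabilities):
    """Index of the class with the highest probability in each row (cell)."""
    return np.argmax(np.asarray(probabilities, dtype=float), axis=1)


# Invented 2-class example: rows = reference map, columns = estimated map.
confusion = [[40, 10],
             [20, 30]]
accuracy, kappa = accuracy_and_kappa(confusion)
print(f"accuracy={accuracy:.2f}, kappa={kappa:.2f}")

# Invented per-cell class probabilities for three soil classes.
cell_probs = [[0.2, 0.5, 0.3],
              [0.7, 0.2, 0.1]]
print(most_probable_class(cell_probs))
```

Kappa is lower than accuracy because it discounts the agreement that two maps with the same class proportions would reach by chance.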
Energy Technology Data Exchange (ETDEWEB)
Mackley, Rob D.; Spane, Frank A.; Pulsipher, Trenton C.; Allwardt, Craig H.
2010-09-01
A software tool was created in Fiscal Year 2010 (FY10) that enables multiple-regression correction of well water levels for river-stage effects. This task was conducted as part of the Remediation Science and Technology project of CH2MHILL Plateau Remediation Company (CHPRC). This document contains an overview of the correction methodology and a user's manual for Multiple Regression in Excel (MRCX) v.1.1. It also contains a step-by-step tutorial that shows users how to use MRCX to correct river effects in two different wells. This report is accompanied by an enclosed CD that contains the MRCX installer application and files used in the tutorial exercises.
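One generic way to correct a well hydrograph for river-stage effects by multiple regression is to regress the well level on the current and lagged river stage and subtract the fitted river contribution. The sketch below illustrates that idea only; it is not the MRCX implementation, whose internals the abstract above does not describe. The lag count, the synthetic series and the function names are assumptions.

```python
import numpy as np


def lagged_matrix(x, n_lags):
    """Columns hold x lagged by 0..n_lags samples; rows start at n_lags."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    return np.column_stack([x[n_lags - k : n - k] for k in range(n_lags + 1)])


def remove_river_effect(well, river, n_lags):
    """Regress well level on lagged river stage; return the corrected level."""
    lags = lagged_matrix(river, n_lags)
    design = np.column_stack([np.ones(len(lags)), lags])
    y = np.asarray(well, dtype=float)[n_lags:]
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    river_part = lags @ coef[1:]          # fitted river contribution
    return y - river_part, coef


# Synthetic example: the well responds to the river over three time steps
# around a constant background level of 50 (all values invented).
rng = np.random.default_rng(3)
river = rng.normal(size=300).cumsum()
true_response = np.array([0.3, 0.2, 0.1])
well = 50.0 + np.convolve(river, true_response)[: len(river)]

corrected, coef = remove_river_effect(well, river, n_lags=2)
print("estimated response coefficients:", np.round(coef[1:], 3))
print("corrected level range:", corrected.min(), corrected.max())
```

With real data the corrected series would retain whatever is not explained by the river, such as pumping or recharge signals.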
Vasu, Ellen Storey
1978-01-01
The effects of the violation of the assumption of normality in the conditional distributions of the dependent variable, coupled with the condition of multicollinearity upon the outcome of testing the hypothesis that the regression coefficient equals zero, are investigated via a Monte Carlo study. (Author/JKS)
Ryu, Duchwan
2010-09-28
We consider nonparametric regression analysis in a generalized linear model (GLM) framework for data with covariates that are the subject-specific random effects of longitudinal measurements. The usual assumption that the effects of the longitudinal covariate processes are linear in the GLM may be unrealistic and if this happens it can cast doubt on the inference of observed covariate effects. Allowing the regression functions to be unknown, we propose to apply Bayesian nonparametric methods including cubic smoothing splines or P-splines for the possible nonlinearity and use an additive model in this complex setting. To improve computational efficiency, we propose the use of data-augmentation schemes. The approach allows flexible covariance structures for the random effects and within-subject measurement errors of the longitudinal processes. The posterior model space is explored through a Markov chain Monte Carlo (MCMC) sampler. The proposed methods are illustrated and compared to other approaches, the "naive" approach and the regression calibration, via simulations and by an application that investigates the relationship between obesity in adulthood and childhood growth curves. © 2010, The International Biometric Society.
Directory of Open Access Journals (Sweden)
Filip Kokotovic
2016-06-01
The study of human capital relevance to economic growth is becoming increasingly important taking into account its relevance in many of the Sustainable Development Goals proposed by the UN. This paper conducted a panel regression analysis of selected SE European countries and Scandinavian countries using the Granger causality test and pooled panel regression. In order to test the relevance of human capital on economic growth, several human capital proxy variables were identified. Aside from the human capital proxy variables, other explanatory variables were selected using stepwise regression, while the dependent variable was GDP. This paper concludes that there are significant structural differences in the economies of the two observed panels. Of the human capital proxy variables observed, for the panel of SE European countries only life expectancy was statistically significant and it had a negative impact on economic growth, while in the panel of Scandinavian countries total public expenditure on education had a statistically significant positive effect on economic growth. Based upon these results and existing studies, this paper concludes that human capital has a far more significant impact on economic growth in more developed economies.
International Nuclear Information System (INIS)
Highlights: • A new method useful for the parametric analysis and optimization of reactor core designs. • This uses the strengths of genetic algorithms (GA) and regression splines. • The method is applied to the core fuel pin cell of a PHWR design. • Tools like Java and R, and codes like Serpent and MATLAB, are used in this research. - Abstract: An analysis and optimization of a set of neutronics parameters of a thorium-fueled pressurized heavy water reactor core fuel has been performed. The analysis covers a detailed pin-cell analysis of a seed-blanket configuration, where the seed is composed of natural uranium and the blanket is composed of thorium. A genetic algorithm (GA) is used to optimize the input parameters to meet a specific set of objectives related to the infinite multiplication factor, the initial breeding ratio, and a specific nuclide's effective microscopic cross-section. The core input parameters are the pitch-to-diameter ratio and the blanket material composition. A recursive partitioning of decision trees (rpart) multivariate regression model is used to perform a predictive analysis of the samples generated from the GA module. Reactor designs are usually complex and a simulation needs a significantly large amount of time to execute; hence a direct implementation of GA or any other global optimization technique is not feasible, and we therefore present a new method of using rpart in conjunction with GA. By using rpart, we do not need to run the neutronics simulation for all the inputs generated from the GA module; rather, we run the simulations for a predefined set of inputs, build a regression fit to the input and the output parameters, and then use this fit to predict the output parameters for the inputs generated by GA. The rpart model is implemented as a library using the R programming language. The results suggest that the initial breeding ratio tends to increase due to a harder neutron spectrum, however a softer neutron spectrum is desired to limit the
Analysis of sparse data in logistic regression in medical research: A newer approach
Directory of Open Access Journals (Sweden)
S Devika
2016-01-01
Background and Objective: In the analysis of a dichotomous response variable, logistic regression is usually used. However, the performance of logistic regression in the presence of sparse data is questionable. In such a situation, a common problem is the presence of high odds ratios (ORs) with very wide 95% confidence intervals (CIs) (OR: >999.999, 95% CI: 999.999). In this paper, we addressed this issue by using the penalized logistic regression (PLR) method. Materials and Methods: Data from a case-control study on hyponatremia and hiccups conducted in Christian Medical College, Vellore, Tamil Nadu, India were used. The outcome variable was the presence/absence of hiccups and the main exposure variable was the status of hyponatremia. A simulation dataset was created with different sample sizes and with a different number of covariates. Results: A total of 23 cases and 50 controls were used for the analysis of ordinary and PLR methods. The main exposure variable hyponatremia was present in nine (39.13%) of the cases and in four (8.0%) of the controls. Of the 23 hiccup cases, all were males and among the controls, 46 (92.0%) were males. Thus, the complete separation between gender and the disease group led to an infinite OR with 95% CI (OR: >999.999, 95% CI: 999.999), whereas there was a finite and consistent regression coefficient for gender (OR: 5.35; 95% CI: 0.42, 816.48) using PLR. After adjusting for all the confounding variables, hyponatremia entailed 7.9 (95% CI: 2.06, 38.86) times higher risk for the development of hiccups as was found using PLR, whereas there was an overestimation of risk, OR: 10.76 (95% CI: 2.17, 53.41), using the conventional method. The simulation experiment shows that the estimated coverage probability of this method is near the nominal level of 95% even for small sample sizes and for a large number of covariates. Conclusions: PLR is almost equal to the ordinary logistic regression when the sample size is large and is superior in small cell
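Why a penalty yields finite estimates under separation can be illustrated with a small sketch. The abstract above does not state which penalty was used; the code assumes Firth's bias-reducing (Jeffreys prior) penalty, one common choice, and applies it to an invented toy dataset, not the hiccup data. The function name `firth_logistic` is hypothetical.

```python
import numpy as np


def firth_logistic(X, y, n_iter=100, tol=1e-8):
    """Logistic regression with Firth's penalty via modified Fisher scoring."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = p * (1.0 - p)
        info = X.T @ (X * w[:, None])            # Fisher information X'WX
        info_inv = np.linalg.inv(info)
        # Diagonal of the hat matrix W^(1/2) X (X'WX)^(-1) X' W^(1/2).
        h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)
        # Firth-modified score: the usual score plus a leverage-based term.
        score = X.T @ (y - p + h * (0.5 - p))
        step = info_inv @ score
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Invented toy data with quasi-complete separation: no events when x = 0,
# so the ordinary maximum-likelihood slope would diverge to infinity.
x = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1], dtype=float)
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
X = np.column_stack([np.ones_like(x), x])

beta = firth_logistic(X, y)
print("intercept, slope:", np.round(beta, 3))
print("penalised odds ratio:", round(float(np.exp(beta[1])), 2))
```

For a single binary covariate this penalty is equivalent to adding 0.5 to each cell of the 2x2 table, which is why the estimated odds ratio stays finite.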
Multiple comparison analysis testing in ANOVA.
McHugh, Mary L
2011-01-01
The Analysis of Variance (ANOVA) test has long been an important tool for researchers conducting studies on multiple experimental groups and one or more control groups. However, ANOVA cannot provide detailed information on differences among the various study groups, or on complex combinations of study groups. To fully understand group differences in an ANOVA, researchers must conduct tests of the differences between particular pairs of experimental and control groups. Tests conducted on subsets of data tested previously in another analysis are called post hoc tests. A class of post hoc tests that provides this type of detailed information for ANOVA results is called "multiple comparison analysis" tests. The most commonly used multiple comparison analysis statistics include the following tests: Tukey, Newman-Keuls, Scheffé, Bonferroni and Dunnett. These statistical tools each have specific uses, advantages and disadvantages. Some are best used for testing theory while others are useful in generating new theory. Selection of the appropriate post hoc test will provide researchers with the most detailed information while limiting Type 1 errors due to alpha inflation.
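Alpha inflation and the simplest of the corrections named in the abstract above, Bonferroni, can be shown with a few lines of arithmetic. The p-values below are invented, the function names are hypothetical, and the familywise error formula assumes independent tests.

```python
def familywise_error_rate(alpha, m):
    """Chance of at least one Type 1 error across m independent tests."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni(p_values, alpha=0.05):
    """Bonferroni-adjusted p-values and the resulting reject decisions."""
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    reject = [p_adj <= alpha for p_adj in adjusted]
    return adjusted, reject


# Three pairwise comparisons after an ANOVA (invented p-values).
raw_p = [0.004, 0.020, 0.030]
adjusted, reject = bonferroni(raw_p)

print("uncorrected familywise error rate:", familywise_error_rate(0.05, 3))
print("adjusted p-values:", adjusted)
print("reject null:", reject)
```

All three raw p-values fall below 0.05, but only the first survives the correction, which is the trade-off between Type 1 error control and power that drives the choice among post hoc tests.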
Simunovic, K.; Simunovic, G.; Saric, T.
2013-10-01
The surface roughness is a very significant indicator of surface quality. It represents an essential exploitation requirement and influences technological time and costs, i.e. productivity. For that reason, the main objective of this paper is to analyse the influence of face milling cutting parameters (number of revolution, feed rate and depth of cut) on the surface roughness of aluminium alloy. Hence, a statistical (regression) model has been developed to predict the surface roughness by using the methodology of experimental design. Central composite design is chosen for fitting response surface. Also, numerical optimization considering two goals simultaneously (minimum propagation of error and minimum roughness) was performed throughout the experimental region. In this way, the settings of cutting parameters causing the minimum variability in response were determined for the estimated variations of the significant regression factors.
Directory of Open Access Journals (Sweden)
Zhigao Zeng
2016-01-01
This paper proposes a novel algorithm to solve the challenging problem of classifying error-diffused halftone images. We first design the class feature matrices, after extracting the image patches according to their statistical characteristics, to classify the error-diffused halftone images. Then, spectral regression kernel discriminant analysis is used for feature dimension reduction. The error-diffused halftone images are finally classified using an idea similar to the nearest centroids classifier. As demonstrated by the experimental results, our method is fast and can achieve a high classification accuracy rate, with the added benefit of robustness to noise.
Sugihara, Shigemitsu; Shinozaki, Tsuguhiro; Ohishi, Hiroyuki; Araki, Yoshinori; Furukawa, Kohei
It is difficult to lift sediment-related disaster warning information, because the risk of disaster after heavy rain is difficult to quantify. If the risk can be quantified according to the rainfall situation, it can serve as an indicator for lifting the warning. In this study, we used logistic regression analysis to quantify the risk according to the rainfall situation as the probability of disaster occurrence, and we analyzed how to set the criterion for lifting sediment-related disaster warning information. As a result, the method for evaluating the probability of disaster occurrence becomes more convenient, which is useful for providing information on imminent situations.
Regression analysis as an objective tool of economic management of rolling mill
Directory of Open Access Journals (Sweden)
Š. Vilamová
2015-07-01
The ability to optimize costs plays a key role in maintaining a company's competitiveness, because without detailed knowledge of costs, companies are not able to make the right decisions that will ensure their long-term growth. The aim of this article is to outline the problematic areas related to company costs and to contribute to a debate on the method used to determine the amount of fixed and variable costs, their monitoring and follow-up control. This article presents a potential use of regression analysis as an objective tool of economic management in metallurgical companies, as these companies have several specific features
Energy Technology Data Exchange (ETDEWEB)
Hofland, G.S.; Barton, C.C.
1990-10-01
The computer program FREQFIT is designed to perform regression and statistical chi-squared goodness of fit analysis on one-dimensional or two-dimensional data. The program features an interactive user dialogue, numerous help messages, an option for screen or line printer output, and the flexibility to use practically any commercially available graphics package to create plots of the program's results. FREQFIT is written in Microsoft QuickBASIC, for IBM-PC compatible computers. A listing of the QuickBASIC source code for the FREQFIT program, a user manual, and sample input data, output, and plots are included. 6 refs., 1 fig.
Ziemssen, Tjalf; Reimann, Manja; Gasch, Julia; Rüdiger, Heinz
2013-09-01
Biological rhythms, describing the temporal variation of biological processes, are a characteristic feature of complex systems. The analysis of biological rhythms can provide important insights into the pathophysiology of different diseases, especially, in cardiovascular medicine. In the field of the autonomic nervous system, heart rate variability (HRV) and baroreflex sensitivity (BRS) describe important fluctuations of blood pressure and heart rate which are often analyzed by Fourier transformation. However, these parameters are stochastic with overlaying rhythmical structures. R-R intervals as independent variables of time are not equidistant. That is why the trigonometric regressive spectral (TRS) analysis--reviewed in this paper--was introduced, considering both the statistical and rhythmical features of such time series. The data segments required for TRS analysis can be as short as 20 s allowing for dynamic evaluation of heart rate and blood pressure interaction over longer periods. Beyond HRV, TRS also estimates BRS based on linear regression analyses of coherent heart rate and blood pressure oscillations. An additional advantage is that all oscillations are analyzed by the same (maximal) number of R-R intervals thereby providing a high number of individual BRS values. This ensures a high confidence level of BRS determination which, along with short recording periods, may be of profound clinical relevance. The dynamic assessment of heart rate and blood pressure spectra by TRS allows a more precise evaluation of cardiovascular modulation under different settings as has already been demonstrated in different clinical studies. PMID:23812502
International Nuclear Information System (INIS)
Purpose: The goal of this study was to maximize the discrimination between benign and malignant masses in patients with sonographically indeterminate ovarian lesions by means of unenhanced and contrast-enhanced MR imaging, and to develop a computer-assisted diagnosis system. Material and Methods: Findings in precontrast and Gd-DTPA contrast-enhanced MR images of 104 patients with 115 sonographically indeterminate ovarian masses were analyzed, and the results were correlated with histopathological findings. Of 115 lesions, 65 were benign (23 cystadenomas, 13 complex cysts, 11 teratomas, 6 fibrothecomas, 12 others) and 50 were malignant (32 ovarian carcinomas, 7 metastatic tumors of the ovary, 4 carcinomas of the fallopian tubes, 7 others). A logistic regression analysis was performed to discriminate between benign and malignant lesions, and a model of a computer-assisted diagnosis was developed. This model was prospectively tested in 75 cases of ovarian tumors found at other institutions. Results: From the univariate analysis, the following parameters were selected as significant for predicting malignancy (p≤0.05): A solid or cystic mass with a large solid component or wall thickness greater than 3 mm; complex internal architecture; ascites; and bilaterality. Based on these parameters, a model of a computer-assisted diagnosis system was developed with the logistic regression analysis. To distinguish benign from malignant lesions, the maximum cut-off point was obtained between 0.47 and 0.51. In a prospective application of this model, 87% of the lesions were accurately identified as benign or malignant. (orig.)
Energy Technology Data Exchange (ETDEWEB)
Yamashita, Y. [Dept. of Radiology, Kumamoto Univ. School of Medicine (Japan); Hatanaka, Y. [Dept. of Radiology, Kumamoto Univ. School of Medicine (Japan); Torashima, M. [Dept. of Radiology, Kumamoto Univ. School of Medicine (Japan); Takahashi, M. [Dept. of Radiology, Kumamoto Univ. School of Medicine (Japan); Miyazaki, K. [Dept. of Obstetrics and Gynecology, Kumamoto Univ. School of Medicine (Japan); Okamura, H. [Dept. of Obstetrics and Gynecology, Kumamoto Univ. School of Medicine (Japan)
1997-07-01
Purpose: The goal of this study was to maximize the discrimination between benign and malignant masses in patients with sonographically indeterminate ovarian lesions by means of unenhanced and contrast-enhanced MR imaging, and to develop a computer-assisted diagnosis system. Material and Methods: Findings in precontrast and Gd-DTPA contrast-enhanced MR images of 104 patients with 115 sonographically indeterminate ovarian masses were analyzed, and the results were correlated with histopathological findings. Of 115 lesions, 65 were benign (23 cystadenomas, 13 complex cysts, 11 teratomas, 6 fibrothecomas, 12 others) and 50 were malignant (32 ovarian carcinomas, 7 metastatic tumors of the ovary, 4 carcinomas of the fallopian tubes, 7 others). A logistic regression analysis was performed to discriminate between benign and malignant lesions, and a model of a computer-assisted diagnosis was developed. This model was prospectively tested in 75 cases of ovarian tumors found at other institutions. Results: From the univariate analysis, the following parameters were selected as significant for predicting malignancy (p≤0.05): A solid or cystic mass with a large solid component or wall thickness greater than 3 mm; complex internal architecture; ascites; and bilaterality. Based on these parameters, a model of a computer-assisted diagnosis system was developed with the logistic regression analysis. To distinguish benign from malignant lesions, the maximum cut-off point was obtained between 0.47 and 0.51. In a prospective application of this model, 87% of the lesions were accurately identified as benign or malignant. (orig.).
Directory of Open Access Journals (Sweden)
Elvio Giasson
2006-06-01
Soil surveys are necessary sources of information for land use planning, but they are not always available. This study proposes the use of multiple logistic regressions for predicting the occurrence of soil types based on reference areas. From a digitized soil map and terrain parameters derived from the digital elevation model in the ArcView environment, several sets of multiple logistic regressions were defined using the statistical software Minitab, establishing relationships between explanatory terrain variables and soil types, using either the original legend or a simplified legend, and with or without stratification of the study area by drainage classes. Terrain parameters, such as elevation, distance to stream, flow accumulation, and topographic wetness index, were the variables that best explained soil distribution. Stratification by drainage classes did not have a significant effect. Simplification of the original legend increased the accuracy of the method in predicting soil distribution.
Institute of Scientific and Technical Information of China (English)
管军; 杨兴易; 赵良; 林兆奋; 郭昌星; 李文放
2003-01-01
Objective: To investigate the incidence, crude mortality and independent risk factors of ventilator-associated pneumonia (VAP) in a comprehensive ICU in China. Methods: Clinical and microbiological data were retrospectively collected and analysed for all 97 patients receiving mechanical ventilation (>48 h) in our comprehensive ICU from January 1999 to December 2000. First, statistically significant risk factors were screened out with univariate analysis; independent risk factors were then determined with multivariate stepwise logistic regression analysis. Results: The incidence of VAP was 54.64% (15.60 cases per 1000 ventilation days) and the crude mortality was 47.42%. The interval between the establishment of an artificial airway and diagnosis of VAP was 6.9 ± 4.3 d. Univariate analysis suggested that indwelling naso-gastric tube, corticosteroid, acid inhibitor, third-generation cephalosporin/imipenem, non-infectious lung disease, and extrapulmonary infection were the statistically significant risk factors of
Directory of Open Access Journals (Sweden)
Yan-Feng Zhang
2012-07-01
Polycyclic aromatic hydrocarbons (PAHs) are ubiquitous contaminants found in the environment. Immunoassays represent useful analytical methods to complement traditional analytical procedures for PAHs. Cross-reactivity (CR) is a very useful characteristic for evaluating the extent of cross-reaction of a cross-reactant in immunoreactions and immunoassays. The quantitative relationships between the molecular properties and the CR of PAHs were established by stepwise multiple linear regression, principal component regression and partial least squares regression, using the data of two commercial enzyme-linked immunosorbent assay (ELISA) kits. The objective is to find the most important molecular properties that affect the CR, and to predict the CR by multiple regression methods. The results show that the physicochemical, electronic and topological properties of the PAH molecules have an integrated effect on the CR properties for the two ELISAs, among which molar solubility (S_{m}) and valence molecular connectivity index (^{3}χ^{v}) are the most important factors. The obtained regression equations for the Ris^{C} kit are all statistically significant (p < 0.005) and show satisfactory ability for predicting CR values, while the equations for the RaPID kit are all not significant (p > 0.05) and not suitable for prediction. This is probably because the Ris^{C} immunoassay employs a monoclonal antibody, while the RaPID kit is based on a polyclonal antibody. Considering the important effect of solubility on the CR values, the cross-reaction potential (CRP) is calculated and used as a complement to CR for the evaluation of cross-reactions in immunoassays. Only compounds with both high CR and high CRP can cause intense cross-reactions in immunoassays.
Measuring treatment and scale bias effects by linear regression in the analysis of OHI-S scores.
Moore, B J
1977-05-01
A linear regression model is presented for estimating unbiased treatment effects from OHI-S scores. An example is given to illustrate an analysis and to compare results of an unbiased regression estimator with those based on a biased simple difference estimator.
Regression analysis of growth responses to water depth in three wetland plant species
Sorrell, Brian K.; Tanner, Chris C.; Brix, Hans
2012-01-01
Background and aims Plant species composition in wetlands and on lakeshores often shows dramatic zonation, which is frequently ascribed to differences in flooding tolerance. This study compared the growth responses to water depth of three species (Phormium tenax, Carex secta and Typha orientalis) differing in depth preferences in wetlands, using non-linear and quantile regression analyses to establish how flooding tolerance can explain field zonation. Methodology Plants were established for 8 months in outdoor cultures in waterlogged soil without standing water, and then randomly allocated to water depths from 0 to 0.5 m. Morphological and growth responses to depth were followed for 54 days before harvest, and then analysed by repeated-measures analysis of covariance, and non-linear and quantile regression analysis (QRA), to compare flooding tolerances. Principal results Growth responses to depth differed between the three species, and were non-linear. Phormium tenax growth decreased rapidly in standing water >0.25 m depth, C. secta growth increased initially with depth but then decreased at depths >0.30 m, accompanied by increased shoot height and decreased shoot density, and T. orientalis was unaffected by the 0- to 0.50-m depth range. In P. tenax the decrease in growth was associated with a decrease in the number of leaves produced per ramet and in C. secta the effect of water depth was greatest for the tallest shoots. Allocation patterns were unaffected by depth. Conclusions The responses are consistent with the principle that zonation in the field is primarily structured by competition in shallow water and by physiological flooding tolerance in deep water. Regression analyses, especially QRA, proved to be powerful tools in distinguishing genuine phenotypic responses to water depth from non-phenotypic variation due to size and developmental differences. PMID:23259044
Automated particle identification through regression analysis of size, shape and colour
Rodriguez Luna, J. C.; Cooper, J. M.; Neale, S. L.
2016-04-01
Rapid point-of-care diagnostic tests and tests to provide therapeutic information are now available for a range of specific conditions, from the measurement of blood glucose levels for diabetes to card agglutination tests for parasitic infections. Due to a lack of specificity, these tests are often backed up by more conventional lab-based diagnostic methods; for example, a card agglutination test may be carried out in the field for a suspected parasitic infection and, if positive, a blood sample can then be sent to a lab for confirmation. The eventual diagnosis is often achieved by microscopic examination of the sample. In this paper we propose a computerized vision system for aiding in the diagnostic process; this system uses a novel particle recognition algorithm to improve specificity and speed during the diagnostic process. We show the detection and classification of different types of cells in a diluted blood sample using regression analysis of their size, shape and colour. The first step is to define the objects to be tracked, using a Gaussian Mixture Model for background subtraction and binary opening and closing for noise suppression. After separating the objects of interest from the background, the next challenge is to predict whether a given object belongs to a certain category or not. This is a classification problem, and the output of the algorithm is a Boolean value (true/false). As such, the computer program should be able to "predict" with a reasonable level of confidence whether a given particle belongs to the kind we are looking for or not. We show the use of a binary logistic regression analysis with three continuous predictors: size, shape and colour histogram. The results suggest these variables could be very useful in a logistic regression equation, as they proved to have a relatively high predictive value on their own.
Levy, Jonathan I; Clougherty, Jane E; Baxter, Lisa K; Houseman, E Andres; Paciorek, Christopher J
2010-12-01
Previous studies have identified associations between traffic exposures and a variety of adverse health effects, but many of these studies relied on proximity measures rather than measured or modeled concentrations of specific air pollutants, complicating interpretability of the findings. An increasing number of studies have used land-use regression (LUR) or other techniques to model small-scale variability in concentrations of specific air pollutants. However, these studies have generally considered a limited number of pollutants, focused on outdoor concentrations (or indoor concentrations of ambient origin) when indoor concentrations are better proxies for personal exposures, and have not taken full advantage of statistical methods for source apportionment that may have provided insight about the structure of the LUR models and the interpretability of model results. Given these issues, the primary objective of our study was to determine predictors of indoor and outdoor residential concentrations of multiple traffic-related air pollutants within an urban area, based on a combination of central site monitoring data; geographic information system (GIS) covariates reflecting traffic and other outdoor sources; questionnaire data reflecting indoor sources and activities that affect ventilation rates; and factor-analytic methods to better infer source contributions. As part of a prospective birth cohort study assessing asthma etiology in urban Boston, we collected indoor and/or outdoor 3-to-4 day samples of nitrogen dioxide (NO2) and fine particulate matter with an aerodynamic diameter ≤ 2.5 μm (PM2.5) at 44 residences during multiple seasons of the year from 2003 through 2005. We performed reflectance analysis, x-ray fluorescence spectroscopy (XRF), and high-resolution inductively coupled plasma-mass spectrometry (ICP-MS) on particle filters to estimate the concentrations of elemental carbon (EC), trace elements, and water-soluble metals, respectively. We derived
Analysis of dynamic multiplicity fluctuations at PHOBOS
Chai, Zhengwei; PHOBOS Collaboration; Back, B. B.; Baker, M. D.; Ballintijn, M.; Barton, D. S.; Betts, R. R.; Bickley, A. A.; Bindel, R.; Budzanowski, A.; Busza, W.; Carroll, A.; Chai, Z.; Decowski, M. P.; García, E.; George, N.; Gulbrandsen, K.; Gushue, S.; Halliwell, C.; Hamblen, J.; Heintzelman, G. A.; Henderson, C.; Hofman, D. J.; Hollis, R. S.; Holynski, R.; Holzman, B.; Iordanova, A.; Johnson, E.; Kane, J. L.; Katzy, J.; Khan, N.; Kucewicz, W.; Kulinich, P.; Kuo, C. M.; Lin, W. T.; Manly, S.; McLeod, D.; Mignerey, A. C.; Nouicer, R.; Olszewski, A.; Pak, R.; Park, I. C.; Pernegger, H.; Reed, C.; Remsberg, L. P.; Reuter, M.; Roland, C.; Roland, G.; Rosenberg, L.; Sagerer, J.; Sarin, P.; Sawicki, P.; Skulski, W.; Steinberg, P.; Stephans, G. S. F.; Sukhanov, A.; Tang, J. L.; Trzupek, A.; Vale, C.; van Nieuwenhuizen, G. J.; Verdier, R.; Wolfs, F. L. H.; Wosiek, B.; Wozniak, K.; Wuosmaa, A. H.; Wyslouch, B.
2005-01-01
This paper presents the analysis of the dynamic fluctuations in the inclusive charged particle multiplicity measured by PHOBOS for Au+Au collisions at √s_NN = 200 GeV within the pseudo-rapidity range of -3 < η < 3. First the definition of the fluctuation observables used in this analysis is presented, together with a discussion of their physical meaning. Then the procedure for the extraction of dynamic fluctuations is described. Some preliminary results are included to illustrate the correlation features of the fluctuation observable. New dynamic fluctuation results will be available in a later publication.
Poisson regression analysis of mortality among male workers at a thorium-processing plant
Energy Technology Data Exchange (ETDEWEB)
Liu, Zhiyuan; Lee, Tze-San; Kotek, T.J.
1991-12-31
Analyses of mortality among a cohort of 3119 male workers employed between 1915 and 1973 at a thorium-processing plant were updated to the end of 1982. Of the whole group, 761 men were deceased and 2161 men were still alive, while 197 men were lost to follow-up. A total of 250 deaths was added to the 511 deaths observed in the previous study. The standardized mortality ratio (SMR) for all causes of death was 1.12 with 95% confidence interval (CI) of 1.05-1.21. The SMRs were also significantly increased for all malignant neoplasms (SMR = 1.23, 95% CI = 1.04-1.43) and lung cancer (SMR = 1.36, 95% CI = 1.02-1.78). Poisson regression analysis was employed to evaluate the joint effects of job classification, duration of employment, time since first employment, age and year at first employment on mortality of all malignant neoplasms and lung cancer. A comparison of internal and external analyses with the Poisson regression model was also conducted and showed no obvious difference in fitting the data on lung cancer mortality of the thorium workers. The results of the multivariate analysis showed that there was no significant effect of all the study factors on mortality due to all malignant neoplasms and lung cancer. Therefore, further study is needed for the former thorium workers.
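As an illustration of the SMR figures quoted above, the following is a minimal sketch of how a standardized mortality ratio and an approximate 95% confidence interval can be computed from observed and expected death counts. The abstract does not say which interval method the authors used; Byar's approximation to the Poisson interval is assumed here, and the function name `smr_with_ci` and the counts in the usage lines are hypothetical, not taken from the study.

```python
import math


def smr_with_ci(observed, expected, z=1.96):
    """Return (SMR, lower, upper) for observed vs. expected deaths.

    The interval uses Byar's approximation to the exact Poisson limits
    (an assumed choice; the source abstract does not name its method).
    """
    if observed <= 0 or expected <= 0:
        raise ValueError("observed and expected must be positive")
    smr = observed / expected
    # Lower limit is based on the observed count itself.
    lower = observed * (1 - 1 / (9 * observed)
                        - z / (3 * math.sqrt(observed))) ** 3 / expected
    # Upper limit is based on observed + 1.
    upper_count = observed + 1
    upper = upper_count * (1 - 1 / (9 * upper_count)
                           + z / (3 * math.sqrt(upper_count))) ** 3 / expected
    return smr, lower, upper


# Hypothetical counts, for illustration only.
smr, lower, upper = smr_with_ci(observed=100, expected=80.0)
print(f"SMR = {smr:.2f}, 95% CI = {lower:.2f}-{upper:.2f}")
```

An interval whose lower limit lies above 1 corresponds to what the abstract calls a significantly increased SMR.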
A least trimmed square regression method for second level FMRI effective connectivity analysis.
Li, Xingfeng; Coyle, Damien; Maguire, Liam; McGinnity, Thomas Martin
2013-01-01
We present a least trimmed square (LTS) robust regression method to combine different runs/subjects for second/high level effective connectivity analysis. The basic idea of this method is to treat the extreme nonlinear model variability as outliers if they exceed a certain threshold. A bootstrap method for the LTS estimation is employed to detect model outliers. We compared the LTS robust method with a non-robust method using simulated and real datasets. The difference between LTS and the non-robust method for second level effective connectivity analysis is significant, suggesting the conventional non-robust method is easily affected by the model variability from the first level analysis. In addition, after these outliers are detected and excluded for the high level analysis, the model coefficients of the second level are combined within the framework of a mixed model. The variance of the mixed model is estimated using the Newton-Raphson (NR) type Levenberg-Marquardt algorithm. Three sets of real data are adopted to compare conventional methods which do not include random effects in the analysis with a mixed model for second level effective connectivity analysis. The results show that the conventional method is significantly different from the mixed model when greater model variability exists, suggesting there is a strong random effect, and the mixed model should be employed for the second level effective connectivity analysis. PMID:23093379
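The core idea of least trimmed squares can be sketched for a single-predictor line fit as below. This is a generic illustration and not the authors' second-level connectivity pipeline: the bootstrap outlier detection and the mixed model are omitted, and the function names, the exhaustive two-point starts, and the choice of `h` are assumptions made for the example.

```python
from itertools import combinations


def ols_line(points):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0.0:
        return mean_y, 0.0  # degenerate subset: fall back to a flat line
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    b = sxy / sxx
    return mean_y - b * mean_x, b


def trimmed_loss(points, a, b, h):
    """Sum of the h smallest squared residuals under the line (a, b)."""
    residuals = sorted((y - a - b * x) ** 2 for x, y in points)
    return sum(residuals[:h])


def lts_line(points, h, max_steps=50):
    """Least trimmed squares line: minimise the h smallest squared residuals.

    Every pair of points seeds a candidate line, which is then refined by
    repeatedly refitting OLS on the h best-fitting points (concentration steps).
    """
    best = None
    for p, q in combinations(points, 2):
        if p[0] == q[0]:
            continue  # a vertical pair cannot define y = a + b*x
        b = (q[1] - p[1]) / (q[0] - p[0])
        a = p[1] - b * p[0]
        for _ in range(max_steps):
            kept = sorted(points, key=lambda pt: (pt[1] - a - b * pt[0]) ** 2)[:h]
            new_a, new_b = ols_line(kept)
            if (new_a, new_b) == (a, b):
                break
            a, b = new_a, new_b
        loss = trimmed_loss(points, a, b, h)
        if best is None or loss < best[0]:
            best = (loss, a, b)
    return best[1], best[2]


# Hypothetical data: y = 2x + 1 with three gross outliers at the end.
data = [(float(x), 2.0 * x + 1.0) for x in range(17)]
data += [(17.0, -50.0), (18.0, -50.0), (19.0, -50.0)]
print("OLS (intercept, slope):", ols_line(data))
print("LTS (intercept, slope):", lts_line(data, h=15))
```

Trying every pair of points as a start is only practical for small samples; larger problems usually rely on random starts instead.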
Borquis, Rusbel Raul Aspilcueta; Neto, Francisco Ribeiro de Araujo; Baldi, Fernando; Hurtado-Lugo, Naudin; de Camargo, Gregório M F; Muñoz-Berrocal, Milthon; Tonhati, Humberto
2013-09-01
In this study, genetic parameters for test-day milk, fat, and protein yield were estimated for the first lactation. The data analyzed consisted of 1,433 first lactations of Murrah buffaloes, daughters of 113 sires from 12 herds in the state of São Paulo, Brazil, with calvings from 1985 to 2007. Ten-month classes of lactation days were considered for the test-day yields. The (co)variance components for the 3 traits were estimated using regression analyses by Bayesian inference, applying an animal model by Gibbs sampling. The contemporary groups were defined as herd-year-month of the test day. In the model, the random effects were additive genetic, permanent environment, and residual. The fixed effects were contemporary group and number of milkings (1 or 2), the linear and quadratic effects of the covariable age of the buffalo at calving, as well as the mean lactation curve of the population, which was modeled by orthogonal Legendre polynomials of fourth order. The random effects for the traits studied were modeled by Legendre polynomials of third and fourth order for additive genetic and permanent environment, respectively; the residual variances were modeled considering 4 residual classes. The heritability estimates for the traits were moderate (from 0.21 to 0.38), with higher estimates in the intermediate lactation phase. The genetic correlation estimates within and among the traits varied from 0.05 to 0.99. The results indicate that selection for any test-day trait will result in an indirect genetic gain for milk, fat, and protein yield in all periods of the lactation curve. The accuracy associated with estimated breeding values obtained using multi-trait random regression was slightly higher (around 8%) compared with single-trait random regression. This difference may be due to the greater amount of information available per animal. PMID:23831097
Diversity Performance Analysis on Multiple HAP Networks
Directory of Open Access Journals (Sweden)
Feihong Dong
2015-06-01
One of the main design challenges in wireless sensor networks (WSNs) is achieving a high-data-rate transmission for individual sensor devices. The high altitude platform (HAP) is an important communication relay platform for WSNs and next-generation wireless networks. Multiple-input multiple-output (MIMO) techniques provide the diversity and multiplexing gain, which can improve the network performance effectively. In this paper, a virtual MIMO (V-MIMO) model is proposed by networking multiple HAPs with the concept of multiple assets in view (MAV). In a shadowed Rician fading channel, the diversity performance is investigated. The probability density function (PDF) and cumulative distribution function (CDF) of the received signal-to-noise ratio (SNR) are derived. In addition, the average symbol error rate (ASER) with BPSK and QPSK is given for the V-MIMO model. The system capacity is studied for both perfect channel state information (CSI) and unknown CSI individually. The ergodic capacity with various SNR and Rician factors for different network configurations is also analyzed. The simulation results validate the effectiveness of the performance analysis. It is shown that the performance of the HAPs network in WSNs can be significantly improved by utilizing the MAV to achieve overlapping coverage, with the help of the V-MIMO techniques.
Diversity Performance Analysis on Multiple HAP Networks.
Dong, Feihong; Li, Min; Gong, Xiangwu; Li, Hongjun; Gao, Fengyue
2015-01-01
One of the main design challenges in wireless sensor networks (WSNs) is achieving a high-data-rate transmission for individual sensor devices. The high altitude platform (HAP) is an important communication relay platform for WSNs and next-generation wireless networks. Multiple-input multiple-output (MIMO) techniques provide the diversity and multiplexing gain, which can improve the network performance effectively. In this paper, a virtual MIMO (V-MIMO) model is proposed by networking multiple HAPs with the concept of multiple assets in view (MAV). In a shadowed Rician fading channel, the diversity performance is investigated. The probability density function (PDF) and cumulative distribution function (CDF) of the received signal-to-noise ratio (SNR) are derived. In addition, the average symbol error rate (ASER) with BPSK and QPSK is given for the V-MIMO model. The system capacity is studied for both perfect channel state information (CSI) and unknown CSI individually. The ergodic capacity with various SNR and Rician factors for different network configurations is also analyzed. The simulation results validate the effectiveness of the performance analysis. It is shown that the performance of the HAPs network in WSNs can be significantly improved by utilizing the MAV to achieve overlapping coverage, with the help of the V-MIMO techniques. PMID:26134102
Directory of Open Access Journals (Sweden)
Ibrahim Fayad
2014-11-01
Estimating forest canopy height from large-footprint satellite LiDAR waveforms is challenging given the complex interaction between LiDAR waveforms, terrain, and vegetation, especially in dense tropical and equatorial forests. In this study, canopy height in French Guiana was estimated using multiple linear regression models and the Random Forest technique (RF). This analysis was either based on LiDAR waveform metrics extracted from the GLAS (Geoscience Laser Altimeter System) spaceborne LiDAR data and terrain information derived from the SRTM (Shuttle Radar Topography Mission) DEM (Digital Elevation Model) or on Principal Component Analysis (PCA) of GLAS waveforms. Results show that the best statistical model for estimating forest height based on waveform metrics and digital elevation data is a linear regression of waveform extent, trailing edge extent, and terrain index (RMSE of 3.7 m). For the PCA-based models, better canopy height estimation results were observed using a regression model that incorporated both the first 13 principal components (PCs) and the waveform extent (RMSE = 3.8 m). Random Forest regressions revealed that the best configuration for canopy height estimation used all the following metrics: waveform extent, leading edge, trailing edge, and terrain index (RMSE = 3.4 m). Waveform extent was the variable that best explained canopy height, with an importance factor almost three times higher than those for the other three metrics (leading edge, trailing edge, and terrain index). Furthermore, the Random Forest regression incorporating the first 13 PCs and the waveform extent had a slightly improved canopy height estimation in comparison to the linear model, with an RMSE of 3.6 m. In conclusion, multiple linear regressions and RF regressions provided canopy height estimations with similar precision using either LiDAR metrics or PCs. However, a regression model (linear regression or RF) based on the PCA of waveform samples with waveform
Ghaedi, M; Rahimi, Mahmoud Reza; Ghaedi, A M; Tyagi, Inderjeet; Agarwal, Shilpi; Gupta, Vinod Kumar
2016-01-01
Two novel and eco-friendly adsorbents, namely tin oxide nanoparticles loaded on activated carbon (SnO2-NP-AC) and activated carbon prepared from the wood of Pistacia atlantica (AC-PAW), were used for the rapid removal and fast adsorption of methyl orange (MO) from the aqueous phase. The dependency of MO removal on various influential adsorption parameters was modeled and optimized using multiple linear regression (MLR) and least squares support vector regression (LSSVR). The optimal parameters for the LSSVR model were a γ value of 0.76 and a σ² of 0.15. For the testing data set, a mean square error (MSE) of 0.0010 and a coefficient of determination (R²) of 0.976 were obtained for the LSSVR model, and an MSE of 0.0037 and an R² of 0.897 were obtained for the MLR model. The adsorption equilibrium and kinetic data were found to be well fitted by, and in good agreement with, the Langmuir isotherm model and the second-order equation and intra-particle diffusion models, respectively. A small amount of the proposed SnO2-NP-AC and AC-PAW (0.015 g and 0.08 g) is sufficient for the rapid removal of methyl orange (>95%). The maximum adsorption capacity for SnO2-NP-AC and AC-PAW was 250 mg g⁻¹ and 125 mg g⁻¹, respectively.
Within-session analysis of the extinction of pavlovian fear-conditioning using robust regression
Directory of Open Access Journals (Sweden)
Vargas-Irwin, Cristina
2010-06-01
Traditionally, the analysis of extinction data in fear conditioning experiments has involved the use of standard linear models, mostly ANOVA of between-group differences of subjects that have undergone different extinction protocols, pharmacological manipulations or some other treatment. Although some studies report individual differences in quantities such as suppression rates or freezing percentages, these differences are not included in the statistical modeling. Within-subject response patterns are then averaged using coarse-grain time windows which can overlook these individual performance dynamics. Here we illustrate an alternative analytical procedure consisting of 2 steps: the estimation of a trend for within-session data and analysis of group differences in trend as main outcome. This procedure is tested on real fear-conditioning extinction data, comparing trend estimates via Ordinary Least Squares (OLS) and robust Least Median of Squares (LMS) regression estimates, as well as comparing between-group differences and analyzing mean freezing percentage versus LMS slopes as outcomes.
Directory of Open Access Journals (Sweden)
Roseane Cavalcanti dos Santos
2012-08-01
The objective of this work was to estimate the stability and adaptability of pod and seed yield in runner peanut genotypes based on the nonlinear regression and AMMI analysis. Yield data from 11 trials, distributed in six environments and three harvests, carried out in the Northeast region of Brazil during the rainy season were used. Significant effects of genotypes (G), environments (E), and GE interactions were detected in the analysis, indicating different behaviors among genotypes in favorable and unfavorable environmental conditions. The genotypes BRS Pérola Branca and LViPE-06 are more stable and adapted to the semiarid environment, whereas LGoPE-06 is a promising material for pod production, despite being highly dependent on favorable environments.
Păniţă, Ovidiu
2015-09-01
From 2012 to 2014, an assortment of 25 isogenic lines of wheat (Triticum aestivum ssp. vulgare) was tested at the Banu-Maracine DRS. The characters analyzed were the number of seeds per spike, seed weight per spike (g), number of spikes/m2, weight of a thousand seeds (WTS) (g) and number of emerged plants/m2. Statistical processing of the recorded data identified a number of links between these characters, and regression models were identified between some of the studied characters. Based on component analysis, the number of seeds per spike and seed weight per spike are the components that account for more than 88% of the variance, and a total of seven genotypes had positive scores for both factors.
Barbu, N.; Cuculeanu, V.; Stefan, S.
2016-10-01
The aim of this study is to investigate the relationship between the frequency of very warm days (TX90p) in Romania and large-scale atmospheric circulation for winter (December-February) and summer (June-August) between 1962 and 2010. In order to achieve this, two catalogues from COST733Action were used to derive daily circulation types. Seasonal occurrence frequencies of the circulation types were calculated and have been utilized as predictors within the multiple linear regression model (MLRM) for the estimation of winter and summer TX90p values for 85 synoptic stations covering the entire Romania. A forward selection procedure has been utilized to find adequate predictor combinations and those predictor combinations were tested for collinearity. The performance of the MLRMs has been quantified based on the explained variance. Furthermore, the leave-one-out cross-validation procedure was applied and the root-mean-squared error skill score was calculated at station level in order to obtain reliable evidence of MLRM robustness. From this analysis, it can be stated that the MLRM performance is higher in winter compared to summer. This is due to the annual cycle of incoming insolation and to the local factors such as orography and surface albedo variations. The MLRM performances exhibit distinct variations between regions with high performance in wintertime for the eastern and southern part of the country and in summertime for the western part of the country. One can conclude that the MLRM generally captures quite well the TX90p variability and reveals the potential for statistical downscaling of TX90p values based on circulation types.
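The validation step described above, leave-one-out cross-validation summarised by an RMSE skill score, can be sketched as follows. This is a simplified illustration under stated assumptions: a single predictor stands in for the selected circulation-type frequencies, the reference forecast is taken to be the training-sample mean (the abstract does not state the reference used), and all names and data values are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        return mean_y, 0.0  # constant predictor: predict the mean
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return mean_y - b * mean_x, b


def loo_rmse_skill(xs, ys):
    """Leave-one-out RMSE skill score: 1 - RMSE_model / RMSE_reference.

    The reference forecast is the mean of the training values
    (an assumed climatology-style baseline).
    """
    n = len(xs)
    sq_model = 0.0
    sq_ref = 0.0
    for i in range(n):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        a, b = fit_line(train_x, train_y)
        prediction = a + b * xs[i]           # forecast for the held-out case
        reference = sum(train_y) / len(train_y)
        sq_model += (ys[i] - prediction) ** 2
        sq_ref += (ys[i] - reference) ** 2
    rmse_model = math.sqrt(sq_model / n)
    rmse_ref = math.sqrt(sq_ref / n)
    if rmse_ref == 0.0:
        raise ValueError("reference RMSE is zero; skill score undefined")
    return 1.0 - rmse_model / rmse_ref


# Hypothetical seasonal values: predictor frequency vs. TX90p-like response.
frequency = [10.0, 14.0, 9.0, 20.0, 17.0, 12.0, 22.0, 15.0]
response = [4.1, 5.9, 3.8, 8.3, 7.0, 5.2, 9.1, 6.4]
print("LOO RMSE skill score:", round(loo_rmse_skill(frequency, response), 3))
```

A score near 1 means the regression clearly beats the baseline on held-out cases, while a score at or below 0 means it does no better than the baseline.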
Ren, Y Y; Zhou, L C; Yang, L; Liu, P Y; Zhao, B W; Liu, H X
2016-09-01
The paper highlights the use of the logistic regression (LR) method in the construction of statistically significant, robust and predictive models for the classification of chemicals according to their aquatic toxic modes of action. The essentials of a reliable model were all considered carefully. The model predictors were selected by stepwise forward discriminant analysis (LDA) from a combined pool of experimental data and chemical structure-based descriptors calculated by the CODESSA and DRAGON software packages. Model predictive ability was validated both internally and externally. The applicability domain was checked by the leverage approach to verify prediction reliability. The obtained models are simple and easy to interpret. In general, LR performs much better than LDA and seems to be more attractive for the prediction of the more toxic compounds, i.e. compounds that exhibit excess toxicity versus non-polar narcotic compounds and more reactive compounds versus less reactive compounds. In addition, model fit and regression diagnostics were assessed through the influence plot, which reflects the hat-values, studentized residuals, and Cook's distance statistics of each sample. Overdispersion was also checked for the LR model. The relationships between the descriptors and the aquatic toxic behaviour of compounds are also discussed. PMID:27653817
Rubio, Francisco J; Genton, Marc G
2016-06-30
We study Bayesian linear regression models with skew-symmetric scale mixtures of normal error distributions. These kinds of models can be used to capture departures from the usual assumption of normality of the errors in terms of heavy tails and asymmetry. We propose a general noninformative prior structure for these regression models and show that the corresponding posterior distribution is proper under mild conditions. We extend these propriety results to cases where the response variables are censored. The latter scenario is of interest in the context of accelerated failure time models, which are relevant in survival analysis. We present a simulation study that demonstrates good frequentist properties of the posterior credible intervals associated with the proposed priors. This study also sheds some light on the trade-off between increased model flexibility and the risk of over-fitting. We illustrate the performance of the proposed models with real data. Although we focus on models with univariate response variables, we also present some extensions to the multivariate case in the Supporting Information. Copyright © 2016 John Wiley & Sons, Ltd. PMID:26856806
Yu, Rongqin; Geddes, John R; Fazel, Seena
2012-10-01
The risk of antisocial outcomes in individuals with personality disorder (PD) remains uncertain. The authors synthesize the current evidence on the risks of antisocial behavior, violence, and repeat offending in PD, and they explore sources of heterogeneity in risk estimates through a systematic review and meta-regression analysis of observational studies comparing antisocial outcomes in personality disordered individuals with control groups. Fourteen studies examined risk of antisocial and violent behavior in 10,007 individuals with PD, compared with over 12 million general population controls. There was a substantially increased risk of violent outcomes in studies with all PDs (random-effects pooled odds ratio [OR] = 3.0, 95% CI = 2.6 to 3.5). Meta-regression revealed that antisocial PD and gender were associated with higher risks (p = .01 and .07, respectively). The odds of all antisocial outcomes were also elevated. Twenty-five studies reported the risk of repeat offending in PD compared with other offenders. The risk of a repeat offense was also increased (fixed-effects pooled OR = 2.4, 95% CI = 2.2 to 2.7) in offenders with PD. The authors conclude that although PD is associated with antisocial outcomes and repeat offending, the risk appears to differ by PD category, gender, and whether individuals are offenders or not.
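As a rough illustration of how a fixed-effects pooled odds ratio of the kind reported above is formed, the sketch below applies inverse-variance weighting on the log scale. The function name and the two study-level inputs are hypothetical and are not taken from the review; the random-effects estimate also quoted in the abstract would need an additional between-study variance term that is not shown here.

```python
import math

def pooled_odds_ratio(odds_ratios, ci_lows, ci_highs, z=1.96):
    """Inverse-variance fixed-effect pooling of odds ratios on the log scale."""
    weighted_sum = 0.0
    weight_total = 0.0
    for or_, lo, hi in zip(odds_ratios, ci_lows, ci_highs):
        se = (math.log(hi) - math.log(lo)) / (2.0 * z)  # SE recovered from the 95% CI
        w = 1.0 / se ** 2
        weighted_sum += w * math.log(or_)
        weight_total += w
    log_pooled = weighted_sum / weight_total
    se_pooled = math.sqrt(1.0 / weight_total)
    return (math.exp(log_pooled),
            math.exp(log_pooled - z * se_pooled),
            math.exp(log_pooled + z * se_pooled))

# Hypothetical study-level inputs, not taken from the review.
pooled, low, high = pooled_odds_ratio([2.0, 2.0], [1.0, 1.0], [4.0, 4.0])
```

With two identical studies the pooled estimate equals the common odds ratio, and the pooled interval is narrower than either study's interval because the weights add.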
Rubio, Francisco J.
2016-02-09
We study Bayesian linear regression models with skew-symmetric scale mixtures of normal error distributions. These kinds of models can be used to capture departures from the usual assumption of normality of the errors in terms of heavy tails and asymmetry. We propose a general noninformative prior structure for these regression models and show that the corresponding posterior distribution is proper under mild conditions. We extend these propriety results to cases where the response variables are censored. The latter scenario is of interest in the context of accelerated failure time models, which are relevant in survival analysis. We present a simulation study that demonstrates good frequentist properties of the posterior credible intervals associated with the proposed priors. This study also sheds some light on the trade-off between increased model flexibility and the risk of over-fitting. We illustrate the performance of the proposed models with real data. Although we focus on models with univariate response variables, we also present some extensions to the multivariate case in the Supporting Information.
Gayou, Olivier; Das, Shiva K; Zhou, Su-Min; Marks, Lawrence B; Parda, David S; Miften, Moyed
2008-12-01
A given outcome of radiotherapy treatment can be modeled by analyzing its correlation with a combination of dosimetric, physiological, biological, and clinical factors, through a logistic regression fit of a large patient population. The quality of the fit is measured by the combination of the predictive power of this particular set of factors and the statistical significance of the individual factors in the model. We developed a genetic algorithm (GA), in which a small sample of all the possible combinations of variables are fitted to the patient data. New models are derived from the best models, through crossover and mutation operations, and are in turn fitted. The process is repeated until the sample converges to the combination of factors that best predicts the outcome. The GA was tested on a data set that investigated the incidence of lung injury in NSCLC patients treated with 3DCRT. The GA identified a model with two variables as the best predictor of radiation pneumonitis: the V30 (p=0.048) and the ongoing use of tobacco at the time of referral (p=0.074). This two-variable model was confirmed as the best model by analyzing all possible combinations of factors. In conclusion, genetic algorithms provide a reliable and fast way to select significant factors in logistic regression analysis of large clinical studies.
Improved Regression Analysis of Temperature-Dependent Strain-Gage Balance Calibration Data
Ulbrich, N.
2015-01-01
An improved approach is discussed that may be used to directly include first and second order temperature effects in the load prediction algorithm of a wind tunnel strain-gage balance. The improved approach was designed for the Iterative Method that fits strain-gage outputs as a function of calibration loads and uses a load iteration scheme during the wind tunnel test to predict loads from measured gage outputs. The improved approach assumes that the strain-gage balance is at a constant uniform temperature when it is calibrated and used. First, the method introduces a new independent variable for the regression analysis of the balance calibration data. The new variable is designed as the difference between the uniform temperature of the balance and a global reference temperature. This reference temperature should be the primary calibration temperature of the balance so that, if needed, a tare load iteration can be performed. Then, two temperature-dependent terms are included in the regression models of the gage outputs. They are the temperature difference itself and the square of the temperature difference. Simulated temperature-dependent data obtained from Triumph Aerospace's 2013 calibration of NASA's ARC-30K five component semi-span balance is used to illustrate the application of the improved approach.
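The two temperature-dependent regression terms described above can be sketched for a deliberately simplified case. The example assumes a single load component and invented coefficients; a real balance calibration involves several load components and cross terms, and the function name `fit_gage_output` is hypothetical.

```python
import numpy as np

def fit_gage_output(load, temperature, gage_output, reference_temperature):
    """Least-squares fit of output = c0 + c1*F + c2*dT + c3*dT**2."""
    load = np.asarray(load, dtype=float)
    # New independent variable: balance temperature minus the reference temperature
    delta_t = np.asarray(temperature, dtype=float) - reference_temperature
    design = np.column_stack([np.ones_like(load), load, delta_t, delta_t ** 2])
    coef, *_ = np.linalg.lstsq(design, np.asarray(gage_output, dtype=float), rcond=None)
    return coef

# Hypothetical, noise-free calibration points for a single load component.
rng = np.random.default_rng(3)
load = rng.uniform(-100.0, 100.0, size=40)
temperature = rng.uniform(280.0, 320.0, size=40)
true_coef = np.array([0.5, 0.02, 0.003, -0.0001])
delta_t = temperature - 294.0
output = (true_coef[0] + true_coef[1] * load
          + true_coef[2] * delta_t + true_coef[3] * delta_t ** 2)
coef = fit_gage_output(load, temperature, output, reference_temperature=294.0)
```

Because the temperature difference is zero at the reference temperature, both added terms vanish there and the fit reduces to the usual load-only model.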
An innovative land use regression model incorporating meteorology for exposure analysis.
Su, Jason G; Brauer, Michael; Ainslie, Bruce; Steyn, Douw; Larson, Timothy; Buzzelli, Michael
2008-02-15
The advent of spatial analysis and geographic information systems (GIS) has led to studies of chronic exposure and health effects based on the rationale that intra-urban variations in ambient air pollution concentrations are as great as inter-urban differences. Such studies typically rely on local spatial covariates (e.g., traffic, land use type) derived from circular areas (buffers) to predict concentrations/exposures at receptor sites, as a means of averaging the annual net effect of meteorological influences (i.e., wind speed, wind direction and insolation). This is the approach taken in the now popular land use regression (LUR) method. However, spatial studies of chronic exposures and temporal studies of acute exposures have not been adequately integrated. This paper presents an innovative LUR method implemented in a GIS environment that reflects both temporal and spatial variability and considers the role of meteorology. The new source area LUR integrates wind speed, wind direction and cloud cover/insolation to estimate hourly nitric oxide (NO) and nitrogen dioxide (NO2) concentrations from land use types (i.e., road network, commercial land use) and these concentrations are then used as covariates to regress against NO and NO2 measurements at various receptor sites across the Vancouver region and compared directly with estimates from a regular LUR. The results show that, when variability in seasonal concentration measurements is present, the source area LUR or SA-LUR model is a better option for concentration estimation.
A quantile regression approach to the analysis of the quality of life determinants in the elderly
Directory of Open Access Journals (Sweden)
Serena Broccoli
2013-05-01
Objective. The aim of this study is to explain the effect of important covariates on health-related quality of life (HRQoL) in elderly subjects. Methods. Data were collected within a longitudinal study involving 5256 subjects aged ≥ 65. The Visual Analogue Scale included in the EQ-5D Questionnaire, the EQ-VAS, was used to obtain a synthetic measure of quality of life. A quantile regression analysis was employed to model the EQ-VAS score. This methodological approach was preferred to OLS regression because of the typical distribution of the EQ-VAS score. The main covariates are: amount of weekly physical activity, reported problems in Activities of Daily Living (ADL), presence of cardiovascular diseases, diabetes, hypercholesterolemia, hypertension, joint pain, as well as socio-demographic information. Main Results. (1) Even a low level of physical activity significantly influences quality of life in a positive way; (2) ADL problems, at least one cardiovascular disease and joint pain strongly decrease the quality of life.
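Quantile regression replaces the squared-error criterion of OLS with the asymmetric "check" (pinball) loss. The sketch below only demonstrates that loss on a constant predictor, which is enough to show why it targets a quantile rather than the mean; the scores are invented, and a full quantile regression with covariates would minimise the same loss over regression coefficients.

```python
import numpy as np

def pinball_loss(y, prediction, tau):
    """Mean check loss: tau * r for r >= 0 and (tau - 1) * r for r < 0."""
    residual = np.asarray(y, dtype=float) - prediction
    return float(np.mean(np.maximum(tau * residual, (tau - 1.0) * residual)))

# Hypothetical left-skewed scores on a 0-100 scale.
scores = np.array([30.0, 55.0, 60.0, 70.0, 75.0, 80.0, 85.0, 90.0, 100.0])
candidates = np.arange(0.0, 100.5, 0.5)

# Grid search for the constant that minimises the loss at tau = 0.5 (the median).
losses = [pinball_loss(scores, c, 0.5) for c in candidates]
best_median = candidates[int(np.argmin(losses))]
```

Changing `tau` moves the minimiser to a different quantile of the scores, which is what lets the method describe covariate effects across the whole distribution of a skewed outcome.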
Mahani, Mohamad Khayatzadeh; Chaloosi, Marzieh; Maragheh, Mohamad Ghanadi; Khanchi, Ali Reza; Afzali, Daryoush
2007-09-01
The oral acute in vivo toxicity of 32 amine and amide drugs was related to their structural-dependent properties. Genetic algorithm-partial least-squares and stepwise variable selection were applied to select meaningful descriptors. Multiple linear regression (MLR), artificial neural network (ANN) and partial least square (PLS) models were created with selected descriptors. The predictive ability of all three models was evaluated and compared on a set of five drugs, which were not used in modeling steps. Average errors of 0.168, 0.169 and 0.259 were obtained for MLR, ANN and PLS, respectively.
Naghshpour, Shahdad
2012-01-01
Regression analysis is the most commonly used statistical method in the world. Although few would characterize this technique as simple, regression is in fact both simple and elegant. The complexity that many attribute to regression analysis is often a reflection of their lack of familiarity with the language of mathematics. But regression analysis can be understood even without a mastery of sophisticated mathematical concepts. This book provides the foundation and will help demystify regression analysis using examples from economics and with real data to show the applications of the method. T
Institute of Scientific and Technical Information of China (English)
牛东晓; 刘达; 邢棉
2008-01-01
A combined model based on principal components analysis (PCA) and generalized regression neural network (GRNN) was adopted to forecast electricity price in the day-ahead electricity market. PCA was applied to extract the main influences on the day-ahead price, avoiding the strong correlation between the input factors that might influence electricity price, such as the load of the forecasting hour, other historical loads and prices, weather and temperature; GRNN was then employed to forecast electricity price from the main information extracted by PCA. To prove the efficiency of the combined model, a case from the PJM (Pennsylvania-New Jersey-Maryland) day-ahead electricity market was evaluated. Compared to back-propagation (BP) neural network and standard GRNN, the combined method reduces the mean absolute percentage error by about 3%.
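The two-stage idea described above, decorrelating the inputs with PCA and then forecasting with a GRNN, can be sketched as follows. The data are synthetic, the function names and the smoothing parameter `sigma=0.5` are assumptions, and the GRNN is written in its basic form as a Gaussian-kernel weighted average of training targets; the abstract gives no implementation details.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return the column means and the leading principal axes of X."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components].T

def grnn_predict(train_x, train_y, query_x, sigma):
    """GRNN output: Gaussian-kernel weighted average of the training targets."""
    d2 = ((query_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=2)
    d2 -= d2.min(axis=1, keepdims=True)  # rescales each row's weights equally; avoids underflow
    weights = np.exp(-d2 / (2.0 * sigma ** 2))
    return (weights @ train_y) / weights.sum(axis=1)

# Hypothetical data: 6 strongly correlated inputs driven by 2 latent factors.
rng = np.random.default_rng(2)
latent = rng.normal(size=(60, 2))
inputs = latent @ rng.normal(size=(2, 6)) + 0.05 * rng.normal(size=(60, 6))
price = 40.0 + 5.0 * latent[:, 0] - 3.0 * latent[:, 1]

# Fit PCA on the first 50 samples only, then project both subsets.
mean, axes = pca_fit(inputs[:50], n_components=2)
scores_train = (inputs[:50] - mean) @ axes
scores_test = (inputs[50:] - mean) @ axes
forecast = grnn_predict(scores_train, price[:50], scores_test, sigma=0.5)
```

Fitting the PCA on the training rows alone keeps information from the held-out rows out of the projection, which matters when the forecast error is the quantity being compared.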
Logistic Regression Analysis on Factors Affecting Adoption of Rice-Fish Farming in North Iran
Institute of Scientific and Technical Information of China (English)
Seyyed Ali NOORHOSSEINI-NIYAKI; Mohammad Sadegh ALLAHYARI
2012-01-01
We evaluated the factors influencing the adoption of rice-fish farming in the Tavalesh region near the Caspian Sea in northern Iran. We conducted a survey with open-ended questions. Data were collected from 184 respondents (61 adopters and 123 non-adopters) randomly sampled from selected villages and analyzed using logistic regression and multiresponse analysis. Family size, number of contacts with an extension agent, participation in extension-education activities, membership in social institutions and the presence of farm workers were the most important socioeconomic factors for the adoption of the rice-fish farming system. In addition, economic problems were the most common issue reported by adopters. Other issues such as lack of access to appropriate fish food, losses of fish, lack of access to high quality fish fingerlings and dehydration and poor water quality were also important to a number of farmers.
A Note on Penalized Regression Spline Estimation in the Secondary Analysis of Case-Control Data
Gazioglu, Suzan
2013-05-25
Primary analysis of case-control studies focuses on the relationship between disease (D) and a set of covariates of interest (Y, X). A secondary application of the case-control study, often invoked in modern genetic epidemiologic association studies, is to investigate the interrelationship between the covariates themselves. The task is complicated due to the case-control sampling, and to avoid the biased sampling that arises from the design, it is typical to use the control data only. In this paper, we develop penalized regression spline methodology that uses all the data, and improves precision of estimation compared to using only the controls. A simulation study and an empirical example are used to illustrate the methodology.
Tam, Vivian W Y; Wang, K; Tam, C M
2008-04-01
Recycled demolished concrete (DC) as recycled aggregate (RA) and recycled aggregate concrete (RAC) is generally suitable for most construction applications. Low-grade applications, including sub-base and roadwork, have been implemented in many countries; however, higher-grade activities are rarely considered. This paper examines relationships among DC characteristics, properties of their RA and strength of their RAC using regression analysis. Ten samples collected from demolition sites are examined. The results show strong correlation among the DC samples, properties of RA and RAC. It should be highlighted that inferior quality of DC will lower the quality of RA and thus their RAC. Prediction of RAC strength is also formulated from the DC characteristics and the RA properties. From that, the RAC performance from DC and RA can be estimated. In addition, RAC design requirements can also be developed at the initial stage of concrete demolition. Recommendations are also given to improve the future concreting practice. PMID:17764837
Biological stability in drinking water: a regression analysis of influencing factors
Institute of Scientific and Technical Information of China (English)
LU Wei; ZHANG Xiao-jian
2005-01-01
Some parameters, such as assimilable organic carbon (AOC), chloramine residual, water temperature, and water residence time, were measured in drinking water from distribution systems in a northern city of China. The measurement results illustrate that when chloramine residual is more than 0.3 mg/L or AOC content is below 50 μg/L, the biological stability of drinking water can be controlled. Both chloramine residual and AOC have a good relationship with heterotrophic plate counts (HPC, log value); the correlation coefficients were -0.64 and 0.33, respectively. By regression analysis of the survey data, a statistical equation is presented and it is concluded that disinfectant residual exerts the strongest influence on bacterial growth and AOC is a suitable index to assess the biological stability in the drinking water.
ANALYSIS OF TUITION GROWTH RATES BASED ON CLUSTERING AND REGRESSION MODELS
Directory of Open Access Journals (Sweden)
Long Cheng
2016-07-01
Tuition plays a significant role in determining whether a student can afford higher education, which is one of the major driving forces for country development and social prosperity. So it is necessary to fully understand what factors might affect the tuition and how they affect it. However, many existing studies on the tuition growth rate either lack sufficient real data and proper quantitative models to support their conclusions, or focus on only a few factors that might affect the tuition growth rate, failing to make a comprehensive analysis. In this paper, we explore a wide variety of factors that might affect the tuition growth rate using large amounts of authentic data and different quantitative methods such as clustering and regression models.
Directory of Open Access Journals (Sweden)
Wu, X. B.
2006-06-01
Four body-size and fourteen head-size measurements were taken from each Chinese alligator (Alligator sinensis) according to the measurements adapted from Verdade. Regression equations between body-size and head-size variables were presented to predict body size from head dimension. The coefficients of determination of captive animals concerning body- and head-size variables can be considered extremely high, which means most of the head-size variables studied can be useful for predicting body length. The result of multivariate allometric analysis indicated that the head elongates as in most other species of crocodilians. The allometric coefficients of snout length (SL) and lower ramus (LM) were greater than those of other head variables, which was considered to be possibly correlated with fights and prey. On the contrary, allometric coefficients for the variables of the orbit (OW, OL) and postorbital cranial roof (LCR) were lower than those of other variables.
Directory of Open Access Journals (Sweden)
Kulikov Vladimir
2016-01-01
We have been elaborating an approach founded on the identification of multimodal laws of the complex structure distribution in medicine, biology, chemistry of ultrapure materials and membrane technology as well as in technical applications. The method is based on the formulation and solution of inverse problems in mathematical physics for the respective probability density functions. The verification of the used algorithmic tools is carried out on model limited-scope samples. For stochastic structures and systems under study the method is supplemented with an original option of a regression analysis taking into account the identified stochastic laws displaying numerical parameters into the binary space. The proposed approach has been tested on clinical material in practical medicine.
Deterministic Assessment of Continuous Flight Auger Construction Durations Using Regression Analysis
Directory of Open Access Journals (Sweden)
Hossam E. Hosny
2015-07-01
One of the primary functions of construction equipment management is to calculate the production rate of equipment, which is a major input to the processes of time estimates, cost estimates and overall project planning. Accordingly, it is crucial for stakeholders to be able to compute equipment production rates. This may be achieved using an accurate, reliable and easy tool. The objective of this research is to provide a simple model that can be used by specialists to predict the duration of a proposed Continuous Flight Auger job. The model was obtained using a prioritizing technique based on expert judgment, followed by multi-regression analysis based on a representative sample. The model was then validated on a selected sample of projects. The average error of the model was calculated to be about 3%-6%.
Snyder, Carolyn W.
2016-09-01
Statistical challenges often preclude comparisons among different sea surface temperature (SST) reconstructions over the past million years. Inadequate consideration of uncertainty can result in misinterpretation, overconfidence, and biased conclusions. Here I apply Bayesian hierarchical regressions to analyze local SST responsiveness to climate changes for 54 SST reconstructions from across the globe over the past million years. I develop methods to account for multiple sources of uncertainty, including the quantification of uncertainty introduced from absolute dating into interrecord comparisons. The estimates of local SST responsiveness explain 64% (62% to 77%, 95% interval) of the total variation within each SST reconstruction with a single number. There is remarkable agreement between SST proxy methods, with the exception of Mg/Ca proxy methods estimating muted responses at high latitudes. The Indian Ocean exhibits a muted response in comparison to other oceans. I find a stable estimate of the proposed "universal curve" of change in local SST responsiveness to climate changes as a function of sin²(latitude) over the past 400,000 years: SST change at 45°N/S is larger than the average tropical response by a factor of 1.9 (1.5 to 2.6, 95% interval) and explains 50% (35% to 58%, 95% interval) of the total variation between each SST reconstruction. These uncertainty and statistical methods are well suited for application across paleoclimate and environmental data series intercomparisons.
Directory of Open Access Journals (Sweden)
Tao Gao
2014-01-01
Extreme precipitation is likely to be one of the most severe meteorological disasters in China; however, studies on the physical factors affecting precipitation extremes and on corresponding prediction models remain limited. From a new point of view, the sensible heat flux (SHF) and latent heat flux (LHF), which have significant impacts on summer extreme rainfall in the Yangtze River basin (YRB), have been quantified and the impact factors then selected. First, a regional extreme precipitation index was applied to determine Regions of Significant Correlation (RSC) by analyzing the spatial distribution of correlation coefficients between this index and SHF, LHF, and sea surface temperature (SST) on the global ocean scale; then the time series of SHF, LHF, and SST in the RSCs during 1967–2010 were selected. Furthermore, other factors that significantly affect variations in precipitation extremes over the YRB were also selected. Multiple stepwise regression and leave-one-out cross-validation (LOOCV) were used to analyze and test the influencing factors and the statistical prediction model. The correlation coefficient between the observed regional extreme index and the model simulation result is 0.85, significant at the 99% level. This suggests that the forecast skill was acceptable, although many aspects of the prediction model should be improved.
Poisson regression analysis of the mortality among a cohort of World War II nuclear industry workers
International Nuclear Information System (INIS)
A historical cohort mortality study was conducted among 28,008 white male employees who had worked for at least 1 month in Oak Ridge, Tennessee, during World War II. The workers were employed at two plants that were producing enriched uranium and a research and development laboratory. Vital status was ascertained through 1980 for 98.1% of the cohort members and death certificates were obtained for 96.8% of the 11,671 decedents. A modified version of the traditional standardized mortality ratio (SMR) analysis was used to compare the cause-specific mortality experience of the World War II workers with the U.S. white male population. An SMR and a trend statistic were computed for each cause-of-death category for the 30-year interval from 1950 to 1980. The SMR for all causes was 1.11, and there was a significant upward trend of 0.74% per year. The excess mortality was primarily due to lung cancer and diseases of the respiratory system. Poisson regression methods were used to evaluate the influence of duration of employment, facility of employment, socioeconomic status, birth year, period of follow-up, and radiation exposure on cause-specific mortality. Maximum likelihood estimates of the parameters in a main-effects model were obtained to describe the joint effects of these six factors on cause-specific mortality of the World War II workers. We show that these multivariate regression techniques provide a useful extension of conventional SMR analysis and illustrate their effective use in a large occupational cohort study
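For orientation, the conventional SMR that the Poisson models above extend is simply observed deaths divided by the deaths expected from reference rates. In an intercept-only Poisson model with the logarithm of the expected deaths as an offset, the maximum likelihood estimate of the exponentiated intercept equals this ratio. The sketch below uses invented person-years and rates, not the cohort's data, and the function name is hypothetical.

```python
def standardized_mortality_ratio(observed_deaths, person_years, reference_rates):
    """Return (SMR, expected deaths) from stratum person-years and reference rates."""
    expected = sum(py * rate for py, rate in zip(person_years, reference_rates))
    return observed_deaths / expected, expected

# Hypothetical strata (e.g. two age bands); values are illustrative only.
person_years = [10000.0, 20000.0]
reference_rates = [0.001, 0.005]  # deaths per person-year in the reference population
smr, expected = standardized_mortality_ratio(121, person_years, reference_rates)
```

Adding covariates such as duration of employment or birth year to the Poisson model lets the ratio vary across subgroups instead of being summarised by one number.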
International Nuclear Information System (INIS)
The ORC (organic Rankine cycle) is an established technology for converting low temperature heat to electricity. Knowing that most of the commercially available ORCs are of the subcritical type, there is potential for improvement by implementing new cycle architectures. The cycles under consideration are: the SCORC (subcritical ORC), the TCORC (transcritical ORC) and the PEORC (partial evaporation ORC). Care is taken to develop an optimization strategy considering various boundary conditions. The analysis and comparison is based on an exergy approach. Initially 67 possible working fluids are investigated. In successive stages design constraints are added. First, only environmentally friendly working fluids are retained. Next, the turbine outlet is constrained to a superheated state. Finally, the heat carrier exit temperature is restricted and addition of a recuperator is considered. Regression models with low computational cost are provided to quickly evaluate each design implications. The results indicate that the PEORC clearly outperforms the TCORC by up to 25.6% in second law efficiency, while the TCORC outperforms the SCORC by up to 10.8%. For high waste heat carrier inlet temperatures the performance gain becomes small. Additionally, a high performing environmentally friendly working fluid for the TCORC is missing at low heat carrier temperatures (100 °C). - Highlights: • Thermodynamic analysis of subcritical, transcritical and partial evaporation ORC. • Regression models are provided to quickly assess design implications. • Performance gain up to 25.6% for PEORC compared to TCORC. • Performance gain up to 10.8% for TCORC compared to SCORC. • Opportunity for new low temperature environmentally friendly working fluids
Biplots in Reduced-Rank Regression
Braak, ter C.J.F.; Looman, C.W.N.
1994-01-01
Regression problems with a number of related response variables are typically analyzed by separate multiple regressions. This paper shows how these regressions can be visualized jointly in a biplot based on reduced-rank regression. Reduced-rank regression combines multiple regression and principal c
Directory of Open Access Journals (Sweden)
Marco Aurélio Carino Bouzada
2009-09-01
This work describes, through a case study, the problem of forecasting call demand for a given product at the call center of a large Brazilian company in the sector, Contax, and how it was approached using Multiple Regression with dummy variables. After highlighting and justifying the importance of the topic, the study presents a brief literature review on demand forecasting methods and their application in call centers. The case is described by first providing context on the company studied and then describing how it handles the problem of forecasting call demand for product 103, services related to fixed-line telephony. A Multiple Regression model with dummy variables is then developed to serve as the basis of the proposed demand forecasting process. This model uses available information capable of influencing demand, such as the day of the week, whether or not a holiday occurs, and the proximity of the date to critical events such as the arrival of the bill at the customer's home and its due date; it showed accuracy gains of around 3 percentage points for the period studied when compared with the tool previously in use.
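A dummy-variable regression of the kind described above can be sketched as follows. The call volumes, effect sizes, and holiday dates are invented for illustration, and the function name `dummy_design` is hypothetical; proximity to bill arrival or due dates would enter as further 0/1 columns built the same way.

```python
import numpy as np

def dummy_design(day_of_week, is_holiday):
    """Intercept, six weekday dummies (day 0 is the baseline) and a holiday dummy."""
    day_of_week = np.asarray(day_of_week)
    columns = [np.ones(day_of_week.size)]
    columns += [(day_of_week == d).astype(float) for d in range(1, 7)]
    columns.append(np.asarray(is_holiday, dtype=float))
    return np.column_stack(columns)

# Hypothetical, noise-free call volumes over 8 weeks (0 = Monday ... 6 = Sunday).
days = np.arange(56) % 7
holiday = np.zeros(56)
holiday[[3, 20, 41]] = 1.0
day_effect = np.array([0.0, -50.0, -80.0, -60.0, -20.0, -400.0, -600.0])
calls = 1000.0 + day_effect[days] - 300.0 * holiday

coef, *_ = np.linalg.lstsq(dummy_design(days, holiday), calls, rcond=None)
```

One weekday is left out as the baseline so that the dummies are not collinear with the intercept; each fitted coefficient is then read as a shift in call volume relative to that baseline day.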
Directory of Open Access Journals (Sweden)
Süleyman Demir
2014-04-01
This study performs a Differential Item Functioning (DIF) analysis in terms of gender and culture on the items in the PISA 2009 mathematics literacy sub-test. The DIF analyses were done using the Mantel-Haenszel, Logistic Regression and SIBTEST methods. The data for the gender variable were collected from the responses given by 332 students to the items in the mathematics literacy sub-test during the administration of the 5th booklet in the PISA 2009 application, whereas the data for the culture variable were collected through the application of the 5th booklet in Turkey, Germany, Finland and the United States. The DIF analysis by gender showed that 4 items favored male students and only one item favored female students. The DIF analysis by culture found 16 items showing DIF between Turkish and German students, 14 items between Turkish and Finnish students, and 18 items between Turkish and United States students.
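Of the three DIF methods named above, the Mantel-Haenszel procedure is the simplest to illustrate. The sketch below computes the common odds ratio across ability strata for a single item and converts it to the delta scale using the conventional factor of -2.35; the counts are invented, the function names are hypothetical, and the study's own classification thresholds are not reproduced here.

```python
import math

def mantel_haenszel_odds_ratio(strata):
    """Common odds ratio over strata of (ref_right, ref_wrong, focal_right, focal_wrong)."""
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        numerator += a * d / n    # reference correct x focal incorrect
        denominator += b * c / n  # reference incorrect x focal correct
    return numerator / denominator

def mh_delta(alpha):
    """Delta-scale DIF statistic; negative values favour the reference group."""
    return -2.35 * math.log(alpha)

# Hypothetical counts for one item in two total-score strata.
no_dif = mantel_haenszel_odds_ratio([(10, 10, 10, 10), (20, 5, 20, 5)])
with_dif = mantel_haenszel_odds_ratio([(20, 10, 10, 20)])
```

Stratifying on total score is what separates DIF from a genuine ability difference: the two groups are compared only among examinees of similar overall performance.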
Fulton, Barry A; Meyer, Joseph S
2014-08-01
The water effect ratio (WER) procedure developed by the US Environmental Protection Agency is commonly used to derive site-specific criteria for point-source metal discharges into perennial waters. However, experience is limited with this method in the ephemeral and intermittent systems typical of arid climates. The present study presents a regression model to develop WER-based site-specific criteria for a network of ephemeral and intermittent streams influenced by nonpoint sources of Cu in the southwestern United States. Acute (48-h) Cu toxicity tests were performed concurrently with Daphnia magna in site water samples and hardness-matched laboratory waters. Median effect concentrations (EC50s) for Cu in site water samples (n=17) varied by more than 12-fold, and the range of calculated WER values was similar. Statistically significant (α=0.05) univariate predictors of site-specific Cu toxicity included (in sequence of decreasing significance) dissolved organic carbon (DOC), hardness/alkalinity ratio, alkalinity, K, and total dissolved solids. A multiple-regression model developed from a combination of DOC and alkalinity explained 85% of the toxicity variability in site water samples, providing a strong predictive tool that can be used in the WER framework when site-specific criteria values are derived. The biotic ligand model (BLM) underpredicted toxicity in site waters by more than 2-fold. Adjustments to the default BLM parameters improved the model's performance but did not provide a better predictive tool compared with the regression model developed from DOC and alkalinity.
Energy Technology Data Exchange (ETDEWEB)
Lu, Lee-Jane W [Department of Preventive Medicine and Community Health, University of Texas Medical Branch, Galveston, TX 77555-1109 (United States)]; Nishino, Thomas K [Department of Radiology, University of Texas Medical Branch, Galveston, TX 77555-0709 (United States)]; Khamapirad, Tuenchit [Department of Radiology, University of Texas Medical Branch, Galveston, TX 77555-0709 (United States)]; Grady, James J [Department of Preventive Medicine and Community Health, University of Texas Medical Branch, Galveston, TX 77555-1109 (United States)]; Leonard, Morton H Jr [Department of Radiology, University of Texas Medical Branch, Galveston, TX 77555-0709 (United States)]; Brunder, Donald G [Department of Academic Computing/Academic Resources, University of Texas Medical Branch, Galveston, TX 77555-1035 (United States)]
2007-08-21
Breast density (the percentage of fibroglandular tissue in the breast) has been suggested to be a useful surrogate marker for breast cancer risk. It is conventionally measured using screen-film mammographic images by a labor-intensive histogram segmentation method (HSM). We have adapted and modified the HSM for measuring breast density from raw digital mammograms acquired by full-field digital mammography. Multiple regression model analyses showed that many of the instrument parameters for acquiring the screening mammograms (e.g. breast compression thickness, radiological thickness, radiation dose, compression force, etc.) and image pixel intensity statistics of the imaged breasts were strong predictors of the observed threshold values (model R² = 0.93) and %-density (R² = 0.84). The intra-class correlation coefficient of the %-density for duplicate images was estimated to be 0.80, using the regression model-derived threshold values, and 0.94 if estimated directly from the parameter estimates of the %-density prediction regression model. Therefore, with additional research, these mathematical models could be used to compute breast density objectively, automatically bypassing the HSM step, and could greatly facilitate breast cancer research studies.
Lu, Lee-Jane W.; Nishino, Thomas K.; Khamapirad, Tuenchit; Grady, James J.; Leonard, Morton H., Jr.; Brunder, Donald G.
2007-08-01
Breast density (the percentage of fibroglandular tissue in the breast) has been suggested to be a useful surrogate marker for breast cancer risk. It is conventionally measured using screen-film mammographic images by a labor-intensive histogram segmentation method (HSM). We have adapted and modified the HSM for measuring breast density from raw digital mammograms acquired by full-field digital mammography. Multiple regression model analyses showed that many of the instrument parameters for acquiring the screening mammograms (e.g. breast compression thickness, radiological thickness, radiation dose, compression force, etc.) and image pixel intensity statistics of the imaged breasts were strong predictors of the observed threshold values (model R² = 0.93) and %-density (R² = 0.84). The intra-class correlation coefficient of the %-density for duplicate images was estimated to be 0.80, using the regression model-derived threshold values, and 0.94 if estimated directly from the parameter estimates of the %-density prediction regression model. Therefore, with additional research, these mathematical models could be used to compute breast density objectively, automatically bypassing the HSM step, and could greatly facilitate breast cancer research studies.
Directory of Open Access Journals (Sweden)
Alberto Alberti
2015-01-01
Feline viral plaques are uncommon skin lesions clinically characterized by multiple, often pigmented, and slightly raised lesions. Numerous reports suggest that papillomaviruses (PVs) are involved in their development. Immunosuppressed and immunocompetent cats are both affected, the biological behavior is variable, and regression is possible but rarely documented. Here we report a case of a FIV-positive cat with skin fragility syndrome and regressing multiple viral plaques in which the concurrent presence of two PV types (FcaPV2 and FcaPV3) was demonstrated by combining a quantitative molecular approach with histopathology. The cat, under glucocorticoid therapy for stomatitis and pruritus, developed skin fragility and numerous grouped, slightly raised, nonulcerated pigmented macules and plaques with histological features of epidermal thickness, mild dysplasia, and presence of koilocytes. Absolute quantification of the viral DNA copies (4555 copies/microliter of FcaPV2 and 8655 copies/microliter of FcaPV3) was obtained. Eighteen months after discontinuation of glucocorticoid therapy, skin fragility and viral plaques had resolved. The role of the two viruses cannot be established, and it remains undetermined how each virus contributed to the onset of the viral plaques (VP); the spontaneous remission of skin lesions might have been induced by a change in FIV status over time due to glucocorticoid withdrawal and by the glucocorticoid withdrawal itself.
Díaz, S.; Deferrari, G.; Martinioni, D.; Oberto, A.
2000-05-01
Factors affecting UV radiation at the earth's surface include the solar zenith angle, earth-sun distance, clouds, aerosols, altitude, ozone and the ground's albedo. The variation of some factors, such as solar zenith angle and earth-sun distance, is well established. Total column ozone and UV radiation are inversely related, but the presence of clouds may affect the resulting UV in such a way that a depletion in the total column ozone may not always lead to an increase in the radiation at the earth's surface. The aim of this paper is to determine the contribution to the variation of the biologically effective irradiance by geometric factors, clouds and ozone, jointly and separately, in Ushuaia (54°49'S, 68°19'W, sea level), and the seasonal variation of this relationship, given the magnitude and seasonal distribution of the ozone depletion and the frequent presence of high cloud cover in this site. For this purpose, multivariate and simple regression analyses of daily and monthly integrated irradiances weighted by the DNA damage action spectrum as a function of total column ozone and the integrated irradiances in the band 337-342 nm (as a proxy for cloud cover and geometric factors) have been performed. For the analysed period (September 1989-December 1996) more than 97% of the variation of the DNA damage weighted daily integrated irradiances is described by changes in ozone, clouds and geometric factors. Simple regression analysis for daily integrated irradiances, grouped by month, shows that most of this variation is explained by clouds and geometric factors, except in spring, when strong ozone depletion occurs intermittently over this area. When monthly trends are removed, similar results are observed, except for late winter.
Tabatabai Mohammad A; Eby Wayne M; Nimeh Nadim; Li Hong; Singh Karan P
2012-01-01
Background: We explore the benefits of applying a new proportional hazard model to analyze survival of breast cancer patients. As a parametric model, the hypertabastic survival model offers a closer fit to experimental data than Cox regression, and furthermore provides explicit survival and hazard functions which can be used as additional tools in the survival analysis. In addition, one of our main concerns is utilization of multiple gene expression variables. Our analysis treats the ...
Institute of Scientific and Technical Information of China (English)
王蕾; 牟李红; 梁浩; 秦波; 李革; 杨戎
2011-01-01
diseases; and keep the reproductive system healthy, the acceptance rate increased by 33.06%, and might increase by 40.82% (P < .0001). The results of the multivariate logistic regression showed that the factors affecting the willingness to accept the surgery were as follows: whether or not they had phimosis or redundant foreskin; whether or not the surgery could increase sexual satisfaction in the future; whether or not there were close friends or relatives who had had this surgery; and so on. Conclusion: Promoting the operation among the public requires publicity efforts. Medical students who have undergone the operation can be trained to provide health education to other people; meanwhile, relevant divisions or departments can improve the technique and make the surgery cheaper, safer and easier.
Directory of Open Access Journals (Sweden)
Pudji Ismartini
2010-08-01
One of the major problems facing data modelling in the social field is multicollinearity. Multicollinearity can have a significant impact on the quality and stability of the fitted regression model. Common classical regression techniques using the Least Squares estimate are highly sensitive to the multicollinearity problem. In such problem areas, Partial Least Squares Regression (PLSR) is a useful and flexible tool for statistical model building; however, PLSR can only yield point estimates. This paper constructs interval estimates for PLSR regression parameters by applying the Jackknife technique to poverty data. A SAS macro programme is developed to obtain the Jackknife interval estimator for PLSR.
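The jackknife idea in this abstract can be sketched in a few lines. This is an illustration under stated assumptions, not the paper's SAS macro: a simple least-squares slope stands in for a PLSR coefficient, the data are invented, the function names are hypothetical, and the interval uses a normal approximation (1.96 standard errors), which is only one possible choice.

```python
import math


def slope(x, y):
    """Least-squares slope of y on x (stand-in for a PLSR coefficient)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def jackknife(x, y, stat):
    """Return (full-sample estimate, jackknife standard error) of stat(x, y)."""
    n = len(x)
    full = stat(x, y)
    # Recompute the statistic n times, leaving one observation out each time.
    loo = [stat(x[:i] + x[i + 1:], y[:i] + y[i + 1:]) for i in range(n)]
    mean_loo = sum(loo) / n
    se = math.sqrt((n - 1) / n * sum((t - mean_loo) ** 2 for t in loo))
    return full, se


# Invented data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

est, se = jackknife(x, y, slope)
lower, upper = est - 1.96 * se, est + 1.96 * se  # normal-approximation interval
print(f"slope = {est:.3f}, jackknife SE = {se:.3f}, interval = ({lower:.3f}, {upper:.3f})")
```

Any estimator that returns a single number can be passed as `stat`, which is why the same leave-one-out loop could in principle wrap a PLSR fit.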
DEFF Research Database (Denmark)
Fitzenberger, Bernd; Wilke, Ralf Andreas
2015-01-01
if the mean regression model does not. We provide a short informal introduction into the principle of quantile regression which includes an illustrative application from empirical labor market research. This is followed by briefly sketching the underlying statistical model for linear quantile regression based......, duration data, and endogeneity, and we describe how quantile regression can be used for decomposition analysis. Finally, we identify several key issues, which should be addressed by future research, and we provide an overview of quantile regression implementations in major statistics software. Our...... treatment of the topic is based on the perspective of applied researchers using quantile regression in their empirical work....
Integrative Data Analysis: The Simultaneous Analysis of Multiple Data Sets
Curran, Patrick J.; Hussong, Andrea M.
2009-01-01
There are both quantitative and methodological techniques that foster the development and maintenance of a cumulative knowledge base within the psychological sciences. Most noteworthy of these techniques is meta-analysis, which allows for the synthesis of summary statistics drawn from multiple studies when the original data are not available.…
Binary Logistic Regression Analysis of Foramen Magnum Dimensions for Sex Determination
Directory of Open Access Journals (Sweden)
Venkatesh Gokuldas Kamath
2015-01-01
Purpose. The structural integrity of the foramen magnum is usually preserved in fire accidents and explosions due to its resistant nature and secluded anatomical position, and this study attempts to determine its sexing potential. Methods. The sagittal and transverse diameters and area of the foramen magnum of seventy-two skulls (41 male and 31 female) from a south Indian population were measured. The analysis was done using Student's t-test, linear correlation, histogram, Q-Q plot, and Binary Logistic Regression (BLR) to obtain a model for sex determination. The predicted probabilities of BLR were analysed using the Receiver Operating Characteristic (ROC) curve. Result. BLR analysis and ROC curve revealed that the predictability of the dimensions in sexing the crania was 69.6% for sagittal diameter, 66.4% for transverse diameter, and 70.3% for area of foramen. Conclusion. The sexual dimorphism of foramen magnum dimensions is established. However, due to considerable overlapping of male and female values, it is unwise to rely solely on the foramen measurements. Nevertheless, considering the high sex predictability percentage of its dimensions in the present study and the studies preceding it, the foramen measurements can be used to supplement other sexing evidence available so as to precisely ascertain the sex of the skeleton.
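Evaluating predicted probabilities with a ROC curve, as this abstract describes, usually comes down to the area under that curve. The sketch below computes the area from its pairwise definition: the chance that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as one half. The function name and the toy scores are hypothetical and are not taken from the study.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    # A positive scoring above a negative counts 1; a tie counts 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy predicted probabilities (e.g. of "male") and true labels; illustrative only.
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
print(roc_auc(scores, labels))  # 0.75
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, which gives a scale for judging how much two overlapping groups can be told apart.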
Gibbons, Robert D; Segawa, Eisuke; Karabatsos, George; Amatya, Anup K; Bhaumik, Dulal K; Brown, C Hendricks; Kapur, Kush; Marcus, Sue M; Hur, Kwan; Mann, J John
2008-05-20
A new statistical methodology is developed for the analysis of spontaneous adverse event (AE) reports from post-marketing drug surveillance data. The method involves both empirical Bayes (EB) and fully Bayes estimation of rate multipliers for each drug within a class of drugs, for a particular AE, based on a mixed-effects Poisson regression model. Both parametric and semiparametric models for the random-effect distribution are examined. The method is applied to data from Food and Drug Administration (FDA)'s Adverse Event Reporting System (AERS) on the relationship between antidepressants and suicide. We obtain point estimates and 95 per cent confidence (posterior) intervals for the rate multiplier for each drug (e.g. antidepressants), which can be used to determine whether a particular drug has an increased risk of association with a particular AE (e.g. suicide). Confidence (posterior) intervals that do not include 1.0 provide evidence for either significant protective or harmful associations of the drug and the adverse effect. We also examine EB, parametric Bayes, and semiparametric Bayes estimators of the rate multipliers and associated confidence (posterior) intervals. Results of our analysis of the FDA AERS data revealed that newer antidepressants are associated with lower rates of suicide adverse event reports compared with older antidepressants. We recommend improvements to the existing AERS system, which are likely to improve its public health value as an early warning system. PMID:18404622
Generalized multilevel function-on-scalar regression and principal component analysis.
Goldsmith, Jeff; Zipunnikov, Vadim; Schrack, Jennifer
2015-06-01
This manuscript considers regression models for generalized, multilevel functional responses: functions are generalized in that they follow an exponential family distribution and multilevel in that they are clustered within groups or subjects. This data structure is increasingly common across scientific domains and is exemplified by our motivating example, in which binary curves indicating physical activity or inactivity are observed for nearly 600 subjects over 5 days. We use a generalized linear model to incorporate scalar covariates into the mean structure, and decompose subject-specific and subject-day-specific deviations using multilevel functional principal components analysis. Thus, functional fixed effects are estimated while accounting for within-function and within-subject correlations, and major directions of variability within and between subjects are identified. Fixed effect coefficient functions and principal component basis functions are estimated using penalized splines; model parameters are estimated in a Bayesian framework using Stan, a programming language that implements a Hamiltonian Monte Carlo sampler. Simulations designed to mimic the application have good estimation and inferential properties with reasonable computation times for moderate datasets, in both cross-sectional and multilevel scenarios; code is publicly available. In the application we identify effects of age and BMI on the time-specific change in probability of being active over a 24-hour period; in addition, the principal components analysis identifies the patterns of activity that distinguish subjects and days within subjects.
Kügler, S. D.; Polsterer, K.; Hoecker, M.
2015-04-01
Context. In astronomy, new approaches to process and analyze the exponentially increasing amount of data are inevitable. For spectra, such as in the Sloan Digital Sky Survey spectral database, usually templates of well-known classes are used for classification. In case the fitting of a template fails, wrong spectral properties (e.g. redshift) are derived. Validation of the derived properties is the key to understand the caveats of the template-based method. Aims: In this paper we present a method for statistically computing the redshift z based on a similarity approach. This allows us to determine redshifts in spectra for emission and absorption features without using any predefined model. Additionally, we show how to determine the redshift based on single features. As a consequence we are, for example, able to filter objects that show multiple redshift components. Methods: The redshift calculation is performed by comparing predefined regions in the spectra and individually applying a nearest neighbor regression model to each predefined emission and absorption region. Results: The choice of the model parameters controls the quality and the completeness of the redshifts. For ≈90% of the analyzed 16 000 spectra of our reference and test sample, a certain redshift can be computed that is comparable to the completeness of SDSS (96%). The redshift calculation yields a precision for every individually tested feature that is comparable to the overall precision of the redshifts of SDSS. Using the new method to compute redshifts, we could also identify 14 spectra with a significant shift between emission and absorption or between emission and emission lines. The results already show the immense power of this simple machine-learning approach for investigating huge databases such as the SDSS.
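A nearest neighbor regression of the kind named in this abstract can be sketched as below. This is a generic illustration, not the authors' pipeline: the feature vectors, the squared Euclidean distance, the value of `k`, and the function name are all assumptions made for the example.

```python
def knn_predict(train_x, train_y, query, k=3):
    """Predict a value for `query` as the mean target of its k nearest training points."""
    # Squared Euclidean distance from the query to every training vector.
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), y)
        for x, y in zip(train_x, train_y)
    )
    nearest = [y for _, y in dists[:k]]
    return sum(nearest) / len(nearest)


# Toy one-dimensional features with known targets (stand-ins for spectral
# regions and redshifts; the numbers are invented).
train_x = [[0.0], [1.0], [2.0], [10.0]]
train_y = [0.0, 1.0, 2.0, 10.0]

print(knn_predict(train_x, train_y, [1.0], k=3))  # 1.0
print(knn_predict(train_x, train_y, [9.0], k=1))  # 10.0
```

Because the prediction is an average over similar training examples, no predefined template or parametric model is needed, which matches the model-free emphasis of the abstract.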
Structural model analysis of multiple quantitative traits.
Directory of Open Access Journals (Sweden)
Renhua Li
2006-07-01
We introduce a method for the analysis of multilocus, multitrait genetic data that provides an intuitive and precise characterization of genetic architecture. We show that it is possible to infer the magnitude and direction of causal relationships among multiple correlated phenotypes and illustrate the technique using body composition and bone density data from mouse intercross populations. Using these techniques we are able to distinguish genetic loci that affect adiposity from those that affect overall body size and thus reveal a shortcoming of standardized measures such as body mass index that are widely used in obesity research. The identification of causal networks sheds light on the nature of genetic heterogeneity and pleiotropy in complex genetic systems.
Boy-Roura, M; Cameron, K C; Di, H J
2016-02-01
This study presents a meta-analysis of 12 experiments that quantify nitrate-N leaching losses from grazed pasture systems in alluvial sedimentary soils in Canterbury (New Zealand). Mean measured nitrate-N leached (kg N/ha × 100 mm drainage) losses were 2.7 when no urine was applied, 8.4 at the urine rate of 300 kg N/ha, 9.8 at 500 kg N/ha, 24.5 at 700 kg N/ha and 51.4 at 1000 kg N/ha. Lismore soils presented significantly higher nitrate-N losses compared to Templeton soils. Moreover, a multiple linear regression (MLR) model was developed to determine the key factors that influence nitrate-N leaching and to predict nitrate-N leaching losses. The MLR analysis was calibrated and validated using 82 average values of nitrate-N leached and 48 explanatory variables representative of nitrogen inputs and outputs, transport, attenuation of nitrogen and farm management practices. The MLR model (R² = 0.81) showed that nitrate-N leaching losses were greater at higher urine application rates and when there was more drainage from rainfall and irrigation. On the other hand, nitrate leaching decreased when nitrification inhibitors (e.g. dicyandiamide (DCD)) were applied. Predicted nitrate-N leaching losses at the paddock scale were calculated using the MLR equation, and they varied largely depending on the urine application rate and urine patch coverage. PMID:26498804
Hao, Lingxin
2007-01-01
Quantile Regression, the first book of Hao and Naiman's two-book series, establishes the seldom recognized link between inequality studies and quantile regression models. Though separate methodological literature exists for each subject, the authors seek to explore the natural connections between this increasingly sought-after tool and research topics in the social sciences. Quantile regression as a method does not rely on assumptions as restrictive as those for the classical linear regression; though more traditional models such as least squares linear regression are more widely utilized, Hao
DIOPTER REGRESSION ANALYSIS OF LASER IN SITU KERATOMILEUSIS IN THE TREATMENT OF MYOPIA
Institute of Scientific and Technical Information of China (English)
廉井财; 张琼; 叶纹; 周德佑; 王康孙
2003-01-01
Objective: To evaluate the relevant factors of the regression phenomenon of laser in situ keratomileusis (LASIK) for treatment of myopia. Methods: We studied 408 eyes of 250 myopic patients who received LASIK. Patients were divided into 2 groups according to preoperative diopters (-6.00D~-10.00D, 194 eyes; -10.10D~-15.00D, 214 eyes). The mean follow-up period was 12 months and the results were statistically analyzed. Results: 12 months after surgery, in the first group (-6.00D~-10.00D) regression equal to or beyond -1.0D occurred in 21 eyes (10.8%), range from 1.0D to 3.0D. The average regression was 1.33D. In the second group (-10.10D~-15.00D) regression equal to or beyond -1.0D occurred in 78 eyes (36.5%), range from 1.0D to -5.50D. The average regression was 1.99D. Conclusion: The results indicate that excimer LASIK can be used to treat myopia between -6.00D~-15.00D effectively with minimal regression within 12 months. Preoperative thin corneas with an intraoperative small ablation zone could induce regression. Some modification of the surgical algorithms and laser nomogram will help to improve predictability and reduce regression.
Directory of Open Access Journals (Sweden)
Anwar Fitrianto
2014-01-01
When independent variables are highly linearly correlated in a multiple linear regression model, the analysis can be misleading. This happens when the multiple linear regression analysis is based on the common Ordinary Least Squares (OLS) method. In this situation, the ridge regression estimator is suggested. We conduct a simulation study to compare the performance of the ridge regression estimator and OLS. We found that the Hoerl and Kennard ridge regression estimation method performs better than the other approaches.
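The contrast between OLS and ridge regression under multicollinearity can be sketched for the two-predictor case, where the estimator has a closed form. This is an illustration only: the data are invented, the function name is hypothetical, the predictors and response are assumed to be centered (so no intercept is fitted), and the ridge constant `k = 1.0` is an arbitrary choice rather than the Hoerl and Kennard rule studied in the paper.

```python
def ridge_2d(x1, x2, y, k):
    """Ridge coefficients for two centered predictors; k = 0 reduces to OLS.

    Solves (X'X + k I) beta = X'y using the explicit 2x2 inverse.
    """
    a = sum(v * v for v in x1) + k
    d = sum(v * v for v in x2) + k
    b = sum(u * v for u, v in zip(x1, x2))
    g1 = sum(u * v for u, v in zip(x1, y))
    g2 = sum(u * v for u, v in zip(x2, y))
    det = a * d - b * b
    return ((d * g1 - b * g2) / det, (a * g2 - b * g1) / det)


# Two nearly collinear, centered predictors and a centered response (invented).
x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x2 = [-2.1, -0.9, 0.1, 0.9, 2.0]
y = [-4.0, -2.1, 0.3, 1.8, 4.0]

ols = ridge_2d(x1, x2, y, 0.0)
ridge = ridge_2d(x1, x2, y, 1.0)  # k = 1.0 is an arbitrary illustrative value
print("OLS:  ", ols)
print("ridge:", ridge)
```

Adding `k` to the diagonal moves the determinant away from zero, which is what stabilizes the coefficients when the two predictors carry almost the same information; the price is some bias, so the choice of `k` matters.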
Energy Technology Data Exchange (ETDEWEB)
Clegg, Samuel M [Los Alamos National Laboratory; Barefield, James E [Los Alamos National Laboratory; Wiens, Roger C [Los Alamos National Laboratory; Dyar, Melinda D [MT HOLYOKE COLLEGE; Schafer, Martha W [LSU; Tucker, Jonathan M [MT HOLYOKE COLLEGE
2008-01-01
The ChemCam instrument on the Mars Science Laboratory (MSL) will include a laser-induced breakdown spectrometer (LIBS) to quantify major and minor elemental compositions. The traditional analytical chemistry approach to calibration curves for these data regresses a single diagnostic peak area against concentration for each element. This approach contrasts with a new multivariate method in which elemental concentrations are predicted by step-wise multiple regression analysis based on areas of a specific set of diagnostic peaks for each element. The method is tested on LIBS data from igneous and metamorphosed rocks. Between 4 and 13 partial regression coefficients are needed to describe each elemental abundance accurately (i.e., with a regression line of R² > 0.9995 for the relationship between predicted and measured elemental concentration) for all major and minor elements studied. Validation plots suggest that the method is limited at present by the small data set, and will work best for prediction of concentration when a wide variety of compositions and rock types has been analyzed.
Demenais, F M; Laing, A E; Bonney, G E
1992-01-01
Segregation analysis of discrete traits can be conducted by the classical mixed model and the recently introduced regressive models. The mixed model assumes an underlying liability to the disease, to which a major gene, a multifactorial component, and random environment contribute independently. Affected persons have a liability exceeding a threshold. The regressive logistic models assume that the logarithm of the odds of being affected is a linear function of major genotype effects, the phenotypes of older relatives, and other covariates. A formulation of the regressive models, based on an underlying liability model, has been recently proposed. The regression coefficients on antecedents are expressed in terms of the relevant familial correlations and a one-to-one correspondence with the parameters of the mixed model can thus be established. Computer simulations are conducted to evaluate the fit of the two formulations of the regressive models to the mixed model on nuclear families. The two forms of the class D regressive model provide a good fit to a generated mixed model, in terms of both hypothesis testing and parameter estimation. The simpler class A regressive model, which assumes that the outcomes of children depend solely on the outcomes of parents, is not robust against a sib-sib correlation exceeding that specified by the model, emphasizing testing class A against class D. The studies reported here show that if the true state of nature is that described by the mixed model, then a regressive model will do just as well. Moreover, the regressive models, allowing for more patterns of family dependence, provide a flexible framework to understand gene-environment interactions in complex diseases. PMID:1487139
Haghighi, Mona; Johnson, Suzanne Bennett; Qian, Xiaoning; Lynch, Kristian F; Vehik, Kendra; Huang, Shuai
2016-01-01
Regression models are extensively used in many epidemiological studies to understand the linkage between specific outcomes of interest and their risk factors. However, regression models in general examine the average effects of the risk factors and ignore subgroups with different risk profiles. As a result, interventions are often geared towards the average member of the population, without consideration of the special health needs of different subgroups within the population. This paper demonstrates the value of using rule-based analysis methods that can identify subgroups with heterogeneous risk profiles in a population without imposing assumptions on the subgroups or method. The rules define the risk pattern of subsets of individuals by not only considering the interactions between the risk factors but also their ranges. We compared the rule-based analysis results with the results from a logistic regression model in The Environmental Determinants of Diabetes in the Young (TEDDY) study. Both methods detected a similar suite of risk factors, but the rule-based analysis was superior at detecting multiple interactions between the risk factors that characterize the subgroups. A further investigation of the particular characteristics of each subgroup may detect the special health needs of the subgroup and lead to tailored interventions.
Dai, Wensheng; Wu, Jui-Yu; Lu, Chi-Jie
2014-01-01
Sales forecasting is one of the most important issues in managing information technology (IT) chain store sales since an IT chain store has many branches. Integrating a feature extraction method with a prediction tool, such as support vector regression (SVR), is a useful method for constructing an effective sales forecasting scheme. Independent component analysis (ICA) is a novel feature extraction technique and has been widely applied to deal with various forecasting problems. But, up to now, only the basic ICA method (i.e., temporal ICA model) has been applied to the sales forecasting problem. In this paper, we utilize three different ICA methods including spatial ICA (sICA), temporal ICA (tICA), and spatiotemporal ICA (stICA) to extract features from the sales data and compare their performance in sales forecasting of an IT chain store. Experimental results from real sales data show that the sales forecasting scheme integrating stICA and SVR outperforms the comparison models in terms of forecasting error. The stICA is a promising tool for extracting effective features from branch sales data and the extracted features can improve the prediction performance of SVR for sales forecasting.
A Vehicle Traveling Time Prediction Method Based on Grey Theory and Linear Regression Analysis
Institute of Scientific and Technical Information of China (English)
TU Jun; LI Yan-ming; LIU Cheng-liang
2009-01-01
Vehicle traveling time prediction is an important part of research on intelligent transportation systems. There are already various methods for vehicle traveling time prediction, but few consider both time and space. In this paper, a vehicle traveling time prediction method based on grey theory (GT) and linear regression analysis (LRA) is presented. In the time dimension, we use the historical data sequence of bus speed on a certain road to predict the future bus speed on that road by GT. In the space dimension, we calculate the traffic affecting factors between various roads by LRA. Using these factors we can predict the vehicle's speed on the next road if the vehicle's speed on the current road is known. Finally, we use the time factor and the space factor as the weighting factors of the two results predicted by GT and LRA respectively to find the final result, thus calculating the vehicle's traveling time. The method also considers such factors as dwell time, thus making the prediction more accurate.
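The grey-theory forecasting step can be sketched with the GM(1,1) model, a common grey prediction model. The abstract does not say which grey model the authors use, so GM(1,1) is an assumption here, as are the function name and the speed series, which is invented. The model accumulates the series, fits a straight line between each observation and the mean of consecutive accumulated values, and extrapolates with an exponential.

```python
import math


def gm11_forecast(series, steps=1):
    """Forecast the next `steps` values of a positive series with GM(1,1).

    Assumes the series is not constant (the development coefficient a != 0).
    """
    n = len(series)
    # Accumulated generating operation (running sum).
    x1, total = [], 0.0
    for v in series:
        total += v
        x1.append(total)
    # Background values: means of consecutive accumulated values.
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]
    y = series[1:]
    # Least-squares fit of y = -a * z + b.
    mz, my = sum(z) / len(z), sum(y) / len(y)
    szz = sum((zi - mz) ** 2 for zi in z)
    szy = sum((zi - mz) * (yi - my) for zi, yi in zip(z, y))
    fitted_slope = szy / szz
    a = -fitted_slope
    b = my - fitted_slope * mz

    def x1_hat(i):
        """Fitted accumulated value at zero-based index i."""
        return (series[0] - b / a) * math.exp(-a * i) + b / a

    # Difference the accumulated forecast to get back to the original scale.
    return [x1_hat(n - 1 + s) - x1_hat(n - 2 + s) for s in range(1, steps + 1)]


# Invented speed history growing 10% per step; the true next value would be 161.051.
history = [100.0, 110.0, 121.0, 133.1, 146.41]
print(gm11_forecast(history, steps=1))
```

In the scheme the abstract describes, a forecast like this would then be blended with the regression-based estimate from neighbouring roads using the time and space weighting factors.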
Variable Selection for Functional Logistic Regression in fMRI Data Analysis
Directory of Open Access Journals (Sweden)
Nedret BILLOR
2015-03-01
This study was motivated by a classification problem in Functional Magnetic Resonance Imaging (fMRI), a noninvasive imaging technique which allows an experimenter to take images of a subject's brain over time. As fMRI studies usually have a small number of subjects and we assume that there is a smooth, underlying curve describing the observations in fMRI data, this results in incredibly high-dimensional datasets that are functional in nature. High dimensionality is one of the biggest problems in statistical analysis of fMRI data. There is also a need for the development of better classification methods. One of the best things about the fMRI technique is its noninvasiveness. If statistical classification methods are improved, it could aid the advancement of noninvasive diagnostic techniques for mental illness or even degenerative diseases such as Alzheimer's. In this paper, we develop a variable selection technique, which tackles high dimensionality and correlation problems in fMRI data, based on L1 regularization-group lasso for the functional logistic regression model where the response is binary and represents two separate classes; the predictors are functional. We assess our method with a simulation study and an application to a real fMRI dataset.
Kirsanov, Dmitry; Panchuk, Vitaly; Goydenko, Alexander; Khaydukova, Maria; Semenov, Valentin; Legin, Andrey
2015-11-01
This study addresses the problem of simultaneous quantitative analysis of six lanthanides (Ce, Pr, Nd, Sm, Eu, Gd) in mixed solutions by two different X-ray fluorescence techniques: energy-dispersive (EDX) and total reflection (TXRF). Concentration of each lanthanide was varied in the range 10⁻⁶ to 10⁻³ mol/L, low values being around the detection limit of the method. This resulted in XRF spectra with very poor signal to noise ratio and overlapping bands in case of EDX, while only the latter problem was observed for TXRF. It was shown that ordinary least squares approach in numerical calibration fails to provide for reasonable precision in quantification of individual lanthanides. Partial least squares (PLS) regression was able to circumvent spectral inferiorities and yielded adequate calibration models for both techniques with RMSEP (root mean squared error of prediction) values around 10⁻⁵ mol/L. It was demonstrated that comparatively simple and inexpensive EDX method is capable of ensuring the similar precision to more sophisticated TXRF, when the spectra are treated by PLS.
Shayan, Zahra; Mezerji, Naser Mohammad Gholi; Shayan, Leila; Naseri, Parisa
2016-01-01
Background: Logistic regression (LR) and linear discriminant analysis (LDA) are two popular statistical models for prediction of group membership. Although they are very similar, the LDA makes more assumptions about the data. When categorical and continuous variables are used simultaneously, the optimal choice between the two models is questionable. In most studies, classification error (CE) is used to discriminate between subjects in several groups, but this index is not suitable to predict the accuracy of the outcome. The present study compared LR and LDA models using classification indices. Methods: This cross-sectional study selected 243 cancer patients. Sample sets of different sizes (n = 50, 100, 150, 200, 220) were randomly selected and the CE, B, and Q classification indices were calculated by the LR and LDA models. Results: CE revealed a lack of superiority for one model over the other, but the results showed that LR performed better than LDA for the B and Q indices in all situations. No significant effect for sample size on CE was noted for selection of an optimal model. Assessment of the accuracy of prediction of real data indicated that the B and Q indices are appropriate for selection of an optimal model. Conclusion: The results of this study showed that LR performs better in some cases and LDA in others when based on CE. The CE index is not appropriate for classification, although the B and Q indices performed better and offered more efficient criteria for comparison and discrimination between groups.
Directory of Open Access Journals (Sweden)
Juan Merlo
Many multilevel logistic regression analyses of "neighbourhood and health" focus on interpreting measures of association (e.g., odds ratio, OR). In contrast, multilevel analysis of variance is rarely considered. We propose an original stepwise analytical approach that distinguishes between "specific" (measures of association) and "general" (measures of variance) contextual effects. Performing two empirical examples, we illustrate the methodology, interpret the results and discuss the implications of this kind of analysis in public health. We analyse 43,291 individuals residing in 218 neighbourhoods in the city of Malmö, Sweden in 2006. We study two individual outcomes (psychotropic drug use and choice of private vs. public general practitioner, GP) for which the relative importance of neighbourhood as a source of individual variation differs substantially. In Step 1 of the analysis, we evaluate the OR and the area under the receiver operating characteristic curve (AUC) for individual-level covariates (i.e., age, sex and individual low income). In Step 2, we assess general contextual effects using the AUC. Finally, in Step 3 the OR for a specific neighbourhood characteristic (i.e., neighbourhood income) is interpreted jointly with the proportional change in variance (i.e., PCV) and the proportion of ORs in the opposite direction (POOR) statistics. For both outcomes, information on individual characteristics (Step 1) provided a low discriminatory accuracy (AUC = 0.616 for psychotropic drugs; = 0.600 for choosing a private GP). Accounting for neighbourhood of residence (Step 2) only improved the AUC for choosing a private GP (+0.295 units). High neighbourhood income (Step 3) was strongly associated with choosing a private GP (OR = 3.50), but the PCV was only 11% and the POOR 33%. Applying an innovative stepwise multilevel analysis, we observed that, in Malmö, the neighbourhood context per se had a negligible influence on individual use of psychotropic drugs, but
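The two "general effect" statistics named in this abstract can be illustrated with a minimal Python sketch. It assumes the usual definitions: the AUC as the probability that a randomly chosen case receives a higher predicted probability than a randomly chosen non-case, and the PCV as the percentage drop in neighbourhood-level variance between two models. The function names and the toy numbers are hypothetical and are not taken from the study.

```python
def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison (ties count one half)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def pcv(variance_before, variance_after):
    """Proportional change in variance, in percent, after adding a covariate."""
    return 100.0 * (variance_before - variance_after) / variance_before


# Hypothetical predicted probabilities for four individuals (two cases).
toy_auc = auc([0.10, 0.40, 0.35, 0.80], [0, 0, 1, 1])
# Hypothetical neighbourhood variances before and after adjustment.
toy_pcv = pcv(1.00, 0.89)
```

An AUC near 0.5 means the model discriminates no better than chance, which is why a large OR can coexist with a small gain in discriminatory accuracy.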
Directory of Open Access Journals (Sweden)
Lançon Christophe
2006-07-01
Background Data comparing duloxetine with existing antidepressant treatments are limited. A comparison of duloxetine with fluoxetine has been performed, but no comparison with venlafaxine, the other antidepressant in the same therapeutic class with a significant market share, has been undertaken. In the absence of relevant data to assess the place that duloxetine should occupy in the therapeutic arsenal, indirect comparisons are the most rigorous way to go. We conducted a systematic review of the efficacy of duloxetine, fluoxetine and venlafaxine versus placebo in the treatment of Major Depressive Disorder (MDD), and performed indirect comparisons through meta-regressions. Methods The bibliography of the Agency for Health Care Policy and Research and the CENTRAL, Medline, and Embase databases were interrogated using advanced search strategies based on a combination of text and index terms. The search focused on randomized placebo-controlled clinical trials involving adult patients treated for acute phase Major Depressive Disorder. All outcomes were derived so as to account for varying placebo responses across studies. The primary outcome was treatment efficacy as measured by Hedges' g effect size. Secondary outcomes were response and dropout rates as measured by log odds ratios. Meta-regressions were run to indirectly compare the drugs. Sensitivity analyses assessing the influence of individual studies on the results and the influence of patients' characteristics were run. Results 22 studies involving fluoxetine, 9 involving duloxetine and 8 involving venlafaxine were selected. Using indirect comparison methodology, estimated effect sizes for efficacy compared with duloxetine were 0.11 [-0.14;0.36] for fluoxetine and 0.22 [0.06;0.38] for venlafaxine. Response log odds ratios were -0.21 [-0.44;0.03], 0.70 [0.26;1.14]. Dropout log odds ratios were -0.02 [-0.33;0.29], 0.21 [-0.13;0.55]. Sensitivity analyses showed that results were
Gizaw, Mesgana Seyoum; Gan, Thian Yew
2016-07-01
Regional Flood Frequency Analysis (RFFA) is a statistical method widely used to estimate flood quantiles of catchments with limited streamflow data. In addition, when estimating the flood quantile of ungauged sites, only a limited number of stations with complete datasets may be available from hydrologically similar, surrounding catchments. Besides traditional regression-based RFFA methods, recent applications of machine learning algorithms such as the artificial neural network (ANN) have shown encouraging results in regional flood quantile estimation. Another novel machine learning technique that is becoming widely applicable in the hydrologic community is Support Vector Regression (SVR). In this study, an RFFA model based on SVR was developed to estimate regional flood quantiles for two study areas, one with 26 catchments located in southeastern British Columbia (BC) and another with 23 catchments located in southern Ontario (ON), Canada. The SVR-RFFA model for both study sites was developed from 13 sets of physiographic and climatic predictors for the historical period. The Ef (Nash-Sutcliffe coefficient) and R2 of the SVR-RFFA model were about 0.7 when estimating flood quantiles of 10, 25, 50 and 100 year return periods, which indicates satisfactory model performance in both study areas. In addition, the SVR-RFFA model also performed well based on other goodness-of-fit statistics such as BIAS (mean bias) and BIASr (relative BIAS). When the amount of data available for training RFFA models is limited, the SVR-RFFA model was found to perform better than an ANN-based RFFA model, with a significantly lower median CV (coefficient of variation) of the estimated flood quantiles. The SVR-RFFA model was then used to project changes in flood quantiles over the two study areas under the impact of climate change using the RCP4.5 and RCP8.5 climate projections of five Coupled Model Intercomparison Project (CMIP5) GCMs (Global Climate Models) for the 2041
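As a brief illustration of the goodness-of-fit statistics mentioned in this abstract, the Python sketch below computes the Nash-Sutcliffe coefficient in its standard form, Ef = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2). The mean bias is taken here as the average of simulated minus observed values, which is an assumed sign convention because the abstract does not define it. The quantile values are hypothetical.

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / spread


def mean_bias(observed, simulated):
    """Average of (simulated - observed); the sign convention is an assumption."""
    return sum(s - o for o, s in zip(observed, simulated)) / len(observed)


# Hypothetical flood quantiles (observed vs. model-estimated) for four catchments.
obs = [1.0, 2.0, 3.0, 4.0]
sim = [1.5, 2.5, 3.5, 4.5]
ef = nash_sutcliffe(obs, sim)
bias = mean_bias(obs, sim)
```

A model that always predicts the observed mean scores Ef = 0, so values around 0.7 indicate a clear improvement over that baseline.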
Directory of Open Access Journals (Sweden)
Cecchini Diego M
2009-11-01
Background The central nervous system is considered a sanctuary site for HIV-1 replication. Variables associated with HIV cerebrospinal fluid (CSF) viral load in the context of opportunistic CNS infections are poorly understood. Our objective was to evaluate the relation between: (1) CSF HIV-1 viral load and CSF cytological and biochemical characteristics (leukocyte count, protein concentration, cryptococcal antigen titer); (2) CSF HIV-1 viral load and HIV-1 plasma viral load; and (3) CSF leukocyte count and the peripheral blood CD4+ T lymphocyte count. Methods We prospectively collected and analyzed pre-treatment, paired CSF and plasma samples from antiretroviral-naive HIV-positive patients with cryptococcal meningitis treated at the Francisco J Muñiz Hospital, Buenos Aires, Argentina (period: 2004 to 2006). We measured HIV CSF and plasma levels by polymerase chain reaction using the Cobas Amplicor HIV-1 Monitor Test version 1.5 (Roche). Data were processed with Statistix 7.0 software (linear regression analysis). Results Samples from 34 patients were analyzed. CSF leukocyte count showed a statistically significant correlation with CSF HIV-1 viral load (r = 0.4, 95% CI = 0.13-0.63, p = 0.01). No correlation was found with the plasma viral load, CSF protein concentration or cryptococcal antigen titer. A positive correlation was found between peripheral blood CD4+ T lymphocyte count and the CSF leukocyte count (r = 0.44, 95% CI = 0.125-0.674, p = 0.0123). Conclusion Our study suggests that CSF leukocyte count influences CSF HIV-1 viral load in patients with meningitis caused by Cryptococcus neoformans.
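Correlation coefficients with confidence intervals, as reported in this abstract, are commonly obtained with the Fisher z-transformation. The Python sketch below shows that approach under the assumption of bivariate normality. It is not claimed to be the procedure used by Statistix, and it is not expected to reproduce the reported intervals exactly.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% CI for r via Fisher's z (standard error 1/sqrt(n - 3))."""
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)


# Interval for a correlation of 0.44 observed in 34 paired samples.
low, high = fisher_ci(0.44, 34)
```

With only 34 pairs the interval is wide, so a coefficient near 0.4 is compatible with anything from a weak to a fairly strong association.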
Determinants for changing the treatment of COPD: a regression analysis from a clinical audit
López-Campos, Jose Luis; Abad Arranz, María; Calero Acuña, Carmen; Romero Valero, Fernando; Ayerbe García, Ruth; Hidalgo Molina, Antonio; Aguilar Perez-Grovas, Ricardo I; García Gil, Francisco; Casas Maldonado, Francisco; Caballero Ballesteros, Laura; Sánchez Palop, María; Pérez-Tejero, Dolores; Segado, Alejandro; Calvo Bonachera, Jose; Hernández Sierra, Bárbara; Doménech, Adolfo; Arroyo Varela, Macarena; González Vargas, Francisco; Cruz Rueda, Juan J
2016-01-01
Introduction This study is an analysis of a pilot COPD clinical audit that evaluated adherence to guidelines for patients with COPD in a stable disease phase during a routine visit in specialized secondary care outpatient clinics in order to identify the variables associated with the decision to step up or step down pharmacological treatment. Methods This study was a pilot clinical audit performed at hospital outpatient respiratory clinics in the region of Andalusia, Spain (eight provinces with over eight million inhabitants), in which 20% of centers in the area (catchment population 3,143,086 inhabitants) were invited to participate. Treatment changes were evaluated in terms of the number of prescribed medications and were classified as step-up, step-down, or no change. Three backward stepwise binomial multivariate logistic regression analyses were conducted to evaluate variables associated with stepping up, stepping down, and discontinuation of inhaled corticosteroids. Results The present analysis evaluated 565 clinical records (91%) of the complete audit. Of those records, 366 (64.8%) cases saw no change in pharmacological treatment, while 99 patients (17.5%) had an increase in the number of drugs, 55 (9.7%) had a decrease in the number of drugs, and 45 (8.0%) noted a change to other medication for a similar therapeutic scheme. Exacerbations were the main factor in stepping up treatment, as were the symptoms themselves. In contrast, rather than symptoms, doctors used forced expiratory volume in 1 second and previous treatment with long-term antibiotics or inhaled corticosteroids as the key determinants for stepping down treatment. Conclusion The majority of doctors did not change the prescription. When changes were made, a number of related factors were noted. Future trials must evaluate whether these therapeutic changes impact clinically relevant outcomes at follow-up. PMID:27330285
Azadi, Sama; Karimi-Jashni, Ayoub
2016-02-01
Predicting the mass of solid waste generation plays an important role in integrated solid waste management plans. In this study, the performance of two predictive models, Artificial Neural Network (ANN) and Multiple Linear Regression (MLR) was verified to predict mean Seasonal Municipal Solid Waste Generation (SMSWG) rate. The accuracy of the proposed models is illustrated through a case study of 20 cities located in Fars Province, Iran. Four performance measures, MAE, MAPE, RMSE and R were used to evaluate the performance of these models. The MLR, as a conventional model, showed poor prediction performance. On the other hand, the results indicated that the ANN model, as a non-linear model, has a higher predictive accuracy when it comes to prediction of the mean SMSWG rate. As a result, in order to develop a more cost-effective strategy for waste management in the future, the ANN model could be used to predict the mean SMSWG rate.
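The four performance measures named in this abstract have standard definitions, sketched below in Python. The abstract does not give its formulas, so the usual textbook forms are assumed here (MAPE expressed in percent, R as the Pearson correlation between observed and predicted values). The waste-generation figures are hypothetical.

```python
import math


def mae(observed, predicted):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def mape(observed, predicted):
    """Mean absolute percentage error, in percent (observed values must be nonzero)."""
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def correlation(observed, predicted):
    """Pearson correlation coefficient R between observed and predicted values."""
    n = len(observed)
    mo = sum(observed) / n
    mp = sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


# Hypothetical seasonal waste generation (tonnes) for three cities.
obs = [100.0, 200.0, 300.0]
pred = [110.0, 190.0, 330.0]
scores = (mae(obs, pred), mape(obs, pred), rmse(obs, pred), correlation(obs, pred))
```

RMSE penalizes the single large miss (30 tonnes) more heavily than MAE does, which is one reason studies of this kind report several measures side by side.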
Walker, Berkley J; Skabelund, Dane C; Busch, Florian A; Ort, Donald R
2016-06-01
Biochemical models of leaf photosynthesis, which are essential for understanding the response of photosynthesis to changing environments, depend on accurate parameterizations. One such parameter, the photorespiratory CO2 compensation point, can be measured from the intersection of several CO2 response curves measured under sub-saturating illumination. However, determining the actual intersection while accounting for experimental noise can be challenging. Additionally, leaf photosynthesis model outcomes are sensitive to the diffusion paths of CO2 released from the mitochondria. This diffusion path of CO2 includes both chloroplastic and cell wall resistances to CO2, which are not readily measurable. Both the difficulty of determining the photorespiratory CO2 compensation point and the impact of multiple intercellular resistances to CO2 can be addressed through the application of slope-intercept regression. This technical report summarizes an improved framework for implementing slope-intercept regression to evaluate measurements of the photorespiratory CO2 compensation point. This approach extends past work to include the cases of both Rubisco-limited and ribulose-1,5-bisphosphate (RuBP)-limited photosynthesis. This report further presents two interactive graphical applications and a spreadsheet-based tool to allow users to apply slope-intercept theory to their data. PMID:27103099
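The geometric idea behind slope-intercept regression can be sketched in Python as follows. Assume each CO2 response curve is linear over the measured range and that all curves pass through one common point (x0, y0). A line with slope m through that point has intercept b = y0 - m * x0, so regressing the fitted intercepts on the fitted slopes recovers -x0 as the slope and y0 as the intercept, without locating the intersection by eye. The symbols, function names and synthetic data are assumptions for illustration; the sketch does not reproduce the report's tools or its Rubisco-limited and RuBP-limited extensions.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def common_intersection(curves):
    """Estimate the shared intersection (x0, y0) of several near-linear curves.

    curves: list of (co2_values, assimilation_values), one pair per light level.
    """
    slopes, intercepts = [], []
    for co2, assimilation in curves:
        m, b = fit_line(co2, assimilation)
        slopes.append(m)
        intercepts.append(b)
    # Second-stage regression: intercept = y0 - x0 * slope.
    k, c = fit_line(slopes, intercepts)
    return -k, c


# Synthetic curves through a hypothetical common point (40.0, -1.0).
co2 = [30.0, 50.0, 70.0, 90.0]
curves = [(co2, [m * (c - 40.0) - 1.0 for c in co2]) for m in (0.05, 0.10, 0.15)]
x0, y0 = common_intersection(curves)
```

In this framing x0 plays the role of the compensation point and -y0 the CO2 release rate at that point; with noisy data the second-stage fit averages over all curves instead of relying on pairwise crossings.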
Analysis of Functional Data with Focus on Multinomial Regression and Multilevel Data
DEFF Research Database (Denmark)
Mousavi, Seyed Nourollah
to analyze functional data with a categorical response (more than two classes) and a functional predictor. To this end, a combination of discrete wavelet transform and LASSO penalization is considered. This model is applied to two datasets, one regarding lameness detection in horses and another regarding......, functional penalized regression and function regression using functional principal components. The comparison is based on a simulation study and data application. In the third paper, we study a constrained version of function-on-function regression, in which both response and predictor are defined on the same domain...... functional regression model to analyze functional data with a categorical response (more than two classes) and a functional predictor. To this end, a combination of discrete wavelet transform and LASSO penalization is considered. This model is applied to two datasets, one regarding lameness detection...
Regression models for air pollution and daily mortality: analysis of data from Birmingham, Alabama
Energy Technology Data Exchange (ETDEWEB)
Smith, R.L. [University of North Carolina, Chapel Hill, NC (United States). Dept. of Statistics; Davis, J.M. [North Carolina State University, Raleigh, NC (United States). Dept. of Marine, Earth and Atmospheric Sciences; Sacks, J. [National Institute of Statistical Sciences, Research Triangle Park, NC (United States); Speckman, P. [University of Missouri, Columbia, MO (United States). Dept. of Statistics; Styer, P.
2000-11-01
In recent years, a very large literature has built up on the human health effects of air pollution. Many studies have been based on time series analyses in which daily mortality counts, or some other measure such as hospital admissions, have been decomposed through regression analysis into contributions based on long-term trend and seasonality, meteorological effects, and air pollution. There has been a particular focus on particulate air pollution represented by PM10 (particulate matter of aerodynamic diameter 10 μm or less), though in recent years more attention has been given to very small particles of diameter 2.5 μm or less. Most of the existing data studies, however, are based on PM10 because of the wide availability of monitoring data for this variable. The persistence of the resulting effects across many different studies is widely cited as evidence that this is not mere statistical association, but indeed establishes a causal relationship. These studies have been cited by the United States Environmental Protection Agency (USEPA) as justification for a tightening on particulate matter standards in the 1997 revision of the National Ambient Air Quality Standard (NAAQS), which is the basis for air pollution regulation in the United States. The purpose of the present paper is to propose a systematic approach to the regression analyses that are central to this kind of research. We argue that the results may depend on a number of ad hoc features of the analysis, including which meteorological variables to adjust for, and the manner in which different lagged values of particulate matter are combined into a single 'exposure measure'. We also examine the question of whether the effects are linear or nonlinear, with particular attention to the possibility of a 'threshold effect', i.e. that significant effects occur only above some threshold. These points are illustrated with a data set from Birmingham, Alabama, first cited by
Kondo, Yumi; Zhao, Yinshan; Petkau, John
2015-06-15
We develop a new modeling approach to enhance a recently proposed method to detect increases of contrast-enhancing lesions (CELs) on repeated magnetic resonance imaging, which have been used as an indicator for potential adverse events in multiple sclerosis clinical trials. The method signals patients with unusual increases in CEL activity by estimating the probability of observing CEL counts as large as those observed on a patient's recent scans conditional on the patient's CEL counts on previous scans. This conditional probability index (CPI), computed based on a mixed-effect negative binomial regression model, can vary substantially depending on the choice of distribution for the patient-specific random effects. Therefore, we relax this parametric assumption to model the random effects with an infinite mixture of beta distributions, using the Dirichlet process, which effectively allows any form of distribution. To our knowledge, no previous literature considers a mixed-effect regression for longitudinal count variables where the random effect is modeled with a Dirichlet process mixture. As our inference is in the Bayesian framework, we adopt a meta-analytic approach to develop an informative prior based on previous clinical trials. This is particularly helpful at the early stages of trials when less data are available. Our enhanced method is illustrated with CEL data from 10 previous multiple sclerosis clinical trials. Our simulation study shows that our procedure estimates the CPI more accurately than parametric alternatives when the patient-specific random effect distribution is misspecified and that an informative prior improves the accuracy of the CPI estimates. PMID:25784219
Variable precision rough set for multiple decision attribute analysis
Institute of Scientific and Technical Information of China (English)
Lai, Kin Keung
2008-01-01
A variable precision rough set (VPRS) model is used to solve the multi-attribute decision analysis (MADA) problem with multiple conflicting decision attributes and multiple condition attributes. By introducing confidence measures and a β-reduct, the VPRS model can rationally solve the conflicting decision analysis problem with multiple decision attributes and multiple condition attributes. For illustration, a medical diagnosis example is utilized to show the feasibility of the VPRS model in solving the MADA...
Sample- and segment-size specific Model Selection in Mixture Regression Analysis
Sarstedt, Marko
2006-01-01
As mixture regression models increasingly receive attention from both theory and practice, the question of selecting the correct number of segments gains urgency. A misspecification can lead to an under- or oversegmentation, thus resulting in flawed management decisions on customer targeting or product positioning. This paper presents the results of an extensive simulation study that examines the performance of commonly used information criteria in a mixture regression context with normal ...
A two-stage productivity analysis using bootstrapped Malmquist index and quantile regression
Kaditi, Eleni A.; Nitsi, Elisavet I.
2009-01-01
This paper examines the effects of farm characteristics and government policies in enhancing productivity growth for a sample of Greek farms, using a two-stage procedure. In the 1st-stage, non-parametric estimates of Malmquist index and its decompositions are computed, while a bootstrapping procedure is applied to provide their statistical precision. In the 2nd-stage, the productivity growth estimates are regressed on various covariates using a bootstrapped quantile regression approach. The e...
Autoencoder, Principal Component Analysis and Support Vector Regression for Data Imputation
Marivate, Vukosi N.; Nelwamodo, Fulufhelo V.; Marwala, Tshilidzi
2007-01-01
Data collection often results in records that have missing values or variables. This investigation compares 3 different data imputation models and identifies their merits by using accuracy measures. Autoencoder Neural Networks, Principal components and Support Vector regression are used for prediction and combined with a genetic algorithm to then impute missing variables. The use of PCA improves the overall performance of the autoencoder network while the use of support vector regression show...
Institute of Scientific and Technical Information of China (English)
LI XinTian; TIAN Hui; CAI GuoBiao
2013-01-01
This paper presents three-dimensional numerical simulations of the hybrid rocket motor with the hydrogen peroxide (HP) and hydroxyl terminated polybutadiene (HTPB) propellant combination and investigates the fuel regression rate distribution characteristics of different fuel types. The numerical models are established to couple the Navier-Stokes equations with turbulence, chemical reactions, solid fuel pyrolysis and solid-gas interfacial boundary conditions. Simulation results including the temperature contours and fuel regression rate distributions are presented for the tube, star and wagon wheel grains. The results demonstrate that the trends of the regression rate along the axis are similar for all fuel types: the rate decreases sharply near the leading edge of the fuel and then gradually increases with increasing axial location. The regression rates of the star and wagon wheel grains show apparent three-dimensional characteristics, and they are higher in the regions of the fuel surface near the central core oxidizer flow. The average regression rates increase as the oxidizer mass fluxes rise for all of the fuel types. However, under the same oxidizer mass flux, the average regression rates of the star and wagon wheel grains are much larger than that of the tube grain due to their lower hydraulic diameters.
Survival Data and Regression Models
Grégoire, G.
2014-12-01
We start this chapter by introducing some basic elements for the analysis of censored survival data. Then we focus on right-censored data and develop two types of regression models. The first concerns the so-called accelerated failure time (AFT) models, which are parametric models where a function of a parameter depends linearly on the covariables. The second is a semiparametric model, where the covariables enter in a multiplicative form in the expression of the hazard rate function. The main statistical tool for analysing these regression models is the maximum likelihood methodology and, although we recall some essential results of ML theory, we refer to the chapter "Logistic Regression" for a more detailed presentation.
Directory of Open Access Journals (Sweden)
J. B. Alam, M. Jobair Bin Alam, M. M. Rahman, A. K. Dikshit, S. K. Khan
The study reports the level of traffic-induced noise pollution in Sylhet City. For this purpose, noise levels were measured at thirty-seven major locations of the city from 7 am to 11 pm on working days. At all locations, the noise level remained far above the acceptable limit at all times. The noise levels on the main roads near residential, hospital and educational areas were above the recommended level (65 dBA). The predictive equations showed 60-70% correlation with the measured noise levels. The study suggests that vulnerable institutions such as schools and hospitals should be located about 60 m away from the roadside unless special arrangements to attenuate sound are used.
Comparing Rough Set Theory with Multiple Regression Analysis as Automated Valuation Methodologies
Maurizio d’Amato
2007-01-01
This paper focuses on the problem of applying rough set theory to mass appraisal. This methodology was first introduced by a Polish mathematician, and has been applied recently as an automated valuation methodology by the author. The method allows the appraiser to estimate a property without defining econometric modeling, although it does not give any quantitative estimation of marginal prices. In a previous paper by the author, data were organized into classes prior to the valuation process,...
Pardeck, John T.
1991-01-01
Explored the effects of the family system on the potential for alcoholism in 209 college students. Findings showed that students' gender, race, and how often they consumed alcohol were unrelated to the potential for alcoholism; however, perceived conflict in the students' family of origin appeared to increase potential. (Author/PVV)
Analysis of the Turnover Evolution with the Help of Multiple Regression
Mioara TURCAS
2012-01-01
Turnover represents the sum of sales (less taxes) realized by the company in its current, normal activity. It is the sum of the sales of merchandise, manufactured products, services and products from auxiliary activities. Turnover is important to study because it reflects the volume of business generated by the current activity of the enterprise and thus allows the appreciation of size and market position, and offers information on the dynamics of the activity, development opportunities and the im...
Quantifying image distortion based on Gabor filter bank and multiple regression analysis
Ortiz-Jaramillo, B.; Garcia-Alvarez, J. C.; Führ, H.; Castellanos-Dominguez, G.; Philips, W.
2012-01-01
Image quality assessment is indispensable for image-based applications. The approaches to image quality assessment fall into two main categories: subjective and objective methods. Subjective assessment has been widely used. However, careful subjective assessments are experimentally difficult and lengthy, and the results obtained may vary depending on the test conditions. On the other hand, objective image quality assessment would not only alleviate the difficulties described above but would also help to expand the application field. Therefore, several works have been developed for quantifying the distortion present in an image, achieving goodness of fit between subjective and objective scores of up to 92%. Nevertheless, current methodologies are designed assuming that the nature of the distortion is known. Generally, this is a limiting assumption for practical applications, since in a majority of cases the distortions in the image are unknown. Therefore, we believe that the current methods of image quality assessment should be adapted in order to identify and quantify the distortion of images at the same time. That combination can improve processes such as enhancement, restoration, compression and transmission, among others. We present an approach based on the power of experimental design and the joint localization of Gabor filters for studying the influence of spatial frequencies on image quality assessment. With it, we achieve a correct identification and quantification of the distortion affecting images. This method provides accurate scores and differentiability between distortions.
Uy, Chin; Manalo, Ronaldo A.; Cabauatan, Ronaldo R.
2015-01-01
In the Philippines, students seeking admission to a university are usually required to meet certain entrance requirements, including passing the entrance examinations with questions on IQ and English, mathematics, and science. This paper aims to determine the factors that affect the performance of entrants into business programmes in high-stakes…
Vozinaki, Anthi Eirini K.; Karatzas, George P.; Sibetheros, Ioannis A.; Varouchakis, Emmanouil A.
2014-05-01
Damage curves are the most significant component of the flood loss estimation models. Their development is quite complex. Two types of damage curves exist, historical and synthetic curves. Historical curves are developed from historical loss data from actual flood events. However, due to the scarcity of historical data, synthetic damage curves can be alternatively developed. Synthetic curves rely on the analysis of expected damage under certain hypothetical flooding conditions. A synthetic approach was developed and presented in this work for the development of damage curves, which are subsequently used as the basic input to a flood loss estimation model. A questionnaire-based survey took place among practicing and research agronomists, in order to generate rural loss data based on the responders' loss estimates, for several flood condition scenarios. In addition, a similar questionnaire-based survey took place among building experts, i.e. civil engineers and architects, in order to generate loss data for the urban sector. By answering the questionnaire, the experts were in essence expressing their opinion on how damage to various crop types or building types is related to a range of values of flood inundation parameters, such as floodwater depth and velocity. However, the loss data compiled from the completed questionnaires were not sufficient for the construction of workable damage curves; to overcome this problem, a Weighted Monte Carlo method was implemented, in order to generate extra synthetic datasets with statistical properties identical to those of the questionnaire-based data. The data generated by the Weighted Monte Carlo method were processed via Logistic Regression techniques in order to develop accurate logistic damage curves for the rural and the urban sectors. A Python-based code was developed, which combines the Weighted Monte Carlo method and the Logistic Regression analysis into a single code (WMCLR Python code). Each WMCLR code execution
Zhang, Lu-da; Zhao, Li-li; Zhao, Long-lian; Li, Jun-hui; Yan, Yan-lu
2005-08-01
This paper introduces the principle and method by which quantitative analysis models for Fourier transform near-infrared (NIR) spectroscopy can be established using the MAXR regression procedure. The authors selected the wavelength information with a program written in Matlab in order to establish the NIR quantitative analysis models. Using sixty-six wheat samples as experimental material, quantitative analysis models for protein content were established with thirty-three samples. For the other thirty-three wheat samples, the correlation coefficients between the Kjeldahl protein values and the predictions of the two models, which include two and three wavelengths respectively, are 0.9771 and 0.9765, and the standard errors are 0.335 and 0.340. When selecting wavelength information, the MAXR regression procedure can establish the optimum regression models containing 1, 2, ..., or k wavelengths respectively. The MAXR regression procedure is useful for selecting the optimum wavelength information because of its short computation time; it not only selects the essential wavelengths for NIR quantitative analysis models that resist disturbance from multicollinearity, but also lays the groundwork for selecting optimum wavelengths that can guide the design of dedicated NIR instruments for analyzing a specific component in particular samples. PMID:16329486
Zhang, Yiwei; Pan, Wei
2015-03-01
Genome-wide association studies (GWAS) have been established as a major tool to identify genetic variants associated with complex traits, such as common diseases. However, GWAS may suffer from false positives and false negatives due to confounding population structures, including known or unknown relatedness. Another important issue is unmeasured environmental risk factors. Among many methods for adjusting for population structures, two approaches stand out: one is principal component regression (PCR) based on principal component analysis, which is perhaps the most popular due to its early appearance, simplicity, and general effectiveness; the other is based on a linear mixed model (LMM) that has emerged recently as perhaps the most flexible and effective, especially for samples with complex structures as in model organisms. As shown previously, the PCR approach can be regarded as an approximation to an LMM; such an approximation depends on the number of the top principal components (PCs) used, the choice of which is often difficult in practice. Hence, in the presence of population structure, the LMM appears to outperform the PCR method. However, due to the different treatments of fixed vs. random effects in the two approaches, we show an advantage of PCR over LMM: in the presence of an unknown but spatially confined environmental confounder (e.g., environmental pollution or lifestyle), the PCs may be able to implicitly and effectively adjust for the confounder whereas the LMM cannot. Accordingly, to adjust for both population structures and nongenetic confounders, we propose a hybrid method combining the use and, thus, strengths of PCR and LMM. We use real genotype data and simulated phenotypes to confirm the above points, and establish the superior performance of the hybrid method across all scenarios.
Directory of Open Access Journals (Sweden)
Milena Ilic
BACKGROUND: Limited data on mortality from malignant lymphatic and haematopoietic neoplasms have been published for Serbia. METHODS: The study covered the population of Serbia during the 1991-2010 period. Mortality trends were assessed using joinpoint regression analysis. RESULTS: The trend for overall death rates from malignant lymphoid and haematopoietic neoplasms decreased significantly, by 2.16% per year from 1991 through 1998, and then increased significantly, by 2.20% per year, over the 1998-2010 period. Growth over the entire period averaged +0.8% per year (95% CI 0.3 to 1.3). Mortality was higher among males than among females in all age groups. According to the comparability test, mortality trends from malignant lymphoid and haematopoietic neoplasms in men and women were parallel (the final selected model failed to reject parallelism, P = 0.232). Among the younger Serbian population (0-44 years old), trends declined significantly in males for the entire period, while in females 15-44 years of age mortality rates declined significantly only from 2003 onwards. The mortality trend increased significantly in the elderly of both sexes (by +1.7% in males and +1.5% in females in the 60-69 age group, and by +3.8% in males and +3.6% in females in the 70+ age group). According to the comparability test, the mortality trend for Hodgkin's lymphoma differed significantly from mortality trends for all other types of malignant lymphoid and haematopoietic neoplasms (P<0.05). CONCLUSION: The unfavourable mortality trend in Serbia requires targeted intervention for risk factor control, early diagnosis and modern therapy.
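The idea behind joinpoint regression, as used in this abstract, can be sketched in Python as a continuous piecewise log-linear trend with one change point. This is a simplified illustration and not the algorithm of any particular joinpoint software: it assumes exactly one joinpoint located at an observed year and finds it by grid search, whereas real analyses also test how many joinpoints are needed. The function names and the noiseless synthetic series (built from the two annual percent changes quoted above) are hypothetical.

```python
import math

def solve_linear(a, b):
    """Solve a small square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def apc_from_slope(slope):
    """Annual percent change implied by a slope on the log-rate scale."""
    return (math.exp(slope) - 1.0) * 100.0

def fit_one_joinpoint(years, rates):
    """Grid-search one joinpoint; return (joinpoint_year, apc_before, apc_after)."""
    y = [math.log(r) for r in rates]
    t0 = years[0]  # centre the time axis to keep the normal equations well conditioned
    best = None
    for knot in years[2:-2]:  # leave a few observations in each segment
        # Design row: intercept, time, extra slope after the joinpoint
        rows = [[1.0, float(t - t0), float(max(0, t - knot))] for t in years]
        xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
        xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
        beta = solve_linear(xtx, xty)
        sse = sum((yi - sum(bk * v for bk, v in zip(beta, r))) ** 2
                  for r, yi in zip(rows, y))
        if best is None or sse < best[0]:
            best = (sse, knot, beta)
    _, knot, (_, slope_before, slope_change) = best
    return knot, apc_from_slope(slope_before), apc_from_slope(slope_before + slope_change)

# Synthetic, noiseless rates: -2.16%/year up to 1998, then +2.20%/year
years = list(range(1991, 2011))
rates = []
for t in years:
    if t <= 1998:
        rates.append(10.0 * (1 - 0.0216) ** (t - 1991))
    else:
        rates.append(10.0 * (1 - 0.0216) ** 7 * 1.022 ** (t - 1998))

knot, apc_before, apc_after = fit_one_joinpoint(years, rates)
print(knot, round(apc_before, 2), round(apc_after, 2))
```

With real, noisy mortality rates the fitted joinpoint and slopes would carry uncertainty, which this sketch does not estimate.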
Regression analysis of time trends in perinatal mortality in Germany 1980-1993.
Scherb, H; Weigelt, E; Brüske-Hohlfeld, I
2000-02-01
Numerous investigations have been carried out on the possible impact of the Chernobyl accident on the prevalence of anomalies at birth and on perinatal mortality. In many cases the studies were aimed at the detection of differences of pregnancy outcome measurements between regions or time periods. Most authors conclude that there is no evidence of a detrimental physical effect on congenital anomalies or other outcomes of pregnancy following the accident. In this paper, we report on statistical analyses of time trends of perinatal mortality in Germany. Our main intention is to investigate whether perinatal mortality, as reflected in official records, was increased in 1987 as a possible effect of the Chernobyl accident. We show that, in Germany as a whole, there was a significantly elevated perinatal mortality proportion in 1987 as compared to the trend function. The increase is 4.8% (p = 0.0046) of the expected perinatal death proportion for 1987. Even more pronounced levels of 8.2% (p = 0.0458) and 8.5% (p = 0.0702) may be found in the higher contaminated areas of the former German Democratic Republic (GDR), including West Berlin, and of Bavaria, respectively. To investigate the impact of statistical models on results, we applied three standard regression techniques. The observed significant increase in 1987 is independent of the statistical model used. Stillbirth proportions show essentially the same behavior as perinatal death proportions, but the results for all of Germany are nonsignificant due to the smaller numbers involved. Analysis of the association of stillbirth proportions with the 137Cs deposition on a district level in Bavaria discloses a significant relationship. Our results are in contrast to those of many analyses of the health consequences of the Chernobyl accident and contradict the present radiobiologic knowledge. As we are dealing with highly aggregated data, other causes or artifacts may explain the observed effects. Hence, the findings
Directory of Open Access Journals (Sweden)
Chau-Kuang Chen
2015-02-01
Data from the Centers for Disease Control and Prevention (CDC) have shown that the obesity rate doubled among adults within the past two decades. This upsurge was the result of changes in human behavior and environment. Partial least squares (PLS) regression and support vector machine (SVM) models were fitted to determine the relationship between the U.S. county-level adult obesity rate and multiple risk factors. The outcome variable was the adult obesity rate. The 23 risk factors were categorized into four domains of the social ecological model: biological/behavioral factors, socioeconomic status, food environment, and physical environment. Of the 23 risk factors related to adult obesity, the top eight significant risk factors with high normalized importance were identified, including physical inactivity, natural amenity, percent of households receiving SNAP benefits, and percent of all restaurants being fast food. The study results were consistent with those in the literature. The study showed that the adult obesity rate was influenced by the biological/behavioral factors, socioeconomic status, food environment, and physical environment embedded in the social ecological theory. Analyzing multiple risk factors of obesity in communities may lead to the proposal of more comprehensive and integrated policies and intervention programs to solve this population-based problem.
Salas, M.M.; Nascimento, G.G.; Vargas-Ferreira, F.; Tarquinio, S.B.; Huysmans, M.C.D.N.J.M.; Demarco, F.F.
2015-01-01
OBJECTIVE: The aim of the present study was to assess the influence of diet on the presence of tooth erosion in children and adolescents by meta-analysis and meta-regression. DATA: Two reviewers independently performed the selection process and assessed the quality of the studies. SOURCES: Studies publishe