Regions of Significance in Multiple Regression Analysis
Takane, Yoshio; Cramer, Elliott M.
1975-01-01
This paper considers the case of two predictor variables. Figures are obtained which show the regions of significance of joint regression coefficients, regression coefficients considered separately, and the multiple correlation. The intersection of these regions of significance and non-significance illustrates how the various apparent…
Functional Regression Models for Epistasis Analysis of Multiple Quantitative Traits
Xie, Dan; Liang, Meimei; Xiong, Momiao
2016-01-01
To date, most genetic analyses of phenotypes have focused on analyzing single traits or analyzing each phenotype independently. However, joint epistasis analysis of multiple complementary traits will increase statistical power and improve our understanding of the complicated genetic structure of complex diseases. Despite their importance in uncovering the genetic structure of complex traits, statistical methods for identifying epistasis in multiple phenotypes remain fundamentally unexplored. To fill this gap, we formulate a test for interaction between two genes in multiple quantitative trait analysis as a multiple functional regression (MFRG) in which the genotype functions (genetic variant profiles) are defined as a function of the genomic position of the genetic variants. We use large-scale simulations to calculate Type I error rates for testing interaction between two genes with multiple phenotypes and to compare the power with multivariate pairwise interaction analysis and single trait interaction analysis by a single variate functional regression model. To further evaluate performance, the MFRG for epistasis analysis is applied to five phenotypes of exome sequence data from the NHLBI’s Exome Sequencing Project (ESP) to detect pleiotropic epistasis. A total of 267 pairs of genes that formed a genetic interaction network showed significant evidence of epistasis influencing five traits. The results demonstrate that the joint interaction analysis of multiple phenotypes has a much higher power to detect interaction than the interaction analysis of a single trait and may open a new direction to fully uncovering the genetic structure of multiple phenotypes. PMID:27104857
MULTIPLE REGRESSION ANALYSIS OF MAIN ECONOMIC INDICATORS IN TOURISM
Erika KULCSÁR
2009-01-01
This paper analyzes the relationship between GDP in the hotels and restaurants sector (the dependent variable) and the following independent variables: overnight stays in tourist accommodation establishments, arrivals in tourist accommodation establishments, and investments in the hotels and restaurants sector, over the period 1995-2007. Using multiple regression analysis, I found that investments and tourist arrivals are significant predictors of the GDP dependent variable. Based on...
Multiple regression analysis of cancer incidence around nuclear plant
The results of a multiple regression analysis of cancer incidence in the vicinity of a nuclear plant are presented. No dependence on radiation factors (natural background, radioactive releases, total dose of all types of medical examinations) is established. At the same time, a relationship is revealed between releases of dangerous chemical substances and general cancer incidence as well as the incidence of tumors of the lungs, trachea and bronchi and of hematopoietic tissue carcinoma.
MULTIPLE REGRESSION ANALYSIS OF MAIN ECONOMIC INDICATORS IN TOURISM
Erika KULCSÁR
2009-12-01
This paper analyzes the relationship between GDP in the hotels and restaurants sector (the dependent variable) and the following independent variables: overnight stays in tourist accommodation establishments, arrivals in tourist accommodation establishments, and investments in the hotels and restaurants sector, over the period 1995-2007. Using multiple regression analysis, I found that investments and tourist arrivals are significant predictors of the GDP dependent variable. Based on these results, I identified those components of the marketing mix which, in my opinion, require investment and which could contribute to the positive development of tourist arrivals in tourist accommodation establishments.
Applied multiple regression correlation analysis for the behavioral sciences
Cohen, Patricia; Aiken, Leona S
2014-01-01
This classic text on multiple regression is noted for its nonmathematical, applied, and data-analytic approach. Readers profit from its verbal-conceptual exposition and frequent use of examples. The applied emphasis provides clear illustrations of the principles and provides worked examples of the types of applications that are possible. Researchers learn how to specify regression models that directly address their research questions. An overview of the fundamental ideas of multiple regression and a review of bivariate correlation and regression and other elementary statistical concepts provide a strong foundation for understanding the rest of the text. The third edition features an increased emphasis on graphics and the use of confidence intervals and effect size measures, and an accompanying CD with data for most of the numerical examples along with the computer code for SPSS, SAS, and SYSTAT. Applied Multiple Regression serves as both a textbook for graduate students and as a reference tool for researche...
Regression analysis for multiple-disease group testing data.
Zhang, Boan; Bilder, Christopher R; Tebbs, Joshua M
2013-12-10
Group testing, where individual specimens are composited into groups to test for the presence of a disease (or other binary characteristic), is a procedure commonly used to reduce the costs of screening a large number of individuals. Group testing data are unique in that only group responses may be available, but inferences are needed at the individual level. A further methodological challenge arises when individuals are tested in groups for multiple diseases simultaneously, because unobserved individual disease statuses are likely correlated. In this paper, we propose new regression techniques for multiple-disease group testing data. We develop an expectation-solution based algorithm that provides consistent parameter estimates and natural large-sample inference procedures. We apply our proposed methodology to chlamydia and gonorrhea screening data collected in Nebraska as part of the Infertility Prevention Project and to prenatal infectious disease screening data from Kenya. PMID:23703944
Business applications of multiple regression
Richardson, Ronny
2015-01-01
This second edition of Business Applications of Multiple Regression describes the use of the statistical procedure called multiple regression in business situations, including forecasting and understanding the relationships between variables. The book assumes a basic understanding of statistics but reviews correlation analysis and simple regression to prepare the reader to understand and use multiple regression. The techniques described in the book are illustrated using both Microsoft Excel and a professional statistical program. Along the way, several real-world data sets are analyzed in deta
A multiple regression analysis for accurate background subtraction in 99Tcm-DTPA renography
A technique for accurate background subtraction in 99Tcm-DTPA renography is described. The technique is based on a multiple regression analysis of the renal curves and separate heart and soft tissue curves which together represent background activity. It is compared, in over 100 renograms, with a previously described linear regression technique. Results show that the method provides accurate background subtraction, even in very poorly functioning kidneys, thus enabling relative renal filtration and excretion to be accurately estimated. (author)
Multiple regression analysis of Jominy hardenability data for boron treated steels
The relations between the chemical composition of boron treated steels and their hardenability have been investigated using multiple regression analysis. A linear regression model was chosen. The free boron content that is effective for hardenability was calculated using a model proposed by Jansson. The regression analysis for 1261 steel heats provided equations that were statistically significant at the 95% level. All heats met the specification according to the Nordic countries producers classification. The variation in chemical composition typically explained 80 to 90% of the variation in hardenability. Elements that did not contribute significantly to the calculated hardness according to the F test were eliminated from the regression analysis. Carbon, silicon, manganese, phosphorus and chromium were of importance at all Jominy distances; nickel, vanadium, boron and nitrogen at distances above 6 mm. After the regression analysis it was demonstrated that very few outliers, i.e. data points outside four times the standard deviation, were present in the data set. The model has been used successfully in industrial practice, replacing some of the necessary Jominy tests. (orig.)
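The outlier criterion mentioned in this abstract (points outside four times the standard deviation) can be sketched as follows. This is only an illustration: it assumes the criterion is applied to regression residuals around their mean, and the function name `flag_outliers` and the residual values are hypothetical, not taken from the study.

```python
from statistics import mean, pstdev


def flag_outliers(residuals, k=4.0):
    """Return indices of residuals farther than k standard deviations from the mean."""
    centre = mean(residuals)
    spread = pstdev(residuals)  # population standard deviation of the residuals
    if spread == 0:
        return []  # all residuals identical, nothing can be flagged
    return [i for i, r in enumerate(residuals) if abs(r - centre) > k * spread]


# Hypothetical residuals (measured minus predicted hardness); not data from the study.
residuals = [0.1, -0.2, 0.05, 0.0, -0.1] * 20 + [5.0]
print(flag_outliers(residuals))
```

With small samples a single point cannot exceed four standard deviations at all, so a rule like this is only meaningful for data sets of a reasonable size, such as the 1261 heats reported above.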
BUDIMAN
2012-01-01
Budiman, Arisoesilaningsih E. 2012. Predictive model of Amorphophallus muelleri growth in some agroforestry in East Java by multiple regression analysis. Biodiversitas 13: 18-22. The aim of this research was to determine multiple regression models of vegetative and corm growth of Amorphophallus muelleri Blume across age variations and habitat conditions of agroforestry in East Java. Descriptive exploratory research was conducted by systematic random sampling at five agroforestries on four plantations in East Java: Saradan, Bojonegoro, Nganjuk and Blitar. In each agroforestry, we observed A. muelleri vegetative and corm growth at four growing ages (1, 2, 3 and 4 years old, respectively) as well as environmental variables such as altitude, vegetation, climate and soil conditions. Data were analyzed using descriptive statistics to compare A. muelleri habitat in the five agroforestries. Meanwhile, the influence and contribution of each environmental variable to A. muelleri vegetative and corm growth were determined using multiple regression analysis in SPSS 17.0. The multiple regression models of A. muelleri vegetative and corm growth, generated from characteristics of the agroforestries and age, showed high validity with R2 = 88-99%. The regression models showed that age, monthly temperature, percentage of radiation and soil calcium (Ca) content, either simultaneously or partially, determined A. muelleri vegetative and corm growth. Based on these models, the A. muelleri corm reached optimal growth after four years of cultivation, when it is ready to be harvested. Additionally, the soil Ca content should reach 25.3 me.hg-1, as in the Sugihwaras agroforestry, with a maximal radiation of 60%.
Nop Sopipan
2013-01-01
The aim of this study was to forecast the returns for the Stock Exchange of Thailand (SET) Index by adding some explanatory variables and a stationary autoregressive term of order p (AR(p)) to the mean equation of returns. In addition, we used Principal Component Analysis (PCA) to remove possible complications caused by multicollinearity. Results showed that the multiple regression based on PCA has the best performance.
COLOR IMAGE RETRIEVAL BASED ON FEATURE FUSION THROUGH MULTIPLE LINEAR REGRESSION ANALYSIS
K. Seetharaman
2015-08-01
This paper proposes a novel technique based on feature fusion using multiple linear regression analysis, with the least-squares method employed to estimate the parameters. The input query image is segmented into regions according to the structure of the image. Color and texture features are extracted from each region of the query image, and the features are fused using the multiple linear regression model. The estimated parameters of the model, which is built on the features, form a vector called the feature vector. The Canberra distance measure is adopted to compare the feature vectors of the query and target images. The F-measure is applied to evaluate the performance of the proposed technique. The results show that the proposed technique is comparable to other existing techniques.
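The two measures named in this abstract have standard definitions, which the following minimal sketch illustrates. The feature vectors and retrieval lists are made-up values, and the function names are hypothetical; the paper's own implementation is not described in the abstract.

```python
def canberra_distance(u, v):
    """Canberra distance: sum of |u_i - v_i| / (|u_i| + |v_i|), skipping 0/0 terms."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    total = 0.0
    for a, b in zip(u, v):
        denom = abs(a) + abs(b)
        if denom > 0:  # a term with both entries zero contributes nothing
            total += abs(a - b) / denom
    return total


def f_measure(retrieved, relevant):
    """Harmonic mean of precision and recall for a single query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    if hits == 0:
        return 0.0
    precision = hits / len(retrieved)
    recall = hits / len(relevant)
    return 2 * precision * recall / (precision + recall)


# Illustrative feature vectors for a query image and a target image.
query = [1.0, 2.0, 0.0]
target = [3.0, 2.0, 0.0]
print(canberra_distance(query, target))  # 0.5

# Illustrative retrieval result: 2 of 4 retrieved images are among 3 relevant ones.
print(f_measure(["a", "b", "c", "d"], ["a", "b", "e"]))
```

Because each Canberra term is normalized by the magnitudes of its two entries, features on different scales contribute comparably, which is one plausible reason to prefer it for fused color and texture features.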
Various types of ultrasonic techniques have been used to estimate the compressive strength of concrete structures. However, the conventional ultrasonic velocity method, which uses only the longitudinal wave, cannot determine the compressive strength of concrete structures accurately. In this paper, multiple regression analysis was applied to the determination of compressive strength by introducing multiple parameters, e.g. velocity of shear wave, velocity of longitudinal wave, attenuation coefficient of shear wave, attenuation coefficient of longitudinal wave, combination condition, age and preservation method. The experimental results show that the velocity of the shear wave estimates the compressive strength of concrete more accurately than the velocity of the longitudinal wave, and that the estimation error for the compressive strength of concrete structures can be brought within a range of approximately 10%.
Liu, Pudong; Shi, Runhe; Wang, Hong; Bai, Kaixu; Gao, Wei
2014-10-01
Leaf pigments are key elements for plant photosynthesis and growth. Traditional manual sampling of these pigments is labor-intensive and costly, and has difficulty capturing their temporal and spatial characteristics. The aim of this work is to estimate photosynthetic pigments at large scale by remote sensing. For this purpose, inverse models were proposed with the aid of stepwise multiple linear regression (SMLR) analysis. Furthermore, a leaf radiative transfer model (i.e. the PROSPECT model) was employed to simulate the leaf reflectance at wavelengths from 400 to 780 nm at 1 nm intervals, and these values were then treated as the data from remote sensing observations. Meanwhile, simulated chlorophyll concentration (Cab), carotenoid concentration (Car) and their ratio (Cab/Car) were each taken as the target for building a regression model. In this study, a total of 4000 samples were simulated via PROSPECT with different Cab, Car and leaf mesophyll structures; 70% of these samples were used for training and the remaining 30% for model validation. Reflectance (r) and its mathematical transformations (1/r and log (1/r)) were each employed to build a regression model. Results showed fair agreement between pigments and simulated reflectance, with all adjusted coefficients of determination (R2) larger than 0.8 when 6 wavebands were selected to build the SMLR model. The largest values of R2 for Cab, Car and Cab/Car are 0.8845, 0.876 and 0.8765, respectively. Meanwhile, mathematical transformations of reflectance showed little influence on regression accuracy. We concluded that it was feasible to estimate chlorophyll, carotenoids and their ratio based on a statistical model with leaf reflectance data.
A. Shirvani
2005-10-01
Since fluctuations of the Persian Gulf Sea Surface Temperature (PGSST) have a significant effect on the winter precipitation, water resources and agricultural production of the southwestern parts of Iran, the possibility of predicting the winter SST was evaluated with a multiple regression model. The time series of PGSSTs for all seasons during 1947-1992 were considered as predictors, and the time series of MSSTs during 1948-1993 as the predictand. For data reduction and principal component extraction, principal components analysis was applied. Only the scores of the first four PCs (PC1 to PC4), which accounted for the total variance in the predictor field, were considered as input for the regression analysis. To find the dependency of each principal component on the first time series of the PGSST, Varimax rotation was applied. The results indicated that PC1 to PC4 are indicators of temperature changes during winter, autumn, spring and summer, respectively. According to the regression model, PC1, PC2 and PC4 were significant at the 5% level, but PC3 was not. The significant variables accounted for 33.5% of the total variance in the winter PGSSTs. For the prediction of the winter PGSST, the PGSST during the previous winter is of particular importance, followed by the autumn and summer temperatures.
A Performance Study of Data Mining Techniques: Multiple Linear Regression vs. Factor Analysis
Taneja, Abhishek
2011-01-01
The growing volume of data creates a need for data analysis tools that discover regularities in these data. Data mining has emerged as a discipline that contributes tools for data analysis, discovery of hidden knowledge, and autonomous decision making in many application domains. The purpose of this study is to compare the performance of two data mining techniques, viz. factor analysis and multiple linear regression, for different sample sizes on three unique sets of data. The performance of the two techniques is compared on the following parameters: mean square error (MSE), R-square, adjusted R-square, condition number, root mean square error (RMSE), number of variables included in the prediction model, modified coefficient of efficiency, F-value, and test of normality. These parameters have been computed using various data mining tools such as SPSS, XLstat, Stata, and MS-Excel. It is seen that for all the given dataset, factor analysis outperform multiple linear re...
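Several of the comparison parameters listed in this abstract (MSE, RMSE, R-square and adjusted R-square) follow textbook formulas, sketched below. The function name `fit_metrics` and the sample values are hypothetical, and the MSE here divides by n; some packages divide by the residual degrees of freedom instead, so figures from SPSS or Stata may differ.

```python
from math import sqrt


def fit_metrics(y_true, y_pred, n_predictors):
    """Return MSE, RMSE, R-square and adjusted R-square for a fitted model."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)             # total sum of squares
    mse = ss_res / n  # assumption: divide by n rather than n - p - 1
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)
    return {"mse": mse, "rmse": sqrt(mse), "r2": r2, "adj_r2": adj_r2}


# Made-up observations and predictions from a one-predictor model.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.0]
metrics = fit_metrics(observed, predicted, n_predictors=1)
print(metrics)
```

Adjusted R-square penalizes each extra predictor, which matters when models with different numbers of included variables are compared, as in the study above.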
Zhan, Xinhua; Liang, Xiao; Xu, Guohua; Zhou, Lixiang
2013-08-01
Polycyclic aromatic hydrocarbons (PAHs) are contaminants that reside mainly in surface soils. Dietary intake of plant-based foods can make a major contribution to total PAH exposure. Little information is available on the relationship between root morphology and plant uptake of PAHs. An understanding of plant root morphologic and compositional factors that affect root uptake of contaminants is important and can inform both agricultural (chemical contamination of crops) and engineering (phytoremediation) applications. Five crop plant species are grown hydroponically in solutions containing the PAH phenanthrene. Measurements are taken for 1) phenanthrene uptake, 2) root morphology--specific surface area, volume, surface area, tip number and total root length and 3) root tissue composition--water, lipid, protein and carbohydrate content. These factors are compared through Pearson's correlation and multiple linear regression analysis. The major factors which promote phenanthrene uptake are specific surface area and lipid content. PMID:23708267
Abdelrafe Elzamly
2014-01-01
Risk is not always avoidable, but it is controllable. The aim of this study is to identify whether those techniques are effective in reducing software failure. This motivates the authors to continue the effort to enrich the management of software project risks by considering mining and quantitative approaches with a large data set. In this study, two new techniques, stepwise multiple regression analysis and fuzzy multiple regression, are introduced to manage software risks. Two evaluation measures, MMRE and Pred(25), are used to compare the accuracy of the techniques. The model's accuracy is slightly better with stepwise multiple regression than with fuzzy multiple regression. This study will guide software managers in applying software risk management practices in real-world software development organizations and in verifying the effectiveness of the new techniques and approaches on a software project. The study was conducted on a group of software projects using a survey questionnaire. It is hoped that this will enable software managers to improve their decisions and increase the probability of software project success.
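MMRE (mean magnitude of relative error) and Pred(25) (the share of predictions whose relative error is at most 25%) are commonly defined as in the sketch below. The function names and the numbers are illustrative assumptions, not values from the study.

```python
def relative_errors(actuals, predictions):
    """Magnitude of relative error |actual - predicted| / |actual| for each pair."""
    return [abs(a - p) / abs(a) for a, p in zip(actuals, predictions)]


def mmre(actuals, predictions):
    """Mean magnitude of relative error; lower is better."""
    errors = relative_errors(actuals, predictions)
    return sum(errors) / len(errors)


def pred(actuals, predictions, level=25):
    """Fraction of predictions with relative error no larger than level percent."""
    threshold = level / 100
    errors = relative_errors(actuals, predictions)
    return sum(1 for e in errors if e <= threshold) / len(errors)


# Made-up actual and predicted values for four projects.
actual = [100.0, 200.0, 50.0, 80.0]
estimate = [110.0, 140.0, 60.0, 80.0]
print(mmre(actual, estimate))
print(pred(actual, estimate, level=25))  # 0.75
```

The two measures are complementary: MMRE is sensitive to a few very large errors, while Pred(25) only counts how many predictions fall inside the tolerance band.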
In this study, thermodynamic and statistical analyses were performed on a gas turbine system, to assess the impact of some important operating parameters like CIT (Compressor Inlet Temperature), PR (Pressure Ratio) and TIT (Turbine Inlet Temperature) on its performance characteristics such as net power output, energy efficiency, exergy efficiency and fuel consumption. Each performance characteristic was enunciated as a function of operating parameters, followed by a parametric study and optimization. The results showed that the performance characteristics increase with an increase in the TIT and a decrease in the CIT, except fuel consumption which behaves oppositely. The net power output and efficiencies increase with the PR up to certain initial values and then start to decrease, whereas the fuel consumption always decreases with an increase in the PR. The results of exergy analysis showed the combustion chamber as a major contributor to the exergy destruction, followed by stack gas. Subsequently, multiple regression models were developed to correlate each of the response variables (performance characteristic) with the predictor variables (operating parameters). The regression model equations showed a significant statistical relationship between the predictor and response variables. (author)
Some Applied Research Concerns Using Multiple Linear Regression.
Newman, Isadore; Fraas, John
1979-01-01
Issues in the application of multiple regression analysis as a data analytic tool are discussed at some length. Included are discussions on component regression, factor regression, ridge regression, and systems of equations. (JKS)
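Of the techniques listed in this abstract, ridge regression has a particularly compact closed form: the coefficients solve (X'X + λI)β = X'y. The sketch below assumes centred predictors and response (so no intercept is needed) and uses a small hand-written solver to stay dependency-free; the names `solve` and `ridge` and the data are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ridge(x_rows, y, lam):
    """Ridge coefficients for centred data: solve (X'X + lam * I) beta = X'y."""
    p = len(x_rows[0])
    xtx = [[sum(r[i] * r[j] for r in x_rows) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(x_rows, y)) for i in range(p)]
    return solve(xtx, xty)


# Two strongly correlated, centred predictors (made-up values).
x_rows = [[-1.0, -0.9], [0.0, -0.1], [1.0, 1.0]]
y = [-2.0, 0.0, 2.0]
print(ridge(x_rows, y, lam=0.0))  # ordinary least squares
print(ridge(x_rows, y, lam=1.0))  # shrunken coefficients
```

Setting `lam=0.0` reproduces ordinary least squares; increasing it shrinks the coefficient vector, which stabilizes estimates when predictors are nearly collinear.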
Multiple Regressive Model Adaptive Control
Garipov, Emil; Stoilkov, Teodor; Kalaykov, Ivan
2008-01-01
The essence of the ideas applied in this text consists in the development of a strategy for controlling a continuous plant of arbitrary complexity by means of a set of discrete time-invariant linear controllers. Their number and tuned parameters correspond to the number and parameters of the linear time-invariant regressive models in the model bank, which approximate the complex plant dynamics at different operating points. The described strategy is known as Multiple Regressive Model Adaptive C...
Polycyclic aromatic hydrocarbons (PAHs) are contaminants that reside mainly in surface soils. Dietary intake of plant-based foods can make a major contribution to total PAH exposure. Little information is available on the relationship between root morphology and plant uptake of PAHs. An understanding of plant root morphologic and compositional factors that affect root uptake of contaminants is important and can inform both agricultural (chemical contamination of crops) and engineering (phytoremediation) applications. Five crop plant species are grown hydroponically in solutions containing the PAH phenanthrene. Measurements are taken for 1) phenanthrene uptake, 2) root morphology – specific surface area, volume, surface area, tip number and total root length and 3) root tissue composition – water, lipid, protein and carbohydrate content. These factors are compared through Pearson's correlation and multiple linear regression analysis. The major factors which promote phenanthrene uptake are specific surface area and lipid content. Highlights: (1) There is no correlation between phenanthrene uptake and total root length or water content. (2) Specific surface area and lipid content are the most crucial factors for phenanthrene uptake. (3) The contribution of specific surface area is greater than that of lipid content. Of the two most important root morphological and compositional factors affecting phenanthrene uptake, specific surface area contributes more than lipid content.
The results of studying the interconnection of parameters of solar cosmic ray (SCR) events and microwave (μ) bursts, obtained by methods of multiple statistical analysis, are presented. Using multiple correlation and regression analysis, it is shown that the main peculiarities of the connection between μ-bursts and SCR events can be understood by accounting for the differences in the dynamics of electrons and protons in flare arcs of different sizes, assuming no SCR particle acceleration in the second flare phase.
Špalj, Stjepan; Tudor Špalj, Vedrana; Ivanković, Luiđa; Plančak, Darije
2014-01-01
The aim of this study was to explore the patterns of oral health-related risk behaviours in relation to dental status, attitudes, motivation and knowledge among Croatian adolescents. The assessment was conducted in a sample of 750 male subjects – military recruits aged 18 – 28 in Croatia – using a questionnaire and clinical examination. The mean number of decayed, missing and filled teeth (DMFT) and the Significant Caries Index (SiC) were calculated. Multiple logistic regression models were created ...
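The two caries indices mentioned here can be computed as in the following sketch. The Significant Caries Index is conventionally the mean DMFT of the third of the sample with the highest scores; rounding the size of that third down is an assumption made here, and the DMFT values are invented for illustration.

```python
def mean_dmft(scores):
    """Mean number of decayed, missing and filled teeth across subjects."""
    return sum(scores) / len(scores)


def sic_index(scores):
    """Significant Caries Index: mean DMFT of the third with the highest scores."""
    ranked = sorted(scores, reverse=True)
    third = max(1, len(ranked) // 3)  # assumption: round the group size down
    top = ranked[:third]
    return sum(top) / len(top)


# Invented DMFT scores for nine subjects.
dmft = [0, 0, 1, 2, 2, 3, 4, 6, 9]
print(mean_dmft(dmft))  # 3.0
print(sic_index(dmft))
```

The SiC highlights the most affected part of a population, which the overall mean DMFT can hide when caries is concentrated in a minority of subjects.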
Investigations upon the indefinite rolls quality assurance in multiple regression analysis
The quality of rolling mill rolls has been enhanced mainly through improvements in the chemical composition of the roll materials. Achieving an optimal chemical composition is a technically efficient way to assure service properties, since the material from which the rolls are manufactured is of great importance in this respect. This paper continues to present the scientific results of our experimental research on rolling mill rolls. The basic research contains concrete elements of immediate practical use in metallurgical enterprises for improving roll quality, with the ultimate aim of increasing durability and safety in service. The paper analyzes the chemical composition and its influence on the mechanical properties of indefinite cast iron rolls. We present mathematical correlations and graphical interpretations relating the hardness (on the working surface and on the necks) to the chemical composition. The double and triple correlations are helpful in foundry practice, as they allow us to determine variation boundaries for the chemical composition with a view to obtaining optimal hardness values. We suggest a mathematical interpretation of the influence of the chemical composition on the hardness of these indefinite rolls, using multiple regression analysis, an important statistical tool for investigating relationships between variables. The modeling results can be described by a number of multi-component equations determined for spaces with 3 and 4 dimensions. The regression surfaces, level curves and volumes of variation can also be represented and interpreted by technologists as correlation diagrams between the analyzed variables. These research results can therefore be used by engineering teams in foundries and rolling mills for quality assurance of rolls from the production phase onward, as well as during service, which in turn leads to quality assurance of the rolled products. (Author) 16 refs.
Multiple Logistic Regression Analysis of Cigarette Use among High School Students
Adwere-Boamah, Joseph
2011-01-01
A binary logistic regression analysis was performed to predict high school students' cigarette smoking behavior from selected predictors from 2009 CDC Youth Risk Behavior Surveillance Survey. The specific target student behavior of interest was frequent cigarette use. Five predictor variables included in the model were: a) race, b) frequency of
Regression analysis by example
Chatterjee, Samprit
2012-01-01
Praise for the Fourth Edition: "This book is . . . an excellent source of examples for regression analysis. It has been and still is readily readable and understandable." - Journal of the American Statistical Association. Regression analysis is a conceptually simple method for investigating relationships among variables. Carrying out a successful application of regression analysis, however, requires a balance of theoretical results, empirical rules, and subjective judgment. Regression Analysis by Example, Fifth Edition has been expanded
Error analysis of dimensionless scaling experiments with multiple points using linear regression
A general method of error estimation in the case of multiple point dimensionless scaling experiments, using linear regression and standard error propagation, is proposed. The method reduces to the previous result of Cordey (2009 Nucl. Fusion 49 052001) in the case of a two-point scan. On the other hand, if the points follow a linear trend, it explains how the estimated error decreases as more points are added to the scan. Based on the analytical expression that is derived, it is argued that for a low number of points, adding points to the ends of the scanned range, rather than the middle, results in a smaller error estimate. (letter)
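The effect of point placement described in this abstract can be illustrated with the textbook standard error of an ordinary least-squares slope, sigma / sqrt(sum((x_i - mean(x))^2)). This is a simplified stand-in that assumes equal, known measurement errors; it is not the analytical expression derived in the paper, and the function name and scan positions are hypothetical.

```python
from math import sqrt


def slope_std_error(xs, sigma):
    """Standard error of an OLS slope for points xs with equal noise level sigma."""
    x_bar = sum(xs) / len(xs)
    sxx = sum((x - x_bar) ** 2 for x in xs)  # spread of the scan positions
    return sigma / sqrt(sxx)


# A two-point scan, then two extra points placed at the ends or in the middle.
two_point = [0.0, 1.0]
ends_added = [0.0, 0.0, 1.0, 1.0]
middle_added = [0.0, 0.5, 0.5, 1.0]

for label, xs in [("two-point", two_point),
                  ("ends added", ends_added),
                  ("middle added", middle_added)]:
    print(label, slope_std_error(xs, sigma=1.0))
```

In this simplified setting, extra points at the ends of the range increase the spread term and reduce the slope error, whereas extra points at the exact centre leave it unchanged, which is consistent with the qualitative argument in the abstract.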
Tvete, Ingunn Fride; Natvig, Bent; Gåsemyr, Jørund; Meland, Nils; Røine, Marianne; Klemp, Marianne
2015-01-01
Rheumatoid arthritis patients have been treated with disease modifying anti-rheumatic drugs (DMARDs) and the newer biologic drugs. We sought to compare and rank the biologics with respect to efficacy. We performed a literature search identifying 54 publications encompassing 9 biologics. We conducted a multiple treatment comparison regression analysis letting the number experiencing a 50% improvement on the ACR score be dependent upon dose level and disease duration for assessing the comparabl...
Multiple regression analysis of factors that may influence middle school science scores
Glover, Judith
The purpose of this quantitative multiple regression study was to determine whether a relationship existed between Maryland State Assessment (MSA) reading scores, MSA math scores, gender, ethnicity, age, and MSA science scores. Also examined was if MSA reading scores, MSA math scores, gender, ethnicity, and age can be used in combination or alone to predict a passing score on the MSA science test and which variable, if any, had the most influence on science MSA scores. Both math and reading MSA scores were positively correlated with science MSA scores. Ethnicity was correlated with science MSA scores, but may have been confounded by socio-economic status. Age and gender were not correlated with science MSA scores. When the variables were combined, results showed that math MSA scores followed by reading MSA scores had the most predictive influence upon science MSA scores. Ethnicity, gender, and age had the least predictive influence. The findings of this study may serve as a catalyst for improving student achievement in science through changes in instructional methodology and curriculum design thereby increasing the number of students pursuing science careers.
Application of multiple regression analysis to forecasting South Africa's electricity demand
Renee, Koen; Jennifer, Holloway.
2014-11-01
In a developing country such as South Africa, understanding the expected future demand for electricity is very important in various planning contexts. It is specifically important to understand how expected scenarios regarding population or economic growth can be translated into corresponding future electricity usage patterns. This paper discusses a methodology for forecasting long-term electricity demand that was specifically developed for applying to such scenarios. The methodology uses a series of multiple regression models to quantify historical patterns of electricity usage per sector in relation to patterns observed in certain economic and demographic variables, and uses these relationships to derive expected future electricity usage patterns. The methodology has been used successfully to derive forecasts used for strategic planning within a private company as well as to provide forecasts to aid planning in the public sector. This paper discusses the development of the modelling methodology, provides details regarding the extensive data collection and validation processes followed during the model development, and reports on the relevant model fit statistics. The paper also shows that the forecasting methodology has to some extent been able to match the actual patterns, and therefore concludes that the methodology can be used to support planning by translating changes relating to economic and demographic growth, for a range of scenarios, into a corresponding electricity demand. The methodology therefore fills a particular gap within the South African long-term electricity forecasting domain.
Multiple Regressions in Analysing House Price Variations
Aminah Md Yusof
2012-03-01
The application of rigorous statistical analysis to aid investment decision making is gaining momentum in the United States of America as well as the United Kingdom. In Malaysia, however, the response from local academics has been rather slow, and the response from practitioners slower still. This paper illustrates how Multiple Regression Analysis (MRA) and its extension, Hedonic Regression Analysis, have been used to explain price variation for selected houses in Malaysia. Each attribute that is theoretically identified as a price determinant is priced, and the perceived contribution of each is shown explicitly. The paper demonstrates how the statistical analysis is capable of analyzing property investment by considering multiple determinants. This more rigorous consideration of various characteristics enables better investment decision making.
Keat, Sim Chong; Chun, Beh Boon; San, Lim Hwee; Jafri, Mohd Zubir Mat
2015-04-01
Climate change due to carbon dioxide (CO2) emissions is one of the most complex challenges threatening our planet. The issue is a major international concern and is attributed primarily to the use of different fossil fuels. In this paper, a regression model is used to analyze the causal relationship between CO2 emissions and energy consumption in Malaysia using time series data for the period 1980-2010. The equations were developed using a regression model based on the eight major sources that contribute to CO2 emissions: non-energy, Liquefied Petroleum Gas (LPG), diesel, kerosene, refinery gas, Aviation Turbine Fuel (ATF) and Aviation Gasoline (AV Gas), fuel oil and motor petrol. Part of the data (1980-2000) was used to fit the regression model and part (2001-2010) was used to validate it. Comparison of the model predictions with the measured data showed a high correlation (R2=0.9544), indicating the model's accuracy and efficiency. These results are accurate and can be used in early warning of the population to comply with air quality standards.
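The fit-then-validate scheme described above (fit on 1980-2000, validate on 2001-2010) can be sketched as follows. This is a minimal illustration under stated assumptions: the data are synthetic, the three "fuel" columns and their coefficients are hypothetical stand-ins for the eight sources in the abstract, and the helper names `fit_ols`, `predict` and `r_squared` are not from the paper.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients, intercept first."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    """Apply fitted coefficients to new predictor rows."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef

def r_squared(y, y_hat):
    """Coefficient of determination of predictions y_hat against y."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic stand-in data: one row per year, three hypothetical fuel series.
rng = np.random.default_rng(0)
years = np.arange(1980, 2011)
fuel = np.cumsum(rng.uniform(0.5, 1.5, size=(len(years), 3)), axis=0)
co2 = 2.0 + fuel @ np.array([1.2, 0.7, 0.4]) + rng.normal(0.0, 0.3, len(years))

# Fit on the earlier period only, then score on the held-out later period.
train = years <= 2000
coef = fit_ols(fuel[train], co2[train])
r2_validation = r_squared(co2[~train], predict(coef, fuel[~train]))
print(f"validation R^2 = {r2_validation:.4f}")
```

Splitting by time rather than at random keeps the validation years strictly after the fitting years, which is the situation a forecast actually faces.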
Cross-Validation, Shrinkage, and Multiple Regression.
Hynes, Kevin
One aspect of multiple regression--the shrinkage of the multiple correlation coefficient on cross-validation is reviewed. The paper consists of four sections. In section one, the distinction between a fixed and a random multiple regression model is made explicit. In section two, the cross-validation paradigm and an explanation for the occurrence
El-Ansary, Afaf
2016-06-01
This work presents data from a multiple regression analysis of nine biomarkers related to glutamate excitotoxicity and impaired detoxification, two mechanisms recently recorded as autism phenotypes. The presented data were obtained by measuring a panel of markers in 20 autistic patients aged 3-15 years and 20 age- and gender-matched healthy controls. Levels of GSH, glutathione status (GSH/GSSG), glutathione reductase (GR), glutathione-s-transferase (GST), thioredoxin (Trx), thioredoxin reductase (TrxR) and peroxidoxins (Prxs I and III), glutamate, glutamine, glutamate/glutamine ratio, glutamate dehydrogenase (GDH) in plasma and mercury (Hg) in red blood cells were determined in both groups. In the multiple regression analysis, R² values, which describe the proportion or percentage of variance in the dependent variable attributed to the variance in the independent variables together, were calculated. Moreover, β coefficient values, which show the direction (positive or negative) and the contribution of each independent variable relative to the other independent variables in explaining the variation of the dependent variable, were determined. A panel of inter-related markers was recorded. This paper contains data related to and supporting the published research articles "Mechanism of nitrogen metabolism-related parameters and enzyme activities in the pathophysiology of autism" [1], "Novel metabolic biomarkers related to sulfur-dependent detoxification pathways in autistic patients of Saudi Arabia" [2], and "A key role for an impaired detoxification mechanism in the etiology and severity of autism spectrum disorders" [3]. PMID:26933667
Sanford, Ward E.; Nelms, David L.; Pope, Jason P.; Selnick, David L.
2012-01-01
This study by the U.S. Geological Survey, prepared in cooperation with the Virginia Department of Environmental Quality, quantifies the components of the hydrologic cycle across the Commonwealth of Virginia. Long-term, mean fluxes were calculated for precipitation, surface runoff, infiltration, total evapotranspiration (ET), riparian ET, recharge, base flow (or groundwater discharge) and net total outflow. Fluxes of these components were first estimated on a number of real-time-gaged watersheds across Virginia. Specific conductance was used to distinguish and separate surface runoff from base flow. Specific-conductance data were collected every 15 minutes at 75 real-time gages for approximately 18 months between March 2007 and August 2008. Precipitation was estimated for 1971–2000 using PRISM climate data. Precipitation and temperature from the PRISM data were used to develop a regression-based relation to estimate total ET. The proportion of watershed precipitation that becomes surface runoff was related to physiographic province and rock type in a runoff regression equation. Component flux estimates from the watersheds were transferred to flux estimates for counties and independent cities using the ET and runoff regression equations. Only 48 of the 75 watersheds yielded sufficient data, and data from these 48 were used in the final runoff regression equation. The base-flow proportion for the 48 watersheds averaged 72 percent using specific conductance, a value that was substantially higher than the 61 percent average calculated using a graphical-separation technique (the USGS program PART). Final results for the study are presented as component flux estimates for all counties and independent cities in Virginia.
Interpretation of Regressions with Multiple Proxies
Darren Lubotsky; Martin Wittenberg
2001-01-01
We consider the situation in which there are multiple proxies for one unobserved explanatory variable in a linear regression and provide a procedure by which the coefficient of interest can be extracted "post hoc" from a multiple regression in which all the proxies are used simultaneously. This post hoc estimator is strictly superior in large samples to coefficients derived using any index or linear combination of the proxies that is created prior to the regression. To use an index created fr...
M. Cholewa
2011-07-01
In this article the authors show the influence of technological parameters and modification treatment on the structural properties of closed skeleton castings. The approach aimed at maximal refinement of the structure and minimal structure diversification. Skeleton castings were manufactured in accordance with the elaborated production technology. Experimental castings were manufactured under variable technological conditions: pouring temperature in the range 953-1013 K, mould temperature 293-373 K and height of the gating system above casting level 105-175 mm. Analysis of metallographic specimens, quantitative analysis of silicon crystals and secondary dendrite-arm spacing analysis of the α solution were performed. Average values of stereological parameters for all castings were determined. The (B/L) and (P/A) factors were determined. On the basis of the results of microstructural analysis the authors compare the samples. The aim of the analysis was to select the samples with the least diversification of the refinement degree of the structure and the least silicon crystals. On the basis of microstructural analysis the authors state that sample 5 (AlSi11, Tpour 1013 K, Tmould 333 K, h 265 mm) has the best structural properties (least diversification of the refinement degree of the structure and the least refinement of silicon crystals). Statistical analysis of the structural analysis results was then carried out. On the basis of the statistical analysis the authors state that the best structural properties are obtained for the technological parameters Tpour = 1013 K, Tmould = 373 K and h = 230 mm [4]. The results of the statistical analysis are the prerequisite for optimization studies.
A two layer perceptron with backpropagation of error is used for quantitative analysis in ICP-AES. The network was trained with emission spectra of two interfering lines of Cd and As, and the concentrations of both elements were subsequently estimated from mixture spectra. The spectra of the Cd and As lines were also used to perform multiple linear regression (MLR) via the calculation of the pseudoinverse S+ of the sensitivity matrix S. In the present paper it is shown that there exist close relations between the operation of the perceptron and the MLR procedure. These are most clearly apparent in the correlation between the weights of the backpropagation network and the elements of the pseudoinverse. Using MLR, the confidence intervals over the predictions are exploited to correct for the wavelength shift of the optical device. (orig.)
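The MLR step via the pseudoinverse S+ of the sensitivity matrix S can be sketched as below. The example assumes a linear mixing model (mixture spectrum = S times the concentration vector); the Gaussian line shapes, the wavelength grid and the concentrations are hypothetical and only stand in for the Cd and As lines of the abstract.

```python
import numpy as np

# Hypothetical wavelength axis (arbitrary units) around two overlapping lines.
wavelength = np.linspace(-1.0, 1.0, 50)

def gaussian_line(center, width=0.3):
    """Unit-height Gaussian profile standing in for a pure-element line."""
    return np.exp(-0.5 * ((wavelength - center) / width) ** 2)

# Sensitivity matrix S: one column per element, signal per unit concentration.
S = np.column_stack([gaussian_line(-0.1), gaussian_line(0.1)])

# Moore-Penrose pseudoinverse S+ (shape: elements x wavelengths).
S_pinv = np.linalg.pinv(S)

# A noise-free mixture spectrum built from known concentrations.
true_conc = np.array([2.0, 0.5])
mixture = S @ true_conc

# MLR estimate of the concentrations: c_hat = S+ m.
estimated = S_pinv @ mixture
print(estimated)
```

Each row of `S_pinv` acts as a fixed set of weights applied to the spectrum, which is the quantity the abstract compares with the trained network weights.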
Barrett, C. A.
1985-01-01
Multiple linear regression analysis was used to determine an equation for estimating hot corrosion attack for a series of Ni base cast turbine alloys. The U transform (i.e., 1/sin (% A/100) to the 1/2) was shown to give the best estimate of the dependent variable, y. A complete second degree equation is described for the "centered" weight chemistries for the elements Cr, Al, Ti, Mo, W, Cb, Ta, and Co. In addition, linear terms for the minor elements C, B, and Zr were added for a basic 47 term equation. The best reduced equation, with essentially 13 terms, was determined by the stepwise selection method. The Cr term was found to be the most important, accounting for 60 percent of the explained variability in hot corrosion attack.
Tvete, Ingunn Fride; Natvig, Bent; Gåsemyr, Jørund; Meland, Nils; Røine, Marianne; Klemp, Marianne
2015-01-01
Rheumatoid arthritis patients have been treated with disease modifying anti-rheumatic drugs (DMARDs) and the newer biologic drugs. We sought to compare and rank the biologics with respect to efficacy. We performed a literature search identifying 54 publications encompassing 9 biologics. We conducted a multiple treatment comparison regression analysis letting the number experiencing a 50% improvement on the ACR score be dependent upon dose level and disease duration for assessing the comparable relative effect between biologics and placebo or DMARD. The analysis embraced all treatment and comparator arms over all publications. Hence, all measured effects of any biologic agent contributed to the comparison of all biologic agents relative to each other either given alone or combined with DMARD. We found the drug effect to be dependent on dose level, but not on disease duration, and the impact of a high versus low dose level was the same for all drugs (higher doses indicated a higher frequency of ACR50 scores). The ranking of the drugs when given without DMARD was certolizumab (ranked highest), etanercept, tocilizumab/ abatacept and adalimumab. The ranking of the drugs when given with DMARD was certolizumab (ranked highest), tocilizumab, anakinra, rituximab, golimumab/ infliximab/ abatacept, adalimumab/ etanercept. Still, all drugs were effective. All biologic agents were effective compared to placebo, with certolizumab the most effective and adalimumab (without DMARD treatment) and adalimumab/ etanercept (combined with DMARD treatment) the least effective. The drugs were in general more effective, except for etanercept, when given together with DMARDs. PMID:26356639
Multiple Linear Regression Models in Outlier Detection
S.M.A.Khaleelur Rahman
2012-02-01
Identifying anomalous values in a real-world database is important both for improving the quality of the original data and for reducing the impact of anomalous values in the process of knowledge discovery in databases. Such anomalous values give useful information to the data analyst in discovering useful patterns. Through isolation, these data may be separated and analyzed. The analysis of outliers and influential points is an important step of the regression diagnostics. In this paper, our aim is to detect the points which are very different from the other points. They do not seem to belong to a particular population and behave differently. Removing these influential points leads to a different model. The distinction between these points is not always obvious and clear. Hence several indicators are used for identifying and analyzing outliers. Existing methods of outlier detection are based on manual inspection of graphically represented data. In this paper, we present a new approach to automating the process of detecting and isolating outliers. The impact of anomalous values on the dataset has been established by using two indicators, DFFITS and CooksD. The process is based on modeling the human perception of exceptional values by using multiple linear regression analysis.
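The two indicators named above can be computed directly from the hat matrix of a linear fit. The following is a sketch using the textbook definitions of DFFITS and Cook's distance, not the paper's own automated procedure; the data are synthetic, one outlier is planted deliberately, and the cutoffs 2*sqrt(p/n) and 4/n are common rules of thumb assumed here for illustration.

```python
import numpy as np

def influence_measures(X, y):
    """Return (DFFITS, Cook's distance) for an OLS fit with intercept."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    p = A.shape[1]
    H = A @ np.linalg.pinv(A)            # hat matrix
    h = np.diag(H)                       # leverages
    resid = y - H @ y
    s2 = resid @ resid / (n - p)         # residual variance
    cooks_d = resid**2 * h / (p * s2 * (1.0 - h) ** 2)
    # Leave-one-out variance gives the externally studentized residual.
    s2_loo = ((n - p) * s2 - resid**2 / (1.0 - h)) / (n - p - 1)
    t_ext = resid / np.sqrt(s2_loo * (1.0 - h))
    dffits = t_ext * np.sqrt(h / (1.0 - h))
    return dffits, cooks_d

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 10.0, 30)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, 30)
x[0], y[0] = 20.0, 0.0                   # planted high-leverage outlier

dffits, cooks_d = influence_measures(x.reshape(-1, 1), y)
n, p = 30, 2
flagged = np.where((np.abs(dffits) > 2.0 * np.sqrt(p / n)) | (cooks_d > 4.0 / n))[0]
print(flagged)
```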
Computing multiple-output regression quantile regions
Paindaveine, D.; Šiman, Miroslav
2012-01-01
Vol. 56, No. 4 (2012), pp. 840-853. ISSN 0167-9473. R&D Projects: GA MŠk(CZ) 1M06047. Institutional research plan: CEZ:AV0Z10750506. Keywords: halfspace depth; multiple-output regression; parametric linear programming; quantile regression. Subject RIV: BA - General Mathematics. Impact factor: 1.304 (2012). http://library.utia.cas.cz/separaty/2012/SI/siman-0376413.pdf
Kokaly, R.F.; Clark, R.N.
1999-01-01
We develop a new method for estimating the biochemistry of plant material using spectroscopy. Normalized band depths calculated from the continuum-removed reflectance spectra of dried and ground leaves were used to estimate their concentrations of nitrogen, lignin, and cellulose. Stepwise multiple linear regression was used to select wavelengths in the broad absorption features centered at 1.73 μm, 2.10 μm, and 2.30 μm that were highly correlated with the chemistry of samples from eastern U.S. forests. Band depths of absorption features at these wavelengths were found to also be highly correlated with the chemistry of four other sites. A subset of data from the eastern U.S. forest sites was used to derive linear equations that were applied to the remaining data to successfully estimate their nitrogen, lignin, and cellulose concentrations. Correlations were highest for nitrogen (R² from 0.75 to 0.94). The consistent results indicate the possibility of establishing a single equation capable of estimating the chemical concentrations in a wide variety of species from the reflectance spectra of dried leaves. The extension of this method to remote sensing was investigated. The effects of leaf water content, sensor signal-to-noise and bandpass, atmospheric effects, and background soil exposure were examined. Leaf water was found to be the greatest challenge to extending this empirical method to the analysis of fresh whole leaves and complete vegetation canopies. The influence of leaf water on reflectance spectra must be removed to within 10%. Other effects were reduced by continuum removal and normalization of band depths. If the effects of leaf water can be compensated for, it might be possible to extend this method to remote sensing data acquired by imaging spectrometers to give estimates of nitrogen, lignin, and cellulose concentrations over large areas for use in ecosystem studies.
Suresh, Arumuganainar; Choi, Hong Lim
2011-10-01
Swine waste land application has increased due to organic fertilization, but excess application in an arable system can cause environmental risk. Therefore, in situ characterizations of such resources are important prior to application. To explore this, 41 swine slurry samples were collected from Korea, and wide differences were observed in the physico-biochemical properties. However, significant (P [...] hydrometer, EC meter, drying oven and pH meter were found useful to estimate Mn, Fe, Ca, K, Al, Na, N and 5-day biochemical oxygen demands (BOD₅) at improved R² values of 0.83, 0.82, 0.77, 0.75, 0.67, 0.47, 0.88 and 0.70, respectively. The results from this study suggest that multiple property regressions can facilitate the prediction of micronutrients and organic matter much better than a single property regression for livestock waste. PMID:21767950
Enhance-Synergism and Suppression Effects in Multiple Regression
Lipovetsky, Stan; Conklin, W. Michael
2004-01-01
Relations between pairwise correlations and the coefficient of multiple determination in regression analysis are considered. The conditions for the occurrence of enhance-synergism and suppression effects when multiple determination becomes bigger than the total of squared correlations of the dependent variable with the regressors are discussed. It…
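A two-predictor case shows how the coefficient of multiple determination can exceed the total of the squared correlations with the dependent variable. The sketch uses the standard identity for two standardized regressors; the correlation values are hypothetical and chosen to produce a classical suppression pattern (the second regressor is uncorrelated with the dependent variable but correlated with the first regressor).

```python
def r2_from_correlations(r_y1, r_y2, r_12):
    """Squared multiple correlation for two standardized regressors.

    r_y1, r_y2: correlations of the dependent variable with each regressor.
    r_12: correlation between the two regressors (must satisfy |r_12| < 1).
    """
    return (r_y1**2 + r_y2**2 - 2.0 * r_y1 * r_y2 * r_12) / (1.0 - r_12**2)

# Hypothetical correlations for illustration only.
r_y1, r_y2, r_12 = 0.5, 0.0, 0.6

r2 = r2_from_correlations(r_y1, r_y2, r_12)
sum_sq = r_y1**2 + r_y2**2
exceeds = r2 > sum_sq

print(r2, sum_sq, exceeds)
```

With uncorrelated regressors (r_12 = 0) the identity reduces to the plain sum of squared correlations, so any excess comes from the correlation between the regressors.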
Persson, Bertil
2014-01-01
The aim of the study was to examine relationships between psychosocial family- and school environment and personality as assessed by the Junior Eysenck Personality Questionnaire (EPQ-J) and possible personality interactional effects. The study was based on 244 Swedish girls and boys, 10-19 years old, who filled in the Family- and School Psychosocial Environment (FSPE) questionnaire and the EPQ-J. A multiple regression analysis showed that the FSPE-factor Family conflicts and school discipline...
A Dirty Model for Multiple Sparse Regression
Jalali, Ali; Sanghavi, Sujay
2011-01-01
Sparse linear regression -- finding an unknown vector from linear measurements -- is now known to be possible with fewer samples than variables, via methods like the LASSO. We consider the multiple sparse linear regression problem, where several related vectors -- with partially shared support sets -- have to be recovered. A natural question in this setting is whether one can use the sharing to further decrease the overall number of samples required. A line of recent research has studied the use of \ell_1/\ell_q norm block-regularizations with q>1 for such problems; however these could actually perform worse in sample complexity -- vis-à-vis solving each problem separately ignoring sharing -- depending on the level of sharing. We present a new method for multiple sparse linear regression that can leverage support and parameter overlap when it exists, but not pay a penalty when it does not. A very simple idea: we decompose the parameters into two components and regularize these differently. We show both theore...
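The decomposition idea can be written as an optimization problem. The display below is one plausible form, given here as an assumption because the abstract does not state which norms are used: the parameter matrix is split as S + B, with an elementwise \ell_1 penalty on S (entries specific to one task) and an \ell_1/\ell_\infty block penalty on B (rows shared across tasks). The symbols r (number of tasks), n (samples per task), \lambda_s and \lambda_b are introduced only for illustration.

```latex
\min_{S,\,B}\;
\frac{1}{2n}\sum_{k=1}^{r}
\bigl\| y^{(k)} - X^{(k)}\bigl(s^{(k)} + b^{(k)}\bigr) \bigr\|_2^2
\;+\; \lambda_s \,\| S \|_{1,1}
\;+\; \lambda_b \,\| B \|_{1,\infty},
\qquad
\hat{\Theta} = \hat{S} + \hat{B}
```

Here s^{(k)} and b^{(k)} denote the k-th columns of S and B. Under this reading, setting B = 0 recovers separate LASSO problems and setting S = 0 recovers pure block regularization, which matches the abstract's claim of not paying a penalty when sharing is absent.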
Liu, Bilan; Qiu, Xing; Zhu, Tong; Tian, Wei; Hu, Rui; Ekholm, Sven; Schifitto, Giovanni; Zhong, Jianhui
2016-03-01
Subject-specific longitudinal DTI study is vital for investigation of pathological changes of lesions and disease evolution. Spatial Regression Analysis of Diffusion tensor imaging (SPREAD) is a non-parametric permutation-based statistical framework that combines spatial regression and resampling techniques to achieve effective detection of localized longitudinal diffusion changes within the whole brain at individual level without a priori hypotheses. However, boundary blurring and dislocation limit its sensitivity, especially towards detecting lesions of irregular shapes. In the present study, we propose an improved SPREAD (dubbed improved SPREAD, or iSPREAD) method by incorporating a three-dimensional (3D) nonlinear anisotropic diffusion filtering method, which provides edge-preserving image smoothing through a nonlinear scale space approach. The statistical inference based on iSPREAD was evaluated and compared with the original SPREAD method using both simulated and in vivo human brain data. Results demonstrated that the sensitivity and accuracy of the SPREAD method has been improved substantially by adapting nonlinear anisotropic filtering. iSPREAD identifies subject-specific longitudinal changes in the brain with improved sensitivity, accuracy, and enhanced statistical power, especially when the spatial correlation is heterogeneous among neighboring image pixels in DTI.
S. CONDON
2014-06-01
The thermal inactivation of Enterococcus faecium under isothermal conditions in tryptic soy broth of different pH (4.0, 5.5 and 7.4) was studied. The bacterial cells were more sensitive at higher temperature and in media of low pH. Decimal reduction times at 71 °C were 2.56, 0.39 and 0.03 min at pH 7.4, 5.5 and 4.0, respectively. At all temperatures and pH values assayed, the survival curves obtained were linear. A mathematical model based on first-order kinetics accurately described these survival curves. The relationship between DT values and temperature was also linear. A mean z-value of 5 °C was established. A multiple linear regression model using four predictor variables (pH, T, pH² and T²) related the log of the DT value to pH and treatment temperature. The developed tertiary model satisfactorily predicted the heat inactivation of Enterococcus faecium under the treatment conditions investigated.
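The first-order survival model and the z-value can be combined in a short calculation. The sketch below uses the standard definitions (log10 survivors fall by one per D minutes; D falls tenfold per z degrees) together with the reported D value at 71 °C and pH 7.4. Applying the mean z-value of 5 °C at that single pH, and the choice of 66 °C as the target temperature, are assumptions made for illustration; the function names are not from the paper.

```python
def d_value_at(temp_c, d_ref_min, t_ref_c, z_c):
    """Decimal reduction time at temp_c, extrapolated from a reference D value.

    Uses log10 D(T) = log10 D_ref - (T - T_ref) / z.
    """
    return d_ref_min * 10.0 ** ((t_ref_c - temp_c) / z_c)

def log_reductions(time_min, d_min):
    """Number of decimal (log10) reductions after time_min under first-order kinetics."""
    return time_min / d_min

d71 = 2.56   # min, reported at 71 degrees C and pH 7.4
z = 5.0      # degrees C, reported mean z-value

# Extrapolate 5 degrees C downward: D should grow tenfold.
d66 = d_value_at(66.0, d71, 71.0, z)
print(d66)

# Holding for two D values at 71 degrees C gives two log10 reductions.
print(log_reductions(5.12, d71))
```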
Hukharnsusatrue, A.
2005-11-01
The objective of this research is to compare methods for estimating multiple regression coefficients in the presence of multicollinearity among independent variables. The estimation methods are the Ordinary Least Squares method (OLS), Restricted Least Squares method (RLS), Restricted Ridge Regression method (RRR) and Restricted Liu method (RL), compared both when the restrictions are true and when they are not true. The study used the Monte Carlo simulation method. The experiment was repeated 1,000 times under each situation. The results are as follows. CASE 1: The restrictions are true. In all cases, the RRR and RL methods have a smaller Average Mean Square Error (AMSE) than the OLS and RLS methods, respectively. The RRR method provides the smallest AMSE when the level of correlation is high, and also provides the smallest AMSE for all levels of correlation and all sample sizes when the standard deviation is equal to 5. However, the RL method provides the smallest AMSE when the level of correlation is low or medium, except that for a standard deviation equal to 3 and small sample sizes the RRR method provides the smallest AMSE. The AMSE increases with, from most to least influential, the level of correlation, the standard deviation and the number of independent variables, but varies inversely with sample size. CASE 2: The restrictions are not true. In all cases, the RRR method provides the smallest AMSE, except that for a standard deviation equal to 1 and an error of restrictions equal to 5%, the OLS method provides the smallest AMSE when the level of correlation is low or medium and the sample size is large, whereas for small sample sizes the RL method provides the smallest AMSE. In addition, when the error of restrictions is increased, the OLS method provides the smallest AMSE for all levels of correlation and all sample sizes, except when the level of correlation is high and sample sizes are small. 
Moreover, in the cases where the OLS method provides the smallest AMSE, the RLS method mostly has a smaller AMSE than the RRR and RL methods when the level of correlation is low or medium and sample sizes are large. The AMSE increases with, from most to least influential, the error of restrictions, the level of correlation, the standard deviation and the number of independent variables, but varies inversely with sample size, except that the error of restrictions does not affect the AMSE of the OLS method.
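The kind of Monte Carlo comparison described above can be sketched in a few lines. This example is a simplification and not the study's design: it compares OLS with ordinary (unrestricted) ridge regression only, omits the restricted estimators (RLS, RRR, RL), and uses hypothetical settings (three equicorrelated predictors, correlation 0.99, sample size 20, standard deviation 5, ridge constant k = 1, 500 replications).

```python
import numpy as np

def ols(X, y):
    """Ordinary least squares estimate (no intercept)."""
    return np.linalg.solve(X.T @ X, X.T @ y)

def ridge(X, y, k):
    """Ordinary ridge estimate with ridge constant k."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

def average_mse(estimator, beta, rho, n, sigma, reps, rng):
    """Average squared estimation error over reps simulated data sets."""
    p = len(beta)
    # Equicorrelated predictors: unit variances, correlation rho off the diagonal.
    cov = np.full((p, p), rho) + (1.0 - rho) * np.eye(p)
    chol = np.linalg.cholesky(cov)
    total = 0.0
    for _ in range(reps):
        X = rng.standard_normal((n, p)) @ chol.T
        y = X @ beta + sigma * rng.standard_normal(n)
        total += np.sum((estimator(X, y) - beta) ** 2)
    return total / reps

beta = np.ones(3)
settings = dict(beta=beta, rho=0.99, n=20, sigma=5.0, reps=500)

# The same seed gives both estimators identical simulated data sets.
amse_ols = average_mse(ols, rng=np.random.default_rng(0), **settings)
amse_ridge = average_mse(lambda X, y: ridge(X, y, k=1.0),
                         rng=np.random.default_rng(0), **settings)
print(amse_ols, amse_ridge)
```

Varying `rho`, `sigma` and `n` in `settings` reproduces the type of grid the abstract reports on, though the numbers here say nothing about the restricted estimators.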
Riccardi, M.; Mele, G.; Pulvento, C.; Lavini, A.; D'Andria, R.; Jacobsen, Sven-Erik
2014-01-01
Leaf chlorophyll content provides valuable information about physiological status of plants; it is directly linked to photosynthetic potential and primary production. In vitro assessment by wet chemical extraction is the standard method for leaf chlorophyll determination. This measurement is [...] components analysis of digital images acquired with a standard SLR camera. Digital images of leaves from different genotypes of quinoa and amaranth were acquired directly in the field. Mean values of each RGB component were evaluated via image analysis software and correlated to leaf chlorophyll provided by [...] standard laboratory procedure. Single and multiple regression models using RGB color components as independent variables have been tested and validated. The performance of the proposed method was compared to that of the widely used non-destructive SPAD method. Sensitivity of the best regression models for [...]
Paulo Canas Rodrigues
2011-12-01
This paper joins the main properties of joint regression analysis (JRA), a model based on the Finlay-Wilkinson regression to analyse multi-environment trials, and of the additive main effects and multiplicative interaction (AMMI) model. The study compares JRA and AMMI with particular focus on robustness with increasing amounts of randomly selected missing data. The application is made using a data set from a breeding program of durum wheat (Triticum turgidum L., Durum Group) conducted in Portugal. The two models yield similar dominant cultivars (JRA) and winners of mega-environments (AMMI) for the same environments. However, JRA had more stable results with the increase in the incidence rates of missing values.
Paulo Canas, Rodrigues; Dulce Gamito Santinhos, Pereira; João Tiago, Mexia.
2011-12-01
This paper joins the main properties of joint regression analysis (JRA), a model based on the Finlay-Wilkinson regression to analyse multi-environment trials, and of the additive main effects and multiplicative interaction (AMMI) model. The study compares JRA and AMMI with particular focus on robustness with increasing amounts of randomly selected missing data. The application is made using a data set from a breeding program of durum wheat (Triticum turgidum L., Durum Group) conducted in Portugal. The two models yield similar dominant cultivars (JRA) and winners of mega-environments (AMMI) for the same environments. However, JRA had more stable results with the increase in the incidence rates of missing values.
Entrepreneurial intention modeling using hierarchical multiple regression
Marina Jeger
2014-12-01
The goal of this study is to identify the contribution of effectuation dimensions to the predictive power of the entrepreneurial intention model over and above that which can be accounted for by other predictors selected and confirmed in previous studies. As is often the case in social and behavioral studies, some variables are likely to be highly correlated with each other. Therefore, the relative amount of variance in the criterion variable explained by each of the predictors depends on several factors such as the order of variable entry and sample specifics. The results show the modest predictive power of two dimensions of effectuation prior to the introduction of the theory of planned behavior elements. The article highlights the main advantages of applying hierarchical regression in social sciences as well as in the specific context of entrepreneurial intention formation, and addresses some of the potential pitfalls that this type of analysis entails.
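The "over and above" contribution in hierarchical regression is usually reported as the change in R² when a block of predictors is entered after another block, with an F statistic for that change. The sketch below shows the calculation on synthetic data; the two blocks, their coefficients and the function names are hypothetical and do not correspond to the study's variables.

```python
import numpy as np

def r2(X, y):
    """R-squared of an OLS fit of y on X with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def r2_change(X_base, X_added, y):
    """R-squared before and after adding a block, plus the F statistic for the change."""
    r2_base = r2(X_base, y)
    r2_full = r2(np.column_stack([X_base, X_added]), y)
    n = len(y)
    q = X_added.shape[1]                   # predictors added in this step
    k_full = X_base.shape[1] + q           # predictors in the full model
    f_change = ((r2_full - r2_base) / q) / ((1.0 - r2_full) / (n - k_full - 1))
    return r2_base, r2_full, f_change

# Synthetic data: block 1 entered first, block 2 entered second.
rng = np.random.default_rng(42)
n = 200
block1 = rng.standard_normal((n, 2))
block2 = rng.standard_normal((n, 2))
y = (block1 @ np.array([0.5, 0.3])
     + block2 @ np.array([0.4, 0.0])
     + rng.standard_normal(n))

r2_base, r2_full, f_change = r2_change(block1, block2, y)
print(r2_base, r2_full, f_change)
```

When the blocks are correlated, swapping the order of entry changes the R² credited to each block, which is the order-of-entry dependence the abstract warns about.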
Polynomial regression analysis and significance test of the regression function
In order to analyze the decay heating power of a certain radioactive isotope per kilogram with the polynomial regression method, the paper first demonstrates the broad usage of polynomial functions and derives their parameters by ordinary least squares estimation. A significance test for the polynomial regression function is then derived from the similarity between the polynomial regression model and the multivariable linear regression model. Finally, polynomial regression analysis and the significance test of the polynomial function are applied to the decay heating power of the isotope per kilogram, in accordance with the authors' real work. (authors)
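Treating the powers of the regressor as separate columns turns polynomial regression into a multiple linear regression, so the usual overall F statistic applies. The sketch below fits a polynomial by ordinary least squares and computes that statistic; the data are synthetic (a noisy quadratic, not decay-heat measurements) and the function name `polynomial_f_test` is hypothetical. Comparing the statistic with an F critical value for (degree, n - degree - 1) degrees of freedom is left out to keep the example dependent on NumPy only.

```python
import numpy as np

def polynomial_f_test(x, y, degree):
    """Fit a polynomial by OLS and return (coefficients, overall F statistic).

    Coefficients are ordered from the constant term upward.
    """
    X = np.vander(x, degree + 1, increasing=True)   # columns 1, x, x^2, ...
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ coef
    ss_reg = np.sum((fitted - y.mean()) ** 2)       # regression sum of squares
    ss_res = np.sum((y - fitted) ** 2)              # residual sum of squares
    n = len(y)
    f_stat = (ss_reg / degree) / (ss_res / (n - degree - 1))
    return coef, f_stat

# Synthetic noisy quadratic standing in for measured data.
rng = np.random.default_rng(7)
t = np.linspace(0.0, 5.0, 30)
power = 1.0 + 2.0 * t + 3.0 * t**2 + rng.normal(0.0, 0.1, t.size)

coef, f_stat = polynomial_f_test(t, power, degree=2)
print(coef, f_stat)
```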
Relationship between Multiple Regression and Selected Multivariable Methods.
Schumacker, Randall E.
The relationship of multiple linear regression to various multivariate statistical techniques is discussed. The importance of the standardized partial regression coefficient (beta weight) in multiple linear regression as it is applied in path, factor, LISREL, and discriminant analyses is emphasized. The multivariate methods discussed in this paper…
Multiple Retrieval Models and Regression Models for Prior Art Search
Lopez, Patrice
2009-01-01
This paper presents the system called PATATRAS (PATent and Article Tracking, Retrieval and AnalysiS) realized for the IP track of CLEF 2009. Our approach presents three main characteristics: 1. The usage of multiple retrieval models (KL, Okapi) and term index definitions (lemma, phrase, concept) for the three languages considered in the present track (English, French, German) producing ten different sets of ranked results. 2. The merging of the different results based on multiple regression models using an additional validation set created from the patent collection. 3. The exploitation of patent metadata and of the citation structures for creating restricted initial working sets of patents and for producing a final re-ranking regression model. As we exploit specific metadata of the patent documents and the citation relations only at the creation of initial working sets and during the final post ranking step, our architecture remains generic and easy to extend.
Chelgani, S.C.; Hart, B.; Grady, W.C.; Hower, J.C.
2011-01-01
The relationship between maceral content plus mineral matter and gross calorific value (GCV) for a wide range of West Virginia coal samples (from 6518 to 15330 BTU/lb; 15.16 to 35.66 MJ/kg) has been investigated by multivariable regression and adaptive neuro-fuzzy inference system (ANFIS). The stepwise least square mathematical method comparison between liptinite, vitrinite, plus mineral matter as input data sets with measured GCV reported a nonlinear correlation coefficient (R²) of 0.83. Using the same data set the correlation between the predicted GCV from the ANFIS model and the actual GCV reported an R² value of 0.96. It was determined that the GCV-based prediction methods, as used in this article, can provide a reasonable estimation of GCV. Copyright © Taylor & Francis Group, LLC.
Credit Scoring Problem Based on Regression Analysis
Khassawneh, Bashar Suhil Jad Allah
2014-01-01
ABSTRACT: This thesis provides an explanatory introduction to the regression models of data mining and contains basic definitions of key terms in the linear, multiple and logistic regression models. Meanwhile, the aim of this study is to illustrate fitting models for the credit scoring problem using simple linear, multiple linear and logistic regression models and also to analyze the found model functions by statistical tools. Keywords: Data mining, linear regression, logistic regression....
The problem of performing process capability analysis when autocorrelation is present is discussed. It is shown that when the systematic nonrandom phenomenon induced by autocorrelation is ignored, the variance estimate obtained from the original data is no longer an appropriate estimate for use in process capability analyses. A remedial measure based on an autoregressive integrated moving average model is proposed. It is also shown that the process variance estimated from the residual analysis yields appropriate results for the process capability indices.
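The contrast between the raw variance estimate and the residual-based estimate can be illustrated with a small simulation. This is only a sketch under stated assumptions: it uses a first-order autoregressive, AR(1), series rather than the full autoregressive integrated moving average model the abstract refers to, and the function name `ar1_residual_variance`, the coefficient 0.7, and the sample size are hypothetical choices made for illustration.

```python
import numpy as np

def ar1_residual_variance(x):
    """Fit x[t] = phi * x[t-1] + e[t] by least squares (illustrative AR(1) only).

    Returns the estimated phi and the variance of the fitted residuals.
    """
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()                                  # centre the series
    phi = (xc[1:] @ xc[:-1]) / (xc[:-1] @ xc[:-1])     # lag-1 least-squares slope
    resid = xc[1:] - phi * xc[:-1]                     # one-step-ahead residuals
    return phi, resid.var(ddof=1)

# Simulated autocorrelated process data (assumed parameters, not from the paper).
rng = np.random.default_rng(0)
n, phi_true = 5000, 0.7
e = rng.normal(0.0, 1.0, n)
x = np.empty(n)
x[0] = e[0]
for t in range(1, n):
    x[t] = phi_true * x[t - 1] + e[t]

phi_hat, s2_resid = ar1_residual_variance(x)
s2_raw = x.var(ddof=1)   # variance computed from the original data, ignoring autocorrelation
print(f"phi_hat={phi_hat:.3f}  raw variance={s2_raw:.3f}  residual variance={s2_resid:.3f}")
```

With positive autocorrelation the two estimates differ noticeably, so any capability index built on the standard deviation changes depending on which estimate is plugged in.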
A retrospective analysis of 965 patients with invasive cervix cancer treated by radiation therapy between 1976 and 1981 was performed in order to evaluate prognostic factors for disease-free survival (DFS) and pelvic control. FIGO stage was the most powerful prognostic factor followed by radiation dose and treatment duration (P values = 0.0001). If the analysis was limited to patients treated with radical doses of 75 Gy or more, dose was no longer significant. Young age at diagnosis, non-squamous histology and transfusion during treatment were also adverse prognostic factors for survival and control. Para-aortic nodal involvement on lymphogram was associated with a reduction in DFS (P = 0.0027), whereas pelvic lymph node involvement alone was not. In patients with Stage I and IIA disease, tumour size was the most powerful prognostic factor for survival (P = 0.0001) and the extent of pelvic sidewall involvement was significant in patients with Stage III tumours (P = 0.007). Histological grade appeared to be a predictive factor but was only recorded in 712 patients. These features should be considered in the staging of patients and in the design of clinical trials
An Additive-Multiplicative Cox-Aalen Regression Model
Scheike, Thomas H.; Zhang, Mei-Jie
Aalen model; additive risk model; counting processes; Cox regression; survival analysis; time-varying effects
Retail sales forecasting with application the multiple regression
Kuzhda, Tetyana
2012-05-01
The article begins with a formulation for predictive learning called the multiple regression model. A theoretical approach to the construction of regression models is described. The key contribution of the article is the mathematical formulation of the linear forecast equation that estimates the multiple regression model. The calculation of the forecast value of the dependent variable under the influence of the independent variables is explained. The paper presents retail sales forecasting with multiple regression model estimation. Information obtained from multiple regression can support some of the most important decisions a retailer makes. Recently, the retail environment has been changing under the influence of expected consumer income and advertising costs. Checks of the model's goodness of fit and statistical significance are explored in the article. Finally, the quantitative value of the retail sales forecast based on the multiple regression model is calculated.
Riccardi, M; Mele, G; Pulvento, C; Lavini, A; d'Andria, R; Jacobsen, S-E
2014-06-01
Leaf chlorophyll content provides valuable information about physiological status of plants; it is directly linked to photosynthetic potential and primary production. In vitro assessment by wet chemical extraction is the standard method for leaf chlorophyll determination. This measurement is expensive, laborious, and time consuming. Over the years alternative methods, rapid and non-destructive, have been explored. The aim of this work was to evaluate the applicability of a fast and non-invasive field method for estimation of chlorophyll content in quinoa and amaranth leaves based on RGB components analysis of digital images acquired with a standard SLR camera. Digital images of leaves from different genotypes of quinoa and amaranth were acquired directly in the field. Mean values of each RGB component were evaluated via image analysis software and correlated to leaf chlorophyll provided by standard laboratory procedure. Single and multiple regression models using RGB color components as independent variables have been tested and validated. The performance of the proposed method was compared to that of the widely used non-destructive SPAD method. Sensitivity of the best regression models for different genotypes of quinoa and amaranth was also checked. Color data acquisition of the leaves in the field with a digital camera was quick, more effective, and lower cost than SPAD. The proposed RGB models provided better correlation (highest R²) and prediction (lowest RMSEP) of the true value of foliar chlorophyll content and had a lower amount of noise in the whole range of chlorophyll studied compared with SPAD and other leaf image processing based models when applied to quinoa and amaranth. PMID:24442792
The use of multiple linear regression in property valuation
Marko Pejić
2013-05-01
Property appraisal is of great importance for a country and its economy. Nowadays, a successful land management system cannot be imagined without a subsystem related to the market economy. Information about land and its value offers broad possibilities for the market economy and strongly influences the development of the real estate market. Special attention should be paid to mass appraisal methods and their use in developing the tax system and a framework for an appropriate property appraisal system. Multiple regression analysis is just one of the methods used for this purpose, and this article focuses on its characteristics and advantages in the development of mass appraisal systems.
Regression analysis with categorized regression calibrated exposure: some interesting findings
Hjartåker Anette
2006-07-01
Background: Regression calibration as a method for handling measurement error is becoming increasingly well known and used in epidemiologic research. However, the standard version of the method is not appropriate for exposure analyzed on a categorical (e.g. quintile) scale, an approach commonly used in epidemiologic studies. A tempting solution could then be to use the predicted continuous exposure obtained through the regression calibration method and treat it as an approximation to the true exposure, that is, include the categorized calibrated exposure in the main regression analysis. Methods: We use semi-analytical calculations and simulations to evaluate the performance of the proposed approach compared to the naive approach of not correcting for measurement error, in situations where analyses are performed on quintile scale and when incorporating the original scale into the categorical variables, respectively. We also present analyses of real data, containing measures of folate intake and depression, from the Norwegian Women and Cancer study (NOWAC). Results: In cases where extra information is available through replicated measurements and not validation data, regression calibration does not maintain important qualities of the true exposure distribution, thus estimates of variance and percentiles can be severely biased. We show that the outlined approach maintains much, in some cases all, of the misclassification found in the observed exposure. For that reason, regression analysis with the corrected variable included on a categorical scale is still biased. In some cases the corrected estimates are analytically equal to those obtained by the naive approach. Regression calibration is however vastly superior to the naive method when applying the medians of each category in the analysis.
Conclusion: Regression calibration in its most well-known form is not appropriate for measurement error correction when the exposure is analyzed on a percentile scale. Relating back to the original scale of the exposure solves the problem. The conclusion applies to all regression models.
Nakagawa, S. [Maizuru National College of Technology, Kyoto (Japan); Kenmoku, Y.; Sakakibara, T. [Toyohashi University of Technology, Aichi (Japan); Kawamoto, T. [Shizuoka University, Shizuoka (Japan). Faculty of Engineering
1996-10-27
A study is under way on more accurate prediction of solar radiation quantity to enhance solar energy utilization efficiency. Using a technique that roughly estimates the day's clearness index from the weather forecast, the forecast (made up of weather conditions such as 'clear' and 'cloudy' and modifiers such as 'afterward', 'temporary', and 'intermittent') has been quantified relative to the clearness index. This index is named the 'weather index' for the purpose of this article. The error rate of the weather index is high on cloudy days, that is, when the weather index falls in the range 0.2-0.5. It has also been found that there is a high correlation between the clearness index and the north-south wind direction component. Accordingly, a multiple regression analysis has been carried out to estimate the clearness index from the maximum temperature and the north-south wind direction component. Compared with estimation of the clearness index from the weather index alone, estimation using the weather index and maximum temperature achieves a 3% improvement throughout the year. It has also been found that estimation using the weather index and the north-south wind direction component gives a 2% improvement for summer and an improvement of 5% or more for winter. 2 refs., 6 figs., 4 tabs.
Valente, Andrea; Bürki, Audrey; Laganaro, Marina
2014-01-01
A major effort in cognitive neuroscience of language is to define the temporal and spatial characteristics of the core cognitive processes involved in word production. One approach consists in studying the effects of linguistic and pre-linguistic variables in picture naming tasks. So far, studies have analyzed event-related potentials (ERPs) during word production by examining one or two variables with factorial designs. Here we extended this approach by investigating simultaneously the effects of multiple theoretical relevant predictors in a picture naming task. High density EEG was recorded on 31 participants during overt naming of 100 pictures. ERPs were extracted on a trial by trial basis from picture onset to 100 ms before the onset of articulation. Mixed-effects regression models were conducted to examine which variables affected production latencies and the duration of periods of stable electrophysiological patterns (topographic maps). Results revealed an effect of a pre-linguistic variable, visual complexity, on an early period of stable electric field at scalp, from 140 to 180 ms after picture presentation, a result consistent with the proposal that this time period is associated with visual object recognition processes. Three other variables, word Age of Acquisition, Name Agreement, and Image Agreement influenced response latencies and modulated ERPs from ~380 ms to the end of the analyzed period. These results demonstrate that a topographic analysis fitted into the single trial ERPs and covering the entire processing period allows one to associate the cost generated by psycholinguistic variables to the duration of specific stable electrophysiological processes and to pinpoint the precise time-course of multiple word production predictors at once. PMID:25538546
Virués-Ortega, Javier
2010-06-01
A number of clinical trials and single-subject studies have been published measuring the effectiveness of long-term, comprehensive applied behavior analytic (ABA) intervention for young children with autism. However, the overall appreciation of this literature through standardized measures has been hampered by the varying methods, designs, treatment features and quality standards of published studies. In an attempt to fill this gap in the literature, state-of-the-art meta-analytical methods were implemented, including quality assessment, sensitivity analysis, meta-regression, dose-response meta-analysis and meta-analysis of studies of different metrics. Results suggested that long-term, comprehensive ABA intervention leads to (positive) medium to large effects in terms of intellectual functioning, language development, acquisition of daily living skills and social functioning in children with autism. Although favorable effects were apparent across all outcomes, language-related outcomes (IQ, receptive and expressive language, communication) were superior to non-verbal IQ, social functioning and daily living skills, with effect sizes approaching 1.5 for receptive and expressive language and communication skills. Dose-dependent effect sizes were apparent by levels of total treatment hours for language and adaptation composite scores. Methodological issues relating to ABA clinical trials for autism are discussed. PMID:20223569
Multiple regression analyses in the prediction of aerospace instrument costs
Tran, Linh
The aerospace industry has been investing for decades in ways to improve its efficiency in estimating the project life cycle cost (LCC). One of the major focuses in the LCC is the cost prediction of aerospace instruments done during the early conceptual design phase of the project. The accuracy of early cost predictions affects the project scheduling and funding, and it is often the major cause of project cost overruns. The prediction of instrument cost is based on the statistical analysis of these independent variables: Mass (kg), Power (watts), Instrument Type, Technology Readiness Level (TRL), Destination: earth orbiting or planetary, Data rates (kbps), Number of bands, Number of channels, Design life (months), and Development duration (months). The author proposes a cost prediction approach for aerospace instruments based on these statistical analyses: Clustering Analysis, Principal Components Analysis (PCA), Bootstrap, and multiple regressions (both linear and non-linear). In the proposed approach, the Cost Estimating Relationship (CER) will be developed for the dependent variable Instrument Cost by using a combination of multiple independent variables. "The Full Model" will be developed and executed to estimate the full set of nine variables. The SAS program, Excel, Automatic Cost Estimating Integrate Tool (ACEIT) and Minitab are the tools to aid the analysis. Through the analysis, the cost drivers will be identified, which will help develop an ultimate cost estimating software tool for Instrument Cost prediction and optimization of future missions.
Multiple kernel support vector regression for pricing nifty option
Neetu Verma
2015-09-01
The goal of the present experiments is to investigate the use of multiple kernel learning as a tool for pricing options in the context of the Indian stock market for Nifty index options. In this paper, the fair price of an option is predicted by Multiple Kernel Support Vector Regression (MKLSVR), using linear combinations of kernels, and by Single Kernel Support Vector Regression (SKSVR). Option prices depend strongly on different market conditions such as deep-in-the-money, in-the-money, at-the-money, out-of-money and deep-out-of-money. The experimental study attempts to identify the forecasting errors, using the mean square error, root mean square error, and normalized root mean square error between the market option prices and the option prices calculated by the model, for all market conditions. The results show that multiple kernel support vector regression performed fairly well in comparison to support vector regression with a single kernel.
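The three error measures named in this abstract can be sketched as follows. The abstract does not say how the root mean square error is normalized, so dividing by the range of the market prices is an assumption here (normalizing by the mean or the standard deviation is also common); the function name `pricing_errors` and the sample prices are hypothetical.

```python
import numpy as np

def pricing_errors(market, model):
    """Return (MSE, RMSE, NRMSE) between market and model option prices.

    NRMSE is RMSE divided by the range of the market prices; this
    normalization is an assumption, not taken from the paper.
    """
    market = np.asarray(market, dtype=float)
    model = np.asarray(model, dtype=float)
    mse = np.mean((market - model) ** 2)
    rmse = np.sqrt(mse)
    nrmse = rmse / (market.max() - market.min())
    return mse, rmse, nrmse

# Made-up prices, for illustration only.
mse, rmse, nrmse = pricing_errors([10.0, 12.0, 14.0], [11.0, 12.0, 13.0])
print(mse, rmse, nrmse)
```

In a study of this kind the function would be applied separately to each market condition so that the errors can be compared across conditions.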
Vehicle Travel Time Predication based on Multiple Kernel Regression
Wenjing Xu
2014-07-01
With the rapid development of the transportation and logistics economy, vehicle travel time prediction and planning have become an important topic in logistics. Travel time prediction, which is indispensable for traffic guidance, has become a key issue for researchers in this field. At present, travel time prediction is mainly short-term, and the prediction methods include artificial neural networks, the Kalman filter and support vector regression (SVR). However, these algorithms still have some shortcomings, such as high computational complexity and slow convergence rate. This paper exploits the learning ability of multiple kernel learning regression (MKLR) in nonlinear prediction and applies MKLR-based logistics planning to vehicle travel time prediction. The method for vehicle travel time prediction includes the following steps: (1) preprocessing historical data; (2) selecting an appropriate kernel function, training on the historical data and performing analysis; (3) predicting the vehicle travel time based on the trained model. The experimental results show that, across the different prediction methods analyzed, the vehicle travel time prediction method proposed in this paper achieves higher accuracy than the other methods. They also illustrate the feasibility and effectiveness of the proposed prediction method.
Steganalysis of LSB Image Steganography using Multiple Regression and Auto Regressive (AR) Model
Souvik Bhattacharyya
2011-07-01
The staggering growth in communication technology and usage of public domain channels (i.e. the Internet) has greatly facilitated transfer of data. However, such open communication channels have greater vulnerability to security threats causing unauthorized information access. Traditionally, encryption is used to realize communication security. However, important information is not protected once decoded. Steganography is the art and science of communicating in a way which hides the existence of the communication. Important information is first hidden in host data, such as a digital image, text, video or audio, and then transmitted secretly to the receiver. Steganalysis is another important topic in information hiding, which is the art of detecting the presence of steganography. In this paper a novel technique for the steganalysis of images has been presented. The proposed technique uses an auto-regressive model to detect the presence of the hidden messages, as well as to estimate the relative length of the embedded messages. Various auto-regressive parameters are used to classify cover images as well as stego images with the help of an SVM classifier. Multiple regression analysis of the cover carrier along with the stego carrier has been carried out in order to detect the existence of even a negligible amount of the secret message. Experimental results demonstrate the effectiveness and accuracy of the proposed technique.
Relative risk regression analysis of epidemiologic data.
Prentice, R. L.
1985-01-01
Relative risk regression methods are described. These methods provide a unified approach to a range of data analysis problems in environmental risk assessment and in the study of disease risk factors more generally. Relative risk regression methods are most readily viewed as an outgrowth of Cox's regression and life model. They can also be viewed as a regression generalization of more classical epidemiologic procedures, such as that due to Mantel and Haenszel. In the context of an epidemiolog...
Bry, Xavier; Cazes, Pierre
2008-01-01
A variable group Y is assumed to depend upon R thematic variable groups X_1, ..., X_R. We assume that components in Y depend linearly upon components in the X_r's. In this work, we propose a multiple covariance criterion which extends that of PLS regression to this multiple predictor groups situation. On this criterion, we build a PLS-type exploratory method, Structural Equation Exploratory Regression (SEER), that makes it possible to simultaneously perform dimension reduction in groups and investigate the linear model of the components. SEER uses the multidimensional structure of each group. An application example is given.
Schaeck, S.; Karspeck, T.; Ott, C.; Weirather-Koestner, D.; Stoermer, A. O.
2011-03-01
In the first part of this work [1] a field operational test (FOT) on micro-HEVs (hybrid electric vehicles) and conventional vehicles was introduced. Valve-regulated lead-acid (VRLA) batteries in absorbent glass mat (AGM) technology and flooded batteries were applied. The FOT data were analyzed by kernel density estimation. In this publication multiple regression analysis is applied to the same data. Square regression models without interdependencies are used. Hereby, capacity loss serves as dependent parameter and several battery-related and vehicle-related parameters as independent variables. Battery temperature is found to be the most critical parameter. It is proven that flooded batteries operated in the conventional power system (CPS) degrade faster than VRLA-AGM batteries in the micro-hybrid power system (MHPS). A smaller number of FOT batteries were applied in a vehicle-assigned test design where the test battery is repeatedly mounted in a unique test vehicle. Thus, vehicle category and specific driving profiles can be taken into account in multiple regression. Both parameters have only secondary influence on battery degradation, instead, extended vehicle rest time linked to low mileage performance is more serious. A tear-down analysis was accomplished for selected VRLA-AGM batteries operated in the MHPS. Clear indications are found that pSoC-operation with periodically fully charging the battery (refresh charging) does not result in sulphation of the negative electrode. Instead, the batteries show corrosion of the positive grids and weak adhesion of the positive active mass.
Computing multiple-output regression quantile regions from projection quantiles
Paindaveine, D.; Šiman, Miroslav
2012-01-01
Vol. 27, No. 1 (2012), pp. 29-49. ISSN 0943-4062. R&D Projects: GA MŠk(CZ) 1M06047. Institutional research plan: CEZ:AV0Z10750506. Keywords: directional quantile; halfspace depth; multiple-output regression; parametric programming; quantile regression. Subject RIV: BA - General Mathematics. Impact factor: 0.482, year: 2012. http://library.utia.cas.cz/separaty/2012/SI/siman-0376414.pdf
Elliptical multiple-output quantile regression and convex optimization
Hallin, M.; Šiman, Miroslav
2016-01-01
Vol. 109, No. 1 (2016), pp. 232-237. ISSN 0167-7152. R&D Projects: GA ČR GA14-07234S. Institutional support: RVO:67985556. Keywords: quantile regression; elliptical quantile; multivariate quantile; multiple-output regression. Subject RIV: BA - General Mathematics. Impact factor: 0.595, year: 2014. http://library.utia.cas.cz/separaty/2016/SI/siman-0458243.pdf
On the adaptive estimation of a multiplicative separable regression function
Chesneau, Christophe
2013-01-01
We investigate the estimation of a multiplicative separable regression function from a bi-dimensional nonparametric regression model with random design. We present a general estimator for this problem and study its mean integrated squared error (MISE) properties. A wavelet version of this estimator is developed. In some situations, we prove that it attains the standard unidimensional rate of convergence under the MISE over Besov balls.
Regression Analysis of Soil Compressibility
LAV, M. Ayşen; ANSAL, Atilla M.
2001-01-01
A detailed study was carried out to determine the correlations between consolidation properties such as compression index, overconsolidation ratio and various index properties based on test results obtained from 300 soil samples. All of the tests were conducted in the I.T.U. Soil Mechanics Laboratory on samples taken from different construction sites distributed throughout Turkey during the last forty years. Different regression models were utilized and the most suitable relat...
Applied regression analysis a research tool
Pantula, Sastry; Dickey, David
1998-01-01
Least squares estimation, when used appropriately, is a powerful research tool. A deeper understanding of the regression concepts is essential for achieving optimal benefits from a least squares analysis. This book builds on the fundamentals of statistical methods and provides appropriate concepts that will allow a scientist to use least squares as an effective research tool. Applied Regression Analysis is aimed at the scientist who wishes to gain a working knowledge of regression analysis. The basic purpose of this book is to develop an understanding of least squares and related statistical methods without becoming excessively mathematical. It is the outgrowth of more than 30 years of consulting experience with scientists and many years of teaching an applied regression course to graduate students. Applied Regression Analysis serves as an excellent text for a service course on regression for non-statisticians and as a reference for researchers. It also provides a bridge between a two-semester introduction to...
Hierarchical Regression Analysis in Structural Equation Modeling.
de Jong, Peter F.
1999-01-01
Describes how a hierarchical regression analysis may be conducted in structural equation modeling. The main procedure is to perform a Cholesky or triangular decomposition of the intercorrelations among the latest predictors. Provides an example of a hierarchical regression analysis with latent variables. (SLD)
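The triangular decomposition mentioned here can be illustrated on observed (not latent) variables. The sketch below is not the article's structural equation procedure; it only shows, under that simplifying assumption, how a Cholesky factor of the predictor correlation matrix yields orthogonal components whose squared correlations with the outcome are the incremental R² values of a hierarchical regression in the given entry order. The function names and the simulated data are hypothetical.

```python
import numpy as np

def incremental_r2(X, y):
    """Incremental R^2 for each predictor, entered in column order.

    Observed-variable illustration only; the article applies the idea
    to latent predictors inside a structural equation model.
    """
    n = len(y)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardized predictors
    zy = (y - y.mean()) / y.std(ddof=1)                # standardized outcome
    R = (Z.T @ Z) / (n - 1)                            # predictor intercorrelations
    L = np.linalg.cholesky(R)                          # R = L @ L.T, L lower triangular
    W = np.linalg.solve(L, Z.T).T                      # W = Z @ inv(L).T, orthonormal columns
    r = (W.T @ zy) / (n - 1)                           # correlation of y with each component
    return r ** 2

def full_r2(X, y):
    """R^2 of the ordinary least-squares fit with all predictors."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)

# Simulated correlated predictors (made-up data for illustration).
rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = 0.6 * x1 + rng.normal(size=200)
x3 = 0.3 * x1 + 0.3 * x2 + rng.normal(size=200)
X = np.column_stack([x1, x2, x3])
y = 1.0 * x1 + 0.5 * x2 + 0.2 * x3 + rng.normal(size=200)

steps = incremental_r2(X, y)
print(steps, steps.sum(), full_r2(X, y))
```

Because the components are mutually orthogonal, the increments add up to the R² of the full model, and reordering the columns of `X` changes how that total is split among the predictors.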
Regression Analysis and the Sociological Imagination
De Maio, Fernando
2014-01-01
Regression analysis is an important aspect of most introductory statistics courses in sociology but is often presented in contexts divorced from the central concerns that bring students into the discipline. Consequently, we present five lesson ideas that emerge from a regression analysis of income inequality and mortality in the USA and Canada.
Hecht, Jeffrey B.
The analysis of regression residuals and detection of outliers are discussed, with emphasis on determining how deviant an individual data point must be to be considered an outlier and the impact that multiple suspected outlier data points have on the process of outlier determination and treatment. Only bivariate (one dependent and one independent)…
Testing Genetic Association by Regressing Genotype over Multiple Phenotypes
Wang, Kai
2014-01-01
Complex disorders are typically characterized by multiple phenotypes. Analyzing these phenotypes jointly is expected to be more powerful than dealing with one of them at a time. A recent approach (O'Reilly et al. 2012) is to regress the genotype at a SNP marker on multiple phenotypes and apply the proportional odds model. In the current research, we introduce an explicit expression for the score test statistic and its non-centrality parameter that determines its power. Some simulation studies...
Interpreting Multiple Linear Regression: A Guidebook of Variable Importance
Nathans, Laura L.; Oswald, Frederick L.; Nimon, Kim
2012-01-01
Multiple regression (MR) analyses are commonly employed in social science fields. It is also common for interpretation of results to typically reflect overreliance on beta weights, often resulting in very limited interpretations of variable importance. It appears that few researchers employ other methods to obtain a fuller understanding of what…
Strategies for Identification and Detection of Outliers in Multiple Regression.
Vannoy, Martha
Outliers are frequently found in data sets and can cause problems for researchers if not addressed. Failure to identify and deal with outliers in an appropriate manner may lead researchers to report erroneous results. Using a multiple regression context, this paper examines some of the reasons for the presence of outliers and simple methods for
Halil Ibrahim Cebeci
2009-12-01
This study explores the relationship between student performance and instructional design. The research was conducted at the E-Learning School at a university in Turkey. A list of design factors that had potential influence on student success was created through a review of the literature and interviews with relevant experts. From this, the five most important design factors were chosen. The experts scored 25 university courses on the extent to which they demonstrated the chosen design factors. Multiple-regression and supervised artificial neural network (ANN) models were used to examine the relationship between student grade point averages and the scores on the five design factors. The results indicated that there is no statistical difference between the two models. Both models identified the use of examples and applications as the most influential factor. The ANN model provided more information and was used to predict the course-specific factor values required for a desired level of success.
Linear regression analysis theory and computing
Yan, Xin
2009-01-01
This volume presents in detail the fundamental theories of linear regression analysis and diagnosis, as well as the relevant statistical computing techniques so that readers are able to actually model the data using the methods and techniques described in the book. It covers the fundamental theories in linear regression analysis and is extremely useful for future research in this area. The examples of regression analysis using the Statistical Application System (SAS) are also included. This book is suitable for graduate students who are either majoring in statistics/biostatistics or using line
Chen Su-Fen
2013-01-01
Unified Multiple Linear Regression (UMLR) is a nonlinear programming model that unifies all kinds of multiple linear regression models, such as Principal Components Regression, Ridge Regression, Robust Regression and constrained regression. Although UMLR has exhibited excellent performance in some real applications, its optimization procedure is not yet satisfactory. This study proposes a novel Granular Computing-Particle Swarm Optimization (Grc-PSO) algorithm by ...
Estimating R-squared Shrinkage in Multiple Regression: A Comparison of Different Analytical Methods.
Yin, Ping; Fan, Xitao
2001-01-01
Studied the effectiveness of various analytical formulas for estimating "R" squared shrinkage in multiple regression analysis, focusing on estimators of the squared population multiple correlation coefficient and the squared population cross validity coefficient. Simulation results suggest that the most widely used Wherry (R. Wherry, 1931) formula
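As a rough illustration of the kind of analytical shrinkage correction compared in this study, the sketch below implements the adjusted R-squared form commonly attributed to Wherry. Which variant the abstract means by "the Wherry formula" is an assumption here, and the function name and arguments are hypothetical.

```python
def wherry_adjusted_r2(r2: float, n: int, p: int) -> float:
    """Shrinkage-adjusted R^2 for a sample R^2 from n cases and p predictors.

    Uses the common form 1 - (1 - R^2) * (n - 1) / (n - p - 1).
    Assumption: this is the variant meant in the abstract above.
    """
    if not 0.0 <= r2 <= 1.0:
        raise ValueError("r2 must lie in [0, 1]")
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1 for the correction to be defined")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
```

With a sample R-squared of 0.5 from 21 cases and 4 predictors, the correction factor is 20/16, so the adjusted value drops to 0.375; the shrinkage grows as the number of predictors approaches the sample size.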
Multiple linear regression estimators with skew normal errors
Alhamide, A. A.; Ibrahim, K.; Alodat, M. T.
2015-09-01
The skew normal distribution is suitable for the analysis of skewed data. The purpose of this paper is to study the estimation of the regression parameters under extended multivariate skew normal errors. The estimators for the regression parameters are derived using the maximum likelihood method. A simulation study is carried out to investigate the performance of the derived estimators, and the standard errors associated with the respective parameter estimates are found to be quite small.
General Dimensional Multiple-Output Support Vector Regressions and Their Multiple Kernel Learning.
Chung, Wooyong; Kim, Jisu; Lee, Heejin; Kim, Euntai
2015-11-01
Support vector regression has been considered as one of the most important regression or function approximation methodologies in a variety of fields. In this paper, two new general dimensional multiple output support vector regressions (MSVRs) named SOCPL1 and SOCPL2 are proposed. The proposed methods are formulated in the dual space and their relationship with the previous works is clearly investigated. Further, the proposed MSVRs are extended into the multiple kernel learning and their training is implemented by the off-the-shelf convex optimization tools. The proposed MSVRs are applied to benchmark problems and their performances are compared with those of the previous methods in the experimental section. PMID:25532215
Random design analysis of ridge regression
Hsu, Daniel; Kakade, Sham M.; Zhang, Tong
2011-01-01
This work gives a simultaneous analysis of both the ordinary least squares estimator and the ridge regression estimator in the random design setting under mild assumptions on the covariate/response distributions. In particular, the analysis provides sharp results on the "out-of-sample" prediction error, as opposed to the "in-sample" (fixed design) error. The analysis also reveals the effect of errors in the estimated covariance structure, as well as the effect of modeling errors, neither ...
Numerical analysis of robust regression methods
Robust estimates of linear regression parameters are considered. Numerical analysis of these methods and their modifications for problems of particle track recognition is performed. For cases when disbalance points can appear, a special method of studentizing the residual vector is proposed. This method is investigated by theoretical and graphical means. The numerical characteristics of the methods, obtained by Monte Carlo simulation in various models, are summarized in tables. Some recommendations for using robust regression methods in mass data processing are given.
M. Srinivasan
2012-01-01
Problem statement: This study presents a novel method for the determination of the average winding temperature rise of transformers under predetermined field operating conditions. The rise in winding temperature was determined from the estimated values of winding resistance during the heat run test conducted as per the IEC standard. Approach: The estimation of hot resistance was modeled using Multiple Variable Regression (MVR), Multiple Polynomial Regression (MPR) and soft computing techniques such as Artificial Neural Network (ANN) and Adaptive Neuro Fuzzy Inference System (ANFIS). The modeled hot resistance helps to find the load losses at any load situation without using a complicated measurement setup in transformers. Results: These techniques were applied to hot resistance estimation for a dry-type transformer using the input variables cold resistance, ambient temperature and temperature rise. The results are compared and show good agreement between measured and computed values. Conclusion: The proposed methods are verified using experimental results obtained from a temperature rise test performed on a 55 kVA dry-type transformer.
Robust visual tracking via speedup multiple kernel ridge regression
Qian, Cheng; Breckon, Toby P.; Li, Hui
2015-09-01
Most of the tracking methods attempt to build up feature spaces to represent the appearance of a target. However, limited by the complex structure of the distribution of features, the feature spaces constructed in a linear manner cannot characterize the nonlinear structure well. We propose an appearance model based on kernel ridge regression for visual tracking. Dense sampling is fulfilled around the target image patches to collect the training samples. In order to obtain a kernel space in favor of describing the target appearance, multiple kernel learning is introduced into the selection of kernels. Under the framework, instead of a single kernel, a linear combination of kernels is learned from the training samples to create a kernel space. Resorting to the circulant property of a kernel matrix, a fast interpolate iterative algorithm is developed to seek coefficients that are assigned to these kernels so as to give an optimal combination. After the regression function is learned, all candidate image patches gathered are taken as the input of the function, and the candidate with the maximal response is regarded as the object image patch. Extensive experimental results demonstrate that the proposed method outperforms other state-of-the-art tracking methods.
A multiple regression model for the Ft. Calhoun reactor coolant pump system
Multiple regression analysis is one of the most widely used of all statistical tools. In this research paper, we introduce an application of fitting a multiple regression model to reactor coolant pump (RCP) data. The primary purpose of this research is to correlate the results obtained by Design of Experiments (DOE) and regression model fitting. The idea behind using a regression model is also to gain more detailed information from the RCP data than is provided by DOE. In engineering science, statistical quality control techniques have traditionally been applied to control manufacturing processes. An application to commercial nuclear power plant maintenance and control is presented that can greatly improve plant safety and reliability. The results obtained show that six out of ten parameters are within control specification limits and four parameters are not in a state of statistical control. The four parameters that are out of control adversely affect the regression model fitting, so the final prediction equation does not predict the future response accurately. The analysis concludes that, in order to fit the best regression model, one has to remove all out-of-control points from the data set, including dropping a variable from the model, to obtain better prediction of the response variable.
Functional data analysis of generalized regression quantiles
Guo, Mengmeng
2013-11-05
Generalized regression quantiles, including the conditional quantiles and expectiles as special cases, are useful alternatives to the conditional means for characterizing a conditional distribution, especially when the interest lies in the tails. We develop a functional data analysis approach to jointly estimate a family of generalized regression quantiles. Our approach assumes that the generalized regression quantiles share some common features that can be summarized by a small number of principal component functions. The principal component functions are modeled as splines and are estimated by minimizing a penalized asymmetric loss measure. An iterative least asymmetrically weighted squares algorithm is developed for computation. While separate estimation of individual generalized regression quantiles usually suffers from large variability due to lack of sufficient data, by borrowing strength across data sets, our joint estimation approach significantly improves the estimation efficiency, which is demonstrated in a simulation study. The proposed method is applied to data from 159 weather stations in China to obtain the generalized quantile curves of the volatility of the temperature at these stations.
Functional linear regression analysis for longitudinal data
Yao, Fang; Müller, Hans-Georg; Wang, Jane-Ling
2005-01-01
We propose nonparametric methods for functional linear regression which are designed for sparse longitudinal data, where both the predictor and response are functions of a covariate such as time. Predictor and response processes have smooth random trajectories, and the data consist of a small number of noisy repeated measurements made at irregular times for a sample of subjects. In longitudinal studies, the number of repeated measurements per subject is often small and may be modeled as a discrete random number and, accordingly, only a finite and asymptotically nonincreasing number of measurements are available for each subject or experimental unit. We propose a functional regression approach for this situation, using functional principal component analysis, where we estimate the functional principal component scores through conditional expectations. This allows the prediction of an unobserved response trajectory from sparse measurements of a predictor trajectory. The resulting technique is flexible and allow...
An Analysis of Random Design Linear Regression
Hsu, Daniel; Zhang, Tong
2011-01-01
The random design setting for linear regression concerns estimators based on a random sample of covariate/response pairs. This work gives explicit bounds on the prediction error for the ordinary least squares estimator and the ridge regression estimator under mild assumptions on the covariate/response distributions. In particular, this work provides sharp results on the "out-of-sample" prediction error, as opposed to the "in-sample" (fixed design) error. Our analysis also explicitly reveals the effect of noise vs. modeling errors. The approach reveals a close connection to the more traditional fixed design setting, and our methods make use of recent advances in concentration inequalities (for vectors and matrices). We also describe an application of our results to fast least squares computations.
Multiple regression models for energy use in air-conditioned office buildings in different climates
An attempt was made to develop multiple regression models for office buildings in the five major climates in China: severe cold, cold, hot summer and cold winter, mild, and hot summer and warm winter. A total of 12 key building design variables were identified through parametric and sensitivity analysis, and considered as inputs in the regression models. The coefficient of determination R² varies from 0.89 in Harbin to 0.97 in Kunming, indicating that 89-97% of the variation in annual building energy use can be explained by changes in the 12 parameters. A pseudo-random number generator based on three simple multiplicative congruential generators was employed to generate random designs for evaluation of the regression models. The differences between regression-predicted and DOE-simulated annual building energy use are largely within 10%. It is envisaged that the regression models developed can be used to estimate the likely energy savings/penalty during the initial design stage when different building schemes and design concepts are being considered.
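The two summary measures quoted in this abstract, the coefficient of determination and the percentage difference between regression-predicted and simulated energy use, can be sketched as follows. This is a minimal illustration only: the function names are hypothetical and no data from the study are reproduced.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


def percent_difference(simulated, predicted):
    """Signed difference of the prediction relative to the simulated value, in %."""
    return 100.0 * (predicted - simulated) / simulated
```

An R² of 0.89 computed this way means the residual sum of squares is 11% of the total sum of squares around the mean, which is the sense in which 89% of the variation is "explained".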
Somayeh Mashari; Karim Solaimani; Ebrahim Omidvar
2012-01-01
Landslides are a natural hazard that causes considerable damage to the environment. Depending on the landform, several factors can cause a landslide. This research addresses the methodology for landslide susceptibility mapping using multiple regression analysis and GIS tools. Based on the initial hypothesis, ten factors were recognized as influencing landslides: geology, slope, aspect, distance from roads, faults and drainage network, soil capability, land use and rainfall. Crossin...
Cuss, C W; Guéguen, C
2014-01-01
This study reports on the development and application of a piecewise linear model for the determination of copper-binding parameters at concentrations in the nanomolar range using fluorescence quenching. L-Tyrosine, Suwannee River natural organic matter, and two leaf leachates with similar fluorescence signatures were used as test compounds, and results were compared with those of the standard Ryan-Weber model. The piecewise model was also applied to and compared with data from an earlier study. Parallel factor analysis (PARAFAC) was used to identify three to five independent fluorophores in each test compound, and copper-binding parameters were estimated for one to three binding sites for each fluorophore. The binding properties of similar and different fluorophores were also compared. The conditional binding strengths (log K') estimated using the piecewise approach were similar to those obtained using the Ryan-Weber approach (p > 0.05); however, the piecewise linear model provided superior results compared to models based on the Ryan-Weber equation in several ways, including (1) capable of distinguishing more binding sites for a single fluorophore, (2) capable of extracting binding parameters at environmentally relevant, nanomolar concentrations of copper, where fluorescence changes are often observed as enhancement, (3) greater precision over repeated titrations, and (4) no severe underestimation of complexing capacities. Finally, the copper-binding properties of PARAFAC components with similar optical signatures were found to be similar, both in sources with dramatically different and similar total fluorescence signatures. PMID:24327077
Chen Su-Fen
2013-01-01
Unified Multiple Linear Regression (UMLR) is a nonlinear programming model that unifies all kinds of multiple linear regression models, such as Principal Components Regression, Ridge Regression, Robust Regression and constrained regression. Although UMLR has exhibited excellent performance in some real applications, the optimization procedure is not yet satisfactory. This study proposes a novel Granular Computing-Particle Swarm Optimization (Grc-PSO) algorithm by introducing granular computing into standard PSO, which is used for the optimization of the UMLR model. The experimental results show that the solution obtained by the Grc-PSO algorithm matches the real situation much better than those of other state-of-the-art algorithms.
Forecasting Gold Prices Using Multiple Linear Regression Method
Z. Ismail
2009-01-01
Problem statement: Forecasting is a function in management to assist decision making. It is also described as the process of estimation in unknown future situations. In more general terms it is commonly known as prediction, which refers to estimation of time series or longitudinal type data. Gold is a precious yellow commodity once used as money. It was made illegal in USA 41 years ago, but is now once again accepted as a potential currency. The demand for this commodity is on the rise. Approach: The objective of this study was to develop a forecasting model for predicting gold prices based on economic factors such as inflation, currency price movements and others. Following the melt-down of US dollars, investors are putting their money into gold because gold plays an important role as a stabilizing influence for investment portfolios. Due to the increase in demand for gold in Malaysia and other parts of the world, it is necessary to develop a model that reflects the structure and pattern of the gold market and forecasts movement of the gold price. The most appropriate approach to the understanding of gold prices is the Multiple Linear Regression (MLR) model. MLR is a study of the relationship between a single dependent variable and one or more independent variables, in this case with gold price as the single dependent variable. The fitted MLR model will be used to predict future gold prices. A naive model known as forecast-1 was considered as a benchmark model in order to evaluate the performance of the model. Results: Many factors determine the price of gold and, based on a hunch of experts, several economic factors had been identified as having influence on gold prices. Variables such as Commodity Research Bureau future index (CRB); USD/Euro Foreign Exchange Rate (EUROUSD); Inflation rate (INF); Money Supply (M1); New York Stock Exchange (NYSE); Standard and Poor 500 (SPX); Treasury Bill (T-BILL) and US Dollar index (USDX) were considered to have influence on the prices. Parameter estimations for the MLR were carried out using the Statistical Packages for Social Science package (SPSS) with Mean Square Error (MSE) as the fitness function to determine the forecast accuracy. Conclusion: Two models were considered. The first model considered all possible independent variables. The model appeared to be useful for predicting the price of gold, with 85.2% of sample variations in monthly gold prices explained by the model. The second model considered the following four independent variables to be significant: CRB lagged one, EUROUSD lagged one, INF lagged two and M1 lagged two. In terms of prediction, the second model achieved a high level of predictive accuracy. The amount of variance explained was about 70% and the regression coefficients also provide a means of assessing the relative importance of individual variables in the overall prediction of gold price.
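The benchmarking step described in this abstract can be sketched as below. The abstract does not define the "forecast-1" model; the sketch assumes it is the usual naive forecast that predicts each period's price as the previous period's observed price, and scores it with mean square error. All function names are hypothetical.

```python
def naive_forecast(series):
    """Naive one-step-ahead forecast: predict x[t] by x[t-1].

    Assumption: this is what the abstract calls "forecast-1".
    Returns forecasts aligned with series[1:].
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    return list(series[:-1])


def mean_square_error(actual, predicted):
    """Average squared forecast error over paired observations."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def naive_benchmark_mse(series):
    """MSE of the naive forecast over the whole series."""
    return mean_square_error(list(series[1:]), naive_forecast(series))
```

A fitted regression model would be judged useful under this scheme only if its MSE on the same periods is lower than the value returned by `naive_benchmark_mse`.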
Multiple predictor smoothing methods for sensitivity analysis.
Helton, Jon Craig; Storlie, Curtis B.
2006-08-01
The use of multiple predictor smoothing methods in sampling-based sensitivity analyses of complex models is investigated. Specifically, sensitivity analysis procedures based on smoothing methods employing the stepwise application of the following nonparametric regression techniques are described: (1) locally weighted regression (LOESS), (2) additive models, (3) projection pursuit regression, and (4) recursive partitioning regression. The indicated procedures are illustrated with both simple test problems and results from a performance assessment for a radioactive waste disposal facility (i.e., the Waste Isolation Pilot Plant). As shown by the example illustrations, the use of smoothing procedures based on nonparametric regression techniques can yield more informative sensitivity analysis results than can be obtained with more traditional sensitivity analysis procedures based on linear regression, rank regression or quadratic regression when nonlinear relationships between model inputs and model predictions are present.
Forecasting Electrical Load using ANN Combined with Multiple Regression Method
Saeed M. Badran; Ossama B. Abouelatta
2012-01-01
This paper combined artificial neural network and regression modeling methods to predict electrical load. We propose an approach for specific day, week and/or month load forecasting for electrical companies taking into account the historical load. Therefore, a modified technique, based on artificial neural network (ANN) combined with linear regression, is applied on the KSA electrical network dependent on its historical data to predict the electrical load demand forecasting up to year 2020. T...
An Effect Size for Regression Predictors in Meta-Analysis
Aloe, Ariel M.; Becker, Betsy Jane
2012-01-01
A new effect size representing the predictive power of an independent variable from a multiple regression model is presented. The index, denoted as $r_{sp}$, is the semipartial correlation of the predictor with the outcome of interest. This effect size can be computed when multiple predictor variables are included in the regression model
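For the simplest case of two predictors, the semipartial correlation can be computed directly from the three pairwise correlations. The sketch below is illustrative only; it is not the authors' meta-analytic procedure, and the function names are hypothetical.

```python
import math


def semipartial_r(r_y1: float, r_y2: float, r_12: float) -> float:
    """Semipartial correlation of predictor 1 with the outcome y,
    with predictor 2 partialled out of predictor 1 only."""
    if abs(r_12) >= 1.0:
        raise ValueError("predictors must not be perfectly correlated")
    return (r_y1 - r_y2 * r_12) / math.sqrt(1.0 - r_12 ** 2)


def r2_two_predictors(r_y1: float, r_y2: float, r_12: float) -> float:
    """Squared multiple correlation of y on two predictors."""
    if abs(r_12) >= 1.0:
        raise ValueError("predictors must not be perfectly correlated")
    return (r_y1 ** 2 + r_y2 ** 2 - 2.0 * r_y1 * r_y2 * r_12) / (1.0 - r_12 ** 2)
```

The squared semipartial correlation equals the increase in R-squared obtained by adding the predictor to a model that already contains the other predictor, which is why it can be read as that predictor's unique contribution.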
Repeated Results Analysis for Middleware Regression Benchmarking
Bulej, Lubomír; Kalibera, T.; Tůma, P.
2005-01-01
Vol. 60 (2005), pp. 345-358. ISSN 0166-5316. R&D Projects: GA ČR GA102/03/0672. Institutional research plan: CEZ:AV0Z10300504. Keywords: middleware benchmarking; regression benchmarking; regression testing. Subject RIV: JD - Computer Applications, Robotics. Impact factor: 0.756 (2005).
Using Dominance Analysis to Determine Predictor Importance in Logistic Regression
Azen, Razia; Traxel, Nicole
2009-01-01
This article proposes an extension of dominance analysis that allows researchers to determine the relative importance of predictors in logistic regression models. Criteria for choosing logistic regression $R^2$ analogues were determined and measures were selected that can be used to perform dominance analysis in logistic regression. A…
Regression Analysis with a Stochastic Design Variable
Sazak, Hakan S.; Tiku, Moti L.; Qamarul Islam, M.
2006-01-01
In regression models, the design variable has primarily been treated as a nonstochastic variable. In numerous situations, however, the design variable is stochastic. The estimation and hypothesis testing problems in such situations are considered. Real life examples are given.
Self-concordant analysis for logistic regression
Bach, Francis
2010-01-01
Most of the non-asymptotic theoretical work in regression is carried out for the square loss, where estimators can be obtained through closed-form expressions. In this paper, we use and extend tools from the convex optimization literature, namely self-concordant functions, to provide simple extensions of theoretical results for the square loss to the logistic loss. We apply the extension techniques to logistic regression with regularization by the ℓ2-norm and regularization by the ℓ1-norm, sh...
A maximum likelihood latent variable regression model for multiple informants
Horton, Nicholas J.; Roberts, Kevin, 1940-; Ryan, Louise; Suglia, Shakira Franco; Wright, Rosalind J.
2008-01-01
Studies pertaining to childhood psychopathology often incorporate information from multiple sources (or informants). For example, measurement of some factor of particular interest might be collected from parents, teachers as well as the children being studied. We propose a latent variable modeling framework to incorporate multiple informant predictor data. Several related models are presented, and likelihood ratio tests (LRT) are introduced to formally compare fit. The incorporation of partia...
MULTIPLE LOGISTIC REGRESSION MODEL TO PREDICT RISK FACTORS OF ORAL HEALTH DISEASES
Parameshwar V. Pandit
2012-06-01
Purpose: To analyze the dependence of oral health diseases, i.e. dental caries and periodontal disease, on a number of risk factors through the application of a logistic regression model. Method: The cross-sectional study involves a systematic random sample of 1760 subjects with permanent dentition aged 18-40 years in Dharwad, Karnataka, India. Dharwad is situated in North Karnataka. The mean age was 34.26±7.28. The risk factors of dental caries and periodontal disease were established by a multiple logistic regression model using SPSS statistical software. Results: Factors such as frequency of brushing, timing of cleaning teeth and type of toothpaste are significant persistent predictors of dental caries and periodontal disease. For dental caries, the log likelihood value of the full model is -1013.1364 and Akaike's Information Criterion (AIC) is 1.1752, compared with -1019.8106 and 1.1748 respectively for the reduced regression model. For periodontal disease, the log likelihood value of the full model is -1085.7876 and AIC is 1.2577, compared with -1019.8106 and 1.1748 respectively for the reduced regression model. The area under the Receiver Operating Characteristic (ROC) curve for dental caries is 0.7509 (full model) and 0.7447 (reduced model); for periodontal disease it is 0.6128 (full model) and 0.5821 (reduced model). Conclusions: The frequency of brushing, timing of cleaning teeth and type of toothpaste are the main significant risk factors of dental caries and periodontal disease. The reduced logistic regression model fits slightly better than the full logistic regression model in identifying these risk factors for both dichotomous dental caries and periodontal disease.
Nie, Z Q; Ou, Y Q; Zhuang, J; Qu, Y J; Mai, J Z; Chen, J M; Liu, X Q
2016-05-10
Conditional logistic regression analysis and unconditional logistic regression analysis are commonly used in case-control studies, while the Cox proportional hazards model is often used in survival data analysis. Most of the literature refers only to the main-effect model; however, the generalized linear model differs from the general linear model, and interaction comprises multiplicative interaction and additive interaction. The former has only statistical significance, whereas the latter has biological significance. In this paper, macros were written in SAS 9.4 to calculate the contrast ratio, attributable proportion due to interaction and synergy index while calculating the interaction terms of logistic and Cox regression, and Wald, delta and profile likelihood confidence intervals were used to evaluate additive interaction, for reference in big data analysis in clinical epidemiology and in the analysis of genetic multiplicative and additive interactions. PMID:27188374
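The three additive-interaction indices named in this abstract have standard closed forms in terms of the ratio measures for the two exposures. The sketch below uses Python rather than the SAS macros the authors describe, takes odds or hazard ratios as approximations to relative risks, and assumes that the abstract's "contrast ratio" is the interaction contrast ratio, also known as the relative excess risk due to interaction (RERI). It does not cover the confidence-interval methods; the function name is hypothetical.

```python
def additive_interaction(ratio_10: float, ratio_01: float, ratio_11: float):
    """Return (RERI, AP, S) for two binary exposures.

    ratio_10: ratio for exposure A only (reference: neither exposure)
    ratio_01: ratio for exposure B only
    ratio_11: ratio for joint exposure to A and B
    """
    if min(ratio_10, ratio_01, ratio_11) <= 0:
        raise ValueError("ratios must be positive")
    # Relative excess risk due to interaction (interaction contrast ratio).
    reri = ratio_11 - ratio_10 - ratio_01 + 1.0
    # Attributable proportion due to interaction.
    ap = reri / ratio_11
    # Synergy index; undefined when the separate excess risks sum to zero.
    denom = (ratio_10 - 1.0) + (ratio_01 - 1.0)
    if denom == 0:
        raise ValueError("synergy index undefined: separate excess risks sum to zero")
    s = (ratio_11 - 1.0) / denom
    return reri, ap, s
```

Absence of additive interaction corresponds to RERI = 0, AP = 0 and S = 1, so a joint ratio of 7 with separate ratios of 2 and 3 (RERI = 3, S = 2) would indicate a joint effect larger than the sum of the separate effects.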
Spatial regression analysis on 32 years total column ozone data
J. S. Knibbe
2014-02-01
Multiple-regression analyses have been performed on 32 years of total ozone column data that were spatially gridded with a 1° × 1.5° resolution. The total ozone data consist of the MSR (Multi Sensor Reanalysis; 1979–2008) and two years of assimilated SCIAMACHY ozone data (2009–2010). The two-dimensionality of this data set allows us to perform the regressions locally and investigate spatial patterns of regression coefficients and their explanatory power. Seasonal dependencies of ozone on regressors are included in the analysis. A new physically oriented model is developed to parameterize stratospheric ozone. Ozone variations on non-seasonal timescales are parameterized by explanatory variables describing the solar cycle, stratospheric aerosols, the quasi-biennial oscillation (QBO), El Niño (ENSO) and stratospheric alternative halogens (EESC). For several explanatory variables, seasonally adjusted versions are constructed to account for the difference in their effect on ozone throughout the year. To account for seasonal variation in ozone, explanatory variables describing the polar vortex, geopotential height, potential vorticity and average day length are included. Results of this regression model are compared to those of a similar analysis based on a more commonly applied statistically oriented model. The physically oriented model provides spatial patterns in the regression results for each explanatory variable. The EESC has a significant depleting effect on ozone at high and mid-latitudes, the solar cycle affects ozone positively mostly in the Southern Hemisphere, stratospheric aerosols affect ozone negatively at high northern latitudes, the effect of the QBO is positive in the tropics and negative at mid to high latitudes, and ENSO affects ozone negatively between 30° N and 30° S, particularly over the Pacific. The contribution of explanatory variables describing seasonal ozone variation is generally large at mid to high latitudes. We observe ozone-increasing effects for potential vorticity and day length, a negative effect on ozone for geopotential height, and variable ozone effects due to the polar vortex in regions to the north and south of the polar vortices. Recovery of ozone is identified globally. However, recovery rates and uncertainties strongly depend on choices that can be made in defining the explanatory variables. In particular, the recovery rates over Antarctica might not be statistically significant. Furthermore, the results show that there is no spatially homogeneous pattern in which regression model and explanatory variables provide the best fit to the data and the most accurate estimates of the recovery rates. Overall these results suggest that care has to be taken in determining ozone recovery rates, in particular for the Antarctic ozone hole.
A maximum likelihood latent variable regression model for multiple informants
Horton, Nicholas J.; Roberts, Kevin; Ryan, Louise; Suglia, Shakira Franco; Wright, Rosalind J.
2008-01-01
Studies pertaining to childhood psychopathology often incorporate information from multiple sources (or informants). For example, measurement of some factor of particular interest might be collected from parents, teachers as well as the children being studied. We propose a latent variable modeling framework to incorporate multiple informant predictor data. Several related models are presented, and likelihood ratio tests (LRT) are introduced to formally compare fit. The incorporation of partially observed subjects is addressed under a variety of missing data mechanisms. The methods are motivated by and applied to a study of the association of chronic exposure to violence on asthma in children. PMID:18613227
Approximation Analysis of Learning Algorithms for Support Vector Regression and Quantile Regression
Dao-Hong Xiang; Ting Hu; Ding-Xuan Zhou
2012-01-01
We study learning algorithms generated by regularization schemes in reproducing kernel Hilbert spaces associated with an $\epsilon$-insensitive pinball loss. This loss function is motivated by the $\epsilon$-insensitive loss for support vector regression and the pinball loss for quantile regression. Approximation analysis is conducted for these algorithms by means of a variance-expectation bound when a noise condition is satisfied for the underlying probability measure. The ...
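One way to combine the two losses mentioned in this abstract is sketched below: the loss is zero inside a band of half-width epsilon around the prediction and grows asymmetrically outside it. The sign convention (residual = observed minus predicted, with weight tau on positive residuals) is an assumption and may differ from the definition used in the paper; the function name is hypothetical.

```python
def eps_insensitive_pinball(residual: float, tau: float, eps: float) -> float:
    """Epsilon-insensitive pinball loss for residual = y - f(x).

    Assumed convention: slope tau above the band, slope (1 - tau) below it.
    With eps = 0 this reduces to the ordinary pinball (quantile) loss;
    with tau = 0.5 it is half the epsilon-insensitive loss of SVR.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    if eps < 0.0:
        raise ValueError("eps must be non-negative")
    if residual > eps:
        return tau * (residual - eps)
    if residual < -eps:
        return (1.0 - tau) * (-residual - eps)
    return 0.0  # inside the insensitive band
```

Under this convention, a tau close to 1 penalizes under-prediction much more heavily than over-prediction, which pushes the fitted function toward an upper conditional quantile.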
Mapping of multiple quantitative trait loci by simple regression in half-sib designs.
de Koning, D J; Schulman, N F; Elo, K; Moisio, S; Kinos, R; Vilkki, J; Mäki-Tanila, A
2001-03-01
Detection of QTL in outbred half-sib family structures has mainly been based on interval mapping of single QTL on individual chromosomes. Methods to account for linked and unlinked QTL have been developed, but most of them are only applicable in designs with inbred species or pose great demands on computing facilities. This study describes a strategy that allows for rapid analysis, involving multiple QTL, of complete genomes. The methods combine information from individual analyses after which trait scores for a specific linkage group are adjusted for identified QTL at other linkage groups. Regression methods are used to estimate QTL positions and effects; permutation tests are used to obtain empirical threshold values. The description of the methods is complemented by an example of the combined analysis of 28 bovine chromosomes and their associations with milk yield in Finnish Ayrshire cattle. In this example, the individual analysis revealed five suggestive QTL affecting milk yield. Following the strategy presented in this paper, the final combined analysis showed eight significant QTL affecting milk yield. This clearly demonstrates the potential gain of using the combined analysis. The use of regression methods, with low demands on computing resources, makes this approach very practical for total genome scans. PMID:11263821
Elzamly, Abdelrafe; Hussin, Burairah
2014-01-01
The aim of this paper is to propose new mining techniques by which we can study the impact of different risk management techniques and different software risk factors on software analysis development projects. The new mining technique uses the fuzzy multiple regression analysis techniques with fuzzy concepts to manage the software risks in a software project and mitigating risk with software process improvement. Top ten software risk factors in analysis phase and thirty risk management techni...
A Note on the Adaptive Estimation of a Multiplicative Separable Regression Function
Christophe Chesneau
2014-01-01
We investigate the estimation of a multiplicative separable regression function from a bidimensional nonparametric regression model with random design. We present a general estimator for this problem and study its mean integrated squared error (MISE) properties. A wavelet version of this estimator is developed. In some situations, we prove that it attains the standard unidimensional rate of convergence under the MISE over Besov balls.
Comparison of Fuzzy Inference System and Multiple Regression to Predict Synthetic Envelopes Clogging
Bakhtiar Karimi; Farhad Mirzaei; Mohammad Javad Nahvinia; Behnam Ababaei
2010-01-01
Geo-synthetic materials are being used with acceptable performance in soil and water projects worldwide. Geotextiles are one of the categories of geo-synthetics being used in drainage systems. The first generation of geotextiles was used in the late 1950s as an alternative to gravel envelopes. In this research, two methods (multiple regression and a fuzzy inference system) are evaluated for predicting synthetic envelope clogging. In multiple regression method the correlation coefficients for PP450, PP700 an...
Ohlmacher, G.C.; Davis, J.C.
2003-01-01
Landslides in the hilly terrain along the Kansas and Missouri rivers in northeastern Kansas have caused millions of dollars in property damage during the last decade. To address this problem, a statistical method called multiple logistic regression has been used to create a landslide-hazard map for Atchison, Kansas, and surrounding areas. Data included digitized geology, slopes, and landslides, manipulated using ArcView GIS. Logistic regression relates predictor variables to the occurrence or nonoccurrence of landslides within geographic cells and uses the relationship to produce a map showing the probability of future landslides, given local slopes and geologic units. Results indicated that slope is the most important variable for estimating landslide hazard in the study area. Geologic units consisting mostly of shale, siltstone, and sandstone were most susceptible to landslides. Soil type and aspect ratio were considered but excluded from the final analysis because these variables did not significantly add to the predictive power of the logistic regression. Soil types were highly correlated with the geologic units, and no significant relationships existed between landslides and slope aspect. © 2003 Elsevier Science B.V. All rights reserved.
Introductory regression analysis with computer application for business and economics
Webster, Allen
2013-01-01
Regression analysis is arguably the single most powerful and widely applicable tool in any effective examination of common business issues. Every day, decision-makers face problems that require constructive actions with significant consequences, and regression procedures can prove a meaningful and valuable asset in the decision-making process. This text is designed to help students achieve a full understanding of regression and the many ways it can be used. Taking into consideration current statistical technology, Introductory Regression Analysis focuses on the use and interpretation
A note on the use of multiple linear regression in molecular ecology.
Frasier, Timothy R
2016-03-01
Multiple linear regression analyses (also often referred to as generalized linear models - GLMs, or generalized linear mixed models - GLMMs) are widely used in the analysis of data in molecular ecology, often to assess the relative effects of genetic characteristics on individual fitness or traits, or how environmental characteristics influence patterns of genetic differentiation. However, the coefficients resulting from multiple regression analyses are sometimes misinterpreted, which can lead to incorrect interpretations and conclusions within individual studies, and can propagate to wider-spread errors in the general understanding of a topic. The primary issue revolves around the interpretation of coefficients for independent variables when interaction terms are also included in the analyses. In this scenario, the coefficients associated with each independent variable are often interpreted as the independent effect of each predictor variable on the predicted variable. However, this interpretation is incorrect. The correct interpretation is that these coefficients represent the effect of each predictor variable on the predicted variable when all other predictor variables are zero. This difference may sound subtle, but the ramifications cannot be overstated. Here, my goals are to raise awareness of this issue, to demonstrate and emphasize the problems that can result and to provide alternative approaches for obtaining the desired information. PMID:26650184
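The interpretation issue described above can be made concrete with a small sketch. The data are noise-free and invented for illustration (y = 1 + 2·x1 + 3·x2 + 0.5·x1·x2), and the helper names `solve` and `ols` are assumptions of this example, not anything from the paper. The point is that the fitted coefficient on x1 is its effect when x2 = 0, so it changes when x2 is re-centered even though the model and its predictions are unchanged.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(rows, y):
    """Least squares via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Invented, noise-free data on a grid; x2 ranges over 2..6, so its mean is 4.
points = [(x1, x2) for x1 in range(4) for x2 in range(2, 7)]
y = [1 + 2 * x1 + 3 * x2 + 0.5 * x1 * x2 for x1, x2 in points]

# Raw predictors: the x1 coefficient is the slope of x1 when x2 = 0.
raw = ols([[1.0, x1, x2, x1 * x2] for x1, x2 in points], y)

# x2 centered at its mean: the x1 coefficient is now the slope at x2 = 4.
centered = ols([[1.0, x1, x2 - 4, x1 * (x2 - 4)] for x1, x2 in points], y)

print("x1 coefficient, raw x2:     ", round(raw[1], 6))
print("x1 coefficient, centered x2:", round(centered[1], 6))
```

With these data the x1 coefficient moves from about 2 (slope at x2 = 0, outside the observed range) to about 4 (slope at the mean of x2), while the interaction coefficient stays near 0.5. Centering predictors before fitting is one common way to obtain a main-effect coefficient that refers to a meaningful reference point.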
Zahari, Siti Meriam; Ramli, Norazan Mohamed; Moktar, Balkiah; Zainol, Mohammad Said
2014-09-01
In the presence of multicollinearity and multiple outliers, statistical inference of linear regression model using ordinary least squares (OLS) estimators would be severely affected and produces misleading results. To overcome this, many approaches have been investigated. These include robust methods which were reported to be less sensitive to the presence of outliers. In addition, ridge regression technique was employed to tackle multicollinearity problem. In order to mitigate both problems, a combination of ridge regression and robust methods was discussed in this study. The superiority of this approach was examined when simultaneous presence of multicollinearity and multiple outliers occurred in multiple linear regression. This study aimed to look at the performance of several well-known robust estimators; M, MM, RIDGE and robust ridge regression estimators, namely Weighted Ridge M-estimator (WRM), Weighted Ridge MM (WRMM), Ridge MM (RMM), in such a situation. Results of the study showed that in the presence of simultaneous multicollinearity and multiple outliers (in both x and y-direction), the RMM and RIDGE are more or less similar in terms of superiority over the other estimators, regardless of the number of observation, level of collinearity and percentage of outliers used. However, when outliers occurred in only single direction (y-direction), the WRMM estimator is the most superior among the robust ridge regression estimators, by producing the least variance. In conclusion, the robust ridge regression is the best alternative as compared to robust and conventional least squares estimators when dealing with simultaneous presence of multicollinearity and outliers.
Simulation Experiments in Practice: Statistical Design and Regression Analysis
Kleijnen, J.P.C.
2007-01-01
In practice, simulation analysts often change only one factor at a time, and use graphical analysis of the resulting Input/Output (I/O) data. The goal of this article is to change these traditional, naïve methods of design and analysis, because statistical theory proves that more information is obtained when applying Design Of Experiments (DOE) and linear regression analysis. Unfortunately, classic DOE and regression analysis assume a single simulation response that is normally and independen...
Stability Analysis and Learning Bounds for Transductive Regression Algorithms
Cortes, Corinna; Mohri, Mehryar; Pechyony, Dmitry; Rastogi, Ashish
2009-01-01
This paper uses the notion of algorithmic stability to derive novel generalization bounds for several families of transductive regression algorithms, both by using convexity and closed-form solutions. Our analysis helps compare the stability of these algorithms. It also shows that a number of widely used transductive regression algorithms are in fact unstable. Finally, it reports the results of experiments with local transductive regression demonstrating the benefit of our stability bounds fo...
Urrutia, Jackie D.; Tampis, Razzcelle L.; Mercado, Joseph; Baygan, Aaron Vito M.; Baccay, Edcon B.
2016-02-01
The objective of this research is to formulate a mathematical model for the Philippines' Real Gross Domestic Product (Real GDP). The following factors are considered: Consumers' Spending (x1), Government's Spending (x2), Capital Formation (x3) and Imports (x4) as the Independent Variables that can actually influence the Real GDP in the Philippines (y). The researchers used a Normal Estimation Equation using Matrices to create the model for Real GDP and used α = 0.01. The researchers analyzed quarterly data from 1990 to 2013. The data were acquired from the National Statistical Coordination Board (NSCB), resulting in a total of 96 observations for each variable. The data have undergone a logarithmic transformation, particularly the Dependent Variable (y), to satisfy all the assumptions of the Multiple Linear Regression Analysis. The mathematical model for Real GDP was formulated using Matrices through MATLAB. Based on the results, only three of the Independent Variables are significant to the Dependent Variable, namely Consumers' Spending (x1), Capital Formation (x3) and Imports (x4), and hence can actually predict Real GDP (y). The regression analysis yields a coefficient of determination of 98.7%, i.e. the Independent Variables account for 98.7% of the variation in the Dependent Variable. With 97.6% of the result in Paired T-Test, the Predicted Values obtained from the model showed no significant difference from the Actual Values of Real GDP. This research will be essential in appraising the forthcoming changes to aid the Government in implementing policies for the development of the economy.
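For orientation, the matrix form of the normal equations referred to above can be written out as follows. This is a sketch under stated assumptions: the abstract does not say whether the predictors were also log-transformed, so only the dependent variable is logged here, and the symbols z, X and the index t are introduced for this illustration.

```latex
\begin{align}
  z_t &= \ln y_t = \beta_0 + \beta_1 x_{1t} + \beta_2 x_{2t}
        + \beta_3 x_{3t} + \beta_4 x_{4t} + \varepsilon_t,
        \qquad t = 1, \dots, 96, \\
  \hat{\boldsymbol{\beta}} &= (X^{\top} X)^{-1} X^{\top} \mathbf{z}, \\
  R^2 &= 1 - \frac{\sum_{t} (z_t - \hat{z}_t)^2}{\sum_{t} (z_t - \bar{z})^2}.
\end{align}
```

Here X is the 96 × 5 design matrix whose first column is all ones, and R^2 is the coefficient of determination, the share of the variation in z accounted for by the fitted model.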
Željko V. Račić
2010-12-01
This paper aims to present the specifics of the application of the multiple linear regression model. The economic (financial) crisis is analyzed in terms of gross domestic product, which is a function of the foreign trade balance (on one hand) and credit cards, i.e. indebtedness of the population on this basis (on the other hand), in the USA (from 1999 to 2008). We used the extended application model, which shows how the analyst should run the whole development process of a regression model. This process began with simple statistical features and the application of regression procedures, and ended with residual analysis, intended for the study of compatibility of data and model settings. This paper also analyzes the values of some standard statistics used in the selection of an appropriate regression model. Testing of the model is carried out with the use of the Statistics PASW 17 program.
Estimating changes in river faecal coliform loading using nonparametric multiplicative regression.
Schulz, Christopher J; Childers, Gary W
2011-03-01
Faecal coliform (FC) concentration was monitored weekly in the Tangipahoa River over an eight year period. Available USGS discharge and precipitation data were used to construct a nonparametric multiplicative regression (NPMR) model for both forecasting and backcasting of FC density. NPMR backcasting and forecasting of FC allowed for estimation of concentration for any flow regime. During this study a remediation effort was undertaken to improve disinfection systems of contributing municipal waste water treatment plants in the watershed. Time-series analysis of FC concentrations demonstrated a drop in FC levels coinciding with remediation efforts. The NPMR model suggested the reduction in FC levels was not due to climate variance (i.e. discharge and precipitation changes) alone. Use of the NPMR method circumvented the need for construction of a more complex physical watershed model to estimate FC loading in the river. This method can be used to detect and estimate new discharge impacts, or forecast daily FC estimates. PMID:21301120
A method for the analysis of capillary column Polychlorinated biphenyl (PCB) data using regression analysis with outlier checking and elimination, COMSTAR, is presented and evaluated. This algorithm determines the best combination of the commercial PCB mixtures which best fits the...
Clara Novoa; Suleima Alkusari
2012-01-01
This talk exemplifies the application of the multiple imputation technique available in STATA to analyze a design of experiments with multiple responses and missing data. No imputation and multiple imputation methodologies are compared.
Projection estimation in multiple regression with application to functional ANOVA models
Huang, Jianhua Z.
1998-01-01
A general theory on rates of convergence of the least-squares projection estimate in multiple regression is developed. The theory is applied to the functional ANOVA model, where the multivariate regression function is modeled as a specified sum of a constant term, main effects (functions of one variable) and selected interaction terms (functions of two or more variables). The least-squares projection is onto an approximating space constructed from arbitrary linear spaces of functions and thei...
Ridge Regression: A Regression Procedure for Analyzing correlated Independent Variables
Rakow, Ernest A.
1978-01-01
Ridge regression is a technique used to ameliorate the problem of highly correlated independent variables in multiple regression analysis. This paper explains the fundamentals of ridge regression and illustrates its use. (JKS)
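A minimal sketch of the ridge estimator, β = (X'X + λI)⁻¹X'y, may help here. The data are invented, both predictors are centered so that no intercept is needed, and the helper names are assumptions of this example. In practice predictors are usually also scaled to unit variance before the penalty is applied, which this sketch omits for brevity.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ridge(rows, y, lam):
    """Ridge estimate (X'X + lam*I)^-1 X'y for centered data, no intercept."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) + (lam if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

def center(values):
    """Subtract the mean so that the model needs no intercept."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

# Invented data: x2 is nearly collinear with x1.
x1 = center([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = center([1.1, 1.9, 3.2, 3.9, 5.1])
y = center([2.0, 4.1, 6.2, 7.9, 10.1])
rows = list(zip(x1, x2))

ols_beta = ridge(rows, y, 0.0)    # lam = 0 reduces to ordinary least squares
ridge_beta = ridge(rows, y, 1.0)  # lam > 0 shrinks the coefficient vector

def norm(v):
    """Euclidean length of a coefficient vector."""
    return sum(c * c for c in v) ** 0.5

print("OLS:  ", ols_beta, "length", norm(ols_beta))
print("ridge:", ridge_beta, "length", norm(ridge_beta))
```

Adding λ to the diagonal of X'X keeps the matrix well away from singularity when predictors are highly correlated, at the cost of some bias; the coefficient vector always gets shorter as λ grows.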
3D Regression Heat Map Analysis of Population Study Data.
Klemm, Paul; Lawonn, Kai; Glaser, Sylvia; Niemann, Uli; Hegenscheid, Katrin; Volzke, Henry; Preim, Bernhard
2016-01-01
Epidemiological studies comprise heterogeneous data about a subject group to define disease-specific risk factors. These data contain information (features) about a subject's lifestyle, medical status as well as medical image data. Statistical regression analysis is used to evaluate these features and to identify feature combinations indicating a disease (the target feature). We propose an analysis approach of epidemiological data sets by incorporating all features in an exhaustive regression-based analysis. This approach combines all independent features w.r.t. a target feature. It provides a visualization that reveals insights into the data by highlighting relationships. The 3D Regression Heat Map, a novel 3D visual encoding, acts as an overview of the whole data set. It shows all combinations of two to three independent features with a specific target disease. Slicing through the 3D Regression Heat Map allows for the detailed analysis of the underlying relationships. Expert knowledge about disease-specific hypotheses can be included into the analysis by adjusting the regression model formulas. Furthermore, the influences of features can be assessed using a difference view comparing different calculation results. We applied our 3D Regression Heat Map method to a hepatic steatosis data set to reproduce results from a data mining-driven analysis. A qualitative analysis was conducted on a breast density data set. We were able to derive new hypotheses about relations between breast density and breast lesions with breast cancer. With the 3D Regression Heat Map, we present a visual overview of epidemiological data that allows for the first time an interactive regression-based analysis of large feature sets with respect to a disease. PMID:26529689
Egg hatchability prediction by multiple linear regression and artificial neural networks
AC Bolzan
2008-06-01
An artificial neural network (ANN) was compared with a multiple linear regression statistical method to predict hatchability in an artificial incubation process. A feedforward neural network architecture was applied. Network training was performed with the backpropagation algorithm based on data obtained from industrial incubations. The ANN model was chosen as it produced data that fit the experimental data better than the multiple linear regression model, which used coefficients determined by the least-squares method. The proposed simulation results of these approaches indicate that this ANN can be used for incubation performance prediction.
Sykas, Dimitris; Karathanassi, Vassilia
2015-06-01
This paper presents a new method for automatically determining the optimum regression model, which enables the estimation of a parameter. The concept relies on the combination of k spectral pre-processing algorithms (SPPAs) that enhance spectral features correlated to the desired parameter. Initially a pre-processing algorithm uses as input a single spectral signature and transforms it according to the SPPA function. A k-step combination of SPPAs uses k preprocessing algorithms serially. The result of each SPPA is used as input to the next SPPA, and so on until the k desired pre-processed signatures are reached. These signatures are then used as input to three different regression methods: the Normalized band Difference Regression (NDR), the Multiple Linear Regression (MLR) and the Partial Least Squares Regression (PLSR). Three Simple Genetic Algorithms (SGAs) are used, one for each regression method, for the selection of the optimum combination of k SPPAs. The performance of the SGAs is evaluated based on the RMS error of the regression models. The evaluation not only indicates the selection of the optimum SPPA combination but also the regression method that produces the optimum prediction model. The proposed method was applied on soil spectral measurements in order to predict Soil Organic Matter (SOM). In this study, the maximum value assigned to k was 3. PLSR yielded the highest accuracy while NDR's accuracy was satisfactory compared to its complexity. The MLR method showed severe drawbacks due to the presence of noise in terms of collinearity at the spectral bands. Most of the regression methods required a 3-step combination of SPPAs for achieving the highest performance. The selected preprocessing algorithms were different for each regression method since each regression method handles the explanatory variables in a different way.
Seo, Min-Seok; Kim, Ja-Kyung; Shim, Jae-Yong
2015-01-01
We report a case of regression of multiple pulmonary metastases, which originated from hepatocellular carcinoma after treatment with intravenous administration of high-dose vitamin C. A 74-year-old woman presented to the clinic for her cancer-related symptoms such as general weakness and anorexia. After undergoing initial transarterial chemoembolization (TACE), local recurrence with multiple pulmonary metastases was found. She refused further conventional therapy, including sorafenib tosylate...
Background stratified Poisson regression analysis of cohort data
Richardson, David B; Langholz, Bryan
2011-01-01
Background stratified Poisson regression is an approach that has been used in the analysis of data derived from a variety of epidemiologically important studies of radiation-exposed populations, including uranium miners, nuclear industry workers, and atomic bomb survivors. We describe a novel approach to fit Poisson regression models that adjust for a set of covariates through background stratification while directly estimating the radiation-disease association of primary interest. The approa...
Evaluating the Sustainable Development of Agriculture Based on Multiple Linear Regression
Li Qing-xue; Wu Hua-rui
2013-01-01
Agriculture is the base of national economy, rural area is basic community and agricultural sustainable development is the base of whole society sustainable development. Studying evaluation index system of agricultural sustainable development level, constructing reasonable evaluation model, are significant for path selection and level promotion. Evaluation index system based on input and output has been built with the method of multiple regression, the inte...
A Simple and Convenient Method of Multiple Linear Regression to Calculate Iodine Molecular Constants
Cooper, Paul D.
2010-01-01
A new procedure using a student-friendly least-squares multiple linear-regression technique utilizing a function within Microsoft Excel is described that enables students to calculate molecular constants from the vibronic spectrum of iodine. This method is advantageous pedagogically as it calculates molecular constants for ground and excited…
We report a case of tumor regression of multiple bone metastases from breast carcinoma after administration of strontium-89 chloride. This case suggests that strontium-89 chloride can not only relieve bone metastases pain not responsive to analgesics, but may also have a tumoricidal effect on bone metastases
Calculation of U, Ra, Th and K contents in uranium ore by multiple linear regression method
A multiple linear regression method was used to compute γ spectra of uranium ore samples and to calculate contents of U, Ra, Th, and K. In comparison with the inverse matrix method, its advantage is that no standard samples of pure U, Ra, Th and K are needed for obtaining response coefficients
Linear regression and sensitivity analysis in nuclear reactor design
Highlights: • Presented a benchmark for the applicability of linear regression to complex systems. • Applied linear regression to a nuclear reactor power system. • Performed neutronics, thermal–hydraulics, and energy conversion using Brayton’s cycle for the design of a GCFBR. • Performed detailed sensitivity analysis to a set of parameters in a nuclear reactor power system. • Modeled and developed reactor design using MCNP, regression using R, and thermal–hydraulics in Java. - Abstract: The paper presents a general strategy applicable for sensitivity analysis (SA), and uncertainty quantification analysis (UA) of parameters related to a nuclear reactor design. This work also validates the use of linear regression (LR) for predictive analysis in a nuclear reactor design. The analysis helps to determine the parameters on which a LR model can be fit for predictive analysis. For those parameters, a regression surface is created based on trial data and predictions are made using this surface. A general strategy of SA to determine and identify the influential parameters that affect the operation of the reactor is mentioned. Identification of design parameters and validation of linearity assumption for the application of LR of reactor design based on a set of tests is performed. The testing methods used to determine the behavior of the parameters can be used as a general strategy for UA, and SA of nuclear reactor models, and thermal hydraulics calculations. A design of a gas cooled fast breeder reactor (GCFBR), with thermal–hydraulics, and energy transfer has been used for the demonstration of this method. MCNP6 is used to simulate the GCFBR design, and perform the necessary criticality calculations. Java is used to build and run input samples, and to extract data from the output files of MCNP6, and R is used to perform regression analysis and other multivariate variance, and analysis of the collinearity of data
Regression analysis with linked data: problems and possible solutions
Andrea Tancredi
2015-03-01
In this paper we have described and extended some recent proposals on a general Bayesian methodology for performing record linkage and making inference using the resulting matched units. In particular, we have framed the record linkage process into a formal statistical model which comprises both the matching variables and the other variables included at the inferential stage. This way, the researcher is able to account for the matching process uncertainty in inferential procedures based on probabilistically linked data, and at the same time, he/she is also able to generate a feedback propagation of the information between the working statistical model and the record linkage stage. We have argued that this feedback effect is both essential to eliminate potential biases that otherwise would characterize the resulting linked data inference, and able to improve record linkage performances. The practical implementation of the procedure is based on the use of standard Bayesian computational techniques, such as Markov Chain Monte Carlo algorithms. Although the methodology is quite general, we have restricted our analysis to the popular and important case of multiple linear regression set-up for expository convenience.
The Study on Technology Innovation of Chinese Enterprises by Regression Analysis
ZIYAN ZHANG; xungang zheng
2011-01-01
Using China Science and Technology data from recent years, we apply multiple regression to analyze the factors influencing technology innovation, and demonstrate the impact of significant and non-significant factors related to China’s investment expenditure policies for technological innovation, so as to provide guidance and reference for enhancing China's technological innovation capability and promoting domestic economic development.
Seo, Min-Seok; Kim, Ja-Kyung; Shim, Jae-Yong
2015-09-01
We report a case of regression of multiple pulmonary metastases, which originated from hepatocellular carcinoma after treatment with intravenous administration of high-dose vitamin C. A 74-year-old woman presented to the clinic for her cancer-related symptoms such as general weakness and anorexia. After undergoing initial transarterial chemoembolization (TACE), local recurrence with multiple pulmonary metastases was found. She refused further conventional therapy, including sorafenib tosylate (Nexavar). She did receive high doses of vitamin C (70 g), which were administered into a peripheral vein twice a week for 10 months, and multiple pulmonary metastases were observed to have completely regressed. She then underwent subsequent TACE, resulting in remission of her primary hepatocellular carcinoma. PMID:26256994
Gangopadhyay, S.; Clark, M. P.; Rajagopalan, B.
2002-12-01
The success of short term (days to fortnight) streamflow forecasting largely depends on the skill of surface climate (e.g., precipitation and temperature) forecasts at local scales in the individual river basins. The surface climate forecasts are used to drive the hydrologic models for streamflow forecasting. Typically, Medium Range Forecast (MRF) models provide forecasts of large scale circulation variables (e.g. pressures, wind speed, relative humidity etc.) at different levels in the atmosphere on a regular grid - which are then used to "downscale" to the surface climate at locations within the model grid box. Several statistical and dynamical methods are available for downscaling. This paper compares the utility of two statistical downscaling methodologies: (1) multiple linear regression (MLR) and (2) a nonparametric approach based on k-nearest neighbor (k-NN) bootstrap method, in providing local-scale information of precipitation and temperature at a network of stations in the Upper Colorado River Basin. Downscaling to the stations is based on output of large scale circulation variables (i.e. predictors) from the NCEP Medium Range Forecast (MRF) database. Fourteen-day six hourly forecasts are developed using these two approaches, and their forecast skill evaluated. A stepwise regression is performed at each location to select the predictors for the MLR. The k-NN bootstrap technique resamples historical data based on their "nearness" to the current pattern in the predictor space. Prior to resampling a Principal Component Analysis (PCA) is performed on the predictor set to identify a small subset of predictors. Preliminary results using the MLR technique indicate a significant value in the downscaled MRF output in predicting runoff in the Upper Colorado Basin. 
It is expected that the k-NN approach will match the skill of the MLR approach at individual stations, and will have the added advantage of preserving the spatial co-variability between stations, capturing nonlinearities in the relationship and non-gaussian error structure, and the consistency between forecasted precipitation and temperature.
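The k-NN bootstrap step can be sketched as below. This is an illustration under assumptions, not the authors' implementation: the historical record is a hypothetical list of (predictor pattern, observed surface climate) pairs, distance is plain Euclidean distance in predictor space, and neighbors are drawn with weights proportional to 1/rank, a weighting chosen for this example that the abstract does not specify.

```python
import math
import random

def knn_bootstrap(history, current, k, rng):
    """Resample one historical outcome from the k nearest predictor patterns.

    history: list of (predictor_vector, outcome) pairs.
    current: predictor vector of the forecast to be downscaled.
    Weights proportional to 1/rank favor closer neighbors (assumed kernel).
    """
    ranked = sorted(history, key=lambda h: math.dist(h[0], current))[:k]
    weights = [1.0 / rank for rank in range(1, len(ranked) + 1)]
    return rng.choices(ranked, weights=weights, k=1)[0][1]

# Hypothetical record: (two leading principal components) ->
# (precipitation in mm, temperature in deg C) observed on that day.
history = [
    ((0.2, 1.1), (0.0, 14.0)),
    ((0.3, 0.9), (1.2, 13.1)),
    ((2.5, -0.4), (8.4, 6.0)),
    ((0.1, 1.0), (0.3, 15.2)),
    ((2.2, -0.6), (6.9, 7.3)),
]

rng = random.Random(42)
draws = [knn_bootstrap(history, (0.25, 1.0), k=3, rng=rng) for _ in range(10)]
```

Because each draw returns the precipitation and temperature observed together on one historical day, the resampled pairs stay mutually consistent, which is the property the abstract highlights. Repeating the draw many times gives an ensemble rather than a single point forecast.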
Ratio Versus Regression Analysis: Some Empirical Evidence in Brazil
Newton Carneiro Affonso da Costa Jr.
2004-06-01
This work compares the traditional methodology for ratio analysis, applied to a sample of Brazilian firms, with the alternative one of regression analysis, for both cross-industry and intra-industry samples. The structural validity of the traditional methodology was tested through a model that represents its analogous regression format. The data are from 156 Brazilian public companies in nine industrial sectors for the year 1997. The results provide weak empirical support for the traditional ratio methodology, as it was verified that the validity of this methodology may differ between ratios.
Time series analysis using semiparametric regression on oil palm production
Yundari, Pasaribu, U. S.; Mukhaiyar, U.
2016-04-01
This paper presents a semiparametric kernel regression method, which has shown its flexibility and ease of mathematical calculation, especially in estimating density and regression functions. The kernel function is continuous and produces a smooth estimate. The classical kernel density estimator is constructed by completely nonparametric analysis and works reasonably well for all forms of function. Here, we discuss parameter estimation in time series analysis. First, we assume that the parameters exist; then we use nonparametric estimation, an approach called semiparametric. The optimum bandwidth is selected by considering the approximation of the Mean Integrated Squared Error (MISE).
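As background for the kernel smoothing mentioned above, here is a minimal sketch of the standard Nadaraya-Watson kernel regression estimator with a Gaussian kernel. It is the textbook nonparametric estimator, not the semiparametric procedure of the paper, whose details the abstract does not give; the series values and the bandwidth are invented for illustration.

```python
import math

def nadaraya_watson(x_obs, y_obs, x0, bandwidth):
    """Kernel-weighted average of y_obs around x0 using a Gaussian kernel."""
    weights = [math.exp(-0.5 * ((x0 - x) / bandwidth) ** 2) for x in x_obs]
    total = sum(weights)  # may underflow to 0 if x0 is far from every x_obs
    return sum(w * y for w, y in zip(weights, y_obs)) / total

# Invented monthly series (time index, production value).
months = list(range(12))
production = [3.1, 3.4, 3.0, 3.8, 4.2, 4.0, 4.5, 4.9, 4.4, 4.1, 3.9, 3.6]

# Smoothed value at each observed month; a larger bandwidth gives a flatter curve.
smooth = [nadaraya_watson(months, production, m, bandwidth=1.5) for m in months]
```

The bandwidth controls the trade-off between bias and variance, which is why the abstract ties its selection to an approximation of the MISE. In this sketch it is simply fixed by hand.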
Analysis of Sting Balance Calibration Data Using Optimized Regression Models
Ulbrich, N.; Bader, Jon B.
2010-01-01
Calibration data of a wind tunnel sting balance was processed using a candidate math model search algorithm that recommends an optimized regression model for the data analysis. During the calibration the normal force and the moment at the balance moment center were selected as independent calibration variables. The sting balance itself had two moment gages. Therefore, after analyzing the connection between calibration loads and gage outputs, it was decided to choose the difference and the sum of the gage outputs as the two responses that best describe the behavior of the balance. The math model search algorithm was applied to these two responses. An optimized regression model was obtained for each response. Classical strain gage balance load transformations and the equations of the deflection of a cantilever beam under load are used to show that the search algorithm's two optimized regression models are supported by a theoretical analysis of the relationship between the applied calibration loads and the measured gage outputs. The analysis of the sting balance calibration data set is a rare example of a situation when terms of a regression model of a balance can directly be derived from first principles of physics. In addition, it is interesting to note that the search algorithm recommended the correct regression model term combinations using only a set of statistical quality metrics that were applied to the experimental data during the algorithm's term selection process.
Sintering equation: determination of its coefficients by experiments - using multiple regression
Sintering is a method for volume-compression (or volume-contraction) of powdered or grained material applying high temperature (less than the melting point of the material). Maekipirtti tried to find an equation which describes the process of sintering by its main parameters: sintering time, sintering temperature and volume contraction. Such an equation is called a sintering equation. It also contains some coefficients which characterise the behaviour of the material during the process of sintering. These coefficients have to be determined by experiments. Here we show that some linear regressions will produce wrong coefficients, but multiple regression results in a useful sintering equation. (orig.)
Regression analysis of creep-rupture data: a practical approach
A generalized linear regression approach to the analysis of creep and creep-rupture data appears to have great promise for future applications. Uncertainties in predictions of creep behavior can be large due to heat treatment, heat-to-heat and other variations in properties. For types 304 and 316 stainless steels and for 2 1/4 Cr--1 Mo steel these uncertainties can be reduced by using regression models that include terms involving the ultimate tensile strength or 100-hr rupture strength of a given heat. A model for Alloy 800H was developed to predict the middle of the scatter band on behavior. Regression analysis of single heat data sets for a variety of materials yielded generally good results. Extrapolation of any model must be done with extreme caution. Possible metallurgical instabilities or changes in creep mechanism can cause serious errors in extrapolated results
Sparse Regression by Projection and Sparse Discriminant Analysis
Qi, Xin
2015-04-03
© 2015, © American Statistical Association, Institute of Mathematical Statistics, and Interface Foundation of North America. Recent years have seen active developments of various penalized regression methods, such as LASSO and elastic net, to analyze high-dimensional data. In these approaches, the direction and length of the regression coefficients are determined simultaneously. Due to the introduction of penalties, the length of the estimates can be far from being optimal for accurate predictions. We introduce a new framework, regression by projection, and its sparse version to analyze high-dimensional data. The unique nature of this framework is that the directions of the regression coefficients are inferred first, and the lengths and the tuning parameters are determined by a cross-validation procedure to achieve the largest prediction accuracy. We provide a theoretical result for simultaneous model selection consistency and parameter estimation consistency of our method in high dimension. This new framework is then generalized such that it can be applied to principal components analysis, partial least squares, and canonical correlation analysis. We also adapt this framework for discriminant analysis. Compared with the existing methods, where there is relatively little control of the dependency among the sparse components, our method can control the relationships among the components. We present efficient algorithms and related theory for solving the sparse regression by projection problem. Based on extensive simulations and real data analysis, we demonstrate that our method achieves good predictive performance and variable selection in the regression setting, and the ability to control relationships between the sparse components leads to more accurate classification. In supplementary materials available online, the details of the algorithms and theoretical proofs, and R codes for all simulation studies are provided.
Regression analysis for solving diagnosis problem of children's health
Cherkashina, Yu A.; Gerget, O. M.
2016-04-01
The paper presents research on the application of statistical techniques, namely regression analysis, to assess the health status of children in the neonatal period based on medical data (hemostatic parameters, blood test parameters, gestational age, vascular-endothelial growth factor) measured at 3-5 days of life. A detailed description of the studied medical data is given. A binary logistic regression procedure is discussed, and the basic results of the research are presented. A classification table of predicted and observed values is shown, and the overall percentage of correct recognition is determined. Regression equation coefficients are calculated, and the general regression equation is written based on them. Based on the results of logistic regression, ROC analysis was performed: the sensitivity and specificity of the model are calculated and ROC curves are constructed. These mathematical techniques allow diagnostics of children's health with a high quality of recognition. The results make a significant contribution to the development of evidence-based medicine and have high practical importance in the professional activity of the author.
QSAR study of prolylcarboxypeptidase inhibitors by genetic algorithm: Multiple linear regressions
Eslam Pourbasheer; Saadat Vahdani; Reza Aalizadeh; Alireza Banaei; Mohammad Reza Ganjali
2015-07-01
A predictive analysis based on quantitative structure activity relationships (QSAR) was performed on benzimidazolepyrrolidinyl amides as prolylcarboxypeptidase (PrCP) inhibitors. Molecules were represented by chemical descriptors that encode constitutional, topological, geometrical, and electronic structure features. The hierarchical clustering method was used to classify the dataset into training and test subsets. The important descriptors were selected with the aid of the genetic algorithm method. The QSAR model was constructed using multiple linear regression (MLR), and its robustness and predictability were verified by internal and external cross-validation methods. Furthermore, the calculation of the domain of applicability defines the area of reliable predictions. The root mean square errors (RMSE) of the training set and the test set for the GA-MLR model were calculated to be 0.176 and 0.279, and the correlation coefficients (R²) were 0.839 and 0.923, respectively. The proposed model has good stability, robustness and predictability when verified by internal and external validation.
Multiple regression as a preventive tool for determining the risk of Legionella spp.
Enrique Gea-Izquierdo
2012-04-01
To determine the interrelationship between health and hygiene conditions for prevention of legionellosis, the composition of materials used in water distribution systems, the water origin and Legionella pneumophila risk. Material and methods. A descriptive study and multiple regression analysis were carried out on a sample of golf course sprinkler irrigation systems (n=31) pertaining to hotels located on the Costa del Sol (Malaga, Spain). The study was carried out in 2009. Results. There was a significant linear relation, with all the independent variables contributing significantly (p<0.05) to the model fit. The relationship between water type and the risk of Legionella, as well as between the material composition and that risk, is linear and positive. In contrast, the relationship between health-hygiene conditions and Legionella risk is linear and negative. Conclusion. The characterization of Legionella pneumophila concentration, as defined by the risk in water and through use of the predictive method, can contribute to the consideration of new influence variables in the development of the agent, resulting in improved control and prevention of the disease.
Carlos Monge Perry
2014-07-01
Structural equation modeling (SEM) has traditionally been deployed in areas of marketing, consumer satisfaction and preferences, human behavior, and recently in strategic planning. These areas are considered its niches; however, there is a remarkable tendency in empirical research studies toward a more diversified use of the technique. This paper shows the application of structural equation modeling using partial least squares (PLS-SEM) in areas of manufacturing, quality, continuous improvement, operational efficiency, and environmental responsibility in Mexico’s medium and large manufacturing plants, while using a small sample (n = 40). The results obtained from the PLS-SEM model application are highly positive, relevant, and statistically significant. Also shown in this paper, for purposes of confirming the validity, reliability, and statistical power of PLS-SEM, is a comparative analysis against multiple regression showing very similar results to those obtained by PLS-SEM. This fact validates the use of PLS-SEM in nontraditional areas of scientific research, and invites the use of the technique in diversified fields of scientific research.
Aerosol optical depth (AOD) from AERONET data has a very fine resolution, but air pollution index (API), visibility and relative humidity from ground-truth measurements are coarse. To obtain the local AOD in the atmosphere, the relationship between AOD and these three parameters was determined using multiple regression analysis. Data from the southwest monsoon period (August to September 2012) taken in Penang, Malaysia, were used to establish a quantitative relationship in which the AOD is modeled as a function of API, relative humidity, and visibility. The model with the highest correlation was used to predict AOD values during the southwest monsoon period. When aerosol is not uniformly distributed in the atmosphere, the predicted AOD can deviate strongly from the measured values. These deviating data can therefore be removed by comparing the predicted AOD values with the actual AERONET data, which helps to investigate whether the non-uniform source of the aerosol is the ground surface or a higher altitude level. This model can accurately predict AOD only if the aerosol is uniformly distributed in the atmosphere. However, further study is needed to determine whether this model is suitable for predicting AOD not only in Penang but also in other states in Malaysia or even globally.
Yuliastuti Ramadhani
2011-06-01
Productivity is generally interpreted as the relation between input and output, that is, the comparison between the input and the resulting output. The measurement of productivity is one of the major indicators in assessing a company's ability to compete. PT Taman Batu Alam is a natural stone company that, as it grows, continually strives to increase productivity by making improvements in production. The measurement and performance analysis of the transformation process are done using multiple regression analysis. This model was selected because its form is simple and easy to comprehend. It directly gives the performance measures, namely the efficiency index and the production function, which shows the elasticity of the inputs used to produce the output. From the calculation results, the efficiency index is 5.57 for 2007 and 1094.44 for 2008. Returns to scale were increasing in 2007 and decreasing in 2008. The input elasticities for 2007 are 0.39 for raw material, 0.22 for labour and 0.42 for overhead expense, while for 2008 they are 0.39 for raw material, 0.165 for labour and 0.237 for overhead expense.
Spatial data, analysis, and regression - a mini course
Daniel Arribas-Bel
2014-10-01
This resource contains the materials and structure suggested to run a mini course of approximately 14 hours on spatial data, analysis and regression. The course is structured along four lectures and four labs that require the use of computers.
Exploratory regression analysis: a tool for selecting models and determining predictor importance.
Braun, Michael T; Oswald, Frederick L
2011-06-01
Linear regression analysis is one of the most important tools in a researcher's toolbox for creating and testing predictive models. Although linear regression analysis indicates how strongly a set of predictor variables, taken together, will predict a relevant criterion (i.e., the multiple R), the analysis cannot indicate which predictors are the most important. Although there is no definitive or unambiguous method for establishing predictor variable importance, there are several accepted methods. This article reviews those methods for establishing predictor importance and provides a program (in Excel) for implementing them (available for direct download at http://dl.dropbox.com/u/2480715/ERA.xlsm?dl=1). The program investigates all 2^p - 1 submodels and produces several indices of predictor importance. This exploratory approach to linear regression, similar to other exploratory data analysis techniques, has the potential to yield both theoretical and practical benefits. PMID:21298571
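As an illustration of what investigating every submodel of p predictors involves, the following is a minimal sketch, not the Excel program described above. It fits ordinary least squares on each non-empty subset of predictors and records R². The helper names (`solve`, `r_squared`, `all_submodels`) and the toy data are assumptions made for the example, and R² is only one of several possible importance indices.

```python
from itertools import combinations


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(columns, y):
    """R^2 of an OLS fit of y on an intercept plus the given predictor columns."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    # Normal equations (X'X) beta = X'y; adequate for a small, well-conditioned toy case.
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * y[i] for i, row in enumerate(design)) for a in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def all_submodels(predictors, y):
    """Map each non-empty subset of predictor names to the R^2 of its OLS fit."""
    names = sorted(predictors)
    results = {}
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            results[subset] = r_squared([predictors[name] for name in subset], y)
    return results


# Hypothetical toy data: three predictors, six observations.
predictors = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "x2": [2.0, 1.0, 4.0, 3.0, 6.0, 5.0],
    "x3": [0.5, 1.5, 0.2, 1.1, 0.9, 0.3],
}
y = [3.1, 3.9, 7.2, 6.8, 11.1, 10.9]
results = all_submodels(predictors, y)
for subset, value in sorted(results.items(), key=lambda kv: -kv[1]):
    print(subset, round(value, 4))
```

With p = 3 the loop visits 2^3 - 1 = 7 submodels. The count doubles with every added predictor, which is why this exhaustive approach is practical only for moderate p.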
Much attention is focused on increasing the energy efficiency to decrease fuel costs and CO2 emissions throughout industrial sectors. The ORC (organic Rankine cycle) is a relatively simple but efficient process that can be used for this purpose by converting low and medium temperature waste heat to power. In this study we propose four linear regression models to predict the maximum obtainable thermal efficiency for simple and recuperated ORCs. A previously derived methodology is able to determine the maximum thermal efficiency among many combinations of fluids and processes, given the boundary conditions of the process. Hundreds of optimised cases with varied design parameters are used as observations in four multiple regression analyses. We analyse the model assumptions, prediction abilities and extrapolations, and compare the results with recent studies in the literature. The models are in agreement with the literature, and they present an opportunity for accurate prediction of the potential of an ORC to convert heat sources with temperatures from 80 to 360 °C, without detailed knowledge or need for simulation of the process. Highlights: The maximum thermal efficiency of ORCs in hundreds of cases was analysed. Multiple regression models were derived to predict the maximum obtainable efficiency of ORCs. Using only key design parameters, the maximum obtainable efficiency can be evaluated. The regression models decrease the resources needed to evaluate the maximum potential. The models are statistically strong and in good agreement with the literature.
User's Guide to the Weighted-Multiple-Linear Regression Program (WREG version 1.0)
Eng, Ken; Chen, Yin-Yu; Kiang, Julie E.
2009-01-01
Streamflow is not measured at every location in a stream network. Yet hydrologists, State and local agencies, and the general public still seek to know streamflow characteristics, such as mean annual flow or flood flows with different exceedance probabilities, at ungaged basins. The goals of this guide are to introduce and familiarize the user with the weighted multiple-linear regression (WREG) program, and to also provide the theoretical background for program features. The program is intended to be used to develop a regional estimation equation for streamflow characteristics that can be applied at an ungaged basin, or to improve the corresponding estimate at continuous-record streamflow gages with short records. The regional estimation equation results from a multiple-linear regression that relates the observable basin characteristics, such as drainage area, to streamflow characteristics.
Single image super-resolution using locally adaptive multiple linear regression.
Yu, Soohwan; Kang, Wonseok; Ko, Seungyong; Paik, Joonki
2015-12-01
This paper presents a regularized superresolution (SR) reconstruction method using locally adaptive multiple linear regression to overcome the limitation of spatial resolution of digital images. In order to make the SR problem better-posed, the proposed method incorporates the locally adaptive multiple linear regression into the regularization process as a local prior. The local regularization prior assumes that the target high-resolution (HR) pixel is generated by a linear combination of similar pixels in differently scaled patches and optimum weight parameters. In addition, we adapt a modified version of the nonlocal means filter as a smoothness prior to utilize the patch redundancy. Experimental results show that the proposed algorithm better restores HR images than existing state-of-the-art methods in the sense of the most objective measures in the literature. PMID:26831381
Mass estimation of loose parts in nuclear power plant based on multiple regression
Based on the application of the Hilbert-Huang transform to non-stationary signals and the relation between the mass of loose parts in a nuclear power plant and the corresponding frequency content, a new method for loose part mass estimation based on the marginal Hilbert-Huang spectrum (MHS) and multiple regression is proposed in this paper. The frequency spectrum of a loose part in a nuclear power plant can be expressed by the MHS. The multiple regression model, constructed from the MHS features of the impact signals, is used to predict the unknown masses of a loose part. A simulated experiment verified that the method is feasible and that the errors of the results are acceptable.
Fundamental parameters calculations are used for the analysis of europium in the concentration range of 0.1 wt% to 30.0 wt% in the oxidic catalyst supports alumina, calcia, magnesia, lanthania, and thoria. The precision and accuracy of this method depend on how the sample matrix is defined in the fundamental parameters program and on the number and concentration of the standards used. Results comparable to the multiple regression method are obtained when the matrix stoichiometry is defined as Eu2O3 and the catalyst oxide (i.e. Al2O3 etc.). It is also necessary to use standards which bracket the europium concentration in the samples. When these conditions are met, the results are comparable to those obtained from a ten point multiple regression calibration curve but with a considerable saving of standard preparation time. The precision is better than ±2% relative. The % relative difference between the fundamental parameters and multiple regression results is also 2%. Data are presented which illustrate the effect of defining the sample stoichiometry in the XRF11 computer program.
General regression neural network in energy cost analysis
Previous research on energy cost evaluation in industrial processes was carried out by the authors using variance analysis techniques (MANOVA). The results were satisfactory, and the codes developed using these techniques on process computers were capable of taking care of various factors. Nevertheless, either many hypotheses had to be made on the analytical form of the regression surfaces, or a pure MANOVA model had to be used, losing information on the possible interpolation. Moreover, the regression approach was hardly extensible to on-line acquisition of new data. In order to achieve this goal and to simplify the processing of data, we adopted neural network techniques. We tested various types of networks and found empirical evidence that the General Regression Neural Network (GRNN) structure could behave consistently better than back-propagation algorithms.
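For readers unfamiliar with the GRNN, the sketch below shows its core prediction rule in the usual textbook form: a Gaussian-kernel weighted average of the stored training targets. It is not the authors' code. The function name `grnn_predict`, the smoothing parameter `sigma` and the toy data are assumptions made for the example.

```python
import math


def grnn_predict(x, train_x, train_y, sigma):
    """Predict y at x as a Gaussian-kernel weighted average of the training targets."""
    weights = []
    for xi in train_x:
        squared_distance = sum((a - b) ** 2 for a, b in zip(x, xi))
        weights.append(math.exp(-squared_distance / (2.0 * sigma ** 2)))
    total = sum(weights)
    return sum(w * yi for w, yi in zip(weights, train_y)) / total


# Hypothetical one-dimensional training set (input vectors and targets).
train_x = [[0.0], [1.0], [2.0], [3.0]]
train_y = [1.0, 3.0, 2.0, 5.0]

sharp = grnn_predict([1.0], train_x, train_y, sigma=0.05)    # close to the stored target
smooth = grnn_predict([1.0], train_x, train_y, sigma=100.0)  # close to the overall mean

# Taking in a new observation needs no retraining: it is appended to the stored set.
train_x.append([4.0])
train_y.append(6.0)
updated = grnn_predict([3.5], train_x, train_y, sigma=0.5)
print(sharp, smooth, updated)
```

Because the "training" step is just storing observations, this structure makes no assumption about the analytical form of the regression surface and extends naturally to on-line data, the two difficulties the abstract raises for the earlier approach.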
Specification and sensitivity analysis of cross-country growth regressions
Thanasis Stengos; Theofanis P. Mamuneas; Pantelis Kalaitzidakis
2002-01-01
We compare the sensitivity analysis of cross-country growth regressions based on extreme bounds analysis to a more direct specification testing approach using non-nested hypotheses tests. The results suggest that those specifications that are adequate are also those that include two of the only few conditioning variables that are found to be robust, namely the standard deviation of inflation and the standard deviation of domestic credit.
Placca, Latevi [FC LAB., Fuel Cell System Laboratory, Rue Thierry Mieg, 90000 Belfort (France); M3M research laboratory, University of Technology of Belfort-Montbeliard, 90010 Belfort (France); CEA, LITEN, 17, Rue des Martyrs - 38000 Grenoble (France); Kouta, Raed; Charon, Willy [FC LAB., Fuel Cell System Laboratory, Rue Thierry Mieg, 90000 Belfort (France); M3M research laboratory, University of Technology of Belfort-Montbeliard, 90010 Belfort (France); Candusso, Denis [FC LAB., Fuel Cell System Laboratory, Rue Thierry Mieg, 90000 Belfort (France); INRETS, The French National Institute for Transport and Safety Research, Laboratory of New Technologies (LTN), 25 Allee des Marronniers, 78000 Versailles - Satory (France); Blachot, Jean-Francois [FC LAB., Fuel Cell System Laboratory, Rue Thierry Mieg, 90000 Belfort (France); CEA, LITEN, 17, Rue des Martyrs - 38000 Grenoble (France)
2010-05-15
Polarisation curves performed at the Fuel Cell System Laboratory (FC LAB) at Belfort on a PEM fuel cell stack using a homemade fully instrumented test bench led to more than 100 variables depending on time. Visualising and analysing all the different test variables is complex. In this work, we show how the Principal Component Analysis (PCA) method helps to explore correlations between variables and similarities between measurements at a specific sampling time (individuals). To complete this method, an empirical model of the PEM fuel cell is proposed by linking the different input parameters to the cell voltage using Multiple Linear Regression.
T. Masters
2013-01-01
The effectiveness of multiple linear regression approaches in removing solar, volcanic, and El Niño Southern Oscillation (ENSO) influences from the recent (1979-2012) surface temperature record is examined, using simple energy balance and global climate models (GCMs). These multiple regression methods are found to incorrectly diagnose the underlying signal particularly in the presence of a deceleration by generally overestimating the solar cooling contri...
Miozzo, Michele; Pulvermüller, Friedemann; Hauk, Olaf
2015-10-01
The time course of brain activation during word production has become an area of increasingly intense investigation in cognitive neuroscience. The predominant view has been that semantic and phonological processes are activated sequentially, at about 150 and 200-400 ms after picture onset. Although evidence from prior studies has been interpreted as supporting this view, these studies were arguably not ideally suited to detect early brain activation of semantic and phonological processes. We here used a multiple linear regression approach to magnetoencephalography (MEG) analysis of picture naming in order to investigate early effects of variables specifically related to visual, semantic, and phonological processing. This was combined with distributed minimum-norm source estimation and region-of-interest analysis. Brain activation associated with visual image complexity appeared in occipital cortex at about 100 ms after picture presentation onset. At about 150 ms, semantic variables became physiologically manifest in left frontotemporal regions. In the same latency range, we found an effect of phonological variables in the left middle temporal gyrus. Our results demonstrate that multiple linear regression analysis is sensitive to early effects of multiple psycholinguistic variables in picture naming. Crucially, our results suggest that access to phonological information might begin in parallel with semantic processing around 150 ms after picture onset. PMID:25005037
La Delfa, Nicholas J; Potvin, Jim R
2016-02-29
In ergonomics, strength prediction has typically been accomplished using linked-segment biomechanical models, and independent estimates of strength about each axis of the wrist, elbow and shoulder joints. It has recently been shown that multiple regression approaches, using the simple task-relevant inputs of hand location and force direction, may be a better method for predicting manual arm strength (MAS) capabilities. Artificial neural networks (ANNs) also serve as a powerful data fitting approach, but their application to occupational biomechanics and ergonomics is limited. Therefore, the purpose of this study was to perform a direct comparison between ANN and regression models, by evaluating their ability to predict MAS with identical sets of development and validation MAS data. Multi-directional MAS data were obtained from 95 healthy female participants at 36 hand locations within the reach envelope. ANN and regression models were developed using a random, but identical, sample of 85% of the MAS data (n=456). The remaining 15% of the data (n=80) were used to validate the two approaches. When compared to the development data, the ANN predictions had a much higher explained variance (90.2% vs. 66.5%) and much lower RMSD (9.3N vs. 17.2N) than the regression model. The ANN also performed better with the independent validation data (r²=78.6%, RMSD=15.1) compared to the regression approach (r²=65.3%, RMSD=18.6N). These results suggest that ANNs provide a more accurate and robust alternative to regression approaches, and should be considered more often in biomechanics and ergonomics evaluations. PMID:26876987
Early cost estimating for road construction projects using multiple regression techniques
Ibrahim Mahamid
2011-12-01
The objective of this study is to develop early cost estimating models for road construction projects using multiple regression techniques, based on 131 sets of data collected in the West Bank in Palestine. As the cost estimates are required at early stages of a project, consideration was given to the fact that the input data for the required regression model should be easily extracted from sketches or the scope definition of the project. Eleven regression models are developed to estimate the total cost of a road construction project in US dollars; 5 of them include bid quantities as input variables and 6 include road length and road width. The coefficient of determination r² for the developed models ranges from 0.92 to 0.98, which indicates that the values predicted by the models fit the real-life data well. The mean absolute percentage error (MAPE) of the developed regression models ranges from 13% to 31%. These results compare favorably with past research, which has shown that the estimate accuracy in the early stages of a project is between ±25% and ±50%.
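The two accuracy measures quoted above can be computed as in the following minimal sketch. The definitions used here are the common ones (MAPE as the mean of absolute relative errors in percent, r² as one minus the ratio of residual to total sum of squares) and are an assumption, since the abstract does not state its formulas. The cost figures are made up for illustration.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent; actual values must be non-zero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical project costs (thousand USD) and the corresponding model estimates.
actual_cost = [100.0, 200.0, 400.0]
estimated_cost = [110.0, 180.0, 400.0]

error_percent = mape(actual_cost, estimated_cost)
fit = r_squared(actual_cost, estimated_cost)
print(round(error_percent, 2), round(fit, 4))
```

The two measures answer different questions: r² summarises how much of the variation in cost the model captures, while MAPE expresses the typical size of an individual estimate's error, which is the quantity compared against the ±25% to ±50% range.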
[A Case of Spontaneous Regression of Breast Cancer with Multiple Lung Metastases].
Asano, Yuka; Kashiwagi, Shinichiro; Goto, Wataru; Kurata, Kento; Morisaki, Tamami; Noda, Satoru; Takashima, Tsutomu; Onoda, Naoyoshi; Ohsawa, Masahiko; Hirakawa, Kosei
2015-11-01
Spontaneous regression of any malignant tumor is a rare event, occurring in about 1 of 60,000-100,000 cases of malignant tumor. We report a case of spontaneous regression of breast cancer with multiple pulmonary metastases. The patient was a 73-year-old woman who complained of a left mammary mass. A tumor, approximately 2.2 cm in diameter, was palpated, and breast cancer was suspected based on ultrasound examination. Histopathological findings of the core needle biopsy specimen indicated invasive ductal carcinoma. The patient underwent partial mastectomy with axillary lymph node dissection. It was a stage ⅡB (pT2N1 [sn] M0) tumor. CT performed after adjuvant therapy confirmed the presence of multiple pulmonary metastases 6 years after surgery. We started anti-cancer therapy with TS-1; however, it was discontinued because an adverse event occurred. Half a year later, tumor shrinkage was confirmed after a recurrence. Four years and 6 months after the treatment was discontinued, the tumor continued to regress spontaneously. PMID:26805177
Principal regression analysis and the index leverage effect
Reigneron, Pierre-Alain; Allez, Romain; Bouchaud, Jean-Philippe
2011-09-01
We revisit the index leverage effect, which can be decomposed into a volatility effect and a correlation effect. We investigate the latter using a matrix regression analysis that we call 'Principal Regression Analysis' (PRA) and for which we provide some analytical (using Random Matrix Theory) and numerical benchmarks. We find that downward index trends increase the average correlation between stocks (as measured by the most negative eigenvalue of the conditional correlation matrix) and make the market mode more uniform. Upward trends, on the other hand, also increase the average correlation between stocks but rotate the corresponding market mode away from uniformity. There are two time scales associated with these effects, a short one on the order of a month (20 trading days), and a longer time scale on the order of a year. We also find indications of a leverage effect for sectorial correlations, which reveals itself in the second and third modes of the PRA.
A Quantile Regression Analysis of Micro-lending's Poverty Impact
Stephen W. Polk; Johnson, Daniel K.N.
2012-01-01
This paper aims to evaluate the impact of a microlending program on ameliorating measured poverty within its client population, with the aim of improving that impact. We analyze over 18,000 women micro-finance clients of the Negros Women for Tomorrow Foundation (NWTF), a database using the Progress out of Poverty (PPI) Scorecard as a measure of poverty. Analysis using both OLS and quantile multivariate regression models shows how observable borrower attributes affect the ability of clients ...
Globalisation and the welfare state - A meta-regression analysis
Kallager, Per Kristian Roko
2014-01-01
The effect of economic globalisation on the welfare state is a widely polarised debate in the scholarly literature. In essence, there are three possible effects of this relationship: economic globalisation increases welfare, decreases welfare or has no effect. By applying meta-regression analysis to 33 empirical studies, this thesis concludes that globalisation has a positive effect on the welfare state, although it is quite small. Moreover, the thesis finds that publication bias is not ...
Regression tree analysis for predicting slaughter weight in broilers
Erkut Akkartal; Mehmet Mendeş
2010-01-01
In this study, Regression Tree Analysis (RTA) was used to predict and to determine the most important variables in predicting the slaughter weight of Ross 308 broiler chickens. Data for this study came from 224 chickens raised during three different seasons, namely spring (n=66), summer (n=66), winter (n=92). Second week body weight, shank length, shank width, breast bone length, breast width, breast circumference and body length were used to predict the slaughter weight. Results of RTA showe...
Künzi Niklaus
2002-01-01
A random regression model for daily feed intake and a conventional multiple trait animal model for the four traits average daily gain on test (ADG), feed conversion ratio (FCR), carcass lean content and meat quality index were combined to analyse data from 1 449 castrated male Large White pigs performance tested in two French central testing stations in 1997. Group housed pigs fed ad libitum with electronic feed dispensers were tested from 35 to 100 kg live body weight. A quadratic polynomial in days on test was used as a regression function for weekly means of daily feed intake and to describe its residual variance. The same fixed (batch) and random (additive genetic, pen and individual permanent environmental) effects were used for regression coefficients of feed intake and single measured traits. Variance components were estimated by means of a Bayesian analysis using Gibbs sampling. Four Gibbs chains were run for 550 000 rounds each, from which 50 000 rounds were discarded as the burn-in period. Estimates of posterior means of covariance matrices were calculated from the remaining two million samples. Low heritabilities of linear and quadratic regression coefficients and their unfavourable genetic correlations with other performance traits reveal that altering the shape of the feed intake curve by direct or indirect selection is difficult.
Dragomir, Carmelia Mariana; Voiculescu, Mirela; Constantin, Daniel-Eduard; Georgescu, Lucian Puiu
2015-12-01
The probability of exceeding EU limit values for NO2 concentrations has increased in many European cities. Meteorological parameters have an extremely important role in evaluating the dispersion of pollutants in various city areas. This paper focuses on meteorological variations and their impact on urban background NO2 concentrations in the city of Braila for 2009-2013. The dependence between measured NO2 data and meteorological parameters is analyzed using two modeling methods: multiple linear regression and artificial neural networks. The results calculated using the proposed models indicate that artificial neural networks can be applied in the analysis and forecasting of air quality.
Poisson Regression Analysis of Illness and Injury Surveillance Data
Frome E.L., Watkins J.P., Ellis E.D.
2012-12-12
The Department of Energy (DOE) uses illness and injury surveillance to monitor morbidity and assess the overall health of the work force. Data collected from each participating site include health events and a roster file with demographic information. The source data files are maintained in a relational data base, and are used to obtain stratified tables of health event counts and person time at risk that serve as the starting point for Poisson regression analysis. The explanatory variables that define these tables are age, gender, occupational group, and time. Typical response variables of interest are the number of absences due to illness or injury, i.e., the response variable is a count. Poisson regression methods are used to describe the effect of the explanatory variables on the health event rates using a log-linear main effects model. Results of fitting the main effects model are summarized in a tabular and graphical form and interpretation of model parameters is provided. An analysis of deviance table is used to evaluate the importance of each of the explanatory variables on the event rate of interest and to determine if interaction terms should be considered in the analysis. Although Poisson regression methods are widely used in the analysis of count data, there are situations in which over-dispersion occurs. This could be due to lack-of-fit of the regression model, extra-Poisson variation, or both. A score test statistic and regression diagnostics are used to identify over-dispersion. A quasi-likelihood method of moments procedure is used to evaluate and adjust for extra-Poisson variation when necessary. Two examples are presented using respiratory disease absence rates at two DOE sites to illustrate the methods and interpretation of the results. In the first example the Poisson main effects model is adequate. 
In the second example the score test indicates considerable over-dispersion and a more detailed analysis attributes the over-dispersion to extra-Poisson variation. The R open source software environment for statistical computing and graphics is used for analysis. Additional details about R and the data that were used in this report are provided in an Appendix. Information on how to obtain R and utility functions that can be used to duplicate results in this report are provided.
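The over-dispersion adjustment described in this abstract can be illustrated with a minimal sketch. It assumes the common method-of-moments estimate (Pearson chi-square divided by the residual degrees of freedom) and is written in Python rather than the R used in the report; the function names and the input numbers are hypothetical and are not taken from the report.

```python
import math


def pearson_dispersion(observed, fitted, n_params):
    """Method-of-moments dispersion estimate: Pearson chi-square / residual df.

    observed : event counts per stratum
    fitted   : Poisson means fitted by the model for the same strata
    n_params : number of estimated regression parameters
    """
    if len(observed) != len(fitted):
        raise ValueError("observed and fitted must have the same length")
    residual_df = len(observed) - n_params
    if residual_df <= 0:
        raise ValueError("need more strata than parameters")
    chi_square = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    return chi_square / residual_df


def adjust_standard_errors(std_errors, dispersion):
    """Quasi-likelihood adjustment: scale Poisson standard errors by sqrt(dispersion)."""
    scale = math.sqrt(dispersion)
    return [se * scale for se in std_errors]


# Hypothetical counts and fitted means for four strata, two parameters.
phi = pearson_dispersion([4, 6, 10, 20], [5.0, 5.0, 10.0, 20.0], n_params=2)
print("estimated dispersion:", phi)
```

A value well above 1 would suggest extra-Poisson variation, in which case the standard errors from the Poisson fit would be inflated by the square root of the estimate.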
Karpechko, Aleyey; Maraun, Douglas; Eyring, Veronika
2014-05-01
Accurate projections of stratospheric ozone are required, because ozone changes impact on exposures to ultraviolet radiation and on tropospheric climate. Unweighted multi-model ensemble mean (uMMM) projections from chemistry-climate models (CCMs) are commonly used to project ozone in the 21st century, when ozone-depleting substances are expected to decline and greenhouse gases are expected to rise. Here, we address the question of whether Antarctic total column ozone projections in October given by the uMMM of CCM simulations can be improved by using a process-oriented multiple diagnostic ensemble regression (MDER) method. This method is based on the correlation between simulated future ozone and selected key processes relevant for stratospheric ozone under present-day conditions. The regression model is built using an algorithm that selects those process-oriented diagnostics which explain a significant fraction of the spread in the projected ozone among the CCMs. The regression model with observed diagnostics is then used to predict future ozone and associated uncertainty. The precision of our method is tested in a pseudo-reality, i.e. the prediction is validated against an independent CCM projection used to replace unavailable future observations. The test shows that MDER has a higher precision than uMMM, suggesting an improvement in the estimate of future Antarctic ozone. Our method projects that Antarctic total ozone will return to 1980 values around 2060 with the 95% confidence interval ranging from 2040 to 2080. This reduces the range of return dates across the ensemble of CCMs by more than a decade and suggests that the earliest simulated return dates are unlikely. Karpechko, Maraun and Eyring (2013) Improving Antarctic Total Ozone Projections by a Process-Oriented Multiple Diagnostic Ensemble Regression, J. Atmos. Sci. 70: 3959-3976
ELMAZ, Özkan; DİKMEN, Serdal; CİRİT, Ümit; DEMİR, Hıdır
2008-01-01
The relationship between the prepubertal body weight, testicular size, testosterone concentration, and postpubertal reproductive function was investigated in Kıvırcık ram lambs. The body weight, testicular size, and testosterone concentration were measured every 20 days between 60 and 420 days of age. Semen was collected from the ram lambs at 7, 8, 9, 10, 11, 12, 13 and 14 months of age. Data obtained were analyzed by best subsets regression model. We determined that body weight, scrotal circ...
Regression analysis for wavefront fitting with Zernike polynomials
Qi, Bo; Chen, Hongbin; Ma, Jiaguang; Dong, Nengli
2004-01-01
Many approaches to computing the wavefront of an interferometer have been devised, for example the least squares method, the Gram-Schmidt method, the covariance matrix method and the SVD method, but one of the most interesting is based on Zernike polynomials. Zernike polynomials are ideal for fitting the measured data points in a wavefront to a two-dimensional polynomial, due to their orthogonal properties. The key problem of wavefront fitting is how to express the whole wavefront exactly. Established algorithms use a fixed number of Zernike modes; for example, most analysis software uses 36 Zernike polynomials (e.g., Metropro of Zygo). When high spatial frequency aberrations are analyzed, the result is not accurate. We develop a method of wavefront fitting with regression analysis. Regression analysis is among the most widely used techniques in statistics for investigating and modeling the relationship between variables. With stepwise regression we obtain the optimum combination of modes, and the wavefront can be exactly expressed.
Tolosana-Delgado, R.; von Eynatten, H.
2010-05-01
Modern geochemical data sets have typically around 20-30 compositional variables measured on some tens or hundreds of samples. A statistical analysis of data sets with so many variables should take as a priority the reduction of dimensionality of the model, in order to increase its reliability and enhance its interpretation. In the framework of compositional data analysis with multiple regression, such simplification can be achieved taking some geometric concepts into account. First, the sample space of compositions, the simplex, is given an Euclidean space structure by the compositional operations of perturbation, powering and Aitchison inner product. Then, given some qualitative information on which subcompositions might depend on each explanatory variable, one can decompose the simplex in a set of orthogonal subspaces, in such a way that the composition projected onto each subspace is independent of a subset of the explanatory variables. This is achieved with a series of singular value decomposition computations. The method is applied to a data set of 88 observations of six major oxides in molar proportions, from modern glacial and fluvio-glacial sediments, with grain size ranging from coarse sand to clay. The goal is to assess the influence of chemical weathering processes (expected to impose a linear relation of composition and grain size) against purely physical processes (expected to show step-wise functions following the largest characteristic crystal sizes of specific minerals in the source rock). We exhaustively explore all patterns of uncorrelation of the composition with three explanatory variables: grain size in ϕ scale, and two step functions for the silt and clay domains. The best pattern, chosen with a likelihood ratio test, has only a smooth trend of (Mg,Fe) vs. (Al,K,Ca+Na) enrichment towards finer grain sizes—explained as differential mechanical behaviour of phyllosilicates vs. 
feldspar—and coefficients for the two step functions related to the sharp decrease of quartz in silt fractions, and the sudden enrichment of mafic accessory minerals, alteration products and mechanically unstable phyllosilicates in the clay fraction. We could thus be confident that weathering is almost absent in this data set.
Harrell, Jr., Frank E.
2015-01-01
This highly anticipated second edition features new chapters and sections, 225 new references, and comprehensive R software. In keeping with the previous edition, this book is about the art and science of data analysis and predictive modeling, which entails choosing and using multiple tools. Instead of presenting isolated techniques, this text emphasizes problem solving strategies that address the many issues arising when developing multivariable models using real data and not standard textbook examples. It includes imputation methods for dealing with missing data effectively, methods for fitting nonlinear relationships and for making the estimation of transformations a formal part of the modeling process, methods for dealing with "too many variables to analyze and not enough observations," and powerful model validation techniques based on the bootstrap. The reader will gain a keen understanding of predictive accuracy, and the harm of categorizing continuous predictors or outcomes. This text realistically...
Arch Height: A Regression Analysis of Different Measuring Parameters
Hironmoy Roy
2011-07-01
Rationale: To measure the height of the arch of the foot, either the standing navicular height or the talar height of the medial longitudinal arch was accepted in earlier days, whereas the ‘standing normalised navicular height’ is taken by present-day authors as the yardstick. Because these measurements are troublesome and time-consuming, we practically do not opt for them in a busy OPD schedule and instead measure the arch height in the supine posture. Objectives: This study aimed to derive the regression between the standing arch-height values and their supine counterparts, so that the former can be predicted easily from the latter. Methodology: The study was carried out among 103 adult subjects within the purview of North Bengal Medical College & Hospital. The navicular and talar heights were determined from x-ray films of their feet in supine and standing posture, and the records were analysed. Result: Statistically significant correlation, followed by regression analysis, yielded simple linear regression equations for predicting the standing arch-height values from the supine values, derived separately for males and females. Conclusion: From a known supine arch-height value, we can derive the respective standing arch height, as well as the ‘standing normalised navicular height’, indirectly, avoiding the entire troublesome maneuver in regular practice. The present study therefore recommends this method in clinical fields as a more rational and ideal approach to estimating arch height.
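The kind of simple linear regression equation this abstract refers to can be sketched as follows. The abstract does not report its coefficients or data, so the measurements below are invented purely for illustration and the function names are hypothetical; in the study a separate equation would be fitted for each sex.

```python
def fit_simple_linear(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x_new):
    """Predict the standing value from a supine value."""
    return intercept + slope * x_new


# Invented supine and standing heights (arbitrary units), one group only.
supine = [4.0, 4.5, 5.0, 5.5, 6.0]
standing = [3.1, 3.4, 3.9, 4.2, 4.6]
b0, b1 = fit_simple_linear(supine, standing)
print("predicted standing height:", predict(b0, b1, 5.2))
```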
Performance Evaluation of Button Bits in Coal Measure Rocks by Using Multiple Regression Analyses
Su, Okan
2016-02-01
Electro-hydraulic and jumbo drills are commonly used for underground coal mines and tunnel drives for the purpose of blasthole drilling and rock bolt installations. Not only machine parameters but also environmental conditions have significant effects on drilling. This study characterizes the performance of button bits during blasthole drilling in coal measure rocks by using multiple regression analyses. The penetration rate of jumbo and electro-hydraulic drills was measured in the field by employing bits in different diameters and the specific energy of the drilling was calculated at various locations, including highway tunnels and underground roadways of coal mines. Large block samples were collected from each location at which in situ drilling measurements were performed. Then, the effects of rock properties and machine parameters on the drilling performance were examined. Multiple regression models were developed for the prediction of the specific energy of the drilling and the penetration rate. The results revealed that hole area, impact (blow) energy, blows per minute of the piston within the drill, and some rock properties, such as the uniaxial compressive strength (UCS) and the drilling rate index (DRI), influence the drill performance.
Highlights: ► We obtained models for estimation of cetane number of biodiesel. ► Twenty-four neural networks using two topologies were evaluated. ► The best neural network for predicting the cetane number was selected. ► The best accuracy was obtained for the selected neural network. - Abstract: Models for estimating the cetane number of biodiesel from its fatty acid methyl ester composition using multiple linear regression and artificial neural networks were obtained in this work. To obtain the models, experimental data from literature reports covering 48 and 15 biodiesels in the modeling-training step and validation step, respectively, were taken. Twenty-four neural networks using two topologies and different algorithms for the second training step were evaluated. The model obtained using multiple regression was compared with two other models from the literature and was able to predict cetane number with 89% accuracy, with one outlier observed. A model to predict cetane number using an artificial neural network was obtained with accuracy better than 92%, except for one outlier. The best neural network to predict the cetane number was a backpropagation network (11:5:1) using the Levenberg–Marquardt algorithm for the second step of the network training and showing R = 0.9544 for the validation data.
Majumdar, Arunabha; Witte, John S; Ghosh, Saurabh
2015-12-01
Binary phenotypes commonly arise due to multiple underlying quantitative precursors and genetic variants may impact multiple traits in a pleiotropic manner. Hence, simultaneously analyzing such correlated traits may be more powerful than analyzing individual traits. Various genotype-level methods, e.g., MultiPhen (O'Reilly et al. []), have been developed to identify genetic factors underlying a multivariate phenotype. For univariate phenotypes, the usefulness and applicability of allele-level tests have been investigated. The test of allele frequency difference among cases and controls is commonly used for mapping case-control association. However, allelic methods for multivariate association mapping have not been studied much. In this article, we explore two allelic tests of multivariate association: one using a Binomial regression model based on inverted regression of genotype on phenotype (Binomial regression-based Association of Multivariate Phenotypes [BAMP]), and the other employing the Mahalanobis distance between two sample means of the multivariate phenotype vector for two alleles at a single-nucleotide polymorphism (Distance-based Association of Multivariate Phenotypes [DAMP]). These methods can incorporate both discrete and continuous phenotypes. Some theoretical properties for BAMP are studied. Using simulations, the power of the methods for detecting multivariate association is compared with the genotype-level test MultiPhen's. The allelic tests yield marginally higher power than MultiPhen for multivariate phenotypes. For one/two binary traits under recessive mode of inheritance, allelic tests are found to be substantially more powerful. All three tests are applied to two different real data and the results offer some support for the simulation study. We propose a hybrid approach for testing multivariate association that implements MultiPhen when Hardy-Weinberg Equilibrium (HWE) is violated and BAMP otherwise, because the allelic approaches assume HWE. 
PMID:26493781
Partial Functional Linear Quantile Regression for Neuroimaging Data Analysis
Yu, Dengdeng; Kong, Linglong; Mizera, Ivan
2015-01-01
We propose a prediction procedure for the functional linear quantile regression model by using partial quantile covariance techniques and develop a simple partial quantile regression (SIMPQR) algorithm to efficiently extract partial quantile regression (PQR) basis for estimating functional coefficients. We further extend our partial quantile covariance techniques to functional composite quantile regression (CQR) defining partial composite quantile covariance. There are three major contributio...
Finding determinants of audit delay by pooled OLS regression analysis
Tina Vuko; Marko Čular
2014-01-01
The aim of this paper is to investigate determinants of audit delay. Audit delay is measured as the length of time (i.e. the number of calendar days) from the fiscal year-end to the audit report date. It is important to understand factors that influence audit delay since it directly affects the timeliness of financial reporting. The research is conducted on a sample of Croatian listed companies, covering the period of four years (from 2008 to 2011). We use pooled OLS regression analysis, mode...
Multivariate study and regression analysis of gluten-free granola
Lilian Maria Pagamunici
2014-03-01
This study developed a gluten-free granola and evaluated it during storage with the application of multivariate and regression analysis of the sensory and instrumental parameters. The physicochemical, sensory, and nutritional characteristics of a product containing quinoa, amaranth and linseed were evaluated. The crude protein and lipid contents ranged from 97.49 and 122.72 g kg-1 of food, respectively. The polyunsaturated/saturated, and n-6:n-3 fatty acid ratios ranged from 2.82 and 2.59:1, respectively. Granola had the best alpha-linolenic acid content, nutritional indices in the lipid fraction, and mineral content. There were good hygienic and sanitary conditions during storage; probably due to the low water activity of the formulation, which contributed to inhibit microbial growth. The sensory attributes ranged from 'like very much' to 'like slightly', and the regression models were highly fitted and correlated during the storage period. A reduction in the sensory attribute levels and in the product physical stabilisation was verified by principal component analysis. The use of the affective test acceptance and instrumental analysis combined with statistical methods allowed us to obtain promising results about the characteristics of gluten-free granola.
Data from the Interagency Monitoring of Protected Visual Environments (IMPROVE) network are used to estimate organic mass to organic carbon (OM/OC) ratios across the United States by extending previously published multiple regression techniques. Our new methodology addresses com...
Balachandran, K.K.; Jayalakshmy, K.V.; Laluraj, C.M.; Nair, M.; Joseph, T.; Sheeba, P.
The interaction effects of abiotic processes in the production of phytoplankton in a coastal marine region off Cochin are evaluated using multiple regression models. The study shows that chlorophyll production is not limited by nutrients...
Quinino, Roberto C.; Reis, Edna A.; Bessegato, Lupercio F.
2013-01-01
This article proposes the use of the coefficient of determination as a statistic for hypothesis testing in multiple linear regression based on distributions acquired by beta sampling. (Contains 3 figures.)
Generalized Constrained Multiple Correspondence Analysis.
Hwang, Heungsun; Takane, Yoshio
2002-01-01
Proposes a comprehensive approach, generalized constrained multiple correspondence analysis, for imposing both row and column constraints on multivariate discrete data. Each set of discrete data is decomposed into several submatrices and then multiple correspondence analysis is applied to explore relationships among the decomposed submatrices.…
Arvinder Kaur
2012-01-01
Software estimation techniques present an inclusive set of directives for software project developers, project managers and management in order to produce more accurate estimates or predictions for future developments. The estimates also facilitate the allocation of resources for software development. Estimates also smooth the process of re-planning, prioritizing, classification and reuse of projects. Various estimation models are widely used in industry as well as for research purposes. Several comparative studies have been carried out on them, but choosing the best technique is quite intricate. Estimation by Analogy (EbA) is the method of making estimations based on the outcomes of the k most analogous projects. The projects close in distance are potentially similar to the reference project from the repository of projects. This method has been widely accepted and is quite popular because it imitates the inherent human judgment skill of estimating from analogous projects. In this paper, Grey Relational Analysis (GRA) is used as the method for feature selection and also for locating the closest analogous projects to the reference project from the set of projects. The closest k projects are then used to build regression models. Regression techniques such as multiple linear regression, stepwise regression and robust regression are used to find the effort from the closest projects.
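The project-selection step described in this abstract can be sketched with the textbook form of the grey relational grade. This is an illustration under stated assumptions, not the authors' implementation: the min-max normalisation, the distinguishing coefficient of 0.5, the function names and the project feature values are all assumed here, and the later regression step is left out.

```python
def grey_relational_grades(reference, candidates, zeta=0.5):
    """Grey relational grade of each candidate project against the reference.

    Features are min-max normalised across all projects; zeta is the
    distinguishing coefficient (0.5 is a conventional choice, assumed here).
    """
    rows = [reference] + candidates
    n_features = len(reference)
    lows = [min(r[j] for r in rows) for j in range(n_features)]
    highs = [max(r[j] for r in rows) for j in range(n_features)]

    def normalise(row):
        return [
            (row[j] - lows[j]) / (highs[j] - lows[j]) if highs[j] > lows[j] else 0.0
            for j in range(n_features)
        ]

    ref = normalise(reference)
    deltas = [[abs(ref[j] - v) for j, v in enumerate(normalise(c))] for c in candidates]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0:
        # Every candidate is identical to the reference after normalisation.
        return [1.0] * len(candidates)
    return [
        sum((d_min + zeta * d_max) / (d + zeta * d_max) for d in row) / n_features
        for row in deltas
    ]


def k_closest(reference, candidates, k, zeta=0.5):
    """Indices of the k candidates with the highest grey relational grade."""
    grades = grey_relational_grades(reference, candidates, zeta)
    order = sorted(range(len(candidates)), key=lambda i: grades[i], reverse=True)
    return order[:k]


# Hypothetical projects described by two features (e.g. size and team experience).
reference_project = [10.0, 5.0]
repository = [[10.0, 5.0], [20.0, 1.0], [11.0, 5.0]]
print(k_closest(reference_project, repository, k=2))
```

The effort values of the selected projects would then be the training data for whichever regression technique is chosen.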
Kolasa-Wiecek, Alicja
2015-04-01
The energy sector in Poland is the source of 81% of greenhouse gas (GHG) emissions. Poland, among other European Union countries, occupies a leading position with regard to coal consumption. The Polish energy sector actively participates in efforts to reduce GHG emissions to the atmosphere through a gradual decrease of the share of coal in the fuel mix and the development of renewable energy sources. All evidence that adds to the knowledge about issues related to GHG emissions is a valuable source of information. The article presents the results of modeling GHG emissions generated by the energy sector in Poland. For a better understanding of the quantitative relationship between total consumption of primary energy and greenhouse gas emissions, a multiple stepwise regression model was applied. The modeling results for CO2 emissions demonstrate a strong relationship (0.97) with the hard coal consumption variable. The fit of the model to actual data is high and equal to 95%. The backward stepwise regression model, in the case of CH4 emissions, indicated hard coal (0.66), peat and fuel wood (0.34), and solid waste fuels, as well as other sources (-0.64), as the most important variables. The adjusted coefficient is suitable and equals R2=0.90. For N2O emission modeling, the obtained coefficient of determination is low and equal to 43%. A significant variable influencing the amount of N2O emissions is peat and wood fuel consumption. PMID:25872708
Avval Zhila Mohajeri
2015-01-01
This paper deals with developing a linear quantitative structure-activity relationship (QSAR) model for predicting the RSK inhibition activity of some new compounds. A dataset consisting of 62 pyrazino [1,2-α] indole, diazepino [1,2-α] indole, and imidazole derivatives with known inhibitory activities was used. The multiple linear regression (MLR) technique, combined with the stepwise (SW) and genetic algorithm (GA) methods as variable selection tools, was employed. To further check the stability, robustness and predictability of the proposed models, internal and external validation techniques were used. Comparison of the results obtained indicates that the GA-MLR model is superior to the SW-MLR model and that it is applicable for designing novel RSK inhibitors.
We report two cases of spontaneous regression of multiple pulmonary metastases occurring after radiofrequency ablation (RFA) of a single lung metastasis. To the best of our knowledge, these are the first such cases reported. These two patients presented with lung metastases progressive despite treatment with interleukin-2, interferon, or sorafenib but were safely ablated with percutaneous RFA under computed tomography guidance. Percutaneous RFA allowed control of the targeted tumors for >1 year. Distant lung metastases presented an objective response despite the fact that they received no targeted local treatment. Local ablative techniques, such as RFA, induce the release of tumor-degradation product, which is probably responsible for an immunologic reaction that is able to produce a response in distant tumors.
Regression analysis exploring teacher impact on student FCI post scores
Mahadeo, Jonathan V.; Manthey, Seth R.; Brewe, Eric
2013-01-01
High School Modeling Workshops are designed to improve high school physics teachers' understanding of physics and how to teach using the Modeling method. The basic assumption is that the teacher plays a critical role in their students' physics education. This study investigated teacher impacts on students' Force Concept Inventory scores, (FCI), with the hopes of identifying quantitative differences between teachers. This study examined student FCI scores from 18 teachers with at least a year of teaching high school physics. This data was then evaluated using a General Linear Model (GLM), which allowed for a regression equation to be fitted to the data. This regression equation was used to predict student post FCI scores, based on: teacher ID, student pre FCI score, gender, and representation. The results show 12 out of 18 teachers significantly impact their student post FCI scores. The GLM further revealed that of the 12 teachers only five have a positive impact on student post FCI scores. Given these differences among teachers it is our intention to extend our analysis to investigate pedagogical differences between them.
Parameters derived from computer analysis of digital radio-frequency (rf) ultrasound scan data of untreated uveal malignant melanomas were examined for correlations with tumor regression following cobalt-60 plaque. Parameters included tumor height, normalized power spectrum and acoustic tissue type (ATT). Acoustic tissue type was based upon discriminant analysis of tumor power spectra, with spectra of tumors of known pathology serving as a model. Results showed ATT to be correlated with tumor regression during the first 18 months following treatment. Tumors with ATT associated with spindle cell malignant melanoma showed over twice the percentage reduction in height as those with ATT associated with mixed/epithelioid melanomas. Pre-treatment height was only weakly correlated with regression. Additionally, significant spectral changes were observed following treatment. Ultrasonic spectrum analysis thus provides a noninvasive tool for classification, prediction and monitoring of tumor response to cobalt-60 plaque
Objective: To analyze the correlations between liver lipid level determined by liver 3.0 T 1H-MRS in vivo and influencing factors using multiple linear stepwise regression. Methods: The prospective study of liver 1H-MRS was performed with a 3.0 T system and eight-channel torso phased-array coils using the PRESS sequence. Forty-four volunteers were enrolled in this study. Liver spectra were collected with a TR of 1500 ms, TE of 30 ms, volume of interest of 2 cm × 2 cm × 2 cm, NSA of 64 times. The acquired raw proton MRS data were processed by using the software program SAGE. For each MRS measurement, using water as the internal reference, the amplitude of the lipid signal was normalized to the sum of the signal from lipid and water to obtain percentage lipid within the liver. The statistical description of height, weight, age, BMI, line width and water suppression was recorded, and Pearson analysis was applied to test their relationships. Multiple linear stepwise regression was used to set the statistical model for the prediction of liver lipid content. Results: Age (39.1 ± 12.6) years, body weight (64.4 ± 10.4) kg, BMI (23.3 ± 3.1) kg/m2, line width (18.9 ± 4.4) and water suppression (90.7 ± 6.5)% had significant correlation with liver lipid content (0.00 to 0.96%, median 0.02%); r values were 0.11, 0.44, 0.40, 0.52, -0.73, respectively (P<0.05). But only age, BMI, line width, and water suppression entered into the multiple linear regression equation. The liver lipid content prediction equation was as follows: Y = 1.395 - (0.021 × water suppression) + (0.022 × BMI) + (0.014 × line width) - (0.004 × age), and the coefficient of determination was 0.613; the corrected coefficient of determination was 0.59. Conclusion: The regression model fitted well, since the variables of age, BMI, line width, and water suppression can explain about 60% of liver lipid content changes. (authors)
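The prediction equation quoted in this abstract can be written as a small function, reading each coefficient and the variable next to it as a product. The units are assumed to follow the abstract (water suppression in percent, BMI in kg/m2, age in years, line width in the abstract's unstated unit); the function name and the input values are hypothetical, and the sketch only evaluates the published coefficients without validating them.

```python
def predict_liver_lipid(water_suppression_pct, bmi, line_width, age_years):
    """Evaluate the abstract's stepwise regression equation for liver lipid content.

    Y = 1.395 - 0.021 * water suppression + 0.022 * BMI
              + 0.014 * line width - 0.004 * age
    """
    return (
        1.395
        - 0.021 * water_suppression_pct
        + 0.022 * bmi
        + 0.014 * line_width
        - 0.004 * age_years
    )


# Hypothetical volunteer, not a subject from the study.
estimate = predict_liver_lipid(
    water_suppression_pct=90.0, bmi=25.0, line_width=20.0, age_years=40.0
)
print("predicted liver lipid content:", round(estimate, 3))
```

With a reported coefficient of determination of about 0.6, an individual prediction from this equation should be read as a rough estimate.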
Multiple timescale spectral analysis
Denoël, Vincent
2015-01-01
Abstract Spectral analysis is a classical tool for the structural analysis of structures subjected to random excitations. The most common application of spectral analysis is the determination of the steady-state second order cumulant of a linear oscillator, under the action of a stationary loading prescribed by means of its power spectral density. There exists however a broad variety of such similar problems, extending the concept to multi degree-of-freedom systems, non Gaussian excitation, s...
A Visual Analytics Approach for Correlation, Classification, and Regression Analysis
Steed, Chad A [ORNL; SwanII, J. Edward [Mississippi State University (MSU); Fitzpatrick, Patrick J. [Mississippi State University (MSU); Jankun-Kelly, T.J. [Mississippi State University (MSU)
2012-02-01
New approaches that combine the strengths of humans and machines are necessary to equip analysts with the proper tools for exploring today's increasingly complex, multivariate data sets. In this paper, a novel visual data mining framework, called the Multidimensional Data eXplorer (MDX), is described that addresses the challenges of today's data by combining automated statistical analytics with a highly interactive parallel coordinates based canvas. In addition to several intuitive interaction capabilities, this framework offers a rich set of graphical statistical indicators, interactive regression analysis, visual correlation mining, automated axis arrangements and filtering, and data classification techniques. The current work provides a detailed description of the system as well as a discussion of key design aspects and critical feedback from domain experts.
Least Squares Adjustment: Linear and Nonlinear Weighted Regression Analysis
Nielsen, Allan Aasbjerg
2007-01-01
estimates of relevant parameters in an over-determined system of equations which may arise from deliberately carrying out more measurements than actually needed to determine the set of desired parameters. An example may be the determination of a geographical position based on information from a number of...... Global Navigation Satellite System (GNSS) satellites also known as space vehicles (SV). It takes at least four SVs to determine the position (and the clock error) of a GNSS receiver. Often more than four SVs are used and we use adjustment to obtain a better estimate of the geographical position (and the...... different variables in an experiment or in a survey, etc. Regression analysis is probably one the most used statistical techniques around. Dr. Anna B. O. Jensen provided insight and data for the Global Positioning System (GPS) example. Matlab code and sections that are considered as either traditional land...
Finding determinants of audit delay by pooled OLS regression analysis
Tina Vuko
2014-03-01
The aim of this paper is to investigate determinants of audit delay. Audit delay is measured as the length of time (i.e. the number of calendar days) from the fiscal year-end to the audit report date. It is important to understand factors that influence audit delay since it directly affects the timeliness of financial reporting. The research is conducted on a sample of Croatian listed companies, covering the period of four years (from 2008 to 2011). We use pooled OLS regression analysis, modelling audit delay as a function of the following explanatory variables: audit firm type, audit opinion, profitability, leverage, inventory and receivables to total assets, absolute value of total accruals, company size and audit committee existence. Our results indicate that audit committee existence, profitability and leverage are statistically significant determinants of audit delay in Croatia.
Node-Mapping EIT Method Based on Regression Analysis
Jianjun Zhang
2012-12-01
Medical imaging intuitively shows the morphology and function of the body's internal organs. Electrical Impedance Tomography (EIT) is an emerging medical imaging technology. It has the advantages of simple structure, low cost, no radiological hazard and non-invasiveness. EIT can not only use the impedance differences between different tissues to reconstruct anatomical images, but can also achieve functional imaging from the impedance changes of tissues and organs in different physiological and pathological states, and it is suitable for long-term monitoring. The solution is approximate due to the ill-posedness of the inverse problem. Because of the conflict between image accuracy and computation speed, EIT is still unable to meet the requirements of practical application. By using a regression analysis algorithm, the Node-Mapping Method only calculates the node potentials. The speed of operation and the reconstructed image quality have been greatly improved.
A Quantile Regression Analysis of Micro-lending's Poverty Impact
Stephen W. Polk
2012-07-01
This paper aims to evaluate the impact of a microlending program on ameliorating measured poverty within its client population, with the aim of improving that impact. We analyze over 18,000 women micro-finance clients of the Negros Women for Tomorrow Foundation (NWTF), a database using the Progress out of Poverty (PPI) Scorecard as a measure of poverty. Analysis using both OLS and quantile multivariate regression models shows how observable borrower attributes affect the ability of clients to reduce their measured poverty. Loan size, duration, and the economic activity supported all have strongly identifiable effects. Moreover, estimates suggest which among the poor are receiving the greatest effective help from the program. Results offer specific advice to the NWTF and other micro-lenders: impact is greatest with fewer, larger loans in particular economic sectors (sari-sari, service and trade) but requires patience, as each additional year increases the client’s average change in poverty score.
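Quantile regression differs from OLS in its loss function: it minimises an asymmetric absolute loss (the check, or pinball, loss) in place of squared error. The sketch below shows that loss and checks, for the intercept-only case, that its minimiser is an empirical quantile. It is a generic illustration with invented numbers and hypothetical function names, not the model or data used in the paper.

```python
def pinball_loss(residual, tau):
    """Check loss for quantile level tau: tau * r if r >= 0, else (tau - 1) * r."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def best_constant(values, tau):
    """Intercept-only quantile regression by brute force.

    Searches the observed values for the constant with the smallest total
    pinball loss; the winner is an empirical tau-quantile of the sample.
    """
    return min(values, key=lambda c: sum(pinball_loss(v - c, tau) for v in values))


# Invented changes in a poverty score for five clients.
score_changes = [1.0, 2.0, 3.0, 4.0, 100.0]
print("median fit:", best_constant(score_changes, 0.5))
print("90th percentile fit:", best_constant(score_changes, 0.9))
```

Fitting at several quantile levels in this way is what lets a quantile analysis describe clients at different points of the outcome distribution, where OLS summarises only the mean.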
Logistic regression analysis on the risk factors of radiation pneumonitis
Objective: To identify the risk factors of radiation pneumonitis (RP). Methods: A retrospective study was conducted on 101 patients with radiation pneumonitis using SPSS 8.0 software. Factors evaluated included: gender, age, pathology, clinical stage, irradiation dose, irradiation field size, history of smoking, cardiovascular disease, bronchitis, surgery, chemotherapy, lung infection, atelectasis, obstructive infection and pleural effusion. Univariate analysis was performed using the Chi-Square test and multivariate analysis was performed using a logistic regression model. Results: Univariate analysis revealed a significant relationship between 10 factors and radiation pneumonitis, including pulmonary infection, atelectasis, obstructive infection, cardiovascular disease, bronchitis, chemotherapy, irradiation dose, number of days of radiation and irradiation field size. Multivariate analysis showed that 9 factors (pulmonary infection, obstructive infection, atelectasis, pleural effusion, bronchitis, cardiovascular disease, chemotherapy, irradiation dose, and irradiation field size) were independent factors. Conclusion: Comprehensive consideration of accompanying disease, chemotherapy, dose, field size, etc. during the planning of radiotherapy can minimize the possibility of developing radiation pneumonitis.
Low-Cost Housing in Sabah, Malaysia: A Regression Analysis
Dullah Mulok
2009-02-01
Low-cost housing plays a vital role in the development process, especially in providing accommodation to the less fortunate and the lower income group. This effort is also a step in overcoming the squatter problem, which could cripple the competitive drive of the local community, especially in the state of Sabah, Malaysia. This article looks into the factors influencing low-cost housing in Sabah, namely the government’s budget (allocation) for low-cost housing projects and Sabah’s total population. At the same time, this study attempts to show the implications of the development and economic crises that occurred during the period 1971 to 2000 for the provision of low-cost houses in Sabah. Empirical analyses were conducted using the multiple linear regression method, stepwise regression, and the dummy variable approach to demonstrate the link. The empirical results show that the government’s budget for low-cost housing is the main contributor to the provision of low-cost housing in Sabah. The empirical results also suggest that economic growth, namely Gross Domestic Product (GDP), did not have a significant effect on low-cost housing in Sabah. However, almost all major crises that have beset Malaysia’s economy had a significant and consistent effect on low-cost housing in Sabah, especially the financial crisis which occurred in mid 1997.
Spatial regression analysis of traffic crashes in Seoul.
Rhee, Kyoung-Ah; Kim, Joon-Ki; Lee, Young-Ihn; Ulfarsson, Gudmundur F
2016-06-01
Traffic crashes can be spatially correlated events and the analysis of the distribution of traffic crash frequency requires evaluation of parameters that reflect spatial properties and correlation. Typically this spatial aspect of crash data is not used in everyday practice by planning agencies and this contributes to a gap between research and practice. A database of traffic crashes in Seoul, Korea, in 2010 was developed at the traffic analysis zone (TAZ) level with a number of GIS developed spatial variables. Practical spatial models using available software were estimated. The spatial error model was determined to be better than the spatial lag model and an ordinary least squares baseline regression. A geographically weighted regression model provided useful insights about localization of effects. The results found that an increased length of roads with speed limit below 30km/h and a higher ratio of residents below age of 15 were correlated with lower traffic crash frequency, while a higher ratio of residents who moved to the TAZ, more vehicle-kilometers traveled, and a greater number of access points with speed limit difference between side roads and mainline above 30km/h all increased the number of traffic crashes. This suggests, for example, that better control or design for merging lower speed roads with higher speed roads is important. A key result is that the length of bus-only center lanes had the largest effect on increasing traffic crashes. This is important as bus-only center lanes with bus stop islands have been increasingly used to improve transit times. Hence the potential negative safety impacts of such systems need to be studied further and mitigated through improved design of pedestrian access to center bus stop islands. PMID:26994374
Soares dos Santos, T.; Mendes, D.; Rodrigues Torres, R.
2016-01-01
Several studies have been devoted to dynamic and statistical downscaling for analysis of both climate variability and climate change. This paper introduces an application of artificial neural networks (ANNs) and multiple linear regression (MLR) by principal components to estimate rainfall in South America. This method is proposed for downscaling monthly precipitation time series over South America for three regions: the Amazon; northeastern Brazil; and the La Plata Basin, which is one of the regions of the planet that will be most affected by the climate change projected for the end of the 21st century. The downscaling models were developed and validated using CMIP5 model output and observed monthly precipitation. We used general circulation model (GCM) experiments for the 20th century (RCP historical; 1970-1999) and two scenarios (RCP 2.6 and 8.5; 2070-2100). The model test results indicate that the ANNs significantly outperform the MLR downscaling of monthly precipitation variability.
The analysis of kernel ridge regression learning algorithm.
Pozdnoukhov, Alexei
2002-01-01
The paper presents Kernel Ridge Regression, a nonlinear extension of the well known statistical model of ridge regression. New insights on the method are also presented. In particular, the connection between ridge regression and local translation-invariant squared loss minimization algorithm is shown. An iterative training algorithm is proposed, that allows training the KRR for large datasets. The training time is empirically found to scale quadratically with the number of samples. The applic...
Risk associated with power generation must be identified to make intelligent choices between alternate power technologies. Radionuclide air stack emissions for a single coal plant and a single nuclear plant are used to compute the single plant leukemia incidence risk and total industry leukemia incidence risk. Leukemia incidence is the response variable as a function of radionuclide bone dose for the six proposed dose response curves considered. During normal operation a coal plant has higher radionuclide emissions than a nuclear plant and the coal industry has a higher leukaemia incidence risk than the nuclear industry, unless a nuclear accident occurs. Variation of nuclear accident size allows quantification of the impact of accidents on the total industry leukemia incidence risk comparison. The leukemia incidence risk is quantified as the number of accidents of a given size for the nuclear industry leukemia incidence risk to equal the coal industry leukemia incidence risk. The general linear model is used to develop equations that relate the accident frequency required for equal industry risks to the magnitude of the nuclear emission. Exploratory data analysis revealed that the relationship between the natural log of accident number versus the natural log of accident size is linear. (Author)
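The linear relationship reported above between the natural log of accident number and the natural log of accident size corresponds to a power law. As a minimal sketch of how such a fit could be checked, the following ordinary least squares fit on logged values uses synthetic data invented for illustration; the function name `fit_loglog` and all numbers are assumptions, not taken from the paper.

```python
import math

def fit_loglog(sizes, counts):
    """Ordinary least squares fit of ln(count) = intercept + slope * ln(size)."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Synthetic (hypothetical) data following an exact power law: count = 1000 / size
sizes = [1.0, 10.0, 100.0, 1000.0]
counts = [1000.0 / s for s in sizes]
intercept, slope = fit_loglog(sizes, counts)
```

A slope near a constant value across the range of sizes is what a linear log-log relationship looks like in practice; real data would also call for residual checks.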
Framing an Nuclear Emergency Plan using Qualitative Regression Analysis
Safety maintenance issues have arisen following the Fukushima disaster, and literature on disaster scenario investigation and theory development is lacking. This study deals with the difficulty of initiating research of this kind, which relates to the content and problem setting of the phenomenon. The research design therefore follows an inductive approach, in which primary findings and written reports are interpreted and codified qualitatively. These data need to be classified inductively through thematic analysis in order to develop a conceptual framework related to several theoretical lenses. Moreover, framing the expected framework of the respective emergency plan as improvised business process models involves abundant abstraction and simplification of unstructured data. The structural methods of Qualitative Regression Analysis (QRA) and the Work System snapshot were applied to form the data into the proposed model conceptualization using rigorous analyses. These methods were helpful in organising and summarizing the snapshot into an 'as-is' work system that is recommended as a 'to-be' work system for business process modelling. We conclude that these methods are useful for developing a comprehensive and structured research framework for future enhancement in business process simulation. (author)
Analysis of retirement income adequacy using quantile regression: A case study in Malaysia
Alaudin, Ros Idayuwati; Ismail, Noriszura; Isa, Zaidi
2015-09-01
Quantile regression is a statistical analysis that does not restrict attention to the conditional mean and therefore permits the approximation of the whole conditional distribution of a response variable. Quantile regression is more robust to outliers than mean regression models. In this paper, we demonstrate how the quantile regression approach can be used to analyze the ratio of projected wealth to needs (wealth-needs ratio) during retirement.
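The robustness claim can be illustrated with the check (pinball) loss that quantile regression minimizes. The sketch below is an intercept-only case, not the model used in the paper: it finds the constant minimizing the total check loss at the median and compares it with the mean when one extreme value is introduced. The wealth-needs ratios are hypothetical values made up for illustration.

```python
def pinball_loss(residual, tau):
    """Check loss at quantile level tau: tau * u if u >= 0, else (tau - 1) * u."""
    return residual * tau if residual >= 0 else residual * (tau - 1.0)

def best_constant(values, tau):
    """Return the observation that minimizes the total check loss.

    A minimizer always exists among the observed values, so it is
    enough to search over the data points themselves.
    """
    def total_loss(candidate):
        return sum(pinball_loss(v - candidate, tau) for v in values)
    return min(sorted(values), key=total_loss)

# Hypothetical wealth-needs ratios; the second list replaces the last value with an outlier
ratios = [0.4, 0.6, 0.8, 1.0, 1.2]
with_outlier = ratios[:-1] + [50.0]

median_clean = best_constant(ratios, 0.5)
median_outlier = best_constant(with_outlier, 0.5)
mean_clean = sum(ratios) / len(ratios)
mean_outlier = sum(with_outlier) / len(with_outlier)
```

In this toy case the median estimate is unchanged by the outlier while the mean shifts substantially, which is the sense in which quantile regression is described as robust.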
In environmental epidemiology, trace and toxic substance concentrations frequently have very highly skewed distributions ranging over one or more orders of magnitude, and prediction by conventional regression is often poor. Classification and Regression Tree Analysis (CART) is an alternative in such contexts. To compare the techniques, two Pennsylvania data sets and three independent variables are used: house radon progeny (RnD) and gamma levels as predicted by construction characteristics in 1330 houses; and ∼200 house radon (Rn) measurements as predicted by topographic parameters. CART may identify structural variables of interest not identified by conventional regression, and vice versa, but in general the regression models are similar. CART has major advantages in dealing with other common characteristics of environmental data sets, such as missing values, continuous variables requiring transformations, and large sets of potential independent variables. CART is most useful in the identification and screening of independent variables, greatly reducing the need for cross-tabulations and nested breakdown analyses. There is no need to discard cases with missing values for the independent variables because surrogate variables are intrinsic to CART. The tree-structured approach is also independent of the scale on which the independent variables are measured, so that transformations are unnecessary. CART identifies important interactions as well as main effects. The major advantages of CART appear to be in exploring data. Once the important variables are identified, conventional regressions seem to lead to results similar but more interpretable by most audiences. 12 refs., 8 figs., 10 tabs
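The statement that the tree-structured approach is independent of the measurement scale follows from the fact that a split depends only on the ordering of the predictor values. As a minimal sketch under that reading, the code below searches for the single split that minimizes the summed squared error and shows that a log transformation of the predictor yields the same partition. The data and the helper `best_split` are hypothetical, and a full CART implementation (recursion, pruning, surrogate splits) is not attempted.

```python
import math

def best_split(xs, ys):
    """Find the single split on x minimizing the summed squared error of y.

    Returns the set of observation indices sent to the left child.
    """
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best_sse, best_left = float("inf"), None
    for k in range(1, len(order)):
        left, right = order[:k], order[k:]
        sse = 0.0
        for group in (left, right):
            mean = sum(ys[i] for i in group) / len(group)
            sse += sum((ys[i] - mean) ** 2 for i in group)
        if sse < best_sse:
            best_sse, best_left = sse, frozenset(left)
    return best_left

# Hypothetical skewed predictor spanning several orders of magnitude
x = [0.1, 0.5, 2.0, 40.0, 300.0, 5000.0]
y = [1.0, 1.2, 0.9, 4.8, 5.1, 5.3]

split_raw = best_split(x, y)
split_log = best_split([math.log(v) for v in x], y)
```

Because the logarithm is monotone, both calls group the same observations together; only the numeric threshold reported for the split would differ.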
The Variance Normalization Method of Ridge Regression Analysis.
Bulcock, J. W.; And Others
The testing of contemporary sociological theory often calls for the application of structural-equation models to data which are inherently collinear. It is shown that simple ridge regression, which is commonly used for controlling the instability of ordinary least squares regression estimates in ill-conditioned data sets, is not a legitimate
Adaptive regression analysis: theory and applications in econometrics
J. García Pérez
2003-01-01
In this work we (a) discuss some theoretical and computational difficulties of regression analysis of dependences describing the behaviour of heterogeneous systems, (b) offer a set of new techniques adapted to regression analysis of heterogeneous dependences and (c) demonstrate the advantages of applying these new techniques in econometrics.
Design and analysis of experiments classical and regression approaches with SAS
Onyiah, Leonard C
2008-01-01
Introductory Statistical Inference and Regression Analysis Elementary Statistical Inference Regression Analysis Experiments, the Completely Randomized Design (CRD)-Classical and Regression Approaches Experiments Experiments to Compare Treatments Some Basic Ideas Requirements of a Good Experiment One-Way Experimental Layout or the CRD: Design and Analysis Analysis of Experimental Data (Fixed Effects Model) Expected Values for the Sums of Squares The Analysis of Variance (ANOVA) Table Follow-Up Analysis to Check fo
Optimization of end-members used in multiple linear regression geochemical mixing models
Dunlea, Ann G.; Murray, Richard W.
2015-11-01
Tracking marine sediment provenance (e.g., of dust, ash, hydrothermal material, etc.) provides insight into contemporary ocean processes and helps construct paleoceanographic records. In a simple system with only a few end-members that can be easily quantified by a unique chemical or isotopic signal, chemical ratios and normative calculations can help quantify the flux of sediment from the few sources. In a more complex system (e.g., each element comes from multiple sources), more sophisticated mixing models are required. MATLAB codes published in Pisias et al. solidified the foundation for application of a Constrained Least Squares (CLS) multiple linear regression technique that can use many elements and several end-members in a mixing model. However, rigorous sensitivity testing to check the robustness of the CLS model is time and labor intensive. MATLAB codes provided in this paper reduce the time and labor involved and facilitate finding a robust and stable CLS model. By quickly comparing the goodness of fit between thousands of different end-member combinations, users are able to identify trends in the results that reveal the CLS solution uniqueness and the end-member composition precision required for a good fit. Users can also rapidly check that they have the appropriate number and type of end-members in their model. In the end, these codes improve the user's confidence that the final CLS model(s) they select are the most reliable solutions. These advantages are demonstrated by application of the codes in two case studies of well-studied datasets (Nazca Plate and South Pacific Gyre).
T.A. Renaldy
2011-01-01
oped for prediction of particulate matter. The performance of the multiple regression models was assessed. For the development of neural network models, a feed-forward network with a back-propagation learning algorithm was used to train the network. The performance of the neural network was determined in terms of the correlation coefficient (R) and Mean Square Error (MSE). The optimum number of hidden neurons was determined by seeking the lowest value of MSE and the highest value of R. The results indicated that the network can predict particulate concentrations better than multiple regression models.
A flexible count data regression model for risk analysis.
Guikema, Seth D; Coffelt, Jeremy P; Goffelt, Jeremy P
2008-02-01
In many cases, risk and reliability analyses involve estimating the probabilities of discrete events such as hardware failures and occurrences of disease or death. There is often additional information in the form of explanatory variables that can be used to help estimate the likelihood of different numbers of events in the future through the use of an appropriate regression model, such as a generalized linear model. However, existing generalized linear models (GLM) are limited in their ability to handle the types of variance structures often encountered in using count data in risk and reliability analysis. In particular, standard models cannot handle both underdispersed data (variance less than the mean) and overdispersed data (variance greater than the mean) in a single coherent modeling framework. This article presents a new GLM based on a reformulation of the Conway-Maxwell Poisson (COM) distribution that is useful for both underdispersed and overdispersed count data and demonstrates this model by applying it to the assessment of electric power system reliability. The results show that the proposed COM GLM can fit overdispersed data sets as well as the commonly used existing models while outperforming these commonly used models for underdispersed data sets. PMID:18304118
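To make the dispersion argument concrete, the sketch below evaluates the Conway-Maxwell Poisson distribution in its standard parameterization, P(Y = y) proportional to lam**y / (y!)**nu, and compares mean and variance for three values of the dispersion parameter. This is the textbook form, not the reformulation proposed in the article, and the truncation point `max_count` and the parameter values are assumptions chosen for illustration.

```python
import math

def com_poisson_pmf(lam, nu, max_count=200):
    """Probabilities proportional to lam**y / (y!)**nu for y = 0..max_count."""
    # Work in log space so that large factorials do not overflow.
    log_weights = [y * math.log(lam) - nu * math.lgamma(y + 1)
                   for y in range(max_count + 1)]
    peak = max(log_weights)
    weights = [math.exp(w - peak) for w in log_weights]
    total = sum(weights)  # numerical stand-in for the normalizing constant
    return [w / total for w in weights]

def mean_and_variance(pmf):
    """Mean and variance of a distribution given as a list of probabilities."""
    mean = sum(y * p for y, p in enumerate(pmf))
    variance = sum((y - mean) ** 2 * p for y, p in enumerate(pmf))
    return mean, variance

mean_over, var_over = mean_and_variance(com_poisson_pmf(lam=2.0, nu=0.5))
mean_pois, var_pois = mean_and_variance(com_poisson_pmf(lam=2.0, nu=1.0))
mean_under, var_under = mean_and_variance(com_poisson_pmf(lam=2.0, nu=2.0))
```

With nu below 1 the variance exceeds the mean, with nu equal to 1 the distribution reduces to the Poisson, and with nu above 1 the variance falls below the mean, so a single family covers both cases described in the abstract.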
A simplified procedure of linear regression in a preliminary analysis
Silvia Facchinetti
2013-05-01
The analysis of a large statistical data-set can be guided by the study of a particularly interesting variable Y, the regressand, and an explanatory variable X, chosen among the remaining variables and jointly observed. The study gives a simplified procedure to obtain the functional link between the variables, y = y(x), by a partition of the data-set into m subsets, in which the observations are summarized by location indices (mean or median) of X and Y. Polynomial models for y(x) of order r are considered to verify the characteristics of the given procedure; in particular we assume r = 1 and 2. The distributions of the parameter estimators are obtained by simulation, when the fitting is done for m = r + 1. Comparisons of the results, in terms of distribution and efficiency, are made with the results obtained by the ordinary least squares method. The study also gives some considerations on the consistency of the estimated parameters obtained by the given procedure.
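For the linear case (r = 1, m = r + 1 = 2), one plausible reading of the procedure is sketched below: order the observations by X, split them into two subsets, summarize each subset by the means of X and Y, and pass a line through the two summary points. The abstract does not say how the partition is formed, so the equal-size split by rank of X is an assumption, as are the function name `two_group_line` and the sample data.

```python
def two_group_line(xs, ys):
    """Fit y = intercept + slope * x through the mean points of two subsets.

    Assumption: the two subsets are the lower and upper halves of the
    observations ordered by x.
    """
    pairs = sorted(zip(xs, ys))
    half = len(pairs) // 2
    centers = []
    for group in (pairs[:half], pairs[half:]):
        center_x = sum(p[0] for p in group) / len(group)
        center_y = sum(p[1] for p in group) / len(group)
        centers.append((center_x, center_y))
    (x1, y1), (x2, y2) = centers
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return intercept, slope

# Hypothetical noise-free data on the line y = 3 + 2x
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [3.0 + 2.0 * x for x in xs]
intercept, slope = two_group_line(xs, ys)
```

On noise-free data the sketch recovers the generating line; with noisy data the estimates would differ from ordinary least squares, which is the comparison the study carries out by simulation.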
Regression tree analysis for predicting slaughter weight in broilers
Erkut Akkartal
2010-01-01
In this study, Regression Tree Analysis (RTA) was used to predict and to determine the most important variables in predicting the slaughter weight of Ross 308 broiler chickens. Data for this study came from 224 chickens raised during three different seasons, namely spring (n=66), summer (n=66), and winter (n=92). Second week body weight, shank length, shank width, breast bone length, breast width, breast circumference and body length were used to predict the slaughter weight. Results of RTA showed that among the seven independent variables only four were selected, namely body weight, breast bone length, shank width, and breast circumference. These selected independent variables were more efficient than the others in predicting the slaughter weight. RTA indicated that birds with second week body weight >295.95 g, breast bone length >55.82 mm and breast circumference >14.18 cm, or with body weight ≤295.95 g, breast bone length >60.26 mm and shank width >8.32 mm, could be expected to have higher slaughter weights.
Multiple Regression Analysis Using ANCOVA in University Model
Maneesha; Priti Bajpai
2013-01-01
The government of UAE is promoting Dubai as an academic hub. Dubai International Academic City (DIAC) is a free zone area with many national and international universities promoting higher education in almost all disciplines. The aspiration of every graduating student from the university is to get a good placement. In Dubai diverse job opportunities in national and multinational organizations are available. The objective of the paper is to review the placement opportunities in Dubai for the u...
Mortaza Jamshidian
2005-01-01
The problem of simultaneous inference and multiple comparison for comparing means of k (≥ 3) populations has been long studied in the statistics literature and is widely available in statistics literature. However, to date, the problem of multiple comparison of regression models has not found its way to the software. It is only recently that the computational aspects of this problem have been resolved in a general setting. SimReg employs this new methodology and provides users with software for multiple regression of several regression models. The comparisons can be among any set of pairs, and moreover any number of predictors can be included in the model. More importantly, predictors can be constrained to their natural boundaries, if known. Computational methods for the problem of simultaneous confidence bands when predictors are constrained to intervals have also recently been addressed. SimReg utilizes this recent development to offer simultaneous confidence bands for regression models with any number of predictor variables. Again, the predictors can be constrained to their natural boundaries, which results in narrower bands compared to the case where no restriction is imposed. A by-product of these confidence bands is a new method for comparing two regression surfaces that is more informative than the usual partial F test.
Tan, K H; Sulke, N.; Taub, N A; Watts, E.; Karani, S.; Sowton, E
1993-01-01
OBJECTIVE--To study the determinants of success of coronary angioplasty in patients with chronic total occlusions, and to formulate a multiple logistic regression model to improve selection of patients. DESIGN--A retrospective analysis of clinical and angiographic data on a consecutive series of patients. PATIENTS--312 patients (mean age 55, range 31 to 79 years, 86% men) who underwent coronary angioplasty procedure for a chronic total occlusion between 1981 and 1992. RESULTS--Procedural succ...
Regression analysis of technical parameters affecting nuclear power plant performances
Since the 1980s many studies have been conducted to explain good and bad performances of commercial nuclear power plants (NPPs), but no defined correlation has yet been found to be fully representative of plant operational experience. In early works, data availability and the number of operating power stations were both limited; therefore, results suggested that specific technical characteristics of NPPs were the main causal factors for successful plant operation. Although these aspects continue to play a significant role, later studies and observations showed that other factors concerning management and organization of the plant could instead be predominant when comparing utilities' operational and economic results. Utility quality, in a word, can be used to summarize all the managerial and operational aspects that seem to be effective in determining plant performance. In this paper operational data of a consistent sample of commercial nuclear power stations, out of the total 433 operating NPPs, are analyzed, mainly focusing on the last decade of operational experience. The sample consists of PWR and BWR technology, operated by utilities located in different countries, including the U.S., Japan, France, Germany and Finland. Multivariate regression is performed using Unit Capability Factor (UCF) as the dependent variable; this factor reflects the effectiveness of plant programs and practices in maximizing the available electrical generation and consequently provides an overall indication of how well plants are operated and maintained. Aspects that may not be real causal factors but which can have a consistent impact on the UCF, such as technology design, supplier, size and age, are included in the analysis as independent variables. (authors)
Several MRI features of supratentorial astrocytomas are associated with high histologic grade by statistically significant p values. We sought to apply this information prospectively to a group of astrocytomas in the prediction of tumor grade. We used 10 MRI features of fibrillary astrocytomas from 52 patient studies to develop neural network and multiple linear regression models for practical use in predicting tumor grade. The models were tested prospectively on MR images from 29 patient studies. The performance of the models was compared against that of a radiologist. Neural network accuracy was 61 % in distinguishing between low and high grade tumors. Multiple linear regression achieved an accuracy of 59 %. Assessment of the images by a radiologist yielded 57 % accuracy. We conclude that while certain MRI parameters may be statistically related to astrocytoma histologic grade, neural network and linear regression models cannot reliably use them to predict tumor grade. (orig.)
Buck, J. A.; Underhill, P. R.; Morelli, J.; Krause, T. W.
2016-02-01
Nuclear steam generators (SGs) are a critical component for ensuring safe and efficient operation of a reactor. Life management strategies are implemented in which SG tubes are regularly inspected by conventional eddy current testing (ECT) and ultrasonic testing (UT) technologies to size flaws, and safe operating life of SGs is predicted based on growth models. ECT, the more commonly used technique, due to the rapidity with which full SG tube wall inspection can be performed, is challenged when inspecting ferromagnetic support structure materials in the presence of magnetite sludge and multiple overlapping degradation modes. In this work, an emerging inspection method, pulsed eddy current (PEC), is being investigated to address some of these particular inspection conditions. Time-domain signals were collected by an 8 coil array PEC probe in which ferromagnetic drilled support hole diameter, depth of rectangular tube frets and 2D tube off-centering were varied. Data sets were analyzed with a modified principal components analysis (MPCA) to extract dominant signal features. Multiple linear regression models were applied to MPCA scores to size hole diameter as well as size rectangular outer diameter tube frets. Models were improved through exploratory factor analysis, which was applied to MPCA scores to refine selection for regression models inputs by removing nonessential information.
Le, Huy; Marcus, Justin
2012-01-01
This study used Monte Carlo simulation to examine the properties of the overall odds ratio (OOR), which was recently introduced as an index for overall effect size in multiple logistic regression. It was found that the OOR was relatively independent of study base rate and performed better than most commonly used R-square analogs in indexing model…
Thatcher, Greg W.; Henson, Robin K.
This study examined research in training and development to determine effect size reporting practices. It focused on the reporting of corrected effect sizes in research articles using multiple regression analyses. When possible, researchers calculated corrected effect sizes and determine if the associated shrinkage could have impacted researcher
Kromrey, Jeffrey D.; Hines, Constance V.
1996-01-01
The accuracy of three analytical formulas for shrinkage estimation and four empirical techniques were investigated in a Monte Carlo study of the coefficient of cross-validity in multiple regression. Substantial statistical bias was evident for all techniques except the formula of M. W. Brown (1975) and multicross-validation. (SLD)
Buffalos milk yield analysis using random regression models
A.S. Schierholt
2010-02-01
Data comprising 1,719 milk yield records from 357 females (predominantly Murrah breed), daughters of 110 sires, with births from 1974 to 2004, obtained from the Programa de Melhoramento Genético de Bubalinos (PROMEBUL) and from records of the EMBRAPA Amazônia Oriental - EAO herd, located in Belém, Pará, Brazil, were used to compare random regression models for estimating variance components and predicting breeding values of the sires. The data were analyzed by different models using Legendre polynomial functions from second to fourth order. The random regression models included the effects of herd-year, month of parity date of the control; regression coefficients for age of females (in order to describe the fixed part of the lactation curve) and random regression coefficients related to the direct genetic and permanent environment effects. The comparisons among the models were based on the Akaike Information Criterion. The random effects regression model using third order Legendre polynomials with four classes of the environmental effect was the one that best described the additive genetic variation in milk yield. The heritability estimates varied from 0.08 to 0.40. The genetic correlation between milk yields at younger ages was close to unity, but at older ages it was low.
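Random regression models of this kind use Legendre polynomials of a standardized time or age variable as covariates. As a hedged sketch, the function below rescales an age to the interval [-1, 1] and evaluates the unnormalized Legendre polynomials up to a chosen order with the Bonnet recursion. The age bounds are hypothetical, and animal breeding software often applies an additional normalization constant that is omitted here.

```python
def legendre_covariates(age, age_min, age_max, order):
    """Legendre polynomials P_0..P_order evaluated at age rescaled to [-1, 1]."""
    t = 2.0 * (age - age_min) / (age_max - age_min) - 1.0
    values = [1.0, t]
    for n in range(1, order):
        # Bonnet recursion: (n + 1) P_{n+1}(t) = (2n + 1) t P_n(t) - n P_{n-1}(t)
        values.append(((2 * n + 1) * t * values[n] - n * values[n - 1]) / (n + 1))
    return values[: order + 1]

# Hypothetical age range of 0 to 10 (units arbitrary), third order polynomials
at_upper_bound = legendre_covariates(10.0, 0.0, 10.0, 3)
at_midpoint = legendre_covariates(5.0, 0.0, 10.0, 3)
```

Each record then contributes one row of such covariates to the design matrices for the fixed curve and for the random genetic and permanent environment effects.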
Regression Analysis of Variables Describing Poultry Meat Supply in European Countries
Simoni? Miro
2012-11-01
In this paper, based on the analysis of official FAOSTAT and EUROSTAT data on poultry meat for 38 European countries for the years 2007 and 2009, two hypotheses were examined. Firstly, considering four clustering variables on poultry meat, i.e. production, export and import in kg/capita, as well as the producer price in US $/t, and using descriptive exploratory and cluster analysis, the hypothesis that clusters of countries may be recognized was confirmed. As a result, six clusters of similar countries were distinguished. Secondly, based on multiple regression analysis, this paper shows that there exists a statistically significant relationship between poultry meat production and the export and import of that kind of meat, all measured in kg/capita. There is also a high correlation between production, as the dependent variable, and each of the two independent variables.
Use of generalized regression models for the analysis of stress-rupture data
The design of components for operation in an elevated-temperature environment often requires a detailed consideration of the creep and creep-rupture properties of the construction materials involved. Techniques for the analysis and extrapolation of creep data have been widely discussed. The paper presents a generalized regression approach to the analysis of such data. This approach has been applied to multiple heat data sets for types 304 and 316 austenitic stainless steel, ferritic 2 1/4 Cr-1 Mo steel, and the high-nickel austenitic alloy 800H. Analyses of data for single heats of several materials are also presented. All results appear good. The techniques presented represent a simple yet flexible and powerful means for the analysis and extrapolation of creep and creep-rupture data.
Multiple logistic regression model of signalling practices of drivers on urban highways
Puan, Othman Che; Ibrahim, Muttaka Na'iya; Zakaria, Rozana
2015-05-01
Giving a signal is a way of informing other road users, especially conflicting drivers, of a driver's intention to change his/her course of movement. Other users are exposed to hazardous situations and risk of accident if the driver who changes course fails to give a signal as required. This paper describes the application of a logistic regression model to the analysis of drivers' signalling practices on multilane highways, based on possible factors affecting the driver's decision such as driver's gender, vehicle type, vehicle speed and traffic flow intensity. Data pertaining to the analysis of such factors were collected manually. More than 2000 drivers who performed a lane changing manoeuvre while driving on two sections of multilane highways were observed. Findings from the study show that a relatively large proportion of drivers failed to give any signal when changing lane. The result of the analysis indicates that although the proportion of drivers who failed to signal prior to a lane changing manoeuvre is high, the degree of compliance of female drivers is better than that of male drivers. A binary logistic model was developed to represent the probability of a driver providing a signal indication prior to a lane changing manoeuvre. The model indicates that driver's gender, type of vehicle driven, speed of vehicle and traffic volume influence the driver's decision to provide a signal indication prior to a lane changing manoeuvre on a multilane urban highway. In terms of types of vehicles driven, about 97% of motorcyclists failed to comply with the signal indication requirement. The proportion of non-compliant drivers under stable traffic flow conditions is much higher than when the flow is relatively heavy. This is consistent with the data, which indicate a high degree of non-compliance when the average speed of the traffic stream is relatively high.
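A binary logistic model of the kind described maps a linear combination of the explanatory factors to a probability through the logistic function. The sketch below shows only the form of such a model: the coefficient values are invented for illustration and are not the paper's estimates, with signs chosen merely to match the directions reported in the abstract (better compliance for female drivers, lower for motorcyclists and at higher speeds, higher under heavier flow).

```python
import math

# Hypothetical coefficients, NOT taken from the paper
COEFFS = {
    "intercept": -0.5,
    "female": 0.6,             # 1 if the driver is female, else 0
    "motorcycle": -3.0,        # 1 if the vehicle is a motorcycle, else 0
    "speed_kmh": -0.01,        # vehicle speed in km/h
    "flow_veh_per_h": 0.0005,  # traffic flow in vehicles per hour
}

def signal_probability(female, motorcycle, speed_kmh, flow_veh_per_h):
    """Probability of signalling before a lane change under the assumed model."""
    linear_predictor = (
        COEFFS["intercept"]
        + COEFFS["female"] * female
        + COEFFS["motorcycle"] * motorcycle
        + COEFFS["speed_kmh"] * speed_kmh
        + COEFFS["flow_veh_per_h"] * flow_veh_per_h
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))

p_male_car = signal_probability(0, 0, 80.0, 1500.0)
p_female_car = signal_probability(1, 0, 80.0, 1500.0)
p_male_motorcycle = signal_probability(0, 1, 80.0, 1500.0)
```

In a fitted model, the exponential of each coefficient would be read as an odds ratio for the corresponding factor.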
Van der Ark, L. Andries
2005-01-01
It is well known that regression analyses involving compositional data need special attention because the data are not of full rank. For a regression analysis where both the dependent and independent variable are components we propose a transformation of the components emphasizing their role as dependent and independent variables. A simple linear regression can be performed on the transformed components. The regression line can be depicted in a ternary diagram facilitating the interpretation ...
INFLUENCE OF TOURISM SECTOR IN ALBANIAN GDP: ESTIMATION USING MULTIPLE REGRESSION METHOD
Eglantina HYSA
2012-06-01
In recent years the tourism sector has grown significantly in Albania, which since 1990 has moved from a centralized economy to a liberal one. The tourism sector plays an important role in economic and social development, and its contributions are reflected directly in the generation of national income. The two main components of tourism movements are the number of tourists and the number of overnight stays in hotels. Investments in this sector could be expected to have a strong positive influence on the country's GDP. This study seeks to identify the influence of tourists, their overnight stays in hotels, and capital investment spending by all sectors directly involved in tourism on tourism's total contribution to the gross domestic product of Albania during 1996-2009. A regression analysis has been performed taking GDP generated by the tourism sector as the dependent variable and capital investment, number of tourists and overnight stays in hotels as independent variables. Although all the variables were found to be positively related, the variable ‘overnights of foreigners and Albanians in hotels' was found to be insignificant.
The gamma/beta TLD badge used by OPPD consists of two TLD-700 chips (Harshaw G7 card), one of which (chip #2) is shielded by a 0.102 cm-thick aluminum filter, and the other (chip #1) is unshielded, as shown in Fig. 1. Standard procedure had been to determine the beta dose to the badge by subtracting the response of chip #2 from that of chip #1 and then dividing by a calibrated beta-sensitivity factor; the gamma dose was taken to be the response of chip #2 divided by the chip's gamma-sensitivity factor followed by the subtraction of the background dose. A problem with this procedure is penetration of energetic beta particles through the aluminum filter on chip #2, which causes an over-response. Due to the technique used to obtain the beta dose, this also results in an under-estimate of the beta dose. This problem has been corrected through application of multiple linear regression analysis on a large data base of pure gamma (137Cs), pure beta (90Sr), and mixed exposures. The outcome of the analysis is an algorithm that automatically corrects for penetration effects. Performance tests using the ANSI N13.11 standard are presented to show the improvement.
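The standard two-chip procedure described above can be written out directly, which also makes the penetration bias easy to see. This is a sketch under assumptions: the function and argument names are hypothetical, the numbers are invented for illustration, and the regression-based correction itself is not reproduced because its coefficients are not given here.

```python
def standard_badge_doses(chip1, chip2, beta_sensitivity, gamma_sensitivity, background):
    """Uncorrected two-chip procedure as described in the text.

    chip1: response of the unshielded chip.
    chip2: response of the chip behind the aluminum filter.
    Returns (beta_dose, gamma_dose).
    """
    beta_dose = (chip1 - chip2) / beta_sensitivity
    gamma_dose = chip2 / gamma_sensitivity - background
    return beta_dose, gamma_dose

if __name__ == "__main__":
    # Invented readings in arbitrary units; not measured data.
    ideal = standard_badge_doses(150.0, 100.0, 0.5, 1.0, 5.0)
    # Energetic betas leaking through the filter raise the chip 2 reading.
    leaky = standard_badge_doses(150.0, 110.0, 0.5, 1.0, 5.0)
    print("no penetration:", ideal)
    print("with penetration:", leaky)
```

In the second call the extra response on the filtered chip lowers the computed beta dose and raises the computed gamma dose, which is the pair of errors the abstract attributes to beta penetration.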
Grades, Gender, and Encouragement: A Regression Discontinuity Analysis
Owen, Ann L.
2010-01-01
The author employs a regression discontinuity design to provide direct evidence on the effects of grades earned in economics principles classes on the decision to major in economics and finds a differential effect for male and female students. Specifically, for female students, receiving an A for a final grade in the first economics class is…
Regression Analysis with Block Missing Values and Variables Selection
Chien-Pai Han
2011-07-01
We consider a regression model when a block of observations is missing, i.e. there is a group of observations with all the explanatory variables or covariates observed and another set of observations with only a block of the variables observed. We propose an estimator of the regression coefficients that is a combination of two estimators, one based on the observations with no missing variables, and the other based on the set of all observations after deleting the block of variables with missing values. The proposed combined estimator will be compared with the uncombined estimators. If the experimenter suspects that the variables with missing values may be deleted, a preliminary test will be performed to resolve the uncertainty. If the preliminary test of the null hypothesis that the regression coefficients of the variables with missing values equal zero is accepted, then only the data with no missing values are used for estimating the regression coefficients. Otherwise the combined estimator is used. This gives a preliminary test estimator. The properties of the preliminary test estimator and comparisons of the estimators are studied by a Monte Carlo study.
Analysis on Train Stopping Accuracy based on Regression Algorithms
Lin Ma
2014-05-01
Stopping accuracy is one of the most important indexes of the efficiency of automatic train operation (ATO) systems. Traditional stopping control algorithms in ATO systems have some drawbacks, as many factors have not been taken into account. The large amount of field-collected data about stopping accuracy contains many factors (e.g. system delays, stopping time, net pressure) which affect stopping accuracy. In this paper, three popular data mining methods are proposed to analyze train stopping accuracy. First, we find fifteen factors which have an impact on stopping accuracy. Then, ridge regression, lasso regression and elastic net regression are employed to build models reflecting the relationship between the fifteen factors and the stopping accuracy. The three models are then compared using the Akaike information criterion (AIC), a model selection criterion which considers the trade-off between accuracy and complexity. The computational results show that the elastic net regression model has the best AIC value. Finally, we obtain the parameters which make the train stop more accurately, which can provide a reference for improving the stopping accuracy of ATO systems.
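The AIC comparison step can be illustrated with a short sketch. It assumes Gaussian errors, so that AIC reduces (up to an additive constant) to n·ln(RSS/n) + 2k, and it takes k to be the number of nonzero coefficients; how the paper counts parameters for penalized models is not stated, so both choices are assumptions. The residual sums of squares below are invented.

```python
import math

def gaussian_aic(rss, n, k):
    """AIC for a least-squares fit with Gaussian errors, up to an additive constant.

    rss: residual sum of squares, n: sample size, k: number of fitted parameters.
    """
    return n * math.log(rss / n) + 2 * k

def best_by_aic(candidates, n):
    """candidates maps model name -> (rss, k). Returns (best_name, all_scores)."""
    scores = {name: gaussian_aic(rss, n, k) for name, (rss, k) in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores

if __name__ == "__main__":
    # Hypothetical fits on n = 500 stops; values are illustrative, not from the paper.
    fits = {"ridge": (120.0, 15), "lasso": (123.0, 9), "elastic_net": (119.0, 11)}
    name, scores = best_by_aic(fits, n=500)
    print(name, {m: round(s, 2) for m, s in scores.items()})
```

A lower AIC is preferred, so a model with a slightly larger residual error can still win if it uses fewer effective parameters.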
Testing Heteroscedasticity in Nonparametric Regression Based on Trend Analysis
Si-Lian Shen; Jian-Ling Cui; Chun-Wei Wang
2014-01-01
We first propose in this paper a new test method for detecting heteroscedasticity of the error term in nonparametric regression. Some simulation experiments are then conducted to evaluate the performance of the proposed methodology. A real-world data set is finally analyzed to demonstrate the application of the method.
Grades, gender, and encouragement: A regression discontinuity analysis
Owen, Ann L.
2008-01-01
This study employs a regression discontinuity design in order to provide direct evidence on the effects of grades earned in economics principles classes on the decision to major in economics and finds a differential effect for male and female students. Specifically, for female students, receiving an “A” for a final grade in the first economics class is associated with a meaningful increase in the probability of majoring in economics, even after controlling for the numerical grade earned in t...
Buffalos milk yield analysis using random regression models
A.S. Schierholt; L. Celi Chaves; S. Inoe Araújo; A. De Amorim Ramos; C.V. Araújo
2010-01-01
Data comprising 1,719 milk yield records from 357 females (predominantly Murrah breed), daughters of 110 sires, with births from 1974 to 2004, obtained from the Programa de Melhoramento Genético de Bubalinos (PROMEBUL) and from records of EMBRAPA Amazônia Oriental - EAO herd, located in Belém, Pará, Brazil, were used to compare random regression models for estimating variance components and predicting breeding values of the sires. The data were analyzed by different...
Model performance analysis and model validation in logistic regression
Rosa Arboretti Giancristofaro
2007-10-01
In this paper a new model validation procedure for a logistic regression model is presented. First, we present a brief review of different techniques of model validation. Next, we define a number of properties required for a model to be considered "good", and a number of quantitative performance measures. Lastly, we describe a methodology for the assessment of the performance of a given model by using an example taken from a management study.
Regression analysis of correlated ordinal data using orthogonalized residuals.
Perin, J; Preisser, J S; Phillips, C; Qaqish, B
2014-12-01
Semi-parametric regression models for the joint estimation of marginal mean and within-cluster pairwise association parameters are used in a variety of settings for population-averaged modeling of multivariate categorical outcomes. Recently, a formulation of alternating logistic regressions based on orthogonalized, marginal residuals has been introduced for correlated binary data. Unlike the original procedure based on conditional residuals, its covariance estimator is invariant to the ordering of observations within clusters. In this article, the orthogonalized residuals method is extended to model correlated ordinal data with a global odds ratio, and shown in a simulation study to be more efficient and less biased with regards to estimating within-cluster association parameters than an existing extension to ordinal data of alternating logistic regressions based on conditional residuals. Orthogonalized residuals are used to estimate a model for three correlated ordinal outcomes measured repeatedly in a longitudinal clinical trial of an intervention to improve recovery of patients' perception of altered sensation following jaw surgery. PMID:25134789
Jang, H.; Topal, E.; Kawamura, Y.
2015-05-01
Unplanned dilution and ore loss directly influence not only the productivity of underground stopes, but also the profitability of the entire mining process. Stope dilution is a result of complex interactions between a number of factors, and cannot be predicted prior to mining. In this study, unplanned dilution and ore loss prediction models were established using multiple linear and nonlinear regression analysis (MLRA and MNRA), as well as an artificial neural network (ANN) method based on 1067 datasets with ten causative factors from three underground longhole stoping mines in Western Australia. Models were established for individual mines, as well as a general model that includes all of the mine data-sets. The correlation coefficient (R) was used to evaluate the methods, and the values for MLRA, MNRA, and ANN compared with the general model were 0.419, 0.438, and 0.719, respectively. Considering that the current unplanned dilution and ore loss prediction for the mines investigated yielded an R of 0.088, the ANN model results are noteworthy. The proposed ANN model can be used directly as a practical tool to predict unplanned dilution and ore loss in mines, which will not only enhance productivity, but will also be beneficial for stope planning and design.
Arch Height: A Regression Analysis of Different Measuring Parameters
Hironmoy Roy; Kalyan Bhattacharya; Asit Chandra Roy; Samar Deb; Kuntala Ray
2011-01-01
Rationale: To measure the height of the arch of the foot, either the standing navicular height or the talar height of the medial longitudinal arch was accepted in earlier days, whereas the ‘standing normalised navicular height’ is taken as the yardstick by present-day authors. Because these measurements are troublesome and time-consuming, in practice we do not opt for them in a busy OPD schedule and instead measure the arch height in the supine posture. Objectives: This study therefore aimed to derive the regression between the...
Regression analysis of censored data using pseudo-observations
Parner, Erik T.; Andersen, Per Kragh
2010-01-01
We draw upon a series of articles in which a method based on pseudovalues is proposed for direct regression modeling of the survival function, the restricted mean, and the cumulative incidence function in competing risks with right-censored data. The models, once the pseudovalues have been computed, can be fit using standard generalized estimating equation software. Here we present Stata procedures for computing these pseudo-observations. An example from a bone marrow transplantation study is used to illustrate the method.
The Development and Demonstration of Multiple Regression Models for Operant Conditioning Questions.
Fanning, Fred; Newman, Isadore
Based on the assumption that inferential statistics can make the operant conditioner more sensitive to possible significant relationships, regression models were developed to test the statistical significance of differences between the slopes and Y intercepts of the experimental and control group subjects. These results were then compared to the traditional operant…
Analysis of some methods for reduced rank Gaussian process regression
Quinonero-Candela, J.; Rasmussen, Carl Edward
2005-01-01
While there is strong motivation for using Gaussian Processes (GPs) due to their excellent performance in regression and classification problems, their computational complexity makes them impractical when the size of the training set exceeds a few thousand cases. This has motivated the recent ... covariance function hyperparameters and the support set. We propose a method for learning hyperparameters for a given support set. We also review the Sparse Greedy GP (SGGP) approximation (Smola and Bartlett, 2001), which is a way of learning the support set for given hyperparameters based on approximating ... the posterior. We propose an alternative method to the SGGP that has better generalization capabilities. Finally we make experiments to compare the different ways of training a RRGP. We provide some Matlab code for learning RRGPs.
Saeidi, Omid; Torabi, Seyed Rahman; Ataei, Mohammad
2014-03-01
Rock mass classification systems are one of the most common ways of determining rock mass excavatability and assessing the related equipment. However, the strengths and weaknesses of such rating-based classifications have always been open to question. Such classification systems assign quantifiable values to predefined, classified geotechnical parameters of the rock mass. This causes particular ambiguities, leading to the misuse of such classifications in practical applications. Recently, intelligent system approaches such as artificial neural networks (ANNs) and neuro-fuzzy methods, along with multiple regression models, have been used successfully to overcome such uncertainties. The purpose of the present study is the construction of several models to estimate the basic rock mass diggability index, using an adaptive neuro-fuzzy inference system (ANFIS) method with two data clustering approaches (fuzzy c-means (FCM) clustering and subtractive clustering), an ANN, and non-linear multiple regression. A set of data from several case studies was used to obtain the real rock mass diggability index, which was compared with the values predicted by the constructed models. It was observed that ANFIS based on the FCM model shows higher accuracy and correlation with actual data than the ANN and multiple regression. As a result, one can combine ANNs with fuzzy clustering-based models to construct such rigorous predictor tools.
Litman, Heather J.; Horton, Nicholas J.; Hernández, Bernardo; Laird, Nan M.
2007-01-01
Multiple informant data refers to information obtained from different individuals or sources used to measure the same construct; for example, researchers might collect information regarding child psychopathology from the child's teacher and the child's parent. Frequently, studies with multiple informants have incomplete observations; in some cases the missingness of informants is substantial. We introduce a Maximum Likelihood (ML) technique to fit models with multiple informants as predictors...
Gaines, R E; Tydeman, M S
1982-08-01
A program, WRANL, is described for the analysis of immunoassays or bioassays which have a logistic dose-response relationship. Responses are transformed to logits and iterative weighted regression analysis is used to obtain log dose-logit response lines for all preparations compared in an assay. Potency estimates of preparations relative to the standard preparation are available for both unweighted and weighted regression analyses together with detailed analysis of variance, estimates of slope and other relevant parameters. The general comparisons of dose-response relationships produced by the program are a feature of particular interest. However, an option which suppresses the more general output is available if the program is to be used for analysis of a 'screening' assay comparing single dilutions or doses of test samples with a standard curve. Data input is designed to permit immediate running of the program by junior personnel. Data output is designed to facilitate record keeping. PMID:7128120
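The core idea of transforming responses to logits and fitting a weighted line against log dose can be sketched as follows. This is not the WRANL program: the function names are hypothetical, responses are assumed to be expressed as fractions strictly between 0 and 1, and the weights p(1 − p) recomputed from the current fitted line are one common choice assumed here for illustration.

```python
import math

def logit(p):
    """Log-odds of a proportion p in (0, 1)."""
    return math.log(p / (1.0 - p))

def weighted_line_fit(x, y, w):
    """Weighted least-squares fit of y = a + b*x. Returns (a, b)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return ybar - b * xbar, b

def fit_log_dose_logit(doses, responses, n_iter=10):
    """Iteratively reweighted fit of logit(response) on log10(dose)."""
    x = [math.log10(d) for d in doses]
    y = [logit(r) for r in responses]
    w = [1.0] * len(x)  # first pass is the unweighted regression
    for _ in range(n_iter):
        a, b = weighted_line_fit(x, y, w)
        fitted = [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]
        w = [p * (1.0 - p) for p in fitted]  # assumed weighting scheme
    return a, b
```

Fitting the standard and each test preparation this way gives the log dose-logit response lines whose horizontal separation underlies a relative potency estimate.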
Additive Intensity Regression Models in Corporate Default Analysis
Lando, David; Medhat, Mamdouh; Nielsen, Mads Stenbo; Nielsen, Søren Feodor
2013-01-01
We consider additive intensity (Aalen) models as an alternative to the multiplicative intensity (Cox) models for analyzing the default risk of a sample of rated, nonfinancial U.S. firms. The setting allows for estimating and testing the significance of time-varying effects. We use a variety of mo...
A plot of lung-cancer rates versus radon exposures in 965 US counties, or in all US states, has a strong negative slope, b, in sharp contrast to the strong positive slope predicted by linear/no-threshold theory. The discrepancy between these slopes exceeds 20 standard deviations (SD). Including smoking frequency in the analysis substantially improves fits to a linear relationship but has little effect on the discrepancy in b, because correlations between smoking frequency and radon levels are quite weak. Including 17 socioeconomic variables (SEV) in multiple regression analysis reduces the discrepancy to 15 SD. Data were divided into segments by stratifying on each SEV in turn, and on geography, and on both simultaneously, giving over 300 data sets to be analyzed individually, but negative slopes predominated. The slope is negative whether one considers only the most urban counties or only the most rural; only the richest or only the poorest; only the richest in the South Atlantic region or only the poorest in that region, and so on; and for all the strata in between. Since this is an ecological study, the well-known problems with ecological studies were investigated and found not to be applicable here. The "ecological fallacy" was shown not to apply in testing a linear/no-threshold theory, and the vulnerability to confounding is greatly reduced when confounding factors are only weakly correlated with radon levels, as is generally the case here. All confounding factors known to correlate with radon and with lung cancer were investigated quantitatively and found to have little effect on the discrepancy.
Raheem, SM Enayetur; Ahmed, S. Ejaz
2011-01-01
Consider the problem of predicting a response variable using a set of covariates in a linear regression model. If it is a priori known or suspected that a subset of the covariates does not significantly contribute to the overall fit of the model, a restricted model that excludes these covariates may be sufficient. If, on the other hand, the subset provides useful information, a shrinkage method combines restricted and unrestricted estimators to obtain the parameter estimates. Such an estima...
[Analysis of conditional logistic regression for risk factors of lung cancer in Dachang Tin Mine].
Wu, K G
1989-03-01
An increased risk of lung cancer in Dachang Tin Mine of Guangxi has been reported. To investigate the factors behind the excess risk of lung cancer, the authors conducted a matched pair case-control study in the mine area and analysed the effect of multiple factors, such as living and housing conditions, occupational exposure and smoking, by conditional logistic regression. The patient group consisted of 69 patients with primary bronchogenic cancer, including 55 deceased. The control group consisted of 138 persons, also including 55 deceased. The results showed that the excess risk of lung cancer in the mine area was mainly related to occupational exposure. The risk factors with statistical significance in conditional logistic regression analysis were exposure time in smelting, time spent in underground drilling, and age at beginning underground mining. In the model of all cases matched against living controls, the daily number of cigarettes smoked was also a risk factor in addition to the above three factors. Furthermore, there was a synergistic action among the factors. The relationship between the risk factors and lung cancer is discussed. PMID:2806041
Yuan Zheng
2005-10-01
Background: Protein tertiary structure can be partly characterized via each amino acid's contact number measuring how residues are spatially arranged. The contact number of a residue in a folded protein is a measure of its exposure to the local environment, and is defined as the number of Cβ atoms in other residues within a sphere around the Cβ atom of the residue of interest. Contact number is partly conserved between protein folds and thus is useful for protein fold and structure prediction. In turn, each residue's contact number can be partially predicted from primary amino acid sequence, assisting tertiary fold analysis from sequence data. In this study, we provide a more accurate contact number prediction method from protein primary sequence. Results: We predict contact number from protein sequence using a novel support vector regression algorithm. Using protein local sequences with multiple sequence alignments (PSI-BLAST profiles), we demonstrate a correlation coefficient between predicted and observed contact numbers of 0.70, which outperforms previously achieved accuracies. Including additional information about sequence weight and amino acid composition further improves prediction accuracies significantly with the correlation coefficient reaching 0.73. If residues are classified as being either "contacted" or "non-contacted", the prediction accuracies are all greater than 77%, regardless of the choice of classification thresholds. Conclusion: The successful application of support vector regression to the prediction of protein contact number reported here, together with previous applications of this approach to the prediction of protein accessible surface area and B-factor profile, suggests that a support vector regression approach may be very useful for determining the structure-function relation between primary protein sequence and higher order consecutive protein structural and functional properties.
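The contact number definition given above translates directly into a neighbour count. The sketch below assumes that Cβ coordinates are already available as (x, y, z) tuples and leaves the sphere radius as a parameter, since the abstract does not state the value used; the function name is hypothetical.

```python
import math

def contact_numbers(cbeta_coords, radius):
    """For each residue, count C-beta atoms of other residues within `radius`.

    cbeta_coords: list of (x, y, z) tuples, one per residue.
    radius: sphere radius in the same units as the coordinates (an assumed input).
    """
    counts = []
    for i, a in enumerate(cbeta_coords):
        n = 0
        for j, b in enumerate(cbeta_coords):
            if i != j and math.dist(a, b) <= radius:
                n += 1
        counts.append(n)
    return counts

if __name__ == "__main__":
    # Toy coordinates, not a real protein structure.
    coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
    print(contact_numbers(coords, radius=10.0))
```

These observed counts are the regression targets; the prediction task in the abstract is to estimate them from sequence-derived features alone.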
Yang, Jianhong; Yi, Cancan; Xu, Jinwu; Ma, Xianghong
2015-05-01
A new LIBS quantitative analysis method based on adaptive selection of analytical lines and a Relevance Vector Machine (RVM) regression model is proposed. First, a scheme for adaptively selecting analytical lines is put forward in order to overcome the drawback of high dependency on a priori knowledge. The candidate analytical lines are automatically selected based on the built-in characteristics of spectral lines, such as spectral intensity, wavelength and width at half height. The analytical lines which will be used as input variables of the regression model are determined adaptively according to the samples for both training and testing. Second, a LIBS quantitative analysis method based on RVM is presented. The intensities of analytical lines and the elemental concentrations of certified standard samples are used to train the RVM regression model. The predicted elemental concentrations are given in the form of a confidence interval of a probability distribution, which is helpful for evaluating the uncertainty contained in the measured spectra. Chromium concentration analysis experiments on 23 certified standard high-alloy steel samples have been carried out. The multiple correlation coefficient of the prediction was up to 98.85%, and the average relative error of the prediction was 4.01%. The experimental results showed that the proposed LIBS quantitative analysis method achieved better prediction accuracy and better modeling robustness compared with methods based on partial least squares regression, artificial neural networks and the standard support vector machine.
Survival analysis of cervical cancer using stratified Cox regression
Purnami, S. W.; Inayati, K. D.; Sari, N. W. Wulan; Chosuvivatwong, V.; Sriplung, H.
2016-04-01
Cervical cancer is one of the most common causes of cancer death among women worldwide, including in Indonesia. Most cervical cancer patients come to the hospital already at an advanced stage. As a result, treatment becomes more difficult and the risk of death can even increase. One parameter that can be used to assess the success of treatment is the probability of survival. This study addresses the survival of cervical cancer patients at Dr. Soetomo Hospital using stratified Cox regression based on six factors: age, stage, treatment initiation, companion disease, complication, and anemia. The stratified Cox model is used because one independent variable, stage, does not satisfy the proportional hazards assumption. The results of the stratified Cox model show that complication is a significant factor influencing the survival probability of cervical cancer patients. The obtained hazard ratio is 7.35, meaning that a cervical cancer patient with a complication has a risk of dying 7.35 times greater than a patient without one. The adjusted survival curves show that stage IV had the lowest probability of survival.
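In a Cox model the hazard ratio for a binary covariate is the exponential of its regression coefficient, so the reported value of 7.35 corresponds to a coefficient of about ln 7.35 ≈ 1.99. The short sketch below only illustrates that relationship; the coefficient is back-calculated from the reported hazard ratio and is not quoted from the paper, and the function name is hypothetical.

```python
import math

def hazard_ratio(coef):
    """Hazard ratio implied by a Cox regression coefficient for a one-unit change."""
    return math.exp(coef)

if __name__ == "__main__":
    # Coefficient implied by the reported hazard ratio (derived, not from the paper).
    beta_complication = math.log(7.35)
    print(f"coefficient = {beta_complication:.3f}")
    print(f"hazard ratio = {hazard_ratio(beta_complication):.2f}")
```

Stratifying on a variable lets each stratum have its own baseline hazard while the coefficient, and hence this ratio, is shared across strata.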
Highlights:
• This paper presents a new method useful for the optimization of complex dynamic systems.
• The method combines the strengths of genetic algorithms (GA) and regression splines.
• The method is applied to the design of a gas cooled fast breeder reactor.
• Tools such as Java and R, and codes such as MCNP and Matlab, are used in this research.
Abstract: A module based optimization method using genetic algorithms (GA) and multivariate regression analysis has been developed to optimize a set of parameters in the design of a nuclear reactor. GA simulates natural evolution to perform optimization, and is widely used in recent times by the scientific community. The GA fits a population of random solutions to the optimized solution of a specific problem. In this work, we have developed a genetic algorithm to determine the values for a set of nuclear reactor parameters to design a gas cooled fast breeder reactor core, including a basic thermal–hydraulics analysis and energy transfer. Multivariate regression is implemented using regression splines (RS). Reactor designs are usually complex and a simulation needs a significantly large amount of time to execute, hence the direct implementation of GA or any other global optimization technique is not feasible. We therefore present a new method of using RS in conjunction with GA. With RS, we do not need to run the neutronics simulation for all the inputs generated by the GA module; rather, we run the simulations for a predefined set of inputs, build a multivariate regression fit to the input and output parameters, and then use this fit to predict the output parameters for the inputs generated by GA. The reactor parameters are the radius of a fuel pin cell, the isotopic enrichment of the fissile material in the fuel, the mass flow rate of the coolant, and the temperature of the coolant at the core inlet. The optimization objectives for the reactor core are high breeding of U-233 and Pu-239 within desired power peaking limits, desired effective and infinite neutron multiplication factors, a high fast fission factor, high thermal efficiency in the conversion from thermal energy to electrical energy using the Brayton cycle, and high fuel burn-up. It is to be noted that we have kept the total mass of the fuel constant. In this work, we present a module based (modular) approach to perform the optimization, in which we have defined the following modules: single fuel pin cell, whole core, thermal–hydraulics, and energy conversion. In each of the modules we have defined a specific set of parameters and optimization objectives. The GA system (GAS) and RS together play the role of optimizing each of the individual modules and integrating the modules to determine the final nuclear reactor core. However, implementation of GA could lead to a local minimum or a non-unique set of parameters that meet the specific optimization objectives. The GA code is built using Java, the neutronic analysis uses MCNP6, the thermal–hydraulics calculations use Java, and the regression analysis uses R.
Functional Multiple-Set Canonical Correlation Analysis
Hwang, Heungsun; Jung, Kwanghee; Takane, Yoshio; Woodward, Todd S.
2012-01-01
We propose functional multiple-set canonical correlation analysis for exploring associations among multiple sets of functions. The proposed method includes functional canonical correlation analysis as a special case when only two sets of functions are considered. As in classical multiple-set canonical correlation analysis, computationally, the…
Normalization Ridge Regression in Practice II: The Estimation of Multiple Feedback Linkages.
Bulcock, J. W.
The use of the two-stage least squares (2SLS) procedure for estimating nonrecursive social science models is often impractical when multiple feedback linkages are required. This is because 2SLS is extremely sensitive to multicollinearity. The standard statistical solution to the multicollinearity problem is a biased, variance reduced procedure
Development of regression model for uncertainty analysis by response surface method in HANARO
The feasibility of uncertainty analysis with a regression model in a reactor physics problem was investigated. A regression model was developed using the Response Surface Method as an alternative to the MCNP/ORIGEN2 code system, which is the uncertainty analysis tool for fission-produced molybdenum production. It was shown that a regression model could be developed for the reactor physics problem by introducing the burnup parameter. In the regression model, the most important parameter affecting the uncertainty of the 99Mo yield ratio was the fuel thickness. These results agree well with those of the Crude Monte Carlo Method for each parameter. The regression model developed in this research was shown to be suitable as an alternative model, because the coefficient of determination was 0.99.
The application of a multiple regression model for aero radiometric data
The data observed in the total channel of high sensitivity airborne γ-ray spectrometric surveys are selected as the dependent variable, while those of the Th, K and U channels are considered as independent variables, and a linear statistical model is assumed to relate them: $(\mathrm{Total})_i = \alpha_0 + \beta_1 (\mathrm{U})_i + \beta_2 (\mathrm{Th})_i + \beta_3 (\mathrm{K})_i + \varepsilon_i$, where $\beta_1$, $\beta_2$, $\beta_3$ are the partial regression coefficients and $\varepsilon_i$ is the error term. The estimated coefficients ($\beta_1$, $\beta_2$, $\beta_3$) are used to check the data acquisition system on board, and occasionally to predict a more appropriate value of the data when a single data item is not recorded correctly. (author)
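The linear model in this abstract can be illustrated with a minimal sketch. Everything numeric here is an assumption made for illustration: the channel values are synthetic, and the "true" coefficients and noise level are invented, not taken from the survey. The function name `predict_total` is likewise hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic channel readings (illustrative values only, not survey data).
n = 200
U = rng.uniform(5, 50, n)
Th = rng.uniform(10, 80, n)
K = rng.uniform(50, 300, n)

# Assumed coefficients: alpha_0, beta_1 (U), beta_2 (Th), beta_3 (K).
true_coef = np.array([20.0, 3.0, 2.0, 1.5])
total = (true_coef[0] + true_coef[1] * U + true_coef[2] * Th
         + true_coef[3] * K + rng.normal(0, 1.0, n))

# Design matrix with an intercept column, fitted by ordinary least squares.
X = np.column_stack([np.ones(n), U, Th, K])
coef, *_ = np.linalg.lstsq(X, total, rcond=None)


def predict_total(u, th, k):
    """Predict the total-channel value from the U, Th and K channels."""
    return coef[0] + coef[1] * u + coef[2] * th + coef[3] * k
```

A predicted value from `predict_total` could stand in for a total-channel reading that was not recorded correctly, which is the use the abstract describes.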
SAS PARTIAL LEAST SQUARES REGRESSION FOR ANALYSIS OF SPECTROSCOPIC DATA
The objective was to investigate the potential of SAS PLS to perform chemometric analysis of spectroscopic data. As implemented, SAS can perform type II PLS only, PCR and RRR. While possessing several algorithms for PLS, various cross validation options, the ability to mean center and variance sca...
A Noncentral "t" Regression Model for Meta-Analysis
Camilli, Gregory; de la Torre, Jimmy; Chiu, Chia-Yi
2010-01-01
In this article, three multilevel models for meta-analysis are examined. Hedges and Olkin suggested that effect sizes follow a noncentral "t" distribution and proposed several approximate methods. Raudenbush and Bryk further refined this model; however, this procedure is based on a normal approximation. In the current research literature, this…
Highlights: • Thermo-environmental and exergoeconomic models of a combined cycle power plant are defined. • Effects of various operating parameters on performance, CO2 emissions and costs are examined. • Multiple polynomial regression models are developed. • For various operating conditions, optimal operating parameters are determined. - Abstract: A combined cycle power plant is analyzed through thermo-environmental, exergoeconomic and statistical methods. The plant is first modeled and parametrically studied to examine the effects of various operating parameters on thermo-environmental quantities such as net power output, energy efficiency, exergy efficiency and CO2 emissions. These quantities are then correlated with the operating parameters through multiple polynomial regression analysis. Moreover, exergoeconomic analysis is performed to assess the impact of the operating parameters on fuel cost, capital cost and exergy destruction cost. The optimal operating parameters are then determined using the Nelder-Mead simplex method by defining two objective functions, namely exergy efficiency (maximized) and total cost (minimized). According to the parametric analysis, the operating parameters have significant effects on the performance and cost rates. The regression models appear to be good estimators of the response variables, as indicated by satisfactory R2 values. The optimization results show that the exergy efficiency is increased and the cost rates are decreased by selecting the best trade-off values at different power output conditions.
Multiple linear regression models are often used to predict levels of fecal indicator bacteria (FIB) in recreational swimming waters based on independent variables (IVs) such as meteorologic, hydrodynamic, and water-quality measures. The IVs used for these analyses are traditiona...
Angela Radnz Lazzari
2011-01-01
Air is an efficient means of dispersing atmospheric pollutants, and its behavior depends on the atmospheric movements that occur in the troposphere. In Porto Alegre, Rio Grande do Sul State, there is heavy daily traffic and a concentration of industries that may be responsible for atmospheric emissions. In the present work we studied the behavior of daily concentrations of particulate matter (PM10) in this city, considering the influence of meteorological variables. Data analysis was based on descriptive statistics, linear correlation and multiple regression. Data were provided by the State Foundation of Environmental Protection Henrique Luiz Roessler - RS (FEPAM) and the National Institute of Meteorology (INMET).
Based on the analysis it was possible to verify that: the concentrations of PM10, measured every day at 4:00 p.m., did not exceed national air quality standards; the meteorological elements that influenced the concentrations of PM10 were the daily average wind speed and the daily average radiation, with negative relations, and the daily average air temperature and the north and northwest wind directions, with positive relations. The wind directions that contribute significantly to lower concentrations at the measured locations are east and southeast.
Walton, Joseph M.; And Others
1978-01-01
Ridge regression is an approach to the problem of large standard errors of regression estimates of intercorrelated regressors. The effect of ridge regression on the estimated squared multiple correlation coefficient is discussed and illustrated. (JKS)
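As a rough illustration of the effect described in this abstract, the sketch below fits ridge regression at several penalty values on synthetic, strongly intercorrelated regressors and records the in-sample squared multiple correlation. The data, the penalty grid and the helper names `ridge_fit` and `r_squared` are assumptions for illustration; the paper's own examples are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two highly intercorrelated regressors (synthetic, for illustration only).
n = 100
z = rng.normal(size=n)
X = np.column_stack([z + 0.05 * rng.normal(size=n),
                     z + 0.05 * rng.normal(size=n)])
y = X[:, 0] + X[:, 1] + rng.normal(size=n)

# Standardize the predictors and center the response so no intercept is needed.
Xc = (X - X.mean(axis=0)) / X.std(axis=0)
yc = y - y.mean()


def ridge_fit(Xc, yc, lam):
    """Closed-form ridge estimate (X'X + lam*I)^(-1) X'y; lam = 0 gives OLS."""
    p = Xc.shape[1]
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)


def r_squared(Xc, yc, beta):
    """In-sample squared multiple correlation, 1 - SSE/SST."""
    resid = yc - Xc @ beta
    return 1.0 - (resid @ resid) / (yc @ yc)


lambdas = [0.0, 1.0, 10.0, 100.0]
r2 = [r_squared(Xc, yc, ridge_fit(Xc, yc, lam)) for lam in lambdas]
```

Because ordinary least squares minimizes the residual sum of squares, the in-sample value in `r2` cannot increase as the penalty grows; what ridge buys in exchange is smaller variance of the coefficient estimates.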
Ijima, Yusuke; Nose, Takashi; Tachibana, Makoto; Kobayashi, Takao
In this paper, we propose a rapid model adaptation technique for emotional speech recognition which enables us to extract paralinguistic information as well as linguistic information contained in speech signals. This technique is based on style estimation and style adaptation using a multiple-regression HMM (MRHMM). In the MRHMM, the mean parameters of the output probability density function are controlled by a low-dimensional parameter vector, called a style vector, which corresponds to a set of the explanatory variables of the multiple regression. The recognition process consists of two stages. In the first stage, the style vector that represents the emotional expression category and the intensity of its expressiveness for the input speech is estimated on a sentence-by-sentence basis. Next, the acoustic models are adapted using the estimated style vector, and then standard HMM-based speech recognition is performed in the second stage. We assess the performance of the proposed technique in the recognition of simulated emotional speech uttered by both professional narrators and non-professional speakers.
Masters, T.
2013-11-01
The effectiveness of multiple linear regression approaches in removing solar, volcanic, and El Nino Southern Oscillation (ENSO) influences from the recent (1979-2012) surface temperature record is examined, using simple energy balance and global climate models (GCMs). These multiple regression methods are found to incorrectly diagnose the underlying signal - particularly in the presence of a deceleration - by generally overestimating the solar cooling contribution to an early 21st century pause while underestimating the warming contribution from the Mt. Pinatubo recovery. In fact, one-box models and GCMs suggest that the Pinatubo recovery has contributed more to post-2000 warming trends than the solar minimum has contributed to cooling over the same period. After adjusting the observed surface temperature record based on the natural-only multi-model mean from several CMIP5 GCMs and an empirical ENSO adjustment, a significant deceleration in the surface temperature increase is found, ranging in magnitude from -0.06 to -0.12 K dec-2 depending on model sensitivity and the temperature index used. This likely points to internal decadal variability beyond these solar, volcanic, and ENSO influences.
Hirose, Takahiko; Nakajima, Hideto; Shigekiyo, Tarou; Yokote, Taiji; Ishida, Shimon; Kimura, Fumiharu
2016-01-29
We report the case of a 62-year-old man who presented with malignant lymphoma as recurrent multiple cranial nerve palsy after spontaneous regression of oculomotor nerve palsy. He developed ptosis and diplopia due to right oculomotor nerve palsy. Brain MRI/MRA showed no abnormality, and he recovered with conservative medical management. Three months later, he showed diplopia due to right abducens nerve palsy, facial pain and trigeminal sensory loss. Neurological examination revealed multiple cranial nerve palsy involving cranial nerves III, V, IX, and X on the right side. Serum soluble interleukin-2 receptor levels were normal, and cerebrospinal fluid examination was unremarkable. Steroid and subsequent intravenous immunoglobulin therapy did not improve his symptoms. Six weeks after his admission, he showed rapid enlargement of the cervical lymph node and the right tonsil, and post-contrast T1-weighted MRI showed enlargement and enhancement of the left infraorbital nerve, the bilateral cavernous sinus, the bilateral facial nerves, and the left trigeminal nerve. The histopathologic examination of the tonsil biopsy revealed diffuse large B cell lymphoma. The cause of these symptoms was thought to be lymphoma infiltrating the cavernous sinus and adjacent nerves. Spontaneous regression of malignant lymphoma is an exceptional event, but this possibility should be considered so as to reach the correct diagnosis and proper treatment. PMID:26616489
Analysis of Functional Data with Focus on Multinomial Regression and Multilevel Data
Mousavi, Seyed Nourollah
Functional data analysis (FDA) is a fast growing area in statistical research with an increasingly diverse range of applications in economics, medicine, agriculture, chemometrics, etc. Functional regression is an area of FDA which has received the most attention both in aspects of application and... datasets, one regarding lameness detection for horses and another regarding speech recognition. In the second paper, we consider functional logistic regression via wavelets and LASSO, which is a specific case of multinomial functional regression with two classes for the response, and compare the efficiency (from a classification point of view) of this model with two other...
Coelho, Lcia H G; Gutz, Ivano G R
2006-03-15
A chemometric method for analysis of conductometric titration data was introduced to extend its applicability to lower concentrations and more complex acid-base systems. Auxiliary pH measurements were made during the titration to assist the calculation of the distribution of protonable species on the basis of known or guessed equilibrium constants. Conductivity values of each ionized or ionizable species possibly present in the sample were introduced in a general equation where the only unknown parameters were the total concentrations of (conjugated) bases and of strong electrolytes not involved in acid-base equilibria. All these concentrations were adjusted by a multiparametric nonlinear regression (NLR) method based on the Levenberg-Marquardt algorithm. This first conductometric titration method with NLR analysis (CT-NLR) was successfully applied to simulated conductometric titration data and to synthetic samples with multiple components at concentrations as low as those found in rainwater (approximately 10 micromol L(-1)). It was possible to resolve and quantify mixtures containing a strong acid, formic acid, acetic acid, ammonium ion, bicarbonate and inert electrolyte with an accuracy of 5% or better. PMID:18970555
An analysis of past nuclear power plant availability performance is presented which covers the experience of 72 U.S. BWRs and PWRs currently in operation. This analysis quantitatively relates availability to several design and organizational characteristics, including plant size, age, staffing levels, maintenance quality, turnover rates, and other factors. The results are presented in terms of physical (design), organizational, and external factors affecting plant performance.
Wanke, Peter [Universidade Federal do Rio de Janeiro (UFRJ), RJ (Brazil). Instituto de Pesquisa e Pos-Graduacao em Administracao de Empresas (COPPEAD). Centro de Estudos em Logistica
2004-07-01
In this paper, the most relevant multiple regression models for sales forecasting of gas stations, developed over the past ten years, are reviewed. The most significant variables related to gas station sales, the types of the multiple regression models (linear or non-linear), the most common uses in supporting decision making and its limits are presented. The predictive power of each model and its impact on decision-making, such as sensitivity analysis and confidence intervals for independent variables, are also commented. Four models are presented, based on studies conducted in South Africa, Portugal and Brazil. In conclusion, suggestions for future developments are presented based on past developments. (author)
Jolly, William H.
1992-01-01
Relationships defining the ballistic limit of Space Station Freedom's (SSF) dual wall protection systems have been determined. These functions were regressed from empirical data found in Marshall Space Flight Center's (MSFC) Hypervelocity Impact Testing Summary (HITS) for the velocity range between three and seven kilometers per second. A stepwise linear least squares regression was used to determine the coefficients of several expressions that define a ballistic limit surface. Using statistical significance indicators and graphical comparisons to other limit curves, a final set of expressions is recommended for potential use in Probability of No Critical Flaw (PNCF) calculations for Space Station. The three equations listed below represent the mean curves for normal, 45 degree, and 65 degree obliquity ballistic limits, respectively, for a dual wall protection system consisting of a thin 6061-T6 aluminum bumper spaced 4.0 inches from a 0.125 inch thick 2219-T87 rear wall with multiple layer thermal insulation installed between the two walls. Normal obliquity is $d_c = 1.0514\, v^{0.2983}\, t_1^{0.5228}$. Forty-five degree obliquity is $d_c = 0.8591\, v^{0.0428}\, t_1^{0.2063}$. Sixty-five degree obliquity is $d_c = 0.2824\, v^{0.1986}\, t_1^{-0.3874}$. Plots of these curves are provided. A sensitivity study on the effects of using these new equations in the probability of no critical flaw analysis indicated a negligible increase in the performance of the dual wall protection system for SSF over the current baseline. The magnitude of the increase was 0.17 percent over 25 years on the MB-7 configuration run with the Bumper II program code.
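The three fitted power laws quoted in this abstract can be evaluated with a short sketch. The coefficients are copied from the abstract; the function name `ballistic_limit`, the dictionary layout and the range check are assumptions added for illustration. The abstract does not state the units or define $t_1$, so inputs are assumed to be in whatever units the original regression used.

```python
# Coefficients (c, a, b) of d_c = c * v**a * t1**b, keyed by obliquity in degrees.
# Values are taken from the abstract; 0 denotes normal incidence.
COEFFS = {
    0: (1.0514, 0.2983, 0.5228),
    45: (0.8591, 0.0428, 0.2063),
    65: (0.2824, 0.1986, -0.3874),
}

# The regression data covered 3 to 7 km/s, so other velocities are extrapolations.
V_MIN_KM_S, V_MAX_KM_S = 3.0, 7.0


def ballistic_limit(v, t1, obliquity_deg):
    """Evaluate the mean ballistic-limit curve d_c for one obliquity.

    v is the impact velocity in km/s; t1 is the thickness variable used in
    the abstract (units not stated there).
    """
    if obliquity_deg not in COEFFS:
        raise ValueError("obliquity must be one of 0, 45 or 65 degrees")
    if not V_MIN_KM_S <= v <= V_MAX_KM_S:
        raise ValueError("velocity is outside the 3-7 km/s range of the fit")
    c, a, b = COEFFS[obliquity_deg]
    return c * v ** a * t1 ** b
```

Rejecting velocities outside the fitted range is a design choice of this sketch: regressed power laws of this kind carry no guarantee beyond the data they were fitted to.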
Deng, Yangyang; Parajuli, Prem B.
2011-08-10
Evaluation of the economic feasibility of a bio-gasification facility needs an understanding of its unit cost under different production capacities. The objective of this study was to evaluate the unit cost of syngas production at capacities from 60 through 1800 Nm³/h using an economic model with three regression analysis techniques (simple regression, reciprocal regression, and log-log regression). The preliminary result of this study showed that the reciprocal regression analysis technique gave the best fit curve between per unit cost and production capacity, with a sum of error squares (SES) lower than 0.001 and a coefficient of determination (R²) of 0.996. The regression analysis techniques determined the minimum unit cost of syngas production for micro-scale bio-gasification facilities to be $0.052/Nm³, under the capacity of 2,880 Nm³/h. The results of this study suggest that to reduce cost, facilities should run at a high production capacity. In addition, the contribution of this technique could be a new categorical criterion for evaluating micro-scale bio-gasification facilities from the perspective of economic analysis.
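The comparison of the three regression forms can be sketched as follows. The capacity and cost figures are hypothetical, generated from an assumed fixed-plus-variable cost curve rather than taken from the study, and the helper names `line_fit` and `sse` are illustrative. Each form is fitted as a straight line in transformed variables and then scored by its sum of squared errors in the original cost units.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical capacities (Nm^3/h) and unit costs ($/Nm^3); not the study's data.
capacity = np.array([60, 120, 300, 600, 900, 1200, 1800], dtype=float)
unit_cost = 0.05 + 6.0 / capacity + rng.normal(0, 0.001, capacity.size)


def line_fit(x, y):
    """Least-squares straight line y = a + b*x; returns (a, b)."""
    b, a = np.polyfit(x, y, 1)  # polyfit returns the highest degree first
    return a, b


def sse(y, y_hat):
    """Sum of squared errors in the original units of y."""
    return float(np.sum((y - y_hat) ** 2))


# Simple regression: cost = a + b * capacity
a, b = line_fit(capacity, unit_cost)
sse_simple = sse(unit_cost, a + b * capacity)

# Reciprocal regression: cost = a + b / capacity
a, b = line_fit(1.0 / capacity, unit_cost)
sse_reciprocal = sse(unit_cost, a + b / capacity)

# Log-log regression: ln(cost) = a + b * ln(capacity)
a, b = line_fit(np.log(capacity), np.log(unit_cost))
sse_loglog = sse(unit_cost, np.exp(a) * capacity ** b)

errors = {"simple": sse_simple, "reciprocal": sse_reciprocal, "log-log": sse_loglog}
best_form = min(errors, key=errors.get)
```

With data generated from a reciprocal curve, the reciprocal form wins by construction, so this only shows the mechanics of the comparison and says nothing about real facility costs.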
Fisz, Jacek J
2006-12-01
The optimization approach based on the genetic algorithm (GA) combined with the multiple linear regression (MLR) method is discussed. The GA-MLR optimizer is designed for nonlinear least-squares problems in which the model functions are linear combinations of nonlinear functions. GA optimizes the nonlinear parameters, and the linear parameters are calculated from MLR. GA-MLR is an intuitive optimization approach and it exploits all advantages of the genetic algorithm technique. This optimization method results from an appropriate combination of two well-known optimization methods. The MLR method is embedded in the GA optimizer, and linear and nonlinear model parameters are optimized in parallel. The MLR method is the only strictly mathematical "tool" involved in GA-MLR. The GA-MLR approach considerably simplifies and accelerates the optimization process because the linear parameters are not fitted ones. Its properties are exemplified by the analysis of the kinetic biexponential fluorescence decay surface corresponding to a two-excited-state interconversion process. A short discussion of the variable projection (VP) algorithm, designed for the same class of optimization problems, is presented. VP is a very advanced mathematical formalism that involves the methods of nonlinear functionals, the algebra of linear projectors, and the formalism of Fréchet derivatives and pseudo-inverses. Additional explanatory comments are added on the application of the recently introduced GA-NR optimizer to the simultaneous recovery of linear and weakly nonlinear parameters occurring in the same optimization problem together with nonlinear parameters. The GA-NR optimizer combines the GA method with the NR method, in which the minimum-value condition for the quadratic approximation to χ², obtained from the Taylor series expansion of χ², is recovered by means of the Newton-Raphson algorithm.
The application of the GA-NR optimizer to model functions which are multi-linear combinations of nonlinear functions, is indicated. The VP algorithm does not distinguish the weakly nonlinear parameters from the nonlinear ones and it does not apply to the model functions which are multi-linear combinations of nonlinear functions. PMID:17134156
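The division of labor between GA and MLR described in this abstract can be sketched on a biexponential decay. This is a toy illustration under stated assumptions, not the author's optimizer: the data are synthetic, the GA is a bare-bones variant (truncation selection, uniform crossover, Gaussian mutation), and the names `solve_linear` and `ga_mlr` and all settings are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic decay y(t) = a1*exp(-k1*t) + a2*exp(-k2*t) + noise (assumed values).
t = np.linspace(0.0, 10.0, 200)
true_rates = np.array([0.3, 1.5])
true_amps = np.array([2.0, 1.0])
y = np.exp(-np.outer(t, true_rates)) @ true_amps + rng.normal(0, 0.01, t.size)


def solve_linear(rates):
    """MLR step: for fixed decay rates, get amplitudes by linear least squares."""
    basis = np.exp(-np.outer(t, rates))
    amps, *_ = np.linalg.lstsq(basis, y, rcond=None)
    resid = y - basis @ amps
    return amps, float(resid @ resid)


def ga_mlr(pop_size=40, generations=60, sigma=0.1):
    """GA over the nonlinear rates only; amplitudes are never part of the genome."""
    pop = rng.uniform(0.01, 3.0, size=(pop_size, 2))
    history = []
    for _ in range(generations):
        fitness = np.array([solve_linear(ind)[1] for ind in pop])
        order = np.argsort(fitness)
        history.append(float(fitness[order[0]]))
        parents = pop[order[: pop_size // 2]]  # best half survives unchanged
        n_children = pop_size - len(parents)
        idx_a = rng.integers(0, len(parents), n_children)
        idx_b = rng.integers(0, len(parents), n_children)
        mask = rng.random((n_children, 2)) < 0.5  # uniform crossover
        children = np.where(mask, parents[idx_a], parents[idx_b])
        children = np.clip(children + rng.normal(0, sigma, children.shape), 0.01, 3.0)
        pop = np.vstack([parents, children])
    fitness = np.array([solve_linear(ind)[1] for ind in pop])
    history.append(float(fitness.min()))
    best = pop[np.argmin(fitness)]
    return best, solve_linear(best)[0], history


best_rates, best_amps, history = ga_mlr()
```

Searching only over the two rates halves the dimension of the GA search space in this example, which is the simplification the abstract attributes to not fitting the linear parameters directly.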
Regression Models for Demand Reduction based on Cluster Analysis of Load Profiles
Yamaguchi, Nobuyuki; Han, Junqiao; Ghatikar, Girish; Piette, Mary Ann; Asano, Hiroshi; Kiliccote, Sila
2009-06-28
This paper provides new regression models for demand reduction of Demand Response programs for the purpose of ex ante evaluation of the programs and screening for recruiting customer enrollment into the programs. The proposed regression models employ load sensitivity to outside air temperature and a representative load pattern derived from cluster analysis of customer baseline load as explanatory variables. The performance of the proposed models was examined from the viewpoint of the validity of the explanatory variables and the fitness of the regressions, using actual load profile data of Pacific Gas and Electric Company's commercial and industrial customers who participated in the 2008 Critical Peak Pricing program, including Manual and Automated Demand Response.
Regression analysis understanding and building business and economic models using Excel
Wilson, J Holton
2012-01-01
The technique of regression analysis is used so often in business and economics today that an understanding of its use is necessary for almost everyone engaged in the field. This book will teach you the essential elements of building and understanding regression models in a business/economic context in an intuitive manner. The authors take a non-theoretical treatment that is accessible even if you have a limited statistical background. It is specifically designed to teach the correct use of regression, while advising you of its limitations and teaching about common pitfalls. This book describe
Raghavendra B.K. & S.K. Srivatsa
2011-12-01
Logistic Regression (LR) is a well known classification method in the field of statistical learning. It allows probabilistic classification and shows promising results on several benchmark problems. Logistic regression enables us to investigate the relationship between a categorical outcome and a set of explanatory variables. Artificial Neural Networks (ANNs) are popularly used as universal non-linear inference models and have gained extensive popularity in recent years. Research activities are considerable and the literature is growing. The goal of this research work is to compare the performance of logistic regression and neural network models on publicly available medical datasets. The evaluation process of the model is as follows. The logistic regression and neural network methods with sensitivity analysis have been evaluated for the effectiveness of the classification. The classification accuracy is used to measure the performance of both models. The experimental results confirm that the neural network model with sensitivity analysis gives the more efficient result.
Smith, Timothy D.; Steffen, Christopher J., Jr.; Yungster, Shaye; Keller, Dennis J.
1998-01-01
The all rocket mode of operation is shown to be a critical factor in the overall performance of a rocket based combined cycle (RBCC) vehicle. An axisymmetric RBCC engine was used to determine specific impulse efficiency values based upon both full flow and gas generator configurations. Design of experiments methodology was used to construct a test matrix and multiple linear regression analysis was used to build parametric models. The main parameters investigated in this study were: rocket chamber pressure, rocket exit area ratio, injected secondary flow, mixer-ejector inlet area, mixer-ejector area ratio, and mixer-ejector length-to-inlet diameter ratio. A perfect gas computational fluid dynamics analysis, using both the Spalart-Allmaras and k-omega turbulence models, was performed with the NPARC code to obtain values of vacuum specific impulse. Results from the multiple linear regression analysis showed that for both the full flow and gas generator configurations increasing mixer-ejector area ratio and rocket area ratio increase performance, while increasing mixer-ejector inlet area ratio and mixer-ejector length-to-diameter ratio decrease performance. Increasing injected secondary flow increased performance for the gas generator analysis, but was not statistically significant for the full flow analysis. Chamber pressure was found to be not statistically significant.
Zerrin Asan Greenacre; Levent Terlemez; Sevil Sentrk
2014-01-01
The aim of this study is to show complementary usage of logistic and correspondence analysis in a research subject to self-healing methodologies. Firstly, the number of the variables is reduced by logistic regression according to relationship between dependent and independent variables and then research carries on searching variables. The relationship among the behaviours of individuals and their demographic characteristics is modelled by logistic regression and shown graphically by correspon...
Kinnebrock, Silja; Podolskij, Mark
This paper introduces a new estimator to measure the ex-post covariation between high-frequency financial time series under market microstructure noise. We provide an asymptotic limit theory (including feasible central limit theorems) for standard methods such as regression, correlation analysis...... on the noise process can be relaxed and how our method can be applied to non-synchronous observations. We also present an empirical study of how high-frequency correlations, regressions and covariances change through time....
The primary treatment goal of radiotherapy for paragangliomas of the head and neck region (HNPGLs) is local control of the tumor, i.e. stabilization of tumor volume. Interestingly, regression of tumor volume has also been reported. Up to the present, no meta-analysis has been performed giving an overview of regression rates after radiotherapy in HNPGLs. The main objective was to perform a systematic review and meta-analysis to assess regression of tumor volume in HNPGL-patients after radiotherapy. A second outcome was local tumor control. Design of the study is systematic review and meta-analysis. PubMed, EMBASE, Web of Science, COCHRANE and Academic Search Premier and references of key articles were searched in March 2012 to identify potentially relevant studies. Considering the indolent course of HNPGLs, only studies with ⩾12 months follow-up were eligible. Main outcomes were the pooled proportions of regression and local control after radiotherapy as initial, combined (i.e. directly post-operatively or post-embolization) or salvage treatment (i.e. after initial treatment has failed) for HNPGLs. A meta-analysis was performed with an exact likelihood approach using a logistic regression with a random effect at the study level. Pooled proportions with 95% confidence intervals (CI) were reported. Fifteen studies were included, concerning a total of 283 jugulotympanic HNPGLs in 276 patients. Pooled regression proportions for initial, combined and salvage treatment were respectively 21%, 33% and 52% in radiosurgery studies and 4%, 0% and 64% in external beam radiotherapy studies. Pooled local control proportions for radiotherapy as initial, combined and salvage treatment ranged from 79% to 100%. Radiotherapy for jugulotympanic paragangliomas results in excellent local tumor control and therefore is a valuable treatment for these types of tumors. 
The effects of radiotherapy on regression of tumor volume remain ambiguous, although the data suggest that regression can be achieved at least in some patients. More research is needed to identify predictors for treatment success
The Impact of Outliers on Net-Benefit Regression Model in Cost-Effectiveness Analysis
Wen, Yu-Wen; Tsai, Yi-Wen; Wu, David Bin-Chia; Chen, Pei-Fen
2013-01-01
Ordinary least squares (OLS) regression has been widely used to analyze patient-level data in cost-effectiveness analysis (CEA). However, the estimates, inference and decision making in an economic evaluation based on OLS estimation may be biased by the presence of outliers. Robust estimation, by contrast, can remain unaffected and provide results that are resistant to outliers. The objective of this study is to explore the impact of outliers on net-benefit regression (NBR) in CEA using OLS and ...
Savescu, Roxana Florenta; Laba, Marian
2016-06-01
This paper highlights the statistical methodology used in a dissection experiment carried out in Romania to calibrate and standardize two classification devices, OptiGrade PRO (OGP) and Fat-o-Meat'er (FOM). One hundred forty-five carcasses were measured using the two probes and dissected according to the European reference method. To derive prediction formulas for each device, multiple linear regression analysis was performed on the relationship between the reference lean meat percentage and the back fat and muscle thicknesses, using the ordinary least squares technique. The root mean squared error of prediction calculated using the leave-one-out cross validation met European Commission (EC) requirements. The application of the new prediction equations reduced the gap between the lean meat percentage measured with the OGP and FOM from 2.43% (average for the period Q3/2006-Q2/2008) to 0.10% (average for the period Q3/2008-Q4/2014), providing the basis for a fair payment system for the pig producers. PMID:26835835
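The leave-one-out root mean squared error of prediction mentioned in this abstract can be computed without refitting the model once per carcass, using the hat-matrix identity for ordinary least squares. The sketch below uses synthetic measurements: only the sample size of 145 comes from the abstract, while the predictor ranges, coefficients and noise level are invented, and the function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic stand-in for the dissection data (values are assumptions).
n = 145
fat = rng.uniform(8, 30, n)      # back fat thickness, mm
muscle = rng.uniform(40, 75, n)  # muscle thickness, mm
lean = 60 - 0.6 * fat + 0.1 * muscle + rng.normal(0, 2.0, n)  # lean meat, %

X = np.column_stack([np.ones(n), fat, muscle])


def loo_rmsep(X, y):
    """Leave-one-out RMSEP via the identity e_(i) = e_i / (1 - h_ii)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    hat = X @ np.linalg.solve(X.T @ X, X.T)  # hat (projection) matrix
    loo_resid = resid / (1.0 - np.diag(hat))
    return float(np.sqrt(np.mean(loo_resid ** 2)))


def loo_rmsep_bruteforce(X, y):
    """Same quantity by explicitly refitting with each observation left out."""
    errors = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        errors.append(y[i] - X[i] @ beta)
    return float(np.sqrt(np.mean(np.square(errors))))


rmsep = loo_rmsep(X, lean)
```

The brute-force version is included only as a cross-check; for a linear model fitted by OLS the two functions agree to numerical precision.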
Emmanuel Olu Megbelayin
2014-02-01
The aim of the present study was to appraise the prime dependent variables of ophthalmic patients' satisfaction in a Nigerian public eye care facility with a view to boosting service uptake. It was a cross sectional study conducted between March and May 2012 in our centre. Consecutive clinic patients (n=251) who met the study's criteria were recruited. The patients filled in interviewer-administered structured questionnaires. A total of 251 patients were analyzed, comprising 139 males (55.4%) and 112 females (44.6%). Male:female ratio=1:0.8. The ages of the patients studied ranged from 17 to 92 years with a mean of 37.2 ± 15.57 years. Bivariate analysis, validated by multiple logistic regression, showed P values of 0.021, 0.008, 0.036, 0.008 and 0.004 for privacy, comfort during eye exam, fairness (non-partiality), thoroughness of examination and expectation, respectively. Satisfaction with overall quality of services was 80.1%. The services of any eye facility should be patient-driven to attain desired goals; therefore the identified areas of patients' dissatisfaction should be addressed for effective service uptake.
Tahsin, Subrina; Chang, Ni-Bin
2016-02-01
Stormwater wet detention ponds have been a commonly employed best management practice for stormwater management throughout the world for many years. In the past, the trophic state index values have been used to evaluate seasonal changes in water quality and rank lakes within a region or between several regions; yet, to date, there is no similar index for stormwater wet detention ponds. This study aimed to develop a new multivariate trophic state index (MTSI) suitable for conducting a rapid eutrophication assessment of stormwater wet detention ponds under uncertainty with respect to three typical physical and chemical properties. Six stormwater wet detention ponds in Florida were selected for demonstration of the new MTSI with respect to total phosphorus (TP), total nitrogen (TN), and Secchi disk depth (SDD) as cognitive assessment metrics to sense eutrophication potential collectively and inform the environmental impact holistically. Due to the involvement of multiple endogenous variables (i.e., TN, TP, and SDD) for the eutrophication assessment simultaneously under uncertainty, fuzzy synthetic evaluation was applied to first standardize and synchronize the sources of uncertainty in the decision analysis. The ordered probit regression model was then formulated for assessment based on the concept of MTSI with the inputs from the fuzzy synthetic evaluation. It is indicative that the severe eutrophication condition is present during fall, which might be due to frequent heavy summer storm events contributing to high-nutrient inputs in these six ponds. PMID:26733470
Eghnam, Karam M.; Sheta, Alaa F.
2008-06-01
Development of accurate models is necessary in critical applications such as prediction. In this paper, a solution to the stock prediction problem of the Barents Sea capelin is introduced using Artificial Neural Network (ANN) and Multiple Linear Regression (MLR) models. The capelin stock in the Barents Sea is one of the largest in the world. It has normally maintained a fishery with annual catches of up to 3 million tons. The capelin stock problem has an impact on fish stock development. The proposed prediction model was developed using an ANN with its weights adapted using a Genetic Algorithm (GA). The proposed model was compared to a traditional linear model, the MLR. The results showed that the ANN-GA model produced an overall accuracy 21% better than the MLR model.
Nose, Takashi; Kobayashi, Takao
In this paper, we propose a technique for estimating the degree or intensity of emotional expressions and speaking styles appearing in speech. The key idea is based on a style control technique for speech synthesis using a multiple regression hidden semi-Markov model (MRHSMM), and the proposed technique can be viewed as the inverse of the style control. In the proposed technique, the acoustic features of spectrum, power, fundamental frequency, and duration are simultaneously modeled using the MRHSMM. We derive an algorithm for estimating explanatory variables of the MRHSMM, each of which represents the degree or intensity of emotional expressions and speaking styles appearing in acoustic features of speech, based on a maximum likelihood criterion. We show experimental results to demonstrate the ability of the proposed technique using two types of speech data, simulated emotional speech and spontaneous speech with different speaking styles. It is found that the estimated values have correlation with human perception.
Highlights: Thermodynamic models of simple and regenerative cycles are defined. Exergy destruction rate of different components was determined. Impact of important operating parameters on cycle characteristics was determined. Multiple polynomial regression models were developed. Optimization for optimal operating parameters was performed. Abstract: In this paper, thermo-environmental, economic and regression analyses of simple and regenerative gas turbine cycles are presented. Firstly, thermodynamic models for both cycles are defined; the exergy destruction rate of different components is determined and a parametric study is carried out to investigate the effects of compressor inlet temperature, turbine inlet temperature and compressor pressure ratio on the parameters that measure cycle performance, environmental impact and costs. Subsequently, multiple polynomial regression (MPR) models are developed to correlate important response variables with predictor variables, and finally optimization is performed for optimal operating conditions. The results of the parametric study have shown a significant impact of operating parameters on the performance parameters, environmental impact and costs. According to the exergy analysis, the combustion chamber and exhaust stack are the two major sites where the largest exergy destruction/losses occur. Also, the total exergy destruction in the regenerative cycle is relatively lower, resulting in a higher exergy efficiency of the cycle. The MPR models also appear to be good estimators of the response variables, as indicated by their very high R² values. Finally, these models are used to determine the optimal operating parameters, which maximize the cycle performance and minimize CO2 emissions and costs.
Yang, Jianhong, E-mail: yangjianhong@me.ustb.edu.cn [School of Mechanical Engineering, University of Science and Technology Beijing, Beijing, 100083 (China); Yi, Cancan; Xu, Jinwu [School of Mechanical Engineering, University of Science and Technology Beijing, Beijing, 100083 (China); Ma, Xianghong [School of Engineering and Applied Science, Aston University, Birmingham B4 7ET (United Kingdom)
2015-05-01
A new LIBS quantitative analysis method based on adaptive selection of analytical lines and a Relevance Vector Machine (RVM) regression model is proposed. First, a scheme for adaptively selecting analytical lines is put forward in order to overcome the drawback of high dependency on a priori knowledge. The candidate analytical lines are automatically selected based on the built-in characteristics of spectral lines, such as spectral intensity, wavelength and width at half height. The analytical lines which will be used as input variables of the regression model are determined adaptively according to the samples for both training and testing. Second, a LIBS quantitative analysis method based on RVM is presented. The intensities of analytical lines and the elemental concentrations of certified standard samples are used to train the RVM regression model. The predicted elemental concentrations are given in the form of a confidence interval of a probabilistic distribution, which is helpful for evaluating the uncertainty contained in the measured spectra. Chromium concentration analysis experiments on 23 certified standard high-alloy steel samples have been carried out. The multiple correlation coefficient of the prediction was up to 98.85%, and the average relative error of the prediction was 4.01%. The experimental results showed that the proposed LIBS quantitative analysis method achieved better prediction accuracy and better modeling robustness compared with the methods based on partial least squares regression, artificial neural network and standard support vector machine. - Highlights: • Both training and testing samples are considered for analytical lines selection. • The analytical lines are auto-selected based on the built-in characteristics of spectral lines. • The new method can achieve better prediction accuracy and modeling robustness. • Model predictions are given with confidence interval of probabilistic distribution.
Seasonal forecasting of Bangladesh summer monsoon rainfall using simple multiple regression model
Md Mizanur Rahman; M Rafiuddin; Md Mahbub Alam
2013-04-01
In this paper, the development of a statistical forecasting method for summer monsoon rainfall over Bangladesh is described. Predictors for Bangladesh summer monsoon (June–September) rainfall were identified from the large-scale ocean–atmospheric circulation variables (i.e., sea-surface temperature, surface air temperature and sea level pressure). The predictors exhibited a significant relationship with Bangladesh summer monsoon rainfall during the period 1961–2007. After carrying out a detailed analysis of various global climate datasets, three predictors were selected. The model performance was evaluated during the period 1977–2007. The model showed better performance in its hindcast of seasonal monsoon rainfall over Bangladesh. The RMSE and Heidke skill score for 31 years were 8.13 and 0.37, respectively, and the correlation between the predicted and observed rainfall was 0.74. The BIAS of the forecasts (% of long period average, LPA) was −0.85 and the hit score was 58%. The experimental forecasts for the year 2008 summer monsoon rainfall based on the model were also found to be in good agreement with the observation.
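The verification scores quoted in this abstract (RMSE, correlation, Heidke skill score) can be illustrated with a minimal Python sketch. The abstract does not state how rainfall categories were defined for the skill score, so the category labels and all data below are hypothetical; the Heidke formula used is the common multi-category form (proportion correct minus chance agreement, divided by one minus chance agreement).

```python
import math

def rmse(pred, obs):
    """Root-mean-square error between forecasts and observations."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

def pearson_r(pred, obs):
    """Pearson correlation between forecasts and observations."""
    n = len(obs)
    mp, mo = sum(pred) / n, sum(obs) / n
    cov = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    vp = sum((p - mp) ** 2 for p in pred)
    vo = sum((o - mo) ** 2 for o in obs)
    return cov / math.sqrt(vp * vo)

def heidke_skill_score(pred_cat, obs_cat):
    """Multi-category Heidke skill score: (hit rate - chance) / (1 - chance)."""
    n = len(obs_cat)
    categories = set(pred_cat) | set(obs_cat)
    hit_rate = sum(p == o for p, o in zip(pred_cat, obs_cat)) / n
    chance = sum((pred_cat.count(c) / n) * (obs_cat.count(c) / n)
                 for c in categories)
    return (hit_rate - chance) / (1 - chance)

# Hypothetical categorical forecasts: below (b), normal (n), above (a).
forecast = ["n", "n", "n", "n"]
observed = ["n", "n", "a", "b"]
score = heidke_skill_score(forecast, observed)  # no skill beyond chance here
```

A forecast that always predicts the most common category scores zero under this definition, which is why the Heidke score is often reported alongside a raw hit score.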
Multiple factor analysis by example using R
Pagès, Jérôme
2014-01-01
Multiple factor analysis (MFA) enables users to analyze tables of individuals and variables in which the variables are structured into quantitative, qualitative, or mixed groups. Written by the co-developer of this methodology, Multiple Factor Analysis by Example Using R brings together the theoretical and methodological aspects of MFA. It also includes examples of applications and details of how to implement MFA using an R package (FactoMineR). The first two chapters cover the basic factorial analysis methods of principal component analysis (PCA) and multiple correspondence analysis (MCA). The…
Bhattacharjee, Arnab; Bhattacharjee, Madhuchhanda
2007-01-01
We propose Bayesian inference in hazard regression models where the baseline hazard is unknown, covariate effects are possibly age-varying (non-proportional), and there is multiplicative frailty with arbitrary distribution. Our framework incorporates a wide variety of order restrictions on covariate dependence and duration dependence (ageing). We propose estimation and evaluation of age-varying covariate effects when covariate dependence is monotone rather than proportional. In particular, we...
Regularized Multiple-Set Canonical Correlation Analysis
Takane, Yoshio; Hwang, Heungsun; Abdi, Herve
2008-01-01
Multiple-set canonical correlation analysis (Generalized CANO or GCANO for short) is an important technique because it subsumes a number of interesting multivariate data analysis techniques as special cases. More recently, it has also been recognized as an important technique for integrating information from multiple sources. In this paper, we…
Analysis of designed experiments by stabilised PLS Regression and jack-knifing
Martens, Harald; Høy, M.; Westad, F.; Folkenberg, D.; Martens, M.
2001-01-01
Pragmatical, visually oriented methods for assessing and optimising bi-linear regression models are described, and applied to PLS Regression (PLSR) analysis of multi-response data from controlled experiments. The paper outlines some ways to stabilise the PLSR method to extend its range of applicability to the analysis of effects in designed experiments. Two ways of passifying unreliable variables are shown. A method for estimating the reliability of the cross-validated prediction error RMSEP is demonstrated. Some recently developed jack-knifing extensions are illustrated, for estimating the...
Barndorff-Nielsen, Ole Eiler; Shephard, N.
2004-01-01
This paper analyses multivariate high frequency financial data using realized covariation. We provide a new asymptotic distribution theory for standard methods such as regression, correlation analysis, and covariance. It will be based on a fixed interval of time (e.g., a day or week), allowing the number of high frequency returns during this period to go to infinity. Our analysis allows us to study how high frequency correlations, regressions, and covariances change through time. In particular we provide confidence intervals for each of these quantities.
Han, Bing; Jing, Hongyuan; Liu, Jianping; Wu, Zhangzhong [PetroChina Pipeline R&D Center, Langfang, Hebei (China); Hao, Jianbin [School of Petroleum Engineering, Southwest Petroleum University, Chengdu, Sichuan (China)]
2010-07-01
Landslides have a serious impact on the integrity of oil and gas pipelines in the tough terrain of Western China. This paper introduces a method for determining the axial stress of pipelines subjected to landslides, using numerical simulation and regression analysis. Numerical simulation is performed to analyze how pipe stresses vary with five vulnerability assessment indexes: the distance between pipeline and landslide tail; the thickness of the landslide; the inclination angle of the landslide; the pipeline length passing through the landslide; and the buried depth of the pipeline. A pipeline passing through a certain landslide in southwest China was selected as an example to verify the feasibility and effectiveness of this method. This method has practical applicability, but it would need large numbers of examples to better verify its reliability and should be modified accordingly. Also, it only considers the case where the direction of the pipeline is perpendicular to the primary slip direction of the landslide.
The empirical model of turbine efficiency is necessary for the control- and/or diagnosis-oriented simulation and useful for the simulation and analysis of dynamic performances of the turbine equipment and systems, such as air cycle refrigeration systems, power plants, turbine engines, and turbochargers. Existing empirical models of turbine efficiency are insufficient because there is no suitable form available for air cycle refrigeration turbines. This work performs a critical review of empirical models (called mean value models in some literature) of turbine efficiency and develops an empirical model in the desired form for air cycle refrigeration, the dominant cooling approach in aircraft environmental control systems. The Taylor series and regression analysis are used to build the model, with the Taylor series being used to expand functions with the polytropic exponent and the regression analysis to finalize the model. The measured data of a turbocharger turbine and two air cycle refrigeration turbines are used for the regression analysis. The proposed model is compact and able to present the turbine efficiency map. Its predictions agree with the measured data very well, with the corrected coefficient of determination Rc2 ≥ 0.96 and the mean absolute percentage deviation = 1.19% for the three turbines. -- Highlights: → Performed a critical review of empirical models of turbine efficiency. → Developed an empirical model in the desired form for air cycle refrigeration, using the Taylor expansion and regression analysis. → Verified the method for developing the empirical model. → Verified the model.
A Bayesian ridge regression analysis of congestion's impact on urban expressway safety.
Shi, Qi; Abdel-Aty, Mohamed; Lee, Jaeyoung
2016-03-01
With the rapid growth of traffic in urban areas, concerns about congestion and traffic safety have been heightened. This study leveraged both Automatic Vehicle Identification (AVI) system and Microwave Vehicle Detection System (MVDS) installed on an expressway in Central Florida to explore how congestion impacts the crash occurrence in urban areas. Multiple congestion measures from the two systems were developed. To ensure more precise estimates of the congestion's effects, the traffic data were aggregated into peak and non-peak hours. Multicollinearity among traffic parameters was examined. The results showed the presence of multicollinearity especially during peak hours. As a response, ridge regression was introduced to cope with this issue. Poisson models with uncorrelated random effects, correlated random effects, and both correlated random effects and random parameters were constructed within the Bayesian framework. It was proven that correlated random effects could significantly enhance model performance. The random parameters model has similar goodness-of-fit compared with the model with only correlated random effects. However, by accounting for the unobserved heterogeneity, more variables were found to be significantly related to crash frequency. The models indicated that congestion increased crash frequency during peak hours while during non-peak hours it was not a major crash contributing factor. Using the random parameter model, the three congestion measures were compared. It was found that all congestion indicators had similar effects while Congestion Index (CI) derived from MVDS data was a better congestion indicator for safety analysis. Also, analyses showed that the segments with higher congestion intensity could not only increase property damage only (PDO) crashes, but also more severe crashes. 
In addition, the need to incorporate a specific congestion indicator when studying congestion's effects on safety and to account for multicollinearity between explanatory variables was also discussed. By including a specific congestion indicator, the model performance significantly improved. When comparing models with and without ridge regression, the magnitude of the coefficients was altered in the presence of multicollinearity. These conclusions suggest that the use of an appropriate congestion measure and consideration of multicollinearity among the variables would improve the models and our understanding of the effects of congestion on traffic safety. PMID:26760688
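The effect of ridge regression on collinear predictors, as described in this abstract, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the study itself fitted Bayesian Poisson models, whereas the sketch below uses the ordinary linear closed form, and the variable names (`speed`, `occupancy`) and simulated data are hypothetical.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge estimate: solve (X'X + lam*I) beta = X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Hypothetical, nearly collinear traffic predictors (simulated, not study data).
rng = np.random.default_rng(0)
n = 200
speed = rng.normal(size=n)
occupancy = -speed + 0.01 * rng.normal(size=n)
X = np.column_stack([speed, occupancy])
y = 1.0 * speed + 0.1 * rng.normal(size=n)

beta_ols = ridge_fit(X, y, lam=0.0)    # lam = 0 reduces to ordinary least squares
beta_ridge = ridge_fit(X, y, lam=1.0)  # the penalty shrinks the coefficient vector
```

With a positive penalty the coefficient vector is never longer than the least-squares one, which is the sense in which coefficient magnitudes are "altered" when predictors are collinear.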
Comparison of artificial neural network with least-square regression in quantitative analysis of XRF
The backward error propagation algorithm of an artificial neural network was applied to XRF quantitative analysis of Pt-Pd alloy. The results were compared with those of the least-square regression method. The artificial neural network technique proved more convenient and reliable.
Cognitive Differentiation Analysis: A Regression Extension of the Reynolds-Sutrick Model.
Reynolds, Thomas J.; Sutrick, Kenneth H.
1988-01-01
Cognitive Differentiation Analysis (CDA) represents a method to measure the correspondence of an individual vector or a composite vector of descriptor ratings to a matrix of pair-wise dissimilarity judgments where both sets of judgments are assumed to be ordinal. The zero intercept regression extension of CDA is described. (TJH)
Meta-regression analysis of commensal and pathogenic Escherichia coli survival in soil and water.
Franz, Eelco; Schijven, Jack; de Roda Husman, Ana Maria; Blaak, Hetty
2014-06-17
The extent to which pathogenic and commensal E. coli (respectively PEC and CEC) can survive, and which factors predominantly determine the rate of decline, are crucial issues from a public health point of view. The goal of this study was to provide a quantitative summary of the variability in E. coli survival in soil and water over a broad range of individual studies and to identify the most important sources of variability. To that end, a meta-regression analysis on available literature data was conducted. The considerable variation in reported decline rates indicated that the persistence of E. coli is not easily predictable. The meta-analysis demonstrated that for soil and water, the type of experiment (laboratory or field), the matrix subtype (type of water and soil), and temperature were the main factors included in the regression analysis. A higher average decline rate in soil of PEC compared with CEC was observed. The regression models explained at best 57% of the variation in decline rate in soil and 41% of the variation in decline rate in water. This indicates that additional factors, not included in the current meta-regression analysis, are of importance but rarely reported. More complete reporting of experimental conditions may allow future inference on the global effects of these variables on the decline rate of E. coli. PMID:24839874
Ultrasound-enhanced bioscouring of greige cotton: regression analysis of process factors
Process factors of enzyme concentration, time, power and frequency were investigated for ultrasound-enhanced bioscouring of greige cotton. A fractional factorial experimental design and subsequent regression analysis of the process factors were employed to determine the significance of each factor a...
The Use of Nonparametric Kernel Regression Methods in Econometric Production Analysis
Czekaj, Tomasz Gerard
practically and politically relevant problems and to illustrate how nonparametric regression methods can be used in applied microeconomic production analysis both in panel data and cross-section data settings. The thesis consists of four papers. The first paper addresses problems of parametric and...
What Satisfies Students?: Mining Student-Opinion Data with Regression and Decision Tree Analysis
Thomas, Emily H.; Galambos, Nora
2004-01-01
To investigate how students' characteristics and experiences affect satisfaction, this study uses regression and decision tree analysis with the CHAID algorithm to analyze student-opinion data. A data mining approach identifies the specific aspects of students' university experience that most influence three measures of general satisfaction. The
Baxter Lisa K
2008-05-01
Background: There is a growing body of literature linking GIS-based measures of traffic density to asthma and other respiratory outcomes. However, no consensus exists on which traffic indicators best capture variability in different pollutants or within different settings. As part of a study on childhood asthma etiology, we examined variability in outdoor concentrations of multiple traffic-related air pollutants within urban communities, using a range of GIS-based predictors and land use regression techniques. Methods: We measured fine particulate matter (PM2.5), nitrogen dioxide (NO2), and elemental carbon (EC) outside 44 homes representing a range of traffic densities and neighborhoods across Boston, Massachusetts and nearby communities. Multiple three to four-day average samples were collected at each home during winters and summers from 2003 to 2005. Traffic indicators were derived using Massachusetts Highway Department data and direct traffic counts. Multivariate regression analyses were performed separately for each pollutant, using traffic indicators, land use, meteorology, site characteristics, and central site concentrations. Results: PM2.5 was strongly associated with the central site monitor (R2 = 0.68). Additional variability was explained by total roadway length within 100 m of the home, smoking or grilling near the monitor, and block-group population density (R2 = 0.76). EC showed greater spatial variability, especially during winter months, and was predicted by roadway length within 200 m of the home. The influence of traffic was greater under low wind speed conditions, and concentrations were lower during summer (R2 = 0.52). NO2 showed significant spatial variability, predicted by population density and roadway length within 50 m of the home, modified by site characteristics (obstruction), and with higher concentrations during summer (R2 = 0.56).
Conclusion: Each pollutant examined displayed somewhat different spatial patterns within urban neighborhoods, and was differently related to local traffic and meteorology. Our results indicate a need for multi-pollutant exposure modeling to disentangle causal agents in epidemiological studies, and further investigation of site-specific and meteorological modification of the traffic-concentration relationship in urban neighborhoods.
Detrended fluctuation analysis as a regression framework: Estimating dependence at different scales
Kristoufek, Ladislav
2015-02-01
We propose a framework combining detrended fluctuation analysis with standard regression methodology. The method is built on detrended variances and covariances and it is designed to estimate regression parameters at different scales and under potential nonstationarity and power-law correlations. The former feature allows for distinguishing between effects for a pair of variables from different temporal perspectives. The latter ones make the method a significant improvement over the standard least squares estimation. Theoretical claims are supported by Monte Carlo simulations. The method is then applied on selected examples from physics, finance, environmental science, and epidemiology. For most of the studied cases, the relationship between variables of interest varies strongly across scales.
Methods and applications of linear models regression and the analysis of variance
Hocking, Ronald R
2013-01-01
Praise for the Second Edition: "An essential desktop reference book . . . it should definitely be on your bookshelf." -Technometrics. A thoroughly updated book, Methods and Applications of Linear Models: Regression and the Analysis of Variance, Third Edition features innovative approaches to understanding and working with models and theory of linear regression. The Third Edition provides readers with the necessary theoretical concepts, which are presented using intuitive ideas rather than complicated proofs, to describe the inference that is appropriate for the methods being discussed. The book
Pradhan, B.; Buchroithner, M. F.; Mansor, S.
2009-04-01
This paper presents the assessment results of three spatially based probabilistic models using Geoinformation Techniques (GIT) for landslide susceptibility analysis at Penang Island in Malaysia. Landslide locations within the study area were identified by interpreting aerial photographs and satellite images, supported by field surveys. Maps of the topography, soil type, lineaments and land cover were constructed from the spatial data sets. Nine landslide-related factors were extracted from the spatial database, and the neural network, frequency ratio and logistic regression coefficients of each factor were computed. Landslide susceptibility maps were drawn for the study area using neural network, frequency ratio and logistic regression models. For verification, the results of the analyses were compared with actual landslide locations in the study area. The verification results show that the frequency ratio model provides higher prediction accuracy than the ANN and regression models.
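As a rough illustration of the frequency ratio model named in this abstract, the sketch below computes, for each class of one factor, the share of landslide cells falling in that class divided by the share of the study area the class occupies. This is the commonly used definition and is assumed here, since the abstract does not spell it out; the slope classes and cell counts are hypothetical.

```python
def frequency_ratio(class_cells, landslide_cells):
    """Frequency ratio per class: (% of landslide cells) / (% of area cells)."""
    total_cells = sum(class_cells.values())
    total_slides = sum(landslide_cells.values())
    return {
        name: (landslide_cells.get(name, 0) / total_slides)
              / (cells / total_cells)
        for name, cells in class_cells.items()
    }

# Hypothetical raster counts for a slope-angle factor (degrees).
slope_cells = {"0-15": 6000, "15-30": 3000, ">30": 1000}
slope_slides = {"0-15": 20, "15-30": 40, ">30": 40}

fr = frequency_ratio(slope_cells, slope_slides)
# A ratio above 1 marks a class with more landslides than its area share suggests.
```

In a susceptibility map, the ratios of all factors are typically summed per cell, so a cell's index is high when it falls in over-represented classes of several factors.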
Zainal Ahmad
2007-10-01
Different methods for modelling nonlinear systems are investigated in this paper. Neural network (NN) techniques, multiple linear regression (MLR) and principal component regression (PCR) are applied to two nonlinear systems: a sine function and a distillation column. To study these three distinct methods, all the data are taken from simulation and then separated into training, testing and validation sets. Among these approaches, the NN approach based on the nonlinear prediction technique gives very good performance in both case studies. It is also shown that the MLR model suffers from glitches due to the collinearity of the input variables, whereas the PCR model shows good results in the prediction output. In conclusion, the NN methods exhibit consistent results with the least sum of squared errors (SSE) on the unseen data compared to the other two techniques.
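The contrast drawn here between MLR and PCR under collinear inputs can be made concrete with a small principal component regression sketch. It is an assumption-laden illustration, not the paper's implementation: the function name `pcr_fit`, the simulated inputs and the choice of two components are all hypothetical.

```python
import numpy as np

def pcr_fit(X, y, k):
    """Principal component regression keeping the first k components.

    Returns (intercept, beta) expressed in the original input coordinates.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    # Right singular vectors of the centered inputs are the PCA loadings.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V_k = Vt[:k].T
    scores = Xc @ V_k
    gamma = np.linalg.lstsq(scores, y - y_mean, rcond=None)[0]
    beta = V_k @ gamma
    intercept = y_mean - x_mean @ beta
    return intercept, beta

# Hypothetical inputs: the first two columns are nearly identical.
rng = np.random.default_rng(0)
t = rng.normal(size=100)
X = np.column_stack([t, t + 0.01 * rng.normal(size=100), rng.normal(size=100)])
y = 3.0 * t + 0.1 * rng.normal(size=100)

intercept, beta = pcr_fit(X, y, k=2)  # drops the low-variance direction
```

Keeping every component reproduces the ordinary least-squares fit; dropping the smallest-variance component removes the direction along which collinear inputs make the MLR coefficients unstable.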
Cirulli, N; Ballini, A; Cantore, S; Farronato, D; Inchingolo, F; Dipalma, G; Gatto, M R; Alessandri Bonetti, G
2015-01-01
Mixed dentition analysis forms a critical aspect of early orthodontic treatment. In fact, an accurate space analysis is one of the important criteria in determining whether the treatment plan may involve serial extraction, guidance of eruption, space maintenance, space regaining or just periodic observation of the patients. The aim of the present study was to calculate linear regression equations for mixed dentition space analysis by measuring mesiodistal tooth widths on 230 dental casts obtained from southern Italian patients (118 females, 112 males, mean age 15±3 years). Student's t-test or the Wilcoxon test for independent and paired samples was used to determine right/left side and male/female differences. Using the sum of the mesiodistal diameters of the 4 mandibular incisors as the predictor for the sum of the widths of the canines and premolars in the mandibular mixed dentition, a new linear regression equation was found: y = 0.613x+7.294 (r= 0.701) for both genders in a southern Italian population. To better estimate the size of leeway space, a new regression equation was found to calculate the mesiodistal size of the second premolar using the sum of the four mandibular incisors, canine and first premolar as a predictor. The equation is y = 0.241x+1.224 (r= 0.732). In conclusion, new regression equations were derived for a southern Italian population. PMID:26122245
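The two regression equations reported in this abstract can be applied directly, as in the short Python sketch below. The coefficients come from the abstract; the function names and the input widths are hypothetical, and the abstract does not state units or whether sums are taken per side, so millimetres are assumed purely for illustration.

```python
def predict_canine_premolar_sum(incisor_sum):
    """y = 0.613x + 7.294, x = summed widths of the 4 mandibular incisors."""
    return 0.613 * incisor_sum + 7.294

def predict_second_premolar(predictor_sum):
    """y = 0.241x + 1.224, x = summed widths of incisors, canine, first premolar."""
    return 0.241 * predictor_sum + 1.224

# Hypothetical measurements (assumed to be in millimetres).
canine_premolar_estimate = predict_canine_premolar_sum(23.0)
second_premolar_estimate = predict_second_premolar(24.0)
```

Since the reported correlations are r = 0.701 and r = 0.732, roughly half of the variance in the predicted widths is explained (r squared of about 0.49 and 0.54), so individual predictions carry appreciable error.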
Wang, Wen-Cheng; Cho, Wen-Chien; Chen, Yin-Jen
2014-01-01
It is estimated that mainland Chinese tourists travelling to Taiwan can bring annual revenues of 400 billion NTD to the Taiwan economy. Thus, how the Taiwanese Government formulates relevant measures to satisfy both sides is the focus of most concern. Taiwan must improve the facilities and service quality of its tourism industry so as to attract more mainland tourists. This paper conducted a questionnaire survey of mainland tourists and used grey relational analysis in grey mathematics to analyze the satisfaction performance of all satisfaction question items. The first eight satisfaction items were used as independent variables, and the overall satisfaction performance was used as a dependent variable for quantile regression model analysis to discuss the relationship between the dependent variable under different quantiles and independent variables. Finally, this study further discussed the predictive accuracy of the least mean regression model and each quantile regression model, as a reference for research personnel. The analysis results showed that other variables could also affect the overall satisfaction performance of mainland tourists, in addition to occupation and age. The overall predictive accuracy of quantile regression model Q0.25 was higher than that of the other three models. PMID:24574916
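Quantile regression models such as the Q0.25 model mentioned in this abstract are fitted by minimizing an asymmetric "pinball" loss rather than squared error. The following Python sketch shows that loss and checks, on hypothetical scores, that the best constant prediction under it is the corresponding sample quantile; it is not the authors' model, which used eight satisfaction items as predictors.

```python
import numpy as np

def pinball_loss(y, y_hat, tau):
    """Mean pinball (check) loss for quantile level tau in (0, 1)."""
    diff = np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float)
    # Under-predictions are weighted by tau, over-predictions by (1 - tau).
    return float(np.mean(np.maximum(tau * diff, (tau - 1.0) * diff)))

def best_constant(y, tau):
    """Observed value that minimizes the pinball loss as a constant prediction."""
    candidates = sorted(set(y))
    return min(candidates, key=lambda c: pinball_loss(y, [c] * len(y), tau))

# Hypothetical overall-satisfaction scores.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]
lower_quartile_fit = best_constant(scores, tau=0.25)
median_fit = best_constant(scores, tau=0.5)
```

Replacing the constant with a linear function of the predictors and minimizing the same loss gives a quantile regression line for that quantile level.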
Analysis of multiple primary cancers
From January 1971 to August 1979, 4156 patients with malignant tumors other than brain tumors were registered at the Department of Radiotherapy, National Sapporo Hospital. Seventy-one of these patients had multiple primary cancers, an incidence of 1.71% in our series. One patient had four separate primary cancers, arising in the cervix uteri, the sigmoid colon, the thymus and the stomach, respectively. In 27 cases (38.0%), the cancers occurred within 1 year of each other. The longest interval was 33 years. Five cases were considered to be radiation-induced cancers; they developed secondarily in the irradiated region between 5 and 26 years after the completion of irradiation. In 25% of cases, the patient had a family history of cancer. (author)
Forecasting Model for IPTV Service in Korea Using Bootstrap Ridge Regression Analysis
Lee, Byoung Chul; Kee, Seho; Kim, Jae Bum; Kim, Yun Bae
Telecom firms in Korea are taking new steps to prepare for the next generation of convergence services, IPTV. In this paper we describe our analysis of an effective method for forecasting demand for IPTV broadcasting. We considered three types of scenarios based on aspects of the potential IPTV market and compared the results. The forecasting method used in this paper is the multi-generation substitution model with bootstrap ridge regression analysis.
Statistical Properties of Multivariate Distance Matrix Regression for High-Dimensional Data Analysis
Zapala, Matthew A.; Schork, Nicholas J.
2012-01-01
Multivariate distance matrix regression (MDMR) analysis is a statistical technique that allows researchers to relate P variables to an additional M factors collected on N individuals, where P ≫ N. The technique can be applied to a number of research settings involving high-dimensional data types such as DNA sequence data, gene expression microarray data, and imaging data. MDMR analysis involves computing the distance between all pairs of individuals with respect to P variables of interest and...
Critical Regression Analysis of Real Time Industrial Web Data Set Using Data Mining Tool
Kohli, Shruti; Gupta, Ankit
2014-01-01
In today's fast-paced, highly competitive, volatile and challenging world, companies rely heavily on analysis of data obtained both offline and online to shape their future strategy and to sustain themselves in the market. This paper reviews the application of regression techniques to real-time web data in order to analyse different attributes of interest and to predict possible growth factors for the company, so as to enable the company to make strategic decisions for its growth.
Geroukis, Asterios; Brorson, Erik
2014-01-01
In this study, we compare two statistical techniques, logistic regression and discriminant analysis, to see how well they classify companies based on clusters made from the solvency ratio, using principal components as independent variables. The principal components are made with different financial ratios. We use cluster analysis to find groups with low, medium and high solvency ratio among 1200 different companies found on the NASDAQ stock market and use this as an a priori definition of ...
Analysis of Dynamic Multiplicity Fluctuations at PHOBOS
Chai, Zhengwei; Collaboration, for the PHOBOS
2005-01-01
This paper presents the analysis of the dynamic fluctuations in the inclusive charged particle multiplicity measured by PHOBOS for Au+Au collisions at sqrt(s_NN) = 200 GeV within the pseudo-rapidity range of -3
Wang, Chong; Sun, Qun; Wahab, Magd Abdel; Zhang, Xingyu; Xu, Limin
2015-09-01
Rotary cup brushes mounted on each side of a road sweeper undertake heavy debris removal tasks, but their characteristics were not well known until recently. A Finite Element (FE) model that can analyze brush deformation and predict brush characteristics has been developed to investigate the sweeping efficiency and to assist the controller design. However, the FE model requires a large amount of CPU time to simulate each brush design and operating scenario, which may limit its application in a real-time system. This study develops a mathematical regression model to summarize the FE modeled results. The complex brush load characteristic curves were statistically analyzed to quantify the effects of cross-section, length, mounting angle, displacement, rotational speed, etc. The data were then fitted by a multiple variable regression model using the maximum likelihood method. The fitted results showed good agreement with the FE analysis results and experimental results, suggesting that the mathematical regression model may be directly used in a real-time system to predict characteristics of different brushes under varying operating conditions. The methodology may also be used in the design and optimization of rotary brush tools. PMID:26123978
Predicting the amount of hospital waste produced is helpful for managing its storage, transportation and disposal. On this basis, two predictor models, artificial neural networks (ANNs) and multiple linear regression (MLR), were applied to predict the rate of medical waste generation, both in total and by type (sharp, infectious and general). In this study, a 5-fold cross-validation procedure on a database containing a total of 50 hospitals of Fars province (Iran) was used to verify the performance of the models. Three performance measures, MAR, RMSE and R2, were used to evaluate the performance of the models. The MLR, as a conventional model, obtained poor prediction performance measure values. However, MLR distinguished hospital capacity and bed occupancy as the more significant parameters. On the other hand, ANNs, a more powerful model that had not previously been introduced for predicting the rate of medical waste generation, showed high performance measure values, especially an R2 value of 0.99, confirming the good fit of the data. Such satisfactory results could be attributed to the non-linear nature of ANNs in problem solving, which provides the opportunity to relate independent variables to dependent ones non-linearly. In conclusion, the results showed that our ANN-based modeling approach is very promising and may play a useful role in developing a better cost-effective strategy for waste management in the future.
D'Souza, Sonia; Rasmussen, John; Schwirtz, Ansgar
2012-01-01
valuable ergonomic tool. Objective: To investigate age and gender effects on the torque-producing ability in the knee and elbow in older adults. To create strength scaled equations based on age, gender, upper/lower limb lengths and masses using multiple linear regression. To reduce the number of dependent...... parameters based on statistical redundancies, and then validate these equations. Methods: 283 subjects (141 males, 142 females) aged 50-59 years (54.9 +/- 2.9), 60-69 years (65.4 +/- 2.9) and 70-79 years (73.7 +/- 2.7) were tested for maximal voluntary isometric torque of right knee extensors and elbow...... flexors. Results: Males were significantly stronger than females across all age groups. Elbow peak torque (EPT) was better preserved from 60s to 70s whereas knee peak torque (KPT) reduced significantly (P<0.05) across all age groups. This held true for males and females. Gender, thigh mass and age best...
Gregoretti, Francesco; Belcastro, Vincenzo; di Bernardo, Diego; Oliva, Gennaro
2010-01-01
The reverse engineering of gene regulatory networks using gene expression profile data has become crucial to gain novel biological knowledge. Large amounts of data that need to be analyzed are currently being produced due to advances in microarray technologies. Using current reverse engineering algorithms to analyze large data sets can be very computational-intensive. These emerging computational requirements can be met using parallel computing techniques. It has been shown that the Network Identification by multiple Regression (NIR) algorithm performs better than the other ready-to-use reverse engineering software. However it cannot be used with large networks with thousands of nodes--as is the case in biological networks--due to the high time and space complexity. In this work we overcome this limitation by designing and developing a parallel version of the NIR algorithm. The new implementation of the algorithm reaches a very good accuracy even for large gene networks, improving our understanding of the gene regulatory networks that is crucial for a wide range of biomedical applications. PMID:20422008
Fereshteh Shiri
2010-08-01
In the present work, support vector machines (SVMs) and multiple linear regression (MLR) techniques were used for quantitative structure–property relationship (QSPR) studies of retention time (tR) in standardized liquid chromatography–UV–mass spectrometry of 67 mycotoxins (aflatoxins, trichothecenes, roquefortines and ochratoxins) based on molecular descriptors calculated from the optimized 3D structures. By applying missing value, zero and multicollinearity tests with a cutoff value of 0.95, and a genetic algorithm method of variable selection, the most relevant descriptors were selected to build QSPR models. MLR and SVM methods were employed to build the QSPR models. The robustness of the QSPR models was characterized by statistical validation and applicability domain (AD). The prediction results from the MLR and SVM models are in good agreement with the experimental values. The correlation and predictability measures r2 and q2 are 0.931 and 0.932, respectively, for SVM and 0.923 and 0.915, respectively, for MLR. The applicability domain of the model was investigated using William’s plot. The effects of different descriptors on the retention times are described.
Regression analysis to predict growth performance from dietary net energy in growing-finishing pigs.
Nitikanchana, S; Dritz, S S; Tokach, M D; DeRouchey, J M; Goodband, R D; White, B J
2015-06-01
Data from 41 trials with multiple energy levels (285 observations) were used in a meta-analysis to predict growth performance based on dietary NE concentration. Nutrient and energy concentrations in all diets were estimated using the NRC ingredient library. Predictor variables examined for best fit models using Akaike information criteria included linear and quadratic terms of NE, BW, CP, standardized ileal digestible (SID) Lys, crude fiber, NDF, ADF, fat, ash, and their interactions. The initial best fit models included interactions between NE and CP or SID Lys. After removal of the observations that fed SID Lys below the suggested requirement, these terms were no longer significant. Including dietary fat in the model with NE and BW significantly improved the G:F prediction model, indicating that NE may underestimate the influence of fat on G:F. The meta-analysis indicated that, as long as diets are adequate for other nutrients (i.e., Lys), dietary NE is adequate to predict changes in ADG across different dietary ingredients and conditions. The analysis indicates that ADG increases with increasing dietary NE and BW but decreases when BW is above 87 kg. The G:F ratio improves with increasing dietary NE and fat but decreases with increasing BW. The regression equations were then evaluated by comparing the actual and predicted performance of 543 finishing pigs in 2 trials fed 5 dietary treatments, included 3 different levels of NE by adding wheat middlings, soybean hulls, dried distillers grains with solubles (DDGS; 8 to 9% oil), or choice white grease (CWG) to a corn-soybean meal-based diet. Diets were 1) 30% DDGS, 20% wheat middlings, and 4 to 5% soybean hulls (low energy); 2) 20% wheat middlings and 4 to 5% soybean hulls (low energy); 3) a corn-soybean meal diet (medium energy); 4) diet 2 supplemented with 3.7% CWG to equalize the NE level to diet 3 (medium energy); and 5) a corn-soybean meal diet with 3.7% CWG (high energy). 
Only small differences were observed between predicted and observed values of ADG and G:F except for the low-energy diet containing the greatest fiber content (30% DDGS diet), where ADG and G:F were overpredicted by 3 to 6%. Therefore, the prediction equations provided a good estimation of the growth rate and feed efficiency of growing-finishing pigs fed different levels of dietary NE except for the pigs fed the low-energy diet containing the greatest fiber content. PMID:26115270
Applying support vector regression analysis on grip force level-related corticomuscular coherence
Rong, Yao; Han, Xixuan; Hao, Dongmei; Cao, Liu; Wang, Qing; Li, Mingai; Duan, Lijuan; Zeng, Yanjun
2014-01-01
accessory muscle, this study proposed an expanded support vector regression (ESVR) algorithm to quantify the coherence between electroencephalogram (EEG) from sensorimotor cortex and surface electromyogram (EMG) from brachioradialis in upper limb. A measure called coherence proportion was introduced to...... compare the corticomuscular coherence in the alpha (7–15Hz), beta (15–30Hz) and gamma (30–45Hz) band at 25 % maximum grip force (MGF) and 75 % MGF. Results show that ESVR could reduce the influence of deflected signals and summarize the overall behavior of multiple coherence curves. Coherence proportion...
The estimation of Aerosol Optical Depth in eastern China based on regression analysis
Wang, Jing; Shi, Runhe; Liu, Chaoshun; Zhou, Cong
2015-09-01
Atmospheric pollution and air quality problems are worsening in China, and the formation mechanisms of aerosols and their environmental effects have attracted increasing attention. Aerosol Optical Depth (AOD) is one of the most important parameters indicating atmospheric turbidity and aerosol load. High-quality AOD data are important for studies of the atmospheric environment (e.g., air quality). This paper used MODIS/Terra AOD from 2008 to improve the coverage of MODIS/Aqua AOD, based on a linear regression model. The RMSE between the estimated values and the Aqua AOD retrieved by satellite is 0.132. The average value of the test data was 0.812, and the average of the regression results was 0.807, showing that the regression model between Terra AOD and Aqua AOD worked well. We also built two sets of estimation models (for MODIS AOD and OMI AOD) through stepwise regression analysis. One uses OMI AOD and meteorological elements to estimate MODIS AOD; its RMSE was 0.113, which represents 13.916% of the average (R2=0.782). The other uses MODIS AOD and meteorological elements to estimate OMI AOD; its RMSE is 0.132, which represents 18.182% of the average (R2=0.726).
Groping Toward Linear Regression Analysis: Newton's Analysis of Hipparchus' Equinox Observations
Belenkiy, Ari
2008-01-01
In 1700, Newton, in designing a new universal calendar contained in the manuscripts known as Yahuda MS 24 from Jewish National and University Library at Jerusalem and analyzed in our recent article in Notes & Records Royal Society (59 (3), Sept 2005, pp. 223-54), attempted to compute the length of the tropical year using the ancient equinox observations reported by a famous Greek astronomer Hipparchus of Rhodes, ten in number. Though Newton had a very thin sample of data, he obtained a tropical year only a few seconds longer than the correct length. The reason lies in Newton's application of a technique similar to modern regression analysis. Actually he wrote down the first of the two so-called "normal equations" known from the Ordinary Least Squares method. Newton also had a vague understanding of qualitative variables. This paper concludes by discussing open historico-astronomical problems related to the inclination of the Earth's axis of rotation. In particular, ignorance about the long-range variation...
Sadi Elasan
2015-01-01
Multinomial logistic regression analysis is a technique used to examine relationships between independent and dependent variables when the dependent variable has three or more categories. In multinomial logistic regression analysis, one category of the dependent variable is taken as the reference category and the other categories are analyzed with respect to it. In this study, multinomial logistic regression analysis was introduced and an application was carried out. In the application, the trauma variable was coded in 4 categories [not abused (0), sexually abused (1), physically abused (2), sexually and physically abused (3)] and the effects of other variables on trauma were examined. As a result, it can be noted that multinomial logistic regression analysis is applicable when the response variable contains 3 or more categories.
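The reference-category idea described above can be illustrated with a minimal sketch. It is not the authors' analysis: the coefficient values and the covariate vector are hypothetical, and the function name `multinomial_probabilities` is introduced here only for illustration. Category 0 plays the role of the reference category, so its linear predictor is fixed at zero and each remaining category gets its own coefficient vector.

```python
import numpy as np

def multinomial_probabilities(x, coefs):
    """Category probabilities for a reference-category multinomial logit.

    x     : covariate vector of length p (include a 1.0 for the intercept)
    coefs : (K-1, p) array, one row per non-reference category;
            the reference category (index 0) has all-zero coefficients.
    """
    x = np.asarray(x, dtype=float)
    coefs = np.asarray(coefs, dtype=float)
    # Linear predictor is 0 for the reference category.
    logits = np.concatenate(([0.0], coefs @ x))
    logits = logits - logits.max()  # subtract the max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

# Hypothetical example with 4 outcome categories (0 = reference).
x = np.array([1.0, 0.5])  # intercept term and one made-up covariate
coefs = np.array([[0.2, -1.0],
                  [-0.3, 0.4],
                  [0.1, 0.1]])
probs = multinomial_probabilities(x, coefs)

# Each fitted coefficient row describes log-odds relative to the reference.
log_odds_vs_reference = np.log(probs[1:] / probs[0])
```

The last line shows why the choice of reference category matters for interpretation: every coefficient is a log-odds contrast against category 0, not an absolute effect.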
From correspondence analysis to multiple and joint correspondence analysis
Greenacre, Michael
2005-01-01
The generalization of simple (two-variable) correspondence analysis to more than two categorical variables, commonly referred to as multiple correspondence analysis, is neither obvious nor well-defined. We present two alternative ways of generalizing correspondence analysis, one based on the quantification of the variables and intercorrelation relationships, and the other based on the geometric ideas of simple correspondence analysis. We propose a version of multiple correspondence analysis...
Selenium Exposure and Cancer Risk: an Updated Meta-analysis and Meta-regression
Xianlei Cai; Chen Wang; Wanqi Yu; Wenjie Fan; Shan Wang; Ning Shen; Pengcheng Wu; Xiuyang Li; Fudi Wang
2016-01-01
The objective of this study was to investigate the associations between selenium exposure and cancer risk. We identified 69 studies and applied meta-analysis, meta-regression and dose-response analysis to obtain available evidence. The results indicated that high selenium exposure had a protective effect on cancer risk (pooled OR = 0.78; 95%CI: 0.73–0.83). The results of linear and nonlinear dose-response analysis indicated that high serum/plasma selenium and toenail selenium had the efficacy...
Analysis of the Evolution of the Gross Domestic Product by Means of Cyclic Regressions
Catalin Angelo Ioan
2011-08-01
In this article, we carry out an analysis of the regularity of the Gross Domestic Product of a country, in our case the United States. The analysis is based on a new method: cyclic regressions based on the Fourier series of a function. Another point of view is to consider, instead of the growth rate of GDP, the speed of variation of this rate, computed as a numerical derivative. The results obtained show a cycle of 71 years for this indicator, the mean square error being 0.93%. The method described allows a prognosis of short-term trends in GDP.
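One plausible reading of a regression based on a Fourier series is an ordinary least-squares fit of a truncated sine/cosine expansion with a fixed period. The sketch below follows that reading and is an assumption, not the article's procedure: the function name `fit_cyclic_regression`, the number of harmonics, and the synthetic series are all hypothetical, and the 71-unit period is used only to echo the abstract.

```python
import numpy as np

def fit_cyclic_regression(t, y, period, n_harmonics):
    """Least-squares fit of a truncated Fourier series with a fixed period.

    Returns (coef, fitted, rmse); coef is ordered as
    [a0, a1, b1, a2, b2, ...] for a0 + sum_k a_k cos(k w t) + b_k sin(k w t).
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    omega = 2.0 * np.pi / period
    columns = [np.ones_like(t)]
    for k in range(1, n_harmonics + 1):
        columns.append(np.cos(k * omega * t))
        columns.append(np.sin(k * omega * t))
    X = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ coef
    rmse = float(np.sqrt(np.mean((y - fitted) ** 2)))
    return coef, fitted, rmse

# Synthetic, noise-free series covering two full cycles (not real GDP data).
t = np.arange(142, dtype=float)
y = 3.0 + 1.5 * np.cos(2 * np.pi * t / 71.0) - 0.5 * np.sin(2 * np.pi * t / 71.0)
coef, fitted, rmse = fit_cyclic_regression(t, y, period=71.0, n_harmonics=2)
```

In practice the period itself would have to be chosen, for example by repeating the fit over a grid of candidate periods and keeping the one with the smallest error; that search is omitted here.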
Regression And Time Series Analysis Of Loan Default At Minescho Cooperative Credit Union Tarkwa
Otoo
2015-08-01
Lending in the form of loans is a principal business activity for banks, credit unions and other financial institutions, and loans form a substantial part of a bank's assets. However, when these loans are defaulted on, there tend to be serious effects on the financial institutions. This study sought to determine the trend of, and to forecast, loan default at Minescho Credit Union, Tarkwa. Secondary data from the Credit Union were analyzed using regression analysis and the Box-Jenkins method of time series analysis. The regression analysis showed a moderately strong relationship between the amount of loan default and time, and the amount of loan default had an increasing trend. The two-year forecast of the amount of loan default oscillated initially and remained constant from 2016 onwards.
Robust estimation for homoscedastic regression in the secondary analysis of case-control data
Wei, Jiawei
2012-12-04
Primary analysis of case-control studies focuses on the relationship between disease D and a set of covariates of interest (Y, X). A secondary application of the case-control study, which is often invoked in modern genetic epidemiologic association studies, is to investigate the interrelationship between the covariates themselves. The task is complicated owing to the case-control sampling, where the regression of Y on X is different from what it is in the population. Previous work has assumed a parametric distribution for Y given X and derived semiparametric efficient estimation and inference without any distributional assumptions about X. We take up the issue of estimation of a regression function when Y given X follows a homoscedastic regression model, but otherwise the distribution of Y is unspecified. The semiparametric efficient approaches can be used to construct semiparametric efficient estimates, but they suffer from a lack of robustness to the assumed model for Y given X. We take an entirely different approach. We show how to estimate the regression parameters consistently even if the assumed model for Y given X is incorrect, and thus the estimates are model robust. For this we make the assumption that the disease rate is known or well estimated. The assumption can be dropped when the disease is rare, which is typically so for most case-control studies, and the estimation algorithm simplifies. Simulations and empirical examples are used to illustrate the approach.
Coherence Motivated Sampling and Convergence Analysis of Least-Squares Polynomial Chaos Regression
Hampton, Jerrad; Doostan, Alireza
2014-01-01
Independent sampling of orthogonal polynomial bases via Monte Carlo is of interest for uncertainty quantification of models, using Polynomial Chaos (PC) expansions. It is known that bounding the spectral radius of a random matrix consisting of PC samples, yields a bound on the number of samples necessary to identify coefficients in the PC expansion via solution to a least-squares regression problem. We present a related analysis which guarantees a mean square convergence using a coherence par...
Detrended fluctuation analysis as a regression framework: Estimating dependence at different scales
Krištoufek, Ladislav
2015-01-01
Vol. 91, No. 1 (2015), 022802-1-022802-5. ISSN 1539-3755. R&D Projects: GA ČR(CZ) GP14-11402P. Other grants: GA ČR(CZ) GAP402/11/0948. Institutional support: RVO:67985556. Keywords: Detrended cross-correlation analysis; Regression; Scales. Subject RIV: AH - Economics. Impact factor: 2.288 (year 2014). http://library.utia.cas.cz/separaty/2015/E/kristoufek-0452315.pdf
Functional Mixture Discriminant Analysis with hidden process regression for curve classification
Chamroukhi, Faicel; Glotin, Her; Rabouy, Cline
2013-01-01
We present a new mixture model-based discriminant analysis approach for functional data using a specific hidden process regression model. The approach allows for fitting flexible curve-models to each class of complex-shaped curves presenting regime changes. The model parameters are learned by maximizing the observed-data log-likelihood for each class by using a dedicated expectation-maximization (EM) algorithm. Comparisons on simulated data with alternative approaches show that the proposed a...
A Logistic Regression Analysis of the Contractor's Awareness Regarding Waste Management
Rawshan Ara Begum; Chamhuri Siwar; Joy Jacqueline Pereira; Abdul Hamid Jaafar
2006-01-01
This study highlights a number of factors affecting contractors' awareness of construction waste management in the construction industry. The data in the present study are based on contractors registered with the Construction Industry Development Board of Malaysia. Binary logistic regression analysis is employed to explore the factors affecting this awareness. Contractor's awareness regarding waste management will tend to be significantly adequate with the increasing values in th...
Soil colour and spectral analysis employing linear regression models I. Effect of organic matter
Moustakas N.K.; Barouchas P.E.
2004-01-01
This work comprises an investigation into whether soil reflectance spectral analysis which is employed to calculate the colour characteristics (hue, value, chroma) of soil can be carried out using linear regression models, so that comparison of colour characteristics subsequently becomes possible, and also statistically documented. To this end the colour of soil samples was calculated through spectrum reflectance in the visible region of dry smooth-rubbed soil samples smaller than 250 mm. The...
Augmented kludge waveforms and Gaussian process regression for EMRI data analysis
Chua, Alvin J K
2016-01-01
Extreme-mass-ratio inspirals (EMRIs) will be an important type of astrophysical source for future space-based gravitational-wave detectors. There is a trade-off between accuracy and computational speed for the EMRI waveform templates required in the analysis of data from these detectors. We discuss how the systematic error incurred by using faster templates may be reduced with improved models such as augmented kludge waveforms, and marginalised over with statistical techniques such as Gaussian process regression.
Estimation of Output Disturbance in Auto-Regressive Model via Independent Component Analysis
Tanaka, R; Kawaguchi, K; J. Endo; Shibasaki, H.; Y. Hikichi; Ishida, Y.
2013-01-01
This paper explains and demonstrates how to estimate an output disturbance in an auto-regressive model. This method uses the independent component analysis (ICA) technique, which restores source signals from their linear mixtures under the assumption that the source signals are mutually independent. The estimation is achieved by a model whose source signals consist of input and output disturbance, and observed signals consist of input and output. To solve the ICA problem, a natural gradient m...
Lee, C. Y.; Tippett, M. K.; Sobel, A. H.; Camargo, S. J.
2014-12-01
We are working towards the development of a new statistical-dynamical downscaling system to study the influence of climate on tropical cyclones (TCs). The first step is development of an appropriate model for TC intensity as a function of environmental variables. We approach this issue with a stochastic model consisting of a multiple linear regression model (MLR) for 12-hour intensity forecasts as a deterministic component, and a random error generator as a stochastic component. Similar to the operational Statistical Hurricane Intensity Prediction Scheme (SHIPS), MLR relates the surrounding environment to storm intensity, but with only essential predictors calculated from monthly-mean NCEP reanalysis fields (potential intensity, shear, etc.) and from persistence. The deterministic MLR is developed with data from 1981-1999 and tested with data from 2000-2012 for the Atlantic, Eastern North Pacific, Western North Pacific, Indian Ocean, and Southern Hemisphere basins. While the global MLR's skill is comparable to that of the operational statistical models (e.g., SHIPS), the distribution of the predicted maximum intensity from deterministic results has a systematic low bias compared to observations; the deterministic MLR creates almost no storms with intensities greater than 100 kt. The deterministic MLR can be significantly improved by adding the stochastic component, based on the distribution of random forecasting errors from the deterministic model compared to the training data. This stochastic component may be thought of as representing the component of TC intensification that is not linearly related to the environmental variables. We find that in order for the stochastic model to accurately capture the observed distribution of maximum storm intensities, the stochastic component must be auto-correlated across 12-hour time steps. 
This presentation also includes a detailed discussion of the distributions of other TC-intensity related quantities, as well as the inter-annual variability of predicted storm intensity in the form of accumulated cyclone energy (ACE). Applying this stochastic model in conjunction with global climate model fields is an ongoing task.
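The combination of a deterministic forecast with an auto-correlated random error can be sketched as follows. The abstract does not state the form of the error process, so the first-order autoregressive (AR(1)) model used here is an assumption, as are the function names, the parameter values `phi` and `sigma`, and the constant placeholder standing in for the MLR output.

```python
import numpy as np

def simulate_ar1_errors(n_steps, phi, sigma, rng):
    """Draw AR(1) errors e[i] = phi * e[i-1] + innovation.

    The innovation standard deviation is scaled so that the marginal
    standard deviation of the errors equals sigma.
    """
    errors = np.empty(n_steps)
    errors[0] = sigma * rng.standard_normal()
    innovation_sd = sigma * np.sqrt(1.0 - phi ** 2)
    for i in range(1, n_steps):
        errors[i] = phi * errors[i - 1] + innovation_sd * rng.standard_normal()
    return errors

def lag1_autocorrelation(x):
    """Sample autocorrelation at lag one."""
    centered = x - x.mean()
    return float(np.sum(centered[1:] * centered[:-1]) / np.sum(centered ** 2))

rng = np.random.default_rng(42)
n_steps = 20000
errors = simulate_ar1_errors(n_steps, phi=0.8, sigma=5.0, rng=rng)

# Placeholder for the deterministic MLR forecast (hypothetical constant, in kt).
deterministic_forecast = np.full(n_steps, 60.0)
stochastic_forecast = deterministic_forecast + errors
```

Setting `phi` to zero recovers independent errors at each 12-hour step, which is the case the abstract reports as insufficient for reproducing the observed distribution of maximum intensities.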
Rosana de Cassia de Souza Schneider
2011-03-01
Air is an efficient means of dispersing atmospheric pollutants, and its behavior depends on the atmospheric movements that occur in the troposphere. In Porto Alegre, Rio Grande do Sul State, there is heavy daily traffic and a concentration of industries that may be responsible for atmospheric emissions. In the present work we studied the behavior of daily concentrations of particulate matter (PM10) in this city, considering the influence of meteorological variables. Data analysis was performed using descriptive statistics, linear correlation and multiple regression. Data were provided by the State Foundation of Environmental Protection Henrique Luiz Roessler - RS (FEPAM) and the National Institute of Meteorology (INMET). Based on the analysis it was possible to verify that: the concentrations of PM10, measured every day at 4:00 p.m., did not exceed national air quality standards; the meteorological elements that influenced the concentrations of PM10 were the daily average wind speed and the daily average radiation, with negative relations, and the daily average air temperature and the north and northwest wind directions, with positive relations. The wind directions that contribute significantly to lower concentrations at the measured sites are east and southeast.
Mok Tik
2014-06-01
This study formulates regression of vector data that will enable statistical analysis of various geodetic phenomena such as polar motion, ocean currents, typhoon/hurricane tracking, crustal deformations, and precursory earthquake signals. The observed vector variable of an event (dependent vector variable) is expressed as a function of a number of hypothesized phenomena realized also as vector variables (independent vector variables) and/or scalar variables that are likely to impact the dependent vector variable. The proposed representation has the unique property of solving the coefficients of independent vector variables (explanatory variables) also as vectors, hence it supersedes multivariate multiple regression models, in which the unknown coefficients are scalar quantities. For the solution, complex numbers are used to represent vector information, and the method of least squares is deployed to estimate the vector model parameters after transforming the complex vector regression model into a real vector regression model through isomorphism. Various operational statistics for testing the predictive significance of the estimated vector parameter coefficients are also derived. A simple numerical example demonstrates the use of the proposed vector regression analysis in modeling typhoon paths.
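The idea of representing 2-D vectors as complex numbers and then solving an equivalent real least-squares problem can be sketched as below. This is an illustration under stated assumptions rather than the paper's formulation: the function names, the synthetic predictor, and the coefficient values are hypothetical, and the real system uses the standard mapping of a complex number a + ib to the 2x2 block [[a, -b], [b, a]].

```python
import numpy as np

def fit_complex_regression(X, y):
    """Least-squares fit with complex design matrix X (n, p) and response y (n,)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def fit_via_real_isomorphism(X, y):
    """Same fit, solved as a real system of twice the size.

    (Xr + i Xi)(br + i bi) = (Xr br - Xi bi) + i (Xi br + Xr bi)
    """
    A = np.block([[X.real, -X.imag],
                  [X.imag, X.real]])
    b = np.concatenate([y.real, y.imag])
    solution, *_ = np.linalg.lstsq(A, b, rcond=None)
    p = X.shape[1]
    return solution[:p] + 1j * solution[p:]

# Synthetic example: one vector predictor (east + i*north) plus an intercept.
rng = np.random.default_rng(0)
n = 50
predictor = rng.normal(size=n) + 1j * rng.normal(size=n)
X = np.column_stack([np.ones(n, dtype=complex), predictor])
true_coef = np.array([0.5 - 0.2j, 0.8 + 0.3j])  # made-up vector coefficients
y = X @ true_coef  # noise-free response vectors

coef_complex = fit_complex_regression(X, y)
coef_real = fit_via_real_isomorphism(X, y)
```

A complex coefficient scales and rotates its predictor vector, which is what a scalar coefficient in an ordinary multivariate regression cannot do.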
Repeated-measures regression designs and analysis for environmental effects monitoring programs
Paine, Michael D.; Skinner, Marc A.; Kilgour, Bruce W.; DeBlois, Elisabeth M.; Tracy, Ellen
2014-12-01
This paper provides a general overview of repeated-measures (RM) regression designs and analysis for marine monitoring programs, in support of sediment chemistry, particle size and benthic macroinvertebrate community analyses provided as part of this series. In RM regression designs, the same n replicates (usually stations in monitoring programs) are re-sampled (i.e., repeatedly measured) at t>1 Times (usually years). The stations provide variation in the predictor, or X variables. In the Terra Nova environmental effects monitoring (EEM) program, n=48 stations were sampled in each of t=7 years from 2000 to 2010. Two distance measures from five drill centres (sources of drilling wastes) were fixed predictor variables. RM regression designs are rarely used in environmental monitoring programs, but are often suitable and would be appropriate if applied to data from many monitoring programs. For the Terra Nova EEM program, carry-over effects, or persistent and usually small-scale variations among stations unrelated to distance, were strong for most sediment quality variables. Whenever natural carry-over effects are strong, RM designs and analysis will usually be more powerful and suitable than alternative approaches to the analysis.
Structural Model Analysis of Multiple Quantitative Traits
Li, Renhua; Tsaih, Shirng-Wern; Shockley, Keith; Stylianou, Ioannis M.; Wergedal, Jon; Paigen, Beverly; Churchill, Gary A.
2006-01-01
Synopsis Disease states are often associated with multiple, correlated traits that may result from shared genetic and nongenetic factors. Genetic analysis of multiple traits can reveal a network of effects in which each trait is influenced by more than one genetic locus (heterogeneity) and different traits share one or more loci in common (pleiotropy). Physiological interactions independent of genetic factors may also contribute to the observed correlations. Structural equation modeling is pr...
Gardênia Abbad
2002-01-01
This article discusses applications of stepwise and hierarchical multiple regression analyses to research in organizational psychology. Strategies for identifying type I and II errors, and solutions to potential problems that may arise from such errors are proposed. In addition, phenomena such as suppression, complementarity, and redundancy are reviewed. The article presents examples of research where these phenomena occurred, and the manner in which they were explained by researchers. Some applications of multiple regression analyses to studies involving between-variable interactions are presented, along with tests used to analyze the presence of linearity among variables. Finally, some suggestions are provided for dealing with limitations implicit in multiple regression analyses (stepwise and hierarchical).
Lin, Lixin; Wang, Yunjia; Teng, Jiyao; Wang, Xuchen
2016-02-01
Hyperspectral estimation of soil organic matter (SOM) in coal mining regions is an important tool for enhancing fertilization in soil restoration programs. The correlation-partial least squares regression (PLSR) method effectively solves the information loss problem of correlation-multiple linear stepwise regression, but results of the correlation analysis must be optimized to improve precision. This study considers the relationship between spectral reflectance and SOM based on spectral reflectance curves of soil samples collected from coal mining regions. Based on the major absorption troughs in the 400-1006 nm spectral range, PLSR analysis was performed using 289 independent bands of the second derivative (SDR) with three levels and measured SOM values. A wavelet-correlation-PLSR (W-C-PLSR) model was then constructed. By amplifying useful information that was previously obscured by noise, the W-C-PLSR model was optimal for estimating SOM content, with smaller prediction errors in both calibration (R² = 0.970, root mean square error (RMSEC) = 3.10, and mean relative error (MREC) = 8.75) and validation (RMSEV = 5.85 and MREV = 14.32) analyses, as compared with other models. Results indicate that W-C-PLSR has great potential to estimate SOM in coal mining regions. PMID:26780416
A multivariate regression analysis is applied to decay measurements of ?-resp. ?-filter activity. Activity concentrations for Po-218, Pb-214 and Bi-214, resp. for the Rn-222 equilibrium equivalent concentration are obtained explicitly. The regression analysis properly takes into account the variances of the measured count rates and their influence on the resulting activity concentrations. (orig.)
Multiple-user neutron activation analysis system
The Nuclear Data, Inc., computer ND6600, a state-of-the-art multiple-user laboratory computer system, has been applied to neutron activation analysis (NAA) data acquisition and data processing. The ND6600 NAA software is specifically aimed at solving four problem areas: (1) unification of reactor-parameter and standard comparison techniques in a single analysis, (2) use of multiple comparative standards in a single analysis, (3) improvement of statistical processing, and (4) determination of minimum detectable concentrations. Thirteen NAA program modules were developed. Software modules can be run either manually or automatically in any combination. Operations can be easily tailored to meet the unique needs of each activation analyst. This applications software capability is designed around the ND6600 COMBUS and uses true distributed processing hardware. User operations are implemented through the multiple-user MIDAS software operating system.
Correlation Study and Regression Analysis of Drinking Water Quality in Kashan City, Iran
Mohammad Mehdi HEYDARI
2013-06-01
Chemical and statistical regression analysis was carried out on drinking water samples from five fields (21 sampling wells) with a hot and dry climate in Kashan city, central Iran. Samples were collected from October 2006 to May 2007 (25-30 °C). Comparing the results with drinking water quality standards issued by the World Health Organization (WHO), it is found that some of the water samples are not potable. Hydrochemical facies using a Piper diagram indicate that in most parts of the city, the chemical character of water is dominated by NaCl. All samples showed sulfate and sodium ion contents higher, and K+ and F- contents lower, than the permissible limit. A strongly positive correlation is observed between TDS and EC (R = 0.995) and between Ca2+ and TH (R = 0.948). The results showed that several regression relations have the same correlation coefficients: (I) pH-TH, EC-TH (R = 0.520), (II) NO3- -pH, TH-pH (R = 0.520), (III) Ca2+-SO42-, TH-SO42-, Cl- -SO42- (R = 0.630). The results revealed that systematic calculations of correlation coefficients between water parameters and regression analysis provide a useful means for rapid monitoring of water quality.
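The correlation coefficients reported above (for example between TDS and EC) are of the Pearson type. A minimal sketch of that calculation follows; the function name and the sample values are hypothetical and are not taken from the Kashan data set.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical readings: electrical conductivity and total dissolved solids.
ec = [500.0, 800.0, 1200.0, 1500.0]
tds = [0.64 * value for value in ec]   # exactly proportional, so r should be 1
r = pearson_r(ec, tds)
print(round(r, 6))
```

With real measurements the proportionality is only approximate, which is why values such as R = 0.995 rather than exactly 1 are observed.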
Cigarette Smoking Habits among Men and Women in Turkey: A Meta Regression Analysis
F Sahin Mutlu; U Ayranci; K Ozdamar
2006-01-01
Over the past decade, smoking has become more prevalent in Turkey than in western countries. This study was conducted to estimate parameters of gender-related smoking habits with minimum variance. Of the ninety-two studies of smoking habits conducted in Turkey from 1981 to 2003, 60 were deemed appropriate for meta-analysis and meta-regression analysis. The proportions of men and women smoking cigarettes were 0.51 and 0.35, respectiv...
A regression analysis of the effect of energy use in agriculture
This study investigates the impacts of energy use on the productivity of Turkey's agriculture. It reports the results of a regression analysis of the relationship between energy use and agricultural productivity. The study is based on the analysis of yearbook data for the period 1971-2003. Agricultural productivity was specified as a function of its energy consumption (TOE) and gross additions of fixed assets during the year. Least squares (LS) estimation was employed to estimate the equation parameters. The data for this study come from the State Institute of Statistics (SIS) and the Ministry of Energy of Turkey.
Sub-pixel estimation of tree cover and bare surface densities using regression tree analysis
Carlos Augusto Zangrando Toneli
2011-09-01
Sub-pixel analysis is capable of generating continuous fields, which represent the spatial variability of certain thematic classes. The aim of this work was to develop numerical models to represent the variability of tree cover and bare surfaces within the study area. This research was conducted in the riparian buffer within a watershed of the São Francisco River in the North of Minas Gerais, Brazil. IKONOS and Landsat TM imagery were used with the GUIDE algorithm to construct the models. The results were two index images derived with regression trees for the entire study area, one representing tree cover and the other representing bare surface. The use of non-parametric and non-linear regression tree models presented satisfactory results to characterize wetland, deciduous and savanna patterns of forest formation.
Identification of cotton properties to improve yarn count quality by using regression analysis
Identification of raw material characteristics that contribute to yarn count variation was studied by using statistical techniques. Regression analysis is used to meet the objective. Stepwise regression is used for model selection, and the coefficient of determination and mean squared error (MSE) criteria are used to identify the contributing factors of cotton properties for yarn count. Statistical assumptions of normality, autocorrelation and multicollinearity are evaluated by using the probability plot, Durbin Watson test, and variance inflation factor (VIF), and then model fitting is carried out. It is found that invisible (INV), nepness (Nep), grayness (RD), cotton trash (TR) and uniformity index (VI) are the main contributing cotton properties for yarn count variation. The results are also verified by Pareto chart. (author)
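Two of the diagnostics named above have simple closed forms. The sketch below computes the Durbin-Watson statistic from a residual series and the variance inflation factor for the special case of exactly two predictors; the function names and residual values are hypothetical, and the two-predictor VIF shortcut is a simplifying assumption (with more predictors, each VIF comes from an auxiliary regression).

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: near 2 suggests no first-order autocorrelation."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den

def vif_two_predictors(r):
    """VIF when the model has two predictors with mutual correlation r."""
    return 1.0 / (1.0 - r * r)

# Hypothetical residuals: strictly alternating signs (negative autocorrelation).
dw_alternating = durbin_watson([1.0, -1.0, 1.0, -1.0])
# Hypothetical residuals: constant (strong positive autocorrelation).
dw_constant = durbin_watson([1.0, 1.0, 1.0, 1.0])
print(dw_alternating, dw_constant, vif_two_predictors(0.9))
```

Values of the statistic well below 2 point to positive autocorrelation and values well above 2 to negative autocorrelation; a large VIF signals that a predictor is nearly a linear combination of the others.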
Dervilis, N.; Worden, K.; Cross, E. J.
2015-07-01
In the data-based approach to structural health monitoring (SHM), the absence of data from damaged structures in many cases forces a dependence on novelty detection as a means of diagnosis. Unfortunately, this means that benign variations in the operating or environmental conditions of the structure must be handled very carefully, lest they lead to false alarms. If novelty detection is implemented in terms of outlier detection, the outliers may arise in the data as the result of both benign and malign causes and it is important to understand their sources. Comparatively recent developments in the field of robust regression have the potential to provide ways of exploring and visualising SHM data as a means of shedding light on the different origins of outliers. The current paper will illustrate the use of robust regression for SHM data analysis through experimental data acquired from the Z24 and Tamar Bridges, although the methods are general and not restricted to SHM or civil infrastructure.
Analysis of ontogenetic spectra of populations of plants and lichens via ordinal regression
Sofronov, G. Yu.; Glotov, N. V.; Ivanov, S. M.
2015-03-01
Ontogenetic spectra of plants and lichens tend to vary across the populations. This means that if several subsamples within a sample (or a population) were collected, then the subsamples would not be homogeneous. Consequently, the statistical analysis of the aggregated data would not be correct, which could potentially lead to false biological conclusions. In order to take into account the heterogeneity of the subsamples, we propose to use ordinal regression, which is a type of generalized linear regression. In this paper, we study the populations of cowberry Vaccinium vitis-idaea L. and epiphytic lichens Hypogymnia physodes (L.) Nyl. and Pseudevernia furfuracea (L.) Zopf. We obtain estimates for the proportions of between-sample variability in the total variability of the ontogenetic spectra of the populations.
Alados, C.L.; Pueyo, Y.; Giner, M.L.; Navarro, T.; Escos, J.; Barroso, F.; Cabezudo, B.; Emlen, J.M.
2003-01-01
We studied the effect of grazing on the degree of regression of successional vegetation dynamic in a semi-arid Mediterranean matorral. We quantified the spatial distribution patterns of the vegetation by fractal analyses, using the fractal information dimension and spatial autocorrelation measured by detrended fluctuation analyses (DFA). It is the first time that fractal analysis of plant spatial patterns has been used to characterize the regressive ecological succession. Plant spatial patterns were compared over a long-term grazing gradient (low, medium and heavy grazing pressure) and on ungrazed sites for two different plant communities: A middle dense matorral of Chamaerops and Periploca at Sabinar-Romeral and a middle dense matorral of Chamaerops, Rhamnus and Ulex at Requena-Montano. The two communities differed also in the microclimatic characteristics (sea oriented at the Sabinar-Romeral site and inland oriented at the Requena-Montano site). The information fractal dimension increased as we moved from a middle dense matorral to discontinuous and scattered matorral and, finally to the late regressive succession, at Stipa steppe stage. At this stage a drastic change in the fractal dimension revealed a change in the vegetation structure, accurately indicating end successional vegetation stages. Long-term correlation analysis (DFA) revealed that an increase in grazing pressure leads to unpredictability (randomness) in species distributions, a reduction in diversity, and an increase in cover of the regressive successional species, e.g. Stipa tenacissima L. These comparisons provide a quantitative characterization of the successional dynamic of plant spatial patterns in response to grazing perturbation gradient. © 2002 Elsevier Science B.V. All rights reserved.
Chung, Chaeuk; Park, Dong IL; Kim, Sun Young; Ju Ock KIM; Jung, Sung Soo; Park, Hee Sun; Moon, Jae Young; Kim, Sung Min; Cho, Min Ji; Jung, Sang Ok; Lee, Choong Sik; LEE, JEONG EUN
2015-01-01
Spontaneous regression (SR) of cancer is defined as a complete or partial, temporary or permanent disappearance of all or at least some relevant parameters of malignant disease with inadequate or no treatment. SR of cancer is an extremely rare phenomenon. We report a case of a 67-year-old man who experienced SR of non-small-cell lung cancer (NSCLC), which progressed after fifth-line chemotherapy and regressed after chemotherapy ceased. Surprisingly, the primary tumor size continued to decreas...
Computation of distance to fault on an electrical transmission line is affected by many sources of uncertainty, including parameter setting errors, measurement errors, as well as absence of information and incomplete modelling of a system under fault condition. In this paper we propose an application of the variance-based global sensitivity measures for evaluation of fault location algorithms. The main goal of the evaluation is to identify factors and their interactions that contribute to the fault locator output variability. This analysis is based on the results of Sparse Grid Regression. The method compiles the Functional ANOVA model to represent fault locator output as a function of uncertain factors. The ANOVA model provides a tool for interpretation and sensitivity analysis. In practice, such analysis can help in functional performance tests, especially in: selection of the optimal fault location algorithm (device) for a specific application, calibration process and building confidence in a fault location function result. The paper concludes with an application example which demonstrates use of the proposed methodology in testing and comparing some commonly used fault location algorithms. This example is also used to demonstrate numerical efficiency for this type of application of the proposed Sparse Grid Regression method in comparison to the Quasi-Monte Carlo approach. - Highlights: ► Sparse Grid Regression (SGR) method has been developed and presented in the paper. ► The SGR method is able to fit ANOVA model to input/output data of a black-box function. ► The SGR provides variance-based sensitivities to be used for Global Sensitivity Analysis (GSA). ► The SGR algorithm relies on the numerical multi-dimensional integration on a sparse grid. ► Application example presented is GSA of fault-locating algorithms used in electrical networks.
P300 Amplitude in Alzheimer's Disease: A Meta-Analysis and Meta-Regression.
Hedges, Dawson; Janis, Rebecca; Mickelson, Stephen; Keith, Cierra; Bennett, David; Brown, Bruce L
2016-01-01
Alzheimer's disease accounts for 60% of all dementia. Numerous biomarkers have been developed that can help in making an early diagnosis. The P300 is an event-related potential that may be abnormal in Alzheimer's disease. Given the possible association between P300 amplitude and Alzheimer's disease and the need for biomarkers in early Alzheimer's disease, the main purpose of this meta-analysis and meta-regression was to characterize P300 amplitude in probable Alzheimer's disease compared to healthy controls. Using online search engines, we identified peer-reviewed articles containing amplitude measures for the P300 in response to a visual or auditory oddball stimulus in subjects with Alzheimer's disease and in a healthy control group and pooled effect sizes for differences in P300 amplitude between Alzheimer's disease and control groups to obtain summary effect sizes. We also used meta-regression to determine whether age, sex, educational attainment, or dementia severity affected the association between P300 amplitude and Alzheimer's disease. Twenty articles containing a total of 646 subjects met inclusion and exclusion criteria. The overall effect size from all electrode locations was 1.079 (95% confidence interval=0.745-1.412, P…). Meta-regression showed an association between amplitude and educational attainment, but no association between amplitude and age, sex, and dementia severity. In conclusion, P300 amplitude is smaller in subjects with Alzheimer's disease than in healthy controls. PMID:25253434
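Pooling per-study effect sizes into a summary effect with a confidence interval is commonly done with inverse-variance weights. The sketch below uses the DerSimonian-Laird random-effects estimator; the abstract does not state which estimator the authors used, so this choice, the function name, and the input numbers are all assumptions made for illustration.

```python
import math

def pool_random_effects(effects, variances):
    """DerSimonian-Laird pooled effect, 95% CI, and between-study variance."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(ws * yi for ws, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), tau2

# Hypothetical standardized mean differences and their variances.
pooled, ci, tau2 = pool_random_effects([1.0, 1.0, 1.0], [0.1, 0.2, 0.3])
print(pooled, ci, tau2)
```

When the studies agree exactly, as in this toy input, the between-study variance is estimated as zero and the result coincides with the fixed-effect estimate.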
Ryu, Duchwan
2010-09-28
We consider nonparametric regression analysis in a generalized linear model (GLM) framework for data with covariates that are the subject-specific random effects of longitudinal measurements. The usual assumption that the effects of the longitudinal covariate processes are linear in the GLM may be unrealistic and if this happens it can cast doubt on the inference of observed covariate effects. Allowing the regression functions to be unknown, we propose to apply Bayesian nonparametric methods including cubic smoothing splines or P-splines for the possible nonlinearity and use an additive model in this complex setting. To improve computational efficiency, we propose the use of data-augmentation schemes. The approach allows flexible covariance structures for the random effects and within-subject measurement errors of the longitudinal processes. The posterior model space is explored through a Markov chain Monte Carlo (MCMC) sampler. The proposed methods are illustrated and compared to other approaches, the "naive" approach and the regression calibration, via simulations and by an application that investigates the relationship between obesity in adulthood and childhood growth curves. © 2010, The International Biometric Society.
Highlights: • A new method useful for the parametric analysis and optimization of reactor core designs. • This uses the strengths of genetic algorithms (GA), and regression splines. • The method is applied to the core fuel pin cell of a PHWR design. • Tools like java, R, and codes like Serpent, Matlab are used in this research. - Abstract: An analysis and optimization of a set of neutronics parameters of a thorium-fueled pressurized heavy water reactor core fuel has been performed. The analysis covers a detailed pin-cell analysis of a seed-blanket configuration, where the seed is composed of natural uranium, and the blanket is composed of thorium. A genetic algorithm (GA) is used to optimize the input parameters to meet a specific set of objectives related to: infinite multiplication factor, initial breeding ratio, and specific nuclide’s effective microscopic cross-section. The core input parameters are the pitch-to-diameter ratio, and blanket material composition. A recursive partitioning of decision trees (rpart) multivariate regression model is used to perform a predictive analysis of the samples generated from the GA module. Reactor designs are usually complex and a simulation needs a significantly large amount of time to execute, hence direct implementation of GA or any other global optimization technique is not feasible; therefore we present a new method of using rpart in conjunction with GA. By using rpart, we do not need to run the neutronics simulation for all the inputs generated from the GA module; rather, we run the simulations for a predefined set of inputs, build a regression fit to the input and the output parameters, and then use this fit to predict the output parameters for the inputs generated by GA. The rpart model is implemented as a library using the R programming language.
The results suggest that the initial breeding ratio tends to increase due to a harder neutron spectrum, however a softer neutron spectrum is desired to limit the parasitic absorption of Pa-233. The neutronics model, design and analysis have been done using the Serpent 1.1.19 Monte Carlo code.
Gaines Das, R.E.; Tydeman, M.S. (National Inst. for Biological Standards and Control, London (UK))
1982-08-01
A program, WRANL, is described for the analysis of immunoassays or bioassays which have a logistic dose-response relationship. Responses are transformed to logits and iterative weighted regression analysis is used to obtain log dose-logit response lines for all preparations compared in an assay. Potency estimates of preparations relative to the standard preparation are available for both unweighted and weighted regression analyses together with detailed analysis of variance, estimates of slope and other relevant parameters. The general comparisons of dose-response relationships produced by the program are a feature of particular interest. However, an option which suppresses the more general output is available if the program is to be used for analysis of a 'screening' assay comparing single dilutions or doses of test samples with a standard curve. Data input is designed to permit immediate running of the program by junior personnel. Data output is designed to facilitate record keeping.
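The core step described above, transforming responses to logits and fitting a weighted line against log dose, can be sketched in a few lines. This is a single-pass illustration and not the WRANL program: the function names, the weights p(1-p), and the dose-response data are hypothetical, and an iterative procedure would recompute the weights from the fitted values and repeat.

```python
import math

def logit(p):
    """Log-odds of a response proportion p in (0, 1)."""
    return math.log(p / (1.0 - p))

def weighted_line(x, y, w):
    """Weighted least-squares slope and intercept of y on x."""
    sw = sum(w)
    x_mean = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_mean = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - x_mean) * (yi - y_mean) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_mean) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

# Hypothetical assay: responses follow an exact logistic curve in log dose.
log_dose = [math.log(d) for d in (1.0, 2.0, 4.0, 8.0)]
p = [1.0 / (1.0 + math.exp(-(-1.0 + 1.5 * x))) for x in log_dose]
weights = [pi * (1.0 - pi) for pi in p]   # assumed weighting, largest near p = 0.5
slope, intercept = weighted_line(log_dose, [logit(pi) for pi in p], weights)
print(slope, intercept)
```

Relative potency of a test preparation would then follow from the horizontal displacement between its fitted line and that of the standard, provided the lines are parallel.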
Rapeli Pekka
2012-11-01
Background: Cognitive deficits and multiple psychoactive drug regimens are both common in patients treated for opioid-dependence. Therefore, we examined whether the cognitive performance of patients in opioid-substitution treatment (OST) is associated with their drug treatment variables. Methods: Opioid-dependent patients (N = 104) who were treated either with buprenorphine or methadone (n = 52 in both groups) were given attention, working memory, verbal, and visual memory tests after they had been a minimum of six months in treatment. Group-wise results were analysed by analysis of variance. Predictors of cognitive performance were examined by hierarchical regression analysis. Results: Buprenorphine-treated patients performed statistically significantly better in a simple reaction time test than methadone-treated ones. No other significant differences between groups in cognitive performance were found. In each OST drug group, approximately 10% of the attention performance could be predicted by drug treatment variables. Use of benzodiazepine medication predicted about 10% of performance variance in working memory. Treatment with more than one other psychoactive drug (than opioid or BZD) and frequent substance abuse during the past month predicted about 20% of verbal memory performance. Conclusions: Although this study does not prove a causal relationship between multiple prescription drug use and poor cognitive functioning, the results are relevant for psychosocial recovery, vocational rehabilitation, and psychological treatment of OST patients. Especially for patients with BZD treatment, other treatment options should be actively sought.
Analysis of sparse data in logistic regression in medical research: A newer approach
S Devika
2016-01-01
Background and Objective: In the analysis of dichotomous type response variable, logistic regression is usually used. However, the performance of logistic regression in the presence of sparse data is questionable. In such a situation, a common problem is the presence of high odds ratios (ORs) with very wide 95% confidence interval (CI) (OR: >999.999, 95% CI: 999.999). In this paper, we addressed this issue by using penalized logistic regression (PLR) method. Materials and Methods: Data from case-control study on hyponatremia and hiccups conducted in Christian Medical College, Vellore, Tamil Nadu, India was used. The outcome variable was the presence/absence of hiccups and the main exposure variable was the status of hyponatremia. Simulation dataset was created with different sample sizes and with a different number of covariates. Results: A total of 23 cases and 50 controls were used for the analysis of ordinary and PLR methods. The main exposure variable hyponatremia was present in nine (39.13%) of the cases and in four (8.0%) of the controls. Of the 23 hiccup cases, all were males and among the controls, 46 (92.0%) were males. Thus, the complete separation between gender and the disease group led into an infinite OR with 95% CI (OR: >999.999, 95% CI: 999.999) whereas there was a finite and consistent regression coefficient for gender (OR: 5.35; 95% CI: 0.42, 816.48) using PLR. After adjusting for all the confounding variables, hyponatremia entailed 7.9 (95% CI: 2.06, 38.86) times higher risk for the development of hiccups as was found using PLR whereas there was an overestimation of risk OR: 10.76 (95% CI: 2.17, 53.41) using the conventional method. Simulation experiment shows that the estimated coverage probability of this method is near the nominal level of 95% even for small sample sizes and for a large number of covariates.
Conclusions: PLR is almost equal to the ordinary logistic regression when the sample size is large and is superior in small cell values.
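The separation problem described above can be reproduced from the counts given in the abstract: all 23 cases were male, while 46 of 50 controls were male, so one cell of the 2x2 table is zero. The sketch below computes the crude odds ratio and, as a simple stand-in, the Haldane-Anscombe continuity correction (adding 0.5 to every cell). This correction is not the penalized logistic regression the authors used, and the function name is hypothetical; it only illustrates why some form of shrinkage yields a finite estimate.

```python
import math

def odds_ratio(a, b, c, d, correction=0.0):
    """Odds ratio for a 2x2 table.

    a: exposed cases, b: unexposed cases,
    c: exposed controls, d: unexposed controls.
    """
    a, b, c, d = (value + correction for value in (a, b, c, d))
    if b == 0 or c == 0:
        return math.inf          # zero cell in the denominator: no finite estimate
    return (a * d) / (b * c)

# Gender (male = exposed), counts from the abstract: 23/0 cases, 46/4 controls.
crude_gender = odds_ratio(23, 0, 46, 4)
corrected_gender = odds_ratio(23, 0, 46, 4, correction=0.5)

# Hyponatremia, unadjusted: 9 of 23 cases and 4 of 50 controls exposed.
crude_hyponatremia = odds_ratio(9, 14, 4, 46)
print(crude_gender, round(corrected_gender, 3), round(crude_hyponatremia, 3))
```

The crude hyponatremia figure is unadjusted, so it is not directly comparable to the covariate-adjusted odds ratios quoted in the abstract.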
Süleyman Demir; İbrahim Alper Köse
2014-01-01
This study performs a Differential Item Functioning (DIF) analysis in terms of gender and culture on the items in the PISA 2009 mathematics literacy sub-test. The DIF analyses were done through the Mantel-Haenszel, Logistic Regression and SIBTEST methods. The data for the gender variable were collected from the responses given by 332 students to the items in the mathematics literacy sub-test during the administration of the 5th booklet in the PISA 2009 application whereas the data ...
A systematic review and meta-regression analysis of mivacurium for tracheal intubation
Vanlinthout, L. E. H.; Mesfin, S. H.; Hens, Niel; Vanacker, B. F.; Robertson, E. N.; Booij, L.H.D.J.
2014-01-01
We systematically reviewed factors associated with intubation conditions in randomised controlled trials of mivacurium, using random-effects meta-regression analysis. We included 29 studies of 1050 healthy participants. Four factors explained 72.9% of the variation in the probability of excellent intubation conditions: mivacurium dose, 24.4%; opioid use, 29.9%; time to intubation and age together, 18.6%. The odds ratio (95% CI) for excellent intubation was 3.14 (1.65-5.73) for doubling the mi...
LINEAR REGRESSION MODEL IN THE ANALYSIS OF THE GROSS DOMESTIC PRODUCT
Constantin ANGHELACHE
2011-12-01
As we ascertain the evolutionary trend of the global economy, it becomes evident that a strict analysis of the evolution of a single micro- or macro-economic indicator is no longer enough to describe the corresponding phenomenon, as the emphasis shifts towards the analysis of the correlations existing between two or more indicators, which can offer a much stronger insight into the economic phenomenon. We propose to use the simple linear regression model, a relatively easy and very effective way to establish the correlation between two economic indicators. Measuring the influence of the factors on the indicator will most surely offer additional information on the phenomenon they describe.
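The simple linear regression model mentioned above has a closed-form least-squares solution. The sketch below fits y = intercept + slope·x and reports the coefficient of determination; the function name and the indicator values are hypothetical and do not come from the article.

```python
def simple_ols(x, y):
    """Least-squares slope, intercept, and R^2 for one predictor."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    sxx = sum((a - x_mean) ** 2 for a in x)
    syy = sum((b - y_mean) ** 2 for b in y)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared

# Hypothetical indicator pairs, e.g. final consumption (x) against GDP (y).
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
slope, intercept, r_squared = simple_ols(x, y)
print(slope, intercept, r_squared)
```

The slope estimates how much the dependent indicator changes per unit change in the explanatory one, and R^2 gives the share of its variance that the fitted line accounts for.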
The computer program FREQFIT is designed to perform regression and statistical chi-squared goodness of fit analysis on one-dimensional or two-dimensional data. The program features an interactive user dialogue, numerous help messages, an option for screen or line printer output, and the flexibility to use practically any commercially available graphics package to create plots of the program's results. FREQFIT is written in Microsoft QuickBASIC, for IBM-PC compatible computers. A listing of the QuickBASIC source code for the FREQFIT program, a user manual, and sample input data, output, and plots are included. 6 refs., 1 fig
Analysis of electrical resistance tomography (ERT) data using least-squares regression modelling in industrial process tomographs has been tested. Potential differences measured between electrodes in rings have been used to carry out the regression modelling to investigate the location and size of a disturbance present in the system. Extensive experiments have been carried out with ERT to test a suitable regression algorithm to extract the disturbance. Current analysis has been performed for a single disturbance known to be present in the system. For the environment considered, the least-squares regression reported in this paper demonstrates an alternative approach for analysis of tomography data in industrial applications. The position (concentric or off-centre) and the size of the disturbance (in concentric cases) can be well defined by the reported regression modelling approach. However, it is still a challenge to define the size of the off-centre disturbance
Mandal, Nilrudra; Doloi, Biswanath; Mondal, Biswanath
2015-05-01
In the present study, an attempt has been made to apply the Taguchi parameter design method and regression analysis for optimizing the cutting conditions on surface finish while machining AISI 4340 steel with the help of the newly developed yttria based Zirconia Toughened Alumina (ZTA) inserts. These inserts are prepared through wet chemical co-precipitation route followed by powder metallurgy process. Experiments have been carried out based on an orthogonal array L9 with three parameters (cutting speed, depth of cut and feed rate) at three levels (low, medium and high). Based on the mean response and signal to noise ratio (SNR), the best optimal cutting condition has been arrived at A3B1C1 i.e. cutting speed is 420 m/min, depth of cut is 0.5 mm and feed rate is 0.12 m/min considering the condition smaller is the better approach. Analysis of Variance (ANOVA) is applied to find out the significance and percentage contribution of each parameter. The mathematical model of surface roughness has been developed using regression analysis as a function of the above mentioned independent variables. The predicted values from the developed model and experimental values are found to be very close to each other justifying the significance of the model. A confirmation run has been carried out with 95 % confidence level to verify the optimized result and the values obtained are within the prescribed limit.
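For a response such as surface roughness, where smaller is better, the Taguchi signal-to-noise ratio is -10·log10 of the mean squared response, and the preferred level of each factor is the one with the highest mean SNR. The sketch below shows both steps; the function names and roughness values are hypothetical, not the experimental data of the study.

```python
import math

def snr_smaller_is_better(values):
    """Taguchi signal-to-noise ratio (dB) for a smaller-the-better response."""
    return -10.0 * math.log10(sum(v * v for v in values) / len(values))

def best_level(snr_by_level):
    """Return the factor level whose runs have the highest mean SNR."""
    means = {level: sum(s) / len(s) for level, s in snr_by_level.items()}
    return max(means, key=means.get)

# Hypothetical roughness readings (micrometres) grouped by cutting-speed level.
roughness = {
    "low": [[2.0, 2.2], [2.4, 2.1], [2.3, 2.5]],
    "medium": [[1.6, 1.7], [1.9, 1.8], [1.7, 1.6]],
    "high": [[1.1, 1.2], [1.3, 1.0], [1.2, 1.1]],
}
snr = {level: [snr_smaller_is_better(run) for run in runs]
       for level, runs in roughness.items()}
print(best_level(snr))
```

In an L9 design each factor level appears in three runs, so the same averaging is repeated per factor to arrive at a combination such as A3B1C1.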
Purpose: The goal of this study was to maximize the discrimination between benign and malignant masses in patients with sonographically indeterminate ovarian lesions by means of unenhanced and contrast-enhanced MR imaging, and to develop a computer-assisted diagnosis system. Material and Methods: Findings in precontrast and Gd-DTPA contrast-enhanced MR images of 104 patients with 115 sonographically indeterminate ovarian masses were analyzed, and the results were correlated with histopathological findings. Of 115 lesions, 65 were benign (23 cystadenomas, 13 complex cysts, 11 teratomas, 6 fibrothecomas, 12 others) and 50 were malignant (32 ovarian carcinomas, 7 metastatic tumors of the ovary, 4 carcinomas of the fallopian tubes, 7 others). A logistic regression analysis was performed to discriminate between benign and malignant lesions, and a model of a computer-assisted diagnosis was developed. This model was prospectively tested in 75 cases of ovarian tumors found at other institutions. Results: From the univariate analysis, the following parameters were selected as significant for predicting malignancy (p≤0.05): A solid or cystic mass with a large solid component or wall thickness greater than 3 mm; complex internal architecture; ascites; and bilaterality. Based on these parameters, a model of a computer-assisted diagnosis system was developed with the logistic regression analysis. To distinguish benign from malignant lesions, the maximum cut-off point was obtained between 0.47 and 0.51. In a prospective application of this model, 87% of the lesions were accurately identified as benign or malignant. (orig.)
Soil colour and spectral analysis employing linear regression models I. Effect of organic matter
Moustakas N.K.
2004-03-01
This work investigates whether soil reflectance spectral analysis, which is employed to calculate the colour characteristics (hue, value, chroma) of soil, can be carried out using linear regression models, so that comparison of colour characteristics subsequently becomes possible and can be statistically documented. To this end the colour of soil samples was calculated through spectrum reflectance in the visible region of dry smooth-rubbed soil samples smaller than 250 mm. The colour parameters of the CIE system assessed by analysis of the spectrum reflectance were converted into Munsell colour system characteristics. Regression in accordance with the piecewise linear model was then applied to the spectrum data. The processing indicated that this model is capable of making satisfactory predictions, above all of the value and secondarily of the chroma of the soil samples. Statistically significant differences in the colour characteristics of horizons of the same profile were detected through the application of the nested model. These differences cannot be detected using the tables of the Munsell colour system. Finally, in each region of the spectrum, qualitative analysis of the effect of the organic matter on the soil colour characteristics was performed, demonstrating its active role in determining the readings for value and chroma.
A quantitative structure-property relationship (QSPR) study was performed to develop models that relate the structures of 150 drug organic compounds to their n-octanol-water partition coefficients (log Po/w). Molecular descriptors were derived solely from the 3D structures of the drug molecules. A genetic algorithm was also applied as a variable selection tool in the QSPR analysis. The models were constructed using 110 molecules as the training set, and predictive ability was tested using 40 compounds. Modeling of log Po/w of these compounds as a function of the theoretically derived descriptors was established by multiple linear regression (MLR). Four descriptors for these compounds, molecular volume (MV) (geometrical), hydrophilic-lipophilic balance (HLB) (constitutional), hydrogen bond forming ability (HB) (electronic) and polar surface area (PSA) (electrostatic), are taken as inputs for the model. The use of descriptors calculated only from molecular structure eliminates the need for experimental determination of properties for use in the correlation and allows for the estimation of log Po/w for molecules not yet synthesized. Application of the developed model to a testing set of 40 drug organic compounds demonstrates that the model is reliable, with good predictive accuracy and a simple formulation. The prediction results are in good agreement with the experimental values. The root mean square error of prediction (RMSEP) and squared correlation coefficient (R2) for the MLR model were 0.22 and 0.99 for the prediction set log Po/w.
Selenium Exposure and Cancer Risk: an Updated Meta-analysis and Meta-regression
Cai, Xianlei; Wang, Chen; Yu, Wanqi; Fan, Wenjie; Wang, Shan; Shen, Ning; Wu, Pengcheng; Li, Xiuyang; Wang, Fudi
2016-01-01
The objective of this study was to investigate the associations between selenium exposure and cancer risk. We identified 69 studies and applied meta-analysis, meta-regression and dose-response analysis to obtain available evidence. The results indicated that high selenium exposure had a protective effect on cancer risk (pooled OR = 0.78; 95%CI: 0.73-0.83). The results of linear and nonlinear dose-response analysis indicated that high serum/plasma selenium and toenail selenium had the efficacy on cancer prevention. However, we did not find a protective efficacy of selenium supplement. High selenium exposure may have different effects on specific types of cancer. It decreased the risk of breast cancer, lung cancer, esophageal cancer, gastric cancer, and prostate cancer, but it was not associated with colorectal cancer, bladder cancer, and skin cancer. PMID:26786590
The records of three earthquakes which had induced significant earthquake response to the piping system were obtained with the earthquake observation system. In the present paper, first, the eigenvalue analysis results for the natural piping system based on the piping support (boundary) conditions are described and second, the frequency and the damping factor evaluation results for each vibrational mode are described. In the present study, the Auto Regressive (AR) analysis method is used in the evaluation of natural frequencies and damping factors. The AR analysis applied here has a capability of direct evaluation of natural frequencies and damping factors from earthquake records observed on a piping system without any information on the input motions to the system. (orig./HP)
Samuel Ribeiro Figueiredo
2008-12-01
Logistic nominal regressions establish mathematical relations between continuous or discrete independent variables and discrete dependent variables. Their potential to predict the occurrence and distribution of soil classes in the region of Ibirubá and Quinze de Novembro, RS, Brazil, was evaluated. Using a digital elevation model (DEM) with 90 m resolution, several topographic variables (elevation, slope, and curvature) and hydrographic variables (distance to rivers, flow length, topographical wetness index, and stream power index) were calculated. Multiple logistic regressions were established between the soil classes mapped on the basis of a traditional survey at a scale of 1:80,000 and the land variables calculated using the DEM. The regressions were used to calculate the probability of occurrence of each soil class. The final estimated soil map was drawn by assigning the soil class with the highest probability of occurrence to each cell. The general accuracy was evaluated at 58 % and the Kappa coefficient at 38 % in a comparison of the original soil map with the map estimated at the original scale. A legend simplification had little effect on the general accuracy of the map (general accuracy of 61 % and Kappa coefficient of 39 %). It was concluded that multiple logistic regressions have predictive potential as a tool for supervised soil mapping.
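The two map-comparison measures reported in this abstract, general accuracy and Cohen's Kappa, together with the rule of assigning each cell its most probable class, can be sketched as follows. The class labels and probabilities are hypothetical placeholders, not values from the study.

```python
from collections import Counter

def most_probable_class(probabilities):
    """Return the class label with the highest predicted probability."""
    return max(probabilities, key=probabilities.get)

def overall_accuracy(reference, predicted):
    """Fraction of cells where the estimated map agrees with the reference map."""
    agree = sum(r == p for r, p in zip(reference, predicted))
    return agree / len(reference)

def cohen_kappa(reference, predicted):
    """Agreement corrected for the agreement expected by chance."""
    n = len(reference)
    p_observed = overall_accuracy(reference, predicted)
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    p_chance = sum(ref_counts[c] * pred_counts[c] for c in ref_counts) / (n * n)
    return (p_observed - p_chance) / (1.0 - p_chance)

# Hypothetical four-cell example with two soil classes "A" and "B".
reference = ["A", "A", "B", "B"]
predicted = ["A", "B", "B", "B"]
print(overall_accuracy(reference, predicted))  # 0.75
print(cohen_kappa(reference, predicted))       # 0.5
print(most_probable_class({"A": 0.2, "B": 0.7, "C": 0.1}))  # B
```

Kappa is lower than the raw accuracy because part of the observed agreement would be expected by chance alone, which matches the pattern of the figures reported above.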
Mackley, Rob D.; Spane, Frank A.; Pulsipher, Trenton C.; Allwardt, Craig H.
2010-09-01
A software tool was created in Fiscal Year 2010 (FY10) that enables multiple-regression correction of well water levels for river-stage effects. This task was conducted as part of the Remediation Science and Technology project of CH2MHILL Plateau Remediation Company (CHPRC). This document contains an overview of the correction methodology and a user’s manual for Multiple Regression in Excel (MRCX) v.1.1. It also contains a step-by-step tutorial that shows users how to use MRCX to correct river effects in two different wells. This report is accompanied by an enclosed CD that contains the MRCX installer application and files used in the tutorial exercises.
Ibrahim Fayad
2014-11-01
Estimating forest canopy height from large-footprint satellite LiDAR waveforms is challenging given the complex interaction between LiDAR waveforms, terrain, and vegetation, especially in dense tropical and equatorial forests. In this study, canopy height in French Guiana was estimated using multiple linear regression models and the Random Forest technique (RF). This analysis was either based on LiDAR waveform metrics extracted from the GLAS (Geoscience Laser Altimeter System) spaceborne LiDAR data and terrain information derived from the SRTM (Shuttle Radar Topography Mission) DEM (Digital Elevation Model) or on Principal Component Analysis (PCA) of GLAS waveforms. Results show that the best statistical model for estimating forest height based on waveform metrics and digital elevation data is a linear regression of waveform extent, trailing edge extent, and terrain index (RMSE of 3.7 m). For the PCA based models, better canopy height estimation results were observed using a regression model that incorporated both the first 13 principal components (PCs) and the waveform extent (RMSE = 3.8 m). Random Forest regressions revealed that the best configuration for canopy height estimation used all the following metrics: waveform extent, leading edge, trailing edge, and terrain index (RMSE = 3.4 m). Waveform extent was the variable that best explained canopy height, with an importance factor almost three times higher than those for the other three metrics (leading edge, trailing edge, and terrain index). Furthermore, the Random Forest regression incorporating the first 13 PCs and the waveform extent had a slightly improved canopy height estimation in comparison to the linear model, with an RMSE of 3.6 m. In conclusion, multiple linear regressions and RF regressions provided canopy height estimations with similar precision using either LiDAR metrics or PCs.
However, a regression model (linear regression or RF) based on the PCA of waveform samples with waveform extent information is an interesting alternative for canopy height estimation as it does not require several metrics that are difficult to derive from GLAS waveforms in dense forests, such as those in French Guiana.
Non-invasively reconstructing the transmembrane potentials (TMPs) from body surface potentials (BSPs) constitutes one form of the inverse ECG problem that can be treated as a regression problem with multiple inputs and multiple outputs, and which can be solved using the support vector regression (SVR) method. In developing an effective SVR model, feature extraction is an important task for pre-processing the original input data. This paper proposes the application of principal component analysis (PCA) and kernel principal component analysis (KPCA) to the SVR method for feature extraction. Also, the genetic algorithm and simplex optimization method are invoked to determine the hyper-parameters of the SVR. Based on the realistic heart-torso model, the equivalent double-layer source method is applied to generate the data set for training and testing the SVR model. The experimental results show that the SVR method with feature extraction (PCA-SVR and KPCA-SVR) can perform better than that without feature extraction (single SVR) in terms of the reconstruction of the TMPs on epi- and endocardial surfaces. Moreover, compared with the PCA-SVR, the KPCA-SVR features good approximation and generalization ability when reconstructing the TMPs.
Observing the equipment and piping systems installed in an operating nuclear power plant during earthquakes is very important for evaluating and confirming the adequacy and the safety margin expected at the design stage. By analyzing observed earthquake records, valuable data on the behavior of these systems during earthquakes can be obtained, and information about their aseismic design parameters can be extracted. From these viewpoints, an earthquake observation system was installed in a reactor building in an operating plant. Up to now, the records of three earthquakes have been obtained with this system. In this paper, an example of the analysis of earthquake records is shown; the main purpose of the analysis was the evaluation of the vibration mode, natural frequency and damping factor of this piping system. Prior to the earthquake record analysis, the eigenvalue analysis for this piping system was performed. Auto-regressive analysis was applied to the observed acceleration time history which was obtained with a piping system installed in an operating BWR. The results of the earthquake record analysis agreed well with the results of the eigenvalue analysis. (Kako, I.)
Czekaj, Tomasz Gerard; Henningsen, Arne
The estimation of the technical efficiency comprises a vast literature in the field of applied production economics. There are two predominant approaches: the non-parametric and non-stochastic Data Envelopment Analysis (DEA) and the parametric Stochastic Frontier Analysis (SFA). The DEA is...... specifying an unsuitable functional form and thus, model misspecification and biased parameter estimates. Given these problems of the DEA and the SFA, Fan, Li and Weersink (1996) proposed a semi-parametric stochastic frontier model that estimates the production function (frontier) by non......), Kumbhakar et al. (2007), and Henningsen and Kumbhakar (2009). The aim of this paper and its main contribution to the existing literature is the estimation semi-parametric stochastic frontier models using a different non-parametric estimation technique: spline regression (Ma et al. 2011). We apply this...
Within-session analysis of the extinction of pavlovian fear-conditioning using robust regression
Vargas-Irwin, Cristina
2010-06-01
Traditionally, the analysis of extinction data in fear conditioning experiments has involved the use of standard linear models, mostly ANOVA of between-group differences of subjects that have undergone different extinction protocols, pharmacological manipulations or some other treatment. Although some studies report individual differences in quantities such as suppression rates or freezing percentages, these differences are not included in the statistical modeling. Within-subject response patterns are then averaged using coarse-grain time windows which can overlook these individual performance dynamics. Here we illustrate an alternative analytical procedure consisting of two steps: the estimation of a trend for within-session data and the analysis of group differences in trend as the main outcome. This procedure is tested on real fear-conditioning extinction data, comparing trend estimates via Ordinary Least Squares (OLS) and robust Least Median of Squares (LMS) regression estimates, as well as comparing between-group differences and analyzing mean freezing percentage versus LMS slopes as outcomes.
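The contrast between OLS and Least Median of Squares trend estimates can be sketched on a toy series. The LMS fit below is a simple approximation that searches only over lines passing through pairs of observations; it is an assumption made for brevity, not the procedure used by the author, and the freezing percentages are hypothetical.

```python
from itertools import combinations
from statistics import median

def ols_line(x, y):
    """Ordinary least squares slope and intercept for one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def lms_line(x, y):
    """Approximate LMS: among lines through two data points, keep the one
    with the smallest median squared residual."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # vertical line, slope undefined
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        crit = median((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
        if best is None or crit < best[0]:
            best = (crit, slope, intercept)
    return best[1], best[2]

# Hypothetical within-session data: freezing falls 5 points per trial,
# except for one aberrant final trial.
trials = [1, 2, 3, 4, 5, 6, 7, 8]
freezing = [75, 70, 65, 60, 55, 50, 45, 90]

ols_slope, _ = ols_line(trials, freezing)
lms_slope, _ = lms_line(trials, freezing)
print(ols_slope, lms_slope)
```

In this toy series the single aberrant trial pulls the OLS slope most of the way toward zero, while the LMS slope recovers the decline seen in the other seven trials.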
Statistical learning method in regression analysis of simulated positron spectral data
Positron lifetime spectroscopy is a non-destructive tool for detection of radiation induced defects in nuclear reactor materials. This work concerns the applicability of the support vector machines (SVM) method for the input data compression in the neural network analysis of positron lifetime spectra. It has been demonstrated that the SVM technique can be successfully applied to regression analysis of positron spectra. A substantial data compression of about 50 % and 8 % of the whole training set with two and three spectral components respectively has been achieved, together with a high accuracy of the spectra approximation. However, some parameters in the SVM approach, such as the insensitivity zone ε and the penalty parameter C, have to be chosen carefully to obtain a good performance. (author)
Păniţă, Ovidiu
2015-09-01
In 2012-2014, an assortment of 25 isogenic lines of wheat (Triticum aestivum ssp. vulgare) was tested at Banu-Maracine DRS. The characters analyzed were the number of seeds/spike, seed weight/spike (g), number of spikes/m2, weight of a thousand seeds (WTS) (g) and number of emerged plants/m2. Statistical processing of the recorded data identified a number of links between these characters, and regression models were also identified between some of the studied characters. Based on component analysis, the number of seeds/spike and seed weight/spike are the components that account for more than 88% of the variance; a total of seven genotypes had positive scores for both factors.
Roseane Cavalcanti dos Santos
2012-08-01
The objective of this work was to estimate the stability and adaptability of pod and seed yield in runner peanut genotypes based on the nonlinear regression and AMMI analysis. Yield data from 11 trials, distributed in six environments and three harvests, carried out in the Northeast region of Brazil during the rainy season were used. Significant effects of genotypes (G), environments (E), and GE interactions were detected in the analysis, indicating different behaviors among genotypes in favorable and unfavorable environmental conditions. The genotypes BRS Pérola Branca and LViPE‑06 are more stable and adapted to the semiarid environment, whereas LGoPE‑06 is a promising material for pod production, despite being highly dependent on favorable environments.
Multiple comparison analysis testing in ANOVA
McHugh, Mary L.
2011-01-01
The Analysis of Variance (ANOVA) test has long been an important tool for researchers conducting studies on multiple experimental groups and one or more control groups. However, ANOVA cannot provide detailed information on differences among the various study groups, or on complex combinations of study groups. To fully understand group differences in an ANOVA, researchers must conduct tests of the differences between particular pairs of experimental and control groups. Tests conducted on subse...
The Impact of Outliers on Net-Benefit Regression Model in Cost-Effectiveness Analysis.
Wen, Yu-Wen; Tsai, Yi-Wen; Wu, David Bin-Chia; Chen, Pei-Fen
2013-01-01
Ordinary least squares (OLS) regression has been widely used to analyze patient-level data in cost-effectiveness analysis (CEA). However, the estimates, inference and decision making in the economic evaluation based on OLS estimation may be biased by the presence of outliers. In contrast, robust estimation is less affected and provides results that are resistant to outliers. The objective of this study is to explore the impact of outliers on net-benefit regression (NBR) in CEA using OLS and to propose a potential solution by using robust estimations, i.e. Huber M-estimation, Hampel M-estimation, Tukey's bisquare M-estimation, MM-estimation and least trimmed squares estimation. Simulations under different outlier-generating scenarios and an empirical example were used to obtain the regression estimates of NBR by OLS and five robust estimations. Empirical size and empirical power of both OLS and robust estimations were then compared in the context of hypothesis testing. Simulations showed that the five robust approaches, compared with OLS estimation, led to lower empirical sizes and achieved higher empirical powers in testing cost-effectiveness. Using a real example of antiplatelet therapy, the estimated incremental net-benefit by OLS estimation was lower than those by robust approaches because of outliers in cost data. Robust estimations demonstrated a higher probability of cost-effectiveness compared to OLS estimation. The presence of outliers can bias the results of NBR and its interpretations. It is recommended that robust estimation be used in NBR to avoid such biased decision making. PMID:23840378
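A minimal sketch of why a single cost outlier can distort net-benefit regression is given below. It assumes the usual definition of individual net benefit, NB = λ × effect − cost, regressed on a 0/1 treatment indicator, so that the OLS slope equals the difference in group means. The willingness-to-pay value and patient data are hypothetical, and the median-based comparison is only a crude stand-in for illustration; it is not one of the five robust estimators evaluated in the study.

```python
from statistics import mean, median

def net_benefit(effect, cost, wtp):
    """Individual net benefit at willingness-to-pay `wtp` (hypothetical value)."""
    return wtp * effect - cost

def ols_slope_on_dummy(y, treated):
    """OLS slope of y on a 0/1 indicator; equals the difference in group means."""
    mt, my = mean(treated), mean(y)
    sxy = sum((t - mt) * (v - my) for t, v in zip(treated, y))
    sxx = sum((t - mt) ** 2 for t in treated)
    return sxy / sxx

def median_difference(y, treated):
    """Crude robust comparator: difference in group medians (illustration only)."""
    y1 = [v for v, t in zip(y, treated) if t == 1]
    y0 = [v for v, t in zip(y, treated) if t == 0]
    return median(y1) - median(y0)

wtp = 1000.0
treated = [0, 0, 0, 0, 1, 1, 1, 1]
effects = [0.5, 0.5, 0.5, 0.5, 0.7, 0.7, 0.7, 0.7]
costs_clean = [100, 100, 100, 100, 200, 200, 200, 200]
costs_outlier = [100, 100, 100, 100, 200, 200, 200, 2200]  # one extreme cost

nb_clean = [net_benefit(e, c, wtp) for e, c in zip(effects, costs_clean)]
nb_outlier = [net_benefit(e, c, wtp) for e, c in zip(effects, costs_outlier)]

inb_clean = ols_slope_on_dummy(nb_clean, treated)
inb_outlier = ols_slope_on_dummy(nb_outlier, treated)
inb_median = median_difference(nb_outlier, treated)
print(inb_clean, inb_outlier, inb_median)
```

In this toy example one extreme cost flips the sign of the OLS incremental net benefit, while the median-based comparison is unchanged.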
Modelling and analysis of turbulent datasets using Auto Regressive Moving Average processes
Faranda, Davide, E-mail: davide.faranda@cea.fr; Dubrulle, Bérengère; Daviaud, François [Laboratoire SPHYNX, Service de Physique de l'État Condensé, DSM, CEA Saclay, CNRS URA 2464, 91191 Gif-sur-Yvette (France)]; Pons, Flavio Maria Emanuele [Dipartimento di Scienze Statistiche, Università di Bologna, Via delle Belle Arti 41, 40126 Bologna (Italy)]; Saint-Michel, Brice [Institut de Recherche sur les Phénomènes Hors Equilibre, Technopole de Chateau Gombert, 49 rue Frédéric Joliot Curie, B.P. 146, 13 384 Marseille (France)]; Herbert, Éric [Université Paris Diderot - LIED - UMR 8236, Laboratoire Interdisciplinaire des Énergies de Demain, Paris (France)]; Cortet, Pierre-Philippe [Laboratoire FAST, CNRS, Université Paris-Sud (France)]
2014-10-15
We introduce a novel way to extract information from turbulent datasets by applying an Auto Regressive Moving Average (ARMA) statistical analysis. Such analysis goes well beyond the analysis of the mean flow and of the fluctuations and links the behavior of the recorded time series to a discrete version of a stochastic differential equation which is able to describe the correlation structure in the dataset. We introduce a new index Υ that measures the difference between the resulting analysis and the Obukhov model of turbulence, the simplest stochastic model reproducing both Richardson law and the Kolmogorov spectrum. We test the method on datasets measured in a von Kármán swirling flow experiment. We found that the ARMA analysis is well correlated with spatial structures of the flow, and can discriminate between two different flows with comparable mean velocities, obtained by changing the forcing. Moreover, we show that the Υ is highest in regions where shear layer vortices are present, thereby establishing a link between deviations from the Kolmogorov model and coherent structures. These deviations are consistent with the ones observed by computing the Hurst exponents for the same time series. We show that some salient features of the analysis are preserved when considering global instead of local observables. Finally, we analyze flow configurations with multistability features where the ARMA technique is efficient in discriminating different stability branches of the system.
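As a minimal illustration of linking a time series to a discrete stochastic recurrence, the sketch below simulates the simplest member of the ARMA family, an AR(1) process x_t = φ x_{t-1} + ε_t, and recovers φ from the lag-1 sample autocorrelation (the Yule-Walker estimate for AR(1)). The coefficient, series length and seed are arbitrary assumptions; this is not the authors' fitting procedure and does not compute their index.

```python
import random

def simulate_ar1(phi, n, seed=0):
    """Simulate x_t = phi * x_{t-1} + eps_t with standard normal noise."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x

def autocorrelation(x, lag):
    """Sample autocorrelation of the series x at the given lag."""
    n = len(x)
    m = sum(x) / n
    c0 = sum((v - m) ** 2 for v in x)
    ck = sum((x[t] - m) * (x[t + lag] - m) for t in range(n - lag))
    return ck / c0

series = simulate_ar1(phi=0.7, n=5000)
phi_hat = autocorrelation(series, 1)  # Yule-Walker estimate of phi for AR(1)
print(phi_hat)
```

For higher-order ARMA(p, q) models the coefficients are usually obtained by maximum likelihood, and the orders are typically chosen with an information criterion.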
Elvio Giasson
2006-06-01
Soil surveys are necessary sources of information for land use planning, but they are not always available. This study proposes the use of multiple logistic regressions for predicting the occurrence of soil types based on reference areas. From a digitized soil map and terrain parameters derived from the digital elevation model in the ArcView environment, several sets of multiple logistic regressions were defined using the statistical software Minitab, establishing relationships between explanatory terrain variables and soil types, using either the original legend or a simplified legend, and with or without stratification of the study area by drainage classes. Terrain parameters, such as elevation, distance to stream, flow accumulation, and topographic wetness index, were the variables that best explained soil distribution. Stratification by drainage classes did not have a significant effect. Simplification of the original legend increased the accuracy of the method in predicting soil distribution.
Analysis of dynamic multiplicity fluctuations at PHOBOS
This paper presents the analysis of the dynamic fluctuations in the inclusive charged particle multiplicity measured by PHOBOS for Au+Au collisions at $\sqrt{s_{NN}} = 200$ GeV within the pseudo-rapidity range of -3 < η < 3. First the definition of the fluctuation observables used in this analysis is presented, together with a discussion of their physics meaning. Then the procedure for the extraction of dynamic fluctuations is described. Some preliminary results are included to illustrate the correlation features of the fluctuation observable. New dynamic fluctuation results will be available in a later publication.
Simunovic, K.; Simunovic, G.; Saric, T.
2013-10-01
The surface roughness is a very significant indicator of surface quality. It represents an essential exploitation requirement and influences technological time and costs, i.e. productivity. For that reason, the main objective of this paper is to analyse the influence of face milling cutting parameters (number of revolution, feed rate and depth of cut) on the surface roughness of aluminium alloy. Hence, a statistical (regression) model has been developed to predict the surface roughness by using the methodology of experimental design. Central composite design is chosen for fitting response surface. Also, numerical optimization considering two goals simultaneously (minimum propagation of error and minimum roughness) was performed throughout the experimental region. In this way, the settings of cutting parameters causing the minimum variability in response were determined for the estimated variations of the significant regression factors.
Fast Ridge Regression with Randomized Principal Component Analysis and Gradient Descent
Lu, Yichao; Foster, Dean P.
2014-01-01
We propose a new two stage algorithm LING for large scale regression problems. LING has the same risk as the well known Ridge Regression under the fixed design setting and can be computed much faster. Our experiments have shown that LING performs well in terms of both prediction accuracy and computational efficiency compared with other large scale regression algorithms like Gradient Descent, Stochastic Gradient Descent and Principal Component Regression on both simulated and real datasets.
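For orientation, the ridge regression baseline mentioned in this abstract can be sketched in a few lines. This is not the LING algorithm; it is a minimal single-predictor, no-intercept illustration showing that plain gradient descent on the ridge objective approaches the closed-form ridge estimate. The function names, step size, iteration count and data are assumptions made for the example.

```python
def ridge_closed_form(x, y, lam):
    """Single-predictor ridge estimate without intercept: sum(x*y) / (sum(x*x) + lam)."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + lam)


def ridge_gradient_descent(x, y, lam, step=0.01, iters=2000):
    """Minimise 0.5*sum((y - b*x)^2) + 0.5*lam*b^2 by plain gradient descent."""
    b = 0.0
    for _ in range(iters):
        # Gradient of the objective with respect to b.
        grad = -sum(xi * (yi - b * xi) for xi, yi in zip(x, y)) + lam * b
        b -= step * grad
    return b


# Hypothetical data, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]
b_exact = ridge_closed_form(x, y, lam=1.0)
b_iterative = ridge_gradient_descent(x, y, lam=1.0)
```

With this step size the two estimates agree to many decimal places; on large problems the iterative route avoids forming and inverting the full Gram matrix, which is the kind of trade-off the abstract refers to.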
The Use of Logistic Regression in the Analysis of Data Concerning Good Medical Practice
Damon MN
2002-06-01
Logistic regression is one of the commonly used models of explanatory multivariate analysis in epidemiology. Its use, which has become easier with modern statistical software, allows researchers to control confounding bias. It measures the odds ratio, a quantification of the association between a given occurrence, represented by a dichotomous variable, and factors likely to influence it, represented by explanatory variables. The choice of explanatory variables integrated into the model is based on previous information on the study subject and is aimed at avoiding the confounding factors that have already been identified. The authors explain the fundamental principles of logistic regression and the steps involved in its application. By using two examples (the quality of the follow-up care given to diabetics and in-hospital mortality after acute myocardial infarction), they demonstrate the value this statistical tool can have in studies performed by the medical service of the national health care fund, particularly in studies designed to evaluate professional practice.
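As a small illustration of the odds ratio discussed in this abstract, the sketch below computes it from a 2x2 table and converts it to the corresponding log-odds coefficient. For a logistic regression with a single binary explanatory variable, the exponentiated coefficient equals this 2x2 odds ratio. The counts and function names are hypothetical and are not taken from the paper.

```python
import math


def odds_ratio(exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases):
    """Odds ratio from a 2x2 table: odds among the exposed over odds among the unexposed."""
    return (exposed_cases / exposed_noncases) / (unexposed_cases / unexposed_noncases)


def logit_coefficient(or_value):
    """Log-odds coefficient corresponding to an odds ratio (beta = ln OR)."""
    return math.log(or_value)


# Hypothetical counts, invented for illustration only.
or_value = odds_ratio(30, 70, 10, 90)
beta = logit_coefficient(or_value)
```

An odds ratio above 1 (here about 3.86) indicates that the outcome is more likely among the exposed group; adjusting for confounders requires fitting the full multivariable model with statistical software.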
Rubio, Francisco J.
2016-02-09
We study Bayesian linear regression models with skew-symmetric scale mixtures of normal error distributions. These kinds of models can be used to capture departures from the usual assumption of normality of the errors in terms of heavy tails and asymmetry. We propose a general noninformative prior structure for these regression models and show that the corresponding posterior distribution is proper under mild conditions. We extend these propriety results to cases where the response variables are censored. The latter scenario is of interest in the context of accelerated failure time models, which are relevant in survival analysis. We present a simulation study that demonstrates good frequentist properties of the posterior credible intervals associated with the proposed priors. This study also sheds some light on the trade-off between increased model flexibility and the risk of over-fitting. We illustrate the performance of the proposed models with real data. Although we focus on models with univariate response variables, we also present some extensions to the multivariate case in the Supporting Information.
Improved Regression Analysis of Temperature-Dependent Strain-Gage Balance Calibration Data
Ulbrich, N.
2015-01-01
An improved approach is discussed that may be used to directly include first and second order temperature effects in the load prediction algorithm of a wind tunnel strain-gage balance. The improved approach was designed for the Iterative Method that fits strain-gage outputs as a function of calibration loads and uses a load iteration scheme during the wind tunnel test to predict loads from measured gage outputs. The improved approach assumes that the strain-gage balance is at a constant uniform temperature when it is calibrated and used. First, the method introduces a new independent variable for the regression analysis of the balance calibration data. The new variable is designed as the difference between the uniform temperature of the balance and a global reference temperature. This reference temperature should be the primary calibration temperature of the balance so that, if needed, a tare load iteration can be performed. Then, two temperature-dependent terms are included in the regression models of the gage outputs. They are the temperature difference itself and the square of the temperature difference. Simulated temperature-dependent data obtained from Triumph Aerospace's 2013 calibration of NASA's ARC-30K five component semi-span balance is used to illustrate the application of the improved approach.
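The two temperature-dependent regressors described in this abstract can be illustrated as follows. This is only a sketch of how one row of a regression matrix might be assembled; the function names, the choice of load terms and the numerical values are assumptions and do not reproduce the paper's actual regression model.

```python
def temperature_terms(balance_temp, reference_temp):
    """Return the two extra regressors: the temperature difference and its square."""
    delta_t = balance_temp - reference_temp
    return delta_t, delta_t ** 2


def regressor_row(loads, balance_temp, reference_temp):
    """One row of the regression matrix: intercept, load terms, then dT and dT^2."""
    delta_t, delta_t_sq = temperature_terms(balance_temp, reference_temp)
    return [1.0, *loads, delta_t, delta_t_sq]


# Hypothetical loads and temperatures, invented for illustration only.
row = regressor_row([100.0, -25.0], balance_temp=310.0, reference_temp=294.0)
```

Because the new variable is a difference from the reference temperature, both extra terms vanish when the balance is at its primary calibration temperature, so the fitted model reduces to the temperature-independent one in that case.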
Significance Test Algorithm of Crowd Flow in Public Fitness Areas Based On Regression Analysis
Guanghong Liu
2013-01-01
An increase in the crowd entering and leaving fitness places indirectly reflects an increase in the number of people exercising. Fitness places often have cameras installed, and their daily video recordings can be used as raw data on the health situation in the area. To count people more reliably in densely populated fitness places, video-based methods should aim for greater accuracy while remaining easy to implement. In places with higher population density, mutual occlusion among targets is more prominent, which makes it difficult to detect and track individuals in a crowded area and harder to acquire each body's movement trajectory precisely. Based on the characteristics of the video study object (crowd flow), this study establishes a linear regression model to estimate the population flows. The study first introduces the principle of video motion segmentation and the extraction method for eight categories of image features, then discusses the principles of regression estimation and the significance test approach, and finally verifies the reasonableness of the theoretical models with data, which provides a theoretical basis for video analysis and a better technical foundation for the study of regional public fitness.
Diversity Performance Analysis on Multiple HAP Networks
Feihong Dong
2015-06-01
One of the main design challenges in wireless sensor networks (WSNs) is achieving a high-data-rate transmission for individual sensor devices. The high altitude platform (HAP) is an important communication relay platform for WSNs and next-generation wireless networks. Multiple-input multiple-output (MIMO) techniques provide the diversity and multiplexing gain, which can improve the network performance effectively. In this paper, a virtual MIMO (V-MIMO) model is proposed by networking multiple HAPs with the concept of multiple assets in view (MAV). In a shadowed Rician fading channel, the diversity performance is investigated. The probability density function (PDF) and cumulative distribution function (CDF) of the received signal-to-noise ratio (SNR) are derived. In addition, the average symbol error rate (ASER) with BPSK and QPSK is given for the V-MIMO model. The system capacity is studied for both perfect channel state information (CSI) and unknown CSI individually. The ergodic capacity with various SNR and Rician factors for different network configurations is also analyzed. The simulation results validate the effectiveness of the performance analysis. It is shown that the performance of the HAPs network in WSNs can be significantly improved by utilizing the MAV to achieve overlapping coverage, with the help of the V-MIMO techniques.
Huang, Dong; Cabral, Ricardo; De la Torre, Fernando
2016-02-01
Discriminative methods (e.g., kernel regression, SVM) have been extensively used to solve problems such as object recognition, image alignment and pose estimation from images. These methods typically map image features (X) to continuous (e.g., pose) or discrete (e.g., object category) values. A major drawback of existing discriminative methods is that samples are directly projected onto a subspace and hence fail to account for outliers common in realistic training sets due to occlusion, specular reflections or noise. It is important to notice that existing discriminative approaches assume the input variables X to be noise free. Thus, discriminative methods experience significant performance degradation when gross outliers are present. Despite its obvious importance, the problem of robust discriminative learning has been relatively unexplored in computer vision. This paper develops the theory of robust regression (RR) and presents an effective convex approach that uses recent advances on rank minimization. The framework applies to a variety of problems in computer vision including robust linear discriminant analysis, regression with missing data, and multi-label classification. Several synthetic and real examples with applications to head pose estimation from images, image and video classification and facial attribute classification with missing data are used to illustrate the benefits of RR. PMID:26761740
Kyriakides, Leonidas; Luyten, Hans
2009-01-01
This article reports the results of a study in which the basic regression-discontinuity approach to assess the effect of 1 year of schooling is extended. The data analysis covers the 6 grades of secondary education in Cyprus and thus assesses the contribution of secondary education to the cognitive development of 12- to 18-year-old students. A…
Quantitative analysis of multiple isotope autoradiography
Recently, in nuclear medicine, many new gamma- and positron-emitting radiopharmaceuticals have been introduced, and their distribution and metabolism need to be evaluated. The use of whole body autoradiography (ARG) provides the high spatial resolution required to determine radiopharmaceutical biodistribution in small animals. The quantitative digital film analysis system using videodensitometry permits analysis of multiple isotope ARG in the same sections of the same animals. The system, the method used and an illustrative example of application of quantitative multiple isotope ARG are described. Simultaneous injections of two tracers can differentiate two physiological processes, for example, blood flow and metabolism, in the same animal, and sequential injection of two tracers can identify differences in a process in normal and diseased states, or differences in the same process sampled at two times.
Ghaedi, M; Rahimi, Mahmoud Reza; Ghaedi, A M; Tyagi, Inderjeet; Agarwal, Shilpi; Gupta, Vinod Kumar
2016-01-01
Two novel and eco-friendly adsorbents, namely tin oxide nanoparticles loaded on activated carbon (SnO2-NP-AC) and activated carbon prepared from the wood of the Pistacia atlantica tree (AC-PAW), were used for the rapid removal and fast adsorption of methyl orange (MO) from the aqueous phase. The dependency of MO removal on various influential adsorption parameters was well modeled and optimized using multiple linear regression (MLR) and least squares support vector regression (LSSVR). The optimal parameters for the LSSVR model were found to be a γ value of 0.76 and a σ² of 0.15. For the testing data set, a mean square error (MSE) of 0.0010 and a coefficient of determination (R²) of 0.976 were obtained for the LSSVR model, and an MSE of 0.0037 and an R² of 0.897 were obtained for the MLR model. The adsorption equilibrium and kinetic data were found to be well fitted by the Langmuir isotherm model and by the second-order equation and intra-particle diffusion models, respectively. A small amount of the proposed SnO2-NP-AC and AC-PAW (0.015 g and 0.08 g) is sufficient for successful rapid removal of methyl orange (>95%). The maximum adsorption capacity for SnO2-NP-AC and AC-PAW was 250 mg g⁻¹ and 125 mg g⁻¹, respectively. PMID:26414425
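The two goodness-of-fit measures quoted in this abstract, MSE and R², can be computed as in the following minimal sketch. The data are hypothetical and unrelated to the adsorption experiments; the function names are assumptions made for the example.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted values, invented for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
mse_value = mean_squared_error(observed, predicted)
r2_value = r_squared(observed, predicted)
```

A lower MSE together with a higher R² on held-out data, as reported for LSSVR relative to MLR, indicates the better-predicting model.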
Deterministic Assessment of Continuous Flight Auger Construction Durations Using Regression Analysis
Hossam E. Hosny
2015-07-01
One of the primary functions of construction equipment management is to calculate the production rate of equipment, which is a major input to time estimates, cost estimates and overall project planning. Accordingly, it is crucial for stakeholders to be able to compute equipment production rates. This may be achieved using an accurate, reliable and easy tool. The objective of this research is to provide a simple model that can be used by specialists to predict the duration of a proposed Continuous Flight Auger job. The model was obtained using a prioritizing technique based on expert judgment and then multi-regression analysis based on a representative sample. The model was then validated on a selected sample of projects. The average error of the model was calculated to be about (3%-6%).
A Logistic Regression Analysis of the Contractor's Awareness Regarding Waste Management
Rawshan Ara Begum
2006-01-01
This study highlights a number of factors affecting contractor's awareness regarding construction waste management in the construction industry. The data in the present study are based on contractors registered with the Construction Industry Development Board of Malaysia. Binary logistic regression analysis is employed to explore the factors affecting the awareness. Contractor's awareness regarding waste management tends to be significantly adequate with increasing values in the factors of having a waste management plan, awareness of source reduction of waste minimisation measures, awareness of reusing and recycling of waste materials, sorting waste materials, perception of the harmfulness of construction waste to human health and willingness to pay more for improved waste collection and disposal services. The findings generated from the study could help environmental and waste management planners in their decision making for managing construction waste and reducing environmental pollution.
A Note on Penalized Regression Spline Estimation in the Secondary Analysis of Case-Control Data
Gazioglu, Suzan
2013-05-25
Primary analysis of case-control studies focuses on the relationship between disease (D) and a set of covariates of interest (Y, X). A secondary application of the case-control study, often invoked in modern genetic epidemiologic association studies, is to investigate the interrelationship between the covariates themselves. The task is complicated due to the case-control sampling, and to avoid the biased sampling that arises from the design, it is typical to use the control data only. In this paper, we develop penalized regression spline methodology that uses all the data, and improves precision of estimation compared to using only the controls. A simulation study and an empirical example are used to illustrate the methodology.
Wu, X. B.
2006-06-01
Four body-size and fourteen head-size measurements were taken from each Chinese alligator (Alligator sinensis) according to the measurements adapted from Verdade. Regression equations between body-size and head-size variables were presented to predict body size from head dimensions. The coefficients of determination of captive animals concerning body- and head-size variables can be considered extremely high, which means most of the head-size variables studied can be useful for predicting body length. The result of multivariate allometric analysis indicated that the head elongates as in most other species of crocodilians. The allometric coefficients of snout length (SL) and lower ramus (LM) were greater than those of other head variables, which was considered to be possibly correlated with fights and prey. In contrast, the allometric coefficients for the variables of the orbit (OW, OL) and postorbital cranial roof (LCR) were lower than those of other variables.
The monitoring of detailed 3-dimensional (3D) reactor core power distribution is a prerequisite in the operation of nuclear power reactors to ensure that various safety limits imposed on the LPD and DNBR are not violated during nuclear power reactor operation. The LPD and DNBR should be calculated in order to perform the two major functions of the core protection calculator system (CPCS) and the core operation limit supervisory system (COLSS). The LPD at the hottest part of a hot fuel rod, which is related to the power peaking factor (PPF, Fq), is more important than the LPD at any other position in a reactor core. The LPD needs to be estimated accurately to prevent nuclear fuel rods from melting. In this study, support vector regression (SVR) and uncertainty analysis have been applied to estimation of the reactor core power peaking factor.
The ORC (organic Rankine cycle) is an established technology for converting low temperature heat to electricity. Knowing that most of the commercially available ORCs are of the subcritical type, there is potential for improvement by implementing new cycle architectures. The cycles under consideration are: the SCORC (subcritical ORC), the TCORC (transcritical ORC) and the PEORC (partial evaporation ORC). Care is taken to develop an optimization strategy considering various boundary conditions. The analysis and comparison are based on an exergy approach. Initially 67 possible working fluids are investigated. In successive stages design constraints are added. First, only environmentally friendly working fluids are retained. Next, the turbine outlet is constrained to a superheated state. Finally, the heat carrier exit temperature is restricted and addition of a recuperator is considered. Regression models with low computational cost are provided to quickly evaluate the implications of each design. The results indicate that the PEORC clearly outperforms the TCORC by up to 25.6% in second law efficiency, while the TCORC outperforms the SCORC by up to 10.8%. For high waste heat carrier inlet temperatures the performance gain becomes small. Additionally, a high performing environmentally friendly working fluid for the TCORC is missing at low heat carrier temperatures (100 °C). - Highlights: • Thermodynamic analysis of subcritical, transcritical and partial evaporation ORC. • Regression models are provided to quickly assess design implications. • Performance gain up to 25.6% for PEORC compared to TCORC. • Performance gain up to 10.8% for TCORC compared to SCORC. • Opportunity for new low temperature environmentally friendly working fluids
Poisson regression analysis of the mortality among a cohort of World War II nuclear industry workers
A historical cohort mortality study was conducted among 28,008 white male employees who had worked for at least 1 month in Oak Ridge, Tennessee, during World War II. The workers were employed at two plants that were producing enriched uranium and a research and development laboratory. Vital status was ascertained through 1980 for 98.1% of the cohort members and death certificates were obtained for 96.8% of the 11,671 decedents. A modified version of the traditional standardized mortality ratio (SMR) analysis was used to compare the cause-specific mortality experience of the World War II workers with the U.S. white male population. An SMR and a trend statistic were computed for each cause-of-death category for the 30-year interval from 1950 to 1980. The SMR for all causes was 1.11, and there was a significant upward trend of 0.74% per year. The excess mortality was primarily due to lung cancer and diseases of the respiratory system. Poisson regression methods were used to evaluate the influence of duration of employment, facility of employment, socioeconomic status, birth year, period of follow-up, and radiation exposure on cause-specific mortality. Maximum likelihood estimates of the parameters in a main-effects model were obtained to describe the joint effects of these six factors on cause-specific mortality of the World War II workers. We show that these multivariate regression techniques provide a useful extension of conventional SMR analysis and illustrate their effective use in a large occupational cohort study
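The traditional standardized mortality ratio that this abstract builds on is the number of observed deaths divided by the number expected from reference rates. The sketch below shows that basic calculation only; it is not the modified SMR analysis or the Poisson regression model used in the study, and the strata, rates and function names are hypothetical.

```python
def expected_deaths(person_years, reference_rates):
    """Expected deaths: sum over strata of person-years times the reference death rate."""
    return sum(py * rate for py, rate in zip(person_years, reference_rates))


def smr(observed_deaths, person_years, reference_rates):
    """Standardized mortality ratio: observed deaths divided by expected deaths."""
    return observed_deaths / expected_deaths(person_years, reference_rates)


# Hypothetical age strata, invented for illustration only.
person_years = [1000.0, 2000.0]
reference_rates = [0.01, 0.02]  # deaths per person-year in the reference population
ratio = smr(55, person_years, reference_rates)
```

An SMR above 1 means more deaths were observed in the cohort than the reference population's rates would predict.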
Ridge Regression Analysis on the Influential Factors of FDI in Jiangsu Province
Yang CAO
2008-08-01
As a developed area on China's eastern coast, Jiangsu Province has used foreign capital not only to promote rapid economic growth, enhance regional comprehensive competitiveness and promote employment, but also to create a well-known new mode of economic development called Sunan. Based on a qualitative analysis of the factors affecting the inflow of foreign capital into Jiangsu, the paper establishes a mathematical model relating FDI to major economic indicators in Jiangsu, in accordance with the province's own characteristics. Then, using 1992-2006 time-series data, the paper applies ridge regression to analyse the factors influencing FDI in Jiangsu.
Key words: foreign direct investment, ridge regression, factors, Jiangsu
The Analysis of Internet Addiction Scale Using Multivariate Adaptive Regression Splines
M Kayri
2010-12-01
Background: Determining the real effects on internet dependency with an unbiased and robust statistical method is crucial. MARS is a new non-parametric method used in the literature for parameter estimation in cause-and-effect research. MARS can both obtain legible model curves and make unbiased parametric predictions. Methods: In order to examine the performance of MARS, MARS findings are compared to Classification and Regression Tree (C&RT) findings, which are considered in the literature to be efficient in revealing correlations between variables. The data set for the study is taken from "The Internet Addiction Scale" (IAS), which attempts to reveal addiction levels of individuals. The population of the study consists of 754 secondary school students (301 female and 443 male students, with 10 missing data). The MARS 2.0 trial version was used for the MARS analysis, and the C&RT analysis was done in SPSS. Results: MARS obtained six basis functions for the model. From these six functions, the regression equation of the model was found. MARS showed that the predictors daily Internet-use time on average, the purpose of Internet use, grade of students and occupations of mothers had a significant effect on the predicted variable (P < 0.05). In this comparative study, MARS obtained different findings from C&RT in dependency level prediction. Conclusion: This study observed that MARS reveals the extent to which a variable considered significant changes the character of the model.
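The basis functions that a MARS model combines are hinge functions of the form max(0, x - knot) and max(0, knot - x). The following sketch evaluates a prediction as an intercept plus a weighted sum of such functions. The knot, coefficients and function names are hypothetical and do not come from the study or from the MARS 2.0 software.

```python
def hinge(x, knot):
    """Right hinge: zero up to the knot, then linear in (x - knot)."""
    return max(0.0, x - knot)


def mirrored_hinge(x, knot):
    """Left hinge: linear in (knot - x) below the knot, zero above it."""
    return max(0.0, knot - x)


def mars_predict(x, intercept, terms):
    """Evaluate intercept + sum of coefficient * basis(x, knot) over all terms."""
    return intercept + sum(coef * basis(x, knot) for coef, basis, knot in terms)


# Hypothetical model with one knot at x = 3, invented for illustration only.
terms = [(2.0, hinge, 3.0), (-0.5, mirrored_hinge, 3.0)]
pred_low = mars_predict(1.0, 10.0, terms)   # 10 + 2*0 - 0.5*2 = 9.0
pred_high = mars_predict(5.0, 10.0, terms)  # 10 + 2*2 - 0.5*0 = 14.0
```

Each pair of hinges lets the fitted slope change at the knot, which is how a MARS model shows where a predictor's effect changes character.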
A Skew-t space-varying regression model for the spectral analysis of resting state brain activity.
Ismail, Salimah; Sun, Wenqi; Nathoo, Farouk S; Babul, Arif; Moiseev, Alexader; Beg, Mirza Faisal; Virji-Babul, Naznin
2013-08-01
It is known that in many neurological disorders such as Down syndrome, main brain rhythms shift their frequencies slightly, and characterizing the spatial distribution of these shifts is of interest. This article reports on the development of a Skew-t mixed model for the spatial analysis of resting state brain activity in healthy controls and individuals with Down syndrome. Time series of oscillatory brain activity are recorded using magnetoencephalography, and spectral summaries are examined at multiple sensor locations across the scalp. We focus on the mean frequency of the power spectral density, and use space-varying regression to examine associations with age, gender and Down syndrome across several scalp regions. Spatial smoothing priors are incorporated based on a multivariate Markov random field, and the markedly non-Gaussian nature of the spectral response variable is accommodated by the use of a Skew-t distribution. A range of models representing different assumptions on the association structure and response distribution are examined, and we conduct model selection using the deviance information criterion. Our analysis suggests region-specific differences between healthy controls and individuals with Down syndrome, particularly in the left and right temporal regions, and produces smoothed maps indicating the scalp topography of the estimated differences. PMID:22614763
Cheng, Yongcun; Andersen, Ole Baltazar; Knudsen, Per
2010-01-01
GMES marine core service. One such added value will be a multivariate regression model of sea level variability of multisatellite and in-situ tide gauge observations with the aim at improved future high spatial and temporal sea level prediction for i.e., human safety. Tide gauges and satellite...... altimetry data from the last seventeen years have been compared for an area around UK and temporal correlation coefficients between them were calculated. The results are extremely encouraging, as we have shown that the detided signal from response method correlates to more than 90% for nearly all tide gauge...
Naghshpour, Shahdad
2012-01-01
Regression analysis is the most commonly used statistical method in the world. Although few would characterize this technique as simple, regression is in fact both simple and elegant. The complexity that many attribute to regression analysis is often a reflection of their lack of familiarity with the language of mathematics. But regression analysis can be understood even without a mastery of sophisticated mathematical concepts. This book provides the foundation and will help demystify regression analysis using examples from economics and with real data to show the applications of the method. T
Ghasemi, Jahanbakhsh [Chemistry Department, Faculty of Sciences, Razi University, Kermanshah (Iran, Islamic Republic of)], E-mail: Jahan.ghasemi@gmail.com; Saaidpour, Saadi [Chemistry Department, Faculty of Sciences, Razi University, Kermanshah (Iran, Islamic Republic of)
2007-12-05
A quantitative structure-property relationship (QSPR) study was performed to develop models that relate the structures of 150 drug organic compounds to their n-octanol-water partition coefficients (log P_o/w). Molecular descriptors were derived solely from the 3D structures of the drug molecules. A genetic algorithm was also applied as a variable selection tool in the QSPR analysis. The models were constructed using 110 molecules as the training set, and predictive ability was tested using 40 compounds. Modeling of log P_o/w of these compounds as a function of the theoretically derived descriptors was established by multiple linear regression (MLR). Four descriptors for these compounds, molecular volume (MV) (geometrical), hydrophilic-lipophilic balance (HLB) (constitutional), hydrogen bond forming ability (HB) (electronic) and polar surface area (PSA) (electrostatic), are taken as inputs for the model. The use of descriptors calculated only from molecular structure eliminates the need for experimental determination of properties for use in the correlation and allows for the estimation of log P_o/w for molecules not yet synthesized. Application of the developed model to a testing set of 40 drug organic compounds demonstrates that the model is reliable with good predictive accuracy and simple formulation. The prediction results are in good agreement with the experimental values. The root mean square error of prediction (RMSEP) and squared correlation coefficient (R²) for the MLR model were 0.22 and 0.99 for the prediction set log P_o/w.
Barbu, N.; Cuculeanu, V.; Stefan, S.
2015-08-01
The aim of this study is to investigate the relationship between the frequency of very warm days (TX90p) in Romania and large-scale atmospheric circulation for winter (December-February) and summer (June-August) between 1962 and 2010. In order to achieve this, two catalogues from COST733Action were used to derive daily circulation types. Seasonal occurrence frequencies of the circulation types were calculated and have been utilized as predictors within the multiple linear regression model (MLRM) for the estimation of winter and summer TX90p values for 85 synoptic stations covering the entire Romania. A forward selection procedure has been utilized to find adequate predictor combinations and those predictor combinations were tested for collinearity. The performance of the MLRMs has been quantified based on the explained variance. Furthermore, the leave-one-out cross-validation procedure was applied and the root-mean-squared error skill score was calculated at station level in order to obtain reliable evidence of MLRM robustness. From this analysis, it can be stated that the MLRM performance is higher in winter compared to summer. This is due to the annual cycle of incoming insolation and to the local factors such as orography and surface albedo variations. The MLRM performances exhibit distinct variations between regions with high performance in wintertime for the eastern and southern part of the country and in summertime for the western part of the country. One can conclude that the MLRM generally captures quite well the TX90p variability and reveals the potential for statistical downscaling of TX90p values based on circulation types.
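The leave-one-out cross-validation and RMSE skill score mentioned in this abstract can be sketched as follows. The abstract does not define the skill score, so the sketch assumes the common form 1 - RMSE_model / RMSE_reference, with the mean of the training values as the reference forecast. It also uses a single predictor for brevity, whereas the study uses multiple circulation-type frequencies. All names and data are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def loocv_rmse_skill(x, y):
    """Leave-one-out RMSE skill score against a training-mean reference forecast."""
    model_sq, reference_sq = 0.0, 0.0
    for i in range(len(x)):
        # Refit without observation i, then predict the held-out value.
        x_train, y_train = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
        intercept, slope = fit_line(x_train, y_train)
        model_sq += (y[i] - (intercept + slope * x[i])) ** 2
        reference_sq += (y[i] - sum(y_train) / len(y_train)) ** 2
    rmse_model = (model_sq / len(x)) ** 0.5
    rmse_reference = (reference_sq / len(x)) ** 0.5
    return 1.0 - rmse_model / rmse_reference


# Hypothetical predictor and predictand values, invented for illustration only.
frequency = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
tx90p = [2.0, 4.5, 5.5, 8.5, 9.5, 12.5]
skill = loocv_rmse_skill(frequency, tx90p)
```

A score near 1 means the cross-validated regression predicts far better than the reference; a score at or below 0 means it adds nothing over the reference.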
Keerthiprasad.K
2014-08-01
In recent years, alloy steels have been widely used in aerospace and automotive industries. Machining of these materials requires better understanding of cutting processes regarding accuracy and efficiency. This study addresses the modelling of the machinability of EN353 and 20MnCr5 materials. In this study, multiple regression analysis (MRA) is used to investigate the influence of some parameters on the thrust force and torque in the drilling processes of alloy steel materials. The models were identified by using cutting speed, feed rate, and depth as input data and the thrust force and torque as the output data. The statistical analysis showed that cutting feed (f) was the most significant parameter in the drilling process, while spindle speed seemed insignificant. Since the spindle speed was insignificant, it can be set either at the highest spindle speed to obtain a high material removal rate or at the lowest spindle speed to prolong the tool life, depending on the needs of the application. The mathematical model is based on power regression modelling, dependent on the three above-mentioned parameters.
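A power regression model of the kind named in this abstract is commonly fitted by taking logarithms, which turns response = C * speed^a * feed^b * depth^c into a linear model. The sketch below assumes that form and solves the resulting least-squares normal equations with a small Gaussian elimination routine. The exact model form, the data and the exponents are assumptions made for illustration, not values from the study.

```python
import math


def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_power_model(speed, feed, depth, response):
    """Fit response = C * speed^a * feed^b * depth^c by least squares on logarithms."""
    rows = [[1.0, math.log(v), math.log(f), math.log(d)]
            for v, f, d in zip(speed, feed, depth)]
    logs = [math.log(r) for r in response]
    k = 4
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(rows, logs)) for i in range(k)]
    ln_c, a, b, c = solve_linear_system(xtx, xty)
    return math.exp(ln_c), a, b, c


# Hypothetical drilling settings; the response is generated from known exponents
# so that the fit can be checked. None of these numbers come from the study.
speed = [10.0, 20.0, 30.0, 10.0, 20.0, 30.0, 15.0, 25.0]
feed = [0.1, 0.1, 0.2, 0.2, 0.3, 0.3, 0.15, 0.25]
depth = [1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 3.0, 3.0]
thrust = [50.0 * v ** 0.1 * f ** 0.8 * d ** 0.5 for v, f, d in zip(speed, feed, depth)]
coef, a_hat, b_hat, c_hat = fit_power_model(speed, feed, depth, thrust)
```

In a model of this form, a fitted exponent near zero for speed alongside a large exponent for feed would correspond to the abstract's finding that feed dominates while spindle speed is insignificant.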
New introduction to multiple time series analysis
Lütkepohl, Helmut
2005-01-01
When I worked on my Introduction to Multiple Time Series Analysis (Lütkepohl (1991)), a suitable textbook for this field was not available. Given the great importance these methods have gained in applied econometric work, it is perhaps not surprising in retrospect that the book was quite successful. Now, almost one and a half decades later the field has undergone substantial development and, therefore, the book does not cover all topics of my own courses on the subject anymore. Therefore, I started to think about a serious revision of the book when I moved to the European University Institu
Süleyman Demir
2014-04-01
This study performs a Differential Item Functioning (DIF) analysis in terms of gender and culture on the items in the PISA 2009 mathematics literacy sub-test. The DIF analyses were done through the Mantel-Haenszel, Logistic Regression and SIBTEST methods. The data for the gender variable were collected from the responses given by 332 students to the items in the mathematics literacy sub-test during the administration of the 5th booklet in the PISA 2009 application, whereas the data for the culture variable were collected through the application of the 5th booklet in Turkey, Germany, Finland and the United States in the PISA 2009 application. As a result of the DIF analysis according to gender, 4 items favored male students and only one item favored female students. As a result of the DIF analysis according to culture, 16 items for Turkish and German students, 14 items for Turkish and Finnish students, and 18 items for Turkish and United States students were identified as showing DIF.
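Of the three DIF methods named in this abstract, the Mantel-Haenszel approach is the simplest to illustrate: examinees are stratified by total score, and a common odds ratio is pooled across the strata for the item under study. The sketch below computes that pooled odds ratio only, without the associated significance test. The counts and the function name are hypothetical and are not PISA data.

```python
def mantel_haenszel_odds_ratio(strata):
    """Pooled odds ratio across score strata.

    Each stratum is (ref_correct, ref_incorrect, focal_correct, focal_incorrect).
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator


# Hypothetical counts for one item in two total-score strata.
strata = [(40, 10, 30, 20), (30, 20, 20, 30)]
alpha_mh = mantel_haenszel_odds_ratio(strata)
```

A value near 1 suggests no DIF after matching on ability; a value above 1 means the item favors the reference group.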
A brief introduction to regression designs and mixed-effects modelling by a recent convert
Balling, Laura Winther
2008-01-01
This article discusses the advantages of multiple regression designs over the factorial designs traditionally used in many psycholinguistic experiments. It is shown that regression designs are typically more informative, statistically more powerful and better suited to the analysis of naturalistic...
Díaz, S.; Deferrari, G.; Martinioni, D.; Oberto, A.
2000-05-01
Factors affecting UV radiation at the earth's surface include the solar zenith angle, earth-sun distance, clouds, aerosols, altitude, ozone and the ground's albedo. The variation of some factors, such as solar zenith angle and earth-sun distance, is well established. Total column ozone and UV radiation are inversely related, but the presence of clouds may affect the resulting UV in such a way that a depletion in the total column ozone may not always lead to an increase in the radiation at the earth's surface. The aim of this paper is to determine the contribution to the variation of the biologically effective irradiance by geometric factors, clouds and ozone, jointly and separately, in Ushuaia (54°49'S, 68°19'W, sea level), and the seasonal variation of this relationship, given the magnitude and seasonal distribution of the ozone depletion and the frequent presence of high cloud cover at this site. For this purpose, multivariate and simple regression analyses of daily and monthly integrated irradiances weighted by the DNA damage action spectrum as a function of total column ozone and the integrated irradiances in the band 337-342 nm (as a proxy for cloud cover and geometric factors) have been performed. For the analysed period (September 1989-December 1996) more than 97% of the variation of the DNA damage weighted daily integrated irradiances is described by changes in ozone, clouds and geometric factors. Simple regression analysis for daily integrated irradiances, grouped by month, shows that most of this variation is explained by clouds and geometric factors, except in spring, when strong ozone depletion occurs intermittently over this area. When monthly trends are removed, similar results are observed, except for late winter.
Yi, Honggang; Wo, Hongmei; Zhao, Yang; Zhang, Ruyang; Dai, Junchen; Jin, Guangfu; Ma, Hongxia; Wu, Tangchun; Hu, Zhibin; Lin, Dongxin; Shen, Hongbing; Chen, Feng
2015-07-01
With recent advances in biotechnology, genome-wide association studies (GWAS) have been widely used to identify genetic variants that underlie complex human diseases and traits. In case-control GWAS, the typical statistical strategy is traditional logistic regression (LR) based on single-locus analysis. However, such single-locus analysis leads to the well-known multiplicity problem, with a risk of inflating type I error and reducing power. Dimension reduction-based techniques, such as principal component-based logistic regression (PC-LR) and partial least squares-based logistic regression (PLS-LR), have recently gained much attention in the analysis of high-dimensional genomic data. However, the performance of these methods is still not clear, especially in GWAS. We conducted simulations and a real data application to compare the type I error and power of PC-LR, PLS-LR and LR applied to GWAS within a defined single nucleotide polymorphism (SNP) set region. We found that PC-LR and PLS-LR can reasonably control type I error under the null hypothesis. In contrast, LR corrected by the Bonferroni method was more conservative in all simulation settings. In particular, we found that PC-LR and PLS-LR had comparable power and both outperformed LR, especially when the causal SNP was in high linkage disequilibrium with genotyped ones and had a small effect size in the simulation. Based on SNP set analysis, we applied all three methods to analyze non-small cell lung cancer GWAS data. PMID:26243516
Anwar Fitrianto
2014-01-01
When independent variables are highly linearly correlated in a multiple linear regression model, the analysis can give misleading results. This happens when the multiple linear regression analysis is based on the common Ordinary Least Squares (OLS) method. In this situation, the ridge regression estimator is recommended. We conducted a simulation study to compare the performance of the ridge regression estimator and OLS. We found that the Hoerl and Kennard ridge regression estimation method performs better than the other approaches.
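The contrast between OLS and ridge regression under multicollinearity can be sketched for the two-predictor case, where the penalized normal equations (X'X + λI)β = X'y have a closed-form 2×2 solution. The data below are made up, and the penalty λ = 1.0 is an arbitrary illustrative choice, not the Hoerl and Kennard estimate of the ridge parameter examined in the paper.

```python
def center(values):
    """Subtract the mean so that no intercept term is needed."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def ridge_two_predictors(x1, x2, y, lam):
    """Ridge coefficients for two centered predictors; lam = 0 gives OLS."""
    x1c, x2c, yc = center(x1), center(x2), center(y)
    s11 = sum(a * a for a in x1c) + lam
    s22 = sum(b * b for b in x2c) + lam
    s12 = sum(a * b for a, b in zip(x1c, x2c))
    t1 = sum(a * c for a, c in zip(x1c, yc))
    t2 = sum(b * c for b, c in zip(x2c, yc))
    det = s11 * s22 - s12 * s12
    # Inverse of the 2x2 matrix [[s11, s12], [s12, s22]] applied to (t1, t2).
    return ((s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det)


# Hypothetical data: x2 is almost a copy of x1, so X'X is nearly singular.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.01, 1.98, 3.015, 3.99, 5.02, 5.985]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]

ols_coefficients = ridge_two_predictors(x1, x2, y, 0.0)
ridge_coefficients = ridge_two_predictors(x1, x2, y, 1.0)
print("OLS:  ", ols_coefficients)
print("ridge:", ridge_coefficients)
```

With nearly collinear predictors the OLS coefficients tend to be large and unstable, while the ridge penalty shrinks the coefficient vector; choosing λ from the data is the part that methods such as Hoerl and Kennard's address.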
Anwar Fitrianto; Lee Ceng Yik
2014-01-01
When independent variables are highly linearly correlated in a multiple linear regression model, the analysis can give misleading results. This happens when the multiple linear regression analysis is based on the common Ordinary Least Squares (OLS) method. In this situation, the ridge regression estimator is recommended. We conducted a simulation study to compare the performance of the ridge regression estimator and OLS. We found that the Hoerl and Kennard ridge regression estimation method has better performan...
Droz, J P; Kramar, A; Ghosn, M; Piot, G; Rey, A; Theodore, C; Wibault, P; Court, B H; Perrin, J L; Travagli, J P
1988-08-01
In order to define prognostic factors for advanced-stage nonseminomatous germ cell tumors (NSGCT) of the testis, the authors reviewed 84 patients treated from 1978 through 1985. The survival rate was 51% at 3 years. Patients with elevated serum levels of human chorionic gonadotropin (HCG) and/or alpha-fetoprotein (AFP), or with an abdominal mass, had significantly worse survival. Only HCG and AFP levels retained their significance when multivariate Cox analysis was performed. The probability that a patient achieves a complete remission (CR) was modelled as a function of certain patient characteristics using multivariate logistic regression analysis. The significant variables were functions of the HCG and AFP values. Since both variables are related to the CR rate and to survival, the authors define achievement of a CR as the single outcome of interest. A predicted probability of CR greater than 70% adequately separates the patients into two prognostic subgroups. This model is currently being used to enroll NSGCT patients in a prospective clinical trial modulated according to these prognostic factors. PMID:2455591
Cigarette Smoking Habits among Men and Women in Turkey: A Meta Regression Analysis
F Sahin Mutlu
2006-06-01
Smoking has become more prevalent in Turkey than in Western countries during the past decade. This study was conducted to estimate gender-related smoking parameters with minimum variance. Of the 92 studies on smoking habits conducted in Turkey from 1981 to 2003, 60 were deemed appropriate for meta-analysis and meta-regression analysis. The proportions of men and women smoking cigarettes were 0.51 and 0.35, respectively. For 1996 and earlier, the proportion was 0.52 for men and 0.35 for women; for the years after 1996, the figures were 0.41 for men and 0.32 for women. Under the DerSimonian and Laird random-effects model, the odds ratio, which expresses the tendency of men to smoke compared with women, was 1.894 for the period 1981-2003. The studies were heterogeneous (Q=1560.91, P<0.001), as was also indicated by the Tau-square test (x2=0.55, z=6.29, P<0.001). We propose that effective precautions should be considered, especially the introduction of laws to reduce smoking in both sexes, with particular attention to women.
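The DerSimonian and Laird random-effects model referred to in this abstract can be sketched as follows. The function takes per-study effect estimates (for example log odds ratios) with their within-study variances and returns the pooled effect, its standard error, Cochran's Q and the between-study variance tau². The function name and the three example studies are hypothetical; they do not reproduce the 60 Turkish studies.

```python
import math


def dersimonian_laird(effects, variances):
    """Random-effects pooling with the DerSimonian-Laird tau^2 estimator."""
    weights = [1.0 / v for v in variances]
    total_w = sum(weights)
    fixed = sum(w * e for w, e in zip(weights, effects)) / total_w
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(w * (e - fixed) ** 2 for w, e in zip(weights, effects))
    k = len(effects)
    c = total_w - sum(w * w for w in weights) / total_w
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Random-effects weights add tau^2 to each within-study variance.
    re_weights = [1.0 / (v + tau2) for v in variances]
    pooled = sum(w * e for w, e in zip(re_weights, effects)) / sum(re_weights)
    std_error = math.sqrt(1.0 / sum(re_weights))
    return pooled, std_error, q, tau2


# Hypothetical log odds ratios (men vs. women) and their variances.
log_odds_ratios = [0.45, 0.80, 0.62]
log_or_variances = [0.02, 0.05, 0.03]
pooled, std_error, q, tau2 = dersimonian_laird(log_odds_ratios, log_or_variances)
print("pooled odds ratio:", math.exp(pooled))
print("Q:", q, "tau^2:", tau2)
```

A large Q relative to its degrees of freedom (k - 1), as reported in the abstract, signals heterogeneity and is what motivates the random-effects weights and the subsequent meta-regression.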
Generalized multilevel function-on-scalar regression and principal component analysis.
Goldsmith, Jeff; Zipunnikov, Vadim; Schrack, Jennifer
2015-06-01
This manuscript considers regression models for generalized, multilevel functional responses: functions are generalized in that they follow an exponential family distribution and multilevel in that they are clustered within groups or subjects. This data structure is increasingly common across scientific domains and is exemplified by our motivating example, in which binary curves indicating physical activity or inactivity are observed for nearly 600 subjects over 5 days. We use a generalized linear model to incorporate scalar covariates into the mean structure, and decompose subject-specific and subject-day-specific deviations using multilevel functional principal components analysis. Thus, functional fixed effects are estimated while accounting for within-function and within-subject correlations, and major directions of variability within and between subjects are identified. Fixed effect coefficient functions and principal component basis functions are estimated using penalized splines; model parameters are estimated in a Bayesian framework using Stan, a programming language that implements a Hamiltonian Monte Carlo sampler. Simulations designed to mimic the application have good estimation and inferential properties with reasonable computation times for moderate datasets, in both cross-sectional and multilevel scenarios; code is publicly available. In the application we identify effects of age and BMI on the time-specific change in probability of being active over a 24-hour period; in addition, the principal components analysis identifies the patterns of activity that distinguish subjects and days within subjects. PMID:25620473
Risky decision making in Attention-Deficit/Hyperactivity Disorder: A meta-regression analysis.
Dekkers, Tycho J; Popma, Arne; Agelink van Rentergem, Joost A; Bexkens, Anika; Huizenga, Hilde M
2016-04-01
ADHD has been associated with various forms of risky real-life decision making, for example risky driving, unsafe sex and substance abuse. However, results from laboratory studies on decision making deficits in ADHD have been inconsistent, probably because of between-study differences. We therefore performed a meta-regression analysis in which 37 studies (n ADHD=1175; n Control=1222) were included, containing 52 effect sizes. The overall analysis yielded a small to medium effect size (standardized mean difference=.36, p [...]): ADHD groups showed more risky decision making than control groups. There was a trend for a moderating influence of co-morbid Disruptive Behavior Disorders (DBD): studies including more participants with co-morbid DBD had larger effect sizes. No moderating influence of co-morbid internalizing disorders, age or task explicitness was found. These results indicate that ADHD is related to increased risky decision making in laboratory settings, which tended to be more pronounced when ADHD was accompanied by DBD. We therefore argue that risky decision making should have a more prominent role in research on the neuropsychological and neurobiological mechanisms of ADHD, which can be useful in ADHD assessment and intervention. PMID:26978323
VanEngelsdorp, Dennis; Speybroeck, Niko; Evans, Jay D; Nguyen, Bach Kim; Mullin, Chris; Frazier, Maryann; Frazier, Jim; Cox-Foster, Diana; Chen, Yanping; Tarpy, David R; Haubruge, Eric; Pettis, Jeffrey S; Saegerman, Claude
2010-10-01
Colony collapse disorder (CCD), a syndrome whose defining trait is the rapid loss of adult worker honey bees, Apis mellifera L., is thought to be responsible for a minority of the large overwintering losses experienced by U.S. beekeepers since the winter 2006-2007. Using the same data set developed to perform a monofactorial analysis (PloS ONE 4: e6481, 2009), we conducted a classification and regression tree (CART) analysis in an attempt to better understand the relative importance and interrelations among different risk variables in explaining CCD. Fifty-five exploratory variables were used to construct two CART models: one model with and one model without a cost of misclassifying a CCD-diagnosed colony as a non-CCD colony. The resulting model tree that permitted for misclassification had a sensitivity and specificity of 85 and 74%, respectively. Although factors measuring colony stress (e.g., adult bee physiological measures, such as fluctuating asymmetry or mass of head) were important discriminating values, six of the 19 variables having the greatest discriminatory value were pesticide levels in different hive matrices. Notably, coumaphos levels in brood (a miticide commonly used by beekeepers) had the highest discriminatory value and were highest in control (healthy) colonies. Our CART analysis provides evidence that CCD is probably the result of several factors acting in concert, making afflicted colonies more susceptible to disease. This analysis highlights several areas that warrant further attention, including the effect of sublethal pesticide exposure on pathogen prevalence and the role of variability in bee tolerance to pesticides on colony survivorship. PMID:21061948
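Two quantities in this abstract lend themselves to a small sketch: the sensitivity and specificity of a classifier, and the way a misclassification cost changes the label a tree assigns to a leaf. The counts below are hypothetical and were chosen only so that the rates match the reported 85% and 74%; the cost rule is the generic expected-cost comparison, not necessarily the exact rule of the CART software used by the authors.

```python
def sensitivity(true_positive, false_negative):
    """Share of CCD colonies that the model labels as CCD."""
    return true_positive / (true_positive + false_negative)


def specificity(true_negative, false_positive):
    """Share of non-CCD colonies that the model labels as non-CCD."""
    return true_negative / (true_negative + false_positive)


def leaf_label(prob_ccd, cost_missed_ccd=1.0, cost_false_alarm=1.0):
    """Label a leaf by comparing the expected costs of the two possible labels.

    Labeling the leaf non-CCD costs cost_missed_ccd for each CCD colony in it;
    labeling it CCD costs cost_false_alarm for each non-CCD colony in it.
    """
    expected_cost_if_non_ccd = cost_missed_ccd * prob_ccd
    expected_cost_if_ccd = cost_false_alarm * (1.0 - prob_ccd)
    return "CCD" if expected_cost_if_non_ccd > expected_cost_if_ccd else "non-CCD"


# Hypothetical confusion counts that reproduce the reported rates.
print(sensitivity(true_positive=85, false_negative=15))
print(specificity(true_negative=74, false_positive=26))

# A leaf in which 30% of colonies are CCD, labeled with and without a
# higher cost for missing a CCD colony.
print(leaf_label(0.30))
print(leaf_label(0.30, cost_missed_ccd=3.0))
```

Raising the cost of a missed CCD colony pushes more leaves toward the CCD label, which usually trades specificity for sensitivity.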
Boy-Roura, M; Cameron, K C; Di, H J
2016-02-01
This study presents a meta-analysis of 12 experiments that quantify nitrate-N leaching losses from grazed pasture systems in alluvial sedimentary soils in Canterbury (New Zealand). Mean measured nitrate-N leaching losses (kg N/ha per 100 mm drainage) were 2.7 when no urine was applied, 8.4 at the urine rate of 300 kg N/ha, 9.8 at 500 kg N/ha, 24.5 at 700 kg N/ha and 51.4 at 1000 kg N/ha. Lismore soils showed significantly higher nitrate-N losses than Templeton soils. Moreover, a multiple linear regression (MLR) model was developed to determine the key factors that influence nitrate-N leaching and to predict nitrate-N leaching losses. The MLR model was calibrated and validated using 82 average values of nitrate-N leached and 48 explanatory variables representative of nitrogen inputs and outputs, transport, attenuation of nitrogen and farm management practices. The MLR model (R² = 0.81) showed that nitrate-N leaching losses were greater at higher urine application rates and when there was more drainage from rainfall and irrigation. On the other hand, nitrate leaching decreased when nitrification inhibitors (e.g. dicyandiamide (DCD)) were applied. Predicted nitrate-N leaching losses at the paddock scale were calculated using the MLR equation, and they varied widely depending on the urine application rate and urine patch coverage. PMID:26498804
K. Satyanarayana
2013-06-01
The present work deals with the cutting forces and cutting temperature produced during turning of titanium alloy Ti-6Al-4V with PVD TiN coated tungsten carbide inserts in a dry environment. First-order mathematical models were developed using multiple regression analysis, and the process parameters were optimized using contour plots. The models showed high determination coefficients (R2 = 0.964 and 0.989), explaining 96.4% and 98.9% of the variability in the cutting force and cutting temperature, respectively, which indicates a good fit and high significance of the models. The developed mathematical models relate the cutting force and temperature to the process parameters with a good degree of approximation. From the contour plots, the optimal parametric combination for lowest cutting force is v3 (75 m/min) – f1 (0.25 mm/rev). Similarly, the optimal parametric combination for minimum temperature is v1 (45 m/min) – f1 (0.25 mm/rev). Cutting speed was found to be the most significant parameter for cutting force, followed by feed. For cutting temperature, feed was found to be the most influential parameter, followed by cutting speed.
Clegg, Samuel M [Los Alamos National Laboratory; Barefield, James E [Los Alamos National Laboratory; Wiens, Roger C [Los Alamos National Laboratory; Dyar, Melinda D [MT HOLYOKE COLLEGE; Schafer, Martha W [LSU; Tucker, Jonathan M [MT HOLYOKE COLLEGE
2008-01-01
The ChemCam instrument on the Mars Science Laboratory (MSL) will include a laser-induced breakdown spectrometer (LIBS) to quantify major and minor elemental compositions. The traditional analytical chemistry approach to calibration curves for these data regresses a single diagnostic peak area against concentration for each element. This approach contrasts with a new multivariate method in which elemental concentrations are predicted by step-wise multiple regression analysis based on areas of a specific set of diagnostic peaks for each element. The method is tested on LIBS data from igneous and metamorphosed rocks. Between 4 and 13 partial regression coefficients are needed to describe each elemental abundance accurately (i.e., with a regression line of R² > 0.9995 for the relationship between predicted and measured elemental concentration) for all major and minor elements studied. Validation plots suggest that the method is limited at present by the small data set, and will work best for prediction of concentration when a wide variety of compositions and rock types has been analyzed.
Marco Aurélio Carino Bouzada
2009-09-01
This work describes, by means of a case study, the problem of forecasting call demand for a particular product at the call center of a large Brazilian company in the sector, Contax, and how it was approached using Multiple Regression with dummy variables. After highlighting and justifying the importance of the topic, the study presents a brief literature review of demand forecasting methods and their application in call centers. The case is described by first presenting the company studied and then the way it handles the problem of forecasting call demand for product 103 (services related to fixed-line telephony). A Multiple Regression model with dummy variables is then developed to serve as the basis of the proposed demand forecasting process. The model uses available information capable of influencing demand, such as the day of the week, the occurrence of holidays, and the proximity of the date to critical events such as the arrival of the bill at the customer's home and its due date. It showed accuracy gains on the order of 3 percentage points for the period studied when compared with the tool previously in use.
A proposal for a new dimension analysis procedure in a general regression problem
Santiago Velilla; Mª Pilar Barrios
2001-01-01
In this paper, a new procedure for testing the number of linear components in a general regression problem is introduced. It is based on a nonparametric estimate of the covariance matrix of the inverse regression curve. A review of previous dimension tests is also presented.
Partial least squares (PLS) regression and its application to coal analysis
Alciaturi, Carlos E.; Escobar, Marcos E.; De La Cruz, Carlos; Rincón, Carlos
2003-12-01
Instrumental methods of chemical analysis use the relationships between the signal obtained and a property of the system under study (generally a concentration). Advances in electronics and computing have enabled rapid progress in data acquisition, transmission and processing. The application of various mathematical methods to the calculation of concentrations and other properties from instrumental data is known as chemometrics, an area of intense activity because of its wide applications in the chemical and process industries and in environmental studies. One of the most widely used methods in chemometrics is partial least squares (PLS). This method, which is related to principal components regression (PCR), has theoretical and computational advantages that have led to countless applications. Tens of thousands of references can be found on the Internet for linear PLS alone. This article explains the fundamentals of the method and shows an application to the prediction of coal properties from mid-infrared data, with the aim of developing fast, non-destructive methods of analysis for these materials.
Breast density (the percentage of fibroglandular tissue in the breast) has been suggested to be a useful surrogate marker for breast cancer risk. It is conventionally measured using screen-film mammographic images by a labor-intensive histogram segmentation method (HSM). We have adapted and modified the HSM for measuring breast density from raw digital mammograms acquired by full-field digital mammography. Multiple regression model analyses showed that many of the instrument parameters for acquiring the screening mammograms (e.g. breast compression thickness, radiological thickness, radiation dose and compression force) and image pixel intensity statistics of the imaged breasts were strong predictors of the observed threshold values (model R2 = 0.93) and %-density (R2 = 0.84). The intra-class correlation coefficient of the %-density for duplicate images was estimated to be 0.80 using the regression model-derived threshold values, and 0.94 if estimated directly from the parameter estimates of the %-density prediction regression model. Therefore, with additional research, these mathematical models could be used to compute breast density objectively and automatically, bypassing the HSM step, and could greatly facilitate breast cancer research studies.
Using the classical linear regression model in analysis of the dependences of conveyor belt life
Miriam Andrejiová
2013-12-01
The paper deals with the classical linear regression model of the dependence of conveyor belt life on selected parameters: thickness of paint layer, width and length of the belt, conveyor speed and quantity of transported material. The first part of the article covers regression model design, point and interval estimation of parameters, verification of the statistical significance of the model, and the parameters of the proposed regression model. The second part deals with the identification of influential and extreme values that can affect the estimation of the regression model parameters. The third part focuses on the assumptions of the classical regression model, i.e. on verifying the independence, normality and homoscedasticity of the residuals.
Failure analysis of high strength pipeline with single and multiple corrosions
Highlights: • We study failure of high strength pipelines with single corrosion. • We give regression equations for failure pressure prediction. • We propose assessment procedure for pipelines with multiple corrosions. Abstract: Corrosion compromises the safe operation of oil and gas pipelines, and accurate determination of failure pressure is important in residual strength assessment and corrosion allowance design of onshore and offshore pipelines. This paper investigates the failure pressure of high strength pipelines with single and multiple corrosions using nonlinear finite element analysis. On the basis of regression equations developed for failure pressure prediction of high strength pipelines with single corrosion, the paper proposes an assessment procedure for predicting the failure pressure of high strength pipelines with multiple corrosions. Furthermore, failure pressures predicted by the proposed solutions are compared with experimental results and with various assessment methods available in the literature, demonstrating their accuracy and versatility.
Lombardo, L.; Cama, M.; Maerker, M.; Parisi, L.; Rotigliano, E.
2014-12-01
This study compares the performance of Binary Logistic Regression (BLR) and Boosted Regression Trees (BRT) methods in assessing landslide susceptibility for multiple-occurrence regional landslide events within the Mediterranean region. A test area was selected in the north-eastern sector of Sicily (southern Italy), corresponding to the catchments of the Briga and Giampilieri streams, both of which stretch for a few kilometres from the Peloritan ridge (eastern Sicily, Italy) to the Ionian Sea. On 1 October 2009 this area was struck by an extreme climatic event that resulted in thousands of rapid shallow landslides, mainly debris flows and debris avalanches involving the weathered layer of a low- to high-grade metamorphic bedrock. Exploiting the same set of predictors and the 2009 landslide archive, BLR- and BRT-based susceptibility models were obtained for the two catchments separately, adopting a random partition (RP) technique for validation; in addition, the models trained in one of the two catchments (Briga) were tested in predicting the landslide distribution in the other (Giampilieri), adopting a spatial partition (SP) based validation procedure. All the validation procedures were based on multi-fold tests so as to evaluate and compare the reliability of the fitting, the prediction skill, the coherence in the predictor selection and the precision of the susceptibility estimates. All the models obtained with the two methods produced very high predictive performances, with a general congruence between BLR and BRT in predictor importance. In particular, the research highlighted that BRT models reached a higher prediction performance than BLR models for RP-based modelling, whilst for the SP-based models the difference in predictive skill between the two methods dropped drastically, converging to a similarly excellent performance. However, in terms of the precision of the probability estimates, BLR proved to produce more robust models with respect to the selected predictors and coefficients, as well as the dispersion of the estimated probabilities around the mean value for each mapped pixel. The difference in behaviour could be interpreted as the result of overfitting effects, which affect decision tree classification more heavily than logistic regression techniques.
Multiple tumors. Analysis of 50 patients
The description of multiple primary neoplasms dates from the late nineteenth century; Warren and Gates established the clinicopathological criteria for diagnosis. Their clinical frequency is 1.5% to 5.4% of cancers, and 5% to 11% in autopsy series. In recent years there has been an increase in second tumors, probably due to new strategies for staging and monitoring patients and to therapeutic results with improved survival after the first diagnosis. Objective: To analyze 50 patients with multiple tumors treated in the HCFF.AA Oncology Service in the period 1/1997 to 1/2004. Patients and methods: Patients registered in the H.C.FF.AA with 2 or more histologically documented malignant tumors were included. Medical records were reviewed, recording age, sex, date of diagnosis and type of tumor. The frequency of these tumors and the interval between their occurrence were analyzed. Results: We included 50 patients, 2.0% of the 2400 registered patients. The average age was 61 years (36-89 years). The median interval between the appearance of the first and second tumor was 28 months (0-300). The most common tumors were breast carcinoma (23), non-melanoma skin tumors (15), colon adenocarcinoma (12), prostate (8) and kidney (6). By time of appearance, 10 were synchronous and 40 metachronous. Breast tumors were most often associated with endometrial (5), ovarian (3), colon (3) and kidney (3) tumors. Of the 50 patients, 42 had 2 tumors and 8 had 3 tumors. Conclusions: The frequency of multiple neoplasms in our series and their mode of presentation over time do not differ from those reported by other authors. Monitoring of patients with cancer and advances in diagnosis and therapy lead to increased diagnosis of second tumors and to a new therapeutic challenge.
Adamowski, Jan; Fung Chan, Hiu; Prasher, Shiv O.; Ozga-Zielinski, Bogdan; Sliusarieva, Anna
2012-01-01
Daily water demand forecasts are an important component of cost-effective and sustainable management and optimization of urban water supply systems. In this study, a method based on coupling discrete wavelet transforms (WA) and artificial neural networks (ANNs) for urban water demand forecasting applications is proposed and tested. Multiple linear regression (MLR), multiple nonlinear regression (MNLR), autoregressive integrated moving average (ARIMA), ANN and WA-ANN models for urban water demand forecasting at lead times of one day for the summer months (May to August) were developed, and their relative performance was compared using the coefficient of determination, root mean square error, relative root mean square error, and efficiency index. The key variables used to develop and validate the models were daily total precipitation, daily maximum temperature, and daily water demand data from 2001 to 2009 in the city of Montreal, Canada. The WA-ANN models were found to provide more accurate urban water demand forecasts than the MLR, MNLR, ARIMA, and ANN models. The results of this study indicate that coupled wavelet-neural network models are a potentially promising new method of urban water demand forecasting that merit further study.
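The four comparison measures listed in this abstract can be written out in a few lines. The definitions below are common textbook choices and are assumptions about what the authors computed: the coefficient of determination is taken as the squared Pearson correlation, the relative RMSE as RMSE divided by the mean observation, and the efficiency index as the Nash-Sutcliffe form 1 - SSE/SST. The demand values are invented.

```python
import math


def rmse(observed, predicted):
    """Root mean square error."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def relative_rmse(observed, predicted):
    """RMSE expressed as a fraction of the mean observed value."""
    return rmse(observed, predicted) / (sum(observed) / len(observed))


def efficiency_index(observed, predicted):
    """Nash-Sutcliffe style efficiency: 1 is perfect, 0 matches the mean."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


def coefficient_of_determination(observed, predicted):
    """Squared Pearson correlation between observations and forecasts."""
    mean_obs = sum(observed) / len(observed)
    mean_pred = sum(predicted) / len(predicted)
    cov = sum((o - mean_obs) * (p - mean_pred)
              for o, p in zip(observed, predicted))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    return cov * cov / (var_obs * var_pred)


# Hypothetical daily demand (arbitrary units) and one-day-ahead forecasts.
demand = [510.0, 540.0, 495.0, 600.0, 575.0]
forecast = [500.0, 550.0, 505.0, 580.0, 570.0]
print(rmse(demand, forecast), relative_rmse(demand, forecast))
print(efficiency_index(demand, forecast),
      coefficient_of_determination(demand, forecast))
```

The squared correlation ignores systematic bias, whereas the efficiency index penalizes it, which is one reason studies of this kind report both.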
Predicting pesticide removal efficacy of vegetated filter strips: A meta-regression analysis.
Chen, Huajin; Grieneisen, Michael L; Zhang, Minghua
2016-04-01
Vegetated Filter Strips (VFS's) are widely used for alleviating agricultural pesticide loadings to surface water bodies. However, effective tools are lacking to quantify the performance of VFS's in reducing off-site pesticide transport. In this study, we applied meta-regression to develop a model for predicting VFS pesticide retention efficiency based on hydrologic responses of VFS's, incoming pollutant characteristics and the interaction within and between these two factor groups (R² = 0.83). In cross-validation analysis, our model (Q² = 0.81) outperformed the existing pesticide retention module of VFSMOD (Q² = 0.72) by explicitly accounting for interaction effects and the categorical effect of pesticide adsorption properties. Based on the 181 data points studied, infiltration had a leading, positive influence on pesticide retention, followed by sedimentation and interaction between the two. Interaction between infiltration and pesticide adsorption properties was also prominent, as the influence of infiltration was significantly lower for strongly adsorbed pesticides. In addition, the clay content of incoming sediment was negatively associated with pesticide retention. Our model is not only valuable in predicting VFS performance, but also provides a quantitative characterization of the interacting VFS processes, thereby facilitating a deeper understanding of the underlying mechanisms. PMID:26802340
A systematic review and meta-regression analysis of mivacurium for tracheal intubation.
Vanlinthout, L E H; Mesfin, S H; Hens, N; Vanacker, B F; Robertson, E N; Booij, L H D J
2014-12-01
We systematically reviewed factors associated with intubation conditions in randomised controlled trials of mivacurium, using random-effects meta-regression analysis. We included 29 studies of 1050 healthy participants. Four factors explained 72.9% of the variation in the probability of excellent intubation conditions: mivacurium dose, 24.4%; opioid use, 29.9%; time to intubation and age together, 18.6%. The odds ratio (95% CI) for excellent intubation was 3.14 (1.65-5.73) for doubling the mivacurium dose, 5.99 (2.14-15.18) for adding opioids to the intubation sequence, and 6.55 (6.01-7.74) for increasing the delay between mivacurium injection and airway insertion from 1 to 2 min in subjects aged 25 years and 2.17 (2.01-2.69) for subjects aged 70 years, p < 0.001 for all. We conclude that good conditions for tracheal intubation are more likely by delaying laryngoscopy after injecting a higher dose of mivacurium with an opioid, particularly in older people. PMID:25040541
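To read the odds ratios reported above in terms of probabilities, one can convert a baseline probability to odds, multiply by the odds ratio, and convert back. The sketch below is illustrative only: the baseline probability of 0.20 in the usage note is a hypothetical value, not a figure from the study, and `apply_odds_ratio` is a name chosen for this example.

```python
def apply_odds_ratio(baseline_prob, odds_ratio):
    """Return the probability obtained by scaling the baseline odds by an odds ratio."""
    odds = baseline_prob / (1 - baseline_prob)   # probability -> odds
    new_odds = odds * odds_ratio                 # apply the odds ratio
    return new_odds / (1 + new_odds)             # odds -> probability
```

For example, with a hypothetical baseline probability of 0.20 for excellent intubation conditions, the reported odds ratio of 3.14 for doubling the dose would correspond to `apply_odds_ratio(0.20, 3.14)`, roughly 0.44. The same odds ratio implies a different absolute change at a different baseline.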
Kirsanov, Dmitry; Panchuk, Vitaly; Goydenko, Alexander; Khaydukova, Maria; Semenov, Valentin; Legin, Andrey
2015-11-01
This study addresses the problem of simultaneous quantitative analysis of six lanthanides (Ce, Pr, Nd, Sm, Eu, Gd) in mixed solutions by two different X-ray fluorescence techniques: energy-dispersive (EDX) and total reflection (TXRF). The concentration of each lanthanide was varied in the range 10⁻⁶ to 10⁻³ mol/L, the low values being around the detection limit of the method. This resulted in XRF spectra with a very poor signal-to-noise ratio and overlapping bands in the case of EDX, while only the latter problem was observed for TXRF. It was shown that the ordinary least squares approach to numerical calibration fails to provide reasonable precision in the quantification of individual lanthanides. Partial least squares (PLS) regression was able to circumvent these spectral deficiencies and yielded adequate calibration models for both techniques, with RMSEP (root mean squared error of prediction) values around 10⁻⁵ mol/L. It was demonstrated that the comparatively simple and inexpensive EDX method can achieve precision similar to that of the more sophisticated TXRF when the spectra are treated by PLS.
Comparison of linear discriminant analysis and logistic regression for data classification
Liong, Choong-Yeun; Foo, Sin-Fan
2013-04-01
Linear discriminant analysis (LDA) and logistic regression (LR) are often used to classify populations or groups using a set of predictor variables. LDA requires assumptions of multivariate normality and equal variance-covariance matrices across groups, whereas LR does not, and hence LR is considered to be much more robust than LDA. In this paper, several real datasets that differ in normality, number of independent variables and sample size are used to study the performance of both methods. The methods are compared based on the percentage of correct classification and the B index. The results show that, overall, LR performs better regardless of whether the distribution of the data is normal or nonnormal. However, LR needs longer computing time than LDA as the sample size increases. The performance of LDA was also tested using various prior probabilities. The results show that the average percentage of correct classification and the B index are higher when the prior probability is set based on the group size rather than using equal probabilities for all groups.
Power Law Regression Analysis of Heat Flux Width in Type I ELMs
Stephens, C. D.; Makowski, M. A.; Leonard, A. W.; Osborne, T. H.
2014-10-01
In this project, a database of Type I ELM characteristics has been assembled and will be used to investigate possible dependencies of the heat flux width on physics and engineering parameters. At the edge near the divertor, high impulsive heat loads are imparted onto the surface. The impact of these ELMs can cause a reduction in divertor lifetime if the heat flux is great enough due to material erosion. A program will be used to analyze data, extract relevant, measurable quantities, and record the quantities in the table. Care is taken to accurately capture the complex space/time structure of the ELM. Then correlations between discharge and equilibrium parameters will be investigated. Power law regression analysis will be used to help determine the dependence of the heat flux width on these various measurable quantities and parameters. This will enable us to better understand the physics of heat flux at the edge. Work supported in part by the National Undergraduate Fellowship Program in Plasma Physics and Fusion Energy Sciences and the US DOE under DE-FG02-04ER54761, DE-AC52-07NA27344, DE-FC02-04ER54698.
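A power law fit of the kind described above is commonly done by ordinary least squares on log-transformed data. The sketch below shows the single-predictor case y = a * x**b. It is only an illustration under that assumption: the abstract does not specify the fitting procedure or the predictors, and `fit_power_law` is a hypothetical name.

```python
import math

def fit_power_law(x, y):
    """Fit y = a * x**b by least squares on (log x, log y); requires x, y > 0."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    b = sxy / sxx                 # exponent = slope in log-log space
    a = math.exp(my - b * mx)     # prefactor = exp(intercept)
    return a, b
```

With several physics and engineering parameters, the same idea extends to a multiple linear regression of log(y) on the logs of all predictors, with one exponent per parameter.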
Variable Selection for Functional Logistic Regression in fMRI Data Analysis
Nedret BILLOR
2015-03-01
This study was motivated by a classification problem in Functional Magnetic Resonance Imaging (fMRI), a noninvasive imaging technique which allows an experimenter to take images of a subject's brain over time. Because fMRI studies usually have a small number of subjects, and because we assume that a smooth underlying curve describes the observations in fMRI data, the resulting datasets are extremely high-dimensional and functional in nature. High dimensionality is one of the biggest problems in the statistical analysis of fMRI data. There is also a need for the development of better classification methods. One of the main advantages of the fMRI technique is its noninvasiveness. If statistical classification methods are improved, this could aid the advancement of noninvasive diagnostic techniques for mental illness or even degenerative diseases such as Alzheimer's. In this paper, we develop a variable selection technique, which tackles high dimensionality and correlation problems in fMRI data, based on L1 regularization (group lasso) for the functional logistic regression model, where the response is binary and represents two separate classes and the predictors are functional. We assess our method with a simulation study and an application to a real fMRI dataset.
Comparison of Bayesian and Classical Analysis of Weibull Regression Model: A Simulation Study
?mran KURT MRL
2011-01-01
Objective: The purpose of this study was to compare the performance of the classical Weibull Regression Model (WRM) and the Bayesian-WRM under varying conditions using Monte Carlo simulations. Material and Methods: Data were simulated and fitted with both the classical WRM and the Bayesian-WRM under varying informative priors and sample sizes using our simulation algorithm. Sample sizes of n=50, 100 and 250 were used, and informative prior values for b1 were selected using a normal prior distribution. For each situation, 1000 simulations were performed. Results: Bayesian-WRM with a proper informative prior performed well, with very little bias. The bias of Bayesian-WRM increased as the priors moved away from reliable values, at all sample sizes. Furthermore, given proper priors, Bayesian-WRM produced predictions with smaller standard errors than the classical WRM in both small and large samples. Conclusion: In this simulation study, Bayesian-WRM performed better than the classical method when the analysis incorporated subjective information from expert opinion and historical knowledge about the parameters. Consequently, Bayesian-WRM should be preferred when reliable informative priors exist; otherwise, the classical WRM should be preferred.
A New Global Regression Analysis Method for the Prediction of Wind Tunnel Model Weight Corrections
Ulbrich, Norbert Manfred; Bridge, Thomas M.; Amaya, Max A.
2014-01-01
A new global regression analysis method is discussed that predicts wind tunnel model weight corrections for strain-gage balance loads during a wind tunnel test. The method determines corrections by combining "wind-on" model attitude measurements with least squares estimates of the model weight and center of gravity coordinates that are obtained from "wind-off" data points. The method treats the least squares fit of the model weight separately from the fit of the center of gravity coordinates. Therefore, it performs two fits of "wind-off" data points and uses the least squares estimator of the model weight as an input for the fit of the center of gravity coordinates. Explicit equations for the least squares estimators of the weight and center of gravity coordinates are derived that simplify the implementation of the method in the data system software of a wind tunnel. In addition, recommendations for sets of "wind-off" data points are made that take typical model support system constraints into account. Explicit equations of the confidence intervals on the model weight and center of gravity coordinates and two different error analyses of the model weight prediction are also discussed in the appendices of the paper.
Hao, Lingxin
2007-01-01
Quantile Regression, the first book of Hao and Naiman's two-book series, establishes the seldom recognized link between inequality studies and quantile regression models. Though separate methodological literature exists for each subject, the authors seek to explore the natural connections between this increasingly sought-after tool and research topics in the social sciences. Quantile regression as a method does not rely on assumptions as restrictive as those for the classical linear regression; though more traditional models such as least squares linear regression are more widely utilized, Hao
Lançon Christophe
2006-07-01
Background: Data comparing duloxetine with existing antidepressant treatments are limited. A comparison of duloxetine with fluoxetine has been performed, but no comparison with venlafaxine, the other antidepressant in the same therapeutic class with a significant market share, has been undertaken. In the absence of relevant data to assess the place that duloxetine should occupy in the therapeutic arsenal, indirect comparisons are the most rigorous approach. We conducted a systematic review of the efficacy of duloxetine, fluoxetine and venlafaxine versus placebo in the treatment of Major Depressive Disorder (MDD), and performed indirect comparisons through meta-regressions. Methods: The bibliography of the Agency for Health Care Policy and Research and the CENTRAL, Medline, and Embase databases were searched using advanced search strategies based on a combination of text and index terms. The search focused on randomized placebo-controlled clinical trials involving adult patients treated for acute phase Major Depressive Disorder. All outcomes were derived so as to account for varying placebo responses across studies. The primary outcome was treatment efficacy as measured by Hedges' g effect size. Secondary outcomes were response and dropout rates as measured by log odds ratios. Meta-regressions were run to compare the drugs indirectly. Sensitivity analyses were run to assess the influence of individual studies on the results and the influence of patients' characteristics. Results: 22 studies involving fluoxetine, 9 involving duloxetine and 8 involving venlafaxine were selected. Using indirect comparison methodology, estimated effect sizes for efficacy compared with duloxetine were 0.11 [-0.14;0.36] for fluoxetine and 0.22 [0.06;0.38] for venlafaxine. Response log odds ratios were -0.21 [-0.44;0.03] and 0.70 [0.26;1.14] for fluoxetine and venlafaxine, respectively. Dropout log odds ratios were -0.02 [-0.33;0.29] and 0.21 [-0.13;0.55], respectively.
Sensitivity analyses showed that results were consistent. Conclusion Fluoxetine was not statistically different in either tolerability or efficacy when compared with duloxetine. Venlafaxine was significantly superior to duloxetine in all analyses except dropout rate. In the absence of relevant data from head-to-head comparison trials, results suggest that venlafaxine is superior compared with duloxetine and that duloxetine does not differentiate from fluoxetine.
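The study above compares drugs indirectly through meta-regression. A simpler stand-in that conveys the same idea is the adjusted indirect comparison through a common comparator (often attributed to Bucher): the difference of two placebo-controlled effects, with variances added. The sketch below uses that simpler technique, not the authors' meta-regression; the function name and the numbers in the test are hypothetical.

```python
import math

def indirect_comparison(effect_a, se_a, effect_b, se_b, z=1.96):
    """Indirect estimate of A versus B from A-vs-placebo and B-vs-placebo effects.

    Returns the point estimate and an approximate 95% confidence interval,
    assuming the two placebo-controlled estimates are independent.
    """
    diff = effect_a - effect_b
    se = math.sqrt(se_a ** 2 + se_b ** 2)   # variances of independent estimates add
    return diff, (diff - z * se, diff + z * se)
```

Because the variances add, an indirect estimate is always less precise than either of the direct estimates it is built from, which is why head-to-head trials remain preferable when available.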
Pradhan, Biswajeet
In 2006 and 2007, heavy monsoon rainfall triggered floods along Malaysia's east coast as well as in the southern state of Johor. The hardest hit areas were along the east coast of peninsular Malaysia in the states of Kelantan, Terengganu and Pahang. In the south, the city of Johor was particularly hard hit. The floods cost nearly a billion ringgit in property damage and many lives. The extent of the damage could have been reduced if an early warning system had been in place. This paper deals with flood susceptibility analysis using a logistic regression model. We evaluated flood susceptibility and the effect of flood-related factors along the Kelantan river basin using a Geographic Information System (GIS) and remote sensing data. Previously flooded areas were extracted from archived RADARSAT images using image processing tools. Flood susceptibility mapping was conducted in the study area along the Kelantan River using RADARSAT imagery and then enlarged to a scale of 1:25,000. Topographical, hydrological and geological data and satellite images were collected, processed, and assembled into a spatial database using GIS and image processing. The factors chosen as influencing flood occurrence were: topographic slope, topographic aspect, topographic curvature, DEM and distance from river drainage, all from the topographic database; flow direction and flow accumulation, extracted from the hydrological database; geology and distance from lineament, taken from the geologic database; land use from SPOT satellite images; soil texture from the soil database; and the vegetation index value from SPOT satellite images. Flood susceptible areas were analyzed and mapped using the probability-logistic regression model. Results indicate that flood-prone areas can be mapped at a scale of 1:25,000, which is comparable to some conventional flood hazard map scales.
The flood prone areas delineated on these maps correspond to areas that would be inundated by significant flooding (approximately the 100 year flood). The flood prone area boundaries were generally in agreement with flood hazard maps produced by the Department of Irrigation and Drainage although the latter are somewhat more detailed because of their larger scale.
Cecchini Diego M
2009-11-01
Background: The central nervous system is considered a sanctuary site for HIV-1 replication. Variables associated with HIV cerebrospinal fluid (CSF) viral load in the context of opportunistic CNS infections are poorly understood. Our objective was to evaluate the relation between: (1) CSF HIV-1 viral load and CSF cytological and biochemical characteristics (leukocyte count, protein concentration, cryptococcal antigen titer); (2) CSF HIV-1 viral load and HIV-1 plasma viral load; and (3) CSF leukocyte count and the peripheral blood CD4+ T lymphocyte count. Methods: We prospectively collected and analyzed pre-treatment, paired CSF and plasma samples from antiretroviral-naive HIV-positive patients with cryptococcal meningitis treated at the Francisco J Muñiz Hospital, Buenos Aires, Argentina (period: 2004 to 2006). We measured HIV CSF and plasma levels by polymerase chain reaction using the Cobas Amplicor HIV-1 Monitor Test version 1.5 (Roche). Data were processed with Statistix 7.0 software (linear regression analysis). Results: Samples from 34 patients were analyzed. CSF leukocyte count showed a statistically significant correlation with CSF HIV-1 viral load (r = 0.4, 95% CI = 0.13-0.63, p = 0.01). No correlation was found with the plasma viral load, CSF protein concentration or cryptococcal antigen titer. A positive correlation was found between peripheral blood CD4+ T lymphocyte count and the CSF leukocyte count (r = 0.44, 95% CI = 0.125-0.674, p = 0.0123). Conclusion: Our study suggests that CSF leukocyte count influences CSF HIV-1 viral load in patients with meningitis caused by Cryptococcus neoformans.
Changes of platelet GMP-140 in diabetic nephropathy and its multi-factor regression analysis
The relation of platelet GMP-140 and its related factors to diabetic nephropathy was studied. The study included 144 patients with diabetes mellitus without nephropathy (group without DN, mean disease duration 25.5 ± 18.6 months), 80 patients with diabetic nephropathy (group DN, mean disease duration 58.7 ± 31.6 months) and 50 normal controls. Platelet GMP-140, plasma α1-MG and β2-MG, and 24-hour urine albumin (ALB), IgG, α1-MG and β2-MG were measured by RIA, HBA1C by chromatographic separation, and FBG, PBG, Ch, TG, HDL and FG by biochemical methods. All data were processed by computer software using the t-test and linear regression, and multi-factor analysis was also performed. The levels of platelet GMP-140, FG, DBP, TG, HBA1C and PBG in group DN were significantly higher than those of group without DN and normal control (P 0.05), while they were higher than those of normal controls. Multi-factor analysis of platelet GMP-140 with TG, DBP and HBA1C were performed in 80 patients with DN (P 1C are the independent factors enhancing the activation of platelets. The disturbance of lipid metabolism in type II diabetes mellitus may also enhance the activation of platelets. Elevation of blood pressure may accelerate the initiation and deterioration of DN, in which the change of platelet GMP-140 is an independent factor. Elevation of HBA1C and blood glucose is closely related to diabetic nephropathy.
We applied multivariate analysis to the clinical findings in patients with acute gastrointestinal (GI) hemorrhage and compared the relationship between these findings and angiographic evidence of extravasation. Our study population consisted of 46 patients with acute GI bleeding. They were divided into two groups. In group 1 we retrospectively analyzed 41 angiograms obtained in 29 patients (age range, 25-91 years; average, 71 years). Their clinical findings, including the shock index (SI), diastolic blood pressure, hemoglobin, platelet counts, and age, were quantitatively analyzed. In group 2, consisting of 17 patients (age range, 21-78 years; average, 60 years), we prospectively applied statistical analysis by a logistic regression model to their clinical findings and then assessed 21 angiograms obtained in these patients to determine whether our model was useful for predicting the presence of angiographic evidence of extravasation. On 18 of 41 (43.9%) angiograms in group 1 there was evidence of extravasation; in 3 patients it was demonstrated only by selective angiography. Factors significantly associated with angiographic visualization of extravasation were the SI and patient age. For differentiation between cases with and cases without angiographic evidence of extravasation, the maximum cutoff point was between 0.51 and 0.53. Of the 21 angiograms obtained in group 2, 13 (61.9%) showed evidence of extravasation; in 1 patient it was demonstrated only on selective angiograms. We found that in 90% of the cases, the prospective application of our model correctly predicted the angiographically confirmed presence or absence of extravasation. We conclude that in patients with GI hemorrhage, angiographic visualization of extravasation is associated with the pre-embolization SI. Patients with a high SI value should undergo study to facilitate optimal treatment planning.
Alternative Methods of Regression
Birkes, David
2011-01-01
Of related interest: Nonlinear Regression Analysis and its Applications, Douglas M. Bates and Donald G. Watts. "...an extraordinary presentation of concepts and methods concerning the use and analysis of nonlinear regression models...highly recommend[ed]...for anyone needing to use and/or understand issues concerning the analysis of nonlinear regression models." --Technometrics. This book provides a balance between theory and practice supported by extensive displays of instructive geometrical constructs. Numerous in-depth case studies illustrate the use of nonlinear regression analysis--with all data s
José Ribeiro de Araújo Neto
2014-04-01
The main goal of this work was to develop and validate multiple regression models to estimate the electrical conductivity (EC) of surface water in reservoirs of the Metropolitan basin, Ceará State, based on the concentration of each investigated ion, and thereby to determine the order of influence of the ions on the EC values for each group formed by a hierarchical cluster analysis (HCA). The data were provided by the Company of Water Resources Management of Ceará and cover the period 1998/2009, with a total of 290 samples from seven reservoirs. The parameters evaluated were: electrical conductivity of water (EC), sodium (Na+), calcium (Ca+2), magnesium (Mg+2), chloride (Cl-) and bicarbonate (HCO3-). The results showed that the HCA formed two distinct groups, and the values of all parameters studied in group 2 always had higher averages than those in group 1, showing that the reservoirs in that group (Castro and Pompeu Sobrinho) have the highest level of salinity in the Metropolitan basin. Chloride was present in both models developed and was the main ion responsible for the ionic composition of the EC. The statistical models developed produced simulated values very close to those observed, which indicates good accuracy of the models. According to the indices applied, the calibrated and validated models showed good precision, with confidence indices (c) greater than 0.71 and Willmott indices (id) greater than 0.85, indicating good performance of the models.
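The two agreement indices quoted in this abstract can be computed as follows. This sketch assumes the standard definitions, which the abstract does not spell out: Willmott's index of agreement d = 1 - sum((S - O)^2) / sum((|S - mean(O)| + |O - mean(O)|)^2), and a confidence index c taken as the product of Pearson's r and d. The function names are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def willmott_d(observed, simulated):
    """Willmott's index of agreement (1 means perfect agreement)."""
    mean_obs = sum(observed) / len(observed)
    num = sum((s - o) ** 2 for o, s in zip(observed, simulated))
    den = sum((abs(s - mean_obs) + abs(o - mean_obs)) ** 2
              for o, s in zip(observed, simulated))
    return 1 - num / den

def confidence_index(observed, simulated):
    """Confidence index c = r * d (assumed definition, see lead-in)."""
    return pearson_r(observed, simulated) * willmott_d(observed, simulated)
```

Multiplying r by d penalizes a model that tracks the observations closely (high r) but with a systematic offset (lower d).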
Regression analysis of mean quality-adjusted survival time based on pseudo-observations
Tunes-da-Silva, G; Klein, J. P.
2009-01-01
Regression models for the mean quality-adjusted survival time are specified from hazard functions of transitions between two states and the mean quality-adjusted survival time may be a complex function of covariates. We discuss a regression model for the mean quality-adjusted survival (QAS) time based on pseudo-observations, which has the advantage of directly modeling the effect of covariates in the QAS time. Both Monte Carlo simulations and a real data set are studied.
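The pseudo-observation idea can be illustrated with the jackknife construction: for an estimator computed on n subjects, the i-th pseudo-observation is n times the full-sample estimate minus (n - 1) times the estimate with subject i left out. The sketch below is generic and uses the sample mean without censoring as a stand-in; in the quality-adjusted survival setting the estimator would instead be one that handles censoring. The function name is hypothetical.

```python
def pseudo_observations(data, estimator):
    """Jackknife pseudo-observations: n * theta_hat - (n - 1) * theta_hat_without_i."""
    n = len(data)
    full = estimator(data)
    return [
        n * full - (n - 1) * estimator(data[:i] + data[i + 1:])
        for i in range(n)
    ]

def sample_mean(values):
    """Stand-in estimator; with no censoring its pseudo-observations equal the data."""
    return sum(values) / len(values)
```

Once computed, the pseudo-observations can be used as the response in an ordinary regression on covariates, which is what allows covariate effects to be modeled directly on the mean.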
Regression analysis of mean quality-adjusted survival time based on pseudo-observations.
Tunes-da-Silva, G; Klein, J P
2009-03-30
Regression models for the mean quality-adjusted survival time are specified from hazard functions of transitions between two states and the mean quality-adjusted survival time may be a complex function of covariates. We discuss a regression model for the mean quality-adjusted survival (QAS) time based on pseudo-observations, which has the advantage of directly modeling the effect of covariates in the QAS time. Both Monte Carlo simulations and a real data set are studied. PMID:19205073
Das Sumonkanti; Rahman Rajwanur M
2011-01-01
Background: The study attempts to develop an ordinal logistic regression (OLR) model to identify the determinants of child malnutrition, instead of the traditional binary logistic regression (BLR) model, using data from the Bangladesh Demographic and Health Survey 2004. Methods: Based on the weight-for-age anthropometric index (Z-score), child nutrition status is categorized into three groups: severely undernourished (< -3.0), moderately undernourished (-3.0 to -2.01) and nourished (≥-2.0...
Cochran-Armitage test versus logistic regression in the analysis of genetic association studies
Wellek, Stefan; Ziegler, Andreas
2012-01-01
Objective: The Cochran-Armitage trend test based on the linear regression model has become a standard procedure for association testing in case-control studies. In contrast, the logistic regression model is generally used for estimating effect sizes. The aim of this paper is to propose an approach that allows for association testing and parameter estimation by means of the same statistic. Methods/Results: The trend test is recommendable as a test of no association between genotype and risk of...
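For reference, the Cochran-Armitage trend statistic for a 2 x k case-control table can be computed directly from the counts. The sketch below uses the usual large-sample form with additive genotype scores (0, 1, 2) as the default and a variance based on N rather than N - 1; these are conventional choices assumed here, not details taken from the paper, and the function name is hypothetical.

```python
import math

def cochran_armitage(cases, controls, scores=(0, 1, 2)):
    """Cochran-Armitage trend test; returns (z, two-sided normal p-value)."""
    totals = [r + s for r, s in zip(cases, controls)]
    n_all = sum(totals)
    p = sum(cases) / n_all                      # overall proportion of cases
    # Score-weighted sum of deviations of case counts from their expectation
    t_stat = sum(t * (r - n * p) for t, r, n in zip(scores, cases, totals))
    s1 = sum(n * t for t, n in zip(scores, totals))
    s2 = sum(n * t * t for t, n in zip(scores, totals))
    variance = p * (1 - p) * (s2 - s1 * s1 / n_all)
    z = t_stat / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value
```

Changing `scores` to (0, 1, 1) or (0, 0, 1) gives the corresponding tests under dominant or recessive coding.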
Analysis of variance, coefficient of determination and $F$-test for local polynomial regression
Huang, Li-Shan; Chen, Jianwei
2008-01-01
This paper provides ANOVA inference for nonparametric local polynomial regression (LPR) in analogy with ANOVA tools for the classical linear regression model. A surprisingly simple and exact local ANOVA decomposition is established, and a local R-squared quantity is defined to measure the proportion of local variation explained by fitting LPR. A global ANOVA decomposition is obtained by integrating local counterparts, and a global R-squared and a symmetric projection matrix are defined. We sh...
Autoencoder, Principal Component Analysis and Support Vector Regression for Data Imputation
Marivate, Vukosi N.; Nelwamodo, Fulufhelo V.; Marwala, Tshilidzi
2007-01-01
Data collection often results in records that have missing values or variables. This investigation compares 3 different data imputation models and identifies their merits by using accuracy measures. Autoencoder Neural Networks, Principal components and Support Vector regression are used for prediction and combined with a genetic algorithm to then impute missing variables. The use of PCA improves the overall performance of the autoencoder network while the use of support vector regression show...
Analysis of dental caries using generalized linear and count regression models
Javali M. Phil
2013-11-01
Generalized linear models (GLM) are generalizations of linear regression models that allow regression models to be fitted to response data following a general exponential family distribution, in all the sciences and especially the medical and dental sciences. They are a flexible and widely used class of models that can accommodate many types of response variables. Count data are frequently characterized by overdispersion and excess zeros. Zero-inflated count models provide a parsimonious yet powerful way to model this type of situation. Such models assume that the data are a mixture of two separate data generation processes: one generates only zeros, and the other is either a Poisson or a negative binomial data-generating process. Zero-inflated count regression models such as the zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB) regression models have been used to handle dental caries count data with many zeros. We present a framework for evaluating the suitability of applying the GLM, Poisson, NB, ZIP and ZINB models to a dental caries data set where the count data may exhibit evidence of many zeros and overdispersion. Estimation of the model parameters using the method of maximum likelihood is provided. Based on the Vuong test statistic and the goodness-of-fit measure for the dental caries data, the NB and ZINB regression models perform better than the other count regression models.
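The two-process mixture described above has a simple probability mass function in the Poisson case. The sketch below writes it out, with `pi` denoting the probability of the zeros-only process and `lam` the Poisson mean; the parameter and function names are chosen for this illustration.

```python
import math

def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson probability of observing count k.

    With probability pi the zeros-only process is active; otherwise the
    count comes from a Poisson distribution with mean lam.
    """
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        return pi + (1 - pi) * poisson   # structural zeros plus Poisson zeros
    return (1 - pi) * poisson
```

Setting `pi = 0` recovers the ordinary Poisson model, which is what makes likelihood-based comparisons between the plain and zero-inflated models natural.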
Vozinaki, Anthi Eirini K.; Karatzas, George P.; Sibetheros, Ioannis A.; Varouchakis, Emmanouil A.
2014-05-01
Damage curves are the most significant component of the flood loss estimation models. Their development is quite complex. Two types of damage curves exist, historical and synthetic curves. Historical curves are developed from historical loss data from actual flood events. However, due to the scarcity of historical data, synthetic damage curves can be alternatively developed. Synthetic curves rely on the analysis of expected damage under certain hypothetical flooding conditions. A synthetic approach was developed and presented in this work for the development of damage curves, which are subsequently used as the basic input to a flood loss estimation model. A questionnaire-based survey took place among practicing and research agronomists, in order to generate rural loss data based on the responders' loss estimates, for several flood condition scenarios. In addition, a similar questionnaire-based survey took place among building experts, i.e. civil engineers and architects, in order to generate loss data for the urban sector. By answering the questionnaire, the experts were in essence expressing their opinion on how damage to various crop types or building types is related to a range of values of flood inundation parameters, such as floodwater depth and velocity. However, the loss data compiled from the completed questionnaires were not sufficient for the construction of workable damage curves; to overcome this problem, a Weighted Monte Carlo method was implemented, in order to generate extra synthetic datasets with statistical properties identical to those of the questionnaire-based data. The data generated by the Weighted Monte Carlo method were processed via Logistic Regression techniques in order to develop accurate logistic damage curves for the rural and the urban sectors. A Python-based code was developed, which combines the Weighted Monte Carlo method and the Logistic Regression analysis into a single code (WMCLR Python code). 
Each WMCLR code execution provided a flow velocity-depth damage curve for a specific land use. More specifically, each WMCLR code execution for the agricultural sector generated a damage curve for a specific crop and for every month of the year, thus relating the damage to any crop with floodwater depth, flow velocity and the growth phase of the crop at the time of flooding. Respectively, each WMCLR code execution for the urban sector developed a damage curve for a specific building type, relating structural damage with floodwater depth and velocity. Furthermore, two techno-economic models were developed in Python programming language, in order to estimate monetary values of flood damages to the rural and the urban sector, respectively. A new Monte Carlo simulation was performed, consisting of multiple executions of the techno-economic code, which generated multiple damage cost estimates. Each execution used the proper WMCLR simulated damage curve. The uncertainty analysis of the damage estimates established the accuracy and reliability of the proposed methodology for the synthetic damage curves' development.
Nunes, Jorge; Madeira, Manuel; Gazarini, Luiz; Neves, José; Vicente, Henrique
2012-01-01
The changes in soil nitrate concentration were studied over 2 years in a "montado" ecosystem in the South of Portugal. Total rainfall, air and soil temperature, and soil water content under and outside the Quercus rotundifolia canopy were also evaluated. A cluster analysis was carried out using climatic and microclimatic parameters in order to maximize the intraclass similarity and minimize the interclass similarity. The k-means clustering method was used. Se...
Liu, Bilan; Qiu, Xing; Zhu, Tong; Tian, Wei; Hu, Rui; Ekholm, Sven; Schifitto, Giovanni; Zhong, Jianhui
2016-01-01
Quantitative measurement of localized longitudinal changes in brain abnormalities at an individual level may offer critical information for disease diagnosis and treatment. The voxel-wise permutation-based method SPREAD/iSPREAD, which combines resampling and spatial regression of neighboring voxels, provides an effective and robust method for detecting subject-specific longitudinal changes within the whole brain, especially for longitudinal studies with a limited number of scans. As an extension of SPREAD/iSPREAD, we present a general method that facilitates analysis of serial Diffusion Tensor Imaging (DTI) measurements (with more than two time points) for testing localized changes in longitudinal studies. Two types of voxel-level test statistics (model-free test statistics, which measure intra-subject variability across time, and test statistics based on general linear model that incorporate specific lesion evolution models) were estimated and tested against the null hypothesis among groups of DTI data across time. The implementation and utility of the proposed statistical method were demonstrated by both Monte Carlo simulations and applications on clinical DTI data from human brain in vivo. By a design of test statistics based on the disease progression model, it was possible to apportion the true significant voxels attributed to the disease progression and those caused by underlying anatomical differences that cannot be explained by the model, which led to improvement in false positive (FP) control in the results. Extension of the proposed method to include other diseases or drug effect models, as well as the feasibility of global statistics, was discussed. The proposed statistical method can be extended to a broad spectrum of longitudinal studies with carefully designed test statistics, which helps to detect localized changes at the individual level.
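The resampling idea behind a permutation test can be sketched in a few lines. This is not the SPREAD/iSPREAD method: it has no spatial regression of neighboring voxels, it handles only two time points, and the function name and the DTI-like values are hypothetical. It only shows how shuffling scan labels yields a null distribution for a model-free test statistic (here, the absolute difference in means).

```python
import random
from statistics import mean


def permutation_p_value(first_scan, second_scan, n_perm, rng):
    """Two-sided permutation p-value for a difference in means."""
    observed = abs(mean(first_scan) - mean(second_scan))
    pooled = list(first_scan) + list(second_scan)
    n_first = len(first_scan)
    extreme = 0
    for _ in range(n_perm):
        # Shuffling the pooled values mimics "no change between scans".
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_first]) - mean(pooled[n_first:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical values from one voxel neighbourhood at two time points.
    baseline = [0.40, 0.42, 0.41, 0.39, 0.43]
    follow_up = [0.60, 0.62, 0.61, 0.59, 0.63]
    p = permutation_p_value(baseline, follow_up, 2000, random.Random(1))
    print(f"permutation p-value: {p:.4f}")
```

In a voxel-wise analysis the same procedure would be repeated for every voxel, so some correction for multiple comparisons would still be needed.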
Analysis of neutral particle emission containing a fast ion tail by use of a nonlinear regression
We present a program for the analysis of neutral particle emission detected by a single-channel analyzer, which may be easily modified to handle the data from a multichannel analyzer. In particular, the program uses a nonlinear regression to fit the data and therefore correctly handles cases where the Maxwellian velocity distribution function is distorted by a high-energy ion population.
Brabant, Marie-Eve; Hebert, Martine; Chagnon, Francois
2013-01-01
This study explored the clinical profiles of 77 female teenage survivors of sexual abuse and examined the association of abuse-related and personal variables with suicidal ideations. Analyses revealed that 64% of participants experienced suicidal ideations. Findings from classification and regression tree analysis indicated that depression,…
Long, Nguyen Phuoc; Huy, Nguyen Tien; Trang, Nguyen Thi Huyen; Luan, Nguyen Thien; Anh, Nguyen Hoang; Nghi, Tran Diem; Hieu, Mai Van; Hirayama, Kenji; Karbwang, Juntra
2014-01-01
BACKGROUND: Ethics is one of the main pillars in the development of science. We performed a JoinPoint regression analysis to analyze the trends of ethical issue research over the past half century. The question is whether ethical issues are neglected despite their importance in modern research.
Bălăcescu, Aniela
2011-12-01
This paper aims to examine the causal relationship between GDP and final consumption. The authors used a linear regression model in which GDP is the dependent (result) variable and final consumption is the explanatory (factor) variable. The analysis was carried out in Excel, a modern software application for computation and statistical data analysis.
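A simple linear regression of this kind can be reproduced outside a spreadsheet. The sketch below computes the ordinary least squares intercept, slope and R² for GDP regressed on final consumption. The function name and the figures are hypothetical placeholders, not the data used in the paper.

```python
def fit_simple_regression(x, y):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (intercept, slope, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares / total sum of squares).
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot
    return intercept, slope, r_squared


if __name__ == "__main__":
    # Hypothetical annual figures in arbitrary currency units.
    final_consumption = [80.0, 85.0, 92.0, 101.0, 110.0]
    gdp = [100.0, 107.0, 115.0, 127.0, 138.0]
    intercept, slope, r2 = fit_simple_regression(final_consumption, gdp)
    print(f"GDP = {intercept:.2f} + {slope:.2f} * consumption (R^2 = {r2:.3f})")
```

A high R² here only describes how well a straight line fits the series; it does not by itself establish the direction of causality between the two variables.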
Zhang, Kun; Huang, Feifei; Chen, Jie; Cai, Qingqing; Wang, Tong; Zou, Rong; Zuo, Zhiyi; Wang, Jingfeng; Huang, Hui
2014-01-01
Overweight and obesity are associated with adverse cardiovascular outcomes. However, the role of overweight and obesity in left ventricular hypertrophy (LVH) of hypertensive patients is controversial. The aim of the current meta-analysis was to evaluate the influence of overweight and obesity on LVH regression in the hypertensive population.
Thomas, Emily H.; Galambos, Nora
To investigate how students' characteristics and experiences affect satisfaction, this study used regression and decision-tree analysis with the CHAID algorithm to analyze student opinion data from a sample of 1,783 college students. A data-mining approach identifies the specific aspects of students' university experience that most influence three…