Regression analysis using dependent Polya trees.
Schörgendorfer, Angela; Branscum, Adam J
2013-11-30
Many commonly used models for linear regression analysis force overly simplistic shape and scale constraints on the residual structure of data. We propose a semiparametric Bayesian model for regression analysis that produces data-driven inference by using a new type of dependent Polya tree prior to model arbitrary residual distributions that are allowed to evolve across increasing levels of an ordinal covariate (e.g., time, in repeated measurement studies). By modeling residual distributions at consecutive covariate levels or time points using separate, but dependent Polya tree priors, distributional information is pooled while allowing for broad pliability to accommodate many types of changing residual distributions. We can use the proposed dependent residual structure in a wide range of regression settings, including fixed-effects and mixed-effects linear and nonlinear models for cross-sectional, prospective, and repeated measurement data. A simulation study illustrates the flexibility of our novel semiparametric regression model to accurately capture evolving residual distributions. In an application to immune development data on immunoglobulin G antibodies in children, our new model outperforms several contemporary semiparametric regression models based on a predictive model selection criterion. Copyright © 2013 John Wiley & Sons, Ltd.
Classification and regression trees
Breiman, Leo; Olshen, Richard A; Stone, Charles J
1984-01-01
The methodology used to construct tree structured rules is the focus of this monograph. Unlike many other statistical procedures, which moved from pencil and paper to calculators, this text's use of trees was unthinkable before computers. Both the practical and theoretical sides have been developed in the authors' study of tree methods. Classification and Regression Trees reflects these two sides, covering the use of trees as a data analysis method, and in a more mathematical framework, proving some of their fundamental properties.
Sub-pixel estimation of tree cover and bare surface densities using regression tree analysis
Carlos Augusto Zangrando Toneli
2011-09-01
Full Text Available Sub-pixel analysis is capable of generating continuous fields, which represent the spatial variability of certain thematic classes. The aim of this work was to develop numerical models to represent the variability of tree cover and bare surfaces within the study area. This research was conducted in the riparian buffer within a watershed of the São Francisco River in the North of Minas Gerais, Brazil. IKONOS and Landsat TM imagery were used with the GUIDE algorithm to construct the models. The results were two index images derived with regression trees for the entire study area, one representing tree cover and the other representing bare surface. The use of non-parametric and non-linear regression tree models presented satisfactory results to characterize wetland, deciduous and savanna patterns of forest formation.
Henrard, S; Speybroeck, N; Hermans, C
2015-11-01
Haemophilia is a rare genetic haemorrhagic disease characterized by partial or complete deficiency of coagulation factor VIII, for haemophilia A, or IX, for haemophilia B. As in any other medical research domain, the field of haemophilia research is increasingly concerned with finding factors associated with binary or continuous outcomes through multivariable models. Traditional models include multiple logistic regressions, for binary outcomes, and multiple linear regressions for continuous outcomes. Yet these regression models are at times difficult to implement, especially for non-statisticians, and can be difficult to interpret. The present paper sought to didactically explain how, why, and when to use classification and regression tree (CART) analysis for haemophilia research. The CART method is non-parametric and non-linear, based on the repeated partitioning of a sample into subgroups based on a certain criterion. Breiman developed this method in 1984. Classification trees (CTs) are used to analyse categorical outcomes and regression trees (RTs) to analyse continuous ones. The CART methodology has become increasingly popular in the medical field, yet only a few examples of studies using this methodology specifically in haemophilia have to date been published. Two examples using CART analysis and previously published in this field are didactically explained in details. There is increasing interest in using CART analysis in the health domain, primarily due to its ease of implementation, use, and interpretation, thus facilitating medical decision-making. This method should be promoted for analysing continuous or categorical outcomes in haemophilia, when applicable. © 2015 John Wiley & Sons Ltd.
Tso, Geoffrey K.F.; Yau, Kelvin K.W. [City University of Hong Kong, Kowloon, Hong Kong (China). Department of Management Sciences
2007-09-15
This study presents three modeling techniques for the prediction of electricity energy consumption. In addition to the traditional regression analysis, decision tree and neural networks are considered. Model selection is based on the square root of average squared error. In an empirical application to an electricity energy consumption study, the decision tree and neural network models appear to be viable alternatives to the stepwise regression model in understanding energy consumption patterns and predicting energy consumption levels. With the emergence of the data mining approach for predictive modeling, different types of models can be built in a unified platform: to implement various modeling techniques, assess the performance of different models and select the most appropriate model for future prediction. (author)
Shi, K-Q; Zhou, Y-Y; Yan, H-D; Li, H; Wu, F-L; Xie, Y-Y; Braddock, M; Lin, X-Y; Zheng, M-H
2017-02-01
At present, there is no ideal model for predicting the short-term outcome of patients with acute-on-chronic hepatitis B liver failure (ACHBLF). This study aimed to establish and validate a prognostic model by using the classification and regression tree (CART) analysis. A total of 1047 patients from two separate medical centres with suspected ACHBLF were screened in the study, which were recognized as derivation cohort and validation cohort, respectively. CART analysis was applied to predict the 3-month mortality of patients with ACHBLF. The accuracy of the CART model was tested using the area under the receiver operating characteristic curve, which was compared with the model for end-stage liver disease (MELD) score and a new logistic regression model. CART analysis identified four variables as prognostic factors of ACHBLF: total bilirubin, age, serum sodium and INR, and three distinct risk groups: low risk (4.2%), intermediate risk (30.2%-53.2%) and high risk (81.4%-96.9%). The new logistic regression model was constructed with four independent factors, including age, total bilirubin, serum sodium and prothrombin activity by multivariate logistic regression analysis. The performances of the CART model (0.896), similar to the logistic regression model (0.914, P=.382), exceeded that of MELD score (0.667, P<.001). The results were confirmed in the validation cohort. We have developed and validated a novel CART model superior to MELD for predicting three-month mortality of patients with ACHBLF. Thus, the CART model could facilitate medical decision-making and provide clinicians with a validated practical bedside tool for ACHBLF risk stratification.
Tejera-Vaquerizo, A; Martín-Cuevas, P; Gallego, E; Herrera-Acosta, E; Traves, V; Herrera-Ceballos, E; Nagore, E
2015-04-01
The main aim of this study was to identify predictors of sentinel lymph node (SN) metastasis in cutaneous melanoma. This was a retrospective cohort study of 818 patients in 2 tertiary-level hospitals. The primary outcome variable was SN involvement. Independent predictors were identified using multiple logistic regression and a classification and regression tree (CART) analysis. Ulceration, tumor thickness, and a high mitotic rate (≥6 mitoses/mm(2)) were independently associated with SN metastasis in the multiple regression analysis. The most important predictor in the CART analysis was Breslow thickness. Absence of an inflammatory infiltrate, patient age, and tumor location were predictive of SN metastasis in patients with tumors thicker than 2mm. In the case of thinner melanomas, the predictors were mitotic rate (>6 mitoses/mm(2)), presence of ulceration, and tumor thickness. Patient age, mitotic rate, and tumor thickness and location were predictive of survival. A high mitotic rate predicts a higher risk of SN involvement and worse survival. CART analysis improves the prediction of regional metastasis, resulting in better clinical management of melanoma patients. It may also help select suitable candidates for inclusion in clinical trials. Copyright © 2014 Elsevier España, S.L.U. and AEDV. All rights reserved.
VanEngelsdorp, Dennis; Speybroeck, Niko; Evans, Jay D; Nguyen, Bach Kim; Mullin, Chris; Frazier, Maryann; Frazier, Jim; Cox-Foster, Diana; Chen, Yanping; Tarpy, David R; Haubruge, Eric; Pettis, Jeffrey S; Saegerman, Claude
2010-10-01
Colony collapse disorder (CCD), a syndrome whose defining trait is the rapid loss of adult worker honey bees, Apis mellifera L., is thought to be responsible for a minority of the large overwintering losses experienced by U.S. beekeepers since the winter 2006-2007. Using the same data set developed to perform a monofactorial analysis (PloS ONE 4: e6481, 2009), we conducted a classification and regression tree (CART) analysis in an attempt to better understand the relative importance and interrelations among different risk variables in explaining CCD. Fifty-five exploratory variables were used to construct two CART models: one model with and one model without a cost of misclassifying a CCD-diagnosed colony as a non-CCD colony. The resulting model tree that permitted for misclassification had a sensitivity and specificity of 85 and 74%, respectively. Although factors measuring colony stress (e.g., adult bee physiological measures, such as fluctuating asymmetry or mass of head) were important discriminating values, six of the 19 variables having the greatest discriminatory value were pesticide levels in different hive matrices. Notably, coumaphos levels in brood (a miticide commonly used by beekeepers) had the highest discriminatory value and were highest in control (healthy) colonies. Our CART analysis provides evidence that CCD is probably the result of several factors acting in concert, making afflicted colonies more susceptible to disease. This analysis highlights several areas that warrant further attention, including the effect of sublethal pesticide exposure on pathogen prevalence and the role of variability in bee tolerance to pesticides on colony survivorship.
Risk Factors of Falls in Community-Dwelling Older Adults: Logistic Regression Tree Analysis
Yamashita, Takashi; Noe, Douglas A.; Bailer, A. John
2012-01-01
Purpose of the Study: A novel logistic regression tree-based method was applied to identify fall risk factors and possible interaction effects of those risk factors. Design and Methods: A nationally representative sample of American older adults aged 65 years and older (N = 9,592) in the Health and Retirement Study 2004 and 2006 modules was used.…
Risk Factors of Falls in Community-Dwelling Older Adults: Logistic Regression Tree Analysis
Yamashita, Takashi; Noe, Douglas A.; Bailer, A. John
2012-01-01
Purpose of the Study: A novel logistic regression tree-based method was applied to identify fall risk factors and possible interaction effects of those risk factors. Design and Methods: A nationally representative sample of American older adults aged 65 years and older (N = 9,592) in the Health and Retirement Study 2004 and 2006 modules was used.…
Gmur, Stephan; Vogt, Daniel; Zabowski, Darlene; Moskal, L Monika
2012-01-01
The characterization of soil attributes using hyperspectral sensors has revealed patterns in soil spectra that are known to respond to mineral composition, organic matter, soil moisture and particle size distribution. Soil samples from different soil horizons of replicated soil series from sites located within Washington and Oregon were analyzed with the FieldSpec Spectroradiometer to measure their spectral signatures across the electromagnetic range of 400 to 1,000 nm. Similarity rankings of individual soil samples reveal differences between replicate series as well as samples within the same replicate series. Using classification and regression tree statistical methods, regression trees were fitted to each spectral response using concentrations of nitrogen, carbon, carbonate and organic matter as the response variables. Statistics resulting from fitted trees were: nitrogen R(2) 0.91 (p organic matter R(2) 0.98 (p organic matter for upper soil horizons in a nondestructive method.
Souadka Amine
2010-04-01
Full Text Available Abstract Background Incidence of liver hydatid cyst (LHC rupture ranged 15%-40% of all cases and most of them concern the bile duct tree. Patients with biliocystic communication (BCC had specific clinic and therapeutic aspect. The purpose of this study was to determine witch patients with LHC may develop BCC using classification and regression tree (CART analysis Methods A retrospective study of 672 patients with liver hydatid cyst treated at the surgery department "A" at Ibn Sina University Hospital, Rabat Morocco. Four-teen risk factors for BCC occurrence were entered into CART analysis to build an algorithm that can predict at the best way the occurrence of BCC. Results Incidence of BCC was 24.5%. Subgroups with high risk were patients with jaundice and thick pericyst risk at 73.2% and patients with thick pericyst, with no jaundice 36.5 years and younger with no past history of LHC risk at 40.5%. Our developed CART model has sensitivity at 39.6%, specificity at 93.3%, positive predictive value at 65.6%, a negative predictive value at 82.6% and accuracy of good classification at 80.1%. Discriminating ability of the model was good 82%. Conclusion we developed a simple classification tool to identify LHC patients with high risk BCC during a routine clinic visit (only on clinical history and examination followed by an ultrasonography. Predictive factors were based on pericyst aspect, jaundice, age, past history of liver hydatidosis and morphological Gharbi cyst aspect. We think that this classification can be useful with efficacy to direct patients at appropriated medical struct's.
Josef Smolle
2001-01-01
Full Text Available Objective: To evaluate the feasibility of the CART (Classification and Regression Tree procedure for the recognition of microscopic structures in tissue counter analysis. Methods: Digital microscopic images of H&E stained slides of normal human skin and of primary malignant melanoma were overlayed with regularly distributed square measuring masks (elements and grey value, texture and colour features within each mask were recorded. In the learning set, elements were interactively labeled as representing either connective tissue of the reticular dermis, other tissue components or background. Subsequently, CART models were based on these data sets. Results: Implementation of the CART classification rules into the image analysis program showed that in an independent test set 94.1% of elements classified as connective tissue of the reticular dermis were correctly labeled. Automated measurements of the total amount of tissue and of the amount of connective tissue within a slide showed high reproducibility (r=0.97 and r=0.94, respectively; p < 0.001. Conclusions: CART procedure in tissue counter analysis yields simple and reproducible classification rules for tissue elements.
Brabant, Marie-Eve; Hebert, Martine; Chagnon, Francois
2013-01-01
This study explored the clinical profiles of 77 female teenager survivors of sexual abuse and examined the association of abuse-related and personal variables with suicidal ideations. Analyses revealed that 64% of participants experienced suicidal ideations. Findings from classification and regression tree analysis indicated that depression,…
L. Monika Moskal
2012-08-01
Full Text Available The characterization of soil attributes using hyperspectral sensors has revealed patterns in soil spectra that are known to respond to mineral composition, organic matter, soil moisture and particle size distribution. Soil samples from different soil horizons of replicated soil series from sites located within Washington and Oregon were analyzed with the FieldSpec Spectroradiometer to measure their spectral signatures across the electromagnetic range of 400 to 1,000 nm. Similarity rankings of individual soil samples reveal differences between replicate series as well as samples within the same replicate series. Using classification and regression tree statistical methods, regression trees were fitted to each spectral response using concentrations of nitrogen, carbon, carbonate and organic matter as the response variables. Statistics resulting from fitted trees were: nitrogen R^{2} 0.91 (p < 0.01 at 403, 470, 687, and 846 nm spectral band widths, carbonate R^{2} 0.95 (p < 0.01 at 531 and 898 nm band widths, total carbon R^{2} 0.93 (p < 0.01 at 400, 409, 441 and 907 nm band widths, and organic matter R^{2} 0.98 (p < 0.01 at 300, 400, 441, 832 and 907 nm band widths. Use of the 400 to 1,000 nm electromagnetic range utilizing regression trees provided a powerful, rapid and inexpensive method for assessing nitrogen, carbon, carbonate and organic matter for upper soil horizons in a nondestructive method.
Tomczyk, Aleksandra; Ewertowski, Marek; White, Piran; Kasprzak, Leszek
2016-04-01
The dual role of many Protected Natural Areas in providing benefits for both conservation and recreation poses challenges for management. Although recreation-based damage to ecosystems can occur very quickly, restoration can take many years. The protection of conservation interests at the same as providing for recreation requires decisions to be made about how to prioritise and direct management actions. Trails are commonly used to divert visitors from the most important areas of a site, but high visitor pressure can lead to increases in trail width and a concomitant increase in soil erosion. Here we use detailed field data on condition of recreational trails in Gorce National Park, Poland, as the basis for a regression tree analysis to determine the factors influencing trail deterioration, and link specific trail impacts with environmental, use related and managerial factors. We distinguished 12 types of trails, characterised by four levels of degradation: (1) trails with an acceptable level of degradation; (2) threatened trails; (3) damaged trails; and (4) heavily damaged trails. Damaged trails were the most vulnerable of all trails and should be prioritised for appropriate conservation and restoration. We also proposed five types of monitoring of recreational trail conditions: (1) rapid inventory of negative impacts; (2) monitoring visitor numbers and variation in type of use; (3) change-oriented monitoring focusing on sections of trail which were subjected to changes in type or level of use or subjected to extreme weather events; (4) monitoring of dynamics of trail conditions; and (5) full assessment of trail conditions, to be carried out every 10-15 years. The application of the proposed framework can enhance the ability of Park managers to prioritise their trail management activities, enhancing trail conditions and visitor safety, while minimising adverse impacts on the conservation value of the ecosystem. A.M.T. was supported by the Polish Ministry of
Noor Zaitun Yahaya
2017-01-01
Full Text Available This paper investigated the use of boosted regression trees (BRTs to draw an inference about daytime and nighttime ozone formation in a coastal environment. Hourly ground-level ozone data for a full calendar year in 2010 were obtained from the Kemaman (CA 002 air quality monitoring station. A BRT model was developed using hourly ozone data as a response variable and nitric oxide (NO, Nitrogen Dioxide (NO2 and Nitrogen Dioxide (NOx and meteorological parameters as explanatory variables. The ozone BRT algorithm model was constructed from multiple regression models, and the 'best iteration' of BRT model was performed by optimizing prediction performance. Sensitivity testing of the BRT model was conducted to determine the best parameters and good explanatory variables. Using the number of trees between 2,500-3,500, learning rate of 0.01, and interaction depth of 5 were found to be the best setting for developing the ozone boosting model. The performance of the O3 boosting models were assessed, and the fraction of predictions within two factor (FAC2, coefficient of determination (R2 and the index of agreement (IOA of the model developed for day and nighttime are 0.93, 0.69 and 0.73 for daytime and 0.79, 0.55 and 0.69 for nighttime respectively. Results showed that the model developed was within the acceptable range and could be used to understand ozone formation and identify potential sources of ozone for estimating O3 concentrations during daytime and nighttime. Results indicated that the wind speed, wind direction, relative humidity, and temperature were the most dominant variables in terms of influencing ozone formation. Finally, empirical evidence of the production of a high ozone level by wind blowing from coastal areas towards the interior region, especially from industrial areas, was obtained.
Risk assessment of dengue fever in Zhongshan, China: a time-series regression tree analysis.
Liu, K-K; Wang, T; Huang, X-D; Wang, G-L; Xia, Y; Zhang, Y-T; Jing, Q-L; Huang, J-W; Liu, X-X; Lu, J-H; Hu, W-B
2017-02-01
Dengue fever (DF) is the most prevalent and rapidly spreading mosquito-borne disease globally. Control of DF is limited by barriers to vector control and integrated management approaches. This study aimed to explore the potential risk factors for autochthonous DF transmission and to estimate the threshold effects of high-order interactions among risk factors. A time-series regression tree model was applied to estimate the hierarchical relationship between reported autochthonous DF cases and the potential risk factors including the timeliness of DF surveillance systems (median time interval between symptom onset date and diagnosis date, MTIOD), mosquito density, imported cases and meteorological factors in Zhongshan, China from 2001 to 2013. We found that MTIOD was the most influential factor in autochthonous DF transmission. Monthly autochthonous DF incidence rate increased by 36·02-fold [relative risk (RR) 36·02, 95% confidence interval (CI) 25·26-46·78, compared to the average DF incidence rate during the study period] when the 2-month lagged moving average of MTIOD was >4·15 days and the 3-month lagged moving average of the mean Breteau Index (BI) was ⩾16·57. If the 2-month lagged moving average MTIOD was between 1·11 and 4·15 days and the monthly maximum diurnal temperature range at a lag of 1 month was <9·6 °C, the monthly mean autochthonous DF incidence rate increased by 14·67-fold (RR 14·67, 95% CI 8·84-20·51, compared to the average DF incidence rate during the study period). This study demonstrates that the timeliness of DF surveillance systems, mosquito density and diurnal temperature range play critical roles in the autochthonous DF transmission in Zhongshan. Better assessment and prediction of the risk of DF transmission is beneficial for establishing scientific strategies for DF early warning surveillance and control.
Benjamin W. Y. Lo
2016-01-01
Conclusions: A clinically useful classification tree was generated, which serves as a prediction tool to guide bedside prognostication and clinical treatment decision making. This prognostic decision-making algorithm also shed light on the complex interactions between a number of risk factors in determining outcome after aneurysmal SAH.
Freund, Rudolf J; Sa, Ping
2006-01-01
The book provides complete coverage of the classical methods of statistical analysis. It is designed to give students an understanding of the purpose of statistical analyses, to allow the student to determine, at least to some degree, the correct type of statistical analyses to be performed in a given situation, and have some appreciation of what constitutes good experimental design
Boosted regression tree, table, and figure data
Spreadsheets are included here to support the manuscript Boosted Regression Tree Models to Explain Watershed Nutrient Concentrations and Biological Condition. This dataset is associated with the following publication:Golden , H., C. Lane , A. Prues, and E. D'Amico. Boosted Regression Tree Models to Explain Watershed Nutrient Concentrations and Biological Condition. JAWRA. American Water Resources Association, Middleburg, VA, USA, 52(5): 1251-1274, (2016).
Shuqiong Wu
2015-01-01
Full Text Available As a machine learning method, AdaBoost is widely applied to data classification and object detection because of its robustness and efficiency. AdaBoost constructs a global and optimal combination of weak classifiers based on a sample reweighting. It is known that this kind of combination improves the classification performance tremendously. As the popularity of AdaBoost increases, many variants have been proposed to improve the performance of AdaBoost. Then, a lot of comparison and review studies for AdaBoost variants have also been published. Some researchers compared different AdaBoost variants by experiments in their own fields, and others reviewed various AdaBoost variants by basically introducing these algorithms. However, there is a lack of mathematical analysis of the generalization abilities for different AdaBoost variants. In this paper, we analyze the generalization abilities of six AdaBoost variants in terms of classification margins. The six compared variants are Real AdaBoost, Gentle AdaBoost, Modest AdaBoost, Parameterized AdaBoost, Margin-pruning Boost, and Penalized AdaBoost. Finally, we use experiments to verify our analyses.
SMOOTH TRANSITION LOGISTIC REGRESSION MODEL TREE
RODRIGO PINTO MOREIRA
2008-01-01
Este trabalho tem como objetivo principal adaptar o modelo STR-Tree, o qual é a combinação de um modelo Smooth Transition Regression com Classification and Regression Tree (CART), a fim de utilizá-lo em Classificação. Para isto algumas alterações foram realizadas em sua forma estrutural e na estimação. Devido ao fato de estarmos fazendo classificação de variáveis dependentes binárias, se faz necessária a utilização das técnicas empregadas em Regressão Logística, dessa forma a estimação dos pa...
Ermarth, Anna; Bryce, Matthew; Woodward, Stephanie; Stoddard, Gregory; Book, Linda; Jensen, M Kyle
2017-03-01
Celiac disease is detected using serology and endoscopy analyses. We used multiple statistical analyses of a geographically isolated population in the United States to determine whether a single serum screening can identify individuals with celiac disease. We performed a retrospective study of 3555 pediatric patients (18 years old or younger) in the intermountain West region of the United States from January 1, 2008, through September 30, 2013. All patients had undergone serologic analyses for celiac disease, including measurement of antibodies to tissue transglutaminase (TTG) and/or deamidated gliadin peptide (DGP), and had duodenal biopsies collected within the following year. Modified Marsh criteria were used to identify patients with celiac disease. We developed models to identify patients with celiac disease using logistic regression and classification and regression tree (CART) analysis. Single use of a test for serum level of IgA against TTG identified patients with celiac disease with 90% sensitivity, 90% specificity, a 61% positive predictive value (PPV), a 90% negative predictive value, and an area under the receiver operating characteristic curve value of 0.91; these values were higher than those obtained from assays for IgA against DGP or IgG against TTG plus DGP. Not including the test for DGP antibody caused only 0.18% of celiac disease cases to be missed. Level of TTG IgA 7-fold the upper limit of normal (ULN) identified patients with celiac disease with a 96% PPV and 100% specificity. Using CART analysis, we found a level of TTG IgA 3.2-fold the ULN and higher to most accurately identify patients with celiac disease (PPV, 89%). Multivariable CART analysis showed that a level of TTG IgA 2.5-fold the ULN and higher was sufficient to identify celiac disease in patients with type 1 diabetes (PPV, 88%). Serum level of IgA against TTG in patients with versus those without trisomy 21 did not affect diagnosis predictability in CART analysis. In a population
Regression analysis by example
Chatterjee, Samprit; Hadi, Ali S
2012-01-01
.... The emphasis continues to be on exploratory data analysis rather than statistical theory. The coverage offers in-depth treatment of regression diagnostics, transformation, multicollinearity, logistic regression, and robust regression...
Wylie, Bruce K.; Howard, Daniel; Dahal, Devendra; Gilmanov, Tagir; Ji, Lei; Zhang, Li; Smith, Kelcy
2016-01-01
This paper presents the methodology and results of two ecological-based net ecosystem production (NEP) regression tree models capable of up scaling measurements made at various flux tower sites throughout the U.S. Great Plains. Separate grassland and cropland NEP regression tree models were trained using various remote sensing data and other biogeophysical data, along with 15 flux towers contributing to the grassland model and 15 flux towers for the cropland model. The models yielded weekly mean daily grassland and cropland NEP maps of the U.S. Great Plains at 250 m resolution for 2000–2008. The grassland and cropland NEP maps were spatially summarized and statistically compared. The results of this study indicate that grassland and cropland ecosystems generally performed as weak net carbon (C) sinks, absorbing more C from the atmosphere than they released from 2000 to 2008. Grasslands demonstrated higher carbon sink potential (139 g C·m−2·year−1) than non-irrigated croplands. A closer look into the weekly time series reveals the C fluctuation through time and space for each land cover type.
Bruce Wylie
2016-11-01
Full Text Available This paper presents the methodology and results of two ecological-based net ecosystem production (NEP regression tree models capable of up scaling measurements made at various flux tower sites throughout the U.S. Great Plains. Separate grassland and cropland NEP regression tree models were trained using various remote sensing data and other biogeophysical data, along with 15 flux towers contributing to the grassland model and 15 flux towers for the cropland model. The models yielded weekly mean daily grassland and cropland NEP maps of the U.S. Great Plains at 250 m resolution for 2000–2008. The grassland and cropland NEP maps were spatially summarized and statistically compared. The results of this study indicate that grassland and cropland ecosystems generally performed as weak net carbon (C sinks, absorbing more C from the atmosphere than they released from 2000 to 2008. Grasslands demonstrated higher carbon sink potential (139 g C·m−2·year−1 than non-irrigated croplands. A closer look into the weekly time series reveals the C fluctuation through time and space for each land cover type.
Classification and Regression Trees(CART) Theory and Applications
Timofeev, Roman
2004-01-01
This master thesis is devoted to Classification and Regression Trees (CART). CART is classification method which uses historical data to construct decision trees. Depending on available information about the dataset, classification tree or regression tree can be constructed. Constructed tree can be then used for classification of new observations. The first part of the thesis describes fundamental principles of tree construction, different splitting algorithms and pruning procedures. Seco...
Artur Wnorowski
2017-06-01
Full Text Available Tree saps are nourishing biological media commonly used for beverage and syrup production. Although the nutritional aspect of tree saps is widely acknowledged, the exact relationship between the sap composition, origin, and effect on the metabolic rate of human cells is still elusive. Thus, we collected saps from seven different tree species and conducted composition-activity analysis. Saps from trees of Betulaceae, but not from Salicaceae, Sapindaceae, nor Juglandaceae families, were increasing the metabolic rate of HepG2 cells, as measured using tetrazolium-based assay. Content of glucose, fructose, sucrose, chlorides, nitrates, sulphates, fumarates, malates, and succinates in sap samples varied across different tree species. Grade correspondence analysis clustered trees based on the saps’ chemical footprint indicating its usability in chemotaxonomy. Multiple regression modeling showed that glucose and fumarate present in saps from silver birch (Betula pendula Roth., black alder (Alnus glutinosa Gaertn., and European hornbeam (Carpinus betulus L. are positively affecting the metabolic activity of HepG2 cells.
Regression analysis by example
Chatterjee, Samprit
2012-01-01
Praise for the Fourth Edition: ""This book is . . . an excellent source of examples for regression analysis. It has been and still is readily readable and understandable."" -Journal of the American Statistical Association Regression analysis is a conceptually simple method for investigating relationships among variables. Carrying out a successful application of regression analysis, however, requires a balance of theoretical results, empirical rules, and subjective judgment. Regression Analysis by Example, Fifth Edition has been expanded
Inferring gene regression networks with model trees
Aguilar-Ruiz Jesus S
2010-10-01
Full Text Available Abstract Background Novel strategies are required in order to handle the huge amount of data produced by microarray technologies. To infer gene regulatory networks, the first step is to find direct regulatory relationships between genes building the so-called gene co-expression networks. They are typically generated using correlation statistics as pairwise similarity measures. Correlation-based methods are very useful in order to determine whether two genes have a strong global similarity but do not detect local similarities. Results We propose model trees as a method to identify gene interaction networks. While correlation-based methods analyze each pair of genes, in our approach we generate a single regression tree for each gene from the remaining genes. Finally, a graph from all the relationships among output and input genes is built taking into account whether the pair of genes is statistically significant. For this reason we apply a statistical procedure to control the false discovery rate. The performance of our approach, named REGNET, is experimentally tested on two well-known data sets: Saccharomyces Cerevisiae and E.coli data set. First, the biological coherence of the results are tested. Second the E.coli transcriptional network (in the Regulon database is used as control to compare the results to that of a correlation-based method. This experiment shows that REGNET performs more accurately at detecting true gene associations than the Pearson and Spearman zeroth and first-order correlation-based methods. Conclusions REGNET generates gene association networks from gene expression data, and differs from correlation-based methods in that the relationship between one gene and others is calculated simultaneously. Model trees are very useful techniques to estimate the numerical values for the target genes by linear regression functions. They are very often more precise than linear regression models because they can add just different linear
Subgroup finding via Bayesian additive regression trees.
Sivaganesan, Siva; Müller, Peter; Huang, Bin
2017-03-09
We provide a Bayesian decision theoretic approach to finding subgroups that have elevated treatment effects. Our approach separates the modeling of the response variable from the task of subgroup finding and allows a flexible modeling of the response variable irrespective of potential subgroups of interest. We use Bayesian additive regression trees to model the response variable and use a utility function defined in terms of a candidate subgroup and the predicted response for that subgroup. Subgroups are identified by maximizing the expected utility where the expectation is taken with respect to the posterior predictive distribution of the response, and the maximization is carried out over an a priori specified set of candidate subgroups. Our approach allows subgroups based on both quantitative and categorical covariates. We illustrate the approach using simulated data set study and a real data set. Copyright © 2017 John Wiley & Sons, Ltd.
Boosted Regression Tree Models to Explain Watershed ...
Boosted regression tree (BRT) models were developed to quantify the nonlinear relationships between landscape variables and nutrient concentrations in a mesoscale mixed land cover watershed during base-flow conditions. Factors that affect instream biological components, based on the Index of Biotic Integrity (IBI), were also analyzed. Seasonal BRT models at two spatial scales (watershed and riparian buffered area [RBA]) for nitrite-nitrate (NO2-NO3), total Kjeldahl nitrogen, and total phosphorus (TP) and annual models for the IBI score were developed. Two primary factors — location within the watershed (i.e., geographic position, stream order, and distance to a downstream confluence) and percentage of urban land cover (both scales) — emerged as important predictor variables. Latitude and longitude interacted with other factors to explain the variability in summer NO2-NO3 concentrations and IBI scores. BRT results also suggested that location might be associated with indicators of sources (e.g., land cover), runoff potential (e.g., soil and topographic factors), and processes not easily represented by spatial data indicators. Runoff indicators (e.g., Hydrological Soil Group D and Topographic Wetness Indices) explained a substantial portion of the variability in nutrient concentrations as did point sources for TP in the summer months. The results from our BRT approach can help prioritize areas for nutrient management in mixed-use and heavily impacted watershed
Hwang, Sang-Hyun; Pyo, Tina; Oh, Heung-Bum; Park, Hyun Jun; Lee, Kwan-Jeh
2013-01-16
The probability of a prostate cancer-positive biopsy result varies with PSA concentration. Thus, we applied information theory on classification and regression tree (CART) analysis for decision making predicting the probability of a biopsy result at various PSA concentrations. From 2007 to 2009, prostate biopsies were performed in 664 referred patients in a tertiary hospital. We created 2 CART models based on the information theory: one for moderate uncertainty (PSA concentration: 2.5-10 ng/ml) and the other for high uncertainty (PSA concentration: 10-25 ng/ml). The CART model for moderate uncertainty (n=321) had 3 splits based on PSA density (PSAD), hypoechoic nodules, and age and the other CART for high uncertainty (n=160) had 2 splits based on prostate volume and percent-free PSA. In this validation set, the patients (14.3% and 14.0% for moderate and high uncertainty groups, respectively) could avoid unnecessary biopsies without false-negative results. Using these CART models based on uncertainty information of PSA, the overall reduction in unnecessary prostate biopsies was 14.0-14.3% and CART models were simplified. Using uncertainty of laboratory results from information theoretic approach can provide additional information for decision analysis such as CART. Copyright © 2012 Elsevier B.V. All rights reserved.
Seber, George A F
2012-01-01
Concise, mathematically clear, and comprehensive treatment of the subject.* Expanded coverage of diagnostics and methods of model fitting.* Requires no specialized knowledge beyond a good grasp of matrix algebra and some acquaintance with straight-line regression and simple analysis of variance models.* More than 200 problems throughout the book plus outline solutions for the exercises.* This revision has been extensively class-tested.
Koon, Sharon; Petscher, Yaacov
2015-01-01
The purpose of this report was to explicate the use of logistic regression and classification and regression tree (CART) analysis in the development of early warning systems. It was motivated by state education leaders' interest in maintaining high classification accuracy while simultaneously improving practitioner understanding of the rules by…
Robertson, D.M.; Saad, D.A.; Heisey, D.M.
2006-01-01
Various approaches are used to subdivide large areas into regions containing streams that have similar reference or background water quality and that respond similarly to different factors. For many applications, such as establishing reference conditions, it is preferable to use physical characteristics that are not affected by human activities to delineate these regions. However, most approaches, such as ecoregion classifications, rely on land use to delineate regions or have difficulties compensating for the effects of land use. Land use not only directly affects water quality, but it is often correlated with the factors used to define the regions. In this article, we describe modifications to SPARTA (spatial regression-tree analysis), a relatively new approach applied to water-quality and environmental characteristic data to delineate zones with similar factors affecting water quality. In this modified approach, land-use-adjusted (residualized) water quality and environmental characteristics are computed for each site. Regression-tree analysis is applied to the residualized data to determine the most statistically important environmental characteristics describing the distribution of a specific water-quality constituent. Geographic information for small basins throughout the study area is then used to subdivide the area into relatively homogeneous environmental water-quality zones. For each zone, commonly used approaches are subsequently used to define its reference water quality and how its water quality responds to changes in land use. SPARTA is used to delineate zones of similar reference concentrations of total phosphorus and suspended sediment throughout the upper Midwestern part of the United States. ?? 2006 Springer Science+Business Media, Inc.
José A. Delgado
2012-01-01
Full Text Available Forest structural parameters such as quadratic mean diameter, basal area, and number of trees per unit area are important for the assessment of wood volume and biomass and represent key forest inventory attributes. Forest inventory information is required to support sustainable management, carbon accounting, and policy development activities. Digital image processing of remotely sensed imagery is increasingly utilized to assist traditional, more manual, methods in the estimation of forest structural attributes over extensive areas, also enabling evaluation of change over time. Empirical attribute estimation with remotely sensed data is frequently employed, yet with known limitations, especially over complex environments such as Mediterranean forests. In this study, the capacity of high spatial resolution (HSR imagery and related techniques to model structural parameters at the stand level (n = 490 in Mediterranean pines in Central Spain is tested using data from the commercial satellite QuickBird-2. Spectral and spatial information derived from multispectral and panchromatic imagery (2.4 m and 0.68 m sided pixels, respectively served to model structural parameters. Classification and Regression Tree Analysis (CART was selected for the modeling of attributes. Accurate models were produced of quadratic mean diameter (QMD (R2 = 0.8; RMSE = 0.13 m with an average error of 17% while basal area (BA models produced an average error of 22% (RMSE = 5.79 m2/ha. When the measured number of trees per unit area (N was categorized, as per frequent forest management practices, CART models correctly classified 70% of the stands, with all other stands classified in an adjacent class. The accuracy of the attributes estimated here is expected to be better when canopy cover is more open and attribute values are at the lower end of the range present, as related in the pattern of the residuals found in this study. Our findings indicate that attributes derived from
Paulsen, P; Smulders, F J M; Tichy, A; Aydin, A; Höck, C
2011-07-01
In a retrospective study on the microbiology of minced meat from small food businesses supplying directly to the consumer, the relative contribution of meat supplier, meat species and outlet where meat was minced was assessed by "Classification and Regression Tree" (CART) analysis. Samples (n=888) originated from 129 outlets of a single supermarket chain. Sampling units were 4-5 packs (pork, beef, and mixed pork-beef). Total aerobic counts (TACs) were 5.3±1.0 log CFU/g. In 75.6% of samples, E. coli were <1 log CFU/g. The proportion of "unsatisfactory" sample sets [as defined in Reg. (EC) 2073/2005] were 31.3 and 4.5% for TAC and E. coli, respectively. For classification according to TACs, the outlet where meat was minced and the "meat supplier" were the most important predictors. For E. coli, "outlet" was the most important predictor, but the limit of detection of 1 log CFU/g was not discriminative enough to allow further conclusions.
Combining regression trees and radial basis function networks.
Orr, M; Hallam, J; Takezawa, K; Murra, A; Ninomiya, S; Oide, M; Leonard, T
2000-12-01
We describe a method for non-parametric regression which combines regression trees with radial basis function networks. The method is similar to that of Kubat, who was first to suggest such a combination, but has some significant improvements. We demonstrate the features of the new method, compare its performance with other methods on DELVE data sets and apply it to a real world problem involving the classification of soybean plants from digital images.
Multiple linear regression analysis
Edwards, T. R.
1980-01-01
Program rapidly selects best-suited set of coefficients. User supplies only vectors of independent and dependent data and specifies confidence level required. Program uses stepwise statistical procedure for relating minimal set of variables to set of observations; final regression contains only most statistically significant coefficients. Program is written in FORTRAN IV for batch execution and has been implemented on NOVA 1200.
Yutao LIU
2011-10-01
Full Text Available Background and objective Erlotinib is a targeted therapy drug for non-small cell lung cancer (NSCLC. It has been proven that, there was evidence of various survival benefits derived from erlotinib in patients with different clinical features, but the results are conflicting. The aim of this study is to identify novel predictive factors and explore the interactions between clinical variables as well as their impact on the survival of Chinese patients with advanced NSCLC heavily treated with erlotinib. Methods The clinical and follow-up data of 105 Chinese NSCLC patients referred to the Cancer Hospital and Institute, Chinese Academy of Medical Sciences from September 2006 to September 2009 were analyzed. Multivariate analysis of progressive-free survival (PFS was performed using recursive partitioning referred to as the classification and regression tree (CART analysis. Results The median PFS of 105 eligible consecutive Chinese NSCLC patients was 5.0 months (95%CI: 2.9-7.1. CART analysis was performed for the initial, second, and third split in the lymph node involvement, the time of erlotinib administration, and smoking history. Four terminal subgroups were formed. The longer values for the median PFS were 11.0 months (95%CI: 8.9-13.1 for the subgroup with no lymph node metastasis and 10.0 months (95%CI: 7.9-12.1 for the subgroup with lymph node involvement, but not over the second-line erlotinib treatment with a smoking history ≤35 packs per year. The shorter values for the median PFS were 2.3 months (95%CI: 1.6-3.0 for the subgroup with lymph node metastasis and over the second-line erlotinib treatment, and 1.3 months (95%CI: 0.5-2.1 for the subgroup with lymph node metastasis, but not over the second-line erlotinib treatment with a smoking history >35 packs per year. Conclusion Lymph node metastasis, the time of erlotinib administration, and smoking history are closely correlated with the survival of advanced NSCLC patients with first- to
Iron Supplementation and Altitude: Decision Making Using a Regression Tree
Laura A. Garvican-Lewis, Andrew D. Govus, Peter Peeling, Chris R. Abbiss, Christopher J. Gore
2016-03-01
Full Text Available Altitude exposure increases the body’s need for iron (Gassmann and Muckenthaler, 2015, primarily to support accelerated erythropoiesis, yet clear supplementation guidelines do not exist. Athletes are typically recommended to ingest a daily oral iron supplement to facilitate altitude adaptations, and to help maintain iron balance. However, there is some debate as to whether athletes with otherwise healthy iron stores should be supplemented, due in part to concerns of iron overload. Excess iron in vital organs is associated with an increased risk of a number of conditions including cancer, liver disease and heart failure. Therefore clear guidelines are warranted and athletes should be discouraged from ‘self-prescribing” supplementation without medical advice. In the absence of prospective-controlled studies, decision tree analysis can be used to describe a data set, with the resultant regression tree serving as guide for clinical decision making. Here, we present a regression tree in the context of iron supplementation during altitude exposure, to examine the association between pre-altitude ferritin (Ferritin-Pre and the haemoglobin mass (Hbmass response, based on daily iron supplement dose. De-identified ferritin and Hbmass data from 178 athletes engaged in altitude training were extracted from the Australian Institute of Sport (AIS database. Altitude exposure was predominantly achieved via normobaric Live high: Train low (n = 147 at a simulated altitude of 3000 m for 2 to 4 weeks. The remaining athletes engaged in natural altitude training at venues ranging from 1350 to 2800 m for 3-4 weeks. Thus, the “hypoxic dose” ranged from ~890 km.h to ~1400 km.h. Ethical approval was granted by the AIS Human Ethics Committee, and athletes provided written informed consent. An in depth description and traditional analysis of the complete data set is presented elsewhere (Govus et al., 2015. Iron supplementation was prescribed by a sports physician
Lucinda Pfalzer
2013-06-01
Full Text Available Background/Purpose: Over 1/3 of adults over age 65 experiences at least one fall each year. This pilot report uses a classification regression tree analysis (CART to model the outcomes for balance/risk of falls from the Gentiva® Safe Strides® Program (SSP. Methods/Outcomes: SSP is a home-based balance/fall prevention program designed to treat root causes of a patient
González-Costa, Juan José; Reigosa, Manuel Joaquín; Matías, José María; Fernández-Covelo, Emma
2017-01-01
This study determines the influence of the different soil components and of the cation-exchange capacity on the adsorption and retention of different heavy metals: cadmium, chromium, copper, nickel, lead and zinc. In order to do so, regression models were created through decision trees and the importance of soil components was assessed. Used variables were: humified organic matter, specific cation-exchange capacity, percentages of sand and silt, proportions of Mn, Fe and Al oxides and hematite, and the proportion of quartz, plagioclase and mica, and the proportions of the different clays: kaolinite, vermiculite, gibbsite and chlorite. The most important components in the obtained models were vermiculite and gibbsite, especially for the adsorption of cadmium and zinc, while clays were less relevant. Oxides are less important than clays, especially for the adsorption of chromium and lead and the retention of chromium, copper and lead. PMID:28072849
González-Costa, Juan José; Reigosa, Manuel Joaquín; Matías, José María; Fernández-Covelo, Emma
2017-01-01
This study determines the influence of the different soil components and of the cation-exchange capacity on the adsorption and retention of different heavy metals: cadmium, chromium, copper, nickel, lead and zinc. In order to do so, regression models were created through decision trees and the importance of soil components was assessed. Used variables were: humified organic matter, specific cation-exchange capacity, percentages of sand and silt, proportions of Mn, Fe and Al oxides and hematite, and the proportion of quartz, plagioclase and mica, and the proportions of the different clays: kaolinite, vermiculite, gibbsite and chlorite. The most important components in the obtained models were vermiculite and gibbsite, especially for the adsorption of cadmium and zinc, while clays were less relevant. Oxides are less important than clays, especially for the adsorption of chromium and lead and the retention of chromium, copper and lead.
Data mining in psychological treatment research: a primer on classification and regression trees.
King, Matthew W; Resick, Patricia A
2014-10-01
Data mining of treatment study results can reveal unforeseen but critical insights, such as who receives the most benefit from treatment and under what circumstances. The usefulness and legitimacy of exploratory data analysis have received relatively little recognition, however, and analytic methods well suited to the task are not widely known in psychology. With roots in computer science and statistics, statistical learning approaches offer a credible option: These methods take a more inductive approach to building a model than is done in traditional regression, allowing the data greater role in suggesting the correct relationships between variables rather than imposing them a priori. Classification and regression trees are presented as a powerful, flexible exemplar of statistical learning methods. Trees allow researchers to efficiently identify useful predictors of an outcome and discover interactions between predictors without the need to anticipate and specify these in advance, making them ideal for revealing patterns that inform hypotheses about treatment effects. Trees can also provide a predictive model for forecasting outcomes as an aid to clinical decision making. This primer describes how tree models are constructed, how the results are interpreted and evaluated, and how trees overcome some of the complexities of traditional regression. Examples are drawn from randomized clinical trial data and highlight some interpretations of particular interest to treatment researchers. The limitations of tree models are discussed, and suggestions for further reading and choices in software are offered.
Allore, Heather; Tinetti, Mary E; Araujo, Katy L B; Hardy, Susan; Peduzzi, Peter
2005-02-01
Many important physiologic and clinical predictors are continuous. Clinical investigators and epidemiologists' interest in these predictors lies, in part, in the risk they pose for adverse outcomes, which may be continuous as well. The relationship between continuous predictors and a continuous outcome may be complex and difficult to interpret. Therefore, methods to detect levels of a predictor variable that predict the outcome and determine the threshold for clinical intervention would provide a beneficial tool for clinical investigators and epidemiologists. We present a case study using regression tree methodology to predict Social and Productive Activities score at 3 years using five modifiable impairments. The predictive ability of regression tree methodology was compared with multiple linear regression using two independent data sets, one for development and one for validation. The regression tree approach and the multiple linear regression model provided similar fit (model deviances) on the development cohort. In the validation cohort, the deviance of the multiple linear regression model was 31% greater than the regression tree approach. Regression tree analysis developed a better model of impairments predicting Social and Productive Activities score that may be more easily applied in research settings than multiple linear regression alone.
C. Quantin
2011-01-01
Full Text Available Cardiologists are interested in determining whether the type of hospital pathway followed by a patient is predictive of survival. The study objective was to determine whether accounting for hospital pathways in the selection of prognostic factors of one-year survival after acute myocardial infarction (AMI provided a more informative analysis than that obtained by the use of a standard regression tree analysis (CART method. Information on AMI was collected for 1095 hospitalized patients over an 18-month period. The construction of pathways followed by patients produced symbolic-valued observations requiring a symbolic regression tree analysis. This analysis was compared with the standard CART analysis using patients as statistical units described by standard data selected TIMI score as the primary predictor variable. For the 1011 (84, resp. patients with a lower (higher TIMI score, the pathway variable did not appear as a diagnostic variable until the third (second stage of the tree construction. For an ecological analysis, again TIMI score was the first predictor variable. However, in a symbolic regression tree analysis using hospital pathways as statistical units, the type of pathway followed was the key predictor variable, showing in particular that pathways involving early admission to cardiology units produced high one-year survival rates.
Chi-Ming Chu
2014-01-01
Full Text Available Background. Microarray technology shows great potential but previous studies were limited by small number of samples in the colorectal cancer (CRC research. The aims of this study are to investigate gene expression profile of CRCs by pooling cDNA microarrays using PAM, ANN, and decision trees (CART and C5.0. Methods. Pooled 16 datasets contained 88 normal mucosal tissues and 1186 CRCs. PAM was performed to identify significant expressed genes in CRCs and models of PAM, ANN, CART, and C5.0 were constructed for screening candidate genes via ranking gene order of significances. Results. The first screening identified 55 genes. The test accuracy of each model was over 0.97 averagely. Less than eight genes achieve excellent classification accuracy. Combining the results of four models, we found the top eight differential genes in CRCs; suppressor genes, CA7, SPIB, GUCA2B, AQP8, IL6R and CWH43; oncogenes, SPP1 and TCN1. Genes of higher significances showed lower variation in rank ordering by different methods. Conclusion. We adopted a two-tier genetic screen, which not only reduced the number of candidate genes but also yielded good accuracy (nearly 100%. This method can be applied to future studies. Among the top eight genes, CA7, TCN1, and CWH43 have not been reported to be related to CRC.
U.S. Environmental Protection Agency — Spreadsheets are included here to support the manuscript "Boosted Regression Tree Models to Explain Watershed Nutrient Concentrations and Biological Condition". This...
[Hyperspectral Estimation of Apple Tree Canopy LAI Based on SVM and RF Regression].
Han, Zhao-ying; Zhu, Xi-cun; Fang, Xian-yi; Wang, Zhuo-yuan; Wang, Ling; Zhao, Geng-Xing; Jiang, Yuan-mao
2016-03-01
Leaf area index (LAI) is the dynamic index of crop population size. Hyperspectral technology can be used to estimate apple canopy LAI rapidly and nondestructively. It can be provide a reference for monitoring the tree growing and yield estimation. The Red Fuji apple trees of full bearing fruit are the researching objects. Ninety apple trees canopies spectral reflectance and LAI values were measured by the ASD Fieldspec3 spectrometer and LAI-2200 in thirty orchards in constant two years in Qixia research area of Shandong Province. The optimal vegetation indices were selected by the method of correlation analysis of the original spectral reflectance and vegetation indices. The models of predicting the LAI were built with the multivariate regression analysis method of support vector machine (SVM) and random forest (RF). The new vegetation indices, GNDVI527, ND-VI676, RVI682, FD-NVI656 and GRVI517 and the previous two main vegetation indices, NDVI670 and NDVI705, are in accordance with LAI. In the RF regression model, the calibration set decision coefficient C-R2 of 0.920 and validation set decision coefficient V-R2 of 0.889 are higher than the SVM regression model by 0.045 and 0.033 respectively. The root mean square error of calibration set C-RMSE of 0.249, the root mean square error validation set V-RMSE of 0.236 are lower than that of the SVM regression model by 0.054 and 0.058 respectively. Relative analysis of calibrating error C-RPD and relative analysis of validation set V-RPD reached 3.363 and 2.520, 0.598 and 0.262, respectively, which were higher than the SVM regression model. The measured and predicted the scatterplot trend line slope of the calibration set and validation set C-S and V-S are close to 1. The estimation result of RF regression model is better than that of the SVM. RF regression model can be used to estimate the LAI of red Fuji apple trees in full fruit period.
Susan L. King
2003-01-01
The performance of two classifiers, logistic regression and neural networks, are compared for modeling noncatastrophic individual tree mortality for 21 species of trees in West Virginia. The output of the classifier is usually a continuous number between 0 and 1. A threshold is selected between 0 and 1 and all of the trees below the threshold are classified as...
Gang WU
2016-01-01
Full Text Available Objective To analyze the risk factors for prognosis in intracerebral hemorrhage using decision tree (classification and regression tree, CART model and logistic regression model. Methods CART model and logistic regression model were established according to the risk factors for prognosis of patients with cerebral hemorrhage. The differences in the results were compared between the two methods. Results Logistic regression analyses showed that hematoma volume (OR-value 0.953, initial Glasgow Coma Scale (GCS score (OR-value 1.210, pulmonary infection (OR-value 0.295, and basal ganglia hemorrhage (OR-value 0.336 were the risk factors for the prognosis of cerebral hemorrhage. The results of CART analysis showed that volume of hematoma and initial GCS score were the main factors affecting the prognosis of cerebral hemorrhage. The effects of two models on the prognosis of cerebral hemorrhage were similar (Z-value 0.402, P=0.688. Conclusions CART model has a similar value to that of logistic model in judging the prognosis of cerebral hemorrhage, and it is characterized by using transactional analysis between the risk factors, and it is more intuitive. DOI: 10.11855/j.issn.0577-7402.2015.12.13
Common pitfalls in statistical analysis: Logistic regression.
Ranganathan, Priya; Pramesh, C S; Aggarwal, Rakesh
2017-01-01
Logistic regression analysis is a statistical technique to evaluate the relationship between various predictor variables (either categorical or continuous) and an outcome which is binary (dichotomous). In this article, we discuss logistic regression analysis and the limitations of this technique.
Petersen, Mette Bisgaard; Tolver, Anders; Husted, Louise
2016-01-01
admitted with acute colitis (random...... forest analysis. Ponies and Icelandic horses made up 59% of the population, whilst the remaining 41% were horses. Blood lactate concentration at admission was the only individual parameter significantly associated with probability of survival to discharge (P
Principal component regression analysis with SPSS.
Liu, R X; Kuang, J; Gong, Q; Hou, X L
2003-06-01
The paper introduces all indices of multicollinearity diagnoses, the basic principle of principal component regression and determination of 'best' equation method. The paper uses an example to describe how to do principal component regression analysis with SPSS 10.0: including all calculating processes of the principal component regression and all operations of linear regression, factor analysis, descriptives, compute variable and bivariate correlations procedures in SPSS 10.0. The principal component regression analysis can be used to overcome disturbance of the multicollinearity. The simplified, speeded up and accurate statistical effect is reached through the principal component regression analysis with SPSS.
Regression Analysis by Example. 5th Edition
Chatterjee, Samprit; Hadi, Ali S.
2012-01-01
Regression analysis is a conceptually simple method for investigating relationships among variables. Carrying out a successful application of regression analysis, however, requires a balance of theoretical results, empirical rules, and subjective judgment. "Regression Analysis by Example, Fifth Edition" has been expanded and thoroughly…
Santana Isabel
2011-08-01
Full Text Available Abstract Background Dementia and cognitive impairment associated with aging are a major medical and social concern. Neuropsychological testing is a key element in the diagnostic procedures of Mild Cognitive Impairment (MCI, but has presently a limited value in the prediction of progression to dementia. We advance the hypothesis that newer statistical classification methods derived from data mining and machine learning methods like Neural Networks, Support Vector Machines and Random Forests can improve accuracy, sensitivity and specificity of predictions obtained from neuropsychological testing. Seven non parametric classifiers derived from data mining methods (Multilayer Perceptrons Neural Networks, Radial Basis Function Neural Networks, Support Vector Machines, CART, CHAID and QUEST Classification Trees and Random Forests were compared to three traditional classifiers (Linear Discriminant Analysis, Quadratic Discriminant Analysis and Logistic Regression in terms of overall classification accuracy, specificity, sensitivity, Area under the ROC curve and Press'Q. Model predictors were 10 neuropsychological tests currently used in the diagnosis of dementia. Statistical distributions of classification parameters obtained from a 5-fold cross-validation were compared using the Friedman's nonparametric test. Results Press' Q test showed that all classifiers performed better than chance alone (p Conclusions When taking into account sensitivity, specificity and overall classification accuracy Random Forests and Linear Discriminant analysis rank first among all the classifiers tested in prediction of dementia using several neuropsychological tests. These methods may be used to improve accuracy, sensitivity and specificity of Dementia predictions from neuropsychological testing.
Boosted Regression Tree Models to Explain Watershed Nutrient Concentrations and Biological Condition
Boosted regression tree (BRT) models were developed to quantify the nonlinear relationships between landscape variables and nutrient concentrations in a mesoscale mixed land cover watershed during base-flow conditions. Factors that affect instream biological components, based on ...
Boosted Regression Tree Models to Explain Watershed Nutrient Concentrations and Biological Condition
Boosted regression tree (BRT) models were developed to quantify the nonlinear relationships between landscape variables and nutrient concentrations in a mesoscale mixed land cover watershed during base-flow conditions. Factors that affect instream biological components, based on ...
Combining an additive and tree-based regression model simultaneously: STIMA
Dusseldorp, E.; Conversano, C.; Os, B.J. van
2010-01-01
Additive models and tree-based regression models are two main classes of statistical models used to predict the scores on a continuous response variable. It is known that additive models become very complex in the presence of higher order interaction effects, whereas some tree-based models, such as
Stefanie M. Herrmann
2013-10-01
Full Text Available Field trees are an integral part of the farmed parkland landscape in West Africa and provide multiple benefits to the local environment and livelihoods. While field trees have received increasing interest in the context of strengthening resilience to climate variability and change, the actual extent of farmed parkland and spatial patterns of tree cover are largely unknown. We used the rule-based predictive modeling tool Cubist® to estimate field tree cover in the west-central agricultural region of Senegal. A collection of rules and associated multiple linear regression models was constructed from (1 a reference dataset of percent tree cover derived from very high spatial resolution data (2 m Orbview as the dependent variable, and (2 ten years of 10-day 250 m Moderate Resolution Imaging Spectrometer (MODIS Normalized Difference Vegetation Index (NDVI composites and derived phenological metrics as independent variables. Correlation coefficients between modeled and reference percent tree cover of 0.88 and 0.77 were achieved for training and validation data respectively, with absolute mean errors of 1.07 and 1.03 percent tree cover. The resulting map shows a west-east gradient from high tree cover in the peri-urban areas of horticulture and arboriculture to low tree cover in the more sparsely populated eastern part of the study area. A comparison of current (2000s tree cover along this gradient with historic cover as seen on Corona images reveals dynamics of change but also areas of remarkable stability of field tree cover since 1968. The proposed modeling approach can help to identify locations of high and low tree cover in dryland environments and guide ground studies and management interventions aimed at promoting the integration of field trees in agricultural systems.
Suduan Chen
2014-01-01
Full Text Available As the fraudulent financial statement of an enterprise is increasingly serious with each passing day, establishing a valid forecasting fraudulent financial statement model of an enterprise has become an important question for academic research and financial practice. After screening the important variables using the stepwise regression, the study also matches the logistic regression, support vector machine, and decision tree to construct the classification models to make a comparison. The study adopts financial and nonfinancial variables to assist in establishment of the forecasting fraudulent financial statement model. Research objects are the companies to which the fraudulent and nonfraudulent financial statement happened between years 1998 to 2012. The findings are that financial and nonfinancial information are effectively used to distinguish the fraudulent financial statement, and decision tree C5.0 has the best classification effect 85.71%.
Functional linear regression via canonical analysis
He, Guozhong; Wang, Jane-Ling; Yang, Wenjing; 10.3150/09-BEJ228
2011-01-01
We study regression models for the situation where both dependent and independent variables are square-integrable stochastic processes. Questions concerning the definition and existence of the corresponding functional linear regression models and some basic properties are explored for this situation. We derive a representation of the regression parameter function in terms of the canonical components of the processes involved. This representation establishes a connection between functional regression and functional canonical analysis and suggests alternative approaches for the implementation of functional linear regression analysis. A specific procedure for the estimation of the regression parameter function using canonical expansions is proposed and compared with an established functional principal component regression approach. As an example of an application, we present an analysis of mortality data for cohorts of medflies, obtained in experimental studies of aging and longevity.
Using Regression Mixture Analysis in Educational Research
Cody S. Ding
2006-11-01
Full Text Available Conventional regression analysis is typically used in educational research. Usually such an analysis implicitly assumes that a common set of regression parameter estimates captures the population characteristics represented in the sample. In some situations, however, this implicit assumption may not be realistic, and the sample may contain several subpopulations such as high math achievers and low math achievers. In these cases, conventional regression models may provide biased estimates since the parameter estimates are constrained to be the same across subpopulations. This paper advocates the applications of regression mixture models, also known as latent class regression analysis, in educational research. Regression mixture analysis is more flexible than conventional regression analysis in that latent classes in the data can be identified and regression parameter estimates can vary within each latent class. An illustration of regression mixture analysis is provided based on a dataset of authentic data. The strengths and limitations of the regression mixture models are discussed in the context of educational research.
Applied regression analysis a research tool
Pantula, Sastry; Dickey, David
1998-01-01
Least squares estimation, when used appropriately, is a powerful research tool. A deeper understanding of the regression concepts is essential for achieving optimal benefits from a least squares analysis. This book builds on the fundamentals of statistical methods and provides appropriate concepts that will allow a scientist to use least squares as an effective research tool. Applied Regression Analysis is aimed at the scientist who wishes to gain a working knowledge of regression analysis. The basic purpose of this book is to develop an understanding of least squares and related statistical methods without becoming excessively mathematical. It is the outgrowth of more than 30 years of consulting experience with scientists and many years of teaching an applied regression course to graduate students. Applied Regression Analysis serves as an excellent text for a service course on regression for non-statisticians and as a reference for researchers. It also provides a bridge between a two-semester introduction to...
Ziping WANG
2011-09-01
Full Text Available Background and objective It has been proven that gefitinib produces only 10%-20% tumor regression in heavily pretreated, unselected non-small cell lung cancer (NSCLC patients as the second- and third-line setting. Asian, female, nonsmokers and adenocarcinoma are favorable factors; however, it is difficult to find a patient satisfying all the above clinical characteristics. The aim of this study is to identify novel predicting factors, and to explore the interactions between clinical variables and their impact on the survival of Chinese patients with advanced NSCLC who were heavily treated with gefitinib in the second- or third-line setting. Methods The clinical and follow-up data of 127 advanced NSCLC patients referred to the Cancer Hospital & Institute, Chinese Academy of Medical Sciences from March 2005 to March 2010 were analyzed. Multivariate analysis of progression-free survival (PFS was performed using recursive partitioning, which is referred to as the classification and regression tree (CART analysis. Results The median PFS of 127 eligible consecutive advanced NSCLC patients was 8.0 months (95%CI: 5.8-10.2. CART was performed with an initial split on first-line chemotherapy outcomes and a second split on patients’ age. Three terminal subgroups were formed. The median PFS of the three subsets ranged from 1.0 month (95%CI: 0.8-1.2 for those with progressive disease outcome after the first-line chemotherapy subgroup, 10 months (95%CI: 7.0-13.0 in patients with a partial response or stable disease in first-line chemotherapy and age <70, and 22.0 months for patients obtaining a partial response or stable disease in first-line chemotherapy at age 70-81 (95%CI: 3.8-40.1. Conclusion Partial response, stable disease in first-line chemotherapy and age ≥ 70 are closely correlated with long-term survival treated by gefitinib as a second- or third-line setting in advanced NSCLC. CART can be used to identify previously unappreciated patient
Kwon, Y.
2013-12-01
As evidence of global warming continue to increase, being able to predict forest response to climate changes, such as expected rise of temperature and precipitation, will be vital for maintaining the sustainability and productivity of forests. To map forest species redistribution by climate change scenario has been successful, however, most species redistribution maps lack mechanistic understanding to explain why trees grow under the novel conditions of chaining climate. Distributional map is only capable of predicting under the equilibrium assumption that the communities would exist following a prolonged period under the new climate. In this context, forest NPP as a surrogate for growth rate, the most important facet that determines stand dynamics, can lead to valid prediction on the transition stage to new vegetation-climate equilibrium as it represents changes in structure of forest reflecting site conditions and climate factors. The objective of this study is to develop forest growth map using regression tree analysis by extracting large-scale non-linear structures from both field-based FIA and remotely sensed MODIS data set. The major issue addressed in this approach is non-linear spatial patterns of forest attributes. Forest inventory data showed complex spatial patterns that reflect environmental states and processes that originate at different spatial scales. At broad scales, non-linear spatial trends in forest attributes and mixture of continuous and discrete types of environmental variables make traditional statistical (multivariate regression) and geostatistical (kriging) models inefficient. It calls into question some traditional underlying assumptions of spatial trends that uncritically accepted in forest data. To solve the controversy surrounding the suitability of forest data, regression tree analysis are performed using Software See5 and Cubist. Four publicly available data sets were obtained: First, field-based Forest Inventory and Analysis (USDA
Improving Cluster Analysis with Automatic Variable Selection Based on Trees
2014-12-01
ANALYSIS WITH AUTOMATIC VARIABLE SELECTION BASED ON TREES by Anton D. Orr December 2014 Thesis Advisor: Samuel E. Buttrey Second Reader...DATES COVERED Master’s Thesis 4. TITLE AND SUBTITLE IMPROVING CLUSTER ANALYSIS WITH AUTOMATIC VARIABLE SELECTION BASED ON TREES 5. FUNDING NUMBERS 6...2006 based on classification and regression trees to address problems with determining dissimilarity. Current algorithms do not simultaneously address
Regression Analysis and the Sociological Imagination
De Maio, Fernando
2014-01-01
Regression analysis is an important aspect of most introductory statistics courses in sociology but is often presented in contexts divorced from the central concerns that bring students into the discipline. Consequently, we present five lesson ideas that emerge from a regression analysis of income inequality and mortality in the USA and Canada.
Regression Analysis and the Sociological Imagination
De Maio, Fernando
2014-01-01
Regression analysis is an important aspect of most introductory statistics courses in sociology but is often presented in contexts divorced from the central concerns that bring students into the discipline. Consequently, we present five lesson ideas that emerge from a regression analysis of income inequality and mortality in the USA and Canada.
Bou Kheir, Rania; Greve, Mogens Humlekrog; Deroin, Jean-Paul
2013-01-01
Soil contamination by heavy metals has become a widespread dangerous problem in many parts of the world, including the Mediterranean environments. This is closely related to the increase irrigation by waste waters, to the uncontrolled application of sewage sludge, industrial effluents, pesticides...... coastal area situated in northern Lebanon using a geographic information system (GIS) and regression-tree analysis. The chosen area represents a typical case study of Mediterranean coastal landscape with deteriorating environment. Fifteen environmental parameters (parent material, soil type, p......H, hydraulical conductivity, organic matter, stoniness ratio, soil depth, slope gradient, slope aspect, slope curvature, land cover/use, distance to drainage line, proximity to roads, nearness to cities, and surroundings to waste areas) were generated from satellite imageries, Digital Elevation Models (DEMs...
Rothwell, James J; Futter, Martyn N; Dise, Nancy B
2008-11-01
Often, there is a non-linear relationship between atmospheric dissolved inorganic nitrogen (DIN) input and DIN leaching that is poorly captured by existing models. We present the first application of the non-parametric classification and regression tree approach to evaluate the key environmental drivers controlling DIN leaching from European forests. DIN leaching was classified as low (15kg N ha(-1) year(-1)) at 215 sites across Europe. The analysis identified throughfall NO(3)(-) deposition, acid deposition, hydrology, soil type, the carbon content of the soil, and the legacy of historic N deposition as the dominant drivers of DIN leaching for these forests. Ninety four percent of sites were successfully classified into the appropriate leaching category. This approach shows promise for understanding complex ecosystem responses to a wide range of anthropogenic stressors as well as an improved method for identifying risk and targeting pollution mitigation strategies in forest ecosystems.
Maroco, João; Silva, Dina; Rodrigues, Ana; Guerreiro, Manuela; Santana, Isabel; de Mendonça, Alexandre
2011-08-17
Dementia and cognitive impairment associated with aging are a major medical and social concern. Neuropsychological testing is a key element in the diagnostic procedures of Mild Cognitive Impairment (MCI), but has presently a limited value in the prediction of progression to dementia. We advance the hypothesis that newer statistical classification methods derived from data mining and machine learning methods like Neural Networks, Support Vector Machines and Random Forests can improve accuracy, sensitivity and specificity of predictions obtained from neuropsychological testing. Seven non parametric classifiers derived from data mining methods (Multilayer Perceptrons Neural Networks, Radial Basis Function Neural Networks, Support Vector Machines, CART, CHAID and QUEST Classification Trees and Random Forests) were compared to three traditional classifiers (Linear Discriminant Analysis, Quadratic Discriminant Analysis and Logistic Regression) in terms of overall classification accuracy, specificity, sensitivity, Area under the ROC curve and Press'Q. Model predictors were 10 neuropsychological tests currently used in the diagnosis of dementia. Statistical distributions of classification parameters obtained from a 5-fold cross-validation were compared using the Friedman's nonparametric test. Press' Q test showed that all classifiers performed better than chance alone (p Machines showed the larger overall classification accuracy (Median (Me) = 0.76) an area under the ROC (Me = 0.90). However this method showed high specificity (Me = 1.0) but low sensitivity (Me = 0.3). Random Forest ranked second in overall accuracy (Me = 0.73) with high area under the ROC (Me = 0.73) specificity (Me = 0.73) and sensitivity (Me = 0.64). Linear Discriminant Analysis also showed acceptable overall accuracy (Me = 0.66), with acceptable area under the ROC (Me = 0.72) specificity (Me = 0.66) and sensitivity (Me = 0.64). The remaining classifiers showed overall classification accuracy above a
Yoonseok Shin
2015-01-01
Among the recent data mining techniques available, the boosting approach has attracted a great deal of attention because of its effective learning algorithm and strong boundaries in terms of its generalization performance. However, the boosting approach has yet to be used in regression problems within the construction domain, including cost estimations, but has been actively utilized in other domains. Therefore, a boosting regression tree (BRT) is applied to cost estimations at the early stag...
Incomplete meteorological data has been a problem in environmental modeling studies. The objective of this work was to develop a technique to reconstruct missing daily precipitation data in the central part of Chesapeake Bay Watershed using regression trees (RT) and artificial neural networks (ANN)....
Multi-site solar power forecasting using gradient boosted regression trees
Persson, Caroline Stougård; Bacher, Peder; Shiga, Takahiro
2017-01-01
generation and relevant meteorological variables related to 42 individual PV rooftop installations are used to train a gradient boosted regression tree (GBRT) model. When compared to single-site linear autoregressive and variations of GBRT models the multi-site model shows competitive results in terms...
Heteroscedastic regression analysis method for mixed data
FU Hui-min; YUE Xiao-rui
2011-01-01
The heteroscedastic regression model was established and the heteroscedastic regression analysis method was presented for mixed data composed of complete data, type- I censored data and type- Ⅱ censored data from the location-scale distribution. The best unbiased estimations of regression coefficients, as well as the confidence limits of the location parameter and scale parameter were given. Furthermore, the point estimations and confidence limits of percentiles were obtained. Thus, the traditional multiple regression analysis method which is only suitable to the complete data from normal distribution can be extended to the cases of heteroscedastic mixed data and the location-scale distribution. So the presented method has a broad range of promising applications.
Naghibi, Seyed Amir; Pourghasemi, Hamid Reza; Dixon, Barnali
2016-01-01
Groundwater is considered one of the most valuable fresh water resources. The main objective of this study was to produce groundwater spring potential maps in the Koohrang Watershed, Chaharmahal-e-Bakhtiari Province, Iran, using three machine learning models: boosted regression tree (BRT), classification and regression tree (CART), and random forest (RF). Thirteen hydrological-geological-physiographical (HGP) factors that influence locations of springs were considered in this research. These factors include slope degree, slope aspect, altitude, topographic wetness index (TWI), slope length (LS), plan curvature, profile curvature, distance to rivers, distance to faults, lithology, land use, drainage density, and fault density. Subsequently, groundwater spring potential was modeled and mapped using CART, RF, and BRT algorithms. The predicted results from the three models were validated using the receiver operating characteristics curve (ROC). From 864 springs identified, 605 (≈70 %) locations were used for the spring potential mapping, while the remaining 259 (≈30 %) springs were used for the model validation. The area under the curve (AUC) for the BRT model was calculated as 0.8103 and for CART and RF the AUC were 0.7870 and 0.7119, respectively. Therefore, it was concluded that the BRT model produced the best prediction results while predicting locations of springs followed by CART and RF models, respectively. Geospatially integrated BRT, CART, and RF methods proved to be useful in generating the spring potential map (SPM) with reasonable accuracy.
Podio, Natalia S; López-Froilán, Rebeca; Ramirez-Moreno, Esther; Bertrand, Lidwina; Baroni, María V; Pérez-Rodríguez, María L; Sánchez-Mata, María-Cortes; Wunderlin, Daniel A
2015-11-01
The aim of this study was to evaluate changes in polyphenol profile and antioxidant capacity of five soluble coffees throughout a simulated gastro-intestinal digestion, including absorption through a dialysis membrane. Our results demonstrate that both polyphenol content and antioxidant capacity were characteristic for each type of studied coffee, showing a drop after dialysis. Twenty-seven compounds were identified in coffee by HPLC-MS, while only 14 of them were found after dialysis. Green+roasted coffee blend and chicory+coffee blend showed the highest and lowest content of polyphenols and antioxidant capacity before in vitro digestion and after dialysis, respectively. Canonical correlation analysis showed significant correlation between the antioxidant capacity and the polyphenol profile before digestion and after dialysis. Furthermore, boosted regression trees analysis (BRT) showed that only four polyphenol compounds (5-p-coumaroylquinic acid, quinic acid, coumaroyl tryptophan conjugated, and 5-O-caffeoylquinic acid) appear to be the most relevant to explain the antioxidant capacity after dialysis, these compounds being the most bioaccessible after dialysis. To our knowledge, this is the first report matching the antioxidant capacity of foods with the polyphenol profile by BRT, which opens an interesting method of analysis for future reports on the antioxidant capacity of foods.
Buchner, Florian; Wasem, Jürgen; Schillo, Sonja
2017-01-01
Risk equalization formulas have been refined since their introduction about two decades ago. Because of the complexity and the abundance of possible interactions between the variables used, hardly any interactions are considered. A regression tree is used to systematically search for interactions, a methodologically new approach in risk equalization. Analyses are based on a data set of nearly 2.9 million individuals from a major German social health insurer. A two-step approach is applied: In the first step a regression tree is built on the basis of the learning data set. Terminal nodes characterized by more than one morbidity-group-split represent interaction effects of different morbidity groups. In the second step the 'traditional' weighted least squares regression equation is expanded by adding interaction terms for all interactions detected by the tree, and regression coefficients are recalculated. The resulting risk adjustment formula shows an improvement in the adjusted R(2) from 25.43% to 25.81% on the evaluation data set. Predictive ratios are calculated for subgroups affected by the interactions. The R(2) improvement detected is only marginal. According to the sample level performance measures used, not involving a considerable number of morbidity interactions forms no relevant loss in accuracy. Copyright © 2015 John Wiley & Sons, Ltd.
Relative risk regression analysis of epidemiologic data.
Prentice, R L
1985-11-01
Relative risk regression methods are described. These methods provide a unified approach to a range of data analysis problems in environmental risk assessment and in the study of disease risk factors more generally. Relative risk regression methods are most readily viewed as an outgrowth of Cox's regression and life model. They can also be viewed as a regression generalization of more classical epidemiologic procedures, such as that due to Mantel and Haenszel. In the context of an epidemiologic cohort study, relative risk regression methods extend conventional survival data methods and binary response (e.g., logistic) regression models by taking explicit account of the time to disease occurrence while allowing arbitrary baseline disease rates, general censorship, and time-varying risk factors. This latter feature is particularly relevant to many environmental risk assessment problems wherein one wishes to relate disease rates at a particular point in time to aspects of a preceding risk factor history. Relative risk regression methods also adapt readily to time-matched case-control studies and to certain less standard designs. The uses of relative risk regression methods are illustrated and the state of development of these procedures is discussed. It is argued that asymptotic partial likelihood estimation techniques are now well developed in the important special case in which the disease rates of interest have interpretations as counting process intensity functions. Estimation of relative risks processes corresponding to disease rates falling outside this class has, however, received limited attention. The general area of relative risk regression model criticism has, as yet, not been thoroughly studied, though a number of statistical groups are studying such features as tests of fit, residuals, diagnostics and graphical procedures. Most such studies have been restricted to exponential form relative risks as have simulation studies of relative risk estimation
Prediction of cadmium enrichment in reclaimed coastal soils by classification and regression tree
Ru, Feng; Yin, Aijing; Jin, Jiaxin; Zhang, Xiuying; Yang, Xiaohui; Zhang, Ming; Gao, Chao
2016-08-01
Reclamation of coastal land is one of the most common ways to obtain land resources in China. However, it has long been acknowledged that the artificial interference with coastal land has disadvantageous effects, such as heavy metal contamination. This study aimed to develop a prediction model for cadmium enrichment levels and assess the importance of affecting factors in typical reclaimed land in Eastern China (DFCL: Dafeng Coastal Land). Two hundred and twenty seven surficial soil/sediment samples were collected and analyzed to identify the enrichment levels of cadmium and the possible affecting factors in soils and sediments. The classification and regression tree (CART) model was applied in this study to predict cadmium enrichment levels. The prediction results showed that cadmium enrichment levels assessed by the CART model had an accuracy of 78.0%. The CART model could extract more information on factors affecting the environmental behavior of cadmium than correlation analysis. The integration of correlation analysis and the CART model showed that fertilizer application and organic carbon accumulation were the most important factors affecting soil/sediment cadmium enrichment levels, followed by particle size effects (Al2O3, TFe2O3 and SiO2), contents of Cl and S, surrounding construction areas and reclamation history.
Prediction of tissue-specific cis-regulatory modules using Bayesian networks and regression trees
Chen Xiaoyu
2007-12-01
Full Text Available Abstract Background In vertebrates, a large part of gene transcriptional regulation is operated by cis-regulatory modules. These modules are believed to be regulating much of the tissue-specificity of gene expression. Results We develop a Bayesian network approach for identifying cis-regulatory modules likely to regulate tissue-specific expression. The network integrates predicted transcription factor binding site information, transcription factor expression data, and target gene expression data. At its core is a regression tree modeling the effect of combinations of transcription factors bound to a module. A new unsupervised EM-like algorithm is developed to learn the parameters of the network, including the regression tree structure. Conclusion Our approach is shown to accurately identify known human liver and erythroid-specific modules. When applied to the prediction of tissue-specific modules in 10 different tissues, the network predicts a number of important transcription factor combinations whose concerted binding is associated to specific expression.
Discrete Discriminant analysis based on tree-structured graphical models
Perez de la Cruz, Gonzalo; Eslava, Guillermina
The purpose of this paper is to illustrate the potential use of discriminant analysis based on tree{structured graphical models for discrete variables. This is done by comparing its empirical performance using estimated error rates for real and simulated data. The results show that discriminant...... analysis based on tree{structured graphical models is a simple nonlinear method competitive with, and sometimes superior to, other well{known linear methods like those assuming mutual independence between variables and linear logistic regression....
Discrete Discriminant analysis based on tree-structured graphical models
Perez de la Cruz, Gonzalo; Eslava, Guillermina
The purpose of this paper is to illustrate the potential use of discriminant analysis based on tree{structured graphical models for discrete variables. This is done by comparing its empirical performance using estimated error rates for real and simulated data. The results show that discriminant a...... analysis based on tree{structured graphical models is a simple nonlinear method competitive with, and sometimes superior to, other well{known linear methods like those assuming mutual independence between variables and linear logistic regression....
Paap, M.C.S.; Eggen, T.J.H.M.; Veldkamp, B.P.
2012-01-01
Standardized tests often group items around a common stimulus. Such groupings of items are called testlets. The potential dependency among items within a testlet is generally ignored in practice, even though a basic assumption of item response theory (IRT) is that individual items are independent of one another. A technique called tree-based regression (TBR) was applied to identify key features of stimuli that could properly predict the dependence structure of testlet data. Knowledge about th...
Development of prognostic indicators using Classification And Regression Trees (CART) for survival
Nunn, Martha E.; Fan, Juanjuan; Su, Xiaogang; McGuire, Michael K.
2012-01-01
The development of an accurate prognosis is an integral component of treatment planning in the practice of periodontics. Prior work has evaluated the validity of using various clinical measured parameters for assigning periodontal prognosis as well as for predicting tooth survival and change in clinical conditions over time. We critically review the application of multivariate Classification And Regression Trees (CART) for survival in developing evidence-based periodontal prognostic indicator...
Nishanee Rampersad
2017-01-01
Full Text Available Background: Assessment of intraocular pressure (IOP is an important test in glaucoma. In addition, anterior segment variables may be useful in screening for glaucoma risk. Studies have investigated the associations between IOP and anterior segment variables using traditional statistical methods. The classification and regression tree (CART method provides another dimension to detect important variables in a relationship automatically.Aim: To identify the critical factors that influence IOP using a regression tree.Methods: A quantitative cross-sectional research design was used. Anterior segment variables were measured in 700 participants using the iVue100 optical coherence tomographer, Oculus Keratograph and Nidek US-500 ultrasonographer. A Goldmann applanation tonometer was used to measure IOP. Data from only the right eyes were analysed because of high levels of interocular symmetry. A regression tree model was generated with the CART method and Pearson’s correlation coefficients were used to assess the relationships between the ocular variables.Results: The mean IOP for the entire sample was 14.63 mmHg ± 2.40 mmHg. The CART method selected three anterior segment variables in the regression tree model. Central corneal thickness was the most important variable with a cut-off value of 527 µm. The other important variables included average paracentral corneal thickness and axial anterior chamber depth. Corneal thickness measurements increased towards the periphery and were significantly correlated with IOP (r ≥ 0.50, p ≤ 0.001.Conclusion: The CART method identified the anterior segment variables that influenced IOP. Understanding the relationship between IOP and anterior segment variables may help to clinically identify patients with ocular risk factors associated with elevated IOPs.
Lampa, Erik; Lind, Lars; Lind, Monica P.; Bornefalk-Hermansson, Anna
2014-01-01
Background: There is a need to evaluate complex interaction effects on human health, such as those induced by mixtures of environmental contaminants. The usual approach is to formulate an additive statistical model and check for departures using product terms between the variables of interest. In this paper, we present an approach to search for interaction effects among several variables using boosted regression trees. Methods: We simulate a continuous outcome from real data on 27 environment...
Risk assessment of dental caries by using Classification and Regression Trees.
Ito, Ataru; Hayashi, Mikako; Hamasaki, Toshimitsu; Ebisu, Shigeyuki
2011-06-01
Being able to predict an individual's risks of dental caries would offer a potentially huge natural step forward toward better oral heath. As things stand, preventive treatment against caries is mostly carried out without risk assessment because there is no proven way to analyse an individual's risk factors. The purpose of this study was to try to identify those patients with high and low risk of caries by using Classification and Regression Trees (CART). In this historical cohort study, data from 442 patients in a general practice who met the inclusion criteria were analysed. CART was applied to the data to seek a model for predicting caries by using the following parameters according to each patient: age, number of carious teeth, numbers of cariogenic bacteria, the secretion rate and buffer capacity of saliva, and compliance with a prevention programme. The risks of caries were presented by odds ratios. Multiple logistic regression analysis was performed to confirm the results obtained by CART. CART identified high and low risk patients for primary caries with relative odds ratios of 0.41 (95%CI: 0.22-0.77, p = 0.0055) and 2.88 (95%CI: 1.49-5.59, p = 0.0018) according the numbers of cariogenic bacteria. High and low risk patients for secondary caries were also identified with the odds ratios of 0.07 (95%CI: 0.01-0.55, p = 0.00109) and 7.00 (95%CI: 3.50-13.98, p caries. Cariogenic bacteria play a leading role in the incidence of caries. CART proved effective in identifying an individual patient's risk of caries. Copyright © 2011 Elsevier Ltd. All rights reserved.
Robust Mediation Analysis Based on Median Regression
Yuan, Ying; MacKinnon, David P.
2014-01-01
Mediation analysis has many applications in psychology and the social sciences. The most prevalent methods typically assume that the error distribution is normal and homoscedastic. However, this assumption may rarely be met in practice, which can affect the validity of the mediation analysis. To address this problem, we propose robust mediation analysis based on median regression. Our approach is robust to various departures from the assumption of homoscedasticity and normality, including heavy-tailed, skewed, contaminated, and heteroscedastic distributions. Simulation studies show that under these circumstances, the proposed method is more efficient and powerful than standard mediation analysis. We further extend the proposed robust method to multilevel mediation analysis, and demonstrate through simulation studies that the new approach outperforms the standard multilevel mediation analysis. We illustrate the proposed method using data from a program designed to increase reemployment and enhance mental health of job seekers. PMID:24079925
Functional data analysis of generalized regression quantiles
Guo, Mengmeng
2013-11-05
Generalized regression quantiles, including the conditional quantiles and expectiles as special cases, are useful alternatives to the conditional means for characterizing a conditional distribution, especially when the interest lies in the tails. We develop a functional data analysis approach to jointly estimate a family of generalized regression quantiles. Our approach assumes that the generalized regression quantiles share some common features that can be summarized by a small number of principal component functions. The principal component functions are modeled as splines and are estimated by minimizing a penalized asymmetric loss measure. An iterative least asymmetrically weighted squares algorithm is developed for computation. While separate estimation of individual generalized regression quantiles usually suffers from large variability due to lack of sufficient data, by borrowing strength across data sets, our joint estimation approach significantly improves the estimation efficiency, which is demonstrated in a simulation study. The proposed method is applied to data from 159 weather stations in China to obtain the generalized quantile curves of the volatility of the temperature at these stations. © 2013 Springer Science+Business Media New York.
E. Brito-Rocha
Full Text Available Abstract Individual leaf area (LA is a key variable in studies of tree ecophysiology because it directly influences light interception, photosynthesis and evapotranspiration of adult trees and seedlings. We analyzed the leaf dimensions (length – L and width – W of seedlings and adults of seven Neotropical rainforest tree species (Brosimum rubescens, Manilkara maxima, Pouteria caimito, Pouteria torta, Psidium cattleyanum, Symphonia globulifera and Tabebuia stenocalyx with the objective to test the feasibility of single regression models to estimate LA of both adults and seedlings. In southern Bahia, Brazil, a first set of data was collected between March and October 2012. From the seven species analyzed, only two (P. cattleyanum and T. stenocalyx had very similar relationships between LW and LA in both ontogenetic stages. For these two species, a second set of data was collected in August 2014, in order to validate the single models encompassing adult and seedlings. Our results show the possibility of development of models for predicting individual leaf area encompassing different ontogenetic stages for tropical tree species. The development of these models was more dependent on the species than the differences in leaf size between seedlings and adults.
Shokouh Taghipour Zahir
2013-01-01
Full Text Available Purpose. We sought to investigate the utility of classification and regression trees (CART classifier to differentiate benign from malignant nodules in patients referred for thyroid surgery. Methods. Clinical and demographic data of 271 patients referred to the Sadoughi Hospital during 2006–2011 were collected. In a two-step approach, a CART classifier was employed to differentiate patients with a high versus low risk of thyroid malignancy. The first step served as the screening procedure and was tailored to produce as few false negatives as possible. The second step identified those with the lowest risk of malignancy, chosen from a high risk population. Sensitivity, specificity, positive and negative predictive values (PPV and NPV of the optimal tree were calculated. Results. In the first step, age, sex, and nodule size contributed to the optimal tree. Ultrasonographic features were employed in the second step with hypoechogenicity and/or microcalcifications yielding the highest discriminatory ability. The combined tree produced a sensitivity and specificity of 80.0% (95% CI: 29.9–98.9 and 94.1% (95% CI: 78.9–99.0, respectively. NPV and PPV were 66.7% (41.1–85.6 and 97.0% (82.5–99.8, respectively. Conclusion. CART classifier reliably identifies patients with a low risk of malignancy who can avoid unnecessary surgery.
Human action analysis with randomized trees
Yu, Gang; Liu, Zicheng
2014-01-01
This book will provide a comprehensive overview on human action analysis with randomized trees. It will cover both the supervised random trees and the unsupervised random trees. When there are sufficient amount of labeled data available, supervised random trees provides a fast method for space-time interest point matching. When labeled data is minimal as in the case of example-based action search, unsupervised random trees is used to leverage the unlabelled data. We describe how the randomized trees can be used for action classification, action detection, action search, and action prediction.
徐迎迎
2013-01-01
随着信息技术的不断发展，应用商业智能技术进行数据挖掘与分析对商家来说也越来越重要，分类回归树和神经网络算法是数据挖掘的经典算法，其广泛运用在数据分析、预测和评估等方面。文章分别运用分类回归树和神经网络算法对零售商品采取促销方案后收入变化的数据进行分析，并建立相应的模型对促销方案效果进行预测。%with the development of information technology, the application of business intelligence technology of data mining and analysis for the merchants is more and more important, classification and regression tree and neural network algorithm is a classical algorithm of data mining, it is widely used in the data analysis, prediction and evaluation. This paper using classification and regression tree analysis and neural network algorithm for retail commodity take promotion plan changes in income after the data, and to establish the corresponding model to predict the effect of promotion.
M. Saki
2013-03-01
Full Text Available The relationship between plant species and environmental factors has always been a central issue in plant ecology. With rising power of statistical techniques, geo-statistics and geographic information systems (GIS, the development of predictive habitat distribution models of organisms has rapidly increased in ecology. This study aimed to evaluate the ability of Logistic Regression Tree model to create potential habitat map of Astragalus verus. This species produces Tragacanth and has economic value. A stratified- random sampling was applied to 100 sites (50 presence- 50 absence of given species, and produced environmental and edaphic factors maps by using Kriging and Inverse Distance Weighting methods in the ArcGIS software for the whole study area. Relationships between species occurrence and environmental factors were determined by Logistic Regression Tree model and extended to the whole study area. The results indicated species occurrence has strong correlation with environmental factors such as mean daily temperature and clay, EC and organic carbon content of the soil. Species occurrence showed direct relationship with mean daily temperature and clay and organic carbon, and inverse relationship with EC. Model accuracy was evaluated both by Cohen’s kappa statistics (κ and by area under Receiver Operating Characteristics curve based on independent test data set. Their values (kappa=0.9, Auc of ROC=0.96 indicated the high power of LRT to create potential habitat map on local scales. This model, therefore, can be applied to recognize potential sites for rangeland reclamation projects.
Credit Scoring Problem Based on Regression Analysis
Khassawneh, Bashar Suhil Jad Allah
2014-01-01
ABSTRACT: This thesis provides an explanatory introduction to the regression models of data mining and contains basic definitions of key terms in the linear, multiple and logistic regression models. Meanwhile, the aim of this study is to illustrate fitting models for the credit scoring problem using simple linear, multiple linear and logistic regression models and also to analyze the found model functions by statistical tools. Keywords: Data mining, linear regression, logistic regression....
Shin, Yoonseok
2015-01-01
Among the recent data mining techniques available, the boosting approach has attracted a great deal of attention because of its effective learning algorithm and strong boundaries in terms of its generalization performance. However, the boosting approach has yet to be used in regression problems within the construction domain, including cost estimations, but has been actively utilized in other domains. Therefore, a boosting regression tree (BRT) is applied to cost estimations at the early stage of a construction project to examine the applicability of the boosting approach to a regression problem within the construction domain. To evaluate the performance of the BRT model, its performance was compared with that of a neural network (NN) model, which has been proven to have a high performance in cost estimation domains. The BRT model has shown results similar to those of NN model using 234 actual cost datasets of a building construction project. In addition, the BRT model can provide additional information such as the importance plot and structure model, which can support estimators in comprehending the decision making process. Consequently, the boosting approach has potential applicability in preliminary cost estimations in a building construction project.
Kritski Afrânio
2006-02-01
Full Text Available Abstract Background Smear negative pulmonary tuberculosis (SNPT accounts for 30% of pulmonary tuberculosis cases reported yearly in Brazil. This study aimed to develop a prediction model for SNPT for outpatients in areas with scarce resources. Methods The study enrolled 551 patients with clinical-radiological suspicion of SNPT, in Rio de Janeiro, Brazil. The original data was divided into two equivalent samples for generation and validation of the prediction models. Symptoms, physical signs and chest X-rays were used for constructing logistic regression and classification and regression tree models. From the logistic regression, we generated a clinical and radiological prediction score. The area under the receiver operator characteristic curve, sensitivity, and specificity were used to evaluate the model's performance in both generation and validation samples. Results It was possible to generate predictive models for SNPT with sensitivity ranging from 64% to 71% and specificity ranging from 58% to 76%. Conclusion The results suggest that those models might be useful as screening tools for estimating the risk of SNPT, optimizing the utilization of more expensive tests, and avoiding costs of unnecessary anti-tuberculosis treatment. Those models might be cost-effective tools in a health care network with hierarchical distribution of scarce resources.
Yoonseok Shin
2015-01-01
Full Text Available Among the recent data mining techniques available, the boosting approach has attracted a great deal of attention because of its effective learning algorithm and strong boundaries in terms of its generalization performance. However, the boosting approach has yet to be used in regression problems within the construction domain, including cost estimations, but has been actively utilized in other domains. Therefore, a boosting regression tree (BRT is applied to cost estimations at the early stage of a construction project to examine the applicability of the boosting approach to a regression problem within the construction domain. To evaluate the performance of the BRT model, its performance was compared with that of a neural network (NN model, which has been proven to have a high performance in cost estimation domains. The BRT model has shown results similar to those of NN model using 234 actual cost datasets of a building construction project. In addition, the BRT model can provide additional information such as the importance plot and structure model, which can support estimators in comprehending the decision making process. Consequently, the boosting approach has potential applicability in preliminary cost estimations in a building construction project.
Remaining Phosphorus Estimate Through Multiple Regression Analysis
M. E. ALVES; A. LAVORENTI
2006-01-01
The remaining phosphorus (Prem), P concentration that remains in solution after shaking soil with 0.01 mol L-1 CaCl2 containing 60 μg mL-1 P, is a very useful index for studies related to the chemistry of variable charge soils. Although the Prem determination is a simple procedure, the possibility of estimating accurate values of this index from easily and/or routinely determined soil properties can be very useful for practical purposes. The present research evaluated the Premestimation through multiple regression analysis in which routinely determined soil chemical data, soil clay content and soil pH measured in 1 mol L-1 NaF (pHNaF) figured as Prem predictor variables. The Prem can be estimated with acceptable accuracy using the above-mentioned approach, and PHNaF not only substitutes for clay content as a predictor variable but also confers more accuracy to the Prem estimates.
Hermanek, P; Guggenmoos-Holzmann, I
1994-01-01
A total of 961 patients who had received resective surgery for gastric carcinoma were grouped according to prognosis by classification and regression trees (CART). This grouping was compared to the present UICC stage grouping. For patients resected for cure (R0) the CART approach allows a better discrimination of patients with poor prognosis (5-year survival rates 15%-30%) from patients with a 5-year survival of 50%, on the one hand, and from patients with extremely poor prognosis (5-year survival rates below 5%) on the other. In the present investigation CART grouping was not influenced by the differentiation between pT1 and pT2 or between pT3 and pT4.
Prediction of cannabis and cocaine use in adolescence using decision trees and logistic regression
Alfonso L. Palmer
2010-01-01
Full Text Available Spain is one of the European countries with the highest prevalence of cannabis and cocaine use among young people. The aim of this study was to investigate the factors related to the consumption of cocaine and cannabis among adolescents. A questionnaire was administered to 9,284 students between 14 and 18 years of age in Palma de Mallorca (47.1% boys and 52.9% girls whose mean age was 15.59 years. Logistic regression and decision trees were carried out in order to model the consumption of cannabis and cocaine. The results show the use of legal substances and committing fraudulence or theft are the main variables that raise the odds of consuming cannabis. In boys, cannabis consumption and a family history of drug use increase the odds of consuming cocaine, whereas in girls the use of alcohol, behaviours of fraudulence or theft and difficulty in some personal skills influence their odds of consuming cocaine. Finally, ease of access to the substance greatly raises the odds of consuming cocaine and cannabis in both genders. Decision trees highlight the role of consuming other substances and committing fraudulence or theft. The results of this study gain importance when it comes to putting into practice effective prevention programmes.
Hemmateenejad, Bahram, E-mail: hemmatb@sums.ac.ir [Department of Chemistry, Shiraz University, Shiraz (Iran, Islamic Republic of); Medicinal and Natural Products Chemistry Research Center, Shiraz University of Medical Sciences, Shiraz (Iran, Islamic Republic of); Shamsipur, Mojtaba [Department of Chemistry, Razi University, Kermanshah (Iran, Islamic Republic of); Zare-Shahabadi, Vali [Young Researchers Club, Mahshahr Branch, Islamic Azad University, Mahshahr (Iran, Islamic Republic of); Akhond, Morteza [Department of Chemistry, Shiraz University, Shiraz (Iran, Islamic Republic of)
2011-10-17
Highlights: {yields} Ant colony systems help to build optimum classification and regression trees. {yields} Using of genetic algorithm operators in ant colony systems resulted in more appropriate models. {yields} Variable selection in each terminal node of the tree gives promising results. {yields} CART-ACS-GA could model the melting point of organic materials with prediction errors lower than previous models. - Abstract: The classification and regression trees (CART) possess the advantage of being able to handle large data sets and yield readily interpretable models. A conventional method of building a regression tree is recursive partitioning, which results in a good but not optimal tree. Ant colony system (ACS), which is a meta-heuristic algorithm and derived from the observation of real ants, can be used to overcome this problem. The purpose of this study was to explore the use of CART and its combination with ACS for modeling of melting points of a large variety of chemical compounds. Genetic algorithm (GA) operators (e.g., cross averring and mutation operators) were combined with ACS algorithm to select the best solution model. In addition, at each terminal node of the resulted tree, variable selection was done by ACS-GA algorithm to build an appropriate partial least squares (PLS) model. To test the ability of the resulted tree, a set of approximately 4173 structures and their melting points were used (3000 compounds as training set and 1173 as validation set). Further, an external test set containing of 277 drugs was used to validate the prediction ability of the tree. Comparison of the results obtained from both trees showed that the tree constructed by ACS-GA algorithm performs better than that produced by recursive partitioning procedure.
Common pitfalls in statistical analysis: Linear regression analysis.
Aggarwal, Rakesh; Ranganathan, Priya
2017-01-01
In a previous article in this series, we explained correlation analysis which describes the strength of relationship between two continuous variables. In this article, we deal with linear regression analysis which predicts the value of one continuous variable from another. We also discuss the assumptions and pitfalls associated with this analysis.
Common pitfalls in statistical analysis: Linear regression analysis
Rakesh Aggarwal
2017-01-01
Full Text Available In a previous article in this series, we explained correlation analysis which describes the strength of relationship between two continuous variables. In this article, we deal with linear regression analysis which predicts the value of one continuous variable from another. We also discuss the assumptions and pitfalls associated with this analysis.
Sliced Inverse Regression for Time Series Analysis
Chen, Li-Sue
1995-11-01
In this thesis, general nonlinear models for time series data are considered. A basic form is x _{t} = f(beta_sp{1} {T}X_{t-1},beta_sp {2}{T}X_{t-1},... , beta_sp{k}{T}X_ {t-1},varepsilon_{t}), where x_{t} is an observed time series data, X_{t } is the first d time lag vector, (x _{t},x_{t-1},... ,x _{t-d-1}), f is an unknown function, beta_{i}'s are unknown vectors, varepsilon_{t }'s are independent distributed. Special cases include AR and TAR models. We investigate the feasibility applying SIR/PHD (Li 1990, 1991) (the sliced inverse regression and principal Hessian methods) in estimating beta _{i}'s. PCA (Principal component analysis) is brought in to check one critical condition for SIR/PHD. Through simulation and a study on 3 well -known data sets of Canadian lynx, U.S. unemployment rate and sunspot numbers, we demonstrate how SIR/PHD can effectively retrieve the interesting low-dimension structures for time series data.
Donmez, Cenk; Berberoglu, Suha; Erdogan, Mehmet Akif; Tanriover, Anil Akin; Cilek, Ahmet
2015-02-01
Percent tree cover is the percentage of the ground surface area covered by a vertical projection of the outermost perimeter of the plants. It is an important indicator to reveal the condition of forest systems and has a significant importance for ecosystem models as a main input. The aim of this study is to estimate the percent tree cover of various forest stands in a Mediterranean environment based on an empirical relationship between tree coverage and remotely sensed data in Goksu Watershed located at the Eastern Mediterranean coast of Turkey. A regression tree algorithm was used to simulate spatial fractions of Pinus nigra, Cedrus libani, Pinus brutia, Juniperus excelsa and Quercus cerris using multi-temporal LANDSAT TM/ETM data as predictor variables and land cover information. Two scenes of high resolution GeoEye-1 images were employed for training and testing the model. The predictor variables were incorporated in addition to biophysical variables estimated from the LANDSAT TM/ETM data. Additionally, normalised difference vegetation index (NDVI) was incorporated to LANDSAT TM/ETM band settings as a biophysical variable. Stepwise linear regression (SLR) was applied for selecting the relevant bands to employ in regression tree process. SLR-selected variables produced accurate results in the model with a high correlation coefficient of 0.80. The output values ranged from 0 to 100 %. The different tree species were mapped in 30 m resolution in respect to elevation. Percent tree cover map as a final output was derived using LANDSAT TM/ETM image over Goksu Watershed and the biophysical variables. The results were tested using high spatial resolution GeoEye-1 images. Thus, the combination of the RT algorithm and higher resolution data for percent tree cover mapping were tested and examined in a complex Mediterranean environment.
Deo, Ravinesh C.; Kisi, Ozgur; Singh, Vijay P.
2017-02-01
Drought forecasting using standardized metrics of rainfall is a core task in hydrology and water resources management. Standardized Precipitation Index (SPI) is a rainfall-based metric that caters for different time-scales at which the drought occurs, and due to its standardization, is well-suited for forecasting drought at different periods in climatically diverse regions. This study advances drought modelling using multivariate adaptive regression splines (MARS), least square support vector machine (LSSVM), and M5Tree models by forecasting SPI in eastern Australia. MARS model incorporated rainfall as mandatory predictor with month (periodicity), Southern Oscillation Index, Pacific Decadal Oscillation Index and Indian Ocean Dipole, ENSO Modoki and Nino 3.0, 3.4 and 4.0 data added gradually. The performance was evaluated with root mean square error (RMSE), mean absolute error (MAE), and coefficient of determination (r2). Best MARS model required different input combinations, where rainfall, sea surface temperature and periodicity were used for all stations, but ENSO Modoki and Pacific Decadal Oscillation indices were not required for Bathurst, Collarenebri and Yamba, and the Southern Oscillation Index was not required for Collarenebri. Inclusion of periodicity increased the r2 value by 0.5-8.1% and reduced RMSE by 3.0-178.5%. Comparisons showed that MARS superseded the performance of the other counterparts for three out of five stations with lower MAE by 15.0-73.9% and 7.3-42.2%, respectively. For the other stations, M5Tree was better than MARS/LSSVM with lower MAE by 13.8-13.4% and 25.7-52.2%, respectively, and for Bathurst, LSSVM yielded more accurate result. For droughts identified by SPI ≤ - 0.5, accurate forecasts were attained by MARS/M5Tree for Bathurst, Yamba and Peak Hill, whereas for Collarenebri and Barraba, M5Tree was better than LSSVM/MARS. Seasonal analysis revealed disparate results where MARS/M5Tree was better than LSSVM. The results highlight the
Koestel, John; Bechtold, Michel; Jorda, Helena; Jarvis, Nicholas
2015-04-01
The saturated and near-saturated hydraulic conductivity of soil is of key importance for modelling water and solute fluxes in the vadose zone. Hydraulic conductivity measurements are cumbersome at the Darcy scale and practically impossible at larger scales where water and solute transport models are mostly applied. Hydraulic conductivity must therefore be estimated from proxy variables. Such pedotransfer functions are known to work decently well for e.g. water retention curves but rather poorly for near-saturated and saturated hydraulic conductivities. Recently, Weynants et al. (2009, Revisiting Vereecken pedotransfer functions: Introducing a closed-form hydraulic model. Vadose Zone Journal, 8, 86-95) reported a coefficients of determination of 0.25 (validation with an independent data set) for the saturated hydraulic conductivity from lab-measurements of Belgian soil samples. In our study, we trained boosted regression trees on a global meta-database containing tension-disk infiltrometer data (see Jarvis et al. 2013. Influence of soil, land use and climatic factors on the hydraulic conductivity of soil. Hydrology & Earth System Sciences, 17, 5185-5195) to predict the saturated hydraulic conductivity (Ks) and the conductivity at a tension of 10 cm (K10). We found coefficients of determination of 0.39 and 0.62 under a simple 10-fold cross-validation for Ks and K10. When carrying out the validation folded over the data-sources, i.e. the source publications, we found that the corresponding coefficients of determination reduced to 0.15 and 0.36, respectively. We conclude that the stricter source-wise cross-validation should be applied in future pedotransfer studies to prevent overly optimistic validation results. The boosted regression trees also allowed for an investigation of relevant predictors for estimating the near-saturated hydraulic conductivity. We found that land use and bulk density were most important to predict Ks. We also observed that Ks is large in fine
Kropat, Georg; Bochud, Francois; Jaboyedoff, Michel; Laedermann, Jean-Pascal; Murith, Christophe; Palacios Gruson, Martha; Baechler, Sébastien
2015-09-01
According to estimations around 230 people die as a result of radon exposure in Switzerland. This public health concern makes reliable indoor radon prediction and mapping methods necessary in order to improve risk communication to the public. The aim of this study was to develop an automated method to classify lithological units according to their radon characteristics and to develop mapping and predictive tools in order to improve local radon prediction. About 240 000 indoor radon concentration (IRC) measurements in about 150 000 buildings were available for our analysis. The automated classification of lithological units was based on k-medoids clustering via pair-wise Kolmogorov distances between IRC distributions of lithological units. For IRC mapping and prediction we used random forests and Bayesian additive regression trees (BART). The automated classification groups lithological units well in terms of their IRC characteristics. Especially the IRC differences in metamorphic rocks like gneiss are well revealed by this method. The maps produced by random forests soundly represent the regional difference of IRCs in Switzerland and improve the spatial detail compared to existing approaches. We could explain 33% of the variations in IRC data with random forests. Additionally, the influence of a variable evaluated by random forests shows that building characteristics are less important predictors for IRCs than spatial/geological influences. BART could explain 29% of IRC variability and produced maps that indicate the prediction uncertainty. Ensemble regression trees are a powerful tool to model and understand the multidimensional influences on IRCs. Automatic clustering of lithological units complements this method by facilitating the interpretation of radon properties of rock types. This study provides an important element for radon risk communication. Future approaches should consider taking into account further variables like soil gas radon measurements as
Using Dominance Analysis to Determine Predictor Importance in Logistic Regression
Azen, Razia; Traxel, Nicole
2009-01-01
This article proposes an extension of dominance analysis that allows researchers to determine the relative importance of predictors in logistic regression models. Criteria for choosing logistic regression R[superscript 2] analogues were determined and measures were selected that can be used to perform dominance analysis in logistic regression. A…
Using Dominance Analysis to Determine Predictor Importance in Logistic Regression
Azen, Razia; Traxel, Nicole
2009-01-01
This article proposes an extension of dominance analysis that allows researchers to determine the relative importance of predictors in logistic regression models. Criteria for choosing logistic regression R[superscript 2] analogues were determined and measures were selected that can be used to perform dominance analysis in logistic regression. A…
Differential Diagnosis of Erythmato-Squamous Diseases Using Classification and Regression Tree
Maghooli, Keivan; Langarizadeh, Mostafa; Shahmoradi, Leila; Habibi-koolaee, Mahdi; Jebraeily, Mohamad; Bouraghi, Hamid
2016-01-01
Introduction: Differential diagnosis of Erythmato-Squamous Diseases (ESD) is a major challenge in the field of dermatology. The ESD diseases are placed into six different classes. Data mining is the process for detection of hidden patterns. In the case of ESD, data mining help us to predict the diseases. Different algorithms were developed for this purpose. Objective: we aimed to use the Classification and Regression Tree (CART) to predict differential diagnosis of ESD. Methods: we used the Cross Industry Standard Process for Data Mining (CRISP-DM) methodology. For this purpose, the dermatology data set from machine learning repository, UCI was obtained. The Clementine 12.0 software from IBM Company was used for modelling. In order to evaluation of the model we calculate the accuracy, sensitivity and specificity of the model. Results: The proposed model had an accuracy of 94.84% ( Standard Deviation: 24.42) in order to correct prediction of the ESD disease. Conclusions: Results indicated that using of this classifier could be useful. But, it would be strongly recommended that the combination of machine learning methods could be more useful in terms of prediction of ESD. PMID:28077889
Development of prognostic indicators using Classification And Regression Trees (CART) for survival
Nunn, Martha E.; Fan, Juanjuan; Su, Xiaogang; McGuire, Michael K.
2014-01-01
The development of an accurate prognosis is an integral component of treatment planning in the practice of periodontics. Prior work has evaluated the validity of using various clinical measured parameters for assigning periodontal prognosis as well as for predicting tooth survival and change in clinical conditions over time. We critically review the application of multivariate Classification And Regression Trees (CART) for survival in developing evidence-based periodontal prognostic indicators. We focus attention on two distinct methods of multivariate CART for survival: the marginal goodness-of-fit approach, and the multivariate exponential approach. A number of common clinical measures have been found to be significantly associated with tooth loss from periodontal disease, including furcation involvement, probing depth, mobility, crown-to-root ratio, and oral hygiene. However, the inter-relationships among these measures, as well as the relevance of other clinical measures to tooth loss from periodontal disease (such as bruxism, family history of periodontal disease, and overall bone loss), remain less clear. While inferences drawn from any single current study are necessarily limited, the application of new approaches in epidemiologic analyses to periodontal prognosis, such as CART for survival, should yield important insights into our understanding, and treatment, of periodontal diseases. PMID:22133372
Estimating carbon and showing impacts of drought using satellite data in regression-tree models
Boyte, Stephen; Wylie, Bruce K.; Howard, Danny; Dahal, Devendra; Gilmanov, Tagir G.
2018-01-01
Integrating spatially explicit biogeophysical and remotely sensed data into regression-tree models enables the spatial extrapolation of training data over large geographic spaces, allowing a better understanding of broad-scale ecosystem processes. The current study presents annual gross primary production (GPP) and annual ecosystem respiration (RE) for 2000–2013 in several short-statured vegetation types using carbon flux data from towers that are located strategically across the conterminous United States (CONUS). We calculate carbon fluxes (annual net ecosystem production [NEP]) for each year in our study period, which includes 2012 when drought and higher-than-normal temperatures influence vegetation productivity in large parts of the study area. We present and analyse carbon flux dynamics in the CONUS to better understand how drought affects GPP, RE, and NEP. Model accuracy metrics show strong correlation coefficients (r) (r ≥ 94%) between training and estimated data for both GPP and RE. Overall, average annual GPP, RE, and NEP are relatively constant throughout the study period except during 2012 when almost 60% less carbon is sequestered than normal. These results allow us to conclude that this modelling method effectively estimates carbon dynamics through time and allows the exploration of impacts of meteorological anomalies and vegetation types on carbon dynamics.
Prioritizing Highway Safety Manual's crash prediction variables using boosted regression trees.
Saha, Dibakar; Alluri, Priyanka; Gan, Albert
2015-06-01
The Highway Safety Manual (HSM) recommends using the empirical Bayes (EB) method with locally derived calibration factors to predict an agency's safety performance. However, the data needs for deriving these local calibration factors are significant, requiring very detailed roadway characteristics information. Many of the data variables identified in the HSM are currently unavailable in the states' databases. Moreover, the process of collecting and maintaining all the HSM data variables is cost-prohibitive. Prioritization of the variables based on their impact on crash predictions would, therefore, help to identify influential variables for which data could be collected and maintained for continued updates. This study aims to determine the impact of each independent variable identified in the HSM on crash predictions. A relatively recent data mining approach called boosted regression trees (BRT) is used to investigate the association between the variables and crash predictions. The BRT method can effectively handle different types of predictor variables, identify very complex and non-linear association among variables, and compute variable importance. Five years of crash data from 2008 to 2012 on two urban and suburban facility types, two-lane undivided arterials and four-lane divided arterials, were analyzed for estimating the influence of variables on crash predictions. Variables were found to exhibit non-linear and sometimes complex relationship to predicted crash counts. In addition, only a few variables were found to explain most of the variation in the crash data. Published by Elsevier Ltd.
Prognostic transcriptional association networks: a new supervised approach based on regression trees
Nepomuceno-Chamorro, Isabel; Azuaje, Francisco; Devaux, Yvan; Nazarov, Petr V.; Muller, Arnaud; Aguilar-Ruiz, Jesús S.; Wagner, Daniel R.
2011-01-01
Motivation: The application of information encoded in molecular networks for prognostic purposes is a crucial objective of systems biomedicine. This approach has not been widely investigated in the cardiovascular research area. Within this area, the prediction of clinical outcomes after suffering a heart attack would represent a significant step forward. We developed a new quantitative prediction-based method for this prognostic problem based on the discovery of clinically relevant transcriptional association networks. This method integrates regression trees and clinical class-specific networks, and can be applied to other clinical domains. Results: Before analyzing our cardiovascular disease dataset, we tested the usefulness of our approach on a benchmark dataset with control and disease patients. We also compared it to several algorithms to infer transcriptional association networks and classification models. Comparative results provided evidence of the prediction power of our approach. Next, we discovered new models for predicting good and bad outcomes after myocardial infarction. Using blood-derived gene expression data, our models reported areas under the receiver operating characteristic curve above 0.70. Our model could also outperform different techniques based on co-expressed gene modules. We also predicted processes that may represent novel therapeutic targets for heart disease, such as the synthesis of leucine and isoleucine. Availability: The SATuRNo software is freely available at http://www.lsi.us.es/isanepo/toolsSaturno/. Contact: inepomuceno@us.es Supplementary information: Supplementary data are available at Bioinformatics online. PMID:21098433
I GEDE AGUS JIWADIANA
2015-11-01
Full Text Available The aim of this research is to determine the classification characteristics of traffic accidents in Denpasar city in January-July 2014 by using Classification And Regression Trees (CART. Then, for determine the explanatory variables into the main classifier of CART. The result showed that optimum CART generate three terminal node. First terminal node, there are 12 people were classified as heavy traffic accident characteritics with single accident, and second terminal nodes, there are 68 people were classified as minor traffic accident characteristics by type of traffic accident front-rear, front-front, front-side, pedestrians, side-side and location of traffic accident in district road and sub-district road. For third terminal node, there are 291 people were classified as medium traffic accident characteristics by type of traffic accident front-rear, front-front, front-side, pedestrians, side-side and location of traffic accident in municipality road and explanatory variables into the main splitter to make of CART is type of traffic accident with maximum homogeneity measure of 0.03252.
Tree-space statistics and approximations for large-scale analysis of anatomical trees
Feragen, Aasa; Owen, Megan; Petersen, Jens
2013-01-01
onto tree-space is not available. Using tree-space and its shortest paths, a variety of statistical properties, such as mean, principal component, hypothesis testing and linear discriminant analysis can be defined. For some of these properties it is still an open problem how to compute them; others......Statistical analysis of anatomical trees is hard to perform due to differences in the topological structure of the trees. In this paper we define statistical properties of leaf-labeled anatomical trees with geometric edge attributes by considering the anatomical trees as points in the geometric...... space of leaf-labeled trees. This tree-space is a geodesic metric space where any two trees are connected by a unique shortest path, which corresponds to a tree deformation. However, tree-space is not a manifold, and the usual strategy of performing statistical analysis in a tangent space and projecting...
Distance Based Root Cause Analysis and Change Impact Analysis of Performance Regressions
Junzan Zhou
2015-01-01
Full Text Available Performance regression testing is applied to uncover both performance and functional problems of software releases. A performance problem revealed by performance testing can be high response time, low throughput, or even being out of service. Mature performance testing process helps systematically detect software performance problems. However, it is difficult to identify the root cause and evaluate the potential change impact. In this paper, we present an approach leveraging server side logs for identifying root causes of performance problems. Firstly, server side logs are used to recover call tree of each business transaction. We define a novel distance based metric computed from call trees for root cause analysis and apply inverted index from methods to business transactions for change impact analysis. Empirical studies show that our approach can effectively and efficiently help developers diagnose root cause of performance problems.
Stability Analysis for Regularized Least Squares Regression
Rudin, Cynthia
2005-01-01
We discuss stability for a class of learning algorithms with respect to noisy labels. The algorithms we consider are for regression, and they involve the minimization of regularized risk functionals, such as L(f) := 1/N sum_i (f(x_i)-y_i)^2+ lambda ||f||_H^2. We shall call the algorithm `stable' if, when y_i is a noisy version of f*(x_i) for some function f* in H, the output of the algorithm converges to f* as the regularization term and noise simultaneously vanish. We consider two flavors of...
Tree-space statistics and approximations for large-scale analysis of anatomical trees
Feragen, Aasa; Owen, Megan; Petersen, Jens;
2013-01-01
Statistical analysis of anatomical trees is hard to perform due to differences in the topological structure of the trees. In this paper we define statistical properties of leaf-labeled anatomical trees with geometric edge attributes by considering the anatomical trees as points in the geometric s...... healthy ones. Software is available from http://image.diku.dk/aasa/software.php....
Comprehensive database of diameter-based biomass regressions for North American tree species
Jennifer C. Jenkins; David C. Chojnacky; Linda S. Heath; Richard A. Birdsey
2004-01-01
A database consisting of 2,640 equations compiled from the literature for predicting the biomass of trees and tree components from diameter measurements of species found in North America. Bibliographic information, geographic locations, diameter limits, diameter and biomass units, equation forms, statistical errors, and coefficients are provided for each equation,...
Understanding Boswellia papyrifera tree secondary metabolites through bark spectral analysis
Girma, Atkilt; Skidmore, Andrew K.; de Bie, C. A. J. M.; Bongers, Frans
2015-07-01
Decision makers are concerned whether to tap or rest Boswellia Papyrifera trees. Tapping for the production of frankincense is known to deplete carbon reserves from the tree leading to production of less viable seeds, tree carbon starvation and ultimately tree mortality. Decision makers use traditional experience without considering the amount of metabolites stored or depleted from the stem-bark of the tree. This research was designed to come up with a non-destructive B. papyrifera tree metabolite estimation technique relevant for management using spectroscopy. The concentration of biochemicals (metabolites) found in the tree bark was estimated through spectral analysis. Initially, a random sample of 33 trees was selected, the spectra of bark measured with an Analytical Spectral Device (ASD) spectrometer. Bark samples were air dried and ground. Then, 10 g of sample was soaked in Petroleum ether to extract crude metabolites. Further chemical analysis was conducted to quantify and isolate pure metabolite compounds such as incensole acetate and boswellic acid. The crude metabolites, which relate to frankincense produce, were compared to plant properties (such as diameter and crown area) and reflectance spectra of the bark. Moreover, the extract was compared to the ASD spectra using partial least square regression technique (PLSR) and continuum removed spectral analysis. The continuum removed spectral analysis were performed, on two wavelength regions (1275-1663 and 1836-2217) identified through PLSR, using absorption features such as band depth, area, position, asymmetry and the width to characterize and find relationship with the bark extracts. The results show that tree properties such as diameter at breast height (DBH) and the crown area of untapped and healthy trees were strongly correlated to the amount of stored crude metabolites. In addition, the PLSR technique applied to the first derivative transformation of the reflectance spectrum was found to estimate the
Guideliness for system modeling: fault tree [analysis
Lee, Yoon Hwan; Yang, Joon Eon; Kang, Dae Il; Hwang, Mee Jeong
2004-07-01
This document, the guidelines for system modeling related to Fault Tree Analysis(FTA), is intended to provide the guidelines with the analyzer to construct the fault trees in the level of the capability category II of ASME PRA standard. Especially, they are to provide the essential and basic guidelines and the related contents to be used in support of revising the Ulchin 3 and 4 PSA model for risk monitor within the capability category II of ASME PRA standard. Normally the main objective of system analysis is to assess the reliability of system modeled by Event Tree Analysis (ETA). A variety of analytical techniques can be used for the system analysis, however, FTA method is used in this procedures guide. FTA is the method used for representing the failure logic of plant systems deductively using AND, OR or NOT gates. The fault tree should reflect all possible failure modes that may contribute to the system unavailability. This should include contributions due to the mechanical failures of the components, Common Cause Failures (CCFs), human errors and outages for testing and maintenance. This document identifies and describes the definitions and the general procedures of FTA and the essential and basic guidelines for reving the fault trees. Accordingly, the guidelines for FTA will be capable to guide the FTA to the level of the capability category II of ASME PRA standard.
Aguiar Fabio S
2012-08-01
Full Text Available Abstract Background Tuberculosis (TB remains a public health issue worldwide. The lack of specific clinical symptoms to diagnose TB makes the correct decision to admit patients to respiratory isolation a difficult task for the clinician. Isolation of patients without the disease is common and increases health costs. Decision models for the diagnosis of TB in patients attending hospitals can increase the quality of care and decrease costs, without the risk of hospital transmission. We present a predictive model for predicting pulmonary TB in hospitalized patients in a high prevalence area in order to contribute to a more rational use of isolation rooms without increasing the risk of transmission. Methods Cross sectional study of patients admitted to CFFH from March 2003 to December 2004. A classification and regression tree (CART model was generated and validated. The area under the ROC curve (AUC, sensitivity, specificity, positive and negative predictive values were used to evaluate the performance of model. Validation of the model was performed with a different sample of patients admitted to the same hospital from January to December 2005. Results We studied 290 patients admitted with clinical suspicion of TB. Diagnosis was confirmed in 26.5% of them. Pulmonary TB was present in 83.7% of the patients with TB (62.3% with positive sputum smear and HIV/AIDS was present in 56.9% of patients. The validated CART model showed sensitivity, specificity, positive predictive value and negative predictive value of 60.00%, 76.16%, 33.33%, and 90.55%, respectively. The AUC was 79.70%. Conclusions The CART model developed for these hospitalized patients with clinical suspicion of TB had fair to good predictive performance for pulmonary TB. The most important variable for prediction of TB diagnosis was chest radiograph results. Prospective validation is still necessary, but our model offer an alternative for decision making in whether to isolate patients with
Shabani, Farzin; Kumar, Lalit; Solhjouy-fard, Samaneh
2016-05-01
The aim of this study was to have a comparative investigation and evaluation of the capabilities of correlative and mechanistic modeling processes, applied to the projection of future distributions of date palm in novel environments and to establish a method of minimizing uncertainty in the projections of differing techniques. The location of this study on a global scale is in Middle Eastern Countries. We compared the mechanistic model CLIMEX (CL) with the correlative models MaxEnt (MX), Boosted Regression Trees (BRT), and Random Forests (RF) to project current and future distributions of date palm (Phoenix dactylifera L.). The Global Climate Model (GCM), the CSIRO-Mk3.0 (CS) using the A2 emissions scenario, was selected for making projections. Both indigenous and alien distribution data of the species were utilized in the modeling process. The common areas predicted by MX, BRT, RF, and CL from the CS GCM were extracted and compared to ascertain projection uncertainty levels of each individual technique. The common areas identified by all four modeling techniques were used to produce a map indicating suitable and unsuitable areas for date palm cultivation for Middle Eastern countries, for the present and the year 2100. The four different modeling approaches predict fairly different distributions. Projections from CL were more conservative than from MX. The BRT and RF were the most conservative methods in terms of projections for the current time. The combination of the final CL and MX projections for the present and 2100 provide higher certainty concerning those areas that will become highly suitable for future date palm cultivation. According to the four models, cold, hot, and wet stress, with differences on a regional basis, appears to be the major restrictions on future date palm distribution. The results demonstrate variances in the projections, resulting from different techniques. The assessment and interpretation of model projections requires reservations
Shabani, Farzin; Kumar, Lalit; Solhjouy-fard, Samaneh
2017-08-01
The aim of this study was to have a comparative investigation and evaluation of the capabilities of correlative and mechanistic modeling processes, applied to the projection of future distributions of date palm in novel environments and to establish a method of minimizing uncertainty in the projections of differing techniques. The location of this study on a global scale is in Middle Eastern Countries. We compared the mechanistic model CLIMEX (CL) with the correlative models MaxEnt (MX), Boosted Regression Trees (BRT), and Random Forests (RF) to project current and future distributions of date palm ( Phoenix dactylifera L.). The Global Climate Model (GCM), the CSIRO-Mk3.0 (CS) using the A2 emissions scenario, was selected for making projections. Both indigenous and alien distribution data of the species were utilized in the modeling process. The common areas predicted by MX, BRT, RF, and CL from the CS GCM were extracted and compared to ascertain projection uncertainty levels of each individual technique. The common areas identified by all four modeling techniques were used to produce a map indicating suitable and unsuitable areas for date palm cultivation for Middle Eastern countries, for the present and the year 2100. The four different modeling approaches predict fairly different distributions. Projections from CL were more conservative than from MX. The BRT and RF were the most conservative methods in terms of projections for the current time. The combination of the final CL and MX projections for the present and 2100 provide higher certainty concerning those areas that will become highly suitable for future date palm cultivation. According to the four models, cold, hot, and wet stress, with differences on a regional basis, appears to be the major restrictions on future date palm distribution. The results demonstrate variances in the projections, resulting from different techniques. The assessment and interpretation of model projections requires reservations
Liu, Yang; Lü, Yi-he; Zheng, Hai-feng; Chen, Li-ding
2010-05-01
Based on the 10-day SPOT VEGETATION NDVI data and the daily meteorological data from 1998 to 2007 in Yan' an City, the main meteorological variables affecting the annual and interannual variations of NDVI were determined by using regression tree. It was found that the effects of test meteorological variables on the variability of NDVI differed with seasons and time lags. Temperature and precipitation were the most important meteorological variables affecting the annual variation of NDVI, and the average highest temperature was the most important meteorological variable affecting the inter-annual variation of NDVI. Regression tree was very powerful in determining the key meteorological variables affecting NDVI variation, but could not build quantitative relations between NDVI and meteorological variables, which limited its further and wider application.
Takagi, Daisuke; Ikeda, Ken'ichi; Kawachi, Ichiro
2012-11-01
Crime is an important determinant of public health outcomes, including quality of life, mental well-being, and health behavior. A body of research has documented the association between community social capital and crime victimization. The association between social capital and crime victimization has been examined at multiple levels of spatial aggregation, ranging from entire countries, to states, metropolitan areas, counties, and neighborhoods. In multilevel analysis, the spatial boundaries at level 2 are most often drawn from administrative boundaries (e.g., Census tracts in the U.S.). One problem with adopting administrative definitions of neighborhoods is that it ignores spatial spillover. We conducted a study of social capital and crime victimization in one ward of Tokyo city, using a spatial Durbin model with an inverse-distance weighting matrix that assigned each respondent a unique level of "exposure" to social capital based on all other residents' perceptions. The study is based on a postal questionnaire sent to 20-69 years old residents of Arakawa Ward, Tokyo. The response rate was 43.7%. We examined the contextual influence of generalized trust, perceptions of reciprocity, two types of social network variables, as well as two principal components of social capital (constructed from the above four variables). Our outcome measure was self-reported crime victimization in the last five years. In the spatial Durbin model, we found that neighborhood generalized trust, reciprocity, supportive networks and two principal components of social capital were each inversely associated with crime victimization. By contrast, a multilevel regression performed with the same data (using administrative neighborhood boundaries) found generally null associations between neighborhood social capital and crime. Spatial regression methods may be more appropriate for investigating the contextual influence of social capital in homogeneous cultural settings such as Japan.
Hecht, Jeffrey B.
The analysis of regression residuals and detection of outliers are discussed, with emphasis on determining how deviant an individual data point must be to be considered an outlier and the impact that multiple suspected outlier data points have on the process of outlier determination and treatment. Only bivariate (one dependent and one independent)…
Grinn-Gofroń, Agnieszka; Strzelczak, Agnieszka
2009-11-01
A study was made of the link between time of day, weather variables and the hourly content of certain fungal spores in the atmosphere of the city of Szczecin, Poland, in 2004-2007. Sampling was carried out with a Lanzoni 7-day-recording spore trap. The spores analysed belonged to the taxa Alternaria and Cladosporium. These spores were selected both for their allergenic capacity and for their high level presence in the atmosphere, particularly during summer. Spearman correlation coefficients between spore concentrations, meteorological parameters and time of day showed different indices depending on the taxon being analysed. Relative humidity (RH), air temperature, air pressure and clouds most strongly and significantly influenced the concentration of Alternaria spores. Cladosporium spores correlated less strongly and significantly than Alternaria. Multivariate regression tree analysis revealed that, at air pressures lower than 1,011 hPa the concentration of Alternaria spores was low. Under higher air pressure spore concentrations were higher, particularly when RH was lower than 36.5%. In the case of Cladosporium, under higher air pressure (>1,008 hPa), the spores analysed were more abundant, particularly after 0330 hours. In artificial neural networks, RH, air pressure and air temperature were the most important variables in the model for Alternaria spore concentration. For Cladosporium, clouds, time of day, air pressure, wind speed and dew point temperature were highly significant factors influencing spore concentration. The maximum abundance of Cladosporium spores in air fell between 1200 and 1700 hours.
Regression Commonality Analysis: A Technique for Quantitative Theory Building
Nimon, Kim; Reio, Thomas G., Jr.
2011-01-01
When it comes to multiple linear regression analysis (MLR), it is common for social and behavioral science researchers to rely predominately on beta weights when evaluating how predictors contribute to a regression model. Presenting an underutilized statistical technique, this article describes how organizational researchers can use commonality…
General Nature of Multicollinearity in Multiple Regression Analysis.
Liu, Richard
1981-01-01
Discusses multiple regression, a very popular statistical technique in the field of education. One of the basic assumptions in regression analysis requires that independent variables in the equation should not be highly correlated. The problem of multicollinearity and some of the solutions to it are discussed. (Author)
Fault tree analysis for urban flooding
Ten Veldhuis, J.A.E.; Clemens, F.H.L.R.; Van Gelder, P.H.A.J.M.
2008-01-01
Traditional methods to evaluate flood risk mostly focus on storm events as the main cause of flooding. Fault tree analysis is a technique that is able to model all potential causes of flooding and to quantify both the overall probability of flooding and the contributions of all causes of flooding to
[Measurement and analysis of anatomical parameter values in tree shrews].
Li, Bo; Zhang, Rong-Ping; Li, Jin-Tao; He, Bao-Li; Zhen, Hong; Wang, Li-Mei; Jiao, Jian-Lin
2013-04-01
Anatomical parameter values in tree shrews are major biological characteristic indicators in laboratory animals. Body size, bones and mammilla, organ weights, coefficient intestinal canal and other anatomical data were measured and analyzed in laboratory domesticated tree shrews (7 to 9 months of age). Measurement of 31 anatomical parameters showed that body height, width of the right ear, ileum and colon had significant differences between males and females (P<0.05). Highly significant differences were also found in body slanting length, chest depth, torso length, left and right forelimb length, right hind limb length, left and right ear length, left ear width, keel bone length, left and right tibia length, duodenum and jejunum (P<0.01). With body length as the dependent variable, and tail length, torso length, right and left forelimb length, and left and right hind limb length as independent variables for stepwise regression analysis, the regression equation for body length = 13.90 + tail length × 0.16. The results of 37 organs weights between female and male tree shrews showed very significant differences (P<0.01) for weight of heart, lungs, spleen, left and right kidney, bladder, left and right hippocampus, left submandibular gland, and left and right thyroid gland, as well as significant (P<0.05) differences in the small intestine, right submandibular gland, and left adrenal gland. The coefficient of heart, lung, stomach, bladder, small and large intestine, brain, right hippocampus, and left adrenal gland showed highly significant differences (P<0.01), while differences in the right kidney, left hippocampus, left submandibular gland, right adrenal gland, and left and right thyroid gland were significant (P<0.05). With animal weight as the dependent variable and indicators of heart, lung, liver, spleen, left and right kidney and brain as independent variables for stepwise regression analysis, the regression equation showed that weight = 62.73 + left kidney
Predicting Nigeria budget allocation using regression analysis: A ...
Predicting Nigeria budget allocation using regression analysis: A data mining approach. ... Open Access DOWNLOAD FULL TEXT ... Budget is used by the Government as a guiding tool for planning and management of its resources to aid in ...
An Original Stepwise Multilevel Logistic Regression Analysis of Discriminatory Accuracy
Merlo, Juan; Wagner, Philippe; Ghith, Nermin
2016-01-01
BACKGROUND AND AIM: Many multilevel logistic regression analyses of "neighbourhood and health" focus on interpreting measures of associations (e.g., odds ratio, OR). In contrast, multilevel analysis of variance is rarely considered. We propose an original stepwise analytical approach that disting......BACKGROUND AND AIM: Many multilevel logistic regression analyses of "neighbourhood and health" focus on interpreting measures of associations (e.g., odds ratio, OR). In contrast, multilevel analysis of variance is rarely considered. We propose an original stepwise analytical approach...
3D Regression Heat Map Analysis of Population Study Data.
Klemm, Paul; Lawonn, Kai; Glaßer, Sylvia; Niemann, Uli; Hegenscheid, Katrin; Völzke, Henry; Preim, Bernhard
2016-01-01
Epidemiological studies comprise heterogeneous data about a subject group to define disease-specific risk factors. These data contain information (features) about a subject's lifestyle, medical status as well as medical image data. Statistical regression analysis is used to evaluate these features and to identify feature combinations indicating a disease (the target feature). We propose an analysis approach of epidemiological data sets by incorporating all features in an exhaustive regression-based analysis. This approach combines all independent features w.r.t. a target feature. It provides a visualization that reveals insights into the data by highlighting relationships. The 3D Regression Heat Map, a novel 3D visual encoding, acts as an overview of the whole data set. It shows all combinations of two to three independent features with a specific target disease. Slicing through the 3D Regression Heat Map allows for the detailed analysis of the underlying relationships. Expert knowledge about disease-specific hypotheses can be included into the analysis by adjusting the regression model formulas. Furthermore, the influences of features can be assessed using a difference view comparing different calculation results. We applied our 3D Regression Heat Map method to a hepatic steatosis data set to reproduce results from a data mining-driven analysis. A qualitative analysis was conducted on a breast density data set. We were able to derive new hypotheses about relations between breast density and breast lesions with breast cancer. With the 3D Regression Heat Map, we present a visual overview of epidemiological data that allows for the first time an interactive regression-based analysis of large feature sets with respect to a disease.
Least Squares Adjustment: Linear and Nonlinear Weighted Regression Analysis
Nielsen, Allan Aasbjerg
2007-01-01
This note primarily describes the mathematics of least squares regression analysis as it is often used in geodesy including land surveying and satellite positioning applications. In these fields regression is often termed adjustment. The note also contains a couple of typical land surveying...... and satellite positioning application examples. In these application areas we are typically interested in the parameters in the model typically 2- or 3-D positions and not in predictive modelling which is often the main concern in other regression analysis applications. Adjustment is often used to obtain...
Betsey Dexter Dyer
2008-01-01
Full Text Available Classification and regression tree (CART analysis was applied to genome-wide tetranucleotide frequencies (genomic signatures of 195 archaea and bacteria. Although genomic signatures have typically been used to classify evolutionary divergence, in this study, convergent evolution was the focus. Temperature optima for most of the organisms examined could be distinguished by CART analyses of tetranucleotide frequencies. This suggests that pervasive (nonlinear qualities of genomes may reflect certain environmental conditions (such as temperature in which those genomes evolved. The predominant use of GAGA and AGGA as the discriminating tetramers in CART models suggests that purine-loading and codon biases of thermophiles may explain some of the results.
Bou Kheir, Rania; Shomar, B.; Greve, Mogens Humlekrog
2014-01-01
Soil heavy metal pollution has been and continues to be a worldwide phenomenon that has attracted a great deal of attention from governments and regulatory bodies. In this context, our study used Geographic Information Systems (GIS) and regression-tree modeling (196 trees) to precisely quantify...... as weighted input data in soil pollution prediction models. The developed strongest relationships were associated with Cd and As, variance being equal to 82%, followed by Ni (75%) and Cr (73%) as the weakest relationship. This study also showed that nearness to cities (with a relative importance varying...... the relationships between four toxic heavy metals (Ni, Cr, Cd and As) and sixteen environmental parameters (e.g., parent material, slope gradient, proximity to roads, etc.) in the soils of northern Lebanon (as a case study of Mediterranean landscapes), and to detect the most important parameters that can be used...
FAULT TREE ANALYSIS OF SOLAR CONCENTRATORS
Dobrivoje Catic
2013-12-01
Full Text Available In the introductory part, the history development is presented, and it points out the importance of using the Fault Tree Analysis - FTA method for analysis of the reliability and safety of technical systems. By analyzing a number of references related to the FTA method, the FTA methodology is established, and explanation of some steps by this method is given in this paper. As an example of the practical application of methods, the failure of the solar concentrators is analyzed.For the failure analysis of the considered device, it is necessary to know the structure, functioning, working conditions and all factors that have a greater or less influence on its reliability. Along with an explanation of certain parts of the fault tree, the estimation of the significance of certain events is done, and it is considered to be able to eliminate causes of failure or to minimize the consequences of failure.
Ahn, Kuk-Hyun; Palmer, Richard
2016-09-01
Despite wide use of regression-based regional flood frequency analysis (RFFA) methods, the majority are based on either ordinary least squares (OLS) or generalized least squares (GLS). This paper proposes 'spatial proximity' based RFFA methods using the spatial lagged model (SLM) and spatial error model (SEM). The proposed methods are represented by two frameworks: the quantile regression technique (QRT) and parameter regression technique (PRT). The QRT develops prediction equations for flooding quantiles in average recurrence intervals (ARIs) of 2, 5, 10, 20, and 100 years whereas the PRT provides prediction of three parameters for the selected distribution. The proposed methods are tested using data incorporating 30 basin characteristics from 237 basins in Northeastern United States. Results show that generalized extreme value (GEV) distribution properly represents flood frequencies in the study gages. Also, basin area, stream network, and precipitation seasonality are found to be the most effective explanatory variables in prediction modeling by the QRT and PRT. 'Spatial proximity' based RFFA methods provide reliable flood quantile estimates compared to simpler methods. Compared to the QRT, the PRT may be recommended due to its accuracy and computational simplicity. The results presented in this paper may serve as one possible guidepost for hydrologists interested in flood analysis at ungaged sites.
Homer, Collin G.; Aldridge, Cameron L.; Meyer, Debra K.; Schell, Spencer J.
2012-02-01
Sagebrush ecosystems in North America have experienced extensive degradation since European settlement. Further degradation continues from exotic invasive plants, altered fire frequency, intensive grazing practices, oil and gas development, and climate change - adding urgency to the need for ecosystem-wide understanding. Remote sensing is often identified as a key information source to facilitate ecosystem-wide characterization, monitoring, and analysis; however, approaches that characterize sagebrush with sufficient and accurate local detail across large enough areas to support this paradigm are unavailable. We describe the development of a new remote sensing sagebrush characterization approach for the state of Wyoming, U.S.A. This approach integrates 2.4 m QuickBird, 30 m Landsat TM, and 56 m AWiFS imagery into the characterization of four primary continuous field components including percent bare ground, percent herbaceous cover, percent litter, and percent shrub, and four secondary components including percent sagebrush ( Artemisia spp.), percent big sagebrush ( Artemisia tridentata), percent Wyoming sagebrush ( Artemisia tridentata Wyomingensis), and shrub height using a regression tree. According to an independent accuracy assessment, primary component root mean square error (RMSE) values ranged from 4.90 to 10.16 for 2.4 m QuickBird, 6.01 to 15.54 for 30 m Landsat, and 6.97 to 16.14 for 56 m AWiFS. Shrub and herbaceous components outperformed the current data standard called LANDFIRE, with a shrub RMSE value of 6.04 versus 12.64 and a herbaceous component RMSE value of 12.89 versus 14.63. This approach offers new advancements in sagebrush characterization from remote sensing and provides a foundation to quantitatively monitor these components into the future.
Homer, Collin G.; Aldridge, Cameron L.; Meyer, Debra K.; Schell, Spencer J.
2012-01-01
agebrush ecosystems in North America have experienced extensive degradation since European settlement. Further degradation continues from exotic invasive plants, altered fire frequency, intensive grazing practices, oil and gas development, and climate change – adding urgency to the need for ecosystem-wide understanding. Remote sensing is often identified as a key information source to facilitate ecosystem-wide characterization, monitoring, and analysis; however, approaches that characterize sagebrush with sufficient and accurate local detail across large enough areas to support this paradigm are unavailable. We describe the development of a new remote sensing sagebrush characterization approach for the state of Wyoming, U.S.A. This approach integrates 2.4 m QuickBird, 30 m Landsat TM, and 56 m AWiFS imagery into the characterization of four primary continuous field components including percent bare ground, percent herbaceous cover, percent litter, and percent shrub, and four secondary components including percent sagebrush (Artemisia spp.), percent big sagebrush (Artemisia tridentata), percent Wyoming sagebrush (Artemisia tridentata Wyomingensis), and shrub height using a regression tree. According to an independent accuracy assessment, primary component root mean square error (RMSE) values ranged from 4.90 to 10.16 for 2.4 m QuickBird, 6.01 to 15.54 for 30 m Landsat, and 6.97 to 16.14 for 56 m AWiFS. Shrub and herbaceous components outperformed the current data standard called LANDFIRE, with a shrub RMSE value of 6.04 versus 12.64 and a herbaceous component RMSE value of 12.89 versus 14.63. This approach offers new advancements in sagebrush characterization from remote sensing and provides a foundation to quantitatively monitor these components into the future.
Gu, Yingxin; Wylie, Bruce K.; Boyte, Stephen
2016-01-01
In this study, we developed a method that identifies an optimal sample data usage strategy and rule numbers that minimize over- and underfitting effects in regression tree mapping models. A LANDFIRE tile (r04c03, located mainly in northeastern Nevada), which is a composite of multiple Landsat 8 scenes for a target date, was selected for the study. To minimize any cloud and bad detection effects in the original Landsat 8 data, the compositing approach used cosine-similarity-combined pixels from multiple observations based on data quality and temporal proximity to a target date. Julian date 212, which yielded relatively low "no data and/or cloudy” pixels, was used as the target date with Landsat 8 observations from days 140–240 in 2013. The 30-m Landsat 8 composited data were then upscaled to 250 m using a spatial averaging method. Six Landsat 8 spectral bands (bands 1–6) at 250-m resolution were used as independent variables for developing the piecewise regression-tree models to predict the 250-m eMODIS NDVI (dependent variable). Furthermore, to ensure the high quality of the derived 250-m Landsat 8 data, and avoid any additional cloud and atmospheric effects, the percentage of 30-m pixels with “0” values within a 250-m pixel was calculated. Only those 250-m pixels with 0% of “0” values (i.e., all the 30-m pixels within a 250-m pixel have no zero values pixels) were selected to develop the regression-tree model.The 7-day maximum value composites of 250-m MODIS NDVI for the year 2013 were obtained from the USGS expedited MODIS (eMODIS) data archive (https://lta.cr.usgs.gov/emodis). Pixels with bad quality, negative values, clouds, snow cover, and low view angles were filtered out based on the MODIS quality assurance data to ensure high quality eMODIS NDVI data. The 2013 weekly NDVI data were then stacked and temporally smoothed using a weighted least-squares approach to reduce additional atmospheric noise. Temporal smoothing helps to ensure reliable
Research and analyze of physical health using multiple regression analysis
T. S. Kyi
2014-01-01
Full Text Available This paper represents the research which is trying to create a mathematical model of the "healthy people" using the method of regression analysis. The factors are the physical parameters of the person (such as heart rate, lung capacity, blood pressure, breath holding, weight height coefficient, flexibility of the spine, muscles of the shoulder belt, abdominal muscles, squatting, etc.., and the response variable is an indicator of physical working capacity. After performing multiple regression analysis, obtained useful multiple regression models that can predict the physical performance of boys the aged of fourteen to seventeen years. This paper represents the development of regression model for the sixteen year old boys and analyzed results.
Tree-space statistics and approximations for large-scale analysis of anatomical trees.
Feragen, Aasa; Owen, Megan; Petersen, Jens; Wille, Mathilde M W; Thomsen, Laura H; Dirksen, Asger; de Bruijne, Marleen
2013-01-01
Statistical analysis of anatomical trees is hard to perform due to differences in the topological structure of the trees. In this paper we define statistical properties of leaf-labeled anatomical trees with geometric edge attributes by considering the anatomical trees as points in the geometric space of leaf-labeled trees. This tree-space is a geodesic metric space where any two trees are connected by a unique shortest path, which corresponds to a tree deformation. However, tree-space is not a manifold, and the usual strategy of performing statistical analysis in a tangent space and projecting onto tree-space is not available. Using tree-space and its shortest paths, a variety of statistical properties, such as mean, principal component, hypothesis testing and linear discriminant analysis can be defined. For some of these properties it is still an open problem how to compute them; others (like the mean) can be computed, but efficient alternatives are helpful in speeding up algorithms that use means iteratively, like hypothesis testing. In this paper, we take advantage of a very large dataset (N = 8016) to obtain computable approximations, under the assumption that the data trees parametrize the relevant parts of tree-space well. Using the developed approximate statistics, we illustrate how the structure and geometry of airway trees vary across a population and show that airway trees with Chronic Obstructive Pulmonary Disease come from a different distribution in tree-space than healthy ones. Software is available from http://image.diku.dk/aasa/software.php.
U.S. Geological Survey, Department of the Interior — Integrating spatially explicit biogeophysical and remotely sensed data into regression-tree models enables the spatial extrapolation of training data over large...
Schumacher, Phyllis; Olinsky, Alan; Quinn, John; Smith, Richard
2010-01-01
The authors extended previous research by 2 of the authors who conducted a study designed to predict the successful completion of students enrolled in an actuarial program. They used logistic regression to determine the probability of an actuarial student graduating in the major or dropping out. They compared the results of this study with those…
Schumacher, Phyllis; Olinsky, Alan; Quinn, John; Smith, Richard
2010-01-01
The authors extended previous research by 2 of the authors who conducted a study designed to predict the successful completion of students enrolled in an actuarial program. They used logistic regression to determine the probability of an actuarial student graduating in the major or dropping out. They compared the results of this study with those…
Regression Model Optimization for the Analysis of Experimental Data
Ulbrich, N.
2009-01-01
A candidate math model search algorithm was developed at Ames Research Center that determines a recommended math model for the multivariate regression analysis of experimental data. The search algorithm is applicable to classical regression analysis problems as well as wind tunnel strain gage balance calibration analysis applications. The algorithm compares the predictive capability of different regression models using the standard deviation of the PRESS residuals of the responses as a search metric. This search metric is minimized during the search. Singular value decomposition is used during the search to reject math models that lead to a singular solution of the regression analysis problem. Two threshold dependent constraints are also applied. The first constraint rejects math models with insignificant terms. The second constraint rejects math models with near-linear dependencies between terms. The math term hierarchy rule may also be applied as an optional constraint during or after the candidate math model search. The final term selection of the recommended math model depends on the regressor and response values of the data set, the user s function class combination choice, the user s constraint selections, and the result of the search metric minimization. A frequently used regression analysis example from the literature is used to illustrate the application of the search algorithm to experimental data.
Simulation Experiments in Practice : Statistical Design and Regression Analysis
Kleijnen, J.P.C.
2007-01-01
In practice, simulation analysts often change only one factor at a time, and use graphical analysis of the resulting Input/Output (I/O) data. Statistical theory proves that more information is obtained when applying Design Of Experiments (DOE) and linear regression analysis. Unfortunately, classic t
Evaluation Applications of Regression Analysis with Time-Series Data.
Veney, James E.
1993-01-01
The application of time series analysis is described, focusing on the use of regression analysis for analyzing time series in a way that may make it more readily available to an evaluation practice audience. Practical guidelines are suggested for decision makers in government, health, and social welfare agencies. (SLD)
The Analysis of the Regression-Discontinuity Design in R
Thoemmes, Felix; Liao, Wang; Jin, Ze
2017-01-01
This article describes the analysis of regression-discontinuity designs (RDDs) using the R packages rdd, rdrobust, and rddtools. We discuss similarities and differences between these packages and provide directions on how to use them effectively. We use real data from the Carolina Abecedarian Project to show how an analysis of an RDD can be…
Joint regression analysis and AMMI model applied to oat improvement
Oliveira, A.; Oliveira, T. A.; Mejza, S.
2012-09-01
In our work we present an application of some biometrical methods useful in genotype stability evaluation, namely AMMI model, Joint Regression Analysis (JRA) and multiple comparison tests. A genotype stability analysis of oat (Avena Sativa L.) grain yield was carried out using data of the Portuguese Plant Breeding Board, sample of the 22 different genotypes during the years 2002, 2003 and 2004 in six locations. In Ferreira et al. (2006) the authors state the relevance of the regression models and of the Additive Main Effects and Multiplicative Interactions (AMMI) model, to study and to estimate phenotypic stability effects. As computational techniques we use the Zigzag algorithm to estimate the regression coefficients and the agricolae-package available in R software for AMMI model analysis.
Ratio Versus Regression Analysis: Some Empirical Evidence in Brazil
Newton Carneiro Affonso da Costa Jr.
2004-06-01
Full Text Available This work compares the traditional methodology for ratio analysis, applied to a sample of Brazilian firms, with the alternative one of regression analysis both to cross-industry and intra-industry samples. It was tested the structural validity of the traditional methodology through a model that represents its analogous regression format. The data are from 156 Brazilian public companies in nine industrial sectors for the year 1997. The results provide weak empirical support for the traditional ratio methodology as it was verified that the validity of this methodology may differ between ratios.
Time series analysis using semiparametric regression on oil palm production
Yundari, Pasaribu, U. S.; Mukhaiyar, U.
2016-04-01
This paper presents semiparametric kernel regression method which has shown its flexibility and easiness in mathematical calculation, especially in estimating density and regression function. Kernel function is continuous and it produces a smooth estimation. The classical kernel density estimator is constructed by completely nonparametric analysis and it is well reasonable working for all form of function. Here, we discuss about parameter estimation in time series analysis. First, we consider the parameters are exist, then we use nonparametrical estimation which is called semiparametrical. The selection of optimum bandwidth is obtained by considering the approximation of Mean Integrated Square Root Error (MISE).
Analysis of Sting Balance Calibration Data Using Optimized Regression Models
Ulbrich, N.; Bader, Jon B.
2010-01-01
Calibration data of a wind tunnel sting balance was processed using a candidate math model search algorithm that recommends an optimized regression model for the data analysis. During the calibration the normal force and the moment at the balance moment center were selected as independent calibration variables. The sting balance itself had two moment gages. Therefore, after analyzing the connection between calibration loads and gage outputs, it was decided to choose the difference and the sum of the gage outputs as the two responses that best describe the behavior of the balance. The math model search algorithm was applied to these two responses. An optimized regression model was obtained for each response. Classical strain gage balance load transformations and the equations of the deflection of a cantilever beam under load are used to show that the search algorithm s two optimized regression models are supported by a theoretical analysis of the relationship between the applied calibration loads and the measured gage outputs. The analysis of the sting balance calibration data set is a rare example of a situation when terms of a regression model of a balance can directly be derived from first principles of physics. In addition, it is interesting to note that the search algorithm recommended the correct regression model term combinations using only a set of statistical quality metrics that were applied to the experimental data during the algorithm s term selection process.
BEAST: Bayesian evolutionary analysis by sampling trees
Drummond Alexei J
2007-11-01
Full Text Available Abstract Background The evolutionary analysis of molecular sequence variation is a statistical enterprise. This is reflected in the increased use of probabilistic models for phylogenetic inference, multiple sequence alignment, and molecular population genetics. Here we present BEAST: a fast, flexible software architecture for Bayesian analysis of molecular sequences related by an evolutionary tree. A large number of popular stochastic models of sequence evolution are provided and tree-based models suitable for both within- and between-species sequence data are implemented. Results BEAST version 1.4.6 consists of 81000 lines of Java source code, 779 classes and 81 packages. It provides models for DNA and protein sequence evolution, highly parametric coalescent analysis, relaxed clock phylogenetics, non-contemporaneous sequence data, statistical alignment and a wide range of options for prior distributions. BEAST source code is object-oriented, modular in design and freely available at http://beast-mcmc.googlecode.com/ under the GNU LGPL license. Conclusion BEAST is a powerful and flexible evolutionary analysis package for molecular sequence variation. It also provides a resource for the further development of new models and statistical methods of evolutionary analysis.
Fault tree analysis of KNICS RPS software
Park, Gee Yong; Kwon, Kee Choon [Korea Atomic Energy Research Institute, Daejeon (Korea, Republic of); Koh, Kwang Yong; Jee, Eun Kyoung; Seong, Poong Hyun [Korea Advanced Institute of Science and Technology, Daejeon (Korea, Republic of); Lee, Dae Hyung [Doosan Heavy Industries and Construction, Yongin (Korea, Republic of)
2008-08-15
This paper describes the application of a software Fault Tree Analysis (FTA) as one of the analysis techniques for a Software Safety Analysis (SSA) at the design phase and its analysis results for the safety-critical software of a digital reactor protection system, which is called the KNICS RPS, being developed in the KNICS (Korea Nuclear Instrumentation and Control Systems) project. The software modules in the design description were represented by Function Blocks (FBs), and the software FTA was performed based on the well-defined fault tree templates for the FBs. The SSA, which is part of the verification and validation (V and V) activities, was activated at each phase of the software lifecycle for the KNICS RPS. At the design phase, the software HAZOP (Hazard and Operability) and the software FTA were employed in the SSA in such a way that the software HAZOP was performed first and then the software FTA was applied. The software FTA was applied to some critical modules selected from the software HAZOP analysis.
Fuzzy Uncertainty Evaluation for Fault Tree Analysis
Kim, Ki Beom; Shim, Hyung Jin [Seoul National University, Seoul (Korea, Republic of); Jae, Moo Sung [Hanyang University, Seoul (Korea, Republic of)
2015-05-15
This traditional probabilistic approach can calculate relatively accurate results. However it requires a long time because of repetitive computation due to the MC method. In addition, when informative data for statistical analysis are not sufficient or some events are mainly caused by human error, the probabilistic approach may not be possible because uncertainties of these events are difficult to be expressed by probabilistic distributions. In order to reduce the computation time and quantify uncertainties of top events when basic events whose uncertainties are difficult to be expressed by probabilistic distributions exist, the fuzzy uncertainty propagation based on fuzzy set theory can be applied. In this paper, we develop a fuzzy uncertainty propagation code and apply the fault tree of the core damage accident after the large loss of coolant accident (LLOCA). The fuzzy uncertainty propagation code is implemented and tested for the fault tree of the radiation release accident. We apply this code to the fault tree of the core damage accident after the LLOCA in three cases and compare the results with those computed by the probabilistic uncertainty propagation using the MC method. The results obtained by the fuzzy uncertainty propagation can be calculated in relatively short time, covering the results obtained by the probabilistic uncertainty propagation.
Sparse Regression by Projection and Sparse Discriminant Analysis
Qi, Xin
2015-04-03
© 2015, © American Statistical Association, Institute of Mathematical Statistics, and Interface Foundation of North America. Recent years have seen active developments of various penalized regression methods, such as LASSO and elastic net, to analyze high-dimensional data. In these approaches, the direction and length of the regression coefficients are determined simultaneously. Due to the introduction of penalties, the length of the estimates can be far from being optimal for accurate predictions. We introduce a new framework, regression by projection, and its sparse version to analyze high-dimensional data. The unique nature of this framework is that the directions of the regression coefficients are inferred first, and the lengths and the tuning parameters are determined by a cross-validation procedure to achieve the largest prediction accuracy. We provide a theoretical result for simultaneous model selection consistency and parameter estimation consistency of our method in high dimension. This new framework is then generalized such that it can be applied to principal components analysis, partial least squares, and canonical correlation analysis. We also adapt this framework for discriminant analysis. Compared with the existing methods, where there is relatively little control of the dependency among the sparse components, our method can control the relationships among the components. We present efficient algorithms and related theory for solving the sparse regression by projection problem. Based on extensive simulations and real data analysis, we demonstrate that our method achieves good predictive performance and variable selection in the regression setting, and the ability to control relationships between the sparse components leads to more accurate classification. In supplementary materials available online, the details of the algorithms and theoretical proofs, and R codes for all simulation studies are provided.
Dynamic Event Tree Analysis Through RAVEN
A. Alfonsi; C. Rabiti; D. Mandelli; J. Cogliati; R. A. Kinoshita; A. Naviglio
2013-09-01
Conventional Event-Tree (ET) based methodologies are extensively used as tools to perform reliability and safety assessment of complex and critical engineering systems. One of the disadvantages of these methods is that timing/sequencing of events and system dynamics is not explicitly accounted for in the analysis. In order to overcome these limitations several techniques, also know as Dynamic Probabilistic Risk Assessment (D-PRA), have been developed. Monte-Carlo (MC) and Dynamic Event Tree (DET) are two of the most widely used D-PRA methodologies to perform safety assessment of Nuclear Power Plants (NPP). In the past two years, the Idaho National Laboratory (INL) has developed its own tool to perform Dynamic PRA: RAVEN (Reactor Analysis and Virtual control ENvironment). RAVEN has been designed in a high modular and pluggable way in order to enable easy integration of different programming languages (i.e., C++, Python) and coupling with other application including the ones based on the MOOSE framework, developed by INL as well. RAVEN performs two main tasks: 1) control logic driver for the new Thermo-Hydraulic code RELAP-7 and 2) post-processing tool. In the first task, RAVEN acts as a deterministic controller in which the set of control logic laws (user defined) monitors the RELAP-7 simulation and controls the activation of specific systems. Moreover, RAVEN also models stochastic events, such as components failures, and performs uncertainty quantification. Such stochastic modeling is employed by using both MC and DET algorithms. In the second task, RAVEN processes the large amount of data generated by RELAP-7 using data-mining based algorithms. This paper focuses on the first task and shows how it is possible to perform the analysis of dynamic stochastic systems using the newly developed RAVEN DET capability. As an example, the Dynamic PRA analysis, using Dynamic Event Tree, of a simplified pressurized water reactor for a Station Black-Out scenario is presented.
Regression analysis for solving diagnosis problem of children's health
Cherkashina, Yu A.; Gerget, O. M.
2016-04-01
The paper includes results of scientific researches. These researches are devoted to the application of statistical techniques, namely, regression analysis, to assess the health status of children in the neonatal period based on medical data (hemostatic parameters, parameters of blood tests, the gestational age, vascular-endothelial growth factor) measured at 3-5 days of children's life. In this paper a detailed description of the studied medical data is given. A binary logistic regression procedure is discussed in the paper. Basic results of the research are presented. A classification table of predicted values and factual observed values is shown, the overall percentage of correct recognition is determined. Regression equation coefficients are calculated, the general regression equation is written based on them. Based on the results of logistic regression, ROC analysis was performed, sensitivity and specificity of the model are calculated and ROC curves are constructed. These mathematical techniques allow carrying out diagnostics of health of children providing a high quality of recognition. The results make a significant contribution to the development of evidence-based medicine and have a high practical importance in the professional activity of the author.
Visual category recognition using Spectral Regression and Kernel Discriminant Analysis
Tahir, M.A.; Kittler, J.; Mikolajczyk, K.; Yan, F.; van de Sande, K.E.A.; Gevers, T.
2009-01-01
Visual category recognition (VCR) is one of the most important tasks in image and video indexing. Spectral methods have recently emerged as a powerful tool for dimensionality reduction and manifold learning. Recently, Spectral Regression combined with Kernel Discriminant Analysis (SR-KDA) has been s
Controlling the Type I Error Rate in Stepwise Regression Analysis.
Pohlmann, John T.
Three procedures used to control Type I error rate in stepwise regression analysis are forward selection, backward elimination, and true stepwise. In the forward selection method, a model of the dependent variable is formed by choosing the single best predictor; then the second predictor which makes the strongest contribution to the prediction of…
Regis Wendpouire Oubida
2015-03-01
Full Text Available Local adaptation to climate in temperate forest trees involves the integration of multiple physiological, morphological, and phenological traits. Latitudinal clines are frequently observed for these traits, but environmental constraints also track longitude and altitude. We combined extensive phenotyping of 12 candidate adaptive traits, multivariate regression trees, quantitative genetics, and a genome-wide panel of SNP markers to better understand the interplay among geography, climate, and adaptation to abiotic factors in Populus trichocarpa. Heritabilities were low to moderate (0.13 to 0.32 and population differentiation for many traits exceeded the 99th percentile of the genome-wide distribution of FST, suggesting local adaptation. When climate variables were taken as predictors and the 12 traits as response variables in a multivariate regression tree analysis, evapotranspiration (Eref explained the most variation, with subsequent splits related to mean temperature of the warmest month, frost-free period (FFP, and mean annual precipitation (MAP. These grouping matched relatively well the splits using geographic variables as predictors: the northernmost groups (short FFP and low Eref had the lowest growth, and lowest cold injury index; the southern British Columbia group (low Eref and intermediate temperatures had average growth and cold injury index; the group from the coast of California and Oregon (high Eref and FFP had the highest growth performance and the highest cold injury index; and the southernmost, high-altitude group (with high Eref and low FFP performed poorly, had high cold injury index, and lower water use efficiency. Taken together, these results suggest variation in both temperature and water availability across the range shape multivariate adaptive traits in poplar.
王静; 郑群; 余素飞; 冯贻君; 沈波
2015-01-01
目的 采用Logistic回归筛选与宫颈癌相关的血清肿瘤标志物,并进一步使用分类树卡方自动交互检测法(CHAID)建立鳞状上皮细胞癌相关抗原(Scc)在宫颈癌中的辅助诊断模型.方法 回顾性收集2010至2013年浙江省台州医院检测肿瘤标志物的宫颈癌初诊患者581例,宫颈良性疾病者342例,健康体检者341名,检测其糖类抗原199(CA199)、糖类抗原125(CA125)、CEA、SCC、AFP水平.先采用Logistic回归筛选出有统计学意义的肿瘤标志物,再进一步采用决策树CHAID法确定上述肿瘤标志物在辅助诊断宫颈癌中的价值.最后收集2014年1至12月SCC高于本研究得出的诊断值的子宫相关疾病患者共284例,计算其中的宫颈癌患者比例来验证决策树CHAID法结果.结果 Logistic回归结果显示5类可能与宫颈癌相关的肿瘤标志物中仅SCC具有统计学意义(Wald x2=22.120,P=0.000),OR值及其95％ CI为1.900(1.454 ～2.483).随着SCC数值的升高,宫颈癌患者的比例也逐渐增高,当SCC＞2.20 μg/L时,阳性预测值达94.7％.284例SCC高于2.20 μg/L的考虑子宫相关疾病的人群中,最终证实为宫颈癌的比例为95.1％(270例).结论 SCC对于官颈癌患者具有较好的辅助诊断价值.%Objective To explore the relationship between serum tumor markers and cervical cancer by using Logistic regression, and to further establish the diagnosis model of squamous cell carcinoma antigen (SCC) in cervical cancer by using chi-squared automatic interaction detector (CHAID) analysis of decision tree.Methods Total of 581 cases of cervical cancer,342 cases of cervical benign diseases and 341 cases of healthy controls who detected tumor markers in Taizhou Hospital of Zhejiang during 2010-2013, were retrospectively studied.The test results of carbohydrate antigen 199 (CA199), carbohydrate antigen 125 (CA125), carcinoembryonic antigen (CEA), SCC, and alpha fetoprotein (AFP) were reviewed.The Logistic regression were
Bennema, S C; Molento, M B; Scholte, R G; Carvalho, O S; Pritsch, I
2017-11-01
Fascioliasis is a condition caused by the trematode Fasciola hepatica. In this paper, the spatial distribution of F. hepatica in bovines in Brazil was modelled using a decision tree approach and a logistic regression, combined with a geographic information system (GIS) query. In the decision tree and the logistic model, isothermality had the strongest influence on disease prevalence. Also, the 50-year average precipitation in the warmest quarter of the year was included as a risk factor, having a negative influence on the parasite prevalence. The risk maps developed using both techniques, showed a predicted higher prevalence mainly in the South of Brazil. The prediction performance seemed to be high, but both techniques failed to reach a high accuracy in predicting the medium and high prevalence classes to the entire country. The GIS query map, based on the range of isothermality, minimum temperature of coldest month, precipitation of warmest quarter of the year, altitude and the average dailyland surface temperature, showed a possibility of presence of F. hepatica in a very large area. The risk maps produced using these methods can be used to focus activities of animal and public health programmes, even on non-evaluated F. hepatica areas.
MULTIPLE REGRESSION ANALYSIS OF MAIN ECONOMIC INDICATORS IN TOURISM
Erika KULCSÁR
2009-12-01
Full Text Available This paper analysis the measure between GDP dependent variable in the sector of hotels and restaurants and the following independent variables: overnight stays in the establishments of touristic reception, arrivals in the establishments of touristic reception and investments in hotels and restaurants sector in the period of analysis 1995-2007. With the multiple regression analysis I found that investments and tourist arrivals are significant predictors for the GDP dependent variable. Based on these results, I identified those components of the marketing mix, which in my opinion require investment, which could contribute to the positive development of tourist arrivals in the establishments of touristic reception.
MULTIPLE REGRESSION ANALYSIS OF MAIN ECONOMIC INDICATORS IN TOURISM
Erika KULCSÁR
2009-12-01
Full Text Available This paper analysis the measure between GDP dependent variable in the sector of hotels and restaurants and the following independent variables: overnight stays in the establishments of touristic reception, arrivals in the establishments of touristic reception and investments in hotels and restaurants sector in the period of analysis 1995-2007. With the multiple regression analysis I found that investments and tourist arrivals are significant predictors for the GDP dependent variable. Based on these results, I identified those components of the marketing mix, which in my opinion require investment, which could contribute to the positive development of tourist arrivals in the establishments of touristic reception.
Principal regression analysis and the index leverage effect
Reigneron, Pierre-Alain; Allez, Romain; Bouchaud, Jean-Philippe
2011-09-01
We revisit the index leverage effect, that can be decomposed into a volatility effect and a correlation effect. We investigate the latter using a matrix regression analysis, that we call ‘Principal Regression Analysis' (PRA) and for which we provide some analytical (using Random Matrix Theory) and numerical benchmarks. We find that downward index trends increase the average correlation between stocks (as measured by the most negative eigenvalue of the conditional correlation matrix), and makes the market mode more uniform. Upward trends, on the other hand, also increase the average correlation between stocks but rotates the corresponding market mode away from uniformity. There are two time scales associated to these effects, a short one on the order of a month (20 trading days), and a longer time scale on the order of a year. We also find indications of a leverage effect for sectorial correlations as well, which reveals itself in the second and third mode of the PRA.
Fault tree analysis for urban flooding.
ten Veldhuis, J A E; Clemens, F H L R; van Gelder, P H A J M
2009-01-01
Traditional methods to evaluate flood risk generally focus on heavy storm events as the principal cause of flooding. Conversely, fault tree analysis is a technique that aims at modelling all potential causes of flooding. It quantifies both overall flood probability and relative contributions of individual causes of flooding. This paper presents a fault model for urban flooding and an application to the case of Haarlem, a city of 147,000 inhabitants. Data from a complaint register, rainfall gauges and hydrodynamic model calculations are used to quantify probabilities of basic events in the fault tree. This results in a flood probability of 0.78/week for Haarlem. It is shown that gully pot blockages contribute to 79% of flood incidents, whereas storm events contribute only 5%. This implies that for this case more efficient gully pot cleaning is a more effective strategy to reduce flood probability than enlarging drainage system capacity. Whether this is also the most cost-effective strategy can only be decided after risk assessment has been complemented with a quantification of consequences of both types of events. To do this will be the next step in this study.
Horner, Stacy B; Fireman, Gary D; Wang, Eugene W
2010-04-01
Peer nominations and demographic information were collected from a diverse sample of 1493 elementary school participants to examine behavior (overt and relational aggression, impulsivity, and prosociality), context (peer status), and demographic characteristics (race and gender) as predictors of teacher and administrator decisions about discipline. Exploratory results using classification tree analyses indicated students nominated as average or highly overtly aggressive were more likely to be disciplined than others. Among these students, race was the most significant predictor, with African American students more likely to be disciplined than Caucasians, Hispanics, or Others. Among the students nominated as low in overt aggression, a lack of prosocial behavior was the most significant predictor. Confirmatory analysis using hierarchical logistic regression supported the exploratory results. Similarities with other biased referral patterns, proactive classroom management strategies, and culturally sensitive recommendations are discussed.
Poisson Regression Analysis of Illness and Injury Surveillance Data
Frome E.L., Watkins J.P., Ellis E.D.
2012-12-12
The Department of Energy (DOE) uses illness and injury surveillance to monitor morbidity and assess the overall health of the work force. Data collected from each participating site include health events and a roster file with demographic information. The source data files are maintained in a relational data base, and are used to obtain stratified tables of health event counts and person time at risk that serve as the starting point for Poisson regression analysis. The explanatory variables that define these tables are age, gender, occupational group, and time. Typical response variables of interest are the number of absences due to illness or injury, i.e., the response variable is a count. Poisson regression methods are used to describe the effect of the explanatory variables on the health event rates using a log-linear main effects model. Results of fitting the main effects model are summarized in a tabular and graphical form and interpretation of model parameters is provided. An analysis of deviance table is used to evaluate the importance of each of the explanatory variables on the event rate of interest and to determine if interaction terms should be considered in the analysis. Although Poisson regression methods are widely used in the analysis of count data, there are situations in which over-dispersion occurs. This could be due to lack-of-fit of the regression model, extra-Poisson variation, or both. A score test statistic and regression diagnostics are used to identify over-dispersion. A quasi-likelihood method of moments procedure is used to evaluate and adjust for extra-Poisson variation when necessary. Two examples are presented using respiratory disease absence rates at two DOE sites to illustrate the methods and interpretation of the results. In the first example the Poisson main effects model is adequate. In the second example the score test indicates considerable over-dispersion and a more detailed analysis attributes the over-dispersion to extra
Analysis of Logic Programs Using Regular Tree Languages
Gallagher, John Patrick
2012-01-01
The eld of nite tree automata provides fundamental notations and tools for reasoning about set of terms called regular or recognizable tree languages. We consider two kinds of analysis using regular tree languages, applied to logic programs. The rst approach is to try to discover automatically a ...... to the analysis is a program and a tree automaton, and the output is an abstract model of the program. These two contrasting abstract interpretations can be used in a wide range of analysis and verication problems.......The eld of nite tree automata provides fundamental notations and tools for reasoning about set of terms called regular or recognizable tree languages. We consider two kinds of analysis using regular tree languages, applied to logic programs. The rst approach is to try to discover automatically...... a tree automaton from a logic program, approximating its minimal Herbrand model. In this case the input for the analysis is a program, and the output is a tree automaton. The second approach is to expose or check properties of the program that can be expressed by a given tree automaton. The input...
A regressed phase analysis for coupled joint systems.
Wininger, Michael
2011-01-01
This study aims to address shortcomings of the relative phase analysis, a widely used method for assessment of coupling among joints of the lower limb. Goniometric data from 15 individuals with spastic diplegic cerebral palsy were recorded from the hip and knee joints during ambulation on a flat surface, and from a single healthy individual with no known motor impairment, over at least 10 gait cycles. The minimum relative phase (MRP) revealed substantial disparity in the timing and severity of the instance of maximum coupling, depending on which reference frame was selected: MRP(knee-hip) differed from MRP(hip-knee) by 16.1±14% of gait cycle and 50.6±77% difference in scale. Additionally, several relative phase portraits contained discontinuities which may contribute to error in phase feature extraction. These vagaries can be attributed to the predication of relative phase analysis on a transformation into the velocity-position phase plane, and the extraction of phase angle by the discontinuous arc-tangent operator. Here, an alternative phase analysis is proposed, wherein kinematic data is transformed into a profile of joint coupling across the entire gait cycle. By comparing joint velocities directly via a standard linear regression in the velocity-velocity phase plane, this regressed phase analysis provides several key advantages over relative phase analysis including continuity, commutativity between reference frames, and generalizability to many-joint systems.
Forecasting urban water demand: A meta-regression analysis.
Sebri, Maamar
2016-12-01
Water managers and planners require accurate water demand forecasts over the short-, medium- and long-term for many purposes. These range from assessing water supply needs over spatial and temporal patterns to optimizing future investments and planning future allocations across competing sectors. This study surveys the empirical literature on the urban water demand forecasting using the meta-analytical approach. Specifically, using more than 600 estimates, a meta-regression analysis is conducted to identify explanations of cross-studies variation in accuracy of urban water demand forecasting. Our study finds that accuracy depends significantly on study characteristics, including demand periodicity, modeling method, forecasting horizon, model specification and sample size. The meta-regression results remain robust to different estimators employed as well as to a series of sensitivity checks performed. The importance of these findings lies in the conclusions and implications drawn out for regulators and policymakers and for academics alike. Copyright © 2016. Published by Elsevier Ltd.
Epistasis analysis for quantitative traits by functional regression model.
Zhang, Futao; Boerwinkle, Eric; Xiong, Momiao
2014-06-01
The critical barrier in interaction analysis for rare variants is that most traditional statistical methods for testing interactions were originally designed for testing the interaction between common variants and are difficult to apply to rare variants because of their prohibitive computational time and poor ability. The great challenges for successful detection of interactions with next-generation sequencing (NGS) data are (1) lack of methods for interaction analysis with rare variants, (2) severe multiple testing, and (3) time-consuming computations. To meet these challenges, we shift the paradigm of interaction analysis between two loci to interaction analysis between two sets of loci or genomic regions and collectively test interactions between all possible pairs of SNPs within two genomic regions. In other words, we take a genome region as a basic unit of interaction analysis and use high-dimensional data reduction and functional data analysis techniques to develop a novel functional regression model to collectively test interactions between all possible pairs of single nucleotide polymorphisms (SNPs) within two genome regions. By intensive simulations, we demonstrate that the functional regression models for interaction analysis of the quantitative trait have the correct type 1 error rates and a much better ability to detect interactions than the current pairwise interaction analysis. The proposed method was applied to exome sequence data from the NHLBI's Exome Sequencing Project (ESP) and CHARGE-S study. We discovered 27 pairs of genes showing significant interactions after applying the Bonferroni correction (P-values < 4.58 × 10(-10)) in the ESP, and 11 were replicated in the CHARGE-S study.
Residential behavioural energy savings : a meta-regression analysis
Tiedemann, K.H. [BC Hydro, Burnaby, BC (Canada)
2009-07-01
Increasing attention is being given to opportunities for residential energy behavioural savings, as developed countries attempt to reduce energy use and greenhouse gas emissions. Several utility companies have undertaken pilot programs geared at understanding which interventions are most effective in reducing residential energy consumption through behavioural change. This paper presented the first metaregression analysis of residential energy behavioural savings. This study focused on interventions which affected household energy-related behaviours and as a result, affected household energy use. The paper described rational choice theory, the theory of planned behaviour, and the integration of rational choice theory and the adjusted expectancy values theory in a simple framework. The paper also discussed the review of various social, psychological and economics journals and databases. The results of the studies were presented. A basic concept in meta-regression analysis is the effects size which is defined as the program effect divided by the standard error of the program effect. A lengthy review of the literature found twenty-eight treatments from ten experiments for which an effect size could be calculated. The experiments involved classifying treatments according to whether the interventions were information, goal setting, feedback, rewards or combinations of these interventions. The impact of these alternative interventions on the effect size was then modelled using White's robust regression. Five regression models were compared on the basis of the Akaike's information criterion. It was found that model 5, which used all of the regressors, was the preferred model. It was concluded that the theory of planned behaviour is more appropriate in the context of analysis of behavioural change and energy use. 21 refs., 4 tabs.
Meta-regression Analysis of the Chinese Labor Reallocation Effect
Longhua; YUE; Shiyan; YANG; Rongtai; SHEN
2013-01-01
Meta regression analysis method was applied to study 23 papers about the effect of Chinese labor reallocation on the economic growth. The results showed that both the method of the World Bank (1996) or M.Syrquin(1986) had little impact on the results, while the calculation of the stock of physical capital had a positive impact on the results. The result by using panel data study was bigger than results obtained in the time series data. The time span had little influences on the results. Therefore, it was necessary to measure the exact stock of physical capital in China, so as to evaluate the Chinese labor reallocation effect
Multivariate study and regression analysis of gluten-free granola
Lilian Maria Pagamunici
2014-03-01
Full Text Available This study developed a gluten-free granola and evaluated it during storage with the application of multivariate and regression analysis of the sensory and instrumental parameters. The physicochemical, sensory, and nutritional characteristics of a product containing quinoa, amaranth and linseed were evaluated. The crude protein and lipid contents ranged from 97.49 and 122.72 g kg-1 of food, respectively. The polyunsaturated/saturated, and n-6:n-3 fatty acid ratios ranged from 2.82 and 2.59:1, respectively. Granola had the best alpha-linolenic acid content, nutritional indices in the lipid fraction, and mineral content. There were good hygienic and sanitary conditions during storage; probably due to the low water activity of the formulation, which contributed to inhibit microbial growth. The sensory attributes ranged from 'like very much' to 'like slightly', and the regression models were highly fitted and correlated during the storage period. A reduction in the sensory attribute levels and in the product physical stabilisation was verified by principal component analysis. The use of the affective test acceptance and instrumental analysis combined with statistical methods allowed us to obtain promising results about the characteristics of gluten-free granola.
Kanungo, D. P.; Sharma, Shaifaly; Pain, Anindya
2014-09-01
The shear strength parameters of soil (cohesion and angle of internal friction) are quite essential in solving many civil engineering problems. In order to determine these parameters, laboratory tests are used. The main objective of this work is to evaluate the potential of Artificial Neural Network (ANN) and Regression Tree (CART) techniques for the indirect estimation of these parameters. Four different models, considering different combinations of 6 inputs, such as gravel %, sand %, silt %, clay %, dry density, and plasticity index, were investigated to evaluate the degree of their effects on the prediction of shear parameters. A performance evaluation was carried out using Correlation Coefficient and Root Mean Squared Error measures. It was observed that for the prediction of friction angle, the performance of both the techniques is about the same. However, for the prediction of cohesion, the ANN technique performs better than the CART technique. It was further observed that the model considering all of the 6 input soil parameters is the most appropriate model for the prediction of shear parameters. Also, connection weight and bias analyses of the best neural network (i.e., 6/2/2) were attempted using Connection Weight, Garson, and proposed Weight-bias approaches to characterize the influence of input variables on shear strength parameters. It was observed that the Connection Weight Approach provides the best overall methodology for accurately quantifying variable importance, and should be favored over the other approaches examined in this study.
Neela Deshpande
2014-12-01
Full Text Available In the recent past Artificial Neural Networks (ANN have emerged out as a promising technique for predicting compressive strength of concrete. In the present study back propagation was used to predict the 28 day compressive strength of recycled aggregate concrete (RAC along with two other data driven techniques namely Model Tree (MT and Non-linear Regression (NLR. Recycled aggregate is the current need of the hour owing to its environmental friendly aspect of re-use of the construction waste. The study observed that, prediction of 28 day compressive strength of RAC was done better by ANN than NLR and MT. The input parameters were cubic meter proportions of Cement, Natural fine aggregate, Natural coarse Aggregates, recycled aggregates, Admixture and Water (also called as raw data. The study also concluded that ANN performs better when non-dimensional parameters like Sand–Aggregate ratio, Water–total materials ratio, Aggregate–Cement ratio, Water–Cement ratio and Replacement ratio of natural aggregates by recycled aggregates, were used as additional input parameters. Study of each network developed using raw data and each non dimensional parameter facilitated in studying the impact of each parameter on the performance of the models developed using ANN, MT and NLR as well as performance of the ANN models developed with limited number of inputs. The results indicate that ANN learn from the examples and grasp the fundamental domain rules governing strength of concrete.
LI Chang-ping; ZHI Xin-yue; MA Jun; CUI Zhuang; ZHU Zi-long; ZHANG Cui; HU Liang-ping
2012-01-01
Background Various methods can be applied to build predictive models for the clinical data with binary outcome variable.This research aims to explore the process of constructing common predictive models,Logistic regression (LR),decision tree (DT) and multilayer perceptron (MLP),as well as focus on specific details when applying the methods mentioned above:what preconditions should be satisfied,how to set parameters of the model,how to screen variables and build accuracy models quickly and efficiently,and how to assess the generalization ability (that is,prediction performance) reliably by Monte Carlo method in the case of small sample size.Methods All the 274 patients (include 137 type 2 diabetes mellitus with diabetic peripheral neuropathy and 137 type 2 diabetes mellitus without diabetic peripheral neuropathy) from the Metabolic Disease Hospital in Tianjin participated in the study.There were 30 variables such as sex,age,glycosylated hemoglobin,etc.On account of small sample size,the classification and regression tree (CART) with the chi-squared automatic interaction detector tree (CHAID) were combined by means of the 100 times 5-7 fold stratified cross-validation to build DT.The MLP was constructed by Schwarz Bayes Criterion to choose the number of hidden layers and hidden layer units,alone with levenberg-marquardt (L-M) optimization algorithm,weight decay and preliminary training method.Subsequently,LR was applied by the best subset method with the Akaike Information Criterion (AIC) to make the best used of information and avoid overfitting.Eventually,a 10 to 100 times 3-10 fold stratified cross-validation method was used to compare the generalization ability of DT,MLP and LR in view of the areas under the receiver operating characteristic (ROC) curves (AUC).Results The AUC of DT,MLP and LR were 0.8863,0.8536 and 0.8802,respectively.As the larger the AUC of a specific prediction model is,the higher diagnostic ability presents,MLP performed optimally,and then
Regression analysis application for designing the vibration dampers
A. V. Ivanov
2014-01-01
Full Text Available Multi-frequency vibration dampers protect air power lines and fiber optic communication channels against Aeolian vibrations. To have a maximum efficiency the natural frequencies of dampers should be evenly distributed over the entire operating frequency range from 3 to 150 Hz. A traditional approach to damper design is to investigate damper features using the fullscale models. As a result, a conclusion on the damper capabilities is drawn, and design changes are made to achieve the required natural frequencies. The article describes a direct optimization method to design dampers.This method leads to a clear-cut definition of geometrical and mass parameters of dampers by their natural frequencies. The direct designing method is based on the active plan and design experiment.Based on regression analysis, a regression model is obtained as a second order polynomial to establish unique relation between the input (element dimensions, the weights of cargos and the output (natural frequencies design parameters. Different problems of designing dampers are considered using developed regression models.As a result, it has been found that a satisfactory accuracy of mathematical models, relating the input designing parameters to the output ones, is achieved. Depending on the number of input parameters and the nature of the restrictions a statement of designing purpose, including an optimization one, can be different when restrictions for design parameters are to meet the conflicting requirements.A proposed optimization method to solve a direct designing problem allows us to determine directly the damper element dimensions for any natural frequencies, and at the initial stage of the analysis, based on the methods of nonlinear programming, to disclose problems with no solution.The developed approach can be successfully applied to design various mechanical systems with complicated nonlinear interactions between the input and output parameters.
Yingxin Gu
2016-11-01
Full Text Available Regression tree models have been widely used for remote sensing-based ecosystem mapping. Improper use of the sample data (model training and testing data may cause overfitting and underfitting effects in the model. The goal of this study is to develop an optimal sampling data usage strategy for any dataset and identify an appropriate number of rules in the regression tree model that will improve its accuracy and robustness. Landsat 8 data and Moderate-Resolution Imaging Spectroradiometer-scaled Normalized Difference Vegetation Index (NDVI were used to develop regression tree models. A Python procedure was designed to generate random replications of model parameter options across a range of model development data sizes and rule number constraints. The mean absolute difference (MAD between the predicted and actual NDVI (scaled NDVI, value from 0–200 and its variability across the different randomized replications were calculated to assess the accuracy and stability of the models. In our case study, a six-rule regression tree model developed from 80% of the sample data had the lowest MAD (MADtraining = 2.5 and MADtesting = 2.4, which was suggested as the optimal model. This study demonstrates how the training data and rule number selections impact model accuracy and provides important guidance for future remote-sensing-based ecosystem modeling.
Gu, Yingxin; Wylie, Bruce K.; Boyte, Stephen; Picotte, Joshua J.; Howard, Danny; Smith, Kelcy; Nelson, Kurtis
2016-01-01
Regression tree models have been widely used for remote sensing-based ecosystem mapping. Improper use of the sample data (model training and testing data) may cause overfitting and underfitting effects in the model. The goal of this study is to develop an optimal sampling data usage strategy for any dataset and identify an appropriate number of rules in the regression tree model that will improve its accuracy and robustness. Landsat 8 data and Moderate-Resolution Imaging Spectroradiometer-scaled Normalized Difference Vegetation Index (NDVI) were used to develop regression tree models. A Python procedure was designed to generate random replications of model parameter options across a range of model development data sizes and rule number constraints. The mean absolute difference (MAD) between the predicted and actual NDVI (scaled NDVI, value from 0–200) and its variability across the different randomized replications were calculated to assess the accuracy and stability of the models. In our case study, a six-rule regression tree model developed from 80% of the sample data had the lowest MAD (MADtraining = 2.5 and MADtesting = 2.4), which was suggested as the optimal model. This study demonstrates how the training data and rule number selections impact model accuracy and provides important guidance for future remote-sensing-based ecosystem modeling.
López-Serrano PM
2016-04-01
Full Text Available The Sierra Madre Occidental mountain range (Durango, Mexico is of great ecological interest because of the high degree of environmental heterogeneity in the area. The objective of the present study was to estimate the biomass of mixed and uneven-aged forests in the Sierra Madre Occidental by using Landsat-5 TM spectral data and forest inventory data. We used the ATCOR3® atmospheric and topographic correction module to convert remotely sensed imagery digital signals to surface reflectance values. The usual approach of modeling stand variables by using multiple linear regression was compared with a hybrid model developed in two steps: in the first step a regression tree was used to obtain an initial classification of homogeneous biomass groups, and multiple linear regression models were then fitted to each node of the pruned regression tree. Cross-validation of the hybrid model explained 72.96% of the observed stand biomass variation, with a reduction in the RMSE of 25.47% with respect to the estimates yielded by the linear model fitted to the complete database. The most important variables for the binary classification process in the regression tree were the albedo, the corrected readings of the short-wave infrared band of the satellite (2.08-2.35 µm and the topographic moisture index. We used the model output to construct a map for estimating biomass in the study area, which yielded values of between 51 and 235 Mg ha-1. The use of regression trees in combination with stepwise regression of corrected satellite imagery proved a reliable method for estimating forest biomass.
Optimum short-time polynomial regression for signal analysis
A SREENIVASA MURTHY; CHANDRA SEKHAR SEELAMANTULA; T V SREENIVAS
2016-11-01
We propose a short-time polynomial regression (STPR) for time-varying signal analysis. The advantage of using polynomials is that the notion of a spectrum is not needed and the signals can be analyzed in the time domain over short durations. In the presence of noise, such modeling becomes important, because the polynomial approximation performs smoothing leading to noise suppression. The problem of optimal smoothingdepends on the duration over which a fixed-order polynomial regression is performed. Considering the STPR of a noisy signal, we derive the optimal smoothing window by minimizing the mean-square error (MSE). For a fixed polynomial order, the smoothing window duration depends on the rate of signal variation, which, in turn,depends on its derivatives. Since the derivatives are not available a priori, exact optimization is not feasible.However, approximate optimization can be achieved using only the variance expressions and the intersection-ofconfidence-intervals (ICI) technique. The ICI technique is based on a consistency measure across confidence intervals corresponding to different window lengths. An approximate asymptotic analysis to determine the optimal confidence interval width shows that the asymptotic expressions are the same irrespective of whether one starts with a uniform sampling grid or a nonuniform one. Simulation results on sinusoids, chirps, and electrocardiogram (ECG) signals, and comparisons with standard wavelet denoising techniques, show that theproposed method is robust particularly in the low signal-to-noise ratio regime.
DFTCalc: Reliability centered maintenance via fault tree analysis (tool paper)
Guck, Dennis; Spel, Jip; Stoelinga, Mariëlle Ida Antoinette; Butler, Michael; Conchon, Sylvain; Zaïdi, Fatiha
2015-01-01
Reliability, availability, maintenance and safety (RAMS) analysis is essential in the evaluation of safety critical systems like nuclear power plants and the railway infrastructure. A widely used methodology within RAMS analysis are fault trees, representing failure propagations throughout a system.
DFTCalc: reliability centered maintenance via fault tree analysis (tool paper)
Guck, Dennis; Spel, Jip; Stoelinga, Mariëlle; Butler, Michael; Conchon, Sylvain; Zaïdi, Fatiha
2015-01-01
Reliability, availability, maintenance and safety (RAMS) analysis is essential in the evaluation of safety critical systems like nuclear power plants and the railway infrastructure. A widely used methodology within RAMS analysis are fault trees, representing failure propagations throughout a system.
Building and Analysis of Computer Aided Fuze Fault Tree
王亚斌; 刘明杰; 谭惠民
2004-01-01
A common software to analyze fuze fault tree is developed to simplify the trivialness in generating the fuze fault tree and reduce the manual calculation work. The overall structure, function and implementation of the system are introduced. The software based on Windows platform is used to generate the fuze fault tree in graphics mode. A quantitative analysis of fuze fault tree can be obtained by the method of minimum cut sets. A calculation example is used to verify the function of the software. Consequently, the expected requirements of this software system are achieved to a certain level.
Polygraph Test Results Assessment by Regression Analysis Methods
K. A. Leontiev
2014-01-01
Full Text Available The paper considers a problem of defining the importance of asked questions for the examinee under judicial and psychophysiological polygraph examination by methods of mathematical statistics. It offers the classification algorithm based on the logistic regression as an optimum Bayesian classifier, considering weight coefficients of information for the polygraph-recorded physiological parameters with no condition for independence of the measured signs.Actually, binary classification is executed by results of polygraph examination with preliminary normalization and standardization of primary results, with check of a hypothesis that distribution of obtained data is normal, as well as with calculation of coefficients of linear regression between input values and responses by method of maximum likelihood. Further, the logistic curve divided signs into two classes of the "significant" and "insignificant" type.Efficiency of model is estimated by means of the ROC analysis (Receiver Operator Characteristics. It is shown that necessary minimum sample has to contain results of 45 measurements at least. This approach ensures a reliable result provided that an expert-polygraphologist possesses sufficient qualification and follows testing techniques.
Jaime Lynn Speiser
Full Text Available Assessing prognosis for acetaminophen-induced acute liver failure (APAP-ALF patients often presents significant challenges. King's College (KCC has been validated on hospital admission, but little has been published on later phases of illness. We aimed to improve determinations of prognosis both at the time of and following admission for APAP-ALF using Classification and Regression Tree (CART models.CART models were applied to US ALFSG registry data to predict 21-day death or liver transplant early (on admission and post-admission (days 3-7 for 803 APAP-ALF patients enrolled 01/1998-09/2013. Accuracy in prediction of outcome (AC, sensitivity (SN, specificity (SP, and area under receiver-operating curve (AUROC were compared between 3 models: KCC (INR, creatinine, coma grade, pH, CART analysis using only KCC variables (KCC-CART and a CART model using new variables (NEW-CART.Traditional KCC yielded 69% AC, 90% SP, 27% SN, and 0.58 AUROC on admission, with similar performance post-admission. KCC-CART at admission offered predictive 66% AC, 65% SP, 67% SN, and 0.74 AUROC. Post-admission, KCC-CART had predictive 82% AC, 86% SP, 46% SN and 0.81 AUROC. NEW-CART models using MELD (Model for end stage liver disease, lactate and mechanical ventilation on admission yielded predictive 72% AC, 71% SP, 77% SN and AUROC 0.79. For later stages, NEW-CART (MELD, lactate, coma grade offered predictive AC 86%, SP 91%, SN 46%, AUROC 0.73.CARTs offer simple prognostic models for APAP-ALF patients, which have higher AUROC and SN than KCC, with similar AC and negligibly worse SP. Admission and post-admission predictions were developed.• Prognostication in acetaminophen-induced acute liver failure (APAP-ALF is challenging beyond admission • Little has been published regarding the use of King's College Criteria (KCC beyond admission and KCC has shown limited sensitivity in subsequent studies • Classification and Regression Tree (CART methodology allows the
Al-Khaja, Nawal
2007-01-01
This is a thematic lesson plan for young learners about palm trees and the importance of taking care of them. The two part lesson teaches listening, reading and speaking skills. The lesson includes parts of a tree; the modal auxiliary, can; dialogues and a role play activity.
Dhanya, S; Kumari Roshni, V S
2016-01-01
Textures play an important role in image classification. This paper proposes a high performance texture classification method using a combination of multiresolution analysis tool and linear regression modelling by channel elimination. The correlation between different frequency regions has been validated as a sort of effective texture characteristic. This method is motivated by the observation that there exists a distinctive correlation between the image samples belonging to the same kind of texture, at different frequency regions obtained by a wavelet transform. Experimentally, it is observed that this correlation differs across textures. The linear regression modelling is employed to analyze this correlation and extract texture features that characterize the samples. Our method considers not only the frequency regions but also the correlation between these regions. This paper primarily focuses on applying the Dual Tree Complex Wavelet Packet Transform and the Linear Regression model for classification of the obtained texture features. Additionally the paper also presents a comparative assessment of the classification results obtained from the above method with two more types of wavelet transform methods namely the Discrete Wavelet Transform and the Discrete Wavelet Packet Transform.
A Visual Analytics Approach for Correlation, Classification, and Regression Analysis
Steed, Chad A [ORNL; SwanII, J. Edward [Mississippi State University (MSU); Fitzpatrick, Patrick J. [Mississippi State University (MSU); Jankun-Kelly, T.J. [Mississippi State University (MSU)
2012-02-01
New approaches that combine the strengths of humans and machines are necessary to equip analysts with the proper tools for exploring today's increasing complex, multivariate data sets. In this paper, a novel visual data mining framework, called the Multidimensional Data eXplorer (MDX), is described that addresses the challenges of today's data by combining automated statistical analytics with a highly interactive parallel coordinates based canvas. In addition to several intuitive interaction capabilities, this framework offers a rich set of graphical statistical indicators, interactive regression analysis, visual correlation mining, automated axis arrangements and filtering, and data classification techniques. The current work provides a detailed description of the system as well as a discussion of key design aspects and critical feedback from domain experts.
Cardiorespiratory fitness and laboratory stress: a meta-regression analysis.
Jackson, Erica M; Dishman, Rod K
2006-01-01
We performed a meta-regression analysis of 73 studies that examined whether cardiorespiratory fitness mitigates cardiovascular responses during and after acute laboratory stress in humans. The cumulative evidence indicates that fitness is related to slightly greater reactivity, but better recovery. However, effects varied according to several study features and were smallest in the better controlled studies. Fitness did not mitigate integrated stress responses such as heart rate and blood pressure, which were the focus of most of the studies we reviewed. Nonetheless, potentially important areas, particularly hemodynamic and vascular responses, have been understudied. Women, racial/ethnic groups, and cardiovascular patients were underrepresented. Randomized controlled trials, including naturalistic studies of real-life responses, are needed to clarify whether a change in fitness alters putative stress mechanisms linked with cardiovascular health.
Multivariate Regression Analysis of Gravitational Waves from Rotating Core Collapse
Engels, William J; Ott, Christian D
2014-01-01
We present a new multivariate regression model for analysis and parameter estimation of gravitational waves observed from well but not perfectly modeled sources such as core-collapse supernovae. Our approach is based on a principal component decomposition of simulated waveform catalogs. Instead of reconstructing waveforms by direct linear combination of physically meaningless principal components, we solve via least squares for the relationship that encodes the connection between chosen physical parameters and the principal component basis. Although our approach is linear, the waveforms' parameter dependence may be non-linear. For the case of gravitational waves from rotating core collapse, we show, using statistical hypothesis testing, that our method is capable of identifying the most important physical parameters that govern waveform morphology in the presence of simulated detector noise. We also demonstrate our method's ability to predict waveforms from a principal component basis given a set of physical ...
Spatial regression analysis of traffic crashes in Seoul.
Rhee, Kyoung-Ah; Kim, Joon-Ki; Lee, Young-ihn; Ulfarsson, Gudmundur F
2016-06-01
Traffic crashes can be spatially correlated events and the analysis of the distribution of traffic crash frequency requires evaluation of parameters that reflect spatial properties and correlation. Typically this spatial aspect of crash data is not used in everyday practice by planning agencies and this contributes to a gap between research and practice. A database of traffic crashes in Seoul, Korea, in 2010 was developed at the traffic analysis zone (TAZ) level with a number of GIS developed spatial variables. Practical spatial models using available software were estimated. The spatial error model was determined to be better than the spatial lag model and an ordinary least squares baseline regression. A geographically weighted regression model provided useful insights about localization of effects. The results found that an increased length of roads with speed limit below 30 km/h and a higher ratio of residents below age of 15 were correlated with lower traffic crash frequency, while a higher ratio of residents who moved to the TAZ, more vehicle-kilometers traveled, and a greater number of access points with speed limit difference between side roads and mainline above 30 km/h all increased the number of traffic crashes. This suggests, for example, that better control or design for merging lower speed roads with higher speed roads is important. A key result is that the length of bus-only center lanes had the largest effect on increasing traffic crashes. This is important as bus-only center lanes with bus stop islands have been increasingly used to improve transit times. Hence the potential negative safety impacts of such systems need to be studied further and mitigated through improved design of pedestrian access to center bus stop islands.
Elliptic Fourier analysis of crown shapes in Quercus petraea trees
Ovidiu Hâruţa
2011-06-01
Full Text Available Shape is a fundamental morphological descriptor, significant in taxonomic research as well as in ecomorphology, one method of estimation being from digitally processed images. In the present study, were analysed shapes of Q. petraea crowns, pertaining to five different stem diameter classes, from three similar stands. Based on measurements on terrestrial digital vertical photos, crown size analysis was performed and correlations between crown and stem variables were tested. Linear regression equations between crown volumes and dbh, and crown volumes and stem volumes were derived, explaining more than half of data variability. Employment of elliptic Fourier analysis (EFA, a powerful analysis tool, permitted the extraction of the mean shape from crowns, characterized by high morphological variability. The extracted, most important, coefficients were used to reconstruct the average shape of the crowns, using Inverse Fourier Transform. A mean shape of the crown, corresponding to stand conditions in which competition is added as influential shaping factor, aside genetic program of the species, is described for each stem diameter class. Crown regions with highest shape variability, from the perspective of stage development of the trees, were determined. Accordingly, the main crown shape characteristics are: crown elongation, centroid position, asymmetry with regard to the main axis, lateral regions symmetrical and asymmetrical variations.
Elliptic Fourier analysis of crown shapes in Quercus petraea trees
Ovidiu Hâruţa
2011-02-01
Full Text Available Shape is a fundamental morphological descriptor, significant in taxonomic research as well as in ecomorphology, one method of estimation being from digitally processed images. In the present study, were analysed shapes of Q. petraea crowns, pertaining to five different stem diameter classes, from three similar stands. Based on measurements on terrestrial digital vertical photos, crown size analysis was performed and correlations between crown and stem variables were tested. Linear regression equations between crown volumes and dbh, and crown volumes and stem volumes were derived, explaining more than half of data variability. Employment of elliptic Fourier analysis (EFA, a powerful analysis tool, permitted the extraction of the mean shape from crowns, characterized by high morphological variability. The extracted, most important, coefficients were used to reconstruct the average shape of the crowns, using Inverse Fourier Transform. A mean shape of the crown, corresponding to stand conditions in which competition is added as influential shaping factor, aside genetic program of the species, is described for each stem diameter class. Crown regions with highest shape variability, from the perspective of stage developmentof the trees, were determined. Accordingly, the main crown shape characteristics are: crown elongation, mass center, asymmetry with regard to the main axis, lateral regions symmetrical and asymmetrical variations.
APPLICATION OF FUZZY SET THEORY IN FAULT TREE ANALYSIS
1998-01-01
Based on the fuzzy set theory and the expand principle, using fuzzy number as the boundary condition of fault tree analysis, a new method of analyzing fuzzy fault probability of the top event is developed. Fuzzy importance analysis of the basic event is proposed as well. A practical example is given. This method is a new way to solve the obscure problems of fault tree analysis and has great value in engineering practice.
Lazaridis, D.C.; Verbesselt, J.; Robinson, A.P.
2011-01-01
Constructing models can be complicated when the available fitting data are highly correlated and of high dimension. However, the complications depend on whether the goal is prediction instead of estimation. We focus on predicting tree mortality (measured as the number of dead trees) from change metr
Online Statistical Modeling (Regression Analysis) for Independent Responses
Made Tirta, I.; Anggraeni, Dian; Pandutama, Martinus
2017-06-01
Regression analysis (statistical analmodelling) are among statistical methods which are frequently needed in analyzing quantitative data, especially to model relationship between response and explanatory variables. Nowadays, statistical models have been developed into various directions to model various type and complex relationship of data. Rich varieties of advanced and recent statistical modelling are mostly available on open source software (one of them is R). However, these advanced statistical modelling, are not very friendly to novice R users, since they are based on programming script or command line interface. Our research aims to developed web interface (based on R and shiny), so that most recent and advanced statistical modelling are readily available, accessible and applicable on web. We have previously made interface in the form of e-tutorial for several modern and advanced statistical modelling on R especially for independent responses (including linear models/LM, generalized linier models/GLM, generalized additive model/GAM and generalized additive model for location scale and shape/GAMLSS). In this research we unified them in the form of data analysis, including model using Computer Intensive Statistics (Bootstrap and Markov Chain Monte Carlo/ MCMC). All are readily accessible on our online Virtual Statistics Laboratory. The web (interface) make the statistical modeling becomes easier to apply and easier to compare them in order to find the most appropriate model for the data.
Fuzzy set theoretic approach to fault tree analysis
user
Research in conventional fault tree analysis (FTA) is based mainly on failure ... Thus for a very complex system having large number of components, the ..... Smaller, the triangular fuzzy number B-Ai, will result in the best approximation for B.
The analysis of internet addiction scale using multivariate adaptive regression splines.
Kayri, M
2010-01-01
Determining real effects on internet dependency is too crucial with unbiased and robust statistical method. MARS is a new non-parametric method in use in the literature for parameter estimations of cause and effect based research. MARS can both obtain legible model curves and make unbiased parametric predictions. In order to examine the performance of MARS, MARS findings will be compared to Classification and Regression Tree (C&RT) findings, which are considered in the literature to be efficient in revealing correlations between variables. The data set for the study is taken from "The Internet Addiction Scale" (IAS), which attempts to reveal addiction levels of individuals. The population of the study consists of 754 secondary school students (301 female, 443 male students with 10 missing data). MARS 2.0 trial version is used for analysis by MARS method and C&RT analysis was done by SPSS. MARS obtained six base functions of the model. As a common result of these six functions, regression equation of the model was found. Over the predicted variable, MARS showed that the predictors of daily Internet-use time on average, the purpose of Internet-use, grade of students and occupations of mothers had a significant effect (Pdependency level prediction. The fact that MARS revealed extent to which the variable, which was considered significant, changes the character of the model was observed in this study.
Analysis of electrical tree propagation in XLPE power cable insulation
Bao Minghui, E-mail: baominghui@21cn.co [College of Electrical and Electronic Engineering, Huazhong University of Science and Technology, Wuhan (China); Yin Xiaogen; He Junjia [College of Electrical and Electronic Engineering, Huazhong University of Science and Technology, Wuhan (China)
2011-04-01
Electrical treeing is one of the major breakdown mechanisms for solid dielectrics subjected to high electrical stress. In this paper, the characteristics of electrical tree growth in XLPE samples have been investigated. XLPE samples are obtained from a commercial XLPE power cable, in which electrical trees have been grown from pin to plane in the frequency range of 4000-10,000 Hz, voltage range of 4-10 kV, and the distances between electrodes of 1 and 2 mm. Images of trees and their growing processes were taken by a CCD camera. The fractal dimensions of electric trees were obtained by using a simple box-counting technique. The results show that the tree growth rate and fractal dimension was bigger when the frequency or voltage was higher, or the distance between electrodes was smaller. Contrary to our expectation, it has been found that when the distance between electrodes changed from 1 to 2 mm, the required voltage of the similar electrical trees decreased only 1or 2 kV. In order to evaluate the difficulties of electrical tree propagation in different conditions, a simple energy threshold analysis method has been proposed. The threshold energy, which presents the minimum energy that a charge carrier in the well at the top of the tree should have to make the tree grow, has been computed considering the length of electrical tree, the fractal dimension, and the growth time. The computed results indicate that when one of the three parameters of voltage, frequency, and local electric field increase, the trends of energy threshold can be split into 3 regions.
An analysis of Monte Carlo tree search
James, S
2017-02-01
Full Text Available Monte Carlo Tree Search (MCTS) is a family of directed search algorithms that has gained widespread attention in recent years. Despite the vast amount of research into MCTS, the effect of modifications on the algorithm, as well as the manner...
An Effect Size for Regression Predictors in Meta-Analysis
Aloe, Ariel M.; Becker, Betsy Jane
2012-01-01
A new effect size representing the predictive power of an independent variable from a multiple regression model is presented. The index, denoted as r[subscript sp], is the semipartial correlation of the predictor with the outcome of interest. This effect size can be computed when multiple predictor variables are included in the regression model…
An Analysis of Bank Service Satisfaction Based on Quantile Regression and Grey Relational Analysis
Wen-Tsao Pan
2016-01-01
Full Text Available Bank service satisfaction is vital to the success of a bank. In this paper, we propose to use the grey relational analysis to gauge the levels of service satisfaction of the banks. With the grey relational analysis, we compared the effects of different variables on service satisfaction. We gave ranks to the banks according to their levels of service satisfaction. We further used the quantile regression model to find the variables that affected the satisfaction of a customer at a specific quantile of satisfaction level. The result of the quantile regression analysis provided a bank manager with information to formulate policies to further promote satisfaction of the customers at different quantiles of satisfaction level. We also compared the prediction accuracies of the regression models at different quantiles. The experiment result showed that, among the seven quantile regression models, the median regression model has the best performance in terms of RMSE, RTIC, and CE performance measures.
Regression Analysis of Restricted Mean Survival Time Based on Pseudo-Observations
Andersen, Per Kragh; Hansen, Mette Gerster; Klein, John P.
2004-01-01
censoring; hazard function; health economics; mean survival time; pseudo-observations; regression model; restricted mean survival time; survival analysis......censoring; hazard function; health economics; mean survival time; pseudo-observations; regression model; restricted mean survival time; survival analysis...
Regression analysis of restricted mean survival time based on pseudo-observations
Andersen, Per Kragh; Hansen, Mette Gerster; Klein, John P.
censoring; hazard function; health economics; regression model; survival analysis; mean survival time; restricted mean survival time; pseudo-observations......censoring; hazard function; health economics; regression model; survival analysis; mean survival time; restricted mean survival time; pseudo-observations...
Regression and kriging analysis for grid power factor estimation
Rajesh Guntaka
2014-12-01
Full Text Available The measurement of power factor (PF in electrical utility grids is a mainstay of load balancing and is also a critical element of transmission and distribution efficiency. The measurement of PF dates back to the earliest periods of electrical power distribution to public grids. In the wide-area distribution grid, measurement of current waveforms is trivial and may be accomplished at any point in the grid using a current tap transformer. However, voltage measurement requires reference to ground and so is more problematic and measurements are normally constrained to points that have ready and easy access to a ground source. We present two mathematical analysis methods based on kriging and linear least square estimation (LLSE (regression to derive PF at nodes with unknown voltages that are within a perimeter of sample nodes with ground reference across a selected power grid. Our results indicate an error average of 1.884% that is within acceptable tolerances for PF measurements that are used in load balancing tasks.
A simplified procedure of linear regression in a preliminary analysis
Silvia Facchinetti
2013-05-01
Full Text Available The analysis of a statistical large data-set can be led by the study of a particularly interesting variable Y – regressed – and an explicative variable X, chosen among the remained variables, conjointly observed. The study gives a simplified procedure to obtain the functional link of the variables y=y(x by a partition of the data-set into m subsets, in which the observations are synthesized by location indices (mean or median of X and Y. Polynomial models for y(x of order r are considered to verify the characteristics of the given procedure, in particular we assume r= 1 and 2. The distributions of the parameter estimators are obtained by simulation, when the fitting is done for m= r + 1. Comparisons of the results, in terms of distribution and efficiency, are made with the results obtained by the ordinary least square methods. The study also gives some considerations on the consistency of the estimated parameters obtained by the given procedure.
Fast nonlinear regression method for CT brain perfusion analysis.
Bennink, Edwin; Oosterbroek, Jaap; Kudo, Kohsuke; Viergever, Max A; Velthuis, Birgitta K; de Jong, Hugo W A M
2016-04-01
Although computed tomography (CT) perfusion (CTP) imaging enables rapid diagnosis and prognosis of ischemic stroke, current CTP analysis methods have several shortcomings. We propose a fast nonlinear regression method with a box-shaped model (boxNLR) that has important advantages over the current state-of-the-art method, block-circulant singular value decomposition (bSVD). These advantages include improved robustness to attenuation curve truncation, extensibility, and unified estimation of perfusion parameters. The method is compared with bSVD and with a commercial SVD-based method. The three methods were quantitatively evaluated by means of a digital perfusion phantom, described by Kudo et al. and qualitatively with the aid of 50 clinical CTP scans. All three methods yielded high Pearson correlation coefficients ([Formula: see text]) with the ground truth in the phantom. The boxNLR perfusion maps of the clinical scans showed higher correlation with bSVD than the perfusion maps from the commercial method. Furthermore, it was shown that boxNLR estimates are robust to noise, truncation, and tracer delay. The proposed method provides a fast and reliable way of estimating perfusion parameters from CTP scans. This suggests it could be a viable alternative to current commercial and academic methods.
Influencing Academic Library Use in Tanzania: A Multiple Regression Analysis
Leocardia L Juventus
2016-12-01
Full Text Available Library use is influenced by many factors. This study uses a multiple regression analysis to ascertain the connection between the level of library use and a few of these factors based on the questionnaire responses from 158 undergraduate students who use academic libraries in two Tanzania’s universities: Muhimbili University of Health and Allied Sciences (MUHAS, and Hubert Kairuki Memorial University (HKMU. It has been discovered that users of academic libraries in Tanzania are influenced by the need to: search and access online materials, check for new books or other resources, check out books and other materials, and enjoy a friendly environment for study. However, their library use is not influenced by either the free wireless network, or consultation from librarians. It is argued that, academic libraries need to devise and implement plans that can make these libraries better learning environment and platforms to drive socio-economic developmentparticularly in developing nations such as Tanzania. It is further argued that, this can be enhanced through investment in modern academic library infrastructures.
A Novel Multiobjective Evolutionary Algorithm Based on Regression Analysis
Zhiming Song
2015-01-01
Full Text Available As is known, the Pareto set of a continuous multiobjective optimization problem with m objective functions is a piecewise continuous (m-1-dimensional manifold in the decision space under some mild conditions. However, how to utilize the regularity to design multiobjective optimization algorithms has become the research focus. In this paper, based on this regularity, a model-based multiobjective evolutionary algorithm with regression analysis (MMEA-RA is put forward to solve continuous multiobjective optimization problems with variable linkages. In the algorithm, the optimization problem is modelled as a promising area in the decision space by a probability distribution, and the centroid of the probability distribution is (m-1-dimensional piecewise continuous manifold. The least squares method is used to construct such a model. A selection strategy based on the nondominated sorting is used to choose the individuals to the next generation. The new algorithm is tested and compared with NSGA-II and RM-MEDA. The result shows that MMEA-RA outperforms RM-MEDA and NSGA-II on the test instances with variable linkages. At the same time, MMEA-RA has higher efficiency than the other two algorithms. A few shortcomings of MMEA-RA have also been identified and discussed in this paper.
A novel multiobjective evolutionary algorithm based on regression analysis.
Song, Zhiming; Wang, Maocai; Dai, Guangming; Vasile, Massimiliano
2015-01-01
As is known, the Pareto set of a continuous multiobjective optimization problem with m objective functions is a piecewise continuous (m - 1)-dimensional manifold in the decision space under some mild conditions. However, how to utilize the regularity to design multiobjective optimization algorithms has become the research focus. In this paper, based on this regularity, a model-based multiobjective evolutionary algorithm with regression analysis (MMEA-RA) is put forward to solve continuous multiobjective optimization problems with variable linkages. In the algorithm, the optimization problem is modelled as a promising area in the decision space by a probability distribution, and the centroid of the probability distribution is (m - 1)-dimensional piecewise continuous manifold. The least squares method is used to construct such a model. A selection strategy based on the nondominated sorting is used to choose the individuals to the next generation. The new algorithm is tested and compared with NSGA-II and RM-MEDA. The result shows that MMEA-RA outperforms RM-MEDA and NSGA-II on the test instances with variable linkages. At the same time, MMEA-RA has higher efficiency than the other two algorithms. A few shortcomings of MMEA-RA have also been identified and discussed in this paper.
[The Application of the Fault Tree Analysis Method in Medical Equipment Maintenance].
Liu, Hongbin
2015-11-01
In this paper, the traditional fault tree analysis method is presented, detailed instructions for its application characteristics in medical instrument maintenance is made. It is made significant changes when the traditional fault tree analysis method is introduced into the medical instrument maintenance: gave up the logic symbolic, logic analysis and calculation, gave up its complicated programs, and only keep its image and practical fault tree diagram, and the fault tree diagram there are also differences: the fault tree is no longer a logical tree but the thinking tree in troubleshooting, the definition of the fault tree's nodes is different, the composition of the fault tree's branches is also different.
Principal components analysis in the space of phylogenetic trees
Nye, Tom M W
2012-01-01
Phylogenetic analysis of DNA or other data commonly gives rise to a collection or sample of inferred evolutionary trees. Principal Components Analysis (PCA) cannot be applied directly to collections of trees since the space of evolutionary trees on a fixed set of taxa is not a vector space. This paper describes a novel geometrical approach to PCA in tree-space that constructs the first principal path in an analogous way to standard linear Euclidean PCA. Given a data set of phylogenetic trees, a geodesic principal path is sought that maximizes the variance of the data under a form of projection onto the path. Due to the high dimensionality of tree-space and the nonlinear nature of this problem, the computational complexity is potentially very high, so approximate optimization algorithms are used to search for the optimal path. Principal paths identified in this way reveal and quantify the main sources of variation in the original collection of trees in terms of both topology and branch lengths. The approach is...
Standardized Regression Coefficients as Indices of Effect Sizes in Meta-Analysis
Kim, Rae Seon
2011-01-01
When conducting a meta-analysis, it is common to find many collected studies that report regression analyses, because multiple regression analysis is widely used in many fields. Meta-analysis uses effect sizes drawn from individual studies as a means of synthesizing a collection of results. However, indices of effect size from regression analyses…
NUMERICAL ANALYSIS ON BINOMIAL TREE METHODS FOR AMERICAN LOOKBACK OPTIONS
戴民
2001-01-01
Lookback options are path-dependent options. In general, the binomial tree methods,as the most popular approaches to pricing options, involve a path dependent variable as well as the underlying asset price for lookback options. However, for floating strike lookback options, a single-state variable binomial tree method can be constructed. This paper is devoted to the convergence analysis of the single-state binomial tree methods both for discretely and continuously monitored American floating strike lookback options. We also investigate some properties of such options, including effects of expiration date, interest rate and dividend yield on options prices,properties of optimal exercise boundaries and so on.
Harrell , Jr , Frank E
2015-01-01
This highly anticipated second edition features new chapters and sections, 225 new references, and comprehensive R software. In keeping with the previous edition, this book is about the art and science of data analysis and predictive modeling, which entails choosing and using multiple tools. Instead of presenting isolated techniques, this text emphasizes problem solving strategies that address the many issues arising when developing multivariable models using real data and not standard textbook examples. It includes imputation methods for dealing with missing data effectively, methods for fitting nonlinear relationships and for making the estimation of transformations a formal part of the modeling process, methods for dealing with "too many variables to analyze and not enough observations," and powerful model validation techniques based on the bootstrap. The reader will gain a keen understanding of predictive accuracy, and the harm of categorizing continuous predictors or outcomes. This text realistically...
Design and analysis of experiments classical and regression approaches with SAS
Onyiah, Leonard C
2008-01-01
Introductory Statistical Inference and Regression Analysis Elementary Statistical Inference Regression Analysis Experiments, the Completely Randomized Design (CRD)-Classical and Regression Approaches Experiments Experiments to Compare Treatments Some Basic Ideas Requirements of a Good Experiment One-Way Experimental Layout or the CRD: Design and Analysis Analysis of Experimental Data (Fixed Effects Model) Expected Values for the Sums of Squares The Analysis of Variance (ANOVA) Table Follow-Up Analysis to Check fo
Buffalos milk yield analysis using random regression models
A.S. Schierholt
2010-02-01
Full Text Available Data comprising 1,719 milk yield records from 357 females (predominantly Murrah breed, daughters of 110 sires, with births from 1974 to 2004, obtained from the Programa de Melhoramento Genético de Bubalinos (PROMEBUL and from records of EMBRAPA Amazônia Oriental - EAO herd, located in Belém, Pará, Brazil, were used to compare random regression models for estimating variance components and predicting breeding values of the sires. The data were analyzed by different models using the Legendre’s polynomial functions from second to fourth orders. The random regression models included the effects of herd-year, month of parity date of the control; regression coefficients for age of females (in order to describe the fixed part of the lactation curve and random regression coefficients related to the direct genetic and permanent environment effects. The comparisons among the models were based on the Akaike Infromation Criterion. The random effects regression model using third order Legendre’s polynomials with four classes of the environmental effect were the one that best described the additive genetic variation in milk yield. The heritability estimates varied from 0.08 to 0.40. The genetic correlation between milk yields in younger ages was close to the unit, but in older ages it was low.
Quantile regression provides a fuller analysis of speed data.
Hewson, Paul
2008-03-01
Considerable interest already exists in terms of assessing percentiles of speed distributions, for example monitoring the 85th percentile speed is a common feature of the investigation of many road safety interventions. However, unlike the mean, where t-tests and ANOVA can be used to provide evidence of a statistically significant change, inference on these percentiles is much less common. This paper examines the potential role of quantile regression for modelling the 85th percentile, or any other quantile. Given that crash risk may increase disproportionately with increasing relative speed, it may be argued these quantiles are of more interest than the conditional mean. In common with the more usual linear regression, quantile regression admits a simple test as to whether the 85th percentile speed has changed following an intervention in an analogous way to using the t-test to determine if the mean speed has changed by considering the significance of parameters fitted to a design matrix. Having briefly outlined the technique and briefly examined an application with a widely published dataset concerning speed measurements taken around the introduction of signs in Cambridgeshire, this paper will demonstrate the potential for quantile regression modelling by examining recent data from Northamptonshire collected in conjunction with a "community speed watch" programme. Freely available software is used to fit these models and it is hoped that the potential benefits of using quantile regression methods when examining and analysing speed data are demonstrated.
Analysis of retirement income adequacy using quantile regression: A case study in Malaysia
Alaudin, Ros Idayuwati; Ismail, Noriszura; Isa, Zaidi
2015-09-01
Quantile regression is a statistical analysis that does not restrict attention to the conditional mean and therefore, permitting the approximation of the whole conditional distribution of a response variable. Quantile regression is a robust regression to outliers compared to mean regression models. In this paper, we demonstrate how quantile regression approach can be used to analyze the ratio of projected wealth to needs (wealth-needs ratio) during retirement.
Multiple regression for physiological data analysis: the problem of multicollinearity.
Slinker, B K; Glantz, S A
1985-07-01
Multiple linear regression, in which several predictor variables are related to a response variable, is a powerful statistical tool for gaining quantitative insight into complex in vivo physiological systems. For these insights to be correct, all predictor variables must be uncorrelated. However, in many physiological experiments the predictor variables cannot be precisely controlled and thus change in parallel (i.e., they are highly correlated). There is a redundancy of information about the response, a situation called multicollinearity, that leads to numerical problems in estimating the parameters in regression equations; the parameters are often of incorrect magnitude or sign or have large standard errors. Although multicollinearity can be avoided with good experimental design, not all interesting physiological questions can be studied without encountering multicollinearity. In these cases various ad hoc procedures have been proposed to mitigate multicollinearity. Although many of these procedures are controversial, they can be helpful in applying multiple linear regression to some physiological problems.
Analysis of some methods for reduced rank Gaussian process regression
Quinonero-Candela, J.; Rasmussen, Carl Edward
2005-01-01
While there is strong motivation for using Gaussian Processes (GPs) due to their excellent performance in regression and classification problems, their computational complexity makes them impractical when the size of the training set exceeds a few thousand cases. This has motivated the recent...... proliferation of a number of cost-effective approximations to GPs, both for classification and for regression. In this paper we analyze one popular approximation to GPs for regression: the reduced rank approximation. While generally GPs are equivalent to infinite linear models, we show that Reduced Rank...... Gaussian Processes (RRGPs) are equivalent to finite sparse linear models. We also introduce the concept of degenerate GPs and show that they correspond to inappropriate priors. We show how to modify the RRGP to prevent it from being degenerate at test time. Training RRGPs consists both in learning...
Rodrigo Teske
2012-07-01
Full Text Available A área útil efetiva é um parâmetro importante na aquisição de terras e planejamento do florestamento. A finalidade desta pesquisa foi gerar mapas preditores de áreas aptas ao plantio de eucalipto usando regressões logísticas binárias e variáveis geomorfométricas. As relações entre as variáveis preditoras e as áreas aptas para plantio de eucalipto foram modeladas e a variável que melhor explicou a ocorrência de áreas para plantio foi a distância dos rios. O mapa gerado apresentando as áreas aptas para plantio mostrou alta capacidade de reproduzir o mapa original de plantio de eucalipto. As regressões logísticas demonstraram viabilidade do uso para o mapeamento da aptidão para o plantio de eucalipto.Effective usable area is a key parameter in land acquisition and afforestation planning. The purpose of this research was to generate predictive maps of areas suitable for planting eucalyptus trees using binary logistic regressions and geomorphometric variables. The relationships between the predicting variables and suitable areas for planting eucalyptus trees were modeled and the variable that best explained occurrence of suitable lands was distance from rivers. The generated map showing areas suitable for planting had a high ability to reproduce the original planting map. Logistic regressions demonstrated the feasibility of use this approach to map suitability for eucalyptus forestation.
Frederick, Logan; VanDerslice, James; Taddie, Marissa; Malecki, Kristen; Gregg, Josh; Faust, Nicholas; Johnson, William P
2016-03-15
Arsenic contamination in groundwater is a public health and environmental concern in the United States (U.S.) particularly where monitoring is not required under the Safe Water Drinking Act. Previous studies suggest the influence of regional mechanisms for arsenic mobilization into groundwater; however, no study has examined how influencing parameters change at a continental scale spanning multiple regions. We herein examine covariates for groundwater in the western, central and eastern U.S. regions representing mechanisms associated with arsenic concentrations exceeding the U.S. Environmental Protection Agency maximum contamination level (MCL) of 10 parts per billion (ppb). Statistically significant covariates were identified via classification and regression tree (CART) analysis, and included hydrometeorological and groundwater chemical parameters. The CART analyses were performed at two scales: national and regional; for which three physiographic regions located in the western (Payette Section and the Snake River Plain), central (Osage Plains of the Central Lowlands), and eastern (Embayed Section of the Coastal Plains) U.S. were examined. Validity of each of the three regional CART models was indicated by values >85% for the area under the receiver-operating characteristic curve. Aridity (precipitation minus potential evapotranspiration) was identified as the primary covariate associated with elevated arsenic at the national scale. At the regional scale, aridity and pH were the major covariates in the arid to semi-arid (western) region; whereas dissolved iron (taken to represent chemically reducing conditions) and pH were major covariates in the temperate (eastern) region, although additional important covariates emerged, including elevated phosphate. Analysis in the central U.S. region indicated that elevated arsenic concentrations were driven by a mixture of those observed in the western and eastern regions.
Henri Epstein
2016-01-01
An algebraic formalism, developed with V. Glaser and R. Stora for the study of the generalized retarded functions of quantum field theory, is used to prove a factorization theorem which provides a complete description of the generalized retarded functions associated with any tree graph. Integrating over the variables associated to internal vertices to obtain the perturbative generalized retarded functions for interacting fields arising from such graphs is shown to be possible for a large cate...
Epstein, Henri
2016-01-01
An algebraic formalism, developped with V. Glaser and R. Stora for the study of the generalized retarded functions of quantum field theory, is used to prove a factorization theorem which provides a complete description of the generalized retarded functions associated with any tree graph. Integrating over the variables associated to internal vertices to obtain the perturbative generalized retarded functions for interacting fields arising from such graphs is shown to be possible for a large cat...
Epstein, Henri
2016-01-01
An algebraic formalism, developped with V.~Glaser and R.~Stora for the study of the generalized retarded functions of quantum field theory, is used to prove a factorization theorem which provides a complete description of the generalized retarded functions associated with any tree graph. Integrating over the variables associated to internal vertices to obtain the perturbative generalized retarded functions for interacting fields arising from such graphs is shown to be possible for a large category of space-times.
Regression analysis of censored data using pseudo-observations
Parner, Erik T.; Andersen, Per Kragh
2010-01-01
We draw upon a series of articles in which a method based on pseu- dovalues is proposed for direct regression modeling of the survival function, the restricted mean, and the cumulative incidence function in competing risks with right-censored data. The models, once the pseudovalues have been...
Measuring Habituation in Infants: An Approach Using Regression Analysis.
Ashmead, Daniel H.; Davis, DeFord L.
1996-01-01
Used computer simulations to examine effectiveness of different criteria for measuring infant visual habituation. Found that a criterion based on fitting a second-order polynomial regression function to looking-time data produced more accurate estimation of looking times and higher power for detecting novelty effects than did the traditional…
Grades, Gender, and Encouragement: A Regression Discontinuity Analysis
Owen, Ann L.
2010-01-01
The author employs a regression discontinuity design to provide direct evidence on the effects of grades earned in economics principles classes on the decision to major in economics and finds a differential effect for male and female students. Specifically, for female students, receiving an A for a final grade in the first economics class is…
Mines Systems Safety Improvement Using an Integrated Event Tree and Fault Tree Analysis
Kumar, Ranjan; Ghosh, Achyuta Krishna
2017-04-01
Mines systems such as ventilation system, strata support system, flame proof safety equipment, are exposed to dynamic operational conditions such as stress, humidity, dust, temperature, etc., and safety improvement of such systems can be done preferably during planning and design stage. However, the existing safety analysis methods do not handle the accident initiation and progression of mine systems explicitly. To bridge this gap, this paper presents an integrated Event Tree (ET) and Fault Tree (FT) approach for safety analysis and improvement of mine systems design. This approach includes ET and FT modeling coupled with redundancy allocation technique. In this method, a concept of top hazard probability is introduced for identifying system failure probability and redundancy is allocated to the system either at component or system level. A case study on mine methane explosion safety with two initiating events is performed. The results demonstrate that the presented method can reveal the accident scenarios and improve the safety of complex mine systems simultaneously.
REGRESSION ANALYSIS OF PRODUCTIVITY USING MIXED EFFECT MODEL
Siana Halim
2007-01-01
Full Text Available Production plants of a company are located in several areas that spread across Middle and East Java. As the production process employs mostly manpower, we suspected that each location has different characteristics affecting the productivity. Thus, the production data may have a spatial and hierarchical structure. For fitting a linear regression using the ordinary techniques, we are required to make some assumptions about the nature of the residuals i.e. independent, identically and normally distributed. However, these assumptions were rarely fulfilled especially for data that have a spatial and hierarchical structure. We worked out the problem using mixed effect model. This paper discusses the model construction of productivity and several characteristics in the production line by taking location as a random effect. The simple model with high utility that satisfies the necessary regression assumptions was built using a free statistic software R version 2.6.1.
Academic Achievement, Intelligence, and Creativity: A Regression Surface Analysis.
Marjoribanks, K
1976-01-01
Data collected on 400 12-year-old English school children were used to examine relations between measures of intelligence, creativity and academic achievement. Complex multiple regression models, which included terms to account for the possible interaction and curvilinear relations between intelligence, creativity and academic achievement were used to construct regression surfaces. The surfaces showed that the traditional threshold hypothesis, which suggests that beyond a certain level of intelligence academic achievement is related increasingly to creativity and ceases to be related strongly to intelligence, was not supported. For some areas of academic performance the results suggest an alternate proposition, that creativity ceases to be related to achievement after a threshold level of intelligence has been reached. It was also found that at high levels of verbal ability, non-verbal ability and creativity appeared to have differential relations with academic achievement.
Model performance analysis and model validation in logistic regression
Rosa Arboretti Giancristofaro
2007-10-01
Full Text Available In this paper a new model validation procedure for a logistic regression model is presented. At first, we illustrate a brief review of different techniques of model validation. Next, we define a number of properties required for a model to be considered "good", and a number of quantitative performance measures. Lastly, we describe a methodology for the assessment of the performance of a given model by using an example taken from a management study.
Object oriented data analysis: Sets of trees
Wang, Haonan; Marron, J.S.
2007-01-01
Object oriented data analysis is the statistical analysis of populations of complex objects. In the special case of functional data analysis, these data objects are curves, where standard Euclidean approaches, such as principal component analysis, have been very successful. Recent developments in medical image analysis motivate the statistical analysis of populations of more complex data objects which are elements of mildly non-Euclidean spaces, such as Lie groups and symmetric spaces, or of ...
A New Approach in Regression Analysis for Modeling Adsorption Isotherms
Dana D. Marković
2014-01-01
Full Text Available Numerous regression approaches to isotherm parameters estimation appear in the literature. The real insight into the proper modeling pattern can be achieved only by testing methods on a very big number of cases. Experimentally, it cannot be done in a reasonable time, so the Monte Carlo simulation method was applied. The objective of this paper is to introduce and compare numerical approaches that involve different levels of knowledge about the noise structure of the analytical method used for initial and equilibrium concentration determination. Six levels of homoscedastic noise and five types of heteroscedastic noise precision models were considered. Performance of the methods was statistically evaluated based on median percentage error and mean absolute relative error in parameter estimates. The present study showed a clear distinction between two cases. When equilibrium experiments are performed only once, for the homoscedastic case, the winning error function is ordinary least squares, while for the case of heteroscedastic noise the use of orthogonal distance regression or Margart’s percent standard deviation is suggested. It was found that in case when experiments are repeated three times the simple method of weighted least squares performed as well as more complicated orthogonal distance regression method.
Evidential Networks for Fault Tree Analysis with Imprecise Knowledge
Yang, Jianping; Huang, Hong-Zhong; Liu, Yu; Li, Yan-Feng
2012-06-01
Fault tree analysis (FTA), as one of the powerful tools in reliability engineering, has been widely used to enhance system quality attributes. In most fault tree analyses, precise values are adopted to represent the probabilities of occurrence of those events. Due to the lack of sufficient data or imprecision of existing data at the early stage of product design, it is often difficult to accurately estimate the failure rates of individual events or the probabilities of occurrence of the events. Therefore, such imprecision and uncertainty need to be taken into account in reliability analysis. In this paper, the evidential networks (EN) are employed to quantify and propagate the aforementioned uncertainty and imprecision in fault tree analysis. The detailed conversion processes of some logic gates to EN are described in fault tree (FT). The figures of the logic gates and the converted equivalent EN, together with the associated truth tables and the conditional belief mass tables, are also presented in this work. The new epistemic importance is proposed to describe the effect of ignorance degree of event. The fault tree of an aircraft engine damaged by oil filter plugs is presented to demonstrate the proposed method.
Fuzzy fault tree analysis of roller oscillating tooth gear drive
李瑰贤; 杨伟君; 张欣; 李笑; 刘福利
2002-01-01
Conventional fault tree and reliability analysis do not reflect the characteristics of basic events asnon-stationary and ergodic process. To overcome these drawbacks, theory of fuzzy sets is employed to run faulttree analysis(FTA) of roller oscillating tooth gear drive( ROTGD), the relative frequencies of basic events areconsidered as symmetrical normal fuzzy numbers, from the logical relationship between different events in thefault tree and fuzzy operators AND and OR, fuzzy probability of top event is solved. Finally, an example is giv-en to demonstrate a real ROTGD system.
Kleijnen, J.P.C.
1995-01-01
This tutorial discusses what-if analysis and optimization of System Dynamics models. These problems are solved, using the statistical techniques of regression analysis and design of experiments (DOE). These issues are illustrated by applying the statistical techniques to a System Dynamics model for
Liu, Xiang; Saat, M Rapik; Qin, Xiao; Barkan, Christopher P L
2013-10-01
Derailments are the most common type of freight-train accidents in the United States. Derailments cause damage to infrastructure and rolling stock, disrupt services, and may cause casualties and harm the environment. Accordingly, derailment analysis and prevention has long been a high priority in the rail industry and government. Despite the low probability of a train derailment, the potential for severe consequences justify the need to better understand the factors influencing train derailment severity. In this paper, a zero-truncated negative binomial (ZTNB) regression model is developed to estimate the conditional mean of train derailment severity. Recognizing that the mean is not the only statistic describing data distribution, a quantile regression (QR) model is also developed to estimate derailment severity at different quantiles. The two regression models together provide a better understanding of train derailment severity distribution. Results of this work can be used to estimate train derailment severity under various operational conditions and by different accident causes. This research is intended to provide insights regarding development of cost-efficient train safety policies. Copyright © 2013 Elsevier Ltd. All rights reserved.
Lin, Zhiyue; Kahrilas, P J; Roman, S; Boris, L; Carlson, D; Pandolfino, J E
2012-08-01
The Integrated Relaxation Pressure (IRP) is the esophageal pressure topography (EPT) metric used for assessing the adequacy of esophagogastric junction (EGJ) relaxation in the Chicago Classification of motility disorders. However, because the IRP value is also influenced by distal esophageal contractility, we hypothesized that its normal limits should vary with different patterns of contractility. Five hundred and twenty two selected EPT studies were used to compare the accuracy of alternative analysis paradigms to that of a motility expert (the 'gold standard'). Chicago Classification metrics were scored manually and used as inputs for MATLAB™ programs that utilized either strict algorithm-based interpretation (fixed abnormal IRP threshold of 15 mmHg) or a classification and regression tree (CART) model that selected variable IRP thresholds depending on the associated esophageal contractility. The sensitivity of the CART model for achalasia (93%) was better than that of the algorithm-based approach (85%) on account of using variable IRP thresholds that ranged from a low value of >10 mmHg to distinguish type I achalasia from absent peristalsis to a high value of >17 mmHg to distinguish type III achalasia from distal esophageal spasm. Additionally, type II achalasia was diagnosed solely by panesophageal pressurization without the IRP entering the algorithm. Automated interpretation of EPT studies more closely mimics that of a motility expert when IRP thresholds for impaired EGJ relaxation are adjusted depending on the pattern of associated esophageal contractility. The range of IRP cutoffs suggested by the CART model ranged from 10 to 17 mmHg. © 2012 Blackwell Publishing Ltd.
From Curves to Trees: A Tree-like Shapes Distance Using the Elastic Shape Analysis Framework.
Mottini, A; Descombes, X; Besse, F
2015-04-01
Trees are a special type of graph that can be found in various disciplines. In the field of biomedical imaging, trees have been widely studied as they can be used to describe structures such as neurons, blood vessels and lung airways. It has been shown that the morphological characteristics of these structures can provide information on their function aiding the characterization of pathological states. Therefore, it is important to develop methods that analyze their shape and quantify differences between their structures. In this paper, we present a method for the comparison of tree-like shapes that takes into account both topological and geometrical information. This method, which is based on the Elastic Shape Analysis Framework, also computes the mean shape of a population of trees. As a first application, we have considered the comparison of axon morphology. The performance of our method has been evaluated on two sets of images. For the first set of images, we considered four different populations of neurons from different animals and brain sections from the NeuroMorpho.org open database. The second set was composed of a database of 3D confocal microscopy images of three populations of axonal trees (normal and two types of mutations) of the same type of neurons. We have calculated the inter and intra class distances between the populations and embedded the distance in a classification scheme. We have compared the performance of our method against three other state of the art algorithms, and results showed that the proposed method better distinguishes between the populations. Furthermore, we present the mean shape of each population. These shapes present a more complete picture of the morphological characteristics of each population, compared to the average value of certain predefined features.
Application of portfolio theory in decision tree analysis.
Galligan, D T; Ramberg, C; Curtis, C; Ferguson, J; Fetrow, J
1991-07-01
A general application of portfolio analysis for herd decision tree analysis is described. In the herd environment, this methodology offers a means of employing population-based decision strategies that can help the producer control economic variation in expected return from a given set of decision options. An economic decision tree model regarding the use of prostaglandin in dairy cows with undetected estrus was used to determine the expected return of the decisions to use prostaglandin and breed on a timed basis, use prostaglandin and then breed on sign of estrus, or breed on signs of estrus. The risk attributes of these decision alternatives were calculated from the decision tree, and portfolio theory was used to find the efficient decision combinations (portfolios with the highest return for a given variance). The resulting combinations of decisions could be used to control return variation.
Analysis of some methods for reduced rank Gaussian process regression
Quinonero-Candela, J.; Rasmussen, Carl Edward
2005-01-01
While there is strong motivation for using Gaussian Processes (GPs) due to their excellent performance in regression and classification problems, their computational complexity makes them impractical when the size of the training set exceeds a few thousand cases. This has motivated the recent...... Gaussian Processes (RRGPs) are equivalent to finite sparse linear models. We also introduce the concept of degenerate GPs and show that they correspond to inappropriate priors. We show how to modify the RRGP to prevent it from being degenerate at test time. Training RRGPs consists both in learning...
Parsons, Vickie s.
2009-01-01
The request to conduct an independent review of regression models, developed for determining the expected Launch Commit Criteria (LCC) External Tank (ET)-04 cycle count for the Space Shuttle ET tanking process, was submitted to the NASA Engineering and Safety Center NESC on September 20, 2005. The NESC team performed an independent review of regression models documented in Prepress Regression Analysis, Tom Clark and Angela Krenn, 10/27/05. This consultation consisted of a peer review by statistical experts of the proposed regression models provided in the Prepress Regression Analysis. This document is the consultation's final report.
Strobl, Carolin; Malley, James; Tutz, Gerhard
2009-01-01
Recursive partitioning methods have become popular and widely used tools for nonparametric regression and classification in many scientific fields. Especially random forests, which can deal with large numbers of predictor variables even in the presence of complex interactions, have been applied successfully in genetics, clinical medicine, and…
ADTool: Security Analysis with Attack-Defense Trees
Kordy, Barbara; Kordy, P.T.; Mauw, Sjouke; Schweitzer, Patrick; Joshi, Kaustubh; Siegle, Markus; Stoelinga, Mariëlle Ida Antoinette; d' Argenio, P.R.
ADTool is free, open source software assisting graphical modeling and quantitative analysis of security, using attack–defense trees. The main features of ADTool are easy creation, efficient editing, and automated bottom-up evaluation of security-relevant measures. The tool also supports the usage of
Tree-space statistics and approximations for large-scale analysis of anatomical trees
Feragen, Aasa; Petersen, Jens; Winkler Wille, Mathilde Marie;
2013-01-01
parametrize the relevant parts of tree-space well. Using the developed approximate statistics, we illustrate how the structure and geometry of airway trees vary across a population and show that airway trees with Chronic Obstructive Pulmonary Disease come from a different distribution in tree-space than...
Simulation Experiments in Practice : Statistical Design and Regression Analysis
Kleijnen, J.P.C.
2007-01-01
In practice, simulation analysts often change only one factor at a time, and use graphical analysis of the resulting Input/Output (I/O) data. The goal of this article is to change these traditional, naïve methods of design and analysis, because statistical theory proves that more information is obta
Modeling Information Content Via Dirichlet-Multinomial Regression Analysis.
Ferrari, Alberto
2017-02-16
Shannon entropy is being increasingly used in biomedical research as an index of complexity and information content in sequences of symbols, e.g. languages, amino acid sequences, DNA methylation patterns and animal vocalizations. Yet, distributional properties of information entropy as a random variable have seldom been the object of study, leading to researchers mainly using linear models or simulation-based analytical approach to assess differences in information content, when entropy is measured repeatedly in different experimental conditions. Here a method to perform inference on entropy in such conditions is proposed. Building on results coming from studies in the field of Bayesian entropy estimation, a symmetric Dirichlet-multinomial regression model, able to deal efficiently with the issue of mean entropy estimation, is formulated. Through a simulation study the model is shown to outperform linear modeling in a vast range of scenarios and to have promising statistical properties. As a practical example, the method is applied to a data set coming from a real experiment on animal communication.
Air Pollution Analysis using Ontologies and Regression Models
Parul Choudhary
2016-07-01
Full Text Available Rapidly throughout the world economy, "the expansive Web" in the "world" explosive growth, rapidly growing market characterized by short product cycles exists and the demand for increased flexibility as well as the extensive use of a new data vision managed data society. A new socio-economic system that relies more and more on movement and allocation results in data whose daily existence, refinement, economy and adjust the exchange industry. Cooperative Engineering Co -operation and multi -disciplinary installed on people's cooperation is a good example. Semantic Web is a new form of Web content that is meaningful to computers and additional approved another example. Communication, vision sharing and exchanging data Society's are new commercial bet. Urban air pollution modeling and data processing techniques need elevated Association. Artificial intelligence in countless ways and breakthrough technologies can solve environmental problems from uneven offers. A method for data to formal ontology means a true meaning and lack of ambiguity to allow us to portray memo. In this work we survey regression model for ontologies and air pollution.
Survival analysis of cervical cancer using stratified Cox regression
Purnami, S. W.; Inayati, K. D.; Sari, N. W. Wulan; Chosuvivatwong, V.; Sriplung, H.
2016-04-01
Cervical cancer is one of the mostly widely cancer cause of the women death in the world including Indonesia. Most cervical cancer patients come to the hospital already in an advanced stadium. As a result, the treatment of cervical cancer becomes more difficult and even can increase the death's risk. One of parameter that can be used to assess successfully of treatment is the probability of survival. This study raises the issue of cervical cancer survival patients at Dr. Soetomo Hospital using stratified Cox regression based on six factors such as age, stadium, treatment initiation, companion disease, complication, and anemia. Stratified Cox model is used because there is one independent variable that does not satisfy the proportional hazards assumption that is stadium. The results of the stratified Cox model show that the complication variable is significant factor which influent survival probability of cervical cancer patient. The obtained hazard ratio is 7.35. It means that cervical cancer patient who has complication is at risk of dying 7.35 times greater than patient who did not has complication. While the adjusted survival curves showed that stadium IV had the lowest probability of survival.
Severe accident analysis using dynamic accident progression event trees
Hakobyan, Aram P.
In present, the development and analysis of Accident Progression Event Trees (APETs) are performed in a manner that is computationally time consuming, difficult to reproduce and also can be phenomenologically inconsistent. One of the principal deficiencies lies in the static nature of conventional APETs. In the conventional event tree techniques, the sequence of events is pre-determined in a fixed order based on the expert judgments. The main objective of this PhD dissertation was to develop a software tool (ADAPT) for automated APET generation using the concept of dynamic event trees. As implied by the name, in dynamic event trees the order and timing of events are determined by the progression of the accident. The tool determines the branching times from a severe accident analysis code based on user specified criteria for branching. It assigns user specified probabilities to every branch, tracks the total branch probability, and truncates branches based on the given pruning/truncation rules to avoid an unmanageable number of scenarios. The function of a dynamic APET developed includes prediction of the conditions, timing, and location of containment failure or bypass leading to the release of radioactive material, and calculation of probabilities of those failures. Thus, scenarios that can potentially lead to early containment failure or bypass, such as through accident induced failure of steam generator tubes, are of particular interest. Also, the work is focused on treatment of uncertainties in severe accident phenomena such as creep rupture of major RCS components, hydrogen burn, containment failure, timing of power recovery, etc. Although the ADAPT methodology (Analysis of Dynamic Accident Progression Trees) could be applied to any severe accident analysis code, in this dissertation the approach is demonstrated by applying it to the MELCOR code [1]. A case study is presented involving station blackout with the loss of auxiliary feedwater system for a
Introduction to mixed modelling beyond regression and analysis of variance
Galwey, N W
2007-01-01
Mixed modelling is one of the most promising and exciting areas of statistical analysis, enabling more powerful interpretation of data through the recognition of random effects. However, many perceive mixed modelling as an intimidating and specialized technique.
A regression analysis on the green olives debittering
Kopsidas, Gerassimos C.
1991-12-01
Full Text Available In this paper, a regression model, which gives the debittering time t as a function of the sodium hydroxide concentration 0 and the debittering temperature T, at the debittering of medium size green olive fruit of the Conservolea variety, is fitted. This model has the simple form t=a_{o}C^{a1} ∙ e^{a2/T}, where a_{o}, a_{1}, and a_{2} are constants. The values of a_{o}, a_{1}, and a_{2} are determined by the method of least squares from a set of experimental data. The determined model is very satisfactory for the conditions in which Greek green olives are debittered.
En este artículo se ajusta un modelo de regresión, que da el tiempo de endulzamiento t en función de la concentración de hidróxido sódico C y la temperatura de endulzamiento T, en el endulzamiento de aceitunas verdes de tamaño mediano de la variedad Conservolea. Este modelo tiene la forma simple t=a_{o}C^{a1} ∙ e^{a2/T}, donde a_{1} y a_{2} son constantes. Los valores de a_{o}, a_{1}, y a_{2} son determinados por el método de los mínimos cuadrados a partir de un grupo de datos experimentales. El modelo determinado es muy satisfactorio para las condiciones en las que las aceitunas verdes griegas son endulzadas.
A Quality Assessment Tool for Non-Specialist Users of Regression Analysis
Argyrous, George
2015-01-01
This paper illustrates the use of a quality assessment tool for regression analysis. It is designed for non-specialist "consumers" of evidence, such as policy makers. The tool provides a series of questions such consumers of evidence can ask to interrogate regression analysis, and is illustrated with reference to a recent study published…
Methods of Detecting Outliers in A Regression Analysis Model ...
PROF. O. E. OSUAGWU
2013-06-01
Jun 1, 2013 ... Capacity), X2 (Design Pressure), X3 (Boiler Type), X4 (Drum Type) were used. The analysis of the ... 1.2 Identification Of Outliers. There is no such thing as a simple test. However, there are many ..... Psychological. Bulletin, 95 ...
A Noncentral "t" Regression Model for Meta-Analysis
Camilli, Gregory; de la Torre, Jimmy; Chiu, Chia-Yi
2010-01-01
In this article, three multilevel models for meta-analysis are examined. Hedges and Olkin suggested that effect sizes follow a noncentral "t" distribution and proposed several approximate methods. Raudenbush and Bryk further refined this model; however, this procedure is based on a normal approximation. In the current research literature, this…
Lombardo, L.; Cama, M.; Maerker, M.; Parisi, L.; Rotigliano, E.
2014-12-01
This study aims at comparing the performances of Binary Logistic Regression (BLR) and Boosted Regression Trees (BRT) methods in assessing landslide susceptibility for multiple-occurrence regional landslide events within the Mediterranean region. A test area was selected in the north-eastern sector of Sicily (southern Italy), corresponding to the catchments of the Briga and the Giampilieri streams both stretching for few kilometres from the Peloritan ridge (eastern Sicily, Italy) to the Ionian sea. This area was struck on the 1st October 2009 by an extreme climatic event resulting in thousands of rapid shallow landslides, mainly of debris flows and debris avalanches types involving the weathered layer of a low to high grade metamorphic bedrock. Exploiting the same set of predictors and the 2009 landslide archive, BLR- and BRT-based susceptibility models were obtained for the two catchments separately, adopting a random partition (RP) technique for validation; besides, the models trained in one of the two catchments (Briga) were tested in predicting the landslide distribution in the other (Giampilieri), adopting a spatial partition (SP) based validation procedure. All the validation procedures were based on multi-folds tests so to evaluate and compare the reliability of the fitting, the prediction skill, the coherence in the predictor selection and the precision of the susceptibility estimates. All the obtained models for the two methods produced very high predictive performances, with a general congruence between BLR and BRT in the predictor importance. In particular, the research highlighted that BRT-models reached a higher prediction performance with respect to BLR-models, for RP based modelling, whilst for the SP-based models the difference in predictive skills between the two methods dropped drastically, converging to an analogous excellent performance. However, when looking at the precision of the probability estimates, BLR demonstrated to produce more robust
An improved multiple linear regression and data analysis computer program package
Sidik, S. M.
1972-01-01
NEWRAP, an improved version of a previous multiple linear regression program called RAPIER, CREDUC, and CRSPLT, allows for a complete regression analysis including cross plots of the independent and dependent variables, correlation coefficients, regression coefficients, analysis of variance tables, t-statistics and their probability levels, rejection of independent variables, plots of residuals against the independent and dependent variables, and a canonical reduction of quadratic response functions useful in optimum seeking experimentation. A major improvement over RAPIER is that all regression calculations are done in double precision arithmetic.
The Analysis of Internet Addiction Scale Using Multivariate Adaptive Regression Splines
M Kayri
2010-12-01
Full Text Available "nBackground: Determining real effects on internet dependency is too crucial with unbiased and robust statistical method. MARS is a new non-parametric method in use in the literature for parameter estimations of cause and effect based research. MARS can both obtain legible model curves and make unbiased parametric predictions."nMethods: In order to examine the performance of MARS, MARS findings will be compared to Classification and Regression Tree (C&RT findings, which are considered in the literature to be efficient in revealing correlations between variables. The data set for the study is taken from "The Internet Addiction Scale" (IAS, which attempts to reveal addiction levels of individuals. The population of the study consists of 754 secondary school students (301 female, 443 male students with 10 missing data. MARS 2.0 trial version is used for analysis by MARS method and C&RT analysis was done by SPSS."nResults: MARS obtained six base functions of the model. As a common result of these six functions, regression equation of the model was found. Over the predicted variable, MARS showed that the predictors of daily Internet-use time on average, the purpose of Internet- use, grade of students and occupations of mothers had a significant effect (P< 0.05. In this comparative study, MARS obtained different findings from C&RT in dependency level prediction."nConclusion: The fact that MARS revealed extent to which the variable, which was considered significant, changes the character of the model was observed in this study.
Random Decrement and Regression Analysis of Traffic Responses of Bridges
Asmussen, J. C.; Ibrahim, S. R.; Brincker, Rune
The topic of this paper is the estimation of modal parameters from ambient data by applying the Random Decrement technique. The data from the Queensborough Bridge over the Fraser River in Vancouver, Canada have been applied. The loads producing the dynamic response are ambient, e.g. wind, traffic...... and small ground motion. The Random Decrement technique is used to estimate the correlation function or the free decays from the ambient data. From these functions, the modal parameters are extracted using the Ibrahim Time Domain method. The possible influence of the traffic mass load on the bridge...... of the analysis using the Random Decrement technique are compared with results from an analysis based on fast Fourier transformations....
Random Decrement and Regression Analysis of Traffic Responses of Bridges
Asmussen, J. C.; Ibrahim, S. R.; Brincker, Rune
1996-01-01
The topic of this paper is the estimation of modal parameters from ambient data by applying the Random Decrement technique. The data fro the Queensborough Bridge over the Fraser River in Vancouver, Canada have been applied. The loads producing the dynamic response are ambient, e. g. wind, traffic...... and small ground motion. The random Decrement technique is used to estimate the correlation function or the free decays from the ambient data. From these functions, the modal parameters are extracted using the Ibrahim Time domain method. The possible influence of the traffic mass load on the bridge...... of the analysis using the Random decrement technique are compared with results from an analysis based on fast Fourier transformations....
Analysis of cost regression and post-accident absence
Wojciech, Drozd
2017-07-01
The article presents issues related with costs of work safety. It proves the thesis that economic aspects cannot be overlooked in effective management of occupational health and safety and that adequate expenditures on safety can bring tangible benefits to the company. Reliable analysis of this problem is essential for the description the problem of safety the work. In the article attempts to carry it out using the procedures of mathematical statistics [1, 2, 3].
Chadoeuf, J.; Joannes, H.; Nandris, D.; Pierrat, J.C.
1988-12-01
The spread of root diseases in rubber tree (Hevea brasiliensis) due to Rigidoporus lignosus and Phellinus noxius was investigated epidemiologically using data collected every 6 month during a 6-year survey in a plantation. The aim of the present study is to see what factors could predict whether a given tree would be infested at the following inspection. Using a qualitative regression method we expressed the probability of pathogenic attack on a tree in terms of three factors: the state of health of the surrounding trees, the method used to clear the forest prior to planting, and evolution with time. The effects of each factor were ranked, and the roles of the various classes of neighbors were established and quantified. Variability between successive inspections was small, and the method of forest clearing was important only while primary inocula in the soil were still infectious. The state of health of the immediate neighbors was most significant; more distant neighbors in the same row had some effect; interrow spread was extremely rare. This investigation dealt only with trees as individuals, and further study of the interrelationships of groups of trees is needed.
The Use of Nonparametric Kernel Regression Methods in Econometric Production Analysis
Czekaj, Tomasz Gerard
This PhD thesis addresses one of the fundamental problems in applied econometric analysis, namely the econometric estimation of regression functions. The conventional approach to regression analysis is the parametric approach, which requires the researcher to specify the form of the regression...... to avoid this problem. The main objective is to investigate the applicability of the nonparametric kernel regression method in applied production analysis. The focus of the empirical analyses included in this thesis is the agricultural sector in Poland. Data on Polish farms are used to investigate...... practically and politically relevant problems and to illustrate how nonparametric regression methods can be used in applied microeconomic production analysis both in panel data and cross-section data settings. The thesis consists of four papers. The first paper addresses problems of parametric...
Measurement and Analysis of Test Suite Volume Metrics for Regression Testing
S Raju
2014-01-01
Full Text Available Regression testing intends to ensure that a software applications works as specified after changes made to it during maintenance. It is an important phase in software development lifecycle. Regression testing is the re-execution of some subset of test cases that has already been executed. It is an expensive process used to detect defects due to regressions. Regression testing has been used to support software-testing activities and assure acquiring an appropriate quality through several versions of a software product during its development and maintenance. Regression testing assures the quality of modified applications. In this proposed work, a study and analysis of metrics related to test suite volume was undertaken. It was shown that the software under test needs more test cases after changes were made to it. A comparative analysis was performed for finding the change in test suite size before and after the regression test.
Yoon, Young Kyung; Kim, Hyun Ah; Ryu, Seong Yeol; Lee, Eun Jung; Lee, Mi Suk; Kim, Jieun; Park, Seong Yeon; Yang, Kyung Sook; Kim, Shin Woo
2017-02-01
This study aimed to construct a prediction algorithm, which is readily applicable in the clinical setting, to determine the mortality rate for patients with P. aeruginosa bacteremia. A multicenter observational cohort study was performed retrospectively in seven university-affiliated hospitals in Korea from March 2012 to February 2015. In total, 264 adult patients with monomicrobial P. aeruginosa bacteremia were included in the analyses. Among the predictors independently associated with 30-day mortality in the Cox regression model, Pitt bacteremia score >2 and high-risk source of bacteremia were identified as critical nodes in the tree-structured survival analysis. Particularly, the empirical combination therapy was not associated with any survival benefit in the Cox regression model compared to the empirical monotherapy. This study suggests that determining the infection source and evaluating the clinical severity are critical to predict the clinical outcome in patients with P. aeruginosa bacteremia.
Greve, Mogens Humlekrog; Bou Kheir, Rania; Greve, Mette Balslev
2012-01-01
sand, silt, and clay in soil determines its textural classification. This study used Geographic Information Systems (GIS) and regression-tree modeling to precisely quantify the relationships between the soil texture fractions and different environmental parameters on a national scale, and to detect...... precipitation, seasonal precipitation to statistically explain soil texture fractions field/laboratory measurements (45,224 sampling sites) in the area of interest (Denmark). The developed strongest relationships were associated with clay and silt, variance being equal to 60%, followed by coarse sand (54.......5%) and fine sand (52%) as the weakest relationship. This study also showed that parent materials (with a relative importance varying between 47% and 100%), geographic regions (31–100%) and landscape types (68–100%) considerably influenced all soil texture fractions, which is not the case for climate and DEM...
DIF Trees: Using Classification Trees to Detect Differential Item Functioning
Vaughn, Brandon K.; Wang, Qiu
2010-01-01
A nonparametric tree classification procedure is used to detect differential item functioning for items that are dichotomously scored. Classification trees are shown to be an alternative procedure to detect differential item functioning other than the use of traditional Mantel-Haenszel and logistic regression analysis. A nonparametric…
Multi-level tree analysis of pulmonary artery/vein trees in non-contrast CT images
Gao, Zhiyun; Grout, Randall W.; Hoffman, Eric A.; Saha, Punam K.
2012-02-01
Diseases like pulmonary embolism and pulmonary hypertension are associated with vascular dystrophy. Identifying such pulmonary artery/vein (A/V) tree dystrophy in terms of quantitative measures via CT imaging significantly facilitates early detection of disease or a treatment monitoring process. A tree structure, consisting of nodes and connected arcs, linked to the volumetric representation allows multi-level geometric and volumetric analysis of A/V trees. Here, a new theory and method is presented to generate multi-level A/V tree representation of volumetric data and to compute quantitative measures of A/V tree geometry and topology at various tree hierarchies. The new method is primarily designed on arc skeleton computation followed by a tree construction based topologic and geometric analysis of the skeleton. The method starts with a volumetric A/V representation as input and generates its topologic and multi-level volumetric tree representations long with different multi-level morphometric measures. A new recursive merging and pruning algorithms are introduced to detect bad junctions and noisy branches often associated with digital geometric and topologic analysis. Also, a new notion of shortest axial path is introduced to improve the skeletal arc joining two junctions. The accuracy of the multi-level tree analysis algorithm has been evaluated using computer generated phantoms and pulmonary CT images of a pig vessel cast phantom while the reproducibility of method is evaluated using multi-user A/V separation of in vivo contrast-enhanced CT images of a pig lung at different respiratory volumes.
Robust analysis of trends in noisy tokamak confinement data using geodesic least squares regression
Verdoolaege, G.; Shabbir, A.; Hornung, G.
2016-11-01
Regression analysis is a very common activity in fusion science for unveiling trends and parametric dependencies, but it can be a difficult matter. We have recently developed the method of geodesic least squares (GLS) regression that is able to handle errors in all variables, is robust against data outliers and uncertainty in the regression model, and can be used with arbitrary distribution models and regression functions. We here report on first results of application of GLS to estimation of the multi-machine scaling law for the energy confinement time in tokamaks, demonstrating improved consistency of the GLS results compared to standard least squares.
Development of a User Interface for a Regression Analysis Software Tool
Ulbrich, Norbert Manfred; Volden, Thomas R.
2010-01-01
An easy-to -use user interface was implemented in a highly automated regression analysis tool. The user interface was developed from the start to run on computers that use the Windows, Macintosh, Linux, or UNIX operating system. Many user interface features were specifically designed such that a novice or inexperienced user can apply the regression analysis tool with confidence. Therefore, the user interface s design minimizes interactive input from the user. In addition, reasonable default combinations are assigned to those analysis settings that influence the outcome of the regression analysis. These default combinations will lead to a successful regression analysis result for most experimental data sets. The user interface comes in two versions. The text user interface version is used for the ongoing development of the regression analysis tool. The official release of the regression analysis tool, on the other hand, has a graphical user interface that is more efficient to use. This graphical user interface displays all input file names, output file names, and analysis settings for a specific software application mode on a single screen which makes it easier to generate reliable analysis results and to perform input parameter studies. An object-oriented approach was used for the development of the graphical user interface. This choice keeps future software maintenance costs to a reasonable limit. Examples of both the text user interface and graphical user interface are discussed in order to illustrate the user interface s overall design approach.
The Use of Nonparametric Kernel Regression Methods in Econometric Production Analysis
Czekaj, Tomasz Gerard
This PhD thesis addresses one of the fundamental problems in applied econometric analysis, namely the econometric estimation of regression functions. The conventional approach to regression analysis is the parametric approach, which requires the researcher to specify the form of the regression...... function. However, the a priori specification of a functional form involves the risk of choosing one that is not similar to the “true” but unknown relationship between the regressors and the dependent variable. This problem, known as parametric misspecification, can result in biased parameter estimates...... and nonparametric estimations of production functions in order to evaluate the optimal firm size. The second paper discusses the use of parametric and nonparametric regression methods to estimate panel data regression models. The third paper analyses production risk, price uncertainty, and farmers' risk preferences...
Telmo, C; Lousada, J; Moreira, N
2010-06-01
The gross calorific value (GCV), proximate, ultimate and chemical analysis of debark wood in Portugal were studied, for future utilization in wood pellets industry and the results compared with CEN/TS 14961. The relationship between GCV, ultimate and chemical analysis were determined by multiple regression stepwise backward. The treatment between hardwoods-softwoods did not result in significant statistical differences for proximate, ultimate and chemical analysis. Significant statistical differences were found in carbon for National (hardwoods-softwoods) and (National-tropical) hardwoods in volatile matter, fixed carbon, carbon and oxygen and also for chemical analysis in National (hardwoods-softwoods) for F and (National-tropical) hardwoods for Br. GCV was highly positively related to C (0.79 * * *) and negatively to O (-0.71 * * *). The final independent variables of the model were (C, O, S, Zn, Ni, Br) with R(2)=0.86; F=27.68 * * *. The hydrogen did not contribute statistically to the energy content.
Detecting overdispersion in count data: A zero-inflated Poisson regression analysis
Afiqah Muhamad Jamil, Siti; Asrul Affendi Abdullah, M.; Kek, Sie Long; Nor, Maria Elena; Mohamed, Maryati; Ismail, Norradihah
2017-09-01
This study focusing on analysing count data of butterflies communities in Jasin, Melaka. In analysing count dependent variable, the Poisson regression model has been known as a benchmark model for regression analysis. Continuing from the previous literature that used Poisson regression analysis, this study comprising the used of zero-inflated Poisson (ZIP) regression analysis to gain acute precision on analysing the count data of butterfly communities in Jasin, Melaka. On the other hands, Poisson regression should be abandoned in the favour of count data models, which are capable of taking into account the extra zeros explicitly. By far, one of the most popular models include ZIP regression model. The data of butterfly communities which had been called as the number of subjects in this study had been taken in Jasin, Melaka and consisted of 131 number of subjects visits Jasin, Melaka. Since the researchers are considering the number of subjects, this data set consists of five families of butterfly and represent the five variables involve in the analysis which are the types of subjects. Besides, the analysis of ZIP used the SAS procedure of overdispersion in analysing zeros value and the main purpose of continuing the previous study is to compare which models would be better than when exists zero values for the observation of the count data. The analysis used AIC, BIC and Voung test of 5% level significance in order to achieve the objectives. The finding indicates that there is a presence of over-dispersion in analysing zero value. The ZIP regression model is better than Poisson regression model when zero values exist.
A primer for biomedical scientists on how to execute model II linear regression analysis.
Ludbrook, John
2012-04-01
1. There are two very different ways of executing linear regression analysis. One is Model I, when the x-values are fixed by the experimenter. The other is Model II, in which the x-values are free to vary and are subject to error. 2. I have received numerous complaints from biomedical scientists that they have great difficulty in executing Model II linear regression analysis. This may explain the results of a Google Scholar search, which showed that the authors of articles in journals of physiology, pharmacology and biochemistry rarely use Model II regression analysis. 3. I repeat my previous arguments in favour of using least products linear regression analysis for Model II regressions. I review three methods for executing ordinary least products (OLP) and weighted least products (WLP) regression analysis: (i) scientific calculator and/or computer spreadsheet; (ii) specific purpose computer programs; and (iii) general purpose computer programs. 4. Using a scientific calculator and/or computer spreadsheet, it is easy to obtain correct values for OLP slope and intercept, but the corresponding 95% confidence intervals (CI) are inaccurate. 5. Using specific purpose computer programs, the freeware computer program smatr gives the correct OLP regression coefficients and obtains 95% CI by bootstrapping. In addition, smatr can be used to compare the slopes of OLP lines. 6. When using general purpose computer programs, I recommend the commercial programs systat and Statistica for those who regularly undertake linear regression analysis and I give step-by-step instructions in the Supplementary Information as to how to use loss functions.
Infinite dimensional spherical analysis and harmonic analysis for groups acting on homogeneous trees
Axelgaard, Emil
of the groups, the so-called irreducible tame representations. We prove the existence of irreducible non-tame representations by constructing a compactification of the boundary of the tree - an object which until now has not played any role in the analysis of automorphism groups for trees which are not locally......In this thesis, we study groups of automorphisms for homogeneous trees of countable degree by using an inductive limit approach. The main focus is the thourough discussion of two Olshanski spherical pairs consisting of automorphism groups for a homogeneous tree and a homogeneous rooted tree...... finite. Finally, we discuss conditionally positive definite functions on the groups and use the generalized Bochner-Godement theorem for Olshanski spherical pairs to prove Levy-Khinchine formulas for both of the considered pairs....
FINANCIAL PERFORMANCE INDICATORS OF TUNISIAN COMPANIES: DECISION TREE ANALYSIS
Ferdaws Ezzi
2016-01-01
Full Text Available The article at hand is an attempt to identify the various indicators that are more likely to explain the financial performance of Tunisian companies. In this respective, the emphasis is put on diversification, innovation, intrapersonal and interpersonal skills. Indeed, they are the appropriate strategies that can designate emotional intelligence, the level of indebtedness, the firm age and size as the proper variables that support the target variable. The "decision tree", as a new data analysis method, is utilized to analyze our work. The results involve the construction of a crucial model which is used to achieve a sound financial performance.
Wang, Ming; Flanders, W Dana; Bostick, Roberd M; Long, Qi
2012-12-20
Measurement error is common in epidemiological and biomedical studies. When biomarkers are measured in batches or groups, measurement error is potentially correlated within each batch or group. In regression analysis, most existing methods are not applicable in the presence of batch-specific measurement error in predictors. We propose a robust conditional likelihood approach to account for batch-specific error in predictors when batch effect is additive and the predominant source of error, which requires no assumptions on the distribution of measurement error. Although a regression model with batch as a categorical covariable yields the same parameter estimates as the proposed conditional likelihood approach for linear regression, this result does not hold in general for all generalized linear models, in particular, logistic regression. Our simulation studies show that the conditional likelihood approach achieves better finite sample performance than the regression calibration approach or a naive approach without adjustment for measurement error. In the case of logistic regression, our proposed approach is shown to also outperform the regression approach with batch as a categorical covariate. In addition, we also examine a 'hybrid' approach combining the conditional likelihood method and the regression calibration method, which is shown in simulations to achieve good performance in the presence of both batch-specific and measurement-specific errors. We illustrate our method by using data from a colorectal adenoma study.
Mesoamerican tree squirrels evolution (Rodentia: Sciuridae): a molecular phylogenetic analysis.
Villalobos, Federico; Gutierrez-Espeleta, Gustavo
2014-06-01
The tribe Sciurini comprehends the genera Sciurus, Syntheosiurus, Microsciurus, Tamiasciurus and Rheinthrosciurus. The phylogenetic relationships within Sciurus have been only partially done, and the relationship between Mesoamerican species remains unsolved. The phylogenetic relationships of the Mesoamerican tree squirrels were examined using molecular data. Sequence data publicly available (12S, 16S, CYTB mitochondrial genes and IRBP nuclear gene) and cytochrome B gene sequences of four previously not sampled Mesoamerican Sciurus species were analyzed under a Bayesian multispecies coalescence model. Phylogenetic analysis of the multilocus data set showed the neotropical tree squirrels as a monophyletic clade. The genus Sciurus was paraphyletic due to the inclusion of Microsciurus species (M. alfari and M. flaviventer). The South American species S. aestuans and S. stramineus showed a sister taxa relationship. Single locus analysis based on the most compact and complete data set (i.e. CYTB gene sequences), supported the monophyly of the South American species and recovered a Mesoamerican clade including S. aureogaster, S. granatensis and S. variegatoides. These results corroborated previous findings based on cladistic analysis of cranial and post-cranial characters. Our data support a close relationship between Mesoamerican Sciurus species and a sister relationship with South American species, and corroborates previous findings in relation to the polyphyly of Microsciurus and Syntheosciurus paraphyly.
Deng, Yangyang; Parajuli, Prem B.
2011-08-10
Evaluation of economic feasibility of a bio-gasification facility needs understanding of its unit cost under different production capacities. The objective of this study was to evaluate the unit cost of syngas production at capacities from 60 through 1800Nm 3/h using an economic model with three regression analysis techniques (simple regression, reciprocal regression, and log-log regression). The preliminary result of this study showed that reciprocal regression analysis technique had the best fit curve between per unit cost and production capacity, with sum of error squares (SES) lower than 0.001 and coefficient of determination of (R 2) 0.996. The regression analysis techniques determined the minimum unit cost of syngas production for micro-scale bio-gasification facilities of $0.052/Nm 3, under the capacity of 2,880 Nm 3/h. The results of this study suggest that to reduce cost, facilities should run at a high production capacity. In addition, the contribution of this technique could be the new categorical criterion to evaluate micro-scale bio-gasification facility from the perspective of economic analysis.
Regression analysis understanding and building business and economic models using Excel
Wilson, J Holton
2012-01-01
The technique of regression analysis is used so often in business and economics today that an understanding of its use is necessary for almost everyone engaged in the field. This book will teach you the essential elements of building and understanding regression models in a business/economic context in an intuitive manner. The authors take a non-theoretical treatment that is accessible even if you have a limited statistical background. It is specifically designed to teach the correct use of regression, while advising you of its limitations and teaching about common pitfalls. This book describe
Zhang, Hong-guang; Lu, Jian-gang
2016-02-01
Abstract To overcome the problems of significant difference among samples and nonlinearity between the property and spectra of samples in spectral quantitative analysis, a local regression algorithm is proposed in this paper. In this algorithm, net signal analysis method(NAS) was firstly used to obtain the net analyte signal of the calibration samples and unknown samples, then the Euclidean distance between net analyte signal of the sample and net analyte signal of calibration samples was calculated and utilized as similarity index. According to the defined similarity index, the local calibration sets were individually selected for each unknown sample. Finally, a local PLS regression model was built on each local calibration sets for each unknown sample. The proposed method was applied to a set of near infrared spectra of meat samples. The results demonstrate that the prediction precision and model complexity of the proposed method are superior to global PLS regression method and conventional local regression algorithm based on spectral Euclidean distance.
Enterprise architecture availability analysis using fault trees and stakeholder interviews
Närman, Per; Franke, Ulrik; König, Johan; Buschle, Markus; Ekstedt, Mathias
2014-01-01
The availability of enterprise information systems is a key concern for many organisations. This article describes a method for availability analysis based on Fault Tree Analysis and constructs from the ArchiMate enterprise architecture (EA) language. To test the quality of the method, several case-studies within the banking and electrical utility industries were performed. Input data were collected through stakeholder interviews. The results from the case studies were compared with availability of log data to determine the accuracy of the method's predictions. In the five cases where accurate log data were available, the yearly downtime estimates were within eight hours from the actual downtimes. The cost of performing the analysis was low; no case study required more than 20 man-hours of work, making the method ideal for practitioners with an interest in obtaining rapid availability estimates of their enterprise information systems.
Quantile regression for the statistical analysis of immunological data with many non-detects
Eilers Paul HC; Röder Esther; Savelkoul Huub FJ; van Wijk Roy
2012-01-01
Abstract Background Immunological parameters are hard to measure. A well-known problem is the occurrence of values below the detection limit, the non-detects. Non-detects are a nuisance, because classical statistical analyses, like ANOVA and regression, cannot be applied. The more advanced statistical techniques currently available for the analysis of datasets with non-detects can only be used if a small percentage of the data are non-detects. Methods and results Quantile regression, a genera...
Sayegh, Arwa; Tate, James E.; Ropkins, Karl
2016-02-01
Oxides of Nitrogen (NOx) is a major component of photochemical smog and its constituents are considered principal traffic-related pollutants affecting human health. This study investigates the influence of background concentrations of NOx, traffic density, and prevailing meteorological conditions on roadside concentrations of NOx at UK urban, open motorway, and motorway tunnel sites using the statistical approach Boosted Regression Trees (BRT). BRT models have been fitted using hourly concentration, traffic, and meteorological data for each site. The models predict, rank, and visualise the relationship between model variables and roadside NOx concentrations. A strong relationship between roadside NOx and monitored local background concentrations is demonstrated. Relationships between roadside NOx and other model variables have been shown to be strongly influenced by the quality and resolution of background concentrations of NOx, i.e. if it were based on monitored data or modelled prediction. The paper proposes a direct method of using site-specific fundamental diagrams for splitting traffic data into four traffic states: free-flow, busy-flow, congested, and severely congested. Using BRT models, the density of traffic (vehicles per kilometre) was observed to have a proportional influence on the concentrations of roadside NOx, with different fitted regression line slopes for the different traffic states. When other influences are conditioned out, the relationship between roadside concentrations and ambient air temperature suggests NOx concentrations reach a minimum at around 22 °C with high concentrations at low ambient air temperatures which could be associated to restricted atmospheric dispersion and/or to changes in road traffic exhaust emission characteristics at low ambient air temperatures. This paper uses BRT models to study how different critical factors, and their relative importance, influence the variation of roadside NOx concentrations. The paper
Ibanez, C. A. G.; Carcellar, B. G., III; Paringit, E. C.; Argamosa, R. J. L.; Faelga, R. A. G.; Posilero, M. A. V.; Zaragosa, G. P.; Dimayacyac, N. A.
2016-06-01
Diameter-at-Breast-Height Estimation is a prerequisite in various allometric equations estimating important forestry indices like stem volume, basal area, biomass and carbon stock. LiDAR Technology has a means of directly obtaining different forest parameters, except DBH, from the behavior and characteristics of point cloud unique in different forest classes. Extensive tree inventory was done on a two-hectare established sample plot in Mt. Makiling, Laguna for a natural growth forest. Coordinates, height, and canopy cover were measured and types of species were identified to compare to LiDAR derivatives. Multiple linear regression was used to get LiDAR-derived DBH by integrating field-derived DBH and 27 LiDAR-derived parameters at 20m, 10m, and 5m grid resolutions. To know the best combination of parameters in DBH Estimation, all possible combinations of parameters were generated and automated using python scripts and additional regression related libraries such as Numpy, Scipy, and Scikit learn were used. The combination that yields the highest r-squared or coefficient of determination and lowest AIC (Akaike's Information Criterion) and BIC (Bayesian Information Criterion) was determined to be the best equation. The equation is at its best using 11 parameters at 10mgrid size and at of 0.604 r-squared, 154.04 AIC and 175.08 BIC. Combination of parameters may differ among forest classes for further studies. Additional statistical tests can be supplemented to help determine the correlation among parameters such as Kaiser- Meyer-Olkin (KMO) Coefficient and the Barlett's Test for Spherecity (BTS).
Evaluating Non-Linear Regression Models in Analysis of Persian Walnut Fruit Growth
I. Karamatlou
2016-02-01
Full Text Available Introduction: Persian walnut (Juglans regia L. is a large, wind-pollinated, monoecious, dichogamous, long lived, perennial tree cultivated for its high quality wood and nuts throughout the temperate regions of the world. Growth model methodology has been widely used in the modeling of plant growth. Mathematical models are important tools to study the plant growth and agricultural systems. These models can be applied for decision-making anddesigning management procedures in horticulture. Through growth analysis, planning for planting systems, fertilization, pruning operations, harvest time as well as obtaining economical yield can be more accessible.Non-linear models are more difficult to specify and estimate than linear models. This research was aimed to studynon-linear regression models based on data obtained from fruit weight, length and width. Selecting the best models which explain that fruit inherent growth pattern of Persian walnut was a further goal of this study. Materials and Methods: The experimental material comprising 14 Persian walnut genotypes propagated by seed collected from a walnut orchard in Golestan province, Minoudasht region, Iran, at latitude 37◦04’N; longitude 55◦32’E; altitude 1060 m, in a silt loam soil type. These genotypes were selected as a representative sampling of the many walnut genotypes available throughout the Northeastern Iran. The age range of walnut trees was 30 to 50 years. The annual mean temperature at the location is16.3◦C, with annual mean rainfall of 690 mm.The data used here is the average of walnut fresh fruit and measured withgram/millimeter/day in2011.According to the data distribution pattern, several equations have been proposed to describesigmoidal growth patterns. Here, we used double-sigmoid and logistic–monomolecular models to evaluate fruit growth based on fruit weight and4different regression models in cluding Richards, Gompertz, Logistic and Exponential growth for evaluation
Exploratory regression analysis: a tool for selecting models and determining predictor importance.
Braun, Michael T; Oswald, Frederick L
2011-06-01
Linear regression analysis is one of the most important tools in a researcher's toolbox for creating and testing predictive models. Although linear regression analysis indicates how strongly a set of predictor variables, taken together, will predict a relevant criterion (i.e., the multiple R), the analysis cannot indicate which predictors are the most important. Although there is no definitive or unambiguous method for establishing predictor variable importance, there are several accepted methods. This article reviews those methods for establishing predictor importance and provides a program (in Excel) for implementing them (available for direct download at http://dl.dropbox.com/u/2480715/ERA.xlsm?dl=1) . The program investigates all 2(p) - 1 submodels and produces several indices of predictor importance. This exploratory approach to linear regression, similar to other exploratory data analysis techniques, has the potential to yield both theoretical and practical benefits.
TU-AB-BRD-03: Fault Tree Analysis
Dunscombe, P. [University of Calgary (Canada)
2015-06-15
Current quality assurance and quality management guidelines provided by various professional organizations are prescriptive in nature, focusing principally on performance characteristics of planning and delivery devices. However, published analyses of events in radiation therapy show that most events are often caused by flaws in clinical processes rather than by device failures. This suggests the need for the development of a quality management program that is based on integrated approaches to process and equipment quality assurance. Industrial engineers have developed various risk assessment tools that are used to identify and eliminate potential failures from a system or a process before a failure impacts a customer. These tools include, but are not limited to, process mapping, failure modes and effects analysis, fault tree analysis. Task Group 100 of the American Association of Physicists in Medicine has developed these tools and used them to formulate an example risk-based quality management program for intensity-modulated radiotherapy. This is a prospective risk assessment approach that analyzes potential error pathways inherent in a clinical process and then ranks them according to relative risk, typically before implementation, followed by the design of a new process or modification of the existing process. Appropriate controls are then put in place to ensure that failures are less likely to occur and, if they do, they will more likely be detected before they propagate through the process, compromising treatment outcome and causing harm to the patient. Such a prospective approach forms the basis of the work of Task Group 100 that has recently been approved by the AAPM. This session will be devoted to a discussion of these tools and practical examples of how these tools can be used in a given radiotherapy clinic to develop a risk based quality management program. Learning Objectives: Learn how to design a process map for a radiotherapy process Learn how to
César Lavorenti
1990-01-01
Full Text Available O presente trabalho foi realizado com o objetivo de determinar a existência e as magnitudes de correlações e regressões lineares simples em plântulas jovens de seringueira (Hevea spp., para melhor condução de seleção nos futuros trabalhos de melhoramento. Foram utilizadas médias de produção de borracha seca por plântulas por corte, através do teste Hamaker-Morris-Mann (P; circunferência do caule (CC; espessura de casca (EC; número de anéis (NA; diâmetro dos vasos (DV; densidade dos vasos laticíleros (D e distância média entre anéis de vasos consecutivos (DMEAVC em um viveiro de cruzamento com três anos e meio de idade. Os resultados mostraram, entre outros fatores, que as correlações lineares simples de P com CC, EC, NA, D, DV e DMEAVC foram, respectivamente, r =t 0,61, 0,34, 0,28, 0,29, 0,43 e -0,13. As correlações de CC com EC, NA, D, DV e DMEAVC foram: 0,65, 0,22, 0,37, 0,33 e 0,096 respectivamente. Estudos de regressão linear simples de P com CC, EC, NA, DV, D e DMEAVC sugerem que CC foi o caráter independente mais significativo, contribuindo com 36% da variação em P. Em relação ao vigor, a regressão de CC com os respectivos caracteres sugere que EC foi o único caráter que contribuiu significativamente para a variação de CC com 42%. As altas correlações observadas da produção com circunferência do caule e com espessura de casca evidenciam a possibilidade de obter genótipos jovens de boa capacidade produtiva e grande vigor, através de seleção precoce dessas variáveis.This study was undertaken aiming to determine the existence of linear correlations, based on simple regression studies for a better improvement of young rubber tree (Hevea spp. breeding and selection. The characters studied were: yield of dry rubber per tapping by Hamaker-Morris-Mann test tapping (P, mean gurth (CC, bark thickness (EC, number of latex vessel rings (NA, diameter of latex vesseis (DV, density of latex vesseis per 5mm
Analysis of Ion in Water Tree by Micro-FTIR, and Evaluation of Harmfulness of Water Tree
Sugimoto, Shu; Fujimura, Yoshinori; Nagahara, Shigeki; Nakade, Masahiko; Watabe, Mitsuhiro
In the case of XLPE cables, there is the influence of the ion in the water trees as deterioration factors by the ones. Therefore we have analyzed the element by EDX so far, but it was the analysis technique that had difficulty with the distinction of the multi atomic ion. Therefore we developed ion analysis technique in the water trees including the multi atomic ion by Micro-FTIR, and utilize it as to analyzing the actual cables. In addition, we have investigated the harmfulness of each electrolyte detected from the actual cables.
A. J. Cannon
2011-03-01
Full Text Available A global climate classification is defined using a multivariate regression tree (MRT. The MRT algorithm is automated, which removes the need for a practitioner to manually define the classes; it is hierarchical, which allows a series of nested classes to be defined; and it is rule-based, which allows climate classes to be unambiguously defined and easily interpreted. Climate variables used in the MRT are restricted to those from the Köppen-Geiger climate classification. The result is a hierarchical, rule-based climate classification that can be directly compared against the traditional system. An objective comparison between the two climate classifications at their 5, 13, and 30 class hierarchical levels indicates that both perform well in terms of identifying regions of homogeneous temperature variability, although the MRT still generally outperforms the Köppen-Geiger system. In terms of precipitation discrimination, the Köppen-Geiger classification performs poorly relative to the MRT. The data and algorithm implementation used in this study are freely available. Thus, the MRT climate classification offers instructors and students in the geosciences a simple instrument for exploring modern, computer-based climatological methods.
A. J. Cannon
2012-01-01
Full Text Available A global climate classification is defined using a multivariate regression tree (MRT. The MRT algorithm is automated, hierarchical, and rule-based, thus allowing a system of climate classes to be quickly defined and easily interpreted. Climate variables used in the MRT are restricted to those from the Köppen-Geiger classification system. The result is a set of classes that can be directly compared against those from the traditional system. The two climate classifications are compared at their 5, 13, and 30 class hierarchical levels in terms of climate homogeneity. Results indicate that both perform well in terms of identifying regions of homogeneous temperature variability, although the MRT still generally outperforms the Köppen-Geiger system. In terms of precipitation discrimination, the Köppen-Geiger classification performs poorly relative to the MRT. The data and algorithm implementation used in this study are freely available. Thus, the MRT climate classification offers instructors and students in the geosciences a simple instrument for exploring modern, computer-based climatological methods.
Suzuki, Hideaki; Tabata, Takahisa; Koizumi, Hiroki; Hohchi, Nobusuke; Takeuchi, Shoko; Kitamura, Takuro; Fujino, Yoshihisa; Ohbuchi, Toyoaki
2014-12-01
This study aimed to create a multiple regression model for predicting hearing outcomes of idiopathic sudden sensorineural hearing loss (ISSNHL). The participants were 205 consecutive patients (205 ears) with ISSNHL (hearing level ≥ 40 dB, interval between onset and treatment ≤ 30 days). They received systemic steroid administration combined with intratympanic steroid injection. Data were examined by simple and multiple regression analyses. Three hearing indices (percentage hearing improvement, hearing gain, and posttreatment hearing level [HLpost]) and 7 prognostic factors (age, days from onset to treatment, initial hearing level, initial hearing level at low frequencies, initial hearing level at high frequencies, presence of vertigo, and contralateral hearing level) were included in the multiple regression analysis as dependent and explanatory variables, respectively. In the simple regression analysis, the percentage hearing improvement, hearing gain, and HLpost showed significant correlation with 2, 5, and 6 of the 7 prognostic factors, respectively. The multiple correlation coefficients were 0.396, 0.503, and 0.714 for the percentage hearing improvement, hearing gain, and HLpost, respectively. Predicted values of HLpost calculated by the multiple regression equation were reliable with 70% probability with a 40-dB-width prediction interval. Prediction of HLpost by the multiple regression model may be useful to estimate the hearing prognosis of ISSNHL. © The Author(s) 2014.
Modeling of retardance in ferrofluid with Taguchi-based multiple regression analysis
Lin, Jing-Fung; Wu, Jyh-Shyang; Sheu, Jer-Jia
2015-03-01
The citric acid (CA) coated Fe3O4 ferrofluids are prepared by a co-precipitation method and the magneto-optical retardance property is measured by a Stokes polarimeter. Optimization and multiple regression of retardance in ferrofluids are executed by combining Taguchi method and Excel. From the nine tests for four parameters, including pH of suspension, molar ratio of CA to Fe3O4, volume of CA, and coating temperature, influence sequence and excellent program are found. Multiple regression analysis and F-test on the significance of regression equation are performed. It is found that the model F value is much larger than Fcritical and significance level P <0.0001. So it can be concluded that the regression model has statistically significant predictive ability. Substituting excellent program into equation, retardance is obtained as 32.703°, higher than the highest value in tests by 11.4%.
Many units in public housing or other low-income urban dwellings may have elevated pesticide residues, given recurring infestation, but it would be logistically and economically infeasible to sample a large number of units to identify highly exposed households to design interven...
Sudden losses of managed honey bee (Apis mellifera L.) colonies are considered an important problem worldwide but the underlying cause or causes of these losses are currently unknown. In the United States, this syndrome was termed Colony Collapse Disorder (CCD), since the defining trait was a rapid ...
The use of regression trees to the analysis of real estate market of housing
Jasińska, Elżbieta; Preweda, Edward
2013-01-01
Housing property can be described using a number of traits, some of which turn out to be difficult to express in numerical form. You can also use parameters which are not subject to explicit quantification. Belong to it selected location or membership of an exclusive building project, or opposed to a specific buyer reluctance of the complex. The attributes relevant to buyers may be disregarded just because of problems with their recognition in shaping market prices. A multitude of real estate...
Trend Analysis of Cancer Mortality and Incidence in Panama, Using Joinpoint Regression Analysis
Politis, Michael; Higuera, Gladys; Chang, Lissette Raquel; Gomez, Beatriz; Bares, Juan; Motta, Jorge
2015-01-01
Abstract Cancer is one of the leading causes of death worldwide and its incidence is expected to increase in the future. In Panama, cancer is also one of the leading causes of death. In 1964, a nationwide cancer registry was started and it was restructured and improved in 2012. The aim of this study is to utilize Joinpoint regression analysis to study the trends of the incidence and mortality of cancer in Panama in the last decade. Cancer mortality was estimated from the Panamanian National Institute of Census and Statistics Registry for the period 2001 to 2011. Cancer incidence was estimated from the Panamanian National Cancer Registry for the period 2000 to 2009. The Joinpoint Regression Analysis program, version 4.0.4, was used to calculate trends by age-adjusted incidence and mortality rates for selected cancers. Overall, the trend of age-adjusted cancer mortality in Panama has declined over the last 10 years (−1.12% per year). The cancers for which there was a significant increase in the trend of mortality were female breast cancer and ovarian cancer; while the highest increases in incidence were shown for breast cancer, liver cancer, and prostate cancer. Significant decrease in the trend of mortality was evidenced for the following: prostate cancer, lung and bronchus cancer, and cervical cancer; with respect to incidence, only oral and pharynx cancer in both sexes had a significant decrease. Some cancers showed no significant trends in incidence or mortality. This study reveals contrasting trends in cancer incidence and mortality in Panama in the last decade. Although Panama is considered an upper middle income nation, this study demonstrates that some cancer mortality trends, like the ones seen in cervical and lung cancer, behave similarly to the ones seen in high income countries. In contrast, other types, like breast cancer, follow a pattern seen in countries undergoing a transition to a developed economy with its associated lifestyle, nutrition, and
Fault tree analysis of most common rolling bearing tribological failures
Vencl, Aleksandar; Gašić, Vlada; Stojanović, Blaža
2017-02-01
Wear as a tribological process has a major influence on the reliability and life of rolling bearings. Field examinations of bearing failures due to wear indicate possible causes and point to the necessary measurements for wear reduction or elimination. Wear itself is a very complex process initiated by the action of different mechanisms, and can be manifested by different wear types which are often related. However, the dominant type of wear can be approximately determined. The paper presents the classification of most common bearing damages according to the dominant wear type, i.e. abrasive wear, adhesive wear, surface fatigue wear, erosive wear, fretting wear and corrosive wear. The wear types are correlated with the terms used in ISO 15243 standard. Each wear type is illustrated with an appropriate photograph, and for each wear type, appropriate description of causes and manifestations is presented. Possible causes of rolling bearing failure are used for the fault tree analysis (FTA). It was performed to determine the root causes for bearing failures. The constructed fault tree diagram for rolling bearing failure can be useful tool for maintenance engineers.
Analysis of thermal conductivity in tree-like branched networks
Kou Jian-Long; Lu Hang-Jun; Wu Feng-Min; Xu You-Sheng
2009-01-01
Asymmetric tree-like branched networks are explored by geometric algorithms.Based on the network,an analysis of the thermal conductivity is presented.The relationship between effective thermal conductivity and geometric structures is obtained by using the thermal-electrical analogy technique.In all studied cases,a clear behaviour is observed,where angle(δ,θ)among parent branching extended lines,branches and parameter of the geometric structures have stronger effects on the effective thermal conductivity.When the angle δ is fixed,the optical diameter ratio β* is dependent on angle θ.Moreover,γ and m are not related to β*.The longer the branch is,the smaller the effective thermal conductivity will be.It is also found that when the angle θ＜δ/2,the higher the iteration m is,the lower the thermal conductivity will be and it tends to zero,otherwise,it is bigger than zero.When the diameter ratio β1＜0.707 and angle δ is bigger,the optimal k of the perfect ratio increases with the increase of the angle δ;when β1＞0.707,the optimal k decreases.In addition,the effective thermal conductivity is always less than that of single channel material.The present results also show that the effective thermal conductivity of the asymmetric tree-like branched networks does not obey Murray's law.
A Study of Characteristics of Water Tree Growth by Electric Field Analysis
Nakade, Masahiko; Inoue, Daisuke
Three-dimensional electric fields analysis was applied to the tip of water trees in XLPE cable. First, pre-breakdown detection was curried out on “needle-shaped” water trees. The results were analyzed by the three-dimensional electric field F.E.M. The water tree was simplified by a spheroid in the analysis. The position of the trees, length, tip radius were read from the microphotographs. The analysis was done on all 11 examples. The test results could be explained well when the conductivity of the water tree region was assumed to be 5×10-7S/m. Next, the electric fields of tip of three kinds of water tree (“blue”, “needle-shaped” and “white” water tree) were analyzed. Three kinds of water tree were expressed by changing the conductivity of water tree region. And, a distance from water trees to inside half conductor was made to change in three kinds. These were analyzed about the cable (insulation thickness: 3, 6, 9 mm) of 6.6, 22, 66 kV respectively. It was found out that “blue” and needle-shaped tree in the 66 kV cable and “blue” tree in the 22 kV cable may cause a breakdown under the operation voltage. As for other cases, the tree may propagate without making breakdown until it bridges the insulation. And, the possibility that the growth of “white” water tree declined rapidly in 66, the 22 kV cable when it touches inner semi-conducting layer so that the tip electric fields of the tree are the same as the average electric fields of the cable was suggested.
Barndorff-Nielsen, Ole Eiler; Shephard, N.
2004-01-01
This paper analyses multivariate high frequency financial data using realized covariation. We provide a new asymptotic distribution theory for standard methods such as regression, correlation analysis, and covariance. It will be based on a fixed interval of time (e.g., a day or week), allowing...... the number of high frequency returns during this period to go to infinity. Our analysis allows us to study how high frequency correlations, regressions, and covariances change through time. In particular we provide confidence intervals for each of these quantities....
Analysis of Functional Data with Focus on Multinomial Regression and Multilevel Data
Mousavi, Seyed Nourollah
Functional data analysis (FDA) is a fast growing area in statistical research with increasingly diverse range of application from economics, medicine, agriculture, chemometrics, etc. Functional regression is an area of FDA which has received the most attention both in aspects of application...... and methodological development. Our main Functional data analysis (FDA) is a fast growing area in statistical research with increasingly diverse range of application from economics, medicine, agriculture, chemometrics, etc. Functional regression is an area of FDA which has received the most attention both in aspects...
Factor analysis and multiple regression between topography and precipitation on Jeju Island, Korea
Um, Myoung-Jin; Yun, Hyeseon; Jeong, Chang-Sam; Heo, Jun-Haeng
2011-11-01
SummaryIn this study, new factors that influence precipitation were extracted from geographic variables using factor analysis, which allow for an accurate estimation of orographic precipitation. Correlation analysis was also used to examine the relationship between nine topographic variables from digital elevation models (DEMs) and the precipitation in Jeju Island. In addition, a spatial analysis was performed in order to verify the validity of the regression model. From the results of the correlation analysis, it was found that all of the topographic variables had a positive correlation with the precipitation. The relations between the variables also changed in accordance with a change in the precipitation duration. However, upon examining the correlation matrix, no significant relationship between the latitude and the aspect was found. According to the factor analysis, eight topographic variables (latitude being the exception) were found to have a direct influence on the precipitation. Three factors were then extracted from the eight topographic variables. By directly comparing the multiple regression model with the factors (model 1) to the multiple regression model with the topographic variables (model 3), it was found that model 1 did not violate the limits of statistical significance and multicollinearity. As such, model 1 was considered to be appropriate for estimating the precipitation when taking into account the topography. In the study of model 1, the multiple regression model using factor analysis was found to be the best method for estimating the orographic precipitation on Jeju Island.
A general framework for the use of logistic regression models in meta-analysis.
Simmonds, Mark C; Higgins, Julian Pt
2016-12-01
Where individual participant data are available for every randomised trial in a meta-analysis of dichotomous event outcomes, "one-stage" random-effects logistic regression models have been proposed as a way to analyse these data. Such models can also be used even when individual participant data are not available and we have only summary contingency table data. One benefit of this one-stage regression model over conventional meta-analysis methods is that it maximises the correct binomial likelihood for the data and so does not require the common assumption that effect estimates are normally distributed. A second benefit of using this model is that it may be applied, with only minor modification, in a range of meta-analytic scenarios, including meta-regression, network meta-analyses and meta-analyses of diagnostic test accuracy. This single model can potentially replace the variety of often complex methods used in these areas. This paper considers, with a range of meta-analysis examples, how random-effects logistic regression models may be used in a number of different types of meta-analyses. This one-stage approach is compared with widely used meta-analysis methods including Bayesian network meta-analysis and the bivariate and hierarchical summary receiver operating characteristic (ROC) models for meta-analyses of diagnostic test accuracy.
Analysing count data of Butterflies communities in Jasin, Melaka: A Poisson regression analysis
Afiqah Muhamad Jamil, Siti; Asrul Affendi Abdullah, M.; Kek, Sie Long; Nor, Maria Elena; Mohamed, Maryati; Ismail, Norradihah
2017-09-01
Counting outcomes normally have remaining values highly skewed toward the right as they are often characterized by large values of zeros. The data of butterfly communities, had been taken from Jasin, Melaka and consists of 131 number of subject visits in Jasin, Melaka. In this paper, considering the count data of butterfly communities, an analysis is considered Poisson regression analysis as it is assumed to be an alternative way on better suited to the counting process. This research paper is about analysing count data from zero observation ecological inference of butterfly communities in Jasin, Melaka by using Poisson regression analysis. The software for Poisson regression is readily available and it is becoming more widely used in many field of research and the data was analysed by using SAS software. The purpose of analysis comprised the framework of identifying the concerns. Besides, by using Poisson regression analysis, the study determines the fitness of data for accessing the reliability on using the count data. The finding indicates that the highest and lowest number of subject comes from the third family (Nymphalidae) family and fifth (Hesperidae) family and the Poisson distribution seems to fit the zero values.
Regression Model Term Selection for the Analysis of Strain-Gage Balance Calibration Data
Ulbrich, Norbert Manfred; Volden, Thomas R.
2010-01-01
The paper discusses the selection of regression model terms for the analysis of wind tunnel strain-gage balance calibration data. Different function class combinations are presented that may be used to analyze calibration data using either a non-iterative or an iterative method. The role of the intercept term in a regression model of calibration data is reviewed. In addition, useful algorithms and metrics originating from linear algebra and statistics are recommended that will help an analyst (i) to identify and avoid both linear and near-linear dependencies between regression model terms and (ii) to make sure that the selected regression model of the calibration data uses only statistically significant terms. Three different tests are suggested that may be used to objectively assess the predictive capability of the final regression model of the calibration data. These tests use both the original data points and regression model independent confirmation points. Finally, data from a simplified manual calibration of the Ames MK40 balance is used to illustrate the application of some of the metrics and tests to a realistic calibration data set.
Baghi, Q; Bergé, J; Christophe, B; Touboul, P; Rodrigues, M
2015-01-01
The analysis of physical measurements often copes with highly correlated noises and interruptions caused by outliers, saturation events or transmission losses. We assess the impact of missing data on the performance of linear regression analysis involving the fit of modeled or measured time series. We show that data gaps can significantly alter the precision of the regression parameter estimation in the presence of colored noise, due to the frequency leakage of the noise power. We present a regression method which cancels this effect and estimates the parameters of interest with a precision comparable to the complete data case, even if the noise power spectral density (PSD) is not known a priori. The method is based on an autoregressive (AR) fit of the noise, which allows us to build an approximate generalized least squares estimator approaching the minimal variance bound. The method, which can be applied to any similar data processing, is tested on simulated measurements of the MICROSCOPE space mission, whos...
Fang Ye; Zhi-Hua Chen; Jie Chen; Fang Liu; Yong Zhang; Qin-Ying Fan; Lin Wang
2016-01-01
Background:In the past decades,studies on infant anemia have mainly focused on rural areas of China.With the increasing heterogeneity of population in recent years,available information on infant anemia is inconclusive in large cities of China,especially with comparison between native residents and floating population.This population-based cross-sectional study was implemented to determine the anemic status of infants as well as the risk factors in a representative downtown area of Beijing.Methods:As useful methods to build a predictive model,Chi-squared automatic interaction detection (CHAID) decision tree analysis and logistic regression analysis were introduced to explore risk factors of infant anemia.A total of 1091 infants aged 6-12 months together with their parents/caregivers living at Heping Avenue Subdistrict of Beijing were surveyed from January 1,2013 to December 31,2014.Results:The prevalence of anemia was 12.60％ with a range of 3.47％-40.00％ in different subgroup characteristics.The CHAID decision tree model has demonstrated multilevel interaction among risk factors through stepwise pathways to detect anemia.Besides the three predictors identified by logistic regression model including maternal anemia during pregnancy,exclusive breastfeeding in the first 6 months,and floating population,CHAID decision tree analysis also identified the fourth risk factor,the matemal educational level,with higher overall classification accuracy and larger area below the receiver operating characteristic curve.Conclusions:The infant anemic status in metropolis is complex and should be carefully considered by the basic health care practitioners.CHAID decision tree analysis has demonstrated a better performance in hierarchical analysis of population with great heterogeneity.Risk factors identified by this study might be meaningful in the early detection and prompt treatment of infant anemia in large cities.
Spline Nonparametric Regression Analysis of Stress-Strain Curve of Confined Concrete
Tavio Tavio
2008-01-01
Full Text Available Due to enormous uncertainties in confinement models associated with the maximum compressive strength and ductility of concrete confined by rectilinear ties, the implementation of spline nonparametric regression analysis is proposed herein as an alternative approach. The statistical evaluation is carried out based on 128 large-scale column specimens of either normal-or high-strength concrete tested under uniaxial compression. The main advantage of this kind of analysis is that it can be applied when the trend of relation between predictor and response variables are not obvious. The error in the analysis can, therefore, be minimized so that it does not depend on the assumption of a particular shape of the curve. This provides higher flexibility in the application. The results of the statistical analysis indicates that the stress-strain curves of confined concrete obtained from the spline nonparametric regression analysis proves to be in good agreement with the experimental curves available in literatures
No rationale for 1 variable per 10 events criterion for binary logistic regression analysis
Maarten van Smeden
2016-11-01
Full Text Available Abstract Background Ten events per variable (EPV is a widely advocated minimal criterion for sample size considerations in logistic regression analysis. Of three previous simulation studies that examined this minimal EPV criterion only one supports the use of a minimum of 10 EPV. In this paper, we examine the reasons for substantial differences between these extensive simulation studies. Methods The current study uses Monte Carlo simulations to evaluate small sample bias, coverage of confidence intervals and mean square error of logit coefficients. Logistic regression models fitted by maximum likelihood and a modified estimation procedure, known as Firth’s correction, are compared. Results The results show that besides EPV, the problems associated with low EPV depend on other factors such as the total sample size. It is also demonstrated that simulation results can be dominated by even a few simulated data sets for which the prediction of the outcome by the covariates is perfect (‘separation’. We reveal that different approaches for identifying and handling separation leads to substantially different simulation results. We further show that Firth’s correction can be used to improve the accuracy of regression coefficients and alleviate the problems associated with separation. Conclusions The current evidence supporting EPV rules for binary logistic regression is weak. Given our findings, there is an urgent need for new research to provide guidance for supporting sample size considerations for binary logistic regression analysis.
Family background variables as instruments for education in income regressions: A Bayesian analysis
L.F. Hoogerheide (Lennart); J.H. Block (Jörn); A.R. Thurik (Roy)
2012-01-01
textabstractThe validity of family background variables instrumenting education in income regressions has been much criticized. In this paper, we use data from the 2004 German Socio-Economic Panel and Bayesian analysis to analyze to what degree violations of the strict validity assumption affect the
A systematic review and meta-regression analysis of mivacurium for tracheal intubation
Vanlinthout, L.E.H.; Mesfin, S.H.; Hens, N.; Vanacker, B.F.; Robertson, E.N.; Booij, L.H.D.J.
2014-01-01
We systematically reviewed factors associated with intubation conditions in randomised controlled trials of mivacurium, using random-effects meta-regression analysis. We included 29 studies of 1050 healthy participants. Four factors explained 72.9% of the variation in the probability of excellent in
Schulz, Wolfram
Differences in student knowledge about democracy, institutions, and citizenship and students skills in interpreting political communication were studied through multilevel regression analysis of results from the second International Education Association (IEA) Study. This study provides data on 14-year-old students from 28 countries in Europe,…
Isolating the Effects of Training Using Simple Regression Analysis: An Example of the Procedure.
Waugh, C. Keith
This paper provides a case example of simple regression analysis, a forecasting procedure used to isolate the effects of training from an identified extraneous variable. This case example focuses on results of a three-day sales training program to improve bank loan officers' knowledge, skill-level, and attitude regarding solicitation and sale of…
Multiple Logistic Regression Analysis of Cigarette Use among High School Students
Adwere-Boamah, Joseph
2011-01-01
A binary logistic regression analysis was performed to predict high school students' cigarette smoking behavior from selected predictors from 2009 CDC Youth Risk Behavior Surveillance Survey. The specific target student behavior of interest was frequent cigarette use. Five predictor variables included in the model were: a) race, b) frequency of…
Bates, Reid A.; Holton, Elwood F., III; Burnett, Michael F.
1999-01-01
A case study of learning transfer demonstrates the possible effect of influential observation on linear regression analysis. A diagnostic method that tests for violation of assumptions, multicollinearity, and individual and multiple influential observations helps determine which observation to delete to eliminate bias. (SK)
Declining Bias and Gender Wage Discrimination? A Meta-Regression Analysis
Jarrell, Stephen B.; Stanley, T. D.
2004-01-01
The meta-regression analysis reveals that there is a strong tendency for discrimination estimates to fall and wage discrimination exist against the woman. The biasing effect of researchers' gender of not correcting for selection bias has weakened and changes in labor market have made it less important.
Bayesian nonparametric meta-analysis using Polya tree mixture models.
Branscum, Adam J; Hanson, Timothy E
2008-09-01
Summary. A common goal in meta-analysis is estimation of a single effect measure using data from several studies that are each designed to address the same scientific inquiry. Because studies are typically conducted in geographically disperse locations, recent developments in the statistical analysis of meta-analytic data involve the use of random effects models that account for study-to-study variability attributable to differences in environments, demographics, genetics, and other sources that lead to heterogeneity in populations. Stemming from asymptotic theory, study-specific summary statistics are modeled according to normal distributions with means representing latent true effect measures. A parametric approach subsequently models these latent measures using a normal distribution, which is strictly a convenient modeling assumption absent of theoretical justification. To eliminate the influence of overly restrictive parametric models on inferences, we consider a broader class of random effects distributions. We develop a novel hierarchical Bayesian nonparametric Polya tree mixture (PTM) model. We present methodology for testing the PTM versus a normal random effects model. These methods provide researchers a straightforward approach for conducting a sensitivity analysis of the normality assumption for random effects. An application involving meta-analysis of epidemiologic studies designed to characterize the association between alcohol consumption and breast cancer is presented, which together with results from simulated data highlight the performance of PTMs in the presence of nonnormality of effect measures in the source population.
Learning from examples - Generation and evaluation of decision trees for software resource analysis
Selby, Richard W.; Porter, Adam A.
1988-01-01
A general solution method for the automatic generation of decision (or classification) trees is investigated. The approach is to provide insights through in-depth empirical characterization and evaluation of decision trees for software resource data analysis. The trees identify classes of objects (software modules) that had high development effort. Sixteen software systems ranging from 3,000 to 112,000 source lines were selected for analysis from a NASA production environment. The collection and analysis of 74 attributes (or metrics), for over 4,700 objects, captured information about the development effort, faults, changes, design style, and implementation style. A total of 9,600 decision trees were automatically generated and evaluated. The trees correctly identified 79.3 percent of the software modules that had high development effort or faults, and the trees generated from the best parameter combinations correctly identified 88.4 percent of the modules on the average.
Analysis of designed experiments by stabilised PLS Regression and jack-knifing
Martens, Harald; Høy, M.; Westad, F.
2001-01-01
Pragmatical, visually oriented methods for assessing and optimising bi-linear regression models are described, and applied to PLS Regression (PLSR) analysis of multi-response data from controlled experiments. The paper outlines some ways to stabilise the PLSR method to extend its range...... the reliability of the linear and bi-linear model parameter estimates. The paper illustrates how the obtained PLSR "significance" probabilities are similar to those from conventional factorial ANOVA, but the PLSR is shown to give important additional overview plots of the main relevant structures in the multi...
Automatic regression analysis for use in a complex system of evaluation of plant genetic resources
Cs. ARKOSSY
1984-08-01
Full Text Available In accordance with the general requirements regarding computerization in gene banks and germplasm research a computer program has been compiled for the analysis of univariate response in crop germplasm evaluation. The program is compiled in COBOL and run on a FELIX C-256 computer. The different modules of the program allows for: (1. data control and error listing; (2 computation of the regression function; (3 listing of the difference between the values measured and computed; (4 sorting of the individuals samples; (5 construction of scattergrams in two dimensions for measured values with the simultaneous representation of the regression line; (6 listing of examined samples in a sequence required in evaluation.
Methods and applications of linear models regression and the analysis of variance
Hocking, Ronald R
2013-01-01
Praise for the Second Edition"An essential desktop reference book . . . it should definitely be on your bookshelf." -Technometrics A thoroughly updated book, Methods and Applications of Linear Models: Regression and the Analysis of Variance, Third Edition features innovative approaches to understanding and working with models and theory of linear regression. The Third Edition provides readers with the necessary theoretical concepts, which are presented using intuitive ideas rather than complicated proofs, to describe the inference that is appropriate for the methods being discussed. The book
Wang, D Z; Wang, C; Shen, C F; Zhang, Y; Zhang, H; Song, G D; Xue, X D; Xu, Z L; Zhang, S; Jiang, G H
2017-05-10
We described the time trend of acute myocardial infarction (AMI) from 1999 to 2013 in Tianjin incidence rate with Cochran-Armitage trend (CAT) test and linear regression analysis, and the results were compared. Based on actual population, CAT test had much stronger statistical power than linear regression analysis for both overall incidence trend and age specific incidence trend (Cochran-Armitage trend P valuelinear regression P value). The statistical power of CAT test decreased, while the result of linear regression analysis remained the same when population size was reduced by 100 times and AMI incidence rate remained unchanged. The two statistical methods have their advantages and disadvantages. It is necessary to choose statistical method according the fitting degree of data, or comprehensively analyze the results of two methods.
Greensmith, David J
2014-01-01
Here I present an Excel based program for the analysis of intracellular Ca transients recorded using fluorescent indicators. The program can perform all the necessary steps which convert recorded raw voltage changes into meaningful physiological information. The program performs two fundamental processes. (1) It can prepare the raw signal by several methods. (2) It can then be used to analyze the prepared data to provide information such as absolute intracellular Ca levels. Also, the rates of change of Ca can be measured using multiple, simultaneous regression analysis. I demonstrate that this program performs equally well as commercially available software, but has numerous advantages, namely creating a simplified, self-contained analysis workflow.
Integrated Analysis of Tropical Trees Growth: A Multivariate Approach
YÁÑEZ-ESPINOSA, LAURA; TERRAZAS, TERESA; LÓPEZ-MATA, LAURO
2006-01-01
• Background and Aims One of the problems analysing cause–effect relationships of growth and environmental factors is that a single factor could be correlated with other ones directly influencing growth. One attempt to understand tropical trees' growth cause–effect relationships is integrating research about anatomical, physiological and environmental factors that influence growth in order to develop mathematical models. The relevance is to understand the nature of the process of growth and to model this as a function of the environment. • Methods The relationships of Aphananthe monoica, Pleuranthodendron lindenii and Psychotria costivenia radial growth and phenology with environmental factors (local climate, vertical strata microclimate and physical and chemical soil variables) were evaluated from April 2000 to September 2001. The association among these groups of variables was determined by generalized canonical correlation analysis (GCCA), which considers the probable associations of three or more data groups and the selection of the most important variables for each data group. • Key Results The GCCA allowed determination of a general model of relationships among tree phenology and radial growth with climate, microclimate and soil factors. A strong influence of climate in phenology and radial growth existed. Leaf initiation and cambial activity periods were associated with maximum temperature and day length, and vascular tissue differentiation with soil moisture and rainfall. The analyses of individual species detected different relationships for the three species. • Conclusions The analyses of the individual species suggest that each one takes advantage in a different way of the environment in which they are growing, allowing them to coexist. PMID:16822807
Cirulli, N; Ballini, A; Cantore, S; Farronato, D; Inchingolo, F; Dipalma, G; Gatto, M R; Alessandri Bonetti, G
2015-01-01
Mixed dentition analysis forms a critical aspect of early orthodontic treatment. In fact an accurate space analysis is one of the important criteria in determining whether the treatment plan may involve serial extraction, guidance of eruption, space maintenance, space regaining or just periodic observation of the patients. The aim of the present study was to calculate linear regression equations in mixed dentition space analysis, measuring 230 dental casts mesiodistal tooth widths, obtained from southern Italian patients (118 females, 112 males, mean age 15±3 years). Students t-test or Wilcoxon test for independent and paired samples were used to determine right/left side and male/female differences. On the basis of the sum of the mesiodistal diameters of the 4 mandibular incisors as predictors for the sum of the widths of the canines and premolars in the mandibular mixed dentition, a new linear regression equation was found: y = 0.613x+7.294 (r= 0.701) for both genders in a southern Italian population. To better estimate the size of leeway space, a new regression equation was found to calculate the mesiodistal size of the second premolar using the sum of the four mandibular incisors, canine and first premolar as a predictor. The equation is y = 0.241x+1.224 (r= 0.732). In conclusion, new regression equations were derived for a southern Italian population.
Wen-Cheng Wang
2014-01-01
Full Text Available It is estimated that mainland Chinese tourists travelling to Taiwan can bring annual revenues of 400 billion NTD to the Taiwan economy. Thus, how the Taiwanese Government formulates relevant measures to satisfy both sides is the focus of most concern. Taiwan must improve the facilities and service quality of its tourism industry so as to attract more mainland tourists. This paper conducted a questionnaire survey of mainland tourists and used grey relational analysis in grey mathematics to analyze the satisfaction performance of all satisfaction question items. The first eight satisfaction items were used as independent variables, and the overall satisfaction performance was used as a dependent variable for quantile regression model analysis to discuss the relationship between the dependent variable under different quantiles and independent variables. Finally, this study further discussed the predictive accuracy of the least mean regression model and each quantile regression model, as a reference for research personnel. The analysis results showed that other variables could also affect the overall satisfaction performance of mainland tourists, in addition to occupation and age. The overall predictive accuracy of quantile regression model Q0.25 was higher than that of the other three models.
Wang, Wen-Cheng; Cho, Wen-Chien; Chen, Yin-Jen
2014-01-01
It is estimated that mainland Chinese tourists travelling to Taiwan can bring annual revenues of 400 billion NTD to the Taiwan economy. Thus, how the Taiwanese Government formulates relevant measures to satisfy both sides is the focus of most concern. Taiwan must improve the facilities and service quality of its tourism industry so as to attract more mainland tourists. This paper conducted a questionnaire survey of mainland tourists and used grey relational analysis in grey mathematics to analyze the satisfaction performance of all satisfaction question items. The first eight satisfaction items were used as independent variables, and the overall satisfaction performance was used as a dependent variable for quantile regression model analysis to discuss the relationship between the dependent variable under different quantiles and independent variables. Finally, this study further discussed the predictive accuracy of the least mean regression model and each quantile regression model, as a reference for research personnel. The analysis results showed that other variables could also affect the overall satisfaction performance of mainland tourists, in addition to occupation and age. The overall predictive accuracy of quantile regression model Q0.25 was higher than that of the other three models. PMID:24574916
Groenendijk, Peter; van der Sleen, Peter; Vlam, Mart; Bunyavejchewin, Sarayudh; Bongers, Frans; Zuidema, Pieter A
2015-10-01
The important role of tropical forests in the global carbon cycle makes it imperative to assess changes in their carbon dynamics for accurate projections of future climate-vegetation feedbacks. Forest monitoring studies conducted over the past decades have found evidence for both increasing and decreasing growth rates of tropical forest trees. The limited duration of these studies restrained analyses to decadal scales, and it is still unclear whether growth changes occurred over longer time scales, as would be expected if CO2 -fertilization stimulated tree growth. Furthermore, studies have so far dealt with changes in biomass gain at forest-stand level, but insights into species-specific growth changes - that ultimately determine community-level responses - are lacking. Here, we analyse species-specific growth changes on a centennial scale, using growth data from tree-ring analysis for 13 tree species (~1300 trees), from three sites distributed across the tropics. We used an established (regional curve standardization) and a new (size-class isolation) growth-trend detection method and explicitly assessed the influence of biases on the trend detection. In addition, we assessed whether aggregated trends were present within and across study sites. We found evidence for decreasing growth rates over time for 8-10 species, whereas increases were noted for two species and one showed no trend. Additionally, we found evidence for weak aggregated growth decreases at the site in Thailand and when analysing all sites simultaneously. The observed growth reductions suggest deteriorating growth conditions, perhaps due to warming. However, other causes cannot be excluded, such as recovery from large-scale disturbances or changing forest dynamics. Our findings contrast growth patterns that would be expected if elevated CO2 would stimulate tree growth. These results suggest that commonly assumed growth increases of tropical forests may not occur, which could lead to erroneous
Ussery, David; Bohlin, Jon; Skjerve, Eystein
2009-01-01
Recently there has been an explosion in the availability of bacterial genomic sequences, making possible now an analysis of genomic signatures across more than 800 hundred different bacterial chromosomes, from a wide variety of environments. Using genomic signatures, we pair-wise compared 867...... different genomic DNA sequences, taken from chromosomes and plasmids more than 100,000 base-pairs in length. Hierarchical clustering was performed on the outcome of the comparisons before a multinomial regression model was fitted. The regression model included the cluster groups as the response variable...... AT content. Small improvements to the regression model, although significant, were also obtained by factors such as sequence size, habitat, growth temperature, selective pressure measured as oligonucleotide usage variance, and oxygen requirement.The statistics obtained using hierarchical clustering...
Bayesian Method of Moments (BMOM) Analysis of Mean and Regression Models
Zellner, Arnold
2008-01-01
A Bayesian method of moments/instrumental variable (BMOM/IV) approach is developed and applied in the analysis of the important mean and multiple regression models. Given a single set of data, it is shown how to obtain posterior and predictive moments without the use of likelihood functions, prior densities and Bayes' Theorem. The posterior and predictive moments, based on a few relatively weak assumptions, are then used to obtain maximum entropy densities for parameters, realized error terms and future values of variables. Posterior means for parameters and realized error terms are shown to be equal to certain well known estimates and rationalized in terms of quadratic loss functions. Conditional maxent posterior densities for means and regression coefficients given scale parameters are in the normal form while scale parameters' maxent densities are in the exponential form. Marginal densities for individual regression coefficients, realized error terms and future values are in the Laplace or double-exponenti...
Groping Toward Linear Regression Analysis: Newton's Analysis of Hipparchus' Equinox Observations
Belenkiy, Ari
2008-01-01
In 1700, Newton, in designing a new universal calendar contained in the manuscripts known as Yahuda MS 24 from Jewish National and University Library at Jerusalem and analyzed in our recent article in Notes & Records Royal Society (59 (3), Sept 2005, pp. 223-54), attempted to compute the length of the tropical year using the ancient equinox observations reported by a famous Greek astronomer Hipparchus of Rhodes, ten in number. Though Newton had a very thin sample of data, he obtained a tropical year only a few seconds longer than the correct length. The reason lies in Newton's application of a technique similar to modern regression analysis. Actually he wrote down the first of the two so-called "normal equations" known from the Ordinary Least Squares method. Newton also had a vague understanding of qualitative variables. This paper concludes by discussing open historico-astronomical problems related to the inclination of the Earth's axis of rotation. In particular, ignorance about the long-range variation...
孙文胜; 胡玲敏
2011-01-01
Concerning the common problem of tag collision in Radio Frequency Identification (RFID) system, an improved anti-collision algorithm for multi-branch tree was proposed based on the regressive-style search algorithm.According to the characteristics of the tags collision, the presented algorithm adopted the dormancy count, and took quad tree structure when continuous collision appeared, which had the ability to choose the number of forks dynamically during the searching process, reduced the search range and improved the identification efficiency.The performance analysis results show that the system efficiency of the proposed algorithm is about 76.5％; moreover, with the number of tags increased, the superiority of the performance is more obvious.%针对无线射频识别(RFID)系统中常见的标签防碰撞问题,在后退式搜索算法的基础上提出了一种改进的多叉树防碰撞算法.根据标签碰撞的特点,采用休眠计数的方法,以及遇到连续碰撞位时进行四叉树分裂的策略,使得在搜索过程中能够动态选择分叉数量,缩短了标签识别时间,有效地提高了算法的搜索效率.性能分析表明,该算法的系统识别效率达76.5%,且随着标签数目的增多,优越性更加明显.
Toward a theory of statistical tree-shape analysis
Feragen, Aasa; Lo, Pechin Chien Pau; de Bruijne, Marleen
2013-01-01
these advantages. Along with the theoretical framework we provide experimental proof-of-concept results on synthetic data trees as well as small airway trees from pulmonary CT scans. This way, we illustrate that our framework has promising theoretical and qualitative properties necessary to build a theory...
Replica analysis of overfitting in regression models for time-to-event data
Coolen, A. C. C.; Barrett, J. E.; Paga, P.; Perez-Vicente, C. J.
2017-09-01
Overfitting, which happens when the number of parameters in a model is too large compared to the number of data points available for determining these parameters, is a serious and growing problem in survival analysis. While modern medicine presents us with data of unprecedented dimensionality, these data cannot yet be used effectively for clinical outcome prediction. Standard error measures in maximum likelihood regression, such as p-values and z-scores, are blind to overfitting, and even for Cox’s proportional hazards model (the main tool of medical statisticians), one finds in literature only rules of thumb on the number of samples required to avoid overfitting. In this paper we present a mathematical theory of overfitting in regression models for time-to-event data, which aims to increase our quantitative understanding of the problem and provide practical tools with which to correct regression outcomes for the impact of overfitting. It is based on the replica method, a statistical mechanical technique for the analysis of heterogeneous many-variable systems that has been used successfully for several decades in physics, biology, and computer science, but not yet in medical statistics. We develop the theory initially for arbitrary regression models for time-to-event data, and verify its predictions in detail for the popular Cox model.
FRICTION MODELING OF Al-Mg ALLOY SHEETS BASED ON MULTIPLE REGRESSION ANALYSIS AND NEURAL NETWORKS
Hirpa G. Lemu
2017-03-01
Full Text Available This article reports a proposed approach to a frictional resistance description in sheet metal forming processes that enables determination of the friction coefficient value under a wide range of friction conditions without performing time-consuming experiments. The motivation for this proposal is the fact that there exists a considerable amount of factors affect the friction coefficient value and as a result building analytical friction model for specified process conditions is practically impossible. In this proposed approach, a mathematical model of friction behaviour is created using multiple regression analysis and artificial neural networks. The regression analysis was performed using a subroutine in MATLAB programming code and STATISTICA Neural Networks was utilized to build an artificial neural networks model. The effect of different training strategies on the quality of neural networks was studied. As input variables for regression model and training of radial basis function networks, generalized regression neural networks and multilayer networks the results of strip drawing friction test were utilized. Four kinds of Al-Mg alloy sheets were used as a test material.
Multilayer perceptron for robust nonlinear interval regression analysis using genetic algorithms.
Hu, Yi-Chung
2014-01-01
On the basis of fuzzy regression, computational models in intelligence such as neural networks have the capability to be applied to nonlinear interval regression analysis for dealing with uncertain and imprecise data. When training data are not contaminated by outliers, computational models perform well by including almost all given training data in the data interval. Nevertheless, since training data are often corrupted by outliers, robust learning algorithms employed to resist outliers for interval regression analysis have been an interesting area of research. Several approaches involving computational intelligence are effective for resisting outliers, but the required parameters for these approaches are related to whether the collected data contain outliers or not. Since it seems difficult to prespecify the degree of contamination beforehand, this paper uses multilayer perceptron to construct the robust nonlinear interval regression model using the genetic algorithm. Outliers beyond or beneath the data interval will impose slight effect on the determination of data interval. Simulation results demonstrate that the proposed method performs well for contaminated datasets.
Multilayer Perceptron for Robust Nonlinear Interval Regression Analysis Using Genetic Algorithms
2014-01-01
On the basis of fuzzy regression, computational models in intelligence such as neural networks have the capability to be applied to nonlinear interval regression analysis for dealing with uncertain and imprecise data. When training data are not contaminated by outliers, computational models perform well by including almost all given training data in the data interval. Nevertheless, since training data are often corrupted by outliers, robust learning algorithms employed to resist outliers for interval regression analysis have been an interesting area of research. Several approaches involving computational intelligence are effective for resisting outliers, but the required parameters for these approaches are related to whether the collected data contain outliers or not. Since it seems difficult to prespecify the degree of contamination beforehand, this paper uses multilayer perceptron to construct the robust nonlinear interval regression model using the genetic algorithm. Outliers beyond or beneath the data interval will impose slight effect on the determination of data interval. Simulation results demonstrate that the proposed method performs well for contaminated datasets. PMID:25110755
Decision tree analysis of factors influencing rainfall-related building damage
M. H. Spekkers
2014-04-01
Full Text Available Flood damage prediction models are essential building blocks in flood risk assessments. Little research has been dedicated so far to damage of small-scale urban floods caused by heavy rainfall, while there is a need for reliable damage models for this flood type among insurers and water authorities. The aim of this paper is to investigate a wide range of damage-influencing factors and their relationships with rainfall-related damage, using decision tree analysis. For this, district-aggregated claim data from private property insurance companies in the Netherlands were analysed, for the period of 1998–2011. The databases include claims of water-related damage, for example, damages related to rainwater intrusion through roofs and pluvial flood water entering buildings at ground floor. Response variables being modelled are average claim size and claim frequency, per district per day. The set of predictors include rainfall-related variables derived from weather radar images, topographic variables from a digital terrain model, building-related variables and socioeconomic indicators of households. Analyses were made separately for property and content damage claim data. Results of decision tree analysis show that claim frequency is most strongly associated with maximum hourly rainfall intensity, followed by real estate value, ground floor area, household income, season (property data only, buildings age (property data only, ownership structure (content data only and fraction of low-rise buildings (content data only. It was not possible to develop statistically acceptable trees for average claim size, which suggest that variability in average claim size is related to explanatory variables that cannot be defined at the district scale. Cross-validation results show that decision trees were able to predict 22–26% of variance in claim frequency, which is considerably better compared to results from global multiple regression models (11–18% of
Decision tree analysis of factors influencing rainfall-related building damage
Spekkers, M. H.; Kok, M.; Clemens, F. H. L. R.; ten Veldhuis, J. A. E.
2014-04-01
Flood damage prediction models are essential building blocks in flood risk assessments. Little research has been dedicated so far to damage of small-scale urban floods caused by heavy rainfall, while there is a need for reliable damage models for this flood type among insurers and water authorities. The aim of this paper is to investigate a wide range of damage-influencing factors and their relationships with rainfall-related damage, using decision tree analysis. For this, district-aggregated claim data from private property insurance companies in the Netherlands were analysed, for the period of 1998-2011. The databases include claims of water-related damage, for example, damages related to rainwater intrusion through roofs and pluvial flood water entering buildings at ground floor. Response variables being modelled are average claim size and claim frequency, per district per day. The set of predictors include rainfall-related variables derived from weather radar images, topographic variables from a digital terrain model, building-related variables and socioeconomic indicators of households. Analyses were made separately for property and content damage claim data. Results of decision tree analysis show that claim frequency is most strongly associated with maximum hourly rainfall intensity, followed by real estate value, ground floor area, household income, season (property data only), buildings age (property data only), ownership structure (content data only) and fraction of low-rise buildings (content data only). It was not possible to develop statistically acceptable trees for average claim size, which suggest that variability in average claim size is related to explanatory variables that cannot be defined at the district scale. Cross-validation results show that decision trees were able to predict 22-26% of variance in claim frequency, which is considerably better compared to results from global multiple regression models (11-18% of variance explained). Still, a
Decision-tree analysis of factors influencing rainfall-related building structure and content damage
Spekkers, M. H.; Kok, M.; Clemens, F. H. L. R.; ten Veldhuis, J. A. E.
2014-09-01
Flood-damage prediction models are essential building blocks in flood risk assessments. So far, little research has been dedicated to damage from small-scale urban floods caused by heavy rainfall, while there is a need for reliable damage models for this flood type among insurers and water authorities. The aim of this paper is to investigate a wide range of damage-influencing factors and their relationships with rainfall-related damage, using decision-tree analysis. For this, district-aggregated claim data from private property insurance companies in the Netherlands were analysed, for the period 1998-2011. The databases include claims of water-related damage (for example, damages related to rainwater intrusion through roofs and pluvial flood water entering buildings at ground floor). Response variables being modelled are average claim size and claim frequency, per district, per day. The set of predictors include rainfall-related variables derived from weather radar images, topographic variables from a digital terrain model, building-related variables and socioeconomic indicators of households. Analyses were made separately for property and content damage claim data. Results of decision-tree analysis show that claim frequency is most strongly associated with maximum hourly rainfall intensity, followed by real estate value, ground floor area, household income, season (property data only), buildings age (property data only), a fraction of homeowners (content data only), a and fraction of low-rise buildings (content data only). It was not possible to develop statistically acceptable trees for average claim size. It is recommended to investigate explanations for the failure to derive models. These require the inclusion of other explanatory factors that were not used in the present study, an investigation of the variability in average claim size at different spatial scales, and the collection of more detailed insurance data that allows one to distinguish between the
Tutorial on Biostatistics: Linear Regression Analysis of Continuous Correlated Eye Data.
Ying, Gui-Shuang; Maguire, Maureen G; Glynn, Robert; Rosner, Bernard
2017-04-01
To describe and demonstrate appropriate linear regression methods for analyzing correlated continuous eye data. We describe several approaches to regression analysis involving both eyes, including mixed effects and marginal models under various covariance structures to account for inter-eye correlation. We demonstrate, with SAS statistical software, applications in a study comparing baseline refractive error between one eye with choroidal neovascularization (CNV) and the unaffected fellow eye, and in a study determining factors associated with visual field in the elderly. When refractive error from both eyes were analyzed with standard linear regression without accounting for inter-eye correlation (adjusting for demographic and ocular covariates), the difference between eyes with CNV and fellow eyes was 0.15 diopters (D; 95% confidence interval, CI -0.03 to 0.32D, p = 0.10). Using a mixed effects model or a marginal model, the estimated difference was the same but with narrower 95% CI (0.01 to 0.28D, p = 0.03). Standard regression for visual field data from both eyes provided biased estimates of standard error (generally underestimated) and smaller p-values, while analysis of the worse eye provided larger p-values than mixed effects models and marginal models. In research involving both eyes, ignoring inter-eye correlation can lead to invalid inferences. Analysis using only right or left eyes is valid, but decreases power. Worse-eye analysis can provide less power and biased estimates of effect. Mixed effects or marginal models using the eye as the unit of analysis should be used to appropriately account for inter-eye correlation and maximize power and precision.
Akkaya, Ali Volkan [Department of Mechanical Engineering, Yildiz Technical University, 34349 Besiktas, Istanbul (Turkey)
2009-02-15
In this paper, multiple nonlinear regression models for estimation of higher heating value of coals are developed using proximate analysis data obtained generally from the low rank coal samples as-received basis. In this modeling study, three main model structures depended on the number of proximate analysis parameters, which are named the independent variables, such as moisture, ash, volatile matter and fixed carbon, are firstly categorized. Secondly, sub-model structures with different arrangements of the independent variables are considered. Each sub-model structure is analyzed with a number of model equations in order to find the best fitting model using multiple nonlinear regression method. Based on the results of nonlinear regression analysis, the best model for each sub-structure is determined. Among them, the models giving highest correlation for three main structures are selected. Although the selected all three models predicts HHV rather accurately, the model involving four independent variables provides the most accurate estimation of HHV. Additionally, when the chosen model with four independent variables and a literature model are tested with extra proximate analysis data, it is seen that that the developed model in this study can give more accurate prediction of HHV of coals. It can be concluded that the developed model is effective tool for HHV estimation of low rank coals. (author)
Analysis of the Evolution of the Gross Domestic Product by Means of Cyclic Regressions
Catalin Angelo Ioan
2011-08-01
Full Text Available In this article, we will carry out an analysis on the regularity of the Gross Domestic Product of a country, in our case the United States. The method of analysis is based on a new method of analysis – the cyclic regressions based on the Fourier series of a function. Another point of view is that of considering instead the growth rate of GDP the speed of variation of this rate, computed as a numerical derivative. The obtained results show a cycle for this indicator for 71 years, the mean square error being 0.93%. The method described allows an prognosis on short-term trends in GDP.
Multiple Regression Analysis of Unconfined Compression Strength of Mine Tailings Matrices
Mahmood Ali A.
2017-01-01
Full Text Available As part of a novel approach of sustainable development of mine tailings, experimental and numerical analysis is carried out on newly formulated tailings matrices. Several physical characteristic tests are carried out including the unconfined compression strength test to ascertain the integrity of these matrices when subjected to loading. The current paper attempts a multiple regression analysis of the unconfined compressive strength test results of these matrices to investigate the most pertinent factors affecting their strength. Results of this analysis showed that the suggested equation is reasonably applicable to the range of binder combinations used.
Javier Trujillano
2008-02-01
Full Text Available Objetivo: : Realizar una aproximación a la metodología de árboles de decisión tipo CART (Classification and Regression Trees desarrollando un modelo para calcular la probabilidad de muerte hospitalaria en infarto agudo de miocardio (IAM. Método: Se utiliza el conjunto mínimo básico de datos al alta hospitalaria (CMBD de Andalucía, Cataluña, Madrid y País Vasco de los años 2001 y 2002, que incluye los casos con IAM como diagnóstico principal. Los 33.203 pacientes se dividen aleatoriamente (70 y 30 % en grupo de desarrollo (GD = 23.277 y grupo de validación (GV = 9.926. Como CART se utiliza un modelo inductivo basado en el algoritmo de Breiman, con análisis de sensibilidad mediante el índice de Gini y sistema de validación cruzada. Se compara con un modelo de regresión logística (RL y una red neuronal artificial (RNA (multilayer perceptron. Los modelos desarrollados se contrastan en el GV y sus propiedades se comparan con el área bajo la curva ROC (ABC (intervalo de confianza del 95%. Resultados: En el GD el CART con ABC = 0,85 (0,86-0,88, RL 0,87 (0,86-0,88 y RNA 0,85 (0,85-0,86. En el GV el CART con ABC = 0,85 (0,85-0,88, RL 0,86 (0,85-0,88 y RNA 0,84 (0,83-0,86. Conclusiones: Los 3 modelos obtienen resultados similares en su capacidad de discriminación. El modelo CART ofrece como ventaja su simplicidad de uso y de interpretación, ya que las reglas de decisión que generan pueden aplicarse sin necesidad de procesos matemáticos.Objective: To provide an overview of decision trees based on CART (Classification and Regression Trees methodology. As an example, we developed a CART model intended to estimate the probability of intrahospital death from acute myocardial infarction (AMI. Method: We employed the minimum data set (MDS of Andalusia, Catalonia, Madrid and the Basque Country (2001-2002, which included 33,203 patients with a diagnosis of AMI. The 33,203 patients were randomly divided (70% and 30% into the development (DS
Hüseyin BUDAK
2012-11-01
Full Text Available Credit scoring is a vital topic for Banks since there is a need to use limited financial sources more effectively. There are several credit scoring methods that are used by Banks. One of them is to estimate whether a credit demanding customer’s repayment order will be regular or not. In this study, artificial neural networks and logistic regression analysis have been used to provide a support to the Banks’ credit risk prediction and to estimate whether a credit demanding customers’ repayment order will be regular or not. The results of the study showed that artificial neural networks method is more reliable than logistic regression analysis while estimating a credit demanding customer’s repayment order.
Analysis of Functional Data with Focus on Multinomial Regression and Multilevel Data
Mousavi, Seyed Nourollah
Functional data analysis (FDA) is a fast growing area in statistical research with increasingly diverse range of application from economics, medicine, agriculture, chemometrics, etc. Functional regression is an area of FDA which has received the most attention both in aspects of application...... and methodological development. Our main Functional data analysis (FDA) is a fast growing area in statistical research with increasingly diverse range of application from economics, medicine, agriculture, chemometrics, etc. Functional regression is an area of FDA which has received the most attention both in aspects...... and the prediction of the response at time t only depends on th concurrently observed predictor. We introduce a version of this model for multilevel functional data of the type subjectunit, with the unit-level data being functional observations. Finally, in the fourth paper we show how registration can be applied...
Forecasting municipal solid waste generation using prognostic tools and regression analysis.
Ghinea, Cristina; Drăgoi, Elena Niculina; Comăniţă, Elena-Diana; Gavrilescu, Marius; Câmpean, Teofil; Curteanu, Silvia; Gavrilescu, Maria
2016-11-01
For an adequate planning of waste management systems the accurate forecast of waste generation is an essential step, since various factors can affect waste trends. The application of predictive and prognosis models are useful tools, as reliable support for decision making processes. In this paper some indicators such as: number of residents, population age, urban life expectancy, total municipal solid waste were used as input variables in prognostic models in order to predict the amount of solid waste fractions. We applied Waste Prognostic Tool, regression analysis and time series analysis to forecast municipal solid waste generation and composition by considering the Iasi Romania case study. Regression equations were determined for six solid waste fractions (paper, plastic, metal, glass, biodegradable and other waste). Accuracy Measures were calculated and the results showed that S-curve trend model is the most suitable for municipal solid waste (MSW) prediction.
Regression analysis of growth responses to water depth in three wetland plant species
Sorrell, Brian K; Tanner, Chris C; Brix, Hans
2012-01-01
) differing in depth preferences in wetlands, using non-linear and quantile regression analyses to establish how flooding tolerance can explain field zonation. Methodology Plants were established for 8 months in outdoor cultures in waterlogged soil without standing water, and then randomly allocated to water...... depths from 0 – 0.5 m. Morphological and growth responses to depth were followed for 54 days before harvest, and then analysed by repeated measures analysis of covariance, and non-linear and quantile regression analysis (QRA), to compare flooding tolerances. Principal results Growth responses to depth...... differed between the three species, and were non-linear. P. tenax growth rapidly decreased in standing water > 0.25 m depth, C. secta growth increased initially with depth but then decreased at depths > 0.30 m, accompanied by increased shoot height and decreased shoot density, and T. orientalis...
Regression And Time Series Analysis Of Loan Default At Minescho Cooperative Credit Union Tarkwa
Otoo
2015-08-01
Full Text Available Abstract Lending in the form of loans is a principal business activity for banks credit unions and other financial institutions. This forms a substantial amount of the banks assets. However when these loans are defaulted it tends to have serious effects on the financial institutions. This study sought to determine the trend and forecast loan default at Minescho CreditUnion Tarkwa. A secondary data from the Credit Union was analyzed using Regression Analysis and the Box-Jenkins method of Time Series. From the Regression Analysis there was a moderately strong relationship between the amount of loan default and time. Also the amount of loan default had an increasing trend. The two years forecast of the amount of loan default oscillated initially and remained constant from 2016 onwards.
Regression Analysis of Right-censored Failure Time Data with Missing Censoring Indicators
Ping Chen; Ren He; Jun-shan Shen; Jian-guo Sun
2009-01-01
This paper discusses regression analysis of right-censored failure time data when censoring indicators are missing for some subjects. Several methods have been developed for the analysis under different situations and especially, Goetghebeur and Ryan[4] considered the situation where both the failure time and the censoring time follow the proportional hazards models marginally and developed an estimating equation approach. One limitation of their approach is that the two baseline hazard functions were assumed to be proportional to each other. We consider the same problem and present an efficient estimation procedure for regression parameters that does not require the proportionality assumption. An EM algorithm is developed and the method is evaluated by a simulation study, which indicates that the proposed methodology performs well for practical situations. An illustrative example is provided.
Charles H. (Hobie) Perry; Kevin J. Horn; R. Quinn Thomas; Linda H. Pardo; Erica A.H. Smithwick; Doug Baldwin; Gregory B. Lawrence; Scott W. Bailey; Sabine Braun; Christopher M. Clark; Mark Fenn; Annika Nordin; Jennifer N. Phelan; Paul G. Schaberg; Sam St. Clair; Richard Warby; Shaun Watmough; Steven S. Perakis
2015-01-01
The abundance of temporally and spatially consistent Forest Inventory and Analysis data facilitates hierarchical/multilevel analysis to investigate factors affecting tree growth, scaling from plot-level to continental scales. Herein we use FIA tree and soil inventories in conjunction with various spatial climate and soils data to estimate species-specific responses of...
[Regression analysis of an instrumental conditioned tentacular reflex in the edible snail].
Stepanov, I I; Kuntsevich, S V; Lokhov, M I
1989-01-01
Regression analysis revealed the opportunity of approximation with exponential mathematical model of the learning curves of conditioned tentacle reflex. Retention of the reflex persisted for more than three weeks. There were some quantitative differences between conditioning of the right and the left tentacle. There was formation of the reflex in every session during spring period, but there was no retention between sessions. The conditioned tentacle reflex may be employed in neuropharmacological studies.
Nop Sopipan
2013-01-01
Full Text Available The aim of this study was to forecast the returns for the Stock Exchange of Thailand (SET Index by adding some explanatory variables and stationary Autoregressive order p (AR (p in the mean equation of returns. In addition, we used Principal Component Analysis (PCA to remove possible complications caused by multicollinearity. Results showed that the multiple regressions based on PCA, has the best performance.
Quantitative Analysis of Tree Species in Mixed Forests of Mandal Catchments, Garhwal Himalaya
Balwant KUMAR
2012-06-01
Full Text Available A total of 14 tree species were identified in the study sites, among which Quercus leucotrichophora Hook. F. (Banj oak, Rhododendron arboreum Smith (Burans, Lyonia ovalifolia Drude (Ayar and Pyrus pashia Buch-Hemp (Mehal are the predominant tree species. A quantitative analysis of tree species indicates that on the basis of their canopy cover, tree density and total base area, these study sites fall within the category of disturbed forest. The uncontrolled lopping for timber, firewood and leaf fodder and the absence of saplings and seedlings are some of the major factors responsible for the declining of forests in the Himalayan region.
Monitoring heavy metal Cr in soil based on hyperspectral data using regression analysis
Zhang, Ningyu; Xu, Fuyun; Zhuang, Shidong; He, Changwei
2016-10-01
Heavy metal pollution in soils is one of the most critical problems in the global ecology and environment safety nowadays. Hyperspectral remote sensing and its application is capable of high speed, low cost, less risk and less damage, and provides a good method for detecting heavy metals in soil. This paper proposed a new idea of applying regression analysis of stepwise multiple regression between the spectral data and monitoring the amount of heavy metal Cr by sample points in soil for environmental protection. In the measurement, a FieldSpec HandHeld spectroradiometer is used to collect reflectance spectra of sample points over the wavelength range of 325-1075 nm. Then the spectral data measured by the spectroradiometer is preprocessed to reduced the influence of the external factors, and the preprocessed methods include first-order differential equation, second-order differential equation and continuum removal method. The algorithms of stepwise multiple regression are established accordingly, and the accuracy of each equation is tested. The results showed that the accuracy of first-order differential equation works best, which makes it feasible to predict the content of heavy metal Cr by using stepwise multiple regression.
Robust estimation for homoscedastic regression in the secondary analysis of case-control data
Wei, Jiawei
2012-12-04
Primary analysis of case-control studies focuses on the relationship between disease D and a set of covariates of interest (Y, X). A secondary application of the case-control study, which is often invoked in modern genetic epidemiologic association studies, is to investigate the interrelationship between the covariates themselves. The task is complicated owing to the case-control sampling, where the regression of Y on X is different from what it is in the population. Previous work has assumed a parametric distribution for Y given X and derived semiparametric efficient estimation and inference without any distributional assumptions about X. We take up the issue of estimation of a regression function when Y given X follows a homoscedastic regression model, but otherwise the distribution of Y is unspecified. The semiparametric efficient approaches can be used to construct semiparametric efficient estimates, but they suffer from a lack of robustness to the assumed model for Y given X. We take an entirely different approach. We show how to estimate the regression parameters consistently even if the assumed model for Y given X is incorrect, and thus the estimates are model robust. For this we make the assumption that the disease rate is known or well estimated. The assumption can be dropped when the disease is rare, which is typically so for most case-control studies, and the estimation algorithm simplifies. Simulations and empirical examples are used to illustrate the approach.
Hiroyuki Nakamoto
2014-01-01
Full Text Available The human is covered with soft skin and has tactile receptors inside. The skin deforms along a contact surface. The tactile receptors detect the mechanical deformation. The detection of the mechanical deformation is essential for the tactile sensation. We propose a magnetic type tactile sensor which has a soft surface and eight magnetoresistive elements. The soft surface has a permanent magnet inside and the magnetoresistive elements under the soft surface measure the magnetic flux density of the magnet. The tactile sensor estimates the displacement and the rotation on the surface based on the change of the magnetic flux density. Determination of an estimate equation is difficult because the displacement and the rotation are not geometrically decided based on the magnetic flux density. In this paper, a stepwise regression analysis determines the estimate equation. The outputs of the magnetoresistive elements are used as explanatory variables, and the three-axis displacement and the two-axis rotation are response variables in the regression analysis. We confirm the regression analysis is effective for determining the estimate equations through simulation and experiment. The results show the tactile sensor measures both the displacement and the rotation generated on the surface by using the determined equation.
Non-Stationary Hydrologic Frequency Analysis using B-Splines Quantile Regression
Nasri, B.; St-Hilaire, A.; Bouezmarni, T.; Ouarda, T.
2015-12-01
Hydrologic frequency analysis is commonly used by engineers and hydrologists to provide the basic information on planning, design and management of hydraulic structures and water resources system under the assumption of stationarity. However, with increasing evidence of changing climate, it is possible that the assumption of stationarity would no longer be valid and the results of conventional analysis would become questionable. In this study, we consider a framework for frequency analysis of extreme flows based on B-Splines quantile regression, which allows to model non-stationary data that have a dependence on covariates. Such covariates may have linear or nonlinear dependence. A Markov Chain Monte Carlo (MCMC) algorithm is used to estimate quantiles and their posterior distributions. A coefficient of determination for quantiles regression is proposed to evaluate the estimation of the proposed model for each quantile level. The method is applied on annual maximum and minimum streamflow records in Ontario, Canada. Climate indices are considered to describe the non-stationarity in these variables and to estimate the quantiles in this case. The results show large differences between the non-stationary quantiles and their stationary equivalents for annual maximum and minimum discharge with high annual non-exceedance probabilities. Keywords: Quantile regression, B-Splines functions, MCMC, Streamflow, Climate indices, non-stationarity.
Object-based Analysis for Extraction of Dominant Tree Species
Meiyun; SHAO; Xia; JING; Lu; WANG
2015-01-01
As forest is of great significance for our whole development and the sustainable plan is so focus on it. It is very urgent for us to have the whole distribution,stock volume and other related information about that. So the forest inventory program is on our schedule. Aiming at dealing with the problem in extraction of dominant tree species,we tested the highly hot method-object-based analysis. Based on the ALOS image data,we combined multi-resolution in e Cognition software and fuzzy classification algorithm. Through analyzing the segmentation results,we basically extract the spruce,the pine,the birch and the oak of the study area. Both the spectral and spatial characteristics were derived from those objects,and with the help of GLCM,we got the differences of each species. We use confusion matrix to do the Classification accuracy assessment compared with the actual ground data and this method showed a comparatively good precision as 87% with the kappa coefficient 0. 837.
SSR markers for Quercus suber tree identification and embryo analysis.
Gómez, A; Pintos, B; Aguiriano, E; Manzanera, J A; Bueno, M A
2001-01-01
Three Quercus simple sequence repeat (SSR) markers were amplified by polymerase chain reaction (PCR) from nuclear DNA extracts of trees and in vitro-induced haploid embryos from anther cultures of Quercus suber L. These markers were sufficiently polymorphic to identify 10 of 12 trees located in two Spanish natural areas. The same loci have been analyzed in anther-derived haploid embryos showing the parental tree allele segregation. All the alleles were present in the haploid progeny. The presence of diverse alleles in embryos derived from the same anther demonstrated that they were induced on multiple microspores or pollen grains and they were not clonally propagated. Also, diploid cultures and mixtures of haploid-diploid tissues were obtained. The origin of such cultures, either somatic or gametic, was elucidated by SSR markers. All the embryos showed only one allele, corroborating a haploid origin. Allelic composition of the haploid progeny permitted parental identification among all analyzed trees.
Building the Forest Inventory and Analysis Tree-Ring Data set
Robert J. DeRose; John D. Shaw; James N. Long
2017-01-01
The Interior West Forest Inventory and Analysis (IW-FIA) program measures forestland conditions at great extent with relatively high spatial resolution, including the collection of tree-ring data. We describe the development of an unprecedented spatial tree-ring data set for the IW-FIA that enhances the baseline plot data by incorporating ring-width increment measured...
Tree or shrub: a functional branch analysis of Jatropha curcas L.
Tjeuw, J.; Mulia, R.; Slingerland, M.A.; Noordwijk, van M.
2015-01-01
Jatropha curcas is an oil-bearing semi-evergreen shrub or small tree with potential as a source of sustainable biofuel, yet information regarding vegetative and fruit biomass in relation to plant architecture is lacking. Research conducted in Indonesia used the tree based functional branch analysis
Traffic Accident Analysis Using Decision Trees and Neural Networks
Chong, Miao M.; Abraham, Ajith; Paprzycki, Marcin
2004-01-01
The costs of fatalities and injuries due to traffic accident have a great impact on society. This paper presents our research to model the severity of injury resulting from traffic accidents using artificial neural networks and decision trees. We have applied them to an actual data set obtained from the National Automotive Sampling System (NASS) General Estimates System (GES). Experiment results reveal that in all the cases the decision tree outperforms the neural network. Our research analys...
Production and analysis of recombinant tree nut allergens.
Willison, Leanna N; Sathe, Shridhar K; Roux, Kenneth H
2014-03-01
Allergic reactions to tree nuts are a growing global concern as the number of affected individuals continues to rise. Unlike some food allergies, tree nuts can cause severe reactions that persist throughout life. The tree nuts discussed in this review include those most commonly responsible for allergic reactions: cashew, almond, hazelnut, walnut, pecan, Brazil nut, pistachio, and chestnut. The native allergenic proteins derived from tree nuts are frequently difficult to isolate and purify and may not be adequately represented in aqueous nut protein extracts. Consequently, defined recombinant allergens have become useful reagents in a variety of immunoassays aimed at the diagnosis of tree nut allergy, assessing cross-reactivity between various nuts and other seeds, mapping of IgE binding epitopes, and analyzing the effects of the food matrix, food processing, and gastric digestion on allergenicity. This review describes the approaches that can be used for the production of recombinant tree nut allergens and addresses key issues associated with their production and downstream applications.
Deni Memić
2015-01-01
Full Text Available This article has an aim to assess credit default prediction on the banking market in Bosnia and Herzegovina nationwide as well as on its constitutional entities (Federation of Bosnia and Herzegovina and Republika Srpska. Ability to classify companies info different predefined groups or finding an appropriate tool which would replace human assessment in classifying companies into good and bad buckets has been one of the main interests on risk management researchers for a long time. We investigated the possibility and accuracy of default prediction using traditional statistical methods logistic regression (logit and multiple discriminant analysis (MDA and compared their predictive abilities. The results show that the created models have high predictive ability. For logit models, some variables are more influential on the default prediction than the others. Return on assets (ROA is statistically significant in all four periods prior to default, having very high regression coefficients, or high impact on the model's ability to predict default. Similar results are obtained for MDA models. It is also found that predictive ability differs between logistic regression and multiple discriminant analysis.
Correlation Study and Regression Analysis of Drinking Water Quality in Kashan City, Iran
Mohammad Mehdi HEYDARI
2013-06-01
Full Text Available Chemical and statistical regression analysis on drinking water samples at five fields (21 sampling wells with hot and dry climate in Kashan city, central Iran was carried out. Samples were collected during October 2006 to May 2007 (25 - 30 °C. Comparing the results with drinking water quality standards issued by World Health Organization (WHO, it is found that some of the water samples are not potable. Hydrochemical facies using a Piper diagram indicate that in most parts of the city, the chemical character of water is dominated by NaCl. All samples showed sulfate and sodium ion higher and K+ and F- content lower than the permissible limit. A strongly positive correlation is observed between TDS and EC (R = 0.995 and Ca2+ and TH (R = 0.948. The results showed that regression relations have the same correlation coefficients: (I pH -TH, EC -TH (R = 0.520, (II NO3- -pH, TH-pH (R = 0.520, (III Ca2+-SO42-, TH-SO42-, Cl- -SO42- (R = 0.630. The results revealed that systematic calculations of correlation coefficients between water parameters and regression analysis provide a useful means for rapid monitoring of water quality.
Multigenic phylogeny and analysis of tree incongruences in Triticeae (Poaceae
Guilhaumon Claire
2011-06-01
Full Text Available Abstract Background Introgressive events (e.g., hybridization, gene flow, horizontal gene transfer and incomplete lineage sorting of ancestral polymorphisms are a challenge for phylogenetic analyses since different genes may exhibit conflicting genealogical histories. Grasses of the Triticeae tribe provide a particularly striking example of incongruence among gene trees. Previous phylogenies, mostly inferred with one gene, are in conflict for several taxon positions. Therefore, obtaining a resolved picture of relationships among genera and species of this tribe has been a challenging task. Here, we obtain the most comprehensive molecular dataset to date in Triticeae, including one chloroplastic and 26 nuclear genes. We aim to test whether it is possible to infer phylogenetic relationships in the face of (potentially large-scale introgressive events and/or incomplete lineage sorting; to identify parts of the evolutionary history that have not evolved in a tree-like manner; and to decipher the biological causes of gene-tree conflicts in this tribe. Results We obtain resolved phylogenetic hypotheses using the supermatrix and Bayesian Concordance Factors (BCF approaches despite numerous incongruences among gene trees. These phylogenies suggest the existence of 4-5 major clades within Triticeae, with Psathyrostachys and Hordeum being the deepest genera. In addition, we construct a multigenic network that highlights parts of the Triticeae history that have not evolved in a tree-like manner. Dasypyrum, Heteranthelium and genera of clade V, grouping Secale, Taeniatherum, Triticum and Aegilops, have evolved in a reticulated manner. Their relationships are thus better represented by the multigenic network than by the supermatrix or BCF trees. Noteworthy, we demonstrate that gene-tree incongruences increase with genetic distance and are greater in telomeric than centromeric genes. Together, our results suggest that recombination is the main factor
Classification tree analysis for the discrimination of pleural exudates and transudates.
Esquerda, Aureli; Trujillano, Javier; López de Ullibarri, Ignacio; Bielsa, Silvia; Madroñero, Ana B; Porcel, José M
2007-01-01
Classification and regression tree (CART) analysis is a non-parametric technique suitable for the generation of clinical decision rules. We have studied the performance of CART analysis in the separation of pleural exudates and transudates. Basic demographic, radiologic and laboratory data were retrospectively evaluated in 1257 pleural effusions (204 transudates and 1053 exudates, according to standard clinical criteria) and submitted for CART analysis. The model's discriminative ability was compared with that of Light's criteria, in both the original formulation and an abbreviated version, i.e., deleting the pleural fluid (PF)/serum lactate dehydrogenase (LDH) ratio from the triad. A first CART model built starting from all available data identified PF/serum protein ratio and PF LDH ratios as the two best discriminatory parameters. This algorithm achieved a sensitivity of 96.8%, slightly lower than that of classical Light's criteria (98.5%) and comparable to that of the abbreviated Light's criteria (97.0%), and significantly better specificity (85.3%) compared to both classical (74.0%) and abbreviated (79.4%) Light's criteria. A second CART model developed after excluding serum measurements selected PF protein and PF LDH as the most discriminatory variables, and correctly classified 97.2% of exudates and 77.0% of transudates. CART-based algorithms can efficiently discriminate between pleural exudates and transudates.
Mora, R.; Barahona, A.; Aguilar, H.
2015-04-01
This paper presents a method for using high detail volumetric information, captured with a land based photogrammetric survey, to obtain information from individual trees. Applying LIDAR analysis techniques it is possible to measure diameter at breast height, height at first branch (commercial height), basal area and volume of an individual tree. Given this information it is possible to calculate how much of that tree can be exploited as wood. The main objective is to develop a methodology for successfully surveying one individual tree, capturing every side of the stem a using high resolution digital camera and reference marks with GPS coordinates. The process is executed for several individuals of two species present in the metropolitan area in San Jose, Costa Rica, Delonix regia (Bojer) Raf. and Tabebuia rosea (Bertol.) DC., each one with different height, stem shape and crown area. Using a photogrammetry suite all the pictures are aligned, geo-referenced and a dense point cloud is generated with enough detail to perform the required measurements, as well as a solid tridimensional model for volume measurement. This research will open the way to develop a capture methodology with an airborne camera using close range UAVs. An airborne platform will make possible to capture every individual in a forest plantation, furthermore if the analysis techniques applied in this research are automated it will be possible to calculate with high precision the exploit potential of a forest plantation and improve its management.
Analysis of ontogenetic spectra of populations of plants and lichens via ordinal regression
Sofronov, G. Yu.; Glotov, N. V.; Ivanov, S. M.
2015-03-01
Ontogenetic spectra of plants and lichens tend to vary across the populations. This means that if several subsamples within a sample (or a population) were collected, then the subsamples would not be homogeneous. Consequently, the statistical analysis of the aggregated data would not be correct, which could potentially lead to false biological conclusions. In order to take into account the heterogeneity of the subsamples, we propose to use ordinal regression, which is a type of generalized linear regression. In this paper, we study the populations of cowberry Vaccinium vitis-idaea L. and epiphytic lichens Hypogymnia physodes (L.) Nyl. and Pseudevernia furfuracea (L.) Zopf. We obtain estimates for the proportions of between-sample variability in the total variability of the ontogenetic spectra of the populations.
Lingling; TAN
2013-01-01
This article selects some major factors influencing the agricultural economic growth are selected,such as labor,capital input,farmland area,fertilizer input and information input.And it selects some factors to explain information input,such as the number of website ownership,types of books,magazines and newspapers published,the number of telephone ownership per 100 households,the number of home computers ownership per 100 households,farmers’ spending on transportation and communication,culture,education,entertainment and services, and the total number of agricultural science and technology service personnel.Using regression model,this article conducts regression analysis of the cross-section data on 31 provinces,autonomous regions and municipalities in 2010.The results show that the building of information infrastructure,the use of means of information,the popularization and promotion of knowledge of agricultural science and technology,play an important role in promoting agricultural economic growth.
THE PROGNOSIS OF RUSSIAN DEFENSE INDUSTRY DEVELOPMENT IMPLEMENTED THROUGH REGRESSION ANALYSIS
L.M. Kapustina
2007-03-01
Full Text Available The article illustrates the results of investigation the major internal and external factors which influence the development of the defense industry, as well as the results of regression analysis which quantitatively displays the factorial contribution in the growth rate of Russian defense industry. On the basis of calculated regression dependences the authors fulfilled the medium-term prognosis of defense industry. Optimistic and inertial versions of defense product growth rate for the period up to 2009 are based on scenario conditions in Russian economy worked out by the Ministry of economy and development. In conclusion authors point out which factors and conditions have the largest impact on successful and stable operation of Russian defense industry.
COLOR IMAGE RETRIEVAL BASED ON FEATURE FUSION THROUGH MULTIPLE LINEAR REGRESSION ANALYSIS
K. Seetharaman
2015-08-01
Full Text Available This paper proposes a novel technique based on feature fusion using multiple linear regression analysis, and the least-square estimation method is employed to estimate the parameters. The given input query image is segmented into various regions according to the structure of the image. The color and texture features are extracted on each region of the query image, and the features are fused together using the multiple linear regression model. The estimated parameters of the model, which is modeled based on the features, are formed as a vector called a feature vector. The Canberra distance measure is adopted to compare the feature vectors of the query and target images. The F-measure is applied to evaluate the performance of the proposed technique. The obtained results expose that the proposed technique is comparable to the other existing techniques.
The Fault tree analysis of the lead acid battery’s degradation
K. BRIK
2008-06-01
Full Text Available In this paper the authors present an approach of reliability to analyze lead-acid battery’s degradation. The construction of causal tree analysis offers a framework privileged to the deductive analysis which consists in seeking the various possible combinations of events leading to the loss of batteries capacity. The description of the causality chain is completed by a fault tree analysis (FTA established from the equivalent electric circuit of battery.
The Fault tree analysis of the lead acid battery’s degradation
K. BRIK; F. BEN AMMAR
2008-01-01
In this paper the authors present an approach of reliability to analyze lead-acid battery’s degradation. The construction of causal tree analysis offers a framework privileged to the deductive analysis which consists in seeking the various possible combinations of events leading to the loss of batteries capacity. The description of the causality chain is completed by a fault tree analysis (FTA) established from the equivalent electric circuit of battery.
Orthology prediction at scalable resolution by phylogenetic tree analysis
Huynen Martijn A
2007-03-01
Full Text Available Abstract Background Orthology is one of the cornerstones of gene function prediction. Dividing the phylogenetic relations between genes into either orthologs or paralogs is however an oversimplification. Already in two-species gene-phylogenies, the complicated, non-transitive nature of phylogenetic relations results in inparalogs and outparalogs. For situations with more than two species we lack semantics to specifically describe the phylogenetic relations, let alone to exploit them. Published procedures to extract orthologous groups from phylogenetic trees do not allow identification of orthology at various levels of resolution, nor do they document the relations between the orthologous groups. Results We introduce "levels of orthology" to describe the multi-level nature of gene relations. This is implemented in a program LOFT (Levels of Orthology From Trees that assigns hierarchical orthology numbers to genes based on a phylogenetic tree. To decide upon speciation and gene duplication events in a tree LOFT can be instructed either to perform classical species-tree reconciliation or to use the species overlap between partitions in the tree. The hierarchical orthology numbers assigned by LOFT effectively summarize the phylogenetic relations between genes. The resulting high-resolution orthologous groups are depicted in colour, facilitating visual inspection of (large trees. A benchmark for orthology prediction, that takes into account the varying levels of orthology between genes, shows that the phylogeny-based high-resolution orthology assignments made by LOFT are reliable. Conclusion The "levels of orthology" concept offers high resolution, reliable orthology, while preserving the relations between orthologous groups. A Windows as well as a preliminary Java version of LOFT is available from the LOFT website http://www.cmbi.ru.nl/LOFT.
Economic Cost-Analysis of the Impact of Container Size on Transplanted Tree Value
Lauren M. Garcia Chance
2017-04-01
Full Text Available The benefits and costs of varying container sizes have yet to be fully evaluated to determine which container size affords the most advantageous opportunity for consumers. To determine value of the tree following transplant, clonal replicates of Vitex agnus-castus L. [Chaste Tree], Acer rubrum L. var. drummondii (Hook. & Arn. ex Nutt. Sarg. [Drummond Red Maple], and Taxodium distichum (L. Rich. [Baldcypress] were grown under common conditions in each of five container sizes 3.5, 11.7, 23.3, 97.8 or 175.0 L, respectively (#1, 3, 7, 25 or 45. In June 2013, six trees of each container size and species were transplanted to a sandy clay loam field in College Station, Texas. To determine the increase in value over a two-year post-transplant period, height and caliper measurements were taken at the end of nursery production and again at the end of the second growing season in the field, October 2014. Utilizing industry standards, initial costs of materials and labor were then compared with the size of trees after two years. Replacement cost analysis after two growing seasons indicated a greater increase in value for 11.7 and 23.3 L trees compared to losses in value for some 175.0 L trees. In comparison with trees from larger containers, trees from smaller size containers experienced shorter establishment times and increased growth rates, thus creating a quicker return on investment for trees transplanted from the smaller container sizes.
A refined method for multivariate meta-analysis and meta-regression.
Jackson, Daniel; Riley, Richard D
2014-02-20
Making inferences about the average treatment effect using the random effects model for meta-analysis is problematic in the common situation where there is a small number of studies. This is because estimates of the between-study variance are not precise enough to accurately apply the conventional methods for testing and deriving a confidence interval for the average effect. We have found that a refined method for univariate meta-analysis, which applies a scaling factor to the estimated effects' standard error, provides more accurate inference. We explain how to extend this method to the multivariate scenario and show that our proposal for refined multivariate meta-analysis and meta-regression can provide more accurate inferences than the more conventional approach. We explain how our proposed approach can be implemented using standard output from multivariate meta-analysis software packages and apply our methodology to two real examples. Copyright © 2013 John Wiley & Sons, Ltd.
Study of Hip Fracture Risk using Tree Structured Survival Analysis
Lu Y
2003-01-01
Full Text Available In dieser Studie wird das Hüftfraktur-Risiko bei postmenopausalen Frauen untersucht, indem die Frauen in verschiedene Subgruppen hinsichtlich dieses Risikos klassifiziert werden. Frauen in einer gemeinsamen Subgruppe haben ein ähnliches Risiko, hingegen in verschiedenen Subgruppen ein unterschiedliches Hüftfraktur-Risiko. Die Subgruppen wurden mittels der Tree Structured Survival Analysis (TSSA aus den Daten von 7.665 Frauen der SOF (Study of Osteoporosis Fracture ermittelt. Bei allen Studienteilnehmerinnen wurde die Knochenmineraldichte (BMD von Unterarm, Oberschenkelhals, Hüfte und Wirbelsäule gemessen. Die Zeit von der BMD-Messung bis zur Hüftfraktur wurde als Endpunkt notiert. Eine Stichprobe von 75% der Teilnehmerinnen wurde verwendet, um die prognostischen Subgruppen zu bilden (Trainings-Datensatz, während die anderen 25% als Bestätigung der Ergebnisse diente (Validierungs-Datensatz. Aufgrund des Trainings-Datensatzes konnten mittels TSSA 4 Subgruppen identifiziert werden, deren Hüftfraktur-Risiko bei einem Follow-up von im Mittel 6,5 Jahren bei 19%, 9%, 4% und 1% lag. Die Einteilung in die Subgruppen erfolgte aufgrund der Bewertung der BMD des Ward'schen Dreiecks sowie des Oberschenkelhalses und nach dem Alter. Diese Ergebnisse konnten mittels des Validierungs-Datensatzes reproduziert werden, was die Sinnhaftigkeit der Klassifizierungregeln in einem klinischen Setting bestätigte. Mittels TSSA war eine sinnvolle, aussagekräftige und reproduzierbare Identifikation von prognostischen Subgruppen, die auf dem Alter und den BMD-Werten beruhen, möglich. In this paper we studied the risk of hip fracture for post-menopausal women by classifying women into different subgroups based on their risk of hip fracture. The subgroups were generated such that all the women in a particular subgroup had relatively similar risk while women belonging to two different subgroups had rather different risks of hip fracture. We used the Tree Structured
Sensitivity Investigation of Fault Tree Analysis with Matrix-Algebraic Method
Pokorádi László
2011-04-01
Full Text Available The Fault Tree Analysis (FTA is a systematic, deductive (top-down type and probabilistic risk assessment tool which shows the causal relations leading to a given undesired event, referred to as the “Top Event” (TE. The events, which cannot be subdivided, are called the Basic Events. Fault Tree diagram displays the undesired state of the investigated system (top event in terms of the states of its components (basic events. The Fault Tree Analysis is a graphical design technique main result of which is a tree, a dendritic graph. Probabilistic Fault Tree Analysis (PFTA is a quantitative analysis method used to calculate the probability of Top Event from given failure probabilities of system components. The objective of the sensitivity analysis is to show how the change in any system parameter influences the resultant reliability value of the whole system. The main aim of this study is to elaborate an easy-used algorithm for setting-up of Linear Fault Tree Sensitivity Model (LFTSM. This modular approach process uses matrix-algebraic method based upon the mathematical diagnostic modeling of aircraft systems and gas turbine engines. The paper shows the adaptation of linear mathematical diagnostic modeling methodology for setting-up of LFTSM and its possibility of use to investigate Fault Tree sensitivity by a demonstrative example.
Analysis of Nonlinear Structural Dynamics and Resonance in Trees
H. Doumiri Ganji
2012-01-01
Full Text Available Wind and gravity both impact trees in storms, but wind loads greatly exceed gravity loads in most situations. Complex behavior of trees in windstorms is gradually turning into a controversial concern among ecological engineers. To better understand the effects of nonlinear behavior of trees, the dynamic forces on tree structures during periods of high winds have been examined as a mass-spring system. In fact, the simulated dynamic forces created by strong winds are studied in order to determine the responses of the trees to such dynamic loads. Many of such nonlinear differential equations are complicated to solve. Therefore, this paper focuses on an accurate and simple solution, Differential Transformation Method (DTM, to solve the derived equation. In this regard, the concept of differential transformation is briefly introduced. The approximate solution to this equation is calculated in the form of a series with easily computable terms. Then, the method has been employed to achieve an acceptable solution to the presented nonlinear differential equation. To verify the accuracy of the proposed method, the obtained results from DTM are compared with those from the numerical solution. The results reveal that this method gives successive approximations of high accuracy solution.
Rhodes, Kirsty M; Turner, Rebecca M; White, Ian R; Jackson, Dan; Spiegelhalter, David J; Higgins, Julian P T
2016-12-20
Many meta-analyses combine results from only a small number of studies, a situation in which the between-study variance is imprecisely estimated when standard methods are applied. Bayesian meta-analysis allows incorporation of external evidence on heterogeneity, providing the potential for more robust inference on the effect size of interest. We present a method for performing Bayesian meta-analysis using data augmentation, in which we represent an informative conjugate prior for between-study variance by pseudo data and use meta-regression for estimation. To assist in this, we derive predictive inverse-gamma distributions for the between-study variance expected in future meta-analyses. These may serve as priors for heterogeneity in new meta-analyses. In a simulation study, we compare approximate Bayesian methods using meta-regression and pseudo data against fully Bayesian approaches based on importance sampling techniques and Markov chain Monte Carlo (MCMC). We compare the frequentist properties of these Bayesian methods with those of the commonly used frequentist DerSimonian and Laird procedure. The method is implemented in standard statistical software and provides a less complex alternative to standard MCMC approaches. An importance sampling approach produces almost identical results to standard MCMC approaches, and results obtained through meta-regression and pseudo data are very similar. On average, data augmentation provides closer results to MCMC, if implemented using restricted maximum likelihood estimation rather than DerSimonian and Laird or maximum likelihood estimation. The methods are applied to real datasets, and an extension to network meta-analysis is described. The proposed method facilitates Bayesian meta-analysis in a way that is accessible to applied researchers. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
Parallel TREE code for two-component ultracold plasma analysis
Jeon, Byoungseon; Kress, Joel D.; Collins, Lee A.; Grønbech-Jensen, Niels
2008-02-01
The TREE method has been widely used for long-range interaction N-body problems. We have developed a parallel TREE code for two-component classical plasmas with open boundary conditions and highly non-uniform charge distributions. The program efficiently handles millions of particles evolved over long relaxation times requiring millions of time steps. Appropriate domain decomposition and dynamic data management were employed, and large-scale parallel processing was achieved using an intermediate level of granularity of domain decomposition and ghost TREE communication. Even though the computational load is not fully distributed in fine grains, high parallel efficiency was achieved for ultracold plasma systems of charged particles. As an application, we performed simulations of an ultracold neutral plasma with a half million particles and a half million time steps. For the long temporal trajectories of relaxation between heavy ions and light electrons, large configurations of ultracold plasmas can now be investigated, which was not possible in past studies.
Ryu, Duchwan
2010-09-28
We consider nonparametric regression analysis in a generalized linear model (GLM) framework for data with covariates that are the subject-specific random effects of longitudinal measurements. The usual assumption that the effects of the longitudinal covariate processes are linear in the GLM may be unrealistic and if this happens it can cast doubt on the inference of observed covariate effects. Allowing the regression functions to be unknown, we propose to apply Bayesian nonparametric methods including cubic smoothing splines or P-splines for the possible nonlinearity and use an additive model in this complex setting. To improve computational efficiency, we propose the use of data-augmentation schemes. The approach allows flexible covariance structures for the random effects and within-subject measurement errors of the longitudinal processes. The posterior model space is explored through a Markov chain Monte Carlo (MCMC) sampler. The proposed methods are illustrated and compared to other approaches, the "naive" approach and the regression calibration, via simulations and by an application that investigates the relationship between obesity in adulthood and childhood growth curves. © 2010, The International Biometric Society.
Ryu, Duchwan; Li, Erning; Mallick, Bani K
2011-06-01
We consider nonparametric regression analysis in a generalized linear model (GLM) framework for data with covariates that are the subject-specific random effects of longitudinal measurements. The usual assumption that the effects of the longitudinal covariate processes are linear in the GLM may be unrealistic and if this happens it can cast doubt on the inference of observed covariate effects. Allowing the regression functions to be unknown, we propose to apply Bayesian nonparametric methods including cubic smoothing splines or P-splines for the possible nonlinearity and use an additive model in this complex setting. To improve computational efficiency, we propose the use of data-augmentation schemes. The approach allows flexible covariance structures for the random effects and within-subject measurement errors of the longitudinal processes. The posterior model space is explored through a Markov chain Monte Carlo (MCMC) sampler. The proposed methods are illustrated and compared to other approaches, the "naive" approach and the regression calibration, via simulations and by an application that investigates the relationship between obesity in adulthood and childhood growth curves.
Evaluation of a LASSO regression approach on the unrelated samples of Genetic Analysis Workshop 17.
Guo, Wei; Elston, Robert C; Zhu, Xiaofeng
2011-11-29
The Genetic Analysis Workshop 17 data we used comprise 697 unrelated individuals genotyped at 24,487 single-nucleotide polymorphisms (SNPs) from a mini-exome scan, using real sequence data for 3,205 genes annotated by the 1000 Genomes Project and simulated phenotypes. We studied 200 sets of simulated phenotypes of trait Q2. An important feature of this data set is that most SNPs are rare, with 87% of the SNPs having a minor allele frequency less than 0.05. For rare SNP detection, in this study we performed a least absolute shrinkage and selection operator (LASSO) regression and F tests at the gene level and calculated the generalized degrees of freedom to avoid any selection bias. For comparison, we also carried out linear regression and the collapsing method, which sums the rare SNPs, modified for a quantitative trait and with two different allele frequency thresholds. The aim of this paper is to evaluate these four approaches in this mini-exome data and compare their performance in terms of power and false positive rates. In most situations the LASSO approach is more powerful than linear regression and collapsing methods. We also note the difficulty in determining the optimal threshold for the collapsing method and the significant role that linkage disequilibrium plays in detecting rare causal SNPs. If a rare causal SNP is in strong linkage disequilibrium with a common marker in the same gene, power will be much improved.
Use of Fault Tree Analysis for Automotive Reliability and Safety Analysis
Lambert, H
2003-09-24
Fault tree analysis (FTA) evolved from the aerospace industry in the 1960's. A fault tree is deductive logic model that is generated with a top undesired event in mind. FTA answers the question, ''how can something occur?'' as opposed to failure modes and effects analysis (FMEA) that is inductive and answers the question, ''what if?'' FTA is used in risk, reliability and safety assessments. FTA is currently being used by several industries such as nuclear power and chemical processing. Typically the automotive industries uses failure modes and effects analysis (FMEA) such as design FMEAs and process FMEAs. The use of FTA has spread to the automotive industry. This paper discusses the use of FTA for automotive applications. With the addition automotive electronics for various applications in systems such as engine/power control, cruise control and braking/traction, FTA is well suited to address failure modes within these systems. FTA can determine the importance of these failure modes from various perspectives such as cost, reliability and safety. A fault tree analysis of a car starting system is presented as an example.
Javali Shivalingappa
2010-01-01
Full Text Available Aim: The study aimed to determine the factors associated with periodontal disease (different levels of severity by using different regression models for ordinal data. Design: A cross-sectional design was employed using clinical examination and ′questionnaire with interview′ method. Materials and Methods: The study was conducted during June 2008 to October 2008 in Dharwad, Karnataka, India. It involved a systematic random sample of 1760 individuals aged 18-40 years. The periodontal disease examination was conducted by using Community Periodontal Index for Treatment Needs (CPITN. Statistical Analysis Used: Regression models for ordinal data with different built-in link functions were used in determination of factors associated with periodontal disease. Results: The study findings indicated that, the ordinal regression models with four built-in link functions (logit, probit, Clog-log and nlog-log displayed similar results with negligible differences in significant factors associated with periodontal disease. The factors such as religion, caste, sources of drinking water, Timings for sweet consumption, Timings for cleaning or brushing the teeth and materials used for brushing teeth were significantly associated with periodontal disease in all ordinal models. Conclusions: The ordinal regression model with Clog-log is a better fit in determination of significant factors associated with periodontal disease as compared to models with logit, probit and nlog-log built-in link functions. The factors such as caste and time for sweet consumption are negatively associated with periodontal disease. But religion, sources of drinking water, Timings for cleaning or brushing the teeth and materials used for brushing teeth are significantly and positively associated with periodontal disease.
Analysis of sparse data in logistic regression in medical research: A newer approach
S Devika
2016-01-01
Full Text Available Background and Objective: In the analysis of dichotomous type response variable, logistic regression is usually used. However, the performance of logistic regression in the presence of sparse data is questionable. In such a situation, a common problem is the presence of high odds ratios (ORs with very wide 95% confidence interval (CI (OR: >999.999, 95% CI: 999.999. In this paper, we addressed this issue by using penalized logistic regression (PLR method. Materials and Methods: Data from case-control study on hyponatremia and hiccups conducted in Christian Medical College, Vellore, Tamil Nadu, India was used. The outcome variable was the presence/absence of hiccups and the main exposure variable was the status of hyponatremia. Simulation dataset was created with different sample sizes and with a different number of covariates. Results: A total of 23 cases and 50 controls were used for the analysis of ordinary and PLR methods. The main exposure variable hyponatremia was present in nine (39.13% of the cases and in four (8.0% of the controls. Of the 23 hiccup cases, all were males and among the controls, 46 (92.0% were males. Thus, the complete separation between gender and the disease group led into an infinite OR with 95% CI (OR: >999.999, 95% CI: 999.999 whereas there was a finite and consistent regression coefficient for gender (OR: 5.35; 95% CI: 0.42, 816.48 using PLR. After adjusting for all the confounding variables, hyponatremia entailed 7.9 (95% CI: 2.06, 38.86 times higher risk for the development of hiccups as was found using PLR whereas there was an overestimation of risk OR: 10.76 (95% CI: 2.17, 53.41 using the conventional method. Simulation experiment shows that the estimated coverage probability of this method is near the nominal level of 95% even for small sample sizes and for a large number of covariates. Conclusions: PLR is almost equal to the ordinary logistic regression when the sample size is large and is superior in small cell
Dissecting the space-time structure of tree-ring datasets using the partial triadic analysis.
Rossi, Jean-Pierre; Nardin, Maxime; Godefroid, Martin; Ruiz-Diaz, Manuela; Sergent, Anne-Sophie; Martinez-Meier, Alejandro; Pâques, Luc; Rozenberg, Philippe
2014-01-01
Tree-ring datasets are used in a variety of circumstances, including archeology, climatology, forest ecology, and wood technology. These data are based on microdensity profiles and consist of a set of tree-ring descriptors, such as ring width or early/latewood density, measured for a set of individual trees. Because successive rings correspond to successive years, the resulting dataset is a ring variables × trees × time datacube. Multivariate statistical analyses, such as principal component analysis, have been widely used for extracting worthwhile information from ring datasets, but they typically address two-way matrices, such as ring variables × trees or ring variables × time. Here, we explore the potential of the partial triadic analysis (PTA), a multivariate method dedicated to the analysis of three-way datasets, to apprehend the space-time structure of tree-ring datasets. We analyzed a set of 11 tree-ring descriptors measured in 149 georeferenced individuals of European larch (Larix decidua Miller) during the period of 1967-2007. The processing of densitometry profiles led to a set of ring descriptors for each tree and for each year from 1967-2007. The resulting three-way data table was subjected to two distinct analyses in order to explore i) the temporal evolution of spatial structures and ii) the spatial structure of temporal dynamics. We report the presence of a spatial structure common to the different years, highlighting the inter-individual variability of the ring descriptors at the stand scale. We found a temporal trajectory common to the trees that could be separated into a high and low frequency signal, corresponding to inter-annual variations possibly related to defoliation events and a long-term trend possibly related to climate change. We conclude that PTA is a powerful tool to unravel and hierarchize the different sources of variation within tree-ring datasets.
Dissecting the space-time structure of tree-ring datasets using the partial triadic analysis.
Jean-Pierre Rossi
Full Text Available Tree-ring datasets are used in a variety of circumstances, including archeology, climatology, forest ecology, and wood technology. These data are based on microdensity profiles and consist of a set of tree-ring descriptors, such as ring width or early/latewood density, measured for a set of individual trees. Because successive rings correspond to successive years, the resulting dataset is a ring variables × trees × time datacube. Multivariate statistical analyses, such as principal component analysis, have been widely used for extracting worthwhile information from ring datasets, but they typically address two-way matrices, such as ring variables × trees or ring variables × time. Here, we explore the potential of the partial triadic analysis (PTA, a multivariate method dedicated to the analysis of three-way datasets, to apprehend the space-time structure of tree-ring datasets. We analyzed a set of 11 tree-ring descriptors measured in 149 georeferenced individuals of European larch (Larix decidua Miller during the period of 1967-2007. The processing of densitometry profiles led to a set of ring descriptors for each tree and for each year from 1967-2007. The resulting three-way data table was subjected to two distinct analyses in order to explore i the temporal evolution of spatial structures and ii the spatial structure of temporal dynamics. We report the presence of a spatial structure common to the different years, highlighting the inter-individual variability of the ring descriptors at the stand scale. We found a temporal trajectory common to the trees that could be separated into a high and low frequency signal, corresponding to inter-annual variations possibly related to defoliation events and a long-term trend possibly related to climate change. We conclude that PTA is a powerful tool to unravel and hierarchize the different sources of variation within tree-ring datasets.
Verbeek, A. A.
The analysis of extracts from tree leaf, bark and wood samples for Ca, Mg, K, Na, P, Mn, Fe, Al, B, Cu and Zn by inductively coupled argon plasma sequential emission spectrometry is described. Recovery percentages for simulated tree extracts and for spiked tree samples are presented together with typical analysis values for a leaf and a wood sample. The choice of analytical line for each element is discussed and spectral interferences, not listed in the ICP tables of Boumans, of Cu on the 214.9 nm line of P and of Fe on the 249.7 nm line of B are noted.
Seismic signal analysis based on the dual-tree complex wavelet packet transform
谢周敏; 王恩福; 张国宏; 赵国存; 陈旭庚
2004-01-01
We tried to apply the dual-tree complex wavelet packet transform in seismic signal analysis. The complex waveletpacket transform (CWPT) combine the merits of real wavelet packet transform with that of complex continuouswavelet transform (CCWT). It can not only pick up the phase information of signal, but also produce better "focalizing" function if it matches the phase spectrum of signals analyzed. We here described the dual-tree CWPT algorithm, and gave the examples of simulation and actual seismic signals analysis. As shown by our results, thedual-tree CWPT is a very efecfive method in analyzing seismic signals with non-linear phase.
Hofland, G.S.; Barton, C.C.
1990-10-01
The computer program FREQFIT is designed to perform regression and statistical chi-squared goodness of fit analysis on one-dimensional or two-dimensional data. The program features an interactive user dialogue, numerous help messages, an option for screen or line printer output, and the flexibility to use practically any commercially available graphics package to create plots of the program`s results. FREQFIT is written in Microsoft QuickBASIC, for IBM-PC compatible computers. A listing of the QuickBASIC source code for the FREQFIT program, a user manual, and sample input data, output, and plots are included. 6 refs., 1 fig.
Gunay, Ahmet [Deparment of Environmental Engineering, Faculty of Engineering and Architecture, Balikesir University (Turkey)], E-mail: ahmetgunay2@gmail.com
2007-09-30
The experimental data of ammonium exchange by natural Bigadic clinoptilolite was evaluated using nonlinear regression analysis. Three two-parameters isotherm models (Langmuir, Freundlich and Temkin) and three three-parameters isotherm models (Redlich-Peterson, Sips and Khan) were used to analyse the equilibrium data. Fitting of isotherm models was determined using values of standard normalization error procedure (SNE) and coefficient of determination (R{sup 2}). HYBRID error function provided lowest sum of normalized error and Khan model had better performance for modeling the equilibrium data. Thermodynamic investigation indicated that ammonium removal by clinoptilolite was favorable at lower temperatures and exothermic in nature.
Zhigao Zeng
2016-01-01
Full Text Available This paper proposes a novel algorithm to solve the challenging problem of classifying error-diffused halftone images. We firstly design the class feature matrices, after extracting the image patches according to their statistics characteristics, to classify the error-diffused halftone images. Then, the spectral regression kernel discriminant analysis is used for feature dimension reduction. The error-diffused halftone images are finally classified using an idea similar to the nearest centroids classifier. As demonstrated by the experimental results, our method is fast and can achieve a high classification accuracy rate with an added benefit of robustness in tackling noise.
A systematic review and meta-regression analysis of mivacurium for tracheal intubation
Vanlinthout, L.E.H.; Mesfin, S.H.; Hens, Niel; Vanacker, B. F.; Robertson, E. N.; Booij, L. H. D. J.
2014-01-01
We systematically reviewed factors associated with intubation conditions in randomised controlled trials of mivacurium, using random-effects meta-regression analysis. We included 29 studies of 1050 healthy participants. Four factors explained 72.9% of the variation in the probability of excellent intubation conditions: mivacurium dose, 24.4%; opioid use, 29.9%; time to intubation and age together, 18.6%. The odds ratio (95% CI) for excellent intubation was 3.14 (1.65–5.73) for doubling the mi...
Mehdi Najafi; Seyed Mohammad Esmaiel Jalali; Reza KhaloKakaie; Farrokh Forouhandeh
2015-01-01
During underground coal gasification (UCG), whereby coal is converted to syngas in situ, a cavity is formed in the coal seam. The cavity growth rate (CGR) or the moving rate of the gasification face is affected by controllable (operation pressure, gasification time, geometry of UCG panel) and uncontrollable (coal seam properties) factors. The CGR is usually predicted by mathematical models and laboratory experiments, which are time consuming, cumbersome and expensive. In this paper, a new simple model for CGR is developed using non-linear regression analysis, based on data from 11 UCG field trials. The empirical model compares satisfactorily with Perkins model and can reliably predict CGR.
Hofland, G.S.; Barton, C.C.
1990-10-01
The computer program FREQFIT is designed to perform regression and statistical chi-squared goodness of fit analysis on one-dimensional or two-dimensional data. The program features an interactive user dialogue, numerous help messages, an option for screen or line printer output, and the flexibility to use practically any commercially available graphics package to create plots of the program`s results. FREQFIT is written in Microsoft QuickBASIC, for IBM-PC compatible computers. A listing of the QuickBASIC source code for the FREQFIT program, a user manual, and sample input data, output, and plots are included. 6 refs., 1 fig.
Kinnebrock, Silja; Podolskij, Mark
This paper introduces a new estimator to measure the ex-post covariation between high-frequency financial time series under market microstructure noise. We provide an asymptotic limit theory (including feasible central limit theorems) for standard methods such as regression, correlation analysis...... and covariance, for which we obtain the optimal rate of convergence. We demonstrate some positive semidefinite estimators of the covariation and construct a positive semidefinite estimator of the conditional covariance matrix in the central limit theorem. Furthermore, we indicate how the assumptions on the noise...
Regression analysis as an objective tool of economic management of rolling mill
Š. Vilamová
2015-07-01
Full Text Available The ability to optimize costs plays a key role in maintaining competitiveness of the company, because without detailed knowledge of costs, companies are not able to make the right decisions that will ensure their long-term growth. The aim of this article is to outline the problematic areas related to company costs and to contribute to a debate on the method used to determine the amount of fixed and variable costs, their monitoring and follow-up control. This article presents a potential use of regression analysis as an objective tool of economic management in metallurgical companies, as these companies have several specific features
Kinnebrock, Silja; Podolskij, Mark
and covariance, for which we obtain the optimal rate of convergence. We demonstrate some positive semidefinite estimators of the covariation and construct a positive semidefinite estimator of the conditional covariance matrix in the central limit theorem. Furthermore, we indicate how the assumptions on the noise......This paper introduces a new estimator to measure the ex-post covariation between high-frequency financial time series under market microstructure noise. We provide an asymptotic limit theory (including feasible central limit theorems) for standard methods such as regression, correlation analysis...
Estimating the causes of traffic accidents using logistic regression and discriminant analysis.
Karacasu, Murat; Ergül, Barış; Altin Yavuz, Arzu
2014-01-01
Factors that affect traffic accidents have been analysed in various ways. In this study, we use the methods of logistic regression and discriminant analysis to determine the damages due to injury and non-injury accidents in the Eskisehir Province. Data were obtained from the accident reports of the General Directorate of Security in Eskisehir; 2552 traffic accidents between January and December 2009 were investigated regarding whether they resulted in injury. According to the results, the effects of traffic accidents were reflected in the variables. These results provide a wealth of information that may aid future measures toward the prevention of undesired results.
Neck-focused panic attacks among Cambodian refugees; a logistic and linear regression analysis.
Hinton, Devon E; Chhean, Dara; Pich, Vuth; Um, Khin; Fama, Jeanne M; Pollack, Mark H
2006-01-01
Consecutive Cambodian refugees attending a psychiatric clinic were assessed for the presence and severity of current--i.e., at least one episode in the last month--neck-focused panic. Among the whole sample (N=130), in a logistic regression analysis, the Anxiety Sensitivity Index (ASI; odds ratio=3.70) and the Clinician-Administered PTSD Scale (CAPS; odds ratio=2.61) significantly predicted the presence of current neck panic (NP). Among the neck panic patients (N=60), in the linear regression analysis, NP severity was significantly predicted by NP-associated flashbacks (beta=.42), NP-associated catastrophic cognitions (beta=.22), and CAPS score (beta=.28). Further analysis revealed the effect of the CAPS score to be significantly mediated (Sobel test [Baron, R. M., & Kenny, D. A. (1986). The moderator-mediator variable distinction in social psychological research: conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51, 1173-1182]) by both NP-associated flashbacks and catastrophic cognitions. In the care of traumatized Cambodian refugees, NP severity, as well as NP-associated flashbacks and catastrophic cognitions, should be specifically assessed and treated.
Mandal, Nilrudra; Doloi, Biswanath; Mondal, Biswanath
2016-01-01
In the present study, an attempt has been made to apply the Taguchi parameter design method and regression analysis for optimizing the cutting conditions on surface finish while machining AISI 4340 steel with the help of the newly developed yttria based Zirconia Toughened Alumina (ZTA) inserts. These inserts are prepared through wet chemical co-precipitation route followed by powder metallurgy process. Experiments have been carried out based on an orthogonal array L9 with three parameters (cutting speed, depth of cut and feed rate) at three levels (low, medium and high). Based on the mean response and signal to noise ratio (SNR), the best optimal cutting condition has been arrived at A3B1C1 i.e. cutting speed is 420 m/min, depth of cut is 0.5 mm and feed rate is 0.12 m/min considering the condition smaller is the better approach. Analysis of Variance (ANOVA) is applied to find out the significance and percentage contribution of each parameter. The mathematical model of surface roughness has been developed using regression analysis as a function of the above mentioned independent variables. The predicted values from the developed model and experimental values are found to be very close to each other justifying the significance of the model. A confirmation run has been carried out with 95 % confidence level to verify the optimized result and the values obtained are within the prescribed limit.
管军; 杨兴易; 赵良; 林兆奋; 郭昌星; 李文放
2003-01-01
Objective To investigate the incidence, crude mortality and independent risk factors of ventilator-associated pneumonia (VAP) in comprehensive ICU in China.Methods The clinical and microbiological data were retrospectively collected and analysed of all the 97 patients receiving mechanical ventilation (>48hr) in our comprehensive ICU during 1999. 1 - 2000. 12. Firstly several statistically significant risk factors were screened out with univariate analysis, then independent risk factors were determined with multivariate stepwise logistic regression analysis.Results The incidence of VAP was 54. 64% (15. 60 cases per 1000 ventilation days), the crude mortality 47.42% . Interval between the establishment of artificial airway and diagnosis of VAP was 6.9 ± 4.3 d. Univariate analysis suggested that indwelling naso-gastric tube, corticosteroid, acid inhibitor, third-generation cephalosporin/ imipenem, non - infection lung disease, and extrapulmonary infection were the statistically significant risk factors of
Selenium Exposure and Cancer Risk: an Updated Meta-analysis and Meta-regression.
Cai, Xianlei; Wang, Chen; Yu, Wanqi; Fan, Wenjie; Wang, Shan; Shen, Ning; Wu, Pengcheng; Li, Xiuyang; Wang, Fudi
2016-01-20
The objective of this study was to investigate the associations between selenium exposure and cancer risk. We identified 69 studies and applied meta-analysis, meta-regression and dose-response analysis to obtain available evidence. The results indicated that high selenium exposure had a protective effect on cancer risk (pooled OR = 0.78; 95%CI: 0.73-0.83). The results of linear and nonlinear dose-response analysis indicated that high serum/plasma selenium and toenail selenium had the efficacy on cancer prevention. However, we did not find a protective efficacy of selenium supplement. High selenium exposure may have different effects on specific types of cancer. It decreased the risk of breast cancer, lung cancer, esophageal cancer, gastric cancer, and prostate cancer, but it was not associated with colorectal cancer, bladder cancer, and skin cancer.
An Analysis on Performance of Decision Tree Algorithms using Student’s Qualitative Data
T.Miranda Lakshmi
2013-06-01
Full Text Available Decision Tree is the most widely applied supervised classification technique. The learning and classification steps of decision tree induction are simple and fast and it can be applied to any domain. In this research student qualitative data has been taken from educational data mining and the performance analysis of the decision tree algorithm ID3, C4.5 and CART are compared. The comparison result shows that the Gini Index of CART influence information Gain Ratio of ID3 and C4.5. The classification accuracy of CART is higher when compared to ID3 and C4.5. However the difference in classification accuracy between the decision tree algorithms is not considerably higher. The experimental results of decision tree indicate that student’s performance also influenced by qualitative factors.
Complexity Analysis of Balloon Drawing for Rooted Trees
Lin, Chun-Cheng; Poon, Sheung-Hung; Fan, Jia-Hao
2010-01-01
In a balloon drawing of a tree, all the children under the same parent are placed on the circumference of the circle centered at their parent, and the radius of the circle centered at each node along any path from the root reflects the number of descendants associated with the node. Among various styles of tree drawings reported in the literature, the balloon drawing enjoys a desirable feature of displaying tree structures in a rather balanced fashion. For each internal node in a balloon drawing, the ray from the node to each of its children divides the wedge accommodating the subtree rooted at the child into two sub-wedges. Depending on whether the two sub-wedge angles are required to be identical or not, a balloon drawing can further be divided into two types: even sub-wedge and uneven sub-wedge types. In the most general case, for any internal node in the tree there are two dimensions of freedom that affect the quality of a balloon drawing: (1) altering the order in which the children of the node appear in...
Population Analysis Of Emergent Timber Trees Species (Etts) In Iko ...
The mean numbers of the ETTs per transect ranged between 0.855 and ... The density of all ETTs in Iko Esai Community Forest Reserve was 15.7 trees per hectare. ... Piptadeniastrum africanum ranked first while Irvingia gabonensis took the ...
Transfer Matrix Method for Natural Vibration Analysis of Tree System
Bin He
2012-01-01
Full Text Available The application of Transfer matrix method (TMM ranges from linear/nonlinear vibration, composite structure, and multibody system to calculating static deformation, natural vibration, dynamical response, and damage identification. Generally TMM has two characteristics: (1 the TMM formulae share similarity to the chain mechanics model in terms of topology structure; then TMM often is selected as a powerful tool to analyze the chain system. (2 TMM is adopted to deal with the problems of the discrete system, continuous system, and especial discrete/continuous coupling system with the uniform matrix form. In this investigation, a novel TMM is proposed to analyze the natural vibration of the tree system. In order to make the TMM of the tree system have the two above advantages of the TMM of the chain system, the suitable state vectors and transfer matrices of the typical components of the tree system are constructed. Then the topology comparability between the mechanics model and its corresponding formulae of TMM can be adopted to assembling the transfer matrices and transfer equations of the global tree system. Two examples of natural vibration problems validating the method are given. The formulation of the proposed TMM is mathematically intuitive and can be held and applied by the engineers easily.
Time dependent analysis with dynamic counter measure trees
Kumar, Rajesh; Guck, Dennis; Stoelinga, Mariëlle Ida Antoinette
2015-01-01
The success of a security attack crucially depends on time: the more time available to the attacker, the higher the probability of a successful attack. Formalisms such as Reliability block diagrams, Reliability graphs and Attack Countermeasure trees provide quantitative information about attack scen
Automated particle identification through regression analysis of size, shape and colour
Rodriguez Luna, J. C.; Cooper, J. M.; Neale, S. L.
2016-04-01
Rapid point of care diagnostic tests and tests to provide therapeutic information are now available for a range of specific conditions from the measurement of blood glucose levels for diabetes to card agglutination tests for parasitic infections. Due to a lack of specificity these test are often then backed up by more conventional lab based diagnostic methods for example a card agglutination test may be carried out for a suspected parasitic infection in the field and if positive a blood sample can then be sent to a lab for confirmation. The eventual diagnosis is often achieved by microscopic examination of the sample. In this paper we propose a computerized vision system for aiding in the diagnostic process; this system used a novel particle recognition algorithm to improve specificity and speed during the diagnostic process. We will show the detection and classification of different types of cells in a diluted blood sample using regression analysis of their size, shape and colour. The first step is to define the objects to be tracked by a Gaussian Mixture Model for background subtraction and binary opening and closing for noise suppression. After subtracting the objects of interest from the background the next challenge is to predict if a given object belongs to a certain category or not. This is a classification problem, and the output of the algorithm is a Boolean value (true/false). As such the computer program should be able to "predict" with reasonable level of confidence if a given particle belongs to the kind we are looking for or not. We show the use of a binary logistic regression analysis with three continuous predictors: size, shape and color histogram. The results suggest this variables could be very useful in a logistic regression equation as they proved to have a relatively high predictive value on their own.
Selection of higher order regression models in the analysis of multi-factorial transcription data.
Olivia Prazeres da Costa
Full Text Available INTRODUCTION: Many studies examine gene expression data that has been obtained under the influence of multiple factors, such as genetic background, environmental conditions, or exposure to diseases. The interplay of multiple factors may lead to effect modification and confounding. Higher order linear regression models can account for these effects. We present a new methodology for linear model selection and apply it to microarray data of bone marrow-derived macrophages. This experiment investigates the influence of three variable factors: the genetic background of the mice from which the macrophages were obtained, Yersinia enterocolitica infection (two strains, and a mock control, and treatment/non-treatment with interferon-γ. RESULTS: We set up four different linear regression models in a hierarchical order. We introduce the eruption plot as a new practical tool for model selection complementary to global testing. It visually compares the size and significance of effect estimates between two nested models. Using this methodology we were able to select the most appropriate model by keeping only relevant factors showing additional explanatory power. Application to experimental data allowed us to qualify the interaction of factors as either neutral (no interaction, alleviating (co-occurring effects are weaker than expected from the single effects, or aggravating (stronger than expected. We find a biologically meaningful gene cluster of putative C2TA target genes that appear to be co-regulated with MHC class II genes. CONCLUSIONS: We introduced the eruption plot as a tool for visual model comparison to identify relevant higher order interactions in the analysis of expression data obtained under the influence of multiple factors. We conclude that model selection in higher order linear regression models should generally be performed for the analysis of multi-factorial microarray data.
Ya-Nan Ma
Full Text Available BACKGROUND: There have been few published studies on spirometric reference values for healthy children in China. We hypothesize that there would have been changes in lung function that would not have been precisely predicted by the existing spirometric reference equations. The objective of the study was to develop more accurate predictive equations for spirometric reference values for children aged 9 to 15 years in Northeast China. METHODOLOGY/PRINCIPAL FINDINGS: Spirometric measurements were obtained from 3,922 children, including 1,974 boys and 1,948 girls, who were randomly selected from five cities of Liaoning province, Northeast China, using the ATS (American Thoracic Society and ERS (European Respiratory Society standards. The data was then randomly split into a training subset containing 2078 cases and a validation subset containing 1844 cases. Predictive equations used multiple linear regression techniques with three predictor variables: height, age and weight. Model goodness of fit was examined using the coefficient of determination or the R(2 and adjusted R(2. The predicted values were compared with those obtained from the existing spirometric reference equations. The results showed the prediction equations using linear regression analysis performed well for most spirometric parameters. Paired t-tests were used to compare the predicted values obtained from the developed and existing spirometric reference equations based on the validation subset. The t-test for males was not statistically significant (p>0.01. The predictive accuracy of the developed equations was higher than the existing equations and the predictive ability of the model was also validated. CONCLUSION/SIGNIFICANCE: We developed prediction equations using linear regression analysis of spirometric parameters for children aged 9-15 years in Northeast China. These equations represent the first attempt at predicting lung function for Chinese children following the ATS
Radiation densitometry in tree-ring analysis: a review and procedure manual
Parker, M.L.; Taylor, F.G.; Doyle, T.W.; Foster, B.E.; Cooper, C.; West, D.C.
1985-01-01
An x-ray densitometry of wood facility is being established by the Environmental Sciences Division, Oak Ridge Natioanl Laboratory (ORNL). The objective is to apply tree-ring data to determine whether or not there is a fertilizer effect on tree growth from increased atmospheric carbon dioxide since the beginning of the industrial era. Intra-ring width and density data, including ring-mass will be detemined from tree-ring samples collected from sites located throughout the United States and Canada. This report is designed as a guide to assist ORNL scientists in building the x-ray densitometry system. The history and development of x-ray densitometry in tree-ring research is examined and x-ray densitometry is compared with other techniques. Relative wood and tree characteristics are described as are environmental and genetic factors affecting tree growth responses. Methods in x-ray densitometry are examined in detail and the techniques used at four operating laboratories are described. Some ways that dendrochronology has been applied in dating, in wood quality, and environmental studies are presented, and a number of tree-ring studies in Canada are described. An annotated bibliography of radiation densitometry in tree-ring analysis and related subjects is included.
焦有权; 赵礼曦; 邓欧; 徐伟恒; 冯仲科
2013-01-01
several sections, and each section’s volume were summed up as the total tree volume. Based the analytic data, the unary models between diameter at breast and volume were established, and also, to set diameter at breast and tree height as independent variables, tree volume as dependent variable, the binary models could be established, as well as a ternary model that describes the relationship between volume and 3 independent variables including diameter at breast, tree height, and tree step form. Nevertheless, these models mentioned above are sample linear models or nonlinear models. To estimate the forest stocks in the forest survey, former researchers usually cut down target trees and extracted samples based on the principle of sampling, and then made a corresponding volume table. This felled, destructive, and time-consuming method damaged many growth dominant trees. Tree volume modeling is the key step of volume table establishment, and volume usually was predicted by the volume equation that was derived from experience. However, because of the uncertainty of tree growth, it is difficult to effectively predict the complexity and diversity of the volume model through conventional volume equations. For this reason, the volume prediction accuracy rate is unsatisfactory. In order to promote the volume prediction accuracy rate, the algorithm of particle swarm optimization (PSO) was introduced into the standing tree volume prediction model. Moreover, the parameters were optimized by the support vector regression (SVM). The data of diameters at breast height and tree heights of standing trees were input into SVM, which were used to learn, parameters of SVM were used as the particle of PSO, standing trees volume value that were measured by authors were considered as objective function of PSO, then prediction values of standing trees volume were detected by the optimized parameters which were obtained through mutual co-ordination of particle, and the prediction values of
Julia Gasch
Full Text Available BACKGROUND: Differences in spontaneous and drug-induced baroreflex sensitivity (BRS have been attributed to its different operating ranges. The current study attempted to compare BRS estimates during cardiovascular steady-state and pharmacologically stimulation using an innovative algorithm for dynamic determination of baroreflex gain. METHODOLOGY/PRINCIPAL FINDINGS: Forty-five volunteers underwent the modified Oxford maneuver in supine and 60° tilted position with blood pressure and heart rate being continuously recorded. Drug-induced BRS-estimates were calculated from data obtained by bolus injections of nitroprusside and phenylephrine. Spontaneous indices were derived from data obtained during rest (stationary and under pharmacological stimulation (non-stationary using the algorithm of trigonometric regressive spectral analysis (TRS. Spontaneous and drug-induced BRS values were significantly correlated and display directionally similar changes under different situations. Using the Bland-Altman method, systematic differences between spontaneous and drug-induced estimates were found and revealed that the discrepancy can be as large as the gain itself. Fixed bias was not evident with ordinary least products regression. The correlation and agreement between the estimates increased significantly when BRS was calculated by TRS in non-stationary mode during the drug injection period. TRS-BRS significantly increased during phenylephrine and decreased under nitroprusside. CONCLUSIONS/SIGNIFICANCE: The TRS analysis provides a reliable, non-invasive assessment of human BRS not only under static steady state conditions, but also during pharmacological perturbation of the cardiovascular system.
Buck, J. A.; Underhill, P. R.; Morelli, J.; Krause, T. W.
2016-02-01
Nuclear steam generators (SGs) are a critical component for ensuring safe and efficient operation of a reactor. Life management strategies are implemented in which SG tubes are regularly inspected by conventional eddy current testing (ECT) and ultrasonic testing (UT) technologies to size flaws, and safe operating life of SGs is predicted based on growth models. ECT, the more commonly used technique, due to the rapidity with which full SG tube wall inspection can be performed, is challenged when inspecting ferromagnetic support structure materials in the presence of magnetite sludge and multiple overlapping degradation modes. In this work, an emerging inspection method, pulsed eddy current (PEC), is being investigated to address some of these particular inspection conditions. Time-domain signals were collected by an 8 coil array PEC probe in which ferromagnetic drilled support hole diameter, depth of rectangular tube frets and 2D tube off-centering were varied. Data sets were analyzed with a modified principal components analysis (MPCA) to extract dominant signal features. Multiple linear regression models were applied to MPCA scores to size hole diameter as well as size rectangular outer diameter tube frets. Models were improved through exploratory factor analysis, which was applied to MPCA scores to refine selection for regression models inputs by removing nonessential information.
Application of decision trees to the analysis of soil radon data for earthquake prediction.
Zmazek, B; Todorovski, L; Dzeroski, S; Vaupotic, J; Kobal, I
2003-06-01
Different regression methods have been used to predict radon concentration in soil gas on the basis of environmental data, i.e. barometric pressure, soil temperature, air temperature and rainfall. Analyses of the radon data from three stations in the Krsko basin, Slovenia, have shown that model trees outperform other regression methods. A model has been built which predicts radon concentration with a correlation of 0.8, provided it is influenced only by the environmental parameters. In periods with seismic activity this correlation is much lower. This decrease in predictive accuracy appears 1-7 days before earthquakes with local magnitude 0.8-3.3.
Application of decision trees to the analysis of soil radon data for earthquake prediction
Zmazek, B. E-mail: boris.zmazek@ijs.si; Todorovski, L.; Dzeroski, S.; Vaupotic, J.; Kobal, I
2003-06-01
Different regression methods have been used to predict radon concentration in soil gas on the basis of environmental data, i.e. barometric pressure, soil temperature, air temperature and rainfall. Analyses of the radon data from three stations in the Krsko basin, Slovenia, have shown that model trees outperform other regression methods. A model has been built which predicts radon concentration with a correlation of 0.8, provided it is influenced only by the environmental parameters. In periods with seismic activity this correlation is much lower. This decrease in predictive accuracy appears 1-7 days before earthquakes with local magnitude 0.8-3.3.
Czekaj, Tomasz Gerard; Henningsen, Arne
The estimation of the technical efficiency comprises a vast literature in the field of applied production economics. There are two predominant approaches: the non-parametric and non-stochastic Data Envelopment Analysis (DEA) and the parametric Stochastic Frontier Analysis (SFA). The DEA...... of specifying an unsuitable functional form and thus, model misspecification and biased parameter estimates. Given these problems of the DEA and the SFA, Fan, Li and Weersink (1996) proposed a semi-parametric stochastic frontier model that estimates the production function (frontier) by non-parametric......), Kumbhakar et al. (2007), and Henningsen and Kumbhakar (2009). The aim of this paper and its main contribution to the existing literature is the estimation semi-parametric stochastic frontier models using a different non-parametric estimation technique: spline regression (Ma et al. 2011). We apply...
Within-session analysis of the extinction of pavlovian fear-conditioning using robust regression
Vargas-Irwin, Cristina
2010-06-01
Full Text Available Traditionally , the analysis of extinction data in fear conditioning experiments has involved the use of standard linear models, mostly ANOVA of between-group differences of subjects that have undergone different extinction protocols, pharmacological manipulations or some other treatment. Although some studies report individual differences in quantities such as suppression rates or freezing percentages, these differences are not included in the statistical modeling. Withinsubject response patterns are then averaged using coarse-grain time windows which can overlook these individual performance dynamics. Here we illustrate an alternative analytical procedure consisting of 2 steps: the estimation of a trend for within-session data and analysis of group differences in trend as main outcome. This procedure is tested on real fear-conditioning extinction data, comparing trend estimates via Ordinary Least Squares (OLS and robust Least Median of Squares (LMS regression estimates, as well as comparing between-group differences and analyzing mean freezing percentage versus LMS slopes as outcomes
Păniţă, Ovidiu
2015-09-01
In the years 2012-2014 on Banu-Maracine DRS there were tested an assortment of 25 isogenic lines of wheat (Triticum aestivum ssp.vulgare), the analyzed characters being the number of seeds/spike, seeds weight/spike (g), no. of spikes/m2, weight of a thousand seeds (WTS) (g) and no. of emerged plants/m2. Based on recorded data and statistical processing of those, they were identified a numbers of links between these characters. Also available regression models were identified between some of the studied characters. Based on component analysis, no. of seeds/spike and seeds weight/spike are components that influence in excess of 88% variance analysis, a total of seven genotypes with positive scores for both factors.
A frailty model approach for regression analysis of multivariate current status data.
Chen, Man-Hua; Tong, Xingwei; Sun, Jianguo
2009-11-30
This paper discusses regression analysis of multivariate current status failure time data (The Statistical Analysis of Interval-censoring Failure Time Data. Springer: New York, 2006), which occur quite often in, for example, tumorigenicity experiments and epidemiologic investigations of the natural history of a disease. For the problem, several marginal approaches have been proposed that model each failure time of interest individually (Biometrics 2000; 56:940-943; Statist. Med. 2002; 21:3715-3726). In this paper, we present a full likelihood approach based on the proportional hazards frailty model. For estimation, an Expectation Maximization (EM) algorithm is developed and simulation studies suggest that the presented approach performs well for practical situations. The approach is applied to a set of bivariate current status data arising from a tumorigenicity experiment.
Roseane Cavalcanti dos Santos
2012-08-01
Full Text Available The objective of this work was to estimate the stability and adaptability of pod and seed yield in runner peanut genotypes based on the nonlinear regression and AMMI analysis. Yield data from 11 trials, distributed in six environments and three harvests, carried out in the Northeast region of Brazil during the rainy season were used. Significant effects of genotypes (G, environments (E, and GE interactions were detected in the analysis, indicating different behaviors among genotypes in favorable and unfavorable environmental conditions. The genotypes BRS Pérola Branca and LViPE‑06 are more stable and adapted to the semiarid environment, whereas LGoPE‑06 is a promising material for pod production, despite being highly dependent on favorable environments.
李现文; 李春玉; Miyong Kim; 李贞姬; 黄德镐; 朱琴淑; 金今姬
2012-01-01
目的 探讨和评价决策树与Logistic回归用于预测高血压患者健康素养中的可行性与准确性.方法 利用Logistic回归分析和Answer Tree软件分别建立高血压患者健康素养预测模型,利用受试者工作曲线(ROC)评价两个预测模型的优劣.结果 Logistic回归预测模型的灵敏度(82.5％)、Youden指数(50.9％)高于决策树模型(77.9％,48.0％),决策树模型的特异性(70.1％)高于Logistic回归预测模型(68.4％),误判率(29.9％)低于Logistic回归预测模型(31.6％);决策树模型ROC曲线下面积与Logistic回归预测模型ROC曲线下面积相当(0.813 vs 0.847).结论 利用决策树预测高血压患者健康素养效果与Logistic回归模型相当,根据决策树模型可以确定高血压患者健康素养筛选策略,数据挖掘技术可以用于慢性病患者健康素养预测中.%Objective To study and evaluate the feasibility and accuracy for the application of decision tree methods and logistic regression on the health literacy prediction of hypertension patients. Method Two health literacy prediction models were generated with decision tree methods and logistic regression respectively. The receiver operating curve ( ROC) was used to evaluate the results of the two prediction models. Result The sensitivity(82. 5%) , Youden index (50. 9%)by logistic regression model was higher than decision tree model(77. 9% ,48. 0%) , the Spe-cificity(70. 1%)by decision tree model was higher than that of logistic regression model(68. 4%), The error rate (29.9%) was lower than that of logistic regression model(31. 6%). The ROC for both models were 0. 813 and 0. 847. Conclusion The effect of decision tree prediction model was similar to logistic regression prediction model. Health literacy screening strategy could be obtained by decision tree prediction model, implying the data mining methods is feasible in the chronic disease management of community health service.
Evaluation of Visual Field Progression in Glaucoma: Quasar Regression Program and Event Analysis.
Díaz-Alemán, Valentín T; González-Hernández, Marta; Perera-Sanz, Daniel; Armas-Domínguez, Karintia
2016-01-01
To determine the sensitivity, specificity and agreement between the Quasar program, glaucoma progression analysis (GPA II) event analysis and expert opinion in the detection of glaucomatous progression. The Quasar program is based on linear regression analysis of both mean defect (MD) and pattern standard deviation (PSD). Each series of visual fields was evaluated by three methods; Quasar, GPA II and four experts. The sensitivity, specificity and agreement (kappa) for each method was calculated, using expert opinion as the reference standard. The study included 439 SITA Standard visual fields of 56 eyes of 42 patients, with a mean of 7.8 ± 0.8 visual fields per eye. When suspected cases of progression were considered stable, sensitivity and specificity of Quasar, GPA II and the experts were 86.6% and 70.7%, 26.6% and 95.1%, and 86.6% and 92.6% respectively. When suspected cases of progression were considered as progressing, sensitivity and specificity of Quasar, GPA II and the experts were 79.1% and 81.2%, 45.8% and 90.6%, and 85.4% and 90.6% respectively. The agreement between Quasar and GPA II when suspected cases were considered stable or progressing was 0.03 and 0.28 respectively. The degree of agreement between Quasar and the experts when suspected cases were considered stable or progressing was 0.472 and 0.507. The degree of agreement between GPA II and the experts when suspected cases were considered stable or progressing was 0.262 and 0.342. The combination of MD and PSD regression analysis in the Quasar program showed better agreement with the experts and higher sensitivity than GPA II.
Rodríguez-Barranco, Miguel; Tobías, Aurelio; Redondo, Daniel; Molina-Portillo, Elena; Sánchez, María José
2017-03-17
Meta-analysis is very useful to summarize the effect of a treatment or a risk factor for a given disease. Often studies report results based on log-transformed variables in order to achieve the principal assumptions of a linear regression model. If this is the case for some, but not all studies, the effects need to be homogenized. We derived a set of formulae to transform absolute changes into relative ones, and vice versa, to allow including all results in a meta-analysis. We applied our procedure to all possible combinations of log-transformed independent or dependent variables. We also evaluated it in a simulation based on two variables either normally or asymmetrically distributed. In all the scenarios, and based on different change criteria, the effect size estimated by the derived set of formulae was equivalent to the real effect size. To avoid biased estimates of the effect, this procedure should be used with caution in the case of independent variables with asymmetric distributions that significantly differ from the normal distribution. We illustrate an application of this procedure by an application to a meta-analysis on the potential effects on neurodevelopment in children exposed to arsenic and manganese. The procedure proposed has been shown to be valid and capable of expressing the effect size of a linear regression model based on different change criteria in the variables. Homogenizing the results from different studies beforehand allows them to be combined in a meta-analysis, independently of whether the transformations had been performed on the dependent and/or independent variables.
Askelöf, P; Korsfeldt, M; Mannervik, B
1976-10-01
Knowledge of the error structure of a given set of experimental data is a necessary prerequisite for incisive analysis and for discrimination between alternative mathematical models of the data set. A reaction system consisting of glutathione S-transferase A (glutathione S-aryltransferase), glutathione, and 3,4-dichloro-1-nitrobenzene was investigated under steady-state conditions. It was found that the experimental error increased with initial velocity, v, and that the variance (estimated by replicates) could be described by a polynomial in v Var (v) = K0 + K1 - v + K2 - v2 or by a power function Var (v) = K0 + K1 - vK2. These equations were good approximations irrespective of whether different v values were generated by changing substrate or enzyme concentrations. The selection of these models was based mainly on experiments involving varying enzyme concentration, which, unlike v, is not considered a stochastic variable. Different models of the variance, expressed as functions of enzyme concentration, were examined by regression analysis, and the models could then be transformed to functions in which velocity is substituted for enzyme concentration owing to the proportionality between these variables. Thus, neither the absolute nor the relative error was independent of velocity, a result previously obtained for glutathione reductase in this laboratory [BioSystems 7, 101-119 (1975)]. If the experimental errors or velocities were standardized by division with their corresponding mean velocity value they showed a normal (Gaussian) distribution provided that the coefficient of variation was approximately constant for the data considered. Furthermore, it was established that the errors in the independent variables (enzyme and substrate concentrations) were small in comparison with the error in the velocity determinations. For weighting in regression analysis the inverted value of the local variance in each experimental point should be used. It was found that the
Measuring treatment and scale bias effects by linear regression in the analysis of OHI-S scores.
Moore, B J
1977-05-01
A linear regression model is presented for estimating unbiased treatment effects from OHI-S scores. An example is given to illustrate an analysis and to compare results of an unbiased regression estimator with those based on a biased simple difference estimator.
The Impact of Outliers on Net-Benefit Regression Model in Cost-Effectiveness Analysis.
Wen, Yu-Wen; Tsai, Yi-Wen; Wu, David Bin-Chia; Chen, Pei-Fen
2013-01-01
Ordinary least square (OLS) in regression has been widely used to analyze patient-level data in cost-effectiveness analysis (CEA). However, the estimates, inference and decision making in the economic evaluation based on OLS estimation may be biased by the presence of outliers. Instead, robust estimation can remain unaffected and provide result which is resistant to outliers. The objective of this study is to explore the impact of outliers on net-benefit regression (NBR) in CEA using OLS and to propose a potential solution by using robust estimations, i.e. Huber M-estimation, Hampel M-estimation, Tukey's bisquare M-estimation, MM-estimation and least trimming square estimation. Simulations under different outlier-generating scenarios and an empirical example were used to obtain the regression estimates of NBR by OLS and five robust estimations. Empirical size and empirical power of both OLS and robust estimations were then compared in the context of hypothesis testing. Simulations showed that the five robust approaches compared with OLS estimation led to lower empirical sizes and achieved higher empirical powers in testing cost-effectiveness. Using real example of antiplatelet therapy, the estimated incremental net-benefit by OLS estimation was lower than those by robust approaches because of outliers in cost data. Robust estimations demonstrated higher probability of cost-effectiveness compared to OLS estimation. The presence of outliers can bias the results of NBR and its interpretations. It is recommended that the use of robust estimation in NBR can be an appropriate method to avoid such biased decision making.
Stature estimation from footprint measurements in Indian Tamils by regression analysis
T. Nataraja Moorthy
2014-03-01
Full Text Available Stature estimation is of particular interest to forensic scientists for its importance in human identification. Footprint is one piece of valuable physical evidence encountered at crime scenes and its identification can facilitate narrowing down the suspects and establishing the identity of the criminals. Analysis of footprints helps in estimation of an individual’s stature because of the existence of the strong correlation between footprint and height. Foot impressions are still found at crime scenes, since offenders often tend to remove their footwear either to avoid noise or to gain a better grip in climbing walls, etc., while entering or exiting. In Asian countries like India, there are people who still have the habit of walking barefoot. The present study aims to estimate the stature in a sample of 2,040 bilateral footprints collected from 1,020 healthy adult male Indian Tamils, an ethnic group in Tamilnadu State, India, who consented to participate in the study and who range in age from 19 to 42 years old; this study will help to generate population-specific equations using a simple linear regression statistical method. All footprint lengths exhibit a statistically positive significant correlation with stature (p-value < 0.01 and the correlation coefficient (r ranges from 0.546 to 0.578. The accuracy of the regression equations was verified by comparing the estimated stature with the actual stature. Regression equations derived in this research can be used to estimate stature from the complete or even partial footprints among Indian Tamils.
Systems Analysis of Ten Supply Chains for Whole Tree Chips
Helmer Belbo
2014-09-01
Full Text Available Whole trees from energy thinnings constitute one of many forest fuel sources, yet ten widely applied supply chains could be defined for this feedstock alone. These ten represent only a subset of the real possibilities, as felling method was held constant and only a single market (combustion of whole tree chips was considered. Stages included in-field, roadside landing, terminal, and conversion plant, and biomass states at each of these included loose whole trees, bundled whole trees or chipped material. Assumptions on prices, performances, and conversion rates were based on field trials and published literature in similar boreal forest conditions. The economic outcome was calculated on the basis of production, handling, treatment and storage costs and losses. Outcomes were tested for robustness on a range of object volumes (50–350 m3solid, extraction distances (50–550 m and transport distances (10–70 km using simulation across a set of discrete values. Transport was calculated for both a standard 19.5 m and an extended 24 m timber truck. Results showed that the most expensive chain (roadside bundling, roadside storage, terminal storage and delivery using a 19.5 m timber truck at 158 € td−1 was 23% more costly than the cheapest chain (roadside chipping and direct transport to conversion plant with container truck, at 128 € td−1. Outcomes vary at specific object volumes and transport distances, highlighting the need to verify assumptions, although standard deviations around mean supply costs for each chain were small (6%–9%. Losses at all stages were modelled, with the largest losses (23 € td−1 occurring in the chains including bundles. The study makes all methods and assumptions explicit and can assist the procurement manager in understanding the mechanisms at work.
Analysis of superheater's pipe wall overtemperature by fault tree diagnose
盛德仁; 任浩仁; 陈坚红; 李蔚
2002-01-01
After research on a 2000t/h subcritical forced-circulation balanced v entilation were applied boiler and the structure and operation of its auxiliary system builds up this heat transfer model of a superheater's pipe wall and analy ze the effect of primary factors on the overtemperature of the pipe wall. Fault tree structure was used to uncover the multiplayer logic between the overtempera ture of the superheater's pipe wall and the faults.
Fu, Yuan-Yuan; Wang, Ji-Hua; Yang, Gui-Jun; Song, Xiao-Yu; Xu, Xin-Gang; Feng, Hai-Kuan
2013-05-01
The major limitation of using existing vegetation indices for crop biomass estimation is that it approaches a saturation level asymptotically for a certain range of biomass. In order to resolve this problem, band depth analysis and partial least square regression (PLSR) were combined to establish winter wheat biomass estimation model in the present study. The models based on the combination of band depth analysis and PLSR were compared with the models based on common vegetation indexes from the point of view of estimation accuracy, subsequently. Band depth analysis was conducted in the visible spectral domain (550-750 nm). Band depth, band depth ratio (BDR), normalized band depth index, and band depth normalized to area were utilized to represent band depth information. Among the calibrated estimation models, the models based on the combination of band depth analysis and PLSR reached higher accuracy than those based on the vegetation indices. Among them, the combination of BDR and PLSR got the highest accuracy (R2 = 0.792, RMSE = 0.164 kg x m(-2)). The results indicated that the combination of band depth analysis and PLSR could well overcome the saturation problem and improve the biomass estimation accuracy when winter wheat biomass is large.
Analysis of contaminating elements in tree rings in Santiago, Chile
Romo-Kroeger, C.M.; Avila, M.J.; Eaton, L.C.; Lopez, L.A. [Faculty of Sciences. Univ. of Chile, Santiago (Chile)
1996-12-31
Using the 22`` isochronous cyclotron at the University of Chile, we have performed PIXE analyses on a group of samples collected from trees of metropolitan parks in Santiago. Dendrochronology was performed on each sample, which was then sectioned for the PIXE and other analyses, neutron activation and electro-chemistry. Available samples are trunk sections or cores obtained by the use of a 4.0 mm stainless steel incremental corer. We took three cores from each tree with permission of the municipalities. For the PIXE we use infinitely thick targets, as wood slabs taken along the trunk radius, and thin targets obtained by acid digestion of wood pieces and deposition on Kapton foils. Self supporting thick targets were placed directly in the PIXE chamber in a position so as to allow the irradiation of a specific annual ring. Potassium and Calcium appear as the most abundant elements in wood. Other elements such as S, Cu, Zn, As, Br and Pb were detected in amounts above the natural background in wood, and can be attributed to environmental contamination. The K/Ca ratios appear to be different for each species of tree, and seem to be related to the physico-chemical properties of wood. Preliminary results show important amounts of As and Cu (supposedly from mining origin) with increasing presence in the recent years. Pb and Zn (supposedly from vehicle origin) are also higher in recent years. (author)
Use of Tritium Accelerator Mass Spectrometry for Tree Ring Analysis
LOVE, ADAM H.; HUNT, JAMES R.; ROBERTS, MARK L.; SOUTHON, JOHN R.; CHIARAPPA - ZUCCA, MARINA L.; DINGLEY, KAREN H.
2010-01-01
Public concerns over the health effects associated with low-level and long-term exposure to tritium released from industrial point sources have generated the demand for better methods to evaluate historical tritium exposure levels for these communities. The cellulose of trees accurately reflects the tritium concentration in the source water and may contain the only historical record of tritium exposure. The tritium activity in the annual rings of a tree was measured using accelerator mass spectrometry to reconstruct historical annual averages of tritium exposure. Milligram-sized samples of the annual tree rings from a Tamarix located at the Nevada Test Site are used for validation of this methodology. The salt cedar was chosen since it had a single source of tritiated water that was well-characterized as it varied over time. The decay-corrected tritium activity of the water in which the salt cedar grew closely agrees with the organically bound tritium activity in its annual rings. This demonstrates that the milligram-sized samples used in tritium accelerator mass spectrometry are suited for reconstructing anthropogenic tritium levels in the environment. PMID:12144257
Performance Analysis of Evolutionary Algorithms for Steiner Tree Problems.
Lai, Xinsheng; Zhou, Yuren; Xia, Xiaoyun; Zhang, Qingfu
2016-12-13
The Steiner tree problem (STP) aims to determine some Steiner nodes such that the minimum spanning tree over these Steiner nodes and a given set of special nodes has the minimum weight, which is NP-hard. STP includes several important cases. The Steiner tree problem in graphs (GSTP) is one of them. Many heuristics have been proposed for STP, and some of them have proved to be performance guarantee approximation algorithms for this problem. Since evolutionary algorithms (EAs) are general and popular randomized heuristics, it is significant to investigate the performance of EAs for STP. Several empirical investigations have shown that EAs are efficient for STP. However, up to now, there is no theoretical work on the performance of EAs for STP. In this paper, we reveal that the (1 + 1) EA achieves 3/2-approximation ratio for STP in a special class of quasi-bipartite graphs in expected runtime O(r(r + s - 1) ⋅ wmax), where r, s and wmax are respectively the number of Steiner nodes, the number of special nodes and the largest weight among all edges in the input graph. We also show that the (1 + 1) EA is better than two other heuristics on two GSTP instances, and the (1 + 1) EA may be inefficient on a constructed GSTP instance.
Ripe Fuji Apple Detection Model Analysis in Natural Tree Canopy
Dongjian He
2012-11-01
Full Text Available In this work we develop a novel approach for the automatic recognition of red Fuji apples within a tree canopy using three distinguishable color models in order to achieve automated harvesting. How to select the recognition model is important for the certain intelligent harvester employed to perform in real orchards. The L*a*b color model, HSI (Hue, Saturation and Intensity color model and LCD color difference model, which are insensitive to light conditions, are analyzed and applied to detect the fruit under the different lighting conditions because the fruit has the highest red color among the objects in the image. The fuzzy 2-partition entropy, which could discriminate the object and the background in grayscale images and is obtained from the histogram, is applied to the segment the Fuji apples under complex backgrounds. A series of mathematical morphological operations are used to eliminate segmental fragments after segmentation. Finally, the proposed approach is validated on apple images taken in natural tree canopies. A contribution reported in this work, is the voting scheme added to the natural tree canopy which recognizes apples under different light influences.
Lopes, J.A Pecas; Vasconcelos, Maria Helena O.P. de [Instituto de Engenharia de Sistemas e Computadores (INESC), Porto (Portugal). E-mail: jpl@riff.fe.up.pt; hvasconcelos@inescn.pt
1999-07-01
This paper describes in a synthetic manner the technology adopted to define structures used in the fast evaluation of dynamic safety of isolated network with high level of eolic production contribution. This methodology uses hybrid regression trees, which allows the quantification the endurance connected to the dynamic behavior of these networks by emulating the frequency minimum deviation that will be experienced by the system when submitted toa pre-defined perturbation. Also, new procedures for data automatic generation are presented, which will be used for construction and measurements of the evaluation structures performance. The paper describes the Terceira island - Acores archipelago network study case.
Shuai Wang
2014-10-01
Full Text Available Accurate prediction of the remaining useful life (RUL of lithium-ion batteries is important for battery management systems. Traditional empirical data-driven approaches for RUL prediction usually require multidimensional physical characteristics including the current, voltage, usage duration, battery temperature, and ambient temperature. From a capacity fading analysis of lithium-ion batteries, it is found that the energy efficiency and battery working temperature are closely related to the capacity degradation, which account for all performance metrics of lithium-ion batteries with regard to the RUL and the relationships between some performance metrics. Thus, we devise a non-iterative prediction model based on flexible support vector regression (F-SVR and an iterative multi-step prediction model based on support vector regression (SVR using the energy efficiency and battery working temperature as input physical characteristics. The experimental results show that the proposed prognostic models have high prediction accuracy by using fewer dimensions for the input data than the traditional empirical models.
An innovative land use regression model incorporating meteorology for exposure analysis.
Su, Jason G; Brauer, Michael; Ainslie, Bruce; Steyn, Douw; Larson, Timothy; Buzzelli, Michael
2008-02-15
The advent of spatial analysis and geographic information systems (GIS) has led to studies of chronic exposure and health effects based on the rationale that intra-urban variations in ambient air pollution concentrations are as great as inter-urban differences. Such studies typically rely on local spatial covariates (e.g., traffic, land use type) derived from circular areas (buffers) to predict concentrations/exposures at receptor sites, as a means of averaging the annual net effect of meteorological influences (i.e., wind speed, wind direction and insolation). This is the approach taken in the now popular land use regression (LUR) method. However spatial studies of chronic exposures and temporal studies of acute exposures have not been adequately integrated. This paper presents an innovative LUR method implemented in a GIS environment that reflects both temporal and spatial variability and considers the role of meteorology. The new source area LUR integrates wind speed, wind direction and cloud cover/insolation to estimate hourly nitric oxide (NO) and nitrogen dioxide (NO(2)) concentrations from land use types (i.e., road network, commercial land use) and these concentrations are then used as covariates to regress against NO and NO(2) measurements at various receptor sites across the Vancouver region and compared directly with estimates from a regular LUR. The results show that, when variability in seasonal concentration measurements is present, the source area LUR or SA-LUR model is a better option for concentration estimation.
Improved Regression Analysis of Temperature-Dependent Strain-Gage Balance Calibration Data
Ulbrich, N.
2015-01-01
An improved approach is discussed that may be used to directly include first and second order temperature effects in the load prediction algorithm of a wind tunnel strain-gage balance. The improved approach was designed for the Iterative Method that fits strain-gage outputs as a function of calibration loads and uses a load iteration scheme during the wind tunnel test to predict loads from measured gage outputs. The improved approach assumes that the strain-gage balance is at a constant uniform temperature when it is calibrated and used. First, the method introduces a new independent variable for the regression analysis of the balance calibration data. The new variable is designed as the difference between the uniform temperature of the balance and a global reference temperature. This reference temperature should be the primary calibration temperature of the balance so that, if needed, a tare load iteration can be performed. Then, two temperature{dependent terms are included in the regression models of the gage outputs. They are the temperature difference itself and the square of the temperature difference. Simulated temperature{dependent data obtained from Triumph Aerospace's 2013 calibration of NASA's ARC-30K five component semi{span balance is used to illustrate the application of the improved approach.
Multiple Regression Analysis of mRNA-miRNA Associations in Colorectal Cancer Pathway
Fengfeng Wang
2014-01-01
Full Text Available Background. MicroRNA (miRNA is a short and endogenous RNA molecule that regulates posttranscriptional gene expression. It is an important factor for tumorigenesis of colorectal cancer (CRC, and a potential biomarker for diagnosis, prognosis, and therapy of CRC. Our objective is to identify the related miRNAs and their associations with genes frequently involved in CRC microsatellite instability (MSI and chromosomal instability (CIN signaling pathways. Results. A regression model was adopted to identify the significantly associated miRNAs targeting a set of candidate genes frequently involved in colorectal cancer MSI and CIN pathways. Multiple linear regression analysis was used to construct the model and find the significant mRNA-miRNA associations. We identified three significantly associated mRNA-miRNA pairs: BCL2 was positively associated with miR-16 and SMAD4 was positively associated with miR-567 in the CRC tissue, while MSH6 was positively associated with miR-142-5p in the normal tissue. As for the whole model, BCL2 and SMAD4 models were not significant, and MSH6 model was significant. The significant associations were different in the normal and the CRC tissues. Conclusion. Our results have laid down a solid foundation in exploration of novel CRC mechanisms, and identification of miRNA roles as oncomirs or tumor suppressor mirs in CRC.
Gayou, Olivier; Das, Shiva K; Zhou, Su-Min; Marks, Lawrence B; Parda, David S; Miften, Moyed
2008-12-01
A given outcome of radiotherapy treatment can be modeled by analyzing its correlation with a combination of dosimetric, physiological, biological, and clinical factors, through a logistic regression fit of a large patient population. The quality of the fit is measured by the combination of the predictive power of this particular set of factors and the statistical significance of the individual factors in the model. We developed a genetic algorithm (GA), in which a small sample of all the possible combinations of variables are fitted to the patient data. New models are derived from the best models, through crossover and mutation operations, and are in turn fitted. The process is repeated until the sample converges to the combination of factors that best predicts the outcome. The GA was tested on a data set that investigated the incidence of lung injury in NSCLC patients treated with 3DCRT. The GA identified a model with two variables as the best predictor of radiation pneumonitis: the V30 (p=0.048) and the ongoing use of tobacco at the time of referral (p=0.074). This two-variable model was confirmed as the best model by analyzing all possible combinations of factors. In conclusion, genetic algorithms provide a reliable and fast way to select significant factors in logistic regression analysis of large clinical studies.
Rubio, Francisco J.
2016-02-09
We study Bayesian linear regression models with skew-symmetric scale mixtures of normal error distributions. These kinds of models can be used to capture departures from the usual assumption of normality of the errors in terms of heavy tails and asymmetry. We propose a general noninformative prior structure for these regression models and show that the corresponding posterior distribution is proper under mild conditions. We extend these propriety results to cases where the response variables are censored. The latter scenario is of interest in the context of accelerated failure time models, which are relevant in survival analysis. We present a simulation study that demonstrates good frequentist properties of the posterior credible intervals associated with the proposed priors. This study also sheds some light on the trade-off between increased model flexibility and the risk of over-fitting. We illustrate the performance of the proposed models with real data. Although we focus on models with univariate response variables, we also present some extensions to the multivariate case in the Supporting Information.
Abdul Ghafoor Memon
2014-03-01
Full Text Available In this study, thermodynamic and statistical analyses were performed on a gas turbine system, to assess the impact of some important operating parameters like CIT (Compressor Inlet Temperature, PR (Pressure Ratio and TIT (Turbine Inlet Temperature on its performance characteristics such as net power output, energy efficiency, exergy efficiency and fuel consumption. Each performance characteristic was enunciated as a function of operating parameters, followed by a parametric study and optimization. The results showed that the performance characteristics increase with an increase in the TIT and a decrease in the CIT, except fuel consumption which behaves oppositely. The net power output and efficiencies increase with the PR up to certain initial values and then start to decrease, whereas the fuel consumption always decreases with an increase in the PR. The results of exergy analysis showed the combustion chamber as a major contributor to the exergy destruction, followed by stack gas. Subsequently, multiple regression models were developed to correlate each of the response variables (performance characteristic with the predictor variables (operating parameters. The regression model equations showed a significant statistical relationship between the predictor and response variables.
Exergy Analysis of a Subcritical Reheat Steam Power Plant with Regression Modeling and Optimization
MUHIB ALI RAJPER
2016-07-01
Full Text Available In this paper, exergy analysis of a 210 MW SPP (Steam Power Plant is performed. Firstly, the plant is modeled and validated, followed by a parametric study to show the effects of various operating parameters on the performance parameters. The net power output, energy efficiency, and exergy efficiency are taken as the performance parameters, while the condenser pressure, main steam pressure, bled steam pressures, main steam temperature, and reheat steam temperature isnominated as the operating parameters. Moreover, multiple polynomial regression models are developed to correlate each performance parameter with the operating parameters. The performance is then optimizedby using Direct-searchmethod. According to the results, the net power output, energy efficiency, and exergy efficiency are calculated as 186.5 MW, 31.37 and 30.41%, respectively under normal operating conditions as a base case. The condenser is a major contributor towards the energy loss, followed by the boiler, whereas the highest irreversibilities occur in the boiler and turbine. According to the parametric study, variation in the operating parameters greatly influences the performance parameters. The regression models have appeared to be a good estimator of the performance parameters. The optimum net power output, energy efficiency and exergy efficiency are obtained as 227.6 MW, 37.4 and 36.4, respectively, which have been calculated along with optimal values of selected operating parameters.
Classification of Effective Soil Depth by Using Multinomial Logistic Regression Analysis
Chang, C. H.; Chan, H. C.; Chen, B. A.
2016-12-01
Classification of effective soil depth is a task of determining the slopeland utilizable limitation in Taiwan. The "Slopeland Conservation and Utilization Act" categorizes the slopeland into agriculture and husbandry land, land suitable for forestry and land for enhanced conservation according to the factors including average slope, effective soil depth, soil erosion and parental rock. However, sit investigation of the effective soil depth requires a cost-effective field work. This research aimed to classify the effective soil depth by using multinomial logistic regression with the environmental factors. The Wen-Shui Watershed located at the central Taiwan was selected as the study areas. The analysis of multinomial logistic regression is performed by the assistance of a Geographic Information Systems (GIS). The effective soil depth was categorized into four levels including deeper, deep, shallow and shallower. The environmental factors of slope, aspect, digital elevation model (DEM), curvature and normalized difference vegetation index (NDVI) were selected for classifying the soil depth. An Error Matrix was then used to assess the model accuracy. The results showed an overall accuracy of 75%. At the end, a map of effective soil depth was produced to help planners and decision makers in determining the slopeland utilizable limitation in the study areas.
Yu, Rongqin; Geddes, John R; Fazel, Seena
2012-10-01
The risk of antisocial outcomes in individuals with personality disorder (PD) remains uncertain. The authors synthesize the current evidence on the risks of antisocial behavior, violence, and repeat offending in PD, and they explore sources of heterogeneity in risk estimates through a systematic review and meta-regression analysis of observational studies comparing antisocial outcomes in personality disordered individuals with controls groups. Fourteen studies examined risk of antisocial and violent behavior in 10,007 individuals with PD, compared with over 12 million general population controls. There was a substantially increased risk of violent outcomes in studies with all PDs (random-effects pooled odds ratio [OR] = 3.0, 95% CI = 2.6 to 3.5). Meta-regression revealed that antisocial PD and gender were associated with higher risks (p = .01 and .07, respectively). The odds of all antisocial outcomes were also elevated. Twenty-five studies reported the risk of repeat offending in PD compared with other offenders. The risk of a repeat offense was also increased (fixed-effects pooled OR = 2.4, 95% CI = 2.2 to 2.7) in offenders with PD. The authors conclude that although PD is associated with antisocial outcomes and repeat offending, the risk appears to differ by PD category, gender, and whether individuals are offenders or not.
A quantile regression approach to the analysis of the quality of life determinants in the elderly
Serena Broccoli
2013-05-01
Full Text Available Objective. The aim of this study is to explain the effect of important covariates on the health-related quality of life (HRQol in elderly subjects. Methods. Data were collected within a longitudinal study that involves 5256 subject, aged +or= 65. The Visual Analogue Scale inclused in the EQ-5D Questionnaire, tha EQ-VAS, was used to obtain a synthetic measure of quality of life. To model EQ-VAS Score a quantile regression analysis was employed. This methodological approach was preferred to an OLS regression becouse of the EQ-VAS Score typical distribution. The main covariates are: amount of weekly physical activity, reported problems in Activity of Daily Living, presence of cardiovascular diseases, diabetes, hypercolesterolemia, hypertension, joints pains, as well as socio-demographic information. Main Results. 1 Even a low level of physical activity significantly influences quality of life in a positive way; 2 ADL problems, at least one cardiovascular disease and joint pain strongly decrease the quality of life.
minimum variance estimation of yield parameters of rubber tree with ...
2013-03-01
Mar 1, 2013 ... STAMP, an OxMetric modular software system for time series analysis, was used to estimate the yield ... derlying regression techniques. .... Kalman Filter Minimum Variance Estimation of Rubber Tree Yield Parameters. 83.
Factors predicting the failure of Bernese periacetabular osteotomy: a meta-regression analysis.
Sambandam, Senthil Nathan; Hull, Jason; Jiranek, William A
2009-12-01
There is no clear evidence regarding the outcome of Bernese periacetabular osteotomy (PAO) in different patient populations. We performed systematic meta-regression analysis of 23 eligible studies. There were 1,113 patients of which 61 patients had total hip arthroplasty (THA) (endpoint) as a result of failed Bernese PAO. Univariate analysis revealed significant correlation between THA and presence of grade 2/grade 3 arthritis, Merle de'Aubigne score (MDS), Harris hip score and Tonnis angle, change in lateral centre edge (LCE) angle, late proximal femoral osteotomies, and heterotrophic ossification (HO) resection. Multivariate analysis showed that the odds of having THA increases with grade 2/grade 3 osteoarthritis (3.36 times), joint penetration (3.12 times), low preoperative MDS (1.59 times), late PFO (1.59 times), presence of preoperative subluxation (1.22 times), previous hip operations (1.14 times), and concomitant PFO (1.09 times). In the absence of randomised controlled studies, the findings of this analysis can help the surgeon to make treatment decisions.
A Performance Study of Data Mining Techniques: Multiple Linear Regression vs. Factor Analysis
Taneja, Abhishek
2011-01-01
The growing volume of data usually creates an interesting challenge for the need of data analysis tools that discover regularities in these data. Data mining has emerged as disciplines that contribute tools for data analysis, discovery of hidden knowledge, and autonomous decision making in many application domains. The purpose of this study is to compare the performance of two data mining techniques viz., factor analysis and multiple linear regression for different sample sizes on three unique sets of data. The performance of the two data mining techniques is compared on following parameters like mean square error (MSE), R-square, R-Square adjusted, condition number, root mean square error(RMSE), number of variables included in the prediction model, modified coefficient of efficiency, F-value, and test of normality. These parameters have been computed using various data mining tools like SPSS, XLstat, Stata, and MS-Excel. It is seen that for all the given dataset, factor analysis outperform multiple linear re...
Wu, X. B.
2006-06-01
Full Text Available Four body-size and fourteen head-size measurements were taken from each Chinese alligator (Alligator sinensis according to the measurements adapted from Verdade. Regression equations between body-size and head-size variables were presented to predict body size from head dimension. The coefficients of determination of captive animals concerning body- and head-size variables can be considered extremely high, which means most of the head-size variables studied can be useful for predicting body length. The result of multivariate allometric analysis indicated that the head elongates as in most other species of crocodilians. The allometric coefficients of snout length (SL and lower ramus (LM were greater than those of other variables of head, which was considered to be possibly correlated to fights and prey. On the contrary, allometric coefficients for the variables of obita (OW, OL and postorbital cranial roof (LCR, were lower than those of other variables.
Deterministic Assessment of Continuous Flight Auger Construction Durations Using Regression Analysis
Hossam E. Hosny
2015-07-01
Full Text Available One of the primary functions of construction equipment management is to calculate the production rate of equipment which will be a major input to the processes of time estimates, cost estimates and the overall project planning. Accordingly, it is crucial to stakeholders to be able to compute equipment production rates. This may be achieved using an accurate, reliable and easy tool. The objective of this research is to provide a simple model that can be used by specialists to predict the duration of a proposed Continuous Flight Auger job. The model was obtained using a prioritizing technique based on expert judgment then using multi-regression analysis based on a representative sample. The model was then validated on a selected sample of projects. The average error of the model was calculated to be about (3%-6%.
Biological stability in drinking water: a regression analysis of influencing factors
LU Wei; ZHANG Xiao-jian
2005-01-01
Some parameters, such as assimilable organic carbon(AOC), chloramine residual, water temperature, and water residence time, were measured in drinking water from distribution systems in a northern city of China. The measurement results illustrate that when chloramine residual is more than 0.3 mg/L or AOC content is below 50 tμg/L, the biological stability of drinking water can be controlled.Both chloramine residual and AOC have a good relationship with Heterotrophic Plate Counts(HPC)(log value), the correlation coefficient was -0.64 and 0.33, respectively. By regression analysis of the survey data, a statistical equation is presented and it is concluded that disinfectant residual exerts the strongest influence on bacterial growth and AOC is a suitable index to assess the biological stability in the drinking water.
Logistic Regression Analysis on Factors Affecting Adoption of RiceFish Farming in North Iran
Seyyed Ali NOORHOSSEINI-NIYAKI; Mohammad Sadegh ALLAHYARI
2012-01-01
We evaluated the factors influencing the adoption of rice-fish farming in the Tavalesh region near the Caspian Sea in northern Iran.We conducted a survey with open-ended questions.Data were collected from 184 respondents (61 adopters and 123 non-adopters) randomly sampled from selected villages and analyzed using logistic regression and multiresponse analysis.Family size,number of contacts with an extension agent,participation in extension-education activities,membership in social institutions and the presence of farm workers were the most important socioeconomic factors for the adoption of rice-fish farming system.In addition,economic problems were the most common issue reported by adopters.Other issues such as lack of access to appropriate fish food,losses of fish,lack of access to high quality fish fingerlings and dehydration and poor water quality were also important to a number of farmers.
ANALYSIS OF TUITION GROWTH RATES BASED ON CLUSTERING AND REGRESSION MODELS
Long Cheng
2016-07-01
Full Text Available Tuition plays a significant role in determining whether a student could afford higher education, which is one of the major driving forces for country development and social prosperity. So it is necessary to fully understand what factors might affect the tuition and how they affect it. However, many existing studies on the tuition growth rate either lack sufficient real data and proper quantitative models to support their conclusions, or are limited to focus on only a few factors that might affect the tuition growth rate, failing to make a comprehensive analysis. In this paper, we explore a wide variety of factors that might affect the tuition growth rate by use of large amounts of authentic data and different quantitative methods such as clustering and regression models.
A generalized Defries-Fulker regression framework for the analysis of twin data.
Lazzeroni, Laura C; Ray, Amrita
2013-01-01
Twin studies compare the similarity between monozygotic twins to that between dizygotic twins in order to investigate the relative contributions of latent genetic and environmental factors influencing a phenotype. Statistical methods for twin data include likelihood estimation and Defries-Fulker regression. We propose a new generalization of the Defries-Fulker model that fully incorporates the effects of observed covariates on both members of a twin pair and is robust to violations of the Normality assumption. A simulation study demonstrates that the method is competitive with likelihood analysis. The Defries-Fulker strategy yields new insight into the parameter space of the twin model and provides a novel, prediction-based interpretation of twin study results that unifies continuous and binary traits. Due to the simplicity of its structure, extensions of the model have the potential to encompass generalized linear models, censored and truncated data; and gene by environment interactions.
牛东晓; 刘达; 邢棉
2008-01-01
A combined model based on principal components analysis (PCA) and generalized regression neural network (GRNN) was adopted to forecast electricity price in day-ahead electricity market. PCA was applied to mine the main influence on day-ahead price, avoiding the strong correlation between the input factors that might influence electricity price, such as the load of the forecasting hour, other history loads and prices, weather and temperature; then GRNN was employed to forecast electricity price according to the main information extracted by PCA. To prove the efficiency of the combined model, a case from PJM (Pennsylvania-New Jersey-Maryland) day-ahead electricity market was evaluated. Compared to back-propagation (BP) neural network and standard GRNN, the combined method reduces the mean absolute percentage error about 3%.
Sensitivity Analysis to Select the Most Influential Risk Factors in a Logistic Regression Model
Jassim N. Hussain
2008-01-01
Full Text Available The traditional variable selection methods for survival data depend on iteration procedures, and control of this process assumes tuning parameters that are problematic and time consuming, especially if the models are complex and have a large number of risk factors. In this paper, we propose a new method based on the global sensitivity analysis (GSA to select the most influential risk factors. This contributes to simplification of the logistic regression model by excluding the irrelevant risk factors, thus eliminating the need to fit and evaluate a large number of models. Data from medical trials are suggested as a way to test the efficiency and capability of this method and as a way to simplify the model. This leads to construction of an appropriate model. The proposed method ranks the risk factors according to their importance.
Melanin and blood concentration in human skin studied by multiple regression analysis: experiments
Shimada, M.; Yamada, Y.; Itoh, M.; Yatagai, T.
2001-09-01
Knowledge of the mechanism of human skin colour and measurement of melanin and blood concentration in human skin are needed in the medical and cosmetic fields. The absorbance spectrum from reflectance at the visible wavelength of human skin increases under several conditions such as a sunburn or scalding. The change of the absorbance spectrum from reflectance including the scattering effect does not correspond to the molar absorption spectrum of melanin and blood. The modified Beer-Lambert law is applied to the change in the absorbance spectrum from reflectance of human skin as the change in melanin and blood is assumed to be small. The concentration of melanin and blood was estimated from the absorbance spectrum reflectance of human skin using multiple regression analysis. Estimated concentrations were compared with the measured one in a phantom experiment and this method was applied to in vivo skin.
Shen, Chung-Wei; Chen, Yi-Hau
2015-10-01
Missing observations and covariate measurement error commonly arise in longitudinal data. However, existing methods for model selection in marginal regression analysis of longitudinal data fail to address the potential bias resulting from these issues. To tackle this problem, we propose a new model selection criterion, the Generalized Longitudinal Information Criterion, which is based on an approximately unbiased estimator for the expected quadratic error of a considered marginal model accounting for both data missingness and covariate measurement error. The simulation results reveal that the proposed method performs quite well in the presence of missing data and covariate measurement error. On the contrary, the naive procedures without taking care of such complexity in data may perform quite poorly. The proposed method is applied to data from the Taiwan Longitudinal Study on Aging to assess the relationship of depression with health and social status in the elderly, accommodating measurement error in the covariate as well as missing observations.
Mears, Lisa; Nørregaard, Rasmus; Sin, Gürkan;
2016-01-01
process operating at Novozymes A/S. Following the FUPCR methodology, the final product concentration could be predicted with an average prediction error of 7.4%. Multiple iterations of preprocessing were applied by implementing the methodology to identify the best data handling methods for the model....... It is shown that application of functional data analysis and the choice of variance scaling method have the greatest impact on the prediction accuracy. Considering the vast amount of batch process data continuously generated in industry, this methodology can potentially contribute as a tool to identify......This work proposes a methodology utilizing functional unfold principal component regression (FUPCR), for application to industrial batch process data as a process modeling and optimization tool. The methodology is applied to an industrial fermentation dataset, containing 30 batches of a production...
A Note on Penalized Regression Spline Estimation in the Secondary Analysis of Case-Control Data
Gazioglu, Suzan
2013-05-25
Primary analysis of case-control studies focuses on the relationship between disease (D) and a set of covariates of interest (Y, X). A secondary application of the case-control study, often invoked in modern genetic epidemiologic association studies, is to investigate the interrelationship between the covariates themselves. The task is complicated due to the case-control sampling, and to avoid the biased sampling that arises from the design, it is typical to use the control data only. In this paper, we develop penalized regression spline methodology that uses all the data, and improves precision of estimation compared to using only the controls. A simulation study and an empirical example are used to illustrate the methodology.
Chang-zhi CHENG
2011-06-01
Full Text Available Objective To explore the risk factors of complication of acute renal failure(ARF in war injuries of limbs.Methods The clinical data of 352 patients with limb injuries admitted to 303 Hospital of PLA from 1968 to 2002 were retrospectively analyzed.The patients were divided into ARF group(n=9 and non-ARF group(n=343 according to the occurrence of ARF,and the case-control study was carried out.Ten factors which might lead to death were analyzed by logistic regression to screen the risk factors for ARF,including causes of trauma,shock after injury,time of admission to hospital after injury,injured sites,combined trauma,number of surgical procedures,presence of foreign matters,features of fractures,amputation,and tourniquet time.Results Fifteen of the 352 patients died(4.3%,among them 7 patients(46.7% died of ARF,3(20.0% of pulmonary embolism,3(20.0% of gas gangrene,and 2(13.3% of multiple organ failure.Univariate analysis revealed that the shock,time before admitted to hospital,amputation and tourniquet time were the risk factors for ARF in the wounded with limb injuries,while the logistic regression analysis showed only amputation was the risk factor for ARF(P < 0.05.Conclusion ARF is the primary cause-of-death in the wounded with limb injury.Prompt and accurate treatment and optimal time for amputation may be beneficial to decreasing the incidence and mortality of ARF in the wounded with severe limb injury and ischemic necrosis.
Use of Bayesian event trees in semi-quantitative volcano eruption forecasting and hazard analysis
Wright, Heather; Pallister, John; Newhall, Chris
2015-04-01
Use of Bayesian event trees to forecast eruptive activity during volcano crises is an increasingly common practice for the USGS-USAID Volcano Disaster Assistance Program (VDAP) in collaboration with foreign counterparts. This semi-quantitative approach combines conceptual models of volcanic processes with current monitoring data and patterns of occurrence to reach consensus probabilities. This approach allows a response team to draw upon global datasets, local observations, and expert judgment, where the relative influence of these data depends upon the availability and quality of monitoring data and the degree to which the volcanic history is known. The construction of such event trees additionally relies upon existence and use of relevant global databases and documented past periods of unrest. Because relevant global databases may be underpopulated or nonexistent, uncertainty in probability estimations may be large. Our 'hybrid' approach of combining local and global monitoring data and expert judgment facilitates discussion and constructive debate between disciplines: including seismology, gas geochemistry, geodesy, petrology, physical volcanology and technology/engineering, where difference in opinion between response team members contributes to definition of the uncertainty in the probability estimations. In collaboration with foreign colleagues, we have created event trees for numerous areas experiencing volcanic unrest. Event trees are created for a specified time frame and are updated, revised, or replaced as the crisis proceeds. Creation of an initial tree is often prompted by a change in monitoring data, such that rapid assessment of probability is needed. These trees are intended as a vehicle for discussion and a way to document relevant data and models, where the target audience is the scientists themselves. However, the probabilities derived through the event-tree analysis can also be used to help inform communications with emergency managers and the
Ion Uptake Determination of Dendrochronologically-Dated Trees Using Neutron Activation Analysis
Kenan Unlu; P.I. Kuniholm; D.K.H. Schwarz; N.O. Cetiner; J.J. Chiment
2009-03-30
Uptake of metal ions by plan roots is a function of the type and concentration of metal in the soil, the nutrient biochemistry of the plant, and the immediate environment of the root. Uptake of gold (Au) is known to be sensitive to soil pH for many species. Soil acidification due to acid precipitation following volcanic eruptions can dramatically increase Au uptake by trees. Identification of high Au content in tree rings in dendrochronologically-dated, overlapping sequences of trees allows the identification of temporally-conscribed, volcanically-influenced periods of environmental change. Ion uptake, specifically determination of trace amounts of gold, was performed for dendrochronologically-dated tree samples utilizing Neutron Activation Analysis (NAA) technique. The concentration of gold was correlated with known enviironmental changes, e.g. volcanic activities, during historic periods.
Genetic analysis of somatic cell score in Norwegian cattle using random regression test-day models.
Odegård, J; Jensen, J; Klemetsdal, G; Madsen, P; Heringstad, B
2003-12-01
The dataset used in this analysis contained a total of 341,736 test-day observations of somatic cell scores from 77,110 primiparous daughters of 1965 Norwegian Cattle sires. Initial analyses, using simple random regression models without genetic effects, indicated that use of homogeneous residual variance was appropriate. Further analyses were carried out by use of a repeatability model and 12 random regression sire models. Legendre polynomials of varying order were used to model both permanent environmental and sire effects, as did the Wilmink function, the Lidauer-Mäntysaari function, and the Ali-Schaeffer function. For all these models, heritability estimates were lowest at the beginning (0.05 to 0.07) and higher at the end (0.09 to 0.12) of lactation. Genetic correlations between somatic cell scores early and late in lactation were moderate to high (0.38 to 0.71), whereas genetic correlations for adjacent DIM were near unity. Models were compared based on likelihood ratio tests, Bayesian information criterion, Akaike information criterion, residual variance, and predictive ability. Based on prediction of randomly excluded observations, models with 4 coefficients for permanent environmental effect were preferred over simpler models. More highly parameterized models did not substantially increase predictive ability. Evaluation of the different model selection criteria indicated that a reduced order of fit for sire effects was desireable. Models with zeroth- or first-order of fit for sire effects and higher order of fit for permanent environmental effects probably underestimated sire variance. The chosen model had Legendre polynomials with 3 coefficients for sire, and 4 coefficients for permanent environmental effects. For this model, trajectories of sire variance and heritability were similar assuming either homogeneous or heterogeneous residual variance structure.
Generalized multilevel function-on-scalar regression and principal component analysis.
Goldsmith, Jeff; Zipunnikov, Vadim; Schrack, Jennifer
2015-06-01
This manuscript considers regression models for generalized, multilevel functional responses: functions are generalized in that they follow an exponential family distribution and multilevel in that they are clustered within groups or subjects. This data structure is increasingly common across scientific domains and is exemplified by our motivating example, in which binary curves indicating physical activity or inactivity are observed for nearly 600 subjects over 5 days. We use a generalized linear model to incorporate scalar covariates into the mean structure, and decompose subject-specific and subject-day-specific deviations using multilevel functional principal components analysis. Thus, functional fixed effects are estimated while accounting for within-function and within-subject correlations, and major directions of variability within and between subjects are identified. Fixed effect coefficient functions and principal component basis functions are estimated using penalized splines; model parameters are estimated in a Bayesian framework using Stan, a programming language that implements a Hamiltonian Monte Carlo sampler. Simulations designed to mimic the application have good estimation and inferential properties with reasonable computation times for moderate datasets, in both cross-sectional and multilevel scenarios; code is publicly available. In the application we identify effects of age and BMI on the time-specific change in probability of being active over a 24-hour period; in addition, the principal components analysis identifies the patterns of activity that distinguish subjects and days within subjects.
The value of a statistical life: a meta-analysis with a mixed effects regression model.
Bellavance, François; Dionne, Georges; Lebeau, Martin
2009-03-01
The value of a statistical life (VSL) is a very controversial topic, but one which is essential to the optimization of governmental decisions. We see a great variability in the values obtained from different studies. The source of this variability needs to be understood, in order to offer public decision-makers better guidance in choosing a value and to set clearer guidelines for future research on the topic. This article presents a meta-analysis based on 39 observations obtained from 37 studies (from nine different countries) which all use a hedonic wage method to calculate the VSL. Our meta-analysis is innovative in that it is the first to use the mixed effects regression model [Raudenbush, S.W., 1994. Random effects models. In: Cooper, H., Hedges, L.V. (Eds.), The Handbook of Research Synthesis. Russel Sage Foundation, New York] to analyze studies on the value of a statistical life. We conclude that the variability found in the values studied stems in large part from differences in methodologies.
Yuliana Yuliana
2010-06-01
Full Text Available Quantitative Electronic Structure Activity Relationship (QSAR analysis of a series of benzalacetones has been investigated based on semi empirical PM3 calculation data using Principal Components Regression (PCR. Investigation has been done based on antimutagen activity from benzalacetone compounds (presented by log 1/IC50 and was studied as linear correlation with latent variables (Tx resulted from transformation of atomic net charges using Principal Component Analysis (PCA. QSAR equation was determinated based on distribution of selected components and then was analysed with PCR. The result was described by the following QSAR equation : log 1/IC50 = 6.555 + (2.177.T1 + (2.284.T2 + (1.933.T3 The equation was significant on the 95% level with statistical parameters : n = 28 r = 0.766 SE = 0.245 Fcalculation/Ftable = 3.780 and gave the PRESS result 0.002. It means that there were only a relatively few deviations between the experimental and theoretical data of antimutagenic activity. New types of benzalacetone derivative compounds were designed and their theoretical activity were predicted based on the best QSAR equation. It was found that compounds number 29, 30, 31, 32, 33, 35, 36, 37, 38, 40, 41, 42, 44, 47, 48, 49 and 50 have a relatively high antimutagenic activity. Keywords: QSAR; antimutagenic activity; benzalaceton; atomic net charge
Binary Logistic Regression Analysis of Foramen Magnum Dimensions for Sex Determination
Kamath, Venkatesh Gokuldas
2015-01-01
Purpose. The structural integrity of foramen magnum is usually preserved in fire accidents and explosions due to its resistant nature and secluded anatomical position and this study attempts to determine its sexing potential. Methods. The sagittal and transverse diameters and area of foramen magnum of seventy-two skulls (41 male and 31 female) from south Indian population were measured. The analysis was done using Student's t-test, linear correlation, histogram, Q-Q plot, and Binary Logistic Regression (BLR) to obtain a model for sex determination. The predicted probabilities of BLR were analysed using Receiver Operating Characteristic (ROC) curve. Result. BLR analysis and ROC curve revealed that the predictability of the dimensions in sexing the crania was 69.6% for sagittal diameter, 66.4% for transverse diameter, and 70.3% for area of foramen. Conclusion. The sexual dimorphism of foramen magnum dimensions is established. However, due to considerable overlapping of male and female values, it is unwise to singularly rely on the foramen measurements. However, considering the high sex predictability percentage of its dimensions in the present study and the studies preceding it, the foramen measurements can be used to supplement other sexing evidence available so as to precisely ascertain the sex of the skeleton. PMID:26346917
A calibration method of Argo floats based on multiple regression analysis
无
2006-01-01
Argo floats are free-moving floats that report vertical profiles of salinity, temperature and pressure at regular time intervals. These floats give good measurements of temperature and pressure, but salinity measurements may show significant sensor drifting with time. It is found that sensor drifting with time is not purely linear as presupposed by Wong (2003). A new method is developed to calibrate conductivity data measured by Argo floats. In this method, Wong's objective analysis method was adopted to estimate the background climatological salinity field on potential temperature surfaces from nearby historical data in WOD01. Furthermore, temperature and time factors are taken into account, and stepwise regression was used for a time-varying or temperature-varying slope in potential conductivity space to correct the drifting in these profiling float salinity data. The result shows salinity errors using this method are smaller than that of Wong's method, the quantitative and qualitative analysis of the conductivity sensor can be carried out with our method.
Rajab, Jasim Mohammed; Jafri, Mohd. Zubir Mat; Lim, Hwee San; Abdullah, Khiruddin
2012-10-01
This study encompasses air surface temperature (AST) modeling in the lower atmosphere. Data of four atmosphere pollutant gases (CO, O3, CH4, and H2O) dataset, retrieved from the National Aeronautics and Space Administration Atmospheric Infrared Sounder (AIRS), from 2003 to 2008 was employed to develop a model to predict AST value in the Malaysian peninsula using the multiple regression method. For the entire period, the pollutants were highly correlated (R=0.821) with predicted AST. Comparisons among five stations in 2009 showed close agreement between the predicted AST and the observed AST from AIRS, especially in the southwest monsoon (SWM) season, within 1.3 K, and for in situ data, within 1 to 2 K. The validation results of AST with AST from AIRS showed high correlation coefficient (R=0.845 to 0.918), indicating the model's efficiency and accuracy. Statistical analysis in terms of β showed that H2O (0.565 to 1.746) tended to contribute significantly to high AST values during the northeast monsoon season. Generally, these results clearly indicate the advantage of using the satellite AIRS data and a correlation analysis study to investigate the impact of atmospheric greenhouse gases on AST over the Malaysian peninsula. A model was developed that is capable of retrieving the Malaysian peninsulan AST in all weather conditions, with total uncertainties ranging between 1 and 2 K.
A factor analysis-multiple regression model for source apportionment of suspended particulate matter
Okamoto, Shin'ichi; Hayashi, Masayuki; Nakajima, Masaomi; Kainuma, Yasutaka; Shiozawa, Kiyoshige
A factor analysis-multiple regression (FA-MR) model has been used for a source apportionment study in the Tokyo metropolitan area. By a varimax rotated factor analysis, five source types could be identified: refuse incineration, soil and automobile, secondary particles, sea salt and steel mill. Quantitative estimations using the FA-MR model corresponded to the calculated contributing concentrations determined by using a weighted least-squares CMB model. However, the source type of refuse incineration identified by the FA-MR model was similar to that of biomass burning, rather than that produced by an incineration plant. The estimated contributions of sea salt and steel mill by the FA-MR model contained those of other sources, which have the same temporal variation of contributing concentrations. This symptom was caused by a multicollinearity problem. Although this result shows the limitation of the multivariate receptor model, it gives useful information concerning source types and their distribution by comparing with the results of the CMB model. In the Tokyo metropolitan area, the contributions from soil (including road dust), automobile, secondary particles and refuse incineration (biomass burning) were larger than industrial contributions: fuel oil combustion and steel mill. However, since vanadium is highly correlated with SO 42- and other secondary particle related elements, a major portion of secondary particles is considered to be related to fuel oil combustion.
Machine learning of swimming data via wisdom of crowd and regression analysis.
Xie, Jiang; Xu, Junfu; Nie, Celine; Nie, Qing
2017-04-01
Every performance, in an officially sanctioned meet, by a registered USA swimmer is recorded into an online database with times dating back to 1980. For the first time, statistical analysis and machine learning methods are systematically applied to 4,022,631 swim records. In this study, we investigate performance features for all strokes as a function of age and gender. The variances in performance of males and females for different ages and strokes were studied, and the correlations of performances for different ages were estimated using the Pearson correlation. Regression analysis show the performance trends for both males and females at different ages and suggest critical ages for peak training. Moreover, we assess twelve popular machine learning methods to predict or classify swimmer performance. Each method exhibited different strengths or weaknesses in different cases, indicating no one method could predict well for all strokes. To address this problem, we propose a new method by combining multiple inference methods to derive Wisdom of Crowd Classifier (WoCC). Our simulation experiments demonstrate that the WoCC is a consistent method with better overall prediction accuracy. Our study reveals several new age-dependent trends in swimming and provides an accurate method for classifying and predicting swimming times.