WorldWideScience

Sample records for regression tree approach

  1. Classification and regression trees

    CERN Document Server

    Breiman, Leo; Olshen, Richard A; Stone, Charles J

    1984-01-01

    The methodology used to construct tree structured rules is the focus of this monograph. Unlike many other statistical procedures, which moved from pencil and paper to calculators, this text's use of trees was unthinkable before computers. Both the practical and theoretical sides have been developed in the authors' study of tree methods. Classification and Regression Trees reflects these two sides, covering the use of trees as a data analysis method, and in a more mathematical framework, proving some of their fundamental properties.

  2. A Hybrid Approach of Stepwise Regression, Logistic Regression, Support Vector Machine, and Decision Tree for Forecasting Fraudulent Financial Statements

    Directory of Open Access Journals (Sweden)

    Suduan Chen

    2014-01-01

    Full Text Available As the fraudulent financial statement of an enterprise is increasingly serious with each passing day, establishing a valid forecasting fraudulent financial statement model of an enterprise has become an important question for academic research and financial practice. After screening the important variables using the stepwise regression, the study also matches the logistic regression, support vector machine, and decision tree to construct the classification models to make a comparison. The study adopts financial and nonfinancial variables to assist in establishment of the forecasting fraudulent financial statement model. Research objects are the companies to which the fraudulent and nonfraudulent financial statement happened between years 1998 to 2012. The findings are that financial and nonfinancial information are effectively used to distinguish the fraudulent financial statement, and decision tree C5.0 has the best classification effect 85.71%.

  3. A hybrid approach of stepwise regression, logistic regression, support vector machine, and decision tree for forecasting fraudulent financial statements.

    Science.gov (United States)

    Chen, Suduan; Goo, Yeong-Jia James; Shen, Zone-De

    2014-01-01

    As the fraudulent financial statement of an enterprise is increasingly serious with each passing day, establishing a valid forecasting fraudulent financial statement model of an enterprise has become an important question for academic research and financial practice. After screening the important variables using the stepwise regression, the study also matches the logistic regression, support vector machine, and decision tree to construct the classification models to make a comparison. The study adopts financial and nonfinancial variables to assist in establishment of the forecasting fraudulent financial statement model. Research objects are the companies to which the fraudulent and nonfraudulent financial statement happened between years 1998 to 2012. The findings are that financial and nonfinancial information are effectively used to distinguish the fraudulent financial statement, and decision tree C5.0 has the best classification effect 85.71%.

  4. Chronic subdural hematoma: Surgical management and outcome in 986 cases: A classification and regression tree approach

    Science.gov (United States)

    Rovlias, Aristedis; Theodoropoulos, Spyridon; Papoutsakis, Dimitrios

    2015-01-01

    Background: Chronic subdural hematoma (CSDH) is one of the most common clinical entities in daily neurosurgical practice which carries a most favorable prognosis. However, because of the advanced age and medical problems of patients, surgical therapy is frequently associated with various complications. This study evaluated the clinical features, radiological findings, and neurological outcome in a large series of patients with CSDH. Methods: A classification and regression tree (CART) technique was employed in the analysis of data from 986 patients who were operated at Asclepeion General Hospital of Athens from January 1986 to December 2011. Burr holes evacuation with closed system drainage has been the operative technique of first choice at our institution for 29 consecutive years. A total of 27 prognostic factors were examined to predict the outcome at 3-month postoperatively. Results: Our results indicated that neurological status on admission was the best predictor of outcome. With regard to the other data, age, brain atrophy, thickness and density of hematoma, subdural accumulation of air, and antiplatelet and anticoagulant therapy were found to correlate significantly with prognosis. The overall cross-validated predictive accuracy of CART model was 85.34%, with a cross-validated relative error of 0.326. Conclusions: Methodologically, CART technique is quite different from the more commonly used methods, with the primary benefit of illustrating the important prognostic variables as related to outcome. Since, the ideal therapy for the treatment of CSDH is still under debate, this technique may prove useful in developing new therapeutic strategies and approaches for patients with CSDH. PMID:26257985

  5. City housing atmospheric pollutant impact on emergency visit for asthma: A classification and regression tree approach.

    Science.gov (United States)

    Mazenq, Julie; Dubus, Jean-Christophe; Gaudart, Jean; Charpin, Denis; Viudes, Gilles; Noel, Guilhem

    2017-11-01

    Particulate matter, nitrogen dioxide (NO 2 ) and ozone are recognized as the three pollutants that most significantly affect human health. Asthma is a multifactorial disease. However, the place of residence has rarely been investigated. We compared the impact of air pollution, measured near patients' homes, on emergency department (ED) visits for asthma or trauma (controls) within the Provence-Alpes-Côte-d'Azur region. Variables were selected using classification and regression trees on asthmatic and control population, 3-99 years, visiting ED from January 1 to December 31, 2013. Then in a nested case control study, randomization was based on the day of ED visit and on defined age groups. Pollution, meteorological, pollens and viral data measured that day were linked to the patient's ZIP code. A total of 794,884 visits were reported including 6250 for asthma and 278,192 for trauma. Factors associated with an excess risk of emergency visit for asthma included short-term exposure to NO 2 , female gender, high viral load and a combination of low temperature and high humidity. Short-term exposures to high NO 2 concentrations, as assessed close to the homes of the patients, were significantly associated with asthma-related ED visits in children and adults. Copyright © 2017 Elsevier Ltd. All rights reserved.

  6. Analysis of the impact of recreational trail usage for prioritising management decisions: a regression tree approach

    Science.gov (United States)

    Tomczyk, Aleksandra; Ewertowski, Marek; White, Piran; Kasprzak, Leszek

    2016-04-01

    The dual role of many Protected Natural Areas in providing benefits for both conservation and recreation poses challenges for management. Although recreation-based damage to ecosystems can occur very quickly, restoration can take many years. The protection of conservation interests at the same as providing for recreation requires decisions to be made about how to prioritise and direct management actions. Trails are commonly used to divert visitors from the most important areas of a site, but high visitor pressure can lead to increases in trail width and a concomitant increase in soil erosion. Here we use detailed field data on condition of recreational trails in Gorce National Park, Poland, as the basis for a regression tree analysis to determine the factors influencing trail deterioration, and link specific trail impacts with environmental, use related and managerial factors. We distinguished 12 types of trails, characterised by four levels of degradation: (1) trails with an acceptable level of degradation; (2) threatened trails; (3) damaged trails; and (4) heavily damaged trails. Damaged trails were the most vulnerable of all trails and should be prioritised for appropriate conservation and restoration. We also proposed five types of monitoring of recreational trail conditions: (1) rapid inventory of negative impacts; (2) monitoring visitor numbers and variation in type of use; (3) change-oriented monitoring focusing on sections of trail which were subjected to changes in type or level of use or subjected to extreme weather events; (4) monitoring of dynamics of trail conditions; and (5) full assessment of trail conditions, to be carried out every 10-15 years. The application of the proposed framework can enhance the ability of Park managers to prioritise their trail management activities, enhancing trail conditions and visitor safety, while minimising adverse impacts on the conservation value of the ecosystem. A.M.T. was supported by the Polish Ministry of

  7. Spatial prediction of landslides using a hybrid machine learning approach based on Random Subspace and Classification and Regression Trees

    Science.gov (United States)

    Pham, Binh Thai; Prakash, Indra; Tien Bui, Dieu

    2018-02-01

    A hybrid machine learning approach of Random Subspace (RSS) and Classification And Regression Trees (CART) is proposed to develop a model named RSSCART for spatial prediction of landslides. This model is a combination of the RSS method which is known as an efficient ensemble technique and the CART which is a state of the art classifier. The Luc Yen district of Yen Bai province, a prominent landslide prone area of Viet Nam, was selected for the model development. Performance of the RSSCART model was evaluated through the Receiver Operating Characteristic (ROC) curve, statistical analysis methods, and the Chi Square test. Results were compared with other benchmark landslide models namely Support Vector Machines (SVM), single CART, Naïve Bayes Trees (NBT), and Logistic Regression (LR). In the development of model, ten important landslide affecting factors related with geomorphology, geology and geo-environment were considered namely slope angles, elevation, slope aspect, curvature, lithology, distance to faults, distance to rivers, distance to roads, and rainfall. Performance of the RSSCART model (AUC = 0.841) is the best compared with other popular landslide models namely SVM (0.835), single CART (0.822), NBT (0.821), and LR (0.723). These results indicate that performance of the RSSCART is a promising method for spatial landslide prediction.

  8. Dynamic travel time estimation using regression trees.

    Science.gov (United States)

    2008-10-01

    This report presents a methodology for travel time estimation by using regression trees. The dissemination of travel time information has become crucial for effective traffic management, especially under congested road conditions. In the absence of c...

  9. Regression analysis using dependent Polya trees.

    Science.gov (United States)

    Schörgendorfer, Angela; Branscum, Adam J

    2013-11-30

    Many commonly used models for linear regression analysis force overly simplistic shape and scale constraints on the residual structure of data. We propose a semiparametric Bayesian model for regression analysis that produces data-driven inference by using a new type of dependent Polya tree prior to model arbitrary residual distributions that are allowed to evolve across increasing levels of an ordinal covariate (e.g., time, in repeated measurement studies). By modeling residual distributions at consecutive covariate levels or time points using separate, but dependent Polya tree priors, distributional information is pooled while allowing for broad pliability to accommodate many types of changing residual distributions. We can use the proposed dependent residual structure in a wide range of regression settings, including fixed-effects and mixed-effects linear and nonlinear models for cross-sectional, prospective, and repeated measurement data. A simulation study illustrates the flexibility of our novel semiparametric regression model to accurately capture evolving residual distributions. In an application to immune development data on immunoglobulin G antibodies in children, our new model outperforms several contemporary semiparametric regression models based on a predictive model selection criterion. Copyright © 2013 John Wiley & Sons, Ltd.

  10. Short-term load forecasting with increment regression tree

    Energy Technology Data Exchange (ETDEWEB)

    Yang, Jingfei; Stenzel, Juergen [Darmstadt University of Techonology, Darmstadt 64283 (Germany)

    2006-06-15

    This paper presents a new regression tree method for short-term load forecasting. Both increment and non-increment tree are built according to the historical data to provide the data space partition and input variable selection. Support vector machine is employed to the samples of regression tree nodes for further fine regression. Results of different tree nodes are integrated through weighted average method to obtain the comprehensive forecasting result. The effectiveness of the proposed method is demonstrated through its application to an actual system. (author)

  11. Estimation of Tree Cover in an Agricultural Parkland of Senegal Using Rule-Based Regression Tree Modeling

    Directory of Open Access Journals (Sweden)

    Stefanie M. Herrmann

    2013-10-01

    Full Text Available Field trees are an integral part of the farmed parkland landscape in West Africa and provide multiple benefits to the local environment and livelihoods. While field trees have received increasing interest in the context of strengthening resilience to climate variability and change, the actual extent of farmed parkland and spatial patterns of tree cover are largely unknown. We used the rule-based predictive modeling tool Cubist® to estimate field tree cover in the west-central agricultural region of Senegal. A collection of rules and associated multiple linear regression models was constructed from (1 a reference dataset of percent tree cover derived from very high spatial resolution data (2 m Orbview as the dependent variable, and (2 ten years of 10-day 250 m Moderate Resolution Imaging Spectrometer (MODIS Normalized Difference Vegetation Index (NDVI composites and derived phenological metrics as independent variables. Correlation coefficients between modeled and reference percent tree cover of 0.88 and 0.77 were achieved for training and validation data respectively, with absolute mean errors of 1.07 and 1.03 percent tree cover. The resulting map shows a west-east gradient from high tree cover in the peri-urban areas of horticulture and arboriculture to low tree cover in the more sparsely populated eastern part of the study area. A comparison of current (2000s tree cover along this gradient with historic cover as seen on Corona images reveals dynamics of change but also areas of remarkable stability of field tree cover since 1968. The proposed modeling approach can help to identify locations of high and low tree cover in dryland environments and guide ground studies and management interventions aimed at promoting the integration of field trees in agricultural systems.

  12. Iron Supplementation and Altitude: Decision Making Using a Regression Tree

    Directory of Open Access Journals (Sweden)

    Laura A. Garvican-Lewis, Andrew D. Govus, Peter Peeling, Chris R. Abbiss, Christopher J. Gore

    2016-03-01

    not increase in the 8 athletes with Ferritin-Pre <35 µg.L-1 who were supplemented with 105 mg.d-1, it is possible that in these athletes this dose was insufficient to support erythropoiesis. Furthermore, athletes with higher Ferritin-Pre, but large daily iron requirements, may also benefit from supplementation at altitude. The lack of improvement in Hbmass in non-supplemented athletes, may be related to reduced iron availability and mobility, despite plentiful iron stores, perhaps due to an increase in hepcidin post-exercise (Peeling, 2010. In summary, our regression tree suggests if sufficient iron is made available (via supplementation, even ID athletes can improve Hbmass in response to altitude. Some athletes with otherwise normal ferritin may also require supplementation to maintain an iron balance capable of supporting both the haematological and non-haematological adaptations to altitude. We recommend an individualised approach when deciding whether iron supplementation is appropriate, particularly concerning the dose provided.

  13. Sub-pixel estimation of tree cover and bare surface densities using regression tree analysis

    Directory of Open Access Journals (Sweden)

    Carlos Augusto Zangrando Toneli

    2011-09-01

    Full Text Available Sub-pixel analysis is capable of generating continuous fields, which represent the spatial variability of certain thematic classes. The aim of this work was to develop numerical models to represent the variability of tree cover and bare surfaces within the study area. This research was conducted in the riparian buffer within a watershed of the São Francisco River in the North of Minas Gerais, Brazil. IKONOS and Landsat TM imagery were used with the GUIDE algorithm to construct the models. The results were two index images derived with regression trees for the entire study area, one representing tree cover and the other representing bare surface. The use of non-parametric and non-linear regression tree models presented satisfactory results to characterize wetland, deciduous and savanna patterns of forest formation.

  14. What Satisfies Students?: Mining Student-Opinion Data with Regression and Decision Tree Analysis

    Science.gov (United States)

    Thomas, Emily H.; Galambos, Nora

    2004-01-01

    To investigate how students' characteristics and experiences affect satisfaction, this study uses regression and decision tree analysis with the CHAID algorithm to analyze student-opinion data. A data mining approach identifies the specific aspects of students' university experience that most influence three measures of general satisfaction. The…

  15. Fuzzy multiple linear regression: A computational approach

    Science.gov (United States)

    Juang, C. H.; Huang, X. H.; Fleming, J. W.

    1992-01-01

    This paper presents a new computational approach for performing fuzzy regression. In contrast to Bardossy's approach, the new approach, while dealing with fuzzy variables, closely follows the conventional regression technique. In this approach, treatment of fuzzy input is more 'computational' than 'symbolic.' The following sections first outline the formulation of the new approach, then deal with the implementation and computational scheme, and this is followed by examples to illustrate the new procedure.

  16. Prediction of radiation levels in residences: A methodological comparison of CART [Classification and Regression Tree Analysis] and conventional regression

    International Nuclear Information System (INIS)

    Janssen, I.; Stebbings, J.H.

    1990-01-01

    In environmental epidemiology, trace and toxic substance concentrations frequently have very highly skewed distributions ranging over one or more orders of magnitude, and prediction by conventional regression is often poor. Classification and Regression Tree Analysis (CART) is an alternative in such contexts. To compare the techniques, two Pennsylvania data sets and three independent variables are used: house radon progeny (RnD) and gamma levels as predicted by construction characteristics in 1330 houses; and ∼200 house radon (Rn) measurements as predicted by topographic parameters. CART may identify structural variables of interest not identified by conventional regression, and vice versa, but in general the regression models are similar. CART has major advantages in dealing with other common characteristics of environmental data sets, such as missing values, continuous variables requiring transformations, and large sets of potential independent variables. CART is most useful in the identification and screening of independent variables, greatly reducing the need for cross-tabulations and nested breakdown analyses. There is no need to discard cases with missing values for the independent variables because surrogate variables are intrinsic to CART. The tree-structured approach is also independent of the scale on which the independent variables are measured, so that transformations are unnecessary. CART identifies important interactions as well as main effects. The major advantages of CART appear to be in exploring data. Once the important variables are identified, conventional regressions seem to lead to results similar but more interpretable by most audiences. 12 refs., 8 figs., 10 tabs

  17. Comparative study of biodegradability prediction of chemicals using decision trees, functional trees, and logistic regression.

    Science.gov (United States)

    Chen, Guangchao; Li, Xuehua; Chen, Jingwen; Zhang, Ya-Nan; Peijnenburg, Willie J G M

    2014-12-01

    Biodegradation is the principal environmental dissipation process of chemicals. As such, it is a dominant factor determining the persistence and fate of organic chemicals in the environment, and is therefore of critical importance to chemical management and regulation. In the present study, the authors developed in silico methods assessing biodegradability based on a large heterogeneous set of 825 organic compounds, using the techniques of the C4.5 decision tree, the functional inner regression tree, and logistic regression. External validation was subsequently carried out by 2 independent test sets of 777 and 27 chemicals. As a result, the functional inner regression tree exhibited the best predictability with predictive accuracies of 81.5% and 81.0%, respectively, on the training set (825 chemicals) and test set I (777 chemicals). Performance of the developed models on the 2 test sets was subsequently compared with that of the Estimation Program Interface (EPI) Suite Biowin 5 and Biowin 6 models, which also showed a better predictability of the functional inner regression tree model. The model built in the present study exhibits a reasonable predictability compared with existing models while possessing a transparent algorithm. Interpretation of the mechanisms of biodegradation was also carried out based on the models developed. © 2014 SETAC.

  18. Assessment of wastewater treatment facility compliance with decreasing ammonia discharge limits using a regression tree model.

    Science.gov (United States)

    Suchetana, Bihu; Rajagopalan, Balaji; Silverstein, JoAnn

    2017-11-15

    A regression tree-based diagnostic approach is developed to evaluate factors affecting US wastewater treatment plant compliance with ammonia discharge permit limits using Discharge Monthly Report (DMR) data from a sample of 106 municipal treatment plants for the period of 2004-2008. Predictor variables used to fit the regression tree are selected using random forests, and consist of the previous month's effluent ammonia, influent flow rates and plant capacity utilization. The tree models are first used to evaluate compliance with existing ammonia discharge standards at each facility and then applied assuming more stringent discharge limits, under consideration in many states. The model predicts that the ability to meet both current and future limits depends primarily on the previous month's treatment performance. With more stringent discharge limits predicted ammonia concentration relative to the discharge limit, increases. In-sample validation shows that the regression trees can provide a median classification accuracy of >70%. The regression tree model is validated using ammonia discharge data from an operating wastewater treatment plant and is able to accurately predict the observed ammonia discharge category approximately 80% of the time, indicating that the regression tree model can be applied to predict compliance for individual treatment plants providing practical guidance for utilities and regulators with an interest in controlling ammonia discharges. The proposed methodology is also used to demonstrate how to delineate reliable sources of demand and supply in a point source-to-point source nutrient credit trading scheme, as well as how planners and decision makers can set reasonable discharge limits in future. Copyright © 2017 Elsevier B.V. All rights reserved.

  19. Prediction of tissue-specific cis-regulatory modules using Bayesian networks and regression trees

    Directory of Open Access Journals (Sweden)

    Chen Xiaoyu

    2007-12-01

    Full Text Available Abstract Background In vertebrates, a large part of gene transcriptional regulation is operated by cis-regulatory modules. These modules are believed to be regulating much of the tissue-specificity of gene expression. Results We develop a Bayesian network approach for identifying cis-regulatory modules likely to regulate tissue-specific expression. The network integrates predicted transcription factor binding site information, transcription factor expression data, and target gene expression data. At its core is a regression tree modeling the effect of combinations of transcription factors bound to a module. A new unsupervised EM-like algorithm is developed to learn the parameters of the network, including the regression tree structure. Conclusion Our approach is shown to accurately identify known human liver and erythroid-specific modules. When applied to the prediction of tissue-specific modules in 10 different tissues, the network predicts a number of important transcription factor combinations whose concerted binding is associated to specific expression.

  20. Regression Trees Identify Relevant Interactions: Can This Improve the Predictive Performance of Risk Adjustment?

    Science.gov (United States)

    Buchner, Florian; Wasem, Jürgen; Schillo, Sonja

    2017-01-01

    Risk equalization formulas have been refined since their introduction about two decades ago. Because of the complexity and the abundance of possible interactions between the variables used, hardly any interactions are considered. A regression tree is used to systematically search for interactions, a methodologically new approach in risk equalization. Analyses are based on a data set of nearly 2.9 million individuals from a major German social health insurer. A two-step approach is applied: In the first step a regression tree is built on the basis of the learning data set. Terminal nodes characterized by more than one morbidity-group-split represent interaction effects of different morbidity groups. In the second step the 'traditional' weighted least squares regression equation is expanded by adding interaction terms for all interactions detected by the tree, and regression coefficients are recalculated. The resulting risk adjustment formula shows an improvement in the adjusted R 2 from 25.43% to 25.81% on the evaluation data set. Predictive ratios are calculated for subgroups affected by the interactions. The R 2 improvement detected is only marginal. According to the sample level performance measures used, not involving a considerable number of morbidity interactions forms no relevant loss in accuracy. Copyright © 2015 John Wiley & Sons, Ltd. Copyright © 2015 John Wiley & Sons, Ltd.

  1. Forecasting exchange rates: a robust regression approach

    OpenAIRE

    Preminger, Arie; Franck, Raphael

    2005-01-01

    The least squares estimation method as well as other ordinary estimation method for regression models can be severely affected by a small number of outliers, thus providing poor out-of-sample forecasts. This paper suggests a robust regression approach, based on the S-estimation method, to construct forecasting models that are less sensitive to data contamination by outliers. A robust linear autoregressive (RAR) and a robust neural network (RNN) models are estimated to study the predictabil...

  2. Data to support "Boosted Regression Tree Models to Explain Watershed Nutrient Concentrations & Biological Condition"

    Data.gov (United States)

    U.S. Environmental Protection Agency — Spreadsheets are included here to support the manuscript "Boosted Regression Tree Models to Explain Watershed Nutrient Concentrations and Biological Condition". This...

  3. Comparing Methodologies for Developing an Early Warning System: Classification and Regression Tree Model versus Logistic Regression. REL 2015-077

    Science.gov (United States)

    Koon, Sharon; Petscher, Yaacov

    2015-01-01

    The purpose of this report was to explicate the use of logistic regression and classification and regression tree (CART) analysis in the development of early warning systems. It was motivated by state education leaders' interest in maintaining high classification accuracy while simultaneously improving practitioner understanding of the rules by…

  4. Market-based approaches to tree valuation

    Science.gov (United States)

    Geoffrey H. Donovan; David T. Butry

    2008-01-01

    A recent four-part series in Arborist News outlined different appraisal processes used to value urban trees. The final article in the series described the three generally accepted approaches to tree valuation: the sales comparison approach, the cost approach, and the income capitalization approach. The author, D. Logan Nelson, noted that the sales comparison approach...

  5. Using ROC curves to compare neural networks and logistic regression for modeling individual noncatastrophic tree mortality

    Science.gov (United States)

    Susan L. King

    2003-01-01

    The performance of two classifiers, logistic regression and neural networks, are compared for modeling noncatastrophic individual tree mortality for 21 species of trees in West Virginia. The output of the classifier is usually a continuous number between 0 and 1. A threshold is selected between 0 and 1 and all of the trees below the threshold are classified as...

  6. Regression: The Apple Does Not Fall Far From the Tree.

    Science.gov (United States)

    Vetter, Thomas R; Schober, Patrick

    2018-05-15

    Researchers and clinicians are frequently interested in either: (1) assessing whether there is a relationship or association between 2 or more variables and quantifying this association; or (2) determining whether 1 or more variables can predict another variable. The strength of such an association is mainly described by the correlation. However, regression analysis and regression models can be used not only to identify whether there is a significant relationship or association between variables but also to generate estimations of such a predictive relationship between variables. This basic statistical tutorial discusses the fundamental concepts and techniques related to the most common types of regression analysis and modeling, including simple linear regression, multiple regression, logistic regression, ordinal regression, and Poisson regression, as well as the common yet often underrecognized phenomenon of regression toward the mean. The various types of regression analysis are powerful statistical techniques, which when appropriately applied, can allow for the valid interpretation of complex, multifactorial data. Regression analysis and models can assess whether there is a relationship or association between 2 or more observed variables and estimate the strength of this association, as well as determine whether 1 or more variables can predict another variable. Regression is thus being applied more commonly in anesthesia, perioperative, critical care, and pain research. However, it is crucial to note that regression can identify plausible risk factors; it does not prove causation (a definitive cause and effect relationship). The results of a regression analysis instead identify independent (predictor) variable(s) associated with the dependent (outcome) variable. As with other statistical methods, applying regression requires that certain assumptions be met, which can be tested with specific diagnostics.

  7. Building interpretable predictive models for pediatric hospital readmission using Tree-Lasso logistic regression.

    Science.gov (United States)

    Jovanovic, Milos; Radovanovic, Sandro; Vukicevic, Milan; Van Poucke, Sven; Delibasic, Boris

    2016-09-01

    Quantification and early identification of unplanned readmission risk have the potential to improve the quality of care during hospitalization and after discharge. However, high dimensionality, sparsity, and class imbalance of electronic health data and the complexity of risk quantification, challenge the development of accurate predictive models. Predictive models require a certain level of interpretability in order to be applicable in real settings and create actionable insights. This paper aims to develop accurate and interpretable predictive models for readmission in a general pediatric patient population, by integrating a data-driven model (sparse logistic regression) and domain knowledge based on the international classification of diseases 9th-revision clinical modification (ICD-9-CM) hierarchy of diseases. Additionally, we propose a way to quantify the interpretability of a model and inspect the stability of alternative solutions. The analysis was conducted on >66,000 pediatric hospital discharge records from California, State Inpatient Databases, Healthcare Cost and Utilization Project between 2009 and 2011. We incorporated domain knowledge based on the ICD-9-CM hierarchy in a data driven, Tree-Lasso regularized logistic regression model, providing the framework for model interpretation. This approach was compared with traditional Lasso logistic regression resulting in models that are easier to interpret by fewer high-level diagnoses, with comparable prediction accuracy. The results revealed that the use of a Tree-Lasso model was as competitive in terms of accuracy (measured by area under the receiver operating characteristic curve-AUC) as the traditional Lasso logistic regression, but integration with the ICD-9-CM hierarchy of diseases provided more interpretable models in terms of high-level diagnoses. Additionally, interpretations of models are in accordance with existing medical understanding of pediatric readmission. Best performing models have

  8. Prediction of survival to discharge following cardiopulmonary resuscitation using classification and regression trees.

    Science.gov (United States)

    Ebell, Mark H; Afonso, Anna M; Geocadin, Romergryko G

    2013-12-01

    To predict the likelihood that an inpatient who experiences cardiopulmonary arrest and undergoes cardiopulmonary resuscitation survives to discharge with good neurologic function or with mild deficits (Cerebral Performance Category score = 1). Classification and Regression Trees were used to develop branching algorithms that optimize the ability of a series of tests to correctly classify patients into two or more groups. Data from 2007 to 2008 (n = 38,092) were used to develop candidate Classification and Regression Trees models to predict the outcome of inpatient cardiopulmonary resuscitation episodes and data from 2009 (n = 14,435) to evaluate the accuracy of the models and judge the degree of over fitting. Both supervised and unsupervised approaches to model development were used. 366 hospitals participating in the Get With the Guidelines-Resuscitation registry. Adult inpatients experiencing an index episode of cardiopulmonary arrest and undergoing cardiopulmonary resuscitation in the hospital. The five candidate models had between 8 and 21 nodes and an area under the receiver operating characteristic curve from 0.718 to 0.766 in the derivation group and from 0.683 to 0.746 in the validation group. One of the supervised models had 14 nodes and classified 27.9% of patients as very unlikely to survive neurologically intact or with mild deficits (Tree models that predict survival to discharge with good neurologic function or with mild deficits following in-hospital cardiopulmonary arrest. Models like this can assist physicians and patients who are considering do-not-resuscitate orders.

  9. Malignancy Risk Assessment in Patients with Thyroid Nodules Using Classification and Regression Trees

    Directory of Open Access Journals (Sweden)

    Shokouh Taghipour Zahir

    2013-01-01

    Full Text Available Purpose. We sought to investigate the utility of classification and regression trees (CART classifier to differentiate benign from malignant nodules in patients referred for thyroid surgery. Methods. Clinical and demographic data of 271 patients referred to the Sadoughi Hospital during 2006–2011 were collected. In a two-step approach, a CART classifier was employed to differentiate patients with a high versus low risk of thyroid malignancy. The first step served as the screening procedure and was tailored to produce as few false negatives as possible. The second step identified those with the lowest risk of malignancy, chosen from a high risk population. Sensitivity, specificity, positive and negative predictive values (PPV and NPV of the optimal tree were calculated. Results. In the first step, age, sex, and nodule size contributed to the optimal tree. Ultrasonographic features were employed in the second step with hypoechogenicity and/or microcalcifications yielding the highest discriminatory ability. The combined tree produced a sensitivity and specificity of 80.0% (95% CI: 29.9–98.9 and 94.1% (95% CI: 78.9–99.0, respectively. NPV and PPV were 66.7% (41.1–85.6 and 97.0% (82.5–99.8, respectively. Conclusion. CART classifier reliably identifies patients with a low risk of malignancy who can avoid unnecessary surgery.

  10. Regression Nodes: Extending attack trees with data from social sciences

    NARCIS (Netherlands)

    Bullee, Jan-Willem; Montoya, L.; Pieters, Wolter; Junger, Marianne; Hartel, Pieter H.

    In the field of security, attack trees are often used to assess security vulnerabilities probabilistically in relation to multi-step attacks. The nodes are usually connected via AND-gates, where all children must be executed, or via OR-gates, where only one action is necessary for the attack step to

  11. The process and utility of classification and regression tree methodology in nursing research.

    Science.gov (United States)

    Kuhn, Lisa; Page, Karen; Ward, John; Worrall-Carter, Linda

    2014-06-01

    This paper presents a discussion of classification and regression tree analysis and its utility in nursing research. Classification and regression tree analysis is an exploratory research method used to illustrate associations between variables not suited to traditional regression analysis. Complex interactions are demonstrated between covariates and variables of interest in inverted tree diagrams. Discussion paper. English language literature was sourced from eBooks, Medline Complete and CINAHL Plus databases, Google and Google Scholar, hard copy research texts and retrieved reference lists for terms including classification and regression tree* and derivatives and recursive partitioning from 1984-2013. Classification and regression tree analysis is an important method used to identify previously unknown patterns amongst data. Whilst there are several reasons to embrace this method as a means of exploratory quantitative research, issues regarding quality of data as well as the usefulness and validity of the findings should be considered. Classification and regression tree analysis is a valuable tool to guide nurses to reduce gaps in the application of evidence to practice. With the ever-expanding availability of data, it is important that nurses understand the utility and limitations of the research method. Classification and regression tree analysis is an easily interpreted method for modelling interactions between health-related variables that would otherwise remain obscured. Knowledge is presented graphically, providing insightful understanding of complex and hierarchical relationships in an accessible and useful way to nursing and other health professions. © 2013 The Authors. Journal of Advanced Nursing Published by John Wiley & Sons Ltd.

  12. Application of Boosting Regression Trees to Preliminary Cost Estimation in Building Construction Projects

    Directory of Open Access Journals (Sweden)

    Yoonseok Shin

    2015-01-01

    Full Text Available Among the recent data mining techniques available, the boosting approach has attracted a great deal of attention because of its effective learning algorithm and strong boundaries in terms of its generalization performance. However, the boosting approach has yet to be used in regression problems within the construction domain, including cost estimations, but has been actively utilized in other domains. Therefore, a boosting regression tree (BRT is applied to cost estimations at the early stage of a construction project to examine the applicability of the boosting approach to a regression problem within the construction domain. To evaluate the performance of the BRT model, its performance was compared with that of a neural network (NN model, which has been proven to have a high performance in cost estimation domains. The BRT model has shown results similar to those of NN model using 234 actual cost datasets of a building construction project. In addition, the BRT model can provide additional information such as the importance plot and structure model, which can support estimators in comprehending the decision making process. Consequently, the boosting approach has potential applicability in preliminary cost estimations in a building construction project.

  13. What Satisfies Students? Mining Student-Opinion Data with Regression and Decision-Tree Analysis. AIR 2002 Forum Paper.

    Science.gov (United States)

    Thomas, Emily H.; Galambos, Nora

    To investigate how students' characteristics and experiences affect satisfaction, this study used regression and decision-tree analysis with the CHAID algorithm to analyze student opinion data from a sample of 1,783 college students. A data-mining approach identifies the specific aspects of students' university experience that most influence three…

  14. Crime Modeling using Spatial Regression Approach

    Science.gov (United States)

    Saleh Ahmar, Ansari; Adiatma; Kasim Aidid, M.

    2018-01-01

    Act of criminality in Indonesia increased both variety and quantity every year. As murder, rape, assault, vandalism, theft, fraud, fencing, and other cases that make people feel unsafe. Risk of society exposed to crime is the number of reported cases in the police institution. The higher of the number of reporter to the police institution then the number of crime in the region is increasing. In this research, modeling criminality in South Sulawesi, Indonesia with the dependent variable used is the society exposed to the risk of crime. Modelling done by area approach is the using Spatial Autoregressive (SAR) and Spatial Error Model (SEM) methods. The independent variable used is the population density, the number of poor population, GDP per capita, unemployment and the human development index (HDI). Based on the analysis using spatial regression can be shown that there are no dependencies spatial both lag or errors in South Sulawesi.

  15. Aproximación a la metodología basada en árboles de decisión (CART: Mortalidad hospitalaria del infarto agudo de miocardio Approach to the methodology of classification and regression trees

    Directory of Open Access Journals (Sweden)

    Javier Trujillano

    2008-02-01

    Full Text Available Objetivo: : Realizar una aproximación a la metodología de árboles de decisión tipo CART (Classification and Regression Trees desarrollando un modelo para calcular la probabilidad de muerte hospitalaria en infarto agudo de miocardio (IAM. Método: Se utiliza el conjunto mínimo básico de datos al alta hospitalaria (CMBD de Andalucía, Cataluña, Madrid y País Vasco de los años 2001 y 2002, que incluye los casos con IAM como diagnóstico principal. Los 33.203 pacientes se dividen aleatoriamente (70 y 30 % en grupo de desarrollo (GD = 23.277 y grupo de validación (GV = 9.926. Como CART se utiliza un modelo inductivo basado en el algoritmo de Breiman, con análisis de sensibilidad mediante el índice de Gini y sistema de validación cruzada. Se compara con un modelo de regresión logística (RL y una red neuronal artificial (RNA (multilayer perceptron. Los modelos desarrollados se contrastan en el GV y sus propiedades se comparan con el área bajo la curva ROC (ABC (intervalo de confianza del 95%. Resultados: En el GD el CART con ABC = 0,85 (0,86-0,88, RL 0,87 (0,86-0,88 y RNA 0,85 (0,85-0,86. En el GV el CART con ABC = 0,85 (0,85-0,88, RL 0,86 (0,85-0,88 y RNA 0,84 (0,83-0,86. Conclusiones: Los 3 modelos obtienen resultados similares en su capacidad de discriminación. El modelo CART ofrece como ventaja su simplicidad de uso y de interpretación, ya que las reglas de decisión que generan pueden aplicarse sin necesidad de procesos matemáticos.Objective: To provide an overview of decision trees based on CART (Classification and Regression Trees methodology. As an example, we developed a CART model intended to estimate the probability of intrahospital death from acute myocardial infarction (AMI. Method: We employed the minimum data set (MDS of Andalusia, Catalonia, Madrid and the Basque Country (2001-2002, which included 33,203 patients with a diagnosis of AMI. The 33,203 patients were randomly divided (70% and 30% into the development (DS

  16. Estimating cavity tree and snag abundance using negative binomial regression models and nearest neighbor imputation methods

    Science.gov (United States)

    Bianca N.I. Eskelson; Hailemariam Temesgen; Tara M. Barrett

    2009-01-01

    Cavity tree and snag abundance data are highly variable and contain many zero observations. We predict cavity tree and snag abundance from variables that are readily available from forest cover maps or remotely sensed data using negative binomial (NB), zero-inflated NB, and zero-altered NB (ZANB) regression models as well as nearest neighbor (NN) imputation methods....

  17. Applied Regression Modeling A Business Approach

    CERN Document Server

    Pardoe, Iain

    2012-01-01

    An applied and concise treatment of statistical regression techniques for business students and professionals who have little or no background in calculusRegression analysis is an invaluable statistical methodology in business settings and is vital to model the relationship between a response variable and one or more predictor variables, as well as the prediction of a response value given values of the predictors. In view of the inherent uncertainty of business processes, such as the volatility of consumer spending and the presence of market uncertainty, business professionals use regression a

  18. Boosted regression trees, multivariate adaptive regression splines and their two-step combinations with multiple linear regression or partial least squares to predict blood-brain barrier passage: a case study.

    Science.gov (United States)

    Deconinck, E; Zhang, M H; Petitet, F; Dubus, E; Ijjaali, I; Coomans, D; Vander Heyden, Y

    2008-02-18

    The use of some unconventional non-linear modeling techniques, i.e. classification and regression trees and multivariate adaptive regression splines-based methods, was explored to model the blood-brain barrier (BBB) passage of drugs and drug-like molecules. The data set contains BBB passage values for 299 structural and pharmacological diverse drugs, originating from a structured knowledge-based database. Models were built using boosted regression trees (BRT) and multivariate adaptive regression splines (MARS), as well as their respective combinations with stepwise multiple linear regression (MLR) and partial least squares (PLS) regression in two-step approaches. The best models were obtained using combinations of MARS with either stepwise MLR or PLS. It could be concluded that the use of combinations of a linear with a non-linear modeling technique results in some improved properties compared to the individual linear and non-linear models and that, when the use of such a combination is appropriate, combinations using MARS as non-linear technique should be preferred over those with BRT, due to some serious drawbacks of the BRT approaches.

  19. A stepwise regression tree for nonlinear approximation: applications to estimating subpixel land cover

    Science.gov (United States)

    Huang, C.; Townshend, J.R.G.

    2003-01-01

    A stepwise regression tree (SRT) algorithm was developed for approximating complex nonlinear relationships. Based on the regression tree of Breiman et al . (BRT) and a stepwise linear regression (SLR) method, this algorithm represents an improvement over SLR in that it can approximate nonlinear relationships and over BRT in that it gives more realistic predictions. The applicability of this method to estimating subpixel forest was demonstrated using three test data sets, on all of which it gave more accurate predictions than SLR and BRT. SRT also generated more compact trees and performed better than or at least as well as BRT at all 10 equal forest proportion interval ranging from 0 to 100%. This method is appealing to estimating subpixel land cover over large areas.

  20. Aneurysmal subarachnoid hemorrhage prognostic decision-making algorithm using classification and regression tree analysis.

    Science.gov (United States)

    Lo, Benjamin W Y; Fukuda, Hitoshi; Angle, Mark; Teitelbaum, Jeanne; Macdonald, R Loch; Farrokhyar, Forough; Thabane, Lehana; Levine, Mitchell A H

    2016-01-01

    Classification and regression tree analysis involves the creation of a decision tree by recursive partitioning of a dataset into more homogeneous subgroups. Thus far, there is scarce literature on using this technique to create clinical prediction tools for aneurysmal subarachnoid hemorrhage (SAH). The classification and regression tree analysis technique was applied to the multicenter Tirilazad database (3551 patients) in order to create the decision-making algorithm. In order to elucidate prognostic subgroups in aneurysmal SAH, neurologic, systemic, and demographic factors were taken into account. The dependent variable used for analysis was the dichotomized Glasgow Outcome Score at 3 months. Classification and regression tree analysis revealed seven prognostic subgroups. Neurological grade, occurrence of post-admission stroke, occurrence of post-admission fever, and age represented the explanatory nodes of this decision tree. Split sample validation revealed classification accuracy of 79% for the training dataset and 77% for the testing dataset. In addition, the occurrence of fever at 1-week post-aneurysmal SAH is associated with increased odds of post-admission stroke (odds ratio: 1.83, 95% confidence interval: 1.56-2.45, P tree was generated, which serves as a prediction tool to guide bedside prognostication and clinical treatment decision making. This prognostic decision-making algorithm also shed light on the complex interactions between a number of risk factors in determining outcome after aneurysmal SAH.

  1. The decision tree approach to classification

    Science.gov (United States)

    Wu, C.; Landgrebe, D. A.; Swain, P. H.

    1975-01-01

    A class of multistage decision tree classifiers is proposed and studied relative to the classification of multispectral remotely sensed data. The decision tree classifiers are shown to have the potential for improving both the classification accuracy and the computation efficiency. Dimensionality in pattern recognition is discussed and two theorems on the lower bound of logic computation for multiclass classification are derived. The automatic or optimization approach is emphasized. Experimental results on real data are reported, which clearly demonstrate the usefulness of decision tree classifiers.

  2. Combining logistic regression with classification and regression tree to predict quality of care in a home health nursing data set.

    Science.gov (United States)

    Guo, Huey-Ming; Shyu, Yea-Ing Lotus; Chang, Her-Kun

    2006-01-01

    In this article, the authors provide an overview of a research method to predict quality of care in home health nursing data set. The results of this study can be visualized through classification an regression tree (CART) graphs. The analysis was more effective, and the results were more informative since the home health nursing dataset was analyzed with a combination of the logistic regression and CART, these two techniques complete each other. And the results more informative that more patients' characters were related to quality of care in home care. The results contributed to home health nurse predict patient outcome in case management. Improved prediction is needed for interventions to be appropriately targeted for improved patient outcome and quality of care.

  3. Risk Factors of Falls in Community-Dwelling Older Adults: Logistic Regression Tree Analysis

    Science.gov (United States)

    Yamashita, Takashi; Noe, Douglas A.; Bailer, A. John

    2012-01-01

    Purpose of the Study: A novel logistic regression tree-based method was applied to identify fall risk factors and possible interaction effects of those risk factors. Design and Methods: A nationally representative sample of American older adults aged 65 years and older (N = 9,592) in the Health and Retirement Study 2004 and 2006 modules was used.…

  4. Reconstructing missing daily precipitation data using regression trees and artificial neural networks

    Science.gov (United States)

    Incomplete meteorological data has been a problem in environmental modeling studies. The objective of this work was to develop a technique to reconstruct missing daily precipitation data in the central part of Chesapeake Bay Watershed using regression trees (RT) and artificial neural networks (ANN)....

  5. Using the PDD Behavior Inventory as a Level 2 Screener: A Classification and Regression Trees Analysis

    Science.gov (United States)

    Cohen, Ira L.; Liu, Xudong; Hudson, Melissa; Gillis, Jennifer; Cavalari, Rachel N. S.; Romanczyk, Raymond G.; Karmel, Bernard Z.; Gardner, Judith M.

    2016-01-01

    In order to improve discrimination accuracy between Autism Spectrum Disorder (ASD) and similar neurodevelopmental disorders, a data mining procedure, Classification and Regression Trees (CART), was used on a large multi-site sample of PDD Behavior Inventory (PDDBI) forms on children with and without ASD. Discrimination accuracy exceeded 80%,…

  6. Fractal approach to computer-analytical modelling of tree crown

    International Nuclear Information System (INIS)

    Berezovskaya, F.S.; Karev, G.P.; Kisliuk, O.F.; Khlebopros, R.G.; Tcelniker, Yu.L.

    1993-09-01

    In this paper we discuss three approaches to the modeling of a tree crown development. These approaches are experimental (i.e. regressive), theoretical (i.e. analytical) and simulation (i.e. computer) modeling. The common assumption of these is that a tree can be regarded as one of the fractal objects which is the collection of semi-similar objects and combines the properties of two- and three-dimensional bodies. We show that a fractal measure of crown can be used as the link between the mathematical models of crown growth and light propagation through canopy. The computer approach gives the possibility to visualize a crown development and to calibrate the model on experimental data. In the paper different stages of the above-mentioned approaches are described. The experimental data for spruce, the description of computer system for modeling and the variant of computer model are presented. (author). 9 refs, 4 figs

  7. An efficient and extensible approach for compressing phylogenetic trees

    KAUST Repository

    Matthews, Suzanne J; Williams, Tiffani L

    2011-01-01

    Background: Biologists require new algorithms to efficiently compress and store their large collections of phylogenetic trees. Our previous work showed that TreeZip is a promising approach for compressing phylogenetic trees. In this paper, we extend

  8. A review of logistic regression models used to predict post-fire tree mortality of western North American conifers

    Science.gov (United States)

    Travis Woolley; David C. Shaw; Lisa M. Ganio; Stephen. Fitzgerald

    2012-01-01

    Logistic regression models used to predict tree mortality are critical to post-fire management, planning prescribed bums and understanding disturbance ecology. We review literature concerning post-fire mortality prediction using logistic regression models for coniferous tree species in the western USA. We include synthesis and review of: methods to develop, evaluate...

  9. Capacitance Regression Modelling Analysis on Latex from Selected Rubber Tree Clones

    International Nuclear Information System (INIS)

    Rosli, A D; Baharudin, R; Hashim, H; Khairuzzaman, N A; Mohd Sampian, A F; Abdullah, N E; Kamaru'zzaman, M; Sulaiman, M S

    2015-01-01

    This paper investigates the capacitance regression modelling performance of latex for various rubber tree clones, namely clone 2002, 2008, 2014 and 3001. Conventionally, the rubber tree clones identification are based on observation towards tree features such as shape of leaf, trunk, branching habit and pattern of seeds texture. The former method requires expert persons and very time-consuming. Currently, there is no sensing device based on electrical properties that can be employed to measure different clones from latex samples. Hence, with a hypothesis that the dielectric constant of each clone varies, this paper discusses the development of a capacitance sensor via Capacitance Comparison Bridge (known as capacitance sensor) to measure an output voltage of different latex samples. The proposed sensor is initially tested with 30ml of latex sample prior to gradually addition of dilution water. The output voltage and capacitance obtained from the test are recorded and analyzed using Simple Linear Regression (SLR) model. This work outcome infers that latex clone of 2002 has produced the highest and reliable linear regression line with determination coefficient of 91.24%. In addition, the study also found that the capacitive elements in latex samples deteriorate if it is diluted with higher volume of water. (paper)

  10. A simple approach to power and sample size calculations in logistic regression and Cox regression models.

    Science.gov (United States)

    Vaeth, Michael; Skovlund, Eva

    2004-06-15

    For a given regression problem it is possible to identify a suitably defined equivalent two-sample problem such that the power or sample size obtained for the two-sample problem also applies to the regression problem. For a standard linear regression model the equivalent two-sample problem is easily identified, but for generalized linear models and for Cox regression models the situation is more complicated. An approximately equivalent two-sample problem may, however, also be identified here. In particular, we show that for logistic regression and Cox regression models the equivalent two-sample problem is obtained by selecting two equally sized samples for which the parameters differ by a value equal to the slope times twice the standard deviation of the independent variable and further requiring that the overall expected number of events is unchanged. In a simulation study we examine the validity of this approach to power calculations in logistic regression and Cox regression models. Several different covariate distributions are considered for selected values of the overall response probability and a range of alternatives. For the Cox regression model we consider both constant and non-constant hazard rates. The results show that in general the approach is remarkably accurate even in relatively small samples. Some discrepancies are, however, found in small samples with few events and a highly skewed covariate distribution. Comparison with results based on alternative methods for logistic regression models with a single continuous covariate indicates that the proposed method is at least as good as its competitors. The method is easy to implement and therefore provides a simple way to extend the range of problems that can be covered by the usual formulas for power and sample size determination. Copyright 2004 John Wiley & Sons, Ltd.

  11. [Hyperspectral Estimation of Apple Tree Canopy LAI Based on SVM and RF Regression].

    Science.gov (United States)

    Han, Zhao-ying; Zhu, Xi-cun; Fang, Xian-yi; Wang, Zhuo-yuan; Wang, Ling; Zhao, Geng-Xing; Jiang, Yuan-mao

    2016-03-01

    Leaf area index (LAI) is the dynamic index of crop population size. Hyperspectral technology can be used to estimate apple canopy LAI rapidly and nondestructively. It can be provide a reference for monitoring the tree growing and yield estimation. The Red Fuji apple trees of full bearing fruit are the researching objects. Ninety apple trees canopies spectral reflectance and LAI values were measured by the ASD Fieldspec3 spectrometer and LAI-2200 in thirty orchards in constant two years in Qixia research area of Shandong Province. The optimal vegetation indices were selected by the method of correlation analysis of the original spectral reflectance and vegetation indices. The models of predicting the LAI were built with the multivariate regression analysis method of support vector machine (SVM) and random forest (RF). The new vegetation indices, GNDVI527, ND-VI676, RVI682, FD-NVI656 and GRVI517 and the previous two main vegetation indices, NDVI670 and NDVI705, are in accordance with LAI. In the RF regression model, the calibration set decision coefficient C-R2 of 0.920 and validation set decision coefficient V-R2 of 0.889 are higher than the SVM regression model by 0.045 and 0.033 respectively. The root mean square error of calibration set C-RMSE of 0.249, the root mean square error validation set V-RMSE of 0.236 are lower than that of the SVM regression model by 0.054 and 0.058 respectively. Relative analysis of calibrating error C-RPD and relative analysis of validation set V-RPD reached 3.363 and 2.520, 0.598 and 0.262, respectively, which were higher than the SVM regression model. The measured and predicted the scatterplot trend line slope of the calibration set and validation set C-S and V-S are close to 1. The estimation result of RF regression model is better than that of the SVM. RF regression model can be used to estimate the LAI of red Fuji apple trees in full fruit period.

  12. An efficient and extensible approach for compressing phylogenetic trees

    KAUST Repository

    Matthews, Suzanne J

    2011-01-01

    Background: Biologists require new algorithms to efficiently compress and store their large collections of phylogenetic trees. Our previous work showed that TreeZip is a promising approach for compressing phylogenetic trees. In this paper, we extend our TreeZip algorithm by handling trees with weighted branches. Furthermore, by using the compressed TreeZip file as input, we have designed an extensible decompressor that can extract subcollections of trees, compute majority and strict consensus trees, and merge tree collections using set operations such as union, intersection, and set difference.Results: On unweighted phylogenetic trees, TreeZip is able to compress Newick files in excess of 98%. On weighted phylogenetic trees, TreeZip is able to compress a Newick file by at least 73%. TreeZip can be combined with 7zip with little overhead, allowing space savings in excess of 99% (unweighted) and 92%(weighted). Unlike TreeZip, 7zip is not immune to branch rotations, and performs worse as the level of variability in the Newick string representation increases. Finally, since the TreeZip compressed text (TRZ) file contains all the semantic information in a collection of trees, we can easily filter and decompress a subset of trees of interest (such as the set of unique trees), or build the resulting consensus tree in a matter of seconds. We also show the ease of which set operations can be performed on TRZ files, at speeds quicker than those performed on Newick or 7zip compressed Newick files, and without loss of space savings.Conclusions: TreeZip is an efficient approach for compressing large collections of phylogenetic trees. The semantic and compact nature of the TRZ file allow it to be operated upon directly and quickly, without a need to decompress the original Newick file. We believe that TreeZip will be vital for compressing and archiving trees in the biological community. © 2011 Matthews and Williams; licensee BioMed Central Ltd.

  13. An efficient and extensible approach for compressing phylogenetic trees.

    Science.gov (United States)

    Matthews, Suzanne J; Williams, Tiffani L

    2011-10-18

    Biologists require new algorithms to efficiently compress and store their large collections of phylogenetic trees. Our previous work showed that TreeZip is a promising approach for compressing phylogenetic trees. In this paper, we extend our TreeZip algorithm by handling trees with weighted branches. Furthermore, by using the compressed TreeZip file as input, we have designed an extensible decompressor that can extract subcollections of trees, compute majority and strict consensus trees, and merge tree collections using set operations such as union, intersection, and set difference. On unweighted phylogenetic trees, TreeZip is able to compress Newick files in excess of 98%. On weighted phylogenetic trees, TreeZip is able to compress a Newick file by at least 73%. TreeZip can be combined with 7zip with little overhead, allowing space savings in excess of 99% (unweighted) and 92%(weighted). Unlike TreeZip, 7zip is not immune to branch rotations, and performs worse as the level of variability in the Newick string representation increases. Finally, since the TreeZip compressed text (TRZ) file contains all the semantic information in a collection of trees, we can easily filter and decompress a subset of trees of interest (such as the set of unique trees), or build the resulting consensus tree in a matter of seconds. We also show the ease of which set operations can be performed on TRZ files, at speeds quicker than those performed on Newick or 7zip compressed Newick files, and without loss of space savings. TreeZip is an efficient approach for compressing large collections of phylogenetic trees. The semantic and compact nature of the TRZ file allow it to be operated upon directly and quickly, without a need to decompress the original Newick file. We believe that TreeZip will be vital for compressing and archiving trees in the biological community.

  14. New approaches to evaluating fault trees

    International Nuclear Information System (INIS)

    Sinnamon, R.M.; Andrews, J.D.

    1997-01-01

    Fault Tree Analysis is now a widely accepted technique to assess the probability and frequency of system failure in many industries. For complex systems an analysis may produce hundreds of thousands of combinations of events which can cause system failure (minimal cut sets). The determination of these cut sets can be a very time consuming process even on modern high speed digital computers. Computerised methods, such as bottom-up or top-down approaches, to conduct this analysis are now so well developed that further refinement is unlikely to result in vast reductions in computer time. It is felt that substantial improvement in computer utilisation will only result from a completely new approach. This paper describes the use of a Binary Decision Diagram for Fault Tree Analysis and some ways in which it can be efficiently implemented on a computer. In particular, attention is given to the production of a minimum form of the Binary Decision Diagram by considering the ordering that has to be given to the basic events of the fault tree

  15. Perceived Organizational Support for Enhancing Welfare at Work: A Regression Tree Model

    Science.gov (United States)

    Giorgi, Gabriele; Dubin, David; Perez, Javier Fiz

    2016-01-01

    When trying to examine outcomes such as welfare and well-being, research tends to focus on main effects and take into account limited numbers of variables at a time. There are a number of techniques that may help address this problem. For example, many statistical packages available in R provide easy-to-use methods of modeling complicated analysis such as classification and tree regression (i.e., recursive partitioning). The present research illustrates the value of recursive partitioning in the prediction of perceived organizational support in a sample of more than 6000 Italian bankers. Utilizing the tree function party package in R, we estimated a regression tree model predicting perceived organizational support from a multitude of job characteristics including job demand, lack of job control, lack of supervisor support, training, etc. The resulting model appears particularly helpful in pointing out several interactions in the prediction of perceived organizational support. In particular, training is the dominant factor. Another dimension that seems to influence organizational support is reporting (perceived communication about safety and stress concerns). Results are discussed from a theoretical and methodological point of view. PMID:28082924

  16. Improved predictive mapping of indoor radon concentrations using ensemble regression trees based on automatic clustering of geological units

    International Nuclear Information System (INIS)

    Kropat, Georg; Bochud, Francois; Jaboyedoff, Michel; Laedermann, Jean-Pascal; Murith, Christophe; Palacios, Martha; Baechler, Sébastien

    2015-01-01

    Purpose: According to estimations around 230 people die as a result of radon exposure in Switzerland. This public health concern makes reliable indoor radon prediction and mapping methods necessary in order to improve risk communication to the public. The aim of this study was to develop an automated method to classify lithological units according to their radon characteristics and to develop mapping and predictive tools in order to improve local radon prediction. Method: About 240 000 indoor radon concentration (IRC) measurements in about 150 000 buildings were available for our analysis. The automated classification of lithological units was based on k-medoids clustering via pair-wise Kolmogorov distances between IRC distributions of lithological units. For IRC mapping and prediction we used random forests and Bayesian additive regression trees (BART). Results: The automated classification groups lithological units well in terms of their IRC characteristics. Especially the IRC differences in metamorphic rocks like gneiss are well revealed by this method. The maps produced by random forests soundly represent the regional difference of IRCs in Switzerland and improve the spatial detail compared to existing approaches. We could explain 33% of the variations in IRC data with random forests. Additionally, the influence of a variable evaluated by random forests shows that building characteristics are less important predictors for IRCs than spatial/geological influences. BART could explain 29% of IRC variability and produced maps that indicate the prediction uncertainty. Conclusion: Ensemble regression trees are a powerful tool to model and understand the multidimensional influences on IRCs. Automatic clustering of lithological units complements this method by facilitating the interpretation of radon properties of rock types. This study provides an important element for radon risk communication. Future approaches should consider taking into account further variables

  17. The Roots of Inequality: Estimating Inequality of Opportunity from Regression Trees

    DEFF Research Database (Denmark)

    Brunori, Paolo; Hufe, Paul; Mahler, Daniel Gerszon

    2017-01-01

    the risk of arbitrary and ad-hoc model selection. Second, they provide a standardized way of trading off upward and downward biases in inequality of opportunity estimations. Finally, regression trees can be graphically represented; their structure is immediate to read and easy to understand. This will make...... the measurement of inequality of opportunity more easily comprehensible to a large audience. These advantages are illustrated by an empirical application based on the 2011 wave of the European Union Statistics on Income and Living Conditions....

  18. bayesQR: A Bayesian Approach to Quantile Regression

    Directory of Open Access Journals (Sweden)

    Dries F. Benoit

    2017-01-01

    Full Text Available After its introduction by Koenker and Basset (1978, quantile regression has become an important and popular tool to investigate the conditional response distribution in regression. The R package bayesQR contains a number of routines to estimate quantile regression parameters using a Bayesian approach based on the asymmetric Laplace distribution. The package contains functions for the typical quantile regression with continuous dependent variable, but also supports quantile regression for binary dependent variables. For both types of dependent variables, an approach to variable selection using the adaptive lasso approach is provided. For the binary quantile regression model, the package also contains a routine that calculates the fitted probabilities for each vector of predictors. In addition, functions for summarizing the results, creating traceplots, posterior histograms and drawing quantile plots are included. This paper starts with a brief overview of the theoretical background of the models used in the bayesQR package. The main part of this paper discusses the computational problems that arise in the implementation of the procedure and illustrates the usefulness of the package through selected examples.

  19. Downscaling soil moisture over East Asia through multi-sensor data fusion and optimization of regression trees

    Science.gov (United States)

    Park, Seonyoung; Im, Jungho; Park, Sumin; Rhee, Jinyoung

    2017-04-01

    Soil moisture is one of the most important keys for understanding regional and global climate systems. Soil moisture is directly related to agricultural processes as well as hydrological processes because soil moisture highly influences vegetation growth and determines water supply in the agroecosystem. Accurate monitoring of the spatiotemporal pattern of soil moisture is important. Soil moisture has been generally provided through in situ measurements at stations. Although field survey from in situ measurements provides accurate soil moisture with high temporal resolution, it requires high cost and does not provide the spatial distribution of soil moisture over large areas. Microwave satellite (e.g., advanced Microwave Scanning Radiometer on the Earth Observing System (AMSR2), the Advanced Scatterometer (ASCAT), and Soil Moisture Active Passive (SMAP)) -based approaches and numerical models such as Global Land Data Assimilation System (GLDAS) and Modern- Era Retrospective Analysis for Research and Applications (MERRA) provide spatial-temporalspatiotemporally continuous soil moisture products at global scale. However, since those global soil moisture products have coarse spatial resolution ( 25-40 km), their applications for agriculture and water resources at local and regional scales are very limited. Thus, soil moisture downscaling is needed to overcome the limitation of the spatial resolution of soil moisture products. In this study, GLDAS soil moisture data were downscaled up to 1 km spatial resolution through the integration of AMSR2 and ASCAT soil moisture data, Shuttle Radar Topography Mission (SRTM) Digital Elevation Model (DEM), and Moderate Resolution Imaging Spectroradiometer (MODIS) data—Land Surface Temperature, Normalized Difference Vegetation Index, and Land cover—using modified regression trees over East Asia from 2013 to 2015. Modified regression trees were implemented using Cubist, a commercial software tool based on machine learning. An

  20. Approaches to Low Fuel Regression Rate in Hybrid Rocket Engines

    Directory of Open Access Journals (Sweden)

    Dario Pastrone

    2012-01-01

    Full Text Available Hybrid rocket engines are promising propulsion systems which present appealing features such as safety, low cost, and environmental friendliness. On the other hand, certain issues hamper the development hoped for. The present paper discusses approaches addressing improvements to one of the most important among these issues: low fuel regression rate. To highlight the consequence of such an issue and to better understand the concepts proposed, fundamentals are summarized. Two approaches are presented (multiport grain and high mixture ratio which aim at reducing negative effects without enhancing regression rate. Furthermore, fuel material changes and nonconventional geometries of grain and/or injector are presented as methods to increase fuel regression rate. Although most of these approaches are still at the laboratory or concept scale, many of them are promising.

  1. Regression trees for predicting mortality in patients with cardiovascular disease: What improvement is achieved by using ensemble-based methods?

    Science.gov (United States)

    Austin, Peter C; Lee, Douglas S; Steyerberg, Ewout W; Tu, Jack V

    2012-01-01

    In biomedical research, the logistic regression model is the most commonly used method for predicting the probability of a binary outcome. While many clinical researchers have expressed an enthusiasm for regression trees, this method may have limited accuracy for predicting health outcomes. We aimed to evaluate the improvement that is achieved by using ensemble-based methods, including bootstrap aggregation (bagging) of regression trees, random forests, and boosted regression trees. We analyzed 30-day mortality in two large cohorts of patients hospitalized with either acute myocardial infarction (N = 16,230) or congestive heart failure (N = 15,848) in two distinct eras (1999–2001 and 2004–2005). We found that both the in-sample and out-of-sample prediction of ensemble methods offered substantial improvement in predicting cardiovascular mortality compared to conventional regression trees. However, conventional logistic regression models that incorporated restricted cubic smoothing splines had even better performance. We conclude that ensemble methods from the data mining and machine learning literature increase the predictive performance of regression trees, but may not lead to clear advantages over conventional logistic regression models for predicting short-term mortality in population-based samples of subjects with cardiovascular disease. PMID:22777999

  2. Exploring the predictive power of interaction terms in a sophisticated risk equalization model using regression trees.

    Science.gov (United States)

    van Veen, S H C M; van Kleef, R C; van de Ven, W P M M; van Vliet, R C J A

    2018-02-01

    This study explores the predictive power of interaction terms between the risk adjusters in the Dutch risk equalization (RE) model of 2014. Due to the sophistication of this RE-model and the complexity of the associations in the dataset (N = ~16.7 million), there are theoretically more than a million interaction terms. We used regression tree modelling, which has been applied rarely within the field of RE, to identify interaction terms that statistically significantly explain variation in observed expenses that is not already explained by the risk adjusters in this RE-model. The interaction terms identified were used as additional risk adjusters in the RE-model. We found evidence that interaction terms can improve the prediction of expenses overall and for specific groups in the population. However, the prediction of expenses for some other selective groups may deteriorate. Thus, interactions can reduce financial incentives for risk selection for some groups but may increase them for others. Furthermore, because regression trees are not robust, additional criteria are needed to decide which interaction terms should be used in practice. These criteria could be the right incentive structure for risk selection and efficiency or the opinion of medical experts. Copyright © 2017 John Wiley & Sons, Ltd.

  3. Regression tree analysis for predicting body weight of Nigerian Muscovy duck (Cairina moschata

    Directory of Open Access Journals (Sweden)

    Oguntunji Abel Olusegun

    2017-01-01

    Full Text Available Morphometric parameters and their indices are central to the understanding of the type and function of livestock. The present study was conducted to predict body weight (BWT of adult Nigerian Muscovy ducks from nine (9 morphometric parameters and seven (7 body indices and also to identify the most important predictor of BWT among them using regression tree analysis (RTA. The experimental birds comprised of 1,020 adult male and female Nigerian Muscovy ducks randomly sampled in Rain Forest (203, Guinea Savanna (298 and Derived Savanna (519 agro-ecological zones. Result of RTA revealed that compactness; body girth and massiveness were the most important independent variables in predicting BWT and were used in constructing RT. The combined effect of the three predictors was very high and explained 91.00% of the observed variation of the target variable (BWT. The optimal regression tree suggested that Muscovy ducks with compactness >5.765 would be fleshy and have highest BWT. The result of the present study could be exploited by animal breeders and breeding companies in selection and improvement of BWT of Muscovy ducks.

  4. Application of Logistic Regression Tree Model in Determining Habitat Distribution of Astragalus verus

    Directory of Open Access Journals (Sweden)

    M. Saki

    2013-03-01

    Full Text Available The relationship between plant species and environmental factors has always been a central issue in plant ecology. With rising power of statistical techniques, geo-statistics and geographic information systems (GIS, the development of predictive habitat distribution models of organisms has rapidly increased in ecology. This study aimed to evaluate the ability of Logistic Regression Tree model to create potential habitat map of Astragalus verus. This species produces Tragacanth and has economic value. A stratified- random sampling was applied to 100 sites (50 presence- 50 absence of given species, and produced environmental and edaphic factors maps by using Kriging and Inverse Distance Weighting methods in the ArcGIS software for the whole study area. Relationships between species occurrence and environmental factors were determined by Logistic Regression Tree model and extended to the whole study area. The results indicated species occurrence has strong correlation with environmental factors such as mean daily temperature and clay, EC and organic carbon content of the soil. Species occurrence showed direct relationship with mean daily temperature and clay and organic carbon, and inverse relationship with EC. Model accuracy was evaluated both by Cohen’s kappa statistics (κ and by area under Receiver Operating Characteristics curve based on independent test data set. Their values (kappa=0.9, Auc of ROC=0.96 indicated the high power of LRT to create potential habitat map on local scales. This model, therefore, can be applied to recognize potential sites for rangeland reclamation projects.

  5. Development of hybrid genetic-algorithm-based neural networks using regression trees for modeling air quality inside a public transportation bus.

    Science.gov (United States)

    Kadiyala, Akhil; Kaur, Devinder; Kumar, Ashok

    2013-02-01

    The present study developed a novel approach to modeling indoor air quality (IAQ) of a public transportation bus by the development of hybrid genetic-algorithm-based neural networks (also known as evolutionary neural networks) with input variables optimized from using the regression trees, referred as the GART approach. This study validated the applicability of the GART modeling approach in solving complex nonlinear systems by accurately predicting the monitored contaminants of carbon dioxide (CO2), carbon monoxide (CO), nitric oxide (NO), sulfur dioxide (SO2), 0.3-0.4 microm sized particle numbers, 0.4-0.5 microm sized particle numbers, particulate matter (PM) concentrations less than 1.0 microm (PM10), and PM concentrations less than 2.5 microm (PM2.5) inside a public transportation bus operating on 20% grade biodiesel in Toledo, OH. First, the important variables affecting each monitored in-bus contaminant were determined using regression trees. Second, the analysis of variance was used as a complimentary sensitivity analysis to the regression tree results to determine a subset of statistically significant variables affecting each monitored in-bus contaminant. Finally, the identified subsets of statistically significant variables were used as inputs to develop three artificial neural network (ANN) models. The models developed were regression tree-based back-propagation network (BPN-RT), regression tree-based radial basis function network (RBFN-RT), and GART models. Performance measures were used to validate the predictive capacity of the developed IAQ models. The results from this approach were compared with the results obtained from using a theoretical approach and a generalized practicable approach to modeling IAQ that included the consideration of additional independent variables when developing the aforementioned ANN models. The hybrid GART models were able to capture majority of the variance in the monitored in-bus contaminants. The genetic

  6. Variances in the projections, resulting from CLIMEX, Boosted Regression Trees and Random Forests techniques

    Science.gov (United States)

    Shabani, Farzin; Kumar, Lalit; Solhjouy-fard, Samaneh

    2017-08-01

    The aim of this study was to have a comparative investigation and evaluation of the capabilities of correlative and mechanistic modeling processes, applied to the projection of future distributions of date palm in novel environments and to establish a method of minimizing uncertainty in the projections of differing techniques. The location of this study on a global scale is in Middle Eastern Countries. We compared the mechanistic model CLIMEX (CL) with the correlative models MaxEnt (MX), Boosted Regression Trees (BRT), and Random Forests (RF) to project current and future distributions of date palm ( Phoenix dactylifera L.). The Global Climate Model (GCM), the CSIRO-Mk3.0 (CS) using the A2 emissions scenario, was selected for making projections. Both indigenous and alien distribution data of the species were utilized in the modeling process. The common areas predicted by MX, BRT, RF, and CL from the CS GCM were extracted and compared to ascertain projection uncertainty levels of each individual technique. The common areas identified by all four modeling techniques were used to produce a map indicating suitable and unsuitable areas for date palm cultivation for Middle Eastern countries, for the present and the year 2100. The four different modeling approaches predict fairly different distributions. Projections from CL were more conservative than from MX. The BRT and RF were the most conservative methods in terms of projections for the current time. The combination of the final CL and MX projections for the present and 2100 provide higher certainty concerning those areas that will become highly suitable for future date palm cultivation. According to the four models, cold, hot, and wet stress, with differences on a regional basis, appears to be the major restrictions on future date palm distribution. The results demonstrate variances in the projections, resulting from different techniques. The assessment and interpretation of model projections requires reservations

  7. Predicting smear negative pulmonary tuberculosis with classification trees and logistic regression: a cross-sectional study

    Directory of Open Access Journals (Sweden)

    Kritski Afrânio

    2006-02-01

    Full Text Available Abstract Background Smear negative pulmonary tuberculosis (SNPT accounts for 30% of pulmonary tuberculosis cases reported yearly in Brazil. This study aimed to develop a prediction model for SNPT for outpatients in areas with scarce resources. Methods The study enrolled 551 patients with clinical-radiological suspicion of SNPT, in Rio de Janeiro, Brazil. The original data was divided into two equivalent samples for generation and validation of the prediction models. Symptoms, physical signs and chest X-rays were used for constructing logistic regression and classification and regression tree models. From the logistic regression, we generated a clinical and radiological prediction score. The area under the receiver operator characteristic curve, sensitivity, and specificity were used to evaluate the model's performance in both generation and validation samples. Results It was possible to generate predictive models for SNPT with sensitivity ranging from 64% to 71% and specificity ranging from 58% to 76%. Conclusion The results suggest that those models might be useful as screening tools for estimating the risk of SNPT, optimizing the utilization of more expensive tests, and avoiding costs of unnecessary anti-tuberculosis treatment. Those models might be cost-effective tools in a health care network with hierarchical distribution of scarce resources.

  8. Prediction of cannabis and cocaine use in adolescence using decision trees and logistic regression

    Directory of Open Access Journals (Sweden)

    Alfonso L. Palmer

    2010-01-01

    Full Text Available Spain is one of the European countries with the highest prevalence of cannabis and cocaine use among young people. The aim of this study was to investigate the factors related to the consumption of cocaine and cannabis among adolescents. A questionnaire was administered to 9,284 students between 14 and 18 years of age in Palma de Mallorca (47.1% boys and 52.9% girls whose mean age was 15.59 years. Logistic regression and decision trees were carried out in order to model the consumption of cannabis and cocaine. The results show the use of legal substances and committing fraudulence or theft are the main variables that raise the odds of consuming cannabis. In boys, cannabis consumption and a family history of drug use increase the odds of consuming cocaine, whereas in girls the use of alcohol, behaviours of fraudulence or theft and difficulty in some personal skills influence their odds of consuming cocaine. Finally, ease of access to the substance greatly raises the odds of consuming cocaine and cannabis in both genders. Decision trees highlight the role of consuming other substances and committing fraudulence or theft. The results of this study gain importance when it comes to putting into practice effective prevention programmes.

  9. Building optimal regression tree by ant colony system-genetic algorithm: Application to modeling of melting points

    Energy Technology Data Exchange (ETDEWEB)

    Hemmateenejad, Bahram, E-mail: hemmatb@sums.ac.ir [Department of Chemistry, Shiraz University, Shiraz (Iran, Islamic Republic of); Medicinal and Natural Products Chemistry Research Center, Shiraz University of Medical Sciences, Shiraz (Iran, Islamic Republic of); Shamsipur, Mojtaba [Department of Chemistry, Razi University, Kermanshah (Iran, Islamic Republic of); Zare-Shahabadi, Vali [Young Researchers Club, Mahshahr Branch, Islamic Azad University, Mahshahr (Iran, Islamic Republic of); Akhond, Morteza [Department of Chemistry, Shiraz University, Shiraz (Iran, Islamic Republic of)

    2011-10-17

    Highlights: {yields} Ant colony systems help to build optimum classification and regression trees. {yields} Using of genetic algorithm operators in ant colony systems resulted in more appropriate models. {yields} Variable selection in each terminal node of the tree gives promising results. {yields} CART-ACS-GA could model the melting point of organic materials with prediction errors lower than previous models. - Abstract: The classification and regression trees (CART) possess the advantage of being able to handle large data sets and yield readily interpretable models. A conventional method of building a regression tree is recursive partitioning, which results in a good but not optimal tree. Ant colony system (ACS), which is a meta-heuristic algorithm and derived from the observation of real ants, can be used to overcome this problem. The purpose of this study was to explore the use of CART and its combination with ACS for modeling of melting points of a large variety of chemical compounds. Genetic algorithm (GA) operators (e.g., cross averring and mutation operators) were combined with ACS algorithm to select the best solution model. In addition, at each terminal node of the resulted tree, variable selection was done by ACS-GA algorithm to build an appropriate partial least squares (PLS) model. To test the ability of the resulted tree, a set of approximately 4173 structures and their melting points were used (3000 compounds as training set and 1173 as validation set). Further, an external test set containing of 277 drugs was used to validate the prediction ability of the tree. Comparison of the results obtained from both trees showed that the tree constructed by ACS-GA algorithm performs better than that produced by recursive partitioning procedure.

  10. Analysing inequalities in Germany a structured additive distributional regression approach

    CERN Document Server

    Silbersdorff, Alexander

    2017-01-01

    This book seeks new perspectives on the growing inequalities that our societies face, putting forward Structured Additive Distributional Regression as a means of statistical analysis that circumvents the common problem of analytical reduction to simple point estimators. This new approach allows the observed discrepancy between the individuals’ realities and the abstract representation of those realities to be explicitly taken into consideration using the arithmetic mean alone. In turn, the method is applied to the question of economic inequality in Germany.

  11. Integrating classification trees with local logistic regression in Intensive Care prognosis.

    Science.gov (United States)

    Abu-Hanna, Ameen; de Keizer, Nicolette

    2003-01-01

    Health care effectiveness and efficiency are under constant scrutiny especially when treatment is quite costly as in the Intensive Care (IC). Currently there are various international quality of care programs for the evaluation of IC. At the heart of such quality of care programs lie prognostic models whose prediction of patient mortality can be used as a norm to which actual mortality is compared. The current generation of prognostic models in IC are statistical parametric models based on logistic regression. Given a description of a patient at admission, these models predict the probability of his or her survival. Typically, this patient description relies on an aggregate variable, called a score, that quantifies the severity of illness of the patient. The use of a parametric model and an aggregate score form adequate means to develop models when data is relatively scarce but it introduces the risk of bias. This paper motivates and suggests a method for studying and improving the performance behavior of current state-of-the-art IC prognostic models. Our method is based on machine learning and statistical ideas and relies on exploiting information that underlies a score variable. In particular, this underlying information is used to construct a classification tree whose nodes denote patient sub-populations. For these sub-populations, local models, most notably logistic regression ones, are developed using only the total score variable. We compare the performance of this hybrid model to that of a traditional global logistic regression model. We show that the hybrid model not only provides more insight into the data but also has a better performance. We pay special attention to the precision aspect of model performance and argue why precision is more important than discrimination ability.

  12. Naive Fault Tree : formulation of the approach

    NARCIS (Netherlands)

    Rajabalinejad, M

    2017-01-01

    Naive Fault Tree (NFT) accepts a single value or a range of values for each basic event and returns values for the top event. This accommodates the need of commonly used Fault Trees (FT) for precise data making them prone to data concerns and limiting their area of application. This paper extends

  13. Bayesian approach to errors-in-variables in regression models

    Science.gov (United States)

    Rozliman, Nur Aainaa; Ibrahim, Adriana Irawati Nur; Yunus, Rossita Mohammad

    2017-05-01

    In many applications and experiments, data sets are often contaminated with error or mismeasured covariates. When at least one of the covariates in a model is measured with error, Errors-in-Variables (EIV) model can be used. Measurement error, when not corrected, would cause misleading statistical inferences and analysis. Therefore, our goal is to examine the relationship of the outcome variable and the unobserved exposure variable given the observed mismeasured surrogate by applying the Bayesian formulation to the EIV model. We shall extend the flexible parametric method proposed by Hossain and Gustafson (2009) to another nonlinear regression model which is the Poisson regression model. We shall then illustrate the application of this approach via a simulation study using Markov chain Monte Carlo sampling methods.

  14. Using boosted regression trees to predict the near-saturated hydraulic conductivity of undisturbed soils

    Science.gov (United States)

    Koestel, John; Bechtold, Michel; Jorda, Helena; Jarvis, Nicholas

    2015-04-01

    The saturated and near-saturated hydraulic conductivity of soil is of key importance for modelling water and solute fluxes in the vadose zone. Hydraulic conductivity measurements are cumbersome at the Darcy scale and practically impossible at larger scales where water and solute transport models are mostly applied. Hydraulic conductivity must therefore be estimated from proxy variables. Such pedotransfer functions are known to work decently well for e.g. water retention curves but rather poorly for near-saturated and saturated hydraulic conductivities. Recently, Weynants et al. (2009, Revisiting Vereecken pedotransfer functions: Introducing a closed-form hydraulic model. Vadose Zone Journal, 8, 86-95) reported a coefficients of determination of 0.25 (validation with an independent data set) for the saturated hydraulic conductivity from lab-measurements of Belgian soil samples. In our study, we trained boosted regression trees on a global meta-database containing tension-disk infiltrometer data (see Jarvis et al. 2013. Influence of soil, land use and climatic factors on the hydraulic conductivity of soil. Hydrology & Earth System Sciences, 17, 5185-5195) to predict the saturated hydraulic conductivity (Ks) and the conductivity at a tension of 10 cm (K10). We found coefficients of determination of 0.39 and 0.62 under a simple 10-fold cross-validation for Ks and K10. When carrying out the validation folded over the data-sources, i.e. the source publications, we found that the corresponding coefficients of determination reduced to 0.15 and 0.36, respectively. We conclude that the stricter source-wise cross-validation should be applied in future pedotransfer studies to prevent overly optimistic validation results. The boosted regression trees also allowed for an investigation of relevant predictors for estimating the near-saturated hydraulic conductivity. We found that land use and bulk density were most important to predict Ks. We also observed that Ks is large in fine

  15. Measuring the satisfaction of intensive care unit patient families in Morocco: a regression tree analysis.

    Science.gov (United States)

    Damghi, Nada; Khoudri, Ibtissam; Oualili, Latifa; Abidi, Khalid; Madani, Naoufel; Zeggwagh, Amine Ali; Abouqal, Redouane

    2008-07-01

    Meeting the needs of patients' family members becomes an essential part of responsibilities of intensive care unit physicians. The aim of this study was to evaluate the satisfaction of patients' family members using the Arabic version of the Society of Critical Care Medicine's Family Needs Assessment questionnaire and to assess the predictors of family satisfaction using the classification and regression tree method. The authors conducted a prospective study. This study was conducted at a 12-bed medical intensive care unit in Morocco. Family representatives (n = 194) of consecutive patients with a length of stay >48 hrs were included in the study. Intervention was the Society of Critical Care Medicine's Family Needs Assessment questionnaire. Demographic data for relatives included age, gender, relationship with patients, education level, and intensive care unit commuting time. Clinical data for patients included age, gender, diagnoses, intensive care unit length of stay, Acute Physiology and Chronic Health Evaluation, MacCabe index, Therapeutic Interventioning Scoring System, and mechanical ventilation. The Arabic version of the Society of Critical Care Medicine's Family Needs Assessment questionnaire was administered between the third and fifth days after admission. Of family representatives, 81% declared being satisfied with information provided by physicians, 27% would like more information about the diagnosis, 30% about prognosis, and 45% about treatment. In univariate analysis, family satisfaction (small Society of Critical Care Medicine's Family Needs Assessment questionnaire score) increased with a lower family education level (p = .005), when the information was given by a senior physician (p = .014), and when the Society of Critical Care Medicine's Family Needs Assessment questionnaire was administered by an investigator (p = .002). Multivariate analysis (classification and regression tree) showed that the education level was the predominant factor

  16. Accurate phylogenetic tree reconstruction from quartets: a heuristic approach.

    Science.gov (United States)

    Reaz, Rezwana; Bayzid, Md Shamsuzzoha; Rahman, M Sohel

    2014-01-01

    Supertree methods construct trees on a set of taxa (species) combining many smaller trees on the overlapping subsets of the entire set of taxa. A 'quartet' is an unrooted tree over 4 taxa, hence the quartet-based supertree methods combine many 4-taxon unrooted trees into a single and coherent tree over the complete set of taxa. Quartet-based phylogeny reconstruction methods have been receiving considerable attentions in the recent years. An accurate and efficient quartet-based method might be competitive with the current best phylogenetic tree reconstruction methods (such as maximum likelihood or Bayesian MCMC analyses), without being as computationally intensive. In this paper, we present a novel and highly accurate quartet-based phylogenetic tree reconstruction method. We performed an extensive experimental study to evaluate the accuracy and scalability of our approach on both simulated and biological datasets.

  17. Predicting volume of distribution with decision tree-based regression methods using predicted tissue:plasma partition coefficients.

    Science.gov (United States)

    Freitas, Alex A; Limbu, Kriti; Ghafourian, Taravat

    2015-01-01

    Volume of distribution is an important pharmacokinetic property that indicates the extent of a drug's distribution in the body tissues. This paper addresses the problem of how to estimate the apparent volume of distribution at steady state (Vss) of chemical compounds in the human body using decision tree-based regression methods from the area of data mining (or machine learning). Hence, the pros and cons of several different types of decision tree-based regression methods have been discussed. The regression methods predict Vss using, as predictive features, both the compounds' molecular descriptors and the compounds' tissue:plasma partition coefficients (Kt:p) - often used in physiologically-based pharmacokinetics. Therefore, this work has assessed whether the data mining-based prediction of Vss can be made more accurate by using as input not only the compounds' molecular descriptors but also (a subset of) their predicted Kt:p values. Comparison of the models that used only molecular descriptors, in particular, the Bagging decision tree (mean fold error of 2.33), with those employing predicted Kt:p values in addition to the molecular descriptors, such as the Bagging decision tree using adipose Kt:p (mean fold error of 2.29), indicated that the use of predicted Kt:p values as descriptors may be beneficial for accurate prediction of Vss using decision trees if prior feature selection is applied. Decision tree based models presented in this work have an accuracy that is reasonable and similar to the accuracy of reported Vss inter-species extrapolations in the literature. The estimation of Vss for new compounds in drug discovery will benefit from methods that are able to integrate large and varied sources of data and flexible non-linear data mining methods such as decision trees, which can produce interpretable models. Graphical AbstractDecision trees for the prediction of tissue partition coefficient and volume of distribution of drugs.

  18. Differential Diagnosis of Erythmato-Squamous Diseases Using Classification and Regression Tree.

    Science.gov (United States)

    Maghooli, Keivan; Langarizadeh, Mostafa; Shahmoradi, Leila; Habibi-Koolaee, Mahdi; Jebraeily, Mohamad; Bouraghi, Hamid

    2016-10-01

    Differential diagnosis of Erythmato-Squamous Diseases (ESD) is a major challenge in the field of dermatology. The ESD diseases are placed into six different classes. Data mining is the process for detection of hidden patterns. In the case of ESD, data mining help us to predict the diseases. Different algorithms were developed for this purpose. we aimed to use the Classification and Regression Tree (CART) to predict differential diagnosis of ESD. we used the Cross Industry Standard Process for Data Mining (CRISP-DM) methodology. For this purpose, the dermatology data set from machine learning repository, UCI was obtained. The Clementine 12.0 software from IBM Company was used for modelling. In order to evaluation of the model we calculate the accuracy, sensitivity and specificity of the model. The proposed model had an accuracy of 94.84% (. 24.42) in order to correct prediction of the ESD disease. Results indicated that using of this classifier could be useful. But, it would be strongly recommended that the combination of machine learning methods could be more useful in terms of prediction of ESD.

  19. Groundwater level prediction of landslide based on classification and regression tree

    Directory of Open Access Journals (Sweden)

    Yannan Zhao

    2016-09-01

    Full Text Available According to groundwater level monitoring data of Shuping landslide in the Three Gorges Reservoir area, based on the response relationship between influential factors such as rainfall and reservoir level and the change of groundwater level, the influential factors of groundwater level were selected. Then the classification and regression tree (CART model was constructed by the subset and used to predict the groundwater level. Through the verification, the predictive results of the test sample were consistent with the actually measured values, and the mean absolute error and relative error is 0.28 m and 1.15% respectively. To compare the support vector machine (SVM model constructed using the same set of factors, the mean absolute error and relative error of predicted results is 1.53 m and 6.11% respectively. It is indicated that CART model has not only better fitting and generalization ability, but also strong advantages in the analysis of landslide groundwater dynamic characteristics and the screening of important variables. It is an effective method for prediction of ground water level in landslides.

  20. Automated Detection of Connective Tissue by Tissue Counter Analysis and Classification and Regression Trees

    Directory of Open Access Journals (Sweden)

    Josef Smolle

    2001-01-01

    Full Text Available Objective: To evaluate the feasibility of the CART (Classification and Regression Tree procedure for the recognition of microscopic structures in tissue counter analysis. Methods: Digital microscopic images of H&E stained slides of normal human skin and of primary malignant melanoma were overlayed with regularly distributed square measuring masks (elements and grey value, texture and colour features within each mask were recorded. In the learning set, elements were interactively labeled as representing either connective tissue of the reticular dermis, other tissue components or background. Subsequently, CART models were based on these data sets. Results: Implementation of the CART classification rules into the image analysis program showed that in an independent test set 94.1% of elements classified as connective tissue of the reticular dermis were correctly labeled. Automated measurements of the total amount of tissue and of the amount of connective tissue within a slide showed high reproducibility (r=0.97 and r=0.94, respectively; p < 0.001. Conclusions: CART procedure in tissue counter analysis yields simple and reproducible classification rules for tissue elements.

  1. KLASIFIKASI KARAKTERISTIK KECELAKAAN LALU LINTAS DI KOTA DENPASAR DENGAN PENDEKATAN CLASSIFICATION AND REGRESSION TREES (CART

    Directory of Open Access Journals (Sweden)

    I GEDE AGUS JIWADIANA

    2015-11-01

    Full Text Available The aim of this research is to determine the classification characteristics of traffic accidents in Denpasar city in January-July 2014 by using Classification And Regression Trees (CART. Then, for determine the explanatory variables into the main classifier of CART. The result showed that optimum CART generate three terminal node. First terminal node, there are 12 people were classified as heavy traffic accident characteritics with single accident, and second terminal nodes, there are 68 people were classified as minor traffic accident characteristics by type of traffic accident front-rear, front-front, front-side, pedestrians, side-side and location of traffic accident in district road and sub-district road. For third terminal node, there are 291 people were classified as medium traffic accident characteristics by type of traffic accident front-rear, front-front, front-side, pedestrians, side-side and location of traffic accident in municipality road and explanatory variables into the main splitter to make of CART is type of traffic accident with maximum homogeneity measure of 0.03252.

  2. Estimating carbon and showing impacts of drought using satellite data in regression-tree models

    Science.gov (United States)

    Boyte, Stephen; Wylie, Bruce K.; Howard, Danny; Dahal, Devendra; Gilmanov, Tagir G.

    2018-01-01

    Integrating spatially explicit biogeophysical and remotely sensed data into regression-tree models enables the spatial extrapolation of training data over large geographic spaces, allowing a better understanding of broad-scale ecosystem processes. The current study presents annual gross primary production (GPP) and annual ecosystem respiration (RE) for 2000–2013 in several short-statured vegetation types using carbon flux data from towers that are located strategically across the conterminous United States (CONUS). We calculate carbon fluxes (annual net ecosystem production [NEP]) for each year in our study period, which includes 2012 when drought and higher-than-normal temperatures influence vegetation productivity in large parts of the study area. We present and analyse carbon flux dynamics in the CONUS to better understand how drought affects GPP, RE, and NEP. Model accuracy metrics show strong correlation coefficients (r) (r ≥ 94%) between training and estimated data for both GPP and RE. Overall, average annual GPP, RE, and NEP are relatively constant throughout the study period except during 2012 when almost 60% less carbon is sequestered than normal. These results allow us to conclude that this modelling method effectively estimates carbon dynamics through time and allows the exploration of impacts of meteorological anomalies and vegetation types on carbon dynamics.

  3. Regression trees modeling and forecasting of PM10 air pollution in urban areas

    Science.gov (United States)

    Stoimenova, M.; Voynikova, D.; Ivanov, A.; Gocheva-Ilieva, S.; Iliev, I.

    2017-10-01

    Fine particulate matter (PM10) air pollution is a serious problem affecting the health of the population in many Bulgarian cities. As an example, the object of this study is the pollution with PM10 of the town of Pleven, Northern Bulgaria. The measured concentrations of this air pollutant for this city consistently exceeded the permissible limits set by European and national legislation. Based on data for the last 6 years (2011-2016), the analysis shows that this applies both to the daily limit of 50 micrograms per cubic meter and the allowable number of daily concentration exceedances to 35 per year. Also, the average annual concentration of PM10 exceeded the prescribed norm of no more than 40 micrograms per cubic meter. The aim of this work is to build high performance mathematical models for effective prediction and forecasting the level of PM10 pollution. The study was conducted with the powerful flexible data mining technique Classification and Regression Trees (CART). The values of PM10 were fitted with respect to meteorological data such as maximum and minimum air temperature, relative humidity, wind speed and direction and others, as well as with time and autoregressive variables. As a result the obtained CART models demonstrate high predictive ability and fit the actual data with up to 80%. The best models were applied for forecasting the level pollution for 3 to 7 days ahead. An interpretation of the modeling results is presented.

  4. Predicting number of hospitalization days based on health insurance claims data using bagged regression trees.

    Science.gov (United States)

    Xie, Yang; Schreier, Günter; Chang, David C W; Neubauer, Sandra; Redmond, Stephen J; Lovell, Nigel H

    2014-01-01

    Healthcare administrators worldwide are striving to both lower the cost of care whilst improving the quality of care given. Therefore, better clinical and administrative decision making is needed to improve these issues. Anticipating outcomes such as number of hospitalization days could contribute to addressing this problem. In this paper, a method was developed, using large-scale health insurance claims data, to predict the number of hospitalization days in a population. We utilized a regression decision tree algorithm, along with insurance claim data from 300,000 individuals over three years, to provide predictions of number of days in hospital in the third year, based on medical admissions and claims data from the first two years. Our method performs well in the general population. For the population aged 65 years and over, the predictive model significantly improves predictions over a baseline method (predicting a constant number of days for each patient), and achieved a specificity of 70.20% and sensitivity of 75.69% in classifying these subjects into two categories of 'no hospitalization' and 'at least one day in hospital'.

  5. Identification of Sexually Abused Female Adolescents at Risk for Suicidal Ideations: A Classification and Regression Tree Analysis

    Science.gov (United States)

    Brabant, Marie-Eve; Hebert, Martine; Chagnon, Francois

    2013-01-01

    This study explored the clinical profiles of 77 female teenager survivors of sexual abuse and examined the association of abuse-related and personal variables with suicidal ideations. Analyses revealed that 64% of participants experienced suicidal ideations. Findings from classification and regression tree analysis indicated that depression,…

  6. Regression modeling and mapping of coniferous forest basal area and tree density from discrete-return lidar and multispectral data

    Science.gov (United States)

    Andrew T. Hudak; Nicholas L. Crookston; Jeffrey S. Evans; Michael K. Falkowski; Alistair M. S. Smith; Paul E. Gessler; Penelope Morgan

    2006-01-01

    We compared the utility of discrete-return light detection and ranging (lidar) data and multispectral satellite imagery, and their integration, for modeling and mapping basal area and tree density across two diverse coniferous forest landscapes in north-central Idaho. We applied multiple linear regression models subset from a suite of 26 predictor variables derived...

  7. Regionalization of meso-scale physically based nitrogen modeling outputs to the macro-scale by the use of regression trees

    Science.gov (United States)

    Künne, A.; Fink, M.; Kipka, H.; Krause, P.; Flügel, W.-A.

    2012-06-01

    In this paper, a method is presented to estimate excess nitrogen on large scales considering single field processes. The approach was implemented by using the physically based model J2000-S to simulate the nitrogen balance as well as the hydrological dynamics within meso-scale test catchments. The model input data, the parameterization, the results and a detailed system understanding were used to generate the regression tree models with GUIDE (Loh, 2002). For each landscape type in the federal state of Thuringia a regression tree was calibrated and validated using the model data and results of excess nitrogen from the test catchments. Hydrological parameters such as precipitation and evapotranspiration were also used to predict excess nitrogen by the regression tree model. Hence they had to be calculated and regionalized as well for the state of Thuringia. Here the model J2000g was used to simulate the water balance on the macro scale. With the regression trees the excess nitrogen was regionalized for each landscape type of Thuringia. The approach allows calculating the potential nitrogen input into the streams of the drainage area. The results show that the applied methodology was able to transfer the detailed model results of the meso-scale catchments to the entire state of Thuringia by low computing time without losing the detailed knowledge from the nitrogen transport modeling. This was validated with modeling results from Fink (2004) in a catchment lying in the regionalization area. The regionalized and modeled excess nitrogen correspond with 94%. The study was conducted within the framework of a project in collaboration with the Thuringian Environmental Ministry, whose overall aim was to assess the effect of agro-environmental measures regarding load reduction in the water bodies of Thuringia to fulfill the requirements of the European Water Framework Directive (Bäse et al., 2007; Fink, 2006; Fink et al., 2007).

  8. Prevalence and Determinants of Preterm Birth in Tehran, Iran: A Comparison between Logistic Regression and Decision Tree Methods.

    Science.gov (United States)

    Amini, Payam; Maroufizadeh, Saman; Samani, Reza Omani; Hamidi, Omid; Sepidarkish, Mahdi

    2017-06-01

    Preterm birth (PTB) is a leading cause of neonatal death and the second biggest cause of death in children under five years of age. The objective of this study was to determine the prevalence of PTB and its associated factors using logistic regression and decision tree classification methods. This cross-sectional study was conducted on 4,415 pregnant women in Tehran, Iran, from July 6-21, 2015. Data were collected by a researcher-developed questionnaire through interviews with mothers and review of their medical records. To evaluate the accuracy of the logistic regression and decision tree methods, several indices such as sensitivity, specificity, and the area under the curve were used. The PTB rate was 5.5% in this study. The logistic regression outperformed the decision tree for the classification of PTB based on risk factors. Logistic regression showed that multiple pregnancies, mothers with preeclampsia, and those who conceived with assisted reproductive technology had an increased risk for PTB ( p logistic regression model for the classification of risk groups for PTB.

  9. Regression Tree-Based Methodology for Customizing Building Energy Benchmarks to Individual Commercial Buildings

    Science.gov (United States)

    Kaskhedikar, Apoorva Prakash

    According to the U.S. Energy Information Administration, commercial buildings represent about 40% of the United State's energy consumption of which office buildings consume a major portion. Gauging the extent to which an individual building consumes energy in excess of its peers is the first step in initiating energy efficiency improvement. Energy Benchmarking offers initial building energy performance assessment without rigorous evaluation. Energy benchmarking tools based on the Commercial Buildings Energy Consumption Survey (CBECS) database are investigated in this thesis. This study proposes a new benchmarking methodology based on decision trees, where a relationship between the energy use intensities (EUI) and building parameters (continuous and categorical) is developed for different building types. This methodology was applied to medium office and school building types contained in the CBECS database. The Random Forest technique was used to find the most influential parameters that impact building energy use intensities. Subsequently, correlations which were significant were identified between EUIs and CBECS variables. Other than floor area, some of the important variables were number of workers, location, number of PCs and main cooling equipment. The coefficient of variation was used to evaluate the effectiveness of the new model. The customization technique proposed in this thesis was compared with another benchmarking model that is widely used by building owners and designers namely, the ENERGY STAR's Portfolio Manager. This tool relies on the standard Linear Regression methods which is only able to handle continuous variables. The model proposed uses data mining technique and was found to perform slightly better than the Portfolio Manager. The broader impacts of the new benchmarking methodology proposed is that it allows for identifying important categorical variables, and then incorporating them in a local, as against a global, model framework for EUI

  10. Classification and regression tree (CART) model to predict pulmonary tuberculosis in hospitalized patients.

    Science.gov (United States)

    Aguiar, Fabio S; Almeida, Luciana L; Ruffino-Netto, Antonio; Kritski, Afranio Lineu; Mello, Fernanda Cq; Werneck, Guilherme L

    2012-08-07

    Tuberculosis (TB) remains a public health issue worldwide. The lack of specific clinical symptoms to diagnose TB makes the correct decision to admit patients to respiratory isolation a difficult task for the clinician. Isolation of patients without the disease is common and increases health costs. Decision models for the diagnosis of TB in patients attending hospitals can increase the quality of care and decrease costs, without the risk of hospital transmission. We present a predictive model for predicting pulmonary TB in hospitalized patients in a high prevalence area in order to contribute to a more rational use of isolation rooms without increasing the risk of transmission. Cross sectional study of patients admitted to CFFH from March 2003 to December 2004. A classification and regression tree (CART) model was generated and validated. The area under the ROC curve (AUC), sensitivity, specificity, positive and negative predictive values were used to evaluate the performance of model. Validation of the model was performed with a different sample of patients admitted to the same hospital from January to December 2005. We studied 290 patients admitted with clinical suspicion of TB. Diagnosis was confirmed in 26.5% of them. Pulmonary TB was present in 83.7% of the patients with TB (62.3% with positive sputum smear) and HIV/AIDS was present in 56.9% of patients. The validated CART model showed sensitivity, specificity, positive predictive value and negative predictive value of 60.00%, 76.16%, 33.33%, and 90.55%, respectively. The AUC was 79.70%. The CART model developed for these hospitalized patients with clinical suspicion of TB had fair to good predictive performance for pulmonary TB. The most important variable for prediction of TB diagnosis was chest radiograph results. Prospective validation is still necessary, but our model offer an alternative for decision making in whether to isolate patients with clinical suspicion of TB in tertiary health facilities in

  11. Schistosoma mansoni reinfection: Analysis of risk factors by classification and regression tree (CART modeling.

    Directory of Open Access Journals (Sweden)

    Andréa Gazzinelli

    Full Text Available Praziquantel (PZQ is an effective chemotherapy for schistosomiasis mansoni and a mainstay for its control and potential elimination. However, it does not prevent against reinfection, which can occur rapidly in areas with active transmission. A guide to ranking the risk factors for Schistosoma mansoni reinfection would greatly contribute to prioritizing resources and focusing prevention and control measures to prevent rapid reinfection. The objective of the current study was to explore the relationship among the socioeconomic, demographic, and epidemiological factors that can influence reinfection by S. mansoni one year after successful treatment with PZQ in school-aged children in Northeastern Minas Gerais state Brazil. Parasitological, socioeconomic, demographic, and water contact information were surveyed in 506 S. mansoni-infected individuals, aged 6 to 15 years, resident in these endemic areas. Eligible individuals were treated with PZQ until they were determined to be negative by the absence of S. mansoni eggs in the feces on two consecutive days of Kato-Katz fecal thick smear. These individuals were surveyed again 12 months from the date of successful treatment with PZQ. A classification and regression tree modeling (CART was then used to explore the relationship between socioeconomic, demographic, and epidemiological variables and their reinfection status. The most important risk factor identified for S. mansoni reinfection was their "heavy" infection at baseline. Additional analyses, excluding heavy infection status, showed that lower socioeconomic status and a lower level of education of the household head were also most important risk factors for S. mansoni reinfection. Our results provide an important contribution toward the control and possible elimination of schistosomiasis by identifying three major risk factors that can be used for targeted treatment and monitoring of reinfection. We suggest that control measures that target

  12. A Visual Analytics Approach for Correlation, Classification, and Regression Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Steed, Chad A [ORNL; SwanII, J. Edward [Mississippi State University (MSU); Fitzpatrick, Patrick J. [Mississippi State University (MSU); Jankun-Kelly, T.J. [Mississippi State University (MSU)

    2012-02-01

    New approaches that combine the strengths of humans and machines are necessary to equip analysts with the proper tools for exploring today's increasing complex, multivariate data sets. In this paper, a novel visual data mining framework, called the Multidimensional Data eXplorer (MDX), is described that addresses the challenges of today's data by combining automated statistical analytics with a highly interactive parallel coordinates based canvas. In addition to several intuitive interaction capabilities, this framework offers a rich set of graphical statistical indicators, interactive regression analysis, visual correlation mining, automated axis arrangements and filtering, and data classification techniques. The current work provides a detailed description of the system as well as a discussion of key design aspects and critical feedback from domain experts.

  13. A Practical pedestrian approach to parsimonious regression with inaccurate inputs

    Directory of Open Access Journals (Sweden)

    Seppo Karrila

    2014-04-01

    Full Text Available A measurement result often dictates an interval containing the correct value. Interval data is also created by roundoff, truncation, and binning. We focus on such common interval uncertainty in data. Inaccuracy in model inputs is typically ignored on model fitting. We provide a practical approach for regression with inaccurate data: the mathematics is easy, and the linear programming formulations simple to use even in a spreadsheet. This self-contained elementary presentation introduces interval linear systems and requires only basic knowledge of algebra. Feature selection is automatic; but can be controlled to find only a few most relevant inputs; and joint feature selection is enabled for multiple modeled outputs. With more features than cases, a novel connection to compressed sensing emerges: robustness against interval errors-in-variables implies model parsimony, and the input inaccuracies determine the regularization term. A small numerical example highlights counterintuitive results and a dramatic difference to total least squares.

  14. Does intense monitoring matter? A quantile regression approach

    Directory of Open Access Journals (Sweden)

    Fekri Ali Shawtari

    2017-06-01

    Full Text Available Corporate governance has become a centre of attention in corporate management at both micro and macro levels due to adverse consequences and repercussion of insufficient accountability. In this study, we include the Malaysian stock market as sample to explore the impact of intense monitoring on the relationship between intellectual capital performance and market valuation. The objectives of the paper are threefold: i to investigate whether intense monitoring affects the intellectual capital performance of listed companies; ii to explore the impact of intense monitoring on firm value; iii to examine the extent to which the directors serving more than two board committees affects the linkage between intellectual capital performance and firms' value. We employ two approaches, namely, the Ordinary Least Square (OLS and the quantile regression approach. The purpose of the latter is to estimate and generate inference about conditional quantile functions. This method is useful when the conditional distribution does not have a standard shape such as an asymmetric, fat-tailed, or truncated distribution. In terms of variables, the intellectual capital is measured using the value added intellectual coefficient (VAIC, while the market valuation is proxied by firm's market capitalization. The findings of the quantile regression shows that some of the results do not coincide with the results of OLS. We found that intensity of monitoring does not influence the intellectual capital of all firms. It is also evident that intensity of monitoring does not influence the market valuation. However, to some extent, it moderates the relationship between intellectual capital performance and market valuation. This paper contributes to the existing literature as it presents new empirical evidences on the moderating effects of the intensity of monitoring of the board committees on the relationship between performance and intellectual capital.

  15. Multi-scale remote sensing sagebrush characterization with regression trees over Wyoming, USA: laying a foundation for monitoring

    Science.gov (United States)

    Homer, Collin G.; Aldridge, Cameron L.; Meyer, Debra K.; Schell, Spencer J.

    2012-01-01

    agebrush ecosystems in North America have experienced extensive degradation since European settlement. Further degradation continues from exotic invasive plants, altered fire frequency, intensive grazing practices, oil and gas development, and climate change – adding urgency to the need for ecosystem-wide understanding. Remote sensing is often identified as a key information source to facilitate ecosystem-wide characterization, monitoring, and analysis; however, approaches that characterize sagebrush with sufficient and accurate local detail across large enough areas to support this paradigm are unavailable. We describe the development of a new remote sensing sagebrush characterization approach for the state of Wyoming, U.S.A. This approach integrates 2.4 m QuickBird, 30 m Landsat TM, and 56 m AWiFS imagery into the characterization of four primary continuous field components including percent bare ground, percent herbaceous cover, percent litter, and percent shrub, and four secondary components including percent sagebrush (Artemisia spp.), percent big sagebrush (Artemisia tridentata), percent Wyoming sagebrush (Artemisia tridentata Wyomingensis), and shrub height using a regression tree. According to an independent accuracy assessment, primary component root mean square error (RMSE) values ranged from 4.90 to 10.16 for 2.4 m QuickBird, 6.01 to 15.54 for 30 m Landsat, and 6.97 to 16.14 for 56 m AWiFS. Shrub and herbaceous components outperformed the current data standard called LANDFIRE, with a shrub RMSE value of 6.04 versus 12.64 and a herbaceous component RMSE value of 12.89 versus 14.63. This approach offers new advancements in sagebrush characterization from remote sensing and provides a foundation to quantitatively monitor these components into the future.

  16. Comprehensive database of diameter-based biomass regressions for North American tree species

    Science.gov (United States)

    Jennifer C. Jenkins; David C. Chojnacky; Linda S. Heath; Richard A. Birdsey

    2004-01-01

    A database consisting of 2,640 equations compiled from the literature for predicting the biomass of trees and tree components from diameter measurements of species found in North America. Bibliographic information, geographic locations, diameter limits, diameter and biomass units, equation forms, statistical errors, and coefficients are provided for each equation,...

  17. Bayesian logistic regression approaches to predict incorrect DRG assignment.

    Science.gov (United States)

    Suleiman, Mani; Demirhan, Haydar; Boyd, Leanne; Girosi, Federico; Aksakalli, Vural

    2018-05-07

    Episodes of care involving similar diagnoses and treatments and requiring similar levels of resource utilisation are grouped to the same Diagnosis-Related Group (DRG). In jurisdictions which implement DRG based payment systems, DRGs are a major determinant of funding for inpatient care. Hence, service providers often dedicate auditing staff to the task of checking that episodes have been coded to the correct DRG. The use of statistical models to estimate an episode's probability of DRG error can significantly improve the efficiency of clinical coding audits. This study implements Bayesian logistic regression models with weakly informative prior distributions to estimate the likelihood that episodes require a DRG revision, comparing these models with each other and to classical maximum likelihood estimates. All Bayesian approaches had more stable model parameters than maximum likelihood. The best performing Bayesian model improved overall classification per- formance by 6% compared to maximum likelihood, with a 34% gain compared to random classification, respectively. We found that the original DRG, coder and the day of coding all have a significant effect on the likelihood of DRG error. Use of Bayesian approaches has improved model parameter stability and classification accuracy. This method has already lead to improved audit efficiency in an operational capacity.

  18. Identifying Domain-General and Domain-Specific Predictors of Low Mathematics Performance: A Classification and Regression Tree Analysis

    Directory of Open Access Journals (Sweden)

    David J. Purpura

    2017-12-01

    Full Text Available Many children struggle to successfully acquire early mathematics skills. Theoretical and empirical evidence has pointed to deficits in domain-specific skills (e.g., non-symbolic mathematics skills or domain-general skills (e.g., executive functioning and language as underlying low mathematical performance. In the current study, we assessed a sample of 113 three- to five-year old preschool children on a battery of domain-specific and domain-general factors in the fall and spring of their preschool year to identify Time 1 (fall factors associated with low performance in mathematics knowledge at Time 2 (spring. We used the exploratory approach of classification and regression tree analyses, a strategy that uses step-wise partitioning to create subgroups from a larger sample using multiple predictors, to identify the factors that were the strongest classifiers of low performance for younger and older preschool children. Results indicated that the most consistent classifier of low mathematics performance at Time 2 was children’s Time 1 mathematical language skills. Further, other distinct classifiers of low performance emerged for younger and older children. These findings suggest that risk classification for low mathematics performance may differ depending on children’s age.

  19. Regression Benchmarking: An Approach to Quality Assurance in Performance

    OpenAIRE

    Bulej, Lubomír

    2005-01-01

    The paper presents a short summary of our work in the area of regression benchmarking and its application to software development. Specially, we explain the concept of regression benchmarking, the requirements for employing regression testing in a software project, and methods used for analyzing the vast amounts of data resulting from repeated benchmarking. We present the application of regression benchmarking on a real software project and conclude with a glimpse at the challenges for the fu...

  20. Classification and regression tree (CART model to predict pulmonary tuberculosis in hospitalized patients

    Directory of Open Access Journals (Sweden)

    Aguiar Fabio S

    2012-08-01

    Full Text Available Abstract Background Tuberculosis (TB remains a public health issue worldwide. The lack of specific clinical symptoms to diagnose TB makes the correct decision to admit patients to respiratory isolation a difficult task for the clinician. Isolation of patients without the disease is common and increases health costs. Decision models for the diagnosis of TB in patients attending hospitals can increase the quality of care and decrease costs, without the risk of hospital transmission. We present a predictive model for predicting pulmonary TB in hospitalized patients in a high prevalence area in order to contribute to a more rational use of isolation rooms without increasing the risk of transmission. Methods Cross sectional study of patients admitted to CFFH from March 2003 to December 2004. A classification and regression tree (CART model was generated and validated. The area under the ROC curve (AUC, sensitivity, specificity, positive and negative predictive values were used to evaluate the performance of model. Validation of the model was performed with a different sample of patients admitted to the same hospital from January to December 2005. Results We studied 290 patients admitted with clinical suspicion of TB. Diagnosis was confirmed in 26.5% of them. Pulmonary TB was present in 83.7% of the patients with TB (62.3% with positive sputum smear and HIV/AIDS was present in 56.9% of patients. The validated CART model showed sensitivity, specificity, positive predictive value and negative predictive value of 60.00%, 76.16%, 33.33%, and 90.55%, respectively. The AUC was 79.70%. Conclusions The CART model developed for these hospitalized patients with clinical suspicion of TB had fair to good predictive performance for pulmonary TB. The most important variable for prediction of TB diagnosis was chest radiograph results. Prospective validation is still necessary, but our model offer an alternative for decision making in whether to isolate patients with

  1. Multiple Additive Regression Trees a Methodology for Predictive Data Mining for Fraud Detection

    National Research Council Canada - National Science Library

    da

    2002-01-01

    ...) is using new and innovative techniques for fraud detection. Their primary techniques for fraud detection are the data mining tools of classification trees and neural networks as well as methods for pooling the results of multiple model fits...

  2. [Application of regression tree in analyzing the effects of climate factors on NDVI in loess hilly area of Shaanxi Province].

    Science.gov (United States)

    Liu, Yang; Lü, Yi-he; Zheng, Hai-feng; Chen, Li-ding

    2010-05-01

    Based on the 10-day SPOT VEGETATION NDVI data and the daily meteorological data from 1998 to 2007 in Yan' an City, the main meteorological variables affecting the annual and interannual variations of NDVI were determined by using regression tree. It was found that the effects of test meteorological variables on the variability of NDVI differed with seasons and time lags. Temperature and precipitation were the most important meteorological variables affecting the annual variation of NDVI, and the average highest temperature was the most important meteorological variable affecting the inter-annual variation of NDVI. Regression tree was very powerful in determining the key meteorological variables affecting NDVI variation, but could not build quantitative relations between NDVI and meteorological variables, which limited its further and wider application.

  3. Testing for Stock Market Contagion: A Quantile Regression Approach

    NARCIS (Netherlands)

    S.Y. Park (Sung); W. Wang (Wendun); N. Huang (Naijing)

    2015-01-01

    markdownabstract__Abstract__ Regarding the asymmetric and leptokurtic behavior of financial data, we propose a new contagion test in the quantile regression framework that is robust to model misspecification. Unlike conventional correlation-based tests, the proposed quantile contagion test

  4. Identifying predictors of physics item difficulty: A linear regression approach

    Science.gov (United States)

    Mesic, Vanes; Muratovic, Hasnija

    2011-06-01

    Large-scale assessments of student achievement in physics are often approached with an intention to discriminate students based on the attained level of their physics competencies. Therefore, for purposes of test design, it is important that items display an acceptable discriminatory behavior. To that end, it is recommended to avoid extraordinary difficult and very easy items. Knowing the factors that influence physics item difficulty makes it possible to model the item difficulty even before the first pilot study is conducted. Thus, by identifying predictors of physics item difficulty, we can improve the test-design process. Furthermore, we get additional qualitative feedback regarding the basic aspects of student cognitive achievement in physics that are directly responsible for the obtained, quantitative test results. In this study, we conducted a secondary analysis of data that came from two large-scale assessments of student physics achievement at the end of compulsory education in Bosnia and Herzegovina. Foremost, we explored the concept of “physics competence” and performed a content analysis of 123 physics items that were included within the above-mentioned assessments. Thereafter, an item database was created. Items were described by variables which reflect some basic cognitive aspects of physics competence. For each of the assessments, Rasch item difficulties were calculated in separate analyses. In order to make the item difficulties from different assessments comparable, a virtual test equating procedure had to be implemented. Finally, a regression model of physics item difficulty was created. It has been shown that 61.2% of item difficulty variance can be explained by factors which reflect the automaticity, complexity, and modality of the knowledge structure that is relevant for generating the most probable correct solution, as well as by the divergence of required thinking and interference effects between intuitive and formal physics knowledge

  5. Identifying predictors of physics item difficulty: A linear regression approach

    Directory of Open Access Journals (Sweden)

    Hasnija Muratovic

    2011-06-01

    Full Text Available Large-scale assessments of student achievement in physics are often approached with an intention to discriminate students based on the attained level of their physics competencies. Therefore, for purposes of test design, it is important that items display an acceptable discriminatory behavior. To that end, it is recommended to avoid extraordinary difficult and very easy items. Knowing the factors that influence physics item difficulty makes it possible to model the item difficulty even before the first pilot study is conducted. Thus, by identifying predictors of physics item difficulty, we can improve the test-design process. Furthermore, we get additional qualitative feedback regarding the basic aspects of student cognitive achievement in physics that are directly responsible for the obtained, quantitative test results. In this study, we conducted a secondary analysis of data that came from two large-scale assessments of student physics achievement at the end of compulsory education in Bosnia and Herzegovina. Foremost, we explored the concept of “physics competence” and performed a content analysis of 123 physics items that were included within the above-mentioned assessments. Thereafter, an item database was created. Items were described by variables which reflect some basic cognitive aspects of physics competence. For each of the assessments, Rasch item difficulties were calculated in separate analyses. In order to make the item difficulties from different assessments comparable, a virtual test equating procedure had to be implemented. Finally, a regression model of physics item difficulty was created. It has been shown that 61.2% of item difficulty variance can be explained by factors which reflect the automaticity, complexity, and modality of the knowledge structure that is relevant for generating the most probable correct solution, as well as by the divergence of required thinking and interference effects between intuitive and formal

  6. Identifying Generalizable Image Segmentation Parameters for Urban Land Cover Mapping through Meta-Analysis and Regression Tree Modeling

    Directory of Open Access Journals (Sweden)

    Brian A. Johnson

    2018-01-01

    Full Text Available The advent of very high resolution (VHR satellite imagery and the development of Geographic Object-Based Image Analysis (GEOBIA have led to many new opportunities for fine-scale land cover mapping, especially in urban areas. Image segmentation is an important step in the GEOBIA framework, so great time/effort is often spent to ensure that computer-generated image segments closely match real-world objects of interest. In the remote sensing community, segmentation is frequently performed using the multiresolution segmentation (MRS algorithm, which is tuned through three user-defined parameters (the scale, shape/color, and compactness/smoothness parameters. The scale parameter (SP is the most important parameter and governs the average size of generated image segments. Existing automatic methods to determine suitable SPs for segmentation are scene-specific and often computationally intensive, so an approach to estimating appropriate SPs that is generalizable (i.e., not scene-specific could speed up the GEOBIA workflow considerably. In this study, we attempted to identify generalizable SPs for five common urban land cover types (buildings, vegetation, roads, bare soil, and water through meta-analysis and nonlinear regression tree (RT modeling. First, we performed a literature search of recent studies that employed GEOBIA for urban land cover mapping and extracted the MRS parameters used, the image properties (i.e., spatial and radiometric resolutions, and the land cover classes mapped. Using this data extracted from the literature, we constructed RT models for each land cover class to predict suitable SP values based on the: image spatial resolution, image radiometric resolution, shape/color parameter, and compactness/smoothness parameter. Based on a visual and quantitative analysis of results, we found that for all land cover classes except water, relatively accurate SPs could be identified using our RT modeling results. The main advantage of our

  7. On the quantitative relationships between environmental parameters and heavy metals pollution in Mediterranean soils using GIS regression-trees

    DEFF Research Database (Denmark)

    Bou Kheir, Rania; Shomar, B.; Greve, Mogens Humlekrog

    2014-01-01

    Soil heavy metal pollution has been and continues to be a worldwide phenomenon that has attracted a great deal of attention from governments and regulatory bodies. In this context, our study used Geographic Information Systems (GIS) and regression-tree modeling (196 trees) to precisely quantify...... the relationships between four toxic heavy metals (Ni, Cr, Cd and As) and sixteen environmental parameters (e.g., parent material, slope gradient, proximity to roads, etc.) in the soils of northern Lebanon (as a case study of Mediterranean landscapes), and to detect the most important parameters that can be used...... between 68% and 100%), surroundings of waste areas (48 – 92%), proximity to roads (45 – 82%) and parent materials (57 – 73%) considerably influenced all investigated heavy metals, which is not the case of hydromorphological and soil properties. For instance, hydraulic conductivity (18 – 41%) and pH (23...

  8. Computer-oriented approach to fault-tree construction

    International Nuclear Information System (INIS)

    Salem, S.L.; Apostolakis, G.E.; Okrent, D.

    1976-11-01

    A methodology for systematically constructing fault trees for general complex systems is developed and applied, via the Computer Automated Tree (CAT) program, to several systems. A means of representing component behavior by decision tables is presented. The method developed allows the modeling of components with various combinations of electrical, fluid and mechanical inputs and outputs. Each component can have multiple internal failure mechanisms which combine with the states of the inputs to produce the appropriate output states. The generality of this approach allows not only the modeling of hardware, but human actions and interactions as well. A procedure for constructing and editing fault trees, either manually or by computer, is described. The techniques employed result in a complete fault tree, in standard form, suitable for analysis by current computer codes. Methods of describing the system, defining boundary conditions and specifying complex TOP events are developed in order to set up the initial configuration for which the fault tree is to be constructed. The approach used allows rapid modifications of the decision tables and systems to facilitate the analysis and comparison of various refinements and changes in the system configuration and component modeling

  9. Fuzzy set theoretic approach to fault tree analysis | Tyagi ...

    African Journals Online (AJOL)

    This approach can be widely used to improve the reliability and to reduce the operating cost of a system. The proposed techniques are discussed and illustrated by taking an example of a nuclear power plant. Keywords: Fault tree, Triangular and Trapezoidal fuzzy number, Fuzzy importance, Ranking of fuzzy numbers ...

  10. Analisis Perbandingan Teknik Support Vector Regression (SVR) Dan Decision Tree C4.5 Dalam Data Mining

    OpenAIRE

    Astuti, Yuniar Andi

    2011-01-01

    This study examines techniques Support Vector Regression and Decision Tree C4.5 has been used in studies in various fields, in order to know the advantages and disadvantages of both techniques that appear in Data Mining. From the ten studies that use both techniques, the results of the analysis showed that the accuracy of the SVR technique for 59,64% and C4.5 for 76,97% So in this study obtained a statement that C4.5 is better than SVR 097038020

  11. A Comparison of Logistic Regression, Neural Networks, and Classification Trees Predicting Success of Actuarial Students

    Science.gov (United States)

    Schumacher, Phyllis; Olinsky, Alan; Quinn, John; Smith, Richard

    2010-01-01

    The authors extended previous research by 2 of the authors who conducted a study designed to predict the successful completion of students enrolled in an actuarial program. They used logistic regression to determine the probability of an actuarial student graduating in the major or dropping out. They compared the results of this study with those…

  12. New machine learning tools for predictive vegetation mapping after climate change: Bagging and Random Forest perform better than Regression Tree Analysis

    Science.gov (United States)

    L.R. Iverson; A.M. Prasad; A. Liaw

    2004-01-01

    More and better machine learning tools are becoming available for landscape ecologists to aid in understanding species-environment relationships and to map probable species occurrence now and potentially into the future. To thal end, we evaluated three statistical models: Regression Tree Analybib (RTA), Bagging Trees (BT) and Random Forest (RF) for their utility in...

  13. APPLICATION OF MULTIPLE LOGISTIC REGRESSION, BAYESIAN LOGISTIC AND CLASSIFICATION TREE TO IDENTIFY THE SIGNIFICANT FACTORS INFLUENCING CRASH SEVERITY

    Directory of Open Access Journals (Sweden)

    MILAD TAZIK

    2017-11-01

    Full Text Available Identifying cases in which road crashes result in fatality or injury of drivers may help improve their safety. In this study, datasets of crashes happened in TehranQom freeway, Iran, were examined by three models (multiple logistic regression, Bayesian logistic and classification tree to analyse the contribution of several variables to fatal accidents. For multiple logistic regression and Bayesian logistic models, the odds ratio was calculated for each variable. The model which best suited the identification of accident severity was determined based on AIC and DIC criteria. Based on the results of these two models, rollover crashes (OR = 14.58, %95 CI: 6.8-28.6, not using of seat belt (OR = 5.79, %95 CI: 3.1-9.9, exceeding speed limits (OR = 4.02, %95 CI: 1.8-7.9 and being female (OR = 2.91, %95 CI: 1.1-6.1 were the most important factors in fatalities of drivers. In addition, the results of the classification tree model have verified the findings of the other models.

  14. A novel dendrochronological approach reveals drivers of carbon sequestration in tree species of riparian forests across spatiotemporal scales.

    Science.gov (United States)

    Rieger, Isaak; Kowarik, Ingo; Cherubini, Paolo; Cierjacks, Arne

    2017-01-01

    Aboveground carbon (C) sequestration in trees is important in global C dynamics, but reliable techniques for its modeling in highly productive and heterogeneous ecosystems are limited. We applied an extended dendrochronological approach to disentangle the functioning of drivers from the atmosphere (temperature, precipitation), the lithosphere (sedimentation rate), the hydrosphere (groundwater table, river water level fluctuation), the biosphere (tree characteristics), and the anthroposphere (dike construction). Carbon sequestration in aboveground biomass of riparian Quercus robur L. and Fraxinus excelsior L. was modeled (1) over time using boosted regression tree analysis (BRT) on cross-datable trees characterized by equal annual growth ring patterns and (2) across space using a subsequent classification and regression tree analysis (CART) on cross-datable and not cross-datable trees. While C sequestration of cross-datable Q. robur responded to precipitation and temperature, cross-datable F. excelsior also responded to a low Danube river water level. However, CART revealed that C sequestration over time is governed by tree height and parameters that vary over space (magnitude of fluctuation in the groundwater table, vertical distance to mean river water level, and longitudinal distance to upstream end of the study area). Thus, a uniform response to climatic drivers of aboveground C sequestration in Q. robur was only detectable in trees of an intermediate height class and in taller trees (>21.8m) on sites where the groundwater table fluctuated little (≤0.9m). The detection of climatic drivers and the river water level in F. excelsior depended on sites at lower altitudes above the mean river water level (≤2.7m) and along a less dynamic downstream section of the study area. Our approach indicates unexploited opportunities of understanding the interplay of different environmental drivers in aboveground C sequestration. Results may support species-specific and

  15. The price sensitivity of Medicare beneficiaries: a regression discontinuity approach.

    Science.gov (United States)

    Buchmueller, Thomas C; Grazier, Kyle; Hirth, Richard A; Okeke, Edward N

    2013-01-01

    We use 4 years of data from the retiree health benefits program of the University of Michigan to estimate the effect of price on the health plan choices of Medicare beneficiaries. During the period of our analysis, changes in the University's premium contribution rules led to substantial price changes. A key feature of this 'natural experiment' is that individuals who had retired before a certain date were exempted from having to pay any premium contributions. This 'grandfathering' creates quasi-experimental variation that is ideal for estimating the effect of price. Using regression discontinuity methods, we compare the plan choices of individuals who retired just after the grandfathering cutoff date and were therefore exposed to significant price changes to the choices of a 'control group' of individuals who retired just before that date and therefore did not experience the price changes. The results indicate a statistically significant effect of price, with a $10 increase in monthly premium contributions leading to a 2 to 3 percentage point decrease in a plan's market share. Copyright © 2012 John Wiley & Sons, Ltd.

  16. Efficient and robust cell detection: A structured regression approach.

    Science.gov (United States)

    Xie, Yuanpu; Xing, Fuyong; Shi, Xiaoshuang; Kong, Xiangfei; Su, Hai; Yang, Lin

    2018-02-01

    Efficient and robust cell detection serves as a critical prerequisite for many subsequent biomedical image analysis methods and computer-aided diagnosis (CAD). It remains a challenging task due to touching cells, inhomogeneous background noise, and large variations in cell sizes and shapes. In addition, the ever-increasing amount of available datasets and the high resolution of whole-slice scanned images pose a further demand for efficient processing algorithms. In this paper, we present a novel structured regression model based on a proposed fully residual convolutional neural network for efficient cell detection. For each testing image, our model learns to produce a dense proximity map that exhibits higher responses at locations near cell centers. Our method only requires a few training images with weak annotations (just one dot indicating the cell centroids). We have extensively evaluated our method using four different datasets, covering different microscopy staining methods (e.g., H & E or Ki-67 staining) or image acquisition techniques (e.g., bright-filed image or phase contrast). Experimental results demonstrate the superiority of our method over existing state of the art methods in terms of both detection accuracy and running time. Copyright © 2017. Published by Elsevier B.V.

  17. Neighborhood Effects in Wind Farm Performance: A Regression Approach

    Directory of Open Access Journals (Sweden)

    Matthias Ritter

    2017-03-01

    Full Text Available The optimization of turbine density in wind farms entails a trade-off between the usage of scarce, expensive land and power losses through turbine wake effects. A quantification and prediction of the wake effect, however, is challenging because of the complex aerodynamic nature of the interdependencies of turbines. In this paper, we propose a parsimonious data driven regression wake model that can be used to predict production losses of existing and potential wind farms. Motivated by simple engineering wake models, the predicting variables are wind speed, the turbine alignment angle, and distance. By utilizing data from two wind farms in Germany, we show that our models can compete with the standard Jensen model in predicting wake effect losses. A scenario analysis reveals that a distance between turbines can be reduced by up to three times the rotor size, without entailing substantial production losses. In contrast, an unfavorable configuration of turbines with respect to the main wind direction can result in production losses that are much higher than in an optimal case.

  18. Multi-site solar power forecasting using gradient boosted regression trees

    DEFF Research Database (Denmark)

    Persson, Caroline Stougård; Bacher, Peder; Shiga, Takahiro

    2017-01-01

    The challenges to optimally utilize weather dependent renewable energy sources call for powerful tools for forecasting. This paper presents a non-parametric machine learning approach used for multi-site prediction of solar power generation on a forecast horizon of one to six hours. Historical pow...

  19. Online monitoring and conditional regression tree test: Useful tools for a better understanding of combined sewer network behavior.

    Science.gov (United States)

    Bersinger, T; Bareille, G; Pigot, T; Bru, N; Le Hécho, I

    2018-06-01

    A good knowledge of the dynamic of pollutant concentration and flux in a combined sewer network is necessary when considering solutions to limit the pollutants discharged by combined sewer overflow (CSO) into receiving water during wet weather. Identification of the parameters that influence pollutant concentration and flux is important. Nevertheless, few studies have obtained satisfactory results for the identification of these parameters using statistical tools. Thus, this work uses a large database of rain events (116 over one year) obtained via continuous measurement of rainfall, discharge flow and chemical oxygen demand (COD) estimated using online turbidity for the identification of these parameters. We carried out a statistical study of the parameters influencing the maximum COD concentration, the discharge flow and the discharge COD flux. In this study a new test was used that has never been used in this field: the conditional regression tree test. We have demonstrated that the antecedent dry weather period, the rain event average intensity and the flow before the event are the three main factors influencing the maximum COD concentration during a rainfall event. Regarding the discharge flow, it is mainly influenced by the overall rainfall height but not by the maximum rainfall intensity. Finally, COD discharge flux is influenced by the discharge volume and the maximum COD concentration. Regression trees seem much more appropriate than common tests like PCA and PLS for this type of study as they take into account the thresholds and cumulative effects of various parameters as a function of the target variable. These results could help to improve sewer and CSO management in order to decrease the discharge of pollutants into receiving waters. Copyright © 2017 Elsevier B.V. All rights reserved.

  20. Use of GLM approach to assess the responses of tropical trees to urban air pollution in relation to leaf functional traits and tree characteristics.

    Science.gov (United States)

    Mukherjee, Arideep; Agrawal, Madhoolika

    2018-05-15

    Responses of urban vegetation to air pollution stress in relation to their tolerance and sensitivity have been extensively studied, however, studies related to air pollution responses based on different leaf functional traits and tree characteristics are limited. In this paper, we have tried to assess combined and individual effects of major air pollutants PM 10 (particulate matter ≤ 10 µm), TSP (total suspended particulate matter), SO 2 (sulphur dioxide), NO 2 (nitrogen dioxide) and O 3 (ozone) on thirteen tropical tree species in relation to fifteen leaf functional traits and different tree characteristics. Stepwise linear regression a general linear modelling approach was used to quantify the pollution response of trees against air pollutants. The study was performed for six successive seasons for two years in three distinct urban areas (traffic, industrial and residential) of Varanasi city in India. At all the study sites, concentrations of air pollutants, specifically PM (particulate matter) and NO 2 were above the specified standards. Distinct variations were recorded in all the fifteen leaf functional traits with pollution load. Caesalpinia sappan was identified as most tolerant species followed by Psidium guajava, Dalbergia sissoo and Albizia lebbeck. Stepwise regression analysis identified maximum response of Eucalyptus citriodora and P. guajava to air pollutants explaining overall 59% and 58% variability's in leaf functional traits, respectively. Among leaf functional traits, maximum effect of air pollutants was observed on non-enzymatic antioxidants followed by photosynthetic pigments and leaf water status. Among the pollutants, PM was identified as the major stress factor followed by O 3 explaining 47% and 33% variability's in leaf functional traits. Tolerance and pollution response were regulated by different tree characteristics such as height, canopy size, leaf from, texture and nature of tree. Outcomes of this study will help in urban forest

  1. Detection of Differential Item Functioning with Nonlinear Regression: A Non-IRT Approach Accounting for Guessing

    Science.gov (United States)

    Drabinová, Adéla; Martinková, Patrícia

    2017-01-01

    In this article we present a general approach not relying on item response theory models (non-IRT) to detect differential item functioning (DIF) in dichotomous items with presence of guessing. The proposed nonlinear regression (NLR) procedure for DIF detection is an extension of method based on logistic regression. As a non-IRT approach, NLR can…

  2. A Demographic Approach to Evaluating Tree Population Sustainability

    Directory of Open Access Journals (Sweden)

    Corey R. Halpin

    2017-02-01

    Full Text Available Quantitative criteria for assessing demographic sustainability of tree populations would be useful in forest conservation, as climate change and a growing complex of invasive pests are likely to drive forests outside their historic range of variability. In this paper, we used CANOPY, a spatially explicit, individual‐tree model, to examine the effects of initial size distributions on sustainability of tree populations for 70 northern hardwood stands under current environmental conditions. A demographic sustainability index was calculated as the ratio of future simulated basal area to current basal area, given current demographic structure and density‐dependent demographic equations. Only steeply descending size distributions were indicated to be moderately or highly sustainable (final basal area/initial basal area ≥0.7 over several tree generations. Five of the six principal species had demographic sustainability index values of <0.6 in 40%–84% of the stands. However, at a small landscape scale, nearly all species had mean index values >1. Simulation experiments suggested that a minimum sapling density of 300 per hectare was required to sustain the initial basal area, but further increases in sapling density did not increase basal area because of coincident increases in mortality. A variable slope with high q‐ratios in small size classes was needed to maintain the existing overstory of mature and old‐growth stands. This analytical approach may be useful in identifying stands needing restoration treatments to maintain existing species composition in situations where forests are likely to have future recruitment limitations.

  3. Urban tree mortality: a primer on demographic approaches

    Science.gov (United States)

    Lara A. Roman; John J. Battles; Joe R. McBride

    2016-01-01

    Realizing the benefits of tree planting programs depends on tree survival. Projections of urban forest ecosystem services and cost-benefit analyses are sensitive to assumptions about tree mortality rates. Long-term mortality data are needed to improve the accuracy of these models and optimize the public investment in tree planting. With more accurate population...

  4. Design and analysis of experiments classical and regression approaches with SAS

    CERN Document Server

    Onyiah, Leonard C

    2008-01-01

    Introductory Statistical Inference and Regression Analysis Elementary Statistical Inference Regression Analysis Experiments, the Completely Randomized Design (CRD)-Classical and Regression Approaches Experiments Experiments to Compare Treatments Some Basic Ideas Requirements of a Good Experiment One-Way Experimental Layout or the CRD: Design and Analysis Analysis of Experimental Data (Fixed Effects Model) Expected Values for the Sums of Squares The Analysis of Variance (ANOVA) Table Follow-Up Analysis to Check fo

  5. Decision Tree Approach to Discovering Fraud in Leasing Agreements

    Directory of Open Access Journals (Sweden)

    Horvat Ivan

    2014-09-01

    Full Text Available Background: Fraud attempts create large losses for financing subjects in modern economies. At the same time, leasing agreements have become more and more popular as a means of financing objects such as machinery and vehicles, but are more vulnerable to fraud attempts. Objectives: The goal of the paper is to estimate the usability of the data mining approach in discovering fraud in leasing agreements. Methods/Approach: Real-world data from one Croatian leasing firm was used for creating tow models for fraud detection in leasing. The decision tree method was used for creating a classification model, and the CHAID algorithm was deployed. Results: The decision tree model has indicated that the object of the leasing agreement had the strongest impact on the probability of fraud. Conclusions: In order to enhance the probability of the developed model, it would be necessary to develop software that would enable automated, quick and transparent retrieval of data from the system, processing according to the rules and displaying the results in multiple categories.

  6. Using multiobjective tradeoff sets and Multivariate Regression Trees to identify critical and robust decisions for long term water utility planning

    Science.gov (United States)

    Smith, R.; Kasprzyk, J. R.; Balaji, R.

    2017-12-01

    In light of deeply uncertain factors like future climate change and population shifts, responsible resource management will require new types of information and strategies. For water utilities, this entails potential expansion and efficient management of water supply infrastructure systems for changes in overall supply; changes in frequency and severity of climate extremes such as droughts and floods; and variable demands, all while accounting for conflicting long and short term performance objectives. Multiobjective Evolutionary Algorithms (MOEAs) are emerging decision support tools that have been used by researchers and, more recently, water utilities to efficiently generate and evaluate thousands of planning portfolios. The tradeoffs between conflicting objectives are explored in an automated way to produce (often large) suites of portfolios that strike different balances of performance. Once generated, the sets of optimized portfolios are used to support relatively subjective assertions of priorities and human reasoning, leading to adoption of a plan. These large tradeoff sets contain information about complex relationships between decisions and between groups of decisions and performance that, until now, has not been quantitatively described. We present a novel use of Multivariate Regression Trees (MRTs) to analyze tradeoff sets to reveal these relationships and critical decisions. Additionally, when MRTs are applied to tradeoff sets developed for different realizations of an uncertain future, they can identify decisions that are robust across a wide range of conditions and produce fundamental insights about the system being optimized.

  7. Identifying the critical success factors in the coverage of low vision services using the classification analysis and regression tree methodology.

    Science.gov (United States)

    Chiang, Peggy Pei-Chia; Xie, Jing; Keeffe, Jill Elizabeth

    2011-04-25

    To identify the critical success factors (CSF) associated with coverage of low vision services. Data were collected from a survey distributed to Vision 2020 contacts, government, and non-government organizations (NGOs) in 195 countries. The Classification and Regression Tree Analysis (CART) was used to identify the critical success factors of low vision service coverage. Independent variables were sourced from the survey: policies, epidemiology, provision of services, equipment and infrastructure, barriers to services, human resources, and monitoring and evaluation. Socioeconomic and demographic independent variables: health expenditure, population statistics, development status, and human resources in general, were sourced from the World Health Organization (WHO), World Bank, and the United Nations (UN). The findings identified that having >50% of children obtaining devices when prescribed (χ(2) = 44; P 3 rehabilitation workers per 10 million of population (χ(2) = 4.50; P = 0.034), higher percentage of population urbanized (χ(2) = 14.54; P = 0.002), a level of private investment (χ(2) = 14.55; P = 0.015), and being fully funded by government (χ(2) = 6.02; P = 0.014), are critical success factors associated with coverage of low vision services. This study identified the most important predictors for countries with better low vision coverage. The CART is a useful and suitable methodology in survey research and is a novel way to simplify a complex global public health issue in eye care.

  8. Modelling daily dissolved oxygen concentration using least square support vector machine, multivariate adaptive regression splines and M5 model tree

    Science.gov (United States)

    Heddam, Salim; Kisi, Ozgur

    2018-04-01

    In the present study, three types of artificial intelligence techniques, least square support vector machine (LSSVM), multivariate adaptive regression splines (MARS) and M5 model tree (M5T) are applied for modeling daily dissolved oxygen (DO) concentration using several water quality variables as inputs. The DO concentration and water quality variables data from three stations operated by the United States Geological Survey (USGS) were used for developing the three models. The water quality data selected consisted of daily measured of water temperature (TE, °C), pH (std. unit), specific conductance (SC, μS/cm) and discharge (DI cfs), are used as inputs to the LSSVM, MARS and M5T models. The three models were applied for each station separately and compared to each other. According to the results obtained, it was found that: (i) the DO concentration could be successfully estimated using the three models and (ii) the best model among all others differs from one station to another.

  9. Performance of the modified Poisson regression approach for estimating relative risks from clustered prospective data.

    Science.gov (United States)

    Yelland, Lisa N; Salter, Amy B; Ryan, Philip

    2011-10-15

    Modified Poisson regression, which combines a log Poisson regression model with robust variance estimation, is a useful alternative to log binomial regression for estimating relative risks. Previous studies have shown both analytically and by simulation that modified Poisson regression is appropriate for independent prospective data. This method is often applied to clustered prospective data, despite a lack of evidence to support its use in this setting. The purpose of this article is to evaluate the performance of the modified Poisson regression approach for estimating relative risks from clustered prospective data, by using generalized estimating equations to account for clustering. A simulation study is conducted to compare log binomial regression and modified Poisson regression for analyzing clustered data from intervention and observational studies. Both methods generally perform well in terms of bias, type I error, and coverage. Unlike log binomial regression, modified Poisson regression is not prone to convergence problems. The methods are contrasted by using example data sets from 2 large studies. The results presented in this article support the use of modified Poisson regression as an alternative to log binomial regression for analyzing clustered prospective data when clustering is taken into account by using generalized estimating equations.

  10. An Optimal Sample Data Usage Strategy to Minimize Overfitting and Underfitting Effects in Regression Tree Models Based on Remotely-Sensed Data

    Directory of Open Access Journals (Sweden)

    Yingxin Gu

    2016-11-01

    Full Text Available Regression tree models have been widely used for remote sensing-based ecosystem mapping. Improper use of the sample data (model training and testing data may cause overfitting and underfitting effects in the model. The goal of this study is to develop an optimal sampling data usage strategy for any dataset and identify an appropriate number of rules in the regression tree model that will improve its accuracy and robustness. Landsat 8 data and Moderate-Resolution Imaging Spectroradiometer-scaled Normalized Difference Vegetation Index (NDVI were used to develop regression tree models. A Python procedure was designed to generate random replications of model parameter options across a range of model development data sizes and rule number constraints. The mean absolute difference (MAD between the predicted and actual NDVI (scaled NDVI, value from 0–200 and its variability across the different randomized replications were calculated to assess the accuracy and stability of the models. In our case study, a six-rule regression tree model developed from 80% of the sample data had the lowest MAD (MADtraining = 2.5 and MADtesting = 2.4, which was suggested as the optimal model. This study demonstrates how the training data and rule number selections impact model accuracy and provides important guidance for future remote-sensing-based ecosystem modeling.

  11. Fuzzy multinomial logistic regression analysis: A multi-objective programming approach

    Science.gov (United States)

    Abdalla, Hesham A.; El-Sayed, Amany A.; Hamed, Ramadan

    2017-05-01

    Parameter estimation for multinomial logistic regression is usually based on maximizing the likelihood function. For large well-balanced datasets, Maximum Likelihood (ML) estimation is a satisfactory approach. Unfortunately, ML can fail completely or at least produce poor results in terms of estimated probabilities and confidence intervals of parameters, specially for small datasets. In this study, a new approach based on fuzzy concepts is proposed to estimate parameters of the multinomial logistic regression. The study assumes that the parameters of multinomial logistic regression are fuzzy. Based on the extension principle stated by Zadeh and Bárdossy's proposition, a multi-objective programming approach is suggested to estimate these fuzzy parameters. A simulation study is used to evaluate the performance of the new approach versus Maximum likelihood (ML) approach. Results show that the new proposed model outperforms ML in cases of small datasets.

  12. A Branch-and-Price approach to find optimal decision trees

    NARCIS (Netherlands)

    Firat, M.; Crognier, Guillaume; Gabor, Adriana; Zhang, Y.

    2018-01-01

    In Artificial Intelligence (AI) field, decision trees have gained certain importance due to their effectiveness in solving classification and regression problems. Recently, in the literature we see finding optimal decision trees are formulated as Mixed Integer Linear Programming (MILP) models. This

  13. Employing Measures of Heterogeneity and an Object-Based Approach to Extrapolate Tree Species Distribution Data

    Directory of Open Access Journals (Sweden)

    Trevor G. Jones

    2014-07-01

    Full Text Available Information derived from high spatial resolution remotely sensed data is critical for the effective management of forested ecosystems. However, high spatial resolution data-sets are typically costly to acquire and process and usually provide limited geographic coverage. In contrast, moderate spatial resolution remotely sensed data, while not able to provide the spectral or spatial detail required for certain types of products and applications, offer inexpensive, comprehensive landscape-level coverage. This study assessed using an object-based approach to extrapolate detailed tree species heterogeneity beyond the extent of hyperspectral/LiDAR flightlines to the broader area covered by a Landsat scene. Using image segments, regression trees established ecologically decipherable relationships between tree species heterogeneity and the spectral properties of Landsat segments. The spectral properties of Landsat bands 4 (i.e., NIR: 0.76–0.90 µm, 5 (i.e., SWIR: 1.55–1.75 µm and 7 (SWIR: 2.08–2.35 µm were consistently selected as predictor variables, explaining approximately 50% of variance in richness and diversity. Results have important ramifications for ongoing management initiatives in the study area and are applicable to wide range of applications.

  14. Trees

    Science.gov (United States)

    Al-Khaja, Nawal

    2007-01-01

    This is a thematic lesson plan for young learners about palm trees and the importance of taking care of them. The two part lesson teaches listening, reading and speaking skills. The lesson includes parts of a tree; the modal auxiliary, can; dialogues and a role play activity.

  15. Comparing Kriging and Regression Approaches for Mapping Soil Clay Content in a diverse Danish Landscape

    DEFF Research Database (Denmark)

    Adhikari, Kabindra; Bou Kheir, Rania; Greve, Mette Balslev

    2013-01-01

    Information on the spatial variability of soil texture including soil clay content in a landscape is very important for agricultural and environmental use. Different prediction techniques are available to assess and map spatial variability of soil properties, but selecting the most suitable techn...... the prediction in OKst compared with that in OK, whereas RT showed the lowest performance of all (R2 = 0.52; RMSE = 0.52; and RPD = 1.17). We found RKrr to be an effective prediction method and recommend this method for any future soil mapping activities in Denmark....... technique at a given site has always been a major issue in all soil mapping applications. We studied the prediction performance of ordinary kriging (OK), stratified OK (OKst), regression trees (RT), and rule-based regression kriging (RKrr) for digital mapping of soil clay content at 30.4-m grid size using 6...

  16. GIS-based groundwater potential analysis using novel ensemble weights-of-evidence with logistic regression and functional tree models.

    Science.gov (United States)

    Chen, Wei; Li, Hui; Hou, Enke; Wang, Shengquan; Wang, Guirong; Panahi, Mahdi; Li, Tao; Peng, Tao; Guo, Chen; Niu, Chao; Xiao, Lele; Wang, Jiale; Xie, Xiaoshen; Ahmad, Baharin Bin

    2018-09-01

    The aim of the current study was to produce groundwater spring potential maps using novel ensemble weights-of-evidence (WoE) with logistic regression (LR) and functional tree (FT) models. First, a total of 66 springs were identified by field surveys, out of which 70% of the spring locations were used for training the models and 30% of the spring locations were employed for the validation process. Second, a total of 14 affecting factors including aspect, altitude, slope, plan curvature, profile curvature, stream power index (SPI), topographic wetness index (TWI), sediment transport index (STI), lithology, normalized difference vegetation index (NDVI), land use, soil, distance to roads, and distance to streams was used to analyze the spatial relationship between these affecting factors and spring occurrences. Multicollinearity analysis and feature selection of the correlation attribute evaluation (CAE) method were employed to optimize the affecting factors. Subsequently, the novel ensembles of the WoE, LR, and FT models were constructed using the training dataset. Finally, the receiver operating characteristic (ROC) curves, standard error, confidence interval (CI) at 95%, and significance level P were employed to validate and compare the performance of three models. Overall, all three models performed well for groundwater spring potential evaluation. The prediction capability of the FT model, with the highest AUC values, the smallest standard errors, the narrowest CIs, and the smallest P values for the training and validation datasets, is better compared to those of other models. The groundwater spring potential maps can be adopted for the management of water resources and land use by planners and engineers. Copyright © 2018 Elsevier B.V. All rights reserved.

  17. Landslide susceptibility mapping using decision-tree based CHi-squared automatic interaction detection (CHAID) and Logistic regression (LR) integration

    International Nuclear Information System (INIS)

    Althuwaynee, Omar F; Pradhan, Biswajeet; Ahmad, Noordin

    2014-01-01

    This article uses methodology based on chi-squared automatic interaction detection (CHAID), as a multivariate method that has an automatic classification capacity to analyse large numbers of landslide conditioning factors. This new algorithm was developed to overcome the subjectivity of the manual categorization of scale data of landslide conditioning factors, and to predict rainfall-induced susceptibility map in Kuala Lumpur city and surrounding areas using geographic information system (GIS). The main objective of this article is to use CHi-squared automatic interaction detection (CHAID) method to perform the best classification fit for each conditioning factor, then, combining it with logistic regression (LR). LR model was used to find the corresponding coefficients of best fitting function that assess the optimal terminal nodes. A cluster pattern of landslide locations was extracted in previous study using nearest neighbor index (NNI), which were then used to identify the clustered landslide locations range. Clustered locations were used as model training data with 14 landslide conditioning factors such as; topographic derived parameters, lithology, NDVI, land use and land cover maps. Pearson chi-squared value was used to find the best classification fit between the dependent variable and conditioning factors. Finally the relationship between conditioning factors were assessed and the landslide susceptibility map (LSM) was produced. An area under the curve (AUC) was used to test the model reliability and prediction capability with the training and validation landslide locations respectively. This study proved the efficiency and reliability of decision tree (DT) model in landslide susceptibility mapping. Also it provided a valuable scientific basis for spatial decision making in planning and urban management studies

  18. Landslide susceptibility mapping using decision-tree based CHi-squared automatic interaction detection (CHAID) and Logistic regression (LR) integration

    Science.gov (United States)

    Althuwaynee, Omar F.; Pradhan, Biswajeet; Ahmad, Noordin

    2014-06-01

    This article uses methodology based on chi-squared automatic interaction detection (CHAID), as a multivariate method that has an automatic classification capacity to analyse large numbers of landslide conditioning factors. This new algorithm was developed to overcome the subjectivity of the manual categorization of scale data of landslide conditioning factors, and to predict rainfall-induced susceptibility map in Kuala Lumpur city and surrounding areas using geographic information system (GIS). The main objective of this article is to use CHi-squared automatic interaction detection (CHAID) method to perform the best classification fit for each conditioning factor, then, combining it with logistic regression (LR). LR model was used to find the corresponding coefficients of best fitting function that assess the optimal terminal nodes. A cluster pattern of landslide locations was extracted in previous study using nearest neighbor index (NNI), which were then used to identify the clustered landslide locations range. Clustered locations were used as model training data with 14 landslide conditioning factors such as; topographic derived parameters, lithology, NDVI, land use and land cover maps. Pearson chi-squared value was used to find the best classification fit between the dependent variable and conditioning factors. Finally the relationship between conditioning factors were assessed and the landslide susceptibility map (LSM) was produced. An area under the curve (AUC) was used to test the model reliability and prediction capability with the training and validation landslide locations respectively. This study proved the efficiency and reliability of decision tree (DT) model in landslide susceptibility mapping. Also it provided a valuable scientific basis for spatial decision making in planning and urban management studies.

  19. Comparing pseudo-absences generation techniques in Boosted Regression Trees models for conservation purposes: A case study on amphibians in a protected area.

    Directory of Open Access Journals (Sweden)

    Francesco Cerasoli

    Full Text Available Boosted Regression Trees (BRT is one of the modelling techniques most recently applied to biodiversity conservation and it can be implemented with presence-only data through the generation of artificial absences (pseudo-absences. In this paper, three pseudo-absences generation techniques are compared, namely the generation of pseudo-absences within target-group background (TGB, testing both the weighted (WTGB and unweighted (UTGB scheme, and the generation at random (RDM, evaluating their performance and applicability in distribution modelling and species conservation. The choice of the target group fell on amphibians, because of their rapid decline worldwide and the frequent lack of guidelines for conservation strategies and regional-scale planning, which instead could be provided through an appropriate implementation of SDMs. Bufo bufo, Salamandrina perspicillata and Triturus carnifex were considered as target species, in order to perform our analysis with species having different ecological and distributional characteristics. The study area is the "Gran Sasso-Monti della Laga" National Park, which hosts 15 Natura 2000 sites and represents one of the most important biodiversity hotspots in Europe. Our results show that the model calibration ameliorates when using the target-group based pseudo-absences compared to the random ones, especially when applying the WTGB. Contrarily, model discrimination did not significantly vary in a consistent way among the three approaches with respect to the tree target species. Both WTGB and RDM clearly isolate the highly contributing variables, supplying many relevant indications for species conservation actions. Moreover, the assessment of pairwise variable interactions and their three-dimensional visualization further increase the amount of useful information for protected areas' managers. Finally, we suggest the use of RDM as an admissible alternative when it is not possible to individuate a suitable set of

  20. Predicting 30-day Hospital Readmission with Publicly Available Administrative Database. A Conditional Logistic Regression Modeling Approach.

    Science.gov (United States)

    Zhu, K; Lou, Z; Zhou, J; Ballester, N; Kong, N; Parikh, P

    2015-01-01

    This article is part of the Focus Theme of Methods of Information in Medicine on "Big Data and Analytics in Healthcare". Hospital readmissions raise healthcare costs and cause significant distress to providers and patients. It is, therefore, of great interest to healthcare organizations to predict what patients are at risk to be readmitted to their hospitals. However, current logistic regression based risk prediction models have limited prediction power when applied to hospital administrative data. Meanwhile, although decision trees and random forests have been applied, they tend to be too complex to understand among the hospital practitioners. Explore the use of conditional logistic regression to increase the prediction accuracy. We analyzed an HCUP statewide inpatient discharge record dataset, which includes patient demographics, clinical and care utilization data from California. We extracted records of heart failure Medicare beneficiaries who had inpatient experience during an 11-month period. We corrected the data imbalance issue with under-sampling. In our study, we first applied standard logistic regression and decision tree to obtain influential variables and derive practically meaning decision rules. We then stratified the original data set accordingly and applied logistic regression on each data stratum. We further explored the effect of interacting variables in the logistic regression modeling. We conducted cross validation to assess the overall prediction performance of conditional logistic regression (CLR) and compared it with standard classification models. The developed CLR models outperformed several standard classification models (e.g., straightforward logistic regression, stepwise logistic regression, random forest, support vector machine). For example, the best CLR model improved the classification accuracy by nearly 20% over the straightforward logistic regression model. Furthermore, the developed CLR models tend to achieve better sensitivity of

  1. Comparative analysis of neural network and regression based condition monitoring approaches for wind turbine fault detection

    DEFF Research Database (Denmark)

    Schlechtingen, Meik; Santos, Ilmar

    2011-01-01

    This paper presents the research results of a comparison of three different model based approaches for wind turbine fault detection in online SCADA data, by applying developed models to five real measured faults and anomalies. The regression based model as the simplest approach to build a normal...

  2. Efficient fault tree handling - the Asea-Atom approach

    International Nuclear Information System (INIS)

    Ericsson, G.; Knochenhauer, M.; Mills, R.

    1985-01-01

    In recent years there has been a trend in Swedish Probabilistic Safety Analysis (PSA) work towards coordination of the tools and methods used, in order to facilitate exchange of information and review. Thus, standardized methods for fault tree drawing and basic event coding have been developed as well as a number of computer codes for fault tree handling. The computer code used by Asea-Atom is called SUPER-TREE. As indicated by the name, the key feature is the concept of one super tree containing all the information necessary in the fault tree analysis, i.e. system fault trees, sequence fault trees and component data base. The code has proved to allow great flexibility in the choice of level of detail in the analysis

  3. Penalized regression techniques for prediction: a case study for predicting tree mortality using remotely sensed vegetation indices

    NARCIS (Netherlands)

    Lazaridis, D.C.; Verbesselt, J.; Robinson, A.P.

    2011-01-01

    Constructing models can be complicated when the available fitting data are highly correlated and of high dimension. However, the complications depend on whether the goal is prediction instead of estimation. We focus on predicting tree mortality (measured as the number of dead trees) from change

  4. Multiple regression approach to predict turbine-generator output for Chinshan nuclear power plant

    International Nuclear Information System (INIS)

    Chan, Yea-Kuang; Tsai, Yu-Ching

    2017-01-01

    The objective of this study is to develop a turbine cycle model using the multiple regression approach to estimate the turbine-generator output for the Chinshan Nuclear Power Plant (NPP). The plant operating data was verified using a linear regression model with a corresponding 95% confidence interval for the operating data. In this study, the key parameters were selected as inputs for the multiple regression based turbine cycle model. The proposed model was used to estimate the turbine-generator output. The effectiveness of the proposed turbine cycle model was demonstrated by using plant operating data obtained from the Chinshan NPP Unit 2. The results show that this multiple regression based turbine cycle model can be used to accurately estimate the turbine-generator output. In addition, this study also provides an alternative approach with simple and easy features to evaluate the thermal performance for nuclear power plants.

  5. Multiple regression approach to predict turbine-generator output for Chinshan nuclear power plant

    Energy Technology Data Exchange (ETDEWEB)

    Chan, Yea-Kuang; Tsai, Yu-Ching [Institute of Nuclear Energy Research, Taoyuan City, Taiwan (China). Nuclear Engineering Division

    2017-03-15

    The objective of this study is to develop a turbine cycle model using the multiple regression approach to estimate the turbine-generator output for the Chinshan Nuclear Power Plant (NPP). The plant operating data was verified using a linear regression model with a corresponding 95% confidence interval for the operating data. In this study, the key parameters were selected as inputs for the multiple regression based turbine cycle model. The proposed model was used to estimate the turbine-generator output. The effectiveness of the proposed turbine cycle model was demonstrated by using plant operating data obtained from the Chinshan NPP Unit 2. The results show that this multiple regression based turbine cycle model can be used to accurately estimate the turbine-generator output. In addition, this study also provides an alternative approach with simple and easy features to evaluate the thermal performance for nuclear power plants.

  6. Metabolic activity of tree saps of different origin towards cultured human cells in the light of grade correspondence analysis and multiple regression modeling

    Directory of Open Access Journals (Sweden)

    Artur Wnorowski

    2017-06-01

    Full Text Available Tree saps are nourishing biological media commonly used for beverage and syrup production. Although the nutritional aspect of tree saps is widely acknowledged, the exact relationship between the sap composition, origin, and effect on the metabolic rate of human cells is still elusive. Thus, we collected saps from seven different tree species and conducted composition-activity analysis. Saps from trees of Betulaceae, but not from Salicaceae, Sapindaceae, nor Juglandaceae families, were increasing the metabolic rate of HepG2 cells, as measured using tetrazolium-based assay. Content of glucose, fructose, sucrose, chlorides, nitrates, sulphates, fumarates, malates, and succinates in sap samples varied across different tree species. Grade correspondence analysis clustered trees based on the saps’ chemical footprint indicating its usability in chemotaxonomy. Multiple regression modeling showed that glucose and fumarate present in saps from silver birch (Betula pendula Roth., black alder (Alnus glutinosa Gaertn., and European hornbeam (Carpinus betulus L. are positively affecting the metabolic activity of HepG2 cells.

  7. "Trees and Things That Live in Trees": Three Children with Special Needs Experience the Project Approach

    Science.gov (United States)

    Griebling, Susan; Elgas, Peg; Konerman, Rachel

    2015-01-01

    The authors report on research conducted during a project investigation undertaken with preschool children, ages 3-5. The report focuses on three children with special needs and the positive outcomes for each child as they engaged in the project Trees and Things That Live in Trees. Two of the children were diagnosed with developmental delays, and…

  8. Dual Regression

    OpenAIRE

    Spady, Richard; Stouli, Sami

    2012-01-01

    We propose dual regression as an alternative to the quantile regression process for the global estimation of conditional distribution functions under minimal assumptions. Dual regression provides all the interpretational power of the quantile regression process while avoiding the need for repairing the intersecting conditional quantile surfaces that quantile regression often produces in practice. Our approach introduces a mathematical programming characterization of conditional distribution f...

  9. Support vector methods for survival analysis: a comparison between ranking and regression approaches.

    Science.gov (United States)

    Van Belle, Vanya; Pelckmans, Kristiaan; Van Huffel, Sabine; Suykens, Johan A K

    2011-10-01

    To compare and evaluate ranking, regression and combined machine learning approaches for the analysis of survival data. The literature describes two approaches based on support vector machines to deal with censored observations. In the first approach the key idea is to rephrase the task as a ranking problem via the concordance index, a problem which can be solved efficiently in a context of structural risk minimization and convex optimization techniques. In a second approach, one uses a regression approach, dealing with censoring by means of inequality constraints. The goal of this paper is then twofold: (i) introducing a new model combining the ranking and regression strategy, which retains the link with existing survival models such as the proportional hazards model via transformation models; and (ii) comparison of the three techniques on 6 clinical and 3 high-dimensional datasets and discussing the relevance of these techniques over classical approaches fur survival data. We compare svm-based survival models based on ranking constraints, based on regression constraints and models based on both ranking and regression constraints. The performance of the models is compared by means of three different measures: (i) the concordance index, measuring the model's discriminating ability; (ii) the logrank test statistic, indicating whether patients with a prognostic index lower than the median prognostic index have a significant different survival than patients with a prognostic index higher than the median; and (iii) the hazard ratio after normalization to restrict the prognostic index between 0 and 1. Our results indicate a significantly better performance for models including regression constraints above models only based on ranking constraints. This work gives empirical evidence that svm-based models using regression constraints perform significantly better than svm-based models based on ranking constraints. Our experiments show a comparable performance for methods

  10. Understanding how roadside concentrations of NOx are influenced by the background levels, traffic density, and meteorological conditions using Boosted Regression Trees

    Science.gov (United States)

    Sayegh, Arwa; Tate, James E.; Ropkins, Karl

    2016-02-01

    Oxides of Nitrogen (NOx) is a major component of photochemical smog and its constituents are considered principal traffic-related pollutants affecting human health. This study investigates the influence of background concentrations of NOx, traffic density, and prevailing meteorological conditions on roadside concentrations of NOx at UK urban, open motorway, and motorway tunnel sites using the statistical approach Boosted Regression Trees (BRT). BRT models have been fitted using hourly concentration, traffic, and meteorological data for each site. The models predict, rank, and visualise the relationship between model variables and roadside NOx concentrations. A strong relationship between roadside NOx and monitored local background concentrations is demonstrated. Relationships between roadside NOx and other model variables have been shown to be strongly influenced by the quality and resolution of background concentrations of NOx, i.e. if it were based on monitored data or modelled prediction. The paper proposes a direct method of using site-specific fundamental diagrams for splitting traffic data into four traffic states: free-flow, busy-flow, congested, and severely congested. Using BRT models, the density of traffic (vehicles per kilometre) was observed to have a proportional influence on the concentrations of roadside NOx, with different fitted regression line slopes for the different traffic states. When other influences are conditioned out, the relationship between roadside concentrations and ambient air temperature suggests NOx concentrations reach a minimum at around 22 °C with high concentrations at low ambient air temperatures which could be associated to restricted atmospheric dispersion and/or to changes in road traffic exhaust emission characteristics at low ambient air temperatures. This paper uses BRT models to study how different critical factors, and their relative importance, influence the variation of roadside NOx concentrations. The paper

  11. A Quantile Regression Approach to Estimating the Distribution of Anesthetic Procedure Time during Induction.

    Directory of Open Access Journals (Sweden)

    Hsin-Lun Wu

    Full Text Available Although procedure time analyses are important for operating room management, it is not easy to extract useful information from clinical procedure time data. A novel approach was proposed to analyze procedure time during anesthetic induction. A two-step regression analysis was performed to explore influential factors of anesthetic induction time (AIT. Linear regression with stepwise model selection was used to select significant correlates of AIT and then quantile regression was employed to illustrate the dynamic relationships between AIT and selected variables at distinct quantiles. A total of 1,060 patients were analyzed. The first and second-year residents (R1-R2 required longer AIT than the third and fourth-year residents and attending anesthesiologists (p = 0.006. Factors prolonging AIT included American Society of Anesthesiologist physical status ≧ III, arterial, central venous and epidural catheterization, and use of bronchoscopy. Presence of surgeon before induction would decrease AIT (p < 0.001. Types of surgery also had significant influence on AIT. Quantile regression satisfactorily estimated extra time needed to complete induction for each influential factor at distinct quantiles. Our analysis on AIT demonstrated the benefit of quantile regression analysis to provide more comprehensive view of the relationships between procedure time and related factors. This novel two-step regression approach has potential applications to procedure time analysis in operating room management.

  12. A best-first tree-searching approach for ML decoding in MIMO system

    KAUST Repository

    Shen, Chung-An; Eltawil, Ahmed M.; Mondal, Sudip; Salama, Khaled N.

    2012-01-01

    In MIMO communication systems maximum-likelihood (ML) decoding can be formulated as a tree-searching problem. This paper presents a tree-searching approach that combines the features of classical depth-first and breadth-first approaches to achieve

  13. Modeling Personalized Email Prioritization: Classification-based and Regression-based Approaches

    Energy Technology Data Exchange (ETDEWEB)

    Yoo S.; Yang, Y.; Carbonell, J.

    2011-10-24

    Email overload, even after spam filtering, presents a serious productivity challenge for busy professionals and executives. One solution is automated prioritization of incoming emails to ensure the most important are read and processed quickly, while others are processed later as/if time permits in declining priority levels. This paper presents a study of machine learning approaches to email prioritization into discrete levels, comparing ordinal regression versus classier cascades. Given the ordinal nature of discrete email priority levels, SVM ordinal regression would be expected to perform well, but surprisingly a cascade of SVM classifiers significantly outperforms ordinal regression for email prioritization. In contrast, SVM regression performs well -- better than classifiers -- on selected UCI data sets. This unexpected performance inversion is analyzed and results are presented, providing core functionality for email prioritization systems.

  14. Delineating Individual Trees from Lidar Data: A Comparison of Vector- and Raster-based Segmentation Approaches

    Directory of Open Access Journals (Sweden)

    Maggi Kelly

    2013-08-01

    Full Text Available Light detection and ranging (lidar data is increasingly being used for ecosystem monitoring across geographic scales. This work concentrates on delineating individual trees in topographically-complex, mixed conifer forest across the California’s Sierra Nevada. We delineated individual trees using vector data and a 3D lidar point cloud segmentation algorithm, and using raster data with an object-based image analysis (OBIA of a canopy height model (CHM. The two approaches are compared to each other and to ground reference data. We used high density (9 pulses/m2, discreet lidar data and WorldView-2 imagery to delineate individual trees, and to classify them by species or species types. We also identified a new method to correct artifacts in a high-resolution CHM. Our main focus was to determine the difference between the two types of approaches and to identify the one that produces more realistic results. We compared the delineations via tree detection, tree heights, and the shape of the generated polygons. The tree height agreement was high between the two approaches and the ground data (r2: 0.93–0.96. Tree detection rates increased for more dominant trees (8–100 percent. The two approaches delineated tree boundaries that differed in shape: the lidar-approach produced fewer, more complex, and larger polygons that more closely resembled real forest structure.

  15. Ordinary Least Squares and Quantile Regression: An Inquiry-Based Learning Approach to a Comparison of Regression Methods

    Science.gov (United States)

    Helmreich, James E.; Krog, K. Peter

    2018-01-01

    We present a short, inquiry-based learning course on concepts and methods underlying ordinary least squares (OLS), least absolute deviation (LAD), and quantile regression (QR). Students investigate squared, absolute, and weighted absolute distance functions (metrics) as location measures. Using differential calculus and properties of convex…

  16. Time series modeling by a regression approach based on a latent process.

    Science.gov (United States)

    Chamroukhi, Faicel; Samé, Allou; Govaert, Gérard; Aknin, Patrice

    2009-01-01

    Time series are used in many domains including finance, engineering, economics and bioinformatics generally to represent the change of a measurement over time. Modeling techniques may then be used to give a synthetic representation of such data. A new approach for time series modeling is proposed in this paper. It consists of a regression model incorporating a discrete hidden logistic process allowing for activating smoothly or abruptly different polynomial regression models. The model parameters are estimated by the maximum likelihood method performed by a dedicated Expectation Maximization (EM) algorithm. The M step of the EM algorithm uses a multi-class Iterative Reweighted Least-Squares (IRLS) algorithm to estimate the hidden process parameters. To evaluate the proposed approach, an experimental study on simulated data and real world data was performed using two alternative approaches: a heteroskedastic piecewise regression model using a global optimization algorithm based on dynamic programming, and a Hidden Markov Regression Model whose parameters are estimated by the Baum-Welch algorithm. Finally, in the context of the remote monitoring of components of the French railway infrastructure, and more particularly the switch mechanism, the proposed approach has been applied to modeling and classifying time series representing the condition measurements acquired during switch operations.

  17. A maximum pseudo-likelihood approach for estimating species trees under the coalescent model

    Directory of Open Access Journals (Sweden)

    Edwards Scott V

    2010-10-01

    Full Text Available Abstract Background Several phylogenetic approaches have been developed to estimate species trees from collections of gene trees. However, maximum likelihood approaches for estimating species trees under the coalescent model are limited. Although the likelihood of a species tree under the multispecies coalescent model has already been derived by Rannala and Yang, it can be shown that the maximum likelihood estimate (MLE of the species tree (topology, branch lengths, and population sizes from gene trees under this formula does not exist. In this paper, we develop a pseudo-likelihood function of the species tree to obtain maximum pseudo-likelihood estimates (MPE of species trees, with branch lengths of the species tree in coalescent units. Results We show that the MPE of the species tree is statistically consistent as the number M of genes goes to infinity. In addition, the probability that the MPE of the species tree matches the true species tree converges to 1 at rate O(M -1. The simulation results confirm that the maximum pseudo-likelihood approach is statistically consistent even when the species tree is in the anomaly zone. We applied our method, Maximum Pseudo-likelihood for Estimating Species Trees (MP-EST to a mammal dataset. The four major clades found in the MP-EST tree are consistent with those in the Bayesian concatenation tree. The bootstrap supports for the species tree estimated by the MP-EST method are more reasonable than the posterior probability supports given by the Bayesian concatenation method in reflecting the level of uncertainty in gene trees and controversies over the relationship of four major groups of placental mammals. Conclusions MP-EST can consistently estimate the topology and branch lengths (in coalescent units of the species tree. Although the pseudo-likelihood is derived from coalescent theory, and assumes no gene flow or horizontal gene transfer (HGT, the MP-EST method is robust to a small amount of HGT in the

  18. A Novel Imbalanced Data Classification Approach Based on Logistic Regression and Fisher Discriminant

    Directory of Open Access Journals (Sweden)

    Baofeng Shi

    2015-01-01

    Full Text Available We introduce an imbalanced data classification approach based on logistic regression significant discriminant and Fisher discriminant. First of all, a key indicators extraction model based on logistic regression significant discriminant and correlation analysis is derived to extract features for customer classification. Secondly, on the basis of the linear weighted utilizing Fisher discriminant, a customer scoring model is established. And then, a customer rating model where the customer number of all ratings follows normal distribution is constructed. The performance of the proposed model and the classical SVM classification method are evaluated in terms of their ability to correctly classify consumers as default customer or nondefault customer. Empirical results using the data of 2157 customers in financial engineering suggest that the proposed approach better performance than the SVM model in dealing with imbalanced data classification. Moreover, our approach contributes to locating the qualified customers for the banks and the bond investors.

  19. Decision tree approach for classification of remotely sensed satellite ...

    Indian Academy of Sciences (India)

    sensed satellite data using open source support. Richa Sharma .... Decision tree classification techniques have been .... the USGS Earth Resource Observation Systems. (EROS) ... for shallow water, 11% were for sparse and dense built-up ...

  20. Decision tree approach for classification of remotely sensed satellite

    Indian Academy of Sciences (India)

    DTC) algorithm for classification of remotely sensed satellite data (Landsat TM) using open source support. The decision tree is constructed by recursively partitioning the spectral distribution of the training dataset using WEKA, open source ...

  1. An automated approach to the design of decision tree classifiers

    Science.gov (United States)

    Argentiero, P.; Chin, R.; Beaudet, P.

    1982-01-01

    An automated technique is presented for designing effective decision tree classifiers predicated only on a priori class statistics. The procedure relies on linear feature extractions and Bayes table look-up decision rules. Associated error matrices are computed and utilized to provide an optimal design of the decision tree at each so-called 'node'. A by-product of this procedure is a simple algorithm for computing the global probability of correct classification assuming the statistical independence of the decision rules. Attention is given to a more precise definition of decision tree classification, the mathematical details on the technique for automated decision tree design, and an example of a simple application of the procedure using class statistics acquired from an actual Landsat scene.

  2. A computational approach to compare regression modelling strategies in prediction research.

    Science.gov (United States)

    Pajouheshnia, Romin; Pestman, Wiebe R; Teerenstra, Steven; Groenwold, Rolf H H

    2016-08-25

    It is often unclear which approach to fit, assess and adjust a model will yield the most accurate prediction model. We present an extension of an approach for comparing modelling strategies in linear regression to the setting of logistic regression and demonstrate its application in clinical prediction research. A framework for comparing logistic regression modelling strategies by their likelihoods was formulated using a wrapper approach. Five different strategies for modelling, including simple shrinkage methods, were compared in four empirical data sets to illustrate the concept of a priori strategy comparison. Simulations were performed in both randomly generated data and empirical data to investigate the influence of data characteristics on strategy performance. We applied the comparison framework in a case study setting. Optimal strategies were selected based on the results of a priori comparisons in a clinical data set and the performance of models built according to each strategy was assessed using the Brier score and calibration plots. The performance of modelling strategies was highly dependent on the characteristics of the development data in both linear and logistic regression settings. A priori comparisons in four empirical data sets found that no strategy consistently outperformed the others. The percentage of times that a model adjustment strategy outperformed a logistic model ranged from 3.9 to 94.9 %, depending on the strategy and data set. However, in our case study setting the a priori selection of optimal methods did not result in detectable improvement in model performance when assessed in an external data set. The performance of prediction modelling strategies is a data-dependent process and can be highly variable between data sets within the same clinical domain. A priori strategy comparison can be used to determine an optimal logistic regression modelling strategy for a given data set before selecting a final modelling approach.

  3. QUANTITATIVE ELECTRONIC STRUCTURE - ACTIVITY RELATIONSHIP OF ANTIMALARIAL COMPOUND OF ARTEMISININ DERIVATIVES USING PRINCIPAL COMPONENT REGRESSION APPROACH

    Directory of Open Access Journals (Sweden)

    Paul Robert Martin Werfette

    2010-06-01

    Full Text Available Analysis of quantitative structure - activity relationship (QSAR for a series of antimalarial compound artemisinin derivatives has been done using principal component regression. The descriptors for QSAR study were representation of electronic structure i.e. atomic net charges of the artemisinin skeleton calculated by AM1 semi-empirical method. The antimalarial activity of the compound was expressed in log 1/IC50 which is an experimental data. The main purpose of the principal component analysis approach is to transform a large data set of atomic net charges to simplify into a data set which known as latent variables. The best QSAR equation to analyze of log 1/IC50 can be obtained from the regression method as a linear function of several latent variables i.e. x1, x2, x3, x4 and x5. The best QSAR model is expressed in the following equation,  (;;   Keywords: QSAR, antimalarial, artemisinin, principal component regression

  4. A fuzzy regression with support vector machine approach to the estimation of horizontal global solar radiation

    International Nuclear Information System (INIS)

    Baser, Furkan; Demirhan, Haydar

    2017-01-01

    Accurate estimation of the amount of horizontal global solar radiation for a particular field is an important input for decision processes in solar radiation investments. In this article, we focus on the estimation of yearly mean daily horizontal global solar radiation by using an approach that utilizes fuzzy regression functions with support vector machine (FRF-SVM). This approach is not seriously affected by outlier observations and does not suffer from the over-fitting problem. To demonstrate the utility of the FRF-SVM approach in the estimation of horizontal global solar radiation, we conduct an empirical study over a dataset collected in Turkey and applied the FRF-SVM approach with several kernel functions. Then, we compare the estimation accuracy of the FRF-SVM approach to an adaptive neuro-fuzzy system and a coplot supported-genetic programming approach. We observe that the FRF-SVM approach with a Gaussian kernel function is not affected by both outliers and over-fitting problem and gives the most accurate estimates of horizontal global solar radiation among the applied approaches. Consequently, the use of hybrid fuzzy functions and support vector machine approaches is found beneficial in long-term forecasting of horizontal global solar radiation over a region with complex climatic and terrestrial characteristics. - Highlights: • A fuzzy regression functions with support vector machines approach is proposed. • The approach is robust against outlier observations and over-fitting problem. • Estimation accuracy of the model is superior to several existent alternatives. • A new solar radiation estimation model is proposed for the region of Turkey. • The model is useful under complex terrestrial and climatic conditions.

  5. Corporate Social Responsibility and Financial Performance: A Two Least Regression Approach

    Directory of Open Access Journals (Sweden)

    Alexander Olawumi Dabor

    2017-12-01

    Full Text Available The objective of this study is to investigate the casuality between corporate social responsibility and firm financial performance. The study employed two least square regression approaches. Fifty-two firms were selected using the scientific method. The findings revealed that corporate social responsibility and firm performance in manufacturing sector are mutually related at 5%. The study recommended that management of manufacturing companies in Nigeria should expend on CSR to boost profitability and corporate image.

  6. Modelling the return distribution of salmon farming companies : a quantile regression approach

    OpenAIRE

    Jacobsen, Fredrik

    2017-01-01

    The salmon farming industry has gained increased attention from investors, portfolio managers, financial analysts and other stakeholders the recent years. Despite this development, very little is known about the risk and return of salmon farming company stocks, and especially how the relationship between risk and return varies under different market conditions, given the volatile nature of the salmon farming industry. We approach this problem by using quantile regression to examine the relati...

  7. A new approach to enhance the performance of decision tree for classifying gene expression data.

    Science.gov (United States)

    Hassan, Md; Kotagiri, Ramamohanarao

    2013-12-20

    Gene expression data classification is a challenging task due to the large dimensionality and very small number of samples. Decision tree is one of the popular machine learning approaches to address such classification problems. However, the existing decision tree algorithms use a single gene feature at each node to split the data into its child nodes and hence might suffer from poor performance specially when classifying gene expression dataset. By using a new decision tree algorithm where, each node of the tree consists of more than one gene, we enhance the classification performance of traditional decision tree classifiers. Our method selects suitable genes that are combined using a linear function to form a derived composite feature. To determine the structure of the tree we use the area under the Receiver Operating Characteristics curve (AUC). Experimental analysis demonstrates higher classification accuracy using the new decision tree compared to the other existing decision trees in literature. We experimentally compare the effect of our scheme against other well known decision tree techniques. Experiments show that our algorithm can substantially boost the classification performance of the decision tree.

  8. A land use regression model for ambient ultrafine particles in Montreal, Canada: A comparison of linear regression and a machine learning approach.

    Science.gov (United States)

    Weichenthal, Scott; Ryswyk, Keith Van; Goldstein, Alon; Bagg, Scott; Shekkarizfard, Maryam; Hatzopoulou, Marianne

    2016-04-01

    Existing evidence suggests that ambient ultrafine particles (UFPs) (regression model for UFPs in Montreal, Canada using mobile monitoring data collected from 414 road segments during the summer and winter months between 2011 and 2012. Two different approaches were examined for model development including standard multivariable linear regression and a machine learning approach (kernel-based regularized least squares (KRLS)) that learns the functional form of covariate impacts on ambient UFP concentrations from the data. The final models included parameters for population density, ambient temperature and wind speed, land use parameters (park space and open space), length of local roads and rail, and estimated annual average NOx emissions from traffic. The final multivariable linear regression model explained 62% of the spatial variation in ambient UFP concentrations whereas the KRLS model explained 79% of the variance. The KRLS model performed slightly better than the linear regression model when evaluated using an external dataset (R(2)=0.58 vs. 0.55) or a cross-validation procedure (R(2)=0.67 vs. 0.60). In general, our findings suggest that the KRLS approach may offer modest improvements in predictive performance compared to standard multivariable linear regression models used to estimate spatial variations in ambient UFPs. However, differences in predictive performance were not statistically significant when evaluated using the cross-validation procedure. Crown Copyright © 2015. Published by Elsevier Inc. All rights reserved.

  9. Oak Wilt: People and Trees, A Community Approach to Management

    Science.gov (United States)

    J. Juzwik; S. Cook; L. Haugen; J. Elwell

    2004-01-01

    Version 1.3. This self-paced short course on CD-ROM was designed as a learning tool for urban and community foresters, city administrators, tree inspectors, parks and recreation staff, and others involved in oak wilt management.Click the "View or print this publication" link below to request your Oak Wilt: People and...

  10. Hierarchical Multinomial Processing Tree Models: A Latent-Trait Approach

    Science.gov (United States)

    Klauer, Karl Christoph

    2010-01-01

    Multinomial processing tree models are widely used in many areas of psychology. A hierarchical extension of the model class is proposed, using a multivariate normal distribution of person-level parameters with the mean and covariance matrix to be estimated from the data. The hierarchical model allows one to take variability between persons into…

  11. A basic approach to fire injury of tree stems

    Science.gov (United States)

    R. E. Martin

    1963-01-01

    Fire has come to be widely used as a tool in wildland management, particularly in the South. Its usefulness in fire hazard reduction, removal of undesirable trees, and changing of cover types has been demonstrated. We are continually trying to improve fire use, however, by learning more of the specific effects of fire on different species of plants.

  12. An Economic Approach to Planting Trees for Carbon Storage

    Science.gov (United States)

    Peter J. Parks; David O. Hall; Bengt Kristrom; Omar R. Masera; Robert J. Multon; Andrew J. Plantinga; Joel N. Swisher; Jack K. Winjum

    1997-01-01

    Abstract: Methods are described for evaluating economic and carbon storage aspects of tree planting projects (e.g., plantations for restoration, roundwood, bioenergy, and nonwood products). Total carbon (C) stock is dynamic and comprises C in vegetation, decomposing matter, soil, products, and fuel substituted. An alternative (reference) case is...

  13. Poisson regression approach for modeling fatal injury rates amongst Malaysian workers

    International Nuclear Information System (INIS)

    Kamarulzaman Ibrahim; Heng Khai Theng

    2005-01-01

    Many safety studies are based on the analysis carried out on injury surveillance data. The injury surveillance data gathered for the analysis include information on number of employees at risk of injury in each of several strata where the strata are defined in terms of a series of important predictor variables. Further insight into the relationship between fatal injury rates and predictor variables may be obtained by the poisson regression approach. Poisson regression is widely used in analyzing count data. In this study, poisson regression is used to model the relationship between fatal injury rates and predictor variables which are year (1995-2002), gender, recording system and industry type. Data for the analysis were obtained from PERKESO and Jabatan Perangkaan Malaysia. It is found that the assumption that the data follow poisson distribution has been violated. After correction for the problem of over dispersion, the predictor variables that are found to be significant in the model are gender, system of recording, industry type, two interaction effects (interaction between recording system and industry type and between year and industry type). Introduction Regression analysis is one of the most popular

  14. Classification and regression tree (CART) analyses of genomic signatures reveal sets of tetramers that discriminate temperature optima of archaea and bacteria

    Science.gov (United States)

    Dyer, Betsey D.; Kahn, Michael J.; LeBlanc, Mark D.

    2008-01-01

    Classification and regression tree (CART) analysis was applied to genome-wide tetranucleotide frequencies (genomic signatures) of 195 archaea and bacteria. Although genomic signatures have typically been used to classify evolutionary divergence, in this study, convergent evolution was the focus. Temperature optima for most of the organisms examined could be distinguished by CART analyses of tetranucleotide frequencies. This suggests that pervasive (nonlinear) qualities of genomes may reflect certain environmental conditions (such as temperature) in which those genomes evolved. The predominant use of GAGA and AGGA as the discriminating tetramers in CART models suggests that purine-loading and codon biases of thermophiles may explain some of the results. PMID:19054742

  15. The quantile regression approach to efficiency measurement: insights from Monte Carlo simulations.

    Science.gov (United States)

    Liu, Chunping; Laporte, Audrey; Ferguson, Brian S

    2008-09-01

    In the health economics literature there is an ongoing debate over approaches used to estimate the efficiency of health systems at various levels, from the level of the individual hospital - or nursing home - up to that of the health system as a whole. The two most widely used approaches to evaluating the efficiency with which various units deliver care are non-parametric data envelopment analysis (DEA) and parametric stochastic frontier analysis (SFA). Productivity researchers tend to have very strong preferences over which methodology to use for efficiency estimation. In this paper, we use Monte Carlo simulation to compare the performance of DEA and SFA in terms of their ability to accurately estimate efficiency. We also evaluate quantile regression as a potential alternative approach. A Cobb-Douglas production function, random error terms and a technical inefficiency term with different distributions are used to calculate the observed output. The results, based on these experiments, suggest that neither DEA nor SFA can be regarded as clearly dominant, and that, depending on the quantile estimated, the quantile regression approach may be a useful addition to the armamentarium of methods for estimating technical efficiency.

  16. Mastectomy or breast conserving surgery? Factors affecting type of surgical treatment for breast cancer – a classification tree approach

    International Nuclear Information System (INIS)

    Martin, Michael A; Meyricke, Ramona; O'Neill, Terry; Roberts, Steven

    2006-01-01

    A critical choice facing breast cancer patients is which surgical treatment – mastectomy or breast conserving surgery (BCS) – is most appropriate. Several studies have investigated factors that impact the type of surgery chosen, identifying features such as place of residence, age at diagnosis, tumor size, socio-economic and racial/ethnic elements as relevant. Such assessment of 'propensity' is important in understanding issues such as a reported under-utilisation of BCS among women for whom such treatment was not contraindicated. Using Western Australian (WA) data, we further examine the factors associated with the type of surgical treatment for breast cancer using a classification tree approach. This approach deals naturally with complicated interactions between factors, and so allows flexible and interpretable models for treatment choice to be built that add to the current understanding of this complex decision process. Data was extracted from the WA Cancer Registry on women diagnosed with breast cancer in WA from 1990 to 2000. Subjects' treatment preferences were predicted from covariates using both classification trees and logistic regression. Tumor size was the primary determinant of patient choice, subjects with tumors smaller than 20 mm in diameter preferring BCS. For subjects with tumors greater than 20 mm in diameter factors such as patient age, nodal status, and tumor histology become relevant as predictors of patient choice. Classification trees perform as well as logistic regression for predicting patient choice, but are much easier to interpret for clinical use. The selected tree can inform clinicians' advice to patients

  17. Identifying Risk Factors for Drug Use in an Iranian Treatment Sample: A Prediction Approach Using Decision Trees.

    Science.gov (United States)

    Amirabadizadeh, Alireza; Nezami, Hossein; Vaughn, Michael G; Nakhaee, Samaneh; Mehrpour, Omid

    2018-05-12

    Substance abuse exacts considerable social and health care burdens throughout the world. The aim of this study was to create a prediction model to better identify risk factors for drug use. A prospective cross-sectional study was conducted in South Khorasan Province, Iran. Of the total of 678 eligible subjects, 70% (n: 474) were randomly selected to provide a training set for constructing decision tree and multiple logistic regression (MLR) models. The remaining 30% (n: 204) were employed in a holdout sample to test the performance of the decision tree and MLR models. Predictive performance of different models was analyzed by the receiver operating characteristic (ROC) curve using the testing set. Independent variables were selected from demographic characteristics and history of drug use. For the decision tree model, the sensitivity and specificity for identifying people at risk for drug abuse were 66% and 75%, respectively, while the MLR model was somewhat less effective at 60% and 73%. Key independent variables in the analyses included first substance experience, age at first drug use, age, place of residence, history of cigarette use, and occupational and marital status. While study findings are exploratory and lack generalizability they do suggest that the decision tree model holds promise as an effective classification approach for identifying risk factors for drug use. Convergent with prior research in Western contexts is that age of drug use initiation was a critical factor predicting a substance use disorder.

  18. A holistic approach to determine tree structural complexity based on laser scanning data and fractal analysis.

    Science.gov (United States)

    Seidel, Dominik

    2018-01-01

    The three-dimensional forest structure affects many ecosystem functions and services provided by forests. As forests are made of trees it seems reasonable to approach their structure by investigating individual tree structure. Based on three-dimensional point clouds from laser scanning, a newly developed holistic approach is presented that enables to calculate the box dimension as a measure of structural complexity of individual trees using fractal analysis. It was found that the box dimension of trees was significantly different among the tested species, among trees belonging to the same species but exposed to different growing conditions (at gap vs. forest interior) or to different kinds of competition (intraspecific vs. interspecific). Furthermore, it was shown that the box dimension is positively related to the trees' growth rate. The box dimension was identified as an easy to calculate measure that integrates the effect of several external drivers of tree structure, such as competition strength and type, while simultaneously providing information on structure-related properties, like tree growth.

  19. Increased tree establishment in Lithuanian peat bogs--insights from field and remotely sensed approaches.

    Science.gov (United States)

    Edvardsson, Johannes; Šimanauskienė, Rasa; Taminskas, Julius; Baužienė, Ieva; Stoffel, Markus

    2015-02-01

    Over the past century an ongoing establishment of Scots pine (Pinus sylvestris L.), sometimes at accelerating rates, is noted at three studied Lithuanian peat bogs, namely Kerėplis, Rėkyva and Aukštumala, all representing different degrees of tree coverage and geographic settings. Present establishment rates seem to depend on tree density on the bog surface and are most significant at sparsely covered sites where about three-fourth of the trees have established since the mid-1990s, whereas the initial establishment in general was during the early to mid-19th century. Three methods were used to detect, compare and describe tree establishment: (1) tree counts in small plots, (2) dendrochronological dating of bog pine trees, and (3) interpretation of aerial photographs and historical maps of the study areas. In combination, the different approaches provide complimentary information but also weigh up each other's drawbacks. Tree counts in plots provided a reasonable overview of age class distributions and enabled capturing of the most recently established trees with ages less than 50 years. The dendrochronological analysis yielded accurate tree ages and a good temporal resolution of long-term changes. Tree establishment and spread interpreted from aerial photographs and historical maps provided a good overview of tree spread and total affected area. It also helped to verify the results obtained with the other methods and an upscaling of findings to the entire peat bogs. The ongoing spread of trees in predominantly undisturbed peat bogs is related to warmer and/or drier climatic conditions, and to a minor degree to land-use changes. Our results therefore provide valuable insights into vegetation changes in peat bogs, also with respect to bog response to ongoing and future climatic changes. Copyright © 2014 Elsevier B.V. All rights reserved.

  20. An Introduction to Recursive Partitioning: Rationale, Application, and Characteristics of Classification and Regression Trees, Bagging, and Random Forests

    Science.gov (United States)

    Strobl, Carolin; Malley, James; Tutz, Gerhard

    2009-01-01

    Recursive partitioning methods have become popular and widely used tools for nonparametric regression and classification in many scientific fields. Especially random forests, which can deal with large numbers of predictor variables even in the presence of complex interactions, have been applied successfully in genetics, clinical medicine, and…

  1. Systems analysis approach to probabilistic modeling of fault trees

    International Nuclear Information System (INIS)

    Bartholomew, R.J.; Qualls, C.R.

    1985-01-01

    A method of probabilistic modeling of fault tree logic combined with stochastic process theory (Markov modeling) has been developed. Systems are then quantitatively analyzed probabilistically in terms of their failure mechanisms including common cause/common mode effects and time dependent failure and/or repair rate effects that include synergistic and propagational mechanisms. The modeling procedure results in a state vector set of first order, linear, inhomogeneous, differential equations describing the time dependent probabilities of failure described by the fault tree. The solutions of this Failure Mode State Variable (FMSV) model are cumulative probability distribution functions of the system. A method of appropriate synthesis of subsystems to form larger systems is developed and applied to practical nuclear power safety systems

  2. A data mining approach to optimize pellets manufacturing process based on a decision tree algorithm.

    Science.gov (United States)

    Ronowicz, Joanna; Thommes, Markus; Kleinebudde, Peter; Krysiński, Jerzy

    2015-06-20

    The present study is focused on the thorough analysis of cause-effect relationships between pellet formulation characteristics (pellet composition as well as process parameters) and the selected quality attribute of the final product. The shape using the aspect ratio value expressed the quality of pellets. A data matrix for chemometric analysis consisted of 224 pellet formulations performed by means of eight different active pharmaceutical ingredients and several various excipients, using different extrusion/spheronization process conditions. The data set contained 14 input variables (both formulation and process variables) and one output variable (pellet aspect ratio). A tree regression algorithm consistent with the Quality by Design concept was applied to obtain deeper understanding and knowledge of formulation and process parameters affecting the final pellet sphericity. The clear interpretable set of decision rules were generated. The spehronization speed, spheronization time, number of holes and water content of extrudate have been recognized as the key factors influencing pellet aspect ratio. The most spherical pellets were achieved by using a large number of holes during extrusion, a high spheronizer speed and longer time of spheronization. The described data mining approach enhances knowledge about pelletization process and simultaneously facilitates searching for the optimal process conditions which are necessary to achieve ideal spherical pellets, resulting in good flow characteristics. This data mining approach can be taken into consideration by industrial formulation scientists to support rational decision making in the field of pellets technology. Copyright © 2015 Elsevier B.V. All rights reserved.

  3. Flux Measurements in Trees: Methodological Approach and Application to Vineyards

    Directory of Open Access Journals (Sweden)

    Francesca De Lorenzi

    2008-03-01

    Full Text Available In this paper a review of two sap flow methods for measuring the transpiration in vineyards is presented. The objective of this work is to examine the potential of detecting transpiration in trees in response to environmental stresses, particularly the high concentration of ozone (O3 in troposphere. The methods described are the stem heat balance and the thermal dissipation probe; advantages and disadvantages of each method are detailed. Applications of both techniques are shown, in two large commercial vineyards in Southern Italy (Apulia and Sicily, submitted to semi-arid climate. Sap flow techniques allow to measure transpiration at plant scale and an upscaling procedure is necessary to calculate the transpiration at the whole stand level. Here a general technique to link the value of transpiration at plant level to the canopy value is presented, based on experimental relationships between transpiration and biometric characteristics of the trees. In both vineyards transpiration measured by sap flow methods compares well with evapotranspiration measured by micrometeorological techniques at canopy scale. Moreover soil evaporation component has been quantified. In conclusion, comments about the suitability of the sap flow methods for studying the interactions between trees and ozone are given.

  4. A multi-scale relevance vector regression approach for daily urban water demand forecasting

    Science.gov (United States)

    Bai, Yun; Wang, Pu; Li, Chuan; Xie, Jingjing; Wang, Yin

    2014-09-01

    Water is one of the most important resources for economic and social developments. Daily water demand forecasting is an effective measure for scheduling urban water facilities. This work proposes a multi-scale relevance vector regression (MSRVR) approach to forecast daily urban water demand. The approach uses the stationary wavelet transform to decompose historical time series of daily water supplies into different scales. At each scale, the wavelet coefficients are used to train a machine-learning model using the relevance vector regression (RVR) method. The estimated coefficients of the RVR outputs for all of the scales are employed to reconstruct the forecasting result through the inverse wavelet transform. To better facilitate the MSRVR forecasting, the chaos features of the daily water supply series are analyzed to determine the input variables of the RVR model. In addition, an adaptive chaos particle swarm optimization algorithm is used to find the optimal combination of the RVR model parameters. The MSRVR approach is evaluated using real data collected from two waterworks and is compared with recently reported methods. The results show that the proposed MSRVR method can forecast daily urban water demand much more precisely in terms of the normalized root-mean-square error, correlation coefficient, and mean absolute percentage error criteria.

  5. A knowledge-based approach to the evaluation of fault trees

    International Nuclear Information System (INIS)

    Hwang, Yann-Jong; Chow, Louis R.; Huang, Henry C.

    1996-01-01

    A list of critical components is useful for determining the potential problems of a complex system. However, to find this list through evaluating the fault trees is expensive and time consuming. This paper intends to propose an integrated software program which consists of a fault tree constructor, a knowledge base, and an efficient algorithm for evaluating minimal cut sets of a large fault tree. The proposed algorithm uses the approaches of top-down heuristic searching and the probability-based truncation. That makes the evaluation of fault trees obviously efficient and provides critical components for solving the potential problems in complex systems. Finally, some practical fault trees are included to illustrate the results

  6. Partitioning of late gestation energy expenditure in ewes using indirect calorimetry and a linear regression approach

    DEFF Research Database (Denmark)

    Kiani, Alishir; Chwalibog, André; Nielsen, Mette O

    2007-01-01

    Late gestation energy expenditure (EE(gest)) originates from energy expenditure (EE) of development of conceptus (EE(conceptus)) and EE of homeorhetic adaptation of metabolism (EE(homeorhetic)). Even though EE(gest) is relatively easy to quantify, its partitioning is problematic. In the present...... study metabolizable energy (ME) intake ranges for twin-bearing ewes were 220-440, 350- 700, 350-900 kJ per metabolic body weight (W0.75) at week seven, five, two pre-partum respectively. Indirect calorimetry and a linear regression approach were used to quantify EE(gest) and then partition to EE......(conceptus) and EE(homeorhetic). Energy expenditure of basal metabolism of the non-gravid tissues (EE(bmng)), derived from the intercept of the linear regression equation of retained energy [kJ/W0.75] and ME intake [kJ/W(0.75)], was 298 [kJ/ W0.75]. Values of the intercepts of the regression equations at week seven...

  7. Partitioning of Multivariate Phenotypes using Regression Trees Reveals Complex Patterns of Adaptation to Climate across the Range of Black Cottonwood (Populus trichocarpa

    Directory of Open Access Journals (Sweden)

    Regis Wendpouire Oubida

    2015-03-01

    Full Text Available Local adaptation to climate in temperate forest trees involves the integration of multiple physiological, morphological, and phenological traits. Latitudinal clines are frequently observed for these traits, but environmental constraints also track longitude and altitude. We combined extensive phenotyping of 12 candidate adaptive traits, multivariate regression trees, quantitative genetics, and a genome-wide panel of SNP markers to better understand the interplay among geography, climate, and adaptation to abiotic factors in Populus trichocarpa. Heritabilities were low to moderate (0.13 to 0.32 and population differentiation for many traits exceeded the 99th percentile of the genome-wide distribution of FST, suggesting local adaptation. When climate variables were taken as predictors and the 12 traits as response variables in a multivariate regression tree analysis, evapotranspiration (Eref explained the most variation, with subsequent splits related to mean temperature of the warmest month, frost-free period (FFP, and mean annual precipitation (MAP. These grouping matched relatively well the splits using geographic variables as predictors: the northernmost groups (short FFP and low Eref had the lowest growth, and lowest cold injury index; the southern British Columbia group (low Eref and intermediate temperatures had average growth and cold injury index; the group from the coast of California and Oregon (high Eref and FFP had the highest growth performance and the highest cold injury index; and the southernmost, high-altitude group (with high Eref and low FFP performed poorly, had high cold injury index, and lower water use efficiency. Taken together, these results suggest variation in both temperature and water availability across the range shape multivariate adaptive traits in poplar.

  8. A symptom based decision tree approach to boiling water reactor emergency operating procedures

    International Nuclear Information System (INIS)

    Knobel, R.C.

    1984-01-01

    This paper describes a Decision Tree approach to development of BWR Emergency Operating Procedures for use by operators during emergencies. This approach utilizes the symptom based Emergency Procedure Guidelines approved for implementation by the USNRC. Included in the paper is a discussion of the relative merits of the event based Emergency Operating Procedures currently in use at USBWR plants. The body of the paper is devoted to a discussion of the Decision Tree Approach to Emergency Operating Procedures soon to be implemented at two United States Boiling Water Reactor plants, why this approach solves many of the problems with procedures indentified in the post accident reviews of Three Mile Island procedures, and why only now is this approach both desirable and feasible. The paper discusses how nuclear plant simulators were involved in the development of the Emergency Operating Procedure decision trees, and in the verification and validation of these procedures. (orig./HP)

  9. Which sociodemographic factors are important on smoking behaviour of high school students? The contribution of classification and regression tree methodology in a broad epidemiological survey.

    Science.gov (United States)

    Ozge, C; Toros, F; Bayramkaya, E; Camdeviren, H; Sasmaz, T

    2006-08-01

    The purpose of this study is to evaluate the most important sociodemographic factors on smoking status of high school students using a broad randomised epidemiological survey. Using in-class, self administered questionnaire about their sociodemographic variables and smoking behaviour, a representative sample of total 3304 students of preparatory, 9th, 10th, and 11th grades, from 22 randomly selected schools of Mersin, were evaluated and discriminative factors have been determined using appropriate statistics. In addition to binary logistic regression analysis, the study evaluated combined effects of these factors using classification and regression tree methodology, as a new statistical method. The data showed that 38% of the students reported lifetime smoking and 16.9% of them reported current smoking with a male predominancy and increasing prevalence by age. Second hand smoking was reported at a 74.3% frequency with father predominance (56.6%). The significantly important factors that affect current smoking in these age groups were increased by household size, late birth rank, certain school types, low academic performance, increased second hand smoking, and stress (especially reported as separation from a close friend or because of violence at home). Classification and regression tree methodology showed the importance of some neglected sociodemographic factors with a good classification capacity. It was concluded that, as closely related with sociocultural factors, smoking was a common problem in this young population, generating important academic and social burden in youth life and with increasing data about this behaviour and using new statistical methods, effective coping strategies could be composed.

  10. Bridging process-based and empirical approaches to modeling tree growth

    Science.gov (United States)

    Harry T. Valentine; Annikki Makela; Annikki Makela

    2005-01-01

    The gulf between process-based and empirical approaches to modeling tree growth may be bridged, in part, by the use of a common model. To this end, we have formulated a process-based model of tree growth that can be fitted and applied in an empirical mode. The growth model is grounded in pipe model theory and an optimal control model of crown development. Together, the...

  11. When homogeneity meets heterogeneity: the geographically weighted regression with spatial lag approach to prenatal care utilization

    Science.gov (United States)

    Shoff, Carla; Chen, Vivian Yi-Ju; Yang, Tse-Chuan

    2014-01-01

    Using geographically weighted regression (GWR), a recent study by Shoff and colleagues (2012) investigated the place-specific risk factors for prenatal care utilization in the US and found that most of the relationships between late or not prenatal care and its determinants are spatially heterogeneous. However, the GWR approach may be subject to the confounding effect of spatial homogeneity. The goal of this study is to address this concern by including both spatial homogeneity and heterogeneity into the analysis. Specifically, we employ an analytic framework where a spatially lagged (SL) effect of the dependent variable is incorporated into the GWR model, which is called GWR-SL. Using this innovative framework, we found evidence to argue that spatial homogeneity is neglected in the study by Shoff et al. (2012) and the results are changed after considering the spatially lagged effect of prenatal care utilization. The GWR-SL approach allows us to gain a place-specific understanding of prenatal care utilization in US counties. In addition, we compared the GWR-SL results with the results of conventional approaches (i.e., OLS and spatial lag models) and found that GWR-SL is the preferred modeling approach. The new findings help us to better estimate how the predictors are associated with prenatal care utilization across space, and determine whether and how the level of prenatal care utilization in neighboring counties matters. PMID:24893033

  12. Binary Logistic Regression Versus Boosted Regression Trees in Assessing Landslide Susceptibility for Multiple-Occurring Regional Landslide Events: Application to the 2009 Storm Event in Messina (Sicily, southern Italy).

    Science.gov (United States)

    Lombardo, L.; Cama, M.; Maerker, M.; Parisi, L.; Rotigliano, E.

    2014-12-01

    This study aims at comparing the performances of Binary Logistic Regression (BLR) and Boosted Regression Trees (BRT) methods in assessing landslide susceptibility for multiple-occurrence regional landslide events within the Mediterranean region. A test area was selected in the north-eastern sector of Sicily (southern Italy), corresponding to the catchments of the Briga and the Giampilieri streams both stretching for few kilometres from the Peloritan ridge (eastern Sicily, Italy) to the Ionian sea. This area was struck on the 1st October 2009 by an extreme climatic event resulting in thousands of rapid shallow landslides, mainly of debris flows and debris avalanches types involving the weathered layer of a low to high grade metamorphic bedrock. Exploiting the same set of predictors and the 2009 landslide archive, BLR- and BRT-based susceptibility models were obtained for the two catchments separately, adopting a random partition (RP) technique for validation; besides, the models trained in one of the two catchments (Briga) were tested in predicting the landslide distribution in the other (Giampilieri), adopting a spatial partition (SP) based validation procedure. All the validation procedures were based on multi-folds tests so to evaluate and compare the reliability of the fitting, the prediction skill, the coherence in the predictor selection and the precision of the susceptibility estimates. All the obtained models for the two methods produced very high predictive performances, with a general congruence between BLR and BRT in the predictor importance. In particular, the research highlighted that BRT-models reached a higher prediction performance with respect to BLR-models, for RP based modelling, whilst for the SP-based models the difference in predictive skills between the two methods dropped drastically, converging to an analogous excellent performance. However, when looking at the precision of the probability estimates, BLR demonstrated to produce more robust

  13. Ensemble of trees approaches to risk adjustment for evaluating a hospital's performance.

    Science.gov (United States)

    Liu, Yang; Traskin, Mikhail; Lorch, Scott A; George, Edward I; Small, Dylan

    2015-03-01

    A commonly used method for evaluating a hospital's performance on an outcome is to compare the hospital's observed outcome rate to the hospital's expected outcome rate given its patient (case) mix and service. The process of calculating the hospital's expected outcome rate given its patient mix and service is called risk adjustment (Iezzoni 1997). Risk adjustment is critical for accurately evaluating and comparing hospitals' performances since we would not want to unfairly penalize a hospital just because it treats sicker patients. The key to risk adjustment is accurately estimating the probability of an Outcome given patient characteristics. For cases with binary outcomes, the method that is commonly used in risk adjustment is logistic regression. In this paper, we consider ensemble of trees methods as alternatives for risk adjustment, including random forests and Bayesian additive regression trees (BART). Both random forests and BART are modern machine learning methods that have been shown recently to have excellent performance for prediction of outcomes in many settings. We apply these methods to carry out risk adjustment for the performance of neonatal intensive care units (NICU). We show that these ensemble of trees methods outperform logistic regression in predicting mortality among babies treated in NICU, and provide a superior method of risk adjustment compared to logistic regression.

  14. A MULTIVARIATE APPROACH TO ANALYSE NATIVE FOREST TREE SPECIE SEEDS

    Directory of Open Access Journals (Sweden)

    Alessandro Dal Col Lúcio

    2006-03-01

    Full Text Available This work grouped, by species, the most similar seed tree, using the variables observed in exotic forest species of theBrazilian flora of seeds collected in the Forest Research and Soil Conservation Center of Santa Maria, Rio Grande do Sul, analyzedfrom January, 1997, to march, 2003. For the cluster analysis, all the species that possessed four or more analyses per lot wereanalyzed by the hierarchical Clustering method, of the standardized Euclidian medium distance, being also a principal componentanalysis technique for reducing the number of variables. The species Callistemon speciosus, Cassia fistula, Eucalyptus grandis,Eucalyptus robusta, Eucalyptus saligna, Eucalyptus tereticornis, Delonix regia, Jacaranda mimosaefolia e Pinus elliottii presentedmore than four analyses per lot, in which the third and fourth main components explained 80% of the total variation. The clusteranalysis was efficient in the separation of the groups of all tested species, as well as the method of the main components.

  15. RAVEN. Dynamic Event Tree Approach Level III Milestone

    Energy Technology Data Exchange (ETDEWEB)

    Alfonsi, Andrea [Idaho National Lab. (INL), Idaho Falls, ID (United States); Rabiti, Cristian [Idaho National Lab. (INL), Idaho Falls, ID (United States); Mandelli, Diego [Idaho National Lab. (INL), Idaho Falls, ID (United States); Cogliati, Joshua [Idaho National Lab. (INL), Idaho Falls, ID (United States); Kinoshita, Robert [Idaho National Lab. (INL), Idaho Falls, ID (United States)

    2014-07-01

    Conventional Event-Tree (ET) based methodologies are extensively used as tools to perform reliability and safety assessment of complex and critical engineering systems. One of the disadvantages of these methods is that timing/sequencing of events and system dynamics are not explicitly accounted for in the analysis. In order to overcome these limitations several techniques, also know as Dynamic Probabilistic Risk Assessment (DPRA), have been developed. Monte-Carlo (MC) and Dynamic Event Tree (DET) are two of the most widely used D-PRA methodologies to perform safety assessment of Nuclear Power Plants (NPP). In the past two years, the Idaho National Laboratory (INL) has developed its own tool to perform Dynamic PRA: RAVEN (Reactor Analysis and Virtual control ENvironment). RAVEN has been designed to perform two main tasks: 1) control logic driver for the new Thermo-Hydraulic code RELAP-7 and 2) post-processing tool. In the first task, RAVEN acts as a deterministic controller in which the set of control logic laws (user defined) monitors the RELAP-7 simulation and controls the activation of specific systems. Moreover, the control logic infrastructure is used to model stochastic events, such as components failures, and perform uncertainty propagation. Such stochastic modeling is deployed using both MC and DET algorithms. In the second task, RAVEN processes the large amount of data generated by RELAP-7 using data-mining based algorithms. This report focuses on the analysis of dynamic stochastic systems using the newly developed RAVEN DET capability. As an example, a DPRA analysis, using DET, of a simplified pressurized water reactor for a Station Black-Out (SBO) scenario is presented.

  16. RAVEN: Dynamic Event Tree Approach Level III Milestone

    Energy Technology Data Exchange (ETDEWEB)

    Andrea Alfonsi; Cristian Rabiti; Diego Mandelli; Joshua Cogliati; Robert Kinoshita

    2013-07-01

    Conventional Event-Tree (ET) based methodologies are extensively used as tools to perform reliability and safety assessment of complex and critical engineering systems. One of the disadvantages of these methods is that timing/sequencing of events and system dynamics are not explicitly accounted for in the analysis. In order to overcome these limitations several techniques, also know as Dynamic Probabilistic Risk Assessment (DPRA), have been developed. Monte-Carlo (MC) and Dynamic Event Tree (DET) are two of the most widely used D-PRA methodologies to perform safety assessment of Nuclear Power Plants (NPP). In the past two years, the Idaho National Laboratory (INL) has developed its own tool to perform Dynamic PRA: RAVEN (Reactor Analysis and Virtual control ENvironment). RAVEN has been designed to perform two main tasks: 1) control logic driver for the new Thermo-Hydraulic code RELAP-7 and 2) post-processing tool. In the first task, RAVEN acts as a deterministic controller in which the set of control logic laws (user defined) monitors the RELAP-7 simulation and controls the activation of specific systems. Moreover, the control logic infrastructure is used to model stochastic events, such as components failures, and perform uncertainty propagation. Such stochastic modeling is deployed using both MC and DET algorithms. In the second task, RAVEN processes the large amount of data generated by RELAP-7 using data-mining based algorithms. This report focuses on the analysis of dynamic stochastic systems using the newly developed RAVEN DET capability. As an example, a DPRA analysis, using DET, of a simplified pressurized water reactor for a Station Black-Out (SBO) scenario is presented.

  17. A different approach to estimate nonlinear regression model using numerical methods

    Science.gov (United States)

    Mahaboob, B.; Venkateswarlu, B.; Mokeshrayalu, G.; Balasiddamuni, P.

    2017-11-01

    This research paper concerns with the computational methods namely the Gauss-Newton method, Gradient algorithm methods (Newton-Raphson method, Steepest Descent or Steepest Ascent algorithm method, the Method of Scoring, the Method of Quadratic Hill-Climbing) based on numerical analysis to estimate parameters of nonlinear regression model in a very different way. Principles of matrix calculus have been used to discuss the Gradient-Algorithm methods. Yonathan Bard [1] discussed a comparison of gradient methods for the solution of nonlinear parameter estimation problems. However this article discusses an analytical approach to the gradient algorithm methods in a different way. This paper describes a new iterative technique namely Gauss-Newton method which differs from the iterative technique proposed by Gorden K. Smyth [2]. Hans Georg Bock et.al [10] proposed numerical methods for parameter estimation in DAE’s (Differential algebraic equation). Isabel Reis Dos Santos et al [11], Introduced weighted least squares procedure for estimating the unknown parameters of a nonlinear regression metamodel. For large-scale non smooth convex minimization the Hager and Zhang (HZ) conjugate gradient Method and the modified HZ (MHZ) method were presented by Gonglin Yuan et al [12].

  18. A modified approach to estimating sample size for simple logistic regression with one continuous covariate.

    Science.gov (United States)

    Novikov, I; Fund, N; Freedman, L S

    2010-01-15

    Different methods for the calculation of sample size for simple logistic regression (LR) with one normally distributed continuous covariate give different results. Sometimes the difference can be large. Furthermore, some methods require the user to specify the prevalence of cases when the covariate equals its population mean, rather than the more natural population prevalence. We focus on two commonly used methods and show through simulations that the power for a given sample size may differ substantially from the nominal value for one method, especially when the covariate effect is large, while the other method performs poorly if the user provides the population prevalence instead of the required parameter. We propose a modification of the method of Hsieh et al. that requires specification of the population prevalence and that employs Schouten's sample size formula for a t-test with unequal variances and group sizes. This approach appears to increase the accuracy of the sample size estimates for LR with one continuous covariate.

  19. Heterogeneous effects of oil shocks on exchange rates: evidence from a quantile regression approach.

    Science.gov (United States)

    Su, Xianfang; Zhu, Huiming; You, Wanhai; Ren, Yinghua

    2016-01-01

    The determinants of exchange rates have attracted considerable attention among researchers over the past several decades. Most studies, however, ignore the possibility that the impact of oil shocks on exchange rates could vary across the exchange rate returns distribution. We employ a quantile regression approach to address this issue. Our results indicate that the effect of oil shocks on exchange rates is heterogeneous across quantiles. A large US depreciation or appreciation tends to heighten the effects of oil shocks on exchange rate returns. Positive oil demand shocks lead to appreciation pressures in oil-exporting countries and this result is robust across lower and upper return distributions. These results offer rich and useful information for investors and decision-makers.

  20. Quantifying the ability of environmental parameters to predict soil texture fractions using regression-tree model with GIS and LIDAR data

    DEFF Research Database (Denmark)

    Greve, Mogens Humlekrog; Bou Kheir, Rania; Greve, Mette Balslev

    2012-01-01

    Soil texture is an important soil characteristic that drives crop production and field management, and is the basis for environmental monitoring (including soil quality and sustainability, hydrological and ecological processes, and climate change simulations). The combination of coarse sand, fine...... sand, silt, and clay in soil determines its textural classification. This study used Geographic Information Systems (GIS) and regression-tree modeling to precisely quantify the relationships between the soil texture fractions and different environmental parameters on a national scale, and to detect...... precipitation, seasonal precipitation to statistically explain soil texture fractions field/laboratory measurements (45,224 sampling sites) in the area of interest (Denmark). The developed strongest relationships were associated with clay and silt, variance being equal to 60%, followed by coarse sand (54...

  1. Predictors of adherence with self-care guidelines among persons with type 2 diabetes: results from a logistic regression tree analysis.

    Science.gov (United States)

    Yamashita, Takashi; Kart, Cary S; Noe, Douglas A

    2012-12-01

    Type 2 diabetes is known to contribute to health disparities in the U.S. and failure to adhere to recommended self-care behaviors is a contributing factor. Intervention programs face difficulties as a result of patient diversity and limited resources. With data from the 2005 Behavioral Risk Factor Surveillance System, this study employs a logistic regression tree algorithm to identify characteristics of sub-populations with type 2 diabetes according to their reported frequency of adherence to four recommended diabetes self-care behaviors including blood glucose monitoring, foot examination, eye examination and HbA1c testing. Using Andersen's health behavior model, need factors appear to dominate the definition of which sub-groups were at greatest risk for low as well as high adherence. Findings demonstrate the utility of easily interpreted tree diagrams to design specific culturally appropriate intervention programs targeting sub-populations of diabetes patients who need to improve their self-care behaviors. Limitations and contributions of the study are discussed.

  2. A new approach to nuclear reactor design optimization using genetic algorithms and regression analysis

    International Nuclear Information System (INIS)

    Kumar, Akansha; Tsvetkov, Pavel V.

    2015-01-01

    desired power peaking limits, desired effective and infinite neutron multiplication factors, high fast fission factor, high thermal efficiency in the conversion from thermal energy to electrical energy using the Brayton cycle, and high fuel burn-up. It is to be noted that we have kept the total mass of the fuel as constant. In this work, we present a module based (modular) approach to perform the optimization wherein, we have defined the following modules: single fuel pin cell, whole core, thermal–hydraulics, and energy conversion. In each of the modules we have defined a specific set of parameters and optimization objectives. The GA system (GAS), and RS together, play the role of optimizing each of the individual modules, and integrating the modules to determine the final nuclear reactor core. However, implementation of GA could lead to a local minimum or a non-unique set of parameters, those meet the specific optimization objectives. The GA code is built using Java, neutronic analysis using MCNP6, thermal–hydraulics calculations using Java, and regression analysis using R

  3. A regression tree for identifying combinations of fall risk factors associated to recurrent falling: a cross-sectional elderly population-based study.

    Science.gov (United States)

    Kabeshova, A; Annweiler, C; Fantino, B; Philip, T; Gromov, V A; Launay, C P; Beauchet, O

    2014-06-01

    Regression tree (RT) analyses are particularly adapted to explore the risk of recurrent falling according to various combinations of fall risk factors compared to logistic regression models. The aims of this study were (1) to determine which combinations of fall risk factors were associated with the occurrence of recurrent falls in older community-dwellers, and (2) to compare the efficacy of RT and multiple logistic regression model for the identification of recurrent falls. A total of 1,760 community-dwelling volunteers (mean age ± standard deviation, 71.0 ± 5.1 years; 49.4 % female) were recruited prospectively in this cross-sectional study. Age, gender, polypharmacy, use of psychoactive drugs, fear of falling (FOF), cognitive disorders and sad mood were recorded. In addition, the history of falls within the past year was recorded using a standardized questionnaire. Among 1,760 participants, 19.7 % (n = 346) were recurrent fallers. The RT identified 14 nodes groups and 8 end nodes with FOF as the first major split. Among participants with FOF, those who had sad mood and polypharmacy formed the end node with the greatest OR for recurrent falls (OR = 6.06 with p falls (OR = 0.25 with p factors for recurrent falls, the combination most associated with recurrent falls involving FOF, sad mood and polypharmacy. The FOF emerged as the risk factor strongly associated with recurrent falls. In addition, RT and multiple logistic regression were not sensitive enough to identify the majority of recurrent fallers but appeared efficient in detecting individuals not at risk of recurrent falls.

  4. Decision Tree Approach to Discovering Fraud in Leasing Agreements

    OpenAIRE

    Horvat Ivan; Pejić Bach Mirjana; Merkač Skok Marjana

    2014-01-01

    Background: Fraud attempts create large losses for financing subjects in modern economies. At the same time, leasing agreements have become more and more popular as a means of financing objects such as machinery and vehicles, but are more vulnerable to fraud attempts. Objectives: The goal of the paper is to estimate the usability of the data mining approach in discovering fraud in leasing agreements. Methods/Approach: Real-world data from one Croatian leasing firm was used for creating tow mo...

  5. A Gaussian process regression based hybrid approach for short-term wind speed prediction

    International Nuclear Information System (INIS)

    Zhang, Chi; Wei, Haikun; Zhao, Xin; Liu, Tianhong; Zhang, Kanjian

    2016-01-01

    Highlights: • A novel hybrid approach is proposed for short-term wind speed prediction. • This method combines the parametric AR model with the non-parametric GPR model. • The relative importance of different inputs is considered. • Different types of covariance functions are considered and combined. • It can provide both accurate point forecasts and satisfactory prediction intervals. - Abstract: This paper proposes a hybrid model based on autoregressive (AR) model and Gaussian process regression (GPR) for probabilistic wind speed forecasting. In the proposed approach, the AR model is employed to capture the overall structure from wind speed series, and the GPR is adopted to extract the local structure. Additionally, automatic relevance determination (ARD) is used to take into account the relative importance of different inputs, and different types of covariance functions are combined to capture the characteristics of the data. The proposed hybrid model is compared with the persistence model, artificial neural network (ANN), and support vector machine (SVM) for one-step ahead forecasting, using wind speed data collected from three wind farms in China. The forecasting results indicate that the proposed method can not only improve point forecasts compared with other methods, but also generate satisfactory prediction intervals.

  6. A linear regression approach to evaluate the green supply chain management impact on industrial organizational performance.

    Science.gov (United States)

    Mumtaz, Ubaidullah; Ali, Yousaf; Petrillo, Antonella

    2018-05-15

    The increase in the environmental pollution is one of the most important topic in today's world. In this context, the industrial activities can pose a significant threat to the environment. To manage problems associate to industrial activities several methods, techniques and approaches have been developed. Green supply chain management (GSCM) is considered one of the most important "environmental management approach". In developing countries such as Pakistan the implementation of GSCM practices is still in its initial stages. Lack of knowledge about its effects on economic performance is the reason because of industries fear to implement these practices. The aim of this research is to perceive the effects of GSCM practices on organizational performance in Pakistan. In this research the GSCM practices considered are: internal practices, external practices, investment recovery and eco-design. While, the performance parameters considered are: environmental pollution, operational cost and organizational flexibility. A set of hypothesis propose the effect of each GSCM practice on the performance parameters. Factor analysis and linear regression are used to analyze the survey data of Pakistani industries, in order to authenticate these hypotheses. The findings of this research indicate a decrease in environmental pollution and operational cost with the implementation of GSCM practices, whereas organizational flexibility has not improved for Pakistani industries. These results aim to help managers regarding their decision of implementing GSCM practices in the industrial sector of Pakistan. Copyright © 2017 Elsevier B.V. All rights reserved.

  7. Explaining the heterogeneous scrapie surveillance figures across Europe: a meta-regression approach

    Directory of Open Access Journals (Sweden)

    Ru Giuseppe

    2007-06-01

    Full Text Available Abstract Background Two annual surveys, the abattoir and the fallen stock, monitor the presence of scrapie across Europe. A simple comparison between the prevalence estimates in different countries reveals that, in 2003, the abattoir survey appears to detect more scrapie in some countries. This is contrary to evidence suggesting the greater ability of the fallen stock survey to detect the disease. We applied meta-analysis techniques to study this apparent heterogeneity in the behaviour of the surveys across Europe. Furthermore, we conducted a meta-regression analysis to assess the effect of country-specific characteristics on the variability. We have chosen the odds ratios between the two surveys to inform the underlying relationship between them and to allow comparisons between the countries under the meta-regression framework. Baseline risks, those of the slaughtered populations across Europe, and country-specific covariates, available from the European Commission Report, were inputted in the model to explain the heterogeneity. Results Our results show the presence of significant heterogeneity in the odds ratios between countries and no reduction in the variability after adjustment for the different risks in the baseline populations. Three countries contributed the most to the overall heterogeneity: Germany, Ireland and The Netherlands. The inclusion of country-specific covariates did not, in general, reduce the variability except for one variable: the proportion of the total adult sheep population sampled as fallen stock by each country. A large residual heterogeneity remained in the model indicating the presence of substantial effect variability between countries. Conclusion The meta-analysis approach was useful to assess the level of heterogeneity in the implementation of the surveys and to explore the reasons for the variation between countries.

  8. Which Types of Leadership Styles Do Followers Prefer? A Decision Tree Approach

    Science.gov (United States)

    Salehzadeh, Reza

    2017-01-01

    Purpose: The purpose of this paper is to propose a new method to find the appropriate leadership styles based on the followers' preferences using the decision tree technique. Design/methodology/approach: Statistical population includes the students of the University of Isfahan. In total, 750 questionnaires were distributed; out of which, 680…

  9. A decision tree approach using silvics to guide planning for forest restoration

    Science.gov (United States)

    Sharon M. Hermann; John S. Kush; John C. Gilbert

    2013-01-01

    We created a decision tree based on silvics of longleaf pine (Pinus palustris) and historical descriptions to develop approaches for restoration management at Horseshoe Bend National Military Park located in central Alabama. A National Park Service goal is to promote structure and composition of a forest that likely surrounded the 1814 battlefield....

  10. Asymmetrical Responses of Ecosystem Processes to Positive Versus Negative Precipitation Extremes: a Replicated Regression Experimental Approach

    Science.gov (United States)

    Felton, A. J.; Smith, M. D.

    2016-12-01

    Heightened climatic variability due to atmospheric warming is forecast to increase the frequency and severity of climate extremes. In particular, changes to interannual variability in precipitation, characterized by increases in extreme wet and dry years, are likely to impact virtually all terrestrial ecosystem processes. However, to date experimental approaches have yet to explicitly test how ecosystem processes respond to multiple levels of climatic extremity, limiting our understanding of how ecosystems will respond to forecast increases in the magnitude of climate extremes. Here we report the results of a replicated regression experimental approach, in which we imposed 9 and 11 levels of growing season precipitation amount and extremity in mesic grassland during 2015 and 2016, respectively. Each level corresponded to a specific percentile of the long-term record, which produced a large gradient of soil moisture conditions that ranged from extreme wet to extreme dry. In both 2015 and 2016, asymptotic responses to water availability were observed for soil respiration. This asymmetry was driven in part by transitions between soil moisture versus temperature constraints on respiration as conditions became increasingly dry versus increasingly wet. In 2015, aboveground net primary production (ANPP) exhibited asymmetric responses to precipitation that largely mirrored those of soil respiration. In total, our results suggest that in this mesic ecosystem, these two carbon cycle processes were more sensitive to extreme drought than to extreme wet years. Future work will assess ANPP responses for 2016, soil nutrient supply and physiological responses of the dominant plant species. Future efforts are needed to compare our findings across a diverse array of ecosystem types, and in particular how the timing and magnitude of precipitation events may modify the response of ecosystem processes to increasing magnitudes of precipitation extremes.

  11. A Vector Approach to Regression Analysis and Its Implications to Heavy-Duty Diesel Emissions

    Energy Technology Data Exchange (ETDEWEB)

    McAdams, H.T.

    2001-02-14

    An alternative approach is presented for the regression of response data on predictor variables that are not logically or physically separable. The methodology is demonstrated by its application to a data set of heavy-duty diesel emissions. Because of the covariance of fuel properties, it is found advantageous to redefine the predictor variables as vectors, in which the original fuel properties are components, rather than as scalars each involving only a single fuel property. The fuel property vectors are defined in such a way that they are mathematically independent and statistically uncorrelated. Because the available data set does not allow definitive separation of vehicle and fuel effects, and because test fuels used in several of the studies may be unrealistically contrived to break the association of fuel variables, the data set is not considered adequate for development of a full-fledged emission model. Nevertheless, the data clearly show that only a few basic patterns of fuel-property variation affect emissions and that the number of these patterns is considerably less than the number of variables initially thought to be involved. These basic patterns, referred to as ''eigenfuels,'' may reflect blending practice in accordance with their relative weighting in specific circumstances. The methodology is believed to be widely applicable in a variety of contexts. It promises an end to the threat of collinearity and the frustration of attempting, often unrealistically, to separate variables that are inseparable.

  12. Model-free prediction and regression a transformation-based approach to inference

    CERN Document Server

    Politis, Dimitris N

    2015-01-01

    The Model-Free Prediction Principle expounded upon in this monograph is based on the simple notion of transforming a complex dataset to one that is easier to work with, e.g., i.i.d. or Gaussian. As such, it restores the emphasis on observable quantities, i.e., current and future data, as opposed to unobservable model parameters and estimates thereof, and yields optimal predictors in diverse settings such as regression and time series. Furthermore, the Model-Free Bootstrap takes us beyond point prediction in order to construct frequentist prediction intervals without resort to unrealistic assumptions such as normality. Prediction has been traditionally approached via a model-based paradigm, i.e., (a) fit a model to the data at hand, and (b) use the fitted model to extrapolate/predict future data. Due to both mathematical and computational constraints, 20th century statistical practice focused mostly on parametric models. Fortunately, with the advent of widely accessible powerful computing in the late 1970s, co...

  13. A regression modeling approach for studying carbonate system variability in the northern Gulf of Alaska

    Science.gov (United States)

    Evans, Wiley; Mathis, Jeremy T.; Winsor, Peter; Statscewich, Hank; Whitledge, Terry E.

    2013-01-01

    northern Gulf of Alaska (GOA) shelf experiences carbonate system variability on seasonal and annual time scales, but little information exists to resolve higher frequency variability in this region. To resolve this variability using platforms-of-opportunity, we present multiple linear regression (MLR) models constructed from hydrographic data collected along the Northeast Pacific Global Ocean Ecosystems Dynamics (GLOBEC) Seward Line. The empirical algorithms predict dissolved inorganic carbon (DIC) and total alkalinity (TA) using observations of nitrate (NO3-), temperature, salinity and pressure from the surface to 500 m, with R2s > 0.97 and RMSE values of 11 µmol kg-1 for DIC and 9 µmol kg-1 for TA. We applied these relationships to high-resolution NO3- data sets collected during a novel 20 h glider flight and a GLOBEC mesoscale SeaSoar survey. Results from the glider flight demonstrated time/space along-isopycnal variability of aragonite saturations (Ωarag) associated with a dicothermal layer (a cold near-surface layer found in high latitude oceans) that rivaled changes seen vertically through the thermocline. The SeaSoar survey captured the uplift to aragonite saturation horizon (depth where Ωarag = 1) shoaled to a previously unseen depth in the northern GOA. This work is similar to recent studies aimed at predicting the carbonate system in continental margin settings, albeit demonstrates that a NO3--based approach can be applied to high-latitude data collected from platforms capable of high-frequency measurements.

  14. A logistic regression approach to model the willingness of consumers to adopt renewable energy sources

    Science.gov (United States)

    Ulkhaq, M. M.; Widodo, A. K.; Yulianto, M. F. A.; Widhiyaningrum; Mustikasari, A.; Akshinta, P. Y.

    2018-03-01

    The implementation of renewable energy in this globalization era is inevitable since the non-renewable energy leads to climate change and global warming; hence, it does harm the environment and human life. However, in the developing countries, such as Indonesia, the implementation of the renewable energy sources does face technical and social problems. For the latter, renewable energy sources implementation is only effective if the public is aware of its benefits. This research tried to identify the determinants that influence consumers’ intention in adopting renewable energy sources. In addition, this research also tried to predict the consumers who are willing to apply the renewable energy sources in their houses using a logistic regression approach. A case study was conducted in Semarang, Indonesia. The result showed that only eight variables (from fifteen) that are significant statistically, i.e., educational background, employment status, income per month, average electricity cost per month, certainty about the efficiency of renewable energy project, relatives’ influence to adopt the renewable energy sources, energy tax deduction, and the condition of the price of the non-renewable energy sources. The finding of this study could be used as a basis for the government to set up a policy towards an implementation of the renewable energy sources.

  15. An Ionospheric Index Model based on Linear Regression and Neural Network Approaches

    Science.gov (United States)

    Tshisaphungo, Mpho; McKinnell, Lee-Anne; Bosco Habarulema, John

    2017-04-01

    The ionosphere is well known to reflect radio wave signals in the high frequency (HF) band due to the present of electron and ions within the region. To optimise the use of long distance HF communications, it is important to understand the drivers of ionospheric storms and accurately predict the propagation conditions especially during disturbed days. This paper presents the development of an ionospheric storm-time index over the South African region for the application of HF communication users. The model will result into a valuable tool to measure the complex ionospheric behaviour in an operational space weather monitoring and forecasting environment. The development of an ionospheric storm-time index is based on a single ionosonde station data over Grahamstown (33.3°S,26.5°E), South Africa. Critical frequency of the F2 layer (foF2) measurements for a period 1996-2014 were considered for this study. The model was developed based on linear regression and neural network approaches. In this talk validation results for low, medium and high solar activity periods will be discussed to demonstrate model's performance.

  16. Modeling Forest Structural Parameters in the Mediterranean Pines of Central Spain using QuickBird-2 Imagery and Classification and Regression Tree Analysis (CART

    Directory of Open Access Journals (Sweden)

    José A. Delgado

    2012-01-01

    Full Text Available Forest structural parameters such as quadratic mean diameter, basal area, and number of trees per unit area are important for the assessment of wood volume and biomass and represent key forest inventory attributes. Forest inventory information is required to support sustainable management, carbon accounting, and policy development activities. Digital image processing of remotely sensed imagery is increasingly utilized to assist traditional, more manual, methods in the estimation of forest structural attributes over extensive areas, also enabling evaluation of change over time. Empirical attribute estimation with remotely sensed data is frequently employed, yet with known limitations, especially over complex environments such as Mediterranean forests. In this study, the capacity of high spatial resolution (HSR imagery and related techniques to model structural parameters at the stand level (n = 490 in Mediterranean pines in Central Spain is tested using data from the commercial satellite QuickBird-2. Spectral and spatial information derived from multispectral and panchromatic imagery (2.4 m and 0.68 m sided pixels, respectively served to model structural parameters. Classification and Regression Tree Analysis (CART was selected for the modeling of attributes. Accurate models were produced of quadratic mean diameter (QMD (R2 = 0.8; RMSE = 0.13 m with an average error of 17% while basal area (BA models produced an average error of 22% (RMSE = 5.79 m2/ha. When the measured number of trees per unit area (N was categorized, as per frequent forest management practices, CART models correctly classified 70% of the stands, with all other stands classified in an adjacent class. The accuracy of the attributes estimated here is expected to be better when canopy cover is more open and attribute values are at the lower end of the range present, as related in the pattern of the residuals found in this study. Our findings indicate that attributes derived from

  17. Modeling ozone bioindicator injury with microscale and landscape-scale explanatory variables: A logistic regression approach

    Science.gov (United States)

    John W. Coulston

    2011-01-01

    Tropospheric ozone occurs at phytotoxic levels in the United States (Lefohn and Pinkerton 1988). Several plant species, including commercially important timber species, are sensitive to elevated ozone levels. Exposure to elevated ozone can cause growth reduction and foliar injury and make trees more susceptible to secondary stressors such as insects and pathogens (...

  18. Detection of Differential Item Functioning with Nonlinear Regression: A Non-IRT Approach Accounting for Guessing

    Czech Academy of Sciences Publication Activity Database

    Drabinová, Adéla; Martinková, Patrícia

    2017-01-01

    Roč. 54, č. 4 (2017), s. 498-517 ISSN 0022-0655 R&D Projects: GA ČR GJ15-15856Y Institutional support: RVO:67985807 Keywords : differential item functioning * non-linear regression * logistic regression * item response theory Subject RIV: AM - Education OBOR OECD: Statistics and probability Impact factor: 0.979, year: 2016

  19. A parameter tree approach to estimating system sensitivities to parameter sets

    International Nuclear Information System (INIS)

    Jarzemba, M.S.; Sagar, B.

    2000-01-01

    A post-processing technique for determining relative system sensitivity to groups of parameters and system components is presented. It is assumed that an appropriate parametric model is used to simulate system behavior using Monte Carlo techniques and that a set of realizations of system output(s) is available. The objective of our technique is to analyze the input vectors and the corresponding output vectors (that is, post-process the results) to estimate the relative sensitivity of the output to input parameters (taken singly and as a group) and thereby rank them. This technique is different from the design of experimental techniques in that a partitioning of the parameter space is not required before the simulation. A tree structure (which looks similar to an event tree) is developed to better explain the technique. Each limb of the tree represents a particular combination of parameters or a combination of system components. For convenience and to distinguish it from the event tree, we call it the parameter tree. To construct the parameter tree, the samples of input parameter values are treated as either a '+' or a '-' based on whether or not the sampled parameter value is greater than or less than a specified branching criterion (e.g., mean, median, percentile of the population). The corresponding system outputs are also segregated into similar bins. Partitioning the first parameter into a '+' or a '-' bin creates the first level of the tree containing two branches. At the next level, realizations associated with each first-level branch are further partitioned into two bins using the branching criteria on the second parameter and so on until the tree is fully populated. Relative sensitivities are then inferred from the number of samples associated with each branch of the tree. The parameter tree approach is illustrated by applying it to a number of preliminary simulations of the proposed high-level radioactive waste repository at Yucca Mountain, NV. Using a

  20. Statistical approach for selection of regression model during validation of bioanalytical method

    Directory of Open Access Journals (Sweden)

    Natalija Nakov

    2014-06-01

    Full Text Available The selection of an adequate regression model is the basis for obtaining accurate and reproducible results during the bionalytical method validation. Given the wide concentration range, frequently present in bioanalytical assays, heteroscedasticity of the data may be expected. Several weighted linear and quadratic regression models were evaluated during the selection of the adequate curve fit using nonparametric statistical tests: One sample rank test and Wilcoxon signed rank test for two independent groups of samples. The results obtained with One sample rank test could not give statistical justification for the selection of linear vs. quadratic regression models because slight differences between the error (presented through the relative residuals were obtained. Estimation of the significance of the differences in the RR was achieved using Wilcoxon signed rank test, where linear and quadratic regression models were treated as two independent groups. The application of this simple non-parametric statistical test provides statistical confirmation of the choice of an adequate regression model.

  1. Demographic and socioeconomic disparity in nutrition: application of a novel Correlated Component Regression approach

    Science.gov (United States)

    Alkerwi, Ala'a; Vernier, Céderic; Sauvageot, Nicolas; Crichton, Georgina E; Elias, Merrill F

    2015-01-01

    Objectives This study aimed to examine the most important demographic and socioeconomic factors associated with diet quality, evaluated in terms of compliance with national dietary recommendations, selection of healthy and unhealthy food choices, energy density and food variety. We hypothesised that different demographic and socioeconomic factors may show disparate associations with diet quality. Study design A nationwide, cross-sectional, population-based study. Participants A total of 1352 apparently healthy and non-institutionalised subjects, aged 18–69 years, participated in the Observation of Cardiovascular Risk Factors in Luxembourg (ORISCAV-LUX) study in 2007–2008. The participants attended the nearest study centre after a telephone appointment, and were interviewed by trained research staff. Outcome measures Diet quality as measured by 5 dietary indicators, namely, recommendation compliance index (RCI), recommended foods score (RFS), non-recommended foods score (non-RFS), energy density score (EDS), and dietary diversity score (DDS). The novel Correlated Component Regression (CCR) technique was used to determine the importance and magnitude of the association of each socioeconomic factor with diet quality, in a global analytic approach. Results Increasing age, being male and living below the poverty threshold were predominant factors associated with eating a high energy density diet. Education level was an important factor associated with healthy and adequate food choices, whereas economic resources were predominant factors associated with food diversity and energy density. Conclusions Multiple demographic and socioeconomic circumstances were associated with different diet quality indicators. Efforts to improve diet quality for high-risk groups need an important public health focus. PMID:25967988

  2. SU-E-J-212: Identifying Bones From MRI: A Dictionary Learnign and Sparse Regression Approach

    International Nuclear Information System (INIS)

    Ruan, D; Yang, Y; Cao, M; Hu, P; Low, D

    2014-01-01

    Purpose: To develop an efficient and robust scheme to identify bony anatomy based on MRI-only simulation images. Methods: MRI offers important soft tissue contrast and functional information, yet its lack of correlation to electron-density has placed it as an auxiliary modality to CT in radiotherapy simulation and adaptation. An effective scheme to identify bony anatomy is an important first step towards MR-only simulation/treatment paradigm and would satisfy most practical purposes. We utilize a UTE acquisition sequence to achieve visibility of the bone. By contrast to manual + bulk or registration-to identify bones, we propose a novel learning-based approach for improved robustness to MR artefacts and environmental changes. Specifically, local information is encoded with MR image patch, and the corresponding label is extracted (during training) from simulation CT aligned to the UTE. Within each class (bone vs. nonbone), an overcomplete dictionary is learned so that typical patches within the proper class can be represented as a sparse combination of the dictionary entries. For testing, an acquired UTE-MRI is divided to patches using a sliding scheme, where each patch is sparsely regressed against both bone and nonbone dictionaries, and subsequently claimed to be associated with the class with the smaller residual. Results: The proposed method has been applied to the pilot site of brain imaging and it has showed general good performance, with dice similarity coefficient of greater than 0.9 in a crossvalidation study using 4 datasets. Importantly, it is robust towards consistent foreign objects (e.g., headset) and the artefacts relates to Gibbs and field heterogeneity. Conclusion: A learning perspective has been developed for inferring bone structures based on UTE MRI. The imaging setting is subject to minimal motion effects and the post-processing is efficient. The improved efficiency and robustness enables a first translation to MR-only routine. The scheme

  3. SU-E-J-212: Identifying Bones From MRI: A Dictionary Learnign and Sparse Regression Approach

    Energy Technology Data Exchange (ETDEWEB)

    Ruan, D; Yang, Y; Cao, M; Hu, P; Low, D [UCLA, Los Angeles, CA (United States)

    2014-06-01

    Purpose: To develop an efficient and robust scheme to identify bony anatomy based on MRI-only simulation images. Methods: MRI offers important soft tissue contrast and functional information, yet its lack of correlation to electron-density has placed it as an auxiliary modality to CT in radiotherapy simulation and adaptation. An effective scheme to identify bony anatomy is an important first step towards MR-only simulation/treatment paradigm and would satisfy most practical purposes. We utilize a UTE acquisition sequence to achieve visibility of the bone. By contrast to manual + bulk or registration-to identify bones, we propose a novel learning-based approach for improved robustness to MR artefacts and environmental changes. Specifically, local information is encoded with MR image patch, and the corresponding label is extracted (during training) from simulation CT aligned to the UTE. Within each class (bone vs. nonbone), an overcomplete dictionary is learned so that typical patches within the proper class can be represented as a sparse combination of the dictionary entries. For testing, an acquired UTE-MRI is divided to patches using a sliding scheme, where each patch is sparsely regressed against both bone and nonbone dictionaries, and subsequently claimed to be associated with the class with the smaller residual. Results: The proposed method has been applied to the pilot site of brain imaging and it has showed general good performance, with dice similarity coefficient of greater than 0.9 in a crossvalidation study using 4 datasets. Importantly, it is robust towards consistent foreign objects (e.g., headset) and the artefacts relates to Gibbs and field heterogeneity. Conclusion: A learning perspective has been developed for inferring bone structures based on UTE MRI. The imaging setting is subject to minimal motion effects and the post-processing is efficient. The improved efficiency and robustness enables a first translation to MR-only routine. The scheme

  4. Predictors of success of external cephalic version and cephalic presentation at birth among 1253 women with non-cephalic presentation using logistic regression and classification tree analyses.

    Science.gov (United States)

    Hutton, Eileen K; Simioni, Julia C; Thabane, Lehana

    2017-08-01

    Among women with a fetus with a non-cephalic presentation, external cephalic version (ECV) has been shown to reduce the rate of breech presentation at birth and cesarean birth. Compared with ECV at term, beginning ECV prior to 37 weeks' gestation decreases the number of infants in a non-cephalic presentation at birth. The purpose of this secondary analysis was to investigate factors associated with a successful ECV procedure and to present this in a clinically useful format. Data were collected as part of the Early ECV Pilot and Early ECV2 Trials, which randomized 1776 women with a fetus in breech presentation to either early ECV (34-36 weeks' gestation) or delayed ECV (at or after 37 weeks). The outcome of interest was successful ECV, defined as the fetus being in a cephalic presentation immediately following the procedure, as well as at the time of birth. The importance of several factors in predicting successful ECV was investigated using two statistical methods: logistic regression and classification and regression tree (CART) analyses. Among nulliparas, non-engagement of the presenting part and an easily palpable fetal head were independently associated with success. Among multiparas, non-engagement of the presenting part, gestation less than 37 weeks and an easily palpable fetal head were found to be independent predictors of success. These findings were consistent with results of the CART analyses. Regardless of parity, descent of the presenting part was the most discriminating factor in predicting successful ECV and cephalic presentation at birth. © 2017 Nordic Federation of Societies of Obstetrics and Gynecology.

  5. Estimating Dbh of Trees Employing Multiple Linear Regression of the best Lidar-Derived Parameter Combination Automated in Python in a Natural Broadleaf Forest in the Philippines

    Science.gov (United States)

    Ibanez, C. A. G.; Carcellar, B. G., III; Paringit, E. C.; Argamosa, R. J. L.; Faelga, R. A. G.; Posilero, M. A. V.; Zaragosa, G. P.; Dimayacyac, N. A.

    2016-06-01

    Diameter-at-Breast-Height Estimation is a prerequisite in various allometric equations estimating important forestry indices like stem volume, basal area, biomass and carbon stock. LiDAR Technology has a means of directly obtaining different forest parameters, except DBH, from the behavior and characteristics of point cloud unique in different forest classes. Extensive tree inventory was done on a two-hectare established sample plot in Mt. Makiling, Laguna for a natural growth forest. Coordinates, height, and canopy cover were measured and types of species were identified to compare to LiDAR derivatives. Multiple linear regression was used to get LiDAR-derived DBH by integrating field-derived DBH and 27 LiDAR-derived parameters at 20m, 10m, and 5m grid resolutions. To know the best combination of parameters in DBH Estimation, all possible combinations of parameters were generated and automated using python scripts and additional regression related libraries such as Numpy, Scipy, and Scikit learn were used. The combination that yields the highest r-squared or coefficient of determination and lowest AIC (Akaike's Information Criterion) and BIC (Bayesian Information Criterion) was determined to be the best equation. The equation is at its best using 11 parameters at 10mgrid size and at of 0.604 r-squared, 154.04 AIC and 175.08 BIC. Combination of parameters may differ among forest classes for further studies. Additional statistical tests can be supplemented to help determine the correlation among parameters such as Kaiser- Meyer-Olkin (KMO) Coefficient and the Barlett's Test for Spherecity (BTS).

  6. A regression approach for Zircaloy-2 in-reactor creep constitutive equations

    International Nuclear Information System (INIS)

    Yung Liu, Y.; Bement, A.L.

    1977-01-01

    In this paper the methodology of multiple regressions as applied to Zircaloy-2 in-reactor creep data analysis and construction of constitutive equation are illustrated. While the resulting constitutive equation can be used in creep analysis of in-reactor Zircaloy structural components, the methodology itself is entirely general and can be applied to any creep data analysis. The promising aspects of multiple regression creep data analysis are briefly outlined as follows: (1) When there are more than one variable involved, there is no need to make the assumption that each variable affects the response independently. No separate normalizations are required either and the estimation of parameters is obtained by solving many simultaneous equations. The number of simultaneous equations is equal to the number of data sets. (2) Regression statistics such as R 2 - and F-statistics provide measures of the significance of regression creep equation in correlating the overall data. The relative weights of each variable on the response can also be obtained. (3) Special regression techniques such as step-wise, ridge, and robust regressions and residual plots, etc., provide diagnostic tools for model selections. Multiple regression analysis performed on a set of carefully selected Zircaloy-2 in-reactor creep data leads to a model which provides excellent correlations for the data. (Auth.)

  7. Identifying changes in dissolved organic matter content and characteristics by fluorescence spectroscopy coupled with self-organizing map and classification and regression tree analysis during wastewater treatment.

    Science.gov (United States)

    Yu, Huibin; Song, Yonghui; Liu, Ruixia; Pan, Hongwei; Xiang, Liancheng; Qian, Feng

    2014-10-01

    The stabilization of latent tracers of dissolved organic matter (DOM) of wastewater was analyzed by three-dimensional excitation-emission matrix (EEM) fluorescence spectroscopy coupled with self-organizing map and classification and regression tree analysis (CART) in wastewater treatment performance. DOM of water samples collected from primary sedimentation, anaerobic, anoxic, oxic and secondary sedimentation tanks in a large-scale wastewater treatment plant contained four fluorescence components: tryptophan-like (C1), tyrosine-like (C2), microbial humic-like (C3) and fulvic-like (C4) materials extracted by self-organizing map. These components showed good positive linear correlations with dissolved organic carbon of DOM. C1 and C2 were representative components in the wastewater, and they were removed to a higher extent than those of C3 and C4 in the treatment process. C2 was a latent parameter determined by CART to differentiate water samples of oxic and secondary sedimentation tanks from the successive treatment units, indirectly proving that most of tyrosine-like material was degraded by anaerobic microorganisms. C1 was an accurate parameter to comprehensively separate the samples of the five treatment units from each other, indirectly indicating that tryptophan-like material was decomposed by anaerobic and aerobic bacteria. EEM fluorescence spectroscopy in combination with self-organizing map and CART analysis can be a nondestructive effective method for characterizing structural component of DOM fractions and monitoring organic matter removal in wastewater treatment process. Copyright © 2014 Elsevier Ltd. All rights reserved.

  8. Relação entre diferentes caracteres de plantas jovens de seringueira Correlations and regressions studies among juvenile rubber tree characters

    Directory of Open Access Journals (Sweden)

    César Lavorenti

    1990-01-01

    Full Text Available O presente trabalho foi realizado com o objetivo de determinar a existência e as magnitudes de correlações e regressões lineares simples em plântulas jovens de seringueira (Hevea spp., para melhor condução de seleção nos futuros trabalhos de melhoramento. Foram utilizadas médias de produção de borracha seca por plântulas por corte, através do teste Hamaker-Morris-Mann (P; circunferência do caule (CC; espessura de casca (EC; número de anéis (NA; diâmetro dos vasos (DV; densidade dos vasos laticíleros (D e distância média entre anéis de vasos consecutivos (DMEAVC em um viveiro de cruzamento com três anos e meio de idade. Os resultados mostraram, entre outros fatores, que as correlações lineares simples de P com CC, EC, NA, D, DV e DMEAVC foram, respectivamente, r =t 0,61, 0,34, 0,28, 0,29, 0,43 e -0,13. As correlações de CC com EC, NA, D, DV e DMEAVC foram: 0,65, 0,22, 0,37, 0,33 e 0,096 respectivamente. Estudos de regressão linear simples de P com CC, EC, NA, DV, D e DMEAVC sugerem que CC foi o caráter independente mais significativo, contribuindo com 36% da variação em P. Em relação ao vigor, a regressão de CC com os respectivos caracteres sugere que EC foi o único caráter que contribuiu significativamente para a variação de CC com 42%. As altas correlações observadas da produção com circunferência do caule e com espessura de casca evidenciam a possibilidade de obter genótipos jovens de boa capacidade produtiva e grande vigor, através de seleção precoce dessas variáveis.This study was undertaken aiming to determine the existence of linear correlations, based on simple regression studies for a better improvement of young rubber tree (Hevea spp. breeding and selection. The characters studied were: yield of dry rubber per tapping by Hamaker-Morris-Mann test tapping (P, mean gurth (CC, bark thickness (EC, number of latex vessel rings (NA, diameter of latex vesseis (DV, density of latex vesseis per 5mm

  9. A robust ridge regression approach in the presence of both multicollinearity and outliers in the data

    Science.gov (United States)

    Shariff, Nurul Sima Mohamad; Ferdaos, Nur Aqilah

    2017-08-01

    Multicollinearity often leads to inconsistent and unreliable parameter estimates in regression analysis. This situation will be more severe in the presence of outliers it will cause fatter tails in the error distributions than the normal distributions. The well-known procedure that is robust to multicollinearity problem is the ridge regression method. This method however is expected to be affected by the presence of outliers due to some assumptions imposed in the modeling procedure. Thus, the robust version of existing ridge method with some modification in the inverse matrix and the estimated response value is introduced. The performance of the proposed method is discussed and comparisons are made with several existing estimators namely, Ordinary Least Squares (OLS), ridge regression and robust ridge regression based on GM-estimates. The finding of this study is able to produce reliable parameter estimates in the presence of both multicollinearity and outliers in the data.

  10. An Integrated Approach to Battery Health Monitoring using Bayesian Regression, Classification and State Estimation

    Data.gov (United States)

    National Aeronautics and Space Administration — The application of the Bayesian theory of managing uncertainty and complexity to regression and classification in the form of Relevance Vector Machine (RVM), and to...

  11. HYBRID DATA APPROACH FOR SELECTING EFFECTIVE TEST CASES DURING THE REGRESSION TESTING

    OpenAIRE

    Mohan, M.; Shrimali, Tarun

    2017-01-01

    In the software industry, software testing becomes more important in the entire software development life cycle. Software testing is one of the fundamental components of software quality assurances. Software Testing Life Cycle (STLC)is a process involved in testing the complete software, which includes Regression Testing, Unit Testing, Smoke Testing, Integration Testing, Interface Testing, System Testing & etc. In the STLC of Regression testing, test case selection is one of the most importan...

  12. Quantifying multi-dimensional functional trait spaces of trees: empirical versus theoretical approaches

    Science.gov (United States)

    Ogle, K.; Fell, M.; Barber, J. J.

    2016-12-01

    Empirical, field studies of plant functional traits have revealed important trade-offs among pairs or triplets of traits, such as the leaf (LES) and wood (WES) economics spectra. Trade-offs include correlations between leaf longevity (LL) vs specific leaf area (SLA), LL vs mass-specific leaf respiration rate (RmL), SLA vs RmL, and resistance to breakage vs wood density. Ordination analyses (e.g., PCA) show groupings of traits that tend to align with different life-history strategies or taxonomic groups. It is unclear, however, what underlies such trade-offs and emergent spectra. Do they arise from inherent physiological constraints on growth, or are they more reflective of environmental filtering? The relative importance of these mechanisms has implications for predicting biogeochemical cycling, which is influenced by trait distributions of the plant community. We address this question using an individual-based model of tree growth (ACGCA) to quantify the theoretical trait space of trees that emerges from physiological constraints. ACGCA's inputs include 32 physiological, anatomical, and allometric traits, many of which are related to the LES and WES. We fit ACGCA to 1.6 million USFS FIA observations of tree diameters and heights to obtain vectors of trait values that produce realistic growth, and we explored the structure of this trait space. No notable correlations emerged among the 496 trait pairs, but stepwise regressions revealed complicated multi-variate structure: e.g., relationships between pairs of traits (e.g., RmL and SLA) are governed by other traits (e.g., LL, radiation-use efficiency [RUE]). We also simulated growth under various canopy gap scenarios that impose varying degrees of environmental filtering to explore the multi-dimensional trait space (hypervolume) of trees that died vs survived. The centroid and volume of the hypervolumes differed among dead and live trees, especially under gap conditions leading to low mortality. Traits most predictive

  13. Bias in logistic regression due to imperfect diagnostic test results and practical correction approaches.

    Science.gov (United States)

    Valle, Denis; Lima, Joanna M Tucker; Millar, Justin; Amratia, Punam; Haque, Ubydul

    2015-11-04

    Logistic regression is a statistical model widely used in cross-sectional and cohort studies to identify and quantify the effects of potential disease risk factors. However, the impact of imperfect tests on adjusted odds ratios (and thus on the identification of risk factors) is under-appreciated. The purpose of this article is to draw attention to the problem associated with modelling imperfect diagnostic tests, and propose simple Bayesian models to adequately address this issue. A systematic literature review was conducted to determine the proportion of malaria studies that appropriately accounted for false-negatives/false-positives in a logistic regression setting. Inference from the standard logistic regression was also compared with that from three proposed Bayesian models using simulations and malaria data from the western Brazilian Amazon. A systematic literature review suggests that malaria epidemiologists are largely unaware of the problem of using logistic regression to model imperfect diagnostic test results. Simulation results reveal that statistical inference can be substantially improved when using the proposed Bayesian models versus the standard logistic regression. Finally, analysis of original malaria data with one of the proposed Bayesian models reveals that microscopy sensitivity is strongly influenced by how long people have lived in the study region, and an important risk factor (i.e., participation in forest extractivism) is identified that would have been missed by standard logistic regression. Given the numerous diagnostic methods employed by malaria researchers and the ubiquitous use of logistic regression to model the results of these diagnostic tests, this paper provides critical guidelines to improve data analysis practice in the presence of misclassification error. Easy-to-use code that can be readily adapted to WinBUGS is provided, enabling straightforward implementation of the proposed Bayesian models.

  14. Frugivores bias seed-adult tree associations through nonrandom seed dispersal: a phylogenetic approach.

    Science.gov (United States)

    Razafindratsima, Onja H; Dunham, Amy E

    2016-08-01

    Frugivores are the main seed dispersers in many ecosystems, such that behaviorally driven, nonrandom patterns of seed dispersal are a common process; but patterns are poorly understood. Characterizing these patterns may be essential for understanding spatial organization of fruiting trees and drivers of seed-dispersal limitation in biodiverse forests. To address this, we studied resulting spatial associations between dispersed seeds and adult tree neighbors in a diverse rainforest in Madagascar, using a temporal and phylogenetic approach. Data show that by using fruiting trees as seed-dispersal foci, frugivores bias seed dispersal under conspecific adults and under heterospecific trees that share dispersers and fruiting time with the dispersed species. Frugivore-mediated seed dispersal also resulted in nonrandom phylogenetic associations of dispersed seeds with their nearest adult neighbors, in nine out of the 16 months of our study. However, these nonrandom phylogenetic associations fluctuated unpredictably over time, ranging from clustered to overdispersed. The spatial and phylogenetic template of seed dispersal did not translate to similar patterns of association in adult tree neighborhoods, suggesting the importance of post-dispersal processes in structuring plant communities. Results suggest that frugivore-mediated seed dispersal is important for structuring early stages of plant-plant associations, setting the template for post-dispersal processes that influence ultimate patterns of plant recruitment. Importantly, if biased patterns of dispersal are common in other systems, frugivores may promote tree coexistence in biodiverse forests by limiting the frequency and diversity of heterospecific interactions of seeds they disperse. © 2016 by the Ecological Society of America.

  15. Mortality risk prediction in burn injury: Comparison of logistic regression with machine learning approaches.

    Science.gov (United States)

    Stylianou, Neophytos; Akbarov, Artur; Kontopantelis, Evangelos; Buchan, Iain; Dunn, Ken W

    2015-08-01

    Predicting mortality from burn injury has traditionally employed logistic regression models. Alternative machine learning methods have been introduced in some areas of clinical prediction as the necessary software and computational facilities have become accessible. Here we compare logistic regression and machine learning predictions of mortality from burn. An established logistic mortality model was compared to machine learning methods (artificial neural network, support vector machine, random forests and naïve Bayes) using a population-based (England & Wales) case-cohort registry. Predictive evaluation used: area under the receiver operating characteristic curve; sensitivity; specificity; positive predictive value and Youden's index. All methods had comparable discriminatory abilities, similar sensitivities, specificities and positive predictive values. Although some machine learning methods performed marginally better than logistic regression the differences were seldom statistically significant and clinically insubstantial. Random forests were marginally better for high positive predictive value and reasonable sensitivity. Neural networks yielded slightly better prediction overall. Logistic regression gives an optimal mix of performance and interpretability. The established logistic regression model of burn mortality performs well against more complex alternatives. Clinical prediction with a small set of strong, stable, independent predictors is unlikely to gain much from machine learning outside specialist research contexts. Copyright © 2015 Elsevier Ltd and ISBI. All rights reserved.

  16. Modeling flash floods in ungauged mountain catchments of China: A decision tree learning approach for parameter regionalization

    Science.gov (United States)

    Ragettli, S.; Zhou, J.; Wang, H.; Liu, C.; Guo, L.

    2017-12-01

    Flash floods in small mountain catchments are one of the most frequent causes of loss of life and property from natural hazards in China. Hydrological models can be a useful tool for the anticipation of these events and the issuing of timely warnings. One of the main challenges of setting up such a system is finding appropriate model parameter values for ungauged catchments. Previous studies have shown that the transfer of parameter sets from hydrologically similar gauged catchments is one of the best performing regionalization methods. However, a remaining key issue is the identification of suitable descriptors of similarity. In this study, we use decision tree learning to explore parameter set transferability in the full space of catchment descriptors. For this purpose, a semi-distributed rainfall-runoff model is set up for 35 catchments in ten Chinese provinces. Hourly runoff data from in total 858 storm events are used to calibrate the model and to evaluate the performance of parameter set transfers between catchments. We then present a novel technique that uses the splitting rules of classification and regression trees (CART) for finding suitable donor catchments for ungauged target catchments. The ability of the model to detect flood events in assumed ungauged catchments is evaluated in series of leave-one-out tests. We show that CART analysis increases the probability of detection of 10-year flood events in comparison to a conventional measure of physiographic-climatic similarity by up to 20%. Decision tree learning can outperform other regionalization approaches because it generates rules that optimally consider spatial proximity and physical similarity. Spatial proximity can be used as a selection criteria but is skipped in the case where no similar gauged catchments are in the vicinity. We conclude that the CART regionalization concept is particularly suitable for implementation in sparsely gauged and topographically complex environments where a proximity

  17. A Voronoi interior adjacency-based approach for generating a contour tree

    Science.gov (United States)

    Chen, Jun; Qiao, Chaofei; Zhao, Renliang

    2004-05-01

    A contour tree is a good graphical tool for representing the spatial relations of contour lines and has found many applications in map generalization, map annotation, terrain analysis, etc. A new approach for generating contour trees by introducing a Voronoi-based interior adjacency set concept is proposed in this paper. The immediate interior adjacency set is employed to identify all of the children contours of each contour without contour elevations. It has advantages over existing methods such as the point-in-polygon method and the region growing-based method. This new approach can be used for spatial data mining and knowledge discovering, such as the automatic extraction of terrain features and construction of multi-resolution digital elevation model.

  18. A best-first tree-searching approach for ML decoding in MIMO system

    KAUST Repository

    Shen, Chung-An

    2012-07-28

    In MIMO communication systems maximum-likelihood (ML) decoding can be formulated as a tree-searching problem. This paper presents a tree-searching approach that combines the features of classical depth-first and breadth-first approaches to achieve close to ML performance while minimizing the number of visited nodes. A detailed outline of the algorithm is given, including the required storage. The effects of storage size on BER performance and complexity in terms of search space are also studied. Our result demonstrates that with a proper choice of storage size the proposed method visits 40% fewer nodes than a sphere decoding algorithm at signal to noise ratio (SNR) = 20dB and by an order of magnitude at 0 dB SNR.

  19. A regression approach for zircaloy-2 in-reactor creep constitutive equations

    International Nuclear Information System (INIS)

    Yung Liu, Y.; Bement, A.L.

    1977-01-01

    In this paper the methodology of multiple regressions as applied to zircaloy-2 in-reactor creep data analysis and construction of constitutive equation are illustrated. While the resulting constitutive equation can be used in creep analysis of in-reactor zircaloy structural components, the methodology itself is entirely general and can be applied to any creep data analysis. From data analysis and model development point of views, both the assumption of independence and prior committment to specific model forms are unacceptable. One would desire means which can not only estimate the required parameters directly from data but also provide basis for model selections, viz., one model against others. Basic understanding of the physics of deformation is important in choosing the forms of starting physical model equations, but the justifications must rely on their abilities in correlating the overall data. The promising aspects of multiple regression creep data analysis are briefly outlined as follows: (1) when there are more than one variable involved, there is no need to make the assumption that each variable affects the response independently. No separate normalizations are required either and the estimation of parameters is obtained by solving many simultaneous equations. The number of simultaneous equations is equal to the number of data sets, (2) regression statistics such as R 2 - and F-statistics provide measures of the significance of regression creep equation in correlating the overall data. The relative weights of each variable on the response can also be obtained. (3) Special regression techniques such as step-wise, ridge, and robust regressions and residual plots, etc., provide diagnostic tools for model selections

  20. A nonparametric approach to calculate critical micelle concentrations: the local polynomial regression method

    Energy Technology Data Exchange (ETDEWEB)

    Lopez Fontan, J.L.; Costa, J.; Ruso, J.M.; Prieto, G. [Dept. of Applied Physics, Univ. of Santiago de Compostela, Santiago de Compostela (Spain); Sarmiento, F. [Dept. of Mathematics, Faculty of Informatics, Univ. of A Coruna, A Coruna (Spain)

    2004-02-01

    The application of a statistical method, the local polynomial regression method, (LPRM), based on a nonparametric estimation of the regression function to determine the critical micelle concentration (cmc) is presented. The method is extremely flexible because it does not impose any parametric model on the subjacent structure of the data but rather allows the data to speak for themselves. Good concordance of cmc values with those obtained by other methods was found for systems in which the variation of a measured physical property with concentration showed an abrupt change. When this variation was slow, discrepancies between the values obtained by LPRM and others methods were found. (orig.)

  1. An Integrated Approach of Model checking and Temporal Fault Tree for System Safety Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Koh, Kwang Yong; Seong, Poong Hyun [Korea Advanced Institute of Science and Technology, Daejeon (Korea, Republic of)

    2009-10-15

    Digitalization of instruments and control systems in nuclear power plants offers the potential to improve plant safety and reliability through features such as increased hardware reliability and stability, and improved failure detection capability. It however makes the systems and their safety analysis more complex. Originally, safety analysis was applied to hardware system components and formal methods mainly to software. For software-controlled or digitalized systems, it is necessary to integrate both. Fault tree analysis (FTA) which has been one of the most widely used safety analysis technique in nuclear industry suffers from several drawbacks as described in. In this work, to resolve the problems, FTA and model checking are integrated to provide formal, automated and qualitative assistance to informal and/or quantitative safety analysis. Our approach proposes to build a formal model of the system together with fault trees. We introduce several temporal gates based on timed computational tree logic (TCTL) to capture absolute time behaviors of the system and to give concrete semantics to fault tree gates to reduce errors during the analysis, and use model checking technique to automate the reasoning process of FTA.

  2. Gender Gaps in Mathematics, Science and Reading Achievements in Muslim Countries: A Quantile Regression Approach

    Science.gov (United States)

    Shafiq, M. Najeeb

    2013-01-01

    Using quantile regression analyses, this study examines gender gaps in mathematics, science, and reading in Azerbaijan, Indonesia, Jordan, the Kyrgyz Republic, Qatar, Tunisia, and Turkey among 15-year-old students. The analyses show that girls in Azerbaijan achieve as well as boys in mathematics and science and overachieve in reading. In Jordan,…

  3. INTRODUCTION TO A COMBINED MULTIPLE LINEAR REGRESSION AND ARMA MODELING APPROACH FOR BEACH BACTERIA PREDICTION

    Science.gov (United States)

    Due to the complexity of the processes contributing to beach bacteria concentrations, many researchers rely on statistical modeling, among which multiple linear regression (MLR) modeling is most widely used. Despite its ease of use and interpretation, there may be time dependence...

  4. Modeling geochemical datasets for source apportionment: Comparison of least square regression and inversion approaches.

    Digital Repository Service at National Institute of Oceanography (India)

    Tripathy, G.R.; Das, Anirban.

    used methods, the Least Square Regression (LSR) and Inverse Modeling (IM), to determine the contributions of (i) solutes from different sources to global river water, and (ii) various rocks to a glacial till. The purpose of this exercise is to compare...

  5. Replicating Experimental Impact Estimates Using a Regression Discontinuity Approach. NCEE 2012-4025

    Science.gov (United States)

    Gleason, Philip M.; Resch, Alexandra M.; Berk, Jillian A.

    2012-01-01

    This NCEE Technical Methods Paper compares the estimated impacts of an educational intervention using experimental and regression discontinuity (RD) study designs. The analysis used data from two large-scale randomized controlled trials--the Education Technology Evaluation and the Teach for America Study--to provide evidence on the performance of…

  6. On the Usefulness of a Multilevel Logistic Regression Approach to Person-Fit Analysis

    Science.gov (United States)

    Conijn, Judith M.; Emons, Wilco H. M.; van Assen, Marcel A. L. M.; Sijtsma, Klaas

    2011-01-01

    The logistic person response function (PRF) models the probability of a correct response as a function of the item locations. Reise (2000) proposed to use the slope parameter of the logistic PRF as a person-fit measure. He reformulated the logistic PRF model as a multilevel logistic regression model and estimated the PRF parameters from this…

  7. Financial Aid and First-Year Collegiate GPA: A Regression Discontinuity Approach

    Science.gov (United States)

    Curs, Bradley R.; Harper, Casandra E.

    2012-01-01

    Using a regression discontinuity design, we investigate whether a merit-based financial aid program has a causal effect on the first-year grade point average of first-time out-of-state freshmen at the University of Oregon. Our results indicate that merit-based financial aid has a positive and significant effect on first-year collegiate grade point…

  8. VOXEL-BASED APPROACH FOR ESTIMATING URBAN TREE VOLUME FROM TERRESTRIAL LASER SCANNING DATA

    Directory of Open Access Journals (Sweden)

    C. Vonderach

    2012-07-01

    Full Text Available The importance of single trees and the determination of related parameters has been recognized in recent years, e.g. for forest inventories or management. For urban areas an increasing interest in the data acquisition of trees can be observed concerning aspects like urban climate, CO2 balance, and environmental protection. Urban trees differ significantly from natural systems with regard to the site conditions (e.g. technogenic soils, contaminants, lower groundwater level, regular disturbance, climate (increased temperature, reduced humidity and species composition and arrangement (habitus and health status and therefore allometric relations cannot be transferred from natural sites to urban areas. To overcome this problem an extended approach was developed for a fast and non-destructive extraction of branch volume, DBH (diameter at breast height and height of single trees from point clouds of terrestrial laser scanning (TLS. For data acquisition, the trees were scanned with highest scan resolution from several (up to five positions located around the tree. The resulting point clouds (20 to 60 million points are analysed with an algorithm based on voxel (volume elements structure, leading to an appropriate data reduction. In a first step, two kinds of noise reduction are carried out: the elimination of isolated voxels as well as voxels with marginal point density. To obtain correct volume estimates, the voxels inside the stem and branches (interior voxels where voxels contain no laser points must be regarded. For this filling process, an easy and robust approach was developed based on a layer-wise (horizontal layers of the voxel structure intersection of four orthogonal viewing directions. However, this procedure also generates several erroneous "phantom" voxels, which have to be eliminated. For this purpose the previous approach was extended by a special region growing algorithm. In a final step the volume is determined layer-wise based on the

  9. A conceptual approach to approximate tree root architecture in infinite slope models

    Science.gov (United States)

    Schmaltz, Elmar; Glade, Thomas

    2016-04-01

    Vegetation-related properties - particularly tree root distribution and coherent hydrologic and mechanical effects on the underlying soil mantle - are commonly not considered in infinite slope models. Indeed, from a geotechnical point of view, these effects appear to be difficult to be reproduced reliably in a physically-based modelling approach. The growth of a tree and the expansion of its root architecture are directly connected with both intrinsic properties such as species and age, and extrinsic factors like topography, availability of nutrients, climate and soil type. These parameters control four main issues of the tree root architecture: 1) Type of rooting; 2) maximum growing distance to the tree stem (radius r); 3) maximum growing depth (height h); and 4) potential deformation of the root system. Geometric solids are able to approximate the distribution of a tree root system. The objective of this paper is to investigate whether it is possible to implement root systems and the connected hydrological and mechanical attributes sufficiently in a 3-dimensional slope stability model. Hereby, a spatio-dynamic vegetation module should cope with the demands of performance, computation time and significance. However, in this presentation, we focus only on the distribution of roots. The assumption is that the horizontal root distribution around a tree stem on a 2-dimensional plane can be described by a circle with the stem located at the centroid and a distinct radius r that is dependent on age and species. We classified three main types of tree root systems and reproduced the species-age-related root distribution with three respective mathematical solids in a synthetic 3-dimensional hillslope ambience. Thus, two solids in an Euclidian space were distinguished to represent the three root systems: i) cylinders with radius r and height h, whilst the dimension of latter defines the shape of a taproot-system or a shallow-root-system respectively; ii) elliptic

  10. The node-weighted Steiner tree approach to identify elements of cancer-related signaling pathways.

    Science.gov (United States)

    Sun, Yahui; Ma, Chenkai; Halgamuge, Saman

    2017-12-28

    Cancer constitutes a momentous health burden in our society. Critical information on cancer may be hidden in its signaling pathways. However, even though a large amount of money has been spent on cancer research, some critical information on cancer-related signaling pathways still remains elusive. Hence, new works towards a complete understanding of cancer-related signaling pathways will greatly benefit the prevention, diagnosis, and treatment of cancer. We propose the node-weighted Steiner tree approach to identify important elements of cancer-related signaling pathways at the level of proteins. This new approach has advantages over previous approaches since it is fast in processing large protein-protein interaction networks. We apply this new approach to identify important elements of two well-known cancer-related signaling pathways: PI3K/Akt and MAPK. First, we generate a node-weighted protein-protein interaction network using protein and signaling pathway data. Second, we modify and use two preprocessing techniques and a state-of-the-art Steiner tree algorithm to identify a subnetwork in the generated network. Third, we propose two new metrics to select important elements from this subnetwork. On a commonly used personal computer, this new approach takes less than 2 s to identify the important elements of PI3K/Akt and MAPK signaling pathways in a large node-weighted protein-protein interaction network with 16,843 vertices and 1,736,922 edges. We further analyze and demonstrate the significance of these identified elements to cancer signal transduction by exploring previously reported experimental evidences. Our node-weighted Steiner tree approach is shown to be both fast and effective to identify important elements of cancer-related signaling pathways. Furthermore, it may provide new perspectives into the identification of signaling pathways for other human diseases.

  11. Gene selection for the reconstruction of stem cell differentiation trees: a linear programming approach.

    Science.gov (United States)

    Ghadie, Mohamed A; Japkowicz, Nathalie; Perkins, Theodore J

    2015-08-15

    Stem cell differentiation is largely guided by master transcriptional regulators, but it also depends on the expression of other types of genes, such as cell cycle genes, signaling genes, metabolic genes, trafficking genes, etc. Traditional approaches to understanding gene expression patterns across multiple conditions, such as principal components analysis or K-means clustering, can group cell types based on gene expression, but they do so without knowledge of the differentiation hierarchy. Hierarchical clustering can organize cell types into a tree, but in general this tree is different from the differentiation hierarchy itself. Given the differentiation hierarchy and gene expression data at each node, we construct a weighted Euclidean distance metric such that the minimum spanning tree with respect to that metric is precisely the given differentiation hierarchy. We provide a set of linear constraints that are provably sufficient for the desired construction and a linear programming approach to identify sparse sets of weights, effectively identifying genes that are most relevant for discriminating different parts of the tree. We apply our method to microarray gene expression data describing 38 cell types in the hematopoiesis hierarchy, constructing a weighted Euclidean metric that uses just 175 genes. However, we find that there are many alternative sets of weights that satisfy the linear constraints. Thus, in the style of random-forest training, we also construct metrics based on random subsets of the genes and compare them to the metric of 175 genes. We then report on the selected genes and their biological functions. Our approach offers a new way to identify genes that may have important roles in stem cell differentiation. tperkins@ohri.ca Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  12. Three approaches to deal with inconsistent decision tables - Comparison of decision tree complexity

    KAUST Repository

    Azad, Mohammad; Chikalov, Igor; Moshkov, Mikhail

    2013-01-01

    In inconsistent decision tables, there are groups of rows with equal values of conditional attributes and different decisions (values of the decision attribute). We study three approaches to deal with such tables. Instead of a group of equal rows, we consider one row given by values of conditional attributes and we attach to this row: (i) the set of all decisions for rows from the group (many-valued decision approach); (ii) the most common decision for rows from the group (most common decision approach); and (iii) the unique code of the set of all decisions for rows from the group (generalized decision approach). We present experimental results and compare the depth, average depth and number of nodes of decision trees constructed by a greedy algorithm in the framework of each of the three approaches. © 2013 Springer-Verlag.

  13. Comparing Two Different Approaches to the Modeling of the Common Cause Failures in Fault Trees

    International Nuclear Information System (INIS)

    Vukovic, I.; Mikulicic, V.; Vrbanic, I.

    2002-01-01

    The potential for common cause failures in systems that perform critical functions has been recognized as very important contributor to risk associated with operation of nuclear power plants. Consequentially, modeling of common cause failures (CCF) in fault trees has become one among the essential elements in any probabilistic safety assessment (PSA). Detailed and realistic representation of CCF potential in fault tree structure is sometimes very challenging task. This is especially so in the cases where a common cause group involves more than two components. During the last ten years the difficulties associated with this kind of modeling have been overcome to some degree by development of integral PSA tools with high capabilities. Some of them allow for the definition of CCF groups and their automated expanding in the process of Boolean resolution and generation of minimal cutsets. On the other hand, in PSA models developed and run by more traditional tools, CCF-potential had to be modeled in the fault trees explicitly. With explicit CCF modeling, fault trees can grow very large, especially in the cases when they involve CCF groups with 3 or more members, which can become an issue for the management of fault trees and basic events with traditional non-integral PSA models. For these reasons various simplifications had to be made. Speaking in terms of an overall PSA model, there are also some other issues that need to be considered, such as maintainability and accessibility of the model. In this paper a comparison is made between the two approaches to CCF modeling. Analysis is based on a full-scope Level 1 PSA model for internal initiating events that had originally been developed by a traditional PSA tool and later transferred to a new-generation PSA tool with automated CCF modeling capabilities. Related aspects and issues mentioned above are discussed in the paper. (author)

  14. Healthcare Expenditures Associated with Depression Among Individuals with Osteoarthritis: Post-Regression Linear Decomposition Approach.

    Science.gov (United States)

    Agarwal, Parul; Sambamoorthi, Usha

    2015-12-01

    Depression is common among individuals with osteoarthritis and leads to increased healthcare burden. The objective of this study was to examine excess total healthcare expenditures associated with depression among individuals with osteoarthritis in the US. Adults with self-reported osteoarthritis (n = 1881) were identified using data from the 2010 Medical Expenditure Panel Survey (MEPS). Among those with osteoarthritis, chi-square tests and ordinary least square regressions (OLS) were used to examine differences in healthcare expenditures between those with and without depression. Post-regression linear decomposition technique was used to estimate the relative contribution of different constructs of the Anderson's behavioral model, i.e., predisposing, enabling, need, personal healthcare practices, and external environment factors, to the excess expenditures associated with depression among individuals with osteoarthritis. All analysis accounted for the complex survey design of MEPS. Depression coexisted among 20.6 % of adults with osteoarthritis. The average total healthcare expenditures were $13,684 among adults with depression compared to $9284 among those without depression. Multivariable OLS regression revealed that adults with depression had 38.8 % higher healthcare expenditures (p regression linear decomposition analysis indicated that 50 % of differences in expenditures among adults with and without depression can be explained by differences in need factors. Among individuals with coexisting osteoarthritis and depression, excess healthcare expenditures associated with depression were mainly due to comorbid anxiety, chronic conditions and poor health status. These expenditures may potentially be reduced by providing timely intervention for need factors or by providing care under a collaborative care model.

  15. The effect of foreign aid on corruption: A quantile regression approach

    OpenAIRE

    Okada, Keisuke; Samreth, Sovannroeun

    2011-01-01

    This paper investigates the effect of foreign aid on corruption using a quantile regression method. Our estimation results illustrate that foreign aid generally lessens corruption and, in particular, its reduction effect is larger in countries with low levels of corruption. In addition, considering foreign aid by donors, our analysis indicates that while multilateral aid has a larger reduction impact on corruption, bilateral aid from the world’s leading donors, such as France, the United King...

  16. Performance and separation occurrence of binary probit regression estimator using maximum likelihood method and Firths approach under different sample size

    Science.gov (United States)

    Lusiana, Evellin Dewi

    2017-12-01

    The parameters of binary probit regression model are commonly estimated by using Maximum Likelihood Estimation (MLE) method. However, MLE method has limitation if the binary data contains separation. Separation is the condition where there are one or several independent variables that exactly grouped the categories in binary response. It will result the estimators of MLE method become non-convergent, so that they cannot be used in modeling. One of the effort to resolve the separation is using Firths approach instead. This research has two aims. First, to identify the chance of separation occurrence in binary probit regression model between MLE method and Firths approach. Second, to compare the performance of binary probit regression model estimator that obtained by MLE method and Firths approach using RMSE criteria. Those are performed using simulation method and under different sample size. The results showed that the chance of separation occurrence in MLE method for small sample size is higher than Firths approach. On the other hand, for larger sample size, the probability decreased and relatively identic between MLE method and Firths approach. Meanwhile, Firths estimators have smaller RMSE than MLEs especially for smaller sample sizes. But for larger sample sizes, the RMSEs are not much different. It means that Firths estimators outperformed MLE estimator.

  17. Sequence comparison alignment-free approach based on suffix tree and L-words frequency.

    Science.gov (United States)

    Soares, Inês; Goios, Ana; Amorim, António

    2012-01-01

    The vast majority of methods available for sequence comparison rely on a first sequence alignment step, which requires a number of assumptions on evolutionary history and is sometimes very difficult or impossible to perform due to the abundance of gaps (insertions/deletions). In such cases, an alternative alignment-free method would prove valuable. Our method starts by a computation of a generalized suffix tree of all sequences, which is completed in linear time. Using this tree, the frequency of all possible words with a preset length L-L-words--in each sequence is rapidly calculated. Based on the L-words frequency profile of each sequence, a pairwise standard Euclidean distance is then computed producing a symmetric genetic distance matrix, which can be used to generate a neighbor joining dendrogram or a multidimensional scaling graph. We present an improvement to word counting alignment-free approaches for sequence comparison, by determining a single optimal word length and combining suffix tree structures to the word counting tasks. Our approach is, thus, a fast and simple application that proved to be efficient and powerful when applied to mitochondrial genomes. The algorithm was implemented in Python language and is freely available on the web.

  18. Sequence Comparison Alignment-Free Approach Based on Suffix Tree and L-Words Frequency

    Directory of Open Access Journals (Sweden)

    Inês Soares

    2012-01-01

    Full Text Available The vast majority of methods available for sequence comparison rely on a first sequence alignment step, which requires a number of assumptions on evolutionary history and is sometimes very difficult or impossible to perform due to the abundance of gaps (insertions/deletions. In such cases, an alternative alignment-free method would prove valuable. Our method starts by a computation of a generalized suffix tree of all sequences, which is completed in linear time. Using this tree, the frequency of all possible words with a preset length L—L-words—in each sequence is rapidly calculated. Based on the L-words frequency profile of each sequence, a pairwise standard Euclidean distance is then computed producing a symmetric genetic distance matrix, which can be used to generate a neighbor joining dendrogram or a multidimensional scaling graph. We present an improvement to word counting alignment-free approaches for sequence comparison, by determining a single optimal word length and combining suffix tree structures to the word counting tasks. Our approach is, thus, a fast and simple application that proved to be efficient and powerful when applied to mitochondrial genomes. The algorithm was implemented in Python language and is freely available on the web.

  19. AucPR: an AUC-based approach using penalized regression for disease prediction with high-dimensional omics data.

    Science.gov (United States)

    Yu, Wenbao; Park, Taesung

    2014-01-01

    It is common to get an optimal combination of markers for disease classification and prediction when multiple markers are available. Many approaches based on the area under the receiver operating characteristic curve (AUC) have been proposed. Existing works based on AUC in a high-dimensional context depend mainly on a non-parametric, smooth approximation of AUC, with no work using a parametric AUC-based approach, for high-dimensional data. We propose an AUC-based approach using penalized regression (AucPR), which is a parametric method used for obtaining a linear combination for maximizing the AUC. To obtain the AUC maximizer in a high-dimensional context, we transform a classical parametric AUC maximizer, which is used in a low-dimensional context, into a regression framework and thus, apply the penalization regression approach directly. Two kinds of penalization, lasso and elastic net, are considered. The parametric approach can avoid some of the difficulties of a conventional non-parametric AUC-based approach, such as the lack of an appropriate concave objective function and a prudent choice of the smoothing parameter. We apply the proposed AucPR for gene selection and classification using four real microarray and synthetic data. Through numerical studies, AucPR is shown to perform better than the penalized logistic regression and the nonparametric AUC-based method, in the sense of AUC and sensitivity for a given specificity, particularly when there are many correlated genes. We propose a powerful parametric and easily-implementable linear classifier AucPR, for gene selection and disease prediction for high-dimensional data. AucPR is recommended for its good prediction performance. Beside gene expression microarray data, AucPR can be applied to other types of high-dimensional omics data, such as miRNA and protein data.

  20. Analysis of sparse data in logistic regression in medical research: A newer approach

    Directory of Open Access Journals (Sweden)

    S Devika

    2016-01-01

    Full Text Available Background and Objective: In the analysis of dichotomous type response variable, logistic regression is usually used. However, the performance of logistic regression in the presence of sparse data is questionable. In such a situation, a common problem is the presence of high odds ratios (ORs with very wide 95% confidence interval (CI (OR: >999.999, 95% CI: 999.999. In this paper, we addressed this issue by using penalized logistic regression (PLR method. Materials and Methods: Data from case-control study on hyponatremia and hiccups conducted in Christian Medical College, Vellore, Tamil Nadu, India was used. The outcome variable was the presence/absence of hiccups and the main exposure variable was the status of hyponatremia. Simulation dataset was created with different sample sizes and with a different number of covariates. Results: A total of 23 cases and 50 controls were used for the analysis of ordinary and PLR methods. The main exposure variable hyponatremia was present in nine (39.13% of the cases and in four (8.0% of the controls. Of the 23 hiccup cases, all were males and among the controls, 46 (92.0% were males. Thus, the complete separation between gender and the disease group led into an infinite OR with 95% CI (OR: >999.999, 95% CI: 999.999 whereas there was a finite and consistent regression coefficient for gender (OR: 5.35; 95% CI: 0.42, 816.48 using PLR. After adjusting for all the confounding variables, hyponatremia entailed 7.9 (95% CI: 2.06, 38.86 times higher risk for the development of hiccups as was found using PLR whereas there was an overestimation of risk OR: 10.76 (95% CI: 2.17, 53.41 using the conventional method. Simulation experiment shows that the estimated coverage probability of this method is near the nominal level of 95% even for small sample sizes and for a large number of covariates. Conclusions: PLR is almost equal to the ordinary logistic regression when the sample size is large and is superior in small cell

  1. A Multi-stakeholder Approach to Moving Beyond Tree Mortality in the Sierra Nevada

    Science.gov (United States)

    Balachowski, J.; Buluc, L.; Fischer, C.; Ko, J.; Ostoja, S.

    2017-12-01

    The US Forest Service has estimated that 102 million trees have died in California since 2010. This die off event has been attributed to the combined effects of historical land management practices, fire suppression, insect outbreaks, and climate-related stressors. This tree mortality event represents the largest and most significant ecological disturbance in California in centuries, if not longer. Both scientists and managers recognize the need to rethink our approach to forest management in the face of a changing climate and increasingly frequent, uncharacteristically large wildfires, while budgets and staffing capacity continue to decrease. Addressing the uncertainly in managing under climate change with fewer financial resources will require multiple partners and stakeholders—including federal and state agencies, local governments, and non-governmental organizations—to work together to identify common goals and paths moving forward. The USDA California Climate Hub and USFS Region 5 convened a symposium on drought and tree mortality in July 2017. With nearly 170 attendees across a wide range of sectors, the event provided a meaningful opportunity for reflection, analysis, and consideration of next steps. Among the outcomes of this symposium was the identification of areas in which our capacity for individual and synergistic action is stronger, and those in which it is lacking that will thus require additional attention and effort. From this symposium, which included a series of smaller, stakeholder and partner working groups, we collectively identified research and information needs, possible policy adjustments, future management actions, and funding needs and opportunities. Here, we present these findings and suggest approaches for addressing the current tree mortality event based on the shared interests of multiple, diverse stakeholder groups.

  2. Using the mean approach in pooling cross-section and time series data for regression modelling

    International Nuclear Information System (INIS)

    Nuamah, N.N.N.N.

    1989-12-01

    The mean approach is one of the methods for pooling cross section and time series data for mathematical-statistical modelling. Though a simple approach, its results are sometimes paradoxical in nature. However, researchers still continue using it for its simplicity. Here, the paper investigates the nature and source of such unwanted phenomena. (author). 7 refs

  3. Antibiotic Resistances in Livestock: A Comparative Approach to Identify an Appropriate Regression Model for Count Data

    Directory of Open Access Journals (Sweden)

    Anke Hüls

    2017-05-01

    Full Text Available Antimicrobial resistance in livestock is a matter of general concern. To develop hygiene measures and methods for resistance prevention and control, epidemiological studies on a population level are needed to detect factors associated with antimicrobial resistance in livestock holdings. In general, regression models are used to describe these relationships between environmental factors and resistance outcome. Besides the study design, the correlation structures of the different outcomes of antibiotic resistance and structural zero measurements on the resistance outcome as well as on the exposure side are challenges for the epidemiological model building process. The use of appropriate regression models that acknowledge these complexities is essential to assure valid epidemiological interpretations. The aims of this paper are (i to explain the model building process comparing several competing models for count data (negative binomial model, quasi-Poisson model, zero-inflated model, and hurdle model and (ii to compare these models using data from a cross-sectional study on antibiotic resistance in animal husbandry. These goals are essential to evaluate which model is most suitable to identify potential prevention measures. The dataset used as an example in our analyses was generated initially to study the prevalence and associated factors for the appearance of cefotaxime-resistant Escherichia coli in 48 German fattening pig farms. For each farm, the outcome was the count of samples with resistant bacteria. There was almost no overdispersion and only moderate evidence of excess zeros in the data. Our analyses show that it is essential to evaluate regression models in studies analyzing the relationship between environmental factors and antibiotic resistances in livestock. After model comparison based on evaluation of model predictions, Akaike information criterion, and Pearson residuals, here the hurdle model was judged to be the most appropriate

  4. Identifying the Safety Factors over Traffic Signs in State Roads using a Panel Quantile Regression Approach.

    Science.gov (United States)

    Šarić, Željko; Xu, Xuecai; Duan, Li; Babić, Darko

    2018-06-20

    This study intended to investigate the interactions between accident rate and traffic signs in state roads located in Croatia, and accommodate the heterogeneity attributed to unobserved factors. The data from 130 state roads between 2012 and 2016 were collected from Traffic Accident Database System maintained by the Republic of Croatia Ministry of the Interior. To address the heterogeneity, a panel quantile regression model was proposed, in which quantile regression model offers a more complete view and a highly comprehensive analysis of the relationship between accident rate and traffic signs, while the panel data model accommodates the heterogeneity attributed to unobserved factors. Results revealed that (1) low visibility of material damage (MD) and death or injured (DI) increased the accident rate; (2) the number of mandatory signs and the number of warning signs were more likely to reduce the accident rate; (3)average speed limit and the number of invalid traffic signs per km exhibited a high accident rate. To our knowledge, it's the first attempt to analyze the interactions between accident consequences and traffic signs by employing a panel quantile regression model; by involving the visibility, the present study demonstrates that the low visibility causes a relatively higher risk of MD and DI; It is noteworthy that average speed limit corresponds with accident rate positively; The number of mandatory signs and the number of warning signs are more likely to reduce the accident rate; The number of invalid traffic signs per km are significant for accident rate, thus regular maintenance should be kept for a safer roadway environment.

  5. Modifiable risk factors predicting major depressive disorder at four year follow-up: a decision tree approach.

    Science.gov (United States)

    Batterham, Philip J; Christensen, Helen; Mackinnon, Andrew J

    2009-11-22

    Relative to physical health conditions such as cardiovascular disease, little is known about risk factors that predict the prevalence of depression. The present study investigates the expected effects of a reduction of these risks over time, using the decision tree method favoured in assessing cardiovascular disease risk. The PATH through Life cohort was used for the study, comprising 2,105 20-24 year olds, 2,323 40-44 year olds and 2,177 60-64 year olds sampled from the community in the Canberra region, Australia. A decision tree methodology was used to predict the presence of major depressive disorder after four years of follow-up. The decision tree was compared with a logistic regression analysis using ROC curves. The decision tree was found to distinguish and delineate a wide range of risk profiles. Previous depressive symptoms were most highly predictive of depression after four years, however, modifiable risk factors such as substance use and employment status played significant roles in assessing the risk of depression. The decision tree was found to have better sensitivity and specificity than a logistic regression using identical predictors. The decision tree method was useful in assessing the risk of major depressive disorder over four years. Application of the model to the development of a predictive tool for tailored interventions is discussed.

  6. Modifiable risk factors predicting major depressive disorder at four year follow-up: a decision tree approach

    Directory of Open Access Journals (Sweden)

    Christensen Helen

    2009-11-01

    Full Text Available Abstract Background Relative to physical health conditions such as cardiovascular disease, little is known about risk factors that predict the prevalence of depression. The present study investigates the expected effects of a reduction of these risks over time, using the decision tree method favoured in assessing cardiovascular disease risk. Methods The PATH through Life cohort was used for the study, comprising 2,105 20-24 year olds, 2,323 40-44 year olds and 2,177 60-64 year olds sampled from the community in the Canberra region, Australia. A decision tree methodology was used to predict the presence of major depressive disorder after four years of follow-up. The decision tree was compared with a logistic regression analysis using ROC curves. Results The decision tree was found to distinguish and delineate a wide range of risk profiles. Previous depressive symptoms were most highly predictive of depression after four years, however, modifiable risk factors such as substance use and employment status played significant roles in assessing the risk of depression. The decision tree was found to have better sensitivity and specificity than a logistic regression using identical predictors. Conclusion The decision tree method was useful in assessing the risk of major depressive disorder over four years. Application of the model to the development of a predictive tool for tailored interventions is discussed.

  7. The Application of Classification and Regression Trees for the Triage of Women for Referral to Colposcopy and the Estimation of Risk for Cervical Intraepithelial Neoplasia: A Study Based on 1625 Cases with Incomplete Data from Molecular Tests

    Directory of Open Access Journals (Sweden)

    Abraham Pouliakis

    2015-01-01

    Full Text Available Objective. Nowadays numerous ancillary techniques detecting HPV DNA and mRNA compete with cytology; however no perfect test exists; in this study we evaluated classification and regression trees (CARTs for the production of triage rules and estimate the risk for cervical intraepithelial neoplasia (CIN in cases with ASCUS+ in cytology. Study Design. We used 1625 cases. In contrast to other approaches we used missing data to increase the data volume, obtain more accurate results, and simulate real conditions in the everyday practice of gynecologic clinics and laboratories. The proposed CART was based on the cytological result, HPV DNA typing, HPV mRNA detection based on NASBA and flow cytometry, p16 immunocytochemical expression, and finally age and parous status. Results. Algorithms useful for the triage of women were produced; gynecologists could apply these in conjunction with available examination results and conclude to an estimation of the risk for a woman to harbor CIN expressed as a probability. Conclusions. The most important test was the cytological examination; however the CART handled cases with inadequate cytological outcome and increased the diagnostic accuracy by exploiting the results of ancillary techniques even if there were inadequate missing data. The CART performance was better than any other single test involved in this study.

  8. The Application of Classification and Regression Trees for the Triage of Women for Referral to Colposcopy and the Estimation of Risk for Cervical Intraepithelial Neoplasia: A Study Based on 1625 Cases with Incomplete Data from Molecular Tests.

    Science.gov (United States)

    Pouliakis, Abraham; Karakitsou, Efrossyni; Chrelias, Charalampos; Pappas, Asimakis; Panayiotides, Ioannis; Valasoulis, George; Kyrgiou, Maria; Paraskevaidis, Evangelos; Karakitsos, Petros

    2015-01-01

    Nowadays numerous ancillary techniques detecting HPV DNA and mRNA compete with cytology; however no perfect test exists; in this study we evaluated classification and regression trees (CARTs) for the production of triage rules and estimate the risk for cervical intraepithelial neoplasia (CIN) in cases with ASCUS+ in cytology. We used 1625 cases. In contrast to other approaches we used missing data to increase the data volume, obtain more accurate results, and simulate real conditions in the everyday practice of gynecologic clinics and laboratories. The proposed CART was based on the cytological result, HPV DNA typing, HPV mRNA detection based on NASBA and flow cytometry, p16 immunocytochemical expression, and finally age and parous status. Algorithms useful for the triage of women were produced; gynecologists could apply these in conjunction with available examination results and conclude to an estimation of the risk for a woman to harbor CIN expressed as a probability. The most important test was the cytological examination; however the CART handled cases with inadequate cytological outcome and increased the diagnostic accuracy by exploiting the results of ancillary techniques even if there were inadequate missing data. The CART performance was better than any other single test involved in this study.

  9. Identification of chilling and heat requirements of cherry trees--a statistical approach.

    Science.gov (United States)

    Luedeling, Eike; Kunz, Achim; Blanke, Michael M

    2013-09-01

    Most trees from temperate climates require the accumulation of winter chill and subsequent heat during their dormant phase to resume growth and initiate flowering in the following spring. Global warming could reduce chill and hence hamper the cultivation of high-chill species such as cherries. Yet determining chilling and heat requirements requires large-scale controlled-forcing experiments, and estimates are thus often unavailable. Where long-term phenology datasets exist, partial least squares (PLS) regression can be used as an alternative, to determine climatic requirements statistically. Bloom dates of cherry cv. 'Schneiders späte Knorpelkirsche' trees in Klein-Altendorf, Germany, from 24 growing seasons were correlated with 11-day running means of daily mean temperature. Based on the output of the PLS regression, five candidate chilling periods ranging in length from 17 to 102 days, and one forcing phase of 66 days were delineated. Among three common chill models used to quantify chill, the Dynamic Model showed the lowest variation in chill, indicating that it may be more accurate than the Utah and Chilling Hours Models. Based on the longest candidate chilling phase with the earliest starting date, cv. 'Schneiders späte Knorpelkirsche' cherries at Bonn exhibited a chilling requirement of 68.6 ± 5.7 chill portions (or 1,375 ± 178 chilling hours or 1,410 ± 238 Utah chill units) and a heat requirement of 3,473 ± 1,236 growing degree hours. Closer investigation of the distinct chilling phases detected by PLS regression could contribute to our understanding of dormancy processes and thus help fruit and nut growers identify suitable tree cultivars for a future in which static climatic conditions can no longer be assumed. All procedures used in this study were bundled in an R package ('chillR') and are provided as Supplementary materials. The procedure was also applied to leaf emergence dates of walnut (cv. 'Payne') at Davis, California.

  10. The N-shaped environmental Kuznets curve: an empirical evaluation using a panel quantile regression approach.

    Science.gov (United States)

    Allard, Alexandra; Takman, Johanna; Uddin, Gazi Salah; Ahmed, Ali

    2018-02-01

    We evaluate the N-shaped environmental Kuznets curve (EKC) using panel quantile regression analysis. We investigate the relationship between CO 2 emissions and GDP per capita for 74 countries over the period of 1994-2012. We include additional explanatory variables, such as renewable energy consumption, technological development, trade, and institutional quality. We find evidence for the N-shaped EKC in all income groups, except for the upper-middle-income countries. Heterogeneous characteristics are, however, observed over the N-shaped EKC. Finally, we find a negative relationship between renewable energy consumption and CO 2 emissions, which highlights the importance of promoting greener energy in order to combat global warming.

  11. Regional trends in short-duration precipitation extremes: a flexible multivariate monotone quantile regression approach

    Science.gov (United States)

    Cannon, Alex

    2017-04-01

    Estimating historical trends in short-duration rainfall extremes at regional and local scales is challenging due to low signal-to-noise ratios and the limited availability of homogenized observational data. In addition to being of scientific interest, trends in rainfall extremes are of practical importance, as their presence calls into question the stationarity assumptions that underpin traditional engineering and infrastructure design practice. Even with these fundamental challenges, increasingly complex questions are being asked about time series of extremes. For instance, users may not only want to know whether or not rainfall extremes have changed over time, they may also want information on the modulation of trends by large-scale climate modes or on the nonstationarity of trends (e.g., identifying hiatus periods or periods of accelerating positive trends). Efforts have thus been devoted to the development and application of more robust and powerful statistical estimators for regional and local scale trends. While a standard nonparametric method like the regional Mann-Kendall test, which tests for the presence of monotonic trends (i.e., strictly non-decreasing or non-increasing changes), makes fewer assumptions than parametric methods and pools information from stations within a region, it is not designed to visualize detected trends, include information from covariates, or answer questions about the rate of change in trends. As a remedy, monotone quantile regression (MQR) has been developed as a nonparametric alternative that can be used to estimate a common monotonic trend in extremes at multiple stations. Quantile regression makes efficient use of data by directly estimating conditional quantiles based on information from all rainfall data in a region, i.e., without having to precompute the sample quantiles. The MQR method is also flexible and can be used to visualize and analyze the nonlinearity of the detected trend. However, it is fundamentally a

  12. THE GENDER PAY GAP IN VIETNAM, 1993-2002: A QUANTILE REGRESSION APPROACH

    OpenAIRE

    Pham, Hung T; Reilly, Barry

    2007-01-01

    This paper uses mean and quantile regression analysis to investigate the gender pay gap for the wage employed in Vietnam over the period 1993 to 2002. It finds that the Doi moi reforms appear to have been associated with a sharp reduction in gender pay gap disparities for the wage employed. The average gender pay gap in this sector halved between 1993 and 2002 with most of the contraction evident by 1998. There has also been a narrowing in the gender pay gap at most selected points of the con...

  13. The Gender Pay Gap In Vietnam, 1993-2002: A Quantile Regression Approach

    OpenAIRE

    Barry Reilly & T. Hung Pham

    2006-01-01

    This paper uses mean and quantile regression analysis to investigate the gender pay gap for the wage employed in Vietnam over the period 1993 to 2002. It finds that the Doi moi reforms have been associated with a sharp reduction in gender wage disparities for the wage employed. The average gender pay gap in this sector halved between 1993 and 2002 with most of the contraction evident by 1998. There has also been a contraction in the gender pay at most selected points of the conditional wage d...

  14. Risk assessment for enterprise resource planning (ERP) system implementations: a fault tree analysis approach

    Science.gov (United States)

    Zeng, Yajun; Skibniewski, Miroslaw J.

    2013-08-01

    Enterprise resource planning (ERP) system implementations are often characterised with large capital outlay, long implementation duration, and high risk of failure. In order to avoid ERP implementation failure and realise the benefits of the system, sound risk management is the key. This paper proposes a probabilistic risk assessment approach for ERP system implementation projects based on fault tree analysis, which models the relationship between ERP system components and specific risk factors. Unlike traditional risk management approaches that have been mostly focused on meeting project budget and schedule objectives, the proposed approach intends to address the risks that may cause ERP system usage failure. The approach can be used to identify the root causes of ERP system implementation usage failure and quantify the impact of critical component failures or critical risk events in the implementation process.

  15. Statistical Downscaling Output GCM Modeling with Continuum Regression and Pre-Processing PCA Approach

    Directory of Open Access Journals (Sweden)

    Sutikno Sutikno

    2010-08-01

    Full Text Available One of the climate models used to predict the climatic conditions is Global Circulation Models (GCM. GCM is a computer-based model that consists of different equations. It uses numerical and deterministic equation which follows the physics rules. GCM is a main tool to predict climate and weather, also it uses as primary information source to review the climate change effect. Statistical Downscaling (SD technique is used to bridge the large-scale GCM with a small scale (the study area. GCM data is spatial and temporal data most likely to occur where the spatial correlation between different data on the grid in a single domain. Multicollinearity problems require the need for pre-processing of variable data X. Continuum Regression (CR and pre-processing with Principal Component Analysis (PCA methods is an alternative to SD modelling. CR is one method which was developed by Stone and Brooks (1990. This method is a generalization from Ordinary Least Square (OLS, Principal Component Regression (PCR and Partial Least Square method (PLS methods, used to overcome multicollinearity problems. Data processing for the station in Ambon, Pontianak, Losarang, Indramayu and Yuntinyuat show that the RMSEP values and R2 predict in the domain 8x8 and 12x12 by uses CR method produces results better than by PCR and PLS.

  16. A Gaussian mixture copula model based localized Gaussian process regression approach for long-term wind speed prediction

    International Nuclear Information System (INIS)

    Yu, Jie; Chen, Kuilin; Mori, Junichi; Rashid, Mudassir M.

    2013-01-01

    Optimizing wind power generation and controlling the operation of wind turbines to efficiently harness the renewable wind energy is a challenging task due to the intermittency and unpredictable nature of wind speed, which has significant influence on wind power production. A new approach for long-term wind speed forecasting is developed in this study by integrating GMCM (Gaussian mixture copula model) and localized GPR (Gaussian process regression). The time series of wind speed is first classified into multiple non-Gaussian components through the Gaussian mixture copula model and then Bayesian inference strategy is employed to incorporate the various non-Gaussian components using the posterior probabilities. Further, the localized Gaussian process regression models corresponding to different non-Gaussian components are built to characterize the stochastic uncertainty and non-stationary seasonality of the wind speed data. The various localized GPR models are integrated through the posterior probabilities as the weightings so that a global predictive model is developed for the prediction of wind speed. The proposed GMCM–GPR approach is demonstrated using wind speed data from various wind farm locations and compared against the GMCM-based ARIMA (auto-regressive integrated moving average) and SVR (support vector regression) methods. In contrast to GMCM–ARIMA and GMCM–SVR methods, the proposed GMCM–GPR model is able to well characterize the multi-seasonality and uncertainty of wind speed series for accurate long-term prediction. - Highlights: • A novel predictive modeling method is proposed for long-term wind speed forecasting. • Gaussian mixture copula model is estimated to characterize the multi-seasonality. • Localized Gaussian process regression models can deal with the random uncertainty. • Multiple GPR models are integrated through Bayesian inference strategy. • The proposed approach shows higher prediction accuracy and reliability

  17. [Study on sensitivity of climatic factors on influenza A (H1N1) based on classification and regression tree and wavelet analysis].

    Science.gov (United States)

    Xiao, Hong; Lin, Xiao-ling; Dai, Xiang-yu; Gao, Li-dong; Chen, Bi-yun; Zhang, Xi-xing; Zhu, Pei-juan; Tian, Huai-yu

    2012-05-01

    To analyze the periodicity of pandemic influenza A (H1N1) in Changsha in year 2009 and its correlation with sensitive climatic factors. The information of 5439 cases of influenza A (H1N1) and synchronous meteorological data during the period between May 22th and December 31st in year 2009 (223 days in total) in Changsha city were collected. The classification and regression tree (CART) was employed to screen the sensitive climatic factors on influenza A (H1N1); meanwhile, cross wavelet transform and wavelet coherence analysis were applied to assess and compare the periodicity of the pandemic disease and its association with the time-lag phase features of the sensitive climatic factors. The results of CART indicated that the daily minimum temperature and daily absolute humidity were the sensitive climatic factors for the popularity of influenza A (H1N1) in Changsha. The peak of the incidence of influenza A (H1N1) was in the period between October and December (Median (M) = 44.00 cases per day), simultaneously the daily minimum temperature (M = 13°C) and daily absolute humidity (M = 6.69 g/m(3)) were relatively low. The results of wavelet analysis demonstrated that a period of 16 days was found in the epidemic threshold in Changsha, while the daily minimum temperature and daily absolute humidity were the relatively sensitive climatic factors. The number of daily reported patients was statistically relevant to the daily minimum temperature and daily absolute humidity. The frequency domain was mostly in the period of (16 ± 2) days. In the initial stage of the disease (from August 9th and September 8th), a 6-day lag was found between the incidence and the daily minimum temperature. In the peak period of the disease, the daily minimum temperature and daily absolute humidity were negatively relevant to the incidence of the disease. In the pandemic period, the incidence of influenza A (H1N1) showed periodic features; and the sensitive climatic factors did have a "driving

  18. Repeated measurements of blood lactate concentration as a prognostic marker in horses with acute colitis evaluated with classification and regression trees (CART) and random forest analysis

    DEFF Research Database (Denmark)

    Petersen, Mette Bisgaard; Tolver, Anders; Husted, Louise

    2016-01-01

    -off value of 7 mmol/L had a sensitivity of 0.66 and a specificity of 0.92 in predicting survival. In independent test data, the sensitivity was 0.69 and the specificity was 0.76. At the observed survival rate (38%), the optimal decision tree identified horses as non-survivors when the Lac at admission...... admitted with acute colitis (trees, as well as random...

  19. Regression models for categorical, count, and related variables an applied approach

    CERN Document Server

    Hoffmann, John P

    2016-01-01

    Social science and behavioral science students and researchers are often confronted with data that are categorical, count a phenomenon, or have been collected over time. Sociologists examining the likelihood of interracial marriage, political scientists studying voting behavior, criminologists counting the number of offenses people commit, health scientists studying the number of suicides across neighborhoods, and psychologists modeling mental health treatment success are all interested in outcomes that are not continuous. Instead, they must measure and analyze these events and phenomena in a discrete manner.   This book provides an introduction and overview of several statistical models designed for these types of outcomes--all presented with the assumption that the reader has only a good working knowledge of elementary algebra and has taken introductory statistics and linear regression analysis.   Numerous examples from the social sciences demonstrate the practical applications of these models. The chapte...

  20. Key factors contributing to accident severity rate in construction industry in Iran: a regression modelling approach.

    Science.gov (United States)

    Soltanzadeh, Ahmad; Mohammadfam, Iraj; Moghimbeigi, Abbas; Ghiasvand, Reza

    2016-03-01

    Construction industry involves the highest risk of occupational accidents and bodily injuries, which range from mild to very severe. The aim of this cross-sectional study was to identify the factors associated with accident severity rate (ASR) in the largest Iranian construction companies based on data about 500 occupational accidents recorded from 2009 to 2013. We also gathered data on safety and health risk management and training systems. Data were analysed using Pearson's chi-squared coefficient and multiple regression analysis. Median ASR (and the interquartile range) was 107.50 (57.24- 381.25). Fourteen of the 24 studied factors stood out as most affecting construction accident severity (p<0.05). These findings can be applied in the design and implementation of a comprehensive safety and health risk management system to reduce ASR.

  1. Flow modeling in a porous cylinder with regressing walls using semi analytical approach

    Directory of Open Access Journals (Sweden)

    M Azimi

    2016-10-01

    Full Text Available In this paper, the mathematical modeling of the flow in a porous cylinder with a focus on applications to solid rocket motors is presented. As usual, the cylindrical propellant grain of a solid rocket motor is modeled as a long tube with one end closed at the headwall, while the other remains open. The cylindrical wall is assumed to be permeable so as to simulate the propellant burning and normal gas injection. At first, the problem description and formulation are considered. The Navier-Stokes equations for the viscous flow in a porous cylinder with regressing walls are reduced to a nonlinear ODE by using a similarity transformation in time and space. Application of Differential Transformation Method (DTM as an approximate analytical method has been successfully applied. Finally the results have been presented for various cases.

  2. In search of a corrected prescription drug elasticity estimate: a meta-regression approach.

    Science.gov (United States)

    Gemmill, Marin C; Costa-Font, Joan; McGuire, Alistair

    2007-06-01

    An understanding of the relationship between cost sharing and drug consumption depends on consistent and unbiased price elasticity estimates. However, there is wide heterogeneity among studies, which constrains the applicability of elasticity estimates for empirical purposes and policy simulation. This paper attempts to provide a corrected measure of the drug price elasticity by employing meta-regression analysis (MRA). The results indicate that the elasticity estimates are significantly different from zero, and the corrected elasticity is -0.209 when the results are made robust to heteroskedasticity and clustering of observations. Elasticity values are higher when the study was published in an economic journal, when the study employed a greater number of observations, and when the study used aggregate data. Elasticity estimates are lower when the institutional setting was a tax-based health insurance system.

  3. Casemix funding for a specialist paediatrics hospital: a hedonic regression approach.

    Science.gov (United States)

    Bridges, J F; Hanson, R M

    2000-01-01

    This paper inquires into the effects that Diagnosis Related Groups (DRGs) have had on the ability to explain patient-level costs in a specialist paediatrics hospital. Two hedonic models are estimated using 1996/97 New Children's Hospital (NCH) patient level cost data, one with and one without a casemix index (CMI). The results show that the inclusion of a casemix index as an explanatory variable leads to a better accounting of cost. The full hedonic model is then used to simulate a funding model for the 1997/98 NCH cost data. These costs are highly correlated with the actual costs reported for that year. In addition, univariate regression indicates that there has been inflation in costs in the order of 4.8% between the two years. In conclusion, hedonic analysis can provide valuable evidence for the design of funding models that account for casemix.

  4. How efficient are referral hospitals in Uganda? A data envelopment analysis and tobit regression approach.

    Science.gov (United States)

    Mujasi, Paschal N; Asbu, Eyob Z; Puig-Junoy, Jaume

    2016-07-08

    Hospitals represent a significant proportion of health expenditures in Uganda, accounting for about 26 % of total health expenditure. Improving the technical efficiency of hospitals in Uganda can result in large savings which can be devoted to expand access to services and improve quality of care. This paper explores the technical efficiency of referral hospitals in Uganda during the 2012/2013 financial year. This was a cross sectional study using secondary data. Input and output data were obtained from the Uganda Ministry of Health annual health sector performance report for the period July 1, 2012 to June 30, 2013 for the 14 public sector regional referral and 4 large private not for profit hospitals. We assumed an output-oriented model with Variable Returns to Scale to estimate the efficiency score for each hospital using Data Envelopment Analysis (DEA) with STATA13. Using a Tobit model DEA, efficiency scores were regressed against selected institutional and contextual/environmental factors to estimate their impacts on efficiency. The average variable returns to scale (Pure) technical efficiency score was 91.4 % and the average scale efficiency score was 87.1 % while the average constant returns to scale technical efficiency score was 79.4 %. Technically inefficient hospitals could have become more efficient by increasing the outpatient department visits by 45,943; and inpatient days by 31,425 without changing the total number of inputs. Alternatively, they would achieve efficiency by for example transferring the excess 216 medical staff and 454 beds to other levels of the health system without changing the total number of outputs. Tobit regression indicates that significant factors in explaining hospital efficiency are: hospital size (p Uganda.

  5. Pattern recognition of spruce trees. An integrated, analytical approach to forest damage

    International Nuclear Information System (INIS)

    Simmleit, N.; Schulten, H.R.

    1989-01-01

    In-source pyrolysis-field ionization mass spectrometry was used to fingerprint old needles taken from 90-year-old Norway spruce trees (Picea abies) grown in the Taunus mountains (Federal Republic of Germany). Biometric, physiological variables and elemental compositions of needle and forest soil samples were gathered for the same trees. The mass spectral and conventional data sets were evaluated by principal-component and multiple regression analysis. The results indicate that the mass signal pattern of antioxidants, the soil acidity, the water status, and the nutritional supply of the plant contribute most to the variance of damage symptoms observed in the forest stand investigated. The visual needle loss of the canopy can be predicted by antioxidant, soil acidity, and water status parameters, whereas a further classification according to the discoloration of the needles can only be achieved by adding a soil nutrient component. It is emphasized that multivariate statistical evaluation of complex data sets should be used for the investigation of environmental problems

  6. Crown-level tree species classification from AISA hyperspectral imagery using an innovative pixel-weighting approach

    Science.gov (United States)

    Liu, Haijian; Wu, Changshan

    2018-06-01

    Crown-level tree species classification is a challenging task due to the spectral similarity among different tree species. Shadow, underlying objects, and other materials within a crown may decrease the purity of extracted crown spectra and further reduce classification accuracy. To address this problem, an innovative pixel-weighting approach was developed for tree species classification at the crown level. The method utilized high density discrete LiDAR data for individual tree delineation and Airborne Imaging Spectrometer for Applications (AISA) hyperspectral imagery for pure crown-scale spectra extraction. Specifically, three steps were included: 1) individual tree identification using LiDAR data, 2) pixel-weighted representative crown spectra calculation using hyperspectral imagery, with which pixel-based illuminated-leaf fractions estimated using a linear spectral mixture analysis (LSMA) were employed as weighted factors, and 3) representative spectra based tree species classification was performed through applying a support vector machine (SVM) approach. Analysis of results suggests that the developed pixel-weighting approach (OA = 82.12%, Kc = 0.74) performed better than treetop-based (OA = 70.86%, Kc = 0.58) and pixel-majority methods (OA = 72.26, Kc = 0.62) in terms of classification accuracy. McNemar tests indicated the differences in accuracy between pixel-weighting and treetop-based approaches as well as that between pixel-weighting and pixel-majority approaches were statistically significant.

  7. A Classification Regression Tree Analysis to Reduce Balance Impairments and Falls in the Older population: Impact on Resource Utilization and Clinical Decision-Making in USA Rehabilitation Service Delivery

    Directory of Open Access Journals (Sweden)

    Lucinda Pfalzer

    2013-06-01

    Full Text Available Background/Purpose: Over 1/3 of adults over age 65 experiences at least one fall each year. This pilot report uses a classification regression tree analysis (CART to model the outcomes for balance/risk of falls from the Gentiva® Safe Strides® Program (SSP. Methods/Outcomes: SSP is a home-based balance/fall prevention program designed to treat root causes of a patient

  8. Identifying multiple outliers in linear regression: robust fit and clustering approach

    International Nuclear Information System (INIS)

    Robiah Adnan; Mohd Nor Mohamad; Halim Setan

    2001-01-01

    This research provides a clustering based approach for determining potential candidates for outliers. This is modification of the method proposed by Serbert et. al (1988). It is based on using the single linkage clustering algorithm to group the standardized predicted and residual values of data set fit by least trimmed of squares (LTS). (Author)

  9. Relative Age in School and Suicide among Young Individuals in Japan: A Regression Discontinuity Approach.

    Directory of Open Access Journals (Sweden)

    Tetsuya Matsubayashi

    Full Text Available Evidence collected in many parts of the world suggests that, compared to older students, students who are relatively younger at school entry tend to have worse academic performance and lower levels of income. This study examined how relative age in a grade affects suicide rates of adolescents and young adults between 15 and 25 years of age using data from Japan.We examined individual death records in the Vital Statistics of Japan from 1989 to 2010. In contrast to other countries, late entry to primary school is not allowed in Japan. We took advantage of the school entry cutoff date to implement a regression discontinuity (RD design, assuming that the timing of births around the school entry cutoff date was randomly determined and therefore that individuals who were born just before and after the cutoff date have similar baseline characteristics.We found that those who were born right before the school cutoff day and thus youngest in their cohort have higher mortality rates by suicide, compared to their peers who were born right after the cutoff date and thus older. We also found that those with relative age disadvantage tend to follow a different career path than those with relative age advantage, which may explain their higher suicide mortality rates.Relative age effects have broader consequences than was previously supposed. This study suggests that policy intervention that alleviates the relative age effect can be important.

  10. A kernel regression approach to gene-gene interaction detection for case-control studies.

    Science.gov (United States)

    Larson, Nicholas B; Schaid, Daniel J

    2013-11-01

    Gene-gene interactions are increasingly being addressed as a potentially important contributor to the variability of complex traits. Consequently, attentions have moved beyond single locus analysis of association to more complex genetic models. Although several single-marker approaches toward interaction analysis have been developed, such methods suffer from very high testing dimensionality and do not take advantage of existing information, notably the definition of genes as functional units. Here, we propose a comprehensive family of gene-level score tests for identifying genetic elements of disease risk, in particular pairwise gene-gene interactions. Using kernel machine methods, we devise score-based variance component tests under a generalized linear mixed model framework. We conducted simulations based upon coalescent genetic models to evaluate the performance of our approach under a variety of disease models. These simulations indicate that our methods are generally higher powered than alternative gene-level approaches and at worst competitive with exhaustive SNP-level (where SNP is single-nucleotide polymorphism) analyses. Furthermore, we observe that simulated epistatic effects resulted in significant marginal testing results for the involved genes regardless of whether or not true main effects were present. We detail the benefits of our methods and discuss potential genome-wide analysis strategies for gene-gene interaction analysis in a case-control study design. © 2013 WILEY PERIODICALS, INC.

  11. Direct energy rebound effect for road passenger transport in China: A dynamic panel quantile regression approach

    International Nuclear Information System (INIS)

    Zhang, Yue-Jun; Peng, Hua-Rong; Liu, Zhao; Tan, Weiping

    2015-01-01

    The transport sector appears a main energy consumer in China and plays a significant role in energy conservation. Improving energy efficiency proves an effective way to reduce energy consumption in transport sector, whereas its effectiveness may be affected by the rebound effect. This paper proposes a dynamic panel quantile regression model to estimate the direct energy rebound effect for road passenger transport in the whole country, eastern, central and western China, respectively, based on the data of 30 provinces from 2003 to 2012. The empirical results reveal that, first of all, the direct rebound effect does exist for road passenger transport and on the whole country, the short-term and long-term direct rebound effects are 25.53% and 26.56% on average, respectively. Second, the direct rebound effect for road passenger transport in central and eastern China tends to decrease, increase and then decrease again, whereas that in western China decreases and then increases, with the increasing passenger kilometers. Finally, when implementing energy efficiency policy in road passenger transport sector, the effectiveness of energy conservation in western China proves much better than that in central China overall, while the effectiveness in central China is relatively better than that in eastern China. - Highlights: • The direct rebound effect (RE) for road passenger transport in China is estimated. • The direct RE in the whole country, eastern, central, and western China is analyzed. • The short and long-term direct REs are 25.53% and 26.56% within the sample period. • Western China has better energy-saving performance than central and eastern China.

  12. Consistency analysis of subspace identification methods based on a linear regression approach

    DEFF Research Database (Denmark)

    Knudsen, Torben

    2001-01-01

    In the literature results can be found which claim consistency for the subspace method under certain quite weak assumptions. Unfortunately, a new result gives a counter example showing inconsistency under these assumptions and then gives new more strict sufficient assumptions which however does n...... not include important model structures as e.g. Box-Jenkins. Based on a simple least squares approach this paper shows the possible inconsistency under the weak assumptions and develops only slightly stricter assumptions sufficient for consistency and which includes any model structure...

  13. Tree-based approaches for understanding growth patterns in the European regions

    Directory of Open Access Journals (Sweden)

    Paola Annoni

    2016-09-01

    transport infrastructure, human capital, labour market and research and innovation - and incorporates the institutional quality and two variables which aim to reflect the macroeconomic conditions in which the regions operate. Given the scarcity of reliable and comparable regional data at the EU level, large part of the analysis has been devoted to build reliable and consistent panel data on potential factors of growth. Two non-parametric, decision-tree techniques, randomized Classication and Regression Tree and Multivariate Adaptive Regression Splines, are employed for their ability to address data complexities such as non-linearities and interaction eects, which are generally a challenge for more traditional statistical procedures such as linear regression. Results show that the dependence of growth rates on the factors included in the analysis is clearly non-linear with important factor interactions. This means that growth is determined by the simultaneous presence of multiple stimulus factors rather than the presence of a single area of excellence. Results also conrm the critical importance of the macroeconomic framework together with human capital as major drivers of economic growth of countries and regions. This is overall in line with most of the economic literature, which has persistently underlined the major role of these factors on economic growth but with the novelty that the macroeconomic conditions are here incorporated. Human capital also has an important role, with low-skilled workforce having a higher detrimental eect on growth than high-skilled. Not surprisingly, other important factors are the quality of governance and, in line with the neoclassical growth theory, the stage of development, with less developed economies growing at a faster pace than the others. The evidence given by the model about the impact of other factors on economic growth such as those on the quality of infrastructure or the level of innovation seems to be more limited and

  14. In silico prediction of toxicity of phenols to Tetrahymena pyriformis by using genetic algorithm and decision tree-based modeling approach.

    Science.gov (United States)

    Abbasitabar, Fatemeh; Zare-Shahabadi, Vahid

    2017-04-01

    Risk assessment of chemicals is an important issue in environmental protection; however, there is a huge lack of experimental data for a large number of end-points. The experimental determination of toxicity of chemicals involves high costs and time-consuming process. In silico tools such as quantitative structure-toxicity relationship (QSTR) models, which are constructed on the basis of computational molecular descriptors, can predict missing data for toxic end-points for existing or even not yet synthesized chemicals. Phenol derivatives are known to be aquatic pollutants. With this background, we aimed to develop an accurate and reliable QSTR model for the prediction of toxicity of 206 phenols to Tetrahymena pyriformis. A multiple linear regression (MLR)-based QSTR was obtained using a powerful descriptor selection tool named Memorized_ACO algorithm. Statistical parameters of the model were 0.72 and 0.68 for R training 2 and R test 2 , respectively. To develop a high-quality QSTR model, classification and regression tree (CART) was employed. Two approaches were considered: (1) phenols were classified into different modes of action using CART and (2) the phenols in the training set were partitioned to several subsets by a tree in such a manner that in each subset, a high-quality MLR could be developed. For the first approach, the statistical parameters of the resultant QSTR model were improved to 0.83 and 0.75 for R training 2 and R test 2 , respectively. Genetic algorithm was employed in the second approach to obtain an optimal tree, and it was shown that the final QSTR model provided excellent prediction accuracy for the training and test sets (R training 2 and R test 2 were 0.91 and 0.93, respectively). The mean absolute error for the test set was computed as 0.1615. Copyright © 2016 Elsevier Ltd. All rights reserved.

  15. Short-term wind speed prediction using an unscented Kalman filter based state-space support vector regression approach

    International Nuclear Information System (INIS)

    Chen, Kuilin; Yu, Jie

    2014-01-01

    Highlights: • A novel hybrid modeling method is proposed for short-term wind speed forecasting. • Support vector regression model is constructed to formulate nonlinear state-space framework. • Unscented Kalman filter is adopted to recursively update states under random uncertainty. • The new SVR–UKF approach is compared to several conventional methods for short-term wind speed prediction. • The proposed method demonstrates higher prediction accuracy and reliability. - Abstract: Accurate wind speed forecasting is becoming increasingly important to improve and optimize renewable wind power generation. Particularly, reliable short-term wind speed prediction can enable model predictive control of wind turbines and real-time optimization of wind farm operation. However, this task remains challenging due to the strong stochastic nature and dynamic uncertainty of wind speed. In this study, unscented Kalman filter (UKF) is integrated with support vector regression (SVR) based state-space model in order to precisely update the short-term estimation of wind speed sequence. In the proposed SVR–UKF approach, support vector regression is first employed to formulate a nonlinear state-space model and then unscented Kalman filter is adopted to perform dynamic state estimation recursively on wind sequence with stochastic uncertainty. The novel SVR–UKF method is compared with artificial neural networks (ANNs), SVR, autoregressive (AR) and autoregressive integrated with Kalman filter (AR-Kalman) approaches for predicting short-term wind speed sequences collected from three sites in Massachusetts, USA. The forecasting results indicate that the proposed method has much better performance in both one-step-ahead and multi-step-ahead wind speed predictions than the other approaches across all the locations

  16. Comparison of beta-binomial regression model approaches to analyze health-related quality of life data.

    Science.gov (United States)

    Najera-Zuloaga, Josu; Lee, Dae-Jin; Arostegui, Inmaculada

    2017-01-01

    Health-related quality of life has become an increasingly important indicator of health status in clinical trials and epidemiological research. Moreover, the study of the relationship of health-related quality of life with patients and disease characteristics has become one of the primary aims of many health-related quality of life studies. Health-related quality of life scores are usually assumed to be distributed as binomial random variables and often highly skewed. The use of the beta-binomial distribution in the regression context has been proposed to model such data; however, the beta-binomial regression has been performed by means of two different approaches in the literature: (i) beta-binomial distribution with a logistic link; and (ii) hierarchical generalized linear models. None of the existing literature in the analysis of health-related quality of life survey data has performed a comparison of both approaches in terms of adequacy and regression parameter interpretation context. This paper is motivated by the analysis of a real data application of health-related quality of life outcomes in patients with Chronic Obstructive Pulmonary Disease, where the use of both approaches yields to contradictory results in terms of covariate effects significance and consequently the interpretation of the most relevant factors in health-related quality of life. We present an explanation of the results in both methodologies through a simulation study and address the need to apply the proper approach in the analysis of health-related quality of life survey data for practitioners, providing an R package.

  17. Education-Based Gaps in eHealth: A Weighted Logistic Regression Approach.

    Science.gov (United States)

    Amo, Laura

    2016-10-12

    Persons with a college degree are more likely to engage in eHealth behaviors than persons without a college degree, compounding the health disadvantages of undereducated groups in the United States. However, the extent to which quality of recent eHealth experience reduces the education-based eHealth gap is unexplored. The goal of this study was to examine how eHealth information search experience moderates the relationship between college education and eHealth behaviors. Based on a nationally representative sample of adults who reported using the Internet to conduct the most recent health information search (n=1458), I evaluated eHealth search experience in relation to the likelihood of engaging in different eHealth behaviors. I examined whether Internet health information search experience reduces the eHealth behavior gaps among college-educated and noncollege-educated adults. Weighted logistic regression models were used to estimate the probability of different eHealth behaviors. College education was significantly positively related to the likelihood of 4 eHealth behaviors. In general, eHealth search experience was negatively associated with health care behaviors, health information-seeking behaviors, and user-generated or content sharing behaviors after accounting for other covariates. Whereas Internet health information search experience has narrowed the education gap in terms of likelihood of using email or Internet to communicate with a doctor or health care provider and likelihood of using a website to manage diet, weight, or health, it has widened the education gap in the instances of searching for health information for oneself, searching for health information for someone else, and downloading health information on a mobile device. The relationship between college education and eHealth behaviors is moderated by Internet health information search experience in different ways depending on the type of eHealth behavior. After controlling for college

  18. An Approach to Indexing and Retrieval of Spatial Data with Reduced R+ Tree and K-NN Query Algorithm

    OpenAIRE

    S. Palaniappan; T.V. Rajinikanth; A. Govardhan

    2015-01-01

    Recently, “spatial data bases have been extensively adopted in the recent decade and various methods have been presented to store, browse, search and retrieve spatial objects”. In this study, a method is plotted for retrieving nearest neighbors from spatial data indexed by R+ tree. The approach uses a reduced R+tree for the purpose of representing the spatial data. Initially the spatial data is selected and R+tree is constructed accordingly. Then a function called joining nodes is applied to ...

  19. Straight line fitting and predictions: On a marginal likelihood approach to linear regression and errors-in-variables models

    Science.gov (United States)

    Christiansen, Bo

    2015-04-01

    Linear regression methods are without doubt the most used approaches to describe and predict data in the physical sciences. They are often good first order approximations and they are in general easier to apply and interpret than more advanced methods. However, even the properties of univariate regression can lead to debate over the appropriateness of various models as witnessed by the recent discussion about climate reconstruction methods. Before linear regression is applied important choices have to be made regarding the origins of the noise terms and regarding which of the two variables under consideration that should be treated as the independent variable. These decisions are often not easy to make but they may have a considerable impact on the results. We seek to give a unified probabilistic - Bayesian with flat priors - treatment of univariate linear regression and prediction by taking, as starting point, the general errors-in-variables model (Christiansen, J. Clim., 27, 2014-2031, 2014). Other versions of linear regression can be obtained as limits of this model. We derive the likelihood of the model parameters and predictands of the general errors-in-variables model by marginalizing over the nuisance parameters. The resulting likelihood is relatively simple and easy to analyze and calculate. The well known unidentifiability of the errors-in-variables model is manifested as the absence of a well-defined maximum in the likelihood. However, this does not mean that probabilistic inference can not be made; the marginal likelihoods of model parameters and the predictands have, in general, well-defined maxima. We also include a probabilistic version of classical calibration and show how it is related to the errors-in-variables model. The results are illustrated by an example from the coupling between the lower stratosphere and the troposphere in the Northern Hemisphere winter.

  20. New approaches to phylogenetic tree search and their application to large numbers of protein alignments.

    Science.gov (United States)

    Whelan, Simon

    2007-10-01

    Phylogenetic tree estimation plays a critical role in a wide variety of molecular studies, including molecular systematics, phylogenetics, and comparative genomics. Finding the optimal tree relating a set of sequences using score-based (optimality criterion) methods, such as maximum likelihood and maximum parsimony, may require all possible trees to be considered, which is not feasible even for modest numbers of sequences. In practice, trees are estimated using heuristics that represent a trade-off between topological accuracy and speed. I present a series of novel algorithms suitable for score-based phylogenetic tree reconstruction that demonstrably improve the accuracy of tree estimates while maintaining high computational speeds. The heuristics function by allowing the efficient exploration of large numbers of trees through novel hill-climbing and resampling strategies. These heuristics, and other computational approximations, are implemented for maximum likelihood estimation of trees in the program Leaphy, and its performance is compared to other popular phylogenetic programs. Trees are estimated from 4059 different protein alignments using a selection of phylogenetic programs and the likelihoods of the tree estimates are compared. Trees estimated using Leaphy are found to have equal to or better likelihoods than trees estimated using other phylogenetic programs in 4004 (98.6%) families and provide a unique best tree that no other program found in 1102 (27.1%) families. The improvement is particularly marked for larger families (80 to 100 sequences), where Leaphy finds a unique best tree in 81.7% of families.

  1. Semiparametric approach for non-monotone missing covariates in a parametric regression model

    KAUST Repository

    Sinha, Samiran

    2014-02-26

    Missing covariate data often arise in biomedical studies, and analysis of such data that ignores subjects with incomplete information may lead to inefficient and possibly biased estimates. A great deal of attention has been paid to handling a single missing covariate or a monotone pattern of missing data when the missingness mechanism is missing at random. In this article, we propose a semiparametric method for handling non-monotone patterns of missing data. The proposed method relies on the assumption that the missingness mechanism of a variable does not depend on the missing variable itself but may depend on the other missing variables. This mechanism is somewhat less general than the completely non-ignorable mechanism but is sometimes more flexible than the missing at random mechanism where the missingness mechansim is allowed to depend only on the completely observed variables. The proposed approach is robust to misspecification of the distribution of the missing covariates, and the proposed mechanism helps to nullify (or reduce) the problems due to non-identifiability that result from the non-ignorable missingness mechanism. The asymptotic properties of the proposed estimator are derived. Finite sample performance is assessed through simulation studies. Finally, for the purpose of illustration we analyze an endometrial cancer dataset and a hip fracture dataset.

  2. Regressive Prediction Approach to Vertical Handover in Fourth Generation Wireless Networks

    Directory of Open Access Journals (Sweden)

    Abubakar M. Miyim

    2014-11-01

    Full Text Available The over increasing demand for deployment of wireless access networks has made wireless mobile devices to face so many challenges in choosing the best suitable network from a set of available access networks. Some of the weighty issues in 4G wireless networks are fastness and seamlessness in handover process. This paper therefore, proposes a handover technique based on movement prediction in wireless mobile (WiMAX and LTE-A environment. The technique enables the system to predict signal quality between the UE and Radio Base Stations (RBS/Access Points (APs in two different networks. Prediction is achieved by employing the Markov Decision Process Model (MDPM where the movement of the UE is dynamically estimated and averaged to keep track of the signal strength of mobile users. With the help of the prediction, layer-3 handover activities are able to occur prior to layer-2 handover, and therefore, total handover latency can be reduced. The performances of various handover approaches influenced by different metrics (mobility velocities were evaluated. The results presented demonstrate good accuracy the proposed method was able to achieve in predicting the next signal level by reducing the total handover latency.

  3. A physarum-inspired prize-collecting steiner tree approach to identify subnetworks for drug repositioning.

    Science.gov (United States)

    Sun, Yahui; Hameed, Pathima Nusrath; Verspoor, Karin; Halgamuge, Saman

    2016-12-05

    Drug repositioning can reduce the time, costs and risks of drug development by identifying new therapeutic effects for known drugs. It is challenging to reposition drugs as pharmacological data is large and complex. Subnetwork identification has already been used to simplify the visualization and interpretation of biological data, but it has not been applied to drug repositioning so far. In this paper, we fill this gap by proposing a new Physarum-inspired Prize-Collecting Steiner Tree algorithm to identify subnetworks for drug repositioning. Drug Similarity Networks (DSN) are generated using the chemical, therapeutic, protein, and phenotype features of drugs. In DSNs, vertex prizes and edge costs represent the similarities and dissimilarities between drugs respectively, and terminals represent drugs in the cardiovascular class, as defined in the Anatomical Therapeutic Chemical classification system. A new Physarum-inspired Prize-Collecting Steiner Tree algorithm is proposed in this paper to identify subnetworks. We apply both the proposed algorithm and the widely-used GW algorithm to identify subnetworks in our 18 generated DSNs. In these DSNs, our proposed algorithm identifies subnetworks with an average Rand Index of 81.1%, while the GW algorithm can only identify subnetworks with an average Rand Index of 64.1%. We select 9 subnetworks with high Rand Index to find drug repositioning opportunities. 10 frequently occurring drugs in these subnetworks are identified as candidates to be repositioned for cardiovascular diseases. We find evidence to support previous discoveries that nitroglycerin, theophylline and acarbose may be able to be repositioned for cardiovascular diseases. Moreover, we identify seven previously unknown drug candidates that also may interact with the biological cardiovascular system. These discoveries show our proposed Prize-Collecting Steiner Tree approach as a promising strategy for drug repositioning.

  4. The Public-Private Sector Wage Gap in Zambia in the 1990s: A Quantile Regression Approach

    DEFF Research Database (Denmark)

    Nielsen, Helena Skyt; Rosholm, Michael

    2001-01-01

    of economic transition, because items as privatization and deregulation were on the political agenda. The focus is placed on the public-private sector wage gap, and the results show that this gap was relatively favorable for the low-skilled and less favorable for the high-skilled. This picture was further......We investigate the determinants of wages in Zambia and based on the quantile regression approach, we analyze how their effects differ at different points in the wage distribution and over time. We use three cross-sections of Zambian household data from the early nineties, which was a period...

  5. Integrated system fault diagnostics utilising digraph and fault tree-based approaches

    International Nuclear Information System (INIS)

    Bartlett, L.M.; Hurdle, E.E.; Kelly, E.M.

    2009-01-01

    With the growing intolerance to failures within systems, the issue of fault diagnosis has become ever prevalent. Information concerning these possible failures can help to minimise the disruption to the functionality of the system by allowing quick rectification. Traditional approaches to fault diagnosis within engineering systems have focused on sequential testing procedures and real-time mechanisms. Both methods have been predominantly limited to single fault causes. Latest approaches also consider the issue of multiple faults in reflection to the characteristics of modern day systems designed for high reliability. In addition, a diagnostic capability is required in real time and for changeable system functionality. This paper focuses on two approaches which have been developed to cater for the demands of diagnosis within current engineering systems, namely application of the fault tree analysis technique and the method of digraphs. Both use a comparative approach to consider differences between actual system behaviour and that expected. The procedural guidelines are discussed for each method, with an experimental aircraft fuel system used to test and demonstrate the features of the techniques. The effectiveness of the approaches is compared and their future potential highlighted

  6. Model-Independent Evaluation of Tumor Markers and a Logistic-Tree Approach to Diagnostic Decision Support

    Directory of Open Access Journals (Sweden)

    Weizeng Ni

    2014-01-01

    Full Text Available Sensitivity and specificity of using individual tumor markers hardly meet the clinical requirement. This challenge gave rise to many efforts, e.g., combing multiple tumor markers and employing machine learning algorithms. However, results from different studies are often inconsistent, which are partially attributed to the use of different evaluation criteria. Also, the wide use of model-dependent validation leads to high possibility of data overfitting when complex models are used for diagnosis. We propose two model-independent criteria, namely, area under the curve (AUC and Relief to evaluate the diagnostic values of individual and multiple tumor markers, respectively. For diagnostic decision support, we propose the use of logistic-tree which combines decision tree and logistic regression. Application on a colorectal cancer dataset shows that the proposed evaluation criteria produce results that are consistent with current knowledge. Furthermore, the simple and highly interpretable logistic-tree has diagnostic performance that is competitive with other complex models.

  7. A machine learning approach to galaxy-LSS classification - I. Imprints on halo merger trees

    Science.gov (United States)

    Hui, Jianan; Aragon, Miguel; Cui, Xinping; Flegal, James M.

    2018-04-01

    The cosmic web plays a major role in the formation and evolution of galaxies and defines, to a large extent, their properties. However, the relation between galaxies and environment is still not well understood. Here, we present a machine learning approach to study imprints of environmental effects on the mass assembly of haloes. We present a galaxy-LSS machine learning classifier based on galaxy properties sensitive to the environment. We then use the classifier to assess the relevance of each property. Correlations between galaxy properties and their cosmic environment can be used to predict galaxy membership to void/wall or filament/cluster with an accuracy of 93 per cent. Our study unveils environmental information encoded in properties of haloes not normally considered directly dependent on the cosmic environment such as merger history and complexity. Understanding the physical mechanism by which the cosmic web is imprinted in a halo can lead to significant improvements in galaxy formation models. This is accomplished by extracting features from galaxy properties and merger trees, computing feature scores for each feature and then applying support vector machine (SVM) to different feature sets. To this end, we have discovered that the shape and depth of the merger tree, formation time, and density of the galaxy are strongly associated with the cosmic environment. We describe a significant improvement in the original classification algorithm by performing LU decomposition of the distance matrix computed by the feature vectors and then using the output of the decomposition as input vectors for SVM.

  8. A Unified Experimental Approach for Estimation of Irrigationwater and Nitrate Leaching in Tree Crops

    Science.gov (United States)

    Hopmans, J. W.; Kandelous, M. M.; Moradi, A. B.

    2014-12-01

    Groundwater quality is specifically vulnerable in irrigated agricultural lands in California and many other(semi-)arid regions of the world. The routine application of nitrogen fertilizers with irrigation water in California is likely responsible for the high nitrate concentrations in groundwater, underlying much of its main agricultural areas. To optimize irrigation/fertigation practices, it is essential that irrigation and fertilizers are applied at the optimal concentration, place, and time to ensure maximum root uptake and minimize leaching losses to the groundwater. The applied irrigation water and dissolved fertilizer, as well as root growth and associated nitrate and water uptake, interact with soil properties and fertilizer source(s) in a complex manner that cannot easily be resolved. It is therefore that coupled experimental-modeling studies are required to allow for unraveling of the relevant complexities that result from typical field-wide spatial variations of soil texture and layering across farmer-managed fields. We present experimental approaches across a network of tree crop orchards in the San Joaquin Valley, that provide the necessary soil data of soil moisture, water potential and nitrate concentration to evaluate and optimize irrigation water management practices. Specifically, deep tensiometers were used to monitor in-situ continuous soil water potential gradients, for the purpose to compute leaching fluxes of water and nitrate at both the individual tree and field scale.

  9. A novel prediction approach for antimalarial activities of Trimethoprim, Pyrimethamine, and Cycloguanil analogues using extremely randomized trees.

    Science.gov (United States)

    Nattee, Cholwich; Khamsemanan, Nirattaya; Lawtrakul, Luckhana; Toochinda, Pisanu; Hannongbua, Supa

    2017-01-01

    Malaria is still one of the most serious diseases in tropical regions. This is due in part to the high resistance against available drugs for the inhibition of parasites, Plasmodium, the cause of the disease. New potent compounds with high clinical utility are urgently needed. In this work, we created a novel model using a regression tree to study structure-activity relationships and predict the inhibition constant, K i of three different antimalarial analogues (Trimethoprim, Pyrimethamine, and Cycloguanil) based on their molecular descriptors. To the best of our knowledge, this work is the first attempt to study the structure-activity relationships of all three analogues combined. The most relevant descriptors and appropriate parameters of the regression tree are harvested using extremely randomized trees. These descriptors are water accessible surface area, Log of the aqueous solubility, total hydrophobic van der Waals surface area, and molecular refractivity. Out of all possible combinations of these selected parameters and descriptors, the tree with the strongest coefficient of determination is selected to be our prediction model. Predicted K i values from the proposed model show a strong coefficient of determination, R 2 =0.996, to experimental K i values. From the structure of the regression tree, compounds with high accessible surface area of all hydrophobic atoms (ASA_H) and low aqueous solubility of inhibitors (Log S) generally possess low K i values. Our prediction model can also be utilized as a screening test for new antimalarial drug compounds which may reduce the time and expenses for new drug development. New compounds with high predicted K i should be excluded from further drug development. It is also our inference that a threshold of ASA_H greater than 575.80 and Log S less than or equal to -4.36 is a sufficient condition for a new compound to possess a low K i . Copyright © 2016 Elsevier Inc. All rights reserved.

  10. A Two-Stage Penalized Logistic Regression Approach to Case-Control Genome-Wide Association Studies

    Directory of Open Access Journals (Sweden)

    Jingyuan Zhao

    2012-01-01

    Full Text Available We propose a two-stage penalized logistic regression approach to case-control genome-wide association studies. This approach consists of a screening stage and a selection stage. In the screening stage, main-effect and interaction-effect features are screened by using L1-penalized logistic like-lihoods. In the selection stage, the retained features are ranked by the logistic likelihood with the smoothly clipped absolute deviation (SCAD penalty (Fan and Li, 2001 and Jeffrey’s Prior penalty (Firth, 1993, a sequence of nested candidate models are formed, and the models are assessed by a family of extended Bayesian information criteria (J. Chen and Z. Chen, 2008. The proposed approach is applied to the analysis of the prostate cancer data of the Cancer Genetic Markers of Susceptibility (CGEMS project in the National Cancer Institute, USA. Simulation studies are carried out to compare the approach with the pair-wise multiple testing approach (Marchini et al. 2005 and the LASSO-patternsearch algorithm (Shi et al. 2007.

  11. Predictability of extreme weather events for NE U.S.: improvement of the numerical prediction using a Bayesian regression approach

    Science.gov (United States)

    Yang, J.; Astitha, M.; Anagnostou, E. N.; Hartman, B.; Kallos, G. B.

    2015-12-01

    Weather prediction accuracy has become very important for the Northeast U.S. given the devastating effects of extreme weather events in the recent years. Weather forecasting systems are used towards building strategies to prevent catastrophic losses for human lives and the environment. Concurrently, weather forecast tools and techniques have evolved with improved forecast skill as numerical prediction techniques are strengthened by increased super-computing resources. In this study, we examine the combination of two state-of-the-science atmospheric models (WRF and RAMS/ICLAMS) by utilizing a Bayesian regression approach to improve the prediction of extreme weather events for NE U.S. The basic concept behind the Bayesian regression approach is to take advantage of the strengths of two atmospheric modeling systems and, similar to the multi-model ensemble approach, limit their weaknesses which are related to systematic and random errors in the numerical prediction of physical processes. The first part of this study is focused on retrospective simulations of seventeen storms that affected the region in the period 2004-2013. Optimal variances are estimated by minimizing the root mean square error and are applied to out-of-sample weather events. The applicability and usefulness of this approach are demonstrated by conducting an error analysis based on in-situ observations from meteorological stations of the National Weather Service (NWS) for wind speed and wind direction, and NCEP Stage IV radar data, mosaicked from the regional multi-sensor for precipitation. The preliminary results indicate a significant improvement in the statistical metrics of the modeled-observed pairs for meteorological variables using various combinations of the sixteen events as predictors of the seventeenth. This presentation will illustrate the implemented methodology and the obtained results for wind speed, wind direction and precipitation, as well as set the research steps that will be

  12. Cost-of-illness studies based on massive data: a prevalence-based, top-down regression approach.

    Science.gov (United States)

    Stollenwerk, Björn; Welchowski, Thomas; Vogl, Matthias; Stock, Stephanie

    2016-04-01

    Despite the increasing availability of routine data, no analysis method has yet been presented for cost-of-illness (COI) studies based on massive data. We aim, first, to present such a method and, second, to assess the relevance of the associated gain in numerical efficiency. We propose a prevalence-based, top-down regression approach consisting of five steps: aggregating the data; fitting a generalized additive model (GAM); predicting costs via the fitted GAM; comparing predicted costs between prevalent and non-prevalent subjects; and quantifying the stochastic uncertainty via error propagation. To demonstrate the method, it was applied to aggregated data in the context of chronic lung disease to German sickness funds data (from 1999), covering over 7.3 million insured. To assess the gain in numerical efficiency, the computational time of the innovative approach has been compared with corresponding GAMs applied to simulated individual-level data. Furthermore, the probability of model failure was modeled via logistic regression. Applying the innovative method was reasonably fast (19 min). In contrast, regarding patient-level data, computational time increased disproportionately by sample size. Furthermore, using patient-level data was accompanied by a substantial risk of model failure (about 80 % for 6 million subjects). The gain in computational efficiency of the innovative COI method seems to be of practical relevance. Furthermore, it may yield more precise cost estimates.

  13. Validity of the reduced-sample insulin modified frequently-sampled intravenous glucose tolerance test using the nonlinear regression approach.

    Science.gov (United States)

    Sumner, Anne E; Luercio, Marcella F; Frempong, Barbara A; Ricks, Madia; Sen, Sabyasachi; Kushner, Harvey; Tulloch-Reid, Marshall K

    2009-02-01

    The disposition index, the product of the insulin sensitivity index (S(I)) and the acute insulin response to glucose, is linked in African Americans to chromosome 11q. This link was determined with S(I) calculated with the nonlinear regression approach to the minimal model and data from the reduced-sample insulin-modified frequently-sampled intravenous glucose tolerance test (Reduced-Sample-IM-FSIGT). However, the application of the nonlinear regression approach to calculate S(I) using data from the Reduced-Sample-IM-FSIGT has been challenged as being not only inaccurate but also having a high failure rate in insulin-resistant subjects. Our goal was to determine the accuracy and failure rate of the Reduced-Sample-IM-FSIGT using the nonlinear regression approach to the minimal model. With S(I) from the Full-Sample-IM-FSIGT considered the standard and using the nonlinear regression approach to the minimal model, we compared the agreement between S(I) from the Full- and Reduced-Sample-IM-FSIGT protocols. One hundred African Americans (body mass index, 31.3 +/- 7.6 kg/m(2) [mean +/- SD]; range, 19.0-56.9 kg/m(2)) had FSIGTs. Glucose (0.3 g/kg) was given at baseline. Insulin was infused from 20 to 25 minutes (total insulin dose, 0.02 U/kg). For the Full-Sample-IM-FSIGT, S(I) was calculated based on the glucose and insulin samples taken at -1, 1, 2, 3, 4, 5, 6, 7, 8,10, 12, 14, 16, 19, 22, 23, 24, 25, 27, 30, 40, 50, 60, 70, 80, 90, 100, 120, 150, and 180 minutes. For the Reduced-Sample-FSIGT, S(I) was calculated based on the time points that appear in bold. Agreement was determined by Spearman correlation, concordance, and the Bland-Altman method. In addition, for both protocols, the population was divided into tertiles of S(I). Insulin resistance was defined by the lowest tertile of S(I) from the Full-Sample-IM-FSIGT. The distribution of subjects across tertiles was compared by rank order and kappa statistic. We found that the rate of failure of resolution of S(I) by

  14. Estimating aboveground tree biomass on forest land in the Pacific Northwest: a comparison of approaches

    Science.gov (United States)

    Xiaoping Zhou; Miles A. Hemstrom

    2009-01-01

    Live tree biomass estimates are essential for carbon accounting, bioenergy feasibility studies, and other analyses. Several models are currently used for estimating tree biomass. Each of these incorporates different calculation methods that may significantly impact the estimates of total aboveground tree biomass, merchantable biomass, and carbon pools. Consequently,...

  15. On Directed Edge-Disjoint Spanning Trees in Product Networks, An Algorithmic Approach

    Directory of Open Access Journals (Sweden)

    A.R. Touzene

    2014-12-01

    Full Text Available In (Ku et al. 2003, the authors have proposed a construction of edge-disjoint spanning trees EDSTs in undirected product networks. Their construction method focuses more on showing the existence of a maximum number (n1+n2-1 of EDSTs in product network of two graphs, where factor graphs have respectively n1 and n2 EDSTs. In this paper, we propose a new systematic and algorithmic approach to construct (n1+n2 directed routed EDST in the product networks. The direction of an edge is added to support bidirectional links in interconnection networks. Our EDSTs can be used straightforward to develop efficient collective communication algorithms for both models store-and-forward and wormhole.

  16. Unequal Probability Marking Approach to Enhance Security of Traceback Scheme in Tree-Based WSNs.

    Science.gov (United States)

    Huang, Changqin; Ma, Ming; Liu, Xiao; Liu, Anfeng; Zuo, Zhengbang

    2017-06-17

    Fog (from core to edge) computing is a newly emerging computing platform, which utilizes a large number of network devices at the edge of a network to provide ubiquitous computing, thus having great development potential. However, the issue of security poses an important challenge for fog computing. In particular, the Internet of Things (IoT) that constitutes the fog computing platform is crucial for preserving the security of a huge number of wireless sensors, which are vulnerable to attack. In this paper, a new unequal probability marking approach is proposed to enhance the security performance of logging and migration traceback (LM) schemes in tree-based wireless sensor networks (WSNs). The main contribution of this paper is to overcome the deficiency of the LM scheme that has a higher network lifetime and large storage space. In the unequal probability marking logging and migration (UPLM) scheme of this paper, different marking probabilities are adopted for different nodes according to their distances to the sink. A large marking probability is assigned to nodes in remote areas (areas at a long distance from the sink), while a small marking probability is applied to nodes in nearby area (areas at a short distance from the sink). This reduces the consumption of storage and energy in addition to enhancing the security performance, lifetime, and storage capacity. Marking information will be migrated to nodes at a longer distance from the sink for increasing the amount of stored marking information, thus enhancing the security performance in the process of migration. The experimental simulation shows that for general tree-based WSNs, the UPLM scheme proposed in this paper can store 1.12-1.28 times the amount of stored marking information that the equal probability marking approach achieves, and has 1.15-1.26 times the storage utilization efficiency compared with other schemes.

  17. An Introduction to the Hybrid Approach of Neural Networks and the Linear Regression Model : An Illustration in the Hedonic Pricing Model of Building Costs

    OpenAIRE

    浅野, 美代子; マーコ, ユー K.W.

    2007-01-01

    This paper introduces the hybrid approach of neural networks and linear regression model proposed by Asano and Tsubaki (2003). Neural networks are often credited with its superiority in data consistency whereas the linear regression model provides simple interpretation of the data enabling researchers to verify their hypotheses. The hybrid approach aims at combing the strengths of these two well-established statistical methods. A step-by-step procedure for performing the hybrid approach is pr...

  18. Hybrid regression trees applied to the monitoring of dynamic safety of isolated networks with large eolic production contribution; Utilizacao de arvores de regressao hibridas na monitorizacao da seguranca dinamica de redes isoladas com grande producao eolica

    Energy Technology Data Exchange (ETDEWEB)

    Lopes, J.A Pecas; Vasconcelos, Maria Helena O.P. de [Instituto de Engenharia de Sistemas e Computadores (INESC), Porto (Portugal). E-mail: jpl@riff.fe.up.pt; hvasconcelos@inescn.pt

    1999-07-01

    This paper describes in a synthetic manner the technology adopted to define structures used in the fast evaluation of dynamic safety of isolated network with high level of eolic production contribution. This methodology uses hybrid regression trees, which allows the quantification the endurance connected to the dynamic behavior of these networks by emulating the frequency minimum deviation that will be experienced by the system when submitted toa pre-defined perturbation. Also, new procedures for data automatic generation are presented, which will be used for construction and measurements of the evaluation structures performance. The paper describes the Terceira island - Acores archipelago network study case.

  19. Decision trees in epidemiological research

    Directory of Open Access Journals (Sweden)

    Ashwini Venkatasubramaniam

    2017-09-01

    Full Text Available Abstract Background In many studies, it is of interest to identify population subgroups that are relatively homogeneous with respect to an outcome. The nature of these subgroups can provide insight into effect mechanisms and suggest targets for tailored interventions. However, identifying relevant subgroups can be challenging with standard statistical methods. Main text We review the literature on decision trees, a family of techniques for partitioning the population, on the basis of covariates, into distinct subgroups who share similar values of an outcome variable. We compare two decision tree methods, the popular Classification and Regression tree (CART technique and the newer Conditional Inference tree (CTree technique, assessing their performance in a simulation study and using data from the Box Lunch Study, a randomized controlled trial of a portion size intervention. Both CART and CTree identify homogeneous population subgroups and offer improved prediction accuracy relative to regression-based approaches when subgroups are truly present in the data. An important distinction between CART and CTree is that the latter uses a formal statistical hypothesis testing framework in building decision trees, which simplifies the process of identifying and interpreting the final tree model. We also introduce a novel way to visualize the subgroups defined by decision trees. Our novel graphical visualization provides a more scientifically meaningful characterization of the subgroups identified by decision trees. Conclusions Decision trees are a useful tool for identifying homogeneous subgroups defined by combinations of individual characteristics. While all decision tree techniques generate subgroups, we advocate the use of the newer CTree technique due to its simplicity and ease of interpretation.

  20. Decision trees in epidemiological research.

    Science.gov (United States)

    Venkatasubramaniam, Ashwini; Wolfson, Julian; Mitchell, Nathan; Barnes, Timothy; JaKa, Meghan; French, Simone

    2017-01-01

    In many studies, it is of interest to identify population subgroups that are relatively homogeneous with respect to an outcome. The nature of these subgroups can provide insight into effect mechanisms and suggest targets for tailored interventions. However, identifying relevant subgroups can be challenging with standard statistical methods. We review the literature on decision trees, a family of techniques for partitioning the population, on the basis of covariates, into distinct subgroups who share similar values of an outcome variable. We compare two decision tree methods, the popular Classification and Regression tree (CART) technique and the newer Conditional Inference tree (CTree) technique, assessing their performance in a simulation study and using data from the Box Lunch Study, a randomized controlled trial of a portion size intervention. Both CART and CTree identify homogeneous population subgroups and offer improved prediction accuracy relative to regression-based approaches when subgroups are truly present in the data. An important distinction between CART and CTree is that the latter uses a formal statistical hypothesis testing framework in building decision trees, which simplifies the process of identifying and interpreting the final tree model. We also introduce a novel way to visualize the subgroups defined by decision trees. Our novel graphical visualization provides a more scientifically meaningful characterization of the subgroups identified by decision trees. Decision trees are a useful tool for identifying homogeneous subgroups defined by combinations of individual characteristics. While all decision tree techniques generate subgroups, we advocate the use of the newer CTree technique due to its simplicity and ease of interpretation.

  1. Comparison of exact, efron and breslow parameter approach method on hazard ratio and stratified cox regression model

    Science.gov (United States)

    Fatekurohman, Mohamat; Nurmala, Nita; Anggraeni, Dian

    2018-04-01

    Lungs are the most important organ, in the case of respiratory system. Problems related to disorder of the lungs are various, i.e. pneumonia, emphysema, tuberculosis and lung cancer. Comparing all those problems, lung cancer is the most harmful. Considering about that, the aim of this research applies survival analysis and factors affecting the endurance of the lung cancer patient using comparison of exact, Efron and Breslow parameter approach method on hazard ratio and stratified cox regression model. The data applied are based on the medical records of lung cancer patients in Jember Paru-paru hospital on 2016, east java, Indonesia. The factors affecting the endurance of the lung cancer patients can be classified into several criteria, i.e. sex, age, hemoglobin, leukocytes, erythrocytes, sedimentation rate of blood, therapy status, general condition, body weight. The result shows that exact method of stratified cox regression model is better than other. On the other hand, the endurance of the patients is affected by their age and the general conditions.

  2. Regression-based approach for testing the association between multi-region haplotype configuration and complex trait

    Directory of Open Access Journals (Sweden)

    Zhao Hongbo

    2009-09-01

    Full Text Available Abstract Background It is quite common that the genetic architecture of complex traits involves many genes and their interactions. Therefore, dealing with multiple unlinked genomic regions simultaneously is desirable. Results In this paper we develop a regression-based approach to assess the interactions of haplotypes that belong to different unlinked regions, and we use score statistics to test the null hypothesis of non-genetic association. Additionally, multiple marker combinations at each unlinked region are considered. The multiple tests are settled via the minP approach. The P value of the "best" multi-region multi-marker configuration is corrected via Monte-Carlo simulations. Through simulation studies, we assess the performance of the proposed approach and demonstrate its validity and power in testing for haplotype interaction association. Conclusion Our simulations showed that, for binary trait without covariates, our proposed methods prove to be equal and even more powerful than htr and hapcc which are part of the FAMHAP program. Additionally, our model can be applied to a wider variety of traits and allow adjustment for other covariates. To test the validity, our methods are applied to analyze the association between four unlinked candidate genes and pig meat quality.

  3. SEMI-AUTOMATED APPROACH FOR MAPPING URBAN TREES FROM INTEGRATED AERIAL LiDAR POINT CLOUD AND DIGITAL IMAGERY DATASETS

    Directory of Open Access Journals (Sweden)

    M. A. Dogon-Yaro

    2016-09-01

    Full Text Available Mapping of trees plays an important role in modern urban spatial data management, as many benefits and applications inherit from this detailed up-to-date data sources. Timely and accurate acquisition of information on the condition of urban trees serves as a tool for decision makers to better appreciate urban ecosystems and their numerous values which are critical to building up strategies for sustainable development. The conventional techniques used for extracting trees include ground surveying and interpretation of the aerial photography. However, these techniques are associated with some constraints, such as labour intensive field work and a lot of financial requirement which can be overcome by means of integrated LiDAR and digital image datasets. Compared to predominant studies on trees extraction mainly in purely forested areas, this study concentrates on urban areas, which have a high structural complexity with a multitude of different objects. This paper presented a workflow about semi-automated approach for extracting urban trees from integrated processing of airborne based LiDAR point cloud and multispectral digital image datasets over Istanbul city of Turkey. The paper reveals that the integrated datasets is a suitable technology and viable source of information for urban trees management. As a conclusion, therefore, the extracted information provides a snapshot about location, composition and extent of trees in the study area useful to city planners and other decision makers in order to understand how much canopy cover exists, identify new planting, removal, or reforestation opportunities and what locations have the greatest need or potential to maximize benefits of return on investment. It can also help track trends or changes to the urban trees over time and inform future management decisions.

  4. Semi-Automated Approach for Mapping Urban Trees from Integrated Aerial LiDAR Point Cloud and Digital Imagery Datasets

    Science.gov (United States)

    Dogon-Yaro, M. A.; Kumar, P.; Rahman, A. Abdul; Buyuksalih, G.

    2016-09-01

    Mapping of trees plays an important role in modern urban spatial data management, as many benefits and applications inherit from this detailed up-to-date data sources. Timely and accurate acquisition of information on the condition of urban trees serves as a tool for decision makers to better appreciate urban ecosystems and their numerous values which are critical to building up strategies for sustainable development. The conventional techniques used for extracting trees include ground surveying and interpretation of the aerial photography. However, these techniques are associated with some constraints, such as labour intensive field work and a lot of financial requirement which can be overcome by means of integrated LiDAR and digital image datasets. Compared to predominant studies on trees extraction mainly in purely forested areas, this study concentrates on urban areas, which have a high structural complexity with a multitude of different objects. This paper presented a workflow about semi-automated approach for extracting urban trees from integrated processing of airborne based LiDAR point cloud and multispectral digital image datasets over Istanbul city of Turkey. The paper reveals that the integrated datasets is a suitable technology and viable source of information for urban trees management. As a conclusion, therefore, the extracted information provides a snapshot about location, composition and extent of trees in the study area useful to city planners and other decision makers in order to understand how much canopy cover exists, identify new planting, removal, or reforestation opportunities and what locations have the greatest need or potential to maximize benefits of return on investment. It can also help track trends or changes to the urban trees over time and inform future management decisions.

  5. Characterization of particulate matter deposited on urban tree foliage: A landscape analysis approach

    Science.gov (United States)

    Lin, Lin; Yan, Jingli; Ma, Keming; Zhou, Weiqi; Chen, Guojian; Tang, Rongli; Zhang, Yuxin

    2017-12-01

    Plants can mitigate ambient particulate matter by cleaning the air, which is crucial to urban environments. A novel approach was presented to quantitatively characterize particulate matter deposited on urban tree foliage. This approach could accurately quantify the number, size, shape, and spatial distribution of particles with different diameters on leaves. Spatial distribution is represented by proximity, which measures the closeness of particles. We sampled three common broadleaf species and obtained images through field emission scanning electron microscopy. We conducted the object-based method to extract particles from images. We then used Fragstats to analyze the landscape characteristics of these particles in term of selected metrics. Results reveal that Salix matsudana is more efficient than Ailanthus altissima and Fraxinus chinensis in terms of the number and area of particles per unit area and the proportion of fine particulate matter. The shape complexity of the particles increases with their size. Among the three species, S. matsudana and A. altissima particles respectively yield the highest and lowest proximity. PM1 in A. altissima and PM10 in F. chinensis and S. matsudana show the highest proximity, which may influence subsequent particle retention. S. matsudana should be generally considered to collect additional small particles. Different species and particle sizes exhibit various proximities, which should be further examined to elucidate the underlying mechanism.

  6. A reduction approach to improve the quantification of linked fault trees through binary decision diagrams

    International Nuclear Information System (INIS)

    Ibanez-Llano, Cristina; Rauzy, Antoine; Melendez, Enrique; Nieto, Francisco

    2010-01-01

    Over the last two decades binary decision diagrams have been applied successfully to improve Boolean reliability models. Conversely to the classical approach based on the computation of the MCS, the BDD approach involves no approximation in the quantification of the model and is able to handle correctly negative logic. However, when models are sufficiently large and complex, as for example the ones coming from the PSA studies of the nuclear industry, it begins to be unfeasible to compute the BDD within a reasonable amount of time and computer memory. Therefore, simplification or reduction of the full model has to be considered in some way to adapt the application of the BDD technology to the assessment of such models in practice. This paper proposes a reduction process based on using information provided by the set of the most relevant minimal cutsets of the model in order to perform the reduction directly on it. This allows controlling the degree of reduction and therefore the impact of such simplification on the final quantification results. This reduction is integrated in an incremental procedure that is compatible with the dynamic generation of the event trees and therefore adaptable to the recent dynamic developments and extensions of the PSA studies. The proposed method has been applied to a real case study, and the results obtained confirm that the reduction enables the BDD computation while maintaining accuracy.

  7. A reduction approach to improve the quantification of linked fault trees through binary decision diagrams

    Energy Technology Data Exchange (ETDEWEB)

    Ibanez-Llano, Cristina, E-mail: cristina.ibanez@iit.upcomillas.e [Instituto de Investigacion Tecnologica (IIT), Escuela Tecnica Superior de Ingenieria ICAI, Universidad Pontificia Comillas, C/Santa Cruz de Marcenado 26, 28015 Madrid (Spain); Rauzy, Antoine, E-mail: Antoine.RAUZY@3ds.co [Dassault Systemes, 10 rue Marcel Dassault CS 40501, 78946 Velizy Villacoublay, Cedex (France); Melendez, Enrique, E-mail: ema@csn.e [Consejo de Seguridad Nuclear (CSN), C/Justo Dorado 11, 28040 Madrid (Spain); Nieto, Francisco, E-mail: nieto@iit.upcomillas.e [Instituto de Investigacion Tecnologica (IIT), Escuela Tecnica Superior de Ingenieria ICAI, Universidad Pontificia Comillas, C/Santa Cruz de Marcenado 26, 28015 Madrid (Spain)

    2010-12-15

    Over the last two decades binary decision diagrams have been applied successfully to improve Boolean reliability models. Conversely to the classical approach based on the computation of the MCS, the BDD approach involves no approximation in the quantification of the model and is able to handle correctly negative logic. However, when models are sufficiently large and complex, as for example the ones coming from the PSA studies of the nuclear industry, it begins to be unfeasible to compute the BDD within a reasonable amount of time and computer memory. Therefore, simplification or reduction of the full model has to be considered in some way to adapt the application of the BDD technology to the assessment of such models in practice. This paper proposes a reduction process based on using information provided by the set of the most relevant minimal cutsets of the model in order to perform the reduction directly on it. This allows controlling the degree of reduction and therefore the impact of such simplification on the final quantification results. This reduction is integrated in an incremental procedure that is compatible with the dynamic generation of the event trees and therefore adaptable to the recent dynamic developments and extensions of the PSA studies. The proposed method has been applied to a real case study, and the results obtained confirm that the reduction enables the BDD computation while maintaining accuracy.

  8. Investigating the complex relationship between in situ Southern Ocean pCO2 and its ocean physics and biogeochemical drivers using a nonparametric regression approach

    CSIR Research Space (South Africa)

    Pretorius, W

    2014-01-01

    Full Text Available the relationship more accurately in terms of MSE, RMSE and MAE, than a standard parametric approach (multiple linear regression). These results provide a platform for using the developed nonparametric regression model based on in situ measurements to predict p...

  9. Multicasting in Wireless Communications (Ad-Hoc Networks): Comparison against a Tree-Based Approach

    Science.gov (United States)

    Rizos, G. E.; Vasiliadis, D. C.

    2007-12-01

    We examine on-demand multicasting in ad hoc networks. The Core Assisted Mesh Protocol (CAMP) is a well-known protocol for multicast routing in ad-hoc networks, generalizing the notion of core-based trees employed for internet multicasting into multicast meshes that have much richer connectivity than trees. On the other hand, wireless tree-based multicast routing protocols use much simpler structures for determining route paths, using only parent-child relationships. In this work, we compare the performance of the CAMP protocol against the performance of wireless tree-based multicast routing protocols, in terms of two important factors, namely packet delay and ratio of dropped packets.

  10. Optimization and analysis of decision trees and rules: Dynamic programming approach

    KAUST Repository

    Alkhalid, Abdulaziz

    2013-08-01

    This paper is devoted to the consideration of software system Dagger created in KAUST. This system is based on extensions of dynamic programming. It allows sequential optimization of decision trees and rules relative to different cost functions, derivation of relationships between two cost functions (in particular, between number of misclassifications and depth of decision trees), and between cost and uncertainty of decision trees. We describe features of Dagger and consider examples of this systems work on decision tables from UCI Machine Learning Repository. We also use Dagger to compare 16 different greedy algorithms for decision tree construction. © 2013 Taylor and Francis Group, LLC.

  11. Optimization and analysis of decision trees and rules: Dynamic programming approach

    KAUST Repository

    Alkhalid, Abdulaziz; Amin, Talha M.; Chikalov, Igor; Hussain, Shahid; Moshkov, Mikhail; Zielosko, Beata

    2013-01-01

    This paper is devoted to the consideration of software system Dagger created in KAUST. This system is based on extensions of dynamic programming. It allows sequential optimization of decision trees and rules relative to different cost functions, derivation of relationships between two cost functions (in particular, between number of misclassifications and depth of decision trees), and between cost and uncertainty of decision trees. We describe features of Dagger and consider examples of this systems work on decision tables from UCI Machine Learning Repository. We also use Dagger to compare 16 different greedy algorithms for decision tree construction. © 2013 Taylor and Francis Group, LLC.

  12. A computer-oriented approach to fault-tree construction. Topical report No. 1

    International Nuclear Information System (INIS)

    Chu, B.B.

    1976-11-01

    Fault Tree Analysis is one of the major tools for the safety and reliability analysis of large systems. A methodology for systematically constructing fault trees for general complex systems is developed and applied, via the computer program CAT, to several systems. First, a means of representing component behavior by decision tables is presented. In order to use these tables, a procedure for constructing and editing fault trees, either manually or by computer, is described. In order to verify the methodology the computer program CAT has been developed and used to construct fault trees for two systems

  13. Analysis of effects of manhole covers on motorcycle driver maneuvers: a nonparametric classification tree approach.

    Science.gov (United States)

    Chang, Li-Yen

    2014-01-01

    A manhole cover is a removable plate forming the lid over the opening of a manhole to allow traffic to pass over the manhole and to prevent people from falling in. Because most manhole covers are placed in roadway traffic lanes, if these manhole covers are not appropriately installed or maintained, they can represent unexpected hazards on the road, especially for motorcycle drivers. The objective of this study is to identify the effects of manhole cover characteristics as well as driver factors and traffic and roadway conditions on motorcycle driver maneuvers. A video camera was used to record motorcycle drivers' maneuvers when they encountered an inappropriately installed or maintained manhole cover. Information on 3059 drivers' maneuver decisions was recorded. Classification and regression tree (CART) models were applied to explore factors that can significantly affect motorcycle driver maneuvers when passing a manhole cover. Nearly 50 percent of the motorcycle drivers decelerated or changed their driving path to reduce the effects of the manhole cover. The manhole cover characteristics including the level difference between manhole cover and pavement, the pavement condition over the manhole cover, and the size of the manhole cover can significantly affect motorcycle driver maneuvers. Other factors, including traffic conditions, lane width, motorcycle speed, and loading conditions, also have significant effects on motorcycle driver maneuvers. To reduce the effects and potential risks from the manhole covers, highway authorities not only need to make sure that any newly installed manhole covers are as level as possible but also need to regularly maintain all the manhole covers to ensure that they are in good condition. In the long run, the size of manhole covers should be kept as small as possible so that the impact of manhole covers on motorcycle drivers can be effectively reduced. Supplemental materials are available for this article. Go to the publisher

  14. Assessment of perfusion by dynamic contrast-enhanced imaging using a deconvolution approach based on regression and singular value decomposition.

    Science.gov (United States)

    Koh, T S; Wu, X Y; Cheong, L H; Lim, C C T

    2004-12-01

    The assessment of tissue perfusion by dynamic contrast-enhanced (DCE) imaging involves a deconvolution process. For analysis of DCE imaging data, we implemented a regression approach to select appropriate regularization parameters for deconvolution using the standard and generalized singular value decomposition methods. Monte Carlo simulation experiments were carried out to study the performance and to compare with other existing methods used for deconvolution analysis of DCE imaging data. The present approach is found to be robust and reliable at the levels of noise commonly encountered in DCE imaging, and for different models of the underlying tissue vasculature. The advantages of the present method, as compared with previous methods, include its efficiency of computation, ability to achieve adequate regularization to reproduce less noisy solutions, and that it does not require prior knowledge of the noise condition. The proposed method is applied on actual patient study cases with brain tumors and ischemic stroke, to illustrate its applicability as a clinical tool for diagnosis and assessment of treatment response.

  15. Options Evaluation for Remediation of the Gunnar Site Using a Decision- Tree Approach

    Energy Technology Data Exchange (ETDEWEB)

    Yankovich, Tamara L. [International Atomic Energy Agency, P.O. Box 100, 1400 Vienna (Austria); Hachkowski, Andrea [CH2M Hill Canada Limited, 1305 Kenaston Blvd, Winnipeg, Manitoba, R3P 2P2 (Canada); Klyashtorin, Alexey [Saskatchewan Research Council, 15 Innovation Blvd no.125, Saskatoon, Saskatchewan, S7N 2X8 (Canada)

    2014-07-01

    Current best practice in the nuclear industry involves proactive planning of activities from cradle-to-grave over the entire nuclear life cycle in accordance with national requirements and international guidance. This includes the development of detailed decommissioning plans (DDP) at an early stage to facilitate proactive, responsible decision-making as activities are being planned. It should be noted, however, that the current approach may not be applicable to historic nuclear legacy sites, such as abandoned uranium mines and mills, which had operated in the past under less stringent regulatory regimes. In such cases, records documenting past activities are often not available and monitoring data may not have been collected, thereby limiting knowledge of impacts related to past activities. This can lead to challenges in gaining regulatory and funding approvals related to the remediation of such sites, especially given the costs that can be associated with remediation and the uncertainties in characterizing the existing situation. The Gunnar Site, in northern Saskatchewan, is an example of an abandoned uranium mine/mill site, which was operated between the late 1950's to early 1960's under a different regulatory regime than today. Due to the lack of monitoring data and records for the site, and the corresponding uncertainties, a number of precedent-setting approaches have been developed and applied, as part of the environmental impact assessment (EIA) process. Specifically, unlike traditional environmental assessments for planned and operating facilities, it was not possible to identify a preferred and alternative remedial option. Instead, a step-wise decision-tree approach has been developed to identify all potentially feasible remedial options and to map out key decision points, during the licensing phase of the project (following approval of the environmental assessment), when final remedial options will be selected. The presentation will provide

  16. River flow prediction using hybrid models of support vector regression with the wavelet transform, singular spectrum analysis and chaotic approach

    Science.gov (United States)

    Baydaroğlu, Özlem; Koçak, Kasım; Duran, Kemal

    2018-06-01

    Prediction of water amount that will enter the reservoirs in the following month is of vital importance especially for semi-arid countries like Turkey. Climate projections emphasize that water scarcity will be one of the serious problems in the future. This study presents a methodology for predicting river flow for the subsequent month based on the time series of observed monthly river flow with hybrid models of support vector regression (SVR). Monthly river flow over the period 1940-2012 observed for the Kızılırmak River in Turkey has been used for training the method, which then has been applied for predictions over a period of 3 years. SVR is a specific implementation of support vector machines (SVMs), which transforms the observed input data time series into a high-dimensional feature space (input matrix) by way of a kernel function and performs a linear regression in this space. SVR requires a special input matrix. The input matrix was produced by wavelet transforms (WT), singular spectrum analysis (SSA), and a chaotic approach (CA) applied to the input time series. WT convolutes the original time series into a series of wavelets, and SSA decomposes the time series into a trend, an oscillatory and a noise component by singular value decomposition. CA uses a phase space formed by trajectories, which represent the dynamics producing the time series. These three methods for producing the input matrix for the SVR proved successful, while the SVR-WT combination resulted in the highest coefficient of determination and the lowest mean absolute error.

  17. A new non-invasive approach based on polyhexamethylene biguanide increases the regression rate of HPV infection

    Directory of Open Access Journals (Sweden)

    Gentile Antonio

    2012-09-01

    Full Text Available Abstract Background HPV infection is a worldwide problem strictly linked to the development of cervical cancer. Persistence of the infection is one of the main factors responsible for the invasive progression and women diagnosed with intraepithelial squamous lesions are referred for further assessment and surgical treatments which are prone to complications. Despite this, there are several reports on the spontaneous regression of the infection. This study was carried out to evaluate the effectiveness of a long term polyhexamethylene biguanide (PHMB-based local treatment in improving the viral clearance, reducing the time exposure to the infection and avoiding the complications associated with the invasive treatments currently available. Method 100 women diagnosed with HPV infection were randomly assigned to receive six months of treatment with a PHMB-based gynecological solution (Monogin®, Lo.Li. Pharma, Rome - Italy or to remain untreated for the same period of time. Results A greater number of patients, who received the treatment were cleared of the infection at the two time points of the study (three and six months compared to that of the control group. A significant difference in the regression rate (90% Monogin group vs 70% control group was observed at the end of the study highlighting the time-dependent ability of PHMB to interact with the infection progression. Conclusions The topic treatment with PHMB is a preliminary safe and promising approach for patients with detected HPV infection increasing the chance of clearance and avoiding the use of invasive treatments when not strictly necessary. Trial registration ClinicalTrials.gov Identifier NCT01571141

  18. Predicting the disease of Alzheimer with SNP biomarkers and clinical data using data mining classification approach: decision tree.

    Science.gov (United States)

    Erdoğan, Onur; Aydin Son, Yeşim

    2014-01-01

    Single Nucleotide Polymorphisms (SNPs) are the most common genomic variations where only a single nucleotide differs between individuals. Individual SNPs and SNP profiles associated with diseases can be utilized as biological markers. But there is a need to determine the SNP subsets and patients' clinical data which is informative for the diagnosis. Data mining approaches have the highest potential for extracting the knowledge from genomic datasets and selecting the representative SNPs as well as most effective and informative clinical features for the clinical diagnosis of the diseases. In this study, we have applied one of the widely used data mining classification methodology: "decision tree" for associating the SNP biomarkers and significant clinical data with the Alzheimer's disease (AD), which is the most common form of "dementia". Different tree construction parameters have been compared for the optimization, and the most accurate tree for predicting the AD is presented.

  19. Selection bias in species distribution models: An econometric approach on forest trees based on structural modeling

    Science.gov (United States)

    Martin-StPaul, N. K.; Ay, J. S.; Guillemot, J.; Doyen, L.; Leadley, P.

    2014-12-01

    Species distribution models (SDMs) are widely used to study and predict the outcome of global changes on species. In human dominated ecosystems the presence of a given species is the result of both its ecological suitability and human footprint on nature such as land use choices. Land use choices may thus be responsible for a selection bias in the presence/absence data used in SDM calibration. We present a structural modelling approach (i.e. based on structural equation modelling) that accounts for this selection bias. The new structural species distribution model (SSDM) estimates simultaneously land use choices and species responses to bioclimatic variables. A land use equation based on an econometric model of landowner choices was joined to an equation of species response to bioclimatic variables. SSDM allows the residuals of both equations to be dependent, taking into account the possibility of shared omitted variables and measurement errors. We provide a general description of the statistical theory and a set of applications on forest trees over France using databases of climate and forest inventory at different spatial resolution (from 2km to 8km). We also compared the outputs of the SSDM with outputs of a classical SDM (i.e. Biomod ensemble modelling) in terms of bioclimatic response curves and potential distributions under current climate and climate change scenarios. The shapes of the bioclimatic response curves and the modelled species distribution maps differed markedly between SSDM and classical SDMs, with contrasted patterns according to species and spatial resolutions. The magnitude and directions of these differences were dependent on the correlations between the errors from both equations and were highest for higher spatial resolutions. A first conclusion is that the use of classical SDMs can potentially lead to strong miss-estimation of the actual and future probability of presence modelled. Beyond this selection bias, the SSDM we propose represents

  20. A watershed modeling approach to streamflow reconstruction from tree-ring records

    International Nuclear Information System (INIS)

    Saito, Laurel; Biondi, Franco; Salas, Jose D; Panorska, Anna K; Kozubowski, Tomasz J

    2008-01-01

    Insight into long-term changes of streamflow is critical for addressing implications of global warming for sustainable water management. To date, dendrohydrologists have employed sophisticated regression techniques to extend runoff records, but this empirical approach cannot directly test the influence of watershed factors that alter streamflow independently of climate. We designed a mechanistic watershed model to calculate streamflows at annual timescales using as few inputs as possible. The model was calibrated for upper reaches of the Walker River, which straddles the boundary between the Sierra Nevada of California and the Great Basin of Nevada. Even though the model incorporated simplified relationships between precipitation and other components of the hydrologic cycle, it predicted water year streamflows with correlations of 0.87 when appropriate precipitation values were used

  1. An object-based approach for tree species extraction from digital orthophoto maps

    Science.gov (United States)

    Jamil, Akhtar; Bayram, Bulent

    2018-05-01

    Tree segmentation is an active and ongoing research area in the field of photogrammetry and remote sensing. It is more challenging due to both intra-class and inter-class similarities among various tree species. In this study, we exploited various statistical features for extraction of hazelnut trees from 1 : 5000 scaled digital orthophoto maps. Initially, the non-vegetation areas were eliminated using traditional normalized difference vegetation index (NDVI) followed by application of mean shift segmentation for transforming the pixels into meaningful homogeneous objects. In order to eliminate false positives, morphological opening and closing was employed on candidate objects. A number of heuristics were also derived to eliminate unwanted effects such as shadow and bounding box aspect ratios, before passing them into the classification stage. Finally, a knowledge based decision tree was constructed to distinguish the hazelnut trees from rest of objects which include manmade objects and other type of vegetation. We evaluated the proposed methodology on 10 sample orthophoto maps obtained from Giresun province in Turkey. The manually digitized hazelnut tree boundaries were taken as reference data for accuracy assessment. Both manually digitized and segmented tree borders were converted into binary images and the differences were calculated. According to the obtained results, the proposed methodology obtained an overall accuracy of more than 85 % for all sample images.

  2. How do urban households in China respond to increasing block pricing in electricity? Evidence from a fuzzy regression discontinuity approach

    International Nuclear Information System (INIS)

    Zhang, Zibin; Cai, Wenxin; Feng, Xiangzhao

    2017-01-01

    China is the largest electricity consumption country after it has passed the United States in 2011. Residential electricity consumption in China grew by 381.35% (12.85% per annum) between 2000 and 2013. In order to deal with rapid growth in residential electricity consumption, an increasing block pricing policy was introduced for residential electricity consumers in China on July 1st, 2012. Using difference-in-differences models with a fuzzy regression discontinuity design, we estimate a causal effect of price on electricity consumption for urban households during the introduction of increasing block pricing policy in Guangdong province of China. We find that consumers do not respond to a smaller (approximately 8%) increase in marginal price. However, consumers do respond to a larger increase in marginal price. An approximately 40% increase in marginal price induces an approximately 35% decrease in electricity use (284 kW h per month). Our results suggest that although the increasing block pricing could affect the behavior of households with higher electricity use, there is only a limit potential to overall energy conservation. - Highlights: • Estimate electricity consumption changes in response to the IBP in China. • Employ quasi-experimental approach and micro household level data in China. • Households do not respond to a smaller increase in marginal price. • 40% increase in marginal price induces a 35% decrease in electricity use.

  3. A Short-Term and High-Resolution System Load Forecasting Approach Using Support Vector Regression with Hybrid Parameters Optimization

    Energy Technology Data Exchange (ETDEWEB)

    Jiang, Huaiguang [National Renewable Energy Laboratory (NREL), Golden, CO (United States)

    2017-08-25

    This work proposes an approach for distribution system load forecasting, which aims to provide highly accurate short-term load forecasting with high resolution utilizing a support vector regression (SVR) based forecaster and a two-step hybrid parameters optimization method. Specifically, because the load profiles in distribution systems contain abrupt deviations, a data normalization is designed as the pretreatment for the collected historical load data. Then an SVR model is trained by the load data to forecast the future load. For better performance of SVR, a two-step hybrid optimization algorithm is proposed to determine the best parameters. In the first step of the hybrid optimization algorithm, a designed grid traverse algorithm (GTA) is used to narrow the parameters searching area from a global to local space. In the second step, based on the result of the GTA, particle swarm optimization (PSO) is used to determine the best parameters in the local parameter space. After the best parameters are determined, the SVR model is used to forecast the short-term load deviation in the distribution system.

  4. Impact of a New Law to Reduce the Legal Blood Alcohol Concentration Limit - A Poisson Regression Analysis and Descriptive Approach.

    Science.gov (United States)

    Nistal-Nuño, Beatriz

    2017-03-31

    In Chile, a new law introduced in March 2012 lowered the blood alcohol concentration (BAC) limit for impaired drivers from 0.1% to 0.08% and the BAC limit for driving under the influence of alcohol from 0.05% to 0.03%, but its effectiveness remains uncertain. The goal of this investigation was to evaluate the effects of this enactment on road traffic injuries and fatalities in Chile. A retrospective cohort study. Data were analyzed using a descriptive and a Generalized Linear Models approach, type of Poisson regression, to analyze deaths and injuries in a series of additive Log-Linear Models accounting for the effects of law implementation, month influence, a linear time trend and population exposure. A review of national databases in Chile was conducted from 2003 to 2014 to evaluate the monthly rates of traffic fatalities and injuries associated to alcohol and in total. It was observed a decrease by 28.1 percent in the monthly rate of traffic fatalities related to alcohol as compared to before the law (Plaw (Plaw implemented in 2012 in Chile. Chile experienced a significant reduction in alcohol-related traffic fatalities and injuries, being a successful public health intervention.

  5. Prediction of Currency Volume Issued in Taiwan Using a Hybrid Artificial Neural Network and Multiple Regression Approach

    Directory of Open Access Journals (Sweden)

    Yuehjen E. Shao

    2013-01-01

    Full Text Available Because the volume of currency issued by a country always affects its interest rate, price index, income levels, and many other important macroeconomic variables, the prediction of currency volume issued has attracted considerable attention in recent years. In contrast to the typical single-stage forecast model, this study proposes a hybrid forecasting approach to predict the volume of currency issued in Taiwan. The proposed hybrid models consist of artificial neural network (ANN and multiple regression (MR components. The MR component of the hybrid models is established for a selection of fewer explanatory variables, wherein the selected variables are of higher importance. The ANN component is then designed to generate forecasts based on those important explanatory variables. Subsequently, the model is used to analyze a real dataset of Taiwan's currency from 1996 to 2011 and twenty associated explanatory variables. The prediction results reveal that the proposed hybrid scheme exhibits superior forecasting performance for predicting the volume of currency issued in Taiwan.

  6. Seemingly Unrelated Regression Approach for GSTARIMA Model to Forecast Rain Fall Data in Malang Southern Region Districts

    Directory of Open Access Journals (Sweden)

    Siti Choirun Nisak

    2016-06-01

    Full Text Available Time series forecasting models can be used to predict phenomena that occur in nature. Generalized Space Time Autoregressive (GSTAR is one of time series model used to forecast the data consisting the elements of time and space. This model is limited to the stationary and non-seasonal data. Generalized Space Time Autoregressive Integrated Moving Average (GSTARIMA is GSTAR development model that accommodates the non-stationary and seasonal data. Ordinary Least Squares (OLS is method used to estimate parameter of GSTARIMA model. Estimation parameter of GSTARIMA model using OLS will not produce efficiently estimator if there is an error correlation between spaces. Ordinary Least Square (OLS assumes the variance-covariance matrix has a constant error ~(, but in fact, the observatory spaces are correlated so that variance-covariance matrix of the error is not constant. Therefore, Seemingly Unrelated Regression (SUR approach is used to accommodate the weakness of the OLS. SUR assumption is ~(, for estimating parameters GSTARIMA model. The method to estimate parameter of SUR is Generalized Least Square (GLS. Applications GSTARIMA-SUR models for rainfall data in the region Malang obtained GSTARIMA models ((1(1,12,36,(0,(1-SUR with determination coefficient generated with the average of 57.726%.

  7. Classification and Compression of Multi-Resolution Vectors: A Tree Structured Vector Quantizer Approach

    Science.gov (United States)

    2002-01-01

    their expression profile and for classification of cells into tumerous and non- tumerous classes. Then we will present a parallel tree method for... cancerous cells. We will use the same dataset and use tree structured classifiers with multi-resolution analysis for classifying cancerous from non- cancerous ...cells. We have the expressions of 4096 genes from 98 different cell types. Of these 98, 72 are cancerous while 26 are non- cancerous . We are interested

  8. Urban tree growth modeling

    Science.gov (United States)

    E. Gregory McPherson; Paula J. Peper

    2012-01-01

    This paper describes three long-term tree growth studies conducted to evaluate tree performance because repeated measurements of the same trees produce critical data for growth model calibration and validation. Several empirical and process-based approaches to modeling tree growth are reviewed. Modeling is more advanced in the fields of forestry and...

  9. A novel approach to internal crown characterization for coniferous tree species classification

    Science.gov (United States)

    Harikumar, A.; Bovolo, F.; Bruzzone, L.

    2016-10-01

    The knowledge about individual trees in forest is highly beneficial in forest management. High density small foot- print multi-return airborne Light Detection and Ranging (LiDAR) data can provide a very accurate information about the structural properties of individual trees in forests. Every tree species has a unique set of crown structural characteristics that can be used for tree species classification. In this paper, we use both the internal and external crown structural information of a conifer tree crown, derived from a high density small foot-print multi-return LiDAR data acquisition for species classification. Considering the fact that branches are the major building blocks of a conifer tree crown, we obtain the internal crown structural information using a branch level analysis. The structure of each conifer branch is represented using clusters in the LiDAR point cloud. We propose the joint use of the k-means clustering and geometric shape fitting, on the LiDAR data projected onto a novel 3-dimensional space, to identify branch clusters. After mapping the identified clusters back to the original space, six internal geometric features are estimated using a branch-level analysis. The external crown characteristics are modeled by using six least correlated features based on cone fitting and convex hull. Species classification is performed using a sparse Support Vector Machines (sparse SVM) classifier.

  10. Automated delineation and characterization of drumlins using a localized contour tree approach

    Science.gov (United States)

    Wang, Shujie; Wu, Qiusheng; Ward, Dylan

    2017-10-01

    Drumlins are ubiquitous landforms in previously glaciated regions, formed through a series of complex subglacial processes operating underneath the paleo-ice sheets. Accurate delineation and characterization of drumlins are essential for understanding the formation mechanism of drumlins as well as the flow behaviors and basal conditions of paleo-ice sheets. Automated mapping of drumlins is particularly important for examining the distribution patterns of drumlins across large spatial scales. This paper presents an automated vector-based approach to mapping drumlins from high-resolution light detection and ranging (LiDAR) data. The rationale is to extract a set of concentric contours by building localized contour trees and establishing topological relationships. This automated method can overcome the shortcomings of previously manual and automated methods for mapping drumlins, for instance, the azimuthal biases during the generation of shaded relief images. A case study was carried out over a portion of the New York Drumlin Field. Overall 1181 drumlins were identified from the LiDAR-derived DEM across the study region, which had been underestimated in previous literature. The delineation results were visually and statistically compared to the manual digitization results. The morphology of drumlins was characterized by quantifying the length, width, elongation ratio, height, area, and volume. Statistical and spatial analyses were conducted to examine the distribution pattern and spatial variability of drumlin size and form. The drumlins and the morphologic characteristics exhibit significant spatial clustering rather than randomly distributed patterns. The form of drumlins varies from ovoid to spindle shapes towards the downstream direction of paleo ice flows, along with the decrease in width, area, and volume. This observation is in line with previous studies, which may be explained by the variations in sediment thickness and/or the velocity increases of ice flows

  11. Regression Phalanxes

    OpenAIRE

    Zhang, Hongyang; Welch, William J.; Zamar, Ruben H.

    2017-01-01

    Tomal et al. (2015) introduced the notion of "phalanxes" in the context of rare-class detection in two-class classification problems. A phalanx is a subset of features that work well for classification tasks. In this paper, we propose a different class of phalanxes for application in regression settings. We define a "Regression Phalanx" - a subset of features that work well together for prediction. We propose a novel algorithm which automatically chooses Regression Phalanxes from high-dimensi...

  12. Reconstruction of Local Sea Levels at South West Pacific Islands—A Multiple Linear Regression Approach (1988-2014)

    Science.gov (United States)

    Kumar, V.; Melet, A.; Meyssignac, B.; Ganachaud, A.; Kessler, W. S.; Singh, A.; Aucan, J.

    2018-02-01

    Rising sea levels are a critical concern in small island nations. The problem is especially serious in the western south Pacific, where the total sea level rise over the last 60 years has been up to 3 times the global average. In this study, we aim at reconstructing sea levels at selected sites in the region (Suva, Lautoka—Fiji, and Nouméa—New Caledonia) as a multilinear regression (MLR) of atmospheric and oceanic variables. We focus on sea level variability at interannual-to-interdecadal time scales, and trend over the 1988-2014 period. Local sea levels are first expressed as a sum of steric and mass changes. Then a dynamical approach is used based on wind stress curl as a proxy for the thermosteric component, as wind stress curl anomalies can modulate the thermocline depth and resultant sea levels via Rossby wave propagation. Statistically significant predictors among wind stress curl, halosteric sea level, zonal/meridional wind stress components, and sea surface temperature are used to construct a MLR model simulating local sea levels. Although we are focusing on the local scale, the global mean sea level needs to be adjusted for. Our reconstructions provide insights on key drivers of sea level variability at the selected sites, showing that while local dynamics and the global signal modulate sea level to a given extent, most of the variance is driven by regional factors. On average, the MLR model is able to reproduce 82% of the variance in island sea level, and could be used to derive local sea level projections via downscaling of climate models.

  13. Species trees from consensus single nucleotide polymorphism (SNP) data: Testing phylogenetic approaches with simulated and empirical data.

    Science.gov (United States)

    Schmidt-Lebuhn, Alexander N; Aitken, Nicola C; Chuah, Aaron

    2017-11-01

    Datasets of hundreds or thousands of SNPs (Single Nucleotide Polymorphisms) from multiple individuals per species are increasingly used to study population structure, species delimitation and shallow phylogenetics. The principal software tool to infer species or population trees from SNP data is currently the BEAST template SNAPP which uses a Bayesian coalescent analysis. However, it is computationally extremely demanding and tolerates only small amounts of missing data. We used simulated and empirical SNPs from plants (Australian Craspedia, Asteraceae, and Pelargonium, Geraniaceae) to compare species trees produced (1) by SNAPP, (2) using SVD quartets, and (3) using Bayesian and parsimony analysis with several different approaches to summarising data from multiple samples into one set of traits per species. Our aims were to explore the impact of tree topology and missing data on the results, and to test which data summarising and analyses approaches would best approximate the results obtained from SNAPP for empirical data. SVD quartets retrieved the correct topology from simulated data, as did SNAPP except in the case of a very unbalanced phylogeny. Both methods failed to retrieve the correct topology when large amounts of data were missing. Bayesian analysis of species level summary data scoring the two alleles of each SNP as independent characters and parsimony analysis of data scoring each SNP as one character produced trees with branch length distributions closest to the true trees on which SNPs were simulated. For empirical data, Bayesian inference and Dollo parsimony analysis of data scored allele-wise produced phylogenies most congruent with the results of SNAPP. In the case of study groups divergent enough for missing data to be phylogenetically informative (because of additional mutations preventing amplification of genomic fragments or bioinformatic establishment of homology), scoring of SNP data as a presence/absence matrix irrespective of allele

  14. Percentile-Based ETCCDI Temperature Extremes Indices for CMIP5 Model Output: New Results through Semiparametric Quantile Regression Approach

    Science.gov (United States)

    Li, L.; Yang, C.

    2017-12-01

    Climate extremes often manifest as rare events in terms of surface air temperature and precipitation with an annual reoccurrence period. In order to represent the manifold characteristics of climate extremes for monitoring and analysis, the Expert Team on Climate Change Detection and Indices (ETCCDI) had worked out a set of 27 core indices based on daily temperature and precipitation data, describing extreme weather and climate events on an annual basis. The CLIMDEX project (http://www.climdex.org) had produced public domain datasets of such indices for data from a variety of sources, including output from global climate models (GCM) participating in the Coupled Model Intercomparison Project Phase 5 (CMIP5). Among the 27 ETCCDI indices, there are six percentile-based temperature extremes indices that may fall into two groups: exceedance rates (ER) (TN10p, TN90p, TX10p and TX90p) and durations (CSDI and WSDI). Percentiles must be estimated prior to the calculation of the indices, and could more or less be biased by the adopted algorithm. Such biases will in turn be propagated to the final results of indices. The CLIMDEX used an empirical quantile estimator combined with a bootstrap resampling procedure to reduce the inhomogeneity in the annual series of the ER indices. However, there are still some problems remained in the CLIMDEX datasets, namely the overestimated climate variability due to unaccounted autocorrelation in the daily temperature data, seasonally varying biases and inconsistency between algorithms applied to the ER indices and to the duration indices. We now present new results of the six indices through a semiparametric quantile regression approach for the CMIP5 model output. By using the base-period data as a whole and taking seasonality and autocorrelation into account, this approach successfully addressed the aforementioned issues and came out with consistent results. The new datasets cover the historical and three projected (RCP2.6, RCP4.5 and RCP

  15. A Weibull-based compositional approach for hierarchical dynamic fault trees

    International Nuclear Information System (INIS)

    Chiacchio, F.; Cacioppo, M.; D'Urso, D.; Manno, G.; Trapani, N.; Compagno, L.

    2013-01-01

    The solution of a dynamic fault tree (DFT) for the reliability assessment can be achieved using a wide variety of techniques. These techniques have a strong theoretical foundation as both the analytical and the simulation methods have been extensively developed. Nevertheless, they all present the same limits that appear with the increasing of the size of the fault trees (i.e., state space explosion, time-consuming simulations), compromising the resolution. We have tested the feasibility of a composition algorithm based on a Weibull distribution, addressed to the resolution of a general class of dynamic fault trees characterized by non-repairable basic events and generally distributed failure times. The proposed composition algorithm is used to generalize the traditional hierarchical technique that, as previous literature have extensively confirmed, is able to reduce the computational effort of a large DFT through the modularization of independent parts of the tree. The results of this study are achieved both through simulation and analytical techniques, thus confirming the capability to solve a quite general class of dynamic fault trees and overcome the limits of traditional techniques.

  16. Does the Magnitude of the Link between Unemployment and Crime Depend on the Crime Level? A Quantile Regression Approach

    Directory of Open Access Journals (Sweden)

    Horst Entorf

    2015-07-01

    Full Text Available Two alternative hypotheses – referred to as opportunity- and stigma-based behavior – suggest that the magnitude of the link between unemployment and crime also depends on preexisting local crime levels. In order to analyze conjectured nonlinearities between both variables, we use quantile regressions applied to German district panel data. While both conventional OLS and quantile regressions confirm the positive link between unemployment and crime for property crimes, results for assault differ with respect to the method of estimation. Whereas conventional mean regressions do not show any significant effect (which would confirm the usual result found for violent crimes in the literature, quantile regression reveals that size and importance of the relationship are conditional on the crime rate. The partial effect is significantly positive for moderately low and median quantiles of local assault rates.

  17. The risk of disabling, surgery and reoperation in Crohn's disease - A decision tree-based approach to prognosis.

    Science.gov (United States)

    Dias, Cláudia Camila; Pereira Rodrigues, Pedro; Fernandes, Samuel; Portela, Francisco; Ministro, Paula; Martins, Diana; Sousa, Paula; Lago, Paula; Rosa, Isadora; Correia, Luis; Moura Santos, Paula; Magro, Fernando

    2017-01-01

    Crohn's disease (CD) is a chronic inflammatory bowel disease known to carry a high risk of disabling and many times requiring surgical interventions. This article describes a decision-tree based approach that defines the CD patients' risk or undergoing disabling events, surgical interventions and reoperations, based on clinical and demographic variables. This multicentric study involved 1547 CD patients retrospectively enrolled and divided into two cohorts: a derivation one (80%) and a validation one (20%). Decision trees were built upon applying the CHAIRT algorithm for the selection of variables. Three-level decision trees were built for the risk of disabling and reoperation, whereas the risk of surgery was described in a two-level one. A receiver operating characteristic (ROC) analysis was performed, and the area under the curves (AUC) Was higher than 70% for all outcomes. The defined risk cut-off values show usefulness for the assessed outcomes: risk levels above 75% for disabling had an odds test positivity of 4.06 [3.50-4.71], whereas risk levels below 34% and 19% excluded surgery and reoperation with an odds test negativity of 0.15 [0.09-0.25] and 0.50 [0.24-1.01], respectively. Overall, patients with B2 or B3 phenotype had a higher proportion of disabling disease and surgery, while patients with later introduction of pharmacological therapeutic (1 months after initial surgery) had a higher proportion of reoperation. The decision-tree based approach used in this study, with demographic and clinical variables, has shown to be a valid and useful approach to depict such risks of disabling, surgery and reoperation.

  18. Single event and TREE latchup mitigation for a star tracker sensor: An innovative approach to system level latchup mitigation

    International Nuclear Information System (INIS)

    Kimbrough, J.R.; Colella, N.J.; Davis, R.W.; Bruener, D.B.; Coakley, P.G.; Lutjens, S.W.; Mallon, C.E.

    1994-08-01

    Electronic packages designed for spacecraft should be fault-tolerant and operate without ground control intervention through extremes in the space radiation environment. If designed for military use, the electronics must survive and function in a nuclear radiation environment. This paper presents an innovative ''blink'' approach rather than the typical ''operate through'' approach to achieve system level latchup mitigation on a prototype star tracker camera. Included are circuit designs, flash x-ray test data, and heavy ion data demonstrating latchup mitigation protecting micro-electronics from current latchup and burnout due to Single Event Latchup (SEL) and Transient Radiation Effects on Electronics (TREE)

  19. Comparison of two regression-based approaches for determining nutrient and sediment fluxes and trends in the Chesapeake Bay watershed

    Science.gov (United States)

    Moyer, Douglas; Hirsch, Robert M.; Hyer, Kenneth

    2012-01-01

    Nutrient and sediment fluxes and changes in fluxes over time are key indicators that water resource managers can use to assess the progress being made in improving the structure and function of the Chesapeake Bay ecosystem. The U.S. Geological Survey collects annual nutrient (nitrogen and phosphorus) and sediment flux data and computes trends that describe the extent to which water-quality conditions are changing within the major Chesapeake Bay tributaries. Two regression-based approaches were compared for estimating annual nutrient and sediment fluxes and for characterizing how these annual fluxes are changing over time. The two regression models compared are the traditionally used ESTIMATOR and the newly developed Weighted Regression on Time, Discharge, and Season (WRTDS). The model comparison focused on answering three questions: (1) What are the differences between the functional form and construction of each model? (2) Which model produces estimates of flux with the greatest accuracy and least amount of bias? (3) How different would the historical estimates of annual flux be if WRTDS had been used instead of ESTIMATOR? One additional point of comparison between the two models is how each model determines trends in annual flux once the year-to-year variations in discharge have been determined. All comparisons were made using total nitrogen, nitrate, total phosphorus, orthophosphorus, and suspended-sediment concentration data collected at the nine U.S. Geological Survey River Input Monitoring stations located on the Susquehanna, Potomac, James, Rappahannock, Appomattox, Pamunkey, Mattaponi, Patuxent, and Choptank Rivers in the Chesapeake Bay watershed. Two model characteristics that uniquely distinguish ESTIMATOR and WRTDS are the fundamental model form and the determination of model coefficients. ESTIMATOR and WRTDS both predict water-quality constituent concentration by developing a linear relation between the natural logarithm of observed constituent

  20. The nuisance of nuisance regression: spectral misspecification in a common approach to resting-state fMRI preprocessing reintroduces noise and obscures functional connectivity.

    Science.gov (United States)

    Hallquist, Michael N; Hwang, Kai; Luna, Beatriz

    2013-11-15

    Recent resting-state functional connectivity fMRI (RS-fcMRI) research has demonstrated that head motion during fMRI acquisition systematically influences connectivity estimates despite bandpass filtering and nuisance regression, which are intended to reduce such nuisance variability. We provide evidence that the effects of head motion and other nuisance signals are poorly controlled when the fMRI time series are bandpass-filtered but the regressors are unfiltered, resulting in the inadvertent reintroduction of nuisance-related variation into frequencies previously suppressed by the bandpass filter, as well as suboptimal correction for noise signals in the frequencies of interest. This is important because many RS-fcMRI studies, including some focusing on motion-related artifacts, have applied this approach. In two cohorts of individuals (n=117 and 22) who completed resting-state fMRI scans, we found that the bandpass-regress approach consistently overestimated functional connectivity across the brain, typically on the order of r=.10-.35, relative to a simultaneous bandpass filtering and nuisance regression approach. Inflated correlations under the bandpass-regress approach were associated with head motion and cardiac artifacts. Furthermore, distance-related differences in the association of head motion and connectivity estimates were much weaker for the simultaneous filtering approach. We recommend that future RS-fcMRI studies ensure that the frequencies of nuisance regressors and fMRI data match prior to nuisance regression, and we advocate a simultaneous bandpass filtering and nuisance regression strategy that better controls nuisance-related variability. Copyright © 2013 Elsevier Inc. All rights reserved.

  1. Tropical dendrochemistry: A novel approach to estimate age and growth from ringless trees

    International Nuclear Information System (INIS)

    Poussart, P.; Myneni, S.; Lanzirotti, A.

    2006-01-01

    Although tropical forests play an active role in the global carbon cycle and climate, their growth history remains poorly characterized compared to other ecosystems on the planet. Trees are prime candidates for the extraction of paleoclimate archives as they can be probed sub-annually, are widely distributed and can live for over 1400 years. However, dendrochronological techniques have found limited applications in the tropics because trees often lack visible growth rings. Alternative methods exist (dendrometry, radio- and stable isotopes), but the derived records are either of short-duration, lack seasonal resolution or are prohibitively labor intensive to produce. Here, we show the first X-ray microprobe synchrotron record of calcium (Ca) from a ringless Miliusa velutina tree from Thailand and use it to estimate the tree's age and growth history. The Ca age model agrees within (le)2 years of bomb-radiocarbon age estimates and confirms that the cycles are seasonal. The amplitude of the Ca annual cycle is correlated significantly with growth and annual Ca maxima correlate with the amount of dry season rainfall. Synchrotron measurements are fast and producing sufficient numbers of replicated multi-century tropical dendrochemical climate records now seems analytically feasible

  2. A Comparison Between Educational Approaches to Teaching Forestry and Tree Identification in a Resident Camp Setting

    Science.gov (United States)

    Schmitt, Robert M.; Groves, David L.

    1976-01-01

    Results of pre- and post-testing the tree identification skills of a group of 4-H members showed that adolescents aged 9-11 respond better to lecture demonstration teaching methods, while adolescents aged 12-14 respond better to an inquiry process that used nature trails as the primary instructional tool. (MLH)

  3. Seeing beyond fertiliser trees : a case study of a community based participatory approach to agroforestry research and development in western Kenya

    NARCIS (Netherlands)

    Kiptot, E.

    2007-01-01

    Key words: village committee approach, agroforestry, improved tree fallows, biomass transfer, realist evaluation, soil fertility, adoption, dissemination. The thesis explores and describes various processes that take place in the implementation of a community based participatory initiative

  4. Translating Response During Therapy into Ultimate Treatment Outcome: A Personalized 4-Dimensional MRI Tumor Volumetric Regression Approach in Cervical Cancer

    International Nuclear Information System (INIS)

    Mayr, Nina A.; Wang, Jian Z.; Lo, Simon S.; Zhang Dongqing; Grecula, John C.; Lu Lanchun; Montebello, Joseph F.; Fowler, Jeffrey M.; Yuh, William T.C.

    2010-01-01

    Purpose: To assess individual volumetric tumor regression pattern in cervical cancer during therapy using serial four-dimensional MRI and to define the regression parameters' prognostic value validated with local control and survival correlation. Methods and Materials: One hundred and fifteen patients with Stage IB 2 -IVA cervical cancer treated with radiation therapy (RT) underwent serial MRI before (MRI 1) and during RT, at 2-2.5 weeks (MRI 2, at 20-25 Gy), and at 4-5 weeks (MRI 3, at 40-50 Gy). Eighty patients had a fourth MRI 1-2 months post-RT. Mean follow-up was 5.3 years. Tumor volume was measured by MRI-based three-dimensional volumetry, and plotted as dose(time)/volume regression curves. Volume regression parameters were correlated with local control, disease-specific, and overall survival. Results: Residual tumor volume, slope, and area under the regression curve correlated significantly with local control and survival. Residual volumes ≥20% at 40-50 Gy were independently associated with inferior 5-year local control (53% vs. 97%, p <0.001) and disease-specific survival rates (50% vs. 72%, p = 0.009) than smaller volumes. Patients with post-RT residual volumes ≥10% had 0% local control and 17% disease-specific survival, compared with 91% and 72% for <10% volume (p <0.001). Conclusion: Using more accurate four-dimensional volumetric regression analysis, tumor response can now be directly translated into individual patients' outcome for clinical application. Our results define two temporal thresholds critically influencing local control and survival. In patients with ≥20% residual volume at 40-50 Gy and ≥10% post-RT, the risk for local failure and death are so high that aggressive intervention may be warranted.

  5. Assessing the response of area burned to changing climate in western boreal North America using a Multivariate Adaptive Regression Splines (MARS) approach

    Science.gov (United States)

    Michael S. Balshi; A. David McGuire; Paul Duffy; Mike Flannigan; John Walsh; Jerry Melillo

    2009-01-01

    We developed temporally and spatially explicit relationships between air temperature and fuel moisture codes derived from the Canadian Fire Weather Index System to estimate annual area burned at 2.5o (latitude x longitude) resolution using a Multivariate Adaptive Regression Spline (MARS) approach across Alaska and Canada. Burned area was...

  6. New developments in fruit and vegetables consumption in the period 1999-2004 in Denmark - a quantile regression approach

    DEFF Research Database (Denmark)

    Hansen, Aslak Hedemann

    2008-01-01

    The development in the consumption of fruit and vegetables in the period 1999-2004 in Denmark was investigated using quantile regression and two previously overlooked problems were identified. First, the change in the ten percent quantile samples decreased. This could have been caused by changes ...

  7. Predicting attention-deficit/hyperactivity disorder severity from psychosocial stress and stress-response genes : A random forest regression approach

    NARCIS (Netherlands)

    Van Der Meer, D.; Hoekstra, P. J.; Van Donkelaar, M.; Bralten, J.; Oosterlaan, J.; Heslenfeld, D.; Faraone, S. V.; Franke, B.; Buitelaar, J. K.; Hartman, C. A.

    2017-01-01

    Identifying genetic variants contributing to attention-deficit/hyperactivity disorder (ADHD) is complicated by the involvement of numerous common genetic variants with small effects, interacting with each other as well as with environmental factors, such as stress exposure. Random forest regression

  8. A Generalized Least Squares Regression Approach for Computing Effect Sizes in Single-Case Research: Application Examples

    Science.gov (United States)

    Maggin, Daniel M.; Swaminathan, Hariharan; Rogers, Helen J.; O'Keeffe, Breda V.; Sugai, George; Horner, Robert H.

    2011-01-01

    A new method for deriving effect sizes from single-case designs is proposed. The strategy is applicable to small-sample time-series data with autoregressive errors. The method uses Generalized Least Squares (GLS) to model the autocorrelation of the data and estimate regression parameters to produce an effect size that represents the magnitude of…

  9. Predicting attention-deficit/hyperactivity disorder severity from psychosocial stress and stress-response genes : a random forest regression approach

    NARCIS (Netherlands)

    van der Meer, D.; Hoekstra, P. J.; van Donkelaar, Marjolein M. J.; Bralten, Janita; Oosterlaan, J; Heslenfeld, Dirk J.; Faraone, S. V.; Franke, B.; Buitelaar, J. K.; Hartman, C. A.

    2017-01-01

    Identifying genetic variants contributing to attention-deficit/hyperactivity disorder (ADHD) is complicated by the involvement of numerous common genetic variants with small effects, interacting with each other as well as with environmental factors, such as stress exposure. Random forest regression

  10. A comparative study between nonlinear regression and artificial neural network approaches for modelling wild oat (Avena fatua) field emergence

    Science.gov (United States)

    Non-linear regression techniques are used widely to fit weed field emergence patterns to soil microclimatic indices using S-type functions. Artificial neural networks present interesting and alternative features for such modeling purposes. In this work, a univariate hydrothermal-time based Weibull m...

  11. Comparison of autoregressive (AR) strategy with that of regression approach for determining ozone layer depletion as a physical process

    International Nuclear Information System (INIS)

    Yousufzai, M.A.K; Aansari, M.R.K.; Quamar, J.; Iqbal, J.; Hussain, M.A.

    2010-01-01

    This communication presents the development of a comprehensive characterization of ozone layer depletion (OLD) phenomenon as a physical process in the form of mathematical models that comprise the usual regression, multiple or polynomial regression and stochastic strategy. The relevance of these models has been illuminated using predicted values of different parameters under a changing environment. The information obtained from such analysis can be employed to alter the possible factors and variables to achieve optimum performance. This kind of analysis initiates a study towards formulating the phenomenon of OLD as a physical process with special reference to the stratospheric region of Pakistan. The data presented here establishes that the Auto regressive (AR) nature of modeling OLD as a physical process is an appropriate scenario rather than using usual regression. The data reported in literature suggest quantitatively the OLD is occurring in our region. For this purpose we have modeled this phenomenon using the data recorded at the Geophysical Centre Quetta during the period 1960-1999. The predictions made by this analysis are useful for public, private and other relevant organizations. (author)

  12. Spectral mapping of savanna tree species at canopy level, with focus on tall trees, using an integrated CAO Hyperspectral & LiDAR sensor approach

    CSIR Research Space (South Africa)

    Naidoo, L

    2010-03-01

    Full Text Available The detection and mapping of tree/plant species in the savanna ecosystem can provide numerous benefits for the managerial authorities. This includes the accurate mapping of the spatial distribution of economically viable trees which are a key source...

  13. Quantile Regression Methods

    DEFF Research Database (Denmark)

    Fitzenberger, Bernd; Wilke, Ralf Andreas

    2015-01-01

    if the mean regression model does not. We provide a short informal introduction into the principle of quantile regression which includes an illustrative application from empirical labor market research. This is followed by briefly sketching the underlying statistical model for linear quantile regression based......Quantile regression is emerging as a popular statistical approach, which complements the estimation of conditional mean models. While the latter only focuses on one aspect of the conditional distribution of the dependent variable, the mean, quantile regression provides more detailed insights...... by modeling conditional quantiles. Quantile regression can therefore detect whether the partial effect of a regressor on the conditional quantiles is the same for all quantiles or differs across quantiles. Quantile regression can provide evidence for a statistical relationship between two variables even...

  14. Establishing a cause and effect relationship for ambient ozone exposure and tree growth in the forest: Progress and an experimental approach

    International Nuclear Information System (INIS)

    Manning, William J.

    2005-01-01

    Much has been written about the effects of ambient ozone on tree growth. Cause and effect has been established with seedlings in chambers. Results from multi-year studies with older tree seedlings, in open-top chambers, have been inconclusive, due to chamber effects. Extrapolation of results from chambers to trees in the forest is not possible. Predictive models for forest tree growth reductions caused by ozone have been developed, but not verified. Dendrochronological methods have been used to establish correlations between radial growth reductions in forest trees and ambient ozone exposure. The protective chemical ethylenediurea (EDU) has been used to protect tree seedlings from ozone injury. An experimental approach is advocated here that utilizes forest trees selected for sensitivity and non-sensitivity to ozone, dendrochronological methods, the protective chemical EDU, and monitoring data for ambient ozone, stomatal conductance, soil moisture potential, air temperature, PAR, etc. in long-term investigations to establish cause and effect relationships. - Progress is reviewed and an experimental approach is proposed to demonstrate a cause and effect relationship for ambient ozone and forest tree growth

  15. Quantifying ozone uptake at the canopy level of spruce, pine and larch trees at the alpine timberline: an approach based on sap flow measurement

    International Nuclear Information System (INIS)

    Wieser, G.; Matyssek, R.; Koestner, B.; Oberhuber, W.

    2003-01-01

    Sap-flow based measurements can be used to estimate ozone uptake at whole-tree and stand levels. - Micro-climatic and ambient ozone data were combined with measurements of sap flow through tree trunks in order to estimate whole-tree ozone uptake of adult Norway spruce (Picea abies), cembran pine (Pinus cembra), and European larch (Larix decidua) trees. Sap flow was monitored by means of the heat balance approach in two trees of each species during the growing season of 1998. In trees making up the stand canopy, the ozone uptake by evergreen foliages was significantly higher than by deciduous ones, when scaled to the ground area. However, if expressed per unit of whole-tree foliage area, ozone flux through the stomata into the needle mesophyll was 1.09, 1.18 and 1.40 nmol m -2 s -1 in Picea abies, Pinus cembra and Larix decidua, respectively. These fluxes are consistent with findings from measurements of needle gas exchange, published from the same species at the study site. It is concluded that the sap flow-based approach offers an inexpensive, spatially and temporally integrating way for estimating ozone uptake at the whole-tree and stand level, intrinsically covering the effect of boundary layers on ozone flux

  16. A novel approach for individual tree crown delineation using lidar data

    Science.gov (United States)

    Liu, Tao

    Individual tree crown delineation (ITCD) is an important technique to support precision forestry. ITCD is particularly difficult for deciduous forests where the existence of multiple branches can lead to false tree top detection. This thesis focused on developing a new ITCD model, which consists of two components: (1) boundary refinement using a novel algorithm called Fishing Net Dragging (FiND), and (2) segment merging using boundary classification. The proposed ITCD model was tested in both deciduous and mixed forests, attaining an overall accuracy of 74% and 78%, respectively. This compared favorably to an ITCD method commonly cited in the literature, which attained 41% and 51% on the same plots. To facilitate comparison of research in the ITCD community, this thesis also developed a new accuracy assessment scheme for ITCD. This new accuracy assessment is easy to interpret and convenient to implement while comprehensively evaluating ITCD accuracy.

  17. An FSPM approach for modeling fruit yield and quality in mango trees

    OpenAIRE

    Boudon , Frédéric; Persello , Severine; Jestin , Alexandra; Briand , Anne-Sarah; Fernique , Pierre; Guédon , Yann; Léchaudel , Mathieu; Grechi , Isabelle; Normand , Frédéric

    2016-01-01

    International audience; Research focus-Mango (Mangifera indica L.), the fifth most cultivated fruit in the world, is mainly produced in tropical and subtropical regions. Its cultivation raises a number of issues: (i) mango yield is irregular across years, (ii) phenological asynchronisms within and between trees maintain long periods with phenological stages susceptible to pests and diseases, and (iii) fruit quality and maturity are heterogeneous at harvest. To address these issues, we develop...

  18. Decision-tree approach to evaluating inactive uranium-processing sites for liner requirements

    International Nuclear Information System (INIS)

    Relyea, J.F.

    1983-03-01

    Recently, concern has been expressed about potential toxic effects of both radon emission and release of toxic elements in leachate from inactive uranium mill tailings piles. Remedial action may be required to meet disposal standards set by the states and the US Environmental Protection Agency (EPA). In some cases, a possible disposal option is the exhumation and reburial (either on site or at a new location) of tailings and reliance on engineered barriers to satisfy the objectives established for remedial actions. Liners under disposal pits are the major engineered barrier for preventing contaminant release to ground and surface water. The purpose of this report is to provide a logical sequence of action, in the form of a decision tree, which could be followed to show whether a selected tailings disposal design meets the objectives for subsurface contaminant release without a liner. This information can be used to determine the need and type of liner for sites exhibiting a potential groundwater problem. The decision tree is based on the capability of hydrologic and mass transport models to predict the movement of water and contaminants with time. The types of modeling capabilities and data needed for those models are described, and the steps required to predict water and contaminant movement are discussed. A demonstration of the decision tree procedure is given to aid the reader in evaluating the need for the adequacy of a liner

  19. Carbon footprint of forest and tree utilization technologies in life cycle approach

    Science.gov (United States)

    Polgár, András; Pécsinger, Judit

    2017-04-01

    In our research project a suitable method has been developed related the technological aspect of the environmental assessment of land use changes caused by climate change. We have prepared an eco-balance (environmental inventory) to the environmental effects classification in life-cycle approach in connection with the typical agricultural / forest and tree utilization technologies. The use of balances and environmental classification makes possible to compare land-use technologies and their environmental effects per common functional unit. In order to test our environmental analysis model, we carried out surveys in sample of forest stands. We set up an eco-balance of the working systems of intermediate cutting and final harvest in the stands of beech, oak, spruce, acacia, poplar and short rotation energy plantations (willow, poplar). We set up the life-cycle plan of the surveyed working systems by using the GaBi 6.0 Professional software and carried out midpoint and endpoint impact assessment. Out of the results, we applied the values of CML 2001 - Global Warming Potential (GWP 100 years) [kg CO2-Equiv.] and Eco-Indicator 99 - Human health, Climate Change [DALY]. On the basis of the values we set up a ranking of technology. By this, we received the environmental impact classification of the technologies based on carbon footprint. The working systems had the greatest impact on global warming (GWP 100 years) throughout their whole life cycle. This is explained by the amount of carbon dioxide releasing to the atmosphere resulting from the fuel of the technologies. Abiotic depletion (ADP foss) and marine aquatic ecotoxicity (MAETP) emerged also as significant impact categories. These impact categories can be explained by the share of input of fuel and lube. On the basis of the most significant environmental impact category (carbon footprint), we perform the relative life cycle contribution and ranking of each technologies. The technological life cycle stages examined

  20. A Poisson regression approach to model monthly hail occurrence in Northern Switzerland using large-scale environmental variables

    Science.gov (United States)

    Madonna, Erica; Ginsbourger, David; Martius, Olivia

    2018-05-01

    In Switzerland, hail regularly causes substantial damage to agriculture, cars and infrastructure, however, little is known about its long-term variability. To study the variability, the monthly number of days with hail in northern Switzerland is modeled in a regression framework using large-scale predictors derived from ERA-Interim reanalysis. The model is developed and verified using radar-based hail observations for the extended summer season (April-September) in the period 2002-2014. The seasonality of hail is explicitly modeled with a categorical predictor (month) and monthly anomalies of several large-scale predictors are used to capture the year-to-year variability. Several regression models are applied and their performance tested with respect to standard scores and cross-validation. The chosen model includes four predictors: the monthly anomaly of the two meter temperature, the monthly anomaly of the logarithm of the convective available potential energy (CAPE), the monthly anomaly of the wind shear and the month. This model well captures the intra-annual variability and slightly underestimates its inter-annual variability. The regression model is applied to the reanalysis data back in time to 1980. The resulting hail day time series shows an increase of the number of hail days per month, which is (in the model) related to an increase in temperature and CAPE. The trend corresponds to approximately 0.5 days per month per decade. The results of the regression model have been compared to two independent data sets. All data sets agree on the sign of the trend, but the trend is weaker in the other data sets.

  1. Life history and biogeographic diversification of an endemic western North American freshwater fish clade using a comparative species tree approach.

    Science.gov (United States)

    Baumsteiger, Jason; Kinziger, Andrew P; Aguilar, Andres

    2012-12-01

    The west coast of North America contains a number of biogeographic freshwater provinces which reflect an ever-changing aquatic landscape. Clues to understanding this complex structure are often encapsulated genetically in the ichthyofauna, though frequently as unresolved evolutionary relationships and putative cryptic species. Advances in molecular phylogenetics through species tree analyses now allow for improved exploration of these relationships. Using a comprehensive approach, we analyzed two mitochondrial and nine nuclear loci for a group of endemic freshwater fish (sculpin-Cottus) known for a wide ranging distribution and complex species structure in this region. Species delimitation techniques identified three novel cryptic lineages, all well supported by phylogenetic analyses. Comparative phylogenetic analyses consistently found five distinct clades reflecting a number of unique biogeographic provinces. Some internal node relationships varied by species tree reconstruction method, and were associated with either Bayesian or maximum likelihood statistical approaches or between mitochondrial, nuclear, and combined datasets. Limited cases of mitochondrial capture were also evident, suggestive of putative ancestral hybridization between species. Biogeographic diversification was associated with four major regions and revealed historical faunal exchanges across regions. Mapping of an important life-history character (amphidromy) revealed two separate instances of trait evolution, a transition that has occurred repeatedly in Cottus. This study demonstrates the power of current phylogenetic methods, the need for a comprehensive phylogenetic approach, and the potential for sculpin to serve as an indicator of biogeographic history for native ichthyofauna in the region. Copyright © 2012 Elsevier Inc. All rights reserved.

  2. Linking Tree Growth Response to Measured Microclimate - A Field Based Approach

    Science.gov (United States)

    Martin, J. T.; Hoylman, Z. H.; Looker, N. T.; Jencso, K. G.; Hu, J.

    2015-12-01

    The general relationship between climate and tree growth is a well established and important tenet shaping both paleo and future perspectives of forest ecosystem growth dynamics. Across much of the American west, water limits growth via physiological mechanisms that tie regional and local climatic conditions to forest productivity in a relatively predictable way, and these growth responses are clearly evident in tree ring records. However, within the annual cycle of a forest landscape, water availability varies across both time and space, and interacts with other potentially growth limiting factors such as temperature, light, and nutrients. In addition, tree growth responses may lag climate drivers and may vary in terms of where in a tree carbon is allocated. As such, determining when and where water actually limits forest growth in real time can be a significant challenge. Despite these challenges, we present data suggestive of real-time growth limitation driven by soil moisture supply and atmospheric water demand reflected in high frequency field measurements of stem radii and cell structure across ecological gradients. The experiment was conducted at the Lubrecht Experimental Forest in western Montana where, over two years, we observed intra-annual growth rates of four dominant conifer species: Douglas fir, Ponderosa Pine, Engelmann Spruce and Western Larch using point dendrometers and microcores. In all four species studied, compensatory use of stored water (inferred from stem water deficit) appears to exhibit a threshold relationship with a critical balance point between water supply and demand. The occurrence of this point in time coincided with a decrease in stem growth rates, and the while the timing varied up to one month across topographic and elevational gradients, the onset date of growth limitation was a reliable predictor of overall annual growth. Our findings support previous model-based observations of nonlinearity in the relationship between

  3. Autistic Regression

    Science.gov (United States)

    Matson, Johnny L.; Kozlowski, Alison M.

    2010-01-01

    Autistic regression is one of the many mysteries in the developmental course of autism and pervasive developmental disorders not otherwise specified (PDD-NOS). Various definitions of this phenomenon have been used, further clouding the study of the topic. Despite this problem, some efforts at establishing prevalence have been made. The purpose of…

  4. Generalized regression neural network (GRNN)-based approach for colored dissolved organic matter (CDOM) retrieval: case study of Connecticut River at Middle Haddam Station, USA.

    Science.gov (United States)

    Heddam, Salim

    2014-11-01

    The prediction of colored dissolved organic matter (CDOM) using artificial neural network approaches has received little attention in the past few decades. In this study, colored dissolved organic matter (CDOM) was modeled using generalized regression neural network (GRNN) and multiple linear regression (MLR) models as a function of Water temperature (TE), pH, specific conductance (SC), and turbidity (TU). Evaluation of the prediction accuracy of the models is based on the root mean square error (RMSE), mean absolute error (MAE), coefficient of correlation (CC), and Willmott's index of agreement (d). The results indicated that GRNN can be applied successfully for prediction of colored dissolved organic matter (CDOM).

  5. A Systematic Approach for Dynamic Security Assessment and the Corresponding Preventive Control Scheme Based on Decision Trees

    DEFF Research Database (Denmark)

    Liu, Leo; Sun, Kai; Rather, Zakir Hussain

    2014-01-01

    This paper proposes a decision tree (DT)-based systematic approach for cooperative online power system dynamic security assessment (DSA) and preventive control. This approach adopts a new methodology that trains two contingency-oriented DTs on a daily basis by the databases generated from power...... system simulations. Fed with real-time wide-area measurements, one DT of measurable variables is employed for online DSA to identify potential security issues, and the other DT of controllable variables provides online decision support on preventive control strategies against those issues. A cost......-effective algorithm is adopted in this proposed approach to optimize the trajectory of preventive control. The paper also proposes an importance sampling algorithm on database preparation for efficient DT training for power systems with high penetration of wind power and distributed generation. The performance...

  6. Interacting with mobile devices by fusion eye and hand gestures recognition systems based on decision tree approach

    Science.gov (United States)

    Elleuch, Hanene; Wali, Ali; Samet, Anis; Alimi, Adel M.

    2017-03-01

    Two systems of eyes and hand gestures recognition are used to control mobile devices. Based on a real-time video streaming captured from the device's camera, the first system recognizes the motion of user's eyes and the second one detects the static hand gestures. To avoid any confusion between natural and intentional movements we developed a system to fuse the decision coming from eyes and hands gesture recognition systems. The phase of fusion was based on decision tree approach. We conducted a study on 5 volunteers and the results that our system is robust and competitive.

  7. AucPR: An AUC-based approach using penalized regression for disease prediction with high-dimensional omics data

    OpenAIRE

    Yu, Wenbao; Park, Taesung

    2014-01-01

    Motivation It is common to get an optimal combination of markers for disease classification and prediction when multiple markers are available. Many approaches based on the area under the receiver operating characteristic curve (AUC) have been proposed. Existing works based on AUC in a high-dimensional context depend mainly on a non-parametric, smooth approximation of AUC, with no work using a parametric AUC-based approach, for high-dimensional data. Results We propose an AUC-based approach u...

  8. Data mining methods in the prediction of Dementia: A real-data comparison of the accuracy, sensitivity and specificity of linear discriminant analysis, logistic regression, neural networks, support vector machines, classification trees and random forests

    Directory of Open Access Journals (Sweden)

    Santana Isabel

    2011-08-01

    Full Text Available Abstract Background Dementia and cognitive impairment associated with aging are a major medical and social concern. Neuropsychological testing is a key element in the diagnostic procedures of Mild Cognitive Impairment (MCI, but has presently a limited value in the prediction of progression to dementia. We advance the hypothesis that newer statistical classification methods derived from data mining and machine learning methods like Neural Networks, Support Vector Machines and Random Forests can improve accuracy, sensitivity and specificity of predictions obtained from neuropsychological testing. Seven non parametric classifiers derived from data mining methods (Multilayer Perceptrons Neural Networks, Radial Basis Function Neural Networks, Support Vector Machines, CART, CHAID and QUEST Classification Trees and Random Forests were compared to three traditional classifiers (Linear Discriminant Analysis, Quadratic Discriminant Analysis and Logistic Regression in terms of overall classification accuracy, specificity, sensitivity, Area under the ROC curve and Press'Q. Model predictors were 10 neuropsychological tests currently used in the diagnosis of dementia. Statistical distributions of classification parameters obtained from a 5-fold cross-validation were compared using the Friedman's nonparametric test. Results Press' Q test showed that all classifiers performed better than chance alone (p Conclusions When taking into account sensitivity, specificity and overall classification accuracy Random Forests and Linear Discriminant analysis rank first among all the classifiers tested in prediction of dementia using several neuropsychological tests. These methods may be used to improve accuracy, sensitivity and specificity of Dementia predictions from neuropsychological testing.

  9. Penalized linear regression for discrete ill-posed problems: A hybrid least-squares and mean-squared error approach

    KAUST Repository

    Suliman, Mohamed Abdalla Elhag; Ballal, Tarig; Kammoun, Abla; Al-Naffouri, Tareq Y.

    2016-01-01

    This paper proposes a new approach to find the regularization parameter for linear least-squares discrete ill-posed problems. In the proposed approach, an artificial perturbation matrix with a bounded norm is forced into the discrete ill-posed model

  10. Penalized linear regression for discrete ill-posed problems: A hybrid least-squares and mean-squared error approach

    KAUST Repository

    Suliman, Mohamed Abdalla Elhag

    2016-12-19

    This paper proposes a new approach to find the regularization parameter for linear least-squares discrete ill-posed problems. In the proposed approach, an artificial perturbation matrix with a bounded norm is forced into the discrete ill-posed model matrix. This perturbation is introduced to enhance the singular-value (SV) structure of the matrix and hence to provide a better solution. The proposed approach is derived to select the regularization parameter in a way that minimizes the mean-squared error (MSE) of the estimator. Numerical results demonstrate that the proposed approach outperforms a set of benchmark methods in most cases when applied to different scenarios of discrete ill-posed problems. Jointly, the proposed approach enjoys the lowest run-time and offers the highest level of robustness amongst all the tested methods.

  11. The Spatial Association Between Federally Qualified Health Centers and County-Level Reported Sexually Transmitted Infections: A Spatial Regression Approach.

    Science.gov (United States)

    Owusu-Edusei, Kwame; Gift, Thomas L; Leichliter, Jami S; Romaguera, Raul A

    2018-02-01

    The number of categorical sexually transmitted disease (STD) clinics is declining in the United States. Federally qualified health centers (FQHCs) have the potential to supplement the needed sexually transmitted infection (STI) services. In this study, we describe the spatial distribution of FQHC sites and determine if reported county-level nonviral STI morbidity were associated with having FQHC(s) using spatial regression techniques. We extracted map data from the Health Resources and Services Administration data warehouse on FQHCs (ie, geocoded health care service delivery [HCSD] sites) and extracted county-level data on the reported rates of chlamydia, gonorrhea and, primary and secondary (P&S) syphilis (2008-2012) from surveillance data. A 3-equation seemingly unrelated regression estimation procedure (with a spatial regression specification that controlled for county-level multiyear (2008-2012) demographic and socioeconomic factors) was used to determine the association between reported county-level STI morbidity and HCSD sites. Counties with HCSD sites had higher STI, poverty, unemployment, and violent crime rates than counties with no HCSD sites (P < 0.05). The number of HCSD sites was associated (P < 0.01) with increases in the temporally smoothed rates of chlamydia, gonorrhea, and P&S syphilis, but there was no significant association between the number of HCSD per 100,000 population and reported STI rates. There is a positive association between STI morbidity and the number of HCSD sites; however, this association does not exist when adjusting by population size. Further work may determine the extent to which HCSD sites can meet unmet needs for safety net STI services.

  12. Linear regression

    CERN Document Server

    Olive, David J

    2017-01-01

    This text covers both multiple linear regression and some experimental design models. The text uses the response plot to visualize the model and to detect outliers, does not assume that the error distribution has a known parametric distribution, develops prediction intervals that work when the error distribution is unknown, suggests bootstrap hypothesis tests that may be useful for inference after variable selection, and develops prediction regions and large sample theory for the multivariate linear regression model that has m response variables. A relationship between multivariate prediction regions and confidence regions provides a simple way to bootstrap confidence regions. These confidence regions often provide a practical method for testing hypotheses. There is also a chapter on generalized linear models and generalized additive models. There are many R functions to produce response and residual plots, to simulate prediction intervals and hypothesis tests, to detect outliers, and to choose response trans...

  13. Benefit-based tree valuation

    Science.gov (United States)

    E.G. McPherson

    2007-01-01

    Benefit-based tree valuation provides alternative estimates of the fair and reasonable value of trees while illustrating the relative contribution of different benefit types. This study compared estimates of tree value obtained using cost- and benefit-based approaches. The cost-based approach used the Council of Landscape and Tree Appraisers trunk formula method, and...

  14. Calibration of the Diameter Distribution Derived from the Area-based Approach with Individual Tree-based Diameter Estimates Using the Airborne Laser Scanning

    Science.gov (United States)

    Xu, Q.; Hou, Z.; Maltamo, M.; Tokola, T.

    2015-12-01

    Diameter distributions of trees are important indicators of current forest stand structure and future dynamics. A new method was proposed in the study to combine the diameter distributions derived from the area-based approach (ABA) and the diameter distribution derived from the individual tree detection (ITD) in order to obtain more accurate forest stand attributes. Since dominant trees can be reliably detected and measured by the Lidar data via the ITD, the focus of the study is to retrieve the suppressed trees (trees that were missed by the ITD) from the ABA. Replacement and histogram matching were respectively employed at the plot level to retrieve the suppressed trees. Cut point was detected from the ITD-derived diameter distribution for each sample plot to distinguish dominant trees from the suppressed trees. The results showed that calibrated diameter distributions were more accurate in terms of error index and the entire growing stock estimates. Compared with the best performer between the ABA and the ITD, calibrated diameter distributions decreased the relative RMSE of the estimated entire growing stock, saw log and pulpwood fractions by 2.81%, 3.05% and 7.73% points respectively. Calibration improved the estimation of pulpwood fraction significantly, resulting in a negligible bias of the estimated entire growing stock.

  15. Supporting Frequent Updates in R-Trees: A Bottom-Up Approach

    DEFF Research Database (Denmark)

    Lee, Mong Li; Hsu, Wynne; Jensen, Christian Søndergaard

    2003-01-01

    Advances in hardware-related technologies promise to enable new data management applications that monitor continuous processes. In these applications, enormous amounts of state samples are obtained via sensors and are streamed to a database. Further, updates are very frequent and may exhibit...... and aims to improve update performance. It has different levels of reorganization—ranging from global to local—during updates, avoiding expensive top-down updates. A compact main-memory summary structure that allows direct access to the R-tree index nodes is used together with efficient bottom...

  16. A constructive approach to minimal free resolutions of path ideals of trees

    Directory of Open Access Journals (Sweden)

    Rachelle R. Bouchat

    2017-01-01

    Full Text Available For a rooted tree $\\Gamma$, we consider path ideals of $\\Gamma$, which are ideals that are generated by all directed paths of a fixed length in $\\Gamma$. In this paper, we provide a combinatorial description of the minimal free resolution of these path ideals. In particular, we provide a class of subforests of $\\Gamma$ that are in one-to-one correspondence with the multi-graded Betti numbers of the path ideal as well as providing a method for determining the projective dimension and the Castelnuovo-Mumford regularity of a given path ideal.

  17. Plateletpheresis efficiency and mathematical correction of software-derived platelet yield prediction: A linear regression and ROC modeling approach.

    Science.gov (United States)

    Jaime-Pérez, José Carlos; Jiménez-Castillo, Raúl Alberto; Vázquez-Hernández, Karina Elizabeth; Salazar-Riojas, Rosario; Méndez-Ramírez, Nereida; Gómez-Almaguer, David

    2017-10-01

    Advances in automated cell separators have improved the efficiency of plateletpheresis and the possibility of obtaining double products (DP). We assessed cell processor accuracy of predicted platelet (PLT) yields with the goal of a better prediction of DP collections. This retrospective proof-of-concept study included 302 plateletpheresis procedures performed on a Trima Accel v6.0 at the apheresis unit of a hematology department. Donor variables, software predicted yield and actual PLT yield were statistically evaluated. Software prediction was optimized by linear regression analysis and its optimal cut-off to obtain a DP assessed by receiver operating characteristic curve (ROC) modeling. Three hundred and two plateletpheresis procedures were performed; in 271 (89.7%) occasions, donors were men and in 31 (10.3%) women. Pre-donation PLT count had the best direct correlation with actual PLT yield (r = 0.486. P Simple correction derived from linear regression analysis accurately corrected this underestimation and ROC analysis identified a precise cut-off to reliably predict a DP. © 2016 Wiley Periodicals, Inc.

  18. Heartwood and sapwood in eucalyptus trees: non-conventional approach to wood quality.

    Science.gov (United States)

    Cherelli, Sabrina G; Sartori, Maria Márcia P; Próspero, André G; Ballarin, Adriano W

    2018-01-01

    This study evaluated the quality of heartwood and sapwood from mature trees of three species of Eucalyptus, by means of the qualification of their proportion, determination of basic and apparent density using non-destructive attenuation of gamma radiation technique and calculation of the density uniformity index. Six trees of each species (Eucalyptus grandis - 18 years old, Eucalyptus tereticornis - 35 years old and Corymbia citriodora - 28 years old) were used in the experimental program. The heartwood and sapwood were delimited by macroscopic analysis and the calculation of areas and percentage of heartwood and sapwood were performed using digital image. The uniformity index was calculated following methodology which numerically quantifies the dispersion of punctual density values of the wood around the mean density along the radius. The percentage of the heartwood was higher than the sapwood in all species studied. The density results showed no statistical difference between heartwood and sapwood. Differently from the density results, in all species studied there was statistical differences between uniformity indexes for heartwood and sapwood regions, making justifiable the inclusion of the density uniformity index as a quality parameter for Eucalyptus wood.

  19. The Covariance Adjustment Approaches for Combining Incomparable Cox Regressions Caused by Unbalanced Covariates Adjustment: A Multivariate Meta-Analysis Study

    Directory of Open Access Journals (Sweden)

    Tania Dehesh

    2015-01-01

    Full Text Available Background. Univariate meta-analysis (UM procedure, as a technique that provides a single overall result, has become increasingly popular. Neglecting the existence of other concomitant covariates in the models leads to loss of treatment efficiency. Our aim was proposing four new approximation approaches for the covariance matrix of the coefficients, which is not readily available for the multivariate generalized least square (MGLS method as a multivariate meta-analysis approach. Methods. We evaluated the efficiency of four new approaches including zero correlation (ZC, common correlation (CC, estimated correlation (EC, and multivariate multilevel correlation (MMC on the estimation bias, mean square error (MSE, and 95% probability coverage of the confidence interval (CI in the synthesis of Cox proportional hazard models coefficients in a simulation study. Result. Comparing the results of the simulation study on the MSE, bias, and CI of the estimated coefficients indicated that MMC approach was the most accurate procedure compared to EC, CC, and ZC procedures. The precision ranking of the four approaches according to all above settings was MMC ≥ EC ≥ CC ≥ ZC. Conclusion. This study highlights advantages of MGLS meta-analysis on UM approach. The results suggested the use of MMC procedure to overcome the lack of information for having a complete covariance matrix of the coefficients.

  20. The Covariance Adjustment Approaches for Combining Incomparable Cox Regressions Caused by Unbalanced Covariates Adjustment: A Multivariate Meta-Analysis Study.

    Science.gov (United States)

    Dehesh, Tania; Zare, Najaf; Ayatollahi, Seyyed Mohammad Taghi

    2015-01-01

    Univariate meta-analysis (UM) procedure, as a technique that provides a single overall result, has become increasingly popular. Neglecting the existence of other concomitant covariates in the models leads to loss of treatment efficiency. Our aim was proposing four new approximation approaches for the covariance matrix of the coefficients, which is not readily available for the multivariate generalized least square (MGLS) method as a multivariate meta-analysis approach. We evaluated the efficiency of four new approaches including zero correlation (ZC), common correlation (CC), estimated correlation (EC), and multivariate multilevel correlation (MMC) on the estimation bias, mean square error (MSE), and 95% probability coverage of the confidence interval (CI) in the synthesis of Cox proportional hazard models coefficients in a simulation study. Comparing the results of the simulation study on the MSE, bias, and CI of the estimated coefficients indicated that MMC approach was the most accurate procedure compared to EC, CC, and ZC procedures. The precision ranking of the four approaches according to all above settings was MMC ≥ EC ≥ CC ≥ ZC. This study highlights advantages of MGLS meta-analysis on UM approach. The results suggested the use of MMC procedure to overcome the lack of information for having a complete covariance matrix of the coefficients.

  1. Fault tree handbook

    International Nuclear Information System (INIS)

    Haasl, D.F.; Roberts, N.H.; Vesely, W.E.; Goldberg, F.F.

    1981-01-01

    This handbook describes a methodology for reliability analysis of complex systems such as those which comprise the engineered safety features of nuclear power generating stations. After an initial overview of the available system analysis approaches, the handbook focuses on a description of the deductive method known as fault tree analysis. The following aspects of fault tree analysis are covered: basic concepts for fault tree analysis; basic elements of a fault tree; fault tree construction; probability, statistics, and Boolean algebra for the fault tree analyst; qualitative and quantitative fault tree evaluation techniques; and computer codes for fault tree evaluation. Also discussed are several example problems illustrating the basic concepts of fault tree construction and evaluation

  2. Spatially explicit multi-threat assessment of food tree species in Burkina Faso: A fine-scale approach.

    Directory of Open Access Journals (Sweden)

    Hannes Gaisberger

    Full Text Available Over the last decades agroforestry parklands in Burkina Faso have come under increasing demographic as well as climatic pressures, which are threatening indigenous tree species that contribute substantially to income generation and nutrition in rural households. Analyzing the threats as well as the species vulnerability to them is fundamental for priority setting in conservation planning. Guided by literature and local experts we selected 16 important food tree species (Acacia macrostachya, Acacia senegal, Adansonia digitata, Annona senegalensis, Balanites aegyptiaca, Bombax costatum, Boscia senegalensis, Detarium microcarpum, Lannea microcarpa, Parkia biglobosa, Sclerocarya birrea, Strychnos spinosa, Tamarindus indica, Vitellaria paradoxa, Ximenia americana, Ziziphus mauritiana and six key threats to them (overexploitation, overgrazing, fire, cotton production, mining and climate change. We developed a species-specific and spatially explicit approach combining freely accessible datasets, species distribution models (SDMs, climate models and expert survey results to predict, at fine scale, where these threats are likely to have the greatest impact. We find that all species face serious threats throughout much of their distribution in Burkina Faso and that climate change is predicted to be the most prevalent threat in the long term, whereas overexploitation and cotton production are the most important short-term threats. Tree populations growing in areas designated as 'highly threatened' due to climate change should be used as seed sources for ex situ conservation and planting in areas where future climate is predicting suitable habitats. Assisted regeneration is suggested for populations in areas where suitable habitat under future climate conditions coincides with high threat levels due to short-term threats. In the case of Vitellaria paradoxa, we suggest collecting seed along the northern margins of its distribution and considering assisted

  3. Growth curves of preschool children in the northeast of iran: a population based study using quantile regression approach.

    Science.gov (United States)

    Payande, Abolfazl; Tabesh, Hamed; Shakeri, Mohammad Taghi; Saki, Azadeh; Safarian, Mohammad

    2013-01-14

    Growth charts are widely used to assess children's growth status and can provide a trajectory of growth during early important months of life. The objectives of this study are going to construct growth charts and normal values of weight-for-age for children aged 0 to 5 years using a powerful and applicable methodology. The results compare with the World Health Organization (WHO) references and semi-parametric LMS method of Cole and Green. A total of 70737 apparently healthy boys and girls aged 0 to 5 years were recruited in July 2004 for 20 days from those attending community clinics for routine health checks as a part of a national survey. Anthropometric measurements were done by trained health staff using WHO methodology. The nonparametric quantile regression method obtained by local constant kernel estimation of conditional quantiles curves using for estimation of curves and normal values. The weight-for-age growth curves for boys and girls aged from 0 to 5 years were derived utilizing a population of children living in the northeast of Iran. The results were similar to the ones obtained by the semi-parametric LMS method in the same data. Among all age groups from 0 to 5 years, the median values of children's weight living in the northeast of Iran were lower than the corresponding values in WHO reference data. The weight curves of boys were higher than those of girls in all age groups. The differences between growth patterns of children living in the northeast of Iran versus international ones necessitate using local and regional growth charts. International normal values may not properly recognize the populations at risk for growth problems in Iranian children. Quantile regression (QR) as a flexible method which doesn't require restricted assumptions, proposed for estimation reference curves and normal values.

  4. Comparison of rule induction, decision trees and formal concept analysis approaches for classification

    Science.gov (United States)

    Kotelnikov, E. V.; Milov, V. R.

    2018-05-01

    Rule-based learning algorithms have higher transparency and easiness to interpret in comparison with neural networks and deep learning algorithms. These properties make it possible to effectively use such algorithms to solve descriptive tasks of data mining. The choice of an algorithm depends also on its ability to solve predictive tasks. The article compares the quality of the solution of the problems with binary and multiclass classification based on the experiments with six datasets from the UCI Machine Learning Repository. The authors investigate three algorithms: Ripper (rule induction), C4.5 (decision trees), In-Close (formal concept analysis). The results of the experiments show that In-Close demonstrates the best quality of classification in comparison with Ripper and C4.5, however the latter two generate more compact rule sets.

  5. Genetic Markers Analyses and Bioinformatic Approaches to Distinguish Between Olive Tree (Olea europaea L.) Cultivars.

    Science.gov (United States)

    Ben Ayed, Rayda; Ben Hassen, Hanen; Ennouri, Karim; Rebai, Ahmed

    2016-12-01

    The genetic diversity of 22 olive tree cultivars (Olea europaea L.) sampled from different Mediterranean countries was assessed using 5 SNP markers (FAD2.1; FAD2.3; CALC; SOD and ANTHO3) located in four different genes. The genotyping analysis of the 22 cultivars with 5 SNP loci revealed 11 alleles (average 2.2 per allele). The dendrogram based on cultivar genotypes revealed three clusters consistent with the cultivars classification. Besides, the results obtained with the five SNPs were compared to those obtained with the SSR markers using bioinformatic analyses and by computing a cophenetic correlation coefficient, indicating the usefulness of the UPGMA method for clustering plant genotypes. Based on principal coordinate analysis using a similarity matrix, the first two coordinates, revealed 54.94 % of the total variance. This work provides a more comprehensive explanation of the diversity available in Tunisia olive cultivars, and an important contribution for olive breeding and olive oil authenticity.

  6. A novel decision tree approach based on transcranial Doppler sonography to screen for blunt cervical vascular injuries.

    Science.gov (United States)

    Purvis, Dianna; Aldaghlas, Tayseer; Trickey, Amber W; Rizzo, Anne; Sikdar, Siddhartha

    2013-06-01

    Early detection and treatment of blunt cervical vascular injuries prevent adverse neurologic sequelae. Current screening criteria can miss up to 22% of these injuries. The study objective was to investigate bedside transcranial Doppler sonography for detecting blunt cervical vascular injuries in trauma patients using a novel decision tree approach. This prospective pilot study was conducted at a level I trauma center. Patients undergoing computed tomographic angiography for suspected blunt cervical vascular injuries were studied with transcranial Doppler sonography. Extracranial and intracranial vasculatures were examined with a portable power M-mode transcranial Doppler unit. The middle cerebral artery mean flow velocity, pulsatility index, and their asymmetries were used to quantify flow patterns and develop an injury decision tree screening protocol. Student t tests validated associations between injuries and transcranial Doppler predictive measures. We evaluated 27 trauma patients with 13 injuries. Single vertebral artery injuries were most common (38.5%), followed by single internal carotid artery injuries (30%). Compared to patients without injuries, mean flow velocity asymmetry was higher for single internal carotid artery (P = .003) and single vertebral artery (P = .004) injuries. Similarly, pulsatility index asymmetry was higher in single internal carotid artery (P = .015) and single vertebral artery (P = .042) injuries, whereas the lowest pulsatility index was elevated for bilateral vertebral artery injuries (P = .006). The decision tree yielded 92% specificity, 93% sensitivity, and 93% correct classifications. In this pilot feasibility study, transcranial Doppler measures were significantly associated with the blunt cervical vascular injury status, suggesting that transcranial Doppler sonography might be a viable bedside screening tool for trauma. Patient-specific hemodynamic information from transcranial Doppler assessment has the potential to alter

  7. Financial performance monitoring of the technical efficiency of critical access hospitals: a data envelopment analysis and logistic regression modeling approach.

    Science.gov (United States)

    Wilson, Asa B; Kerr, Bernard J; Bastian, Nathaniel D; Fulton, Lawrence V

    2012-01-01

    From 1980 to 1999, rural designated hospitals closed at a disproportionally high rate. In response to this emergent threat to healthcare access in rural settings, the Balanced Budget Act of 1997 made provisions for the creation of a new rural hospital--the critical access hospital (CAH). The conversion to CAH and the associated cost-based reimbursement scheme significantly slowed the closure rate of rural hospitals. This work investigates which methods can ensure the long-term viability of small hospitals. This article uses a two-step design to focus on a hypothesized relationship between technical efficiency of CAHs and a recently developed set of financial monitors for these entities. The goal is to identify the financial performance measures associated with efficiency. The first step uses data envelopment analysis (DEA) to differentiate efficient from inefficient facilities within a data set of 183 CAHs. Determining DEA efficiency is an a priori categorization of hospitals in the data set as efficient or inefficient. In the second step, DEA efficiency is the categorical dependent variable (efficient = 0, inefficient = 1) in the subsequent binary logistic regression (LR) model. A set of six financial monitors selected from the array of 20 measures were the LR independent variables. We use a binary LR to test the null hypothesis that recently developed CAH financial indicators had no predictive value for categorizing a CAH as efficient or inefficient, (i.e., there is no relationship between DEA efficiency and fiscal performance).

  8. Disease Mapping and Regression with Count Data in the Presence of Overdispersion and Spatial Autocorrelation: A Bayesian Model Averaging Approach

    Science.gov (United States)

    Mohebbi, Mohammadreza; Wolfe, Rory; Forbes, Andrew

    2014-01-01

    This paper applies the generalised linear model for modelling geographical variation to esophageal cancer incidence data in the Caspian region of Iran. The data have a complex and hierarchical structure that makes them suitable for hierarchical analysis using Bayesian techniques, but with care required to deal with problems arising from counts of events observed in small geographical areas when overdispersion and residual spatial autocorrelation are present. These considerations lead to nine regression models derived from using three probability distributions for count data: Poisson, generalised Poisson and negative binomial, and three different autocorrelation structures. We employ the framework of Bayesian variable selection and a Gibbs sampling based technique to identify significant cancer risk factors. The framework deals with situations where the number of possible models based on different combinations of candidate explanatory variables is large enough such that calculation of posterior probabilities for all models is difficult or infeasible. The evidence from applying the modelling methodology suggests that modelling strategies based on the use of generalised Poisson and negative binomial with spatial autocorrelation work well and provide a robust basis for inference. PMID:24413702

  9. Relative accuracy of spatial predictive models for lynx Lynx canadensis derived using logistic regression-AIC, multiple criteria evaluation and Bayesian approaches

    Directory of Open Access Journals (Sweden)

    Shelley M. ALEXANDER

    2009-02-01

    Full Text Available We compared probability surfaces derived using one set of environmental variables in three Geographic Information Systems (GIS-based approaches: logistic regression and Akaike’s Information Criterion (AIC, Multiple Criteria Evaluation (MCE, and Bayesian Analysis (specifically Dempster-Shafer theory. We used lynx Lynx canadensis as our focal species, and developed our environment relationship model using track data collected in Banff National Park, Alberta, Canada, during winters from 1997 to 2000. The accuracy of the three spatial models were compared using a contingency table method. We determined the percentage of cases in which both presence and absence points were correctly classified (overall accuracy, the failure to predict a species where it occurred (omission error and the prediction of presence where there was absence (commission error. Our overall accuracy showed the logistic regression approach was the most accurate (74.51%. The multiple criteria evaluation was intermediate (39.22%, while the Dempster-Shafer (D-S theory model was the poorest (29.90%. However, omission and commission error tell us a different story: logistic regression had the lowest commission error, while D-S theory produced the lowest omission error. Our results provide evidence that habitat modellers should evaluate all three error measures when ascribing confidence in their model. We suggest that for our study area at least, the logistic regression model is optimal. However, where sample size is small or the species is very rare, it may also be useful to explore and/or use a more ecologically cautious modelling approach (e.g. Dempster-Shafer that would over-predict, protect more sites, and thereby minimize the risk of missing critical habitat in conservation plans[Current Zoology 55(1: 28 – 40, 2009].

  10. Classification and Regression Tree Analysis of Clinical Patterns that Predict Survival in 127 Chinese Patients with Advanced Non-small Cell Lung Cancer Treated by Gefitinib Who Failed to Previous Chemotherapy

    Directory of Open Access Journals (Sweden)

    Ziping WANG

    2011-09-01

    Full Text Available Background and objective It has been proven that gefitinib produces only 10%-20% tumor regression in heavily pretreated, unselected non-small cell lung cancer (NSCLC patients as the second- and third-line setting. Asian, female, nonsmokers and adenocarcinoma are favorable factors; however, it is difficult to find a patient satisfying all the above clinical characteristics. The aim of this study is to identify novel predicting factors, and to explore the interactions between clinical variables and their impact on the survival of Chinese patients with advanced NSCLC who were heavily treated with gefitinib in the second- or third-line setting. Methods The clinical and follow-up data of 127 advanced NSCLC patients referred to the Cancer Hospital & Institute, Chinese Academy of Medical Sciences from March 2005 to March 2010 were analyzed. Multivariate analysis of progression-free survival (PFS was performed using recursive partitioning, which is referred to as the classification and regression tree (CART analysis. Results The median PFS of 127 eligible consecutive advanced NSCLC patients was 8.0 months (95%CI: 5.8-10.2. CART was performed with an initial split on first-line chemotherapy outcomes and a second split on patients’ age. Three terminal subgroups were formed. The median PFS of the three subsets ranged from 1.0 month (95%CI: 0.8-1.2 for those with progressive disease outcome after the first-line chemotherapy subgroup, 10 months (95%CI: 7.0-13.0 in patients with a partial response or stable disease in first-line chemotherapy and age <70, and 22.0 months for patients obtaining a partial response or stable disease in first-line chemotherapy at age 70-81 (95%CI: 3.8-40.1. Conclusion Partial response, stable disease in first-line chemotherapy and age ≥ 70 are closely correlated with long-term survival treated by gefitinib as a second- or third-line setting in advanced NSCLC. CART can be used to identify previously unappreciated patient

  11. Optimization of classification and regression analysis of four monoclonal antibodies from Raman spectra using collaborative machine learning approach.

    Science.gov (United States)

    Le, Laetitia Minh Maï; Kégl, Balázs; Gramfort, Alexandre; Marini, Camille; Nguyen, David; Cherti, Mehdi; Tfaili, Sana; Tfayli, Ali; Baillet-Guffroy, Arlette; Prognon, Patrice; Chaminade, Pierre; Caudron, Eric

    2018-07-01

    The use of monoclonal antibodies (mAbs) constitutes one of the most important strategies to treat patients suffering from cancers such as hematological malignancies and solid tumors. These antibodies are prescribed by the physician and prepared by hospital pharmacists. An analytical control enables the quality of the preparations to be ensured. The aim of this study was to explore the development of a rapid analytical method for quality control. The method used four mAbs (Infliximab, Bevacizumab, Rituximab and Ramucirumab) at various concentrations and was based on recording Raman data and coupling them to a traditional chemometric and machine learning approach for data analysis. Compared to conventional linear approach, prediction errors are reduced with a data-driven approach using statistical machine learning methods. In the latter, preprocessing and predictive models are jointly optimized. An additional original aspect of the work involved on submitting the problem to a collaborative data challenge platform called Rapid Analytics and Model Prototyping (RAMP). This allowed using solutions from about 300 data scientists in collaborative work. Using machine learning, the prediction of the four mAbs samples was considerably improved. The best predictive model showed a combined error of 2.4% versus 14.6% using linear approach. The concentration and classification errors were 5.8% and 0.7%, only three spectra were misclassified over the 429 spectra of the test set. This large improvement obtained with machine learning techniques was uniform for all molecules but maximal for Bevacizumab with an 88.3% reduction on combined errors (2.1% versus 17.9%). Copyright © 2018 Elsevier B.V. All rights reserved.

  12. Understanding poisson regression.

    Science.gov (United States)

    Hayat, Matthew J; Higgins, Melinda

    2014-04-01

    Nurse investigators often collect study data in the form of counts. Traditional methods of data analysis have historically approached analysis of count data either as if the count data were continuous and normally distributed or with dichotomization of the counts into the categories of occurred or did not occur. These outdated methods for analyzing count data have been replaced with more appropriate statistical methods that make use of the Poisson probability distribution, which is useful for analyzing count data. The purpose of this article is to provide an overview of the Poisson distribution and its use in Poisson regression. Assumption violations for the standard Poisson regression model are addressed with alternative approaches, including addition of an overdispersion parameter or negative binomial regression. An illustrative example is presented with an application from the ENSPIRE study, and regression modeling of comorbidity data is included for illustrative purposes. Copyright 2014, SLACK Incorporated.

  13. A Forecasting Approach Combining Self-Organizing Map with Support Vector Regression for Reservoir Inflow during Typhoon Periods

    Directory of Open Access Journals (Sweden)

    Gwo-Fong Lin

    2016-01-01

    Full Text Available This study describes the development of a reservoir inflow forecasting model for typhoon events to improve short lead-time flood forecasting performance. To strengthen the forecasting ability of the original support vector machines (SVMs model, the self-organizing map (SOM is adopted to group inputs into different clusters in advance of the proposed SOM-SVM model. Two different input methods are proposed for the SVM-based forecasting method, namely, SOM-SVM1 and SOM-SVM2. The methods are applied to an actual reservoir watershed to determine the 1 to 3 h ahead inflow forecasts. For 1, 2, and 3 h ahead forecasts, improvements in mean coefficient of efficiency (MCE due to the clusters obtained from SOM-SVM1 are 21.5%, 18.5%, and 23.0%, respectively. Furthermore, improvement in MCE for SOM-SVM2 is 20.9%, 21.2%, and 35.4%, respectively. Another SOM-SVM2 model increases the SOM-SVM1 model for 1, 2, and 3 h ahead forecasts obtained improvement increases of 0.33%, 2.25%, and 10.08%, respectively. These results show that the performance of the proposed model can provide improved forecasts of hourly inflow, especially in the proposed SOM-SVM2 model. In conclusion, the proposed model, which considers limit and higher related inputs instead of all inputs, can generate better forecasts in different clusters than are generated from the SOM process. The SOM-SVM2 model is recommended as an alternative to the original SVR (Support Vector Regression model because of its accuracy and robustness.

  14. Investigation of the marked and long-standing spatial inhomogeneity of the Hungarian suicide rate: a spatial regression approach.

    Science.gov (United States)

    Balint, Lajos; Dome, Peter; Daroczi, Gergely; Gonda, Xenia; Rihmer, Zoltan

    2014-02-01

    In the last century Hungary had astonishingly high suicide rates characterized by marked regional within-country inequalities, a spatial pattern which has been quite stable over time. To explain the above phenomenon at the level of micro-regions (n=175) in the period between 2005 and 2011. Our dependent variable was the age and gender standardized mortality ratio (SMR) for suicide while explanatory variables were factors which are supposed to influence suicide risk, such as measures of religious and political integration, travel time accessibility of psychiatric services, alcohol consumption, unemployment and disability pensionery. When applying the ordinary least squared regression model, the residuals were found to be spatially autocorrelated, which indicates the violation of the assumption on the independence of error terms and - accordingly - the necessity of application of a spatial autoregressive (SAR) model to handle this problem. According to our calculations the SARlag model was a better way (versus the SARerr model) of addressing the problem of spatial autocorrelation, furthermore its substantive meaning is more convenient. SMR was significantly associated with the "political integration" variable in a negative and with "lack of religious integration" and "disability pensionery" variables in a positive manner. Associations were not significant for the remaining explanatory variables. Several important psychiatric variables were not available at the level of micro-regions. We conducted our analysis on aggregate data. Our results may draw attention to the relevance and abiding validity of the classic Durkheimian suicide risk factors - such as lack of social integration - apropos of the spatial pattern of Hungarian suicides. © 2013 Published by Elsevier B.V.

  15. An alternative approach to the determination of scaling law expressions for the L–H transition in Tokamaks utilizing classification tools instead of regression

    International Nuclear Information System (INIS)

    Gaudio, P; Gelfusa, M; Lupelli, I; Murari, A; Vega, J

    2014-01-01

    A new approach to determine the power law expressions for the threshold between the H and L mode of confinement is presented. The method is based on two powerful machine learning tools for classification: neural networks and support vector machines. Using as inputs clear examples of the systems on either side of the transition, the machine learning tools learn the input–output mapping corresponding to the equations of the boundary separating the confinement regimes. Systematic tests with synthetic data show that the machine learning tools provide results competitive with traditional statistical regression and more robust against random noise and systematic errors. The developed tools have then been applied to the multi-machine International Tokamak Physics Activity International Global Threshold Database of validated ITER-like Tokamak discharges. The machine learning tools converge on the same scaling law parameters obtained with non-linear regression. On the other hand, the developed tools allow a reduction of 50% of the uncertainty in the extrapolations to ITER. Therefore the proposed approach can effectively complement traditional regression since its application poses much less stringent requirements on the experimental data, to be used to determine the scaling laws, because they do not require examples exactly at the moment of the transition. (paper)

  16. Equivalence among three alternative approaches to estimating live tree carbon stocks in the eastern United States

    Science.gov (United States)

    Coeli M. Hoover; James E. Smith

    2017-01-01

    Assessments of forest carbon are available via multiple alternate tools or applications and are in use to address various regulatory and reporting requirements. The various approaches to making such estimates may or may not be entirely comparable. Knowing how the estimates produced by some commonly used approaches vary across forest types and regions allows users of...

  17. A Two-Step Approach for Analysis of Nonignorable Missing Outcomes in Longitudinal Regression: an Application to Upstate KIDS Study.

    Science.gov (United States)

    Liu, Danping; Yeung, Edwina H; McLain, Alexander C; Xie, Yunlong; Buck Louis, Germaine M; Sundaram, Rajeshwari

    2017-09-01

    Imperfect follow-up in longitudinal studies commonly leads to missing outcome data that can potentially bias the inference when the missingness is nonignorable; that is, the propensity of missingness depends on missing values in the data. In the Upstate KIDS Study, we seek to determine if the missingness of child development outcomes is nonignorable, and how a simple model assuming ignorable missingness would compare with more complicated models for a nonignorable mechanism. To correct for nonignorable missingness, the shared random effects model (SREM) jointly models the outcome and the missing mechanism. However, the computational complexity and lack of software packages has limited its practical applications. This paper proposes a novel two-step approach to handle nonignorable missing outcomes in generalized linear mixed models. We first analyse the missing mechanism with a generalized linear mixed model and predict values of the random effects; then, the outcome model is fitted adjusting for the predicted random effects to account for heterogeneity in the missingness propensity. Extensive simulation studies suggest that the proposed method is a reliable approximation to SREM, with a much faster computation. The nonignorability of missing data in the Upstate KIDS Study is estimated to be mild to moderate, and the analyses using the two-step approach or SREM are similar to the model assuming ignorable missingness. The two-step approach is a computationally straightforward method that can be conducted as sensitivity analyses in longitudinal studies to examine violations to the ignorable missingness assumption and the implications relative to health outcomes. © 2017 John Wiley & Sons Ltd.

  18. Boosted beta regression.

    Directory of Open Access Journals (Sweden)

    Matthias Schmid

    Full Text Available Regression analysis with a bounded outcome is a common problem in applied statistics. Typical examples include regression models for percentage outcomes and the analysis of ratings that are measured on a bounded scale. In this paper, we consider beta regression, which is a generalization of logit models to situations where the response is continuous on the interval (0,1. Consequently, beta regression is a convenient tool for analyzing percentage responses. The classical approach to fit a beta regression model is to use maximum likelihood estimation with subsequent AIC-based variable selection. As an alternative to this established - yet unstable - approach, we propose a new estimation technique called boosted beta regression. With boosted beta regression estimation and variable selection can be carried out simultaneously in a highly efficient way. Additionally, both the mean and the variance of a percentage response can be modeled using flexible nonlinear covariate effects. As a consequence, the new method accounts for common problems such as overdispersion and non-binomial variance structures.

  19. Mapping trees outside forests using high-resolution aerial imagery: a comparison of pixel- and object based classification approaches

    Science.gov (United States)

    Dacia M. Meneguzzo; Greg C. Liknes; Mark D. Nelson

    2013-01-01

    Discrete trees and small groups of trees in nonforest settings are considered an essential resource around the world and are collectively referred to as trees outside forests (ToF). ToF provide important functions across the landscape, such as protecting soil and water resources, providing wildlife habitat, and improving farmstead energy efficiency and aesthetics....

  20. Isokinetic knee strength qualities as predictors of jumping performance in high-level volleyball athletes: multiple regression approach.

    Science.gov (United States)

    Sattler, Tine; Sekulic, Damir; Spasic, Miodrag; Osmankac, Nedzad; Vicente João, Paulo; Dervisevic, Edvin; Hadzic, Vedran

    2016-01-01

    Previous investigations noted potential importance of isokinetic strength in rapid muscular performances, such as jumping. This study aimed to identify the influence of isokinetic-knee-strength on specific jumping performance in volleyball. The secondary aim of the study was to evaluate reliability and validity of the two volleyball-specific jumping tests. The sample comprised 67 female (21.96±3.79 years; 68.26±8.52 kg; 174.43±6.85 cm) and 99 male (23.62±5.27 years; 84.83±10.37 kg; 189.01±7.21 cm) high- volleyball players who competed in 1st and 2nd National Division. Subjects were randomly divided into validation (N.=55 and 33 for males and females, respectively) and cross-validation subsamples (N.=54 and 34 for males and females, respectively). Set of predictors included isokinetic tests, to evaluate the eccentric and concentric strength capacities of the knee extensors, and flexors for dominant and non-dominant leg. The main outcome measure for the isokinetic testing was peak torque (PT) which was later normalized for body mass and expressed as PT/Kg. Block-jump and spike-jump performances were measured over three trials, and observed as criteria. Forward stepwise multiple regressions were calculated for validation subsamples and then cross-validated. Cross validation included correlations between and t-test differences between observed and predicted scores; and Bland Altman graphics. Jumping tests were found to be reliable (spike jump: ICC of 0.79 and 0.86; block-jump: ICC of 0.86 and 0.90; for males and females, respectively), and their validity was confirmed by significant t-test differences between 1st vs. 2nd division players. Isokinetic variables were found to be significant predictors of jumping performance in females, but not among males. In females, the isokinetic-knee measures were shown to be stronger and more valid predictors of the block-jump (42% and 64% of the explained variance for validation and cross-validation subsample, respectively

  1. An observation-based progression modeling approach to spring and autumn deciduous tree phenology

    Science.gov (United States)

    Yu, Rong; Schwartz, Mark D.; Donnelly, Alison; Liang, Liang

    2016-03-01

    It is important to accurately determine the response of spring and autumn phenology to climate change in forest ecosystems, as phenological variations affect carbon balance, forest productivity, and biodiversity. We observed phenology intensively throughout spring and autumn in a temperate deciduous woodlot at Milwaukee, WI, USA, during 2007-2012. Twenty-four phenophase levels in spring and eight in autumn were recorded for 106 trees, including white ash, basswood, white oak, boxelder, red oak, and hophornbeam. Our phenological progression models revealed that accumulated degree-days and day length explained 87.9-93.4 % of the variation in spring canopy development and 75.8-89.1 % of the variation in autumn senescence. In addition, the timing of community-level spring and autumn phenophases and the length of the growing season from 1871 to 2012 were reconstructed with the models developed. All simulated spring phenophases significantly advanced at a rate from 0.24 to 0.48 days/decade ( p ≤ 0.001) during the 1871-2012 period and from 1.58 to 2.00 days/decade ( p coloration) and 0.50 (full-leaf coloration) days/decade ( p coloration and leaf fall, and suggested accelerating simulated ecosystem responses to climate warming over the last four decades in comparison to the past 142 years.

  2. Output-Only Modal Parameter Recursive Estimation of Time-Varying Structures via a Kernel Ridge Regression FS-TARMA Approach

    Directory of Open Access Journals (Sweden)

    Zhi-Sai Ma

    2017-01-01

    Full Text Available Modal parameter estimation plays an important role in vibration-based damage detection and is worth more attention and investigation, as changes in modal parameters are usually being used as damage indicators. This paper focuses on the problem of output-only modal parameter recursive estimation of time-varying structures based upon parameterized representations of the time-dependent autoregressive moving average (TARMA. A kernel ridge regression functional series TARMA (FS-TARMA recursive identification scheme is proposed and subsequently employed for the modal parameter estimation of a numerical three-degree-of-freedom time-varying structural system and a laboratory time-varying structure consisting of a simply supported beam and a moving mass sliding on it. The proposed method is comparatively assessed against an existing recursive pseudolinear regression FS-TARMA approach via Monte Carlo experiments and shown to be capable of accurately tracking the time-varying dynamics in a recursive manner.

  3. Combination of individual tree detection and area-based approach in imputation of forest variables using airborne laser data

    Science.gov (United States)

    Vastaranta, Mikko; Kankare, Ville; Holopainen, Markus; Yu, Xiaowei; Hyyppä, Juha; Hyyppä, Hannu

    2012-01-01

    The two main approaches to deriving forest variables from laser-scanning data are the statistical area-based approach (ABA) and individual tree detection (ITD). With ITD it is feasible to acquire single tree information, as in field measurements. Here, ITD was used for measuring training data for the ABA. In addition to automatic ITD (ITD auto), we tested a combination of ITD auto and visual interpretation (ITD visual). ITD visual had two stages: in the first, ITD auto was carried out and in the second, the results of the ITD auto were visually corrected by interpreting three-dimensional laser point clouds. The field data comprised 509 circular plots ( r = 10 m) that were divided equally for testing and training. ITD-derived forest variables were used for training the ABA and the accuracies of the k-most similar neighbor ( k-MSN) imputations were evaluated and compared with the ABA trained with traditional measurements. The root-mean-squared error (RMSE) in the mean volume was 24.8%, 25.9%, and 27.2% with the ABA trained with field measurements, ITD auto, and ITD visual, respectively. When ITD methods were applied in acquiring training data, the mean volume, basal area, and basal area-weighted mean diameter were underestimated in the ABA by 2.7-9.2%. This project constituted a pilot study for using ITD measurements as training data for the ABA. Further studies are needed to reduce the bias and to determine the accuracy obtained in imputation of species-specific variables. The method could be applied in areas with sparse road networks or when the costs of fieldwork must be minimized.

  4. Application Of Decision Tree Approach To Student Selection Model- A Case Study

    Science.gov (United States)

    Harwati; Sudiya, Amby

    2016-01-01

    The main purpose of the institution is to provide quality education to the students and to improve the quality of managerial decisions. One of the ways to improve the quality of students is to arrange the selection of new students with a more selective. This research takes the case in the selection of new students at Islamic University of Indonesia, Yogyakarta, Indonesia. One of the university's selection is through filtering administrative selection based on the records of prospective students at the high school without paper testing. Currently, that kind of selection does not yet has a standard model and criteria. Selection is only done by comparing candidate application file, so the subjectivity of assessment is very possible to happen because of the lack standard criteria that can differentiate the quality of students from one another. By applying data mining techniques classification, can be built a model selection for new students which includes criteria to certain standards such as the area of origin, the status of the school, the average value and so on. These criteria are determined by using rules that appear based on the classification of the academic achievement (GPA) of the students in previous years who entered the university through the same way. The decision tree method with C4.5 algorithm is used here. The results show that students are given priority for admission is that meet the following criteria: came from the island of Java, public school, majoring in science, an average value above 75, and have at least one achievement during their study in high school.

  5. An iteratively reweighted least-squares approach to adaptive robust adjustment of parameters in linear regression models with autoregressive and t-distributed deviations

    Science.gov (United States)

    Kargoll, Boris; Omidalizarandi, Mohammad; Loth, Ina; Paffenholz, Jens-André; Alkhatib, Hamza

    2018-03-01

    In this paper, we investigate a linear regression time series model of possibly outlier-afflicted observations and autocorrelated random deviations. This colored noise is represented by a covariance-stationary autoregressive (AR) process, in which the independent error components follow a scaled (Student's) t-distribution. This error model allows for the stochastic modeling of multiple outliers and for an adaptive robust maximum likelihood (ML) estimation of the unknown regression and AR coefficients, the scale parameter, and the degree of freedom of the t-distribution. This approach is meant to be an extension of known estimators, which tend to focus only on the regression model, or on the AR error model, or on normally distributed errors. For the purpose of ML estimation, we derive an expectation conditional maximization either algorithm, which leads to an easy-to-implement version of iteratively reweighted least squares. The estimation performance of the algorithm is evaluated via Monte Carlo simulations for a Fourier as well as a spline model in connection with AR colored noise models of different orders and with three different sampling distributions generating the white noise components. We apply the algorithm to a vibration dataset recorded by a high-accuracy, single-axis accelerometer, focusing on the evaluation of the estimated AR colored noise model.

  6. Parametric optimization of multiple quality characteristics in laser cutting of Inconel-718 by using hybrid approach of multiple regression analysis and genetic algorithm

    Science.gov (United States)

    Shrivastava, Prashant Kumar; Pandey, Arun Kumar

    2018-06-01

    Inconel-718 has found high demand in different industries due to their superior mechanical properties. The traditional cutting methods are facing difficulties for cutting these alloys due to their low thermal potential, lower elasticity and high chemical compatibility at inflated temperature. The challenges of machining and/or finishing of unusual shapes and/or sizes in these materials have also faced by traditional machining. Laser beam cutting may be applied for the miniaturization and ultra-precision cutting and/or finishing by appropriate control of different process parameter. This paper present multi-objective optimization the kerf deviation, kerf width and kerf taper in the laser cutting of Incone-718 sheet. The second order regression models have been developed for different quality characteristics by using the experimental data obtained through experimentation. The regression models have been used as objective function for multi-objective optimization based on the hybrid approach of multiple regression analysis and genetic algorithm. The comparison of optimization results to experimental results shows an improvement of 88%, 10.63% and 42.15% in kerf deviation, kerf width and kerf taper, respectively. Finally, the effects of different process parameters on quality characteristics have also been discussed.

  7. Random Forest as a Predictive Analytics Alternative to Regression in Institutional Research

    Science.gov (United States)

    He, Lingjun; Levine, Richard A.; Fan, Juanjuan; Beemer, Joshua; Stronach, Jeanne

    2018-01-01

    In institutional research, modern data mining approaches are seldom considered to address predictive analytics problems. The goal of this paper is to highlight the advantages of tree-based machine learning algorithms over classic (logistic) regression methods for data-informed decision making in higher education problems, and stress the success of…

  8. A fuzzy-based reliability approach to evaluate basic events of fault tree analysis for nuclear power plant probabilistic safety assessment

    International Nuclear Information System (INIS)

    Purba, Julwan Hendry

    2014-01-01

    Highlights: • We propose a fuzzy-based reliability approach to evaluate basic event reliabilities. • It implements the concepts of failure possibilities and fuzzy sets. • Experts evaluate basic event failure possibilities using qualitative words. • Triangular fuzzy numbers mathematically represent qualitative failure possibilities. • It is a very good alternative for conventional reliability approach. - Abstract: Fault tree analysis has been widely utilized as a tool for nuclear power plant probabilistic safety assessment. This analysis can be completed only if all basic events of the system fault tree have their quantitative failure rates or failure probabilities. However, it is difficult to obtain those failure data due to insufficient data, environment changing or new components. This study proposes a fuzzy-based reliability approach to evaluate basic events of system fault trees whose failure precise probability distributions of their lifetime to failures are not available. It applies the concept of failure possibilities to qualitatively evaluate basic events and the concept of fuzzy sets to quantitatively represent the corresponding failure possibilities. To demonstrate the feasibility and the effectiveness of the proposed approach, the actual basic event failure probabilities collected from the operational experiences of the David–Besse design of the Babcock and Wilcox reactor protection system fault tree are used to benchmark the failure probabilities generated by the proposed approach. The results confirm that the proposed fuzzy-based reliability approach arises as a suitable alternative for the conventional probabilistic reliability approach when basic events do not have the corresponding quantitative historical failure data for determining their reliability characteristics. Hence, it overcomes the limitation of the conventional fault tree analysis for nuclear power plant probabilistic safety assessment

  9. Toward a simple, repeatable, non-destructive approach to measuring stable-isotope ratios of water within tree stems

    Science.gov (United States)

    Raulerson, S.; Volkmann, T.; Pangle, L. A.

    2017-12-01

    Traditional methodologies for measuring ratios of stable isotopes within the xylem water of trees involve destructive coring of the stem. A recent approach involves permanently installed probes within the stem, and an on-site assembly of pumps, switching valves, gas lines, and climate-controlled structure for field deployment of a laser spectrometer. The former method limits the possible temporal resolution of sampling, and sample size, while the latter may not be feasible for many research groups. We present results from initial laboratory efforts towards developing a non-destructive, temporally-resolved technique for measuring stable isotope ratios within the xylem flow of trees. Researchers have used direct liquid-vapor equilibration as a method to measure isotope ratios of the water in soil pores. Typically, this is done by placing soil samples in a fixed container, and allowing the liquid water within the soil to come into isotopic equilibrium with the headspace of the container. Water can also be removed via cryogenic distillation or azeotropic distillation, with the resulting liquid tested for isotope ratios. Alternatively, the isotope ratios of the water vapor can be directly measured using a laser-based water vapor isotope analyzer. Well-established fractionation factors and the isotope ratios in the vapor phase are then used to calculate the isotope ratios in the liquid phase. We propose a setup which would install a single, removable chamber onto a tree, where vapor samples could non-destructively and repeatedly be taken. These vapor samples will be injected into a laser-based isotope analyzer by a recirculating gas conveyance system. A major part of what is presented here is in the procedure of taking vapor samples at 100% relative humidity, appropriately diluting them with completely dry N2 calibration gas, and injecting them into the gas conveyance system without inducing fractionation in the process. This methodology will be helpful in making

  10. Collaborative regression.

    Science.gov (United States)

    Gross, Samuel M; Tibshirani, Robert

    2015-04-01

    We consider the scenario where one observes an outcome variable and sets of features from multiple assays, all measured on the same set of samples. One approach that has been proposed for dealing with these type of data is "sparse multiple canonical correlation analysis" (sparse mCCA). All of the current sparse mCCA techniques are biconvex and thus have no guarantees about reaching a global optimum. We propose a method for performing sparse supervised canonical correlation analysis (sparse sCCA), a specific case of sparse mCCA when one of the datasets is a vector. Our proposal for sparse sCCA is convex and thus does not face the same difficulties as the other methods. We derive efficient algorithms for this problem that can be implemented with off the shelf solvers, and illustrate their use on simulated and real data. © The Author 2014. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  11. Two-step superresolution approach for surveillance face image through radial basis function-partial least squares regression and locality-induced sparse representation

    Science.gov (United States)

    Jiang, Junjun; Hu, Ruimin; Han, Zhen; Wang, Zhongyuan; Chen, Jun

    2013-10-01

    Face superresolution (SR), or face hallucination, refers to the technique of generating a high-resolution (HR) face image from a low-resolution (LR) one with the help of a set of training examples. It aims at transcending the limitations of electronic imaging systems. Applications of face SR include video surveillance, in which the individual of interest is often far from cameras. A two-step method is proposed to infer a high-quality and HR face image from a low-quality and LR observation. First, we establish the nonlinear relationship between LR face images and HR ones, according to radial basis function and partial least squares (RBF-PLS) regression, to transform the LR face into the global face space. Then, a locality-induced sparse representation (LiSR) approach is presented to enhance the local facial details once all the global faces for each LR training face are constructed. A comparison of some state-of-the-art SR methods shows the superiority of the proposed two-step approach, RBF-PLS global face regression followed by LiSR-based local patch reconstruction. Experiments also demonstrate the effectiveness under both simulation conditions and some real conditions.

  12. Measuring Response Styles Across the Big Five: A Multiscale Extension of an Approach Using Multinomial Processing Trees.

    Science.gov (United States)

    Khorramdel, Lale; von Davier, Matthias

    2014-01-01

    This study shows how to address the problem of trait-unrelated response styles (RS) in rating scales using multidimensional item response theory. The aim is to test and correct data for RS in order to provide fair assessments of personality. Expanding on an approach presented by Böckenholt (2012), observed rating data are decomposed into multiple response processes based on a multinomial processing tree. The data come from a questionnaire consisting of 50 items of the International Personality Item Pool measuring the Big Five dimensions administered to 2,026 U.S. students with a 5-point rating scale. It is shown that this approach can be used to test if RS exist in the data and that RS can be differentiated from trait-related responses. Although the extreme RS appear to be unidimensional after exclusion of only 1 item, a unidimensional measure for the midpoint RS is obtained only after exclusion of 10 items. Both RS measurements show high cross-scale correlations and item response theory-based (marginal) reliabilities. Cultural differences could be found in giving extreme responses. Moreover, it is shown how to score rating data to correct for RS after being proved to exist in the data.

  13. Live tree carbon stock equivalence of fire and fuels extension to the Forest Vegetation Simulator and Forest Inventory and Analysis approaches

    Science.gov (United States)

    James E. Smith; Coeli M. Hoover

    2017-01-01

    The carbon reports in the Fire and Fuels Extension (FFE) to the Forest Vegetation Simulator (FVS) provide two alternate approaches to carbon estimates for live trees (Rebain 2010). These are (1) the FFE biomass algorithms, which are volumebased biomass equations, and (2) the Jenkins allometric equations (Jenkins and others 2003), which are diameter based. Here, we...

  14. Mechanisms of neuroblastoma regression

    Science.gov (United States)

    Brodeur, Garrett M.; Bagatell, Rochelle

    2014-01-01

    Recent genomic and biological studies of neuroblastoma have shed light on the dramatic heterogeneity in the clinical behaviour of this disease, which spans from spontaneous regression or differentiation in some patients, to relentless disease progression in others, despite intensive multimodality therapy. This evidence also suggests several possible mechanisms to explain the phenomena of spontaneous regression in neuroblastomas, including neurotrophin deprivation, humoral or cellular immunity, loss of telomerase activity and alterations in epigenetic regulation. A better understanding of the mechanisms of spontaneous regression might help to identify optimal therapeutic approaches for patients with these tumours. Currently, the most druggable mechanism is the delayed activation of developmentally programmed cell death regulated by the tropomyosin receptor kinase A pathway. Indeed, targeted therapy aimed at inhibiting neurotrophin receptors might be used in lieu of conventional chemotherapy or radiation in infants with biologically favourable tumours that require treatment. Alternative approaches consist of breaking immune tolerance to tumour antigens or activating neurotrophin receptor pathways to induce neuronal differentiation. These approaches are likely to be most effective against biologically favourable tumours, but they might also provide insights into treatment of biologically unfavourable tumours. We describe the different mechanisms of spontaneous neuroblastoma regression and the consequent therapeutic approaches. PMID:25331179

  15. Application of the Decision Tree Modeling Approach to Evaluation of Proliferation Resistance

    International Nuclear Information System (INIS)

    Coles, Garill A.; Zentner, Michael D.

    2007-01-01

    An experts group was created in 2002 by The Generation IV International Forum for the purpose of developing an internationally accepted methodology for assessing the proliferation resistance of a nuclear energy system (NES) and its individual elements. After three years of work, and some limited demonstration studies, a pilot study was initiated to exercise the methodologies being developed by assessing the proliferation resistance of a specific portion of a hypothetical NES called the Example Sodium Fast Reactor (ESFR). This paper summarizes the assessment approach, and describes the next steps to be taken in the development of the methodology.

  16. Coded Splitting Tree Protocols

    DEFF Research Database (Denmark)

    Sørensen, Jesper Hemming; Stefanovic, Cedomir; Popovski, Petar

    2013-01-01

    This paper presents a novel approach to multiple access control called coded splitting tree protocol. The approach builds on the known tree splitting protocols, code structure and successive interference cancellation (SIC). Several instances of the tree splitting protocol are initiated, each...... instance is terminated prematurely and subsequently iterated. The combined set of leaves from all the tree instances can then be viewed as a graph code, which is decodable using belief propagation. The main design problem is determining the order of splitting, which enables successful decoding as early...

  17. Efficiency of Individual Tree Detection Approaches Based on Light-Weight and Low-Cost UAS Imagery in Australian Savannas

    Directory of Open Access Journals (Sweden)

    Grigorijs Goldbergs

    2018-01-01

    Full Text Available The reliability of airborne light detection and ranging (LiDAR for delineating individual trees and estimating aboveground biomass (AGB has been proven in a diverse range of ecosystems, but can be difficult and costly to commission. Point clouds derived from structure from motion (SfM matching techniques obtained from unmanned aerial systems (UAS could be a feasible low-cost alternative to airborne LiDAR scanning for canopy parameter retrieval. This study assesses the extent to which SfM three-dimensional (3D point clouds—obtained from a light-weight mini-UAS quadcopter with an inexpensive consumer action GoPro camera—can efficiently and effectively detect individual trees, measure tree heights, and provide AGB estimates in Australian tropical savannas. Two well-established canopy maxima and watershed segmentation tree detection algorithms were tested on canopy height models (CHM derived from SfM imagery. The influence of CHM spatial resolution on tree detection accuracy was analysed, and the results were validated against existing high-resolution airborne LiDAR data. We found that the canopy maxima and watershed segmentation routines produced similar tree detection rates (~70% for dominant and co-dominant trees, but yielded low detection rates (<35% for suppressed and small trees due to poor representativeness in point clouds and overstory occlusion. Although airborne LiDAR provides higher tree detection rates and more accurate estimates of tree heights, we found SfM image matching to be an adequate low-cost alternative for the detection of dominant and co-dominant tree stands.

  18. Evaluating risk factors for endemic human Salmonella Enteritidis infections with different phage types in Ontario, Canada using multinomial logistic regression and a case-case study approach

    Directory of Open Access Journals (Sweden)

    Varga Csaba

    2012-10-01

    Full Text Available Abstract Background Identifying risk factors for Salmonella Enteritidis (SE infections in Ontario will assist public health authorities to design effective control and prevention programs to reduce the burden of SE infections. Our research objective was to identify risk factors for acquiring SE infections with various phage types (PT in Ontario, Canada. We hypothesized that certain PTs (e.g., PT8 and PT13a have specific risk factors for infection. Methods Our study included endemic SE cases with various PTs whose isolates were submitted to the Public Health Laboratory-Toronto from January 20th to August 12th, 2011. Cases were interviewed using a standardized questionnaire that included questions pertaining to demographics, travel history, clinical symptoms, contact with animals, and food exposures. A multinomial logistic regression method using the Generalized Linear Latent and Mixed Model procedure and a case-case study design were used to identify risk factors for acquiring SE infections with various PTs in Ontario, Canada. In the multinomial logistic regression model, the outcome variable had three categories representing human infections caused by SE PT8, PT13a, and all other SE PTs (i.e., non-PT8/non-PT13a as a referent category to which the other two categories were compared. Results In the multivariable model, SE PT8 was positively associated with contact with dogs (OR=2.17, 95% CI 1.01-4.68 and negatively associated with pepper consumption (OR=0.35, 95% CI 0.13-0.94, after adjusting for age categories and gender, and using exposure periods and health regions as random effects to account for clustering. Conclusions Our study findings offer interesting hypotheses about the role of phage type-specific risk factors. Multinomial logistic regression analysis and the case-case study approach are novel methodologies to evaluate associations among SE infections with different PTs and various risk factors.

  19. Refining discordant gene trees.

    Science.gov (United States)

    Górecki, Pawel; Eulenstein, Oliver

    2014-01-01

    Evolutionary studies are complicated by discordance between gene trees and the species tree in which they evolved. Dealing with discordant trees often relies on comparison costs between gene and species trees, including the well-established Robinson-Foulds, gene duplication, and deep coalescence costs. While these costs have provided credible results for binary rooted gene trees, corresponding cost definitions for non-binary unrooted gene trees, which are frequently occurring in practice, are challenged by biological realism. We propose a natural extension of the well-established costs for comparing unrooted and non-binary gene trees with rooted binary species trees using a binary refinement model. For the duplication cost we describe an efficient algorithm that is based on a linear time reduction and also computes an optimal rooted binary refinement of the given gene tree. Finally, we show that similar reductions lead to solutions for computing the deep coalescence and the Robinson-Foulds costs. Our binary refinement of Robinson-Foulds, gene duplication, and deep coalescence costs for unrooted and non-binary gene trees together with the linear time reductions provided here for computing these costs significantly extends the range of trees that can be incorporated into approaches dealing with discordance.

  20. Spatial variability of excess mortality during prolonged dust events in a high-density city: a time-stratified spatial regression approach.

    Science.gov (United States)

    Wong, Man Sing; Ho, Hung Chak; Yang, Lin; Shi, Wenzhong; Yang, Jinxin; Chan, Ta-Chien

    2017-07-24

    Dust events have long been recognized to be associated with a higher mortality risk. However, no study has investigated how prolonged dust events affect the spatial variability of mortality across districts in a downwind city. In this study, we applied a spatial regression approach to estimate the district-level mortality during two extreme dust events in Hong Kong. We compared spatial and non-spatial models to evaluate the ability of each regression to estimate mortality. We also compared prolonged dust events with non-dust events to determine the influences of community factors on mortality across the city. The density of a built environment (estimated by the sky view factor) had positive association with excess mortality in each district, while socioeconomic deprivation contributed by lower income and lower education induced higher mortality impact in each territory planning unit during a prolonged dust event. Based on the model comparison, spatial error modelling with the 1st order of queen contiguity consistently outperformed other models. The high-risk areas with higher increase in mortality were located in an urban high-density environment with higher socioeconomic deprivation. Our model design shows the ability to predict spatial variability of mortality risk during an extreme weather event that is not able to be estimated based on traditional time-series analysis or ecological studies. Our spatial protocol can be used for public health surveillance, sustainable planning and disaster preparation when relevant data are available.

  1. How does warming affect carbon allocation, respiration and residence time in trees? An isotope tracer approach in a eucalypt

    Science.gov (United States)

    Pendall, E.; Drake, J. E.; Furze, M.; Barton, C. V.; Carillo, Y.; Richter, A.; Tjoelker, M. G.

    2017-12-01

    Climate warming has the potential to alter the balance between photosynthetic carbon assimilation and respiratory losses in forest trees, leading to uncertainty in predicting their future physiological functioning. In a previous experiment, warming decreased canopy CO2 assimilation (A) rates of Eucalyptus tereticornis trees, but respiration (R) rates were usually not significantly affected, due to physiological acclimation to temperature. This led to a slight increase in (R/A) and thus decrease in plant carbon use efficiency with climate warming. In contrast to carbon fluxes, the effect of warming on carbon allocation and residence time in trees has received less attention. We conducted a study to test the hypothesis that warming would decrease the allocation of C belowground owing to reduced cost of nutrient uptake. E. parramattensis trees were grown in the field in unique whole-tree chambers operated at ambient and ambient +3 °C temperature treatments (n=3 per treatment). We applied a 13CO2 pulse and followed the label in CO2 respired from leaves, roots, canopy and soil, in plant sugars, and in rhizosphere microbes over a 3-week period in conjunction with measurements of tree growth. The 9-m tall, 57 m3 whole-tree chambers were monitored for CO2 concentrations in independent canopy and below ground (root and soil) compartments; periodic monitoring of δ13C values in air in the compartments allowed us to quantify the amount of 13CO2 assimilated and respired by each tree. Warmed trees grew faster and assimilated more of the label than control trees, but the 13C allocation to canopy, root and soil respiration was not altered. However, warming appeared to reduce the residence time of carbon respired from leaves, and especially from roots and soil, indicating that autotrophic respiration has the potential to feedback to climate change. This experiment provides insights into how warming may affect the fate of assimilated carbon from the leaf to the ecosystem scale.

  2. Tree-Based Ecosystem Approaches (TBEAs as Multi-Functional Land Management Strategies—Evidence from Rwanda

    Directory of Open Access Journals (Sweden)

    Miyuki Iiyama

    2018-04-01

    Full Text Available Densely populated rural areas in the East African Highlands have faced significant intensification challenges under extreme population pressure on their land and ecosystems. Sustainable agricultural intensification, in the context of increasing cropping intensities, is a prerequisite for deliberate land management strategies that deliver multiple ecosystem goods (food, energy, income sources, etc. and services (especially improving soil conditions on the same land, as well as system resilience, if adopted at scale. Tree based ecosystem approaches (TBEAs are among such multi-functional land management strategies. Knowledge on the multi-functionality of TBEAs and on their scaling up, however, remains severely limited due to several methodological challenges. This study aims at offering an analytical perspective to view multi-functional TBEAs as an integral part of sustainable agricultural intensification. The study proposes a conceptual framework to guide the analysis of socio-economic data and applies it to cross-site analysis of TBEAs in extremely densely populated Rwanda. Heterogeneous TBEAs were identified across Rwanda’s different agro-ecological zones to meet locally-specific smallholders’ needs for a set of ecosystem goods and services on the same land. The sustained adoption of TBEAs would be guaranteed if farmers subjectively recognize their compatibility and synergy with sustainable intensification of existing farming systems, supported by favorable institutional conditions.

  3. Accounting for selection bias in species distribution models: An econometric approach on forested trees based on structural modeling

    Science.gov (United States)

    Ay, Jean-Sauveur; Guillemot, Joannès; Martin-StPaul, Nicolas K.; Doyen, Luc; Leadley, Paul

    2015-04-01

    Species distribution models (SDMs) are widely used to study and predict the outcome of global change on species. In human dominated ecosystems the presence of a given species is the result of both its ecological suitability and human footprint on nature such as land use choices. Land use choices may thus be responsible for a selection bias in the presence/absence data used in SDM calibration. We present a structural modelling approach (i.e. based on structural equation modelling) that accounts for this selection bias. The new structural species distribution model (SSDM) estimates simultaneously land use choices and species responses to bioclimatic variables. A land use equation based on an econometric model of landowner choices was joined to an equation of species response to bioclimatic variables. SSDM allows the residuals of both equations to be dependent, taking into account the possibility of shared omitted variables and measurement errors. We provide a general description of the statistical theory and a set of application on forested trees over France using databases of climate and forest inventory at different spatial resolution (from 2km to 8 km). We also compared the output of the SSDM with outputs of a classical SDM in term of bioclimatic response curves and potential distribution under current climate. According to the species and the spatial resolution of the calibration dataset, shapes of bioclimatic response curves the modelled species distribution maps differed markedly between the SSDM and classical SDMs. The magnitude and directions of these differences were dependent on the correlations between the errors from both equations and were highest for higher spatial resolutions. A first conclusion is that the use of classical SDMs can potentially lead to strong miss-estimation of the actual and future probability of presence modelled. Beyond this selection bias, the SSDM we propose represents a crucial step to account for economic constraints on tree

  4. Methods for identifying SNP interactions: a review on variations of Logic Regression, Random Forest and Bayesian logistic regression.

    Science.gov (United States)

    Chen, Carla Chia-Ming; Schwender, Holger; Keith, Jonathan; Nunkesser, Robin; Mengersen, Kerrie; Macrossan, Paula

    2011-01-01

    Due to advancements in computational ability, enhanced technology and a reduction in the price of genotyping, more data are being generated for understanding genetic associations with diseases and disorders. However, with the availability of large data sets comes the inherent challenges of new methods of statistical analysis and modeling. Considering a complex phenotype may be the effect of a combination of multiple loci, various statistical methods have been developed for identifying genetic epistasis effects. Among these methods, logic regression (LR) is an intriguing approach incorporating tree-like structures. Various methods have built on the original LR to improve different aspects of the model. In this study, we review four variations of LR, namely Logic Feature Selection, Monte Carlo Logic Regression, Genetic Programming for Association Studies, and Modified Logic Regression-Gene Expression Programming, and investigate the performance of each method using simulated and real genotype data. We contrast these with another tree-like approach, namely Random Forests, and a Bayesian logistic regression with stochastic search variable selection.

  5. Human dental age estimation using third molar developmental stages: does a Bayesian approach outperform regression models to discriminate between juveniles and adults?

    Science.gov (United States)

    Thevissen, P W; Fieuws, S; Willems, G

    2010-01-01

    Dental age estimation methods based on the radiologically detected third molar developmental stages are implemented in forensic age assessments to discriminate between juveniles and adults considering the judgment of young unaccompanied asylum seekers. Accurate and unbiased age estimates combined with appropriate quantified uncertainties are the required properties for accurate forensic reporting. In this study, a subset of 910 individuals uniformly distributed in age between 16 and 22 years was selected from an existing dataset collected by Gunst et al. containing 2,513 panoramic radiographs with known third molar developmental stages of Belgian Caucasian men and women. This subset was randomly split in a training set to develop a classical regression analysis and a Bayesian model for the multivariate distribution of the third molar developmental stages conditional on age and in a test set to assess the performance of both models. The aim of this study was to verify if the Bayesian approach differentiates the age of maturity more precisely and removes the bias, which disadvantages the systematically overestimated young individuals. The Bayesian model offers the discrimination of subjects being older than 18 years more appropriate and produces more meaningful prediction intervals but does not strongly outperform the classical approaches.

  6. IND - THE IND DECISION TREE PACKAGE

    Science.gov (United States)

    Buntine, W.

    1994-01-01

    A common approach to supervised classification and prediction in artificial intelligence and statistical pattern recognition is the use of decision trees. A tree is "grown" from data using a recursive partitioning algorithm to create a tree which has good prediction of classes on new data. Standard algorithms are CART (by Breiman Friedman, Olshen and Stone) and ID3 and its successor C4 (by Quinlan). As well as reimplementing parts of these algorithms and offering experimental control suites, IND also introduces Bayesian and MML methods and more sophisticated search in growing trees. These produce more accurate class probability estimates that are important in applications like diagnosis. IND is applicable to most data sets consisting of independent instances, each described by a fixed length vector of attribute values. An attribute value may be a number, one of a set of attribute specific symbols, or it may be omitted. One of the attributes is delegated the "target" and IND grows trees to predict the target. Prediction can then be done on new data or the decision tree printed out for inspection. IND provides a range of features and styles with convenience for the casual user as well as fine-tuning for the advanced user or those interested in research. IND can be operated in a CART-like mode (but without regression trees, surrogate splits or multivariate splits), and in a mode like the early version of C4. Advanced features allow more extensive search, interactive control and display of tree growing, and Bayesian and MML algorithms for tree pruning and smoothing. These often produce more accurate class probability estimates at the leaves. IND also comes with a comprehensive experimental control suite. IND consists of four basic kinds of routines: data manipulation routines, tree generation routines, tree testing routines, and tree display routines. The data manipulation routines are used to partition a single large data set into smaller training and test sets. The

  7. An integrated unscented kalman filter and relevance vector regression approach for lithium-ion battery remaining useful life and short-term capacity prediction

    International Nuclear Information System (INIS)

    Zheng, Xiujuan; Fang, Huajing

    2015-01-01

    The gradual decreasing capacity of lithium-ion batteries can serve as a health indicator for tracking the degradation of lithium-ion batteries. It is important to predict the capacity of a lithium-ion battery for future cycles to assess its health condition and remaining useful life (RUL). In this paper, a novel method is developed using unscented Kalman filter (UKF) with relevance vector regression (RVR) and applied to RUL and short-term capacity prediction of batteries. A RVR model is employed as a nonlinear time-series prediction model to predict the UKF future residuals which otherwise remain zero during the prediction period. Taking the prediction step into account, the predictive value through the RVR method and the latest real residual value constitute the future evolution of the residuals with a time-varying weighting scheme. Next, the future residuals are utilized by UKF to recursively estimate the battery parameters for predicting RUL and short-term capacity. Finally, the performance of the proposed method is validated and compared to other predictors with the experimental data. According to the experimental and analysis results, the proposed approach has high reliability and prediction accuracy, which can be applied to battery monitoring and prognostics, as well as generalized to other prognostic applications. - Highlights: • An integrated method is proposed for RUL prediction as well as short-term capacity prediction. • Relevance vector regression model is employed as a nonlinear time-series prediction model. • Unscented Kalman filter is used to recursively update the states for battery model parameters during the prediction. • A time-varying weighting scheme is utilized to improve the accuracy of the RUL prediction. • The proposed method demonstrates high reliability and prediction accuracy.

  8. Review of cause-based decision tree approach for the development of domestic standard human reliability analysis procedure in low power/shutdown operation probabilistic safety assessment

    International Nuclear Information System (INIS)

    Kang, D. I.; Jung, W. D.

    2003-01-01

    We review the Cause-Based Decision Tree (CBDT) approach to decide whether we incorporate it or not for the development of domestic standard Human Reliability Analysis (HRA) procedure in low power/shutdown operation Probabilistic Safety Assessment (PSA). In this paper, we introduce the cause based decision tree approach, quantify human errors using it, and identify merits and demerits of it in comparision with previously used THERP. The review results show that it is difficult to incorporate the CBDT method for the development of domestic standard HRA procedure in low power/shutdown PSA because the CBDT method need for the subjective judgment of HRA analyst like as THERP. However, it is expected that the incorporation of the CBDT method into the development of domestic standard HRA procedure only for the comparision of quantitative HRA results will relieve the burden of development of detailed HRA procedure and will help maintain consistent quantitative HRA results

  9. Ordinary least square regression, orthogonal regression, geometric mean regression and their applications in aerosol science

    International Nuclear Information System (INIS)

    Leng Ling; Zhang Tianyi; Kleinman, Lawrence; Zhu Wei

    2007-01-01

    Regression analysis, especially the ordinary least squares method which assumes that errors are confined to the dependent variable, has seen a fair share of its applications in aerosol science. The ordinary least squares approach, however, could be problematic due to the fact that atmospheric data often does not lend itself to calling one variable independent and the other dependent. Errors often exist for both measurements. In this work, we examine two regression approaches available to accommodate this situation. They are orthogonal regression and geometric mean regression. Comparisons are made theoretically as well as numerically through an aerosol study examining whether the ratio of organic aerosol to CO would change with age

  10. DIF Trees: Using Classification Trees to Detect Differential Item Functioning

    Science.gov (United States)

    Vaughn, Brandon K.; Wang, Qiu

    2010-01-01

    A nonparametric tree classification procedure is used to detect differential item functioning for items that are dichotomously scored. Classification trees are shown to be an alternative procedure to detect differential item functioning other than the use of traditional Mantel-Haenszel and logistic regression analysis. A nonparametric…

  11. Differentiating regressed melanoma from regressed lichenoid keratosis.

    Science.gov (United States)

    Chan, Aegean H; Shulman, Kenneth J; Lee, Bonnie A

    2017-04-01

    Distinguishing regressed lichen planus-like keratosis (LPLK) from regressed melanoma can be difficult on histopathologic examination, potentially resulting in mismanagement of patients. We aimed to identify histopathologic features by which regressed melanoma can be differentiated from regressed LPLK. Twenty actively inflamed LPLK, 12 LPLK with regression and 15 melanomas with regression were compared and evaluated by hematoxylin and eosin staining as well as Melan-A, microphthalmia transcription factor (MiTF) and cytokeratin (AE1/AE3) immunostaining. (1) A total of 40% of regressed melanomas showed complete or near complete loss of melanocytes within the epidermis with Melan-A and MiTF immunostaining, while 8% of regressed LPLK exhibited this finding. (2) Necrotic keratinocytes were seen in the epidermis in 33% regressed melanomas as opposed to all of the regressed LPLK. (3) A dense infiltrate of melanophages in the papillary dermis was seen in 40% of regressed melanomas, a feature not seen in regressed LPLK. In summary, our findings suggest that a complete or near complete loss of melanocytes within the epidermis strongly favors a regressed melanoma over a regressed LPLK. In addition, necrotic epidermal keratinocytes and the presence of a dense band-like distribution of dermal melanophages can be helpful in differentiating these lesions. © 2016 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  12. Evaluation of the prediction precision capability of partial least squares regression approach for analysis of high alloy steel by laser induced breakdown spectroscopy

    Science.gov (United States)

    Sarkar, Arnab; Karki, Vijay; Aggarwal, Suresh K.; Maurya, Gulab S.; Kumar, Rohit; Rai, Awadhesh K.; Mao, Xianglei; Russo, Richard E.

    2015-06-01

    Laser induced breakdown spectroscopy (LIBS) was applied for elemental characterization of high alloy steel using partial least squares regression (PLSR) with an objective to evaluate the analytical performance of this multivariate approach. The optimization of the number of principle components for minimizing error in PLSR algorithm was investigated. The effect of different pre-treatment procedures on the raw spectral data before PLSR analysis was evaluated based on several statistical (standard error of prediction, percentage relative error of prediction etc.) parameters. The pre-treatment with "NORM" parameter gave the optimum statistical results. The analytical performance of PLSR model improved by increasing the number of laser pulses accumulated per spectrum as well as by truncating the spectrum to appropriate wavelength region. It was found that the statistical benefit of truncating the spectrum can also be accomplished by increasing the number of laser pulses per accumulation without spectral truncation. The constituents (Co and Mo) present in hundreds of ppm were determined with relative precision of 4-9% (2σ), whereas the major constituents Cr and Ni (present at a few percent levels) were determined with a relative precision of ~ 2%(2σ).

  13. Regression: A Bibliography.

    Science.gov (United States)

    Pedrini, D. T.; Pedrini, Bonnie C.

    Regression, another mechanism studied by Sigmund Freud, has had much research, e.g., hypnotic regression, frustration regression, schizophrenic regression, and infra-human-animal regression (often directly related to fixation). Many investigators worked with hypnotic age regression, which has a long history, going back to Russian reflexologists.…

  14. Ridge Regression Signal Processing

    Science.gov (United States)

    Kuhl, Mark R.

    1990-01-01

    The introduction of the Global Positioning System (GPS) into the National Airspace System (NAS) necessitates the development of Receiver Autonomous Integrity Monitoring (RAIM) techniques. In order to guarantee a certain level of integrity, a thorough understanding of modern estimation techniques applied to navigational problems is required. The extended Kalman filter (EKF) is derived and analyzed under poor geometry conditions. It was found that the performance of the EKF is difficult to predict, since the EKF is designed for a Gaussian environment. A novel approach is implemented which incorporates ridge regression to explain the behavior of an EKF in the presence of dynamics under poor geometry conditions. The basic principles of ridge regression theory are presented, followed by the derivation of a linearized recursive ridge estimator. Computer simulations are performed to confirm the underlying theory and to provide a comparative analysis of the EKF and the recursive ridge estimator.

  15. Regression modeling methods, theory, and computation with SAS

    CERN Document Server

    Panik, Michael

    2009-01-01

    Regression Modeling: Methods, Theory, and Computation with SAS provides an introduction to a diverse assortment of regression techniques using SAS to solve a wide variety of regression problems. The author fully documents the SAS programs and thoroughly explains the output produced by the programs.The text presents the popular ordinary least squares (OLS) approach before introducing many alternative regression methods. It covers nonparametric regression, logistic regression (including Poisson regression), Bayesian regression, robust regression, fuzzy regression, random coefficients regression,

  16. UV-laser microdissection system - A novel approach for the preparation of high-resolution stable isotope records (δ13C/δ18O) from tree rings

    Science.gov (United States)

    Schollaen, Karina; Helle, Gerhard

    2013-04-01

    Intra-annual stable isotope (δ13C and δ18O) studies of tree rings at various incremental resolutions have been attempting to extract valuable seasonal climatic and environmental information or assessing plant ecophysiological processes. For preparing high-resolution isotope samples normally wood segments or cores are mechanically divided in radial direction or cut in tangential direction. After mechanical dissection, wood samples are ground to a fine powder and either cellulose is extracted or bulk wood samples are analyzed. Here, we present a novel approach for the preparation of high-resolution stable isotope records from tree rings using an UV-laser microdissection system. Firstly, tree-ring cellulose is directly extracted from wholewood cross-sections largely leaving the wood anatomical structure intact and saving time as compared to the classical procedure. Secondly, micro-samples from cellulose cross-sections are dissected with an UV-Laser dissection microscope. Tissues of interest from cellulose cross-sections are identified and marked precisely with a screen-pen and dissected via an UV-laser beam. Dissected cellulose segments were automatically collected in capsules and are prepared for stable isotope (δ13C and δ18O) analysis. The new techniques facilitate inter- and intra-annual isotope analysis on tree-ring and open various possibilities for comparisons with wood anatomy in plant eco-physiological studies. We describe the design and the handling of this novel methodology and discuss advantages and constraints given by the example of intra-annual oxygen isotope analysis on tropical trees.

  17. Quantile Regression With Measurement Error

    KAUST Repository

    Wei, Ying; Carroll, Raymond J.

    2009-01-01

    . The finite sample performance of the proposed method is investigated in a simulation study, and compared to the standard regression calibration approach. Finally, we apply our methodology to part of the National Collaborative Perinatal Project growth data, a

  18. Flowering Trees

    Indian Academy of Sciences (India)

    Flowering Trees. Boswellia serrata Roxb. ex Colebr. (Indian Frankincense tree) of Burseraceae is a large-sized deciduous tree that is native to India. Bark is thin, greenish-ash-coloured that exfoliates into smooth papery flakes. Stem exudes pinkish resin ... Fruit is a three-valved capsule. A green gum-resin exudes from the ...

  19. Flowering Trees

    Indian Academy of Sciences (India)

    IAS Admin

    Flowering Trees. Ailanthus excelsa Roxb. (INDIAN TREE OF. HEAVEN) of Simaroubaceae is a lofty tree with large pinnately compound alternate leaves, which are ... inflorescences, unisexual and greenish-yellow. Fruits are winged, wings many-nerved. Wood is used in making match sticks. 1. Male flower; 2. Female flower.

  20. Flowering Trees

    Indian Academy of Sciences (India)

    Flowering Trees. Gyrocarpus americanus Jacq. (Helicopter Tree) of Hernandiaceae is a moderate size deciduous tree that grows to about 12 m in height with a smooth, shining, greenish-white bark. The leaves are ovate, rarely irregularly ... flowers which are unpleasant smelling. Fruit is a woody nut with two long thin wings.

  1. Flowering Trees

    Indian Academy of Sciences (India)

    More Details Fulltext PDF. Volume 8 Issue 8 August 2003 pp 112-112 Flowering Trees. Zizyphus jujuba Lam. of Rhamnaceae · More Details Fulltext PDF. Volume 8 Issue 9 September 2003 pp 97-97 Flowering Trees. Moringa oleifera · More Details Fulltext PDF. Volume 8 Issue 10 October 2003 pp 100-100 Flowering Trees.

  2. A High Performance Computing Approach to Tree Cover Delineation in 1-m NAIP Imagery Using a Probabilistic Learning Framework

    Science.gov (United States)

    Basu, Saikat; Ganguly, Sangram; Michaelis, Andrew; Votava, Petr; Roy, Anshuman; Mukhopadhyay, Supratik; Nemani, Ramakrishna

    2015-01-01

    Tree cover delineation is a useful instrument in deriving Above Ground Biomass (AGB) density estimates from Very High Resolution (VHR) airborne imagery data. Numerous algorithms have been designed to address this problem, but most of them do not scale to these datasets, which are of the order of terabytes. In this paper, we present a semi-automated probabilistic framework for the segmentation and classification of 1-m National Agriculture Imagery Program (NAIP) for tree-cover delineation for the whole of Continental United States, using a High Performance Computing Architecture. Classification is performed using a multi-layer Feedforward Backpropagation Neural Network and segmentation is performed using a Statistical Region Merging algorithm. The results from the classification and segmentation algorithms are then consolidated into a structured prediction framework using a discriminative undirected probabilistic graphical model based on Conditional Random Field, which helps in capturing the higher order contextual dependencies between neighboring pixels. Once the final probability maps are generated, the framework is updated and re-trained by relabeling misclassified image patches. This leads to a significant improvement in the true positive rates and reduction in false positive rates. The tree cover maps were generated for the whole state of California, spanning a total of 11,095 NAIP tiles covering a total geographical area of 163,696 sq. miles. The framework produced true positive rates of around 88% for fragmented forests and 74% for urban tree cover areas, with false positive rates lower than 2% for both landscapes. Comparative studies with the National Land Cover Data (NLCD) algorithm and the LiDAR canopy height model (CHM) showed the effectiveness of our framework for generating accurate high-resolution tree-cover maps.

  3. A High Performance Computing Approach to Tree Cover Delineation in 1-m NAIP Imagery using a Probabilistic Learning Framework

    Science.gov (United States)

    Basu, S.; Ganguly, S.; Michaelis, A.; Votava, P.; Roy, A.; Mukhopadhyay, S.; Nemani, R. R.

    2015-12-01

    Tree cover delineation is a useful instrument in deriving Above Ground Biomass (AGB) density estimates from Very High Resolution (VHR) airborne imagery data. Numerous algorithms have been designed to address this problem, but most of them do not scale to these datasets which are of the order of terabytes. In this paper, we present a semi-automated probabilistic framework for the segmentation and classification of 1-m National Agriculture Imagery Program (NAIP) for tree-cover delineation for the whole of Continental United States, using a High Performance Computing Architecture. Classification is performed using a multi-layer Feedforward Backpropagation Neural Network and segmentation is performed using a Statistical Region Merging algorithm. The results from the classification and segmentation algorithms are then consolidated into a structured prediction framework using a discriminative undirected probabilistic graphical model based on Conditional Random Field, which helps in capturing the higher order contextual dependencies between neighboring pixels. Once the final probability maps are generated, the framework is updated and re-trained by relabeling misclassified image patches. This leads to a significant improvement in the true positive rates and reduction in false positive rates. The tree cover maps were generated for the whole state of California, spanning a total of 11,095 NAIP tiles covering a total geographical area of 163,696 sq. miles. The framework produced true positive rates of around 88% for fragmented forests and 74% for urban tree cover areas, with false positive rates lower than 2% for both landscapes. Comparative studies with the National Land Cover Data (NLCD) algorithm and the LiDAR canopy height model (CHM) showed the effectiveness of our framework for generating accurate high-resolution tree-cover maps.

  4. Tree compression with top trees

    DEFF Research Database (Denmark)

    Bille, Philip; Gørtz, Inge Li; Landau, Gad M.

    2013-01-01

    We introduce a new compression scheme for labeled trees based on top trees [3]. Our compression scheme is the first to simultaneously take advantage of internal repeats in the tree (as opposed to the classical DAG compression that only exploits rooted subtree repeats) while also supporting fast...

  5. Tree compression with top trees

    DEFF Research Database (Denmark)

    Bille, Philip; Gørtz, Inge Li; Landau, Gad M.

    2015-01-01

    We introduce a new compression scheme for labeled trees based on top trees. Our compression scheme is the first to simultaneously take advantage of internal repeats in the tree (as opposed to the classical DAG compression that only exploits rooted subtree repeats) while also supporting fast...

  6. Regression analysis with categorized regression calibrated exposure: some interesting findings

    Directory of Open Access Journals (Sweden)

    Hjartåker Anette

    2006-07-01

    Full Text Available Abstract Background Regression calibration as a method for handling measurement error is becoming increasingly well-known and used in epidemiologic research. However, the standard version of the method is not appropriate for exposure analyzed on a categorical (e.g. quintile scale, an approach commonly used in epidemiologic studies. A tempting solution could then be to use the predicted continuous exposure obtained through the regression calibration method and treat it as an approximation to the true exposure, that is, include the categorized calibrated exposure in the main regression analysis. Methods We use semi-analytical calculations and simulations to evaluate the performance of the proposed approach compared to the naive approach of not correcting for measurement error, in situations where analyses are performed on quintile scale and when incorporating the original scale into the categorical variables, respectively. We also present analyses of real data, containing measures of folate intake and depression, from the Norwegian Women and Cancer study (NOWAC. Results In cases where extra information is available through replicated measurements and not validation data, regression calibration does not maintain important qualities of the true exposure distribution, thus estimates of variance and percentiles can be severely biased. We show that the outlined approach maintains much, in some cases all, of the misclassification found in the observed exposure. For that reason, regression analysis with the corrected variable included on a categorical scale is still biased. In some cases the corrected estimates are analytically equal to those obtained by the naive approach. Regression calibration is however vastly superior to the naive method when applying the medians of each category in the analysis. Conclusion Regression calibration in its most well-known form is not appropriate for measurement error correction when the exposure is analyzed on a

  7. Tree biomass in the Swiss landscape: nationwide modelling for improved accounting for forest and non-forest trees.

    Science.gov (United States)

    Price, B; Gomez, A; Mathys, L; Gardi, O; Schellenberger, A; Ginzler, C; Thürig, E

    2017-03-01

    Trees outside forest (TOF) can perform a variety of social, economic and ecological functions including carbon sequestration. However, detailed quantification of tree biomass is usually limited to forest areas. Taking advantage of structural information available from stereo aerial imagery and airborne laser scanning (ALS), this research models tree biomass using national forest inventory data and linear least-square regression and applies the model both inside and outside of forest to create a nationwide model for tree biomass (above ground and below ground). Validation of the tree biomass model against TOF data within settlement areas shows relatively low model performance (R 2 of 0.44) but still a considerable improvement on current biomass estimates used for greenhouse gas inventory and carbon accounting. We demonstrate an efficient and easily implementable approach to modelling tree biomass across a large heterogeneous nationwide area. The model offers significant opportunity for improved estimates on land use combination categories (CC) where tree biomass has either not been included or only roughly estimated until now. The ALS biomass model also offers the advantage of providing greater spatial resolution and greater within CC spatial variability compared to the current nationwide estimates.

  8. hs-CRP is strongly associated with coronary heart disease (CHD): A data mining approach using decision tree algorithm.

    Science.gov (United States)

    Tayefi, Maryam; Tajfard, Mohammad; Saffar, Sara; Hanachi, Parichehr; Amirabadizadeh, Ali Reza; Esmaeily, Habibollah; Taghipour, Ali; Ferns, Gordon A; Moohebati, Mohsen; Ghayour-Mobarhan, Majid

    2017-04-01

    Coronary heart disease (CHD) is an important public health problem globally. Algorithms incorporating the assessment of clinical biomarkers together with several established traditional risk factors can help clinicians to predict CHD and support clinical decision making with respect to interventions. Decision tree (DT) is a data mining model for extracting hidden knowledge from large databases. We aimed to establish a predictive model for coronary heart disease using a decision tree algorithm. Here we used a dataset of 2346 individuals including 1159 healthy participants and 1187 participant who had undergone coronary angiography (405 participants with negative angiography and 782 participants with positive angiography). We entered 10 variables of a total 12 variables into the DT algorithm (including age, sex, FBG, TG, hs-CRP, TC, HDL, LDL, SBP and DBP). Our model could identify the associated risk factors of CHD with sensitivity, specificity, accuracy of 96%, 87%, 94% and respectively. Serum hs-CRP levels was at top of the tree in our model, following by FBG, gender and age. Our model appears to be an accurate, specific and sensitive model for identifying the presence of CHD, but will require validation in prospective studies. Copyright © 2017 Elsevier B.V. All rights reserved.

  9. Modeling flash floods in ungauged mountain catchments of China: A decision tree learning approach for parameter regionalization

    Science.gov (United States)

    Ragettli, S.; Zhou, J.; Wang, H.; Liu, C.

    2017-12-01

    Flash floods in small mountain catchments are one of the most frequent causes of loss of life and property from natural hazards in China. Hydrological models can be a useful tool for the anticipation of these events and the issuing of timely warnings. Since sub-daily streamflow information is unavailable for most small basins in China, one of the main challenges is finding appropriate parameter values for simulating flash floods in ungauged catchments. In this study, we use decision tree learning to explore parameter set transferability between different catchments. For this purpose, the physically-based, semi-distributed rainfall-runoff model PRMS-OMS is set up for 35 catchments in ten Chinese provinces. Hourly data from more than 800 storm runoff events are used to calibrate the model and evaluate the performance of parameter set transfers between catchments. For each catchment, 58 catchment attributes are extracted from several data sets available for whole China. We then use a data mining technique (decision tree learning) to identify catchment similarities that can be related to good transfer performance. Finally, we use the splitting rules of decision trees for finding suitable donor catchments for ungauged target catchments. We show that decision tree learning allows to optimally utilize the information content of available catchment descriptors and outperforms regionalization based on a conventional measure of physiographic-climatic similarity by 15%-20%. Similar performance can be achieved with a regionalization method based on spatial proximity, but decision trees offer flexible rules for selecting suitable donor catchments, not relying on the vicinity of gauged catchments. This flexibility makes the method particularly suitable for implementation in sparsely gauged environments. We evaluate the probability to detect flood events exceeding a given return period, considering measured discharge and PRMS-OMS simulated flows with regionalized parameters

  10. Bayesian additive decision trees of biomarker by treatment interactions for predictive biomarker detection and subgroup identification.

    Science.gov (United States)

    Zhao, Yang; Zheng, Wei; Zhuo, Daisy Y; Lu, Yuefeng; Ma, Xiwen; Liu, Hengchang; Zeng, Zhen; Laird, Glen

    2017-10-11

    Personalized medicine, or tailored therapy, has been an active and important topic in recent medical research. Many methods have been proposed in the literature for predictive biomarker detection and subgroup identification. In this article, we propose a novel decision tree-based approach applicable in randomized clinical trials. We model the prognostic effects of the biomarkers using additive regression trees and the biomarker-by-treatment effect using a single regression tree. Bayesian approach is utilized to periodically revise the split variables and the split rules of the decision trees, which provides a better overall fitting. Gibbs sampler is implemented in the MCMC procedure, which updates the prognostic trees and the interaction tree separately. We use the posterior distribution of the interaction tree to construct the predictive scores of the biomarkers and to identify the subgroup where the treatment is superior to the control. Numerical simulations show that our proposed method performs well under various settings comparing to existing methods. We also demonstrate an application of our method in a real clinical trial.

  11. Retro-regression--another important multivariate regression improvement.

    Science.gov (United States)

    Randić, M

    2001-01-01

    We review the serious problem associated with instabilities of the coefficients of regression equations, referred to as the MRA (multivariate regression analysis) "nightmare of the first kind". This is manifested when in a stepwise regression a descriptor is included or excluded from a regression. The consequence is an unpredictable change of the coefficients of the descriptors that remain in the regression equation. We follow with consideration of an even more serious problem, referred to as the MRA "nightmare of the second kind", arising when optimal descriptors are selected from a large pool of descriptors. This process typically causes at different steps of the stepwise regression a replacement of several previously used descriptors by new ones. We describe a procedure that resolves these difficulties. The approach is illustrated on boiling points of nonanes which are considered (1) by using an ordered connectivity basis; (2) by using an ordering resulting from application of greedy algorithm; and (3) by using an ordering derived from an exhaustive search for optimal descriptors. A novel variant of multiple regression analysis, called retro-regression (RR), is outlined showing how it resolves the ambiguities associated with both "nightmares" of the first and the second kind of MRA.

  12. Hierarchical Matching and Regression with Application to Photometric Redshift Estimation

    Science.gov (United States)

    Murtagh, Fionn

    2017-06-01

    This work emphasizes that heterogeneity, diversity, discontinuity, and discreteness in data is to be exploited in classification and regression problems. A global a priori model may not be desirable. For data analytics in cosmology, this is motivated by the variety of cosmological objects such as elliptical, spiral, active, and merging galaxies at a wide range of redshifts. Our aim is matching and similarity-based analytics that takes account of discrete relationships in the data. The information structure of the data is represented by a hierarchy or tree where the branch structure, rather than just the proximity, is important. The representation is related to p-adic number theory. The clustering or binning of the data values, related to the precision of the measurements, has a central role in this methodology. If used for regression, our approach is a method of cluster-wise regression, generalizing nearest neighbour regression. Both to exemplify this analytics approach, and to demonstrate computational benefits, we address the well-known photometric redshift or `photo-z' problem, seeking to match Sloan Digital Sky Survey (SDSS) spectroscopic and photometric redshifts.

  13. Reduced Rank Regression

    DEFF Research Database (Denmark)

    Johansen, Søren

    2008-01-01

    The reduced rank regression model is a multivariate regression model with a coefficient matrix with reduced rank. The reduced rank regression algorithm is an estimation procedure, which estimates the reduced rank regression model. It is related to canonical correlations and involves calculating...

  14. Recursive Trees for Practical ORAM

    Directory of Open Access Journals (Sweden)

    Moataz Tarik

    2015-06-01

    Full Text Available We present a new, general data structure that reduces the communication cost of recent tree-based ORAMs. Contrary to ORAM trees with constant height and path lengths, our new construction r-ORAM allows for trees with varying shorter path length. Accessing an element in the ORAM tree results in different communication costs depending on the location of the element. The main idea behind r-ORAM is a recursive ORAM tree structure, where nodes in the tree are roots of other trees. While this approach results in a worst-case access cost (tree height at most as any recent tree-based ORAM, we show that the average cost saving is around 35% for recent binary tree ORAMs. Besides reducing communication cost, r-ORAM also reduces storage overhead on the server by 4% to 20% depending on the ORAM’s client memory type. To prove r-ORAM’s soundness, we conduct a detailed overflow analysis. r-ORAM’s recursive approach is general in that it can be applied to all recent tree ORAMs, both constant and poly-log client memory ORAMs. Finally, we implement and benchmark r-ORAM in a practical setting to back up our theoretical claims.

  15. Tree Nut Allergies

    Science.gov (United States)

    ... Blog Vision Awards Common Allergens Tree Nut Allergy Tree Nut Allergy Learn about tree nut allergy, how ... a Tree Nut Label card . Allergic Reactions to Tree Nuts Tree nuts can cause a severe and ...

  16. From Rasch scores to regression

    DEFF Research Database (Denmark)

    Christensen, Karl Bang

    2006-01-01

    Rasch models provide a framework for measurement and modelling latent variables. Having measured a latent variable in a population a comparison of groups will often be of interest. For this purpose the use of observed raw scores will often be inadequate because these lack interval scale propertie....... This paper compares two approaches to group comparison: linear regression models using estimated person locations as outcome variables and latent regression models based on the distribution of the score....

  17. Modifiable risk factors predicting major depressive disorder at four year follow-up: a decision tree approach

    OpenAIRE

    Batterham, Philip J; Christensen, Helen; Mackinnon, Andrew J

    2009-01-01

    Abstract Background Relative to physical health conditions such as cardiovascular disease, little is known about risk factors that predict the prevalence of depression. The present study investigates the expected effects of a reduction of these risks over time, using the decision tree method favoured in assessing cardiovascular disease risk. Methods The PATH through Life cohort was used for the study, comprising 2,105 20-24 year olds, 2,323 40-44 year olds and 2,177 60-64 year olds sampled fr...

  18. Human figure drawings and house tree person drawings as indicators of self-esteem: a quantitative approach.

    Science.gov (United States)

    Groth-Marnat, G; Roberts, L

    1998-02-01

    This study assessed the concurrent validity of Human Figure Drawings (HFD) and House-Tree-Person (HTP) drawings as measures of self-esteem. Adult subjects were requested to make HFD and HTP drawings and to complete measures of psychological adjustment which included the Coopersmith Self Esteem Inventory and Tennessee Self Concept Scale. The drawings were scored using a quantitative, composite rating scale derived from HFD and HTP empirical and theoretical literature on psychological health. Results indicated that neither the HFD nor the HTP quantitative composite ratings of psychological health related to the formal measures of self-esteem.

  19. Evaluating the User Acceptance Level of a Novel Tree-Based Approach on Moddeling Medical Research Data

    Directory of Open Access Journals (Sweden)

    Péter OLÁH

    2016-12-01

    Full Text Available Modern medical research needs specialized Information Systems that can store and process large amounts of complex scientific data, while providing the medical researcher with a high degree of flexibility regarding the design of the data structure. In our previous work we have proposed an architecture for such a system based on trees and multitrees, stored in a relational database. In this paper we present the tools that we have used to assess the user acceptance level of this system along with the results of this evaluation. Our findings have validated the benefits of the proposed system and helped us identify some relevant aspects that need further improvement.

  20. Use of Multiple Linear Regression Models for Setting Water Quality Criteria for Copper: A Complementary Approach to the Biotic Ligand Model.

    Science.gov (United States)

    Brix, Kevin V; DeForest, David K; Tear, Lucinda; Grosell, Martin; Adams, William J

    2017-05-02

    Biotic Ligand Models (BLMs) for metals are widely applied in ecological risk assessments and in the development of regulatory water quality guidelines in Europe, and in 2007 the United States Environmental Protection Agency (USEPA) recommended BLM-based water quality criteria (WQC) for Cu in freshwater. However, to-date, few states have adopted BLM-based Cu criteria into their water quality standards on a state-wide basis, which appears to be due to the perception that the BLM is too complicated or requires too many input variables. Using the mechanistic BLM framework to first identify key water chemistry parameters that influence Cu bioavailability, namely dissolved organic carbon (DOC), pH, and hardness, we developed Cu criteria using the same basic methodology used by the USEPA to derive hardness-based criteria but with the addition of DOC and pH. As an initial proof of concept, we developed stepwise multiple linear regression (MLR) models for species that have been tested over wide ranges of DOC, pH, and hardness conditions. These models predicted acute Cu toxicity values that were within a factor of ±2 in 77% to 97% of tests (5 species had adequate data) and chronic Cu toxicity values that were within a factor of ±2 in 92% of tests (1 species had adequate data). This level of accuracy is comparable to the BLM. Following USEPA guidelines for WQC development, the species data were then combined to develop a linear model with pooled slopes for each independent parameter (i.e., DOC, pH, and hardness) and species-specific intercepts using Analysis of Covariance. The pooled MLR and BLM models predicted species-specific toxicity with similar precision; adjusted R 2 and R 2 values ranged from 0.56 to 0.86 and 0.66-0.85, respectively. Graphical exploration of relationships between predicted and observed toxicity, residuals and observed toxicity, and residuals and concentrations of key input parameters revealed many similarities and a few key distinctions between the

  1. Flowering Trees

    Indian Academy of Sciences (India)

    medium-sized handsome tree with a straight bole that branches at the top. Leaves are once pinnate, with two to three pairs of leaflets. Young parts of the tree are velvety. Inflorescence is a branched raceme borne at the branch ends. Flowers are large, white, attractive, and fragrant. Corolla is funnel-shaped. Fruit is an ...

  2. Flowering Trees

    Indian Academy of Sciences (India)

    Cassia siamia Lamk. (Siamese tree senna) of Caesalpiniaceae is a small or medium size handsome tree. Leaves are alternate, pinnately compound and glandular, upto 18 cm long with 8–12 pairs of leaflets. Inflorescence is axillary or terminal and branched. Flowering lasts for a long period from March to February. Fruit is ...

  3. Flowering Trees

    Indian Academy of Sciences (India)

    Flowering Trees. Cerbera manghasL. (SEA MANGO) of Apocynaceae is a medium-sized evergreen coastal tree with milky latex. The bark is grey-brown, thick and ... Fruit is large. (5–10 cm long), oval containing two flattened seeds and resembles a mango, hence the name Mangas or. Manghas. Leaves and fruits contain ...

  4. Flowering Trees

    Indian Academy of Sciences (India)

    user

    Flowering Trees. Gliricidia sepium(Jacq.) Kunta ex Walp. (Quickstick) of Fabaceae is a small deciduous tree with. Pinnately compound leaves. Flower are prroduced in large number in early summer on terminal racemes. They are attractive, pinkish-white and typically like bean flowers. Fruit is a few-seeded flat pod.

  5. Flowering Trees

    Indian Academy of Sciences (India)

    Flowering Trees. Acrocarpus fraxinifolius Wight & Arn. (PINK CEDAR, AUSTRALIAN ASH) of. Caesalpiniaceae is a lofty unarmed deciduous native tree that attains a height of 30–60m with buttresses. Bark is thin and light grey. Leaves are compound and bright red when young. Flowers in dense, erect, axillary racemes.

  6. Talking Trees

    Science.gov (United States)

    Tolman, Marvin

    2005-01-01

    Students love outdoor activities and will love them even more when they build confidence in their tree identification and measurement skills. Through these activities, students will learn to identify the major characteristics of trees and discover how the pace--a nonstandard measuring unit--can be used to estimate not only distances but also the…

  7. Drawing Trees

    DEFF Research Database (Denmark)

    Halkjær From, Andreas; Schlichtkrull, Anders; Villadsen, Jørgen

    2018-01-01

    We formally prove in Isabelle/HOL two properties of an algorithm for laying out trees visually. The first property states that removing layout annotations recovers the original tree. The second property states that nodes are placed at least a unit of distance apart. We have yet to formalize three...

  8. Flowering Trees

    Indian Academy of Sciences (India)

    Srimath

    Grevillea robusta A. Cunn. ex R. Br. (Sil- ver Oak) of Proteaceae is a daintily lacy ornamental tree while young and growing into a mighty tree (45 m). Young shoots are silvery grey and the leaves are fern- like. Flowers are golden-yellow in one- sided racemes (10 cm). Fruit is a boat- shaped, woody follicle.

  9. DeepSAT: A Deep Learning Approach to Tree-cover Delineation in 1-m NAIP Imagery for the Continental United States

    Science.gov (United States)

    Ganguly, S.; Basu, S.; Nemani, R. R.; Mukhopadhyay, S.; Michaelis, A.; Votava, P.

    2016-12-01

    High resolution tree cover classification maps are needed to increase the accuracy of current land ecosystem and climate model outputs. Limited studies are in place that demonstrates the state-of-the-art in deriving very high resolution (VHR) tree cover products. In addition, most methods heavily rely on commercial softwares that are difficult to scale given the region of study (e.g. continents to globe). Complexities in present approaches relate to (a) scalability of the algorithm, (b) large image data processing (compute and memory intensive), (c) computational cost, (d) massively parallel architecture, and (e) machine learning automation. In addition, VHR satellite datasets are of the order of terabytes and features extracted from these datasets are of the order of petabytes. In our present study, we have acquired the National Agriculture Imagery Program (NAIP) dataset for the Continental United States at a spatial resolution of 1-m. This data comes as image tiles (a total of quarter million image scenes with 60 million pixels) and has a total size of 65 terabytes for a single acquisition. Features extracted from the entire dataset would amount to 8-10 petabytes. In our proposed approach, we have implemented a novel semi-automated machine learning algorithm rooted on the principles of "deep learning" to delineate the percentage of tree cover. Using the NASA Earth Exchange (NEX) initiative, we have developed an end-to-end architecture by integrating a segmentation module based on Statistical Region Merging, a classification algorithm using Deep Belief Network and a structured prediction algorithm using Conditional Random Fields to integrate the results from the segmentation and classification modules to create per-pixel class labels. The training process is scaled up using the power of GPUs and the prediction is scaled to quarter million NAIP tiles spanning the whole of Continental United States using the NEX HPC supercomputing cluster. An initial pilot over the

  10. DeepSAT: A Deep Learning Approach to Tree-Cover Delineation in 1-m NAIP Imagery for the Continental United States

    Science.gov (United States)

    Ganguly, Sangram; Basu, Saikat; Nemani, Ramakrishna R.; Mukhopadhyay, Supratik; Michaelis, Andrew; Votava, Petr

    2016-01-01

    High resolution tree cover classification maps are needed to increase the accuracy of current land ecosystem and climate model outputs. Limited studies are in place that demonstrates the state-of-the-art in deriving very high resolution (VHR) tree cover products. In addition, most methods heavily rely on commercial softwares that are difficult to scale given the region of study (e.g. continents to globe). Complexities in present approaches relate to (a) scalability of the algorithm, (b) large image data processing (compute and memory intensive), (c) computational cost, (d) massively parallel architecture, and (e) machine learning automation. In addition, VHR satellite datasets are of the order of terabytes and features extracted from these datasets are of the order of petabytes. In our present study, we have acquired the National Agriculture Imagery Program (NAIP) dataset for the Continental United States at a spatial resolution of 1-m. This data comes as image tiles (a total of quarter million image scenes with 60 million pixels) and has a total size of 65 terabytes for a single acquisition. Features extracted from the entire dataset would amount to 8-10 petabytes. In our proposed approach, we have implemented a novel semi-automated machine learning algorithm rooted on the principles of "deep learning" to delineate the percentage of tree cover. Using the NASA Earth Exchange (NEX) initiative, we have developed an end-to-end architecture by integrating a segmentation module based on Statistical Region Merging, a classification algorithm using Deep Belief Network and a structured prediction algorithm using Conditional Random Fields to integrate the results from the segmentation and classification modules to create per-pixel class labels. The training process is scaled up using the power of GPUs and the prediction is scaled to quarter million NAIP tiles spanning the whole of Continental United States using the NEX HPC supercomputing cluster. An initial pilot over the

  11. The Nuisance of Nuisance Regression: Spectral Misspecification in a Common Approach to Resting-State fMRI Preprocessing Reintroduces Noise and Obscures Functional Connectivity

    OpenAIRE

    Hallquist, Michael N.; Hwang, Kai; Luna, Beatriz

    2013-01-01

    Recent resting-state functional connectivity fMRI (RS-fcMRI) research has demonstrated that head motion during fMRI acquisition systematically influences connectivity estimates despite bandpass filtering and nuisance regression, which are intended to reduce such nuisance variability. We provide evidence that the effects of head motion and other nuisance signals are poorly controlled when the fMRI time series are bandpass-filtered but the regressors are unfiltered, resulting in the inadvertent...

  12. Assessing the predictive capability of randomized tree-based ensembles in streamflow modelling

    Science.gov (United States)

    Galelli, S.; Castelletti, A.

    2013-07-01

    Combining randomization methods with ensemble prediction is emerging as an effective option to balance accuracy and computational efficiency in data-driven modelling. In this paper, we investigate the prediction capability of extremely randomized trees (Extra-Trees), in terms of accuracy, explanation ability and computational efficiency, in a streamflow modelling exercise. Extra-Trees are a totally randomized tree-based ensemble method that (i) alleviates the poor generalisation property and tendency to overfitting of traditional standalone decision trees (e.g. CART); (ii) is computationally efficient; and, (iii) allows to infer the relative importance of the input variables, which might help in the ex-post physical interpretation of the model. The Extra-Trees potential is analysed on two real-world case studies - Marina catchment (Singapore) and Canning River (Western Australia) - representing two different morphoclimatic contexts. The evaluation is performed against other tree-based methods (CART and M5) and parametric data-driven approaches (ANNs and multiple linear regression). Results show that Extra-Trees perform comparatively well to the best of the benchmarks (i.e. M5) in both the watersheds, while outperforming the other approaches in terms of computational requirement when adopted on large datasets. In addition, the ranking of the input variable provided can be given a physically meaningful interpretation.

  13. A New Predictive Model Based on the ABC Optimized Multivariate Adaptive Regression Splines Approach for Predicting the Remaining Useful Life in Aircraft Engines

    Directory of Open Access Journals (Sweden)

    Paulino José García Nieto

    2016-05-01

    Full Text Available Remaining useful life (RUL estimation is considered as one of the most central points in the prognostics and health management (PHM. The present paper describes a nonlinear hybrid ABC–MARS-based model for the prediction of the remaining useful life of aircraft engines. Indeed, it is well-known that an accurate RUL estimation allows failure prevention in a more controllable way so that the effective maintenance can be carried out in appropriate time to correct impending faults. The proposed hybrid model combines multivariate adaptive regression splines (MARS, which have been successfully adopted for regression problems, with the artificial bee colony (ABC technique. This optimization technique involves parameter setting in the MARS training procedure, which significantly influences the regression accuracy. However, its use in reliability applications has not yet been widely explored. Bearing this in mind, remaining useful life values have been predicted here by using the hybrid ABC–MARS-based model from the remaining measured parameters (input variables for aircraft engines with success. A correlation coefficient equal to 0.92 was obtained when this hybrid ABC–MARS-based model was applied to experimental data. The agreement of this model with experimental data confirmed its good performance. The main advantage of this predictive model is that it does not require information about the previous operation states of the aircraft engine.

  14. Canonical variate regression.

    Science.gov (United States)

    Luo, Chongliang; Liu, Jin; Dey, Dipak K; Chen, Kun

    2016-07-01

    In many fields, multi-view datasets, measuring multiple distinct but interrelated sets of characteristics on the same set of subjects, together with data on certain outcomes or phenotypes, are routinely collected. The objective in such a problem is often two-fold: both to explore the association structures of multiple sets of measurements and to develop a parsimonious model for predicting the future outcomes. We study a unified canonical variate regression framework to tackle the two problems simultaneously. The proposed criterion integrates multiple canonical correlation analysis with predictive modeling, balancing between the association strength of the canonical variates and their joint predictive power on the outcomes. Moreover, the proposed criterion seeks multiple sets of canonical variates simultaneously to enable the examination of their joint effects on the outcomes, and is able to handle multivariate and non-Gaussian outcomes. An efficient algorithm based on variable splitting and Lagrangian multipliers is proposed. Simulation studies show the superior performance of the proposed approach. We demonstrate the effectiveness of the proposed approach in an [Formula: see text] intercross mice study and an alcohol dependence study. © The Author 2016. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  15. Regression analysis by example

    CERN Document Server

    Chatterjee, Samprit

    2012-01-01

    Praise for the Fourth Edition: ""This book is . . . an excellent source of examples for regression analysis. It has been and still is readily readable and understandable."" -Journal of the American Statistical Association Regression analysis is a conceptually simple method for investigating relationships among variables. Carrying out a successful application of regression analysis, however, requires a balance of theoretical results, empirical rules, and subjective judgment. Regression Analysis by Example, Fifth Edition has been expanded

  16. Logic regression and its extensions.

    Science.gov (United States)

    Schwender, Holger; Ruczinski, Ingo

    2010-01-01

    Logic regression is an adaptive classification and regression procedure, initially developed to reveal interacting single nucleotide polymorphisms (SNPs) in genetic association studies. In general, this approach can be used in any setting with binary predictors, when the interaction of these covariates is of primary interest. Logic regression searches for Boolean (logic) combinations of binary variables that best explain the variability in the outcome variable, and thus, reveals variables and interactions that are associated with the response and/or have predictive capabilities. The logic expressions are embedded in a generalized linear regression framework, and thus, logic regression can handle a variety of outcome types, such as binary responses in case-control studies, numeric responses, and time-to-event data. In this chapter, we provide an introduction to the logic regression methodology, list some applications in public health and medicine, and summarize some of the direct extensions and modifications of logic regression that have been proposed in the literature. Copyright © 2010 Elsevier Inc. All rights reserved.

  17. Visualizing phylogenetic tree landscapes.

    Science.gov (United States)

    Wilgenbusch, James C; Huang, Wen; Gallivan, Kyle A

    2017-02-02

    Genomic-scale sequence alignments are increasingly used to infer phylogenies in order to better understand the processes and patterns of evolution. Different partitions within these new alignments (e.g., genes, codon positions, and structural features) often favor hundreds if not thousands of competing phylogenies. Summarizing and comparing phylogenies obtained from multi-source data sets using current consensus tree methods discards valuable information and can disguise potential methodological problems. Discovery of efficient and accurate dimensionality reduction methods used to display at once in 2- or 3- dimensions the relationship among these competing phylogenies will help practitioners diagnose the limits of current evolutionary models and potential problems with phylogenetic reconstruction methods when analyzing large multi-source data sets. We introduce several dimensionality reduction methods to visualize in 2- and 3-dimensions the relationship among competing phylogenies obtained from gene partitions found in three mid- to large-size mitochondrial genome alignments. We test the performance of these dimensionality reduction methods by applying several goodness-of-fit measures. The intrinsic dimensionality of each data set is also estimated to determine whether projections in 2- and 3-dimensions can be expected to reveal meaningful relationships among trees from different data partitions. Several new approaches to aid in the comparison of different phylogenetic landscapes are presented. Curvilinear Components Analysis (CCA) and a stochastic gradient decent (SGD) optimization method give the best representation of the original tree-to-tree distance matrix for each of the three- mitochondrial genome alignments and greatly outperformed the method currently used to visualize tree landscapes. The CCA + SGD method converged at least as fast as previously applied methods for visualizing tree landscapes. We demonstrate for all three mtDNA alignments that 3D

  18. Phylogenetic trees

    OpenAIRE

    Baños, Hector; Bushek, Nathaniel; Davidson, Ruth; Gross, Elizabeth; Harris, Pamela E.; Krone, Robert; Long, Colby; Stewart, Allen; Walker, Robert

    2016-01-01

    We introduce the package PhylogeneticTrees for Macaulay2 which allows users to compute phylogenetic invariants for group-based tree models. We provide some background information on phylogenetic algebraic geometry and show how the package PhylogeneticTrees can be used to calculate a generating set for a phylogenetic ideal as well as a lower bound for its dimension. Finally, we show how methods within the package can be used to compute a generating set for the join of any two ideals.

  19. Polychlorinated biphenyl congener patterns in tree swallows (Tachycineta bicolor) nesting in the Housatonic River watershed, western Massachusetts, USA, using a novel statistical approach

    International Nuclear Information System (INIS)

    Custer, Christine M.; Read, Lorraine B.

    2006-01-01

    A novel application of a commonly used statistical approach was used to examine differences in polychlorinated biphenyl (PCB) congener patterns among locations and sample matrices in tree swallows (Tachycineta bicolor) nesting in the Housatonic River watershed in western Massachusetts, USA. The most prevalent PCB congeners in tree swallow tissue samples from the Housatonic River watershed were Ballsmitter Zell numbers 153, 138, 180, 187, 149, 101, and 170. These congeners were seven of the eight most prevalent congeners in Aroclor[reg] 1260, the PCB mixture that was the primary source of contamination in the Housatonic River system. Using paired-Euclidean distances and tolerance limits, it was demonstrated that congener patterns in swallow tissues from sites on the main stem of the Housatonic River were more similar to one another than to two sites upstream of the contamination or from a nearby reference area. The congener patterns also differed between the reference area and the two upstream tributaries and between the two tributaries. These pattern differences were the same in both pipper (eggs or just hatched nestlings) and 12-day-old nestling samples. Lower-chlorinated congeners appeared to be metabolized in nestlings and pippers compared to diet, and metabolized more in pippers compared to nestlings. Euclidean distances and tolerance limits provide a simple and statistically valid method to compare PCB congener patterns among groups. - Polychlorinated biphenyl congener patterns in swallows differed between the main stem of the Housatonic River, MA and other locations in the watershed

  20. Assessing a Template Matching Approach for Tree Height and Position Extraction from Lidar-Derived Canopy Height Models of Pinus Pinaster Stands

    Directory of Open Access Journals (Sweden)

    Francesco Pirotti

    2010-10-01

    Full Text Available In this paper, an assessment of a method using a correlation filter over a lidar-derived digital canopy height model (CHM is presented. The objective of the procedure is to obtain stem density, position, and height values, on a stand with the following characteristics: ellipsoidal canopy shape (Pinus pinaster, even-aged and single-layer structure. The process consists of three steps: extracting a correlation map from CHM by applying a template whose size and shape resembles the canopy to be detected, applying a threshold mask to the correlation map to keep a subset of candidate-pixels, and then applying a local maximum filter to the remaining pixel groups. The method performs satisfactorily considering the experimental conditions. The mean tree extraction percentage is 65% with a coefficient of agreement of 0.4. The mean absolute error of height is ~0.5 m for all plots except one. It can be considered a valid approach for extracting tree density and height in regularly spaced stands (i.e., poplar plantations which are fundamental for extracting related forest parameters such as volume and biomass.