WorldWideScience

Sample records for regression tree approach

  1. Classification and regression trees

    CERN Document Server

    Breiman, Leo; Olshen, Richard A; Stone, Charles J

    1984-01-01

    The methodology used to construct tree structured rules is the focus of this monograph. Unlike many other statistical procedures, which moved from pencil and paper to calculators, this text's use of trees was unthinkable before computers. Both the practical and theoretical sides have been developed in the authors' study of tree methods. Classification and Regression Trees reflects these two sides, covering the use of trees as a data analysis method, and in a more mathematical framework, proving some of their fundamental properties.

  2. A Hybrid Approach of Stepwise Regression, Logistic Regression, Support Vector Machine, and Decision Tree for Forecasting Fraudulent Financial Statements

    Directory of Open Access Journals (Sweden)

    Suduan Chen

    2014-01-01

    As fraudulent financial statements become an increasingly serious problem, establishing a valid model for forecasting fraudulent financial statements has become an important question for academic research and financial practice. After screening the important variables using stepwise regression, the study applies logistic regression, support vector machine, and decision tree methods to construct classification models and compare them. The study adopts financial and nonfinancial variables to assist in establishing the forecasting model. The research objects are companies with fraudulent and nonfraudulent financial statements between 1998 and 2012. The findings are that financial and nonfinancial information can be used effectively to distinguish fraudulent financial statements, and that the C5.0 decision tree has the best classification performance (85.71%).

  3. Prognostic transcriptional association networks: a new supervised approach based on regression trees

    Science.gov (United States)

    Nepomuceno-Chamorro, Isabel; Azuaje, Francisco; Devaux, Yvan; Nazarov, Petr V.; Muller, Arnaud; Aguilar-Ruiz, Jesús S.; Wagner, Daniel R.

    2011-01-01

    Motivation: The application of information encoded in molecular networks for prognostic purposes is a crucial objective of systems biomedicine. This approach has not been widely investigated in the cardiovascular research area. Within this area, the prediction of clinical outcomes after suffering a heart attack would represent a significant step forward. We developed a new quantitative prediction-based method for this prognostic problem based on the discovery of clinically relevant transcriptional association networks. This method integrates regression trees and clinical class-specific networks, and can be applied to other clinical domains. Results: Before analyzing our cardiovascular disease dataset, we tested the usefulness of our approach on a benchmark dataset with control and disease patients. We also compared it to several algorithms to infer transcriptional association networks and classification models. Comparative results provided evidence of the prediction power of our approach. Next, we discovered new models for predicting good and bad outcomes after myocardial infarction. Using blood-derived gene expression data, our models reported areas under the receiver operating characteristic curve above 0.70. Our model could also outperform different techniques based on co-expressed gene modules. We also predicted processes that may represent novel therapeutic targets for heart disease, such as the synthesis of leucine and isoleucine. Availability: The SATuRNo software is freely available at http://www.lsi.us.es/isanepo/toolsSaturno/. Contact: inepomuceno@us.es Supplementary information: Supplementary data are available at Bioinformatics online. PMID:21098433

  4. Analysis of the impact of recreational trail usage for prioritising management decisions: a regression tree approach

    Science.gov (United States)

    Tomczyk, Aleksandra; Ewertowski, Marek; White, Piran; Kasprzak, Leszek

    2016-04-01

    The dual role of many Protected Natural Areas in providing benefits for both conservation and recreation poses challenges for management. Although recreation-based damage to ecosystems can occur very quickly, restoration can take many years. The protection of conservation interests at the same time as providing for recreation requires decisions to be made about how to prioritise and direct management actions. Trails are commonly used to divert visitors from the most important areas of a site, but high visitor pressure can lead to increases in trail width and a concomitant increase in soil erosion. Here we use detailed field data on condition of recreational trails in Gorce National Park, Poland, as the basis for a regression tree analysis to determine the factors influencing trail deterioration, and link specific trail impacts with environmental, use related and managerial factors. We distinguished 12 types of trails, characterised by four levels of degradation: (1) trails with an acceptable level of degradation; (2) threatened trails; (3) damaged trails; and (4) heavily damaged trails. Damaged trails were the most vulnerable of all trails and should be prioritised for appropriate conservation and restoration. We also proposed five types of monitoring of recreational trail conditions: (1) rapid inventory of negative impacts; (2) monitoring visitor numbers and variation in type of use; (3) change-oriented monitoring focusing on sections of trail which were subjected to changes in type or level of use or subjected to extreme weather events; (4) monitoring of dynamics of trail conditions; and (5) full assessment of trail conditions, to be carried out every 10-15 years. The application of the proposed framework can enhance the ability of Park managers to prioritise their trail management activities, enhancing trail conditions and visitor safety, while minimising adverse impacts on the conservation value of the ecosystem. A.M.T. was supported by the Polish Ministry of

  5. Modelling the spatial distribution of Fasciola hepatica in bovines using decision tree, logistic regression and GIS query approaches for Brazil.

    Science.gov (United States)

    Bennema, S C; Molento, M B; Scholte, R G; Carvalho, O S; Pritsch, I

    2017-11-01

    Fascioliasis is a condition caused by the trematode Fasciola hepatica. In this paper, the spatial distribution of F. hepatica in bovines in Brazil was modelled using a decision tree approach and a logistic regression, combined with a geographic information system (GIS) query. In the decision tree and the logistic model, isothermality had the strongest influence on disease prevalence. Also, the 50-year average precipitation in the warmest quarter of the year was included as a risk factor, having a negative influence on the parasite prevalence. The risk maps developed using both techniques showed a predicted higher prevalence mainly in the South of Brazil. The prediction performance seemed to be high, but both techniques failed to reach a high accuracy in predicting the medium and high prevalence classes for the entire country. The GIS query map, based on the range of isothermality, minimum temperature of coldest month, precipitation of warmest quarter of the year, altitude and the average daily land surface temperature, showed a possibility of presence of F. hepatica in a very large area. The risk maps produced using these methods can be used to focus activities of animal and public health programmes, even on non-evaluated F. hepatica areas.

  6. Subgroup finding via Bayesian additive regression trees.

    Science.gov (United States)

    Sivaganesan, Siva; Müller, Peter; Huang, Bin

    2017-03-09

    We provide a Bayesian decision theoretic approach to finding subgroups that have elevated treatment effects. Our approach separates the modeling of the response variable from the task of subgroup finding and allows a flexible modeling of the response variable irrespective of potential subgroups of interest. We use Bayesian additive regression trees to model the response variable and use a utility function defined in terms of a candidate subgroup and the predicted response for that subgroup. Subgroups are identified by maximizing the expected utility where the expectation is taken with respect to the posterior predictive distribution of the response, and the maximization is carried out over an a priori specified set of candidate subgroups. Our approach allows subgroups based on both quantitative and categorical covariates. We illustrate the approach using a simulated data set and a real data set. Copyright © 2017 John Wiley & Sons, Ltd.

  7. Boosted regression tree, table, and figure data

    Science.gov (United States)

    Spreadsheets are included here to support the manuscript Boosted Regression Tree Models to Explain Watershed Nutrient Concentrations and Biological Condition. This dataset is associated with the following publication: Golden, H., C. Lane, A. Prues, and E. D'Amico. Boosted Regression Tree Models to Explain Watershed Nutrient Concentrations and Biological Condition. JAWRA. American Water Resources Association, Middleburg, VA, USA, 52(5): 1251-1274, (2016).

  8. Inferring gene regression networks with model trees

    Directory of Open Access Journals (Sweden)

    Aguilar-Ruiz Jesus S

    2010-10-01

    Background: Novel strategies are required in order to handle the huge amount of data produced by microarray technologies. To infer gene regulatory networks, the first step is to find direct regulatory relationships between genes, building the so-called gene co-expression networks. They are typically generated using correlation statistics as pairwise similarity measures. Correlation-based methods are very useful for determining whether two genes have a strong global similarity, but they do not detect local similarities. Results: We propose model trees as a method to identify gene interaction networks. While correlation-based methods analyze each pair of genes, in our approach we generate a single regression tree for each gene from the remaining genes. Finally, a graph of all the relationships among output and input genes is built, taking into account whether each pair of genes is statistically significant. For this reason we apply a statistical procedure to control the false discovery rate. The performance of our approach, named REGNET, is experimentally tested on two well-known data sets: Saccharomyces cerevisiae and E. coli. First, the biological coherence of the results is tested. Second, the E. coli transcriptional network (in the Regulon database) is used as a control to compare the results to those of a correlation-based method. This experiment shows that REGNET performs more accurately at detecting true gene associations than the Pearson and Spearman zeroth and first-order correlation-based methods. Conclusions: REGNET generates gene association networks from gene expression data, and differs from correlation-based methods in that the relationship between one gene and others is calculated simultaneously. Model trees are very useful techniques to estimate the numerical values for the target genes by linear regression functions. They are very often more precise than linear regression models because they can add just different linear

  9. SMOOTH TRANSITION LOGISTIC REGRESSION MODEL TREE

    OpenAIRE

    RODRIGO PINTO MOREIRA

    2008-01-01

    The main objective of this work is to adapt the STR-Tree model, which combines a Smooth Transition Regression model with Classification and Regression Tree (CART), so that it can be used for classification. To this end, some changes were made to its structural form and to its estimation. Because we are classifying binary dependent variables, the techniques employed in logistic regression are required, and thus the estimation of the pa...

  10. Boosted Regression Tree Models to Explain Watershed ...

    Science.gov (United States)

    Boosted regression tree (BRT) models were developed to quantify the nonlinear relationships between landscape variables and nutrient concentrations in a mesoscale mixed land cover watershed during base-flow conditions. Factors that affect instream biological components, based on the Index of Biotic Integrity (IBI), were also analyzed. Seasonal BRT models at two spatial scales (watershed and riparian buffered area [RBA]) for nitrite-nitrate (NO2-NO3), total Kjeldahl nitrogen, and total phosphorus (TP) and annual models for the IBI score were developed. Two primary factors — location within the watershed (i.e., geographic position, stream order, and distance to a downstream confluence) and percentage of urban land cover (both scales) — emerged as important predictor variables. Latitude and longitude interacted with other factors to explain the variability in summer NO2-NO3 concentrations and IBI scores. BRT results also suggested that location might be associated with indicators of sources (e.g., land cover), runoff potential (e.g., soil and topographic factors), and processes not easily represented by spatial data indicators. Runoff indicators (e.g., Hydrological Soil Group D and Topographic Wetness Indices) explained a substantial portion of the variability in nutrient concentrations as did point sources for TP in the summer months. The results from our BRT approach can help prioritize areas for nutrient management in mixed-use and heavily impacted watershed
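
    As a rough illustration of the boosted regression tree idea referred to above, the following minimal sketch fits a sequence of one-split trees (stumps) to the residuals of the running prediction under squared-error loss. It is not the authors' model: the single predictor, the toy data, and the names fit_stump, boost, n_trees and learning_rate are all assumptions made for this example.

```python
from statistics import fmean


def fit_stump(xs, residuals):
    """Fit a one-split regression tree (stump) to the residuals by least squares."""
    best = None
    # Candidate thresholds: every distinct x except the largest,
    # so that both sides of the split are non-empty.
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = fmean(left), fmean(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def boost(xs, ys, n_trees=50, learning_rate=0.1):
    """Boosting for squared error: each stump is fitted to the current residuals."""
    base = fmean(ys)
    stumps = []
    preds = [base] * len(ys)
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Shrink each stump's contribution by the learning rate.
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]

    def predict(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    return predict


# Hypothetical toy data with a nonlinear relationship (y = x^2).
xs = list(range(10))
ys = [float(x * x) for x in xs]
model = boost(xs, ys, n_trees=200, learning_rate=0.1)

mse_base = fmean((y - fmean(ys)) ** 2 for y in ys)             # constant-mean model
mse_boost = fmean((y - model(x)) ** 2 for x, y in zip(xs, ys))  # boosted stumps
```

    On this toy data the boosted model's training error falls well below that of the constant-mean baseline. A real application would use several predictors, deeper trees, and held-out data to choose the number of trees.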

  11. Regression analysis using dependent Polya trees.

    Science.gov (United States)

    Schörgendorfer, Angela; Branscum, Adam J

    2013-11-30

    Many commonly used models for linear regression analysis force overly simplistic shape and scale constraints on the residual structure of data. We propose a semiparametric Bayesian model for regression analysis that produces data-driven inference by using a new type of dependent Polya tree prior to model arbitrary residual distributions that are allowed to evolve across increasing levels of an ordinal covariate (e.g., time, in repeated measurement studies). By modeling residual distributions at consecutive covariate levels or time points using separate, but dependent Polya tree priors, distributional information is pooled while allowing for broad pliability to accommodate many types of changing residual distributions. We can use the proposed dependent residual structure in a wide range of regression settings, including fixed-effects and mixed-effects linear and nonlinear models for cross-sectional, prospective, and repeated measurement data. A simulation study illustrates the flexibility of our novel semiparametric regression model to accurately capture evolving residual distributions. In an application to immune development data on immunoglobulin G antibodies in children, our new model outperforms several contemporary semiparametric regression models based on a predictive model selection criterion. Copyright © 2013 John Wiley & Sons, Ltd.

  12. Classification and Regression Trees (CART) Theory and Applications

    OpenAIRE

    Timofeev, Roman

    2004-01-01

    This master thesis is devoted to Classification and Regression Trees (CART). CART is a classification method which uses historical data to construct decision trees. Depending on the available information about the dataset, a classification tree or a regression tree can be constructed. The constructed tree can then be used for classification of new observations. The first part of the thesis describes fundamental principles of tree construction, different splitting algorithms and pruning procedures. Seco...
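
    To make the splitting step concrete, here is a minimal sketch of how a regression tree might choose a single split on one numeric predictor: it tries each midpoint between adjacent sorted values and keeps the one that minimises the summed squared error of the two resulting leaves. The function names (sse, best_split) and the toy data are hypothetical, and the thesis itself may describe other splitting criteria.

```python
from statistics import fmean


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, total_sse) for the best binary split x <= threshold.

    The threshold is None if no split improves on a single leaf.
    """
    pairs = sorted(zip(xs, ys))
    best = (None, sse(ys))  # baseline: no split at all
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best


# Hypothetical toy data: two clearly separated groups of observations.
xs = [1, 2, 3, 10, 11, 12]
ys = [5.0, 6.0, 5.5, 20.0, 21.0, 19.5]
threshold, err = best_split(xs, ys)
```

    A full tree would apply best_split recursively to each side and then prune; the sketch covers only one node.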

  13. Combining regression trees and radial basis function networks.

    Science.gov (United States)

    Orr, M; Hallam, J; Takezawa, K; Murra, A; Ninomiya, S; Oide, M; Leonard, T

    2000-12-01

    We describe a method for non-parametric regression which combines regression trees with radial basis function networks. The method is similar to that of Kubat, who was first to suggest such a combination, but has some significant improvements. We demonstrate the features of the new method, compare its performance with other methods on DELVE data sets and apply it to a real world problem involving the classification of soybean plants from digital images.

  14. Application of Boosting Regression Trees to Preliminary Cost Estimation in Building Construction Projects

    OpenAIRE

    Yoonseok Shin

    2015-01-01

    Among the recent data mining techniques available, the boosting approach has attracted a great deal of attention because of its effective learning algorithm and strong boundaries in terms of its generalization performance. However, the boosting approach has yet to be used in regression problems within the construction domain, including cost estimations, but has been actively utilized in other domains. Therefore, a boosting regression tree (BRT) is applied to cost estimations at the early stag...

  15. Estimation of Tree Cover in an Agricultural Parkland of Senegal Using Rule-Based Regression Tree Modeling

    Directory of Open Access Journals (Sweden)

    Stefanie M. Herrmann

    2013-10-01

    Field trees are an integral part of the farmed parkland landscape in West Africa and provide multiple benefits to the local environment and livelihoods. While field trees have received increasing interest in the context of strengthening resilience to climate variability and change, the actual extent of farmed parkland and spatial patterns of tree cover are largely unknown. We used the rule-based predictive modeling tool Cubist® to estimate field tree cover in the west-central agricultural region of Senegal. A collection of rules and associated multiple linear regression models was constructed from (1) a reference dataset of percent tree cover derived from very high spatial resolution data (2 m Orbview) as the dependent variable, and (2) ten years of 10-day 250 m Moderate Resolution Imaging Spectrometer (MODIS) Normalized Difference Vegetation Index (NDVI) composites and derived phenological metrics as independent variables. Correlation coefficients between modeled and reference percent tree cover of 0.88 and 0.77 were achieved for training and validation data respectively, with absolute mean errors of 1.07 and 1.03 percent tree cover. The resulting map shows a west-east gradient from high tree cover in the peri-urban areas of horticulture and arboriculture to low tree cover in the more sparsely populated eastern part of the study area. A comparison of current (2000s) tree cover along this gradient with historic cover as seen on Corona images reveals dynamics of change but also areas of remarkable stability of field tree cover since 1968. The proposed modeling approach can help to identify locations of high and low tree cover in dryland environments and guide ground studies and management interventions aimed at promoting the integration of field trees in agricultural systems.

  16. Iron Supplementation and Altitude: Decision Making Using a Regression Tree

    Directory of Open Access Journals (Sweden)

    Laura A. Garvican-Lewis, Andrew D. Govus, Peter Peeling, Chris R. Abbiss, Christopher J. Gore

    2016-03-01

    not increase in the 8 athletes with Ferritin-Pre <35 µg.L-1 who were supplemented with 105 mg.d-1, it is possible that in these athletes this dose was insufficient to support erythropoiesis. Furthermore, athletes with higher Ferritin-Pre, but large daily iron requirements, may also benefit from supplementation at altitude. The lack of improvement in Hbmass in non-supplemented athletes may be related to reduced iron availability and mobility, despite plentiful iron stores, perhaps due to an increase in hepcidin post-exercise (Peeling, 2010). In summary, our regression tree suggests if sufficient iron is made available (via supplementation), even ID athletes can improve Hbmass in response to altitude. Some athletes with otherwise normal ferritin may also require supplementation to maintain an iron balance capable of supporting both the haematological and non-haematological adaptations to altitude. We recommend an individualised approach when deciding whether iron supplementation is appropriate, particularly concerning the dose provided.

  17. Rank regression: an alternative regression approach for data with outliers.

    Science.gov (United States)

    Chen, Tian; Tang, Wan; Lu, Ying; Tu, Xin

    2014-10-01

    Linear regression models are widely used in mental health and related health services research. However, the classic linear regression analysis assumes that the data are normally distributed, an assumption that is not met by the data obtained in many studies. One method of dealing with this problem is to use semi-parametric models, which do not require that the data be normally distributed. But semi-parametric models are quite sensitive to outlying observations, so the generated estimates are unreliable when study data includes outliers. In this situation, some researchers trim the extreme values prior to conducting the analysis, but the ad-hoc rules used for data trimming are based on subjective criteria so different methods of adjustment can yield different results. Rank regression provides a more objective approach to dealing with non-normal data that includes outliers. This paper uses simulated and real data to illustrate this useful regression approach for dealing with outliers and compares it to the results generated using classical regression models and semi-parametric regression models.

  18. Sub-pixel estimation of tree cover and bare surface densities using regression tree analysis

    Directory of Open Access Journals (Sweden)

    Carlos Augusto Zangrando Toneli

    2011-09-01

    Sub-pixel analysis is capable of generating continuous fields, which represent the spatial variability of certain thematic classes. The aim of this work was to develop numerical models to represent the variability of tree cover and bare surfaces within the study area. This research was conducted in the riparian buffer within a watershed of the São Francisco River in the North of Minas Gerais, Brazil. IKONOS and Landsat TM imagery were used with the GUIDE algorithm to construct the models. The results were two index images derived with regression trees for the entire study area, one representing tree cover and the other representing bare surface. The use of non-parametric and non-linear regression tree models presented satisfactory results to characterize wetland, deciduous and savanna patterns of forest formation.

  19. Data mining in psychological treatment research: a primer on classification and regression trees.

    Science.gov (United States)

    King, Matthew W; Resick, Patricia A

    2014-10-01

    Data mining of treatment study results can reveal unforeseen but critical insights, such as who receives the most benefit from treatment and under what circumstances. The usefulness and legitimacy of exploratory data analysis have received relatively little recognition, however, and analytic methods well suited to the task are not widely known in psychology. With roots in computer science and statistics, statistical learning approaches offer a credible option: These methods take a more inductive approach to building a model than is done in traditional regression, allowing the data a greater role in suggesting the correct relationships between variables rather than imposing them a priori. Classification and regression trees are presented as a powerful, flexible exemplar of statistical learning methods. Trees allow researchers to efficiently identify useful predictors of an outcome and discover interactions between predictors without the need to anticipate and specify these in advance, making them ideal for revealing patterns that inform hypotheses about treatment effects. Trees can also provide a predictive model for forecasting outcomes as an aid to clinical decision making. This primer describes how tree models are constructed, how the results are interpreted and evaluated, and how trees overcome some of the complexities of traditional regression. Examples are drawn from randomized clinical trial data and highlight some interpretations of particular interest to treatment researchers. The limitations of tree models are discussed, and suggestions for further reading and choices in software are offered.

  20. A case study found that a regression tree outperformed multiple linear regression in predicting the relationship between impairments and Social and Productive Activities scores.

    Science.gov (United States)

    Allore, Heather; Tinetti, Mary E; Araujo, Katy L B; Hardy, Susan; Peduzzi, Peter

    2005-02-01

    Many important physiologic and clinical predictors are continuous. Clinical investigators and epidemiologists' interest in these predictors lies, in part, in the risk they pose for adverse outcomes, which may be continuous as well. The relationship between continuous predictors and a continuous outcome may be complex and difficult to interpret. Therefore, methods to detect levels of a predictor variable that predict the outcome and determine the threshold for clinical intervention would provide a beneficial tool for clinical investigators and epidemiologists. We present a case study using regression tree methodology to predict Social and Productive Activities score at 3 years using five modifiable impairments. The predictive ability of regression tree methodology was compared with multiple linear regression using two independent data sets, one for development and one for validation. The regression tree approach and the multiple linear regression model provided similar fit (model deviances) on the development cohort. In the validation cohort, the deviance of the multiple linear regression model was 31% greater than the regression tree approach. Regression tree analysis developed a better model of impairments predicting Social and Productive Activities score that may be more easily applied in research settings than multiple linear regression alone.

  1. Fuzzy multiple linear regression: A computational approach

    Science.gov (United States)

    Juang, C. H.; Huang, X. H.; Fleming, J. W.

    1992-01-01

    This paper presents a new computational approach for performing fuzzy regression. In contrast to Bardossy's approach, the new approach, while dealing with fuzzy variables, closely follows the conventional regression technique. In this approach, treatment of fuzzy input is more 'computational' than 'symbolic.' The following sections first outline the formulation of the new approach, then deal with the implementation and computational scheme, and this is followed by examples to illustrate the new procedure.

  2. The identification of complex interactions in epidemiology and toxicology : a simulation study of Boosted Regression Trees

    OpenAIRE

    Lampa, Erik; Lind, Lars; Lind, Monica P.; Bornefalk-Hermansson, Anna

    2014-01-01

    Background: There is a need to evaluate complex interaction effects on human health, such as those induced by mixtures of environmental contaminants. The usual approach is to formulate an additive statistical model and check for departures using product terms between the variables of interest. In this paper, we present an approach to search for interaction effects among several variables using boosted regression trees. Methods: We simulate a continuous outcome from real data on 27 environment...

  3. Regression Trees Identify Relevant Interactions: Can This Improve the Predictive Performance of Risk Adjustment?

    Science.gov (United States)

    Buchner, Florian; Wasem, Jürgen; Schillo, Sonja

    2017-01-01

    Risk equalization formulas have been refined since their introduction about two decades ago. Because of the complexity and the abundance of possible interactions between the variables used, hardly any interactions are considered. A regression tree is used to systematically search for interactions, a methodologically new approach in risk equalization. Analyses are based on a data set of nearly 2.9 million individuals from a major German social health insurer. A two-step approach is applied: In the first step a regression tree is built on the basis of the learning data set. Terminal nodes characterized by more than one morbidity-group-split represent interaction effects of different morbidity groups. In the second step the 'traditional' weighted least squares regression equation is expanded by adding interaction terms for all interactions detected by the tree, and regression coefficients are recalculated. The resulting risk adjustment formula shows an improvement in the adjusted R² from 25.43% to 25.81% on the evaluation data set. Predictive ratios are calculated for subgroups affected by the interactions. The R² improvement detected is only marginal. According to the sample level performance measures used, omitting a considerable number of morbidity interactions causes no relevant loss in accuracy. Copyright © 2015 John Wiley & Sons, Ltd.

  4. Prediction of tissue-specific cis-regulatory modules using Bayesian networks and regression trees

    Directory of Open Access Journals (Sweden)

    Chen Xiaoyu

    2007-12-01

    Background: In vertebrates, a large part of gene transcriptional regulation is operated by cis-regulatory modules. These modules are believed to be regulating much of the tissue-specificity of gene expression. Results: We develop a Bayesian network approach for identifying cis-regulatory modules likely to regulate tissue-specific expression. The network integrates predicted transcription factor binding site information, transcription factor expression data, and target gene expression data. At its core is a regression tree modeling the effect of combinations of transcription factors bound to a module. A new unsupervised EM-like algorithm is developed to learn the parameters of the network, including the regression tree structure. Conclusion: Our approach is shown to accurately identify known human liver and erythroid-specific modules. When applied to the prediction of tissue-specific modules in 10 different tissues, the network predicts a number of important transcription factor combinations whose concerted binding is associated with specific expression.

  5. Predicting electricity energy consumption: A comparison of regression analysis, decision tree and neural networks

    Energy Technology Data Exchange (ETDEWEB)

    Tso, Geoffrey K.F.; Yau, Kelvin K.W. [City University of Hong Kong, Kowloon, Hong Kong (China). Department of Management Sciences

    2007-09-15

    This study presents three modeling techniques for the prediction of electricity energy consumption. In addition to the traditional regression analysis, decision tree and neural networks are considered. Model selection is based on the square root of average squared error. In an empirical application to an electricity energy consumption study, the decision tree and neural network models appear to be viable alternatives to the stepwise regression model in understanding energy consumption patterns and predicting energy consumption levels. With the emergence of the data mining approach for predictive modeling, different types of models can be built in a unified platform: to implement various modeling techniques, assess the performance of different models and select the most appropriate model for future prediction. (author)
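
    The selection criterion mentioned above, the square root of the average squared error (RMSE), can be sketched as follows. The observed values and the three sets of predictions are invented purely for illustration and do not come from the study; only the criterion itself is taken from the abstract.

```python
import math


def rmse(actual, predicted):
    """Square root of the average squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical hold-out observations and model predictions (made-up numbers).
actual = [10.0, 12.0, 14.0, 16.0]
candidates = {
    "regression": [11.0, 12.0, 13.0, 17.0],
    "decision_tree": [10.0, 12.5, 14.0, 15.5],
    "neural_network": [9.0, 12.0, 15.0, 18.0],
}

# Score every candidate and keep the one with the smallest RMSE.
scores = {name: rmse(actual, preds) for name, preds in candidates.items()}
best_model = min(scores, key=scores.get)
```

    Scoring all candidates on the same hold-out data keeps the comparison fair; which model wins here is an artefact of the made-up numbers.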

  6. Data to support "Boosted Regression Tree Models to Explain Watershed Nutrient Concentrations & Biological Condition"

    Data.gov (United States)

    U.S. Environmental Protection Agency — Spreadsheets are included here to support the manuscript "Boosted Regression Tree Models to Explain Watershed Nutrient Concentrations and Biological Condition". This...

  7. Comparing Methodologies for Developing an Early Warning System: Classification and Regression Tree Model versus Logistic Regression. REL 2015-077

    Science.gov (United States)

    Koon, Sharon; Petscher, Yaacov

    2015-01-01

    The purpose of this report was to explicate the use of logistic regression and classification and regression tree (CART) analysis in the development of early warning systems. It was motivated by state education leaders' interest in maintaining high classification accuracy while simultaneously improving practitioner understanding of the rules by…

  8. Market-based approaches to tree valuation

    Science.gov (United States)

    Geoffrey H. Donovan; David T. Butry

    2008-01-01

    A recent four-part series in Arborist News outlined different appraisal processes used to value urban trees. The final article in the series described the three generally accepted approaches to tree valuation: the sales comparison approach, the cost approach, and the income capitalization approach. The author, D. Logan Nelson, noted that the sales comparison approach...

  9. Using ROC curves to compare neural networks and logistic regression for modeling individual noncatastrophic tree mortality

    Science.gov (United States)

    Susan L. King

    2003-01-01

    The performance of two classifiers, logistic regression and neural networks, is compared for modeling noncatastrophic individual tree mortality for 21 species of trees in West Virginia. The output of the classifier is usually a continuous number between 0 and 1. A threshold is selected between 0 and 1 and all of the trees below the threshold are classified as...

  10. Malignancy Risk Assessment in Patients with Thyroid Nodules Using Classification and Regression Trees

    Directory of Open Access Journals (Sweden)

    Shokouh Taghipour Zahir

    2013-01-01

    Full Text Available Purpose. We sought to investigate the utility of the classification and regression trees (CART) classifier to differentiate benign from malignant nodules in patients referred for thyroid surgery. Methods. Clinical and demographic data of 271 patients referred to the Sadoughi Hospital during 2006–2011 were collected. In a two-step approach, a CART classifier was employed to differentiate patients with a high versus low risk of thyroid malignancy. The first step served as the screening procedure and was tailored to produce as few false negatives as possible. The second step identified those with the lowest risk of malignancy, chosen from a high risk population. Sensitivity, specificity, and positive and negative predictive values (PPV and NPV) of the optimal tree were calculated. Results. In the first step, age, sex, and nodule size contributed to the optimal tree. Ultrasonographic features were employed in the second step with hypoechogenicity and/or microcalcifications yielding the highest discriminatory ability. The combined tree produced a sensitivity and specificity of 80.0% (95% CI: 29.9–98.9) and 94.1% (95% CI: 78.9–99.0), respectively. NPV and PPV were 66.7% (41.1–85.6) and 97.0% (82.5–99.8), respectively. Conclusion. CART classifier reliably identifies patients with a low risk of malignancy who can avoid unnecessary surgery.
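
    The four accuracy measures reported in this abstract follow directly from a 2x2 confusion matrix. The sketch below shows the standard definitions; the function name `diagnostic_metrics` and the counts are hypothetical, and treating "malignant" as the positive class is an assumption.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Return sensitivity, specificity, PPV and NPV from confusion-matrix counts.

    tp/fn: positive cases classified correctly/incorrectly
    tn/fp: negative cases classified correctly/incorrectly
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of true positives that are detected
        "specificity": tn / (tn + fp),  # share of true negatives that are cleared
        "ppv": tp / (tp + fp),          # share of positive calls that are correct
        "npv": tn / (tn + fn),          # share of negative calls that are correct
    }

# Hypothetical counts, made up for illustration (not the study's data).
metrics = diagnostic_metrics(tp=40, fp=5, tn=45, fn=10)
```

    A screening step tuned for few false negatives, as described above, trades a lower specificity for a higher sensitivity.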

  11. Application of Boosting Regression Trees to Preliminary Cost Estimation in Building Construction Projects.

    Science.gov (United States)

    Shin, Yoonseok

    2015-01-01

    Among the recent data mining techniques available, the boosting approach has attracted a great deal of attention because of its effective learning algorithm and strong boundaries in terms of its generalization performance. However, the boosting approach has yet to be used in regression problems within the construction domain, including cost estimations, but has been actively utilized in other domains. Therefore, a boosting regression tree (BRT) is applied to cost estimations at the early stage of a construction project to examine the applicability of the boosting approach to a regression problem within the construction domain. To evaluate the performance of the BRT model, its performance was compared with that of a neural network (NN) model, which has been proven to have a high performance in cost estimation domains. The BRT model has shown results similar to those of NN model using 234 actual cost datasets of a building construction project. In addition, the BRT model can provide additional information such as the importance plot and structure model, which can support estimators in comprehending the decision making process. Consequently, the boosting approach has potential applicability in preliminary cost estimations in a building construction project.

  12. Application of Boosting Regression Trees to Preliminary Cost Estimation in Building Construction Projects

    Directory of Open Access Journals (Sweden)

    Yoonseok Shin

    2015-01-01

    Full Text Available Among the recent data mining techniques available, the boosting approach has attracted a great deal of attention because of its effective learning algorithm and strong boundaries in terms of its generalization performance. However, the boosting approach has yet to be used in regression problems within the construction domain, including cost estimations, but has been actively utilized in other domains. Therefore, a boosting regression tree (BRT) is applied to cost estimations at the early stage of a construction project to examine the applicability of the boosting approach to a regression problem within the construction domain. To evaluate the performance of the BRT model, its performance was compared with that of a neural network (NN) model, which has been proven to have a high performance in cost estimation domains. The BRT model has shown results similar to those of NN model using 234 actual cost datasets of a building construction project. In addition, the BRT model can provide additional information such as the importance plot and structure model, which can support estimators in comprehending the decision making process. Consequently, the boosting approach has potential applicability in preliminary cost estimations in a building construction project.

  13. Classification and regression tree analysis vs. multivariable linear and logistic regression methods as statistical tools for studying haemophilia.

    Science.gov (United States)

    Henrard, S; Speybroeck, N; Hermans, C

    2015-11-01

    Haemophilia is a rare genetic haemorrhagic disease characterized by partial or complete deficiency of coagulation factor VIII, for haemophilia A, or IX, for haemophilia B. As in any other medical research domain, the field of haemophilia research is increasingly concerned with finding factors associated with binary or continuous outcomes through multivariable models. Traditional models include multiple logistic regressions, for binary outcomes, and multiple linear regressions for continuous outcomes. Yet these regression models are at times difficult to implement, especially for non-statisticians, and can be difficult to interpret. The present paper sought to didactically explain how, why, and when to use classification and regression tree (CART) analysis for haemophilia research. The CART method is non-parametric and non-linear, based on the repeated partitioning of a sample into subgroups based on a certain criterion. Breiman developed this method in 1984. Classification trees (CTs) are used to analyse categorical outcomes and regression trees (RTs) to analyse continuous ones. The CART methodology has become increasingly popular in the medical field, yet only a few examples of studies using this methodology specifically in haemophilia have to date been published. Two examples using CART analysis and previously published in this field are didactically explained in details. There is increasing interest in using CART analysis in the health domain, primarily due to its ease of implementation, use, and interpretation, thus facilitating medical decision-making. This method should be promoted for analysing continuous or categorical outcomes in haemophilia, when applicable. © 2015 John Wiley & Sons Ltd.
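
    The "repeated partitioning of a sample into subgroups based on a certain criterion" can be made concrete with a single split step of a regression tree. The sketch below assumes the least-squares criterion (minimising the summed squared error within the two subgroups), which is one common choice and is not stated in the abstract; the helper names `sse` and `best_split` and the data are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(x, y):
    """Find the threshold on one predictor x that minimises the total SSE of y
    in the two resulting subgroups.

    Returns (threshold, total_sse), or (None, sse(y)) if no split improves on
    the unsplit sample.
    """
    pairs = sorted(zip(x, y))
    best_threshold, best_cost = None, sse(list(y))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between identical predictor values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [yv for _, yv in pairs[:i]]
        right = [yv for _, yv in pairs[i:]]
        cost = sse(left) + sse(right)
        if cost < best_cost:
            best_threshold, best_cost = threshold, cost
    return best_threshold, best_cost

# Hypothetical predictor and continuous outcome, made up for illustration.
x = [1, 2, 3, 10, 11, 12]
y = [5.0, 6.0, 5.5, 20.0, 21.0, 19.0]
threshold, cost = best_split(x, y)
```

    A full tree would apply `best_split` recursively to each subgroup, over all predictors, until a stopping rule is met; a classification tree would swap the squared-error criterion for an impurity measure such as the Gini index.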

  14. Boosted Regression Tree Models to Explain Watershed Nutrient Concentrations and Biological Condition

    Science.gov (United States)

    Boosted regression tree (BRT) models were developed to quantify the nonlinear relationships between landscape variables and nutrient concentrations in a mesoscale mixed land cover watershed during base-flow conditions. Factors that affect instream biological components, based on ...

  16. Combining an additive and tree-based regression model simultaneously: STIMA

    NARCIS (Netherlands)

    Dusseldorp, E.; Conversano, C.; Os, B.J. van

    2010-01-01

    Additive models and tree-based regression models are two main classes of statistical models used to predict the scores on a continuous response variable. It is known that additive models become very complex in the presence of higher order interaction effects, whereas some tree-based models, such as

  17. Aproximación a la metodología basada en árboles de decisión (CART): Mortalidad hospitalaria del infarto agudo de miocardio / Approach to the methodology of classification and regression trees

    Directory of Open Access Journals (Sweden)

    Javier Trujillano

    2008-02-01

    Full Text Available Objective: To provide an overview of decision trees based on CART (Classification and Regression Trees) methodology. As an example, we developed a CART model intended to estimate the probability of intrahospital death from acute myocardial infarction (AMI). Method: We employed the minimum data set (MDS) of Andalusia, Catalonia, Madrid and the Basque Country (2001-2002), which included 33,203 patients with a diagnosis of AMI. The 33,203 patients were randomly divided (70% and 30%) into a development set (DS = 23,277) and a validation set (VS = 9,926). The CART was an inductive model based on Breiman's algorithm, with sensitivity analysis using the Gini index and cross-validation. It was compared with a logistic regression (LR) model and an artificial neural network (ANN) (multilayer perceptron). The developed models were tested on the VS, and their properties were compared using the area under the ROC curve (AUC) (95% confidence interval). Results: In the DS, the CART had AUC = 0.85 (0.86-0.88), LR 0.87 (0.86-0.88) and ANN 0.85 (0.85-0.86). In the VS, the CART had AUC = 0.85 (0.85-0.88), LR 0.86 (0.85-0.88) and ANN 0.84 (0.83-0.86). Conclusions: The three models obtained similar results in their discriminatory capacity. The CART model offers the advantage of simplicity of use and interpretation, since the decision rules it generates can be applied without mathematical processing.

  18. A classification and regression tree model of controls on dissolved inorganic nitrogen leaching from European forests.

    Science.gov (United States)

    Rothwell, James J; Futter, Martyn N; Dise, Nancy B

    2008-11-01

    Often, there is a non-linear relationship between atmospheric dissolved inorganic nitrogen (DIN) input and DIN leaching that is poorly captured by existing models. We present the first application of the non-parametric classification and regression tree approach to evaluate the key environmental drivers controlling DIN leaching from European forests. DIN leaching was classified as low (15kg N ha(-1) year(-1)) at 215 sites across Europe. The analysis identified throughfall NO(3)(-) deposition, acid deposition, hydrology, soil type, the carbon content of the soil, and the legacy of historic N deposition as the dominant drivers of DIN leaching for these forests. Ninety four percent of sites were successfully classified into the appropriate leaching category. This approach shows promise for understanding complex ecosystem responses to a wide range of anthropogenic stressors as well as an improved method for identifying risk and targeting pollution mitigation strategies in forest ecosystems.

  19. Classification and regression trees (CART) for estimation of prognosis in patients with gastric carcinoma.

    Science.gov (United States)

    Hermanek, P; Guggenmoos-Holzmann, I

    1994-01-01

    A total of 961 patients who had received resective surgery for gastric carcinoma were grouped according to prognosis by classification and regression trees (CART). This grouping was compared to the present UICC stage grouping. For patients resected for cure (R0) the CART approach allows a better discrimination of patients with poor prognosis (5-year survival rates 15%-30%) from patients with a 5-year survival of 50%, on the one hand, and from patients with extremely poor prognosis (5-year survival rates below 5%) on the other. In the present investigation CART grouping was not influenced by the differentiation between pT1 and pT2 or between pT3 and pT4.

  20. Applied Regression Modeling A Business Approach

    CERN Document Server

    Pardoe, Iain

    2012-01-01

    An applied and concise treatment of statistical regression techniques for business students and professionals who have little or no background in calculus. Regression analysis is an invaluable statistical methodology in business settings and is vital to model the relationship between a response variable and one or more predictor variables, as well as the prediction of a response value given values of the predictors. In view of the inherent uncertainty of business processes, such as the volatility of consumer spending and the presence of market uncertainty, business professionals use regression a

  1. Regression tree modeling of forest NPP using site conditions and climate variables across eastern USA

    Science.gov (United States)

    Kwon, Y.

    2013-12-01

    As evidence of global warming continues to increase, being able to predict forest response to climate changes, such as the expected rise in temperature and precipitation, will be vital for maintaining the sustainability and productivity of forests. Mapping forest species redistribution under climate change scenarios has been successful; however, most species redistribution maps lack the mechanistic understanding needed to explain why trees grow under the novel conditions of a changing climate. A distributional map is only capable of predicting under the equilibrium assumption that the communities would exist following a prolonged period under the new climate. In this context, forest NPP, as a surrogate for growth rate, the most important facet that determines stand dynamics, can lead to valid predictions of the transition stage to a new vegetation-climate equilibrium, as it represents changes in forest structure reflecting site conditions and climate factors. The objective of this study is to develop a forest growth map using regression tree analysis by extracting large-scale non-linear structures from both field-based FIA and remotely sensed MODIS data sets. The major issue addressed in this approach is the non-linear spatial patterns of forest attributes. Forest inventory data showed complex spatial patterns that reflect environmental states and processes originating at different spatial scales. At broad scales, non-linear spatial trends in forest attributes and a mixture of continuous and discrete types of environmental variables make traditional statistical (multivariate regression) and geostatistical (kriging) models inefficient. This calls into question some traditional underlying assumptions about spatial trends that are uncritically accepted in forest data. To resolve the controversy surrounding the suitability of forest data, regression tree analysis is performed using the software See5 and Cubist.
Four publicly available data sets were obtained: First, field-based Forest Inventory and Analysis (USDA

  2. Decision tree approach for soil liquefaction assessment.

    Science.gov (United States)

    Gandomi, Amir H; Fridline, Mark M; Roke, David A

    2013-01-01

    In the current study, the performances of some decision tree (DT) techniques are evaluated for postearthquake soil liquefaction assessment. A database containing 620 records of seismic parameters and soil properties is used in this study. Three decision tree techniques are used here in two different ways, considering statistical and engineering points of view, to develop decision rules. The DT results are compared to the logistic regression (LR) model. The results of this study indicate that the DTs not only successfully predict liquefaction but they can also outperform the LR model. The best DT models are interpreted and evaluated based on an engineering point of view.

  3. Development of prognostic indicators using Classification And Regression Trees (CART) for survival

    Science.gov (United States)

    Nunn, Martha E.; Fan, Juanjuan; Su, Xiaogang; McGuire, Michael K.

    2014-01-01

    The development of an accurate prognosis is an integral component of treatment planning in the practice of periodontics. Prior work has evaluated the validity of using various clinically measured parameters for assigning periodontal prognosis as well as for predicting tooth survival and change in clinical conditions over time. We critically review the application of multivariate Classification And Regression Trees (CART) for survival in developing evidence-based periodontal prognostic indicators. We focus attention on two distinct methods of multivariate CART for survival: the marginal goodness-of-fit approach, and the multivariate exponential approach. A number of common clinical measures have been found to be significantly associated with tooth loss from periodontal disease, including furcation involvement, probing depth, mobility, crown-to-root ratio, and oral hygiene. However, the inter-relationships among these measures, as well as the relevance of other clinical measures to tooth loss from periodontal disease (such as bruxism, family history of periodontal disease, and overall bone loss), remain less clear. While inferences drawn from any single current study are necessarily limited, the application of new approaches in epidemiologic analyses to periodontal prognosis, such as CART for survival, should yield important insights into our understanding, and treatment, of periodontal diseases. PMID:22133372

  4. Reconstructing missing daily precipitation data using regression trees and artificial neural networks

    Science.gov (United States)

    Incomplete meteorological data has been a problem in environmental modeling studies. The objective of this work was to develop a technique to reconstruct missing daily precipitation data in the central part of Chesapeake Bay Watershed using regression trees (RT) and artificial neural networks (ANN)....

  5. Multi-site solar power forecasting using gradient boosted regression trees

    DEFF Research Database (Denmark)

    Persson, Caroline Stougård; Bacher, Peder; Shiga, Takahiro

    2017-01-01

    generation and relevant meteorological variables related to 42 individual PV rooftop installations are used to train a gradient boosted regression tree (GBRT) model. When compared to single-site linear autoregressive and variations of GBRT models the multi-site model shows competitive results in terms...

  6. Risk Factors of Falls in Community-Dwelling Older Adults: Logistic Regression Tree Analysis

    Science.gov (United States)

    Yamashita, Takashi; Noe, Douglas A.; Bailer, A. John

    2012-01-01

    Purpose of the Study: A novel logistic regression tree-based method was applied to identify fall risk factors and possible interaction effects of those risk factors. Design and Methods: A nationally representative sample of American older adults aged 65 years and older (N = 9,592) in the Health and Retirement Study 2004 and 2006 modules was used.…

  8. Hyperspectral analysis of soil nitrogen, carbon, carbonate, and organic matter using regression trees.

    Science.gov (United States)

    Gmur, Stephan; Vogt, Daniel; Zabowski, Darlene; Moskal, L Monika

    2012-01-01

    The characterization of soil attributes using hyperspectral sensors has revealed patterns in soil spectra that are known to respond to mineral composition, organic matter, soil moisture and particle size distribution. Soil samples from different soil horizons of replicated soil series from sites located within Washington and Oregon were analyzed with the FieldSpec Spectroradiometer to measure their spectral signatures across the electromagnetic range of 400 to 1,000 nm. Similarity rankings of individual soil samples reveal differences between replicate series as well as samples within the same replicate series. Using classification and regression tree statistical methods, regression trees were fitted to each spectral response using concentrations of nitrogen, carbon, carbonate and organic matter as the response variables. Statistics resulting from fitted trees were: nitrogen R(2) 0.91 (p < 0.01) at 403, 470, 687, and 846 nm spectral band widths, carbonate R(2) 0.95 (p < 0.01) at 531 and 898 nm band widths, total carbon R(2) 0.93 (p < 0.01) at 400, 409, 441 and 907 nm band widths, and organic matter R(2) 0.98 (p < 0.01) at 300, 400, 441, 832 and 907 nm band widths. Use of the 400 to 1,000 nm electromagnetic range utilizing regression trees provided a powerful, rapid and inexpensive method for assessing nitrogen, carbon, carbonate and organic matter for upper soil horizons in a nondestructive method.

  9. GIS-based groundwater potential mapping using boosted regression tree, classification and regression tree, and random forest machine learning models in Iran.

    Science.gov (United States)

    Naghibi, Seyed Amir; Pourghasemi, Hamid Reza; Dixon, Barnali

    2016-01-01

    Groundwater is considered one of the most valuable fresh water resources. The main objective of this study was to produce groundwater spring potential maps in the Koohrang Watershed, Chaharmahal-e-Bakhtiari Province, Iran, using three machine learning models: boosted regression tree (BRT), classification and regression tree (CART), and random forest (RF). Thirteen hydrological-geological-physiographical (HGP) factors that influence locations of springs were considered in this research. These factors include slope degree, slope aspect, altitude, topographic wetness index (TWI), slope length (LS), plan curvature, profile curvature, distance to rivers, distance to faults, lithology, land use, drainage density, and fault density. Subsequently, groundwater spring potential was modeled and mapped using CART, RF, and BRT algorithms. The predicted results from the three models were validated using the receiver operating characteristics curve (ROC). From 864 springs identified, 605 (≈70 %) locations were used for the spring potential mapping, while the remaining 259 (≈30 %) springs were used for the model validation. The area under the curve (AUC) for the BRT model was calculated as 0.8103 and for CART and RF the AUC were 0.7870 and 0.7119, respectively. Therefore, it was concluded that the BRT model produced the best prediction results while predicting locations of springs followed by CART and RF models, respectively. Geospatially integrated BRT, CART, and RF methods proved to be useful in generating the spring potential map (SPM) with reasonable accuracy.
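
    The area under the ROC curve used for validation in this abstract equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. A minimal sketch of that pairwise computation follows; the function name `auc`, the use of non-spring locations as the negative class, and all scores are hypothetical and are not the study's data.

```python
def auc(scores_positive, scores_negative):
    """Area under the ROC curve via pairwise comparison.

    Counts the share of (positive, negative) pairs in which the positive case
    has the higher score; ties count as half a win.
    """
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))

# Hypothetical model scores at validation locations (made up for illustration).
spring_scores = [0.9, 0.8, 0.45, 0.4]
non_spring_scores = [0.7, 0.5, 0.3, 0.2]
value = auc(spring_scores, non_spring_scores)
```

    An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why the abstract ranks the three models by this value.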

  10. Classification and Regression Trees on Aggregate Data Modeling: An Application in Acute Myocardial Infarction

    Directory of Open Access Journals (Sweden)

    C. Quantin

    2011-01-01

    Full Text Available Cardiologists are interested in determining whether the type of hospital pathway followed by a patient is predictive of survival. The study objective was to determine whether accounting for hospital pathways in the selection of prognostic factors of one-year survival after acute myocardial infarction (AMI) provided a more informative analysis than that obtained by the use of a standard regression tree analysis (CART) method. Information on AMI was collected for 1095 hospitalized patients over an 18-month period. The construction of pathways followed by patients produced symbolic-valued observations requiring a symbolic regression tree analysis. This analysis was compared with the standard CART analysis using patients as statistical units described by standard data, which selected TIMI score as the primary predictor variable. For the 1011 (84, resp.) patients with a lower (higher) TIMI score, the pathway variable did not appear as a diagnostic variable until the third (second) stage of the tree construction. For an ecological analysis, again TIMI score was the first predictor variable. However, in a symbolic regression tree analysis using hospital pathways as statistical units, the type of pathway followed was the key predictor variable, showing in particular that pathways involving early admission to cardiology units produced high one-year survival rates.

  11. Prioritizing Highway Safety Manual's crash prediction variables using boosted regression trees.

    Science.gov (United States)

    Saha, Dibakar; Alluri, Priyanka; Gan, Albert

    2015-06-01

    The Highway Safety Manual (HSM) recommends using the empirical Bayes (EB) method with locally derived calibration factors to predict an agency's safety performance. However, the data needs for deriving these local calibration factors are significant, requiring very detailed roadway characteristics information. Many of the data variables identified in the HSM are currently unavailable in the states' databases. Moreover, the process of collecting and maintaining all the HSM data variables is cost-prohibitive. Prioritization of the variables based on their impact on crash predictions would, therefore, help to identify influential variables for which data could be collected and maintained for continued updates. This study aims to determine the impact of each independent variable identified in the HSM on crash predictions. A relatively recent data mining approach called boosted regression trees (BRT) is used to investigate the association between the variables and crash predictions. The BRT method can effectively handle different types of predictor variables, identify very complex and non-linear association among variables, and compute variable importance. Five years of crash data from 2008 to 2012 on two urban and suburban facility types, two-lane undivided arterials and four-lane divided arterials, were analyzed for estimating the influence of variables on crash predictions. Variables were found to exhibit non-linear and sometimes complex relationship to predicted crash counts. In addition, only a few variables were found to explain most of the variation in the crash data. Published by Elsevier Ltd.

  12. Minimizing the testlet effect: Identifying critical testlet features by means of tree-based regression

    OpenAIRE

    Paap, M.C.S.; Eggen, T.J.H.M.; Veldkamp, B.P.

    2012-01-01

    Standardized tests often group items around a common stimulus. Such groupings of items are called testlets. The potential dependency among items within a testlet is generally ignored in practice, even though a basic assumption of item response theory (IRT) is that individual items are independent of one another. A technique called tree-based regression (TBR) was applied to identify key features of stimuli that could properly predict the dependence structure of testlet data. Knowledge about th...

  13. Development of prognostic indicators using Classification And Regression Trees (CART) for survival

    OpenAIRE

    Nunn, Martha E.; Fan, Juanjuan; Su, Xiaogang; McGuire, Michael K.

    2012-01-01

    The development of an accurate prognosis is an integral component of treatment planning in the practice of periodontics. Prior work has evaluated the validity of using various clinically measured parameters for assigning periodontal prognosis as well as for predicting tooth survival and change in clinical conditions over time. We critically review the application of multivariate Classification And Regression Trees (CART) for survival in developing evidence-based periodontal prognostic indicator...

  14. Identifying the critical factors that influence intraocular pressure using an automated regression tree

    Directory of Open Access Journals (Sweden)

    Nishanee Rampersad

    2017-01-01

    Full Text Available Background: Assessment of intraocular pressure (IOP) is an important test in glaucoma. In addition, anterior segment variables may be useful in screening for glaucoma risk. Studies have investigated the associations between IOP and anterior segment variables using traditional statistical methods. The classification and regression tree (CART) method provides another dimension to detect important variables in a relationship automatically. Aim: To identify the critical factors that influence IOP using a regression tree. Methods: A quantitative cross-sectional research design was used. Anterior segment variables were measured in 700 participants using the iVue100 optical coherence tomographer, Oculus Keratograph and Nidek US-500 ultrasonographer. A Goldmann applanation tonometer was used to measure IOP. Data from only the right eyes were analysed because of high levels of interocular symmetry. A regression tree model was generated with the CART method and Pearson’s correlation coefficients were used to assess the relationships between the ocular variables. Results: The mean IOP for the entire sample was 14.63 mmHg ± 2.40 mmHg. The CART method selected three anterior segment variables in the regression tree model. Central corneal thickness was the most important variable with a cut-off value of 527 µm. The other important variables included average paracentral corneal thickness and axial anterior chamber depth. Corneal thickness measurements increased towards the periphery and were significantly correlated with IOP (r ≥ 0.50, p ≤ 0.001). Conclusion: The CART method identified the anterior segment variables that influenced IOP. Understanding the relationship between IOP and anterior segment variables may help to clinically identify patients with ocular risk factors associated with elevated IOPs.

  15. [Hyperspectral Estimation of Apple Tree Canopy LAI Based on SVM and RF Regression].

    Science.gov (United States)

    Han, Zhao-ying; Zhu, Xi-cun; Fang, Xian-yi; Wang, Zhuo-yuan; Wang, Ling; Zhao, Geng-Xing; Jiang, Yuan-mao

    2016-03-01

    Leaf area index (LAI) is a dynamic index of crop population size. Hyperspectral technology can be used to estimate apple canopy LAI rapidly and nondestructively, and it can provide a reference for monitoring tree growth and estimating yield. Red Fuji apple trees in the full fruit-bearing period were the research objects. The canopy spectral reflectance and LAI values of ninety apple trees were measured with the ASD Fieldspec3 spectrometer and LAI-2200 in thirty orchards over two consecutive years in the Qixia research area of Shandong Province. The optimal vegetation indices were selected by correlation analysis of the original spectral reflectance and vegetation indices. Models for predicting LAI were built with the multivariate regression methods of support vector machine (SVM) and random forest (RF). The new vegetation indices GNDVI527, NDVI676, RVI682, FD-NVI656 and GRVI517 and the two main existing vegetation indices, NDVI670 and NDVI705, are in accordance with LAI. In the RF regression model, the calibration set coefficient of determination (C-R2) of 0.920 and the validation set coefficient of determination (V-R2) of 0.889 are higher than those of the SVM regression model by 0.045 and 0.033, respectively. The root mean square error of the calibration set (C-RMSE) of 0.249 and that of the validation set (V-RMSE) of 0.236 are lower than those of the SVM regression model by 0.054 and 0.058, respectively. The relative analysis errors of the calibration set (C-RPD) and validation set (V-RPD) reached 3.363 and 2.520, higher than those of the SVM regression model by 0.598 and 0.262, respectively. The slopes of the trend lines in the scatterplots of measured versus predicted values for the calibration and validation sets (C-S and V-S) are close to 1. The RF regression model estimates LAI better than the SVM model and can be used to estimate the LAI of Red Fuji apple trees in the full fruit period.

  16. Risk stratification for prognosis in intracerebral hemorrhage: A decision tree model and logistic regression

    Directory of Open Access Journals (Sweden)

    Gang WU

    2016-01-01

    Full Text Available Objective  To analyze the risk factors for prognosis in intracerebral hemorrhage using a decision tree (classification and regression tree, CART) model and a logistic regression model. Methods  CART and logistic regression models were established according to the risk factors for prognosis of patients with cerebral hemorrhage, and the results of the two methods were compared. Results  Logistic regression analyses showed that hematoma volume (OR-value 0.953), initial Glasgow Coma Scale (GCS) score (OR-value 1.210), pulmonary infection (OR-value 0.295), and basal ganglia hemorrhage (OR-value 0.336) were the risk factors for the prognosis of cerebral hemorrhage. The results of CART analysis showed that volume of hematoma and initial GCS score were the main factors affecting the prognosis of cerebral hemorrhage. The effects of the two models on the prognosis of cerebral hemorrhage were similar (Z-value 0.402, P=0.688). Conclusions  The CART model has a similar value to that of the logistic model in judging the prognosis of cerebral hemorrhage; it is characterized by analysis of interactions between the risk factors and is more intuitive. DOI: 10.11855/j.issn.0577-7402.2015.12.13

  17. Hyperspectral Analysis of Soil Nitrogen, Carbon, Carbonate, and Organic Matter Using Regression Trees

    Directory of Open Access Journals (Sweden)

    L. Monika Moskal

    2012-08-01

    Full Text Available The characterization of soil attributes using hyperspectral sensors has revealed patterns in soil spectra that are known to respond to mineral composition, organic matter, soil moisture and particle size distribution. Soil samples from different soil horizons of replicated soil series from sites located within Washington and Oregon were analyzed with the FieldSpec Spectroradiometer to measure their spectral signatures across the electromagnetic range of 400 to 1,000 nm. Similarity rankings of individual soil samples reveal differences between replicate series as well as samples within the same replicate series. Using classification and regression tree statistical methods, regression trees were fitted to each spectral response using concentrations of nitrogen, carbon, carbonate and organic matter as the response variables. Statistics resulting from fitted trees were: nitrogen R2 0.91 (p < 0.01) at 403, 470, 687, and 846 nm spectral band widths, carbonate R2 0.95 (p < 0.01) at 531 and 898 nm band widths, total carbon R2 0.93 (p < 0.01) at 400, 409, 441 and 907 nm band widths, and organic matter R2 0.98 (p < 0.01) at 300, 400, 441, 832 and 907 nm band widths. Use of the 400 to 1,000 nm electromagnetic range utilizing regression trees provided a powerful, rapid, inexpensive and nondestructive method for assessing nitrogen, carbon, carbonate and organic matter for upper soil horizons.
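
    The core step of fitting a regression tree, as used in this record, is choosing a split threshold on a predictor that minimizes the summed squared error of the two resulting groups. The sketch below shows that single step for one predictor, under the assumption of the standard CART least-squares criterion. The function names and the reflectance and concentration values are hypothetical and are not taken from the study.

```python
def sse(values):
    """Sum of squared deviations of the values from their mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(x, y):
    """Return (threshold, total_sse) for the best binary split x <= threshold.

    The threshold is None when no split improves on the unsplit node,
    for example when all predictor values are identical.
    """
    pairs = sorted(zip(x, y))
    best_threshold, best_total = None, sse(list(y))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between equal predictor values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [target for _, target in pairs[:i]]
        right = [target for _, target in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best_total:
            best_threshold, best_total = threshold, total
    return best_threshold, best_total

# Hypothetical reflectance at one band (x) and a soil property (y).
reflectance = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60]
concentration = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]
print(best_split(reflectance, concentration))
```

    A full tree repeats this search over every candidate band and then recursively within each resulting group. Pruning and stopping rules are omitted here.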

  18. Classification and regression tree analysis of acute-on-chronic hepatitis B liver failure: Seeing the forest for the trees.

    Science.gov (United States)

    Shi, K-Q; Zhou, Y-Y; Yan, H-D; Li, H; Wu, F-L; Xie, Y-Y; Braddock, M; Lin, X-Y; Zheng, M-H

    2017-02-01

    At present, there is no ideal model for predicting the short-term outcome of patients with acute-on-chronic hepatitis B liver failure (ACHBLF). This study aimed to establish and validate a prognostic model by using the classification and regression tree (CART) analysis. A total of 1047 patients from two separate medical centres with suspected ACHBLF were screened in the study, which were recognized as derivation cohort and validation cohort, respectively. CART analysis was applied to predict the 3-month mortality of patients with ACHBLF. The accuracy of the CART model was tested using the area under the receiver operating characteristic curve, which was compared with the model for end-stage liver disease (MELD) score and a new logistic regression model. CART analysis identified four variables as prognostic factors of ACHBLF: total bilirubin, age, serum sodium and INR, and three distinct risk groups: low risk (4.2%), intermediate risk (30.2%-53.2%) and high risk (81.4%-96.9%). The new logistic regression model was constructed with four independent factors, including age, total bilirubin, serum sodium and prothrombin activity by multivariate logistic regression analysis. The performances of the CART model (0.896), similar to the logistic regression model (0.914, P=.382), exceeded that of MELD score (0.667, P<.001). The results were confirmed in the validation cohort. We have developed and validated a novel CART model superior to MELD for predicting three-month mortality of patients with ACHBLF. Thus, the CART model could facilitate medical decision-making and provide clinicians with a validated practical bedside tool for ACHBLF risk stratification.

  19. An efficient and extensible approach for compressing phylogenetic trees

    KAUST Repository

    Matthews, Suzanne J

    2011-01-01

    Background: Biologists require new algorithms to efficiently compress and store their large collections of phylogenetic trees. Our previous work showed that TreeZip is a promising approach for compressing phylogenetic trees. In this paper, we extend our TreeZip algorithm by handling trees with weighted branches. Furthermore, by using the compressed TreeZip file as input, we have designed an extensible decompressor that can extract subcollections of trees, compute majority and strict consensus trees, and merge tree collections using set operations such as union, intersection, and set difference. Results: On unweighted phylogenetic trees, TreeZip is able to compress Newick files in excess of 98%. On weighted phylogenetic trees, TreeZip is able to compress a Newick file by at least 73%. TreeZip can be combined with 7zip with little overhead, allowing space savings in excess of 99% (unweighted) and 92% (weighted). Unlike TreeZip, 7zip is not immune to branch rotations, and performs worse as the level of variability in the Newick string representation increases. Finally, since the TreeZip compressed text (TRZ) file contains all the semantic information in a collection of trees, we can easily filter and decompress a subset of trees of interest (such as the set of unique trees), or build the resulting consensus tree in a matter of seconds. We also show the ease with which set operations can be performed on TRZ files, at speeds quicker than those performed on Newick or 7zip compressed Newick files, and without loss of space savings. Conclusions: TreeZip is an efficient approach for compressing large collections of phylogenetic trees. The semantic and compact nature of the TRZ file allows it to be operated upon directly and quickly, without a need to decompress the original Newick file. We believe that TreeZip will be vital for compressing and archiving trees in the biological community. © 2011 Matthews and Williams; licensee BioMed Central Ltd.

  20. Risk assessment of dengue fever in Zhongshan, China: a time-series regression tree analysis.

    Science.gov (United States)

    Liu, K-K; Wang, T; Huang, X-D; Wang, G-L; Xia, Y; Zhang, Y-T; Jing, Q-L; Huang, J-W; Liu, X-X; Lu, J-H; Hu, W-B

    2017-02-01

    Dengue fever (DF) is the most prevalent and rapidly spreading mosquito-borne disease globally. Control of DF is limited by barriers to vector control and integrated management approaches. This study aimed to explore the potential risk factors for autochthonous DF transmission and to estimate the threshold effects of high-order interactions among risk factors. A time-series regression tree model was applied to estimate the hierarchical relationship between reported autochthonous DF cases and the potential risk factors including the timeliness of DF surveillance systems (median time interval between symptom onset date and diagnosis date, MTIOD), mosquito density, imported cases and meteorological factors in Zhongshan, China from 2001 to 2013. We found that MTIOD was the most influential factor in autochthonous DF transmission. Monthly autochthonous DF incidence rate increased by 36·02-fold [relative risk (RR) 36·02, 95% confidence interval (CI) 25·26-46·78, compared to the average DF incidence rate during the study period] when the 2-month lagged moving average of MTIOD was >4·15 days and the 3-month lagged moving average of the mean Breteau Index (BI) was ⩾16·57. If the 2-month lagged moving average MTIOD was between 1·11 and 4·15 days and the monthly maximum diurnal temperature range at a lag of 1 month was <9·6 °C, the monthly mean autochthonous DF incidence rate increased by 14·67-fold (RR 14·67, 95% CI 8·84-20·51, compared to the average DF incidence rate during the study period). This study demonstrates that the timeliness of DF surveillance systems, mosquito density and diurnal temperature range play critical roles in the autochthonous DF transmission in Zhongshan. Better assessment and prediction of the risk of DF transmission is beneficial for establishing scientific strategies for DF early warning surveillance and control.

  1. Landsat 8 six spectral band data and MODIS NDVI data for assessing the optimal regression tree models

    Science.gov (United States)

    Gu, Yingxin; Wylie, Bruce K.; Boyte, Stephen

    2016-01-01

    In this study, we developed a method that identifies an optimal sample data usage strategy and rule numbers that minimize over- and underfitting effects in regression tree mapping models. A LANDFIRE tile (r04c03, located mainly in northeastern Nevada), which is a composite of multiple Landsat 8 scenes for a target date, was selected for the study. To minimize any cloud and bad detection effects in the original Landsat 8 data, the compositing approach used cosine-similarity-combined pixels from multiple observations based on data quality and temporal proximity to a target date. Julian date 212, which yielded relatively low “no data and/or cloudy” pixels, was used as the target date with Landsat 8 observations from days 140–240 in 2013. The 30-m Landsat 8 composited data were then upscaled to 250 m using a spatial averaging method. Six Landsat 8 spectral bands (bands 1–6) at 250-m resolution were used as independent variables for developing the piecewise regression-tree models to predict the 250-m eMODIS NDVI (dependent variable). Furthermore, to ensure the high quality of the derived 250-m Landsat 8 data, and avoid any additional cloud and atmospheric effects, the percentage of 30-m pixels with “0” values within a 250-m pixel was calculated. Only those 250-m pixels with 0% of “0” values (i.e., none of the 30-m pixels within the 250-m pixel has a zero value) were selected to develop the regression-tree model. The 7-day maximum value composites of 250-m MODIS NDVI for the year 2013 were obtained from the USGS expedited MODIS (eMODIS) data archive (https://lta.cr.usgs.gov/emodis). Pixels with bad quality, negative values, clouds, snow cover, and low view angles were filtered out based on the MODIS quality assurance data to ensure high quality eMODIS NDVI data. The 2013 weekly NDVI data were then stacked and temporally smoothed using a weighted least-squares approach to reduce additional atmospheric noise. Temporal smoothing helps to ensure reliable

  2. Regression models for estimating leaf area of seedlings and adult individuals of Neotropical rainforest tree species

    Directory of Open Access Journals (Sweden)

    E. Brito-Rocha

    Full Text Available Abstract Individual leaf area (LA) is a key variable in studies of tree ecophysiology because it directly influences light interception, photosynthesis and evapotranspiration of adult trees and seedlings. We analyzed the leaf dimensions (length – L and width – W) of seedlings and adults of seven Neotropical rainforest tree species (Brosimum rubescens, Manilkara maxima, Pouteria caimito, Pouteria torta, Psidium cattleyanum, Symphonia globulifera and Tabebuia stenocalyx) with the objective of testing the feasibility of single regression models to estimate LA of both adults and seedlings. In southern Bahia, Brazil, a first set of data was collected between March and October 2012. Of the seven species analyzed, only two (P. cattleyanum and T. stenocalyx) had very similar relationships between LW and LA in both ontogenetic stages. For these two species, a second set of data was collected in August 2014, in order to validate the single models encompassing adults and seedlings. Our results show that it is possible to develop models for predicting individual leaf area that encompass different ontogenetic stages for tropical tree species. The development of these models was more dependent on the species than on the differences in leaf size between seedlings and adults.

  3. Approaches to Low Fuel Regression Rate in Hybrid Rocket Engines

    Directory of Open Access Journals (Sweden)

    Dario Pastrone

    2012-01-01

    Full Text Available Hybrid rocket engines are promising propulsion systems which present appealing features such as safety, low cost, and environmental friendliness. On the other hand, certain issues hamper the development hoped for. The present paper discusses approaches addressing improvements to one of the most important among these issues: low fuel regression rate. To highlight the consequence of such an issue and to better understand the concepts proposed, fundamentals are summarized. Two approaches are presented (multiport grain and high mixture ratio) which aim at reducing negative effects without enhancing regression rate. Furthermore, fuel material changes and nonconventional geometries of grain and/or injector are presented as methods to increase fuel regression rate. Although most of these approaches are still at the laboratory or concept scale, many of them are promising.

  4. Application of Logistic Regression Tree Model in Determining Habitat Distribution of Astragalus verus

    Directory of Open Access Journals (Sweden)

    M. Saki

    2013-03-01

    Full Text Available The relationship between plant species and environmental factors has always been a central issue in plant ecology. With the rising power of statistical techniques, geo-statistics and geographic information systems (GIS), the development of predictive habitat distribution models of organisms has rapidly increased in ecology. This study aimed to evaluate the ability of the Logistic Regression Tree model to create a potential habitat map of Astragalus verus. This species produces Tragacanth and has economic value. Stratified random sampling was applied to 100 sites (50 presence, 50 absence) of the given species, and maps of environmental and edaphic factors were produced using Kriging and Inverse Distance Weighting methods in the ArcGIS software for the whole study area. Relationships between species occurrence and environmental factors were determined by the Logistic Regression Tree model and extended to the whole study area. The results indicated that species occurrence has a strong correlation with environmental factors such as mean daily temperature and the clay, EC and organic carbon content of the soil. Species occurrence showed a direct relationship with mean daily temperature, clay and organic carbon, and an inverse relationship with EC. Model accuracy was evaluated both by Cohen’s kappa statistic (κ) and by the area under the Receiver Operating Characteristics curve, based on an independent test data set. Their values (kappa=0.9, AUC of ROC=0.96) indicated the high power of LRT to create potential habitat maps on local scales. This model, therefore, can be applied to recognize potential sites for rangeland reclamation projects.
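
    Cohen's kappa, one of the two accuracy measures named in this record, corrects the observed agreement between predicted and actual classes for the agreement expected by chance. The sketch below assumes the standard unweighted definition, kappa = (p_o - p_e) / (1 - p_e). The presence/absence labels are hypothetical and do not reproduce the study's test set.

```python
def cohens_kappa(actual, predicted):
    """Unweighted Cohen's kappa for two equally long label sequences."""
    n = len(actual)
    labels = set(actual) | set(predicted)
    # Observed agreement: share of sites where prediction matches reality.
    p_observed = sum(1 for a, p in zip(actual, predicted) if a == p) / n
    # Chance agreement: product of marginal label frequencies, summed over labels.
    p_expected = sum(
        (actual.count(label) / n) * (predicted.count(label) / n)
        for label in labels
    )
    # Note: undefined (division by zero) when chance agreement equals 1.
    return (p_observed - p_expected) / (1.0 - p_expected)

# Hypothetical test sites: 1 = species present, 0 = species absent.
actual_sites = [1] * 10 + [0] * 10
predicted_sites = [1] * 9 + [0] + [0] * 9 + [1]
print(cohens_kappa(actual_sites, predicted_sites))
```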

  5. A Multiple Regression Approach to Normalization of Spatiotemporal Gait Features.

    Science.gov (United States)

    Wahid, Ferdous; Begg, Rezaul; Lythgo, Noel; Hass, Chris J; Halgamuge, Saman; Ackland, David C

    2016-04-01

    Normalization of gait data is performed to reduce the effects of intersubject variations due to physical characteristics. This study reports a multiple regression normalization approach for spatiotemporal gait data that takes into account intersubject variations in self-selected walking speed and physical properties including age, height, body mass, and sex. Spatiotemporal gait data including stride length, cadence, stance time, double support time, and stride time were obtained from healthy subjects including 782 children, 71 adults, 29 elderly subjects, and 28 elderly Parkinson's disease (PD) patients. Data were normalized using standard dimensionless equations, a detrending method, and a multiple regression approach. After normalization using dimensionless equations and the detrending method, weak to moderate correlations between walking speed, physical properties, and spatiotemporal gait features were observed (0.01 ... normalization using the multiple regression method reduced these correlations to weak values (|r| ... normalization using dimensionless equations and detrending resulted in significant differences in stride length and double support time of PD patients; however, the multiple regression approach revealed significant differences in these features as well as in cadence, stance time, and stride time. The proposed multiple regression normalization may be useful in machine learning, gait classification, and clinical evaluation of pathological gait patterns.

  6. A Novel Approach for Core Selection in Shared Tree Multicasting

    Directory of Open Access Journals (Sweden)

    Bidyut Gupta

    2014-03-01

    Full Text Available Multicasting is preferred over multiple unicasts from the viewpoint of better utilization of network bandwidth. Multicasting can be done in two different ways: the source-based tree approach and the shared tree approach. Protocols such as Core Based Tree (CBT) and Protocol Independent Multicasting Sparse Mode (PIM-SM) use the shared tree approach. The shared tree approach is preferred over the source-based tree approach because the latter requires construction of a minimum cost tree per source, unlike the single shared tree of the former. The work presented in this paper provides an efficient core selection method for shared tree multicasting. In this work, we have used a new concept known as pseudo diameter for core selection. The presented method selects more than one core to achieve fault tolerance

  7. Variances in the projections, resulting from CLIMEX, Boosted Regression Trees and Random Forests techniques

    Science.gov (United States)

    Shabani, Farzin; Kumar, Lalit; Solhjouy-fard, Samaneh

    2016-05-01

    The aim of this study was to compare and evaluate the capabilities of correlative and mechanistic modeling processes, applied to the projection of future distributions of date palm in novel environments, and to establish a method of minimizing uncertainty in the projections of differing techniques. The study area comprises the Middle Eastern countries. We compared the mechanistic model CLIMEX (CL) with the correlative models MaxEnt (MX), Boosted Regression Trees (BRT), and Random Forests (RF) to project current and future distributions of date palm (Phoenix dactylifera L.). The Global Climate Model (GCM), the CSIRO-Mk3.0 (CS) using the A2 emissions scenario, was selected for making projections. Both indigenous and alien distribution data of the species were utilized in the modeling process. The common areas predicted by MX, BRT, RF, and CL from the CS GCM were extracted and compared to ascertain projection uncertainty levels of each individual technique. The common areas identified by all four modeling techniques were used to produce a map indicating suitable and unsuitable areas for date palm cultivation for Middle Eastern countries, for the present and the year 2100. The four different modeling approaches predict fairly different distributions. Projections from CL were more conservative than from MX. The BRT and RF were the most conservative methods in terms of projections for the current time. The combination of the final CL and MX projections for the present and 2100 provides higher certainty concerning those areas that will become highly suitable for future date palm cultivation. According to the four models, cold, hot, and wet stress, with differences on a regional basis, appear to be the major restrictions on future date palm distribution. The results demonstrate variances in the projections, resulting from different techniques. The assessment and interpretation of model projections requires reservations

  8. Variances in the projections, resulting from CLIMEX, Boosted Regression Trees and Random Forests techniques

    Science.gov (United States)

    Shabani, Farzin; Kumar, Lalit; Solhjouy-fard, Samaneh

    2017-08-01

    The aim of this study was to compare and evaluate the capabilities of correlative and mechanistic modeling processes, applied to the projection of future distributions of date palm in novel environments, and to establish a method of minimizing uncertainty in the projections of differing techniques. The study area comprises the Middle Eastern countries. We compared the mechanistic model CLIMEX (CL) with the correlative models MaxEnt (MX), Boosted Regression Trees (BRT), and Random Forests (RF) to project current and future distributions of date palm (Phoenix dactylifera L.). The Global Climate Model (GCM), the CSIRO-Mk3.0 (CS) using the A2 emissions scenario, was selected for making projections. Both indigenous and alien distribution data of the species were utilized in the modeling process. The common areas predicted by MX, BRT, RF, and CL from the CS GCM were extracted and compared to ascertain projection uncertainty levels of each individual technique. The common areas identified by all four modeling techniques were used to produce a map indicating suitable and unsuitable areas for date palm cultivation for Middle Eastern countries, for the present and the year 2100. The four different modeling approaches predict fairly different distributions. Projections from CL were more conservative than from MX. The BRT and RF were the most conservative methods in terms of projections for the current time. The combination of the final CL and MX projections for the present and 2100 provides higher certainty concerning those areas that will become highly suitable for future date palm cultivation. According to the four models, cold, hot, and wet stress, with differences on a regional basis, appear to be the major restrictions on future date palm distribution. The results demonstrate variances in the projections, resulting from different techniques. The assessment and interpretation of model projections requires reservations

  9. Approaches to Low Fuel Regression Rate in Hybrid Rocket Engines

    OpenAIRE

    Dario Pastrone

    2012-01-01

    Hybrid rocket engines are promising propulsion systems which present appealing features such as safety, low cost, and environmental friendliness. On the other hand, certain issues hamper the development hoped for. The present paper discusses approaches addressing improvements to one of the most important among these issues: low fuel regression rate. To highlight the consequence of such an issue and to better understand the concepts proposed, fundamentals are summarized. Two approaches are pre...

  10. Predictors of sentinel lymph node status in cutaneous melanoma: a classification and regression tree analysis.

    Science.gov (United States)

    Tejera-Vaquerizo, A; Martín-Cuevas, P; Gallego, E; Herrera-Acosta, E; Traves, V; Herrera-Ceballos, E; Nagore, E

    2015-04-01

    The main aim of this study was to identify predictors of sentinel lymph node (SN) metastasis in cutaneous melanoma. This was a retrospective cohort study of 818 patients in 2 tertiary-level hospitals. The primary outcome variable was SN involvement. Independent predictors were identified using multiple logistic regression and a classification and regression tree (CART) analysis. Ulceration, tumor thickness, and a high mitotic rate (≥6 mitoses/mm²) were independently associated with SN metastasis in the multiple regression analysis. The most important predictor in the CART analysis was Breslow thickness. Absence of an inflammatory infiltrate, patient age, and tumor location were predictive of SN metastasis in patients with tumors thicker than 2 mm. In the case of thinner melanomas, the predictors were mitotic rate (>6 mitoses/mm²), presence of ulceration, and tumor thickness. Patient age, mitotic rate, and tumor thickness and location were predictive of survival. A high mitotic rate predicts a higher risk of SN involvement and worse survival. CART analysis improves the prediction of regional metastasis, resulting in better clinical management of melanoma patients. It may also help select suitable candidates for inclusion in clinical trials. Copyright © 2014 Elsevier España, S.L.U. and AEDV. All rights reserved.

  11. Predicting smear negative pulmonary tuberculosis with classification trees and logistic regression: a cross-sectional study

    Directory of Open Access Journals (Sweden)

    Kritski Afrânio

    2006-02-01

    Full Text Available Abstract Background Smear negative pulmonary tuberculosis (SNPT) accounts for 30% of pulmonary tuberculosis cases reported yearly in Brazil. This study aimed to develop a prediction model for SNPT for outpatients in areas with scarce resources. Methods The study enrolled 551 patients with clinical-radiological suspicion of SNPT, in Rio de Janeiro, Brazil. The original data was divided into two equivalent samples for generation and validation of the prediction models. Symptoms, physical signs and chest X-rays were used for constructing logistic regression and classification and regression tree models. From the logistic regression, we generated a clinical and radiological prediction score. The area under the receiver operator characteristic curve, sensitivity, and specificity were used to evaluate the model's performance in both generation and validation samples. Results It was possible to generate predictive models for SNPT with sensitivity ranging from 64% to 71% and specificity ranging from 58% to 76%. Conclusion The results suggest that those models might be useful as screening tools for estimating the risk of SNPT, optimizing the utilization of more expensive tests, and avoiding costs of unnecessary anti-tuberculosis treatment. Those models might be cost-effective tools in a health care network with hierarchical distribution of scarce resources.

  12. Pruning Chinese trees : an experimental and modelling approach

    NARCIS (Netherlands)

    Zeng, Bo

    2002-01-01

    Pruning of trees, in which some branches are removed from the lower crown of a tree, has been extensively used in China in silvicultural management for many purposes. With an experimental and modelling approach, the effects of pruning on tree growth and on the harvest of plant material were studied.

  13. A Regression Approach for Forecasting Vendor Revenue in Telecommunication Industries

    Directory of Open Access Journals (Sweden)

    Aida Mustapha

    2014-12-01

    Full Text Available In many telecommunication companies, the Entrepreneur Development Unit (EDU) is responsible for managing a large group of vendors that hold contracts with the company. This unit assesses the vendors’ performance in terms of revenue and profitability on a yearly basis and uses the information in arranging suitable development trainings. The main challenge faced by this unit, however, is to obtain the annual revenue data from the vendors due to time constraints. This paper presents a regression approach to predict the vendors’ annual revenues based on their previous records so the assessment exercise could be expedited. Three regression methods were investigated: linear regression, sequential minimal optimization algorithm, and M5rules. The results were analysed and discussed.

  14. ESTIMATING PROJECT DEVELOPMENT EFFORT USING CLUSTERED REGRESSION APPROACH

    Directory of Open Access Journals (Sweden)

    Geeta Nagpal

    2013-02-01

    Full Text Available Due to the intangible nature of “software”, accurate and reliable software effort estimation is a challenge in the software industry. It is unrealistic to expect very accurate estimates of software development effort because of the inherent uncertainty in software development projects and the complex and dynamic interaction of factors that impact software development. Heterogeneity exists in the software engineering datasets because data is made available from diverse sources. This can be reduced by defining certain relationships between the data values by classifying them into different clusters. This study focuses on how the combination of clustering and regression techniques can reduce the potential problems in predictive efficiency caused by heterogeneity of the data. Using a clustered approach creates subsets of data having a degree of homogeneity that enhances prediction accuracy. It was also observed in this study that ridge regression performs better than other regression techniques used in the analysis.

  15. Implementing GIS regression trees for generating the spatial distribution of copper in Mediterranean environments

    DEFF Research Database (Denmark)

    Bou Kheir, Rania; Greve, Mogens Humlekrog; Deroin, Jean-Paul

    2013-01-01

    Soil contamination by heavy metals has become a widespread and dangerous problem in many parts of the world, including the Mediterranean environments. This is closely related to the increased irrigation with waste waters, to the uncontrolled application of sewage sludge, industrial effluents, pesticides...... coastal area situated in northern Lebanon using a geographic information system (GIS) and regression-tree analysis. The chosen area represents a typical case study of Mediterranean coastal landscape with deteriorating environment. Fifteen environmental parameters (parent material, soil type, p......H, hydraulic conductivity, organic matter, stoniness ratio, soil depth, slope gradient, slope aspect, slope curvature, land cover/use, distance to drainage line, proximity to roads, nearness to cities, and surroundings to waste areas) were generated from satellite imageries, Digital Elevation Models (DEMs...

  16. Improved predictive mapping of indoor radon concentrations using ensemble regression trees based on automatic clustering of geological units.

    Science.gov (United States)

    Kropat, Georg; Bochud, Francois; Jaboyedoff, Michel; Laedermann, Jean-Pascal; Murith, Christophe; Palacios Gruson, Martha; Baechler, Sébastien

    2015-09-01

    According to estimations around 230 people die as a result of radon exposure in Switzerland. This public health concern makes reliable indoor radon prediction and mapping methods necessary in order to improve risk communication to the public. The aim of this study was to develop an automated method to classify lithological units according to their radon characteristics and to develop mapping and predictive tools in order to improve local radon prediction. About 240 000 indoor radon concentration (IRC) measurements in about 150 000 buildings were available for our analysis. The automated classification of lithological units was based on k-medoids clustering via pair-wise Kolmogorov distances between IRC distributions of lithological units. For IRC mapping and prediction we used random forests and Bayesian additive regression trees (BART). The automated classification groups lithological units well in terms of their IRC characteristics. Especially the IRC differences in metamorphic rocks like gneiss are well revealed by this method. The maps produced by random forests soundly represent the regional difference of IRCs in Switzerland and improve the spatial detail compared to existing approaches. We could explain 33% of the variations in IRC data with random forests. Additionally, the influence of a variable evaluated by random forests shows that building characteristics are less important predictors for IRCs than spatial/geological influences. BART could explain 29% of IRC variability and produced maps that indicate the prediction uncertainty. Ensemble regression trees are a powerful tool to model and understand the multidimensional influences on IRCs. Automatic clustering of lithological units complements this method by facilitating the interpretation of radon properties of rock types. This study provides an important element for radon risk communication. Future approaches should consider taking into account further variables like soil gas radon measurements as

  17. Prediction of cannabis and cocaine use in adolescence using decision trees and logistic regression

    Directory of Open Access Journals (Sweden)

    Alfonso L. Palmer

    2010-01-01

    Full Text Available Spain is one of the European countries with the highest prevalence of cannabis and cocaine use among young people. The aim of this study was to investigate the factors related to the consumption of cocaine and cannabis among adolescents. A questionnaire was administered to 9,284 students between 14 and 18 years of age in Palma de Mallorca (47.1% boys and 52.9% girls), whose mean age was 15.59 years. Logistic regression and decision trees were used to model the consumption of cannabis and cocaine. The results show that the use of legal substances and committing fraudulence or theft are the main variables that raise the odds of consuming cannabis. In boys, cannabis consumption and a family history of drug use increase the odds of consuming cocaine, whereas in girls the use of alcohol, behaviours of fraudulence or theft and difficulty in some personal skills influence their odds of consuming cocaine. Finally, ease of access to the substance greatly raises the odds of consuming cocaine and cannabis in both genders. Decision trees highlight the role of consuming other substances and committing fraudulence or theft. The results of this study gain importance when it comes to putting into practice effective prevention programmes.

  18. Analysis of Daytime and Nighttime Ground Level Ozone Concentrations Using Boosted Regression Tree Technique

    Directory of Open Access Journals (Sweden)

    Noor Zaitun Yahaya

    2017-01-01

    Full Text Available This paper investigated the use of boosted regression trees (BRTs) to draw inferences about daytime and nighttime ozone formation in a coastal environment. Hourly ground-level ozone data for the full calendar year 2010 were obtained from the Kemaman (CA 002) air quality monitoring station. A BRT model was developed using hourly ozone data as the response variable and nitric oxide (NO), nitrogen dioxide (NO2), nitrogen oxides (NOx) and meteorological parameters as explanatory variables. The ozone BRT algorithm model was constructed from multiple regression models, and the 'best iteration' of the BRT model was performed by optimizing prediction performance. Sensitivity testing of the BRT model was conducted to determine the best parameters and good explanatory variables. A number of trees between 2,500 and 3,500, a learning rate of 0.01, and an interaction depth of 5 were found to be the best settings for developing the ozone boosting model. The performance of the O3 boosting models was assessed: the fraction of predictions within a factor of two (FAC2), the coefficient of determination (R2) and the index of agreement (IOA) were 0.93, 0.69 and 0.73 for daytime and 0.79, 0.55 and 0.69 for nighttime, respectively. Results showed that the model developed was within the acceptable range and could be used to understand ozone formation and identify potential sources of ozone for estimating O3 concentrations during daytime and nighttime. Results indicated that wind speed, wind direction, relative humidity, and temperature were the most dominant variables influencing ozone formation. Finally, empirical evidence was obtained that high ozone levels are produced by wind blowing from coastal areas towards the interior region, especially from industrial areas.
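
The three evaluation scores quoted in this abstract can be computed as in the following minimal sketch. It assumes R2 is the squared Pearson correlation and IOA is Willmott's original index of agreement; the paper may use other variants, and the function names are illustrative only.

```python
def fac2(obs, pred):
    """Fraction of predictions within a factor of two of the observations."""
    # Observations of zero are skipped because the ratio is undefined there.
    inside = [0.5 <= p / o <= 2.0 for o, p in zip(obs, pred) if o > 0]
    return sum(inside) / len(inside)


def r_squared(obs, pred):
    """Squared Pearson correlation between observations and predictions."""
    n = len(obs)
    mean_o, mean_p = sum(obs) / n, sum(pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    return cov ** 2 / (var_o * var_p)


def index_of_agreement(obs, pred):
    """Willmott's index of agreement (assumed variant)."""
    mean_o = sum(obs) / len(obs)
    num = sum((p - o) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - mean_o) + abs(o - mean_o)) ** 2 for o, p in zip(obs, pred))
    return 1.0 - num / den
```

All three scores equal 1 for a perfect model; FAC2 only checks the ratio of predicted to observed values, so it can stay high even when R2 is moderate.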

  19. Estimating biomass of mixed and uneven-aged forests using spectral data and a hybrid model combining regression trees and linear models

    Directory of Open Access Journals (Sweden)

    López-Serrano PM

    2016-04-01

    Full Text Available The Sierra Madre Occidental mountain range (Durango, Mexico) is of great ecological interest because of the high degree of environmental heterogeneity in the area. The objective of the present study was to estimate the biomass of mixed and uneven-aged forests in the Sierra Madre Occidental by using Landsat-5 TM spectral data and forest inventory data. We used the ATCOR3® atmospheric and topographic correction module to convert remotely sensed imagery digital signals to surface reflectance values. The usual approach of modeling stand variables by using multiple linear regression was compared with a hybrid model developed in two steps: in the first step a regression tree was used to obtain an initial classification of homogeneous biomass groups, and multiple linear regression models were then fitted to each node of the pruned regression tree. Cross-validation of the hybrid model explained 72.96% of the observed stand biomass variation, with a reduction in the RMSE of 25.47% with respect to the estimates yielded by the linear model fitted to the complete database. The most important variables for the binary classification process in the regression tree were the albedo, the corrected readings of the short-wave infrared band of the satellite (2.08-2.35 µm) and the topographic moisture index. We used the model output to construct a map for estimating biomass in the study area, which yielded values of between 51 and 235 Mg ha⁻¹. The use of regression trees in combination with stepwise regression of corrected satellite imagery proved a reliable method for estimating forest biomass.

  20. Building optimal regression tree by ant colony system-genetic algorithm: Application to modeling of melting points

    Energy Technology Data Exchange (ETDEWEB)

    Hemmateenejad, Bahram, E-mail: hemmatb@sums.ac.ir [Department of Chemistry, Shiraz University, Shiraz (Iran, Islamic Republic of); Medicinal and Natural Products Chemistry Research Center, Shiraz University of Medical Sciences, Shiraz (Iran, Islamic Republic of); Shamsipur, Mojtaba [Department of Chemistry, Razi University, Kermanshah (Iran, Islamic Republic of); Zare-Shahabadi, Vali [Young Researchers Club, Mahshahr Branch, Islamic Azad University, Mahshahr (Iran, Islamic Republic of); Akhond, Morteza [Department of Chemistry, Shiraz University, Shiraz (Iran, Islamic Republic of)

    2011-10-17

    Highlights:
    - Ant colony systems help to build optimum classification and regression trees.
    - Use of genetic algorithm operators in ant colony systems resulted in more appropriate models.
    - Variable selection in each terminal node of the tree gives promising results.
    - CART-ACS-GA could model the melting point of organic materials with prediction errors lower than previous models.

    Abstract: Classification and regression trees (CART) have the advantage of being able to handle large data sets and yield readily interpretable models. A conventional method of building a regression tree is recursive partitioning, which results in a good but not optimal tree. Ant colony system (ACS), a meta-heuristic algorithm derived from the observation of real ants, can be used to overcome this problem. The purpose of this study was to explore the use of CART and its combination with ACS for modeling the melting points of a large variety of chemical compounds. Genetic algorithm (GA) operators (e.g., crossover and mutation operators) were combined with the ACS algorithm to select the best solution model. In addition, at each terminal node of the resulting tree, variable selection was done by the ACS-GA algorithm to build an appropriate partial least squares (PLS) model. To test the ability of the resulting tree, a set of approximately 4173 structures and their melting points was used (3000 compounds as training set and 1173 as validation set). Further, an external test set consisting of 277 drugs was used to validate the prediction ability of the tree. Comparison of the results obtained from both trees showed that the tree constructed by the ACS-GA algorithm performs better than that produced by the recursive partitioning procedure.

  1. Analysing inequalities in Germany a structured additive distributional regression approach

    CERN Document Server

    Silbersdorff, Alexander

    2017-01-01

    This book seeks new perspectives on the growing inequalities that our societies face, putting forward Structured Additive Distributional Regression as a means of statistical analysis that circumvents the common problem of analytical reduction to simple point estimators. This new approach allows the observed discrepancy between the individuals’ realities and the abstract representation of those realities to be explicitly taken into consideration. In turn, the method is applied to the question of economic inequality in Germany.

  2. Response of the regression tree model to high resolution remote sensing data for predicting percent tree cover in a Mediterranean ecosystem.

    Science.gov (United States)

    Donmez, Cenk; Berberoglu, Suha; Erdogan, Mehmet Akif; Tanriover, Anil Akin; Cilek, Ahmet

    2015-02-01

    Percent tree cover is the percentage of the ground surface area covered by a vertical projection of the outermost perimeter of the plants. It is an important indicator to reveal the condition of forest systems and has a significant importance for ecosystem models as a main input. The aim of this study is to estimate the percent tree cover of various forest stands in a Mediterranean environment based on an empirical relationship between tree coverage and remotely sensed data in Goksu Watershed located at the Eastern Mediterranean coast of Turkey. A regression tree algorithm was used to simulate spatial fractions of Pinus nigra, Cedrus libani, Pinus brutia, Juniperus excelsa and Quercus cerris using multi-temporal LANDSAT TM/ETM data as predictor variables and land cover information. Two scenes of high resolution GeoEye-1 images were employed for training and testing the model. The predictor variables were incorporated in addition to biophysical variables estimated from the LANDSAT TM/ETM data. Additionally, normalised difference vegetation index (NDVI) was incorporated to LANDSAT TM/ETM band settings as a biophysical variable. Stepwise linear regression (SLR) was applied for selecting the relevant bands to employ in regression tree process. SLR-selected variables produced accurate results in the model with a high correlation coefficient of 0.80. The output values ranged from 0 to 100 %. The different tree species were mapped in 30 m resolution in respect to elevation. Percent tree cover map as a final output was derived using LANDSAT TM/ETM image over Goksu Watershed and the biophysical variables. The results were tested using high spatial resolution GeoEye-1 images. Thus, the combination of the RT algorithm and higher resolution data for percent tree cover mapping were tested and examined in a complex Mediterranean environment.

  3. A New Approach in Regression Analysis for Modeling Adsorption Isotherms

    Directory of Open Access Journals (Sweden)

    Dana D. Marković

    2014-01-01

    Full Text Available Numerous regression approaches to isotherm parameter estimation appear in the literature. Real insight into the proper modeling pattern can be achieved only by testing methods on a very large number of cases. This cannot be done experimentally in a reasonable time, so Monte Carlo simulation was applied. The objective of this paper is to introduce and compare numerical approaches that involve different levels of knowledge about the noise structure of the analytical method used for initial and equilibrium concentration determination. Six levels of homoscedastic noise and five types of heteroscedastic noise precision models were considered. Performance of the methods was statistically evaluated based on median percentage error and mean absolute relative error in parameter estimates. The present study showed a clear distinction between two cases. When equilibrium experiments are performed only once, the winning error function for the homoscedastic case is ordinary least squares, while for heteroscedastic noise the use of orthogonal distance regression or Marquardt’s percent standard deviation is suggested. When experiments are repeated three times, the simple method of weighted least squares performed as well as the more complicated orthogonal distance regression method.
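
The difference between two of the error functions named in this abstract can be sketched as follows. The sketch assumes the usual textbook form of Marquardt's percent standard deviation (relative residuals, scaled by the degrees of freedom); the paper's exact definitions may differ, and the function and argument names are illustrative.

```python
from math import sqrt


def sum_squared_errors(q_meas, q_calc):
    """Ordinary least squares objective: absolute residuals, equal weights."""
    return sum((m - c) ** 2 for m, c in zip(q_meas, q_calc))


def mpsd(q_meas, q_calc, n_params):
    """Marquardt's percent standard deviation (assumed form).

    Residuals are divided by the measured value, so small and large
    measurements contribute on the same relative scale.
    """
    n = len(q_meas)
    rel = sum(((m - c) / m) ** 2 for m, c in zip(q_meas, q_calc))
    return 100.0 * sqrt(rel / (n - n_params))
```

Because the ordinary least squares objective is dominated by the largest measurements, it suits noise of constant size, whereas a relative measure such as `mpsd` is the more natural choice when the noise grows with the measured value.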

  4. Predictive model of biliocystic communication in liver hydatid cysts using classification and regression tree analysis

    Directory of Open Access Journals (Sweden)

    Souadka Amine

    2010-04-01

    Full Text Available Abstract Background: The incidence of liver hydatid cyst (LHC) rupture ranges from 15% to 40% of all cases, and most ruptures involve the biliary tree. Patients with biliocystic communication (BCC) have specific clinical and therapeutic features. The purpose of this study was to determine which patients with LHC may develop BCC using classification and regression tree (CART) analysis. Methods: A retrospective study of 672 patients with liver hydatid cyst treated at the surgery department "A" at Ibn Sina University Hospital, Rabat, Morocco. Fourteen risk factors for BCC occurrence were entered into CART analysis to build an algorithm that best predicts the occurrence of BCC. Results: The incidence of BCC was 24.5%. High-risk subgroups were patients with jaundice and thick pericyst (risk of 73.2%) and patients with thick pericyst, no jaundice, aged 36.5 years or younger, and no past history of LHC (risk of 40.5%). The CART model had a sensitivity of 39.6%, a specificity of 93.3%, a positive predictive value of 65.6%, a negative predictive value of 82.6% and an overall classification accuracy of 80.1%. The discriminating ability of the model was good (82%). Conclusion: We developed a simple classification tool to identify LHC patients at high risk of BCC during a routine clinic visit (based only on clinical history and examination followed by ultrasonography). Predictive factors were based on pericyst aspect, jaundice, age, past history of liver hydatidosis and morphological Gharbi cyst aspect. We think that this classification can be useful for directing patients to appropriate medical structures.
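
The performance measures reported in this abstract all derive from a 2x2 confusion table. The following minimal sketch shows the arithmetic; the counts are hypothetical round numbers chosen to show a similar pattern (low sensitivity, high specificity) and are not the study's data.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Standard measures from a 2x2 table of predicted vs. actual status."""
    return {
        "sensitivity": tp / (tp + fn),  # true cases that were flagged
        "specificity": tn / (tn + fp),  # non-cases that were cleared
        "ppv": tp / (tp + fp),          # flagged patients who are true cases
        "npv": tn / (tn + fn),          # cleared patients who are non-cases
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }


# Hypothetical counts for illustration only.
metrics = diagnostic_metrics(tp=40, fn=60, fp=20, tn=280)
```

With these counts the tool misses most true cases but rarely raises a false alarm, which is why accuracy alone can look favourable when the condition is uncommon.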

  5. Weighing risk factors associated with bee colony collapse disorder by classification and regression tree analysis.

    Science.gov (United States)

    VanEngelsdorp, Dennis; Speybroeck, Niko; Evans, Jay D; Nguyen, Bach Kim; Mullin, Chris; Frazier, Maryann; Frazier, Jim; Cox-Foster, Diana; Chen, Yanping; Tarpy, David R; Haubruge, Eric; Pettis, Jeffrey S; Saegerman, Claude

    2010-10-01

    Colony collapse disorder (CCD), a syndrome whose defining trait is the rapid loss of adult worker honey bees, Apis mellifera L., is thought to be responsible for a minority of the large overwintering losses experienced by U.S. beekeepers since the winter 2006-2007. Using the same data set developed to perform a monofactorial analysis (PloS ONE 4: e6481, 2009), we conducted a classification and regression tree (CART) analysis in an attempt to better understand the relative importance and interrelations among different risk variables in explaining CCD. Fifty-five exploratory variables were used to construct two CART models: one model with and one model without a cost of misclassifying a CCD-diagnosed colony as a non-CCD colony. The resulting model tree that permitted for misclassification had a sensitivity and specificity of 85 and 74%, respectively. Although factors measuring colony stress (e.g., adult bee physiological measures, such as fluctuating asymmetry or mass of head) were important discriminating values, six of the 19 variables having the greatest discriminatory value were pesticide levels in different hive matrices. Notably, coumaphos levels in brood (a miticide commonly used by beekeepers) had the highest discriminatory value and were highest in control (healthy) colonies. Our CART analysis provides evidence that CCD is probably the result of several factors acting in concert, making afflicted colonies more susceptible to disease. This analysis highlights several areas that warrant further attention, including the effect of sublethal pesticide exposure on pathogen prevalence and the role of variability in bee tolerance to pesticides on colony survivorship.

  6. Using boosted regression trees to predict the near-saturated hydraulic conductivity of undisturbed soils

    Science.gov (United States)

    Koestel, John; Bechtold, Michel; Jorda, Helena; Jarvis, Nicholas

    2015-04-01

    The saturated and near-saturated hydraulic conductivity of soil is of key importance for modelling water and solute fluxes in the vadose zone. Hydraulic conductivity measurements are cumbersome at the Darcy scale and practically impossible at larger scales where water and solute transport models are mostly applied. Hydraulic conductivity must therefore be estimated from proxy variables. Such pedotransfer functions are known to work decently well for e.g. water retention curves but rather poorly for near-saturated and saturated hydraulic conductivities. Recently, Weynants et al. (2009, Revisiting Vereecken pedotransfer functions: Introducing a closed-form hydraulic model. Vadose Zone Journal, 8, 86-95) reported a coefficient of determination of 0.25 (validation with an independent data set) for the saturated hydraulic conductivity from lab-measurements of Belgian soil samples. In our study, we trained boosted regression trees on a global meta-database containing tension-disk infiltrometer data (see Jarvis et al. 2013. Influence of soil, land use and climatic factors on the hydraulic conductivity of soil. Hydrology & Earth System Sciences, 17, 5185-5195) to predict the saturated hydraulic conductivity (Ks) and the conductivity at a tension of 10 cm (K10). We found coefficients of determination of 0.39 and 0.62 under a simple 10-fold cross-validation for Ks and K10. When carrying out the validation folded over the data-sources, i.e. the source publications, we found that the corresponding coefficients of determination reduced to 0.15 and 0.36, respectively. We conclude that the stricter source-wise cross-validation should be applied in future pedotransfer studies to prevent overly optimistic validation results. The boosted regression trees also allowed for an investigation of relevant predictors for estimating the near-saturated hydraulic conductivity. We found that land use and bulk density were most important to predict Ks. We also observed that Ks is large in fine
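
The source-wise cross-validation recommended in this abstract can be sketched as a fold generator that keeps every sample from one source publication on the same side of the split. This is a minimal illustration under that assumption, not the authors' procedure; the function name and the source labels are hypothetical.

```python
from collections import defaultdict


def source_wise_folds(sources, n_folds):
    """Yield (train_idx, test_idx) so that no source appears on both sides."""
    by_source = defaultdict(list)
    for i, s in enumerate(sources):
        by_source[s].append(i)
    # Assign whole sources to folds, largest first, always to the smallest fold.
    folds = [[] for _ in range(n_folds)]
    for s in sorted(by_source, key=lambda s: -len(by_source[s])):
        min(folds, key=len).extend(by_source[s])
    for k in range(n_folds):
        test = sorted(folds[k])
        train = sorted(i for j, f in enumerate(folds) if j != k for i in f)
        yield train, test


# Hypothetical source label for each sample in a meta-database.
sources = ["A", "A", "B", "B", "B", "C", "D", "D"]
splits = list(source_wise_folds(sources, 3))
```

In a simple random split, samples from one publication can land in both the training and the test set, so the model is partly tested on conditions it has already seen; grouping by source removes that overlap. If there are fewer sources than folds, some folds stay empty, so `n_folds` should not exceed the number of sources.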

  7. Artificial Neural Network (ANN) and Regression Tree (CART) applications for the indirect estimation of unsaturated soil shear strength parameters

    Science.gov (United States)

    Kanungo, D. P.; Sharma, Shaifaly; Pain, Anindya

    2014-09-01

    The shear strength parameters of soil (cohesion and angle of internal friction) are quite essential in solving many civil engineering problems. In order to determine these parameters, laboratory tests are used. The main objective of this work is to evaluate the potential of Artificial Neural Network (ANN) and Regression Tree (CART) techniques for the indirect estimation of these parameters. Four different models, considering different combinations of 6 inputs, such as gravel %, sand %, silt %, clay %, dry density, and plasticity index, were investigated to evaluate the degree of their effects on the prediction of shear parameters. A performance evaluation was carried out using Correlation Coefficient and Root Mean Squared Error measures. It was observed that for the prediction of friction angle, the performance of both the techniques is about the same. However, for the prediction of cohesion, the ANN technique performs better than the CART technique. It was further observed that the model considering all of the 6 input soil parameters is the most appropriate model for the prediction of shear parameters. Also, connection weight and bias analyses of the best neural network (i.e., 6/2/2) were attempted using Connection Weight, Garson, and proposed Weight-bias approaches to characterize the influence of input variables on shear strength parameters. It was observed that the Connection Weight Approach provides the best overall methodology for accurately quantifying variable importance, and should be favored over the other approaches examined in this study.

  8. Risk assessment of dental caries by using Classification and Regression Trees.

    Science.gov (United States)

    Ito, Ataru; Hayashi, Mikako; Hamasaki, Toshimitsu; Ebisu, Shigeyuki

    2011-06-01

    Being able to predict an individual's risks of dental caries would offer a potentially huge natural step forward toward better oral health. As things stand, preventive treatment against caries is mostly carried out without risk assessment because there is no proven way to analyse an individual's risk factors. The purpose of this study was to try to identify those patients with high and low risk of caries by using Classification and Regression Trees (CART). In this historical cohort study, data from 442 patients in a general practice who met the inclusion criteria were analysed. CART was applied to the data to seek a model for predicting caries by using the following parameters for each patient: age, number of carious teeth, numbers of cariogenic bacteria, the secretion rate and buffer capacity of saliva, and compliance with a prevention programme. The risks of caries were presented as odds ratios. Multiple logistic regression analysis was performed to confirm the results obtained by CART. CART identified high and low risk patients for primary caries with relative odds ratios of 0.41 (95%CI: 0.22-0.77, p = 0.0055) and 2.88 (95%CI: 1.49-5.59, p = 0.0018) according to the numbers of cariogenic bacteria. High and low risk patients for secondary caries were also identified with the odds ratios of 0.07 (95%CI: 0.01-0.55, p = 0.00109) and 7.00 (95%CI: 3.50-13.98, p caries. Cariogenic bacteria play a leading role in the incidence of caries. CART proved effective in identifying an individual patient's risk of caries. Copyright © 2011 Elsevier Ltd. All rights reserved.
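
Odds ratios with 95% confidence intervals of the kind reported in this abstract can be computed from a 2x2 table as in the following minimal sketch. It assumes the common Wald interval on the log scale; the study may have used another method, and the example counts are hypothetical.

```python
from math import exp, log, sqrt


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for the table [[a, b], [c, d]].

    a, b: subjects with and without the outcome in the group of interest
    c, d: subjects with and without the outcome in the comparison group
    """
    odds_ratio = (a * d) / (b * c)
    se_log = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * se_log)
    upper = exp(log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts for illustration only.
estimate, lower, upper = odds_ratio_ci(20, 10, 10, 20)
```

An interval that excludes 1 corresponds to an association that is significant at about the 5% level, which is how the intervals in the abstract can be read.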

  9. Differential Diagnosis of Erythmato-Squamous Diseases Using Classification and Regression Tree

    Science.gov (United States)

    Maghooli, Keivan; Langarizadeh, Mostafa; Shahmoradi, Leila; Habibi-koolaee, Mahdi; Jebraeily, Mohamad; Bouraghi, Hamid

    2016-01-01

    Introduction: Differential diagnosis of Erythemato-Squamous Diseases (ESD) is a major challenge in the field of dermatology. The ESD are placed into six different classes. Data mining is the process of detecting hidden patterns. In the case of ESD, data mining helps us to predict the diseases. Different algorithms have been developed for this purpose. Objective: We aimed to use the Classification and Regression Tree (CART) to predict the differential diagnosis of ESD. Methods: We used the Cross Industry Standard Process for Data Mining (CRISP-DM) methodology. For this purpose, the dermatology data set was obtained from the UCI machine learning repository. The Clementine 12.0 software from IBM was used for modelling. To evaluate the model, we calculated its accuracy, sensitivity and specificity. Results: The proposed model had an accuracy of 94.84% (standard deviation: 24.42) in correctly predicting the ESD. Conclusions: The results indicated that this classifier could be useful. However, a combination of machine learning methods is strongly recommended, as it could be more useful for the prediction of ESD. PMID:28077889

  10. Automated Detection of Connective Tissue by Tissue Counter Analysis and Classification and Regression Trees

    Directory of Open Access Journals (Sweden)

    Josef Smolle

    2001-01-01

    Full Text Available Objective: To evaluate the feasibility of the CART (Classification and Regression Tree) procedure for the recognition of microscopic structures in tissue counter analysis. Methods: Digital microscopic images of H&E stained slides of normal human skin and of primary malignant melanoma were overlaid with regularly distributed square measuring masks (elements), and grey value, texture and colour features within each mask were recorded. In the learning set, elements were interactively labeled as representing either connective tissue of the reticular dermis, other tissue components or background. Subsequently, CART models were based on these data sets. Results: Implementation of the CART classification rules into the image analysis program showed that in an independent test set 94.1% of elements classified as connective tissue of the reticular dermis were correctly labeled. Automated measurements of the total amount of tissue and of the amount of connective tissue within a slide showed high reproducibility (r=0.97 and r=0.94, respectively; p < 0.001). Conclusions: The CART procedure in tissue counter analysis yields simple and reproducible classification rules for tissue elements.

  11. Matching in Vitro Bioaccessibility of Polyphenols and Antioxidant Capacity of Soluble Coffee by Boosted Regression Trees.

    Science.gov (United States)

    Podio, Natalia S; López-Froilán, Rebeca; Ramirez-Moreno, Esther; Bertrand, Lidwina; Baroni, María V; Pérez-Rodríguez, María L; Sánchez-Mata, María-Cortes; Wunderlin, Daniel A

    2015-11-01

    The aim of this study was to evaluate changes in polyphenol profile and antioxidant capacity of five soluble coffees throughout a simulated gastro-intestinal digestion, including absorption through a dialysis membrane. Our results demonstrate that both polyphenol content and antioxidant capacity were characteristic for each type of studied coffee, showing a drop after dialysis. Twenty-seven compounds were identified in coffee by HPLC-MS, while only 14 of them were found after dialysis. Green+roasted coffee blend and chicory+coffee blend showed the highest and lowest content of polyphenols and antioxidant capacity before in vitro digestion and after dialysis, respectively. Canonical correlation analysis showed significant correlation between the antioxidant capacity and the polyphenol profile before digestion and after dialysis. Furthermore, boosted regression trees analysis (BRT) showed that only four polyphenol compounds (5-p-coumaroylquinic acid, quinic acid, coumaroyl tryptophan conjugated, and 5-O-caffeoylquinic acid) appear to be the most relevant to explain the antioxidant capacity after dialysis, these compounds being the most bioaccessible after dialysis. To our knowledge, this is the first report matching the antioxidant capacity of foods with the polyphenol profile by BRT, which opens an interesting method of analysis for future reports on the antioxidant capacity of foods.

  12. Estimating carbon and showing impacts of drought using satellite data in regression-tree models

    Science.gov (United States)

    Boyte, Stephen; Wylie, Bruce K.; Howard, Danny; Dahal, Devendra; Gilmanov, Tagir G.

    2018-01-01

    Integrating spatially explicit biogeophysical and remotely sensed data into regression-tree models enables the spatial extrapolation of training data over large geographic spaces, allowing a better understanding of broad-scale ecosystem processes. The current study presents annual gross primary production (GPP) and annual ecosystem respiration (RE) for 2000–2013 in several short-statured vegetation types using carbon flux data from towers that are located strategically across the conterminous United States (CONUS). We calculate carbon fluxes (annual net ecosystem production [NEP]) for each year in our study period, which includes 2012 when drought and higher-than-normal temperatures influence vegetation productivity in large parts of the study area. We present and analyse carbon flux dynamics in the CONUS to better understand how drought affects GPP, RE, and NEP. Model accuracy metrics show strong correlation coefficients (r) (r ≥ 94%) between training and estimated data for both GPP and RE. Overall, average annual GPP, RE, and NEP are relatively constant throughout the study period except during 2012 when almost 60% less carbon is sequestered than normal. These results allow us to conclude that this modelling method effectively estimates carbon dynamics through time and allows the exploration of impacts of meteorological anomalies and vegetation types on carbon dynamics.

  13. Prediction of cadmium enrichment in reclaimed coastal soils by classification and regression tree

    Science.gov (United States)

    Ru, Feng; Yin, Aijing; Jin, Jiaxin; Zhang, Xiuying; Yang, Xiaohui; Zhang, Ming; Gao, Chao

    2016-08-01

    Reclamation of coastal land is one of the most common ways to obtain land resources in China. However, it has long been acknowledged that the artificial interference with coastal land has disadvantageous effects, such as heavy metal contamination. This study aimed to develop a prediction model for cadmium enrichment levels and assess the importance of affecting factors in typical reclaimed land in Eastern China (DFCL: Dafeng Coastal Land). Two hundred and twenty seven surficial soil/sediment samples were collected and analyzed to identify the enrichment levels of cadmium and the possible affecting factors in soils and sediments. The classification and regression tree (CART) model was applied in this study to predict cadmium enrichment levels. The prediction results showed that cadmium enrichment levels assessed by the CART model had an accuracy of 78.0%. The CART model could extract more information on factors affecting the environmental behavior of cadmium than correlation analysis. The integration of correlation analysis and the CART model showed that fertilizer application and organic carbon accumulation were the most important factors affecting soil/sediment cadmium enrichment levels, followed by particle size effects (Al2O3, TFe2O3 and SiO2), contents of Cl and S, surrounding construction areas and reclamation history.

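  14. KLASIFIKASI KARAKTERISTIK KECELAKAAN LALU LINTAS DI KOTA DENPASAR DENGAN PENDEKATAN CLASSIFICATION AND REGRESSION TREES (CART) [Classification of Traffic Accident Characteristics in Denpasar City Using the Classification and Regression Trees (CART) Approach]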

    Directory of Open Access Journals (Sweden)

    I GEDE AGUS JIWADIANA

    2015-11-01

    Full Text Available The aim of this research is to classify the characteristics of traffic accidents in Denpasar city in January-July 2014 using Classification and Regression Trees (CART), and to determine which explanatory variable is the main classifier of the CART. The results showed that the optimum CART generated three terminal nodes. In the first terminal node, 12 people were classified as having heavy traffic accident characteristics with a single accident. In the second terminal node, 68 people were classified as having minor traffic accident characteristics, by type of traffic accident (front-rear, front-front, front-side, pedestrian, side-side) and accident location on district and sub-district roads. In the third terminal node, 291 people were classified as having medium traffic accident characteristics, by type of traffic accident (front-rear, front-front, front-side, pedestrian, side-side) and accident location on municipality roads. The explanatory variable that serves as the main splitter of the CART is the type of traffic accident, with a maximum homogeneity measure of 0.03252.

  15. Identification of Sexually Abused Female Adolescents at Risk for Suicidal Ideations: A Classification and Regression Tree Analysis

    Science.gov (United States)

    Brabant, Marie-Eve; Hebert, Martine; Chagnon, Francois

    2013-01-01

    This study explored the clinical profiles of 77 female teenager survivors of sexual abuse and examined the association of abuse-related and personal variables with suicidal ideations. Analyses revealed that 64% of participants experienced suicidal ideations. Findings from classification and regression tree analysis indicated that depression,…

  16. A Visual Analytics Approach for Correlation, Classification, and Regression Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Steed, Chad A [ORNL]; Swan II, J. Edward [Mississippi State University (MSU)]; Fitzpatrick, Patrick J. [Mississippi State University (MSU)]; Jankun-Kelly, T.J. [Mississippi State University (MSU)]

    2012-02-01

    New approaches that combine the strengths of humans and machines are necessary to equip analysts with the proper tools for exploring today's increasingly complex, multivariate data sets. In this paper, a novel visual data mining framework, called the Multidimensional Data eXplorer (MDX), is described that addresses the challenges of today's data by combining automated statistical analytics with a highly interactive parallel coordinates based canvas. In addition to several intuitive interaction capabilities, this framework offers a rich set of graphical statistical indicators, interactive regression analysis, visual correlation mining, automated axis arrangements and filtering, and data classification techniques. The current work provides a detailed description of the system as well as a discussion of key design aspects and critical feedback from domain experts.

  17. A Practical pedestrian approach to parsimonious regression with inaccurate inputs

    Directory of Open Access Journals (Sweden)

    Seppo Karrila

    2014-04-01

    Full Text Available A measurement result often dictates an interval containing the correct value. Interval data is also created by roundoff, truncation, and binning. We focus on such common interval uncertainty in data. Inaccuracy in model inputs is typically ignored in model fitting. We provide a practical approach for regression with inaccurate data: the mathematics is easy, and the linear programming formulations are simple to use even in a spreadsheet. This self-contained elementary presentation introduces interval linear systems and requires only basic knowledge of algebra. Feature selection is automatic but can be controlled to find only the few most relevant inputs, and joint feature selection is enabled for multiple modeled outputs. With more features than cases, a novel connection to compressed sensing emerges: robustness against interval errors-in-variables implies model parsimony, and the input inaccuracies determine the regularization term. A small numerical example highlights counterintuitive results and a dramatic difference from total least squares.

  18. Multi-scale remote sensing sagebrush characterization with regression trees over Wyoming, USA: Laying a foundation for monitoring

    Science.gov (United States)

    Homer, Collin G.; Aldridge, Cameron L.; Meyer, Debra K.; Schell, Spencer J.

    2012-02-01

    Sagebrush ecosystems in North America have experienced extensive degradation since European settlement. Further degradation continues from exotic invasive plants, altered fire frequency, intensive grazing practices, oil and gas development, and climate change - adding urgency to the need for ecosystem-wide understanding. Remote sensing is often identified as a key information source to facilitate ecosystem-wide characterization, monitoring, and analysis; however, approaches that characterize sagebrush with sufficient and accurate local detail across large enough areas to support this paradigm are unavailable. We describe the development of a new remote sensing sagebrush characterization approach for the state of Wyoming, U.S.A. This approach integrates 2.4 m QuickBird, 30 m Landsat TM, and 56 m AWiFS imagery into the characterization of four primary continuous field components including percent bare ground, percent herbaceous cover, percent litter, and percent shrub, and four secondary components including percent sagebrush (Artemisia spp.), percent big sagebrush (Artemisia tridentata), percent Wyoming sagebrush (Artemisia tridentata Wyomingensis), and shrub height using a regression tree. According to an independent accuracy assessment, primary component root mean square error (RMSE) values ranged from 4.90 to 10.16 for 2.4 m QuickBird, 6.01 to 15.54 for 30 m Landsat, and 6.97 to 16.14 for 56 m AWiFS. Shrub and herbaceous components outperformed the current data standard called LANDFIRE, with a shrub RMSE value of 6.04 versus 12.64 and a herbaceous component RMSE value of 12.89 versus 14.63. This approach offers new advancements in sagebrush characterization from remote sensing and provides a foundation to quantitatively monitor these components into the future.

  19. Multi-scale remote sensing sagebrush characterization with regression trees over Wyoming, USA: laying a foundation for monitoring

    Science.gov (United States)

    Homer, Collin G.; Aldridge, Cameron L.; Meyer, Debra K.; Schell, Spencer J.

    2012-01-01

    Sagebrush ecosystems in North America have experienced extensive degradation since European settlement. Further degradation continues from exotic invasive plants, altered fire frequency, intensive grazing practices, oil and gas development, and climate change – adding urgency to the need for ecosystem-wide understanding. Remote sensing is often identified as a key information source to facilitate ecosystem-wide characterization, monitoring, and analysis; however, approaches that characterize sagebrush with sufficient and accurate local detail across large enough areas to support this paradigm are unavailable. We describe the development of a new remote sensing sagebrush characterization approach for the state of Wyoming, U.S.A. This approach integrates 2.4 m QuickBird, 30 m Landsat TM, and 56 m AWiFS imagery into the characterization of four primary continuous field components including percent bare ground, percent herbaceous cover, percent litter, and percent shrub, and four secondary components including percent sagebrush (Artemisia spp.), percent big sagebrush (Artemisia tridentata), percent Wyoming sagebrush (Artemisia tridentata Wyomingensis), and shrub height using a regression tree. According to an independent accuracy assessment, primary component root mean square error (RMSE) values ranged from 4.90 to 10.16 for 2.4 m QuickBird, 6.01 to 15.54 for 30 m Landsat, and 6.97 to 16.14 for 56 m AWiFS. Shrub and herbaceous components outperformed the current data standard called LANDFIRE, with a shrub RMSE value of 6.04 versus 12.64 and a herbaceous component RMSE value of 12.89 versus 14.63. This approach offers new advancements in sagebrush characterization from remote sensing and provides a foundation to quantitatively monitor these components into the future.

  20. Does intense monitoring matter? A quantile regression approach

    Directory of Open Access Journals (Sweden)

    Fekri Ali Shawtari

    2017-06-01

    Full Text Available Corporate governance has become a centre of attention in corporate management at both micro and macro levels due to the adverse consequences and repercussions of insufficient accountability. In this study, we use the Malaysian stock market as a sample to explore the impact of intense monitoring on the relationship between intellectual capital performance and market valuation. The objectives of the paper are threefold: (i) to investigate whether intense monitoring affects the intellectual capital performance of listed companies; (ii) to explore the impact of intense monitoring on firm value; (iii) to examine the extent to which directors serving on more than two board committees affect the linkage between intellectual capital performance and firms' value. We employ two approaches, namely, Ordinary Least Squares (OLS) and the quantile regression approach. The purpose of the latter is to estimate and generate inference about conditional quantile functions. This method is useful when the conditional distribution does not have a standard shape, for example when it is asymmetric, fat-tailed, or truncated. In terms of variables, intellectual capital is measured using the value added intellectual coefficient (VAIC), while market valuation is proxied by the firm's market capitalization. The findings of the quantile regression show that some of the results do not coincide with the results of OLS. We found that intensity of monitoring does not influence the intellectual capital of all firms. It is also evident that intensity of monitoring does not influence the market valuation. However, to some extent, it moderates the relationship between intellectual capital performance and market valuation. This paper contributes to the existing literature as it presents new empirical evidence on the moderating effects of the intensity of monitoring of the board committees on the relationship between performance and intellectual capital.
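
    The idea of estimating a quantile rather than a mean can be illustrated with a minimal sketch. It is not taken from the paper: the data are synthetic and the function names are hypothetical. It assumes the standard check (pinball) loss and an intercept-only model, for which the loss-minimizing constant is a sample quantile, whereas OLS with only an intercept returns the mean.

```python
import numpy as np

def pinball_loss(y, pred, tau):
    """Mean check (pinball) loss of a prediction at quantile level tau."""
    resid = y - pred
    return float(np.mean(np.maximum(tau * resid, (tau - 1.0) * resid)))

def constant_quantile_fit(y, tau):
    """Intercept-only quantile regression: pick the observed value
    that minimizes the pinball loss (brute-force search over the data)."""
    losses = [pinball_loss(y, c, tau) for c in y]
    return float(y[int(np.argmin(losses))])

# Synthetic right-skewed data (illustrative only, not the study's data).
y = np.exp(np.linspace(-2.0, 2.0, 501))

q50 = constant_quantile_fit(y, 0.5)   # median-type estimate
q90 = constant_quantile_fit(y, 0.9)   # upper-tail estimate
ols_mean = float(y.mean())            # what an intercept-only OLS would return

print(f"tau=0.5: {q50:.3f}  tau=0.9: {q90:.3f}  mean: {ols_mean:.3f}")
```

    On skewed data such as this, the mean sits above the median, which is one reason quantile-based and OLS results need not coincide.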

  1. Comprehensive database of diameter-based biomass regressions for North American tree species

    Science.gov (United States)

    Jennifer C. Jenkins; David C. Chojnacky; Linda S. Heath; Richard A. Birdsey

    2004-01-01

    A database consisting of 2,640 equations compiled from the literature for predicting the biomass of trees and tree components from diameter measurements of species found in North America. Bibliographic information, geographic locations, diameter limits, diameter and biomass units, equation forms, statistical errors, and coefficients are provided for each equation,...

  2. Regression Benchmarking: An Approach to Quality Assurance in Performance

    OpenAIRE

    2005-01-01

    The paper presents a short summary of our work in the area of regression benchmarking and its application to software development. Specifically, we explain the concept of regression benchmarking, the requirements for employing regression testing in a software project, and methods used for analyzing the vast amounts of data resulting from repeated benchmarking. We present the application of regression benchmarking on a real software project and conclude with a glimpse at the challenges for the fu...

  3. A Support Vector Regression Approach for Investigating Multianticipative Driving Behavior

    Directory of Open Access Journals (Sweden)

    Bin Lu

    2015-01-01

    Full Text Available This paper presents a Support Vector Regression (SVR) approach that can be applied to predict multianticipative driving behavior using vehicle trajectory data. Building upon the SVR approach, a multianticipative car-following model is developed and enhanced in learning speed and prediction accuracy. The model training and validation are conducted using field trajectory data extracted from the Next Generation Simulation (NGSIM) project. During the model training and validation tests, the estimation results show that the SVR model performs as well as the IDM model with respect to prediction accuracy. In addition, this paper performs a relative importance analysis to quantify the multianticipation in terms of the different stimuli to which drivers react in platoon car following. The analysis results confirm that drivers respond to the behavior of not only the immediate leading vehicle in front but also the second, third, and even fourth leading vehicles. Specifically, in congested traffic conditions, drivers are observed to be more sensitive to the relative speed than to the gap. These findings provide insight into multianticipative driving behavior and illustrate the necessity of accounting for multianticipative car-following models in microscopic traffic simulation.

  4. Alcohol outlet density and violence: A geographically weighted regression approach.

    Science.gov (United States)

    Cameron, Michael P; Cochrane, William; Gordon, Craig; Livingston, Michael

    2016-05-01

    We investigate the relationship between outlet density (of different types) and violence (as measured by police activity) across the North Island of New Zealand, specifically looking at whether the relationships vary spatially. We use New Zealand data at the census area unit (approximately suburb) level, on police-attended violent incidents and outlet density (by type of outlet), controlling for population density and local social deprivation. We employed geographically weighted regression to obtain both global average and locally specific estimates of the relationships between alcohol outlet density and violence. We find that bar and night club density, and licensed club density (e.g. sports clubs) have statistically significant and positive relationships with violence, with an additional bar or night club associated with nearly 5.3 additional violent events per year, and an additional licensed club associated with 0.8 additional violent events per year. These relationships do not show significant spatial variation. In contrast, the effects of off-licence density and restaurant/café density do exhibit significant spatial variation. However, the non-varying effects of bar and night club density are larger than the locally specific effects of other outlet types. The relationships between outlet density and violence vary significantly across space for off-licences and restaurants/cafés. These results suggest that in order to minimise alcohol-related harms, such as violence, locally specific policy interventions are likely to be necessary. [Cameron MP, Cochrane W, Gordon C, Livingston M. Alcohol outlet density and violence: A geographically weighted regression approach. Drug Alcohol Rev 2016;35:280-288]. © 2015 Australasian Professional Society on Alcohol and other Drugs.
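
    The difference between a global average estimate and locally specific estimates can be sketched as follows. This is an illustration only, not the authors' implementation: it assumes a Gaussian distance kernel with a fixed bandwidth, and the coordinates, densities, and responses are synthetic. The function name gwr_coefficients is hypothetical.

```python
import numpy as np

def gwr_coefficients(coords, x, y, target, bandwidth):
    """Weighted least-squares fit at one target location.
    Observations are down-weighted with a Gaussian kernel of their
    distance to the target (an assumed kernel choice)."""
    d2 = ((coords - target) ** 2).sum(axis=1)
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))
    design = np.column_stack([np.ones(len(x)), x])
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
    return beta  # [intercept, slope]

# Synthetic areas along a line; the true slope is 1 in the west, 3 in the east.
coords = np.column_stack([np.linspace(0.0, 10.0, 41), np.zeros(41)])
x = (np.arange(41) % 5).astype(float)            # toy "outlet density"
true_slope = np.where(coords[:, 0] < 5.0, 1.0, 3.0)
y = true_slope * x                               # toy "violent events"

slope_west = gwr_coefficients(coords, x, y, np.array([1.0, 0.0]), 1.0)[1]
slope_east = gwr_coefficients(coords, x, y, np.array([9.0, 0.0]), 1.0)[1]
# A very large bandwidth gives (almost) equal weights, i.e. a global fit.
slope_global = gwr_coefficients(coords, x, y, np.array([5.0, 0.0]), 1e6)[1]

print(slope_west, slope_east, slope_global)
```

    In this toy setting the global slope averages over two regimes that the local fits separate, which is the kind of spatial variation the abstract reports for some outlet types.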

  5. Comparing Kriging and Regression Approaches for Mapping Soil Clay Content in a diverse Danish Landscape

    DEFF Research Database (Denmark)

    Adhikari, Kabindra; Bou Kheir, Rania; Greve, Mette Balslev

    2013-01-01

    technique at a given site has always been a major issue in all soil mapping applications. We studied the prediction performance of ordinary kriging (OK), stratified OK (OKst), regression trees (RT), and rule-based regression kriging (RKrr) for digital mapping of soil clay content at 30.4-m grid size using 6...

  6. Classification and regression tree (CART) model to predict pulmonary tuberculosis in hospitalized patients

    Directory of Open Access Journals (Sweden)

    Aguiar Fabio S

    2012-08-01

    Full Text Available Background: Tuberculosis (TB) remains a public health issue worldwide. The lack of specific clinical symptoms to diagnose TB makes the correct decision to admit patients to respiratory isolation a difficult task for the clinician. Isolation of patients without the disease is common and increases health costs. Decision models for the diagnosis of TB in patients attending hospitals can increase the quality of care and decrease costs, without the risk of hospital transmission. We present a predictive model for pulmonary TB in hospitalized patients in a high prevalence area in order to contribute to a more rational use of isolation rooms without increasing the risk of transmission. Methods: Cross sectional study of patients admitted to CFFH from March 2003 to December 2004. A classification and regression tree (CART) model was generated and validated. The area under the ROC curve (AUC), sensitivity, specificity, positive and negative predictive values were used to evaluate the performance of the model. Validation of the model was performed with a different sample of patients admitted to the same hospital from January to December 2005. Results: We studied 290 patients admitted with clinical suspicion of TB. Diagnosis was confirmed in 26.5% of them. Pulmonary TB was present in 83.7% of the patients with TB (62.3% with positive sputum smear) and HIV/AIDS was present in 56.9% of patients. The validated CART model showed sensitivity, specificity, positive predictive value and negative predictive value of 60.00%, 76.16%, 33.33%, and 90.55%, respectively. The AUC was 79.70%. Conclusions: The CART model developed for these hospitalized patients with clinical suspicion of TB had fair to good predictive performance for pulmonary TB. The most important variable for prediction of TB diagnosis was chest radiograph results. Prospective validation is still necessary, but our model offers an alternative for decision making in whether to isolate patients with
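
    The four performance measures named in the abstract all derive from a 2x2 confusion matrix. The sketch below shows the standard definitions. The counts are hypothetical and are not the study's data, and the function names are illustrative.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard diagnostic accuracy measures from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positives among the diseased
        "specificity": tn / (tn + fp),  # true negatives among the non-diseased
        "ppv": tp / (tp + fp),          # diseased among positive predictions
        "npv": tn / (tn + fn),          # non-diseased among negative predictions
    }

def ppv_from_prevalence(sensitivity, specificity, prevalence):
    """Positive predictive value via Bayes' rule for a given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

# Hypothetical validation sample of 200 patients, 40 of them with disease.
m = diagnostic_metrics(tp=30, fp=20, fn=10, tn=140)
ppv_at_20pct = ppv_from_prevalence(m["sensitivity"], m["specificity"], 0.20)
ppv_at_5pct = ppv_from_prevalence(m["sensitivity"], m["specificity"], 0.05)

print(m, ppv_at_20pct, ppv_at_5pct)
```

    Because predictive values depend on prevalence, a model with unchanged sensitivity and specificity yields a lower positive predictive value in a sample where the disease is rarer.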

  7. [Application of regression tree in analyzing the effects of climate factors on NDVI in loess hilly area of Shaanxi Province].

    Science.gov (United States)

    Liu, Yang; Lü, Yi-he; Zheng, Hai-feng; Chen, Li-ding

    2010-05-01

    Based on the 10-day SPOT VEGETATION NDVI data and the daily meteorological data from 1998 to 2007 in Yan'an City, the main meteorological variables affecting the annual and interannual variations of NDVI were determined by using regression tree. It was found that the effects of the tested meteorological variables on the variability of NDVI differed with seasons and time lags. Temperature and precipitation were the most important meteorological variables affecting the annual variation of NDVI, and the average highest temperature was the most important meteorological variable affecting the inter-annual variation of NDVI. Regression tree was very powerful in determining the key meteorological variables affecting NDVI variation, but could not build quantitative relations between NDVI and meteorological variables, which limited its further and wider application.

  8. Sampling forest tree regeneration with a transect approach

    Directory of Open Access Journals (Sweden)

    D. Hessenmöller

    2013-05-01

    Full Text Available A new transect approach for sampling forest tree regeneration is developed with the aim to minimize the amount of field measurements, and to produce an accurate estimation of tree species composition and density independent of tree height. This approach is based on the “probability proportional to size” (PPS) theory to assess heterogeneous vegetation. This new method is compared with other approaches to assess forest regeneration based on simulated and measured, real data. The main result is that the transect approach requires about 50% of the time to assess stand density as compared to the plot approach, due to the fact that only 25% of the tree individuals are measured. In addition, tall members of the regeneration are counted with equal probability as small members. This is not the case in the plot approach. The evenness is 0.1 to 0.2 units larger in the transect by PPS than in the plot approach, which means that the plot approach shows a more homogeneous regeneration layer than the PPS approach, even though the stand densities and height distributions are similar. The species diversity is variable in both approaches and needs further investigations.

  9. Sampling forest tree regeneration with a transect approach

    Directory of Open Access Journals (Sweden)

    D. Hessenmoeller

    2013-07-01

    Full Text Available A new transect approach for sampling forest tree regeneration is developed with the aim to minimize the amount of field measurements, and to produce an accurate estimation of tree species composition and density independent of tree height. This approach is based on the “probability proportional to size” (PPS) theory to assess heterogeneous vegetation. This new method is compared with other approaches to assess forest regeneration based on simulated and measured, real data. The main result is that the transect approach requires about 50% of the time to assess stand density as compared to the plot approach, due to the fact that only 25% of the tree individuals are measured. In addition, tall members of the regeneration are counted with equal probability as small members. This is not the case in the plot approach. The evenness is 0.1 to 0.2 units larger in the transect by PPS than in the plot approach, which means that the plot approach shows a more homogenous regeneration layer than the PPS approach, even though the stand densities and height distributions are similar. The species diversity is variable in both approaches and needs further investigations.

  10. Testing for Stock Market Contagion: A Quantile Regression Approach

    NARCIS (Netherlands)

    S.Y. Park (Sung); W. Wang (Wendun); N. Huang (Naijing)

    2015-01-01

    Regarding the asymmetric and leptokurtic behavior of financial data, we propose a new contagion test in the quantile regression framework that is robust to model misspecification. Unlike conventional correlation-based tests, the proposed quantile contagion test

  11. Delimitação de áreas para plantio de eucalipto utilizando regressões logísticas Delimitation of areas for planting eucalyptus trees using logistic regressions

    Directory of Open Access Journals (Sweden)

    Rodrigo Teske

    2012-07-01

    Full Text Available Effective usable area is a key parameter in land acquisition and afforestation planning. The purpose of this research was to generate predictive maps of areas suitable for planting eucalyptus trees using binary logistic regressions and geomorphometric variables. The relationships between the predicting variables and suitable areas for planting eucalyptus trees were modeled, and the variable that best explained the occurrence of suitable lands was distance from rivers. The generated map showing areas suitable for planting had a high ability to reproduce the original planting map. Logistic regressions demonstrated the feasibility of using this approach to map suitability for eucalyptus forestation.

  12. Aneurysmal subarachnoid hemorrhage prognostic decision-making algorithm using classification and regression tree analysis

    Directory of Open Access Journals (Sweden)

    Benjamin W. Y. Lo

    2016-01-01

    Conclusions: A clinically useful classification tree was generated, which serves as a prediction tool to guide bedside prognostication and clinical treatment decision making. This prognostic decision-making algorithm also shed light on the complex interactions between a number of risk factors in determining outcome after aneurysmal SAH.

  13. Identifying predictors of physics item difficulty: A linear regression approach

    Directory of Open Access Journals (Sweden)

    Vanes Mesic

    2011-06-01

    Full Text Available Large-scale assessments of student achievement in physics are often approached with an intention to discriminate students based on the attained level of their physics competencies. Therefore, for purposes of test design, it is important that items display an acceptable discriminatory behavior. To that end, it is recommended to avoid extraordinarily difficult and very easy items. Knowing the factors that influence physics item difficulty makes it possible to model the item difficulty even before the first pilot study is conducted. Thus, by identifying predictors of physics item difficulty, we can improve the test-design process. Furthermore, we get additional qualitative feedback regarding the basic aspects of student cognitive achievement in physics that are directly responsible for the obtained, quantitative test results. In this study, we conducted a secondary analysis of data that came from two large-scale assessments of student physics achievement at the end of compulsory education in Bosnia and Herzegovina. Foremost, we explored the concept of “physics competence” and performed a content analysis of 123 physics items that were included within the above-mentioned assessments. Thereafter, an item database was created. Items were described by variables which reflect some basic cognitive aspects of physics competence. For each of the assessments, Rasch item difficulties were calculated in separate analyses. In order to make the item difficulties from different assessments comparable, a virtual test equating procedure had to be implemented. Finally, a regression model of physics item difficulty was created. It has been shown that 61.2% of item difficulty variance can be explained by factors which reflect the automaticity, complexity, and modality of the knowledge structure that is relevant for generating the most probable correct solution, as well as by the divergence of required thinking and interference effects between intuitive and formal

  14. Identifying predictors of physics item difficulty: A linear regression approach

    Science.gov (United States)

    Mesic, Vanes; Muratovic, Hasnija

    2011-06-01

    Large-scale assessments of student achievement in physics are often approached with an intention to discriminate students based on the attained level of their physics competencies. Therefore, for purposes of test design, it is important that items display an acceptable discriminatory behavior. To that end, it is recommended to avoid extraordinarily difficult and very easy items. Knowing the factors that influence physics item difficulty makes it possible to model the item difficulty even before the first pilot study is conducted. Thus, by identifying predictors of physics item difficulty, we can improve the test-design process. Furthermore, we get additional qualitative feedback regarding the basic aspects of student cognitive achievement in physics that are directly responsible for the obtained, quantitative test results. In this study, we conducted a secondary analysis of data that came from two large-scale assessments of student physics achievement at the end of compulsory education in Bosnia and Herzegovina. Foremost, we explored the concept of “physics competence” and performed a content analysis of 123 physics items that were included within the above-mentioned assessments. Thereafter, an item database was created. Items were described by variables which reflect some basic cognitive aspects of physics competence. For each of the assessments, Rasch item difficulties were calculated in separate analyses. In order to make the item difficulties from different assessments comparable, a virtual test equating procedure had to be implemented. Finally, a regression model of physics item difficulty was created. It has been shown that 61.2% of item difficulty variance can be explained by factors which reflect the automaticity, complexity, and modality of the knowledge structure that is relevant for generating the most probable correct solution, as well as by the divergence of required thinking and interference effects between intuitive and formal physics knowledge

  15. On the quantitative relationships between environmental parameters and heavy metals pollution in Mediterranean soils using GIS regression-trees

    DEFF Research Database (Denmark)

    Bou Kheir, Rania; Shomar, B.; Greve, Mogens Humlekrog

    2014-01-01

    Soil heavy metal pollution has been and continues to be a worldwide phenomenon that has attracted a great deal of attention from governments and regulatory bodies. In this context, our study used Geographic Information Systems (GIS) and regression-tree modeling (196 trees) to precisely quantify...... as weighted input data in soil pollution prediction models. The developed strongest relationships were associated with Cd and As, variance being equal to 82%, followed by Ni (75%) and Cr (73%) as the weakest relationship. This study also showed that nearness to cities (with a relative importance varying...... the relationships between four toxic heavy metals (Ni, Cr, Cd and As) and sixteen environmental parameters (e.g., parent material, slope gradient, proximity to roads, etc.) in the soils of northern Lebanon (as a case study of Mediterranean landscapes), and to detect the most important parameters that can be used...

  16. A Bayesian approach to linear regression in astronomy

    CERN Document Server

    Sereno, Mauro

    2015-01-01

    Linear regression is common in astronomical analyses. I discuss Bayesian hierarchical modeling of data with heteroscedastic and possibly correlated measurement errors and intrinsic scatter. The method fully accounts for time evolution. The slope, the normalization, and the intrinsic scatter of the relation can evolve with the redshift. The intrinsic distribution of the independent variable is approximated using a mixture of Gaussian distributions whose means and standard deviations depend on time. The method can address scatter in the measured independent variable (a kind of Eddington bias), selection effects in the response variable (Malmquist bias), and departure from linearity in the form of a knee. I tested the method with toy models and simulations and quantified the effect of biases and inefficient modeling. The R-package LIRA (LInear Regression in Astronomy) is made available to perform the regression.

  17. An Approach to the Programming of Biased Regression Algorithms.

    Science.gov (United States)

    1978-11-01

    Due to the near nonexistence of computer algorithms for calculating estimators and ancillary statistics that are needed for biased regression methodologies, many users of these methodologies are forced to write their own programs. Brute-force coding of such programs can result in a great waste of computer core and computing time, as well as inefficient and inaccurate computing techniques. This article proposes some guides to more efficient programming by taking advantage of mathematical similarities among several of the more popular biased regression estimators.

  18. Measuring Habituation in Infants: An Approach Using Regression Analysis.

    Science.gov (United States)

    Ashmead, Daniel H.; Davis, DeFord L.

    1996-01-01

    Used computer simulations to examine effectiveness of different criteria for measuring infant visual habituation. Found that a criterion based on fitting a second-order polynomial regression function to looking-time data produced more accurate estimation of looking times and higher power for detecting novelty effects than did the traditional…
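
    Fitting a second-order polynomial regression function to looking-time data can be sketched as below. The looking times are hypothetical values chosen for illustration, and the comparison with a straight-line fit is an added assumption. How the fitted curve is turned into a habituation criterion is not stated in the abstract and is not attempted here.

```python
import numpy as np

# Hypothetical looking times (seconds) over eight habituation trials.
trial = np.arange(1, 9, dtype=float)
looking_time = np.array([10.4, 8.1, 7.2, 5.3, 4.8, 3.4, 3.1, 2.2])

# Least-squares fits: second-order polynomial and, for comparison, a line.
quad_coef = np.polyfit(trial, looking_time, deg=2)
lin_coef = np.polyfit(trial, looking_time, deg=1)

quad_fit = np.polyval(quad_coef, trial)   # smoothed looking time per trial
lin_fit = np.polyval(lin_coef, trial)

sse_quad = float(((looking_time - quad_fit) ** 2).sum())
sse_lin = float(((looking_time - lin_fit) ** 2).sum())

print("quadratic coefficients (highest power first):", quad_coef)
print("SSE quadratic:", sse_quad, "SSE linear:", sse_lin)
```

    The fitted values give a smoothed estimate of looking time on each trial, which is less sensitive to a single unusually long or short look than the raw observations.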

  19. Testing for Stock Market Contagion: A Quantile Regression Approach

    NARCIS (Netherlands)

    S.Y. Park (Sung); W. Wang (Wendun); N. Huang (Naijing)

    2015-01-01

    Regarding the asymmetric and leptokurtic behavior of financial data, we propose a new contagion test in the quantile regression framework that is robust to model misspecification. Unlike conventional correlation-based tests, the proposed quantile contagion test allows

  20. The Learning Tree Montessori Child Care: An Approach to Diversity

    Science.gov (United States)

    Wick, Laurie

    2006-01-01

    In this article the author describes how she and her partners started The Learning Tree Montessori Child Care, a Montessori program with a different approach in Seattle in 1979. The author also relates that the other area Montessori schools then offered half-day programs, and as a result the children who attended were, for the most part,…

  1. A regional classification scheme for estimating reference water quality in streams using land-use-adjusted spatial regression-tree analysis

    Science.gov (United States)

    Robertson, D.M.; Saad, D.A.; Heisey, D.M.

    2006-01-01

    Various approaches are used to subdivide large areas into regions containing streams that have similar reference or background water quality and that respond similarly to different factors. For many applications, such as establishing reference conditions, it is preferable to use physical characteristics that are not affected by human activities to delineate these regions. However, most approaches, such as ecoregion classifications, rely on land use to delineate regions or have difficulties compensating for the effects of land use. Land use not only directly affects water quality, but it is often correlated with the factors used to define the regions. In this article, we describe modifications to SPARTA (spatial regression-tree analysis), a relatively new approach applied to water-quality and environmental characteristic data to delineate zones with similar factors affecting water quality. In this modified approach, land-use-adjusted (residualized) water quality and environmental characteristics are computed for each site. Regression-tree analysis is applied to the residualized data to determine the most statistically important environmental characteristics describing the distribution of a specific water-quality constituent. Geographic information for small basins throughout the study area is then used to subdivide the area into relatively homogeneous environmental water-quality zones. For each zone, commonly used approaches are subsequently used to define its reference water quality and how its water quality responds to changes in land use. SPARTA is used to delineate zones of similar reference concentrations of total phosphorus and suspended sediment throughout the upper Midwestern part of the United States. © 2006 Springer Science+Business Media, Inc.
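
    The two core steps described above, land-use adjustment followed by a regression-tree split on the residualized data, can be sketched as follows. This is not the SPARTA code: it assumes a simple linear adjustment for a single land-use variable and searches for only one split (the root of a regression tree). All variable names and data are synthetic.

```python
import numpy as np

def residualize(values, land_use):
    """Remove the linear effect of land use by ordinary least squares
    (the linear form of the adjustment is an assumption of this sketch)."""
    design = np.column_stack([np.ones(len(land_use)), land_use])
    coef, *_ = np.linalg.lstsq(design, values, rcond=None)
    return values - design @ coef

def best_split(x, y):
    """Single regression-tree split: the threshold on x that minimizes
    the summed squared error of y around the two group means."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best_threshold, best_sse = None, np.inf
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue  # no threshold can separate identical values
        left, right = ys[:i], ys[i:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_threshold, best_sse = (xs[i - 1] + xs[i]) / 2.0, sse
    return best_threshold, best_sse

# Synthetic sites: 5 land-use levels crossed with 8 soil levels.
land_use = np.tile(np.linspace(0.0, 1.0, 5), 8)      # e.g. fraction agriculture
soil = np.repeat(np.linspace(0.05, 0.95, 8), 5)      # a natural characteristic
phosphorus = 0.5 + 2.0 * land_use + 1.0 * (soil > 0.5)

p_resid = residualize(phosphorus, land_use)
soil_resid = residualize(soil, land_use)
threshold, sse = best_split(soil_resid, p_resid)

print("split on residualized soil at", threshold, "with SSE", sse)
```

    In this toy setting the land-use signal is removed first, so the split recovers the step in the natural characteristic rather than a land-use gradient. A full analysis would grow the tree over many characteristics and then map the resulting zones.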

  2. Mapping average GPP, RE, and NEP for 2000 to 2013 using satellite data integrated into regression-tree models in the conterminous United States

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — Integrating spatially explicit biogeophysical and remotely sensed data into regression-tree models enables the spatial extrapolation of training data over large...

  3. BUSINESS GROWTH STRATEGIES OF ILLINOIS FARMS: A QUANTILE REGRESSION APPROACH

    OpenAIRE

    Hennings, Enrique; Katchova, Ani L.

    2005-01-01

    This study examines the business strategies employed by Illinois farms to maintain equity growth using quantile regression analysis. Using data from the Farm Business Farm Management system, this study finds that the effect of different business strategies on equity growth rates differs between quantiles. Financial management strategies have a positive effect for farms situated in the highest quantile of equity growth, while for farms in the lowest quantile the effect on equity growth is nega...

  4. Vessel-guided airway tree segmentation: A voxel classification approach

    DEFF Research Database (Denmark)

    Ashraf, Haseem; Pedersen, Jesper J H; Lo, Pechin Chien Pau;

    2010-01-01

    This paper presents a method for airway tree segmentation that uses a combination of a trained airway appearance model, vessel and airway orientation information, and region growing. We propose a voxel classification approach for the appearance model, which uses a classifier that is trained...... method is evaluated on 250 low dose computed tomography images from a lung cancer screening trial. Our experiments showed that applying the region growing algorithm on the airway appearance model produces more complete airway segmentations, leading to on average 20% longer trees, and 50% less leakage...

  5. A Comparison of Logistic Regression, Neural Networks, and Classification Trees Predicting Success of Actuarial Students

    Science.gov (United States)

    Schumacher, Phyllis; Olinsky, Alan; Quinn, John; Smith, Richard

    2010-01-01

    The authors extended previous research by 2 of the authors who conducted a study designed to predict the successful completion of students enrolled in an actuarial program. They used logistic regression to determine the probability of an actuarial student graduating in the major or dropping out. They compared the results of this study with those…

  7. Grassland and cropland net ecosystem production of the U.S. Great Plains: Regression tree model development and comparative analysis

    Science.gov (United States)

    Wylie, Bruce K.; Howard, Daniel; Dahal, Devendra; Gilmanov, Tagir; Ji, Lei; Zhang, Li; Smith, Kelcy

    2016-01-01

    This paper presents the methodology and results of two ecological-based net ecosystem production (NEP) regression tree models capable of up scaling measurements made at various flux tower sites throughout the U.S. Great Plains. Separate grassland and cropland NEP regression tree models were trained using various remote sensing data and other biogeophysical data, along with 15 flux towers contributing to the grassland model and 15 flux towers for the cropland model. The models yielded weekly mean daily grassland and cropland NEP maps of the U.S. Great Plains at 250 m resolution for 2000–2008. The grassland and cropland NEP maps were spatially summarized and statistically compared. The results of this study indicate that grassland and cropland ecosystems generally performed as weak net carbon (C) sinks, absorbing more C from the atmosphere than they released from 2000 to 2008. Grasslands demonstrated higher carbon sink potential (139 g C·m−2·year−1) than non-irrigated croplands. A closer look into the weekly time series reveals the C fluctuation through time and space for each land cover type.

  8. Grassland and Cropland Net Ecosystem Production of the U.S. Great Plains: Regression Tree Model Development and Comparative Analysis

    Directory of Open Access Journals (Sweden)

    Bruce Wylie

    2016-11-01

    Full Text Available This paper presents the methodology and results of two ecological-based net ecosystem production (NEP) regression tree models capable of up scaling measurements made at various flux tower sites throughout the U.S. Great Plains. Separate grassland and cropland NEP regression tree models were trained using various remote sensing data and other biogeophysical data, along with 15 flux towers contributing to the grassland model and 15 flux towers for the cropland model. The models yielded weekly mean daily grassland and cropland NEP maps of the U.S. Great Plains at 250 m resolution for 2000–2008. The grassland and cropland NEP maps were spatially summarized and statistically compared. The results of this study indicate that grassland and cropland ecosystems generally performed as weak net carbon (C) sinks, absorbing more C from the atmosphere than they released from 2000 to 2008. Grasslands demonstrated higher carbon sink potential (139 g C·m−2·year−1) than non-irrigated croplands. A closer look into the weekly time series reveals the C fluctuation through time and space for each land cover type.

  9. A 2-adic approach of the human respiratory tree

    CERN Document Server

    Bernicot, Frederic; Salort, Delphine

    2010-01-01

    We propose here a general framework to address the question of trace operators on a dyadic tree. This work is motivated by the modeling of the human bronchial tree which, thanks to its regularity, can be extrapolated in a natural way to an infinite resistive tree. The space of pressure fields at bifurcation nodes of this infinite tree can be endowed with a Sobolev space structure, with a semi-norm which measures the instantaneous rate of dissipated energy. We aim at describing the behaviour of finite energy pressure fields near the end. The core of the present approach is an identification of the set of ends with the ring Z_2 of 2-adic integers. Sobolev spaces over Z_2 can be defined in a very natural way by means of Fourier transform, which allows us to establish precise trace theorems which are formally quite similar to those in standard Sobolev spaces, with a Sobolev regularity which depends on the growth rate of resistances, i.e. on geometrical properties of the tree. Furthermore, we exhibit an explicit ...

  10. A Maximum Likelihood Approach to Least Absolute Deviation Regression

    Directory of Open Access Journals (Sweden)

    Yinbo Li

    2004-09-01

    Full Text Available Least absolute deviation (LAD) regression is an important tool used in numerous applications throughout science and engineering, mainly due to the intrinsic robust characteristics of LAD. In this paper, we show that the optimization needed to solve the LAD regression problem can be viewed as a sequence of maximum likelihood estimates (MLE) of location. The derived algorithm reduces to an iterative procedure where a simple coordinate transformation is applied during each iteration to direct the optimization procedure along edge lines of the cost surface, followed by an MLE of location which is executed by a weighted median operation. Requiring weighted medians only, the new algorithm can be easily modularized for hardware implementation, as opposed to most of the other existing LAD methods which require complicated operations such as matrix entry manipulations. One exception is Wesolowsky's direct descent algorithm, which among the top algorithms is also based on weighted median operations. Simulation shows that the new algorithm is superior in speed to Wesolowsky's algorithm, which is simple in structure as well. The new algorithm provides a better tradeoff solution between convergence speed and implementation complexity.
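
    As a rough illustration of the weighted median step mentioned in the abstract above, the sketch below solves the simplest LAD problem, a single slope through the origin, where minimizing sum(|y_i - b*x_i|) reduces to a weighted median of the ratios y_i/x_i with weights |x_i|. This is a one-parameter special case under stated assumptions, not the iterative algorithm of the paper; the function names and the sample data are hypothetical.

```python
def weighted_median(values, weights):
    """Return a value v from `values` minimizing sum(w_i * |v_i - v|)."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    running = 0.0
    for value, weight in pairs:
        running += weight
        # The first value at which the cumulative weight reaches half
        # of the total weight is a weighted median.
        if running >= half:
            return value
    raise ValueError("weights must be positive and non-empty")


def lad_slope_through_origin(x, y):
    """Slope b minimizing sum(|y_i - b * x_i|); assumes every x_i != 0."""
    # |y_i - b*x_i| = |x_i| * |y_i/x_i - b|, so b is a weighted median
    # of the ratios y_i/x_i with weights |x_i|.
    ratios = [yi / xi for xi, yi in zip(x, y)]
    weights = [abs(xi) for xi in x]
    return weighted_median(ratios, weights)


# Made-up data: three points on y = 2x plus one gross outlier.
slope = lad_slope_through_origin([1, 2, 3, 4], [2, 4, 6, 100])
```

    With this toy input the outlier does not move the estimate, which is the robustness property the abstract attributes to LAD.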

  11. A novel dendrochronological approach reveals drivers of carbon sequestration in tree species of riparian forests across spatiotemporal scales.

    Science.gov (United States)

    Rieger, Isaak; Kowarik, Ingo; Cherubini, Paolo; Cierjacks, Arne

    2017-01-01

    Aboveground carbon (C) sequestration in trees is important in global C dynamics, but reliable techniques for its modeling in highly productive and heterogeneous ecosystems are limited. We applied an extended dendrochronological approach to disentangle the functioning of drivers from the atmosphere (temperature, precipitation), the lithosphere (sedimentation rate), the hydrosphere (groundwater table, river water level fluctuation), the biosphere (tree characteristics), and the anthroposphere (dike construction). Carbon sequestration in aboveground biomass of riparian Quercus robur L. and Fraxinus excelsior L. was modeled (1) over time using boosted regression tree analysis (BRT) on cross-datable trees characterized by equal annual growth ring patterns and (2) across space using a subsequent classification and regression tree analysis (CART) on cross-datable and not cross-datable trees. While C sequestration of cross-datable Q. robur responded to precipitation and temperature, cross-datable F. excelsior also responded to a low Danube river water level. However, CART revealed that C sequestration over time is governed by tree height and parameters that vary over space (magnitude of fluctuation in the groundwater table, vertical distance to mean river water level, and longitudinal distance to upstream end of the study area). Thus, a uniform response to climatic drivers of aboveground C sequestration in Q. robur was only detectable in trees of an intermediate height class and in taller trees (>21.8m) on sites where the groundwater table fluctuated little (≤0.9m). The detection of climatic drivers and the river water level in F. excelsior depended on sites at lower altitudes above the mean river water level (≤2.7m) and along a less dynamic downstream section of the study area. Our approach indicates unexploited opportunities of understanding the interplay of different environmental drivers in aboveground C sequestration. Results may support species-specific and

  12. Determinants of the Slovak Enterprises Profitability: Quantile Regression Approach

    Directory of Open Access Journals (Sweden)

    Štefan Kováč

    2013-09-01

    Full Text Available The goal of this paper is to analyze profitability of the Slovak enterprises by means of quantile regression. The analysis is based on individual data from the 2001, 2006 and 2011 financial statements of the Slovak companies. Profitability is proxied by ratio of profit/loss to total assets, and twelve covariates are used in the study, including two nominal variables: region and sector. According to the findings, size, short- and long-term indebtedness, ratio of long-term assets to total assets, ratio of sales revenue to cost of sales, region and sector are the possible determinants of profitability of the companies in Slovakia. The results further suggest that the changes over time have influenced the magnitude of the effects of given variables.

  13. Core set approach to reduce uncertainty of gene trees

    Directory of Open Access Journals (Sweden)

    Okuhara Yoshiyasu

    2006-05-01

    Full Text Available Background: A genealogy based on gene sequences within a species plays an essential role in the estimation of the character, structure, and evolutionary history of that species. Because intraspecific sequences are more closely related than interspecific ones, detailed information on the evolutionary process may be available by determining all the node sequences of trees and may provide insight into functional constraints and adaptations. However, strong evolutionary correlations on a few lineages make this determination difficult as a whole, and the maximum parsimony (MP) method frequently allows a number of topologies with the same total branching length. Results: Kitazoe et al. developed a multidimensional vector-space representation of phylogeny. It converts additivity of evolutionary distances to orthogonality among the vectors expressing branches, and provides a unified index to measure deviations from the orthogonality. In this paper, this index is used to detect and exclude sequences with large deviations from orthogonality, and then to select a maximum subset ("core set") of sequences for which MP generates a single solution. Once the core set tree, for which all the node sequences are given, is formed, the excluded sequences are found to have basically two phylogenetic positions on this tree, respectively. Fortunately, since multiple substitutions are rare in intra-species sequences, the variance of nucleotide transitions is confined to a small range. By applying the core set approach to 38 partial env sequences of HIV-1 in a single patient and also 198 mitochondrial COI and COII DNA sequences of Anopheles dirus, we demonstrate how consistently this approach constructs the tree. Conclusion: In the HIV dataset, we confirmed that the obtained core set tree is the unique maximum set for which MP proposes a single tree. In the mosquito data set, the fluctuation of nucleotide transitions caused by the sequences excluded from the core set was very small

  14. Neighborhood Effects in Wind Farm Performance: A Regression Approach

    Directory of Open Access Journals (Sweden)

    Matthias Ritter

    2017-03-01

    Full Text Available The optimization of turbine density in wind farms entails a trade-off between the usage of scarce, expensive land and power losses through turbine wake effects. A quantification and prediction of the wake effect, however, is challenging because of the complex aerodynamic nature of the interdependencies of turbines. In this paper, we propose a parsimonious data driven regression wake model that can be used to predict production losses of existing and potential wind farms. Motivated by simple engineering wake models, the predicting variables are wind speed, the turbine alignment angle, and distance. By utilizing data from two wind farms in Germany, we show that our models can compete with the standard Jensen model in predicting wake effect losses. A scenario analysis reveals that a distance between turbines can be reduced by up to three times the rotor size, without entailing substantial production losses. In contrast, an unfavorable configuration of turbines with respect to the main wind direction can result in production losses that are much higher than in an optimal case.

  15. The price sensitivity of Medicare beneficiaries: a regression discontinuity approach.

    Science.gov (United States)

    Buchmueller, Thomas C; Grazier, Kyle; Hirth, Richard A; Okeke, Edward N

    2013-01-01

    We use 4 years of data from the retiree health benefits program of the University of Michigan to estimate the effect of price on the health plan choices of Medicare beneficiaries. During the period of our analysis, changes in the University's premium contribution rules led to substantial price changes. A key feature of this 'natural experiment' is that individuals who had retired before a certain date were exempted from having to pay any premium contributions. This 'grandfathering' creates quasi-experimental variation that is ideal for estimating the effect of price. Using regression discontinuity methods, we compare the plan choices of individuals who retired just after the grandfathering cutoff date and were therefore exposed to significant price changes to the choices of a 'control group' of individuals who retired just before that date and therefore did not experience the price changes. The results indicate a statistically significant effect of price, with a $10 increase in monthly premium contributions leading to a 2 to 3 percentage point decrease in a plan's market share. Copyright © 2012 John Wiley & Sons, Ltd.
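
    The comparison described in the abstract above, between individuals just before and just after a cutoff date, can be sketched as a difference in mean outcomes inside a window around the cutoff. This is the simplest possible regression discontinuity estimate and only an assumption about the general idea; the abstract does not state which estimator the authors used. The function name, the bandwidth parameter and the toy data are hypothetical.

```python
def rd_estimate(running, outcome, cutoff, bandwidth):
    """Difference in mean outcome just above vs. just below the cutoff.

    `running` is the variable that assigns treatment (for example a
    retirement date expressed in months relative to a rule change).
    """
    above = [o for r, o in zip(running, outcome)
             if cutoff <= r < cutoff + bandwidth]
    below = [o for r, o in zip(running, outcome)
             if cutoff - bandwidth <= r < cutoff]
    if not above or not below:
        raise ValueError("no observations on one side of the cutoff")
    return sum(above) / len(above) - sum(below) / len(below)


# Made-up data: months relative to the cutoff, and 1 if a given plan
# was chosen, 0 otherwise.
months = [-3, -2, -1, 0, 1, 2]
chose_plan = [1, 1, 0, 0, 0, 1]
effect = rd_estimate(months, chose_plan, cutoff=0, bandwidth=3)
```

    A narrower bandwidth keeps the two groups more comparable but leaves fewer observations on each side, which is the usual trade-off in designs of this kind.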

  16. Drought forecasting in eastern Australia using multivariate adaptive regression spline, least square support vector machine and M5Tree model

    Science.gov (United States)

    Deo, Ravinesh C.; Kisi, Ozgur; Singh, Vijay P.

    2017-02-01

    Drought forecasting using standardized metrics of rainfall is a core task in hydrology and water resources management. Standardized Precipitation Index (SPI) is a rainfall-based metric that caters for different time-scales at which the drought occurs, and due to its standardization, is well-suited for forecasting drought at different periods in climatically diverse regions. This study advances drought modelling using multivariate adaptive regression splines (MARS), least square support vector machine (LSSVM), and M5Tree models by forecasting SPI in eastern Australia. MARS model incorporated rainfall as mandatory predictor with month (periodicity), Southern Oscillation Index, Pacific Decadal Oscillation Index and Indian Ocean Dipole, ENSO Modoki and Nino 3.0, 3.4 and 4.0 data added gradually. The performance was evaluated with root mean square error (RMSE), mean absolute error (MAE), and coefficient of determination (r2). Best MARS model required different input combinations, where rainfall, sea surface temperature and periodicity were used for all stations, but ENSO Modoki and Pacific Decadal Oscillation indices were not required for Bathurst, Collarenebri and Yamba, and the Southern Oscillation Index was not required for Collarenebri. Inclusion of periodicity increased the r2 value by 0.5-8.1% and reduced RMSE by 3.0-178.5%. Comparisons showed that MARS superseded the performance of the other counterparts for three out of five stations with lower MAE by 15.0-73.9% and 7.3-42.2%, respectively. For the other stations, M5Tree was better than MARS/LSSVM with lower MAE by 13.8-13.4% and 25.7-52.2%, respectively, and for Bathurst, LSSVM yielded more accurate result. For droughts identified by SPI ≤ - 0.5, accurate forecasts were attained by MARS/M5Tree for Bathurst, Yamba and Peak Hill, whereas for Collarenebri and Barraba, M5Tree was better than LSSVM/MARS. Seasonal analysis revealed disparate results where MARS/M5Tree was better than LSSVM. The results highlight the
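
    The three evaluation metrics named in the abstract above (RMSE, MAE and r2) can be written out as follows. This is a minimal sketch with hypothetical function names and made-up numbers; it assumes that r2 means the squared Pearson correlation between observed and forecast values, which the abstract does not spell out.

```python
import math


def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def r_squared(obs, pred):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(obs)
    mean_o, mean_p = sum(obs) / n, sum(pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    return cov * cov / (var_o * var_p)


# Made-up observed and forecast index values.
observed = [1.0, 2.0, 3.0, 4.0]
forecast = [1.0, 2.0, 3.0, 6.0]
scores = (rmse(observed, forecast), mae(observed, forecast))
```

    RMSE penalizes the single large miss more heavily than MAE does, which is why the two metrics can rank models differently.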

  17. A conditional likelihood approach for regression analysis using biomarkers measured with batch-specific error.

    Science.gov (United States)

    Wang, Ming; Flanders, W Dana; Bostick, Roberd M; Long, Qi

    2012-12-20

    Measurement error is common in epidemiological and biomedical studies. When biomarkers are measured in batches or groups, measurement error is potentially correlated within each batch or group. In regression analysis, most existing methods are not applicable in the presence of batch-specific measurement error in predictors. We propose a robust conditional likelihood approach to account for batch-specific error in predictors when batch effect is additive and the predominant source of error, which requires no assumptions on the distribution of measurement error. Although a regression model with batch as a categorical covariable yields the same parameter estimates as the proposed conditional likelihood approach for linear regression, this result does not hold in general for all generalized linear models, in particular, logistic regression. Our simulation studies show that the conditional likelihood approach achieves better finite sample performance than the regression calibration approach or a naive approach without adjustment for measurement error. In the case of logistic regression, our proposed approach is shown to also outperform the regression approach with batch as a categorical covariate. In addition, we also examine a 'hybrid' approach combining the conditional likelihood method and the regression calibration method, which is shown in simulations to achieve good performance in the presence of both batch-specific and measurement-specific errors. We illustrate our method by using data from a colorectal adenoma study.

  18. Multi-variate flood damage assessment: a tree-based data-mining approach

    Science.gov (United States)

    Merz, B.; Kreibich, H.; Lall, U.

    2013-01-01

    The usual approach for flood damage assessment consists of stage-damage functions which relate the relative or absolute damage for a certain class of objects to the inundation depth. Other characteristics of the flooding situation and of the flooded object are rarely taken into account, although flood damage is influenced by a variety of factors. We apply a group of data-mining techniques, known as tree-structured models, to flood damage assessment. A very comprehensive data set of more than 1000 records of direct building damage of private households in Germany is used. Each record contains details about a large variety of potential damage-influencing characteristics, such as hydrological and hydraulic aspects of the flooding situation, early warning and emergency measures undertaken, state of precaution of the household, building characteristics and socio-economic status of the household. Regression trees and bagging decision trees are used to select the more important damage-influencing variables and to derive multi-variate flood damage models. It is shown that these models outperform existing models, and that tree-structured models are a promising alternative to traditional damage models.
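
    To make the contrast with a depth-only stage-damage function concrete, the sketch below shows the basic building block of a regression tree: an exhaustive search for the threshold on one predictor that minimizes the summed squared error of the two resulting groups. A full tree repeats this search over all predictors and recursively within each group. The function names and the toy data are hypothetical and are not taken from the study.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(feature, target):
    """Return (threshold, total_sse) for the best binary split of `feature`.

    The threshold is None when no split improves on the unsplit data.
    """
    pairs = sorted(zip(feature, target))
    best = (None, sse(target))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [t for _, t in pairs[:i]]
        right = [t for _, t in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best


# Made-up data: inundation depth in cm and relative damage in percent.
depth_cm = [10, 20, 30, 100, 120, 150]
damage_pct = [5, 6, 5, 40, 42, 41]
threshold, error = best_split(depth_cm, damage_pct)
```

    Running the same search on other predictors, such as precaution or building characteristics, and keeping the predictor with the lowest error is how a tree ends up ranking damage-influencing variables.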

  19. Design and analysis of experiments classical and regression approaches with SAS

    CERN Document Server

    Onyiah, Leonard C

    2008-01-01

    Introductory Statistical Inference and Regression Analysis Elementary Statistical Inference Regression Analysis Experiments, the Completely Randomized Design (CRD)-Classical and Regression Approaches Experiments Experiments to Compare Treatments Some Basic Ideas Requirements of a Good Experiment One-Way Experimental Layout or the CRD: Design and Analysis Analysis of Experimental Data (Fixed Effects Model) Expected Values for the Sums of Squares The Analysis of Variance (ANOVA) Table Follow-Up Analysis to Check fo

  20. A sequential tree approach for incremental sequential pattern mining

    Indian Academy of Sciences (India)

    RAJESH KUMAR BOGHEY; SHAILENDRA SINGH

    2016-12-01

    "Sequential pattern mining" is a prominent and significant method to explore the knowledge and innovation from the large database. Common sequential pattern mining algorithms handle static databases. Pragmatically, looking into the functional and actual execution, the database grows exponentially thereby leading to the necessity and requirement of such innovation, research, and development culminating into the designing of mining algorithm. Once the database is updated, the previous mining result will be incorrect, and we need to restart and trigger the entire mining process for the new updated sequential database. To overcome and avoid the process of rescanning of the entire database, this unique system of incremental mining of sequential pattern is available. The previous approaches, system, and techniques are a priori-based frameworks but mine patterns is an advanced and sophisticated technique giving the desired solution. We propose and incorporate an algorithm called STISPM for incremental mining of sequential patterns using the sequence tree space structure. STISPM uses the depth-first approach along with backward tracking and the dynamic lookahead pruning strategy that removes infrequent and irregular patterns. The process and approach from the root node to any leaf node depict a sequential pattern in the database. The structural characteristic of the sequence tree makes it convenient and appropriate for incremental sequential pattern mining. The sequence tree also stores all the sequential patterns with its count and statistics, so whenever the support system is withdrawn or changed, our algorithm using frequent sequence tree as the storage structure can find and detect all the sequential patterns without mining the database once again.

  1. A Demographic Approach to Evaluating Tree Population Sustainability

    Directory of Open Access Journals (Sweden)

    Corey R. Halpin

    2017-02-01

    Full Text Available Quantitative criteria for assessing demographic sustainability of tree populations would be useful in forest conservation, as climate change and a growing complex of invasive pests are likely to drive forests outside their historic range of variability. In this paper, we used CANOPY, a spatially explicit, individual‐tree model, to examine the effects of initial size distributions on sustainability of tree populations for 70 northern hardwood stands under current environmental conditions. A demographic sustainability index was calculated as the ratio of future simulated basal area to current basal area, given current demographic structure and density‐dependent demographic equations. Only steeply descending size distributions were indicated to be moderately or highly sustainable (final basal area/initial basal area ≥0.7 over several tree generations. Five of the six principal species had demographic sustainability index values of <0.6 in 40%–84% of the stands. However, at a small landscape scale, nearly all species had mean index values >1. Simulation experiments suggested that a minimum sapling density of 300 per hectare was required to sustain the initial basal area, but further increases in sapling density did not increase basal area because of coincident increases in mortality. A variable slope with high q‐ratios in small size classes was needed to maintain the existing overstory of mature and old‐growth stands. This analytical approach may be useful in identifying stands needing restoration treatments to maintain existing species composition in situations where forests are likely to have future recruitment limitations.
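
    The demographic sustainability index defined in the abstract above is a plain ratio, so a short worked example may help. The sketch below computes the ratio of future simulated basal area to current basal area and applies the 0.7 cutoff that the abstract associates with moderately or highly sustainable stands. The function names and the basal area figures are hypothetical; the simulation that produces the future basal area (CANOPY in the study) is not reproduced here.

```python
def sustainability_index(current_basal_area, future_basal_area):
    """Ratio of future simulated basal area to current basal area."""
    if current_basal_area <= 0:
        raise ValueError("current basal area must be positive")
    return future_basal_area / current_basal_area


def is_sustainable(index, threshold=0.7):
    """True when the index meets the cutoff quoted in the abstract."""
    return index >= threshold


# Made-up basal areas in square metres per hectare.
index_a = sustainability_index(30.0, 24.0)  # 24 / 30 = 0.8
index_b = sustainability_index(30.0, 15.0)  # 15 / 30 = 0.5
labels = (is_sustainable(index_a), is_sustainable(index_b))
```

    An index above 1 means the simulated stand gains basal area, as reported in the abstract for nearly all species at the small landscape scale.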

  2. A Decision Tree Approach for Predicting Smokers' Quit Intentions

    Institute of Scientific and Technical Information of China (English)

    Xiao-Jiang Ding; Susan Bedingfield; Chung-Hsing Yeh; Ron Borland; David Young; Jian-Ying Zhang; Sonja Petrovic-Lazarevic; Ken Coghill

    2008-01-01

    This paper presents a decision tree approach for predicting smokers' quit intentions using the data from the International Tobacco Control Four Country Survey. Three rule-based classification models are generated from three data sets using attributes in relation to demographics, warning labels, and smokers' beliefs. Both demographic attributes and warning label attributes are important in predicting smokers' quit intentions. The model's ability to predict smokers' quit intentions is enhanced if the attributes regarding smokers' internal motivation and beliefs about quitting are included.

  3. Decision Tree Approach to Discovering Fraud in Leasing Agreements

    Directory of Open Access Journals (Sweden)

    Horvat Ivan

    2014-09-01

    Full Text Available Background: Fraud attempts create large losses for financing subjects in modern economies. At the same time, leasing agreements have become more and more popular as a means of financing objects such as machinery and vehicles, but are more vulnerable to fraud attempts. Objectives: The goal of the paper is to estimate the usability of the data mining approach in discovering fraud in leasing agreements. Methods/Approach: Real-world data from one Croatian leasing firm was used for creating two models for fraud detection in leasing. The decision tree method was used for creating a classification model, and the CHAID algorithm was deployed. Results: The decision tree model has indicated that the object of the leasing agreement had the strongest impact on the probability of fraud. Conclusions: In order to enhance the probability of the developed model, it would be necessary to develop software that would enable automated, quick and transparent retrieval of data from the system, processing according to the rules and displaying the results in multiple categories.

  4. Hourly predictive artificial neural network and multivariate regression tree models of Alternaria and Cladosporium spore concentrations in Szczecin (Poland)

    Science.gov (United States)

    Grinn-Gofroń, Agnieszka; Strzelczak, Agnieszka

    2009-11-01

    A study was made of the link between time of day, weather variables and the hourly content of certain fungal spores in the atmosphere of the city of Szczecin, Poland, in 2004-2007. Sampling was carried out with a Lanzoni 7-day-recording spore trap. The spores analysed belonged to the taxa Alternaria and Cladosporium. These spores were selected both for their allergenic capacity and for their high level presence in the atmosphere, particularly during summer. Spearman correlation coefficients between spore concentrations, meteorological parameters and time of day showed different indices depending on the taxon being analysed. Relative humidity (RH), air temperature, air pressure and clouds most strongly and significantly influenced the concentration of Alternaria spores. Cladosporium spores correlated less strongly and significantly than Alternaria. Multivariate regression tree analysis revealed that, at air pressures lower than 1,011 hPa the concentration of Alternaria spores was low. Under higher air pressure spore concentrations were higher, particularly when RH was lower than 36.5%. In the case of Cladosporium, under higher air pressure (>1,008 hPa), the spores analysed were more abundant, particularly after 0330 hours. In artificial neural networks, RH, air pressure and air temperature were the most important variables in the model for Alternaria spore concentration. For Cladosporium, clouds, time of day, air pressure, wind speed and dew point temperature were highly significant factors influencing spore concentration. The maximum abundance of Cladosporium spores in air fell between 1200 and 1700 hours.

  5. Modeling compressive strength of recycled aggregate concrete by Artificial Neural Network, Model Tree and Non-linear Regression

    Directory of Open Access Journals (Sweden)

    Neela Deshpande

    2014-12-01

    Full Text Available In the recent past Artificial Neural Networks (ANN) have emerged as a promising technique for predicting compressive strength of concrete. In the present study back propagation was used to predict the 28 day compressive strength of recycled aggregate concrete (RAC) along with two other data driven techniques namely Model Tree (MT) and Non-linear Regression (NLR). Recycled aggregate is the current need of the hour owing to its environmental friendly aspect of re-use of the construction waste. The study observed that prediction of 28 day compressive strength of RAC was done better by ANN than NLR and MT. The input parameters were cubic meter proportions of Cement, Natural fine aggregate, Natural coarse Aggregates, recycled aggregates, Admixture and Water (also called raw data). The study also concluded that ANN performs better when non-dimensional parameters like Sand–Aggregate ratio, Water–total materials ratio, Aggregate–Cement ratio, Water–Cement ratio and Replacement ratio of natural aggregates by recycled aggregates were used as additional input parameters. Study of each network developed using raw data and each non dimensional parameter facilitated in studying the impact of each parameter on the performance of the models developed using ANN, MT and NLR as well as performance of the ANN models developed with limited number of inputs. The results indicate that ANN learn from the examples and grasp the fundamental domain rules governing strength of concrete.

  6. Performance comparison between Logistic regression, decision trees, and multilayer perceptron in predicting peripheral neuropathy in type 2 diabetes mellitus

    Institute of Scientific and Technical Information of China (English)

    LI Chang-ping; ZHI Xin-yue; MA Jun; CUI Zhuang; ZHU Zi-long; ZHANG Cui; HU Liang-ping

    2012-01-01

    Background Various methods can be applied to build predictive models for clinical data with a binary outcome variable. This research aims to explore the process of constructing common predictive models, logistic regression (LR), decision tree (DT) and multilayer perceptron (MLP), and to focus on specific details of applying these methods: what preconditions should be satisfied, how to set the parameters of the model, how to screen variables and build accurate models quickly and efficiently, and how to assess the generalization ability (that is, prediction performance) reliably by the Monte Carlo method in the case of small sample size. Methods All 274 patients (137 with type 2 diabetes mellitus and diabetic peripheral neuropathy and 137 with type 2 diabetes mellitus without diabetic peripheral neuropathy) from the Metabolic Disease Hospital in Tianjin participated in the study. There were 30 variables, such as sex, age, glycosylated hemoglobin, etc. On account of the small sample size, the classification and regression tree (CART) and the chi-squared automatic interaction detector tree (CHAID) were combined by means of 100 repetitions of 5-7 fold stratified cross-validation to build the DT. The MLP was constructed using the Schwarz Bayes Criterion to choose the number of hidden layers and hidden layer units, along with the Levenberg-Marquardt (L-M) optimization algorithm, weight decay and a preliminary training method. Subsequently, LR was applied by the best subset method with the Akaike Information Criterion (AIC) to make the best use of information and avoid overfitting. Eventually, a 10 to 100 times 3-10 fold stratified cross-validation method was used to compare the generalization ability of DT, MLP and LR in view of the areas under the receiver operating characteristic (ROC) curves (AUC). Results The AUC of DT, MLP and LR were 0.8863, 0.8536 and 0.8802, respectively. Because a larger AUC indicates a higher diagnostic ability of a prediction model, MLP performed optimally, and then

  7. An Optimal Sample Data Usage Strategy to Minimize Overfitting and Underfitting Effects in Regression Tree Models Based on Remotely-Sensed Data

    Directory of Open Access Journals (Sweden)

    Yingxin Gu

    2016-11-01

    Regression tree models have been widely used for remote sensing-based ecosystem mapping. Improper use of the sample data (model training and testing data) may cause overfitting and underfitting effects in the model. The goal of this study is to develop an optimal sampling data usage strategy for any dataset and identify an appropriate number of rules in the regression tree model that will improve its accuracy and robustness. Landsat 8 data and Moderate-Resolution Imaging Spectroradiometer-scaled Normalized Difference Vegetation Index (NDVI) were used to develop regression tree models. A Python procedure was designed to generate random replications of model parameter options across a range of model development data sizes and rule number constraints. The mean absolute difference (MAD) between the predicted and actual NDVI (scaled NDVI, value from 0–200) and its variability across the different randomized replications were calculated to assess the accuracy and stability of the models. In our case study, a six-rule regression tree model developed from 80% of the sample data had the lowest MAD (MADtraining = 2.5 and MADtesting = 2.4), which was suggested as the optimal model. This study demonstrates how the training data and rule number selections impact model accuracy and provides important guidance for future remote-sensing-based ecosystem modeling.

  8. An optimal sample data usage strategy to minimize overfitting and underfitting effects in regression tree models based on remotely-sensed data

    Science.gov (United States)

    Gu, Yingxin; Wylie, Bruce K.; Boyte, Stephen; Picotte, Joshua J.; Howard, Danny; Smith, Kelcy; Nelson, Kurtis

    2016-01-01

    Regression tree models have been widely used for remote sensing-based ecosystem mapping. Improper use of the sample data (model training and testing data) may cause overfitting and underfitting effects in the model. The goal of this study is to develop an optimal sampling data usage strategy for any dataset and identify an appropriate number of rules in the regression tree model that will improve its accuracy and robustness. Landsat 8 data and Moderate-Resolution Imaging Spectroradiometer-scaled Normalized Difference Vegetation Index (NDVI) were used to develop regression tree models. A Python procedure was designed to generate random replications of model parameter options across a range of model development data sizes and rule number constraints. The mean absolute difference (MAD) between the predicted and actual NDVI (scaled NDVI, value from 0–200) and its variability across the different randomized replications were calculated to assess the accuracy and stability of the models. In our case study, a six-rule regression tree model developed from 80% of the sample data had the lowest MAD (MADtraining = 2.5 and MADtesting = 2.4), which was suggested as the optimal model. This study demonstrates how the training data and rule number selections impact model accuracy and provides important guidance for future remote-sensing-based ecosystem modeling.
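
    The replication procedure described above can be illustrated with a minimal Python sketch. It is not the authors' code: the equal-count binning model (`fit_piecewise_constant`) is a hypothetical stand-in for a rule-limited regression tree, and the function names, the default seed, and the number of replications are assumptions made for illustration. The sketch repeats random train/test splits for a given training fraction and rule count and reports the mean MAD and its spread across replications.

```python
import random
from statistics import mean, pstdev


def mean_absolute_difference(predicted, actual):
    """Mean absolute difference (MAD) between predictions and observations."""
    return mean(abs(p - a) for p, a in zip(predicted, actual))


def fit_piecewise_constant(xs, ys, n_rules):
    """Hypothetical stand-in for a rule-limited regression tree.

    Sorts the samples by a single predictor, cuts them into at most
    n_rules equal-count bins, and predicts the mean response of each bin.
    """
    pairs = sorted(zip(xs, ys))
    size = max(1, len(pairs) // n_rules)
    bins = []  # list of (upper bound of bin, mean response in bin)
    for i in range(n_rules):
        start = i * size
        stop = len(pairs) if i == n_rules - 1 else (i + 1) * size
        chunk = pairs[start:stop]
        if chunk:
            bins.append((chunk[-1][0], mean(y for _, y in chunk)))

    def predict(x):
        for upper, value in bins:
            if x <= upper:
                return value
        return bins[-1][1]  # beyond the last bound: reuse the last rule

    return predict


def evaluate(xs, ys, train_fraction, n_rules, replications=20, seed=0):
    """Repeat random train/test splits and summarise training and testing MAD."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    n_train = int(round(train_fraction * len(xs)))
    train_mads, test_mads = [], []
    for _ in range(replications):
        rng.shuffle(indices)
        train, test = indices[:n_train], indices[n_train:]
        model = fit_piecewise_constant(
            [xs[i] for i in train], [ys[i] for i in train], n_rules
        )
        train_mads.append(mean_absolute_difference(
            [model(xs[i]) for i in train], [ys[i] for i in train]))
        test_mads.append(mean_absolute_difference(
            [model(xs[i]) for i in test], [ys[i] for i in test]))
    return {
        "train_mad": mean(train_mads),
        "test_mad": mean(test_mads),
        "test_mad_sd": pstdev(test_mads),  # variability across replications
    }
```

    Calling `evaluate(xs, ys, train_fraction=0.8, n_rules=6)` on a predictor list `xs` and a response list `ys` returns the mean training MAD, the mean testing MAD, and the standard deviation of the testing MAD. Looping over several training fractions and rule counts gives the kind of comparison the abstract describes; a large gap between training and testing MAD would point to overfitting.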

  9. Fuzzy multinomial logistic regression analysis: A multi-objective programming approach

    Science.gov (United States)

    Abdalla, Hesham A.; El-Sayed, Amany A.; Hamed, Ramadan

    2017-05-01

    Parameter estimation for multinomial logistic regression is usually based on maximizing the likelihood function. For large well-balanced datasets, Maximum Likelihood (ML) estimation is a satisfactory approach. Unfortunately, ML can fail completely or at least produce poor results in terms of estimated probabilities and confidence intervals of parameters, specially for small datasets. In this study, a new approach based on fuzzy concepts is proposed to estimate parameters of the multinomial logistic regression. The study assumes that the parameters of multinomial logistic regression are fuzzy. Based on the extension principle stated by Zadeh and Bárdossy's proposition, a multi-objective programming approach is suggested to estimate these fuzzy parameters. A simulation study is used to evaluate the performance of the new approach versus Maximum likelihood (ML) approach. Results show that the new proposed model outperforms ML in cases of small datasets.

  10. Employing Measures of Heterogeneity and an Object-Based Approach to Extrapolate Tree Species Distribution Data

    Directory of Open Access Journals (Sweden)

    Trevor G. Jones

    2014-07-01

    Information derived from high spatial resolution remotely sensed data is critical for the effective management of forested ecosystems. However, high spatial resolution data-sets are typically costly to acquire and process and usually provide limited geographic coverage. In contrast, moderate spatial resolution remotely sensed data, while not able to provide the spectral or spatial detail required for certain types of products and applications, offer inexpensive, comprehensive landscape-level coverage. This study assessed using an object-based approach to extrapolate detailed tree species heterogeneity beyond the extent of hyperspectral/LiDAR flightlines to the broader area covered by a Landsat scene. Using image segments, regression trees established ecologically decipherable relationships between tree species heterogeneity and the spectral properties of Landsat segments. The spectral properties of Landsat bands 4 (i.e., NIR: 0.76–0.90 µm), 5 (i.e., SWIR: 1.55–1.75 µm) and 7 (SWIR: 2.08–2.35 µm) were consistently selected as predictor variables, explaining approximately 50% of variance in richness and diversity. Results have important ramifications for ongoing management initiatives in the study area and are applicable to a wide range of applications.

  11. A Computationally Efficient State Space Approach to Estimating Multilevel Regression Models and Multilevel Confirmatory Factor Models.

    Science.gov (United States)

    Gu, Fei; Preacher, Kristopher J; Wu, Wei; Yung, Yiu-Fai

    2014-01-01

    Although the state space approach for estimating multilevel regression models has been well established for decades in the time series literature, it does not receive much attention from educational and psychological researchers. In this article, we (a) introduce the state space approach for estimating multilevel regression models and (b) extend the state space approach for estimating multilevel factor models. A brief outline of the state space formulation is provided and then state space forms for univariate and multivariate multilevel regression models, and a multilevel confirmatory factor model, are illustrated. The utility of the state space approach is demonstrated with either a simulated or real example for each multilevel model. It is concluded that the results from the state space approach are essentially identical to those from specialized multilevel regression modeling and structural equation modeling software. More importantly, the state space approach offers researchers a computationally more efficient alternative to fit multilevel regression models with a large number of Level 1 units within each Level 2 unit or a large number of observations on each subject in a longitudinal study.

  12. Comparative analysis of neural network and regression based condition monitoring approaches for wind turbine fault detection

    DEFF Research Database (Denmark)

    Schlechtingen, Meik; Santos, Ilmar

    2011-01-01

    This paper presents the research results of a comparison of three different model based approaches for wind turbine fault detection in online SCADA data, by applying developed models to five real measured faults and anomalies. The regression based model as the simplest approach to build a normal ...

  13. Predicting 30-day Hospital Readmission with Publicly Available Administrative Database. A Conditional Logistic Regression Modeling Approach.

    Science.gov (United States)

    Zhu, K; Lou, Z; Zhou, J; Ballester, N; Kong, N; Parikh, P

    2015-01-01

    This article is part of the Focus Theme of Methods of Information in Medicine on "Big Data and Analytics in Healthcare". Hospital readmissions raise healthcare costs and cause significant distress to providers and patients. It is, therefore, of great interest to healthcare organizations to predict which patients are at risk of being readmitted to their hospitals. However, current logistic regression based risk prediction models have limited prediction power when applied to hospital administrative data. Meanwhile, although decision trees and random forests have been applied, they tend to be too complex for hospital practitioners to understand. We explore the use of conditional logistic regression to increase the prediction accuracy. We analyzed an HCUP statewide inpatient discharge record dataset, which includes patient demographics, clinical and care utilization data from California. We extracted records of heart failure Medicare beneficiaries who had inpatient experience during an 11-month period. We corrected the data imbalance issue with under-sampling. In our study, we first applied standard logistic regression and decision tree to obtain influential variables and derive practically meaningful decision rules. We then stratified the original data set accordingly and applied logistic regression on each data stratum. We further explored the effect of interacting variables in the logistic regression modeling. We conducted cross validation to assess the overall prediction performance of conditional logistic regression (CLR) and compared it with standard classification models. The developed CLR models outperformed several standard classification models (e.g., straightforward logistic regression, stepwise logistic regression, random forest, support vector machine). For example, the best CLR model improved the classification accuracy by nearly 20% over the straightforward logistic regression model. Furthermore, the developed CLR models tend to achieve better sensitivity of

  14. A Vertex Oriented Approach to Minimum Cost Spanning Tree Problems

    NARCIS (Netherlands)

    Ciftci, B.B.; Tijs, S.H.

    2007-01-01

    In this paper we consider spanning tree problems, where n players want to be connected to a source as cheaply as possible. We introduce and analyze (n!) vertex oriented construct and charge procedures for such spanning tree situations leading in n steps to a minimum cost spanning tree and a cost shari

  16. New Approach for Segmentation and Extraction of Single Tree from Point Clouds Data and Aerial Images

    Science.gov (United States)

    Homainejad, A. S.

    2016-06-01

    This paper presents a new approach for reconstructing 3D models of single trees from Airborne Laser Scanner (ALS) data and aerial images. The approach detects and extracts single trees from ALS data and aerial images. Existing approaches can provide bulk segmentation of a group of trees; however, some methods focus on detecting and extracting a particular tree from ALS data and images. Segmenting a single tree within a group of trees is mostly infeasible, since detecting the boundary lines between trees is tedious. In this approach, an experimental formula based on the height of the trees was developed and applied to define the boundary lines between the trees. As a result, each single tree was segmented and extracted, and a 3D model was then created. Trees extracted with this approach have a unique identification and attributes. The output has applications in various fields of science and engineering, such as forestry, urban planning, and agriculture. In forestry, for example, the results can be used to study ecological diversity, biodiversity and ecosystems.

  17. Trees

    Science.gov (United States)

    Al-Khaja, Nawal

    2007-01-01

    This is a thematic lesson plan for young learners about palm trees and the importance of taking care of them. The two part lesson teaches listening, reading and speaking skills. The lesson includes parts of a tree; the modal auxiliary, can; dialogues and a role play activity.

  18. Identification of Pediatric Patients With Celiac Disease Based on Serology and a Classification and Regression Tree Analysis.

    Science.gov (United States)

    Ermarth, Anna; Bryce, Matthew; Woodward, Stephanie; Stoddard, Gregory; Book, Linda; Jensen, M Kyle

    2017-03-01

    Celiac disease is detected using serology and endoscopy analyses. We used multiple statistical analyses of a geographically isolated population in the United States to determine whether a single serum screening can identify individuals with celiac disease. We performed a retrospective study of 3555 pediatric patients (18 years old or younger) in the intermountain West region of the United States from January 1, 2008, through September 30, 2013. All patients had undergone serologic analyses for celiac disease, including measurement of antibodies to tissue transglutaminase (TTG) and/or deamidated gliadin peptide (DGP), and had duodenal biopsies collected within the following year. Modified Marsh criteria were used to identify patients with celiac disease. We developed models to identify patients with celiac disease using logistic regression and classification and regression tree (CART) analysis. Single use of a test for serum level of IgA against TTG identified patients with celiac disease with 90% sensitivity, 90% specificity, a 61% positive predictive value (PPV), a 90% negative predictive value, and an area under the receiver operating characteristic curve value of 0.91; these values were higher than those obtained from assays for IgA against DGP or IgG against TTG plus DGP. Not including the test for DGP antibody caused only 0.18% of celiac disease cases to be missed. Level of TTG IgA 7-fold the upper limit of normal (ULN) identified patients with celiac disease with a 96% PPV and 100% specificity. Using CART analysis, we found a level of TTG IgA 3.2-fold the ULN and higher to most accurately identify patients with celiac disease (PPV, 89%). Multivariable CART analysis showed that a level of TTG IgA 2.5-fold the ULN and higher was sufficient to identify celiac disease in patients with type 1 diabetes (PPV, 88%). Serum level of IgA against TTG in patients with versus those without trisomy 21 did not affect diagnosis predictability in CART analysis. In a population
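
    The screening statistics quoted in this record (sensitivity, specificity, PPV, NPV) all derive from a 2x2 table of test result against biopsy result. The following Python sketch shows the standard definitions. The function name and the counts in the usage note are hypothetical and are not the study's data.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard screening metrics from confusion-matrix counts.

    tp: test positive, disease present    fp: test positive, disease absent
    fn: test negative, disease present    tn: test negative, disease absent
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of diseased patients detected
        "specificity": tn / (tn + fp),  # share of healthy patients cleared
        "ppv": tp / (tp + fp),          # probability of disease given a positive test
        "npv": tn / (tn + fn),          # probability of no disease given a negative test
    }


# Hypothetical counts chosen only to illustrate the arithmetic.
metrics = diagnostic_metrics(tp=90, fp=50, fn=10, tn=450)
```

    With these made-up counts, sensitivity and specificity are both 0.90 while the PPV is only about 0.64, because the false positives come from a much larger disease-free group. This is the usual reason a test can pair high sensitivity and specificity with a modest PPV.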

  19. Individual-based approach as a useful tool to disentangle the relative importance of tree age, size and inter-tree competition in dendroclimatic studies

    OpenAIRE

    Rozas V

    2015-01-01

    In this work, an individual-based approach was used to assess the relative importance of tree age, size, and competition in modulating the individual dendroclimatic response of Quercus robur L. This was performed in a multi-aged forest in northwestern Spain under a wet Atlantic climate. All trees in five replicated forest stands with homogeneous soil conditions were mapped and inter-tree competition was quantified with a distance-dependent competition index. Tree rings of cored trees were cro...

  20. A timescale decomposed threshold regression downscaling approach to forecasting South China early summer rainfall

    Science.gov (United States)

    Song, Linye; Duan, Wansuo; Li, Yun; Mao, Jiangyu

    2016-09-01

    A timescale decomposed threshold regression (TSDTR) downscaling approach to forecasting South China early summer rainfall (SCESR) is described by using long-term observed station rainfall data and NOAA ERSST data. It makes use of two distinct regression downscaling models corresponding to the interannual and interdecadal rainfall variability of SCESR. The two models are developed based on the partial least squares (PLS) regression technique, linking SCESR to SST modes in preceding months on both interannual and interdecadal timescales. Specifically, using the datasets in the calibration period 1915-84, the variability of SCESR and SST are decomposed into interannual and interdecadal components. On the interannual timescale, a threshold PLS regression model is fitted to interannual components of SCESR and March SST patterns by taking account of the modulation of negative and positive phases of the Pacific Decadal Oscillation (PDO). On the interdecadal timescale, a standard PLS regression model is fitted to the relationship between SCESR and preceding November SST patterns. The total rainfall prediction is obtained by the sum of the outputs from both the interannual and interdecadal models. Results show that the TSDTR downscaling approach achieves reasonable skill in predicting the observed rainfall in the validation period 1985-2006, compared to other simpler approaches. This study suggests that the TSDTR approach, considering different interannual SCESR-SST relationships under the modulation of PDO phases, as well as the interdecadal variability of SCESR associated with SST patterns, may provide a new perspective to improve climate predictions.
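
    The structure of the approach (split each series into interannual and interdecadal parts, fit a phase-dependent model on the interannual part, then add the two predictions) can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' method: a centered running mean stands in for their timescale decomposition, ordinary least squares on one predictor stands in for PLS regression, the sign of a PDO index is assumed to define the two phases, and all function names and the 9-point window are hypothetical.

```python
from statistics import mean


def running_mean(series, window):
    """Centered running mean; the window shrinks at the ends of the series."""
    half = window // 2
    smoothed = []
    for i in range(len(series)):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        smoothed.append(mean(series[lo:hi]))
    return smoothed


def decompose(series, window=9):
    """Split a series into interannual (fast) and interdecadal (slow) parts.

    By construction fast[i] + slow[i] equals series[i].
    """
    slow = running_mean(series, window)
    fast = [x - s for x, s in zip(series, slow)]
    return fast, slow


def fit_line(xs, ys):
    """Least-squares slope and intercept (assumed stand-in for PLS)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def fit_threshold_model(predictor, target, pdo):
    """Fit one line per PDO phase (phase defined here by the index sign)."""
    models = {}
    for phase, in_phase in (("positive", lambda v: v >= 0),
                            ("negative", lambda v: v < 0)):
        xs = [x for x, p in zip(predictor, pdo) if in_phase(p)]
        ys = [y for y, p in zip(target, pdo) if in_phase(p)]
        models[phase] = fit_line(xs, ys)
    return models


def predict_threshold(models, x, pdo_value):
    """Interannual prediction using the line that matches the PDO phase."""
    slope, intercept = models["positive" if pdo_value >= 0 else "negative"]
    return slope * x + intercept


def predict_total(interannual_prediction, interdecadal_prediction):
    """Total rainfall prediction is the sum of the two component predictions."""
    return interannual_prediction + interdecadal_prediction
```

    Fitting separate coefficients per phase is what makes the interannual model a threshold regression: the same SST predictor can map to rainfall differently depending on the PDO phase.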

  1. Penalized regression techniques for prediction: a case study for predicting tree mortality using remotely sensed vegetation indices

    NARCIS (Netherlands)

    Lazaridis, D.C.; Verbesselt, J.; Robinson, A.P.

    2011-01-01

    Constructing models can be complicated when the available fitting data are highly correlated and of high dimension. However, the complications depend on whether the goal is prediction instead of estimation. We focus on predicting tree mortality (measured as the number of dead trees) from change metr

  2. Understanding how roadside concentrations of NOx are influenced by the background levels, traffic density, and meteorological conditions using Boosted Regression Trees

    Science.gov (United States)

    Sayegh, Arwa; Tate, James E.; Ropkins, Karl

    2016-02-01

    Oxides of Nitrogen (NOx) is a major component of photochemical smog and its constituents are considered principal traffic-related pollutants affecting human health. This study investigates the influence of background concentrations of NOx, traffic density, and prevailing meteorological conditions on roadside concentrations of NOx at UK urban, open motorway, and motorway tunnel sites using the statistical approach Boosted Regression Trees (BRT). BRT models have been fitted using hourly concentration, traffic, and meteorological data for each site. The models predict, rank, and visualise the relationship between model variables and roadside NOx concentrations. A strong relationship between roadside NOx and monitored local background concentrations is demonstrated. Relationships between roadside NOx and other model variables have been shown to be strongly influenced by the quality and resolution of background concentrations of NOx, i.e. if it were based on monitored data or modelled prediction. The paper proposes a direct method of using site-specific fundamental diagrams for splitting traffic data into four traffic states: free-flow, busy-flow, congested, and severely congested. Using BRT models, the density of traffic (vehicles per kilometre) was observed to have a proportional influence on the concentrations of roadside NOx, with different fitted regression line slopes for the different traffic states. When other influences are conditioned out, the relationship between roadside concentrations and ambient air temperature suggests NOx concentrations reach a minimum at around 22 °C with high concentrations at low ambient air temperatures which could be associated to restricted atmospheric dispersion and/or to changes in road traffic exhaust emission characteristics at low ambient air temperatures. This paper uses BRT models to study how different critical factors, and their relative importance, influence the variation of roadside NOx concentrations. The paper

  3. Metabolic activity of tree saps of different origin towards cultured human cells in the light of grade correspondence analysis and multiple regression modeling

    Directory of Open Access Journals (Sweden)

    Artur Wnorowski

    2017-06-01

    Tree saps are nourishing biological media commonly used for beverage and syrup production. Although the nutritional aspect of tree saps is widely acknowledged, the exact relationship between the sap composition, origin, and effect on the metabolic rate of human cells is still elusive. Thus, we collected saps from seven different tree species and conducted composition-activity analysis. Saps from trees of the Betulaceae family, but not from the Salicaceae, Sapindaceae, or Juglandaceae families, increased the metabolic rate of HepG2 cells, as measured using a tetrazolium-based assay. The content of glucose, fructose, sucrose, chlorides, nitrates, sulphates, fumarates, malates, and succinates in sap samples varied across tree species. Grade correspondence analysis clustered trees based on the saps' chemical footprint, indicating its usability in chemotaxonomy. Multiple regression modeling showed that glucose and fumarate present in saps from silver birch (Betula pendula Roth.), black alder (Alnus glutinosa Gaertn.), and European hornbeam (Carpinus betulus L.) positively affect the metabolic activity of HepG2 cells.

  4. Combined application of information theory on laboratory results with classification and regression tree analysis: analysis of unnecessary biopsy for prostate cancer.

    Science.gov (United States)

    Hwang, Sang-Hyun; Pyo, Tina; Oh, Heung-Bum; Park, Hyun Jun; Lee, Kwan-Jeh

    2013-01-16

    The probability of a prostate cancer-positive biopsy result varies with PSA concentration. Thus, we applied information theory on classification and regression tree (CART) analysis for decision making predicting the probability of a biopsy result at various PSA concentrations. From 2007 to 2009, prostate biopsies were performed in 664 referred patients in a tertiary hospital. We created 2 CART models based on the information theory: one for moderate uncertainty (PSA concentration: 2.5-10 ng/ml) and the other for high uncertainty (PSA concentration: 10-25 ng/ml). The CART model for moderate uncertainty (n=321) had 3 splits based on PSA density (PSAD), hypoechoic nodules, and age and the other CART for high uncertainty (n=160) had 2 splits based on prostate volume and percent-free PSA. In this validation set, the patients (14.3% and 14.0% for moderate and high uncertainty groups, respectively) could avoid unnecessary biopsies without false-negative results. Using these CART models based on uncertainty information of PSA, the overall reduction in unnecessary prostate biopsies was 14.0-14.3% and CART models were simplified. Using uncertainty of laboratory results from information theoretic approach can provide additional information for decision analysis such as CART. Copyright © 2012 Elsevier B.V. All rights reserved.
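
    The abstract does not state how uncertainty was quantified. One common information-theoretic choice, assumed here purely for illustration, is the Shannon entropy of the biopsy-positive probability: it is highest when a positive and a negative result are equally likely and falls to zero as the outcome becomes certain. The function name below is hypothetical.

```python
import math


def binary_entropy(p):
    """Shannon entropy, in bits, of a binary outcome with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a certain outcome carries no uncertainty
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
```

    Under this assumption, a PSA range whose positive-biopsy probability is near 0.5 would be classed as more uncertain than a range where the probability is near 0.1, since `binary_entropy(0.5)` is 1 bit and `binary_entropy(0.1)` is smaller.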

  5. "Trees and Things That Live in Trees": Three Children with Special Needs Experience the Project Approach

    Science.gov (United States)

    Griebling, Susan; Elgas, Peg; Konerman, Rachel

    2015-01-01

    The authors report on research conducted during a project investigation undertaken with preschool children, ages 3-5. The report focuses on three children with special needs and the positive outcomes for each child as they engaged in the project Trees and Things That Live in Trees. Two of the children were diagnosed with developmental delays, and…

  6. Modeling Personalized Email Prioritization: Classification-based and Regression-based Approaches

    Energy Technology Data Exchange (ETDEWEB)

    Yoo S.; Yang, Y.; Carbonell, J.

    2011-10-24

    Email overload, even after spam filtering, presents a serious productivity challenge for busy professionals and executives. One solution is automated prioritization of incoming emails to ensure the most important are read and processed quickly, while others are processed later as/if time permits in declining priority levels. This paper presents a study of machine learning approaches to email prioritization into discrete levels, comparing ordinal regression versus classifier cascades. Given the ordinal nature of discrete email priority levels, SVM ordinal regression would be expected to perform well, but surprisingly a cascade of SVM classifiers significantly outperforms ordinal regression for email prioritization. In contrast, SVM regression performs well -- better than classifiers -- on selected UCI data sets. This unexpected performance inversion is analyzed and results are presented, providing core functionality for email prioritization systems.

  7. An Efficient Approach for Tree Digital Image Segmentation

    Institute of Scientific and Technical Information of China (English)

    Cheng Lei; Song Tieying

    2004-01-01

    This paper proposes an improved method to segment tree image based on color and texture feature and amends the segmented result by mathematical morphology. The crown and trunk of one tree have been successfully segmented and the experimental result is deemed effective. The authors conclude that building a standard data base for a range of species, featuring color and texture is a necessary condition and constitutes the essential groundwork for tree image segmentation in order to insure its quality.

  8. Partitioning of late gestation energy expenditure in ewes using indirect calorimetry and a linear regression approach

    DEFF Research Database (Denmark)

    Kiani, Alishir; Chwalibog, André; Nielsen, Mette O

    2007-01-01

    study metabolizable energy (ME) intake ranges for twin-bearing ewes were 220-440, 350- 700, 350-900 kJ per metabolic body weight (W0.75) at week seven, five, two pre-partum respectively. Indirect calorimetry and a linear regression approach were used to quantify EE(gest) and then partition to EE...

  9. Modeling Approach of Regression Orthogonal Experiment Design for Thermal Error Compensation of CNC Turning Center

    Institute of Scientific and Technical Information of China (English)

    2002-01-01

    Thermally induced errors can account for as much as 70% of the dimensional errors on a workpiece. Accurate modeling of errors is an essential part of error compensation. Based on an analysis of the existing approaches to thermal error modeling for machine tools, a new approach of regression orthogonal design is proposed, which combines statistical theory with machine structures, surrounding conditions, engineering judgement, and experience in modeling. A whole computation and analysis procedure is given. ...

  10. Delineating Individual Trees from Lidar Data: A Comparison of Vector- and Raster-based Segmentation Approaches

    Directory of Open Access Journals (Sweden)

    Maggi Kelly

    2013-08-01

    Light detection and ranging (lidar) data is increasingly being used for ecosystem monitoring across geographic scales. This work concentrates on delineating individual trees in topographically complex, mixed conifer forest across California's Sierra Nevada. We delineated individual trees using vector data and a 3D lidar point cloud segmentation algorithm, and using raster data with an object-based image analysis (OBIA) of a canopy height model (CHM). The two approaches are compared to each other and to ground reference data. We used high density (9 pulses/m2), discrete lidar data and WorldView-2 imagery to delineate individual trees, and to classify them by species or species types. We also identified a new method to correct artifacts in a high-resolution CHM. Our main focus was to determine the difference between the two types of approaches and to identify the one that produces more realistic results. We compared the delineations via tree detection, tree heights, and the shape of the generated polygons. The tree height agreement was high between the two approaches and the ground data (r2: 0.93–0.96). Tree detection rates increased for more dominant trees (8–100 percent). The two approaches delineated tree boundaries that differed in shape: the lidar-approach produced fewer, more complex, and larger polygons that more closely resembled real forest structure.

  11. Predicting dissolved oxygen concentration using kernel regression modeling approaches with nonlinear hydro-chemical data.

    Science.gov (United States)

    Singh, Kunwar P; Gupta, Shikha; Rai, Premanjali

    2014-05-01

    Kernel function-based regression models were constructed and applied to a nonlinear hydro-chemical dataset pertaining to surface water for predicting the dissolved oxygen levels. Initial features were selected using nonlinear approach. Nonlinearity in the data was tested using BDS statistics, which revealed the data with nonlinear structure. Kernel ridge regression, kernel principal component regression, kernel partial least squares regression, and support vector regression models were developed using the Gaussian kernel function and their generalization and predictive abilities were compared in terms of several statistical parameters. Model parameters were optimized using the cross-validation procedure. The proposed kernel regression methods successfully captured the nonlinear features of the original data by transforming it to a high dimensional feature space using the kernel function. Performance of all the kernel-based modeling methods used here were comparable both in terms of predictive and generalization abilities. Values of the performance criteria parameters suggested for the adequacy of the constructed models to fit the nonlinear data and their good predictive capabilities.

  12. Developing Dynamic Virtual Environments Using Hierarchical, Tree-Structured Approach

    Directory of Open Access Journals (Sweden)

    Wan Mohd Rizhan Wan Idris

    2011-05-01

    Full Text Available Virtual reality (VR) has been utilized in various applications such as architecture, medicine, advertisement, business, entertainment, and education. In the world of simulation, VR software allows users to visualize, manipulate and interact with computers and complex data. However, developing VR environments is costly. Highly technical persons are needed to create the virtual objects from scratch. Once a virtual system is created, managing and modifying it creates further problems. There is a need for non-technical users to be able to create and modify their own virtual environments. This paper discusses a systematic technique to develop dynamic virtual environments and to manage virtual objects in their virtual environment. The technique is called the hierarchical, tree-structured approach. To implement the technique, object-oriented programming tools such as Java, Java 3D and Java Swing were used. To assess the usability and performance of the technique, a virtual environment was created as a case study. The tool has been perceived as easy to use, especially for an environment in education.

  13. DEVELOPING DYNAMIC VIRTUAL ENVIRONMENTS USING HIERARCHICAL, TREE-STRUCTURED APPROACH

    Directory of Open Access Journals (Sweden)

    Wan Mohd Rizhan Wan Idris

    2015-10-01

    Full Text Available Virtual reality (VR) has been utilized in various applications such as architecture, medicine, advertisement, business, entertainment, and education. In the world of simulation, VR software allows users to visualize, manipulate and interact with computers and complex data. However, developing VR environments is costly. Highly technical persons are needed to create the virtual objects from scratch. Once a virtual system is created, managing and modifying it creates further problems. There is a need for non-technical users to be able to create and modify their own virtual environments. This paper discusses a systematic technique to develop dynamic virtual environments and to manage virtual objects in their virtual environment. The technique is called the hierarchical, tree-structured approach. To implement the technique, object-oriented programming tools such as Java, Java 3D and Java Swing were used. To assess the usability and performance of the technique, a virtual environment was created as a case study. The tool has been perceived as easy to use, especially for an environment in education.

  14. A Novel Imbalanced Data Classification Approach Based on Logistic Regression and Fisher Discriminant

    Directory of Open Access Journals (Sweden)

    Baofeng Shi

    2015-01-01

    Full Text Available We introduce an imbalanced data classification approach based on logistic regression significant discriminant and Fisher discriminant. First, a key indicators extraction model based on logistic regression significant discriminant and correlation analysis is derived to extract features for customer classification. Second, a customer scoring model is established on the basis of linear weighting using the Fisher discriminant. Then, a customer rating model in which the number of customers across ratings follows a normal distribution is constructed. The performance of the proposed model and that of the classical SVM classification method are evaluated in terms of their ability to correctly classify consumers as default or nondefault customers. Empirical results using the data of 2157 customers in financial engineering suggest that the proposed approach achieves better performance than the SVM model in dealing with imbalanced data classification. Moreover, our approach contributes to locating the qualified customers for the banks and the bond investors.

  15. A new approach to modeling tree rainfall interception

    Science.gov (United States)

    Xiao, Qingfu; McPherson, E. Gregory; Ustin, Susan L.; Grismer, Mark E.

    2000-12-01

    A three-dimensional physically based stochastic model was developed to describe canopy rainfall interception processes at desired spatial and temporal resolutions. Such model development is important to understand these processes because forest canopy interception may exceed 59% of annual precipitation in old growth trees. The model describes the interception process from a single leaf, to a branch segment, and then up to the individual tree level. It takes into account rainfall, meteorology, and canopy architecture factors as explicit variables. Leaf and stem surface roughness, architecture, and geometric shape control both leaf drip and stemflow. Model predictions were evaluated using actual interception data collected for two mature open grown trees, a 9-year-old broadleaf deciduous pear tree (Pyrus calleryana "Bradford" or Callery pear) and an 8-year-old broadleaf evergreen oak tree (Quercus suber or cork oak). When simulating 18 rainfall events for the oak tree and 16 rainfall events for the pear tree, the model overestimated interception loss by 4.5% and 3.0%, respectively, while stemflow was underestimated by 0.8% and 3.3%, and throughfall was underestimated by 3.7% for the oak tree and overestimated by 0.3% for the pear tree. A model sensitivity analysis indicates that canopy surface storage capacity had the greatest influence on interception, and interception losses were sensitive to leaf and stem surface area indices. Among rainfall factors, interception losses relative to gross precipitation were most sensitive to rainfall amount. Rainfall incident angle had a significant effect on total precipitation intercepting the projected surface area. Stemflow was sensitive to stem segment and leaf zenith angle distributions. Enhanced understanding of interception loss dynamics should lead to improved urban forest ecosystem management.

  16. Combining the Performance Strengths of the Logistic Regression and Neural Network Models: A Medical Outcomes Approach

    Directory of Open Access Journals (Sweden)

    Wun Wong

    2003-01-01

    Full Text Available The assessment of medical outcomes is important in the effort to contain costs, streamline patient management, and codify medical practices. As such, it is necessary to develop predictive models that will make accurate predictions of these outcomes. The neural network methodology has often been shown to perform as well as, if not better than, the logistic regression methodology in terms of sample predictive performance. However, the logistic regression method is capable of providing an explanation regarding the relationship(s) between variables. This explanation is often crucial to understanding the clinical underpinnings of the disease process. Given the respective strengths of the methodologies in question, the combined use of a statistical (i.e., logistic regression) and machine learning (i.e., neural network) technology in the classification of medical outcomes is warranted under appropriate conditions. The study discusses these conditions and describes an approach for combining the strengths of the models.

  17. Fuzzy set theoretic approach to fault tree analysis

    African Journals Online (AJOL)

    user

    Research in conventional fault tree analysis (FTA) is based mainly on failure ... Thus for a very complex system having large number of components, the ..... Smaller, the triangular fuzzy number B-Ai, will result in the best approximation for B.

  18. A quantile regression approach for modelling a Health-Related Quality of Life Measure

    Directory of Open Access Journals (Sweden)

    Giulia Cavrini

    2013-05-01

    Full Text Available Objective. The aim of this study is to propose a new approach for modeling the EQ-5D index and EQ-5D VAS in order to explain the lifestyle determinants effect using the quantile regression analysis. Methods. Data was collected within a cross-sectional study that involved a probabilistic sample of 1,622 adults randomly selected from the population register of two Health Authorities of Bologna in northern Italy. The perceived health status of people was measured using the EQ-5D questionnaire. The Visual Analogue Scale included in the EQ-5D Questionnaire, the EQ-VAS, and the EQ-5D index were used to obtain the synthetic measures of quality of life. To model EQ-VAS Score and EQ-5D index, a quantile regression analysis was employed. Quantile Regression is a way to estimate the conditional quantiles of the VAS Score distribution in a linear model, in order to have a more complete view of possible associations between a measure of Health Related Quality of Life (dependent variable) and socio-demographic and determinants data. This methodological approach was preferred to an OLS regression because of the EQ-VAS Score and EQ-5D index typical distribution. Main Results. The analysis suggested that age, gender, and comorbidity can explain variability in perceived health status measured by the EQ-5D index and the VAS.
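
    Quantile regression replaces the squared-error criterion of OLS with the pinball (check) loss. The sketch below shows only the simplest case, a constant prediction with no covariates, where minimising the pinball loss recovers an empirical quantile. The scores are hypothetical VAS-like values, not data from the study, and the function names are illustrative.

```python
def pinball_loss(y_true, y_pred, tau):
    """Average pinball (check) loss at quantile level tau in (0, 1)."""
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        # Under-predictions are weighted by tau, over-predictions by 1 - tau.
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(y_true)

def best_constant(y, tau):
    """Observed value that minimises the pinball loss as a constant prediction."""
    candidates = sorted(set(y))
    return min(candidates, key=lambda c: pinball_loss(y, [c] * len(y), tau))

# Hypothetical, left-skewed scores on a 0-100 scale.
scores = [35, 50, 60, 70, 75, 80, 85, 90, 95, 100, 100]
median_fit = best_constant(scores, 0.5)   # tau = 0.5 targets the median
lower_fit = best_constant(scores, 0.1)    # tau = 0.1 targets the lower tail
print(median_fit, lower_fit)
```

    A full quantile regression minimises the same loss over the coefficients of a linear predictor, which is why it can describe how covariates relate to different parts of a skewed outcome distribution.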

  19. Refining the criterion for an abnormal Integrated Relaxation Pressure in esophageal pressure topography based on the pattern of esophageal contractility using a classification and regression tree model.

    Science.gov (United States)

    Lin, Zhiyue; Kahrilas, P J; Roman, S; Boris, L; Carlson, D; Pandolfino, J E

    2012-08-01

    The Integrated Relaxation Pressure (IRP) is the esophageal pressure topography (EPT) metric used for assessing the adequacy of esophagogastric junction (EGJ) relaxation in the Chicago Classification of motility disorders. However, because the IRP value is also influenced by distal esophageal contractility, we hypothesized that its normal limits should vary with different patterns of contractility. Five hundred and twenty two selected EPT studies were used to compare the accuracy of alternative analysis paradigms to that of a motility expert (the 'gold standard'). Chicago Classification metrics were scored manually and used as inputs for MATLAB™ programs that utilized either strict algorithm-based interpretation (fixed abnormal IRP threshold of 15 mmHg) or a classification and regression tree (CART) model that selected variable IRP thresholds depending on the associated esophageal contractility. The sensitivity of the CART model for achalasia (93%) was better than that of the algorithm-based approach (85%) on account of using variable IRP thresholds that ranged from a low value of >10 mmHg to distinguish type I achalasia from absent peristalsis to a high value of >17 mmHg to distinguish type III achalasia from distal esophageal spasm. Additionally, type II achalasia was diagnosed solely by panesophageal pressurization without the IRP entering the algorithm. Automated interpretation of EPT studies more closely mimics that of a motility expert when IRP thresholds for impaired EGJ relaxation are adjusted depending on the pattern of associated esophageal contractility. The range of IRP cutoffs suggested by the CART model ranged from 10 to 17 mmHg. © 2012 Blackwell Publishing Ltd.
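
    The core step of a CART model is choosing, for a given variable, the cut point that best separates the classes. The sketch below searches for such a threshold using Gini impurity. The pressure values and labels are hypothetical, and the code is not the MATLAB program used in the study.

```python
def gini(labels):
    """Gini impurity of a list of binary labels (0 or 1)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)

def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best split 'value > threshold'."""
    pairs = list(zip(values, labels))
    distinct = sorted(set(values))
    n = len(values)
    best = (None, float("inf"))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2.0
        left = [lab for v, lab in pairs if v <= t]
        right = [lab for v, lab in pairs if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical IRP values (mmHg) and expert labels (1 = impaired relaxation).
irp = [6, 8, 9, 11, 12, 14, 16, 18, 21, 25]
abnormal = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1]
threshold, impurity = best_threshold(irp, abnormal)
print(threshold, round(impurity, 3))
```

    A tree grows by repeating this search inside each resulting subgroup, which is how different subgroups (here, contractility patterns) can end up with different cutoffs for the same variable.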

  20. Adaptive modelling of gene regulatory network using Bayesian information criterion-guided sparse regression approach.

    Science.gov (United States)

    Shi, Ming; Shen, Weiming; Wang, Hong-Qiang; Chong, Yanwen

    2016-12-01

    Inferring gene regulatory networks (GRNs) from microarray expression data is an important but challenging issue in systems biology. In this study, the authors propose a Bayesian information criterion (BIC)-guided sparse regression approach for GRN reconstruction. This approach can adaptively model GRNs by optimising the l1-norm regularisation of sparse regression based on a modified version of BIC. The regularisation strategy ensures that the inferred GRNs are as sparse as natural ones, while the modified BIC allows prior knowledge on expression regulation to be incorporated and thus avoids the usual overestimation of expression regulators. In particular, the proposed method provides a clear interpretation of combinatorial regulations of gene expression by optimally extracting regulation coordination for a given target gene. Experimental results on both simulated data and real-world microarray data demonstrate competent performance in discovering regulatory relationships in GRN reconstruction.

  1. Trees

    OpenAIRE

    Henri Epstein

    2016-01-01

    An algebraic formalism, developed with V. Glaser and R. Stora for the study of the generalized retarded functions of quantum field theory, is used to prove a factorization theorem which provides a complete description of the generalized retarded functions associated with any tree graph. Integrating over the variables associated to internal vertices to obtain the perturbative generalized retarded functions for interacting fields arising from such graphs is shown to be possible for a large cate...

  2. Trees

    OpenAIRE

    Epstein, Henri

    2016-01-01

    An algebraic formalism, developed with V. Glaser and R. Stora for the study of the generalized retarded functions of quantum field theory, is used to prove a factorization theorem which provides a complete description of the generalized retarded functions associated with any tree graph. Integrating over the variables associated to internal vertices to obtain the perturbative generalized retarded functions for interacting fields arising from such graphs is shown to be possible for a large cat...

  3. Trees

    CERN Document Server

    Epstein, Henri

    2016-01-01

    An algebraic formalism, developed with V. Glaser and R. Stora for the study of the generalized retarded functions of quantum field theory, is used to prove a factorization theorem which provides a complete description of the generalized retarded functions associated with any tree graph. Integrating over the variables associated to internal vertices to obtain the perturbative generalized retarded functions for interacting fields arising from such graphs is shown to be possible for a large category of space-times.

  4. A land use regression model for ambient ultrafine particles in Montreal, Canada: A comparison of linear regression and a machine learning approach.

    Science.gov (United States)

    Weichenthal, Scott; Ryswyk, Keith Van; Goldstein, Alon; Bagg, Scott; Shekkarizfard, Maryam; Hatzopoulou, Marianne

    2016-04-01

    Existing evidence suggests that ambient ultrafine particles (UFPs) [...] regression model for UFPs in Montreal, Canada using mobile monitoring data collected from 414 road segments during the summer and winter months between 2011 and 2012. Two different approaches were examined for model development including standard multivariable linear regression and a machine learning approach (kernel-based regularized least squares (KRLS)) that learns the functional form of covariate impacts on ambient UFP concentrations from the data. The final models included parameters for population density, ambient temperature and wind speed, land use parameters (park space and open space), length of local roads and rail, and estimated annual average NOx emissions from traffic. The final multivariable linear regression model explained 62% of the spatial variation in ambient UFP concentrations whereas the KRLS model explained 79% of the variance. The KRLS model performed slightly better than the linear regression model when evaluated using an external dataset (R2=0.58 vs. 0.55) or a cross-validation procedure (R2=0.67 vs. 0.60). In general, our findings suggest that the KRLS approach may offer modest improvements in predictive performance compared to standard multivariable linear regression models used to estimate spatial variations in ambient UFPs. However, differences in predictive performance were not statistically significant when evaluated using the cross-validation procedure.

  5. Parallel Approach for Time Series Analysis with General Regression Neural Networks

    Directory of Open Access Journals (Sweden)

    J.C. Cuevas-Tello

    2012-04-01

    Full Text Available The accuracy of time delay estimation given pairs of irregularly sampled time series is of great relevance in astrophysics. However, the computational time is also important because the study of large data sets is needed. Besides introducing a new approach for time delay estimation, this paper presents a parallel approach to obtain a fast algorithm for time delay estimation. The neural network architecture that we use is the General Regression Neural Network (GRNN). For the parallel approach, we use Message Passing Interface (MPI) on a Beowulf-type cluster and on a Cray supercomputer, and we also use the Compute Unified Device Architecture (CUDA™) language on Graphics Processing Units (GPUs). We demonstrate that, with our approach, fast algorithms can be obtained for time delay estimation on large data sets with the same accuracy as state-of-the-art methods.
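
    A GRNN prediction is a Gaussian-weighted average of the training targets, which makes it usable on irregularly sampled series. The sketch below shows only that smoothing step, under assumed toy data and an assumed bandwidth `sigma`. It does not reproduce the paper's time delay search or its MPI and CUDA parallelisation.

```python
import math

def grnn_predict(t_train, y_train, t_query, sigma):
    """GRNN output: Gaussian-weighted average of the training targets."""
    weights = [math.exp(-((t_query - t) ** 2) / (2.0 * sigma ** 2))
               for t in t_train]
    return sum(w * y for w, y in zip(weights, y_train)) / sum(weights)

# Hypothetical irregularly sampled light curve (time, flux).
times = [0.0, 0.7, 1.1, 2.4, 3.0, 4.2, 5.5]
flux = [1.0, 1.4, 1.7, 1.2, 0.9, 1.1, 1.6]

# Estimate the flux at a time that was never observed.
estimate = grnn_predict(times, flux, 2.0, sigma=0.5)
print(round(estimate, 3))
```

    Each query point is evaluated independently of the others, which is the property that makes this kind of model straightforward to distribute across processes or GPU threads.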

  6. A frailty model approach for regression analysis of multivariate current status data.

    Science.gov (United States)

    Chen, Man-Hua; Tong, Xingwei; Sun, Jianguo

    2009-11-30

    This paper discusses regression analysis of multivariate current status failure time data (The Statistical Analysis of Interval-censoring Failure Time Data. Springer: New York, 2006), which occur quite often in, for example, tumorigenicity experiments and epidemiologic investigations of the natural history of a disease. For the problem, several marginal approaches have been proposed that model each failure time of interest individually (Biometrics 2000; 56:940-943; Statist. Med. 2002; 21:3715-3726). In this paper, we present a full likelihood approach based on the proportional hazards frailty model. For estimation, an Expectation Maximization (EM) algorithm is developed and simulation studies suggest that the presented approach performs well for practical situations. The approach is applied to a set of bivariate current status data arising from a tumorigenicity experiment.

  7. Binary Tree Approach to Scaling in Unimodal Maps

    CERN Document Server

    Ketoja, Jukka A.; Kurkijarvi, Juhani

    1993-01-01

    Ge, Rusjan, and Zweifel (J. Stat. Phys. 59, 1265 (1990)) introduced a binary tree which represents all the periodic windows in the chaotic regime of iterated one-dimensional unimodal maps. We consider the scaling behavior in a modified tree which takes into account the self-similarity of the window structure. A non-universal geometric convergence of the associated superstable parameter values towards a Misiurewicz point is observed for almost all binary sequences with periodic tails. There are an infinite number of exceptional sequences, however, which lead to superexponential scaling. The origin of such sequences is explained.

  8. Mastectomy or breast conserving surgery? Factors affecting type of surgical treatment for breast cancer – a classification tree approach

    Directory of Open Access Journals (Sweden)

    O'Neill Terry

    2006-04-01

    Full Text Available Abstract. Background. A critical choice facing breast cancer patients is which surgical treatment, mastectomy or breast conserving surgery (BCS), is most appropriate. Several studies have investigated factors that impact the type of surgery chosen, identifying features such as place of residence, age at diagnosis, tumor size, socio-economic and racial/ethnic elements as relevant. Such assessment of "propensity" is important in understanding issues such as a reported under-utilisation of BCS among women for whom such treatment was not contraindicated. Using Western Australian (WA) data, we further examine the factors associated with the type of surgical treatment for breast cancer using a classification tree approach. This approach deals naturally with complicated interactions between factors, and so allows flexible and interpretable models for treatment choice to be built that add to the current understanding of this complex decision process. Methods. Data was extracted from the WA Cancer Registry on women diagnosed with breast cancer in WA from 1990 to 2000. Subjects' treatment preferences were predicted from covariates using both classification trees and logistic regression. Results. Tumor size was the primary determinant of patient choice, subjects with tumors smaller than 20 mm in diameter preferring BCS. For subjects with tumors greater than 20 mm in diameter factors such as patient age, nodal status, and tumor histology become relevant as predictors of patient choice. Conclusion. Classification trees perform as well as logistic regression for predicting patient choice, but are much easier to interpret for clinical use. The selected tree can inform clinicians' advice to patients.

  9. A Bayesian Approach for Graph-constrained Estimation for High-dimensional Regression.

    Science.gov (United States)

    Sun, Hokeun; Li, Hongzhe

    Many different biological processes are represented by network graphs such as regulatory networks, metabolic pathways, and protein-protein interaction networks. Since genes that are linked on the networks usually have biologically similar functions, the linked genes form molecular modules to affect the clinical phenotypes/outcomes. Similarly, in large-scale genetic association studies, many SNPs are in high linkage disequilibrium (LD), which can also be summarized as an LD graph. In order to incorporate the graph information into regression analysis with high dimensional genomic data as predictors, we introduce a Bayesian approach for graph-constrained estimation (Bayesian GRACE) and regularization, which controls the amount of regularization for sparsity and smoothness of the regression coefficients. Bayesian estimation, through the posterior distributions, can provide credible intervals for the estimates of the regression coefficients along with standard errors. The deviance information criterion (DIC) is applied for model assessment and tuning parameter selection. The performance of the proposed Bayesian approach is evaluated through simulation studies and is compared with Bayesian Lasso and Bayesian Elastic-net procedures. We demonstrate our method in an analysis of data from a case-control genome-wide association study of neuroblastoma using a weighted LD graph.

  10. Evaluation of a LASSO regression approach on the unrelated samples of Genetic Analysis Workshop 17.

    Science.gov (United States)

    Guo, Wei; Elston, Robert C; Zhu, Xiaofeng

    2011-11-29

    The Genetic Analysis Workshop 17 data we used comprise 697 unrelated individuals genotyped at 24,487 single-nucleotide polymorphisms (SNPs) from a mini-exome scan, using real sequence data for 3,205 genes annotated by the 1000 Genomes Project and simulated phenotypes. We studied 200 sets of simulated phenotypes of trait Q2. An important feature of this data set is that most SNPs are rare, with 87% of the SNPs having a minor allele frequency less than 0.05. For rare SNP detection, in this study we performed a least absolute shrinkage and selection operator (LASSO) regression and F tests at the gene level and calculated the generalized degrees of freedom to avoid any selection bias. For comparison, we also carried out linear regression and the collapsing method, which sums the rare SNPs, modified for a quantitative trait and with two different allele frequency thresholds. The aim of this paper is to evaluate these four approaches in this mini-exome data and compare their performance in terms of power and false positive rates. In most situations the LASSO approach is more powerful than linear regression and collapsing methods. We also note the difficulty in determining the optimal threshold for the collapsing method and the significant role that linkage disequilibrium plays in detecting rare causal SNPs. If a rare causal SNP is in strong linkage disequilibrium with a common marker in the same gene, power will be much improved.
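
    The collapsing method mentioned above can be sketched as follows: within a gene, the minor-allele counts of all SNPs whose minor allele frequency (MAF) falls below a threshold are summed into one burden score per individual. The genotype coding (0, 1 or 2 minor alleles), the toy matrix, and the function names are assumptions for illustration only.

```python
def minor_allele_frequency(genotypes):
    """MAF of one SNP from per-individual minor-allele counts (0, 1 or 2)."""
    return sum(genotypes) / (2.0 * len(genotypes))

def collapse_rare_variants(genotype_matrix, maf_threshold):
    """Sum minor-allele counts over SNPs whose MAF is below the threshold.

    genotype_matrix[i][j] is the minor-allele count of individual i at SNP j.
    Returns one burden score per individual.
    """
    n_snps = len(genotype_matrix[0])
    rare = [j for j in range(n_snps)
            if minor_allele_frequency([row[j] for row in genotype_matrix])
            < maf_threshold]
    return [sum(row[j] for j in rare) for row in genotype_matrix]

# Hypothetical gene: 10 individuals genotyped at 3 SNPs.
genotypes = [
    [0, 0, 1], [0, 0, 2], [1, 0, 0], [0, 0, 1], [0, 0, 0],
    [0, 1, 1], [0, 0, 0], [0, 0, 2], [0, 0, 1], [0, 0, 0],
]
burden = collapse_rare_variants(genotypes, maf_threshold=0.1)
print(burden)
```

    The burden score then serves as a single predictor of the quantitative trait, and changing `maf_threshold` changes which SNPs contribute, which is why the choice of threshold matters.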

  11. A Simple Regression-based Approach to Account for Survival Bias in Birth Outcomes Research.

    Science.gov (United States)

    Tchetgen Tchetgen, Eric J; Phiri, Kelesitse; Shapiro, Roger

    2015-07-01

    In perinatal epidemiology, birth outcomes such as small for gestational age (SGA) may not be observed for a pregnancy ending with a stillbirth. It is then said that SGA is truncated by stillbirth, which may give rise to survival bias when evaluating the effects on SGA of an exposure known also to influence the risk of a stillbirth. In this article, we consider the causal effects of maternal infection with human immunodeficiency virus (HIV) on the risk of SGA, in a sample of pregnant women in Botswana. We hypothesize that previously estimated effects of HIV on SGA may be understated because they fail to appropriately account for the over-representation of live births among HIV negative mothers, relative to HIV positive mothers. A simple yet novel regression-based approach is proposed to adjust effect estimates for survival bias for an outcome that is either continuous or binary. Under certain straightforward assumptions, the approach produces an estimate that may be interpreted as the survivor average causal effect of maternal HIV, that is, the average effect of maternal HIV on SGA among births that would be live irrespective of maternal HIV status. The approach is particularly appealing, because it recovers an exposure effect which is robust to survival bias, even if the association between the risk of SGA and that of a stillbirth cannot be completely explained by adjusting for observed shared risk factors. The approach also gives a formal statistical test of the null hypothesis of no survival bias in the regression framework.

  12. A Kalman Filtering and Nonlinear Penalty Regression Approach for Noninvasive Anemia Detection with Palpebral Conjunctiva Images

    Directory of Open Access Journals (Sweden)

    Yi-Ming Chen

    2017-01-01

    Full Text Available Noninvasive medical procedures are usually preferable to their invasive counterparts in the medical community. Examining for anemia through the palpebral conjunctiva is a convenient noninvasive procedure. The procedure can be automated to reduce the medical cost. We propose an anemia examination approach that uses a Kalman filter (KF) and a regression method. The traditional KF is often used in time-dependent applications. Here, we modified the traditional KF for the time-independent data in medical applications. We simply compute the mean value of the red component of the palpebral conjunctiva image as our recognition feature and use a penalty regression algorithm to find a nonlinear curve that best fits the data of feature values and the corresponding levels of hemoglobin (Hb) concentration. To evaluate the proposed approach and several relevant approaches, we propose a risk evaluation scheme, where the entire Hb spectrum is divided into high-risk, low-risk, and doubtful intervals for anemia. The doubtful interval contains the Hb threshold, say 11 g/dL, separating anemia and nonanemia. A suspect sample is a sample falling in the doubtful interval. For anemia screening purposes, we would like to have as few suspect samples as possible. The experimental results show that the modified KF reduces the number of suspect samples significantly for all the approaches considered here.
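
    The risk evaluation scheme can be pictured as a three-way partition of the estimated Hb value around the screening threshold. In the sketch below the 11 g/dL threshold comes from the abstract, while the half-width of the doubtful interval (1.0 g/dL), the sample estimates, and the function names are hypothetical.

```python
def risk_interval(hb_estimate, threshold=11.0, half_width=1.0):
    """Assign an estimated Hb level (g/dL) to one of three intervals.

    half_width is an assumed value; the abstract does not state how wide
    the doubtful interval around the threshold is.
    """
    if hb_estimate < threshold - half_width:
        return "high-risk"
    if hb_estimate > threshold + half_width:
        return "low-risk"
    return "doubtful"

def count_suspects(hb_estimates, threshold=11.0, half_width=1.0):
    """Number of samples that fall in the doubtful interval."""
    return sum(1 for hb in hb_estimates
               if risk_interval(hb, threshold, half_width) == "doubtful")

# Hypothetical Hb estimates produced by some regression model.
estimates = [8.2, 10.4, 11.0, 11.9, 13.5, 9.9, 12.1]
suspects = count_suspects(estimates)
print(suspects)
```

    Under this view, a better estimator is one whose outputs land in the doubtful interval less often, so fewer patients need a confirmatory test.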

  13. Classification and regression tree (CART) analyses of genomic signatures reveal sets of tetramers that discriminate temperature optima of archaea and bacteria

    Directory of Open Access Journals (Sweden)

    Betsey Dexter Dyer

    2008-01-01

    Full Text Available Classification and regression tree (CART) analysis was applied to genome-wide tetranucleotide frequencies (genomic signatures) of 195 archaea and bacteria. Although genomic signatures have typically been used to classify evolutionary divergence, in this study, convergent evolution was the focus. Temperature optima for most of the organisms examined could be distinguished by CART analyses of tetranucleotide frequencies. This suggests that pervasive (nonlinear) qualities of genomes may reflect certain environmental conditions (such as temperature) in which those genomes evolved. The predominant use of GAGA and AGGA as the discriminating tetramers in CART models suggests that purine-loading and codon biases of thermophiles may explain some of the results.
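
    A genomic signature of this kind is a table of relative tetranucleotide frequencies. The sketch below counts overlapping 4-mers on a single strand of a short made-up sequence. Whether the study also counted the reverse complement or normalised differently is not stated in the abstract, so those choices are assumptions here.

```python
from collections import Counter

def tetranucleotide_frequencies(sequence):
    """Relative frequency of each overlapping 4-mer made up only of A, C, G, T."""
    seq = sequence.upper()
    counts = Counter(
        seq[i:i + 4]
        for i in range(len(seq) - 3)
        if set(seq[i:i + 4]) <= set("ACGT")  # skip windows with ambiguous bases
    )
    total = sum(counts.values())
    return {kmer: c / total for kmer, c in counts.items()} if total else {}

# Hypothetical sequence fragment; a real signature would use a whole genome.
signature = tetranucleotide_frequencies("GAGAGGAGAGAN")
print(signature["GAGA"], signature["AGGA"])
```

    Each genome then becomes a vector of up to 256 frequencies, and a CART model can split on individual entries such as GAGA or AGGA.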

  14. Towards global empirical upscaling of FLUXNET eddy covariance observations: validation of a model tree ensemble approach using a biosphere model

    Science.gov (United States)

    Jung, M.; Reichstein, M.; Bondeau, A.

    2009-10-01

    Global, spatially and temporally explicit estimates of carbon and water fluxes derived from empirical up-scaling eddy covariance measurements would constitute a new and possibly powerful data stream to study the variability of the global terrestrial carbon and water cycle. This paper introduces and validates a machine learning approach dedicated to the upscaling of observations from the current global network of eddy covariance towers (FLUXNET). We present a new model TRee Induction ALgorithm (TRIAL) that performs hierarchical stratification of the data set into units where particular multiple regressions for a target variable hold. We propose an ensemble approach (Evolving tRees with RandOm gRowth, ERROR) where the base learning algorithm is perturbed in order to gain a diverse sequence of different model trees which evolves over time. We evaluate the efficiency of the model tree ensemble (MTE) approach using an artificial data set derived from the Lund-Potsdam-Jena managed Land (LPJmL) biosphere model. We aim at reproducing global monthly gross primary production as simulated by LPJmL from 1998-2005 using only locations and months where high quality FLUXNET data exist for the training of the model trees. The model trees are trained with the LPJmL land cover and meteorological input data, climate data, and the fraction of absorbed photosynthetic active radiation simulated by LPJmL. Given that we know the "true result" in the form of global LPJmL simulations we can effectively study the performance of the MTE upscaling and associated problems of extrapolation capacity. We show that MTE is able to explain 92% of the variability of the global LPJmL GPP simulations. The mean spatial pattern and the seasonal variability of GPP that constitute the largest sources of variance are very well reproduced (96% and 94% of variance explained respectively) while the monthly interannual anomalies which occupy much less variance are less well matched (41% of variance explained

  15. New Approaches To Photometric Redshift Prediction Via Gaussian Process Regression In The Sloan Digital Sky Survey

    CERN Document Server

    Way, M J; Gazis, P R; Srivastava, A N

    2009-01-01

    Expanding upon the work of Way & Srivastava 2006 we demonstrate how the use of training sets of comparable size continues to make Gaussian Process Regression a competitive and in many ways a superior approach to that of Neural Networks and other least-squares fitting methods. This is possible via new matrix inversion techniques developed for Gaussian Processes that do not require that the kernel matrix be sparse. This development, combined with a neural-network kernel function appears to give superior results for this problem. We demonstrate that there appears to be a minimum number of training set galaxies needed to obtain the optimal fit when using our Gaussian Process Regression rank-reduction methods. We also find that morphological information included with many photometric surveys appears, for the most part, to make the photometric redshift evaluation slightly worse rather than better. This would indicate that morphological information simply adds noise from the Gaussian Process point of view. In add...

  16. MODIFIED REGRESSION APPROACH IN PREDICTION OF FINITE POPULATION MEAN USING KNOWN COEFFICIENT OF VARIATION

    Directory of Open Access Journals (Sweden)

    Sheela Misra

    2013-01-01

    Full Text Available In this paper, we utilize the modified regression approach for the prediction of the finite population mean, with known coefficient of variation of the study variable y, under simple random sampling without replacement. The bias and mean square error of the proposed estimator are obtained and compared with those of the usual regression estimator of the population mean; the proposed estimator turns out to be more efficient in the sense of having a smaller mean square error. The optimum class of estimators is obtained and, for greater practical utility, a proposed optimum estimator based on the estimated optimum value of the characterizing scalar is also obtained and is shown to retain the same efficiency to the first order of approximation as the former one. A numerical illustration is also given to support the theoretical conclusions.

  17. Towards global empirical upscaling of FLUXNET eddy covariance observations: validation of a model tree ensemble approach using a biosphere model

    Directory of Open Access Journals (Sweden)

    M. Jung

    2009-05-01

    Full Text Available Global, spatially and temporally explicit estimates of carbon and water fluxes derived from empirical up-scaling eddy covariance measurements would constitute a new and possibly powerful data stream to study the variability of the global terrestrial carbon and water cycle. This paper introduces and validates a machine learning approach dedicated to the upscaling of observations from the current global network of eddy covariance towers (FLUXNET). We present a new model TRee Induction ALgorithm (TRIAL) that performs hierarchical stratification of the data set into units where particular multiple regressions for a target variable hold. We propose an ensemble approach (Evolving tRees with RandOm gRowth, ERROR) where the base learning algorithm is perturbed in order to gain a diverse sequence of different model trees which evolves over time.

    We evaluate the efficiency of the model tree ensemble approach using an artificial data set derived from the Lund-Potsdam-Jena managed Land (LPJmL) biosphere model. We aim at reproducing global monthly gross primary production as simulated by LPJmL from 1998–2005 using only locations and months where high quality FLUXNET data exist for the training of the model trees. The model trees are trained with the LPJmL land cover and meteorological input data, climate data, and the fraction of absorbed photosynthetic active radiation simulated by LPJmL. Given that we know the "true result" in the form of global LPJmL simulations we can effectively study the performance of the model tree ensemble upscaling and associated problems of extrapolation capacity.

    We show that the model tree ensemble is able to explain 92% of the variability of the global LPJmL GPP simulations. The mean spatial pattern and the seasonal variability of GPP that constitute the largest sources of variance are very well reproduced (96% and 94% of variance explained respectively) while the monthly interannual anomalies which occupy

  18. A data mining approach to optimize pellets manufacturing process based on a decision tree algorithm.

    Science.gov (United States)

    Ronowicz, Joanna; Thommes, Markus; Kleinebudde, Peter; Krysiński, Jerzy

    2015-06-20

    The present study is focused on the thorough analysis of cause-effect relationships between pellet formulation characteristics (pellet composition as well as process parameters) and the selected quality attribute of the final product. Pellet quality was expressed by shape, measured as the aspect ratio. A data matrix for chemometric analysis consisted of 224 pellet formulations prepared with eight different active pharmaceutical ingredients and several excipients, using different extrusion/spheronization process conditions. The data set contained 14 input variables (both formulation and process variables) and one output variable (pellet aspect ratio). A tree regression algorithm consistent with the Quality by Design concept was applied to obtain deeper understanding and knowledge of formulation and process parameters affecting the final pellet sphericity. A clear, interpretable set of decision rules was generated. The spheronization speed, spheronization time, number of holes and water content of extrudate have been recognized as the key factors influencing pellet aspect ratio. The most spherical pellets were achieved by using a large number of holes during extrusion, a high spheronizer speed and longer time of spheronization. The described data mining approach enhances knowledge about the pelletization process and simultaneously facilitates searching for the optimal process conditions which are necessary to achieve ideal spherical pellets, resulting in good flow characteristics. This data mining approach can be taken into consideration by industrial formulation scientists to support rational decision making in the field of pellets technology. Copyright © 2015 Elsevier B.V. All rights reserved.
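
    Illustrative note (not part of the record): the abstract does not say which tree regression algorithm was used. As a minimal sketch, assuming a CART-style criterion, the basic step of any regression tree is to pick the threshold on one input that minimises the summed squared error of the output around each side's mean. The function name and the data below are hypothetical and are not taken from the study.

```python
def best_split(xs, ys):
    """Return (threshold, sse) for the single split on xs that minimises
    the summed squared error of ys around the mean on each side."""
    def sse(values):
        if not values:
            return 0.0
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    pairs = sorted(zip(xs, ys))
    best = (None, sse([y for _, y in pairs]))  # baseline: no split at all
    for i in range(1, len(pairs)):
        lo_x, hi_x = pairs[i - 1][0], pairs[i][0]
        if lo_x == hi_x:
            continue  # identical predictor values cannot be separated
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = ((lo_x + hi_x) / 2, total)  # midpoint threshold
    return best


# Hypothetical data: spheronizer speed (rpm) vs. pellet aspect ratio.
speed = [500, 600, 700, 1000, 1100, 1200]
aspect_ratio = [1.30, 1.28, 1.32, 1.10, 1.12, 1.08]
threshold, error = best_split(speed, aspect_ratio)
print(threshold, error)
```

    Applying the same search recursively to each side, and over all input variables, yields decision rules of the kind the abstract describes.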

  19. A poisson regression approach for modelling spatial autocorrelation between geographically referenced observations

    Directory of Open Access Journals (Sweden)

    Jolley Damien

    2011-10-01

    Full Text Available Background: Analytic methods commonly used in epidemiology do not account for spatial correlation between observations. In regression analyses, omission of that autocorrelation can bias parameter estimates and yield incorrect standard error estimates. Methods: We used age standardised incidence ratios (SIRs) of esophageal cancer (EC) from the Babol cancer registry from 2001 to 2005, and extracted socioeconomic indices from the Statistical Centre of Iran. The following models for SIR were used: (1) Poisson regression with agglomeration-specific nonspatial random effects; (2) Poisson regression with agglomeration-specific spatial random effects. Distance-based and neighbourhood-based autocorrelation structures were used for defining the spatial random effects and a pseudolikelihood approach was applied to estimate model parameters. The Bayesian information criterion (BIC), Akaike's information criterion (AIC) and adjusted pseudo R2 were used for model comparison. Results: A Gaussian semivariogram with an effective range of 225 km best fit spatial autocorrelation in agglomeration-level EC incidence. The Moran's I index was greater than its expected value, indicating systematic geographical clustering of EC. The distance-based and neighbourhood-based Poisson regression estimates were generally similar. When residual spatial dependence was modelled, point and interval estimates of covariate effects were different to those obtained from the nonspatial Poisson model. Conclusions: The spatial pattern evident in the EC SIR and the observation that point estimates and standard errors differed depending on the modelling approach indicate the importance of accounting for residual spatial correlation in analyses of EC incidence in the Caspian region of Iran. Our results also illustrate that spatial smoothing must be applied with care.
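
    Illustrative note (not part of the record): the comparison of Moran's I with its expected value can be sketched as follows. This uses the standard definition of global Moran's I and its expectation under no spatial autocorrelation, -1/(n - 1). The four-region chain and its values are hypothetical, and the weight matrix is a simple assumption rather than the study's distance-based or neighbourhood-based structure.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and an n x n weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / w_sum) * cross / sum(d * d for d in dev)


def expected_morans_i(n):
    """Expected value of Moran's I when there is no spatial autocorrelation."""
    return -1.0 / (n - 1)


# Hypothetical example: four regions in a row, each adjacent to the next.
sir = [1.0, 2.0, 3.0, 4.0]
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
observed = morans_i(sir, adjacency)
expected = expected_morans_i(len(sir))
print(observed, expected)
```

    An observed value above the expectation, as in this toy case, points towards clustering of similar values among neighbours; a formal test would also need the variance of I, which is omitted here.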

  20. Multivariate regression approaches for surrogate-based diffeomorphic estimation of respiratory motion in radiation therapy

    Science.gov (United States)

    Wilms, M.; Werner, R.; Ehrhardt, J.; Schmidt-Richberg, A.; Schlemmer, H.-P.; Handels, H.

    2014-03-01

    Breathing-induced location uncertainties of internal structures are still a relevant issue in the radiation therapy of thoracic and abdominal tumours. Motion compensation approaches like gating or tumour tracking are usually driven by low-dimensional breathing signals, which are acquired in real-time during the treatment. These signals are only surrogates of the internal motion of target structures and organs at risk, and, consequently, appropriate models are needed to establish correspondence between the acquired signals and the sought internal motion patterns. In this work, we present a diffeomorphic framework for correspondence modelling based on the Log-Euclidean framework and multivariate regression. Within the framework, we systematically compare standard and subspace regression approaches (principal component regression, partial least squares, canonical correlation analysis) for different types of common breathing signals (1D: spirometry, abdominal belt, diaphragm tracking; multi-dimensional: skin surface tracking). Experiments are based on 4D CT and 4D MRI data sets and cover intra- and inter-cycle as well as intra- and inter-session motion variations. Only small differences in internal motion estimation accuracy are observed between the 1D surrogates. Increasing the surrogate dimensionality, however, improved the accuracy significantly; this is shown for both 2D signals, which consist of a common 1D signal and its time derivative, and high-dimensional signals containing the motion of many skin surface points. Eventually, comparing the standard and subspace regression variants when applied to the high-dimensional breathing signals, only small differences in terms of motion estimation accuracy are found.

  1. Tests of Simple Slopes in Multiple Regression Models with an Interaction: Comparison of Four Approaches.

    Science.gov (United States)

    Liu, Yu; West, Stephen G; Levy, Roy; Aiken, Leona S

    2017-01-01

    In multiple regression researchers often follow up significant tests of the interaction between continuous predictors X and Z with tests of the simple slope of Y on X at different sample-estimated values of the moderator Z (e.g., ±1 SD from the mean of Z). We show analytically that when X and Z are randomly sampled from the population, the variance expression of the simple slope at sample-estimated values of Z differs from the traditional variance expression obtained when the values of X and Z are fixed. A simulation study using randomly sampled predictors compared four approaches: (a) the Aiken and West (1991) test of simple slopes at fixed population values of Z, (b) the Aiken and West test at sample-estimated values of Z, (c) a 95% percentile bootstrap confidence interval approach, and (d) a fully Bayesian approach with diffuse priors. The results showed that approach (b) led to inflated Type 1 error rates and 95% confidence intervals with inadequate coverage rates, whereas other approaches maintained acceptable Type 1 error rates and adequate coverage of confidence intervals. Approach (c) had asymmetric rejection rates at small sample sizes. We used an empirical data set to illustrate these approaches.
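
    Illustrative note (not part of the record): for the model Y = b0 + b1*X + b2*Z + b3*X*Z, the simple slope of Y on X at a chosen value z is b1 + b3*z. The traditional variance expression, which treats z as a fixed constant, is var(b1) + 2*z*cov(b1, b3) + z^2*var(b3). The sketch below implements only that fixed-z expression; the coefficient values are hypothetical, and the corrected variance derived in the paper for sample-estimated z is not reproduced here.

```python
import math


def simple_slope(b1, b3, z):
    """Slope of Y on X at moderator value z in Y = b0 + b1*X + b2*Z + b3*X*Z."""
    return b1 + b3 * z


def simple_slope_se(var_b1, var_b3, cov_b1_b3, z):
    """Traditional standard error of the simple slope, treating z as fixed."""
    return math.sqrt(var_b1 + 2.0 * z * cov_b1_b3 + z ** 2 * var_b3)


# Hypothetical estimates and sampling (co)variances from a fitted model.
slope = simple_slope(b1=0.5, b3=0.2, z=1.0)
se = simple_slope_se(var_b1=0.04, var_b3=0.01, cov_b1_b3=-0.005, z=1.0)
t_stat = slope / se
print(slope, se, t_stat)
```

    When z is itself computed from the sample (for example, the sample mean plus one sample SD), this expression ignores the sampling variability of z, which is the situation the abstract reports as inflating Type 1 error rates.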

  2. Greedy and Linear Ensembles of Machine Learning Methods Outperform Single Approaches for QSPR Regression Problems.

    Science.gov (United States)

    Kew, William; Mitchell, John B O

    2015-09-01

    The application of Machine Learning to cheminformatics is a large and active field of research, but there exist few papers which discuss whether ensembles of different Machine Learning methods can improve upon the performance of their component methodologies. Here we investigated a variety of methods, including kernel-based, tree, linear, neural networks, and both greedy and linear ensemble methods. These were all tested against a standardised methodology for regression with data relevant to the pharmaceutical development process. This investigation focused on QSPR problems within drug-like chemical space. We aimed to investigate which methods perform best, and how the 'wisdom of crowds' principle can be applied to ensemble predictors. It was found that no single method performs best for all problems, but that a dynamic, well-structured ensemble predictor would perform very well across the board, usually providing an improvement in performance over the best single method. Its use of weighting factors allows the greedy ensemble to acquire a bigger contribution from the better performing models, and this helps the greedy ensemble generally to outperform the simpler linear ensemble. Choice of data preprocessing methodology was found to be crucial to performance of each method too. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  3. An Introduction to Recursive Partitioning: Rationale, Application, and Characteristics of Classification and Regression Trees, Bagging, and Random Forests

    Science.gov (United States)

    Strobl, Carolin; Malley, James; Tutz, Gerhard

    2009-01-01

    Recursive partitioning methods have become popular and widely used tools for nonparametric regression and classification in many scientific fields. Especially random forests, which can deal with large numbers of predictor variables even in the presence of complex interactions, have been applied successfully in genetics, clinical medicine, and…

  4. Increased tree establishment in Lithuanian peat bogs--insights from field and remotely sensed approaches.

    Science.gov (United States)

    Edvardsson, Johannes; Šimanauskienė, Rasa; Taminskas, Julius; Baužienė, Ieva; Stoffel, Markus

    2015-02-01

    Over the past century an ongoing establishment of Scots pine (Pinus sylvestris L.), sometimes at accelerating rates, is noted at three studied Lithuanian peat bogs, namely Kerėplis, Rėkyva and Aukštumala, all representing different degrees of tree coverage and geographic settings. Present establishment rates seem to depend on tree density on the bog surface and are most significant at sparsely covered sites where about three-fourths of the trees have established since the mid-1990s, whereas the initial establishment in general was during the early to mid-19th century. Three methods were used to detect, compare and describe tree establishment: (1) tree counts in small plots, (2) dendrochronological dating of bog pine trees, and (3) interpretation of aerial photographs and historical maps of the study areas. In combination, the different approaches provide complementary information and also offset each other's drawbacks. Tree counts in plots provided a reasonable overview of age class distributions and enabled capturing of the most recently established trees with ages less than 50 years. The dendrochronological analysis yielded accurate tree ages and a good temporal resolution of long-term changes. Tree establishment and spread interpreted from aerial photographs and historical maps provided a good overview of tree spread and total affected area. It also helped to verify the results obtained with the other methods and to upscale the findings to the entire peat bogs. The ongoing spread of trees in predominantly undisturbed peat bogs is related to warmer and/or drier climatic conditions, and to a minor degree to land-use changes. Our results therefore provide valuable insights into vegetation changes in peat bogs, also with respect to bog response to ongoing and future climatic changes. Copyright © 2014 Elsevier B.V. All rights reserved.

  5. An empirical approach to update multivariate regression models intended for routine industrial use

    Energy Technology Data Exchange (ETDEWEB)

    Garcia-Mencia, M.V.; Andrade, J.M.; Lopez-Mahia, P.; Prada, D. [University of La Coruna, La Coruna (Spain). Dept. of Analytical Chemistry

    2000-11-01

    Many problems currently tackled by analysts are highly complex and, accordingly, multivariate regression models need to be developed. Two intertwined topics are important when such models are to be applied within industrial routines: (1) Did the model account for the 'natural' variance of the production samples? (2) Is the model stable over time? This paper focuses on the second topic and presents an empirical approach in which predictive models developed using Mid-FTIR with PLS and PCR retain their utility for about nine months when used to predict the octane number of platforming naphthas in a petrochemical refinery. 41 refs., 10 figs., 1 tab.

  6. Flux Measurements in Trees: Methodological Approach and Application to Vineyards

    Directory of Open Access Journals (Sweden)

    Francesca De Lorenzi

    2008-03-01

    Full Text Available In this paper a review of two sap flow methods for measuring the transpiration in vineyards is presented. The objective of this work is to examine the potential of detecting transpiration in trees in response to environmental stresses, particularly the high concentration of ozone (O3) in the troposphere. The methods described are the stem heat balance and the thermal dissipation probe; advantages and disadvantages of each method are detailed. Applications of both techniques are shown in two large commercial vineyards in Southern Italy (Apulia and Sicily), subject to a semi-arid climate. Sap flow techniques allow transpiration to be measured at plant scale, and an upscaling procedure is necessary to calculate the transpiration at the whole stand level. Here a general technique to link the value of transpiration at plant level to the canopy value is presented, based on experimental relationships between transpiration and biometric characteristics of the trees. In both vineyards transpiration measured by sap flow methods compares well with evapotranspiration measured by micrometeorological techniques at canopy scale. Moreover, the soil evaporation component has been quantified. In conclusion, comments about the suitability of the sap flow methods for studying the interactions between trees and ozone are given.

  7. Robust Nonlinear Regression: A Greedy Approach Employing Kernels With Application to Image Denoising

    Science.gov (United States)

    Papageorgiou, George; Bouboulis, Pantelis; Theodoridis, Sergios

    2017-08-01

    We consider the task of robust non-linear regression in the presence of both inlier noise and outliers. Assuming that the unknown non-linear function belongs to a Reproducing Kernel Hilbert Space (RKHS), our goal is to estimate the set of the associated unknown parameters. Due to the presence of outliers, common techniques such as the Kernel Ridge Regression (KRR) or the Support Vector Regression (SVR) turn out to be inadequate. Instead, we employ sparse modeling arguments to explicitly model and estimate the outliers, adopting a greedy approach. The proposed robust scheme, i.e., Kernel Greedy Algorithm for Robust Denoising (KGARD), is inspired by the classical Orthogonal Matching Pursuit (OMP) algorithm. Specifically, the proposed method alternates between a KRR task and an OMP-like selection step. Theoretical results concerning the identification of the outliers are provided. Moreover, KGARD is compared against other cutting edge methods, where its performance is evaluated via a set of experiments with various types of noise. Finally, the proposed robust estimation framework is applied to the task of image denoising, and its enhanced performance in the presence of outliers is demonstrated.

  8. Anatomical diversity and regressive evolution in trichomanoid filmy ferns (Hymenophyllaceae): a phylogenetic approach.

    Science.gov (United States)

    Dubuisson, Jean-Yves; Hennequin, Sabine; Bary, Sophie; Ebihara, Atsushi; Boucheron-Dubuisson, Elodie

    2011-12-01

    To infer the anatomical evolution of the Hymenophyllaceae (filmy ferns) and to test previously suggested scenarios of regressive evolution, we performed an exhaustive investigation of stem anatomy in the most variable lineage of the family, the trichomanoids, using a representative sampling of 50 species. The evolution of qualitative and quantitative anatomical characters and possibly related growth-forms was analyzed using a maximum likelihood approach. Potential correlations between selected characters were then statistically tested using a phylogenetic comparative method. Our investigations support the anatomical homogeneity of this family at the generic and sub-generic levels. Reduced and sub-collateral/collateral steles likely derived from an ancestral massive protostele, and sub-collateral/collateral types appear to be related to stem thickness reduction and root apparatus regression. These results corroborate the hypothesis of regressive evolution in the lineage, in terms of morphology as well as anatomy. In addition, a heterogeneous cortex, which is derived in the lineage, appears to be related to a colonial strategy and likely to a climbing phenotype. The evolutionary hypotheses proposed in this study lay the ground for further evolutionary analyses that take into account trichomanoid habitats and accurate ecological preferences.

  9. Impacts of age-dependent tree sensitivity and dating approaches on dendrogeomorphic time series of landslides

    Science.gov (United States)

    Šilhán, Karel; Stoffel, Markus

    2015-05-01

    Different approaches and thresholds have been utilized in the past to date landslides with growth ring series of disturbed trees. Past work was mostly based on conifer species because of their well-defined ring boundaries and the easy identification of compression wood after stem tilting. More recently, work has been expanded to include broad-leaved trees, which are thought to produce less and less evident reactions after landsliding. This contribution reviews recent progress made in dendrogeomorphic landslide analysis and introduces a new approach in which landslides are dated via ring eccentricity formed after tilting. We compare results of this new and the more conventional approaches. In addition, the paper also addresses tree sensitivity to landslide disturbance as a function of tree age and trunk diameter using 119 common beech (Fagus sylvatica L.) and 39 Crimean pine (Pinus nigra ssp. pallasiana) trees growing on two landslide bodies. The landslide events reconstructed with the classical approach (reaction wood) also appear as events in the eccentricity analysis, but the inclusion of eccentricity clearly allowed for more (162%) landslides to be detected in the tree-ring series. With respect to tree sensitivity, conifers and broad-leaved trees show the strongest reactions to landslides at ages comprised between 40 and 60 years, with a second phase of increased sensitivity in P. nigra at ages of ca. 120-130 years. These phases of highest sensitivities correspond with trunk diameters at breast height of 6-8 and 18-22 cm, respectively (P. nigra). This study thus calls for the inclusion of eccentricity analyses in future landslide reconstructions as well as for the selection of trees belonging to different age and diameter classes to allow for a well-balanced and more complete reconstruction of past events.

  10. Partitioning of Multivariate Phenotypes using Regression Trees Reveals Complex Patterns of Adaptation to Climate across the Range of Black Cottonwood (Populus trichocarpa)

    Directory of Open Access Journals (Sweden)

    Regis Wendpouire Oubida

    2015-03-01

    Full Text Available Local adaptation to climate in temperate forest trees involves the integration of multiple physiological, morphological, and phenological traits. Latitudinal clines are frequently observed for these traits, but environmental constraints also track longitude and altitude. We combined extensive phenotyping of 12 candidate adaptive traits, multivariate regression trees, quantitative genetics, and a genome-wide panel of SNP markers to better understand the interplay among geography, climate, and adaptation to abiotic factors in Populus trichocarpa. Heritabilities were low to moderate (0.13 to 0.32) and population differentiation for many traits exceeded the 99th percentile of the genome-wide distribution of FST, suggesting local adaptation. When climate variables were taken as predictors and the 12 traits as response variables in a multivariate regression tree analysis, evapotranspiration (Eref) explained the most variation, with subsequent splits related to mean temperature of the warmest month, frost-free period (FFP), and mean annual precipitation (MAP). These groupings matched relatively well the splits using geographic variables as predictors: the northernmost groups (short FFP and low Eref) had the lowest growth, and lowest cold injury index; the southern British Columbia group (low Eref and intermediate temperatures) had average growth and cold injury index; the group from the coast of California and Oregon (high Eref and FFP) had the highest growth performance and the highest cold injury index; and the southernmost, high-altitude group (with high Eref and low FFP) performed poorly, had high cold injury index, and lower water use efficiency. Taken together, these results suggest variation in both temperature and water availability across the range shape multivariate adaptive traits in poplar.

  11. Using Error-in-Variable Simultaneous Equations Approach to Construct One-way Tree Volume Models and Diameter at Breast Height-Diameter on Root Collar Regression Model for Chinese Fir (Cunninghamia lanceolata) in Southern China

    Institute of Scientific and Technical Information of China (English)

    曾伟生

    2012-01-01

    Based on measured data for Chinese fir (Cunninghamia lanceolata) in southern China, three models, a DBH (diameter at breast height)-based volume model, a DRC (diameter on root collar)-based volume model, and a DBH-DRC regression model, were constructed simultaneously using the error-in-variable simultaneous equations approach. The results showed that DBH is closely related to DRC, with a determination coefficient of the regression of more than 0.96, and that the prediction precision of the DRC-based volume model is clearly lower than that of the DBH-based volume model.

  12. Predicting Chinese Abbreviations from Definitions: An Empirical Learning Approach Using Support Vector Regression

    Institute of Scientific and Technical Information of China (English)

    Xu Sun; Hou-Feng Wang; Bo Wang

    2008-01-01

    In Chinese, phrases and named entities play a central role in information retrieval. Abbreviations, however, make keyword-based approaches less effective. This paper presents an empirical learning approach to Chinese abbreviation prediction. In this study, each abbreviation is taken as a reduced form of the corresponding definition (expanded form), and the abbreviation prediction is formalized as a scoring and ranking problem among abbreviation candidates, which are automatically generated from the corresponding definition. By employing Support Vector Regression (SVR) for scoring, we can obtain multiple abbreviation candidates together with their SVR values, which are used for candidate ranking. Experimental results show that the SVR method performs better than the popular heuristic rule of abbreviation prediction. In addition, in abbreviation prediction, the SVR method outperforms the hidden Markov model (HMM).

  13. An adaptive online learning approach for Support Vector Regression: Online-SVR-FID

    Science.gov (United States)

    Liu, Jie; Zio, Enrico

    2016-08-01

    Support Vector Regression (SVR) is a popular supervised data-driven approach for building empirical models from available data. Like all data-driven methods, under non-stationary environmental and operational conditions it needs to be provided with adaptive learning capabilities, which might become computationally burdensome with large datasets accumulating dynamically. In this paper, a cost-efficient online adaptive learning approach is proposed for SVR by combining Feature Vector Selection (FVS) and Incremental and Decremental Learning. The proposed approach adaptively modifies the model only when different pattern drifts are detected according to proposed criteria. Two tolerance parameters are introduced in the approach to control the computational complexity, reduce the influence of the intrinsic noise in the data and avoid the overfitting problem of SVR. The prediction results are compared with those of other online learning approaches, e.g. NORMA, SOGA, KRLS and Incremental Learning, on several artificial datasets and a real case study concerning time series prediction based on data recorded on a component of a nuclear power generation system. The performance indicators MSE and MARE computed on the test dataset demonstrate the efficiency of the proposed online learning method.

  14. Statistical Downscaling: A Comparison of Multiple Linear Regression and k-Nearest Neighbor Approaches

    Science.gov (United States)

    Gangopadhyay, S.; Clark, M. P.; Rajagopalan, B.

    2002-12-01

    The success of short term (days to fortnight) streamflow forecasting largely depends on the skill of surface climate (e.g., precipitation and temperature) forecasts at local scales in the individual river basins. The surface climate forecasts are used to drive the hydrologic models for streamflow forecasting. Typically, Medium Range Forecast (MRF) models provide forecasts of large scale circulation variables (e.g. pressures, wind speed, relative humidity etc.) at different levels in the atmosphere on a regular grid - which are then used to "downscale" to the surface climate at locations within the model grid box. Several statistical and dynamical methods are available for downscaling. This paper compares the utility of two statistical downscaling methodologies: (1) multiple linear regression (MLR) and (2) a nonparametric approach based on k-nearest neighbor (k-NN) bootstrap method, in providing local-scale information of precipitation and temperature at a network of stations in the Upper Colorado River Basin. Downscaling to the stations is based on output of large scale circulation variables (i.e. predictors) from the NCEP Medium Range Forecast (MRF) database. Fourteen-day six hourly forecasts are developed using these two approaches, and their forecast skill evaluated. A stepwise regression is performed at each location to select the predictors for the MLR. The k-NN bootstrap technique resamples historical data based on their "nearness" to the current pattern in the predictor space. Prior to resampling a Principal Component Analysis (PCA) is performed on the predictor set to identify a small subset of predictors. Preliminary results using the MLR technique indicate a significant value in the downscaled MRF output in predicting runoff in the Upper Colorado Basin. 
    It is expected that the k-NN approach will match the skill of the MLR approach at individual stations, and will have the added advantage of preserving the spatial co-variability between stations, capturing

  15. An improved Approach for Document Retrieval Using Suffix Trees

    Directory of Open Access Journals (Sweden)

    N. Sandhya

    2011-09-01

    Full Text Available A huge collection of documents is available at a few mouse clicks. The current World Wide Web is a web of pages. Users have to guess possible keywords that might lead through search engines to the pages that contain information of interest and browse hundreds or even thousands of the returned pages in order to obtain what they want. In our work we build a generalized suffix tree for our documents and propose a search technique for retrieving documents based on a sort of phrase called word sequences. Our proposed method efficiently searches for a given phrase (with missing or additional words in between) with better performance.

  16. An efficient approach to 3D single tree-crown delineation in LiDAR data

    Science.gov (United States)

    Mongus, Domen; Žalik, Borut

    2015-10-01

    This paper proposes a new method for 3D delineation of single tree-crowns in LiDAR data by exploiting the complementarities of treetop and tree trunk detections. A unified mathematical framework is provided based on graph theory, allowing for all the segmentations to be achieved using marker-controlled watersheds. Treetops are defined by detecting concave neighbourhoods within the canopy height model using locally fitted surfaces. These serve as markers for watershed segmentation of the canopy layer where possible oversegmentation is reduced by merging the regions based on their heights, areas, and shapes. Additional tree crowns are delineated from mid- and under-storey layers based on tree trunk detection. A new approach for estimating the verticalities of the points' distributions is proposed for this purpose. The watershed segmentation is then applied on a density function within the voxel space, while boundaries of delineated trees from the canopy layer are used to prevent the overspreading of regions. The experiments show an approximately 6% increase in the efficiency of the proposed treetop definition based on locally fitted surfaces in comparison with the traditionally used local maxima of the smoothed canopy height model. In addition, a 4% increase in efficiency is achieved by the proposed tree trunk detection. Although the tree trunk detection alone is dependent on the data density, when it is supplemented with the treetop detection the proposed approach is efficient even when dealing with low density point-clouds.
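
    Illustrative note (not part of the record): the baseline that the abstract compares against, treetops taken as local maxima of a canopy height model (CHM), can be sketched as below. This is the traditional approach only, not the locally-fitted-surface method the paper proposes, and it skips the smoothing step. The grid, the 3 x 3 neighbourhood and the `min_height` cut-off are assumptions made for illustration.

```python
def local_maxima(chm, min_height=2.0):
    """Return (row, col) cells that are strictly higher than all of their
    neighbours in a 3 x 3 window and at least min_height tall."""
    rows, cols = len(chm), len(chm[0])
    tops = []
    for r in range(rows):
        for c in range(cols):
            height = chm[r][c]
            if height < min_height:
                continue  # ignore ground and low vegetation
            neighbours = [
                chm[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(height > n for n in neighbours):
                tops.append((r, c))
    return tops


# Hypothetical 5 x 5 canopy height model in metres with two crowns.
chm = [
    [0.0, 1.0, 1.0, 0.0, 0.0],
    [1.0, 9.5, 4.0, 1.0, 0.0],
    [1.0, 4.0, 3.0, 5.0, 1.0],
    [0.0, 1.0, 5.0, 12.0, 4.0],
    [0.0, 0.0, 1.0, 4.0, 3.0],
]
treetops = local_maxima(chm)
print(treetops)
```

    In a marker-controlled watershed, cells found this way would serve as the markers from which crown regions are grown.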

  17. A Tree-based Approach for Modelling Interception Loss From Evergreen Oak Mediterranean Savannas

    Science.gov (United States)

    Pereira, Fernando L.; Gash, John H. C.; David, Jorge S.; David, Teresa S.; Monteiro, Paulo R.; Valente, Fernanda

    2010-05-01

    Evaporation of rainfall intercepted by tree canopies is usually an important part of the overall water balance of forested catchments and there have been many studies dedicated to measuring and modelling rainfall interception loss. These studies have mainly been conducted in dense forests; there have been few studies on the very sparse forests which are common in dry and semi-arid areas. Water resources are scarce in these areas, making sparse forests particularly important. Methods for modelling interception loss are thus required to support sustainable water management in those areas. In very sparse forests, trees occur as widely spaced individuals rather than as a continuous forest canopy. We therefore suggest that interception loss for this vegetation type can be more adequately modelled if the overall forest evaporation is derived by scaling up the evaporation from individual trees. The evaporation rate for a single tree can be estimated using a simple Dalton-type diffusion equation for water vapour as long as its surface temperature is known. From theory, this temperature is shown to be dependent upon the available energy and windspeed. However, the surface temperature of a fully saturated tree crown, under rainy conditions, should approach the wet bulb temperature as the radiative energy input to the tree reduces to zero. This was experimentally confirmed from measurements of the radiation balance and surface temperature of an isolated tree crown. Thus, evaporation of intercepted rainfall can be estimated using an equation which only requires knowledge of the air dry and wet bulb temperatures and of the bulk tree-crown aerodynamic conductance. This was taken as the basis of a new approach for modelling interception loss from savanna-type woodland, i.e. by combining the Dalton-type equation with Gash's analytical model to estimate interception loss from isolated trees. This modelling approach was tested using data from two Mediterranean savanna-type oak
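    The reasoning in this abstract can be written out as a short derivation. The notation is an assumption and is not taken from the source: $\lambda E$ is the latent heat flux from the wet crown, $\rho c_p$ the volumetric heat capacity of air, $\gamma$ the psychrometric constant, $g_a$ the bulk crown aerodynamic conductance, $e_s(T)$ the saturation vapour pressure at temperature $T$, $e_a$ the air vapour pressure, and $T_a$, $T_w$, $T_s$ the dry-bulb, wet-bulb and crown surface temperatures.

```latex
% Dalton-type diffusion equation for a saturated crown surface at temperature T_s
\lambda E = \frac{\rho c_p}{\gamma}\, g_a \left[ e_s(T_s) - e_a \right]

% Psychrometric relation linking air vapour pressure to the wet-bulb temperature
e_a = e_s(T_w) - \gamma \,(T_a - T_w)

% Setting T_s = T_w (saturated crown, negligible radiative input) and substituting
\lambda E = \rho c_p \, g_a \,(T_a - T_w)
```

    Under these assumptions the result depends only on the dry- and wet-bulb temperatures and the aerodynamic conductance, which is consistent with the inputs the abstract lists.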

  18. Binary Logistic Regression Versus Boosted Regression Trees in Assessing Landslide Susceptibility for Multiple-Occurring Regional Landslide Events: Application to the 2009 Storm Event in Messina (Sicily, southern Italy).

    Science.gov (United States)

    Lombardo, L.; Cama, M.; Maerker, M.; Parisi, L.; Rotigliano, E.

    2014-12-01

    This study aims at comparing the performances of Binary Logistic Regression (BLR) and Boosted Regression Trees (BRT) methods in assessing landslide susceptibility for multiple-occurrence regional landslide events within the Mediterranean region. A test area was selected in the north-eastern sector of Sicily (southern Italy), corresponding to the catchments of the Briga and the Giampilieri streams, both stretching for a few kilometres from the Peloritan ridge (eastern Sicily, Italy) to the Ionian Sea. This area was struck on 1 October 2009 by an extreme climatic event resulting in thousands of rapid shallow landslides, mainly of the debris flow and debris avalanche types, involving the weathered layer of a low- to high-grade metamorphic bedrock. Exploiting the same set of predictors and the 2009 landslide archive, BLR- and BRT-based susceptibility models were obtained for the two catchments separately, adopting a random partition (RP) technique for validation; in addition, the models trained in one of the two catchments (Briga) were tested in predicting the landslide distribution in the other (Giampilieri), adopting a spatial partition (SP) based validation procedure. All the validation procedures were based on multi-fold tests so as to evaluate and compare the reliability of the fitting, the prediction skill, the coherence in the predictor selection and the precision of the susceptibility estimates. All the obtained models for the two methods produced very high predictive performances, with a general congruence between BLR and BRT in the predictor importance. In particular, the research highlighted that BRT models reached a higher prediction performance than BLR models for RP-based modelling, whilst for the SP-based models the difference in predictive skill between the two methods dropped drastically, converging to an analogous excellent performance. However, when looking at the precision of the probability estimates, BLR demonstrated to produce more robust

  19. Using New Approaches to obtain Gibbs Measures of Vannimenus model on a Cayley tree

    OpenAIRE

    2015-01-01

    In this paper, we consider the Vannimenus model with competing nearest-neighbor and prolonged next-nearest-neighbor interactions on a Cayley tree. For this model we define Markov random fields with memory of length 2. By using a new approach, we obtain new sets of Gibbs measures of the Ising-Vannimenus model on a Cayley tree of order 2. We construct the recurrence equations corresponding to the Ising-Vannimenus model. We prove the Kolmogorov consistency condition. We investigate the translation-invariant an...

  20. A MULTIVARIATE APPROACH TO ANALYSE NATIVE FOREST TREE SPECIE SEEDS

    Directory of Open Access Journals (Sweden)

    Alessandro Dal Col Lúcio

    2006-03-01

    Full Text Available This work grouped, by species, the most similar seed tree, using the variables observed in exotic forest species of the Brazilian flora of seeds collected in the Forest Research and Soil Conservation Center of Santa Maria, Rio Grande do Sul, analyzed from January 1997 to March 2003. For the cluster analysis, all the species that possessed four or more analyses per lot were analyzed by the hierarchical clustering method, using the standardized mean Euclidean distance; a principal component analysis was also used to reduce the number of variables. The species Callistemon speciosus, Cassia fistula, Eucalyptus grandis, Eucalyptus robusta, Eucalyptus saligna, Eucalyptus tereticornis, Delonix regia, Jacaranda mimosaefolia and Pinus elliottii presented more than four analyses per lot, in which the third and fourth principal components explained 80% of the total variation. The cluster analysis was efficient in the separation of the groups of all tested species, as was the principal component method.

  1. A single-ensemble-based hybrid approach to clutter rejection combining bilinear Hankel with regression.

    Science.gov (United States)

    Shen, Zhiyuan; Feng, Naizhang; Lee, Chin-Hui

    2013-04-01

    Clutter regarded as ultrasound Doppler echoes of soft tissue interferes with the primary objective of color flow imaging (CFI): measurement and display of blood flow. Multi-ensemble samples based clutter filters degrade resolution or frame rate of CFI. The prevalent single-ensemble clutter rejection filter is based on a single rejection criterion and fails to achieve a high accuracy for estimating both the low- and high-velocity blood flow components. The Bilinear Hankel-SVD achieved more exact signal decomposition than the conventional Hankel-SVD. Furthermore, the correlation between two arbitrary eigen-components obtained by the B-Hankel-SVD was demonstrated. In the hybrid approach, the input ultrasound Doppler signal first passes through a low-order regression filter, and then the output is properly decomposed into a collection of eigen-components under the framework of B-Hankel-SVD. The blood flow components are finally extracted based on a frequency threshold. In a series of simulations, the proposed B-Hankel-SVD filter reduced the estimation bias of the blood flow over the conventional Hankel-SVD filter. The hybrid algorithm was shown to be more effective than regression or Hankel-SVD filters alone in rejecting the undesirable clutter components with single-ensemble (S-E) samples. It achieved a significant improvement in blood flow frequency estimation and estimation variance over the other competing filters.

  2. A quantile regression approach to the analysis of the quality of life determinants in the elderly

    Directory of Open Access Journals (Sweden)

    Serena Broccoli

    2013-05-01

    Full Text Available Objective. The aim of this study is to explain the effect of important covariates on the health-related quality of life (HRQoL) in elderly subjects. Methods. Data were collected within a longitudinal study that involves 5256 subjects, aged ≥ 65. The Visual Analogue Scale included in the EQ-5D Questionnaire, the EQ-VAS, was used to obtain a synthetic measure of quality of life. To model the EQ-VAS score, a quantile regression analysis was employed. This methodological approach was preferred to an OLS regression because of the typical distribution of the EQ-VAS score. The main covariates are: amount of weekly physical activity, reported problems in Activities of Daily Living (ADL), presence of cardiovascular diseases, diabetes, hypercholesterolemia, hypertension, joint pain, as well as socio-demographic information. Main Results. (1) Even a low level of physical activity significantly influences quality of life in a positive way; (2) ADL problems, at least one cardiovascular disease and joint pain strongly decrease the quality of life.
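    The choice of quantile regression over OLS mentioned in this abstract rests on the check (pinball) loss, whose minimizer is a quantile rather than the mean. The sketch below illustrates only that property for a constant predictor; the sample values are invented for illustration and are not study data, and `pinball_loss` and `best_constant` are hypothetical names.

```python
def pinball_loss(values, prediction, tau):
    """Mean check loss of a constant prediction at quantile level tau."""
    total = 0.0
    for v in values:
        if v >= prediction:
            total += tau * (v - prediction)          # under-prediction penalty
        else:
            total += (1 - tau) * (prediction - v)    # over-prediction penalty
    return total / len(values)


def best_constant(values, tau):
    """Observed value that minimizes the pinball loss (an empirical tau-quantile)."""
    return min(sorted(set(values)), key=lambda c: pinball_loss(values, c, tau))


# Invented, right-skewed scores used purely for illustration
scores = [10, 20, 30, 40, 100]
print(best_constant(scores, 0.5))   # minimizer at the median level
print(best_constant(scores, 0.75))  # minimizer at the upper-quartile level
```

    A full quantile regression replaces the constant with a linear function of the covariates and minimizes the same loss, so each quantile level gets its own set of coefficients.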

  3. Integrated Analysis of Tropical Trees Growth: A Multivariate Approach

    Science.gov (United States)

    YÁÑEZ-ESPINOSA, LAURA; TERRAZAS, TERESA; LÓPEZ-MATA, LAURO

    2006-01-01

    • Background and Aims One of the problems analysing cause–effect relationships of growth and environmental factors is that a single factor could be correlated with other ones directly influencing growth. One attempt to understand tropical trees' growth cause–effect relationships is integrating research about anatomical, physiological and environmental factors that influence growth in order to develop mathematical models. The relevance is to understand the nature of the process of growth and to model this as a function of the environment. • Methods The relationships of Aphananthe monoica, Pleuranthodendron lindenii and Psychotria costivenia radial growth and phenology with environmental factors (local climate, vertical strata microclimate and physical and chemical soil variables) were evaluated from April 2000 to September 2001. The association among these groups of variables was determined by generalized canonical correlation analysis (GCCA), which considers the probable associations of three or more data groups and the selection of the most important variables for each data group. • Key Results The GCCA allowed determination of a general model of relationships among tree phenology and radial growth with climate, microclimate and soil factors. A strong influence of climate in phenology and radial growth existed. Leaf initiation and cambial activity periods were associated with maximum temperature and day length, and vascular tissue differentiation with soil moisture and rainfall. The analyses of individual species detected different relationships for the three species. • Conclusions The analyses of the individual species suggest that each one takes advantage in a different way of the environment in which they are growing, allowing them to coexist. PMID:16822807

  4. RAVEN: Dynamic Event Tree Approach Level III Milestone

    Energy Technology Data Exchange (ETDEWEB)

    Andrea Alfonsi; Cristian Rabiti; Diego Mandelli; Joshua Cogliati; Robert Kinoshita

    2013-07-01

    Conventional Event-Tree (ET) based methodologies are extensively used as tools to perform reliability and safety assessment of complex and critical engineering systems. One of the disadvantages of these methods is that timing/sequencing of events and system dynamics are not explicitly accounted for in the analysis. In order to overcome these limitations several techniques, also known as Dynamic Probabilistic Risk Assessment (DPRA), have been developed. Monte-Carlo (MC) and Dynamic Event Tree (DET) are two of the most widely used DPRA methodologies to perform safety assessment of Nuclear Power Plants (NPP). In the past two years, the Idaho National Laboratory (INL) has developed its own tool to perform Dynamic PRA: RAVEN (Reactor Analysis and Virtual control ENvironment). RAVEN has been designed to perform two main tasks: 1) control logic driver for the new Thermo-Hydraulic code RELAP-7 and 2) post-processing tool. In the first task, RAVEN acts as a deterministic controller in which the set of control logic laws (user defined) monitors the RELAP-7 simulation and controls the activation of specific systems. Moreover, the control logic infrastructure is used to model stochastic events, such as component failures, and perform uncertainty propagation. Such stochastic modeling is deployed using both MC and DET algorithms. In the second task, RAVEN processes the large amount of data generated by RELAP-7 using data-mining based algorithms. This report focuses on the analysis of dynamic stochastic systems using the newly developed RAVEN DET capability. As an example, a DPRA analysis, using DET, of a simplified pressurized water reactor for a Station Black-Out (SBO) scenario is presented.

  5. RAVEN. Dynamic Event Tree Approach Level III Milestone

    Energy Technology Data Exchange (ETDEWEB)

    Alfonsi, Andrea [Idaho National Lab. (INL), Idaho Falls, ID (United States); Rabiti, Cristian [Idaho National Lab. (INL), Idaho Falls, ID (United States); Mandelli, Diego [Idaho National Lab. (INL), Idaho Falls, ID (United States); Cogliati, Joshua [Idaho National Lab. (INL), Idaho Falls, ID (United States); Kinoshita, Robert [Idaho National Lab. (INL), Idaho Falls, ID (United States)

    2014-07-01

    Conventional Event-Tree (ET) based methodologies are extensively used as tools to perform reliability and safety assessment of complex and critical engineering systems. One of the disadvantages of these methods is that timing/sequencing of events and system dynamics are not explicitly accounted for in the analysis. In order to overcome these limitations several techniques, also known as Dynamic Probabilistic Risk Assessment (DPRA), have been developed. Monte-Carlo (MC) and Dynamic Event Tree (DET) are two of the most widely used DPRA methodologies to perform safety assessment of Nuclear Power Plants (NPP). In the past two years, the Idaho National Laboratory (INL) has developed its own tool to perform Dynamic PRA: RAVEN (Reactor Analysis and Virtual control ENvironment). RAVEN has been designed to perform two main tasks: 1) control logic driver for the new Thermo-Hydraulic code RELAP-7 and 2) post-processing tool. In the first task, RAVEN acts as a deterministic controller in which the set of control logic laws (user defined) monitors the RELAP-7 simulation and controls the activation of specific systems. Moreover, the control logic infrastructure is used to model stochastic events, such as component failures, and perform uncertainty propagation. Such stochastic modeling is deployed using both MC and DET algorithms. In the second task, RAVEN processes the large amount of data generated by RELAP-7 using data-mining based algorithms. This report focuses on the analysis of dynamic stochastic systems using the newly developed RAVEN DET capability. As an example, a DPRA analysis, using DET, of a simplified pressurized water reactor for a Station Black-Out (SBO) scenario is presented.

  6. Regression analysis based on conditional likelihood approach under semi-competing risks data.

    Science.gov (United States)

    Hsieh, Jin-Jian; Huang, Yu-Ting

    2012-07-01

    Medical studies often involve semi-competing risks data, which consist of two types of events, namely terminal event and non-terminal event. Because the non-terminal event may be dependently censored by the terminal event, it is not possible to make inference on the non-terminal event without extra assumptions. Therefore, this study assumes that the dependence structure on the non-terminal event and the terminal event follows a copula model, and lets the marginal regression models of the non-terminal event and the terminal event both follow time-varying effect models. This study uses a conditional likelihood approach to estimate the time-varying coefficient of the non-terminal event, and proves the large sample properties of the proposed estimator. Simulation studies show that the proposed estimator performs well. This study also uses the proposed method to analyze AIDS Clinical Trial Group (ACTG 320).

  7. The relation of student behavior, peer status, race, and gender to decisions about school discipline using CHAID decision trees and regression modeling.

    Science.gov (United States)

    Horner, Stacy B; Fireman, Gary D; Wang, Eugene W

    2010-04-01

    Peer nominations and demographic information were collected from a diverse sample of 1493 elementary school participants to examine behavior (overt and relational aggression, impulsivity, and prosociality), context (peer status), and demographic characteristics (race and gender) as predictors of teacher and administrator decisions about discipline. Exploratory results using classification tree analyses indicated students nominated as average or highly overtly aggressive were more likely to be disciplined than others. Among these students, race was the most significant predictor, with African American students more likely to be disciplined than Caucasians, Hispanics, or Others. Among the students nominated as low in overt aggression, a lack of prosocial behavior was the most significant predictor. Confirmatory analysis using hierarchical logistic regression supported the exploratory results. Similarities with other biased referral patterns, proactive classroom management strategies, and culturally sensitive recommendations are discussed.

  8. Analysis of the Importance of Oxides and Clays in Cd, Cr, Cu, Ni, Pb and Zn Adsorption and Retention with Regression Trees

    Science.gov (United States)

    González-Costa, Juan José; Reigosa, Manuel Joaquín; Matías, José María; Fernández-Covelo, Emma

    2017-01-01

    This study determines the influence of the different soil components and of the cation-exchange capacity on the adsorption and retention of different heavy metals: cadmium, chromium, copper, nickel, lead and zinc. In order to do so, regression models were created through decision trees and the importance of soil components was assessed. Used variables were: humified organic matter, specific cation-exchange capacity, percentages of sand and silt, proportions of Mn, Fe and Al oxides and hematite, and the proportion of quartz, plagioclase and mica, and the proportions of the different clays: kaolinite, vermiculite, gibbsite and chlorite. The most important components in the obtained models were vermiculite and gibbsite, especially for the adsorption of cadmium and zinc, while clays were less relevant. Oxides are less important than clays, especially for the adsorption of chromium and lead and the retention of chromium, copper and lead. PMID:28072849
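    The core step of a regression tree, on which models like the one in this abstract are built, is choosing the threshold on one predictor that most reduces the squared error of the response. The sketch below shows that single step only; it is not the authors' procedure, the numbers are invented, and `sum_squared_error` and `best_split` are hypothetical names.

```python
def sum_squared_error(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, error) for the best single split of y on predictor x.

    The threshold is None when no split improves on the unsplit error.
    """
    pairs = sorted(zip(x, y))
    best_threshold, best_error = None, sum_squared_error(y)
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot split between identical predictor values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [target for _, target in pairs[:i]]
        right = [target for _, target in pairs[i:]]
        error = sum_squared_error(left) + sum_squared_error(right)
        if error < best_error:
            best_threshold, best_error = threshold, error
    return best_threshold, best_error


# Invented data: the response jumps when the predictor passes from 3 to 10
predictor = [1, 2, 3, 10, 11, 12]
response = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
print(best_split(predictor, response))
```

    A common importance measure sums the error reduction each variable achieves across all splits in the tree; whether the study used that measure is not stated in the abstract.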

  9. Analysis of the Importance of Oxides and Clays in Cd, Cr, Cu, Ni, Pb and Zn Adsorption and Retention with Regression Trees.

    Science.gov (United States)

    González-Costa, Juan José; Reigosa, Manuel Joaquín; Matías, José María; Fernández-Covelo, Emma

    2017-01-01

    This study determines the influence of the different soil components and of the cation-exchange capacity on the adsorption and retention of different heavy metals: cadmium, chromium, copper, nickel, lead and zinc. In order to do so, regression models were created through decision trees and the importance of soil components was assessed. Used variables were: humified organic matter, specific cation-exchange capacity, percentages of sand and silt, proportions of Mn, Fe and Al oxides and hematite, and the proportion of quartz, plagioclase and mica, and the proportions of the different clays: kaolinite, vermiculite, gibbsite and chlorite. The most important components in the obtained models were vermiculite and gibbsite, especially for the adsorption of cadmium and zinc, while clays were less relevant. Oxides are less important than clays, especially for the adsorption of chromium and lead and the retention of chromium, copper and lead.

  10. Quantifying the ability of environmental parameters to predict soil texture fractions using regression-tree model with GIS and LIDAR data

    DEFF Research Database (Denmark)

    Greve, Mogens Humlekrog; Bou Kheir, Rania; Greve, Mette Balslev

    2012-01-01

    sand, silt, and clay in soil determines its textural classification. This study used Geographic Information Systems (GIS) and regression-tree modeling to precisely quantify the relationships between the soil texture fractions and different environmental parameters on a national scale, and to detect...... precipitation, seasonal precipitation to statistically explain soil texture fractions field/laboratory measurements (45,224 sampling sites) in the area of interest (Denmark). The developed strongest relationships were associated with clay and silt, variance being equal to 60%, followed by coarse sand (54.......5%) and fine sand (52%) as the weakest relationship. This study also showed that parent materials (with a relative importance varying between 47% and 100%), geographic regions (31–100%) and landscape types (68–100%) considerably influenced all soil texture fractions, which is not the case for climate and DEM...

  11. Genome trees constructed using five different approaches suggest new major bacterial clades

    Directory of Open Access Journals (Sweden)

    Tatusov Roman L

    2001-10-01

    Full Text Available Abstract Background: The availability of multiple complete genome sequences from diverse taxa prompts the development of new phylogenetic approaches, which attempt to incorporate information derived from comparative analysis of complete gene sets or large subsets thereof. Such attempts are particularly relevant because of the major role of horizontal gene transfer and lineage-specific gene loss, at least in the evolution of prokaryotes. Results: Five largely independent approaches were employed to construct trees for completely sequenced bacterial and archaeal genomes: (i) presence-absence of genomes in clusters of orthologous genes; (ii) conservation of local gene order (gene pairs) among prokaryotic genomes; (iii) parameters of identity distribution for probable orthologs; (iv) analysis of concatenated alignments of ribosomal proteins; (v) comparison of trees constructed for multiple protein families. All constructed trees support the separation of the two primary prokaryotic domains, bacteria and archaea, as well as some terminal bifurcations within the bacterial and archaeal domains. Beyond these obvious groupings, the trees made with different methods appeared to differ substantially in terms of the relative contributions of phylogenetic relationships and similarities in gene repertoires caused by similar life styles and horizontal gene transfer to the tree topology. The trees based on presence-absence of genomes in orthologous clusters and the trees based on conserved gene pairs appear to be strongly affected by gene loss and horizontal gene transfer. The trees based on identity distributions for orthologs and particularly the tree made of concatenated ribosomal protein sequences seemed to carry a stronger phylogenetic signal. The latter tree supported three potential high-level bacterial clades: (i) Chlamydia-Spirochetes, (ii) Thermotogales-Aquificales (bacterial hyperthermophiles), and (iii) Actinomycetes-Deinococcales-Cyanobacteria. The latter group also

  12. A covariate-adjustment regression model approach to noninferiority margin definition.

    Science.gov (United States)

    Nie, Lei; Soon, Guoxing

    2010-05-10

    To maintain the interpretability of the effect of experimental treatment (EXP) obtained from a noninferiority trial, current statistical approaches often require the constancy assumption. This assumption typically requires that the control treatment effect in the population of the active control trial is the same as its effect presented in the population of the historical trial. To prevent constancy assumption violation, clinical trial sponsors were recommended to make sure that the design of the active control trial is as close to the design of the historical trial as possible. However, these rigorous requirements are rarely fulfilled in practice. The inevitable discrepancies between the historical trial and the active control trial have led to debates on many controversial issues. Without support from a well-developed quantitative method to determine the impact of the discrepancies on the constancy assumption violation, a correct judgment seems difficult. In this paper, we present a covariate-adjustment generalized linear regression model approach to achieve two goals: (1) to quantify the impact of population difference between the historical trial and the active control trial on the degree of constancy assumption violation and (2) to redefine the active control treatment effect in the active control trial population if the quantification suggests an unacceptable violation. Through achieving goal (1), we examine whether or not a population difference leads to an unacceptable violation. Through achieving goal (2), we redefine the noninferiority margin if the violation is unacceptable. This approach allows us to correctly determine the effect of EXP in the noninferiority trial population when constancy assumption is violated due to the population difference. We illustrate the covariate-adjustment approach through a case study.

  13. Materialized View Selection Approach Using Tree Based Methodology

    Directory of Open Access Journals (Sweden)

    MR. P. P. KARDE

    2010-10-01

    Full Text Available In large databases, particularly distributed databases, query response time plays an important role, as timely access to information is a basic requirement of a successful business application. A data warehouse uses multiple materialized views to efficiently process a given set of queries. Quick response time and accuracy are important factors in the success of any database. Materializing all views is not possible because of space and maintenance cost constraints. Selection of materialized views is one of the most important decisions in designing a data warehouse for optimal efficiency. Selecting a suitable set of views that minimizes the total cost associated with the materialized views is the key component in data warehousing. Materialized views are found to be very useful for fast query processing. This paper gives the results of the proposed tree-based materialized view selection algorithm for query processing. In a distributed environment, where the database is distributed over nodes, the choice of node on which a query is executed also plays an important role. This paper also proposes a node selection algorithm for fast materialized view selection in a distributed environment. Finally, it is found that the proposed methodology performs better for query processing than other materialized view selection strategies.

  14. Tree level hydrodynamic approach for resolving aboveground water storage and stomatal conductance and modeling the effects of tree hydraulic strategy

    Science.gov (United States)

    Mirfenderesgi, Golnazalsadat; Bohrer, Gil; Matheny, Ashley M.; Fatichi, Simone; Moraes Frasson, Renato Prata; Schäfer, Karina V. R.

    2016-07-01

    The finite difference ecosystem-scale tree crown hydrodynamics model version 2 (FETCH2) is a tree-scale hydrodynamic model of transpiration. The FETCH2 model employs a finite difference numerical methodology and a simplified single-beam conduit system to explicitly resolve xylem water potentials throughout the vertical extent of a tree. Empirical equations relate water potential within the stem to stomatal conductance of the leaves at each height throughout the crown. While highly simplified, this approach brings additional realism to the simulation of transpiration by linking stomatal responses to stem water potential rather than directly to soil moisture, as is currently the case in the majority of land surface models. FETCH2 accounts for plant hydraulic traits, such as the degree of anisohydric/isohydric response of stomata, maximal xylem conductivity, vertical distribution of leaf area, and maximal and minimal xylem water content. We used FETCH2 along with sap flow and eddy covariance data sets collected from a mixed plot of two genera (oak/pine) in Silas Little Experimental Forest, NJ, USA, to conduct an analysis of the intergeneric variation of hydraulic strategies and their effects on diurnal and seasonal transpiration dynamics. We define these strategies through the parameters that describe the genus level transpiration and xylem conductivity responses to changes in stem water potential. Our evaluation revealed that FETCH2 considerably improved the simulation of ecosystem transpiration and latent heat flux in comparison to more conventional models. A virtual experiment showed that the model was able to capture the effect of hydraulic strategies such as isohydric/anisohydric behavior on stomatal conductance under different soil-water availability conditions.

  15. Explaining the heterogeneous scrapie surveillance figures across Europe: a meta-regression approach

    Directory of Open Access Journals (Sweden)

    Ru Giuseppe

    2007-06-01

    Full Text Available Abstract Background: Two annual surveys, the abattoir and the fallen stock, monitor the presence of scrapie across Europe. A simple comparison between the prevalence estimates in different countries reveals that, in 2003, the abattoir survey appears to detect more scrapie in some countries. This is contrary to evidence suggesting the greater ability of the fallen stock survey to detect the disease. We applied meta-analysis techniques to study this apparent heterogeneity in the behaviour of the surveys across Europe. Furthermore, we conducted a meta-regression analysis to assess the effect of country-specific characteristics on the variability. We have chosen the odds ratios between the two surveys to inform the underlying relationship between them and to allow comparisons between the countries under the meta-regression framework. Baseline risks, those of the slaughtered populations across Europe, and country-specific covariates, available from the European Commission Report, were inputted in the model to explain the heterogeneity. Results: Our results show the presence of significant heterogeneity in the odds ratios between countries and no reduction in the variability after adjustment for the different risks in the baseline populations. Three countries contributed the most to the overall heterogeneity: Germany, Ireland and The Netherlands. The inclusion of country-specific covariates did not, in general, reduce the variability except for one variable: the proportion of the total adult sheep population sampled as fallen stock by each country. A large residual heterogeneity remained in the model, indicating the presence of substantial effect variability between countries. Conclusion: The meta-analysis approach was useful to assess the level of heterogeneity in the implementation of the surveys and to explore the reasons for the variation between countries.
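    The two quantities this abstract relies on, a per-country odds ratio between surveys and a measure of between-country heterogeneity, can be sketched as below. This is a generic fixed-effect calculation (log odds ratio with its standard error, then Cochran's Q), not the authors' model; it omits the meta-regression covariates, the counts are invented, and the function names are hypothetical.

```python
import math


def log_odds_ratio(pos_a, neg_a, pos_b, neg_b):
    """Log odds ratio of survey A versus survey B, with its standard error."""
    log_or = math.log((pos_a * neg_b) / (neg_a * pos_b))
    std_err = math.sqrt(1 / pos_a + 1 / neg_a + 1 / pos_b + 1 / neg_b)
    return log_or, std_err


def cochran_q(effects):
    """Inverse-variance pooled effect and Cochran's Q for (effect, std_err) pairs."""
    weights = [1 / std_err ** 2 for _, std_err in effects]
    pooled = sum(w * eff for w, (eff, _) in zip(weights, effects)) / sum(weights)
    q = sum(w * (eff - pooled) ** 2 for w, (eff, _) in zip(weights, effects))
    return pooled, q


# Invented counts per country: (positives A, negatives A, positives B, negatives B)
countries = [(10, 90, 5, 95), (30, 970, 8, 992), (4, 496, 12, 488)]
effects = [log_odds_ratio(*counts) for counts in countries]
pooled, q = cochran_q(effects)
print(math.exp(pooled), q)
```

    Q is compared against a chi-squared distribution with one fewer degrees of freedom than the number of countries; a large Q relative to that reference indicates heterogeneity of the kind the abstract reports.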

  16. Odor-baited trap trees: a new approach to monitoring plum curculio (Coleoptera: Curculionidae).

    Science.gov (United States)

    Prokopy, Ronald J; Chandler, Bradley W; Dynok, Sara A; Piñero, Jaime C

    2003-06-01

    We compared a trap approach with a trap-tree approach to determine the need and timing of insecticide applications against overwintered adult plum curculios, Conotrachelus nenuphar (Herbst.), in commercial apple orchards in Massachusetts in 2002. All traps and trap trees were baited with benzaldehyde (attractive fruit odor) plus grandisoic acid (attractive pheromone). Sticky clear Plexiglas panel traps placed at orchard borders, designed to intercept adults immigrating from border areas by flight, captured significantly more adults than similarly placed black pyramid traps, which are designed to capture adults immigrating primarily by crawling, or Circle traps wrapped around trunks of perimeter-row trees, which are designed to intercept adults crawling up tree trunks. None of these trap types, however, exhibited amounts of captures that correlated significantly with either weekly or season-long amounts of fresh ovipositional injury to fruit by adults. Hence, none appears to offer high promise as a tool for effectively monitoring the seasonal course of plum curculio injury to apples in commercial orchards in Massachusetts. In contrast, baiting branches of selected perimeter-row trees with benzaldehyde plus grandisoic acid led to significant aggregation (14-15-fold) of ovipositional injury, markedly facilitating monitoring of the seasonal course of injury to apples. A concurrent experiment revealed that addition of other synthetic fruit odor attractants to apple trees baited with benzaldehyde plus grandisoic acid did not enhance aggregation of ovipositional injury above that of this dual combination. We conclude that monitoring apples on odor-baited trap trees for fresh ovipositional injury could be a useful new approach for determining need and timing of insecticide application against plum curculio in commercial orchards.

  17. Predicting outcome on admission and post-admission for acetaminophen-induced acute liver failure using classification and regression tree models.

    Directory of Open Access Journals (Sweden)

    Jaime Lynn Speiser

    Full Text Available Assessing prognosis for acetaminophen-induced acute liver failure (APAP-ALF) patients often presents significant challenges. The King's College Criteria (KCC) have been validated on hospital admission, but little has been published on later phases of illness. We aimed to improve determinations of prognosis both at the time of and following admission for APAP-ALF using Classification and Regression Tree (CART) models. CART models were applied to US ALFSG registry data to predict 21-day death or liver transplant early (on admission) and post-admission (days 3-7) for 803 APAP-ALF patients enrolled 01/1998-09/2013. Accuracy in prediction of outcome (AC), sensitivity (SN), specificity (SP), and area under receiver-operating curve (AUROC) were compared between 3 models: KCC (INR, creatinine, coma grade, pH), CART analysis using only KCC variables (KCC-CART), and a CART model using new variables (NEW-CART). Traditional KCC yielded 69% AC, 90% SP, 27% SN, and 0.58 AUROC on admission, with similar performance post-admission. KCC-CART at admission offered predictive 66% AC, 65% SP, 67% SN, and 0.74 AUROC. Post-admission, KCC-CART had predictive 82% AC, 86% SP, 46% SN and 0.81 AUROC. NEW-CART models using MELD (Model for End-Stage Liver Disease), lactate and mechanical ventilation on admission yielded predictive 72% AC, 71% SP, 77% SN and AUROC 0.79. For later stages, NEW-CART (MELD, lactate, coma grade) offered predictive AC 86%, SP 91%, SN 46%, AUROC 0.73. CARTs offer simple prognostic models for APAP-ALF patients, which have higher AUROC and SN than KCC, with similar AC and negligibly worse SP. Admission and post-admission predictions were developed. • Prognostication in acetaminophen-induced acute liver failure (APAP-ALF) is challenging beyond admission • Little has been published regarding the use of King's College Criteria (KCC) beyond admission and KCC has shown limited sensitivity in subsequent studies • Classification and Regression Tree (CART) methodology allows the
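
    The accuracy, sensitivity and specificity figures quoted in this record are all ratios of confusion-matrix counts. The sketch below is only an illustration of those definitions: the function name `binary_metrics` and the counts are hypothetical and are not taken from the study. It shows how a rule can combine high specificity with low sensitivity.

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,      # share of all cases classified correctly
        "sensitivity": tp / (tp + fn),      # share of true events that were flagged
        "specificity": tn / (tn + fp),      # share of non-events correctly cleared
    }


# Hypothetical counts chosen for illustration only (not study data).
metrics = binary_metrics(tp=27, fp=10, tn=90, fn=73)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

    With these made-up counts the rule misses most true events while rarely raising a false alarm, which is the kind of trade-off the abstract describes when it compares SN and SP across models.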

  18. Nitrogen isotopes in Tree-Rings - An approach combining soil biogeochemistry and isotopic long series with statistical modeling

    Science.gov (United States)

    Savard, Martine M.; Bégin, Christian; Paré, David; Marion, Joëlle; Laganière, Jérôme; Séguin, Armand; Stefani, Franck; Smirnoff, Anna

    2016-04-01

    Monitoring atmospheric emissions from industrial centers in North America generally started less than 25 years ago. To compensate for the lack of monitoring, previous investigations have interpreted tree-ring N changes using the known chronology of human activities, without facing the challenge of separating climatic effects from potential anthropogenic impacts. Here we document such an attempt conducted in the oil sands (OS) mining region of Northeastern Alberta, Canada. The reactive nitrogen (Nr)-emitting oil extraction operations began in 1967, but air quality measurements were only initiated in 1997. To investigate if the beginning and intensification of OS operations induced changes in the forest N-cycle, we sampled white spruce (Picea glauca (Moench) Voss) stands located at various distances from the main mining area, and receiving low, but different N deposition. Our approach combines soil biogeochemical and metagenomic characterization with long, well dated, tree-ring isotopic series. To objectively delineate the natural N isotopic behaviour in trees, we have characterized tree-ring N isotope (15N/14N) ratios between 1880 and 2009, used statistical analyses of the isotopic values and local climatic parameters of the pre-mining period to calibrate response functions and project the isotopic responses to climate during the extraction period. During that period, the measured series depart negatively from the projected natural trends. In addition, these long-term negative isotopic trends are better reproduced by multiple-regression models combining climatic parameters with the proxy for regional mining Nr emissions. These negative isotopic trends point towards changes in the forest soil biogeochemical N cycle. The biogeochemical data and ultimate soil mechanisms responsible for such changes will be discussed during the presentation.

  19. A tree unification approach to constructing generic processes

    NARCIS (Netherlands)

    Zhang, Linda L.; Rodrigues, Brian

    2009-01-01

    In dealing with product diversity, manufacturing companies strive to maintain stable production by eliminating variations in production processes. In this respect, planning process families in relation to product families to achieve production stability is a promising approach. In this paper, the ge

  20. Frugivores bias seed-adult tree associations through nonrandom seed dispersal: a phylogenetic approach.

    Science.gov (United States)

    Razafindratsima, Onja H; Dunham, Amy E

    2016-08-01

    Frugivores are the main seed dispersers in many ecosystems, such that behaviorally driven, nonrandom patterns of seed dispersal are a common process; but patterns are poorly understood. Characterizing these patterns may be essential for understanding spatial organization of fruiting trees and drivers of seed-dispersal limitation in biodiverse forests. To address this, we studied resulting spatial associations between dispersed seeds and adult tree neighbors in a diverse rainforest in Madagascar, using a temporal and phylogenetic approach. Data show that by using fruiting trees as seed-dispersal foci, frugivores bias seed dispersal under conspecific adults and under heterospecific trees that share dispersers and fruiting time with the dispersed species. Frugivore-mediated seed dispersal also resulted in nonrandom phylogenetic associations of dispersed seeds with their nearest adult neighbors, in nine out of the 16 months of our study. However, these nonrandom phylogenetic associations fluctuated unpredictably over time, ranging from clustered to overdispersed. The spatial and phylogenetic template of seed dispersal did not translate to similar patterns of association in adult tree neighborhoods, suggesting the importance of post-dispersal processes in structuring plant communities. Results suggest that frugivore-mediated seed dispersal is important for structuring early stages of plant-plant associations, setting the template for post-dispersal processes that influence ultimate patterns of plant recruitment. Importantly, if biased patterns of dispersal are common in other systems, frugivores may promote tree coexistence in biodiverse forests by limiting the frequency and diversity of heterospecific interactions of seeds they disperse.

  1. An in situ approach to detect tree root ecology: linking ground-penetrating radar imaging to isotope-derived water acquisition zones

    National Research Council Canada - National Science Library

    Isaac, Marney E; Anglaaere, Luke C N

    2013-01-01

    .... Methodologically, nondestructive in situ tree root ecology analysis has lagged. In this study, we tested a nondestructive approach to determine tree coarse root architecture and function of a perennial tree crop, Theobroma cacao L...

  2. A Vector Approach to Regression Analysis and Its Implications to Heavy-Duty Diesel Emissions

    Energy Technology Data Exchange (ETDEWEB)

    McAdams, H.T.

    2001-02-14

    An alternative approach is presented for the regression of response data on predictor variables that are not logically or physically separable. The methodology is demonstrated by its application to a data set of heavy-duty diesel emissions. Because of the covariance of fuel properties, it is found advantageous to redefine the predictor variables as vectors, in which the original fuel properties are components, rather than as scalars each involving only a single fuel property. The fuel property vectors are defined in such a way that they are mathematically independent and statistically uncorrelated. Because the available data set does not allow definitive separation of vehicle and fuel effects, and because test fuels used in several of the studies may be unrealistically contrived to break the association of fuel variables, the data set is not considered adequate for development of a full-fledged emission model. Nevertheless, the data clearly show that only a few basic patterns of fuel-property variation affect emissions and that the number of these patterns is considerably less than the number of variables initially thought to be involved. These basic patterns, referred to as "eigenfuels," may reflect blending practice in accordance with their relative weighting in specific circumstances. The methodology is believed to be widely applicable in a variety of contexts. It promises an end to the threat of collinearity and the frustration of attempting, often unrealistically, to separate variables that are inseparable.
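
    One common way to turn correlated predictors into orthogonal, uncorrelated vector variables is an eigendecomposition of their covariance matrix (principal component analysis). Whether this matches the author's exact construction of the "eigenfuels" is an assumption. The sketch below uses synthetic data and hypothetical variable names purely to show the decorrelation step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for fuel properties: 50 "fuels" x 3 properties,
# where the first two properties are strongly correlated (assumption).
base = rng.normal(size=(50, 1))
X = np.hstack([
    base + 0.1 * rng.normal(size=(50, 1)),
    2.0 * base + 0.1 * rng.normal(size=(50, 1)),
    rng.normal(size=(50, 1)),
])

# Centre the data and eigendecompose the covariance matrix.
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]           # reorder to descending variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Each column of eigvecs is a weighted combination of the original
# properties; projecting onto them gives uncorrelated scores.
scores = Xc @ eigvecs
score_cov = np.cov(scores, rowvar=False)

print("share of variance per component:", eigvals / eigvals.sum())
```

    In this synthetic case most of the variance falls on the first component, echoing the abstract's point that a few basic patterns can account for most of the variation. A regression would then use the columns of `scores` as predictors in place of the raw properties.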

  3. An Ionospheric Index Model based on Linear Regression and Neural Network Approaches

    Science.gov (United States)

    Tshisaphungo, Mpho; McKinnell, Lee-Anne; Bosco Habarulema, John

    2017-04-01

    The ionosphere is well known to reflect radio wave signals in the high frequency (HF) band due to the presence of electrons and ions within the region. To optimise the use of long distance HF communications, it is important to understand the drivers of ionospheric storms and accurately predict the propagation conditions, especially during disturbed days. This paper presents the development of an ionospheric storm-time index over the South African region for HF communication users. The model will result in a valuable tool to measure the complex ionospheric behaviour in an operational space weather monitoring and forecasting environment. The development of the ionospheric storm-time index is based on data from a single ionosonde station at Grahamstown (33.3°S, 26.5°E), South Africa. Critical frequency of the F2 layer (foF2) measurements for the period 1996-2014 were considered for this study. The model was developed based on linear regression and neural network approaches. In this talk, validation results for low, medium and high solar activity periods will be discussed to demonstrate the model's performance.

  5. Black box modeling of PIDs implemented in PLCs without structural information: a support vector regression approach.

    Science.gov (United States)

    Salat, Robert; Awtoniuk, Michal

    In this report, the parameter identification of a proportional-integral-derivative (PID) algorithm implemented in a programmable logic controller (PLC) using support vector regression (SVR) is presented. This report focuses on a black box model of the PID with additional functions and modifications provided by the manufacturers and without information on the exact structure. The process of feature selection and its impact on the training and testing abilities are emphasized. The method was tested on real PLCs (Siemens and General Electric) with the implemented PID. The results show that the SVR maps the function of the PID algorithms and the modifications introduced by the manufacturer of the PLC with high accuracy. With this approach, the simulation results can be directly used to tune the PID algorithms in the PLC. The method is sufficiently universal in that it can be applied to any PI or PID algorithm implemented in the PLC with additional functions and modifications that were previously considered to be trade secrets. This method can also be an alternative for engineers who need to tune the PID, have no information on its structure, and cannot use the default settings for the known structures.

  6. Model-free prediction and regression a transformation-based approach to inference

    CERN Document Server

    Politis, Dimitris N

    2015-01-01

    The Model-Free Prediction Principle expounded upon in this monograph is based on the simple notion of transforming a complex dataset to one that is easier to work with, e.g., i.i.d. or Gaussian. As such, it restores the emphasis on observable quantities, i.e., current and future data, as opposed to unobservable model parameters and estimates thereof, and yields optimal predictors in diverse settings such as regression and time series. Furthermore, the Model-Free Bootstrap takes us beyond point prediction in order to construct frequentist prediction intervals without resort to unrealistic assumptions such as normality. Prediction has been traditionally approached via a model-based paradigm, i.e., (a) fit a model to the data at hand, and (b) use the fitted model to extrapolate/predict future data. Due to both mathematical and computational constraints, 20th century statistical practice focused mostly on parametric models. Fortunately, with the advent of widely accessible powerful computing in the late 1970s, co...

  7. A logistic regression based approach for the prediction of flood warning threshold exceedance

    Science.gov (United States)

    Diomede, Tommaso; Trotter, Luca; Stefania Tesini, Maria; Marsigli, Chiara

    2016-04-01

    A method based on logistic regression is proposed for the prediction of river level threshold exceedance at short (+0-18h) and medium (+18-42h) lead times. The aim of the study is to provide a valuable tool for the issue of warnings by the authority responsible for public safety in case of flood. The role of different precipitation periods as predictors for the exceedance of a fixed river level has been investigated, in order to derive significant information for flood forecasting. Based on catchment-averaged values, a separation of "antecedent" and "peak-triggering" rainfall amounts as independent variables is attempted. In particular, the following flood-related precipitation periods have been considered: (i) the period from 1 to n days before the forecast issue time, which may be relevant for the soil saturation, (ii) the last 24 hours, which may be relevant for the current water level in the river, and (iii) the period from 0 to x hours in advance with respect to the forecast issue time, when the flood-triggering precipitation generally occurs. Several combinations and values of these predictors have been tested to optimise the method implementation. In particular, the period for the precursor antecedent precipitation ranges between 5 and 45 days; the state of the river can be represented by the last 24-h precipitation or, as an alternative, by the current river level. The flood-triggering precipitation has been cumulated over the next 18 hours (for the short lead time) and 36-42 hours (for the medium lead time). The proposed approach requires a specific implementation of logistic regression for each river section and warning threshold. The method performance has been evaluated over the Santerno river catchment (about 450 km²) in the Emilia-Romagna Region, northern Italy. A statistical analysis in terms of false alarms, misses and related scores was carried out by using an 8-year-long database. The results are quite satisfactory, with slightly better performances

  8. Modeling Forest Structural Parameters in the Mediterranean Pines of Central Spain using QuickBird-2 Imagery and Classification and Regression Tree Analysis (CART)

    Directory of Open Access Journals (Sweden)

    José A. Delgado

    2012-01-01

    Full Text Available Forest structural parameters such as quadratic mean diameter, basal area, and number of trees per unit area are important for the assessment of wood volume and biomass and represent key forest inventory attributes. Forest inventory information is required to support sustainable management, carbon accounting, and policy development activities. Digital image processing of remotely sensed imagery is increasingly utilized to assist traditional, more manual, methods in the estimation of forest structural attributes over extensive areas, also enabling evaluation of change over time. Empirical attribute estimation with remotely sensed data is frequently employed, yet with known limitations, especially over complex environments such as Mediterranean forests. In this study, the capacity of high spatial resolution (HSR) imagery and related techniques to model structural parameters at the stand level (n = 490) in Mediterranean pines in Central Spain is tested using data from the commercial satellite QuickBird-2. Spectral and spatial information derived from multispectral and panchromatic imagery (2.4 m and 0.68 m sided pixels, respectively) served to model structural parameters. Classification and Regression Tree Analysis (CART) was selected for the modeling of attributes. Accurate models were produced of quadratic mean diameter (QMD) (R² = 0.8; RMSE = 0.13 m) with an average error of 17%, while basal area (BA) models produced an average error of 22% (RMSE = 5.79 m²/ha). When the measured number of trees per unit area (N) was categorized, as per frequent forest management practices, CART models correctly classified 70% of the stands, with all other stands classified in an adjacent class. The accuracy of the attributes estimated here is expected to be better when canopy cover is more open and attribute values are at the lower end of the range present, as related in the pattern of the residuals found in this study. Our findings indicate that attributes derived from

  9. Mechanisms of Developmental Regression in Autism and the Broader Phenotype: A Neural Network Modeling Approach

    Science.gov (United States)

    Thomas, Michael S. C.; Knowland, Victoria C. P.; Karmiloff-Smith, Annette

    2011-01-01

    Loss of previously established behaviors in early childhood constitutes a markedly atypical developmental trajectory. It is found almost uniquely in autism and its cause is currently unknown (Baird et al., 2008). We present an artificial neural network model of developmental regression, exploring the hypothesis that regression is caused by…

  11. Modelling of Random Textured Tandem Silicon Solar Cells Characteristics: Decision Tree Approach

    Directory of Open Access Journals (Sweden)

    R.S. Kamath

    2016-11-01

    Full Text Available We report decision tree (DT) modeling of the characteristics of randomly textured tandem silicon solar cells. Photovoltaic modules of silicon-based solar cells are extremely popular due to their high efficiency and longer lifetime. The decision tree is one of the most common data mining models that can be used for predictive analytics. The reported investigation depicts the optimum decision tree architecture achieved by tuning parameters such as Min split, Min bucket, Max depth and Complexity. The DT model thus derived is easy to understand and entails the recursive partitioning approach implemented in the “rpart” package. Moreover, the performance of the model is evaluated with reference to the Mean Square Error (MSE) estimate of error rate. The modeling of the random textured silicon solar cells reveals a strong correlation of efficiency with “Fill factor” and “thickness of a-Si layer”.

  12. Estimating Dbh of Trees Employing Multiple Linear Regression of the best Lidar-Derived Parameter Combination Automated in Python in a Natural Broadleaf Forest in the Philippines

    Science.gov (United States)

    Ibanez, C. A. G.; Carcellar, B. G., III; Paringit, E. C.; Argamosa, R. J. L.; Faelga, R. A. G.; Posilero, M. A. V.; Zaragosa, G. P.; Dimayacyac, N. A.

    2016-06-01

    Diameter-at-breast-height (DBH) estimation is a prerequisite in various allometric equations estimating important forestry indices like stem volume, basal area, biomass and carbon stock. LiDAR technology offers a means of directly obtaining different forest parameters, except DBH, from the behavior and characteristics of the point cloud, which are unique to different forest classes. An extensive tree inventory was done on a two-hectare established sample plot in Mt. Makiling, Laguna, for a natural growth forest. Coordinates, height, and canopy cover were measured and species were identified for comparison with LiDAR derivatives. Multiple linear regression was used to obtain LiDAR-derived DBH by integrating field-derived DBH and 27 LiDAR-derived parameters at 20 m, 10 m, and 5 m grid resolutions. To find the best combination of parameters for DBH estimation, all possible combinations of parameters were generated automatically using Python scripts, with additional regression-related libraries such as NumPy, SciPy, and scikit-learn. The combination that yields the highest r-squared (coefficient of determination) and the lowest AIC (Akaike's Information Criterion) and BIC (Bayesian Information Criterion) was determined to be the best equation. The equation is at its best using 11 parameters at the 10 m grid size, with an r-squared of 0.604, an AIC of 154.04 and a BIC of 175.08. The combination of parameters may differ among forest classes, which is left for further studies. Additional statistical tests, such as the Kaiser-Meyer-Olkin (KMO) coefficient and Bartlett's Test of Sphericity (BTS), can be added to help determine the correlation among parameters.
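
    The exhaustive search described in this record can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' script: the helper names `fit_ols` and `best_subset`, the synthetic data, and the Gaussian-likelihood form of AIC and BIC (up to an additive constant) are all choices made here for the example.

```python
import itertools

import numpy as np


def fit_ols(X, y):
    """Least-squares fit with an intercept; returns (R^2, AIC, BIC)."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ beta) ** 2))
    tss = float(np.sum((y - y.mean()) ** 2))
    k = A.shape[1]                       # number of fitted coefficients
    # AIC/BIC for Gaussian errors, dropping terms that do not depend on the model.
    aic = n * np.log(rss / n) + 2 * k
    bic = n * np.log(rss / n) + k * np.log(n)
    return 1.0 - rss / tss, aic, bic


def best_subset(X, y, names):
    """Try every non-empty subset of columns and return the lowest-AIC fit."""
    results = []
    for size in range(1, X.shape[1] + 1):
        for cols in itertools.combinations(range(X.shape[1]), size):
            r2, aic, bic = fit_ols(X[:, list(cols)], y)
            results.append((aic, bic, r2, [names[c] for c in cols]))
    return min(results, key=lambda item: item[0])


# Synthetic example: only x0 and x2 actually drive the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + 0.5 * rng.normal(size=100)

aic, bic, r2, chosen = best_subset(X, y, ["x0", "x1", "x2", "x3"])
print(chosen, round(r2, 3))
```

    The number of candidate models doubles with each added predictor, so a full search over 27 parameters covers 2^27 - 1 = 134,217,727 subsets. A search of that size is feasible but slow, which is one reason to automate it.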

  13. SU-E-J-212: Identifying Bones From MRI: A Dictionary Learning and Sparse Regression Approach

    Energy Technology Data Exchange (ETDEWEB)

    Ruan, D; Yang, Y; Cao, M; Hu, P; Low, D [UCLA, Los Angeles, CA (United States)

    2014-06-01

    Purpose: To develop an efficient and robust scheme to identify bony anatomy based on MRI-only simulation images. Methods: MRI offers important soft tissue contrast and functional information, yet its lack of correlation to electron density has placed it as an auxiliary modality to CT in radiotherapy simulation and adaptation. An effective scheme to identify bony anatomy is an important first step towards an MR-only simulation/treatment paradigm and would satisfy most practical purposes. We utilize a UTE acquisition sequence to achieve visibility of the bone. In contrast to manual + bulk or registration-based approaches to identifying bones, we propose a novel learning-based approach for improved robustness to MR artefacts and environmental changes. Specifically, local information is encoded with an MR image patch, and the corresponding label is extracted (during training) from simulation CT aligned to the UTE. Within each class (bone vs. nonbone), an overcomplete dictionary is learned so that typical patches within the proper class can be represented as a sparse combination of the dictionary entries. For testing, an acquired UTE-MRI is divided into patches using a sliding scheme, where each patch is sparsely regressed against both bone and nonbone dictionaries, and subsequently assigned to the class with the smaller residual. Results: The proposed method has been applied to the pilot site of brain imaging and it has shown generally good performance, with a Dice similarity coefficient greater than 0.9 in a cross-validation study using 4 datasets. Importantly, it is robust towards consistent foreign objects (e.g., headset) and the artefacts related to Gibbs ringing and field heterogeneity. Conclusion: A learning perspective has been developed for inferring bone structures based on UTE MRI. The imaging setting is subject to minimal motion effects and the post-processing is efficient. The improved efficiency and robustness enables a first translation to MR-only routine. The scheme

  14. Effects of Students' Beliefs on Mathematics and Achievement of University Students: Regression Analysis Approach

    Directory of Open Access Journals (Sweden)

    Velo Suthar

    2010-01-01

    Full Text Available Problem statement: At present, after almost more than 20-decades, Malaysia can boast of a solid national philosophy of education, despite tremendous struggles and hopes. Professional learning opportunities are necessary to enhance, support and sustain students' mathematics achievement. Approach: Empirical evidence has shown that students' belief in mathematics is crucial in meeting career aspirations. In addition, the mathematical beliefs of university students are closely correlated with their mathematics achievement. Results: The literature shows that few studies have been done on university undergraduates. The present study involves a sample of eighty-six university undergraduate students, who completed a self-reported questionnaire on student mathematical beliefs along three dimensions, namely beliefs about mathematics, beliefs about the importance of mathematics and beliefs about one's ability in mathematics. The reliability index, using Cronbach's alpha, was 0.86, indicating a high level of internal consistency. Records of achievement (GPA) were obtained from the academic division, University Putra Malaysia. Based on these records, students were classified into the minor and major mathematics groups. The authors examined students' mathematical beliefs using a three-dimensional logistic regression model estimation technique, appropriate for a survey design study. Conclusion/Recommendations: The results identified significant relationships between mathematics achievement and both student beliefs about the importance of mathematics and beliefs about one's ability in mathematics. In addition, the Hosmer and Lemeshow test was non-significant with a chi-square of 8.46, p = 0.3, which indicated a good model fit, as the data did not significantly deviate from the model. Overall, the model classified 77.9% of the sample correctly.

  15. Relação entre diferentes caracteres de plantas jovens de seringueira Correlations and regressions studies among juvenile rubber tree characters

    Directory of Open Access Journals (Sweden)

    César Lavorenti

    1990-01-01

    Full Text Available This study was carried out to determine the existence and magnitude of simple linear correlations and regressions among characters of young rubber tree (Hevea spp.) seedlings, in order to better guide selection in future breeding work. The characters studied were mean dry rubber yield per seedling per tapping, measured by the Hamaker-Morris-Mann test (P); stem girth (CC); bark thickness (EC); number of latex vessel rings (NA); diameter of latex vessels (DV); density of latex vessels (D); and mean distance between consecutive vessel rings (DMEAVC), in a three-and-a-half-year-old crossing nursery. The results showed, among other things, that the simple linear correlations of P with CC, EC, NA, D, DV and DMEAVC were, respectively, r = 0.61, 0.34, 0.28, 0.29, 0.43 and -0.13. The correlations of CC with EC, NA, D, DV and DMEAVC were 0.65, 0.22, 0.37, 0.33 and 0.096, respectively. Simple linear regression studies of P on CC, EC, NA, DV, D and DMEAVC suggest that CC was the most significant independent character, accounting for 36% of the variation in P. With regard to vigor, the regression of CC on the respective characters suggests that EC was the only character that contributed significantly to the variation in CC, with 42%. The high correlations observed between yield and both stem girth and bark thickness indicate that young genotypes with good yield capacity and high vigor can be obtained through early selection on these variables.

  16. Alternative standardization approaches to improving streamflow reconstructions with ring-width indices of riparian trees

    Science.gov (United States)

    Meko, David M; Friedman, Jonathan M.; Touchan, Ramzi; Edmondson, Jesse R.; Griffin, Eleanor R.; Scott, Julian A.

    2015-01-01

    Old, multi-aged populations of riparian trees provide an opportunity to improve reconstructions of streamflow. Here, ring widths of 394 plains cottonwood (Populus deltoides ssp. monilifera) trees in the North Unit of Theodore Roosevelt National Park, North Dakota, are used to reconstruct streamflow along the Little Missouri River (LMR), North Dakota, US. Different versions of the cottonwood chronology are developed by (1) age-curve standardization (ACS), using age-stratified samples and a single estimated curve of ring width against estimated ring age, and (2) time-curve standardization (TCS), using a subset of longer ring-width series individually detrended with cubic smoothing splines of width against year. The cottonwood chronologies are combined with the first principal component of four upland conifer chronologies developed by conventional methods to investigate the possible value of riparian tree-ring chronologies for streamflow reconstruction of the LMR. Regression modeling indicates that the statistical signal for flow is stronger in the riparian cottonwood than in the upland chronologies. The flow signal from cottonwood complements rather than repeats the signal from upland conifers and is especially strong in young trees (e.g. 5–35 years). Reconstructions using a combination of cottonwoods and upland conifers are found to explain more than 50% of the variance of LMR flow over a 1935–1990 calibration period and to yield reconstruction of flow to 1658. The low-frequency component of reconstructed flow is sensitive to the choice of standardization method for the cottonwood. In contrast to the TCS version, the ACS reconstruction features persistent low flows in the 19th century. Results demonstrate the value to streamflow reconstruction of riparian cottonwood and suggest that more studies are needed to exploit the low-frequency streamflow signal in densely sampled age-stratified stands of riparian trees.

  17. Regression Basics

    CERN Document Server

    Kahane, Leo H

    2007-01-01

    Using a friendly, nontechnical approach, the Second Edition of Regression Basics introduces readers to the fundamentals of regression. Accessible to anyone with an introductory statistics background, this book builds from a simple two-variable model to a model of greater complexity. Author Leo H. Kahane weaves four engaging examples throughout the text to illustrate not only the techniques of regression but also how this empirical tool can be applied in creative ways to consider a broad array of topics. New to the Second Edition Offers greater coverage of simple panel-data estimation:

  18. Application of Classification and Regression Tree (CART) analysis on the microflora of minced meat for classification according to Reg. (EC) 2073/2005.

    Science.gov (United States)

    Paulsen, P; Smulders, F J M; Tichy, A; Aydin, A; Höck, C

    2011-07-01

    In a retrospective study on the microbiology of minced meat from small food businesses supplying directly to the consumer, the relative contribution of meat supplier, meat species and outlet where meat was minced was assessed by "Classification and Regression Tree" (CART) analysis. Samples (n=888) originated from 129 outlets of a single supermarket chain. Sampling units were 4-5 packs (pork, beef, and mixed pork-beef). Total aerobic counts (TACs) were 5.3±1.0 log CFU/g. In 75.6% of samples, E. coli were <1 log CFU/g. The proportions of "unsatisfactory" sample sets [as defined in Reg. (EC) 2073/2005] were 31.3% and 4.5% for TAC and E. coli, respectively. For classification according to TACs, the outlet where meat was minced and the "meat supplier" were the most important predictors. For E. coli, "outlet" was the most important predictor, but the limit of detection of 1 log CFU/g was not discriminative enough to allow further conclusions.

  19. Köppen versus the computer: an objective comparison between the Köppen-Geiger climate classification and a multivariate regression tree

    Directory of Open Access Journals (Sweden)

    A. J. Cannon

    2011-03-01

    Full Text Available A global climate classification is defined using a multivariate regression tree (MRT). The MRT algorithm is automated, which removes the need for a practitioner to manually define the classes; it is hierarchical, which allows a series of nested classes to be defined; and it is rule-based, which allows climate classes to be unambiguously defined and easily interpreted. Climate variables used in the MRT are restricted to those from the Köppen-Geiger climate classification. The result is a hierarchical, rule-based climate classification that can be directly compared against the traditional system. An objective comparison between the two climate classifications at their 5, 13, and 30 class hierarchical levels indicates that both perform well in terms of identifying regions of homogeneous temperature variability, although the MRT still generally outperforms the Köppen-Geiger system. In terms of precipitation discrimination, the Köppen-Geiger classification performs poorly relative to the MRT. The data and algorithm implementation used in this study are freely available. Thus, the MRT climate classification offers instructors and students in the geosciences a simple instrument for exploring modern, computer-based climatological methods.

  20. Köppen versus the computer: comparing Köppen-Geiger and multivariate regression tree climate classifications in terms of climate homogeneity

    Directory of Open Access Journals (Sweden)

    A. J. Cannon

    2012-01-01

    Full Text Available A global climate classification is defined using a multivariate regression tree (MRT). The MRT algorithm is automated, hierarchical, and rule-based, thus allowing a system of climate classes to be quickly defined and easily interpreted. Climate variables used in the MRT are restricted to those from the Köppen-Geiger classification system. The result is a set of classes that can be directly compared against those from the traditional system. The two climate classifications are compared at their 5, 13, and 30 class hierarchical levels in terms of climate homogeneity. Results indicate that both perform well in terms of identifying regions of homogeneous temperature variability, although the MRT still generally outperforms the Köppen-Geiger system. In terms of precipitation discrimination, the Köppen-Geiger classification performs poorly relative to the MRT. The data and algorithm implementation used in this study are freely available. Thus, the MRT climate classification offers instructors and students in the geosciences a simple instrument for exploring modern, computer-based climatological methods.

  1. An Integrated Approach to Battery Health Monitoring using Bayesian Regression, Classification and State Estimation

    Data.gov (United States)

    National Aeronautics and Space Administration — The application of the Bayesian theory of managing uncertainty and complexity to regression and classification in the form of Relevance Vector Machine (RVM), and to...

  2. Quantifying multi-dimensional functional trait spaces of trees: empirical versus theoretical approaches

    Science.gov (United States)

    Ogle, K.; Fell, M.; Barber, J. J.

    2016-12-01

    Empirical, field studies of plant functional traits have revealed important trade-offs among pairs or triplets of traits, such as the leaf (LES) and wood (WES) economics spectra. Trade-offs include correlations between leaf longevity (LL) vs specific leaf area (SLA), LL vs mass-specific leaf respiration rate (RmL), SLA vs RmL, and resistance to breakage vs wood density. Ordination analyses (e.g., PCA) show groupings of traits that tend to align with different life-history strategies or taxonomic groups. It is unclear, however, what underlies such trade-offs and emergent spectra. Do they arise from inherent physiological constraints on growth, or are they more reflective of environmental filtering? The relative importance of these mechanisms has implications for predicting biogeochemical cycling, which is influenced by trait distributions of the plant community. We address this question using an individual-based model of tree growth (ACGCA) to quantify the theoretical trait space of trees that emerges from physiological constraints. ACGCA's inputs include 32 physiological, anatomical, and allometric traits, many of which are related to the LES and WES. We fit ACGCA to 1.6 million USFS FIA observations of tree diameters and heights to obtain vectors of trait values that produce realistic growth, and we explored the structure of this trait space. No notable correlations emerged among the 496 trait pairs, but stepwise regressions revealed complicated multi-variate structure: e.g., relationships between pairs of traits (e.g., RmL and SLA) are governed by other traits (e.g., LL, radiation-use efficiency [RUE]). We also simulated growth under various canopy gap scenarios that impose varying degrees of environmental filtering to explore the multi-dimensional trait space (hypervolume) of trees that died vs survived. The centroid and volume of the hypervolumes differed among dead and live trees, especially under gap conditions leading to low mortality. 
Traits most predictive

  3. A new approach for effectively determining fracture network connections in fractured rocks using R tree indexing

    Institute of Scientific and Technical Information of China (English)

    LIU Hua-mei; WANG Ming-yu; SONG Xian-feng

    2011-01-01

    Determining fracture network connections helps investigators remove “meaningless” no-flow-passing fractures, providing an updated and more effective fracture network that could considerably improve computational efficiency in the pertinent numerical simulations of fluid flow and solute transport. Effective algorithms with higher computational efficiency are needed to accomplish this task in large-scale fractured rock masses. A new approach using R tree indexing was proposed for determining fracture connections in a 3D stochastically distributed fracture network. Comparison with the traditional exhaustion algorithm on simulation results showed that this approach was much more effective, and the more fractures were investigated, the more obvious its advantages were. Furthermore, the runtime used for creating the R tree index accounts for a major part of the total runtime used for calculating Minimum Bounding Rectangles (MBRs), creating the R tree index, precisely finding fracture intersections, and identifying flow paths, which are the four important steps in determining fracture connections. The proposed approach for determining fracture connections in three-dimensional fractured rocks is expected to provide efficient preprocessing and a critical database for practical numerical computation of fluid flow and solute transport in large-scale fractured rock masses.
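
    The bounding-box step named in this abstract can be illustrated with a minimal sketch. It assumes each fracture is given as a list of 3D vertices, and it compares every pair of boxes directly; the R tree index itself is not shown, and its role would be to avoid that all-pairs loop. The names `mbr`, `mbrs_overlap`, and `candidate_pairs` are hypothetical and do not come from the paper.

```python
from itertools import combinations

def mbr(vertices):
    """Axis-aligned minimum bounding box of a fracture given as (x, y, z) vertices."""
    xs, ys, zs = zip(*vertices)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))

def mbrs_overlap(a, b):
    """Two boxes overlap only if their extents overlap on every axis."""
    (alo, ahi), (blo, bhi) = a, b
    return all(alo[i] <= bhi[i] and blo[i] <= ahi[i] for i in range(3))

def candidate_pairs(fractures):
    """Pairs whose boxes overlap; only these need an exact intersection test."""
    boxes = {name: mbr(v) for name, v in fractures.items()}
    return [(p, q) for p, q in combinations(sorted(boxes), 2)
            if mbrs_overlap(boxes[p], boxes[q])]

# Illustrative data only (not from the study).
fractures = {
    "f1": [(0, 0, 0), (2, 0, 0), (2, 2, 1), (0, 2, 1)],
    "f2": [(1, 1, 0), (3, 1, 0), (3, 3, 2), (1, 3, 2)],
    "f3": [(5, 5, 5), (6, 5, 5), (6, 6, 6), (5, 6, 6)],
}
pairs = candidate_pairs(fractures)
```

    Overlapping boxes are a necessary but not sufficient condition for two fractures to intersect, so the pairs returned here would still need the precise intersection step the abstract mentions.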

  4. Mortality risk prediction in burn injury: Comparison of logistic regression with machine learning approaches.

    Science.gov (United States)

    Stylianou, Neophytos; Akbarov, Artur; Kontopantelis, Evangelos; Buchan, Iain; Dunn, Ken W

    2015-08-01

    Predicting mortality from burn injury has traditionally employed logistic regression models. Alternative machine learning methods have been introduced in some areas of clinical prediction as the necessary software and computational facilities have become accessible. Here we compare logistic regression and machine learning predictions of mortality from burn. An established logistic mortality model was compared to machine learning methods (artificial neural network, support vector machine, random forests and naïve Bayes) using a population-based (England & Wales) case-cohort registry. Predictive evaluation used: area under the receiver operating characteristic curve; sensitivity; specificity; positive predictive value and Youden's index. All methods had comparable discriminatory abilities, similar sensitivities, specificities and positive predictive values. Although some machine learning methods performed marginally better than logistic regression the differences were seldom statistically significant and clinically insubstantial. Random forests were marginally better for high positive predictive value and reasonable sensitivity. Neural networks yielded slightly better prediction overall. Logistic regression gives an optimal mix of performance and interpretability. The established logistic regression model of burn mortality performs well against more complex alternatives. Clinical prediction with a small set of strong, stable, independent predictors is unlikely to gain much from machine learning outside specialist research contexts. Copyright © 2015 Elsevier Ltd and ISBI. All rights reserved.
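
    The threshold-based evaluation measures listed in this abstract follow directly from a 2x2 confusion matrix. The sketch below shows the standard definitions; the function name `binary_metrics` and the counts are hypothetical and are not taken from the study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard measures from confusion-matrix counts (deaths = positive class)."""
    sensitivity = tp / (tp + fn)          # share of actual deaths predicted
    specificity = tn / (tn + fp)          # share of survivors predicted
    ppv = tp / (tp + fp)                  # share of predicted deaths that occurred
    youden = sensitivity + specificity - 1
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "youden": youden,
    }

# Illustrative counts only.
m = binary_metrics(tp=40, fp=10, tn=140, fn=10)
```

    The area under the ROC curve is not computed here because it needs the full set of predicted probabilities rather than a single threshold.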

  5. Alternatives for Mixed-Effects Meta-Regression Models in the Reliability Generalization Approach: A Simulation Study

    Science.gov (United States)

    López-López, José Antonio; Botella, Juan; Sánchez-Meca, Julio; Marín-Martínez, Fulgencio

    2013-01-01

    Since heterogeneity between reliability coefficients is usually found in reliability generalization studies, moderator analyses constitute a crucial step for that meta-analytic approach. In this study, different procedures for conducting mixed-effects meta-regression analyses were compared. Specifically, four transformation methods for the…

  6. A best-first tree-searching approach for ML decoding in MIMO system

    KAUST Repository

    Shen, Chung-An

    2012-07-28

    In MIMO communication systems maximum-likelihood (ML) decoding can be formulated as a tree-searching problem. This paper presents a tree-searching approach that combines the features of classical depth-first and breadth-first approaches to achieve close to ML performance while minimizing the number of visited nodes. A detailed outline of the algorithm is given, including the required storage. The effects of storage size on BER performance and complexity in terms of search space are also studied. Our result demonstrates that with a proper choice of storage size the proposed method visits 40% fewer nodes than a sphere decoding algorithm at signal to noise ratio (SNR) = 20dB and by an order of magnitude at 0 dB SNR.

  7. A nonparametric approach to calculate critical micelle concentrations: the local polynomial regression method.

    Science.gov (United States)

    López Fontán, J L; Costa, J; Ruso, J M; Prieto, G; Sarmiento, F

    2004-02-01

    The application of a statistical method, the local polynomial regression method (LPRM), based on a nonparametric estimation of the regression function to determine the critical micelle concentration (cmc) is presented. The method is extremely flexible because it does not impose any parametric model on the underlying structure of the data but rather allows the data to speak for themselves. Good concordance of cmc values with those obtained by other methods was found for systems in which the variation of a measured physical property with concentration showed an abrupt change. When this variation was slow, discrepancies between the values obtained by LPRM and other methods were found.
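
    As a rough illustration of local polynomial regression, the sketch below fits a degree-1 (local linear) model around a query point using Gaussian kernel weights. The kernel, the bandwidth, the polynomial degree, and the synthetic data are all assumptions; the abstract does not state which choices the authors made or how the cmc is read off the fit.

```python
from math import exp

def local_linear(x0, xs, ys, bandwidth):
    """Kernel-weighted straight-line fit around x0; returns (fitted value, slope)."""
    w = [exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw      # weighted mean of x
    my = sum(wi * y for wi, y in zip(w, ys)) / sw      # weighted mean of y
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx
    return my + slope * (x0 - mx), slope

# Synthetic property-vs-concentration data with a slope change at x = 5.
xs = list(range(11))
ys = [x if x <= 5 else 5 + 0.2 * (x - 5) for x in xs]

_, slope_below = local_linear(2.0, xs, ys, bandwidth=0.5)
_, slope_above = local_linear(8.0, xs, ys, bandwidth=0.5)
```

    On data like these the estimated slope differs clearly on either side of the break, which is consistent with the abstract's remark that the method agrees well with others when the measured property changes abruptly.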

  8. A nonparametric approach to calculate critical micelle concentrations: the local polynomial regression method

    Energy Technology Data Exchange (ETDEWEB)

    Lopez Fontan, J.L.; Costa, J.; Ruso, J.M.; Prieto, G. [Dept. of Applied Physics, Univ. of Santiago de Compostela, Santiago de Compostela (Spain)]; Sarmiento, F. [Dept. of Mathematics, Faculty of Informatics, Univ. of A Coruna, A Coruna (Spain)]

    2004-02-01

    The application of a statistical method, the local polynomial regression method (LPRM), based on a nonparametric estimation of the regression function to determine the critical micelle concentration (cmc) is presented. The method is extremely flexible because it does not impose any parametric model on the underlying structure of the data but rather allows the data to speak for themselves. Good concordance of cmc values with those obtained by other methods was found for systems in which the variation of a measured physical property with concentration showed an abrupt change. When this variation was slow, discrepancies between the values obtained by LPRM and other methods were found. (orig.)

  9. Tropical forest structure characterization using airborne lidar data: an individual tree level approach

    Science.gov (United States)

    Ferraz, A.; Saatchi, S. S.

    2015-12-01

    Fine scale tropical forest structure characterization has been performed by means of field measurement techniques that record both the species and the diameter at breast height (dbh) for every tree within a given area. Due to dense and complex vegetation, additional important ecological variables (e.g. the tree height and crown size) are usually not measured because they are hardly recognized from the ground. The poor knowledge of the 3D tropical forest structure has been a major limitation for the understanding of different ecological issues such as the spatial distribution of carbon stocks, regeneration and competition dynamics and light penetration gradient assessments. Airborne laser scanning (ALS) is an active remote sensing technique that provides georeferenced distance measurements between the aircraft and the surface. It provides an unstructured 3D point cloud that is a high-resolution model of the forest. This study presents the first approach for tropical forest characterization at a fine scale using remote sensing data. The multi-modal lidar point cloud is decomposed into 3D clusters that correspond to single trees by means of a technique called Adaptive Mean Shift Segmentation (AMS3D). The ability of the corresponding individual tree metrics (tree height, crown area and crown volume) for the estimation of above ground biomass (agb) over the 50 ha CTFS plot in Barro Colorado Island is here assessed. We conclude that our approach is able to map the agb spatial distribution with an error of nearly 12% (RMSE=28 Mg ha-1) compared with field-based estimates over 1ha plots.

  10. Persistent Phylogeny: A Galled-Tree and Integer Linear Programming Approach

    OpenAIRE

    Gusfield, Dan

    2015-01-01

    The Persistent-Phylogeny Model is an extension of the widely studied Perfect-Phylogeny Model, encompassing a broader range of evolutionary phenomena. Biological and algorithmic questions concerning persistent phylogeny have been intensely investigated in recent years. In this paper, we explore two alternative approaches to the persistent-phylogeny problem that grow out of our previous work on perfect phylogeny, and on galled trees. We develop an integer programming solution to the Persistent-...

  11. On the Usefulness of a Multilevel Logistic Regression Approach to Person-Fit Analysis

    Science.gov (United States)

    Conijn, Judith M.; Emons, Wilco H. M.; van Assen, Marcel A. L. M.; Sijtsma, Klaas

    2011-01-01

    The logistic person response function (PRF) models the probability of a correct response as a function of the item locations. Reise (2000) proposed to use the slope parameter of the logistic PRF as a person-fit measure. He reformulated the logistic PRF model as a multilevel logistic regression model and estimated the PRF parameters from this…

  12. An Alumni Oriented Approach to Sport Management Curriculum Design Using Performance Ratings and a Regression Model.

    Science.gov (United States)

    Ulrich, David; Parkhouse, Bonnie L.

    1982-01-01

    An alumni-based model is proposed as an alternative to sports management curriculum design procedures. The model relies on the assessment of curriculum by sport management alumni and uses performance ratings of employers and measures of satisfaction by alumni in a regression model to identify curriculum leading to increased work performance and…

  13. On the Usefulness of a Multilevel Logistic Regression Approach to Person-Fit Analysis

    Science.gov (United States)

    Conijn, Judith M.; Emons, Wilco H. M.; van Assen, Marcel A. L. M.; Sijtsma, Klaas

    2011-01-01

    The logistic person response function (PRF) models the probability of a correct response as a function of the item locations. Reise (2000) proposed to use the slope parameter of the logistic PRF as a person-fit measure. He reformulated the logistic PRF model as a multilevel logistic regression model and estimated the PRF parameters from this…

  14. Logic-tree Approach for Probabilistic Tsunami Hazard Analysis and its Applications to the Japanese Coasts

    Science.gov (United States)

    Annaka, Tadashi; Satake, Kenji; Sakakiyama, Tsutomu; Yanagisawa, Ken; Shuto, Nobuo

    2007-03-01

    For Probabilistic Tsunami Hazard Analysis (PTHA), we propose a logic-tree approach to construct tsunami hazard curves (relationship between tsunami height and probability of exceedance) and present some examples for Japan for the purpose of quantitative assessments of tsunami risk for important coastal facilities. A hazard curve is obtained by integration over the aleatory uncertainties, and numerous hazard curves are obtained for different branches of logic-tree representing epistemic uncertainty. A PTHA consists of a tsunami source model and coastal tsunami height estimation. We developed the logic-tree models for local tsunami sources around Japan and for distant tsunami sources along the South American subduction zones. Logic-trees were made for tsunami source zones, size and frequency of tsunamigenic earthquakes, fault models, and standard error of estimated tsunami heights. Numerical simulation rather than empirical relation was used for estimating the median tsunami heights. Weights of discrete branches that represent alternative hypotheses and interpretations were determined by the questionnaire survey for tsunami and earthquake experts, whereas those representing the error of estimated value were determined on the basis of historical data. Examples of tsunami hazard curves were illustrated for the coastal sites, and uncertainty in the tsunami hazard was displayed by 5-, 16-, 50-, 84- and 95-percentile and mean hazard curves.
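
    The way weighted logic-tree branches turn into mean and percentile hazard curves can be sketched as follows. This is an illustration under stated assumptions: each branch is represented by one hazard curve sampled at the same tsunami heights, the percentile is taken across branches at each height using cumulative branch weight, and all numbers are invented. The function names `mean_curve` and `fractile_curve` are hypothetical.

```python
def mean_curve(curves, weights):
    """Weighted mean exceedance probability at each tsunami height."""
    total = sum(weights)
    n = len(curves[0])
    return [sum(w * c[i] for c, w in zip(curves, weights)) / total
            for i in range(n)]

def fractile_curve(curves, weights, q):
    """At each height, the smallest branch value whose cumulative weight reaches q percent."""
    total = sum(weights)
    out = []
    for i in range(len(curves[0])):
        acc = 0.0
        for p, w in sorted((c[i], w) for c, w in zip(curves, weights)):
            acc += w
            if acc >= q / 100 * total - 1e-12:  # small tolerance for float sums
                out.append(p)
                break
    return out

# Illustrative branches: annual exceedance probability at heights 1, 2 and 4 m.
curves = [
    [0.10, 0.02, 0.001],
    [0.05, 0.01, 0.0005],
    [0.20, 0.05, 0.004],
]
weights = [0.2, 0.5, 0.3]

mean = mean_curve(curves, weights)
median = fractile_curve(curves, weights, 50)
p84 = fractile_curve(curves, weights, 84)
```

    The spread between the percentile curves is one way to display the epistemic uncertainty carried by the branch weights, while each individual curve already integrates over the aleatory uncertainty.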

  15. An Integrated Approach of Model checking and Temporal Fault Tree for System Safety Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Koh, Kwang Yong; Seong, Poong Hyun [Korea Advanced Institute of Science and Technology, Daejeon (Korea, Republic of)

    2009-10-15

    Digitalization of instruments and control systems in nuclear power plants offers the potential to improve plant safety and reliability through features such as increased hardware reliability and stability, and improved failure detection capability. It however makes the systems and their safety analysis more complex. Originally, safety analysis was applied to hardware system components and formal methods mainly to software. For software-controlled or digitalized systems, it is necessary to integrate both. Fault tree analysis (FTA), which has been one of the most widely used safety analysis techniques in the nuclear industry, suffers from several drawbacks as described in. In this work, to resolve the problems, FTA and model checking are integrated to provide formal, automated and qualitative assistance to informal and/or quantitative safety analysis. Our approach proposes to build a formal model of the system together with fault trees. We introduce several temporal gates based on timed computational tree logic (TCTL) to capture absolute time behaviors of the system and to give concrete semantics to fault tree gates to reduce errors during the analysis, and use model checking technique to automate the reasoning process of FTA.

  16. Contrasting regional and national mechanisms for predicting elevated arsenic in private wells across the United States using classification and regression trees.

    Science.gov (United States)

    Frederick, Logan; VanDerslice, James; Taddie, Marissa; Malecki, Kristen; Gregg, Josh; Faust, Nicholas; Johnson, William P

    2016-03-15

    Arsenic contamination in groundwater is a public health and environmental concern in the United States (U.S.), particularly where monitoring is not required under the Safe Drinking Water Act. Previous studies suggest the influence of regional mechanisms for arsenic mobilization into groundwater; however, no study has examined how influencing parameters change at a continental scale spanning multiple regions. We herein examine covariates for groundwater in the western, central and eastern U.S. regions representing mechanisms associated with arsenic concentrations exceeding the U.S. Environmental Protection Agency maximum contaminant level (MCL) of 10 parts per billion (ppb). Statistically significant covariates were identified via classification and regression tree (CART) analysis, and included hydrometeorological and groundwater chemical parameters. The CART analyses were performed at two scales: national and regional; for which three physiographic regions located in the western (Payette Section and the Snake River Plain), central (Osage Plains of the Central Lowlands), and eastern (Embayed Section of the Coastal Plains) U.S. were examined. Validity of each of the three regional CART models was indicated by values >85% for the area under the receiver-operating characteristic curve. Aridity (precipitation minus potential evapotranspiration) was identified as the primary covariate associated with elevated arsenic at the national scale. At the regional scale, aridity and pH were the major covariates in the arid to semi-arid (western) region; whereas dissolved iron (taken to represent chemically reducing conditions) and pH were major covariates in the temperate (eastern) region, although additional important covariates emerged, including elevated phosphate. Analysis in the central U.S. region indicated that elevated arsenic concentrations were driven by a mixture of those observed in the western and eastern regions.

  17. VOXEL-BASED APPROACH FOR ESTIMATING URBAN TREE VOLUME FROM TERRESTRIAL LASER SCANNING DATA

    Directory of Open Access Journals (Sweden)

    C. Vonderach

    2012-07-01

    Full Text Available The importance of single trees and the determination of related parameters has been recognized in recent years, e.g. for forest inventories or management. For urban areas an increasing interest in the data acquisition of trees can be observed concerning aspects like urban climate, CO2 balance, and environmental protection. Urban trees differ significantly from natural systems with regard to the site conditions (e.g. technogenic soils, contaminants, lower groundwater level, regular disturbance), climate (increased temperature, reduced humidity) and species composition and arrangement (habitus and health status), and therefore allometric relations cannot be transferred from natural sites to urban areas. To overcome this problem an extended approach was developed for a fast and non-destructive extraction of branch volume, DBH (diameter at breast height) and height of single trees from point clouds of terrestrial laser scanning (TLS). For data acquisition, the trees were scanned with highest scan resolution from several (up to five) positions located around the tree. The resulting point clouds (20 to 60 million points) are analysed with an algorithm based on voxel (volume elements) structure, leading to an appropriate data reduction. In a first step, two kinds of noise reduction are carried out: the elimination of isolated voxels as well as voxels with marginal point density. To obtain correct volume estimates, the voxels inside the stem and branches (interior voxels), where voxels contain no laser points, must be regarded. For this filling process, an easy and robust approach was developed based on a layer-wise (horizontal layers of the voxel structure) intersection of four orthogonal viewing directions. However, this procedure also generates several erroneous "phantom" voxels, which have to be eliminated. For this purpose the previous approach was extended by a special region growing algorithm. 
In a final step the volume is determined layer-wise based on the
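
    The voxelization and noise-reduction steps described in this abstract can be sketched in a few lines. This is a simplified illustration, not the authors' algorithm: the voxel size, the density threshold `min_points`, and the 26-neighbour definition of "isolated" are assumptions, marginal-density voxels are dropped before the isolation check, and the interior filling and region growing steps are not shown.

```python
from collections import Counter
from itertools import product

def voxelize(points, size):
    """Count laser points per voxel; keys are integer (i, j, k) indices."""
    return Counter(tuple(int(c // size) for c in p) for p in points)

def denoise(voxels, min_points=2):
    """Drop voxels with marginal point density, then voxels with no occupied neighbour."""
    dense = {v for v, n in voxels.items() if n >= min_points}
    offsets = [o for o in product((-1, 0, 1), repeat=3) if o != (0, 0, 0)]

    def has_neighbour(v):
        return any((v[0] + dx, v[1] + dy, v[2] + dz) in dense
                   for dx, dy, dz in offsets)

    return {v for v in dense if has_neighbour(v)}

# Illustrative points only (metres); voxel edge length 0.5 m.
points = [
    (0.1, 0.1, 0.1), (0.2, 0.3, 0.4),   # voxel (0, 0, 0)
    (0.1, 0.2, 0.6), (0.3, 0.4, 0.9),   # voxel (0, 0, 1)
    (5.1, 5.2, 5.3), (5.2, 5.3, 5.4),   # voxel (10, 10, 10), isolated
    (0.2, 0.7, 0.2),                    # voxel (0, 1, 0), single point
]
size = 0.5
kept = denoise(voxelize(points, size))
surface_volume = len(kept) * size ** 3   # volume of retained voxels only
```

    The volume computed here counts only voxels that contain laser returns, so it would underestimate stem and branch volume until interior voxels are filled as the abstract describes.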

  18. A conceptual approach to approximate tree root architecture in infinite slope models

    Science.gov (United States)

    Schmaltz, Elmar; Glade, Thomas

    2016-04-01

    Vegetation-related properties - particularly tree root distribution and coherent hydrologic and mechanical effects on the underlying soil mantle - are commonly not considered in infinite slope models. Indeed, from a geotechnical point of view, these effects appear to be difficult to reproduce reliably in a physically-based modelling approach. The growth of a tree and the expansion of its root architecture are directly connected with both intrinsic properties such as species and age, and extrinsic factors like topography, availability of nutrients, climate and soil type. These parameters control four main issues of the tree root architecture: 1) Type of rooting; 2) maximum growing distance to the tree stem (radius r); 3) maximum growing depth (height h); and 4) potential deformation of the root system. Geometric solids are able to approximate the distribution of a tree root system. The objective of this paper is to investigate whether it is possible to implement root systems and the connected hydrological and mechanical attributes sufficiently in a 3-dimensional slope stability model. Hereby, a spatio-dynamic vegetation module should cope with the demands of performance, computation time and significance. However, in this presentation, we focus only on the distribution of roots. The assumption is that the horizontal root distribution around a tree stem on a 2-dimensional plane can be described by a circle with the stem located at the centroid and a distinct radius r that is dependent on age and species. We classified three main types of tree root systems and reproduced the species-age-related root distribution with three respective mathematical solids in a synthetic 3-dimensional hillslope ambience. Thus, two solids in a Euclidean space were distinguished to represent the three root systems: i) cylinders with radius r and height h, whilst the dimension of the latter defines the shape of a taproot-system or a shallow-root-system respectively; ii) elliptic

  19. Three approaches to deal with inconsistent decision tables - Comparison of decision tree complexity

    KAUST Repository

    Azad, Mohammad

    2013-01-01

    In inconsistent decision tables, there are groups of rows with equal values of conditional attributes and different decisions (values of the decision attribute). We study three approaches to deal with such tables. Instead of a group of equal rows, we consider one row given by values of conditional attributes and we attach to this row: (i) the set of all decisions for rows from the group (many-valued decision approach); (ii) the most common decision for rows from the group (most common decision approach); and (iii) the unique code of the set of all decisions for rows from the group (generalized decision approach). We present experimental results and compare the depth, average depth and number of nodes of decision trees constructed by a greedy algorithm in the framework of each of the three approaches. © 2013 Springer-Verlag.
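
    The three ways of collapsing a group of equal rows can be made concrete with a short sketch. It assumes a table is a list of (conditional-attribute tuple, decision) pairs; the tie-breaking rule for the most common decision and the numbering used for the generalized-decision codes are assumptions, since the abstract does not specify them. The decision-tree construction itself is not shown.

```python
from collections import Counter, defaultdict

def group_rows(rows):
    """Collect the decisions of all rows that share the same attribute values."""
    groups = defaultdict(list)
    for attrs, decision in rows:
        groups[attrs].append(decision)
    return groups

def many_valued(rows):
    """(i) Attach the set of all decisions found in the group."""
    return {a: frozenset(d) for a, d in group_rows(rows).items()}

def most_common(rows):
    """(ii) Attach the most frequent decision; ties go to the smallest (assumption)."""
    out = {}
    for a, d in group_rows(rows).items():
        counts = Counter(d)
        best = max(counts.values())
        out[a] = min(x for x, n in counts.items() if n == best)
    return out

def generalized(rows):
    """(iii) Attach one code per distinct decision set."""
    sets = many_valued(rows)
    distinct = sorted(set(sets.values()), key=sorted)
    codes = {s: i for i, s in enumerate(distinct)}
    return {a: codes[s] for a, s in sets.items()}

# Illustrative inconsistent table: rows (0, 1) and (1, 0) carry conflicting decisions.
rows = [((0, 1), 1), ((0, 1), 2), ((0, 1), 2),
        ((1, 1), 3),
        ((1, 0), 1), ((1, 0), 2)]

mv = many_valued(rows)
mc = most_common(rows)
gd = generalized(rows)
```

    Note that rows (0, 1) and (1, 0) receive the same generalized code because they share the decision set {1, 2}, whereas the most-common-decision approach separates them.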

  20. Direct Marketing and the Structure of Farm Sales: An Unconditional Quantile Regression Approach

    OpenAIRE

    Park, Timothy A.

    2015-01-01

    This paper examines the impact of participation in direct marketing on the entire distribution of farm sales using the unconditional quantile regression (UQR) estimator. Our analysis yields unbiased estimates of the unconditional impact of direct marketing on farm sales and reveals the heterogeneous effects that occur across the distribution of farm sales. The impacts of direct marketing efforts are uniformly negative across the UQR results, but declines in sales tend to grow smaller as sales...

  1. Healthcare Expenditures Associated with Depression Among Individuals with Osteoarthritis: Post-Regression Linear Decomposition Approach.

    Science.gov (United States)

    Agarwal, Parul; Sambamoorthi, Usha

    2015-12-01

    Depression is common among individuals with osteoarthritis and leads to increased healthcare burden. The objective of this study was to examine excess total healthcare expenditures associated with depression among individuals with osteoarthritis in the US. Adults with self-reported osteoarthritis (n = 1881) were identified using data from the 2010 Medical Expenditure Panel Survey (MEPS). Among those with osteoarthritis, chi-square tests and ordinary least square regressions (OLS) were used to examine differences in healthcare expenditures between those with and without depression. Post-regression linear decomposition technique was used to estimate the relative contribution of different constructs of the Anderson's behavioral model, i.e., predisposing, enabling, need, personal healthcare practices, and external environment factors, to the excess expenditures associated with depression among individuals with osteoarthritis. All analysis accounted for the complex survey design of MEPS. Depression coexisted among 20.6 % of adults with osteoarthritis. The average total healthcare expenditures were $13,684 among adults with depression compared to $9284 among those without depression. Multivariable OLS regression revealed that adults with depression had 38.8 % higher healthcare expenditures (p regression linear decomposition analysis indicated that 50 % of differences in expenditures among adults with and without depression can be explained by differences in need factors. Among individuals with coexisting osteoarthritis and depression, excess healthcare expenditures associated with depression were mainly due to comorbid anxiety, chronic conditions and poor health status. These expenditures may potentially be reduced by providing timely intervention for need factors or by providing care under a collaborative care model.

  2. Direct Marketing and the Structure of Farm Sales: An Unconditional Quantile Regression Approach

    OpenAIRE

    Park, Timothy A.

    2015-01-01

    This paper examines the impact of participation in direct marketing on the entire distribution of farm sales using the unconditional quantile regression (UQR) estimator. Our analysis yields unbiased estimates of the unconditional impact of direct marketing on farm sales and reveals the heterogeneous effects that occur across the distribution of farm sales. The impacts of direct marketing efforts are uniformly negative across the UQR results, but declines in sales tend to grow smaller as sales...

  3. Breaking the waves: a poisson regression approach to Schumpeterian clustering of basic innovations

    OpenAIRE

    Silverberg, G.P.; Verspagen, B.

    2000-01-01

    The Schumpeterian theory of long waves has given rise to an intense debate on the existence of clusters of basic innovations. Silverberg and Lehnert have criticized the empirical part of this literature on several methodological accounts. In this paper, we propose the methodology of Poisson regression as a logical way to incorporate this criticism. We construct a new time series for basic innovations (based on previously used time series), and use this to test the hypothesis that basic innovations...

  4. Sequence Comparison Alignment-Free Approach Based on Suffix Tree and L-Words Frequency

    Directory of Open Access Journals (Sweden)

    Inês Soares

    2012-01-01

    Full Text Available The vast majority of methods available for sequence comparison rely on a first sequence alignment step, which requires a number of assumptions on evolutionary history and is sometimes very difficult or impossible to perform due to the abundance of gaps (insertions/deletions). In such cases, an alternative alignment-free method would prove valuable. Our method starts with the computation of a generalized suffix tree of all sequences, which is completed in linear time. Using this tree, the frequency of all possible words with a preset length L (L-words) in each sequence is rapidly calculated. Based on the L-words frequency profile of each sequence, a pairwise standard Euclidean distance is then computed, producing a symmetric genetic distance matrix, which can be used to generate a neighbor joining dendrogram or a multidimensional scaling graph. We present an improvement to word counting alignment-free approaches for sequence comparison, by determining a single optimal word length and combining suffix tree structures with the word counting tasks. Our approach is, thus, a fast and simple application that proved to be efficient and powerful when applied to mitochondrial genomes. The algorithm was implemented in the Python language and is freely available on the web.
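The word-counting and distance steps described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it counts L-words with a plain sliding window instead of a generalized suffix tree, it assumes a DNA alphabet and relative (rather than absolute) frequencies, and the function names `lword_profile`, `euclidean` and `distance_matrix` are hypothetical.

```python
from collections import Counter
from itertools import product
from math import sqrt


def lword_profile(seq, L, alphabet="ACGT"):
    """Relative frequency of every possible L-word, in a fixed word order.

    Assumption: a sliding window replaces the suffix tree used in the paper;
    the resulting counts are the same, only the running time differs.
    """
    n_windows = max(len(seq) - L + 1, 1)
    counts = Counter(seq[i:i + L] for i in range(len(seq) - L + 1))
    words = ("".join(p) for p in product(alphabet, repeat=L))
    return [counts[w] / n_windows for w in words]


def euclidean(p, q):
    """Standard Euclidean distance between two equal-length profiles."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def distance_matrix(seqs, L):
    """Symmetric pairwise distance matrix over the L-word profiles."""
    profiles = [lword_profile(s, L) for s in seqs]
    return [[euclidean(p, q) for q in profiles] for p in profiles]


if __name__ == "__main__":
    # Toy sequences, made up for illustration only.
    for row in distance_matrix(["ACGTACGT", "ACGTACGA", "TTTTTTTT"], L=2):
        print(row)
```

The resulting matrix could then be passed to any neighbor joining or multidimensional scaling routine; the choice of L is left open here, whereas the abstract reports determining a single optimal word length.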

  5. A Consensus Tree Approach for Reconstructing Human Evolutionary History and Detecting Population Substructure

    Science.gov (United States)

    Tsai, Ming-Chi; Blelloch, Guy; Ravi, R.; Schwartz, Russell

    The random accumulation of variations in the human genome over time implicitly encodes a history of how human populations have arisen, dispersed, and intermixed since we emerged as a species. Reconstructing that history is a challenging computational and statistical problem but has important applications both to basic research and to the discovery of genotype-phenotype correlations. In this study, we present a novel approach to inferring human evolutionary history from genetic variation data. Our approach uses the idea of consensus trees, a technique generally used to reconcile species trees from divergent gene trees, adapting it to the problem of finding the robust relationships within a set of intraspecies phylogenies derived from local regions of the genome. We assess the quality of the method on two large-scale genetic variation data sets: the HapMap Phase II and the Human Genome Diversity Project. Qualitative comparison to a consensus model of the evolution of modern human population groups shows that our inferences closely match our best current understanding of human evolutionary history. A further comparison with results of a leading method for the simpler problem of population substructure assignment verifies that our method provides comparable accuracy in identifying meaningful population subgroups in addition to inferring the relationships among them.

  6. Analysis of sparse data in logistic regression in medical research: A newer approach

    Directory of Open Access Journals (Sweden)

    S Devika

    2016-01-01

    Full Text Available Background and Objective: In the analysis of a dichotomous response variable, logistic regression is usually used. However, the performance of logistic regression in the presence of sparse data is questionable. In such a situation, a common problem is the presence of high odds ratios (ORs) with very wide 95% confidence intervals (CIs) (OR: >999.999, 95% CI: 999.999). In this paper, we addressed this issue by using the penalized logistic regression (PLR) method. Materials and Methods: Data from a case-control study on hyponatremia and hiccups conducted in Christian Medical College, Vellore, Tamil Nadu, India were used. The outcome variable was the presence/absence of hiccups and the main exposure variable was the status of hyponatremia. A simulation dataset was created with different sample sizes and with a different number of covariates. Results: A total of 23 cases and 50 controls were used for the analysis of ordinary and PLR methods. The main exposure variable hyponatremia was present in nine (39.13%) of the cases and in four (8.0%) of the controls. Of the 23 hiccup cases, all were males and among the controls, 46 (92.0%) were males. Thus, the complete separation between gender and the disease group led to an infinite OR with 95% CI (OR: >999.999, 95% CI: 999.999) whereas there was a finite and consistent regression coefficient for gender (OR: 5.35; 95% CI: 0.42, 816.48) using PLR. After adjusting for all the confounding variables, hyponatremia entailed 7.9 (95% CI: 2.06, 38.86) times higher risk for the development of hiccups as was found using PLR, whereas there was an overestimation of risk, OR: 10.76 (95% CI: 2.17, 53.41), using the conventional method. The simulation experiment shows that the estimated coverage probability of this method is near the nominal level of 95% even for small sample sizes and for a large number of covariates. Conclusions: PLR is almost equal to the ordinary logistic regression when the sample size is large and is superior in small cell

  7. Modifiable risk factors predicting major depressive disorder at four year follow-up: a decision tree approach

    Directory of Open Access Journals (Sweden)

    Christensen Helen

    2009-11-01

    Full Text Available Background: Relative to physical health conditions such as cardiovascular disease, little is known about risk factors that predict the prevalence of depression. The present study investigates the expected effects of a reduction of these risks over time, using the decision tree method favoured in assessing cardiovascular disease risk. Methods: The PATH through Life cohort was used for the study, comprising 2,105 20-24 year olds, 2,323 40-44 year olds and 2,177 60-64 year olds sampled from the community in the Canberra region, Australia. A decision tree methodology was used to predict the presence of major depressive disorder after four years of follow-up. The decision tree was compared with a logistic regression analysis using ROC curves. Results: The decision tree was found to distinguish and delineate a wide range of risk profiles. Previous depressive symptoms were most highly predictive of depression after four years; however, modifiable risk factors such as substance use and employment status played significant roles in assessing the risk of depression. The decision tree was found to have better sensitivity and specificity than a logistic regression using identical predictors. Conclusion: The decision tree method was useful in assessing the risk of major depressive disorder over four years. Application of the model to the development of a predictive tool for tailored interventions is discussed.

  8. The Application of Classification and Regression Trees for the Triage of Women for Referral to Colposcopy and the Estimation of Risk for Cervical Intraepithelial Neoplasia: A Study Based on 1625 Cases with Incomplete Data from Molecular Tests

    Directory of Open Access Journals (Sweden)

    Abraham Pouliakis

    2015-01-01

    Full Text Available Objective. Nowadays numerous ancillary techniques detecting HPV DNA and mRNA compete with cytology; however, no perfect test exists. In this study we evaluated classification and regression trees (CARTs) for the production of triage rules and to estimate the risk for cervical intraepithelial neoplasia (CIN) in cases with ASCUS+ in cytology. Study Design. We used 1625 cases. In contrast to other approaches we used missing data to increase the data volume, obtain more accurate results, and simulate real conditions in the everyday practice of gynecologic clinics and laboratories. The proposed CART was based on the cytological result, HPV DNA typing, HPV mRNA detection based on NASBA and flow cytometry, p16 immunocytochemical expression, and finally age and parous status. Results. Algorithms useful for the triage of women were produced; gynecologists could apply these in conjunction with available examination results and arrive at an estimate of the risk for a woman to harbor CIN, expressed as a probability. Conclusions. The most important test was the cytological examination; however, the CART handled cases with inadequate cytological outcome and increased the diagnostic accuracy by exploiting the results of ancillary techniques even if there were inadequate missing data. The CART performance was better than any other single test involved in this study.

  9. Decision tree approach for classification of remotely sensed satellite data using open source support

    Indian Academy of Sciences (India)

    Richa Sharma; Aniruddha Ghosh; P K Joshi

    2013-10-01

    In this study, an attempt has been made to develop a decision tree classification (DTC) algorithm for classification of remotely sensed satellite data (Landsat TM) using open source support. The decision tree is constructed by recursively partitioning the spectral distribution of the training dataset using WEKA, an open source data mining software package. The classified image is compared with the image classified using the classical ISODATA clustering and Maximum Likelihood Classifier (MLC) algorithms. The classification result based on the DTC method provided better visual depiction than results produced by the ISODATA clustering or MLC algorithms. The overall accuracy was found to be 90% (kappa = 0.88) using the DTC, 76.67% (kappa = 0.72) using the Maximum Likelihood Classifier and 57.5% (kappa = 0.49) using the ISODATA clustering method. Based on the overall accuracy and kappa statistics, DTC was found to be the preferred classification approach among the three.
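The two summary statistics quoted in this abstract, overall accuracy and kappa, are both derived from a confusion matrix. The sketch below shows the usual Cohen's kappa calculation, assuming rows are reference classes and columns are predicted classes; the function name and the example matrix are made up for illustration and are not data from the study.

```python
def accuracy_and_kappa(confusion):
    """Overall accuracy and Cohen's kappa from a square confusion matrix.

    confusion[i][j] = number of samples of reference class i assigned to
    class j (an assumed layout; transposing the matrix gives the same result).
    """
    total = sum(sum(row) for row in confusion)
    k = len(confusion)
    # Observed agreement: fraction of samples on the diagonal.
    p_observed = sum(confusion[i][i] for i in range(k)) / total
    # Chance agreement: product of row and column marginals, summed over classes.
    row_sums = [sum(row) for row in confusion]
    col_sums = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    p_chance = sum(r * c for r, c in zip(row_sums, col_sums)) / (total * total)
    kappa = (p_observed - p_chance) / (1 - p_chance)
    return p_observed, kappa


if __name__ == "__main__":
    # Hypothetical two-class confusion matrix.
    acc, kappa = accuracy_and_kappa([[20, 5], [10, 15]])
    print(f"overall accuracy = {acc:.2f}, kappa = {kappa:.2f}")
```

Kappa discounts the agreement expected by chance, which is why a classifier can show a sizeable gap between its overall accuracy and its kappa value.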

  10. Identifying Risk and Protective Factors in Recidivist Juvenile Offenders: A Decision Tree Approach

    Science.gov (United States)

    Ortega-Campos, Elena; García-García, Juan; Gil-Fenoy, Maria José; Zaldívar-Basurto, Flor

    2016-01-01

    Research on juvenile justice aims to identify profiles of risk and protective factors in juvenile offenders. This paper presents a study of profiles of risk factors that influence young offenders toward committing sanctionable antisocial behavior (S-ASB). Decision tree analysis is used as a multivariate approach to the phenomenon of repeated sanctionable antisocial behavior in juvenile offenders in Spain. The study sample was made up of the set of juveniles who were charged in a court case in the Juvenile Court of Almeria (Spain). The period of study of recidivism was two years from the baseline. The object of study is presented, through the implementation of a decision tree. Two profiles of risk and protective factors are found. Risk factors associated with higher rates of recidivism are antisocial peers, age at baseline S-ASB, problems in school and criminality in family members. PMID:27611313

  11. Antibiotic Resistances in Livestock: A Comparative Approach to Identify an Appropriate Regression Model for Count Data

    Directory of Open Access Journals (Sweden)

    Anke Hüls

    2017-05-01

    Full Text Available Antimicrobial resistance in livestock is a matter of general concern. To develop hygiene measures and methods for resistance prevention and control, epidemiological studies on a population level are needed to detect factors associated with antimicrobial resistance in livestock holdings. In general, regression models are used to describe these relationships between environmental factors and resistance outcome. Besides the study design, the correlation structures of the different outcomes of antibiotic resistance and structural zero measurements on the resistance outcome as well as on the exposure side are challenges for the epidemiological model building process. The use of appropriate regression models that acknowledge these complexities is essential to assure valid epidemiological interpretations. The aims of this paper are (i) to explain the model building process comparing several competing models for count data (negative binomial model, quasi-Poisson model, zero-inflated model, and hurdle model) and (ii) to compare these models using data from a cross-sectional study on antibiotic resistance in animal husbandry. These goals are essential to evaluate which model is most suitable to identify potential prevention measures. The dataset used as an example in our analyses was generated initially to study the prevalence and associated factors for the appearance of cefotaxime-resistant Escherichia coli in 48 German fattening pig farms. For each farm, the outcome was the count of samples with resistant bacteria. There was almost no overdispersion and only moderate evidence of excess zeros in the data. Our analyses show that it is essential to evaluate regression models in studies analyzing the relationship between environmental factors and antibiotic resistances in livestock. After model comparison based on evaluation of model predictions, Akaike information criterion, and Pearson residuals, here the hurdle model was judged to be the most appropriate

  12. Risk assessment for enterprise resource planning (ERP) system implementations: a fault tree analysis approach

    Science.gov (United States)

    Zeng, Yajun; Skibniewski, Miroslaw J.

    2013-08-01

    Enterprise resource planning (ERP) system implementations are often characterised by large capital outlay, long implementation duration, and high risk of failure. In order to avoid ERP implementation failure and realise the benefits of the system, sound risk management is key. This paper proposes a probabilistic risk assessment approach for ERP system implementation projects based on fault tree analysis, which models the relationship between ERP system components and specific risk factors. Unlike traditional risk management approaches that have been mostly focused on meeting project budget and schedule objectives, the proposed approach intends to address the risks that may cause ERP system usage failure. The approach can be used to identify the root causes of ERP system implementation usage failure and quantify the impact of critical component failures or critical risk events in the implementation process.

  13. Regional trends in short-duration precipitation extremes: a flexible multivariate monotone quantile regression approach

    Science.gov (United States)

    Cannon, Alex

    2017-04-01

    Estimating historical trends in short-duration rainfall extremes at regional and local scales is challenging due to low signal-to-noise ratios and the limited availability of homogenized observational data. In addition to being of scientific interest, trends in rainfall extremes are of practical importance, as their presence calls into question the stationarity assumptions that underpin traditional engineering and infrastructure design practice. Even with these fundamental challenges, increasingly complex questions are being asked about time series of extremes. For instance, users may not only want to know whether or not rainfall extremes have changed over time, they may also want information on the modulation of trends by large-scale climate modes or on the nonstationarity of trends (e.g., identifying hiatus periods or periods of accelerating positive trends). Efforts have thus been devoted to the development and application of more robust and powerful statistical estimators for regional and local scale trends. While a standard nonparametric method like the regional Mann-Kendall test, which tests for the presence of monotonic trends (i.e., strictly non-decreasing or non-increasing changes), makes fewer assumptions than parametric methods and pools information from stations within a region, it is not designed to visualize detected trends, include information from covariates, or answer questions about the rate of change in trends. As a remedy, monotone quantile regression (MQR) has been developed as a nonparametric alternative that can be used to estimate a common monotonic trend in extremes at multiple stations. Quantile regression makes efficient use of data by directly estimating conditional quantiles based on information from all rainfall data in a region, i.e., without having to precompute the sample quantiles. The MQR method is also flexible and can be used to visualize and analyze the nonlinearity of the detected trend. However, it is fundamentally a

  14. Modeling geochemical datasets for source apportionment: Comparison of least square regression and inversion approaches.

    Digital Repository Service at National Institute of Oceanography (India)

    Tripathy, G.R.; Das, Anirban.

    -determined linear system, where the measured chemical data (b_i; i = 1 to m) of the system are related to the chemical composition of its possible sources/end-members (a_ij; i = 1 to n; j = 1 to m) and their relative contributions to the mixture (x_i; i = 1 to m... estimates when the IM-derived a-posteriori are used for source composition in the LSR. The slope of the error-weighted regression relation between the LSR results (with a-posteriori inputs) and the IM ones is found to be 0.89 ± 0.19, indistinguishable from...

  15. An Approach to Indexing and Retrieval of Spatial Data with Reduced R+ Tree and K-NN Query Algorithm

    Directory of Open Access Journals (Sweden)

    S. Palaniappan

    2015-05-01

    Full Text Available Spatial databases have been extensively adopted in the recent decade, and various methods have been presented to store, browse, search and retrieve spatial objects. In this study, a method is proposed for retrieving nearest neighbors from spatial data indexed by an R+ tree. The approach uses a reduced R+ tree to represent the spatial data. Initially the spatial data is selected and the R+ tree is constructed accordingly. Then a function called joining nodes is applied to reduce the number of nodes by combining half-filled nodes to form completely filled nodes. The idea behind reducing the nodes is to perform search and retrieval quickly and efficiently. The reduced R+ tree is then processed with the KNN query algorithm to fetch the nearest neighbors to a point query. The basic procedures of the KNN algorithm are used in the proposed approach for retrieving the nearest neighbors. The proposed approach is evaluated for its performance with spatial data, and results are plotted in the experimental analysis section. The experimental results showed that the proposed approach performs remarkably better than the conventional methods. The maximum time required to index the 1000 data points by the R+ tree is 10324 ms. The number of nodes possessed by the reduced R+ tree is also lower for 1000 data points as compared to the conventional R+ tree algorithm.

  16. Statistical Downscaling Output GCM Modeling with Continuum Regression and Pre-Processing PCA Approach

    Directory of Open Access Journals (Sweden)

    Sutikno Sutikno

    2010-08-01

    Full Text Available One of the climate models used to predict climatic conditions is the Global Circulation Model (GCM). A GCM is a computer-based model that consists of different equations. It uses numerical and deterministic equations that follow the rules of physics. The GCM is a main tool for predicting climate and weather, and it is also used as a primary information source for reviewing the effects of climate change. The Statistical Downscaling (SD) technique is used to bridge the large-scale GCM with a small scale (the study area). GCM data are spatial and temporal data in which spatial correlation between different data on the grid in a single domain is likely to occur. The resulting multicollinearity problems require pre-processing of the predictor data X. Continuum Regression (CR) with Principal Component Analysis (PCA) pre-processing is an alternative for SD modelling. CR is a method that was developed by Stone and Brooks (1990). It is a generalization of the Ordinary Least Square (OLS), Principal Component Regression (PCR) and Partial Least Square (PLS) methods, and is used to overcome multicollinearity problems. Data processing for the stations in Ambon, Pontianak, Losarang, Indramayu and Yuntinyuat shows that, in terms of RMSEP and predicted R2 in the 8x8 and 12x12 domains, the CR method produces better results than PCR and PLS.

  17. Predicting agility performance with other performance variables in pubescent boys: a multiple-regression approach.

    Science.gov (United States)

    Sekulic, Damir; Spasic, Miodrag; Esco, Michael R

    2014-04-01

    The goal was to investigate the influence of balance, jumping power, reactive-strength, speed, and morphological variables on five different agility performances in early pubescent boys (N = 71). The predictors included body height and mass, countermovement and broad jumps, overall stability index, 5 m sprint, and bilateral side jumps test of reactive strength. Forward stepwise regressions calculated on 36 randomly selected participants explained 47% of the variance in performance of the forward-backward running test, 50% of the 180 degrees turn test, 55% of the 20 yd. shuttle test, 62% of the T-shaped course test, and 44% of the zig-zag test, with the bilateral side jumps as the single best predictor. Regression models were cross-validated using the second half of the sample (n = 35). Correlation between predicted and achieved scores did not provide statistically significant validation statistics for the continuous-movement zig-zag test. Further study is needed to assess other predictors of agility in early pubescent boys.

  18. Classification and Regression Tree Analysis of Clinical Patterns to Predict the Survival of Patients with Advanced Non-small Cell Lung Cancer Treated with Erlotinib

    Directory of Open Access Journals (Sweden)

    Yutao LIU

    2011-10-01

    Full Text Available Background and objective: Erlotinib is a targeted therapy drug for non-small cell lung cancer (NSCLC). There is evidence of varying survival benefits derived from erlotinib in patients with different clinical features, but the results are conflicting. The aim of this study is to identify novel predictive factors and explore the interactions between clinical variables as well as their impact on the survival of Chinese patients with advanced NSCLC heavily treated with erlotinib. Methods: The clinical and follow-up data of 105 Chinese NSCLC patients referred to the Cancer Hospital and Institute, Chinese Academy of Medical Sciences from September 2006 to September 2009 were analyzed. Multivariate analysis of progression-free survival (PFS) was performed using recursive partitioning, referred to as classification and regression tree (CART) analysis. Results: The median PFS of 105 eligible consecutive Chinese NSCLC patients was 5.0 months (95%CI: 2.9-7.1). CART analysis made its initial, second, and third splits on lymph node involvement, the time of erlotinib administration, and smoking history, respectively. Four terminal subgroups were formed. The longer values for the median PFS were 11.0 months (95%CI: 8.9-13.1) for the subgroup with no lymph node metastasis and 10.0 months (95%CI: 7.9-12.1) for the subgroup with lymph node involvement, but not over the second-line erlotinib treatment, with a smoking history ≤35 packs per year. The shorter values for the median PFS were 2.3 months (95%CI: 1.6-3.0) for the subgroup with lymph node metastasis and over the second-line erlotinib treatment, and 1.3 months (95%CI: 0.5-2.1) for the subgroup with lymph node metastasis, but not over the second-line erlotinib treatment, with a smoking history >35 packs per year. Conclusion: Lymph node metastasis, the time of erlotinib administration, and smoking history are closely correlated with the survival of advanced NSCLC patients with first- to

  19. A quantile regression approach can reveal the effect of fruit and vegetable consumption on plasma homocysteine levels.

    Directory of Open Access Journals (Sweden)

    Eliseu Verly-Jr

    Full Text Available A reduction in homocysteine concentration due to the use of supplemental folic acid is well recognized, although evidence of the same effect for natural folate sources, such as fruits and vegetables (FV), is lacking. The traditional statistical analysis approaches do not provide further information. As an alternative, quantile regression allows for the exploration of the effects of covariates through percentiles of the conditional distribution of the dependent variable. To investigate how the associations of FV intake with plasma total homocysteine (tHcy) differ through percentiles in the distribution using quantile regression. A cross-sectional population-based survey was conducted among 499 residents of Sao Paulo City, Brazil. The participants provided food intake and fasting blood samples. Fruit and vegetable intake was predicted by adjusting for day-to-day variation using a proper measurement error model. We performed a quantile regression to verify the association between tHcy and the predicted FV intake. The predicted values of tHcy for each percentile model were calculated considering an increase of 200 g in the FV intake for each percentile. The results showed that tHcy was inversely associated with FV intake when assessed by linear regression, whereas the association was different when using quantile regression. The relationship with FV consumption was inverse and significant for almost all percentiles of tHcy. The coefficients increased as the percentile of tHcy increased. A simulated increase of 200 g in the FV intake could decrease the tHcy levels in the overall percentiles, but the higher percentiles of tHcy benefited more. This study confirms that the effect of FV intake on lowering the tHcy levels is dependent on the level of tHcy using an innovative statistical approach. From a public health point of view, encouraging people to increase FV intake would benefit people with high levels of tHcy.
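Quantile regression, as used in this abstract, replaces the squared-error loss of ordinary regression with an asymmetric "pinball" (check) loss, so that each percentile gets its own fit. The sketch below covers only the simplest case, a model with no covariates, where minimizing the pinball loss returns a sample quantile; a full quantile regression with covariates would normally be solved by linear programming in a statistics package. The function names and the toy data are hypothetical and unrelated to the study.

```python
def pinball_loss(residual, tau):
    """Check loss for quantile level tau (0 < tau < 1).

    Positive residuals (observation above the fit) are weighted by tau,
    negative residuals by (1 - tau).
    """
    return tau * residual if residual >= 0 else (tau - 1) * residual


def constant_quantile_fit(y, tau):
    """Constant c minimizing the total pinball loss over the data.

    For a constant-only model a minimizer always lies on a data point,
    so it is enough to try each observed value.
    """
    return min(sorted(y), key=lambda c: sum(pinball_loss(v - c, tau) for v in y))


if __name__ == "__main__":
    # Toy data with one extreme value, made up for illustration.
    data = [1, 2, 3, 4, 100]
    print(constant_quantile_fit(data, 0.5))  # the median of the toy data
    print(constant_quantile_fit(data, 0.9))  # an upper quantile
```

Because the loss weights depend on tau, the fitted value moves toward the upper tail as tau increases, which is the mechanism that lets covariate effects differ across percentiles of the outcome.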

  20. Mapping trees outside forests using high-resolution aerial imagery: a comparison of pixel- and object-based classification approaches.

    Science.gov (United States)

    Meneguzzo, Dacia M; Liknes, Greg C; Nelson, Mark D

    2013-08-01

    Discrete trees and small groups of trees in nonforest settings are considered an essential resource around the world and are collectively referred to as trees outside forests (ToF). ToF provide important functions across the landscape, such as protecting soil and water resources, providing wildlife habitat, and improving farmstead energy efficiency and aesthetics. Despite the significance of ToF, forest and other natural resource inventory programs and geospatial land cover datasets that are available at a national scale do not include comprehensive information regarding ToF in the United States. Additional ground-based data collection and acquisition of specialized imagery to inventory these resources are expensive alternatives. As a potential solution, we identified two remote sensing-based approaches that use free high-resolution aerial imagery from the National Agriculture Imagery Program (NAIP) to map all tree cover in an agriculturally dominant landscape. We compared the results obtained using an unsupervised per-pixel classifier (independent component analysis-[ICA]) and an object-based image analysis (OBIA) procedure in Steele County, Minnesota, USA. Three types of accuracy assessments were used to evaluate how each method performed in terms of: (1) producing a county-level estimate of total tree-covered area, (2) correctly locating tree cover on the ground, and (3) how tree cover patch metrics computed from the classified outputs compared to those delineated by a human photo interpreter. Both approaches were found to be viable for mapping tree cover over a broad spatial extent and could serve to supplement ground-based inventory data. The ICA approach produced an estimate of total tree cover more similar to the photo-interpreted result, but the output from the OBIA method was more realistic in terms of describing the actual observed spatial pattern of tree cover.

  1. A Classification Regression Tree Analysis to Reduce Balance Impairments and Falls in the Older population: Impact on Resource Utilization and Clinical Decision-Making in USA Rehabilitation Service Delivery

    Directory of Open Access Journals (Sweden)

    Lucinda Pfalzer

    2013-06-01

    Full Text Available Background/Purpose: Over 1/3 of adults over age 65 experience at least one fall each year. This pilot report uses a classification regression tree analysis (CART) to model the outcomes for balance/risk of falls from the Gentiva® Safe Strides® Program (SSP). Methods/Outcomes: SSP is a home-based balance/fall prevention program designed to treat root causes of a patient

  2. A Principal Component Regression Approach for Estimating Ventricular Repolarization Duration Variability

    Directory of Open Access Journals (Sweden)

    Pasi A. Karjalainen

    2007-01-01

    Full Text Available Ventricular repolarization duration (VRD) is affected by heart rate and autonomic control, and thus VRD varies in time in a similar way as heart rate. VRD variability is commonly assessed by determining the time differences between successive R- and T-waves, that is, RT intervals. Traditional methods for RT interval detection necessitate the detection of either T-wave apexes or offsets. In this paper, we propose a principal-component-regression (PCR)-based method for estimating RT variability. The main benefit of the method is that it does not necessitate T-wave detection. The proposed method is compared with traditional RT interval measures, and as a result, it is observed to estimate RT variability accurately and to be less sensitive to noise than the traditional methods. As a specific application, the method is applied to exercise electrocardiogram (ECG) recordings.

  3. EFFECT OF HUMAN CAPITAL ON MAIZE PRODUCTIVITY IN GHANA: A QUANTILE REGRESSION APPROACH

    Directory of Open Access Journals (Sweden)

    Isaac Nyamekye

    2016-04-01

    Full Text Available Agriculture continues to play an important role in the economy of most African countries. Thus, productivity growth in agriculture is necessary for economic growth and poverty reduction of the region. While, theoretically, investing in human capital improves productivity, the empirical evidence is somewhat mixed, especially in developing countries. In Ghana, maize is associated with household food security, and low-income households are considered food insecure if they have no maize in stock. But, due to low productivity, Ghanaian farmers are yet to produce enough to meet local demand. Using quantile and OLS regression techniques, this study contributes to the literature on human capital and productivity by assessing the effect of human capital (captured by education, farming experience and access to extension services) on maize productivity in Ghana. The results suggest that although human capital has no significant effect on maize yields, its effect on productivity varies across quantiles.

  4. In search of a corrected prescription drug elasticity estimate: a meta-regression approach.

    Science.gov (United States)

    Gemmill, Marin C; Costa-Font, Joan; McGuire, Alistair

    2007-06-01

    An understanding of the relationship between cost sharing and drug consumption depends on consistent and unbiased price elasticity estimates. However, there is wide heterogeneity among studies, which constrains the applicability of elasticity estimates for empirical purposes and policy simulation. This paper attempts to provide a corrected measure of the drug price elasticity by employing meta-regression analysis (MRA). The results indicate that the elasticity estimates are significantly different from zero, and the corrected elasticity is -0.209 when the results are made robust to heteroskedasticity and clustering of observations. Elasticity values are higher when the study was published in an economic journal, when the study employed a greater number of observations, and when the study used aggregate data. Elasticity estimates are lower when the institutional setting was a tax-based health insurance system.
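
    The meta-regression idea in this abstract can be sketched as a weighted least-squares fit in which each study's elasticity estimate is regressed on a study characteristic. The sketch below is a minimal illustration only: it assumes inverse-variance weights and a single binary moderator, and the function name and all numbers are hypothetical. It does not reproduce the heteroskedasticity- and cluster-robust estimation the authors describe.

```python
def weighted_least_squares(x, y, w):
    """Fit y = a + b*x by minimizing sum(w_i * (y_i - a - b*x_i)**2)."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up study-level inputs, for illustration only (not from the paper).
elasticities = [-0.10, -0.30, -0.20, -0.40]   # one estimate per study
std_errors = [0.05, 0.10, 0.05, 0.20]         # reported standard errors
uses_aggregate_data = [0, 1, 0, 1]            # hypothetical moderator

# Assumption: inverse-variance weights, so precise studies count more.
weights = [1.0 / se ** 2 for se in std_errors]
intercept, slope = weighted_least_squares(uses_aggregate_data, elasticities, weights)
print(f"baseline elasticity: {intercept:.3f}, shift for aggregate data: {slope:.3f}")
```

    With a binary moderator the intercept is the weighted mean elasticity of the reference group and the slope is the difference between the two group means.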

  5. Adjusting for Cell Type Composition in DNA Methylation Data Using a Regression-Based Approach.

    Science.gov (United States)

    Jones, Meaghan J; Islam, Sumaiya A; Edgar, Rachel D; Kobor, Michael S

    2017-01-01

    Analysis of DNA methylation in a population context has the potential to uncover novel gene and environment interactions as well as markers of health and disease. In order to find such associations it is important to control for factors which may mask or alter DNA methylation signatures. Since tissue of origin and coinciding cell type composition are major contributors to DNA methylation patterns, and can easily confound important findings, it is vital to adjust DNA methylation data for such differences across individuals. Here we describe the use of a regression method to adjust for cell type composition in DNA methylation data. We specifically discuss what information is required to adjust for cell type composition and then provide detailed instructions on how to perform cell type adjustment on high dimensional DNA methylation data. This method has been applied mainly to Illumina 450K data, but can also be adapted to pyrosequencing or genome-wide bisulfite sequencing data.

  6. Arch Index: An Easier Approach for Arch Height (A Regression Analysis)

    Directory of Open Access Journals (Sweden)

    Hironmoy Roy

    2012-04-01

    Full Text Available Background: Arch-height estimation, though usually practiced in the supine posture, is neither correct nor scientific according to the literature, which favours standing x-rays or the arch-index as the yardstick. Standing x-rays can be excused as troublesome in a busy OPD, but an ink footprint on a simple graph sheet can be documented, as it is easier, cheaper and requires almost no machinery or expertise. Objective: This study aimed to redefine the inter-relationship of the radiological standing arch-heights with the arch-index through correlation and regression, so that the radiographic standing arch-height values can be derived indirectly from the latter, avoiding the actual maneuver. Methods: The study involved 103 adult subjects attending a tertiary care hospital of North Bengal. From the standing x-rays of the foot, the standing navicular and talar heights were measured and ‘normalised’ with the foot length. In parallel, footprints were obtained for the arch-index. Finally, the variables were analysed with SPSS software. Result: The arch-index showed significant negative correlations and simple linear regressions with standing navicular height, standing talar height, and standing normalised navicular and talar heights, analysed in both sexes separately, with supporting mathematical equations. Conclusion: To measure the standing arch-height in a busy OPD, it is wise to take the footprint first. Once the arch-index is known, it can be put into the equations derived here to predict the standing arch-heights in either sex.
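
    The prediction step described in this abstract amounts to a simple linear regression of arch height on arch-index. The sketch below shows how such an equation could be fitted and applied; the function name and the numbers are hypothetical placeholders, since the abstract does not report the fitted coefficients.

```python
import math


def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x, plus Pearson's r."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return intercept, slope, r


# Invented example values (not the study's data): arch-index from footprints
# and standing navicular height in cm from x-rays.
arch_index = [0.18, 0.21, 0.24, 0.27, 0.30]
navicular_height = [4.6, 4.1, 3.9, 3.2, 3.0]

intercept, slope, r = fit_line(arch_index, navicular_height)
predicted = intercept + slope * 0.25   # predicted height for a new footprint
print(f"slope={slope:.2f}, r={r:.2f}, predicted height={predicted:.2f} cm")
```

    A negative slope and a negative r in such a fit would correspond to the negative correlation the abstract reports.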

  7. A Quantile Regression Approach to Understanding the Relations Between Morphological Awareness, Vocabulary, and Reading Comprehension in Adult Basic Education Students

    Science.gov (United States)

    Tighe, Elizabeth L.; Schatschneider, Christopher

    2015-01-01

    The purpose of this study was to investigate the joint and unique contributions of morphological awareness and vocabulary knowledge at five reading comprehension levels in Adult Basic Education (ABE) students. We introduce the statistical technique of multiple quantile regression, which enabled us to assess the predictive utility of morphological awareness and vocabulary knowledge at multiple points (quantiles) along the continuous distribution of reading comprehension. To demonstrate the efficacy of our multiple quantile regression analysis, we compared and contrasted our results with a traditional multiple regression analytic approach. Our results indicated that morphological awareness and vocabulary knowledge accounted for a large portion of the variance (82-95%) in reading comprehension skills across all quantiles. Morphological awareness exhibited the greatest unique predictive ability at lower levels of reading comprehension whereas vocabulary knowledge exhibited the greatest unique predictive ability at higher levels of reading comprehension. These results indicate the utility of using multiple quantile regression to assess trajectories of component skills across multiple levels of reading comprehension. The implications of our findings for ABE programs are discussed. PMID:25351773
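
    Quantile regression, as used in this and several neighbouring records, replaces the squared-error loss of ordinary regression with an asymmetric "check" (pinball) loss. The sketch below illustrates that loss on a constant-only model, where the minimizer is simply an empirical quantile; the function names are illustrative and the data are invented. A full quantile regression would minimize the same loss with a linear predictor in place of the constant.

```python
def pinball_loss(y, q, tau):
    """Mean check loss of predicting the constant q for the tau-th quantile."""
    total = 0.0
    for yi in y:
        if yi >= q:
            total += tau * (yi - q)          # under-prediction, weight tau
        else:
            total += (1.0 - tau) * (q - yi)  # over-prediction, weight 1 - tau
    return total / len(y)


def best_constant(y, tau):
    """Observed value that minimizes the pinball loss (an empirical quantile)."""
    return min(sorted(y), key=lambda q: pinball_loss(y, q, tau))


scores = [1, 2, 3, 4, 100]   # invented outcome values with one outlier
for tau in (0.25, 0.5, 0.9):
    print(tau, best_constant(scores, tau))
```

    Because each quantile has its own loss, predictor effects are allowed to differ between low and high points of the outcome distribution, which is the property these studies exploit.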

  8. A Modular Approach Utilizing Decision Tree in Teaching Integration Techniques in Calculus

    Directory of Open Access Journals (Sweden)

    Edrian E. Gonzales

    2015-08-01

    Full Text Available This study was conducted to test the effectiveness of a modular approach using a decision tree in teaching integration techniques in Calculus. It sought an answer to the question: Is there a significant difference between the mean scores of two groups of students in their quizzes on (1) integration by parts and (2) integration by trigonometric transformation? Twenty-eight second year B.S. Computer Science students at City College of Calamba who were enrolled in Mathematical Analysis II for the second semester of school year 2013-2014 were purposively chosen as respondents. The study made use of the non-equivalent control group posttest-only design of quasi-experimental research. The experimental group was taught using the modular approach while the comparison group was exposed to traditional instruction. The research instruments used were two twenty-item multiple-choice-type quizzes. Statistical treatment used the mean, standard deviation, Shapiro-Wilk test for normality, two-tailed t-test for independent samples, and Mann-Whitney U-test. The findings led to the conclusion that both modular and traditional instruction were equally effective in facilitating the learning of integration by parts. The other result revealed that the use of the modular approach utilizing a decision tree in teaching integration by trigonometric transformation was more effective than the traditional method.

  9. In silico prediction of toxicity of phenols to Tetrahymena pyriformis by using genetic algorithm and decision tree-based modeling approach.

    Science.gov (United States)

    Abbasitabar, Fatemeh; Zare-Shahabadi, Vahid

    2017-04-01

    Risk assessment of chemicals is an important issue in environmental protection; however, there is a huge lack of experimental data for a large number of end-points. The experimental determination of toxicity of chemicals involves high costs and time-consuming process. In silico tools such as quantitative structure-toxicity relationship (QSTR) models, which are constructed on the basis of computational molecular descriptors, can predict missing data for toxic end-points for existing or even not yet synthesized chemicals. Phenol derivatives are known to be aquatic pollutants. With this background, we aimed to develop an accurate and reliable QSTR model for the prediction of toxicity of 206 phenols to Tetrahymena pyriformis. A multiple linear regression (MLR)-based QSTR was obtained using a powerful descriptor selection tool named Memorized_ACO algorithm. Statistical parameters of the model were 0.72 and 0.68 for R²(training) and R²(test), respectively. To develop a high-quality QSTR model, classification and regression tree (CART) was employed. Two approaches were considered: (1) phenols were classified into different modes of action using CART and (2) the phenols in the training set were partitioned to several subsets by a tree in such a manner that in each subset, a high-quality MLR could be developed. For the first approach, the statistical parameters of the resultant QSTR model were improved to 0.83 and 0.75 for R²(training) and R²(test), respectively. Genetic algorithm was employed in the second approach to obtain an optimal tree, and it was shown that the final QSTR model provided excellent prediction accuracy for the training and test sets (R²(training) and R²(test) were 0.91 and 0.93, respectively). The mean absolute error for the test set was computed as 0.1615.
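
    The tree-based partitioning mentioned in this abstract builds on the basic CART step of choosing a split that makes the resulting subsets as homogeneous as possible. The sketch below shows that single step for one descriptor, using leaf means and squared error. It is a simplified illustration with invented numbers; the genetic-algorithm tree search and the per-subset MLR models of the paper are not reproduced.

```python
def sum_squared_error(values):
    """Squared deviation of values around their own mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, total_sse) for the best single split on descriptor x.

    Candidate thresholds are midpoints between consecutive distinct x values.
    Returns None when x has no two distinct values.
    """
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [yy for _, yy in pairs[:i]]
        right = [yy for _, yy in pairs[i:]]
        total = sum_squared_error(left) + sum_squared_error(right)
        if best is None or total < best[1]:
            best = (threshold, total)
    return best


# Invented descriptor values and toxicities, for illustration only.
descriptor = [1, 2, 3, 10, 11, 12]
toxicity = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8]
print(best_split(descriptor, toxicity))
```

    Applying the same search recursively to each subset, and over all descriptors, grows a full regression tree.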

  10. Energy production through organic fraction of municipal solid waste-A multiple regression modeling approach.

    Science.gov (United States)

    Ramesh, N; Ramesh, S; Vennila, G; Abdul Bari, J; MageshKumar, P

    2016-12-01

    In the 21st century, people have migrated from rural to urban areas for several reasons. As a result, the populations of Indian cities are increasing day by day. On one hand, the country is developing in the field of science and technology and on the other hand, it is encountering a serious problem called 'Environmental degradation'. Due to the increase in population, the generation of solid waste has also increased, and the waste is being disposed of in open dumps and landfills, which leads to air and land pollution. This study attempts to generate energy from organic solid waste by the bio-fermentation process. The study was conducted for a period of 7 months at Erode, Tamilnadu, and readings of various parameters such as hydraulic retention time, organic loading rate, sludge loading rate, influent pH, effluent pH, inlet volatile acids, outlet volatile fatty acids, inlet VSS/TS ratio, outlet VSS/TS ratio, influent COD, effluent COD and % of COD removal were recorded every 10 days. The aim of the present study is to develop a model through multiple linear regression analysis with COD as the dependent variable and various parameters such as HRT, OLR, SLR, influent, effluent, VSS/TS ratio, influent COD, effluent COD, etc. as independent variables, and to analyze the impact of these parameters on COD. The results of the model developed through the step-wise regression method revealed that only four parameters, Influent COD, effluent COD, VSS/TS and Influent/pH, were the main influencers of COD removal. The parameters influent COD and VSS/TS have a positive impact on COD removal and the parameters effluent COD and Influent/pH have a negative impact. The parameter Influent COD has the highest order of impact, followed by effluent COD, VSS/TS and influent pH. The other parameters HRT, OLR, SLR, INLET VFA and OUTLET VFA did not contribute significantly to the removal of COD. The implementation of the process suggested through this study might bring in dual benefit to the community, viz treatment of solid

  11. Spatializing Area-Based Measures of Neighborhood Characteristics for Multilevel Regression Analyses: An Areal Median Filtering Approach.

    Science.gov (United States)

    Oka, Masayoshi; Wong, David W S

    2016-06-01

    Area-based measures of neighborhood characteristics simply derived from enumeration units (e.g., census tracts or block groups) ignore the potential of spatial spillover effects, and thus incorporating such measures into multilevel regression models may underestimate the neighborhood effects on health. To overcome this limitation, we describe the concept and method of areal median filtering to spatialize area-based measures of neighborhood characteristics for multilevel regression analyses. The areal median filtering approach provides a means to specify or formulate "neighborhoods" as meaningful geographic entities by removing enumeration unit boundaries as the absolute barriers and by pooling information from the neighboring enumeration units. This spatializing process takes into account the potential of spatial spillover effects and also converts aspatial measures of neighborhood characteristics into spatial measures. From a conceptual and methodological standpoint, incorporating the derived spatial measures into multilevel regression analyses allows us to more accurately examine the relationships between neighborhood characteristics and health. To promote and set the stage for informative research in the future, we provide a few important conceptual and methodological remarks, and discuss possible applications, inherent limitations, and practical solutions for using the areal median filtering approach in the study of neighborhood effects on health.
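
    The pooling step described here can be illustrated with a small median filter over an adjacency list. This is a sketch under stated assumptions: the focal unit is included in its own neighbourhood, all neighbours count equally, and the unit names and values are invented. The paper's exact neighbourhood definition may differ.

```python
from statistics import median


def areal_median_filter(values, neighbors):
    """Replace each unit's measure with the median over itself and its neighbours.

    values:    dict mapping unit id -> area-based measure
    neighbors: dict mapping unit id -> list of adjacent unit ids
    """
    filtered = {}
    for unit, value in values.items():
        pooled = [value] + [values[n] for n in neighbors.get(unit, [])]
        filtered[unit] = median(pooled)
    return filtered


# Hypothetical census tracts with one outlying value in tract "B".
poverty_rate = {"A": 10, "B": 50, "C": 12, "D": 11}
adjacency = {"A": ["B", "C"], "B": ["A", "C", "D"], "C": ["A", "B", "D"], "D": ["B", "C"]}
print(areal_median_filter(poverty_rate, adjacency))
```

    The filtered values could then enter a multilevel model in place of the raw tract-level measures.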

  12. A robust approach for tree segmentation in deciduous forests using small-footprint airborne LiDAR data

    Science.gov (United States)

    Hamraz, Hamid; Contreras, Marco A.; Zhang, Jun

    2016-10-01

    This paper presents a non-parametric approach for segmenting trees from airborne LiDAR data in deciduous forests. Based on the LiDAR point cloud, the approach collects crown information such as steepness and height on-the-fly to delineate crown boundaries, and most importantly, does not require a priori assumptions of crown shape and size. The approach segments trees iteratively starting from the tallest within a given area to the smallest until all trees have been segmented. To evaluate its performance, the approach was applied to the University of Kentucky Robinson Forest, a deciduous closed-canopy forest with complex terrain and vegetation conditions. The approach identified 94% of dominant and co-dominant trees with a false detection rate of 13%. About 62% of intermediate, overtopped, and dead trees were also detected with a false detection rate of 15%. The overall segmentation accuracy was 77%. Correlations of the segmentation scores of the proposed approach with local terrain and stand metrics were not significant, which is likely an indication of the robustness of the approach as results are not sensitive to the differences in terrain and stand structures.

  13. Scale and scope economies in nursing homes: a quantile regression approach.

    Science.gov (United States)

    Christensen, Eric W

    2004-04-01

    Nursing homes vary widely between facilities with very few beds and facilities with several hundred beds. Previous studies, which estimate nursing home scale and scope economies, do not account for this heterogeneity and implicitly assume that all nursing homes face the same cost structure. To account for heterogeneity, this paper uses quantile regression to estimate cost functions for skilled and intermediate care nursing homes. The results show that the parameters of nursing home cost functions vary significantly by output mix and across the cost distribution. Estimates show that product-specific scale economies systematically increase across the cost distribution for both skilled and intermediate care facilities, with diseconomies of scale in the lower deciles and no significant scale economies in the higher deciles. As for ray scale economies, estimates show economies of scale in the lower deciles and diseconomies of scale or no significant scale economies at higher deciles. The estimates also show that scope economies exist in the lower cost deciles and that no scope economies exist in the higher cost deciles. Additionally, the degree of scope economies monotonically decreases across the deciles.

  14. Relative Age in School and Suicide among Young Individuals in Japan: A Regression Discontinuity Approach.

    Science.gov (United States)

    Matsubayashi, Tetsuya; Ueda, Michiko

    2015-01-01

    Evidence collected in many parts of the world suggests that, compared to older students, students who are relatively younger at school entry tend to have worse academic performance and lower levels of income. This study examined how relative age in a grade affects suicide rates of adolescents and young adults between 15 and 25 years of age using data from Japan. We examined individual death records in the Vital Statistics of Japan from 1989 to 2010. In contrast to other countries, late entry to primary school is not allowed in Japan. We took advantage of the school entry cutoff date to implement a regression discontinuity (RD) design, assuming that the timing of births around the school entry cutoff date was randomly determined and therefore that individuals who were born just before and after the cutoff date have similar baseline characteristics. We found that those who were born right before the school cutoff day and thus youngest in their cohort have higher mortality rates by suicide, compared to their peers who were born right after the cutoff date and thus older. We also found that those with relative age disadvantage tend to follow a different career path than those with relative age advantage, which may explain their higher suicide mortality rates. Relative age effects have broader consequences than was previously supposed. This study suggests that policy intervention that alleviates the relative age effect can be important.
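
    A regression discontinuity estimate of the kind described above compares outcomes just on either side of a cutoff. The sketch below fits a separate straight line on each side within a bandwidth and reports the jump at the cutoff. It is a minimal illustration on synthetic numbers; the function names, the bandwidth, and the linear specification are assumptions, not the authors' model.

```python
def fit_line(points):
    """Ordinary least-squares fit of y = a + b*x to (x, y) pairs; returns (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rd_estimate(running, outcome, cutoff, bandwidth):
    """Jump in the outcome at the cutoff: right-side fit minus left-side fit."""
    left = [(r - cutoff, o) for r, o in zip(running, outcome)
            if cutoff - bandwidth <= r < cutoff]
    right = [(r - cutoff, o) for r, o in zip(running, outcome)
             if cutoff <= r <= cutoff + bandwidth]
    intercept_left, _ = fit_line(left)     # predicted outcome just before the cutoff
    intercept_right, _ = fit_line(right)   # predicted outcome just after the cutoff
    return intercept_right - intercept_left


# Synthetic example: days of birth relative to a cutoff, with a built-in jump of 3.
days = list(range(-5, 6))
rate = [2 + 0.1 * d + (3 if d >= 0 else 0) for d in days]
print(rd_estimate(days, rate, cutoff=0, bandwidth=5))
```

    In the study's setting the running variable would be date of birth relative to the school entry cutoff, so the left side corresponds to the youngest members of a cohort.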

  15. QUANTITATIVE ELECTRONIC STRUCTURE - ACTIVITY RELATIONSHIPS ANALYSIS ANTIMUTAGENIC BENZALACETONE DERIVATIVES BY PRINCIPAL COMPONENT REGRESSION APPROACH

    Directory of Open Access Journals (Sweden)

    Yuliana Yuliana

    2010-06-01

    Full Text Available Quantitative Electronic Structure Activity Relationship (QSAR) analysis of a series of benzalacetones has been investigated based on semi-empirical PM3 calculation data using Principal Components Regression (PCR). The investigation was based on the antimutagenic activity of benzalacetone compounds (represented by log 1/IC50), which was studied as a linear correlation with latent variables (Tx) resulting from the transformation of atomic net charges using Principal Component Analysis (PCA). The QSAR equation was determined based on the distribution of selected components and then analysed with PCR. The result is described by the following QSAR equation: log 1/IC50 = 6.555 + (2.177)T1 + (2.284)T2 + (1.933)T3. The equation was significant at the 95% level with statistical parameters n = 28, r = 0.766, SE = 0.245, Fcalculation/Ftable = 3.780, and gave a PRESS result of 0.002. This means that there were only relatively few deviations between the experimental and theoretical data of antimutagenic activity. New types of benzalacetone derivative compounds were designed and their theoretical activities were predicted based on the best QSAR equation. It was found that compounds number 29, 30, 31, 32, 33, 35, 36, 37, 38, 40, 41, 42, 44, 47, 48, 49 and 50 have a relatively high antimutagenic activity. Keywords: QSAR; antimutagenic activity; benzalacetone; atomic net charge
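
    The reported QSAR equation can be evaluated directly once the latent variables T1 to T3 of a compound are known. The sketch below does only that. It assumes the coefficients are all positive as printed in the abstract, and the latent-variable scores passed in are hypothetical, because the PCA loadings needed to compute them are not given here.

```python
def predict_log_inv_ic50(t1, t2, t3):
    """Evaluate log(1/IC50) = 6.555 + 2.177*T1 + 2.284*T2 + 1.933*T3."""
    return 6.555 + 2.177 * t1 + 2.284 * t2 + 1.933 * t3


def ic50_from_log_inv(log_inv_ic50):
    """Convert log10(1/IC50) back to IC50, in the units used by the study."""
    return 10.0 ** (-log_inv_ic50)


# Hypothetical latent-variable scores for one designed compound.
activity = predict_log_inv_ic50(0.10, -0.05, 0.02)
print(activity, ic50_from_log_inv(activity))
```

    A larger value of log 1/IC50 corresponds to a smaller IC50, that is, to higher predicted antimutagenic activity.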

  16. Relative Age in School and Suicide among Young Individuals in Japan: A Regression Discontinuity Approach.

    Directory of Open Access Journals (Sweden)

    Tetsuya Matsubayashi

    Full Text Available Evidence collected in many parts of the world suggests that, compared to older students, students who are relatively younger at school entry tend to have worse academic performance and lower levels of income. This study examined how relative age in a grade affects suicide rates of adolescents and young adults between 15 and 25 years of age using data from Japan. We examined individual death records in the Vital Statistics of Japan from 1989 to 2010. In contrast to other countries, late entry to primary school is not allowed in Japan. We took advantage of the school entry cutoff date to implement a regression discontinuity (RD) design, assuming that the timing of births around the school entry cutoff date was randomly determined and therefore that individuals who were born just before and after the cutoff date have similar baseline characteristics. We found that those who were born right before the school cutoff day and thus youngest in their cohort have higher mortality rates by suicide, compared to their peers who were born right after the cutoff date and thus older. We also found that those with relative age disadvantage tend to follow a different career path than those with relative age advantage, which may explain their higher suicide mortality rates. Relative age effects have broader consequences than was previously supposed. This study suggests that policy intervention that alleviates the relative age effect can be important.

  17. ProteinLasso: A Lasso regression approach to protein inference problem in shotgun proteomics.

    Science.gov (United States)

    Huang, Ting; Gong, Haipeng; Yang, Can; He, Zengyou

    2013-04-01

    Protein inference is an important issue in proteomics research. Its main objective is to select a proper subset of candidate proteins that best explain the observed peptides. Although many methods have been proposed for solving this problem, several issues such as peptide degeneracy and one-hit wonders still remain unsolved. Therefore, the accurate identification of proteins that are truly present in the sample continues to be a challenging task. Based on the concept of peptide detectability, we formulate the protein inference problem as a constrained Lasso regression problem, which can be solved very efficiently through a coordinate descent procedure. The new inference algorithm is named as ProteinLasso, which explores an ensemble learning strategy to address the sparsity parameter selection problem in Lasso model. We test the performance of ProteinLasso on three datasets. As shown in the experimental results, ProteinLasso outperforms those state-of-the-art protein inference algorithms in terms of both identification accuracy and running efficiency. In addition, we show that ProteinLasso is stable under different parameter specifications. The source code of our algorithm is available at: http://sourceforge.net/projects/proteinlasso.
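
    The abstract mentions solving a Lasso problem by coordinate descent. The sketch below shows plain cyclic coordinate descent with soft-thresholding for an ordinary Lasso objective. It is a generic illustration, not the ProteinLasso algorithm: the constraints, the peptide-detectability design matrix, and the ensemble strategy of the paper are omitted, and the toy data are invented.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; values within [-gamma, gamma] become 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 over b."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0   # correlation of feature j with the partial residual
            norm = 0.0  # squared norm of feature j
            for i in range(n):
                fit_without_j = sum(X[i][k] * b[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - fit_without_j)
                norm += X[i][j] ** 2
            b[j] = soft_threshold(rho / n, lam) / (norm / n) if norm > 0 else 0.0
    return b


# Toy design: rows are observations, columns are candidate explanatory variables.
X = [[1, 0], [0, 1], [1, 0], [0, 1]]
y = [2.0, 0.1, 2.0, 0.1]
print(lasso_coordinate_descent(X, y, lam=0.1))
```

    The weakly supported second coefficient is set exactly to zero, which is how a Lasso formulation selects a sparse subset of candidates.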

  18. Generalized regression neural network-based approach for modelling hourly dissolved oxygen concentration in the Upper Klamath River, Oregon, USA.

    Science.gov (United States)

    Heddam, Salim

    2014-08-01

    In this study, a comparison between generalized regression neural network (GRNN) and multiple linear regression (MLR) models is given on the effectiveness of modelling dissolved oxygen (DO) concentration in a river. The two models are developed using hourly experimental data collected from the United States Geological Survey (USGS Station No: 421209121463000) station at the Klamath River at Railroad Bridge at Lake Ewauna. The input variables used for the two models are water pH, temperature, electrical conductivity, and sensor depth. The performances of the models are evaluated using root mean square errors (RMSE), the mean absolute error (MAE), Willmott's index of agreement (d), and correlation coefficient (CC) statistics. Of the two approaches employed, the best fit was obtained using the GRNN model with the four input variables used.
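
    The four evaluation statistics named in this abstract have standard definitions, sketched below for paired lists of observed and predicted values. The function names and the sample numbers are illustrative only and do not come from the study.

```python
import math


def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(p - o) for o, p in zip(obs, pred)) / len(obs)


def willmott_d(obs, pred):
    """Willmott's index of agreement: 1 means perfect agreement."""
    mean_obs = sum(obs) / len(obs)
    numerator = sum((p - o) ** 2 for o, p in zip(obs, pred))
    denominator = sum((abs(p - mean_obs) + abs(o - mean_obs)) ** 2
                      for o, p in zip(obs, pred))
    return 1.0 - numerator / denominator


def correlation(obs, pred):
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(obs)
    mean_o, mean_p = sum(obs) / n, sum(pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    return cov / math.sqrt(var_o * var_p)


# Invented dissolved-oxygen values in mg/L, for illustration only.
observed = [7.9, 8.4, 9.1, 8.8, 7.5]
predicted = [8.1, 8.2, 9.0, 9.1, 7.7]
print(rmse(observed, predicted), mae(observed, predicted),
      willmott_d(observed, predicted), correlation(observed, predicted))
```

    RMSE and MAE are in the units of the measurement, while d and CC are dimensionless, so the four together describe both the size of the errors and how well the predicted pattern follows the observations.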

  19. Modeling Approach and Analysis of the Structural Parameters of an Inductively Coupled Plasma Etcher Based on a Regression Orthogonal Design

    Institute of Scientific and Technical Information of China (English)

    CHENG Jia; ZHU Yu; JI Linhong

    2012-01-01

    The geometry of an inductively coupled plasma (ICP) etcher is usually considered to be an important factor for determining both plasma and process uniformity over a large wafer. During the past few decades, these parameters were determined by the "trial and error" method, resulting in wastes of time and funds. In this paper, a new approach of regression orthogonal design with plasma simulation experiments is proposed to investigate the sensitivity of the structural parameters on the uniformity of plasma characteristics. The tool for simulating plasma is CFD-ACE+, which is commercial multi-physical modeling software that has been proven to be accurate for plasma simulation. The simulated experimental results are analyzed to get a regression equation on three structural parameters. Through this equation, engineers can compute the uniformity of the electron number density rapidly without modeling by CFD-ACE+. An optimization performed at the end produces good results.

  20. A nearest neighbour approach by genetic distance to the assignment of individual trees to geographic origin.

    Science.gov (United States)

    Degen, Bernd; Blanc-Jolivet, Céline; Stierand, Katrin; Gillet, Elizabeth

    2017-03-01

    During the past decade, the use of DNA for forensic applications has been extensively implemented for plant and animal species, as well as in humans. Tracing back the geographical origin of an individual usually requires genetic assignment analysis. These approaches are based on reference samples that are grouped into populations or other aggregates and intend to identify the most likely group of origin. Often this grouping does not have a biological but rather a historical or political justification, such as "country of origin". In this paper, we present a new nearest neighbour approach to individual assignment or classification within a given but potentially imperfect grouping of reference samples. This method, which is based on the genetic distance between individuals, functions better in many cases than commonly used methods. We demonstrate the operation of our assignment method using two data sets. One set is simulated for a large number of trees distributed in a 120km by 120km landscape with individual genotypes at 150 SNPs, and the other set comprises experimental data of 1221 individuals of the African tropical tree species Entandrophragma cylindricum (Sapelli) genotyped at 61 SNPs. Judging by the level of correct self-assignment, our approach outperformed the commonly used frequency and Bayesian approaches by 15% for the simulated data set and by 5-7% for the Sapelli data set. Our new approach is less sensitive to overlapping sources of genetic differentiation, such as genetic differences among closely-related species, phylogeographic lineages and isolation by distance, and thus operates better even for suboptimal grouping of individuals. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
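
    The assignment idea in this abstract can be illustrated with a k-nearest-neighbour vote over pairwise genetic distances. The sketch below is an assumption-laden toy: genotypes are coded as counts of one allele (0, 1 or 2) per SNP, the distance is a simple mean allele-count difference, and the group with the most votes among the k closest reference individuals wins. The paper's actual distance measure and decision rule are not given in the abstract.

```python
from collections import Counter


def genetic_distance(g1, g2):
    """Mean allele-count difference per SNP, scaled to the range 0..1."""
    return sum(abs(a - b) for a, b in zip(g1, g2)) / (2 * len(g1))


def assign_origin(query, references, k=3):
    """Assign the query to the most common group among its k nearest references.

    references: list of (genotype, group_label) tuples.
    """
    nearest = sorted(references, key=lambda ref: genetic_distance(query, ref[0]))[:k]
    votes = Counter(group for _, group in nearest)
    return votes.most_common(1)[0][0]


# Invented reference individuals at four SNPs, labelled by hypothetical region.
reference_trees = [
    ([0, 0, 1, 2], "north"),
    ([0, 1, 1, 2], "north"),
    ([0, 0, 2, 2], "north"),
    ([2, 2, 0, 0], "south"),
    ([2, 1, 0, 0], "south"),
]
print(assign_origin([0, 0, 1, 1], reference_trees, k=3))
```

    Because the vote depends only on the closest individuals, a few mislabelled or poorly grouped reference samples far from the query do not affect the result.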

  1. Estimating earnings losses due to mental illness: a quantile regression approach.

    Science.gov (United States)

    Marcotte, Dave E; Wilcox-Gök, Virginia

    2003-09-01

    The ability of workers to remain productive and sustain earnings when afflicted with mental illness depends importantly on access to appropriate treatment and on flexibility and support from employers. In the United States there is substantial variation in access to health care and sick leave and other employment flexibilities across the earnings distribution. Consequently, a worker's ability to work and how much his/her earnings are impeded likely depend upon his/her position in the earnings distribution. Because of this, focusing on average earnings losses may provide insufficient information on the impact of mental illness in the labor market. In this paper, we examine the effects of mental illness on earnings by recognizing that effects could vary across the distribution of earnings. Using data from the National Comorbidity Survey, we employ a quantile regression estimator to identify the effects at key points in the earnings distribution. We find that earnings effects vary importantly across the distribution. While average effects are often not large, mental illness more commonly imposes earnings losses at the lower tail of the distribution, especially for women. In only one case do we find an illness to have negative effects across the distribution. Mental illness can have larger negative impacts on economic outcomes than previously estimated, even if those effects are not uniform. Consequently, researchers and policy makers alike should not be placated by findings that mean earnings effects are relatively small. Such estimates miss important features of how and where mental illness is associated with real economic losses for the ill.

  2. Appraisal, coping, emotion, and performance during elite fencing matches: a random coefficient regression model approach.

    Science.gov (United States)

    Doron, J; Martinent, G

    2016-06-23

    Understanding more about the stress process is important for the performance of athletes during stressful situations. Grounded in Lazarus's (1991, 1999, 2000) CMRT of emotion, this study tracked longitudinally the relationships between cognitive appraisal, coping, emotions, and performance in nine elite fencers across 14 international matches (representing 619 momentary assessments) using a naturalistic, video-assisted methodology. A series of hierarchical linear modeling analyses were conducted to: (a) explore the relationships between cognitive appraisals (challenge and threat), coping strategies (task- and disengagement oriented coping), emotions (positive and negative) and objective performance; (b) ascertain whether the relationship between appraisal and emotion was mediated by coping; and (c) examine whether the relationship between appraisal and objective performance was mediated by emotion and coping. The results of the random coefficient regression models showed: (a) positive relationships between challenge appraisal, task-oriented coping, positive emotions, and performance, as well as between threat appraisal, disengagement-oriented coping and negative emotions; (b) that disengagement-oriented coping partially mediated the relationship between threat and negative emotions, whereas task-oriented coping partially mediated the relationship between challenge and positive emotions; and (c) that disengagement-oriented coping mediated the relationship between threat and performance, whereas task-oriented coping and positive emotions partially mediated the relationship between challenge and performance. As a whole, this study furthered knowledge during sport performance situations of Lazarus's (1999) claim that these psychological constructs exist within a conceptual unit. Specifically, our findings indicated that the ways these constructs are inter-related influence objective performance within competitive settings.

  3. IsoLasso: A LASSO Regression Approach to RNA-Seq Based Transcriptome Assembly

    Science.gov (United States)

    Li, Wei; Feng, Jianxing; Jiang, Tao

    The new second generation sequencing technology revolutionizes many biology related research fields, and poses various computational biology challenges. One of them is transcriptome assembly based on RNA-Seq data, which aims at reconstructing all full-length mRNA transcripts simultaneously from millions of short reads. In this paper, we consider three objectives in transcriptome assembly: the maximization of prediction accuracy, minimization of interpretation, and maximization of completeness. The first objective, the maximization of prediction accuracy, requires that the estimated expression levels based on assembled transcripts should be as close as possible to the observed ones for every expressed region of the genome. The minimization of interpretation follows the parsimony principle to seek as few transcripts in the prediction as possible. The third objective, the maximization of completeness, requires that the maximum number of mapped reads (or "expressed segments" in gene models) be explained by (i.e., contained in) the predicted transcripts in the solution. Based on the above three objectives, we present IsoLasso, a new RNA-Seq based transcriptome assembly tool. IsoLasso is based on the well-known LASSO algorithm, a multivariate regression method designed to seek a balance between the maximization of prediction accuracy and the minimization of interpretation. By including some additional constraints in the quadratic program involved in LASSO, IsoLasso is able to make the set of assembled transcripts as complete as possible. Experiments on simulated and real RNA-Seq datasets show that IsoLasso achieves higher sensitivity and precision simultaneously than the state-of-the-art transcript assembly tools.

  4. Identification of area-level influences on regions of high cancer incidence in Queensland, Australia: a classification tree approach

    Directory of Open Access Journals (Sweden)

    Mengersen Kerrie L

    2011-07-01

    Full Text Available Background: Strategies for cancer reduction and management are targeted at both individual and area levels. Area-level strategies require careful understanding of geographic differences in cancer incidence, in particular the association with factors such as socioeconomic status, ethnicity and accessibility. This study aimed to identify the complex interplay of area-level factors associated with high area-specific incidence of Australian priority cancers using a classification and regression tree (CART) approach. Methods: Area-specific smoothed standardised incidence ratios were estimated for priority-area cancers across 478 statistical local areas in Queensland, Australia (1998-2007, n = 186,075). For those cancers with significant spatial variation, CART models were used to identify whether area-level accessibility, socioeconomic status and ethnicity were associated with high area-specific incidence. Results: The accessibility of a person's residence had the most consistent association with the risk of cancer diagnosis across the specific cancers. Many cancers were likely to have high incidence in more urban areas, although male lung cancer and cervical cancer tended to have high incidence in more remote areas. The impact of socioeconomic status and ethnicity on these associations differed by type of cancer. Conclusions: These results highlight the complex interactions between accessibility, socioeconomic status and ethnicity in determining cancer incidence risk.

  5. Education-Based Gaps in eHealth: A Weighted Logistic Regression Approach.

    Science.gov (United States)

    Amo, Laura

    2016-10-12

    Persons with a college degree are more likely to engage in eHealth behaviors than persons without a college degree, compounding the health disadvantages of undereducated groups in the United States. However, the extent to which quality of recent eHealth experience reduces the education-based eHealth gap is unexplored. The goal of this study was to examine how eHealth information search experience moderates the relationship between college education and eHealth behaviors. Based on a nationally representative sample of adults who reported using the Internet to conduct the most recent health information search (n=1458), I evaluated eHealth search experience in relation to the likelihood of engaging in different eHealth behaviors. I examined whether Internet health information search experience reduces the eHealth behavior gaps among college-educated and noncollege-educated adults. Weighted logistic regression models were used to estimate the probability of different eHealth behaviors. College education was significantly positively related to the likelihood of 4 eHealth behaviors. In general, eHealth search experience was negatively associated with health care behaviors, health information-seeking behaviors, and user-generated or content sharing behaviors after accounting for other covariates. Whereas Internet health information search experience has narrowed the education gap in terms of likelihood of using email or Internet to communicate with a doctor or health care provider and likelihood of using a website to manage diet, weight, or health, it has widened the education gap in the instances of searching for health information for oneself, searching for health information for someone else, and downloading health information on a mobile device. The relationship between college education and eHealth behaviors is moderated by Internet health information search experience in different ways depending on the type of eHealth behavior. After controlling for college

  6. Modified Logistic Regression Approaches to Eliminating the Impact of Response Styles on DIF Detection in Likert-Type Scales.

    Science.gov (United States)

    Chen, Hui-Fang; Jin, Kuan-Yu; Wang, Wen-Chung

    2017-01-01

    Extreme response styles (ERS) are prevalent in Likert- or rating-type data, but previous research has not adequately addressed their impact on differential item functioning (DIF) assessments. This study aimed to fill this knowledge gap and examined the influence of ERS on the performance of logistic regression (LR) approaches to DIF detection, including ordinal logistic regression (OLR) and logistic discriminant functional analysis (LDFA). Results indicated that both the standard OLR and LDFA yielded severely inflated false positive rates as the magnitude of the differences in ERS between two groups increased. This study proposed a class of modified LR approaches to eliminate the ERS effect on DIF assessment. These proposed modifications showed satisfactory control of false positive rates when no DIF items existed and yielded better control of false positive rates and more accurate true positive rates under DIF conditions than the conventional LR approaches did. In conclusion, the proposed modifications are recommended in survey research when there are multiple groups or cultural groups.

  7. Insight into the Properties of the UK Power Consumption Using a Linear Regression and Wavelet Transform Approach

    CERN Document Server

    Avdakovic, Samir; Nuhanovic, Amir

    2013-01-01

    In this paper, the relationship between the Gross Domestic Product (GDP), air temperature variations and power consumption is evaluated using the linear regression and Wavelet Coherence (WTC) approach on a 1971-2011 time series for the United Kingdom (UK). The results based on the linear regression approach indicate that some 66% of the variability of the UK electricity demand can be explained by the quarterly GDP variations, while only 11% of the quarterly changes of the UK electricity demand are caused by seasonal air temperature variations. WTC, however, can detect the period of time when GDP and air temperature significantly correlate with electricity demand, and the results of the wavelet correlation at different time scales indicate that a significant correlation is to be found on a long-term basis for GDP and on an annual basis for seasonal air-temperature variations. This approach provides an insight into the properties of the impact of the main factors on power consumption on the basis of which the power syst...

  8. Exposed tree root analysis as a dendrogeomorphic approach to estimating bank retreat at the South River, Virginia

    Science.gov (United States)

    Stotts, Stephanie; O'Neal, Michael; Pizzuto, James; Hupp, Cliff

    2014-10-01

    We use a biometric approach based on anatomical changes in the wood of exposed tree roots to quantify riverbank erosion along South River, Virginia, a site where commonly applied techniques for determining bank erosion rates are either not appropriate because of the required spatial scale of analysis (i.e., erosion pins, traditional surveys, LiDAR analysis) or have failed to detect obvious erosion (i.e., photogrammetric techniques). We sampled 73 exposed roots from 22 study reaches and identified the year of exposure macroscopically (2 to 20 times magnification) and microscopically (20 to 100 times magnification), comparing the estimated erosion rates between levels of magnification and to those obtained with photogrammetric techniques. We found no statistical differences between the results of macroscopic and microscopic analyses (t-test, α = 0.01) but encountered difficulty in identifying the year of root exhumation in some samples. When comparing exposed root analysis to photogrammetric techniques, the results indicate that the exposed root approach is a feasible and effective method for estimating annual- to decadal-scale bank erosion. In addition to producing erosion rates statistically indistinguishable from photogrammetric techniques at sites with erosion rates large enough for detection using historical aerial photographs (regression analysis and t-test, α = 0.01), exposed root analysis was able to estimate erosion rates at sites where photogrammetric techniques failed. We also identify deciduous species well suited for this approach (Fraxinus pennsylvanica) and others that prove more problematic (e.g., Acer negundo, Celtis occidentalis, Acer saccharinum). This study is significant because it describes a robust tool that provides insights into annual- to decadal-scale erosion where other commonly applied techniques may not be appropriate or easily applied.

  9. An Enhanced MEMS Error Modeling Approach Based on Nu-Support Vector Regression

    Directory of Open Access Journals (Sweden)

    Deepak Bhatt

    2012-07-01

    Full Text Available Micro Electro Mechanical System (MEMS)-based inertial sensors have made possible the development of a civilian land vehicle navigation system by offering a low-cost solution. However, the accurate modeling of the MEMS sensor errors is one of the most challenging tasks in the design of low-cost navigation systems. These sensors exhibit significant errors such as biases, drift and noise, which are negligible for higher grade units. Different conventional techniques utilizing the Gauss Markov model and neural network method have been previously utilized to model the errors. However, the Gauss Markov model works unsatisfactorily in the case of MEMS units due to the presence of high inherent sensor errors. On the other hand, modeling the random drift utilizing a Neural Network (NN) is time consuming, thereby affecting its real-time implementation. We overcome these existing drawbacks by developing an enhanced Support Vector Machine (SVM)-based error model. Unlike NNs, SVMs do not suffer from local minimisation or over-fitting problems and deliver a reliable global solution. Experimental results proved that the proposed SVM approach reduced the noise standard deviation by 10–35% for gyroscopes and 61–76% for accelerometers. Further, positional error drifts under static conditions improved by 41% and 80% in comparison to NN and GM approaches.

  10. The Public-Private Sector Wage Gap in Zambia in the 1990s: A Quantile Regression Approach

    DEFF Research Database (Denmark)

    Nielsen, Helena Skyt; Rosholm, Michael

    2001-01-01

    We investigate the determinants of wages in Zambia and, based on the quantile regression approach, we analyze how their effects differ at different points in the wage distribution and over time. We use three cross-sections of Zambian household data from the early nineties, which was a period of economic transition, because items such as privatization and deregulation were on the political agenda. The focus is placed on the public-private sector wage gap, and the results show that this gap was relatively favorable for the low-skilled and less favorable for the high-skilled. This picture was further...

  11. The Public-Private Sector Wage Gap in Zambia in the 1990s: A Quantile Regression Approach

    DEFF Research Database (Denmark)

    Nielsen, Helena Skyt; Rosholm, Michael

    2001-01-01

    We investigate the determinants of wages in Zambia and, based on the quantile regression approach, we analyze how their effects differ at different points in the wage distribution and over time. We use three cross-sections of Zambian household data from the early nineties, which was a period of economic transition, because items such as privatization and deregulation were on the political agenda. The focus is placed on the public-private sector wage gap, and the results show that this gap was relatively favorable for the low-skilled and less favorable for the high-skilled. This picture was further...
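
    Quantile regression looks at different points of a distribution by minimizing an asymmetric "pinball" (check) loss instead of squared error. The sketch below is a simplified, covariate-free illustration of that idea, assuming made-up wage values; it is not the model estimated in the study, and all names are hypothetical.

```python
def pinball_loss(residual: float, tau: float) -> float:
    """Check loss: under-predictions cost tau, over-predictions cost (1 - tau)."""
    return tau * residual if residual >= 0 else (tau - 1.0) * residual


def quantile_by_loss(values, tau):
    """Return the observed value that minimizes the summed pinball loss."""
    candidates = sorted(values)
    return min(candidates, key=lambda c: sum(pinball_loss(v - c, tau) for v in values))


# Hypothetical wages with one very high earner.
wages = [1.0, 2.0, 3.0, 4.0, 100.0]

median_wage = quantile_by_loss(wages, 0.5)  # middle of the distribution
upper_wage = quantile_by_loss(wages, 0.9)   # upper tail
```

    Changing tau moves the fitted point along the distribution, which is why effects estimated this way can differ between low and high earners.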

  12. Tree-ring analysis and modeling approaches yield contrary response of circumboreal forest productivity to climate change.

    Science.gov (United States)

    Tei, Shunsuke; Sugimoto, Atsuko; Yonenobu, Hitoshi; Matsuura, Yojiro; Osawa, Akira; Sato, Hisashi; Fujinuma, Junichi; Maximov, Trofim

    2017-06-06

    Circumboreal forest ecosystems are exposed to a larger magnitude of warming in comparison with the global average, as a result of warming-induced environmental changes. However, it is not clear how tree growth in these ecosystems responds to these changes. In this study, we investigated the sensitivity of forest productivity to climate change using ring width indices (RWI) from a tree-ring width dataset accessed from the International Tree-Ring Data Bank and gridded climate datasets from the Climate Research Unit. A negative relationship of RWI with summer temperature and recent reductions in RWI were typically observed in continental dry regions, such as inner Alaska and Canada, southern Europe, and the southern part of eastern Siberia. We then developed multiple regression models with regional meteorological parameters to predict RWI, and applied these models to predict how tree growth will respond to twenty-first-century climate change (RCP8.5 scenario). The projections showed a spatial variation and future continuous reduction in tree growth in those continental dry regions. The spatial variation, however, could not be reproduced by a dynamic global vegetation model (DGVM). The DGVM projected a generally positive trend in future tree growth all over the circumboreal region. These results indicate that DGVMs may overestimate future wood net primary productivity (NPP) in continental dry regions such as these; this seems to be a common feature of current DGVMs. DGVMs should be able to express the negative effect of warming on tree growth, so that they simulate the observed recent reduction in tree growth in continental dry regions. © 2017 John Wiley & Sons Ltd.

  13. Identifying tree crown delineation shapes and need for remediation on high resolution imagery using an evidence based approach

    Science.gov (United States)

    Leckie, Donald G.; Walsworth, Nicholas; Gougeon, François A.

    2016-04-01

    In order to fully realize the benefits of automated individual tree mapping for tree species, health, forest inventory attribution and forest management decision making, the tree delineations should be as good as possible. The concept of identifying poorly delineated tree crowns and suggesting likely types of remediation was investigated. Delineations (isolations or isols) were classified into shape types reflecting whether they were realistic tree shapes and the likely kind of remediation needed. Shape type was classified by an evidence based rules approach using primitives based on isol size, shape indices, morphology, the presence of local maxima, and matches with template models representing trees of different sizes. A test set containing 50,000 isols based on an automated tree delineation of 40 cm multispectral airborne imagery of a diverse temperate-boreal forest site was used. Isolations representing single trees or several trees were the focus, as opposed to cases where a tree is split into several isols. For eight shape classes from regular through to convolute, shape classification accuracy was in the order of 62%; simplifying to six classes, accuracy was 83%. Shape type did give an indication of the type of remediation, and there were 6% false alarms (i.e., isols classed as needing remediation that did not need it). Conversely, there were 5% omissions (i.e., isols of regular shape and not earmarked for remediation that did need remediation). The usefulness of the concept of identifying poor delineations in need of remediation was demonstrated, and one suite of methods was developed and shown to be effective.

  14. Model-Independent Evaluation of Tumor Markers and a Logistic-Tree Approach to Diagnostic Decision Support

    Directory of Open Access Journals (Sweden)

    Weizeng Ni

    2014-01-01

    Full Text Available Sensitivity and specificity of using individual tumor markers hardly meet the clinical requirement. This challenge gave rise to many efforts, e.g., combining multiple tumor markers and employing machine learning algorithms. However, results from different studies are often inconsistent, which is partially attributed to the use of different evaluation criteria. Also, the wide use of model-dependent validation leads to a high possibility of data overfitting when complex models are used for diagnosis. We propose two model-independent criteria, namely, area under the curve (AUC) and Relief, to evaluate the diagnostic values of individual and multiple tumor markers, respectively. For diagnostic decision support, we propose the use of logistic-tree, which combines decision tree and logistic regression. Application on a colorectal cancer dataset shows that the proposed evaluation criteria produce results that are consistent with current knowledge. Furthermore, the simple and highly interpretable logistic-tree has diagnostic performance that is competitive with other complex models.

  15. A physarum-inspired prize-collecting steiner tree approach to identify subnetworks for drug repositioning.

    Science.gov (United States)

    Sun, Yahui; Hameed, Pathima Nusrath; Verspoor, Karin; Halgamuge, Saman

    2016-12-05

    Drug repositioning can reduce the time, costs and risks of drug development by identifying new therapeutic effects for known drugs. It is challenging to reposition drugs as pharmacological data is large and complex. Subnetwork identification has already been used to simplify the visualization and interpretation of biological data, but it has not been applied to drug repositioning so far. In this paper, we fill this gap by proposing a new Physarum-inspired Prize-Collecting Steiner Tree algorithm to identify subnetworks for drug repositioning. Drug Similarity Networks (DSN) are generated using the chemical, therapeutic, protein, and phenotype features of drugs. In DSNs, vertex prizes and edge costs represent the similarities and dissimilarities between drugs respectively, and terminals represent drugs in the cardiovascular class, as defined in the Anatomical Therapeutic Chemical classification system. A new Physarum-inspired Prize-Collecting Steiner Tree algorithm is proposed in this paper to identify subnetworks. We apply both the proposed algorithm and the widely-used GW algorithm to identify subnetworks in our 18 generated DSNs. In these DSNs, our proposed algorithm identifies subnetworks with an average Rand Index of 81.1%, while the GW algorithm can only identify subnetworks with an average Rand Index of 64.1%. We select 9 subnetworks with high Rand Index to find drug repositioning opportunities. 10 frequently occurring drugs in these subnetworks are identified as candidates to be repositioned for cardiovascular diseases. We find evidence to support previous discoveries that nitroglycerin, theophylline and acarbose may be able to be repositioned for cardiovascular diseases. Moreover, we identify seven previously unknown drug candidates that also may interact with the biological cardiovascular system. These discoveries show our proposed Prize-Collecting Steiner Tree approach as a promising strategy for drug repositioning.

  16. A novel prediction approach for antimalarial activities of Trimethoprim, Pyrimethamine, and Cycloguanil analogues using extremely randomized trees.

    Science.gov (United States)

    Nattee, Cholwich; Khamsemanan, Nirattaya; Lawtrakul, Luckhana; Toochinda, Pisanu; Hannongbua, Supa

    2017-01-01

    Malaria is still one of the most serious diseases in tropical regions. This is due in part to the high resistance against available drugs for the inhibition of parasites, Plasmodium, the cause of the disease. New potent compounds with high clinical utility are urgently needed. In this work, we created a novel model using a regression tree to study structure-activity relationships and predict the inhibition constant, Ki, of three different antimalarial analogues (Trimethoprim, Pyrimethamine, and Cycloguanil) based on their molecular descriptors. To the best of our knowledge, this work is the first attempt to study the structure-activity relationships of all three analogues combined. The most relevant descriptors and appropriate parameters of the regression tree are harvested using extremely randomized trees. These descriptors are water accessible surface area, Log of the aqueous solubility, total hydrophobic van der Waals surface area, and molecular refractivity. Out of all possible combinations of these selected parameters and descriptors, the tree with the strongest coefficient of determination is selected to be our prediction model. Predicted Ki values from the proposed model show a strong coefficient of determination, R² = 0.996, to experimental Ki values. From the structure of the regression tree, compounds with high accessible surface area of all hydrophobic atoms (ASA_H) and low aqueous solubility of inhibitors (Log S) generally possess low Ki values. Our prediction model can also be utilized as a screening test for new antimalarial drug compounds which may reduce the time and expenses for new drug development. New compounds with high predicted Ki should be excluded from further drug development. It is also our inference that a threshold of ASA_H greater than 575.80 and Log S less than or equal to -4.36 is a sufficient condition for a new compound to possess a low Ki.
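
    The screening rule stated at the end of the abstract can be written as a small check. The two thresholds are the values quoted above; the function name and the example descriptor values are hypothetical and are not taken from the paper. Because the abstract describes the rule as a sufficient condition only, a False result says nothing about a compound's Ki.

```python
# Thresholds quoted in the abstract; everything else is illustrative.
ASA_H_THRESHOLD = 575.80
LOG_S_THRESHOLD = -4.36


def meets_low_ki_condition(asa_h: float, log_s: float) -> bool:
    """True when ASA_H > 575.80 and Log S <= -4.36 (the stated sufficient condition)."""
    return asa_h > ASA_H_THRESHOLD and log_s <= LOG_S_THRESHOLD


# Hypothetical descriptor values (ASA_H, Log S) for three made-up compounds.
candidates = {
    "compound_a": (600.0, -5.0),
    "compound_b": (500.0, -5.0),
    "compound_c": (600.0, -3.0),
}
flagged = {name: meets_low_ki_condition(*desc) for name, desc in candidates.items()}
```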

  17. Topological and canonical kriging for design flood prediction in ungauged catchments: an improvement over a traditional regional regression approach?

    Science.gov (United States)

    Archfield, Stacey A.; Pugliese, Alessio; Castellarin, Attilio; Skøien, Jon O.; Kiang, Julie E.

    2013-01-01

    In the United States, estimation of flood frequency quantiles at ungauged locations has been largely based on regional regression techniques that relate measurable catchment descriptors to flood quantiles. More recently, spatial interpolation techniques of point data have been shown to be effective for predicting streamflow statistics (i.e., flood flows and low-flow indices) in ungauged catchments. Literature reports successful applications of two techniques, canonical kriging, CK (or physiographical-space-based interpolation, PSBI), and topological kriging, TK (or top-kriging). CK performs the spatial interpolation of the streamflow statistic of interest in the two-dimensional space of catchment descriptors. TK predicts the streamflow statistic along river networks taking both the catchment area and nested nature of catchments into account. It is of interest to understand how these spatial interpolation methods compare with generalized least squares (GLS) regression, one of the most common approaches to estimate flood quantiles at ungauged locations. By means of a leave-one-out cross-validation procedure, the performance of CK and TK was compared to GLS regression equations developed for the prediction of 10, 50, 100 and 500 yr floods for 61 streamgauges in the southeast United States. TK substantially outperforms GLS and CK for the study area, particularly for large catchments. The performance of TK over GLS highlights an important distinction between the treatments of spatial correlation when using regression-based or spatial interpolation methods to estimate flood quantiles at ungauged locations. The analysis also shows that coupling TK with CK slightly improves the performance of TK; however, the improvement is marginal when compared to the improvement in performance over GLS.
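
    The leave-one-out cross-validation used for the comparison can be sketched as follows. Each site is withheld in turn, a model is fitted to the remaining sites, and the withheld value is predicted. The sketch assumes a one-predictor ordinary least squares line as the model, which stands in for the GLS and kriging methods of the study; the data and function names are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loocv_rmse(xs, ys):
    """Root mean square error of predictions made with each point held out in turn."""
    squared_errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        squared_errors.append((ys[i] - (slope * xs[i] + intercept)) ** 2)
    return (sum(squared_errors) / len(squared_errors)) ** 0.5


# Hypothetical catchment descriptor (x) and flood quantile (y) values.
exact_x = [1.0, 2.0, 3.0, 4.0, 5.0]
exact_y = [3.0, 5.0, 7.0, 9.0, 11.0]   # lies exactly on a line
noisy_y = [3.0, 5.0, 12.0, 9.0, 11.0]  # one site departs from the line

rmse_exact = loocv_rmse(exact_x, exact_y)
rmse_noisy = loocv_rmse(exact_x, noisy_y)
```

    Because every prediction is made for a site the model has not seen, the resulting error mimics prediction at an ungauged location.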

  18. Topological and canonical kriging for design flood prediction in ungauged catchments: an improvement over a traditional regional regression approach?

    Directory of Open Access Journals (Sweden)

    S. A. Archfield

    2013-04-01

    Full Text Available In the United States, estimation of flood frequency quantiles at ungauged locations has been largely based on regional regression techniques that relate measurable catchment descriptors to flood quantiles. More recently, spatial interpolation techniques of point data have been shown to be effective for predicting streamflow statistics (i.e., flood flows and low-flow indices) in ungauged catchments. Literature reports successful applications of two techniques, canonical kriging, CK (or physiographical-space-based interpolation, PSBI), and topological kriging, TK (or top-kriging). CK performs the spatial interpolation of the streamflow statistic of interest in the two-dimensional space of catchment descriptors. TK predicts the streamflow statistic along river networks taking both the catchment area and nested nature of catchments into account. It is of interest to understand how these spatial interpolation methods compare with generalized least squares (GLS) regression, one of the most common approaches to estimate flood quantiles at ungauged locations. By means of a leave-one-out cross-validation procedure, the performance of CK and TK was compared to GLS regression equations developed for the prediction of 10, 50, 100 and 500 yr floods for 61 streamgauges in the southeast United States. TK substantially outperforms GLS and CK for the study area, particularly for large catchments. The performance of TK over GLS highlights an important distinction between the treatments of spatial correlation when using regression-based or spatial interpolation methods to estimate flood quantiles at ungauged locations. The analysis also shows that coupling TK with CK slightly improves the performance of TK; however, the improvement is marginal when compared to the improvement in performance over GLS.

  19. Semiparametric approach for non-monotone missing covariates in a parametric regression model

    KAUST Repository

    Sinha, Samiran

    2014-02-26

    Missing covariate data often arise in biomedical studies, and analysis of such data that ignores subjects with incomplete information may lead to inefficient and possibly biased estimates. A great deal of attention has been paid to handling a single missing covariate or a monotone pattern of missing data when the missingness mechanism is missing at random. In this article, we propose a semiparametric method for handling non-monotone patterns of missing data. The proposed method relies on the assumption that the missingness mechanism of a variable does not depend on the missing variable itself but may depend on the other missing variables. This mechanism is somewhat less general than the completely non-ignorable mechanism but is sometimes more flexible than the missing at random mechanism where the missingness mechanism is allowed to depend only on the completely observed variables. The proposed approach is robust to misspecification of the distribution of the missing covariates, and the proposed mechanism helps to nullify (or reduce) the problems due to non-identifiability that result from the non-ignorable missingness mechanism. The asymptotic properties of the proposed estimator are derived. Finite sample performance is assessed through simulation studies. Finally, for the purpose of illustration we analyze an endometrial cancer dataset and a hip fracture dataset.

  20. Dual wavelet energy approach-regression analysis for exploring steel micro structural behavior

    Science.gov (United States)

    Bettayeb, Fairouz

    2012-05-01

    Ultrasonic NDT data are time series data, decomposed into signal plus noise, obtained from ultrasonic waves traveling inside a material and captured by piezoelectric sensors. The naturally inhomogeneous and anisotropic character of steel causes high acoustic attenuation and scattering. This makes data interpretation highly complex for most qualified NDT operators. In this paper we address the nonlinear features of backscattered ultrasonic waves from steel plates. The structural noise data captured from the specimens, and processed by an algorithm based on a wavelet energy approach, show significant insights into the relationship between backscattered noise and material microstructures. This algorithm, along with correlation coefficients, residuals and interpolation calculations of processed ultrasonic data, seems to be a well-adapted signal analysis tool for viewing material microstructural dimension scales. Experiments show an interesting 3D interface and indicate a quasi-linear signal energy distribution at the microstructural level. This suggests the probable incidence of microstructure acoustic signatures at different energy scales of the material phases. In conclusion, multi-polynomial interpolations of processed noise data exhibit an attractor shape, which suggests that modeling of the noise data should involve chaos theory.

  1. Semiparametric approach for non-monotone missing covariates in a parametric regression model.

    Science.gov (United States)

    Sinha, Samiran; Saha, Krishna K; Wang, Suojin

    2014-06-01

    Missing covariate data often arise in biomedical studies, and analysis of such data that ignores subjects with incomplete information may lead to inefficient and possibly biased estimates. A great deal of attention has been paid to handling a single missing covariate or a monotone pattern of missing data when the missingness mechanism is missing at random. In this article, we propose a semiparametric method for handling non-monotone patterns of missing data. The proposed method relies on the assumption that the missingness mechanism of a variable does not depend on the missing variable itself but may depend on the other missing variables. This mechanism is somewhat less general than the completely non-ignorable mechanism but is sometimes more flexible than the missing at random mechanism where the missingness mechanism is allowed to depend only on the completely observed variables. The proposed approach is robust to misspecification of the distribution of the missing covariates, and the proposed mechanism helps to nullify (or reduce) the problems due to non-identifiability that result from the non-ignorable missingness mechanism. The asymptotic properties of the proposed estimator are derived. Finite sample performance is assessed through simulation studies. Finally, for the purpose of illustration we analyze an endometrial cancer dataset and a hip fracture dataset.

  2. Regressive Prediction Approach to Vertical Handover in Fourth Generation Wireless Networks

    Directory of Open Access Journals (Sweden)

    Abubakar M. Miyim

    2014-11-01

    Full Text Available The ever-increasing demand for deployment of wireless access networks has left wireless mobile devices facing many challenges in choosing the most suitable network from a set of available access networks. Among the weighty issues in 4G wireless networks are the speed and seamlessness of the handover process. This paper therefore proposes a handover technique based on movement prediction in a wireless mobile (WiMAX and LTE-A) environment. The technique enables the system to predict signal quality between the UE and Radio Base Stations (RBS)/Access Points (APs) in two different networks. Prediction is achieved by employing the Markov Decision Process Model (MDPM), where the movement of the UE is dynamically estimated and averaged to keep track of the signal strength of mobile users. With the help of the prediction, layer-3 handover activities are able to occur prior to layer-2 handover, and therefore total handover latency can be reduced. The performances of various handover approaches influenced by different metrics (mobility velocities) were evaluated. The results presented demonstrate the good accuracy that the proposed method was able to achieve in predicting the next signal level and in reducing the total handover latency.

  3. Assessment of the classification abilities of the CNS multi-parametric optimization approach by the method of logistic regression.

    Science.gov (United States)

    Raevsky, O A; Polianczyk, D E; Mukhametov, A; Grigorev, V Y

    2016-08-01

    Assessment of "CNS drugs/CNS candidates" classification abilities of the multi-parametric optimization (CNS MPO) approach was performed by logistic regression. It was found that five of the six separately used physical-chemical properties (topological polar surface area, number of hydrogen-bonded donor atoms, basicity, lipophilicity of compound in neutral form and at pH = 7.4) provided accuracy of recognition below 60%. Only the descriptor of molecular weight (MW) could correctly classify two-thirds of the studied compounds. Aggregation of all six properties in the MPOscore did not improve the classification, which was worse than the classification using only MW. The results of our study demonstrate the imperfection of the CNS MPO approach; in its current form it is not very useful for computer design of new, effective CNS drugs.

  4. A Two-Stage Penalized Logistic Regression Approach to Case-Control Genome-Wide Association Studies

    Directory of Open Access Journals (Sweden)

    Jingyuan Zhao

    2012-01-01

    Full Text Available We propose a two-stage penalized logistic regression approach to case-control genome-wide association studies. This approach consists of a screening stage and a selection stage. In the screening stage, main-effect and interaction-effect features are screened by using L1-penalized logistic likelihoods. In the selection stage, the retained features are ranked by the logistic likelihood with the smoothly clipped absolute deviation (SCAD) penalty (Fan and Li, 2001) and Jeffrey's Prior penalty (Firth, 1993), a sequence of nested candidate models is formed, and the models are assessed by a family of extended Bayesian information criteria (J. Chen and Z. Chen, 2008). The proposed approach is applied to the analysis of the prostate cancer data of the Cancer Genetic Markers of Susceptibility (CGEMS) project in the National Cancer Institute, USA. Simulation studies are carried out to compare the approach with the pair-wise multiple testing approach (Marchini et al. 2005) and the LASSO-patternsearch algorithm (Shi et al. 2007).
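    To make the difference between the two penalties named in this abstract concrete, the following minimal sketch evaluates the L1 penalty and the SCAD penalty of Fan and Li (2001) for a single coefficient. It is an illustration only, not the authors' implementation: the function names are hypothetical, and the default a = 3.7 is the value suggested by Fan and Li rather than a setting reported in this record.

```python
def l1_penalty(theta, lam):
    """L1 (lasso) penalty: grows linearly with |theta| without bound."""
    return lam * abs(theta)


def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty of Fan and Li (2001) for one coefficient.

    Matches the L1 penalty for small |theta|, then bends through a
    quadratic piece and becomes constant for |theta| > a * lam, so that
    large coefficients are not shrunk further.
    a = 3.7 is an assumed default (the value suggested by Fan and Li).
    """
    t = abs(theta)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2


if __name__ == "__main__":
    lam = 1.0
    for theta in (0.5, 2.0, 10.0):
        print(theta, l1_penalty(theta, lam), scad_penalty(theta, lam))
```

    For small coefficients the two penalties coincide, which is consistent with using L1 for coarse screening; for large coefficients SCAD levels off, so strong effects are penalized no more than moderate ones.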

  5. Predictability of extreme weather events for NE U.S.: improvement of the numerical prediction using a Bayesian regression approach

    Science.gov (United States)

    Yang, J.; Astitha, M.; Anagnostou, E. N.; Hartman, B.; Kallos, G. B.

    2015-12-01

    Weather prediction accuracy has become very important for the Northeast U.S. given the devastating effects of extreme weather events in the recent years. Weather forecasting systems are used towards building strategies to prevent catastrophic losses for human lives and the environment. Concurrently, weather forecast tools and techniques have evolved with improved forecast skill as numerical prediction techniques are strengthened by increased super-computing resources. In this study, we examine the combination of two state-of-the-science atmospheric models (WRF and RAMS/ICLAMS) by utilizing a Bayesian regression approach to improve the prediction of extreme weather events for NE U.S. The basic concept behind the Bayesian regression approach is to take advantage of the strengths of two atmospheric modeling systems and, similar to the multi-model ensemble approach, limit their weaknesses which are related to systematic and random errors in the numerical prediction of physical processes. The first part of this study is focused on retrospective simulations of seventeen storms that affected the region in the period 2004-2013. Optimal variances are estimated by minimizing the root mean square error and are applied to out-of-sample weather events. The applicability and usefulness of this approach are demonstrated by conducting an error analysis based on in-situ observations from meteorological stations of the National Weather Service (NWS) for wind speed and wind direction, and NCEP Stage IV radar data, mosaicked from the regional multi-sensor for precipitation. The preliminary results indicate a significant improvement in the statistical metrics of the modeled-observed pairs for meteorological variables using various combinations of the sixteen events as predictors of the seventeenth. This presentation will illustrate the implemented methodology and the obtained results for wind speed, wind direction and precipitation, as well as set the research steps that will be

  6. Tree-based approach for exploring marine spatial patterns with raster datasets.

    Science.gov (United States)

    Liao, Xiaohan; Xue, Cunjin; Su, Fenzhen

    2017-01-01

    From multiple raster datasets to spatial association patterns, the data-mining technique is divided into three subtasks, i.e., raster dataset pretreatment, mining algorithm design, and spatial pattern exploration from the mining results. Comparison with the former two subtasks reveals that the latter remains unresolved. Confronted with the interrelated marine environmental parameters, we propose a Tree-based Approach for eXploring Marine Spatial Patterns with multiple raster datasets called TAXMarSP, which includes two models. One is the Tree-based Cascading Organization Model (TCOM), and the other is the Spatial Neighborhood-based CAlculation Model (SNCAM). TCOM designs the "Spatial node→Pattern node" from top to bottom layers to store the table-formatted frequent patterns. Together with TCOM, SNCAM considers the spatial neighborhood contributions to calculate the pattern-matching degree between the specified marine parameters and the table-formatted frequent patterns and then explores the marine spatial patterns. Using the prevalent quantification Apriori algorithm and a real remote sensing dataset from January 1998 to December 2014, a successful application of TAXMarSP to marine spatial patterns in the Pacific Ocean is described, and the obtained marine spatial patterns present not only the well-known but also new patterns to Earth scientists.

  7. Uncertain-tree: discriminating among competing approaches to the phylogenetic analysis of phenotype data

    Science.gov (United States)

    Tanner, Alastair R.; Fleming, James F.; Tarver, James E.; Pisani, Davide

    2017-01-01

    Morphological data provide the only means of classifying the majority of life's history, but the choice between competing phylogenetic methods for the analysis of morphology is unclear. Traditionally, parsimony methods have been favoured but recent studies have shown that these approaches are less accurate than the Bayesian implementation of the Mk model. Here we expand on these findings in several ways: we assess the impact of tree shape and maximum-likelihood estimation using the Mk model, as well as analysing data composed of both binary and multistate characters. We find that all methods struggle to correctly resolve deep clades within asymmetric trees, and when analysing small character matrices. The Bayesian Mk model is the most accurate method for estimating topology, but with lower resolution than other methods. Equal weights parsimony is more accurate than implied weights parsimony, and maximum-likelihood estimation using the Mk model is the least accurate method. We conclude that the Bayesian implementation of the Mk model should be the default method for phylogenetic estimation from phenotype datasets, and we explore the implications of our simulations in reanalysing several empirical morphological character matrices. A consequence of our finding is that high levels of resolution or the ability to classify species or groups with much confidence should not be expected when using small datasets. It is now necessary to depart from the traditional parsimony paradigms of constructing character matrices, towards datasets constructed explicitly for Bayesian methods. PMID:28077778

  8. Calculation of live tree timber volume based on particle swarm optimization and support vector regression

    Institute of Scientific and Technical Information of China (English)

    焦有权; 赵礼曦; 邓欧; 徐伟恒; 冯仲科

    2013-01-01

    several sections, and the volumes of the sections were summed to give the total tree volume. Based on the analytic data, unary models relating diameter at breast height to volume were established; with diameter at breast height and tree height as independent variables and tree volume as the dependent variable, binary models could also be established, as well as a ternary model describing the relationship between volume and three independent variables: diameter at breast height, tree height, and tree step form. Nevertheless, the models mentioned above are simple linear or nonlinear models. To estimate forest stocks in a forest survey, former researchers usually cut down target trees, extracted samples based on the principle of sampling, and then made a corresponding volume table. This felling-based method is destructive and time-consuming, and it damaged many dominant trees. Tree volume modeling is the key step in establishing a volume table, and volume was usually predicted by a volume equation derived from experience. However, because of the uncertainty of tree growth, it is difficult to capture the complexity and diversity of the volume model effectively through conventional volume equations. For this reason, the volume prediction accuracy is unsatisfactory. In order to improve the volume prediction accuracy, the particle swarm optimization (PSO) algorithm was introduced into the standing tree volume prediction model. Moreover, the parameters were optimized by the support vector regression (SVM). Diameters at breast height and tree heights of standing trees were input into the SVM for training; the SVM parameters were used as the particles of PSO, and the standing tree volumes measured by the authors served as the objective function of PSO. Predicted standing tree volumes were then obtained with the optimized parameters, which resulted from the mutual coordination of the particles, and the prediction values of

  9. Wavelet coupled MARS and M5 Model Tree approaches for groundwater level forecasting

    Science.gov (United States)

    Rezaie-balf, Mohammad; Naganna, Sujay Raghavendra; Ghaemi, Alireza; Deka, Paresh Chandra

    2017-10-01

    In this study, two different machine learning models, Multivariate Adaptive Regression Splines (MARS) and M5 Model Trees (MT), have been applied to simulate the groundwater level (GWL) fluctuations of three shallow open wells within diverse unconfined aquifers. The Wavelet coupled MARS and MT hybrid models were developed in an attempt to further increase the GWL forecast accuracy. The Discrete Wavelet Transform (DWT), which is particularly effective in dealing with non-stationary time-series data, was employed to decompose the input time series into various sub-series components. Historical data of 10 years (August 1996 to July 2006) comprising monthly groundwater level, rainfall, and temperature were used to calibrate and validate the models. The models were calibrated and tested for one-, three- and six-month-ahead forecast horizons. The wavelet coupled MARS and MT models were compared with their simple counterparts using standard statistical performance evaluation measures such as Root Mean Square Error (RMSE), Normalized Nash-Sutcliffe Efficiency (NNSE) and Coefficient of Determination (R2). The wavelet coupled MARS and MT models developed using multi-scale input data performed better than their simple counterparts, and the forecast accuracy of the W-MARS models was superior to that of the W-MT models. Specifically, the DWT offered a better discrimination of the non-linear and non-stationary trends that were present at various scales in the time series of the input variables, thus enabling the W-MARS models to provide more accurate GWL forecasts.
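    The decomposition step described in this abstract can be sketched as follows. This is a minimal illustration that assumes the Haar wavelet, since the record does not say which mother wavelet or how many levels the authors used; the function names and the eight-value series are made up for the example.

```python
import math


def haar_dwt(signal):
    """One level of the Haar DWT. Returns (approximation, detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Invert one level of the Haar DWT."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out


def decompose(signal, levels):
    """Multi-level decomposition: repeatedly split the approximation.

    Returns the coarsest approximation and the detail coefficients of
    each level, finest scale first.
    """
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        details.append(detail)
    return approx, details


# Hypothetical monthly groundwater levels (not data from the study).
gwl = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 3.0]
approx, details = decompose(gwl, levels=2)

if __name__ == "__main__":
    print("approximation:", approx)
    print("details:", details)
```

    In a wavelet coupled model of the kind described, sub-series such as these, rather than the raw series, would be passed to the regression model as inputs.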

  10. Application of decision tree and logistic regression on the health literacy prediction of hypertension patients

    Institute of Scientific and Technical Information of China (English)

    李现文; 李春玉; Miyong Kim; 李贞姬; 黄德镐; 朱琴淑; 金今姬

    2012-01-01

    Objective: To study and evaluate the feasibility and accuracy of decision tree methods and logistic regression for predicting the health literacy of hypertension patients. Method: Two health literacy prediction models were built, one with logistic regression analysis and one with the Answer Tree software, and the receiver operating characteristic (ROC) curve was used to compare them. Result: The sensitivity (82.5%) and Youden index (50.9%) of the logistic regression model were higher than those of the decision tree model (77.9%, 48.0%); the specificity of the decision tree model (70.1%) was higher than that of the logistic regression model (68.4%), and its error rate (29.9%) was lower than that of the logistic regression model (31.6%). The areas under the ROC curve of the decision tree and logistic regression models were comparable (0.813 vs 0.847). Conclusion: The decision tree model predicted the health literacy of hypertension patients about as well as the logistic regression model. A health literacy screening strategy can be derived from the decision tree model, which suggests that data mining methods can be used to predict health literacy in patients with chronic disease.
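    The reported figures can be checked with a few lines of arithmetic. The sketch below recomputes the Youden index (sensitivity + specificity - 1) from the sensitivities and specificities quoted in the abstract. It also computes 1 - specificity, which happens to reproduce the quoted error rates; reading the "error rate" that way is an inference from the numbers, not something the record states.

```python
def youden_index(sensitivity, specificity):
    """Youden's J statistic, with both inputs given as fractions."""
    return sensitivity + specificity - 1.0


def false_positive_rate(specificity):
    """Share of true negatives that are misclassified as positive."""
    return 1.0 - specificity


# (sensitivity, specificity) as quoted in the abstract.
models = {
    "logistic regression": (0.825, 0.684),
    "decision tree": (0.779, 0.701),
}

results = {
    name: (youden_index(se, sp), false_positive_rate(sp))
    for name, (se, sp) in models.items()
}

if __name__ == "__main__":
    for name, (j, fpr) in results.items():
        print(f"{name}: Youden index = {j:.1%}, 1 - specificity = {fpr:.1%}")
```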

  11. D-Tree Approach to Constructing Overlapping Location-Dependent Data Regions

    Institute of Scientific and Technical Information of China (English)

    2005-01-01

    For the past few years, there has been an explosion in the number of individuals carrying wireless devices that are capable of conducting location-dependent information services (LDISs) and provide information based on locations specified in the queries. This paper examines the user's demand of vague queries, and then proposes system architecture of LDISs. In particular, based on D-tree index structure, it takes the method of membership function in fuzzy set theory to provide a more user-oriented approach to constructing overlapping location-dependent data regions, called complementary void overlapping data regions. This structure, carrying more information with less additional storage, brings balance between perceived usefulness and responsiveness, energy consumption, and bandwidth contention in wireless communications. The simulations show that this method is more flexible and practical in practice.

  12. Hybrid regression trees applied to the monitoring of dynamic safety of isolated networks with large wind power production contribution

    Energy Technology Data Exchange (ETDEWEB)

    Lopes, J.A Pecas; Vasconcelos, Maria Helena O.P. de [Instituto de Engenharia de Sistemas e Computadores (INESC), Porto (Portugal). E-mail: jpl@riff.fe.up.pt; hvasconcelos@inescn.pt]

    1999-07-01

    This paper summarizes the methodology adopted to define structures for the fast evaluation of the dynamic safety of isolated networks with a high share of wind power production. The methodology uses hybrid regression trees, which allow the robustness associated with the dynamic behavior of these networks to be quantified by emulating the minimum frequency deviation that the system will experience when subjected to a pre-defined perturbation. New procedures for automatic data generation are also presented; the generated data are used to construct the evaluation structures and to measure their performance. The paper describes a case study of the network of Terceira island in the Azores archipelago.

  13. The application of event-tree based approach in long-term crude oil scheduling

    Institute of Scientific and Technical Information of China (English)

    2010-01-01

    This paper addresses the problem of optimal operation in long-term crude oil scheduling, which involves unloading crude oil from vessels, transferring it to charging tanks and feeding it to the distillation units. The application of a new approach for modeling and optimization of long-term crude oil scheduling is presented, and the event-tree based modeling method, which is very different from mathematical programming, is employed. This approach is developed on the basis of natural language modeling and continuous time representation. Event triggered rules, a decomposition strategy, a depth-first search algorithm and a pruning strategy are adopted to improve the efficiency of searching for the optimum solution. This approach is successfully applied to an industrial-size problem over a horizon of 4 weeks, involving 7 vessels, 6 storage tanks, 6 charging tanks, 2 crude oil distillation units, and 6 crude oil types. The CPU (AMD 3000+, 2.0 GHz) solving time is less than 70 seconds.

  14. A piecewise regression approach for determining biologically relevant hydraulic thresholds for the protection of fish at river infrastructure

    Energy Technology Data Exchange (ETDEWEB)

    Boys, Craig A.; Robinson, Wayne; Miller, Brett; Pflugrath, Brett D.; Baumgartner, Lee J.; Navarro, Anna; Brown, Richard S.; Deng, Zhiqun

    2016-05-13

    Barotrauma injury can occur when fish are exposed to rapid decompression during downstream passage through river infrastructure. A piecewise regression approach was used to objectively quantify barotrauma injury thresholds in two physoclistous species (Murray cod Maccullochella peelii and silver perch Bidyanus bidyanus) following simulated infrastructure passage in barometric chambers. The probability of injuries such as swim bladder rupture; exophthalmia; and haemorrhage and emphysema in various organs increased as the ratio between the lowest exposure pressure and the acclimation pressure (ratio of pressure change RPCE/A) fell. The relationship was typically non-linear and piecewise regression was able to quantify thresholds in RPCE/A that once exceeded resulted in a substantial increase in barotrauma injury. Thresholds differed among injury types and between species but by applying a multi-species precautionary principle, the maintenance of exposure pressures at river infrastructure above 70% of acclimation pressure (RPCE/A of 0.7) should sufficiently protect downstream migrating juveniles of these two physoclistous species. These findings have important implications for determining the risk posed by current infrastructures and informing the design and operation of new ones.
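    As a rough illustration of how piecewise regression locates a threshold, the sketch below fits two separate least-squares lines on either side of every candidate split and keeps the split with the smallest total squared error. It is a generic two-segment fit, not the authors' procedure (the record does not give their model form), and the data points are invented for the example rather than taken from the study.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line. Returns (intercept, slope, sse)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, sse


def piecewise_fit(xs, ys, min_points=3):
    """Grid search over split positions for a two-segment linear fit.

    Returns (threshold, left_line, right_line), where each line is an
    (intercept, slope) pair and the threshold is the midpoint between
    the two observations on either side of the best split.
    """
    pairs = sorted(zip(xs, ys))
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    best = None
    for k in range(min_points, len(xs) - min_points + 1):
        left = fit_line(xs[:k], ys[:k])
        right = fit_line(xs[k:], ys[k:])
        total = left[2] + right[2]
        if best is None or total < best[0]:
            threshold = (xs[k - 1] + xs[k]) / 2
            best = (total, threshold, left[:2], right[:2])
    return best[1], best[2], best[3]


# Invented example: the response falls steeply with x up to 0.6, then stays flat.
x_values = [i / 10 for i in range(1, 11)]
y_values = [1.0 - 1.5 * x if x <= 0.6 else 0.05 for x in x_values]

threshold, left_line, right_line = piecewise_fit(x_values, y_values)

if __name__ == "__main__":
    print("estimated threshold:", threshold)
    print("left segment (intercept, slope):", left_line)
    print("right segment (intercept, slope):", right_line)
```

    Fitting the segments independently keeps the example short; a continuous "broken-stick" model that forces the two lines to meet at the threshold is a common alternative.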

  15. Forecasting Helicoverpa populations in Australia: A comparison of regression based models and a bioclimatic based modelling approach

    Institute of Scientific and Technical Information of China (English)

    Myron P. Zalucki; Michael J. Furlong

    2005-01-01

    Long-term forecasts of pest pressure are central to the effective management of many agricultural insect pests. In the eastern cropping regions of Australia, serious infestations of Helicoverpa punctigera (Wallengren) and H. armigera (Hübner) (Lepidoptera: Noctuidae) are experienced annually. Regression analyses of a long series of light-trap catches of adult moths were used to describe the seasonal dynamics of both species. The size of the spring generation in eastern cropping zones could be related to rainfall in putative source areas in inland Australia. Subsequent generations could be related to the abundance of various crops in agricultural areas, rainfall and the magnitude of the spring population peak. As rainfall figured prominently as a predictor variable, and can itself be predicted using the Southern Oscillation Index (SOI), trap catches were also related to this variable. The geographic distribution of each species was modelled in relation to climate, and CLIMEX was used to predict temporal variation in abundance at given putative source sites in inland Australia using historical meteorological data. These predictions were then correlated with subsequent pest abundance data in a major cropping region. The regression-based and bioclimatic-based approaches to predicting pest abundance are compared, and their utility in predicting and interpreting pest dynamics is discussed.

  16. ON THE INFLUENCE OF CLIMATE AND SOCIO-ECONOMIC CONDITION TO THE DENGUE INCIDENCES: A SEMIPARAMETRIC PANEL REGRESSION APPROACH

    Directory of Open Access Journals (Sweden)

    Mutiah Salamah

    2012-01-01

    Full Text Available Dengue is one of the most dangerous diseases in the world. In East Java province, Indonesia, in particular, dengue has been identified as one of the major causes of death. Hence, it is important to investigate the factors that drive the number of dengue incidences in this region. This study examines climate and socio-economic conditions, which are assumed to influence the number of dengue incidences in the examined region. The semiparametric panel regression approach has been applied and the results are compared with the standard panel regression. In this case, the socio-economic condition is treated parametrically while the climate effect is modeled nonparametrically. The analysis showed that the number of dengue incidences is significantly influenced by the income per capita and the number of inhabitants below 15 years of age. Furthermore, the dengue incidence is optimum under rainfall of 1500 to 3670 mm, temperature of 22 to 27 degrees and humidity of 82 to 87%. The elasticity allows us to identify the districts that are most and least responsive to changes in the climate variables. The study shows that Surabaya is the most responsive district with respect to changes in the climate variables.

  17. Operational optimization of irrigation scheduling for citrus trees using an ensemble based data assimilation approach

    Science.gov (United States)

    Hendricks Franssen, H.; Han, X.; Martinez, F.; Jimenez, M.; Manzano, J.; Chanzy, A.; Vereecken, H.

    2013-12-01

    Data assimilation (DA) techniques, like the local ensemble transform Kalman filter (LETKF) not only offer the opportunity to update model predictions by assimilating new measurement data in real time, but also provide an improved basis for real-time (DA-based) control. This study focuses on the optimization of real-time irrigation scheduling for fields of citrus trees near Picassent (Spain). For three selected fields the irrigation was optimized with DA-based control, and for other fields irrigation was optimized on the basis of a more traditional approach where reference evapotranspiration for citrus trees was estimated using the FAO-method. The performance of the two methods is compared for the year 2013. The DA-based real-time control approach is based on ensemble predictions of soil moisture profiles, using the Community Land Model (CLM). The uncertainty in the model predictions is introduced by feeding the model with weather predictions from an ensemble prediction system (EPS) and uncertain soil hydraulic parameters. The model predictions are updated daily by assimilating soil moisture data measured by capacitance probes. The measurement data are assimilated with help of LETKF. The irrigation need was calculated for each of the ensemble members, averaged, and logistic constraints (hydraulics, energy costs) were taken into account for the final assigning of irrigation in space and time. For the operational scheduling based on this approach only model states and no model parameters were updated by the model. Other, non-operational simulation experiments for the same period were carried out where (1) neither ensemble weather forecast nor DA were used (open loop), (2) Only ensemble weather forecast was used, (3) Only DA was used, (4) also soil hydraulic parameters were updated in data assimilation and (5) both soil hydraulic and plant specific parameters were updated. The FAO-based and DA-based real-time irrigation control are compared in terms of soil moisture

  18. Ottawa's urban forest: A geospatial approach to data collection for the UFORE/i-Tree Eco ecosystem services valuation model

    Science.gov (United States)

    Palmer, Michael D.

    The i-Tree Eco model, developed by the U.S. Forest Service, is commonly used to estimate the value of the urban forest and the ecosystem services trees provide. The model relies on field-based measurements to estimate ecosystem service values. However, the methods for collecting the field data required for the model can be extensive and costly for large areas, and data collection can thus be a barrier to implementing the model for many cities. This study investigated the use of geospatial technologies as a means to collect urban forest structure measurements within the City of Ottawa, Ontario. Results show that geospatial data collection methods can serve as a proxy for urban forest structure parameters required by i-Tree Eco. Valuations using the geospatial approach are shown to be less accurate than those developed from field-based data, but significantly less expensive. Planners must weigh the limitations of either approach when planning assessment projects.

  19. A Practical Approach for Extracting Tree Models in Forest Environments Based on Equirectangular Projections of Terrestrial Laser Scans

    Directory of Open Access Journals (Sweden)

    Felix Morsdorf

    2013-10-01

    Full Text Available Extracting 3D tree models based on terrestrial laser scanning (TLS) point clouds is a challenging task as trees are complex objects. Current TLS devices acquire high-density data that allow a detailed reconstruction of the tree topology. However, in dense forests a fully automatic reconstruction of trees is often limited by occlusion, wind influences and co-registration issues. In this paper, a semi-automatic method for extracting branching and stem structure based on equirectangular projections (range and intensity maps) is presented. The digitization of branches and stems is based on 2D maps, which enables simple navigation and raster processing. The modeling is performed for each viewpoint individually instead of using a registered point cloud. Previously reconstructed 2D-skeletons are transformed between the maps. Therefore, wind influences, orientation imperfections of scans and data gaps can be overcome. The method is applied to a TLS dataset acquired in a forest in Germany. In total 34 scans were carried out within a managed forest to measure approximately 90 spruce trees with minimal occlusions. The results demonstrate the feasibility of the presented approach to extract tree models with a high completeness and correctness and provide an excellent input for further modeling applications.

  20. Semi-Automated Approach for Mapping Urban Trees from Integrated Aerial LiDAR Point Cloud and Digital Imagery Datasets

    Science.gov (United States)

    Dogon-Yaro, M. A.; Kumar, P.; Rahman, A. Abdul; Buyuksalih, G.

    2016-09-01

    Mapping of trees plays an important role in modern urban spatial data management, as many benefits and applications derive from this detailed, up-to-date data source. Timely and accurate acquisition of information on the condition of urban trees serves as a tool for decision makers to better appreciate urban ecosystems and their numerous values, which are critical to building strategies for sustainable development. The conventional techniques used for extracting trees include ground surveying and interpretation of aerial photography. However, these techniques are associated with constraints, such as labour-intensive field work and high financial requirements, which can be overcome by means of integrated LiDAR and digital image datasets. In contrast to the predominant studies on tree extraction, which focus mainly on purely forested areas, this study concentrates on urban areas, which have a high structural complexity with a multitude of different objects. This paper presents a workflow for a semi-automated approach to extracting urban trees through integrated processing of airborne LiDAR point cloud and multispectral digital image datasets over the city of Istanbul, Turkey. The paper shows that the integrated datasets are a suitable technology and a viable source of information for urban tree management. In conclusion, the extracted information provides a snapshot of the location, composition and extent of trees in the study area, which is useful to city planners and other decision makers for understanding how much canopy cover exists, identifying new planting, removal, or reforestation opportunities, and determining which locations have the greatest need or potential to maximize return on investment. It can also help track trends or changes in urban trees over time and inform future management decisions.

  1. SEMI-AUTOMATED APPROACH FOR MAPPING URBAN TREES FROM INTEGRATED AERIAL LiDAR POINT CLOUD AND DIGITAL IMAGERY DATASETS

    Directory of Open Access Journals (Sweden)

    M. A. Dogon-Yaro

    2016-09-01

    Full Text Available Mapping of trees plays an important role in modern urban spatial data management, as many benefits and applications derive from this detailed, up-to-date data source. Timely and accurate acquisition of information on the condition of urban trees serves as a tool for decision makers to better appreciate urban ecosystems and their numerous values, which are critical to building strategies for sustainable development. The conventional techniques used for extracting trees include ground surveying and interpretation of aerial photography. However, these techniques are associated with constraints, such as labour-intensive field work and high financial requirements, which can be overcome by means of integrated LiDAR and digital image datasets. In contrast to the predominant studies on tree extraction, which focus mainly on purely forested areas, this study concentrates on urban areas, which have a high structural complexity with a multitude of different objects. This paper presents a workflow for a semi-automated approach to extracting urban trees through integrated processing of airborne LiDAR point cloud and multispectral digital image datasets over the city of Istanbul, Turkey. The paper shows that the integrated datasets are a suitable technology and a viable source of information for urban tree management. In conclusion, the extracted information provides a snapshot of the location, composition and extent of trees in the study area, which is useful to city planners and other decision makers for understanding how much canopy cover exists, identifying new planting, removal, or reforestation opportunities, and determining which locations have the greatest need or potential to maximize return on investment. It can also help track trends or changes in urban trees over time and inform future management decisions.

  2. Individual tree crown approach for predicting site index in boreal forests using airborne laser scanning and hyperspectral data

    Science.gov (United States)

    Kandare, Kaja; Ørka, Hans Ole; Dalponte, Michele; Næsset, Erik; Gobakken, Terje

    2017-08-01

    Site productivity is essential information for sustainable forest management and site index (SI) is the most common quantitative measure of it. The SI is usually determined for individual tree species based on tree height and the age of the 100 largest trees per hectare according to stem diameter. The present study aimed to demonstrate and validate a methodology for the determination of SI using remotely sensed data, in particular fused airborne laser scanning (ALS) and airborne hyperspectral data in a forest site in Norway. The applied approach was based on individual tree crown (ITC) delineation: tree species, tree height, diameter at breast height (DBH), and age were modelled and predicted at ITC level using 10-fold cross validation. Four dominant ITCs per 400 m2 plot were selected as input to predict SI at plot level for Norway spruce (Picea abies (L.) Karst.) and Scots pine (Pinus sylvestris L.). We applied an experimental setup with different subsets of dominant ITCs with different combinations of attributes (predicted or field-derived) for SI predictions. The results revealed that the selection of the dominant ITCs based on the largest DBH independent of tree species, predicted the SI with similar accuracy as ITCs matched with field-derived dominant trees (RMSE: 27.6% vs 23.3%). The SI accuracies were at the same level when dominant species were determined from the remotely sensed or field data (RMSE: 27.6% vs 27.8%). However, when the predicted tree age was used the SI accuracy decreased compared to field-derived age (RMSE: 27.6% vs 7.6%). In general, SI was overpredicted for both tree species in the mature forest, while there was an underprediction in the young forest. In conclusion, the proposed approach for SI determination based on ITC delineation and a combination of ALS and hyperspectral data is an efficient and stable procedure, which has the potential to predict SI in forest areas at various spatial scales and additionally to improve existing SI

  3. Tailored approach in inguinal hernia repair – Decision tree based on the guidelines

    Directory of Open Access Journals (Sweden)

    Ferdinand Köckerling

    2014-06-01

    Full Text Available The endoscopic procedures TEP and TAPP and the open techniques Lichtenstein, Plug and Patch and PHS currently represent the gold standard in inguinal hernia repair recommended in the guidelines of the European Hernia Society, the International Endohernia Society and the European Association of Endoscopic Surgery. 82% of experienced hernia surgeons use the tailored approach, that is, the differentiated use of the several inguinal hernia repair techniques depending on the findings of the patient, in order to minimize the risks. The following differential therapeutic situations must be distinguished in inguinal hernia repair: unilateral in men, unilateral in women, bilateral, scrotal, after previous pelvic and lower abdominal surgery, no general anaesthesia possible, recurrence and emergency surgery. Evidence-based guidelines and consensus conferences of experts give recommendations for the best approach in the individual situation of a patient. This review tries to summarize the recommendations of the various guidelines and to transfer them into a practical decision tree for the daily work of surgeons performing inguinal hernia repair.

  4. Optimization and analysis of decision trees and rules: Dynamic programming approach

    KAUST Repository

    Alkhalid, Abdulaziz

    2013-08-01

    This paper describes the software system Dagger, created at KAUST. The system is based on extensions of dynamic programming. It allows sequential optimization of decision trees and rules relative to different cost functions, and derivation of relationships between two cost functions (in particular, between number of misclassifications and depth of decision trees) and between cost and uncertainty of decision trees. We describe features of Dagger and consider examples of the system's work on decision tables from the UCI Machine Learning Repository. We also use Dagger to compare 16 different greedy algorithms for decision tree construction. © 2013 Taylor and Francis Group, LLC.

  5. Spatial Downscaling of TRMM Precipitation Product Using a Combined Multifractal and Regression Approach: Demonstration for South China

    Directory of Open Access Journals (Sweden)

    Guanghua Xu

    2015-06-01

    Full Text Available The lack of high spatial resolution precipitation data, which are crucial for the modeling and managing of hydrological systems, has triggered many attempts at spatial downscaling. The essence of downscaling lies in extracting extra information from a dataset through some scale-invariant characteristics related to the process of interest. While most studies utilize only one source of information, here we propose an approach that integrates two independent information sources, which are characterized by self-similarity and by relationships with other geo-referenced factors, respectively. This approach is applied to 16 years (1998–2013) of TRMM 3B43 monthly precipitation data in an orographic and monsoon influenced region in South China. Elevation, latitude, and longitude are used as predictive variables in the regression model, while self-similarity is characterized by multifractals and modeled by a log-normal multiplicative random cascade. The original 0.25° precipitation field was downscaled to the 0.01° scale. The result was validated with rain gauge data. Good consistency was achieved on coefficient of determination, bias, and root mean square error. This study contributes to the current precipitation downscaling methodology and is helpful for hydrology and water resources management, especially in areas with insufficient ground gauges.

  6. Structured Ordinary Least Squares: A Sufficient Dimension Reduction approach for regressions with partitioned predictors and heterogeneous units.

    Science.gov (United States)

    Liu, Yang; Chiaromonte, Francesca; Li, Bing

    2017-06-01

    In many scientific and engineering fields, advanced experimental and computing technologies are producing data that are not just high dimensional, but also internally structured. For instance, statistical units may have heterogeneous origins from distinct studies or subpopulations, and features may be naturally partitioned based on experimental platforms generating them, or on information available about their roles in a given phenomenon. In a regression analysis, exploiting this known structure in the predictor dimension reduction stage that precedes modeling can be an effective way to integrate diverse data. To pursue this, we propose a novel Sufficient Dimension Reduction (SDR) approach that we call structured Ordinary Least Squares (sOLS). This combines ideas from existing SDR literature to merge reductions performed within groups of samples and/or predictors. In particular, it leads to a version of OLS for grouped predictors that requires far less computation than recently proposed groupwise SDR procedures, and provides an informal yet effective variable selection tool in these settings. We demonstrate the performance of sOLS by simulation and present a first application to genomic data. The R package "sSDR," publicly available on CRAN, includes all procedures necessary to implement the sOLS approach. © 2016, The International Biometric Society.

  7. River flow prediction using hybrid models of support vector regression with the wavelet transform, singular spectrum analysis and chaotic approach

    Science.gov (United States)

    Baydaroğlu, Özlem; Koçak, Kasım; Duran, Kemal

    2017-03-01

    Prediction of water amount that will enter the reservoirs in the following month is of vital importance especially for semi-arid countries like Turkey. Climate projections emphasize that water scarcity will be one of the serious problems in the future. This study presents a methodology for predicting river flow for the subsequent month based on the time series of observed monthly river flow with hybrid models of support vector regression (SVR). Monthly river flow over the period 1940-2012 observed for the Kızılırmak River in Turkey has been used for training the method, which then has been applied for predictions over a period of 3 years. SVR is a specific implementation of support vector machines (SVMs), which transforms the observed input data time series into a high-dimensional feature space (input matrix) by way of a kernel function and performs a linear regression in this space. SVR requires a special input matrix. The input matrix was produced by wavelet transforms (WT), singular spectrum analysis (SSA), and a chaotic approach (CA) applied to the input time series. WT convolutes the original time series into a series of wavelets, and SSA decomposes the time series into a trend, an oscillatory and a noise component by singular value decomposition. CA uses a phase space formed by trajectories, which represent the dynamics producing the time series. These three methods for producing the input matrix for the SVR proved successful, while the SVR-WT combination resulted in the highest coefficient of determination and the lowest mean absolute error.
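
    The phase-space idea mentioned above can be illustrated with a time-delay embedding, which turns a single monthly series into an input matrix of lagged values for one-step-ahead prediction. This is only a minimal sketch: the function name `delay_embed`, the embedding dimension `dim`, the delay `tau`, and the sample flow values are all assumptions made for illustration and are not taken from the study.

```python
def delay_embed(series, dim, tau):
    """Build (inputs, targets) for one-step-ahead prediction.

    Each input row holds `dim` past values spaced `tau` steps apart,
    ordered oldest to newest; the target is the value that follows.
    """
    start = (dim - 1) * tau          # first index with a full history
    inputs, targets = [], []
    for t in range(start, len(series) - 1):
        row = [series[t - k * tau] for k in range(dim - 1, -1, -1)]
        inputs.append(row)
        targets.append(series[t + 1])
    return inputs, targets


# Made-up monthly flow values, for illustration only.
flow = [10.0, 12.0, 15.0, 11.0, 9.0, 14.0, 18.0, 13.0]
X, y = delay_embed(flow, dim=3, tau=2)
```

    The rows of `X` and the targets `y` could then be handed to any regression routine, for example a support vector regressor, although the kernel and parameters used in the study are not stated in the abstract.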

  8. A new non-invasive approach based on polyhexamethylene biguanide increases the regression rate of HPV infection

    Directory of Open Access Journals (Sweden)

    Gentile Antonio

    2012-09-01

    Full Text Available Abstract Background: HPV infection is a worldwide problem strictly linked to the development of cervical cancer. Persistence of the infection is one of the main factors responsible for the invasive progression, and women diagnosed with intraepithelial squamous lesions are referred for further assessment and surgical treatments which are prone to complications. Despite this, there are several reports on the spontaneous regression of the infection. This study was carried out to evaluate the effectiveness of a long-term polyhexamethylene biguanide (PHMB)-based local treatment in improving the viral clearance, reducing the time of exposure to the infection and avoiding the complications associated with the invasive treatments currently available. Method: 100 women diagnosed with HPV infection were randomly assigned to receive six months of treatment with a PHMB-based gynecological solution (Monogin®, Lo.Li. Pharma, Rome - Italy) or to remain untreated for the same period of time. Results: A greater number of patients who received the treatment were cleared of the infection at the two time points of the study (three and six months) compared to the control group. A significant difference in the regression rate (90% Monogin group vs 70% control group) was observed at the end of the study, highlighting the time-dependent ability of PHMB to interact with the infection progression. Conclusions: The topical treatment with PHMB is a preliminary safe and promising approach for patients with detected HPV infection, increasing the chance of clearance and avoiding the use of invasive treatments when not strictly necessary. Trial registration: ClinicalTrials.gov Identifier NCT01571141

  9. Quantile Regression Methods

    DEFF Research Database (Denmark)

    Fitzenberger, Bernd; Wilke, Ralf Andreas

    2015-01-01

    Quantile regression is emerging as a popular statistical approach, which complements the estimation of conditional mean models. While the latter only focuses on one aspect of the conditional distribution of the dependent variable, the mean, quantile regression provides more detailed insights by m...... treatment of the topic is based on the perspective of applied researchers using quantile regression in their empirical work....
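
    To make the contrast with conditional-mean models concrete, the sketch below uses the standard check (pinball) loss that quantile regression minimizes, here in the simplest intercept-only case. The function names and the sample data are assumptions chosen for illustration; they do not come from the record above.

```python
def check_loss(residual, tau):
    """Check (pinball) loss: weights positive residuals by tau
    and negative residuals by (1 - tau)."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def best_constant(values, tau):
    """Return the sample value that minimizes the total check loss.

    This is the intercept-only case of quantile regression, where the
    minimizer is a sample quantile of order tau.
    """
    return min(values, key=lambda q: sum(check_loss(v - q, tau) for v in values))


data = [1, 2, 3, 4, 100]            # one large outlier
median_fit = best_constant(data, 0.5)
upper_fit = best_constant(data, 0.9)
mean_fit = sum(data) / len(data)
```

    With these values the median fit stays at 3 while the mean is pulled to 22 by the outlier, which is the sense in which quantiles describe parts of the distribution that the mean alone does not.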

  10. Gene Expression Profiling of Colorectal Tumors and Normal Mucosa by Microarrays Meta-Analysis Using Prediction Analysis of Microarray, Artificial Neural Network, Classification, and Regression Trees

    Directory of Open Access Journals (Sweden)

    Chi-Ming Chu

    2014-01-01

    Full Text Available Background. Microarray technology shows great potential but previous studies were limited by small numbers of samples in colorectal cancer (CRC) research. The aims of this study are to investigate the gene expression profile of CRCs by pooling cDNA microarrays using PAM, ANN, and decision trees (CART and C5.0). Methods. Pooled 16 datasets contained 88 normal mucosal tissues and 1186 CRCs. PAM was performed to identify significantly expressed genes in CRCs and models of PAM, ANN, CART, and C5.0 were constructed for screening candidate genes via ranking gene order of significances. Results. The first screening identified 55 genes. The test accuracy of each model was over 0.97 on average. Fewer than eight genes achieved excellent classification accuracy. Combining the results of four models, we found the top eight differential genes in CRCs; suppressor genes, CA7, SPIB, GUCA2B, AQP8, IL6R and CWH43; oncogenes, SPP1 and TCN1. Genes of higher significances showed lower variation in rank ordering by different methods. Conclusion. We adopted a two-tier genetic screen, which not only reduced the number of candidate genes but also yielded good accuracy (nearly 100%). This method can be applied to future studies. Among the top eight genes, CA7, TCN1, and CWH43 have not been reported to be related to CRC.

  11. Dynamic fault trees resolution: A conscious trade-off between analytical and simulative approaches

    Energy Technology Data Exchange (ETDEWEB)

    Chiacchio, F., E-mail: chiacchio@dmi.unict.it [Dipartimento di Matematica e Informatica-DMI, Universita degli Studi di Catania (Italy); Compagno, L., E-mail: lco@diim.unict.it [Dipartimento di Ingegneria Industriale e Meccanica-DIIM, Universita degli Studi di Catania (Italy); D' Urso, D., E-mail: ddurso@diim.unict.it [Dipartimento di Ingegneria Industriale e Meccanica-DIIM, Universita degli Studi di Catania (Italy); Manno, G., E-mail: gmanno@dmi.unict.it [Dipartimento di Matematica e Informatica-DMI, Universita degli Studi di Catania (Italy); Trapani, N., E-mail: ntrapani@diim.unict.it [Dipartimento di Ingegneria Industriale e Meccanica-DIIM, Universita degli Studi di Catania (Italy)

    2011-11-15

    Safety assessment in industrial plants with 'major hazards' requires a rigorous combination of both qualitative and quantitative techniques of RAMS. Quantitative assessment can be executed by static or dynamic tools of dependability but, while the former are not sufficient to model exhaustively time-dependent activities, the latter are still too complex to be used with success by the operators of the industrial field. In this paper we present a review of the procedures that can be used to solve quite general dynamic fault trees (DFT) that present a combination of the following characteristics: time dependencies, repeated events and generalized probability failure. Theoretical foundations of the DFT theory are discussed and the limits of the best-known DFT tools are presented. Introducing the concept of weak and strong hierarchy, the well-known modular approach is adapted to study a more generic class of DFT. In order to quantify the approximations introduced, an ad-hoc simulative environment is used as benchmark. In the end, a DFT of an accidental scenario is analyzed with both analytical and simulative approaches. Final results are in good agreement and prove how it is possible to implement a suitable Monte Carlo simulation with the features of a spreadsheet environment, able to overcome the limits of the analytical tools, thus encouraging further research along this direction. - Highlights: > Theoretical foundations of the DFT are reviewed and the limits of the analytical techniques are assessed. > Hierarchical technique is discussed, introducing the concepts of weak and strong equivalence. > Simulative environment developed with a spreadsheet electronic document is tested. > Comparison between the simulative and the analytical results is performed. > Classification of which technique is more suitable is provided, depending on the complexity of the DFT.

  12. Selection bias in species distribution models: An econometric approach on forest trees based on structural modeling

    Science.gov (United States)

    Martin-StPaul, N. K.; Ay, J. S.; Guillemot, J.; Doyen, L.; Leadley, P.

    2014-12-01

    Species distribution models (SDMs) are widely used to study and predict the outcome of global changes on species. In human dominated ecosystems the presence of a given species is the result of both its ecological suitability and human footprint on nature such as land use choices. Land use choices may thus be responsible for a selection bias in the presence/absence data used in SDM calibration. We present a structural modelling approach (i.e. based on structural equation modelling) that accounts for this selection bias. The new structural species distribution model (SSDM) estimates simultaneously land use choices and species responses to bioclimatic variables. A land use equation based on an econometric model of landowner choices was joined to an equation of species response to bioclimatic variables. SSDM allows the residuals of both equations to be dependent, taking into account the possibility of shared omitted variables and measurement errors. We provide a general description of the statistical theory and a set of applications on forest trees over France using databases of climate and forest inventory at different spatial resolution (from 2km to 8km). We also compared the outputs of the SSDM with outputs of a classical SDM (i.e. Biomod ensemble modelling) in terms of bioclimatic response curves and potential distributions under current climate and climate change scenarios. The shapes of the bioclimatic response curves and the modelled species distribution maps differed markedly between SSDM and classical SDMs, with contrasted patterns according to species and spatial resolutions. The magnitude and directions of these differences were dependent on the correlations between the errors from both equations and were highest for higher spatial resolutions. A first conclusion is that the use of classical SDMs can potentially lead to strong misestimation of the actual and future probability of presence modelled. Beyond this selection bias, the SSDM we propose represents

  13. Quantile regression

    CERN Document Server

    Hao, Lingxin

    2007-01-01

    Quantile Regression, the first book of Hao and Naiman's two-book series, establishes the seldom recognized link between inequality studies and quantile regression models. Though separate methodological literature exists for each subject, the authors seek to explore the natural connections between this increasingly sought-after tool and research topics in the social sciences. Quantile regression as a method does not rely on assumptions as restrictive as those for the classical linear regression; though more traditional models such as least squares linear regression are more widely utilized, Hao

  14. A Short-Term and High-Resolution System Load Forecasting Approach Using Support Vector Regression with Hybrid Parameters Optimization

    Energy Technology Data Exchange (ETDEWEB)

    Jiang, Huaiguang [National Renewable Energy Laboratory (NREL), Golden, CO (United States)

    2017-08-25

    This work proposes an approach for distribution system load forecasting, which aims to provide highly accurate short-term load forecasting with high resolution utilizing a support vector regression (SVR) based forecaster and a two-step hybrid parameters optimization method. Specifically, because the load profiles in distribution systems contain abrupt deviations, a data normalization is designed as the pretreatment for the collected historical load data. Then an SVR model is trained by the load data to forecast the future load. For better performance of SVR, a two-step hybrid optimization algorithm is proposed to determine the best parameters. In the first step of the hybrid optimization algorithm, a designed grid traverse algorithm (GTA) is used to narrow the parameters searching area from a global to local space. In the second step, based on the result of the GTA, particle swarm optimization (PSO) is used to determine the best parameters in the local parameter space. After the best parameters are determined, the SVR model is used to forecast the short-term load deviation in the distribution system.
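
    The two-step search described above (a coarse grid to narrow the region, then particle swarm optimization inside it) can be sketched as follows. Everything here is an illustrative assumption: `toy_error` is a made-up stand-in for the SVR validation error, the parameter names `c` and `gamma`, the bounds, the grid size, and the PSO coefficients are conventional choices, and none of them are taken from the report itself.

```python
import random


def toy_error(c, gamma):
    """Hypothetical stand-in for a validation error; minimum at (3.3, 0.42)."""
    return (c - 3.3) ** 2 + (gamma - 0.42) ** 2


def coarse_grid(objective, bounds, steps=5):
    """Step 1: scan a coarse grid; return the best point and the grid spacing."""
    spacing = [(hi - lo) / (steps - 1) for lo, hi in bounds]
    best_point, best_val = None, float("inf")
    for i in range(steps):
        for j in range(steps):
            point = [bounds[0][0] + i * spacing[0], bounds[1][0] + j * spacing[1]]
            val = objective(*point)
            if val < best_val:
                best_point, best_val = point, val
    return best_point, spacing


def local_pso(objective, bounds, n_particles=20, n_iter=100, seed=0):
    """Step 2: basic particle swarm search restricted to `bounds`."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(*p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    w, c1, c2 = 0.7, 1.5, 1.5        # assumed inertia and acceleration weights
    for _ in range(n_iter):
        for k in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[k][d] = (w * vel[k][d]
                             + c1 * r1 * (pbest[k][d] - pos[k][d])
                             + c2 * r2 * (gbest[d] - pos[k][d]))
                lo, hi = bounds[d]
                pos[k][d] = min(hi, max(lo, pos[k][d] + vel[k][d]))
            val = objective(*pos[k])
            if val < pbest_val[k]:
                pbest[k], pbest_val[k] = pos[k][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[k][:], val
    return gbest, gbest_val


global_bounds = [(0.0, 10.0), (0.0, 1.0)]
center, spacing = coarse_grid(toy_error, global_bounds)
# Shrink the search box to one grid cell on either side of the best grid point.
local_bounds = [(max(lo, c - s), min(hi, c + s))
                for (lo, hi), c, s in zip(global_bounds, center, spacing)]
best, best_val = local_pso(toy_error, local_bounds)
```

    In a real setting the objective would be a cross-validated forecasting error of the SVR model rather than a closed-form function, so each evaluation would be far more expensive; that cost is the usual motivation for narrowing the region before running the swarm.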

  15. Prediction of Currency Volume Issued in Taiwan Using a Hybrid Artificial Neural Network and Multiple Regression Approach

    Directory of Open Access Journals (Sweden)

    Yuehjen E. Shao

    2013-01-01

    Full Text Available Because the volume of currency issued by a country always affects its interest rate, price index, income levels, and many other important macroeconomic variables, the prediction of currency volume issued has attracted considerable attention in recent years. In contrast to the typical single-stage forecast model, this study proposes a hybrid forecasting approach to predict the volume of currency issued in Taiwan. The proposed hybrid models consist of artificial neural network (ANN) and multiple regression (MR) components. The MR component of the hybrid models is established for a selection of fewer explanatory variables, wherein the selected variables are of higher importance. The ANN component is then designed to generate forecasts based on those important explanatory variables. Subsequently, the model is used to analyze a real dataset of Taiwan's currency from 1996 to 2011 and twenty associated explanatory variables. The prediction results reveal that the proposed hybrid scheme exhibits superior forecasting performance for predicting the volume of currency issued in Taiwan.

  16. Seemingly Unrelated Regression Approach for GSTARIMA Model to Forecast Rain Fall Data in Malang Southern Region Districts

    Directory of Open Access Journals (Sweden)

    Siti Choirun Nisak

    2016-06-01

    Full Text Available Time series forecasting models can be used to predict phenomena that occur in nature. Generalized Space Time Autoregressive (GSTAR) is a time series model used to forecast data that have both temporal and spatial elements. This model is limited to stationary and non-seasonal data. Generalized Space Time Autoregressive Integrated Moving Average (GSTARIMA) is an extension of GSTAR that accommodates non-stationary and seasonal data. Ordinary Least Squares (OLS) is a method used to estimate the parameters of the GSTARIMA model. Estimating the parameters of the GSTARIMA model using OLS will not produce an efficient estimator if the errors are correlated between locations. OLS assumes that the error variance-covariance matrix is constant, $\varepsilon \sim N(0, \sigma^2 I)$, but in fact the observation locations are correlated, so the variance-covariance matrix of the error is not constant. Therefore, the Seemingly Unrelated Regression (SUR) approach is used to overcome this weakness of OLS. The SUR assumption for estimating the parameters of the GSTARIMA model is $\varepsilon \sim N(0, \Sigma)$. The SUR parameters are estimated by Generalized Least Squares (GLS). Applying the GSTARIMA-SUR model to rainfall data in the Malang region yielded a GSTARIMA ((1)(1,12,36),(0),(1))-SUR model with an average coefficient of determination of 57.726%.
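
    For reference, the textbook closed forms of the two estimators contrasted above can be written as follows. This is the standard formulation, not a reproduction of the paper's own notation: the symbols $X$ (stacked design matrix), $y$ (stacked responses), $\Omega$ (error covariance matrix), $\Sigma$ (covariance of errors across locations) and $T$ (number of time points) are assumptions introduced here.

```latex
\hat{\beta}_{\mathrm{OLS}} = (X^\top X)^{-1} X^\top y,
\qquad
\hat{\beta}_{\mathrm{GLS}} = (X^\top \Omega^{-1} X)^{-1} X^\top \Omega^{-1} y,
\qquad
\Omega = \Sigma \otimes I_T .
```

    When $\Omega = \sigma^2 I$ the two estimators coincide, so GLS only differs from OLS when the errors at different locations are correlated or have unequal variances.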

  17. Cheminformatics Approach to Gene Silencing: Z Descriptors of Nucleotides and SVM Regression Afford Predictive Models for siRNA Potency.

    Science.gov (United States)

    Ebalunode, Jerry O; Zheng, Weifan

    2010-12-17

    Short interfering RNA mediated gene silencing technology has been through tremendous development over the past decade, and has found broad applications in both basic biomedical research and pharmaceutical development. Critical to the effective use of this technology is the development of reliable algorithms to predict the potency and selectivity of siRNAs under study. Existing algorithms are mostly built upon sequence information of siRNAs and then employ statistical pattern recognition or machine learning techniques to derive rules or models. However, sequence-based features have limited ability to characterize siRNAs, especially chemically modified ones. In this study, we proposed a cheminformatics approach to describe siRNAs. Principal component scores (z1, z2, z3, z4) have been derived for each of the 5 nucleotides (A, U, G, C, T) from the descriptor matrix computed by the MOE program. Descriptors of a given siRNA sequence are simply the concatenation of the z values of its composing nucleotides. Thus, for each of the 2431 siRNA sequences in the Huesken dataset, 76 descriptors were generated for the 19-NT representation, and 84 descriptors were generated for the 21-NT representation of siRNAs. Support Vector Machine regression (SVMR) was employed to develop predictive models. In all cases, the models achieved Pearson correlation coefficient r and R about 0.84 and 0.65 for the training sets and test sets, respectively. A minimum of 25 % of the whole dataset was needed to obtain predictive models that could accurately predict 75 % of the remaining siRNAs. Thus, for the first time, a cheminformatics approach has been developed to successfully model the structure-potency relationship in siRNA-based gene silencing data, which has laid a solid foundation for quantitative modeling of chemically modified siRNAs.
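
    The descriptor construction described above, four z values per nucleotide concatenated along the sequence, can be sketched as follows. The numeric values in `Z_SCORES` are hypothetical placeholders and not the published principal component scores, and the function name and the example sequences are likewise invented for illustration.

```python
# Hypothetical placeholder scores (z1, z2, z3, z4) for each nucleotide.
# These are NOT the values derived in the study.
Z_SCORES = {
    "A": (0.10, -0.20, 0.30, 0.00),
    "U": (-0.40, 0.25, 0.05, 0.15),
    "G": (0.35, 0.10, -0.30, -0.05),
    "C": (-0.15, -0.35, 0.20, 0.40),
    "T": (-0.45, 0.30, 0.00, 0.10),
}


def sirna_descriptors(sequence):
    """Concatenate the four z values of every nucleotide in order."""
    return [z for nt in sequence.upper() for z in Z_SCORES[nt]]


core_19 = "AUGCUAGCUAGCUAGCUAG"       # made-up 19-nucleotide sequence
full_21 = core_19 + "TT"              # made-up 21-nucleotide representation

d19 = sirna_descriptors(core_19)
d21 = sirna_descriptors(full_21)
```

    The vector lengths follow directly from the construction: 19 × 4 = 76 and 21 × 4 = 84, matching the descriptor counts quoted in the abstract.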

  18. A novel approach to equipment health management based on auto-regressive hidden semi-Markov model (AR-HSMM)

    Institute of Scientific and Technical Information of China (English)

    DONG Ming

    2008-01-01

    As a new maintenance method, CBM (condition based maintenance) is becoming more and more important for the health management of complicated and costly equipment. A prerequisite to widespread deployment of CBM technology and practice in industry is effective diagnostics and prognostics. Recently, a pattern recognition technique called HMM (hidden Markov model) was widely used in many fields. However, due to some unrealistic assumptions, diagnostic results from HMM were not so good, and it was difficult to use HMM directly for prognosis. By relaxing the unrealistic assumptions in HMM, this paper presents a novel approach to equipment health management based on auto-regressive hidden semi-Markov model (AR-HSMM). Compared with HMM, AR-HSMM has three advantages: 1) It allows explicitly modeling the time duration of the hidden states and therefore is capable of prognosis. 2) It can relax observations' independence assumption by accommodating a link between consecutive observations. 3) It does not follow the unrealistic Markov chain's memoryless assumption and therefore provides more powerful modeling and analysis capability for real problems. To facilitate the computation in the proposed AR-HSMM-based diagnostics and prognostics, new forward-backward variables are defined and a modified forward-backward algorithm is developed. The evaluation of the proposed methodology was carried out through a real world application case study: health diagnosis and prognosis of hydraulic pumps in Caterpillar Inc. The testing results show that the proposed new approach based on AR-HSMM is effective and can provide useful support for the decision-making in equipment health management.

  19. Modelling air pollution for epidemiologic research--Part I: A novel approach combining land use regression and air dispersion.

    Science.gov (United States)

    Mölter, A; Lindley, S; de Vocht, F; Simpson, A; Agius, R

    2010-11-01

    A common limitation of epidemiological studies on health effects of air pollution is the quality of exposure data available for study participants. Exposure data derived from urban monitoring networks is usually not adequately representative of the spatial variation of pollutants, while personal monitoring campaigns are often not feasible, due to time and cost restrictions. Therefore, many studies now rely on empirical modelling techniques, such as land use regression (LUR), to estimate pollution exposure. However, LUR still requires a quantity of specifically measured data to develop a model, which is usually derived from a dedicated monitoring campaign. A dedicated air dispersion modelling exercise is also possible but is similarly resource and data intensive. This study adopted a novel approach to LUR, which utilised existing data from an air dispersion model rather than monitored data. There are several advantages to such an approach such as a larger number of sites to develop the LUR model compared to monitored data. Furthermore, through this approach the LUR model can be adapted to predict temporal variation as well as spatial variation. The aim of this study was to develop two LUR models for an epidemiologic study based in Greater Manchester by using modelled NO(2) and PM(10) concentrations as dependent variables, and traffic intensity, emissions, land use and physical geography as potential predictor variables. The LUR models were validated through a set aside "validation" dataset and data from monitoring stations. The final models for PM(10) and NO(2) comprised nine and eight predictor variables respectively and had determination coefficients (R²) of 0.71 (PM(10): Adj. R²=0.70, F=54.89, p<0.001, NO(2): Adj. R²=0.70, F=62.04, p<0.001). Validation of the models using the validation data and measured data showed that the R² decreases compared to the final models, except for NO(2) validation in the measured data (validation data: PM(10): R²=0.33, NO(2

  20. A regression approach to the mapping of bio-physical characteristics of surface sediment using in situ and airborne hyperspectral acquisitions

    Science.gov (United States)

    Ibrahim, Elsy; Kim, Wonkook; Crawford, Melba; Monbaliu, Jaak

    2017-01-01

    Remote sensing has been successfully utilized to distinguish and quantify sediment properties in the intertidal environment. Classification approaches of imagery are popular and powerful yet can lead to site- and case-specific results. Such specificity creates challenges for temporal studies. Thus, this paper investigates the use of regression models to quantify sediment properties instead of classifying them. Two regression approaches, namely multiple regression (MR) and support vector regression (SVR), are used in this study for the retrieval of bio-physical variables of intertidal surface sediment of the IJzermonding, a Belgian nature reserve. In the regression analysis, mud content, chlorophyll a concentration, organic matter content, and soil moisture are estimated using radiometric variables of two airborne sensors, namely airborne hyperspectral sensor (AHS) and airborne prism experiment (APEX), and using field hyperspectral acquisitions by analytical spectral device (ASD). The performance of the two regression approaches is best for the estimation of moisture content. SVR attains the highest accuracy without feature reduction while MR achieves good results when feature reduction is carried out. Sediment property maps are successfully obtained using the models and hyperspectral imagery where SVR used with all bands achieves the best performance. The study also involves the extraction of weights identifying the contribution of each band of the images in the quantification of each sediment property when MR and principal component analysis are used.

  1. A regression approach to the mapping of bio-physical characteristics of surface sediment using in situ and airborne hyperspectral acquisitions

    Science.gov (United States)

    Ibrahim, Elsy; Kim, Wonkook; Crawford, Melba; Monbaliu, Jaak

    2017-02-01

    Remote sensing has been successfully utilized to distinguish and quantify sediment properties in the intertidal environment. Classification approaches of imagery are popular and powerful yet can lead to site- and case-specific results. Such specificity creates challenges for temporal studies. Thus, this paper investigates the use of regression models to quantify sediment properties instead of classifying them. Two regression approaches, namely multiple regression (MR) and support vector regression (SVR), are used in this study for the retrieval of bio-physical variables of intertidal surface sediment of the IJzermonding, a Belgian nature reserve. In the regression analysis, mud content, chlorophyll a concentration, organic matter content, and soil moisture are estimated using radiometric variables of two airborne sensors, namely airborne hyperspectral sensor (AHS) and airborne prism experiment (APEX), and using field hyperspectral acquisitions by analytical spectral device (ASD). The performance of the two regression approaches is best for the estimation of moisture content. SVR attains the highest accuracy without feature reduction while MR achieves good results when feature reduction is carried out. Sediment property maps are successfully obtained using the models and hyperspectral imagery where SVR used with all bands achieves the best performance. The study also involves the extraction of weights identifying the contribution of each band of the images in the quantification of each sediment property when MR and principal component analysis are used.

  2. Comparison of two regression-based approaches for determining nutrient and sediment fluxes and trends in the Chesapeake Bay watershed

    Science.gov (United States)

    Moyer, Douglas; Hirsch, Robert M.; Hyer, Kenneth

    2012-01-01

    Nutrient and sediment fluxes and changes in fluxes over time are key indicators that water resource managers can use to assess the progress being made in improving the structure and function of the Chesapeake Bay ecosystem. The U.S. Geological Survey collects annual nutrient (nitrogen and phosphorus) and sediment flux data and computes trends that describe the extent to which water-quality conditions are changing within the major Chesapeake Bay tributaries. Two regression-based approaches were compared for estimating annual nutrient and sediment fluxes and for characterizing how these annual fluxes are changing over time. The two regression models compared are the traditionally used ESTIMATOR and the newly developed Weighted Regression on Time, Discharge, and Season (WRTDS). The model comparison focused on answering three questions: (1) What are the differences between the functional form and construction of each model? (2) Which model produces estimates of flux with the greatest accuracy and least amount of bias? (3) How different would the historical estimates of annual flux be if WRTDS had been used instead of ESTIMATOR? One additional point of comparison between the two models is how each model determines trends in annual flux once the year-to-year variations in discharge have been determined. All comparisons were made using total nitrogen, nitrate, total phosphorus, orthophosphorus, and suspended-sediment concentration data collected at the nine U.S. Geological Survey River Input Monitoring stations located on the Susquehanna, Potomac, James, Rappahannock, Appomattox, Pamunkey, Mattaponi, Patuxent, and Choptank Rivers in the Chesapeake Bay watershed. Two model characteristics that uniquely distinguish ESTIMATOR and WRTDS are the fundamental model form and the determination of model coefficients. ESTIMATOR and WRTDS both predict water-quality constituent concentration by developing a linear relation between the natural logarithm of observed constituent

  3. The integration of geophysical and enhanced Moderate Resolution Imaging Spectroradiometer Normalized Difference Vegetation Index data into a rule-based, piecewise regression-tree model to estimate cheatgrass beginning of spring growth

    Science.gov (United States)

    Boyte, Stephen P.; Wylie, Bruce K.; Major, Donald J.; Brown, Jesslyn F.

    2015-01-01

    Cheatgrass exhibits spatial and temporal phenological variability across the Great Basin as described by ecological models formed using remote sensing and other spatial data-sets. We developed a rule-based, piecewise regression-tree model trained on 99 points that used three data-sets – latitude, elevation, and start of season time based on remote sensing input data – to estimate cheatgrass beginning of spring growth (BOSG) in the northern Great Basin. The model was then applied to map the location and timing of cheatgrass spring growth for the entire area. The model was strong (R² = 0.85) and predicted an average cheatgrass BOSG across the study area of 29 March–4 April. Of early cheatgrass BOSG areas, 65% occurred at elevations below 1452 m. The highest proportion of cheatgrass BOSG occurred between mid-April and late May. Predicted cheatgrass BOSG in this study matched well with previous Great Basin cheatgrass green-up studies.

  4. Climatic signal from Pinus leucodermis axial resin ducts: a tree-ring time series approach

    OpenAIRE

    Antonio Saracino; Angelo Rita; Sergio Rossi; Laia Andreu-Hayles; G. Helle; Luigi Todaro

    2016-01-01

    Developing long-term chronologies of tree-ring anatomical features to evaluate climatic relationships within species might serve as an annual proxy to explore and elucidate the climatic drivers affecting xylem differentiation. Pinus leucodermis response to climate was examined by analyzing vertical xylem resin ducts in wood growing at high elevation in the Apennines of peninsular Southern Italy. Early- and latewood tree-ring resin duct chronologies, spanning the 1804–2010 time period, were co...

  5. A novel approach to internal crown characterization for coniferous tree species classification

    Science.gov (United States)

    Harikumar, A.; Bovolo, F.; Bruzzone, L.

    2016-10-01

    Knowledge about individual trees in a forest is highly beneficial in forest management. High-density, small-footprint, multi-return airborne Light Detection and Ranging (LiDAR) data can provide very accurate information about the structural properties of individual trees in forests. Every tree species has a unique set of crown structural characteristics that can be used for tree species classification. In this paper, we use both the internal and external crown structural information of a conifer tree crown, derived from a high-density, small-footprint, multi-return LiDAR data acquisition, for species classification. Considering the fact that branches are the major building blocks of a conifer tree crown, we obtain the internal crown structural information using a branch-level analysis. The structure of each conifer branch is represented using clusters in the LiDAR point cloud. We propose the joint use of k-means clustering and geometric shape fitting, on the LiDAR data projected onto a novel 3-dimensional space, to identify branch clusters. After mapping the identified clusters back to the original space, six internal geometric features are estimated using a branch-level analysis. The external crown characteristics are modeled by using six least correlated features based on cone fitting and convex hull. Species classification is performed using a sparse Support Vector Machines (sparse SVM) classifier.

  6. Sensitivity of Bovine Tuberculosis Surveillance in Wildlife in France: A Scenario Tree Approach.

    Directory of Open Access Journals (Sweden)

    Julie Rivière

    Bovine tuberculosis (bTB) is a common disease in cattle and wildlife, with an impact on animal and human health, and economic implications. Infected wild animals have been detected in some European countries, and bTB reservoirs in wildlife have been identified, potentially hindering the eradication of bTB from cattle populations. However, the surveillance of bTB in wildlife involves several practical difficulties and is not currently covered by EU legislation. We report here the first assessment of the sensitivity of the bTB surveillance system for free-ranging wildlife launched in France in 2011 (the Sylvatub system), based on scenario tree modelling. Three surveillance system components were identified: (i) passive scanning surveillance for hunted wild boar, red deer and roe deer, based on carcass examination, (ii) passive surveillance on animals found dead, moribund or with abnormal behaviour, for wild boar, red deer, roe deer and badger and (iii) active surveillance for wild boar and badger. The application of these three surveillance system components depends on the geographic risk of bTB infection in wildlife, which in turn depends on the prevalence of bTB in cattle. We estimated the effectiveness of the three components of the Sylvatub surveillance system quantitatively, for each species separately. Active surveillance and passive scanning surveillance by carcass examination were the approaches most likely to detect at least one infected animal in a population with a given design prevalence, regardless of the local risk level and species considered. The awareness of hunters, which depends on their training and the geographic risk, was found to affect surveillance sensitivity. The results obtained are relevant for hunters and veterinary authorities wishing to determine the actual efficacy of wildlife bTB surveillance as a function of geographic area and species, and could provide support for decision-making processes concerning the enhancement

  7. Real Options in Defense R and D: A Decision Tree Analysis Approach for Options to Defer, Abandon, and Expand

    Science.gov (United States)

    2016-12-01

    1995). The options approach to capital investment. Harvard Business Review, 73(3), 105–15. Retrieved from https://hbr.org Ehrhardt, M. C., & Brigham...options. Financial Management, 22(3), 259–270. doi:10.2307/3665943 Kester, W. C. (1984). Today’s options for tomorrow’s growth. Harvard Business Review...Getting started on the numbers. Harvard Business Review, 76(4), 51–67. Magee, J. F. (1964a). Decision trees for decision making. Harvard Business

  8. Does the Magnitude of the Link between Unemployment and Crime Depend on the Crime Level? A Quantile Regression Approach

    Directory of Open Access Journals (Sweden)

    Horst Entorf

    2015-07-01

    Two alternative hypotheses – referred to as opportunity- and stigma-based behavior – suggest that the magnitude of the link between unemployment and crime also depends on preexisting local crime levels. In order to analyze conjectured nonlinearities between both variables, we use quantile regressions applied to German district panel data. While both conventional OLS and quantile regressions confirm the positive link between unemployment and crime for property crimes, results for assault differ with respect to the method of estimation. Whereas conventional mean regressions do not show any significant effect (which would confirm the usual result found for violent crimes in the literature), quantile regression reveals that the size and importance of the relationship are conditional on the crime rate. The partial effect is significantly positive for moderately low and median quantiles of local assault rates.
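
    The contrast between mean and quantile regression drawn in this abstract can be illustrated with a minimal sketch. It is not the authors' estimation code: it fits only a constant (no covariates), the data are made up, and the names `pinball_loss` and `best_constant` are illustrative. Quantile regression minimizes the asymmetric "pinball" loss; for a constant model the minimizer is the empirical quantile, which responds to the level of the outcome differently from the mean.

```python
def pinball_loss(residual, tau):
    """Asymmetric absolute loss minimized by the tau-th quantile."""
    return tau * residual if residual >= 0 else (tau - 1.0) * residual


def best_constant(ys, tau):
    """Pick the observed value that minimizes total pinball loss.

    For a constant-only model this is an empirical tau-quantile of ys.
    """
    return min(ys, key=lambda c: sum(pinball_loss(y - c, tau) for y in ys))


# Hypothetical outcome values with one extreme observation.
rates = [1.0, 2.0, 3.0, 4.0, 100.0]

median_fit = best_constant(rates, 0.5)   # robust to the extreme value
upper_fit = best_constant(rates, 0.9)    # tracks the upper tail
mean_fit = sum(rates) / len(rates)       # what a mean (OLS) fit targets
```

    With covariates, the same loss is minimized over regression coefficients instead of a single constant, which is how the effect of a predictor can differ between low and high quantiles of the outcome.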

  9. A critical assessment of shrinkage-based regression approaches for estimating the adverse health effects of multiple air pollutants

    Science.gov (United States)

    Roberts, Steven; Martin, Michael

    Most investigations of the adverse health effects of multiple air pollutants analyse the time series involved by simultaneously entering the multiple pollutants into a Poisson log-linear model. Concerns have been raised about this type of analysis, and it has been stated that new methodology or models should be developed for investigating the adverse health effects of multiple air pollutants. In this paper, we introduce the use of the lasso for this purpose and compare its statistical properties to those of ridge regression and the Poisson log-linear model. Ridge regression has been used in time series analyses on the adverse health effects of multiple air pollutants but its properties for this purpose have not been investigated. A series of simulation studies was used to compare the performance of the lasso, ridge regression, and the Poisson log-linear model. In these simulations, realistic mortality time series were generated with known air pollution mortality effects permitting the performance of the three models to be compared. Both the lasso and ridge regression produced more accurate estimates of the adverse health effects of the multiple air pollutants than those produced using the Poisson log-linear model. This increase in accuracy came at the expense of increased bias. Ridge regression produced more accurate estimates than the lasso, but the lasso produced more interpretable models. The lasso and ridge regression offer a flexible way of obtaining more accurate estimation of pollutant effects than that provided by the standard Poisson log-linear model.
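
    The accuracy-versus-interpretability trade-off between ridge regression and the lasso described above can be seen in a simplified setting. The sketch below assumes an orthonormal design and a squared-error loss (not the Poisson log-linear model used in the paper), with penalties (λ/2)‖β‖² for ridge and λ‖β‖₁ for the lasso; the coefficient values and function names are illustrative. Under those assumptions each estimate has a closed form in terms of the unpenalized coefficient.

```python
def ridge_shrink(beta_ols, lam):
    """Ridge estimate for one coefficient: proportional shrinkage."""
    return beta_ols / (1.0 + lam)


def lasso_shrink(beta_ols, lam):
    """Lasso estimate for one coefficient: soft-thresholding at lam."""
    magnitude = max(abs(beta_ols) - lam, 0.0)
    if magnitude == 0.0:
        return 0.0  # small coefficients are set exactly to zero
    return magnitude if beta_ols > 0 else -magnitude


# Hypothetical unpenalized effects for four pollutants.
ols = [2.0, -1.2, 0.3, -0.1]
lam = 0.5

ridge = [ridge_shrink(b, lam) for b in ols]  # all shrunk, none zero
lasso = [lasso_shrink(b, lam) for b in ols]  # weak effects dropped
```

    Ridge keeps every pollutant in the model with a reduced coefficient, while the lasso removes the weakest ones entirely, which is one reason its fitted models are easier to interpret.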

  10. The Nuisance of Nuisance Regression: Spectral Misspecification in a Common Approach to Resting-State fMRI Preprocessing Reintroduces Noise and Obscures Functional Connectivity

    Science.gov (United States)

    Hallquist, Michael N.; Hwang, Kai; Luna, Beatriz

    2013-01-01

    Recent resting-state functional connectivity fMRI (RS-fcMRI) research has demonstrated that head motion during fMRI acquisition systematically influences connectivity estimates despite bandpass filtering and nuisance regression, which are intended to reduce such nuisance variability. We provide evidence that the effects of head motion and other nuisance signals are poorly controlled when the fMRI time series are bandpass-filtered but the regressors are unfiltered, resulting in the inadvertent reintroduction of nuisance-related variation into frequencies previously suppressed by the bandpass filter, as well as suboptimal correction for noise signals in the frequencies of interest. This is important because many RS-fcMRI studies, including some focusing on motion-related artifacts, have applied this approach. In two cohorts of individuals (n = 117 and 22) who completed resting-state fMRI scans, we found that the bandpass-regress approach consistently overestimated functional connectivity across the brain, typically on the order of r = .10 – .35, relative to a simultaneous bandpass filtering and nuisance regression approach. Inflated correlations under the bandpass-regress approach were associated with head motion and cardiac artifacts. Furthermore, distance-related differences in the association of head motion and connectivity estimates were much weaker for the simultaneous filtering approach. We recommend that future RS-fcMRI studies ensure that the frequencies of nuisance regressors and fMRI data match prior to nuisance regression, and we advocate a simultaneous bandpass filtering and nuisance regression strategy that better controls nuisance-related variability. PMID:23747457

  11. Dendroclimatic reconstruction with time varying predictor subsets of tree indices

    Energy Technology Data Exchange (ETDEWEB)

    Meko, D. [Univ. of Arizona, Tucson, AZ (United States)]

    1997-04-01

    Tree-ring site chronologies, the predictors for most dendroclimatic reconstructions, are essentially mean-value functions with a time varying sample size (number of trees) and sample composition. Because reconstruction models are calibrated and verified on the most recent, best-replicated part of the chronologies, regression and verification statistics can be misleading as indicators of long-term reconstruction accuracy. A new reconstruction method is described that circumvents the use of site chronologies and instead derives predictor variables from indices of individual trees. Separate regression models are estimated and cross validated for various time segments of the tree-ring record, depending on the trees available at the time. This approach allows the reconstruction to extend to the first year covered by any tree in the network and yields direct evaluation of the change in reconstruction accuracy with tree-ring sample composition. The method includes two regression stages. The first is to separately deconvolve the local climate signal for individual trees, and the second is to weight the deconvolved signals into estimates of the climatic variable to be reconstructed. The method is illustrated in an application of precipitation and tree-ring data for the San Pedro River Basin in southeastern Arizona. Extensions to larger-scale problems and spatial reconstruction are suggested. 17 refs., 4 figs., 4 tabs.

  12. Supporting Frequent Updates in R-Trees: A Bottom-Up Approach

    DEFF Research Database (Denmark)

    Lee, Mong Li; Hsu, Wynne; Jensen, Christian Søndergaard

    2004-01-01

    Advances in hardware-related technologies promise to enable new data management applications that monitor continuous processes. In these applications, enormous amounts of state samples are obtained via sensors and are streamed to a database. Further, updates are very frequent and may exhibit... locality. While the R-tree is the index of choice for multi-dimensional data with low dimensionality, and is thus relevant to these applications, R-tree updates are also relatively inefficient. We present a bottom-up update strategy for R-trees that generalizes existing update techniques and aims... Empirical studies indicate that the bottom-up strategy outperforms the traditional top-down technique, leads to indices with better query performance, achieves higher throughput, and is scalable.

  13. Morse–Smale Regression

    Energy Technology Data Exchange (ETDEWEB)

    Gerber, Samuel [Univ. of Utah, Salt Lake City, UT (United States)]; Rubel, Oliver [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)]; Bremer, Peer-Timo [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)]; Pascucci, Valerio [Univ. of Utah, Salt Lake City, UT (United States)]; Whitaker, Ross T. [Univ. of Utah, Salt Lake City, UT (United States)]

    2012-01-19

    This paper introduces a novel partition-based regression approach that incorporates topological information. Partition-based regression typically introduces a quality-of-fit-driven decomposition of the domain. The emphasis in this work is on a topologically meaningful segmentation. Thus, the proposed regression approach is based on a segmentation induced by a discrete approximation of the Morse–Smale complex. This yields a segmentation with partitions corresponding to regions of the function with a single minimum and maximum that are often well approximated by a linear model. This approach yields regression models that are amenable to interpretation and have good predictive capacity. Typically, regression estimates are quantified by their geometrical accuracy. For the proposed regression, an important aspect is the quality of the segmentation itself. Thus, this article introduces a new criterion that measures the topological accuracy of the estimate. The topological accuracy provides a complementary measure to the classical geometrical error measures and is very sensitive to overfitting. The Morse–Smale regression is compared to state-of-the-art approaches in terms of geometry and topology and yields comparable or improved fits in many cases. Finally, a detailed study on climate-simulation data demonstrates the application of the Morse–Smale regression. Supplementary Materials are available online and contain an implementation of the proposed approach in the R package msr, an analysis and simulations on the stability of the Morse–Smale complex approximation, and additional tables for the climate-simulation study.

  14. A DATA MINING APPROACH TO PREDICT PROSPECTIVE BUSINESS SECTORS FOR LENDING IN RETAIL BANKING USING DECISION TREE

    Directory of Open Access Journals (Sweden)

    Md. Rafiqul Islam

    2015-03-01

    A potential objective of every financial organization is to retain existing customers and attain new prospective customers for the long term. The economic behaviour of the customer and the nature of the organization are controlled by a prescribed form called Know Your Customer (KYC) in manual banking. Depositor customers in some sectors (business of jewellery/gold, arms, money exchangers, etc.) are high risk, whereas those in some sectors (transport operators, auto dealers, religious) are medium risk, and those in the remaining sectors (retail, corporate, service, farmers, etc.) are low risk. Presently, credit risk for a counterparty can be broadly categorized under quantitative and qualitative factors. Although there are many existing systems for customer retention as well as customer attrition in banks, these rigorous methods lack a clear and defined approach to disbursing loans in the business sector. In the paper, we have used records of business customers of a retail commercial bank in Tangail city, Bangladesh, including rural and urban areas, to analyse the major transactional determinants of customers and to build a model predicting prospective sectors in retail banking. To achieve this, a data mining approach is adopted for analysing the challenging issues, where a pruned decision tree classification technique has been used to develop the model, whose performance was finally tested against Weka results. Moreover, this paper attempts to build up a model to predict prospective business sectors in retail banking. KEYWORDS Data Mining, Decision Tree, Tree Pruning, Prospective Business Sector, Customer,

  15. A regression approach for estimation of anthropogenic heat flux based on a bottom-up air pollutant emission database

    Science.gov (United States)

    Lee, Sang-Hyun; McKeen, Stuart A.; Sailor, David J.

    2014-10-01

    A statistical regression method is presented for estimating hourly anthropogenic heat flux (AHF) using an anthropogenic pollutant emission inventory for use in mesoscale meteorological and air-quality modeling. Based on bottom-up AHF estimated from detailed energy consumption data and anthropogenic pollutant emissions of carbon monoxide (CO) and nitrogen oxides (NOx) in the US National Emission Inventory year 2005 (NEI-2005), a robust regression relation between the AHF and the pollutant emissions is obtained for Houston. This relation is a combination of two power functions (Y = aX^b) relating CO and NOx emissions to AHF, giving a coefficient of determination (R²) of 0.72. The AHF for Houston derived from the regression relation has high temporal (R = 0.91) and spatial (R = 0.83) correlations with the bottom-up AHF. Hourly AHF for the whole US in summer is estimated by applying the regression relation to the NEI-2005 summer pollutant emissions with a high spatial resolution of 4 km. The summer daily mean AHF ranges from 10 to 40 W m⁻² on a 4 × 4 km² grid scale, with maximum heat fluxes of 50-140 W m⁻² for major US cities. The AHFs derived from the regression relations between the bottom-up AHF and either CO or NOx emissions show a small difference of less than 5% (4.7 W m⁻²) in city-scale daily mean AHF, and similar R² statistics, compared to results from their combination. Thus, emissions of either species can be used to estimate AHF in US cities. An hourly AHF inventory at 4 × 4 km² resolution over the entire US based on the combined regression is derived and made publicly available for use in mesoscale numerical modeling.
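
    A single power function of the form Y = aX^b, as used in this abstract, can be fitted by linear least squares after taking logarithms, since ln Y = ln a + b ln X. The sketch below shows that step only. It is an assumption-laden illustration, not the paper's procedure: the paper describes a robust regression combining two power functions, whereas this uses ordinary least squares on one predictor, and the function name and sample values are hypothetical.

```python
import math


def fit_power_law(xs, ys):
    """Fit Y = a * X**b by ordinary least squares on (ln X, ln Y).

    Requires strictly positive xs and ys. Returns (a, b).
    """
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    b = sxy / sxx                       # slope in log-log space
    a = math.exp(mean_y - b * mean_x)   # intercept mapped back
    return a, b


# Hypothetical emissions (X) and heat flux (Y) generated from a known law.
emissions = [1.0, 2.0, 4.0, 8.0]
heat_flux = [3.0 * x ** 0.5 for x in emissions]
a_hat, b_hat = fit_power_law(emissions, heat_flux)
```

    Fitting in log space weights relative rather than absolute errors, which may or may not match the error structure assumed in the original study.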

  16. Seeing beyond fertiliser trees : a case study of a community based participatory approach to agroforestry research and development in western Kenya

    NARCIS (Netherlands)

    Kiptot, E.

    2007-01-01

    Key words: village committee approach, agroforestry, improved tree fallows, biomass transfer, realist evaluation, soil fertility, adoption, dissemination.   The thesis explores and describes various processes that take place in the implementation of a community based participatory initiative known a

  17. Eliciting indigenous knowledge on tree fodder among Maasai pastoralists via a multi-method sequencing approach

    NARCIS (Netherlands)

    Kiptot, E.

    2007-01-01

    Although the potential of indigenous knowledge in sustainable natural resource management has been recognized, methods of gathering and utilizing it effectively are still being developed and tested. This paper focuses on various methods used in gathering knowledge on the use and management of tree f

  19. Reverse Query Tree approach to cope with Id distribution problem in Tree-based tag anti-collision protocols of RFID

    Directory of Open Access Journals (Sweden)

    Milad HajMirzaei

    2013-09-01

    Tag collision is one of the most important issues in RFID systems, and many tag anti-collision protocols have been proposed in the literature. However, some of these protocols, such as tree-based protocols (specifically Query tree), whose performance depends on tag ID length and construction, have issues such as ID distribution. In this paper we discuss the Query tree protocol, which may be influenced by ID distribution. We then propose a novel variant of this protocol, called Reverse Query tree, to solve this problem.
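
    The sensitivity of the basic Query tree protocol to ID distribution can be shown with a small simulation. This is a minimal sketch of the textbook Query tree scheme under simplifying assumptions (unique fixed-length binary IDs, ideal collision detection); the function name and tag IDs are illustrative. The final line reverses each ID before querying, which is only one plausible reading of "reverse" used here for illustration and is not taken from the paper.

```python
from collections import deque


def query_tree(tag_ids):
    """Simulate a reader running the Query tree protocol.

    The reader broadcasts a bit prefix; every tag whose ID starts with
    that prefix replies. One reply identifies a tag, several replies
    collide and the prefix is extended by 0 and by 1.
    Returns (identified IDs, number of reader queries).
    """
    pending = deque([""])
    identified = []
    queries = 0
    while pending:
        prefix = pending.popleft()
        queries += 1
        responders = [t for t in tag_ids if t.startswith(prefix)]
        if len(responders) == 1:
            identified.append(responders[0])
        elif len(responders) > 1:
            pending.append(prefix + "0")
            pending.append(prefix + "1")
    return identified, queries


# IDs that share a long common prefix versus IDs that differ early.
clustered = ["0000", "0001", "0010", "0011"]
spread = ["0000", "0100", "1000", "1100"]

_, clustered_queries = query_tree(clustered)
_, spread_queries = query_tree(spread)

# Hypothetical illustration: read the clustered IDs from the other end.
_, reversed_queries = query_tree([t[::-1] for t in clustered])
```

    In this toy case the clustered IDs need 11 queries and the spread IDs need 7, because every shared leading bit costs extra collision rounds; reading the clustered IDs from their varying end brings the count back to 7.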

  20. The suitability of the dual isotope approach (δ13C and δ18O) in tree ring studies

    Science.gov (United States)

    Siegwolf, Rolf; Saurer, Matthias

    2016-04-01

    The use of stable isotopes, complementary to tree ring width data in tree ring research, has proven to be a powerful tool in studying the impact of environmental parameters on tree physiology and growth. These three proxies are thus instrumental for climate reconstruction and improve the understanding of underlying causes of growth changes. In various cases, however, their use suggests non-plausible interpretations. Often the use of one isotope alone does not allow the detection of such "erroneous isotope responses". A careful analysis of these deviating results shows that either the validity of the carbon isotope discrimination concept is no longer true (Farquhar et al., 1982) or the assumptions for the leaf water enrichment model (Cernusak et al., 2003) are violated and thus both fractionation models are not applicable. In this presentation we discuss such cases when the known fractionation concepts fail and do not allow a correct interpretation of the isotope data. With the help of the dual isotope approach (Scheidegger et al., 2000) it is demonstrated how to detect and uncover the causes for such anomalous isotope data. The fractionation concepts and their combinations against the background of CO2 and H2O gas exchange are briefly explained, and the specific use of the dual isotope approach for tree ring data analyses and interpretations is demonstrated. References: Cernusak, L. A., Arthur, D. J., Pate, J. S. and Farquhar, G. D.: Water relations link carbon and oxygen isotope discrimination to phloem sap sugar concentration in Eucalyptus globulus, Plant Physiol., 131, 1544-1554, 2003. Farquhar, G. D., O'Leary, M. H. and Berry, J. A.: On the relationship between carbon isotope discrimination and the intercellular carbon dioxide concentration in leaves, Aust. J. Plant Physiol., 9, 121-137, 1982. Scheidegger, Y., Saurer, M., Bahn, M. and Siegwolf, R.: Linking stable oxygen and carbon isotopes with stomatal conductance and photosynthetic capacity: A conceptual model

  1. Anti-collision algorithm for adaptive multi-branch tree based on regressive-style search

    Institute of Scientific and Technical Information of China (English)

    孙文胜; 胡玲敏

    2011-01-01

    Concerning the common problem of tag collision in Radio Frequency Identification (RFID) systems, an improved anti-collision algorithm for multi-branch trees was proposed based on the regressive-style search algorithm. According to the characteristics of tag collisions, the presented algorithm adopted dormancy counting and took a quad tree structure when continuous collisions appeared, which gave it the ability to choose the number of forks dynamically during the searching process, reduced the search range and improved the identification efficiency. The performance analysis results show that the system efficiency of the proposed algorithm is about 76.5%; moreover, as the number of tags increases, the performance advantage becomes more obvious.

  2. A Quantile Regression Approach to Understanding the Relations among Morphological Awareness, Vocabulary, and Reading Comprehension in Adult Basic Education Students

    Science.gov (United States)

    Tighe, Elizabeth L.; Schatschneider, Christopher

    2016-01-01

    The purpose of this study was to investigate the joint and unique contributions of morphological awareness and vocabulary knowledge at five reading comprehension levels in adult basic education (ABE) students. We introduce the statistical technique of multiple quantile regression, which enabled us to assess the predictive utility of morphological…

  3. A comparative study between nonlinear regression and artificial neural network approaches for modelling wild oat (Avena fatua) field emergence

    Science.gov (United States)

    Non-linear regression techniques are used widely to fit weed field emergence patterns to soil microclimatic indices using S-type functions. Artificial neural networks present interesting and alternative features for such modeling purposes. In this work, a univariate hydrothermal-time based Weibull m...

  4. Predicting attention-deficit/hyperactivity disorder severity from psychosocial stress and stress-response genes : a random forest regression approach

    NARCIS (Netherlands)

    van der Meer, D.; Hoekstra, P. J.; van Donkelaar, Marjolein M. J.; Bralten, Janita; Oosterlaan, J; Heslenfeld, Dirk J.; Faraone, S. V.; Franke, B.; Buitelaar, J. K.; Hartman, C. A.

    2017-01-01

    Identifying genetic variants contributing to attention-deficit/hyperactivity disorder (ADHD) is complicated by the involvement of numerous common genetic variants with small effects, interacting with each other as well as with environmental factors, such as stress exposure. Random forest regression

  5. Predicting attention-deficit/hyperactivity disorder severity from psychosocial stress and stress-response genes: a random forest regression approach

    NARCIS (Netherlands)

    Meer, D. van der; Hoekstra, P.J.; Donkelaar, M.M.J. van; Bralten, J.B.; Oosterlaan, J.; Heslenfeld, D.; Faraone, S.V; Franke, B.; Buitelaar, J.K.; Hartman, C.A.

    2017-01-01

    Identifying genetic variants contributing to attention-deficit/hyperactivity disorder (ADHD) is complicated by the involvement of numerous common genetic variants with small effects, interacting with each other as well as with environmental factors, such as stress exposure. Random forest regression

  6. A Novel Approach for Retrieving Tree Leaf Area from Ground-Based LiDAR

    Directory of Open Access Journals (Sweden)

    Ting Yun

    2016-11-01

    Leaf area is an important plant canopy structure parameter with important ecological significance. Light detection and ranging (LiDAR) technology with the application of a terrestrial laser scanner (TLS) is an appealing method for accurately estimating leaf area; however, the actual utility of this scanner depends largely on the efficacy of point cloud data (PCD) analysis. In this paper, we present a novel method for quantifying total leaf area within each tree canopy from PCD. Firstly, the shape, normal vector distribution and structure tensor of PCD features were combined with the semi-supervised support vector machine (SVM) method to separate various tree organs, i.e., branches and leaves. In addition, the moving least squares (MLS) method was adopted to remove ghost points caused by the shaking of leaves in the wind during the scanning process. Secondly, each target tree was scanned using two patterns, i.e., one scan and three scans around the canopy, to reduce the occlusion effect. Specific layer subdivision strategies according to the acquisition ranges of the scanners were designed to separate the canopy into several layers. Thirdly, 10% of the PCD was randomly chosen as an analytic dataset (ADS). For the ADS, an innovative triangulation algorithm with an assembly threshold was designed to transform these discrete scanning points into leaf surfaces and estimate the fractions of each foliage surface covered by the laser pulses. Then, a novel ratio of the point number to leaf area in each layer was defined and combined with the total number of scanned points to retrieve the total area of the leaves in the canopy. The quantified total leaf area of each tree was validated using laborious measurements with a LAI-2200 Plant Canopy Analyser and an LI-3000C Portable Area Meter. The results showed that the individual tree leaf area was accurately reproduced using our method from three registered scans, with a relative deviation of less than 10

  7. CDT coupled to dimer matter: An analytical approach via tree bijections

    CERN Document Server

    Atkin, Max R

    2012-01-01

    We review a recently obtained analytical solution of a restricted so-called hard dimers model coupled to two-dimensional CDT. The combinatorial solution is obtained via bijections of causal triangulations with dimers and decorated trees. We show that the scaling limit of this model can also be obtained from a multi-critical point of the transfer matrix for dynamical triangulations of triangles and squares when one disallows for spatial topology changes to occur.

  8. Tropical dendrochemistry: A novel approach for reconstructing seasonally-resolved growth rates from ringless tropical trees

    Science.gov (United States)

    Poussart, P. M.; Myneni, S. C.

    2005-12-01

    Although tropical forests play an active role in the global carbon cycle and are host to a variety of pristine paleoclimate archives, they remain poorly characterized as compared to other ecosystems on the planet. In particular, dating and reconstructing the growth rate history of tropical trees remains a challenge and continues to delay research efforts towards understanding tropical forest dynamics. Traditional dendrochronological techniques have found limited applications in the tropics because temperature seasonality is often too small to initiate the production of visible annual growth rings. Dendrometers, cambium scarring methods and sub-annual records of oxygen and carbon isotopes from tree cellulose may be used to estimate growth rate histories when growth rings are absent. However, dendrometer records rarely extend beyond the past couple of decades and the generation of seasonally-resolved isotopic records remains labour intensive, currently prohibiting the level of record replication necessary for statistical analysis. Here, we present evidence that Ca may also be used as a proxy for dating and reconstructing growth rates of trees lacking visible growth rings. Using the Brookhaven National Lab Synchrotron, we recover a radial record of cyclic variations in Ca from a Miliusa velutina tree from northern Thailand. We determine that the Ca cycles are seasonal based on a comparison between radiocarbon age estimates and a trace element age model, which agree within 2 years over the period of 1955 to 2000. The amplitude of the Ca annual cycle is significantly correlated with growth rate estimates, which are also correlated to the amount of dry season rainfall. The measurements at the Synchrotron are fast, non-destructive and require little sample preparation. Application of this technique in the tropics holds the potential to resolve longstanding questions about tropical forest dynamics and interannual to decadal changes in the carbon cycle.

  9. Decision tree approach to evaluating inactive uranium processing sites for liner requirements

    Energy Technology Data Exchange (ETDEWEB)

    Relyea, J.F.

    1983-03-01

    Recently, concern has been expressed about potential toxic effects of both radon emission and release of toxic elements in leachate from inactive uranium mill tailings piles. Remedial action may be required to meet disposal standards set by the states and the US Environmental Protection Agency (EPA). In some cases, a possible disposal option is the exhumation and reburial (either on site or at a new location) of tailings and reliance on engineered barriers to satisfy the objectives established for remedial actions. Liners under disposal pits are the major engineered barrier for preventing contaminant release to ground and surface water. The purpose of this report is to provide a logical sequence of action, in the form of a decision tree, which could be followed to show whether a selected tailings disposal design meets the objectives for subsurface contaminant release without a liner. This information can be used to determine the need for and type of liner for sites exhibiting a potential groundwater problem. The decision tree is based on the capability of hydrologic and mass transport models to predict the movement of water and contaminants with time. The types of modeling capabilities and data needed for those models are described, and the steps required to predict water and contaminant movement are discussed. A demonstration of the decision tree procedure is given to aid the reader in evaluating the need for and the adequacy of a liner.

  10. Carbon footprint of forest and tree utilization technologies in life cycle approach

    Science.gov (United States)

    Polgár, András; Pécsinger, Judit

    2017-04-01

    In our research project, a suitable method has been developed relating to the technological aspect of the environmental assessment of land use changes caused by climate change. We prepared an eco-balance (environmental inventory) for classifying environmental effects in a life-cycle approach, in connection with typical agricultural, forest and tree utilization technologies. The use of balances and environmental classification makes it possible to compare land-use technologies and their environmental effects per common functional unit. To test our environmental analysis model, we carried out surveys in a sample of forest stands. We set up an eco-balance of the working systems of intermediate cutting and final harvest in stands of beech, oak, spruce, acacia, poplar and short rotation energy plantations (willow, poplar). We set up the life-cycle plan of the surveyed working systems using the GaBi 6.0 Professional software and carried out midpoint and endpoint impact assessment. From the results, we applied the values of CML 2001 - Global Warming Potential (GWP 100 years) [kg CO2-Equiv.] and Eco-Indicator 99 - Human health, Climate Change [DALY]. On the basis of these values we set up a ranking of the technologies and thereby obtained their environmental impact classification based on carbon footprint. The working systems had their greatest impact on global warming (GWP 100 years) throughout their whole life cycle. This is explained by the amount of carbon dioxide released to the atmosphere from the fuel used by the technologies. Abiotic depletion (ADP foss) and marine aquatic ecotoxicity (MAETP) also emerged as significant impact categories. These impact categories can be explained by the share of fuel and lubricant inputs. On the basis of the most significant environmental impact category (carbon footprint), we determined the relative life cycle contribution and ranking of each technology.
The technological life cycle stages examined

  11. Seasonal river discharge forecast in alpine catchments using snow map time series and support vector regression approach

    OpenAIRE

    Callegari, Mattia; Mazzoli, Paolo; Gregorio, Ludovica de; Notarnicola, Claudia; Petitta, Marcello; Pasolli, Luca; Seppi, Roberto; Pistocchi, Alberto

    2014-01-01

    The prediction of monthly mean discharge is critical for water resources management. Statistical methods applied on discharge time series are traditionally used for predicting this kind of slow response hydrological events. With this paper we present a Support Vector Regression (SVR) system able to predict monthly mean discharge considering discharge and snow cover extent (250 meters resolution obtained by MODIS images) time series as input. Additional meteorological and climatic variables ar...

  12. Effects of the Interest Rate and Reserve Requirement Ratio on Bank Risk in China: A Panel Smooth Transition Regression Approach

    OpenAIRE

    Zhongyuan Geng; Xue Zhai

    2015-01-01

    This paper applies the Panel Smooth Transition Regression (PSTR) model to simulate the effects of the interest rate and reserve requirement ratio on bank risk in China. The results reveal the nonlinearity embedded in the interest rate, reserve requirement ratio, and bank risk nexus. Both the interest rate and reserve requirement ratio exert a positive impact on bank risk for the low regime and a negative impact for the high regime. The interest rate performs a significant effect while the res...

  14. Autistic epileptiform regression.

    Science.gov (United States)

    Canitano, Roberto; Zappella, Michele

    2006-01-01

    Autistic regression is a well known condition that occurs in one third of children with pervasive developmental disorders, who, after normal development in the first year of life, undergo a global regression during the second year that encompasses language, social skills and play. In a portion of these subjects, epileptiform abnormalities are present with or without seizures, resembling, in some respects, other epileptiform regressions of language and behaviour such as Landau-Kleffner syndrome. In these cases, for a more accurate definition of the clinical entity, the term autistic epileptiform regression has been suggested. As in other epileptic syndromes with regression, the relationships between EEG abnormalities, language and behaviour, in autism, are still unclear. We describe two cases of autistic epileptiform regression selected from a larger group of children with autistic spectrum disorders, with the aim of discussing the clinical features of the condition, the therapeutic approach and the outcome.

  15. A Systematic Approach for Dynamic Security Assessment and the Corresponding Preventive Control Scheme Based on Decision Trees

    DEFF Research Database (Denmark)

    Liu, Leo; Sun, Kai; Rather, Zakir Hussain

    2014-01-01

    This paper proposes a decision tree (DT)-based systematic approach for cooperative online power system dynamic security assessment (DSA) and preventive control. This approach adopts a new methodology that trains two contingency-oriented DTs on a daily basis by the databases generated from power system simulations. Fed with real-time wide-area measurements, one DT of measurable variables is employed for online DSA to identify potential security issues, and the other DT of controllable variables provides online decision support on preventive control strategies against those issues. A cost-effective algorithm is adopted in this proposed approach to optimize the trajectory of preventive control. The paper also proposes an importance sampling algorithm on database preparation for efficient DT training for power systems with high penetration of wind power and distributed generation. The performance...

  16. Autistic Regression

    Science.gov (United States)

    Matson, Johnny L.; Kozlowski, Alison M.

    2010-01-01

    Autistic regression is one of the many mysteries in the developmental course of autism and pervasive developmental disorders not otherwise specified (PDD-NOS). Various definitions of this phenomenon have been used, further clouding the study of the topic. Despite this problem, some efforts at establishing prevalence have been made. The purpose of…

  17. Logistic regression.

    Science.gov (United States)

    Nick, Todd G; Campbell, Kathleen M

    2007-01-01

    The Medical Subject Headings (MeSH) thesaurus used by the National Library of Medicine defines logistic regression models as "statistical models which describe the relationship between a qualitative dependent variable (that is, one which can take only certain discrete values, such as the presence or absence of a disease) and an independent variable." Logistic regression models are used to study effects of predictor variables on categorical outcomes and normally the outcome is binary, such as presence or absence of disease (e.g., non-Hodgkin's lymphoma), in which case the model is called a binary logistic model. When there are multiple predictors (e.g., risk factors and treatments) the model is referred to as a multiple or multivariable logistic regression model and is one of the most frequently used statistical models in medical journals. In this chapter, we examine both simple and multiple binary logistic regression models and present related issues, including interaction, categorical predictor variables, continuous predictor variables, and goodness of fit.
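
The binary logistic model described in this abstract can be illustrated with a minimal sketch. The intercept, coefficients and predictor names below are made up for illustration and are not taken from the chapter; the sketch only shows how a linear predictor is mapped to a probability and how a coefficient is read as an odds ratio.

```python
import math


def logistic(z):
    """Map a linear predictor z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(intercept, coefficients, predictors):
    """Probability of the outcome under a multiple binary logistic model."""
    z = intercept + sum(b * x for b, x in zip(coefficients, predictors))
    return logistic(z)


def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)


def odds_ratio(coefficient):
    """Odds ratio for a one-unit increase in a predictor, others held fixed."""
    return math.exp(coefficient)


# Hypothetical model: a binary exposure and age in years (illustrative values).
intercept = -2.0
coefficients = [0.8, 0.05]

p_exposed = predict_probability(intercept, coefficients, [1, 40])
p_unexposed = predict_probability(intercept, coefficients, [0, 40])

print("P(disease | exposed)   =", round(p_exposed, 3))
print("P(disease | unexposed) =", round(p_unexposed, 3))
print("odds ratio for exposure =", round(odds_ratio(coefficients[0]), 3))
```

In this sketch the ratio of the two fitted odds equals the exponentiated exposure coefficient, which is why logistic regression coefficients are usually reported as odds ratios.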

  18. Linking Tree Growth Response to Measured Microclimate - A Field Based Approach

    Science.gov (United States)

    Martin, J. T.; Hoylman, Z. H.; Looker, N. T.; Jencso, K. G.; Hu, J.

    2015-12-01

    The general relationship between climate and tree growth is a well established and important tenet shaping both paleo and future perspectives of forest ecosystem growth dynamics. Across much of the American west, water limits growth via physiological mechanisms that tie regional and local climatic conditions to forest productivity in a relatively predictable way, and these growth responses are clearly evident in tree ring records. However, within the annual cycle of a forest landscape, water availability varies across both time and space, and interacts with other potentially growth limiting factors such as temperature, light, and nutrients. In addition, tree growth responses may lag climate drivers and may vary in terms of where in a tree carbon is allocated. As such, determining when and where water actually limits forest growth in real time can be a significant challenge. Despite these challenges, we present data suggestive of real-time growth limitation driven by soil moisture supply and atmospheric water demand reflected in high frequency field measurements of stem radii and cell structure across ecological gradients. The experiment was conducted at the Lubrecht Experimental Forest in western Montana where, over two years, we observed intra-annual growth rates of four dominant conifer species: Douglas fir, Ponderosa Pine, Engelmann Spruce and Western Larch using point dendrometers and microcores. In all four species studied, compensatory use of stored water (inferred from stem water deficit) appears to exhibit a threshold relationship with a critical balance point between water supply and demand. The occurrence of this point in time coincided with a decrease in stem growth rates, and while the timing varied up to one month across topographic and elevational gradients, the onset date of growth limitation was a reliable predictor of overall annual growth. Our findings support previous model-based observations of nonlinearity in the relationship between

  19. Data mining methods in the prediction of Dementia: A real-data comparison of the accuracy, sensitivity and specificity of linear discriminant analysis, logistic regression, neural networks, support vector machines, classification trees and random forests

    Directory of Open Access Journals (Sweden)

    Santana Isabel

    2011-08-01

    Full Text Available Abstract Background: Dementia and cognitive impairment associated with aging are a major medical and social concern. Neuropsychological testing is a key element in the diagnostic procedures of Mild Cognitive Impairment (MCI), but presently has limited value in the prediction of progression to dementia. We advance the hypothesis that newer statistical classification methods derived from data mining and machine learning, such as Neural Networks, Support Vector Machines and Random Forests, can improve accuracy, sensitivity and specificity of predictions obtained from neuropsychological testing. Seven nonparametric classifiers derived from data mining methods (Multilayer Perceptron Neural Networks, Radial Basis Function Neural Networks, Support Vector Machines, CART, CHAID and QUEST Classification Trees, and Random Forests) were compared to three traditional classifiers (Linear Discriminant Analysis, Quadratic Discriminant Analysis and Logistic Regression) in terms of overall classification accuracy, specificity, sensitivity, area under the ROC curve and Press' Q. Model predictors were 10 neuropsychological tests currently used in the diagnosis of dementia. Statistical distributions of classification parameters obtained from a 5-fold cross-validation were compared using Friedman's nonparametric test. Results: Press' Q test showed that all classifiers performed better than chance alone (p …). Conclusions: When taking into account sensitivity, specificity and overall classification accuracy, Random Forests and Linear Discriminant Analysis rank first among all the classifiers tested in prediction of dementia using several neuropsychological tests. These methods may be used to improve accuracy, sensitivity and specificity of dementia predictions from neuropsychological testing.

  20. Aspen Trees.

    Science.gov (United States)

    Canfield, Elaine

    2002-01-01

    Describes a fifth-grade art activity that offers a new approach to creating pictures of Aspen trees. Explains that the students learned about art concepts, such as line and balance, in this lesson. Discusses the process in detail for creating the pictures. (CMK)

  1. The Spatial Association Between Federally Qualified Health Centers and County-Level Reported Sexually Transmitted Infections: A Spatial Regression Approach.

    Science.gov (United States)

    Owusu-Edusei, Kwame; Gift, Thomas L; Leichliter, Jami S; Romaguera, Raul A

    2017-08-16

    The number of categorical sexually transmitted disease (STD) clinics is declining in the United States. Federally qualified health centers (FQHCs) have the potential to supplement the needed sexually transmitted infection (STI) services. In this study, we describe the spatial distribution of FQHC sites and determine if reported county-level nonviral STI morbidity was associated with having FQHC(s) using spatial regression techniques. We extracted map data from the Health Resources and Services Administration data warehouse on FQHCs (ie, geocoded health care service delivery [HCSD] sites) and extracted county-level data on the reported rates of chlamydia, gonorrhea, and primary and secondary (P&S) syphilis (2008-2012) from surveillance data. A 3-equation seemingly unrelated regression estimation procedure (with a spatial regression specification that controlled for county-level multiyear (2008-2012) demographic and socioeconomic factors) was used to determine the association between reported county-level STI morbidity and HCSD sites. Counties with HCSD sites had higher STI, poverty, unemployment, and violent crime rates than counties with no HCSD sites (P < 0.05). The number of HCSD sites was associated (P < 0.01) with increases in the temporally smoothed rates of chlamydia, gonorrhea, and P&S syphilis, but there was no significant association between the number of HCSD per 100,000 population and reported STI rates. There is a positive association between STI morbidity and the number of HCSD sites; however, this association does not exist when adjusting by population size. Further work may determine the extent to which HCSD sites can meet unmet needs for safety net STI services.

  2. Effects of the Interest Rate and Reserve Requirement Ratio on Bank Risk in China: A Panel Smooth Transition Regression Approach

    Directory of Open Access Journals (Sweden)

    Zhongyuan Geng

    2015-01-01

    Full Text Available This paper applies the Panel Smooth Transition Regression (PSTR) model to simulate the effects of the interest rate and reserve requirement ratio on bank risk in China. The results reveal the nonlinearity embedded in the interest rate, reserve requirement ratio, and bank risk nexus. Both the interest rate and reserve requirement ratio exert a positive impact on bank risk for the low regime and a negative impact for the high regime. The interest rate performs a significant effect while the reserve requirement ratio shows an insignificant effect on bank risk on a statistical basis for both the high and low regimes.
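
For orientation, the two-regime structure this abstract refers to can be written in the standard textbook form of a PSTR model with a single logistic transition function. The notation below is an assumption made for illustration (the abstract does not give the specification, the transition variable, or the controls used in the paper).

```latex
% Standard two-regime PSTR form (assumed notation, not taken from the paper):
%   y_{it}  : bank risk of bank i at time t
%   x_{it}  : regressors (e.g. interest rate, reserve requirement ratio)
%   q_{it}  : transition variable;  c : threshold;  \gamma > 0 : smoothness
\begin{align}
  y_{it} &= \mu_i + \beta_0^{\top} x_{it}
            + \beta_1^{\top} x_{it}\, g(q_{it};\gamma,c) + u_{it}, \\
  g(q_{it};\gamma,c) &= \bigl[\,1 + \exp\{-\gamma\,(q_{it}-c)\}\bigr]^{-1}.
\end{align}
% Low regime  (g \to 0): marginal effect of x_{it} is \beta_0.
% High regime (g \to 1): marginal effect of x_{it} is \beta_0 + \beta_1.
```

Under this form, a positive effect in the low regime and a negative effect in the high regime corresponds to a positive element of \beta_0 and a negative corresponding element of \beta_0 + \beta_1.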

  3. APPROACH BASED ON LINEAR REGRESSION FOR STOCK EXCHANGE PREDICTION – CASE STUDY OF PETR4 PETROBRÁS, BRAZIL

    Directory of Open Access Journals (Sweden)

    Nadson S. Timbó

    2016-01-01

    Full Text Available The stock exchange is an important mechanism for economic growth, as it gives investors an opportunity to acquire equity and, at the same time, provides resources for organizations' expansion. On the other hand, a major concern about entering this market relates to the dynamics with which deals are made, since the pricing of shares happens in a smart and oscillatory way. Given this context, several researchers are studying techniques to predict the stock exchange, maximize profits and reduce risks. Thus, this study proposes a linear regression model for stock exchange prediction which, combined with financial indicators, supports decision-making by investors.

  4. Biomass Estimation of Xerophytic Forests Using Visible Aerial Imagery: Contrasting Single-Tree and Area-Based Approaches

    Directory of Open Access Journals (Sweden)

    Luca Bernasconi

    2017-03-01

    Full Text Available A large part of the arid areas in tropical and sub-tropical regions is dominated by sparse xerophytic vegetation, which is essential for providing products and services for local populations. While a large number of studies already exist on the derivation of wall-to-wall estimations of above ground biomass (AGB) with remotely sensed data, only a few of them are based on the direct use of non-photogrammetric aerial photography. In this contribution we present an experiment carried out in a study area located on Santiago Island in the Cape Verde archipelago, where a National Forest Inventory (NFI) was recently carried out together with a new acquisition of visible high-resolution aerial orthophotography. We contrasted two approaches: single-tree, based on the automatic delineation of tree canopies; and area-based, on the basis of an automatic image classification. Using 184 field plots collected for the NFI, we created parametric models to predict AGB on the basis of the crown projection area (CPA) estimated from the two approaches. Both methods produced similar root mean square errors (RMSE) at pixel level: 45% for the single-tree and 42% for the area-based approach. However, the latter was better able to predict AGB along the whole variable range, limiting the saturation problem which is evident when the CPA tends to reach full coverage of the field plots. These findings demonstrate that in regions dominated by sparse vegetation, a simple aerial orthophoto can be used to successfully create AGB wall-to-wall predictions. The level of uncertainty of these estimations permits the derivation of small area estimations useful for supporting a more correct implementation of sustainable management practices for wood resources.

  5. Linear regression

    CERN Document Server

    Olive, David J

    2017-01-01

    This text covers both multiple linear regression and some experimental design models. The text uses the response plot to visualize the model and to detect outliers, does not assume that the error distribution has a known parametric distribution, develops prediction intervals that work when the error distribution is unknown, suggests bootstrap hypothesis tests that may be useful for inference after variable selection, and develops prediction regions and large sample theory for the multivariate linear regression model that has m response variables. A relationship between multivariate prediction regions and confidence regions provides a simple way to bootstrap confidence regions. These confidence regions often provide a practical method for testing hypotheses. There is also a chapter on generalized linear models and generalized additive models. There are many R functions to produce response and residual plots, to simulate prediction intervals and hypothesis tests, to detect outliers, and to choose response trans...

  6. Penalized linear regression for discrete ill-posed problems: A hybrid least-squares and mean-squared error approach

    KAUST Repository

    Suliman, Mohamed

    2016-12-19

    This paper proposes a new approach to find the regularization parameter for linear least-squares discrete ill-posed problems. In the proposed approach, an artificial perturbation matrix with a bounded norm is forced into the discrete ill-posed model matrix. This perturbation is introduced to enhance the singular-value (SV) structure of the matrix and hence to provide a better solution. The proposed approach is derived to select the regularization parameter in a way that minimizes the mean-squared error (MSE) of the estimator. Numerical results demonstrate that the proposed approach outperforms a set of benchmark methods in most cases when applied to different scenarios of discrete ill-posed problems. Jointly, the proposed approach enjoys the lowest run-time and offers the highest level of robustness amongst all the tested methods.
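
To make the role of the regularization parameter concrete, the following shows the familiar Tikhonov (ridge) form of a regularized least-squares solution. This is an assumed, generic formulation for illustration only; the abstract does not state the estimator's exact form, and the paper's perturbation-based rule for choosing the parameter is not reproduced here.

```latex
% Generic Tikhonov-regularized least squares (assumed form, for illustration).
% A = \sum_i \sigma_i u_i v_i^{\top} is the SVD of the model matrix, b the data,
% and \lambda > 0 the regularization parameter to be selected.
\begin{align}
  \hat{x}_{\lambda}
    &= \arg\min_{x}\; \|Ax - b\|_2^2 + \lambda \|x\|_2^2
     = (A^{\top}A + \lambda I)^{-1} A^{\top} b \\
    &= \sum_{i} \frac{\sigma_i}{\sigma_i^{2} + \lambda}\,
       \bigl(u_i^{\top} b\bigr)\, v_i .
\end{align}
% For \sigma_i^2 \gg \lambda the factor is close to 1/\sigma_i (unregularized);
% for \sigma_i^2 \ll \lambda it is close to \sigma_i/\lambda, which damps the
% noise amplification caused by the small singular values of an ill-posed A.
```

The trade-off visible in the last expression is what any selection rule has to balance: a larger \lambda suppresses noise but biases the solution, which is why a criterion such as the estimator's mean-squared error is a natural target.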

  7. New approach for phylogenetic tree recovery based on genome-scale metabolic networks.

    Science.gov (United States)

    Gamermann, Daniel; Montagud, Arnaud; Conejero, J Alberto; Urchueguía, Javier F; de Córdoba, Pedro Fernández

    2014-07-01

    A wide range of applications and research has been done with genome-scale metabolic models. In this work, we describe an innovative methodology for comparing metabolic networks constructed from genome-scale metabolic models and how to apply this comparison in order to infer evolutionary distances between different organisms. Our methodology allows a quantification of the metabolic differences between different species from a broad range of families and even kingdoms. This quantification is then applied in order to reconstruct phylogenetic trees for sets of various organisms.

  8. Adjustment of State Owned and Foreign-Funded Enterprises in China to economic reforms, 1980s-2007: a logistic smooth transition regression (LSTR) approach

    OpenAIRE

    Aizenman, Joshua; Geng, Nan

    2009-01-01

    This paper applies a logistic smooth transition regression approach to the estimation of a homogenous aggregate value added production function of the State Owned (SOE) and Foreign-Funded Enterprises (FFE) in China, 1980s-2007. The transition associated with the economic reforms in China is estimated applying a curvilinear logistic function, where the speed and the timing of the transition are endogenously determined by the data. We find high but gradually declining markups in both ...

  9. Plateletpheresis efficiency and mathematical correction of software-derived platelet yield prediction: A linear regression and ROC modeling approach.

    Science.gov (United States)

    Jaime-Pérez, José Carlos; Jiménez-Castillo, Raúl Alberto; Vázquez-Hernández, Karina Elizabeth; Salazar-Riojas, Rosario; Méndez-Ramírez, Nereida; Gómez-Almaguer, David

    2017-10-01

    Advances in automated cell separators have improved the efficiency of plateletpheresis and the possibility of obtaining double products (DP). We assessed cell processor accuracy of predicted platelet (PLT) yields with the goal of a better prediction of DP collections. This retrospective proof-of-concept study included 302 plateletpheresis procedures performed on a Trima Accel v6.0 at the apheresis unit of a hematology department. Donor variables, software predicted yield and actual PLT yield were statistically evaluated. Software prediction was optimized by linear regression analysis and its optimal cut-off to obtain a DP assessed by receiver operating characteristic curve (ROC) modeling. Three hundred and two plateletpheresis procedures were performed; in 271 (89.7%) occasions, donors were men and in 31 (10.3%) women. Pre-donation PLT count had the best direct correlation with actual PLT yield (r = 0.486. P linear regression analysis accurately corrected this underestimation and ROC analysis identified a precise cut-off to reliably predict a DP. © 2016 Wiley Periodicals, Inc.
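
The ROC step mentioned in this abstract (choosing a cut-off on the predicted yield that best separates double-product collections from the rest) can be sketched as follows. The abstract does not say which criterion was used; Youden's J statistic is assumed here as one common choice, and the yields and outcomes are made-up toy values, not data from the study.

```python
def youden_cutoff(scores, labels):
    """Return (cutoff, sensitivity, specificity) maximizing J = sens + spec - 1.

    A case is classified as positive when its score is >= cutoff.
    Ties in J are resolved in favour of the lowest cutoff.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best = None
    for cutoff in sorted(set(scores)):
        true_pos = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        true_neg = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 0)
        sensitivity = true_pos / positives
        specificity = true_neg / negatives
        j = sensitivity + specificity - 1
        if best is None or j > best[0]:
            best = (j, cutoff, sensitivity, specificity)
    return best[1], best[2], best[3]


# Toy data (hypothetical): predicted yield in arbitrary units, and whether a
# double product (1) or a single product (0) was actually obtained.
predicted_yield = [3.1, 3.8, 4.2, 4.9, 5.5, 6.0, 6.4, 7.1]
double_product = [0, 0, 0, 1, 1, 0, 1, 1]

cutoff, sensitivity, specificity = youden_cutoff(predicted_yield, double_product)
print("cutoff:", cutoff, "sensitivity:", sensitivity, "specificity:", specificity)
```

Scanning every observed score as a candidate threshold is the simplest way to trace the ROC curve; other criteria (for example, fixing a minimum specificity) would select a different point on the same curve.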

  10. A New Predictive Model of Centerline Segregation in Continuous Cast Steel Slabs by Using Multivariate Adaptive Regression Splines Approach

    Directory of Open Access Journals (Sweden)

    Paulino José García Nieto

    2015-06-01

    Full Text Available The aim of this study was to obtain a predictive model able to perform an early detection of central segregation severity in continuous cast steel slabs. Segregation in steel cast products is an internal defect that can be very harmful when slabs are rolled in heavy plate mills. In this research work, the central segregation was studied with success using the data mining methodology based on multivariate adaptive regression splines (MARS) technique. For this purpose, the most important physical-chemical parameters are considered. The results of the present study are two-fold. In the first place, the significance of each physical-chemical variable on the segregation is presented through the model. Second, a model for forecasting segregation is obtained. Regression with optimal hyperparameters was performed and coefficients of determination equal to 0.93 for continuity factor estimation and 0.95 for average width were obtained when the MARS technique was applied to the experimental dataset, respectively. The agreement between experimental data and the model confirmed the good performance of the latter.

  11. The Local Food Environment and Fruit and Vegetable Intake: A Geographically Weighted Regression Approach in the ORiEL Study.

    Science.gov (United States)

    Clary, Christelle; Lewis, Daniel J; Flint, Ellen; Smith, Neil R; Kestens, Yan; Cummins, Steven

    2016-12-01

    Studies that explore associations between the local food environment and diet routinely use global regression models, which assume that relationships are invariant across space, yet such stationarity assumptions have been little tested. We used global and geographically weighted regression models to explore associations between the residential food environment and fruit and vegetable intake. Analyses were performed in 4 boroughs of London, United Kingdom, using data collected between April 2012 and July 2012 from 969 adults in the Olympic Regeneration in East London Study. Exposures were assessed both as absolute densities of healthy and unhealthy outlets, taken separately, and as a relative measure (proportion of total outlets classified as healthy). Overall, local models performed better than global models (lower Akaike information criterion). Locally estimated coefficients varied across space, regardless of the type of exposure measure, although changes of sign were observed only when absolute measures were used. Despite findings from global models showing significant associations between the relative measure and fruit and vegetable intake (β = 0.022; P environment and diet. It further challenges the idea that a single measure of exposure, whether relative or absolute, can reflect the many ways the food environment may shape health behaviors. © The Author 2016. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health.

  12. Evaluation of the Relationship between Social Desirability and Minor Psychiatric Disorders among Nurses in Southern Iran: A Robust Regression Approach.

    Science.gov (United States)

    Roustaei, Narges; Jafari, Peyman; Sadeghi, Erfan; Jamali, Jamshid

    2015-10-01

    Social desirability may affect different aspects of people's quality of life. One of the impressive dimensions of quality of life is mental health. The prevalence of Minor Psychiatric Disorders (MPD) among health care workers is higher than other health workers. This article aims at evaluating the relationship between social desirability and MPD among nurses in southern Iran. A cross-sectional study was carried out on 765 nurses who had been employed in hospitals in the southern provinces of Iran. The 12-item General Health Questionnaire (GHQ-12) and Marlowe-Crowne Social Desirability Scale (MC-SDS) were used for evaluating the MPD and social desirability in nurses, respectively. The Robust Regression was used to determine any quantified relationship between social desirability and the level of MPD with adjusted age, gender, work experience, marital status, and level of education. The mean scores of GHQ-12 and MC-SDS were 13.02±5.64 (out of 36) and 20.17±4.76 (out of 33), respectively. The result of Robust Regression indicated that gender and social desirability were statistically significant in affecting MPD. The prevalence of MPD in female nurses was higher than males. Nurses with higher social desirability scores had the tendency to report lower levels of MPD.

  13. Evaluation of the Relationship between Social Desirability and Minor Psychiatric Disorders among Nurses in Southern Iran: A Robust Regression Approach

    Directory of Open Access Journals (Sweden)

    Narges Roustaei

    2015-10-01

    Full Text Available Abstract Background: Social desirability may affect different aspects of people’s quality of life. One of the impressive dimensions of quality of life is mental health. The prevalence of Minor Psychiatric Disorders (MPD) among health care workers is higher than other health workers. This article aims at evaluating the relationship between social desirability and MPD among nurses in southern Iran. Method: A cross-sectional study was carried out on 765 nurses who had been employed in hospitals in the southern provinces of Iran. The 12-item General Health Questionnaire (GHQ-12) and Marlowe-Crowne Social Desirability Scale (MC-SDS) were used for evaluating the MPD and social desirability in nurses, respectively. The Robust Regression was used to determine any quantified relationship between social desirability and the level of MPD with adjusted age, gender, work experience, marital status, and level of education. Result: The mean scores of GHQ-12 and MC-SDS were 13.02±5.64 (out of 36) and 20.17±4.76 (out of 33), respectively. The result of Robust Regression indicated that gender and social desirability were statistically significant in affecting MPD. Conclusion: The prevalence of MPD in female nurses was higher than males. Nurses with higher social desirability scores had the tendency to report lower levels of MPD.

  14. A New Predictive Model of Centerline Segregation in Continuous Cast Steel Slabs by Using Multivariate Adaptive Regression Splines Approach

    Science.gov (United States)

    García Nieto, Paulino José; González Suárez, Victor Manuel; Álvarez Antón, Juan Carlos; Mayo Bayón, Ricardo; Sirgo Blanco, José Ángel; Díaz Fernández, Ana María

    2015-01-01

    The aim of this study was to obtain a predictive model able to perform an early detection of central segregation severity in continuous cast steel slabs. Segregation in steel cast products is an internal defect that can be very harmful when slabs are rolled in heavy plate mills. In this research work, the central segregation was studied with success using the data mining methodology based on multivariate adaptive regression splines (MARS) technique. For this purpose, the most important physical-chemical parameters are considered. The results of the present study are two-fold. In the first place, the significance of each physical-chemical variable on the segregation is presented through the model. Second, a model for forecasting segregation is obtained. Regression with optimal hyperparameters was performed and coefficients of determination equal to 0.93 for continuity factor estimation and 0.95 for average width were obtained when the MARS technique was applied to the experimental dataset, respectively. The agreement between experimental data and the model confirmed the good performance of the latter.
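
    As an illustration of the MARS technique named above, the following minimal sketch shows the hinge basis functions that a MARS model sums to form a piecewise-linear fit. The function names, knots, and coefficients are hypothetical and are not taken from the study; only a single predictor is handled.

```python
def hinge(x, knot, direction=1):
    """MARS hinge basis: max(0, x - knot) if direction is +1,
    max(0, knot - x) if direction is -1."""
    return max(0.0, direction * (x - knot))


def mars_predict(x, intercept, terms):
    """Evaluate a one-variable MARS-style model.

    terms is a list of (coefficient, knot, direction) tuples;
    all values used here are illustrative assumptions.
    """
    return intercept + sum(c * hinge(x, k, d) for c, k, d in terms)


# Hypothetical model: 1 + 2*max(0, x - 2) + 0.5*max(0, 2 - x)
example_terms = [(2.0, 2.0, 1), (0.5, 2.0, -1)]
prediction_right = mars_predict(3.0, 1.0, example_terms)
prediction_left = mars_predict(1.0, 1.0, example_terms)
```

    In a fitted MARS model the knots and coefficients are chosen by a forward selection and backward pruning procedure; the sketch only evaluates a model whose terms are already given.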

  15. Antibiogram-Derived Radial Decision Trees: An Innovative Approach to Susceptibility Data Display

    Directory of Open Access Journals (Sweden)

    Rocco J. Perla

    2005-01-01

    Full Text Available Hospital antibiograms (ABGMs) are often presented in the form of large 2-factor (single organism vs. single antimicrobial) tables. Presenting susceptibility data in this fashion, although of value, does have limitations relative to drug resistant subpopulations. As the crisis of antimicrobial drug-resistance continues to escalate globally, clinicians need (1) to have access to susceptibility data that, for isolates resistant to first-line drugs, indicates susceptibility to second line drugs and (2) to understand the probabilities of encountering such organisms in a particular institution. This article describes a strategy used to transform data in a hospital ABGM into a probability-based radial decision tree (RDT) that can be used as a guide to empiric antimicrobial therapy. Presenting ABGM data in the form of a radial decision tree versus a table makes it easier to visually organize complex data and to demonstrate different levels of therapeutic decision-making. The RDT model discussed here may also serve as a more effective tool to understand the prevalence of different resistant subpopulations in a given institution compared to the traditional ABGM.

  16. New developments in fruit and vegetables consumption in the period 1999-2004 in Denmark - a quantile regression approach

    DEFF Research Database (Denmark)

    Hansen, Aslak Hedemann

    2008-01-01

    The development in the consumption of fruit and vegetables in the period 1999-2004 in Denmark was investigated using quantile regression and two previously overlooked problems were identified. First, the change in the ten percent quantile samples decreased. This could have been caused by changes...... for this development is probably due to low income groups becoming relatively more income constrained since the gap to the high income group has grown considerably at the lower end of the distribution. The second problem was that the education inducing gap became larger in 2004 indicating that uneducated people have...... not responded as well to the health related information flow. These results suggest that information campaigns have not been as successful as previously thought; more importantly the results indicate that information campaigns alone will do a poor job in solving the identified problems. Other instruments...

  17. An alternative approach to the ground motion prediction problem by a non-parametric adaptive regression method

    Science.gov (United States)

    Yerlikaya-Özkurt, Fatma; Askan, Aysegul; Weber, Gerhard-Wilhelm

    2014-12-01

    Ground Motion Prediction Equations (GMPEs) are empirical relationships which are used for determining the peak ground response at a particular distance from an earthquake source. They relate the peak ground responses as a function of earthquake source type, distance from the source, local site conditions where the data are recorded and finally the depth and magnitude of the earthquake. In this article, a new prediction algorithm, called Conic Multivariate Adaptive Regression Splines (CMARS), is employed on an available dataset for deriving a new GMPE. CMARS is based on a special continuous optimization technique, conic quadratic programming. These convex optimization problems are very well-structured, resembling linear programs and, hence, permitting the use of interior point methods. The CMARS method is performed on the strong ground motion database of Turkey. Results are compared with three other GMPEs. CMARS is found to be effective for ground motion prediction purposes.

  18. Relative accuracy of spatial predictive models for lynx Lynx canadensis derived using logistic regression-AIC, multiple criteria evaluation and Bayesian approaches

    Institute of Scientific and Technical Information of China (English)

    Hejun KANG; Shelley M. ALEXANDER

    2009-01-01

    We compared probability surfaces derived using one set of environmental variables in three Geographic Information Systems (GIS)-based approaches: logistic regression and Akaike's Information Criterion (AIC), Multiple Criteria Evaluation (MCE), and Bayesian Analysis (specifically Dempster-Shafer theory). We used lynx Lynx canadensis as our focal species, and developed our environment relationship model using track data collected in Banff National Park, Alberta, Canada, during winters from 1997 to 2000. The accuracy of the three spatial models was compared using a contingency table method. We determined the percentage of cases in which both presence and absence points were correctly classified (overall accuracy), the failure to predict a species where it occurred (omission error) and the prediction of presence where there was absence (commission error). Our overall accuracy showed the logistic regression approach was the most accurate (74.51%). The multiple criteria evaluation was intermediate (39.22%), while the Dempster-Shafer (D-S) theory model was the poorest (29.90%). However, omission and commission error tell us a different story: logistic regression had the lowest commission error, while D-S theory produced the lowest omission error. Our results provide evidence that habitat modellers should evaluate all three error measures when ascribing confidence in their model. We suggest that for our study area at least, the logistic regression model is optimal. However, where sample size is small or the species is very rare, it may also be useful to explore and/or use a more ecologically cautious modelling approach (e.g. Dempster-Shafer) that would over-predict, protect more sites, and thereby minimize the risk of missing critical habitat in conservation plans.
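
    The three error measures described above can be sketched from a 2x2 presence/absence contingency table as follows. The counts are invented for illustration, and the denominators (observed presences for omission error, observed absences for commission error) are an assumption, since the abstract does not state the exact formulas used.

```python
def error_measures(true_pos, false_pos, false_neg, true_neg):
    """Return (overall accuracy, omission error, commission error).

    Assumed definitions (not confirmed by the abstract):
      overall accuracy = correctly classified points / all points
      omission error   = missed presences / observed presences
      commission error = false presences / observed absences
    """
    total = true_pos + false_pos + false_neg + true_neg
    overall = (true_pos + true_neg) / total
    omission = false_neg / (true_pos + false_neg)
    commission = false_pos / (false_pos + true_neg)
    return overall, omission, commission


# Hypothetical counts, not data from the study.
overall, omission, commission = error_measures(
    true_pos=40, false_pos=10, false_neg=20, true_neg=30
)
```

    Reporting all three values side by side makes the trade-off visible: a model can score well on overall accuracy while still missing many occupied sites.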

  19. Relative accuracy of spatial predictive models for lynx Lynx canadensis derived using logistic regression-AIC, multiple criteria evaluation and Bayesian approaches

    Directory of Open Access Journals (Sweden)

    Shelley M. ALEXANDER

    2009-02-01

    Full Text Available We compared probability surfaces derived using one set of environmental variables in three Geographic Information Systems (GIS)-based approaches: logistic regression and Akaike’s Information Criterion (AIC), Multiple Criteria Evaluation (MCE), and Bayesian Analysis (specifically Dempster-Shafer theory). We used lynx Lynx canadensis as our focal species, and developed our environment relationship model using track data collected in Banff National Park, Alberta, Canada, during winters from 1997 to 2000. The accuracy of the three spatial models was compared using a contingency table method. We determined the percentage of cases in which both presence and absence points were correctly classified (overall accuracy), the failure to predict a species where it occurred (omission error) and the prediction of presence where there was absence (commission error). Our overall accuracy showed the logistic regression approach was the most accurate (74.51%). The multiple criteria evaluation was intermediate (39.22%), while the Dempster-Shafer (D-S) theory model was the poorest (29.90%). However, omission and commission error tell us a different story: logistic regression had the lowest commission error, while D-S theory produced the lowest omission error. Our results provide evidence that habitat modellers should evaluate all three error measures when ascribing confidence in their model. We suggest that for our study area at least, the logistic regression model is optimal. However, where sample size is small or the species is very rare, it may also be useful to explore and/or use a more ecologically cautious modelling approach (e.g. Dempster-Shafer) that would over-predict, protect more sites, and thereby minimize the risk of missing critical habitat in conservation plans [Current Zoology 55(1): 28 – 40, 2009].

  20. A Versatile, Production-Oriented Approach to High-Resolution Tree-Canopy Mapping in Urban and Suburban Landscapes Using GEOBIA and Data Fusion

    Directory of Open Access Journals (Sweden)

    Jarlath O'Neil-Dunne

    2014-12-01

    Full Text Available The benefits of tree canopy in urban and suburban landscapes are increasingly well known: stormwater runoff control, air-pollution mitigation, temperature regulation, carbon storage, wildlife habitat, neighborhood cohesion, and other social indicators of quality of life. However, many urban areas lack high-resolution tree canopy maps that document baseline conditions or inform tree-planting programs, limiting effective study and management. This paper describes a GEOBIA approach to tree-canopy mapping that relies on existing public investments in LiDAR, multispectral imagery, and thematic GIS layers, thus eliminating or reducing data acquisition costs. This versatile approach accommodates datasets of varying content and quality, first using LiDAR derivatives to identify aboveground features and then a combination of LiDAR and imagery to differentiate trees from buildings and other anthropogenic structures. Initial tree canopy objects are then refined through contextual analysis, morphological smoothing, and small-gap filling. Case studies from locations in the United States and Canada show how a GEOBIA approach incorporating data fusion and enterprise processing can be used for producing high-accuracy, high-resolution maps for large geographic extents. These maps are designed specifically for practical application by planning and regulatory end users who expect not only high accuracy but also high realism and visual coherence.

  1. Artificial Root Exudate System (ARES): a field approach to simulate tree root exudation in soils

    Science.gov (United States)

    Lopez-Sangil, Luis; Estradera-Gumbau, Eduard; George, Charles; Sayer, Emma

    2016-04-01

    The exudation of labile solutes by fine roots represents an important strategy for plants to promote soil nutrient availability in terrestrial ecosystems. Compounds exuded by roots (mainly sugars, carboxylic and amino acids) provide energy to soil microbes, thus priming the mineralization of soil organic matter (SOM) and the consequent release of inorganic nutrients into the rhizosphere. Studies in several forest ecosystems suggest that tree root exudates represent 1 to 10% of the total photoassimilated C, with exudation rates increasing markedly under elevated CO2 scenarios. Despite their importance in ecosystem functioning, we know little about how tree root exudation affects soil carbon dynamics in situ. This is mainly because there has been no viable method to experimentally control inputs of root exudates at field scale. Here, I present a method to apply artificial root exudates below the soil surface in small field plots. The artificial root exudate system (ARES) consists of a water container with a mixture of labile carbon solutes (mimicking tree root exudate rates and composition), which feeds a system of drip-tips covering an area of 1 m². The tips are evenly distributed every 20 cm and inserted 4 cm into the soil with minimal disturbance. The system is regulated by a mechanical timer, such that artificial root exudate solution can be applied at frequent, regular daily intervals. We tested ARES from April to September 2015 (growing season) within a leaf-litter manipulation experiment ongoing in temperate deciduous woodland in the UK. Soil respiration was measured monthly, and soil samples were taken at the end of the growing season for PLFA, enzymatic activity and nutrient analyses. First results show a very rapid mineralization of the root exudate compounds and, interestingly, long-term increases in SOM respiration, with negligible effects on soil moisture levels. Large positive priming effects (2.5-fold increase in soil respiration during the growing

  2. Genetic Markers Analyses and Bioinformatic Approaches to Distinguish Between Olive Tree (Olea europaea L.) Cultivars.

    Science.gov (United States)

    Ben Ayed, Rayda; Ben Hassen, Hanen; Ennouri, Karim; Rebai, Ahmed

    2016-12-01

    The genetic diversity of 22 olive tree cultivars (Olea europaea L.) sampled from different Mediterranean countries was assessed using 5 SNP markers (FAD2.1; FAD2.3; CALC; SOD and ANTHO3) located in four different genes. The genotyping analysis of the 22 cultivars with 5 SNP loci revealed 11 alleles (average 2.2 per locus). The dendrogram based on cultivar genotypes revealed three clusters consistent with the cultivars classification. In addition, the results obtained with the five SNPs were compared to those obtained with the SSR markers using bioinformatic analyses and by computing a cophenetic correlation coefficient, indicating the usefulness of the UPGMA method for clustering plant genotypes. Based on principal coordinate analysis using a similarity matrix, the first two coordinates explained 54.94% of the total variance. This work provides a more comprehensive explanation of the diversity available in Tunisian olive cultivars, and an important contribution for olive breeding and olive oil authenticity.

  3. A novel decision tree approach based on transcranial Doppler sonography to screen for blunt cervical vascular injuries.

    Science.gov (United States)

    Purvis, Dianna; Aldaghlas, Tayseer; Trickey, Amber W; Rizzo, Anne; Sikdar, Siddhartha

    2013-06-01

    Early detection and treatment of blunt cervical vascular injuries prevent adverse neurologic sequelae. Current screening criteria can miss up to 22% of these injuries. The study objective was to investigate bedside transcranial Doppler sonography for detecting blunt cervical vascular injuries in trauma patients using a novel decision tree approach. This prospective pilot study was conducted at a level I trauma center. Patients undergoing computed tomographic angiography for suspected blunt cervical vascular injuries were studied with transcranial Doppler sonography. Extracranial and intracranial vasculatures were examined with a portable power M-mode transcranial Doppler unit. The middle cerebral artery mean flow velocity, pulsatility index, and their asymmetries were used to quantify flow patterns and develop an injury decision tree screening protocol. Student t tests validated associations between injuries and transcranial Doppler predictive measures. We evaluated 27 trauma patients with 13 injuries. Single vertebral artery injuries were most common (38.5%), followed by single internal carotid artery injuries (30%). Compared to patients without injuries, mean flow velocity asymmetry was higher for single internal carotid artery (P = .003) and single vertebral artery (P = .004) injuries. Similarly, pulsatility index asymmetry was higher in single internal carotid artery (P = .015) and single vertebral artery (P = .042) injuries, whereas the lowest pulsatility index was elevated for bilateral vertebral artery injuries (P = .006). The decision tree yielded 92% specificity, 93% sensitivity, and 93% correct classifications. In this pilot feasibility study, transcranial Doppler measures were significantly associated with the blunt cervical vascular injury status, suggesting that transcranial Doppler sonography might be a viable bedside screening tool for trauma. Patient-specific hemodynamic information from transcranial Doppler assessment has the potential to alter

  4. Boosted beta regression.

    Directory of Open Access Journals (Sweden)

    Matthias Schmid

    Full Text Available Regression analysis with a bounded outcome is a common problem in applied statistics. Typical examples include regression models for percentage outcomes and the analysis of ratings that are measured on a bounded scale. In this paper, we consider beta regression, which is a generalization of logit models to situations where the response is continuous on the interval (0,1). Consequently, beta regression is a convenient tool for analyzing percentage responses. The classical approach to fit a beta regression model is to use maximum likelihood estimation with subsequent AIC-based variable selection. As an alternative to this established - yet unstable - approach, we propose a new estimation technique called boosted beta regression. With boosted beta regression, estimation and variable selection can be carried out simultaneously in a highly efficient way. Additionally, both the mean and the variance of a percentage response can be modeled using flexible nonlinear covariate effects. As a consequence, the new method accounts for common problems such as overdispersion and non-binomial variance structures.
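
    To make the model class concrete, the sketch below evaluates the log-likelihood of one observation under a beta distribution written in the common mean/precision form (shape parameters mu*phi and (1 - mu)*phi) with a logit link for the mean. This parameterization is an assumption, since the abstract does not spell it out, and the sketch covers only the likelihood, not the boosting algorithm.

```python
import math


def inv_logit(eta):
    """Map a linear predictor to a mean in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def beta_loglik(y, mu, phi):
    """Log-density of y in (0, 1) for a beta distribution with
    mean mu and precision phi (assumed parameterization)."""
    a = mu * phi
    b = (1.0 - mu) * phi
    return (math.lgamma(phi) - math.lgamma(a) - math.lgamma(b)
            + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y))


def beta_variance(mu, phi):
    """Variance implied by mean mu and precision phi."""
    return mu * (1.0 - mu) / (1.0 + phi)


# With mu = 0.5 and phi = 2 the distribution is uniform on (0, 1),
# so the log-density is zero everywhere.
uniform_case = beta_loglik(0.3, inv_logit(0.0), 2.0)
```

    Because the variance depends on both mu and phi, letting phi vary with covariates is one way a model of this kind can represent overdispersion.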

  5. Classification and Regression Tree Analysis of Clinical Patterns that Predict Survival in 127 Chinese Patients with Advanced Non-small Cell Lung Cancer Treated by Gefitinib Who Failed to Previous Chemotherapy

    Directory of Open Access Journals (Sweden)

    Ziping WANG

    2011-09-01

    Full Text Available Background and objective It has been proven that gefitinib produces only 10%-20% tumor regression in heavily pretreated, unselected non-small cell lung cancer (NSCLC) patients as the second- and third-line setting. Asian, female, nonsmokers and adenocarcinoma are favorable factors; however, it is difficult to find a patient satisfying all the above clinical characteristics. The aim of this study is to identify novel predicting factors, and to explore the interactions between clinical variables and their impact on the survival of Chinese patients with advanced NSCLC who were heavily treated with gefitinib in the second- or third-line setting. Methods The clinical and follow-up data of 127 advanced NSCLC patients referred to the Cancer Hospital & Institute, Chinese Academy of Medical Sciences from March 2005 to March 2010 were analyzed. Multivariate analysis of progression-free survival (PFS) was performed using recursive partitioning, which is referred to as the classification and regression tree (CART) analysis. Results The median PFS of 127 eligible consecutive advanced NSCLC patients was 8.0 months (95%CI: 5.8-10.2). CART was performed with an initial split on first-line chemotherapy outcomes and a second split on patients’ age. Three terminal subgroups were formed. The median PFS of the three subsets ranged from 1.0 month (95%CI: 0.8-1.2) for those with progressive disease outcome after the first-line chemotherapy subgroup, 10 months (95%CI: 7.0-13.0) in patients with a partial response or stable disease in first-line chemotherapy and age <70, and 22.0 months for patients obtaining a partial response or stable disease in first-line chemotherapy at age 70-81 (95%CI: 3.8-40.1). Conclusion Partial response, stable disease in first-line chemotherapy and age ≥ 70 are closely correlated with long-term survival treated by gefitinib as a second- or third-line setting in advanced NSCLC. CART can be used to identify previously unappreciated patient
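
    The core step of recursive partitioning, choosing one split, can be sketched as below. This is a generic least-squares regression-tree split on a single numeric predictor, not the survival-based criterion the study would have needed for PFS; the data and function names are hypothetical.

```python
def sum_squared_error(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, cost) for the split 'x <= threshold' that
    minimizes the summed squared error of the two child nodes.
    Returns None when xs has fewer than two distinct values."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        cost = sum_squared_error(left) + sum_squared_error(right)
        if best is None or cost < best[1]:
            best = (threshold, cost)
    return best


# Hypothetical data with an obvious break between x = 3 and x = 10.
xs = [1, 2, 3, 10, 11, 12]
ys = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
split = best_split(xs, ys)
```

    A full tree applies the same search to each child node in turn, over every candidate predictor, until a stopping rule is met.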

  6. A design of experiments approach to validation sampling for logistic regression modeling with error-prone medical records.

    Science.gov (United States)

    Ouyang, Liwen; Apley, Daniel W; Mehrotra, Sanjay

    2016-04-01

    Electronic medical record (EMR) databases offer significant potential for developing clinical hypotheses and identifying disease risk associations by fitting statistical models that capture the relationship between a binary response variable and a set of predictor variables that represent clinical, phenotypical, and demographic data for the patient. However, EMR response data may be error prone for a variety of reasons. Performing a manual chart review to validate data accuracy is time consuming, which limits the number of chart reviews in a large database. The authors' objective is to develop a new design-of-experiments-based systematic chart validation and review (DSCVR) approach that is more powerful than the random validation sampling used in existing approaches. The DSCVR approach judiciously and efficiently selects the cases to validate (i.e., validate whether the response values are correct for those cases) for maximum information content, based only on their predictor variable values. The final predictive model will be fit using only the validation sample, ignoring the remainder of the unvalidated and unreliable error-prone data. A Fisher information based D-optimality criterion is used, and an algorithm for optimizing it is developed. The authors' method is tested in a simulation comparison that is based on a sudden cardiac arrest case study with 23 041 patients' records. This DSCVR approach, using the Fisher information based D-optimality criterion, results in a fitted model with much better predictive performance, as measured by the receiver operating characteristic curve and the accuracy in predicting whether a patient will experience the event, than a model fitted using a random validation sample. The simulation comparisons demonstrate that this DSCVR approach can produce predictive models that are significantly better than those produced from random validation sampling, especially when the event rate is low. © The Author 2015. Published by Oxford

  7. Financial performance monitoring of the technical efficiency of critical access hospitals: a data envelopment analysis and logistic regression modeling approach.

    Science.gov (United States)

    Wilson, Asa B; Kerr, Bernard J; Bastian, Nathaniel D; Fulton, Lawrence V

    2012-01-01

    From 1980 to 1999, rural designated hospitals closed at a disproportionally high rate. In response to this emergent threat to healthcare access in rural settings, the Balanced Budget Act of 1997 made provisions for the creation of a new rural hospital--the critical access hospital (CAH). The conversion to CAH and the associated cost-based reimbursement scheme significantly slowed the closure rate of rural hospitals. This work investigates which methods can ensure the long-term viability of small hospitals. This article uses a two-step design to focus on a hypothesized relationship between technical efficiency of CAHs and a recently developed set of financial monitors for these entities. The goal is to identify the financial performance measures associated with efficiency. The first step uses data envelopment analysis (DEA) to differentiate efficient from inefficient facilities within a data set of 183 CAHs. Determining DEA efficiency is an a priori categorization of hospitals in the data set as efficient or inefficient. In the second step, DEA efficiency is the categorical dependent variable (efficient = 0, inefficient = 1) in the subsequent binary logistic regression (LR) model. A set of six financial monitors selected from the array of 20 measures were the LR independent variables. We use a binary LR to test the null hypothesis that recently developed CAH financial indicators had no predictive value for categorizing a CAH as efficient or inefficient, (i.e., there is no relationship between DEA efficiency and fiscal performance).

  8. A comparative study of multiple regression analysis and back propagation neural network approaches on plain carbon steel in submerged-arc welding

    Indian Academy of Sciences (India)

    ABHIJIT SARKAR; PRASENJIT DEY; R N RAI; SUBHAS CHANDRA SAHA

    2016-05-01

    Weld bead plays an important role in determining the quality of welding particularly in high heat input processes. This research paper presents the development of multiple regression analysis (MRA) and artificial neural network (ANN) models to predict weld bead geometry and HAZ width in submerged arc welding process. Design of experiments is based on Taguchi’s L16 orthogonal array by varying wire feed rate, transverse speed and stick out to develop a multiple regression model, which has been checked for adequacy and significance. Also, ANN model was accomplished with the back propagation approach in MATLAB program to predict bead geometry and HAZ width. Finally, the results of two prediction models were compared and analyzed. It is found that the error related to the prediction of bead geometry and HAZ width is smaller in ANN than MRA.

  9. Oil condition monitoring of gears onboard ships using a regression approach for multivariate T2 control charts

    DEFF Research Database (Denmark)

    Henneberg, Morten; Jørgensen, Bent; Eriksen, René Lynge

    2016-01-01

    In this paper, we present an oil condition and wear debris evaluation method for ship thruster gears using T2 statistics to form control charts from a multi-sensor platform. The proposed method takes into account the different ambient conditions by multiple linear regression on the mean value...... as substitution from the normal empirical mean value. This regression approach accounts for the bias imposed on the empirical mean value due to different geographical and seasonal differences on the multi-sensor inputs. Data from a gearbox are used to evaluate the length of the run-in period in order to ensure...... only quasi-stationary data are included in phase I of the T2 statistics. Data from two thruster gears onboard two different ships are presented and analyzed, and the selection of the phase I data size is discussed. A graphic overview for quick localization of T2 signaling is also demonstrated using...
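
    For orientation, the T2 statistic behind such control charts is the squared Mahalanobis distance of an observation from a reference mean. The sketch below computes it for two variables with an explicit 2x2 matrix inverse; the reference mean and covariance are placeholders, and the regression-based mean described in the abstract is not reproduced here.

```python
def hotelling_t2(observation, mean, covariance):
    """T2 = (x - m)^T S^{-1} (x - m) for two variables.

    covariance is a 2x2 nested sequence ((a, b), (c, d)); the values
    passed in below are illustrative assumptions.
    """
    d0 = observation[0] - mean[0]
    d1 = observation[1] - mean[1]
    (a, b), (c, d) = covariance
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    # S^{-1} = [[d, -b], [-c, a]] / det
    return (d0 * (d * d0 - b * d1) + d1 * (-c * d0 + a * d1)) / det


# Hypothetical phase I estimates for two sensor channels.
t2_value = hotelling_t2((2.0, 1.0), (0.0, 0.0), ((4.0, 0.0), (0.0, 1.0)))
```

    An observation would be flagged when its T2 value exceeds a control limit derived from the phase I data; the choice of that limit is outside this sketch.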

  10. Heterogeneity in the Relationship of Substance Use to Risky Sexual Behavior Among Justice-Involved Youth: A Regression Mixture Modeling Approach.

    Science.gov (United States)

    Schmiege, Sarah J; Bryan, Angela D

    2016-04-01

    Justice-involved adolescents engage in high levels of risky sexual behavior and substance use, and understanding potential relationships among these constructs is important for effective HIV/STI prevention. A regression mixture modeling approach was used to determine whether subgroups could be identified based on the regression of two indicators of sexual risk (condom use and frequency of intercourse) on three measures of substance use (alcohol, marijuana and hard drugs). Three classes were observed among n = 596 adolescents on probation: none of the substances predicted outcomes for approximately 18 % of the sample; alcohol and marijuana use were predictive for approximately 59 % of the sample, and marijuana use and hard drug use were predictive in approximately 23 % of the sample. Demographic, individual difference, and additional sexual and substance use risk variables were examined in relation to class membership. Findings are discussed in terms of understanding profiles of risk behavior among at-risk youth.

  11. Extended Beta Regression in R: Shaken, Stirred, Mixed, and Partitioned

    Directory of Open Access Journals (Sweden)

    Bettina Grün

    2012-05-01

    Full Text Available Beta regression – an increasingly popular approach for modeling rates and proportions – is extended in various directions: (a) bias correction/reduction of the maximum likelihood estimator, (b) beta regression tree models by means of recursive partitioning, (c) latent class beta regression by means of finite mixture models. All three extensions may be of importance for enhancing the beta regression toolbox in practice to provide more reliable inference and capture both observed and unobserved/latent heterogeneity in the data. Using the analogy of Smithson and Verkuilen (2006), these extensions make beta regression not only “a better lemon squeezer” (compared to classical least squares regression) but a full-fledged modern juicer offering lemon-based drinks: shaken and stirred (bias correction and reduction), mixed (finite mixture model), or partitioned (tree model). All three extensions are provided in the R package betareg (at least 2.4-0), building on generic algorithms and implementations for bias correction/reduction, model-based recursive partitioning, and finite mixture models, respectively. Specifically, the new functions betatree() and betamix() reuse the object-oriented flexible implementation from the R packages party and flexmix, respectively.

  12. An Approach to Continuous Approximation of Pareto Front Using Geometric Support Vector Regression for Multi-objective Optimization of Fermentation Process

    Institute of Scientific and Technical Information of China (English)

    Jiahuan Wu; Jianlin Wang; Tao Yu; Liqiang Zhao

    2014-01-01

    The approaches to discrete approximation of Pareto front using multi-objective evolutionary algorithms have the problems of heavy computation burden, long running time and missing Pareto optimal points. In order to overcome these problems, an approach to continuous approximation of Pareto front using geometric support vector regression is presented. The regression model of the small size approximate discrete Pareto front is constructed by geometric support vector regression modeling and is described as the approximate continuous Pareto front. In the process of geometric support vector regression modeling, considering the distribution characteristic of Pareto optimal points, the separable augmented training sample sets are constructed by shifting original training sample points along multiple coordinated axes. Besides, an interactive decision-making (DM) procedure, in which the continuous approximation of Pareto front and decision-making is performed interactively, is designed for improving the accuracy of the preferred Pareto optimal point. The correctness of the continuous approximation of Pareto front is demonstrated with a typical multi-objective optimization problem. In addition, combined with the interactive decision-making procedure, the continuous approximation of Pareto front is applied in the multi-objective optimization for an industrial fed-batch yeast fermentation process. The experimental results show that the generated approximate continuous Pareto front has good accuracy and completeness. Compared with the multi-objective evolutionary algorithm with large size population, a more accurate preferred Pareto optimal point can be obtained from the approximate continuous Pareto front with less computation and shorter running time. The operation strategy corresponding to the final preferred Pareto optimal point generated by the interactive DM procedure can improve the production indexes of the fermentation process effectively.
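
    The discrete Pareto front that serves as the starting point above can be illustrated with a simple non-dominated filter. The sketch assumes every objective is minimized and uses made-up points; it does not implement the geometric support vector regression step.

```python
def dominates(a, b):
    """True if point a is no worse than b in every objective and
    strictly better in at least one (minimization assumed)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points):
    """Keep the points that no other point dominates."""
    return [p for p in points
            if not any(dominates(q, p) for q in points)]


# Hypothetical two-objective values; (3, 4) is dominated by (2, 3).
candidates = [(1, 5), (2, 3), (3, 4), (4, 1)]
front = pareto_front(candidates)
```

    A regression model fitted through such a small set of non-dominated points is what the abstract calls the approximate continuous Pareto front.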

  13. Investigation of the marked and long-standing spatial inhomogeneity of the Hungarian suicide rate: a spatial regression approach.

    Science.gov (United States)

    Balint, Lajos; Dome, Peter; Daroczi, Gergely; Gonda, Xenia; Rihmer, Zoltan

    2014-02-01

    In the last century Hungary had astonishingly high suicide rates characterized by marked regional within-country inequalities, a spatial pattern which has been quite stable over time. We aimed to explain this phenomenon at the level of micro-regions (n=175) in the period between 2005 and 2011. Our dependent variable was the age and gender standardized mortality ratio (SMR) for suicide while explanatory variables were factors which are supposed to influence suicide risk, such as measures of religious and political integration, travel time accessibility of psychiatric services, alcohol consumption, unemployment and disability pensionery. When applying the ordinary least squares regression model, the residuals were found to be spatially autocorrelated, which indicates the violation of the assumption on the independence of error terms and - accordingly - the necessity of application of a spatial autoregressive (SAR) model to handle this problem. According to our calculations the SARlag model was a better way (versus the SARerr model) of addressing the problem of spatial autocorrelation; furthermore, its substantive meaning is more convenient. SMR was significantly associated with the "political integration" variable in a negative and with "lack of religious integration" and "disability pensionery" variables in a positive manner. Associations were not significant for the remaining explanatory variables. Several important psychiatric variables were not available at the level of micro-regions. We conducted our analysis on aggregate data. Our results may draw attention to the relevance and abiding validity of the classic Durkheimian suicide risk factors - such as lack of social integration - apropos of the spatial pattern of Hungarian suicides. © 2013 Published by Elsevier B.V.
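
    The residual check described above can be illustrated with Moran's I, a common statistic for spatial autocorrelation. The abstract does not say which test the authors used, so this is only a minimal sketch under that assumption; the chain-shaped neighbourhood matrix and the residual vectors are hypothetical illustration data, not the Hungarian micro-region data.

```python
import numpy as np

def morans_i(residuals, weights):
    """Moran's I for a residual vector and a spatial weight matrix."""
    e = np.asarray(residuals, dtype=float)
    w = np.asarray(weights, dtype=float)
    e = e - e.mean()                 # centre the residuals
    s0 = w.sum()                     # sum of all spatial weights
    n = e.size
    return (n / s0) * (e @ w @ e) / (e @ e)

def chain_weights(n):
    """Hypothetical neighbourhood: regions on a line, adjacent ones are neighbours."""
    w = np.zeros((n, n))
    idx = np.arange(n - 1)
    w[idx, idx + 1] = 1.0
    w[idx + 1, idx] = 1.0
    return w

w = chain_weights(8)
# Made-up residual patterns: similar values side by side vs. alternating signs.
clustered = np.array([1.0, 1.0, 1.0, 1.0, -1.0, -1.0, -1.0, -1.0])
alternating = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0])

i_clustered = morans_i(clustered, w)      # positive: neighbours resemble each other
i_alternating = morans_i(alternating, w)  # negative: neighbours differ
print(i_clustered, i_alternating)
```

    A clearly positive value on OLS residuals is the kind of evidence that would motivate moving to a spatial autoregressive specification.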

  14. Determinants of food insecurity among farming households in Katsina State, north western Nigeria: An ordinal logit regression approach

    Directory of Open Access Journals (Sweden)

    Ibrahim Hussaini Y.

    2016-01-01

    Full Text Available The study identified the determinants of food insecurity among farming households in Katsina State, north western Nigeria. A cross sectional sample survey design was used to select a total of 150 small-holder farmers from 15 communities across 10 Local Government Areas of the state. A structured questionnaire, Focus Group Discussion and Key Informant Interview were used for data collection. The coping strategy index was used to determine the food security status of the household and the ordered logit regression was used to identify the determinants of food insecurity among the households. The majority (73%) were found to be food insecure. In terms of food insecurity status, 44% of the respondents were less food insecure, while 17% and 12% were moderately food insecure and severely food insecure respectively. Eating the less preferred meal, purchasing food on credit and reducing the quantity of food consumed were the major coping strategies adopted by the food insecure households. The result of the ordered logit model shows that the total quantity of cereal saved, number of income sources and dependency ratio were significant for both the moderately and severely food insecure groups at p<0.05 while access to credit was also significant for the two groups but at p<0.01. The output of other crops was significant at p=0.10 but only for the severely food insecure group. The study concluded that food insecurity was high in the study area and therefore recommended that the farming households be provided with opportunities to diversify their livelihood activities.

  15. A Scenario Tree based Stochastic Programming Approach for Multi-Stage Weapon Equipment Mix Production Planning in Defense Manufacturing

    Directory of Open Access Journals (Sweden)

    Li Xuan

    2016-01-01

    Full Text Available The evolving military capability requirements (CRs) must be met continuously by the multi-stage weapon equipment mix production planning (MWEMPP). Meanwhile, the CRs carry complex uncertainties arising from the varying military tasks over the whole planning horizon. The mean-value deterministic programming technique has difficulty dealing with the multi-period and multi-level uncertain decision-making problem in MWEMPP. Therefore, a multi-stage stochastic programming approach is proposed to solve this problem. This approach first uses the scenario tree to quantitatively describe the bi-level uncertainty of the time and quantity of the CRs, then builds the whole set of off-line planning alternatives for each possible scenario, and finally selects the optimal planning alternative on-line to respond flexibly to the real scenario in each period. A case is studied to validate the proposed approach. The results confirm that the proposed approach can better hedge against each scenario of the CRs than the traditional mean-value deterministic technique.

  16. Detecting surface coal mining areas from remote sensing imagery: an approach based on object-oriented decision trees

    Science.gov (United States)

    Zeng, Xiaoji; Liu, Zhifeng; He, Chunyang; Ma, Qun; Wu, Jianguo

    2017-01-01

    Detecting surface coal mining areas (SCMAs) using remote sensing data in a timely and an accurate manner is necessary for coal industry management and environmental assessment. We developed an approach to effectively extract SCMAs from remote sensing imagery based on object-oriented decision trees (OODT). This OODT approach involves three main steps: object-oriented segmentation, calculation of spectral characteristics, and extraction of SCMAs. The advantage of this approach lies in its effective integration of the spectral and spatial characteristics of SCMAs so as to distinguish the mining areas (i.e., the extracting areas, stripped areas, and dumping areas) from other areas that exhibit similar spectral features (e.g., bare soils and built-up areas). We implemented this method to extract SCMAs in the eastern part of Ordos City in Inner Mongolia, China. Our results had an overall accuracy of 97.07% and a kappa coefficient of 0.80. As compared with three other spectral information-based methods, our OODT approach is more accurate in quantifying the amount and spatial pattern of SCMAs in dryland regions.
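
    The two figures reported above, overall accuracy and the kappa coefficient, are both derived from a confusion matrix. The sketch below shows the standard Cohen's kappa computation; the 2x2 counts are made up for illustration and are not the study's data.

```python
import numpy as np

def overall_accuracy_and_kappa(confusion):
    """Overall accuracy and Cohen's kappa from a square confusion matrix."""
    m = np.asarray(confusion, dtype=float)
    total = m.sum()
    p_observed = np.trace(m) / total
    # Agreement expected by chance, from the row and column marginals.
    p_expected = (m.sum(axis=0) * m.sum(axis=1)).sum() / total**2
    kappa = (p_observed - p_expected) / (1.0 - p_expected)
    return p_observed, kappa

# Hypothetical counts: rows = reference class, columns = mapped class
# (first class is a rare "mining" class, second is "other").
confusion = [[40, 10],
             [5, 945]]
accuracy, kappa = overall_accuracy_and_kappa(confusion)
print(f"accuracy={accuracy:.3f}, kappa={kappa:.3f}")
```

    With a rare target class, overall accuracy can be very high while kappa is noticeably lower, because kappa discounts the agreement that would occur by chance.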

  17. Application of time-lagged ensemble approach with auto-regressive processors to reduce uncertainties in peak discharge and timing

    Directory of Open Access Journals (Sweden)

    Kyung-Jin Kim

    2017-02-01

    An accuracy evaluation using observations from 2002 to 2009 found that the time-lagged ensemble approach alone produced significant bias but the AR processor reduced the relative error percentage of the peak discharge from 60% to 10% and also decreased the peak timing error from more than 10 h to less than 3 h, on average. The proposed methodology is easy and inexpensive to implement with the existing products and models and thus can be immediately activated until a new product for forecasted meteorological ensembles is officially issued in Korea.

  18. An alternative approach for gene transfer in trees using wild-type Agrobacterium strains.

    Science.gov (United States)

    Brasileiro, A C; Leplé, J C; Muzzin, J; Ounnoughi, D; Michel, M F; Jouanin, L

    1991-09-01

    Micropropagated shoots of three forest tree species, poplar (Populus tremula x P. alba), wild cherry (Prunus avium L.) and walnut (Juglans nigra x J. regia), were inoculated each with six different wild-type Agrobacterium strains. Poplar and wild cherry developed tumors that grew hormone-independently, whereas on walnut, gall formation was weak. On poplar and wild cherry, tumors induced by nopaline strains developed spontaneously shoots that had a normal phenotype and did not carry oncogenic T-DNA. From these observations, we have established a co-inoculation method to transform plants, using poplar as an experimental model. The method is based on inoculation of stem internodes with an Agrobacterium suspension containing both an oncogenic strain that induces shoot differentiation and a disarmed strain that provides the suitable genes in a binary vector. We used the vector pBI121 carrying neo (kanamycin resistance) and uidA (beta-glucuronidase) genes to facilitate early selection and screening. Poplar plants derived from kanamycin-resistant shoots that did not carry oncogenic T-DNA, were shown to contain and to express neo and uidA genes. These results suggest that wild-type Agrobacterium strains that induce shoot formation directly from tumors can be used as a general tool for gene transfer, avoiding difficult regeneration procedures.

  19. Distributed Graph Coloring: An Approach Based on the Calling Behavior of Japanese Tree Frogs

    CERN Document Server

    Hernández, Hugo

    2010-01-01

    Graph coloring, also known as vertex coloring, considers the problem of assigning colors to the nodes of a graph such that adjacent nodes do not share the same color. The optimization version of the problem concerns the minimization of the number of used colors. In this paper we deal with the problem of finding valid colorings of graphs in a distributed way, that is, by means of an algorithm that only uses local information for deciding the color of the nodes. Such algorithms prescind from any central control. Due to the fact that quite a few practical applications require to find colorings in a distributed way, the interest in distributed algorithms for graph coloring has been growing during the last decade. As an example consider wireless ad-hoc and sensor networks, where tasks such as the assignment of frequencies or the assignment of TDMA slots are strongly related to graph coloring. The algorithm proposed in this paper is inspired by the calling behavior of Japanese tree frogs. Male frogs use their calls...
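
    To make the problem statement concrete, the sketch below checks whether a coloring is valid and builds one with a simple greedy rule in which each node looks only at its neighbours' colors. It is not the frog-inspired algorithm of the paper, and it runs sequentially rather than in a distributed fashion; the five-node ring is a hypothetical example.

```python
def is_valid_coloring(adjacency, colors):
    """True if no edge joins two nodes of the same color."""
    return all(colors[u] != colors[v]
               for u in adjacency
               for v in adjacency[u])

def greedy_local_coloring(adjacency):
    """Give each node the smallest color not used by its already-colored neighbours."""
    colors = {}
    for node in adjacency:
        taken = {colors[nb] for nb in adjacency[node] if nb in colors}
        color = 0
        while color in taken:
            color += 1
        colors[node] = color
    return colors

# Hypothetical graph: a ring of five nodes (an odd cycle needs three colors).
ring = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}
coloring = greedy_local_coloring(ring)
print(coloring, is_valid_coloring(ring, coloring))
```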

  20. An observation-based progression modeling approach to spring and autumn deciduous tree phenology

    Science.gov (United States)

    Yu, Rong; Schwartz, Mark D.; Donnelly, Alison; Liang, Liang

    2016-03-01

    It is important to accurately determine the response of spring and autumn phenology to climate change in forest ecosystems, as phenological variations affect carbon balance, forest productivity, and biodiversity. We observed phenology intensively throughout spring and autumn in a temperate deciduous woodlot at Milwaukee, WI, USA, during 2007-2012. Twenty-four phenophase levels in spring and eight in autumn were recorded for 106 trees, including white ash, basswood, white oak, boxelder, red oak, and hophornbeam. Our phenological progression models revealed that accumulated degree-days and day length explained 87.9-93.4 % of the variation in spring canopy development and 75.8-89.1 % of the variation in autumn senescence. In addition, the timing of community-level spring and autumn phenophases and the length of the growing season from 1871 to 2012 were reconstructed with the models developed. All simulated spring phenophases significantly advanced at a rate from 0.24 to 0.48 days/decade ( p ≤ 0.001) during the 1871-2012 period and from 1.58 to 2.00 days/decade ( p climate between early and late spring phenophases, as well as between leaf coloration and leaf fall, and suggested accelerating simulated ecosystem responses to climate warming over the last four decades in comparison to the past 142 years.

  1. Using Hybrid Decision Tree -Houph Transform Approach For Automatic Bank Check Processing

    Directory of Open Access Journals (Sweden)

    Heba A. Elnemr

    2012-05-01

    Full Text Available One of the first steps in the realization of an automatic system of bank check processing is the automatic classification of checks and extraction of the handwritten area. This paper presents a new hybrid method which couples the statistical color histogram features, the entropy, the energy and the Hough transform to achieve the automatic classification of checks as well as the segmentation and recognition of the various information on the check. The proposed method relies on two stages. First, a two-step classification algorithm is implemented. In the first step, a decision classification tree is built using the entropy, the energy, the logo location and histogram features of colored bank checks. These features are used to classify checks into several groups. Each group may contain one or more types of checks. Therefore, in the second step the bank logo or bank name is matched against its stored template to identify the correct prototype. Second, the Hough transform is utilized to detect lines in the classified checks. These lines are used as indicators of the bank check fields. A group of experiments is performed showing that the proposed technique is promising as regards classifying the bank checks and extracting the important fields in each check.

  2. On the relation between tree crown morphology and particulate matter deposition on urban tree leaves: a ground-based LiDAR approach

    NARCIS (Netherlands)

    Hofman, J.; Bartholomeus, H.; Calders, K.; Wittenberghe, van S.; Wuyts, K.; Samson, R.

    2014-01-01

    Urban dwellers often breathe air that does not meet the European and WHO standards. Next to legislative initiatives to lower atmospheric pollutants, much research has been conducted on the potential of urban trees as mitigation tool for atmospheric particles. While leaf-deposited dust has shown to v

  3. Spectral mapping of savanna tree species at canopy level, with focus on tall trees, using an integrated CAO Hyperspectral & LiDAR sensor approach

    CSIR Research Space (South Africa)

    Naidoo, L

    2010-03-01

    Full Text Available of food production and fuel wood for the local populace and communities. Economically viable tree species can thus be sustainably monitored while the pest species can be targeted and removed. The aim of this study was to identify spectrally and map 5 tall...

  4. Isokinetic knee strength qualities as predictors of jumping performance in high-level volleyball athletes: multiple regression approach.

    Science.gov (United States)

    Sattler, Tine; Sekulic, Damir; Spasic, Miodrag; Osmankac, Nedzad; Vicente João, Paulo; Dervisevic, Edvin; Hadzic, Vedran

    2016-01-01

    Previous investigations noted potential importance of isokinetic strength in rapid muscular performances, such as jumping. This study aimed to identify the influence of isokinetic-knee-strength on specific jumping performance in volleyball. The secondary aim of the study was to evaluate reliability and validity of the two volleyball-specific jumping tests. The sample comprised 67 female (21.96±3.79 years; 68.26±8.52 kg; 174.43±6.85 cm) and 99 male (23.62±5.27 years; 84.83±10.37 kg; 189.01±7.21 cm) high-level volleyball players who competed in 1st and 2nd National Division. Subjects were randomly divided into validation (N.=55 and 33 for males and females, respectively) and cross-validation subsamples (N.=54 and 34 for males and females, respectively). Set of predictors included isokinetic tests, to evaluate the eccentric and concentric strength capacities of the knee extensors, and flexors for dominant and non-dominant leg. The main outcome measure for the isokinetic testing was peak torque (PT) which was later normalized for body mass and expressed as PT/Kg. Block-jump and spike-jump performances were measured over three trials, and observed as criteria. Forward stepwise multiple regressions were calculated for validation subsamples and then cross-validated. Cross validation included correlations between, and t-test differences between, observed and predicted scores; and Bland Altman graphics. Jumping tests were found to be reliable (spike jump: ICC of 0.79 and 0.86; block-jump: ICC of 0.86 and 0.90; for males and females, respectively), and their validity was confirmed by significant t-test differences between 1st vs. 2nd division players. Isokinetic variables were found to be significant predictors of jumping performance in females, but not among males. In females, the isokinetic-knee measures were shown to be stronger and more valid predictors of the block-jump (42% and 64% of the explained variance for validation and cross-validation subsample, respectively

  5. Regression equations for estimation of annual peak-streamflow frequency for undeveloped watersheds in Texas using an L-moment-based, PRESS-minimized, residual-adjusted approach

    Science.gov (United States)

    Asquith, William H.; Roussel, Meghan C.

    2009-01-01

    Annual peak-streamflow frequency estimates are needed for flood-plain management; for objective assessment of flood risk; for cost-effective design of dams, levees, and other flood-control structures; and for design of roads, bridges, and culverts. Annual peak-streamflow frequency represents the peak streamflow for nine recurrence intervals of 2, 5, 10, 25, 50, 100, 200, 250, and 500 years. Common methods for estimation of peak-streamflow frequency for ungaged or unmonitored watersheds are regression equations for each recurrence interval developed for one or more regions; such regional equations are the subject of this report. The method is based on analysis of annual peak-streamflow data from U.S. Geological Survey streamflow-gaging stations (stations). Beginning in 2007, the U.S. Geological Survey, in cooperation with the Texas Department of Transportation and in partnership with Texas Tech University, began a 3-year investigation concerning the development of regional equations to estimate annual peak-streamflow frequency for undeveloped watersheds in Texas. The investigation focuses primarily on 638 stations with 8 or more years of data from undeveloped watersheds and other criteria. The general approach is explicitly limited to the use of L-moment statistics, which are used in conjunction with a technique of multi-linear regression referred to as PRESS minimization. The approach used to develop the regional equations, which was refined during the investigation, is referred to as the 'L-moment-based, PRESS-minimized, residual-adjusted approach'. For the approach, seven unique distributions are fit to the sample L-moments of the data for each of 638 stations and trimmed means of the seven results of the distributions for each recurrence interval are used to define the station specific, peak-streamflow frequency. As a first iteration of regression, nine weighted-least-squares, PRESS-minimized, multi-linear regression equations are computed using the watershed
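
    PRESS, the quantity minimized in the approach named above, is the sum of squared leave-one-out prediction residuals of a regression. The sketch below computes it for ordinary (unweighted) least squares using the leverage shortcut and verifies it by explicit refitting; the report itself uses weighted least squares, and the synthetic data and function names here are assumptions for illustration only.

```python
import numpy as np

def press_statistic(X, y):
    """PRESS for OLS using the leverage shortcut e_i / (1 - h_ii)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    # Diagonal of the hat matrix H = X X^+ (leverage of each observation).
    leverage = np.einsum("ij,ji->i", X, np.linalg.pinv(X))
    return float(np.sum((residuals / (1.0 - leverage)) ** 2))

def press_by_refitting(X, y):
    """Same quantity, computed by dropping each observation and refitting."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    total = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        total += (y[i] - X[i] @ beta) ** 2
    return float(total)

# Synthetic data: intercept plus two made-up predictors.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(20), rng.normal(size=(20, 2))])
y = X @ np.array([1.0, 0.5, -2.0]) + 0.1 * rng.normal(size=20)

press_fast = press_statistic(X, y)
press_slow = press_by_refitting(X, y)
print(press_fast, press_slow)
```

    Because each term is a prediction error for an observation left out of the fit, choosing the model that minimizes PRESS favours equations that predict well at sites not used in calibration.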

  6. An iteratively reweighted least-squares approach to adaptive robust adjustment of parameters in linear regression models with autoregressive and t-distributed deviations

    Science.gov (United States)

    Kargoll, Boris; Omidalizarandi, Mohammad; Loth, Ina; Paffenholz, Jens-André; Alkhatib, Hamza

    2017-09-01

    In this paper, we investigate a linear regression time series model of possibly outlier-afflicted observations and autocorrelated random deviations. This colored noise is represented by a covariance-stationary autoregressive (AR) process, in which the independent error components follow a scaled (Student's) t-distribution. This error model allows for the stochastic modeling of multiple outliers and for an adaptive robust maximum likelihood (ML) estimation of the unknown regression and AR coefficients, the scale parameter, and the degree of freedom of the t-distribution. This approach is meant to be an extension of known estimators, which tend to focus only on the regression model, or on the AR error model, or on normally distributed errors. For the purpose of ML estimation, we derive an expectation conditional maximization either algorithm, which leads to an easy-to-implement version of iteratively reweighted least squares. The estimation performance of the algorithm is evaluated via Monte Carlo simulations for a Fourier as well as a spline model in connection with AR colored noise models of different orders and with three different sampling distributions generating the white noise components. We apply the algorithm to a vibration dataset recorded by a high-accuracy, single-axis accelerometer, focusing on the evaluation of the estimated AR colored noise model.
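
    The core reweighting idea can be sketched for the simplest case. The code below is a simplified illustration that assumes independent t-distributed errors with a fixed, user-chosen degree of freedom; it omits the AR error model and the degree-of-freedom estimation that the paper's algorithm includes. The function name and test data are hypothetical.

```python
import numpy as np

def irls_student_t(X, y, dof=4.0, n_iter=50):
    """IRLS for linear regression with scaled t-distributed errors (fixed dof)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)     # start from OLS
    scale2 = np.mean((y - X @ beta) ** 2)
    for _ in range(n_iter):
        r = y - X @ beta
        # Large standardized residuals receive small weights.
        w = (dof + 1.0) / (dof + r**2 / scale2)
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        scale2 = np.mean(w * (y - X @ beta) ** 2)    # update the squared scale
    return beta, float(np.sqrt(scale2))

# Synthetic line y = 1 + 2x with small noise and a few gross outliers.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 50)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x + 0.05 * rng.normal(size=x.size)
y[::10] += 5.0                                       # contaminate every 10th point

beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_t, scale_t = irls_student_t(X, y)
print("OLS:", beta_ols, "t-IRLS:", beta_t)
```

    The weights shrink automatically for observations far from the current fit, which is why the procedure is described as adaptive and robust.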

  7. Application Of Decision Tree Approach To Student Selection Model- A Case Study

    Science.gov (United States)

    Harwati; Sudiya, Amby

    2016-01-01

    The main purpose of the institution is to provide quality education to its students and to improve the quality of managerial decisions. One way to improve the quality of students is to make the selection of new students more selective. This research takes as its case the selection of new students at Islamic University of Indonesia, Yogyakarta, Indonesia. One of the university's selection routes is an administrative screening based on prospective students' high school records, without a written test. Currently, this kind of selection has no standard model or criteria. Selection is done only by comparing candidates' application files, so the assessment is prone to subjectivity because there are no standard criteria that differentiate the quality of one student from another. By applying data mining classification techniques, a selection model for new students can be built that includes standard criteria such as area of origin, school status, average grade and so on. These criteria are determined using rules derived from classifying the academic achievement (GPA) of students from previous years who entered the university through the same route. The decision tree method with the C4.5 algorithm is used here. The results show that the students given priority for admission are those who meet the following criteria: they come from the island of Java, attended a public school, majored in science, have an average grade above 75, and have at least one achievement during their high school studies.

  8. Stimulating seedling growth in early stages of secondary forest succession: a modeling approach to guide tree liberation.

    Directory of Open Access Journals (Sweden)

    Marijke van Kuijk

    2014-07-01

    Full Text Available Excessive growth of non-woody plants and shrubs on degraded lands can strongly hamper tree growth and thus secondary forest succession. A common method to accelerate succession, called liberation, involves opening up the vegetation canopy around young target trees. This can increase growth of target trees by reducing competition for light with neighboring plants. However, liberation does not always have the desired effect, likely due to differences in light requirement between tree species. Here we present a 3D-model, which calculates photosynthetic rate of individual trees in a vegetation stand. It enables us to examine how stature, crown structure and physiological traits of target trees and characteristics of the surrounding vegetation together determine effects of light on tree growth. The model was applied to a liberation experiment conducted with three pioneer species in a young secondary forest in Vietnam. Species responded differently to the treatment depending on their height, crown structure and their shade-tolerance level. Model simulations revealed practical thresholds over which the tree growth response is heavily influenced by the height and density of surrounding vegetation and gap radius. There were strong correlations between calculated photosynthetic rates and observed growth: the model was well able to predict growth of trees in young forests and the effects of liberation thereupon. Thus our model serves as a useful tool to analyze light competition between young trees and surrounding vegetation and may help assess the potential effect of tree liberation.

  9. Relative risk for HIV in India - An estimate using conditional auto-regressive models with Bayesian approach.

    Science.gov (United States)

    Kandhasamy, Chandrasekaran; Ghosh, Kaushik

    2017-02-01

    Indian states are currently classified into HIV-risk categories based on the observed prevalence counts, percentage of infected attendees in antenatal clinics, and percentage of infected high-risk individuals. This method, however, does not account for the spatial dependence among the states nor does it provide any measure of statistical uncertainty. We provide an alternative model-based approach to address these issues. Our method uses Poisson log-normal models having various conditional autoregressive structures with neighborhood-based and distance-based weight matrices and incorporates all available covariate information. We use R and WinBugs software to fit these models to the 2011 HIV data. Based on the Deviance Information Criterion, the convolution model using distance-based weight matrix and covariate information on female sex workers, literacy rate and intravenous drug users is found to have the best fit. The relative risk of HIV for the various states is estimated using the best model and the states are then classified into the risk categories based on these estimated values. An HIV risk map of India is constructed based on these results. The choice of the final model suggests that an HIV control strategy which focuses on the female sex workers, intravenous drug users and literacy rate would be most effective.

  10. Heritability Estimation using a Regularized Regression Approach (HERRA): Applicable to continuous, dichotomous or age-at-onset outcome.

    Science.gov (United States)

    Gorfine, Malka; Berndt, Sonja I; Chang-Claude, Jenny; Hoffmeister, Michael; Le Marchand, Loic; Potter, John; Slattery, Martha L; Keret, Nir; Peters, Ulrike; Hsu, Li

    2017-01-01

    The popular Genome-wide Complex Trait Analysis (GCTA) software uses the random-effects models for estimating the narrow-sense heritability based on GWAS data of unrelated individuals without knowing and identifying the causal loci. Many methods have since extended this approach to various situations. However, since the proportion of causal loci among the variants is typically very small and GCTA uses all variants to calculate the similarities among individuals, the estimation of heritability may be unstable, resulting in a large variance of the estimates. Moreover, if the causal SNPs are not genotyped, GCTA sometimes greatly underestimates the true heritability. We present a novel narrow-sense heritability estimator, named HERRA, using well-developed ultra-high dimensional machine-learning methods, applicable to continuous or dichotomous outcomes, as other existing methods. Additionally, HERRA is applicable to time-to-event or age-at-onset outcome, which, to our knowledge, no existing method can handle. Compared to GCTA and LDAK for continuous and binary outcomes, HERRA often has a smaller variance, and when causal SNPs are not genotyped, HERRA has a much smaller empirical bias. We applied GCTA, LDAK and HERRA to a large colorectal cancer dataset using dichotomous outcome (4,312 cases, 4,356 controls, genotyped using Illumina 300K); the respective heritability estimates of GCTA, LDAK and HERRA are 0.068 (SE = 0.017), 0.072 (SE = 0.021) and 0.110 (SE = 5.19 × 10⁻³). HERRA yields over 50% increase in heritability estimate compared to GCTA or LDAK.

  11. A bottom-up approach for labeling of human airway trees

    DEFF Research Database (Denmark)

    2011-01-01

    In this paper, an airway labeling algorithm that allows for gaps between the labeled branches is introduced. A bottom-up approach for arriving to an optimal set of branches and their associated labels is used in the proposed method. A K nearest neighbor based appearance model is used...

  12. Predicting equilibrium vapour pressure isotope effects by using artificial neural networks or multi-linear regression - A quantitative structure property relationship approach.

    Science.gov (United States)

    Parinet, Julien; Julien, Maxime; Nun, Pierrick; Robins, Richard J; Remaud, Gerald; Höhener, Patrick

    2015-09-01

    We aim at predicting the effect of structure and isotopic substitutions on the equilibrium vapour pressure isotope effect of various organic compounds (alcohols, acids, alkanes, alkenes and aromatics) at intermediate temperatures. We attempt to explore quantitative structure property relationships by using artificial neural networks (ANN), specifically the multi-layer perceptron (MLP), and compare its performance with that of multi-linear regression (MLR). These approaches are based on the relationship between the molecular structure (organic chain, polar functions, type of functions, type of isotope involved) of the organic compounds, and their equilibrium vapour pressure. A data set of 130 equilibrium vapour pressure isotope effects was used: 112 were used in the training set and the remaining 18 were used for the test/validation dataset. Two sets of descriptors were tested: a set with all the descriptors (number of ¹²C, ¹³C, ¹⁶O, ¹⁸O, ¹H, ²H, OH functions, OD functions, CO functions, Connolly Solvent Accessible Surface Area (CSA) and temperature) and a reduced set of descriptors. The dependent variable (the output) is the natural logarithm of the ratios of vapour pressures (ln R), expressed as light/heavy as in classical literature. Since the database is rather small, the leave-one-out procedure was used to validate both models. Considering higher determination coefficients and lower error values, it is concluded that the multi-layer perceptron provided better results compared to multi-linear regression. The stepwise regression procedure is a useful tool to reduce the number of descriptors. To our knowledge, a Quantitative Structure Property Relationship (QSPR) approach for isotopic studies is novel.

  13. Data mining methods in the prediction of Dementia: A real-data comparison of the accuracy, sensitivity and specificity of linear discriminant analysis, logistic regression, neural networks, support vector machines, classification trees and random forests.

    Science.gov (United States)

    Maroco, João; Silva, Dina; Rodrigues, Ana; Guerreiro, Manuela; Santana, Isabel; de Mendonça, Alexandre

    2011-08-17

    Dementia and cognitive impairment associated with aging are a major medical and social concern. Neuropsychological testing is a key element in the diagnostic procedures of Mild Cognitive Impairment (MCI), but has presently a limited value in the prediction of progression to dementia. We advance the hypothesis that newer statistical classification methods derived from data mining and machine learning methods like Neural Networks, Support Vector Machines and Random Forests can improve accuracy, sensitivity and specificity of predictions obtained from neuropsychological testing. Seven non parametric classifiers derived from data mining methods (Multilayer Perceptrons Neural Networks, Radial Basis Function Neural Networks, Support Vector Machines, CART, CHAID and QUEST Classification Trees and Random Forests) were compared to three traditional classifiers (Linear Discriminant Analysis, Quadratic Discriminant Analysis and Logistic Regression) in terms of overall classification accuracy, specificity, sensitivity, Area under the ROC curve and Press'Q. Model predictors were 10 neuropsychological tests currently used in the diagnosis of dementia. Statistical distributions of classification parameters obtained from a 5-fold cross-validation were compared using the Friedman's nonparametric test. Press' Q test showed that all classifiers performed better than chance alone (p Machines showed the larger overall classification accuracy (Median (Me) = 0.76) an area under the ROC (Me = 0.90). However this method showed high specificity (Me = 1.0) but low sensitivity (Me = 0.3). Random Forest ranked second in overall accuracy (Me = 0.73) with high area under the ROC (Me = 0.73) specificity (Me = 0.73) and sensitivity (Me = 0.64). Linear Discriminant Analysis also showed acceptable overall accuracy (Me = 0.66), with acceptable area under the ROC (Me = 0.72) specificity (Me = 0.66) and sensitivity (Me = 0.64). The remaining classifiers showed overall classification accuracy above a
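
    The comparison above rests on accuracy, sensitivity and specificity, which can pull in different directions. The sketch below computes them from binary labels; the label vectors are invented to mimic a classifier with perfect specificity but low sensitivity and are not the study's data.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),   # share of true positives detected
        "specificity": tn / (tn + fp),   # share of true negatives detected
    }

# Hypothetical sample: 10 converters (1) and 30 non-converters (0).
y_true = [1] * 10 + [0] * 30
# A cautious classifier: flags only 3 of the 10 converters, never a non-converter.
y_pred = [1] * 3 + [0] * 7 + [0] * 30

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

    In this made-up case the overall accuracy looks respectable even though most positive cases are missed, which is why sensitivity is reported alongside accuracy.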

  14. Using Regression to Establish Weights for a Set of Composite Equations through a Numerical Analysis Approach: A Case of Admission Criteria to a College

    Directory of Open Access Journals (Sweden)

    Ramzi N. Nasser

    2010-01-01

    Full Text Available Problem statement: Mathematically, little is known about college admission criteria, such as school grade point average, admission test scores or rank in class, and about how these criteria are weighted in a composite equation. Approach: This study presented a method for obtaining the weights of a “composite admission” equation. The method uses an iterative procedure to build a prediction equation for an optimally weighted admission composite score. The three predictor variables (high school average, entrance exam scores and rank in class) were regressed on college Grade Point Average (GPA). The weights for the composite equation were determined through regression coefficients and a numerical approach that correlates the composite score with college GPA. Results: A set of composite equations was determined, with weights on each criterion in a composite equation. Conclusion: This study detailed a substantiated algorithm and, based on an optimal composite score, produced an original and uniquely structured composite score equation for admissions, which can be used by admission officers at colleges and universities.
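
    The abstract above does not spell out its iterative procedure, so the following is only a simplified sketch of the general idea: regress college GPA on the three criteria by ordinary least squares, rescale the slope coefficients into weights that sum to 1, and check how well the resulting composite score correlates with GPA. The function names, the synthetic data and the "true" weights are all assumptions made for illustration, not values from the study.

```python
import numpy as np


def composite_weights(predictors, college_gpa):
    """Regress college GPA on the predictors (OLS with intercept) and
    rescale the slope coefficients so that they sum to 1."""
    predictors = np.asarray(predictors, dtype=float)
    college_gpa = np.asarray(college_gpa, dtype=float)
    design = np.column_stack([np.ones(len(predictors)), predictors])
    coef, *_ = np.linalg.lstsq(design, college_gpa, rcond=None)
    slopes = coef[1:]  # drop the intercept
    return slopes / slopes.sum()  # assumes the slopes do not sum to zero


def composite_score(predictors, weights):
    """Weighted sum of the admission criteria for each applicant."""
    return np.asarray(predictors, dtype=float) @ np.asarray(weights)


# Synthetic, made-up data: 200 applicants, three criteria on a 0-100 scale.
rng = np.random.default_rng(0)
criteria = rng.uniform(0, 100, size=(200, 3))
true_weights = np.array([0.5, 0.3, 0.2])  # hypothetical, for the demo only
gpa = criteria @ true_weights / 25 + rng.normal(0, 0.05, size=200)

weights = composite_weights(criteria, gpa)
scores = composite_score(criteria, weights)
print("estimated weights:", weights)
print("correlation with GPA:", np.corrcoef(scores, gpa)[0, 1])
```

    Rescaling the slopes to sum to 1 is one possible convention for turning regression coefficients into composite weights; it only makes sense when the criteria share a common scale, as they do in this synthetic example.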

  15. A NEW FRAMEWORK FOR PREDICTING THE IMPACT OF TRAFFIC ON THE PERFORMANCE OF MOBILE AD-HOC NETWORK (MANET): USING REGRESSION AS DATA MINING APPROACH

    Directory of Open Access Journals (Sweden)

    Kamal Moh’d Alhendawi

    2017-02-01

    Full Text Available With the rapid technological advances in wireless communication and the increasing usage of portable computing devices, mobile ad hoc networks are expected to be developed further to enhance the flexibility, scalability and efficiency of communication technology. A wireless ad hoc network is a collection of mobile nodes that can connect to each other without backbone infrastructure (i.e., it is infrastructure-less). Although many studies have assessed the performance of MANET routing protocols, there is a need to investigate the impact of traffic load on the performance of MANET in order to justify the use of a given routing protocol. This study is one of the few that aims at proposing a new framework for predicting and validating the results of future scenarios using data mining techniques. Regression analysis is used as the data mining method for predicting the future scenarios. In practice, two experiments with eight scenarios were conducted. The findings indicate that network size and traffic load are proportionally related to throughput. However, the findings show that network size is inversely related to delay in the case of medium FTP traffic, and proportionally related in the case of high FTP traffic. The results also indicate that the data mining approach, especially regression, is effective for predicting future network behavior.

  16. A data driven approach for condition monitoring of wind turbine blade using vibration signals through best-first tree algorithm and functional trees algorithm: A comparative study.

    Science.gov (United States)

    Joshuva, A; Sugumaran, V

    2017-03-01

    Wind energy is one of the important renewable energy resources available in nature. It is one of the major resources for production of energy because of its dependability due to the development of the technology and relatively low cost. Wind energy is converted into electrical energy using rotating blades. Due to environmental conditions and large structure, the blades are subjected to various vibration forces that may cause damage to the blades. This leads to a liability in energy production and turbine shutdown. The downtime can be reduced when the blades are diagnosed continuously using structural health condition monitoring. These are considered as a pattern recognition problem which consists of three phases namely, feature extraction, feature selection, and feature classification. In this study, statistical features were extracted from vibration signals, feature selection was carried out using a J48 decision tree algorithm and feature classification was performed using best-first tree algorithm and functional trees algorithm. The better algorithm is suggested for fault diagnosis of wind turbine blade. Copyright © 2017 ISA. Published by Elsevier Ltd. All rights reserved.

  17. Tailored Approach in Inguinal Hernia Repair – Decision Tree Based on the Guidelines

    OpenAIRE

    2014-01-01

    The endoscopic procedures TEP and TAPP and the open techniques Lichtenstein, Plug and Patch, and PHS currently represent the gold standard in inguinal hernia repair recommended in the guidelines of the European Hernia Society, the International Endohernia Society, and the European Association of Endoscopic Surgery. Eighty-two percent of experienced hernia surgeons use the “tailored approach,” the differentiated use of the several inguinal hernia repair techniques depending on the findings of ...

  18. Programming macro tree transducers

    DEFF Research Database (Denmark)

    Bahr, Patrick; Day, Laurence E.

    2013-01-01

    A tree transducer is a set of mutually recursive functions transforming an input tree into an output tree. Macro tree transducers extend this recursion scheme by allowing each function to be defined in terms of an arbitrary number of accumulation parameters. In this paper, we show how macro tree transducers can be concisely represented in Haskell, and demonstrate the benefits of utilising such an approach with a number of examples. In particular, tree transducers afford a modular programming style as they can be easily composed and manipulated. Our Haskell representation generalises the original definition of (macro) tree transducers, abolishing a restriction on finite state spaces. However, as we demonstrate, this generalisation does not affect compositionality.

  20. Evaluating risk factors for endemic human Salmonella Enteritidis infections with different phage types in Ontario, Canada using multinomial logistic regression and a case-case study approach

    Directory of Open Access Journals (Sweden)

    Varga Csaba

    2012-10-01

    Full Text Available Background: Identifying risk factors for Salmonella Enteritidis (SE) infections in Ontario will assist public health authorities in designing effective control and prevention programs to reduce the burden of SE infections. Our research objective was to identify risk factors for acquiring SE infections with various phage types (PT) in Ontario, Canada. We hypothesized that certain PTs (e.g., PT8 and PT13a) have specific risk factors for infection. Methods: Our study included endemic SE cases with various PTs whose isolates were submitted to the Public Health Laboratory-Toronto from January 20th to August 12th, 2011. Cases were interviewed using a standardized questionnaire that included questions pertaining to demographics, travel history, clinical symptoms, contact with animals, and food exposures. A multinomial logistic regression method using the Generalized Linear Latent and Mixed Model procedure and a case-case study design were used to identify risk factors for acquiring SE infections with various PTs in Ontario, Canada. In the multinomial logistic regression model, the outcome variable had three categories representing human infections caused by SE PT8, PT13a, and all other SE PTs (i.e., non-PT8/non-PT13a), the last serving as a referent category to which the other two categories were compared. Results: In the multivariable model, SE PT8 was positively associated with contact with dogs (OR=2.17, 95% CI 1.01-4.68) and negatively associated with pepper consumption (OR=0.35, 95% CI 0.13-0.94), after adjusting for age categories and gender, and using exposure periods and health regions as random effects to account for clustering. Conclusions: Our study findings offer interesting hypotheses about the role of phage type-specific risk factors. Multinomial logistic regression analysis and the case-case study approach are novel methodologies to evaluate associations among SE infections with different PTs and various risk factors.
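
    As a reading aid for the odds ratios quoted above: in a logistic model an odds ratio is the exponential of a coefficient, and a Wald-type 95% interval is symmetric on the log scale. The abstract does not say how its intervals were computed, so the Wald form is an assumption here, and the helper names are hypothetical. The sketch recovers a log-odds coefficient and standard error from a reported interval and maps them back to the odds-ratio scale.

```python
import math


def odds_ratio_with_ci(beta, se, z=1.96):
    """Odds ratio and Wald-type confidence bounds from a log-odds
    coefficient and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def beta_and_se_from_ci(lower, upper, z=1.96):
    """Invert a Wald-type interval: the coefficient is the midpoint of
    the log bounds, the standard error is their half-width divided by z."""
    beta = (math.log(lower) + math.log(upper)) / 2
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se


# Reported interval for contact with dogs: 95% CI 1.01-4.68 (OR = 2.17).
beta, se = beta_and_se_from_ci(1.01, 4.68)
or_dogs, lower, upper = odds_ratio_with_ci(beta, se)
print(f"beta = {beta:.3f}, SE = {se:.3f}, OR = {or_dogs:.2f}")
```

    Under this assumption the point estimate is the geometric mean of the interval bounds, which is a quick consistency check on any reported odds ratio and interval.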

  1. An improved approach for measuring the impact of multiple CO2 conductances on the apparent photorespiratory CO2 compensation point through slope-intercept regression.

    Science.gov (United States)

    Walker, Berkley J; Skabelund, Dane C; Busch, Florian A; Ort, Donald R

    2016-06-01

    Biochemical models of leaf photosynthesis, which are essential for understanding the response of photosynthesis to changing environments, depend on accurate parameterizations. One such parameter, the photorespiratory CO2 compensation point, can be measured from the intersection of several CO2 response curves measured under sub-saturating illumination. However, determining the actual intersection while accounting for experimental noise can be challenging. Additionally, leaf photosynthesis model outcomes are sensitive to the diffusion paths of CO2 released from the mitochondria. This diffusion path of CO2 includes both chloroplastic and cell wall resistances to CO2, which are not readily measurable. Both the difficulty of determining the photorespiratory CO2 compensation point and the impact of multiple intercellular resistances to CO2 can be addressed through application of slope-intercept regression. This technical report summarizes an improved framework for implementing slope-intercept regression to evaluate measurements of the photorespiratory CO2 compensation point. This approach extends past work to include the cases of both Rubisco- and Ribulose-1,5-bisphosphate (RuBP)-limited photosynthesis. This report further presents two interactive graphical applications and a spreadsheet-based tool to allow users to apply slope-intercept theory to their data.
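
    The geometric idea behind slope-intercept regression can be sketched briefly. If each CO2 response curve is approximated by a line A = m * Ci + b, and all lines pass through one common point (Ci*, -Rd), then b = -Ci* * m - Rd, so regressing the intercepts on the slopes yields the common intersection without locating it graphically. The code below is a minimal illustration of that step only; it does not include the conductance corrections or the RuBP-limited case described in the abstract, and the names `ci_star` and `rd` and the synthetic curves are assumptions.

```python
import numpy as np


def slope_intercept_regression(ci_curves, a_curves):
    """Fit a line to each CO2 response curve, then regress the fitted
    intercepts on the fitted slopes. Returns (ci_star, rd), the
    coordinates (ci_star, -rd) of the common intersection."""
    slopes, intercepts = [], []
    for ci, a in zip(ci_curves, a_curves):
        m, b = np.polyfit(ci, a, 1)  # A = m * Ci + b for one curve
        slopes.append(m)
        intercepts.append(b)
    k, c = np.polyfit(slopes, intercepts, 1)  # b = k * m + c
    return -k, -c


# Synthetic curves (made-up numbers) that all cross at Ci = 40, A = -1.
ci = np.linspace(30.0, 100.0, 8)
curves_ci = [ci, ci, ci]
curves_a = [m * (ci - 40.0) - 1.0 for m in (0.05, 0.10, 0.15)]

ci_star, rd = slope_intercept_regression(curves_ci, curves_a)
print(f"Ci* = {ci_star:.2f}, Rd = {rd:.2f}")
```

    With noisy data the per-curve fits no longer cross at a single point, and the second regression acts as an average over the pairwise intersections.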

  2. A Multi-way Multi-task Learning Approach for Multinomial Logistic Regression*. An Application in Joint Prediction of Appointment Miss-opportunities across Multiple Clinics.

    Science.gov (United States)

    Alaeddini, Adel; Hong, Seung Hee

    2017-08-11

    Whether they have been engineered for it or not, most healthcare systems experience a variety of unexpected events, such as appointment miss-opportunities, that can have a significant impact on their revenue, cost and resource utilization. In this paper, a multi-way multi-task learning model based on multinomial logistic regression is proposed to jointly predict the occurrence of different types of miss-opportunities at multiple clinics. An extension of L1 / L2 regularization is proposed to enable transfer of information among various types of miss-opportunities as well as different clinics. A proximal algorithm is developed to transform the convex but non-smooth likelihood function of the multi-way multi-task learning model into a convex and smooth optimization problem solvable using a gradient descent algorithm. A dataset of real attendance records of patients at four different clinics of a VA medical center is used to verify the performance of the proposed multi-task learning approach. Additionally, a simulation study investigating more general data situations is provided to highlight the specific aspects of the proposed approach. Various individual and integrated multinomial logistic regression models with and without a LASSO penalty, along with a number of other common classification algorithms, are fitted and compared against the proposed multi-way multi-task learning approach. Fivefold cross-validation is used to estimate the compared models' parameters and their predictive accuracy. The multi-way multi-task learning framework enables the proposed approach to achieve a considerable rate of parameter shrinkage and superior prediction accuracy across various types of miss-opportunities and clinics. The proposed approach provides an integrated structure to effectively transfer knowledge among different miss-opportunities and clinics to reduce model size, increase estimation efficacy and, more importantly, improve prediction results. The proposed framework can be

  3. Cytotoxicity towards CCO cells of imidazolium ionic liquids with functionalized side chains: preliminary QSTR modeling using regression and classification based approaches.

    Science.gov (United States)

    Bubalo, Marina Cvjetko; Radošević, Kristina; Srček, Višnja Gaurina; Das, Rudra Narayan; Popelier, Paul; Roy, Kunal

    2015-02-01

    Within this work we evaluated the cytotoxicity towards the Channel Catfish Ovary (CCO) cell line of some imidazolium-based ionic liquids containing different functionalized and unsaturated side chains. The toxic effects were measured by the reduction of the WST-1 dye after 72 h exposure resulting in dose- and structure-dependent toxicities. The obtained data on cytotoxic effects of 14 different imidazolium ionic liquids in CCO cells, expressed as EC50 values, were used in a preliminary quantitative structure-toxicity relationship (QSTR) study employing regression- and classification-based approaches. The toxicity of ILs towards CCO was chiefly related to the shape and hydrophobicity parameters of cations. A significant influence of the quantum topological molecular similarity descriptor ellipticity (ε) of the imine bond was also observed.

  4. LIFE CLIMATREE project: A novel approach for accounting and monitoring carbon sequestration of tree crops and their potential as carbon sink areas

    Science.gov (United States)

    Stergiou, John; Tagaris, Efthimios; Sotiropoulou, Rafaella-Eleni

    2016-04-01

    Climate Change Mitigation is one of the most important objectives of the Kyoto Convention, and is mostly oriented towards reducing GHG emissions. However, carbon sink is retained only in the calculation of the forests' capacity, since agricultural land and farmers' practices for securing carbon stored in soils have not been recognized in GHG accounting, possibly resulting in incorrect estimations of the carbon dioxide balance in the atmosphere. The agricultural sector, which is a key sector in the EU, has had a consistent strategic framework since 1954, in the form of the Common Agricultural Policy (CAP). In its latest reform of 2013 (reg. (EU) 1305/13), CAP recognized the significance of agriculture as a key player in Climate Change policy. In order to fill this gap, the "LIFE ClimaTree" project has recently been funded by the European Commission, aiming to provide a novel method for including tree crop cultivations in the LULUCF accounting rules for GHG emissions and removals. In the framework of the "LIFE ClimaTree" project, the carbon sink within the EU, including the calculated tree crop capacity, will be estimated for both current and future (2050s) climatic conditions using the GISS-WRF modeling system at a very fine scale (i.e., 9 km x 9 km) under the RCP8.5 and RCP4.5 climate scenarios. Acknowledgement: LIFE CLIMATREE project "A novel approach for accounting and monitoring carbon sequestration of tree crops and their potential as carbon sink areas" (LIFE14 CCM/GR/000635).

  5. Delineation of seismic source zones based on seismicity parameters and probabilistic evaluation of seismic hazard using logic tree approach

    Indian Academy of Sciences (India)

    K S Vipin; T G Sitharam

    2013-06-01

    The delineation of seismic source zones plays an important role in the evaluation of seismic hazard. In most of the studies the seismic source delineation is done based on geological features. In the present study, an attempt has been made to delineate seismic source zones in the study area (south India) based on the seismicity parameters. Seismicity parameters and the maximum probable earthquake for these source zones were evaluated and were used in the hazard evaluation. The probabilistic evaluation of seismic hazard for south India was carried out using a logic tree approach. Two different types of seismic sources, linear and areal, were considered in the present study to model the seismic sources in the region more precisely. In order to properly account for the attenuation characteristics of the region, three different attenuation relations were used with different weightage factors. Seismic hazard evaluation was done for the probability of exceedance (PE) of 10% and 2% in 50 years. The spatial variation of rock level peak horizontal acceleration (PHA) and spectral acceleration (Sa) values corresponding to return periods of 475 and 2500 years for the entire study area are presented in this work. The peak ground acceleration (PGA) values at ground surface level were estimated based on different NEHRP site classes by considering local site effects.
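
    The pairing of exceedance probabilities with return periods in the abstract above follows from the usual assumption that earthquake occurrences are Poisson distributed in time, so that the probability p of at least one exceedance in t years is p = 1 - exp(-t / T) for return period T. The abstract does not state this assumption explicitly, and the function name below is hypothetical; the sketch simply solves the relation for T.

```python
import math


def return_period(prob_exceedance, exposure_years):
    """Return period T (years) such that the probability of at least one
    exceedance in `exposure_years` equals `prob_exceedance`, assuming
    Poisson occurrences: p = 1 - exp(-t / T)."""
    return -exposure_years / math.log(1.0 - prob_exceedance)


for p in (0.10, 0.02):
    print(f"PE = {p:.0%} in 50 years -> T = {return_period(p, 50):.0f} years")
```

    This gives about 475 years for 10% in 50 years and about 2475 years for 2% in 50 years, which the abstract rounds to 2500.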

  6. An Inventory of Tree and Stand Growth Empirical Modelling Approaches with Potential Application in Coppice Forestry (a Review)

    Directory of Open Access Journals (Sweden)

    Michal Kneifl

    2015-01-01

    Full Text Available We examined currently available empirical growth models that could potentially be applied to coppice growth and production modelling. We compiled a summary of empirical models applied in coppices, high forests and fast-growing tree plantations, including coppice plantations. The collected growth models were analysed to find out whether they encompassed any of the 13 key dendrometric and structural variables that we identified as characteristic of coppices. There is currently no complex growth model available for coppices in Europe. Furthermore, many aspects of the coppice growth process have been ignored or omitted in the most common modelling approaches so far. Within-stool competition, mortality and stool morphological variability are the most important parameters. However, some individual empirical submodels or their parts are potentially applicable to coppice growth and production modelling (e.g., a diameter increment model or a model of resprouting probability). As the issue of coppice management gains attention, the need for a decision support tool (e.g., a coppice growth simulator) becomes more pressing.

  7. Tree-based genetic programming approach to infer microphysical parameters of the DSDs from the polarization diversity measurements

    Science.gov (United States)

    Islam, Tanvir; Rico-Ramirez, Miguel A.; Han, Dawei

    2012-11-01

    The use of polarization diversity measurements to infer the microphysical parametrization has remained an active goal in the radar remote sensing community. In view of this, tree-based genetic programming (GP) is presented as a novel approach for retrieving the governing microphysical parameters of a normalized gamma drop size distribution model, namely D0 (median drop diameter), Nw (concentration parameter), and μ (shape parameter), from the polarization diversity measurements. A large number of raindrop spectra acquired from a Joss-Waldvogel disdrometer were utilized to develop the GP models, relating the microphysical parameters to the T-matrix scattering simulated polarization measurements. Several functional formulations for retrieving the microphysical parameters, D0 [f(ZDR), f(ZH, ZDR)], log10Nw [f(ZH, D0), f(ZH, ZDR, D0)], and μ [f(ZDR, D0), f(ZH, ZDR, D0)], where ZH represents reflectivity and ZDR represents differential reflectivity, were investigated and applied to an S-band polarimetric radar (CAMRA) for evaluation. It is shown that the microphysical parameters retrieved by the GP models from the polarization measurements are in reasonable agreement with disdrometer observations. The calculated root mean squared errors (RMSE) are 0.23-0.25 mm for D0, 0.74-0.85 for log10Nw (Nw in mm-1 mm-3), and 3.30-3.36 for μ. The GP model based microphysical retrieval procedure is further compared with a physically based constrained gamma model for D0 and log10Nw estimates. The close agreement of the retrieval results between the GP and the constrained gamma models supports the suitability of the proposed genetic programming approach for inferring microphysical parameterization.

  8. Accounting for selection bias in species distribution models: An econometric approach on forested trees based on structural modeling

    Science.gov (United States)

    Ay, Jean-Sauveur; Guillemot, Joannès; Martin-StPaul, Nicolas K.; Doyen, Luc; Leadley, Paul

    2015-04-01

    Species distribution models (SDMs) are widely used to study and predict the outcome of global change on species. In human dominated ecosystems the presence of a given species is the result of both its ecological suitability and human footprint on nature such as land use choices. Land use choices may thus be responsible for a selection bias in the presence/absence data used in SDM calibration. We present a structural modelling approach (i.e. based on structural equation modelling) that accounts for this selection bias. The new structural species distribution model (SSDM) estimates simultaneously land use choices and species responses to bioclimatic variables. A land use equation based on an econometric model of landowner choices was joined to an equation of species response to bioclimatic variables. SSDM allows the residuals of both equations to