WorldWideScience

Sample records for multivariate random forests

  1. A Copula Based Approach for Design of Multivariate Random Forests for Drug Sensitivity Prediction.

    Science.gov (United States)

    Haider, Saad; Rahman, Raziur; Ghosh, Souparno; Pal, Ranadip

    2015-01-01

    Modeling sensitivity to drugs based on genetic characterizations is a significant challenge in the area of systems medicine. Ensemble-based approaches such as Random Forests have been shown to perform well in both individual sensitivity prediction studies and team-science-based prediction challenges. However, Random Forests generate a deterministic predictive model for each drug based on the genetic characterization of the cell lines and ignore the relationships between different drug sensitivities during model generation. This application motivates the development of multivariate ensemble learning techniques that can increase prediction accuracy and improve variable importance ranking by incorporating the relationships between different output responses. In this article, we propose a novel cost criterion that captures the dissimilarity in output response structure between the training data and the node samples as the difference between the two empirical copulas. We illustrate that copulas are suitable for capturing the multivariate structure of output responses independently of the marginal distributions, and that the copula-based multivariate random forest framework can provide more accurate predictions and improved variable selection. The proposed framework has been validated on the Genomics of Drug Sensitivity in Cancer and Cancer Cell Line Encyclopedia databases.
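
    A minimal sketch of comparing empirical copulas, written under our own assumptions: each output column is rank-transformed (so the marginal distributions drop out), each empirical copula is evaluated on a regular grid, and the squared differences are summed. The grid, the aggregation and the function names are illustrative choices, not necessarily the cost criterion defined in the article.

```python
from itertools import product


def pseudo_observations(samples):
    """Rank-transform each column of `samples` (list of equal-length rows) into (0, 1]."""
    n = len(samples)
    d = len(samples[0])
    pseudo = [[0.0] * d for _ in range(n)]
    for j in range(d):
        # Order row indices by the j-th response; ties keep their original order.
        order = sorted(range(n), key=lambda i: samples[i][j])
        for rank, i in enumerate(order, start=1):
            pseudo[i][j] = rank / n
    return pseudo


def empirical_copula(pseudo, u):
    """C_n(u): fraction of pseudo-observations that are <= u in every coordinate."""
    hits = sum(all(p_j <= u_j for p_j, u_j in zip(p, u)) for p in pseudo)
    return hits / len(pseudo)


def copula_distance(train_outputs, node_outputs, grid_size=5):
    """Sum of squared differences between two empirical copulas on a regular grid."""
    d = len(train_outputs[0])
    grid = [k / grid_size for k in range(1, grid_size + 1)]
    p_train = pseudo_observations(train_outputs)
    p_node = pseudo_observations(node_outputs)
    return sum(
        (empirical_copula(p_train, u) - empirical_copula(p_node, u)) ** 2
        for u in product(grid, repeat=d)
    )
```

    Because only ranks enter the computation, a strictly increasing transformation of any response column leaves the distance unchanged, which is the margin-independence property the abstract appeals to.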

  2. Multivariate stochastic simulation with subjective multivariate normal distributions

    Science.gov (United States)

    P. J. Ince; J. Buongiorno

    1991-01-01

    In many applications of Monte Carlo simulation in forestry or forest products, it may be known that some variables are correlated. However, for simplicity, most simulations have assumed that random variables are independently distributed. This report describes an alternative Monte Carlo simulation technique for subjectively assessed multivariate normal...
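
    The snippet does not describe the report's procedure for subjectively assessed distributions. As a hedged illustration of the general idea only, the standard way to draw correlated normal variables in a Monte Carlo simulation is to multiply independent standard normal draws by the Cholesky factor of the covariance matrix. The function names below are our own.

```python
import math
import random


def cholesky(cov):
    """Lower-triangular L with L L^T == cov (cov must be symmetric positive definite)."""
    n = len(cov)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(cov[i][i] - s)
            else:
                L[i][j] = (cov[i][j] - s) / L[j][j]
    return L


def sample_mvn(mean, cov, n_draws, seed=0):
    """Draw correlated normal vectors as mean + L z, with z independent standard normals."""
    rng = random.Random(seed)
    L = cholesky(cov)
    d = len(mean)
    draws = []
    for _ in range(n_draws):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # L is lower triangular, so row i only needs z[0..i].
        draws.append([mean[i] + sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(d)])
    return draws
```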

  3. A tale of two "forests": random forest machine learning aids tropical forest carbon mapping.

    Science.gov (United States)

    Mascaro, Joseph; Asner, Gregory P; Knapp, David E; Kennedy-Bowdoin, Ty; Martin, Roberta E; Anderson, Christopher; Higgins, Mark; Chadwick, K Dana

    2014-01-01

    Accurate and spatially explicit maps of tropical forest carbon stocks are needed to implement carbon offset mechanisms such as REDD+ (Reduced Deforestation and Degradation Plus). The Random Forest machine learning algorithm may aid carbon mapping applications using remotely sensed data. However, Random Forest has never been compared to traditional and potentially more reliable techniques such as regionally stratified sampling and upscaling, and it has rarely been employed with spatial data. Here, we evaluated the performance of Random Forest in upscaling airborne LiDAR (Light Detection and Ranging)-based carbon estimates compared to the stratification approach over a 16-million hectare focal area of the Western Amazon. We considered two runs of Random Forest, with and without spatial contextual modeling, where spatial context was modeled by including x and y position directly in the model. In each case, we set aside 8 million hectares (i.e., half of the focal area) for validation; this rigorous test of Random Forest went above and beyond the internal validation normally computed by the algorithm (the so-called "out-of-bag" estimate), which proved insufficient for this spatial application. In this heterogeneous region of Northern Peru, the model with spatial context was the best-performing run of Random Forest and explained 59% of the variation in LiDAR-based carbon estimates within the validation area, compared to 37% for stratification and 43% for Random Forest without spatial context. With this 60% improvement in explained variation, RMSE against validation LiDAR samples improved from 33 to 26 Mg C ha⁻¹ when using Random Forest with spatial context. Our results suggest that spatial context should be considered when using Random Forest, and that doing so may result in substantially improved carbon stock modeling for purposes of climate change mitigation.
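
    The comparison above rests on two held-out validation scores: explained variation (R²) and RMSE. A minimal sketch of both, assuming plain lists of observed and predicted carbon values; the function name is ours. The "60% improvement" appears to be the relative gain of 59% over the 37% achieved by stratification; that reading is our interpretation of the abstract.

```python
import math


def validation_scores(observed, predicted):
    """Return (R^2, RMSE) of predictions against held-out observations."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / n)


# Relative gain in explained variation: 59% (Random Forest with spatial
# context) versus 37% (stratification), i.e. (0.59 - 0.37) / 0.37.
relative_gain = (0.59 - 0.37) / 0.37  # about 0.595, roughly 60%
```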

  4. A tale of two "forests": random forest machine learning aids tropical forest carbon mapping.

    Directory of Open Access Journals (Sweden)

    Joseph Mascaro

    Accurate and spatially explicit maps of tropical forest carbon stocks are needed to implement carbon offset mechanisms such as REDD+ (Reduced Deforestation and Degradation Plus). The Random Forest machine learning algorithm may aid carbon mapping applications using remotely sensed data. However, Random Forest has never been compared to traditional and potentially more reliable techniques such as regionally stratified sampling and upscaling, and it has rarely been employed with spatial data. Here, we evaluated the performance of Random Forest in upscaling airborne LiDAR (Light Detection and Ranging)-based carbon estimates compared to the stratification approach over a 16-million hectare focal area of the Western Amazon. We considered two runs of Random Forest, with and without spatial contextual modeling, where spatial context was modeled by including x and y position directly in the model. In each case, we set aside 8 million hectares (i.e., half of the focal area) for validation; this rigorous test of Random Forest went above and beyond the internal validation normally computed by the algorithm (the so-called "out-of-bag" estimate), which proved insufficient for this spatial application. In this heterogeneous region of Northern Peru, the model with spatial context was the best-performing run of Random Forest and explained 59% of the variation in LiDAR-based carbon estimates within the validation area, compared to 37% for stratification and 43% for Random Forest without spatial context. With this 60% improvement in explained variation, RMSE against validation LiDAR samples improved from 33 to 26 Mg C ha⁻¹ when using Random Forest with spatial context. Our results suggest that spatial context should be considered when using Random Forest, and that doing so may result in substantially improved carbon stock modeling for purposes of climate change mitigation.

  5. Analysis, Simulation and Prediction of Multivariate Random Fields with Package RandomFields

    Directory of Open Access Journals (Sweden)

    Martin Schlather

    2015-02-01

    Modeling of, and inference on, multivariate data that have been measured in space, such as temperature and pressure, are challenging tasks in environmental sciences, physics and materials science. We give an overview of, and some background on, modeling with cross-covariance models. The R package RandomFields supports simulation, parameter estimation and prediction, in particular for the linear model of coregionalization, the multivariate Matérn models, the delay model, and a spectrum of physically motivated vector-valued models. An example using weather data illustrates the use of RandomFields for parameter estimation and prediction.
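
    The linear model of coregionalization mentioned above builds a multivariate field as a linear combination of independent univariate fields, so its cross-covariance matrix at distance h is C(h) = sum_r a_r a_r^T rho_r(h). The plain-Python sketch below only evaluates that matrix; it does not use or reproduce the RandomFields API, and the exponential correlation (the Matérn model with smoothness 1/2) and all names are our assumptions.

```python
import math


def exponential_corr(h, scale):
    """Exponential correlation rho(h) = exp(-h / scale)."""
    return math.exp(-h / scale)


def lmc_cross_covariance(h, coefficients, scales):
    """Cross-covariance C(h) = sum_r a_r a_r^T rho_r(h) of a linear model of coregionalization.

    coefficients[r] is the loading vector a_r of the r-th independent latent field;
    scales[r] is the range parameter of that field's exponential correlation.
    """
    p = len(coefficients[0])
    C = [[0.0] * p for _ in range(p)]
    for a, scale in zip(coefficients, scales):
        rho = exponential_corr(h, scale)
        for i in range(p):
            for j in range(p):
                C[i][j] += a[i] * a[j] * rho
    return C
```

    At h = 0 every correlation equals 1, so C(0) reduces to the sum of the outer products a_r a_r^T, which is symmetric and positive semidefinite by construction.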

  6. Evolving Random Forest for Preference Learning

    DEFF Research Database (Denmark)

    Abou-Zleikha, Mohamed; Shaker, Noor

    2015-01-01

    This paper introduces a novel approach for pairwise preference learning through a combination of an evolutionary method and random forest. Grammatical evolution is used to describe the structure of the trees in the Random Forest (RF) and to handle the process of evolution. Evolved random forests ...... obtained for predicting pairwise self-reports of users for the three emotional states engagement, frustration and challenge show very promising results that are comparable and in some cases superior to those obtained from state-of-the-art methods....

  7. Identification by random forest method of HLA class I amino acid substitutions associated with lower survival at day 100 in unrelated donor hematopoietic cell transplantation.

    Science.gov (United States)

    Marino, S R; Lin, S; Maiers, M; Haagenson, M; Spellman, S; Klein, J P; Binkowski, T A; Lee, S J; van Besien, K

    2012-02-01

    The identification of important amino acid substitutions associated with low survival in hematopoietic cell transplantation (HCT) is hampered by the large number of observed substitutions compared with the small number of patients available for analysis. Random forest analysis is designed to address these limitations. We studied 2107 HCT recipients with good- or intermediate-risk hematological malignancies to identify HLA class I amino acid substitutions associated with reduced survival at day 100 post-transplant. Random forest analysis and traditional univariate and multivariate analyses were used. Random forest analysis identified amino acid substitutions in 33 positions that were associated with reduced 100-day survival, including HLA-A 9, 43, 62, 63, 76, 77, 95, 97, 114, 116, 152, 156, 166 and 167; HLA-B 97, 109, 116 and 156; and HLA-C 6, 9, 11, 14, 21, 66, 77, 80, 95, 97, 99, 116, 156, 163 and 173. In all, 13 of these had been previously reported by other investigators using classical biostatistical approaches. Using the same data set, traditional multivariate logistic regression identified only five amino acid substitutions associated with lower day 100 survival. Random forest analysis is a novel statistical methodology for the analysis of HLA mismatching and outcome studies, capable of identifying important amino acid substitutions missed by other methods.

  8. Supervised learning using random forests

    OpenAIRE

    Molero del Río, María Cristina

    2017-01-01

    Many real-life problems can be modeled as classification problems, such as the early detection of diseases or the granting of credit to a given individual. Supervised Classification deals with this type of problem: it learns from a sample with the ultimate goal of inferring future observations. Today there is a wide range of Supervised Classification techniques. In this work we focus on random forests (Random Forests). The Random Forests e...

  9. Mapping Distinct Forest Types Improves Overall Forest Identification Based on Multi-Spectral Landsat Imagery for Myanmar’s Tanintharyi Region

    Directory of Open Access Journals (Sweden)

    Grant Connette

    2016-10-01

    We investigated the use of multi-spectral Landsat OLI imagery for delineating mangrove, lowland evergreen, upland evergreen and mixed deciduous forest types in Myanmar’s Tanintharyi Region and estimated the extent of degraded forest for each unique forest type. We mapped a total of 16 natural and human land use classes using both a Random Forest algorithm and a multivariate Gaussian model while considering scenarios with all natural forest classes grouped into a single intact or degraded category. Overall, classification accuracy increased for the multivariate Gaussian model when intact and degraded forest were partitioned into separate forest cover classes, but decreased slightly for the Random Forest classifier. Natural forest cover was estimated to be 80.7% of the total area in Tanintharyi. The most prevalent forest types are upland evergreen forest (42.3% of area) and lowland evergreen forest (21.6%). However, while just 27.1% of upland evergreen forest was classified as degraded (on the basis of canopy cover <80%), 66.0% of mangrove forest and 47.5% of the region’s biologically rich lowland evergreen forest were classified as degraded. This information on the current status of Tanintharyi’s unique forest ecosystems and patterns of human land use is critical to effective conservation strategies and land-use planning.

  10. Visible and near infrared spectroscopy coupled to random forest to quantify some soil quality parameters

    Science.gov (United States)

    de Santana, Felipe Bachion; de Souza, André Marcelo; Poppi, Ronei Jesus

    2018-02-01

    This study evaluates the use of visible and near infrared spectroscopy (Vis-NIRS) combined with multivariate regression based on random forest to quantify some soil quality parameters. The parameters analyzed were soil cation exchange capacity (CEC), sum of exchangeable bases (SB), organic matter (OM), clay and sand in soils from several regions of Brazil. Current methods for evaluating these parameters are laborious and time-consuming and require various wet analytical methods that are not adequate for use in precision agriculture, where faster and automatic responses are required. The random forest regression models were statistically better than PLS regression models for CEC, OM, clay and sand, demonstrating resistance to overfitting, attenuating the effect of outlier samples and indicating the most important variables for the model. The methodology demonstrates the potential of Vis-NIRS as an alternative for the determination of CEC, SB, OM, sand and clay, making it possible to develop a fast and automatic analytical procedure.

  11. Data-Driven Lead-Acid Battery Prognostics Using Random Survival Forests

    Science.gov (United States)

    2014-10-02

    Kogalur, Blackstone, & Lauer, 2008; Ishwaran & Kogalur, 2010). Random survival forest is a survival analysis extension of Random Forests (Breiman, 2001...Statistics & probability letters, 80(13), 1056–1064. Ishwaran, H., Kogalur, U. B., Blackstone, E. H., & Lauer, M. S. (2008). Random survival forests. The...and environment for statistical computing [Computer software manual]. Vienna, Austria. Retrieved from http://www.R-project.org/ Wager, S., Hastie, T
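
    The snippet above refers to random survival forests (Ishwaran et al., 2008). In that method each terminal node estimates a cumulative hazard function with the Nelson-Aalen estimator, and the forest averages the node estimates across trees. A minimal pure-Python sketch of the node-level estimator only, assuming right-censored data coded as 1 = event and 0 = censored; the function name is ours.

```python
def nelson_aalen(times, events):
    """Nelson-Aalen cumulative hazard H(t) = sum over event times t_j <= t of d_j / n_j.

    times: follow-up times; events: 1 if the event occurred, 0 if censored.
    Returns a list of (event_time, cumulative_hazard) steps.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    hazard = 0.0
    steps = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing time t (both events and censorings).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            hazard += deaths / at_risk
            steps.append((t, hazard))
        # Everyone observed at t leaves the risk set afterwards.
        at_risk -= removed
    return steps
```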

  12. Automatic structure classification of small proteins using random forest

    Directory of Open Access Journals (Sweden)

    Hirst Jonathan D

    2010-07-01

    Background: Random forest, an ensemble-based supervised machine learning algorithm, is used to predict the SCOP structural classification for a target structure, based on the similarity of its structural descriptors to those of a template structure with an equal number of secondary structure elements (SSEs). An initial assessment of random forest is carried out for domains consisting of three SSEs. The usability of random forest in classifying larger domains is demonstrated by applying it to domains consisting of four, five and six SSEs. Results: Random forest, trained on SCOP version 1.69, achieves a predictive accuracy of up to 94% on an independent and non-overlapping test set derived from SCOP version 1.73. For classification to the SCOP Class, Fold, Superfamily or Family levels, the predictive quality of the model in terms of the Matthews correlation coefficient (MCC) ranged from 0.61 to 0.83. As the number of constituent SSEs increases, the MCC for classification to the different structural levels decreases. Conclusions: The utility of random forest in classifying domains from the placeholder classes of SCOP to the true Class, Fold, Superfamily or Family levels is demonstrated. Issues such as the introduction of a new structural level in SCOP and the merger of singleton levels can also be addressed using random forest. A real-world scenario is mimicked by predicting the classification of those protein structures from the PDB that are yet to be assigned to the SCOP classification hierarchy.
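
    The Matthews correlation coefficient used above to score predictive quality can be computed from the counts of a binary confusion matrix. A minimal sketch; the abstract reports one MCC per structural level, and whether those values were computed per class or with a multi-class generalization is not stated here. Returning 0 when a marginal total is zero is a common convention, not part of the definition.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Binary Matthews correlation coefficient from confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (chance) to +1 (perfect).
    Returns 0.0 when any marginal total is zero, where the formula is undefined.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```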

  13. Using Random Forest Models to Predict Organizational Violence

    Science.gov (United States)

    Levine, Burton; Bobashev, Georgiy

    2012-01-01

    We present a methodology to assess the proclivity of an organization to commit violence against nongovernment personnel. We fitted a Random Forest model using the Minority at Risk Organizational Behavior (MAROS) dataset. The MAROS data are longitudinal, so individual observations are not independent. We propose a modification to the standard Random Forest methodology to account for this violation of the independence assumption. We present the results of the model fit and an example of predicting violence for an organization, and finally we present a summary of the forest in a "meta-tree."
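
    The abstract does not say how the authors modified the method. One common remedy for dependent, longitudinal records is to bootstrap whole clusters (here, organizations) rather than individual observations when growing each tree. The sketch below shows only that resampling step; it is our assumption, not the authors' modification, and the function name is hypothetical.

```python
import random
from collections import defaultdict


def cluster_bootstrap(rows, group_of, seed=0):
    """Resample whole groups (e.g. organizations) with replacement.

    All observations of a sampled group are kept together, so the within-group
    dependence of longitudinal records is preserved in each bootstrap sample.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[group_of(row)].append(row)
    keys = sorted(groups)
    sample = []
    # Draw as many groups as exist in the data, with replacement.
    for _ in range(len(keys)):
        sample.extend(groups[rng.choice(keys)])
    return sample
```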

  14. Approximating prediction uncertainty for random forest regression models

    Science.gov (United States)

    John W. Coulston; Christine E. Blinn; Valerie A. Thomas; Randolph H. Wynne

    2016-01-01

    The use of machine learning approaches such as random forest has increased for the spatial modeling and mapping of continuous variables. Random forest is a non-parametric ensemble approach, and unlike traditional regression approaches, it offers no direct quantification of prediction error. Understanding prediction uncertainty is important when using model-based continuous maps as...

  15. Fast image interpolation via random forests.

    Science.gov (United States)

    Huang, Jun-Jie; Siu, Wan-Chi; Liu, Tian-Rui

    2015-10-01

    This paper proposes a two-stage framework for fast image interpolation via random forests (FIRF). The proposed FIRF method achieves high accuracy while requiring little computation. The underlying idea is to apply random forests to classify the natural image patch space into numerous subspaces and to learn, for each subspace, a linear regression model that maps a low-resolution image patch to a high-resolution image patch. The FIRF framework consists of two stages. Stage 1 removes most of the ringing and aliasing artifacts in the initial bicubic-interpolated image, while Stage 2 further refines the Stage 1 interpolated image. By varying the number of decision trees in the random forests and the number of stages applied, the proposed FIRF method can realize computationally scalable image interpolation. Extensive experimental results show that the proposed FIRF(3, 2) method achieves more than 0.3 dB improvement in peak signal-to-noise ratio over the state-of-the-art nonlocal autoregressive modeling (NARM) method. Moreover, the proposed FIRF(1, 1) obtains results similar to or better than NARM while taking only 0.3% of its computation time.
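
    The core idea described above, one linear regression per subspace defined by the forest, can be sketched as follows. This is a simplified illustration under stated assumptions: leaf (subspace) assignments are taken as given rather than computed by trees, and scalar values stand in for the low- and high-resolution patch vectors that FIRF actually maps. The function names are hypothetical.

```python
from collections import defaultdict


def fit_leafwise_linear(leaf_ids, xs, ys):
    """Fit y = a * x + b separately within each leaf by ordinary least squares.

    leaf_ids[i] is the leaf (subspace) that training sample i falls into; xs and ys
    are scalar stand-ins for low- and high-resolution patch values.
    """
    buckets = defaultdict(list)
    for leaf, x, y in zip(leaf_ids, xs, ys):
        buckets[leaf].append((x, y))
    models = {}
    for leaf, pairs in buckets.items():
        n = len(pairs)
        mx = sum(x for x, _ in pairs) / n
        my = sum(y for _, y in pairs) / n
        sxx = sum((x - mx) ** 2 for x, _ in pairs)
        sxy = sum((x - mx) * (y - my) for x, y in pairs)
        # Fall back to a constant model if all x in the leaf are identical.
        a = sxy / sxx if sxx else 0.0
        models[leaf] = (a, my - a * mx)
    return models


def predict_leafwise(models, leaf, x):
    """Apply the linear model learned for the given leaf."""
    a, b = models[leaf]
    return a * x + b
```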

  16. Multivariate Analysis of Some Pine Forested Areas of Azad Kashmir-Pakistan

    International Nuclear Information System (INIS)

    Bokhari, T.Z.; Liu, Y.; Li, Q.; Malik, S.A.; Ahmed, M.; Siddiqui, M.F.; Khan, Z.U.

    2016-01-01

    Floristic composition and communities in the Azad Kashmir area of Pakistan were studied using multivariate analysis. Quantitative sampling from thirty-one sites was carried out in different coniferous forests of Azad Kashmir in order to analyze the effects of past earthquakes and landslides on the vegetation of these areas. Because the coniferous forests were highly disturbed, either naturally or by anthropogenic activities, sampling focused on forests near the fault line. Trees were sampled using the Point Centered Quarter (PCQ) method. Cluster analysis (using Ward's method) yielded six groups dominated by different conifer species. Groups I and V were dominated by Pinus wallichiana, while this species was co-dominant in group III. The other groups were dominated by other conifer species, i.e., Cedrus deodara, Pinus roxburghii, Picea smithiana and Abies pindrow. Both the cluster analysis and the ordination (two-dimensional non-metric multidimensional scaling, NMS) classify and ordinate the structure of the various groups, indicating interrelationships among different species. The groups of trees could readily be superimposed on the NMS ordination axes; they were well classified and well separated in the ordination. The present research revealed that these forests had a diverse and asymmetric structure due to anthropogenic disturbances and overgrazing, which were key factors in addition to natural disturbances. However, some of the forests showed a considerably stable structure due to less human interference. (author)
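
    The abstract names the Point Centered Quarter (PCQ) method without giving the estimator. In the classical Cottam-Curtis form, the squared mean point-to-tree distance is taken as the mean area occupied per tree, so density is its reciprocal. A minimal sketch, assuming distances in meters and no vacant quarters; the study's exact computation may differ, and the function name is ours.

```python
def pcq_density_per_ha(distances_m):
    """Point-centered quarter density estimate (Cottam-Curtis form).

    distances_m: point-to-nearest-tree distances in meters, one per quarter
    (four per sampling point). The mean area per tree is the squared mean
    distance, so trees per hectare = 10,000 m^2 / mean_distance^2.
    """
    mean_d = sum(distances_m) / len(distances_m)
    return 10_000 / mean_d ** 2
```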

  17. A multivariate decision tree analysis of biophysical factors in tropical forest fire occurrence

    Science.gov (United States)

    Rey S. Ofren; Edward Harvey

    2000-01-01

    A multivariate decision tree model was used to quantify the relative importance of complex hierarchical relationships between biophysical variables and the occurrence of tropical forest fires. The study site is the Huai Kha Khaeng wildlife sanctuary, a World Heritage Site in northwestern Thailand where annual fires are common and particularly destructive. Thematic...

  18. Present state and multivariate analysis of a few juniper forests of Baluchistan, Pakistan

    International Nuclear Information System (INIS)

    Ahmed, M.; Siddiqui, M.F.

    2015-01-01

    Quantitative multivariate investigations were carried out to explore the various forms of Juniper trees resulting from human disturbances and natural phenomena. Thirty stands were sampled by the point centered quarter method, and the data were analysed using Ward's cluster analysis and Bray-Curtis ordination. On the basis of the multivariate analysis, eight forms, i.e., healthy, unhealthy, over mature, disturbed, dieback, standing dead, logs and cut stems, were recognized. Structural attributes were computed. The highest numbers of logs (130-133 stems ha-1) were recorded from the Cautair and Khunk forests. The highest density of healthy plants (229 ha-1) was estimated for the Tangi Top area, while the lowest number of healthy plants (24 ha-1) was found in the Saraghara area. Multivariate analysis showed five groups in the cluster and ordination diagrams. These groups are characterized on the basis of healthy, over mature, disturbed and logged Juniper trees. Higher numbers of disturbed trees (115, 96, 84 and 80 ha-1) were found at Speena Sukher, Srag Kazi, Prang Shella and Tangi Top, respectively. Overall density does not show any significant relation with basal area (m2 ha-1), degree of slope or the elevation of the sampling stands. The present study shows that every Juniper stand is highly disturbed, mostly due to human influence; therefore, prompt conservation steps should be taken to save these forests. (author)
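
    Bray-Curtis ordination, mentioned above, is built on the Bray-Curtis dissimilarity between pairs of stands, computed from abundance vectors (species counts, or here possibly counts of the tree forms). A minimal sketch of that dissimilarity only; the ordination step is not shown, and the function name is ours.

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors.

    0 means identical composition; 1 means no shared abundance at all.
    """
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    return num / den if den else 0.0
```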

  19. Probabilistic, multi-variate flood damage modelling using random forests and Bayesian networks

    Science.gov (United States)

    Kreibich, Heidi; Schröter, Kai

    2015-04-01

    Decisions on flood risk management and adaptation are increasingly based on risk analyses. Such analyses are associated with considerable uncertainty, even more so if changes in risk due to global change are expected. Although uncertainty analysis and probabilistic approaches have received increased attention recently, they are rarely applied in flood damage assessments. Most damage models applied in standard practice have in common that complex damaging processes are described by simple, deterministic approaches such as stage-damage functions. This presentation will show approaches for probabilistic, multi-variate flood damage modelling on the micro- and meso-scale and discuss their potential and limitations. References: Merz, B., Kreibich, H., Lall, U. (2013): Multi-variate flood damage assessment: a tree-based data-mining approach. NHESS, 13(1), 53-64. Schröter, K., Kreibich, H., Vogel, K., Riggelsen, C., Scherbaum, F., Merz, B. (2014): How useful are complex flood damage models? Water Resources Research, 50(4), 3378-3395.

  20. Correlated random sampling for multivariate normal and log-normal distributions

    International Nuclear Information System (INIS)

    Žerovnik, Gašper; Trkov, Andrej; Kodeli, Ivan A.

    2012-01-01

    A method for correlated random sampling is presented. Representative samples for multivariate normal or log-normal distribution can be produced. Furthermore, any combination of normally and log-normally distributed correlated variables may be sampled to any requested accuracy. Possible applications of the method include sampling of resonance parameters which are used for reactor calculations.
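
    The abstract does not describe the method itself. One standard building block for correlated log-normal sampling is moment matching: if X = exp(Y) with Y multivariate normal, the desired mean and covariance of X determine the mean and covariance of Y in closed form. A minimal sketch under that assumption, with our own function name; it requires positive means and cov_ij > -mean_i * mean_j, and the resulting Sigma must still be positive semidefinite before it can be fed to a correlated normal sampler (for example, one based on a Cholesky factor) whose draws are then exponentiated.

```python
import math


def lognormal_to_normal(mean, cov):
    """Normal-space parameters (mu, Sigma) for a multivariate log-normal X = exp(Y).

    Given the desired mean vector and covariance matrix of X (all means > 0):
    Sigma_ij = ln(1 + cov_ij / (mean_i * mean_j)) and mu_i = ln(mean_i) - Sigma_ii / 2.
    """
    d = len(mean)
    sigma = [[math.log(1.0 + cov[i][j] / (mean[i] * mean[j])) for j in range(d)]
             for i in range(d)]
    mu = [math.log(mean[i]) - sigma[i][i] / 2.0 for i in range(d)]
    return mu, sigma
```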

  1. Random survival forests for competing risks

    DEFF Research Database (Denmark)

    Ishwaran, Hemant; Gerds, Thomas A; Kogalur, Udaya B

    2014-01-01

    We introduce a new approach to competing risks using random forests. Our method is fully non-parametric and can be used for selecting event-specific variables and for estimating the cumulative incidence function. We show that the method is highly effective for both prediction and variable selection...
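
    The cumulative incidence function mentioned above can be estimated nonparametrically with the Aalen-Johansen estimator, which weights each cause-specific hazard increment by the all-cause Kaplan-Meier survival just before that time. The sketch below computes it for one sample (for example, the cases falling in a single terminal node); how the authors' forest aggregates such estimates is not stated in this snippet, so treat it as an illustration of the estimator only. Cause coding (0 = censored) and the function name are our assumptions.

```python
def cumulative_incidence(times, causes, cause_of_interest):
    """Aalen-Johansen cumulative incidence for one cause of failure.

    causes[i] is 0 for censored, or a positive integer identifying the failure cause.
    F_k(t) = sum over event times t_j <= t of S(t_j-) * d_kj / n_j, where S is the
    Kaplan-Meier estimate of event-free survival from all causes.
    Returns (time, F_k(time)) at every time where any failure occurs.
    """
    data = sorted(zip(times, causes))
    at_risk = len(data)
    surv = 1.0
    cif = 0.0
    steps = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d_all = d_k = removed = 0
        while i < len(data) and data[i][0] == t:
            c = data[i][1]
            if c != 0:
                d_all += 1
                if c == cause_of_interest:
                    d_k += 1
            removed += 1
            i += 1
        if d_all:
            cif += surv * d_k / at_risk          # uses S(t-) before updating
            surv *= 1.0 - d_all / at_risk        # all-cause Kaplan-Meier step
            steps.append((t, cif))
        at_risk -= removed
    return steps
```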

  2. Multivariate geomorphic analysis of forest streams: Implications for assessment of land use impacts on channel condition

    Science.gov (United States)

    Richard. D. Wood-Smith; John M. Buffington

    1996-01-01

    Multivariate statistical analyses of geomorphic variables from 23 forest stream reaches in southeast Alaska result in successful discrimination between pristine streams and those disturbed by land management, specifically timber harvesting and associated road building. Results of discriminant function analysis indicate that a three-variable model discriminates 10...

  3. The Dirichlet-Multinomial model for multivariate randomized response data and small samples

    NARCIS (Netherlands)

    Avetisyan, Marianna; Fox, Gerardus J.A.

    2012-01-01

    In survey sampling the randomized response (RR) technique can be used to obtain truthful answers to sensitive questions. Although the individual answers are masked due to the RR technique, individual (sensitive) response rates can be estimated when observing multivariate response data. The

  4. Applying a weighted random forests method to extract karst sinkholes from LiDAR data

    Science.gov (United States)

    Zhu, Junfeng; Pierskalla, William P.

    2016-02-01

    Detailed mapping of sinkholes provides critical information for mitigating sinkhole hazards and understanding groundwater and surface water interactions in karst terrains. LiDAR (Light Detection and Ranging) measures the earth's surface at high resolution and high density and has shown great potential to drastically improve the location and delineation of sinkholes. However, processing LiDAR data to extract sinkholes requires separating sinkholes from other depressions, which can be laborious because of the sheer number of depressions commonly generated from LiDAR data. In this study, we applied random forests, a machine learning method, to automatically separate sinkholes from other depressions in a karst region in central Kentucky. The sinkhole-extraction random forest was grown on a training dataset built from an area where LiDAR-derived depressions were manually classified through a visual inspection and field verification process. Based on the geometry of depressions, as well as natural and human factors related to sinkholes, 11 parameters were selected as predictive variables to form the dataset. Because the training dataset was imbalanced, with the majority of depressions being non-sinkholes, a weighted random forests method was used to improve the accuracy of predicting sinkholes. The weighted random forest achieved an average accuracy of 89.95% for the training dataset, demonstrating that the random forest can be an effective sinkhole classifier. Testing of the random forest in another area, however, resulted in moderate success, with an average accuracy rate of 73.96%. This study suggests that an automatic sinkhole extraction procedure like the random forest classifier can significantly reduce time and labor costs and makes it more tractable to map sinkholes using LiDAR data for large areas. However, the random forests method cannot totally replace manual procedures, such as visual inspection and field verification.
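
    The abstract does not give the weighting scheme used for the imbalanced training data. One common choice, assumed here, weights each class inversely to its frequency (the same formula scikit-learn uses for `class_weight="balanced"`) and then evaluates split quality with class-weighted counts. A minimal sketch with our own function names:

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * n_in_class), so rare classes count more."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


def weighted_gini(labels, weights):
    """Gini impurity computed from class-weighted counts rather than raw counts."""
    totals = Counter()
    for y in labels:
        totals[y] += weights[y]
    total = sum(totals.values())
    return 1.0 - sum((w / total) ** 2 for w in totals.values())
```

    With 8 non-sinkholes and 2 sinkholes, the unweighted Gini impurity of the node is 0.32, whereas the balanced weights make both classes contribute equally and the weighted impurity rises to 0.5, so splits that isolate the rare class are rewarded more.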

  5. A Variable Impacts Measurement in Random Forest for Mobile Cloud Computing

    Directory of Open Access Journals (Sweden)

    Jae-Hee Hur

    2017-01-01

    Recently, the importance of mobile cloud computing has increased. Mobile devices can collect personal data from various sensors within a short period of time, and sensor-based data contain valuable information from users. Advanced computation power and data analysis technology based on cloud computing provide an opportunity to classify massive sensor data into given labels. The random forest algorithm is known as a black-box model whose internal process is hard to interpret. In this paper, we propose a method that analyzes variable impact in the random forest algorithm to clarify which variable affects classification accuracy the most. We apply the Shapley value with random forest to analyze variable impact. Treating every variable as a player in a cooperative game, the Shapley value distributes the payoff fairly among the variables. Our proposed method calculates the relative contributions of the variables within the classification process. In this paper, we analyze the influence of variables and rank the variables by their effect on classification accuracy. Our proposed method is suitable for interpreting black-box models such as random forest, so that the algorithm is applicable in mobile cloud computing environments.
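
    The Shapley value referred to above averages each player's marginal contribution over all orderings of the players. A minimal exact sketch, assuming a hypothetical `value` function that returns the payoff (for example, classification accuracy) of a coalition of variables; how the authors define that payoff for a random forest is not stated in the abstract.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values for a cooperative game.

    value(frozenset) returns the payoff of a coalition. The cost grows as 2^n,
    so this is practical only for a handful of players (here: input variables).
    """
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for size in range(n):
            # Probability that exactly this coalition precedes p in a random ordering.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi
```

    The values always sum to value(all players) minus value(empty set), which is the "fair distribution of the payoff" property described in the abstract.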

  6. Assessing the potential of random forest method for estimating solar radiation using air pollution index

    International Nuclear Information System (INIS)

    Sun, Huaiwei; Gui, Dongwei; Yan, Baowei; Liu, Yi; Liao, Weihong; Zhu, Yan; Lu, Chengwei; Zhao, Na

    2016-01-01

    Highlights: • Models based on random forests for daily solar radiation estimation are proposed. • Three sites with different air pollution index conditions are considered. • Performance of random forests is better than that of empirical methodologies. • Special attention is given to the use of air pollution index. • The potential of air pollution index is assessed by random forest models. - Abstract: Simulations of solar radiation have become increasingly common in recent years because of the rapid global development and deployment of solar energy technologies. The effect of air pollution on solar radiation is well known. However, few studies have attempted to evaluate the potential of the air pollution index in estimating solar radiation. In this study, meteorological data, solar radiation, and air pollution index data from three sites with different air pollution index conditions are used to develop random forest models. We propose different random forest models with and without air pollution index data, and then compare their respective performance with that of empirical methodologies. In addition, a variable importance approach based on random forest is applied in order to assess input variables. The results show that the performance of random forest models with air pollution index data is better than that of the empirical methodologies, generating 9.1–17.0% lower values of root-mean-square error in a fitted period and 2.0–17.4% lower values of root-mean-square error in a predicted period. Both the comparative results of different random forest models and the variable importance indicate that applying air pollution index data improves estimation of solar radiation. Also, although the air pollution index values varied greatly from season to season, the random forest models appear to perform more robustly across seasons than the other models. The findings can act as a guide in selecting the variables used to estimate daily solar

  7. The Dirichlet-Multinomial Model for Multivariate Randomized Response Data and Small Samples

    Science.gov (United States)

    Avetisyan, Marianna; Fox, Jean-Paul

    2012-01-01

    In survey sampling the randomized response (RR) technique can be used to obtain truthful answers to sensitive questions. Although the individual answers are masked due to the RR technique, individual (sensitive) response rates can be estimated when observing multivariate response data. The beta-binomial model for binary RR data will be generalized…

  8. UNDERSTANDING SEVERE WEATHER PROCESSES THROUGH SPATIOTEMPORAL RELATIONAL RANDOM FORESTS

    Data.gov (United States)

    National Aeronautics and Space Administration — UNDERSTANDING SEVERE WEATHER PROCESSES THROUGH SPATIOTEMPORAL RELATIONAL RANDOM FORESTS AMY MCGOVERN, TIMOTHY SUPINIE, DAVID JOHN GAGNE II, NATHANIEL TROUTMAN,...

  9. Random Forest Application for NEXRAD Radar Data Quality Control

    Science.gov (United States)

    Keem, M.; Seo, B. C.; Krajewski, W. F.

    2017-12-01

    Identification and elimination of non-meteorological radar echoes (e.g., returns from ground, wind turbines, and biological targets) are basic data quality control steps before radar data are used in quantitative applications (e.g., precipitation estimation). Although the recent upgrade of the WSR-88Ds to dual polarization has enhanced this quality control and echo classification, some non-meteorological echoes that show precipitation-like characteristics (e.g., wind turbine or anomalous propagation clutter embedded in rain) remain challenging to detect. With this in mind, a new quality control method using Random Forest is proposed in this study. This classification algorithm is known to produce reliable results with less uncertainty. The method introduces randomness into sampling and feature selection and integrates the resulting multiple decision trees. The multidimensional structure of the trees can characterize the statistical interactions among multiple features in complex situations. The authors explore the performance of the Random Forest method for NEXRAD radar data quality control. Training datasets are selected using several clear cases of precipitation and non-precipitation (but with some non-meteorological echoes). The model is structured using available candidate features from the NEXRAD data, such as horizontal reflectivity, differential reflectivity, differential phase shift, copolar correlation coefficient, and their horizontal textures (e.g., local standard deviation). The influence of each feature on the classification results is quantified by variable importance measures that are estimated automatically by the Random Forest algorithm. Therefore, the number and types of features in the final forest can be examined based on the classification accuracy. The authors demonstrate the capability of the proposed approach using several cases ranging from distinct to complex rain/no-rain events and compare the performance with the existing algorithms (e
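
    The horizontal texture features mentioned above, such as a local standard deviation, can be illustrated with a short sketch. The 3 x 3 window, the clipping of windows at the grid edges, and the function name `local_std` are assumptions for illustration; the abstract does not specify how the authors compute their textures.

```python
import statistics

def local_std(field, half=1):
    """Local (population) standard deviation over a (2*half+1) x (2*half+1) window.

    Hypothetical helper: windows are clipped at the grid edges, so border
    cells are summarized from fewer neighbours.
    """
    rows, cols = len(field), len(field[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            window = [
                field[r][c]
                for r in range(max(0, i - half), min(rows, i + half + 1))
                for c in range(max(0, j - half), min(cols, j + half + 1))
            ]
            out[i][j] = statistics.pstdev(window)
    return out

# A smooth reflectivity field (dBZ) has zero texture; an isolated spike
# raises the texture only in the windows that contain it.
flat = [[20.0] * 4 for _ in range(4)]
spiky = [row[:] for row in flat]
spiky[1][1] = 50.0
print(local_std(flat)[1][1])   # 0.0
print(local_std(spiky)[1][1])  # about 9.43
print(local_std(spiky)[3][3])  # 0.0, this window does not reach the spike
```

    Such a texture grid would simply be appended as one more per-pixel feature next to the raw polarimetric variables.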

  10. Predicting Coastal Flood Severity using Random Forest Algorithm

    Science.gov (United States)

    Sadler, J. M.; Goodall, J. L.; Morsy, M. M.; Spencer, K.

    2017-12-01

    Coastal floods have become more common recently and are predicted to further increase in frequency and severity due to sea level rise. Predicting floods in coastal cities can be difficult due to the number of environmental and geographic factors which can influence flooding events. Built stormwater infrastructure and irregular urban landscapes add further complexity. This paper demonstrates the use of machine learning algorithms in predicting street flood occurrence in an urban coastal setting. The model is trained and evaluated using data from Norfolk, Virginia USA from September 2010 - October 2016. Rainfall, tide levels, water table levels, and wind conditions are used as input variables. Street flooding reports made by city workers after named and unnamed storm events, ranging from 1-159 reports per event, are the model output. Results show that Random Forest provides predictive power in estimating the number of flood occurrences given a set of environmental conditions with an out-of-bag root mean squared error of 4.3 flood reports and a mean absolute error of 0.82 flood reports. The Random Forest algorithm performed much better than Poisson regression. From the Random Forest model, total daily rainfall was by far the most important factor in flood occurrence prediction, followed by daily low tide and daily higher high tide. The model demonstrated here could be used to predict flood severity based on forecast rainfall and tide conditions and could be further enhanced using more complete street flooding data for model training.
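
    RMSE and MAE, the two error measures reported above, can be computed as below. The flood-report counts are made-up illustration values, and the sketch takes predictions as given; in the study they were out-of-bag predictions, i.e. each observation was scored only by trees whose bootstrap sample excluded it.

```python
import math

def rmse(observed, predicted):
    """Root-mean-square error: squaring weights large misses heavily."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))

def mae(observed, predicted):
    """Mean absolute error: every unit of error counts the same."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

# Hypothetical flood-report counts for four storm events.
observed = [0, 1, 2, 40]
predicted = [0, 1, 2, 30]
print(rmse(observed, predicted))  # 5.0
print(mae(observed, predicted))   # 2.5
```

    A single poorly predicted event with many reports inflates RMSE much more than MAE, which is one plausible reason the RMSE reported above is several times larger than the MAE.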

  11. Bias in random forest variable importance measures: Illustrations, sources and a solution

    Directory of Open Access Journals (Sweden)

    Hothorn Torsten

    2007-01-01

    Background: Variable importance measures for random forests have been receiving increased attention as a means of variable selection in many classification tasks in bioinformatics and related scientific fields, for instance to select a subset of genetic markers relevant for the prediction of a certain disease. We show that random forest variable importance measures are a sensible means for variable selection in many applications, but are not reliable in situations where potential predictor variables vary in their scale of measurement or their number of categories. This is particularly important in genomics and computational biology, where predictors often include variables of different types, for example when predictors include both sequence data and continuous variables such as folding energy, or when amino acid sequence data show different numbers of categories.
    Results: Simulation studies are presented illustrating that, when random forest variable importance measures are used with data of varying types, the results are misleading because suboptimal predictor variables may be artificially preferred in variable selection. The two mechanisms underlying this deficiency are biased variable selection in the individual classification trees used to build the random forest on one hand, and effects induced by bootstrap sampling with replacement on the other hand.
    Conclusion: We propose to employ an alternative implementation of random forests that provides unbiased variable selection in the individual classification trees. When this method is applied using subsampling without replacement, the resulting variable importance measures can be used reliably for variable selection even in situations where the potential predictor variables vary in their scale of measurement or their number of categories. The usage of both random forest algorithms and their variable importance measures in the R system for statistical computing is illustrated and
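
    The two resampling schemes contrasted in the Conclusion can be sketched with Python's `random` module. The subsample fraction of 0.632 (roughly the expected share of distinct rows in a bootstrap sample) is an assumed value, not one taken from the abstract.

```python
import random

def bootstrap_indices(n, rng):
    """n draws with replacement: some rows repeat, roughly 36.8% are left out."""
    return [rng.randrange(n) for _ in range(n)]

def subsample_indices(n, rng, fraction=0.632):
    """Draws without replacement: each selected row appears exactly once."""
    return rng.sample(range(n), round(fraction * n))

rng = random.Random(42)
boot = bootstrap_indices(1000, rng)
sub = subsample_indices(1000, rng)
print(len(boot), len(set(boot)))  # 1000 draws, noticeably fewer distinct rows
print(len(sub), len(set(sub)))    # 632 632
```

    Only the second scheme guarantees that no observation enters a tree more than once, which is the property the abstract's recommended setup relies on.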

  12. Applications of random forest feature selection for fine-scale genetic population assignment.

    Science.gov (United States)

    Sylvester, Emma V A; Bentzen, Paul; Bradbury, Ian R; Clément, Marie; Pearce, Jon; Horne, John; Beiko, Robert G

    2018-02-01

    Genetic population assignment used to inform wildlife management and conservation efforts requires panels of highly informative genetic markers and sensitive assignment tests. We explored the utility of machine-learning algorithms (random forest, regularized random forest and guided regularized random forest) compared with FST ranking for selection of single nucleotide polymorphisms (SNP) for fine-scale population assignment. We applied these methods to an unpublished SNP data set for Atlantic salmon (Salmo salar) and a published SNP data set for Alaskan Chinook salmon (Oncorhynchus tshawytscha). In each species, we identified the minimum panel size required to obtain a self-assignment accuracy of at least 90% using each method to create panels of 50-700 markers. Panels of SNPs identified using random-forest-based methods performed up to 7.8 and 11.2 percentage points better than FST-selected panels of similar size for the Atlantic salmon and Chinook salmon data, respectively. Self-assignment accuracy ≥90% was obtained with panels of 670 and 384 SNPs for each data set, respectively, a level of accuracy never reached for these species using FST-selected panels. Our results demonstrate a role for machine-learning approaches in marker selection across large genomic data sets to improve assignment for management and conservation of exploited populations.

  13. Rapid Land Cover Map Updates Using Change Detection and Robust Random Forest Classifiers

    Directory of Open Access Journals (Sweden)

    Konrad J. Wessels

    2016-10-01

    The paper evaluated the Landsat Automated Land Cover Update Mapping (LALCUM) system, designed to rapidly update a land cover map to a desired nominal year using a pre-existing reference land cover map. The system uses Iteratively Reweighted Multivariate Alteration Detection (IRMAD) to identify areas of change and no change. The system then automatically generates large amounts of training samples (n > 1 million) in the no-change areas as input to an optimized Random Forest classifier. Experiments were conducted in the KwaZulu-Natal Province of South Africa using a reference land cover map from 2008, a change mask between 2008 and 2011, and Landsat ETM+ data for 2011. The entire system took 9.5 h to process. We expected that the use of the change mask would improve classification accuracy by reducing the number of mislabeled training samples caused by land cover change between 2008 and 2011. However, this was not the case, owing to the exceptional robustness of the Random Forest classifier to mislabeled training samples. The system achieved an overall accuracy of 65%–67% using 22 detailed classes and 72%–74% using 12 aggregated national classes. “Water”, “Plantations”, “Plantations—clearfelled”, “Orchards—trees”, “Sugarcane”, “Built-up/dense settlement”, “Cultivation—Irrigated” and “Forest (indigenous)” had user’s accuracies above 70%. Other detailed classes (e.g., “Low density settlements”, “Mines and Quarries”, and “Cultivation, subsistence, drylands”), which are required for operational, provincial-scale land use planning and are usually mapped using manual image interpretation, could not be mapped using Landsat spectral data alone. However, the system was able to map the 12 national classes at a sufficiently high level of accuracy for national-scale land cover monitoring. This update approach and the highly automated, scalable LALCUM system can improve the efficiency and update rate of regional land
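
    Overall accuracy and user's accuracy, as reported above, both come from an error (confusion) matrix. The sketch assumes rows hold the mapped class and columns the reference class, one common convention; the 3-class matrix is invented for illustration.

```python
def overall_accuracy(cm):
    """Share of all samples on the diagonal (correctly mapped)."""
    total = sum(sum(row) for row in cm)
    return sum(cm[k][k] for k in range(len(cm))) / total

def users_accuracy(cm, k):
    """Of the samples mapped as class k (row k), the share that truly are k."""
    return cm[k][k] / sum(cm[k])

def producers_accuracy(cm, k):
    """Of the reference samples of class k (column k), the share mapped as k."""
    return cm[k][k] / sum(row[k] for row in cm)

# Rows: mapped class; columns: reference class (hypothetical counts).
cm = [
    [45, 5, 0],
    [10, 30, 10],
    [0, 5, 45],
]
print(overall_accuracy(cm))       # 0.8
print(users_accuracy(cm, 1))      # 0.6
print(producers_accuracy(cm, 1))  # 0.75
```

    User's accuracy answers the map user's question ("how often is a pixel labelled k really k?"), which is why it is the per-class figure quoted for classes such as “Water” and “Sugarcane”.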

  14. Nonparametric indices of dependence between components for inhomogeneous multivariate random measures and marked sets

    OpenAIRE

    van Lieshout, Maria Nicolette Margaretha

    2018-01-01

    We propose new summary statistics to quantify the association between the components in coverage-reweighted moment stationary multivariate random sets and measures. They are defined in terms of the coverage-reweighted cumulant densities and extend classic functional statistics for stationary random closed sets. We study the relations between these statistics and evaluate them explicitly for a range of models. Unbiased estimators are given for all statistics and applied to simulated examples a...

  15. Statistical Uncertainty Estimation Using Random Forests and Its Application to Drought Forecast

    OpenAIRE

    Chen, Junfei; Li, Ming; Wang, Weiguang

    2012-01-01

    Drought is part of natural climate variability and ranks first among natural disasters worldwide. Drought forecasting plays an important role in mitigating impacts on agriculture and water resources. In this study, a drought forecast model based on the random forest method is proposed to predict the time series of the monthly standardized precipitation index (SPI). We demonstrate the model's application at four stations in the Haihe River basin, China. The random-forest- (RF-) based forecast model has ...

  16. Prediction of N2O emission from local information with Random Forest

    International Nuclear Information System (INIS)

    Philibert, Aurore; Loyce, Chantal; Makowski, David

    2013-01-01

    Nitrous oxide is a potent greenhouse gas, with a global warming potential 298 times greater than that of CO2. In agricultural soils, N2O emissions are influenced by a large number of environmental characteristics and crop management techniques that are not systematically reported in experiments. Random Forest (RF) is a machine learning method that can handle missing data and ranks input variables on the basis of their importance. We aimed to predict N2O emission on the basis of local information, to rank environmental and crop management variables according to their influence on N2O emission, and to compare the performances of RF with several regression models. RF outperformed the regression models for predictive purposes, and this approach led to the identification of three important input variables: N fertilization, type of crop, and experiment duration. This method could be used in the future for prediction of N2O emissions from local information. -- Highlights: ► Random Forest gave more accurate N2O predictions than regression. ► Missing data were well handled by Random Forest. ► The most important factors were nitrogen rate, type of crop and experiment duration. -- Random Forest, a machine learning method, outperformed the regression models for predicting N2O emissions and led to the identification of three important input variables

  17. Research on machine learning framework based on random forest algorithm

    Science.gov (United States)

    Ren, Qiong; Cheng, Hui; Han, Hai

    2017-03-01

    With the continuous development of machine learning, industry and academia have released many machine learning frameworks built on distributed computing platforms, and these have been widely used. However, existing machine learning frameworks are constrained by limitations of the underlying algorithms, such as parameter selection and interference from noise, as well as a high barrier to use. This paper introduces the research background of machine learning frameworks and, building on the random forest algorithm commonly used for classification, sets out the research objectives and content, proposes an improved adaptive random forest algorithm (ARF), and designs and implements a machine learning framework based on ARF.

  18. Aggregated recommendation through random forests.

    Science.gov (United States)

    Zhang, Heng-Ru; Min, Fan; He, Xu

    2014-01-01

    Aggregated recommendation refers to the process of suggesting one kind of item to a group of users. Compared to user-oriented or item-oriented approaches, it is more general and, therefore, more appropriate for cold-start recommendation. In this paper, we propose a random forest approach to create aggregated recommender systems. The approach is used to predict the rating that a group of users gives to a kind of item. In the preprocessing stage, we merge user, item, and rating information to construct an aggregated decision table, where rating information serves as the decision attribute. We also model the data conversion process corresponding to the new-user, new-item, and new-user-and-new-item problems. In the training stage, a forest is built for the aggregated training set, where each leaf is assigned a distribution of discrete ratings. In the testing stage, we present four prediction approaches to compute evaluation values based on the distribution of each tree. Experimental results on the well-known MovieLens dataset show that the aggregated approach maintains an acceptable level of accuracy.
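
    The abstract does not list its four prediction approaches, so the sketch below shows two hypothetical ways of turning per-tree leaf rating distributions into one prediction: average the distributions across trees, then take either the expected rating or the most probable rating.

```python
def average_distributions(tree_dists):
    """Average per-tree rating distributions (dicts rating -> probability)."""
    ratings = sorted({r for d in tree_dists for r in d})
    n = len(tree_dists)
    return {r: sum(d.get(r, 0.0) for d in tree_dists) / n for r in ratings}

def expected_rating(dist):
    """Probability-weighted mean rating (may fall between discrete levels)."""
    return sum(r * p for r, p in dist.items())

def most_probable_rating(dist):
    """Rating with the highest averaged probability (always a valid level)."""
    return max(dist, key=dist.get)

# Leaf distributions reached by one group-item case in two hypothetical trees.
trees = [{4: 0.5, 5: 0.5}, {3: 0.5, 4: 0.5}]
avg = average_distributions(trees)
print(avg)                        # {3: 0.25, 4: 0.5, 5: 0.25}
print(expected_rating(avg))       # 4.0
print(most_probable_rating(avg))  # 4
```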

  19. Multivariable Christoffel-Darboux Kernels and Characteristic Polynomials of Random Hermitian Matrices

    Directory of Open Access Journals (Sweden)

    Hjalmar Rosengren

    2006-12-01

    We study multivariable Christoffel-Darboux kernels, which may be viewed as reproducing kernels for antisymmetric orthogonal polynomials, and also as correlation functions for products of characteristic polynomials of random Hermitian matrices. Using their interpretation as reproducing kernels, we obtain simple proofs of Pfaffian and determinant formulas, as well as Schur polynomial expansions, for such kernels. In subsequent work, these results are applied in combinatorics (enumeration of marked shifted tableaux) and number theory (representation of integers as sums of squares).

  20. Brain Tumor Segmentation Based on Random Forest

    Directory of Open Access Journals (Sweden)

    László Lefkovits

    2016-09-01

    In this article we present a discriminative model for tumor detection from multimodal MR images. The main part of the model is built around the random forest (RF) classifier. We created an optimization algorithm able to select the important features in order to reduce the dimensionality of the data. This method is also used to determine the training parameters used in the learning phase. The algorithm relies on random feature properties for evaluating variable importance, the evolution of learning errors, and the proximities between instances. The detection performance obtained was compared with that of the most recent systems, with similar results.
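
    Proximity between two instances, one of the forest properties mentioned above, is conventionally defined as the fraction of trees in which both instances land in the same terminal node. The sketch assumes the leaf index of every sample in every tree is already available (for example from a fitted forest); the small leaf table is made up.

```python
def proximity_matrix(leaves):
    """leaves[t][i] is the terminal node that tree t assigns to sample i.

    Returns prox[i][j] = share of trees in which samples i and j share a leaf.
    """
    n_trees, n = len(leaves), len(leaves[0])
    counts = [[0] * n for _ in range(n)]
    for tree in leaves:
        for i in range(n):
            for j in range(n):
                if tree[i] == tree[j]:
                    counts[i][j] += 1
    return [[c / n_trees for c in row] for row in counts]

# Three trees, three samples (hypothetical leaf indices).
leaves = [
    [0, 0, 1],
    [0, 1, 1],
    [2, 2, 2],
]
prox = proximity_matrix(leaves)
print(prox[0][1])  # 2 of 3 trees, about 0.667
print(prox[0][2])  # 1 of 3 trees, about 0.333
```

    The matrix is symmetric with ones on the diagonal, and 1 - prox is often used as a dissimilarity between instances.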

  1. UAV Remote Sensing for Urban Vegetation Mapping Using Random Forest and Texture Analysis

    Directory of Open Access Journals (Sweden)

    Quanlong Feng

    2015-01-01

    Unmanned aerial vehicle (UAV) remote sensing has great potential for vegetation mapping in complex urban landscapes due to the ultra-high-resolution imagery acquired at low altitudes. Because of payload capacity restrictions, off-the-shelf digital cameras are widely used on medium and small sized UAVs. The limitation of low spectral resolution in digital cameras for vegetation mapping can be reduced by incorporating texture features and robust classifiers. Random Forest has been widely used in satellite remote sensing applications, but its usage in UAV image classification has not been well documented. The objectives of this paper were to propose a hybrid method using Random Forest and texture analysis to accurately differentiate land covers of urban vegetated areas, and to analyze how classification accuracy changes with texture window size. Six least correlated second-order texture measures were calculated at nine different window sizes and added to the original Red-Green-Blue (RGB) images as ancillary data. A Random Forest classifier consisting of 200 decision trees was used for classification in the spectral-textural feature space. Results indicated the following: (1) Random Forest outperformed the traditional Maximum Likelihood classifier and showed similar performance to object-based image analysis in urban vegetation classification; (2) the inclusion of texture features improved classification accuracy significantly; (3) classification accuracy followed an inverted U relationship with texture window size. The results demonstrate that UAV provides an efficient and ideal platform for urban vegetation mapping. The hybrid method proposed in this paper performs well in urban vegetation mapping. The drawbacks of off-the-shelf digital cameras can be reduced by adopting Random Forest and texture analysis together.
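
    Second-order texture measures like those mentioned above are typically derived from a grey-level co-occurrence matrix (GLCM). The abstract does not name its six measures, so this sketch computes a symmetric, normalized GLCM for one pixel offset and two standard measures, contrast and homogeneity, on a tiny made-up image; in practice these would be evaluated inside a moving window whose size is the parameter studied above.

```python
def glcm(image, levels, dx=1, dy=0):
    """Symmetric, normalized grey-level co-occurrence matrix for offset (dy, dx)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = image[r][c], image[r2][c2]
                counts[a][b] += 1
                counts[b][a] += 1  # count the pair both ways so the matrix is symmetric
    total = sum(sum(row) for row in counts)
    return [[v / total for v in row] for row in counts]

def contrast(p):
    """Large when neighbouring pixels differ strongly in grey level."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))

def homogeneity(p):
    """Large when co-occurrences concentrate near the diagonal."""
    n = len(p)
    return sum(p[i][j] / (1 + (i - j) ** 2) for i in range(n) for j in range(n))

# Tiny 2-level image (hypothetical), horizontal neighbour offset.
img = [
    [0, 0, 1],
    [0, 1, 1],
]
p = glcm(img, levels=2)
print(p)               # [[0.25, 0.25], [0.25, 0.25]]
print(contrast(p))     # 0.5
print(homogeneity(p))  # 0.75
```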

  2. Comparison of the Predictive Performance and Interpretability of Random Forest and Linear Models on Benchmark Data Sets.

    Science.gov (United States)

    Marchese Robinson, Richard L; Palczewska, Anna; Palczewski, Jan; Kidley, Nathan

    2017-08-28

    The ability to interpret the predictions made by quantitative structure-activity relationships (QSARs) offers a number of advantages. While QSARs built using nonlinear modeling approaches, such as the popular Random Forest algorithm, might sometimes be more predictive than those built using linear modeling approaches, their predictions have been perceived as difficult to interpret. However, a growing number of approaches have been proposed for interpreting nonlinear QSAR models in general and Random Forest in particular. In the current work, we compare the performance of Random Forest to those of two widely used linear modeling approaches: linear Support Vector Machines (SVMs) (or Support Vector Regression (SVR)) and partial least-squares (PLS). We compare their performance in terms of their predictivity as well as the chemical interpretability of the predictions using novel scoring schemes for assessing heat map images of substructural contributions. We critically assess different approaches for interpreting Random Forest models as well as for obtaining predictions from the forest. We assess the models on a large number of widely employed public-domain benchmark data sets corresponding to regression and binary classification problems of relevance to hit identification and toxicology. We conclude that Random Forest typically yields comparable or possibly better predictive performance than the linear modeling approaches and that its predictions may also be interpreted in a chemically and biologically meaningful way. In contrast to earlier work looking at interpretation of nonlinear QSAR models, we directly compare two methodologically distinct approaches for interpreting Random Forest models. The approaches for interpreting Random Forest assessed in our article were implemented using open-source programs that we have made available to the community. These programs are the rfFC package ( https://r-forge.r-project.org/R/?group_id=1725 ) for the R statistical

  3. Prediction of soil CO2 flux in sugarcane management systems using the Random Forest approach

    Directory of Open Access Journals (Sweden)

    Rose Luiza Moraes Tavares

    ABSTRACT: The Random Forest algorithm is a data mining technique used for ranking attributes in order of importance for explaining the variation in a target attribute, such as soil CO2 flux. This study aimed to identify variables that predict soil CO2 flux in sugarcane management systems using the machine-learning algorithm called Random Forest. Two sugarcane areas under different management in the state of São Paulo, Brazil, were selected: burned and green. In each area, we assembled a sampling grid with 81 georeferenced points to assess soil CO2 flux using an automated portable soil gas chamber with infrared spectroscopy measurement during the dry season of 2011 and the rainy season of 2012. In addition, we sampled the soil to evaluate physical, chemical, and microbiological attributes. For data interpretation, we used the Random Forest algorithm, based on a combination of decision trees (machine learning algorithms) in which every tree depends on the values of a random vector sampled independently and with the same distribution for all trees of the forest. The results indicated that soil clay content was the most important attribute explaining CO2 flux in the studied areas during the evaluated period. The Random Forest algorithm produced a model with a good fit (R² = 0.80) between predicted and observed values.

  4. Prediction of 90Y Radioembolization Outcome from Pretherapeutic Factors with Random Survival Forests.

    Science.gov (United States)

    Ingrisch, Michael; Schöppe, Franziska; Paprottka, Karolin; Fabritius, Matthias; Strobl, Frederik F; De Toni, Enrico N; Ilhan, Harun; Todica, Andrei; Michl, Marlies; Paprottka, Philipp Marius

    2018-05-01

    Our objective was to predict the outcome of 90Y radioembolization in patients with intrahepatic tumors from pretherapeutic baseline parameters and to identify predictive variables using a machine-learning approach based on random survival forests.
    Methods: In this retrospective study, 366 patients with primary (n = 92) or secondary (n = 274) liver tumors who had received 90Y radioembolization were analyzed. A random survival forest was trained to predict individual risk from baseline values of cholinesterase, bilirubin, type of primary tumor, age at radioembolization, hepatic tumor burden, presence of extrahepatic disease, and sex. The predictive importance of each baseline parameter was determined using the minimal-depth concept, and the partial dependency of predicted risk on the continuous variables bilirubin level and cholinesterase level was determined.
    Results: Median overall survival was 11.4 mo (95% confidence interval, 9.7-14.2 mo), with 228 deaths occurring during the observation period. The random-survival-forest analysis identified baseline cholinesterase and bilirubin as the most important variables (forest-averaged lowest minimal depth, 1.2 and 1.5, respectively), followed by the type of primary tumor (1.7), age (2.4), tumor burden (2.8), and presence of extrahepatic disease (3.5). Sex had the highest forest-averaged minimal depth (5.5), indicating little predictive value. Baseline bilirubin levels above 1.5 mg/dL were associated with a steep increase in predicted mortality. Similarly, cholinesterase levels below 7.5 U predicted a strong increase in mortality. The trained random survival forest achieved a concordance index of 0.657, with an SE of 0.02, comparable to the concordance index of 0.652 and SE of 0.02 for a previously published Cox proportional hazards model.
    Conclusion: Random survival forests are a simple and straightforward machine-learning approach for prediction of overall survival. The predictive performance of the trained model
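
    The concordance index reported above measures how often a model ranks patient risk in the same order as observed survival. A minimal sketch of Harrell's C for right-censored data follows; the four patients are invented, and pairs with tied survival times are simply skipped, which is one of several conventions.

```python
def concordance_index(times, events, risks):
    """Harrell's C: share of comparable pairs ranked correctly by predicted risk.

    A pair (i, j) is comparable when subject i had the event (events[i] true)
    and subject j was still under observation later (times[j] > times[i]).
    Higher risk should go with shorter survival; tied risks count one half.
    """
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(len(times)):
            if times[j] > times[i]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable

# Hypothetical follow-up (months), death indicator and predicted risk.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.5, 0.7, 0.1]
print(concordance_index(times, events, risks))  # 0.8
```

    A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which puts the reported values of about 0.66 in context.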

  5. Variable Selection in Time Series Forecasting Using Random Forests

    Directory of Open Access Journals (Sweden)

    Hristos Tyralis

    2017-10-01

    Time series forecasting using machine learning algorithms has gained popularity recently. Random forest is a machine learning algorithm that has been applied to time series forecasting; however, most of its forecasting properties have remained unexplored. Here we focus on assessing the performance of random forests in one-step forecasting using two large datasets of short time series, with the aim of suggesting an optimal set of predictor variables. Furthermore, we compare its performance to benchmark methods. The first dataset is composed of 16,000 simulated time series from a variety of Autoregressive Fractionally Integrated Moving Average (ARFIMA) models. The second dataset consists of 135 mean annual temperature time series. The highest predictive performance of RF is observed when using a low number of recent lagged predictor variables. This outcome could be useful in relevant future applications, with the prospect of achieving higher predictive accuracy.
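
    One-step forecasting with lagged predictors, as assessed above, turns a series into a supervised table in which the previous `n_lags` values predict the next one. The helper below is a hypothetical sketch; any regressor, including a random forest, could then be fitted to `X` and `y`.

```python
def lagged_design(series, n_lags):
    """Hypothetical helper: rows of the previous n_lags values, target = next value."""
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(series[t - n_lags:t][::-1])  # most recent lag first: [x(t-1), x(t-2), ...]
        y.append(series[t])
    return X, y

temps = [14.1, 14.3, 14.0, 14.4, 14.6]  # made-up annual means
X, y = lagged_design(temps, n_lags=2)
print(X)  # [[14.3, 14.1], [14.0, 14.3], [14.4, 14.0]]
print(y)  # [14.0, 14.4, 14.6]
```

    Varying `n_lags` and comparing forecast errors is one simple way to probe the finding that a few recent lags work best.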

  6. Application of lifting wavelet and random forest in compound fault diagnosis of gearbox

    Science.gov (United States)

    Chen, Tang; Cui, Yulian; Feng, Fuzhou; Wu, Chunzhi

    2018-03-01

    To address the weak compound-fault characteristic signals of an armored vehicle gearbox and the difficulty of identifying fault types, a fault diagnosis method based on the lifting wavelet and random forest is proposed. First, this method uses the lifting wavelet transform to decompose the original vibration signal into multiple layers and reconstructs the resulting low-frequency and high-frequency components to obtain multiple component signals. Then, time-domain feature parameters are computed for each component signal to form multiple feature vectors, which are input into the random forest pattern recognition classifier to determine the compound fault type. Finally, the method is verified on a variety of compound fault data from a gearbox fault simulation test platform; the results show that the recognition accuracy of the fault diagnosis method combining the lifting wavelet and the random forest reaches up to 99.99%.
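
    The lifting wavelet transform factors a wavelet into split, predict, and update steps. The abstract does not name the wavelet, so this sketch assumes the simplest case, the Haar wavelet, applied over several levels; the reconstruction of per-layer component signals and the feature extraction are not shown, and the test signal is made up.

```python
def haar_lift(signal):
    """One lifting level: split into even/odd samples, predict, update."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    even, odd = signal[0::2], signal[1::2]
    detail = [o - e for e, o in zip(even, odd)]         # predict: odd minus its even neighbour
    approx = [e + d / 2 for e, d in zip(even, detail)]  # update: pairwise mean (low-frequency part)
    return approx, detail

def haar_unlift(approx, detail):
    """Invert one lifting level by undoing update, then predict, then merging."""
    even = [a - d / 2 for a, d in zip(approx, detail)]
    odd = [e + d for e, d in zip(even, detail)]
    merged = []
    for e, o in zip(even, odd):
        merged.extend([e, o])
    return merged

def decompose(signal, levels):
    """Repeat the lifting step on the approximation to get several layers."""
    approx, details = list(signal), []
    for _ in range(levels):
        approx, d = haar_lift(approx)
        details.append(d)
    return approx, details

x = [4, 6, 10, 12, 8, 6, 5, 5]
approx, details = decompose(x, levels=2)
print(approx)   # [8.0, 6.0]
print(details)  # [[2, 2, -2, 0], [6.0, -2.0]]
```

    Because each step is trivially invertible, the transform is lossless, and each detail layer isolates a higher-frequency band of the vibration signal.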

  7. Global patterns and predictions of seafloor biomass using random forests

    Digital Repository Service at National Institute of Oceanography (India)

    Wei, Chih-Lin; Rowe, G.T.; Escobar-Briones, E.; Boetius, A; Soltwedel, T.; Caley, M.J.; Soliman, Y.; Huettmann, F.; Qu, F.; Yu, Z.; Pitcher, C.R.; Haedrich, R.L.; Wicksten, M.K.; Rex, M.A; Baguley, J.G.; Sharma, J.; Danovaro, R.; MacDonald, I.R.; Nunnally, C.C.; Deming, J.W.; Montagna, P.; Levesque, M.; Weslawsk, J.M.; Wlodarska-Kowalczuk, M.; Ingole, B.S.; Bett, B.J.; Billett, D.S.M.; Yool, A; Bluhm, B.A; Iken, K.; Narayanaswamy, B.E.

    A comprehensive seafloor biomass and abundance database has been constructed from 24 oceanographic institutions worldwide within the Census of Marine Life (CoML) field projects. The machine-learning algorithm, Random Forests, was employed to model...

  8. Clustering Single-Cell Expression Data Using Random Forest Graphs.

    Science.gov (United States)

    Pouyan, Maziyar Baran; Nourani, Mehrdad

    2017-07-01

    Complex tissues such as brain and bone marrow are made up of multiple cell types. As the study of biological tissue structure progresses, the role of cell-type-specific research becomes increasingly important. Novel sequencing technology such as single-cell cytometry provides researchers access to valuable biological data. Applying machine-learning techniques to these high-throughput datasets provides deep insights into the cellular landscape of the tissue to which those cells belong. In this paper, we propose random-forest-based single-cell profiling, a new machine-learning-based technique, to profile different cell types of intricate tissues using single-cell cytometry data. Our technique uses random forests to capture cell marker dependencies and models the cellular populations using the cell network concept. This cellular network helps us discover which cell types are in the tissue. Our experimental results on public-domain datasets indicate promising performance and accuracy of our technique in extracting cell populations of complex tissues.

  9. Random Forests for Evaluating Pedagogy and Informing Personalized Learning

    Science.gov (United States)

    Spoon, Kelly; Beemer, Joshua; Whitmer, John C.; Fan, Juanjuan; Frazee, James P.; Stronach, Jeanne; Bohonak, Andrew J.; Levine, Richard A.

    2016-01-01

    Random forests are presented as an analytics foundation for educational data mining tasks. The focus is on course- and program-level analytics including evaluating pedagogical approaches and interventions and identifying and characterizing at-risk students. As part of this development, the concept of individualized treatment effects (ITE) is…

  10. A random forest algorithm for nowcasting of intense precipitation events

    Science.gov (United States)

    Das, Saurabh; Chakraborty, Rohit; Maitra, Animesh

    2017-09-01

    Automatic nowcasting of convective initiation and thunderstorms has potential applications in several sectors, including aviation planning and disaster management. In this paper, a random-forest-based machine learning algorithm is tested for nowcasting of convective rain with a ground-based radiometer. Brightness temperatures measured at 14 frequencies (7 frequencies in the 22-31 GHz band and 7 frequencies in the 51-58 GHz band) are used as the model inputs. The lower frequency band is associated with water vapor absorption, whereas the upper frequency band relates to oxygen absorption; hence, together they provide information on the temperature and humidity of the atmosphere. The synthetic minority over-sampling technique (SMOTE) is used to balance the data set, and 10-fold cross-validation is used to assess the performance of the model. Results indicate that the random forest algorithm with fixed alarm generation times of 30 min and 60 min performs quite well (probability of detection of all types of weather condition ∼90%) with low false alarms. It is, however, also observed that reducing the alarm generation time improves the threat score significantly and also decreases false alarms. The variable importance measure indicates that the proposed model is very sensitive to boundary layer instability. The study shows the suitability of a random forest algorithm for nowcasting applications that use a large number of input parameters from diverse sources, and the approach can be utilized in other forecasting problems.
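
    The verification scores named above come from a 2 x 2 contingency table of forecasts against observations. The sketch uses the standard definitions of probability of detection (POD), false alarm ratio (FAR), and threat score (critical success index, CSI); the abstract does not say exactly which false-alarm measure the authors used, and the counts are hypothetical.

```python
def skill_scores(hits, misses, false_alarms):
    """Return (POD, FAR, CSI) from contingency-table counts."""
    pod = hits / (hits + misses)                 # share of observed events that were forecast
    far = false_alarms / (hits + false_alarms)   # share of alarms that did not verify
    csi = hits / (hits + misses + false_alarms)  # threat score; ignores correct negatives
    return pod, far, csi

pod, far, csi = skill_scores(hits=90, misses=10, false_alarms=10)
print(round(pod, 3), round(far, 3), round(csi, 3))  # 0.9 0.1 0.818
```

    Because CSI penalizes both misses and false alarms, cutting false alarms raises it even when POD is unchanged, which fits the joint improvement in threat score and false alarms described above.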

  11. Quantifying and mapping spatial variability in simulated forest plots

    Science.gov (United States)

    Gavin R. Corral; Harold E. Burkhart

    2016-01-01

    We used computer simulations to test the efficacy of multivariate statistical methods to detect, quantify, and map spatial variability of forest stands. Simulated stands were developed of regularly-spaced plantations of loblolly pine (Pinus taeda L.). We assumed no effects of competition or mortality, but random variability was added to individual tree characteristics...

  12. Machine-learning techniques for family demography: an application of random forests to the analysis of divorce determinants in Germany

    OpenAIRE

    Arpino, Bruno; Le Moglie, Marco; Mencarini, Letizia

    2018-01-01

    Demographers often analyze the determinants of life-course events with parametric regression-type approaches. Here, we present a class of nonparametric approaches, broadly defined as machine learning (ML) techniques, and discuss advantages and disadvantages of a popular type known as random forest. We argue that random forests can be useful either as a substitute, or a complement, to more standard parametric regression modeling. Our discussion of random forests is intuitive and...

  13. Random Forest Variable Importance Spectral Indices Scheme for Burnt Forest Recovery Monitoring—Multilevel RF-VIMP

    Directory of Open Access Journals (Sweden)

    Sornkitja Boonprong

    2018-05-01

    Burnt forest recovery is normally monitored with a time-series analysis of satellite data because this approach is well suited to large observation areas. Traditional methods, such as linear correlation plotting, have been proven to be effective, as forest recovery naturally increases with time. However, these methods become complicated and time-consuming as the number of observed parameters increases. In this work, we present a random forest variable importance (RF-VIMP) scheme called multilevel RF-VIMP to compare and assess the relationship between 36 spectral indices (parameters) of burnt boreal forest recovery in the Great Xing’an Mountain, China. Six Landsat images were acquired in the same month 0, 1, 4, 14, 16, and 20 years after a fire, and 39,380 fixed-location samples were then extracted to calculate the effectiveness of the 36 parameters. The proposed method was then applied to find correlations between the forest recovery indices. The experiment showed that the proposed method is suitable for explaining the efficacy of those spectral indices in terms of discrimination and trend analysis, and for showing the satellite data and forest succession dynamics when applied in a time series. The results suggest that the tasseled cap transformation wetness, brightness, and the shortwave infrared bands (both 1 and 2) perform better than other indices for both classification and monitoring.

  14. The simultaneous use of several pseudo-random binary sequences in the identification of linear multivariable dynamic systems

    International Nuclear Information System (INIS)

    Cummins, J.D.

    1965-02-01

    With several white noise sources, the various transmission paths of a linear multivariable system may be determined simultaneously. This memorandum considers the restrictions on pseudo-random two-state sequences needed to effect simultaneous identification of several transmission paths and the consequent rejection of cross-coupled signals in linear multivariable systems. The conditions for simultaneous identification are established by an example, which shows that the integration time required is large, i.e., it tends to infinity, as it does when white noise sources are used. (author)
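Pseudo-random binary sequences of the kind discussed here are commonly generated as maximal-length sequences (m-sequences) from a linear feedback shift register. The sketch below is a generic illustration, not the memorandum's procedure. It assumes a 4-stage Fibonacci register with the primitive feedback polynomial x^4 + x^3 + 1, so the period is 2^4 - 1 = 15, and it checks the two-valued periodic autocorrelation (N at lag 0, -1 elsewhere) that lets such sequences stand in for white noise in cross-correlation identification. The function names are hypothetical.

```python
def m_sequence(nbits=4, taps=(4, 3), seed=1):
    """One period of a maximal-length PRBS from a Fibonacci LFSR.

    taps are the exponents of the feedback polynomial (here x^4 + x^3 + 1).
    """
    state = seed
    period = (1 << nbits) - 1
    bits = []
    for _ in range(period):
        bits.append(state & 1)  # output the least significant stage
        feedback = 0
        for t in taps:
            feedback ^= (state >> (nbits - t)) & 1
        state = (state >> 1) | (feedback << (nbits - 1))
    return bits


def periodic_autocorrelation(bits, lag):
    """Circular autocorrelation of the +/-1 version of a binary sequence."""
    levels = [1 - 2 * b for b in bits]  # map 0 -> +1, 1 -> -1
    n = len(levels)
    return sum(levels[i] * levels[(i + lag) % n] for i in range(n))


prbs = m_sequence()
print(prbs)  # [1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1]
print([periodic_autocorrelation(prbs, k) for k in range(4)])  # [15, -1, -1, -1]
```

The off-peak value of -1 (rather than 0) is the residual cross-talk of a finite sequence; it shrinks relative to the peak only as the register length, and hence the averaging period, grows.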

  15. Spectral Classification of Asteroids by Random Forest

    Science.gov (United States)

    Huang, C.; Ma, Y. H.; Zhao, H. B.; Lu, X. P.

    2016-09-01

    With the increasing amount of asteroid spectral and photometric data, a variety of classification methods for asteroids have been proposed. This paper classifies asteroids based on the observations of the Sloan Digital Sky Survey (SDSS) Moving Object Catalogue (MOC) by using the random forest algorithm. With training data derived from the taxonomies of Tholen, Bus, Lazzaro, and DeMeo, and from Principal Component Analysis, we classify 48642 asteroids according to their g, r, i, and z SDSS magnitudes. In this way, the asteroids are divided into 8 spectral classes (C, X, S, B, D, K, L, and V).

  16. RandomForest4Life: a Random Forest for predicting ALS disease progression.

    Science.gov (United States)

    Hothorn, Torsten; Jung, Hans H

    2014-09-01

    We describe a method for predicting disease progression in amyotrophic lateral sclerosis (ALS) patients. The method was developed as a submission to the DREAM Phil Bowen ALS Prediction Prize4Life Challenge of summer 2012. Based on repeated patient examinations over a three-month period, we used a random forest algorithm to predict future disease progression. The procedure was set up and internally evaluated using data from 1197 ALS patients. External validation by an expert jury was based on undisclosed information on an additional 625 patients; all patient data were obtained from the PRO-ACT database. In terms of prediction accuracy, the approach described here ranked third best. Our interpretation of the prediction model confirmed previous reports suggesting that past disease progression is a strong predictor of future disease progression measured on the ALS functional rating scale (ALSFRS). We also found that larger variability in initial ALSFRS scores is linked to faster future disease progression. The results reported here further suggest that approaches taking the multidimensionality of the ALSFRS into account show some potential for improved ALS disease prediction.

  17. Multivariate Feature Selection of Image Descriptors Data for Breast Cancer with Computer-Assisted Diagnosis

    Directory of Open Access Journals (Sweden)

    Carlos E. Galván-Tejada

    2017-02-01

    Breast cancer is an important global health problem and the most common type of cancer among women. Late diagnosis significantly decreases the patient's survival rate; however, mammography for early detection has been demonstrated to be a very important tool for increasing the survival rate. The purpose of this paper is to obtain a multivariate model to classify benign and malignant tumor lesions using computer-assisted diagnosis with a genetic algorithm in training and test datasets from mammography image features. A multivariate search was conducted to obtain predictive models with different approaches, in order to compare and validate results. The multivariate models were constructed using Random Forest, Nearest centroid, and K-Nearest Neighbor (K-NN) strategies as cost functions in a genetic algorithm applied to the features in the BCDR public databases. Results suggest that the two texture descriptor features obtained in the multivariate model have similar or better prediction capability for classifying the data outcome compared with the multivariate model composed of all the features, according to their fitness value. This model can help to reduce the workload of radiologists and provide a second opinion in the classification of tumor lesions.

  18. Multivariate Feature Selection of Image Descriptors Data for Breast Cancer with Computer-Assisted Diagnosis.

    Science.gov (United States)

    Galván-Tejada, Carlos E; Zanella-Calzada, Laura A; Galván-Tejada, Jorge I; Celaya-Padilla, José M; Gamboa-Rosales, Hamurabi; Garza-Veloz, Idalia; Martinez-Fierro, Margarita L

    2017-02-14

    Breast cancer is an important global health problem and the most common type of cancer among women. Late diagnosis significantly decreases the patient's survival rate; however, mammography for early detection has been demonstrated to be a very important tool for increasing the survival rate. The purpose of this paper is to obtain a multivariate model to classify benign and malignant tumor lesions using computer-assisted diagnosis with a genetic algorithm in training and test datasets from mammography image features. A multivariate search was conducted to obtain predictive models with different approaches, in order to compare and validate results. The multivariate models were constructed using Random Forest, Nearest centroid, and K-Nearest Neighbor (K-NN) strategies as cost functions in a genetic algorithm applied to the features in the BCDR public databases. Results suggest that the two texture descriptor features obtained in the multivariate model have similar or better prediction capability for classifying the data outcome compared with the multivariate model composed of all the features, according to their fitness value. This model can help to reduce the workload of radiologists and provide a second opinion in the classification of tumor lesions.

  19. Differential privacy-based evaporative cooling feature selection and classification with relief-F and random forests.

    Science.gov (United States)

    Le, Trang T; Simmons, W Kyle; Misaki, Masaya; Bodurka, Jerzy; White, Bill C; Savitz, Jonathan; McKinney, Brett A

    2017-09-15

    Classification of individuals into disease or clinical categories from high-dimensional biological data with low prediction error is an important challenge of statistical learning in bioinformatics. Feature selection can improve classification accuracy but must be incorporated carefully into cross-validation to avoid overfitting. Recently, feature selection methods based on differential privacy, such as differentially private random forests and reusable holdout sets, have been proposed. However, for domains such as bioinformatics, where the number of features is much larger than the number of observations (p≫n), these differential privacy methods are susceptible to overfitting. We introduce private Evaporative Cooling, a stochastic privacy-preserving machine learning algorithm that uses Relief-F for feature selection and random forest for privacy-preserving classification, and that also prevents overfitting. We relate the privacy-preserving threshold mechanism to a thermodynamic Maxwell-Boltzmann distribution, where the temperature represents the privacy threshold. We use the thermal statistical physics concept of Evaporative Cooling of atomic gases to perform backward stepwise privacy-preserving feature selection. On simulated data with main effects and statistical interactions, we compare accuracies on holdout and validation sets for three privacy-preserving methods: the reusable holdout, the reusable holdout with random forest, and private Evaporative Cooling, which uses Relief-F feature selection and random forest classification. In simulations where interactions exist between attributes, private Evaporative Cooling provides higher classification accuracy without overfitting, based on an independent validation set. In simulations without interactions, thresholdout with random forest and private Evaporative Cooling give comparable accuracies. We also apply these privacy methods to human brain resting-state fMRI data from a study of major depressive disorder. Code
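The reusable holdout ("thresholdout") used as a baseline in this study can be sketched in a few lines. This is a simplified illustration of the mechanism as it is usually described in the differential-privacy literature, not the authors' private Evaporative Cooling code: the training-set estimate is returned unless it differs from the holdout estimate by more than a noisy threshold, in which case a Laplace-noised holdout estimate is returned and the threshold noise is redrawn. The class name, the parameter values, and the omission of an overall query budget are assumptions.

```python
import random


def laplace_noise(rng, scale):
    """Laplace(0, scale) sample as the difference of two i.i.d. exponentials."""
    if scale == 0:
        return 0.0
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


class Thresholdout:
    """Simplified reusable-holdout mechanism (query budget omitted)."""

    def __init__(self, threshold=0.04, sigma=0.01, seed=0):
        self.rng = random.Random(seed)
        self.threshold = threshold
        self.sigma = sigma
        self.noisy_threshold = threshold + laplace_noise(self.rng, 2 * sigma)

    def query(self, train_value, holdout_value):
        """Answer one query, e.g. an accuracy estimate for one candidate feature set."""
        gap = abs(train_value - holdout_value)
        if gap > self.noisy_threshold + laplace_noise(self.rng, 4 * self.sigma):
            # Training estimate looks overfit: reveal a noisy holdout estimate instead
            self.noisy_threshold = self.threshold + laplace_noise(self.rng, 2 * self.sigma)
            return holdout_value + laplace_noise(self.rng, self.sigma)
        # Estimates agree within the noisy threshold: return the training estimate
        return train_value


# sigma=0 removes all noise (and all privacy); used here only to make the logic visible.
mech = Thresholdout(threshold=0.05, sigma=0.0)
print(mech.query(0.80, 0.78))  # 0.8 (gap 0.02 is below the threshold)
print(mech.query(0.95, 0.70))  # 0.7 (gap 0.25 exceeds the threshold)
```

The holdout is only "spent" when the two estimates disagree, which is why the mechanism can answer many adaptive queries; the abstract's point is that with p≫n this protection alone may still not prevent overfitting.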

  20. Statistical Uncertainty Estimation Using Random Forests and Its Application to Drought Forecast

    Directory of Open Access Journals (Sweden)

    Junfei Chen

    2012-01-01

    Drought is part of natural climate variability and ranks first among natural disasters worldwide. Drought forecasting plays an important role in mitigating impacts on agriculture and water resources. In this study, a drought forecast model based on the random forest method is proposed to predict the time series of the monthly standardized precipitation index (SPI). We demonstrate the model's application at four stations in the Haihe river basin, China. The random forest (RF)-based forecast model consistently showed better predictive skill than the ARIMA model for both long- and short-term drought forecasting. The confidence intervals derived from the proposed model generally have good coverage, but they still tend to be conservative when predicting some extreme drought events.

  1. A comprehensive comparison of random forests and support vector machines for microarray-based cancer classification

    Directory of Open Access Journals (Sweden)

    Wang Lily

    2008-07-01

    Background: Cancer diagnosis and clinical outcome prediction are among the most important emerging applications of gene expression microarray technology, with several molecular signatures on their way toward clinical deployment. Using the most accurate classification algorithms available for microarray gene expression data is critical for developing the best possible molecular signatures for patient care. As suggested by a large body of literature to date, support vector machines can be considered "best of class" algorithms for classification of such data. Recent work, however, suggests that random forest classifiers may outperform support vector machines in this domain. Results: In the present paper we identify methodological biases of prior work comparing random forests and support vector machines and conduct a new rigorous evaluation of the two algorithms that corrects these limitations. Our experiments use 22 diagnostic and prognostic datasets and show that support vector machines outperform random forests, often by a large margin. Our data also underline the importance of sound research design in benchmarking and comparison of bioinformatics algorithms. Conclusion: We found that both on average and in the majority of microarray datasets, random forests are outperformed by support vector machines, both when no gene selection is performed and when several popular gene selection methods are used.

  2. Comparing spatial regression to random forests for large environmental data sets

    Science.gov (United States)

    Environmental data may be “large” due to number of records, number of covariates, or both. Random forests has a reputation for good predictive performance when using many covariates, whereas spatial regression, when using reduced rank methods, has a reputatio...

  3. Random forests of interaction trees for estimating individualized treatment effects in randomized trials.

    Science.gov (United States)

    Su, Xiaogang; Peña, Annette T; Liu, Lei; Levine, Richard A

    2018-04-29

    Assessing heterogeneous treatment effects is a growing interest in advancing precision medicine. Individualized treatment effects (ITEs) play a critical role in such an endeavor. Concerning experimental data collected from randomized trials, we put forward a method, termed random forests of interaction trees (RFIT), for estimating ITE on the basis of interaction trees. To this end, we propose a smooth sigmoid surrogate method, as an alternative to greedy search, to speed up tree construction. The RFIT outperforms the "separate regression" approach in estimating ITE. Furthermore, standard errors for the estimated ITE via RFIT are obtained with the infinitesimal jackknife method. We assess and illustrate the use of RFIT via both simulation and the analysis of data from an acupuncture headache trial. Copyright © 2018 John Wiley & Sons, Ltd.

  4. Random forest to differentiate dementia with Lewy bodies from Alzheimer's disease

    NARCIS (Netherlands)

    Dauwan, Meenakshi; van der Zande, Jessica J.; van Dellen, Edwin; Sommer, Iris E C; Scheltens, Philip; Lemstra, Afina W.; Stam, Cornelis J.

    2016-01-01

    Introduction The aim of this study was to build a random forest classifier to improve the diagnostic accuracy in differentiating dementia with Lewy bodies (DLB) from Alzheimer's disease (AD) and to quantify the relevance of multimodal diagnostic measures, with a focus on electroencephalography

  5. Comparing spatial regression to random forests for large ...

    Science.gov (United States)

    Environmental data may be “large” due to number of records, number of covariates, or both. Random forests has a reputation for good predictive performance when using many covariates, whereas spatial regression, when using reduced rank methods, has a reputation for good predictive performance when using many records. In this study, we compare these two techniques using a data set containing the macroinvertebrate multimetric index (MMI) at 1859 stream sites with over 200 landscape covariates. Our primary goal is predicting MMI at over 1.1 million perennial stream reaches across the USA. For spatial regression modeling, we develop two new methods to accommodate large data: (1) a procedure that estimates optimal Box-Cox transformations to linearize covariate relationships; and (2) a computationally efficient covariate selection routine that takes into account spatial autocorrelation. We show that our new methods lead to cross-validated performance similar to random forests, but that there is an advantage for spatial regression when quantifying the uncertainty of the predictions. Simulations are used to clarify advantages for each method. This research investigates different approaches for modeling and mapping national stream condition. We use MMI data from the EPA's National Rivers and Streams Assessment and predictors from StreamCat (Hill et al., 2015). Previous studies have focused on modeling the MMI condition classes (i.e., good, fair, and po

  6. Pigmented skin lesion detection using random forest and wavelet-based texture

    Science.gov (United States)

    Hu, Ping; Yang, Tie-jun

    2016-10-01

    The incidence of cutaneous malignant melanoma, a disease of worldwide distribution and the deadliest form of skin cancer, has been rapidly increasing over the last few decades. Because advanced cutaneous melanoma is still incurable, early detection is an important step toward reducing mortality. Dermoscopy photographs are commonly used in melanoma diagnosis and can capture detailed features of a lesion. The visual appearance of pigmented skin lesions varies greatly. Therefore, to minimize the diagnostic errors that result from the difficulty and subjectivity of visual interpretation, an automatic detection approach is required. The objectives of this paper were to propose a hybrid method using random forest and the Gabor wavelet transform to accurately distinguish lesion from non-lesion areas in dermoscopy photographs, and to analyze segmentation accuracy. A random forest classifier consisting of a set of decision trees was used for classification. Gabor wavelets are a mathematical model of the visual cortical cells of the mammalian brain, and they can be used to decompose an image into multiple scales and multiple orientations. The Gabor function is recognized as a very useful tool in texture analysis because of its optimal localization properties in both the spatial and frequency domains. Texture features based on the Gabor wavelet transform are computed from the Gabor-filtered image. Experimental results indicate that (1) the proposed random-forest-based algorithm outperformed the state of the art in pigmented skin lesion detection, and (2) the inclusion of Gabor-wavelet-based texture features improved segmentation accuracy significantly.
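A Gabor filter bank of the kind described above can be sketched as follows. This pure-Python illustration builds the real part of a standard 2D Gabor kernel (a Gaussian envelope multiplied by a cosine carrier) and varies the orientation `theta` and wavelength `lam` to cover multiple orientations and scales. The kernel size, `sigma`, `gamma`, the bank layout, and the feature definition (absolute filter response at a patch centre) are assumptions; the abstract does not give the paper's filter settings or feature statistics.

```python
import math


def gabor_kernel(size, sigma, theta, lam, gamma=0.5, psi=0.0):
    """Real part of a 2D Gabor kernel sampled on a size x size grid (size odd)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the carrier runs along orientation theta
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            row.append(envelope * math.cos(2 * math.pi * xr / lam + psi))
        kernel.append(row)
    return kernel


def filter_bank(size=7, wavelengths=(4.0, 8.0), n_orientations=4):
    """Kernels at several scales (wavelengths) and orientations."""
    bank = []
    for lam in wavelengths:
        for k in range(n_orientations):
            theta = k * math.pi / n_orientations
            bank.append(gabor_kernel(size, sigma=0.5 * lam, theta=theta, lam=lam))
    return bank


def texture_features(patch, bank):
    """Hypothetical feature vector: |filter response| at the patch centre, per kernel.

    patch must have the same size as the kernels.
    """
    return [
        abs(sum(p * k for prow, krow in zip(patch, kernel) for p, k in zip(prow, krow)))
        for kernel in bank
    ]


# Hypothetical 7x7 patch with vertical stripes of period 4 pixels.
patch = [[1.0 if x % 4 < 2 else 0.0 for x in range(7)] for _ in range(7)]
features = texture_features(patch, filter_bank())
```

Each pixel's feature vector (one entry per scale and orientation) would then be passed to the random forest as input to the lesion/non-lesion classifier.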

  7. Random Forest-Based Recognition of Isolated Sign Language Subwords Using Data from Accelerometers and Surface Electromyographic Sensors.

    Science.gov (United States)

    Su, Ruiliang; Chen, Xiang; Cao, Shuai; Zhang, Xu

    2016-01-14

    Sign language recognition (SLR) has been widely used for communication amongst the hearing-impaired and non-verbal community. This paper proposes an accurate and robust SLR framework using an improved decision tree as the base classifier of random forests. This framework was used to recognize Chinese sign language subwords using recordings from a pair of portable devices worn on both arms, consisting of accelerometers (ACC) and surface electromyography (sEMG) sensors. The experimental results demonstrated the validity of the proposed random forest-based method for recognition of Chinese sign language (CSL) subwords. With the proposed method, 98.25% average accuracy was obtained for the classification of a list of 121 frequently used CSL subwords. Moreover, the random forest method demonstrated superior performance in resisting the impact of bad training samples. When the proportion of bad samples in the training set reached 50%, the recognition error rate of the random forest-based method was only 10.67%, while that of a single decision tree adopted in our previous work was almost 27.5%. Our study offers a practical way of realizing robust and wearable EMG-ACC-based SLR systems.

  8. Spectral Classification of Asteroids by Random Forest

    Science.gov (United States)

    Huang, Chao; Ma, Yue-hua; Zhao, Hai-bin; Lu, Xiao-ping

    2017-10-01

    With the increasing spectral and photometric data of asteroids, a variety of classification methods for asteroids have been proposed. This paper classifies asteroids based on the observations in the Sloan Digital Sky Survey (SDSS) Moving Object Catalogue (MOC) by using the random forest algorithm. In combination with the present taxonomies of Tholen, Bus, Lazzaro, and DeMeo, and the principal component analysis, we have classified 48642 asteroids according to their SDSS magnitudes at the g, r, i, and z wavebands. In this way, these asteroids are divided into 8 (C, X, S, B, D, K, L, and V) classes.

  9. A comparison of the conditional inference survival forest model to random survival forests based on a simulation study as well as on two applications with time-to-event data.

    Science.gov (United States)

    Nasejje, Justine B; Mwambi, Henry; Dheda, Keertan; Lesosky, Maia

    2017-07-28

    Random survival forest (RSF) models have been identified as alternative methods to the Cox proportional hazards model in analysing time-to-event data. These methods, however, have been criticised for the bias that results from favouring covariates with many split-points, and hence conditional inference forests for time-to-event data have been suggested. Conditional inference forests (CIF) are known to correct the bias in RSF models by separating the procedure for selecting the best covariate to split on from the search for the best split point for the selected covariate. In this study, we compare the random survival forest model to the conditional inference forest (CIF) model using twenty-two simulated time-to-event datasets. We also analysed two real time-to-event datasets. The first dataset is based on the survival of children under five years of age in Uganda and consists of categorical covariates, most of which have more than two levels (many split-points). The second dataset is based on the survival of patients with extensively drug-resistant tuberculosis (XDR TB) and consists mainly of categorical covariates with two levels (few split-points). The study findings indicate that the conditional inference forest model is superior to random survival forest models for time-to-event data with covariates that have many split-points, based on bootstrap cross-validated estimates of the integrated Brier score. However, conditional inference forests perform comparably to random survival forest models for time-to-event data with covariates that have fewer split-points. Although survival forests are promising methods for analysing time-to-event data, it is important to identify the best forest model for the analysis based on the nature of the covariates in the dataset in question.
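The comparison in this study rests on the integrated Brier score. The sketch below shows the basic computation under a simplifying assumption that the study does not make: no censoring, so every subject's event time is observed. Real survival data require inverse-probability-of-censoring weights, and the study additionally uses bootstrap cross-validation. The Brier score at time t is the mean squared difference between the observed status 1{T_i > t} and the predicted survival probability S(t | x_i); integrating it over a time grid with the trapezoidal rule and dividing by the grid span gives the integrated score. Function names and data are hypothetical.

```python
def brier_score(event_times, predicted_survival, t):
    """Brier score at time t, assuming no censoring (all event times observed)."""
    n = len(event_times)
    return sum(
        ((1.0 if T > t else 0.0) - s) ** 2
        for T, s in zip(event_times, predicted_survival)
    ) / n


def integrated_brier_score(event_times, survival_fn, grid):
    """Trapezoidal integral of the Brier score over grid, divided by the grid span.

    survival_fn(i, t) returns the model's predicted S(t | x_i) for subject i.
    """
    scores = [
        brier_score(event_times, [survival_fn(i, t) for i in range(len(event_times))], t)
        for t in grid
    ]
    area = sum(
        (grid[k + 1] - grid[k]) * (scores[k] + scores[k + 1]) / 2
        for k in range(len(grid) - 1)
    )
    return area / (grid[-1] - grid[0])


# Hypothetical toy data: event times for four subjects and a time grid.
times = [2.0, 4.0, 6.0, 8.0]
grid = [1.0, 3.0, 5.0, 7.0]
perfect = integrated_brier_score(times, lambda i, t: 1.0 if times[i] > t else 0.0, grid)
uninformative = integrated_brier_score(times, lambda i, t: 0.5, grid)
print(perfect, uninformative)  # 0.0 0.25
```

Lower is better: a perfect predictor scores 0, and a model that always predicts 0.5 scores 0.25, a common reference point when reading reported integrated Brier scores.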

  10. A comparison of the conditional inference survival forest model to random survival forests based on a simulation study as well as on two applications with time-to-event data

    Directory of Open Access Journals (Sweden)

    Justine B. Nasejje

    2017-07-01

    Background: Random survival forest (RSF) models have been identified as alternative methods to the Cox proportional hazards model in analysing time-to-event data. These methods, however, have been criticised for the bias that results from favouring covariates with many split-points, and hence conditional inference forests for time-to-event data have been suggested. Conditional inference forests (CIF) are known to correct the bias in RSF models by separating the procedure for selecting the best covariate to split on from the search for the best split point for the selected covariate. Methods: In this study, we compare the random survival forest model to the conditional inference forest (CIF) model using twenty-two simulated time-to-event datasets. We also analysed two real time-to-event datasets. The first dataset is based on the survival of children under five years of age in Uganda and consists of categorical covariates, most of which have more than two levels (many split-points). The second dataset is based on the survival of patients with extensively drug-resistant tuberculosis (XDR TB) and consists mainly of categorical covariates with two levels (few split-points). Results: The study findings indicate that the conditional inference forest model is superior to random survival forest models for time-to-event data with covariates that have many split-points, based on bootstrap cross-validated estimates of the integrated Brier score. However, conditional inference forests perform comparably to random survival forest models for time-to-event data with covariates that have fewer split-points. Conclusion: Although survival forests are promising methods for analysing time-to-event data, it is important to identify the best forest model for the analysis based on the nature of the covariates in the dataset in question.

  11. 3D Semantic Labeling of ALS Data Based on Domain Adaption by Transferring and Fusing Random Forest Models

    Science.gov (United States)

    Wu, J.; Yao, W.; Zhang, J.; Li, Y.

    2018-04-01

    Labeling 3D point cloud data with traditional supervised learning methods requires a considerable number of labelled samples, whose collection is costly and time-consuming. This work adopts the domain adaptation concept to transfer random forest classifiers trained on existing data (the source domain) to new data scenes (the target domain), with the aim of reducing the dependence of accurate 3D semantic labeling in point clouds on training samples from the new data scene. First, two random forest classifiers were trained with samples previously collected for other data. They differ in their decision tree construction algorithms: C4.5 with the information gain ratio and CART with the Gini index. Second, four random forest classifiers adapted to the target domain were derived by transferring each tree in the source random forest models with two types of operations: structure expansion and reduction (SER) and structure transfer (STRUT). Finally, points in the target domain were labelled by fusing the four newly derived random forest classifiers with a weights-of-evidence-based fusion model. To validate the method, experimental analysis was conducted using three datasets: one used as the source domain data (the Vaihingen data for 3D Semantic Labelling) and two used as target domain data from two cities in China (Jinmen city and Dunhuang city). Overall accuracies of 85.5% and 83.3% for 3D labelling were achieved for the Jinmen city and Dunhuang city data, respectively, using only one third of the newly labelled samples needed without domain adaptation.
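The two source forests differ in their split criteria: C4.5 uses the information gain ratio and CART uses the Gini index. The sketch below computes both criteria for one candidate split of class labels. It is a textbook illustration of the criteria only, not the paper's SER/STRUT transfer procedure, and the function names and example labels are hypothetical.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini_decrease(parent, children):
    """CART criterion: impurity decrease from parent to size-weighted children."""
    n = len(parent)
    return gini(parent) - sum(len(ch) / n * gini(ch) for ch in children)


def gain_ratio(parent, children):
    """C4.5 criterion: information gain divided by the split information."""
    n = len(parent)
    info_gain = entropy(parent) - sum(len(ch) / n * entropy(ch) for ch in children)
    split_info = -sum((len(ch) / n) * math.log2(len(ch) / n) for ch in children if ch)
    return info_gain / split_info if split_info > 0 else 0.0


parent = ["ground"] * 4 + ["building"] * 4
pure_split = [["ground"] * 4, ["building"] * 4]
mixed_split = [["ground"] * 3 + ["building"], ["ground"] + ["building"] * 3]
print(gini_decrease(parent, pure_split), gain_ratio(parent, pure_split))  # 0.5 1.0
print(gini_decrease(parent, mixed_split) < 0.5)  # True
```

Dividing by the split information is what distinguishes C4.5 from plain information gain: it penalises splits into many small branches, so the two criteria can rank the same candidate splits differently, which is one reason the two forests produce structurally different trees.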

  12. An Improved Fast Compressive Tracking Algorithm Based on Online Random Forest Classifier

    Directory of Open Access Journals (Sweden)

    Xiong Jintao

    2016-01-01

    The fast compressive tracking (FCT) algorithm, proposed in recent years, is a simple and efficient algorithm. However, it has difficulty dealing with factors such as occlusion, appearance changes, and pose variation. There are two reasons: first, although the naive Bayes classifier is fast to train, it is not robust to noise; second, its parameters must be adapted to each specific environment for accurate tracking. In this paper, we propose an improved fast compressive tracking algorithm based on an online random forest (FCT-ORF) for robust visual tracking. First, we combine ideas from adaptive compressive sensing theory regarding weighted random projection to exploit both local and discriminative information of the object. Second, we use an online random forest classifier for online tracking, which is shown to be more robust to noise and computationally efficient. The experimental results show that the proposed algorithm performs better than the original fast compressive tracking algorithm under occlusion, appearance changes, and pose variation.

  13. TEHRAN AIR POLLUTANTS PREDICTION BASED ON RANDOM FOREST FEATURE SELECTION METHOD

    Directory of Open Access Journals (Sweden)

    A. Shamsoddini

    2017-09-01

    Air pollution, one of the most serious forms of environmental pollution, poses a huge threat to human life. Air pollution leads to environmental instability and has harmful and undesirable effects on the environment. Modern methods for predicting pollutant concentrations can improve decision making and provide appropriate solutions. This study examines the performance of Random Forest feature selection in combination with multiple linear regression and Multilayer Perceptron Artificial Neural Network methods, in order to achieve an efficient model for estimating carbon monoxide, nitrogen dioxide, sulfur dioxide, and PM2.5 concentrations in the air. The results indicated that Artificial Neural Networks fed with the attributes selected by the Random Forest feature selection method were more accurate than the other models for all pollutants. The estimation accuracy for sulfur dioxide emissions was lower than for the other air contaminants, whereas nitrogen dioxide was predicted more accurately than the other pollutants.

  14. Tehran Air Pollutants Prediction Based on Random Forest Feature Selection Method

    Science.gov (United States)

    Shamsoddini, A.; Aboodi, M. R.; Karami, J.

    2017-09-01

    Air pollution, one of the most serious forms of environmental pollution, poses a huge threat to human life. Air pollution leads to environmental instability and has harmful and undesirable effects on the environment. Modern methods for predicting pollutant concentrations can improve decision making and provide appropriate solutions. This study examines the performance of Random Forest feature selection in combination with multiple linear regression and Multilayer Perceptron Artificial Neural Network methods, in order to achieve an efficient model for estimating carbon monoxide, nitrogen dioxide, sulfur dioxide, and PM2.5 concentrations in the air. The results indicated that Artificial Neural Networks fed with the attributes selected by the Random Forest feature selection method were more accurate than the other models for all pollutants. The estimation accuracy for sulfur dioxide emissions was lower than for the other air contaminants, whereas nitrogen dioxide was predicted more accurately than the other pollutants.

  15. The Efficiency of Random Forest Method for Shoreline Extraction from LANDSAT-8 and GOKTURK-2 Imageries

    Science.gov (United States)

    Bayram, B.; Erdem, F.; Akpinar, B.; Ince, A. K.; Bozkurt, S.; Catal Reis, H.; Seker, D. Z.

    2017-11-01

    Coastal monitoring plays a vital role in environmental planning and hazard management. Since shorelines are fundamental data for environmental management, disaster management, coastal erosion studies, and the modelling of sediment transport and coastal morphodynamics, various techniques have been developed to extract them. Random Forest, one of these techniques, is used in this study for shoreline extraction. This algorithm is a machine learning method based on decision trees, which analyse the classes of the training data and create rules for classification. In this study, the Terkos region was chosen for the proposed method within the scope of the TUBITAK Project (Project No: 115Y718) titled "Integration of Unmanned Aerial Vehicles for Sustainable Coastal Zone Monitoring Model - Three-Dimensional Automatic Coastline Extraction and Analysis: Istanbul-Terkos Example". The Random Forest algorithm was implemented to extract the shoreline of the Black Sea near the lake from LANDSAT-8 and GOKTURK-2 satellite imagery acquired in 2015. The MATLAB environment was used for classification. To obtain land and water-body classes, the Random Forest method was applied to the NIR bands of the LANDSAT-8 (5th band) and GOKTURK-2 (4th band) imagery. Each image was also digitized manually to obtain reference shorelines for accuracy assessment. According to the accuracy assessment results, the Random Forest method is efficient for shoreline extraction from both medium- and high-resolution images.

  16. THE EFFICIENCY OF RANDOM FOREST METHOD FOR SHORELINE EXTRACTION FROM LANDSAT-8 AND GOKTURK-2 IMAGERIES

    Directory of Open Access Journals (Sweden)

    B. Bayram

    2017-11-01

  17. Mapping Deforestation in North Korea Using Phenology-Based Multi-Index and Random Forest

    Directory of Open Access Journals (Sweden)

    Yihua Jin

    2016-12-01

    Phenology-based multi-index with the random forest (RF) algorithm can be used to overcome the shortcomings of traditional deforestation mapping that relies on pixel-based classification, such as ISODATA or decision trees, and on single images. The purpose of this study was to investigate methods to identify specific types of deforestation in North Korea, and to increase classification accuracy, using phenological characteristics extracted with multi-index and random forest algorithms. The deforestation area was mapped with RF by merging phenology-based multi-indices (i.e., normalized difference vegetation index (NDVI), normalized difference water index (NDWI), and normalized difference soil index (NDSI)) derived from MODIS (Moderate Resolution Imaging Spectroradiometer) products with topographical variables. Our results showed an overall classification accuracy of 89.38%, with a corresponding kappa coefficient of 0.87. In particular, for forest and farmland categories with similar phenological characteristics (e.g., paddy, plateau vegetation, unstocked forest, hillside field), this approach improved the classification accuracy in comparison with pixel-based methods and other classes. The deforestation types were identified by incorporating point data from high-resolution imagery, outcomes of image classification, and slope data. Our study demonstrated that the proposed methodology could be used for deciding on restoration priorities and monitoring the expansion of deforestation areas.
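    The indices named above are normalized differences of two spectral bands. A minimal Python sketch, assuming reflectance inputs and Gao's NIR/SWIR formulation for NDWI (the abstract does not state which bands were used, and NDSI definitions vary, so NDSI is left out). Cohen's kappa, the agreement statistic reported above, is computed from a confusion matrix.

```python
def normalized_difference(a, b):
    """Generic normalized difference (a - b) / (a + b); returns 0.0 when a + b == 0."""
    s = a + b
    return 0.0 if s == 0 else (a - b) / s


def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red)."""
    return normalized_difference(nir, red)


def ndwi_gao(nir, swir):
    """NDWI in Gao's (1996) form; assumed here, the study's band choice is not stated."""
    return normalized_difference(nir, swir)


def cohens_kappa(matrix):
    """Cohen's kappa from a square confusion matrix (rows: reference, columns: predicted)."""
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(k)) / total
    # Chance agreement from the row (reference) and column (predicted) marginals.
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(k)
    ) / total ** 2
    return (observed - expected) / (1 - expected)


print(ndvi(0.5, 0.1), cohens_kappa([[20, 5], [10, 15]]))
```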

  18. A Valid Matérn Class of Cross-Covariance Functions for Multivariate Random Fields With Any Number of Components

    KAUST Repository

    Apanasovich, Tatiyana V.; Genton, Marc G.; Sun, Ying

    2012-01-01

    We introduce a valid parametric family of cross-covariance functions for multivariate spatial random fields where each component has a covariance function from a well-celebrated Matérn class. Unlike previous attempts, our model indeed allows
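    For reference, the univariate Matérn covariance that each component uses has simple closed forms when the smoothness ν is a half-integer. The sketch below assumes the common parameterization in which the distance is scaled by the range ρ and by sqrt(2ν); the paper's parameterization may differ, and its multivariate cross-covariance construction is not reproduced here.

```python
import math


def matern(h, sigma2, rho, nu):
    """Matérn covariance at distance h for half-integer smoothness nu in {0.5, 1.5, 2.5}.

    sigma2 is the marginal variance and rho the range parameter (assumed parameterization).
    """
    if h < 0:
        raise ValueError("distance must be non-negative")
    r = h / rho
    if nu == 0.5:
        # Exponential covariance.
        return sigma2 * math.exp(-r)
    if nu == 1.5:
        s = math.sqrt(3) * r
        return sigma2 * (1 + s) * math.exp(-s)
    if nu == 2.5:
        s = math.sqrt(5) * r
        # 1 + sqrt(5) r + 5 r^2 / 3, written in terms of s = sqrt(5) r.
        return sigma2 * (1 + s + s * s / 3) * math.exp(-s)
    raise ValueError("closed form only implemented for nu in {0.5, 1.5, 2.5}")


print([round(matern(h, 1.0, 1.0, 1.5), 4) for h in (0.0, 0.5, 1.0, 2.0)])
```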

  19. Comparison of ARIMA and Random Forest time series models for prediction of avian influenza H5N1 outbreaks.

    Science.gov (United States)

    Kane, Michael J; Price, Natalie; Scotch, Matthew; Rabinowitz, Peter

    2014-08-13

    Time series models can play an important role in disease prediction. Incidence data can be used to predict the future occurrence of disease events. Developments in modeling approaches provide an opportunity to compare different time series models for predictive power. We applied ARIMA and Random Forest time series models to incidence data of outbreaks of highly pathogenic avian influenza (H5N1) in Egypt, available through the online EMPRES-I system. We found that the Random Forest model outperformed the ARIMA model in predictive ability. Furthermore, we found that the Random Forest model is effective for predicting outbreaks of H5N1 in Egypt. Random Forest time series modeling provides enhanced predictive ability over existing time series models for the prediction of infectious disease outbreaks. This result, along with those showing the concordance between bird and human outbreaks (Rabinowitz et al. 2012), provides a new approach to predicting these dangerous outbreaks in bird populations based on existing, freely available data. Our analysis uncovers the time-series structure of outbreak severity for highly pathogenic avian influenza (H5N1) in Egypt.
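    Random Forest has no built-in notion of time, so a common way to apply it to a series is to predict each value from the preceding ones. The following Python sketch assumes that lagged counts are the only features; the study's actual feature set is not described in the abstract, and `make_lagged` is a hypothetical helper.

```python
def make_lagged(series, n_lags):
    """Turn a univariate series into (X, y): each target is predicted from the previous n_lags values."""
    if n_lags < 1:
        raise ValueError("n_lags must be >= 1")
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(list(series[t - n_lags:t]))  # the window of past observations
        y.append(series[t])                   # the value to forecast
    return X, y


# Hypothetical weekly outbreak counts.
X, y = make_lagged([3, 1, 4, 1, 5], 2)
print(X, y)
```

    The resulting rows can be fed to any regression forest. Train/test splits for such data should be chronological, so that the model never sees observations from the period it is asked to forecast.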

  20. 3D SEMANTIC LABELING OF ALS DATA BASED ON DOMAIN ADAPTION BY TRANSFERRING AND FUSING RANDOM FOREST MODELS

    Directory of Open Access Journals (Sweden)

    J. Wu

    2018-04-01

    Labeling 3D point cloud data with traditional supervised learning methods requires a considerable number of labelled samples, which are costly and time-consuming to collect. This work adopts the domain adaptation concept to transfer existing trained random forest classifiers (source domain) to new data scenes (target domain), with the aim of reducing the dependence of accurate 3D semantic labeling of point clouds on training samples from the new scene. First, two random forest classifiers were trained with existing samples previously collected for other data. They differ in their decision tree construction algorithms: C4.5 with the information gain ratio and CART with the Gini index. Second, four random forest classifiers adapted to the target domain were derived by transferring each tree in the source random forest models with two types of operations: structure expansion and reduction (SER) and structure transfer (STRUT). Finally, points in the target domain were labelled by fusing the four newly derived random forest classifiers with a weights-of-evidence-based fusion model. To validate the method, experiments were conducted on three datasets: one used as the source domain data (the Vaihingen data for 3D Semantic Labelling) and two used as target domain data from two cities in China (Jinmen and Dunhuang). Overall accuracies of 85.5% and 83.3% for 3D labelling were achieved for the Jinmen and Dunhuang data, respectively, with only one third of the newly labelled samples needed in the cases without domain adaptation.
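    The two tree-construction criteria named above score a candidate split differently. A minimal Python sketch, assuming categorical labels and a split given as a list of child label lists: C4.5's gain ratio divides the information gain by the split information, while CART uses the decrease in Gini impurity. The SER/STRUT transfer and the weights-of-evidence fusion steps are not sketched.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_scores(parent, children):
    """Return (gain_ratio, gini_decrease) for splitting `parent` into `children`."""
    n = len(parent)
    weights = [len(ch) / n for ch in children]
    # C4.5: information gain normalised by the entropy of the split itself.
    info_gain = entropy(parent) - sum(w * entropy(ch) for w, ch in zip(weights, children))
    split_info = -sum(w * math.log2(w) for w in weights if w > 0)
    gain_ratio = info_gain / split_info if split_info > 0 else 0.0
    # CART: weighted decrease in Gini impurity.
    gini_decrease = gini(parent) - sum(w * gini(ch) for w, ch in zip(weights, children))
    return gain_ratio, gini_decrease


parent = ["roof"] * 4 + ["ground"] * 4
print(split_scores(parent, [["roof"] * 4, ["ground"] * 4]))
```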

  1. A practical introduction to Random Forest for genetic association studies in ecology and evolution.

    Science.gov (United States)

    Brieuc, Marine S O; Waters, Charles D; Drinan, Daniel P; Naish, Kerry A

    2018-03-05

    Large genomic studies are becoming increasingly common with advances in sequencing technology, and our ability to understand how genomic variation influences phenotypic variation between individuals has never been greater. The exploration of such relationships first requires the identification of associations between molecular markers and phenotypes. Here, we explore the use of Random Forest (RF), a powerful machine-learning algorithm, in genomic studies to discern loci underlying both discrete and quantitative traits, particularly when studying wild or nonmodel organisms. RF is becoming increasingly used in ecological and population genetics because, unlike traditional methods, it can efficiently analyse thousands of loci simultaneously and account for nonadditive interactions. However, understanding both the power and limitations of Random Forest is important for its proper implementation and the interpretation of results. We therefore provide a practical introduction to the algorithm and its use for identifying associations between molecular markers and phenotypes, discussing such topics as data limitations, algorithm initiation and optimization, as well as interpretation. We also provide short R tutorials as examples, with the aim of providing a guide to the implementation of the algorithm. Topics discussed here are intended to serve as an entry point for molecular ecologists interested in employing Random Forest to identify trait associations in genomic data sets. © 2018 John Wiley & Sons Ltd.
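    One common way to read variable importance out of a Random Forest is permutation importance: shuffle one predictor and measure how much accuracy drops. The sketch below is a library-free illustration, assuming `model` is any callable that maps a row of genotype codes to a predicted class; it is not the R code from the article, and in practice the drop is usually measured on out-of-bag or held-out samples rather than on the training data.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows for which model(row) equals the target."""
    return sum(model(row) == target for row, target in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each column of X is shuffled independently."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the rows with column j replaced by its shuffled values.
            X_perm = [list(row[:j]) + [v] + list(row[j + 1:]) for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy check: a hypothetical "model" that only looks at locus 0.
X = [[0, 1], [1, 0], [0, 0], [1, 1]] * 5
y = [row[0] for row in X]
print(permutation_importance(lambda row: row[0], X, y))
```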

  2. Relevant feature set estimation with a knock-out strategy and random forests

    DEFF Research Database (Denmark)

    Ganz, Melanie; Greve, Douglas N; Fischl, Bruce

    2015-01-01

    unintuitive and difficult to determine. In this article, we propose a novel MVPA method for group analysis of high-dimensional data that overcomes the drawbacks of the current techniques. Our approach explicitly aims to identify all relevant variations using a "knock-out" strategy and the Random Forest...

  3. A Comparison between Decision Tree and Random Forest in Determining the Risk Factors Associated with Type 2 Diabetes.

    Science.gov (United States)

    Esmaily, Habibollah; Tayefi, Maryam; Doosti, Hassan; Ghayour-Mobarhan, Majid; Nezami, Hossein; Amirabadizadeh, Alireza

    2018-04-24

    We aimed to identify the risk factors associated with type 2 diabetes mellitus (T2DM) using two data mining approaches, decision tree and random forest techniques, applied to data from the Mashhad Stroke and Heart Atherosclerotic Disorders (MASHAD) Study program. This was a cross-sectional study. The MASHAD study started in 2010 and will continue until 2020. Two data mining tools, namely decision trees and random forests, were used to predict T2DM from other characteristics observed in 9528 subjects recruited from the MASHAD database. This paper compares these two models in terms of accuracy, sensitivity, specificity and the area under the ROC curve. The prevalence of T2DM was 14% among these subjects. The decision tree model had 64.9% accuracy, 64.5% sensitivity, 66.8% specificity, and an area under the ROC curve of 68.6%, while the random forest model had 71.1% accuracy, 71.3% sensitivity, 69.9% specificity, and an area under the ROC curve of 77.3%. The random forest model, when used with demographic, clinical, anthropometric and biochemical measurements, can provide a simple tool to identify risk factors associated with type 2 diabetes. Such identification can substantially inform health policy aimed at reducing the number of subjects with T2DM.
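    The four reported measures can be computed from binary labels and predicted scores. A minimal Python sketch, assuming labels coded 1 for T2DM and 0 otherwise; the AUC uses the pairwise (Mann-Whitney) formulation, which equals the area under the empirical ROC curve when ties count one half.

```python
def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity (true positive rate) and specificity (true negative rate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def auc(y_true, scores):
    """Probability that a random positive scores higher than a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


print(confusion_metrics([1, 1, 0, 0], [1, 0, 0, 0]))
print(auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```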

  4. Differentiation of fat, muscle, and edema in thigh MRIs using random forest classification

    Science.gov (United States)

    Kovacs, William; Liu, Chia-Ying; Summers, Ronald M.; Yao, Jianhua

    2016-03-01

    Many diseases affect the distribution of muscles, including Duchenne and facioscapulohumeral dystrophy among other myopathies. In these diseases, it is important to quantify both the muscle and fat volumes to track disease progression. There is also evidence that abnormal signal intensity on MR images, which is often an indication of edema or inflammation, can be a good predictor of muscle deterioration. We present a fully automated method that examines magnetic resonance (MR) images of the thigh and identifies fat, muscle, and edema using a random forest classifier. First, the thigh regions are automatically segmented using the T1 sequence. Then, inhomogeneity artifacts are corrected using the N3 technique. The T1 and STIR (short tau inversion recovery) images are then aligned using landmark-based registration with the bone marrow. The normalized T1 and STIR intensity values are used to train the random forest. Once trained, the random forest can accurately classify the aforementioned classes. The method was evaluated on MR images of 9 patients. The precision values are 0.91±0.06, 0.98±0.01 and 0.50±0.29 for muscle, fat, and edema, respectively. The recall values are 0.95±0.02, 0.96±0.03 and 0.43±0.09 for muscle, fat, and edema, respectively. This demonstrates the feasibility of using information from multiple MR sequences for the accurate quantification of fat, muscle and edema.

  5. FACT. Multivariate extraction of muon ring images

    Energy Technology Data Exchange (ETDEWEB)

    Noethe, Maximilian; Temme, Fabian; Buss, Jens [Experimentelle Physik 5b, TU Dortmund, Dortmund (Germany); Collaboration: FACT-Collaboration

    2016-07-01

    In ground-based gamma-ray astronomy, muon ring images are an important event class for instrument calibration and for monitoring its properties. In this talk, a multivariate approach is presented that is well suited for real-time extraction of muons from the data streams of Imaging Atmospheric Cherenkov Telescopes (IACTs). FACT, the First G-APD Cherenkov Telescope, is located on the Canary Island of La Palma and is the first IACT to use silicon photomultipliers to detect the Cherenkov photons of extensive air showers. In the case of FACT, the extracted muon events are used to calculate the time resolution of the camera. In addition, the effect of the mirror alignment in May 2014 on the properties of detected muons is investigated. Muon candidates are identified with a random forest classification algorithm. The performance of the classifier is evaluated for different sets of image parameters in order to weigh the gain in performance against the computational cost of calculating them.

  6. Genome Wide Association Study to predict severe asthma exacerbations in children using random forests classifiers

    Directory of Open Access Journals (Sweden)

    Litonjua Augusto A

    2011-06-01

    Background: Personalized health care promises tailored solutions to individual patients based on their genetic background and/or environmental exposure history. To date, disease prediction has been based on a few environmental factors and/or single nucleotide polymorphisms (SNPs), while complex diseases are usually affected by many genetic and environmental factors, each contributing a small portion to the outcome. We hypothesized that using random forests classifiers to select SNPs would result in an improved predictive model of asthma exacerbations. We tested this hypothesis in a population of childhood asthmatics. Methods: In this study, using emergency room visits or hospitalizations as the definition of a severe asthma exacerbation, we first identified a list of top Genome Wide Association Study (GWAS) SNPs ranked by Random Forests (RF) importance score for the CAMP (Childhood Asthma Management Program) population of 127 exacerbation cases and 290 non-exacerbation controls. We predicted severe asthma exacerbations using the top 10 to 320 SNPs together with age, sex, pre-bronchodilator FEV1 percentage predicted, and treatment group. Results: Testing in an independent set of the CAMP population shows that severe asthma exacerbations can be predicted with an Area Under the Curve (AUC) of 0.66 with 160-320 SNPs, compared with an AUC of 0.57 with 10 SNPs. Using the clinical traits alone yielded an AUC of 0.54, suggesting that the phenotype is affected by genetic as well as environmental factors. Conclusions: Our study shows that a random forests algorithm can effectively extract and use the information contained in a small number of samples. Random forests, and other machine learning tools, can be used with GWAS studies to integrate large numbers of predictors simultaneously.

  7. Research on electricity consumption forecast based on mutual information and random forests algorithm

    Science.gov (United States)

    Shi, Jing; Shi, Yunli; Tan, Jian; Zhu, Lei; Li, Hu

    2018-02-01

    Traditional power forecasting models can neither efficiently take various factors into account nor identify the relevant factors. In this paper, mutual information from information theory and the random forests algorithm from artificial intelligence are introduced into medium- and long-term electricity demand prediction. Mutual information can identify highly related factors based on the average mutual information between a variety of variables and electricity demand; different industries may be highly associated with different variables. The random forests algorithm was used to build separate forecasting models for different industries according to their respective correlated factors. Electricity consumption data from Jiangsu Province are taken as a practical example, and the above methods are compared with methods that disregard mutual information and industry differences. The simulation results show that the above method is scientific and effective, and can provide higher prediction accuracy.
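    For discrete (or binned) data, mutual information can be estimated from empirical frequencies. A minimal sketch, assuming both the candidate factor and the demand series have already been discretized; the binning scheme and the exact "average mutual information" estimator used in the paper are not specified in the abstract. Factors could then be ranked by this score before building each industry's forest.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Plug-in estimate of I(X; Y) in bits for two equal-length discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a,b) * log2( p(a,b) / (p(a) p(b)) ), with probabilities as counts / n.
        mi += (c / n) * math.log2(c * n / (px[a] * py[b]))
    return mi


# Hypothetical binned series: a factor that tracks demand versus one that does not.
demand = [0, 0, 1, 1, 2, 2]
print(mutual_information([0, 0, 1, 1, 2, 2], demand), mutual_information([0, 1, 0, 1, 0, 1], demand))
```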

  8. Predicting Stem Total and Assortment Volumes in an Industrial Pinus taeda L. Forest Plantation Using Airborne Laser Scanning Data and Random Forest

    Directory of Open Access Journals (Sweden)

    Carlos Alberto Silva

    2017-07-01

    Improvements in the management of pine plantations result in multiple industrial and environmental benefits. Remote sensing techniques can dramatically increase the efficiency of plantation management by reducing or replacing time-consuming field sampling. We tested the utility and accuracy of combining field and airborne lidar data with Random Forest, a supervised machine learning algorithm, to estimate stem total and assortment (commercial and pulpwood) volumes in an industrial Pinus taeda L. forest plantation in southern Brazil. Random Forest was populated using field and lidar-derived forest metrics from 50 sample plots with trees ranging from three to nine years old. We found that a model defined as a function of only two metrics (height of the top of the canopy and skewness of the vertical distribution of lidar points) has very strong and unbiased predictive power. Predictions of total, commercial, and pulp volume, respectively, showed adjusted R² values of 0.98, 0.98 and 0.96, with biases of −0.17%, −0.12% and −0.23%, and Root Mean Square Error (RMSE) values of 7.83%, 7.71% and 8.63%. Our methodology makes use of commercially available airborne lidar and widely used mathematical tools to provide solutions for increasing industry efficiency in monitoring and managing wood volume.
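    The reported fit statistics can be sketched as follows. This assumes bias is the mean of (predicted - observed), that both bias and RMSE are expressed as percentages of the observed mean, and that adjusted R² is computed from observed versus predicted values; these are common conventions in lidar forest inventory but are not confirmed by the abstract. `n_predictors` would be 2 for the two-metric model.

```python
import math


def fit_statistics(observed, predicted, n_predictors):
    """Return (adjusted R², bias %, RMSE %) of predictions against observations."""
    n = len(observed)
    if n - n_predictors - 1 <= 0:
        raise ValueError("need more observations than n_predictors + 1")
    mean_obs = sum(observed) / n
    residuals = [p - o for o, p in zip(observed, predicted)]  # assumed sign: predicted - observed
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)
    bias_pct = 100 * (sum(residuals) / n) / mean_obs
    rmse_pct = 100 * math.sqrt(ss_res / n) / mean_obs
    return adj_r2, bias_pct, rmse_pct


# Hypothetical plot volumes (m³/ha) and predictions.
print(fit_statistics([10, 20, 30, 40], [12, 18, 33, 37], n_predictors=2))
```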

  9. Multivariate generalized linear mixed models using R

    CERN Document Server

    Berridge, Damon Mark

    2011-01-01

    Multivariate Generalized Linear Mixed Models Using R presents robust and methodologically sound models for analyzing large and complex data sets, enabling readers to answer increasingly complex research questions. The book applies the principles of modeling to longitudinal data from panel and related studies via the Sabre software package in R. A Unified Framework for a Broad Class of Models: The authors first discuss members of the family of generalized linear models, gradually adding complexity to the modeling framework by incorporating random effects. After reviewing the generalized linear model notation, they illustrate a range of random effects models, including three-level, multivariate, endpoint, event history, and state dependence models. They estimate the multivariate generalized linear mixed models (MGLMMs) using either standard or adaptive Gaussian quadrature. The authors also compare two-level fixed and random effects linear models. The appendices contain additional information on quadrature, model...
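    Gaussian quadrature, as used for estimating MGLMMs, replaces the integral over a normally distributed random effect with a weighted sum. The sketch below assumes a random-intercept logistic model and a fixed three-point Gauss-Hermite rule; it does not use Sabre, real analyses use more points (or adaptive quadrature), and `cluster_likelihood` is a hypothetical helper.

```python
import math

# Three-point Gauss-Hermite rule for integrals of the form ∫ f(x) exp(-x^2) dx.
GH_NODES = [-math.sqrt(1.5), 0.0, math.sqrt(1.5)]
GH_WEIGHTS = [math.sqrt(math.pi) / 6, 2 * math.sqrt(math.pi) / 3, math.sqrt(math.pi) / 6]


def expect_normal(f, sigma):
    """Approximate E[f(b)] for b ~ N(0, sigma^2) via the change of variable b = sqrt(2) * sigma * x."""
    total = sum(w * f(math.sqrt(2) * sigma * x) for x, w in zip(GH_NODES, GH_WEIGHTS))
    return total / math.sqrt(math.pi)


def cluster_likelihood(ys, eta_fixed, sigma):
    """Marginal likelihood of one cluster's binary responses under a random intercept b."""
    def conditional(b):
        lik = 1.0
        for y, eta in zip(ys, eta_fixed):
            p = 1.0 / (1.0 + math.exp(-(eta + b)))  # logistic link
            lik *= p if y == 1 else 1.0 - p
        return lik
    return expect_normal(conditional, sigma)


# Hypothetical cluster: three binary responses with fixed-effect linear predictors.
print(cluster_likelihood([1, 0, 1], [0.2, -0.1, 0.5], sigma=1.0))
```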

  10. Automated segmentation of dental CBCT image with prior-guided sequential random forests

    Energy Technology Data Exchange (ETDEWEB)

    Wang, Li; Gao, Yaozong; Shi, Feng; Li, Gang [Department of Radiology and BRIC, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599-7513 (United States); Chen, Ken-Chung; Tang, Zhen [Surgical Planning Laboratory, Department of Oral and Maxillofacial Surgery, Houston Methodist Research Institute, Houston, Texas 77030 (United States); Xia, James J., E-mail: dgshen@med.unc.edu, E-mail: JXia@HoustonMethodist.org [Surgical Planning Laboratory, Department of Oral and Maxillofacial Surgery, Houston Methodist Research Institute, Houston, Texas 77030 (United States); Department of Surgery (Oral and Maxillofacial Surgery), Weill Medical College, Cornell University, New York, New York 10065 (United States); Department of Oral and Craniomaxillofacial Surgery, Shanghai Jiao Tong University School of Medicine, Shanghai Ninth People’s Hospital, Shanghai 200011 (China); Shen, Dinggang, E-mail: dgshen@med.unc.edu, E-mail: JXia@HoustonMethodist.org [Department of Radiology and BRIC, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599-7513 and Department of Brain and Cognitive Engineering, Korea University, Seoul 02841 (Korea, Republic of)

    2016-01-15

    Purpose: Cone-beam computed tomography (CBCT) is an increasingly utilized imaging modality for the diagnosis and treatment planning of patients with craniomaxillofacial (CMF) deformities. Accurate segmentation of CBCT images is an essential step in generating 3D models for the diagnosis and treatment planning of these patients. However, image artifacts caused by beam hardening, imaging noise, inhomogeneity, truncation, and maximal intercuspation make CBCT images difficult to segment. Methods: In this paper, the authors present a new automatic segmentation method to address these problems. Specifically, the authors first employ a majority voting method to estimate the initial segmentation probability maps of both the mandible and maxilla based on multiple aligned expert-segmented CBCT images. These probability maps provide important prior guidance for CBCT segmentation. The authors then extract both appearance features from the CBCTs and context features from the initial probability maps to train the first-layer random forest classifier, which can select discriminative features for segmentation. Based on the first-layer trained classifier, the probability maps are updated and then used to train the next-layer random forest classifier. By iteratively training subsequent random forest classifiers using both the original CBCT features and the updated segmentation probability maps, a sequence of classifiers can be derived for accurate segmentation of CBCT images. Results: Segmentation results on CBCTs of 30 subjects were validated both quantitatively and qualitatively against manually labeled ground truth. The average Dice ratios of the mandible and maxilla obtained by the authors' method were 0.94 and 0.91, respectively, which are significantly better than those of the state-of-the-art method based on sparse representation (p-value < 0.001). Conclusions: The authors have developed and validated a novel fully automated method
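    Two quantities from this abstract are simple to compute: the majority-voting prior (the fraction of aligned atlases that label each voxel as the structure) and the Dice ratio used for validation. A minimal Python sketch, assuming flattened binary atlas label lists and voxel coordinates stored as tuples; the sequential random forest layers themselves are not shown, and both function names are hypothetical.

```python
def vote_probability(atlas_labels):
    """Per-voxel fraction of aligned atlases that assign the structure label (1)."""
    n = len(atlas_labels)
    return [sum(column) / n for column in zip(*atlas_labels)]


def dice(seg_a, seg_b):
    """Dice similarity 2|A ∩ B| / (|A| + |B|) for two collections of labelled voxel coordinates."""
    a, b = set(seg_a), set(seg_b)
    if not a and not b:
        return 1.0  # two empty segmentations agree trivially
    return 2 * len(a & b) / (len(a) + len(b))


# Four hypothetical aligned atlases over three voxels.
print(vote_probability([[1, 0, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]))
print(dice({(0, 0, 0), (0, 0, 1)}, {(0, 0, 1), (0, 0, 2)}))
```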

  11. Automated segmentation of dental CBCT image with prior-guided sequential random forests

    International Nuclear Information System (INIS)

    Wang, Li; Gao, Yaozong; Shi, Feng; Li, Gang; Chen, Ken-Chung; Tang, Zhen; Xia, James J.; Shen, Dinggang

    2016-01-01

  12. Multivariate statistics exercises and solutions

    CERN Document Server

    Härdle, Wolfgang Karl

    2015-01-01

    The authors present tools and concepts of multivariate data analysis by means of exercises and their solutions. The first part is devoted to graphical techniques. The second part deals with multivariate random variables and presents the derivation of estimators and tests for various practical situations. The last part introduces a wide variety of exercises in applied multivariate data analysis. The book demonstrates the application of simple calculus and basic multivariate methods in real-life situations. It contains more than 250 solved exercises in total, which can assist a university teacher in setting up a modern multivariate analysis course. All computer-based exercises are available in the R language. All R code and data sets may be downloaded via the quantlet download center www.quantlet.org or via the Springer webpage. For interactive display of low-dimensional projections of a multivariate data set, we recommend GGobi.

  13. Quantifying spatial distribution of snow depth errors from LiDAR using Random Forests

    Science.gov (United States)

    Tinkham, W.; Smith, A. M.; Marshall, H.; Link, T. E.; Falkowski, M. J.; Winstral, A. H.

    2013-12-01

    There is an increasing need to characterize the distribution of snow in complex terrain using remote sensing approaches, especially in isolated mountainous regions that are often water-limited, serve as the principal source of terrestrial freshwater, and are sensitive to climatic shifts and variations. We apply intensive topographic surveys, multi-temporal LiDAR, and Random Forest modeling to quantify snow volume and characterize associated errors across seven land cover types in a semi-arid mountainous catchment at 1 and 4 m spatial resolutions. The LiDAR-based estimates of both snow-off surface topography and snow depths were validated against ground-based measurements across the catchment. Comparison of LiDAR-derived snow depths with manual snow depth surveys revealed that LiDAR-based estimates were more accurate in areas of low-lying vegetation such as shrubs (RMSE = 0.14 m) than in areas of tree cover (RMSE = 0.20-0.35 m). The highest errors were found along the edge of conifer forests (RMSE = 0.35 m); however, a second conifer transect outside the catchment had much lower errors (RMSE = 0.21 m). This difference is attributed to the wind exposure of the first site, which led to highly variable snow depths over short spatial distances. The Random Forest modeled errors deviated from the field-measured errors with an RMSE of 0.09-0.34 m across the different cover types. Results show that snow drifts, which are important for maintaining spring and summer stream flows and for establishing and sustaining water-limited plant species, contained 30 ± 5-6% of the snow volume while occupying only 10% of the catchment area, similar to findings from prior physically based modeling approaches. This study demonstrates the potential utility of combining multi-temporal LiDAR with Random Forest modeling to quantify the distribution of snow depth with a reasonable degree of accuracy. Future work could explore the utility of Terrestrial LiDAR Scanners to produce validation of snow-on surface

  14. Random Forest as an Imputation Method for Education and Psychology Research: Its Impact on Item Fit and Difficulty of the Rasch Model

    Science.gov (United States)

    Golino, Hudson F.; Gomes, Cristiano M. A.

    2016-01-01

    This paper presents a non-parametric imputation technique, named random forest, from the machine learning field. The random forest procedure has two main tuning parameters: the number of trees grown in the prediction and the number of predictors used. Fifty experimental conditions were created in the imputation procedure, with different…

  15. Comparison of the CPU and memory performance of StatPatternRecognitions (SPR) and Toolkit for MultiVariate Analysis (TMVA)

    International Nuclear Information System (INIS)

    Palombo, G.

    2012-01-01

    High Energy Physics data sets are often characterized by a huge number of events. It is therefore extremely important to use statistical packages that can efficiently analyze these unprecedented amounts of data. We compare the performance of the statistical packages StatPatternRecognition (SPR) and Toolkit for MultiVariate Analysis (TMVA), focusing on how the CPU time and memory usage of the learning process scale with data set size. As classifiers, we consider only Random Forests, Boosted Decision Trees and Neural Networks, each with specific settings. For our tests, we employ a data set widely used in the machine learning community, the “Threenorm” data set, as well as data tailored for testing various edge cases. For each data set, we progressively increase its size and check the CPU time and memory needed to build the classifiers implemented in SPR and TMVA. We show that SPR is often significantly faster and consumes significantly less memory. For example, the SPR implementation of Random Forest is an order of magnitude faster and consumes an order of magnitude less memory than TMVA on the Threenorm data.
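    A scaling comparison of this kind can be scripted by timing the training call over increasing data sizes. The sketch below uses Python's `time.perf_counter` and `tracemalloc`; note that `tracemalloc` only tracks allocations made through Python, so measuring C++ packages such as SPR or TMVA would require OS-level tools instead. `build` and `make_data` are hypothetical placeholders for a classifier training call and a data generator.

```python
import time
import tracemalloc


def profile_scaling(build, make_data, sizes):
    """Return (size, seconds, peak_traced_bytes) for build(make_data(size)) at each size."""
    results = []
    for n in sizes:
        data = make_data(n)  # generated before tracing so only training is measured
        tracemalloc.start()
        start = time.perf_counter()
        build(data)
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
        tracemalloc.stop()
        results.append((n, elapsed, peak))
    return results


# Stand-in workload: sorting a reversed list in place of training a classifier.
for row in profile_scaling(sorted, lambda n: list(range(n, 0, -1)), [1_000, 10_000, 100_000]):
    print(row)
```

    Plotting time and peak memory against size on log-log axes makes an order-of-magnitude difference between two implementations easy to read off.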

  16. Gamma/hadron segregation for a ground based imaging atmospheric Cherenkov telescope using machine learning methods: Random Forest leads

    International Nuclear Information System (INIS)

    Sharma Mradul; Koul Maharaj Krishna; Mitra Abhas; Nayak Jitadeepa; Bose Smarajit

    2014-01-01

    A detailed case study of γ-hadron segregation for a ground based atmospheric Cherenkov telescope is presented. We have evaluated and compared various supervised machine learning methods such as the Random Forest method, Artificial Neural Network, Linear Discriminant method, Naive Bayes Classifiers, Support Vector Machines as well as the conventional dynamic supercut method by simulating triggering events with the Monte Carlo method and applied the results to a Cherenkov telescope. It is demonstrated that the Random Forest method is the most sensitive machine learning method for γ-hadron segregation. (research papers)

  17. An overview of multivariate gamma distributions as seen from a (multivariate) matrix exponential perspective

    DEFF Research Database (Denmark)

    Bladt, Mogens; Nielsen, Bo Friis

    2012-01-01

    Numerous definitions of multivariate exponential and gamma distributions can be retrieved from the literature [4]. These distributions belong to the class of Multivariate Matrix-Exponential Distributions (MVME) whenever their joint Laplace transform is a rational function. The majority of these distributions further belong to an important subclass of MVME distributions [5, 1] where the multivariate random vector can be interpreted as a number of simultaneously collected rewards during sojourns in the states of a Markov chain with one absorbing state, the rest of the states being transient. We... ...Laplace transform. In a longer perspective, stochastic and statistical analysis for MVME will in particular apply to any of the previously defined distributions. Multivariate gamma distributions have been used in a variety of fields like hydrology [11], [10], [6], space (wind modeling) [9], reliability [3...

  18. Classification of Phishing Email Using Random Forest Machine Learning Technique

    OpenAIRE

    Akinyelu, Andronicus A.; Adewumi, Aderemi O.

    2013-01-01

    Phishing is one of the major challenges faced by the world of e-commerce today. Because of phishing attacks, many companies and individuals have lost billions of dollars. In 2012, an online report put the loss due to phishing attacks at about $1.5 billion. The global impact of phishing attacks will continue to increase, which calls for more efficient phishing detection techniques to curb the menace. This paper investigates and reports the use of random forest machine learnin...

  19. 3D statistical shape models incorporating 3D random forest regression voting for robust CT liver segmentation

    Science.gov (United States)

    Norajitra, Tobias; Meinzer, Hans-Peter; Maier-Hein, Klaus H.

    2015-03-01

    During image segmentation, 3D Statistical Shape Models (SSM) usually conduct a limited search for target landmarks within one-dimensional search profiles perpendicular to the model surface. In addition, landmark appearance is modeled only locally based on linear profiles and weak learners, altogether leading to segmentation errors from landmark ambiguities and limited search coverage. We present a new method for 3D SSM segmentation based on 3D Random Forest Regression Voting. For each surface landmark, a Random Regression Forest is trained that learns a 3D spatial displacement function between the corresponding reference landmark and a set of surrounding sample points, based on an infinite set of non-local randomized 3D Haar-like features. Landmark search is then conducted omni-directionally within 3D search spaces, where voxelwise forest predictions of landmark position contribute to a common voting map that reflects the overall position estimate. Segmentation experiments were conducted on a set of 45 CT volumes of the human liver, of which 40 images were randomly chosen for training and 5 for testing. Without parameter optimization, using a simple candidate selection and a single resolution approach, excellent results were achieved, while faster convergence and better concavity segmentation were observed, altogether underlining the potential of our approach in terms of increased robustness from distinct landmark detection and from better search coverage.

  20. Using multivariate machine learning methods and structural MRI to classify childhood onset schizophrenia and healthy controls

    Directory of Open Access Journals (Sweden)

    Deanna Greenstein

    2012-06-01

    Introduction: Multivariate machine learning methods can be used to classify groups of schizophrenia patients and controls using structural magnetic resonance imaging (MRI). However, machine learning methods to date have not been extended beyond classification and contemporaneously applied in a meaningful way to clinical measures. We hypothesized that brain measures would classify groups, and that an increased likelihood of being classified as a patient on the basis of regional brain measures would be positively related to illness severity, developmental delays and genetic risk. Methods: Using 74 anatomic brain MRI subregions and Random Forest, we classified 98 childhood onset schizophrenia (COS) patients and 99 age-, sex-, and ethnicity-matched healthy controls. We also used Random Forest to determine the likelihood of being classified as a schizophrenia patient based on MRI measures. We then explored relationships between brain-based probability of illness and symptoms, premorbid development, and presence of copy number variation associated with schizophrenia. Results: Brain regions jointly classified COS and control groups with 73.7% accuracy. Greater brain-based probability of illness was associated with worse functioning (p = 0.0004) and fewer developmental delays (p = 0.02). Presence of copy number variation (CNV) was associated with a lower probability of being classified as schizophrenia (p = 0.001). The regions most important in classifying groups included the left temporal lobes, bilateral dorsolateral prefrontal regions, and left medial parietal lobes. Conclusions: Schizophrenia and control groups can be well classified using Random Forest and anatomic brain measures, and brain-based probability of illness has a positive relationship with illness severity and a negative relationship with developmental delays/problems and CNV-based risk.

  1. AUTOCLASSIFICATION OF THE VARIABLE 3XMM SOURCES USING THE RANDOM FOREST MACHINE LEARNING ALGORITHM

    International Nuclear Information System (INIS)

    Farrell, Sean A.; Murphy, Tara; Lo, Kitty K.

    2015-01-01

    In the current era of large surveys and massive data sets, autoclassification of astrophysical sources using intelligent algorithms is becoming increasingly important. In this paper we present the catalog of variable sources in the Third XMM-Newton Serendipitous Source catalog (3XMM) autoclassified using the Random Forest machine learning algorithm. We used a sample of manually classified variable sources from the second data release of the XMM-Newton catalogs (2XMMi-DR2) to train the classifier, obtaining an accuracy of ∼92%. We also evaluated the effectiveness of identifying spurious detections using a sample of spurious sources, achieving an accuracy of ∼95%. Manual investigation of a random sample of classified sources confirmed these accuracy levels and showed that the Random Forest machine learning algorithm is highly effective at automatically classifying 3XMM sources. Here we present the catalog of classified 3XMM variable sources. We also present three previously unidentified unusual sources that were flagged as outlier sources by the algorithm: a new candidate supergiant fast X-ray transient, a 400 s X-ray pulsar, and an eclipsing 5 hr binary system coincident with a known Cepheid.

  2. Estimation of Rice Crop Yields Using Random Forests in Taiwan

    Science.gov (United States)

    Chen, C. F.; Lin, H. S.; Nguyen, S. T.; Chen, C. R.

    2017-12-01

    Rice is one of the most important food crops globally, directly feeding more people than any other crop. Rice is not only the most important commodity but also plays a critical role in the economy of Taiwan, because it provides employment and income for large rural populations. The rice harvested area and production are thus monitored yearly under government initiatives. Agronomic planners need such information for more precise assessment of food production to tackle issues of national food security and policymaking. This study aimed to develop a machine-learning approach using physical parameters to estimate rice crop yields in Taiwan. We processed the data for the 2014 cropping seasons in three main steps: (1) data pre-processing to construct input layers, including soil types and weather parameters (e.g., maximum and minimum air temperature, precipitation, and solar radiation) obtained from meteorological stations across the country; (2) crop yield estimation using random forests, chosen because the method can process thousands of variables, estimate missing data, maintain its accuracy when a large proportion of the data is missing, overcome most over-fitting problems, and run fast and efficiently on large datasets; and (3) error verification. To execute the model, we separated the datasets into two groups of pixels: group-1 (70% of pixels) for training the model and group-2 (30% of pixels) for testing it. Once the model is trained to produce a small and stable out-of-bag error (i.e., the mean squared error between predicted and actual values), it can be used to estimate rice yields for the cropping seasons. Comparison of the random forests-based regression results with the actual yield statistics indicated that the root mean square error (RMSE) and mean absolute error (MAE) achieved for the first rice crop were 6.2% and 2.7%, respectively, while those for the second rice crop were 5.3% and 2...

  3. Fault diagnosis in spur gears based on genetic algorithm and random forest

    Science.gov (United States)

    Cerrada, Mariela; Zurita, Grover; Cabrera, Diego; Sánchez, René-Vinicio; Artés, Mariano; Li, Chuan

    2016-03-01

    There are growing demands for condition-based monitoring of gearboxes, so new methods that improve the reliability, effectiveness and accuracy of gear fault detection ought to be evaluated. Feature selection is still an important aspect of machine learning-based diagnosis for achieving good performance of the diagnostic models. Random forest classifiers, for their part, are suitable models in industrial environments where large data samples are not usually available for training such diagnostic models. The main aim of this research is to build a robust system for multi-class fault diagnosis in spur gears by selecting the best set of condition parameters in the time, frequency and time-frequency domains, extracted from vibration signals. The diagnostic system uses genetic algorithms and a random forest-based classifier in a supervised setting. The genetic algorithms reduce the original set of condition parameters by around 66% relative to its initial size while still achieving an acceptable classification precision of over 97%. The approach is tested on real vibration signals covering several fault classes, one of them an incipient fault, under different running conditions of load and velocity.

  4. Intelligent Fault Diagnosis of HVCB with Feature Space Optimization-Based Random Forest.

    Science.gov (United States)

    Ma, Suliang; Chen, Mingxuan; Wu, Jianwen; Wang, Yuhao; Jia, Bowen; Jiang, Yuan

    2018-04-16

    Mechanical faults of high-voltage circuit breakers (HVCBs) inevitably occur during long-term operation, so extracting fault features and identifying the fault type have become key issues for ensuring the security and reliability of the power supply. In this paper, an effective identification system was developed based on wavelet packet decomposition and the random forest algorithm. First, because Shannon entropy gives an incomplete description, the wavelet packet time-frequency energy rate (WTFER) was adopted as the input vector for the classifier model in the feature selection procedure. Then, a random forest classifier was used to diagnose HVCB faults, assess the importance of the feature variables and optimize the feature space. Finally, the approach was verified on actual HVCB vibration signals covering six typical fault classes. The comparative experiments show that the classification accuracy of the proposed method reached 93.33% with the original feature space and up to 95.56% with the optimized input feature vector. This indicates that the feature optimization procedure is successful and that the proposed diagnosis algorithm has higher efficiency and robustness than traditional methods.

  5. Unravelling the importance of forest age stand and forest structure driving microbiological soil properties, enzymatic activities and soil nutrients content in Mediterranean Spanish black pine (Pinus nigra Ar. ssp. salzmannii) Forest.

    Science.gov (United States)

    Lucas-Borja, M E; Hedo, J; Cerdá, A; Candel-Pérez, D; Viñegla, B

    2016-08-15

    This study aimed to investigate the effects of stand age and forest structure on microbiological soil properties, enzymatic activities and nutrient content. Thirty forest compartments were randomly selected in the Palancares y Agregados managed forest area (Spain), supporting forest stands of five ages, ranging from compartments with trees 100 to 80 years old to compartments with trees that were 19 to 1 years old. A forest area with trees 80 to 120 years old and without forest intervention was selected as the control. We measured different soil enzymatic activities, soil respiration and nutrient content (P, K, Na, Mg, Cr, Mn, Fe, Co, Ni, Cu, Zn, Pb and Ca) in the top cm of 10 mineral soils in each compartment. Results showed that the lowest forest stand age, together with the forest structure created by management, presented lower values of organic matter, soil moisture, water holding capacity and litterfall and a higher C/N ratio than the highest forest stand age and its related forest structure, which generated differences in soil respiration and soil enzyme activities. The forest structure created by no forest management (control plot) presented the highest enzymatic activities, soil respiration, NH4(+) and NO3(-). Results did not show a clear trend in nutrient content across the experimental areas. Finally, the multivariate principal component analysis (PCA) clearly clustered three differentiated groups: the control plot, stands 100 to 40 years old, and stands 39 to 1 year old. Our results suggest that the control plot has better soil quality and that extreme forest stand ages (100-80 and 19-1 years old) and the associated forest structure generate differences in soil parameters but not in soil nutrient content. Copyright © 2016 Elsevier B.V. All rights reserved.

  6. A method of signal transmission path analysis for multivariate random processes

    International Nuclear Information System (INIS)

    Oguma, Ritsuo

    1984-04-01

    A method for noise analysis called "STP (signal transmission path) analysis" is presented as a tool to identify noise sources and their propagation paths in multivariate random processes. The basic idea of the analysis is to identify, via time series analysis, the effective network for signal power transmission among variables in the system and to use this information in the noise analysis. In the present paper, we accomplish this in two signal-processing steps: first, we use noise power contribution analysis to estimate which variables contribute strongly to the power spectrum of interest; then we evaluate the STPs for each pair of variables to identify the STPs that play a significant role in transmitting the generated noise to the variable under evaluation. The latter part of the analysis is carried out by comparing the partial coherence function with a newly introduced partial noise power contribution function. This paper presents the procedure of the STP analysis and demonstrates its effectiveness for investigating noise generation and propagation mechanisms, using simulation data as well as Borssele PWR noise data.

  7. Pain in diagnostic hysteroscopy: a multivariate analysis after a randomized, controlled trial.

    Science.gov (United States)

    Mazzon, Ivan; Favilli, Alessandro; Grasso, Mario; Horvath, Stefano; Bini, Vittorio; Di Renzo, Gian Carlo; Gerli, Sandro

    2014-11-01

    To study which variables are able to influence women's experience of pain during diagnostic hysteroscopy. Multivariate analysis (phase II) after a randomized, controlled trial (phase I). Endoscopic gynecologic center. In phase I, 392 patients were analyzed. Group A: 197 women with carbon dioxide (CO2); group B: 195 women with normal saline. In phase II, 392 patients were assigned to two different groups according to their pain experience as measured by a visual analogue scale (VAS): group VAS>3 (170 patients); group VAS≤3 (222 patients). Anesthesia-free diagnostic hysteroscopy performed using CO2 or normal saline as distension media. Procedure time, VAS score, image quality, and side effects during and after diagnostic hysteroscopy. In phase I the median pain score in group A was 2, whereas in group B it was 3. In phase II the duration of the procedure, nulliparity, and the use of normal saline were significantly correlated with VAS>3. A higher presence of cervical synechiae was observed in the group VAS>3. The multivariate analysis revealed an inverse correlation between parity and a VAS>3, whereas the use of normal saline, the presence of synechiae in the cervical canal, and the duration of the hysteroscopy were all directly correlated to a VAS score>3. Pain in hysteroscopy is significantly related to the presence of cervical synechiae, to the duration of the procedure, and to the use of normal saline; conversely, parity seems to have a protective role. NCT01873391. Copyright © 2014 American Society for Reproductive Medicine. Published by Elsevier Inc. All rights reserved.

  8. Multivariate effect gradients driving forest demographic responses in the Iberian Peninsula

    NARCIS (Netherlands)

    Coll, Marta; Penuelas, Josep; Ninyerola, Miquel; Pons, Xavier; Carnicer, Jofre

    2013-01-01

    A precise knowledge of forest demographic gradients in the Mediterranean area is essential to assess future impacts of climate change and extreme drought events. Here we studied the geographical patterns of forest demography variables (tree recruitment, growth and mortality) of the main species in

  9. Assessing the accuracy and stability of variable selection methods for random forest modeling in ecology

    Science.gov (United States)

    Random forest (RF) modeling has emerged as an important statistical learning method in ecology due to its exceptional predictive performance. However, for large and complex ecological datasets there is limited guidance on variable selection methods for RF modeling. Typically, e...

  10. Predicting redox-sensitive contaminant concentrations in groundwater using random forest classification

    Science.gov (United States)

    Tesoriero, Anthony J.; Gronberg, Jo Ann; Juckem, Paul F.; Miller, Matthew P.; Austin, Brian P.

    2017-08-01

    Machine learning techniques were applied to a large (n > 10,000) compliance monitoring database to predict the occurrence of several redox-active constituents in groundwater across a large watershed. Specifically, random forest classification was used to determine the probabilities of detecting elevated concentrations of nitrate, iron, and arsenic in the Fox, Wolf, Peshtigo, and surrounding watersheds in northeastern Wisconsin. Random forest classification is well suited to describe the nonlinear relationships observed among several explanatory variables and the predicted probabilities of elevated concentrations of nitrate, iron, and arsenic. Maps of the probability of elevated nitrate, iron, and arsenic can be used to assess groundwater vulnerability and the vulnerability of streams to contaminants derived from groundwater. Processes responsible for elevated concentrations are elucidated using partial dependence plots. For example, an increase in the probability of elevated iron and arsenic occurred when well depths coincided with the glacial/bedrock interface, suggesting a bedrock source for these constituents. Furthermore, groundwater in contact with Ordovician bedrock has a higher likelihood of elevated iron concentrations, which supports the hypothesis that groundwater liberates iron from a sulfide-bearing secondary cement horizon of Ordovician age. Application of machine learning techniques to existing compliance monitoring data offers an opportunity to broadly assess aquifer and stream vulnerability at regional and national scales and to better understand geochemical processes responsible for observed conditions.
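    The partial dependence plots mentioned above average a fitted model's predictions over the data while one explanatory variable is held fixed at each value of a grid. A minimal pure-Python sketch of that calculation, assuming the fitted model is available as a function of a feature dictionary; the feature names (`well_depth_m`, `bedrock_depth_m`), the rule inside `toy_model` and the sample wells are hypothetical illustrations, not the authors' model or data:

```python
from statistics import fmean
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]


def partial_dependence(
    model: Callable[[Row], float],
    data: Sequence[Row],
    feature: str,
    grid: Sequence[float],
) -> List[float]:
    """Average model prediction over `data` with `feature` fixed at each grid value."""
    curve = []
    for value in grid:
        # Copy each row, overwrite only the feature of interest, keep the others as observed.
        predictions = [model({**row, feature: value}) for row in data]
        curve.append(fmean(predictions))
    return curve


def toy_model(row: Row) -> float:
    """Hypothetical stand-in for a fitted classifier's probability of an elevated
    concentration: high when the well bottom lies within 5 m of the bedrock surface."""
    near_interface = abs(row["well_depth_m"] - row["bedrock_depth_m"]) < 5.0
    return 0.8 if near_interface else 0.2


wells = [
    {"well_depth_m": 10.0, "bedrock_depth_m": 30.0},
    {"well_depth_m": 50.0, "bedrock_depth_m": 32.0},
    {"well_depth_m": 25.0, "bedrock_depth_m": 60.0},
]
print(partial_dependence(toy_model, wells, "well_depth_m", [10.0, 30.0, 60.0]))
```

    With these invented wells the printed curve is close to [0.2, 0.6, 0.4]: the averaged probability rises for depths that put some wells near the bedrock surface, which is the kind of pattern the abstract reads off its partial dependence plots.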

  11. A random forest classifier for detecting rare variants in NGS data from viral populations

    Directory of Open Access Journals (Sweden)

    Raunaq Malhotra

    We propose a random forest classifier for distinguishing rare variants from sequencing errors in Next Generation Sequencing (NGS) data from viral populations. The method uses counts of k-mers of varying lengths from the reads of a viral population to train a random forest classifier, called MultiRes, that classifies k-mers as erroneous or as rare variants. Our algorithm is rooted in concepts from signal processing and uses a frame-based representation of k-mers. Frames are sets of non-orthogonal basis functions that were traditionally used in signal processing for noise removal. We define discrete spatial signals for genomes and sequenced reads, and show that k-mers of a given size constitute a frame. We evaluate MultiRes on simulated and real viral population datasets, which contain many low-frequency variants, and compare it to the error detection methods used in correction tools known in the literature. MultiRes makes 4 to 500 times fewer false-positive k-mer predictions than other methods, which is essential for accurately estimating viral population diversity and for de novo assembly. Its recall of true k-mers is high, comparable to other error correction methods. MultiRes also has greater than 95% recall for detecting single nucleotide polymorphisms (SNPs) and fewer false-positive SNPs, while detecting a higher number of rare variants than other variant calling methods for viral populations. The software is freely available on GitHub at https://github.com/raunaq-m/MultiRes. Keywords: Sequencing error detection, Reference free methods, Next-generation sequencing, Viral populations, Multi-resolution frames, Random forest classifier

  12. Introducing two Random Forest based methods for cloud detection in remote sensing images

    Science.gov (United States)

    Ghasemian, Nafiseh; Akhoondzadeh, Mehdi

    2018-07-01

    Cloud detection is a necessary phase in satellite image processing for retrieving atmospheric and lithospheric parameters. Some cloud detection methods based on the Random Forest (RF) model have been proposed, but they do not consider both the spectral and the textural characteristics of the image. Furthermore, they have not been tested in the presence of snow/ice. In this paper, we introduce two RF based algorithms, Feature Level Fusion Random Forest (FLFRF) and Decision Level Fusion Random Forest (DLFRF), for highly accurate cloud detection in remote sensing images. FLFRF incorporates visible, infrared (IR) and thermal spectral and textural features, including the Gray Level Co-occurrence Matrix (GLCM) and the Robust Extended Local Binary Pattern (RELBP_CI), whereas DLFRF incorporates separate visible, IR and thermal classifiers. FLFRF first fuses the visible, IR and thermal features and then uses the RF model to classify pixels into cloud, snow/ice and background, or into thick cloud, thin cloud and background. DLFRF considers the visible, IR and thermal features (both spectral and textural) separately and feeds each feature set to the RF model. It keeps the vote matrix from each run of the model and finally fuses the classifiers using the majority vote method. To demonstrate the effectiveness of the proposed algorithms, 10 Terra MODIS and 15 Landsat 8 OLI/TIRS images with different spatial resolutions are used. Quantitative analyses are based on manually selected ground truth data. Results show that adding RELBP_CI to the input feature set improves cloud detection accuracy. The average cloud kappa values of FLFRF and DLFRF on MODIS images (1 and 0.99) are also higher than those of other machine learning methods, namely Linear Discriminant Analysis (LDA), Classification And Regression Tree (CART), K Nearest Neighbor (KNN) and Support Vector Machine (SVM) (0.96). The average snow/ice kappa values of FLFRF and DLFRF on MODIS images (1 and 0.85) are higher than those of other traditional methods. The...
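    The decision-level fusion step of DLFRF combines the outputs of separate classifiers by majority vote. A minimal Python sketch of hard majority voting over per-pixel labels, assuming one label list per classifier (here hypothetical visible, IR and thermal outputs); the label names and the tie-breaking rule are illustrative assumptions, and the paper's fusion of vote matrices may differ in detail:

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Fuse per-pixel labels from several classifiers by hard majority vote.

    predictions[k][i] is the label that classifier k assigns to pixel i.
    Ties are broken in favour of the tied label that appears first in classifier
    order (an assumption; tie handling is not specified in the abstract).
    """
    if not predictions:
        raise ValueError("need at least one classifier")
    n_pixels = len(predictions[0])
    if any(len(p) != n_pixels for p in predictions):
        raise ValueError("all classifiers must label the same pixels")
    fused = []
    for i in range(n_pixels):
        votes = [p[i] for p in predictions]
        counts = Counter(votes)
        best = max(counts.values())
        # First label, in classifier order, that reaches the top vote count.
        fused.append(next(v for v in votes if counts[v] == best))
    return fused


# Hypothetical per-pixel outputs of three single-source classifiers.
visible = ["cloud", "cloud", "snow_ice", "background"]
ir = ["cloud", "snow_ice", "snow_ice", "cloud"]
thermal = ["background", "cloud", "snow_ice", "snow_ice"]
print(majority_vote([visible, ir, thermal]))
```

    This prints `['cloud', 'cloud', 'snow_ice', 'background']`; the last pixel is a three-way tie, resolved by the assumed rule in favour of the first classifier. A soft-voting variant would instead sum the per-class vote fractions from each forest before taking the maximum.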

  13. Chemical Characterization of Young Virgin Queens and Mated Egg-Laying Queens in the Ant Cataglyphis cursor: Random Forest Classification Analysis for Multivariate Datasets.

    Science.gov (United States)

    Monnin, Thibaud; Helft, Florence; Leroy, Chloé; d'Ettorre, Patrizia; Doums, Claudie

    2018-02-01

    Social insects are well known for their extremely rich chemical communication, yet their sex pheromones remain poorly studied. In the thermophilic and thelytokous ant Cataglyphis cursor, we analyzed the cuticular hydrocarbon profiles and Dufour's gland contents of queens of different ages and reproductive statuses (sexually immature gynes, sexually mature gynes, mated and egg-laying queens) and of workers. Random forest classification analyses showed that the four groups of individuals were well separated for both chemical sources, except that mature gynes clustered with queens for cuticular hydrocarbons and with immature gynes for Dufour's gland secretions. Analyses carried out with only two groups of females allowed the identification of candidate chemicals for the queen signal and for the sexual attractant. In particular, gynes produced more undecane in the Dufour's gland. This chemical is both the sex pheromone and the alarm pheromone of the ant Formica lugubris. It may therefore act as a sex pheromone in C. cursor, and/or be involved in the restoration of monogyny that occurs rapidly following colony fission. Indeed, new colonies often start with several gynes and all but one are rapidly culled by workers, a process that likely involves chemical signals between gynes and workers. These findings open novel opportunities for experimental studies of inclusive mate choice and queen choice in C. cursor.

  14. Predicting stem total and assortment volumes in an industrial Pinus taeda L. forest plantation using airborne laser scanning data and random forest

    Science.gov (United States)

    Carlos Alberto Silva; Carine Klauberg; Andrew Thomas Hudak; Lee Alexander Vierling; Wan Shafrina Wan Mohd Jaafar; Midhun Mohan; Mariano Garcia; Antonio Ferraz; Adrian Cardil; Sassan Saatchi

    2017-01-01

    Improvements in the management of pine plantations result in multiple industrial and environmental benefits. Remote sensing techniques can dramatically increase the efficiency of plantation management by reducing or replacing time-consuming field sampling. We tested the utility and accuracy of combining field and airborne lidar data with Random Forest, a supervised...

  15. Random Forests as a tool for estimating uncertainty at pixel-level in SAR image classification

    DEFF Research Database (Denmark)

    Loosvelt, Lien; Peters, Jan; Skriver, Henning

    2012-01-01

    ...we introduce Random Forests for the probabilistic mapping of vegetation from high-dimensional remote sensing data and present a comprehensive methodology to assess and analyze classification uncertainty based on the local probabilities of class membership. We apply this method to SAR image data...

  16. Predicting adaptive phenotypes from multilocus genotypes in Sitka spruce (Picea sitchensis) using random forest.

    Science.gov (United States)

    Holliday, Jason A; Wang, Tongli; Aitken, Sally

    2012-09-01

    Climate is the primary driver of the distribution of tree species worldwide, and the potential for adaptive evolution will be an important factor determining the response of forests to anthropogenic climate change. Although association mapping has the potential to improve our understanding of the genomic underpinnings of climatically relevant traits, the utility of adaptive polymorphisms uncovered by such studies would be greatly enhanced by the development of integrated models that account for the phenotypic effects of multiple single-nucleotide polymorphisms (SNPs) and their interactions simultaneously. We previously reported the results of association mapping in the widespread conifer Sitka spruce (Picea sitchensis). In the current study we used the recursive partitioning algorithm 'Random Forest' to identify optimized combinations of SNPs to predict adaptive phenotypes. After adjusting for population structure, we were able to explain 37% and 30% of the phenotypic variation, respectively, in two locally adaptive traits: autumn budset timing and cold hardiness. For each trait, the leading five SNPs captured much of the phenotypic variation. To determine the role of epistasis in shaping these phenotypes, we also used a novel approach to quantify the strength and direction of pairwise interactions between SNPs and found such interactions to be common. Our results demonstrate the power of Random Forest to identify subsets of markers that are most important to climatic adaptation, and suggest that interactions among these loci may be widespread.

  17. MODELING URBAN DYNAMICS USING RANDOM FOREST: IMPLEMENTING ROC AND TOC FOR MODEL EVALUATION

    Directory of Open Access Journals (Sweden)

    M. Ahmadlou

    2016-06-01

    The importance of spatial accuracy in land use/cover change maps necessitates the use of high performance models. To reach this goal, calibrating machine learning (ML) approaches to model land use/cover conversions has received increasing interest among scholars, because these techniques can account for the complex relationships underlying urban dynamics. Compared with other ML techniques, random forest has rarely been used for modeling urban growth. Drawing on multi-temporal Landsat satellite images from 1985, 2000 and 2015, this paper calibrates a random forest regression (RFR) model to quantify variable importance and to simulate the spatial patterns of urban change. The results and performance of the RFR model were evaluated using two complementary tools, the relative operating characteristic (ROC) and the total operating characteristic (TOC), by overlaying the map of observed change and the modeled suitability map for land use change (error map). The suitability map produced by the RFR model had an area under the ROC curve of 82.48%, which indicates very good performance and highlights the model's appropriateness for simulating urban growth.
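    The area under the ROC curve reported above can be read as the probability that a randomly chosen cell that actually changed receives a higher suitability score than a randomly chosen cell that did not, with ties counting one half. A minimal pure-Python sketch of that pairwise (Mann–Whitney) formulation, assuming binary observed-change labels (1 = changed) and one suitability score per cell; the sample values are invented for illustration, and the abstract does not state which software computed its ROC or TOC:

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison of changed and unchanged cells.

    labels[i] is 1 if cell i was observed to change, 0 otherwise. Runs in
    O(P * N) time, which is fine for a sketch but not for full-resolution rasters.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one changed and one unchanged cell")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(positives) * len(negatives))


suitability = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
observed_change = [1, 1, 0, 1, 0, 0]
print(roc_auc(suitability, observed_change))  # 8 of 9 pairs ordered correctly
```

    The pairwise loop is chosen for transparency; for maps with millions of cells, a rank-based computation after sorting the scores gives the same value in O(n log n) time.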

  18. Random forest learning of ultrasonic statistical physics and object spaces for lesion detection in 2D sonomammography

    Science.gov (United States)

    Sheet, Debdoot; Karamalis, Athanasios; Kraft, Silvan; Noël, Peter B.; Vag, Tibor; Sadhu, Anup; Katouzian, Amin; Navab, Nassir; Chatterjee, Jyotirmoy; Ray, Ajoy K.

    2013-03-01

    Breast cancer is the most common form of cancer in women. Early diagnosis can significantly improve life expectancy and allow different treatment options. Clinicians favor 2D ultrasonography for breast tissue abnormality screening because of its high sensitivity and specificity compared with competing technologies. However, inter- and intra-observer variability in the visual assessment and reporting of lesions often handicaps its performance. Existing Computer Assisted Diagnosis (CAD) systems, though able to detect solid lesions, are often restricted in performance by their inability to (1) detect lesions of multiple sizes and shapes and (2) differentiate hypo-echoic lesions from their posterior acoustic shadowing. In this work we present a completely automatic system for the detection and segmentation of breast lesions in 2D ultrasound images. We employ random forests to learn tissue-specific primal that discriminates breast lesions from surrounding normal tissue. This enables the system to detect lesions of multiple shapes and sizes, as well as to discriminate hypo-echoic lesions from the associated posterior acoustic shadowing. The primal comprises (i) multiscale estimated ultrasonic statistical physics and (ii) scale-space characteristics. The random forest learns lesion vs. background primal from a database of 2D ultrasound images with labeled lesions. For segmentation, the posterior probabilities of lesion pixels estimated by the learnt random forest are hard thresholded to provide starting seeds for a random walks segmentation stage. Our method achieves detection with 99.19% accuracy and segmentation with a mean contour-to-contour error < 3 pixels on a set of 40 images with 49 lesions.

  19. BitterSweetForest: A random forest based binary classifier to predict bitterness and sweetness of chemical compounds

    Science.gov (United States)

    Banerjee, Priyanka; Preissner, Robert

    2018-04-01

    The taste of chemical compounds present in food stimulates us to take in nutrients and avoid poisons. However, the perception of taste depends greatly on genetic as well as evolutionary factors. The aim of this work was the development and validation of a machine learning model based on molecular fingerprints to discriminate between the sweet and bitter taste of molecules. BitterSweetForest is the first open-access model based on a KNIME workflow that provides a platform for predicting the bitter and sweet taste of chemical compounds using molecular fingerprints and a Random Forest based classifier. The constructed model yielded an accuracy of 95% and an AUC of 0.98 in cross-validation. On an independent test set, BitterSweetForest achieved an accuracy of 96% and an AUC of 0.98 for bitter and sweet taste prediction. The constructed model was further applied to predict the bitter and sweet taste of natural compounds, approved drugs, and an acute toxicity compound data set. BitterSweetForest suggests that 70% of the natural product space is bitter and 10% is sweet, with a confidence score of 0.60 and above. 77% of the approved drug set was predicted as bitter and 2% as sweet with a confidence score of 0.75 and above. Similarly, 75% of the compounds from the acute oral toxicity class were predicted only as bitter with a minimum confidence score of 0.75, revealing that toxic compounds are mostly bitter. Furthermore, we applied a Bayesian feature analysis method to identify the most frequently occurring chemical features that discriminate sweet and bitter compounds in the feature space of a circular fingerprint.

  20. Correspondence between sound propagation in discrete and continuous random media with application to forest acoustics.

    Science.gov (United States)

    Ostashev, Vladimir E; Wilson, D Keith; Muhlestein, Michael B; Attenborough, Keith

    2018-02-01

    Although sound propagation in a forest is important in several applications, there are currently no rigorous yet computationally tractable prediction methods. Due to the complexity of sound scattering in a forest, it is natural to formulate the problem stochastically. In this paper, it is demonstrated that the equations for the statistical moments of the sound field propagating in a forest have the same form as those for sound propagation in a turbulent atmosphere if the scattering properties of the two media are expressed in terms of the differential scattering and total cross sections. Using the existing theories for sound propagation in a turbulent atmosphere, this analogy enables the derivation of several results for predicting forest acoustics. In particular, the second-moment parabolic equation is formulated for the spatial correlation function of the sound field propagating above an impedance ground in a forest with micrometeorology. Effective numerical techniques for solving this equation have been developed in atmospheric acoustics. In another example, formulas are obtained that describe the effect of a forest on the interference between the direct and ground-reflected waves. The formulated correspondence between wave propagation in discrete and continuous random media can also be used in other fields of physics.

  1. Random Forest Classification of Wetland Landcovers from Multi-Sensor Data in the Arid Region of Xinjiang, China

    Directory of Open Access Journals (Sweden)

    Shaohong Tian

    2016-11-01

    Wetland classification from remotely sensed data is usually difficult due to extensive seasonal vegetation dynamics and hydrological fluctuation. This study presents a random forest classification approach for retrieving wetland landcover in arid regions by fusing Pléiade-1B data with multi-date Landsat-8 data. The Pléiade-1B multispectral image data were segmented using an object-oriented approach, and geometric and spectral features were extracted for the segmented image objects. Normalized difference vegetation index (NDVI) series data, reflecting vegetation phenological changes over the growth cycle, were also calculated from the multi-date Landsat-8 data. The feature set extracted from the data of the two sensors was optimized and used to build the random forest model for classifying the wetland landcovers of the Ertix River in northern Xinjiang, China. Comparison with other classification methods, such as support vector machine and artificial neural network classifiers, indicates that the random forest classifier achieves accurate classification, with an overall accuracy of 93% and a Kappa coefficient of 0.92. Exploiting geometric shape properties improved the classification accuracy of farmland and water bodies, which have distinct boundaries with the surrounding land covers, by 5%–10%. To overcome the difficulty caused by the similar spectral features of the vegetation covers, phenological differences and textural information from the gray-level co-occurrence matrix were incorporated into the classification, and the main wetland vegetation covers in the study area were derived from the data of the two sensors. Including phenological information reduced classification errors and improved overall accuracy by approximately 10%. The results show that the proposed random forest

  2. A comparison of random forest and its Gini importance with standard chemometric methods for the feature selection and classification of spectral data

    Directory of Open Access Journals (Sweden)

    Himmelreich Uwe

    2009-07-01

    Background: Regularized regression methods such as principal component or partial least squares regression perform well in learning tasks on high-dimensional spectral data, but cannot explicitly eliminate irrelevant features. The random forest classifier with its associated Gini feature importance, on the other hand, allows for explicit feature elimination, but may not be optimally adapted to spectral data due to the topology of its constituent classification trees, which are based on orthogonal splits in feature space. Results: We propose to combine the best of both approaches, and evaluate the joint use of feature selection based on recursive feature elimination using the Gini importance of random forests together with regularized classification methods on spectral data sets from medical diagnostics, chemotaxonomy, biomedical analytics, food science, and synthetically modified spectral data. Here, feature selection using the Gini feature importance combined with regularized classification by discriminant partial least squares regression performed as well as or better than filtering according to different univariate statistical tests, or using regression coefficients in a backward feature elimination. It outperformed the direct application of the random forest classifier, as well as the direct application of the regularized classifiers on the full set of features. Conclusion: The Gini importance of the random forest provided superior means for measuring feature relevance on spectral data, but on an optimal subset of features the regularized classifiers might be preferable to the random forest classifier, in spite of their limitation to modeling linear dependencies only. A feature selection based on Gini importance, however, may precede a regularized linear classification to identify this optimal subset of features, and to earn a double benefit of both dimensionality reduction and the elimination of noise from the classification task.
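    The Gini importance referred to above is usually computed by summing, over every node split on a feature, the decrease in Gini impurity weighted by the fraction of training samples reaching that node (then averaging over trees). A minimal pure-Python sketch for a single tree, assuming the splits are supplied as hypothetical (feature, parent_labels, left_labels, right_labels) tuples rather than any particular library's tree structure:

```python
from collections import Counter, defaultdict


def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity decrease when `parent` is split into `left` and `right`."""
    n = len(parent)
    return (gini(parent)
            - len(left) / n * gini(left)
            - len(right) / n * gini(right))


def gini_importance(splits, n_total):
    """Per-feature Gini importance for one tree.

    `splits` is a hypothetical list of (feature, parent, left, right)
    tuples; each impurity decrease is weighted by the share of the
    `n_total` training samples that reach that node.
    """
    importance = defaultdict(float)
    for feature, parent, left, right in splits:
        weight = len(parent) / n_total
        importance[feature] += weight * gini_decrease(parent, left, right)
    return dict(importance)


# Toy tree: root splits on "x", then the impure right child splits on "y".
splits = [
    ("x", ["a", "a", "a", "b"], ["a", "a"], ["a", "b"]),
    ("y", ["a", "b"], ["a"], ["b"]),
]
print(gini_importance(splits, n_total=4))
```

    In a recursive feature elimination loop of the kind described, one would refit the forest, average these per-tree scores, drop the lowest-ranked features, and repeat; that outer loop is omitted here.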

  3. Simulating Urban Growth Using a Random Forest-Cellular Automata (RF-CA) Model

    Directory of Open Access Journals (Sweden)

    Courage Kamusoko

    2015-04-01

    Sustainable urban planning and management require reliable land change models, which can be used to improve decision making. The objective of this study was to test a random forest-cellular automata (RF-CA) model, which combines random forest (RF) and cellular automata (CA) models. The Kappa simulation (KSimulation), figure of merit, and components of agreement and disagreement statistics were used to validate the RF-CA model. Furthermore, the RF-CA model was compared with support vector machine cellular automata (SVM-CA) and logistic regression cellular automata (LR-CA) models. Results show that the RF-CA model outperformed the SVM-CA and LR-CA models. The RF-CA model had a KSimulation accuracy of 0.51 (with a figure of merit statistic of 47%), while the SVM-CA and LR-CA models had KSimulation accuracies of 0.39 and −0.22 (with figure of merit statistics of 39% and 6%), respectively. Generally, the RF-CA model was relatively accurate at allocating “non-built-up to built-up” changes, as reflected by the correct “non-built-up to built-up” components of agreement of 15%. The performance of the RF-CA model was attributed to the relatively accurate RF transition potential maps. Therefore, this study highlights the potential of the RF-CA model for simulating urban growth.

  4. Encoding atlases by randomized classification forests for efficient multi-atlas label propagation.

    Science.gov (United States)

    Zikic, D; Glocker, B; Criminisi, A

    2014-12-01

    We propose a method for multi-atlas label propagation (MALP) based on encoding the individual atlases by randomized classification forests. Most current approaches perform a non-linear registration between all atlases and the target image, followed by a sophisticated fusion scheme. While these approaches can achieve high accuracy, in general they do so at high computational cost. This might negatively affect the scalability to large databases and experimentation. To tackle this issue, we propose to use a small and deep classification forest to encode each atlas individually in reference to an aligned probabilistic atlas, resulting in an Atlas Forest (AF). Our classifier-based encoding differs from current MALP approaches, which represent each point in the atlas either directly as a single image/label value pair, or by a set of corresponding patches. At test time, each AF produces one probabilistic label estimate, and their fusion is done by averaging. Our scheme performs only one registration per target image, achieves good results with a simple fusion scheme, and allows for efficient experimentation. In contrast to standard forest schemes, in which each tree would be trained on all atlases, our approach retains the advantages of the standard MALP framework. The target-specific selection of atlases remains possible, and incorporation of new scans is straightforward without retraining. The evaluation on four different databases shows accuracy within the range of the state of the art at a significantly lower running time. Copyright © 2014 Elsevier B.V. All rights reserved.

  5. Predicting Seagrass Occurrence in a Changing Climate Using Random Forests

    Science.gov (United States)

    Aydin, O.; Butler, K. A.

    2017-12-01

    Seagrasses are marine plants that can quickly sequester vast amounts of carbon (up to 100 times more and 12 times faster than tropical forests). In this work, we present an integrated GIS and machine learning approach to build a data-driven model of seagrass presence-absence. We outline a random forest approach that avoids the prevalence bias found in many ecological presence-absence models. One of our goals is to predict global seagrass occurrence from a spatially limited training sample. In addition, we conduct a sensitivity study which investigates the vulnerability of seagrass to changing climate conditions. We integrate multiple data sources, including fine-scale seagrass data from MarineCadastre.gov and the recently released, globally extensive, publicly available Ecological Marine Units (EMU) dataset. These data are used to train a model for seagrass occurrence along the U.S. coast. In situ ocean data are interpolated using Empirical Bayesian Kriging (EBK) to produce globally extensive prediction variables. A neural network is used to estimate probable future values of prediction variables such as ocean temperature to assess the impact of a warming climate on seagrass occurrence. The proposed workflow can be generalized to many presence-absence models.

  6. Modeling urban coastal flood severity from crowd-sourced flood reports using Poisson regression and Random Forest

    Science.gov (United States)

    Sadler, J. M.; Goodall, J. L.; Morsy, M. M.; Spencer, K.

    2018-04-01

    Sea level rise has already caused more frequent and severe coastal flooding, and this trend will likely continue. Flood prediction is an essential part of a coastal city's capacity to adapt to and mitigate this growing problem. Complex coastal urban hydrological systems, however, do not always lend themselves easily to physically based flood prediction approaches. This paper presents a data-driven approach to estimating flood severity in an urban coastal setting using crowd-sourced data, a non-traditional but growing data source, along with environmental observation data. Two data-driven models, Poisson regression and Random Forest regression, are trained to predict the number of flood reports per storm event as a proxy for flood severity, given extensive environmental data (i.e., rainfall, tide, groundwater table level, and wind conditions) as input. The method is demonstrated using data from Norfolk, Virginia, USA from September 2010 to October 2016. Quality-controlled, crowd-sourced street flooding reports ranging from 1 to 159 per storm event for 45 storm events are used to train and evaluate the models. Random Forest performed better than Poisson regression at predicting the number of flood reports and had a lower false negative rate. In the Random Forest model, total cumulative rainfall was by far the most dominant input variable in predicting flood severity, followed by low tide and lower low tide. These methods serve as a first step toward using data-driven methods for spatially and temporally detailed coastal urban flood prediction.

  7. Semi-deciduous forest remnants in Benin: patterns and floristic characterisation

    NARCIS (Netherlands)

    Adomou, A.C.; Akoegninou, A.; Sinsin, B.; Foucault, de B.; Maesen, van der L.J.G.

    2009-01-01

    Patterns of semi-deciduous forest are investigated in Benin by means of phytosociological relevés and multivariate analyses. Species and family importance values are assessed for each forest type. The classifications and DCA ordination of 176 semi-deciduous forest relevés result in six forest types,

  8. Prediction of glycosylation sites using random forests

    Directory of Open Access Journals (Sweden)

    Hirst Jonathan D

    2008-11-01

    Background: Post-translational modifications (PTMs) occur in the vast majority of proteins and are essential for function. Prediction of the sequence location of PTMs enhances the functional characterisation of proteins. Glycosylation is one type of PTM, and is implicated in protein folding, transport and function. Results: We use the random forest algorithm and pairwise patterns to predict glycosylation sites. We identify pairwise patterns surrounding glycosylation sites and use an odds ratio to weight their propensity of association with modified residues. Our prediction program, GPP (glycosylation prediction program), predicts glycosylation sites with an accuracy of 90.8% for Ser sites, 92.0% for Thr sites and 92.8% for Asn sites. This is significantly better than current glycosylation predictors. We use the TREPAN algorithm to extract a set of comprehensible rules from GPP, which provide biological insight into all three major glycosylation types. Conclusion: We have created an accurate predictor of glycosylation sites and used it to extract comprehensible rules about the glycosylation process. GPP is available online at http://comp.chem.nottingham.ac.uk/glyco/.
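    The odds ratio mentioned above summarises, for one pairwise pattern, how much more often it occurs around modified residues than around unmodified ones. The abstract does not say how GPP converts the ratio into weights, so this minimal sketch only computes the standard 2x2 odds ratio from hypothetical counts, with an optional pseudocount (an assumption, not from the source) to avoid division by zero:

```python
def odds_ratio(mod_with, unmod_with, mod_without, unmod_without, pseudocount=0.0):
    """Odds ratio (a*d)/(b*c) for pattern presence vs. site modification.

    a = modified sites whose neighbourhood contains the pattern
    b = unmodified sites whose neighbourhood contains the pattern
    c = modified sites without the pattern
    d = unmodified sites without the pattern
    `pseudocount` is added to every cell (hypothetical smoothing choice).
    """
    a, b, c, d = (x + pseudocount
                  for x in (mod_with, unmod_with, mod_without, unmod_without))
    return (a * d) / (b * c)


# Hypothetical counts: the pattern is 8x more strongly associated with modification.
print(odds_ratio(10, 5, 2, 8))
```

    A ratio above 1 indicates that the pattern favours modified residues; a ratio below 1 indicates the opposite.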

  9. Stable Graphical Model Estimation with Random Forests for Discrete, Continuous, and Mixed Variables

    OpenAIRE

    Fellinghauer, Bernd; Bühlmann, Peter; Ryffel, Martin; von Rhein, Michael; Reinhardt, Jan D.

    2011-01-01

    A conditional independence graph is a concise representation of pairwise conditional independence among many variables. Graphical Random Forests (GRaFo) are a novel method for estimating pairwise conditional independence relationships among mixed-type, i.e. continuous and discrete, variables. The number of edges is a tuning parameter in any graphical model estimator, and there is no obvious number that constitutes a good choice. Stability Selection helps choose this parameter with respect to...

  10. Gearbox fault diagnosis based on deep random forest fusion of acoustic and vibratory signals

    Science.gov (United States)

    Li, Chuan; Sanchez, René-Vinicio; Zurita, Grover; Cerrada, Mariela; Cabrera, Diego; Vásquez, Rafael E.

    2016-08-01

    Fault diagnosis is an effective tool to guarantee safe operation of gearboxes. Acoustic and vibratory measurements in such mechanical devices are both sensitive to the existence of faults. This work addresses the use of a deep random forest fusion (DRFF) technique to improve fault diagnosis performance for gearboxes by using measurements from an acoustic emission (AE) sensor and an accelerometer that simultaneously monitor the gearbox condition. The statistical parameters of the wavelet packet transform (WPT) are first produced from the AE signal and the vibratory signal, respectively. Two deep Boltzmann machines (DBMs) are then developed for deep representations of the WPT statistical parameters. A random forest is finally used to fuse the outputs of the two DBMs into the integrated DRFF model. The proposed DRFF technique is evaluated using gearbox fault diagnosis experiments under different operational conditions, and achieves a classification rate of 97.68% for 11 different condition patterns. Compared to other peer algorithms, the proposed method exhibits the best performance. The results indicate that the deep learning fusion of acoustic and vibratory signals may improve fault diagnosis capabilities for gearboxes.

  11. A Framework To Support Management Of HIV/AIDS Using K-Means And Random Forest Algorithm

    Directory of Open Access Journals (Sweden)

    Gladys Iseu

    2017-06-01

    The healthcare industry generates large amounts of complex data about patients, hospital resources, disease management, electronic patient records, and medical devices, among others. The availability of these huge amounts of medical data creates a need for powerful mining tools to support healthcare professionals in the diagnosis, treatment, and management of HIV/AIDS. Several data mining techniques have been applied to different data sets. Data mining techniques have been categorized into regression, segmentation, association, sequence analysis, and classification algorithms. In the medical field, there has not been a specific study that incorporated two or more data mining algorithms, which limits decision making by medical practitioners. This study identified the extent to which the K-means algorithm clusters patient characteristics; it also evaluated the extent to which the random forest algorithm can classify the data for informed decision making, and designed a framework to support medical decision making in the treatment of HIV/AIDS-related diseases in Kenya. The paper further used the random forest classification algorithm to compute proximities between pairs of cases, which can be used for clustering, locating outliers, or scaling to give interesting views of the data.
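    Random forest proximities of the kind mentioned above are commonly defined as the fraction of trees in which two cases land in the same terminal node. A minimal sketch, assuming the leaf index of every case in every tree is already available as a hypothetical nested list `leaves[t][i]` (not tied to any specific library); Breiman's original formulation often counts only out-of-bag cases, which this sketch omits:

```python
def proximity_matrix(leaves):
    """Proximity of cases i and j = share of trees putting them in the same leaf.

    `leaves[t][i]` is the (hypothetical) terminal-node index of case i in tree t.
    """
    n_trees = len(leaves)
    n_cases = len(leaves[0])
    counts = [[0 for _ in range(n_cases)] for _ in range(n_cases)]
    for tree_leaves in leaves:
        for i in range(n_cases):
            for j in range(n_cases):
                if tree_leaves[i] == tree_leaves[j]:
                    counts[i][j] += 1
    return [[c / n_trees for c in row] for row in counts]


# Two trees, three cases: cases 0 and 1 share a leaf in tree 0, cases 1 and 2 in tree 1.
prox = proximity_matrix([[0, 0, 1], [2, 3, 3]])
dissimilarity = [[1.0 - p for p in row] for row in prox]
```

    The derived `1 - proximity` matrix can then be fed to a clustering or multidimensional scaling routine, which is one way to obtain the clustering, outlier, and scaling views the abstract mentions.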

  12. An AUC-based permutation variable importance measure for random forests.

    Science.gov (United States)

    Janitza, Silke; Strobl, Carolin; Boulesteix, Anne-Laure

    2013-04-05

    The random forest (RF) method is a commonly used tool for classification with high-dimensional data as well as for ranking candidate predictors based on the so-called random forest variable importance measures (VIMs). However, the classification performance of RF is known to be suboptimal in the case of strongly unbalanced data, i.e. data where response class sizes differ considerably. Suggestions have been made to obtain better classification performance based either on sampling procedures or on cost sensitivity analyses. However, to our knowledge, the performance of the VIMs has not yet been examined in the case of unbalanced response classes. In this paper we explore the performance of the permutation VIM for unbalanced data settings and introduce an alternative permutation VIM based on the area under the curve (AUC) that is expected to be more robust towards class imbalance. We investigated the performance of the standard permutation VIM and of our novel AUC-based permutation VIM for different class imbalance levels using simulated data and real data. The results suggest that the new AUC-based permutation VIM outperforms the standard permutation VIM for unbalanced data settings, while both permutation VIMs have equal performance for balanced data settings. The standard permutation VIM loses its ability to discriminate between associated predictors and predictors not associated with the response for increasing class imbalance. It is outperformed by our new AUC-based permutation VIM for unbalanced data settings, while the performance of both VIMs is very similar in the case of balanced classes. The new AUC-based VIM is implemented in the R package party for the unbiased RF variant based on conditional inference trees. The codes implementing our study are available from the companion website: http://www.ibe.med.uni-muenchen.de/organisation/mitarbeiter/070_drittmittel/janitza/index.html.
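    The idea behind an AUC-based permutation VIM is to measure importance as the drop in AUC, rather than in accuracy, after a predictor's values are randomly permuted. The published measure is computed per tree on out-of-bag data and averaged; the minimal sketch below instead applies the idea to a single fitted scoring function `predict` (a hypothetical callable mapping a feature row to a score), so it illustrates the principle rather than the party implementation:

```python
import random


def auc(scores, labels):
    """AUC via the Mann-Whitney statistic: P(pos > neg) + 0.5 * P(tie)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_permutation_importance(predict, X, y, feature, n_repeats=10, seed=0):
    """Mean AUC decrease after shuffling column `feature` of X (rows are lists)."""
    rng = random.Random(seed)
    base = auc([predict(row) for row in X], y)
    drops = []
    for _ in range(n_repeats):
        column = [row[feature] for row in X]
        rng.shuffle(column)
        X_perm = [row[:feature] + [v] + row[feature + 1:]
                  for row, v in zip(X, column)]
        drops.append(base - auc([predict(row) for row in X_perm], y))
    return sum(drops) / n_repeats


# Toy scorer that only uses feature 0, so feature 1 should get zero importance.
predict = lambda row: row[0]
X = [[0.1, 5.0], [0.2, 3.0], [0.8, 1.0], [0.9, 4.0]]
y = [0, 0, 1, 1]
```

    Because AUC compares every positive case with every negative case, it is not dominated by the majority class in the way accuracy is, which is the motivation the abstract gives for robustness under class imbalance.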

  13. Random forests for classification in ecology

    Science.gov (United States)

    Cutler, D.R.; Edwards, T.C.; Beard, K.H.; Cutler, A.; Hess, K.T.; Gibson, J.; Lawler, J.J.

    2007-01-01

    Classification procedures are some of the most widely used statistical methods in ecology. Random forests (RF) is a new and powerful statistical classifier that is well established in other disciplines but is relatively unknown in ecology. Advantages of RF compared to other statistical classifiers include (1) very high classification accuracy; (2) a novel method of determining variable importance; (3) ability to model complex interactions among predictor variables; (4) flexibility to perform several types of statistical data analysis, including regression, classification, survival analysis, and unsupervised learning; and (5) an algorithm for imputing missing values. We compared the accuracies of RF and four other commonly used statistical classifiers using data on invasive plant species presence in Lava Beds National Monument, California, USA, rare lichen species presence in the Pacific Northwest, USA, and nest sites for cavity nesting birds in the Uinta Mountains, Utah, USA. We observed high classification accuracy in all applications as measured by cross-validation and, in the case of the lichen data, by independent test data, when comparing RF to other common classification methods. We also observed that the variables that RF identified as most important for classifying invasive plant species coincided with expectations based on the literature. © 2007 by the Ecological Society of America.

  14. Prostate cancer prediction using the random forest algorithm that takes into account transrectal ultrasound findings, age, and serum levels of prostate-specific antigen

    Directory of Open Access Journals (Sweden)

    Li-Hong Xiao

    2017-01-01

    The aim of this study is to evaluate the ability of the random forest algorithm, combining data on transrectal ultrasound findings, age, and serum levels of prostate-specific antigen, to predict prostate carcinoma. Clinico-demographic data were analyzed for 941 patients with prostate diseases treated at our hospital, including age, serum prostate-specific antigen levels, transrectal ultrasound findings, and pathology diagnosis based on ultrasound-guided needle biopsy of the prostate. These data were compared between patients with and without prostate cancer using the Chi-square test, and then entered into the random forest model to predict diagnosis. Patients with and without prostate cancer differed significantly in age and serum prostate-specific antigen levels (P < 0.001), as well as in all transrectal ultrasound characteristics (P < 0.05) except uneven echo (P = 0.609). The random forest model based on age, prostate-specific antigen and ultrasound predicted prostate cancer with an accuracy of 83.10%, sensitivity of 65.64%, and specificity of 93.83%. Positive predictive value was 86.72%, and negative predictive value was 81.64%. By integrating age, prostate-specific antigen levels and transrectal ultrasound findings, the random forest algorithm shows better diagnostic performance for prostate cancer than any of these diagnostic indicators on its own. This algorithm may help improve diagnosis of the disease by identifying patients at high risk for biopsy.
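    The five figures reported above (accuracy, sensitivity, specificity, positive and negative predictive value) all derive from the same 2x2 confusion matrix. A minimal sketch of the standard definitions; the counts in the usage line are hypothetical, since the abstract does not give the underlying confusion matrix:

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard binary diagnostic metrics from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


# Hypothetical counts, not taken from the study.
metrics = diagnostic_metrics(tp=8, fp=2, tn=9, fn=1)
```

    Note that PPV and NPV, unlike sensitivity and specificity, also depend on how prevalent cancer is in the analysed cohort.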

  15. Automatic classification of endogenous seismic sources within a landslide body using random forest algorithm

    Science.gov (United States)

    Provost, Floriane; Hibert, Clément; Malet, Jean-Philippe; Stumpf, André; Doubre, Cécile

    2016-04-01

    Different studies have shown the presence of microseismic activity in soft-rock landslides. The seismic signals exhibit significantly different features in the time and frequency domains, which allow their classification and interpretation. Most of the classes could be associated with different mechanisms of deformation occurring within and at the surface (e.g. rockfall, slide-quake, fissure opening, fluid circulation). However, some signals remain not fully understood, and some classes contain too few examples to allow any interpretation. To move toward a more complete interpretation of the links between the dynamics of soft-rock landslides and the physical processes controlling their behaviour, a complete catalog of the endogenous seismicity is needed. We propose a multi-class detection method based on the random forests algorithm to automatically classify the source of seismic signals. Random forests is a supervised machine learning technique based on the computation of a large number of decision trees. The multiple decision trees are constructed from training sets that include each of the target classes and are described by a set of attributes. In the case of seismic signals, these attributes may encompass spectral features but also waveform characteristics, multi-station observations and other relevant information. The Random Forest classifier is used because it provides state-of-the-art performance compared with other machine learning techniques (e.g. SVM, Neural Networks) and requires no fine tuning. Furthermore, it is relatively fast, robust, easy to parallelize, and inherently suitable for multi-class problems. In this work, we present the first results of the classification method applied to the seismicity recorded at the Super-Sauze landslide between 2013 and 2015. We selected a dozen seismic signal features that precisely characterize its spectral content (e.g. central frequency, spectrum width, energy in several frequency bands, spectrogram shape, spectrum local and global maxima

  16. Using random forests for assistance in the curation of G-protein coupled receptor databases.

    Science.gov (United States)

    Shkurin, Aleksei; Vellido, Alfredo

    2017-08-18

    Biology is experiencing a gradual but fast transformation from a laboratory-centred science towards a data-centred one. As such, it requires robust data engineering and the use of quantitative data analysis methods as part of database curation. This paper focuses on G protein-coupled receptors, a large and heterogeneous super-family of cell membrane proteins of interest to biology in general. One of its families, Class C, is of particular interest to pharmacology and drug design. This family is quite heterogeneous on its own, and the discrimination of its several sub-families is a challenging problem. In the absence of known crystal structures, such discrimination must rely on their primary amino acid sequences. We are interested not so much in achieving maximum sub-family discrimination accuracy using quantitative methods as in exploring sequence misclassification behavior. Specifically, we are interested in isolating those sequences showing consistent misclassification, that is, sequences that are very often misclassified and almost always to the same wrong sub-family. Random forests are used for this analysis due to their ensemble nature, which makes them naturally suited to gauge the consistency of misclassification. This consistency is here defined through the voting scheme of their base tree classifiers. Detailed consistency results for the random forest ensemble classification were obtained for all receptors and for all data transformations of their unaligned primary sequences. Shortlists of the most consistently misclassified receptors for each subfamily and transformation, as well as an overall shortlist including those cases that were consistently misclassified across transformations, were obtained. The latter should be referred to experts for further investigation as a data curation task. The automatic discrimination of the Class C sub-families of G protein-coupled receptors from their unaligned primary sequences shows clear limits. This study has

  17. Predicting disease risks from highly imbalanced data using random forest

    Directory of Open Access Journals (Sweden)

    Chakraborty Sounak

    2011-07-01

    Background: We present a method utilizing the Healthcare Cost and Utilization Project (HCUP) dataset for predicting the disease risk of individuals based on their medical diagnosis history. The presented methodology may be incorporated in a variety of applications such as risk management, tailored health communication and decision support systems in healthcare. Methods: We employed the National Inpatient Sample (NIS) data, which is publicly available through the Healthcare Cost and Utilization Project (HCUP), to train random forest classifiers for disease prediction. Since the HCUP data is highly imbalanced, we employed an ensemble learning approach based on repeated random sub-sampling. This technique divides the training data into multiple sub-samples, while ensuring that each sub-sample is fully balanced. We compared the performance of support vector machine (SVM), bagging, boosting and RF to predict the risk of eight chronic diseases. Results: We predicted eight disease categories. Overall, the RF ensemble learning method outperformed SVM, bagging and boosting in terms of the area under the receiver operating characteristic (ROC) curve (AUC). In addition, RF has the advantage of computing the importance of each variable in the classification process. Conclusions: By combining repeated random sub-sampling with RF, we were able to overcome the class imbalance problem and achieve promising results. Using the national HCUP data set, we predicted eight disease categories with an average AUC of 88.79%.
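    The balanced sub-sampling step described above can be sketched as splitting the majority class into disjoint chunks the size of the minority class and pairing each chunk with all minority cases; one classifier is then trained per sub-sample and their scores are averaged. This is a minimal illustration under those assumptions (the paper's exact partitioning and aggregation rules are not given in the abstract, and the per-sub-sample models here are arbitrary hypothetical callables):

```python
import random


def balanced_subsamples(majority, minority, seed=0):
    """Pair disjoint chunks of the majority class with the whole minority class.

    Each chunk has len(minority) cases, so every sub-sample is 50/50.
    Leftover majority cases that cannot fill a whole chunk are dropped
    in this sketch.
    """
    k = len(minority)
    if k == 0:
        raise ValueError("minority class is empty")
    rng = random.Random(seed)
    shuffled = list(majority)
    rng.shuffle(shuffled)
    return [shuffled[start:start + k] + list(minority)
            for start in range(0, len(shuffled) - k + 1, k)]


def ensemble_score(models, x):
    """Average the risk scores of the per-sub-sample models."""
    return sum(model(x) for model in models) / len(models)


# 10 majority cases and 3 minority cases give 3 balanced sub-samples of size 6.
subsamples = balanced_subsamples(list(range(10)), ["m1", "m2", "m3"])
```

    In practice each sub-sample would be used to fit its own random forest, and `ensemble_score` would combine their predicted probabilities for a new patient.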

  18. Design of Probabilistic Random Forests with Applications to Anticancer Drug Sensitivity Prediction.

    Science.gov (United States)

    Rahman, Raziur; Haider, Saad; Ghosh, Souparno; Pal, Ranadip

    2015-01-01

    Random forests consisting of an ensemble of regression trees with equal weights are frequently used for design of predictive models. In this article, we consider an extension of the methodology by representing the regression trees in the form of probabilistic trees and analyzing the nature of heteroscedasticity. The probabilistic tree representation allows for analytical computation of confidence intervals (CIs), and the tree weight optimization is expected to provide stricter CIs with comparable performance in mean error. We approached the prediction of the ensemble of probabilistic trees from two perspectives: as a mixture distribution and as a weighted sum of correlated random variables. We applied our methodology to the drug sensitivity prediction problem on synthetic data and the Cancer Cell Line Encyclopedia dataset and illustrated that tree weights can be selected to reduce the average length of the CI without an increase in mean error.
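    The two perspectives named above have standard moment formulas. If tree i predicts with mean μ_i and variance σ_i² and has weight w_i, the weighted mixture has mean Σ w_i μ_i and variance Σ w_i (σ_i² + μ_i²) minus the squared mean; a weighted sum of correlated tree predictions with covariance matrix Σ has variance wᵀΣw. A minimal sketch of both, assuming the per-tree means, variances, and covariances are already available; it computes moments only and is not the authors' CI construction or weight optimization:

```python
def mixture_mean_var(weights, means, variances):
    """Mean and variance of a weighted mixture of per-tree distributions.

    Weights are normalised to sum to 1 before use.
    """
    total = sum(weights)
    w = [wi / total for wi in weights]
    mean = sum(wi * mi for wi, mi in zip(w, means))
    second_moment = sum(wi * (vi + mi ** 2)
                        for wi, mi, vi in zip(w, means, variances))
    return mean, second_moment - mean ** 2


def weighted_sum_var(weights, cov):
    """Variance w^T C w of a weighted sum of correlated tree predictions."""
    n = len(weights)
    return sum(weights[i] * weights[j] * cov[i][j]
               for i in range(n) for j in range(n))


# Two trees disagreeing in mean: the mixture variance exceeds each tree's own variance.
mix_mean, mix_var = mixture_mean_var([1, 1], [0.0, 2.0], [1.0, 1.0])
sum_var = weighted_sum_var([0.5, 0.5], [[1.0, 0.5], [0.5, 1.0]])
```

    The example shows why the two views differ: the mixture variance (2.0) includes the spread between tree means, whereas averaging two correlated predictions shrinks the variance (0.75).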

  19. Personalized epilepsy seizure detection using random forest classification over one-dimension transformed EEG data

    OpenAIRE

    Orellana, Marco; Cerqueira, Fabio

    2016-01-01

    This work presents a computational method for improving seizure detection for epilepsy diagnosis. Epilepsy is the second most common neurological disease, affecting between 40 and 50 million patients worldwide, and its proper diagnosis using electroencephalographic signals is a long and expensive process that involves medical specialists. The proposed system is a patient-dependent offline system that performs automatic detection of seizures in brainwaves by applying a random forest...

  20. Prognostic Factors for Survival in Patients with Gastric Cancer using a Random Survival Forest

    Science.gov (United States)

    Adham, Davoud; Abbasgholizadeh, Nategh; Abazari, Malek

    2017-01-01

    Background: Gastric cancer is the fifth most common cancer and the third leading cause of cancer-related death, with about 1 million new cases and 700,000 deaths in 2012. The aim of this investigation was to identify important prognostic factors for outcome using a random survival forest (RSF) approach. Materials and Methods: Data were collected from 128 gastric cancer patients through a historical cohort study in Hamedan, Iran, from 2007 to 2013. The event under consideration was death due to gastric cancer. The random survival forest model in R software was applied to determine the key factors affecting survival. Four splitting rules were used to determine the importance of the variables in the model: log-rank, conservation of events, log-rank score, and randomization. The efficiency of the model was confirmed in terms of Harrell's concordance index. Results: The mean age at diagnosis was 63 ± 12.57 years, and the mean and median survival times were 15.2 (95% CI: 13.3, 17.0) and 12.3 (95% CI: 11.0, 13.4) months, respectively. The one-, two-, and three-year survival rates were 51%, 13%, and 5%, respectively. Each RSF approach produced a slightly different ranking of variables. The most important covariates in nearly all four RSF approaches were metastatic status, age at diagnosis, and tumor size. The error rates of the RSF approaches ranged from 0.29 to 0.32; the best error rate was obtained with the log-rank splitting rule, followed by log-rank score, conservation of events, and the random splitting rule. Conclusion: The low survival rate of gastric cancer patients indicates the absence of a screening program for early diagnosis of the disease. Timely diagnosis in early phases increases survival and decreases mortality. Creative Commons Attribution License
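
    Harrell's concordance index, used above to assess the models, can be computed directly from survival times, event indicators, and predicted risks. The following is a minimal O(n²) sketch; the function name is hypothetical, pairs with tied times are simply skipped (a simplification), and the remark that survival-forest error rates are often reported as 1 − C (consistent with the 0.29 to 0.32 range above) is an interpretation, not something the abstract states.

```python
# Illustrative sketch; function name and toy data are hypothetical.
from typing import Sequence


def harrell_c_index(
    times: Sequence[float], events: Sequence[int], risk: Sequence[float]
) -> float:
    """Fraction of comparable pairs in which the higher-risk subject fails first.

    A pair (i, j) is comparable when subject i had an observed event and
    times[i] < times[j]. Ties in risk count as half-concordant; ties in time
    are skipped in this simplified version.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(len(times)):
            if times[j] > times[i]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    times = [2.0, 5.0, 7.0, 11.0]
    events = [1, 0, 1, 1]  # 0 = censored
    risk = [0.9, 0.4, 0.6, 0.1]
    c = harrell_c_index(times, events, risk)
    print(f"C = {c:.3f}, error rate = {1 - c:.3f}")
```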

  1. Multivariate meta-analysis: Potential and promise

    Science.gov (United States)

    Jackson, Dan; Riley, Richard; White, Ian R

    2011-01-01

    The multivariate random effects model is a generalization of the standard univariate model. Multivariate meta-analysis is becoming more commonly used, and the techniques and related computer software, although continually under development, are now in place. In order to raise awareness of the multivariate methods and discuss their advantages and disadvantages, we organized a one-day ‘Multivariate meta-analysis’ event at the Royal Statistical Society. In addition to disseminating the most recent developments, we also received an abundance of comments, concerns, insights, critiques and encouragement. This article provides a balanced account of the day's discourse. By giving others the opportunity to respond to our assessment, we hope to ensure that the various viewpoints and opinions are aired before multivariate meta-analysis simply becomes another widely used de facto method without any proper consideration of it by the medical statistics community. We describe the areas of application that multivariate meta-analysis has found, the methods available, the difficulties typically encountered and the arguments for and against the multivariate methods, using four representative but contrasting examples. We conclude that the multivariate methods can be useful, and in particular can provide estimates with better statistical properties, but also that these benefits come at the price of making more assumptions, which do not result in better inference in every case. Although there is evidence that multivariate meta-analysis has considerable potential, it must be applied even more carefully than its univariate counterpart in practice. Copyright © 2011 John Wiley & Sons, Ltd. PMID:21268052

  2. Urban Flood Mapping Based on Unmanned Aerial Vehicle Remote Sensing and Random Forest Classifier—A Case of Yuyao, China

    Directory of Open Access Journals (Sweden)

    Quanlong Feng

    2015-03-01

    Flooding is a severe natural hazard that poses a great threat to human life and property, especially in densely populated urban areas. Unmanned aerial vehicle (UAV) remote sensing, one of the fastest-developing fields in remote sensing applications, can provide high-resolution data with great potential for fast and accurate detection of inundated areas in complex urban landscapes. In this research, optical imagery was acquired by a mini-UAV to monitor serious urban waterlogging in Yuyao, China. Texture features derived from the gray-level co-occurrence matrix were included to increase the separability of different ground objects. A Random Forest classifier consisting of 200 decision trees was used to extract flooded areas in the spectral-textural feature space. A confusion matrix was used to assess the accuracy of the proposed method. The results indicated the following: (1) Random Forest performed well in urban flood mapping, with an overall accuracy of 87.3% and a Kappa coefficient of 0.746; (2) the inclusion of texture features improved classification accuracy significantly; (3) Random Forest outperformed maximum likelihood and artificial neural network classifiers and showed performance similar to a support vector machine. The results demonstrate that UAVs can provide an ideal platform for urban flood monitoring and that the proposed method is highly capable of accurately extracting inundated areas.
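
    Overall accuracy and the Kappa coefficient quoted above are both derived from the confusion matrix. A minimal sketch follows; the function name and the 2×2 counts in the example are hypothetical, not the study's data. Kappa corrects the observed agreement for the agreement expected by chance from the row and column totals.

```python
# Illustrative sketch; function name and counts are hypothetical.
from typing import Sequence, Tuple


def accuracy_and_kappa(cm: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Overall accuracy and Cohen's kappa from a square confusion matrix.

    Rows are reference classes and columns are predicted classes (the
    formulas are symmetric, so the opposite convention gives the same result).
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    observed = sum(cm[i][i] for i in range(k)) / total
    row_totals = [sum(row) for row in cm]
    col_totals = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    # Chance agreement: sum over classes of (row share) * (column share)
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


if __name__ == "__main__":
    cm = [[45, 5],   # reference: flooded
          [10, 40]]  # reference: not flooded
    acc, kappa = accuracy_and_kappa(cm)
    print(f"overall accuracy = {acc:.3f}, kappa = {kappa:.3f}")
```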

  3. The relative importance of community forests, government forests, and private forests for household-level incomes in the Middle Hills of Nepal

    DEFF Research Database (Denmark)

    Oli, Bishwa Nath; Treue, Thorsten; Smith-Hall, Carsten

    2016-01-01

    To investigate the household-level economic importance of income from forests under different tenure arrangements, data were collected from 304 stratified randomly sampled households within 10 villages with community forest user groups in Tanahun District, Western Nepal. We observed that forest...... realisation of community forestry's poverty reduction and income equalizing potential requires modifications of rules that govern forest extraction and pricing at community forest user group level....

  4. Predicting attention-deficit/hyperactivity disorder severity from psychosocial stress and stress-response genes : A random forest regression approach

    NARCIS (Netherlands)

    Van Der Meer, D.; Hoekstra, P. J.; Van Donkelaar, M.; Bralten, J.; Oosterlaan, J.; Heslenfeld, D.; Faraone, S. V.; Franke, B.; Buitelaar, J. K.; Hartman, C. A.

    2017-01-01

    Identifying genetic variants contributing to attention-deficit/hyperactivity disorder (ADHD) is complicated by the involvement of numerous common genetic variants with small effects, interacting with each other as well as with environmental factors, such as stress exposure. Random forest regression

  5. Predicting attention-deficit/hyperactivity disorder severity from psychosocial stress and stress-response genes : a random forest regression approach

    NARCIS (Netherlands)

    van der Meer, D.; Hoekstra, P. J.; van Donkelaar, Marjolein M. J.; Bralten, Janita; Oosterlaan, J; Heslenfeld, Dirk J.; Faraone, S. V.; Franke, B.; Buitelaar, J. K.; Hartman, C. A.

    2017-01-01

    Identifying genetic variants contributing to attention-deficit/hyperactivity disorder (ADHD) is complicated by the involvement of numerous common genetic variants with small effects, interacting with each other as well as with environmental factors, such as stress exposure. Random forest regression

  6. Multivariate normal maximum likelihood with both ordinal and continuous variables, and data missing at random.

    Science.gov (United States)

    Pritikin, Joshua N; Brick, Timothy R; Neale, Michael C

    2018-04-01

    A novel method for the maximum likelihood estimation of structural equation models (SEM) with both ordinal and continuous indicators is introduced using a flexible multivariate probit model for the ordinal indicators. A full information approach ensures unbiased estimates for data missing at random. Exceeding the capability of prior methods, up to 13 ordinal variables can be included before integration time increases beyond 1 s per row. The method relies on the axiom of conditional probability to split apart the distribution of continuous and ordinal variables. Due to the symmetry of the axiom, two similar methods are available. A simulation study provides evidence that the two similar approaches offer equal accuracy. A further simulation is used to develop a heuristic to automatically select the most computationally efficient approach. Joint ordinal continuous SEM is implemented in OpenMx, free and open-source software.

  7. Utilizing random forests imputation of forest plot data for landscape-level wildfire analyses

    Science.gov (United States)

    Karin L. Riley; Isaac C. Grenfell; Mark A. Finney; Nicholas L. Crookston

    2014-01-01

    Maps of the number, size, and species of trees in forests across the United States are desirable for a number of applications. For landscape-level fire and forest simulations that use the Forest Vegetation Simulator (FVS), a spatial tree-level dataset, or “tree list”, is a necessity. FVS is widely used at the stand level for simulating fire effects on tree mortality,...

  8. Agro-forest landscape and the 'fringe' city: a multivariate assessment of land-use changes in a sprawling region and implications for planning.

    Science.gov (United States)

    Salvati, Luca

    2014-08-15

    The present study evaluates the impact of urban expansion on landscape transformations in Rome's metropolitan area (1500 km²) during the last sixty years. Landscape composition, structure and dynamics were assessed for 1949 and 2008 by analyzing the distribution of 26 metrics for nine land-use classes. Changes in landscape structure are analysed by way of a multivariate statistical approach providing a summary measure of rapidity-to-change for each metric and class. Land fragmentation increased during the study period due to urban expansion. Poorly protected or medium-low value-added classes (vineyards, arable land, olive groves and pastures) experienced fragmentation, whereas protected or high value-added classes (e.g. forests, olive groves) showed larger 'core' areas and lower fragmentation. The relationship observed between class area and mean patch size indicates increased fragmentation for all land uses (both expanding and declining) except urban areas and forests. Reducing the impact of urban expansion on specific land-use classes is an effective planning strategy to counter the simplification of the Mediterranean landscape in peri-urban areas. Copyright © 2014 Elsevier B.V. All rights reserved.

  9. A MULTIVARIATE APPROACH TO ANALYSE NATIVE FOREST TREE SPECIE SEEDS

    Directory of Open Access Journals (Sweden)

    Alessandro Dal Col Lúcio

    2006-03-01

    This work grouped the most similar tree species by their seeds, using the variables observed in seed analyses of exotic forest species of the Brazilian flora collected at the Forest Research and Soil Conservation Center of Santa Maria, Rio Grande do Sul, and analyzed from January 1997 to March 2003. For the cluster analysis, all species with four or more analyses per lot were analyzed by hierarchical clustering using the mean standardized Euclidean distance; principal component analysis was also used to reduce the number of variables. The species Callistemon speciosus, Cassia fistula, Eucalyptus grandis, Eucalyptus robusta, Eucalyptus saligna, Eucalyptus tereticornis, Delonix regia, Jacaranda mimosaefolia and Pinus elliottii presented more than four analyses per lot, in which the third and fourth principal components explained 80% of the total variation. The cluster analysis was efficient in separating the groups of all tested species, as was the principal component method.

  10. Mapping Soil Properties of Africa at 250 m Resolution: Random Forests Significantly Improve Current Predictions.

    Directory of Open Access Journals (Sweden)

    Tomislav Hengl

    Eighty percent of arable land in Africa has low soil fertility and suffers from physical soil problems. Additionally, significant amounts of nutrients are lost every year due to unsustainable soil management practices. This is partially the result of insufficient use of soil management knowledge. To help bridge the soil information gap in Africa, the Africa Soil Information Service (AfSIS) project was established in 2008. Over the period 2008-2014, the AfSIS project compiled two point data sets: the Africa Soil Profiles (legacy) database and the AfSIS Sentinel Site database. These data sets contain over 28 thousand sampling locations and represent the most comprehensive soil sample data sets of the African continent to date. Utilizing these point data sets in combination with a large number of covariates, we have generated a series of spatial predictions of soil properties relevant to agricultural management: organic carbon, pH, sand, silt and clay fractions, bulk density, cation-exchange capacity, total nitrogen, exchangeable acidity, Al content and exchangeable bases (Ca, K, Mg, Na). We specifically investigate differences between two predictive approaches: random forests and linear regression. Results of 5-fold cross-validation demonstrate that the random forests algorithm consistently outperforms the linear regression algorithm, with average decreases of 15-75% in Root Mean Squared Error (RMSE) across soil properties and depths. Fitting and running random forests models takes an order of magnitude more time, and the modelling success is sensitive to artifacts in the input data, but as long as quality-controlled point data are provided, an increase in soil mapping accuracy can be expected. Results also indicate that globally predicted soil classes (USDA Soil Taxonomy), especially Alfisols and Mollisols, help improve continental scale soil property mapping and are among the most important predictors. This indicates a promising potential for transferring

  11. Lesion segmentation from multimodal MRI using random forest following ischemic stroke.

    Science.gov (United States)

    Mitra, Jhimli; Bourgeat, Pierrick; Fripp, Jurgen; Ghose, Soumya; Rose, Stephen; Salvado, Olivier; Connelly, Alan; Campbell, Bruce; Palmer, Susan; Sharma, Gagan; Christensen, Soren; Carey, Leeanne

    2014-09-01

    Understanding structure-function relationships in the brain after stroke relies not only on accurate anatomical delineation of the focal ischemic lesion but also on that of previous infarcts, remote changes, and the presence of white matter hyperintensities. Robust definition of primary stroke boundaries and secondary brain lesions will have a significant impact on the investigation of brain-behavior relationships and of correlations between lesion volume and clinical measures after stroke. Here we present an automated approach to identify chronic ischemic infarcts in addition to other white matter pathologies, which may be used to aid the development of post-stroke management strategies. Our approach uses Bayesian-Markov Random Field (MRF) classification to segment probable lesion volumes present on fluid attenuated inversion recovery (FLAIR) MRI. Thereafter, a random forest classification of the information from multimodal (T1-weighted, T2-weighted, FLAIR, and apparent diffusion coefficient (ADC)) MRI images and other context-aware features (within the probable lesion areas) was used to extract areas with a high likelihood of being classified as lesions. The final segmentation of the lesion was obtained by thresholding the random forest probabilistic maps. The accuracy of the automated lesion delineation method was assessed in a total of 36 patients (24 male, 12 female, mean age: 64.57 ± 14.23 years) at 3 months after stroke onset and compared with lesion volumes manually segmented by an expert. The automated lesion identification method was evaluated using common evaluation metrics. The mean sensitivity of segmentation was 0.53 ± 0.13, with a mean positive predictive value of 0.75 ± 0.18. The mean lesion volume difference was 32.32% ± 21.643%, with a high Pearson's correlation of r = 0.76 (p < 0.0001). Lesion overlap accuracy was measured in terms of the Dice similarity coefficient, with a mean of 0.60 ± 0.12, while the contour
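
    The final steps above, thresholding a random forest probability map and scoring the result against a manual segmentation, can be sketched on flattened voxel lists. This is an illustration under stated assumptions: the 0.5 threshold, the function names, and the toy arrays are hypothetical, and the article's actual threshold is not given in the abstract.

```python
# Illustrative sketch; threshold, names and toy data are hypothetical.
from typing import List, Sequence, Tuple


def threshold_map(probs: Sequence[float], t: float = 0.5) -> List[bool]:
    """Binarize a voxel-wise lesion probability map (threshold t is an assumption)."""
    return [p >= t for p in probs]


def overlap_scores(pred: Sequence[bool], truth: Sequence[bool]) -> Tuple[float, float, float]:
    """Dice similarity coefficient, sensitivity and positive predictive value."""
    tp = sum(1 for p, g in zip(pred, truth) if p and g)
    fp = sum(1 for p, g in zip(pred, truth) if p and not g)
    fn = sum(1 for p, g in zip(pred, truth) if not p and g)
    if tp == 0:
        return 0.0, 0.0, 0.0  # no true positives: scores set to 0 by convention here
    dice = 2 * tp / (2 * tp + fp + fn)
    sensitivity = tp / (tp + fn)
    ppv = tp / (tp + fp)
    return dice, sensitivity, ppv


if __name__ == "__main__":
    probs = [0.9, 0.8, 0.6, 0.3, 0.1]
    manual = [True, True, False, True, False]
    print(overlap_scores(threshold_map(probs), manual))
```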

  12. Multivariate Meta-Analysis of Genetic Association Studies: A Simulation Study.

    Directory of Open Access Journals (Sweden)

    Binod Neupane

    In a meta-analysis with multiple end points of interest that are correlated between or within studies, a multivariate approach to meta-analysis has the potential to produce more precise estimates of effects by exploiting the correlation structure between end points. However, under the random-effects assumption, multivariate estimation is more complex than univariate estimation (as it involves estimating more parameters simultaneously) and can sometimes produce unrealistic parameter estimates. The usefulness of a multivariate approach to meta-analysis of the effects of a genetic variant on two or more correlated traits is not well understood in the area of genetic association studies. In such studies, genetic variants are expected to roughly maintain Hardy-Weinberg equilibrium within studies, and their effects on complex traits are generally very small to modest and could be heterogeneous across studies for genuine reasons. We carried out extensive simulations to compare the performance of the multivariate approach with the most commonly used univariate inverse-variance weighted approach under the random-effects assumption in various realistic meta-analytic scenarios of genetic association studies of correlated end points. We evaluated performance with respect to relative mean bias percentage, root mean square error (RMSE) of the estimate, and coverage probability of the corresponding 95% confidence interval of the effect for each end point. Our simulation results suggest that the multivariate approach performs similarly to or better than the univariate method when correlations between end points within or between studies are at least moderate and between-study variation is similar to or larger than average within-study variation for meta-analyses of 10 or more genetic studies. The multivariate approach produces estimates with smaller bias and RMSE, especially for the end point that has randomly or informatively missing summary data in some individual studies, when
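
    For reference, the univariate comparator described above (inverse-variance weighting under a random-effects assumption) is commonly implemented with the DerSimonian and Laird moment estimator of the between-study variance. The sketch below assumes that estimator; the abstract does not state which one the authors used, and the function name and example numbers are hypothetical. The multivariate model, which additionally needs within- and between-study correlations, is not shown.

```python
# Illustrative sketch assuming the DerSimonian-Laird estimator; names are hypothetical.
import math
from typing import Sequence, Tuple


def dersimonian_laird(
    effects: Sequence[float], variances: Sequence[float]
) -> Tuple[float, float, float]:
    """Random-effects pooled estimate, its standard error, and tau^2."""
    k = len(effects)
    if k < 2:
        raise ValueError("need at least two studies")
    w = [1.0 / v for v in variances]  # fixed-effect (inverse-variance) weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)  # moment estimate, truncated at zero
    w_re = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    return pooled, math.sqrt(1.0 / sum(w_re)), tau2


if __name__ == "__main__":
    # Per-study effect estimates for one end point and their within-study variances
    print(dersimonian_laird([0.10, 0.25, 0.05, 0.30], [0.01, 0.02, 0.015, 0.03]))
```

    When the studies agree (Q no larger than k − 1), tau² is truncated to zero and the result reduces to the fixed-effect inverse-variance estimate.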

  13. Day-ahead load forecast using random forest and expert input selection

    International Nuclear Information System (INIS)

    Lahouar, A.; Ben Hadj Slama, J.

    2015-01-01

    Highlights: • A model based on random forests for short-term load forecasting is proposed. • An expert feature selection step is added to refine the inputs. • Special attention is paid to customer behavior, load profile, and special holidays. • The model is flexible and able to handle a complex load signal. • A technical comparison is performed to assess forecast accuracy. - Abstract: Electrical load forecasting has become increasingly important in recent years due to electricity market deregulation and the integration of renewable resources. To overcome the upcoming challenges and ensure accurate power prediction for different time horizons, sophisticated intelligent methods have been developed. The use of intelligent forecasting algorithms is among the main characteristics of smart grids and is an efficient tool for facing uncertainty. Several crucial tasks of power operators, such as load dispatch, rely on the short-term forecast, so it should be as accurate as possible. To this end, this paper proposes a short-term load predictor able to forecast the next 24 h of load. Using random forest, characterized by immunity to parameter variations and internal cross-validation, the model is constructed following an online learning process. The inputs are refined by expert feature selection using a set of if–then rules, in order to incorporate the user's own specifications about the country's weather or market and to generalize the forecasting ability. The proposed approach is tested on a real historical dataset from the Tunisian Power Company, and the simulation shows accurate and satisfactory results for one day in advance, with an average error rarely exceeding 2.3%. The model is validated for regular working days and weekends, and special attention is paid to moving holidays, which follow a non-Gregorian calendar

  14. Feature selection and classification of mechanical fault of an induction motor using random forest classifier

    OpenAIRE

    Patel, Raj Kumar; Giri, V.K.

    2016-01-01

    Fault detection and diagnosis is the most important technology in condition-based maintenance (CBM) systems for rotating machinery. This paper experimentally explores the development of a random forest (RF) classifier, a recently emerged machine learning technique, for multi-class mechanical fault diagnosis in the bearings of an induction motor. First, the vibration signals are collected from the bearing using an accelerometer sensor. Parameters from the vibration signal are extracted in the form of...

  15. A Valid Matérn Class of Cross-Covariance Functions for Multivariate Random Fields With Any Number of Components

    KAUST Repository

    Apanasovich, Tatiyana V.

    2012-03-01

    We introduce a valid parametric family of cross-covariance functions for multivariate spatial random fields where each component has a covariance function from the well-known Matérn class. Unlike previous attempts, our model allows for various smoothnesses and rates of correlation decay for any number of vector components. We present the conditions on the parameter space that result in valid models with varying degrees of complexity. We discuss practical implementations, including reparameterizations to reflect the conditions on the parameter space and an iterative algorithm to increase computational efficiency. We perform various Monte Carlo simulation experiments to explore the performance of our approach in terms of estimation and cokriging. The application of the proposed multivariate Matérn model is illustrated on two meteorological datasets: temperature/pressure over the Pacific Northwest (bivariate) and wind/temperature/pressure in Oklahoma (trivariate). In the latter case, our flexible trivariate Matérn model is valid and yields better predictive scores than a parsimonious model with common scale parameters. © 2012 American Statistical Association.
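
    The univariate Matérn covariance, the building block of the cross-covariance family described above, has closed forms for half-integer smoothness. The sketch below uses one common parameterization (variance σ², length scale ℓ, smoothness ν ∈ {1/2, 3/2, 5/2}); it is not the article's multivariate construction, whose validity conditions on the parameter space are the paper's contribution, and the function name is hypothetical. Larger ν gives smoother fields, which is why allowing a different ν per component matters.

```python
# Illustrative sketch of the univariate Matérn covariance; function name is hypothetical.
import math


def matern(h: float, sigma2: float, length: float, nu: float) -> float:
    """Matérn covariance at distance h for half-integer smoothness nu.

    General form: sigma2 * 2^(1-nu)/Gamma(nu) * (sqrt(2 nu) h/l)^nu * K_nu(sqrt(2 nu) h/l),
    written out here in its closed forms for nu = 0.5, 1.5 and 2.5.
    """
    if h < 0 or length <= 0:
        raise ValueError("distance must be >= 0 and length scale > 0")
    r = h / length
    if nu == 0.5:
        return sigma2 * math.exp(-r)  # exponential covariance
    if nu == 1.5:
        s = math.sqrt(3.0) * r
        return sigma2 * (1.0 + s) * math.exp(-s)
    if nu == 2.5:
        s = math.sqrt(5.0) * r
        return sigma2 * (1.0 + s + s * s / 3.0) * math.exp(-s)
    raise ValueError("this sketch only covers nu in {0.5, 1.5, 2.5}")


if __name__ == "__main__":
    for nu in (0.5, 1.5, 2.5):
        print(nu, [round(matern(h, 1.0, 1.0, nu), 3) for h in (0.0, 0.5, 1.0, 2.0)])
```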

  16. Simulation of multivariate stationary stochastic processes using dimension-reduction representation methods

    Science.gov (United States)

    Liu, Zhangjun; Liu, Zenghui; Peng, Yongbo

    2018-03-01

    Based on the Fourier-Stieltjes integral formula for multivariate stationary stochastic processes, a unified formulation accommodating the spectral representation method (SRM) and proper orthogonal decomposition (POD) is derived. By introducing random functions as constraints that correlate the orthogonal random variables in the unified formulation, the dimension-reduction spectral representation method (DR-SRM) and the dimension-reduction proper orthogonal decomposition (DR-POD) are presented. The proposed schemes can represent a multivariate stationary stochastic process with a few elementary random variables, bypassing the challenge of high-dimensional random variables inherent in conventional Monte Carlo methods. To accelerate the numerical simulation, the Fast Fourier Transform (FFT) is integrated into the proposed schemes. For illustration, the horizontal wind velocity field along the deck of a large-span bridge is simulated using the proposed methods with 2 and 3 elementary random variables. The numerical simulations demonstrate the usefulness of the dimension-reduction representation methods.
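
    For context, the conventional spectral representation method that the dimension-reduction schemes build on synthesizes a sample as a sum of cosines with independent random phases. The sketch below covers only the univariate case with a one-sided power spectral density G(ω), so that the process variance is ∫₀^∞ G(ω) dω; the example spectrum, the frequency cutoff, and the function names are assumptions for illustration. The DR-SRM idea of replacing the many independent phases with functions of a few elementary random variables is not reproduced here.

```python
# Illustrative sketch of the classical (univariate) SRM; names and spectrum are hypothetical.
import math
import random
from typing import Callable, List, Sequence


def srm_sample(
    spectrum: Callable[[float], float],
    omega_max: float,
    n_freq: int,
    times: Sequence[float],
    seed: int = 0,
) -> List[float]:
    """One sample path X(t) = sum_k sqrt(2 G(w_k) dw) cos(w_k t + phi_k)."""
    rng = random.Random(seed)
    d_omega = omega_max / n_freq
    omegas = [(k + 0.5) * d_omega for k in range(n_freq)]  # midpoint frequencies
    amps = [math.sqrt(2.0 * spectrum(w) * d_omega) for w in omegas]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_freq)]  # i.i.d. phases
    return [
        sum(a * math.cos(w * t + p) for a, w, p in zip(amps, omegas, phases))
        for t in times
    ]


def srm_variance(spectrum: Callable[[float], float], omega_max: float, n_freq: int) -> float:
    """Target variance of the truncated representation: sum_k G(w_k) dw."""
    d_omega = omega_max / n_freq
    return sum(spectrum((k + 0.5) * d_omega) * d_omega for k in range(n_freq))


if __name__ == "__main__":
    def psd(w: float) -> float:
        return 1.0 / (1.0 + w * w)  # illustrative one-sided spectrum

    path = srm_sample(psd, omega_max=20.0, n_freq=256, times=[0.1 * i for i in range(50)])
    print(len(path), srm_variance(psd, 20.0, 256))
```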

  17. An Introduction to Recursive Partitioning: Rationale, Application, and Characteristics of Classification and Regression Trees, Bagging, and Random Forests

    Science.gov (United States)

    Strobl, Carolin; Malley, James; Tutz, Gerhard

    2009-01-01

    Recursive partitioning methods have become popular and widely used tools for nonparametric regression and classification in many scientific fields. Especially random forests, which can deal with large numbers of predictor variables even in the presence of complex interactions, have been applied successfully in genetics, clinical medicine, and…
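
    The core step of recursive partitioning, as used in classification and regression trees and repeated inside every tree of a random forest, is an exhaustive search for the split that most reduces impurity. A minimal sketch for one numeric predictor and a regression response (sum of squared errors as the impurity) follows; the function names are hypothetical. A full tree would apply this recursively over all predictors, while a random forest would also bootstrap the data and sample candidate predictors at each node.

```python
# Illustrative sketch of one CART-style split search; names are hypothetical.
from typing import List, Optional, Sequence, Tuple


def _sse(values: List[float]) -> float:
    """Sum of squared deviations from the mean (node impurity for regression)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(x: Sequence[float], y: Sequence[float]) -> Tuple[Optional[float], float]:
    """Return (threshold, total SSE) of the best binary split 'x <= threshold'."""
    pairs = sorted(zip(x, y))
    best_threshold, best_sse = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates identical predictor values
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        sse = _sse(left) + _sse(right)
        if sse < best_sse:
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint threshold
            best_sse = sse
    return best_threshold, best_sse


if __name__ == "__main__":
    print(best_split([1, 2, 3, 10, 11, 12], [1.0, 1.2, 0.9, 5.1, 4.8, 5.0]))
```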

  18. A systematic review of randomized controlled trials on curative and health enhancement effects of forest therapy

    Directory of Open Access Journals (Sweden)

    Kamioka H

    2012-07-01

    Hiroharu Kamioka,¹ Kiichiro Tsutani,² Yoshiteru Mutoh,³ Takuya Honda,⁴ Nobuyoshi Shiozawa,⁵ Shinpei Okada,⁶ Sang-Jun Park,⁶ Jun Kitayuguchi,⁷ Masamitsu Kamada,⁸ Hiroyasu Okuizumi,⁹ Shuichi Handa⁹

    ¹Faculty of Regional Environment Science, Tokyo University of Agriculture, Tokyo; ²Department of Drug Policy and Management, Graduate School of Pharmaceutical Sciences, The University of Tokyo, Tokyo; ³Todai Policy Alternatives Research Institute, The University of Tokyo, Tokyo; ⁴Japanese Society for the Promotion of Science, Tokyo; ⁵Food Labeling Division, Consumer Affairs Agency, Cabinet Office, Government of Japan, Tokyo; ⁶Physical Education and Medicine Research Foundation, Nagano; ⁷Physical Education and Medicine Research Center Unnan, Shimane; ⁸Department of Environmental and Preventive Medicine, Shimane University School of Medicine, Shimane; ⁹Mimaki Onsen (Spa) Clinic, Tomi City, Nagano, Japan

    Objective: To summarize the evidence for curative and health enhancement effects of forest therapy and to assess the quality of studies based on a review of randomized controlled trials (RCTs). Study design: A systematic review based on RCTs. Methods: Studies were eligible if they were RCTs. Studies included one treatment group in which forest therapy was applied. The following databases were searched from 1990 to November 9, 2010: MEDLINE via PubMed, CINAHL, Web of Science, and Ichushi-Web. All Cochrane databases and Campbell Systematic Reviews were also searched up to November 9, 2010. Results: Two trials met all inclusion criteria. No specific diseases were evaluated, and both studies reported significant effectiveness in one or more outcomes for health enhancement. However, evaluations with the CONSORT (Consolidated Standards of Reporting Trials) 2010 and CLEAR NPT (A Checklist to Evaluate a Report of a Nonpharmacological Trial) checklists generally showed a remarkable lack of description in the studies. Furthermore, there was a

  19. Estimating the Performance of Random Forest versus Multiple Regression for Predicting Prices of the Apartments

    Directory of Open Access Journals (Sweden)

    Marjan Čeh

    2018-05-01

    The goal of this study is to analyse the predictive performance of the random forest machine learning technique in comparison to commonly used hedonic models based on multiple regression for the prediction of apartment prices. A data set that includes 7407 records of apartment transactions referring to real estate sales from 2008–2013 in the city of Ljubljana, the capital of Slovenia, was used to test and compare the predictive performance of both models. Challenges faced during modelling included (1) the non-linear nature of the prediction task; (2) input data based on transactions occurring over a period of great price changes in Ljubljana, during which a 28% decline was noted over six consecutive testing years; and (3) the complex urban form of the case study area. Available explanatory variables, organised as a Geographic Information Systems (GIS)-ready dataset, including the structural and age characteristics of the apartments as well as environmental and neighbourhood information, were considered in the modelling procedure. All performance measures (R² values, sales ratios, mean absolute percentage error (MAPE), and coefficient of dispersion (COD)) revealed significantly better results for predictions obtained by the random forest method, which confirms the potential of this machine learning technique for apartment price prediction.
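
    Two of the performance measures listed above can be computed directly from observed sale prices and model predictions. The sketch assumes the usual definitions: MAPE as the mean of |actual − predicted| / actual in percent, and the coefficient of dispersion as the mean absolute deviation of the sales ratios (predicted / actual) from their median, divided by that median, in percent. The function names and example prices are hypothetical.

```python
# Illustrative sketch assuming the usual MAPE and COD definitions; names are hypothetical.
from statistics import mean, median
from typing import Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent."""
    return 100.0 * mean(abs(a - p) / abs(a) for a, p in zip(actual, predicted))


def coefficient_of_dispersion(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """COD: average absolute deviation of sales ratios from their median, in percent."""
    ratios = [p / a for a, p in zip(actual, predicted)]  # sales ratio = predicted / actual
    med = median(ratios)
    return 100.0 * mean(abs(r - med) for r in ratios) / med


if __name__ == "__main__":
    sale_prices = [120_000.0, 95_000.0, 210_000.0, 150_000.0]
    predictions = [118_500.0, 101_000.0, 198_000.0, 155_000.0]
    print(mape(sale_prices, predictions), coefficient_of_dispersion(sale_prices, predictions))
```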

  20. The Fault Diagnosis of Rolling Bearing Based on Ensemble Empirical Mode Decomposition and Random Forest

    OpenAIRE

    Qin, Xiwen; Li, Qiaoling; Dong, Xiaogang; Lv, Siqi

    2017-01-01

    Accurate diagnosis of rolling bearing faults is very important for the normal operation of machinery and equipment. A method combining Ensemble Empirical Mode Decomposition (EEMD) and Random Forest (RF) is proposed. First, the original signal is decomposed into several intrinsic mode functions (IMFs) by EEMD, and the effective IMFs are selected. Then their energy entropy is calculated as the feature. Finally, the classification is performed by RF. In addition, the wavelet meth...
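
    Given the IMFs from an EEMD decomposition (the decomposition itself is not shown and is assumed to come from an existing EEMD implementation), the energy-entropy feature described above can be sketched as follows. The normalization p_i = E_i / ΣE_j and the natural logarithm are common choices but assumptions here, as is the function name; the normalized energies could also serve as part of the RF feature vector.

```python
# Illustrative sketch; IMFs are assumed to be precomputed, function name is hypothetical.
import math
from typing import List, Sequence, Tuple


def imf_energy_entropy(imfs: Sequence[Sequence[float]]) -> Tuple[float, List[float]]:
    """Energy entropy H = -sum p_i ln p_i, with p_i the share of energy in IMF i."""
    energies = [sum(v * v for v in imf) for imf in imfs]
    total = sum(energies)
    if total == 0:
        raise ValueError("all IMFs have zero energy")
    shares = [e / total for e in energies]
    entropy = -sum(p * math.log(p) for p in shares if p > 0)  # 0 * ln 0 taken as 0
    return entropy, shares


if __name__ == "__main__":
    toy_imfs = [[0.8, -0.6, 0.7, -0.9], [0.3, 0.2, -0.4, -0.1], [0.05, 0.05, 0.0, -0.05]]
    print(imf_energy_entropy(toy_imfs))
```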

  1. Exploring forest infrastructures equipment through multivariate analysis: complementarities, gaps and overlaps in the Mediterranean basin

    Directory of Open Access Journals (Sweden)

    Sofia Bajocco

    2013-12-01

    The countries of the Mediterranean basin face several challenges regarding the sustainability of forest ecosystems and the delivery of the crucial goods and services they provide in a context of rapid global change. Advancing scientific knowledge and fostering innovation are essential to ensure the sustainable management of Mediterranean forests and to maximize the potential role of their unique goods and services in building a knowledge-based bioeconomy in the region. In this context, the European project FORESTERRA ("Enhancing FOrest RESearch in the MediTERRAnean through improved coordination and integration") aims to reinforce scientific cooperation on Mediterranean forests through an ambitious transnational framework in order to reduce existing research fragmentation and maximize the effectiveness of forest research activities. Within the FORESTERRA project framework, this work analyzed the infrastructure equipment of the Mediterranean countries belonging to the project Consortium. According to the European Commission, research infrastructures are facilities, resources and services that are used by scientific communities to conduct research and foster innovation. To the best of our knowledge, the equipment and availability of infrastructures, in terms of experimental sites, research facilities and databases, have only rarely been explored. The aim of this paper was hence to identify complementarities, gaps and overlaps among the different forest research institutes in order to create a scientific network, optimize resources and trigger collaborations.

  2. Prostate cancer prediction using the random forest algorithm that takes into account transrectal ultrasound findings, age, and serum levels of prostate-specific antigen.

    Science.gov (United States)

    Xiao, Li-Hong; Chen, Pei-Ran; Gou, Zhong-Ping; Li, Yong-Zhong; Li, Mei; Xiang, Liang-Cheng; Feng, Ping

    2017-01-01

    The aim of this study was to evaluate the ability of a random forest algorithm that combines transrectal ultrasound findings, age, and serum prostate-specific antigen levels to predict prostate carcinoma. Clinico-demographic data were analyzed for 941 patients with prostate diseases treated at our hospital, including age, serum prostate-specific antigen levels, transrectal ultrasound findings, and pathology diagnosis based on ultrasound-guided needle biopsy of the prostate. These data were compared between patients with and without prostate cancer using the chi-square test and then entered into the random forest model to predict diagnosis. Patients with and without prostate cancer differed significantly in age and serum prostate-specific antigen levels (P …). … prostate-specific antigen and ultrasound predicted prostate cancer with an accuracy of 83.10%, sensitivity of 65.64%, and specificity of 93.83%. The positive predictive value was 86.72%, and the negative predictive value was 81.64%. By integrating age, prostate-specific antigen levels and transrectal ultrasound findings, the random forest algorithm showed better diagnostic performance for prostate cancer than any of these indicators on its own. This algorithm may help improve diagnosis of the disease by identifying patients at high risk for biopsy.
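
    The accuracy, sensitivity, specificity and predictive values reported above are standard ratios of a 2×2 confusion matrix. A minimal Python sketch of those definitions follows; the `ConfusionCounts` type, the function name and the counts in the usage lines are hypothetical and are not taken from the study's data.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts for a binary diagnostic test (hypothetical structure)."""
    tp: int  # predicted cancer, cancer on biopsy
    fp: int  # predicted cancer, no cancer on biopsy
    tn: int  # predicted no cancer, no cancer on biopsy
    fn: int  # predicted no cancer, cancer on biopsy


def diagnostic_metrics(c: ConfusionCounts) -> dict:
    """Return the five summary ratios as fractions in [0, 1]."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": (c.tp + c.tn) / total,
        "sensitivity": c.tp / (c.tp + c.fn),  # true positive rate
        "specificity": c.tn / (c.tn + c.fp),  # true negative rate
        "ppv": c.tp / (c.tp + c.fp),  # positive predictive value
        "npv": c.tn / (c.tn + c.fn),  # negative predictive value
    }


if __name__ == "__main__":
    # Illustrative counts only; not the study's confusion matrix.
    for name, value in diagnostic_metrics(ConfusionCounts(40, 10, 45, 5)).items():
        print(f"{name}: {value:.2%}")
```

    Unlike sensitivity and specificity, the predictive values also depend on how common the disease is in the evaluated cohort.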

  3. A random forest classifier for the prediction of energy expenditure and type of physical activity from wrist and hip accelerometers

    International Nuclear Information System (INIS)

    Ellis, Katherine; Lanckriet, Gert; Kerr, Jacqueline; Godbole, Suneeta; Wing, David; Marshall, Simon

    2014-01-01

    Wrist accelerometers are being used in population-level surveillance of physical activity (PA), but more research is needed to evaluate their validity for correctly classifying types of PA behavior and predicting energy expenditure (EE). In this study we compare accelerometers worn on the wrist and hip, and the added value of heart rate (HR) data, for predicting PA type and EE using machine learning. Forty adults performed locomotion and household activities in a lab setting while wearing three ActiGraph GT3X+ accelerometers (left hip, right hip, non-dominant wrist) and an HR monitor (Polar RS400). Participants also wore a portable indirect calorimeter (COSMED K4b2), from which EE and metabolic equivalents (METs) were computed for each minute. We developed two predictive models: a random forest classifier to predict activity type and a random forest of regression trees to estimate METs. Predictions were evaluated using leave-one-user-out cross-validation. The hip accelerometer obtained an average accuracy of 92.3% in predicting four activity types (household, stairs, walking, running), while the wrist accelerometer obtained an average accuracy of 87.5%. Across all 8 activities combined (laundry, window washing, dusting, dishes, sweeping, stairs, walking, running), the hip and wrist accelerometers obtained average accuracies of 70.2% and 80.2%, respectively. Predicting METs using the hip or wrist device alone yielded root mean square errors (rMSE) of 1.09 and 1.00 METs per 6-min bout, respectively. Including HR data improved MET estimation but did not significantly improve activity type classification. These results demonstrate the validity of random forest classification and regression forests for PA type and MET prediction using accelerometers. The wrist accelerometer proved more useful in predicting activities with significant arm movement, while the hip accelerometer was superior for predicting locomotion and estimating EE.
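
    Leave-one-user-out cross-validation holds out all minutes from one participant per fold, so each prediction is scored on a person the model has not seen. A minimal pure-Python sketch follows, assuming a generic `fit_predict` callable (a hypothetical stand-in for training and applying a random forest) and pooling the held-out predictions into a single rMSE; the study may instead have averaged per-participant errors.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterator, List, Sequence, Tuple


def leave_one_group_out(
    groups: Sequence[Hashable],
) -> Iterator[Tuple[Hashable, List[int], List[int]]]:
    """Yield (held_out_group, train_indices, test_indices), one fold per group."""
    by_group: Dict[Hashable, List[int]] = defaultdict(list)
    for i, g in enumerate(groups):
        by_group[g].append(i)
    for g, test_idx in by_group.items():
        held_out = set(test_idx)
        train_idx = [i for i in range(len(groups)) if i not in held_out]
        yield g, train_idx, test_idx


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def loso_rmse(X, y, groups, fit_predict: Callable) -> float:
    """Pool out-of-fold predictions across all held-out users and score them."""
    preds = [0.0] * len(y)
    for _, train_idx, test_idx in leave_one_group_out(groups):
        fold_preds = fit_predict(
            [X[i] for i in train_idx], [y[i] for i in train_idx], [X[i] for i in test_idx]
        )
        for i, p in zip(test_idx, fold_preds):
            preds[i] = p
    return rmse(y, preds)


def mean_baseline(X_train, y_train, X_test):
    """Hypothetical placeholder model: predict the training mean for every test row."""
    m = sum(y_train) / len(y_train)
    return [m] * len(X_test)
```

    Replacing `mean_baseline` with a function that trains and applies a regression forest changes only the model, not the fold logic. scikit-learn provides the same splitting scheme as `sklearn.model_selection.LeaveOneGroupOut`.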

  4. A random forest classifier for the prediction of energy expenditure and type of physical activity from wrist and hip accelerometers.

    Science.gov (United States)

    Ellis, Katherine; Kerr, Jacqueline; Godbole, Suneeta; Lanckriet, Gert; Wing, David; Marshall, Simon

    2014-11-01

    Wrist accelerometers are being used in population-level surveillance of physical activity (PA), but more research is needed to evaluate their validity for correctly classifying types of PA behavior and predicting energy expenditure (EE). In this study we compare accelerometers worn on the wrist and hip, and the added value of heart rate (HR) data, for predicting PA type and EE using machine learning. Forty adults performed locomotion and household activities in a lab setting while wearing three ActiGraph GT3X+ accelerometers (left hip, right hip, non-dominant wrist) and an HR monitor (Polar RS400). Participants also wore a portable indirect calorimeter (COSMED K4b2), from which EE and metabolic equivalents (METs) were computed for each minute. We developed two predictive models: a random forest classifier to predict activity type and a random forest of regression trees to estimate METs. Predictions were evaluated using leave-one-user-out cross-validation. The hip accelerometer obtained an average accuracy of 92.3% in predicting four activity types (household, stairs, walking, running), while the wrist accelerometer obtained an average accuracy of 87.5%. Across all 8 activities combined (laundry, window washing, dusting, dishes, sweeping, stairs, walking, running), the hip and wrist accelerometers obtained average accuracies of 70.2% and 80.2%, respectively. Predicting METs using the hip or wrist device alone yielded root mean square errors (rMSE) of 1.09 and 1.00 METs per 6-min bout, respectively. Including HR data improved MET estimation but did not significantly improve activity type classification. These results demonstrate the validity of random forest classification and regression forests for PA type and MET prediction using accelerometers. The wrist accelerometer proved more useful in predicting activities with significant arm movement, while the hip accelerometer was superior for predicting locomotion and estimating EE.

  5. Genome-wide association data classification and SNPs selection using two-stage quality-based Random Forests.

    Science.gov (United States)

    Nguyen, Thanh-Tung; Huang, Joshua; Wu, Qingyao; Nguyen, Thuy; Li, Mark

    2015-01-01

    Single-nucleotide polymorphism (SNP) selection and identification are among the most important tasks in genome-wide association data analysis. The problem is difficult because genome-wide association data are very high-dimensional and a large portion of the SNPs in the data are irrelevant to the disease. Advanced machine learning methods have been used successfully in genome-wide association studies (GWAS) to identify genetic variants that have relatively large effects in some common, complex diseases. Among them, the most successful is Random Forests (RF). Despite performing well in terms of prediction accuracy on some moderately sized data sets, RF still struggles in GWAS when selecting informative SNPs and building accurate prediction models. In this paper, we propose a new two-stage quality-based sampling method in random forests, named ts-RF, for SNP subspace selection in GWAS. The method first applies p-value assessment to find a cut-off point that separates SNPs into informative and irrelevant groups. The informative group is further divided into two sub-groups: highly informative and weakly informative SNPs. When sampling the SNP subspace for building the trees of the forest, only SNPs from these two sub-groups are taken into account, and the feature subspace used to split a node always contains highly informative SNPs. This approach generates more accurate trees with lower prediction error while possibly avoiding overfitting. It allows interactions of multiple SNPs with the diseases to be detected, and reduces the dimensionality and the amount of genome-wide association data needed to learn the RF model. Extensive experiments on two genome-wide SNP data sets (Parkinson case-control data comprising 408,803 SNPs and Alzheimer case-control data comprising 380,157 SNPs) and 10 gene data sets have demonstrated that the proposed model significantly reduced prediction errors and outperformed …
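
    The two-stage grouping and subspace sampling described above can be sketched as follows. This is a simplified illustration, not the authors' ts-RF implementation: the abstract does not say how the cut-off points are chosen, so the fixed thresholds `alpha` and `beta`, the function names, and the rule of drawing a fixed number `n_strong` of highly informative SNPs per split are all assumptions.

```python
import random
from typing import List, Sequence, Tuple


def partition_snps(p_values: Sequence[float], alpha: float = 0.05,
                   beta: float = 0.001) -> Tuple[List[int], List[int]]:
    """Split SNP indices into highly and weakly informative groups.

    Stage 1: SNPs with p >= alpha are treated as irrelevant and dropped.
    Stage 2: the remaining SNPs are split at beta (assumed thresholds).
    """
    if not 0 < beta < alpha:
        raise ValueError("expected 0 < beta < alpha")
    strong = [j for j, p in enumerate(p_values) if p < beta]
    weak = [j for j, p in enumerate(p_values) if beta <= p < alpha]
    return strong, weak


def sample_subspace(strong: List[int], weak: List[int], k: int,
                    n_strong: int, rng: random.Random) -> List[int]:
    """Draw a k-SNP subspace for one node split.

    Up to n_strong highly informative SNPs are always included (if any
    exist); the remaining slots are filled from the weak group.
    Irrelevant SNPs are never sampled.
    """
    n_strong = min(n_strong, len(strong), k)
    chosen = rng.sample(strong, n_strong)
    chosen += rng.sample(weak, min(k - n_strong, len(weak)))
    return chosen
```

    For example, `partition_snps([0.0001, 0.01, 0.2, 0.0005, 0.04])` returns `([0, 3], [1, 4])`, so SNP 2 (p = 0.2) can never enter a subspace.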

  6. Application of random survival forests in understanding the determinants of under-five child mortality in Uganda in the presence of covariates that satisfy the proportional and non-proportional hazards assumption.

    Science.gov (United States)

    Nasejje, Justine B; Mwambi, Henry

    2017-09-07

    Uganda, like other sub-Saharan African countries, has a high under-five child mortality rate. To inform policy on intervention strategies, sound statistical methods are required to identify the factors most strongly associated with under-five child mortality. The Cox proportional hazards model has been a common choice for analysing such data, with age as the time-to-event variable. However, because of its restrictive proportional hazards (PH) assumption, covariates of interest that do not satisfy the assumption are often excluded from the analysis to avoid misspecifying the model, since including covariates that clearly violate the assumption would yield invalid results. Survival trees and random survival forests are becoming increasingly popular for analysing survival data, particularly large survey data, and could be attractive alternatives to models with the restrictive PH assumption. In this article, we adopt random survival forests, which have not previously been used to understand factors affecting under-five child mortality in Uganda, using Demographic and Health Survey data. The first part of the analysis is based on the classical Cox PH model and the second part on random survival forests in the presence of covariates that do not necessarily satisfy the PH assumption. Random survival forests and the Cox proportional hazards model agree that the sex of the household head, the sex of the child and the number of births in the past year are strongly associated with under-five child mortality in Uganda, given that all three covariates satisfy the PH assumption. Random survival forests further demonstrated that covariates originally excluded from the earlier analysis because they violated the PH assumption were important in explaining under-five child mortality rates. These covariates include the number of children under the …

  7. Mapping Spatial Distribution of Larch Plantations from Multi-Seasonal Landsat-8 OLI Imagery and Multi-Scale Textures Using Random Forests

    Directory of Open Access Journals (Sweden)

    Tian Gao

    2015-02-01

    Knowledge of the spatial distribution of plantation forests is critical for forest management, monitoring programs and functional assessment. This study demonstrates the potential of multi-seasonal (spring, summer, autumn and winter) Landsat-8 Operational Land Imager imagery with random forests (RF) modeling to map larch plantations (LP) in a typical plantation forest landscape in North China. The spectral bands and two types of textures were used to create 675 input variables for RF. An accuracy of 92.7% for LP, with a Kappa coefficient of 0.834, was attained using the RF model. An RF-based importance assessment revealed that the spectral bands and the bivariate textural features calculated by the pseudo-cross variogram (PC) strongly promoted forest class separability, whereas the univariate textural features had only a weak influence. A feature selection strategy eliminated 93% of the variables, yielding a subset of the 47 most essential variables. In this subset, PC texture derived from summer and winter appeared most frequently, suggesting that the variability between the peak growing season and the non-growing season can effectively enhance forest class separability. An RF classifier applied to the subset achieved 91.9% accuracy for LP, with a Kappa coefficient of 0.829. This study provides insight into approaches for discriminating plantation forests by their phenological behavior.
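
    The Kappa coefficient quoted alongside accuracy corrects the observed agreement for agreement expected by chance, κ = (p_o − p_e) / (1 − p_e). A minimal sketch computing Cohen's kappa from a square confusion matrix follows; the function name and the matrix in the usage note are invented for illustration, and the abstract does not state whether its kappa values are overall or class-specific.

```python
from typing import Sequence


def cohens_kappa(matrix: Sequence[Sequence[int]]) -> float:
    """Cohen's kappa from a k x k confusion matrix.

    Rows are reference (ground-truth) classes, columns are predicted classes.
    """
    k = len(matrix)
    if any(len(row) != k for row in matrix):
        raise ValueError("confusion matrix must be square")
    n = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(k)) / n  # overall accuracy, p_o
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    # Chance agreement p_e: sum over classes of (row share x column share).
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    return (observed - expected) / (1 - expected)
```

    For the hypothetical two-class matrix `[[45, 5], [10, 40]]`, p_o = 0.85 and p_e = 0.5, so κ = 0.7; a high accuracy can therefore coexist with a noticeably lower kappa.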

  8. On set-valued functionals: Multivariate risk measures and Aumann integrals

    Science.gov (United States)

    Ararat, Cagin

    In this dissertation, multivariate risk measures for random vectors and Aumann integrals of set-valued functions are studied. Both are set-valued functionals with values in a complete lattice of subsets of R^m. Multivariate risk measures are considered in a general d-asset financial market with trading opportunities in discrete time. Specifically, the following features of the market are incorporated in the evaluation of multivariate risk: convex transaction costs modeled by solvency regions, intermediate trading constraints modeled by convex random sets, and the requirement of liquidation into the first m ≤ d of the assets. It is assumed that the investor has a "pure" multivariate risk measure R on the space of m-dimensional random vectors, which represents her risk attitude towards the assets but does not take into account the frictions of the market. The investor with a d-dimensional position then minimizes the set-valued functional R over all m-dimensional positions that she can reach by trading in the market subject to the frictions described above. The resulting functional R^mar on the space of d-dimensional random vectors is another multivariate risk measure, called the market-extension of R. A dual representation for R^mar that decomposes the effects of R and the frictions of the market is proved. Next, multivariate risk measures are studied in a utility-based framework. It is assumed that the investor has a complete risk preference towards each individual asset, which can be represented by a von Neumann-Morgenstern utility function. An incomplete preference is then considered for multivariate positions, represented by the vector of the individual utility functions. Under this structure, multivariate shortfall and divergence risk measures are defined as the optimal values of set minimization problems. The dual relationship between the two classes of multivariate risk measures is constructed via a recent Lagrange duality for set optimization. In …

  9. Water chemistry in 179 randomly selected Swedish headwater streams related to forest production, clear-felling and climate.

    Science.gov (United States)

    Löfgren, Stefan; Fröberg, Mats; Yu, Jun; Nisell, Jakob; Ranneby, Bo

    2014-12-01

    From a policy perspective, it is important to understand forestry effects on surface waters at the landscape scale. The EU Water Framework Directive demands remedial action where good ecological status is not achieved. In Sweden, 44% of surface water bodies have moderate ecological status or worse, and many of these drain catchments with a mosaic of managed forests. It is important for the forestry sector and water authorities to be able to identify where in the forested landscape special precautions are necessary. The aim of this study was to quantify the relations between forestry parameters and headwater stream concentrations of nutrients, organic matter and acid-base chemistry. The results are put into the context of regional climate, sulphur and nitrogen deposition, and marine influences. Water chemistry was measured in 179 randomly selected headwater streams from two regions in southwest and central Sweden, corresponding to 10% of the Swedish land area. Forest status was determined from satellite images and Swedish National Forest Inventory data using the probabilistic classifier method, and these forest variables were used to model stream water chemistry with Bayesian model averaging. The results indicate that concentrations of, e.g., nitrogen, phosphorus and organic matter are related to factors associated with forest production, but that it is not forestry per se that causes the excess losses. Instead, factors simultaneously affecting forest production and stream water chemistry, such as climate, extensive soil pools and nitrogen deposition, are the most likely candidates. The relationships with clear-felled and wetland areas are likely to be direct effects.

  10. Identification of a potential fibromyalgia diagnosis using random forest modeling applied to electronic medical records

    OpenAIRE

    Masters, Elizabeth T.; Emir, Birol; Mardekian, Jack; Clair, Andrew; Kuhn, Max; Silverman, Stuart

    2015-01-01

    Birol Emir (1), Elizabeth T Masters (1), Jack Mardekian (1), Andrew Clair (1), Max Kuhn (2), Stuart L Silverman (3); (1) Pfizer Inc., New York, NY; (2) Pfizer Inc., Groton, CT; (3) Cedars-Sinai Medical Center, Los Angeles, CA, USA. Background: Diagnosis of fibromyalgia (FM), a chronic musculoskeletal condition characterized by widespread pain and a constellation of symptoms, remains challenging and is often delayed. Methods: Random forest modeling of electronic medical records was used to identify variables that may fa...

  11. Integrating support vector machines and random forests to classify crops in time series of Worldview-2 images

    Science.gov (United States)

    Zafari, A.; Zurita-Milla, R.; Izquierdo-Verdiguier, E.

    2017-10-01

    Crop maps are essential inputs for the agricultural planning carried out by various governmental agencies and agribusinesses. Remote sensing offers timely and cost-efficient technologies to identify and map crop types over large areas. Among the plethora of classification methods, Support Vector Machine (SVM) and Random Forest (RF) are widely used because of their proven performance. In this work, we study the synergistic use of both methods by introducing a random forest kernel (RFK) in an SVM classifier. A time series of multispectral WorldView-2 images acquired over Mali (West Africa) in 2014 was used to develop our case study. Ground-truth data covering five common crop classes (cotton, maize, millet, peanut, and sorghum) were collected at 45 farms and used to train and test the classifiers. An SVM with the standard Radial Basis Function (RBF) kernel, an RF, and an SVM-RFK were trained and tested over 10 random training and test subsets generated from the ground data. Results show that the newly proposed SVM-RFK classifier can compete with both RF and SVM-RBF. The overall accuracies based on the spectral bands only are 83%, 82% and 83% for SVM-RFK, RF and SVM-RBF, respectively. Adding vegetation indices to the analysis results in classification accuracies of 82%, 81% and 84% for SVM-RFK, RF, and SVM-RBF, respectively. Overall, the newly tested RFK can compete with SVM-RBF and RF classifiers in terms of classification accuracy.
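
    A random forest kernel is commonly defined through leaf co-occurrence: the similarity of two samples is the fraction of trees in which they fall into the same terminal node. The abstract does not give the authors' exact RFK formulation, so the sketch below assumes this common definition and takes precomputed leaf indices as input (one row per sample, one column per tree); the function name is hypothetical.

```python
from typing import List, Sequence


def rf_kernel(leaves_a: Sequence[Sequence[int]],
              leaves_b: Sequence[Sequence[int]]) -> List[List[float]]:
    """Return K with K[i][j] = share of trees where sample i of A and j of B share a leaf.

    leaves_a[i][t] is the leaf index of sample i in tree t. Leaf indices are
    only compared within the same tree, since each tree numbers its own leaves.
    """
    n_trees = len(leaves_a[0])
    if any(len(row) != n_trees for row in list(leaves_a) + list(leaves_b)):
        raise ValueError("every row must have one leaf index per tree")
    return [
        [sum(la == lb for la, lb in zip(row_a, row_b)) / n_trees for row_b in leaves_b]
        for row_a in leaves_a
    ]
```

    With scikit-learn, leaf indices of this shape can be obtained from `RandomForestClassifier.apply`, and the resulting matrices passed to `SVC(kernel="precomputed")`: the training-versus-training kernel for `fit`, and the test-versus-training kernel (shape n_test × n_train) for `predict`.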

  12. On Complex Random Variables

    Directory of Open Access Journals (Sweden)

    Anwer Khurshid

    2012-07-01

    In this paper, it is shown that a complex multivariate random variable is a complex multivariate normal random variable of a given dimensionality if and only if all nondegenerate complex linear combinations of its components have a complex univariate normal distribution. The characteristic function of such a variable has been derived, and simpler forms of some theorems have been given using this characterization theorem without assuming that the variance-covariance matrix of the vector is Hermitian positive definite. Marginal distributions have also been given. In addition, a complex multivariate t-distribution has been defined and its density derived. A characterization of the complex multivariate t-distribution is given, and a few possible uses of this distribution are suggested.

  13. Classification of Phishing Email Using Random Forest Machine Learning Technique

    Directory of Open Access Journals (Sweden)

    Andronicus A. Akinyelu

    2014-01-01

    Phishing is one of the major challenges faced by the world of e-commerce today. Phishing attacks have cost many companies and individuals billions of dollars; in 2012, an online report put the loss due to phishing attacks at about $1.5 billion. The global impact of phishing attacks will continue to increase, which calls for more efficient phishing detection techniques to curb the menace. This paper investigates and reports the use of the random forest machine learning algorithm in classifying phishing attacks, with the major objective of developing an improved phishing email classifier with better prediction accuracy and fewer features. From a dataset consisting of 2000 phishing and ham emails, a set of prominent phishing email features (identified from the literature) was extracted and used by the machine learning algorithm, resulting in a classification accuracy of 99.7% and low false negative (FN) and false positive (FP) rates.

  14. Predicting Metabolic Syndrome Using the Random Forest Method

    Directory of Open Access Journals (Sweden)

    Apilak Worachartcheewan

    2015-01-01

    Aims. This study proposes a computational method for determining the prevalence of metabolic syndrome (MS) and predicting its occurrence using the National Cholesterol Education Program Adult Treatment Panel III (NCEP ATP III) criteria. The Random Forest (RF) method is also applied to identify significant health parameters. Materials and Methods. We used data from 5,646 adults aged 18–78 years residing in Bangkok who had received an annual health check-up in 2008. MS was identified using the NCEP ATP III criteria. The RF method was applied to predict the occurrence of MS and to identify important health parameters associated with this disorder. Results. The overall prevalence of MS was 23.70% (34.32% for males and 17.74% for females). RF accuracy for predicting MS in an adult Thai population was 98.11%. Further, based on RF, triglyceride level was the most important health parameter associated with MS. Conclusion. RF was shown to predict MS in an adult Thai population with an accuracy >98%, and triglyceride levels were identified as the most informative variable associated with MS. Therefore, using RF to predict MS may be beneficial in identifying MS status for preventing the development of diabetes mellitus and cardiovascular diseases.

  15. Estimating correlation between multivariate longitudinal data in the presence of heterogeneity.

    Science.gov (United States)

    Gao, Feng; Philip Miller, J; Xiong, Chengjie; Luo, Jingqin; Beiser, Julia A; Chen, Ling; Gordon, Mae O

    2017-08-17

    Estimating correlation coefficients among outcomes is one of the most important analytical tasks in epidemiological and clinical research. The availability of multivariate longitudinal data presents a unique opportunity to assess the joint evolution of outcomes over time. The bivariate linear mixed model (BLMM) provides a versatile tool for assessing correlation. However, BLMMs often assume that all individuals are drawn from a single homogeneous population in which individual trajectories are distributed smoothly around the population average. Using longitudinal mean deviation (MD) and visual acuity (VA) from the Ocular Hypertension Treatment Study (OHTS), we demonstrated strategies to better understand the correlation between multivariate longitudinal data in the presence of potential heterogeneity. Conditional correlation (i.e., marginal correlation given random effects) was calculated to describe how the association between longitudinal outcomes evolved over time within specific subpopulations. The impact of heterogeneity on correlation was also assessed with simulated data. There was a significant positive correlation in both random intercepts (ρ = 0.278, 95% CI: 0.121-0.420) and random slopes (ρ = 0.579, 95% CI: 0.349-0.810) between longitudinal MD and VA, and the strength of correlation increased steadily over time. However, conditional correlation and simulation studies revealed that the correlation was induced primarily by participants with rapidly deteriorating MD, who accounted for only a small fraction of the total sample. Conditional correlation given random effects provides a robust estimate of the correlation between multivariate longitudinal data in the presence of unobserved heterogeneity (NCT00000125).

  16. Multivariate Methods for Meta-Analysis of Genetic Association Studies.

    Science.gov (United States)

    Dimou, Niki L; Pantavou, Katerina G; Braliou, Georgia G; Bagos, Pantelis G

    2018-01-01

    Multivariate meta-analysis of genetic association studies and genome-wide association studies has received considerable attention because it improves the precision of the analysis. Here, we review, summarize and present in a unified framework methods for multivariate meta-analysis of genetic association studies and genome-wide association studies. Starting with the statistical methods used for robust analysis and genetic model selection, we briefly present univariate methods for meta-analysis and then scrutinize multivariate methodologies. Multivariate meta-analysis models for single gene-disease association studies, including models for haplotype association studies, multiple linked polymorphisms and multiple outcomes, are discussed. The popular Mendelian randomization approach and special cases of meta-analysis addressing issues such as the assumed mode of inheritance, deviation from Hardy-Weinberg Equilibrium and gene-environment interactions are also presented. All available methods are accompanied by practical applications, and methodologies that could be developed in the future are discussed. Links to all available software implementing multivariate meta-analysis methods are also provided.

  17. A new multivariate zero-adjusted Poisson model with applications to biomedicine.

    Science.gov (United States)

    Liu, Yin; Tian, Guo-Liang; Tang, Man-Lai; Yuen, Kam Chuen

    2018-05-25

    Although advances have recently been made in modeling multivariate count data, existing models have several limitations: (i) the multivariate Poisson log-normal model (Aitchison and Ho, ) cannot be used to fit multivariate count data with excess zero-vectors; (ii) the multivariate zero-inflated Poisson (ZIP) distribution (Li et al., 1999) cannot be used to model zero-truncated/deflated count data, and it is difficult to apply to high-dimensional cases; (iii) the Type I multivariate zero-adjusted Poisson (ZAP) distribution (Tian et al., 2017) can only model multivariate count data with a special correlation structure in which the correlations between random components are all positive or all negative. In this paper, we first introduce a new multivariate ZAP distribution, based on a multivariate Poisson distribution, which allows a more flexible dependency structure between components; that is, some of the correlation coefficients can be positive while others are negative. We then develop its important distributional properties and provide efficient statistical inference methods for the multivariate ZAP model with or without covariates. Two real data examples in biomedicine are used to illustrate the proposed methods. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  18. Improving the chances of successful protein structure determination with a random forest classifier

    Energy Technology Data Exchange (ETDEWEB)

    Jahandideh, Samad [Sanford-Burnham Medical Research Institute, 10901 North Torrey Pines Road, La Jolla, CA 92307 (United States); Joint Center for Structural Genomics, (United States); Jaroszewski, Lukasz; Godzik, Adam, E-mail: adam@burnham.org [Sanford-Burnham Medical Research Institute, 10901 North Torrey Pines Road, La Jolla, CA 92307 (United States); Joint Center for Structural Genomics, (United States); University of California, San Diego, La Jolla, California (United States)

    2014-03-01

    Using an extended set of protein features calculated separately for protein surface and interior, a new version of XtalPred based on a random forest classifier achieves a significant improvement in predicting the success of structure determination from the primary amino-acid sequence. Obtaining diffraction-quality crystals remains one of the major bottlenecks in structural biology. The ability to predict the chances of crystallization from the amino-acid sequence of the protein can, at least partly, address this problem by allowing a crystallographer to select homologs that are more likely to succeed and/or to modify the sequence of the target to avoid features that are detrimental to successful crystallization. In 2007, the now widely used XtalPred algorithm [Slabinski et al. (2007), Protein Sci. 16, 2472–2482] was developed. XtalPred classifies proteins into five ‘crystallization classes’ based on a simple statistical analysis of the physicochemical features of a protein. Here, towards the same goal, advanced machine-learning methods are applied and, in addition, the predictive potential of additional protein features such as predicted surface ruggedness, hydrophobicity, side-chain entropy of surface residues and amino-acid composition of the predicted protein surface is tested. The new XtalPred-RF (random forest) achieves a significant improvement in the prediction of crystallization success over the original XtalPred. To illustrate this, XtalPred-RF was tested by revisiting target selection from 271 Pfam families targeted by the Joint Center for Structural Genomics (JCSG) in PSI-2, and it was estimated that the number of targets entered into the protein-production and crystallization pipeline could have been reduced by 30% without lowering the number of families for which the first structures were solved. The prediction improvement depends on the subset of targets used as a testing set and reaches 100% (i.e. twofold) for the top class of predicted …

  19. Multivariate fractional Poisson processes and compound sums

    OpenAIRE

    Beghin, Luisa; Macci, Claudio

    2015-01-01

    In this paper we present multivariate space-time fractional Poisson processes obtained by considering common random time-changes of a (finite-dimensional) vector of independent classical (non-fractional) Poisson processes. In some cases we also consider compound processes. We obtain equations in terms of suitable fractional derivatives and fractional difference operators, which extend known equations for the univariate processes.

  20. Forest Stakeholder Participation in Improving Game Habitat in Swedish Forests

    Directory of Open Access Journals (Sweden)

    Eugene E. Ezebilo

    2012-07-01

    In Sweden, the simultaneous use of forests for timber production and game hunting is of socioeconomic importance, but it often leads to conflicting interests. This study examines forest stakeholder participation in improving game habitat to increase hunting opportunities and to redistribute game activity in forests, helping to reduce browsing damage in valuable forest stands. The data were collected from a nationwide survey of randomly selected hunters and forest owners in Sweden. An ordered logit model was used to account for possible factors influencing the respondents’ participation in improving game habitat. The results showed that, on average, forest-owning hunters were more involved in improving game habitat than non-hunting forest owners, and the involvement of non-forest-owning hunters was intermediate between these two groups. The respondents’ participation in improving game habitat was mainly influenced by factors such as the quantity of game meat obtained, stakeholder group, forests on hunting grounds, the extent of the risk posed by game browsing damage to the economy of forest owners, the importance of bagging game during hunting, and the number of hunting days. The findings will help in designing a more sustainable forest management strategy that integrates timber production and game hunting in forests.

  1. Effects of Deforestation and Forest Degradation on Forest Carbon Stocks in Collaborative Forests, Nepal

    Directory of Open Access Journals (Sweden)

    Ram Asheshwar MANDAL

    2012-12-01

    Certain key drivers favor deforestation and forest degradation, and as a consequence carbon stock levels differ among parts of the same forest type. The problem lies in determining the extent of these effects on carbon stocks. This paper highlights the variation in carbon stocks in three collaborative forests of the same forest type, i.e., tropical sal (Shorea robusta) forest, in Mahottari district of the central Terai in Nepal. Three collaborative forests, namely Gadhanta-Bardibas Collaborative Forest (CFM), Tuteshwarnath CFM, and Banke-Maraha CFM, were selected as research sites. Interviews and workshops were organized with key informants, including staff, members, and representatives of the CFMs, to collect socio-economic data, and stratified random sampling was applied to collect the bio-physical data used to calculate the carbon stocks. Analysis was carried out using statistical tools. Five major drivers were identified: grazing, fire, logging, growth of invasive species, and encroachment. The highest carbon stock, 269.36 tons per ha, was found in Gadhanta-Bardibas CFM. The findings showed that carbon stocks differ among the three studied CFMs depending on how the drivers of deforestation and forest degradation affect them.
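    The abstract only states that stratified random sampling was used for the bio-physical plots. As a minimal sketch of what that step involves, the Python below draws a simple random sample without replacement independently within each stratum. The strata (here, one per CFM), the 50 candidate plots per stratum, and the equal allocation of 5 plots per stratum are hypothetical choices for illustration, not the study's actual design.

```python
import random

def stratified_sample(units_by_stratum, n_per_stratum, seed=None):
    """Draw a simple random sample of plots independently within each stratum.

    units_by_stratum: dict mapping a stratum label to its list of candidate plot IDs.
    n_per_stratum: plots drawn from every stratum (equal allocation is an assumption).
    """
    rng = random.Random(seed)
    sample = {}
    for stratum, units in units_by_stratum.items():
        if n_per_stratum > len(units):
            raise ValueError(f"stratum {stratum!r} has only {len(units)} units")
        # rng.sample draws without replacement, so no plot is chosen twice
        sample[stratum] = rng.sample(units, n_per_stratum)
    return sample

# Hypothetical strata: the three CFMs, each with 50 numbered candidate plots.
plots = {cfm: [f"{cfm}-{i}" for i in range(50)]
         for cfm in ["Gadhanta-Bardibas", "Tuteshwarnath", "Banke-Maraha"]}
chosen = stratified_sample(plots, n_per_stratum=5, seed=42)
for cfm, ids in chosen.items():
    print(cfm, ids)
```

    Sampling within strata guarantees every stratum is represented, which a single simple random sample over all plots would not.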

  2. The experimental design of the Missouri Ozark Forest Ecosystem Project

    Science.gov (United States)

    Steven L. Sheriff; Shuoqiong He

    1997-01-01

    The Missouri Ozark Forest Ecosystem Project (MOFEP) is an experiment that examines the effects of three forest management practices on the forest community. MOFEP is designed as a randomized complete block design using nine sites divided into three blocks. Treatments of uneven-aged, even-aged, and no-harvest management were randomly assigned to sites within each block...
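    The randomized complete block design described above can be sketched in a few lines of Python: within each block, the three treatments are randomly permuted over that block's three sites, so every treatment occurs exactly once per block. The site and block labels are hypothetical; the real MOFEP grouping of sites into blocks is not given here.

```python
import random

TREATMENTS = ["uneven-aged", "even-aged", "no-harvest"]

def assign_rcbd(blocks, treatments, seed=None):
    """Randomized complete block design: randomly permute the treatments over
    each block's sites so that every treatment appears once per block."""
    rng = random.Random(seed)
    layout = {}
    for block, sites in blocks.items():
        if len(sites) != len(treatments):
            raise ValueError("each block needs exactly one site per treatment")
        shuffled = list(treatments)  # copy so the caller's list is untouched
        rng.shuffle(shuffled)        # independent randomization in every block
        layout[block] = dict(zip(sites, shuffled))
    return layout

# Hypothetical labels: nine sites split into three blocks of three.
blocks = {f"block{b}": [f"site{3 * b + s + 1}" for s in range(3)] for b in range(3)}
layout = assign_rcbd(blocks, TREATMENTS, seed=1)
for block, plan in layout.items():
    print(block, plan)
```

    Randomizing separately within each block is what distinguishes this design from a completely randomized one: block-to-block differences are kept out of the treatment comparison.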

  3. Exploring precrash maneuvers using classification trees and random forests.

    Science.gov (United States)

    Harb, Rami; Yan, Xuedong; Radwan, Essam; Su, Xiaogang

    2009-01-01

    Taking evasive action in critical traffic situations that lead up to motor vehicle crashes gives drivers an opportunity to avoid the crash or at least reduce its severity. This study explores the driver, vehicle, and environment characteristics associated with crash avoidance maneuvers (i.e., evasive action or no evasive action). Rear-end collisions, head-on collisions, and angle collisions are analyzed separately using decision trees, and the significance of the variables for the binary response variable (evasive action or no evasive action) is determined. In addition, the random forests method is employed to rank the importance of the driver, vehicle, and environment characteristics for crash avoidance maneuvers. According to the results of the exploratory analyses, driver visibility obstruction, driver physical impairment, and driver distraction are associated with crash avoidance maneuvers in all three types of crashes. Speed limit is also associated with avoidance maneuvers in rear-end collisions, and vehicle type is associated with avoidance maneuvers in head-on and angle collisions. It is recommended that future research further investigate the explored trends (e.g., physically impaired drivers, visibility obstruction) using driving simulators, which may help inform legislative initiatives and in-vehicle technology recommendations.
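    The abstract does not say which importance measure was used to rank the characteristics. One common choice for random forests is permutation importance (Breiman, 2001): a feature's importance is the drop in accuracy when its values are shuffled. The sketch below is a simplified, model-agnostic version evaluated on a single dataset; Breiman's original uses each tree's out-of-bag samples, which this omits. The toy data and the stand-in model are hypothetical.

```python
import random

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Importance of feature j = mean drop in accuracy after shuffling column j.

    predict: any callable mapping a list of feature rows to a list of labels,
    e.g. the predict method of a fitted random forest.
    """
    rng = random.Random(seed)
    baseline = accuracy(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the response
            X_perm = []
            for row, value in zip(X, column):
                new_row = list(row)
                new_row[j] = value
                X_perm.append(new_row)
            drops.append(baseline - accuracy(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances

# Hypothetical toy data: feature 0 = visibility obstructed (0/1), feature 1 = unrelated noise.
X = [[i % 2, (i // 2) % 2] for i in range(40)]
y = [row[0] for row in X]

def toy_model(rows):
    """Stand-in for a fitted forest: it only ever looks at feature 0."""
    return [row[0] for row in rows]

print(permutation_importance(toy_model, X, y))
```

    Shuffling the unused feature leaves predictions unchanged, so its importance is exactly zero, while shuffling the informative feature lowers accuracy.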

  4. Random forest classification of stars in the Galactic Centre

    Science.gov (United States)

    Plewa, P. M.

    2018-05-01

    Near-infrared high-angular resolution imaging observations of the Milky Way's nuclear star cluster have revealed all luminous members of the existing stellar population within the central parsec. Generally, these stars are either evolved late-type giants or massive young, early-type stars. We revisit the problem of stellar classification based on intermediate-band photometry in the K band, with the primary aim of identifying faint early-type candidate stars in the extended vicinity of the central massive black hole. A random forest classifier, trained on a subsample of spectroscopically identified stars, performs comparably to competing methods (F1 = 0.85) without involving any model of stellar spectral energy distributions. Advantages of using such a machine-trained classifier are minimal calibration effort, a predictive accuracy expected to improve as more training data become available, and ease of application to future, larger data sets. By applying this classifier to archive data, we are also able to reproduce the results of previous studies of the spatial distribution and the K-band luminosity function of both the early- and late-type stars.
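    The F1 score quoted above is the harmonic mean of precision and recall. A minimal Python sketch of the standard definition, computed from counts of true positives, false positives, and false negatives (the counts in the usage line are made up, not taken from the paper):

```python
def f1_score(tp, fp, fn):
    """F1 = 2PR / (P + R), which simplifies to 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical counts for one class: 8 hits, 2 false alarms, 2 misses.
print(f1_score(8, 2, 2))  # prints 0.8
```

    Because F1 ignores true negatives, it suits a rare-class search such as finding faint early-type candidates among many late-type stars.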

  5. Prediction of Detailed Enzyme Functions and Identification of Specificity Determining Residues by Random Forests

    Science.gov (United States)

    Nagao, Chioko; Nagano, Nozomi; Mizuguchi, Kenji

    2014-01-01

    Determining enzyme functions is essential for a thorough understanding of cellular processes. Although many prediction methods have been developed, it remains a significant challenge to predict enzyme functions at the fourth-digit level of the Enzyme Commission numbers. Functional specificity of enzymes often changes drastically with mutations of a small number of residues; therefore, information about these critical residues can potentially help discriminate detailed functions. However, because these residues must be identified by mutagenesis experiments, the available information is limited, and the lack of experimentally verified specificity determining residues (SDRs) has hindered the development of detailed function prediction methods and computational identification of SDRs. Here we present a novel method for predicting enzyme functions by random forests, EFPrf, along with a set of putative SDRs, the random forests derived SDRs (rf-SDRs). EFPrf consists of a set of binary predictors for enzymes in each CATH superfamily, and the rf-SDRs are the residue positions corresponding to the most highly contributing attributes obtained from each predictor. EFPrf showed a precision of 0.98 and a recall of 0.89 in a cross-validated benchmark assessment. The rf-SDRs included many residues whose importance for specificity had been validated experimentally. The analysis of the rf-SDRs revealed both a general tendency for functionally diverged superfamilies to include more active site residues in their rf-SDRs than less diverged superfamilies do, and superfamily-specific conservation patterns of each functional residue. EFPrf and the rf-SDRs will be an effective tool for annotating enzyme functions and for understanding how enzyme functions have diverged within each superfamily. PMID:24416252

  6. Forest Cover Estimation in Ireland Using Radar Remote Sensing: A Comparative Analysis of Forest Cover Assessment Methodologies

    Science.gov (United States)

    Devaney, John; Barrett, Brian; Barrett, Frank; Redmond, John; O'Halloran, John

    2015-01-01

    Quantification of spatial and temporal changes in forest cover is an essential component of forest monitoring programs. Due to its cloud free capability, Synthetic Aperture Radar (SAR) is an ideal source of information on forest dynamics in countries with near-constant cloud-cover. However, few studies have investigated the use of SAR for forest cover estimation in landscapes with highly sparse and fragmented forest cover. In this study, the potential use of L-band SAR for forest cover estimation in two regions (Longford and Sligo) in Ireland is investigated and compared to forest cover estimates derived from three national (Forestry2010, Prime2, National Forest Inventory), one pan-European (Forest Map 2006) and one global forest cover (Global Forest Change) product. Two machine-learning approaches (Random Forests and Extremely Randomised Trees) are evaluated. Both Random Forests and Extremely Randomised Trees classification accuracies were high (98.1–98.5%), with differences between the two classifiers being minimal (forest area and an increase in overall accuracy of SAR-derived forest cover maps. All forest cover products were evaluated using an independent validation dataset. For the Longford region, the highest overall accuracy was recorded with the Forestry2010 dataset (97.42%) whereas in Sligo, highest overall accuracy was obtained for the Prime2 dataset (97.43%), although accuracies of SAR-derived forest maps were comparable. Our findings indicate that spaceborne radar could aid inventories in regions with low levels of forest cover in fragmented landscapes. The reduced accuracies observed for the global and pan-continental forest cover maps in comparison to national and SAR-derived forest maps indicate that caution should be exercised when applying these datasets for national reporting. PMID:26262681

  7. Monitoring grass nutrients and biomass as indicators of rangeland quality and quantity using random forest modelling and WorldView-2 data

    CSIR Research Space (South Africa)

    Ramoelo, Abel

    2015-12-01

    ...images and random forest technique in the north-eastern part of South Africa. A series of field campaigns to collect leaf N and biomass samples was undertaken in March 2013, April or May 2012 (end of wet season) and July 2012 (dry season). Several...

  8. Random Forests (RFs) for Estimation, Uncertainty Prediction and Interpretation of Monthly Solar Potential

    Science.gov (United States)

    Assouline, Dan; Mohajeri, Nahid; Scartezzini, Jean-Louis

    2017-04-01

    Solar energy is clean, widely available, and arguably the most promising renewable energy resource. Taking full advantage of solar power, however, requires a deep understanding of its patterns and dependencies in space and time. Recent advances in Machine Learning have brought powerful algorithms to estimate the spatio-temporal variations of solar irradiance (the power per unit area received from the Sun, W/m²) from local weather and terrain information. Such algorithms include Deep Learning (e.g. Artificial Neural Networks) and kernel methods (e.g. Support Vector Machines). However, most of these methods have some disadvantages: they (i) are complex to tune, (ii) are mainly used as black boxes, offering no interpretation of the variables' contributions, and (iii) often do not provide uncertainty predictions (Assouline et al., 2016). To provide a reasonable solar mapping with good accuracy, these gaps would ideally need to be filled. We present here simple steps using one ensemble learning algorithm, Random Forests (Breiman, 2001), to (i) estimate monthly solar potential with good accuracy, (ii) provide information on the contribution of each feature to the estimate, and (iii) offer prediction intervals for each point estimate. We have selected Switzerland as an example. Using a Digital Elevation Model (DEM) along with monthly solar irradiance time series and weather data, we build monthly solar maps for Global Horizontal Irradiance (GHI), Diffuse Horizontal Irradiance (DHI), and Extraterrestrial Irradiance (EI). The weather data include monthly values for temperature, precipitation, sunshine duration, and cloud cover. To explain the impact of each feature on the solar irradiance of each point estimate, we extend the contribution method (Kuz'min et al., 2011) to a regression setting. Contribution maps for all features can then be computed for each solar map. This provides valuable information on the spatial variation of the features' impact all...
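    The abstract does not say how the prediction intervals were built. One simple approach, assumed here, uses the spread of the individual trees' predictions for a point: the forest estimate is the mean over trees, and a central interval is read from the empirical quantiles of the per-tree outputs. The sketch takes those per-tree predictions as given (hypothetical input values). This reflects variability across trees rather than a calibrated predictive interval; quantile regression forests are a more principled alternative.

```python
import math

def percentile(values, q):
    """Percentile with linear interpolation between order statistics, q in [0, 1]."""
    v = sorted(values)
    pos = q * (len(v) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return v[lo] + (v[hi] - v[lo]) * (pos - lo)

def forest_interval(tree_predictions, coverage=0.9):
    """Return (point estimate, lower, upper) from one prediction per tree."""
    if not tree_predictions:
        raise ValueError("need at least one tree prediction")
    alpha = (1.0 - coverage) / 2.0
    point = sum(tree_predictions) / len(tree_predictions)  # the usual forest average
    return (point,
            percentile(tree_predictions, alpha),
            percentile(tree_predictions, 1.0 - alpha))

# Hypothetical per-tree GHI predictions (W/m²) for a single map cell.
print(forest_interval([150.0, 162.0, 158.0, 171.0, 149.0, 166.0]))
```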

  9. Text Categorization on Hadith Sahih Al-Bukhari using Random Forest

    Science.gov (United States)

    Fauzan Afianto, Muhammad; Adiwijaya; Al-Faraby, Said

    2018-03-01

    Al-Hadith is a collection of the words, deeds, provisions, and approvals of Rasulullah Shallallahu Alaihi wa Salam, and it is the second fundamental source of Islamic law after Al-Qur’an. As foundations of Islam, Al-Qur’an and Al-Hadith must be learned, memorized, and practiced by Muslims. One of the venerable Imams who narrated Al-Hadith is Imam Bukhari. He spent over 16 years compiling about 2602 Hadith without repetition (over 7000 with repetition). Automatic text categorization is the task of developing software tools that can classify hypertext documents under predefined categories or subject codes [1]. The algorithm used here is Random Forest, an extension of the decision tree. In this final project, the author built a system that categorizes text documents containing Hadith narrated by Imam Bukhari into categories such as suggestion, prohibition, and information. The system was evaluated with K-fold cross-validation using the F1-score, and the result is 90%.
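    The K-fold cross-validation used for evaluation can be sketched as follows. The documents are split into k folds, and each fold serves once as the test set while the remaining folds are used for training; the per-fold F1 scores are then averaged. The value of K used by the author is not stated, so k is a parameter here. The function name is hypothetical, and in practice the documents would be shuffled (ideally stratified by category) before splitting.

```python
def kfold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds.

    Fold sizes differ by at most one when n_samples is not divisible by k.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

# Usage: 10 documents, 3 folds -> test folds of sizes 4, 3, 3.
for train_idx, test_idx in kfold_indices(10, 3):
    print(len(train_idx), test_idx)
```

    Every document is tested exactly once, so the averaged score uses all labeled Hadith without ever scoring a model on its own training data.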

  10. Credit Risk Evaluation of Power Market Players with Random Forest

    Science.gov (United States)

    Umezawa, Yasushi; Mori, Hiroyuki

    A new method is proposed for credit risk evaluation in a power market. Credit risk evaluation measures a company's bankruptcy risk. Power system liberalization has created a new environment that emphasizes profit maximization and risk minimization. Because electricity transactions are likely to create risk between companies, power market players are concerned with risk minimization. As a management strategy, a risk index is needed to evaluate the worth of business partners. This paper proposes a new method for evaluating credit risk with Random Forest (RF), which performs ensemble learning over decision trees. RF is an efficient data mining technique for clustering data and extracting relationships between input and output data. In addition, a method of generating pseudo-measurements is proposed to improve the performance of RF. The proposed method is successfully applied to real financial data of energy utilities in the power market and is compared with conventional methods.

  11. Statistical inference for a class of multivariate negative binomial distributions

    DEFF Research Database (Denmark)

    Rubak, Ege Holger; Møller, Jesper; McCullagh, Peter

    This paper considers statistical inference procedures for a class of models for positively correlated count variables called α-permanental random fields, which can be viewed as a family of multivariate negative binomial distributions. Their appealing probabilistic properties have earlier been...

  12. Forest structure and downed woody debris in boreal, temperate, and tropical forest fragments.

    Science.gov (United States)

    Gould, William A; González, Grizelle; Hudak, Andrew T; Hollingsworth, Teresa Nettleton; Hollingsworth, Jamie

    2008-12-01

    Forest fragmentation affects the heterogeneity of accumulated fuels by increasing the diversity of forest types and by increasing forest edges. This heterogeneity has implications in how we manage fuels, fire, and forests. Understanding the relative importance of fragmentation on woody biomass within a single climatic regime, and along climatic gradients, will improve our ability to manage forest fuels and predict fire behavior. In this study we assessed forest fuel characteristics in stands of differing moisture, i.e., dry and moist forests, structure, i.e., open canopy (typically younger) vs. closed canopy (typically older) stands, and size, i.e., small (10-14 ha), medium (33 to 60 ha), and large (100-240 ha) along a climatic gradient of boreal, temperate, and tropical forests. We measured duff, litter, fine and coarse woody debris, standing dead, and live biomass in a series of plots along a transect from outside the forest edge to the fragment interior. The goal was to determine how forest structure and fuel characteristics varied along this transect and whether this variation differed with temperature, moisture, structure, and fragment size. We found nonlinear relationships of coarse woody debris, fine woody debris, standing dead and live tree biomass with mean annual median temperature. Biomass for these variables was greatest in temperate sites. Forest floor fuels (duff and litter) had a linear relationship with temperature and biomass was greatest in boreal sites. In a five-way multivariate analysis of variance we found that temperature, moisture, and age/structure had significant effects on forest floor fuels, downed woody debris, and live tree biomass. Fragment size had an effect on forest floor fuels and live tree biomass. Distance from forest edge had significant effects for only a few subgroups sampled. With some exceptions edges were not distinguishable from interiors in terms of fuels.

  13. Managing salinity in Upper Colorado River Basin streams: Selecting catchments for sediment control efforts using watershed characteristics and random forests models

    Science.gov (United States)

    Tillman, Fred; Anning, David W.; Heilman, Julian A.; Buto, Susan G.; Miller, Matthew P.

    2018-01-01

    Elevated concentrations of dissolved solids (salinity), including calcium, sodium, sulfate, and chloride, among others, in the Colorado River cause substantial problems for its water users. Previous efforts to reduce dissolved solids in upper Colorado River basin (UCRB) streams often focused on reducing suspended-sediment transport to streams, but few studies have investigated the relationship between suspended sediment and salinity, or evaluated which watershed characteristics might be associated with this relationship. Are there catchment properties that may help in identifying areas where control of suspended sediment will also reduce salinity transport to streams? A random forests classification analysis was performed on topographic, climate, land cover, geology, rock chemistry, soil, and hydrologic information in 163 UCRB catchments. Two random forests models were developed in this study: one for exploring stream and catchment characteristics associated with stream sites where dissolved solids increase with increasing suspended-sediment concentration, and the other for predicting where these sites are located in unmonitored reaches. Results of variable importance from the exploratory random forests models indicate that no simple source, geochemical process, or transport mechanism can easily explain the relationship between dissolved-solids and suspended-sediment concentrations at UCRB monitoring sites. Among the most important watershed characteristics in both models were measures of soil hydraulic conductivity, soil erodibility, minimum catchment elevation, catchment area, and the silt component of soil in the catchment. Predictions at key locations in the basin were combined with observations from selected monitoring sites and presented in map form to give a complete understanding of where catchment sediment control practices would also benefit control of dissolved solids in streams.

  14. Survival paths through the forest

    DEFF Research Database (Denmark)

    Mogensen, Ulla Brasch

    in appropriate prevention programs it is important to assess the individual risk with high accuracy. Generally, genetic information plays an important role for many diseases and will help to improve the accuracy of existing risk prediction models. However, conventional regression models have several limitations...... when the information is high-dimensional e.g. when there are many thousands of genes or markers. In these situations machine learning methods such as the random forest can still be applied and provide reasonable prediction accuracy. The main focus in this talk is the performance of random forest...

  15. Forest Typification to Characterize the Structure and Composition of Old-growth Evergreen Forests on Chiloe Island, North Patagonia (Chile)

    Directory of Open Access Journals (Sweden)

    Jan R. Bannister

    2013-11-01

    The Evergreen forest type develops along the Valdivian and North-Patagonian phytogeographical regions of the south-central part of Chile (38° S–46° S). These evergreen forests have been scarcely studied south of 43° S, where there is still a large area made up of old-growth forests. Silvicultural proposals for the Evergreen forest type have been based on northern Evergreen forests, so characterizing the structure and composition of southern Evergreen forests, i.e., their typification, would aid in the development of appropriate silvicultural proposals for these forests. Based on the tree composition of 46 sampled plots in old-growth forests in an area of >1000 ha in southern Chiloé Island (43° S), we used multivariate analyses to define forest groups and to compare these forests with other evergreen forests throughout the Archipelago of North-Patagonia. We determined that evergreen forests of southern Chiloé correspond to the North-Patagonian temperate rainforests, which are characterized by few tree species of different shade tolerance growing on fragile soils. We discuss the advisability of developing continuous cover forest management for these forests, rather than the selective cuts or even-aged management proposed in the current legislation. This study is a contribution to forest classification for both ecologically- and forestry-oriented purposes.

  16. Multivariate time series modeling of selected childhood diseases in ...

    African Journals Online (AJOL)

    This paper is focused on modeling the five most prevalent childhood diseases in Akwa Ibom State using a multivariate approach to time series. An aggregate of 78,839 reported cases of malaria, upper respiratory tract infection (URTI), Pneumonia, anaemia and tetanus were extracted from five randomly selected hospitals in ...

  17. Statistical Downscaling of Temperature with the Random Forest Model

    Directory of Open Access Journals (Sweden)

    Bo Pang

    2017-01-01

    The issues with downscaling the outputs of a global climate model (GCM) to a regional scale appropriate to hydrological impact studies are investigated using the random forest (RF) model, which has been shown to be superior for large dataset analysis and variable importance evaluation. The RF is proposed for downscaling daily mean temperature in the Pearl River basin in southern China. Four downscaling models were developed and validated by using the observed temperature series from 61 national stations and large-scale predictor variables derived from the National Center for Environmental Prediction–National Center for Atmospheric Research reanalysis dataset. The proposed RF downscaling model was compared to multiple linear regression, artificial neural network, and support vector machine models. For a comprehensive study, principal component analysis (PCA) and partial correlation analysis (PAR) were used for predictor selection in the other models. The model efficiency of the RF model was higher than that of the other models according to five selected criteria. By evaluating predictor importance, the RF could choose the best predictor combination without using PCA and PAR. The results indicate that the RF is a feasible tool for the statistical downscaling of temperature.

  18. A comparison of bivariate, multivariate random-effects, and Poisson correlated gamma-frailty models to meta-analyze individual patient data of ordinal scale diagnostic tests.

    Science.gov (United States)

    Simoneau, Gabrielle; Levis, Brooke; Cuijpers, Pim; Ioannidis, John P A; Patten, Scott B; Shrier, Ian; Bombardier, Charles H; de Lima Osório, Flavia; Fann, Jesse R; Gjerdingen, Dwenda; Lamers, Femke; Lotrakul, Manote; Löwe, Bernd; Shaaban, Juwita; Stafford, Lesley; van Weert, Henk C P M; Whooley, Mary A; Wittkampf, Karin A; Yeung, Albert S; Thombs, Brett D; Benedetti, Andrea

    2017-11-01

    Individual patient data (IPD) meta-analyses are increasingly common in the literature. In the context of estimating the diagnostic accuracy of ordinal or semi-continuous scale tests, sensitivity and specificity are often reported for a given threshold or a small set of thresholds, and a meta-analysis is conducted via a bivariate approach to account for their correlation. When IPD are available, sensitivity and specificity can be pooled for every possible threshold. Our objective was to compare the bivariate approach, which can be applied separately at every threshold, to two multivariate methods: the ordinal multivariate random-effects model and the Poisson correlated gamma-frailty model. Our comparison was empirical, using IPD from 13 studies that evaluated the diagnostic accuracy of the 9-item Patient Health Questionnaire depression screening tool, and included simulations. The empirical comparison showed that the implementation of the two multivariate methods is more laborious in terms of computational time and sensitivity to user-supplied values compared to the bivariate approach. Simulations showed that ignoring the within-study correlation of sensitivity and specificity across thresholds did not worsen inferences with the bivariate approach compared to the Poisson model. The ordinal approach was not suitable for simulations because the model was highly sensitive to user-supplied starting values. We tentatively recommend the bivariate approach rather than more complex multivariate methods for IPD diagnostic accuracy meta-analyses of ordinal scale tests, although the limited type of diagnostic data considered in the simulation study restricts the generalization of our findings. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  19. A three-dimensional stochastic model of the behavior of radionuclides in forests. Part 2. Cs-137 behavior in forest soils

    International Nuclear Information System (INIS)

    Berg, Mitchell T.; Shuman, Larry J.

    1995-01-01

    Using a three-dimensional stochastic model of radionuclides in forests developed in Part 1, this work simulates the long-term behavior of Cs-137 in forest soil. It is assumed that the behavior of Cs-137 in soils is driven by its advection and dispersion due to the infiltration of the soil solution, and by its sorption to the soil matrix. As Cs-137 transport through soils is affected by its uptake and release by forest vegetation, a model of radiocesium behavior in forest vegetation is presented in Part 3 of this paper. To estimate the rate of infiltration of water through the soil, models are presented to estimate the hydrological cycle of the forest, including infiltration, evapotranspiration, and the root uptake of water. The state transition probabilities for the random walk model of Cs-137 transport are then estimated using the models developed to predict the distribution of water in the forest. The random walk model is then tested using a baseline scenario in which Cs-137 is deposited into a coniferous forest ecosystem

  20. Multivariate statistical analysis of wildfires in Portugal

    Science.gov (United States)

    Costa, Ricardo; Caramelo, Liliana; Pereira, Mário

    2013-04-01

    Several studies demonstrate that wildfires in Portugal present high temporal and spatial variability as well as cluster behavior (Pereira et al., 2005, 2011). This study aims to contribute to the characterization of the fire regime in Portugal with a multivariate statistical analysis of the time series of the number of fires and area burned in Portugal during the 1980-2009 period. The data used in the analysis are an extended version of the Rural Fire Portuguese Database (PRFD) (Pereira et al., 2011), provided by the National Forest Authority (Autoridade Florestal Nacional, AFN), the Portuguese Forest Service, which includes information for more than 500,000 fire records. There are many advanced techniques for examining the relationships among multiple time series simultaneously (e.g., canonical correlation analysis, principal components analysis, factor analysis, path analysis, multiple analyses of variance, clustering systems). This study compares and discusses the results obtained with these different techniques.
    Pereira, M.G., Trigo, R.M., DaCamara, C.C., Pereira, J.M.C., Leite, S.M., 2005: "Synoptic patterns associated with large summer forest fires in Portugal". Agricultural and Forest Meteorology. 129, 11-25.
    Pereira, M. G., Malamud, B. D., Trigo, R. M., and Alves, P. I.: The history and characteristics of the 1980-2005 Portuguese rural fire database, Nat. Hazards Earth Syst. Sci., 11, 3343-3358, doi:10.5194/nhess-11-3343-2011, 2011.
    This work is supported by European Union Funds (FEDER/COMPETE - Operational Competitiveness Programme) and by national funds (FCT - Portuguese Foundation for Science and Technology) under the project FCOMP-01-0124-FEDER-022692, the project FLAIR (PTDC/AAC-AMB/104702/2008) and the EU 7th Framework Program through FUME (contract number 243888).

  1. New machine learning tools for predictive vegetation mapping after climate change: Bagging and Random Forest perform better than Regression Tree Analysis

    Science.gov (United States)

    L.R. Iverson; A.M. Prasad; A. Liaw

    2004-01-01

    More and better machine learning tools are becoming available for landscape ecologists to aid in understanding species-environment relationships and to map probable species occurrence now and potentially into the future. To that end, we evaluated three statistical models: Regression Tree Analysis (RTA), Bagging Trees (BT) and Random Forest (RF) for their utility in...

  2. Statistical Inference for a Class of Multivariate Negative Binomial Distributions

    DEFF Research Database (Denmark)

    Rubak, Ege H.; Møller, Jesper; McCullagh, Peter

    This paper considers statistical inference procedures for a class of models for positively correlated count variables called α-permanental random fields, which can be viewed as a family of multivariate negative binomial distributions. Their appealing probabilistic properties have earlier been...... studied in the literature, while this is the first statistical paper on α-permanental random fields. The focus is on maximum likelihood estimation, maximum quasi-likelihood estimation and on maximum composite likelihood estimation based on uni- and bivariate distributions. Furthermore, new results...

  3. Spatial variation of dung beetle assemblages associated with forest structure in remnants of southern Brazilian Atlantic Forest

    Directory of Open Access Journals (Sweden)

    Pedro Giovâni da Silva

    2016-01-01

    The Brazilian Atlantic Forest is one of the world's biodiversity hotspots, and is currently highly fragmented and disturbed due to human activities. Variation in environmental conditions in the Atlantic Forest can influence the distribution of species, which may show associations with some environmental features. Dung beetles (Coleoptera: Scarabaeinae) are insects that act in nutrient cycling via organic matter decomposition and have been used for monitoring environmental changes. The aim of this study is to identify associations between the spatial distribution of dung beetle species and Atlantic Forest structure. The spatial distribution of some dung beetle species was associated with structural forest features. The number of species among the sampling sites ranged widely, and few species were found in all remnant areas. Principal coordinates analysis indicated that species composition, abundance and biomass showed a spatially structured distribution, and these results were corroborated by permutational multivariate analysis of variance. The indicator value index and redundancy analysis showed an association of several dung beetle species with some explanatory environmental variables related to Atlantic Forest structure. This work demonstrated the existence of a spatially structured distribution of dung beetles, with significant associations between several species and forest structure in Atlantic Forest remnants from Southern Brazil. Keywords: Beta diversity, Species composition, Species diversity, Spatial distribution, Tropical forest

  4. HIGH QUALITY FACADE SEGMENTATION BASED ON STRUCTURED RANDOM FOREST, REGION PROPOSAL NETWORK AND RECTANGULAR FITTING

    Directory of Open Access Journals (Sweden)

    K. Rahmani

    2018-05-01

    In this paper we present a pipeline for high quality semantic segmentation of building facades using Structured Random Forest (SRF), a Region Proposal Network (RPN) based on a Convolutional Neural Network (CNN), and rectangular fitting optimization. Our main contribution is that we employ features created by the RPN as channels in the SRF. We empirically show that this is very effective, especially for doors and windows. Our pipeline is evaluated on two datasets, where we outperform current state-of-the-art methods. Additionally, we quantify the contribution of the RPN and the rectangular fitting optimization to the accuracy of the result.

  5. The Fault Diagnosis of Rolling Bearing Based on Ensemble Empirical Mode Decomposition and Random Forest

    Directory of Open Access Journals (Sweden)

    Xiwen Qin

    2017-01-01

    Accurate diagnosis of rolling bearing faults is very important for the normal operation of machinery and equipment. A method combining Ensemble Empirical Mode Decomposition (EEMD) and Random Forest (RF) is proposed. First, the original signal is decomposed into several intrinsic mode functions (IMFs) by EEMD, and the effective IMFs are selected. Then their energy entropy is calculated as the feature. Finally, the classification is performed by RF. For comparison, the wavelet method is also used in the same process in place of EEMD. The results of the comparison show that the EEMD method is more accurate than the wavelet method.
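
    The energy-entropy feature can be illustrated in a few lines. This is a minimal sketch, assuming the IMFs have already been produced by some EEMD routine (not shown) and assuming the common definition of energy entropy as the Shannon entropy of the normalised IMF energies; the abstract does not spell out the authors' exact formula, and `energy_entropy` is a hypothetical name.

```python
import math
from typing import Sequence


def energy_entropy(imfs: Sequence[Sequence[float]]) -> float:
    """Shannon entropy (natural log) of the normalised energies of a set of IMFs."""
    # Energy of each component: sum of its squared samples
    energies = [sum(v * v for v in imf) for imf in imfs]
    total = sum(energies)
    if total == 0:
        raise ValueError("all components have zero energy")
    entropy = 0.0
    for e in energies:
        p = e / total  # share of the total energy carried by this IMF
        if p > 0:
            entropy -= p * math.log(p)
    return entropy


if __name__ == "__main__":
    # Hypothetical IMFs; in practice these would come from an EEMD routine
    imfs = [[1.0, -1.0, 1.0, -1.0], [0.5, 0.5, -0.5, -0.5]]
    print(energy_entropy(imfs))
```

    Energy spread evenly over the IMFs gives a high entropy, while energy concentrated in one IMF gives a value near zero, which is what lets the feature separate fault signatures.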

  6. SNRFCB: sub-network based random forest classifier for predicting chemotherapy benefit on survival for cancer treatment.

    Science.gov (United States)

    Shi, Mingguang; He, Jianmin

    2016-04-01

    Adjuvant chemotherapy (CTX) should be individualized to provide potential survival benefit and avoid potential harm to cancer patients. Our goal was to establish a computational approach for making personalized estimates of the survival benefit from adjuvant CTX. We developed a Sub-Network based Random Forest classifier for predicting Chemotherapy Benefit (SNRFCB) based on gene expression datasets of lung cancer. The SNRFCB approach was then validated in independent test cohorts for identifying chemotherapy responder cohorts and chemotherapy non-responder cohorts. SNRFCB involved the pre-selection of gene sub-network signatures based on the mutations and on protein-protein interaction data as well as the application of the random forest algorithm to gene expression datasets. Adjuvant CTX was significantly associated with the prolonged overall survival of lung cancer patients in the chemotherapy responder group (P = 0.008), but it was not beneficial to patients in the chemotherapy non-responder group (P = 0.657). Adjuvant CTX was significantly associated with the prolonged overall survival of lung cancer squamous cell carcinoma (SQCC) subtype patients in the chemotherapy responder cohorts (P = 0.024), but it was not beneficial to patients in the chemotherapy non-responder cohorts (P = 0.383). SNRFCB improved prediction performance compared with another machine learning method, the support vector machine (SVM). To test the general applicability of the predictive model, we further applied the SNRFCB approach to human breast cancer datasets and also observed superior performance. SNRFCB could provide recurrence probabilities for individual patients and identify which patients may benefit from adjuvant CTX in clinical trials.

  7. Deforestation Trends in Forest Estates of Vandeikya Local ...

    African Journals Online (AJOL)

    The variation in total forest area over time (years), the number of forest offences and annual forest fires were appraised in Vandeikya Local Government (VLG) Area, Benue State, Nigeria. Six wards were randomly selected from the twelve wards making up the Local Government. These wards were: Mbadede, Mbagbera, ...

  8. Evaluating effectiveness of down-sampling for stratified designs and unbalanced prevalence in Random Forest models of tree species distributions in Nevada

    Science.gov (United States)

    Elizabeth A. Freeman; Gretchen G. Moisen; Tracy S. Frescino

    2012-01-01

    Random Forests is frequently used to model species distributions over large geographic areas. Complications arise when data used to train the models have been collected in stratified designs that involve different sampling intensity per stratum. The modeling process is further complicated if some of the target species are relatively rare on the landscape leading to an...
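
    The abstract is cut off before the authors' scheme is described. As a generic illustration only, down-sampling usually means randomly discarding majority-class records (for a rare species, mostly absences) until the classes are balanced before a Random Forest is trained. The sketch assumes simple per-class sampling without replacement to the size of the smallest class and ignores the per-stratum sampling intensities the title refers to; `downsample_to_minority` is a hypothetical name.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def downsample_to_minority(samples: Sequence[T], labels: Sequence[Hashable],
                           seed: int = 0) -> Tuple[List[T], List[Hashable]]:
    """Randomly keep, per class, as many samples as the smallest class has (no replacement)."""
    rng = random.Random(seed)
    by_class: Dict[Hashable, List[T]] = defaultdict(list)
    for s, y in zip(samples, labels):
        by_class[y].append(s)
    n_min = min(len(members) for members in by_class.values())
    out_x: List[T] = []
    out_y: List[Hashable] = []
    for y, members in by_class.items():
        kept = rng.sample(members, n_min)  # sampling without replacement
        out_x.extend(kept)
        out_y.extend([y] * n_min)
    return out_x, out_y


if __name__ == "__main__":
    # Hypothetical plots: 90 absences (0) and 10 presences (1)
    x, y = downsample_to_minority(list(range(100)), [0] * 90 + [1] * 10)
    print(len(x), y.count(0), y.count(1))
```

    An alternative design balances each tree's bootstrap sample instead of down-sampling the whole training set once.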

  9. Analysis of multi-species point patterns using multivariate log Gaussian Cox processes

    DEFF Research Database (Denmark)

    Waagepetersen, Rasmus; Guan, Yongtao; Jalilian, Abdollah

    Multivariate log Gaussian Cox processes are flexible models for multivariate point patterns. However, they have so far only been applied in bivariate cases. In this paper we move beyond the bivariate case in order to model multi-species point patterns of tree locations. In particular we address the problems of identifying parsimonious models and of extracting biologically relevant information from the fitted models. The latent multivariate Gaussian field is decomposed into components given in terms of random fields common to all species and components which are species specific. This allows...... of the data. The selected number of common latent fields provides an index of complexity of the multivariate covariance structure. Hierarchical clustering is used to identify groups of species with similar patterns of dependence on the common latent fields.

  10. The choice of forest site for recreation

    DEFF Research Database (Denmark)

    Agimass, Fitalew; Lundhede, Thomas; Panduro, Toke Emil

    2018-01-01

    In this paper, we investigate the factors that can influence the site choice of forest recreation. Relevant attributes are identified by using spatial data analysis from a questionnaire asking people to indicate their most recent forest visits by pinpointing on a map. The main objectives...... logit as well as a random parameter logit model. The variables that are found to affect the choice of forest site to visit for recreation include: forest area, tree species composition, forest density, availability of historical sites, terrain difference, state ownership, and distance. Regarding...

  11. Maximizing the Diversity of Ensemble Random Forests for Tree Genera Classification Using High Density LiDAR Data

    Directory of Open Access Journals (Sweden)

    Connie Ko

    2016-08-01

    Recent research into improving the effectiveness of forest inventory management using airborne LiDAR data has focused on developing advanced theories in data analytics. Furthermore, supervised learning as a predictive model for classifying tree genera (and species, where possible) has been gaining popularity in order to minimize this labor-intensive task. However, bottlenecks remain that hinder the immediate adoption of supervised learning methods. With supervised classification, training samples are required for learning the parameters that govern the performance of a classifier, yet the selection of training data is often subjective and the quality of such samples is critically important. For LiDAR scanning in forest environments, the quantification of data quality is somewhat abstract, normally referring to some metric related to the completeness of individual tree crowns; however, this is not an issue that has received much attention in the literature. Intuitively the choice of training samples having varying quality will affect classification accuracy. In this paper a Diversity Index (DI) is proposed that characterizes the diversity of data quality (Qi) among selected training samples required for constructing a classification model of tree genera. The training sample is diversified in terms of data quality as opposed to the number of samples per class. The diversified training sample allows the classifier to better learn the positive and negative instances and, therefore, has a higher classification accuracy in discriminating the “unknown” class samples from the “known” samples. Our algorithm is implemented within the Random Forests base classifiers with six derived geometric features from LiDAR data. The training sample contains three tree genera (pine, poplar, and maple) and the validation samples contain four labels (pine, poplar, maple, and “unknown”). Classification accuracy improved from 72.8%, when training samples were

  12. Predicting temperate forest stand types using only structural profiles from discrete return airborne lidar

    Science.gov (United States)

    Fedrigo, Melissa; Newnham, Glenn J.; Coops, Nicholas C.; Culvenor, Darius S.; Bolton, Douglas K.; Nitschke, Craig R.

    2018-02-01

    Light detection and ranging (lidar) data have been increasingly used for forest classification due to their ability to penetrate the forest canopy and provide detail about the structure of the lower strata. In this study we demonstrate forest classification approaches using airborne lidar data as inputs to random forest and linear unmixing classification algorithms. Our results demonstrated that both random forest and linear unmixing models identified a distribution of rainforest and eucalypt stands that was comparable to existing ecological vegetation class (EVC) maps based primarily on manual interpretation of high resolution aerial imagery. Rainforest stands that have not previously been identified in the EVC maps were also identified in the region. The transition between stand types was better characterised by the random forest modelling approach. In contrast, the linear unmixing model placed greater emphasis on field plots selected as endmembers, which may not have captured the variability in stand structure within a single stand type. The random forest model had the highest overall accuracy (84%) and Cohen's kappa coefficient (0.62). However, its classification accuracy was only marginally better than that of linear unmixing. The random forest model was applied to a region in the Central Highlands of south-eastern Australia to produce maps of stand type probability, including areas of transition (the 'ecotone') between rainforest and eucalypt forest. The resulting map provided a detailed delineation of forest classes, which specifically recognised the coalescing of stand types at the landscape scale. This represents a key step towards mapping the structural and spatial complexity of these ecosystems, which is important for both their management and conservation.
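
    The overall accuracy (84%) and Cohen's kappa (0.62) both come from a confusion matrix of reference versus predicted stand types. Kappa subtracts the agreement expected by chance from the row and column totals, which is why it can sit well below overall accuracy. A minimal sketch of the standard formula, assuming rows hold reference classes and columns hold predictions; the matrix in the usage lines is hypothetical, not the authors' data.

```python
from typing import Sequence


def cohens_kappa(confusion: Sequence[Sequence[int]]) -> float:
    """Cohen's kappa from a square confusion matrix (rows: reference, columns: predicted)."""
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(k)) / n  # overall accuracy
    row_totals = [sum(confusion[i]) for i in range(k)]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected if predictions were independent of the reference labels
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical two-class matrix (e.g. rainforest vs eucalypt), not the study's data
    print(cohens_kappa([[45, 5], [5, 45]]))
```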

  13. Adaptive economic and ecological forest management under risk

    Science.gov (United States)

    Joseph Buongiorno; Mo Zhou

    2015-01-01

    Background: Forest managers must deal with inherently stochastic ecological and economic processes. The future growth of trees is uncertain, and so is their value. The randomness of low-impact, high frequency or rare catastrophic shocks in forest growth has significant implications in shaping the mix of tree species and the forest landscape...

  14. Substituting random forest for multiple linear regression improves binding affinity prediction of scoring functions: Cyscore as a case study.

    Science.gov (United States)

    Li, Hongjian; Leung, Kwong-Sak; Wong, Man-Hon; Ballester, Pedro J

    2014-08-27

    State-of-the-art protein-ligand docking methods are generally limited by the traditionally low accuracy of their scoring functions, which are used to predict binding affinity and are thus vital for discriminating between active and inactive compounds. Despite intensive research over the years, classical scoring functions have reached a plateau in their predictive performance. These assume a predetermined additive functional form for some sophisticated numerical features, and use standard multivariate linear regression (MLR) on experimental data to derive the coefficients. In this study we show that such a simple functional form is detrimental for the prediction performance of a scoring function, and replacing linear regression by machine learning techniques like random forest (RF) can improve prediction performance. We investigate the conditions of applying RF under various contexts and find that given sufficient training samples RF manages to comprehensively capture the non-linearity between structural features and measured binding affinities. Incorporating more structural features and training with more samples can both boost RF performance. In addition, we analyze the importance of structural features to binding affinity prediction using the RF variable importance tool. Lastly, we use Cyscore, a top performing empirical scoring function, as a baseline for comparison. Machine-learning scoring functions are fundamentally different from classical scoring functions because the former circumvent the fixed functional form relating structural features with binding affinities. RF, but not MLR, can effectively exploit more structural features and more training samples, leading to higher prediction performance. The future availability of more X-ray crystal structures will further widen the performance gap between RF-based and MLR-based scoring functions. This further stresses the importance of substituting RF for MLR in scoring function development.

  15. Bayesian inference for multivariate point processes observed at sparsely distributed times

    DEFF Research Database (Denmark)

    Rasmussen, Jakob Gulddahl; Møller, Jesper; Aukema, B.H.

    We consider statistical and computational aspects of simulation-based Bayesian inference for a multivariate point process which is only observed at sparsely distributed times. For specificity we consider a particular data set which has earlier been analyzed by a discrete time model involving unknown...... normalizing constants. We discuss the advantages and disadvantages of using continuous time processes compared to discrete time processes in the setting of the present paper as well as other spatial-temporal situations. Keywords: Bark beetle, conditional intensity, forest entomology, Markov chain Monte Carlo...

  16. Edge Detection from RGB-D Image Based on Structured Forests

    Directory of Open Access Journals (Sweden)

    Heng Zhang

    2016-01-01

    This paper looks into a fundamental problem in computer vision: edge detection. We propose a new edge detector using structured random forests as the classifier, which can make full use of RGB-D image information from Kinect. Before classification, an adaptive bilateral filter is used to denoise the depth image. As data sources, 13 channels of information are computed from the RGB-D image. To train the random forest classifier, an approximate measure of information gain is used. All the structured labels at a given node are mapped to a discrete set of labels using the Principal Component Analysis (PCA) method. The NYUD2 dataset is used to train our structured random forests. The random forest algorithm is used to classify the RGB-D image information for extracting the edge of the image. In addition to the proposed methodology, quantitative comparisons of different algorithms are presented. The results of the experiments demonstrate the significant improvements of our algorithm over the state of the art.

  17. Analysis of the stability and accuracy of the discrete least-squares approximation on multivariate polynomial spaces

    KAUST Repository

    Migliorati, Giovanni

    2016-01-05

    We review the main results achieved in the analysis of the stability and accuracy of the discrete least-squares approximation on multivariate polynomial spaces, with noiseless evaluations at random points, noiseless evaluations at low-discrepancy point sets, and noisy evaluations at random points.

  18. Supremum Norm Posterior Contraction and Credible Sets for Nonparametric Multivariate Regression

    NARCIS (Netherlands)

    Yoo, W.W.; Ghosal, S

    2016-01-01

    In the setting of nonparametric multivariate regression with unknown error variance, we study asymptotic properties of a Bayesian method for estimating a regression function f and its mixed partial derivatives. We use a random series of tensor product of B-splines with normal basis coefficients as a

  19. Automated seismic detection of landslides at regional scales: a Random Forest based detection algorithm

    Science.gov (United States)

    Hibert, C.; Michéa, D.; Provost, F.; Malet, J. P.; Geertsema, M.

    2017-12-01

    Detection of landslide occurrences and measurement of their dynamic properties during run-out is a high research priority but a logistical and technical challenge. Seismology has started to help in several important ways. Taking advantage of the densification of global, regional and local networks of broadband seismic stations, recent advances now permit the seismic detection and location of landslides in near-real-time. This seismic detection could potentially greatly increase the spatio-temporal resolution at which we study landslide triggering, which is critical to better understand the influence of external forcings such as rainfall and earthquakes. However, automatically detecting seismic signals generated by landslides still represents a challenge, especially for events with small mass. The low signal-to-noise ratio typically observed for landslide-generated seismic signals and the difficulty of discriminating these signals from those generated by regional earthquakes or anthropogenic and natural noise are some of the obstacles that have to be overcome. We present a new method for automatically constructing instrumental landslide catalogues from continuous seismic data. We developed a robust and versatile solution, which can be implemented in any context where seismic detection of landslides or other mass movements is relevant. The method is based on a spectral detection of the seismic signals and the identification of the sources with a Random Forest machine learning algorithm. The spectral detection allows signals with a low signal-to-noise ratio to be detected, while the Random Forest algorithm achieves a high rate of positive identification of the seismic signals generated by landslides and other seismic sources. The processing chain is implemented to run in a high-performance computing centre, which makes it possible to explore years of continuous seismic data rapidly. We present here the preliminary results of the application of this processing chain for years

  20. The potential use of cuticular hydrocarbons and multivariate analysis to age empty puparial cases of Calliphora vicina and Lucilia sericata.

    Science.gov (United States)

    Moore, Hannah E; Pechal, Jennifer L; Benbow, M Eric; Drijfhout, Falko P

    2017-05-16

    Cuticular hydrocarbons (CHC) have been successfully used in the field of forensic entomology for identifying and ageing forensically important blowfly species, primarily in the larval stages. However, in older scenes where no other entomological evidence remains, Calliphoridae puparial cases can often be all that is left, so being able to establish their age could give an indication of the post-mortem interval (PMI). This paper examined the CHCs present in the lipid wax layer of insects to determine the age of the cases over a period of nine months. The two forensically important species examined were Calliphora vicina and Lucilia sericata. The hydrocarbons were chemically extracted and analysed using Gas Chromatography - Mass Spectrometry. Statistical analysis was then applied in the form of non-metric multidimensional scaling analysis (NMDS), permutational multivariate analysis of variance (PERMANOVA) and random forest models. This study was successful in determining age differences within the empty cases, which, to date, has not been established by any other technique.

  1. Bayesian Modeling of Air Pollution Extremes Using Nested Multivariate Max-Stable Processes

    KAUST Repository

    Vettori, Sabrina; Huser, Raphaël; Genton, Marc G.

    2018-01-01

    Capturing the potentially strong dependence among the peak concentrations of multiple air pollutants across a spatial region is crucial for assessing the related public health risks. In order to investigate the multivariate spatial dependence properties of air pollution extremes, we introduce a new class of multivariate max-stable processes. Our proposed model admits a hierarchical tree-based formulation, in which the data are conditionally independent given some latent nested α-stable random factors. The hierarchical structure facilitates Bayesian inference and offers a convenient and interpretable characterization. We fit this nested multivariate max-stable model to the maxima of air pollution concentrations and temperatures recorded at a number of sites in the Los Angeles area, showing that the proposed model succeeds in capturing their complex tail dependence structure.

  2. Bayesian Modeling of Air Pollution Extremes Using Nested Multivariate Max-Stable Processes

    KAUST Repository

    Vettori, Sabrina

    2018-03-18

    Capturing the potentially strong dependence among the peak concentrations of multiple air pollutants across a spatial region is crucial for assessing the related public health risks. In order to investigate the multivariate spatial dependence properties of air pollution extremes, we introduce a new class of multivariate max-stable processes. Our proposed model admits a hierarchical tree-based formulation, in which the data are conditionally independent given some latent nested α-stable random factors. The hierarchical structure facilitates Bayesian inference and offers a convenient and interpretable characterization. We fit this nested multivariate max-stable model to the maxima of air pollution concentrations and temperatures recorded at a number of sites in the Los Angeles area, showing that the proposed model succeeds in capturing their complex tail dependence structure.

  3. Abiotic and Biotic Soil Characteristics in Old Growth Forests and Thinned or Unthinned Mature Stands in Three Regions of Oregon

    Directory of Open Access Journals (Sweden)

    David A. Perry

    2012-09-01

    We compared forest floor depth, soil organic matter, soil moisture, anaerobic mineralizable nitrogen (a measure of microbial biomass), denitrification potential, and soil/litter arthropod communities among old growth, unthinned mature stands, and thinned mature stands at nine sites (each with all three stand types) distributed among three regions of Oregon. Mineral soil measurements were restricted to the top 10 cm. Data were analyzed with both multivariate and univariate analyses of variance. Multivariate analyses were conducted with and without soil mesofauna or forest floor mesofauna, as data for those taxa were not collected on some sites. In multivariate analysis with soil mesofauna, the model giving the strongest separation among stand types (P = 0.019) included abundance and richness of soil mesofauna and anaerobic mineralizable nitrogen. The best model with forest floor mesofauna (P = 0.010) included anaerobic mineralizable nitrogen, soil moisture content, and richness of forest floor mesofauna. Old growth had the highest mean values for all variables, and in both models differed significantly from mature stands, while the latter did not differ. Old growth also averaged higher percent soil organic matter, and analysis including that variable was significant but not as strong as without it. Results of the multivariate analyses were mostly supported by univariate analyses, but there were some differences. In univariate analysis, the difference in percent soil organic matter between old growth and thinned mature was due to a single site in which the old growth had exceptionally high soil organic matter; without that site, percent soil organic matter did not differ between old growth and thinned mature, and a multivariate model containing soil organic matter was not statistically significant. In univariate analyses soil mesofauna had to be compared nonparametrically (because of heavy left tails) and differed only in the Siskiyou Mountains, where

  4. Organization of private forest sector in Timok forest area

    Directory of Open Access Journals (Sweden)

    Vojislav Milijic

    2010-06-01

    Today, private forest owners (PFOs) in Serbia cooperate in the form of private forest owners' associations (PFOAs). Currently, there are 20 PFOAs, of which 15 are in the Timok region. Initiatives of PFOs from the Timok forest area motivated owners from other parts of the country and led to the foundation of the Serbian Federation of Forest Owners' Associations. Twelve of the PFOAs from the Timok forest area are founders of the Serbian private forest owners' umbrella organization. Restructuring of the Public Enterprise (PE) "Srbijasume", which started in 2001, led to the development of private small and medium forest enterprises, engaged as contractors of the PE for harvesting, timber transport and construction of forest roads. The objectives of this paper are to determine whether there are differences between PFOs in Serbia and the Timok region and to analyze the organization of private forest owners in the Timok forest area. To reach these objectives, results of the PRIFORT project were used. This project focused on four countries of the Western Balkans region: Bosnia and Herzegovina, Croatia, Serbia and Macedonia. The aim of this project was to explore the preconditions for the formation of PFOs in this region. A quantitative survey (n = 350) of randomly selected PFOs was conducted in nine municipalities in Serbia, of which two were in the Timok region (n = 100). The results show that there are differences between PFOs in Serbia and the Timok region in the number of PFOs, the size of private property and additional incentives. These results also indicate that economic interest is a motive for the establishment of PFOAs and that state support is very important for their development. Since a number of PFOs are entrepreneurs, it can be assumed that further development of their organizations could lead to the development of SME clusters.

  5. In silico modelling of permeation enhancement potency in Caco-2 monolayers based on molecular descriptors and random forest

    DEFF Research Database (Denmark)

    Welling, Søren Havelund; Clemmensen, Line Katrine Harder; Buckley, Stephen T.

    2015-01-01

    has been developed. The random forest-QSAR model was based upon Caco-2 data for 41 surfactant-like permeation enhancers from Whitehead et al. (2008) and molecular descriptors calculated from their structure. The QSAR model was validated by two test sets: (i) an eleven-compound experimental set with Caco-2 data and (ii) nine compounds with Caco-2 data from the literature. Feature contributions, a recently developed diagnostic tool, were applied to elucidate the contribution of individual molecular descriptors to the predicted potency. Feature contributions provided easily interpretable suggestions...
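
    Feature contributions decompose a prediction into a bias term plus one additive term per descriptor. A minimal sketch for regression trees, assuming the common formulation in which the change in node mean at each split on the decision path is credited to the split feature and the results are averaged over the trees of the forest; the `Node` structure and function names are hypothetical, and the tree is built by hand rather than fitted to Caco-2 data.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass
class Node:
    value: float                    # mean training response in this node
    feature: Optional[int] = None   # split feature index; None marks a leaf
    threshold: float = 0.0          # go left if x[feature] <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def tree_contributions(root: Node, x: Sequence[float]) -> Tuple[float, Dict[int, float], float]:
    """Return (bias, per-feature contributions, prediction) for one tree."""
    contrib: Dict[int, float] = {}
    node = root
    while node.feature is not None:
        child = node.left if x[node.feature] <= node.threshold else node.right
        # Credit the change in node mean to the feature used at this split
        contrib[node.feature] = contrib.get(node.feature, 0.0) + (child.value - node.value)
        node = child
    return root.value, contrib, node.value


def forest_contributions(trees: List[Node], x: Sequence[float]) -> Tuple[float, Dict[int, float], float]:
    """Average bias, contributions and prediction over the trees of a forest."""
    bias_sum, pred_sum = 0.0, 0.0
    totals: Dict[int, float] = {}
    for tree in trees:
        bias, contrib, pred = tree_contributions(tree, x)
        bias_sum += bias
        pred_sum += pred
        for f, c in contrib.items():
            totals[f] = totals.get(f, 0.0) + c
    m = len(trees)
    return bias_sum / m, {f: c / m for f, c in totals.items()}, pred_sum / m


if __name__ == "__main__":
    # Hypothetical tree over two descriptors (indices 0 and 1)
    tree = Node(5.0, feature=0, threshold=1.0, left=Node(2.0),
                right=Node(8.0, feature=1, threshold=0.5, left=Node(7.0), right=Node(9.0)))
    print(forest_contributions([tree], [2.0, 1.0]))
```

    By construction, the bias plus the summed contributions equals the prediction, which is what makes the per-descriptor terms easy to read.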

  6. Analysis and Recognition of Traditional Chinese Medicine Pulse Based on the Hilbert-Huang Transform and Random Forest in Patients with Coronary Heart Disease

    Directory of Open Access Journals (Sweden)

    Rui Guo

    2015-01-01

    Objective. This research provides objective and quantitative parameters of the traditional Chinese medicine (TCM) pulse conditions for distinguishing between patients with coronary heart disease (CHD) and normal people by using the proposed classification approach based on Hilbert-Huang transform (HHT) and random forest. Methods. The energy and the sample entropy features were extracted by applying the HHT to TCM pulse by treating these pulse signals as time series. By using the random forest classifier, the extracted two types of features and their combination were, respectively, used as input data to establish a classification model. Results. Statistical results showed that there were significant differences in the pulse energy and sample entropy between the CHD group and the normal group. Moreover, when the energy features, sample entropy features, and their combination were used as pulse feature vectors, the corresponding average recognition rates were 84%, 76.35%, and 90.21%, respectively. Conclusion. The proposed approach could be appropriately used to analyze pulses of patients with CHD, which can lay a foundation for research on objective and quantitative criteria on disease diagnosis or Zheng differentiation.

  7. Analysis and Recognition of Traditional Chinese Medicine Pulse Based on the Hilbert-Huang Transform and Random Forest in Patients with Coronary Heart Disease

    Science.gov (United States)

    Wang, Yiqin; Yan, Hanxia; Yan, Jianjun; Yuan, Fengyin; Xu, Zhaoxia; Liu, Guoping; Xu, Wenjie

    2015-01-01

    Objective. This research provides objective and quantitative parameters of the traditional Chinese medicine (TCM) pulse conditions for distinguishing between patients with coronary heart disease (CHD) and normal people by using the proposed classification approach based on Hilbert-Huang transform (HHT) and random forest. Methods. The energy and the sample entropy features were extracted by applying the HHT to TCM pulse by treating these pulse signals as time series. By using the random forest classifier, the extracted two types of features and their combination were, respectively, used as input data to establish a classification model. Results. Statistical results showed that there were significant differences in the pulse energy and sample entropy between the CHD group and the normal group. Moreover, when the energy features, sample entropy features, and their combination were used as pulse feature vectors, the corresponding average recognition rates were 84%, 76.35%, and 90.21%, respectively. Conclusion. The proposed approach could be appropriately used to analyze pulses of patients with CHD, which can lay a foundation for research on objective and quantitative criteria on disease diagnosis or Zheng differentiation. PMID:26180536
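
    Sample entropy is one of the two HHT-derived feature types in the pulse records above. A minimal sketch of the standard SampEn(m, r) definition, assuming Chebyshev distance between templates, an absolute tolerance r, and the same N - m template starting points for lengths m and m + 1; the abstract does not give the authors' m, r or preprocessing, and `sample_entropy` is a hypothetical name.

```python
import math
from typing import Sequence


def sample_entropy(x: Sequence[float], m: int = 2, r: float = 0.2) -> float:
    """SampEn(m, r) = -ln(A / B) with Chebyshev distance and absolute tolerance r."""
    n = len(x)
    if n <= m + 1:
        raise ValueError("series too short for the chosen m")

    def count_matches(length: int) -> int:
        # Use the same n - m starting points for lengths m and m + 1
        templates = [x[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):  # i < j excludes self-matches
                if max(abs(a - b) for a, b in zip(templates[i], templates[j])) <= r:
                    count += 1
        return count

    b = count_matches(m)      # matching template pairs of length m
    a = count_matches(m + 1)  # matching template pairs of length m + 1
    if a == 0 or b == 0:
        return math.inf  # undefined in the strict sense; reported here as infinity
    return -math.log(a / b)
```

    The tolerance here is absolute; the frequent convention of r = 0.2 times the standard deviation can be applied by passing `r=0.2 * statistics.pstdev(signal)`. Returning infinity when no matches are found is a choice made for this sketch.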

  8. A multivariate study of mangrove morphology (Rhizophora mangle) using both above and below-water plant architecture

    Science.gov (United States)

    Brooks, R. Allen; Bell, Susan S.

    2005-11-01

    A descriptive study of the architecture of the red mangrove, Rhizophora mangle L., habitat of Tampa Bay, FL, was conducted to assess whether plant architecture could be used to discriminate overwash from fringing forest type. Seven above-water (e.g., tree height, diameter at breast height, and leaf area) and 10 below-water (e.g., root density, root complexity, and maximum root order) architectural features were measured in eight mangrove stands. A multivariate technique (discriminant analysis) was used to test the ability of different models comprising above-water, below-water, or whole tree architecture to classify forest type. Root architectural features appear to be better than classical forestry measurements at discriminating between fringing and overwash forests but, regardless of the features loaded into the model, misclassification rates were high, as forest type was only correctly classified in 66% of the cases. Based upon habitat architecture, the results of this study do not support a sharp distinction between overwash and fringing red mangrove forests in Tampa Bay but rather indicate that the two are architecturally indistinguishable. Therefore, within this northern portion of the geographic range of red mangroves, a more appropriate classification system based upon architecture may be one in which overwash and fringing forest types are combined into a single, "tide dominated" category.

  9. Soil map disaggregation improved by soil-landscape relationships, area-proportional sampling and random forest implementation

    DEFF Research Database (Denmark)

    Møller, Anders Bjørn; Malone, Brendan P.; Odgers, Nathan

    ... implementation generally improved the algorithm’s ability to predict the correct soil class. The implementation of soil-landscape relationships and area-proportional sampling generally increased the calculation time, while the random forest implementation reduced the calculation time. In the most successful ... Detailed soil information is often needed to support agricultural practices, environmental protection and policy decisions. Several digital approaches can be used to map soil properties based on field observations. When soil observations are sparse or missing, an alternative approach ... is to disaggregate existing conventional soil maps. At present, the DSMART algorithm represents the most sophisticated approach for disaggregating conventional soil maps (Odgers et al., 2014). The algorithm relies on classification trees trained from resampled points, which are assigned classes according...
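
    The snippet above mentions area-proportional sampling but is truncated before describing it. As an illustration only, the Python sketch below assumes that a fixed budget of sample points is split across map polygons in proportion to polygon area, and that each point then receives a soil class drawn from its map unit's class proportions. The function names, the largest-remainder rounding and the class names used in any example are hypothetical, not DSMART's actual code.

```python
import random


def area_proportional_allocation(areas, total_points):
    """Split total_points across polygons in proportion to their areas.

    Uses largest-remainder rounding so the allocations sum exactly to total_points.
    """
    total_area = sum(areas.values())
    raw = {pid: total_points * area / total_area for pid, area in areas.items()}
    alloc = {pid: int(share) for pid, share in raw.items()}
    leftover = total_points - sum(alloc.values())
    # Give the remaining points to the polygons with the largest fractional parts.
    by_remainder = sorted(raw, key=lambda pid: raw[pid] - alloc[pid], reverse=True)
    for pid in by_remainder[:leftover]:
        alloc[pid] += 1
    return alloc


def assign_classes(n_points, composition, rng):
    """Draw a soil class for each point from a map unit's class proportions."""
    classes = list(composition)
    weights = [composition[c] for c in classes]
    return rng.choices(classes, weights=weights, k=n_points)
```

    Passing an explicit `random.Random(seed)` keeps each resampling realization reproducible, which matters when many realizations are later combined.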

  10. Detecting Drought-Induced Tree Mortality in Sierra Nevada Forests with Time Series of Satellite Data

    Directory of Open Access Journals (Sweden)

    Sarah Byer

    2017-09-01

    A five-year drought in California led to a significant increase in tree mortality in Sierra Nevada forests from 2012 to 2016. Landscape-level monitoring of forest health and tree dieback is critical for vegetation and disaster management strategies. We examined the capability of multispectral imagery from the Moderate Resolution Imaging Spectroradiometer (MODIS) to detect and explain the impacts of the recent severe drought in Sierra Nevada forests. Remote sensing metrics were developed to represent baseline forest health conditions and drought stress using time series of MODIS vegetation indices (VIs) and a water index. We used Random Forest algorithms, trained with forest aerial detection survey data, to detect tree mortality based on the remote sensing metrics and topographical variables. Map estimates of tree mortality demonstrated that our two-stage Random Forest models were capable of detecting the spatial patterns and severity of tree mortality, with an overall producer’s accuracy of 96.3% for the classification Random Forest (CRF) and an RMSE of 7.19 dead trees per acre for the regression Random Forest (RRF). The overall omission errors of the CRF ranged from 19% for the severe mortality class to 27% for the low mortality class. Interpretations of the models revealed that forests with higher productivity preceding the onset of drought were more vulnerable to drought stress and, consequently, more likely to experience tree mortality. This method highlights the importance of incorporating baseline forest health data and measurements of drought stress in understanding forest response to severe drought.
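
    The accuracy figures above follow standard confusion-matrix definitions. In this hedged Python sketch (the row = reference, column = prediction layout and any toy counts are assumptions, not the study's data), producer's accuracy for a class is the fraction of its reference samples classified correctly, omission error is its complement, and RMSE summarizes the regression forest's errors.

```python
import math


def producers_accuracy(confusion, labels):
    """Per-class producer's accuracy.

    confusion[i][j] is the number of reference (ground-truth) samples of class i
    that the classifier labelled as class j, so row i sums to the reference total.
    """
    return {label: confusion[i][i] / sum(confusion[i]) for i, label in enumerate(labels)}


def omission_errors(confusion, labels):
    """Omission error = 1 - producer's accuracy (share of reference samples missed)."""
    return {label: 1.0 - acc for label, acc in producers_accuracy(confusion, labels).items()}


def rmse(observed, predicted):
    """Root mean square error, e.g. in dead trees per acre for a regression forest."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))
```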

  11. The Challenge of Forest Diagnostics

    Directory of Open Access Journals (Sweden)

    Harini Nagendra

    2011-06-01

    Ecologists and practitioners have conventionally used forest plots or transects for monitoring changes in attributes of forest condition over time. However, given the difficulty in collecting such data, conservation practitioners frequently rely on the judgment of foresters and forest users for evaluating changes. These methods are rarely compared. We use a dataset of 53 forests in five countries to compare assessments of forest change from forest plots with forester and user evaluations of changes in forest density. We find that user assessments of changes in tree density are strongly and significantly related to assessments of change derived from statistical analyses of randomly distributed forest plots. User assessments of change in density at the shrub/sapling level also relate to assessments derived from statistical evaluations of vegetation plots, but this relationship is not as strong and is only weakly significant. Evaluations of change by professional foresters are much more difficult to acquire, and less reliable, as foresters are often not familiar with changes in specific local areas. Forester evaluations can instead better provide valid single-time comparisons of a forest with other areas in a similar ecological zone. Thus, in forests where local forest users are present, their evaluations can be used to provide reliable assessments of changes in tree density in the areas they access. However, assessments of spatially heterogeneous patterns of human disturbance and regeneration at the shrub/sapling level are likely to require supplemental vegetation analysis.

  12. Robust multivariate analysis

    CERN Document Server

    J Olive, David

    2017-01-01

    This text presents methods that are robust to violations of the multivariate normality assumption, or robust to certain types of outliers. Instead of using exact theory based on the multivariate normal distribution, the simpler and more applicable large sample theory is given. The text develops some of the first practical robust regression and robust multivariate location and dispersion estimators backed by theory. The robust techniques are illustrated for methods such as principal component analysis, canonical correlation analysis, and factor analysis. A simple way to bootstrap confidence regions is also provided. Much of the research on robust multivariate analysis in this book is being published for the first time. The text is suitable for a first course in Multivariate Statistical Analysis or a first course in Robust Statistics. This graduate text is also useful for people who are familiar with the traditional multivariate topics, but want to know more about handling data sets with...

  13. CLASSIFICATION OF HYPERSPECTRAL DATA BASED ON GUIDED FILTERING AND RANDOM FOREST

    Directory of Open Access Journals (Sweden)

    H. Ma

    2017-09-01

    Hyperspectral images usually consist of more than one hundred spectral bands, which can provide rich spatial and spectral information. However, applying hyperspectral data is still challenging due to “the curse of dimensionality”. In this context, many techniques that aim to make full use of both spatial and spectral information have been investigated. To preserve geometrical information while using fewer spectral bands, we propose a novel method that combines principal component analysis (PCA), guided image filtering and the random forest (RF) classifier. In detail, PCA is first employed to reduce the dimension of the spectral bands. Second, the guided image filtering technique is introduced to smooth land objects while preserving object edges. Finally, the features are fed into the RF classifier. To illustrate the effectiveness of the method, we carry out experiments on the popular Indian Pines data set, which was collected by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensor. Comparing the proposed method with methods that use only PCA or only the guided image filter, we find that the proposed method performs better.
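
    PCA, the first step above, can be sketched with NumPy as below. The (rows, cols, bands) cube layout, the number of retained components and the use of an SVD of the mean-centred pixel spectra are assumptions for illustration; the abstract does not say how many components were kept, and the guided filtering and RF steps are not shown.

```python
import numpy as np


def pca_reduce(cube, n_components):
    """Project a (rows, cols, bands) hyperspectral cube onto its leading principal components."""
    rows, cols, bands = cube.shape
    pixels = cube.reshape(-1, bands).astype(float)  # one spectrum per row
    centered = pixels - pixels.mean(axis=0)
    # The right singular vectors of the centred data are the principal axes,
    # ordered by decreasing explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[:n_components].T
    return scores.reshape(rows, cols, n_components)
```

    The reduced cube keeps the image grid, so each retained component can be treated as a single-band image for the spatial filtering step that follows.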

  14. Mediastinal lymph node detection and station mapping on chest CT using spatial priors and random forest

    Energy Technology Data Exchange (ETDEWEB)

    Liu, Jiamin; Hoffman, Joanne; Zhao, Jocelyn; Yao, Jianhua; Lu, Le; Kim, Lauren; Turkbey, Evrim B.; Summers, Ronald M., E-mail: rms@nih.gov [Imaging Biomarkers and Computer-aided Diagnosis Laboratory, Radiology and Imaging Sciences, National Institutes of Health Clinical Center, Building 10, Room 1C224 MSC 1182, Bethesda, Maryland 20892-1182 (United States)]

    2016-07-15

    Purpose: To develop an automated system for mediastinal lymph node detection and station mapping for chest CT. Methods: The contextual organs (trachea, lungs, and spine) are first automatically identified to locate the region of interest (ROI) (mediastinum). The authors employ shape features derived from Hessian analysis, local object scale, and circular transformation that are computed per voxel in the ROI. Eight more anatomical structures are simultaneously segmented by multiatlas label fusion. Spatial priors are defined as the relative multidimensional distance vectors corresponding to each structure. Intensity, shape, and spatial prior features are integrated and parsed by a random forest classifier for lymph node detection. The detected candidates are then segmented by a curve evolution process. Texture features are computed on the segmented lymph nodes, and a support vector machine committee is used for final classification. For lymph node station labeling, based on the segmentation results of the above anatomical structures, the textual definitions of the International Association for the Study of Lung Cancer mediastinal lymph node map are converted into patient-specific color-coded CT images, in which a lymph node station can be automatically assigned to each detected node. Results: The chest CT volumes from 70 patients with 316 enlarged mediastinal lymph nodes are used for validation. For lymph node detection, the system achieves 88% sensitivity at eight false positives per patient. For lymph node station labeling, 84.5% of lymph nodes are correctly assigned to their stations. Conclusions: Multiple-channel shape, intensity, and spatial prior features aggregated by a random forest classifier improve mediastinal lymph node detection on chest CT. Using the location information of segmented anatomic structures from the multiatlas formulation enables accurate identification of lymph node stations.
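
    Sensitivity at a fixed number of false positives per patient is one operating point on a free-response ROC (FROC) curve. The Python sketch below computes both quantities at a single classifier threshold; the candidate scores, node IDs and a precomputed candidate-to-node matching are hypothetical inputs assumed for illustration. Sweeping the threshold traces the curve from which a point such as the reported 88% sensitivity at eight false positives per patient would be read.

```python
def sensitivity_and_fp_rate(detections, n_true_nodes, n_patients, threshold):
    """Sensitivity and false positives per patient at one score threshold.

    detections: list of (score, node_id) pairs, where node_id names the true
    lymph node a candidate was matched to, or None for an unmatched candidate.
    Each true node counts at most once, however many candidates hit it.
    """
    found = set()
    false_positives = 0
    for score, node_id in detections:
        if score < threshold:
            continue  # candidate rejected by the classifier
        if node_id is None:
            false_positives += 1
        else:
            found.add(node_id)
    return len(found) / n_true_nodes, false_positives / n_patients
```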

  15. Mediastinal lymph node detection and station mapping on chest CT using spatial priors and random forest

    International Nuclear Information System (INIS)

    Liu, Jiamin; Hoffman, Joanne; Zhao, Jocelyn; Yao, Jianhua; Lu, Le; Kim, Lauren; Turkbey, Evrim B.; Summers, Ronald M.

    2016-01-01

  16. The arboreal component of a dry forest in Northeastern Brazil

    Directory of Open Access Journals (Sweden)

    M. J. N. Rodal

    The dry forests of northeastern Brazil are found near the coastal zone and on low, isolated mountains inland amid semi-arid vegetation. The floristic composition of these dry montane forests, as well as their relationship to humid forests (Atlantic forest sensu stricto) and to the deciduous thorn woodlands (Caatinga sensu stricto) of the Brazilian northeast are not yet well known. This paper sought to determine if the arboreal plants in a dry forest growing on a low mountain in the semi-arid inland region (Serra Negra, 8° 35’ - 8° 38’ S and 38° 02’ - 38° 04’ W) between the municipalities of Floresta and Inajá, state of Pernambuco have the same floristic composition and structure as that seen in other regional forests. In fifty 10 x 20 m plots all live and standing dead trees with trunk measuring > 5 cm diameter at breast height were measured. Floristic similarities between the forest studied and other regional forests were assessed using multivariate analysis. The results demonstrate that the dry forest studied can be classified into two groups that represent two major vegetational transitions: (1) a humid forest/dry forest transition; and (2) a deciduous thorn-woodland/ dry forest transition.

  17. The arboreal component of a dry forest in Northeastern Brazil.

    Science.gov (United States)

    Rodal, M J N; Nascimento, L M

    2006-05-01

    The dry forests of northeastern Brazil are found near the coastal zone and on low, isolated mountains inland amid semi-arid vegetation. The floristic composition of these dry montane forests, as well as their relationship to humid forests (Atlantic forest sensu stricto) and to the deciduous thorn woodlands (Caatinga sensu stricto) of the Brazilian northeast are not yet well known. This paper sought to determine if the arboreal plants in a dry forest growing on a low mountain in the semi-arid inland region (Serra Negra, 8° 35' - 8° 38' S and 38° 02' - 38° 04' W) between the municipalities of Floresta and Inajá, state of Pernambuco have the same floristic composition and structure as that seen in other regional forests. In fifty 10 x 20 m plots all live and standing dead trees with trunk measuring > 5 cm diameter at breast height were measured. Floristic similarities between the forest studied and other regional forests were assessed using multivariate analysis. The results demonstrate that the dry forest studied can be classified into two groups that represent two major vegetational transitions: (1) a humid forest/dry forest transition; and (2) a deciduous thorn-woodland/ dry forest transition.
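
    The abstract does not name the similarity index behind its multivariate analysis. As one common choice, assumed here, the Python sketch below computes the Sørensen index on presence/absence species lists for every pair of sites; a clustering or ordination of the resulting matrix (not shown) would then group the sites. Any site names and species lists passed in are hypothetical.

```python
from itertools import combinations


def sorensen(species_a, species_b):
    """Sørensen similarity 2c / (a + b) for two presence/absence species lists."""
    a, b = set(species_a), set(species_b)
    if not a and not b:
        return 1.0  # two empty lists are treated as identical
    return 2 * len(a & b) / (len(a) + len(b))


def pairwise_similarity(sites):
    """Sørensen similarity for every pair of sites; sites maps a name to its species list."""
    return {(s, t): sorensen(sites[s], sites[t]) for s, t in combinations(sorted(sites), 2)}
```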

  18. Random variables in forest policy: A systematic sensitivity analysis using CGE models

    International Nuclear Information System (INIS)

    Alavalapati, J.R.R.

    1999-01-01

    Computable general equilibrium (CGE) models are extensively used to simulate the economic impacts of forest policies. Parameter values used in these models often play a central role in their outcome. Since econometric studies and best guesses are the main sources of these parameters, there is some uncertainty about their 'true' values. Failure to incorporate this randomness into these models may limit the degree of confidence in the validity of the results. In this study, we conduct a systematic sensitivity analysis (SSA) to assess the economic impacts of: 1) a 1% increase in tax on Canadian lumber and wood products exports to the United States (US), and 2) a 1% decrease in technical change in the lumber and wood products and pulp and paper sectors of the US and Canada. We achieve this by using an aggregated version of the global trade model developed by Hertel (1997) and the automated SSA procedure developed by Arndt and Pearson (1996). The estimated means and standard deviations suggest that certain impacts are more likely than others. For example, an increase in export tax is likely to cause a decrease in Canadian income, while an increase in US income is unlikely. On the other hand, in response to an increase in tax, a decrease in US welfare is likely, while an increase in Canadian welfare is unlikely. It is likely that income and welfare both fall in Canada and the US in response to a decrease in technical change in the lumber and wood products and pulp and paper sectors. 21 refs, 1 fig, 5 tabs
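
    The idea behind a systematic sensitivity analysis is to treat uncertain parameters as random variables, rerun the model for many parameter draws, and summarize the distribution of outcomes. The Python sketch below uses plain Monte Carlo sampling from normal distributions, a simpler stand-in for the quadrature-based Arndt and Pearson procedure; `toy_income_change` is a hypothetical one-line substitute for a full CGE solve, and its elasticity is an invented parameter.

```python
import random
import statistics


def systematic_sensitivity(model, param_dists, n_draws=5000, seed=0):
    """Monte Carlo sensitivity analysis of a model's output.

    param_dists maps each uncertain parameter name to the (mean, sd) of a
    normal distribution. Returns the mean and standard deviation of the
    output and the share of draws in which the output is negative.
    """
    rng = random.Random(seed)
    outcomes = []
    for _ in range(n_draws):
        params = {name: rng.gauss(mu, sd) for name, (mu, sd) in param_dists.items()}
        outcomes.append(model(**params))
    share_negative = sum(1 for o in outcomes if o < 0) / n_draws
    return statistics.fmean(outcomes), statistics.stdev(outcomes), share_negative


def toy_income_change(export_elasticity):
    """Hypothetical stand-in for a CGE solve: income change under a 1% export tax."""
    return -0.01 * export_elasticity
```

    With the hypothetical elasticity drawn from a normal distribution with mean 2.0 and sd 0.5, nearly every draw yields a fall in income. This is the kind of "likely" versus "unlikely" statement the abstract derives from estimated means and standard deviations.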

  19. Idaho forest carbon projections from 2017 to 2117 under forest disturbance and climate change scenarios

    Science.gov (United States)

    Hudak, A. T.; Crookston, N.; Kennedy, R. E.; Domke, G. M.; Fekety, P.; Falkowski, M. J.

    2017-12-01

    Commercial off-the-shelf lidar collections associated with tree measurements in field plots allow aboveground biomass (AGB) estimation with high confidence. Predictive models developed from such datasets are used operationally to map AGB across lidar project areas. We use a random selection of these pixel-level AGB predictions as training data for predicting AGB annually across Idaho and western Montana, primarily from Landsat time series imagery processed through LandTrendr. At both the landscape and regional scales, Random Forests is used for predictive AGB modeling. To project future carbon dynamics, we use Climate-FVS (Forest Vegetation Simulator), the tree growth engine used by foresters to inform forest planning decisions, under either constant or changing climate scenarios. Disturbance data compiled from LandTrendr (Kennedy et al. 2010) using TimeSync (Cohen et al. 2010) in forested lands of Idaho (n=509) and western Montana (n=288) are used to generate probabilities of disturbance (harvest, fire, or insect) by land ownership class (public, private) as well as the magnitude of disturbance. Our verification approach is to aggregate the regional, annual AGB predictions at the county level and compare them to annual county-level AGB summarized independently from the systematic, field-based, annual inventories conducted nationally by the US Forest Inventory and Analysis (FIA) Program. This analysis shows that disturbances on federal lands are generally of high magnitude, whereas disturbances on other lands are more moderate. The probability of disturbance on corporate lands is higher than on other lands, but the magnitudes are generally lower. This is consistent with the much higher prevalence of fire and insects on federal lands, and greater harvest activity on private lands. We found large forest carbon losses in drier southern Idaho, only partially offset by carbon gains in wetter northern Idaho, due to anticipated climate change. Public and

  20. Mapping the Dabus Wetlands, Ethiopia, Using Random Forest Classification of Landsat, PALSAR and Topographic Data

    Directory of Open Access Journals (Sweden)

    Pierre Dubeau

    2017-10-01

    The Dabus Wetland complex in the highlands of Ethiopia lies within the headwaters of the Nile Basin and is home to significant ecological communities and rare or endangered species. Its many interrelated wetland types undergo seasonal and longer-term changes due to weather and climate variations as well as anthropogenic land use such as grazing and burning. Mapping and monitoring of these wetlands have not previously been undertaken, primarily because of their relative isolation and lack of resources. This study investigated the potential of remote sensing based classification for mapping the primary vegetation groups in the Dabus Wetlands using a combination of dry and wet season data, including optical (Landsat spectral bands and derived vegetation and wetness indices), radar (ALOS PALSAR L-band backscatter), and elevation (SRTM-derived DEM and other terrain metrics) data as inputs to the non-parametric Random Forest (RF) classifier. Eight wetland types and three terrestrial/upland classes were mapped using field samples of observed plant community composition and structure groupings as reference information. Various tests were conducted to compare results using different RF input parameters and data types. A combination of multispectral optical, radar and topographic variables provided the best overall classification accuracy, 94.4% and 92.9% for the dry and wet season, respectively. Spectral and topographic data (radar data excluded) performed nearly as well, while accuracies using only radar and topographic data were 82–89%. Relatively homogeneous classes such as Papyrus Swamps, Forested Wetland, and Wet Meadow yielded the highest accuracies, while spatially complex classes such as Emergent Marsh were more difficult to classify accurately. The methods and results presented in this paper can serve as a basis for the development of long-term mapping and monitoring of these and other non-forested wetlands in Ethiopia and other similar environmental settings.

  1. Plausibility of Individual Decisions from Random Forests in Clinical Predictive Modelling Applications.

    Science.gov (United States)

    Hayn, Dieter; Walch, Harald; Stieg, Jörg; Kreiner, Karl; Ebner, Hubert; Schreier, Günter

    2017-01-01

    Machine learning algorithms are a promising approach to help physicians deal with the ever-increasing amount of data collected in healthcare each day. However, interpreting suggestions derived from predictive models can be difficult. The aim of this work was to quantify the influence of a specific feature on an individual decision proposed by a random forest (RF). For each decision tree within the RF, the influence of each feature on a specific decision (FID) was quantified: for each feature, the changes in outcome value due to that feature were summarized along the decision path. Results from all the trees in the RF were statistically merged. The ratio of FID to the respective feature's global importance was calculated (FIDrel). Global feature importance, FID and FIDrel differed significantly, depending on the individual input data. Therefore, we suggest presenting the most important features as determined by both FID and FIDrel whenever the results of an RF are visualized. Feature influence on a specific decision can be quantified in RFs. Further studies will be necessary to evaluate our approach in a real-world scenario.
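
    The decision-path attribution described above can be sketched in Python as follows. The dictionary-based tree format, the attribution of each change in node value to the split feature (in the spirit of path-based feature-contribution methods), and the use of a plain average to merge trees are all assumptions; the abstract says only that results were "statistically merged". `relative_influence` mirrors the FIDrel ratio to global importance.

```python
from collections import defaultdict

# Hypothetical node format: leaves are {"value": v}; internal nodes add
# "feature", "threshold", "left" and "right". "value" is the mean outcome
# of the training samples that reached the node.


def path_contributions(tree, x):
    """Attribute each change in node value along x's decision path to the split feature."""
    contrib = defaultdict(float)
    node = tree
    while "feature" in node:
        goes_left = x[node["feature"]] <= node["threshold"]
        child = node["left"] if goes_left else node["right"]
        contrib[node["feature"]] += child["value"] - node["value"]
        node = child
    return contrib


def forest_contributions(trees, x):
    """Average the per-feature path contributions over all trees in the forest."""
    total = defaultdict(float)
    for tree in trees:
        for feature, change in path_contributions(tree, x).items():
            total[feature] += change / len(trees)
    return dict(total)


def relative_influence(fid, global_importance):
    """FIDrel-style ratio of decision-specific influence to global importance."""
    return {f: v / global_importance[f] for f, v in fid.items() if global_importance.get(f)}
```

    By construction, a tree's root value plus the contributions along the path equals the leaf prediction, so the per-feature values decompose an individual decision rather than the model as a whole.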

  2. Mapping Robinia Pseudoacacia Forest Health Conditions by Using Combined Spectral, Spatial, and Textural Information Extracted from IKONOS Imagery and Random Forest Classifier

    Directory of Open Access Journals (Sweden)

    Hong Wang

    2015-07-01

    The textural and spatial information extracted from very high resolution (VHR) remote sensing imagery provides complementary information for applications in which spectral information alone is not sufficient to identify spectrally similar landscape features. In this study, grey-level co-occurrence matrix (GLCM) textures and a local statistical analysis Getis statistic (Gi), computed from IKONOS multispectral (MS) imagery acquired over the Yellow River Delta in China, were used along with a random forest (RF) classifier to discriminate Robinia pseudoacacia tree health levels. Specifically, eight GLCM texture features (mean, variance, homogeneity, dissimilarity, contrast, entropy, angular second moment, and correlation) were first calculated from the IKONOS NIR band (Band 4) to determine an optimal window size (13 × 13) and an optimal direction (45°). Then, the optimal window size and direction were applied to the three other IKONOS MS bands (blue, green, and red) to calculate the eight GLCM textures. Next, an optimal distance value (5) and an optimal neighborhood rule (Queen’s case) were determined for calculating the four Gi features from the four IKONOS MS bands. Finally, different RF classification results of the three forest health conditions were created: (1) an overall accuracy (OA) of 79.5% produced using the four MS band reflectances only; (2) an OA of 97.1% created with the eight GLCM features calculated from IKONOS Band 4 with the optimal window size of 13 × 13 and direction 45°; (3) an OA of 93.3% created with all 32 GLCM features calculated from the four IKONOS MS bands with a window size of 13 × 13 and direction of 45°; (4) an OA of 94.0% created using the four Gi features calculated from the four IKONOS MS bands with the optimal distance value of 5 and Queen’s neighborhood rule; and (5) an OA of 96.9% created with the combined 16 spectral (four), spatial (four), and textural (eight) features. The most important feature ranked by RF
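
    The GLCM textures above are computed from a co-occurrence matrix of quantized grey levels. The minimal Python sketch below builds a symmetric, normalized GLCM for a single pixel offset and derives five of the eight listed features (mean, variance and correlation are omitted for brevity). Treating offset (-1, 1), the upper-right neighbour, as the 45° direction is a common convention assumed here; in the study each feature would be computed within a moving 13 × 13 window on a quantized IKONOS band, which this sketch does not show.

```python
import math


def glcm(image, levels, dr=-1, dc=1, symmetric=True):
    """Normalized grey-level co-occurrence matrix for pixel offset (dr, dc).

    image is a list of rows of integer grey levels in range(levels).
    (-1, 1) pairs each pixel with its upper-right neighbour (45 degrees).
    """
    p = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                p[i][j] += 1
                if symmetric:
                    p[j][i] += 1  # count the pair in both directions
    total = sum(map(sum, p))
    return [[v / total for v in row] for row in p]


def glcm_features(p):
    """A subset of the classic Haralick-style texture features from a normalized GLCM."""
    cells = [(i, j, v) for i, row in enumerate(p) for j, v in enumerate(row) if v > 0]
    return {
        "contrast": sum(v * (i - j) ** 2 for i, j, v in cells),
        "dissimilarity": sum(v * abs(i - j) for i, j, v in cells),
        "homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
        "asm": sum(v * v for _, _, v in cells),  # angular second moment
        "entropy": -sum(v * math.log(v) for _, _, v in cells),
    }
```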

  3. The sequencing of adverbial clauses of time in academic English: Random forest modelling

    Directory of Open Access Journals (Sweden)

    Abbas Ali Rezaee

    2016-12-01

    Adverbial clauses of time are positioned either before or after their associated main clauses. This study aims to assess the importance of discourse-pragmatic and processing-related constraints on the positioning of adverbial clauses of time in applied linguistics research articles written by authors who are native speakers of English. Previous research has revealed that the ordering is co-determined by various factors from the domains of semantics and discourse-pragmatics (bridging, iconicity, and subordinator) and language processing (deranking, length, and complexity). This research conducts a multifactorial analysis of the motivators of the positioning of adverbial clauses of time in 100 applied linguistics research articles. The study uses a random forest of conditional inference trees as the statistical technique to measure the weights of the aforementioned variables. It was found that iconicity and bridging, which are factors associated with discourse and semantics, are the two most salient predictors of clause ordering.

  4. Effect of sample size on multi-parametric prediction of tissue outcome in acute ischemic stroke using a random forest classifier

    Science.gov (United States)

    Forkert, Nils Daniel; Fiehler, Jens

    2015-03-01

    Tissue outcome prediction in acute ischemic stroke patients is highly relevant for clinical and research purposes. It has been shown that the combined analysis of diffusion and perfusion MRI datasets using high-level machine learning techniques leads to an improved prediction of final infarction compared to single perfusion parameter thresholding. However, most high-level classifiers require prior training, and it is still unclear how many subjects are required for this, which is the focus of this work. Twenty-three MRI datasets of acute stroke patients with known tissue outcome were used in this work. Relative values of diffusion and perfusion parameters as well as the binary tissue outcome were extracted on a voxel-by-voxel level for all patients and used to train a random forest classifier. The number of patients used for training set definition was iteratively and randomly reduced from all 22 other patients to only one other patient. Thus, 22 tissue outcome predictions were generated for each patient using the trained random forest classifiers and compared to the known tissue outcome using the Dice coefficient. Overall, a logarithmic relation between the number of patients used for training set definition and tissue outcome prediction accuracy was found. Quantitatively, a mean Dice coefficient of 0.45 was found for the prediction using a training set consisting of the voxel information from only one other patient, which increases to 0.53 when using all other patients (n=22). Based on extrapolation, 50-100 patients appear to be a reasonable tradeoff between tissue outcome prediction accuracy and the effort required for data acquisition and preparation.
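
    The Dice coefficient and the logarithmic trend above can be sketched as follows. This hedged Python example assumes masks are flattened 0/1 sequences and fits score = a + b·ln(n) by ordinary least squares; the abstract does not describe its fitting or extrapolation procedure, so this form is an assumption, and a and b would have to be estimated from one's own per-n mean Dice values.

```python
import math


def dice(pred, truth):
    """Dice coefficient 2|P ∩ T| / (|P| + |T|) for two flattened binary masks."""
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    size = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    return 2 * inter / size if size else 1.0  # two empty masks agree perfectly


def fit_log_curve(n_train, scores):
    """Ordinary least squares fit of score = a + b * ln(n_train); returns (a, b)."""
    xs = [math.log(n) for n in n_train]
    mx = sum(xs) / len(xs)
    my = sum(scores) / len(scores)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, scores)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b
```

    Evaluating a + b·ln(n) at n beyond the available training sizes gives the kind of extrapolated accuracy used to judge how many patients are worth collecting.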

  5. Why choose Random Forest to predict rare species distribution with few samples in large undersampled areas? Three Asian crane species models provide supporting evidence

    Directory of Open Access Journals (Sweden)

    Chunrong Mi

    2017-01-01

    Species distribution models (SDMs) have become an essential tool in ecology, biogeography, evolution and, more recently, conservation biology. How to generalize species distributions in large undersampled areas, especially with few samples, is a fundamental issue for SDMs. To explore this issue, we used the best available presence records for the Hooded Crane (Grus monacha, n = 33), White-naped Crane (Grus vipio, n = 40), and Black-necked Crane (Grus nigricollis, n = 75) in China as three case studies, employing four powerful and commonly used machine learning algorithms to map the breeding distributions of the three species: TreeNet (Stochastic Gradient Boosting, Boosted Regression Tree Model), Random Forest, CART (Classification and Regression Tree) and Maxent (Maximum Entropy Models). In addition, we developed an ensemble forecast by averaging the predicted probabilities of the four models. Commonly used model performance metrics (area under the ROC curve (AUC) and true skill statistic (TSS)) were employed to evaluate model accuracy. The latest satellite tracking data and compiled literature data were used as two independent testing datasets to confront model predictions. We found that Random Forest demonstrated the best performance for most assessment methods, provided a better model fit to the testing data, and achieved better species range maps for each crane species in undersampled areas. Random Forest has been generally available for more than 20 years and is known to perform extremely well in ecological predictions. However, although its use is increasing, its potential is still widely underused in conservation, (spatial) ecological applications and for inference. Our results show that it informs ecological and biogeographical theories as well as being suitable for conservation applications, specifically when the study area is undersampled. This method helps to save model-selection time and effort, and allows robust and rapid

  6. Why choose Random Forest to predict rare species distribution with few samples in large undersampled areas? Three Asian crane species models provide supporting evidence.

    Science.gov (United States)

    Mi, Chunrong; Huettmann, Falk; Guo, Yumin; Han, Xuesong; Wen, Lijia

    2017-01-01

    Species distribution models (SDMs) have become an essential tool in ecology, biogeography, evolution and, more recently, conservation biology. How to generalize species distributions in large undersampled areas, especially with few samples, is a fundamental issue for SDMs. To explore this issue, we used the best available presence records for the Hooded Crane (Grus monacha, n = 33), White-naped Crane (Grus vipio, n = 40), and Black-necked Crane (Grus nigricollis, n = 75) in China as three case studies, employing four powerful and commonly used machine learning algorithms to map the breeding distributions of the three species: TreeNet (Stochastic Gradient Boosting, Boosted Regression Tree Model), Random Forest, CART (Classification and Regression Tree) and Maxent (Maximum Entropy Models). In addition, we developed an ensemble forecast by averaging the predicted probabilities of the four models. Commonly used model performance metrics (area under the ROC curve (AUC) and true skill statistic (TSS)) were employed to evaluate model accuracy. The latest satellite tracking data and compiled literature data were used as two independent testing datasets to confront model predictions. We found that Random Forest demonstrated the best performance for most assessment methods, provided a better model fit to the testing data, and achieved better species range maps for each crane species in undersampled areas. Random Forest has been generally available for more than 20 years and is known to perform extremely well in ecological predictions. However, although its use is increasing, its potential is still widely underused in conservation, (spatial) ecological applications and for inference. Our results show that it informs ecological and biogeographical theories as well as being suitable for conservation applications, specifically when the study area is undersampled. This method helps to save model-selection time and effort, and allows robust and rapid
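
    The two evaluation metrics and the ensemble step above can be sketched in Python as follows. TSS is sensitivity plus specificity minus one, and AUC is computed here through its rank (Mann-Whitney) interpretation. Presence-only SDMs are typically scored against background or pseudo-absence points; treating those as the negatives is an assumption of this sketch, and any scores passed in would come from fitted models rather than from the study's reported results.

```python
def true_skill_statistic(tp, fn, tn, fp):
    """TSS = sensitivity + specificity - 1, ranging from -1 to +1."""
    return tp / (tp + fn) + tn / (tn + fp) - 1


def auc(scores_pos, scores_neg):
    """ROC AUC as the probability that a presence outscores an absence (ties count half)."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            wins += 1.0 if sp > sn else 0.5 if sp == sn else 0.0
    return wins / (len(scores_pos) * len(scores_neg))


def ensemble_mean(*model_predictions):
    """Average predicted probabilities cell by cell across models (equal weights)."""
    return [sum(vals) / len(vals) for vals in zip(*model_predictions)]
```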

  7. A machine learning calibration model using random forests to improve sensor performance for lower-cost air quality monitoring

    Science.gov (United States)

    Zimmerman, Naomi; Presto, Albert A.; Kumar, Sriniwasa P. N.; Gu, Jason; Hauryliuk, Aliaksei; Robinson, Ellis S.; Robinson, Allen L.; Subramanian, R.

    2018-01-01

    Low-cost sensing strategies hold the promise of denser air quality monitoring networks, which could significantly improve our understanding of personal air pollution exposure. Additionally, low-cost air quality sensors could be deployed to areas where limited monitoring exists. However, low-cost sensors are frequently sensitive to environmental conditions and pollutant cross-sensitivities, which have historically been poorly addressed by laboratory calibrations, limiting their utility for monitoring. In this study, we investigated different calibration models for the Real-time Affordable Multi-Pollutant (RAMP) sensor package, which measures CO, NO2, O3, and CO2. We explored three methods: (1) laboratory univariate linear regression, (2) empirical multiple linear regression, and (3) machine-learning-based calibration models using random forests (RF). Calibration models were developed for 16-19 RAMP monitors (varied by pollutant) using training and testing windows spanning August 2016 through February 2017 in Pittsburgh, PA, US. The random forest models matched (CO) or significantly outperformed (NO2, CO2, O3) the other calibration models, and their accuracy and precision were robust over time for testing windows of up to 16 weeks. Following calibration, average mean absolute error on the testing data set from the random forest models was 38 ppb for CO (14 % relative error), 10 ppm for CO2 (2 % relative error), 3.5 ppb for NO2 (29 % relative error), and 3.4 ppb for O3 (15 % relative error), and Pearson r versus the reference monitors exceeded 0.8 for most units. Model performance is explored in detail, including a quantification of model variable importance, accuracy across different concentration ranges, and performance in a range of monitoring contexts including the National Ambient Air Quality Standards (NAAQS) and the US EPA Air Sensors Guidebook recommendations of minimum data quality for personal exposure measurement. 
A key strength of the RF approach is that
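    The error metrics reported above can be computed as follows. This sketch assumes that "relative error" means the mean absolute error divided by the mean reference concentration, which the abstract does not define; the readings are hypothetical.

```python
import math


def mean_absolute_error(pred, ref):
    """Average absolute difference between calibrated and reference values."""
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(ref)


def relative_mae(pred, ref):
    """MAE as a fraction of the mean reference concentration (assumed definition)."""
    return mean_absolute_error(pred, ref) / (sum(ref) / len(ref))


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical calibrated sensor readings vs. a reference monitor (ppb).
calibrated = [21.0, 25.5, 30.2, 18.9, 27.4]
reference = [20.0, 26.0, 31.0, 18.0, 28.0]
print(mean_absolute_error(calibrated, reference))
print(relative_mae(calibrated, reference))
print(pearson_r(calibrated, reference))
```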

  8. Extreme-value limit of the convolution of exponential and multivariate normal distributions: Link to the Hüsler–Reiß distribution

    KAUST Repository

    Krupskii, Pavel

    2017-11-02

    The multivariate Hüsler–Reiß copula is obtained as a direct extreme-value limit from the convolution of a multivariate normal random vector and an exponential random variable multiplied by a vector of constants. It is shown how the set of Hüsler–Reiß parameters can be mapped to the parameters of this convolution model. Assuming there are no singular components in the Hüsler–Reiß copula, the convolution model leads to exact and approximate simulation methods. An application of simulation is to check if the Hüsler–Reiß copula with different parsimonious dependence structures provides adequate fit to some data consisting of multivariate extremes.

  9. Extreme-value limit of the convolution of exponential and multivariate normal distributions: Link to the Hüsler–Reiß distribution

    KAUST Repository

    Krupskii, Pavel; Joe, Harry; Lee, David; Genton, Marc G.

    2017-01-01

    The multivariate Hüsler–Reiß copula is obtained as a direct extreme-value limit from the convolution of a multivariate normal random vector and an exponential random variable multiplied by a vector of constants. It is shown how the set of Hüsler–Reiß parameters can be mapped to the parameters of this convolution model. Assuming there are no singular components in the Hüsler–Reiß copula, the convolution model leads to exact and approximate simulation methods. An application of simulation is to check if the Hüsler–Reiß copula with different parsimonious dependence structures provides adequate fit to some data consisting of multivariate extremes.
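    A minimal simulation sketch of the convolution model described above, assuming the bivariate case Z = X + E·δ, where X is a standard bivariate normal vector with correlation ρ, E is a unit-rate exponential variable shared by both coordinates, and δ is a vector of positive constants. The values of ρ and δ are hypothetical, and the joint-exceedance ratio is a generic tail-dependence diagnostic rather than the authors' fitting procedure.

```python
import math
import random


def simulate_convolution(n, rho, delta, seed=0):
    """Draw n samples of Z = X + E * delta in two dimensions.

    X is standard bivariate normal with correlation rho, E is a unit-rate
    exponential variable shared by both coordinates, and delta is a
    (hypothetical) pair of positive constants.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        # Correlated normals from two independent ones via the 2x2 Cholesky factor.
        g1, g2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x1 = g1
        x2 = rho * g1 + math.sqrt(1.0 - rho ** 2) * g2
        e = rng.expovariate(1.0)
        samples.append((x1 + e * delta[0], x2 + e * delta[1]))
    return samples


def joint_exceedance_ratio(samples, q):
    """Empirical P(Z1 > u1, Z2 > u2) / P(Z1 > u1), with u_i the q-quantile of margin i."""
    n = len(samples)
    u1 = sorted(s[0] for s in samples)[int(q * n)]
    u2 = sorted(s[1] for s in samples)[int(q * n)]
    both = sum(1 for a, b in samples if a > u1 and b > u2)
    first = sum(1 for a, _ in samples if a > u1)
    return both / first


samples = simulate_convolution(20000, rho=0.5, delta=(1.0, 1.5), seed=1)
print(joint_exceedance_ratio(samples, 0.95))
```

The shared exponential term is what keeps the two coordinates dependent in their joint upper tail.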

  10. Fragmentation of random trees

    International Nuclear Information System (INIS)

    Kalay, Z; Ben-Naim, E

    2015-01-01

    We study fragmentation of a random recursive tree into a forest by repeated removal of nodes. The initial tree consists of $N$ nodes and is generated by sequential addition of nodes, with each new node attaching to a randomly selected existing node. As nodes are removed from the tree, one at a time, the tree dissolves into an ensemble of separate trees, namely, a forest. We study statistical properties of trees and nodes in this heterogeneous forest, and find that the fraction of remaining nodes $m$ characterizes the system in the limit $N \to \infty$. We obtain analytically the size density $\phi_s$ of trees of size $s$. The size density has a power-law tail $\phi_s \sim s^{-\alpha}$ with exponent $\alpha = 1 + 1/m$. Therefore, the tail becomes steeper as further nodes are removed, and the fragmentation process is unusual in that the exponent $\alpha$ increases continuously with time. We also extend our analysis to the case where nodes are added as well as removed, and obtain the asymptotic size density for growing trees.
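    The fragmentation process can be simulated directly. This sketch assumes removed nodes are chosen uniformly at random, so that after removing a fraction 1 − m of the N nodes the survivors form a uniformly random subset; surviving nodes stay connected only through surviving parent links. Counting tree sizes over many runs gives an empirical estimate of $\phi_s$.

```python
import random
from collections import Counter


def fragment_random_recursive_tree(n, m, seed=0):
    """Grow a random recursive tree on n nodes, keep a random fraction m of the nodes,
    and return the sizes of the trees in the resulting forest (largest first)."""
    rng = random.Random(seed)
    # Node 0 is the root; node k attaches to a uniformly chosen earlier node.
    parent = [-1] + [rng.randrange(k) for k in range(1, n)]
    kept = set(rng.sample(range(n), round(m * n)))

    # Union-find over kept nodes: an edge survives only if both ends are kept.
    root = {v: v for v in kept}

    def find(v):
        while root[v] != v:
            root[v] = root[root[v]]  # path halving
            v = root[v]
        return v

    for child in kept:
        p = parent[child]
        if p in kept:
            root[find(child)] = find(p)

    return sorted(Counter(find(v) for v in kept).values(), reverse=True)


sizes = fragment_random_recursive_tree(n=10000, m=0.5, seed=1)
print(len(sizes), sizes[:5])
```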

  11. An Ensemble Model for Co-Seismic Landslide Susceptibility Using GIS and Random Forest Method

    Directory of Open Access Journals (Sweden)

    Suchita Shrestha

    2017-11-01

    The Mw 7.8 Gorkha earthquake of 25 April 2015 triggered thousands of landslides in the central part of the Nepal Himalayas. The main goal of this study was to generate an ensemble-based map of co-seismic landslide susceptibility in Sindhupalchowk District using model comparison and combination strands. A total of 2194 co-seismic landslides were identified and randomly split into 1536 (~70%) for training the models and the remaining 658 (~30%) for validating them. Frequency ratio, evidential belief function, and weight of evidence methods were applied and compared using 11 different causative factors (peak ground acceleration, epicenter proximity, fault proximity, geology, elevation, slope, plan curvature, internal relief, drainage proximity, stream power index, and topographic wetness index) to prepare the landslide susceptibility map. A random forest ensemble was then used to overcome the various prediction limitations of the individual models. The success rates and prediction capabilities were critically compared using the area under the curve (AUC) of the receiver operating characteristic (ROC) curve. By synthesizing the results of the various models into a single score, the ensemble model improved accuracy and provided considerably more realistic prediction capacity (91%) than the frequency ratio (81.2%), evidential belief function (83.5%), and weight of evidence (80.1%) methods.
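    The frequency ratio method compared above is commonly defined, for each class of a causative factor, as the share of landslide cells falling in that class divided by the share of map area in that class. A minimal sketch under that definition, with a hypothetical ten-cell map:

```python
def frequency_ratio(class_of_cell, landslide_cells):
    """Frequency ratio per factor class.

    class_of_cell: the factor class (e.g. a slope bin) of every map cell.
    landslide_cells: set of cell indices that contain a mapped landslide.
    FR > 1 means landslides are over-represented in that class.
    """
    total_cells = len(class_of_cell)
    total_slides = len(landslide_cells)
    area, slides = {}, {}
    for idx, cls in enumerate(class_of_cell):
        area[cls] = area.get(cls, 0) + 1
        if idx in landslide_cells:
            slides[cls] = slides.get(cls, 0) + 1
    return {
        cls: (slides.get(cls, 0) / total_slides) / (area[cls] / total_cells)
        for cls in area
    }


# Hypothetical 10-cell map with three slope classes and three landslide cells.
slope_class = ["gentle"] * 5 + ["moderate"] * 3 + ["steep"] * 2
landslides = {6, 8, 9}
print(frequency_ratio(slope_class, landslides))
```

Summing the FR values of a cell's classes across all causative factors is one common way to turn them into a susceptibility index.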

  12. A multivariate nonlinear mixed effects method for analyzing energy partitioning in growing pigs

    DEFF Research Database (Denmark)

    Strathe, Anders Bjerring; Danfær, Allan Christian; Chwalibog, André

    2010-01-01

    Simultaneous equations have become increasingly popular for describing the effects of nutrition on the utilization of ME for protein (PD) and lipid deposition (LD) in animals. The study developed a multivariate nonlinear mixed effects (MNLME) framework and compared it with an alternative method for estimating parameters in simultaneous equations that described energy metabolism in growing pigs, and then proposed new PD and LD equations. The general statistical framework was implemented in the NLMIXED procedure in SAS. Alternative PD and LD equations were also developed, which assumed ... to the multivariate nonlinear regression model because the MNLME method accounted for correlated errors associated with PD and LD measurements and could also include the random effect of animal. It is recommended that multivariate models used to quantify energy metabolism in growing pigs should account for animal ...

  13. Using small area estimation and Lidar-derived variables for multivariate prediction of forest attributes

    Science.gov (United States)

    F. Mauro; Vicente Monleon; H. Temesgen

    2015-01-01

    Small area estimation (SAE) techniques have been successfully applied in forest inventories to provide reliable estimates for domains where the sample size is small (i.e. small areas). Previous studies have explored the use of either Area Level or Unit Level Empirical Best Linear Unbiased Predictors (EBLUPs) in a univariate framework, modeling each variable of interest...
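    For reference, the univariate area-level predictor that these studies build on (the Fay–Herriot model) shrinks each direct estimate toward a regression-synthetic estimate, with more shrinkage where the direct estimate is noisier. The sketch below assumes the sampling variances, the between-area variance and the synthetic predictions are already estimated; plugging estimates into this formula is what makes it an empirical BLUP. All numbers are hypothetical, and the multivariate extension studied in the paper is not shown.

```python
def fay_herriot_predictor(direct, sampling_var, synthetic, model_var):
    """Area-level composite predictor for each small area d:

        theta_d = gamma_d * direct_d + (1 - gamma_d) * synthetic_d
        gamma_d = model_var / (model_var + sampling_var_d)

    `synthetic` holds regression predictions x_d' beta (e.g. from Lidar-derived
    covariates) and `model_var` is the between-area variance; both are assumed
    to have been estimated beforehand.
    """
    out = []
    for y, psi, xb in zip(direct, sampling_var, synthetic):
        gamma = model_var / (model_var + psi)
        out.append(gamma * y + (1.0 - gamma) * xb)
    return out


# Hypothetical: three areas whose direct estimates differ widely in precision.
estimates = fay_herriot_predictor(
    direct=[250.0, 180.0, 310.0],
    sampling_var=[100.0, 900.0, 25.0],
    synthetic=[230.0, 200.0, 300.0],
    model_var=100.0,
)
print(estimates)
```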

  14. Non-Linguistic Vocal Event Detection Using Online Random Forest

    DEFF Research Database (Denmark)

    Abou-Zleikha, Mohamed; Tan, Zheng-Hua; Christensen, Mads Græsbøll

    2014-01-01

    ... areas such as object detection, face recognition, and audio event detection. This paper proposes to use the online random forest technique for detecting laughter and filler and for analyzing the importance of various features for non-linguistic vocal event classification through permutation. The results ... show that, according to the area under the curve (AUC) measure, the online random forest achieved 88.1% compared to 82.9% obtained by the baseline support vector machines for laughter classification, and 86.8% compared to 83.6% for filler classification.
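    Permutation-based feature importance, as mentioned above, measures how much a model's accuracy drops when one feature's values are shuffled across samples. The sketch below is model-agnostic; `threshold_model` is a hypothetical stand-in for a trained online random forest, and the data are synthetic. The paper may score importance with AUC rather than accuracy.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(labels)


def permutation_importance(model, rows, labels, n_repeats=10, seed=0):
    """Mean drop in accuracy when one feature column is shuffled, per feature."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    n_features = len(rows[0])
    importances = []
    for j in range(n_features):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(model, permuted, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


rng = random.Random(1)
rows = [[rng.random(), rng.random()] for _ in range(200)]
labels = [int(r[0] > 0.5) for r in rows]


def threshold_model(row):
    # Hypothetical stand-in for a trained forest: it only looks at feature 0.
    return int(row[0] > 0.5)


print(permutation_importance(threshold_model, rows, labels))
```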

  15. Microbiome Data Accurately Predicts the Postmortem Interval Using Random Forest Regression Models

    Directory of Open Access Journals (Sweden)

    Aeriel Belk

    2018-02-01

    Death investigations often include an effort to establish the postmortem interval (PMI) in cases in which the time of death is uncertain. The postmortem interval can lead to the identification of the deceased and the validation of witness statements and suspect alibis. Recent research has demonstrated that microbes provide an accurate clock that starts at death and relies on ecological change in the microbial communities that normally inhabit a body and its surrounding environment. Here, we explore how to build the most robust Random Forest regression models for prediction of PMI by testing models built on different sample types (gravesoil, skin of the torso, skin of the head), gene markers (16S ribosomal RNA (rRNA), 18S rRNA, internal transcribed spacer regions (ITS)), and taxonomic levels (sequence variants, species, genus, etc.). We also tested whether particular suites of indicator microbes were informative across different datasets. Generally, results indicate that the most accurate models for predicting PMI were built using gravesoil and skin data using the 16S rRNA genetic marker at the taxonomic level of phyla. Additionally, several phyla consistently contributed highly to model accuracy and may be candidate indicators of PMI.
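    Building models "at the taxonomic level of phyla" requires collapsing sequence-variant counts into phylum totals before they are used as Random Forest features. A minimal sketch, assuming per-sample read counts and a lineage lookup table; the variant IDs, counts and taxonomy are hypothetical:

```python
from collections import defaultdict


def collapse_to_level(counts, taxonomy, level):
    """Sum sequence-variant counts into taxa at one rank, then convert to relative abundance.

    counts:   {variant_id: read count} for one sample
    taxonomy: {variant_id: {"phylum": ..., "genus": ...}} lineage lookup
    """
    totals = defaultdict(int)
    for variant, n in counts.items():
        totals[taxonomy[variant][level]] += n
    depth = sum(totals.values())
    return {taxon: n / depth for taxon, n in totals.items()}


# Hypothetical sample with four sequence variants.
taxonomy = {
    "sv1": {"phylum": "Firmicutes", "genus": "Clostridium"},
    "sv2": {"phylum": "Firmicutes", "genus": "Bacillus"},
    "sv3": {"phylum": "Proteobacteria", "genus": "Pseudomonas"},
    "sv4": {"phylum": "Actinobacteria", "genus": "Corynebacterium"},
}
sample = {"sv1": 40, "sv2": 20, "sv3": 30, "sv4": 10}
print(collapse_to_level(sample, taxonomy, "phylum"))
```

Each sample's phylum-level profile then becomes one feature row, with the known PMI as the regression target.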

  16. Species-specific audio detection: a comparison of three template-based detection algorithms using random forests

    Directory of Open Access Journals (Sweden)

    Carlos J. Corrada Bravo

    2017-04-01

    We developed a web-based, cloud-hosted system that allows users to archive, listen to, visualize, and annotate recordings. The system also provides tools to convert these annotations into datasets that can be used to train a computer to detect the presence or absence of a species. The algorithm used by the system was selected after comparing the accuracy and efficiency of three variants of a template-based detection algorithm. The algorithm computes a similarity vector by comparing a template of a species call with time increments across the spectrogram. Statistical features are extracted from this vector and used as input for a Random Forest classifier that predicts the presence or absence of the species in the recording. The fastest algorithm variant had the highest average accuracy and specificity; therefore, it was implemented in the ARBIMON web-based system.
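    The similarity-vector step described above can be sketched as a sliding comparison along the time axis. This assumes Pearson correlation between the template and each spectrogram window as the similarity measure and four summary statistics as features; the ARBIMON variants may use different measures and features, and the toy spectrogram is hypothetical.

```python
import math


def _pearson(a, b):
    """Pearson correlation; returns 0.0 when either input has zero variance."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def similarity_vector(spectrogram, template):
    """Slide the template along the time axis. Both inputs are lists of frames,
    one list of frequency-bin magnitudes per time step."""
    width = len(template)
    flat_t = [v for frame in template for v in frame]
    sims = []
    for start in range(len(spectrogram) - width + 1):
        patch = [v for frame in spectrogram[start:start + width] for v in frame]
        sims.append(_pearson(patch, flat_t))
    return sims


def summary_features(sims):
    """Simple statistics of the similarity vector, to be fed to a classifier."""
    n = len(sims)
    mean = sum(sims) / n
    std = math.sqrt(sum((s - mean) ** 2 for s in sims) / n)
    return [max(sims), min(sims), mean, std]


# Hypothetical 3-bin spectrogram with a copy of the call starting at frame 3.
template = [[0, 1, 0], [0, 2, 0]]
spec = [[1, 0, 0], [0, 0, 1], [1, 1, 1], [0, 1, 0], [0, 2, 0], [1, 0, 1], [0, 0, 0]]
sims = similarity_vector(spec, template)
print(summary_features(sims))
```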

  17. Temporal changes in randomness of bird communities across Central Europe.

    Science.gov (United States)

    Renner, Swen C; Gossner, Martin M; Kahl, Tiemo; Kalko, Elisabeth K V; Weisser, Wolfgang W; Fischer, Markus; Allan, Eric

    2014-01-01

    Many studies have examined whether communities are structured by random or deterministic processes, and both are likely to play a role, but relatively few studies have attempted to quantify the degree of randomness in species composition. We quantified, for the first time, the degree of randomness in forest bird communities based on an analysis of spatial autocorrelation in three regions of Germany. The compositional dissimilarity between pairs of forest patches was regressed against the distance between them. We then calculated the y-intercept of the curve, i.e. the 'nugget', which represents the compositional dissimilarity at zero spatial distance. We therefore assume, following similar work on plant communities, that this represents the degree of randomness in species composition. We then analysed how the degree of randomness in community composition varied over time and with forest management intensity, which we expected to reduce the importance of random processes by increasing the strength of environmental drivers. We found that a high portion of the bird community composition could be explained by chance (overall mean of 0.63), implying that most of the variation in local bird community composition is driven by stochastic processes. Forest management intensity did not consistently affect the mean degree of randomness in community composition, perhaps because the bird communities were relatively insensitive to management intensity. We found a high temporal variation in the degree of randomness, which may indicate temporal variation in assembly processes and in the importance of key environmental drivers. We conclude that the degree of randomness in community composition should be considered in bird community studies, and the high values we find may indicate that bird community composition is relatively hard to predict at the regional scale.
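    The nugget estimate can be sketched as an ordinary least-squares line of pairwise dissimilarity against pairwise distance, whose intercept is the dissimilarity at zero distance. This assumes Jaccard dissimilarity and a straight-line fit; the study's dissimilarity index and curve shape may differ, and the species sets and coordinates are hypothetical.

```python
def jaccard_dissimilarity(a, b):
    """1 - |A & B| / |A | B| for two species sets."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def nugget(patches, coords):
    """Regress pairwise compositional dissimilarity on pairwise distance (straight-line
    ordinary least squares) and return the intercept, i.e. dissimilarity at zero distance."""
    xs, ys = [], []
    for i in range(len(patches)):
        for j in range(i + 1, len(patches)):
            (x1, y1), (x2, y2) = coords[i], coords[j]
            xs.append(((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5)
            ys.append(jaccard_dissimilarity(patches[i], patches[j]))
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - slope * mx


# Hypothetical bird species lists for three forest patches along a transect (km).
patches = [{"a", "b", "c", "d"}, {"a", "b", "c", "e"}, {"a", "b", "e", "f"}]
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
print(nugget(patches, coords))
```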

  18. Temporal changes in randomness of bird communities across Central Europe.

    Directory of Open Access Journals (Sweden)

    Swen C Renner

    Many studies have examined whether communities are structured by random or deterministic processes, and both are likely to play a role, but relatively few studies have attempted to quantify the degree of randomness in species composition. We quantified, for the first time, the degree of randomness in forest bird communities based on an analysis of spatial autocorrelation in three regions of Germany. The compositional dissimilarity between pairs of forest patches was regressed against the distance between them. We then calculated the y-intercept of the curve, i.e. the 'nugget', which represents the compositional dissimilarity at zero spatial distance. We therefore assume, following similar work on plant communities, that this represents the degree of randomness in species composition. We then analysed how the degree of randomness in community composition varied over time and with forest management intensity, which we expected to reduce the importance of random processes by increasing the strength of environmental drivers. We found that a high portion of the bird community composition could be explained by chance (overall mean of 0.63), implying that most of the variation in local bird community composition is driven by stochastic processes. Forest management intensity did not consistently affect the mean degree of randomness in community composition, perhaps because the bird communities were relatively insensitive to management intensity. We found a high temporal variation in the degree of randomness, which may indicate temporal variation in assembly processes and in the importance of key environmental drivers. We conclude that the degree of randomness in community composition should be considered in bird community studies, and the high values we find may indicate that bird community composition is relatively hard to predict at the regional scale.

  19. Information system of forest growth and productivity by site quality type and elements of forest

    Science.gov (United States)

    Khlyustov, V.

    2012-04-01

    V.K. Khlustov, Head of the Forestry Department, Russian State Agrarian University named after K.A. Timiryazev; Doctor of Agricultural Sciences, Professor. The efficiency of forest management can be improved substantially by developing and introducing fundamentally new models of forest growth and productivity dynamics based on regionalized, site-specific parameters. Therefore, an innovative information system was developed. It describes the current state and gives a forecast for forest stand parameters (growth, structure, and commercial and biological productivity) depending on the site quality type. In contrast to existing yield tables, the new system has an environmental basis: the site quality type. The information system contains a set of multivariate statistical models and can work at the level of individual trees or at the stand level. The system provides graphical visualization as well as export of the emulation results. The system is able to calculate a detailed description of any forest stand based on five initial indicators: site quality type, site index, stocking, composition, and tree age by elements of the forest. The model run outputs the following parameters: average diameter and height, top height, number of trees, basal area, growing stock (total, commercial with distribution by size, firewood and residuals), and live biomass (stem, bark, branches, foliage). The system also provides the distribution of these forest stand parameters by tree diameter classes. To predict future forest stand dynamics, the system additionally requires only the time period; the full set of forest parameters mentioned above will then be provided. The most conservative initial parameters (site quality type and site index) can be kept in the form of georeferenced polygons. In this case the system would need only 3 dynamic initial parameters (stocking, composition and age) to

  20. Global patterns and predictions of seafloor biomass using random forests.

    Directory of Open Access Journals (Sweden)

    Chih-Lin Wei

    A comprehensive seafloor biomass and abundance database has been constructed from 24 oceanographic institutions worldwide within the Census of Marine Life (CoML) field projects. The machine-learning algorithm Random Forests was employed to model and predict seafloor standing stocks from surface primary production, water-column integrated and export particulate organic matter (POM), seafloor relief, and bottom water properties. The predictive models explain 63% to 88% of stock variance among the major size groups. Individual and composite maps of predicted global seafloor biomass and abundance are generated for bacteria, meiofauna, macrofauna, and megafauna (invertebrates and fishes). Patterns of benthic standing stocks were positive functions of surface primary production and delivery of the particulate organic carbon (POC) flux to the seafloor. At a regional scale, the census maps illustrate that integrated biomass is highest at the poles, on continental margins associated with coastal upwelling, and in broad zones associated with equatorial divergence. The lowest values are consistently encountered on the central abyssal plains of major ocean basins. The shift of biomass dominance groups with depth is shown to be affected by the decrease in average body size rather than abundance, presumably due to a decrease in the quantity and quality of food supply. This biomass census and associated maps are vital components of mechanistic deep-sea food web models and global carbon cycling, and as such provide fundamental information that can be incorporated into evidence-based management.
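    To make the modelling step concrete, here is a deliberately simplified random forest regressor in pure Python: each tree is a single-split stump grown on a bootstrap sample using a random subset of `mtry` predictors, and the forest prediction is the average over trees. "Stock variance explained" is computed as 1 − SSE/SST on the training data. Production random forests grow deep trees and usually report out-of-bag scores; the data here are synthetic.

```python
import random


def fit_stump(X, y, features):
    """Best single split (minimum squared error) over the given feature indices."""
    best = None
    for j in features:
        for t in sorted(set(row[j] for row in X)):
            left = [v for row, v in zip(X, y) if row[j] <= t]
            right = [v for row, v in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    if best is None:  # no valid split: predict the mean everywhere
        m = sum(y) / len(y)
        return lambda row: m
    _, j, t, ml, mr = best
    return lambda row: ml if row[j] <= t else mr


def fit_forest(X, y, n_trees=50, mtry=1, seed=0):
    """Bagged stumps with a random feature subset per tree (a toy random forest)."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        feats = rng.sample(range(p), mtry)
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return lambda row: sum(tree(row) for tree in trees) / len(trees)


def variance_explained(pred, obs):
    """1 - SSE / SST: share of the variance in obs reproduced by pred."""
    m = sum(obs) / len(obs)
    sse = sum((o - q) ** 2 for o, q in zip(obs, pred))
    sst = sum((o - m) ** 2 for o in obs)
    return 1.0 - sse / sst


# Synthetic example: the response depends on the first predictor only.
rng = random.Random(3)
X = [[rng.random(), rng.random()] for _ in range(60)]
y = [10.0 if row[0] > 0.5 else 0.0 for row in X]
forest = fit_forest(X, y, n_trees=30, mtry=2)
print(variance_explained([forest(row) for row in X], y))
```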

  1. Multivariate analysis with LISREL

    CERN Document Server

    Jöreskog, Karl G; Y Wallentin, Fan

    2016-01-01

    This book traces the theory and methodology of multivariate statistical analysis and shows how it can be conducted in practice using the LISREL computer program. It presents not only the typical uses of LISREL, such as confirmatory factor analysis and structural equation models, but also several other multivariate analysis topics, including regression (univariate, multivariate, censored, logistic, and probit), generalized linear models, multilevel analysis, and principal component analysis. It provides numerous examples from several disciplines and discusses and interprets the results, illustrated with sections of output from the LISREL program, in the context of the example. The book is intended for masters and PhD students and researchers in the social, behavioral, economic and many other sciences who require a basic understanding of multivariate statistical theory and methods for their analysis of multivariate data. It can also be used as a textbook on various topics of multivariate statistical analysis.

  2. Forest canopy BRDF simulation using Monte Carlo method

    NARCIS (Netherlands)

    Huang, J.; Wu, B.; Zeng, Y.; Tian, Y.

    2006-01-01

    The Monte Carlo method is a stochastic statistical method that has been widely used to simulate the Bidirectional Reflectance Distribution Function (BRDF) of vegetation canopies in the field of visible remote sensing. The random interaction process between photons and the forest canopy was designed using the Monte Carlo method.

  3. A data based random number generator for a multivariate distribution (using stochastic interpolation)

    Science.gov (United States)

    Thompson, J. R.; Taylor, M. S.

    1982-01-01

    Let $X$ be a $k$-dimensional random variable serving as input for a system with output $Y$ (not necessarily of dimension $k$). Given $X$, an outcome $Y$ or a distribution of outcomes $G(Y \mid X)$ may be obtained either explicitly or implicitly. The situation is considered in which there is a real-world data set $\{X_j\}_{j=1}^{n}$ and a means of simulating an outcome $Y$. A method for empirical random number generation based on the sample of observations of the random variable $X$ without estimating the underlying density is discussed.
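    One way to realize such a generator is the resampling scheme commonly attributed to Taylor and Thompson: pick a data point at random, take its m nearest neighbours, and return a random combination of their deviations from the local mean. The sketch below follows that reading, with uniform weights of mean 1/m and variance (m − 1)/m²; the details of this 1982 report may differ, and m = 5 is an arbitrary choice.

```python
import random


def stochastic_interpolation(data, n_draws, m=5, seed=0):
    """Generate pseudo-observations from a k-dimensional sample without a density estimate.

    For each draw: pick a data point at random, take its m nearest neighbours (itself
    included), and return xbar + sum_i u_i * (x_i - xbar), where the weights u_i are
    uniform with mean 1/m and variance (m - 1)/m**2.
    """
    rng = random.Random(seed)
    k = len(data[0])
    half_width = (3.0 * (m - 1)) ** 0.5 / m  # uniform half-width giving variance (m-1)/m^2
    draws = []
    for _ in range(n_draws):
        centre = rng.choice(data)
        neighbours = sorted(
            data, key=lambda x: sum((a - b) ** 2 for a, b in zip(x, centre))
        )[:m]
        xbar = [sum(x[d] for x in neighbours) / m for d in range(k)]
        u = [rng.uniform(1.0 / m - half_width, 1.0 / m + half_width) for _ in range(m)]
        draws.append([
            xbar[d] + sum(ui * (x[d] - xbar[d]) for ui, x in zip(u, neighbours))
            for d in range(k)
        ])
    return draws


rng = random.Random(4)
data = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(200)]
synthetic = stochastic_interpolation(data, n_draws=1000, m=5)
print(synthetic[:3])
```

With m = 1 the scheme reduces to resampling the observed points, so m controls how far the generator smooths beyond the data.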

  4. Ship Detection Based on Multiple Features in Random Forest Model for Hyperspectral Images

    Science.gov (United States)

    Li, N.; Ding, L.; Zhao, H.; Shi, J.; Wang, D.; Gong, X.

    2018-04-01

    A novel ship detection method that aims to make full use of both the spatial and spectral information in hyperspectral images is proposed. First, a band with a high signal-to-noise ratio in the near-infrared or short-wave infrared range is used to segment land and sea with Otsu's threshold segmentation method. Second, multiple features, including spectral and texture features, are extracted from the hyperspectral images: principal component analysis (PCA) is used to extract spectral features, and the Grey Level Co-occurrence Matrix (GLCM) is used to extract texture features. Finally, a Random Forest (RF) model is introduced to detect ships based on the extracted features. To illustrate the effectiveness of the method, we carry out experiments on EO-1 data, comparing a single feature with different combinations of multiple features. Compared with the traditional single-feature method and a Support Vector Machine (SVM) model, the proposed method stably detects ships against complex backgrounds and effectively improves ship detection accuracy.
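    The land/sea segmentation step relies on Otsu's method, which picks the threshold that maximizes the between-class variance of the pixel histogram. A minimal histogram-based sketch follows; the reflectance values are hypothetical, and treating values at or below the threshold as water assumes water is darker in the chosen band.

```python
def otsu_threshold(values, n_bins=256):
    """Otsu's method: choose the histogram threshold that maximises between-class variance."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0
    hist = [0] * n_bins
    for v in values:
        hist[min(int((v - lo) / width), n_bins - 1)] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w_bg = sum_bg = 0.0
    best_var, best_bin = -1.0, 0
    for i, h in enumerate(hist):
        w_bg += h
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        sum_bg += i * h
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_var, best_bin = between, i
    # Upper edge of the best bin, in the original units.
    return lo + (best_bin + 1) * width


# Hypothetical near-infrared reflectances: dark water pixels and bright land pixels.
water = [0.05 + 0.001 * i for i in range(100)]
land = [0.70 + 0.001 * i for i in range(100)]
t = otsu_threshold(water + land)
print(t)
```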

  5. Random Forest Based Coarse Locating and KPCA Feature Extraction for Indoor Positioning System

    Directory of Open Access Journals (Sweden)

    Yun Mo

    2014-01-01

    With the rapid development of mobile terminals, positioning techniques based on fingerprinting have drawn attention from many researchers and even world-famous companies. To overcome some shortcomings of existing fingerprinting systems and further improve system performance, we first propose a coarse positioning method based on random forest, which is able to customize several subregions and classify a test point into a region with outstanding accuracy compared with some typical clustering algorithms. Second, through mathematical analysis, the proposed kernel principal component analysis algorithm is applied to radio map processing, which may provide better robustness and adaptability compared with linear feature extraction methods and manifold learning techniques. We build both a theoretical model and a real environment to verify feasibility and reliability. The experimental results show that the proposed indoor positioning system could achieve 99% coarse locating accuracy and improve fine positioning accuracy by 15% on average in a strongly noisy environment compared with some typical fingerprinting-based methods.

  6. RSARF: Prediction of residue solvent accessibility from protein sequence using random forest method

    KAUST Repository

    Ganesan, Pugalenthi; Kandaswamy, Krishna Kumar Umar; Chou, Kuochen; Vivekanandan, Saravanan; Kolatkar, Prasanna R.

    2012-01-01

    Prediction of protein structure from its amino acid sequence is still a challenging problem. A complete physicochemical understanding of protein folding is essential for accurate structure prediction. Knowledge of residue solvent accessibility gives useful insights into protein structure prediction and function prediction. In this work, we propose a random forest method, RSARF, to predict residue accessible surface area from protein sequence information. Training and testing were performed using 120 proteins containing 22006 residues. For each residue, the buried or exposed state was computed using five thresholds (0%, 5%, 10%, 25%, and 50%). The prediction accuracies for the 0%, 5%, 10%, 25%, and 50% thresholds are 72.9%, 78.25%, 78.12%, 77.57% and 72.07%, respectively. Further, comparison of RSARF with other methods using a benchmark dataset containing 20 proteins shows that our approach is useful for predicting residue solvent accessibility from protein sequence without using structural information. The RSARF program, datasets and supplementary data are available at http://caps.ncbs.res.in/download/pugal/RSARF/.
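    The buried/exposed labelling at several thresholds can be sketched as below. This assumes relative solvent accessibility (RSA) is already available in percent and that a residue is exposed when its RSA is strictly above the cutoff; the paper's exact boundary convention is not stated in the abstract, and the RSA values are hypothetical.

```python
def burial_labels(rsa_percent, thresholds=(0, 5, 10, 25, 50)):
    """Label each residue buried ("B") or exposed ("E") at several RSA cutoffs.

    rsa_percent: relative solvent accessibility of each residue, in percent.
    Assumed convention: exposed if RSA is strictly above the cutoff.
    """
    return {
        t: ["E" if r > t else "B" for r in rsa_percent]
        for t in thresholds
    }


def label_accuracy(predicted, observed):
    """Fraction of residues whose predicted state matches the observed state."""
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)


rsa = [0.0, 3.2, 12.5, 48.0, 71.3]  # hypothetical per-residue RSA values
labels = burial_labels(rsa)
print(labels[0])
print(labels[25])
```

Accuracy at each threshold is then the per-residue agreement between predicted and observed labels for that cutoff.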

  7. Multivariate η-μ fading distribution with arbitrary correlation model

    Science.gov (United States)

    Ghareeb, Ibrahim; Atiani, Amani

    2018-03-01

    An extensive analysis of the multivariate η-μ distribution with arbitrary correlation is presented, in which novel analytical expressions for the multivariate probability density function, cumulative distribution function and moment generating function (MGF) of arbitrarily correlated and not necessarily identically distributed η-μ power random variables are derived. This paper also provides an exact-form expression for the MGF of the instantaneous signal-to-noise ratio at the combiner output in a diversity reception system with maximal-ratio combining and post-detection equal-gain combining operating in slow, frequency-nonselective, arbitrarily correlated, not necessarily identically distributed η-μ fading channels. The average bit error probability of differentially detected quadrature phase shift keying signals with a post-detection diversity reception system over arbitrarily correlated η-μ fading channels with not necessarily identical fading parameters is determined using the MGF-based approach. The effect of fading correlation between diversity branches, fading severity parameters and diversity level is studied.

  8. The Impact of Forest Density on Forest Height Inversion Modeling from Polarimetric InSAR Data

    Directory of Open Access Journals (Sweden)

    Changcheng Wang

    2016-03-01

    Forest height is of great significance in analyzing the carbon cycle on a global or local scale and in accurately reconstructing the terrain underlying forests. Major algorithms for estimating forest height, such as the three-stage inversion process, depend on the random-volume-over-ground (RVoG) model. However, the RVoG model is characterized by many parameters, which influence its applicability in forest height retrieval. Forest density, as an important biophysical parameter, is one of the main influencing factors. However, its influence on the RVoG model has been ignored in related research. In this paper, we study the applicability of the RVoG model in forest height retrieval with different forest densities, using simulated and real Polarimetric Interferometric SAR data. P-band ESAR datasets of the European Space Agency (ESA) BioSAR 2008 campaign were selected for experiments. The test site was located in the Krycklan River catchment in Northern Sweden. The experimental results show that forest density clearly affects the inversion accuracy of forest height and ground phase. For the four selected forest stands, as the density increases from 633 to 1827 stems/ha, the RMSEs of inversion decrease from 4.6 m to 3.1 m. The RVoG model is not well suited to forest height retrieval, especially in sparsely vegetated areas. We conclude that forest stand density is positively related to the estimation accuracy of the ground phase, but negatively correlated with the ground-to-volume scattering ratio.
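    For context, the volume coherence of the standard RVoG model for an exponential extinction profile has a closed form, and the observed coherence adds a ground contribution weighted by the ground-to-volume ratio μ. The sketch below implements the textbook form of both expressions as we understand it; sign and normalization conventions vary between papers, and the parameter values are hypothetical.

```python
import cmath
import math


def rvog_volume_coherence(h_v, sigma, theta, k_z):
    """Volume-only coherence for a layer of height h_v (m) with exponential extinction.

    Closed form of  int_0^h_v exp(i k_z z) exp(p z) dz / int_0^h_v exp(p z) dz,
    with p = 2 sigma / cos(theta); sigma is the mean extinction (Np/m), theta the
    incidence angle (rad) and k_z the vertical wavenumber (rad/m).
    """
    p = 2.0 * sigma / math.cos(theta)
    p1 = p + 1j * k_z
    if p == 0.0:
        # No extinction: uniform vertical profile (assumes k_z != 0).
        return (cmath.exp(1j * k_z * h_v) - 1.0) / (1j * k_z * h_v)
    return (p / p1) * (cmath.exp(p1 * h_v) - 1.0) / (math.exp(p * h_v) - 1.0)


def rvog_coherence(phi0, gamma_v, mu):
    """Observed coherence: ground phase phi0, volume coherence gamma_v, ground-to-volume ratio mu."""
    return cmath.exp(1j * phi0) * (gamma_v + mu) / (1.0 + mu)


# Hypothetical geometry: 20 m canopy, 0.1 Np/m extinction, 40 degree incidence.
gamma_v = rvog_volume_coherence(h_v=20.0, sigma=0.1, theta=math.radians(40.0), k_z=0.1)
for mu in (0.0, 0.5, 2.0):
    g = rvog_coherence(phi0=0.3, gamma_v=gamma_v, mu=mu)
    print(mu, round(abs(g), 3), round(cmath.phase(g), 3))
```

As μ grows, the observed coherence moves from the volume coherence toward the pure ground point on the unit circle, which is why the ground-to-volume ratio matters for separating height from ground phase.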

  9. Tourists’ perception of deadwood in mountain forests

    Directory of Open Access Journals (Sweden)

    Fabio Pastorella

    2016-12-01

    In traditional forest management, non-living woody biomass in forests was perceived negatively. Generally, deadwood was removed during silvicultural treatments to protect forests against fire, pest and insect attacks. In recent decades, forest managers' perception of deadwood has been changing. However, people's opinions about the presence of deadwood in forests have rarely been investigated. In view of this gap, the aim of the paper is to understand tourists' perceptions of and opinions about deadwood in mountain forests. The survey was carried out in two study areas: the first in Italy and the second in Bosnia-Herzegovina. A structured questionnaire was administered to a random sample of visitors (n = 156 in Italy; n = 115 in Bosnia-Herzegovina). The tourists' preferences were evaluated through a set of images characterized by different amounts of standing dead trees and lying deadwood. The collected data were statistically analyzed to highlight the preferred type of forest related to different forms of deadwood management (unmanaged forests, close-to-nature forests, extensively managed forests and intensively managed forests). The results show that neither component of deadwood is perceived negatively by tourists. More than 60% of respondents prefer unmanaged forests and close-to-nature managed forests, while 40% of respondents prefer intensively managed forests in which deadwood is removed during silvicultural treatments.

  10. Object-based random forest classification of Landsat ETM+ and WorldView-2 satellite imagery for mapping lowland native grassland communities in Tasmania, Australia

    Science.gov (United States)

    Melville, Bethany; Lucieer, Arko; Aryal, Jagannath

    2018-04-01

    This paper presents a random forest classification approach for identifying and mapping three types of lowland native grassland communities found in the Tasmanian Midlands region. Due to the high conservation priority assigned to these communities, there has been an increasing need to identify appropriate datasets that can be used to derive accurate and frequently updatable maps of community extent. Therefore, this paper proposes a method employing repeat classification and statistical significance testing as a means of identifying the most appropriate dataset for mapping these communities. Two datasets were acquired and analysed: a Landsat ETM+ scene and a WorldView-2 scene, both from 2010. Training and validation data were randomly subset using a k-fold (k = 50) approach from a pre-existing field dataset. Poa labillardierei, Themeda triandra and lowland native grassland complex communities were identified in addition to dry woodland and agriculture. For each subset of randomly allocated points, a random forest model was trained on each dataset and then used to classify the corresponding imagery. Validation was performed using the reciprocal points from the independent subset that had not been used to train the model. Final training and classification accuracies were reported as per-class means for each satellite dataset. Analysis of Variance (ANOVA) was undertaken to determine whether classification accuracy differed between the two datasets, as well as between classifications. Results showed mean class accuracies between 54% and 87%. Class accuracy differed significantly between datasets only for the dry woodland and Themeda grassland classes, with the WorldView-2 dataset showing higher mean classification accuracies. The results of this study indicate that remote sensing is a viable method for the identification of lowland native grassland communities in the Tasmanian Midlands, and that repeat classification and statistical significance testing can be
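    The ANOVA comparison in this abstract operates on per-class accuracies collected from repeated classifications of each dataset. A minimal sketch of the one-way F statistic follows. It assumes plain Python lists of accuracies and uses a hypothetical function name; it is not the authors' code, and no values from the study are reproduced.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    Each group is a sequence of accuracy values, e.g. per-class accuracies
    from the repeated classifications of one satellite dataset.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    # Between-group sum of squares: spread of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of each value around its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

    When the group means are equal, F is typically near 1. Large values indicate that the datasets differ more than the within-dataset variation would explain. With SciPy installed, `scipy.stats.f_oneway` returns the same statistic together with a p-value.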

  11. Analysis of the stability and accuracy of the discrete least-squares approximation on multivariate polynomial spaces

    KAUST Repository

    Migliorati, Giovanni

    2016-01-01

    We review the main results achieved in the analysis of the stability and accuracy of the discrete least-squares approximation on multivariate polynomial spaces, with noiseless evaluations at random points, noiseless evaluations at low
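    To illustrate the setting under review, the sketch below fits a total-degree polynomial by discrete least squares to noiseless evaluations at uniformly random points. It is a toy example built on stated assumptions: the monomial basis, uniform sampling on [-1, 1]^d and all function names were chosen for brevity. Analyses of this problem typically use orthonormal bases (e.g. Legendre polynomials), which are better conditioned.

```python
import itertools

import numpy as np


def total_degree_exponents(dim, max_degree):
    """All multi-indices alpha with |alpha| <= max_degree (total-degree space)."""
    return [a for a in itertools.product(range(max_degree + 1), repeat=dim)
            if sum(a) <= max_degree]


def design_matrix(points, exponents):
    # Column j holds the monomial x^alpha_j evaluated at every sample point.
    return np.column_stack([np.prod(points ** np.array(a), axis=1) for a in exponents])


def discrete_least_squares(f, dim, max_degree, n_samples, rng):
    """Least-squares fit of f on the total-degree polynomial space."""
    exponents = total_degree_exponents(dim, max_degree)
    # Noiseless evaluations at points drawn uniformly at random in [-1, 1]^dim.
    pts = rng.uniform(-1.0, 1.0, size=(n_samples, dim))
    A = design_matrix(pts, exponents)
    coeffs, *_ = np.linalg.lstsq(A, f(pts), rcond=None)
    return exponents, coeffs
```

    If the target lies in the polynomial space and the random design matrix has full column rank, the fit recovers it up to rounding error. The reviewed analysis asks how many random points are needed for this to be stable with high probability.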

  12. Parameter estimation of multivariate multiple regression model using bayesian with non-informative Jeffreys’ prior distribution

    Science.gov (United States)

    Saputro, D. R. S.; Amalia, F.; Widyaningsih, P.; Affan, R. C.

    2018-05-01

    The Bayesian method can be used to estimate the parameters of a multivariate multiple regression model. It involves two distributions, the prior and the posterior, and the posterior distribution is influenced by the choice of prior. Jeffreys' prior is a non-informative prior, used when no information about the parameters is available. Combining the non-informative Jeffreys' prior with the sample information yields the posterior distribution, which is then used to estimate the parameters. The purpose of this research is to estimate the parameters of the multivariate regression model using the Bayesian method with a non-informative Jeffreys' prior. The estimates of β and Σ were obtained as the expected values of their marginal posterior distributions, which are multivariate normal and inverse Wishart, respectively. However, computing these expected values involves integrals that are difficult to evaluate. Therefore, random samples are generated according to the posterior distribution of each parameter using the Markov chain Monte Carlo (MCMC) Gibbs sampling algorithm.
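    The Gibbs sampling step can be sketched as follows. Under the Jeffreys prior p(B, Sigma) proportional to |Sigma|^(-(m+1)/2), the full conditionals take standard forms:
    - Sigma given B is inverse Wishart with n degrees of freedom and scale (Y - XB)^T (Y - XB).
    - B given Sigma is matrix normal with the least-squares estimate as its mean, row covariance (X^T X)^(-1) and column covariance Sigma.
    This is our own minimal NumPy illustration, not the authors' implementation. The function name and defaults are hypothetical, and B plays the role of β.

```python
import numpy as np


def gibbs_mv_regression(X, Y, n_iter=2000, burn_in=500, seed=0):
    """Gibbs sampler for Y = X B + E, rows of E ~ N(0, Sigma), under the
    Jeffreys prior p(B, Sigma) proportional to |Sigma|^(-(m+1)/2).

    Returns posterior means of B and Sigma over the post-burn-in draws.
    """
    rng = np.random.default_rng(seed)
    n, k = X.shape
    m = Y.shape[1]
    XtX_inv = np.linalg.inv(X.T @ X)
    B_hat = XtX_inv @ X.T @ Y            # least-squares estimate
    L_row = np.linalg.cholesky(XtX_inv)  # factor of the row covariance
    B = B_hat.copy()
    B_draws, Sigma_draws = [], []
    for it in range(n_iter):
        # Sigma | B, Y ~ Inverse-Wishart(df = n, scale = S), S = residual cross-products.
        resid = Y - X @ B
        S = resid.T @ resid
        # Draw W ~ Wishart(n, S^{-1}) as a sum of n outer products, then invert it.
        Z = rng.multivariate_normal(np.zeros(m), np.linalg.inv(S), size=n)
        Sigma = np.linalg.inv(Z.T @ Z)
        # B | Sigma, Y ~ MatrixNormal(B_hat, (X'X)^{-1}, Sigma).
        L_col = np.linalg.cholesky(Sigma)
        B = B_hat + L_row @ rng.standard_normal((k, m)) @ L_col.T
        if it >= burn_in:
            B_draws.append(B)
            Sigma_draws.append(Sigma)
    # Averages of the retained draws approximate the posterior expected values.
    return np.mean(B_draws, axis=0), np.mean(Sigma_draws, axis=0)
```

    The averaged draws approximate the expected values of β and Σ described in the abstract. In practice one would also check convergence and mixing before relying on them.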

  13. Cross-covariance functions for multivariate random fields based on latent dimensions

    KAUST Repository

    Apanasovich, T. V.; Genton, M. G.

    2010-01-01

    The problem of constructing valid parametric cross-covariance functions is challenging. We propose a simple methodology, based on latent dimensions and existing covariance models for univariate random fields, to develop flexible, interpretable

  14. Diagnosis of ulcerative colitis before onset of inflammation by multivariate modeling of genome-wide gene expression data

    DEFF Research Database (Denmark)

    Olsen, Jørgen; Gerds, Thomas A; Seidelin, Jakob B

    2009-01-01

    Background: Endoscopically obtained mucosal biopsies play an important role in the differential diagnosis between ulcerative colitis (UC) and Crohn's disease (CD), but in some cases where neither macroscopic nor microscopic signs of inflammation are present the biopsies provide only inconclusive...... biopsies from 78 patients were included. A diagnostic model was derived with the random forest method based on 71 biopsies from 60 patients. The model-internal out-of-bag performance measure yielded perfect classification. Furthermore, the model was validated in 18 independent noninflamed biopsies from 18...... of random forest modeling of genome-wide gene expression data for distinguishing quiescent and active UC colonic mucosa versus control and CD colonic mucosa. (Inflamm Bowel Dis 2009).

  15. Aggregation-cokriging for highly multivariate spatial data

    KAUST Repository

    Furrer, R.; Genton, M. G.

    2011-01-01

    Best linear unbiased prediction of spatially correlated multivariate random processes, often called cokriging in geostatistics, requires the solution of a large linear system based on the covariance and cross-covariance matrix of the observations. For many problems of practical interest, it is impossible to solve the linear system with direct methods. We propose an efficient linear unbiased predictor based on a linear aggregation of the covariables. The primary variable together with this single meta-covariable is used to perform cokriging. We discuss the optimality of the approach under different covariance structures, and use it to create reanalysis type high-resolution historical temperature fields. © 2011 Biometrika Trust.

  16. Aggregation-cokriging for highly multivariate spatial data

    KAUST Repository

    Furrer, R.

    2011-08-26

    Best linear unbiased prediction of spatially correlated multivariate random processes, often called cokriging in geostatistics, requires the solution of a large linear system based on the covariance and cross-covariance matrix of the observations. For many problems of practical interest, it is impossible to solve the linear system with direct methods. We propose an efficient linear unbiased predictor based on a linear aggregation of the covariables. The primary variable together with this single meta-covariable is used to perform cokriging. We discuss the optimality of the approach under different covariance structures, and use it to create reanalysis type high-resolution historical temperature fields. © 2011 Biometrika Trust.

  17. Discriminant forest classification method and system

    Science.gov (United States)

    Chen, Barry Y.; Hanley, William G.; Lemmond, Tracy D.; Hiller, Lawrence J.; Knapp, David A.; Mugge, Marshall J.

    2012-11-06

    A hybrid machine learning methodology and system for classification that combines classical random forest (RF) methodology with discriminant analysis (DA) techniques to provide enhanced classification capability. A DA technique that uses feature measurements of an object to predict its class membership, such as linear discriminant analysis (LDA) or the Anderson-Bahadur linear discriminant technique (AB), is used to split the data at each node in each of its classification trees to train and grow the trees and the forest. When training is finished, a set of n DA-based decision trees of a discriminant forest is produced for use in predicting the classification of new samples of unknown class.
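    To make the node-splitting idea concrete, the sketch below computes a Fisher LDA split for a single node with two classes. Samples are projected onto w = S_w^(-1)(mu1 - mu0) and thresholded halfway between the projected class means. This is an illustrative assumption, not the patented system. It omits the AB variant, bootstrap sampling, random feature selection and recursive tree growth. The function name and the ridge term are hypothetical.

```python
import numpy as np


def lda_node_split(X, y, ridge=1e-6):
    """Fisher LDA split for one tree node with binary labels y in {0, 1}.

    Returns (w, threshold); samples with X @ w <= threshold go to the left child.
    """
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class scatter matrix; the ridge term keeps it invertible.
    Sw = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    Sw += ridge * np.eye(X.shape[1])
    # Fisher discriminant direction w = Sw^{-1} (mu1 - mu0).
    w = np.linalg.solve(Sw, mu1 - mu0)
    # Threshold halfway between the projected class means.
    threshold = w @ (mu0 + mu1) / 2.0
    return w, threshold
```

    In a forest, a split like this would replace the usual single-feature threshold test at every node of every tree.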

  18. Origin Discrimination of Osmanthus fragrans var. thunbergii Flowers using GC-MS and UPLC-PDA Combined with Multivariable Analysis Methods.

    Science.gov (United States)

    Zhou, Fei; Zhao, Yajing; Peng, Jiyu; Jiang, Yirong; Li, Maiquan; Jiang, Yuan; Lu, Baiyi

    2017-07-01

    Osmanthus fragrans flowers are used as folk medicine and as additives for teas, beverages and foods. The metabolites of O. fragrans flowers from different geographical origins differed to some extent. Chromatography and mass spectrometry combined with multivariable analysis methods provide an approach for discriminating the origin of O. fragrans flowers. The aim was to discriminate Osmanthus fragrans var. thunbergii flowers from different origins using the identified metabolites. GC-MS and UPLC-PDA were conducted to analyse the metabolites in O. fragrans var. thunbergii flowers (150 samples in total). Principal component analysis (PCA), soft independent modelling of class analogy (SIMCA) and random forest (RF) analysis were applied to group the GC-MS and UPLC-PDA data. GC-MS identified 32 compounds common to all samples, while UPLC-PDA/QTOF-MS identified 16 common compounds. PCA of the UPLC-PDA data generated better clustering than PCA of the GC-MS data. Ten metabolites (six from GC-MS and four from UPLC-PDA) were selected by PCA loadings as effective compounds for discrimination. SIMCA and RF analysis were used to build classification models. The RF model, based on the four effective compounds (caffeic acid derivative, acteoside, ligustroside and compound 15), yielded better results, with a classification rate of 100% in the calibration set and 97.8% in the prediction set. GC-MS and UPLC-PDA combined with multivariable analysis methods can discriminate the origin of Osmanthus fragrans var. thunbergii flowers. Copyright © 2017 John Wiley & Sons, Ltd.

  19. Spatially random mortality in old-growth red pine forests of northern Minnesota

    Science.gov (United States)

    Tuomas Aakala; Shawn Fraver; Brian J. Palik; Anthony W. D'Amato

    2012-01-01

    Characterizing the spatial distribution of tree mortality is critical to understanding forest dynamics, but empirical studies on these patterns under old-growth conditions are rare. This rarity is due in part to low mortality rates in old-growth forests, the study of which necessitates long observation periods, and the confounding influence of tree in-growth during...

  20. Ecological consequences of alternative fuel reduction treatments in seasonally dry forests: the national fire and fire surrogate study

    Science.gov (United States)

    J.D. McIver; C.J. Fettig

    2010-01-01

    This special issue of Forest Science features the national Fire and Fire Surrogate study (FFS), a multisite, multivariate research project that evaluates the ecological consequences of prescribed fire and its mechanical surrogates in seasonally dry forests of the United States. The need for a comprehensive national FFS study stemmed from concern that information on...

  1. Modelling and mapping the suitability of European forest formations at 1-km resolution

    DEFF Research Database (Denmark)

    Casalegno, Stefano; Amatulli, Giuseppe; Bastrup-Birk, Annemarie

    2011-01-01

    factors. Here, we used the bootstrap-aggregating machine-learning ensemble classifier Random Forest (RF) to derive a 1-km resolution European forest formation suitability map. The statistical model uses as inputs more than 6,000 field data forest inventory plots and a large set of environmental variables...

  2. Multivariate statistical methods a primer

    CERN Document Server

    Manly, Bryan FJ

    2004-01-01

    THE MATERIAL OF MULTIVARIATE ANALYSIS: Examples of Multivariate Data; Preview of Multivariate Methods; The Multivariate Normal Distribution; Computer Programs; Graphical Methods; Chapter Summary; References. MATRIX ALGEBRA: The Need for Matrix Algebra; Matrices and Vectors; Operations on Matrices; Matrix Inversion; Quadratic Forms; Eigenvalues and Eigenvectors; Vectors of Means and Covariance Matrices; Further Reading; Chapter Summary; References. DISPLAYING MULTIVARIATE DATA: The Problem of Displaying Many Variables in Two Dimensions; Plotting Index Variables; The Draftsman's Plot; The Representation of Individual Data Points; Profiles o

  3. Advanced analysis of forest fire clustering

    Science.gov (United States)

    Kanevski, Mikhail; Pereira, Mario; Golay, Jean

    2017-04-01

    Analysis of point pattern clustering is an important topic in spatial statistics and for many applications: biodiversity, epidemiology, natural hazards, geomarketing, etc. There are several fundamental approaches used to quantify spatial data clustering using topological, statistical and fractal measures. In the present research, the recently introduced multi-point Morisita index (mMI) is applied to study the spatial clustering of forest fires in Portugal. The data set consists of more than 30000 fire events covering the period from 1975 to 2013. The distribution of forest fires is very complex and highly variable in space. mMI is a multi-point extension of the classical two-point Morisita index. In essence, mMI is estimated by covering the study region with a grid. One then computes how many times more likely it is that m points selected at random will fall in the same grid cell than it would be for a completely random (Poisson) process. By changing the number of grid cells (i.e. the size of the grid cells), mMI characterizes the scaling properties of spatial clustering. The intrinsic dimension (fractal dimension) of the point distribution can also be estimated from mMI. In this study, the mMI of forest fires is compared with the mMI of random patterns (RPs) generated within the validity domain, defined as the forest area of Portugal. It turns out that the forest fires are highly clustered inside the validity domain in comparison with the RPs. Moreover, they show different scaling properties at different spatial scales. The results obtained from the mMI analysis are also compared with those of fractal measures of clustering, namely the box-counting and sandbox-counting approaches. REFERENCES: Golay J., Kanevski M., Vega Orozco C., Leuenberger M., 2014: The multipoint Morisita index for the analysis of spatial patterns. Physica A, 406, 191-202. Golay J., Kanevski M. 2015: A new estimator of intrinsic dimension based on the multipoint Morisita index
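    The grid-counting procedure in this abstract can be written down directly. Take Q grid cells, n_i points in cell i and N points in total. The index is commonly defined as I_m = Q^(m-1) * sum_i n_i(n_i-1)...(n_i-m+1) / [N(N-1)...(N-m+1)], which reduces to the classical Morisita index for m = 2. The sketch below is a minimal Python rendering of that formula for 2-D points on a square grid. It is not the authors' code, and the function names and unit-square default bounds are assumptions.

```python
from collections import Counter
from math import prod


def falling_factorial(n, m):
    """n (n - 1) ... (n - m + 1); equals 0 whenever 0 <= n < m."""
    return prod(n - j for j in range(m))


def multipoint_morisita(points, cells_per_side, m=2, bounds=((0.0, 1.0), (0.0, 1.0))):
    """Multipoint Morisita index I_m for 2-D points on a regular square grid.

    I_m = Q^(m-1) * sum_i n_i(n_i-1)...(n_i-m+1) / [N(N-1)...(N-m+1)],
    where Q is the number of grid cells and n_i the point count in cell i.
    """
    (x0, x1), (y0, y1) = bounds
    k = cells_per_side
    counts = Counter()
    for x, y in points:
        # Map each point to its grid cell; points on the upper edge go to the last cell.
        ix = min(int((x - x0) / (x1 - x0) * k), k - 1)
        iy = min(int((y - y0) / (y1 - y0) * k), k - 1)
        counts[(ix, iy)] += 1
    n_total = len(points)
    q = k * k
    # Empty cells contribute 0, so only occupied cells need to be summed.
    numerator = sum(falling_factorial(c, m) for c in counts.values())
    return q ** (m - 1) * numerator / falling_factorial(n_total, m)
```

    For a completely random pattern the index is close to 1, and values above 1 indicate clustering. Repeating the computation for several values of cells_per_side traces the scaling behaviour the abstract describes.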

  4. pSuc-Lys: Predict lysine succinylation sites in proteins with PseAAC and ensemble random forest approach.

    Science.gov (United States)

    Jia, Jianhua; Liu, Zi; Xiao, Xuan; Liu, Bingxiang; Chou, Kuo-Chen

    2016-04-07

    As one type of post-translational modification (PTM), protein lysine succinylation is important in regulating a variety of biological processes. It is also involved in some diseases. Consequently, from the angles of both basic research and drug development, we face a challenging problem: for an uncharacterized protein sequence with many Lys residues, which ones can be succinylated, and which ones cannot? To address this problem, we have developed a predictor called pSuc-Lys by (1) incorporating sequence-coupled information into the general pseudo amino acid composition, (2) balancing the skewed training dataset by random sampling, and (3) constructing an ensemble predictor by fusing a series of individual random forest classifiers. Rigorous cross-validation indicated that it remarkably outperformed the existing methods. A user-friendly web server for pSuc-Lys has been established at http://www.jci-bioinfo.cn/pSuc-Lys, by which users can easily obtain their desired results without needing to go through the complicated mathematical equations involved. It has not escaped our notice that the formulation and approach presented here can also be used to analyze many other problems in computational proteomics. Copyright © 2016 Elsevier Ltd. All rights reserved.

  5. Hydrologic landscape regionalisation using deductive classification and random forests.

    Directory of Open Access Journals (Sweden)

    Stuart C Brown

    Landscape classification and hydrological regionalisation studies are being increasingly used in ecohydrology to aid in the management and research of aquatic resources. We present a methodology for classifying hydrologic landscapes based on spatial environmental variables by employing non-parametric statistics and hybrid image classification. Our approach differed from previous classifications, which have required the use of an a priori spatial unit (e.g. a catchment). Using such a unit necessarily results in the loss of variability that is known to exist within those units. The use of a simple statistical approach to identify an appropriate number of classes eliminated the need for large amounts of post-hoc testing with different numbers of groups, or the selection and justification of an arbitrary number. Using statistical clustering, we identified 23 distinct groups within our training dataset. The use of a hybrid classification employing random forests extended this statistical clustering to an area of approximately 228,000 km2 of south-eastern Australia without the need to rely on catchments, landscape units or stream sections. This extension resulted in a highly accurate regionalisation at both 30-m and 2.5-km resolution, and a less accurate 10-km classification that would be more appropriate for use at a continental scale. A smaller case study, of an area covering 27,000 km2, demonstrated that the method preserved the intra- and inter-catchment variability that is known to exist in local hydrology, based on previous research. Preliminary analysis linking the regionalisation to streamflow indices is promising, suggesting that the method could be used to predict streamflow behaviour in ungauged catchments. Our work therefore simplifies current classification frameworks that are becoming more popular in ecohydrology, while better retaining small-scale variability in hydrology, thus enabling future attempts to explain and visualise broad-scale hydrologic

  6. Forecasting Space Weather-Induced GPS Performance Degradation Using Random Forest

    Science.gov (United States)

    Filjar, R.; Filic, M.; Milinkovic, F.

    2017-12-01

    Space weather and ionospheric dynamics have a profound effect on the positioning performance of the Global Navigation Satellite System (GNSS). However, the quantification of that effect is still the subject of scientific activity around the world. In the latest contribution to the understanding of space weather and ionospheric effects on satellite-based positioning performance, we conducted a study of several candidate methods for forecasting space weather-induced GPS positioning performance deterioration. First, a 5-day set of experimentally collected data was established. It encompasses the space weather and ionospheric activity indices and observations of GPS positioning error components (northing, easting, and height positioning error) derived from the Adriatic Sea IGS reference stations' RINEX raw pseudorange files in quiet space weather periods. The indices include the readings of the Sudden Ionospheric Disturbance (SID) monitors, components of geomagnetic field strength, global Kp index, Dst index, GPS-derived Total Electron Content (TEC) samples, standard deviation of TEC samples, and sunspot number. This data set was split into training and test subsets. Then, a selected set of supervised machine learning methods based on Random Forest was applied to the experimentally collected data set in order to establish appropriate regional (Adriatic Sea) forecasting models for space weather-induced GPS positioning performance deterioration. The forecasting models were developed in the R/rattle statistical programming environment. The forecasting quality of the developed regional models was assessed, and conclusions were drawn on the advantages and shortcomings of the regional forecasting models for space weather-caused GNSS positioning performance deterioration.

  7. Robust linear registration of CT images using random regression forests

    Science.gov (United States)

    Konukoglu, Ender; Criminisi, Antonio; Pathak, Sayan; Robertson, Duncan; White, Steve; Haynor, David; Siddiqui, Khan

    2011-03-01

    Global linear registration is a necessary first step for many different tasks in medical image analysis. Comparing longitudinal studies [1], cross-modality fusion [2], and many other applications depend heavily on the success of the automatic registration. The robustness and efficiency of this step are crucial, as they affect all subsequent operations. Most common techniques cast the linear registration problem as the minimization of a global energy function based on the image intensities. Although these algorithms have proved useful, their robustness in fully automated scenarios is still an open question. In fact, the optimization step often gets caught in local minima, yielding unsatisfactory results. Recent algorithms constrain the space of registration parameters by exploiting implicit or explicit organ segmentations, thus increasing robustness [4,5]. In this work we propose a novel robust algorithm for automatic global linear image registration. Our method uses random regression forests to estimate posterior probability distributions for the locations of anatomical structures, represented as axis-aligned bounding boxes [6]. These posterior distributions are later integrated in a global linear registration algorithm. The biggest advantage of our algorithm is that it does not require predefined segmentations or regions, yet it yields robust registration results. We compare the robustness of our algorithm with that of the state-of-the-art Elastix toolbox [7]. Validation is performed via 1464 pair-wise registrations in a database of very diverse 3D CT images. We show that our method decreases the "failure" rate of the global linear registration from 12.5% (Elastix) to only 1.9%.

  8. A joint model for multivariate hierarchical semicontinuous data with replications.

    Science.gov (United States)

    Kassahun-Yimer, Wondwosen; Albert, Paul S; Lipsky, Leah M; Nansel, Tonja R; Liu, Aiyi

    2017-01-01

    Longitudinal data are often collected in biomedical applications in such a way that measurements on more than one response are taken from a given subject repeatedly over time. For some problems, these multiple profiles need to be modeled jointly to gain insight into the joint evolution and/or association of these responses over time. In practice, such longitudinal outcomes may have many zeros that need to be accounted for in the analysis. For example, in dietary intake studies, which we focus on in this paper, some food components are eaten daily by almost all subjects, while others are consumed episodically: individuals have periods during which they do not eat these components, followed by periods during which they do. These episodically consumed foods need to be adequately modeled to account for the many zeros that are encountered. In this paper, we propose a joint model to analyze multivariate hierarchical semicontinuous data characterized by many zeros and more than one replicate observation at each measurement occasion. This approach allows for different probability mechanisms for describing the zero behavior as compared with the mean intake given that the individual consumes the food. To deal with the potentially large number of multivariate profiles, we use a pairwise model-fitting approach that was developed in the context of multivariate Gaussian random effects models with a large number of multivariate components. The novelty of the proposed approach is that it incorporates: (1) multivariate, possibly correlated, response variables; (2) within-subject correlation resulting from repeated measurements taken from each subject; (3) many zero observations; (4) overdispersion; and (5) replicate measurements at each visit time.

  9. Real time forest fire warning and forest fire risk zoning: a Vietnamese case study

    Science.gov (United States)

    Chu, T.; Pham, D.; Phung, T.; Ha, A.; Paschke, M.

    2016-12-01

    Forest fires are a serious problem in Vietnam and are considered one of the major causes of forest loss and degradation. Several studies of forest fire risk warning have been conducted using the Modified Nesterov Index (MNI), but shortcomings and inaccurate predictions remain that urgently need to be addressed. In our study, several important topographic and social factors were treated as "permanent" factors: aspect, slope, elevation, and distance to residential areas and the road system. Meteorological data were updated hourly using near-real-time (NRT) remotely sensed data (i.e. MODIS Terra/Aqua and TRMM) for the prediction and warning of fire. Due to the limited number of weather stations in Vietnam, data from all active stations (i.e. 178) were used together with the satellite data to calibrate and upscale the meteorological variables. These finer-resolution data were then used to generate the MNI. Only the significant "permanent" factors were selected as input variables. The selection was based on correlation coefficients computed from a multivariable regression between actual fire occurrences (collected from 1/2007) and their spatial characteristics. These coefficients were also used to suggest appropriate weights for computing the forest fire risk (FR) model. The forest fire risk model was calculated from the MNI and the selected factors using fuzzy regression models (FRMs) and GIS-based multi-criteria analysis. With this approach, FR was slightly modified from the MNI by the integrated use of various factors in our fire warning and prediction model. Multifactor-based maps of forest fire risk zones were generated by classifying FR into three potential danger levels. Fire risk maps were displayed using WebGIS technology, which makes it easy to manage data and extract reports. Subsequently reported fires have been used as ground truth for validating the forest fire risk. Fire probability is strongly related to the potential danger levels (varying from 5.3% to 53.8%), indicating that the higher

  10. Random Forest Segregation of Drug Responses May define Regions of Biological Significance

    Directory of Open Access Journals (Sweden)

    Qasim Bukhari

    2016-03-01

    The ability to assess brain responses in an unsupervised manner based on fMRI measures has remained a challenge. Here we have applied the Random Forest (RF) method to detect differences in the pharmacological MRI (phMRI) response in rats treated with an analgesic drug (buprenorphine) compared with control (saline). Three groups of animals were studied: two groups treated with different doses of the opioid buprenorphine, low (LD) and high dose (HD), and one receiving saline. PhMRI responses were evaluated in 45 brain regions, and RF analysis was applied to allocate rats to the individual treatment groups. RF analysis was able to identify drug effects for drug vs. saline from differential phMRI responses in the hippocampus, amygdala, nucleus accumbens, superior colliculus and the lateral and posterior thalamus. These structures have high levels of mu opioid receptors. In addition, these regions are involved in aversive signaling, which is inhibited by mu opioids. The results demonstrate that buprenorphine-mediated phMRI responses comprise characteristic features. Using the RF method, a method that has been successfully applied in clinical studies, these features allow unsupervised differentiation from placebo-treated rats as well as proper allocation to the respective drug dose group.

  11. Random Forest ensembles for detection and prediction of Alzheimer's disease with a good between-cohort robustness

    Directory of Open Access Journals (Sweden)

    A.V. Lebedev

    2014-01-01

    In the ADNI set, the best AD/HC sensitivity/specificity (88.6%/92.0%, test set) was achieved by combining cortical thickness and volumetric measures. The Random Forest model achieved significantly higher accuracy than the reference classifier (linear Support Vector Machine). The models trained using parcelled and high-dimensional (HD) input demonstrated equivalent performance, but the former was more effective in terms of computation/memory and time costs. Combining morphometric measurements with ApoE genotype and demographics (age, sex, education) further improved the sensitivity/specificity for detecting MCI-to-AD conversion (but not AD/HC classification performance) from 79.5%/75% to 83.3%/81.3%. When applied to the independent AddNeuroMed cohort, the best ADNI models produced equivalent performance without a substantial accuracy drop, suggesting good robustness sufficient for future clinical implementation.

  12. A multiscale approach indicates a severe reduction in Atlantic Forest wetlands and highlights that São Paulo Marsh Antwren is on the brink of extinction.

    Directory of Open Access Journals (Sweden)

    Glaucia Del-Rio

    Over the last 200 years the wetlands of the Upper Tietê and Upper Paraíba do Sul basins, in the southeastern Atlantic Forest, Brazil, have been almost completely transformed by urbanization, agriculture and mining. Endemic to these river basins, the São Paulo Marsh Antwren (Formicivora paludicola) survived these impacts, but remained unknown to science until its discovery in 2005. Its population status was cause for immediate concern. In order to understand the factors imperiling the species and to provide guidelines for its conservation, we investigated both the species' distribution and the distribution of areas of suitable habitat. We used a multiscale approach encompassing species distribution modeling, fieldwork surveys and occupancy models. Six species distribution modeling methods were used: Generalized Linear Models, Generalized Additive Models, Multivariate Adaptive Regression Splines, Classification Tree Analysis, Artificial Neural Networks and Random Forest. Random Forest showed the best fit and was used to guide field validation. After surveying 59 sites, our results indicated that Formicivora paludicola occurred at only 13 sites, having narrow habitat specificity and restricted habitat availability. Additionally, historic maps, distribution models and satellite imagery showed that human occupation has resulted in a loss of more than 346 km2 of suitable habitat for this species since the early twentieth century. The species now occupies only a severely fragmented area (area of occupancy of 1.42 km2) and should be considered Critically Endangered according to IUCN criteria. Furthermore, averaged occupancy models showed that marshes with lower cattail (Typha domingensis) densities have higher probabilities of being occupied. Thus, these areas should be prioritized in future conservation efforts to protect the species and to restore a portion of Atlantic Forest wetlands, in times of unprecedented regional water supply problems.

  13. Testing Benefits Transfer of Forest Recreation Values over a 20-year time Horizon

    DEFF Research Database (Denmark)

    Zandersen, Marianne; Termansen, Mette; Jensen, F.S.

    2007-01-01

    We conduct a functional benefit transfer over 20 years of total willingness to pay based on car-borne forest recreation in 52 forests, using a mixed logit specification of a random utility model and geographic information systems to allow heterogeneous preferences across the population and for heterogeneity over space. Results show that preferences for some forest attributes, such as species diversity and age, as well as transport mode have changed significantly over the period. Updating the transfer model with present total demand for recreation improves the error margins by an average of 282......%. Average errors of the best transfer model remain 25%.

  14. Analysis of preservative-treated wood by multivariate analysis of laser-induced breakdown spectroscopy spectra

    International Nuclear Information System (INIS)

    Martin, Madhavi Z.; Labbe, Nicole; Rials, Timothy G.; Wullschleger, Stan D.

    2005-01-01

    In this work, multivariate statistical analysis (MVA) techniques are coupled with laser-induced breakdown spectroscopy (LIBS) to identify preservative types (chromated copper arsenate, ammoniacal copper zinc or alkaline copper quat), and to predict elemental content in preservative-treated wood. The elemental composition of the samples was measured with a standard laboratory method of digestion followed by atomic absorption spectroscopy analysis. The elemental composition was then correlated with the LIBS spectra using projection to latent structures (PLS) models. The correlations for the different elements introduced by different treatments were very strong, with the correlation coefficients generally above 0.9. Additionally, principal component analysis (PCA) was used to differentiate the samples treated with different preservative formulations. The research has focused not only on demonstrating the application of LIBS as a tool for use in the forest products industry, but also considered sampling errors, limits of detection, reproducibility, and accuracy of measurements as they relate to multivariate analysis of this complex wood substrate.

  15. Analysis of preservative-treated wood by multivariate analysis of laser-induced breakdown spectroscopy spectra

    Energy Technology Data Exchange (ETDEWEB)

    Martin, Madhavi Z. [Environmental Sciences Division Oak Ridge National Laboratory, P.O. Box 2008 MS 6422, Oak Ridge TN 37831-6422 (United States); Labbe, Nicole [Forest Products Center, University of Tennessee, 2506 Jacob Drive, Knoxville, TN 37996-4570 (United States)]. E-mail: nlabbe@utk.edu; Rials, Timothy G. [Forest Products Center, University of Tennessee, 2506 Jacob Drive, Knoxville, TN 37996-4570 (United States); Wullschleger, Stan D. [Environmental Sciences Division Oak Ridge National Laboratory, P.O. Box 2008 MS 6422, Oak Ridge TN 37831-6422 (United States)

    2005-08-31

    In this work, multivariate statistical analysis (MVA) techniques are coupled with laser-induced breakdown spectroscopy (LIBS) to identify preservative types (chromated copper arsenate, ammoniacal copper zinc or alkaline copper quat), and to predict elemental content in preservative-treated wood. The elemental composition of the samples was measured with a standard laboratory method of digestion followed by atomic absorption spectroscopy analysis. The elemental composition was then correlated with the LIBS spectra using projection to latent structures (PLS) models. The correlations for the different elements introduced by different treatments were very strong, with the correlation coefficients generally above 0.9. Additionally, principal component analysis (PCA) was used to differentiate the samples treated with different preservative formulations. The research has focused not only on demonstrating the application of LIBS as a tool for use in the forest products industry, but also considered sampling errors, limits of detection, reproducibility, and accuracy of measurements as they relate to multivariate analysis of this complex wood substrate.

  16. Contributions to Estimation and Testing Block Covariance Structures in Multivariate Normal Models

    OpenAIRE

    Liang, Yuli

    2015-01-01

    This thesis concerns inference problems in balanced random effects models with a so-called block circular Toeplitz covariance structure. This class of covariance structures describes the dependency of some specific multivariate two-level data when both compound symmetry and circular symmetry appear simultaneously. We derive two covariance structures under two different invariance restrictions. The obtained covariance structures reflect both circularity and exchangeability present in the data....

  17. Understanding and reaching family forest owners: lessons from social marketing research

    Science.gov (United States)

    Brett J. Butler; Mary Tyrrell; Geoff Feinberg; Scott VanManen; Larry Wiseman; Scott Wallinger

    2007-01-01

    Social marketing--the use of commercial marketing techniques to effect positive social change--is a promising means by which to develop more effective and efficient outreach, policies, and services for family forest owners. A hierarchical, multivariate analysis based on landowners' attitudes reveals four groups of owners to whom programs can be tailored: woodland...

  18. Forest owner representation of forest management and perception of resource efficiency: a structural equation modeling study

    Directory of Open Access Journals (Sweden)

    Andrej Ficko

    2015-03-01

    Underuse of nonindustrial private forests in developed countries has been interpreted mostly as a consequence of the prevailing noncommodity objectives of their owners. Recent empirical studies have indicated a correlation between the harvesting behavior of forest owners and a specific conceptualization of appropriate forest management described as "nonintervention" or "hands-off" management. We aimed to fill the large gap in knowledge of social representations of forest management in Europe and are the first to elicit forest owner representations in Europe this rigorously. We conducted 3099 telephone interviews with randomly selected forest owners in Slovenia, asking them whether they thought they managed their forest efficiently, what the possible reasons for underuse were, and what they understood by forest management. Building on social representations theory and applying a series of structural equation models, we tested the existence of three latent constructs of forest management and estimated whether and how strongly these constructs correlated with the perception of resource efficiency. Forest owners conceptualized forest management as a mixture of maintenance, ecosystem-centered and economics-centered management. None of the representations had a strong association with the perception of resource efficiency, nor could any of them be considered a factor preventing forest owners from cutting more. The underuse of wood resources was mostly due to biophysical constraints in the environment and not to a deep-seated philosophical objection to harvesting. The difference between our findings and other empirical studies is primarily explained by historical differences in forestland ownership in different parts of Europe and the United States, the rising number of nonresidential owners, alternative lifestyles, and environmental protectionism, but also by our high methodological rigor in testing the relationships between the constructs.

  19. Unsupervised classification of multivariate geostatistical data: Two algorithms

    Science.gov (United States)

    Romary, Thomas; Ors, Fabien; Rivoirard, Jacques; Deraisme, Jacques

    2015-12-01

    With the increasing development of remote sensing platforms and the evolution of sampling facilities in the mining and oil industries, spatial datasets are becoming increasingly large, inform a growing number of variables and cover increasingly wide areas. Therefore, it is often necessary to split the domain of study to account for radically different behaviors of the natural phenomenon over the domain and to simplify the subsequent modeling step. The definition of these areas can be seen as a problem of unsupervised classification, or clustering, where we try to divide the domain into subdomains that are homogeneous with respect to the values taken by the variables at hand. The application of classical clustering methods, designed for independent observations, does not ensure the spatial coherence of the resulting classes. Image segmentation methods, based on e.g. Markov random fields, are not adapted to irregularly sampled data. Other existing approaches, based on mixtures of Gaussian random functions estimated via the expectation-maximization algorithm, are limited to reasonable sample sizes and a small number of variables. In this work, we propose two algorithms based on adaptations of classical algorithms to multivariate geostatistical data. Both algorithms are model-free and can handle large volumes of multivariate, irregularly spaced data. The first proceeds by agglomerative hierarchical clustering. Spatial coherence is ensured by a proximity condition imposed for two clusters to merge. This proximity condition relies on a graph organizing the data in the coordinate space. The hierarchical algorithm can then be seen as a graph-partitioning algorithm. Following this interpretation, a spatial version of the spectral clustering algorithm is also proposed. The performance of both algorithms is assessed on toy examples and a mining dataset.
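    The first algorithm can be pictured as ordinary agglomerative clustering in which only clusters linked by an edge of a neighborhood graph are allowed to merge. The sketch below is a minimal illustration, not the authors' implementation: the abstract does not specify the graph or the merge criterion, so the radius-based graph (`radius`) and the centroid distance in variable space are assumptions, and all function names are hypothetical.

```python
import math
from itertools import combinations


def neighbor_graph(coords, radius):
    """Edges (i, j) between sample locations at most `radius` apart (assumed graph)."""
    return {
        (i, j)
        for i, j in combinations(range(len(coords)), 2)
        if math.dist(coords[i], coords[j]) <= radius
    }


def spatial_agglomerative(coords, values, radius, n_clusters):
    """Merge clusters pairwise until `n_clusters` remain; only graph-adjacent clusters may merge."""
    edges = neighbor_graph(coords, radius)
    label = list(range(len(coords)))                # current cluster id of each sample
    members = {i: [i] for i in range(len(coords))}  # cluster id -> sample indices

    def centroid(c):
        # Mean of the variable vectors of all samples in cluster c.
        rows = [values[i] for i in members[c]]
        return [sum(col) / len(rows) for col in zip(*rows)]

    while len(members) > n_clusters:
        # Proximity condition: a pair is admissible only if some edge links the two clusters.
        candidates = {
            tuple(sorted((label[i], label[j])))
            for i, j in edges
            if label[i] != label[j]
        }
        if not candidates:
            break  # graph has several components: no admissible merge remains
        a, b = min(candidates, key=lambda p: math.dist(centroid(p[0]), centroid(p[1])))
        for i in members[b]:
            label[i] = a
        members[a].extend(members.pop(b))
    return label
```

    On a straight transect whose values jump between the third and fourth sample, asking for two clusters splits the transect at the jump. A far-away sample with similar values to the first group is never merged into it, because no graph edge connects them; this is the spatial coherence that plain hierarchical clustering lacks.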

  20. Spatiotemporal prediction of daily ambient ozone levels across China using random forest for human exposure assessment.

    Science.gov (United States)

    Zhan, Yu; Luo, Yuzhou; Deng, Xunfei; Grieneisen, Michael L; Zhang, Minghua; Di, Baofeng

    2018-02-01

    In China, ozone pollution shows an increasing trend and has become the primary air pollutant in warm seasons. Leveraging the air quality monitoring network, a random forest model is developed to predict the daily maximum 8-h average ozone concentration ([O₃]MDA8) across China in 2015 for human exposure assessment. This model captures the observed spatiotemporal variations of [O₃]MDA8 using data on meteorology, elevation, and recent-year emission inventories (cross-validation R² = 0.69 and RMSE = 26 μg/m³). Compared with chemical transport models, which require many input variables and expensive computation, the random forest model shows comparable or higher predictive performance based on only a handful of readily available variables at much lower computational cost. The nationwide population-weighted [O₃]MDA8 is predicted to be 84 ± 23 μg/m³ annually, with the highest seasonal mean in the summer (103 ± 8 μg/m³). The summer [O₃]MDA8 is predicted to be highest in North China (125 ± 17 μg/m³). Approximately 58% of the population lives in areas with more than 100 nonattainment days ([O₃]MDA8 > 100 μg/m³), and 12% of the population is exposed to [O₃]MDA8 > 160 μg/m³ (WHO Interim Target 1) for more than 30 days. The most populous zones in China, namely the Beijing-Tianjin Metro, Yangtze River Delta, Pearl River Delta, and Sichuan Basin, are predicted to have 154, 141, 124, and 98 nonattainment days, respectively. Effective controls of O₃ pollution are urgently needed for the highly populated zones, especially the Beijing-Tianjin Metro, with a seasonal [O₃]MDA8 of 140 ± 29 μg/m³ in summer. To the best of the authors' knowledge, this study is the first statistical modeling work of ambient O₃ for China at the national level. This timely and extensively validated [O₃]MDA8 dataset is valuable for refining epidemiological analyses of O₃ pollution in China. Copyright © 2017 Elsevier Ltd. All rights reserved.
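    The exposure figures above rest on two simple aggregations over the predicted field: a population-weighted mean concentration, and the share of the population living where the number of days with [O₃]MDA8 > 100 μg/m³ exceeds 100. The sketch assumes gridded daily predictions with one population count per grid cell; the grid layout and the function names are illustrative assumptions, not the authors' code.

```python
def population_weighted_mean(cell_means, populations):
    """Population-weighted average of per-cell mean concentrations (μg/m³)."""
    total = sum(populations)
    return sum(c * p for c, p in zip(cell_means, populations)) / total


def nonattainment_days(daily_mda8, threshold=100.0):
    """Number of days on which [O3]MDA8 strictly exceeds `threshold` μg/m³."""
    return sum(1 for c in daily_mda8 if c > threshold)


def population_share_exceeding(cells, populations, threshold=100.0, min_days=100):
    """Fraction of the population living in cells with more than `min_days` nonattainment days."""
    exposed = sum(
        p for series, p in zip(cells, populations)
        if nonattainment_days(series, threshold) > min_days
    )
    return exposed / sum(populations)
```

    Weighting by population rather than by area is what makes the national figure an exposure measure: a sparsely populated high-ozone cell contributes little.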

  1. Probabilistic risk models for multiple disturbances: an example of forest insects and wildfires

    Science.gov (United States)

    Haiganoush K. Preisler; Alan A. Ager; Jane L. Hayes

    2010-01-01

    Building probabilistic risk models for highly random forest disturbances like wildfire and forest insect outbreaks is challenging. Modeling the interactions among natural disturbances is even more difficult. In the case of wildfire and forest insects, we looked at the probability of a large fire given an insect outbreak and also the incidence of insect outbreaks...

  2. Valuing the Recreational Benefits from the Creation of Nature Reserves in Irish Forests

    Science.gov (United States)

    Riccardo Scarpa; Susan M. Chilton; W. George Hutchinson; Joseph Buongiorno

    2000-01-01

    Data from a large-scale contingent valuation study are used to investigate the effects of forest attributes on willingness to pay for forest recreation in Ireland. In particular, the presence of a nature reserve in the forest is found to significantly increase the visitors' willingness to pay. A random utility model is used to estimate the welfare change associated...

  3. Random Forest Approach to QSPR Study of Fluorescence Properties Combining Quantum Chemical Descriptors and Solvent Conditions.

    Science.gov (United States)

    Chen, Chia-Hsiu; Tanaka, Kenichi; Funatsu, Kimito

    2018-04-22

    A quantitative structure-property relationship (QSPR) approach was used to study the fluorescence absorption and emission wavelengths of 413 fluorescent dyes in different solvent conditions. The dyes included chromophore derivatives of cyanine, xanthene, coumarin, pyrene, naphthalene, anthracene, etc., with wavelengths ranging from 250 nm to 800 nm. An ensemble method, random forest (RF), was employed to construct nonlinear prediction models, and its results were compared with those of linear partial least squares and nonlinear support vector machine regression models. Quantum chemical descriptors derived from density functional theory and solvent information were used to construct the models. The best prediction results were obtained from the RF model, with squared correlation coefficients (R²) of 0.940 and 0.905 for λabs and λem, respectively. The descriptors used in the models are discussed in detail by comparing their RF feature importances.

  4. A Quantitative Index of Forest Structural Sustainability

    Directory of Open Access Journals (Sweden)

    Jonathan A. Cale

    2014-07-01

    Forest health is a complex concept including many ecosystem functions, interactions and values. We develop a quantitative system applicable to many forest types to assess tree mortality with respect to stable forest structure and composition. We quantify impacts of observed tree mortality on structure by comparison to baseline mortality, and then develop a system that distinguishes between structurally stable and unstable forests. An empirical multivariate index of structural sustainability and a threshold value (70.6), derived from 22 nontropical tree species’ datasets, differentiated structurally sustainable from unsustainable diameter distributions. Twelve of 22 species populations were sustainable, with a mean score of 33.2 (median = 27.6). Ten species populations were unsustainable, with a mean score of 142.6 (median = 130.1). Among them, the unsustainability of Fagus grandifolia, Pinus lambertiana, P. ponderosa, and Nothofagus solandri was attributable to known disturbances, whereas that of the Abies balsamea, Acer rubrum, Calocedrus decurrens, Picea engelmannii, P. rubens, and Prunus serotina populations was not. This approach provides an ecological framework for rational management decisions using routine inventory data to objectively determine the scope and direction of change in structure and composition, assess excessive or insufficient mortality, compare disturbance impacts in time and space, and prioritize management needs and the allocation of scarce resources.

  5. Diversity, composition and host-species relationships of epiphytic orchids and ferns in two forests in Nepal

    Czech Academy of Sciences Publication Activity Database

    Adhikari, Y. P.; Fischer, A.; Fischer, H. S.; Rokaya, Maan Bahadur; Bhattarai, P.; Gruppe, A.

    2017-01-01

    Vol. 14, No. 6 (2017), pp. 1065-1075 ISSN 1672-6316 R&D Projects: GA ČR GB14-36098G Institutional support: RVO:86652079 Keywords: vascular epiphytes * kathmandu valley * managed forests * dry forest * land-use * richness * conservation * biodiversity * assemblages * preferences * Environmental factors * Epiphytes * Large trees * Indicator species * Multivariate and univariate analyses * Permutations tests Subject RIV: EF - Botanics OECD field: Environmental sciences (social aspects to be 5.7) Impact factor: 1.016, year: 2016

  6. Simple and Multivariate Relationships Between Spiritual Intelligence with General Health and Happiness.

    Science.gov (United States)

    Amirian, Mohammad-Elyas; Fazilat-Pour, Masoud

    2016-08-01

    The present study examined simple and multivariate relationships of spiritual intelligence with general health and happiness. The method employed was descriptive and correlational. King's Spiritual Quotient scales, the GHQ-28 and the Oxford Happiness Inventory were completed by a sample of 384 students, selected by stratified random sampling from the students of Shahid Bahonar University of Kerman. The data were subjected to descriptive and inferential statistics, including correlations and multivariate regressions. Bivariate correlations support a positive and significant predictive value of spiritual intelligence for general health and happiness. Further analysis showed that, among the spiritual intelligence subscales, existential critical thinking inversely predicted general health and happiness. In addition, happiness was positively predicted by generation of personal meaning and by transcendental awareness. The findings are discussed in line with previous studies and the relevant theoretical background.

  7. Revegetation of coal mine soil with forest litter

    Energy Technology Data Exchange (ETDEWEB)

    Day, A.D.; Ludeke, K.L.; Thames, J.L.

    1986-11-01

    Forest litter, a good source of organic matter and seeds, was applied to undisturbed soil and to coal mine soil (spoils) in experiments conducted on the Black Mesa Coal Mine near Kayenta, Arizona over a 2-year period (1977-1978). Germination, seedling establishment, plant height and ground cover were evaluated for two seeding treatments (forest litter and no forest litter) and two soil moisture treatments (natural rainfall and natural rainfall plus irrigation). The forest litter was obtained at random from the Coconino National Forest, broadcast over the surface of the soil materials and incorporated into the surface 5 cm of each soil material. Germination, seedling establishment, plant height and ground cover on undisturbed soil and coal mine soil were higher when forest litter was applied than when it was not, and when natural rainfall was supplemented with sprinkler irrigation than when it was not. Applications of forest litter and supplemental irrigation may ensure successful establishment of vegetation on areas disturbed by open-pit coal mining.

  8. Multivariate covariance generalized linear models

    DEFF Research Database (Denmark)

    Bonat, W. H.; Jørgensen, Bent

    2016-01-01

    We propose a general framework for non-normal multivariate data analysis called multivariate covariance generalized linear models, designed to handle multivariate response variables, along with a wide range of temporal and spatial correlation structures defined in terms of a covariance link function combined with a matrix linear predictor involving known matrices. The method is motivated by three data examples that are not easily handled by existing methods. The first example concerns multivariate count data, the second involves response variables of mixed types, combined with repeated... ...are fitted by using an efficient Newton scoring algorithm based on quasi-likelihood and Pearson estimating functions, using only second-moment assumptions. This provides a unified approach to a wide variety of types of response variables and covariance structures, including multivariate extensions...

  9. Distance to seed sources and land-use history affect forest development over a long-term heathland to forest succession

    DEFF Research Database (Denmark)

    Kepfer Rojas, Sebastian; Schmidt, Inger Kappel; Ransijn, Johannes

    2014-01-01

    ...? Do these effects change over time? Location: A 350-ha heathland (Nørholm) in southwest Denmark was abandoned in 1895 and left to free succession. Prior to abandonment the heathland had been under traditional management for centuries. Method: Trees and shrubs were recorded and measured in ten surveys spanning 91 yr (1921–2012). In the first nine surveys, complete censuses were used, whereas 116 randomly placed plots (10-m radius) were used in the most recent survey. We used mixed models and different multivariate techniques (non-metric multidimensional scaling and permutational multivariate ANOVA...

  10. Multivariate Statistical Process Control Charts: An Overview

    OpenAIRE

    Bersimis, Sotiris; Psarakis, Stelios; Panaretos, John

    2006-01-01

    In this paper we discuss the basic procedures for the implementation of multivariate statistical process control via control charting. Furthermore, we review multivariate extensions for all kinds of univariate control charts, such as multivariate Shewhart-type control charts, multivariate CUSUM control charts and multivariate EWMA control charts. In addition, we review unique procedures for the construction of multivariate control charts, based on multivariate statistical techniques such as p...
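    Multivariate Shewhart-type charts are commonly built on Hotelling's T² statistic, T² = (x − μ)ᵀ Σ⁻¹ (x − μ), computed for each new observation x and compared with an upper control limit. A minimal sketch, assuming the in-control mean μ and covariance Σ are known (phase II), in which case the limit is the chi-square quantile with p degrees of freedom; to stay dependency-free, the limit helper covers only p = 2, where that quantile has the closed form −2 ln α. Helper names are hypothetical.

```python
import math


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def hotelling_t2(x, mu, sigma):
    """T^2 = d' Sigma^{-1} d with d = x - mu, computed without forming the inverse."""
    d = [xi - mi for xi, mi in zip(x, mu)]
    w = solve(sigma, d)
    return sum(di * wi for di, wi in zip(d, w))


def ucl_two_variables(alpha):
    """Chi-square(2 df) upper quantile at level alpha: -2 ln(alpha)."""
    return -2.0 * math.log(alpha)


def t2_chart(samples, mu, sigma, ucl):
    """Return (T^2, out_of_control) for each observation."""
    points = []
    for x in samples:
        t2 = hotelling_t2(x, mu, sigma)
        points.append((t2, t2 > ucl))
    return points
```

    With μ = (0, 0), unit variances and correlation 0.5, the point (1, 1) gives T² = 4/3, while (2, −2) gives T² = 16 and signals at α = 0.005 (limit about 10.6), even though each coordinate lies within two standard deviations. Separate univariate charts would miss that second point, which breaks the correlation structure.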

  11. On the use of spectra from portable Raman and ATR-IR instruments in synthesis route attribution of a chemical warfare agent by multivariate modeling.

    Science.gov (United States)

    Wiktelius, Daniel; Ahlinder, Linnea; Larsson, Andreas; Höjer Holmgren, Karin; Norlin, Rikard; Andersson, Per Ola

    2018-08-15

    Collecting data under field conditions for forensic investigations of chemical warfare agents calls for the use of portable instruments. In this study, a set of aged, crude preparations of sulfur mustard were characterized spectroscopically, without any sample preparation, using handheld Raman and portable IR instruments. The spectral data were used to construct Random Forest multivariate models for attributing test set samples to the synthetic method used for their production. Colored and fluorescent samples were included in the study, which made Raman spectroscopy challenging, although fluorescence was diminished by using an excitation wavelength of 1064 nm. The predictive power of models constructed with IR or Raman data alone, as well as with combined data, was investigated. Both techniques gave useful data for attribution. Model performance was enhanced when Raman and IR spectra were combined, allowing correct classification of 19/23 (83%) of test set spectra. The results demonstrate that data obtained with spectroscopy instruments amenable to field deployment can be useful in forensic studies of chemical warfare agents. Copyright © 2018 Elsevier B.V. All rights reserved.

  12. Methods of Multivariate Analysis

    CERN Document Server

    Rencher, Alvin C

    2012-01-01

    Praise for the Second Edition "This book is a systematic, well-written, well-organized text on multivariate analysis packed with intuition and insight . . . There is much practical wisdom in this book that is hard to find elsewhere."-IIE Transactions Filled with new and timely content, Methods of Multivariate Analysis, Third Edition provides examples and exercises based on more than sixty real data sets from a wide variety of scientific fields. It takes a "methods" approach to the subject, placing an emphasis on how students and practitioners can employ multivariate analysis in real-life sit

  13. Continuous multivariate exponential extension

    International Nuclear Information System (INIS)

    Block, H.W.

    1975-01-01

    The Freund-Weinman multivariate exponential extension is generalized to the case of nonidentically distributed marginal distributions. A fatal shock model is given for the resulting distribution. Results in the bivariate case and the concept of constant multivariate hazard rate lead to a continuous distribution related to the multivariate exponential distribution (MVE) of Marshall and Olkin. This distribution is shown to be a special case of the extended Freund-Weinman distribution. A generalization of the bivariate model of Proschan and Sullo leads to a distribution which contains both the extended Freund-Weinman distribution and the MVE
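    The fatal shock model behind the Marshall–Olkin multivariate exponential distribution (MVE) is easiest to see in the bivariate case: three independent exponential shocks with rates λ1, λ2 and λ12 strike component 1, component 2, or both, and each component fails at the first shock that reaches it. The simulation below sketches only this classical bivariate MVE, not Block's extended Freund-Weinman distribution; function and parameter names are hypothetical.

```python
import random


def marshall_olkin_bve(lam1, lam2, lam12, n, seed=0):
    """Draw n pairs (X1, X2) from the bivariate Marshall-Olkin exponential via fatal shocks."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1 = rng.expovariate(lam1)    # shock fatal to component 1 only
        z2 = rng.expovariate(lam2)    # shock fatal to component 2 only
        z12 = rng.expovariate(lam12)  # common shock fatal to both components
        pairs.append((min(z1, z12), min(z2, z12)))
    return pairs


if __name__ == "__main__":
    sample = marshall_olkin_bve(1.0, 1.0, 2.0, 100_000)
    tie_rate = sum(x1 == x2 for x1, x2 in sample) / len(sample)
    mean_x1 = sum(x1 for x1, _ in sample) / len(sample)
    print(f"P(X1 = X2) ~ {tie_rate:.3f} (theory 0.5)")
    print(f"E[X1] ~ {mean_x1:.3f} (theory {1 / 3:.3f})")
```

    X1 is marginally exponential with rate λ1 + λ12, and the two lifetimes coincide exactly, through the common shock, with probability λ12 / (λ1 + λ2 + λ12). That positive probability of simultaneous failure is what distinguishes the MVE from absolutely continuous bivariate exponentials.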

  14. Plant Traits Demonstrate That Temperate and Tropical Giant Eucalypt Forests Are Ecologically Convergent with Rainforest Not Savanna

    Science.gov (United States)

    Tng, David Y. P.; Jordan, Greg J.; Bowman, David M. J. S.

    2013-01-01

    Ecological theory differentiates rainforest and open vegetation in many regions as functionally divergent alternative stable states with transitional (ecotonal) vegetation between the two forming transient unstable states. This transitional vegetation is of considerable significance, not only as a test case for theories of vegetation dynamics, but also because this type of vegetation is of major economic importance, and is home to a suite of species of conservation significance, including the world’s tallest flowering plants. We therefore created predictions of patterns in plant functional traits that would test the alternative stable states model of these systems. We measured functional traits of 128 trees and shrubs across tropical and temperate rainforest – open vegetation transitions in Australia, with giant eucalypt forests situated between these vegetation types. We analysed a set of functional traits: leaf carbon isotopes, leaf area, leaf mass per area, leaf slenderness, wood density, maximum height and bark thickness, using univariate and multivariate methods. For most traits, giant eucalypt forest was similar to rainforest, while rainforest, particularly tropical rainforest, was significantly different from the open vegetation. In multivariate analyses, tropical and temperate rainforest diverged functionally, and both segregated from open vegetation. Furthermore, the giant eucalypt forests overlapped in function with their respective rainforests. The two types of giant eucalypt forests also exhibited greater overall functional similarity to each other than to any of the open vegetation types. We conclude that tropical and temperate giant eucalypt forests are ecologically and functionally convergent. The lack of clear functional differentiation from rainforest suggests that giant eucalypt forests are unstable states within the basin of attraction of rainforest. Our results have important implications for giant eucalypt forest management. PMID:24358359

  15. Random eigenvalue problems revisited

    Indian Academy of Sciences (India)

    statistical distributions; linear stochastic systems. 1. ... dimensional multivariate Gaussian random vector with mean µ ∈ ℝ^m and covariance ... 5, the proposed analytical methods are applied to a three degree-of-freedom system and the ... The joint pdf of ω1 and ω3 is, however, close to a bivariate Gaussian density function.

  16. Successional change and resilience of a very dry tropical deciduous forest following shifting agriculture

    NARCIS (Netherlands)

    Lebrija Trejos, E.E.; Bongers, F.J.J.M.; Pérez-García, E.; Meave, J.

    2008-01-01

    We analyzed successional patterns in a very dry tropical deciduous forest by using 15 plots differing in age after abandonment and contrasted them to secondary successions elsewhere in the tropics. We used multivariate ordination and nonlinear models to examine changes in composition and structure

  17. Correlations between Motor Symptoms across Different Motor Tasks, Quantified via Random Forest Feature Classification in Parkinson’s Disease

    Directory of Open Access Journals (Sweden)

    Andreas Kuhner

    2017-11-01

    Background: Objective assessments of Parkinson’s disease (PD) patients’ motor state using motion capture techniques are still rarely used in clinical practice, even though they may improve clinical management. One major obstacle relates to the high dimensionality of motor abnormalities in PD. We aimed to extract global motor performance measures covering different everyday motor tasks, as a function of a clinical intervention, i.e., deep brain stimulation (DBS) of the subthalamic nucleus. Methods: We followed a data-driven, machine-learning approach and propose performance measures that employ Random Forests with probability distributions. We applied this method to 14 PD patients with DBS switched off or on, and 26 healthy control subjects performing the Timed Up and Go Test (TUG), the Functional Reach Test (FRT), a hand coordination task, walking 10 m straight, and a 90° curve. Results: For each motor task, a Random Forest identified a specific set of metrics that optimally separated PD off DBS from healthy subjects. We noted the highest accuracy (94.6%) for standing up. This corresponded to a sensitivity of 91.5% for detecting a PD patient off DBS, and a specificity of 97.2%, representing the rate of correctly identified healthy subjects. We then calculated performance measures based on these sets of metrics and applied those results to characterize symptom severity in different motor tasks. Task-specific symptom severity measures correlated significantly with each other and with the Unified Parkinson’s Disease Rating Scale (UPDRS), part III (r² = 0.79). Agreement rates between different measures ranged from 79.8 to 89.3%. Conclusion: The close correlation of PD patients’ various motor abnormalities quantified by different, task-specific severity measures suggests that these abnormalities are only facets of an underlying one-dimensional severity of motor deficits. The identification and characterization of this underlying motor deficit

  18. Canopy structure and topography effects on snow distribution at a catchment scale: Application of multivariate approaches

    Directory of Open Access Journals (Sweden)

    Jenicek Michal

    2018-03-01

    The knowledge of snowpack distribution at a catchment scale is important for predicting snowmelt runoff. The objective of this study is to select and quantify the most important factors governing snowpack distribution, with special interest in the role of different canopy structures. We applied a simple distributed sampling design with measurement of snow depth and snow water equivalent (SWE) at a catchment scale. We selected eleven predictors related to the character of specific localities (such as elevation, slope orientation and leaf area index) and to winter meteorological conditions (such as irradiance, sum of positive air temperature and sum of new snow depth). The forest canopy structure was described using parameters calculated from hemispherical photographs. A degree-day approach was used to calculate melt factors. Principal component analysis, cluster analysis and Spearman rank correlation were applied to reduce the number of predictors and to analyze the measured data. The SWE at forest sites was 40% lower than in open areas, but this value depended on the canopy structure. Snow ablation in large openings was on average almost two times faster than at forest sites. Snow ablation in the forest was 18% faster after forest defoliation (due to the bark beetle). The results of the multivariate analyses showed that leaf area index was a better predictor of SWE distribution during the accumulation period, while irradiance was a better predictor during the snowmelt period. Despite some uncertainty, parameters derived from hemispherical photographs may replace measured incoming solar radiation if this meteorological variable is not available.
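    In a degree-day approach, melt is tied to the sum of positive air temperatures: a melt factor (mm °C⁻¹ d⁻¹) can be estimated as the SWE lost over a period divided by the positive degree-days in that period, and daily melt is then the factor times the positive temperature. A minimal sketch, assuming daily mean temperatures, a 0 °C base temperature, and that all SWE loss in the period is melt (no new snowfall, sublimation ignored); the abstract does not give the study's exact formulation, and the function names are hypothetical.

```python
def positive_degree_days(daily_mean_temps, base=0.0):
    """Sum of daily mean temperatures above `base` (°C·d)."""
    return sum(max(t - base, 0.0) for t in daily_mean_temps)


def melt_factor(swe_start_mm, swe_end_mm, daily_mean_temps, base=0.0):
    """Degree-day melt factor (mm °C^-1 d^-1), assuming all SWE loss is melt."""
    pdd = positive_degree_days(daily_mean_temps, base)
    if pdd == 0:
        raise ValueError("no positive degree-days in the period")
    return (swe_start_mm - swe_end_mm) / pdd


def degree_day_melt(ddf, daily_mean_temp, base=0.0):
    """Daily melt (mm) predicted from a melt factor and one daily mean temperature."""
    return ddf * max(daily_mean_temp - base, 0.0)
```

    Fitting separate factors at open and forested sites is one way to express the faster ablation in openings reported above as a single comparable number per site.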

  19. Land surface temperature downscaling using random forest regression: primary result and sensitivity analysis

    Science.gov (United States)

    Pan, Xin; Cao, Chen; Yang, Yingbao; Li, Xiaolong; Shan, Liangliang; Zhu, Xi

    2018-04-01

    The land surface temperature (LST) derived from thermal infrared satellite images is a meaningful variable in many remote sensing applications. However, the spatial resolution of current satellite thermal infrared sensors is too coarse for many of these applications. In this study, an LST image was downscaled using a random forest model relating LST to multiple predictors in an arid region with an oasis-desert ecotone. The proposed downscaling approach was evaluated using LST derived from the MODIS LST product for Zhangye City in the Heihe Basin. The primary results showed that the distribution of downscaled LST matched that of the oasis and desert ecosystems. Sensitivity analysis showed that the factors to which LST downscaling was most sensitive were the modified normalized difference water index (MNDWI)/normalized multi-band drought index (NMDI) for water; the soil adjusted vegetation index (SAVI)/shortwave infrared reflectance (SWIR)/normalized difference vegetation index (NDVI) for vegetation; the normalized difference building index (NDBI)/SAVI for buildings; and SWIR/NDBI/MNDWI/NDWI for desert, with LST variations (at most) of 0.20/-0.22 K, 0.92/0.62/0.46 K, 0.28/-0.29 K and 3.87/-1.53/-0.64/-0.25 K, respectively, under ±0.02 predictor perturbations.
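    The ±0.02 perturbation analysis can be read as a one-at-a-time sensitivity test: each predictor is shifted up and down by 0.02 while the others are held fixed, and the change in predicted LST is recorded. The sketch assumes the fitted random forest is available as any callable mapping a dictionary of predictor values to LST (K); the toy linear model used in the check stands in for it and is purely hypothetical, as are the function names.

```python
from typing import Callable, Dict, Tuple

Predictors = Dict[str, float]


def one_at_a_time_sensitivity(
    model: Callable[[Predictors], float],
    x: Predictors,
    delta: float = 0.02,
) -> Dict[str, Tuple[float, float]]:
    """For each predictor, return (LST change at +delta, LST change at -delta)."""
    base = model(x)
    result = {}
    for name in x:
        up = dict(x)
        up[name] += delta
        down = dict(x)
        down[name] -= delta
        result[name] = (model(up) - base, model(down) - base)
    return result


def toy_lst_model(p: Predictors) -> float:
    # Hypothetical stand-in for a fitted random forest regressor (output in K).
    return 300.0 - 10.0 * p["NDVI"] + 5.0 * p["NDBI"]
```

    A real tree ensemble is piecewise constant in each predictor, so a perturbation this small may produce no change at all for some pixels. This is one reason to report the maximum variation per land-cover class, as the study does.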

  20. Identification of species based on DNA barcode using k-mer feature vector and Random forest classifier.

    Science.gov (United States)

    Meher, Prabina Kumar; Sahu, Tanmaya Kumar; Rao, A R

    2016-11-05

    DNA barcoding is a molecular diagnostic method that allows automated and accurate identification of species based on a short, standardized fragment of DNA. To this end, this study develops a computational approach for identifying a species by comparing its barcode with the barcode sequences of known species present in a reference library. Each barcode sequence was first mapped onto a numeric feature vector based on k-mer frequencies, and Random forest methodology was then employed on the transformed dataset for species identification. The proposed approach outperformed similarity-based, tree-based and diagnostic-based approaches and was found comparable to existing supervised-learning-based approaches in terms of species identification success rate when compared using real and simulated datasets. Based on the proposed approach, an online web interface, SPIDBAR, has also been developed and made freely available at http://cabgrid.res.in:8080/spidbar/ for species identification by taxonomists. Copyright © 2016 Elsevier B.V. All rights reserved.
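    The mapping from a barcode sequence to a numeric feature vector based on k-mer frequencies can be sketched as follows. The abstract does not state k, whether windows overlap, or how ambiguous bases are handled, so overlapping windows, relative frequencies over the ACGT alphabet, and skipping windows that contain other characters (such as N) are assumptions here; the function name is hypothetical.

```python
from itertools import product


def kmer_feature_vector(seq, k=3, alphabet="ACGT"):
    """Relative frequencies of all 4**k k-mers in `seq`, in lexicographic k-mer order."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {km: i for i, km in enumerate(kmers)}
    counts = [0] * len(kmers)
    seq = seq.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        km = seq[i:i + k]           # overlapping window of length k
        if km in index:             # skip windows containing ambiguous bases such as N
            counts[index[km]] += 1
            total += 1
    return [c / total if total else 0.0 for c in counts]
```

    Every sequence, whatever its length, becomes a vector of the same dimension (4**k), which is what lets a standard classifier such as a random forest be trained on the transformed dataset.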

  1. Random Forests to Predict Rectal Toxicity Following Prostate Cancer Radiation Therapy

    International Nuclear Information System (INIS)

    Ospina, Juan D.; Zhu, Jian; Chira, Ciprian; Bossi, Alberto; Delobel, Jean B.; Beckendorf, Véronique; Dubray, Bernard; Lagrange, Jean-Léon; Correa, Juan C.

    2014-01-01

    Purpose: To propose a random forest normal tissue complication probability (RF-NTCP) model to predict late rectal toxicity following prostate cancer radiation therapy, and to compare its performance to that of classic NTCP models. Methods and Materials: Clinical data and dose-volume histograms (DVH) were collected from 261 patients who received 3-dimensional conformal radiation therapy for prostate cancer with at least 5 years of follow-up. The series was split 1000 times into training and validation cohorts. An RF was trained to predict the risk of 5-year overall rectal toxicity and bleeding. Parameters of the Lyman-Kutcher-Burman (LKB) model were identified and a logistic regression model was fit. The performance of all the models was assessed by computing the area under the receiver operating characteristic curve (AUC). Results: The 5-year grade ≥2 overall rectal toxicity and grade ≥1 and grade ≥2 rectal bleeding rates were 16%, 25%, and 10%, respectively. The RF-NTCP model showed predictive capability for all 3 toxicity endpoints in both the training and validation cohorts. Age and anticoagulant use were found to be predictors of rectal bleeding. The AUC for RF-NTCP ranged from 0.66 to 0.76, depending on the toxicity endpoint. The AUC values for the LKB-NTCP were statistically significantly inferior, ranging from 0.62 to 0.69. Conclusions: The RF-NTCP model may be a useful new tool for predicting late rectal toxicity, as it can include variables other than DVH, and thus appears to be a strong competitor to classic NTCP models.
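The AUC used above to compare the RF-NTCP and LKB models has a simple probabilistic reading. It is the probability that a randomly chosen patient with toxicity receives a higher predicted risk than a randomly chosen patient without it, with ties counted as one half. This is the normalized Mann-Whitney U statistic. The following is a minimal O(n²) sketch of that definition. The scores and labels in the example are hypothetical and are not taken from the study.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs in which the positive case scores higher
    (ties count 1/2). Labels: 1 = event (toxicity), 0 = no event."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted risks and observed outcomes for four patients.
example_auc = auc([0.9, 0.7, 0.4, 0.3], [1, 0, 1, 0])  # 3 of 4 pairs ranked correctly
```

In a repeated-split design like the one described, this value would be computed on each validation cohort and then summarized across the splits.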

  2. Role of forest income in rural household livelihoods

    DEFF Research Database (Denmark)

    Misbahuzzaman, Khaled; Smith-Hall, Carsten

    2015-01-01

    as Village Common Forests (VCFs), which provide valuable resources for community use. An investigation was made of the role of forest income in the livelihoods of selected VCF communities in the Bandarban and Rangamati districts of the CHTs. Both quantitative and qualitative analyses were employed to examine...... the household livelihood system of the respondents selected at random from 7 villages. Data were collected through participatory rural appraisal and structured quarterly surveys. The contribution of all forest-related income to average total household income was found to be much smaller (11.59 %) than that of agricultural income (77.02 %). However, VCFs provide bamboos, which are the largest source of household forest income. Moreover, they harbour rich native tree diversity, which is vital for maintaining the perennial water sources on which most household livelihood activities depend. Therefore, it seems...

  3. LigandRFs: random forest ensemble to identify ligand-binding residues from sequence information alone

    KAUST Repository

    Chen, Peng

    2014-12-03

    Background: Protein-ligand binding is important for some proteins to perform their functions. Protein-ligand binding sites are the residues of proteins that physically bind to ligands. Despite recent advances in the computational prediction of protein-ligand binding sites, state-of-the-art methods search for similar, known structures of the query and predict the binding sites based on the solved structures. However, such structural information is not commonly available. Results: In this paper, we propose a sequence-based approach to identify protein-ligand binding residues. We propose a combination technique to reduce the effects of different sliding residue windows in the process of encoding input feature vectors. Moreover, because ligand-binding sites are heavily outnumbered by non-ligand-binding sites, we construct several balanced data sets and train a random forest (RF)-based classifier on each of them. The ensemble of these RF classifiers forms a sequence-based protein-ligand binding site predictor. Conclusions: Experimental results on the CASP9 and CASP8 data sets demonstrate that our method compares favorably with state-of-the-art protein-ligand binding site prediction methods.
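One common way to build several balanced data sets from a highly imbalanced one is to shuffle the majority class and split it into chunks about the size of the minority class. Each chunk is then paired with all minority samples. The following is a minimal sketch of that undersampling idea, assuming the positives (binding residues) are the minority. The abstract does not state how LigandRFs forms its subsets or combines the resulting RF votes, so `balanced_subsets` and its chunking rule are hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def balanced_subsets(positives: Sequence[T], negatives: Sequence[T],
                     seed: int = 0) -> List[Tuple[List[T], List[T]]]:
    """Shuffle the majority (negative) class, split it into chunks of about
    len(positives) items, and pair every chunk with all positives. Every
    negative is used exactly once; the last chunk may be smaller."""
    rng = random.Random(seed)
    neg = list(negatives)
    rng.shuffle(neg)
    size = max(1, len(positives))
    chunks = [neg[i:i + size] for i in range(0, len(neg), size)]
    return [(list(positives), chunk) for chunk in chunks]


subsets = balanced_subsets(["p1", "p2"], [f"n{i}" for i in range(5)])
```

Under this scheme, one RF classifier would be trained per subset, and the ensemble prediction could be, for example, the average of their predicted probabilities. That combination rule is also an assumption.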

  4. Quantifying and Characterizing Tonic Thermal Pain Across Subjects From EEG Data Using Random Forest Models.

    Science.gov (United States)

    Vijayakumar, Vishal; Case, Michelle; Shirinpour, Sina; He, Bin

    2017-12-01

    Effective pain assessment and management strategies are needed to better manage pain. In addition to self-report, an objective pain assessment system can provide a more complete picture of the neurophysiological basis for pain. In this study, a robust and accurate machine learning approach is developed to quantify tonic thermal pain across healthy subjects into a maximum of ten distinct classes. A random forest model was trained to predict pain scores using time-frequency wavelet representations of independent components obtained from electroencephalography (EEG) data, and the relative importance of each frequency band to pain quantification is assessed. The mean classification accuracy for predicting pain on an independent test subject on a scale of 1-10 is 89.45%, the highest among existing state-of-the-art EEG quantification algorithms. The gamma band is the most important for both intersubject and intrasubject classification accuracy. The robustness and generalizability of the classifier are demonstrated. Our results demonstrate the potential of this tool to be used clinically to help improve chronic pain treatment and to establish spectral biomarkers for future pain-related studies using EEG.

  5. Heuristic Relative Entropy Principles with Complex Measures: Large-Degree Asymptotics of a Family of Multi-variate Normal Random Polynomials

    Science.gov (United States)

    Kiessling, Michael Karl-Heinz

    2017-10-01

    Let $z\in\mathbb{C}$, let $\sigma^2>0$ be a variance, and for $N\in\mathbb{N}$ define the integrals $E_N(z;\sigma) := \frac{1}{\sigma}\int_{\mathbb{R}} (x^2+z^2)\,\frac{e^{-\frac{1}{2\sigma^2}x^2}}{\sqrt{2\pi}}\,dx$ if $N=1$, and $E_N(z;\sigma) := \frac{1}{\sigma}\int_{\mathbb{R}^N}\prod\prod\limits_{1\le k<l\le N}\cdots$ if $N>1$. These are expected values of the polynomials $P_N(z)=\prod_{1\le n\le N}(X_n^2+z^2)$, whose $2N$ zeros $\{\pm iX_k\}_{k=1,\ldots,N}$ are generated by $N$ identically distributed multivariate mean-zero normal random variables $\{X_k\}_{k=1}^{N}$ with covariance $\mathrm{Cov}_N(X_k,X_l)=\left(1+\frac{\sigma^2-1}{N}\right)\delta_{k,l}+\frac{\sigma^2-1}{N}\left(1-\delta_{k,l}\right)$. The $E_N(z;\sigma)$ are polynomials in $z^2$, explicitly computable for arbitrary $N$, yet a list of the first three $E_N(z;\sigma)$ shows that the expressions become unwieldy already for moderate $N$, unless $\sigma=1$, in which case $E_N(z;1)=(1+z^2)^N$ for all $z\in\mathbb{C}$ and $N\in\mathbb{N}$. (Incidentally, commonly available computer algebra evaluates the integrals $E_N(z;\sigma)$ only for $N$ up to a dozen, due to memory constraints.) Asymptotic evaluations are needed for the large-$N$ regime. For general complex $z$ these have traditionally been limited to analytic expansion techniques; several rigorous results are proved for complex $z$ near 0. Yet if $z\in\mathbb{R}$, one can also compute this "infinite-degree" limit with the help of the familiar relative entropy principle for probability measures; a rigorous proof of this fact is supplied. Computer-algebra-generated evidence is presented in support of a conjecture that a generalization of the relative entropy principle to signed or complex measures governs the $N\to\infty$ asymptotics of the regime $iz\in\mathbb{R}$. Potential generalizations, in particular to point vortex ensembles and the prescribed Gauss curvature problem, and to random matrix ensembles, are emphasized.
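For real z, the reading of E_N(z;σ) as the expected value of P_N(z) can be checked numerically. The N = 1 integral gives E_1(z;σ) = σ² + z², and the abstract states E_N(z;1) = (1+z²)^N. The following is a minimal Monte Carlo sketch, not the paper's (analytic) method. It assumes σ ≥ 1, so that the stated covariance can be realized as X_n = Y_n + cW with independent standard normals Y_n and W, where c² = (σ²−1)/N. The function name `estimate_E` and the sample sizes are illustrative choices.

```python
import math
import random


def estimate_E(N: int, z: float, sigma: float,
               samples: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate of E[prod_n (X_n^2 + z^2)] for real z, where the
    X_n are mean-zero normals with Var = 1 + (sigma^2 - 1)/N and
    Cov = (sigma^2 - 1)/N for n != m. Assumes sigma >= 1 so the shared-factor
    construction X_n = Y_n + c*W is valid."""
    if sigma < 1:
        raise ValueError("this construction assumes sigma >= 1")
    rng = random.Random(seed)
    c = math.sqrt((sigma ** 2 - 1) / N)  # loading on the shared factor W
    total = 0.0
    for _ in range(samples):
        w = rng.gauss(0.0, 1.0)
        prod = 1.0
        for _ in range(N):
            x = rng.gauss(0.0, 1.0) + c * w
            prod *= x * x + z * z
        total += prod
    return total / samples


approx = estimate_E(2, 0.5, 1.0)  # should be close to (1 + 0.25)**2 = 1.5625
```

Sampling only illustrates the definition at small N. As the abstract notes, the large-N regime requires asymptotic methods.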

  6. Data mining methods in the prediction of Dementia: A real-data comparison of the accuracy, sensitivity and specificity of linear discriminant analysis, logistic regression, neural networks, support vector machines, classification trees and random forests

    Directory of Open Access Journals (Sweden)

    Santana Isabel

    2011-08-01

    Background: Dementia and cognitive impairment associated with aging are a major medical and social concern. Neuropsychological testing is a key element in the diagnostic procedures of Mild Cognitive Impairment (MCI), but presently has limited value in predicting progression to dementia. We advance the hypothesis that newer statistical classification methods derived from data mining and machine learning, such as Neural Networks, Support Vector Machines and Random Forests, can improve the accuracy, sensitivity and specificity of predictions obtained from neuropsychological testing. Seven nonparametric classifiers derived from data mining methods (Multilayer Perceptron Neural Networks, Radial Basis Function Neural Networks, Support Vector Machines, CART, CHAID and QUEST Classification Trees, and Random Forests) were compared to three traditional classifiers (Linear Discriminant Analysis, Quadratic Discriminant Analysis and Logistic Regression) in terms of overall classification accuracy, specificity, sensitivity, area under the ROC curve and Press' Q. Model predictors were 10 neuropsychological tests currently used in the diagnosis of dementia. Statistical distributions of classification parameters obtained from a 5-fold cross-validation were compared using Friedman's nonparametric test. Results: Press' Q test showed that all classifiers performed better than chance alone (p ...). Conclusions: When sensitivity, specificity and overall classification accuracy are taken into account, Random Forests and Linear Discriminant Analysis rank first among all the classifiers tested in the prediction of dementia using several neuropsychological tests. These methods may be used to improve the accuracy, sensitivity and specificity of dementia predictions from neuropsychological testing.

  7. Channel Islands, Kelp Forest Monitoring, Survey, Random Point Contact, 1982-2007

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — This dataset from the Channel Islands National Park's Kelp Forest Monitoring Program has estimates of substrate composition and percent cover of selected algal and...

  8. Application of the Random Forest method to analyse epidemiological and phenotypic characteristics of Salmonella 4,[5],12:i:- and Salmonella Typhimurium strains

    DEFF Research Database (Denmark)

    Barco, L.; Mancin, M.; Ruffa, M.

    2012-01-01

    in Italy, particularly as far as veterinary isolates are concerned. For this reason, a data set of 877 strains isolated in the north-east of Italy from foodstuffs, animals and environment was analysed during 2005-2010. The Random Forests (RF) method was used to identify the most important epidemiological...... and phenotypic variables to show the difference between the two serovars. Both descriptive analysis and RF revealed that S. 4,[5],12:i:- is less heterogeneous than S. Typhimurium. RF highlighted that phage type was the most important variable to differentiate the two serovars. The most common phage types...

  9. Modelling Forest α-Diversity and Floristic Composition — On the Added Value of LiDAR plus Hyperspectral Remote Sensing

    Directory of Open Access Journals (Sweden)

    Martin Wegmann

    2012-09-01

    The decline of biodiversity is one of the major current global issues. Still, there is a widespread lack of information about the spatial distribution of individual species and of biodiversity as a whole. Remote sensing techniques are increasingly used for biodiversity monitoring, and the combination of LiDAR and hyperspectral data in particular is expected to deliver valuable information. In this study, spatial patterns of vascular plant community composition and alpha-diversity of a temperate montane forest in Germany were analysed for different forest strata. The predictive power of LiDAR (LiD) and hyperspectral (MNF) datasets, alone and combined (MNF+LiD), was compared using random forest regression in a ten-fold cross-validation scheme that included feature selection and model tuning. The final models were used for spatial predictions. Species richness could be predicted with varying accuracy (R² = 0.26 to 0.55), depending on the forest layer. In contrast, community composition of the different layers, obtained by multivariate ordination, could in part be modelled with high accuracy for the first ordination axis (R² = 0.39 to 0.78), but only with poor accuracy for the second axis (R² ≤ 0.3). LiDAR variables were the best predictors of total species richness across all forest layers (R² LiD = 0.3, R² MNF = 0.08, R² MNF+LiD = 0.2), while for community composition across all forest layers, hyperspectral and LiDAR predictors achieved similar performance (R² LiD = 0.75, R² MNF = 0.76, R² MNF+LiD = 0.78). The improvement in R², if any, was small (≤0.07) when using both LiDAR and hyperspectral data compared with using only the best single predictor set. This study shows the high potential of LiDAR and hyperspectral data for plant biodiversity modelling, but also calls for a critical evaluation of the added value of combining both with respect to acquisition costs.

  10. A multivariate model for predicting segmental body composition.

    Science.gov (United States)

    Tian, Simiao; Mioche, Laurence; Denis, Jean-Baptiste; Morio, Béatrice

    2013-12-01

    The aims of the present study were to propose a multivariate model for simultaneously predicting body, trunk and appendicular fat and lean masses from easily measured variables, and to compare its predictive capacity with that of the available univariate models that predict body fat percentage (BF%). The dual-energy X-ray absorptiometry (DXA) dataset (52% men and 48% women) with White, Black and Hispanic ethnicities (1999-2004, National Health and Nutrition Examination Survey) was randomly divided into three sub-datasets: a training dataset (TRD), a test dataset (TED) and a validation dataset (VAD), comprising 3835, 1917 and 1917 subjects, respectively. For each sex, several multivariate prediction models were fitted on the TRD using age, weight, height and possibly waist circumference. The most accurate model was selected using the TED and then applied to the VAD and to a French DXA dataset (French DB) (526 men and 529 women) to assess its prediction accuracy in comparison with that of five published univariate models, whose adjusted formulas were re-estimated using the TRD. Waist circumference was found to improve the prediction accuracy, especially in men. For BF%, the standard error of prediction (SEP) values were 3.26 (3.75)% for men and 3.47 (3.95)% for women in the VAD (French DB), as good as those of the adjusted univariate models. Moreover, the SEP values for the prediction of body and appendicular lean masses ranged from 1.39 to 2.75 kg for both sexes. The prediction accuracy was best for age < 65 years, BMI < 30 kg/m2 and the Hispanic ethnicity. The application of our multivariate model to large populations could be useful to address various public health issues.

  11. Rural Income and Forest Reliance in Highland Guatemala

    Science.gov (United States)

    Prado Córdova, José Pablo; Wunder, Sven; Smith-Hall, Carsten; Börner, Jan

    2013-05-01

    This paper estimates rural household-level forest reliance in the western highlands of Guatemala using quantitative methods. Data were generated through an in-depth household income survey, repeated quarterly between November 2005 and November 2006, in 11 villages (n = 149 randomly selected households). The main sources of income proved to be small-scale agriculture (53 % of total household income), wages (19 %) and environmental resources (14 %). The latter came primarily from forests (11 % on average). In the poorest quintile, the forest income share was as high as 28 %. All households harvest and consume environmental products. In absolute terms, environmental income in the top quintile was 24 times higher than in the lowest. Timber and poles, seeds, firewood and leaf litter were the most important forest products. Households can be described as 'regular subsistence users': the share of subsistence income is high, with correspondingly weak integration into regional markets. Agricultural systems furthermore use important inputs from surrounding forests, although forests and agricultural uses compete in household specialization strategies. We find the main household determinants of forest income to be household size, education and asset values, as well as closeness to markets and agricultural productivity. Understanding these common but spatially differentiated patterns of environmental reliance may inform policies aimed at improving livelihoods and conserving forests.

  12. Multivariate Birkhoff interpolation

    CERN Document Server

    Lorentz, Rudolph A

    1992-01-01

    The subject of this book is Lagrange, Hermite and Birkhoff (lacunary Hermite) interpolation by multivariate algebraic polynomials. It unifies and extends a new algorithmic approach to this subject, which was introduced and developed by G.G. Lorentz and the author. One particularly interesting feature of this algorithmic approach is that it obviates the need to find a formula for the Vandermonde determinant of a multivariate interpolation in order to determine its regularity (such formulas are practically unknown anyway); instead, regularity is determined through simple geometric manipulations in Euclidean space. Although interpolation is a classical problem, it is surprising how little is known about its basic properties in the multivariate case. The book therefore starts by exploring its fundamental properties and its limitations. The main part of the book is devoted to a complete and detailed elaboration of the new technique. A chapter with an extensive selection of finite elements follows as well a...

  13. Transport of fallout radiocesium in the soil by bioturbation. A random walk model and application to a forest soil with a high abundance of earthworms

    International Nuclear Information System (INIS)

    Bunzl, K.

    2002-01-01

    It is well known that bioturbation can contribute significantly to the vertical transport of fallout radionuclides in grassland soils. To examine this effect in a forest soil as well, activity-depth profiles of Chernobyl-derived ¹³⁴Cs from a limed plot (soil: hapludalf under spruce) with a high abundance of earthworms (Lumbricus rubellus) in the Olu horizon (thickness = 3.5 cm) were evaluated and compared with the corresponding depth profiles from an adjacent control plot. For this purpose, a random-walk-based transport model was developed, which considers (1) the presence of an initial activity-depth distribution, (2) the deposition history of radiocesium at the soil surface, (3) individual diffusion/dispersion coefficients and convection rates for the different soil horizons, and (4) mixing by bioturbation within one soil horizon. With this model, the observed ¹³⁴Cs depth distributions at the control site (no bioturbation) and at the limed site could be simulated quite satisfactorily. It is shown that the observed substantial long-term enrichment of ¹³⁴Cs in the bioturbation horizon can be modeled by an exceptionally effective diffusion process, combined with a partial reflection of the randomly moving particles at the two borders of the bioturbation zone. The present model predicts significantly longer residence times of radiocesium in the organic layer of the forest soil than a first-order compartment model, which does not consider bioturbation explicitly.
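The partial reflection of randomly moving particles at the borders of the bioturbation zone can be pictured with a one-dimensional random walk. The following schematic sketch is not the published model, which also includes the initial activity distribution, the deposition history and horizon-specific diffusion and convection. It assumes symmetric steps of fixed size, a zone spanning depths `top` to `bottom` (for example 0 to 3.5 cm), and a hypothetical reflection probability `reflect_prob` applied whenever a step would leave the zone.

```python
import random


def walk_in_zone(steps: int, step_size: float, top: float, bottom: float,
                 reflect_prob: float, rng: random.Random) -> float:
    """Final depth of one particle after a symmetric random walk that starts
    at the top of the zone [top, bottom] (depth increases downward). A step
    that would cross a border is mirrored back into the zone with probability
    reflect_prob; otherwise the particle leaves and its position is returned."""
    if step_size >= bottom - top:
        raise ValueError("step_size must be smaller than the zone thickness")
    depth = top
    for _ in range(steps):
        new = depth + (step_size if rng.random() < 0.5 else -step_size)
        if new < top or new > bottom:
            if rng.random() < reflect_prob:
                # Mirror the overshoot back inside the zone.
                new = 2 * top - new if new < top else 2 * bottom - new
            else:
                return new  # particle has left the bioturbation zone
        depth = new
    return depth


rng = random.Random(42)
final_depths = [walk_in_zone(500, 0.1, 0.0, 3.5, 0.9, rng) for _ in range(200)]
retained = sum(0.0 <= d <= 3.5 for d in final_depths) / len(final_depths)
```

Raising `reflect_prob` toward 1 keeps more particles in the zone, which mirrors qualitatively how partial reflection can produce the long-term enrichment described in the abstract.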

  14. Multivariate multiscale entropy of financial markets

    Science.gov (United States)

    Lu, Yunfan; Wang, Jun

    2017-11-01

    Multivariate financial time series are of wide interest for quantifying the dynamical properties of complex phenomena in financial market systems. In this work, considering the shortcomings and limitations of univariate multiscale entropy in analyzing multivariate time series, the multivariate multiscale sample entropy (MMSE), which can evaluate complexity in multiple data channels over different timescales, is applied to quantify the complexity of financial markets. Its effectiveness and advantages are verified with numerical simulations of two well-known synthetic noise signals. For the first time, the complexity of four generated trivariate return series for each stock trading hour in Chinese stock markets is quantified, thanks to this interdisciplinary application of the method. We find that the complexity of the trivariate return series in each hour shows a significant decreasing trend as trading time progresses. The shuffled multivariate return series and the absolute multivariate return series are also analyzed. In a further new application, the complexity of global stock markets (Asia, Europe and America) is quantified by analyzing their multivariate returns. Finally, we use multivariate multiscale entropy to assess the relative complexity of normalized multivariate return volatility series with different degrees.
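In the standard formulation of multiscale entropy, each channel is first coarse-grained at scale τ by averaging consecutive, non-overlapping windows of length τ. An entropy measure (here, multivariate sample entropy) is then computed on the coarse-grained series at every scale. The following minimal sketch shows only the coarse-graining step. The multivariate sample entropy computation is omitted, and the example return values are hypothetical.

```python
from typing import List, Sequence


def coarse_grain(channels: Sequence[Sequence[float]], scale: int) -> List[List[float]]:
    """Coarse-grain every channel of a multivariate time series at the given
    scale by averaging consecutive, non-overlapping windows of length `scale`.
    A trailing partial window is dropped."""
    if scale < 1:
        raise ValueError("scale must be >= 1")
    out = []
    for x in channels:
        n = len(x) // scale
        out.append([sum(x[j * scale:(j + 1) * scale]) / scale for j in range(n)])
    return out


# Hypothetical two-channel return series.
returns = [[1.0, 3.0, 2.0, 4.0, 5.0], [0.0, 2.0, 2.0, 2.0, 9.0]]
cg = coarse_grain(returns, 2)  # each channel shortened to len // 2 = 2 points
```

Because every channel is shortened by a factor of τ, long series are needed before the entropy estimate at large scales becomes reliable.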

  15. Impact of Reducing Polarimetric SAR Input on the Uncertainty of Crop Classifications Based on the Random Forests Algorithm

    DEFF Research Database (Denmark)

    Loosvelt, Lien; Peters, Jan; Skriver, Henning

    2012-01-01

    Although the use of multidate polarimetric synthetic aperture radar (SAR) data for highly accurate land cover classification has been acknowledged in the literature, the high dimensionality of the data set remains a major issue. This study presents two different strategies to reduce the number...... acquired by the Danish EMISAR on four dates within the period April to July in 1998. The predictive capacity of each feature is analyzed by the importance score generated by random forests (RF). Results show that according to the variation in importance score over time, a distinction can be made between...... general and specific features for crop classification. Based on the importance ranking, features are gradually removed from the single-date data sets in order to construct several multidate data sets with decreasing dimensionality. In the accuracy-oriented and efficiency-oriented reduction, the input...

  16. Modelling above Ground Biomass of Mangrove Forest Using SENTINEL-1 Imagery

    Science.gov (United States)

    Labadisos Argamosa, Reginald Jay; Conferido Blanco, Ariel; Balidoy Baloloy, Alvin; Gumbao Candido, Christian; Lovern Caboboy Dumalag, John Bart; Carandang Dimapilis, Lee, , Lady; Camero Paringit, Enrico

    2018-04-01

    Many studies have estimated forest above ground biomass (AGB) using features from synthetic aperture radar (SAR). Specifically, L-band ALOS/PALSAR (wavelength 23 cm) data are often used. However, few studies have used shorter wavelengths (e.g., C-band, 3.75 cm to 7.5 cm) for forest mapping, especially in tropical forests, because these wavelengths are attenuated more strongly by volumetric objects, which absorb the propagated energy. This study aims to model AGB estimates of mangrove forest using information derived from Sentinel-1 C-band SAR data. Combinations of polarisations (VV, VH), their derivatives, grey level co-occurrence matrix (GLCM) textures, and their principal components were used as features for modelling AGB. Five models were tested with varying combinations of features: a) sigma nought polarisations and their derivatives; b) GLCM textures; c) the first five principal components; d) the combination of models a-c; and e) the important features identified by the Random Forest variable importance algorithm. Random Forest was used as the regressor to compute the AGB estimates, to avoid the overfitting caused by introducing too many features into the model. Model e obtained the highest r² of 0.79 and an RMSE of 0.44 Mg using only four features, namely σ°VH GLCM variance, σ°VH GLCM contrast, PC1 and PC2. This study shows that Sentinel-1 C-band SAR data could be used to produce acceptable AGB estimates in mangrove forest, compensating for the unavailability of longer-wavelength SAR.

  17. Testing the structure of earthquake networks from multivariate time series of successive main shocks in Greece

    Science.gov (United States)

    Chorozoglou, D.; Kugiumtzis, D.; Papadimitriou, E.

    2018-06-01

    Seismic hazard in the area of Greece is assessed by studying the structure of the earthquake network, e.g., whether it is small-world or random. In this network, a node represents a seismic zone in the study area, and a connection between two nodes is given by the correlation of the seismic activity of the two zones. To investigate the network structure, and particularly the small-world property, the earthquake correlation network is compared with randomized ones. Simulations on multivariate time series of different lengths and numbers of variables show that, for the construction of randomized networks, the method that randomizes the time series performs better than methods that directly randomize the original network connections. Based on the appropriate randomization method, the network approach is applied to time series of earthquakes that occurred between main shocks in the territory of Greece over the period 1999-2015. The characterization of networks on sliding time windows revealed that small-world structure emerges in the last time interval, shortly before the main shock.
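A correlation network of this kind, and a surrogate built by randomizing the time series rather than the links, can be sketched as follows. Several details are assumptions not stated in the abstract: zones are linked when the absolute Pearson correlation of their activity series reaches a threshold, and the randomization simply shuffles each series independently, which destroys its temporal order and its cross-correlations. The paper's exact linking rule and randomization method may differ.

```python
import random
import statistics
from typing import List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient (assumes neither series is constant)."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def correlation_network(series: Sequence[Sequence[float]],
                        threshold: float) -> List[List[int]]:
    """Symmetric 0/1 adjacency matrix: zones i and j are linked when the
    absolute correlation of their activity series is >= threshold."""
    n = len(series)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if abs(pearson(series[i], series[j])) >= threshold:
                adj[i][j] = adj[j][i] = 1
    return adj


def shuffled_surrogate(series: Sequence[Sequence[float]],
                       seed: int = 0) -> List[List[float]]:
    """Randomize each time series independently by shuffling its values."""
    rng = random.Random(seed)
    out = []
    for s in series:
        s = list(s)
        rng.shuffle(s)
        out.append(s)
    return out
```

Network statistics such as the clustering coefficient and the average path length of the original network would then be compared against those of many surrogate networks to assess the small-world property.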

  18. Multivariate stochastic analysis for Monthly hydrological time series at Cuyahoga River Basin

    Science.gov (United States)

    Zhang, L.

    2011-12-01

    Copulas have become a very powerful statistical and stochastic methodology for multivariate analysis in Environmental and Water Resources Engineering. In recent years, popular one-parameter Archimedean copulas (e.g., the Gumbel-Hougaard, Cook-Johnson and Frank copulas) and meta-elliptical copulas (e.g., the Gaussian and Student-t copulas) have been applied in multivariate hydrological analyses, e.g., of multivariate rainfall (rainfall intensity, duration and depth), floods (peak discharge, duration and volume) and droughts (drought length, mean and minimum SPI values, and drought mean areal extent). Copulas have also been applied in flood frequency analysis at the confluences of river systems by taking into account the dependence among upstream gauge stations rather than by using the hydrological routing technique. In most of the studies above, annual time series have been treated as stationary signals, i.e., as independent identically distributed (i.i.d.) random variables. In reality, however, hydrological time series, especially daily and monthly ones, cannot be considered i.i.d. random variables because of the periodicity in their data structure. The stationarity assumption is also in question because of climate change and land use and land cover (LULC) change in recent years. It is therefore necessary to re-evaluate the classic approach to hydrological time series by relaxing the stationarity assumption through a nonstationary approach. Likewise, in studying the dependence structure of hydrological time series, the assumption that all variables share the same type of univariate distribution needs to be relaxed by adopting copula theory. In this paper, the univariate monthly hydrological time series will be studied through a nonstationary time series analysis approach. The dependence structure of the multivariate monthly hydrological time series will be...
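Copulas separate the dependence between variables from their marginal distributions. A standard nonparametric starting point is the empirical copula computed from rank-based pseudo-observations. The following is a generic, minimal bivariate sketch, not the study's procedure. It assumes continuous data without ties, rescales ranks as rank/(n+1), and uses hypothetical function names.

```python
from typing import List, Sequence


def pseudo_observations(x: Sequence[float]) -> List[float]:
    """Rank-transform a sample into (0, 1) as rank / (n + 1); assumes no ties."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u


def empirical_copula(x: Sequence[float], y: Sequence[float],
                     u: float, v: float) -> float:
    """C_n(u, v): fraction of pairs whose pseudo-observations are both
    <= (u, v). Only ranks enter, so the margins of x and y do not matter."""
    pu, pv = pseudo_observations(x), pseudo_observations(y)
    return sum(1 for a, b in zip(pu, pv) if a <= u and b <= v) / len(pu)
```

For perfectly comonotone data, C_n(0.5, 0.5) approaches min(u, v) = 0.5. For countermonotone data it approaches max(u + v − 1, 0) = 0. These are the two Fréchet bounds that any copula lies between.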

  19. Downscaling of surface moisture flux and precipitation in the Ebro Valley (Spain) using analogues and analogues followed by random forests and multiple linear regression

    Directory of Open Access Journals (Sweden)

    G. Ibarra-Berastegi

    2011-06-01

    In this paper, reanalysis fields from the ECMWF have been statistically downscaled to predict surface moisture flux and daily precipitation at two observatories (Zaragoza and Tortosa, Ebro Valley, Spain) from large-scale atmospheric fields during the 1961–2001 period. Three types of downscaling models have been built: (i) analogues, (ii) analogues followed by random forests and (iii) analogues followed by multiple linear regression. The inputs consist of data (predictor fields) taken from the ERA-40 reanalysis. The predicted fields are precipitation and surface moisture flux as measured at the two observatories. To reduce the dimensionality of the problem, the ERA-40 fields have been decomposed using empirical orthogonal functions. The available daily data have been divided into two parts: a training period (1961–1996), used to find a group of about 300 analogues to build the downscaling model, and a test period (1997–2001), in which the models' performance has been assessed using independent data. In the case of surface moisture flux, the models based on analogues followed by random forests do not clearly outperform those built on analogues plus multiple linear regression, while simple averages calculated from the nearest analogues found in the training period yielded only slightly worse results. In the case of precipitation, the three types of model performed equally. These results suggest that most of the models' downscaling capabilities can be attributed to the analogues-calculation stage.
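The simplest of the three models, averaging the observations on the nearest analogue days, can be sketched as follows. Several details are assumptions: each day is represented by its vector of EOF principal-component scores, analogues are ranked by Euclidean distance in that space, and the prediction is the unweighted mean over the k nearest training days. The default k = 300 follows the "about 300 analogues" in the abstract. The second-stage random forest or multiple linear regression fitted on those analogues is not shown.

```python
import math
from typing import Sequence


def analogue_prediction(target_pcs: Sequence[float],
                        train_pcs: Sequence[Sequence[float]],
                        train_obs: Sequence[float],
                        k: int = 300) -> float:
    """Predict a local variable (e.g. daily precipitation at one observatory)
    as the mean observation over the k training days whose PC-score vectors
    are closest, in Euclidean distance, to the target day's vector."""
    ranked = sorted((math.dist(target_pcs, pcs), obs)
                    for pcs, obs in zip(train_pcs, train_obs))
    nearest = ranked[:k]
    return sum(obs for _, obs in nearest) / len(nearest)
```

In the two-stage variants, the k analogue days found this way would instead serve as a local training set for a random forest or a linear regression. The abstract reports that this second stage added little over the plain average.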

  20. Quantification of the heterogeneity of prognostic cellular biomarkers in Ewing sarcoma using automated image and random survival forest analysis.

    Directory of Open Access Journals (Sweden)

    Claudia Bühnemann

    Driven by genomic somatic variation, tumour tissues are typically heterogeneous, yet unbiased quantitative methods are rarely used to analyse heterogeneity at the protein level. Motivated by this problem, we developed automated segmentation of images of multiple biomarkers in Ewing sarcoma to generate distributions of biomarkers between and within tumour cells. We further integrate high-dimensional data with patient clinical outcomes utilising random survival forest (RSF) machine learning. Using material from cohorts of genetically diagnosed Ewing sarcoma with EWSR1 chromosomal translocations, confocal images of tissue microarrays were segmented with level sets and watershed algorithms. Each cell nucleus and cytoplasm were identified in relation to DAPI and CD99, respectively, and protein biomarkers (e.g. Ki67, pS6, Foxo3a, EGR1, MAPK) were localised relative to the nuclear and cytoplasmic regions of each cell in order to generate image feature distributions. The image distribution features were analysed with RSF in relation to known overall patient survival from three separate cohorts (185 informative cases). Variation in pre-analytical processing resulted in the elimination of a high number of non-informative images that had poor DAPI localisation or biomarker preservation (67 cases, 36%). The distribution of image features for biomarkers in the remaining high-quality material (118 cases, 104 features per case) was analysed by RSF with feature selection, and performance was assessed using internal cross-validation rather than a separate validation cohort. A prognostic classifier for Ewing sarcoma with low cross-validation error rates (0.36) comprised multiple features, including the Ki67 proliferative marker and a sub-population of cells with a low cytoplasmic/nuclear ratio of CD99. Through elimination of bias, the evaluation of high-dimensionality biomarker distribution within cell populations of a tumour using random forest analysis in quality...

  1. Quantification of the heterogeneity of prognostic cellular biomarkers in Ewing sarcoma using automated image and random survival forest analysis.

    Science.gov (United States)

    Bühnemann, Claudia; Li, Simon; Yu, Haiyue; Branford White, Harriet; Schäfer, Karl L; Llombart-Bosch, Antonio; Machado, Isidro; Picci, Piero; Hogendoorn, Pancras C W; Athanasou, Nicholas A; Noble, J Alison; Hassan, A Bassim

    2014-01-01

    Driven by genomic somatic variation, tumour tissues are typically heterogeneous, yet unbiased quantitative methods are rarely used to analyse heterogeneity at the protein level. Motivated by this problem, we developed automated image segmentation of images of multiple biomarkers in Ewing sarcoma to generate distributions of biomarkers between and within tumour cells. We further integrate high dimensional data with patient clinical outcomes utilising random survival forest (RSF) machine learning. Using material from cohorts of genetically diagnosed Ewing sarcoma with EWSR1 chromosomal translocations, confocal images of tissue microarrays were segmented with level sets and watershed algorithms. The nucleus and cytoplasm of each cell were identified in relation to DAPI and CD99, respectively, and protein biomarkers (e.g. Ki67, pS6, Foxo3a, EGR1, MAPK) localised relative to nuclear and cytoplasmic regions of each cell in order to generate image feature distributions. The image distribution features were analysed with RSF in relation to known overall patient survival from three separate cohorts (185 informative cases). Variation in pre-analytical processing resulted in elimination of a high number of non-informative images that had poor DAPI localisation or biomarker preservation (67 cases, 36%). The distribution of image features for biomarkers in the remaining high quality material (118 cases, 104 features per case) were analysed by RSF with feature selection, and performance assessed using internal cross-validation, rather than a separate validation cohort. A prognostic classifier for Ewing sarcoma with low cross-validation error rates (0.36) comprised multiple features, including the Ki67 proliferative marker and a sub-population of cells with low cytoplasmic/nuclear ratio of CD99. Through elimination of bias, the evaluation of high-dimensionality biomarker distribution within cell populations of a tumour using random forest analysis in quality controlled tumour

  2. SU-C-207B-05: Tissue Segmentation of Computed Tomography Images Using a Random Forest Algorithm: A Feasibility Study

    International Nuclear Information System (INIS)

    Polan, D; Brady, S; Kaufman, R

    2016-01-01

    Purpose: Develop an automated Random Forest algorithm for tissue segmentation of CT examinations. Methods: Seven materials were classified for segmentation: background, lung/internal gas, fat, muscle, solid organ parenchyma, blood/contrast, and bone, using MATLAB and the Trainable Weka Segmentation (TWS) plugin of FIJI. The following TWS classifier feature filters were investigated: minimum, maximum, mean, and variance, each evaluated over a pixel radius of 2^n (n = 0–4). Noise reduction and edge-preserving filters (Gaussian, bilateral, Kuwahara, and anisotropic diffusion) were also evaluated. The algorithm used 200 trees with 2 features per node. A training data set was established using an anonymized patient’s (male, 20 yr, 72 kg) chest-abdomen-pelvis CT examination. To establish segmentation ground truth, the training data were manually segmented using Eclipse planning software, and an intra-observer reproducibility test was conducted. Six additional patient data sets were segmented based on classifier data generated from the training data. Segmentation accuracy was determined by calculating the Dice similarity coefficient (DSC) between manually and automatically segmented images. Results: The optimized autosegmentation algorithm used 16 features, calculated using maximum, mean, variance, and Gaussian blur filters with kernel radii of 1, 2, and 4 pixels, in addition to the original CT number and a Kuwahara filter (linear kernel of 19 pixels). Ground truth had a DSC of 0.94 (range: 0.90–0.99) for adult and 0.92 (range: 0.85–0.99) for pediatric data sets across all seven segmentation classes. The automated algorithm produced segmentation with an average DSC of 0.85 ± 0.04 (range: 0.81–1.00) for the adult patients and 0.86 ± 0.03 (range: 0.80–0.99) for the pediatric patients. Conclusion: The TWS Random Forest auto-segmentation algorithm was optimized for the CT environment and was able to segment seven material classes over a range of body habitus and CT
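
    For readers reproducing the accuracy metric: for one class, the Dice similarity coefficient is DSC = 2|A ∩ B| / (|A| + |B|), where A and B are the voxels given that class by the manual and the automatic segmentation. The Python sketch below computes it per class on flattened label maps. The function name, the flattened-list input and the convention of scoring a class that is absent from both maps as 1.0 are our assumptions, not details from the abstract.

```python
from typing import Dict, Hashable, Iterable, Sequence


def dice_per_class(manual: Sequence[Hashable], auto: Sequence[Hashable],
                   classes: Iterable[Hashable]) -> Dict[Hashable, float]:
    """Per-class Dice similarity coefficient between two flattened label maps."""
    if len(manual) != len(auto):
        raise ValueError("label maps must have the same number of voxels")
    scores = {}
    for c in classes:
        in_manual = sum(1 for m in manual if m == c)
        in_auto = sum(1 for a in auto if a == c)
        overlap = sum(1 for m, a in zip(manual, auto) if m == c and a == c)
        total = in_manual + in_auto
        # Assumption: a class absent from both maps counts as perfect agreement.
        scores[c] = 2.0 * overlap / total if total else 1.0
    return scores


manual = [0, 0, 1, 1, 2, 2]  # toy "ground truth" labels
auto = [0, 1, 1, 1, 2, 0]    # toy "auto-segmented" labels
print(dice_per_class(manual, auto, classes=[0, 1, 2]))
# {0: 0.5, 1: 0.8, 2: 0.6666666666666666}
```

    A mean DSC over classes or patients, as reported above, is then an average of these per-class values.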

  3. SU-C-207B-05: Tissue Segmentation of Computed Tomography Images Using a Random Forest Algorithm: A Feasibility Study

    Energy Technology Data Exchange (ETDEWEB)

    Polan, D [University of Michigan, Ann Arbor, MI (United States)]; Brady, S; Kaufman, R [St. Jude Children’s Research Hospital, Memphis, TN (United States)]

    2016-06-15

    Purpose: Develop an automated Random Forest algorithm for tissue segmentation of CT examinations. Methods: Seven materials were classified for segmentation: background, lung/internal gas, fat, muscle, solid organ parenchyma, blood/contrast, and bone, using MATLAB and the Trainable Weka Segmentation (TWS) plugin of FIJI. The following TWS classifier feature filters were investigated: minimum, maximum, mean, and variance, each evaluated over a pixel radius of 2^n (n = 0–4). Noise reduction and edge-preserving filters (Gaussian, bilateral, Kuwahara, and anisotropic diffusion) were also evaluated. The algorithm used 200 trees with 2 features per node. A training data set was established using an anonymized patient’s (male, 20 yr, 72 kg) chest-abdomen-pelvis CT examination. To establish segmentation ground truth, the training data were manually segmented using Eclipse planning software, and an intra-observer reproducibility test was conducted. Six additional patient data sets were segmented based on classifier data generated from the training data. Segmentation accuracy was determined by calculating the Dice similarity coefficient (DSC) between manually and automatically segmented images. Results: The optimized autosegmentation algorithm used 16 features, calculated using maximum, mean, variance, and Gaussian blur filters with kernel radii of 1, 2, and 4 pixels, in addition to the original CT number and a Kuwahara filter (linear kernel of 19 pixels). Ground truth had a DSC of 0.94 (range: 0.90–0.99) for adult and 0.92 (range: 0.85–0.99) for pediatric data sets across all seven segmentation classes. The automated algorithm produced segmentation with an average DSC of 0.85 ± 0.04 (range: 0.81–1.00) for the adult patients and 0.86 ± 0.03 (range: 0.80–0.99) for the pediatric patients. Conclusion: The TWS Random Forest auto-segmentation algorithm was optimized for the CT environment and was able to segment seven material classes over a range of body habitus and CT

  4. The contribution of competition to tree mortality in old-growth coniferous forests

    Science.gov (United States)

    Das, A.; Battles, J.; Stephenson, N.L.; van Mantgem, P.J.

    2011-01-01

    Competition is a well-documented contributor to tree mortality in temperate forests, with numerous studies documenting a relationship between tree death and the competitive environment. Models frequently rely on competition as the only non-random mechanism affecting tree mortality. However, for mature forests, competition may cease to be the primary driver of mortality. We use a large, long-term dataset to study the importance of competition in determining tree mortality in old-growth forests on the western slope of the Sierra Nevada of California, U.S.A. We make use of the comparative spatial configuration of dead and live trees, changes in tree spatial pattern through time, and field assessments of contributors to an individual tree's death to quantify competitive effects. Competition was apparently a significant contributor to tree mortality in these forests. Trees that died tended to be in more competitive environments than trees that survived, and suppression frequently appeared as a factor contributing to mortality. On the other hand, based on spatial pattern analyses, only three of 14 plots demonstrated compelling evidence that competition was dominating mortality. Most of the rest of the plots fell within the expectation for random mortality, and three fit neither the random nor the competition model. These results suggest that while competition is often playing a significant role in tree mortality processes in these forests, it only infrequently governs those processes. In addition, the field assessments indicated a substantial presence of biotic mortality agents in trees that died. While competition is almost certainly important, demographics in these forests cannot accurately be characterized without a better grasp of other mortality processes. In particular, we likely need a better understanding of biotic agents and their interactions with one another and with competition. © 2011.
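
    The "expectation for random mortality" can be illustrated with a simplified random-labelling test: compare the mean competition index of the trees that died with the means of equally sized random draws from all trees. This is a hedged, non-spatial sketch that assumes one precomputed competition index per tree; it is not the authors' spatial pattern analysis, and the function name and parameters are hypothetical.

```python
import random
from statistics import fmean
from typing import Sequence


def random_mortality_test(competition: Sequence[float], died: Sequence[bool],
                          n_sim: int = 999, seed: int = 0) -> float:
    """One-sided Monte Carlo p-value: are dead trees in more competitive
    environments than a random draw of the same number of trees?"""
    dead_idx = [i for i, d in enumerate(died) if d]
    k = len(dead_idx)
    if k == 0 or k == len(died):
        raise ValueError("need both dead and surviving trees")
    observed = fmean(competition[i] for i in dead_idx)
    rng = random.Random(seed)
    trees = range(len(competition))
    # Null model: mortality falls on k trees chosen uniformly at random.
    as_extreme = sum(
        1 for _ in range(n_sim)
        if fmean(competition[i] for i in rng.sample(trees, k)) >= observed
    )
    return (as_extreme + 1) / (n_sim + 1)


competition_index = [float(i) for i in range(20)]  # toy index, higher = more crowded
died = [i >= 15 for i in range(20)]                # the five most crowded trees died
print(random_mortality_test(competition_index, died))
```

    A small p-value indicates that the dead trees sit in more competitive environments than random mortality would produce; a plot-level spatial version would replace the scalar index with a spatial summary statistic.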

  5. Tropical secondary forests regenerating after shifting cultivation in the Philippines uplands are important carbon sinks.

    Science.gov (United States)

    Mukul, Sharif A; Herbohn, John; Firn, Jennifer

    2016-03-08

    In the tropics, shifting cultivation has long been blamed for large-scale forest degradation, and it remains a major source of uncertainty in forest carbon accounting. In the Philippines, shifting cultivation, locally known as kaingin, is a major land use in upland areas. We measured the distribution and recovery of aboveground biomass carbon along a fallow gradient in post-kaingin secondary forests in an upland area in the Philippines. We found significantly higher carbon in the aboveground total biomass and living woody biomass in old-growth forest, while coarse dead wood biomass carbon was higher in the new fallow sites. From the youngest to the oldest fallow secondary forests, biomass carbon showed a progressive recovery. Multivariate analysis indicates that patch size is an influential factor in explaining the variation in biomass carbon recovery in secondary forests after shifting cultivation. Our study indicates that secondary forests after shifting cultivation are substantial carbon sinks and that this capacity to store carbon increases with abandonment age. Large trees contribute most to aboveground biomass. A better understanding of the relative contribution of different biomass sources to aboveground total forest biomass, however, is necessary to fully capture the value of such landscapes from forest management, restoration and conservation perspectives.

  6. Some limit theorems for negatively associated random variables

    Indian Academy of Sciences (India)

    random sampling without replacement, and (i) joint distribution of ranks. ... wide applications in multivariate statistical analysis and system reliability, the ... strong law of large numbers for negatively associated sequences under the case where.

  7. Multivariate refined composite multiscale entropy analysis

    International Nuclear Information System (INIS)

    Humeau-Heurtier, Anne

    2016-01-01

    Multiscale entropy (MSE) has become a prevailing method to quantify signal complexity. MSE relies on sample entropy. However, MSE may yield imprecise complexity estimation at large scales, because sample entropy does not give precise estimation of entropy when short signals are processed. A refined composite multiscale entropy (RCMSE) has therefore recently been proposed. Nevertheless, RCMSE is for univariate signals only. The simultaneous analysis of multi-channel (multivariate) data often outperforms studies based on univariate signals. We therefore introduce an extension of RCMSE to multivariate data. Applications of multivariate RCMSE to simulated processes show that it performs better than the standard multivariate MSE. - Highlights: • Multiscale entropy quantifies data complexity but may be inaccurate at large scales. • A refined composite multiscale entropy (RCMSE) has therefore recently been proposed. • Nevertheless, RCMSE is adapted to univariate time series only. • We herein introduce an extension of RCMSE to multivariate data. • It shows better performance than the standard multivariate multiscale entropy.
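
    The refined composite idea can be sketched for the univariate case: at scale τ, coarse-grain the series τ times (one per starting offset), pool the template-match counts of all τ coarse-grained series, and take one logarithm of the pooled ratio, which avoids undefined entropies when an individual short series has no matches. This Python sketch is an illustration under stated assumptions (m = 2, tolerance r = 0.15 × SD of the original series, Chebyshev distance); it does not implement the multivariate extension proposed in the abstract, and all names are ours.

```python
import math
import random
import statistics
from typing import List, Sequence, Tuple


def coarse_grain(x: Sequence[float], tau: int, offset: int = 0) -> List[float]:
    """Means of non-overlapping windows of length tau, starting at `offset`."""
    return [statistics.fmean(x[s:s + tau]) for s in range(offset, len(x) - tau + 1, tau)]


def match_counts(y: Sequence[float], m: int, r: float) -> Tuple[int, int]:
    """Return (A, B): template pairs matching for m + 1 and for m points (Chebyshev distance <= r)."""
    a = b = 0
    n_templates = len(y) - m  # same number of templates for lengths m and m + 1
    for i in range(n_templates):
        for j in range(i + 1, n_templates):
            if all(abs(y[i + k] - y[j + k]) <= r for k in range(m)):
                b += 1
                if abs(y[i + m] - y[j + m]) <= r:
                    a += 1
    return a, b


def rcmse(x: Sequence[float], tau: int, m: int = 2, r_factor: float = 0.15) -> float:
    """Refined composite multiscale entropy of a univariate series at scale tau."""
    r = r_factor * statistics.pstdev(x)  # tolerance fixed from the original series
    total_a = total_b = 0
    for offset in range(tau):  # pool counts over all tau coarse-grained series
        a, b = match_counts(coarse_grain(x, tau, offset), m, r)
        total_a += a
        total_b += b
    if total_a == 0 or total_b == 0:
        return math.inf  # undefined: no matching templates at this scale
    return -math.log(total_a / total_b)


rng = random.Random(42)
noise = [rng.random() for _ in range(300)]  # toy white-noise signal
for tau in (1, 2, 3):
    print(tau, round(rcmse(noise, tau), 3))
```

    At τ = 1 this reduces to ordinary sample entropy; the quadratic pair loop is fine for short signals but would need a faster neighbour search for long recordings.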

  8. Multivariate Generalized Multiscale Entropy Analysis

    Directory of Open Access Journals (Sweden)

    Anne Humeau-Heurtier

    2016-11-01

    Multiscale entropy (MSE) was introduced in the 2000s to quantify systems’ complexity. MSE relies on (i) a coarse-graining procedure to derive a set of time series representing the system dynamics on different time scales; (ii) the computation of the sample entropy for each coarse-grained time series. A refined composite MSE (rcMSE), based on the same steps as MSE, also exists. Compared to MSE, rcMSE increases the accuracy of entropy estimation and reduces the probability of inducing undefined entropy for short time series. The multivariate versions of MSE (MMSE) and rcMSE (MrcMSE) have also been introduced. In the coarse-graining step used in MSE, rcMSE, MMSE, and MrcMSE, the mean value is used to derive representations of the original data at different resolutions. A generalization of MSE was recently published, using the computation of different moments in the coarse-graining procedure. However, so far, this generalization only exists for univariate signals. We therefore herein propose an extension of this generalized MSE to multivariate data. The multivariate generalized algorithms of MMSE and MrcMSE presented herein (MGMSE and MGrcMSE, respectively) are first analyzed through the processing of synthetic signals. We reveal that MGrcMSE shows better performance than MGMSE for short multivariate data. We then study the performance of MGrcMSE on two sets of short multivariate electroencephalograms (EEG) available in the public domain. We report that MGrcMSE may show better performance than MrcMSE in distinguishing different types of multivariate EEG data. MGrcMSE could therefore supplement MMSE or MrcMSE in the processing of multivariate datasets.
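
    The difference between classic and generalized coarse-graining is only in how each window is summarized: by its mean (first moment) or by a higher moment such as the variance. The Python sketch below shows both, applied channel by channel for multivariate data. It assumes the population variance and non-overlapping windows; the exact moment and normalization used in MGMSE/MGrcMSE may differ, and the function names and toy data are hypothetical.

```python
import statistics
from typing import List, Sequence


def generalized_coarse_grain(x: Sequence[float], tau: int, moment: str = "variance") -> List[float]:
    """Coarse-grain with non-overlapping windows of length tau, summarising each
    window by its mean (classic MSE) or its variance (a second-moment variant)."""
    windows = [x[s:s + tau] for s in range(0, len(x) - tau + 1, tau)]
    if moment == "mean":
        return [statistics.fmean(w) for w in windows]
    if moment == "variance":
        return [statistics.pvariance(w) for w in windows]
    raise ValueError(f"unsupported moment: {moment!r}")


def coarse_grain_channels(channels: Sequence[Sequence[float]], tau: int,
                          moment: str = "variance") -> List[List[float]]:
    """Multivariate case: coarse-grain every channel independently with the same tau."""
    return [generalized_coarse_grain(ch, tau, moment) for ch in channels]


eeg = [[1.0, 3.0, 2.0, 2.0, 5.0, 5.0],  # toy two-channel signal
       [0.0, 2.0, 4.0, 0.0, 1.0, 1.0]]
print(coarse_grain_channels(eeg, 2, "mean"))      # [[2.0, 2.0, 5.0], [1.0, 2.0, 1.0]]
print(coarse_grain_channels(eeg, 2, "variance"))  # [[1.0, 0.0, 0.0], [1.0, 4.0, 0.0]]
```

    The entropy step (multivariate sample entropy on the coarse-grained channels) is unchanged by this choice; only the representation at each scale differs.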

  9. Land-use history affects understorey plant species distributions in a large temperate-forest complex, Denmark

    DEFF Research Database (Denmark)

    Svenning, J.-C.; Baktoft, Karen H.; Balslev, Henrik

    2009-01-01

    In Europe, forests have been strongly influenced by human land-use for millennia. Here, we studied the importance of anthropogenic historical factors as determinants of understorey species distributions in a 967 ha Danish forest complex using 156 randomly placed 100-m² plots, 15 environmental, 9...... dispersal and a strong literature record as ancient-forest species, were still concentrated in areas that were high forest in 1805. Among the younger forests, there were clear floristic differences between those on reclaimed bogs and those not. Apparently remnant populations of wet-soil plants were still...

  10. A Prospectus on Restoring Late Successional Forest Structure to Eastside Pine Ecosystems Through Large-Scale, Interdisciplinary Research

    Science.gov (United States)

    Steve Zack; William F. Laudenslayer; Luke George; Carl Skinner; William Oliver

    1999-01-01

    At two different locations in northeast California, an interdisciplinary team of scientists is initiating long-term studies to quantify the effects of forest manipulations intended to accelerate and/or enhance late-successional structure of eastside pine forest ecosystems. One study, at Blacks Mountain Experimental Forest, uses a split-plot, factorial, randomized block...

  11. Sequence-Based Prediction of RNA-Binding Proteins Using Random Forest with Minimum Redundancy Maximum Relevance Feature Selection

    Directory of Open Access Journals (Sweden)

    Xin Ma

    2015-01-01

    The prediction of RNA-binding proteins is one of the most challenging problems in computational biology. Although some studies have investigated this problem, prediction accuracy is still not sufficient. In this study, a highly accurate method was developed to predict RNA-binding proteins from amino acid sequences using random forests with the minimum redundancy maximum relevance (mRMR) method, followed by incremental feature selection (IFS). We incorporated conjoint triad features and three novel features: binding propensity (BP), nonbinding propensity (NBP), and evolutionary information combined with physicochemical properties (EIPP). The results showed that these novel features play important roles in improving the performance of the predictor. Using the mRMR-IFS method, our predictor achieved its best performance (86.62% accuracy and a Matthews correlation coefficient of 0.737). This high prediction accuracy suggests that our method can be a useful approach to identify RNA-binding proteins from sequence information.
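
    The mRMR step ranks features greedily: at each round it picks the feature whose mutual information with the class label (relevance) is largest after subtracting its mean mutual information with the features already chosen (redundancy). A minimal Python sketch, assuming already-discretized features and the difference form of the criterion; the study's continuous features would need discretization first, the IFS step is not shown, and all names and toy data are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def mutual_information(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Plug-in mutual information (in nats) between two discrete sequences."""
    n = len(a)
    joint, pa, pb = Counter(zip(a, b)), Counter(a), Counter(b)
    return sum(c / n * math.log(c * n / (pa[x] * pb[y])) for (x, y), c in joint.items())


def mrmr(features: Dict[str, Sequence[Hashable]], target: Sequence[Hashable], k: int) -> List[str]:
    """Greedy mRMR (difference form): maximise relevance minus mean redundancy."""
    relevance = {name: mutual_information(v, target) for name, v in features.items()}
    selected: List[str] = []
    remaining = sorted(features)  # sorted for deterministic tie-breaking
    while remaining and len(selected) < k:
        def score(name: str) -> float:
            if not selected:
                return relevance[name]
            redundancy = sum(mutual_information(features[name], features[s]) for s in selected)
            return relevance[name] - redundancy / len(selected)
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


a = [0, 0, 1, 1, 0, 0, 1, 1]
b = [0, 1, 0, 1, 0, 1, 0, 1]
y = [2 * i + j for i, j in zip(a, b)]       # toy label determined by a and b
features = {"f1": a, "f2": list(a), "f3": b}  # f2 is an exact duplicate of f1
print(mrmr(features, y, k=2))  # ['f1', 'f3']
```

    The duplicate f2 is as relevant as f1 but fully redundant with it, so the criterion prefers the complementary f3; IFS would then evaluate the classifier on growing prefixes of this ranking.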

  12. Multivariate quantile mapping bias correction: an N-dimensional probability density function transform for climate model simulations of multiple variables

    Science.gov (United States)

    Cannon, Alex J.

    2018-01-01

    Most bias correction algorithms used in climatology, for example quantile mapping, are applied to univariate time series. They neglect the dependence between different variables. Those that are multivariate often correct only limited measures of joint dependence, such as Pearson or Spearman rank correlation. Here, an image processing technique designed to transfer colour information from one image to another—the N-dimensional probability density function transform—is adapted for use as a multivariate bias correction algorithm (MBCn) for climate model projections/predictions of multiple climate variables. MBCn is a multivariate generalization of quantile mapping that transfers all aspects of an observed continuous multivariate distribution to the corresponding multivariate distribution of variables from a climate model. When applied to climate model projections, changes in quantiles of each variable between the historical and projection period are also preserved. The MBCn algorithm is demonstrated on three case studies. First, the method is applied to an image processing example with characteristics that mimic a climate projection problem. Second, MBCn is used to correct a suite of 3-hourly surface meteorological variables from the Canadian Centre for Climate Modelling and Analysis Regional Climate Model (CanRCM4) across a North American domain. Components of the Canadian Forest Fire Weather Index (FWI) System, a complicated set of multivariate indices that characterizes the risk of wildfire, are then calculated and verified against observed values. Third, MBCn is used to correct biases in the spatial dependence structure of CanRCM4 precipitation fields. Results are compared against a univariate quantile mapping algorithm, which neglects the dependence between variables, and two multivariate bias correction algorithms, each of which corrects a different form of inter-variable correlation structure. MBCn outperforms these alternatives, often by a large margin
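
    The core of the N-dimensional pdf transform can be sketched as a loop: apply a random rotation to both the model and the observed samples, quantile-map each rotated axis of the model onto the corresponding rotated axis of the observations, rotate back, and repeat. The Python sketch below makes several simplifying assumptions that are ours, not the paper's: equal sample sizes, a calibration period only (so plain empirical quantile mapping stands in for the quantile delta mapping MBCn uses to preserve projected changes), a fixed number of iterations, and a final step that restores the observed marginals while keeping the rank structure reached by the loop. All names and the toy data are hypothetical.

```python
import math
import random
from typing import List, Sequence

Matrix = List[List[float]]


def random_rotation(d: int, rng: random.Random) -> Matrix:
    """Random orthogonal matrix (orthonormal rows) from Gram-Schmidt on Gaussian vectors."""
    basis: Matrix = []
    while len(basis) < d:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for u in basis:
            dot = sum(vi * ui for vi, ui in zip(v, u))
            v = [vi - dot * ui for vi, ui in zip(v, u)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        if norm > 1e-9:  # skip (practically impossible) degenerate draws
            basis.append([vi / norm for vi in v])
    return basis


def rotate(points: Matrix, rot: Matrix, inverse: bool = False) -> Matrix:
    """Compute rot @ p for every point p (or rot.T @ p, which undoes the rotation)."""
    d = len(rot)
    if inverse:
        return [[sum(rot[i][k] * p[i] for i in range(d)) for k in range(d)] for p in points]
    return [[sum(rot[k][i] * p[i] for i in range(d)) for k in range(d)] for p in points]


def to_columns(points: Matrix) -> Matrix:
    return [list(col) for col in zip(*points)]


def to_rows(cols: Matrix) -> Matrix:
    return [list(row) for row in zip(*cols)]


def quantile_map(model: Sequence[float], obs: Sequence[float]) -> List[float]:
    """Empirical quantile mapping for equal sample sizes:
    the i-th smallest model value is replaced by the i-th smallest observed value."""
    obs_sorted = sorted(obs)
    out = [0.0] * len(model)
    for rank, idx in enumerate(sorted(range(len(model)), key=lambda i: model[i])):
        out[idx] = obs_sorted[rank]
    return out


def npdft_correct(model: Matrix, obs: Matrix, iterations: int = 30, seed: int = 0) -> Matrix:
    rng = random.Random(seed)
    x = [list(p) for p in model]
    d = len(x[0])
    for _ in range(iterations):
        rot = random_rotation(d, rng)
        x_rot, obs_rot = rotate(x, rot), rotate(obs, rot)
        mapped = to_rows([quantile_map(mc, oc) for mc, oc in zip(to_columns(x_rot), to_columns(obs_rot))])
        x = rotate(mapped, rot, inverse=True)
    # Final step: restore the observed marginals, keeping the rank structure reached above.
    return to_rows([quantile_map(xc, oc) for xc, oc in zip(to_columns(x), to_columns(obs))])


def pearson(u: Sequence[float], v: Sequence[float]) -> float:
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return cov / math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))


rng = random.Random(1)
obs = []
for _ in range(200):
    t = rng.gauss(0.0, 1.0)
    obs.append([t, t + 0.3 * rng.gauss(0.0, 1.0)])  # strongly correlated "observations"
model = [[rng.gauss(1.0, 2.0), rng.gauss(0.0, 1.0)] for _ in range(200)]  # biased, uncorrelated "model"
corrected = npdft_correct(model, obs)
print(round(pearson(*to_columns(obs)), 2), round(pearson(*to_columns(corrected)), 2))
```

    Univariate quantile mapping alone would fix each marginal but leave the model's near-zero correlation untouched; the rotations are what let the inter-variable dependence be transferred as well.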

  13. [The Effects of Urban Forest-walking Program on Health Promotion Behavior, Physical Health, Depression, and Quality of Life: A Randomized Controlled Trial of Office-workers].

    Science.gov (United States)

    Bang, Kyung Sook; Lee, In Sook; Kim, Sung Jae; Song, Min Kyung; Park, Se Eun

    2016-02-01

    This study was performed to determine the physical and psychological effects of an urban forest-walking program for office workers. For many workers, sedentary lifestyles can lead to low levels of physical activity, causing various health problems despite an increased interest in health promotion. Fifty-four office workers participated in this study. They were randomly assigned to two groups (experimental and control), and the experimental group performed 5 weeks of walking exercise based on the Information-Motivation-Behavioral skills Model. The data were collected from October to November 2014. SPSS 21.0 was used for the statistical analysis. The results showed that the urban forest-walking program had positive effects on physical activity level (U=65.00, p…), health promotion behavior (t=-2.20, p=.033), and quality of life (t=-2.42, p=.020). However, there were no statistical differences in depression, waist size, body mass index, blood pressure, or bone density between the groups. The current findings suggest that the forest-walking program may have positive effects on improving physical activity, health promotion behavior, and quality of life. The program can be used as an effective and efficient strategy for physical and psychological health promotion for office workers.

  14. Nonparametric Bayes Modeling of Multivariate Categorical Data.

    Science.gov (United States)

    Dunson, David B; Xing, Chuanhua

    2012-01-01

    Modeling of multivariate unordered categorical (nominal) data is a challenging problem, particularly in high dimensions and cases in which one wishes to avoid strong assumptions about the dependence structure. Commonly used approaches rely on the incorporation of latent Gaussian random variables or parametric latent class models. The goal of this article is to develop a nonparametric Bayes approach, which defines a prior with full support on the space of distributions for multiple unordered categorical variables. This support condition ensures that we are not restricting the dependence structure a priori. We show this can be accomplished through a Dirichlet process mixture of product multinomial distributions, which is also a convenient form for posterior computation. Methods for nonparametric testing of violations of independence are proposed, and the methods are applied to model positional dependence within transcription factor binding motifs.
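
    The generative structure of a Dirichlet process mixture of product multinomials can be illustrated by sampling from the prior: stick-breaking weights choose a latent component for each observation, and within a component every categorical variable is drawn independently from its own probability vector. This Python sketch simulates the prior only (no posterior computation or independence testing), and the truncation level, Beta(1, α) sticks and uniform Dirichlet(1, ..., 1) base measure are illustrative assumptions; function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def dirichlet(alpha: Sequence[float], rng: random.Random) -> List[float]:
    """Dirichlet draw via normalised Gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [g / total for g in draws]


def sample_dp_product_multinomial(n: int, levels: Sequence[int], alpha: float,
                                  truncation: int, seed: int = 0) -> Tuple[List[float], List[List[int]]]:
    rng = random.Random(seed)
    # Truncated stick-breaking: pi_h = V_h * prod_{l<h} (1 - V_l), V_h ~ Beta(1, alpha);
    # the last V is set to 1 so the weights sum to one.
    weights, stick = [], 1.0
    for h in range(truncation):
        v = 1.0 if h == truncation - 1 else rng.betavariate(1.0, alpha)
        weights.append(stick * v)
        stick *= 1.0 - v
    # Each component h has its own categorical distribution for every variable j.
    psi = [[dirichlet([1.0] * d, rng) for d in levels] for _ in range(truncation)]
    data = []
    for _ in range(n):
        h = rng.choices(range(truncation), weights=weights)[0]
        data.append([rng.choices(range(d), weights=psi[h][j])[0] for j, d in enumerate(levels)])
    return weights, data


# Toy "motifs": 6 positions, each a nucleotide (4 categories).
weights, motifs = sample_dp_product_multinomial(n=5, levels=[4] * 6, alpha=1.0, truncation=20)
print([round(w, 3) for w in weights[:5]])
print(["".join("ACGT"[k] for k in row) for row in motifs])
```

    Variables are independent within a component but become dependent once the component label is integrated out, which is how the mixture can represent arbitrary dependence between positions.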

  15. Predicting live and dead tree basal area of bark beetle affected forests from discrete-return lidar

    Science.gov (United States)

    Benjamin C. Bright; Andrew T. Hudak; Robert McGaughey; Hans-Erik Andersen; Jose Negron

    2013-01-01

    Bark beetle outbreaks have killed large numbers of trees across North America in recent years. Lidar remote sensing can be used to effectively estimate forest biomass, but prediction of both live and dead standing biomass in beetle-affected forests using lidar alone has not been demonstrated. We developed Random Forest (RF) models predicting total, live, dead, and...

  16. Human tracking in thermal images using adaptive particle filters with online random forest learning

    Science.gov (United States)

    Ko, Byoung Chul; Kwak, Joon-Young; Nam, Jae-Yeal

    2013-11-01

    This paper presents a fast and robust human tracking method for use with a moving long-wave infrared thermal camera under poor illumination and in the presence of shadows and cluttered backgrounds. To improve human tracking performance while minimizing computation time, this study proposes online learning of classifiers based on particle filters and a combination of a local intensity distribution (LID) with oriented center-symmetric local binary patterns (OCS-LBP). Specifically, we design a real-time random forest (RF), an ensemble of decision trees, for confidence estimation, and the confidences of the RF are converted into a likelihood function of the target state. First, the target model is selected by the user and particles are sampled. Then, RFs are generated by online learning using positive and negative examples with LID and OCS-LBP features. In the next stage, the learned RF classifiers are used to detect the most likely target position in the subsequent frame. The RFs are then retrained quickly with the tracked object and background appearance in the new frame. The proposed algorithm was successfully applied to various thermal test videos, and its tracking performance is better than that of other methods.
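
    The tracking loop described above follows the usual particle-filter cycle: move the particles with a motion model, weight them by a likelihood derived from the classifier confidence, estimate the target position, and resample. The Python sketch below shows that cycle under explicit assumptions: a random-walk motion model, a likelihood of the form exp(λ(c − 1)) for confidence c in [0, 1], and a hypothetical fake_rf_confidence function standing in for the online-trained RF with LID/OCS-LBP features, which is not reproduced here.

```python
import math
import random
from itertools import accumulate
from typing import Callable, List, Tuple

Point = Tuple[float, float]


def systematic_resample(weights: List[float], rng: random.Random) -> List[int]:
    """Systematic resampling: one uniform offset, n evenly spaced pointers into the weight CDF."""
    n = len(weights)
    cdf = list(accumulate(weights))
    start = rng.random() / n
    indices, j = [], 0
    for i in range(n):
        u = start + i / n
        while j < n - 1 and cdf[j] < u:
            j += 1
        indices.append(j)
    return indices


def track_step(particles: List[Point], confidence: Callable[[Point], float], rng: random.Random,
               motion_sigma: float = 2.0, sharpness: float = 10.0) -> Tuple[List[Point], Point]:
    """One predict/update/resample cycle; returns the resampled particles and the position estimate."""
    # Predict: random-walk motion model (an assumption).
    moved = [(x + rng.gauss(0.0, motion_sigma), y + rng.gauss(0.0, motion_sigma)) for x, y in particles]
    # Update: map a classifier confidence in [0, 1] to a positive likelihood.
    likelihood = [math.exp(sharpness * (confidence(p) - 1.0)) for p in moved]
    total = sum(likelihood)
    weights = [w / total for w in likelihood]
    estimate = (sum(w * p[0] for w, p in zip(weights, moved)),
                sum(w * p[1] for w, p in zip(weights, moved)))
    # Resample so that low-confidence particles are discarded.
    resampled = [moved[i] for i in systematic_resample(weights, rng)]
    return resampled, estimate


target = (50.0, 40.0)


def fake_rf_confidence(p: Point) -> float:
    """Hypothetical stand-in for the RF confidence: peaks at the true target position."""
    d2 = (p[0] - target[0]) ** 2 + (p[1] - target[1]) ** 2
    return math.exp(-d2 / (2 * 10.0 ** 2))


rng = random.Random(7)
particles = [(45.0 + rng.gauss(0.0, 5.0), 35.0 + rng.gauss(0.0, 5.0)) for _ in range(300)]
for _ in range(10):
    particles, estimate = track_step(particles, fake_rf_confidence, rng)
print([round(c, 1) for c in estimate])
```

    In the method described above, the retraining step would update the classifier behind the confidence function after every frame; here it is fixed for simplicity.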

  17. Multivariate pattern dependence.

    Directory of Open Access Journals (Sweden)

    Stefano Anzellotti

    2017-11-01

    When we perform a cognitive task, multiple brain regions are engaged. Understanding how these regions interact is a fundamental step to uncover the neural bases of behavior. Most research on the interactions between brain regions has focused on the univariate responses in the regions. However, fine-grained patterns of response encode important information, as shown by multivariate pattern analysis. In the present article, we introduce and apply multivariate pattern dependence (MVPD), a technique to study the statistical dependence between brain regions in humans in terms of the multivariate relations between their patterns of responses. MVPD characterizes the responses in each brain region as trajectories in region-specific multidimensional spaces, and models the multivariate relationship between these trajectories. We applied MVPD to the posterior superior temporal sulcus (pSTS) and to the fusiform face area (FFA), using a searchlight approach to reveal interactions between these seed regions and the rest of the brain. Across two different experiments, MVPD identified significant statistical dependence not detected by standard functional connectivity. Additionally, MVPD outperformed univariate connectivity in its ability to explain independent variance in the responses of individual voxels. Finally, MVPD uncovered different connectivity profiles associated with different representational subspaces of FFA: the first principal component of FFA shows differential connectivity with occipital and parietal regions implicated in the processing of low-level properties of faces, while the second and third components show differential connectivity with anterior temporal regions implicated in the processing of invariant representations of face identity.

  18. A Random Forest approach to predict the spatial distribution of sediment pollution in an estuarine system.

    Directory of Open Access Journals (Sweden)

    Eric S Walsh

    Modeling the magnitude and distribution of sediment-bound pollutants in estuaries is often limited by incomplete knowledge of the site and inadequate sample density. To address these modeling limitations, a decision-support tool framework was conceived that predicts sediment contamination from the sub-estuary to broader estuary extent. For this study, a Random Forest (RF) model was implemented to predict the distribution of a model contaminant, triclosan (5-chloro-2-(2,4-dichlorophenoxy)phenol; TCS), in Narragansett Bay, Rhode Island, USA. TCS is an unregulated contaminant used in many personal care products. The RF explanatory variables were associated with TCS transport and fate (proxies) and with direct and indirect environmental entry. The continuous RF TCS concentration predictions were discretized into three levels of contamination (low, medium, and high) for three different quantile thresholds. The RF model explained 63% of the variance with a minimum number of variables. Total organic carbon (TOC), a transport and fate proxy, was a strong predictor of TCS contamination: randomly permuting its values increased the mean squared error by 59%. Additionally, combined sewer overflow discharge (environmental entry) and sand (transport and fate proxy) were strong predictors. The discretization models identified a TCS area of greatest concern in the northern reach of Narragansett Bay (Providence River sub-estuary), which was validated with independent test samples. This decision-support tool performed well at the sub-estuary extent and provided the means to identify areas of concern and prioritize bay-wide sampling.
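
    Turning continuous predictions into low/medium/high classes at percentile thresholds is a small, self-contained step. The Python sketch below uses the standard-library statistics.quantiles function; the three threshold pairs and the evenly spaced toy predictions are hypothetical, since the abstract does not state which quantiles were used.

```python
import statistics
from typing import List, Sequence


def discretize(predictions: Sequence[float], low_pct: int, high_pct: int) -> List[str]:
    """Label each continuous prediction low/medium/high using percentile cut-offs (1-99)."""
    if not 1 <= low_pct < high_pct <= 99:
        raise ValueError("need 1 <= low_pct < high_pct <= 99")
    cuts = statistics.quantiles(predictions, n=100, method="inclusive")  # 99 percentile cut points
    lo, hi = cuts[low_pct - 1], cuts[high_pct - 1]
    return ["low" if v <= lo else "high" if v > hi else "medium" for v in predictions]


preds = [float(v) for v in range(1, 101)]  # stand-in for RF-predicted TCS concentrations
for low, high in [(33, 67), (25, 75), (10, 90)]:  # hypothetical threshold pairs
    labels = discretize(preds, low, high)
    print((low, high), {lab: labels.count(lab) for lab in ("low", "medium", "high")})
```

    Comparing the resulting maps across threshold pairs shows how sensitive the "area of greatest concern" is to the chosen cut-offs.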

  19. A Permutation Importance-Based Feature Selection Method for Short-Term Electricity Load Forecasting Using Random Forest

    Directory of Open Access Journals (Sweden)

    Nantian Huang

    2016-09-01

    The prediction accuracy of short-term load forecasting (STLF) depends on the choice of prediction model and on the result of feature selection. In this paper, a novel random forest (RF)-based feature selection method for STLF is proposed. First, 243 related features were extracted from historical load data and the time information of prediction points to form the original feature set. Subsequently, the original feature set was used to train an RF as the original model. After the training process, the prediction error of the original model on the test set was recorded and the permutation importance (PI) value of each feature was obtained. Then, an improved sequential backward search method was used to select the optimal forecasting feature subset based on the PI value of each feature. Finally, the optimal forecasting feature subset was used to train a new RF model as the final prediction model. Experiments showed that the prediction accuracy of the RF trained on the optimal forecasting feature subset was higher than that of the original model and of comparative models based on support vector regression and artificial neural networks.
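
    The two ingredients can be sketched generically: permutation importance measures how much the test error grows when one feature column is shuffled, and a sequential backward search repeatedly refits, drops the least important feature and remembers the best-scoring subset. This is a plain sequential backward search, not the paper's "improved" variant (whose details are not given here), and a 1-nearest-neighbour regressor stands in for the random forest so the sketch stays dependency-free; all names and the toy data are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Row = List[float]
Predictor = Callable[[List[Row]], List[float]]
Fitter = Callable[[List[Row], Sequence[float]], Predictor]


def mse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)


def permutation_importance(predict: Predictor, X: List[Row], y: Sequence[float],
                           rng: random.Random) -> List[float]:
    """PI of each feature: increase in MSE after shuffling that column."""
    base = mse(y, predict(X))
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        rng.shuffle(column)
        shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
        scores.append(mse(y, predict(shuffled)) - base)
    return scores


def backward_search(fit: Fitter, X_train: List[Row], y_train: Sequence[float],
                    X_test: List[Row], y_test: Sequence[float],
                    rng: random.Random) -> Tuple[float, List[int]]:
    """Plain sequential backward search: refit, drop the feature with the lowest PI, keep the best subset."""
    features = list(range(len(X_train[0])))
    best_err, best_features = math.inf, list(features)
    while True:
        def subset(X: List[Row]) -> List[Row]:
            return [[row[j] for j in features] for row in X]
        predict = fit(subset(X_train), y_train)
        err = mse(y_test, predict(subset(X_test)))
        if err < best_err:
            best_err, best_features = err, list(features)
        if len(features) == 1:
            return best_err, best_features
        importance = permutation_importance(predict, subset(X_test), y_test, rng)
        features.pop(min(range(len(features)), key=importance.__getitem__))


def fit_1nn(X: List[Row], y: Sequence[float]) -> Predictor:
    """Toy stand-in for a random forest: 1-nearest-neighbour regression."""
    def predict(queries: List[Row]) -> List[float]:
        return [y[min(range(len(X)), key=lambda i: sum((a - b) ** 2 for a, b in zip(X[i], q)))]
                for q in queries]
    return predict


rng = random.Random(0)


def make_data(n: int) -> Tuple[List[Row], List[float]]:
    X = [[rng.uniform(0, 10), rng.uniform(0, 1), rng.uniform(0, 1)] for _ in range(n)]
    return X, [row[0] for row in X]  # the target depends on feature 0 only


X_tr, y_tr = make_data(60)
X_te, y_te = make_data(30)
err, kept = backward_search(fit_1nn, X_tr, y_tr, X_te, y_te, random.Random(1))
print(kept, round(err, 3))
```

    Recomputing PI after every removal, as here, costs more than ranking once from the original model, but it adapts to how the remaining features share information.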

  20. Multivariate Max-Stable Spatial Processes

    KAUST Repository

    Genton, Marc G.

    2014-01-06

    Analysis of spatial extremes is currently based on univariate processes. Max-stable processes allow the spatial dependence of extremes to be modelled and explicitly quantified; they are therefore widely adopted in applications. For a better understanding of extreme events of real processes, such as environmental phenomena, it may be useful to study several spatial variables simultaneously. To this end, we extend some theoretical results and applications of max-stable processes to the multivariate setting to analyze extreme events of several variables observed across space. In particular, we study the maxima of independent replicates of multivariate processes, both in the Gaussian and Student-t cases. Then, we define a Poisson process construction in the multivariate setting and introduce multivariate versions of the Smith Gaussian extreme-value, the Schlather extremal-Gaussian and extremal-t, and the Brown-Resnick models. Inferential aspects of those models based on composite likelihoods are developed. We present results of various Monte Carlo simulations and of an application to a dataset of summer daily temperature maxima and minima in Oklahoma, U.S.A., highlighting the utility of working with multivariate models in contrast to the univariate case. Based on joint work with Simone Padoan and Huiyan Sang.

  1. Multivariate Max-Stable Spatial Processes

    KAUST Repository

    Genton, Marc G.

    2014-01-01

    Analysis of spatial extremes is currently based on univariate processes. Max-stable processes allow the spatial dependence of extremes to be modelled and explicitly quantified; they are therefore widely adopted in applications. For a better understanding of extreme events of real processes, such as environmental phenomena, it may be useful to study several spatial variables simultaneously. To this end, we extend some theoretical results and applications of max-stable processes to the multivariate setting to analyze extreme events of several variables observed across space. In particular, we study the maxima of independent replicates of multivariate processes, both in the Gaussian and Student-t cases. Then, we define a Poisson process construction in the multivariate setting and introduce multivariate versions of the Smith Gaussian extreme-value, the Schlather extremal-Gaussian and extremal-t, and the Brown-Resnick models. Inferential aspects of those models based on composite likelihoods are developed. We present results of various Monte Carlo simulations and of an application to a dataset of summer daily temperature maxima and minima in Oklahoma, U.S.A., highlighting the utility of working with multivariate models in contrast to the univariate case. Based on joint work with Simone Padoan and Huiyan Sang.

  2. Bayesian inference for multivariate meta-analysis Box-Cox transformation models for individual patient data with applications to evaluation of cholesterol-lowering drugs.

    Science.gov (United States)

    Kim, Sungduk; Chen, Ming-Hui; Ibrahim, Joseph G; Shah, Arvind K; Lin, Jianxin

    2013-10-15

    In this paper, we propose a class of Box-Cox transformation regression models with multidimensional random effects for analyzing multivariate responses for individual patient data in meta-analysis. Our modeling formulation uses a multivariate normal response meta-analysis model with multivariate random effects, in which each response is allowed to have its own Box-Cox transformation. Prior distributions are specified for the Box-Cox transformation parameters as well as the regression coefficients in this complex model, and the deviance information criterion is used to select the best transformation model. Because the model is quite complex, we develop a novel Monte Carlo Markov chain sampling scheme to sample from the joint posterior of the parameters. This model is motivated by a very rich dataset comprising 26 clinical trials involving cholesterol-lowering drugs, where the goal is to jointly model the three-dimensional response consisting of low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C), and triglycerides (TG). Because the joint distribution of (LDL-C, HDL-C, TG) is not multivariate normal and in fact quite skewed, a Box-Cox transformation is needed to achieve normality. In the clinical literature, these three variables are usually analyzed univariately; however, a multivariate approach would be more appropriate because these variables are correlated with each other. We carry out a detailed analysis of these data by using the proposed methodology. Copyright © 2013 John Wiley & Sons, Ltd.

  3. Bayesian inference for multivariate meta-analysis Box-Cox transformation models for individual patient data with applications to evaluation of cholesterol lowering drugs

    Science.gov (United States)

    Kim, Sungduk; Chen, Ming-Hui; Ibrahim, Joseph G.; Shah, Arvind K.; Lin, Jianxin

    2013-01-01

    In this paper, we propose a class of Box-Cox transformation regression models with multidimensional random effects for analyzing multivariate responses for individual patient data (IPD) in meta-analysis. Our modeling formulation uses a multivariate normal response meta-analysis model with multivariate random effects, in which each response is allowed to have its own Box-Cox transformation. Prior distributions are specified for the Box-Cox transformation parameters as well as the regression coefficients in this complex model, and the Deviance Information Criterion (DIC) is used to select the best transformation model. Since the model is quite complex, a novel Monte Carlo Markov chain (MCMC) sampling scheme is developed to sample from the joint posterior of the parameters. This model is motivated by a very rich dataset comprising 26 clinical trials involving cholesterol-lowering drugs, where the goal is to jointly model the three-dimensional response consisting of Low Density Lipoprotein Cholesterol (LDL-C), High Density Lipoprotein Cholesterol (HDL-C), and Triglycerides (TG). Since the joint distribution of (LDL-C, HDL-C, TG) is not multivariate normal and in fact quite skewed, a Box-Cox transformation is needed to achieve normality. In the clinical literature, these three variables are usually analyzed univariately; however, a multivariate approach would be more appropriate since these variables are correlated with each other. A detailed analysis of these data is carried out using the proposed methodology. PMID:23580436
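
    A minimal Python sketch of a per-response Box-Cox transformation, assuming the standard definition (y^λ - 1)/λ for λ ≠ 0 and log y for λ = 0, applied to strictly positive y. The function names, the λ values and the sample rows are illustrative assumptions, not estimates from the 26 trials; the paper selects its transformations by DIC inside an MCMC fit, which this sketch does not attempt.

```python
import math
from typing import List, Sequence


def box_cox(y: float, lam: float) -> float:
    """Box-Cox transform of a strictly positive value y with parameter lam."""
    if y <= 0:
        raise ValueError("Box-Cox requires strictly positive responses")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam


def transform_responses(rows: Sequence[Sequence[float]],
                        lams: Sequence[float]) -> List[List[float]]:
    """Apply a separate Box-Cox parameter to each response column."""
    return [[box_cox(v, lam) for v, lam in zip(row, lams)] for row in rows]


# Hypothetical (LDL-C, HDL-C, TG) rows and illustrative per-response lambdas.
rows = [[130.0, 45.0, 150.0], [100.0, 60.0, 90.0]]
lams = [1.0, 0.5, 0.0]
for transformed in transform_responses(rows, lams):
    print([round(v, 3) for v in transformed])
```

    Giving each column its own λ mirrors the abstract's point that each response is allowed its own transformation; a skewed variable such as TG may need a stronger transformation (λ near 0) than the others.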

  4. A primer of multivariate statistics

    CERN Document Server

    Harris, Richard J

    2014-01-01

    Drawing upon more than 30 years of experience in working with statistics, Dr. Richard J. Harris has updated A Primer of Multivariate Statistics to provide a model of balance between how-to and why. This classic text covers multivariate techniques with a taste of latent variable approaches. Throughout the book there is a focus on the importance of describing and testing one's interpretations of the emergent variables that are produced by multivariate analysis. This edition retains its conversational writing style while focusing on classical techniques. The book gives the reader a feel for why ...

  5. An exercise in model validation: Comparing univariate statistics and Monte Carlo-based multivariate statistics

    International Nuclear Information System (INIS)

    Weathers, J.B.; Luck, R.; Weathers, J.W.

    2009-01-01

    The complexity of mathematical models used by practicing engineers is increasing due to the growing availability of sophisticated mathematical modeling tools and ever-improving computational power. For this reason, the need to define a well-structured process for validating these models against experimental results has become a pressing issue in the engineering community. This validation process is partially characterized by the uncertainties associated with the modeling effort as well as the experimental results. The net impact of the uncertainties on the validation effort is assessed through the 'noise level of the validation procedure', which can be defined as an estimate of the 95% confidence uncertainty bounds for the comparison error between actual experimental results and model-based predictions of the same quantities of interest. Although general descriptions associated with the construction of the noise level using multivariate statistics exist in the literature, a detailed procedure outlining how to account for the systematic and random uncertainties is not available. In this paper, the methodology used to derive the covariance matrix associated with the multivariate normal pdf based on random and systematic uncertainties is examined, and a procedure used to estimate this covariance matrix using Monte Carlo analysis is presented. The covariance matrices are then used to construct approximate 95% confidence constant probability contours associated with comparison error results for a practical example. In addition, the example is used to show the drawbacks of using a first-order sensitivity analysis when nonlinear local sensitivity coefficients exist. Finally, the example is used to show the connection between the noise level of the validation exercise calculated using multivariate and univariate statistics.

  6. An exercise in model validation: Comparing univariate statistics and Monte Carlo-based multivariate statistics

    Energy Technology Data Exchange (ETDEWEB)

    Weathers, J.B. [Shock, Noise, and Vibration Group, Northrop Grumman Shipbuilding, P.O. Box 149, Pascagoula, MS 39568 (United States)], E-mail: James.Weathers@ngc.com; Luck, R. [Department of Mechanical Engineering, Mississippi State University, 210 Carpenter Engineering Building, P.O. Box ME, Mississippi State, MS 39762-5925 (United States)], E-mail: Luck@me.msstate.edu; Weathers, J.W. [Structural Analysis Group, Northrop Grumman Shipbuilding, P.O. Box 149, Pascagoula, MS 39568 (United States)], E-mail: Jeffrey.Weathers@ngc.com

    2009-11-15

    The complexity of mathematical models used by practicing engineers is increasing due to the growing availability of sophisticated mathematical modeling tools and ever-improving computational power. For this reason, the need to define a well-structured process for validating these models against experimental results has become a pressing issue in the engineering community. This validation process is partially characterized by the uncertainties associated with the modeling effort as well as the experimental results. The net impact of the uncertainties on the validation effort is assessed through the 'noise level of the validation procedure', which can be defined as an estimate of the 95% confidence uncertainty bounds for the comparison error between actual experimental results and model-based predictions of the same quantities of interest. Although general descriptions associated with the construction of the noise level using multivariate statistics exist in the literature, a detailed procedure outlining how to account for the systematic and random uncertainties is not available. In this paper, the methodology used to derive the covariance matrix associated with the multivariate normal pdf based on random and systematic uncertainties is examined, and a procedure used to estimate this covariance matrix using Monte Carlo analysis is presented. The covariance matrices are then used to construct approximate 95% confidence constant probability contours associated with comparison error results for a practical example. In addition, the example is used to show the drawbacks of using a first-order sensitivity analysis when nonlinear local sensitivity coefficients exist. Finally, the example is used to show the connection between the noise level of the validation exercise calculated using multivariate and univariate statistics.
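
    The Monte Carlo covariance estimate and the 95% constant-probability contour can be sketched in Python for two quantities of interest. This is a hypothetical setup, not the paper's example: each comparison error is modelled as a systematic error shared by both quantities (sd 0.5) plus an independent random error (sd 0.2), so the shared term induces correlation. Assuming the errors are bivariate normal, a point lies inside the 95% contour when its squared Mahalanobis distance is at most the 95% chi-square quantile for 2 degrees of freedom, which equals -2 ln 0.05 (about 5.99).

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]

# 95% quantile of the chi-square distribution with 2 degrees of freedom.
CHI2_2DOF_95 = -2.0 * math.log(0.05)


def sample_comparison_errors(n: int, seed: int = 42) -> List[Vector]:
    """Draw n hypothetical 2-D comparison errors (experiment minus model).

    A systematic error shared by both quantities (sd 0.5) induces correlation;
    each quantity also receives an independent random error (sd 0.2).
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        systematic = rng.gauss(0.0, 0.5)
        samples.append([systematic + rng.gauss(0.0, 0.2),
                        systematic + rng.gauss(0.0, 0.2)])
    return samples


def mean_and_covariance(samples: Sequence[Vector]) -> Tuple[Vector, Matrix]:
    """Sample mean vector and unbiased sample covariance matrix."""
    n, k = len(samples), len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(k)]
    cov = [[sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in samples) / (n - 1)
            for j in range(k)]
           for i in range(k)]
    return mean, cov


def inside_95_contour(e: Vector, mean: Vector, cov: Matrix) -> bool:
    """True if e lies inside the 95% constant-probability ellipse."""
    (a, b), (_, c) = cov
    det = a * c - b * b
    dx, dy = e[0] - mean[0], e[1] - mean[1]
    # Squared Mahalanobis distance via the closed-form inverse of a 2x2 matrix.
    d2 = (c * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det
    return d2 <= CHI2_2DOF_95


mean, cov = mean_and_covariance(sample_comparison_errors(20000))
print(cov)  # theory: variances 0.5**2 + 0.2**2 = 0.29, covariance 0.5**2 = 0.25
print(inside_95_contour([1.0, 1.0], mean, cov))   # error along the correlation
print(inside_95_contour([1.0, -1.0], mean, cov))  # same size, against it
```

    Because the systematic term moves both errors together, a deviation of (1, 1) is far less surprising than (1, -1) of the same size. Separate univariate 95% intervals for each quantity would accept both points, which is one way the multivariate and univariate noise levels can differ.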

  7. Profitability of Clerodendrum volubile (eweta) , a non-timber forest ...

    African Journals Online (AJOL)

    Timber Forest product, in Okitipupa, Ondo State, Nigeria. Purposive and simple random sampling techniques were used to select markets and respondents. The sample size was 60, and the instrument of data collection was a structured and ...

  8. Refining developmental coordination disorder subtyping with multivariate statistical methods

    Directory of Open Access Journals (Sweden)

    Lalanne Christophe

    2012-07-01

    Background: With a large number of potentially relevant clinical indicators, penalization and ensemble learning methods are thought to provide better predictive performance than usual linear predictors. However, little is known about how they perform in clinical studies where few cases are available. We used Random Forests and Partial Least Squares Discriminant Analysis to select the most salient impairments in Developmental Coordination Disorder (DCD) and to assess patients' similarity. Methods: We considered a wide-ranging test battery for various neuropsychological and visuo-motor impairments, which aimed at characterizing subtypes of DCD in a sample of 63 children. Classifiers were optimized on a training sample and subsequently used to rank the 49 items according to a permuted measure of variable importance. In addition, subtyping consistency was assessed with cluster analysis on the training sample. Clustering fitness and predictive accuracy were evaluated on the validation sample. Results: Both classifiers yielded a relevant subset of item impairments that together accounted for a sharp discrimination between three DCD subtypes: ideomotor, visual-spatial and constructional, and mixed dyspraxia. The main impairments found to characterize the three subtypes were digital perception, imitation of gestures, digital praxia, Lego blocks, visual-spatial structuration, visual-motor integration, and coordination between upper and lower limbs. Classification accuracy was above 90% for all classifiers, and clustering fitness was found to be satisfactory. Conclusions: Random Forests and Partial Least Squares Discriminant Analysis are useful tools to extract salient features from a large pool of correlated binary predictors, and they also provide a way to assess individuals' proximities in a reduced factor space. Fewer than 15 neuro-visual, neuro-psychomotor and neuro-psychological tests might be required to provide a sensitive and ...

  9. Model Checking Multivariate State Rewards

    DEFF Research Database (Denmark)

    Nielsen, Bo Friis; Nielson, Flemming; Nielson, Hanne Riis

    2010-01-01

    We consider continuous stochastic logics with state rewards that are interpreted over continuous time Markov chains. We show how results from multivariate phase type distributions can be used to obtain higher-order moments for multivariate state rewards (including covariance). We also generalise...

  10. On the multivariate total least-squares approach to empirical coordinate transformations. Three algorithms

    Science.gov (United States)

    Schaffrin, Burkhard; Felus, Yaron A.

    2008-06-01

    The multivariate total least-squares (MTLS) approach aims at estimating a matrix of parameters, $\Xi$, from a linear model $Y - E_Y = (X - E_X)\,\Xi$ that includes an observation matrix, $Y$, another observation matrix, $X$, and matrices of randomly distributed errors, $E_Y$ and $E_X$. Two special cases of the MTLS approach are the standard multivariate least-squares approach, where only the observation matrix $Y$ is perturbed by random errors, and, on the other hand, the data least-squares approach, where only the coefficient matrix $X$ is affected by random errors. In a previous contribution, the authors derived an iterative algorithm to solve the MTLS problem by using the nonlinear Euler-Lagrange conditions. In this contribution, new lemmas are developed to analyze the iterative algorithm, modify it, and compare it with a new ‘closed form’ solution that is based on the singular-value decomposition. For an application, the total least-squares approach is used to estimate the affine transformation parameters that convert cadastral data from the old to the new Israeli datum. Technical aspects of this approach, such as scaling the data and fixing columns in the coefficient matrix, are investigated. This case study illuminates the issue of “symmetry” in the treatment of two sets of coordinates for identical point fields, a topic that had already been emphasized by Teunissen (1989, Festschrift to Torben Krarup, Geodetic Institute Bull. no. 58, Copenhagen, Denmark, pp. 335-342). The differences between the standard least-squares and the TLS approach are analyzed in terms of the estimated variance component and a first-order approximation of the dispersion matrix of the estimated parameters.
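
    An SVD-based closed form for total least squares can be sketched with NumPy. This is the textbook TLS solution for $X\,\Xi \approx Y$ (take the right singular vectors of the augmented matrix $[X \mid Y]$, partition them, and set $\Xi = -V_{12}V_{22}^{-1}$), offered as an assumption about what such a closed form looks like rather than the authors' own derivation. It treats every column of $X$ as noisy and does not implement the data scaling or column fixing discussed in the abstract; the function name `mtls_svd` is hypothetical.

```python
import numpy as np


def mtls_svd(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Textbook SVD-based TLS estimate of Xi in Y - E_Y = (X - E_X) @ Xi.

    X is n x m, Y is n x d; returns the m x d parameter matrix.
    """
    m = X.shape[1]
    # Right singular vectors of the augmented data matrix [X | Y].
    _, _, vt = np.linalg.svd(np.hstack([X, Y]))
    v = vt.T
    v12 = v[:m, m:]  # m x d block belonging to the d smallest singular values
    v22 = v[m:, m:]  # d x d block; must be nonsingular for a TLS solution
    # Xi = -V12 @ inv(V22), computed with a linear solve instead of an inverse.
    return -np.linalg.solve(v22.T, v12.T).T


# Noise-free synthetic data generated from a known parameter matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
Xi_true = np.array([[1.0, 0.5], [-0.2, 2.0], [0.3, -1.0]])
Y = X @ Xi_true
print(np.round(mtls_svd(X, Y), 6))
```

    With noise-free data the estimate reproduces the generating matrix up to rounding error. With noisy data, under the usual uniqueness conditions, it minimizes the Frobenius norm of the combined corrections to $X$ and $Y$, which is what distinguishes it from ordinary least squares.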

  11. Bush encroachment monitoring using multi-temporal Landsat data and random forests

    Science.gov (United States)

    Symeonakis, E.; Higginbottom, T.

    2014-11-01

    It is widely accepted that land degradation and desertification (LDD) are serious global threats to humans and the environment. Around a third of savannahs in Africa are affected by LDD processes that may lead to substantial declines in ecosystem functioning and services. Indirectly, LDD can be monitored using relevant indicators. The encroachment of woody plants into grasslands, and the subsequent conversion of savannahs and open woodlands into shrublands, has attracted much attention over the last decades and has been identified as a potential indicator of LDD. Mapping bush encroachment over large areas can only effectively be done using Earth Observation (EO) data and techniques. However, the accurate assessment of large-scale savannah degradation through bush encroachment with satellite imagery remains a formidable task, because in the satellite data, vegetation variability in response to highly variable rainfall patterns might obscure the underlying degradation processes. Here, we present a methodological framework for the monitoring of bush encroachment-related land degradation in a savannah environment in the Northwest Province of South Africa. We utilise multi-temporal Landsat TM and ETM+ (SLC-on) data from 1989 until 2009, mostly from the dry season, and ancillary data in a GIS environment. We then use the machine learning classification approach of random forests to identify the extent of encroachment over the 20-year period. The results show that, in the study area, bush encroachment is as alarming as permanent vegetation loss. The classification of the year 2009 is validated, yielding low commission and omission errors and high kappa statistic values for the grass and woody vegetation classes. Our approach is a step towards a rigorous and effective savannah degradation assessment.

  12. SALIENCY-GUIDED CHANGE DETECTION OF REMOTELY SENSED IMAGES USING RANDOM FOREST

    Directory of Open Access Journals (Sweden)

    W. Feng

    2018-04-01

    Studies based on object-based image analysis (OBIA), representing the paradigm shift in change detection (CD), have achieved remarkable progress in the last decade. Their aim has been to develop more intelligent methods for interpretation and analysis. The prediction performance and stability of random forest (RF), a relatively new machine learning algorithm, are better than those of many single predictors and integrated forecasting methods. In this paper, we present a novel CD approach for high-resolution remote sensing images, which incorporates visual saliency and RF. First, highly homogeneous and compact image super-pixels are generated using super-pixel segmentation, and the optimal segmentation result is obtained through image superimposition and principal component analysis (PCA). Second, saliency detection is used to guide the search for regions of interest in the initial difference image obtained via the improved robust change vector analysis (RCVA) algorithm. The salient regions within the difference image that correspond to the binarized saliency map are extracted, and the regions are subjected to fuzzy c-means (FCM) clustering to obtain the pixel-level pre-classification result, which can be used as a prerequisite for super-pixel-based analysis. Third, on the basis of the optimal segmentation and pixel-level pre-classification results, the change possibilities of the different super-pixels are calculated. Furthermore, the changed and unchanged super-pixels that serve as the training samples are automatically selected. The spectral features and Gabor features of each super-pixel are extracted. Finally, super-pixel-based CD is implemented by applying RF based on these samples. Experimental results on Ziyuan 3 (ZY3) multi-spectral images show that the proposed method outperforms the compared methods in CD accuracy and confirm the feasibility and effectiveness of the proposed approach.

  13. Random forest meteorological normalisation models for Swiss PM10 trend analysis

    Science.gov (United States)

    Grange, Stuart K.; Carslaw, David C.; Lewis, Alastair C.; Boleti, Eirini; Hueglin, Christoph

    2018-05-01

    Meteorological normalisation is a technique that accounts for changes in meteorology over time in an air quality time series. Controlling for such changes supports robust trend analysis because there is more certainty that the observed trends are due to changes in emissions or chemistry, not changes in meteorology. Predictive random forest models (RF; a decision tree machine learning technique) were grown for 31 air quality monitoring sites in Switzerland using surface meteorological, synoptic-scale, boundary layer height, and time variables to explain daily PM10 concentrations. The RF models were used to calculate meteorologically normalised trends, which were formally tested and evaluated using the Theil-Sen estimator. Between 1997 and 2016, significantly decreasing normalised PM10 trends ranged between -0.09 and -1.16 µg m⁻³ yr⁻¹, with urban traffic sites experiencing the greatest mean decrease in PM10 concentrations at -0.77 µg m⁻³ yr⁻¹. Similar magnitudes have been reported for normalised PM10 trends for earlier time periods in Switzerland, which indicates that PM10 concentrations are continuing to decrease at rates similar to those in the past. The interpretability of the RF models was leveraged using partial dependence plots to explain the observed trends and the relevant physical and chemical processes influencing PM10 concentrations. Notably, the models suggested two regimes that cause elevated PM10 concentrations in Switzerland: one related to poor dispersion conditions and a second resulting from high rates of secondary PM generation in deep, photochemically active boundary layers. The RF meteorological normalisation process was found to be robust, user-friendly, simple to implement, and readily interpretable, which suggests the technique could be useful in many exploratory air quality data analysis situations.
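
    The Theil-Sen estimator used for trend evaluation takes the median of all pairwise slopes, which makes it robust to outlying years. A minimal Python sketch follows; the annual series is hypothetical, and neither the RF normalisation step nor the significance test applied in the study is reproduced.

```python
from itertools import combinations
from statistics import median
from typing import List, Sequence, Tuple


def theil_sen(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Theil-Sen fit: median of all pairwise slopes, then median intercept."""
    slopes: List[float] = [(y[j] - y[i]) / (x[j] - x[i])
                           for i, j in combinations(range(len(x)), 2)
                           if x[j] != x[i]]
    slope = median(slopes)
    intercept = median([yi - slope * xi for xi, yi in zip(x, y)])
    return slope, intercept


# Hypothetical annual normalised PM10 means (ug m^-3) with one outlying year.
years = [2000 + i for i in range(10)]
pm10 = [20.0 - 0.5 * i for i in range(10)]
pm10[-1] = 40.0  # outlier in the final year
slope, intercept = theil_sen(years, pm10)
print(f"trend: {slope:+.2f} per year")
```

    The single outlying year leaves the estimated slope at -0.5 per year, whereas an ordinary least-squares fit to the same series would be pulled upward by the outlier.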

  14. Saliency-Guided Change Detection of Remotely Sensed Images Using Random Forest

    Science.gov (United States)

    Feng, W.; Sui, H.; Chen, X.

    2018-04-01

    Studies based on object-based image analysis (OBIA), representing the paradigm shift in change detection (CD), have achieved remarkable progress in the last decade. Their aim has been to develop more intelligent methods for interpretation and analysis. The prediction performance and stability of random forest (RF), a relatively new machine learning algorithm, are better than those of many single predictors and integrated forecasting methods. In this paper, we present a novel CD approach for high-resolution remote sensing images, which incorporates visual saliency and RF. First, highly homogeneous and compact image super-pixels are generated using super-pixel segmentation, and the optimal segmentation result is obtained through image superimposition and principal component analysis (PCA). Second, saliency detection is used to guide the search for regions of interest in the initial difference image obtained via the improved robust change vector analysis (RCVA) algorithm. The salient regions within the difference image that correspond to the binarized saliency map are extracted, and the regions are subjected to fuzzy c-means (FCM) clustering to obtain the pixel-level pre-classification result, which can be used as a prerequisite for super-pixel-based analysis. Third, on the basis of the optimal segmentation and pixel-level pre-classification results, the change possibilities of the different super-pixels are calculated. Furthermore, the changed and unchanged super-pixels that serve as the training samples are automatically selected. The spectral features and Gabor features of each super-pixel are extracted. Finally, super-pixel-based CD is implemented by applying RF based on these samples. Experimental results on Ziyuan 3 (ZY3) multi-spectral images show that the proposed method outperforms the compared methods in CD accuracy and confirm the feasibility and effectiveness of the proposed approach.
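
    The difference image in this pipeline comes from an improved robust change vector analysis (RCVA). As a simpler stand-in, and not the RCVA variant or the saliency and FCM steps of the paper, the Python sketch below computes plain change vector analysis: the Euclidean length of each pixel's spectral difference vector between two dates. The fixed threshold in `changed_mask` is a hypothetical simplification of the paper's clustering-based pre-classification.

```python
import math
from typing import List

# An image is stored as rows x columns x spectral bands.
Image = List[List[List[float]]]


def change_vector_magnitude(before: Image, after: Image) -> List[List[float]]:
    """Per-pixel Euclidean length of the spectral change vector (plain CVA)."""
    return [[math.sqrt(sum((b2 - b1) ** 2 for b1, b2 in zip(p1, p2)))
             for p1, p2 in zip(row1, row2)]
            for row1, row2 in zip(before, after)]


def changed_mask(magnitude: List[List[float]], threshold: float) -> List[List[bool]]:
    """Binary change map from a hypothetical fixed magnitude threshold."""
    return [[m > threshold for m in row] for row in magnitude]


# Hypothetical 2 x 2 image with two bands; only pixel (1, 1) changes, by (3, 4).
before = [[[10.0, 20.0], [11.0, 21.0]],
          [[12.0, 22.0], [13.0, 23.0]]]
after = [[[10.0, 20.0], [11.0, 21.0]],
         [[12.0, 22.0], [16.0, 27.0]]]
magnitude = change_vector_magnitude(before, after)
print(magnitude)  # [[0.0, 0.0], [0.0, 5.0]]
print(changed_mask(magnitude, threshold=1.0))
```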

  15. Growth and structure of a young Aleppo pine planted forest after thinning for diversification and wildfire prevention

    Directory of Open Access Journals (Sweden)

    J. Ruiz-Mirazo

    2013-04-01

    Aim of study: In the Mediterranean, low timber-production forests are frequently thinned to promote biodiversity and reduce wildfire risk, but few studies in the region have addressed such goals. The aim of this research was to compare six thinning regimes applied to create a fuelbreak in a young Aleppo pine (Pinus halepensis Mill.) planted forest. Area of study: A semiarid continental high plateau in south-eastern Spain. Material and Methods: Three thinning intensities (Light, Medium and Heavy) were combined with two thinning methods: (i) Random (tree selection) and (ii) Regular (tree spacing). Tree growth and stand structure measurements were made four years after the treatments. Main results: Heavy Random thinning successfully transformed the regular tree plantation pattern into a close-to-random spatial tree distribution. Heavy Regular thinning (followed by the Medium Regular and Heavy Random regimes) significantly reduced growth in stand basal area and biomass. Individual tree growth, in contrast, was greater in Heavy and Medium thinnings than in Light ones, which were similar to the Control. Research highlights: Heavy Random thinning seemed the most appropriate in a young Aleppo pine planted forest to reduce fire risk and artificial tree distribution simultaneously. Light Regular thinning avoids understocking the stand and may be the most suitable treatment for creating a fuelbreak when the undergrowth poses a high fire risk. Keywords: Pinus halepensis; Mediterranean; Forest structure; Tree growth; Wildfire risk; Diversity.

  16. Rural income and forest reliance in highland Guatemala

    DEFF Research Database (Denmark)

    Córdova, José Pablo Prado; Wunder, Sven; Smith-Hall, Carsten

    2013-01-01

    This paper estimates rural household-level forest reliance in the western highlands of Guatemala using quantitative methods. Data were generated by way of an in-depth household income survey, repeated quarterly between November 2005 and November 2006, in 11 villages (n = 149 randomly selected...

  17. Biomass Carbon Content in Schima- Castanopsis Forest of Midhills of Nepal: A Case Study from Jaisikuna Community Forest, Kaski

    Directory of Open Access Journals (Sweden)

    Sushma Tripathi

    2018-01-01

    Community forests in Nepal’s midhills have high potential to sequester carbon. This paper analyzes the biomass carbon stock in the Schima-Castanopsis forest of Jaisikuna community forest, Kaski district, Nepal. The forest area was divided into two blocks, and 18 sample plots (9 in each block) were laid out randomly. Diameter at breast height (DBH) and height of trees (DBH ≥ 5 cm) were measured using a DBH tape and a clinometer. Leaf litter, herbs, grasses and seedlings were collected from 1 m × 1 m plots and their fresh weight was recorded. To calculate carbon, biomass was multiplied by the default value of 0.47. The AGTB carbon content of Chilaune, Katus and other species was found to be 19.56 t/ha, 18.66 t/ha and 3.59 t/ha, respectively. The AGTB of the Chilaune-dominated block, the Katus-dominated block and the whole forest was found to be 43.78 t/ha, 39.83 t/ha and 41.81 t/ha, respectively. The carbon content of leaf litter, herbs, grasses and seedlings was 2.73 t/ha. Below-ground biomass carbon for the whole forest was 6.27 t/ha. Total biomass and carbon of the forest were 108.09 t/ha and 50.80 t/ha, respectively. The difference in biomass and carbon content between the Chilaune-dominated and Katus-dominated blocks was found to be insignificant. This study recorded a lower biomass carbon content than the average for Nepal's forests, but this variation in carbon stock is not necessarily due to the dominant species present in the forest. Carbon estimation in forests at different elevations, aspects and locations is recommended for further research. International Journal of Environment, Volume 6, Issue 4, Sep-Nov 2017, pages 72-84
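
    The carbon figures follow from a single multiplication by the default carbon fraction of 0.47. The short Python check below uses only numbers reported in the abstract: the total biomass of 108.09 t/ha converts to about 50.80 t/ha of carbon, and the separately reported carbon pools (41.81 + 6.27 + 2.73 t/ha) add up to 50.81 t/ha, matching the reported total within rounding. Treating these pools as additive is an assumption of this check, not a statement from the paper.

```python
CARBON_FRACTION = 0.47  # default biomass-to-carbon factor stated in the abstract


def carbon_from_biomass(biomass_t_per_ha: float) -> float:
    """Convert biomass (t/ha) to carbon (t/ha) using the default fraction."""
    return biomass_t_per_ha * CARBON_FRACTION


# Reported total biomass of the whole forest: 108.09 t/ha.
print(round(carbon_from_biomass(108.09), 2))  # 50.8, i.e. the reported 50.80 t/ha

# Reported carbon pools (t/ha): above-ground tree carbon by species
# (Chilaune, Katus, other), below-ground carbon, and litter/herbs/grasses/seedlings.
above_ground = 19.56 + 18.66 + 3.59
pools_total = above_ground + 6.27 + 2.73
print(round(above_ground, 2), round(pools_total, 2))
```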

  18. Pitfalls and potential of particle swarm optimization for contemporary spatial forest planning

    Energy Technology Data Exchange (ETDEWEB)

    Shan, Y.; Bettinger, P.; Cieszewski, C.; Wang, W.

    2012-07-01

    We describe here an example of applying particle swarm optimization (PSO), a population-based heuristic technique, to maximize the net present value of a contemporary southern United States forest plan that includes spatial constraints (green-up and adjacency) and wood flow constraints. When initiated with randomly defined feasible initial conditions and tuned with some appropriate modifications, the PSO algorithm gradually converged upon its final solution and provided reasonable objective function values. However, only 86% of the global optimal value could be achieved using the modified PSO heuristic. The results of this study suggest that under random-start initial population conditions the PSO heuristic may have rather limited application to forest planning problems with economic objectives, wood-flow constraints, and spatial considerations. Pitfalls include the need to modify the structure of PSO both to address spatial constraints and to repair particles, and the need to modify some of the basic assumptions of PSO to better address contemporary forest planning problems. Our results, and hence our contributions, are contrary to earlier work that illustrated the impressive potential of PSO when applied to stand-level forest planning problems or when applied to a high-quality initial population. (Author) 46 refs.
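
    For readers unfamiliar with PSO, here is a minimal global-best PSO in Python with random-start initial positions. It is a generic sketch on a hypothetical toy objective (the sphere function), not the modified algorithm of the study: the NPV objective, the green-up, adjacency and wood-flow constraints and the particle-repair step are not modelled, and clamping positions to a box only loosely stands in for them. The inertia and acceleration coefficients are common textbook values, chosen here as an assumption. To maximize a quantity such as net present value, one would minimize its negative.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_minimize(f: Callable[[Sequence[float]], float], dim: int,
                 bounds: Tuple[float, float], n_particles: int = 20,
                 iterations: int = 200, w: float = 0.729,
                 c1: float = 1.49445, c2: float = 1.49445,
                 seed: int = 0) -> Tuple[List[float], float]:
    """Global-best PSO with randomly initialised particles (random start)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]            # each particle's best position
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # swarm's best position
    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp to the search box (a crude stand-in for constraint repair).
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x: Sequence[float]) -> float:
    """Hypothetical toy objective with its minimum 0 at the origin."""
    return sum(v * v for v in x)


best, best_val = pso_minimize(sphere, dim=2, bounds=(-5.0, 5.0))
print(best, best_val)
```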

  19. The behaviour of random forest permutation-based variable importance measures under predictor correlation.

    Science.gov (United States)

    Nicodemus, Kristin K; Malley, James D; Strobl, Carolin; Ziegler, Andreas

    2010-02-27

    Random forests (RF) have been increasingly used in applications such as genome-wide association and microarray studies where predictor correlation is frequently observed. Recent studies on permutation-based variable importance measures (VIMs) used in RF have come to apparently contradictory conclusions. We present an extended simulation study to synthesize results. In the case when both predictor correlation was present and predictors were associated with the outcome (HA), the unconditional RF VIM attributed a higher share of importance to correlated predictors, while under the null hypothesis that no predictors are associated with the outcome (H0) the unconditional RF VIM was unbiased. Conditional VIMs showed a decrease in VIM values for correlated predictors versus the unconditional VIMs under HA and were unbiased under H0. Scaled VIMs were clearly biased under HA and H0. Unconditional unscaled VIMs are a computationally tractable choice for large datasets and are unbiased under the null hypothesis. Whether the observed increased VIMs for correlated predictors may be considered a "bias" (because they do not directly reflect the coefficients in the generating model) or a beneficial attribute of these VIMs depends on the application. For example, in genetic association studies, where correlation between markers may help to localize the functionally relevant variant, the increased importance of correlated predictors may be an advantage. On the other hand, we show examples where this increased importance may result in spurious signals.
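
    The unconditional permutation VIM measures how much a fitted model's prediction error grows when one predictor's values are shuffled. The Python sketch below applies that computation to a hypothetical stand-in predictor rather than a random forest, so it illustrates only the mechanics; the conditional VIM, which roughly speaking permutes within groups defined by the correlated predictors, is not implemented.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]
Model = Callable[[Row], float]


def mse(model: Model, X: Sequence[Row], y: Sequence[float]) -> float:
    """Mean squared prediction error of model on (X, y)."""
    return sum((model(row) - t) ** 2 for row, t in zip(X, y)) / len(y)


def permutation_importance(model: Model, X: Sequence[Row], y: Sequence[float],
                           n_repeats: int = 10, seed: int = 0) -> List[float]:
    """Unconditional permutation VIM: mean rise in MSE after shuffling one column."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # breaks the link between predictor j and y
            X_perm = [list(row[:j]) + [v] + list(row[j + 1:])
                      for row, v in zip(X, column)]
            increases.append(mse(model, X_perm, y) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical data: x1 is strongly correlated with x0, but y depends on x0 only.
rng = random.Random(1)
x0 = [rng.gauss(0.0, 1.0) for _ in range(200)]
X = [[a, a + rng.gauss(0.0, 0.1)] for a in x0]
y = [2.0 * a for a in x0]


def stand_in_model(row: Row) -> float:
    """Hypothetical fitted predictor that happens to use x0 only."""
    return 2.0 * row[0]


print(permutation_importance(stand_in_model, X, y))
```

    Here x1 receives zero importance despite its strong correlation with x0, because the stand-in model never uses it. A random forest, by contrast, may split on either of two correlated predictors, so both can end up with nonzero unconditional importance.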

  20. Recovery of Forest and Phylogenetic Structure in Abandoned Cocoa Agroforestry in the Atlantic Forest of Brazil.

    Science.gov (United States)

    Rolim, Samir Gonçalves; Sambuichi, Regina Helena Rosa; Schroth, Götz; Nascimento, Marcelo Trindade; Gomes, José Manoel Lucio

    2017-03-01

    Cocoa agroforests like the cabrucas of Brazil's Atlantic forest are among the agro-ecosystems with the greatest potential for biodiversity conservation. Despite a global trend toward their intensification, cocoa agroforests are also being abandoned, for socioeconomic reasons (especially on marginal sites), because they are incorporated into public or private protected areas, or because they are part of mandatory set-asides under Brazilian environmental legislation. However, little is known about phylogenetic structure, the processes of forest regeneration after abandonment and the conservation value of former cabruca sites. Here we compare the vegetation structure and composition of a former cabruca 30-40 years after abandonment with a managed cabruca and mature forest in the Atlantic forest region of Espirito Santo, Brazil. The forest in the abandoned cabruca had recovered a substantial part of its original structure. The abandoned cabruca had a higher density (mean ± CI95 %: 525.0 ± 40.3 stems per ha), basal area (34.0 ± 6.5 m² per ha) and species richness (148 ± 11.5 species) than the managed cabruca (96.0 ± 17.7; 24.15 ± 3.9 and 114.5 ± 16.0, respectively), but no significant differences from mature forest in density (581.0 ± 42.2), basal area (29.9 ± 3.3) and species richness (162.6 ± 15.5 species). Thinning (understory removal) changes phylogenetic structure from evenness in mature forest to clustering in managed cabruca, but after 30-40 years the abandoned cabruca had a random phylogenetic structure, probably due to a balance between biotic and abiotic filters at this age. We conclude that abandoned cocoa agroforests present highly favorable conditions for the regeneration of Atlantic forest and could contribute to the formation of an interconnected network of forest habitat in this biodiversity hotspot.

  1. Random phenomena fundamentals of probability and statistics for engineers

    CERN Document Server

    Ogunnaike, Babatunde A

    2009-01-01

    Prelude; Approach Philosophy; Four Basic Principles; I Foundations; Two Motivating Examples; Yield Improvement in a Chemical Process; Quality Assurance in a Glass Sheet Manufacturing Process; Outline of a Systematic Approach; Random Phenomena, Variability, and Uncertainty; Two Extreme Idealizations of Natural Phenomena; Random Mass Phenomena; Introducing Probability; The Probabilistic Framework; II Probability; Fundamentals of Probability Theory; Building Blocks; Operations; Probability; Conditional Probability; Independence; Random Variables and Distributions; Distributions; Mathematical Expectation; Characterizing Distributions; Special Derived Probability Functions; Multidimensional Random Variables; Distributions of Several Random Variables; Distributional Characteristics of Jointly Distributed Random Variables; Random Variable Transformations; Single Variable Transformations; Bivariate Transformations; General Multivariate Transformations; Application Case Studies I: Probability; Mendel and Heredity; World War II Warship Tactical Response Under Attack; III Distributions; Ide...

  2. Effectiveness of community forestry in Prey Long forest, Cambodia.

    Science.gov (United States)

    Lambrick, Frances H; Brown, Nick D; Lawrence, Anna; Bebber, Daniel P

    2014-04-01

    Cambodia has 57% forest cover, the second highest in the Greater Mekong region, and a high deforestation rate (1.2%/year, 2005-2010). Community forestry (CF) has been proposed as a way to reduce deforestation and support livelihoods through local management of forests. CF is expanding rapidly in Cambodia. The National Forests Program aims to designate one million hectares of forest to CF by 2030. However, the effectiveness of CF in conservation is not clear due to a global lack of controlled comparisons, multiple meanings of CF, and the context-specific nature of CF implementation. We assessed the effectiveness of CF by comparing 9 CF sites with paired controls in state production forest in the area of Prey Long forest, Cambodia. We assessed forest condition in 18-20 randomly placed variable-radius plots and fixed-area regeneration plots. We surveyed 10% of households in each of the 9 CF villages to determine the proportion that used forest products, as a measure of household dependence on the forest. CF sites had fewer signs of anthropogenic damage (cut stems, stumps, and burned trees), higher aboveground biomass, more regenerating stems, and reduced canopy openness than control areas. Abundance of economically valuable species, however, was higher in control sites. We used survey results and geographic parameters to model factors affecting CF outcomes. The interaction between management type (CF or control) and forest dependence indicated that CF was more effective where the community relied on forest products for subsistence use and income. © 2014 Society for Conservation Biology.

  3. EcmPred: Prediction of extracellular matrix proteins based on random forest with maximum relevance minimum redundancy feature selection

    KAUST Repository

    Kandaswamy, Krishna Kumar Umar

    2013-01-01

    The extracellular matrix (ECM) is a major component of tissues of multicellular organisms. It consists of secreted macromolecules, mainly polysaccharides and glycoproteins. Malfunctions of ECM proteins lead to severe disorders such as Marfan syndrome, osteogenesis imperfecta, numerous chondrodysplasias, and skin diseases. In this work, we report a random forest approach, EcmPred, for the prediction of ECM proteins from protein sequences. EcmPred was trained on a dataset containing 300 ECM and 300 non-ECM proteins and tested on a dataset containing 145 ECM and 4187 non-ECM proteins. EcmPred achieved 83% accuracy on the training dataset and 77% on the test dataset. EcmPred predicted 15 out of 20 experimentally verified ECM proteins. By scanning the entire human proteome, we predicted novel ECM proteins validated with gene ontology and InterPro. The dataset and a standalone version of the EcmPred software are available at http://www.inb.uni-luebeck.de/tools-demos/Extracellular_matrix_proteins/EcmPred. © 2012 Elsevier Ltd.

  4. Methods for identifying SNP interactions: a review on variations of Logic Regression, Random Forest and Bayesian logistic regression.

    Science.gov (United States)

    Chen, Carla Chia-Ming; Schwender, Holger; Keith, Jonathan; Nunkesser, Robin; Mengersen, Kerrie; Macrossan, Paula

    2011-01-01

    Due to advancements in computational ability, enhanced technology and a reduction in the price of genotyping, more data are being generated for understanding genetic associations with diseases and disorders. However, the availability of large data sets brings the inherent challenge of developing new methods of statistical analysis and modeling. Because a complex phenotype may result from a combination of multiple loci, various statistical methods have been developed for identifying genetic epistasis effects. Among these methods, logic regression (LR) is an intriguing approach incorporating tree-like structures. Various methods have built on the original LR to improve different aspects of the model. In this study, we review four variations of LR, namely Logic Feature Selection, Monte Carlo Logic Regression, Genetic Programming for Association Studies, and Modified Logic Regression-Gene Expression Programming, and investigate the performance of each method using simulated and real genotype data. We contrast these with another tree-like approach, namely Random Forests, and with a Bayesian logistic regression with stochastic search variable selection.

  5. Multivariate analysis: models and method

    International Nuclear Information System (INIS)

    Sanz Perucha, J.

    1990-01-01

    Data treatment techniques are increasingly used as computer methods become more widely accessible. Multivariate analysis consists of a group of statistical methods that are applied to study objects or samples characterized by multiple values. The final goal is decision making. The paper describes the models and methods of multivariate analysis.

  6. A Sample-Based Forest Monitoring Strategy Using Landsat, AVHRR and MODIS Data to Estimate Gross Forest Cover Loss in Malaysia between 1990 and 2005

    Directory of Open Access Journals (Sweden)

    Peter Potapov

    2013-04-01

    Insular Southeast Asia is a hotspot of humid tropical forest cover loss. A sample-based monitoring approach quantifying forest cover loss from Landsat imagery was implemented to estimate gross forest cover loss for two eras, 1990–2000 and 2000–2005. For each time interval, a probability sample of 18.5 km × 18.5 km blocks was selected, and pairs of Landsat images acquired per sample block were interpreted to quantify forest cover area and gross forest cover loss. Stratified random sampling was implemented for 2000–2005, with MODIS-derived forest cover loss used to define the strata. A probability proportional to x (πpx) design was implemented for 1990–2000, with AVHRR-derived forest cover loss used as the x variable to increase the likelihood of including forest loss area in the sample. The estimated annual gross forest cover loss for Malaysia was 0.43 Mha/yr (SE = 0.04) during 1990–2000 and 0.64 Mha/yr (SE = 0.055) during 2000–2005. Our use of the πpx sampling design represents a first practical trial of this design for sampling satellite imagery. Although the design performed adequately in this study, a thorough comparative investigation of the πpx design relative to other sampling strategies is needed before general design recommendations can be put forth.
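    The abstract does not state which estimator accompanied the πpx design. As a hedged illustration only, the sketch below assumes the textbook setup for a without-replacement πps design: inclusion probabilities π_i = n·x_i / Σx, and the Horvitz–Thompson estimator Σ y_i / π_i for a population total. The function names and the toy numbers are hypothetical and are not the study's data.

    ```python
    from typing import List, Sequence


    def pi_px_inclusion_probs(x: Sequence[float], n: int) -> List[float]:
        """Inclusion probabilities pi_i = n * x_i / sum(x) for a pi-px design.

        Assumes every x_i > 0 and that no pi_i exceeds 1; real designs handle
        large-x units (certainty units) separately, which is not done here.
        """
        total = sum(x)
        probs = [n * xi / total for xi in x]
        if any(p > 1.0 for p in probs):
            raise ValueError("some pi_i > 1; treat those units as certainty units")
        return probs


    def horvitz_thompson_total(y_sample: Sequence[float],
                               pi_sample: Sequence[float]) -> float:
        """Estimate the population total as sum(y_i / pi_i) over sampled units."""
        return sum(y / p for y, p in zip(y_sample, pi_sample))


    # Hypothetical example: 5 blocks with an auxiliary variable x (e.g. a
    # coarse-resolution loss estimate); n = 2 blocks are mapped in detail.
    x = [1.0, 2.0, 3.0, 4.0, 10.0]
    pis = pi_px_inclusion_probs(x, n=2)
    # Suppose the sampled blocks are indices 1 and 4, with measured loss y.
    estimate = horvitz_thompson_total([2.5, 9.0], [pis[1], pis[4]])
    print(pis, estimate)
    ```

    Weighting each sampled block by 1/π_i is what lets a sample deliberately tilted toward high-x (likely-loss) blocks still give a design-unbiased total, provided the π_i are correct. Units whose π_i would exceed 1 are normally taken with certainty and the remaining probabilities recomputed; this sketch simply raises an error in that case.
    
    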

  7. Multivariant analyses of trace element patterns for environmental tracking

    International Nuclear Information System (INIS)

    Jervis, R.E.; Ko, M.M.C.; Junliang Tian; Puling Liu

    1993-01-01

    Nuclear-based analytical techniques (INAA, PIXE and photon activation) permit simultaneous multielemental determination of concentrations in environmental materials. The resulting data are often sufficiently precise and free of uncontrolled random errors across the various elements that the data sets can yield valuable information on elemental communality through multivariate statistical 'factor' analysis. Characteristic factor patterns obtained in this way can provide clues to the likely sources of various components in the environment. Recent studies in three different environmental situations (solid waste incinerators, Chinese soils, and the iron and steel industry), involving measurements of 30-35 elements, have yielded distinct elemental patterns, or environmental signatures, with factor loading coefficients mostly in the range 0.7-0.96. (author) 10 refs.; 2 figs.; 9 tabs

  8. Defining Higher-Order Turbulent Moment Closures with an Artificial Neural Network and Random Forest

    Science.gov (United States)

    McGibbon, J.; Bretherton, C. S.

    2017-12-01

    Unresolved turbulent advection and clouds must be parameterized in atmospheric models. Modern higher-order closure schemes depend on analytic moment closure assumptions that diagnose higher-order moments in terms of lower-order ones. These are then tested against Large-Eddy Simulation (LES) higher-order moment relations. However, these relations may not be neatly analytic in nature. Rather than rely on an analytic higher-order moment closure, can we use machine learning on LES data itself to define a higher-order moment closure? We assess the ability of a deep artificial neural network (NN) and a random forest (RF) to perform this task using a set of observationally-based LES runs from the MAGIC field campaign. By training on a subset of 12 simulations and testing on the remaining simulations, we avoid over-fitting the training data. Performance of the NN and RF will be assessed and compared to the Analytic Double Gaussian 1 (ADG1) closure assumed by Cloudy Layers Unified By Binormals (CLUBB), a higher-order turbulence closure currently used in the Community Atmosphere Model (CAM). We will show that the RF outperforms the NN and the ADG1 closure for the MAGIC cases within this diagnostic framework. Progress and challenges in using a diagnostic machine learning closure within a prognostic cloud and turbulence parameterization will also be discussed.
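    Holding out whole simulations, rather than randomly chosen samples, matters because samples drawn from one LES run are strongly correlated, so a sample-level split would leak information into the test set. A minimal sketch of such a group-wise split, assuming each sample is tagged with a simulation ID; the IDs, the 15-run total and the 3 samples per run are hypothetical, and the abstract does not say how the 12 training runs were chosen.

    ```python
    import random
    from typing import List, Sequence, Tuple

    Sample = Tuple[str, object]  # (simulation_id, payload)


    def split_by_simulation(samples: Sequence[Sample], n_train_sims: int,
                            seed: int = 0) -> Tuple[List[Sample], List[Sample]]:
        """Split samples so that no simulation appears in both train and test."""
        sim_ids = sorted({sim for sim, _ in samples})
        rng = random.Random(seed)        # seeded for a reproducible split
        rng.shuffle(sim_ids)
        train_ids = set(sim_ids[:n_train_sims])
        train = [s for s in samples if s[0] in train_ids]
        test = [s for s in samples if s[0] not in train_ids]
        return train, test


    # Hypothetical data: 15 simulations with 3 samples each.
    samples = [(f"sim{i:02d}", j) for i in range(15) for j in range(3)]
    train, test = split_by_simulation(samples, n_train_sims=12)
    print(len(train), len(test))  # 36 9
    ```
    
    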

  9. Multivariate meta-analysis: a robust approach based on the theory of U-statistic.

    Science.gov (United States)

    Ma, Yan; Mazumdar, Madhu

    2011-10-30

    Meta-analysis is the methodology for combining findings from similar research studies asking the same question. When the question of interest involves multiple outcomes, multivariate meta-analysis is used to synthesize the outcomes simultaneously, taking into account the correlation between the outcomes. Likelihood-based approaches, in particular the restricted maximum likelihood (REML) method, are commonly used in this context. REML assumes a multivariate normal distribution for the random-effects model. This assumption is difficult to verify, especially for a meta-analysis with a small number of component studies. The use of REML also requires iterative estimation between parameters, needing moderately high computation time, especially when the dimension of outcomes is large. A multivariate method of moments (MMM) is available and has been shown to perform as well as REML. However, there is a lack of information on the performance of these two methods when the true data distribution is far from normal. In this paper, we propose a new nonparametric and non-iterative method for multivariate meta-analysis based on the theory of U-statistics and compare the properties of these three procedures under both normal and skewed data through simulation studies. It is shown that the effect of non-normal data on REML estimates is marginal and that the estimates from the MMM and U-statistic-based approaches are very similar. Therefore, we conclude that for performing multivariate meta-analysis, the U-statistic estimation procedure is a viable alternative to REML and MMM. Easy implementation of all three methods is illustrated by their application to data from two published meta-analyses from the fields of hip fracture and periodontal disease. We discuss ideas for future research based on U-statistics for testing the significance of between-study heterogeneity and for extending the work to the meta-regression setting. Copyright © 2011 John Wiley & Sons, Ltd.

  10. Spatio-temporal dynamics of the tropical rain forest

    Energy Technology Data Exchange (ETDEWEB)

    Chave, J. [CEN Saclay, Gif-sur-Yvette (France). Service de Physique de l'Etat Condense]

    2000-07-01

    Mechanisms which drive the dynamics of forest ecosystems are complex, from seedling establishment to pollination and seed dispersal by animals, running water or wind. These processes are even more complex when the ecosystem shelters a large number of species and vegetative forms, as is the case in the tropical rainforest. To take them into account, we must develop and use models. I present a review of the fundamental mechanisms of natural forest dynamics (photosynthesis, tree growth, recruitment and mortality) as well as a description of the past and present of tropical rainforests. This information is used to develop a spatially explicit, individual-based forest model. Simplified models are deduced from it, and they serve to address more specific issues, such as the resilience of the forest to climate disturbances, or savanna-forest dynamics. The last topic is related to the spatio-temporal description of tropical plant biodiversity. A detailed introduction to the problem is provided, and models accounting for the maintenance of diversity are compared. These models include non-spatial as well as spatial approaches (branching annihilating random walks and the voter model with mutation). (orig.)

  11. Multivariate strategies in functional magnetic resonance imaging

    DEFF Research Database (Denmark)

    Hansen, Lars Kai

    2007-01-01

    We discuss aspects of multivariate fMRI modeling, including the statistical evaluation of multivariate models and means for dimensional reduction. In a case study we analyze linear and non-linear dimensional reduction tools in the context of a 'mind reading' predictive multivariate fMRI model....

  12. Variation in nutrient characteristics of surface soils from the Luquillo Experimental Forest of Puerto Rico: A multivariate perspective.

    Science.gov (United States)

    S. B. Cox; M. R. Willig; F. N. Scatena

    2002-01-01

    We assessed the effects of landscape features (vegetation type and topography), season, and spatial hierarchy on the nutrient content of surface soils in the Luquillo Experimental Forest (LEF) of Puerto Rico. Considerable spatial variation characterized the soils of the LEF, and differences between replicate sites within each combination of vegetation type (tabonuco vs...

  13. Applied multivariate statistical analysis

    CERN Document Server

    Härdle, Wolfgang Karl

    2015-01-01

    Focusing on high-dimensional applications, this 4th edition presents the tools and concepts used in multivariate data analysis in a style that is also accessible for non-mathematicians and practitioners.  It surveys the basic principles and emphasizes both exploratory and inferential statistics; a new chapter on Variable Selection (Lasso, SCAD and Elastic Net) has also been added.  All chapters include practical exercises that highlight applications in different multivariate data analysis fields: in quantitative financial studies, where the joint dynamics of assets are observed; in medicine, where recorded observations of subjects in different locations form the basis for reliable diagnoses and medication; and in quantitative marketing, where consumers’ preferences are collected in order to construct models of consumer behavior.  All of these examples involve high to ultra-high dimensions and represent a number of major fields in big data analysis. The fourth edition of this book on Applied Multivariate ...

  14. Cross-covariance functions for multivariate random fields based on latent dimensions

    KAUST Repository

    Apanasovich, T. V.

    2010-02-16

    The problem of constructing valid parametric cross-covariance functions is challenging. We propose a simple methodology, based on latent dimensions and existing covariance models for univariate random fields, to develop flexible, interpretable and computationally feasible classes of cross-covariance functions in closed form. We focus on spatio-temporal cross-covariance functions that can be nonseparable, asymmetric and can have different covariance structures, for instance different smoothness parameters, in each component. We discuss estimation of these models and perform a small simulation study to demonstrate our approach. We illustrate our methodology on a trivariate spatio-temporal pollution dataset from California and demonstrate that our cross-covariance performs better than other competing models. © 2010 Biometrika Trust.

  15. Is more better or worse? New empirics on nuclear proliferation and interstate conflict by Random Forests

    Directory of Open Access Journals (Sweden)

    Akisato Suzuki

    2015-06-01

    In the literature on nuclear proliferation, some argue that further proliferation decreases interstate conflict, some say that it increases interstate conflict, and others indicate a non-linear relationship between these two factors. However, there has been no systematic empirical investigation of the relationship between nuclear proliferation and the propensity for conflict at the interstate–systemic level. To fill this gap, the current paper uses the machine learning method Random Forests, which can investigate complex non-linear relationships between dependent and independent variables, and which can identify important regressors from a group of all potential regressors in explaining the relationship between nuclear proliferation and the propensity for conflict. The results indicate that, on average, a larger number of nuclear states decreases the systemic propensity for interstate conflict, while the emergence of new nuclear states does not have an important effect. This paper also notes, however, that scholars should investigate other risks of proliferation to assess whether nuclear proliferation is better or worse for international peace and security in general.

  16. Multivariate Bonferroni-type inequalities theory and applications

    CERN Document Server

    Chen, John

    2014-01-01

    Multivariate Bonferroni-Type Inequalities: Theory and Applications presents a systematic account of research discoveries on multivariate Bonferroni-type inequalities published in the past decade. The emergence of new bounding approaches pushes the conventional definitions of optimal inequalities and demands new insights into linear and Fréchet optimality. The book explores these advances in bounding techniques with corresponding innovative applications. It presents the method of linear programming for multivariate bounds, multivariate hybrid bounds, sub-Markovian bounds, and bounds using Hamil

  17. A kernel version of multivariate alteration detection

    DEFF Research Database (Denmark)

    Nielsen, Allan Aasbjerg; Vestergaard, Jacob Schack

    2013-01-01

    Based on the established methods kernel canonical correlation analysis and multivariate alteration detection we introduce a kernel version of multivariate alteration detection. A case study with SPOT HRV data shows that the kMAD variates focus on extreme change observations.

  18. Mapping Forest Inventory and Analysis forest land use: timberland, reserved forest land, and other forest land

    Science.gov (United States)

    Mark D. Nelson; John Vissage

    2007-01-01

    The Forest Inventory and Analysis (FIA) program produces area estimates of forest land use within three subcategories: timberland, reserved forest land, and other forest land. Mapping these subcategories of forest land requires the ability to spatially distinguish productive from unproductive land, and reserved from nonreserved land. FIA field data were spatially...

  19. Mapping SOC (Soil Organic Carbon) using LiDAR-derived vegetation indices in a random forest regression model

    Science.gov (United States)

    Will, R. M.; Glenn, N. F.; Benner, S. G.; Pierce, J. L.; Spaete, L.; Li, A.

    2015-12-01

    Quantifying SOC (Soil Organic Carbon) storage in complex terrain is challenging due to high spatial variability. Generally, the challenge is met by transforming point data to the entire landscape using surrogate, spatially distributed variables like elevation or precipitation. In many ecosystems, remotely sensed information on above-ground vegetation (e.g. NDVI) is a good predictor of below-ground carbon stocks. In this project, we are attempting to improve this predictive method by incorporating LiDAR-derived vegetation indices. LiDAR provides a mechanism for improved characterization of aboveground vegetation by providing structural parameters such as vegetation height and biomass. In this study, a random forest model is used to predict SOC using a suite of LiDAR-derived vegetation indices as predictor variables. The Reynolds Creek Experimental Watershed (RCEW) is an ideal location for a study of this type since it encompasses a strong elevation/precipitation gradient that supports lower-biomass sagebrush ecosystems at low elevations and forests with more biomass at higher elevations. Sagebrush ecosystems composed of Wyoming, Low and Mountain Sagebrush have SOC values ranging from 0.4 to 1% (top 30 cm), while higher-biomass ecosystems composed of aspen, juniper and fir have SOC values approaching 4% (top 30 cm). Large differences in SOC have been observed between canopy and interspace locations, and high-resolution vegetation information is likely to explain plot-scale variability in SOC. Mapping of the SOC reservoir will help identify underlying controls on SOC distribution and provide insight into which processes are most important in determining SOC in semi-arid mountainous regions. In addition, airborne LiDAR has the potential to characterize vegetation communities at a high resolution and could be a tool for improving estimates of SOC at larger scales.

  20. Prediction of protein-protein interaction sites in sequences and 3D structures by random forests.

    Directory of Open Access Journals (Sweden)

    Mile Sikić

    2009-01-01

    Identifying interaction sites in proteins provides important clues to the function of a protein and is becoming increasingly relevant in topics such as systems biology and drug discovery. Although there are numerous papers on the prediction of interaction sites using information derived from structure, there are only a few case reports on the prediction of interaction residues based solely on protein sequence. Here, a sliding window approach is combined with the Random Forests method to predict protein interaction sites using (i) a combination of sequence- and structure-derived parameters and (ii) sequence information alone. For sequence-based prediction we achieved a precision of 84% with a 26% recall and an F-measure of 40%. When combined with structural information, the prediction performance increases to a precision of 76% and a recall of 38%, with an F-measure of 51%. We also present an attempt to rationalize the sliding window size and demonstrate that a nine-residue window is the most suitable for predictor construction. Finally, we demonstrate the applicability of our prediction methods by modeling the Ras-Raf complex using predicted interaction sites as target binding interfaces. Our results suggest that it is possible to predict protein interaction sites with quite high accuracy using only sequence information.
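    Two parts of this abstract can be made concrete. The sketch below is an assumption about the general technique, not the authors' code: it builds one fixed-length window per residue, centred on that residue and padded at the sequence ends, and it computes the F-measure as the harmonic mean of precision and recall, which reproduces the reported 40% and 51%. The padding character `X`, the function names and the toy sequence are hypothetical, and encoding the windows into numeric features for the Random Forest is not shown.

    ```python
    from typing import List


    def residue_windows(sequence: str, window: int = 9, pad: str = "X") -> List[str]:
        """Return one window per residue, centred on that residue.

        Positions beyond either end of the sequence are filled with `pad`,
        so the output has len(sequence) windows, each of length `window`.
        """
        if window % 2 == 0:
            raise ValueError("window size must be odd so it has a centre residue")
        half = window // 2
        padded = pad * half + sequence + pad * half
        return [padded[i:i + window] for i in range(len(sequence))]


    def f_measure(precision: float, recall: float) -> float:
        """Harmonic mean of precision and recall (the F1 score)."""
        if precision + recall == 0:
            return 0.0
        return 2 * precision * recall / (precision + recall)


    windows = residue_windows("MTEYKLVVVG")  # hypothetical toy sequence
    print(windows[0])                        # XXXXMTEYK (centred on 'M')
    print(round(f_measure(0.84, 0.26), 2))   # 0.4  (sequence only)
    print(round(f_measure(0.76, 0.38), 2))   # 0.51 (sequence + structure)
    ```

    The arithmetic also explains why the structure-augmented predictor scores higher overall despite lower precision: the harmonic mean is dominated by the smaller of the two values, so raising recall from 26% to 38% outweighs the drop in precision.
    
    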

  1. Using Random Forests to Select Optimal Input Variables for Short-Term Wind Speed Forecasting Models

    Directory of Open Access Journals (Sweden)

    Hui Wang

    2017-10-01

    Achieving relatively high-accuracy short-term wind speed forecasts is a precondition for the construction and grid-connected operation of wind power forecasting systems for wind farms. Currently, most research is focused on the structure of forecasting models and does not consider the selection of input variables, which can have significant impacts on forecasting performance. This paper presents an input variable selection method for wind speed forecasting models. The candidate input variables for various leading periods are selected, and random forests (RF) are employed to evaluate the importance of all variables as features. The feature subset with the best evaluation performance is selected as the optimal feature set. Then, a kernel-based extreme learning machine is constructed to evaluate the performance of the input variable selection based on RF. The results of the case study show that by removing uncorrelated and redundant features, RF effectively extracts the most strongly correlated set of features from the candidate input variables. By finding the optimal feature combination to represent the original information, RF simplifies the structure of the wind speed forecasting model, shortens the training time required, and substantially improves the model's accuracy and generalization ability, demonstrating that the input variables selected by RF are effective.
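    One simple reading of "the feature subset with the best evaluation performance is selected" is to rank candidate inputs by random forest importance and then score nested top-k subsets. The sketch below assumes the importance scores are already computed and takes the evaluation step as a callback; in the paper that role is played by a kernel-based extreme learning machine, whereas here it is a hypothetical toy error function. Feature names and numbers are illustrative only.

    ```python
    from typing import Callable, Dict, List, Sequence, Tuple


    def select_by_importance(
        importances: Dict[str, float],
        evaluate: Callable[[Sequence[str]], float],
    ) -> Tuple[List[str], float]:
        """Pick the best top-k feature subset, ranking features by importance.

        `importances` maps feature name -> importance score (e.g. from a random
        forest); `evaluate` returns an error for a feature subset (lower is
        better). Only nested subsets (top-1, top-2, ...) are tried.
        """
        ranked = sorted(importances, key=importances.get, reverse=True)
        best_subset: List[str] = []
        best_error = float("inf")
        for k in range(1, len(ranked) + 1):
            subset = ranked[:k]
            error = evaluate(subset)
            if error < best_error:
                best_subset, best_error = subset, error
        return best_subset, best_error


    # Hypothetical importances for lagged wind-speed inputs plus a noise input.
    importances = {"lag1": 0.5, "lag2": 0.3, "lag3": 0.05, "noise": 0.15}
    useful = {"lag1": 0.6, "lag2": 0.3}


    def toy_error(subset: Sequence[str]) -> float:
        # Hypothetical: error falls with useful inputs and rises with subset size.
        return 1.0 - sum(useful.get(f, 0.0) for f in subset) + 0.02 * len(subset)


    subset, err = select_by_importance(importances, toy_error)
    print(subset, round(err, 2))
    ```

    Trying only nested subsets keeps the search linear in the number of candidate inputs; an exhaustive search over all combinations grows exponentially, at the cost of possibly missing a better non-nested subset.
    
    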

  2. Random Forests Are Able to Identify Differences in Clotting Dynamics from Kinetic Models of Thrombin Generation.

    Science.gov (United States)

    Arumugam, Jayavel; Bukkapatnam, Satish T S; Narayanan, Krishna R; Srinivasa, Arun R

    2016-01-01

    Current methods for distinguishing acute coronary syndromes such as heart attack from stable coronary artery disease, based on the kinetics of thrombin formation, have been limited to evaluating sensitivity of well-established chemical species (e.g., thrombin) using simple quantifiers of their concentration profiles (e.g., maximum level of thrombin concentration, area under the thrombin concentration versus time curve). In order to get an improved classifier, we use a 34-protein factor clotting cascade model and convert the simulation data into a high-dimensional representation (about 19000 features) using a piecewise cubic polynomial fit. Then, we systematically find plausible assays to effectively gauge changes in acute coronary syndrome/coronary artery disease populations by introducing a statistical learning technique called Random Forests. We find that differences associated with acute coronary syndromes emerge in combinations of a handful of features. For instance, concentrations of 3 chemical species, namely, active alpha-thrombin, tissue factor-factor VIIa-factor Xa ternary complex, and intrinsic tenase complex with factor X, at specific time windows, could be used to classify acute coronary syndromes to an accuracy of about 87.2%. Such a combination could be used to efficiently assay the coagulation system.

  3. The Effects of Point or Polygon Based Training Data on RandomForest Classification Accuracy of Wetlands

    Directory of Open Access Journals (Sweden)

    Jennifer Corcoran

    2015-04-01

    Wetlands are dynamic in space and time, providing varying ecosystem services. Field reference data for both training and assessment of wetland inventories in the State of Minnesota are typically collected as GPS points over wide geographical areas and at infrequent intervals. This status quo makes it difficult to keep wetland maps updated with adequate accuracy, efficiency, and consistency to monitor change. Furthermore, point reference data may not be representative of the prevailing land cover type for an area, due to point location or heterogeneity within the ecosystem of interest. In this research, we present techniques for training a land cover classification for two study sites in different ecoregions by implementing the RandomForest classifier in three ways: (1) field and photo-interpreted points; (2) a fixed window surrounding the points; and (3) image objects that intersect the points. Additional assessments are made to identify the key input variables. We conclude that the image object area training method is the most accurate, and that the most important variables include the compound topographic index, the summer-season green and blue bands, and grid statistics from LiDAR point cloud data, especially those that relate to the height of the return.

  4. Random Forests Are Able to Identify Differences in Clotting Dynamics from Kinetic Models of Thrombin Generation.

    Directory of Open Access Journals (Sweden)

    Jayavel Arumugam

    Current methods for distinguishing acute coronary syndromes such as heart attack from stable coronary artery disease, based on the kinetics of thrombin formation, have been limited to evaluating sensitivity of well-established chemical species (e.g., thrombin) using simple quantifiers of their concentration profiles (e.g., maximum level of thrombin concentration, area under the thrombin concentration versus time curve). In order to get an improved classifier, we use a 34-protein factor clotting cascade model and convert the simulation data into a high-dimensional representation (about 19000 features) using a piecewise cubic polynomial fit. Then, we systematically find plausible assays to effectively gauge changes in acute coronary syndrome/coronary artery disease populations by introducing a statistical learning technique called Random Forests. We find that differences associated with acute coronary syndromes emerge in combinations of a handful of features. For instance, concentrations of 3 chemical species, namely, active alpha-thrombin, tissue factor-factor VIIa-factor Xa ternary complex, and intrinsic tenase complex with factor X, at specific time windows, could be used to classify acute coronary syndromes to an accuracy of about 87.2%. Such a combination could be used to efficiently assay the coagulation system.

  5. Multivariate Matrix-Exponential Distributions

    DEFF Research Database (Denmark)

    Bladt, Mogens; Nielsen, Bo Friis

    2010-01-01

    be written as linear combinations of the elements in the exponential of a matrix. For this reason we shall refer to multivariate distributions with rational Laplace transform as multivariate matrix-exponential distributions (MVME). The marginal distributions of an MVME are univariate matrix-exponential distributions. We prove a characterization that states that a distribution is an MVME distribution if and only if all non-negative, non-null linear combinations of the coordinates have a univariate matrix-exponential distribution. This theorem is analogous to a well-known characterization theorem...

  6. Experimental comparison of support vector machines with random ...

    Indian Academy of Sciences (India)

    dient method, support vector machines, and random forests to improve producer accuracy and overall classification accuracy. The performance comparison of these classifiers is valuable for a decision maker ... ping, surveillance system, resource management, tracking ... rocks, water bodies, and anthropogenic elements,.

  7. Multivariate analysis methods in physics

    International Nuclear Information System (INIS)

    Wolter, M.

    2007-01-01

    A review of multivariate methods based on statistical training is given. Several multivariate methods useful in high-energy physics analysis are discussed, with selected examples from current research in particle physics, both from online trigger selection and from offline analysis. Statistical training methods are also presented and some new applications are suggested [ru

  8. A hybrid training approach for leaf area index estimation via Cubist and random forests machine-learning

    KAUST Repository

    McCabe, Matthew

    2017-12-06

    With an increasing volume and dimensionality of Earth observation data, enhanced integration of machine-learning methodologies is needed to effectively analyze and utilize these information-rich datasets. In machine learning, a training dataset is required to establish explicit associations between a suite of explanatory ‘predictor’ variables and the target property. The specifics of this learning process can significantly influence model validity and portability, with a higher generalization level expected as more observable conditions are reflected in the training dataset. Here we propose a hybrid training approach for leaf area index (LAI) estimation, which harnesses synergistic attributes of scattered in-situ measurements and systematically distributed physically based model inversion results to enhance the information content and spatial representativeness of the training data. To do this, a complementary training dataset of independent LAI was derived from a regularized model inversion of RapidEye surface reflectances and subsequently used to guide the development of LAI regression models via Cubist and random forests (RF) decision tree methods. The application of the hybrid training approach to a broad set of Landsat 8 vegetation index (VI) predictor variables resulted in significantly improved LAI prediction accuracies and spatial consistencies, relative to results relying on in-situ measurements alone for model training. In comparing the prediction capacity and portability of the two machine-learning algorithms, a pair of relatively simple multivariate regression models established by Cubist performed best, with an overall relative mean absolute deviation (rMAD) of ∼11%, determined based on a stringent scene-specific cross-validation approach. In comparison, the portability of RF regression models was less effective (i.e., an overall rMAD of ∼15%), which was attributed partly to model saturation at high LAI in association
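    The rMAD scores quoted above could be computed along the lines of the sketch below. The abstract does not define rMAD, so this assumes a common form: the mean absolute deviation between predicted and in-situ LAI divided by the mean in-situ LAI, expressed as a percentage. The scene-held-out splitter is a simple stand-in for the scene-specific cross-validation mentioned above, not the authors' exact protocol; both function names are hypothetical.

```python
def relative_mad(observed, predicted):
    """Relative mean absolute deviation in percent (assumed definition):
    rMAD = mean(|predicted - observed|) / mean(observed) * 100.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    mad = sum(abs(p - o) for o, p in zip(observed, predicted)) / len(observed)
    return 100.0 * mad / (sum(observed) / len(observed))


def leave_one_scene_out(scene_ids):
    """Yield (scene, train_indices, test_indices), holding out one image scene at a time.

    Samples from the held-out scene are never used to train the model that
    predicts them, which tests portability across scenes.
    """
    for scene in sorted(set(scene_ids)):
        test = [i for i, s in enumerate(scene_ids) if s == scene]
        train = [i for i, s in enumerate(scene_ids) if s != scene]
        yield scene, train, test
```

    A model would be refit on each training split and its predictions for the held-out scene pooled before computing rMAD.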

  9. A hybrid training approach for leaf area index estimation via Cubist and random forests machine-learning

    Science.gov (United States)

    Houborg, Rasmus; McCabe, Matthew F.

    2018-01-01

  10. A hybrid training approach for leaf area index estimation via Cubist and random forests machine-learning

    KAUST Repository

    McCabe, Matthew

    2017-01-01

  11. Bird distributional patterns support biogeographical histories and are associated with bioclimatic units in the Atlantic Forest, Brazil.

    Science.gov (United States)

    Carvalho, Cristiano DE Santana; Nascimento, Nayla Fábia Ferreira DO; Araujo, Helder F P DE

    2017-10-17

    Rivers as barriers to dispersal and past forest refugia are two of the hypotheses proposed to explain the patterns of biodiversity in the Atlantic Forest. It has recently been shown that possible past refugia correspond to bioclimatically different regions, so we tested whether patterns of shared distribution of bird taxa in the Atlantic Forest are 1) limited by the Doce and São Francisco rivers or 2) associated with the bioclimatically different southern and northeastern regions. We catalogued lists of forest birds from 45 locations, 36 in the Atlantic Forest and nine in the Amazon, and used parsimony analysis of endemicity to identify groups of shared taxa. We also compared differences between these groups by permutational multivariate analysis of variance and identified the species that best supported the resulting groups. The results showed that the distribution of forest birds in the Atlantic Forest is divided into two main regions, the first comprising the more southern localities and the second the northeastern localities. This distributional pattern is not delimited by rivers, but it may be associated with bioclimatic units, with altitude as a surrogate, that maintain current environmental differences between the two main regions of the Atlantic Forest and may be related to the phylogenetic histories of the taxa supporting the two groups.
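    Permutational multivariate analysis of variance (PERMANOVA), used above to compare the groups of localities, can be sketched as a one-way test on a distance matrix: a pseudo-F statistic compares between-group and within-group sums of squared distances, and its p-value comes from shuffling group labels. This minimal version assumes presence/absence species lists compared with the Jaccard distance. The distance choice, permutation count, seed, and function names are illustrative assumptions, not details reported by the authors.

```python
import random


def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| for two species sets (0 if both are empty)."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def pseudo_f(dist, labels):
    """One-way PERMANOVA pseudo-F from a square distance matrix and group labels."""
    n = len(labels)
    groups = sorted(set(labels))
    # Total sum of squares: squared distances over all pairs, divided by N.
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    # Within-group sum of squares: per group, squared distances divided by group size.
    ss_within = 0.0
    for g in groups:
        members = [i for i in range(n) if labels[i] == g]
        ss_within += sum(dist[i][j] ** 2 for i in members for j in members if i < j) / len(members)
    ss_between = ss_total - ss_within
    a = len(groups)
    return (ss_between / (a - 1)) / (ss_within / (n - a))


def permanova(sites, labels, n_perm=999, seed=0):
    """Return (pseudo-F, permutation p-value) using Jaccard distances between sites."""
    n = len(sites)
    dist = [[jaccard_distance(sites[i], sites[j]) for j in range(n)] for i in range(n)]
    f_obs = pseudo_f(dist, labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between sites and groups
        if pseudo_f(dist, shuffled) >= f_obs:
            hits += 1
    return f_obs, (hits + 1) / (n_perm + 1)
```

    The "+1" in the p-value counts the observed labeling as one of the permutations, so the p-value can never be exactly zero.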

  12. Influence of multi-source and multi-temporal remotely sensed and ancillary data on the accuracy of random forest classification of wetlands in northern Minnesota

    Science.gov (United States)

    Corcoran, Jennifer M.; Knight, Joseph F.; Gallant, Alisa L.

    2013-01-01

    Wetland mapping at the landscape scale using remotely sensed data requires both affordable data and an efficient, accurate classification method. Random forest classification offers several advantages over traditional land cover classification techniques, including a bootstrapping technique to generate robust estimations of outliers in the training data, as well as the capability of measuring classification confidence. Though the random forest classifier can generate complex decision trees with a multitude of input data and still not run a high risk of overfitting, there is a great need to reduce computational and operational costs by including only key input data sets without sacrificing a significant level of accuracy. Our main questions for this study site in northern Minnesota were: (1) how do the classification accuracy and confidence of mapping wetlands compare using different remote sensing platforms and sets of input data; (2) what are the key input variables for accurate differentiation of upland, water, and wetlands, including wetland type; and (3) which datasets and seasonal imagery yield the best accuracy for wetland classification. Our results show that the key input variables include terrain (elevation and curvature) and soil descriptors (hydric), along with an assortment of remotely sensed data collected in the spring (satellite visible, near infrared, and thermal bands; satellite normalized vegetation index and Tasseled Cap greenness and wetness; and horizontal-horizontal (HH) and horizontal-vertical (HV) polarization using L-band satellite radar). We undertook this exploratory analysis to inform decisions by natural resource managers charged with monitoring wetland ecosystems and to aid in designing a system for consistent operational mapping of wetlands across landscapes similar to those found in northern Minnesota.
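    The bootstrapping mentioned above refers to how a random forest grows each tree on a resample of the training data drawn with replacement; the samples left out of a given resample (the "out-of-bag" samples) can be used to check that tree's predictions without a separate test set. The sketch below shows only that sampling step, assuming integer sample indices. `bootstrap_split` is a hypothetical name, and no tree is fitted.

```python
import random


def bootstrap_split(n_samples, rng):
    """Draw a bootstrap sample of indices (with replacement) and return (in_bag, out_of_bag)."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    used = set(in_bag)
    out_of_bag = [i for i in range(n_samples) if i not in used]
    return in_bag, out_of_bag


rng = random.Random(42)
in_bag, oob = bootstrap_split(10_000, rng)
# For large n, about (1 - 1/n)^n ≈ e^-1 ≈ 36.8% of samples are left out of bag.
print(f"out-of-bag fraction: {len(oob) / 10_000:.3f}")
```

    Repeating this for every tree means each training sample is out of bag for roughly a third of the forest, and the votes of those trees give an internal accuracy estimate.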

  13. WDL-RF: Predicting Bioactivities of Ligand Molecules Acting with G Protein-coupled Receptors by Combining Weighted Deep Learning and Random Forest.

    Science.gov (United States)

    Wu, Jiansheng; Zhang, Qiuming; Wu, Weijian; Pang, Tao; Hu, Haifeng; Chan, Wallace K B; Ke, Xiaoyan; Zhang, Yang; Wren, Jonathan

    2018-02-08

    Precise assessment of ligand bioactivities (including IC50, EC50, Ki, Kd, etc.) is essential for virtual screening and lead compound identification. However, not all ligands have experimentally determined activities. In particular, many G protein-coupled receptors (GPCRs), which are the largest integral membrane protein family and the targets of nearly 40% of drugs on the market, lack published experimental data about ligand interactions. Computational methods that can accurately predict the bioactivity of ligands can help address this problem efficiently. We propose a new method, WDL-RF, which uses weighted deep learning and random forest to model the bioactivity of GPCR-associated ligand molecules. The pipeline of our algorithm consists of two consecutive stages: 1) molecular fingerprint generation through a new weighted deep learning method, and 2) bioactivity calculation with a random forest model. A distinctive feature of the approach is that the model allows end-to-end learning of prediction pipelines with input ligands of arbitrary size. The method was tested on a set of twenty-six non-redundant GPCRs that have a high number of active ligands, each with 200 to 4000 ligand associations. The results from our benchmark show that WDL-RF can generate bioactivity predictions with an average root-mean-square error of 1.33 and a correlation coefficient (r²) of 0.80 compared to the experimental measurements, which are significantly more accurate than the control predictors with different molecular fingerprints and descriptors. In particular, data-driven molecular fingerprint features, as extracted from the weighted deep learning models, can help solve deficiencies stemming from the use of traditional hand-crafted features and significantly increase the efficiency of short molecular fingerprints in virtual screening. The WDL-RF web server, as well as source codes and datasets of WDL-RF, is freely available at https://zhanglab.ccmb.med.umich.edu/WDL-RF/ for

  14. Methods for statistical data analysis of multivariate observations

    CERN Document Server

    Gnanadesikan, R

    1997-01-01

    A practical guide to multivariate statistical techniques, now updated and revised. In recent years, innovations in computer technology and statistical methodologies have dramatically altered the landscape of multivariate data analysis. This new edition of Methods for Statistical Data Analysis of Multivariate Observations explores current multivariate concepts and techniques while retaining the same practical focus as its predecessor. It integrates methods and data-based interpretations relevant to multivariate analysis in a way that addresses real-world problems arising in many areas of inte

  15. Multivariate survival analysis and competing risks

    CERN Document Server

    Crowder, Martin J

    2012-01-01

    Multivariate Survival Analysis and Competing Risks introduces univariate survival analysis and extends it to the multivariate case. It covers competing risks and counting processes and provides many real-world examples, exercises, and R code. The text discusses survival data, survival distributions, frailty models, parametric methods, multivariate data and distributions, copulas, continuous failure, parametric likelihood inference, and non- and semi-parametric methods. There are many books covering survival analysis, but very few that cover the multivariate case in any depth. Written for a graduate-level audience in statistics/biostatistics, this book includes practical exercises and R code for the examples. The author is renowned for his clear writing style, and this book continues that trend. It is an excellent reference for graduate students and researchers looking for grounding in this burgeoning field of research.

  16. The value of multivariate model sophistication

    DEFF Research Database (Denmark)

    Rombouts, Jeroen; Stentoft, Lars; Violante, Francesco

    2014-01-01

    We assess the predictive accuracies of a large number of multivariate volatility models in terms of pricing options on the Dow Jones Industrial Average. We measure the value of model sophistication in terms of dollar losses by considering a set of 444 multivariate models that differ in their spec... In addition to investigating the value of model sophistication in terms of dollar losses directly, we also use the model confidence set approach to statistically infer the set of models that delivers the best pricing performances.

  17. Interaction between forest biodiversity and people's use of forest resources in Roviana, Solomon Islands: implications for biocultural conservation under socioeconomic changes.

    Science.gov (United States)

    Furusawa, Takuro; Sirikolo, Myknee Qusa; Sasaoka, Masatoshi; Ohtsuka, Ryutaro

    2014-01-27

    In the Solomon Islands, forests have provided people with ecological services while being affected by human use and protection. This study used a quantitative ethnobotanical analysis to explore the society-forest interaction and its transformation in Roviana, Solomon Islands. We compared local plant and land uses between a rural village and an urbanized village. Special attention was paid to how local people depend on biodiversity and how traditional human modifications of forest contribute to biodiversity conservation. After defining locally recognized land-use classes, vegetation surveys were conducted in seven forest classes. For detailed observations of daily plant uses, 15 and 17 households were randomly selected in the rural and urban villages, respectively. We quantitatively documented the plant species that were used as food, medicine, building materials, and tools. The vegetation survey revealed that each local forest class represented a different vegetative community, with relatively low similarity between communities. Although commercial logging operations and agriculture were both prohibited in the customary nature reserve, local people were allowed to cut down trees for their personal use and to take several types of non-timber forest products. Useful trees were found at high frequencies in the barrier island's primary forest (68.4%) and the main island's reserve (68.3%). Various useful tree species were found only in the reserve forest and were seldom available in the urban village. In the rural village, customary governance and control over the use of forest resources by the local people still functioned. Human modifications of the forest created unique vegetation communities, thus increasing biodiversity overall. Each type of forest had different species that varied in their levels of importance to the local subsistence lifestyle, and the villagers' behaviors, such as respect for forest reserves and the semidomestication of some species, contributed to

  18. Assessment of Different Remote Sensing Data for Forest Structural Attributes Estimation in the Hyrcanian forests

    Energy Technology Data Exchange (ETDEWEB)

    Nourian, N.; Shataee-Joibary, S.; Mohammadi, J.

    2016-07-01

    Aim of the study: The objective of the study was a comparative assessment of optical satellite imagery at various spatial resolutions, including Landsat-TM, ASTER, and Quickbird data, for estimating the forest structure attributes of the Hyrcanian forests, Golestan province, northern Iran. Material and methods: 112 square plots with an area of 0.09 ha were measured using a random cluster sampling method, and stand volume, basal area, and tree stem density were then computed from the measured data. After geometric and atmospheric corrections of the images, spectral attributes from the original and different synthetic bands were extracted for modelling. The statistical modelling was performed using the CART algorithm. Model performance was assessed on the unused validation plots using RMSE and bias measures. Main Results: The results showed that the Quickbird models for stand volume, basal area, and tree stem density performed better than those based on ASTER and TM data. However, estimates from ASTER and TM imagery were broadly similar for all three parameters. Research highlights: This study showed that high-resolution satellite data are more useful than medium-resolution images for estimating forest structure attributes in the Hyrcanian broadleaved forests, without consideration of image costs. However, because most medium-resolution data, such as ASTER, TM, ETM+, or OLI images, are free, these data can be used with broadly similar results. (Author)
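    The two validation measures named above, RMSE and bias, can be computed on the held-out plots as in the sketch below. It assumes bias is defined as the mean of (predicted minus observed), so a positive value indicates overestimation; some studies use the opposite sign or report relative (%) versions. The plot values in the usage line are hypothetical, not data from the study.

```python
import math


def rmse_and_bias(observed, predicted):
    """Return (RMSE, bias) with bias = mean(predicted - observed)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [p - o for o, p in zip(observed, predicted)]
    n = len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / n)  # penalizes large misses
    bias = sum(errors) / n                            # systematic over/under-estimation
    return rmse, bias


# Hypothetical stand-volume values (m^3/ha) for three validation plots.
print(rmse_and_bias([200.0, 250.0, 300.0], [210.0, 240.0, 320.0]))
```

    Reporting both is useful because errors of opposite sign cancel in the bias but not in the RMSE.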

  19. The role of forest in mitigating the impact of atmospheric dust pollution in a mixed landscape.

    Science.gov (United States)

    Santos, Artur; Pinho, Pedro; Munzi, Silvana; Botelho, Maria João; Palma-Oliveira, José Manuel; Branquinho, Cristina

    2017-05-01

    Atmospheric dust pollution, especially particulate matter below 2.5 μm, causes 3.3 million premature deaths per year worldwide. Although pollution sources are increasingly well known, the role of ecosystems in mitigating their impact is still poorly understood. Our objective was to investigate the role of forests located in the surroundings of industrial and urban areas in reducing atmospheric dust pollution. This was tested using lichen transplants as biomonitors in a Mediterranean region with high levels of dry deposition. After a multivariate analysis, we modeled the maximum pollution load expected for each site, taking into consideration nearby pollutant sources. The difference between the maximum expected pollution load and the observed values was explained by deposition in nearby forests. Both the dust pollution and the ameliorating effect of forested areas were then mapped. The results showed that forests located near pollution sources play an important role in reducing atmospheric dust pollution, highlighting their importance in providing the ecosystem service of air purification.

  20. Multivariate statistical methods a first course

    CERN Document Server

    Marcoulides, George A

    2014-01-01

    Multivariate statistics refer to an assortment of statistical methods that have been developed to handle situations in which multiple variables or measures are involved. Any analysis of more than two variables or measures can loosely be considered a multivariate statistical analysis. An introductory text for students learning multivariate statistical methods for the first time, this book keeps mathematical details to a minimum while conveying the basic principles. One of the principal strategies used throughout the book--in addition to the presentation of actual data analyses--is poin

  1. Health Effect of Forest Bathing Trip on Elderly Patients with Chronic Obstructive Pulmonary Disease.

    Science.gov (United States)

    Jia, Bing Bing; Yang, Zhou Xin; Mao, Gen Xiang; Lyu, Yuan Dong; Wen, Xiao Lin; Xu, Wei Hong; Lyu, Xiao Ling; Cao, Yong Bao; Wang, Guo Fu

    2016-03-01

    A forest bathing trip is a short, leisurely visit to a forest. In this study we determined the health effects of forest bathing trips on elderly patients with chronic obstructive pulmonary disease (COPD). The patients were randomly divided into two groups. One group was sent to a forest, and the other was sent to an urban area as a control. Flow cytometry, ELISA, and profile of mood states (POMS) evaluation were performed. In the forest group, we found a significant decrease in perforin and granzyme B expression, accompanied by decreased levels of pro-inflammatory cytokines and stress hormones. Meanwhile, the scores on the negative subscales of POMS decreased after the forest bathing trip. These results indicate that forest bathing trips have health effects on elderly COPD patients by reducing inflammation and stress levels. Copyright © 2016 The Editorial Board of Biomedical and Environmental Sciences. Published by China CDC. All rights reserved.

  2. Seminal Quality Prediction Using Clustering-Based Decision Forests

    Directory of Open Access Journals (Sweden)

    Hong Wang

    2014-08-01

    Prediction of seminal quality with statistical learning tools is an emerging methodology in decision support systems in biomedical engineering and is very useful in the early diagnosis of seminal patients and the selection of semen donor candidates. However, as is common in medical diagnosis, seminal quality prediction faces the class imbalance problem. In this paper, we propose a novel supervised ensemble learning approach, namely Clustering-Based Decision Forests, to tackle the imbalanced class learning problem in seminal quality prediction. Experimental results on a real fertility diagnosis dataset show that Clustering-Based Decision Forests outperform decision trees, Support Vector Machines, random forests, multilayer perceptron neural networks, and logistic regression by a noticeable margin. Clustering-Based Decision Forests can also be used to evaluate variable importance; the top five factors that may affect semen concentration identified in this study are age, serious trauma, sitting time, the season when the semen sample is produced, and high fevers in the last year. The findings could be helpful in explaining seminal concentration problems in infertile males or in pre-screening semen donor candidates.

  3. Contribution of Forest Restoration to Rural Livelihoods and Household Income in Indonesia

    Directory of Open Access Journals (Sweden)

    Nayu Nuringdati Widianingsih

    2016-08-01

    Forest resources remain vital to the survival of many rural communities, though the level of forest reliance varies across sites and socio-economic settings. This article investigates variation in forest utilization across households in three ethnic groups living near a forest restoration area in Sumatra, Indonesia. Survey data were collected on 268 households, with a four-month recall period and three repeat visits to each selected household within a year. Random sampling was applied to select households in five villages and five Batin Sembilan (indigenous semi-nomadic) groups. Sampled households belonged to three ethnic groups: 15% were Batin Sembilan, 40% Local Malayan, and 45% Immigrant households. Indigenous households displayed the highest reliance on forests: 36% of their annual total income came from this source, compared with 10% and 8% for Local and Immigrant households, respectively. Our findings show that the livelihoods of indigenous groups are still intricately linked with forest resources, despite a rapid landscape-wide transition from natural forest to oil palm and timber plantations.

  4. Multivariable control in nuclear power stations

    International Nuclear Information System (INIS)

    Parent, M.; McMorran, P.D.

    1982-11-01

    Multivariable methods have the potential to improve the control of large systems such as nuclear power stations. Linear-quadratic optimal control is a multivariable method based on the minimization of a cost function. A related technique leads to the Kalman filter for estimation of plant state from noisy measurements. A design program for optimal control and Kalman filtering has been developed as part of a computer-aided design package for multivariable control systems. The method is demonstrated on a model of a nuclear steam generator, and simulated results are presented.
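    The Kalman filter mentioned above estimates plant state from noisy measurements by blending a model prediction with each new measurement, weighting each by its uncertainty. The sketch below is the textbook scalar case, assuming a random-walk state model x_k = x_{k-1} + w_k and a direct measurement z_k = x_k + v_k with known noise variances q and r. It is not the multivariable design program described in the report, and the readings in the usage line are illustrative.

```python
def kalman_1d(measurements, q, r, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a random-walk state observed directly with noise.

    q: process-noise variance, r: measurement-noise variance,
    x0, p0: initial state estimate and its variance.
    Returns the list of filtered state estimates.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: the random-walk model keeps the mean and inflates the variance.
        p = p + q
        # Update: the Kalman gain weighs the measurement against the prediction.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        estimates.append(x)
    return estimates


# Noisy readings of a roughly constant level near 5.0 (illustrative values).
readings = [5.3, 4.8, 5.1, 4.9, 5.2, 5.0, 4.7, 5.1]
print(kalman_1d(readings, q=1e-4, r=0.1))
```

    In the multivariable case the scalars become a state vector and covariance matrices, and the same predict/update structure applies.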

  5. Forest resources of the Lincoln National Forest

    Science.gov (United States)

    John D. Shaw

    2006-01-01

    The Interior West Forest Inventory and Analysis (IWFIA) program of the USDA Forest Service, Rocky Mountain Research Station, as part of its national Forest Inventory and Analysis (FIA) duties, conducted forest resource inventories of the Southwestern Region (Region 3) National Forests. This report presents highlights of the Lincoln National Forest 1997 inventory...

  6. Forest Edge Regrowth Typologies in Southern Sweden-Relationship to Environmental Characteristics and Implications for Management.

    Science.gov (United States)

    Wiström, Björn; Busse Nielsen, Anders

    2017-07-01

    After two major storms, the Swedish Transport Administration was granted permission in 2008 to expand the railroad corridor from 10 to 20 m from the rail banks, and to clear the forest edges in the expanded area. In order to evaluate the possibilities for managers to promote and control the species composition of the woody regrowth so that a forest edge with a graded profile develops over time, this study mapped the woody regrowth and environmental variables at 78 random sites along the 610-km railroad between Stockholm and Malmö four growing seasons after the clearing was implemented. Through different clustering approaches, dominant tree species to be controlled and future building block species for management were identified. Using multivariate regression trees, the most decisive environmental variables were identified and used to develop a regrowth typology and to calculate species indicator values. Five regrowth types and ten indicator species were identified along the environmental gradients of soil moisture, soil fertility, and altitude. Six tree species dominated the regrowth across the regrowth types, but clustering showed that if these were controlled by selective thinning, lower tree and shrub species were generally present so they could form the "building blocks" for development of a graded edge. We concluded that selective thinning targeted at controlling a few dominant tree species, here named Functional Species Control, is a simple and easily implemented management concept to promote a wide range of suitable species, because it does not require field staff with specialist taxonomic knowledge.

  7. Per-field crop classification in irrigated agricultural regions in middle Asia using random forest and support vector machine ensemble

    Science.gov (United States)

    Löw, Fabian; Schorcht, Gunther; Michel, Ulrich; Dech, Stefan; Conrad, Christopher

    2012-10-01

    Accurate crop identification and crop area estimation are important for studies on irrigated agricultural systems, yield and water demand modeling, and agrarian policy development. In this study a novel combination of Random Forest (RF) and Support Vector Machine (SVM) classifiers is presented that (i) enhances crop classification accuracy and (ii) provides spatial information on map uncertainty. The methodology was implemented over four distinct irrigated sites in Middle Asia using RapidEye time series data. The RF feature importance statistics were used as a feature-selection strategy for the SVM to assess possible negative effects on classification accuracy caused by an oversized feature space. The results of the individual RF and SVM classifications were combined with rules based on posterior classification probability and estimates of classification probability entropy. Feature selection through RF increased SVM classification performance. Further experimental results indicate that the hybrid classifier improves overall classification accuracy, as well as user's and producer's accuracy, in comparison to the single classifiers.
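    The combination step above relies on each classifier's posterior class probabilities and the entropy of those probabilities. The exact rules are not given in the abstract, so the sketch below assumes a simple version: for each field, the label is taken from whichever classifier (RF or SVM) has the lower probability entropy, i.e. is more certain, and that entropy is kept as a per-field uncertainty value for mapping. `combine_by_entropy` and the crop names are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (bits) of a class-probability vector; 0 for a certain prediction."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def combine_by_entropy(rf_probs, svm_probs, classes):
    """Pick, per field, the label of the less uncertain classifier (assumed rule).

    Returns a list of (label, entropy_of_chosen_classifier) pairs.
    """
    combined = []
    for p_rf, p_svm in zip(rf_probs, svm_probs):
        h_rf, h_svm = entropy(p_rf), entropy(p_svm)
        probs, h = (p_rf, h_rf) if h_rf <= h_svm else (p_svm, h_svm)
        best = max(range(len(classes)), key=lambda i: probs[i])
        combined.append((classes[best], h))
    return combined


classes = ["cotton", "wheat", "rice"]            # hypothetical crop classes
rf = [[0.90, 0.05, 0.05], [0.34, 0.33, 0.33]]    # hypothetical RF posteriors per field
svm = [[0.50, 0.40, 0.10], [0.10, 0.85, 0.05]]   # hypothetical SVM posteriors per field
print(combine_by_entropy(rf, svm, classes))
```

    Mapping the retained entropy per field gives the kind of spatial uncertainty layer the abstract describes.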

  8. Using Random Forest to Improve the Downscaling of Global Livestock Census Data

    Science.gov (United States)

    Nicolas, Gaëlle; Robinson, Timothy P.; Wint, G. R. William; Conchedda, Giulia; Cinardi, Giuseppina; Gilbert, Marius

    2016-01-01

    Large scale, high-resolution global data on farm animal distributions are essential for spatially explicit assessments of the epidemiological, environmental and socio-economic impacts of the livestock sector. This has been the major motivation behind the development of the Gridded Livestock of the World (GLW) database, which has been extensively used since its first publication in 2007. The database relies on a downscaling methodology whereby census counts of animals in sub-national administrative units are redistributed at the level of grid cells as a function of a series of spatial covariates. The recent upgrade of GLW1 to GLW2 involved automating the processing, improvement of input data, and downscaling at a spatial resolution of 1 km per cell (5 km per cell in the earlier version). The underlying statistical methodology, however, remained unchanged. In this paper, we evaluate new methods to downscale census data with a higher accuracy and increased processing efficiency. Two main factors were evaluated, based on sample census datasets of cattle in Africa and chickens in Asia. First, we implemented and evaluated Random Forest models (RF) instead of stratified regressions. Second, we investigated whether models that predicted the number of animals per rural person (per capita) could provide better downscaled estimates than the previous approach that predicted absolute densities (animals per km²). RF models consistently provided better predictions than the stratified regressions for both continents and species. The benefit of per capita over absolute density models varied according to the species and continent. In addition, different technical options were evaluated to reduce the processing time while maintaining their predictive power. Future GLW runs (GLW 3.0) will apply the new RF methodology with optimized modelling options. The potential benefit of per capita models will need to be further investigated with a better distinction between rural and agricultural
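The redistribution step described in this abstract can be illustrated with a minimal Python sketch. It assumes each grid cell already carries a non-negative model-predicted weight (for example a Random Forest density prediction or, for a per capita model, predicted animals per rural person multiplied by the cell's rural population). The function name `downscale`, its dictionary inputs, and the even-spread fallback are hypothetical; this shows the general proportional rescaling idea, not the GLW processing code.

```python
from collections import defaultdict


def downscale(census, cell_unit, cell_weight):
    """Hypothetical sketch: share each unit's census count among its grid cells.

    census:      dict unit_id -> census count for that administrative unit
    cell_unit:   dict cell_id -> unit_id containing the cell
    cell_weight: dict cell_id -> non-negative predicted weight for the cell
    """
    weight_sum = defaultdict(float)
    cell_count = defaultdict(int)
    for cell, unit in cell_unit.items():
        weight_sum[unit] += cell_weight[cell]
        cell_count[unit] += 1

    estimates = {}
    for cell, unit in cell_unit.items():
        if weight_sum[unit] > 0:
            # Proportional share: cells with a higher predicted weight get more animals.
            share = cell_weight[cell] / weight_sum[unit]
        else:
            # Assumed fallback when every weight in the unit is zero: spread evenly.
            share = 1.0 / cell_count[unit]
        estimates[cell] = census[unit] * share
    return estimates


counts = downscale({"A": 100, "B": 50}, {1: "A", 2: "A", 3: "B"}, {1: 3.0, 2: 1.0, 3: 0.0})
print(counts)  # {1: 75.0, 2: 25.0, 3: 50.0}
```

By construction the cell estimates within each unit sum back to that unit's census count, so the gridded layer stays consistent with the input census.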

  9. Using Random Forest to Improve the Downscaling of Global Livestock Census Data.

    Directory of Open Access Journals (Sweden)

    Gaëlle Nicolas

    Large scale, high-resolution global data on farm animal distributions are essential for spatially explicit assessments of the epidemiological, environmental and socio-economic impacts of the livestock sector. This has been the major motivation behind the development of the Gridded Livestock of the World (GLW) database, which has been extensively used since its first publication in 2007. The database relies on a downscaling methodology whereby census counts of animals in sub-national administrative units are redistributed at the level of grid cells as a function of a series of spatial covariates. The recent upgrade of GLW1 to GLW2 involved automating the processing, improvement of input data, and downscaling at a spatial resolution of 1 km per cell (5 km per cell in the earlier version). The underlying statistical methodology, however, remained unchanged. In this paper, we evaluate new methods to downscale census data with a higher accuracy and increased processing efficiency. Two main factors were evaluated, based on sample census datasets of cattle in Africa and chickens in Asia. First, we implemented and evaluated Random Forest models (RF) instead of stratified regressions. Second, we investigated whether models that predicted the number of animals per rural person (per capita) could provide better downscaled estimates than the previous approach that predicted absolute densities (animals per km²). RF models consistently provided better predictions than the stratified regressions for both continents and species. The benefit of per capita over absolute density models varied according to the species and continent. In addition, different technical options were evaluated to reduce the processing time while maintaining their predictive power. Future GLW runs (GLW 3.0) will apply the new RF methodology with optimized modelling options. The potential benefit of per capita models will need to be further investigated with a better distinction between rural

  10. Aboveground carbon loss in natural and managed tropical forests from 2000 to 2012

    International Nuclear Information System (INIS)

    Tyukavina, A; Hansen, M C; Potapov, P V; Krylov, A M; Turubanova, S; Baccini, A; Houghton, R A; Goetz, S J; Stehman, S V

    2015-01-01

    Tropical forests provide global climate regulation ecosystem services and their clearing is a significant source of anthropogenic greenhouse gas (GHG) emissions and resultant radiative forcing of climate change. However, consensus on pan-tropical forest carbon dynamics is lacking. We present a new estimate that employs recommended good practices to quantify gross tropical forest aboveground carbon (AGC) loss from 2000 to 2012 through the integration of Landsat-derived tree canopy cover, height, intactness and forest cover loss and GLAS-lidar derived forest biomass. An unbiased estimate of forest loss area is produced using a stratified random sample with strata derived from a wall-to-wall 30 m forest cover loss map. Our sample-based results separate the gross loss of forest AGC into losses from natural forests (0.59 PgC yr⁻¹) and losses from managed forests (0.43 PgC yr⁻¹) including plantations, agroforestry systems and subsistence agriculture. Latin America accounts for 43% of gross AGC loss and 54% of natural forest AGC loss, with Brazil experiencing the highest AGC loss for both categories at national scales. We estimate gross tropical forest AGC loss and natural forest loss to account for 11% and 6% of global year 2012 CO₂ emissions, respectively. Given recent trends, natural forests will likely constitute an increasingly smaller proportion of tropical forest GHG emissions and of global emissions as fossil fuel consumption increases, with implications for the valuation of co-benefits in tropical forest conservation. (letter)
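The abstract relies on a stratified random sample to produce an unbiased area estimate. A minimal Python sketch of the textbook stratified estimator of a total follows, assuming each stratum's mapped area and a list of per-sample loss indicators (1 = loss, 0 = no loss) are given. The function name and strata values are hypothetical, the finite population correction is ignored, and this is not the authors' code.

```python
import math


def stratified_total(strata):
    """Estimate a total area and its standard error from a stratified random sample.

    strata: list of (stratum_area, samples), where samples holds the observed
            loss indicator (or loss fraction) of each sample unit in that stratum.
    """
    total = 0.0
    variance = 0.0
    for area, samples in strata:
        n = len(samples)
        mean = sum(samples) / n
        total += area * mean  # stratum area times its estimated proportion of loss
        if n > 1:
            s2 = sum((y - mean) ** 2 for y in samples) / (n - 1)
            variance += area ** 2 * s2 / n  # no finite population correction
    return total, math.sqrt(variance)


# Hypothetical strata: a small "mapped loss" stratum and a large "no mapped loss" one.
estimate, se = stratified_total([(1000.0, [1, 1, 0, 0]), (9000.0, [0, 0, 0, 1])])
print(estimate)  # 2750.0
```

Weighting each stratum's sample mean by the stratum's mapped area is what turns a small point sample into an area total, including loss that the wall-to-wall map missed.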

  11. Public awareness of aesthetic and other forest values associated with sustainable forest management: a cross-cultural comparison among the public in four countries.

    Science.gov (United States)

    Lim, Sang Seop; Innes, John L; Meitner, Michael

    2015-03-01

    Korea, China, Japan and Canada are all members of the Montreal Process (MP). However, there has been little comparative research on public awareness of forest values within the framework of Sustainable Forest Management, not only between Asia and Canada but also among these three Asian countries. This is particularly true of aesthetic values, as the MP framework has no indicator for them. We conducted surveys to identify similarities and differences in the perceptions of various forest values, including aesthetic values, among residents of the four countries, using university student groups in Korea, China, Japan and Canada, together with a more detailed assessment of the attitudes of Koreans that included two additional groups: Korean office workers and Koreans living in Canada. A multivariate analysis of variance test across the four university student groups revealed significant differences in the rating of six forest functions out of 31. However, the same test across the three Korean groups indicated no significant differences, giving higher confidence in the generalizability of our university student comparisons. For forest aesthetic values, an analysis of variance test showed no significant differences across all groups. The forest aesthetic value was rated 6.95 to 7.98 (out of 10.0), depending on the group, and was rated relatively highly among ten social values across all the groups. Thurstone scale rankings and relative distances of six major forest values indicated that climate change control was ranked as the highest priority and scenic beauty was ranked the lowest by all the groups. Comparison tests of the frequencies of preferred major forest values revealed no significant differences across the groups, with the exception of the Japanese group. These results suggest that public awareness of aesthetic and other forest values is not clearly correlated with the cultural backgrounds of the individuals, and the Korean university students' awareness

  12. Forest resources of Mississippi’s national forests, 2006

    Science.gov (United States)

    Sonja N. Oswalt

    2011-01-01

    This bulletin describes forest resource characteristics of Mississippi’s national forests, with emphasis on DeSoto National Forest, following the 2006 survey completed by the U.S. Department of Agriculture Forest Service, Forest Inventory and Analysis program. Mississippi’s national forests comprise > 1 million acres of forest land, or about 7 percent of all forest...

  13. Random forest variable selection in spatial malaria transmission modelling in Mpumalanga Province, South Africa

    Directory of Open Access Journals (Sweden)

    Thandi Kapwata

    2016-11-01

    Malaria is an environmentally driven disease. In order to quantify the spatial variability of malaria transmission, it is imperative to understand the interactions between environmental variables and malaria epidemiology at a micro-geographic level using a novel statistical approach. The random forest (RF) statistical learning method, a relatively new variable-importance ranking method, measures the variable importance of potentially influential parameters through the percent increase of the mean squared error. As this value increases, so does the relative importance of the associated variable. The principal aim of this study was to create predictive malaria maps generated using the selected variables based on the RF algorithm in the Ehlanzeni District of Mpumalanga Province, South Africa. Of the seven environmental variables used [temperature, lag temperature, rainfall, lag rainfall, humidity, altitude, and the normalized difference vegetation index (NDVI)], altitude was identified as the most influential predictor variable due to its high selection frequency. It was selected as the top predictor for 4 out of 12 months of the year, followed by NDVI, temperature and lag rainfall, which were each selected twice. The combination of climatic variables that produced the highest prediction accuracy was altitude, NDVI, and temperature. This suggests that these three variables have high predictive capabilities in relation to malaria transmission. Furthermore, it is anticipated that the predictive maps generated from predictions made by the RF algorithm could be used to monitor the progression of malaria and assist in intervention and prevention efforts with respect to malaria.
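The importance measure described above rests on how much the mean squared error grows when a variable's values are scrambled. A minimal, model-agnostic Python sketch of that permutation idea follows, assuming `model` is any fitted callable that maps a feature row (a list) to a prediction; the function names and toy data are hypothetical. The R randomForest package computes its %IncMSE on out-of-bag samples tree by tree and, by default, normalizes by the standard deviation of the differences, so the numbers here are not directly comparable.

```python
import random


def mse(model, X, y):
    """Mean squared error of model predictions on rows X against targets y."""
    return sum((model(row) - target) ** 2 for row, target in zip(X, y)) / len(y)


def permutation_pct_increase_mse(model, X, y, n_repeats=10, seed=0):
    """Average percent increase in MSE after shuffling each feature column."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    if baseline == 0:
        raise ValueError("baseline MSE is zero; a percent increase is undefined")
    scores = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_perm = [row[:j] + [value] + row[j + 1:] for row, value in zip(X, column)]
            increases.append(100.0 * (mse(model, X_perm, y) - baseline) / baseline)
        scores.append(sum(increases) / n_repeats)
    return scores


# Toy data: the target depends on feature 0 only; the model ignores feature 1.
X = [[i, (7 * i) % 5] for i in range(20)]
y = [2 * i + 0.5 * (i % 3) for i in range(20)]
scores = permutation_pct_increase_mse(lambda row: 2 * row[0], X, y)
print(scores[0] > 0, scores[1])  # True 0.0
```

Shuffling the unused feature leaves every prediction unchanged, so its score is exactly zero, while shuffling the informative feature inflates the error.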

  14. Multivariate and semiparametric kernel regression

    OpenAIRE

    Härdle, Wolfgang; Müller, Marlene

    1997-01-01

    The paper gives an introduction to theory and application of multivariate and semiparametric kernel smoothing. Multivariate nonparametric density estimation is an often used pilot tool for examining the structure of data. Regression smoothing helps in investigating the association between covariates and responses. We concentrate on kernel smoothing using local polynomial fitting which includes the Nadaraya-Watson estimator. Some theory on the asymptotic behavior and bandwidth selection is pro...
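The Nadaraya-Watson estimator mentioned above is the local-constant (degree-zero) case of local polynomial fitting: a kernel-weighted average of the responses. A minimal Python sketch for multivariate covariates follows, assuming a Gaussian product kernel with one shared bandwidth `h` (a simplifying choice; the bandwidth selection the paper discusses is not shown). The function name is hypothetical.

```python
import math


def nadaraya_watson(x0, xs, ys, h):
    """Estimate E[Y | X = x0] as a kernel-weighted mean of ys.

    x0: query point (tuple of d coordinates)
    xs: list of covariate vectors (tuples of d coordinates)
    ys: list of responses
    h:  bandwidth shared by all coordinates (Gaussian product kernel)
    """
    weights = []
    for x in xs:
        sq_dist = sum(((a - b) / h) ** 2 for a, b in zip(x0, x))
        weights.append(math.exp(-0.5 * sq_dist))  # product of 1-D Gaussian kernels
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; try a larger bandwidth")
    return sum(w * y for w, y in zip(weights, ys)) / total


# Two symmetric observations around the query point: the estimate is their mean.
print(nadaraya_watson((0.0,), [(-1.0,), (1.0,)], [0.0, 2.0], h=0.5))  # 1.0
```

A small bandwidth makes the estimate follow the nearest observations closely; a large one pulls it toward the global mean of `ys`.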

  15. Aboveground Biomass Monitoring over Siberian Boreal Forest Using Radar Remote Sensing Data

    Science.gov (United States)

    Stelmaszczuk-Gorska, M. A.; Thiel, C. J.; Schmullius, C.

    2014-12-01

    Aboveground biomass (AGB) plays an essential role in ecosystem research and global cycles, and is of vital importance in climate studies. AGB accumulated in forests is of special monitoring interest, as forests contain most of the biomass compared with other land biomes. The largest of the land biomes is the boreal forest, which has a substantial carbon accumulation capability; its carbon stock is estimated to be 272 ± 23 Pg C (32%) [1]. Russia's forests are of particular concern, as they are the largest source of uncertainty in global carbon stock calculations [1] and their inventory data have not been updated in the last 25 years [2]. In this research, new empirical models for AGB estimation are proposed. A processing scheme was developed that uses radar L-band data for AGB retrieval and optical data to update the in situ data. The approach was trained and validated in the Asian part of the boreal forest, in southern Russian Central Siberia, in two Siberian Federal Districts: Krasnoyarsk Kray and Irkutsk Oblast. Together, the training and testing forest territories cover an area of approximately 3,500 km². ALOS PALSAR L-band single (HH: horizontal transmitted and received) and dual (HH and HV: horizontal transmitted, horizontal and vertical received) polarizations in Single Look Complex (SLC) format were used to calculate the backscattering coefficient in gamma nought and the coherence. In total, more than 150 images acquired between 2006 and 2011 were available. The data were obtained through the ALOS Kyoto and Carbon Initiative Project (K&C). The data were used to calibrate a randomForest algorithm. Additionally, simple linear and multiple-regression approaches were used. The uncertainty of the AGB estimation at pixel and stand level was calculated as approximately 35% by validation against an independent dataset. Previous studies employing ALOS PALSAR data over boreal forests reported an uncertainty of 39.4% using the randomForest approach [2] or 42.8% using a semi-empirical approach [3].

  16. Use of DNA markers in forest tree improvement research

    Science.gov (United States)

    D.B. Neale; M.E. Devey; K.D. Jermstad; M.R. Ahuja; M.C. Alosi; K.A. Marshall

    1992-01-01

    DNA markers are rapidly being developed for forest trees. The most important markers are restriction fragment length polymorphisms (RFLPs), polymerase chain reaction (PCR)-based markers such as random amplified polymorphic DNA (RAPD), and fingerprinting markers. DNA markers can supplement isozyme markers for monitoring tree improvement activities such as estimating...

  17. Change detection by the IR-MAD and kernel MAF methods in Landsat TM data covering a Swedish forest region

    DEFF Research Database (Denmark)

    Nielsen, Allan Aasbjerg; Olsson, Håkan

    2010-01-01

    Change over time between two 512 by 512 (25 m by 25 m pixels) multispectral Landsat Thematic Mapper images, dated 6 June 1986 and 27 June 1988, respectively, and covering a forested region in northern Sweden, is detected here by means of the iteratively reweighted multivariate alteration detection (IR-M...

  18. Growth and structure of a young Aleppo pine planted forest after thinning for diversification and wildfire prevention

    Energy Technology Data Exchange (ETDEWEB)

    Ruiz-Mirazo, J.; Gonzalez-Rebollar, J. L.

    2013-05-01

    Aim of study: In the Mediterranean, low timber-production forests are frequently thinned to promote biodiversity and reduce wildfire risk, but few studies in the region have addressed such goals. The aim of this research was to compare six thinning regimes applied to create a fuel break in a young Aleppo pine (Pinus halepensis Mill.) planted forest. Area of study: A semiarid continental high plateau in south-eastern Spain. Material and Methods: Three thinning intensities (Light, Medium and Heavy) were combined with two thinning methods: i) Random (tree selection), and ii) Regular (tree spacing). Tree growth and stand structure measurements were made four years after treatment. Main results: Heavy Random thinning successfully transformed the regular tree plantation pattern into a close-to-random spatial tree distribution. Heavy Regular thinning (followed by the Medium Regular and Heavy Random regimes) significantly reduced growth in stand basal area and biomass. Individual tree growth, in contrast, was greater in Heavy and Medium thinnings than in Light ones, which were similar to the Control. Research highlights: Heavy Random thinning seemed the most appropriate in a young Aleppo pine planted forest to reduce fire risk and artificial tree distribution simultaneously. Light Regular thinning avoids understocking the stand and may be the most suitable treatment for creating a fuel break when the undergrowth poses a high fire risk. (Author) 35 refs.

  19. Multivariate GARCH models

    DEFF Research Database (Denmark)

    Silvennoinen, Annastiina; Teräsvirta, Timo

    This article contains a review of multivariate GARCH models. The most common GARCH models are presented and their properties considered. This also includes nonparametric and semiparametric models. Existing specification and misspecification tests are discussed. Finally, there is an empirical example...
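As one concrete instance of the model families such a review covers, the constant conditional correlation (CCC) GARCH model combines univariate GARCH(1,1) variances with a fixed correlation matrix R, so that H_t = D_t R D_t, where D_t is the diagonal matrix of conditional standard deviations. A minimal pure-Python sketch of the covariance recursion follows, assuming the parameters are already estimated (estimation is not shown) and each α_i + β_i < 1. The function name, parameter values, and the choice to start at the unconditional variance are illustrative assumptions.

```python
import math


def ccc_garch_covariances(eps, omega, alpha, beta, R):
    """Conditional covariance matrices H_t of a CCC-GARCH(1,1) model.

    eps:   list of T residual vectors, each of length n
    omega, alpha, beta: per-series GARCH(1,1) parameters (lists of length n)
    R:     constant n x n correlation matrix (nested lists, ones on the diagonal)
    """
    n = len(omega)
    # Start each variance at its unconditional level omega / (1 - alpha - beta).
    h = [omega[i] / (1.0 - alpha[i] - beta[i]) for i in range(n)]
    covariances = []
    for t in range(len(eps)):
        if t > 0:
            # Univariate GARCH(1,1) update driven by the previous residual.
            h = [omega[i] + alpha[i] * eps[t - 1][i] ** 2 + beta[i] * h[i] for i in range(n)]
        sd = [math.sqrt(v) for v in h]
        # H_t = D_t R D_t, written element by element.
        covariances.append([[R[i][j] * sd[i] * sd[j] for j in range(n)] for i in range(n)])
    return covariances


H = ccc_garch_covariances(
    eps=[[1.0, 0.0], [0.5, -0.5]],
    omega=[0.1, 0.2], alpha=[0.1, 0.1], beta=[0.8, 0.8],
    R=[[1.0, 0.5], [0.5, 1.0]],
)
print(round(H[1][1][1], 6))  # 1.8
```

Because R is fixed, all time variation in the covariances comes from the univariate variance recursions; dynamic-correlation variants relax exactly this restriction.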

  20. Applied multivariate statistics with R

    CERN Document Server

    Zelterman, Daniel

    2015-01-01

    This book brings the power of multivariate statistics to graduate-level practitioners, making these analytical methods accessible without lengthy mathematical derivations. Using the open-source, freely available program R, Professor Zelterman demonstrates the process and outcomes for a wide array of multivariate statistical applications. Chapters cover graphical displays, linear algebra, univariate, bivariate and multivariate normal distributions, factor methods, linear regression, discrimination and classification, clustering, time series models, and additional methods. Zelterman uses practical examples from diverse disciplines to welcome readers from a variety of academic specialties. Those with backgrounds in statistics will learn new methods while they review more familiar topics. Chapters include exercises, real data sets, and R implementations. The data are interesting, real-world topics, particularly from health and biology-related contexts. As an example of the approach, the text examines a sample from the B...

  1. Multivariate semi-logistic distribution and processes | Umar | Journal ...

    African Journals Online (AJOL)

    Multivariate semi-logistic distribution is introduced and studied. Some characterization properties of the multivariate semi-logistic distribution are presented. First-order autoregressive minification processes and their generalization to kth-order autoregressive minification processes with multivariate semi-logistic distribution as ...

  2. Learning multivariate distributions by competitive assembly of marginals.

    Science.gov (United States)

    Sánchez-Vega, Francisco; Younes, Laurent; Geman, Donald

    2013-02-01

    We present a new framework for learning high-dimensional multivariate probability distributions from estimated marginals. The approach is motivated by compositional models and Bayesian networks, and designed to adapt to small sample sizes. We start with a large, overlapping set of elementary statistical building blocks, or "primitives," which are low-dimensional marginal distributions learned from data. Each variable may appear in many primitives. Subsets of primitives are combined in a Lego-like fashion to construct a probabilistic graphical model; only a small fraction of the primitives will participate in any valid construction. Since primitives can be precomputed, parameter estimation and structure search are separated. Model complexity is controlled by strong biases; we adapt the primitives to the amount of training data and impose rules which restrict the merging of them into allowable compositions. The likelihood of the data decomposes into a sum of local gains, one for each primitive in the final structure. We focus on a specific subclass of networks which are binary forests. Structure optimization corresponds to an integer linear program and the maximizing composition can be computed for reasonably large numbers of variables. Performance is evaluated using both synthetic data and real datasets from natural language processing and computational biology.

  3. Synergy of optical and polarimetric microwave data for forest resource assessment

    International Nuclear Information System (INIS)

    Miguel-Ayanz, J.S.

    1997-01-01

    Data acquired during the Mac-Europe 91 campaign over the Black Forest (Germany) are used to study the synergy of optical imaging spectrometer data (AVIRIS) and polarimetric microwave data (AIRSAR) for forest resource assessment. Original and newly derived bands from AIRSAR and AVIRIS data are used to predict age and biomass. The best predictors (bands) are selected through a multivariate stepwise regression analysis of each of the datasets separately. Then the joint AIRSAR-AVIRIS dataset is analysed. This study shows how the synergistic use of AIRSAR and AVIRIS data significantly improves the predictions obtained from the individual datasets for both age and biomass over the test site. In the analysis of AVIRIS data, a new approach for processing large datasets such as those provided by imaging spectrometers is presented, so that maximum likelihood classification of these datasets becomes feasible. (author)

  4. Multivariate Pareto Minification Processes | Umar | Journal of the ...

    African Journals Online (AJOL)

    Autoregressive (AR) and autoregressive moving average (ARMA) processes with multivariate exponential (ME) distribution are presented and discussed. The theory of positive dependence is used to show that in many cases, multivariate exponential autoregressive (MEAR) and multivariate autoregressive moving average ...

  5. Comparison of random forests and support vector machine for real-time radar-derived rainfall forecasting

    Science.gov (United States)

    Yu, Pao-Shan; Yang, Tao-Chang; Chen, Szu-Yin; Kuo, Chen-Min; Tseng, Hung-Wei

    2017-09-01

    This study aims to compare two machine learning techniques, random forests (RF) and support vector machine (SVM), for real-time radar-derived rainfall forecasting. The real-time radar-derived rainfall forecasting models use the present grid-based radar-derived rainfall as the output variable and use antecedent grid-based radar-derived rainfall, grid position (longitude and latitude) and elevation as the input variables to forecast 1- to 3-h ahead rainfalls for all grids in a catchment. Grid-based radar-derived rainfalls of six typhoon events during 2012-2015 in three reservoir catchments of Taiwan are collected for model training and verification. Two kinds of forecasting models based on RF and SVM are constructed and compared: a single-mode forecasting model (SMFM) and a multiple-mode forecasting model (MMFM). The SMFM uses the same model for 1- to 3-h ahead rainfall forecasting; the MMFM uses three different models for 1- to 3-h ahead forecasting. The forecasting performances reveal that the SMFMs perform better than the MMFMs and that both SVM-based and RF-based SMFMs show satisfactory performance for 1-h ahead forecasting. However, for 2- and 3-h ahead forecasting, the RF-based SMFM underestimates the observed radar-derived rainfalls in most cases, and the SVM-based SMFM performs better than the RF-based SMFM.

  6. Risk Prediction of One-Year Mortality in Patients with Cardiac Arrhythmias Using Random Survival Forest

    Directory of Open Access Journals (Sweden)

    Fen Miao

    2015-01-01

    Existing models for predicting mortality based on the traditional Cox proportional hazard (CPH) approach often have low prediction accuracy. This paper aims to develop a clinical risk model with good accuracy for predicting 1-year mortality in cardiac arrhythmia patients using random survival forest (RSF), a robust approach for survival analysis. A total of 10,488 cardiac arrhythmia patients available in the public MIMIC II clinical database were investigated, with 3,452 deaths occurring within the 1-year follow-up. Forty risk factors, including demographics, clinical and laboratory information, and antiarrhythmic agents, were analyzed as potential predictors of all-cause mortality. RSF was adopted to build a comprehensive survival model and a simplified risk model composed of the 14 top risk factors. The comprehensive model achieved a prediction accuracy of 0.81 measured by the c-statistic with 10-fold cross validation. The simplified risk model also achieved a good accuracy of 0.799. Both results outperformed traditional CPH (which achieved a c-statistic of 0.733 for the comprehensive model and 0.718 for the simplified model). Moreover, various factors are observed to have a nonlinear impact on cardiac arrhythmia prognosis. As a result, the RSF-based model, which took nonlinearity into account, significantly outperformed the traditional Cox proportional hazard model and has great potential to be a more effective approach for survival analysis.
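The c-statistic reported above measures how often a model ranks patients correctly by risk. A minimal Python sketch of Harrell's concordance index for right-censored data follows, assuming a higher `risk` value means an earlier expected death. Ties in predicted risk count as one half and pairs with tied times are skipped, which is only one of several conventions in use. The function name is hypothetical, this is not the paper's code, and the cross-validation loop is not shown.

```python
def harrell_c(times, events, risk):
    """Harrell's concordance index for right-censored survival data.

    times:  observed follow-up times
    events: 1 if the event (death) was observed, 0 if censored
    risk:   predicted risk scores (higher = expected to fail sooner)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                # Subject i died while j was still under observation.
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


print(harrell_c([1, 2, 3, 4], [1, 1, 1, 0], [4, 3, 2, 1]))  # 1.0
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the 0.81 (RSF) and 0.733 (CPH) figures above are reported.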

  7. Air contaminants and litter fall decomposition in urban forest areas: The case of São Paulo - SP, Brazil.

    Science.gov (United States)

    Lamano Ferreira, Maurício; Portella Ribeiro, Andreza; Rodrigues Albuquerque, Caroline; Ferreira, Ana Paula do Nascimento Lamano; Figueira, Rubens César Lopes; Lafortezza, Raffaele

    2017-05-01

    Urban forests are usually affected by several types of atmospheric contaminants and by abnormal variations in weather conditions, thus facilitating the biotic homogenization and modification of ecosystem processes, such as nutrient cycling. Peri-urban forests and even natural forests that surround metropolitan areas are also subject to anthropogenic effects generated by cities, which may compromise the dynamics of these ecosystems. Hence, this study advances the hypothesis that the forests located at the margins of the Metropolitan Region of São Paulo (MRSP), Brazil, have high concentrations of atmospheric contaminants leading to adverse effects on litter fall stock. The production, stock and decomposition of litter fall in two forests were quantified. The first, known as Guarapiranga forest, lies closer to the urban area and is located within the MRSP, approximately 20 km from the city center. The second, Curucutu forest, is located 70 km from the urban center. This forest is situated exactly on the border of the largest continuum of vegetation of the Atlantic Forest. To verify the reach of atmospheric pollutants from the urban area, levels of heavy metals (Cd, Pb, Ni, Cu) adsorbed on the litter fall deposited on the soil surface of the forests were also quantified. The stock of litter fall and the levels of heavy metals were generally higher in the Guarapiranga forest in the samples collected during the lower rainfall season (dry season). Non-metric multidimensional scaling multivariate analysis showed a clear distinction of the sample units related to the concentrations of heavy metals in each forest. A subtle difference between the units related to the dry and rainy seasons in the Curucutu forest was also noted. Multivariate Analysis of Variance revealed that both site and season of the year (dry or rainy) were important to differentiate the quantity of heavy metals in litter fall stock, although the analysis did not show the interaction between these two

  8. Forest Structure Characterization Using JPL's UAVSAR Multi-Baseline Polarimetric SAR Interferometry and Tomography

    Science.gov (United States)

    Neumann, Maxim; Hensley, Scott; Lavalle, Marco; Ahmed, Razi

    2013-01-01

    This paper concerns forest remote sensing using JPL's multi-baseline polarimetric interferometric UAVSAR data. It presents exemplary results and analyzes the possibilities and limitations of using SAR Tomography and Polarimetric SAR Interferometry (PolInSAR) techniques for the estimation of forest structure. Performance and error indicators for the applicability and reliability of the used multi-baseline (MB) multi-temporal (MT) PolInSAR random volume over ground (RVoG) model are discussed. Experimental results are presented based on JPL's L-band repeat-pass polarimetric interferometric UAVSAR data over temperate and tropical forest biomes in the Harvard Forest, Massachusetts, and in the La Amistad Park, Panama and Costa Rica. The results are partially compared with ground field measurements and with air-borne LVIS lidar data.

  9. Diversity of Medicinal Plants among Different Forest-use Types of the Pakistani Himalaya.

    Science.gov (United States)

    Adnan, Muhammad; Hölscher, Dirk

    2012-12-01

    Medicinal plants collected in Himalayan forests play a vital role in the livelihoods of regional rural societies and are also increasingly recognized at the international level. However, these forests are being heavily transformed by logging. Here we ask how forest transformation influences the diversity and composition of medicinal plants in northwestern Pakistan, where we studied old-growth forests, forests degraded by logging, and regrowth forests. First, an approximate map indicating these forest types was established and then 15 study plots per forest type were randomly selected. We found a total of 59 medicinal plant species consisting of herbs and ferns, most of which occurred in the old-growth forest. Species number was lowest in forest degraded by logging and intermediate in regrowth forest. The most valuable economic species, including six Himalayan endemics, occurred almost exclusively in old-growth forest. Species composition and abundance of forest degraded by logging differed markedly from that of old-growth forest, while regrowth forest was more similar to old-growth forest. The density of medicinal plants positively correlated with tree canopy cover in old-growth forest and negatively in degraded forest, which indicates that species adapted to open conditions dominate in logged forest. Thus, old-growth forests are important as refuge for vulnerable endemics. Forest degraded by logging has the lowest diversity of relatively common medicinal plants. Forest regrowth may foster the reappearance of certain medicinal species valuable to local livelihoods and as such promote acceptance of forest expansion and medicinal plants conservation in the region. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1007/s12231-012-9213-4) contains supplementary material, which is available to authorized users.

  10. r2VIM: A new variable selection method for random forests in genome-wide association studies.

    Science.gov (United States)

    Szymczak, Silke; Holzinger, Emily; Dasgupta, Abhijit; Malley, James D; Molloy, Anne M; Mills, James L; Brody, Lawrence C; Stambolian, Dwight; Bailey-Wilson, Joan E

    2016-01-01

    Machine learning methods and in particular random forests (RFs) are a promising alternative to standard single SNP analyses in genome-wide association studies (GWAS). RFs provide variable importance measures (VIMs) to rank SNPs according to their predictive power. However, in contrast to the established genome-wide significance threshold, no clear criteria exist to determine how many SNPs should be selected for downstream analyses. We propose a new variable selection approach, recurrent relative variable importance measure (r2VIM). Importance values are calculated relative to an observed minimal importance score for several runs of RF and only SNPs with large relative VIMs in all of the runs are selected as important. Evaluations on simulated GWAS data show that the new method controls the number of false-positives under the null hypothesis. Under a simple alternative hypothesis with several independent main effects it is only slightly less powerful than logistic regression. In an experimental GWAS data set, the same strong signal is identified while the approach selects none of the SNPs in an underpowered GWAS. The novel variable selection method r2VIM is a promising extension to standard RF for objectively selecting relevant SNPs in GWAS while controlling the number of false-positive results.
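The selection rule described above (importance relative to each run's minimal importance, kept only if large in every run) can be sketched in a few lines of Python. In this sketch, relative importance is the raw importance divided by the absolute value of the run's minimal importance, which is our reading of the abstract and an assumption. The `threshold` value and the function name are hypothetical, and the random forest runs that produce the importance scores are not shown.

```python
def r2vim_select(importance_runs, threshold):
    """Select variables whose relative importance reaches `threshold` in every run.

    importance_runs: list of dicts, one per random forest run, mapping
                     variable name -> importance score (e.g. permutation VIM)
    threshold:       hypothetical user-chosen cut-off on the relative importance
    """
    selected = None
    for run in importance_runs:
        min_importance = min(run.values())
        if min_importance >= 0:
            # The scaling assumes a negative minimum, as permutation importances
            # of uninformative variables typically scatter around zero.
            raise ValueError("expected a negative minimal importance in each run")
        scale = abs(min_importance)
        passing = {name for name, vim in run.items() if vim / scale >= threshold}
        # Keep only variables that have passed in every run seen so far.
        selected = passing if selected is None else selected & passing
    return sorted(selected) if selected else []


runs = [
    {"snp1": 10.0, "snp2": 0.5, "snp3": -1.0},
    {"snp1": 8.0, "snp2": 3.0, "snp3": -2.0},
]
print(r2vim_select(runs, threshold=2.0))  # ['snp1']
```

Requiring a variable to pass in all runs is what guards against SNPs that look important only because of the randomness of a single forest.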

  11. A random matrix approach to VARMA processes

    International Nuclear Information System (INIS)

    Burda, Zdzislaw; Jarosz, Andrzej; Nowak, Maciej A; Snarska, Malgorzata

    2010-01-01

    We apply random matrix theory to derive the spectral density of large sample covariance matrices generated by multivariate VMA(q), VAR(q) and VARMA(q₁, q₂) processes. In particular, we consider a limit where the number of random variables N and the number of consecutive time measurements T are large but the ratio N/T is fixed. In this regime, the underlying random matrices are asymptotically equivalent to free random variables (FRV). We apply the FRV calculus to calculate the eigenvalue density of the sample covariance for several VARMA-type processes. We explicitly solve the VARMA(1, 1) case and demonstrate perfect agreement between the analytical result and the spectra obtained by Monte Carlo simulations. The proposed method is purely algebraic and can be easily generalized to q₁ > 1 and q₂ > 1.

  12. Geographical variation in soil bacterial community structure in tropical forests in Southeast Asia and temperate forests in Japan based on pyrosequencing analysis of 16S rRNA.

    Science.gov (United States)

    Ito, Natsumi; Iwanaga, Hiroko; Charles, Suliana; Diway, Bibian; Sabang, John; Chong, Lucy; Nanami, Satoshi; Kamiya, Koichi; Lum, Shawn; Siregar, Ulfah J; Harada, Ko; Miyashita, Naohiko T

    2017-09-12

    Geographical variation in soil bacterial community structure in 26 tropical forests in Southeast Asia (Malaysia, Indonesia and Singapore) and two temperate forests in Japan was investigated to elucidate the environmental factors and mechanisms that influence biogeography of soil bacterial diversity and composition. Despite substantial environmental differences, bacterial phyla were represented in similar proportions, with Acidobacteria and Proteobacteria being the dominant phyla in all forests except one mangrove forest in Sarawak, although highly significant heterogeneity in frequency of individual phyla was detected among forests. In contrast, species diversity (α-diversity) differed to a much greater extent, being nearly six-fold higher in the mangrove forest (Chao1 index = 6,862) than in forests in Singapore and Sarawak (~1,250). In addition, natural mixed dipterocarp forests had lower species diversity than acacia and oil palm plantations, indicating that aboveground tree composition does not influence soil bacterial diversity. Shannon and Chao1 indices were positively correlated, implying that skewed operational taxonomic unit (OTU) distribution was associated with the abundance of overall and rare (singleton) OTUs. No OTUs were represented in all 28 forests, and forest-specific OTUs accounted for over 70% of all detected OTUs. Forests that were geographically adjacent and/or of the same forest type had similar bacterial species composition, and a positive correlation was detected between species divergence (β-diversity) and direct distance between forests. Both α- and β-diversities were correlated with soil pH. These results suggest that soil bacterial communities in different forests evolve largely independently of each other and that soil bacterial communities adapt to their local environment, modulated by bacterial dispersal (distance effect) and forest type. Therefore, we conclude that the biogeography of soil bacterial communities described here is non-random.
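    The two diversity measures named in this abstract can be computed from per-OTU read counts. The sketch below is illustrative only: the abstract does not say which Chao1 variant was used, so it assumes the common bias-corrected form S_obs + F1(F1 − 1)/(2(F2 + 1)), and the sample counts are invented. Chao1 grows with the number of singleton OTUs (F1), which is how rare OTUs feed into the richness estimate.

```python
import math
from collections import Counter

def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over OTUs with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def chao1(counts):
    """Bias-corrected Chao1 richness estimate: S_obs + F1*(F1 - 1) / (2*(F2 + 1)).

    S_obs is the number of observed OTUs, F1 the number of singletons (count 1)
    and F2 the number of doubletons (count 2).
    """
    observed = [c for c in counts if c > 0]
    freq = Counter(observed)
    f1, f2 = freq[1], freq[2]
    return len(observed) + f1 * (f1 - 1) / (2 * (f2 + 1))

# Invented per-OTU read counts for a single soil sample (not data from the study).
sample = [120, 45, 30, 8, 2, 2, 1, 1, 1, 1]
print(f"Shannon H' = {shannon_index(sample):.3f}")
print(f"Chao1      = {chao1(sample):.1f}")  # 10 observed OTUs + 4*3/(2*3) = 12.0
```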

  13. First direct landscape-scale measurement of tropical rain forest Leaf Area Index, a key driver of global primary productivity

    Science.gov (United States)

    David B. Clark; Paulo C. Olivas; Steven F. Oberbauer; Deborah A. Clark; Michael G. Ryan

    2008-01-01

    Leaf Area Index (leaf area per unit ground area, LAI) is a key driver of forest productivity but has never previously been measured directly at the landscape scale in tropical rain forest (TRF). We used a modular tower and stratified random sampling to harvest all foliage from forest floor to canopy top in 55 vertical transects (4.6 m²) across 500 ha of old growth in...

  14. Simultaneous comparison and assessment of eight remotely sensed maps of Philippine forests

    Science.gov (United States)

    Estoque, Ronald C.; Pontius, Robert G.; Murayama, Yuji; Hou, Hao; Thapa, Rajesh B.; Lasco, Rodel D.; Villar, Merlito A.

    2018-05-01

    This article compares and assesses eight remotely sensed maps of Philippine forest cover in the year 2010. We examined eight Forest versus Non-Forest maps reclassified from eight land cover products: the Philippine Land Cover, the Climate Change Initiative (CCI) Land Cover, the Landsat Vegetation Continuous Fields (VCF), the MODIS VCF, the MODIS Land Cover Type product (MCD12Q1), the Global Tree Canopy Cover, the ALOS-PALSAR Forest/Non-Forest Map, and the GlobeLand30. The reference data consisted of 9852 randomly distributed sample points interpreted from Google Earth. We created methods to assess the maps and their combinations. Results show that the percentage of the Philippines covered by forest ranges among the maps from a low of 23% for the Philippine Land Cover to a high of 67% for GlobeLand30. Landsat VCF estimates 36% forest cover, which is closest to the 37% estimate based on the reference data. The eight maps plus the reference data agree unanimously on 30% of the sample points, of which 11% are attributable to forest and 19% to non-forest. The overall disagreement between the reference data and Philippine Land Cover is 21%, which is the least among the eight Forest versus Non-Forest maps. About half of the 9852 points have a nested structure such that the forest in a given dataset is a subset of the forest in the datasets that have more forest than the given dataset. The variation among the maps regarding forest quantity and allocation relates to the combined effects of the various definitions of forest and classification errors. Scientists and policy makers must consider these insights when producing future forest cover maps and when establishing benchmarks for forest cover monitoring.

  15. Multivariate wavelet frames

    CERN Document Server

    Skopina, Maria; Protasov, Vladimir

    2016-01-01

    This book presents a systematic study of multivariate wavelet frames with matrix dilation, in particular, orthogonal and bi-orthogonal bases, which are a special case of frames. Further, it provides algorithmic methods for the construction of dual and tight wavelet frames with a desirable approximation order, namely compactly supported wavelet frames, which are commonly required by engineers. It particularly focuses on methods of constructing them. Wavelet bases and frames are actively used in numerous applications such as audio and graphic signal processing, compression and transmission of information. They are especially useful in image recovery from incomplete observed data due to the redundancy of frame systems. The construction of multivariate wavelet frames, especially bases, with desirable properties remains a challenging problem: although a general construction scheme is well known, its practical implementation in the multidimensional setting is difficult. Another important feature of wavelet is ...

  16. Forests

    Science.gov (United States)

    Louis R. Iverson; Mark W. Schwartz

    1994-01-01

    Originally diminished by development, forests are coming back: forest biomass is accumulating. Forests are repositories for many threatened species. Even with increased standing timber, however, biodiversity is threatened by increased forest fragmentation and by exotic species.

  17. Blood pressure-lowering effect of Shinrin-yoku (Forest bathing): a systematic review and meta-analysis.

    Science.gov (United States)

    Ideno, Yuki; Hayashi, Kunihiko; Abe, Yukina; Ueda, Kayo; Iso, Hiroyasu; Noda, Mitsuhiko; Lee, Jung-Su; Suzuki, Shosuke

    2017-08-16

    Shinrin-yoku (experiencing the forest atmosphere or forest bathing) has received increasing attention from the perspective of preventive medicine in recent years. Some studies have reported that the forest environment decreases blood pressure. However, little is known about the possibility of anti-hypertensive applications of Shinrin-yoku. This study aimed to evaluate preventive or therapeutic effects of the forest environment on blood pressure. We systematically reviewed the medical literature and performed a meta-analysis. Four electronic databases were systematically searched for the period before May 2016, restricted to English- and Japanese-language publications. The review considered all published randomized controlled trials, cohort studies, and comparative studies that evaluated the effects of the forest environment on changes in systolic blood pressure. A subsequent meta-analysis was performed. Twenty trials involving 732 participants were reviewed. Systolic blood pressure in the forest environment was significantly lower than in the non-forest environment. Additionally, diastolic blood pressure in the forest environment was significantly lower than in the non-forest environment. This systematic review shows a significant effect of Shinrin-yoku on reduction of blood pressure.

  18. Forest structure in low-diversity tropical forests: a study of Hawaiian wet and dry forests.

    Science.gov (United States)

    Ostertag, Rebecca; Inman-Narahari, Faith; Cordell, Susan; Giardina, Christian P; Sack, Lawren

    2014-01-01

    The potential influence of diversity on ecosystem structure and function remains a topic of significant debate, especially for tropical forests where diversity can range widely. We used Center for Tropical Forest Science (CTFS) methodology to establish forest dynamics plots in montane wet forest and lowland dry forest on Hawai'i Island. We compared the species diversity, tree density, basal area, biomass, and size class distributions between the two forest types. We then examined these variables across tropical forests within the CTFS network. Consistent with other island forests, the Hawai'i forests were characterized by low species richness and very high relative dominance. The two Hawai'i forests were floristically distinct, yet similar in species richness (15 vs. 21 species) and stem density (3078 vs. 3486/ha). While these forests were selected for their low invasive species cover relative to surrounding forests, both forests averaged 5->50% invasive species cover; ongoing removal will be necessary to reduce or prevent competitive impacts, especially from woody species. The montane wet forest had much larger trees, resulting in eightfold higher basal area and above-ground biomass. Across the CTFS network, the Hawaiian montane wet forest was similar to other tropical forests with respect to diameter distributions, density, and aboveground biomass, while the Hawai'i lowland dry forest was similar in density to tropical forests with much higher diversity. These findings suggest that forest structural variables can be similar across tropical forests independently of species richness. The inclusion of low-diversity Pacific Island forests in the CTFS network provides an ∼80-fold range in species richness (15-1182 species), six-fold variation in mean annual rainfall (835-5272 mm yr(-1)) and 1.8-fold variation in mean annual temperature (16.0-28.4°C). Thus, the Hawaiian forest plots expand the global forest plot network to enable testing of ecological theory for

  19. Forest structure in low-diversity tropical forests: a study of Hawaiian wet and dry forests.

    Directory of Open Access Journals (Sweden)

    Rebecca Ostertag

    The potential influence of diversity on ecosystem structure and function remains a topic of significant debate, especially for tropical forests where diversity can range widely. We used Center for Tropical Forest Science (CTFS) methodology to establish forest dynamics plots in montane wet forest and lowland dry forest on Hawai'i Island. We compared the species diversity, tree density, basal area, biomass, and size class distributions between the two forest types. We then examined these variables across tropical forests within the CTFS network. Consistent with other island forests, the Hawai'i forests were characterized by low species richness and very high relative dominance. The two Hawai'i forests were floristically distinct, yet similar in species richness (15 vs. 21 species) and stem density (3078 vs. 3486/ha). While these forests were selected for their low invasive species cover relative to surrounding forests, both forests averaged 5->50% invasive species cover; ongoing removal will be necessary to reduce or prevent competitive impacts, especially from woody species. The montane wet forest had much larger trees, resulting in eightfold higher basal area and above-ground biomass. Across the CTFS network, the Hawaiian montane wet forest was similar to other tropical forests with respect to diameter distributions, density, and aboveground biomass, while the Hawai'i lowland dry forest was similar in density to tropical forests with much higher diversity. These findings suggest that forest structural variables can be similar across tropical forests independently of species richness. The inclusion of low-diversity Pacific Island forests in the CTFS network provides an ∼80-fold range in species richness (15-1182 species), six-fold variation in mean annual rainfall (835-5272 mm yr(-1)) and 1.8-fold variation in mean annual temperature (16.0-28.4°C). Thus, the Hawaiian forest plots expand the global forest plot network to enable testing of ecological

  20. (Quercus spp.) using random amplified polymorphic DNA (RAPD)

    African Journals Online (AJOL)

    Quercus is one of the most important woody genera of the Northern Hemisphere and is considered one of the main forest tree species in Iran. In this study, genetic relationships in the genus Quercus were examined using random amplified polymorphic DNA (RAPD). Five species, including Quercus robur, Quercus ...

  1. Procesoptimerende multivariable regulatorer til kraftværkskedler. Process Optimizing Multivariable Controllers for Powerplant Boilers

    DEFF Research Database (Denmark)

    Hansen, T.

    The purpose of this Ph.D. thesis is twofold: The first purpose is to devise a new method for application of multivariable controllers in boiler control systems in which they act as optional process optimizing extensions to conventional control systems and in such a way that the safety measures...... mentioned, the concept is applicable to new as well as existing plants. The second purpose is to suggest specific methods for experimental modelling and multivariable controller design that can be used within the conceptual framework, to implement them, and to test them in a boiler application....

  2. TALL-HERB BOREAL FORESTS ON NORTH URAL

    Directory of Open Access Journals (Sweden)

    A. A. Aleinikov

    2016-09-01

    Background. One of the pressing aims of today’s natural resource management is its re-orientation toward preserving and restoring the ecological functions of ecosystems, among which the function of biodiversity maintenance plays an indicator role. The majority of today’s forests have not retained their natural appearance as a result of long-standing human impact. In this connection, refugia studies are becoming particularly interesting, as they give us insight into the natural appearance of forests. Materials and methods. Studies were performed in 2013 in dark conifer forests of the Pechora–Ilych reserve, in the lower reaches of the Bol’shaya Porozhnyaya River. Vegetation data were sampled at 50 temporary square plots of a fixed size (100 m²) randomly placed within a forest type. A list of plant species with species abundance was made for each forest layer. The overstorey (tree canopy) layer was denoted by the Latin letter A. The understorey layer (indicated by the letter B) included tree undergrowth and tall shrubs. Ground vegetation was subdivided into layers C and D. Layer C (field layer) comprised the herbaceous species (herbs, grasses, sedges) and dwarf shrubs, together with low shrubs and tree and shrub seedlings. The height of the field layer was defined by the maximal height of the herbaceous species, ferns, and dwarf shrubs; it varied from several cm to more than 200 cm in the ‘tall-herb’ forest types. Layer D (bottom layer) included cryptogamic species (bryophytes and lichens). Species abundance in each layer was usually assessed using the Braun-Blanquet cover scale (Braun-Blanquet 1928). The nomenclature follows Cherepanov (1995) for vascular plants, and Ignatov & Afonina (1992). Results. The present article contains descriptions of unique tall-herb boreal forests of European Russia preserved in certain refugia which did not experience prolonged anthropogenic impact or any other catastrophes

  3. Multivariate data analysis

    DEFF Research Database (Denmark)

    Hansen, Michael Adsetts Edberg

    Interest in statistical methodology is increasing so rapidly in the astronomical community that accessible introductory material in this area is long overdue. This book fills the gap by providing a presentation of the most useful techniques in multivariate statistics. A wide-ranging annotated set...

  4. Effect of different tree mortality patterns on stand development in the forest model SIBYLA

    Directory of Open Access Journals (Sweden)

    Trombik Jiří

    2016-09-01

    Forest mortality critically affects stand structure and the quality of ecosystem services provided by forests. The spruce bark beetle (Ips typographus) generates rather complex infestation and mortality patterns, and implementing such patterns in forest models is challenging. We present here a procedure that allows bark beetle-related tree mortality to be simulated in the forest dynamics model Sibyla. We explored how sensitive various production and stand structure indicators are to tree mortality patterns that can be generated by bark beetles. We compared the simulation outputs for three unmanaged forest stands with 40, 70 and 100% proportion of spruce as affected by disturbance-related mortality occurring in a random pattern and in a patchy pattern. The tree species- and age class-specific mortality rates used were derived from disturbance-related mortality records from Slovakia. The proposed algorithm was developed in SQLite using Python, and it allowed us to define the degree of spatial clustering of dead trees, ranging from a random distribution to a completely clustered distribution, while the number of trees that die is kept equal in either mode. We found significant differences between the long-term developments of the three investigated forest stands, but very little effect of the tested mortality modes on stand increment, tree species composition and diversity, and tree size diversity. Hence, our hypothesis that the different pattern of dead tree emergence should affect the competitive interactions between trees and regeneration, and thus affect selected productivity and stand structure indicators, was not confirmed.
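    The abstract describes an algorithm that keeps the number of dead trees fixed while tuning how spatially clustered they are, but it gives no details of the Sibyla implementation. The following is a hypothetical Python sketch of that idea, not the authors' code: a `clustering` parameter (our assumption) sets the share of deaths taken as the nearest neighbours of a randomly chosen seed tree, and the remaining deaths are drawn uniformly at random, so the death count never changes. The stand geometry is invented.

```python
import math
import random

def select_dead_trees(positions, n_dead, clustering, rng):
    """Return a set of exactly n_dead tree indices chosen to die.

    clustering = 0.0: every death is drawn uniformly at random.
    clustering = 1.0: deaths are the n_dead trees nearest a randomly chosen seed tree.
    Values in between mix the two; the total number of deaths is always n_dead.
    """
    if not 0 <= n_dead <= len(positions):
        raise ValueError("n_dead must be between 0 and the number of trees")
    if not 0.0 <= clustering <= 1.0:
        raise ValueError("clustering must lie in [0, 1]")
    n_clustered = round(clustering * n_dead)
    dead = set()
    if n_clustered > 0:
        sx, sy = positions[rng.randrange(len(positions))]
        # Rank every tree by distance to the seed (the seed itself is at distance 0).
        by_distance = sorted(
            range(len(positions)),
            key=lambda i: math.hypot(positions[i][0] - sx, positions[i][1] - sy),
        )
        dead.update(by_distance[:n_clustered])
    # Fill the remaining quota uniformly at random from trees not yet killed.
    survivors = [i for i in range(len(positions)) if i not in dead]
    dead.update(rng.sample(survivors, n_dead - len(dead)))
    return dead

# Hypothetical 100 m x 100 m stand with 500 randomly placed trees.
rng = random.Random(42)
stand = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(500)]
for level in (0.0, 0.5, 1.0):
    killed = select_dead_trees(stand, 50, level, rng)
    print(level, len(killed))
```

Because the random and clustered parts draw from one shared quota, any comparison between mortality modes isolates the spatial pattern from the amount of mortality, which is the control the abstract describes.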

  5. Forest ownership dynamics of southern forests

    Science.gov (United States)

    Brett J. Butler; David N. Wear

    2013-01-01

    Key Findings: Private landowners hold 86 percent of the forest area in the South; two-thirds of this area is owned by families or individuals. Fifty-nine percent of family forest owners own between 1 and 9 acres of forest land, but 60 percent of family-owned forests are in holdings of 100 acres or more. Two-...

  6. An Exact Confidence Region in Multivariate Calibration

    OpenAIRE

    Mathew, Thomas; Kasala, Subramanyam

    1994-01-01

    In the multivariate calibration problem using a multivariate linear model, an exact confidence region is constructed. It is shown that the region is always nonempty and is invariant under nonsingular transformations.

  7. Modelling bark beetle disturbances in a large scale forest scenario model to assess climate change impacts and evaluate adaptive management strategies

    NARCIS (Netherlands)

    Seidl, R.; Schelhaas, M.J.; Lindner, M.; Lexer, M.J.

    2009-01-01

    To study potential consequences of climate-induced changes in the biotic disturbance regime at regional to national scale we integrated a model of Ips typographus (L. Scol. Col.) damages into the large-scale forest scenario model EFISCEN. A two-stage multivariate statistical meta-model was used to

  8. Multivariate statistical modelling based on generalized linear models

    CERN Document Server

    Fahrmeir, Ludwig

    1994-01-01

    This book is concerned with the use of generalized linear models for univariate and multivariate regression analysis. Its emphasis is to provide a detailed introductory survey of the subject based on the analysis of real data drawn from a variety of subjects including the biological sciences, economics, and the social sciences. Where possible, technical details and proofs are deferred to an appendix in order to provide an accessible account for non-experts. Topics covered include: models for multi-categorical responses, model checking, time series and longitudinal data, random effects models, and state-space models. Throughout, the authors have taken great pains to discuss the underlying theoretical ideas in ways that relate well to the data at hand. As a result, numerous researchers whose work relies on the use of these models will find this an invaluable account to have on their desks. "The basic aim of the authors is to bring together and review a large part of recent advances in statistical modelling of m...

  9. Derivation of a new ADAS-cog composite using tree-based multivariate analysis: prediction of conversion from mild cognitive impairment to Alzheimer disease.

    Science.gov (United States)

    Llano, Daniel A; Laforet, Genevieve; Devanarayan, Viswanath

    2011-01-01

    Model-based statistical approaches were used to compare the ability of the Alzheimer's Disease Assessment Scale-cognitive subscale (ADAS-cog), cerebrospinal fluid (CSF), fluorodeoxyglucose positron emission tomography and volumetric magnetic resonance imaging (MRI) markers to predict 12-month progression from mild cognitive impairment (MCI) to Alzheimer disease (AD). Using the Alzheimer's Disease Neuroimaging Initiative (ADNI) data set, properties of the 11-item ADAS-cog (ADAS.11), the 13-item ADAS-cog (ADAS.All) and novel composite scores were compared, using weighting schemes derived from the Random Forests (RF) tree-based multivariate model. Weighting subscores using the RF model of ADAS.All enhanced discrimination between elderly controls, MCI and AD patients. The ability of the RF-weighted ADAS-cog composite and individual scores, along with neuroimaging or biochemical biomarkers, to predict MCI to AD conversion over 12 months was also assessed. Although originally optimized to discriminate across diagnostic categories, the ADAS.All, weighted according to the RF model, did nearly as well as or better than individual or composite baseline neuroimaging or CSF biomarkers in prediction of 12-month conversion from MCI to AD. These results suggest that a modified subscore weighting scheme applied to the 13-item ADAS-cog is comparable to imaging or CSF markers in prediction of conversion from MCI to AD at 12 months. Copyright © 2011 by Lippincott Williams & Wilkins

  10. Localization for random Schroedinger operators with correlated potentials

    Energy Technology Data Exchange (ETDEWEB)

    Von Dreifus, H. [Princeton Univ., NJ (USA). Dept. of Physics]; Klein, A. [California Univ., Irvine (USA). Dept. of Mathematics]

    1991-08-01

    We prove localization at high disorder or low energy for lattice Schroedinger operators with random potentials whose values at different lattice sites are correlated over large distances. The class of admissible random potentials for our multiscale analysis includes potentials with a stationary Gaussian distribution whose covariance function $C(x,y)$ decays as $|x-y|^{-\theta}$, where $\theta > 0$ can be arbitrarily small, and potentials whose probability distribution is a completely analytical Gibbs measure. The result for Gaussian potentials depends on a multivariable form of Nelson's best possible hypercontractive estimate. (orig.).

  11. Nonlocal atlas-guided multi-channel forest learning for human brain labeling.

    Science.gov (United States)

    Ma, Guangkai; Gao, Yaozong; Wu, Guorong; Wu, Ligang; Shen, Dinggang

    2016-02-01

    It is important for many quantitative brain studies to label meaningful anatomical regions in MR brain images. However, due to the high complexity of brain structures and ambiguous boundaries between different anatomical regions, the anatomical labeling of MR brain images is still quite a challenging task. In many existing label fusion methods, appearance information is widely used. However, since local anatomy in the human brain is often complex, appearance information alone is limited in characterizing each image point, especially for identifying the same anatomical structure across different subjects. Recent progress in computer vision suggests that context features can be very useful in identifying an object in a complex scene. In light of this, the authors propose a novel learning-based label fusion method that uses both low-level appearance features (computed from the target image) and high-level context features (computed from warped atlases or tentative labeling maps of the target image). In particular, the authors employ a multi-channel random forest to learn the nonlinear relationship between these hybrid features and target labels (i.e., corresponding to certain anatomical structures). Specifically, at each iteration, the random forest outputs tentative labeling maps of the target image, from which the authors compute spatial label context features; these are then combined with the original appearance features of the target image to refine the labeling. Moreover, to accommodate the high inter-subject variations, the authors further extend their learning-based label fusion to a multi-atlas scenario, i.e., they train a random forest for each atlas and then obtain the final labeling result according to the consensus of results from all atlases. The authors have comprehensively evaluated their method on both the public LONI_LBPA40 and IXI datasets. To quantitatively evaluate the labeling accuracy, the authors use the Dice similarity coefficient
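    The Dice similarity coefficient mentioned at the end of this abstract measures the overlap between a predicted label map and a reference map, DSC = 2|A ∩ B| / (|A| + |B|). A minimal pure-Python sketch, assuming label maps flattened to equal-length sequences; the toy voxel labels and the convention for a label absent from both maps are illustrative assumptions, not taken from the paper.

```python
def dice_coefficient(pred, truth, label):
    """Dice similarity coefficient 2|A ∩ B| / (|A| + |B|) for one anatomical label.

    pred and truth are equal-length sequences of voxel labels (flattened label maps);
    A is the set of voxels labelled `label` in pred, B the same set in truth.
    """
    if len(pred) != len(truth):
        raise ValueError("label maps must contain the same number of voxels")
    size_a = sum(1 for p in pred if p == label)
    size_b = sum(1 for t in truth if t == label)
    overlap = sum(1 for p, t in zip(pred, truth) if p == label and t == label)
    if size_a + size_b == 0:
        return 1.0  # assumed convention: a label absent from both maps agrees perfectly
    return 2.0 * overlap / (size_a + size_b)

# Toy 3x3 label maps, flattened row by row (hypothetical, not study data).
truth = [0, 1, 1, 0, 1, 2, 2, 2, 2]
pred = [0, 1, 0, 0, 1, 2, 2, 2, 1]
for lab in (1, 2):
    print(lab, round(dice_coefficient(pred, truth, lab), 3))  # 1 -> 0.667, 2 -> 0.857
```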

  12. Multivariate rational data fitting

    Science.gov (United States)

    Cuyt, Annie; Verdonk, Brigitte

    1992-12-01

    Sections 1 and 2 discuss the advantages of an object-oriented implementation, combined with higher floating-point arithmetic, of the algorithms available for multivariate data fitting using rational functions. Section 1 will in particular explain what we mean by "higher arithmetic". Section 2 will concentrate on the concepts of "object orientation". In sections 3 and 4 we shall describe the generality of the data structure that can be dealt with: owing to some new results, virtually every data set is now acceptable, with possible coalescence of coordinates or points. In order to solve the multivariate rational interpolation problem, the data sets are fed to different algorithms depending on the structure of the interpolation points in the n-variate space.

  13. Multivariate missing data in hydrology - Review and applications

    Science.gov (United States)

    Ben Aissia, Mohamed-Aymen; Chebana, Fateh; Ouarda, Taha B. M. J.

    2017-12-01

    Water resources planning and management require complete data sets of a number of hydrological variables, such as flood peaks and volumes. However, hydrologists are often faced with the problem of missing data (MD) in hydrological databases. Several methods are used to deal with the imputation of MD. During the last decade, multivariate approaches have gained popularity in the field of hydrology, especially in hydrological frequency analysis (HFA). However, the treatment of MD has been neglected in the multivariate HFA literature, where the focus has mainly been on the modeling component. For a complete analysis and in order to optimize the use of data, MD should also be treated in the multivariate setting prior to modeling and inference. Imputation of MD in the multivariate hydrological framework can have direct implications on the quality of the estimation. Indeed, the dependence between the series represents important additional information that can be included in the imputation process. The objective of the present paper is to highlight the importance of treating MD in multivariate hydrological frequency analysis by reviewing and applying multivariate imputation methods and by comparing univariate and multivariate imputation methods. An application is carried out for multiple flood attributes on three sites in order to evaluate the performance of the different methods based on the leave-one-out procedure. The results indicate that the performance of imputation methods can be improved by adopting the multivariate setting, compared with mean substitution and interpolation methods, especially when using the copula-based approach.
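    The leave-one-out evaluation described in this abstract can be illustrated with a small sketch: each observation at one site is hidden in turn and re-estimated from the remaining years. Here mean substitution is compared against ordinary least-squares regression on a correlated second site, which stands in for a multivariate method; it is not the copula-based approach evaluated in the paper, and the flood-peak values are invented.

```python
import math

def loo_rmse(target, covariate):
    """Leave-one-out RMSE of two imputation rules for `target`.

    Each value target[i] is hidden in turn and re-estimated from the other years by
    (a) mean substitution and (b) least-squares regression on `covariate`, a
    correlated series from another site. Returns (rmse_mean, rmse_regression).
    """
    n = len(target)
    if n < 3 or len(covariate) != n:
        raise ValueError("need at least 3 paired observations")
    sq_err_mean = sq_err_reg = 0.0
    for i in range(n):
        ys = [target[j] for j in range(n) if j != i]
        xs = [covariate[j] for j in range(n) if j != i]
        y_bar = sum(ys) / len(ys)
        x_bar = sum(xs) / len(xs)
        sxx = sum((x - x_bar) ** 2 for x in xs)
        if sxx == 0:
            raise ValueError("covariate is constant once a value is left out")
        slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
        guess_reg = y_bar + slope * (covariate[i] - x_bar)
        sq_err_mean += (target[i] - y_bar) ** 2
        sq_err_reg += (target[i] - guess_reg) ** 2
    return math.sqrt(sq_err_mean / n), math.sqrt(sq_err_reg / n)

# Invented annual flood peaks (m^3/s) at two nearby gauging sites.
site_a = [410, 520, 380, 610, 455, 700, 390, 540]
site_b = [300, 390, 270, 470, 330, 540, 300, 400]
rmse_mean, rmse_reg = loo_rmse(site_a, site_b)
print(f"mean substitution RMSE:    {rmse_mean:.1f}")
print(f"regression on site B RMSE: {rmse_reg:.1f}")
```

When the two series are strongly dependent, the regression rule would be expected to beat mean substitution, which echoes the abstract's point that dependence between series carries information useful for imputation.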

  14. Multivariate statistics: high-dimensional and large-sample approximations

    CERN Document Server

    Fujikoshi, Yasunori; Shimizu, Ryoichi

    2010-01-01

    A comprehensive examination of high-dimensional analysis of multivariate methods and their real-world applications Multivariate Statistics: High-Dimensional and Large-Sample Approximations is the first book of its kind to explore how classical multivariate methods can be revised and used in place of conventional statistical tools. Written by prominent researchers in the field, the book focuses on high-dimensional and large-scale approximations and details the many basic multivariate methods used to achieve high levels of accuracy. The authors begin with a fundamental presentation of the basic

  15. Forests

    International Nuclear Information System (INIS)

    Melin, J.

    1997-01-01

    Forests have the capacity to trap and retain radionuclides for a substantial period of time. The dynamic behaviour of nutrients, pollution and radionuclides in forests is complex. The rotation period of a forest stand in the Nordic countries is about 100 years, whilst the time for decomposition of organic material in a forest environment can be several hundred years. This means that any countermeasure applied in the forest environment must have an effect for several decades, or be reapplied continuously for long periods of time. To mitigate the detrimental effect of a contaminated forest environment on man, and to minimise the economic loss in trade of contaminated forest products, it is necessary to understand the mechanisms of transfer of radionuclides through the forest environment. It must also be stressed that any countermeasure applied in the forest environment must be evaluated with respect to long, as well as short term, negative effects, before any decision about remedial action is taken. Of the radionuclides studied in forests in the past, radiocaesium has been the main contributor to dose to man. In this document, only radiocaesium will be discussed since data on the impact of other radionuclides on man are too scarce for a proper evaluation. (EG)

  16. Predicting Long-Term Cognitive Outcome Following Breast Cancer with Pre-Treatment Resting State fMRI and Random Forest Machine Learning.

    Science.gov (United States)

    Kesler, Shelli R; Rao, Arvind; Blayney, Douglas W; Oakley-Girvan, Ingrid A; Karuturi, Meghan; Palesh, Oxana

    2017-01-01

    We aimed to determine if resting state functional magnetic resonance imaging (fMRI) acquired at pre-treatment baseline could accurately predict breast cancer-related cognitive impairment at long-term follow-up. We evaluated 31 patients with breast cancer (age 34-65) prior to any treatment, post-chemotherapy and 1 year later. Cognitive testing scores were normalized based on data obtained from 43 healthy female controls and then used to categorize patients as impaired or not based on longitudinal changes. We measured clustering coefficient, a measure of local connectivity, by applying graph theory to baseline resting state fMRI and entered these metrics along with relevant patient-related and medical variables into random forest classification. Incidence of cognitive impairment at 1 year follow-up was 55% and was predicted by classification algorithms with up to 100% accuracy ( p breast cancer. This information could inform treatment decision making by identifying patients at highest risk for long-term cognitive impairment.
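    The clustering coefficient used in this study as a measure of local connectivity is, for an unweighted undirected graph, C_i = 2T_i / (k_i(k_i − 1)), where k_i is the degree of node i and T_i is the number of edges among its neighbours. The abstract does not say how the fMRI networks were built; the sketch below assumes a simple absolute-correlation threshold on a region-by-region correlation matrix, and both the threshold and the matrix are hypothetical.

```python
from itertools import combinations

def threshold_graph(corr, threshold):
    """Binarise a symmetric correlation matrix: edge i-j if |r_ij| >= threshold (assumed rule)."""
    n = len(corr)
    adj = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if abs(corr[i][j]) >= threshold:
            adj[i].add(j)
            adj[j].add(i)
    return adj

def local_clustering(adj):
    """Local clustering coefficient C_i = 2*T_i / (k_i*(k_i - 1)) for each node.

    adj maps each node to the set of its neighbours; T_i counts edges among
    node i's neighbours. Nodes with fewer than two neighbours get 0.0.
    """
    coeffs = {}
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            coeffs[node] = 0.0
            continue
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        coeffs[node] = 2.0 * links / (k * (k - 1))
    return coeffs

# Hypothetical correlations between four brain regions (not study data).
corr = [
    [1.0, 0.8, 0.7, 0.1],
    [0.8, 1.0, 0.6, 0.5],
    [0.7, 0.6, 1.0, 0.05],
    [0.1, 0.5, 0.05, 1.0],
]
# Regions 0 and 2 get 1.0, region 1 gets 1/3, region 3 (one neighbour) gets 0.0.
print(local_clustering(threshold_graph(corr, 0.5)))
```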

  17. A Novel Approach for Multi Class Fault Diagnosis in Induction Machine Based on Statistical Time Features and Random Forest Classifier

    Science.gov (United States)

    Sonje, M. Deepak; Kundu, P.; Chowdhury, A.

    2017-08-01

    Fault diagnosis and detection is an important area in the health monitoring of electrical machines. This paper proposes a recently developed machine learning classifier for multi-class fault diagnosis in induction machines. The classification is based on the random forest (RF) algorithm. Initially, stator currents are acquired from the induction machine under various conditions. After preprocessing the currents, fourteen statistical time features are estimated for each phase of the current. These parameters are used as inputs to the classifier. The main scope of the paper is to evaluate the effectiveness of the RF classifier for individual and mixed fault diagnosis in induction machines. Stator, rotor and mixed (stator and rotor) faults are classified using the proposed classifier. The obtained performance measures are compared with those of a multilayer perceptron neural network (MLPNN) classifier. The results show that the RF classifier achieves much better performance measures and higher accuracy than the MLPNN classifier. To demonstrate the proposed fault diagnosis algorithm and make the classifier more practical, it is built from experimentally obtained results.

  18. Forests and Forest Cover - MDC_NaturalForestCommunity

    Data.gov (United States)

    NSGIC Local Govt | GIS Inventory — A point feature class of NFCs - Natural Forest Communities. Natural Forest Community shall mean all stands of trees (including their associated understory) which...

  19. Exploratory multivariate analysis by example using R

    CERN Document Server

    Husson, Francois; Pages, Jerome

    2010-01-01

    Full of real-world case studies and practical advice, Exploratory Multivariate Analysis by Example Using R focuses on four fundamental methods of multivariate exploratory data analysis that are most suitable for applications. It covers principal component analysis (PCA) when variables are quantitative, correspondence analysis (CA) and multiple correspondence analysis (MCA) when variables are categorical, and hierarchical cluster analysis. The authors take a geometric point of view that provides a unified vision for exploring multivariate data tables. Within this framework, they present the prin

  20. Ellipsoidal prediction regions for multivariate uncertainty characterization

    DEFF Research Database (Denmark)

    Golestaneh, Faranak; Pinson, Pierre; Azizipanah-Abarghooee, Rasoul

    2018-01-01

    ..., for classes of decision-making problems based on robust, interval, and chance-constrained optimization, necessary inputs take the form of multivariate prediction regions rather than scenarios. The current literature is at a very primitive stage of characterizing multivariate prediction regions to be employed ... in these classes of optimization problems. To address this issue, we introduce a new class of multivariate forecasts that take the form of multivariate ellipsoids for non-Gaussian variables. We propose a data-driven systematic framework to readily generate and evaluate ellipsoidal prediction regions, with predefined ... probability guarantees and minimum conservativeness. A skill score is proposed for quantitative assessment of the quality of prediction ellipsoids. A set of experiments is used to illustrate the discrimination ability of the proposed scoring rule for potential misspecification of ellipsoidal prediction regions...
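    An ellipsoidal prediction region can be written as {x : (x - c)ᵀ S⁻¹ (x - c) ≤ r²}. This minimal two-dimensional Python sketch makes some assumptions of its own, and it is not the authors' framework. It takes c and S as the sample mean and covariance. It sets r² as an empirical quantile of squared Mahalanobis distances rather than a Gaussian chi-square quantile, so the size adapts to non-Gaussian data. A real application would calibrate r² on held-out data rather than in-sample.

```python
import math
import random
from dataclasses import dataclass

@dataclass
class EllipseRegion:
    """2-D region {x : (x - center)^T S^{-1} (x - center) <= radius2}."""
    center: tuple
    inv_cov: tuple  # ((a, b), (b, c)): entries of S^{-1}
    radius2: float

    def mahalanobis2(self, point):
        dx, dy = point[0] - self.center[0], point[1] - self.center[1]
        (a, b), (_, c) = self.inv_cov
        return a * dx * dx + 2 * b * dx * dy + c * dy * dy

    def contains(self, point):
        return self.mahalanobis2(point) <= self.radius2

def fit_ellipse(samples, coverage):
    """Shape from the sample covariance; size from the empirical `coverage` quantile."""
    n = len(samples)
    mx = sum(s[0] for s in samples) / n
    my = sum(s[1] for s in samples) / n
    sxx = sum((s[0] - mx) ** 2 for s in samples) / (n - 1)
    syy = sum((s[1] - my) ** 2 for s in samples) / (n - 1)
    sxy = sum((s[0] - mx) * (s[1] - my) for s in samples) / (n - 1)
    det = sxx * syy - sxy * sxy  # must be > 0 (non-degenerate scatter)
    inv = ((syy / det, -sxy / det), (-sxy / det, sxx / det))
    region = EllipseRegion((mx, my), inv, 0.0)
    d2 = sorted(region.mahalanobis2(s) for s in samples)
    k = min(n - 1, max(0, math.ceil(coverage * n) - 1))
    region.radius2 = d2[k]
    return region

# Usage with skewed, non-Gaussian synthetic data (illustrative only).
rng = random.Random(0)
pts = [(rng.gauss(0, 1), rng.expovariate(1.0)) for _ in range(500)]
region = fit_ellipse(pts, coverage=0.9)
in_sample = sum(region.contains(p) for p in pts) / len(pts)
```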

  1. Drivers for plant species diversity in a characteristic tropical forest landscape in Bangladesh

    DEFF Research Database (Denmark)

    Steinbauer, Manuel; Uddin, Mohammad Bela; Jentsch, Anke

    2016-01-01

    ...species richness and community composition along a land-use intensity gradient in a forest landscape that includes tea gardens, tree plantations, and nature reserves (Satchari Reserved Forest), based on multivariate approaches and variation partitioning. We find that richness as well as composition of tree ... and understory species relate directly to a disturbance gradient that reflects protection status and elevation. This is astonishing, as the elevation range is very small (70 m). Topography and protection remain significant drivers of biodiversity after correcting for human disturbances. While tree ... and non-tree species richness were positively correlated, they differ considerably in their relation to other environmental or disturbance variables as well as in the spatial richness pattern. The disturbance regime particularly structures tree species richness and composition in protected areas. We...
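    In two-set variation partitioning, three regression models give the fractions of variation explained by each predictor set alone, by both jointly, and by neither. The three models use set A only, set B only, and both sets together. The minimal Python sketch below uses hypothetical R² values, not results from the study. In practice adjusted R² is usually preferred, and the shared fraction can come out negative.

```python
def partition_variation(r2_a, r2_b, r2_ab):
    """Split explained variation between predictor sets A and B.

    r2_a: R^2 of the model with set A only; r2_b: set B only; r2_ab: both sets.
    """
    return {
        "a_only": r2_ab - r2_b,
        "b_only": r2_ab - r2_a,
        "shared": r2_a + r2_b - r2_ab,
        "unexplained": 1.0 - r2_ab,
    }

# Hypothetical values, e.g. A = topography/protection, B = human disturbance.
parts = partition_variation(0.30, 0.25, 0.45)
```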

  2. Plant trait-species abundance relationships vary with environmental properties in subtropical forests in eastern china.

    Directory of Open Access Journals (Sweden)

    En-Rong Yan

    Understanding how plant trait-species abundance relationships change with a range of single and multivariate environmental properties is crucial for explaining species abundance and rarity. In this study, the abundance of 94 woody plant species was examined and related to 15 plant leaf and wood traits at both local and landscape scales involving 31 plots in subtropical forests in eastern China. Further, plant trait-species abundance relationships were related to a range of single and multivariate (PCA axes) environmental properties such as air humidity, soil moisture content, soil temperature, soil pH, and soil organic matter, nitrogen (N) and phosphorus (P) contents. At the landscape scale, plant maximum height and twig and stem wood densities were positively correlated, whereas mean leaf area (MLA), leaf N concentration (LN), and total leaf area per twig size (TLA) were negatively correlated with species abundance. At the plot scale, plant maximum height, leaf and twig dry matter contents, and twig and stem wood densities were positively correlated, but MLA, specific leaf area, LN, leaf P concentration and TLA were negatively correlated with species abundance. Plant trait-species abundance relationships shifted in a similar way over the range of seven single environmental properties and along multivariate environmental axes. In conclusion, strong relationships between plant traits and species abundance existed among and within communities. Significant shifts in plant trait-species abundance relationships across a range of environmental properties suggest strong environmental filtering processes that influence species abundance and rarity in the studied subtropical forests.

  3. Assessment of Antarctic moss health from multi-sensor UAS imagery with Random Forest Modelling

    Science.gov (United States)

    Turner, Darren; Lucieer, Arko; Malenovský, Zbyněk; King, Diana; Robinson, Sharon A.

    2018-06-01

    Moss beds are one of the very few terrestrial vegetation types that can be found on the Antarctic continent, and as such, mapping their extent and monitoring their health are important to environmental managers. Across Antarctica, moss beds are experiencing changes in health as their environment changes. As Antarctic moss beds are spatially fragmented with relatively small extent, they require very high resolution remotely sensed imagery to monitor their distribution and dynamics. This study demonstrates that multi-sensor imagery collected by an Unmanned Aircraft System (UAS) provides a novel data source for assessment of moss health. In this study, we train a Random Forest Regression Model (RFM) with long-term field quadrats at a study site in the Windmill Islands, East Antarctica, and apply it to UAS RGB and 6-band multispectral imagery, derived vegetation indices, 3D topographic data, and thermal imagery to predict moss health. Our results suggest that moss health, expressed as a percentage between 0 and 100% healthy, can be estimated with a root mean squared error (RMSE) between 7 and 12%. The RFM also quantifies the importance of input variables for moss health estimation, showing that the multispectral sensor data were important for accurate health prediction; such information is essential for planning future field investigations. The RFM was applied to the entire moss bed, providing an extrapolation of the health assessment across a larger spatial area. With further validation, the resulting maps could be used for change detection of moss health across multiple sites and seasons.
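    A random forest regression model combines three ingredients: bootstrap resampling, a random subset of features at each split, and averaging over trees. Its accuracy can then be reported as RMSE. The toy pure-Python sketch below illustrates both, and it is not the study's model. The hyperparameters (`n_trees`, `max_depth`, `min_leaf`) and the two-feature synthetic data are hypothetical stand-ins for the UAS-derived predictors.

```python
import math
import random

def _sse(ys):
    """Sum of squared deviations from the mean (regression-tree impurity)."""
    m = sum(ys) / len(ys)
    return sum((v - m) ** 2 for v in ys)

def _grow(X, y, idx, depth, max_depth, min_leaf, n_feat, rng):
    """Grow one tree: leaves are floats, splits are (feature, threshold, left, right)."""
    mean = sum(y[i] for i in idx) / len(idx)
    if depth >= max_depth or len(idx) < 2 * min_leaf:
        return mean
    best = None
    for j in rng.sample(range(len(X[0])), n_feat):   # random feature subset per node
        for t in sorted({X[i][j] for i in idx})[:-1]:
            left = [i for i in idx if X[i][j] <= t]
            right = [i for i in idx if X[i][j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = _sse([y[i] for i in left]) + _sse([y[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, j, t, left, right)
    if best is None:
        return mean
    _, j, t, left, right = best
    return (j, t,
            _grow(X, y, left, depth + 1, max_depth, min_leaf, n_feat, rng),
            _grow(X, y, right, depth + 1, max_depth, min_leaf, n_feat, rng))

def _tree_predict(node, x):
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if x[j] <= t else right
    return node

def fit_forest(X, y, n_trees=25, max_depth=4, min_leaf=2, seed=0):
    """Bagged regression trees with random feature subsets (a basic random forest)."""
    rng = random.Random(seed)
    n_feat = max(1, len(X[0]) // 3)  # p/3 features per split, a common regression default
    forest = []
    for _ in range(n_trees):
        boot = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        forest.append(_grow(X, y, boot, 0, max_depth, min_leaf, n_feat, rng))
    return forest

def predict(forest, x):
    """Average of the individual tree predictions."""
    return sum(_tree_predict(tree, x) for tree in forest) / len(forest)

def rmse(y_true, y_pred):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))

# Hypothetical data: feature 0 drives "health" (0 or 100 %), feature 1 is noise.
X = [[float(i), float((7 * i) % 10)] for i in range(40)]
y = [0.0 if i < 20 else 100.0 for i in range(40)]
forest = fit_forest(X, y)
```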

  4. A comparison of random forest regression and multiple linear regression for prediction in neuroscience.

    Science.gov (United States)

    Smith, Paul F; Ganesh, Siva; Liu, Ping

    2013-10-30

    Regression is a common statistical tool for prediction in neuroscience. However, linear regression is by far the most common form of regression used, with regression trees receiving comparatively little attention. In this study, the results of conventional multiple linear regression (MLR) were compared with those of random forest regression (RFR) in the prediction of the concentrations of 9 neurochemicals in the vestibular nucleus complex and cerebellum that are part of the l-arginine biochemical pathway (agmatine, putrescine, spermidine, spermine, l-arginine, l-ornithine, l-citrulline, glutamate and γ-aminobutyric acid (GABA)). The R² values for the MLRs were higher than the proportion of variance explained values for the RFRs: 6/9 of them were ≥ 0.70 compared to 4/9 for RFRs. Even the variables that had the lowest R² values for the MLRs, e.g. ornithine (0.50) and glutamate (0.61), had much lower proportion of variance explained values for the RFRs (0.27 and 0.49, respectively). The RSE values for the MLRs were lower than those for the RFRs in all but two cases. In general, MLRs seemed to be superior to the RFRs in terms of predictive value and error. In the case of this data set, MLR appeared to be superior to RFR in terms of its explanatory value and error. This result suggests that MLR may have advantages over RFR for prediction in neuroscience with this kind of data set, but that RFR can still have good predictive value in some cases. Copyright © 2013 Elsevier B.V. All rights reserved.
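    The comparison rests on R² = 1 - RSS/TSS and on the residual standard error, RSE = sqrt(RSS / (n - p - 1)), where p is the number of predictors. Random forest software commonly reports an analogous "variance explained" computed from out-of-bag predictions. The minimal Python sketch below computes both quantities for a single-predictor least-squares fit. That is a simplification of the multiple regression used in the study, and the data are made up.

```python
import math

def simple_ols(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def r2_and_rse(y, y_hat, n_predictors):
    """R^2 = 1 - RSS/TSS and RSE = sqrt(RSS / (n - p - 1))."""
    n = len(y)
    my = sum(y) / n
    rss = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    tss = sum((a - my) ** 2 for a in y)
    return 1.0 - rss / tss, math.sqrt(rss / (n - n_predictors - 1))

# Made-up data for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
b0, b1 = simple_ols(x, y)
r2, rse = r2_and_rse(y, [b0 + b1 * v for v in x], n_predictors=1)
```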

  5. What variables are important in predicting bovine viral diarrhea virus? A random forest approach.

    Science.gov (United States)

    Machado, Gustavo; Mendoza, Mariana Recamonde; Corbellini, Luis Gustavo

    2015-07-24

    Bovine viral diarrhea virus (BVDV) causes one of the most economically important diseases in cattle, and the virus is found worldwide. A better understanding of the disease-associated factors is a crucial step towards the definition of strategies for control and eradication. In this study we trained a random forest (RF) prediction model and performed variable importance analysis to identify factors associated with BVDV occurrence. In addition, we assessed the influence of feature selection on RF performance and evaluated its predictive power relative to other popular classifiers and to logistic regression. We found that the RF classification model resulted in an average error rate of 32.03% for the negative class (negative for BVDV) and 36.78% for the positive class (positive for BVDV). The RF model had an area under the ROC curve of 0.702. Variable importance analysis revealed that important predictors of BVDV occurrence were: a) who inseminates the animals, b) the number of neighboring farms that have cattle, and c) whether rectal palpation is performed routinely. Our results suggest that the use of machine learning algorithms, especially RF, is a promising methodology for the analysis of cross-sectional studies, presenting a satisfactory predictive power and the ability to identify predictors that represent potential risk factors for BVDV investigation. We examined classical predictors and found some new and hard-to-control practices that may lead to the spread of this disease within and among farms, mainly regarding poor or neglected reproduction management, which should be considered for disease control and eradication.
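    Two of the reported figures are per-class error rates and the area under the ROC curve, and both can be computed from a classifier's outputs. This minimal Python sketch uses the rank (Mann-Whitney) formulation of AUC: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. It is O(n_pos × n_neg) and meant only for illustration. The scores and labels in the checks are made up.

```python
def auc_mann_whitney(scores, labels):
    """ROC AUC as P(random positive outscores random negative); ties count half.

    labels: 1 = positive, 0 = negative.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def class_error_rates(predicted, labels):
    """Per-class error rate: share of each true class that was misclassified."""
    rates = {}
    for c in sorted(set(labels)):
        members = [i for i, l in enumerate(labels) if l == c]
        rates[c] = sum(predicted[i] != c for i in members) / len(members)
    return rates
```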

  6. Multivariate realised kernels

    DEFF Research Database (Denmark)

    Barndorff-Nielsen, Ole; Hansen, Peter Reinhard; Lunde, Asger

    We propose a multivariate realised kernel to estimate the ex-post covariation of log-prices. We show that this new consistent estimator is guaranteed to be positive semi-definite, is robust to certain types of measurement noise, and can also handle non-synchronous trading. It is the first estimator...
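    A realised kernel takes the realised covariance of high-frequency return vectors and adds weighted autocovariance matrices: K = Σ_{h=-H..H} k(h/(H+1)) Γ_h, with Γ_h = Σ_j x_j x_{j-h}ᵀ and Γ_{-h} = Γ_hᵀ. The minimal Python sketch below uses the Parzen weight function k. Its Fourier transform is non-negative, and that property is what keeps the estimate positive semi-definite. The sketch assumes the return vectors are already synchronised, and it omits the estimator's treatment of non-synchronous trading and end effects. The bandwidth H and the synthetic returns are hypothetical.

```python
import random

def parzen(x):
    """Parzen weight function (zero for |x| >= 1)."""
    x = abs(x)
    if x <= 0.5:
        return 1.0 - 6.0 * x ** 2 + 6.0 * x ** 3
    if x <= 1.0:
        return 2.0 * (1.0 - x) ** 3
    return 0.0

def realised_kernel(returns, H):
    """K = sum_{h=-H..H} k(h/(H+1)) Gamma_h for a list of d-dimensional return vectors."""
    n, d = len(returns), len(returns[0])
    K = [[0.0] * d for _ in range(d)]
    for h in range(-H, H + 1):
        w = parzen(h / (H + 1))
        lag = abs(h)
        for j in range(lag, n):
            a, b = returns[j], returns[j - lag]  # x_j and x_{j-lag}
            for r in range(d):
                for c in range(d):
                    # Gamma_h uses x_j x_{j-h}^T; Gamma_{-h} is its transpose.
                    K[r][c] += w * (a[r] * b[c] if h >= 0 else b[r] * a[c])
    return K

# Usage: synthetic, synchronised two-asset returns sharing a common component.
rng = random.Random(42)
rets = []
for _ in range(200):
    common = rng.gauss(0, 1e-3)
    rets.append((common + rng.gauss(0, 5e-4), common + rng.gauss(0, 5e-4)))
K = realised_kernel(rets, H=3)
```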

  7. Combating Forest Corruption: the Forest Integrity Network

    NARCIS (Netherlands)

    Gupta, A.; Siebert, U.

    2004-01-01

    This article describes the strategies and activities of the Forest Integrity Network. One of the most important underlying causes of forest degradation is corruption and related illegal logging. The Forest Integrity Network is a timely new initiative to combat forest corruption. Its approach is to

  8. An efficient word typing P300-BCI system using a modified T9 interface and random forest classifier.

    Science.gov (United States)

    Akram, Faraz; Han, Seung Moo; Kim, Tae-Seong

    2015-01-01

    A typical P300-based spelling brain computer interface (BCI) system types a single character with a character presentation paradigm and a P300 classification system. Lately, a few attempts have been made to type a whole word with the help of a smart dictionary that suggests some candidate words with the input of a few initial characters. In this paper, we propose a novel paradigm utilizing initial character typing with word suggestions and a novel P300 classifier to increase word typing speed and accuracy. The novel paradigm involves modifying the Text on 9 keys (T9) interface, which is similar to the keypad of a mobile phone used for text messaging. Users can type the initial characters using a 3×3 matrix interface and an integrated custom-built dictionary that suggests candidate words as the user types the initials. Then the user can select one of the given suggestions to complete word typing. We have adopted a random forest classifier, which significantly improves P300 classification accuracy by combining multiple decision trees. We conducted experiments with 10 subjects using the proposed BCI system. Our proposed paradigms significantly reduced word typing time and made word typing more convenient by outputting complete words with only a few initial character inputs. The conventional spelling system required an average time of 3.47 min per word while typing 10 random words, whereas our proposed system took an average time of 1.67 min per word, a 51.87% improvement, for the same words under the same conditions. Copyright © 2014 Elsevier Ltd. All rights reserved.
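    One step in the paradigm is the dictionary suggesting candidate words once a few initial characters have been typed. That step can be illustrated with a sorted word list and binary search. In this minimal Python sketch, the dictionary contents, the alphabetical ordering of suggestions and the `limit` parameter are all assumptions; a real system would likely rank suggestions by word frequency. The helper `percent_improvement` reproduces the reported speed-up: (3.47 - 1.67) / 3.47 ≈ 51.87%.

```python
import bisect

def suggest(sorted_words, prefix, limit=3):
    """Up to `limit` words from an alphabetically sorted list that start with `prefix`."""
    start = bisect.bisect_left(sorted_words, prefix)  # first word >= prefix
    out = []
    for word in sorted_words[start:]:
        if not word.startswith(prefix) or len(out) == limit:
            break
        out.append(word)
    return out

def percent_improvement(old, new):
    """Relative reduction of `new` compared with `old`, in percent."""
    return (old - new) / old * 100.0

# Hypothetical dictionary, sorted once up front.
dictionary = sorted(["brain", "brave", "bread", "break", "cat", "computer"])
```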

  9. Control Multivariable por Desacoplo

    Directory of Open Access Journals (Sweden)

    Fernando Morilla

    2013-01-01

    Abstract: The interaction between variables is an inherent characteristic of multivariable processes; it complicates their operation and the design of their control systems. The decoupling control paradigm brings together a set of methodologies that have traditionally aimed to eliminate or reduce the interaction, and that some researchers have recently reoriented toward solving a problem as complex as multivariable control. Part of the material described in this article is well known in the process control field, but most of it comes from several years of research by the authors, in which priority has been given to generalizing the problem, seeking easily implemented solutions, and combining elementary PID control blocks. This combination of aims means that perfect decoupling cannot always be achieved, but a considerable reduction of the interaction can be achieved at the base level of the control pyramid, to the benefit of other control systems that occupy higher hierarchical levels. The article summarizes all the basic aspects of decoupling control and its application to two representative processes: an experimental plant with four coupled tanks and a 4×4 model of an experimental heating, ventilation and air-conditioning system.
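    One classical building block in decoupling control is the simplified decoupler. For a 2×2 process with steady-state gain matrix G, it chooses a compensator D with unit diagonal so that the product G·D has zero off-diagonal terms. The minimal Python sketch below covers static gains only and uses hypothetical gain values. It does not represent dynamics, PID tuning, or the authors' generalized methods.

```python
def simplified_decoupler(G):
    """Static simplified decoupler D = [[1, -g12/g11], [-g21/g22, 1]] for a 2x2 gain matrix."""
    (g11, g12), (g21, g22) = G
    return [[1.0, -g12 / g11], [-g21 / g22, 1.0]]

def matmul2(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

# Hypothetical steady-state gains of a coupled two-input, two-output process.
G = [[2.0, 0.5], [0.4, 1.0]]
D = simplified_decoupler(G)
GD = matmul2(G, D)  # off-diagonal terms vanish
```

    The diagonal of G·D, here g11 - g12·g21/g22 and g22 - g21·g12/g11, gives the apparent gains that each single-loop PID controller would then see.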

  10. Time fluctuation analysis of forest fire sequences

    Science.gov (United States)

    Vega Orozco, Carmen D.; Kanevski, Mikhaïl; Tonini, Marj; Golay, Jean; Pereira, Mário J. G.

    2013-04-01

    Forest fires are complex events involving both space and time fluctuations. Understanding their dynamics and pattern distribution is of great importance for improving resource allocation and supporting fire management actions at local and global levels. This study aims at characterizing the temporal fluctuations of forest fire sequences observed in Portugal, the country that holds the largest wildfire land dataset in Europe. This research applies several exploratory data analysis measures to 302,000 forest fires that occurred from 1980 to 2007. The applied clustering measures are the Morisita clustering index, fractal and multifractal dimensions (box-counting), Ripley's K-function, the Allan Factor, and variography. These algorithms enable a global time structural analysis describing the degree of clustering of a point pattern and defining whether the observed events occur randomly, in clusters, or in a regular pattern. The considered methods are of general importance and can be used for other spatio-temporal events (e.g. crime, epidemiology, biodiversity, geomarketing). An important contribution of this research deals with the analysis and estimation of local measures of clustering that help in understanding their temporal structure. Each measure is described and executed for the raw data (forest fires geo-database), and the results are compared to reference patterns generated under the null hypothesis of randomness (Poisson processes) embedded in the same time period as the raw data. This comparison enables estimating the degree of deviation of the real data from a Poisson process. Generalizations of these clustering methods to functional measures, taking the phenomena into account, were also applied and adapted to detect time dependences in a measured variable (here, burned area). The time clustering of the raw data is compared several times with the Poisson processes at different thresholds of the measured function. Then, the clustering measure value
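    Of the measures listed, the Allan Factor is easy to state. Count the events in consecutive windows of length T and compute AF(T) = E[(N_{i+1} - N_i)²] / (2 E[N_i]). The result is close to 1 for a Poisson process, below 1 for regular patterns, and above 1 for clustering. The minimal Python sketch below uses an illustrative window length and synthetic event times, not the Portuguese fire data.

```python
import random

def allan_factor(event_times, window, t_start, t_end):
    """Allan Factor AF(T) = E[(N_{i+1} - N_i)^2] / (2 E[N_i]) for window length T."""
    n_windows = int((t_end - t_start) // window)
    if n_windows < 2:
        raise ValueError("need at least two complete windows")
    counts = [0] * n_windows
    for t in event_times:
        k = int((t - t_start) // window)
        if 0 <= k < n_windows:  # ignore events outside the complete windows
            counts[k] += 1
    mean = sum(counts) / n_windows
    if mean == 0:
        raise ValueError("no events in the observation period")
    sq_diffs = [(counts[i + 1] - counts[i]) ** 2 for i in range(n_windows - 1)]
    return sum(sq_diffs) / len(sq_diffs) / (2.0 * mean)

# Synthetic sequences: regular, strongly clustered, and homogeneous Poisson (rate 1).
regular = [0.5 + i for i in range(100)]
clustered = [i * 0.01 for i in range(100)] + [50 + i * 0.01 for i in range(100)]
rng = random.Random(1)
poisson, t = [], 0.0
while True:
    t += rng.expovariate(1.0)
    if t >= 10000.0:
        break
    poisson.append(t)

af_regular = allan_factor(regular, 10.0, 0.0, 100.0)
af_clustered = allan_factor(clustered, 10.0, 0.0, 100.0)
af_poisson = allan_factor(poisson, 10.0, 0.0, 10000.0)
```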

  11. Tropical forest carbon balance: effects of field- and satellite-based mortality regimes on the dynamics and the spatial structure of Central Amazon forest biomass

    Science.gov (United States)

    Di Vittorio, Alan V.; Negrón-Juárez, Robinson I.; Higuchi, Niro; Chambers, Jeffrey Q.

    2014-03-01

    Debate continues over the adequacy of existing field plots to sufficiently capture Amazon forest dynamics to estimate regional forest carbon balance. Tree mortality dynamics are particularly uncertain due to the difficulty of observing large, infrequent disturbances. A recent paper (Chambers et al 2013 Proc. Natl Acad. Sci. 110 3949-54) reported that Central Amazon plots missed 9-17% of tree mortality, and here we address ‘why’ by elucidating two distinct mortality components: (1) variation in annual landscape-scale average mortality and (2) the frequency distribution of the size of clustered mortality events. Using a stochastic-empirical tree growth model we show that a power law distribution of event size (based on merged plot and satellite data) is required to generate spatial clustering of mortality that is consistent with forest gap observations. We conclude that existing plots do not sufficiently capture losses because their placement, size, and longevity assume spatially random mortality, while mortality is actually distributed among differently sized events (clusters of dead trees) that determine the spatial structure of forest canopies.
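    A stochastic model that needs clustered mortality events can draw event sizes from a power law. The minimal Python sketch below uses inverse-transform sampling, based on the CDF F(x) = 1 - (x/x_min)^(1-α), valid for α > 1. The exponent α and minimum size x_min are hypothetical, not values fitted in the paper. For counts of dead trees, the continuous draws would still need rounding.

```python
import random
import statistics

def sample_power_law(n, alpha, x_min, rng):
    """Inverse-transform samples from a continuous power law p(x) ∝ x^(-alpha), x >= x_min."""
    if alpha <= 1:
        raise ValueError("alpha must be > 1")
    # 1 - u lies in (0, 1], so the negative power is finite and >= 1.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]

# Hypothetical parameters for event sizes (e.g. dead trees per event).
sizes = sample_power_law(20000, alpha=2.5, x_min=1.0, rng=random.Random(7))
median_size = statistics.median(sizes)
# Theoretical median for comparison: x_min * 2 ** (1 / (alpha - 1)) ≈ 1.587
```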

  12. Tropical forest carbon balance: effects of field- and satellite-based mortality regimes on the dynamics and the spatial structure of Central Amazon forest biomass

    International Nuclear Information System (INIS)

    Di Vittorio, Alan V; Negrón-Juárez, Robinson I; Chambers, Jeffrey Q; Higuchi, Niro

    2014-01-01

    Debate continues over the adequacy of existing field plots to sufficiently capture Amazon forest dynamics to estimate regional forest carbon balance. Tree mortality dynamics are particularly uncertain due to the difficulty of observing large, infrequent disturbances. A recent paper (Chambers et al 2013 Proc. Natl Acad. Sci. 110 3949–54) reported that Central Amazon plots missed 9–17% of tree mortality, and here we address ‘why’ by elucidating two distinct mortality components: (1) variation in annual landscape-scale average mortality and (2) the frequency distribution of the size of clustered mortality events. Using a stochastic-empirical tree growth model we show that a power law distribution of event size (based on merged plot and satellite data) is required to generate spatial clustering of mortality that is consistent with forest gap observations. We conclude that existing plots do not sufficiently capture losses because their placement, size, and longevity assume spatially random mortality, while mortality is actually distributed among differently sized events (clusters of dead trees) that determine the spatial structure of forest canopies. (paper)

  13. Automatic gender determination from 3D digital maxillary tooth plaster models based on the random forest algorithm and discrete cosine transform.

    Science.gov (United States)

    Akkoç, Betül; Arslan, Ahmet; Kök, Hatice

    2017-05-01

    One of the first stages in the identification of an individual is gender determination. Through gender determination, the search spectrum can be reduced. In disasters such as accidents or fires, which can render identification somewhat difficult, durable teeth are an important source for identification. This study proposes a smart system that can automatically determine gender using 3D digital maxillary tooth plaster models. The study group was composed of 40 Turkish individuals (20 female, 20 male) between the ages of 21 and 24. Using the iterative closest point (ICP) algorithm, tooth models were aligned, and after the segmentation process, models were transformed into depth images. The local discrete cosine transform (DCT) was used in the process of feature extraction, and the random forest (RF) algorithm was used for the process of classification. Classification was performed using 30 different seeds for random generator values and 10-fold cross-validation. A value of 85.166% was obtained for average classification accuracy (CA) and a value of 91.75% for the area under the ROC curve (AUC). A multi-disciplinary study is performed here that includes computer sciences, medicine and dentistry. A smart system is proposed for the determination of gender from 3D digital models of maxillary tooth plaster models. This study has the capacity to extend the field of gender determination from teeth. Copyright © 2017 Elsevier B.V. All rights reserved.
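    Local DCT feature extraction can be sketched as two steps. Apply an orthonormal 2-D DCT-II to small blocks of a depth image, then keep a few low-frequency coefficients per block. In this minimal Python sketch, the block size, the number of retained coefficients, and the simple diagonal low-frequency ordering are assumptions, because the abstract does not specify them. The diagonal ordering only approximates a JPEG-style zigzag.

```python
import math

def dct2(block):
    """Orthonormal 2-D DCT-II of an N x M block (list of rows)."""
    n_rows, n_cols = len(block), len(block[0])

    def scale(k, length):
        return math.sqrt(1.0 / length) if k == 0 else math.sqrt(2.0 / length)

    coeffs = [[0.0] * n_cols for _ in range(n_rows)]
    for u in range(n_rows):
        for v in range(n_cols):
            s = 0.0
            for x in range(n_rows):
                cu = math.cos(math.pi * (2 * x + 1) * u / (2 * n_rows))
                for y in range(n_cols):
                    s += block[x][y] * cu * math.cos(math.pi * (2 * y + 1) * v / (2 * n_cols))
            coeffs[u][v] = scale(u, n_rows) * scale(v, n_cols) * s
    return coeffs

def low_freq_features(coeffs, k):
    """First k coefficients ordered by frequency index sum u + v (DC term first)."""
    order = sorted(
        ((u, v) for u in range(len(coeffs)) for v in range(len(coeffs[0]))),
        key=lambda uv: (uv[0] + uv[1], uv[0]),
    )
    return [coeffs[u][v] for u, v in order[:k]]

# Usage on a hypothetical flat 4 x 4 depth patch: all energy lands in the DC term.
patch = [[1.0] * 4 for _ in range(4)]
coeffs = dct2(patch)
features = low_freq_features(coeffs, 3)
```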

  14. Multivariate Time Series Search

    Data.gov (United States)

    National Aeronautics and Space Administration — Multivariate Time-Series (MTS) are ubiquitous, and are generated in areas as disparate as sensor recordings in aerospace systems, music and video streams, medical...

  15. Impact of livestock on a mosquito community (Diptera: Culicidae) in a Brazilian tropical dry forest

    OpenAIRE

    Santos,Cleandson Ferreira; Borges,Magno

    2015-01-01

    INTRODUCTION: This study evaluated the effects of cattle removal on the Culicidae mosquito community structure in a tropical dry forest in Brazil. METHODS: Culicidae were collected during dry and wet seasons in cattle presence and absence between August 2008 and October 2010 and assessed using multivariate statistical models. RESULTS: Cattle removal did not significantly alter Culicidae species richness and abundance. However, alterations were noted in Culicidae community composition. CO...

  16. Intelligent multivariate process supervision

    International Nuclear Information System (INIS)

    Visuri, Pertti.

    1986-01-01

    This thesis addresses the difficulties encountered in managing large amounts of data in supervisory control of complex systems. Some previous alarm and disturbance analysis concepts are reviewed and a method for improving the supervision of complex systems is presented. The method, called multivariate supervision, is based on adding low level intelligence to the process control system. By using several measured variables linked together by means of deductive logic, the system can take into account the overall state of the supervised system. Thus, it can present to the operators fewer messages with higher information content than the conventional control systems which are based on independent processing of each variable. In addition, the multivariate method contains a special information presentation concept for improving the man-machine interface. (author)

  17. Nonlinear Methodologies for Identifying Seismic Event and Nuclear Explosion Using Random Forest, Support Vector Machine, and Naive Bayes Classification

    Directory of Open Access Journals (Sweden)

    Longjun Dong

    2014-01-01

    The discrimination between seismic events and nuclear explosions is a complex, nonlinear problem. Nonlinear methodologies, including Random Forests (RF), Support Vector Machines (SVM), and a Naïve Bayes Classifier (NBC), were applied to discriminate seismic events. Twenty earthquakes and twenty-seven explosions, described by nine ratios of the energies contained within predetermined “velocity windows” and the calculated distance, are used in the discriminators. The discriminating performances of RF, SVM, and NBC were discussed and compared based on leave-one-out cross-validation, ROC curves, and the calculated accuracy on training and test samples. The RF method clearly shows the best predictive power, with the largest area under the ROC curve (0.975) among RF, SVM, and NBC. The discriminant accuracies of RF, SVM, and NBC for test samples are 92.86%, 85.71%, and 92.86%, respectively. It has been demonstrated that the presented RF model can not only identify seismic events automatically with high accuracy, but can also rank the discriminant indicators according to their calculated weights.
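    Leave-one-out cross-validation trains on all samples but one and tests on the held-out sample. It then repeats this for every sample. The minimal Python sketch below plugs in a Gaussian Naive Bayes classifier, chosen only because it fits in a few lines; an RF or SVM could be dropped into the same loop. The two-feature toy data are hypothetical stand-ins for the energy-ratio features.

```python
import math

def fit_gnb(X, y):
    """Per-class log prior, feature means and variances (tiny variance floor)."""
    model = {}
    p = len(X[0])
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        n = len(rows)
        means = [sum(r[j] for r in rows) / n for j in range(p)]
        variances = [sum((r[j] - means[j]) ** 2 for r in rows) / n + 1e-9 for j in range(p)]
        model[c] = (math.log(n / len(X)), means, variances)
    return model

def predict_gnb(model, x):
    """Class with the highest log posterior under independent Gaussian features."""
    def log_posterior(c):
        log_prior, means, variances = model[c]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
            for xi, m, v in zip(x, means, variances))
    return max(model, key=log_posterior)

def loocv_accuracy(X, y):
    """Fraction of samples classified correctly when each is held out in turn."""
    correct = 0
    for i in range(len(X)):
        model = fit_gnb(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        correct += predict_gnb(model, X[i]) == y[i]
    return correct / len(X)

# Hypothetical, well-separated toy features.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2],
     [5.0, 5.1], [5.2, 4.9], [4.8, 5.0], [5.1, 5.3]]
y = ["earthquake"] * 4 + ["explosion"] * 4
acc = loocv_accuracy(X, y)
```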

  18. Graphics for the multivariate two-sample problem

    International Nuclear Information System (INIS)

    Friedman, J.H.; Rafsky, L.C.

    1981-01-01

    Some graphical methods for comparing multivariate samples are presented. These methods are based on minimal spanning tree techniques developed for multivariate two-sample tests. The utility of these methods is illustrated through examples using both real and artificial data.
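    The minimal-spanning-tree idea behind these two-sample tests (the Friedman-Rafsky statistic) works as follows. Pool both samples, build the Euclidean MST, and count the edges that join points from different samples. Unusually few such edges suggest the samples come from different distributions. This minimal Python sketch uses an O(n²) Prim's algorithm and computes no p-value. The sample points are made up.

```python
import math

def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns (parent, child) edges."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges

def cross_sample_edges(sample_a, sample_b):
    """Number of MST edges on the pooled sample that connect A to B."""
    pooled = list(sample_a) + list(sample_b)
    labels = [0] * len(sample_a) + [1] * len(sample_b)
    return sum(labels[i] != labels[j] for i, j in mst_edges(pooled))

sample_a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
sample_b = [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
separated = cross_sample_edges(sample_a, sample_b)  # 1: the samples touch once
interleaved = cross_sample_edges([(0, 0), (2, 0), (4, 0)], [(1, 0), (3, 0), (5, 0)])  # 5
```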

  19. Forest resources of the Nez Perce National Forest

    Science.gov (United States)

    Michele Disney

    2010-01-01

    As part of a National Forest System cooperative inventory, the Interior West Forest Inventory and Analysis (IWFIA) Program of the USDA Forest Service conducted a forest resource inventory on the Nez Perce National Forest using a nationally standardized mapped-plot design (for more details see the section "Inventory methods"). This report presents highlights...

  20. Landscape genetics of leaf-toed geckos in the tropical dry forest of northern Mexico.

    Directory of Open Access Journals (Sweden)

    Christopher Blair

    Habitat fragmentation due to both natural and anthropogenic forces continues to threaten the evolution and maintenance of biological diversity. This is of particular concern in tropical regions that are experiencing elevated rates of habitat loss. Although less well-studied than tropical rain forests, tropical dry forests (TDF) contain an enormous diversity of species and continue to be threatened by anthropogenic activities including grazing and agriculture. However, little is known about the processes that shape genetic connectivity in species inhabiting TDF ecosystems. We adopt a landscape genetic approach to understanding functional connectivity for leaf-toed geckos (Phyllodactylus tuberculosus) at multiple sites near the northernmost limit of this ecosystem at Alamos, Sonora, Mexico. Traditional analyses of population genetics are combined with multivariate GIS-based landscape analyses to test hypotheses on the potential drivers of spatial genetic variation. Moderate levels of within-population diversity and substantial levels of population differentiation are revealed by FST and Dest. Analyses using structure suggest the occurrence of 2 to 9 genetic clusters, depending on the model used. Landscape genetic analysis suggests that forest cover, stream connectivity, undisturbed habitat, slope, and minimum temperature of the coldest period explain more genetic variation than do simple Euclidean distances. Additional landscape genetic studies throughout TDF habitat are required to understand species-specific responses to landscape and climate change and to identify common drivers. We urge researchers interested in using multivariate distance methods to test for, and report, significant correlations among predictor matrices that can impact results, particularly when adopting least-cost path approaches. Further investigation into the use of information theoretic approaches for model selection is also warranted.
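    One common way to test for correlation between two distance matrices is a Mantel test, which applies to both genetic-versus-landscape and predictor-versus-predictor comparisons. It is named here as a swapped-in illustration, not as the method the authors used. The test correlates the upper triangles of the matrices. It then compares that correlation with correlations obtained after jointly permuting the rows and columns of one matrix. In this minimal Python sketch, the positions and distances are hypothetical.

```python
import random

def _upper(m):
    """Upper-triangle entries (i < j) of a square matrix."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]

def _pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def mantel(d1, d2, n_perm=999, seed=0):
    """Observed r and one-sided permutation p-value for positive association."""
    rng = random.Random(seed)
    u1 = _upper(d1)
    r_obs = _pearson(u1, _upper(d2))
    n = len(d1)
    order = list(range(n))
    count = 1  # the observed arrangement counts as one permutation
    for _ in range(n_perm):
        rng.shuffle(order)
        permuted = [[d2[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if _pearson(u1, _upper(permuted)) >= r_obs:
            count += 1
    return r_obs, count / (n_perm + 1)

# Hypothetical sites on a transect; genetic distance proportional to geography.
positions = [0.0, 1.0, 3.0, 6.0, 10.0]
geo = [[abs(a - b) for b in positions] for a in positions]
gen = [[2.0 * v for v in row] for row in geo]
r_obs, p_value = mantel(geo, gen, n_perm=199)
```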