All Times EDT

Abstract Details

Activity Number: 431 - Contributed Poster Presentations: Section on Statistics and the Environment
Type: Contributed
Date/Time: Wednesday, August 10, 2022 : 10:30 AM to 12:20 PM
Sponsor: Section on Statistics and the Environment
Abstract #322627
Title: Assessing the Implementation of a Geographically Stratified Random Sample in the Random Forest Setting
Author(s): Melissa A Meeker* and Leslie McClure
Companies: Drexel University and Drexel University
Keywords: Machine Learning; Random Forest; Spatial Data; Geographically Stratified; Sampling
Abstract:

The random forest (RF) is an ensemble of uncorrelated decision trees used for prediction; it learns from a subset of the data (training) and is tested on the remaining data (test). The RF assumes the full sample is independent and identically distributed; however, spatial data violate this assumption. There is evidence that the RF performs well when applied to spatial data in a variety of settings. While some work has tried to improve the performance of the RF when applied to spatial data, that work does not consider how random sampling to create training and test data may inadequately represent spatial coverage in a global model. We explored the use of geographically stratified random sampling to ensure that the RF algorithm is adequately trained on the full geographic space. We simulated a set of spatially clustered point-level locations, with datasets satisfying multiple levels of spatial correlation, and fit the RF using both the random sample and the geographically stratified random sample. Preliminary results show that using the geographically stratified random sample improves model performance. We will present these results and describe their robustness to choices of model parameters.
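
As an illustration of the sampling idea described in the abstract, the following is a minimal sketch of a geographically stratified train/test split. The abstract does not say how strata were defined, so the regular grid over the bounding box, the function names (grid_strata, geo_stratified_split), the grid size, and the 30% test fraction are all assumptions made for this example and are not the authors' method. The sketch uses only the Python standard library and covers the split itself, not the RF fit.

```python
import random
from collections import defaultdict


def grid_strata(points, n_rows, n_cols):
    """Assign each (x, y) point to a cell of a regular grid (assumed strata)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, y_min = min(xs), min(ys)
    # Fall back to 1.0 when all points share a coordinate, to avoid dividing by zero.
    x_span = (max(xs) - x_min) or 1.0
    y_span = (max(ys) - y_min) or 1.0
    strata = []
    for x, y in points:
        # Points on the upper boundary are clamped into the last row/column.
        col = min(int((x - x_min) / x_span * n_cols), n_cols - 1)
        row = min(int((y - y_min) / y_span * n_rows), n_rows - 1)
        strata.append((row, col))
    return strata


def geo_stratified_split(points, n_rows=4, n_cols=4, test_frac=0.3, seed=0):
    """Sample the test set separately within each occupied grid cell."""
    rng = random.Random(seed)
    by_cell = defaultdict(list)
    for idx, cell in enumerate(grid_strata(points, n_rows, n_cols)):
        by_cell[cell].append(idx)

    train, test = [], []
    for cell in sorted(by_cell):
        members = list(by_cell[cell])
        rng.shuffle(members)
        # Always leave at least one training point in every occupied cell.
        n_test = min(round(len(members) * test_frac), len(members) - 1)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return sorted(train), sorted(test)


# Hypothetical usage: four spatial clusters of 25 simulated locations each.
sim = random.Random(42)
centers = [(0, 0), (10, 0), (0, 10), (10, 10)]
locations = [(cx + sim.gauss(0, 1), cy + sim.gauss(0, 1))
             for cx, cy in centers for _ in range(25)]
train_idx, test_idx = geo_stratified_split(locations, seed=1)
```

Because the test points are drawn cell by cell, every occupied cell contributes to the training set, which a simple random split does not guarantee when locations are clustered.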


Authors who are presenting talks have a * after their name.

Back to the full JSM 2022 program