Activity Number: 431 - Contributed Poster Presentations: Section on Statistics and the Environment
Type: Contributed
Date/Time: Wednesday, August 10, 2022 : 10:30 AM to 12:20 PM
Sponsor: Section on Statistics and the Environment
Abstract #322627
Title: Assessing the Implementation of a Geographically Stratified Random Sample in the Random Forest Setting
Author(s): Melissa A Meeker* and Leslie McClure
Companies: Drexel University and Drexel University
Keywords: Machine Learning; Random Forest; Spatial Data; Geographically Stratified; Sampling
Abstract:
The random forest (RF) is an ensemble of decorrelated decision trees used for prediction; it is trained on one subset of the data (the training set) and evaluated on the remainder (the test set). The RF assumes the full sample is independent and identically distributed, an assumption that spatial data violate. Nevertheless, there is evidence that the RF performs well on spatial data in a variety of settings. Some work has tried to improve the performance of the RF on spatial data, but it does not consider that random sampling to create the training and test sets may inadequately represent spatial coverage in a global model. We explored geographically stratified random sampling as a way to ensure the RF is adequately trained on the full geographic space. We simulated a set of spatially clustered point-level locations, with datasets spanning multiple levels of spatial correlation, and fit the RF using both a random sample and a geographically stratified random sample. Preliminary results show that the geographically stratified random sample improves model performance. We will present these results and describe their robustness to the choice of model parameters.
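To make the contrast between the two sampling schemes concrete, the following is a minimal sketch in plain Python, not the authors' implementation. The abstract does not say how strata are defined or how the points were simulated, so everything here is an assumption made for illustration: the strata are cells of a regular grid over the unit square, and the names `grid_cell`, `simple_random_split`, `geo_stratified_split`, `simulate_clustered_points`, `n_cells`, and `test_frac` are all hypothetical. The resulting index lists could then be passed to any RF implementation.

```python
import random
from collections import defaultdict


def grid_cell(x, y, n_cells):
    """Map a point in the unit square to a (col, row) grid cell.

    Assumption: strata are cells of a regular n_cells x n_cells grid.
    """
    col = min(int(x * n_cells), n_cells - 1)
    row = min(int(y * n_cells), n_cells - 1)
    return col, row


def simple_random_split(points, test_frac, seed=0):
    """Baseline: split indices at random, ignoring location."""
    rng = random.Random(seed)
    idx = list(range(len(points)))
    rng.shuffle(idx)
    n_test = round(test_frac * len(idx))
    return sorted(idx[n_test:]), sorted(idx[:n_test])


def geo_stratified_split(points, test_frac, n_cells=4, seed=0):
    """Split indices at random within each occupied grid cell."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for i, (x, y) in enumerate(points):
        strata[grid_cell(x, y, n_cells)].append(i)

    train, test = [], []
    for cell in sorted(strata):
        members = strata[cell]
        rng.shuffle(members)
        n_test = round(test_frac * len(members))
        # Keep at least one training point in every occupied cell.
        n_test = min(n_test, len(members) - 1)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return sorted(train), sorted(test)


def simulate_clustered_points(n_clusters=5, per_cluster=40, spread=0.05, seed=0):
    """Hypothetical simulation: Gaussian clusters clipped to the unit square."""
    rng = random.Random(seed)
    pts = []
    for _ in range(n_clusters):
        cx, cy = rng.random(), rng.random()
        for _ in range(per_cluster):
            x = min(max(rng.gauss(cx, spread), 0.0), 1.0)
            y = min(max(rng.gauss(cy, spread), 0.0), 1.0)
            pts.append((x, y))
    return pts


points = simulate_clustered_points()
train_idx, test_idx = geo_stratified_split(points, test_frac=0.3)
```

Under this sketch, every occupied grid cell contributes at least one training point, which is one simple way to guarantee that the training set covers the full geographic space; a simple random split offers no such guarantee when points are spatially clustered.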
Authors who are presenting talks have a * after their name.