
Abstract Details

Activity Number: 72 - SPEED: Statistical Learning and Data Challenge Part 2
Type: Contributed
Date/Time: Sunday, August 7, 2022, 4:00 PM to 4:45 PM EDT
Sponsor: Section on Statistical Learning and Data Science
Abstract #323716
Title: Extrapolation Control Using K-Nearest Neighbors
Author(s): Kasia Dobrzycka* and Jonathan Stallrich and Christopher M. Gotwalt
Companies: North Carolina State University and North Carolina State University and SAS Institute
Keywords: K nearest neighbors; Extrapolation; Extrapolation Control; Constrained Optimization; Flexible; Efficient
Abstract:

Machine learning models trained on a dataset are often accurate when predicting points in the interior of that data but can be highly inaccurate when extrapolating, i.e., predicting at points outside the interior of the training set. Unfortunately, most machine learning methods have no built-in mechanism to alert the practitioner when the model is being used far from the training data. In optimization applications, constrained optimization usually requires manually specified constraints or uncertainty measures derived from expensive model-fitting methods. Using k-nearest-neighbor distances, we propose a novel, flexible criterion for classifying predictions as extrapolations that is robust to messy training data with distinct clusters or a highly non-elliptical geometry. The method is tuned under the assumption that all points in the input dataset should be treated as non-extrapolations. We investigate multiple measures of “nearness” to a dataset and determine which perform well at detecting extrapolation. The result is an effective black-box extrapolation control algorithm that can be used when scoring new observations or as a constraint when optimizing models.
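
To illustrate the general idea described in the abstract, the sketch below flags extrapolations using k-nearest-neighbor distances and calibrates the threshold so that every training point is classified as a non-extrapolation. It is not the authors' exact criterion: the choice of k, the use of the mean distance to the k nearest neighbors as the "nearness" score, the max-based threshold, and the scikit-learn NearestNeighbors dependency are all illustrative assumptions.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def fit_knn_extrapolation_detector(X_train, k=5):
    """Calibrate a KNN-distance threshold so that no training point
    is flagged as an extrapolation (leave-one-out scoring)."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_train)
    # When querying the training points themselves, the nearest neighbor is
    # the point itself (distance 0), so drop the first column.
    dists, _ = nn.kneighbors(X_train)
    loo_scores = dists[:, 1:].mean(axis=1)   # mean distance to k nearest neighbors
    threshold = loo_scores.max()             # largest training score => all training points pass
    return nn, threshold

def is_extrapolation(nn, threshold, X_new, k=5):
    """Flag new points whose mean distance to their k nearest training
    points exceeds the calibrated threshold."""
    dists, _ = nn.kneighbors(X_new, n_neighbors=k)
    return dists.mean(axis=1) > threshold

# Example: a point inside the training cloud passes, a far-away point is flagged.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
nn, thr = fit_knn_extrapolation_detector(X, k=5)
print(is_extrapolation(nn, thr, np.array([[0.0, 0.0, 0.0],
                                          [10.0, 10.0, 10.0]]), k=5))
```

Because the score depends only on distances to observed points, this kind of check adapts to clustered or non-elliptical training data; in an optimization setting, the same threshold could serve as a feasibility constraint that keeps candidate solutions near the training data.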


Authors who are presenting talks have a * after their name.
