Activity Number: 72 - SPEED: Statistical Learning and Data Challenge Part 2
Type: Contributed
Date/Time: Sunday, August 7, 2022: 4:00 PM to 4:45 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #323716

Title: Extrapolation Control Using K-Nearest Neighbors
Author(s): Kasia Dobrzycka* and Jonathan Stallrich and Christopher M. Gotwalt
Companies: North Carolina State University and North Carolina State University and SAS Institute
Keywords: K nearest neighbors; Extrapolation; Extrapolation Control; Constrained Optimization; Flexible; Efficient

Abstract:
Machine learning models trained on a dataset are often accurate when predicting points in the interior of that data but can be highly inaccurate when extrapolating, i.e., predicting at points outside the interior of the training set. Unfortunately, most machine learning methods have no built-in mechanism to alert the practitioner when the model is applied far from the training data. In optimization applications, constrained optimization usually requires manually specified constraints or uncertainty measures derived from expensive model-fitting methods. Using k-nearest neighbor distances, we propose a novel, flexible criterion for classifying predictions as extrapolations that is robust to messy training data with distinct clusters or highly non-elliptical geometry. The method is tuned under the assumption that every point in the input dataset should be treated as a non-extrapolation. We investigate multiple measures of "nearness" to a dataset and determine which perform well at detecting extrapolation. The result is an effective black-box extrapolation control algorithm that can be used when scoring new observations or as a constraint when optimizing models.
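The general idea can be illustrated with a minimal sketch. The abstract does not specify the exact distance measure or calibration rule, so the following is an assumption-laden illustration only: it uses the mean k-nearest-neighbor distance as the "nearness" score and sets the threshold to the largest leave-one-out score over the training set, so that every training point is classified as a non-extrapolation. The function name `knn_extrapolation_detector` and all parameter choices are hypothetical, not the authors' implementation.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_extrapolation_detector(X_train, k=5):
    """Sketch of a k-NN-distance extrapolation flag (illustrative only).

    Calibrates a threshold so that no training point is flagged,
    mirroring the tuning assumption described in the abstract.
    """
    # Fit with k+1 neighbors: each training point is its own nearest neighbor.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_train)
    dists, _ = nn.kneighbors(X_train)
    # Leave-one-out score: drop the self-distance column, average the rest.
    train_scores = dists[:, 1:].mean(axis=1)
    threshold = train_scores.max()  # every training point passes by construction

    def is_extrapolation(X_new):
        # Score new points by mean distance to their k nearest training points.
        d, _ = nn.kneighbors(X_new, n_neighbors=k)
        return d.mean(axis=1) > threshold

    return is_extrapolation

# Usage: a Gaussian cluster of training data and two query points,
# one near the cluster center and one far outside it.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
detect = knn_extrapolation_detector(X, k=5)
flags = detect(np.array([[0.0, 0.0], [10.0, 10.0]]))
print(flags)  # the far-away point should be flagged as an extrapolation
```

A max-based threshold is deliberately conservative; a quantile of the training scores would trade some training-point coverage for tighter extrapolation control.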
Authors who are presenting talks have a * after their name.