
Abstract Details

Activity Number: 482 - Application of Nonparametric Tests
Type: Contributed
Date/Time: Thursday, August 6, 2020, 10:00 AM to 2:00 PM EDT
Sponsor: Section on Nonparametric Statistics
Abstract #309605
Title: Detection of Data Drift and Outliers Affecting Machine Learning Model Performance Over Time
Author(s): Samuel Ackerman* and Orna Raz and Eitan Farchi and Marcel Zalmanovici
Companies: IBM Research, Haifa (all authors)
Keywords: changepoints; sequential testing; outlier detection; data drift monitoring; density estimation; distribution change
Abstract:

A trained ML model is deployed on a new (payload) dataset whose target feature values (labels) are unknown. Drift is a distribution change between the training and payload data, which is concerning if it affects model performance. For a cat/dog image classifier, payload drift could mean rabbit images (a new class) or cat/dog images with changed characteristics (a change in distribution). We wish to detect these changes, but cannot measure accuracy without payload labels. We instead detect drift indirectly by nonparametrically testing the distribution of model prediction confidence for changes. This generalizes our method and sidesteps the need for domain-specific feature representations. We address important statistical issues, particularly Type-1 error control in sequential testing, using change-point models (CPMs; Ross & Adams, 2012). We also use nonparametric outlier methods to show the user suspicious observations for model diagnosis, since the pre- and post-change confidence distributions overlap significantly. In experiments demonstrating robustness, we train on a subset of MNIST digit classes, then insert drift (e.g., an unseen digit class) into the payload data under various settings (gradual or sudden changes in the drift proportion).
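
The abstract's method applies the CPM sequential-testing framework of Ross & Adams (2012) to model confidence scores. As a rough illustration of the underlying idea only (not the authors' implementation), the minimal Python sketch below flags drift by comparing sliding windows of payload confidence scores against a reference distribution with a Kolmogorov-Smirnov two-sample test; all data, window sizes, and thresholds are hypothetical.

# Illustrative sketch only: the abstract uses CPM sequential changepoint detection
# (Ross & Adams 2012, e.g. the R 'cpm' package), which controls Type-1 error across
# repeated tests. Here, as a simplified stand-in, a per-window Kolmogorov-Smirnov
# two-sample test compares payload prediction-confidence scores against a
# training-time reference distribution. The per-window alpha below is NOT
# sequentially controlled.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Hypothetical confidence scores: a training-like reference sample, and a payload
# stream whose distribution shifts partway through (simulating inserted drift).
reference = rng.beta(8, 2, size=2000)                  # high-confidence regime
payload = np.concatenate([rng.beta(8, 2, size=1000),   # pre-drift
                          rng.beta(2, 2, size=1000)])  # post-drift: flatter confidences

window = 200   # sliding payload window size (hypothetical)
alpha = 0.01   # per-test significance level (hypothetical)

for start in range(0, len(payload) - window + 1, window):
    batch = payload[start:start + window]
    stat, p = ks_2samp(reference, batch)
    flag = "possible drift" if p < alpha else "ok"
    print(f"obs {start:4d}-{start + window - 1:4d}: KS={stat:.3f}  p={p:.2e}  {flag}")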


Authors who are presenting talks have a * after their name.
