Abstract Details

Activity Number: 626 - Recent Advances in High-Dimensional Statistical Inference
Type: Invited
Date/Time: Thursday, August 1, 2019 : 10:30 AM to 12:20 PM
Sponsor: IMS
Abstract #300134
Title: Theoretical Support of Machine Learning Debugging
Author(s): Po-Ling Loh*
Companies: UW-Madison
Keywords: High-dimensional regression; M-estimation; Robust statistics; Machine learning debugging

We study a linear regression formulation of machine learning debugging, where data are obtained from two distinct pools of “clean” and “contaminated” data. The goal is to correctly identify the subset of buggy data contained in the contaminated data pool. We propose a novel weighted $M$-estimator that applies a Huber loss to the contaminated data and a squared error loss to the clean data, and derive rigorous statistical properties of the estimator. Our results reveal the interdependence among the proper choice of relative weights, the sample sizes of the clean and contaminated data sets, and the ratio between the noise variances of the two data sets. Simulation studies demonstrate the success of our method when applied to debugging tasks involving synthetic and real data sets.
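As a rough illustration of the kind of estimator described above, the following is a minimal sketch and not the author's implementation. It assumes an unpenalized, low-dimensional objective of the form $\frac{1}{2 n_c}\sum_{\text{clean}} r_i^2 + \frac{\lambda}{n_b}\sum_{\text{contaminated}} \ell_\delta(r_i)$, where $\ell_\delta$ is the Huber loss and $r_i = y_i - x_i^\top \beta$. The normalization, the names `weight` (for $\lambda$) and `delta`, the iteratively reweighted least squares solver, and the rule of flagging contaminated points whose residual exceeds `delta` are all assumptions made for this example; the abstract does not specify them.

```python
import numpy as np


def huber_irls_weights(residuals, delta):
    """IRLS weights for the Huber loss: 1 where |r| <= delta, delta/|r| otherwise."""
    abs_r = np.maximum(np.abs(residuals), 1e-12)  # guard against division by zero
    return np.minimum(1.0, delta / abs_r)


def fit_weighted_m_estimator(X_clean, y_clean, X_cont, y_cont,
                             weight=1.0, delta=1.0, n_iter=100, tol=1e-10):
    """Squared loss on the clean pool plus `weight` times Huber loss on the
    contaminated pool, minimized by iteratively reweighted least squares.

    Assumes X_clean has full column rank (hypothetical low-dimensional setting).
    """
    n_clean, n_cont = len(y_clean), len(y_cont)
    X = np.vstack([X_clean, X_cont])
    y = np.concatenate([y_clean, y_cont])
    # Per-observation weights before the Huber reweighting (assumed normalization).
    base = np.concatenate([np.full(n_clean, 1.0 / n_clean),
                           np.full(n_cont, weight / n_cont)])
    # Start from ordinary least squares on the clean pool only.
    beta = np.linalg.lstsq(X_clean, y_clean, rcond=None)[0]
    for _ in range(n_iter):
        w = base.copy()
        w[n_clean:] *= huber_irls_weights(y_cont - X_cont @ beta, delta)
        XtW = X.T * w  # scales each observation (column of X.T) by its weight
        beta_new = np.linalg.solve(XtW @ X, XtW @ y)
        converged = np.linalg.norm(beta_new - beta) < tol
        beta = beta_new
        if converged:
            break
    return beta


def flag_buggy(X_cont, y_cont, beta, delta):
    """Flag contaminated-pool points whose residual falls in the linear part
    of the Huber loss (an assumed identification rule)."""
    return np.abs(y_cont - X_cont @ beta) > delta
```

Under these assumptions, `fit_weighted_m_estimator` returns the coefficient estimate and `flag_buggy` returns a boolean mask over the contaminated pool. The interplay the abstract mentions would enter through the choice of `weight` relative to the two sample sizes and noise levels, which this sketch leaves to the user.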

Authors who are presenting talks have a * after their name.

Back to the full JSM 2019 program