Name: 2019 Joint Statistical Meetings
Start: 2019-07-27T07:00:00+00:00
End: 2019-08-01
Location: Colorado Convention Center

Activity Number:	626 - Recent Advances in High-Dimensional Statistical Inference
Type:	Invited
Date/Time:	Thursday, August 1, 2019 : 10:30 AM to 12:20 PM
Sponsor:	IMS
Abstract #300134
Title:	Theoretical Support of Machine Learning Debugging
Author(s):	Po-Ling Loh*
Companies:	UW-Madison
Keywords:	High-dimensional regression; M-estimation; Robust statistics; Machine learning debugging
Abstract:	We study a linear regression formulation of machine learning debugging, where data are obtained from two distinct pools of “clean” and “contaminated” data. The goal is to correctly identify the subset of buggy data contained in the contaminated data pool. We propose a novel weighted $M$-estimator that applies a Huber loss to the contaminated data and a squared error loss to the clean data, and derive rigorous statistical properties of the estimator. Our results reveal the dependence among the proper choice of relative weights, the sample sizes of the clean and contaminated datasets, and the ratio between the noise variances of the two datasets. Simulation studies demonstrate the success of our method when applied to debugging tasks involving synthetic and real datasets.
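
The estimator described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact objective, so everything here is an assumption made for illustration: the function names `weighted_huber_fit` and `flag_buggy`, the weights `w_clean` and `w_cont`, the Huber threshold `delta`, the iteratively reweighted least squares solver, and the rule of flagging contaminated points whose residual exceeds `delta`. The sketch also omits any regularization that a high-dimensional setting might call for, so it should not be read as the author's method.

```python
import numpy as np

def weighted_huber_fit(X_clean, y_clean, X_cont, y_cont,
                       w_clean=1.0, w_cont=1.0, delta=1.0,
                       n_iter=100, tol=1e-8):
    """Minimize (hypothetical objective)
         w_clean * sum_i 0.5 * (y_i - x_i'b)^2        over clean points
       + w_cont  * sum_j huber_delta(y_j - x_j'b)     over contaminated points
    by iteratively reweighted least squares (IRLS)."""
    X = np.vstack([X_clean, X_cont])
    y = np.concatenate([y_clean, y_cont])
    n_clean = X_clean.shape[0]
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        r = y - X @ beta
        w = np.empty_like(y)
        # Squared error loss: constant weight on every clean point.
        w[:n_clean] = w_clean
        # Huber loss: weight psi(r)/r = min(1, delta/|r|) on contaminated points.
        abs_r = np.maximum(np.abs(r[n_clean:]), 1e-12)
        w[n_clean:] = w_cont * np.minimum(1.0, delta / abs_r)
        # Solve the weighted normal equations (X' W X) b = X' W y.
        Xw = X * w[:, None]
        beta_new = np.linalg.solve(X.T @ Xw, Xw.T @ y)
        converged = np.linalg.norm(beta_new - beta) < tol
        beta = beta_new
        if converged:
            break
    return beta

def flag_buggy(X_cont, y_cont, beta, delta=1.0):
    """Assumed rule: flag contaminated points with residual larger than delta."""
    return np.abs(y_cont - X_cont @ beta) > delta

# Synthetic demo (all values are made up for illustration).
rng = np.random.default_rng(0)
beta_true = np.array([1.0, -2.0, 0.5])
X_clean = rng.normal(size=(50, 3))
y_clean = X_clean @ beta_true + 0.1 * rng.normal(size=50)
X_cont = rng.normal(size=(200, 3))
y_cont = X_cont @ beta_true + 0.1 * rng.normal(size=200)
y_cont[:20] += 10.0  # the first 20 contaminated responses are "buggy"

beta_hat = weighted_huber_fit(X_clean, y_clean, X_cont, y_cont)
flags = flag_buggy(X_cont, y_cont, beta_hat)
print("estimated coefficients:", beta_hat)
print("number of flagged points:", int(flags.sum()))
```

In this sketch the ratio `w_cont / w_clean` controls how much the contaminated pool influences the fit relative to the clean pool, which is the kind of tuning choice the abstract says depends on the sample sizes and noise variances.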

Authors who are presenting talks have a * after their name.

JSM 2019 Online Program