Abstract Details

Activity Number: 660 - Machine Learning: Advances and Applications
Type: Contributed
Date/Time: Thursday, August 1, 2019 : 10:30 AM to 12:20 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #304100
Title: A Two-Stage Approach to Multivariate Linear Regression with Sparsely Mismatched Data
Author(s): Martin Slawski* and Emanuel Ben-David
Companies: George Mason University and U.S. Census Bureau
Keywords: Broken Sample; Entity Resolution; Record Linkage; Matching; Robust Statistics
Abstract:

We study linear regression with faulty correspondence between responses Y and covariates X for a subset of the given data. This setting is motivated by the problem of adjusting for linkage error in post-linkage regression analysis when merging multiple data sets with non-unique identifiers. We present and analyze a computationally efficient two-stage method that first estimates the regression parameter by means of block-sparsity regularization, and subsequently restores the underlying correspondence between (X,Y)-pairs by solving a linear assignment problem. We provide explicit non-asymptotic error bounds and shed light on the optimality of the approach including minimax lower bounds.
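The two stages described in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact objective or solver, so everything specific here is an assumption: stage 1 is written as a least-squares fit with a row-wise (group) penalty on an auxiliary outlier matrix Xi, solved by simple alternating minimization, and stage 2 matches each response to a fitted value with SciPy's `linear_sum_assignment`. The function names `fit_block_sparse` and `restore_correspondence` and the tuning parameter `lam` are hypothetical and are not taken from the authors' work.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def fit_block_sparse(X, Y, lam, n_iter=200):
    """Stage 1 (assumed formulation, not necessarily the authors'):
    minimize over (B, Xi)
        0.5 * ||Y - X B - Xi||_F^2 + lam * sum_i ||Xi[i, :]||_2
    by alternating exact minimization in B and in Xi.
    Rows of Xi that end up nonzero flag suspected mismatches."""
    Xi = np.zeros_like(Y, dtype=float)
    for _ in range(n_iter):
        # B-step: ordinary least squares on the outlier-corrected responses.
        B, *_ = np.linalg.lstsq(X, Y - Xi, rcond=None)
        # Xi-step: row-wise (group) soft-thresholding of the residuals.
        R = Y - X @ B
        norms = np.linalg.norm(R, axis=1, keepdims=True)
        shrink = np.maximum(1.0 - lam / np.maximum(norms, 1e-12), 0.0)
        Xi = shrink * R
    return B, Xi


def restore_correspondence(X, Y, B):
    """Stage 2: solve a linear assignment problem with cost
    cost[i, j] = ||Y[i] - X[j] B||^2.
    Returns perm such that row i of Y is paired with row perm[i] of X."""
    fitted = X @ B
    cost = ((Y[:, None, :] - fitted[None, :, :]) ** 2).sum(axis=2)
    _, perm = linear_sum_assignment(cost)
    return perm
```

With responses `Y_obs` whose rows are partly shuffled relative to `X`, one would call `B_hat, Xi_hat = fit_block_sparse(X, Y_obs, lam)` and then `restore_correspondence(X, Y_obs, B_hat)`. The choice of `lam` trades off robustness to mismatches against bias in `B_hat`; in this sketch it is left to the user.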


Authors who are presenting talks have a * after their name.

Back to the full JSM 2019 program