Activity Number: 660 - Machine Learning: Advances and Applications
Type: Contributed
Date/Time: Thursday, August 1, 2019 : 10:30 AM to 12:20 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #304100 | Presentation
Title: A Two-Stage Approach to Multivariate Linear Regression with Sparsely Mismatched Data
Author(s): Martin Slawski* and Emanuel Ben-David
Companies: George Mason Univ and US Census Bureau
Keywords: Broken Sample; Entity Resolution; Record Linkage; Matching; Robust Statistics
Abstract:
We study linear regression with faulty correspondence between responses Y and covariates X for a subset of the given data. This setting is motivated by the problem of adjusting for linkage error in post-linkage regression analysis when merging multiple data sets with non-unique identifiers. We present and analyze a computationally efficient two-stage method that first estimates the regression parameter by means of block-sparsity regularization, and subsequently restores the underlying correspondence between (X,Y)-pairs by solving a linear assignment problem. We provide explicit non-asymptotic error bounds and shed light on the optimality of the approach, including minimax lower bounds.
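
The two stages described above can be illustrated with a minimal sketch. It is not the authors' implementation, and everything beyond "block-sparsity regularization" and "linear assignment problem" is an assumption made for illustration: the working model Y = XB + E with a row-sparse error matrix E, the row-wise (group-lasso) penalty with weight `lam`, the simple alternating solver, and the helper names `row_soft_threshold`, `stage_one`, and `stage_two`. The simulated data and the value `lam=0.5` are likewise hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def row_soft_threshold(R, lam):
    """Shrink every row of R toward zero by lam in Euclidean norm."""
    norms = np.linalg.norm(R, axis=1, keepdims=True)
    scale = np.maximum(1.0 - lam / np.maximum(norms, 1e-12), 0.0)
    return scale * R


def stage_one(X, Y, lam, n_iter=300):
    """Assumed stage 1: estimate B in Y = X B + E with row-sparse E.

    Alternates a least-squares fit of B with row-wise soft-thresholding
    of the residuals; rows of E that stay nonzero flag suspected mismatches.
    """
    E = np.zeros_like(Y)
    for _ in range(n_iter):
        B, *_ = np.linalg.lstsq(X, Y - E, rcond=None)
        E = row_soft_threshold(Y - X @ B, lam)
    B, *_ = np.linalg.lstsq(X, Y - E, rcond=None)
    return B, E


def stage_two(X, Y, B):
    """Assumed stage 2: match each response row to one covariate row.

    Solves a linear assignment problem with cost ||Y_i - X_j B||^2 and
    returns match, where Y[i] is paired with X[match[i]].
    """
    fitted = X @ B
    cost = ((Y[:, None, :] - fitted[None, :, :]) ** 2).sum(axis=2)
    _, match = linear_sum_assignment(cost)
    return match


# Hypothetical simulated data: 10 of 100 responses are attached to the wrong row.
rng = np.random.default_rng(0)
n, d, m, k = 100, 4, 4, 10
X = rng.normal(size=(n, d))
B_true = np.array([
    [2.0, -1.0, 0.5, 1.0],
    [0.5, 3.0, -1.0, 0.0],
    [-1.5, 1.0, 2.0, -0.5],
    [1.0, 0.0, -1.0, 2.5],
])
true_match = np.arange(n)
true_match[:k] = np.roll(np.arange(k), 1)  # cyclic shift among the first k rows
Y = X[true_match] @ B_true + 0.1 * rng.normal(size=(n, m))

B_hat, E_hat = stage_one(X, Y, lam=0.5)
match_hat = stage_two(X, Y, B_hat)
print("estimation error:", np.linalg.norm(B_hat - B_true))
print("fraction of pairs restored:", np.mean(match_hat == true_match))
```

In this sketch the penalty weight has to sit above the typical noise level of a correctly matched row, otherwise correctly linked records would also be flagged; how to choose it in practice is not specified in the abstract.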
Authors who are presenting talks have a * after their name.