All Times EDT

Abstract Details

Activity Number: 357 - Contemporary Multivariate Methods
Type: Contributed
Date/Time: Wednesday, August 5, 2020 : 10:00 AM to 2:00 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #313644
Title: Generalized Linear Models with Partially Mismatched Data
Author(s): Zhenbang Wang* and Emanuel Ben-David and Martin Slawski
Companies: George Mason University and US Census Bureau and George Mason University
Keywords: generalized linear models; record linkage; broken sample problem; penalized estimator

Probabilistic record linkage, i.e., the identification of matching records in multiple files, can be a challenging and error-prone task. Linkage error can considerably affect subsequent analysis based on the resulting linked file. Several recent papers have studied post-linkage linear regression analysis, with the response variable Y in one file and the covariates X in a second file, from the perspective of the “broken sample problem” and “permuted data”. In this work, we present an extension of this line of research to generalized linear models under the assumption of a small to moderate number of mismatches. We propose an approach based on dummy variables and 1-norm penalization and derive non-asymptotic error bounds for estimating the regression parameters. For selected models, we also state conditions under which the underlying permutation can be recovered, i.e., under which the correct correspondence between X and Y can be restored.
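
The dummy-variable idea can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not specify the model, the penalty scaling, or the solver. The sketch assumes a logistic model, gives each observation i its own shift gamma_i in the linear predictor (the "dummy variable"), and penalizes the 1-norm of gamma so that only a few observations, ideally the mismatched ones, receive a nonzero shift. The function names, the proximal gradient solver, and the step size rule are all assumptions made for illustration.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def objective(X, y, beta, gamma, lam):
    """Average logistic negative log-likelihood plus lam * ||gamma||_1."""
    eta = X @ beta + gamma
    nll = np.mean(np.logaddexp(0.0, eta) - y * eta)
    return nll + lam * np.sum(np.abs(gamma))


def fit_logistic_with_dummies(X, y, lam, n_iter=2000):
    """Proximal gradient descent over (beta, gamma); hypothetical helper.

    beta  : regression coefficients (unpenalized)
    gamma : one shift per observation (1-norm penalized)
    """
    n, d = X.shape
    beta = np.zeros(d)
    gamma = np.zeros(n)
    # Upper bound on the Lipschitz constant of the smooth part's gradient:
    # the sigmoid derivative is at most 1/4, and the design [X, I] has
    # largest squared singular value sigma_max(X)^2 + 1.
    lipschitz = (np.linalg.norm(X, 2) ** 2 + 1.0) / (4.0 * n)
    step = 1.0 / lipschitz
    for _ in range(n_iter):
        eta = X @ beta + gamma
        mu = 0.5 * (1.0 + np.tanh(0.5 * eta))  # numerically stable sigmoid
        resid = (mu - y) / n                   # gradient w.r.t. eta
        beta = beta - step * (X.T @ resid)     # plain gradient step
        gamma = soft_threshold(gamma - step * resid, step * lam)  # prox step
    return beta, gamma


if __name__ == "__main__":
    # Simulated example (all settings are illustrative assumptions).
    rng = np.random.default_rng(0)
    n, d, n_mismatch = 500, 3, 25
    X = rng.normal(size=(n, d))
    beta_true = np.array([2.0, -1.5, 1.0])
    prob = 0.5 * (1.0 + np.tanh(0.5 * (X @ beta_true)))
    y = (rng.random(n) < prob).astype(float)
    # Mismatch a small subset by permuting their responses.
    idx = rng.choice(n, size=n_mismatch, replace=False)
    y[idx] = y[rng.permutation(idx)]
    beta_hat, gamma_hat = fit_logistic_with_dummies(X, y, lam=0.5 / n)
    print("estimated coefficients:", beta_hat)
    print("observations with nonzero shift:", np.count_nonzero(gamma_hat))
```

The tuning parameter lam controls how many observations are allowed a nonzero shift. In this sketch, any lam of at least 1/n keeps every gamma_i at zero, so the fit reduces to ordinary logistic regression; smaller values let individual observations be treated as potential mismatches.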

Authors who are presenting talks have a * after their name.

Back to the full JSM 2020 program