Abstract Details

Activity Number: 465 - Probabilistic Record Linkage: Better Assumptions, Scalable Inference, and Accounting for Uncertainty
Type: Topic Contributed
Date/Time: Wednesday, August 1, 2018 : 8:30 AM to 10:20 AM
Sponsor: Social Statistics Section
Abstract #329188 Presentation
Title: When There Can Be Only One: The Highlander Probability Model for Historical Record Linkage with Labeled Data
Author(s): Jared S Murray*
Companies: University of Texas at Austin
Keywords: record linkage; imputation; weighting; historical data
Abstract:

In probabilistic record linkage, two or more datasets are merged using quasi-identifying information such as names or ages. The accuracy of the estimated links can be improved by using a small set of hand-labeled record pairs. Current methods that leverage hand-labeled data use binary classifiers that examine each record pair independently of all the others. However, when the files are (approximately) de-duplicated, we expect a record in one file to match at most one record in the other. Existing approaches impose this constraint post hoc, which is inefficient.
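To make the post hoc step concrete, the following is a minimal sketch, not the method proposed in this talk. It assumes a hypothetical matrix of pairwise match probabilities produced by some independent binary classifier, and it uses a simple greedy rule as a stand-in for whatever resolution step a given pipeline applies. The function names, the cutoff of 0.5, and the probabilities are all illustrative assumptions.

```python
def threshold_links(probs, cutoff=0.5):
    """Classify each pair on its own: keep every pair whose score clears the cutoff."""
    return [
        (i, j)
        for i, row in enumerate(probs)
        for j, p in enumerate(row)
        if p >= cutoff
    ]


def greedy_one_to_one(probs, cutoff=0.5):
    """Post hoc resolution: accept pairs in descending score order,
    skipping any pair that reuses an already linked record."""
    candidates = sorted(
        (
            (p, i, j)
            for i, row in enumerate(probs)
            for j, p in enumerate(row)
            if p >= cutoff
        ),
        reverse=True,
    )
    used_a, used_b, links = set(), set(), []
    for _, i, j in candidates:
        if i not in used_a and j not in used_b:
            links.append((i, j))
            used_a.add(i)
            used_b.add(j)
    return sorted(links)


# Hypothetical classifier output: rows index records in file A,
# columns index records in file B.
probs = [
    [0.90, 0.70, 0.05],
    [0.80, 0.10, 0.02],
    [0.03, 0.04, 0.60],
]

independent = threshold_links(probs)   # may link one record to several others
resolved = greedy_one_to_one(probs)    # at most one link per record

print(independent)
print(resolved)
```

In this toy input, independent thresholding links record 0 of file A to two records in file B, and links record 0 of file B to two records in file A. The greedy pass restores the one-to-one structure but leaves record 1 of file A unlinked, even though the pairing (0, 1), (1, 0), (2, 2) would link every record. This is one way information can be lost when the constraint is applied only after the pairs have been scored.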

We propose new models for record linkage with labeled data that incorporate one-to-one constraints during estimation. We find that 1) our new models capture additional discriminating information, increasing the number of correctly matched records while maintaining the same false match rates, and 2) simple parametric models that correctly model dependence between record pairs outperform complicated machine learning methods that ignore it. Our models also directly estimate the relevant probabilities for weighting or imputation, allowing subsequent analyses to adjust for uncertainty in the estimated links.


Authors who are presenting talks have a * after their name.

Back to the full JSM 2018 program