Abstract Details

Activity Number: 465 - Probabilistic Record Linkage: Better Assumptions, Scalable Inference, and Accounting for Uncertainty
Type: Topic Contributed
Date/Time: Wednesday, August 1, 2018 : 8:30 AM to 10:20 AM
Sponsor: Social Statistics Section
Abstract #329360 Presentation
Title: Improving Probabilistic Record Linkage: Accurate Links, Probabilities, and Measures of Uncertainty
Author(s): Bradley Spahn* and Brendan McVeigh and Jared S Murray
Companies: Stanford University and Carnegie Mellon University and University of Texas at Austin
Keywords: Record Linkage; Approximate Bayesian Computation; MCMC
Abstract:

In archival research, where data is derived from often-messy digitized text, a statistically sound approach to linkage estimation is essential. Probabilistic record linkage, the process of assigning a probability that two entries correspond to the same entity, permits approximately unbiased estimation of quantities of interest even when matches are imperfectly identified. Bayesian approaches to record linkage are among the most accurate, but computational considerations severely limit the practical applicability of existing methods.
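As a rough illustration of what "assigning probabilities" to a record pair can mean, the sketch below applies Bayes' rule to field-by-field agreement indicators in the style of the classical Fellegi-Sunter model. It is not the authors' method: the conditional-independence assumption, the function name `match_probability`, and every numeric value (`m`, `u`, `prior`) are hypothetical choices made for the example.

```python
from math import prod

def match_probability(agreements, m, u, prior):
    """Posterior probability that a record pair refers to the same entity.

    agreements: list of bools, one per compared field (True = fields agree)
    m[i]: assumed P(field i agrees | pair is a match)
    u[i]: assumed P(field i agrees | pair is a non-match)
    prior: assumed prior probability that a random pair is a match
    Fields are treated as conditionally independent given match status.
    """
    like_match = prod(mi if a else 1 - mi for a, mi in zip(agreements, m))
    like_non = prod(ui if a else 1 - ui for a, ui in zip(agreements, u))
    numerator = prior * like_match
    return numerator / (numerator + (1 - prior) * like_non)

# Hypothetical parameters for three fields (say name, birth year, address).
m = [0.95, 0.90, 0.85]
u = [0.01, 0.05, 0.10]

p_all = match_probability([True, True, True], m, u, prior=0.001)
p_none = match_probability([False, False, False], m, u, prior=0.001)
```

With these made-up parameters, a pair agreeing on every field receives a high match probability despite the small prior, while a pair disagreeing on every field receives a probability near zero.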

We introduce a new computational approach, providing both a fast algorithm for deriving point estimates that properly account for one-to-one matching and a restricted MCMC algorithm that samples from an approximate posterior distribution. These advances make it possible to perform Bayesian inference for much larger problems. We demonstrate the methods on an OCR'd dataset, the California Great Registers, a collection of 57 million voter registrations from 1900 to 1968 that comprise the only panel dataset of party registration collected before the advent of scientific surveys.
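To make the one-to-one constraint concrete, the following minimal sketch searches exhaustively for the linkage that maximizes total log-odds while using each record in the second file at most once. It is a toy stand-in, not the fast algorithm described in the abstract: the objective, the function name `best_one_to_one`, and the probability matrix are all assumptions for illustration, and the search is exponential in the number of records.

```python
from math import log

def best_one_to_one(probs):
    """Exhaustively find the one-to-one linkage maximizing total log-odds.

    probs[i][j]: assumed pairwise match probability (strictly between 0
    and 1) for record i in file A and record j in file B.
    Returns a list whose i-th entry is the linked index j, or None if
    record i is left unlinked. Only suitable for tiny inputs.
    """
    n_b = len(probs[0]) if probs else 0

    def search(i, used):
        if i == len(probs):
            return 0.0, []
        # Option 1: leave record i unlinked (contributes zero).
        best_score, rest = search(i + 1, used)
        best_links = [None] + rest
        # Option 2: link record i to any record j not already used.
        for j in range(n_b):
            if j in used:
                continue
            p = probs[i][j]
            score, links = search(i + 1, used | {j})
            score += log(p / (1 - p))
            if score > best_score:
                best_score, best_links = score, [j] + links
        return best_score, best_links

    return search(0, frozenset())[1]

# Hypothetical pairwise probabilities: records 0 and 1 both favor column 0.
probs = [
    [0.90, 0.80, 0.01],
    [0.85, 0.10, 0.02],
    [0.01, 0.02, 0.03],
]
links = best_one_to_one(probs)
```

In this example, linking each record independently to its most probable partner would assign column 0 twice. The constrained search instead gives record 0 its second choice so that record 1 can take column 0, and leaves record 2 unlinked because all of its candidate pairs have negative log-odds.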


Authors who are presenting talks have a * after their name.

Back to the full JSM 2018 program