Activity Number: 285 - Probabilistic Record Linkage and Inference with Merged Data
Type: Topic Contributed
Date/Time: Tuesday, July 30, 2019 : 8:30 AM to 10:20 AM
Sponsor: Section on Statistics in Epidemiology
Title: A Structured Prior for Sequential Bayesian Record Linkage
Author(s): Brendan McVeigh* and Jared S Murray
Companies: Carnegie Mellon University and University of Texas at Austin
Keywords: Record Linkage; Unsupervised learning; MCMC

Probabilistic record linkage is the problem of identifying sets of records from multiple databases that correspond to the same underlying entity in the absence of a unique identifier. For all but the smallest problems, computational considerations mean that only a small subset of the possible record pairs can be considered for matching. In principle, a multistage approach to this problem could deliver substantial gains in computational efficiency. Such an approach first considers a small number of candidate matches for each record, and considers a larger number of candidates only for records that remain unmatched after the first stage. We present a new record linkage prior and latent variable model that capture such a multistage approach. By fully incorporating the multistage approach into our statistical model, we allow for valid posterior inference despite the multistage nature of the matching.
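
The multistage candidate search described above can be illustrated with a minimal sketch. It is a greedy, deterministic toy and not the authors' prior or latent variable model: it performs no posterior inference and only shows how widening the candidate set for still-unmatched records reduces the number of pairwise comparisons. The function names, the prefix-based blocking key, the use of `difflib.SequenceMatcher` as the comparison score, and the 0.8 threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """String similarity in [0, 1]; a hypothetical stand-in for a comparison model."""
    return SequenceMatcher(None, a, b).ratio()


def blocking_key(name, width):
    """Lowercased prefix of length `width`; shorter prefixes admit more candidates."""
    return name[:width].lower()


def multistage_link(records_a, records_b, stages=((3, 0.8), (1, 0.8))):
    """Greedy multistage linkage (illustrative assumption, not the paper's method).

    Each stage is a (prefix_width, threshold) pair. Later stages use a
    shorter prefix, so they consider more candidates, but only for
    records in A that earlier stages left unmatched.
    Returns (links, comparisons), where links maps index in A to index in B.
    """
    links = {}
    comparisons = 0
    for width, threshold in stages:
        for i, a in enumerate(records_a):
            if i in links:
                continue  # already matched in an earlier stage
            best_j, best_score = None, threshold
            for j, b in enumerate(records_b):
                if j in links.values():
                    continue  # enforce one-to-one matching
                if blocking_key(a, width) != blocking_key(b, width):
                    continue  # not a candidate at this stage
                comparisons += 1
                score = similarity(a, b)
                if score >= best_score:
                    best_j, best_score = j, score
            if best_j is not None:
                links[i] = best_j
    return links, comparisons


records_a = ["Maria Garcia", "Jon Smith"]
records_b = ["John Smith", "Maria Garcia", "Wei Chen"]
links, comparisons = multistage_link(records_a, records_b)
print(links, comparisons)
```

In this toy example the first stage links the exact duplicate using a three-character prefix, and the second stage relaxes the prefix to one character only for the record that remains unmatched, so fewer than the six possible pairs are scored. Treating the stage-one links as fixed is exactly the kind of shortcut that complicates inference, which is why the abstract builds the staging into the statistical model instead.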

