Abstract Details

Activity Number: 125 - Bayesian Methods for Discrete Data Problems
Type: Contributed
Date/Time: Monday, July 31, 2017 : 8:30 AM to 10:20 AM
Sponsor: Section on Bayesian Statistical Science
Abstract #324088
Title: A Sequential Algorithm for Bayesian Inference of Large-Scale Record Linkage Structure
Author(s): Brendan McVeigh* and Jared S Murray
Companies: Carnegie Mellon University and Carnegie Mellon University
Keywords: Record linkage ; Blocking ; Metropolis-Hastings algorithm
Abstract:

Probabilistic record linkage is the process of determining which records in a database or databases correspond to the same unique entity. We adopt a fully Bayesian approach, allowing us to propagate uncertainty in the linkage structure through to subsequent inference by sampling from the posterior distribution. Sampling from the posterior is challenging given the high dimensionality of the space of unknown linkage structures. To address this challenge, we propose a sequential approach for sampling from an approximate posterior distribution under the assumption of a one-to-one linkage structure across two files. In the first stage we restrict sampling to the subset of record pairs with relatively high linkage probabilities, and in subsequent stages we consider record pairs that are successively less likely to match, conditioning on the "resolved" record pairs from the previous stages. Conditioning reduces the size of the problem and yields a highly parallelizable algorithm. We compare simulation results from our method with those of a standard MCMC algorithm and discuss the tradeoffs between computational complexity and accuracy of the approximate posterior distribution.
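
The staged idea described in the abstract can be illustrated with a toy sketch. This is not the authors' algorithm, which the abstract does not specify in detail. The per-pair match probabilities, the descending thresholds, the simple add/remove Metropolis move, the absence of burn-in, and the 0.5 rule for declaring a pair "resolved" are all assumptions made purely for illustration, and the names sample_matchings and staged_linkage are hypothetical.

```python
import random
from collections import Counter


def sample_matchings(pairs, n_iter, rng):
    """Toy Metropolis sampler over one-to-one matchings.

    `pairs` maps (a, b) -> assumed match probability, strictly in (0, 1).
    The target weights a matching by the product of the odds of its pairs.
    Returns the fraction of iterations in which each pair was linked.
    """
    keys = list(pairs)
    current, used_a, used_b = set(), set(), set()
    counts = Counter()
    for _ in range(n_iter):
        a, b = pair = rng.choice(keys)  # symmetric proposal: toggle one pair
        odds = pairs[pair] / (1.0 - pairs[pair])
        if pair in current:
            # Propose removing the link.
            if rng.random() < min(1.0, 1.0 / odds):
                current.remove(pair)
                used_a.remove(a)
                used_b.remove(b)
        elif a not in used_a and b not in used_b:
            # Propose adding the link; allowed only if it keeps the
            # matching one-to-one.
            if rng.random() < min(1.0, odds):
                current.add(pair)
                used_a.add(a)
                used_b.add(b)
        counts.update(current)
    return {p: counts[p] / n_iter for p in keys}


def staged_linkage(scores, thresholds, resolve_cut=0.5, n_iter=5000, seed=0):
    """Process candidate pairs in stages of decreasing assumed probability.

    Each stage samples only pairs at or above its threshold whose records
    were not resolved earlier, then fixes ("resolves") pairs whose sampled
    link frequency reaches `resolve_cut` (an illustrative rule).
    """
    rng = random.Random(seed)
    resolved, done_a, done_b = {}, set(), set()
    for t in thresholds:
        stage = {
            (a, b): p
            for (a, b), p in scores.items()
            if p >= t and a not in done_a and b not in done_b
        }
        if not stage:
            continue
        freq = sample_matchings(stage, n_iter, rng)
        # Resolve the most frequently linked pairs first, keeping one-to-one.
        for (a, b), f in sorted(freq.items(), key=lambda kv: -kv[1]):
            if f >= resolve_cut and a not in done_a and b not in done_b:
                resolved[(a, b)] = f
                done_a.add(a)
                done_b.add(b)
    return resolved


# Made-up example scores for records in two files A and B.
scores = {
    ("a1", "b1"): 0.95,
    ("a1", "b2"): 0.10,
    ("a2", "b2"): 0.60,
    ("a3", "b3"): 0.20,
}
resolved = staged_linkage(scores, thresholds=[0.9, 0.5, 0.05])
print(sorted(resolved))
```

In this sketch, conditioning on resolved pairs removes their records from every later stage, so each later stage samples over a smaller candidate set. Whether and how the remaining candidates split into independently sampled pieces is left open here, since the abstract does not describe that step.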


Authors who are presenting talks have a * after their name.

Back to the full JSM 2017 program

 
 
Copyright © American Statistical Association