Activity Number:
|
125
- Bayesian Methods for Discrete Data Problems
|
Type:
|
Contributed
|
Date/Time:
|
Monday, July 31, 2017 : 8:30 AM to 10:20 AM
|
Sponsor:
|
Section on Bayesian Statistical Science
|
Abstract #324088
|
Title:
|
A Sequential Algorithm for Bayesian Inference of Large-Scale Record Linkage Structure
|
Author(s):
|
Brendan McVeigh* and Jared S Murray
|
Companies:
|
Carnegie Mellon University and Carnegie Mellon University
|
Keywords:
|
Record linkage; Blocking; Metropolis-Hastings algorithm
|
Abstract:
|
Probabilistic record linkage is the process of determining which records in a database or databases correspond to the same unique entity. We adopt a fully Bayesian approach, allowing us to propagate uncertainty in the linkage structure through to subsequent inference by sampling from the posterior distribution. Sampling from the posterior is challenging given the high dimensionality of the space of unknown linkage structures. To address this challenge, we propose a sequential approach for sampling from an approximate posterior distribution under the assumption of a one-to-one linkage structure across two files. In the first stage we restrict sampling to the subset of record pairs with relatively high linkage probabilities, and in subsequent stages we consider record pairs that are successively less likely to match, conditioning on the "resolved" record pairs from the previous steps. Conditioning reduces the size of the problem and yields a highly parallelizable algorithm. We compare simulation results from our method with those of a standard MCMC algorithm and discuss the tradeoffs between computational complexity and accuracy of the approximate posterior distribution.
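The staged idea can be illustrated with a toy Python sketch. It is not the authors' algorithm: the abstract does not specify the model, the stage thresholds, or the rule for declaring a pair "resolved", so all of these are assumptions made for illustration. Here `weights` is a hypothetical map from record pairs (i, j) to match log-odds, the target over one-to-one matchings is taken to be proportional to the exponential of the summed weights of the linked pairs, each stage runs a simple Metropolis-Hastings sampler that toggles one candidate link at a time, and a pair is treated as resolved once its sampled inclusion frequency reaches `resolve_at`.

```python
import math
import random


def accept(log_ratio, rng):
    """Metropolis-Hastings acceptance test for a symmetric proposal."""
    return log_ratio >= 0 or rng.random() < math.exp(log_ratio)


def sample_stage(candidates, weights, n_iter, rng):
    """Toy MH sampler over one-to-one matchings restricted to `candidates`.

    Assumed target: p(M) proportional to exp(sum of weights of pairs in M).
    Returns the fraction of iterations in which each pair was linked.
    """
    matching, used_a, used_b = set(), set(), set()
    counts = {pair: 0 for pair in candidates}
    for _ in range(n_iter):
        pair = candidates[rng.randrange(len(candidates))]
        i, j = pair
        if pair in matching:
            # Propose dropping the link; target ratio is exp(-w).
            if accept(-weights[pair], rng):
                matching.remove(pair)
                used_a.remove(i)
                used_b.remove(j)
        elif i not in used_a and j not in used_b:
            # Propose adding the link; target ratio is exp(+w).
            if accept(weights[pair], rng):
                matching.add(pair)
                used_a.add(i)
                used_b.add(j)
        # Otherwise the toggle would break the one-to-one constraint: stay put.
        for linked in matching:
            counts[linked] += 1
    return {pair: c / n_iter for pair, c in counts.items()}


def sequential_linkage(weights, thresholds, n_iter=5000, resolve_at=0.9, seed=0):
    """Process stages with decreasing thresholds, conditioning on resolved pairs."""
    rng = random.Random(seed)
    resolved, fixed_a, fixed_b = set(), set(), set()
    for threshold in thresholds:
        # Only pairs above the stage threshold whose records are still unresolved.
        candidates = [
            pair for pair, w in sorted(weights.items())
            if w >= threshold and pair[0] not in fixed_a and pair[1] not in fixed_b
        ]
        if not candidates:
            continue
        freq = sample_stage(candidates, weights, n_iter, rng)
        # Assumed resolution rule: fix pairs linked in most of the samples.
        for (i, j), f in sorted(freq.items(), key=lambda kv: -kv[1]):
            if f >= resolve_at and i not in fixed_a and j not in fixed_b:
                resolved.add((i, j))
                fixed_a.add(i)
                fixed_b.add(j)
    return resolved
```

Because each stage only sees pairs whose records are still unresolved, the candidate set shrinks from stage to stage. In this sketch the stages run serially over a single candidate list; splitting the remaining candidates into disjoint groups of records that could be sampled in parallel is left out.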
|
Authors who are presenting talks have a * after their name.