Activity Number: 355 - Advanced Bayesian Topics (Part 4)
Type: Contributed
Date/Time: Thursday, August 12, 2021: 10:00 AM to 11:50 AM
Sponsor: Section on Bayesian Statistical Science
Abstract #318889

Title: A Bayesian Approach to Streaming Multi-File Record Linkage
Author(s): Ian Taylor* and Andee Kaplan and Brenda Betancourt
Companies: Colorado State University and Colorado State University and University of Florida
Keywords: record linkage; online updating; MCMC; Fellegi-Sunter; streaming

Abstract:
Record linkage is the task of combining records from multiple files that refer to overlapping sets of entities when the records have no unique identifying field. In streaming record linkage, files arrive over time and link estimates are desired after the arrival of each file; this arises in settings such as longitudinal surveys. The challenge in streaming record linkage is updating parameter estimates efficiently as new files arrive. We approach this problem from a Bayesian perspective, with estimates in the form of posterior samples of the parameters, and present a method for updating link estimates after the arrival of a new file that is faster than restarting an MCMC from scratch. We extend the two-file Bayesian Fellegi-Sunter model of Sadinle (2017) to more than two files and apply the Sequential Markov Chain Monte Carlo (SMCMC) method of Yang and Dunson (2013) to update the posterior samples as new files stream in. We apply this method to simulated data and to data from the Social Diagnosis Survey of Polish households.
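For readers unfamiliar with the Fellegi-Sunter framework, the sketch below shows the basic quantity it is built on: the log-likelihood ratio (match weight) for one record pair, computed from a binary agree/disagree comparison vector. It assumes conditional independence of fields, fixed per-field probabilities m_k = P(agree on field k | match) and u_k = P(agree on field k | non-match), and binary agreement levels. These are all simplifications for illustration; in Bayesian versions such as Sadinle (2017), m and u are given priors and sampled rather than fixed, and the streaming multi-file method in this talk is not reproduced here. The function name `fs_match_weight` is hypothetical.

```python
import math
from typing import Sequence


def fs_match_weight(agree: Sequence[bool], m: Sequence[float], u: Sequence[float]) -> float:
    """Fellegi-Sunter log-likelihood ratio (match vs. non-match) for one record pair.

    Illustrative sketch only: assumes binary field comparisons, conditional
    independence of fields, and fixed m/u probabilities strictly in (0, 1).
    """
    if not (len(agree) == len(m) == len(u)):
        raise ValueError("agree, m, and u must have the same length")
    weight = 0.0
    for a, m_k, u_k in zip(agree, m, u):
        if a:
            # Agreement on field k: log P(agree | match) / P(agree | non-match)
            weight += math.log(m_k / u_k)
        else:
            # Disagreement on field k: log P(disagree | match) / P(disagree | non-match)
            weight += math.log((1.0 - m_k) / (1.0 - u_k))
    return weight


if __name__ == "__main__":
    # Hypothetical fields (e.g. surname, birth year, region) with made-up m/u values.
    m = [0.95, 0.90, 0.80]
    u = [0.01, 0.10, 0.20]
    print(fs_match_weight([True, True, True], m, u))     # large positive: likely match
    print(fs_match_weight([False, False, False], m, u))  # negative: likely non-match
```

Pairs with large positive weights are evidence for a link; classical Fellegi-Sunter thresholds these weights, whereas a Bayesian treatment propagates uncertainty in m, u, and the link structure through posterior samples.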
Authors who are presenting talks have a * after their name.