
All Times EDT

Abstract Details

Activity Number: 355 - Advanced Bayesian Topics (Part 4)
Type: Contributed
Date/Time: Thursday, August 12, 2021, 10:00 AM to 11:50 AM
Sponsor: Section on Bayesian Statistical Science
Abstract #318889
Title: A Bayesian Approach to Streaming Multi-File Record Linkage
Author(s): Ian Taylor* and Andee Kaplan and Brenda Betancourt
Companies: Colorado State University and Colorado State University and University of Florida
Keywords: record linkage; online updating; MCMC; Fellegi-Sunter; streaming
Abstract:

Record linkage is the task of combining records from multiple files that refer to overlapping sets of entities when the records share no unique identifying field. In streaming record linkage, files arrive over time, and link estimates are desired after the arrival of each file. This setting arises, for example, in longitudinal surveys. The challenge in streaming record linkage is to update parameter estimates efficiently as new files arrive. We approach this problem from a Bayesian perspective, with estimates in the form of posterior samples of parameters. We present a method for updating link estimates after the arrival of a new file that is faster than rerunning MCMC from scratch. We extend the two-file Bayesian Fellegi-Sunter model of Sadinle (2017) to more than two files and apply the Sequential Markov Chain Monte Carlo method of Yang and Dunson (2013) to update posterior samples as files stream in. We apply this method to simulated data and to data from the Social Diagnosis Survey of Polish households.
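To make the Fellegi-Sunter building blocks concrete, here is a minimal sketch, not the authors' implementation. It assumes three hypothetical fields (`first`, `last`, `birth_year`), three hypothetical agreement levels (agree, partial agree, disagree) with a similarity cutoff of 0.8, and made-up m- and u-probabilities. A record pair is reduced to a discrete comparison vector, its log-likelihood-ratio weight is the sum of log(m/u) over fields, and, as in Dirichlet-prior Fellegi-Sunter models, the conjugate posterior for a field's m-probabilities given the current set of links simply adds the observed level counts to the prior. The bipartite-matching constraints and the sequential MCMC update are omitted.

```python
import math
from difflib import SequenceMatcher

# Hypothetical agreement levels: 0 = agree, 1 = partial agree, 2 = disagree.
def compare_field(a, b, cutoff=0.8):
    """Map a pair of field values to a discrete agreement level."""
    if a == b:
        return 0
    # SequenceMatcher.ratio() is a string similarity in [0, 1].
    return 1 if SequenceMatcher(None, a, b).ratio() >= cutoff else 2

def comparison_vector(r1, r2, fields):
    """Comparison vector gamma for a record pair, one level per field."""
    return [compare_field(r1[f], r2[f]) for f in fields]

def fs_weight(gamma, m, u):
    """Fellegi-Sunter log-likelihood ratio: sum over fields of log(m/u)."""
    return sum(math.log(m[f][g] / u[f][g]) for f, g in enumerate(gamma))

def dirichlet_posterior(prior, linked_gammas, field):
    """Dirichlet parameters for one field's m-probabilities given linked pairs."""
    counts = list(prior)
    for gamma in linked_gammas:
        counts[gamma[field]] += 1
    return counts

fields = ["first", "last", "birth_year"]  # hypothetical field names
# Hypothetical m (P(level | match)) and u (P(level | non-match)) per field.
m = [[0.85, 0.10, 0.05]] * 3
u = [[0.01, 0.04, 0.95]] * 3

a = {"first": "Jon", "last": "Smith", "birth_year": "1980"}
b = {"first": "John", "last": "Smith", "birth_year": "1980"}
gamma = comparison_vector(a, b, fields)
print(gamma, round(fs_weight(gamma, m, u), 3))
```

A large positive weight favours a link. In a streaming setting, only pairs involving records from the newly arrived file need new comparison vectors; comparisons among earlier files can be reused.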


Authors who are presenting talks have a * after their name.

Back to the full JSM 2021 program