Abstract Details

Activity Number: 285 - Probabilistic Record Linkage and Inference with Merged Data
Type: Topic Contributed
Date/Time: Tuesday, July 30, 2019 : 8:30 AM to 10:20 AM
Sponsor: Section on Statistics in Epidemiology
Abstract #304320
Title: Joint Record Linkage and Duplicate Detection via a Generative Prior on Partitions
Author(s): Serge Aleshin-Guendel* and Mauricio Sadinle
Companies: University of Washington and University of Washington
Keywords: Record Linkage; Duplicate Detection; Partitions; Priors; Multipartite Matchings

The problems of record linkage and duplicate detection have traditionally referred to distinct but related settings: record linkage refers to linking two data sources that contain no duplicates, and duplicate detection refers to detecting which records in a single data source are duplicates. In practice, however, it is common to encounter data sources that fall somewhere in between or beyond these two settings. We propose a new probabilistic model for the general problem of joint record linkage and duplicate detection that can handle such settings. In particular, we build upon previous comparison-based models and propose a prior on partitions that attempts to capture a generative process for partitions in the record linkage context. We examine the performance of our model on simulated data and illustrate how it accommodates settings outside of traditional record linkage and duplicate detection by linking data sources documenting human rights violations in El Salvador and homicides in Colombia.

Authors who are presenting talks have a * after their name.

Back to the full JSM 2019 program