All Times EDT

Abstract Details

Activity Number: 588 - GSS/SRMS/SSS Student Paper Award Winners
Type: Topic Contributed
Date/Time: Thursday, August 6, 2020 : 3:00 PM to 4:50 PM
Sponsor: Survey Research Methods Section
Abstract #311071
Title: Multifile Record Linkage and Duplicate Detection via a Structured Prior for Partitions
Author(s): Serge Aleshin-Guendel* and Mauricio Sadinle
Companies: University of Washington and University of Washington
Keywords: Partitions; Multipartite Matchings; Entity Resolution; Record Linkage; Duplicate Detection
Abstract:

Merging datafiles containing information on overlapping sets of entities is a challenging task in the absence of unique identifiers, and is further complicated when some entities are duplicated in the datafiles. Most approaches to this problem have focused on one of two settings: record linkage, which links two data sources assumed to be free of duplicates, or duplicate detection, which identifies the records in a single data source that are duplicates. However, it is common in practice to encounter data sources that fit somewhere in between or beyond these two settings. In this article, we propose a new Bayesian approach for this general setting of multifile record linkage and duplicate detection. We extend previous models for comparisons of fields between pairs of records to accommodate the multifile setting. We also use a novel partition parameterization to propose a structured prior for partitions, specific to the context of multifile record linkage and duplicate detection, that can flexibly incorporate prior information about the data collection processes of the files.
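
As a rough illustration of the general setting described in the abstract, the sketch below represents a linkage of records from several files as a partition, where each cluster holds the records assumed to refer to one entity. It is not the authors' model or parameterization. The toy records, the file labels "A", "B", and "C", and the helper names are all hypothetical and chosen only for this example.

```python
from collections import Counter

# Hypothetical toy data: each record is identified by (file_id, record_id).
records = [("A", 0), ("A", 1), ("B", 0), ("B", 1), ("C", 0)]

# A candidate partition of the records: one cluster per assumed entity.
partition = [
    {("A", 0), ("B", 0)},  # entity recorded in files A and B
    {("A", 1)},            # entity recorded only in file A
    {("B", 1), ("C", 0)},  # entity recorded in files B and C
]


def is_valid_partition(records, partition):
    """Check that the clusters are non-empty, disjoint, and cover every record."""
    seen = [r for cluster in partition for r in cluster]
    return (
        all(len(cluster) > 0 for cluster in partition)
        and len(seen) == len(set(seen))
        and set(seen) == set(records)
    )


def has_duplicates_within_file(partition):
    """Return True if any cluster holds two or more records from the same file."""
    for cluster in partition:
        per_file = Counter(file_id for file_id, _ in cluster)
        if any(count > 1 for count in per_file.values()):
            return True
    return False


def overlap_table(partition):
    """Count clusters by the set of files in which the entity appears."""
    return Counter(
        frozenset(file_id for file_id, _ in cluster) for cluster in partition
    )


print(is_valid_partition(records, partition))
print(has_duplicates_within_file(partition))
print(overlap_table(partition))
```

Under this representation, classical record linkage corresponds to partitions over two files in which no cluster contains two records from the same file, and duplicate detection corresponds to partitions of a single file. The multifile setting allows both kinds of structure at once.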


Authors who are presenting talks have a * after their name.

Back to the full JSM 2020 program