Abstract Details

Activity Number: 47 - Statistical Analysis of Linked Data
Type: Invited
Date/Time: Sunday, July 29, 2018 : 4:00 PM to 5:50 PM
Sponsor: Survey Research Methods Section
Abstract #326693 Presentation
Title: A Bayesian Approach for Deduplication, Record Linkage, and Inference with Linked Data
Author(s): Brunero Liseo* and Andrea Tancredi and Rebecca C. Steorts
Companies: Sapienza Università di Roma and Sapienza Università di Roma and Duke University
Keywords: Clustering; Entity Resolution; Official Statistics; Hit-and-Miss Model

We propose a Bayesian approach for performing record linkage and inference across multiple lists while simultaneously detecting duplicates.

We frame the linkage problem as a clustering task in which similar records are clustered to true latent individuals. We propose a statistical model that incorporates both the linking and the inferential processes, including the features of the records as well as the variables needed for inference. Central to our approach is the observation that the prior over the space of linkages can be written as a random partition model and hence can be used to calibrate the prior distribution on the cluster assignment of records. By jointly modeling the record linkage and the inferential process, one can account for matching uncertainty in inferential procedures based on linked data. Moreover, the joint model creates a feedback mechanism through which the information provided by the working statistical model informs the record linkage process. This feedback mechanism is essential for eliminating potential biases that can jeopardize the resulting post-linkage inference. We apply our methodology to the case of multiple regression.
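
To illustrate what a prior over linkages expressed as a random partition model can look like, the sketch below uses the Chinese restaurant process (CRP). This choice is an assumption made purely for illustration: the abstract does not say which random partition model the authors use, and the function names and the concentration parameter alpha are hypothetical. Each record receives a cluster label, and records sharing a label are treated as referring to the same latent individual.

```python
import math
import random
from collections import Counter


def sample_crp_partition(n_records, alpha, rng):
    """Draw cluster labels for n_records from a CRP(alpha) prior.

    Illustrative assumption: the CRP stands in for an unspecified
    random partition model over linkage structures.
    """
    labels = []
    sizes = []  # sizes[k] = number of records currently in cluster k
    for _ in range(n_records):
        # An existing cluster k has weight sizes[k]; a new cluster has weight alpha.
        weights = sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(1)  # the record starts a new latent individual
        else:
            sizes[k] += 1  # the record joins an existing latent individual
        labels.append(k)
    return labels


def crp_log_prior(labels, alpha):
    """Log prior probability of the partition encoded by labels under CRP(alpha)."""
    sizes = list(Counter(labels).values())
    n = len(labels)
    # Numerator: alpha^K * prod_k (n_k - 1)!, using lgamma(n_k) = log((n_k - 1)!).
    log_num = len(sizes) * math.log(alpha) + sum(math.lgamma(s) for s in sizes)
    # Denominator: alpha * (alpha + 1) * ... * (alpha + n - 1).
    log_den = sum(math.log(alpha + i) for i in range(n))
    return log_num - log_den
```

Under this assumed prior, a small alpha favors few large clusters (many records per individual) and a large alpha favors many singletons (few duplicates or cross-list matches), which is one way the prior on cluster assignments could be calibrated.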

Authors who are presenting talks have a * after their name.

Back to the full JSM 2018 program