Abstract Details

Activity Number: 254 - Contributed Poster Presentations: Section on Statistical Learning and Data Science
Type: Contributed
Date/Time: Monday, July 30, 2018 : 2:00 PM to 3:50 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #329710
Title: A Generalized Fellegi-Sunter Framework for Unsupervised Collective Record Linkage in Clustered Relational Data with Applications to Electronic Health Records
Author(s): Nicole Solomon* and Sean M O'Brien and Joseph Lucas
Companies: Duke University Medical Center and Duke University Medical Center and Duke University
Keywords: data linkage; entity resolution; EM algorithm; identifiability; mixture model

"Big data" in healthcare involves the statistical analysis of electronic health records and clinical registries. Critical to such research is the accurate identification of records that pertain to the same individual in different databases. This task is challenging when the data are prone to recording errors and when unique identifiers are not available in each database. We describe a simple framework to link records across databases using record attributes in conjunction with relational evidence. The proposed approach improves upon existing methodology by modeling dependencies in clustered data and by producing collective rather than independent match decisions. We derive a decision rule that is optimal when the true matching probabilities are known, and we show that these probabilities can be estimated without labeled training data under assumptions that are less restrictive than those of existing record linkage models. We apply our method to linking three randomized clinical trials to Medicare claims data and demonstrate its superiority over the current standard method using Monte Carlo simulations based on real study data.
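
For readers unfamiliar with the baseline that the abstract generalizes, the following is a minimal sketch of the classical Fellegi-Sunter mixture model fitted by EM without labeled data. It is not the authors' collective method: it assumes binary field-agreement indicators, conditional independence of fields given match status, and independent decisions per record pair, which are the restrictions the abstract says the proposed framework relaxes. The function names (`simulate_pairs`, `fellegi_sunter_em`), the simulated data, and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def simulate_pairs(n_pairs=5000, p=0.1, m=(0.9, 0.9, 0.9, 0.9),
                   u=(0.1, 0.1, 0.1, 0.1), seed=0):
    """Simulate binary agreement vectors for candidate record pairs.

    All values here are illustrative assumptions, not study data.
    p: share of pairs that are true matches.
    m[k]: P(field k agrees | match); u[k]: P(field k agrees | non-match).
    """
    rng = np.random.default_rng(seed)
    m, u = np.asarray(m), np.asarray(u)
    is_match = rng.random(n_pairs) < p
    # Pick the per-field agreement probability according to true status.
    probs = np.where(is_match[:, None], m, u)
    gamma = (rng.random((n_pairs, m.size)) < probs).astype(int)
    return gamma, is_match


def fellegi_sunter_em(gamma, n_iter=200, p_init=0.1, m_init=0.9, u_init=0.1):
    """Fit the two-class conditional-independence mixture by EM.

    gamma: (n_pairs, n_fields) array of 0/1 agreement indicators.
    Returns (p, m, u, post), where post[i] is the estimated
    probability that pair i is a match.
    """
    gamma = np.asarray(gamma, dtype=float)
    n_fields = gamma.shape[1]
    eps = 1e-6  # keeps probabilities away from 0 and 1
    p = p_init
    m = np.full(n_fields, m_init)
    u = np.full(n_fields, u_init)
    for _ in range(n_iter):
        # E-step: posterior match probability for each pair.
        log_match = np.log(p) + gamma @ np.log(m) + (1 - gamma) @ np.log(1 - m)
        log_non = np.log(1 - p) + gamma @ np.log(u) + (1 - gamma) @ np.log(1 - u)
        post = 1.0 / (1.0 + np.exp(log_non - log_match))
        # M-step: posterior-weighted proportions.
        p = np.clip(post.mean(), eps, 1 - eps)
        m = np.clip(post @ gamma / max(post.sum(), eps), eps, 1 - eps)
        u = np.clip((1 - post) @ gamma / max((1 - post).sum(), eps), eps, 1 - eps)
    return p, m, u, post


if __name__ == "__main__":
    gamma, is_match = simulate_pairs()
    p, m, u, post = fellegi_sunter_em(gamma)
    print("estimated match proportion:", round(float(p), 3))
    print("estimated m:", np.round(m, 3))
    print("estimated u:", np.round(u, 3))
    print("share of pairs classified correctly at 0.5:",
          round(float(((post > 0.5) == is_match).mean()), 3))
```

In this sketch each pair is declared a match when its posterior probability exceeds 0.5. The starting values place `m` above `u`, which fixes the labeling of the two mixture components; with fewer than three fields the model is not identifiable without further constraints.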

Authors who are presenting talks have a * after their name.

Back to the full JSM 2018 program