Activity Number: 254 - Contributed Poster Presentations: Section on Statistical Learning and Data Science
Type: Contributed
Date/Time: Monday, July 30, 2018 : 2:00 PM to 3:50 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #329710
Title: A Generalized Fellegi-Sunter Framework for Unsupervised Collective Record Linkage in Clustered Relational Data with Applications to Electronic Health Records
Author(s): Nicole Solomon* and Sean M O'Brien and Joseph Lucas
Companies: Duke University Medical Center and Duke University Medical Center and Duke University
Keywords: data linkage; entity resolution; EM algorithm; identifiability; mixture model
Abstract:
"Big data" in healthcare involves the statistical analysis of electronic health records and clinical registries. Critical to such research is the accurate identification of records pertaining to identical individuals in different databases. This task is challenging when the data are prone to recording errors and when unique identifiers are not available in each database. We describe a simple framework to link records across databases using record attributes in conjunction with relational evidence. The proposed approach improves upon existing methodology by modeling in dependencies in clustered data and producing collective instead of independent match decisions. We derive a decision rule that is optimal under the availability of true matching probabilities and show that matching probabilities can be estimated without labeled training data using assumptions that are less restrictive compared to existing record linkage models. We apply our method to linking three randomized clinical trials to Medicare claims data and demonstrate its superiority over the current standard method using Monte Carlo simulations based on real study data.
Authors who are presenting talks have a * after their name.