Online Program

Causes and Consequences of Data Linkage Errors: False and Missed Matches Following Linkage of Hospital Data

*Gareth Hagger-Johnson, UCL, Dept of Epidemiology & Public Health 


OBJECTIVES. To quantify the expected data linkage error rate in Hospital Episode Statistics (HES) by testing the HESID pseudonymization linkage algorithm against a gold standard in a Paediatric Intensive Care Audit Network database covering 33 pediatric intensive care units in the UK (2004–2014).

DESIGN. Repeated admissions were classified as true matches, false matches, missed matches, or true non-matches against an independent gold standard. Outcome measures were the proportions of admissions that were false matches or missed matches following linkage.
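
The four-way classification described above can be illustrated with a minimal sketch. It is not the study's procedure: it assumes each admission carries a gold-standard patient identifier and an algorithm-assigned identifier, and it classifies pairs of admissions rather than individual admissions. The names `classify_pairs` and `error_rates`, the example identifiers, and the choice of denominators are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def classify_pairs(records):
    """Classify every pair of admissions against the gold standard.

    records: list of (gold_id, linked_id) tuples, one per admission.
    gold_id is the true patient identity; linked_id is the identifier
    assigned by the linkage algorithm under test (both hypothetical).
    """
    counts = Counter()
    for (gold_a, link_a), (gold_b, link_b) in combinations(records, 2):
        same_gold = gold_a == gold_b
        same_link = link_a == link_b
        if same_gold and same_link:
            counts["true_match"] += 1
        elif same_link:
            # Linked together, but the gold standard says different patients.
            counts["false_match"] += 1
        elif same_gold:
            # Same patient in the gold standard, but the algorithm split them.
            counts["missed_match"] += 1
        else:
            counts["true_non_match"] += 1
    return counts


def error_rates(counts):
    """Pair-level error rates (denominators are an assumption of this sketch)."""
    linked = counts["true_match"] + counts["false_match"]
    gold = counts["true_match"] + counts["missed_match"]
    return {
        "false_match_rate": counts["false_match"] / linked if linked else 0.0,
        "missed_match_rate": counts["missed_match"] / gold if gold else 0.0,
    }


if __name__ == "__main__":
    # Five invented admissions: patient p1 is split across h1/h2 (missed
    # matches), and patients p2 and p3 are merged under h3 (false match).
    example = [("p1", "h1"), ("p1", "h1"), ("p1", "h2"), ("p2", "h3"), ("p3", "h3")]
    counts = classify_pairs(example)
    print(dict(counts))
    print(error_rates(counts))
```

Pair-level rates are only one convention; the outcome measures in this abstract are expressed as proportions of admissions, so the figures from such a sketch would not be directly comparable with the reported results.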

RESULTS. The HESID pseudonymization algorithm produced 0.2% false matches and 7.3% missed matches. The true readmission rate was underestimated by 7.0%. Males from Asian/Black ethnic groups and patients with missing data were more likely to experience a false match. Missed matches were more common among younger patients, ethnic minorities living in high socioeconomic deprivation, and patients with missing data.

CONCLUSIONS. The deterministic linkage algorithm has a high missed match rate, which leads to underestimation of the readmission rate. To reduce linkage error, pseudonymization algorithms should be validated against a high-quality gold standard.