Online Program

Validity of Deterministic and Probabilistic Record Linkage Using Multiple Indirect Personal Identifiers: Linking a Large Registry to Claims Data

*Soko Setoguchi, Duke Clinical Research Institute 
Ying Zhu, The University of Tokyo 
Chih-Ying Chen, Brigham and Women’s Hospital 


Record linkage improves data quality for database studies in pharmacoepidemiology, comparative effectiveness research, and health services and outcomes research. Deterministic and probabilistic linkage are two common methods for merging databases from different sources. However, databases available to general researchers often lack direct personal identifiers, and the validity of record linkage using multiple indirect personal identifiers is not well understood. Using a large national cardiovascular device registry and 100% Medicare inpatient data, we linked hospitalization-level records. The main outcomes were the validity measures of several deterministic and probabilistic linkage rules using multiple indirect personal identifiers, compared with rules using both direct and indirect personal identifiers. We assessed the performance of deterministic and probabilistic linkage rules without direct personal identifiers. When linking hospitalization-level records with no direct personal identifiers, provider information is needed for successful linkage. The choice between deterministic and probabilistic linkage depends on the quality of the databases and on the diagnoses and procedures that define the cohorts.
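
To make the distinction between the two methods concrete, the following is a minimal sketch, not the linkage rules evaluated in this study. It assumes four hypothetical indirect identifiers (birth date, sex, admission date, provider ID) and made-up agreement probabilities. The deterministic rule requires exact agreement on every field. The probabilistic rule sums Fellegi-Sunter style log-likelihood weights and accepts pairs above an assumed threshold.

```python
import math

# Hypothetical indirect identifiers; the field names are illustrative only.
FIELDS = ("birth_date", "sex", "admit_date", "provider_id")

# Assumed m-probabilities (field agrees among true matches) and
# u-probabilities (field agrees among non-matches). These values are
# invented for illustration and are not estimated from any data.
M = {"birth_date": 0.98, "sex": 0.99, "admit_date": 0.95, "provider_id": 0.97}
U = {"birth_date": 0.0001, "sex": 0.5, "admit_date": 0.003, "provider_id": 0.001}


def deterministic_match(a, b, fields=FIELDS):
    """Link two records only if every field agrees exactly."""
    return all(a[f] == b[f] for f in fields)


def match_weight(a, b, fields=FIELDS):
    """Sum log2 agreement or disagreement weights over all fields."""
    total = 0.0
    for f in fields:
        if a[f] == b[f]:
            total += math.log2(M[f] / U[f])              # agreement weight
        else:
            total += math.log2((1 - M[f]) / (1 - U[f]))  # disagreement weight
    return total


def probabilistic_match(a, b, threshold=10.0):
    """Link two records if the total weight reaches an assumed threshold."""
    return match_weight(a, b) >= threshold


registry = {"birth_date": "1940-05-17", "sex": "F",
            "admit_date": "2008-03-02", "provider_id": "P001"}
claim_exact = dict(registry)
# Same hospitalization, but the admission date is off by one day.
claim_shifted = dict(registry, admit_date="2008-03-03")
# A different patient at a different provider.
claim_other = {"birth_date": "1951-11-30", "sex": "F",
               "admit_date": "2008-07-19", "provider_id": "P042"}

for name, claim in [("exact", claim_exact), ("shifted", claim_shifted),
                    ("other", claim_other)]:
    print(name, deterministic_match(registry, claim),
          probabilistic_match(registry, claim))
```

In this sketch, the record with a shifted admission date fails the deterministic rule but still passes the probabilistic one, because agreement on the remaining fields outweighs the single disagreement. This illustrates why the preferred method can depend on how clean the identifying fields are.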