Online Program

Tuesday, January 7
Tue, Jan 7, 9:00 AM - 10:45 AM
East Coast Ballroom
Innovations in Missing Data and Record Linkage

Using Synthetic Data to Replace Linkage Derived Elements, a Case Study (306675)

Lisa B. Mirel, CDC/NCHS/OAE/SPB 
*Dean M. Resnick, N.O.R.C. at the University of Chicago 

Keywords: Record linkage, Synthetic data, Disclosure Avoidance, Survival Analysis

While record linkage can expand the analyses that survey microdata support, it also increases the risk of privacy-compromising disclosure. One way to mitigate this risk is to replace the information added through linkage with synthetic data elements. To assess this approach, we conducted a case study using the National Hospital Care Survey (NHCS), which collects patient records from a sample of U.S. hospitals. The NHCS data were linked to the National Death Index (NDI) to enhance the survey. The added NDI information enables survival analyses related to hospitalization, but because it includes dates of death and detailed causes of death, joining it to the anonymized patient records entails a risk of re-identification (albeit only for deceased persons). For this reason, we tested an approach to developing synthetic data that uses a model estimated from survival analysis to replace actual dates of death with synthesized ones, and uses classification tree analysis to replace actual causes of death with synthesized ones. This paper will present the results of the case study, which are evaluated by comparing survival analysis parameter estimates.
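
As an illustration only, the date-of-death step might be sketched as follows. The abstract does not say which survival model was used, so this sketch assumes a constant-hazard (exponential) model fitted by maximum likelihood, and it draws each synthetic time to death conditional on the death falling inside the follow-up window. The function names, the 365-day window, and the toy data are all hypothetical, and the classification tree step for cause of death is not shown.

```python
import math
import random


def fit_exponential_hazard(durations, died):
    """Maximum-likelihood constant hazard: number of deaths / total follow-up time."""
    total_time = sum(durations)
    deaths = sum(1 for d in died if d)
    if total_time <= 0 or deaths == 0:
        raise ValueError("need positive follow-up time and at least one death")
    return deaths / total_time


def synthesize_days_to_death(hazard, max_days, rng):
    """Inverse-transform draw from the exponential model, truncated to [0, max_days)."""
    # Probability of dying within the follow-up window under the fitted model.
    p_window = 1.0 - math.exp(-hazard * max_days)
    u = rng.random()  # uniform on [0, 1)
    return -math.log(1.0 - u * p_window) / hazard


# Toy data (hypothetical): days from discharge to death or end of follow-up,
# and whether a linked death record exists.
durations = [30, 120, 365, 200, 45, 365]
died = [True, True, False, True, True, False]

hazard = fit_exponential_hazard(durations, died)
rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Replace each actual time to death with a synthetic one; survivors stay as None.
synthetic = [
    synthesize_days_to_death(hazard, 365, rng) if d else None
    for d in died
]
```

Comparing the hazard re-estimated from `synthetic` against `hazard` would mirror, in miniature, the evaluation by parameter estimates that the abstract describes.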