295 – SPEED: Big Data, Small Area Estimation, and Methodological Innovations Under Development, Part 1
Re-Examining File-Level Re-Identification Risk Assessment for Survey Microdata
Tom Krenzke
Westat
Jianzhu Li
Westat
Lin Li
Westat
Natalie Shlomo
University of Manchester
In this paper we discuss practical issues encountered when estimating file-level measures of re-identification disclosure risk in survey microdata. We typically use the log-linear modeling approach (Skinner and Shlomo, 2008) to estimate disclosure risk in survey microdata files. Several challenges emerge, relating to satisfying the goodness-of-fit criteria of the log-linear models when model assumptions are violated, and to handling large numbers of variables. For the former, we explore several approaches to improving the fit of log-linear models, particularly under complex survey designs and differential survey weights. For the latter, we provide guidance on variable selection, with insights on how to proceed with the risk assessment and obtain meaningful results. We use the National Science Foundation's Survey of Doctorate Recipients data as a case study. The results of evaluating the disclosure risk under several approaches lead to guidance for a sensitivity analysis that helps provide a better estimate of the file-level risk of re-identification in survey microdata.
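As a rough illustration of the kind of calculation the log-linear approach involves, the sketch below computes two file-level risk measures commonly associated with the Poisson log-linear framework: the expected number of sample uniques that are also population uniques, and the expected number of correct matches for sample uniques. It is a minimal sketch under strong assumptions that are not taken from the paper: a two-way table of key variables, a main-effects (independence) log-linear model fitted in closed form, and a single equal sampling fraction, which is exactly the simplification that complex designs and differential weights violate. The function names `independence_fit` and `file_level_risk` are hypothetical.

```python
import numpy as np


def independence_fit(table):
    """Fitted cell means under a main-effects (independence) log-linear model.

    For a two-way table the maximum likelihood fit has the closed form
    (row total * column total) / grand total.
    """
    table = np.asarray(table, dtype=float)
    n = table.sum()
    row = table.sum(axis=1, keepdims=True)
    col = table.sum(axis=0, keepdims=True)
    return row @ col / n


def file_level_risk(sample_counts, fitted_means, sampling_fraction):
    """Return (tau1, tau2) summed over sample-unique cells.

    Assumes population counts F_k ~ Poisson(lambda_k) and Bernoulli sampling
    with a common fraction pi, so that for a sample unique the number of
    unsampled population units in the cell is Poisson with mean
    rate_k = mu_k * (1 - pi) / pi, where mu_k is the fitted sample mean.

    tau1 sums P(F_k = 1 | f_k = 1) = exp(-rate_k).
    tau2 sums E(1 / F_k | f_k = 1) = (1 - exp(-rate_k)) / rate_k.
    """
    f = np.asarray(sample_counts)
    mu = np.asarray(fitted_means, dtype=float)
    pi = float(sampling_fraction)

    uniques = f == 1
    rate = mu[uniques] * (1.0 - pi) / pi

    p_population_unique = np.exp(-rate)
    # Guard the rate -> 0 limit (a census), where the match probability is 1.
    safe_rate = np.where(rate > 0, rate, 1.0)
    p_correct_match = np.where(rate > 0, -np.expm1(-safe_rate) / safe_rate, 1.0)

    return p_population_unique.sum(), p_correct_match.sum()


# Hypothetical 2 x 2 table of sample counts over two key variables.
sample = np.array([[1, 3],
                   [2, 4]])
fitted = independence_fit(sample)
tau1, tau2 = file_level_risk(sample, fitted, sampling_fraction=0.5)
print(tau1, tau2)
```

Re-running such a calculation under alternative model specifications or sets of key variables is one simple way to see how sensitive the file-level estimates are to the modeling choices.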