346 – Disclosure, Confidentiality, Privacy
An Evaluation of the Impact of Missing Data on Disclosure Risk
Thomas Krenzke
Westat
Jianzhu Li
Westat
Lin Li
Westat
This paper focuses on measuring disclosure risk when key identifying variables contain missing data. It is well known that combining identifying variables can lead to the identification of an individual. A record that is unique in the sample on a set of identifiers may not be a 'true' unique if at least one other record matches it on the non-missing subset of variables, because it is unknown whether the true values also match on the variables with missing values. Missing values therefore provide some protection through the uncertainty about their true values, but it is unclear how much. In addition, available software packages handle missing data differently when measuring disclosure risk. In this paper we describe an approach to help gauge the impact of missing data on disclosure risk. We conduct an empirical investigation on public use data, followed by a simulation for further evaluation.
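The distinction between sample uniques and 'true' uniques can be illustrated with a minimal sketch. It is not the approach described in the paper, only a toy comparison of two conventions for counting sample uniques: treating a missing value as its own category, and treating it as compatible with any value. The function names, the use of `None` for a missing value, and the five example records are all hypothetical.

```python
from collections import Counter

MISSING = None  # assumed marker for a missing value (hypothetical convention)


def strict_uniques(records):
    """Indices of records that are unique when a missing value is
    treated as its own category."""
    counts = Counter(records)
    return [i for i, r in enumerate(records) if counts[r] == 1]


def compatible(a, b):
    """True if two records agree on every variable that is
    non-missing in both of them."""
    return all(x is MISSING or y is MISSING or x == y for x, y in zip(a, b))


def wildcard_uniques(records):
    """Indices of records that have no other compatible record, i.e.
    a missing value is allowed to match any value."""
    return [
        i
        for i, r in enumerate(records)
        if not any(compatible(r, s) for j, s in enumerate(records) if j != i)
    ]


# Hypothetical records on three identifying variables: sex, age group, occupation.
records = [
    ("F", "30-39", "nurse"),
    ("F", "30-39", MISSING),
    ("M", "40-49", "teacher"),
    ("M", "40-49", "teacher"),
    ("M", MISSING, "engineer"),
]

print(strict_uniques(records))    # [0, 1, 4]
print(wildcard_uniques(records))  # [4]
```

In this toy data, records 0 and 1 are sample uniques under the first convention, but they match on the non-missing subset, so neither is necessarily a true unique. The gap between the two counts gives a rough sense of how much the measured risk depends on the way missing values are handled.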