Keywords: Re-identification Risk, Risk Metric, Divergence-Based Estimation Methods, Zero-Inflated Models
Estimating the risk of re-identifying individuals from de-identified healthcare data is critical because of regulatory and contractual restrictions. In recent years, statistical models and metrics have been developed to understand and evaluate this risk. However, when the statistical model is misspecified, the behavior of the risk estimates and their implications for policy are largely unknown. In this presentation, we provide new composite metrics, statistical models, and methods for estimating the risk of re-identification from de-identified data, and we study the theoretical properties of the metrics. We provide numerical algorithms for estimating the risk and evaluate the effects of model misspecification. We illustrate our findings with a detailed description of existing policy, using several publicly available healthcare datasets.