Abstract:
In the context of survey sampling, Rubin (1993) proposed releasing multiply imputed synthetic datasets in which the sensitive target values are replaced by values drawn from their posterior predictive distributions under proper imputation models. However, information loss due to misspecified imputation models can weaken or even invalidate inferences drawn from the synthetic datasets. In this talk, we discuss a new masking framework based on data augmentation that has the potential to remedy this issue. The new framework guarantees valid inferences from the synthetic datasets, and it allows data users to obtain their desired level of data utility while satisfying the disclosure requirements set by agencies. It can also be extended and combined with existing methods to accommodate different levels of disclosure protection and further optimize the utility-risk profile. We demonstrate through simulations and an illustrative example that the proposed framework outperforms the classical multiple imputation (MI) approach, preserving more data utility while providing similar or even better protection against disclosure.