Abstract:
|
Statistical disclosure control (SDC) methods are a class of privacy- and utility-preserving techniques that deliberately perturb the original data before public release. The goal of SDC methods is to reduce disclosure risk to an acceptable level while releasing public-use data sets (known as synthetic data sets) that still preserve as much of the information in the original data set as possible. In this work, we investigate a mixture-based multiple imputation synthesis method that applies different degrees of perturbation to records (individuals) at different levels of disclosure risk. The first step of the method uses the concept of k-anonymity proposed by Sweeney (2002) to divide individuals into subgroups with different disclosure risk levels, based on prespecified risk thresholds. Then, through a data augmentation step, we introduce a tuning mechanism for building the imputation models, which further controls information loss and hence provides different levels of protection to individuals in different risk subgroups. We illustrate the proposed method with a simulation study and a real data application.
|