Abstract:
|
First introduced by Gouweleeuw et al., the Post-RAndomization Method (PRAM) is a technique for perturbing categorical data to protect confidentiality. It has several desirable properties, such as preserving specified variables and summary statistics, and it can prevent undesired changes through grouping. Nayak et al. combined this method with recent results from Shlomo and Skinner and extended it to inverse frequency post-randomization (IFPR), a method that can produce perturbed data with a theoretical upper bound on the risk of identity disclosure while preserving all the properties of invariant PRAM. In this paper, we revisit IFPR and conduct an extensive study on American Community Survey data, comparing the effects of IFPR and data swapping, another confidentiality protection method for categorical data, on the empirical risk of identity disclosure and on data utility.
|