Keywords: differential privacy, privacy budget, parametric DIPS, statistical disclosure limitation
Voter registration data is important in political science research and in applications such as studying youth voter turnout and predicting presidential election outcomes. Such data often contains sensitive information about the individuals it records. One way to mitigate the privacy concern is to remove identifiers from the released data. However, a data intruder can still expose personal information about individuals in the “anonymized” data by linking it to other public data sets, such as healthcare data or the Personal Genome Project data. DIfferentially Private Data Synthesis (DIPS) techniques produce synthetic data, or pseudo individual records, at a preset level of privacy protection. Although DIPS provides a strong and robust privacy guarantee, statistical inferences drawn from the synthetic data can be poor because of the large amount of noise added to the data. We propose a new approach called Statistical Allocation for Epsilon (SAFE), which allocates the privacy budget based on the statistical significance of the data’s parameters, and apply it to voter registration data. In our simulation study, SAFE yields better statistical inferences than DIPS alone.