Activity Number: 472 - Winners: Business and Economic Statistics Student Paper Awards
Type: Topic Contributed
Date/Time: Wednesday, August 10, 2022 : 2:00 PM to 3:50 PM
Sponsor: Business and Economic Statistics Section
Abstract #322188
Title: Differentially Private Heavy-Tailed Synthetic Data
Author(s): Tran Tran* and Matthew Reimherr and Aleksandra B. Slavkovic
Companies: The Pennsylvania State University and Penn State University and Penn State University
Keywords: differential privacy; synthetic data; heavy-tailed data; longitudinal business database
Abstract:
The Longitudinal Business Database from the U.S. Census Bureau is an invaluable resource for economic research, but it contains a large amount of sensitive information about all U.S. firms. This warrants releasing a synthetic version of the data that protects firms' privacy while remaining usable for research. Differential privacy (DP) provides a framework for strong, provable privacy protection against arbitrary adversaries while allowing the release of summary statistics and synthetic data. However, generating synthetic heavy-tailed data with a formal privacy guarantee while preserving high levels of utility remains a challenge for data curators and researchers. We propose using the K-Norm Gradient Mechanism (KNG) in the setting of quantile regression to generate DP synthetic data. The proposed methodology offers the flexibility of the well-known exponential mechanism while adding less noise. We also propose implementing KNG in a stepwise and a sandwich order, in which each new quantile estimate relies on previously sampled quantiles, to use the privacy-loss budget more efficiently. We show that the proposed methods can achieve better data utility relative to the original KNG at
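As an illustration of the idea only, and not the authors' implementation, the following minimal sketch applies a KNG-style sampler to the simplest quantile regression case: a single quantile with no covariates. It assumes the data are clipped to a known interval, that neighboring datasets differ by replacing one record (so the check-loss subgradient changes by at most 1), and that sampling on a finite grid is an acceptable approximation to the continuous density. The function name `kng_quantile` and all of its parameters are hypothetical.

```python
import numpy as np

def kng_quantile(x, tau, epsilon, lower, upper, grid_size=2001, rng=None):
    """Sample one private quantile with a KNG-style density (illustrative sketch).

    Candidates theta are weighted by exp(-epsilon * |grad(theta)| / (2 * sensitivity)),
    where grad is a subgradient of the check (pinball) loss summed over the data.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Assumption: values are clipped to a publicly known range [lower, upper].
    x = np.sort(np.clip(np.asarray(x, dtype=float), lower, upper))
    # Assumption: a finite grid approximates the continuous sampling density.
    grid = np.linspace(lower, upper, grid_size)
    # Subgradient of the summed check loss at theta: #{x_i <= theta} - n * tau.
    counts = np.searchsorted(x, grid, side="right")
    grad_norm = np.abs(counts - tau * x.size)
    # Assumption: replacing one record changes the subgradient by at most 1.
    sensitivity = 1.0
    log_w = -epsilon * grad_norm / (2.0 * sensitivity)
    log_w -= log_w.max()  # stabilize before exponentiating
    probs = np.exp(log_w)
    probs /= probs.sum()
    return float(rng.choice(grid, p=probs))

# Usage on simulated heavy-tailed data (Lomax/Pareto draws, purely synthetic).
rng = np.random.default_rng(0)
data = rng.pareto(2.0, size=5000)
private_median = kng_quantile(data, tau=0.5, epsilon=5.0, lower=0.0, upper=50.0, rng=rng)
```

The weight depends on the gradient of the loss rather than on the loss itself, which is what distinguishes this sampler from an exponential mechanism applied directly to the check loss. Sampling several quantiles this way would require splitting the privacy-loss budget among them; how the stepwise and sandwich orderings do so is not specified in the abstract and is not modeled here.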
Authors who are presenting talks have a * after their name.