Activity Number: 167 - Data Mining and Econometrics
Type: Contributed
Date/Time: Tuesday, August 10, 2021 : 10:00 AM to 11:50 AM
Sponsor: Social Statistics Section
Abstract #318745
Title: Generating Differentially Private Synthetic Heavy-Tailed Data
Author(s): Tran Tran* and Matthew Reimherr and Aleksandra Slavkovic
Companies: Pennsylvania State University and Penn State University and Penn State University
Keywords: differential privacy; synthetic data; heavy-tailed data
Abstract:
Despite the need for broad dissemination of sensitive microdata, generating differentially private synthetic heavy-tailed data with high levels of utility remains a challenge for researchers. Differential privacy (DP) provides a framework for strong, provable privacy protection against arbitrary adversaries while allowing the release of summary statistics and synthetic data. We propose using the K-Norm Gradient Mechanism (KNG) in the setting of quantile regression for DP synthetic data generation. The proposed methodology offers the flexibility of the well-known exponential mechanism while adding less noise. We also demonstrate how to improve privacy-loss budget usage by combining KNG with the so-called stepwise and sandwich schemes, in which quantile estimation relies on previously estimated DP quantiles. Through a simulation study and an application to the Synthetic Longitudinal Business Database (SynLBD), we demonstrate that the proposed methods can achieve good data utility with a low privacy-loss budget.
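As a rough illustration of the idea, the sketch below releases a single DP quantile with a KNG-style mechanism in the simplest case of no covariates. It is not the authors' implementation. It assumes the data are clipped to a public range [lower, upper], that neighboring datasets differ by swapping one record (so the gradient of the check loss changes by at most 1), and that the output is drawn from a density proportional to exp(-epsilon * |gradient| / (2 * sensitivity)). The function name `kng_quantile` and all parameter names are hypothetical. Because the gradient is piecewise constant between order statistics, the density can be sampled exactly by picking an interval and then a uniform point inside it.

```python
import numpy as np

def kng_quantile(x, tau, epsilon, lower, upper, rng=None):
    """Sketch of a KNG-style DP quantile release (hypothetical helper).

    Assumes lower < upper are public bounds and swap-one-record neighbors.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Clip to the public range and sort.
    x = np.sort(np.clip(np.asarray(x, dtype=float), lower, upper))
    n = x.size
    # Interval endpoints: lower, x_(1), ..., x_(n), upper (n + 1 intervals).
    edges = np.concatenate(([lower], x, [upper]))
    lengths = np.diff(edges)
    # On interval k, exactly k data points are <= theta, so the gradient of
    # the check loss has absolute value |k - n * tau|.
    grad_norm = np.abs(np.arange(n + 1) - n * tau)
    sensitivity = 1.0  # assumption: swapping one record moves the gradient by <= 1
    # Log-weight of each interval: log(length) - epsilon * |grad| / (2 * sensitivity).
    positive = lengths > 0
    log_w = np.full(n + 1, -np.inf)
    log_w[positive] = (np.log(lengths[positive])
                       - epsilon * grad_norm[positive] / (2.0 * sensitivity))
    log_w -= log_w.max()  # stabilize before exponentiating
    probs = np.exp(log_w)
    probs /= probs.sum()
    # Pick an interval, then a uniform point inside it.
    k = rng.choice(n + 1, p=probs)
    return rng.uniform(edges[k], edges[k + 1])

# Usage on simulated heavy-tailed data (Pareto draws); the bounds are assumed public.
rng = np.random.default_rng(0)
data = rng.pareto(2.0, size=1000)
private_median = kng_quantile(data, tau=0.5, epsilon=1.0, lower=0.0, upper=100.0, rng=rng)
```

Each call spends its own epsilon, so releasing several quantiles this way consumes the sum of the individual budgets under basic composition. The stepwise and sandwich schemes mentioned in the abstract are not reproduced here.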
Authors who are presenting talks have a * after their name.