All Times EDT

Abstract Details

Activity Number: 239 - Synthetic Data and Differential Privacy: Data, Privacy and the Public Good
Type: Invited
Date/Time: Tuesday, August 4, 2020 : 1:00 PM to 2:50 PM
Sponsor: Survey Research Methods Section
Abstract #308066
Title: Synthetic Microdata for Establishment Surveys Under Informative Sampling
Author(s): Hang Kim* and Joerg Drechsler and Katherine Jenny Thompson
Companies: University of Cincinnati and Institute for Employment Research and US Census Bureau
Keywords: disclosure risk; full synthesis; pseudo likelihood; survey weight; synthetic population
Abstract:

Many agencies are currently investigating whether synthetic microdata could be a viable dissemination strategy for highly sensitive data, such as business data, for which disclosure regulations otherwise prohibit the release of public use microdata. However, existing methods assume that the original data comprise a simple random sample from the target population, which limits their application to survey data with unequal survey weights. This paper discusses synthetic data generation under informative sampling. To utilize the design information in the survey weights, we rely on the pseudo likelihood approach when building a hierarchical model to estimate the distribution of the finite population. Synthetic populations are then randomly drawn from the estimated finite population density. Using simulation studies, we show that the proposed synthetic data approach offers high utility for design- and model-based analyses while offering disclosure protection. We apply it to a subset of the 2012 U.S. Economic Census and evaluate the results with utility metrics and disclosure avoidance metrics under data attacker scenarios commonly used for business data.
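
The pseudo likelihood idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' hierarchical model: it assumes a single normal distribution for one variable, a made-up finite population, and a hypothetical inclusion rule (`inclusion_prob`) under which larger units are more likely to be sampled. Each sampled unit's log-likelihood contribution is weighted by its survey weight, which for a normal model reduces to a weighted mean and variance. A synthetic population is then drawn from the fitted distribution. All names and numbers below are illustrative assumptions.

```python
import math
import random


def pseudo_mle_normal(values, weights):
    """Maximize sum_i w_i * log N(y_i | mu, var): weighted mean and variance."""
    total_w = sum(weights)
    mu = sum(w * y for y, w in zip(values, weights)) / total_w
    var = sum(w * (y - mu) ** 2 for y, w in zip(values, weights)) / total_w
    return mu, var


def inclusion_prob(y):
    """Hypothetical informative design: inclusion probability grows with y."""
    return min(1.0, 0.02 * math.exp(0.5 * (y - 10.0)))


rng = random.Random(2020)

# Made-up finite population (e.g. log revenue of N establishments).
N = 20000
population = [rng.gauss(10.0, 1.0) for _ in range(N)]
true_mean = sum(population) / N

# Poisson sampling: each unit is included independently with its own probability.
sample_y, sample_w = [], []
for y in population:
    pi = inclusion_prob(y)
    if rng.random() < pi:
        sample_y.append(y)
        sample_w.append(1.0 / pi)  # survey weight = inverse inclusion probability

# Ignoring the weights treats the sample as a simple random sample.
naive_mean = sum(sample_y) / len(sample_y)

# The pseudo likelihood fit uses the weights to target the population distribution.
mu_hat, var_hat = pseudo_mle_normal(sample_y, sample_w)

# Draw one synthetic population from the fitted model; its size is the
# weight-based estimate of the population size.
n_hat = round(sum(sample_w))
sd_hat = math.sqrt(var_hat)
synthetic = [rng.gauss(mu_hat, sd_hat) for _ in range(n_hat)]

print("population mean:", round(true_mean, 3))
print("unweighted sample mean:", round(naive_mean, 3))
print("pseudo likelihood mean:", round(mu_hat, 3))
print("synthetic population size:", len(synthetic))
```

In this toy setup the unweighted sample mean overstates the population mean because large units are oversampled, while the weighted fit corrects for the design. A fully Bayesian or hierarchical version, as the abstract describes, would replace the closed-form estimates with posterior draws.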


Authors who are presenting talks have a * after their name.

Back to the full JSM 2020 program