Activity Number: 82 - Contributed Poster Presentations: Government Statistics Section
Type: Contributed
Date/Time: Monday, August 3, 2020 : 10:00 AM to 2:00 PM
Sponsor: Government Statistics Section
Abstract #312767
Title: Generating Fully-Synthetic Discrete Data
Author(s): Sixia Chen and Allshine Chen* and Daniel Zhao
Companies: University of Oklahoma Health Sciences Center and University of Oklahoma, Health Sciences Center and University of Oklahoma Health Sciences Center
Keywords: fully-synthetic; discrete; multiple-imputation; survey sampling; statistical disclosure
Abstract:
Fully-synthetic data is becoming increasingly prevalent as demand grows for sharing data in private and public domains. The two key measures that must be addressed when creating synthetic data are utility and disclosure risk. For discrete data, there is no well-accepted method of synthesis, nor of quantifying utility and risk. In our study, we use generalized additive modeling with multiple imputation to create fully-synthetic data. We compare our results with those of random forest and classification and regression tree (CART) methods, using both simulated and real data as template data. We also describe how we calculate risk and utility for fully-synthetic discrete data.
Authors who are presenting talks have a * after their name.