
All Times EDT

Abstract Details

Activity Number: 232 - Health Policy Statistics Section Student Paper Award
Type: Topic Contributed
Date/Time: Tuesday, August 9, 2022 : 8:30 AM to 10:20 AM
Sponsor: Health Policy Statistics Section
Abstract #320999
Title: Bayesian Data Synthesis and the Utility-Risk Trade-Off for Mixed Epidemiological Data
Author(s): Joseph Feldman* and Daniel Kowal
Companies: Rice University and Rice University
Keywords: Generative Models; Synthetic Data; Mixed Data; Gaussian Copula; Data Privacy
Abstract:

Much of the micro data used in epidemiological studies contains sensitive measurements on real individuals. As a result, such micro data cannot be published because of privacy concerns, and without public access to these data, statistical analyses originally published on them are nearly impossible to reproduce. To promote the dissemination of key datasets for analysis without jeopardizing the privacy of individuals, we introduce a cohesive Bayesian framework for generating fully synthetic, high-dimensional micro datasets of mixed categorical, binary, count, and continuous variables. This process centers on a joint Bayesian model that is simultaneously compatible with all of these data types, enabling the creation of mixed synthetic datasets through posterior predictive sampling. The proposed techniques are deployed to create a synthetic version of a confidential dataset containing dozens of health, cognitive, and social measurements on nearly 20,000 North Carolina children. We then study the utility-risk trade-off of synthetic data dissemination.
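
The abstract does not spell out the model, but its keywords mention a Gaussian copula. As a rough illustration of how a copula can couple variables of different types, the sketch below synthesizes one count column and one continuous column. It is a simplified, non-Bayesian plug-in version and not the authors' method: the latent correlation is a single point estimate rather than a posterior draw, and all function names and the toy data are hypothetical.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: cdf() and inv_cdf()


def normal_scores(values):
    """Map a column to latent normal scores using mid-ranks (handles ties)."""
    n = len(values)
    scores = []
    for v in values:
        below = sum(1 for s in values if s < v)
        equal = sum(1 for s in values if s == v)
        u = (below + 0.5 * equal) / n  # strictly inside (0, 1)
        scores.append(STD_NORMAL.inv_cdf(u))
    return scores


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def empirical_quantile(sorted_vals, u):
    """Inverse empirical CDF; returns an observed value, so the type is kept.
    A real release would likely use a smoothed or model-based margin instead."""
    idx = min(int(u * len(sorted_vals)), len(sorted_vals) - 1)
    return sorted_vals[idx]


def synthesize(col_a, col_b, n_synth, rng):
    """Draw synthetic pairs from a two-variable Gaussian copula (plug-in fit)."""
    rho = pearson(normal_scores(col_a), normal_scores(col_b))
    rho = max(-1.0, min(1.0, rho))  # guard against floating-point overshoot
    sorted_a, sorted_b = sorted(col_a), sorted(col_b)
    out = []
    for _ in range(n_synth):
        # Correlated latent normals via a 2x2 Cholesky factor.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        out.append((
            empirical_quantile(sorted_a, STD_NORMAL.cdf(z1)),
            empirical_quantile(sorted_b, STD_NORMAL.cdf(z2)),
        ))
    return out


# Hypothetical toy data: a count variable and a continuous variable.
visits = [0, 1, 1, 2, 3, 0, 4, 2, 1, 5]
score = [9.1, 8.4, 8.9, 7.2, 6.5, 9.6, 5.8, 7.9, 8.1, 5.1]
synthetic = synthesize(visits, score, 1000, random.Random(0))
```

In a fully Bayesian treatment, the correlation and the margins would be drawn from their posterior before each synthetic dataset is generated, which is what posterior predictive sampling refers to. The sketch fixes both at point estimates only to keep the mechanics visible.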


Authors who are presenting talks have a * after their name.

Back to the full JSM 2022 program