Abstract Details

Activity Number: 632 - Advances in Statistical Disclosure Control Methodology
Type: Invited
Date/Time: Thursday, August 1, 2019 : 10:30 AM to 12:20 PM
Sponsor: SSC
Abstract #300160 Presentation
Title: Accounting for Longitudinal Data Structures When Disseminating Synthetic Data to the Public
Author(s): Joerg Drechsler* and Robin Mitra and Sana Rashid
Companies: Institute for Employment Research and University of Lancaster and Willis Towers Watson
Keywords: synthetic; longitudinal; fixed effects; random effects; confidentiality; disclosure
Abstract:

When generating synthetic data for public release, the synthesis model must be chosen carefully, since only those features that are incorporated in the model will be reflected in the generated data. If the dataset has a longitudinal structure, it is not obvious which synthesis model should be used to account for the design. In the context of multiple imputation for missing data, it has previously been shown that employing fixed effects at the imputation stage may adversely affect inferences obtained by an analyst who wishes to use random effects to account for the hierarchy, and vice versa. Since it is generally unknown which model users of the data will prefer, the synthesis model should suit both analysis models. We evaluate several strategies for generating longitudinal synthetic datasets using extensive simulation studies. In our evaluations, we consider both the analytical validity and the risk of disclosure resulting from the different synthesis strategies. We find that synthesis models that cannot be classified as pure random effects or pure fixed effects models should be preferred. We illustrate our findings using data from the German IAB Establishment Panel.


Authors who are presenting talks have a * after their name.
