Abstract Details

Activity Number: 483
Type: Topic Contributed
Date/Time: Wednesday, August 3, 2016 : 8:30 AM to 10:20 AM
Sponsor: Survey Research Methods Section
Abstract #318883
Title: General and Specific Utility Measures for Synthetic Data
Author(s): Joshua Snoke* and Beata Nowok and Gillian Raab and Aleksandra Slavkovic and Chris Dibben
Companies: Penn State University and University of Edinburgh and University of Edinburgh and Penn State University and University of Edinburgh
Keywords: Synthetic Data ; Privacy ; Utility ; Confidentiality ; Reproducibility ; Public Use Files
Abstract:

The utility of a data set that has been altered to preserve confidentiality can be assessed with general or specific measures. General measures summarize differences between the distributions of the real and altered data, while specific measures compare the results of particular analyses run on the two data sets. We extend previous work on utility to the specific case of synthetic data and demonstrate our measures on two examples of synthesized real data. The methods are tailored to improve usability for researchers seeking to generate analytically useful synthetic data, and all methods in this paper are implemented in the synthpop package in R. Our extension includes a new statistic, the adjusted propensity mean squared error, which involves (i) derivation and standardization of the statistic by a null expected value, (ii) the use of non-parametric CART models to estimate propensity scores, and (iii) the use of the entire data set, rather than only the changed variables, in computing the utility measures. For specific utility measures, we use the confidence interval overlap percentage and introduce standardized measures for improved utility estimation under certain analyses.
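
The two kinds of measure named above can be illustrated with a minimal Python sketch. It is not the synthpop implementation, and the abstract does not give the authors' exact formulas, so the definitions here are assumptions: the propensity mean squared error is taken as the mean squared deviation of the propensity scores from the synthetic share of the stacked data, the standardization is taken as a simple ratio to a null expected value supplied by the caller, and the confidence interval overlap is taken as the average of the shared length relative to each interval. The function names (pmse, standardized_pmse, ci_overlap) are hypothetical, and the propensity scores are assumed to come from a model fitted elsewhere, for example a CART classifier.

```python
def pmse(scores, n_synthetic):
    """General utility: propensity mean squared error (assumed definition).

    scores      -- estimated probability that each row of the stacked
                   (real + synthetic) data is synthetic
    n_synthetic -- number of synthetic rows in the stacked data
    """
    n_total = len(scores)
    c = n_synthetic / n_total  # share of synthetic rows
    return sum((p - c) ** 2 for p in scores) / n_total


def standardized_pmse(observed, null_expected):
    """Assumed standardization: ratio of the observed pMSE to its expected
    value when the synthetic data follow the same distribution as the real
    data. The caller supplies null_expected (e.g. from resampling)."""
    return observed / null_expected


def ci_overlap(real_ci, synth_ci):
    """Specific utility: average proportion of each interval covered by
    the intersection of the two. Negative when the intervals are disjoint."""
    (lo_r, hi_r), (lo_s, hi_s) = real_ci, synth_ci
    shared = min(hi_r, hi_s) - max(lo_r, lo_s)
    return 0.5 * (shared / (hi_r - lo_r) + shared / (hi_s - lo_s))


if __name__ == "__main__":
    # Made-up propensity scores for 2 real and 2 synthetic rows.
    print(pmse([0.4, 0.5, 0.5, 0.6], n_synthetic=2))
    # Made-up intervals for the same coefficient from real and synthetic fits.
    print(ci_overlap((0.0, 2.0), (1.0, 3.0)))
```

Under these assumed definitions, a pMSE near zero means the model cannot tell real rows from synthetic ones, and an overlap near 1 means the two analyses give nearly the same interval. Multiply the overlap by 100 to express it as a percentage.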


Authors who are presenting talks have a * after their name.

Back to the full JSM 2016 program

 
 
Copyright © American Statistical Association