Online Program

All Times EDT

Friday, October 2
Fri, Oct 2, 1:00 PM - 3:00 PM
Virtual
Poster Session 4

SynthTools: An R Package to Measure Synthetic Data Utility (308534)

*Charlotte Baxter Looby, RTI International 

Keywords: synthetic data, partially synthetic data, fully synthetic data, utility

Synthetic data sets can limit statistical disclosure risk while preserving utility when survey results are published. Achieving the desired balance between disclosure protection and utility usually requires repeated experimentation and utility checks, a lengthy and complicated task for which few statistical software tools exist. To guide this process, we wrote functions that calculate utility measures, including standard errors and confidence intervals for statistics on variables of interest and for cross-tabulations in fully and partially synthetic data sets, using commonly accepted methodologies (Reiter & Raghunathan, 2007). Additional functions check data set comparability and flag potential logical inconsistencies. We published these functions on CRAN as the SynthTools package for use by the wider statistics community. Having them readily available streamlines the experimentation involved in creating synthetic data sets by providing simple, concrete utility measures. This presentation will show the types of tests these functions conduct and walk through the start-to-finish process of testing a synthetic data set’s utility.
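
As a rough illustration of the kind of calculation described above, the sketch below applies the standard combining rules for multiply-imputed synthetic data: an estimate and its variance are computed on each of m synthetic copies, then pooled into one point estimate, standard error, and degrees of freedom. It is written in Python purely for illustration and does not reproduce the SynthTools R interface; the function name `combine_synthetic` and its arguments are hypothetical. A confidence interval would follow by pairing the returned standard error with a t quantile on the returned degrees of freedom.

```python
import math
from statistics import mean, variance


def combine_synthetic(estimates, variances, kind="partial"):
    """Pool per-data-set estimates from m synthetic data sets.

    Hypothetical helper, not part of SynthTools.
    estimates: point estimate from each synthetic data set
    variances: estimated variance of that estimate in each data set
    kind: "partial" for partially synthetic, "full" for fully synthetic
    Returns (point_estimate, standard_error, degrees_of_freedom).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two data sets and matching lengths")

    q_bar = mean(estimates)   # pooled point estimate
    b = variance(estimates)   # between-data-set variance (divisor m - 1)
    u_bar = mean(variances)   # average within-data-set variance

    if kind == "partial":
        # Partially synthetic data: T = u_bar + b / m
        total = u_bar + b / m
        df = (m - 1) * (1 + u_bar / (b / m)) ** 2 if b > 0 else math.inf
    elif kind == "full":
        # Fully synthetic data: T = (1 + 1/m) * b - u_bar
        total = (1 + 1 / m) * b - u_bar
        if total <= 0:
            # This estimator can be non-positive; the sketch does not
            # attempt any adjustment and simply reports the problem.
            raise ValueError("non-positive variance estimate")
        df = (m - 1) * (1 - u_bar / ((1 + 1 / m) * b)) ** 2
    else:
        raise ValueError("kind must be 'partial' or 'full'")

    return q_bar, math.sqrt(total), df
```

For example, `combine_synthetic([10.0, 12.0, 11.0], [0.5, 0.5, 0.5])` pools three partially synthetic estimates. Passing `kind="full"` switches to the fully synthetic variance formula, which can fail when the between-data-set variance is small relative to the within-data-set variance.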