Online Program

All Times ET

Thursday, June 3
Software & Data Science Technologies
Software and Technology Shaping Data Science
Thu, Jun 3, 1:10 PM - 2:45 PM
TBD
 

Synthetic Data Generation with Tidysynthesis (309673)

Kyle Ueyama, Urban Institute 
*Aaron Robert Williams, Urban Institute 
Noah Zwiefel, Urban-Brookings Tax Policy Center

Keywords: R, synthetic data, machine learning, statistical disclosure control, statistical disclosure limitation

Synthetic data generation is an increasingly common statistical disclosure control method for creating microdata. This article demonstrates the R package tidysynthesis, a flexible and extensible open-source tool for generating synthetic data. We designed the package around tidy principles, built it on the tidymodels machine learning framework, and implemented several new techniques for applied synthetic data generation. The package will be a useful tool for data stewards who wish to create synthetic microdata.