Keywords: R, synthetic data, machine learning, statistical disclosure control, statistical disclosure limitation
Synthetic data generation is an increasingly popular statistical disclosure control method for creating microdata. This article demonstrates tidysynthesis, a flexible and extensible open-source R package for generating synthetic data. We designed the package following tidy principles, built it on the tidymodels machine learning framework, and implemented several new techniques for applied synthetic data generation. The package should be a useful tool for data stewards who wish to create synthetic microdata.