Online Program

Towards Unrestricted Public Use Business Microdata: Construction of The Synthetic Longitudinal Business Database
John Abowd, Cornell University 
Ron S Jarmin, US Census Bureau 
*Satkartar K Kinney, National Institute of Statistical Sciences 
Javier Miranda, US Census Bureau 
Jerry P Reiter, Duke University 
Arnold Reznek, US Census Bureau 

Keywords: Synthetic data, longitudinal, business register, administrative data, confidentiality protection, imputation

Longitudinal business data are widely desired by researchers, but confidentiality constraints make them difficult to release to the public. In this paper, we discuss the generation of synthetic public-use datasets for establishment data. The basic idea is to release simulated values of sensitive variables, generated from probability distributions fit to the genuine data. This can protect confidentiality, since the released attributes are synthetic rather than real. When the models describe the data well, broad-scale inferences drawn from the synthetic datasets will also be valid. We discuss the approaches used to generate synthetic public-use files for the U.S. Census Bureau Longitudinal Business Database.
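
The basic idea can be illustrated with a toy sketch. The abstract does not specify the models used for the Longitudinal Business Database, so everything here is an assumption made for illustration: a single sensitive variable (a hypothetical establishment payroll), a lognormal model, and the helper names `fit_lognormal` and `synthesize`. The "confidential" values are themselves simulated stand-ins, not real data.

```python
import math
import random
import statistics


def fit_lognormal(values):
    """Estimate the mean and standard deviation of log(values)."""
    logs = [math.log(v) for v in values]
    return statistics.mean(logs), statistics.stdev(logs)


def synthesize(values, rng):
    """Draw one synthetic value per record from the fitted lognormal model."""
    mu, sigma = fit_lognormal(values)
    return [math.exp(rng.gauss(mu, sigma)) for _ in values]


rng = random.Random(0)

# Stand-in for the confidential variable (hypothetical payroll figures).
confidential_payroll = [math.exp(rng.gauss(6.0, 1.0)) for _ in range(1000)]

# The released file would contain only these simulated values.
synthetic_payroll = synthesize(confidential_payroll, rng)

real_mu, real_sigma = fit_lognormal(confidential_payroll)
synth_mu, synth_sigma = fit_lognormal(synthetic_payroll)
print(f"log-mean: real {real_mu:.2f}, synthetic {synth_mu:.2f}")
print(f"log-sd:   real {real_sigma:.2f}, synthetic {synth_sigma:.2f}")
```

No released value corresponds to an actual record, yet summary statistics computed from the synthetic values should be close to those of the original data when the assumed model fits well. A realistic synthesizer for longitudinal establishment data would need to model many variables jointly and over time, which this sketch does not attempt.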