Abstract:
|
Many institutions and researchers require access to detailed microdata. Historically, agencies have selectively allowed access to the “real data” under strict security restrictions that respect disclosure avoidance regulations. Some programs produce public use microdata samples (PUMS), i.e., highly sanitized versions of the real data. Many agencies are investigating methods of developing synthetic microdata as an alternative to PUMS. However, when the collected microdata are subject to privacy laws and regulations, it is often challenging to develop synthetic microdata that simultaneously preserve complex inter-item relationships and protect the privacy of individual respondents. It is especially difficult to maintain this balance when developing synthetic data for highly skewed economic populations: the information contained in the right tails is indispensable for accurate tabulations and is equally sensitive to disclosure. We present a novel synthetic data generation method designed specifically for skewed multivariate data. The method preserves key statistical properties of both the unit-level microdata and the tabulated estimates while accounting for disclosure risk where applicable.
|