139 – Inference and Variance Estimation with Complex Survey Data
Data Entrepreneurs' Synthetic PUF: A Working PUF as an Alternative to Traditional Synthetic and Nonsynthetic PUFs
Joshua M. Borton
NORC at the University of Chicago
Tzy-Chyi Yu
NORC at the University of Chicago
A. M. Crego
NORC at the University of Chicago
Avi Singh
NORC at the University of Chicago
Micheal Dalvern
NORC at the University of Chicago
E. Hair
NORC at the University of Chicago
The nature of Medicare claims data makes the usual methods of creating synthetic or nonsynthetic public use files (PUFs) infeasible: the data are complex, the identifying variables (IVs) are numerous, and disclosure risk is difficult to compute under an assumed level of intruder knowledge of the IVs, since that knowledge may increase over time. In view of this, we consider a two-pronged strategy: creating a working PUF with high confidentiality at the cost of analytic utility, coupled with microdata access under a data use agreement (DUA) for testing the applicability of procedures developed on the working PUF and for final analysis. The working PUF, or data entrepreneurs' synthetic PUF (DE-SynPUF), has high pseudo analytic utility because it retains the real data structure, and it is useful to data entrepreneurs for application development and to researchers for training. DE-SynPUF was created by treating claims individually, with no explicit preservation of intra-claim relationships. Moreover, all claims were subject to ad hoc treatment to reduce risk as measured by IV-driven k-anonymization. An application of DE-SynPUF to 2008-2010 claims data is presented.
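As a rough illustration of what an IV-driven k-anonymity measure looks like, the sketch below groups records by their combination of identifying variables and reports the smallest group size, along with the records that fall below a chosen threshold k. It is not the procedure used to build DE-SynPUF: the function names, the variable names (`age_band`, `sex`, `state`), and the toy records are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def iv_key(record, identifying_vars):
    """Build the tuple of IV values that defines a record's equivalence class."""
    return tuple(record[v] for v in identifying_vars)


def k_anonymity(records, identifying_vars):
    """Return the size of the smallest equivalence class over the IVs (0 if empty)."""
    counts = Counter(iv_key(r, identifying_vars) for r in records)
    return min(counts.values()) if counts else 0


def risky_records(records, identifying_vars, k):
    """Return records whose IV combination is shared by fewer than k records."""
    counts = Counter(iv_key(r, identifying_vars) for r in records)
    return [r for r in records if counts[iv_key(r, identifying_vars)] < k]


# Toy records with hypothetical identifying variables (not real claims data).
claims = [
    {"age_band": "65-69", "sex": "F", "state": "IL"},
    {"age_band": "65-69", "sex": "F", "state": "IL"},
    {"age_band": "70-74", "sex": "M", "state": "IL"},
]
ivs = ["age_band", "sex", "state"]

smallest_class = k_anonymity(claims, ivs)
flagged = risky_records(claims, ivs, k=2)
```

Under this kind of measure, records flagged as risky would be candidates for further treatment (for example coarsening or suppressing IV values) until every IV combination is shared by at least k records.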