Abstract:
|
Administrative data, science, and privacy are on a collision course. New studies using confidential data are already framing discussions of income inequality and policy interventions. These studies were based on unfettered access to confidential microdata. The scientists met statutory access rules by working only within the agency's supervised, secure computing environment, and they met confidentiality requirements by restricting their published output. Reproducible science requires that others be able to confirm the results using the same data, but sponsors require only the dissemination of public data. These public data are summaries of models estimated on the confidential data, and they are never sufficient to verify the accuracy of the authors' specifications. Advances in data privacy methods now allow for provably accurate, provably safe public data sets. Every researcher granted access can verify previous research without additional privacy loss. The public data are iteratively updated to reflect the results of each new analysis. Curation of the public-use data meets the reproducible science requirement. Because of the privacy-preserving algorithms, it also meets confidentiality requirements.
|