Using Multiple Imputation to Protect Participants' Confidentiality When Sharing Data
Many statistical agencies and research organizations seek to share data in ways that protect the confidentiality of data subjects. In this talk, I describe how multiple imputation can be used for sharing confidential data. The basic idea is to replace sensitive or identifying values with draws from statistical models estimated from the confidential data. In contrast to other techniques for data perturbation--such as swapping, top-coding, or adding noise--multiple imputation approaches have the potential to preserve relationships and distributional features of the data while enabling valid inferences. I review several variants of multiple imputation and describe some recently generated public-use data products that use multiple imputation to protect confidentiality.
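As a rough illustration of the basic idea (replacing sensitive values with draws from a model estimated on the confidential data), here is a minimal sketch. It is not the method used in any particular data product. It assumes a single sensitive variable modeled by a normal linear regression on one non-sensitive predictor, with a noninformative prior; the function name `synthesize`, the toy variables `ages` and `incomes`, and the choice of five released copies are all hypothetical.

```python
import random


def synthesize(x, y, m=5, seed=0):
    """Return m synthetic copies of y, each drawn from a normal linear
    regression of y on x fitted to the confidential (x, y) pairs.

    Assumption: a simple normal linear model with a noninformative prior.
    """
    rng = random.Random(seed)
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    xc = [xi - x_bar for xi in x]  # centered predictor
    sxx = sum(v * v for v in xc)
    slope_hat = sum(v * yi for v, yi in zip(xc, y)) / sxx
    rss = sum((yi - y_bar - slope_hat * v) ** 2 for v, yi in zip(xc, y))
    df = n - 2

    copies = []
    for _ in range(m):
        # Step 1: draw the model parameters from their posterior, so that
        # each released copy reflects uncertainty in the fitted model.
        # rss / chi-square(df); chi-square(df) is Gamma(df / 2, scale 2).
        sigma2 = rss / rng.gammavariate(df / 2, 2)
        sigma = sigma2 ** 0.5
        # With a centered predictor the two coefficients are independent.
        intercept = rng.gauss(y_bar, sigma / n ** 0.5)
        slope = rng.gauss(slope_hat, sigma / sxx ** 0.5)
        # Step 2: replace every sensitive value with a draw from the model.
        copies.append([intercept + slope * v + rng.gauss(0, sigma) for v in xc])
    return copies


# Toy "confidential" data (entirely made up for this sketch).
_toy = random.Random(1)
ages = [_toy.uniform(20, 65) for _ in range(200)]
incomes = [20 + 1.5 * a + _toy.gauss(0, 5) for a in ages]

released = synthesize(ages, incomes, m=5)
```

Each element of `released` would be shared together with the unaltered `ages`. No released income is an original value, while the age-income relationship is carried by the model. Inferences from the multiple copies require combining rules specific to the imputation variant used, which this sketch does not cover.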