Online Program

Evaluation of Three Disclosure Limitation Models for the QCEW Program

Ali Mushtaq, NORC at the University of Chicago
Santanu Pramanik, NORC at the University of Chicago
Fritz Scheuren, NORC at the University of Chicago
*Michael Yang, NORC at the University of Chicago

Keywords: Disclosure limitation, Confidentiality, Random noise method, Input treatment, Synthetic data, Two-part models

In recent years, statistical agencies have increasingly used the random noise method to protect respondent data from disclosure. The method takes a micro approach to disclosure limitation: a multiplier (noise factor) is applied to each unit's data before tabulation, so tabulations at every level, from the most detailed to the most aggregated, remain consistent with one another. In this paper we evaluate two random noise models in the context of the Quarterly Census of Employment and Wages (QCEW) program. Our analysis suggests that the random noise method is unsatisfactory for protecting zero and small employment and wage values. To overcome this difficulty, we developed a mixed perturbation approach that applies multiplicative noise to large establishments and uses synthetic models to protect smaller establishments. Because a large percentage of employment and wage values are zero, we constructed two-part models (a logistic regression for whether a value is zero, followed by a linear model for the nonzero values) to generate the synthetic data. Results indicate that the mixed approach performs better in terms of both reliability and disclosure limitation, although it does not always generate large wages for establishments with small employment.
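A minimal Python sketch of per-unit multiplicative noise is shown below. It rests on several assumptions made only for illustration: the split noise distribution, its bounds (`LO`, `HI`), the field names, and the toy records. None of these are the QCEW or NORC settings. Because each establishment's values are multiplied by that establishment's own factor before any tabulation, every table built from the noisy records adds up consistently across levels. The last line shows the weakness for zero values: a zero multiplied by any factor is still zero.

```python
import random
from collections import defaultdict

# Hypothetical noise bounds: each multiplier lies in [1 - HI, 1 - LO] or [1 + LO, 1 + HI],
# so every unit is perturbed by at least LO and at most HI (illustrative values only).
LO, HI = 0.05, 0.15

def draw_noise_factor(rng: random.Random) -> float:
    """Draw one multiplier, pushed away from 1 in a random direction."""
    magnitude = rng.uniform(LO, HI)
    return 1.0 + magnitude if rng.random() < 0.5 else 1.0 - magnitude

def perturb(units, seed=2024):
    """Apply one noise factor per establishment, before any tabulation."""
    rng = random.Random(seed)
    noisy = []
    for u in units:
        f = draw_noise_factor(rng)  # the same factor scales employment and wages
        noisy.append({**u, "noise": f,
                      "employment": u["employment"] * f,
                      "wages": u["wages"] * f})
    return noisy

def tabulate(units, key, var="wages"):
    """Sum a variable over the groups defined by key(unit)."""
    totals = defaultdict(float)
    for u in units:
        totals[key(u)] += u[var]
    return dict(totals)

# Toy establishments (made-up numbers for illustration only).
units = [
    {"county": "A", "industry": "retail", "employment": 120, "wages": 950_000.0},
    {"county": "A", "industry": "retail", "employment": 3, "wages": 21_000.0},
    {"county": "A", "industry": "mining", "employment": 0, "wages": 0.0},
    {"county": "B", "industry": "retail", "employment": 45, "wages": 310_000.0},
]

noisy = perturb(units)
cells = tabulate(noisy, key=lambda u: (u["county"], u["industry"]))  # detailed level
counties = tabulate(noisy, key=lambda u: u["county"])                # aggregated level

# Consistency: detailed cells add up to the aggregated totals (up to float rounding).
for county, total in counties.items():
    cell_sum = sum(v for (c, _), v in cells.items() if c == county)
    assert abs(cell_sum - total) < 1e-6

print(noisy[2]["wages"])  # 0.0: the zero-wage establishment is left unprotected
```

Small values fare little better under this scheme: the three-employee record above still reports a wage within 15 percent of its true value.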
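A mixed perturbation scheme with a two-part synthetic model can be sketched as follows. This is a sketch under explicit assumptions, not the authors' specification. The size cutoff (`threshold`), the single hypothetical covariate `x`, the noise bounds, and fitting the linear part on log wages are all illustrative choices. Establishments at or above the cutoff keep their records and receive multiplicative noise. For smaller establishments, a logistic regression first decides whether the synthetic wage is zero, and a linear model then draws a nonzero value.

```python
import math
import random

def fit_logistic(xs, ys, iters=25):
    """Newton-Raphson fit of P(y = 1 | x) = 1 / (1 + exp(-(a + b*x)))."""
    a = b = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = p * (1.0 - p)
            g0 += y - p            # score vector X'(y - p)
            g1 += (y - p) * x
            h00 += w               # information matrix X'WX
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

def fit_linear(xs, ys):
    """Ordinary least squares y = c + d*x; returns (c, d, residual SD)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    d = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    c = my - d * mx
    sse = sum((y - c - d * x) ** 2 for x, y in zip(xs, ys))
    return c, d, math.sqrt(sse / (n - 2))

def mixed_perturbation(units, threshold=10, lo=0.05, hi=0.15, seed=7):
    rng = random.Random(seed)
    small = [u for u in units if u["employment"] < threshold]
    # Part 1: logistic model for zero vs. nonzero wages among small establishments.
    a, b = fit_logistic([u["x"] for u in small],
                        [1 if u["wages"] > 0 else 0 for u in small])
    # Part 2: linear model for log wages, fitted on nonzero small establishments only.
    pos = [u for u in small if u["wages"] > 0]
    c, d, sd = fit_linear([u["x"] for u in pos], [math.log(u["wages"]) for u in pos])
    out = []
    for u in units:
        if u["employment"] >= threshold:
            # Large establishment: keep the record, apply a multiplicative noise factor.
            m = rng.uniform(lo, hi)
            out.append(u["wages"] * (1 + m if rng.random() < 0.5 else 1 - m))
        else:
            # Small establishment: replace the value with a synthetic two-part draw.
            p_nonzero = 1.0 / (1.0 + math.exp(-(a + b * u["x"])))
            if rng.random() < p_nonzero:
                out.append(math.exp(rng.gauss(c + d * u["x"], sd)))
            else:
                out.append(0.0)
    return out, (a, b)

# Toy records (made up); x is a hypothetical covariate.
units = [
    {"employment": 0, "wages": 0.0, "x": 0.0},
    {"employment": 0, "wages": 0.0, "x": 0.5},
    {"employment": 2, "wages": 12_000.0, "x": 1.0},
    {"employment": 0, "wages": 0.0, "x": 1.5},
    {"employment": 3, "wages": 18_000.0, "x": 2.0},
    {"employment": 4, "wages": 25_000.0, "x": 2.5},
    {"employment": 0, "wages": 0.0, "x": 3.0},
    {"employment": 6, "wages": 40_000.0, "x": 3.5},
    {"employment": 85, "wages": 700_000.0, "x": 4.0},
    {"employment": 240, "wages": 2_100_000.0, "x": 5.0},
]
synthetic, (a, b) = mixed_perturbation(units)
```

The sketch also hints at why a synthetic model may miss the large wages that some small establishments report. Each synthetic value is drawn around what is typical for its covariate, so an unusually high wage is not reproduced.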