Abstract:
|
With the advent of the European Union’s General Data Protection Regulation (GDPR), there is heightened awareness of the need for confidentiality, an important issue when survey microdata are to be released. We discuss options for moving from pseudonymization of microdata to anonymization, starting with risk assessments that quantify the re-identification risk and identify the data values subject to higher risk. The next steps relate to data treatments that lead to anonymization, and we discuss what counts as “anonymized” under the GDPR. We conducted an evaluation of disclosure risk in the context of providing guidance for adherence to GDPR requirements. We demonstrate different data perturbation techniques that reduce disclosure risk, such as controlled swapping, model-assisted constrained hotdeck, and synthetic data, using the National Science Foundation’s Survey of Doctorate Recipients data as a case study. After implementing each method, we compared their impacts using measures of risk reduction and retained data utility. We provide insights on risk evaluation and the proper use of data perturbation methods, as well as recommended options f
|