Friday, May 18

Survey Science

Fri, May 18, 10:30 AM - 12:00 PM
Lake Fairfax B

Classification Trees for Privacy in Sample Surveys (304628)

Presentation

*Rolando Andres Rodriguez, U.S. Census Bureau

Keywords: classification, CART, privacy, ACS, synthetic data

Faced with increasing public availability of large demographic databases and distributed computing, data-releasing agencies are turning to mathematically formal privacy definitions to protect respondent identities and attributes. While paradigms such as differential privacy provide quantifiable privacy guarantees, they can prove computationally intensive and difficult to implement. This is especially true for demographic surveys that collect detailed respondent attributes. For such surveys, other privacy methods can provide protection against specific attacks while maintaining survey accuracy. We detail the use of a simple machine-learning algorithm, classification trees, in creating synthetic data for the protection of categorical attributes in the American Community Survey. We discuss the difficulties in applying these algorithms to survey data and contrast them with the difficulties in using formal privacy techniques.

ASA Meetings Department