JSM 2011 Online Program

The views expressed here are those of the individual authors and not necessarily those of the JSM sponsors, their officers, or their staff.

Abstract Details

Activity Number: 625
Type: Contributed
Date/Time: Thursday, August 4, 2011 : 8:30 AM to 10:20 AM
Sponsor: Section on Survey Research Methods
Abstract - #300612
Title: An Empirical Evaluation of Easily Implemented, Nonparametric Methods for Generating Synthetic Data Sets
Author(s): Joerg Drechsler*+ and Jerome P. Reiter
Companies: Institute for Employment Research and Duke University
Address: Regensburger Str. 104, Nuremberg, International, 90478, Germany
Keywords: Census ; Confidentiality ; Disclosure ; Imputation ; Microdata
Abstract:

When intense redaction is needed to protect data subjects' confidentiality, statistical agencies can release synthetic data, in which identifying or sensitive values are replaced with draws from statistical models estimated from the confidential data. Specifying accurate synthesis models can be a difficult and labor-intensive task with standard parametric approaches. We describe and empirically evaluate four easy-to-implement, nonparametric synthesizers based on machine learning algorithms (classification and regression trees, bagging, random forests, and support vector machines), assessing their potential to preserve analytical validity and reduce disclosure risks. The results suggest that synthesizers based on regression trees can provide high utility with low disclosure risks.
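
To make the idea of a tree-based synthesizer concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single numeric predictor and a single sensitive numeric variable, grows an unpruned regression tree by minimizing the within-node sum of squares, and replaces each sensitive value with a random draw from the confidential values in the same leaf. The function names (grow_tree, find_leaf, synthesize), the min_leaf parameter, and the toy age/income data are all hypothetical and chosen only for illustration.

```python
import random
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def grow_tree(rows, min_leaf=5):
    """Grow an unpruned regression tree on (x, y) pairs with one numeric x.

    Internal nodes store a threshold; leaves store the y values they contain.
    """
    rows = sorted(rows)
    best = None
    # i is the size of the left child; both children keep >= min_leaf rows.
    for i in range(min_leaf, len(rows) - min_leaf + 1):
        if rows[i - 1][0] == rows[i][0]:
            continue  # cannot split between identical x values
        cost = sse([y for _, y in rows[:i]]) + sse([y for _, y in rows[i:]])
        if best is None or cost < best[0]:
            best = (cost, i)
    if best is None:
        return {"leaf": [y for _, y in rows]}
    i = best[1]
    return {
        "threshold": rows[i - 1][0],  # x <= threshold goes left
        "left": grow_tree(rows[:i], min_leaf),
        "right": grow_tree(rows[i:], min_leaf),
    }


def find_leaf(tree, x):
    """Return the list of y values in the leaf that x falls into."""
    while "leaf" not in tree:
        tree = tree["left"] if x <= tree["threshold"] else tree["right"]
    return tree["leaf"]


def synthesize(rows, min_leaf=5, seed=0):
    """Keep x as is; replace y with a random draw from the matching leaf."""
    rng = random.Random(seed)
    tree = grow_tree(rows, min_leaf)
    return [(x, rng.choice(find_leaf(tree, x))) for x, _ in rows]


# Toy "confidential" data (made up): x = age, y = income.
_noise = random.Random(1)
confidential = [(age, 1000 * age + _noise.gauss(0, 5000)) for age in range(20, 60)]
synthetic = synthesize(confidential, min_leaf=5, seed=42)
```

Because this sketch draws observed leaf values directly, a synthetic record can carry a real respondent's value; a production synthesizer would typically smooth or otherwise perturb the draws within each leaf, and min_leaf acts as a rough knob trading analytical validity against disclosure risk.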


The address information is for the authors that have a + after their name.
Authors who are presenting talks have a * after their name.
