Abstract Details

Activity Number: 227
Type: Invited
Date/Time: Monday, August 1, 2016 : 2:00 PM to 3:50 PM
Sponsor: Committee on Privacy and Confidentiality
Abstract #321910
Title: Performance Bounds for Graphical Record Linkage: Can record linkage bounds provide guidance for private synthetic data release?
Author(s): Rebecca Steorts* and Matt Barnes and Willie Neiswanger
Companies: Duke University and Carnegie Mellon University and Carnegie Mellon University
Keywords: privacy ; synthetic data release ; record linkage ; differential privacy ; information theory
Abstract:

Real-world data sets are often withheld for privacy and confidentiality reasons in applications such as health care, official statistics, and human rights conflicts. Instead, synthetic versions of such data sets are released that preserve, for example, differential privacy while also preserving data utility. Such synthetic data sets are often analyzed by record linkage algorithms to estimate, for example, the number of people in a sample or population. Given these motivations, one open question is the following: given a synthetic data set and an unknown privacy algorithm, does the synthetic data set (under some privacy setting) still have data utility? In this talk, we critically assess performance bounds using the Kullback-Leibler (KL) divergence under a general record linkage framework to provide guidance in privacy settings. We provide an upper bound using the KL divergence and a lower bound on the minimum probability of misclassifying a latent entity. Using simulated data, we give insight into when our bounds hold and discuss potential privacy implications for synthetic data release.
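
The abstract does not state the bounds themselves, so the following is only a reminder of the quantity they are built on. It is a minimal sketch of the KL divergence between two discrete distributions on a shared support. The function name `kl_divergence` and the example distributions are hypothetical and are not taken from the talk.

```python
import math


def kl_divergence(p, q):
    """Return KL(p || q) in nats for two discrete distributions.

    Assumes p and q are sequences of probabilities over the same
    support, each summing to 1. This is the textbook definition,
    not the specific bound presented in the talk.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            # Terms with p_i = 0 contribute nothing (0 * log 0 := 0).
            continue
        if q_i == 0.0:
            # p puts mass where q has none, so the divergence is infinite.
            return math.inf
        total += p_i * math.log(p_i / q_i)
    return total


# Hypothetical example: two distributions over three outcomes.
original = [0.5, 0.3, 0.2]
perturbed = [0.4, 0.4, 0.2]
divergence = kl_divergence(original, perturbed)
```

A divergence of zero means the two distributions coincide, and larger values mean they are easier to tell apart. Note that the KL divergence is not symmetric in its arguments.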


Authors who are presenting talks have a * after their name.

Back to the full JSM 2016 program

 
 
Copyright © American Statistical Association