JSM 2013 Home

Abstract Details

Activity Number: 586
Type: Topic Contributed
Date/Time: Wednesday, August 7, 2013, 2:00 PM to 3:50 PM
Sponsor: Government Statistics Section
Abstract - #308202
Title: Transitive Probabilistic Deduplication of Record Systems Using a Stochastic Blockmodel
Author(s): Mauricio Sadinle*+
Companies: Carnegie Mellon University
Keywords: Bayesian inference; Data quality; Deduplication; Record linkage; Relational data; Stochastic blockmodel
Abstract:

The task of deduplicating a datafile can be solved by estimating the elements of a linkage matrix, in which each entry is associated with a pair of records and equals one if both records refer to the same underlying entity and zero otherwise. The current approach to supervised probabilistic deduplication consists of training classifiers on hand-matched pairs of records and then predicting the matching status of the remaining pairs. Unsupervised probabilistic deduplication typically uses a mixture-model implementation of the Fellegi-Sunter methodology for record linkage to link the datafile with itself. Because these approaches output independent linkage decisions for all pairs of records, they ignore the dependencies among the entries of the linkage matrix. We propose a method that outputs estimates of the linkage matrix such that transitive decisions are guaranteed. The proposed method uses a simple stochastic blockmodel for multigraphs. We also show how this method can be extended to handle deduplication and record linkage simultaneously.
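
To make the transitivity issue concrete, the following is a minimal sketch and not the stochastic blockmodel proposed in the abstract. It only illustrates that independent pairwise decisions can produce a linkage matrix that is not transitive, whereas a linkage matrix derived from a partition of the records into entities is transitive by construction. The function names and the toy data are hypothetical and chosen for illustration.

```python
def linkage_from_partition(labels):
    """Build a linkage matrix from entity labels (assumed representation).

    Entry [i][j] is 1 if records i and j carry the same label, else 0.
    """
    n = len(labels)
    return [[1 if labels[i] == labels[j] else 0 for j in range(n)]
            for i in range(n)]


def is_transitive(delta):
    """Return True if linking i-j and j-k always implies linking i-k."""
    n = len(delta)
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if len({i, j, k}) < 3:
                    continue  # only check triples of distinct records
                if delta[i][j] and delta[j][k] and not delta[i][k]:
                    return False
    return True


# Independent pairwise decisions: records 0-1 and 1-2 are linked,
# but 0-2 is not, so the decisions contradict each other.
pairwise = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
]

# Decisions induced by a partition: records 0, 1, 2 form one entity,
# record 3 is a separate entity.
from_partition = linkage_from_partition([0, 0, 0, 1])

print(is_transitive(pairwise))
print(is_transitive(from_partition))
```

In this toy example the first matrix fails the check and the second passes it, which is the property that estimating the linkage matrix as a whole, rather than entry by entry, is meant to secure.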


Authors who are presenting talks have a * after their name.

