Abstract Details

Activity Number: 365 - SPEED: Innovations in Survey Sampling Designs: Administrative Data, Record Linkage, Non-Probability Samples, and More
Type: Contributed
Date/Time: Tuesday, July 31, 2018, 10:30 AM to 11:15 AM
Sponsor: Government Statistics Section
Abstract #332588
Title: Record Linkage as a Decision Problem
Author(s): Alan Karr*
Companies: RTI International
Keywords: Record linkage; Linkage error; Decision problem
Abstract:

Record linkage is, ultimately, a decision problem: declaring which compared pairs are matches. Here we view record linkage in terms of which decisions are possible as software packages, algorithms, and parameter settings are varied. We report the results of an experiment using six freely available packages on two publicly available datasets for which "ground truth" is known. Our analyses focus on the resulting weights for 77,951 compared record pairs. Depending on parameter settings (for example, use of EM algorithms or the string matching method), the number of distinct weights can vary by orders of magnitude. Therefore, the number of distinct sets of matches as a function of the threshold (that is, the space of possible decisions) also varies. In some instances, no threshold correctly reproduces ground truth. In others, "correct" thresholds exist but are difficult to identify without knowledge of ground truth. The available decisions differ across software packages, even when the packages purport to implement identical algorithms; across parameter settings; and across often opaque implementation details such as the treatment of missing values. We propose the use of ensemble decision rules.
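The link between distinct weights and the space of possible decisions can be illustrated with a minimal Python sketch. It assumes a pair is declared a match when its weight is at least the threshold t, so every threshold between two consecutive distinct weights yields the same decision, and k distinct weights give at most k + 1 decisions. The weights, record ids, and function names below are hypothetical toy values, not data from the experiment. The abstract does not specify its ensemble decision rules; the strict majority vote shown here is only one assumed example of such a rule.

```python
import math
from collections import Counter
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple

Pair = Tuple[str, str]  # (record id in file A, record id in file B)


def decision_space(weights: Dict[Pair, float]) -> Dict[float, FrozenSet[Pair]]:
    """Map each effective threshold t to the pairs declared matches (weight >= t).

    Assumed convention: thresholds between two consecutive distinct weights give
    the same decision, so only the distinct weights (plus +inf, which declares
    no matches) need to be examined. k distinct weights -> at most k + 1 decisions.
    """
    cutoffs = [math.inf] + sorted(set(weights.values()), reverse=True)
    return {t: frozenset(p for p, w in weights.items() if w >= t) for t in cutoffs}


def thresholds_reproducing(weights: Dict[Pair, float], truth: Set[Pair]) -> List[float]:
    """Return the effective thresholds whose decision equals ground truth.

    An empty list means no threshold reproduces ground truth.
    """
    target = frozenset(truth)
    return [t for t, d in decision_space(weights).items() if d == target]


def majority_vote(decisions: Sequence[FrozenSet[Pair]]) -> FrozenSet[Pair]:
    """Hypothetical ensemble rule: declare a pair a match when a strict
    majority of the individual decisions declare it."""
    counts = Counter(p for d in decisions for p in d)
    return frozenset(p for p, c in counts.items() if 2 * c > len(decisions))


# Toy weights from two hypothetical packages (illustrative numbers only).
TRUTH = {("a1", "b1"), ("a2", "b2"), ("a3", "b3")}
WEIGHTS_A = {("a1", "b1"): 9.0, ("a2", "b2"): 9.0, ("a3", "b3"): 2.5, ("a1", "b2"): 2.5}
WEIGHTS_B = {("a1", "b1"): 8.0, ("a2", "b2"): 7.5, ("a3", "b3"): 6.0, ("a1", "b2"): 1.0}

if __name__ == "__main__":
    print(len(decision_space(WEIGHTS_A)))            # 3 decisions from 2 distinct weights
    print(thresholds_reproducing(WEIGHTS_A, TRUTH))  # [] : a false pair ties a true one
    print(thresholds_reproducing(WEIGHTS_B, TRUTH))  # [6.0]
```

In this toy case the first package assigns a non-match the same weight as a true match, so no threshold separates them, while the second package admits exactly one "correct" threshold, which could not be located without knowing ground truth.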


Authors who are presenting talks have a * after their name.

Back to the full JSM 2018 program