
All Times EDT

Abstract Details

Activity Number: 308 - Data Integration in 21st Century Government Surveys
Type: Topic Contributed
Date/Time: Wednesday, August 5, 2020 : 10:00 AM to 11:50 AM
Sponsor: Government Statistics Section
Abstract #312203
Title: Adjusting Record Linkage Match Weights to Partial Levels of String Agreement
Author(s): Dean Resnick* and Lisa B Mirel and Marc Roemer and Scott Cambell
Companies: National Opinion Research Center (NORC) and National Center for Health Statistics (NCHS/CDC) and Agency for Healthcare Research and Quality and N.O.R.C. at the University of Chicago
Keywords: Record Linkage; Fellegi-Sunter; String Comparisons; Agreement Weights
Abstract:

The Fellegi-Sunter record linkage paradigm, in its original conception, assumes that for each comparison field (such as first name, year of birth, or state of residence), agreement between the two records in a pair is strictly binary: the field either agrees completely or it does not. For string comparisons, particularly of name fields, intuition suggests that two very similar but non-identical versions of a name (e.g., ‘Resnick’ and ‘Reznik’) are more indicative of a match than of a non-match. Several string comparison tools, such as Jaro-Winkler similarity scores and Levenshtein distances, quantify the level of agreement along a continuum between complete agreement and complete disagreement. One way to use such a metric is to set a cutoff above which the fields are treated as agreeing, but this requires a method for choosing the cutoff and still reduces the comparison to a binary outcome. Instead, we seek a way to recognize several gradations of agreement for string comparisons and to assign agreement and non-agreement weights corresponding to the observed gradation. In this paper, we describe such a method, which maintains and extends the Fellegi-Sunter approach.
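As a rough illustration of the ideas above (not the authors' method, which the abstract does not spell out), the Python sketch below computes the Levenshtein distance and Jaro-Winkler similarity for the example names. It then contrasts a binary Fellegi-Sunter weight, log2(m/u) on agreement and log2((1 - m)/(1 - u)) on disagreement, with a graded weight that assigns separate m- and u-probabilities to a few similarity levels. The level boundaries, the m/u values, and the names `binary_weight`, `graded_weight`, and `LEVELS` are all hypothetical; in practice the probabilities would be estimated from the data (for example, with the EM algorithm).

```python
import math


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute (free if equal)
        prev = curr
    return prev[-1]


def jaro_winkler(a: str, b: str, p: float = 0.1) -> float:
    """Jaro similarity plus Winkler's boost for a common prefix of up to 4 chars."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_hit, b_hit = [False] * len(a), [False] * len(b)
    matches = 0
    for i, ca in enumerate(a):
        for j in range(max(0, i - window), min(len(b), i + window + 1)):
            if not b_hit[j] and b[j] == ca:
                a_hit[i] = b_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    a_seq = [c for c, hit in zip(a, a_hit) if hit]
    b_seq = [c for c, hit in zip(b, b_hit) if hit]
    transpositions = sum(x != y for x, y in zip(a_seq, b_seq)) / 2
    jaro = (matches / len(a) + matches / len(b)
            + (matches - transpositions) / matches) / 3
    prefix = 0
    for x, y in zip(a[:4], b[:4]):
        if x != y:
            break
        prefix += 1
    return jaro + prefix * p * (1 - jaro)


# HYPOTHETICAL binary parameters: P(exact agreement | match), P(exact agreement | non-match).
M_EXACT, U_EXACT = 0.80, 0.002


def binary_weight(a: str, b: str) -> float:
    """Classic Fellegi-Sunter weight: the field either agrees exactly or it does not."""
    if a == b:
        return math.log2(M_EXACT / U_EXACT)
    return math.log2((1 - M_EXACT) / (1 - U_EXACT))


# HYPOTHETICAL graded levels: (lower Jaro-Winkler bound, m_k, u_k), checked top-down.
# The m_k and u_k each sum to 1 across levels; real values would be estimated from data.
LEVELS = [
    (1.00, 0.80, 0.002),  # exact agreement
    (0.92, 0.10, 0.005),  # very close
    (0.85, 0.05, 0.020),  # close
    (0.00, 0.05, 0.973),  # effectively disagreement
]


def graded_weight(score: float) -> float:
    """Weight log2(m_k / u_k) for the similarity level the score falls in."""
    for lower, m_k, u_k in LEVELS:
        if score >= lower:
            return math.log2(m_k / u_k)
    raise ValueError("score must be in [0, 1]")


score = jaro_winkler("Resnick", "Reznik")
print(levenshtein("Resnick", "Reznik"))              # 2
print(round(score, 3))                               # 0.879
print(round(binary_weight("Resnick", "Reznik"), 2))  # -2.32
print(round(graded_weight(score), 2))                # 1.32
```

Under these made-up parameters, the binary rule treats ‘Resnick’ versus ‘Reznik’ as a plain disagreement and assigns a negative weight, while the graded rule places the pair in the "close" level and assigns a modest positive weight, which is the intuition the abstract describes.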


Authors who are presenting talks have a * after their name.

Back to the full JSM 2020 program