Activity Number: 533
Type: Topic Contributed
Date/Time: Thursday, August 10, 2006, 10:30 AM to 12:20 PM
Sponsor: Section on Survey Research Methods

Abstract - #306349
Title: A Study of String Comparator Performance on Census Name Data
Author(s): William E. Yancey*+
Companies: U.S. Census Bureau
Address: Statistical Research Division, Washington, DC 20233
Keywords: record linkage; string comparator; edit distance; ROC curve
Abstract:
We compare the performance of several string comparators on first- and last-name data from the clerically reviewed census and accuracy follow-up files from 2000 and 1990. The comparators include the Jaro-Winkler string comparator, with and without its optional enhancements, and several comparators based on edit distance. We also consider a string comparator that combines the Jaro-Winkler and edit-distance approaches. The main statistical comparison is based on areas under portions of the ROC curve (sensitivity vs. selectivity) for each comparator on each dataset, where each dataset consists of name pairs from records that have been judged to match but whose names are not spelled identically. Finally, we consider how the choice among string comparators with differing ROC-based scores affects actual record linkage results.
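For readers unfamiliar with the two families of comparators named above, here is a minimal Python sketch of the textbook Jaro-Winkler similarity and of one common way to turn Levenshtein edit distance into a similarity in [0, 1]. It is not the Census Bureau implementation used in the study: the prefix weight p = 0.1 and the four-character prefix cap are the usual textbook defaults, none of Winkler's optional enhancements are included, and the normalization 1 - d / max(len), along with the helper name `edit_similarity`, is an assumption chosen for illustration.

```python
def jaro(s1: str, s2: str) -> float:
    """Basic Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    m = 0
    for i, c in enumerate(s1):
        lo, hi = max(0, i - window), min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == c:
                matched1[i] = matched2[j] = True
                m += 1
                break
    if m == 0:
        return 0.0
    # Transpositions: half the number of matched characters that are out of order.
    seq1 = [c for c, f in zip(s1, matched1) if f]
    seq2 = [c for c, f in zip(s2, matched2) if f]
    t = sum(a != b for a, b in zip(seq1, seq2)) / 2
    return (m / len1 + m / len2 + (m - t) / m) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted by the length of the common prefix (at most max_prefix)."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def levenshtein(s1: str, s2: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    prev = list(range(len(s2) + 1))
    for i, a in enumerate(s1, start=1):
        curr = [i]
        for j, b in enumerate(s2, start=1):
            curr.append(min(prev[j] + 1,               # delete a
                            curr[j - 1] + 1,           # insert b
                            prev[j - 1] + (a != b)))   # substitute (free if equal)
        prev = curr
    return prev[-1]


def edit_similarity(s1: str, s2: str) -> float:
    """Hypothetical normalization: 1 - distance / length of the longer string."""
    longest = max(len(s1), len(s2))
    return 1.0 if longest == 0 else 1.0 - levenshtein(s1, s2) / longest


if __name__ == "__main__":
    print(round(jaro_winkler("MARTHA", "MARHTA"), 4))  # 0.9611
    print(levenshtein("KITTEN", "SITTING"))            # 3
```

The transposed pair MARTHA/MARHTA shows the design difference: Jaro-Winkler charges only half a transposition for the swapped "TH", and the shared prefix "MAR" raises the score further, whereas plain Levenshtein counts the swap as two substitutions.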
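The ROC-based comparison can be illustrated with a short Python sketch that computes the area under a portion of an ROC curve from comparator scores. The sketch assumes each comparator produces a similarity score for both matching and nonmatching name pairs, treats a pair as a predicted match when its score is at least a threshold, plots true-positive rate against false-positive rate (1 minus specificity), and restricts the area to false-positive rates in [0, max_fpr]. The abstract does not state which portion of the curve was used or how nonmatching pairs were formed, so those choices and the function names `roc_points` and `partial_auc` are hypothetical.

```python
from typing import List, Tuple


def roc_points(pos: List[float], neg: List[float]) -> List[Tuple[float, float]]:
    """(FPR, TPR) points, sweeping the threshold from the highest score down."""
    if not pos or not neg:
        raise ValueError("need scores for both matching and nonmatching pairs")
    thresholds = sorted(set(pos) | set(neg), reverse=True)
    pts = [(0.0, 0.0)]
    for t in thresholds:
        # A pair is predicted to match when its score is at least t.
        tpr = sum(s >= t for s in pos) / len(pos)
        fpr = sum(s >= t for s in neg) / len(neg)
        pts.append((fpr, tpr))
    return pts


def partial_auc(pos: List[float], neg: List[float], max_fpr: float) -> float:
    """Trapezoidal area under the ROC curve for FPR in [0, max_fpr] (unnormalized)."""
    pts = roc_points(pos, neg)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Linearly interpolate the TPR where this segment crosses max_fpr.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

Under this setup, a comparator that separates matches from nonmatches perfectly attains the maximum possible area max_fpr, so two comparators can be ranked by their partial areas over the same FPR range.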