JSM Preliminary Online Program
This is the preliminary program for the 2006 Joint Statistical Meetings in Seattle, Washington.

The views expressed here are those of the individual authors
and not necessarily those of the ASA or its board, officers, or staff.



Activity Number: 533
Type: Topic Contributed
Date/Time: Thursday, August 10, 2006 : 10:30 AM to 12:20 PM
Sponsor: Section on Survey Research Methods
Abstract - #306349
Title: A Study of String Comparator Performance on Census Name Data
Author(s): William E. Yancey*+
Companies: U.S. Census Bureau
Address: Statistical Research Division, Washington, DC 20233
Keywords: record linkage; string comparator; edit distance; ROC curve
Abstract:

We compare the performance of several string comparators on first and last name data from the clerically reviewed census and accuracy follow-up files from 2000 and 1990. These include the Jaro-Winkler string comparator, with and without optional enhancements, and several edit-distance-based string comparators. We also consider a string comparator that combines the Jaro-Winkler and edit-distance approaches. The main statistical comparison is based on areas under portions of the ROC curve (sensitivity vs. specificity) for each comparator on each dataset of name pairs that were judged to come from matching records but are not spelled identically. Finally, we examine how choosing among string comparators with differing ROC-based scores affects actual record linkage results.
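The comparators and the evaluation criterion named in the abstract can be illustrated with a short sketch. This is a minimal example under stated assumptions, not the author's code: it implements the basic Jaro-Winkler comparator (prefix weight 0.1, prefix length up to 4, no boost threshold, and none of the optional enhancements the abstract mentions), a plain Levenshtein edit distance normalized by the longer string's length (the normalization is an assumption), and an unnormalized partial area under the ROC curve from a false-positive rate of 0 up to a chosen cutoff. The abstract describes only matched name pairs; the sketch assumes a separate set of nonmatching pairs supplies the negatives for the ROC curve. All function names and the `max_fpr` cutoff are hypothetical.

```python
def jaro(s1: str, s2: str) -> float:
    """Standard Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Two characters "match" if equal and no farther apart than this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1, matched2 = [False] * len1, [False] * len2
    m = 0
    for i, c in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not matched2[j] and s2[j] == c:
                matched1[i] = matched2[j] = True
                m += 1
                break
    if m == 0:
        return 0.0
    # Matched characters that appear in a different order are half-transpositions.
    k = half = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                half += 1
            k += 1
    t = half / 2
    return (m / len1 + m / len2 + (m - t) / m) / 3.0


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Basic Winkler adjustment: boost the Jaro score for a shared prefix."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1.0 - j)


def levenshtein(s1: str, s2: str) -> int:
    """Minimum number of insertions, deletions, and substitutions."""
    prev = list(range(len(s2) + 1))
    for i, a in enumerate(s1, 1):
        cur = [i]
        for j, b in enumerate(s2, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (a != b)))
        prev = cur
    return prev[-1]


def edit_similarity(s1: str, s2: str) -> float:
    """Edit distance rescaled to a [0, 1] similarity (assumed normalization)."""
    longest = max(len(s1), len(s2))
    return 1.0 if longest == 0 else 1.0 - levenshtein(s1, s2) / longest


def roc_points(pos, neg):
    """(false-positive rate, true-positive rate) at each distinct score cutoff."""
    pts = [(0.0, 0.0)]
    for s in sorted(set(pos) | set(neg), reverse=True):
        tpr = sum(x >= s for x in pos) / len(pos)
        fpr = sum(x >= s for x in neg) / len(neg)
        pts.append((fpr, tpr))
    return pts


def partial_auc(pos, neg, max_fpr: float = 0.1) -> float:
    """Trapezoidal area under the ROC curve for FPR in [0, max_fpr] (unnormalized)."""
    pts = roc_points(pos, neg)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Cut the last segment off exactly at max_fpr by linear interpolation.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

For example, `jaro("MARTHA", "MARHTA")` is about 0.944, and the shared three-letter prefix raises `jaro_winkler("MARTHA", "MARHTA")` to about 0.961. To compare two comparators in this framework, one would score every matched pair and every nonmatched pair with each comparator and compare their `partial_auc` values; because the area is unnormalized, its maximum equals `max_fpr`.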


  • The address information is for the authors who have a + after their name.
  • Authors who are presenting talks have a * after their name.

JSM 2006. For information, contact jsm@amstat.org or phone (888) 231-3473. If you have questions about the Continuing Education program, please contact the Education Department.
Revised April 2006