Abstract #300896

This is the preliminary program for the 2003 Joint Statistical Meetings in San Francisco, California. The program currently includes the "technical" program (the schedule of invited, topic-contributed, regular contributed, and poster sessions), Continuing Education courses (August 2-5, 2003), and Committee and Business Meetings. This online program will be updated frequently to reflect the latest revisions.


The views expressed here are those of the individual authors
and not necessarily those of the ASA or its board, officers, or staff.





JSM 2003 Abstract #300896
Activity Number: 113
Type: Topic Contributed
Date/Time: Monday, August 4, 2003 : 10:30 AM to 12:20 PM
Sponsor: Section on Survey Research Methods
Title: Record Linkage Using Error Prone Strings
Author(s): Rainer Schnell*+ and Stefan Bender
Companies: University of Konstanz and Institute for Employment Research
Address: 78457 Konstanz, Germany
Keywords: record linkage ; error-prone strings ; survey data ; register data ; STATA
Abstract:

We are attempting to recover employer names for 600 respondents in a labor-force survey by linking the survey records with two other data sets: an official register of employers and employee records from the German social security system. To do this, we first had to test reliably how well various string-comparison algorithms perform when German last names are used as key variables. We developed our own program for matching German personal names. The program uses blocking variables that are assumed to be error-free. Within a block, all records are compared on a set of specified key variables, which are edited automatically according to selectable preprocessing options. We implemented several phonetic algorithms: Soundex, Metaphone, DoubleMetaphone, NYSIIS, Speedcop, and Phonex. We also programmed a set of string-similarity algorithms, including n-grams, Guth, LCS, Levenshtein, and Ratcliff/Obershelp, and implemented Synoname. We report the true links found under varying combinations of preprocessing options and string-similarity algorithms.
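The abstract does not describe the program's internals, so the following is only a minimal Python sketch of the general pipeline it outlines (blocking, preprocessing, phonetic coding, string similarity), not the authors' implementation. Everything concrete here is a hypothetical illustration: the blocking field `birth_year`, the umlaut transliteration used as the "preprocessing option", the 0.8 threshold, and the decision rule "Soundex codes agree OR normalized Levenshtein similarity is at least the threshold". Soundex is the standard American variant, not a German-specific phonetic code, and Python's `difflib.SequenceMatcher` is close to, but not identical with, Ratcliff/Obershelp.

```python
from collections import Counter, defaultdict
from difflib import SequenceMatcher

def preprocess(name: str) -> str:
    """Hypothetical preprocessing option: uppercase, transliterate umlauts,
    keep only A-Z. (str.upper() already maps the German sharp s to 'SS'.)"""
    s = name.upper()
    for src, dst in (("Ä", "AE"), ("Ö", "OE"), ("Ü", "UE")):
        s = s.replace(src, dst)
    return "".join(ch for ch in s if "A" <= ch <= "Z")

# Standard American Soundex digit classes.
_CODES = {c: d for d, letters in {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
                                  "4": "L", "5": "MN", "6": "R"}.items()
          for c in letters}

def soundex(name: str) -> str:
    s = preprocess(name)
    if not s:
        return ""
    out, prev = [], _CODES.get(s[0], "")
    for ch in s[1:]:
        code = _CODES.get(ch, "")
        if code and code != prev:
            out.append(code)
        if ch not in "HW":  # H and W do not separate letters with equal codes
            prev = code     # vowels reset prev to "", so repeats are coded again
    return (s[0] + "".join(out) + "000")[:4]

def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) using two DP rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def dice_bigrams(a: str, b: str) -> float:
    """Dice coefficient on character bigrams (one n-gram variant)."""
    ga = Counter(a[i:i + 2] for i in range(len(a) - 1))
    gb = Counter(b[i:i + 2] for i in range(len(b) - 1))
    total = sum(ga.values()) + sum(gb.values())
    if total == 0:
        return float(a == b)
    return 2 * sum((ga & gb).values()) / total

def compare(x: str, y: str) -> dict:
    """Score one pair of preprocessed names with several comparators."""
    return {
        "soundex_match": soundex(x) == soundex(y),
        "levenshtein_sim": 1 - levenshtein(x, y) / max(len(x), len(y), 1),
        "bigram_dice": dice_bigrams(x, y),
        "gestalt": SequenceMatcher(None, x, y, autojunk=False).ratio(),
    }

def link(records_a, records_b, block_key, name_key, threshold=0.8):
    """Compare all record pairs within each block; hypothetical decision rule."""
    blocks = defaultdict(list)
    for rb in records_b:
        blocks[rb[block_key]].append(rb)
    links = []
    for ra in records_a:
        for rb in blocks.get(ra[block_key], []):
            x, y = preprocess(ra[name_key]), preprocess(rb[name_key])
            scores = compare(x, y)
            if scores["soundex_match"] or scores["levenshtein_sim"] >= threshold:
                links.append((ra["id"], rb["id"], round(scores["levenshtein_sim"], 3)))
    return links

# Toy records (invented for illustration, not study data).
survey = [{"id": "s1", "name": "Müller", "birth_year": 1960},
          {"id": "s2", "name": "Schmidt", "birth_year": 1971}]
register = [{"id": "r1", "name": "Mueller", "birth_year": 1960},
            {"id": "r2", "name": "Schmitt", "birth_year": 1971},
            {"id": "r3", "name": "Schmitt", "birth_year": 1960}]
print(link(survey, register, "birth_year", "name"))
# [('s1', 'r1', 1.0), ('s2', 'r2', 0.857)]
```

Blocking on `birth_year` means "Schmidt" (1971) is never compared with the 1960 "Schmitt", which is how blocking cuts the number of comparisons but also why it relies on the blocking variable being error-free. Swapping the decision rule (for example, requiring `bigram_dice` or `gestalt` above a cutoff) is one way to compare combinations of comparators.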


  • The address information is for the authors that have a + after their name.
  • Authors who are presenting talks have a * after their name.


JSM 2003: For information, contact meetings@amstat.org or phone (703) 684-1221. If you have questions about the Continuing Education program, please contact the Education Department.
Revised March 2003