Abstract Details

Activity Number: 396 - Current Themes in Record Linkage Research
Type: Topic Contributed
Date/Time: Tuesday, August 1, 2017 : 2:00 PM to 3:50 PM
Sponsor: Survey Research Methods Section
Abstract #323907
Title: File linking with faulty matching information
Author(s): Nicole Dalzell* and Jerry Reiter and Gale Boyd
Companies: Wake Forest University and Department of Statistical Science, Duke University and Duke University
Keywords: linkage ; faulty ; Bayesian ; hierarchical ; matching

Many data sets, such as surveys, are publicly available for analysis. Linking such public data sources to internal or private data sets allows richer analyses. Without common identifiers across the two files, linking often involves matching on a set of variables common to both files. However, data quality concerns, such as inaccurate field values or missing data, can hinder the linking process. We present a Bayesian file linking methodology designed to link records using continuous matching variables (MVs) in situations where we do not expect the values of these MVs to agree exactly across matched pairs. The method involves a linking model for the distance between the MVs of records in one file and the MVs of their linked records in the second file. This model is conditional on a vector indicating the links. We specify a mixture model for the distance component of the linking model, as this latent structure allows the distance between matching variables in linked pairs to vary across types of linked pairs. Finally, we specify a model for the linking vector. We use the approach to link public survey information and data from the U.S. Census of Manufactures.
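
The structure described above (distances between linked MVs, modeled by a mixture, conditional on a linking vector) can be illustrated with a minimal sketch. This is not the authors' model: the zero-mean Gaussian components, the mixture weights and scales, the simulated data, and the names `linked_distances` and `mixture_loglik` are all assumptions made for illustration, and the sketch omits the prior on the linking vector and any posterior sampling.

```python
import numpy as np

def linked_distances(file_a, file_b, links):
    """Differences between each record's MVs in file_a and those of its
    linked record in file_b; links[i] is the file_b row linked to row i."""
    return file_a - file_b[links]

def mixture_loglik(dist, weights, scales):
    """Log-likelihood of the distances under a zero-mean Gaussian mixture
    (an assumed form), with MVs independent within each component."""
    dist = np.asarray(dist, dtype=float)
    comp = []
    for w, s in zip(weights, scales):
        # Log density of every record under this component, summed over MVs.
        log_pdf = -0.5 * np.sum((dist / s) ** 2 + np.log(2 * np.pi * s ** 2), axis=1)
        comp.append(np.log(w) + log_pdf)
    comp = np.stack(comp, axis=1)  # shape (n_records, n_components)
    # Log-sum-exp over components for numerical stability.
    m = comp.max(axis=1, keepdims=True)
    return float(np.sum(m[:, 0] + np.log(np.exp(comp - m).sum(axis=1))))

# Simulated example (hypothetical data): 20 records, 2 continuous MVs.
rng = np.random.default_rng(0)
n = 20
file_b = rng.normal(0.0, 10.0, size=(n, 2))
true_links = rng.permutation(n)

# Two "types" of linked pairs: half agree closely, half are noisier.
noise_sd = np.where(np.arange(n) < n // 2, 0.1, 1.0)[:, None]
file_a = file_b[true_links] + rng.normal(size=(n, 2)) * noise_sd

weights, scales = [0.5, 0.5], [0.1, 1.0]  # assumed mixture parameters
wrong_links = np.roll(true_links, 1)      # every record linked incorrectly

ll_true = mixture_loglik(linked_distances(file_a, file_b, true_links), weights, scales)
ll_wrong = mixture_loglik(linked_distances(file_a, file_b, wrong_links), weights, scales)
```

Under these assumptions, the true linking vector yields a higher log-likelihood than the shifted one, which is the signal a Bayesian sampler over linking vectors would exploit.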

Authors who are presenting talks have a * after their name.

Back to the full JSM 2017 program

Copyright © American Statistical Association