Thursday, February 23
PS1 Poster Session 1 and Opening Mixer Thu, Feb 23, 5:30 PM - 7:00 PM
Conference Center AB

Probabilistic Record Linkage in R and Stata (303410)

*Anders R Alexandersson, Florida Cancer Data System 

Keywords: Probabilistic record linkage, fuzzy matching, Fellegi-Sunter model, data linkage, R package RecordLinkage, Stata command clrevmatch

This poster will illustrate probabilistic record linkage in R and Stata. Probabilistic record linkage, also known as fuzzy matching or Fellegi-Sunter record linkage, has many and varied applications. The three main steps are preprocessing, the actual linking, and a clerical review of uncertain links. The problem is that no single tool easily handles all three steps for moderately large datasets. The leading general-purpose statistical packages are R, SAS, and Stata. Special-purpose software such as AutoMatch, BigMatch, and Link Plus is difficult to integrate with R, SAS, or Stata. The preprocessing can be done in any of R, SAS, or Stata. The linkage step cannot easily be done in SAS (e.g., Link King is limited) or Stata (e.g., reclink2 is slow). The review step cannot easily be done in R or SAS. A solution is to use the R package RecordLinkage for the linkage step and the Stata command clrevmatch for the clerical review step. Two tools for running R and Stata code together seamlessly are the R package RStata and the Stata command rcall. Because R is more difficult to learn than Stata, the focus will be on rcall, that is, on how to use the R package RecordLinkage from within Stata.
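To make the linkage and clerical review steps concrete, the following is a minimal sketch of Fellegi-Sunter scoring in plain Python. It is not the RecordLinkage or clrevmatch implementation. The field names, the m- and u-probabilities, and the two thresholds are all hypothetical values chosen for illustration. In practice the probabilities would be estimated from the data, and exact equality would typically be replaced by a string comparator.

```python
import math

# Hypothetical per-field parameters (illustrative, not estimated from data):
# m = P(field agrees | records are a true match)
# u = P(field agrees | records are not a match)
FIELD_PARAMS = {
    "last_name": (0.95, 0.01),
    "first_name": (0.90, 0.05),
    "birth_year": (0.98, 0.02),
}


def match_weight(rec_a, rec_b, params=FIELD_PARAMS):
    """Sum the log2 likelihood ratios over all compared fields."""
    total = 0.0
    for field, (m, u) in params.items():
        if rec_a[field] == rec_b[field]:
            total += math.log2(m / u)              # agreement weight (positive)
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight (negative)
    return total


def classify(weight, lower=0.0, upper=10.0):
    """Three-way decision; the thresholds are assumptions for this sketch."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "review"  # uncertain pairs go to clerical review


# Toy records, invented for illustration only.
a = {"last_name": "smith", "first_name": "john", "birth_year": 1970}
b = {"last_name": "smith", "first_name": "john", "birth_year": 1970}
c = {"last_name": "smith", "first_name": "jon", "birth_year": 1970}
d = {"last_name": "jones", "first_name": "mary", "birth_year": 1985}

for other in (b, c, d):
    w = match_weight(a, other)
    print(round(w, 2), classify(w))
```

Pairs that fall between the two thresholds are the uncertain links that the clerical review step is meant to resolve.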