Saturday, February 20
CS24 Administrative Applications Sat, Feb 20, 11:00 AM - 12:30 PM

R for Record Linkage (303182)

*Ahmad Emad, American Institutes for Research 
Celeste Stone, American Institutes for Research 

Keywords: Record Linkage, R

Record linkage is the process of linking equivalent records from two or more files, or of finding duplicates within a file. The record linkage techniques that were initially developed for linking survey and administrative data are now being applied to Big Data, and wider use of these techniques by researchers stands to create many new research opportunities. The objective of this presentation is to introduce participants to record linkage using the R statistical software. The RecordLinkage package is specifically designed for linking and deduplicating data sets that have no unique identifiers. We will share common data standardization techniques used in record linkage, explain field comparison metrics, and give an overview of the linking algorithms available in the RecordLinkage package. We will highlight the potential of record linkage by presenting the results of our research on the relationship between patents and federal research funding. We will then explain how such techniques can be generalized to other domains, using examples from medical research, business intelligence, and federal research.
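
As a rough illustration of two of the ideas mentioned above, data standardization and a field comparison metric, the following is a minimal sketch in Python. It does not use the RecordLinkage package or reproduce its API. The function names (`standardize`, `levenshtein`, `similarity`) and the particular cleaning rules are assumptions chosen for this example, and a normalized Levenshtein similarity stands in for whichever comparison metrics the presentation covers.

```python
import re
import unicodedata


def standardize(value: str) -> str:
    """Hypothetical cleaning step: strip accents, lowercase,
    replace punctuation with spaces, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", value)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions),
    computed row by row with dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two standardized field values:
    1 minus the edit distance divided by the longer length."""
    a, b = standardize(a), standardize(b)
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty fields are treated as identical here
    return 1.0 - levenshtein(a, b) / longest
```

In a linkage workflow, a score like this would be computed for each compared field of a candidate record pair, and the per-field scores would then feed a linking algorithm that decides whether the pair is a match. How scores are combined and where any threshold is set depends on the algorithm chosen.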