Abstract Details

Activity Number: 635 - Advances in Machine Learning
Type: Contributed
Date/Time: Thursday, August 2, 2018 : 8:30 AM to 10:20 AM
Sponsor: Section on Statistical Learning and Data Science
Abstract #330192 Presentation
Title: A Comparison of Record Linkage Techniques
Author(s): Lowell Mason*
Companies: U.S. Bureau of Labor Statistics
Keywords: record linkage; integrated data; machine learning

It has become increasingly common to create new statistical products by integrating existing data rather than engaging in new data collection; using existing data sources is less expensive and does not increase respondent burden. However, it is usually not possible to integrate multiple data sources satisfactorily without manual intervention. An example is the integration of the Bureau of Economic Analysis (BEA) enterprise-level data on Foreign Direct Investment (FDI) with establishment data from the Bureau of Labor Statistics' Quarterly Census of Employment and Wages (QCEW). In this particular case, the initial error rate was 87.7%. After manual review and correction, the error rate was reduced to 19.0%. The labor cost, however, was considerable: almost 1,510.5 hours. To reduce both linkage error and labor costs, we implement several record linkage techniques. We consider supervised learning techniques, such as Support Vector Machines (SVM) and Random Forests. As a baseline for comparison, we also implement the methods developed by Fellegi and Sunter (1969).

Authors who are presenting talks have a * after their name.

Back to the full JSM 2018 program