All Times EDT

Abstract Details

Activity Number: 53 - Applications of Data Linkage and Machine Learning Techniques
Type: Contributed
Date/Time: Monday, August 3, 2020 : 10:00 AM to 2:00 PM
Sponsor: Survey Research Methods Section
Abstract #310994
Title: Estimating Linkage Errors Under Regularity Conditions
Author(s): Abel Dasylva* and Arthur Goussanou
Companies: and Statistics Canada
Keywords: data matching; big data; entity resolution; false negative rate; data integration
Abstract:

The accurate and cost-effective estimation of linkage errors remains a major challenge for the automated production and use of linked data. However, this exercise is worthwhile only if the linked data are fit for use. A new model is proposed to estimate the errors without clerical reviews, training data, or conditional independence assumptions, under regularity conditions that guarantee the fitness for use of the linked data. It is based on the number of records adjacent to a given record, when linking files that have few duplicate records and nearly complete coverage of the target population. Additional benefits include the estimation of false negatives due to blocking criteria, as well as record-level measures of error, both of which were challenges for previous models.


Authors who are presenting talks have a * after their name.

Back to the full JSM 2020 program