Abstract Details
Activity Number:
Thursday, August 8, 2013 : 10:30 AM to 12:20 PM
Government Statistics Section
Abstract - #308133
Testing Record Linkage Production Data Quality
K. Bradley Paxton*+
Keywords: Record Linkage; Testing; Automation; Data Quality
Record linkage is used to find common entities (e.g., persons, households, or businesses) between pairs of data records in disparate data files. Once these links are found, an improved data set may be obtained by merging the matched entity data. The resulting data set can then be used for the appropriate business purpose or examined further by "data mining." If, however, the record linkage is done poorly, the "improved" data set might actually be worse than before. Testing the production output data quality of record linkage systems is very difficult; most practitioners find it so difficult that they barely do it at all. As a result, many practitioners of record linkage do not know precisely how well their system actually works, much less how to make it better. In this paper, we outline a way to use automation to enable the efficient measurement of record linkage data quality in production or in development testing using "real" data. We call our automated testing approach RLPDQ, which stands for Record Linkage Production Data Quality; it is an extension of the PDQ system that was used successfully in the 2010 Census to measure data capture quality in forms processing.
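To make "measuring record linkage data quality" concrete, the sketch below scores a set of predicted links against a set of known true links using precision and recall. This is only an illustration under stated assumptions: the abstract does not describe how RLPDQ computes its measures, so the function name, the record IDs, and the choice of precision and recall as metrics are all hypothetical and are not taken from the paper.

```python
def linkage_quality(predicted_links, true_links):
    """Return (precision, recall) for predicted record links.

    Each link is a pair (id_in_file_a, id_in_file_b). The metric choice
    is an assumption for illustration, not the RLPDQ method.
    """
    predicted = set(predicted_links)
    truth = set(true_links)
    correct = predicted & truth  # links the system found that are real matches

    # Guard against empty sets to avoid division by zero.
    precision = len(correct) / len(predicted) if predicted else 0.0
    recall = len(correct) / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical record IDs, invented for this example.
predicted = {("a1", "b1"), ("a2", "b2"), ("a3", "b9")}
truth = {("a1", "b1"), ("a2", "b2"), ("a4", "b4"), ("a5", "b5")}

precision, recall = linkage_quality(predicted, truth)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here one of the three predicted links is wrong and two of the four true links are missed, which is the kind of error a poorly performing linkage system would carry into the merged data set.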
Authors who are presenting talks have a * after their name.