Statistical Analysis and Modeling for Errors in Record Linkage
*Michael D Larsen, George Washington University
Keywords: File matching, Full probability model, Fellegi-Sunter algorithm, False match, False nonmatch, Data fusion
Record linkage joins information from two or more files so that the combined records bring together all the available data for each individual. The product of record linkage is a file with one record per individual that contains all the information about that individual from the multiple files. The problem is difficult when a unique identification key is not available, some variables contain errors, some data are missing, and the files are large. Probabilistic record linkage computes the probability that records from different files pertain to a single individual rather than to different people. Some true links are given low probabilities of matching, whereas some nonlinks are given high probabilities. Errors in linkage designations can bias analyses based on the composite database. Models are proposed that jointly represent the record linkage process and the subsequent statistical analysis.
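As an illustration of how a match probability of this kind can be computed, the following is a minimal sketch of a Fellegi-Sunter style calculation. It is not the model proposed in this work. It assumes that field agreements are conditionally independent given match status, and the function names, the per-field probabilities m and u, and the prior match proportion p are hypothetical values chosen only for the example.

```python
import math


def match_probability(agreements, m, u, p):
    """Posterior probability that a record pair is a true match.

    agreements: sequence of booleans, one per comparison field
    m[k]: assumed P(field k agrees | pair is a match)
    u[k]: assumed P(field k agrees | pair is a nonmatch)
    p:    assumed prior proportion of matches among candidate pairs
    """
    like_match = p
    like_nonmatch = 1.0 - p
    # Conditional independence: multiply the per-field likelihoods.
    for agree, mk, uk in zip(agreements, m, u):
        like_match *= mk if agree else 1.0 - mk
        like_nonmatch *= uk if agree else 1.0 - uk
    return like_match / (like_match + like_nonmatch)


def match_weight(agreements, m, u):
    """Sum of per-field log2 likelihood ratios (the matching weight)."""
    weight = 0.0
    for agree, mk, uk in zip(agreements, m, u):
        if agree:
            weight += math.log2(mk / uk)
        else:
            weight += math.log2((1.0 - mk) / (1.0 - uk))
    return weight


# Hypothetical parameters for three fields (e.g. surname, birth year, ZIP).
m = [0.95, 0.90, 0.85]
u = [0.01, 0.05, 0.10]
p = 0.01

all_agree = match_probability([True, True, True], m, u, p)
all_disagree = match_probability([False, False, False], m, u, p)
mixed = match_probability([True, False, True], m, u, p)
```

With these illustrative values, a pair that agrees on every field receives a high match probability and a pair that disagrees on every field receives a very low one. A true match with an error in one field, as in the mixed case, gets an intermediate probability, which is how true links can end up with low probabilities and be designated incorrectly.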