Introductory Overview Lecture: Data Integration and Data Linking/Matching

Record Linkage: Background and Challenges for Linkage of Persons and Establishments (308412)

*Michael D. Larsen, Saint Michael’s College

Record linkage, or exact file matching, consists of bringing together records from two or more files on the same population. The population can consist of individuals, establishments, or other entities. Files are linked to create a larger database, to enable analyses that would otherwise not be possible, and to count the population. When unique, error-free identification codes are not available on both files, record linkage can be accomplished through probabilistic methods. When implementing matching algorithms, one must choose matching variables, define for each variable what it means to agree or disagree, choose blocking factors that restrict the space of comparison pairs, and decide the level of evidence required to declare that a pair of records is a probable match. Linkage of establishments can be complicated by multiple associated names, physical and mailing addresses, and points of contact. Subsequent analyses of data formed through a linkage process can be affected by both false matches and false non-matches between and among records. Model-based adjustment and survey weighting have been suggested as ways to account for linkage error in analyses of linked files. This lecture will discuss linkage processes and algorithms in general, with particular attention to issues that arise when linking establishments.
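
The four implementation choices listed above (matching variables, agreement rules, blocking factors, and evidence thresholds) can be illustrated with a minimal sketch. It assumes a Fellegi-Sunter-style scoring scheme based on log-likelihood-ratio weights, which the abstract does not name; the records, field names, m- and u-probabilities, and thresholds are all hypothetical values chosen for illustration, not estimates from any real file.

```python
import math
from itertools import product

# Hypothetical toy files; all field names and values are illustrative only.
file_a = [
    {"id": "a1", "last": "smith", "first": "john", "year": 1980, "zip": "05401"},
    {"id": "a2", "last": "jones", "first": "mary", "year": 1975, "zip": "05401"},
    {"id": "a3", "last": "brown", "first": "ann", "year": 1990, "zip": "05403"},
]
file_b = [
    {"id": "b1", "last": "smith", "first": "jon", "year": 1980, "zip": "05401"},
    {"id": "b2", "last": "jones", "first": "mary", "year": 1975, "zip": "05401"},
    {"id": "b3", "last": "brown", "first": "ann", "year": 1990, "zip": "05446"},
]

# Matching variables with assumed probabilities of agreement:
# m = P(agree | true match), u = P(agree | true non-match).
M_U = {"last": (0.95, 0.05), "first": (0.90, 0.10), "year": (0.98, 0.02)}


def agreement_weight(field, agree):
    """Log2 likelihood-ratio weight for agreement or disagreement on a field."""
    m, u = M_U[field]
    return math.log2(m / u) if agree else math.log2((1 - m) / (1 - u))


def pair_score(rec_a, rec_b):
    """Sum the weights over all matching variables (exact-equality agreement)."""
    return sum(agreement_weight(f, rec_a[f] == rec_b[f]) for f in M_U)


def link(recs_a, recs_b, block_key="zip", upper=5.0, lower=0.0):
    """Score pairs within blocks and classify them with two assumed thresholds."""
    results = []
    for rec_a, rec_b in product(recs_a, recs_b):
        if rec_a[block_key] != rec_b[block_key]:
            continue  # blocking: pairs in different blocks are never compared
        score = pair_score(rec_a, rec_b)
        if score >= upper:
            status = "link"
        elif score <= lower:
            status = "non-link"
        else:
            status = "review"
        results.append((rec_a["id"], rec_b["id"], round(score, 2), status))
    return results


for row in link(file_a, file_b):
    print(row)
```

In this toy example, records a3 and b3 agree on every matching variable but carry different ZIP codes, so blocking on ZIP prevents them from ever being compared. That is one way a false non-match can arise, and it shows why the choice of blocking factors deserves care.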