Abstract:
|
This paper describes the results of a collaboration to develop tools for probabilistically linking the employer information that individuals report in survey responses to employer administrative records. Our approach has several features that both facilitate the linkage process and enhance the quality of the resulting linked data. We use unique identifiers and dates to narrow the set of candidate employers, employ new standardizing and parsing techniques for business names, develop a "truth" set to train our matching models, and implement both Fellegi-Sunter and logistic models of probabilistic record linkage. To illustrate our approach, we present results from matching a set of jobs described by respondents in the American Community Survey to administrative records on their employers from the Longitudinal Employer-Household Dynamics data. We explore how robust the linking results are to the availability of name and address characteristics and the selection of comparators, to the use of name and address standardizers, to the choice of probabilistic linking model, and to the choice of match quality thresholds. Based on the linking results, we provide recommendations on the utility of these methods.
|