Abstract Details

Activity Number: 465 - Probabilistic Record Linkage: Better Assumptions, Scalable Inference, and Accounting for Uncertainty
Type: Topic Contributed
Date/Time: Wednesday, August 1, 2018 : 8:30 AM to 10:20 AM
Sponsor: Social Statistics Section
Abstract #329156
Title: Incorporating Sociodemographic Transitions and Family Network Structure into Historical Record Linkage
Author(s): Kayla Frisoli* and Rebecca Nugent and Brendan Murphy
Companies: Carnegie Mellon University and Carnegie Mellon University and University College Dublin
Keywords: record linkage; networks; historical census data; entity resolution
Abstract:

Record linkage is the process of identifying records corresponding to unique entities across datasets. Linking historical data allows researchers to better characterize topics like population mobility, the impact of local and national events, and generational change. Most record linkage algorithms rely on string similarities (e.g., edit distance between names); however, we sometimes expect to see changes not captured by standard text similarity metrics (e.g., name changes after marriage). In addition, methods often consider only pairwise information without incorporating relationship information across records (e.g., parents, siblings). We propose extending a typical linkage framework by including network structure and allowing for expected field changes. Our application (the 1901 and 1911 Ireland census records) has limited, non-standardized fields with errors due to formatting and the digitization of handwritten records. These issues, coupled with high frequencies of common names, require modeling additional sociodemographic and network information to correctly link people across censuses. We conclude with a discussion of the challenges inherent to historical record linkage and possible future extensions.
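
To illustrate the limitation of string similarity noted in the abstract, here is a minimal Python sketch. It is not the authors' method. The records, the field names (`first`, `last`, `age`), and the rescaling in `similarity` are assumptions made up for this example. It computes a standard Levenshtein edit distance and shows that a surname change after marriage produces a low surname similarity even when the first name and the ten-year age gap between censuses agree.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Edit distance rescaled to [0, 1]; 1.0 means identical (case-insensitive)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a.lower(), b.lower()) / longest


# Hypothetical records: the same woman before and after marriage.
rec_1901 = {"first": "Mary", "last": "Murphy", "age": 19}
rec_1911 = {"first": "Mary", "last": "O'Brien", "age": 29}

first_sim = similarity(rec_1901["first"], rec_1911["first"])
last_sim = similarity(rec_1901["last"], rec_1911["last"])
age_gap = rec_1911["age"] - rec_1901["age"]

print(f"first-name similarity: {first_sim:.2f}")
print(f"surname similarity:    {last_sim:.2f}")
print(f"age gap in years:      {age_gap}")
```

A linkage rule based on name similarity alone would likely reject this pair. That is the kind of case where information about expected field changes and household relationships could help.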


Authors who are presenting talks have a * after their name.

Back to the full JSM 2018 program