Abstract Details

Activity Number: 448
Type: Contributed
Date/Time: Tuesday, August 2, 2016 : 2:00 PM to 2:45 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #321788
Title: Statistical Learning Methods for Record Linkage: A Pioneer Mortality Example
Author(s): Kristina Murri*
Companies: Brigham Young University
Keywords: data mining ; machine learning ; statistical learning ; record linkage ; neural nets

Statistical learning algorithms are commonly used in regression and classification settings. We use these methods to link two sets of pioneer records: one from ocean voyages and the other from voyages across the United States by wagon and handcart. Record linkage can be performed by forming all possible pairs of observations, one from each data set, and then using methods such as random forests, stochastic gradient boosting, and neural networks to classify each pair as a match or a non-match. We use classification metrics to compare how well these algorithms perform on different training and test sets. We then apply the best-performing algorithm, in parallel, to the much larger full data set. This research has implications for statistical learning and record linkage: it identifies which methods perform best in which situations and demonstrates an effective way to link records. Once the full data sets are linked, the linkage results can be used to answer further questions, such as those about pioneer mortality rates.
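
The pairing step described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy records, the field names (`name`, `birth_year`), and the two comparison features are hypothetical and are not taken from the author's data or method. The sketch forms every cross-set pair and computes a feature row for each; rows like these are what a classifier such as a random forest would label as match or non-match.

```python
from difflib import SequenceMatcher
from itertools import product

# Hypothetical toy records; the real data sets and their fields are not
# described in the abstract.
voyage_records = [
    {"id": "v1", "name": "John Smith", "birth_year": 1820},
    {"id": "v2", "name": "Mary Jones", "birth_year": 1835},
]
overland_records = [
    {"id": "o1", "name": "Jon Smith", "birth_year": 1820},
    {"id": "o2", "name": "Ann Brown", "birth_year": 1841},
]


def name_similarity(a, b):
    """Return a string similarity in [0, 1] (illustrative feature choice)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def pair_features(rec_a, rec_b):
    """Build one feature row for a candidate pair of records."""
    return {
        "pair": (rec_a["id"], rec_b["id"]),
        "name_sim": name_similarity(rec_a["name"], rec_b["name"]),
        "birth_year_diff": abs(rec_a["birth_year"] - rec_b["birth_year"]),
    }


# All possible pairs: one record from each data set (a Cartesian product).
candidate_pairs = [
    pair_features(a, b) for a, b in product(voyage_records, overland_records)
]

for row in candidate_pairs:
    print(row)
```

The number of candidate pairs is the product of the two data set sizes, which is one reason the abstract's parallel application to the much larger full data set matters.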

Authors who are presenting talks have a * after their name.

Back to the full JSM 2016 program

Copyright © American Statistical Association