Abstract:
|
This paper describes the development of a person-place model for the 2020 Administrative Records Census. To assign people to addresses, we estimated the probability that each person lived at each of their potential addresses on April 1, 2020, using machine learning algorithms trained on historical matches of administrative records to the 2018 American Community Survey. Many observations had only one potential address; for those with multiple potential addresses, which arise from residential mobility and differences in the timing of administrative records, we assigned a probability to each location. To aggregate people to broader geographies, we either assigned each person to their highest-probability address or used the predicted probabilities as weights, so that a person may be partially represented in multiple locations. We evaluated several algorithms for estimating the location probability: logit regression, elastic net, random forest, and gradient boosting. We selected random forest as the final model based on validation performance metrics and learned variable importance.
|
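The two aggregation rules described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the person identifiers, geography labels, probabilities, and function names below are all hypothetical, and the sketch assumes each person's predicted probabilities sum to one across their candidate addresses.

```python
from collections import defaultdict

# Hypothetical candidate records: (person_id, geography, predicted probability).
# Assumption: probabilities for each person sum to 1 across their candidates.
candidates = [
    ("p1", "A", 1.0),   # only one potential address
    ("p2", "A", 0.7),
    ("p2", "B", 0.3),
    ("p3", "B", 0.6),
    ("p3", "C", 0.4),
]


def count_by_top_address(rows):
    """Count each person once, in the geography of their highest-probability address."""
    best = {}
    for person, geo, prob in rows:
        if person not in best or prob > best[person][1]:
            best[person] = (geo, prob)
    counts = defaultdict(float)
    for geo, _ in best.values():
        counts[geo] += 1.0
    return dict(counts)


def count_by_weights(rows):
    """Use probabilities as weights, so a person is split across geographies."""
    counts = defaultdict(float)
    for _, geo, prob in rows:
        counts[geo] += prob
    return dict(counts)


top_counts = count_by_top_address(candidates)
weighted_counts = count_by_weights(candidates)
```

In this toy input, the first rule places every person in exactly one geography, while the second spreads persons p2 and p3 across two geographies each. Under the stated assumption, both rules yield the same total population; they differ only in how that total is distributed.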