40 – Innovations at U.S. Census Bureau
A Comparison of Methodologies for Classification of Administrative Records Quality for Census Enumeration
Darcy Steeg Morris
U.S. Census Bureau
The use of administrative records (data collected by governmental or non-governmental agencies in the course of administering a program or service) for household enumeration may be one way to significantly reduce Census costs, particularly in nonresponse follow-up (NRFU). Administrative records share the complications of big data in that they are collected for purposes unrelated to Census enumeration, yet they contain a wealth of information relevant to it. This work investigates different classification techniques for determining which administrative records are reliable enough to support a Census enumeration that maintains data quality while reducing costs. Beyond the cost/quality tradeoff associated with using administrative records, we seek a methodology that balances predictive power against model complexity. We compare logistic regression and machine learning techniques for extracting and synthesizing the most important enumeration information from a set of governmental and non-governmental data sources.