Conference Program

All Times ET

Thursday, June 9
Practice and Applications
Applications in Social & Behavioral Sciences, Part 2
Thu, Jun 9, 9:50 AM - 10:30 AM
Allegheny I
 

Predicting Census Survey Response Rates via Additive Regression with Interactions (310198)

Emanuel Ben-David, US Census Bureau 
*Shibal Ibrahim, MIT 
Rahul Mazumder, MIT 
Peter Radchenko, University of Sydney 

Keywords: additive models, interactions, regression, predictive models, self-response rate, Census, survey, machine learning

Accurate and interpretable prediction of survey response rates is important from an operational standpoint. The US Census Bureau's well-known ROAM application uses principled statistical models trained on the US Census Planning Database to identify hard-to-survey areas. An earlier crowdsourcing competition revealed that an ensemble of regression trees performed best in predicting survey response rates; however, the corresponding models could not be adopted for the intended application because of their limited interpretability. In this paper, we propose additive regression models with interactions to predict survey response rates with high accuracy. To facilitate interpretation, we focus on parsimonious models obtained via l0-regularization, as well as hierarchically structured variants that provide enhanced interpretability. Despite strong methodological underpinnings, estimation and variable selection in additive regression can be computationally challenging in high dimensions; we present new scalable algorithms for learning these models. We also establish novel non-asymptotic error bounds for the proposed estimators. Our experiments on the US Census Planning Database demonstrate that the proposed method yields high-quality predictive models that permit actionable interpretability for different segments of the population, without losing predictive performance relative to state-of-the-art black-box machine learning methods based on gradient boosting and feedforward neural networks.
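
For orientation, a generic additive model with pairwise interactions and an l0-type sparsity penalty can be sketched as below. This is an illustrative formulation only, not the authors' exact estimator: the notation (y_i, x_i, f_j, f_{jk}, lambda), the squared-error loss, and the strong-hierarchy constraint are all assumptions introduced here and do not come from the abstract.

```latex
% Assumed notation (hypothetical, not taken from the abstract):
%   y_i        response rate observed for area i, i = 1, ..., n
%   x_i        covariate vector in R^p for area i
%   f_j, f_jk  main-effect and pairwise-interaction component functions
%   \lambda    nonnegative tuning parameter controlling sparsity
\[
  f(x) \;=\; \sum_{j=1}^{p} f_j(x_j) \;+\; \sum_{1 \le j < k \le p} f_{jk}(x_j, x_k)
\]
% l0-type penalty: counts how many components are not identically zero.
\[
  \min_{\{f_j\},\,\{f_{jk}\}} \;
  \frac{1}{n} \sum_{i=1}^{n} \bigl( y_i - f(x_i) \bigr)^2
  \;+\; \lambda \Bigl( \sum_{j=1}^{p} \mathbf{1}[f_j \neq 0]
  \;+\; \sum_{1 \le j < k \le p} \mathbf{1}[f_{jk} \neq 0] \Bigr)
\]
% One common form of hierarchy (strong hierarchy), shown as an assumption:
% an interaction may be active only if both of its main effects are active.
\[
  f_{jk} \neq 0 \;\Longrightarrow\; f_j \neq 0 \ \text{and}\ f_k \neq 0
\]
```

Under this reading, interpretability comes from the small number of active components, each of which depends on at most two covariates and can be plotted directly.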