Online Program

Wednesday, January 8
Wed, Jan 8, 8:30 AM - 10:15 AM
Pacific C
Linked Data for Evidence-Based Policymaking

On the utility of prediction models in large government surveys: using linked administrative-survey data to inform analyses of more contemporaneous survey data (306602)


*Yulei He, National Center for Health Statistics
Jennifer D Parker, National Center for Health Statistics
Jennifer R Rammon, National Center for Health Statistics

Keywords: NHANES, Medicaid, Linked Data, Multiple Imputation, Machine Learning, Children

The record linkage program at the National Center for Health Statistics (NCHS) maximizes the scientific value of population-based surveys by adding person-level information from rich administrative data sets. The linked survey files provide a unique opportunity to examine factors that influence health status in conjunction with information obtained only at the population level through NCHS surveys. A disadvantage of the linked data is the consistent time lag between the availability of the population survey data produced by NCHS and the availability of the corresponding linked data files. While the currently linked data are useful, it is imperative that we also be able to make inferences from more contemporaneous survey data in order to inform health policy. This project uses the 2005-2012 NHANES linked data to examine the effectiveness of multiple imputation (MI) and machine learning (ML) methods for predicting the Medicaid/CHIP enrollment status of children who have not yet been linked to the CMS Medicaid files. Results indicate that predictions based on ML methods are more accurate than MI predictions, but that the two approaches perform similarly in terms of statistical inference.