Online Program

All Times ET

Wednesday, June 2
Practice and Applications
Assessing the Impact of COVID-19 Across Domains
Wed, Jun 2, 1:10 PM - 2:45 PM
TBD
 

Machine Learning Methods for Online Panel-based Surveys (309768)

Presentation

*Yulei He, CDC 
Van Parsons, CDC 
Guangyu Zhang, CDC 

Keywords: Health, Machine learning, Online survey, Probability panel, Prediction

Sample surveys are a major approach to collecting information from target populations, with the aims of producing reliable estimates, informing policy decisions, and providing data for scientific research. Over the past decade or so, online (web) surveys have become an important data collection tool because they are timelier and less costly than traditional methods (e.g., in-person interviews). These advantages have become even more salient during the COVID-19 pandemic. The Research and Development Survey (RANDS) is a series of online health surveys based on probability-sampled panels, conducted by the National Center for Health Statistics, part of the U.S. Centers for Disease Control and Prevention.

Machine learning (ML) methods play a prominent role in the era of data science. Although online surveys are increasingly used for data collection and traditional analyses, ML is still rarely applied to them. In this research, we investigate the utility of established ML methods when applied to online surveys, using RANDS as an illustrative example. Like many established national surveys, RANDS uses a complex survey design (e.g., sampling strata and clusters as well as unequal sampling weights) and is subject to survey nonresponse (i.e., missing data). RANDS includes a wide variety of health and health care related variables (e.g., health insurance coverage, diagnosed diabetes, telemedicine usage during the COVID-19 pandemic). We evaluate the performance of a variety of ML methods (e.g., regularized regression, tree-based methods, deep learning) for predicting important health outcomes (e.g., body mass index). We aim to develop practical ML strategies and guidelines for practitioners that appropriately account for the important yet complex data features of online surveys.
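
As a rough illustration of one way unequal sampling weights could enter a regularized regression, the sketch below fits a one-predictor ridge model by weighted least squares and scores it with a weighted RMSE. It is not the authors' method: the abstract does not say how weights are handled, and the sketch ignores strata, clusters, and nonresponse entirely. The function names, the `penalty` parameter, and the toy data are all hypothetical and are not drawn from RANDS.

```python
from math import sqrt
from typing import Sequence, Tuple


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Mean of `values` in which each unit counts in proportion to its weight."""
    total = sum(weights)
    return sum(w * v for v, w in zip(values, weights)) / total


def fit_weighted_ridge_1d(
    x: Sequence[float],
    y: Sequence[float],
    weights: Sequence[float],
    penalty: float = 0.0,
) -> Tuple[float, float]:
    """Minimize sum_i w_i * (y_i - a - b * x_i)^2 + penalty * b^2.

    Returns (a, b). The intercept a is not penalized. Assumes x is not
    constant, or that penalty is positive.
    """
    x_bar = weighted_mean(x, weights)
    y_bar = weighted_mean(y, weights)
    s_xy = sum(w * (xi - x_bar) * (yi - y_bar) for xi, yi, w in zip(x, y, weights))
    s_xx = sum(w * (xi - x_bar) ** 2 for xi, w in zip(x, weights))
    slope = s_xy / (s_xx + penalty)
    intercept = y_bar - slope * x_bar
    return intercept, slope


def weighted_rmse(
    y_true: Sequence[float], y_pred: Sequence[float], weights: Sequence[float]
) -> float:
    """Root mean squared error with each unit weighted by its sampling weight."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sqrt(weighted_mean(squared_errors, weights))


# Hypothetical toy data: one predictor, one outcome, unequal weights.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
w = [1.0, 2.0, 1.0, 4.0]

intercept, slope = fit_weighted_ridge_1d(x, y, w, penalty=1.0)
predictions = [intercept + slope * xi for xi in x]
print(intercept, slope, weighted_rmse(y, predictions, w))
```

Setting `penalty=0.0` reduces the fit to ordinary weighted least squares; larger values shrink the slope toward zero. A design-based analysis would also need variance estimation that respects strata and clusters, which this sketch does not attempt.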