Abstract:
|
In the era of data science, machine learning (ML) methods play a prominent role because they can handle data with large volume and complex structure. Complex sample surveys are a major approach for collecting information from target populations, with the aim of producing reliable estimates for scientific research. Over the past decade or so, web-based panel surveys have become an important data collection tool because they are timelier and less costly than traditional survey data collection methods. In this project, we demonstrate the use of established ML methods and investigate their utility with web-based panel survey data. We illustrate this using the first two surveys from the Research and Development Survey, a series of health surveys based on probability-sampled web-based panels and conducted by the U.S. National Center for Health Statistics. Specifically, we evaluate the performance of a variety of ML methods (e.g., regularized regressions, tree-based methods, deep learning) for predicting health outcomes in the survey (e.g., body mass index). Our results and experiences may be helpful to others applying ML methods to their data.
|