Abstract:
|
Population health surveys are an important source of data for reporting national health statistics in the United States. Yet, as the COVID-19 pandemic showed, public health crises also call for rapid data collection to monitor changes in health outcomes and health care resources. Alternative data sources, such as the National Center for Health Statistics’ Research and Development Survey (RANDS), a primarily web-based probability-sampled panel survey, have been used for timely data collection. While estimates from RANDS have been shown to have relatively low bias after calibrating the panel weights to a benchmark dataset, methods for integrating panel survey data with multiple years of traditional federal survey data are currently lacking. Machine learning models are flexible and can account for existing trends while incorporating auxiliary information to produce timely and reliable estimates. Several machine learning approaches are evaluated by comparing the bias of the model-based estimates to direct estimates from a reference survey.
|