The Medical Expenditure Panel Survey (MEPS) is an annual survey that collects nationally representative data on healthcare use and expenditures for the civilian, non-institutionalized U.S. population, and it is the primary source of micro-level national data on medical expenditures. MEPS contacts both households and their medical providers to obtain expenditure information that is as complete and accurate as possible. Even so, a substantial portion of the expenditure data must be imputed.
Currently, imputation is conducted using a predictive mean matching (PMM) algorithm, in which a linear regression model predicts total expenditures for both recipients (records with missing expenditures) and donors (records with reported expenditures). Each recipient is matched to the donor whose predicted value is closest, and that donor's expenditures are then allocated to the recipient.
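The matching step can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the MEPS production procedure: the function name `pmm_impute`, fitting ordinary least squares on donor records only, using the absolute difference between predicted values as the distance, and taking the single nearest donor are all choices made here for clarity.

```python
import numpy as np


def pmm_impute(X_donor, y_donor, X_recip):
    """Hypothetical PMM sketch: fit OLS on donors, match each recipient to the
    donor with the closest predicted value, and allocate that donor's observed
    expenditure to the recipient."""
    # Assumption: the regression is fit on donors, whose expenditures are observed.
    A_donor = np.column_stack([np.ones(len(X_donor)), X_donor])
    beta, *_ = np.linalg.lstsq(A_donor, y_donor, rcond=None)

    # Predicted expenditures for donors and recipients from the same model.
    A_recip = np.column_stack([np.ones(len(X_recip)), X_recip])
    pred_donor = A_donor @ beta
    pred_recip = A_recip @ beta

    # Assumption: distance is |predicted recipient - predicted donor|,
    # and each recipient takes the single nearest donor.
    dist = np.abs(pred_recip[:, None] - pred_donor[None, :])
    match = dist.argmin(axis=1)

    # The recipient receives the donor's observed value, not the prediction.
    return y_donor[match], match


X_donor = np.array([[1.0], [2.0], [3.0], [4.0]])
y_donor = np.array([100.0, 210.0, 290.0, 405.0])
X_recip = np.array([[2.2], [3.9]])
imputed, match = pmm_impute(X_donor, y_donor, X_recip)
print(imputed, match)  # [210. 405.] [1 3]
```

Because the recipient receives a donor's reported amount rather than the regression prediction, imputed values always come from the set of actually observed expenditures.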
In this analysis, we assess whether more sophisticated machine-learning (ML) algorithms can improve the existing PMM process for imputing total expenditures. We apply and compare supervised ML algorithms, including random forests, neural networks, and regularized regression. We also explore adding features to the models and assigning greater weight to important features.
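One way an ML algorithm can enter PMM is by replacing the linear model that produces the predicted values while leaving the matching step unchanged. The sketch below assumes that setup and uses ridge regression, one form of regularized regression, fit in closed form with NumPy. The function names `ridge_fit` and `pmm_impute_ridge`, the unpenalized intercept, and the default penalty `lam=1.0` are illustrative assumptions, not settings from this analysis; a random forest or neural network would occupy the same prediction slot.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Ridge regression with an unpenalized intercept (features are centered)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    p = X.shape[1]
    # Closed form: beta = (Xc'Xc + lam * I)^-1 Xc'yc
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    intercept = y_mean - x_mean @ beta
    return intercept, beta


def pmm_impute_ridge(X_donor, y_donor, X_recip, lam=1.0):
    """Hypothetical PMM variant: ridge predictions replace OLS predictions;
    nearest-prediction matching is unchanged."""
    intercept, beta = ridge_fit(X_donor, y_donor, lam)
    pred_donor = intercept + X_donor @ beta
    pred_recip = intercept + X_recip @ beta
    # Each recipient takes the observed value of the donor with the closest prediction.
    match = np.abs(pred_recip[:, None] - pred_donor[None, :]).argmin(axis=1)
    return y_donor[match], match


X_donor = np.array([[1.0], [2.0], [3.0], [4.0]])
y_donor = np.array([100.0, 210.0, 290.0, 405.0])
imputed, match = pmm_impute_ridge(X_donor, y_donor, np.array([[2.2], [3.9]]), lam=5.0)
print(imputed, match)
```

With `lam=0` the fit reduces to ordinary least squares; larger values shrink the coefficients toward zero. In practice the penalty would typically be tuned, for example by cross-validation.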