JSM 2016

Katherine Cauthen

Sandia National Laboratories

Gregory Lambert

Apple Inc.

Jaideep Ray

Sandia National Laboratories

Sophia Lefantzi

Sandia National Laboratories

313 – Missing Data Methods for Epidemiologic Studies

Imputing Data That Are Missing at High Rates Using a Boosting Algorithm

Sponsor: Section on Statistics in Epidemiology

Keywords: multiple imputation, machine learning, boosting

Traditional multiple imputation approaches may perform poorly on datasets with high rates of missingness unless a large number m of imputations is used. This paper implements an alternative, machine-learning-based approach to imputing data that are missing at high rates. We use boosting to build a strong learner from a weak learner fitted to a dataset with many missing observations. The approach can be applied to many types of learners (models). We demonstrate it on a spatiotemporal dataset for predicting dengue outbreaks in India from meteorological covariates: a Bayesian spatiotemporal conditional autoregressive (CAR) model is boosted to produce imputations, and the overall root-mean-square error (RMSE) from k-fold cross-validation is used to assess imputation accuracy.
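A minimal Python sketch of the general idea follows. It is an illustration under stated assumptions, not the authors' implementation: it assumes least-squares gradient boosting (the paper may use a different boosting variant), substitutes a one-split regression stump for the Bayesian spatiotemporal CAR weak learner, fits only on rows whose response is observed, fills the missing rows with a single set of point predictions, and scores accuracy by the overall RMSE across k folds of the observed values. All function names (`fit_stump`, `boost`, `impute`, `kfold_rmse`) and defaults (50 rounds, learning rate 0.1, k = 5) are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

# Hypothetical weak learner: (covariate index, threshold, value if x[j] <= threshold, value otherwise).
Stump = Tuple[int, float, float, float]
Model = Tuple[float, float, List[Stump]]


def fit_stump(X: Sequence[Sequence[float]], r: Sequence[float]) -> Stump:
    """Stand-in weak learner: the single-split regression stump minimizing squared error on r."""
    best: Optional[Stump] = None
    best_sse = math.inf
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            if not right:
                continue  # a split must put some rows on each side
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lv) ** 2 for v in left) + sum((v - rv) ** 2 for v in right)
            if sse < best_sse:
                best_sse, best = sse, (j, t, lv, rv)
    if best is None:  # no valid split (constant covariates): predict the mean
        mean = sum(r) / len(r)
        return (0, math.inf, mean, mean)
    return best


def stump_predict(s: Stump, x: Sequence[float]) -> float:
    j, t, lv, rv = s
    return lv if x[j] <= t else rv


def boost(X, y, n_rounds: int = 50, learning_rate: float = 0.1) -> Model:
    """Least-squares gradient boosting: each new stump is fitted to the current residuals."""
    f0 = sum(y) / len(y)
    pred = [f0] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        s = fit_stump(X, resid)
        stumps.append(s)
        # Shrink each weak learner's contribution by the learning rate.
        pred = [pi + learning_rate * stump_predict(s, xi) for pi, xi in zip(pred, X)]
    return f0, learning_rate, stumps


def boosted_predict(model: Model, x: Sequence[float]) -> float:
    f0, lr, stumps = model
    return f0 + lr * sum(stump_predict(s, x) for s in stumps)


def impute(X, y: Sequence[Optional[float]]) -> List[float]:
    """Fill None entries of y with predictions from a model boosted on the observed rows only."""
    obs = [i for i, v in enumerate(y) if v is not None]
    model = boost([X[i] for i in obs], [y[i] for i in obs])
    return [v if v is not None else boosted_predict(model, X[i]) for i, v in enumerate(y)]


def kfold_rmse(X, y: Sequence[Optional[float]], k: int = 5, seed: int = 0) -> float:
    """Overall RMSE pooled over k folds of the observed values (requires k >= 2)."""
    obs = [i for i, v in enumerate(y) if v is not None]
    random.Random(seed).shuffle(obs)
    sq_err = 0.0
    for f in range(k):
        held = obs[f::k]
        held_set = set(held)
        train = [i for i in obs if i not in held_set]
        model = boost([X[i] for i in train], [y[i] for i in train])
        sq_err += sum((boosted_predict(model, X[i]) - y[i]) ** 2 for i in held)
    return math.sqrt(sq_err / len(obs))


if __name__ == "__main__":
    rng = random.Random(1)
    X = [[rng.uniform(0.0, 10.0)] for _ in range(150)]
    y_full = [2.0 * x[0] + rng.gauss(0.0, 1.0) for x in X]
    # Hide roughly 60% of the responses to mimic a high missingness rate.
    y = [v if rng.random() > 0.6 else None for v in y_full]
    filled = impute(X, y)
    print("imputed", sum(v is None for v in y), "of", len(y), "values")
    print("5-fold CV RMSE on observed values:", round(kfold_rmse(X, y), 3))
```

Because each stump is shrunk by the learning rate, every weak learner contributes only a small correction, and accuracy comes from accumulating many rounds. In the application described in the abstract, the weak learner being boosted is the Bayesian spatiotemporal CAR model rather than a stump, so `fit_stump` and `stump_predict` would be replaced by that model's fit and predict steps.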
