Activity Number: 313
Type: Contributed
Date/Time: Tuesday, August 2, 2016, 8:30 AM to 10:20 AM
Sponsor: Section on Statistics in Epidemiology
Abstract #319353
Title: Imputing Data That Are Missing at High Rates Using a Boosting Algorithm
Author(s): Katherine Cauthen*, Gregory Lambert, Jaideep Ray, and Sophia Lefantzi
Companies: Sandia National Laboratories (all authors)
Keywords: multiple imputation; machine-learning; boosting
Abstract:
Traditional multiple imputation approaches may perform poorly on datasets with high rates of missingness unless a large number of imputations, m, is used. This paper implements an alternative, machine-learning-based approach to imputing data that are missing at high rates. We use boosting to create a strong learner from a weak learner fitted to a dataset that is missing many observations. The approach can be applied to a variety of learners (models). We demonstrate it on a spatiotemporal dataset for predicting dengue outbreaks in India from meteorological covariates: a Bayesian spatiotemporal conditional autoregressive (CAR) model is boosted to produce imputations, and the overall root-mean-square error (RMSE) from a k-fold cross-validation is used to assess imputation accuracy.
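
The general idea described in the abstract (boost a weak learner fitted to the observed values, use the boosted model to fill in the missing ones, and score it by k-fold cross-validated RMSE) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say which boosting algorithm is used, so the sketch assumes squared-error boosting on residuals, and a one-split regression stump stands in for the Bayesian spatiotemporal CAR model. The data are synthetic, and every function name and parameter (`fit_stump`, `boost`, `impute`, `cv_rmse`, `rounds`, `lr`) is hypothetical.

```python
import math
import random


def fit_stump(X, r):
    """Weak learner (stand-in for the CAR model): best one-split stump for targets r."""
    n, best = len(X), None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r[i] for i in range(n) if X[i][j] <= t]
            right = [r[i] for i in range(n) if X[i][j] > t]
            if not right:  # splitting at the maximum leaves nothing on the right
                continue
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    if best is None:  # no valid split exists: predict the mean everywhere
        m = sum(r) / n
        return (0, math.inf, m, m)
    return best[1:]


def stump_predict(stump, row):
    j, t, ml, mr = stump
    return ml if row[j] <= t else mr


def boost(X, y, rounds=30, lr=0.3):
    """Assumed scheme: each round fits the weak learner to the current residuals."""
    base = sum(y) / len(y)
    pred, stumps = [base] * len(y), []
    for _ in range(rounds):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        s = fit_stump(X, resid)
        stumps.append(s)
        pred = [p + lr * stump_predict(s, row) for p, row in zip(pred, X)]
    return base, stumps, lr


def predict(model, row):
    base, stumps, lr = model
    return base + lr * sum(stump_predict(s, row) for s in stumps)


def impute(X, y, rounds=30):
    """Fit on observed entries of y (None marks missing) and fill in the rest."""
    obs = [i for i, v in enumerate(y) if v is not None]
    model = boost([X[i] for i in obs], [y[i] for i in obs], rounds)
    return [v if v is not None else predict(model, X[i]) for i, v in enumerate(y)]


def cv_rmse(X, y, k=5, rounds=30, seed=0):
    """Overall RMSE over k folds of the observed entries."""
    obs = [i for i, v in enumerate(y) if v is not None]
    random.Random(seed).shuffle(obs)
    sq_err = 0.0
    for f in range(k):
        held = set(obs[f::k])
        train = [i for i in obs if i not in held]
        model = boost([X[i] for i in train], [y[i] for i in train], rounds)
        sq_err += sum((y[i] - predict(model, X[i])) ** 2 for i in held)
    return math.sqrt(sq_err / len(obs))


# Synthetic example: two covariates, about 60% of the responses missing.
rng = random.Random(1)
X = [[rng.random(), rng.random()] for _ in range(120)]
truth = [3 * a + math.sin(6 * b) + rng.gauss(0, 0.1) for a, b in X]
y = [None if rng.random() < 0.6 else v for v in truth]
imputed = impute(X, y)
print("boosted CV RMSE:  ", round(cv_rmse(X, y), 3))
print("mean-only CV RMSE:", round(cv_rmse(X, y, rounds=0), 3))
```

Setting `rounds=0` reduces the model to the mean of the observed values, which gives a simple baseline against which the boosted learner's cross-validated RMSE can be compared. Swapping in a different weak learner only requires replacing `fit_stump` and `stump_predict`.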
Authors who are presenting talks have a * after their name.