Online Program

Thursday, February 14
Thu, Feb 14, 5:30 PM - 7:00 PM
St. James Ballroom
Poster Session 1 and Opening Mixer

Workflow for Training and Tuning Imputation and Prediction Model Pairings (303853)

Milo Tyrus Page, JMP 

Keywords: Missing Data, Data Imputation, Holdout Set Validation, Streaming Imputation, Model Validation

As analysts, our goal is generally to use data to make inferences or predictions about one or more response variables. In practice, holdout set validation methods (e.g., training, validation, and test partitions) have proven to be robust procedures for estimating prediction model performance, but these procedures can be flawed in the presence of missing values. Holdout set techniques rely on keeping the validation set partially blind and the test set completely blind. Most prediction methods require complete data, so a preprocessing imputation step is often needed when values are missing. Without proper care, the imputation procedure can unintentionally negate the blinding of the validation and test sets, leading to corrupted evaluations of prediction model performance. Multiple imputation, an oft-employed technique for correcting the standard errors in the prediction model, does not address this issue; it was designed before holdout set techniques were widely used. In this poster, we expand on these ideas and present a compact, practical solution in the form of an imputation and prediction model workflow that is easy to understand and implement.
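
The blinding concern described in the abstract can be illustrated with a minimal sketch. This is not the poster's workflow: it assumes simple column-mean imputation as a stand-in, and the helper names `fit_mean_imputer` and `apply_imputer`, as well as the toy data, are hypothetical. The point is only that imputation parameters learned from the training partition alone differ from those learned from all rows, and that the second case lets held-out values influence preprocessing.

```python
from statistics import fmean

def fit_mean_imputer(rows):
    """Learn one fill value per column from the given rows only.

    Missing entries are represented as None (an assumption of this sketch).
    """
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        # Fall back to 0.0 if a column has no observed values at all.
        means.append(fmean(observed) if observed else 0.0)
    return means

def apply_imputer(rows, means):
    """Fill missing entries using previously learned column means."""
    return [[means[j] if v is None else v for j, v in enumerate(r)]
            for r in rows]

# Toy partitions (hypothetical data).
train = [[1.0, None], [3.0, 4.0], [None, 8.0]]
test = [[None, 100.0], [50.0, None]]

# Blind-preserving: parameters come from the training partition only,
# then the same parameters are applied to every partition.
means = fit_mean_imputer(train)
train_filled = apply_imputer(train, means)
test_filled = apply_imputer(test, means)

# Leaky: parameters are learned from all rows, so test values
# shift the fill values used everywhere, including in training.
leaky_means = fit_mean_imputer(train + test)

print("train-only means:", means)
print("all-rows means:  ", leaky_means)
```

In this sketch the held-out rows are only ever transformed, never used for fitting, which mirrors how a prediction model would be scored on them afterward. The same fit-on-training, apply-everywhere pattern would apply to a validation partition.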