St. James Ballroom
Workflow for Training and Tuning Imputation and Prediction Model Pairings (303853)
Milo Tyrus Page, JMP
Keywords: Missing Data, Data Imputation, Holdout Set Validation, Streaming Imputation, Model Validation
As analysts, our goal is generally to use data to make inferences or predictions about one or more response variables. In practice, holdout set validation methods (e.g., training, validation, and test partitions) have proven to be robust procedures for assessing prediction model performance, but these procedures can be flawed in the presence of missing values. Holdout set techniques rely on maintaining partial blindness of the validation set and complete blindness of the test set. Most prediction methods require complete data, so when values are missing, a preprocessing imputation step is often needed. Without proper care, the imputation procedure can unintentionally negate the blinding of the validation and test sets, for example when the imputation model is fit to all rows before partitioning, leading to corrupted evaluations of prediction model performance. Multiple imputation, a technique often employed to correct the standard errors in the prediction model, does not address this issue; it was designed before holdout set techniques were widely used. In this poster, we expand on these ideas and present a compact, practical solution in the form of an imputation and prediction model workflow that is easy to understand and implement.
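The blinding issue described in the abstract can be illustrated with a minimal sketch. It is not the poster's workflow: it assumes simple per-column mean imputation as a stand-in for whatever imputation model is used, and the helper names `fit_mean_imputer` and `apply_imputer`, along with the toy data, are hypothetical. The point is only that the imputation parameters are learned from the training partition and then applied unchanged to held-out rows.

```python
import math
from statistics import fmean

def fit_mean_imputer(rows):
    """Learn per-column means from the given rows only, skipping NaNs.

    Hypothetical helper: mean imputation stands in for any imputation model.
    Assumes every column has at least one observed value in `rows`.
    """
    n_cols = len(rows[0])
    return [
        fmean(r[j] for r in rows if not math.isnan(r[j]))
        for j in range(n_cols)
    ]

def apply_imputer(rows, means):
    """Fill NaNs using previously learned means; never refit on these rows."""
    return [
        [means[j] if math.isnan(v) else v for j, v in enumerate(r)]
        for r in rows
    ]

nan = float("nan")
train = [[1.0, 10.0], [nan, 20.0], [3.0, nan]]   # toy training partition
valid = [[nan, 40.0], [100.0, nan]]              # toy validation partition

# Blinding preserved: imputation parameters come from the training rows only.
train_means = fit_mean_imputer(train)
train_imputed = apply_imputer(train, train_means)
valid_imputed = apply_imputer(valid, train_means)

# Blinding broken: fitting on all rows lets validation values shape the fill-ins.
leaky_means = fit_mean_imputer(train + valid)

print(train_means)   # [2.0, 15.0]
print(leaky_means)   # differs, because validation rows influenced it
```

In this toy example the validation value 100.0 pulls the leaky column mean well away from the training-only mean, so any prediction model trained on data imputed that way has indirectly seen the validation set.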