Online Program

Saturday, February 22
Sat, Feb 22, 8:00 AM - 9:15 AM
Regency EF
Poster Session 3 and Continental Breakfast

The Impact and Limitations of Stratified Imputation (304041)

*Dianna J Spence, University of North Georgia 
*Gregg A Velatini, University of North Georgia 

Keywords: imputation, stratified imputation, missing data

When building a supervised learning model, one option for handling missing values is stratified imputation. The response variable is divided into strata, and for each record in which a given predictor is missing, the median or mode of that predictor within the record's stratum is used as the imputed value. Many implementation decisions affect model quality, including the number of strata used, the proportion of missing values that may be imputed, and the number of predictor variables to which this strategy may be applied simultaneously. The impact of these decisions is explored using many versions of the same master data set, each with a designated proportion of its data values removed. For each version of the data set, regression models are generated under many configurations of stratified imputation, varying the implementation choices listed above. Measures of model quality are compared on both training and validation data. The aims of this research are to identify potential impacts of stratified imputation on the generation of supervised models and to establish constraints on how this type of imputation can be carried out without sacrificing model integrity.
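The procedure described in the abstract can be illustrated with a minimal sketch. It assumes a numeric response split into equal-frequency (rank-based) strata, missing values encoded as None, and values removed completely at random when a degraded copy of the data is built. The function names (stratum_labels, stratified_impute, remove_fraction) and the binning rule are hypothetical illustrative choices, not the authors' implementation. It requires Python 3.8+ for the behavior of statistics.mode.

```python
import random
import statistics
from typing import Dict, List, Optional, Sequence


def stratum_labels(response: Sequence[float], n_strata: int) -> List[int]:
    """Assign each record to one of n_strata equal-frequency strata by the rank of its response.

    Assumption: strata are rank-based bins of a numeric response; ties are broken by record order.
    """
    order = sorted(range(len(response)), key=lambda i: response[i])
    labels = [0] * len(response)
    for rank, i in enumerate(order):
        labels[i] = rank * n_strata // len(response)
    return labels


def stratified_impute(values: Sequence[Optional[object]],
                      response: Sequence[float],
                      n_strata: int,
                      numeric: bool = True) -> List[Optional[object]]:
    """Fill missing predictor values (None) with the median (numeric=True) or mode
    (numeric=False) of the observed values in the same response stratum.

    A stratum with no observed values leaves its missing entries as None.
    """
    labels = stratum_labels(response, n_strata)
    observed: Dict[int, list] = {}
    for value, stratum in zip(values, labels):
        if value is not None:
            observed.setdefault(stratum, []).append(value)
    summarize = statistics.median if numeric else statistics.mode
    fill = {stratum: summarize(vals) for stratum, vals in observed.items()}
    return [fill.get(stratum) if value is None else value
            for value, stratum in zip(values, labels)]


def remove_fraction(values: Sequence[object], proportion: float,
                    seed: int = 0) -> List[Optional[object]]:
    """Return a copy of values with round(proportion * n) entries set to None,
    chosen uniformly at random (assumption: missing completely at random)."""
    rng = random.Random(seed)
    n_remove = round(proportion * len(values))
    drop = set(rng.sample(range(len(values)), n_remove))
    return [None if i in drop else v for i, v in enumerate(values)]
```

For example, with a response of [1, 2, 3, 4, 5, 6] and two strata, the first three records form one stratum and the last three form the other. A predictor [10, None, 14, 30, None, 34] is then filled as [10, 12.0, 14, 30, 32.0, 34]. The number of strata, the removed proportion, and the set of predictors passed through stratified_impute correspond to the implementation choices the abstract says are varied.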