Abstract Details

Activity Number: 85 - Machine Learning in Biomedical Data
Type: Contributed
Date/Time: Sunday, July 28, 2019 : 4:00 PM to 5:50 PM
Sponsor: ENAR
Abstract #307353
Title: Missing Data Imputation for Classification Problems
Author(s): Arkopal Choudhury* and Michael Kosorok
Companies: University of North Carolina at Chapel Hill and University of North Carolina at Chapel Hill
Keywords: k Nearest Neighbors; Missing Data; kNN Imputation; Mutual Information; Grey Theory; Classification algorithm

In classification problems, the training set often contains a substantial amount of missing data. A widely used remedy is to impute each missing value from the k nearest neighbors (kNN) of the incomplete observation. However, most previous studies do not use the class label when imputing. In addition, existing kNN imputation methods measure distance with the Minkowski distance or its variants, which do not work well with heterogeneous data. In this paper, we propose a novel iterative kNN imputation technique based on a class-weighted grey distance between the incomplete observation and all the training data. Grey distance works well for heterogeneous data with missing instances. The distance is weighted by mutual information (MI), a measure of the relevance of each discrete or continuous feature to the class label. This weighting directs the imputation toward improving classification performance. We compare the class-weighted grey-distance kNN imputation algorithm with traditional kNN imputation algorithms, as well as with MICE and missForest, on UCI datasets.
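
The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes the standard grey relational coefficient with distinguishing coefficient 0.5, a plug-in MI estimate with continuous features cut into equal-width bins, and a single imputation pass rather than the iterative scheme the abstract mentions. All function names (`mutual_information`, `binned`, `feature_gaps`, `impute_row`, `impute`) and the toy data are hypothetical.

```python
import math
from collections import Counter

RHO = 0.5  # distinguishing coefficient; 0.5 is the conventional default (assumption)


def mutual_information(xs, ys):
    """Plug-in MI (nats) between a discrete feature and the labels; None is skipped."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None]
    n = len(pairs)
    if n == 0:
        return 0.0
    pxy = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum(c / n * math.log(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def binned(column, bins=4):
    """Equal-width bins for float columns; categorical columns pass through."""
    nums = [v for v in column if isinstance(v, float)]
    if not nums:
        return column
    lo, hi = min(nums), max(nums)
    width = (hi - lo) / bins or 1.0
    return [None if v is None else min(int((v - lo) / width), bins - 1) for v in column]


def feature_gaps(target, row, ranges):
    """Per-feature differences scaled to [0, 1]; None where either value is missing."""
    gaps = []
    for a, b, r in zip(target, row, ranges):
        if a is None or b is None:
            gaps.append(None)
        elif r is None:  # categorical feature: 0/1 mismatch
            gaps.append(0.0 if a == b else 1.0)
        else:  # numeric feature: range-scaled absolute difference
            gaps.append(abs(a - b) / r if r else 0.0)
    return gaps


def impute_row(target, donors, weights, ranges, k):
    """Fill the None entries of target from its k most grey-similar donors."""
    all_gaps = [feature_gaps(target, d, ranges) for d in donors]
    observed = [g for gaps in all_gaps for g in gaps if g is not None]
    if not observed:
        return list(target)
    g_min, g_max = min(observed), max(observed)

    def grade(gaps):
        # MI-weighted mean of grey relational coefficients (higher = more similar).
        num = den = 0.0
        for g, w in zip(gaps, weights):
            if g is not None:
                coeff = (g_min + RHO * g_max) / (g + RHO * g_max) if g_max else 1.0
                num += w * coeff
                den += w
        return num / den if den else 0.0

    ranked = sorted(range(len(donors)), key=lambda i: grade(all_gaps[i]), reverse=True)
    filled = list(target)
    for j, value in enumerate(target):
        if value is None:
            vals = [donors[i][j] for i in ranked if donors[i][j] is not None][:k]
            if vals and ranges[j] is not None:
                filled[j] = sum(vals) / len(vals)  # numeric: neighbor mean
            elif vals:
                filled[j] = Counter(vals).most_common(1)[0][0]  # categorical: mode
    return filled


def impute(rows, labels, k=3):
    """Single-pass imputation; floats are numeric, strings categorical, None missing."""
    cols = [list(col) for col in zip(*rows)]
    ranges = []
    for col in cols:
        nums = [v for v in col if isinstance(v, float)]
        ranges.append(max(nums) - min(nums) if nums else None)
    weights = [mutual_information(binned(col), labels) for col in cols]
    return [
        impute_row(row, rows[:i] + rows[i + 1:], weights, ranges, k) if None in row else list(row)
        for i, row in enumerate(rows)
    ]


# Toy data (hypothetical): two numeric features and one categorical feature.
rows = [[0.1, "a", 1.0], [0.2, "a", 1.1], [0.9, "b", 3.0], [1.0, "b", 3.2], [0.15, "a", None]]
labels = [0, 0, 1, 1, 0]
completed = impute(rows, labels, k=2)
print(completed[4])
```

In this toy example the incomplete row is closest to the first two rows, so its missing third feature is filled with the mean of their values. A fuller implementation would repeat the pass, using previously imputed values when recomputing distances, until the imputations stabilize.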

Authors who are presenting talks have a * after their name.

Back to the full JSM 2019 program