Abstract:
|
We explore the role of Conditional Generative Adversarial Networks (GANs) in imputing missing data and apply GAN imputation to a novel use case in e-commerce: a learning-to-rank problem with incomplete training data. Conventional imputation methods often make strict assumptions about the underlying distribution and the missingness mechanism of the training data. Our proposed methodology is a simple solution that guarantees compatible imputations across different missingness mechanisms, sidesteps approximating intractable distributions while improving imputation quality, and supports downstream business applications. First, we prove that GAN imputation offers theoretical guarantees beyond the naive Missing Completely At Random (MCAR) mechanism. Empirically, we show that Conditional GAN imputations on an Amazon Search ranking dataset achieve the lowest RMSE among the benchmark methods at every level of missingness. Finally, standard ranking models trained on the GAN-imputed dataset perform comparably to models trained on the ground-truth data, as measured by the standard ranking quality metrics NDCG and MRR.
|
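The abstract summarizes an evaluation that hides entries at several missingness levels, scores imputations by RMSE, and compares downstream rankers by NDCG and MRR. The Python sketch below illustrates those measurements under stated assumptions. Entries are hidden under MCAR only. RMSE is computed over the hidden entries only. Column-mean imputation stands in as a placeholder imputer; it is not the Conditional GAN method described here. NDCG uses the common 2^rel - 1 gain with a log2 position discount. The abstract does not specify the gain, cutoff, or masking details, and all function names are hypothetical.

```python
import numpy as np


def mcar_mask(shape, miss_rate, rng):
    """Return a boolean mask where True marks an observed entry.
    Under MCAR, every entry is hidden independently with probability miss_rate."""
    return rng.random(shape) >= miss_rate


def mean_impute(x, observed):
    """Placeholder imputer (NOT the GAN method): fill each hidden entry
    with the mean of the observed values in its column."""
    x_obs = np.where(observed, x, np.nan)
    col_means = np.nanmean(x_obs, axis=0)
    return np.where(observed, x, col_means)


def masked_rmse(x_true, x_imputed, observed):
    """RMSE evaluated only on the entries that were hidden from the imputer."""
    diff = (x_true - x_imputed)[~observed]
    return float(np.sqrt(np.mean(diff ** 2)))


def dcg_at_k(relevances, k):
    """Discounted cumulative gain with an assumed 2^rel - 1 gain and log2 discount."""
    rel = np.asarray(relevances, dtype=float)[:k]
    discounts = np.log2(np.arange(2, rel.size + 2))
    return float(np.sum((2.0 ** rel - 1.0) / discounts))


def ndcg_at_k(relevances_in_ranked_order, k):
    """DCG of the model's ranking divided by the DCG of the ideal ranking."""
    ideal = dcg_at_k(sorted(relevances_in_ranked_order, reverse=True), k)
    return dcg_at_k(relevances_in_ranked_order, k) / ideal if ideal > 0 else 0.0


def reciprocal_rank(relevances_in_ranked_order):
    """1 / (position of the first relevant item), or 0 if none is relevant."""
    for i, rel in enumerate(relevances_in_ranked_order):
        if rel > 0:
            return 1.0 / (i + 1)
    return 0.0


def mean_reciprocal_rank(queries):
    """MRR: average reciprocal rank over a list of ranked relevance lists."""
    return float(np.mean([reciprocal_rank(q) for q in queries]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(1000, 5))  # synthetic features, not real search data
    for rate in (0.1, 0.3, 0.5):
        observed = mcar_mask(x.shape, rate, rng)
        print(rate, masked_rmse(x, mean_impute(x, observed), observed))
```

In this sketch, any imputer with the same `(x, observed)` signature could replace `mean_impute`, and the ranking metrics would be applied to the relevance labels of results ordered by a model trained on the imputed data.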