All Times EDT

Abstract Details

Activity Number: 276 - Statistical Foundations of Reinforcement Learning
Type: Topic-Contributed
Date/Time: Wednesday, August 11, 2021 : 1:30 PM to 3:20 PM
Sponsor: IMS
Abstract #317421
Title: Distributionally Robust Batch Contextual Bandits
Author(s): Zhengyuan Zhou*
Companies: New York University
Keywords:
Abstract:

Policy learning from historical observational data is an important problem with widespread applications, such as selecting offers, prices, or advertisements to send to customers, or choosing which medication to prescribe to a patient. However, the existing literature rests on the crucial assumption that the future environment in which the learned policy will be deployed is the same as the past environment that generated the data, an assumption that is often false or too coarse an approximation. In this paper, we lift this assumption and aim to learn a distributionally robust policy from incomplete (bandit) observational data. We propose a novel learning algorithm that learns a policy robust to adversarial perturbations and unknown covariate shifts. We first present a policy evaluation procedure for the ambiguous environment and then give a performance guarantee based on the theory of uniform convergence. We also give a heuristic algorithm that solves the distributionally robust policy learning problem efficiently.
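
The abstract does not say how the ambiguous environment is modeled, so the following is only an illustrative sketch of what robust policy evaluation from bandit data can look like, not the authors' procedure. It assumes the ambiguity set is a Kullback-Leibler ball of radius delta around the data-generating distribution, and it uses the standard dual form of the worst-case mean, sup over alpha > 0 of -alpha * log E[exp(-Y / alpha)] - alpha * delta, with the expectation estimated by self-normalized inverse propensity weights. The function names, the toy data, and the search bounds on alpha are all hypothetical.

```python
import math


def ipw_weights(policy, contexts, actions, propensities):
    """Inverse propensity weights: 1/p when the logged action matches the policy, else 0."""
    return [
        (1.0 / p) if policy(x) == a else 0.0
        for x, a, p in zip(contexts, actions, propensities)
    ]


def kl_robust_value(rewards, weights, delta, alpha_lo=1e-3, alpha_hi=1e3, iters=200):
    """Worst-case weighted mean reward over a KL ball of radius delta (assumed model)."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("no logged sample matches the policy")
    # Keep only samples with positive weight and normalize the weights.
    pairs = [(y, w / total) for y, w in zip(rewards, weights) if w > 0]

    def dual(alpha):
        # -alpha * log E[exp(-Y / alpha)] - alpha * delta, via log-sum-exp for stability.
        m = max(-y / alpha for y, _ in pairs)
        s = sum(p * math.exp(-y / alpha - m) for y, p in pairs)
        return -alpha * (m + math.log(s)) - alpha * delta

    # The dual objective is concave in alpha, so ternary search finds its maximum
    # on the assumed interval [alpha_lo, alpha_hi].
    lo, hi = alpha_lo, alpha_hi
    for _ in range(iters):
        a = lo + (hi - lo) / 3
        b = hi - (hi - lo) / 3
        if dual(a) < dual(b):
            lo = a
        else:
            hi = b
    return dual((lo + hi) / 2)


# Toy logged data (hypothetical): binary context, binary action, uniform logging policy.
contexts = [0, 1, 0, 1, 0, 1]
actions = [0, 1, 1, 1, 0, 0]
rewards = [1.0, 0.2, 0.1, 0.8, 0.6, 0.3]
propensities = [0.5] * 6

weights = ipw_weights(lambda x: x, contexts, actions, propensities)
nominal = kl_robust_value(rewards, weights, delta=0.0)  # close to the plain IPW estimate
robust = kl_robust_value(rewards, weights, delta=0.1)   # more pessimistic value
```

Under these assumptions, a radius of zero approximately recovers the ordinary self-normalized IPW estimate, and increasing delta lowers the estimated value toward the smallest observed reward among matching samples. A robust learner would then search over policies for the one with the highest such worst-case value.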


Authors who are presenting talks have a * after their name.

Back to the full JSM 2021 program