All Times EDT

Abstract Details

Activity Number: 535 - Learning Individualized Treatment Rules in Complex Settings
Type: Invited
Date/Time: Thursday, August 11, 2022 : 10:30 AM to 12:20 PM
Sponsor: Health Policy Statistics Section
Abstract #320680
Title: Off-Policy Evaluation in Partially Observed Markov Decision Processes
Author(s): Stefan Wager* and Yuchen Hu
Companies: Stanford University and Stanford University
Keywords:
Abstract:

We consider off-policy evaluation of dynamic treatment rules under the assumption that the underlying system can be modeled as a partially observed Markov decision process (POMDP). We propose an estimator, partial history importance weighting, and show that it can consistently estimate the stationary mean rewards of a target policy given long enough draws from the behavior policy. Furthermore, we establish an upper bound on its error that decays polynomially in the number of observations (i.e., the number of trajectories times their length), with an exponent that depends on the overlap of the target and behavior policies, and on the mixing time of the underlying system. We also establish a polynomial minimax lower bound for off-policy evaluation under the POMDP assumption, and show that its exponent has the same qualitative dependence on overlap and mixing time as obtained in our upper bound. Together, our upper and lower bounds imply that off-policy evaluation in POMDPs is strictly harder than off-policy evaluation in (fully observed) Markov decision processes, but strictly easier than model-free off-policy evaluation.


Authors who are presenting talks have a * after their name.

Back to the full JSM 2022 program