Abstract:
|
This work is concerned with offline reinforcement learning (RL), which learns from pre-collected data without further exploration. Effective offline RL must be able to accommodate distribution shift and limited data coverage. However, prior algorithms either suffer from suboptimal sample complexities or incur high burn-in cost, posing an impediment to efficient offline RL in sample-starved applications. In this work, we demonstrate that the model-based (or ``plug-in'') approach achieves minimax-optimal sample complexity with minimal burn-in cost. Our algorithms are pessimistic variants of value iteration with Bernstein-style penalties, and do not rely on sophisticated schemes such as variance reduction.
|