Name: 2021 Joint Statistical Meetings
Start: 2021-08-08T07:00:00+00:00
End: 2021-08-12

Online Program Home
My Program

All Times EDT

Activity Number:	276 - Statistical Foundations of Reinforcement Learning
Type:	Topic-Contributed
Date/Time:	Wednesday, August 11, 2021 : 1:30 PM to 3:20 PM
Sponsor:	IMS
Abstract #317097
Title:	Is Q-Learning Minimax Optimal?
Author(s):	Yuejie Chi*
Companies:	Carnegie Mellon University
Keywords:
Abstract:	Q-learning, which seeks to learn the optimal Q-function of a Markov decision process (MDP) in a model-free fashion, lies at the heart of reinforcement learning. When it comes to the synchronous setting (such that independent samples for all state-action pairs are drawn from a generative model in each iteration), substantial progress has been made recently towards understanding the sample efficiency of Q-learning. However, state-of-the-art theory for Q-learning fails to match with the existing minimax lower bound. This gives rise to natural questions: what is the sharp sample complexity of Q-learning? Is Q-learning provably sub-optimal? In this work, we settle these questions by (1) proving a tighter sample complexity bound of Q-learning than prior art, and (2) developing a matching lower bound to confirm the sharpness of our result. Our findings unveil both the effectiveness and limitation of Q-learning: its sample complexity matches that of speedy Q-learning without requiring extra computation and storage, albeit still being considerably higher than the minimax lower bound for problems with long effective horizon.

Authors who are presenting talks have a * after their name.

JSM 2021 Online Program