All Times EDT

Abstract Details

Activity Number: 146 - Statistical Reinforcement Learning
Type: Invited
Date/Time: Tuesday, August 10, 2021 : 10:00 AM to 11:50 AM
Sponsor: Section on Statistical Computing
Abstract #316649
Title: Statistical Inference of the Value Function for Reinforcement Learning in Infinite Horizon Settings
Author(s): Rui Song*
Companies: North Carolina State University
Keywords: Statistical inference; Reinforcement Learning
Abstract:

Reinforcement learning is a general technique that allows an agent to learn an optimal policy by interacting with an environment in sequential decision-making problems. The quality of a policy is measured by its value function starting from some initial state. Because many reinforcement learning algorithms have been proposed in the computer science and statistics literature, it is crucial to examine their performance by evaluating the values of their estimated policies. This talk focuses on constructing confidence intervals (CIs) for a policy's value in infinite-horizon settings, where the number of decision points diverges to infinity. We propose to model the state-action value function (Q-function) associated with a policy using the series/sieve method and to derive a CI for the policy's value from it. When the target policy itself depends on the observed data, a sample-splitting method is proposed to recursively update the estimated policy and its value estimator. We show that the proposed CI achieves nominal coverage as long as either the number of trajectories or the number of decision points diverges to infinity, even in cases where the optimal policy is not unique.
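
As a rough illustration of the sieve idea in the abstract, the Python sketch below approximates the Q-function of a fixed policy with a polynomial basis, solves the empirical Bellman estimating equation for the coefficients, and forms a Wald-type CI for the value from a sandwich variance. It is not the author's implementation. The toy environment, the polynomial basis, the discount factor, and the names basis, features, value_ci, and simulate are all assumptions made for illustration. It handles only a policy that does not depend on the data, so the sample-splitting step is not shown.

```python
import numpy as np
from statistics import NormalDist


def basis(s, degree=3):
    """Polynomial sieve basis (1, s, ..., s^degree) for scalar states (assumed choice)."""
    s = np.atleast_1d(np.asarray(s, dtype=float))
    return np.stack([s**k for k in range(degree + 1)], axis=-1)


def features(s, a, n_actions, degree=3):
    """Put the state basis in the block of the chosen action; other blocks stay zero."""
    phi = basis(s, degree)                       # shape (n, d)
    a = np.atleast_1d(np.asarray(a))
    n, d = phi.shape
    xi = np.zeros((n, n_actions * d))
    for k in range(n_actions):
        mask = a == k
        xi[mask, k * d:(k + 1) * d] = phi[mask]
    return xi


def value_ci(s, a, r, s_next, policy, ref_states,
             gamma=0.9, n_actions=2, degree=3, level=0.95):
    """Point estimate and Wald CI for the value of `policy`, averaged over ref_states."""
    n = len(r)
    xi = features(s, a, n_actions, degree)
    xi_next = features(s_next, policy(s_next), n_actions, degree)
    # Empirical Bellman estimating equation: A beta = b
    A = xi.T @ (xi - gamma * xi_next) / n
    b = xi.T @ r / n
    beta = np.linalg.solve(A, b)
    # Temporal-difference residuals under the fitted Q-function
    eps = r + (gamma * xi_next - xi) @ beta
    # Value = U' beta, with U the mean feature vector under the policy
    U = features(ref_states, policy(ref_states), n_actions, degree).mean(axis=0)
    value = U @ beta
    # Sandwich variance U' A^{-1} Omega A^{-T} U, written as a mean of squares
    w = np.linalg.solve(A.T, U)
    sigma2 = np.mean((xi @ w) ** 2 * eps ** 2)
    se = np.sqrt(sigma2 / n)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return value, value - z * se, value + z * se


def simulate(n, rng, reward_noise=0.1):
    """Hypothetical toy environment: independent transitions with states in [0, 1]."""
    s = rng.uniform(0, 1, n)
    a = rng.integers(0, 2, n)
    s_next = np.clip(0.5 * s + 0.3 * a + rng.normal(0, 0.1, n), 0, 1)
    r = s + 0.5 * a + rng.normal(0, reward_noise, n)
    return s, a, r, s_next


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    s, a, r, s_next = simulate(5000, rng)
    target_policy = lambda x: (np.asarray(x) > 0.5).astype(int)  # assumed fixed policy
    ref_states = rng.uniform(0, 1, 1000)                         # assumed reference distribution
    print(value_ci(s, a, r, s_next, target_policy, ref_states))
```

The sketch pools transition tuples, which is why the sample size can grow through either more trajectories or more decision points per trajectory. The number of basis functions is held fixed here, whereas a sieve estimator would let it grow with the sample size.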


Authors who are presenting talks have a * after their name.

Back to the full JSM 2021 program