Online Program

All Times EDT

Thursday, June 4
Machine Learning
Interactive Machine Learning
Thu, Jun 4, 1:20 PM - 2:55 PM
TBD
 

On the Global Convergence of Policy Optimization in Deep Reinforcement Learning (308249)

*Zhaoran Wang, Northwestern University 

Policy optimization (with neural networks as actor and critic) is the workhorse behind the success of deep reinforcement learning. However, its global convergence remains less well understood, even in classical settings with linear function approximators. In this talk, I will show that, when coupled with neural networks, a variant of proximal/trust-region policy optimization (PPO/TRPO) converges globally to the optimal policy. In particular, I will illustrate how the overparametrization of neural networks enables us to establish strong guarantees. (Joint work with Qi Cai, Jason Lee, Boyi Liu, and Zhuoran Yang.)