Constructing dynamic treatment regimes using Greedy-GQ algorithm

*Ashkan Ertefaie, Postdoc fellow 
Susan A Murphy, University of Michigan 

Keywords: Dynamic treatment regime, Greedy-GQ, Reinforcement learning

We develop a methodology for constructing optimal dynamic treatment regimes from longitudinal data collected in an observational study. The optimal regime is the one that maximizes expected utility over all enforceable regimes. We generalize a reinforcement learning algorithm called Greedy-GQ, which facilitates estimation of the optimal regime in settings with many decision points. The proposed method determines the optimal regime from each individual's covariate history in an infinite-horizon setting, where there is no a priori fixed end of follow-up. We discuss the assumptions needed to identify the optimal regime using the Greedy-GQ algorithm and derive the large-sample results necessary for conducting inference. A simulation study examines the performance of the proposed method under different scenarios.
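
For readers unfamiliar with Greedy-GQ, the following is a minimal sketch of one update of the standard algorithm with linear function approximation, as it is commonly stated in the reinforcement learning literature. It is not the generalization proposed in this talk. The function names, the feature representation, and the step sizes `alpha` and `beta` and discount factor `gamma` are illustrative assumptions, not taken from the abstract.

```python
import numpy as np

def greedy_gq_step(theta, w, phi, reward, next_phis,
                   alpha=0.05, beta=0.1, gamma=0.9):
    """One standard Greedy-GQ update with Q(s, a) = theta @ phi(s, a).

    phi       : features of the observed (state, action) pair (assumed given)
    next_phis : one row of features per action available in the next state
    alpha, beta, gamma : illustrative step sizes and discount factor
    """
    # Features of the greedy action in the next state under the current estimate.
    phi_next = next_phis[np.argmax(next_phis @ theta)]
    # Temporal-difference error against the greedy target.
    delta = reward + gamma * (theta @ phi_next) - theta @ phi
    # Main weights: TD step plus a gradient-correction term.
    theta_new = theta + alpha * (delta * phi - gamma * (w @ phi) * phi_next)
    # Auxiliary weights track the expected TD error given the features.
    w_new = w + beta * (delta - w @ phi) * phi
    return theta_new, w_new

def greedy_action(theta, phis):
    """Index of the action whose estimated value is largest (hypothetical helper)."""
    return int(np.argmax(phis @ theta))
```

In this sketch the estimated regime at a given decision point would recommend `greedy_action(theta, phis)`, where `phis` holds the features of each candidate treatment given the individual's history. The second weight vector `w` is what distinguishes Greedy-GQ from plain Q-learning with linear features: it supplies the correction term in the update of `theta`.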