Abstract:
|
We propose a model-assisted estimator of the optimal intervention strategy in a Markov decision process that (i) is consistent if the estimating equation underpinning a model-free method is unbiased; (ii) recovers the projection of the value onto the model class defined by the estimating equation if the posited dynamics model is correctly specified; and (iii) is asymptotically efficient if the dynamics model is correctly specified and the estimating equation is unbiased. The estimator relies on two key ideas. First, we use the posited dynamics model to derive efficient weights for the estimating equation in the model-free approach; if the model is misspecified, the estimating equation remains unbiased but the weights are no longer efficient. Second, the bias of the estimating equation can be identified under a working model: by orthogonalizing the estimating-equation weights at the projection, the model-free estimator recovers the projection of the model-based estimate onto the model-free class. We show that weights derived from a working model can substantially improve empirical performance when either the posited model is incorrect or the estimating equations are biased.
|