Abstract:
|
In Q-learning, a key step is modelling the Q-function at each stage of the algorithm. The commonly used linear working model is unlikely to be adequate, especially at stages before the last. There, the pseudo-outcome is obtained by taking the maximum of the next-stage Q-function over the available treatments, and a maximum over several functions is generally nonlinear even when each function is linear. The problem is worse when the outcome is survival time. We consider modelling the Q-functions with a nonparametric model, a partly linear model, and a linear model. When the outcome is survival time, we also consider an accelerated failure time model and use inverse probability weighting to account for censoring. Through extensive simulation studies, we compare the performance of the Q-learning algorithm under different working models and different true models for the last stage, and we make recommendations for practice.
|
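The claim that the maximum step makes earlier-stage Q-functions nonlinear can be illustrated with a minimal sketch. It assumes a hypothetical two-stage setting with one continuous covariate and a binary treatment at each stage. The data-generating model, the helper names (`fit_ols`, `stage2_design`, `stage1_linear`, `stage1_flexible`, `heldout_mse`) and the piecewise-linear spline basis are all illustrative assumptions, not the models or estimators used in this work. The spline basis only stands in for a flexible (nonparametric-style) working model, and no survival outcome or censoring is simulated.

```python
import numpy as np

rng = np.random.default_rng(0)


def fit_ols(X, y):
    """Ordinary least-squares coefficients for y ~ X (X includes an intercept column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def stage2_design(x2, a2):
    # Linear working model: Q2 = b0 + b1*x2 + a2*(p0 + p1*x2)
    return np.column_stack([np.ones_like(x2), x2, a2, a2 * x2])


def stage1_linear(x1, a1):
    # Linear working model for Q1, same form as at stage 2
    return np.column_stack([np.ones_like(x1), x1, a1, a1 * x1])


def stage1_flexible(x1, a1, knots=(-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5)):
    # Hypothetical flexible model: piecewise-linear spline basis in x1,
    # fully interacted with a1 (a stand-in, not the paper's estimator).
    hinges = np.column_stack([np.maximum(x1 - k, 0.0) for k in knots])
    base = np.column_stack([np.ones_like(x1), x1, hinges])
    return np.column_stack([base, a1[:, None] * base])


# Hypothetical two-stage data; the generative model is illustrative only.
n = 5000
x1 = rng.uniform(-2.0, 2.0, n)
a1 = rng.integers(0, 2, n).astype(float)
x2 = x1 + 0.5 * a1 + rng.normal(0.0, 0.3, n)
a2 = rng.integers(0, 2, n).astype(float)
y = 1.0 + x2 + a2 * (2.0 * x2) + rng.normal(0.0, 0.5, n)

# Stage 2: fit Q2 (correctly specified here), then form the pseudo-outcome
# V = max over a2 in {0, 1} of the fitted Q2.
beta2 = fit_ols(stage2_design(x2, a2), y)
q2_if_0 = stage2_design(x2, np.zeros(n)) @ beta2
q2_if_1 = stage2_design(x2, np.ones(n)) @ beta2
v = np.maximum(q2_if_0, q2_if_1)

# Stage 1: regress V on the stage-1 history and compare working models
# by mean squared error on a held-out half of the sample.
train = np.arange(n) < n // 2
test = ~train


def heldout_mse(X, target):
    coef = fit_ols(X[train], target[train])
    return float(np.mean((target[test] - X[test] @ coef) ** 2))


mse_linear = heldout_mse(stage1_linear(x1, a1), v)
mse_flexible = heldout_mse(stage1_flexible(x1, a1), v)
print(f"stage-1 held-out MSE, linear:   {mse_linear:.3f}")
print(f"stage-1 held-out MSE, flexible: {mse_flexible:.3f}")
```

In this toy setup the stage-2 treatment contrast changes sign at x2 = 0, so the pseudo-outcome is a hinge-shaped function of the stage-1 covariate. The linear stage-1 model therefore shows a clearly larger held-out error than the spline basis, even though the stage-2 model is correctly specified.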