Reinforcement learning problems concerning decision-making under uncertainty in continuous state and action spaces have received considerable attention recently. The standard approach balances exploration and exploitation in linear-quadratic (LQ) models, which combine linear dynamics with quadratic cost functions. State-of-the-art results prescribe a class of randomized policies as practical methods with performance guarantees.
However, a comprehensive comparison of the existing randomized algorithms is not currently available in the literature. This work compares various randomization procedures according to several important criteria, such as learning accuracy, robustness to misspecification, and regret due to uncertainty. We analyze different parametric and non-parametric schemes, including action perturbation, posterior sampling, estimate randomization, residual bootstrap, and covariate resampling.
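As a minimal illustration of the simplest of these schemes, the sketch below runs action perturbation on a hypothetical scalar LQ system: a certainty-equivalent controller is driven with additive Gaussian exploration noise, and the dynamics parameters are periodically re-estimated by least squares. All numerical values (true parameters, noise scales, the myopic gain formula) are illustrative assumptions, not taken from the work itself.

```python
import numpy as np

# Illustrative sketch (hypothetical values): action-perturbation exploration
# on a scalar LQ system x_{t+1} = a*x_t + b*u_t + w_t with cost x^2 + u^2.
rng = np.random.default_rng(0)
a_true, b_true = 0.9, 0.5  # true parameters, unknown to the learner

def rollout(T=2000, sigma_explore=0.1):
    """Run a certainty-equivalent controller with Gaussian action
    perturbation; return least-squares estimates of (a, b)."""
    x = 0.0
    X, U, Y = [], [], []
    a_hat, b_hat = 0.0, 1.0  # crude initial guesses
    for t in range(T):
        # myopic (one-step) certainty-equivalent gain for cost x^2 + u^2;
        # a stand-in for the Riccati-based LQ gain, kept simple on purpose
        gain = a_hat * b_hat / (1.0 + b_hat**2)
        u = -gain * x + sigma_explore * rng.standard_normal()  # perturbed action
        x_next = a_true * x + b_true * u + 0.1 * rng.standard_normal()
        X.append(x); U.append(u); Y.append(x_next)
        x = x_next
        if t > 10 and t % 100 == 0:  # periodically re-estimate by least squares
            Z = np.column_stack([X, U])
            a_hat, b_hat = np.linalg.lstsq(Z, np.array(Y), rcond=None)[0]
    return a_hat, b_hat

a_hat, b_hat = rollout()
print(a_hat, b_hat)  # estimates should approach (a_true, b_true)
```

The exploration noise is what makes the input informative despite the feedback loop; without it, the least-squares problem can become ill-conditioned because the action is a deterministic function of the state. The other schemes listed above differ in where the randomness enters (the parameter estimate, the posterior, or the resampled data) rather than in the control action itself.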