Abstract:
|
Reinforcement learning algorithms for decision-making under uncertainty constitute an important line of research in data science. The broad objective is to design effective experiments and, from the resulting data, learn which actions maximize the reward. A classical model is the contextual bandit, where the reward depends on an unknown parameter as well as on contexts corresponding to the available actions. A popular algorithm for this setting is posterior sampling (also known as Thompson sampling); it is easy to implement and enjoys theoretical performance guarantees. At every time step, the algorithm updates a posterior belief about the unknown parameter, acts as if a sample drawn from the posterior were the true parameter, collects the resulting data, and iterates these steps sequentially. The problem is well studied under the assumption of perfect observation, but implementable algorithms with performance guarantees are not currently available for partially observed data. We introduce a novel Bayesian decision-making algorithm consisting of multi-layered posterior sampling steps. Theoretical analyses, together with numerical illustrations of the performance of the proposed algorithm, will be presented.
|
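The classical posterior sampling loop described in the abstract (sample from the posterior, act greedily on the sample, observe, update, iterate) can be made concrete in the fully observed linear-Gaussian case, where conjugacy keeps the posterior Gaussian with closed-form updates. The sketch below is only illustrative of that classical loop, not the multi-layered algorithm introduced in the abstract; the linear reward model, Gaussian prior and noise, and all dimensions and names are assumptions chosen for the example.

```python
# Illustrative sketch (not the abstract's multi-layered algorithm):
# posterior sampling for a linear contextual bandit with a Gaussian prior
# and Gaussian reward noise, so the posterior over the unknown parameter
# remains Gaussian and admits closed-form updates.
import numpy as np

rng = np.random.default_rng(0)

d, n_actions, horizon = 5, 10, 2000          # assumed dimensions for the example
theta_true = rng.normal(size=d)              # unknown parameter (hidden from the learner)
noise_std = 0.5                              # assumed reward noise level

# Gaussian posterior N(mean, precision^{-1}), initialized at the prior N(0, I).
precision = np.eye(d)
b = np.zeros(d)                              # running sum of context * reward (scaled)
mean = np.zeros(d)

for t in range(horizon):
    contexts = rng.normal(size=(n_actions, d))   # one context per available action

    # 1. Sample a parameter from the current posterior belief.
    theta_sample = rng.multivariate_normal(mean, np.linalg.inv(precision))

    # 2. Act as if the sample were the true parameter: pick the greedy action.
    a = int(np.argmax(contexts @ theta_sample))
    x = contexts[a]

    # 3. Collect the resulting (noisy) reward.
    reward = x @ theta_true + noise_std * rng.normal()

    # 4. Update the Gaussian posterior with the new observation and iterate.
    precision += np.outer(x, x) / noise_std**2
    b += x * reward / noise_std**2
    mean = np.linalg.solve(precision, b)
```

In this conjugate setting the update is exact Bayesian linear regression; the partially observed setting targeted by the abstract does not admit such a closed-form posterior, which is what motivates the multi-layered sampling steps.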