Abstract:
|
This paper studies an important sequential decision making problem known as the multi-armed bandit problem with covariates. Under a linear bandit framework with high-dimensional covariates, we propose a general multi-stage arm allocation algorithm that integrates both arm elimination and randomized assignment strategies. By employing a class of high-dimensional regression methods for coefficient estimation, the proposed algorithm is shown to have near optimal finite-time regret performance under a new study scope. Based on synergistically verified benefit of a margin condition, our algorithm also exhibits an adaptive performance that automatically adapts to the margin and attains the optimal regret rates, up to a logarithmic factor. Besides the desirable regret performance, the proposed algorithm simultaneously generates useful coefficient estimation output for competitive arms and is shown to achieve both estimation consistency and variable selection consistency. Promising empirical performance is demonstrated through an online service example.
|