Online Program

All Times EDT

Friday, October 2
Fri, Oct 2, 3:35 PM - 4:35 PM
Virtual
Concurrent Session

Generalization to Higher Order Function Smoothness in Bandit Optimization of Non-Convex Functions (309568)

*Yusha Liu, Machine Learning Department, Carnegie Mellon University 
Aarti Singh, Machine Learning Department, Carnegie Mellon University 

Keywords: continuum-armed bandit, smoothness, Hölder class

We consider the problem of optimizing a smooth non-convex function in the bandit setting, where at each round the decision-maker actively queries the function and observes noisy zeroth-order feedback. While the optimization of smooth functions has been studied before, the bandit setting focuses on cumulative regret, i.e., the feedback values must be kept under control throughout the process. In the bandit setting this problem has been studied for Lipschitz functions; we study its generalization to the more general Hölder class of smooth functions. For Lipschitz functions, an approach that runs a standard multi-armed bandit algorithm at the center of each bin is optimal. In the general case the reward within each bin is biased, so we construct a two-step algorithm that deploys misspecified (linear/polynomial) bandit algorithms as its arms. We prove that, for the Hölder class with α < 2, this algorithm achieves a cumulative regret rate equal to the simple regret rate studied previously, matching the Lipschitz continuum-armed bandit rate up to log factors. Thus, with known smoothness parameters, our proposed algorithm can exploit the higher-order smoothness of functions in a Hölder space.
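
As a rough, self-contained illustration (not the authors' algorithm), the Python sketch below implements the Lipschitz baseline the abstract refers to: discretize the domain into bins and run a standard multi-armed bandit algorithm (here UCB1) on the bin centers. The test function f, the bin count K, the horizon T, and the noise level are all illustrative assumptions, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical smooth, non-convex test function on [0, 1]
# (illustrative only; not from the paper).
def f(x):
    return np.sin(5 * x) * np.exp(-x) + 0.5 * np.sin(12 * x)

def ucb_over_bins(K=20, T=5000, noise_sd=0.1):
    """Lipschitz baseline: discretize [0, 1] into K bins and run
    UCB1 treating each bin center as one arm of a multi-armed bandit."""
    centers = (np.arange(K) + 0.5) / K   # midpoint of each bin
    counts = np.zeros(K)                 # pulls per arm
    means = np.zeros(K)                  # running mean reward per arm
    rewards = np.empty(T)
    for t in range(T):
        if t < K:
            k = t                        # initialization: pull each arm once
        else:
            bonus = np.sqrt(2.0 * np.log(t + 1) / counts)
            k = int(np.argmax(means + bonus))   # UCB1 index
        r = f(centers[k]) + noise_sd * rng.standard_normal()
        counts[k] += 1
        means[k] += (r - means[k]) / counts[k]  # incremental mean update
        rewards[t] = r
    return rewards

T = 5000
rewards = ucb_over_bins(K=20, T=T)
f_star = f(np.linspace(0.0, 1.0, 10001)).max()  # grid proxy for max over [0, 1]
print("cumulative regret:", T * f_star - rewards.sum())
```

With K on the order of T^(1/3), the bin-width bias and the per-arm estimation error balance to give the classical Õ(T^(2/3)) Lipschitz rate in one dimension. Per the abstract, the paper's two-step algorithm instead deploys a misspecified linear/polynomial bandit within each bin to correct the within-bin bias, which is what allows it to exploit Hölder smoothness with exponent α < 2.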