
Abstract Details

Activity Number: 213 - Sequential Decision Making and Causal Inference
Type: Invited
Date/Time: Monday, July 29, 2019, 2:00 PM to 3:50 PM
Sponsor: IMS
Abstract #300158
Title: Mostly Exploration-Free Algorithms for Contextual Bandits
Author(s): Mohsen Bayati*
Companies: Stanford University
Keywords: Contextual Bandits; Greedy algorithms

The contextual bandit literature has traditionally focused on algorithms that address the exploration-exploitation tradeoff. However, exploration-free greedy algorithms are desirable in practical settings where exploration may be costly or unethical (e.g., clinical trials). Surprisingly, we find that a simple greedy algorithm can be rate-optimal if there is sufficient randomness in the observed contexts. Furthermore, even absent this condition, we show that a greedy algorithm can be rate-optimal with positive probability. Thus, standard bandit algorithms may explore unnecessarily. Motivated by these results, we introduce Greedy-First, a new algorithm that uses only observed contexts and rewards to determine whether to follow a greedy algorithm or to explore. We prove that this algorithm is rate-optimal without any additional assumptions on the context distribution or the number of arms. Extensive simulations demonstrate that Greedy-First successfully reduces exploration and outperforms existing (exploration-based) contextual bandit algorithms such as Thompson sampling and upper confidence bound (UCB) algorithms.
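To make the exploration-free idea concrete, the following is a minimal sketch of a purely greedy linear contextual bandit: each arm's reward is estimated by per-arm regularized least squares, and the arm with the highest estimated reward is always pulled, with no exploration bonus. This is an illustration under assumed names and a linear-Gaussian reward model, not the paper's exact Greedy-First algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def greedy_linear_bandit(thetas, n_rounds=2000, d=3, noise=0.1):
    """Exploration-free greedy contextual bandit (illustrative sketch).

    `thetas` are the unknown true arm parameters, used here only to
    simulate rewards; the policy itself sees only contexts and rewards.
    Returns the cumulative regret against the best arm per context.
    """
    k = len(thetas)
    # Per-arm sufficient statistics for ridge-regularized least squares
    A = [np.eye(d) for _ in range(k)]    # X^T X + I
    b = [np.zeros(d) for _ in range(k)]  # X^T y
    regret = 0.0
    for _ in range(n_rounds):
        x = rng.normal(size=d)  # stochastic context (the randomness
                                # that can make greedy rate-optimal)
        # Greedy choice: highest estimated reward, no exploration term
        est = [np.linalg.solve(A[i], b[i]) @ x for i in range(k)]
        arm = int(np.argmax(est))
        reward = thetas[arm] @ x + noise * rng.normal()
        A[arm] += np.outer(x, x)
        b[arm] += reward * x
        regret += max(th @ x for th in thetas) - thetas[arm] @ x
    return regret

true_thetas = [rng.normal(size=3) for _ in range(2)]
print(greedy_linear_bandit(true_thetas))
```

A UCB-style variant would add an uncertainty bonus to each `est[i]` before the argmax; the point of the abstract is that, when contexts are sufficiently diverse, dropping that bonus can still yield rate-optimal regret.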

Authors who are presenting talks have a * after their name.
