Abstract Details

Activity Number: 75 - Probability and Statistics
Type: Contributed
Date/Time: Sunday, July 28, 2019 : 4:00 PM to 5:50 PM
Sponsor: IMS
Abstract #303054
Title: Statistical Inference for Online Decision-Making: In a Contextual Bandit Setting
Author(s): Haoyu Chen* and Wenbin Lu and Rui Song
Companies: North Carolina State University and North Carolina State University and North Carolina State University
Keywords: epsilon-greedy; asymptotic distribution; inverse propensity weighted estimator; model misspecification

Online decision-making problems require us to make a sequence of decisions based on incrementally arriving information. Common solutions learn a reward model for the available actions given the contextual information and then choose actions to maximize the long-term reward. It is therefore important to know whether the posited model is reasonable and how it performs asymptotically. We study this problem in the contextual bandit setting with a linear reward model. The epsilon-greedy policy is adopted to address the classic exploration-exploitation dilemma. Using the martingale central limit theorem, we show that the online ordinary least squares estimator of the model parameters is asymptotically normal. When the linear model is misspecified, we propose an online weighted least squares estimator based on inverse propensity score weighting and establish its asymptotic normality. Based on the properties of the parameter estimators, we further show that the in-sample inverse propensity weighted value estimator is asymptotically normal. We illustrate our results using simulations and an application to a news article recommendation dataset from Yahoo!.
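
The setup described above can be illustrated with a minimal simulation sketch. It is not the authors' code: the two-arm design, the true coefficients `beta_true`, the Gaussian contexts and noise, the tiny ridge term used to keep the Gram matrix invertible, and the function name `run_epsilon_greedy` are all assumptions made for illustration. The sketch covers only the online ordinary least squares update and a simple in-sample inverse propensity weighted value estimate for the greedy policy; the weighted least squares estimator for the misspecified case is not shown, and the exact estimators in the talk may be defined differently.

```python
import numpy as np


def run_epsilon_greedy(T=5000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy contextual bandit with a linear reward model.

    Returns the per-arm OLS estimates, the (assumed) true coefficients,
    and a simple in-sample IPW estimate of the greedy policy's value.
    """
    rng = np.random.default_rng(seed)

    # Hypothetical true coefficients: one row per arm (intercept + 2 features).
    beta_true = np.array([[1.0, 0.5, -0.5],
                          [0.5, 1.0, 0.5]])
    n_arms, d = beta_true.shape

    # Per-arm sufficient statistics for OLS: X'X and X'y.
    # The 1e-6 ridge term is an assumption that keeps X'X invertible early on.
    gram = np.stack([1e-6 * np.eye(d) for _ in range(n_arms)])
    xty = np.zeros((n_arms, d))
    beta_hat = np.zeros((n_arms, d))
    ipw_terms = np.empty(T)

    for t in range(T):
        # Context: intercept plus standard normal features.
        x = np.concatenate(([1.0], rng.normal(size=d - 1)))

        # Greedy arm under the current estimates.
        greedy = int(np.argmax(beta_hat @ x))

        # Epsilon-greedy propensities: each arm gets epsilon / K,
        # and the greedy arm gets an extra 1 - epsilon.
        probs = np.full(n_arms, epsilon / n_arms)
        probs[greedy] += 1.0 - epsilon
        a = int(rng.choice(n_arms, p=probs))

        # Observe a noisy linear reward for the chosen arm.
        y = beta_true[a] @ x + rng.normal()

        # IPW term: reward counts only when the greedy arm was played,
        # reweighted by the probability of playing it.
        ipw_terms[t] = (a == greedy) * y / probs[greedy]

        # Online OLS update for the chosen arm.
        gram[a] += np.outer(x, x)
        xty[a] += y * x
        beta_hat[a] = np.linalg.solve(gram[a], xty[a])

    return beta_hat, beta_true, ipw_terms.mean()
```

Calling `run_epsilon_greedy()` returns the estimated coefficients, the assumed true coefficients, and the value estimate. Repeating the call over many seeds and plotting the centered, scaled estimates is one way to check the asymptotic normality claims empirically under these assumptions.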

Authors who are presenting talks have a * after their name.

Back to the full JSM 2019 program