Abstract:
We consider the problem of multi-product dynamic pricing in a contextual setting for a seller of differentiated products. Customers arrive over time, and products are described by high-dimensional feature vectors. Each customer chooses a product according to the widely used Multinomial Logit (MNL) choice model, and her utility depends on the product features as well as on the offered prices. The seller does not know the parameters of the choice model a priori but can learn them through interactions with customers. The seller's goal is to design a pricing policy that maximizes her cumulative revenue. We measure the performance of a pricing policy in terms of regret, the expected revenue loss relative to a clairvoyant policy that knows the parameters in advance and always sets the revenue-maximizing prices. We propose a pricing policy, named M3P, that achieves a $T$-period regret of $O(\log(Td)\,(\sqrt{T} + d\log T))$ under heterogeneous price sensitivity for products with features of dimension $d$. We also use tools from information theory to prove that no policy can achieve a worst-case $T$-period regret better than $\Omega(\sqrt{T})$.
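To make the MNL choice model and the regret benchmark concrete, here is a minimal sketch under an assumed parametrization that is not necessarily the one used in the paper: product $i$ gives utility $x_i^\top\theta - \beta_i p_i$ (a product-specific price sensitivity $\beta_i > 0$), and the no-purchase option has utility $0$. Under this assumption the clairvoyant prices have a known structure: every product carries the same markup over $1/\beta_i$, i.e. $p_i^* = 1/\beta_i + R^*$, where $R^*$ is the optimal expected revenue and solves $R = \sum_i \beta_i^{-1} \exp(x_i^\top\theta - 1 - \beta_i R)$. The function names (`mean_utilities`, `mnl_probs`, `optimal_prices`, `cumulative_regret`) are hypothetical, and the code only evaluates the benchmark and the regret of given prices; it is not an implementation of M3P.

```python
import math
from typing import List, Sequence, Tuple


def mean_utilities(features: Sequence[Sequence[float]], theta: Sequence[float]) -> List[float]:
    """Feature part of each product's utility, x_i . theta (assumed linear)."""
    return [sum(xj * tj for xj, tj in zip(x, theta)) for x in features]


def mnl_probs(v, beta, p) -> List[float]:
    """MNL purchase probabilities with a no-purchase option of utility 0.

    Assumed utility of product i: v[i] - beta[i] * p[i], with beta[i] > 0.
    """
    w = [math.exp(vi - bi * pi) for vi, bi, pi in zip(v, beta, p)]
    denom = 1.0 + sum(w)
    return [wi / denom for wi in w]


def expected_revenue(v, beta, p) -> float:
    """Seller's expected revenue from one customer at price vector p."""
    return sum(pi * qi for pi, qi in zip(p, mnl_probs(v, beta, p)))


def optimal_prices(v, beta, iters: int = 200) -> Tuple[List[float], float]:
    """Clairvoyant prices when v and beta are known.

    Under the assumed utility, revenue-maximizing prices satisfy
    p_i = 1/beta_i + R*, where R* is the optimal revenue and solves
    R = sum_i exp(v_i - 1 - beta_i * R) / beta_i. The left side increases
    in R and the right side decreases, so bisection finds the unique root.
    """
    def gap(r: float) -> float:
        return r - sum(math.exp(vi - 1.0 - bi * r) / bi for vi, bi in zip(v, beta))

    lo, hi = 0.0, 1.0  # gap(0) < 0 always; double hi until gap(hi) >= 0
    while gap(hi) < 0:
        hi *= 2.0
    for _ in range(iters):  # fixed iteration count avoids float-precision stalls
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    r_star = 0.5 * (lo + hi)
    return [1.0 / bi + r_star for bi in beta], r_star


def cumulative_regret(contexts, theta, beta, prices) -> float:
    """Sum over periods of (clairvoyant revenue - expected revenue at posted prices).

    contexts[t] holds one feature vector per product for period t;
    prices[t] is the price vector the policy posted in period t.
    """
    total = 0.0
    for features, p in zip(contexts, prices):
        v = mean_utilities(features, theta)
        _, r_star = optimal_prices(v, beta)
        total += r_star - expected_revenue(v, beta, p)
    return total
```

For example, posting the clairvoyant prices `optimal_prices(mean_utilities(c, theta), beta)[0]` in every period `c` gives a cumulative regret of (numerically) zero, while any fixed price vector that ignores the context accumulates positive regret. A learning policy replaces the unknown `theta` and `beta` with estimates, and its regret measures how much revenue that estimation error costs over $T$ periods.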