Abstract:
|
In modern scientific experiments, many data sets are collected adaptively, using methods such as multi-armed bandit (MAB) algorithms. Adaptive collection induces dependence among the observations, which makes the sample mean a biased estimator of the mean parameter. It is shown that this bias can be severe in many cases. To remedy it, we propose an approach, named rMAB, that combines a randomization step with a MAB algorithm. rMAB is guaranteed to achieve the optimal regret when the randomization is chosen properly. Furthermore, extensive numerical studies demonstrate that the randomization substantially reduces the bias.
|