Abstract:
|
In sequential experiments, the exploration-exploitation trade-off describes the choice between (i) experiments that efficiently generate information useful for improving fitted models, and (ii) experiments that maximize utility based on current models. An optimal sequential experimental design maximizes long-term cumulative utility by exploring only if and when the value of information generated by exploration is expected to outweigh any short-term loss in utility. Except in contrived special cases, it is not possible to construct an optimal sequential experimental design. Thus, it is common to use heuristics that force exploration through randomization or by adding an exploration 'bonus' to the utility function. We show that commonly used heuristics can frequently select experiments that generate less information and less utility than alternatives. Such experiments are said to be dominated in terms of utility and information. We show that, at each time point, one can remove from consideration experiments that are estimated to be dominated and then apply standard heuristics for forced exploration without destroying consistency or asymptotic normality.
|