
Adaptive experiments such as multi-armed bandits adapt the treatment-allocation policy and/or the decision to stop the experiment to the data observed so far. This has the potential to improve outcomes for study participants within the experiment, to improve the chance of identifying the best treatments after the experiment, and to avoid wasting data. As an experiment (and not just a continually optimizing system), it is still desirable to draw statistical inferences with frequentist guarantees. The concentration inequalities and union bounds that generally underlie adaptive experimentation algorithms can yield overly conservative inferences, but at the same time, the asymptotic normality we would usually appeal to in nonadaptive settings can be imperiled by adaptivity. In this article we aim to explain why, how, and when adaptivity is in fact an issue for inference and, when it is, to understand the various ways to fix it: reweighting to stabilize variances and recover asymptotic normality, using always-valid inference based on joint normality of an asymptotic limiting sequence, and characterizing and inverting the nonnormal distributions induced by adaptivity.
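The bias-then-fix pattern described above can be seen in a small simulation. The sketch below is illustrative, not from the article: it runs an epsilon-greedy bandit on two arms with identical true means (so any systematic error in the per-arm sample means is pure adaptivity bias) and compares the naive sample mean to a plain inverse-propensity-weighted (IPW) estimate, one simple member of the reweighting family (the article's variance-stabilizing weights are more refined). The policy, horizon, and epsilon value are arbitrary choices for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_once(T=200, eps=0.1):
    # Epsilon-greedy on two arms with IDENTICAL true means (both zero), so any
    # systematic error in the per-arm estimates is bias induced by adaptivity.
    counts, sums, ipw = np.zeros(2), np.zeros(2), np.zeros(2)
    for t in range(T):
        if t < 2:                                   # pull each arm once (propensity 1)
            arm, p = t, np.eye(2)[t]
        else:
            best = int(np.argmax(sums / counts))
            p = np.full(2, eps / 2.0)               # assignment probabilities
            p[best] += 1.0 - eps
            arm = rng.integers(2) if rng.random() < eps else best
        reward = rng.normal(0.0, 1.0)               # same reward law for both arms
        sums[arm] += reward
        counts[arm] += 1
        ipw[arm] += reward / p[arm]                 # inverse-propensity-weighted sum
    return sums / counts, ipw / T

naive, reweighted = zip(*(run_once() for _ in range(5000)))
print("naive sample means: ", np.mean(naive, axis=0))       # systematically below 0
print("IPW-reweighted means:", np.mean(reweighted, axis=0))  # approximately 0
```

The naive per-arm sample means come out negatively biased because the policy abandons an arm after an unlucky early streak, freezing the low estimate in place; reweighting each reward by the inverse of its (known) assignment probability removes this bias, at the cost of higher variance, which is what motivates the stabilized weighting schemes the abstract refers to.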