
Abstract

Adaptive experiments such as multi-armed bandits adapt the treatment-allocation policy and/or the decision to stop the experiment to the data observed so far. This has the potential to improve outcomes for study participants within the experiment, to improve the chance of identifying the best treatments after the experiment, and to avoid wasting data. Because such a design is still an experiment (and not just a continually optimizing system), it remains desirable to draw statistical inferences with frequentist guarantees. The concentration inequalities and union bounds that generally underlie adaptive experimentation algorithms can yield overly conservative inferences; at the same time, the asymptotic normality we would usually appeal to in nonadaptive settings can be imperiled by adaptivity. In this article we aim to explain why, how, and when adaptivity is in fact an issue for inference and, when it is, to understand the various ways to fix it: reweighting to stabilize variances and recover asymptotic normality, using always-valid inference based on joint normality of an asymptotic limiting sequence, and characterizing and inverting the nonnormal distributions induced by adaptivity.
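To make the normality concern concrete, the following is a minimal simulation sketch, not the article's method. It runs a two-armed epsilon-greedy experiment in which both arms have the same mean, a setting where the allocation proportions need not settle down and adaptivity can threaten asymptotic normality. It then compares a naive studentized sample mean for arm 0 with an adaptively weighted augmented inverse-propensity-weighted (AIPW) estimate whose weights h_t = sqrt(e_t) are meant to stabilize the per-step variance. The following are all illustrative assumptions: unit-variance Gaussian rewards, the epsilon-greedy policy, the particular weighting scheme, and the hypothetical function names `epsilon_greedy_run`, `naive_z`, `stabilized_aipw`, and `coverage`.

```python
import numpy as np


def epsilon_greedy_run(T, arm_means, eps, rng):
    """Simulate one two-armed epsilon-greedy experiment (assumed N(mean, 1) rewards).

    Returns chosen arms, observed rewards, and the known propensity e_t(0)
    of pulling arm 0 at each step t.
    """
    arms = np.empty(T, dtype=int)
    rewards = np.empty(T)
    probs0 = np.empty(T)
    sums, counts = np.zeros(2), np.zeros(2)
    for t in range(T):
        if counts.min() == 0:
            p0 = 0.5  # explore uniformly until both arms have data
        else:
            greedy = int(np.argmax(sums / counts))  # uses only data before t
            p0 = 1 - eps / 2 if greedy == 0 else eps / 2
        a = 0 if rng.random() < p0 else 1
        y = rng.normal(arm_means[a], 1.0)
        arms[t], rewards[t], probs0[t] = a, y, p0
        sums[a] += y
        counts[a] += 1
    return arms, rewards, probs0


def naive_z(arms, rewards, true_mean):
    """Studentized sample mean of arm 0, as one would compute it in a nonadaptive trial."""
    y0 = rewards[arms == 0]
    return (y0.mean() - true_mean) / (y0.std(ddof=1) / np.sqrt(len(y0)))


def stabilized_aipw(arms, rewards, probs0):
    """Adaptively weighted AIPW estimate of arm 0's mean with weights h_t = sqrt(e_t(0))."""
    T = len(arms)
    gammas = np.empty(T)
    s, n = 0.0, 0
    for t in range(T):
        mu_hat = s / n if n > 0 else 0.0  # predictable plug-in: only data before t
        ind = 1.0 if arms[t] == 0 else 0.0
        gammas[t] = mu_hat + ind / probs0[t] * (rewards[t] - mu_hat)  # AIPW score
        if arms[t] == 0:
            s += rewards[t]
            n += 1
    h = np.sqrt(probs0)  # variance-stabilizing weights (illustrative choice)
    est = np.sum(h * gammas) / np.sum(h)
    se = np.sqrt(np.sum(h**2 * (gammas - est) ** 2)) / np.sum(h)
    return est, se


def coverage(T=500, reps=300, eps=0.1, seed=0):
    """Empirical coverage of nominal 95% intervals for arm 0 under tied arms."""
    rng = np.random.default_rng(seed)
    means = (0.0, 0.0)  # tied arms
    z_naive, z_stab = [], []
    for _ in range(reps):
        arms, y, p0 = epsilon_greedy_run(T, means, eps, rng)
        z_naive.append(naive_z(arms, y, means[0]))
        est, se = stabilized_aipw(arms, y, p0)
        z_stab.append((est - means[0]) / se)
    return (float(np.mean(np.abs(z_naive) <= 1.96)),
            float(np.mean(np.abs(z_stab) <= 1.96)))


if __name__ == "__main__":
    print(coverage())
```

Running the script prints the empirical coverage of each approach's nominal 95% interval. The numbers depend on `T`, `eps`, the arm means, and the seed, so the sketch is a way to probe when adaptivity matters rather than a reported result. Varying the means away from a tie, or plotting histograms of the two z-statistics, is a natural next step.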

DOI: 10.1146/annurev-statistics-040522-015431
2025-03-07
2025-06-24
