
Abstract

A dynamic treatment regime consists of a sequence of decision rules, one per stage of intervention, that dictate how to individualize treatments to patients, based on evolving treatment and covariate history. These regimes are particularly useful for managing chronic disorders and fit well into the larger paradigm of personalized medicine. They provide one way to operationalize a clinical decision support system. Statistics plays a key role in the construction of evidence-based dynamic treatment regimes—informing the best study design as well as efficient estimation and valid inference. Owing to the many novel methodological challenges this area offers, it has been growing in popularity among statisticians in recent years. In this article, we review the key developments in this exciting field of research. In particular, we discuss the sequential multiple assignment randomized trial designs, estimation techniques like Q-learning and marginal structural models, and several inference techniques designed to address the associated nonstandard asymptotics. We reference software whenever available. We also outline some important future directions.
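To make the idea of a regime as a sequence of decision rules concrete, here is a minimal sketch of two-stage Q-learning with linear working models. It is an illustration only and is not taken from the article: the simulated data, the coefficient values, the choice of features, and the names `ols`, `rule_stage1`, and `rule_stage2` are all assumptions. Treatments are coded as -1/+1 and randomized at both stages, as they would be in a sequential multiple assignment randomized trial.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Simulated two-stage trial data (every number here is an illustrative assumption).
o1 = rng.normal(size=n)                  # baseline covariate
a1 = rng.choice([-1.0, 1.0], size=n)     # stage-1 treatment, randomized 1:1
o2 = 0.5 * o1 + rng.normal(size=n)       # covariate measured before stage 2
a2 = rng.choice([-1.0, 1.0], size=n)     # stage-2 treatment, randomized 1:1
y = 0.8 * a1 * o1 + 1.0 * a2 * o2 + rng.normal(size=n)  # outcome, larger is better


def ols(X, response):
    """Ordinary least squares coefficients."""
    return np.linalg.lstsq(X, response, rcond=None)[0]


# Stage 2: regress the outcome on history features and treatment interactions.
h2 = np.column_stack([np.ones(n), o1, a1, a1 * o1, o2])  # main-effect features
s2 = np.column_stack([np.ones(n), o2])                   # features interacting with a2
theta2 = ols(np.column_stack([h2, a2[:, None] * s2]), y)
beta2, psi2 = theta2[:5], theta2[5:]

# Pseudo-outcome: predicted outcome under the best stage-2 treatment.
# With a2 in {-1, +1}, maximizing a2 * (s2 @ psi2) gives |s2 @ psi2|.
y_tilde = h2 @ beta2 + np.abs(s2 @ psi2)

# Stage 1: regress the pseudo-outcome on baseline features and interactions.
h1 = np.column_stack([np.ones(n), o1])
s1 = np.column_stack([np.ones(n), o1])
theta1 = ols(np.column_stack([h1, a1[:, None] * s1]), y_tilde)
psi1 = theta1[2:]


def rule_stage1(o1_value):
    """Estimated stage-1 decision rule: treatment with the larger fitted Q-value."""
    return 1.0 if psi1[0] + psi1[1] * o1_value > 0 else -1.0


def rule_stage2(o2_value):
    """Estimated stage-2 decision rule: treatment with the larger fitted Q-value."""
    return 1.0 if psi2[0] + psi2[1] * o2_value > 0 else -1.0
```

The pair `(rule_stage1, rule_stage2)` is the estimated regime: each rule maps the information available at its stage to a treatment. Because the stage-1 pseudo-outcome involves an absolute value of estimated parameters, estimators of this kind are not smooth functions of the data, which is one source of the nonstandard asymptotics mentioned in the abstract.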

/content/journals/10.1146/annurev-statistics-022513-115553
2014-01-03
2024-04-25

  • Article Type: Review Article