Abstract

In recent years, reinforcement learning (RL) has been applied to a growing range of financial problems, where it has shown great potential for decision-making tasks. In this review, we present a comprehensive study of the applications of RL in finance and conduct a series of meta-analyses to investigate common themes in the literature, such as the factors that most significantly affect RL's performance relative to traditional methods. We then identify challenges, including explainability, Markov decision process modeling, and robustness, that hinder the broader adoption of RL in the financial industry, and we discuss recent advances toward overcoming them. Finally, we propose future research directions, such as benchmarking, contextual RL, multi-agent RL, and model-based RL, to address these challenges and to further advance the implementation of RL in finance.
