Abstract

Reinforcement learning (RL), particularly its combination with deep neural networks, referred to as deep RL (DRL), has shown tremendous promise across a wide range of applications, suggesting its potential for enabling the development of sophisticated robotic behaviors. Robotics problems, however, pose fundamental difficulties for the application of RL, stemming from the complexity and cost of interacting with the physical world. This article provides a modern survey of DRL for robotics, with a particular focus on evaluating the real-world successes achieved with DRL in realizing several key robotic competencies. Our analysis aims to identify the key factors underlying those exciting successes, reveal underexplored areas, and provide an overall characterization of the status of DRL in robotics. We highlight several important avenues for future work, emphasizing the need for stable and sample-efficient real-world RL paradigms; holistic approaches for discovering and integrating various competencies to tackle complex long-horizon, open-world tasks; and principled development and evaluation procedures. This survey is designed to offer insights for both RL practitioners and roboticists toward harnessing RL's power to create generally capable real-world robotic systems.
