Abstract

The last half decade has seen a steep rise in the number of contributions on safe learning methods for real-world robotic deployments from both the control and reinforcement learning communities. This article provides a concise but holistic review of the recent advances made in using machine learning to achieve safe decision-making under uncertainties, with a focus on unifying the language and frameworks used in control theory and reinforcement learning research. It includes learning-based control approaches that safely improve performance by learning the uncertain dynamics, reinforcement learning approaches that encourage safety or robustness, and methods that can formally certify the safety of a learned control policy. As data- and learning-based robot control methods continue to gain traction, researchers must understand when and how to best leverage them in real-world scenarios where safety is imperative, such as when operating in close proximity to humans. We highlight some of the open challenges that will drive the field of robot learning in the coming years, and emphasize the need for realistic physics-based benchmarks to facilitate fair comparisons between control and reinforcement learning approaches.
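One of the ideas surveyed in the article is certifying a learned policy at run time with a safety filter. As a purely illustrative, minimal sketch (not the article's method): a proposed action from a learned policy is modified as little as possible so that a control barrier function condition holds. The single-integrator dynamics, the constraint x ≤ X_MAX, the gain ALPHA, and the helper names `unsafe_policy` and `cbf_filter` are all hypothetical choices made for this toy example.

```python
# Illustrative safety-filter sketch (assumptions: single-integrator dynamics
# x_{k+1} = x_k + DT * u_k, safe set {x <= X_MAX}, hypothetical gain ALPHA).
X_MAX = 1.0   # safety constraint: keep x <= X_MAX
ALPHA = 5.0   # barrier-function gain (hypothetical value)
DT = 0.01     # integration step

def unsafe_policy(x):
    """Stand-in for a learned policy; it naively pushes the state toward 1.2 > X_MAX."""
    return 2.0 * (1.2 - x)

def cbf_filter(x, u_nominal):
    """Minimally modify u_nominal so the discrete barrier condition
    h(x_{k+1}) >= (1 - ALPHA*DT) * h(x_k), with h(x) = X_MAX - x, holds.
    For these dynamics the condition reduces to an upper bound on u."""
    u_max_safe = ALPHA * (X_MAX - x)
    return min(u_nominal, u_max_safe)

x = 0.0
for _ in range(1000):
    u = cbf_filter(x, unsafe_policy(x))
    x = x + DT * u

print(f"final state x = {x:.3f} (remains below X_MAX = {X_MAX})")
```

Because the projection has a closed form here, no optimization solver is needed; in general the same idea is posed as a small quadratic program solved at every control step.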
