Abstract

This article provides an exposition of the field of adaptive control and its intersections with reinforcement learning. Adaptive control and reinforcement learning are two different methods that are both commonly employed for the control of uncertain systems. Historically, adaptive control has excelled at real-time control of systems with specific model structures through adaptive rules that learn the underlying parameters while providing strict guarantees on stability, asymptotic performance, and learning. Reinforcement learning methods are applicable to a broad class of systems and are able to produce near-optimal policies for highly complex control tasks. This is often enabled by significant offline training via simulation or the collection of large input-state datasets. This article attempts to compare adaptive control and reinforcement learning using a common framework. The problem statement in each field and highlights of their results are outlined. Two specific examples of dynamic systems are used to illustrate the details of the two methods, their advantages, and their deficiencies. The need for real-time control methods that leverage tools from both approaches is motivated through the lens of this common framework.

/content/journals/10.1146/annurev-control-062922-090153
2023-05-03
2024-12-04
  • Article Type: Review Article