1932

Abstract

Planning under uncertainty is critical to robotics. The partially observable Markov decision process (POMDP) is a mathematical framework for such planning problems. POMDPs are powerful because of their careful quantification of the nondeterministic effects of actions and the partial observability of the states. But for the same reason, they are notorious for their high computational complexity and have been deemed impractical for robotics. However, over the past two decades, the development of sampling-based approximate solvers has led to tremendous advances in POMDP-solving capabilities. Although these solvers do not generate the optimal solution, they can compute good POMDP solutions that significantly improve the robustness of robotics systems within reasonable computational resources, thereby making POMDPs practical for many realistic robotics problems. This article presents a review of POMDPs, emphasizing computational issues that have hindered their practicality in robotics and ideas in sampling-based solvers that have alleviated such difficulties, together with lessons learned from applying POMDPs to physical robots.

Loading

Article metrics loading...

/content/journals/10.1146/annurev-control-042920-092451
2022-05-03
2024-06-16
Loading full text...

Full text loading...

/deliver/fulltext/control/5/1/annurev-control-042920-092451.html?itemId=/content/journals/10.1146/annurev-control-042920-092451&mimeType=html&fmt=ahah

Literature Cited

  1. 1. 
    Drake AW. 1962. Observation of a Markov process through a noisy channel PhD Thesis Mass. Inst. Technol. Cambridge, MA:
    [Google Scholar]
  2. 2. 
    Sondik EJ. 1971. The optimal control of partially observable Markov processes PhD Thesis Stanford Univ. Stanford, CA:
    [Google Scholar]
  3. 3. 
    Papadimitriou CH, Tsitsiklis JN. 1987. The complexity of Markov decision processes. Math. Oper. Res. 12:441–50
    [Google Scholar]
  4. 4. 
    Kaelbling LP, Littman ML, Cassandra AR. 1998. Planning and acting in partially observable stochastic domains. Artif. Intell. 101:99–134
    [Google Scholar]
  5. 5. 
    Monahan GE. 1982. State of the art—a survey of partially observable Markov decision processes: theory, models, and algorithms. Manag. Sci. 28:1–16
    [Google Scholar]
  6. 6. 
    Bonet B, Geffner H. 2009. Solving POMDPs: RTDP-Bel versus point-based algorithms. Proceedings of the Twenty-First International Joint Conference on Artificial Intelligence1641–46 Calif.: IJCAI
    [Google Scholar]
  7. 7. 
    Sondik EJ. 1978. The optimal control of partially observable Markov processes over the infinite horizon: discounted costs. Oper. Res. 26:282–304
    [Google Scholar]
  8. 8. 
    Mundhenk M, Goldsmith J, Lusena C, Allender E. 2000. Complexity of finite-horizon Markov decision process problems. J. ACM 47:681–720
    [Google Scholar]
  9. 9. 
    Vlassis N, Littman ML, Barber D. 2012. On the computational complexity of stochastic controller optimization in POMDPs. ACM Trans. Comput. Theory 4:12
    [Google Scholar]
  10. 10. 
    Natarajan BK. 1986. On moving and orienting objects Tech. Rep. Cornell Univ. Ithaca, NY:
    [Google Scholar]
  11. 11. 
    Canny J, Reif J. 1987. New lower bound techniques for robot motion planning problems. 28th Annual Symposium on Foundations of Computer Science49–60 Piscataway, NJ: IEEE
    [Google Scholar]
  12. 12. 
    Pineau J, Gordon G, Thrun S 2003. Point-based value iteration: an anytime algorithm for POMDPs. Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence 31025–32 Calif.: IJCAI
    [Google Scholar]
  13. 13. 
    Hsu D, Lee W, Rong N 2007. What makes some POMDP problems easy to approximate?. Advances in Neural Information Processing Systems 20 J Platt, D Koller, Y Singer, S Roweis 689–96 Red Hook, NY: Curran
    [Google Scholar]
  14. 14. 
    Hauskrecht M. 2000. Value-function approximations for partially observable Markov decision processes. J. Artif. Intell. Res. 13:33–94
    [Google Scholar]
  15. 15. 
    Smith T, Simmons R. 2004. Heuristic search value iteration for POMDPs. UAI '04: Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence520–27 Arlington, VA: AUAI Press
    [Google Scholar]
  16. 16. 
    Smith T, Simmons R. 2005. Point-based POMDP algorithms: improved analysis and implementation. UAI '05: Proceedings of the Twenty-First Conference on Uncertainty in Artificial Intelligence542–49 Arlington, VA: AUAI Press
    [Google Scholar]
  17. 17. 
    Kurniawati H, Hsu D, Lee W 2008. SARSOP: efficient point-based POMDP planning by approximating optimally reachable belief spaces. Robotics: Science and Systems III W Burgard, O Brock, C Stachniss 65–72 Cambridge, MA: MIT Press
    [Google Scholar]
  18. 18. 
    Smallwood RD, Sondik EJ. 1973. The optimal control of partially observable Markov processes over a finite horizon. Oper. Res. 21:1071–88
    [Google Scholar]
  19. 19. 
    Cheng H. 1988. Algorithms for partially observable Markov decision processes PhD Thesis Univ. B.C. Vancouver, Can:.
    [Google Scholar]
  20. 20. 
    Zhang NL, Zhang W. 2001. Speeding up the convergence of value iteration in partially observable Markov decision processes. J. Artif. Intell. Res. 14:29–51
    [Google Scholar]
  21. 21. 
    Spaan MT, Vlassis N. 2005. Perseus: randomized point-based value iteration for POMDPs. J. Artif. Intell. Res. 24:195–220
    [Google Scholar]
  22. 22. 
    Ong SC, Png SW, Hsu D, Lee WS. 2010. Planning under uncertainty for robotic tasks with mixed observability. Int. J. Robot. Res. 29:1053–68
    [Google Scholar]
  23. 23. 
    Ji S, Parr R, Li H, Liao X, Carin L. 2007. Point-based policy iteration. Proceedings of the 22nd National Conference on Artificial Intelligence1243–49 Palo Alto, CA: AAAI Press
    [Google Scholar]
  24. 24. 
    Hansen EA. 1998. Solving POMDPs by searching in policy space. UAI '98: Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence211–19 San Francisco: Morgan Kaufmann
    [Google Scholar]
  25. 25. 
    Thrun S 1999. Monte Carlo POMDPs. Advances in Neural Information Processing Systems 12 S Solla, T Leen, K Müller 1064–70 Cambridge, MA: MIT Press
    [Google Scholar]
  26. 26. 
    Porta JM, Vlassis N, Spaan MT, Poupart P. 2006. Point-based value iteration for continuous POMDPs. J. Mach. Learn. Res. 7:2329–67
    [Google Scholar]
  27. 27. 
    Bai H, Hsu D, Lee WS, Ngo VA 2010. Monte Carlo value iteration for continuous-state POMDPs. Algorithmic Foundations of Robotics IX D Hsu, V Isler, JC Latombe, MC Lin 175–91 Berlin: Springer
    [Google Scholar]
  28. 28. 
    Kurniawati H, Bandyopadhyay T, Patrikalakis NM 2012. Global motion planning under uncertain motion, sensing, and environment map. Robotics: Science and Systems VII H Durrant-Whyte, N Roy, P Abbeel 161–68 Cambridge, MA: MIT Press
    [Google Scholar]
  29. 29. 
    Bonet B. 1998. Solving large POMDPs using real time dynamic programming. Proceedings of the AAAI Fall Symposium on Planning with Partially Observable Markov Decision Processes Palo Alto, CA: AAAI Press
    [Google Scholar]
  30. 30. 
    Kim SK, Salzman O, Likhachev M. 2019. POMHDP: search-based belief space planning using multiple heuristics. Proceedings of the Twenty-Ninth International Conference on Automated Planning and Scheduling734–44 Palo Alto, CA: AAAI Press
    [Google Scholar]
  31. 31. 
    Coulom R 2006. Efficient selectivity and backup operators in Monte-Carlo tree search. Computers and Games: 5th International Conference, CG 2006 HJ van den Herik, P Ciancarini, HHLM Donkers 72–83 Berlin: Springer
    [Google Scholar]
  32. 32. 
    Silver D, Veness J 2010. Monte-Carlo planning in large POMDPs. Advances in Neural Information Processing Systems 23 J Lafferty, C Williams, J Shawe-Taylor, R Zemel, A Culotta 2164–72 Red Hook, NY: Curran
    [Google Scholar]
  33. 33. 
    Kocsis L, Szepesvári C 2006. Bandit based Monte-Carlo planning. Machine Learning: ECML 2006 J Fürnkranz, T Scheffer, M Spiliopoulou 282–93 Berlin: Springer
    [Google Scholar]
  34. 34. 
    Auer P, Cesa-Bianchi N, Fischer P. 2002. Finite-time analysis of the multiarmed bandit problem. Mach. Learn. 47:235–56
    [Google Scholar]
  35. 35. 
    Somani A, Ye N, Hsu D, Lee WS 2013. DESPOT: online POMDP planning with regularization. Advances in Neural Information Processing Systems 26 CJC Burges, L Bottou, M Welling, Z Ghahramani, KQ Weinberger 1772–80 Red Hook, NY: Curran
    [Google Scholar]
  36. 36. 
    Kurniawati H, Yadav V 2013. An online POMDP solver for uncertainty planning in dynamic environment. Robotics Research: The 16th International Symposium ISRR M Inaba, P Corke 611–29 Cham, Switz: Springer
    [Google Scholar]
  37. 37. 
    Hoerger M, Kurniawati H, Elfes A. 2019. Multilevel Monte-Carlo for solving POMDPs online. Robotics Research: The 19th International Symposium ISRRed. T Asfour, E Yoshida, J Park, H Christensen, O Khatibpp. 174–90 Cham, Switz.: Springer
    [Google Scholar]
  38. 38. 
    Theocharous G, Kaelbling L 2003. Approximate planning in POMDPs with macro-actions. Advances in Neural Information Processing Systems 16 S Thrun, L Saul, B Schölkopf 775–82 Cambridge, MA: MIT Press
    [Google Scholar]
  39. 39. 
    He R, Brunskill E, Roy N 2010. PUMA: planning under uncertainty with macro-actions. Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence1089–95 Palo Alto, CA: AAAI Press
    [Google Scholar]
  40. 40. 
    Lim ZW, Hsu D, Lee WS 2011. Monte Carlo value iteration with macro-actions. Advances in Neural Information Processing Systems 24 J Shawe-Taylor, R Zemel, P Bartlett, F Pereira, KQ Weinberger 1287–95 Red Hook, NY: Curran
    [Google Scholar]
  41. 41. 
    Kurniawati H, Du Y, Hsu D, Lee W. 2011. Motion planning under uncertainty for robotic tasks with long time horizons. Int. J. Robot. Res. 30:308–23
    [Google Scholar]
  42. 42. 
    Agha-Mohammadi AA, Chakravorty S, Amato NM. 2014. FIRM: sampling-based feedback motion-planning under motion uncertainty and imperfect measurements. Int. J. Robot. Res. 33:268–304
    [Google Scholar]
  43. 43. 
    Bertsekas DP. 2011. Dynamic Programming and Optimal Control 1 Belmont, MA: Athena Sci., 3rd ed..
    [Google Scholar]
  44. 44. 
    Kavraki LE, Svestka P, Latombe JC, Overmars MH. 1996. Probabilistic roadmaps for path planning in high-dimensional configuration spaces. IEEE Trans. Robot. Autom. 12:566–80
    [Google Scholar]
  45. 45. 
    Flaspohler G, Roy NA, Fisher JW III. 2020. Belief-dependent macro-action discovery in POMDPs using the value of information. Advances in Neural Information Processing Systems 33 H Larochelle, M Ranzato, R Hadsell, MF Balcan, H Lin 11108–18 Red Hook, NY: Curran
    [Google Scholar]
  46. 46. 
    Hoey J, Poupart P. 2005. Solving POMDPs with continuous or large discrete observation spaces. Proceedings of the 19th International Joint Conference on Artificial Intelligence1332–38 San Francisco: Morgan Kaufmann
    [Google Scholar]
  47. 47. 
    Bai H, Hsu D, Lee WS. 2014. Integrated perception and planning in the continuous space: a POMDP approach. Int. J. Robot. Res. 33:1288–302
    [Google Scholar]
  48. 48. 
    Sunberg ZN, Kochenderfer MJ. 2018. Online algorithms for POMDPs with continuous state, action, and observation spaces. Proceedings of the Twenty-Eighth International Conference on Automated Planning and Scheduling259–63 Palo Alto, CA: AAAI Press
    [Google Scholar]
  49. 49. 
    Hoerger M, Kurniawati H. 2021. An on-line POMDP solver for continuous observation spaces. 2021 IEEE International Conference on Advanced Roboticspp. 7643–49 Piscataway, NJ: IEEE
    [Google Scholar]
  50. 50. 
    Mansley C, Weinstein A, Littman M. 2011. Sample-based planning for continuous action Markov decision processes. Proceedings of the Twenty-First International Conference on Automated Planning and Scheduling335–38 Palo Alto, CA: AAAI Press
    [Google Scholar]
  51. 51. 
    Seiler K, Kurniawati H, Singh S. 2015. An online and approximate solver for POMDPs with continuous action space. 2015 IEEE International Conference on Robotics and Automation2290–97 Piscataway, NJ: IEEE
    [Google Scholar]
  52. 52. 
    Morere P, Marchant R, Ramos F. 2016. Bayesian optimisation for solving continuous state-action-observation POMDPs Paper presented at the 30th Conference on Neural Information Processing Systems Barcelona, Spain: Dec. 5–10
    [Google Scholar]
  53. 53. 
    Mern J, Yildiz A, Sunberg Z, Mukerji T, Kochenderfer MJ 2021. Bayesian optimized Monte Carlo planning. Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence11880–87 Palo Alto, CA: AAAI Press
    [Google Scholar]
  54. 54. 
    Van Den Berg J, Abbeel P, Goldberg K 2011. LQG-MP: optimized path planning for robots with motion uncertainty and imperfect state information. Int. J. Robot. Res. 30:895–913
    [Google Scholar]
  55. 55. 
    Hoerger M, Kurniawati H, Elfes A. 2020. Non-linearity measure for POMDP-based motion planning. arXiv:2005.14406 [cs.RO]
  56. 56. 
    Wang E, Kurniawati H, Kroese D. 2018. An on-line planner for POMDPs with large discrete action space: a quantile-based approach. Proceedings of the Twenty-Eighth International Conference on Automated Planning and Scheduling273–77 Palo Alto, CA: AAAI Press
    [Google Scholar]
  57. 57. 
    Wang E, Kurniawati H, Kroese D. 2019. Inventory control with partially observable states. 23rd International Congress on Modelling and Simulation200–6 Canberra, Aust.: Model. Simul. Soc. Aust. N.Z.
    [Google Scholar]
  58. 58. 
    Brunskill E, Kaelbling LP, Lozano-Pérez T, Roy N. 2008. Continuous-state POMDPs with hybrid dynamics. Proceedings of the Tenth International Symposium on Artificial Intelligence and Mathematics13–18 Palo Alto, CA: AAAI Press
    [Google Scholar]
  59. 59. 
    Van Den Berg J, Patil S, Alterovitz R 2012. Motion planning under uncertainty using iterative local optimization in belief space. Int. J. Robot. Res. 31:1263–78
    [Google Scholar]
  60. 60. 
    Giles MB. 2015. Multilevel Monte Carlo methods. Acta Numer 24:259–328
    [Google Scholar]
  61. 61. 
    Hoerger M, Song J, Kurniawati H, Elfes A. 2019. POMDP-based candy server: lessons learned from a seven day demo. Proceedings of the Twenty-Ninth International Conference on Automated Planning and Scheduling698–706 Palo Alto, CA: AAAI Press
    [Google Scholar]
  62. 62. 
    Poupart P. 2009. Symbolic Perseus. Pascal Poupart's Homepage https://cs.uwaterloo.ca/˜ppoupart/software.html#symbolic-perseus
    [Google Scholar]
  63. 63. 
    Poupart P. 2005. Exploiting structure to efficiently solve large scale partially observable Markov decision processes PhD Thesis Univ. Toronto Toronto, Can:.
    [Google Scholar]
  64. 64. 
    Hoey J, St-Aubin R, Hu A, Boutilier C. 1999. SPUDD: stochastic planning using decision diagrams. UAI '99: Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence279–88 San Francisco: Morgan Kaufmann
    [Google Scholar]
  65. 65. 
    Smith T. 2013. ZMDP software for POMDP and MDP planning. Long Horizon http://longhorizon.org/trey/zmdp
    [Google Scholar]
  66. 66. 
    Cassandra AR. 2021. POMDP file format. POMDP.org http://pomdp.org/code/pomdp-file-spec.html
    [Google Scholar]
  67. 67. 
    Natl. Univ. Singap. Dep. Comput 2020. SARSOP. GitHub https://github.com/AdaCompNUS/sarsop
    [Google Scholar]
  68. 68. 
    Natl. Univ. Singap. Dep. Comput 2014. Approximate POMDP planning software. National University of Singapore Department of Computing https://bigbird.comp.nus.edu.sg/pmwiki/farm/appl
    [Google Scholar]
  69. 69. 
    Natl. Univ. Singap. Dep. Comput 2021. Approximate POMDP Planning Online (APPL Online) toolkit. GitHub https://github.com/AdaCompNUS/despot
    [Google Scholar]
  70. 70. 
    Aust. Natl. Univ. Robust Decis.-Mak. Learn. Lab 2020. Toolkit for approximating and Adapting POMDP solutions In Realtime (TAPIR). GitHub https://github.com/RDLLab/tapir
    [Google Scholar]
  71. 71. 
    Aust. Natl. Univ. Robust Decis.-Mak. Learn. Lab 2021. On-line POMDP Planning Toolkit (OPPT). GitHub https://github.com/RDLLab/oppt
    [Google Scholar]
  72. 72. 
    H2RLab 2021. pomdp_py documentation. pomdp_py https://h2r.github.io/pomdp-py/html
    [Google Scholar]
  73. 73. 
    Littlefield Z, Klimenko D, Kurniawati H, Bekris KE 2015. The importance of a suitable distance function in belief-space planning. Robotics Research: Volume 2 A Bicchi, W Burgard 683–700 Cham, Switz: Springer
    [Google Scholar]
  74. 74. 
    Karkus P, Hsu D, Lee WS 2017. QMDP-Net: deep learning for planning under partial observability. Advances in Neural Information Processing Systems 30 I Guyon, UV Luxburg, S Bengio, H Wallach, R Fergus et al.4694–704 Red Hook, NY: Curran
    [Google Scholar]
  75. 75. 
    Friedland B. 2012. Control System Design: An Introduction to State-Space Methods Mineola, NY: Dover
    [Google Scholar]
  76. 76. 
    Ng AY, Jordan M. 2000. Pegasus: a policy search method for large MDPs and POMDPs. UAI '00: Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence406–15 San Francisco: Morgan Kaufmann
    [Google Scholar]
  77. 77. 
    Shani G, Pineau J, Kaplow R. 2013. A survey of point-based POMDP solvers. Auton. Agents Multi-Agent Syst. 27:1–51
    [Google Scholar]
  78. 78. 
    Ross S, Pineau J, Paquet S, Chaib-Draa B. 2008. Online planning algorithms for POMDPs. J. Artif. Intell. Res. 32:663–704
    [Google Scholar]
  79. 79. 
    Lozano-Pérez T, Wesley MA. 1979. An algorithm for planning collision-free paths among polyhedral obstacles. Commun. ACM 22:10560–70
    [Google Scholar]
  80. 80. 
    Larsen E, Gottschalk S, Lin MC, Manocha D 1999. Fast proximity queries with swept sphere volumes Tech. Rep. TR99-018 Dep. Comput. Sci., Univ. N.C., Chapel Hill NC:
    [Google Scholar]
  81. 81. 
    Gottschalk S, Lin MC, Manocha D 1996. OBBTree: a hierarchical structure for rapid interference detection. SIGGRAPH '96: Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques171–80 New York: ACM
    [Google Scholar]
  82. 82. 
    Barraquand J, Latombe JC. 1990. A Monte-Carlo algorithm for path planning with many degrees of freedom. Proceedings of the IEEE International Conference on Robotics and Automation 31712–17 Piscataway, NJ: IEEE
    [Google Scholar]
  83. 83. 
    Bessiere P, Ahuactzin JM, Talbi EG, Mazer E. 1993. The “Ariadne's clew” algorithm: global planning with local methods. Proceedings of the 1993 IEEE/RSJ International Conference on Intelligent Robots and Systems 21373–80 Piscataway, NJ: IEEE
    [Google Scholar]
  84. 84. 
    Hsu D, Latombe JC, Motwani R. 1997. Path planning in expansive configuration spaces. Proceedings of the IEEE International Conference on Robotics and Automation 32719–26 Piscataway, NJ: IEEE
    [Google Scholar]
  85. 85. 
    Ghavamzadeh M, Mannor S, Pineau J, Tamar A. 2015. Bayesian reinforcement learning: a survey. Found. Trends Mach. Learn. 8:359–483
    [Google Scholar]
  86. 86. 
    Hausknecht M, Stone P. 2015. Deep recurrent Q-learning for partially observable MDPs. Sequential Decision Making for Intelligent Agents: Papers from the AAAI 2015 Fall Symposium29–37 Palo Alto, CA: AAAI Press
    [Google Scholar]
  87. 87. 
    Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J et al. 2015. Human-level control through deep reinforcement learning. Nature 518:529–33
    [Google Scholar]
  88. 88. 
    Mirowski P, Pascanu R, Viola F, Soyer H, Ballard AJ et al. 2017. Learning to navigate in complex environments. Proceedings of the 5th International Conference on Learning Representations https://openreview.net/pdf?id=SJMGPrcle
    [Google Scholar]
  89. 89. 
    Shankar T, Dwivedy SK, Guha P. 2016. Reinforcement learning via recurrent convolutional neural networks. 2016 23rd International Conference on Pattern Recognition2592–97 Piscataway, NJ: IEEE
    [Google Scholar]
  90. 90. 
    Oh J, Singh S, Lee H 2017. Value prediction network. Advances in Neural Information Processing Systems 30 I Guyon, UV Luxburg, S Bengio, H Wallach, R Fergus et al.6118–28 Red Hook, NY: Curran
    [Google Scholar]
  91. 91. 
    Igl M, Zintgraf L, Le TA, Wood F, Whiteson S 2018. Deep variational reinforcement learning for POMDPs. Proceedings of the 35th International Conference on Machine Learning2117–26 Proc. Mach. Learn. Res 80 N.p.: PMLR
    [Google Scholar]
  92. 92. 
    Collins N, Kurniawati H 2020. Locally-connected interrelated network: a forward propagation primitive. Algorithmic Foundations of Robotics XIV SM LaValle, M Lin, T Ojala, D Shell, J Yu 124–42 Cham, Switz: Springer
    [Google Scholar]
/content/journals/10.1146/annurev-control-042920-092451
Loading
/content/journals/10.1146/annurev-control-042920-092451
Loading

Data & Media loading...

  • Article Type: Review Article
This is a required field
Please enter a valid email address
Approval was a Success
Invalid data
An Error Occurred
Approval was partially successful, following selected items could not be processed due to error