1932

Abstract

Dynamic programming (DP) is a powerful tool for solving a wide class of sequential decision-making problems under uncertainty. In principle, it enables us to compute optimal decision rules that specify the best possible decision in any situation. This article reviews developments in DP and contrasts its revolutionary impact on economics, operations research, engineering, and artificial intelligence with the comparative paucity of its real-world applications to improve the decision making of individuals and firms. The fuzziness of many real-world decision problems and the difficulty in mathematically modeling them are key obstacles to a wider application of DP in real-world settings. Nevertheless, I discuss several success stories, and I conclude that DP offers substantial promise for improving decision making if we let go of the empirically untenable assumption of unbounded rationality and confront the challenging decision problems faced every day by individuals and firms.

Loading

Article metrics loading...

/content/journals/10.1146/annurev-economics-080218-025721
2019-08-02
2024-04-24
Loading full text...

Full text loading...

/deliver/fulltext/economics/11/1/annurev-economics-080218-025721.html?itemId=/content/journals/10.1146/annurev-economics-080218-025721&mimeType=html&fmt=ahah

Literature Cited

  1. Adda J, Cooper R. 2003. Dynamic Economics: Quantitative Methods and Applications Cambridge, MA: MIT Press
  2. Aguirregabiria V, Mira P. 2010. Dynamic discrete choice structural models: a survey. J. Econom. 156:38–67
    [Google Scholar]
  3. Andersen L, Broadie M. 2004. Primal-dual simulation algorithm for pricing multidimensional American options. Manag. Sci. 50:91222–34
    [Google Scholar]
  4. Arrow K, Harris T, Marshak J 1951. Optimal inventory policy. Econometrica 19:250–72
    [Google Scholar]
  5. Barron A. 1994. Approximation and estimation bounds for neural networks. Mach. Learn. 14:115–33
    [Google Scholar]
  6. Barto A, Bradtke S, Singh S 1995. Learning to act using real-time dynamic programming. Artif. Intell. 72:81–138
    [Google Scholar]
  7. Barto A, Dietterich T. 2004. Reinforcement learning and its relation to supervised learning. Learning and Approximate Dynamic Programming: Scaling Up to the Real World J Si, A Barto, W Powell, D Wunsch 46–63 New York: Wiley Intersci
    [Google Scholar]
  8. Bellman R. 1984. Eye of the Hurricane Singapore: World Sci
  9. Bertsekas D. 1982. Distributed dynamic programming. IEEE Trans. Autom. Control 27:610–16
    [Google Scholar]
  10. Bertsekas D. 2017. Dynamic Programming and Optimal Control 1 Belmont, MA: Athena Sci
  11. Bertsekas D, Tsitsiklis J. 1989. Parallel and Distributed Computation: Numerical Methods Englewood Cliffs, NJ: Prentice Hall
  12. Bertsekas D, Tsitsiklis J. 1996. Neuro-Dynamic Programming Belmont, MA: Athena Sci
  13. Bilonis I, Scheidegger S. 2017. Machine learning for high-dimensional dynamic stochastic economies Work. Pap Sch. Econ. Bus. Admin., Univ. Lausanne Switz:
  14. Blackwell D. 1965. Discounted dynamic programming. Ann. Math. Stat. 36:226–35
    [Google Scholar]
  15. Bloise G, Vailakis Y. 2018. Convex dynamic programming with (bounded) recursive utility. J. Econ. Theory 173:118–41
    [Google Scholar]
  16. Broadie M, Glasserman P. 2004. A stochastic mesh method for pricing high dimensional American options. J. Comput. Finance 7:435–72
    [Google Scholar]
  17. Brown G. 1951. Iterative solutions of games by fictitious play. Activity Analysis of Production and Allocation TC Koopmans 374–76 New York: Wiley
    [Google Scholar]
  18. Brumm J, Scheidegger S. 2017. Using adaptive sparse grids to solve high-dimensional dynamic models. Econometrica 85:1575–612
    [Google Scholar]
  19. Campbell J, Shiller R. 1988. Stock prices, earnings and expected dividends. J. Finance 43:3661–76
    [Google Scholar]
  20. Cellan-Jones R. 2014. Stephen Hawking warns artificial intelligence could end mankind. BBC News Dec. 2. https://www.bbc.com/news/technology-30290540
    [Google Scholar]
  21. Cho S, Lee G, Rust J, Yu M 2018. Optimal dynamic hotel pricing Work. Pap Dep. Econ., Georgetown Univ
  22. Cho S, Lee G, Rust J, Yu M 2019. Semi-parametric instrument-free demand estimation: relaxing optimality and equilibrium assumptions Work. Pap., Dep. Econ., Georgetown Univ Washington, DC:
  23. Cho S, Rust J. 2010. The flat rental puzzle. Rev. Econ. Stud. 77:560–94
    [Google Scholar]
  24. Chow CS, Tsitsiklis JN. 1989. The complexity of dynamic programming. J. Complex. 5:466–88
    [Google Scholar]
  25. Chow GC. 1976. Analysis and Control of Dynamic Economic Systems New York: Wiley
  26. DellaVigna S. 2018. Structural behavioral economics. Handbook of Behavioral Economics 1 D Bernheim, S DellaVigna, D Laibson 621–723 Amsterdam: Elsevier
    [Google Scholar]
  27. Dreyfus S. 2002. Richard Bellman on the birth of dynamic programming. Oper. Res. 50:48–51
    [Google Scholar]
  28. Eckstein Z, Wolpin K. 1989. The specification and estimation of dynamic discrete choice models. J. Hum. Resour. 24:562–98
    [Google Scholar]
  29. Forbes Insights 2018. What not to wear: how algorithms are taking uncertainty out of fashion. Forbes July 17. https://www.forbes.com/sites/insights-intelai/2018/07/17/what-not-to-wear-how-algorithms-are-taking-uncertainty-out-of-fashion/#3e21594186ab
    [Google Scholar]
  30. Frederick S, Loewenstein G, O'Donoghue T 2002. Time discounting and time preference: a critical review. J. Econ. Lit. 40:351–401
    [Google Scholar]
  31. Gallego G, van Ryzin G 1994. Optimal dynamic pricing of inventories with stochastic demand over finite horizons. Manag. Sci. 40:999–1020
    [Google Scholar]
  32. Gittins J. 1989. Multi-Armed Bandit Allocation Indices New York: Wiley
  33. Griffiths T, Tenenbaum J. 2009. Theory-based causal induction. Psychol. Rev. 116:661–716
    [Google Scholar]
  34. Gupta S, Rust J. 2018. A simple theory of why and when firms go public Work. Pap Dep. Econ., Georgetown Univ Washington, DC:
  35. Hall G, Rust J. 2019. Econometric methods for endogenously sampled time series: the case of commodity price speculation in the steel market.. J. Econom. In press
    [Google Scholar]
  36. Heckman J, Singer B. 2017. Abducting economics. Am. Econ. Rev. 107:298–302
    [Google Scholar]
  37. Hutchinson J, Meyer RJ. 1994. Dynamic decision making: optimal policies and actual behavior in sequential choice problems. Mark. Lett. 5:369–82
    [Google Scholar]
  38. Iskhakov F, Rust J, Schjerning B 2015. Recursive lexicographical search: finding all Markov perfect equilibria of finite state directional dynamic games. Rev. Econ. Stud. 83:658–703
    [Google Scholar]
  39. Iskhakov F, Rust J, Schjerning B 2018. The dynamics of Bertrand price competition with cost-reducing investments. Int. Econ. Rev. 59:41681–731
    [Google Scholar]
  40. Judd K. 1998. Numerical Methods in Economics Cambridge, MA: MIT Press
  41. Kahneman D. 2011. Thinking, Fast and Slow New York: Farrar, Straus and Giroux
  42. Leary M, Michaely R. 2011. Determinants of dividend smoothing: empirical evidence. Rev. Financ. Stud. 24:3197–249
    [Google Scholar]
  43. Magnac T, Thesmar D. 2002. Identifying discrete decision processes. Econometrica 70:810–16
    [Google Scholar]
  44. Mak T, Cheung P, Lam K, Luk W 2011. Adaptive routing in network-on-chips using a dynamic-programming network. IEEE Trans. Ind. Electron. 58:3701–16
    [Google Scholar]
  45. MarketsandMarkets. 2016. Revenue management market by solutions (risk management, pricing and revenue forecast management, revenue analytics, revenue leakage detection, channel revenue management) by services (professional, managed) by deployment mode—global forecast to 2020 Tech. Rep MarketsandMarkets https://www.marketsandmarkets.com/Market-Reports/revenue-management-market-264806846.html
  46. Maskin E, Tirole J. 2001. Markov perfect equilibrium I: observable actions. J. Econ. Theory 100:191–219
    [Google Scholar]
  47. Massé P. 1944. Application des probabilités en chaîne à l'hydrologie statistique et au jeu des réservoirs. Soc. Stat. Paris 86:204–19
    [Google Scholar]
  48. McAfee P, te Velde V 2008. Dynamic pricing with constant demand elasticity. Prod. Oper. Manag. 17:432–38
    [Google Scholar]
  49. Miller J, Palmer R, Rust J 1993. Behavior of trading automata in a computerized double auction market. The Double Auction Market: Theory, Institutions, and Laboratory Evidence D Friedman, J Rust 155–98 Redwood City, CA: Addison Wesley
    [Google Scholar]
  50. Misra S, Nair H. 2011. A structural model of sales-force compensation dynamics: estimation and field implementation. Quant. Mark. Econ. 9:211–57
    [Google Scholar]
  51. Mnih V, Kavukcuoglu K, Silver D, Rusu A, Veness J et al. 2015. Human-level control through deep reinforcement learning. Nature 518:529–33
    [Google Scholar]
  52. Myerson R. 1981. Optimal auction design. Math. Oper. Res. 6:58–73
    [Google Scholar]
  53. Nutt PC. 2002. Why Decisions Fail San Francisco: Berret-Koehler Publ
  54. Phillips RL. 2005. Pricing and Revenue Optimization Palo Alto, CA: Stanford Univ. Press
  55. Powell W. 2010. Approximate Dynamic Programming: Solving the Curses of Dimensionality New York: Wiley
  56. Powell W, Bouzaiene-Ayari B, Lawrence C, Cheng C, Das S, Fiorillo R 2014. Locomotive planning at Norfolk Southern: an optimizing simulator using approximate dynamic programming. Interfaces 44:567–78
    [Google Scholar]
  57. Puterman M. 2005. Markov Decision Processes: Discrete Stochastic Dynamic Programming New York: Wiley
  58. Renner P, Scheidegger S. 2018. Machine learning for dynamic incentive problems Work. Pap Dep. Econ., Univ Lancaster, UK:
  59. Robbins H, Munro S. 1951. A stochastic approximation method. Ann. Math. Stat. 22:400–25
    [Google Scholar]
  60. Rust J. 1994. Structural estimation of Markov decision processes. Handbook of Econometrics 4 R Engel, D McFadden 3081–143 Amsterdam: Elsevier
    [Google Scholar]
  61. Rust J. 1996. Numerical dynamic programming in economics. Handbook of Computational Economics H Amman, D Kendrick, J Rust 619–730 Amsterdam: Elsevier
    [Google Scholar]
  62. Rust J. 1997. Using randomization to break the curse of dimensionality. Econometrica 65:487–516
    [Google Scholar]
  63. Rust J. 2008. Dynamic programming. The New Palgrave Dictionary of Economics 1 SN Durlauf, LE Blume New York: Palgrave Macmillan https://doi.org/10.1057/978-1-349-95121-5_1932-1
    [Crossref] [Google Scholar]
  64. Rust J. 2014. The limits of inference with theory: a review of Wolpin 2013. J. Econ. Lit. 52:820–50
    [Google Scholar]
  65. Rust J. 2017. Dynamic programming, numerical. Wiley StatsRef Feb. 15. https://doi.org/10.1002/9781118445112.stat07921
    [Crossref] [Google Scholar]
  66. Silver D. 2017. Mastering chess and shogi by self-play with a general reinforcement learning algorithm. arXiv:1712.01815[cs.AI]
  67. Silver D, Schrittewieser J, Simonyan K, Antonoglu I, Huang A et al. 2017. Mastering the game of Go without human knowledge. Nature 550:354–58
    [Google Scholar]
  68. Simon H. 1978. Rational decision-making in business organizations Nobel Memorial Lecture Dec. 8. https://www.nobelprize.org/uploads/2018/06/simon-lecture.pdf
  69. Stokey N, Lucas R. 1989. Recursive Methods in Economic Dynamics Cambridge, MA: Harvard Univ. Press
  70. Sutton R. 1988. Learning to predict by the methods of temporal differences. Mach. Learn. 3:9–44
    [Google Scholar]
  71. Thaler R, Sunstein C. 2008. Nudge New Haven, CT: Yale Univ. Press
  72. Traub J, Werschulz AG. 1998. Complexity and Information Pisa, Italy: Accad. Naz. Lincei
  73. Tsitsiklis J. 1995. Asynchronous stochastic approximation and Q-learning. Mach. Learn. 16:185–202
    [Google Scholar]
  74. Wald A. 1947. Foundations of a general theory of statistical decision functions. Econometrica 15:279–313
    [Google Scholar]
  75. Watkins C. 1989. Learning from delayed rewards PhD Thesis, Cambridge Univ Cambridge, UK:
  76. Wiener N. 1948. Cybernetics: Or Control and Communication in the Animal and the Machine Paris: Hermann & Cie
  77. Wikipedia 2018. Machine learning. Wikipedia https://en.wikipedia.org/wiki/Machine_learning
    [Google Scholar]
  78. Zhang K, Wu T, Chen S, Cai L, Peng C 2017. A new energy efficient VM scheduling algorithm for cloud computing based on dynamic programming. 2017 IEEE 4th International Conference on Cyber Security and Cloud Computing (CSCloud)249–54 New York: IEEE
    [Google Scholar]
/content/journals/10.1146/annurev-economics-080218-025721
Loading
  • Article Type: Review Article
This is a required field
Please enter a valid email address
Approval was a Success
Invalid data
An Error Occurred
Approval was partially successful, following selected items could not be processed due to error