
Abstract

Those designing autonomous systems that interact with humans will invariably face questions about how humans think and make decisions. Fortunately, computational cognitive science offers insight into human decision-making using tools that will be familiar to those with backgrounds in optimization and control (e.g., probability theory, statistical machine learning, and reinforcement learning). Here, we review some of this work, focusing on how cognitive science can provide forward models of human decision-making and inverse models of how humans think about others’ decision-making. We highlight relevant recent developments, including approaches that synthesize black-box and theory-driven modeling, accounts that recast heuristics and biases as forms of bounded optimality, and models that characterize human theory of mind and communication in decision-theoretic terms. In doing so, we aim to provide readers with a glimpse of the range of frameworks, methodologies, and actionable insights that lie at the intersection of cognitive science and control research.

/content/journals/10.1146/annurev-control-042920-015547
2022-05-03
2024-06-22
  • Article Type: Review Article