Learning, Reward, and Decision Making

John P. O'Doherty; Jeffrey Cockburn; Wolfgang M. Pauli

doi:10.1146/annurev-psych-010416-044216

Affiliations: Division of Humanities and Social Sciences and Computation and Neural Systems Program, California Institute of Technology, Pasadena, California 91125
Annual Review of Psychology, Vol. 68:73–100 (volume publication date January 2017)
First published as a Review in Advance on September 28, 2016
© Annual Reviews

Abstract

In this review, we summarize findings supporting the existence of multiple behavioral strategies for controlling reward-related behavior, including a dichotomy between the goal-directed or model-based system and the habitual or model-free system in the domain of instrumental conditioning and a similar dichotomy in the realm of Pavlovian conditioning. We evaluate evidence from neuroscience supporting the existence of at least partly distinct neuronal substrates contributing to the key computations necessary for the function of these different control systems. We consider the nature of the interactions between these systems and show how these interactions can lead to either adaptive or maladaptive behavioral outcomes. We then review evidence that an additional system guides inference concerning the hidden states of other agents, such as their beliefs, preferences, and intentions, in a social context. We also describe emerging evidence for an arbitration mechanism between model-based and model-free reinforcement learning, placing such a mechanism within the broader context of the hierarchical control of behavior.

Keyword(s): cognitive map, instrumental, model based, model free, outcome valuation, Pavlovian
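The abstract's central contrast between goal-directed (model-based) and habitual (model-free) control is classically probed with outcome devaluation: a goal-directed controller should stop working for an outcome whose value has dropped, whereas a habitual controller keeps responding on the strength of cached values. The Python sketch below is a minimal illustration under assumptions of our own, not the authors' models: a hypothetical two-action task (`press_lever`, `pull_chain`), a simple delta-rule update for the model-free learner, and a count-based action-outcome model for the model-based learner. The `arbitrate` function is likewise a hypothetical reliability-weighted mixture meant only to convey the general idea of arbitration between the two systems.

```python
import random

# Hypothetical toy task (an assumption for illustration only):
# pressing the lever delivers food, pulling the chain delivers nothing.
ACTIONS = ("press_lever", "pull_chain")
OUTCOMES = ("food", "nothing")
REWARD = {"food": 1.0, "nothing": 0.0}


def environment(action):
    """Deterministic action -> outcome mapping for the toy task."""
    return "food" if action == "press_lever" else "nothing"


class ModelFree:
    """Habit-like learner: caches a single value per action."""

    def __init__(self, alpha=0.2):
        self.alpha = alpha
        self.values = {a: 0.0 for a in ACTIONS}

    def update(self, action, reward):
        delta = reward - self.values[action]  # reward prediction error
        self.values[action] += self.alpha * delta

    def q(self, action):
        return self.values[action]


class ModelBased:
    """Goal-directed learner: learns action->outcome contingencies and
    outcome values separately, then combines them at choice time."""

    def __init__(self, alpha=0.2):
        self.alpha = alpha
        self.counts = {a: {o: 0 for o in OUTCOMES} for a in ACTIONS}
        self.outcome_value = {o: 0.0 for o in OUTCOMES}

    def experience_outcome(self, outcome, reward):
        # Update the value of the outcome itself (e.g. on consumption).
        error = reward - self.outcome_value[outcome]
        self.outcome_value[outcome] += self.alpha * error

    def update(self, action, outcome, reward):
        self.counts[action][outcome] += 1
        self.experience_outcome(outcome, reward)

    def q(self, action):
        total = sum(self.counts[action].values())
        if total == 0:
            return 0.0
        # Expected outcome value under the learned transition probabilities.
        return sum(n / total * self.outcome_value[o]
                   for o, n in self.counts[action].items())


def arbitrate(q_mb, q_mf, rel_mb, rel_mf):
    """Hypothetical reliability-weighted mixture of the two value estimates."""
    w = rel_mb / (rel_mb + rel_mf)
    return w * q_mb + (1 - w) * q_mf


mf, mb = ModelFree(), ModelBased()
rng = random.Random(0)
for _ in range(200):  # training phase with random exploration
    a = rng.choice(ACTIONS)
    o = environment(a)
    mf.update(a, REWARD[o])
    mb.update(a, o, REWARD[o])

# Outcome devaluation: food is consumed to satiety outside the task, so
# only the model-based outcome value changes; no lever presses occur, and
# the model-free cache is never updated.
for _ in range(50):
    mb.experience_outcome("food", 0.0)

print(f"model-free  Q(press_lever) = {mf.q('press_lever'):.3f}")  # stays high
print(f"model-based Q(press_lever) = {mb.q('press_lever'):.3f}")  # near zero
```

After training, both learners value the lever highly; after devaluation, only the model-based value falls, mirroring the insensitivity to devaluation that operationally defines habitual behavior. In a fuller model the reliability inputs to `arbitrate` would themselves have to be estimated (for example from recent prediction errors); that choice is deliberately left open here.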

Literature Cited

Abe H, Lee D. 2011. Distributed coding of actual and hypothetical outcomes in the orbital and dorsolateral prefrontal cortex. Neuron 70:4731–41 [Google Scholar]
Allman MJ, DeLeon IG, Cataldo MF, Holland PC, Johnson AW. 2010. Learning processes affecting human decision making: an assessment of reinforcer-selective Pavlovian-to-instrumental transfer following reinforcer devaluation. J. Exp. Psychol. Anim. Behav. Process. 36:3402–8 [Google Scholar]
Andersen RA, Snyder LH, Bradley DC, Xing J. 1997. Multimodal representation of space in the posterior parietal cortex and its use in planning movements. Annu. Rev. Neurosci. 20:303–30 [Google Scholar]
Applegate CD, Frysinger RC, Kapp BS, Gallagher M. 1982. Multiple unit activity recorded from amygdala central nucleus during Pavlovian heart rate conditioning in rabbit. Brain Res 238:2457–62 [Google Scholar]
Ariely D, Gneezy U, Loewenstein G, Mazar N. 2009. Large stakes and big mistakes. Rev. Econ. Stud. 76:2451–69 [Google Scholar]
Badre D, D'Esposito M. 2007. Functional magnetic resonance imaging evidence for a hierarchical organization of the prefrontal cortex. J. Cogn. Neurosci. 19:122082–99 [Google Scholar]
Badre D, Doll BB, Long NM, Frank MJ. 2012. Rostrolateral prefrontal cortex and individual differences in uncertainty-driven exploration. Neuron 73:3595–607 [Google Scholar]
Baker PM, Ragozzino ME. 2014. Contralateral disconnection of the rat prelimbic cortex and dorsomedial striatum impairs cue-guided behavioral switching. Learn. Mem. 21:8368–79 [Google Scholar]
Balleine BW, Daw ND, O'Doherty JP. 2009. Multiple forms of value learning and the function of dopamine. Glimcher et al. 2013 367–85
Balleine BW, Dickinson A. 1991. Instrumental performance following reinforcer devaluation depends upon incentive learning. Q. J. Exp. Psychol. Sect. B 43:3279–96 [Google Scholar]
Balleine BW, Dickinson A. 1998. Goal-directed instrumental action: contingency and incentive learning and their cortical substrates. Neuropharmacology 37:4–5407–19 [Google Scholar]
Balleine BW, O'Doherty JP. 2009. Human and rodent homologies in action control: corticostriatal determinants of goal-directed and habitual action. Neuropsychopharmacology 35:148–69 [Google Scholar]
Behrens TEJ, Hunt LT, Woolrich MW, Rushworth MFS. 2008. Associative learning of social value. Nature 456:7219245–49 [Google Scholar]
Boakes RA. 1977. Performance on learning to associate a stimulus with positive reinforcement. Operant-Pavlovian Interactions H Davis, HMB Burwitz 67–97 London: Wiley
Boorman ED, O'Doherty JP, Adolphs R, Rangel A. 2013a. The behavioral and neural mechanisms underlying the tracking of expertise. Neuron 80:61558–71 [Google Scholar]
Boorman ED, Rushworth MF, Behrens TE. 2013b. Ventromedial prefrontal and anterior cingulate cortex adopt choice and default reference frames during sequential multi-alternative choice. J. Neurosci. 33:62242–53 [Google Scholar]
Botvinick MM. 2012. Hierarchical RL and decision making. Curr. Opin. Neurobiol. 22:6956–62 [Google Scholar]
Botvinick MM, Niv Y, Barto AC. 2009. Hierarchically organized behavior and its neural foundations: a RL perspective. Cognition 113:3262–80 [Google Scholar]
Burke CJ, Tobler PN, Baddeley M, Schultz W. 2010. Neural mechanisms of observational learning. PNAS 107:3214431–36 [Google Scholar]
Camerer C, Loewenstein G, Prelec D. 2005. Neuroeconomics: How neuroscience can inform economics. J. Econ. Lit. 43:9–64 [Google Scholar]
Chib VS, De Martino B, Shimojo S, O'Doherty JP. 2012. Neural mechanisms underlying paradoxical performance for monetary incentives are driven by loss aversion. Neuron 74:3582–94 [Google Scholar]
Chib VS, Rangel A, Shimojo S, O'Doherty JP. 2009. Evidence for a common representation of decision values for dissimilar goods in human VmPFC. J. Neurosci. 29:3912315–20 [Google Scholar]
Chib VS, Shimojo S, O'Doherty JP. 2014. The effects of incentive framing on performance decrements for large monetary outcomes: behavioral and neural mechanisms. J. Neurosci. 34:4514833–44 [Google Scholar]
Cohen YE, Andersen RA. 2002. A common reference frame for movement plans in the posterior parietal cortex. Nat. Rev. Neurosci. 3:7553–62 [Google Scholar]
Cone JJ, Fortin SM, McHenry JA, Stuber GD, McCutcheon JE, Roitman MF. 2016. Physiological state gates acquisition and expression of mesolimbic reward prediction signals. PNAS 113:71943–48 [Google Scholar]
Cooper JC, Dunne S, Furey T, O'Doherty JP. 2012. Human dorsal striatum encodes prediction errors during observational learning of instrumental actions. J. Cogn. Neurosci. 24:1106–18 [Google Scholar]
Corbit LH, Balleine BW. 2000. The role of the hippocampus in instrumental conditioning. J. Neurosci. 20:114233–39 [Google Scholar]
Corbit LH, Balleine BW. 2005. Double dissociation of basolateral and central amygdala lesions on the general and outcome-specific forms of Pavlovian-instrumental transfer. J. Neurosci. 25:4962–70 [Google Scholar]
Critchley HD, Mathias CJ, Dolan RJ. 2001. Neural activity in the human brain relating to uncertainty and arousal during anticipation. Neuron 29:2537–45 [Google Scholar]
Cushman F, Morris A. 2015. Habitual control of goal selection in humans. PNAS 112:4513817–22 [Google Scholar]
D'Ardenne K, McClure SM, Nystrom LE, Cohen JD. 2008. BOLD responses reflecting dopaminergic signals in the human ventral tegmental area. Science 319:58671264–67 [Google Scholar]
Davey GCL. 1992. Classical conditioning and the acquisition of human fears and phobias: a review and synthesis of the literature. Adv. Behav. Res. Ther. 14:129–66 [Google Scholar]
Daw ND, Niv Y, Dayan P. 2005. Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nat. Neurosci. 8:121704–11 [Google Scholar]
Daw ND, O'Doherty JP, Dayan P, Seymour B, Dolan RJ. 2006. Cortical substrates for exploratory decisions in humans. Nature 441:7095876–79 [Google Scholar]
Dayan P, Berridge KC. 2014. Model-based and model-free Pavlovian reward learning: revaluation, revision, and revelation. Cogn. Affect. Behav. Neurosci. 14:2473–92 [Google Scholar]
de Araujo IET, Kringelbach ML, Rolls ET, McGlone F. 2003a. Human cortical responses to water in the mouth, and the effects of thirst. J. Neurophysiol. 90:31865–76 [Google Scholar]
de Araujo IET, Rolls ET, Kringelbach ML, McGlone F, Phillips N. 2003b. Taste-olfactory convergence, and the representation of the pleasantness of flavour, in the human brain. Eur. J. Neurosci. 18:72059–68 [Google Scholar]
de Araujo IET, Rolls ET, Velazco MI, Margot C, Cayeux I. 2005. Cognitive modulation of olfactory processing. Neuron 46:4671–79 [Google Scholar]
de Wit S, Corlett PR, Aitken MR, Dickinson A, Fletcher PC. 2009. Differential engagement of the VmPFC by goal-directed and habitual behavior toward food pictures in humans. J. Neurosci. 29:3611330–38 [Google Scholar]
de Wit S, Watson P, Harsay HA, Cohen MX, Vijver I, van de Ridderinkhof KR. 2012. Corticostriatal connectivity underlies individual differences in the balance between habitual and goal-directed action control. J. Neurosci. 32:3512066–75 [Google Scholar]
Delgado MR, Li J, Schiller D, Phelps EA. 2008a. The role of the striatum in aversive learning and aversive prediction errors. Philos. Trans. R. Soc. Lond. B Biol. Sci. 363:15113787–800 [Google Scholar]
Delgado MR, Nearing KI, Ledoux JE, Phelps EA. 2008b. Neural circuitry underlying the regulation of conditioned fear and its relation to extinction. Neuron 59:5829–38 [Google Scholar]
Desmurget M, Epstein CM, Turner RS, Prablanc C, Alexander GE, Grafton ST. 1999. Role of the posterior parietal cortex in updating reaching movements to a visual target. Nat. Neurosci. 2:6563–67 [Google Scholar]
Dezfouli A, Balleine BW. 2013. Actions, action sequences and habits: evidence that goal-directed and habitual action control are hierarchically organized. PLOS Comput. Biol. 9:12e1003364 [Google Scholar]
Dickinson A. 1985. Actions and habits: the development of behavioural autonomy. Philos. Trans. R. Soc. Lond. B Biol. Sci. 308:113567–78 [Google Scholar]
Dickinson A, Balleine B. 1994. Motivational control of goal-directed action. Anim. Learn. Behav. 22:11–18 [Google Scholar]
Dickinson A, Balleine B, Watt A, Gonzalez F, Boakes RA. 1995. Motivational control after extended instrumental training. Anim. Learn. Behav. 23:2197–206 [Google Scholar]
Dickinson A, Nicholas DJ, Adams CD. 1983. The effect of the instrumental training contingency on susceptibility to reinforcer devaluation. Q. J. Exp. Psychol. Sect. B 35:135–51 [Google Scholar]
Diuk C, Tsai K, Wallis J, Botvinick M, Niv Y. 2013. Hierarchical learning induces two simultaneous, but separable, prediction errors in human basal ganglia. J. Neurosci. 33:135797–805 [Google Scholar]
Doll BB, Duncan KD, Simon DA, Shohamy D, Daw ND. 2015. Model-based choices involve prospective neural activity. Nat. Neurosci. 18:5767–72 [Google Scholar]
Doll BB, Hutchison KE, Frank MJ. 2011. Dopaminergic genes predict individual differences in susceptibility to confirmation bias. J. Neurosci. 31:166188–98 [Google Scholar]
Donoso M, Collins AGE, Koechlin E. 2014. Foundations of human reasoning in the prefrontal cortex. Science 344:61911481–86 [Google Scholar]
Dorris MC, Glimcher PW. 2004. Activity in posterior parietal cortex is correlated with the relative subjective desirability of action. Neuron 44:2365–78 [Google Scholar]
Doya K. 1999. What are the computations of the cerebellum, the basal ganglia and the cerebral cortex. Neural Netw 12:7–8961–74 [Google Scholar]
Eichenbaum H, Dudchenko P, Wood E, Shapiro M, Tanila H. 1999. The hippocampus, memory, and place cells: Is it spatial memory or a memory space. Neuron 23:2209–26 [Google Scholar]
Estes WK. 1943. Discriminative conditioning. I. A discriminative property of conditioned anticipation. J. Exp. Psychol. 32:2150–55 [Google Scholar]
Everitt BJ, Robbins TW. 2016. Drug addiction: updating actions to habits to compulsions ten years on. Annu. Rev. Psychol. 67:123–50 [Google Scholar]
Faure A, Haberland U, Condé F, Massioui NE. 2005. Lesion to the nigrostriatal dopamine system disrupts stimulus-response habit formation. J. Neurosci. 25:112771–80 [Google Scholar]
FitzGerald THB, Dolan RJ, Friston KJ. 2014. Model averaging, optimal inference, and habit formation. Front. Hum. Neurosci. 8:457 [Google Scholar]
Flagel SB, Watson SJ, Robinson TE, Akil H. 2007. Individual differences in the propensity to approach signals versus goals promote different adaptations in the dopamine system of rats. Psychopharmacol. Berl. 191:3599–607 [Google Scholar]
Frank MJ, Seeberger LC, O'Reilly RC. 2004. By carrot or by stick: cognitive RL in parkinsonism. Science 306:57031940–43 [Google Scholar]
Freedman DJ, Assad JA. 2006. Experience-dependent representation of visual categories in parietal cortex. Nature 443:710785–88 [Google Scholar]
Frith CD, Frith U. 2006. The neural basis of mentalizing. Neuron 50:4531–34 [Google Scholar]
Frith U, Frith CD. 2003. Development and neurophysiology of mentalizing. Philos. Trans. R. Soc. Lond. B Biol. Sci. 358:1431459–73 [Google Scholar]
Gigerenzer G, Gaissmaier W. 2011. Heuristic decision making. Annu. Rev. Psychol. 62:1451–82 [Google Scholar]
Gläscher J, Daw N, Dayan P, O'Doherty JP. 2010. States versus rewards: dissociable neural prediction error signals underlying model-based and model-free RL. Neuron 66:4585–95 [Google Scholar]
Glimcher PW, Camerer CF, Fehr E, Poldrack RA. 2013. Neuroeconomics: Decision Making and the Brain London: Academic
Gottfried JA, O'Doherty J, Dolan RJ. 2002. Appetitive and aversive olfactory learning in humans studied using event-related functional magnetic resonance imaging. J. Neurosci. 22:2410829–37 [Google Scholar]
Gottfried JA, O'Doherty J, Dolan RJ. 2003. Encoding predictive reward value in human amygdala and OFC. Science 301:56361104–7 [Google Scholar]
Groenewegen HJ, Berendse HW. 1994. Anatomical relationships between the prefrontal cortex and the basal ganglia in the rat. Motor and Cognitive Functions of the Prefrontal Cortex AM Thierry, J Glowinski, PS Goldman-Rakic, Y Christen 51–77 Berlin/Heidelberg: Springer [Google Scholar]
Hampton AN, Bossaerts P, O'Doherty JP. 2008. Neural correlates of mentalizing-related computations during strategic interactions in humans. PNAS 105:186741–46 [Google Scholar]
Hare TA, Schultz W, Camerer CF, O'Doherty JP, Rangel A. 2011. Transformation of stimulus value signals into motor commands during simple choice. PNAS 108:4418120–25 [Google Scholar]
Hearst E, Jenkins HM. 1974. Sign-Tracking: The Stimulus-Reinforcer Relation and Directed Action Madison, WI: Psychon. Soc.
Hikosaka O, Sakamoto M, Usui S. 1989. Functional properties of monkey caudate neurons. I. Activities related to saccadic eye movements. J. Neurophysiol. 61:4780–98 [Google Scholar]
Hillman KL, Bilkey DK. 2012. Neural encoding of competitive effort in the anterior cingulate cortex. Nat. Neurosci. 15:91290–97 [Google Scholar]
Holland PC. 2004. Relations between Pavlovian-instrumental transfer and reinforcer devaluation. J. Exp. Psychol. Anim. Behav. Process. 30:2104–17 [Google Scholar]
Holland PC, Bouton ME. 1999. Hippocampus and context in classical conditioning. Curr. Opin. Neurobiol. 9:2195–202 [Google Scholar]
Holland PC, Gallagher M. 2003. Double dissociation of the effects of lesions of basolateral and central amygdala on conditioned stimulus-potentiated feeding and Pavlovian-instrumental transfer. Eur. J. Neurosci. 17:81680–94 [Google Scholar]
Horga G, Maia TV, Marsh R, Hao X, Xu D. et al. 2015. Changes in corticostriatal connectivity during RL in humans. Hum. Brain Mapp. 36:2793–803 [Google Scholar]
Hosokawa T, Kennerley SW, Sloan J, Wallis JD. 2013. Single-neuron mechanisms underlying cost-benefit analysis in frontal cortex. J. Neurosci. 33:4417385–97 [Google Scholar]
Howard JD, Gottfried JA, Tobler PN, Kahnt T. 2015. Identity-specific coding of future rewards in the human orbitofrontal cortex. PNAS 112:165195–200 [Google Scholar]
Huettel SA, Stowe CJ, Gordon EM, Warner BT, Platt ML. 2006. Neural signatures of economic preferences for risk and ambiguity. Neuron 49:5765–75 [Google Scholar]
Hunt LT, Dolan RJ, Behrens TEJ. 2014. Hierarchical competitions subserving multi-attribute choice. Nat. Neurosci. 17:111613–22 [Google Scholar]
Huys QJM, Maia TV, Frank MJ. 2016. Computational psychiatry as a bridge from neuroscience to clinical applications. Nat. Neurosci. 19:3404–13 [Google Scholar]
Jenkins HM, Moore BR. 1973. The form of the auto-shaped response with food or water reinforcers. J. Exp. Anal. Behav. 20:2163–81 [Google Scholar]
Johnson A, Redish AD. 2007. Neural ensembles in CA3 transiently encode paths forward of the animal at a decision point. J. Neurosci. 27:4512176–89 [Google Scholar]
Jones JL, Esber GR, McDannald MA, Gruber AJ, Hernandez A. et al. 2012. OFC supports behavior and learning using inferred but not cached values. Science 338:6109953–56 [Google Scholar]
Kirk U, Skov M, Hulme O, Christensen MS, Zeki S. 2009. Modulation of aesthetic value by semantic context: an fMRI study. NeuroImage 44:31125–32 [Google Scholar]
Knutson B, Fong GW, Adams CM, Varner JL, Hommer D. 2001. Dissociation of reward anticipation and outcome with event-related fMRI. Neuroreport 12:173683–87 [Google Scholar]
Koechlin E, Ody C, Kouneiher F. 2003. The architecture of cognitive control in the human prefrontal cortex. Science 302:56481181–85 [Google Scholar]
Kolb B, Buhrmann K, McDonald R, Sutherland RJ. 1994. Dissociation of the medial prefrontal, posterior parietal, and posterior temporal cortex for spatial navigation and recognition memory in the rat. Cereb. Cortex 4:6664–80 [Google Scholar]
Konorski J, Miller S. 1937. On two types of conditioned reflex. J. Gen. Psychol. 16:1264–72 [Google Scholar]
Kuvayev L, Sutton R. 1996. Model-based RL with an approximate, learned model. Proc. Yale Worksh. Adapt. Learn. Syst., 9th, June 10–12, New Haven, CT101–5 New Haven, CT: Dunham Lab., Yale Univ. [Google Scholar]
Lau B, Glimcher PW. 2008. Value representations in the primate striatum during matching behavior. Neuron 58:3451–63 [Google Scholar]
LeDoux JE, Iwata J, Cicchetti P, Reis DJ. 1988. Different projections of the central amygdaloid nucleus mediate autonomic and behavioral correlates of conditioned fear. J. Neurosci. 8:72517–29 [Google Scholar]
Lee D, Rushworth MFS, Walton ME, Watanabe M, Sakagami M. 2007. Functional specialization of the primate frontal cortex during decision making. J. Neurosci. 27:318170–73 [Google Scholar]
Lee SW, Shimojo S, O'Doherty JP. 2014. Neural computations underlying arbitration between model-based and model-free learning. Neuron 81:3687–99 [Google Scholar]
Lee TG, Grafton ST. 2015. Out of control: Diminished prefrontal activity coincides with impaired motor performance due to choking under pressure. NeuroImage 105:145–55 [Google Scholar]
Levy DJ, Glimcher PW. 2012. The root of all value: a neural common currency for choice. Curr. Opin. Neurobiol. 22:61027–38 [Google Scholar]
Liljeholm M, Molloy CJ, O'Doherty JP. 2012. Dissociable brain systems mediate vicarious learning of stimulus-response and action-outcome contingencies. J. Neurosci. 32:299878–86 [Google Scholar]
Liljeholm M, Tricomi E, O'Doherty JP, Balleine BW. 2011. Neural correlates of instrumental contingency learning: differential effects of action-reward conjunction and disjunction. J. Neurosci. 31:72474–80 [Google Scholar]
Liljeholm M, Wang S, Zhang J, O'Doherty JP. 2013. Neural correlates of the divergence of instrumental probability distributions. J. Neurosci. 33:3012519–27 [Google Scholar]
Lovibond PF. 1983. Facilitation of instrumental behavior by a Pavlovian appetitive conditioned stimulus. J. Exp. Psychol. Anim. Behav. Process. 9:3225–47 [Google Scholar]
MacKay WA. 1992. Properties of reach-related neuronal activity in cortical area 7A. J. Neurophysiol. 67:51335–45 [Google Scholar]
Maia TV, Frank MJ. 2011. From RL models to psychiatric and neurological disorders. Nat. Neurosci. 14:2154–62 [Google Scholar]
Matsumoto K, Suzuki W, Tanaka K. 2003. Neuronal correlates of goal-based motor selection in the prefrontal cortex. Science 301:5630229–32 [Google Scholar]
McDannald MA, Lucantonio F, Burke KA, Niv Y, Schoenbaum G. 2011. Ventral striatum and OFC are both required for model-based, but not model-free, RL. J. Neurosci. 31:72700–5 [Google Scholar]
McNamee D, Rangel A, O'Doherty JP. 2013. Category-dependent and category-independent goal-value codes in human vmPFC. Nat. Neurosci. 16:4479–85 [Google Scholar]
Miller EK, Cohen JD. 2001. An integrative theory of prefrontal cortex function. Annu. Rev. Neurosci. 24:1167–202 [Google Scholar]
Mobbs D, Hassabis D, Seymour B, Marchant JL, Weiskopf N. et al. 2009. Choking on the money: Reward-based performance decrements are associated with midbrain activity. Psychol. Sci. 20:8955–62 [Google Scholar]
Montague PR, Dayan P, Sejnowski TJ. 1996. A framework for mesencephalic dopamine systems based on predictive Hebbian learning. J. Neurosci. 16:51936–47 [Google Scholar]
Montague PR, Dolan RJ, Friston KJ, Dayan P. 2012. Computational psychiatry. Trends Cogn. Sci. 16:172–80 [Google Scholar]
Morris RW, Dezfouli A, Griffiths KR, Balleine BW. 2014. Action-value comparisons in the dorsolateral prefrontal cortex control choice between goal-directed actions. Nat. Commun. 5:4390 [Google Scholar]
Nasser HM, Chen Y-W, Fiscella K, Calu DJ. 2015. Individual variability in behavioral flexibility predicts sign-tracking tendency. Front. Behav. Neurosci. 9:289 [Google Scholar]
O'Doherty J, Kringelbach ML, Rolls ET, Hornak J, Andrews C. 2001. Abstract reward and punishment representations in the human OFC. Nat. Neurosci. 4:195–102 [Google Scholar]
O'Doherty J, Rolls ET, Francis S, Bowtell R, McGlone F. et al. 2000. Sensory-specific satiety-related olfactory activation of the human OFC. Neuroreport 11:4893–97 [Google Scholar]
O'Doherty J, Winston J, Critchley H, Perrett D, Burt DM, Dolan RJ. 2003a. Beauty in a smile: the role of medial orbitofrontal cortex in facial attractiveness. Neuropsychologia 41:2147–55 [Google Scholar]
O'Doherty JP. 2004. Reward representations and reward-related learning in the human brain: insights from neuroimaging. Curr. Opin. Neurobiol. 14:6769–76 [Google Scholar]
O'Doherty JP. 2014. The problem with value. Neurosci. Biobehav. Rev. 43:259–68 [Google Scholar]
O'Doherty JP, Dayan P, Friston K, Critchley H, Dolan RJ. 2003b. Temporal difference models and reward-related learning in the human brain. Neuron 38:2329–37 [Google Scholar]
O'Doherty JP, Deichmann R, Critchley HD, Dolan RJ. 2002. Neural responses during anticipation of a primary taste reward. Neuron 33:5815–26 [Google Scholar]
O'Keefe J, Dostrovsky J. 1971. The hippocampus as a spatial map. Preliminary evidence from unit activity in the freely-moving rat. Brain Res 34:1171–75 [Google Scholar]
Ostlund SB, Balleine BW. 2005. Lesions of medial prefrontal cortex disrupt the acquisition but not the expression of goal-directed learning. J. Neurosci. 25:347763–70 [Google Scholar]
Padoa-Schioppa C, Assad JA. 2006. Neurons in the OFC encode economic value. Nature 441:7090223–26 [Google Scholar]
Pan X, Fan H, Sawa K, Tsuda I, Tsukada M, Sakagami M. 2014. Reward inference by primate prefrontal and striatal neurons. J. Neurosci. 34:41380–96 [Google Scholar]
Pascoe JP, Kapp BS. 1985. Electrophysiological characteristics of amygdaloid central nucleus neurons during Pavlovian fear conditioning in the rabbit. Behav. Brain Res. 16:2–3117–33 [Google Scholar]
Paton JJ, Belova MA, Morrison SE, Salzman CD. 2006. The primate amygdala represents the positive and negative value of visual stimuli during learning. Nature 439:7078865–70 [Google Scholar]
Pauli WM, Larsen T, Collette S, Tyszka JM, Seymour B, O'Doherty JP. 2015. Distinct contributions of ventromedial and dorsolateral subregions of the human substantia nigra to appetitive and aversive learning. J. Neurosci. 35:4214220–33 [Google Scholar]
Paulus MP, Rogalsky C, Simmons A, Feinstein JS, Stein MB. 2003. Increased activation in the right insula during risk-taking decision making is related to harm avoidance and neuroticism. NeuroImage 19:41439–48 [Google Scholar]
Pavlov I. 1927. Conditioned Reflexes: An Investigation of the Physiological Activity of the Cerebral Cortex London: Oxford Univ. Press
Payzan-LeNestour E, Dunne S, Bossaerts P, O'Doherty JP. 2013. The neural representation of unexpected uncertainty during value-based decision making. Neuron 79:1191–201 [Google Scholar]
Pezzulo G, Rigoli F, Chersi F. 2013. The mixed instrumental controller: using value of information to combine habitual choice and mental simulation. Front. Psychol. 4:92 [Google Scholar]
Pfeiffer BE, Foster DJ. 2013. Hippocampal place-cell sequences depict future paths to remembered goals. Nature 497:744774–79 [Google Scholar]
Plassmann H, O'Doherty J, Rangel A. 2007. OFC encodes willingness to pay in everyday economic transactions. J. Neurosci. 27:379984–88 [Google Scholar]
Plassmann H, O'Doherty J, Shiv B, Rangel A. 2008. Marketing actions can modulate neural representations of experienced pleasantness. PNAS 105:31050–54 [Google Scholar]
Plassmann H, O'Doherty JP, Rangel A. 2010. Appetitive and aversive goal values are encoded in the medial OFC at the time of decision making. J. Neurosci. 30:3210799–808 [Google Scholar]
Platt ML, Glimcher PW. 1999. Neural correlates of decision variables in parietal cortex. Nature 400:6741233–38 [Google Scholar]
Prévost C, McNamee D, Jessup RK, Bossaerts P, O'Doherty JP. 2013. Evidence for model-based computations in the human amygdala during Pavlovian conditioning. PLOS Comput. Biol. 9:2e1002918 [Google Scholar]
Prévost C, Pessiglione M, Météreau E, Cléry-Melin M-L, Dreher J-C. 2010. Separate valuation subsystems for delay and effort decision costs. J. Neurosci. 30:4214080–90 [Google Scholar]
Ragozzino ME, Ragozzino KE, Mizumori SJ, Kesner RP. 2002. Role of the dorsomedial striatum in behavioral flexibility for response and visual cue discrimination learning. Behav. Neurosci. 116:1105–15 [Google Scholar]
Rangel A, Camerer C, Montague PR. 2008. A framework for studying the neurobiology of value-based decision making. Nat. Rev. Neurosci. 9:7545–56 [Google Scholar]
Rangel A, Hare T. 2010. Neural computations associated with goal-directed choice. Curr. Opin. Neurobiol. 20:2262–70 [Google Scholar]
Rescorla RA. 1980. Simultaneous and successive associations in sensory preconditioning. J. Exp. Psychol. Anim. Behav. Process. 6:3207–16 [Google Scholar]
Rescorla RA, Solomon RL. 1967. Two-process learning theory: relationships between Pavlovian conditioning and instrumental learning. Psychol. Rev. 74:3151–82 [Google Scholar]
Rescorla RA, Wagner AR. 1972. A theory of Pavlovian conditioning: variations in the effectiveness of reinforcement and nonreinforcement. Classical Conditioning II: Current Research and Theory AH Black, WF Prokasy 64–99 New York: Appleton-Century-Crofts [Google Scholar]
Ribas-Fernandes JJF, Solway A, Diuk C, McGuire JT, Barto AG. et al. 2011. A neural signature of hierarchical RL. Neuron 71:2370–79 [Google Scholar]
Rolls ET, Kringelbach ML, De Araujo IET. 2003. Different representations of pleasant and unpleasant odours in the human brain. Eur. J. Neurosci. 18:3695–703 [Google Scholar]
Salzman CD, Fusi S. 2010. Emotion, cognition, and mental state representation in amygdala and prefrontal cortex. Annu. Rev. Neurosci. 33:173–202 [Google Scholar]
Salzman CD, Paton JJ, Belova MA, Morrison SE. 2007. Flexible neural representations of value in the primate brain. Ann. N. Y. Acad. Sci. 1121:1336–54 [Google Scholar]
Samejima K, Ueda Y, Doya K, Kimura M. 2005. Representation of action-specific reward values in the striatum. Science 310:57521337–40 [Google Scholar]
Schapiro AC, Rogers TT, Cordova NI, Turk-Browne NB, Botvinick MM. 2013. Neural representations of events arise from temporal community structure. Nat. Neurosci. 16:4486–92 [Google Scholar]
Schoenbaum G, Chiba AA, Gallagher M. 1998. OFC and basolateral amygdala encode expected outcomes during learning. Nat. Neurosci. 1:2155–59 [Google Scholar]
Schoenbaum G, Esber GR, Iordanova MD. 2013. Dopamine signals mimic reward prediction errors. Nat. Neurosci. 16:7777–79 [Google Scholar]
Schönberg T, Daw ND, Joel D, O'Doherty JP. 2007. Reinforcement learning signals in the human striatum distinguish learners from nonlearners during reward-based decision making. J. Neurosci. 27:4712860–67 [Google Scholar]
Schultz W, Dayan P, Montague PR. 1997. A neural substrate of prediction and reward. Science 275:53061593–99 [Google Scholar]
Seo H, Barraclough DJ, Lee D. 2007. Dynamic signals related to choices and outcomes in the dorsolateral prefrontal cortex. Cereb. Cortex 17:Suppl. 1110–17 [Google Scholar]
Seo H, Barraclough DJ, Lee D. 2009. Lateral intraparietal cortex and RL during a mixed-strategy game. J. Neurosci. 29:227278–89 [Google Scholar]
Shadlen MN, Newsome WT. 2001. Neural basis of a perceptual decision in the parietal cortex (Area LIP) of the rhesus monkey. J. Neurophysiol. 86:41916–36 [Google Scholar]
Shenhav A, Botvinick MM, Cohen JD. 2013. The expected value of control: an integrative theory of anterior cingulate cortex function. Neuron 79:2217–40 [Google Scholar]
Simon DA, Daw ND. 2011. Neural correlates of forward planning in a spatial decision task in humans. J. Neurosci. 31:145526–39 [Google Scholar]
Small DM, Zatorre RJ, Dagher A, Evans AC, Jones-Gotman M. 2001. Changes in brain activity related to eating chocolate. Brain 124:91720–33 [Google Scholar]
Smith DV, Hayden BY, Truong T-K, Song AW, Platt ML, Huettel SA. 2010. Distinct value signals in anterior and posterior VmPFC. J. Neurosci. 30:72490–95 [Google Scholar]
Sohn J-W, Lee D. 2007. Order-dependent modulation of directional signals in the supplementary and presupplementary motor areas. J. Neurosci. 27:5013655–66 [Google Scholar]
Staudinger MR, Erk S, Abler B, Walter H. 2009. Cognitive reappraisal modulates expected value and prediction error encoding in the ventral striatum. NeuroImage 47:2713–21 [Google Scholar]
Steinberg EE, Janak PH. 2013. Establishing causality for dopamine in neural function and behavior with optogenetics. Brain Res 1511:46–64 [Google Scholar]
Strait CE, Blanchard TC, Hayden BY. 2014. Reward value comparison via mutual inhibition in VmPFC. Neuron 82:61357–66 [Google Scholar]
Sutton RS. 1988. Learning to predict by the methods of temporal differences. Mach. Learn. 3:19–44 [Google Scholar]
Sutton RS. 1990. RL architectures for animats. Proc. Int. Conf. Simul. Adapt. Behav., 1st, From Animals to Animats, Cambridge, MA288–96 Cambridge, MA: MIT Press [Google Scholar]
Sutton RS, Precup D, Singh S. 1999. Between MDPs and semi-MDPs: a framework for temporal abstraction in RL. Artif. Intell. 112:181–211 [Google Scholar]
Suzuki S, Adachi R, Dunne S, Bossaerts P, O'Doherty JP. 2015. Neural mechanisms underlying human consensus decision-making. Neuron 86(2):591–602
Tavares RM, Mendelsohn A, Grossman Y, Williams CH, Shapiro M, et al. 2015. A map for social navigation in the human brain. Neuron 87(1):231–43
Thibodeau GA, Patton KT. 1992. Structure & Function of the Body. St. Louis, MO: Mosby Year Book. 9th ed.
Thorndike EL. 1898. Animal intelligence: an experimental study of the associative processes in animals. Psychol. Rev. Monogr. Suppl. 2(4):1–109
Tobler PN, O'Doherty JP, Dolan RJ, Schultz W. 2006. Human neural learning depends on reward prediction errors in the blocking paradigm. J. Neurophysiol. 95(1):301–10
Tolman EC. 1948. Cognitive maps in rats and men. Psychol. Rev. 55(4):189–208
Tricomi E, Balleine BW, O'Doherty JP. 2009. A specific role for posterior dorsolateral striatum in human habit learning. Eur. J. Neurosci. 29(11):2225–32
Tully T, Quinn WG. 1985. Classical conditioning and retention in normal and mutant Drosophila melanogaster. J. Comp. Physiol. 157(2):263–77
Valentin VV, Dickinson A, O'Doherty JP. 2007. Determining the neural substrates of goal-directed learning in the human brain. J. Neurosci. 27(15):4019–26
Voon V, Derbyshire K, Rück C, Irvine MA, Worbe Y, et al. 2015. Disorders of compulsivity: a common bias towards learning habits. Mol. Psychiatry 20(3):345–52
Wallis JD, Miller EK. 2003. Neuronal activity in primate dorsolateral and orbital prefrontal cortex during performance of a reward preference task. Eur. J. Neurosci. 18(7):2069–81
Walters ET, Carew TJ, Kandel ER. 1981. Associative learning in Aplysia: evidence for conditioned fear in an invertebrate. Science 211(4481):504–6
Walton ME, Groves J, Jennings KA, Croxson PL, Sharp T, et al. 2009. Comparing the role of the anterior cingulate cortex and 6-hydroxydopamine nucleus accumbens lesions on operant effort-based decision making. Eur. J. Neurosci. 29(8):1678–91
Watson P, Wiers RW, Hommel B, de Wit S. 2014. Working for food you don't desire. Cues interfere with goal-directed food-seeking. Appetite 79:139–48
Whitlock JR, Pfuhl G, Dagslott N, Moser M-B, Moser EI. 2012. Functional split between parietal and entorhinal cortices in the rat. Neuron 73(4):789–802
Wilber AA, Clark BJ, Forster TC, Tatsuno M, McNaughton BL. 2014. Interaction of egocentric and world-centered reference frames in the rat posterior parietal cortex. J. Neurosci. 34(16):5431–46
Wilson RC, Takahashi YK, Schoenbaum G, Niv Y. 2014. OFC as a cognitive map of task space. Neuron 81(2):267–79
Wimmer GE, Shohamy D. 2012. Preference by association: how memory mechanisms in the hippocampus bias decisions. Science 338(6104):270–73
Winn P, Brown VJ, Inglis WL. 1997. On the relationships between the striatum and the pedunculopontine tegmental nucleus. Crit. Rev. Neurobiol. 11(4):241–61
Wittmann BC, Schott BH, Guderian S, Frey JU, Heinze H-J, Düzel E. 2005. Reward-related fMRI activation of dopaminergic midbrain is associated with enhanced hippocampus-dependent long-term memory formation. Neuron 45(3):459–67
Wunderlich K, Dayan P, Dolan RJ. 2012. Mapping value based planning and extensively trained choice in the human brain. Nat. Neurosci. 15(5):786–91
Wunderlich K, Rangel A, O'Doherty JP. 2009. Neural computations underlying action-based decision making in the human brain. PNAS 106(40):17199–204
Yanike M, Ferrera VP. 2014. Representation of outcome risk and action in the anterior caudate nucleus. J. Neurosci. 34(9):3279–90
Yin HH, Knowlton BJ, Balleine BW. 2004. Lesions of dorsolateral striatum preserve outcome expectancy but disrupt habit formation in instrumental learning. Eur. J. Neurosci. 19(1):181–89
Yin HH, Knowlton BJ, Balleine BW. 2005. Blockade of NMDA receptors in the dorsomedial striatum prevents action-outcome learning in instrumental conditioning. Eur. J. Neurosci. 22(2):505–12
Yin HH, Knowlton BJ, Balleine BW. 2006. Inactivation of dorsolateral striatum enhances sensitivity to changes in the action-outcome contingency in instrumental conditioning. Behav. Brain Res. 166(2):189–96
Yoshida W, Ishii S. 2006. Resolution of uncertainty in prefrontal cortex. Neuron 50(5):781–89
Zedelius CM, Veling H, Aarts H. 2011. Boosting or choking—how conscious and unconscious reward processing modulate the active maintenance of goal-relevant information. Conscious. Cogn. 20(2):355–62

Learning, Reward, and Decision Making

Annual Review of Psychology 68, 73 (2017); https://doi.org/10.1146/annurev-psych-010416-044216