
Abstract

We review the psychology and neuroscience of reinforcement learning (RL), which has experienced significant progress in the past two decades, enabled by the comprehensive experimental study of simple learning and decision-making tasks. However, one challenge in the study of RL is computational: The simplicity of these tasks ignores important aspects of reinforcement learning in the real world: (a) State spaces are high-dimensional, continuous, and partially observable; this implies that (b) data are relatively sparse and, indeed, precisely the same situation may never be encountered twice; furthermore, (c) rewards depend on the long-term consequences of actions in ways that violate the classical assumptions that make RL tractable. A seemingly distinct challenge is that, cognitively, theories of RL have largely involved procedural and semantic memory: the way in which knowledge about action values or world models, extracted gradually from many experiences, can drive choice. This focus on semantic memory leaves out many aspects of memory, such as episodic memory, related to the traces of individual events. We suggest that these two challenges are related. The computational challenge can be dealt with, in part, by endowing RL systems with episodic memory, allowing them to (a) efficiently approximate value functions over complex state spaces, (b) learn with very little data, and (c) bridge long-term dependencies between actions and rewards. We review the computational theory underlying this proposal and the empirical evidence to support it. Our proposal suggests that the ubiquitous and diverse roles of memory in RL may function as part of an integrated learning system.
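To make points (a) and (b) concrete, the sketch below illustrates one way episodic traces can approximate a value function non-parametrically: store individual (state, return) episodes and evaluate a novel state by similarity-weighted averaging over stored traces, in the spirit of kernel-based RL and episodic control. This is a minimal illustration under assumed choices (a Gaussian similarity kernel, a single bandwidth parameter, toy data); the class and parameter names are illustrative, not the specific models reviewed in the article.

```python
import numpy as np

class EpisodicValueEstimator:
    """Illustrative non-parametric value estimator: stores individual
    episodic traces and evaluates new states by kernel-weighted
    averaging of the returns observed for similar past states."""

    def __init__(self, bandwidth=1.0):
        self.bandwidth = bandwidth  # kernel width: how broadly episodes generalize
        self.states = []            # stored state vectors (one per episode)
        self.returns = []           # observed return for each stored episode

    def store(self, state, ret):
        """Record a single experienced episode: a state and its return."""
        self.states.append(np.asarray(state, dtype=float))
        self.returns.append(float(ret))

    def value(self, state):
        """Estimate V(s) = sum_i k(s, s_i) G_i / sum_i k(s, s_i),
        with Gaussian kernel k(s, s') = exp(-||s - s'||^2 / (2 h^2))."""
        if not self.states:
            return 0.0
        s = np.asarray(state, dtype=float)
        sq_dists = np.array([np.sum((s - si) ** 2) for si in self.states])
        weights = np.exp(-sq_dists / (2.0 * self.bandwidth ** 2))
        total = weights.sum()
        if total == 0.0:  # all stored episodes too dissimilar to matter
            return 0.0
        return float(np.dot(weights, self.returns) / total)


# Usage: generalize from only two stored episodes to a nearby novel state,
# which precisely the agent may never have encountered before.
estimator = EpisodicValueEstimator(bandwidth=0.5)
estimator.store([0.0, 0.0], ret=1.0)
estimator.store([1.0, 1.0], ret=0.0)
print(estimator.value([0.1, 0.1]))  # ~0.96: the nearest episode dominates
```

Because the estimate is built directly from stored traces rather than from incrementally learned parameters, a single episode immediately influences valuation of all similar states; the bandwidth controls the bias-variance trade-off between broad generalization and fidelity to individual memories.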

