Abstract
The reinforcement learning hypothesis of dopamine function predicts that dopamine acts as a teaching signal by governing synaptic plasticity in the striatum. The induced changes in synaptic strength enable the cortico-striatal network to learn a mapping between situations and actions that lead to a reward. A review of the relevant neurophysiology of dopamine function in the cortico-striatal network and of the machine reinforcement learning hypothesis reveals an apparent mismatch with recent electrophysiological studies. These studies found that, in addition to the well-described reward-related responses, a subpopulation of dopamine neurons also exhibits phasic responses to aversive stimuli or to cues predicting aversive stimuli. Actions that lead to aversive events should not be reinforced. However, published data suggest that phasic responses of dopamine neurons to reward-related stimuli have a higher firing rate and a longer duration than phasic responses to aversion-related stimuli. We propose that, based on the different dopamine concentrations, the target structures are able to distinguish reward-related from aversion-related dopamine responses. The learning of actions in the basal-ganglia network thereby integrates information about both costs and benefits. This hypothesis predicts that dopamine concentration should be a crucial parameter for plasticity rules at cortico-striatal synapses. Recent in vitro studies on cortico-striatal synaptic plasticity rules support a striatal action-learning scheme in which dopamine-dependent forms of synaptic plasticity occur during reward-related dopamine release, whereas during aversion-related dopamine release the dopamine concentration allows only dopamine-independent forms of synaptic plasticity.
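The proposed decoding scheme can be illustrated with a toy sketch. The Python snippet below is not taken from the article: the threshold value, the concentration units, the learning rates, the signs of the weight changes and all function names are hypothetical and chosen only for illustration. It shows a synaptic update in which a dopamine-dependent term contributes only when the dopamine concentration reaches an assumed threshold, while a dopamine-independent term is always available.

```python
# Toy illustration only: every number and name here is a hypothetical
# assumption, not a value or rule reported in the article.

DA_THRESHOLD = 1.0  # assumed concentration (arbitrary units) separating the two regimes


def plasticity_regime(da_concentration: float) -> str:
    """Classify which forms of plasticity are assumed to be available."""
    if da_concentration >= DA_THRESHOLD:
        return "dopamine-dependent"
    return "dopamine-independent"


def update_weight(weight, pre_post_activity, da_concentration,
                  rate_independent=0.01, rate_dependent=0.05):
    """Return the new weight of one cortico-striatal synapse (toy rule).

    The dopamine-independent term is always applied; the dopamine-dependent
    term is added only at or above the assumed threshold. The signs and
    magnitudes of both terms are placeholders.
    """
    delta = -rate_independent * pre_post_activity
    if plasticity_regime(da_concentration) == "dopamine-dependent":
        delta += rate_dependent * pre_post_activity
    return weight + delta
```

With these assumed parameters, `update_weight(0.5, 1.0, 2.0)` (high, reward-like concentration) increases the weight, whereas `update_weight(0.5, 1.0, 0.3)` (low, aversion-like concentration) decreases it slightly. The only point of the sketch is that the same signal can select different plasticity rules depending on its concentration.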
Cite this article
Morris, G., Schmidt, R. & Bergman, H. Striatal action-learning based on dopamine concentration. Exp Brain Res 200, 307–317 (2010). https://doi.org/10.1007/s00221-009-2060-6
DOI: https://doi.org/10.1007/s00221-009-2060-6