Learning from Your Mistakes: The Role of Dopamine Activity in Prediction Errors

Post by Lincoln Tracy 

What's the science?

Understanding how associative learning occurs in the brain is one of the most important questions in neuroscience. A key concept in associative learning is the prediction error: a mismatch between what we expect to happen and what actually happens. Both humans and animals use prediction errors to learn; the greater the error, the greater the learning. Prediction errors can be calculated using temporal difference learning. The ability to map millisecond-by-millisecond changes in the firing of dopamine neurons has been a major step forward in understanding prediction errors. However, some aspects of prediction errors are yet to be fully explored. Previous research has demonstrated that optogenetics can be used to shunt (or attenuate) dopamine neuron activity to prevent learning about a reward when it is delivered. This week in Nature Neuroscience, Maes and colleagues used second-order conditioning to determine whether shunting dopamine neuron activity with laser light when a reward-predicting visual cue is presented prevents learning in a similar fashion.
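The temporal difference idea can be written down in a few lines. Below is a minimal sketch (not the authors' code; the function name and numbers are illustrative) of a TD prediction error, the quantity dopamine firing is proposed to signal: the reward received plus the discounted value of the next state, minus the value predicted for the current state.

```python
# A minimal sketch of a temporal difference (TD) prediction error:
#   delta = reward + gamma * V(next state) - V(current state)
def td_error(reward, value_current, value_next, gamma=0.9):
    """Mismatch between what happened and what was predicted; drives learning."""
    return reward + gamma * value_next - value_current

# An unpredicted reward (current value 0) produces a large positive error,
# so learning is strong; a fully predicted reward produces no error.
print(td_error(reward=1.0, value_current=0.0, value_next=0.0))  # 1.0
print(td_error(reward=1.0, value_current=1.0, value_next=0.0))  # 0.0
```

In this framework, a cue that reliably predicts reward comes to evoke its own prediction error when it appears unexpectedly, which is the signal the optogenetic shunting in this study was designed to attenuate.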

How did they do it?

The authors used rats whose genome had been altered to express Cre recombinase, an enzyme derived from bacteria, under the control of a tyrosine hydroxylase promoter, which restricts expression to dopaminergic neurons. The rats underwent surgery in which a Cre-dependent viral vector carrying halorhodopsin was injected into the ventral tegmental area (VTA) of the brain. Optic fibers were also implanted into the VTA; these delivered the laser light during optogenetic stimulation. The rats were then placed on a food-restricted diet for four weeks before being conditioned to associate a specific visual cue (stimulus A, a flashing light) with a reward (a chocolate-tasting sucrose pellet). After the training period, the rats completed two experiments: a second-order conditioning experiment and a blocking experiment. In both experiments, the percentage of time the rats spent approaching the food port where the pellet was delivered was taken as a measure of how conditioned they had become. The second-order conditioning experiment had two types of trials. In both, the previously conditioned cue (the flashing light) was used to reinforce learning about two novel cues: a second stimulus, either a chime (stimulus C) or a siren (stimulus D), was presented after the flashing light. In the C trials, continuous laser light was beamed onto the VTA half a second before the presentation of the flashing light so as to disrupt the dopamine transmission that would normally occur when the reward-predicting cue was presented. In the D trials, the light was beamed onto the VTA at a random time point after the flashing light was presented. Following the training, the rats also completed probe testing, in which the chime and siren were presented without a reward. The authors then compared the behavioral responses between the two trial types to determine whether disrupting dopaminergic transmission impaired learning.

In the blocking experiment, the conditioned cue (the flashing light) was presented in separate compounds with each of two novel auditory stimuli, a tone (stimulus X) or a click (stimulus Y). Each of these compounds was paired with reward. Normally, under these conditions, the conditioned light blocks learning about the relationship between X (or Y) and the reward. The question was whether, if the conditioned cue carries a prediction of upcoming reward, disrupting this prediction would prevent the light from blocking learning about X. To test this, laser light was beamed onto the VTA at the onset of the flashing light in the X trials, and at a random time point between trials in the Y trials. Learning about these compounds was compared to learning about a compound consisting of a non-conditioned steady light and a third auditory cue, a white noise (stimulus Z), which was also paired with a reward. The rats underwent probe testing following the blocking (compound) training, in which the X, Y, and Z stimuli were presented alone, without a reward.

What did they find?

Optogenetic manipulation did not alter responding during second-order training. However, during the probe test, the rats responded to the D stimulus more frequently than to the C stimulus. These results indicate that attenuating dopaminergic activity at the start of the reward-predictive cue prevented second-order conditioning to stimulus C. As in the second-order experiment, optogenetic manipulation did not alter responding during blocking training. During the probe test of the blocking experiment, the rats responded more to the control stimulus, Z, than to the blocked stimuli, X and Y. These results confirmed that the conditioned flashing light blocked learning about the novel cues X and Y, but showed that attenuating the dopamine signal to the flashing light did not disrupt its ability to block learning about stimulus X, suggesting that the dopamine signal evoked by good predictors of reward represents a prediction error rather than a prediction of reward.


What's the impact?

This study provides clear evidence that the increases in firing of dopaminergic neurons following the presentation of a reward-predicting cue serve as prediction errors that support associative learning, in a similar fashion to the previously demonstrated reward-evoked changes in dopaminergic firing. Importantly, these findings suggest a broader role for dopaminergic signaling in driving associative learning than current theories propose.


Maes et al. Causal evidence supporting the proposal that dopamine transients function as temporal difference prediction errors. Nature Neuroscience (2020). Access the original scientific publication here.

Distinct Patterns of Activity Underlie the Motivation to Be Fair

Post by Shireen Parimoo

What's the science?

Why are people motivated to be fair? People can be fair for prosocial reasons when they value the well-being of others, or for strategic reasons when being unfair might cost them something. In the ultimatum game, which is often used to evaluate fairness, people offer to split a sum of money with a recipient who accepts or rejects the offer. Participants typically offer 40% of the sum, which suggests that they could be acting prosocially by providing a nearly equal split. Conversely, they could be acting strategically to ensure that the recipient does not reject the offer. The ultimatum game activates regions of the brain, like the dorsolateral prefrontal cortex (dlPFC), that are involved in strategic processing. Prosocial behavior is thought to be supported by Theory of Mind (ToM), the ability to empathize with and understand other people's mental states. No study had yet examined the pattern of activity in brain regions belonging to the ToM network while people make fair or unfair decisions. This week in Social Cognitive and Affective Neuroscience, Speer and Boksem used functional magnetic resonance imaging to distinguish between patterns of activity associated with prosocial and strategic motivations in the cognitive control and ToM networks.

How did they do it?

Thirty-one young adults played the ultimatum game (UG) and the dictator game (DG) while undergoing functional magnetic resonance imaging (fMRI). They had to split €20 and could offer between €0 and €14 to their opponent. Half of the trials consisted of the UG and the other half of the DG. Unlike in the UG, there is no strategic advantage to offering a fair split in the DG, as opponents cannot reject offers made by participants. To evaluate behavior, the authors calculated the difference between the amounts of money that participants offered in the two games. Participants were categorized as selfish players if there was a large difference in their offers between the two games, suggesting that they were acting strategically during the UG by offering more money to their opponent.
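The logic of this behavioral measure can be illustrated with a toy example (the player labels and offer amounts below are hypothetical, not the study's data): a prosocial player offers similar amounts whether or not the opponent can reject, while a selfish player is fair only when it pays.

```python
# Toy illustration of the UG-minus-DG offer difference used to
# categorize players; a large gap marks strategic (selfish) play.
offers = {
    "prosocial_player": {"UG": 8.0, "DG": 7.0},  # fair in both games
    "selfish_player":   {"UG": 8.0, "DG": 1.0},  # fair only when rejection is possible
}
for player, o in offers.items():
    diff = o["UG"] - o["DG"]
    print(player, diff)  # prosocial_player 1.0, selfish_player 7.0
```

Both hypothetical players look equally fair in the UG alone; only the contrast with the DG separates their motivations, which is why the authors analyzed the two games together.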

The authors examined patterns of activity in the ToM and cognitive control networks during the two games. First, they used Neurosynth (an online database of fMRI studies) to identify brain regions that are often active during ToM and cognitive control tasks; these included the temporoparietal junction (TPJ) and the medial prefrontal cortex (mPFC) in the ToM network, and the dlPFC and posterior cingulate cortex (PCC) in the cognitive control network. For each participant, they created a model (a support vector machine classifier) to distinguish between the two games based on the pattern of activity in these networks and in individual brain regions. The classifier was trained on brain activity from a subset of UG and DG trials and then tested on a different set of trials to predict whether a pattern of activity corresponded to the UG or the DG. They correlated classifier performance with behavior to determine how patterns of activity related to participants' motivations in the two games. Finally, to identify other brain regions that might be differentially activated by the two games, the authors applied the classifier across the whole brain, targeting a small area at a time, and then correlated classifier performance with behavior.
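The train-on-some-trials, test-on-held-out-trials logic can be sketched in a few lines. This is a simplified stand-in for the authors' pipeline, with simulated data and a nearest-class-mean rule in place of a support vector machine; all numbers and variable names are assumptions for illustration.

```python
import numpy as np

# Simulate trial-wise activity patterns for the two games (hypothetical data).
rng = np.random.default_rng(0)
n_trials, n_features = 40, 50
ug = rng.normal(0.5, 1.0, size=(n_trials, n_features))   # UG trial patterns
dg = rng.normal(-0.5, 1.0, size=(n_trials, n_features))  # DG trial patterns

X = np.vstack([ug, dg])
y = np.array([1] * n_trials + [0] * n_trials)  # 1 = UG, 0 = DG

# Hold out every fourth trial for testing; fit class means on the rest.
test = np.arange(len(y)) % 4 == 0
mean_ug = X[~test & (y == 1)].mean(axis=0)
mean_dg = X[~test & (y == 0)].mean(axis=0)

# Classify each held-out trial by its nearer class mean.
pred = (np.linalg.norm(X[test] - mean_ug, axis=1)
        < np.linalg.norm(X[test] - mean_dg, axis=1)).astype(int)
accuracy = (pred == y[test]).mean()
print(f"decoding accuracy: {accuracy:.2f}")  # well above chance (0.5)
```

The key interpretive move in the study is the same as here: higher held-out accuracy means the activity patterns in the two games are more distinct, which the authors then related to each participant's behavior.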

What did they find?

In general, people made higher offers to their opponents in the UG than in the DG. There were large individual differences in motivation, as prosocial participants made similar offers between the two games whereas selfish players offered comparatively less money to their opponent in the DG. Classification accuracy in the ToM and cognitive control networks was related to behavior. Distinct patterns of activity in these networks were found to underlie prosocial and strategic motivations, as the classifier was more accurate at distinguishing between the two games when participants were behaving strategically than when they were driven by prosocial motivations.


Patterns of activity in individual regions of the ToM and cognitive control networks also differed between prosocial and selfish players. For example, activity in the left TPJ differed more across the two games in selfish players than in prosocial players. Similarly, classification accuracy in the bilateral dlPFC and PCC was higher when the difference in offers was larger, suggesting that the pattern of activity was more distinct between the two games in selfish than in prosocial players. Finally, classifier performance in other regions, like the bilateral TPJ, mPFC, and the left inferior frontal gyrus (IFG), was also related to behavior. These results indicate that prosocial players exhibited similar patterns of activity in the two games because they did not differentially engage in strategic and prosocial reasoning. On the other hand, selfish players engaged regions in the ToM and cognitive control networks differently when they were motivated to behave strategically in the UG, even when their offers did not differ from those of prosocial individuals.

What's the impact?

This study is the first to demonstrate that distinct patterns of activity in the ToM and cognitive control networks underlie prosocial and strategic motivations. Importantly, these results provide a deeper insight into how people rely on both cognitive control processes and ToM processes like empathy to make fairness decisions. 

Speer and Boksem. Decoding fairness motivations from multivariate brain activity patterns. Social Cognitive and Affective Neuroscience (2020). Access the original scientific publication here.

Hypothetical Experiences Encoded by Fast, Regular Firing of Hippocampal Place Cells

Post by Amanda McFarlan

What's the science?

Whether for planning, imagination, or decision-making, the ability to construct a hypothetical scenario is an important cognitive process that is fundamental to the brain. Recent studies have identified place cells in the hippocampus (a brain region known to be important for memory and spatial navigation) as a potential neural substrate for thinking about hypotheticals, as place cell firing has been observed to encode hypothetical spatial paths. However, the mechanisms underlying this activity remain unclear. This week in Cell, Kay and colleagues investigated the role of hippocampal place cells in encoding hypothetical experiences.

How did they do it?

To study the activity of place cells, the authors trained rats to navigate a maze. By design, the maze was extremely simple: it had a single fork where rats had to choose between left and right. The rats were either placed in the center arm of the maze, where they had to move toward the fork (choice imminent group), or they were placed at the fork immediately (choice passed group). As the rats ran the maze, the authors recorded and analyzed the activity of place cells in the moments before the rats chose either the left or right arm. By doing so, they could determine whether place cells encoded the unchosen arm and, thus, a hypothetical future scenario. The authors examined place cell activity at three levels: single cells, cell pairs, and the population. At the population level, the authors used a decoding algorithm that summarizes the activity of all the cells (approximately dozens to hundreds) recorded in the experiment.

What did they find?

The authors initially found that pairs of place cells encoding either the left or right arm of the maze fired in an alternating pattern at approximately 8 Hz, suggesting that future scenarios (choosing the left or right arm) could be encoded extremely quickly yet also extremely consistently. The authors further found that place cells were also more likely to fire in an alternating pattern when rats were approaching the maze fork (choice imminent group) compared to when they were moving away from the fork (choice passed group). Next, the authors showed that place cells at the population level encoded left and right arms in alternation at 8 Hz, similarly to what was observed in pairs of place cells. In the second stage of their study, they found that place cell activity encoding hypotheticals occurs systematically at specific phases of an 8 Hz neural rhythm called hippocampal theta, indicating that the hypothetical-encoding activity originates from a specific internal brain process. Overall, these findings indicate that hypothetical future scenarios can be neurally encoded both quickly and regularly (at 8 Hz) and that the underlying neural activity can be observed not only at the population level, but even down to the single-cell level.
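The reported cycling is easy to picture with a toy simulation. The sketch below (hypothetical numbers, not the authors' analysis) ties the decoded maze arm to the phase of an 8 Hz theta rhythm, so the representation alternates between "left" and "right" twice per ~125 ms cycle, about 16 switches per second.

```python
import numpy as np

# Simulate a decoded arm identity locked to an 8 Hz theta rhythm.
fs = 1000            # samples per second
duration = 2.0       # seconds of simulated decoder output
t = np.arange(int(fs * duration)) / fs

theta = np.sin(2 * np.pi * 8 * t)              # 8 Hz hippocampal theta
decoded_arm = np.where(theta >= 0, "left", "right")

# Each theta cycle contains one left->right and one right->left switch,
# so ~16 switches/second are expected over the 2 s simulation (~31 total).
switches = int(np.sum(decoded_arm[1:] != decoded_arm[:-1]))
print("switches:", switches)
```

The study's point is that this alternation is not noise: it is phase-locked to theta, implying an internally generated schedule on which the hippocampus serially represents the two possible futures.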


What’s the impact?

This is the first study to show that neural firing can encode multiple hypothetical future scenarios both quickly and consistently (eight times per second). The authors also found that such firing could be seen at the single-cell, cell-pair, and population levels, and was influenced by both behavioral and anatomical factors. Together, these findings provide insight into the neural basis of how the brain generates hypothetical scenarios, an ability that is essential to complex cognition.


Kay et al. Constant Sub-second Cycling between Representations of Possible Futures in the Hippocampus. Cell (2020). Access the original scientific publication here.