Neural Predictions of Others’ Beliefs

Post by Elisa Guma

What's the science?

Humans are able to form detailed representations of others’ thoughts and beliefs that are distinct from their own. This capacity, often referred to as theory of mind, is critical to our social behaviour and our ability to interact with others. While a number of brain areas, including the temporoparietal junction, superior temporal sulcus, and dorsomedial prefrontal cortex, have been shown to support social reasoning, little is known about the mechanisms underlying theory of mind at the neuronal level. This week in Nature, Jamali and colleagues use single-neuron recordings from the human dorsomedial prefrontal cortex during behavioural tasks to understand how these cells support theory of mind.

How did they do it?

The authors used custom-made, multi-electrode arrays to record neuronal activity from 324 neurons in the dorsomedial prefrontal cortex of 15 participants as they performed an auditory version of a common theory of mind task: the false-belief task. Participants were patients undergoing surgery unrelated to study participation.

During the behavioural task, participants were presented with a series of unique narratives describing simple events, paired with questions about the events that tested their knowledge of another’s belief. For example, the narrative could be: “You and Tom see a jar on the table. After Tom leaves, you move the jar to the cupboard,” followed by the question “Where does Tom believe the jar is?”. In addition to this scenario, considered a false-belief trial (i.e. the other’s belief is different from reality), participants were presented with true-belief trials, in which the other’s belief is the same as reality (i.e. leaving the jar on the table while Tom is away). To distinguish self- from other-belief representations in the brain, participants were also given trials in which their own imagined belief had to be judged as true or false. In addition to the scenario described above, a number of other scenarios focused on another’s beliefs about objects (e.g. table), containers (e.g. cupboard), foods (e.g. vegetables), places (e.g. park), animals (e.g. cat), and appearance (e.g. the colour red).

Following the alignment of trial events with neural activity, the authors used linear models to quantify whether, and to what degree, the activity of each recorded neuron could predict the specific trial condition (e.g., true- vs. false-belief trials, or trials about objects vs. non-objects) during questioning. Neuronal data were randomly divided into two subsets: one subset, consisting of 80% of the trials, was used to train the model to predict the trial condition, and the other subset was then used to test the accuracy of the model’s predictions on held-out data.
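The train/test logic described above can be sketched in a few lines. The firing rates and trial labels below are simulated, and a simple nearest-centroid decoder stands in for the authors' linear models; this illustrates the 80/20 cross-validation idea, not their exact pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: firing rate of one neuron on 100 trials, labeled
# 0 (true-belief) or 1 (false-belief). A built-in rate difference
# makes the condition decodable.
labels = rng.integers(0, 2, size=100)
rates = rng.normal(loc=5.0 + 3.0 * labels, scale=1.0)

# 80/20 split: shuffle trial indices, train on the first 80%.
idx = rng.permutation(100)
train, test = idx[:80], idx[80:]

# "Train": learn the mean rate for each condition.
mu0 = rates[train][labels[train] == 0].mean()
mu1 = rates[train][labels[train] == 1].mean()

# "Test": classify each held-out trial by the nearer class mean.
pred = (np.abs(rates[test] - mu1) < np.abs(rates[test] - mu0)).astype(int)
accuracy = (pred == labels[test]).mean()
print(f"held-out decoding accuracy: {accuracy:.2f}")
```

Because the decoder never sees the held-out 20% during training, above-chance accuracy on those trials is evidence that the neuron's activity genuinely carries information about the trial condition.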

What did they find?

The authors found that many neurons (20%) in the dorsomedial prefrontal cortex responded selectively when considering another’s beliefs. Further, they found that 23% of neurons accurately predicted whether the other's beliefs were true or false. These neurons were distinct from those activated (27%) when participants had to determine whether their own imagined beliefs were true or false, confirming that a distinct class of neurons encodes and predicts beliefs other than our own. Finally, the authors observed differences in the neuronal populations activated based on the contents of the others’ beliefs (i.e. object, place, food, etc.). These data indicate that these neurons encoded highly detailed information about others’ beliefs, and suggest that accurately inferring the beliefs of others requires representing both whether those beliefs are true or false and the specific content of the beliefs being considered.


What's the impact?

By leveraging single-cell recordings in the human dorsomedial prefrontal cortex, the authors identified neurons that encode information about others’ beliefs, distinct from one’s own beliefs, across richly varying scenarios. They show that distinct cells respond depending on the content of others’ beliefs, and that these cells accurately predict whether those beliefs are true or false. These findings provide valuable insight into the cellular underpinnings of how the human brain represents others’ beliefs. Future work may investigate whether these neurons are affected in individuals with psychiatric disorders known to affect social cognition and theory of mind.


Jamali et al. Single-neuronal predictions of others’ beliefs in humans. Nature (2021). Access the original scientific publication here.

“Circuit Motifs” Underlying Short-Term Memory

Post by Lani Cupo

What's the science?

Neurons generally fire briefly in response to stimuli that activate them. However, the neurons that underlie short-term memory are arranged in “circuit motifs,” or interconnected groups of neurons that continue to activate one another. This pattern of activation allows neurons to maintain a signal after the initial stimulus has ended, forming the basis of short-term memory. Circuits can take many forms in terms of the way they are connected and the strength of their connections. It is still unclear what role these different forms of motifs play in short-term memory. This week in Nature Neuroscience, Daie and colleagues investigated circuit motifs of short-term memory by using lasers (photostimulation) to stimulate neurons in the anterolateral motor cortex, an area of the brain that stores short-term memories for upcoming planned movements.

How did they do it?

The authors used data from adult mice that expressed genes allowing the authors to activate neurons with light and record activation as fluorescence. They imaged neuron activation while the animals behaved freely using two-photon microscopy. The mice were trained to distinguish between two different auditory stimuli, responding either “right” or “left” for a reward. By imaging neuron activity while they performed this task, the authors could identify which specific neurons were selective for left and right movement directions. 


Next, in order to better understand circuit motifs, the authors directly stimulated entire groups of neurons at the same time. They then measured the activity of neurons located more than 30 micrometers away from the directly targeted neurons. By activating these groups of neurons within a network, the authors could test whether other nearby neurons would also respond to this activation, or, in other words, whether they were ‘coupled’ with this circuit. Using the activation patterns, they were able to calculate the connection strength between stimulated and unstimulated neurons in the same circuit, as well as the duration of activity in the circuit. Furthermore, they examined the activity of neurons selective for “left” and “right” responses in the behavioural task following stimulation.
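One simple way to operationalize "coupling" from data like these is to compare each non-targeted neuron's mean fluorescence after photostimulation to its baseline. The traces, amplitudes, and threshold below are simulated and invented for illustration; this is a toy proxy, not the authors' actual connection-strength analysis.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated dF/F traces for 5 non-targeted neurons over 200 frames,
# with photostimulation delivered at frame 100. "Coupled" neurons
# get an added, slowly decaying response after the stimulation.
n_neurons, n_frames, stim_frame = 5, 200, 100
traces = rng.normal(0.0, 0.1, size=(n_neurons, n_frames))
decay = np.exp(-np.arange(n_frames - stim_frame) / 30.0)
traces[:3, stim_frame:] += 0.8 * decay  # neurons 0-2 are "coupled"

# Coupling proxy: mean post-stimulation activity minus mean baseline.
baseline = traces[:, :stim_frame].mean(axis=1)
evoked = traces[:, stim_frame:].mean(axis=1)
coupling = evoked - baseline

# Arbitrary threshold for this toy example.
coupled = np.flatnonzero(coupling > 0.1)
print("putatively coupled neurons:", coupled)
```

The slow exponential decay in the simulated response mimics the persistent activity the authors observed: coupled neurons remain elevated well after the stimulation frame, which is what lets a simple post-minus-baseline contrast pick them out.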

What did they find?

The authors found that stimulating small groups of neurons altered activity in other nearby neurons, demonstrating that they were ‘coupled’ with that circuit. These ‘coupled’ neurons that were indirectly activated also showed persistent activity lasting well beyond the duration of the light stimulation. This finding provides evidence for circuits composed of strongly connected subnetworks that produce persistent activity, which may underlie short-term memory. The authors also found that neurons with similar directional selectivity (i.e. ‘right’ or ‘left’ in the behavioural task) were more likely to be coupled. When the authors stimulated neurons that were selective for the ‘right’ or ‘left’ direction in the behavioural task, they found that this stimulation biased behaviour in the task to a greater degree than would be expected by chance. However, the direction of movement did not necessarily correspond with the selectivity of the neurons (e.g., ‘right’ activation did not always result in a rightwards movement). The authors concluded that the activation of a small group of neurons reliably predicted neural activity and behaviour (i.e. movement direction).

What's the impact?

This study found that brief stimulation of groups of neurons resulted in long-lasting, persistent activity in a network of neurons. Further, the authors demonstrated that the stimulation indirectly activated nearby ‘coupled’ neurons, suggesting that these circuits are composed of subnetworks or modules. The persistent activity in these networks was directly related to short-term behavioural outcomes. These findings provide insight into the mechanisms of short-term memory at the circuit level. Future research is needed to further investigate the structure and activity of these modular networks and how they impact short-term memory and behaviour.


Daie et al. Targeted photostimulation uncovers circuit motifs supporting short-term memory. Nature Neuroscience (2021). Access the original scientific publication here.

Reinforcement Learning Models Capture Human Decision-Making Processes

Post by Shireen Parimoo

What's the science?

How do people flexibly plan their actions in service of novel goals? According to reinforcement learning (RL) models of human behaviour, actions are chosen to maximize reward in the long run. In standard RL algorithms, actions are guided either by an internal model of the environment used to predict outcomes (model-based) or by action values learned directly from experienced rewards, without much prior knowledge (model-free). These algorithms can require substantial computational resources and also run the risk of under- or over-generalizing to new tasks.

Two new algorithms have been proposed to model human cross-task generalization. One approach extracts similarities across tasks to inform future actions using universal value function approximators (UVFAs). Consider this example: you are both tired and hungry, and must choose between a burger shop (known for food), a coffee shop (known for coffee), and a diner (known for both). You know the burger shop is good when hungry and the coffee shop is good when tired, so you select an action using a UVFA: you look for a place similar to both of these when you are both hungry and tired (the diner). In other words, you use previously learned values to extrapolate and predict a new value. Another approach is to keep track of actions associated with commonly encountered tasks using a generalized policy improvement (GPI) algorithm, which predicts the outcome of an action from learned experience. Here is an example of selecting an action using GPI: you are hungry and looking for a place to eat. You have been frequenting the diner lately and have a ‘policy’ of going there when hungry. Now you are tired and would like to get coffee. You can envision the outcome of going to the diner: coffee is available, and people tend to drink coffee there. Therefore, you might apply this same policy of going to the diner to your decision to get coffee; in other words, you generalize the policy to a new task. This is the distinction between UVFA and GPI: UVFA uses previously learned values to approximate a solution, while GPI evaluates a previously learned policy (the relationship between an action and an end state, or solution) and applies this policy to a new situation.
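The GPI half of this distinction can be made concrete with a toy numerical sketch. Here each place's outcome is a feature vector (food, coffee), a task is a weight vector over those features, and GPI evaluates each known policy under the new task's weights before picking the best. The feature values are invented for illustration and this is a deliberately simplified one-step version of the algorithm.

```python
import numpy as np

# Toy outcome features for each place: (amount of food, amount of coffee).
# These numbers are invented purely for illustration.
features = {
    "burger shop": np.array([1.0, 0.0]),
    "coffee shop": np.array([0.0, 1.0]),
    "diner":       np.array([0.7, 0.7]),
}

def gpi_choice(task_weights):
    """Generalized policy improvement over one-step policies:
    evaluate each known policy (go to a place) under the new task's
    reward weights and pick the best one."""
    values = {place: feats @ task_weights for place, feats in features.items()}
    return max(values, key=values.get)

# Previously encountered tasks: hungry = (1, 0), tired = (0, 1).
print(gpi_choice(np.array([1.0, 0.0])))  # burger shop
print(gpi_choice(np.array([0.0, 1.0])))  # coffee shop
# Novel combined task: hungry AND tired = (1, 1).
print(gpi_choice(np.array([1.0, 1.0])))  # diner (0.7 + 0.7 beats 1.0)
```

Note that no new values are learned for the combined task: the diner wins simply because an old policy, re-evaluated under the new task's weights, scores highest, which is the essence of generalizing a policy rather than a value.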

This week in Nature Human Behaviour, Tomov and colleagues tested the generalizability of standard and new RL algorithms across tasks and compared their performance to human behavior.

How did they do it?

In a set of four online experiments, over 1100 participants played a resource-trading game set in a castle. Before each trial began, participants saw the “daily market price” for the resources – the amount of money they could expect to either receive or pay for each resource (wood, stone, and iron). For example, they might see “Wood: $1, Stone: $2, Iron: $0”, indicating that they would receive $1 for each wood and pay $2 for each stone they had at the end of the trial. Each trial consisted of a two-step decision-making process: participants a) chose between three doors to enter one of three rooms, and then b) chose between another three doors within that room to enter a final room containing resources. Importantly, each final door always led to the same amount of resources in the corresponding final room (e.g., door 2 in room 3 always contained 100 wood, 40 stone, and 0 iron across all trials). The amount of money received or paid, however, changed from trial to trial because the ‘daily market value’ changed each trial. In our example, participants would receive $1 × 100 for wood ($100), pay $2 × 40 for stone (−$80), and receive $0 × 0 for iron ($0), for a total of $20.
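The payoff arithmetic on each trial is just the daily market prices multiplied by the resources behind the chosen door and summed. A minimal sketch reproducing the example above (the dictionaries are stand-ins for the trial's prices and door contents):

```python
# Daily market prices: dollars received (or paid, if negative) per unit.
prices = {"wood": 1, "stone": -2, "iron": 0}

# Resources behind the chosen final door (fixed across trials).
door_contents = {"wood": 100, "stone": 40, "iron": 0}

# Trial payoff: price times quantity, summed over resources.
payoff = sum(prices[r] * door_contents[r] for r in prices)
print(f"trial payoff: ${payoff}")  # $1*100 - $2*40 + $0*0 = $20
```

Since the door contents never change but the prices do, the same door can yield a profit on one trial and a loss on the next, which is exactly what forces participants to generalize across tasks rather than memorize a single best door.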

The researchers carefully selected a few sets of daily market prices for the resources for each experiment in order to manipulate the computational demands required and to vary the degree of difficulty in successfully mapping actions to outcomes across the four experiments. In the training phase of each of the four experiments, participants completed 100 trials, each of which was randomly assigned one of the pre-selected sets of daily market prices. For example, the profit from finding wood might double earnings on one trial but cost participants money on the next trial. Thus, participants had to learn which final door would ensure maximum profit. 

The goal of the experiments was to determine which RL model the participants likely had used (model-free, model-based, UVFA, or GPI). The experiments were designed in such a way that the door participants chose on a test trial, completed after the 100 training trials, would be likely to reflect the underlying RL algorithm that best modelled how they chose actions to maximize reward. For example, the third door in the second room in one experiment would in fact result in the highest profit in the final test trial, but it would only be selected by a model-based learner who had successfully learned the entire structure of the environment. 

What did they find?

In the first experiment, participants were most likely to select actions leading to the final doors predicted by the model-based and GPI algorithms. In the more difficult experiments, however, participants were far more likely to choose the final door predicted by GPI. Interestingly, the final door chosen by the model-based and UVFA algorithms would have been the most rewarding, yet participants did not choose those actions more frequently than would be expected by chance. In comparing the different algorithms, the authors found that participants learned to select the final door predicted by GPI faster than the one predicted by the model-based algorithm, which is consistent with the fact that model-based algorithms tend to require more computational resources. Finally, because GPI makes predictions based on learned experience, the authors compared participants’ choice history during the training phase to their actions on the test trial. Learned experience did indeed drive choice at test: participants’ tendency to choose the same door during training predicted the probability that they would select that door in the test trial. This indicates that participants kept track of the different situations they encountered during the training phase, along with the associated action–state mappings, which informed their behavior during the test.


What's the impact?

People use their knowledge of frequently encountered experiences to make predictions about future outcomes and inform their decisions. One of the interesting outcomes of this study is that people do not necessarily make the most rewarding decisions, but rather they tend to map previously used policies onto new scenarios. This finding provides exciting new insight into how reinforcement learning captures human decision-making processes in complex and changing environments.

Tomov et al. Multi-task reinforcement learning in humans. Nature Human Behaviour (2021). Access the original scientific publication here.