Post by Deborah Joye
What's the science?
The ability to make decisions is an important part of survival. But how do individual neurons or the brain as a whole calculate what the best choice is? Researchers have described how choosing between two things might occur, but whether that process is the same for choosing between more than two options is not known. One possible decision-making model involves a “race” concept where evidence in favor of each option is accumulated over time until a decision threshold is reached, triggering a choice. But this model assumes that each option accumulates evidence independently and that the decision criterion remains the same across time and options. The nervous system is also much more complex than the “race” model suggests. For example, possible options can influence the perceived value of other choices, a process called value normalization. The nervous system may also involve a “global urgency signal” which decreases the amount of evidence needed to trigger a decision as time elapses. This week in Nature Neuroscience, Tajima and colleagues use tools from dynamic programming to present a model for optimal decision-making between multiple choices. The authors describe an extended race model that considers how alternative choices interact over time and how a global urgency signal increases the likelihood of a decision as time elapses.
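To make the basic “race” idea concrete, here is a minimal sketch in Python. It is an illustration only, not the authors' code: the function name `race_trial`, the drift values, the noise level and the threshold are all hypothetical choices. Each option has its own accumulator that sums noisy evidence independently, and the first one to reach a fixed threshold wins.

```python
import random

def race_trial(drifts, threshold=1.0, noise_sd=1.0, dt=0.01,
               max_steps=10_000, rng=None):
    """Simulate one trial of an independent race between accumulators.

    drifts: mean evidence per unit time for each option (hypothetical values).
    Returns (index of the chosen option or None, decision time).
    """
    rng = rng or random.Random()
    evidence = [0.0] * len(drifts)
    for step in range(1, max_steps + 1):
        for i, drift in enumerate(drifts):
            # Each accumulator sums its own noisy momentary evidence,
            # with no interaction between options.
            evidence[i] += drift * dt + rng.gauss(0.0, noise_sd) * dt ** 0.5
        leader = max(range(len(drifts)), key=lambda i: evidence[i])
        # The threshold is fixed: same for every option, at every time.
        if evidence[leader] >= threshold:
            return leader, step * dt
    return None, max_steps * dt  # no decision within the time limit

if __name__ == "__main__":
    choice, decision_time = race_trial([1.0, 0.8, 0.5], rng=random.Random(0))
    print(choice, decision_time)
```

The two assumptions criticized in the text are visible here: the accumulators never influence one another, and the threshold never changes.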
How did they do it?
The authors built a mathematical model that describes how a decision-maker accumulates evidence for each choice option over time, how choice options interact with one another, and when the decision-maker should stop accumulating evidence and make a choice. The evidence accumulation part of the model involves summing together noisy moment-to-moment evidence for each choice over time. To identify the optimal strategy to stop accumulating evidence, the authors designed a “value function”, a mathematical function that calculates the expected reward for being in a certain state at a given time. Their value function considers maximum expected reward minus the cost of accumulating evidence over time (for example, a metabolic cost associated with continuing to gather evidence) and compares the value of deciding immediately versus accumulating more evidence and deciding later. The solutions to these equations, and the points at which they intersect, define a multi-dimensional space with decision boundaries; once a boundary is crossed, the corresponding option is chosen. Importantly, the authors’ value function considers the passage of time and models how decision boundaries change as time elapses. Overall, this leads to a fairly complex optimal decision-making strategy. To propose a simpler, close-to-optimal strategy, the authors use the insights gained from the dynamic programming solution to design a model neural circuit that implements their optimal decision-making policy, and describe how their modelled data matches collected behavioral data.
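The comparison between deciding now and waiting can be written as a generic optimal-stopping equation. The form below is a sketch based only on the description in this post; the symbols are assumptions made for illustration, and the exact formulation in the paper may differ (for example, it may include additional terms related to reward rate). Here x is the accumulated evidence, t is elapsed time, r̂_i is the expected reward for choosing option i, c is the cost per unit time of accumulating evidence, and δt is a small time step.

```latex
V(\mathbf{x}, t) = \max\Big\{
    \underbrace{\max_i \hat{r}_i(\mathbf{x}, t)}_{\text{decide now}},\;
    \underbrace{\mathbb{E}\big[ V(\mathbf{x}(t+\delta t),\, t+\delta t) \,\big|\, \mathbf{x}, t \big] - c\,\delta t}_{\text{accumulate more evidence}}
\Big\}
```

The decision boundaries are the points where the two terms inside the outer maximum are equal: on one side it pays to keep accumulating, on the other it pays to choose.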
What did they find?
When the authors simulated the basic “race” model versus “race” models that considered how options affect one another over time (normalization), how the decision boundaries change over time (urgency signal) or both, they found that the best model requires both normalization and urgency. Interestingly, they found that the inclusion of normalization alone improved reward rate more than the inclusion of urgency alone, demonstrating the importance of considering how the values of different choice options interact over time. The authors also found that data output from their model closely resembled previous physiological and behavioral findings. For example, when their model was optimized for maximum reward rate, the authors found that average activity across all accumulating neurons increased over time, replicating the physiological urgency signal that increases the likelihood of a decision over time. The authors also found that increasing the number of options in their model resulted in a decrease in average neural activity and increased reaction time. This replicates physiological and behavioral data demonstrating that more options mean the decision-maker (whether it’s a person or a neuron) accumulates evidence more slowly and takes more time to decide. Their optimal reward model also replicates seemingly “irrational” behaviors in humans and animals. One example is judging the value of the two top choices differently depending on the value of a third option, even if the third option is not valued enough to ever be chosen (violating the idea of independence of irrelevant alternatives). Another is that adding extra options can increase the probability of selecting an existing option (violating the regularity principle).
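The two extensions compared above can be sketched by adding two switches to a simple race simulation. This is a simplified illustration under stated assumptions, not the authors' circuit: normalization is modelled here as subtracting the mean across accumulators at each step, and urgency as a decision bound that falls linearly with time. The function name `extended_race_trial` and all parameter values are hypothetical.

```python
import random

def extended_race_trial(drifts, threshold=1.0, urgency_rate=0.0,
                        normalize=False, noise_sd=1.0, dt=0.01,
                        max_steps=10_000, rng=None):
    """One trial of a race with optional normalization and urgency.

    normalize: if True, subtract the mean across accumulators each step
               (an assumed, simplified form of normalization).
    urgency_rate: how fast the decision bound falls per unit time
                  (an assumed, linear form of urgency).
    Returns (index of the chosen option or None, decision time).
    """
    rng = rng or random.Random()
    n = len(drifts)
    evidence = [0.0] * n
    for step in range(1, max_steps + 1):
        t = step * dt
        for i in range(n):
            evidence[i] += drifts[i] * dt + rng.gauss(0.0, noise_sd) * dt ** 0.5
        if normalize:
            # Options now interact: evidence for one option lowers the others.
            mean = sum(evidence) / n
            evidence = [e - mean for e in evidence]
        # Urgency: less evidence is needed to decide as time elapses.
        bound = max(threshold - urgency_rate * t, 0.0)
        leader = max(range(n), key=lambda i: evidence[i])
        if evidence[leader] >= bound:
            return leader, t
    return None, max_steps * dt  # no decision within the time limit

if __name__ == "__main__":
    for norm, urg in [(False, 0.0), (True, 0.0), (False, 0.5), (True, 0.5)]:
        result = extended_race_trial([1.0, 0.8, 0.5], normalize=norm,
                                     urgency_rate=urg, rng=random.Random(0))
        print(norm, urg, result)
```

Comparing reward rates across the four variants, as the authors did, would require running many trials and assigning rewards and time costs, which this sketch leaves out.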
What's the impact?
This study provides a cohesive model for optimal decisions between more than two options. Importantly, the model replicates data from previous behavioral studies as well as seemingly “irrational” choice behaviors that other decision models do not explain. These findings provide a new perspective on decision-making that considers changes over time and brings together contrasting models. This model also produces interesting predictions that can be tested behaviorally and physiologically in future experiments.
Tajima et al., Optimal policy for multi-alternative decisions, Nature Neuroscience (2019).