Using Large Language Models to Map Grammar in the Brain
Post by Anastasia Sares
The takeaway
This study shows that grammatical features are encoded by individual neurons in the brain, distinct from other characteristics of the speech signal. Grammar-sensitive neurons are distributed throughout the left hemisphere in a way that is hard to detect when averaging regional activity.
What's the science?
Since the time of Broca, scientists have developed increasingly detailed maps of the language-processing areas of the human brain. They have used a variety of brain imaging techniques, including functional magnetic resonance imaging (fMRI), electroencephalography (EEG), and others. All of these techniques have been helpful, but they still lack the precise detail of single-neuron recording, which is done routinely in animals. The only time that electrodes are implanted in human brains is when patients are already undergoing an operation, such as for epilepsy. During those times when the brain is exposed, researchers can gather precious single-neuron data.
One challenge in studying human language processing is the complexity of language itself. In everyday conversation, we are free to put together words in a myriad of forms and sentence structures. Most of the time, studies of language involve presenting pre-designed words, phrases, or sentences that have a feature we are interested in (for example, sentences with different relationships between the subject and the object). However, this eliminates the natural spontaneity of language production, and so we can’t be sure whether the brain activity we observe reflects what our brains do daily.
Fortunately, the technology behind large language models (some of which are now commonly referred to as “AI,” though not all of them are on the same scale) has given us the ability to represent natural language production in a way that computers can digest, and we can identify interesting features without having to design awkward stimuli. This week in Nature, Cai and colleagues combined natural language processing and recordings from electrodes implanted in the brains of epilepsy patients to map human language production, including grammatical features.
How did they do it?
The authors had access to eight people who were undergoing surgery for epilepsy. At the time of their operations, the patients consented to have an array of electrodes temporarily implanted in their brains so that the researchers could gather data about the firing of individual neurons. Depending on the patient, these electrode arrays were located in different spots in the brain, so the researchers had a variety of brain regions represented.
After implantation, the researchers recorded brain activity through the electrodes while asking the participants questions and allowing them to respond freely. They recorded these conversations and applied natural language processing algorithms to classify each sample of speech. In addition to representing the individual words, this approach extracts both grammatical relationships and the hierarchical structure of phrases (see image below). By labeling these features, the authors could then look at which neurons were selectively responding to parts of speech, hierarchical features, or specific conversation contexts.
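To make the labeling step concrete, here is a minimal sketch of how a part of speech and a nesting depth could be read off a parsed sentence. This is an illustration only, not the authors' pipeline: the post does not say which tools they used, the bracketed parse below is written by hand rather than produced by a parser, and the function name word_depths and the way depth is counted (the number of phrases enclosing a word, not counting its part-of-speech tag) are assumptions made for this example.

```python
import re

def word_depths(parse):
    """Return (word, part_of_speech, depth) for each word in a bracketed parse.

    Depth here is the number of phrase-level constituents (S, NP, VP, ...)
    that enclose the word. This is one possible convention, assumed for
    illustration.
    """
    # Split into "(", ")", and everything else (labels and words).
    tokens = re.findall(r"\(|\)|[^\s()]+", parse)
    stack = []            # labels of the constituents currently open
    results = []
    expect_label = False  # True right after "(", when the next token is a label
    for tok in tokens:
        if tok == "(":
            expect_label = True
        elif tok == ")":
            stack.pop()
        elif expect_label:
            stack.append(tok)
            expect_label = False
        else:
            # A word: the innermost open label is its part-of-speech tag,
            # and every label above that one is an enclosing phrase.
            results.append((tok, stack[-1], len(stack) - 1))
    return results

# Hand-written parse of "the dog chased a cat" (hypothetical example sentence).
EXAMPLE_PARSE = "(S (NP (DT the) (NN dog)) (VP (VBD chased) (NP (DT a) (NN cat))))"

for word, pos, depth in word_depths(EXAMPLE_PARSE):
    print(word, pos, depth)
```

In this toy sentence, "a" and "cat" sit one level deeper than "the", "dog", and "chased", because their noun phrase is nested inside the verb phrase. Labels of this kind, attached to every spoken word, are what can then be compared against each neuron's firing.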
What did they find?
About 9% of neurons responded specifically to a word’s part of speech (noun, adjective, verb, etc.). About 16% of neurons responded to constituency depth (how deep the word was in the nested hierarchy of the sentence), and so on. These neurons had very little overlap with each other, or with neurons that tracked other features of the speech (like the meaning of the word itself, or whether the pitch of the spoken word was high or low). Interestingly, the neurons that responded to different grammatical features were scattered throughout various brain regions, which may explain why it has been hard to pinpoint grammatical processing in the brain by summing up large amounts of activity. In addition, these grammar-selective neurons appeared in both hemispheres of the brain, but the most strongly predictive ones were in the left hemisphere, which lines up with decades of research showing that language is processed predominantly on the left side of the brain (for most people).
What's the impact?
The results of this study inform debates about how the brain processes language. First, they support the idea that certain high-level features of language can be coded by single neurons, rather than being only a result of multiple neurons firing in a pattern. Second, they suggest that the popular but simplistic “modular” approach, where “region X is responsible for function Y,” doesn’t really work here: grammar-sensitive neurons are scattered throughout the left hemisphere.
