Virtual vocal tract creates speech from brain signals, a potential aid for ALS and stroke patients

By SHARON BEGLEY @sxbegle

APRIL 24, 2019

“Speaking one’s mind” is getting literal: A device that detects electrical signals in the brain’s speech-producing regions created synthetic speech good enough for listeners to mostly understand complex sentences, University of California, San Francisco, scientists reported on Wednesday.

An array of intracranial electrodes of the type used to record brain activity in the UCSF study.

Listeners missed about 30 percent of the words in the synthetic speech, such as hearing “rabbit” when the computer said “rodent,” and misunderstood some sentences with uncommon words (“At twilight on the twelfth day we’ll have Chablis”) as comically as if they’d arrived via tin can and string. The synthetic speech was nevertheless clear enough to raise hopes that a brain-computer interface might one day give voice to people who have none due to stroke, ALS, or other brain damage.

That possibility comes with lots of caveats, however, starting with the need to open up the brain to place electrodes on it. “That’s a heck of a constraint,” said neuroscientist Marcel Just of Carnegie Mellon University, who is developing noninvasive systems to try to detect thoughts.

Still, decoding the words someone wants to say would be a life-changing advance beyond current assistive devices that turn eye or facial muscle movements into letter-by-letter spelling, like the one cosmologist Stephen Hawking used for three decades before his death last year. That approach has a high error rate and is maddeningly slow, typically 10 words per minute, compared to 120 for natural speech.

Speech is the latest frontier in brain-computer interfaces, whose first generation decoded brain signals from the motor cortex of patients with paralysis into electronic instructions to move a robotic arm or computer cursor. Previous studies have turned brain signals into speech, but they mostly “decoded single words, and those were mostly monosyllabic,” said biomedical engineer Chethan Pandarinath of Emory University and Georgia Institute of Technology, an expert on brain-machine interfaces. The UCSF scientists, in contrast, were able “to reconstruct complete sentences, [with] results somewhat intelligible to human listeners,” he said. “Making the leap from single syllables to sentences is technically quite challenging and is one of the things that makes it so impressive.”

UCSF neurosurgeon Dr. Edward Chang, who led the research, told reporters that producing “entire spoken sentences based on an individual’s brain activity” means that “we should be able to build a device that is clinically viable in patients with speech loss.”

The scientists first placed a 16-by-16 grid of electrodes, known as an electrocorticography (ECoG) array, on the brain surfaces of five participants who were undergoing unrelated epilepsy surgery and had agreed to participate in the experiment. The five then read aloud several hundred sentences, while the ECoG array detected brain signals corresponding to movement of their lips, jaw, tongue, and other parts of the vocal tract, said UCSF graduate student Josh Chartier, a co-first-author of the study. Those signals were sent to a computer programmed to simulate the vocal tract, where algorithms translated them into movements of a virtual tongue, jaw, and lips (more than 100 muscles produce human speech) and generated spoken words from those movements.
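In outline, the decoding is a two-stage pipeline: one model maps the recorded brain activity to estimated vocal tract movements, and a second maps those movements to the acoustic features a synthesizer turns into sound. The minimal sketch below illustrates only that structure; the simple linear models, toy data, and array sizes are invented stand-ins for explanation, not the UCSF team's actual decoders.

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Toy stand-in data: 1,000 time steps of ECoG from a 16x16 grid (256 channels),
# paired with vocal tract positions and acoustic features. Real recordings would
# replace all three arrays; the feature counts here are invented.
n_steps, n_channels, n_articulators, n_acoustic = 1000, 256, 33, 32
ecog = rng.standard_normal((n_steps, n_channels))
vocal_tract = rng.standard_normal((n_steps, n_articulators))
acoustics = rng.standard_normal((n_steps, n_acoustic))

# Stage 1: decode vocal tract movements (lips, jaw, tongue, ...) from brain activity.
stage1 = Ridge(alpha=1.0).fit(ecog, vocal_tract)

# Stage 2: map decoded movements to acoustic features that a vocoder could
# render as an audio waveform.
stage2 = Ridge(alpha=1.0).fit(vocal_tract, acoustics)

def synthesize_features(new_ecog):
    """ECoG -> estimated articulation -> acoustic features, run end to end."""
    movements = stage1.predict(new_ecog)
    return stage2.predict(movements)

print(synthesize_features(ecog[:10]).shape)  # (10, 32): ten decoded time steps

The point of the intermediate step is the one Chartier describes: the signals being recorded reflect movements of the vocal tract rather than sounds, so the movements are decoded first and only then turned into audio.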

The virtual vocal tract is no BBC announcer. Hundreds of volunteers listening to 100 sentences of synthetic speech understood no more than 70 percent of the words, with drawn-out sounds such as shhh easier to identify than explosive ones such as b and p.

Two examples of a research participant reading a sentence, followed by the synthesized version of the sentence generated from their brain activity. Credit: Chang lab, UCSF Department of Neurosurgery

The intelligibility test was stacked in the system’s favor, however: the scientists gave listeners a list of 25 or 50 words that the spoken sentences could include. That clearly helped: the error rate was 31 percent when listeners chose from a 25-word list and 53 percent with a 50-word list. With no list, analogous to a real-world setting, “it would definitely be harder,” said co-first-author Gopala Anumanchipalli.

Still, the errors weren’t catastrophic. Listeners mistook “Those thieves stole thirty jewels” for “Thirty thieves stole thirty jewels,” for instance, and “Mum strongly dislikes appetizers” for “Mom often dislikes appetizers.”
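Error rates like the 31 and 53 percent figures are conventionally computed by lining up each listener’s transcription against the sentence that was actually synthesized and counting the word-level insertions, deletions, and substitutions needed to make them match. The short routine below shows that standard word-error-rate calculation applied to the “thirty thieves” mix-up above; the study’s own scoring details may differ.

def word_error_rate(reference, transcription):
    """Word-level edit distance (insertions, deletions, substitutions)
    divided by the number of words in the reference sentence."""
    ref, hyp = reference.lower().split(), transcription.lower().split()
    # d[i][j] = edits needed to turn the first j transcribed words
    # into the first i reference words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # a reference word was dropped
                          d[i][j - 1] + 1,         # an extra word was heard
                          d[i - 1][j - 1] + cost)  # words match or were swapped
    return d[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("Those thieves stole thirty jewels",
                      "Thirty thieves stole thirty jewels"))  # 0.2: one word in five misheard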

Even with these errors, the study “represents an important step toward a speech neuroprosthesis,” said electrical engineer Nima Mesgarani of Columbia University, who is also developing such a device.

“This is a very strong signal-processing and engineering study,” said David Poeppel of New York University, an expert on speech perception who reviewed the paper for Nature. “Conceptually, the paper is not so groundbreaking,” since other labs have decoded speech from brain signals, “but it’s solid electrophysiology and engineering.”

With a denser ECoG array and better ECoG-to-vocal-tract translation algorithms, Chang said, the system can likely be improved. But whether it could help someone who has never spoken, such as a person with cerebral palsy, is unclear. If someone doesn’t produce any brain signals encoding vocal tract movements, it would be impossible to train the virtual vocal tract to produce speech from ECoG signals.

A more realistic hope is to restore speech to people who once had it. But since the UCSF scientists trained their system on the brain signals of people with intact speech, it remains to be seen whether people who lost that ability to stroke, ALS, or other brain injury retain enough of the brain’s speech-making machinery, and whether ECoG can pick up its signals clearly enough for the virtual vocal tract to decode them.

Such patients can likely still imagine words, and even hear them in their “mind’s ear” (just as people can see imagined sights, like pink elephants, in their mind’s eye). Mesgarani and his colleagues therefore eavesdrop on signals not from the motor cortex, as the UCSF team did, but from the sensory cortex, which hears imagined speech.

“What approach will ultimately prove better [in people who cannot speak] remains to be seen,” Mesgarani said. “But it is likely that a hybrid of the two may be the best.”

