
New 'mind reading' system makes synthesized speech more human

For years researchers have tried to convert brain signals directly to speech, cutting out the muscles we actually use to speak. A new approach out of UCSF puts the jaw back in jawing.

Eric Mack, Contributing Editor

An example array of intracranial electrodes of the type used to record brain activity in the current study. 

UCSF

A new technique could go a long way toward transforming brain activity into synthesized speech to truly restore the gift of gab to those who've lost the ability to talk.

Neuroscientists at UC San Francisco have created a brain-machine interface that interprets signals from the brain's speech center via a novel two-step process. Instead of trying to translate brain activity directly into sounds, the researchers first convert the neural signals into the vocal tract movements that would produce those sounds, then synthesize the speech digitally from those movements. 

The result is artificial speech that's closer to the real thing, and at a pace that begins to approach a normal rate of conversation.
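To make the idea concrete, here is a minimal, purely illustrative sketch of that two-stage chain in Python. The channel counts, feature sizes and random weight matrices are assumptions for illustration; the actual system learns these mappings from recorded brain activity rather than using the simple linear stand-ins below.

```python
# Minimal, illustrative sketch of the two-stage decoding chain:
# brain activity -> vocal tract movements -> speech features.
# Shapes and weights are placeholders, not the UCSF models.
import numpy as np

rng = np.random.default_rng(0)

N_CHANNELS = 256    # intracranial electrode channels (assumed for illustration)
N_KINEMATICS = 33   # vocal tract movement features: lips, tongue, jaw, larynx
N_ACOUSTICS = 32    # spectral features a vocoder could turn into audible speech

# Stand-ins for the two learned mappings (the real system learns these from data).
W_brain_to_tract = rng.normal(size=(N_CHANNELS, N_KINEMATICS))
W_tract_to_sound = rng.normal(size=(N_KINEMATICS, N_ACOUSTICS))

def decode_speech(neural_activity):
    """Chain the two stages: neural signals -> movements -> speech features."""
    movements = neural_activity @ W_brain_to_tract   # stage 1: articulation
    return movements @ W_tract_to_sound              # stage 2: acoustics

# One second of fake neural data at 200 samples per second.
neural = rng.normal(size=(200, N_CHANNELS))
print(decode_speech(neural).shape)  # (200, 32)
```

The design choice the sketch highlights is the intermediate step: decoding movements first, rather than jumping straight from brain activity to audio.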

"We showed that explicitly simulating the movements of the participants' vocal tracts -- that includes the lips, tongue, jaw, larynx -- using a computer simulation... this may produce the optimal decoding of speech from brain activity," Edward Chang, a professor of neurological surgery at UCSF, told reporters Tuesday.

Last year, MIT took a tangentially related approach with a headset that picks up on signals sent from the brain to the mouth and jaw.

The new system is being developed in Chang's lab, and the team's progress is outlined in a new paper published Wednesday in the journal Nature.

Researchers conducted the study with a handful of volunteers who already had temporary electrodes implanted in their brains in preparation for neurosurgery to treat epilepsy. They were asked to read several hundred sentences out loud while their brain activity was recorded. The data, along with audio recordings of the participants' speech, allowed the scientists to create a virtual vocal tract. This detailed computer simulation of the anatomy used to create speech could then be controlled by brain activity. The video below shows a few examples of the results.
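As a rough sketch of how such paired recordings could be used, the example below fits the two stages from aligned neural, movement and audio-derived features, then chains them on held-out data. The ridge-regression models, feature sizes and random arrays are assumptions for illustration only; they stand in for the study's actual models and recordings.

```python
# Illustrative training sketch: fit both stages from aligned recordings,
# then chain them on held-out data. Random arrays stand in for real data.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
T = 5000  # aligned time steps pooled across the spoken sentences (assumed)

neural = rng.normal(size=(T, 256))      # recorded brain activity
movements = rng.normal(size=(T, 33))    # vocal tract movement features (placeholder)
acoustics = rng.normal(size=(T, 32))    # speech features extracted from the audio

# Hold out 20% of the time steps to check the fitted mappings.
train_idx, test_idx = train_test_split(np.arange(T), test_size=0.2, random_state=0)

stage1 = Ridge().fit(neural[train_idx], movements[train_idx])     # brain -> movements
stage2 = Ridge().fit(movements[train_idx], acoustics[train_idx])  # movements -> speech

# At "speaking" time only brain activity is available, so the stages are chained.
predicted_speech = stage2.predict(stage1.predict(neural[test_idx]))
print(predicted_speech.shape)  # (1000, 32)
```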

"For the first time, this study demonstrates that we can generate entire spoken sentences based on an individual's brain activity," Chang said in a statement. "This is an exhilarating proof of principle that with technology that is already within reach, we should be able to build a device that is clinically viable in patients with speech loss."

Currently, many devices for people with severe speech disabilities require spelling out thoughts letter by letter, producing a maximum of 10 words per minute. But a system that can translate entire sentences could allow people to communicate much more rapidly, perhaps even at a speed approaching the 100-150 words per minute of natural speech.

"The authors' two-stage approach resulted in markedly less acoustic distortion," write biomedical engineers Chethan Pandarinath and Yahia H. Ali, who were not involved in the research, in a separate commentary also published in Nature. "However, many challenges remain ... The intelligibility of the reconstructed speech was still much lower than that of natural speech."

Josh Chartier, a co-author of the new study, maintains that the accuracy of their system improves upon existing technologies, but acknowledges there's still a way to go before it can perfectly mimic spoken language.

"We're quite good at synthesizing slower speech sounds like 'sh' and 'z' as well as maintaining the rhythms and intonations of speech and the speaker's gender and identity, but some of the more abrupt sounds like 'b's and 'p's get a bit fuzzy."

Another promising discovery is that the neural code for vocal movements isn't necessarily unique to every individual. This means that someone without natural speech may be able to control a speech prosthesis modeled on the voice of somebody else with intact speech.

"People who can't move their arms and legs have learned to control robotic limbs with their brains," Chartier said. "We are hopeful that one day people with speech disabilities will be able to learn to speak again using this brain-controlled artificial vocal tract."