A new milestone in the field of brain-computer interface: mental speech, machine interpretation

https://mp.weixin.qq.com/s/fyXVvmpl_12sS-khxuYcPQ

By 超神经

Scenario description: A neural network decodes the neural signals from the relevant brain regions while a person speaks, and a recurrent neural network then synthesizes those signals into speech, which can help patients with speech disorders communicate.

Keywords: recurrent neural network, decoder, brain-computer interface, speech synthesis

"Mind reading" may really be realized.

Speaking is an ordinary thing for most people. Yet many people around the world lose the ability to speak, often irreversibly, to conditions such as stroke, traumatic brain injury, and neurodegenerative diseases such as Parkinson's disease, multiple sclerosis and amyotrophic lateral sclerosis (ALS, or Lou Gehrig's disease).

Scientists have long worked to restore lost human functions and repair the nervous system, and the brain-computer interface (BCI) is a key area of that effort.

A brain-computer interface is a direct connection created between a human or animal brain and an external device, allowing information to be exchanged between the brain and the device.

The "brain" in brain-computer interface refers to the brain or nervous system of an organic life form, not just the brain itself

But the brain-computer interface has long seemed a distant concept. Today, the paper "Speech synthesis from neural decoding of spoken sentences", published in the top academic journal Nature, shows that research in this field has taken a major stride forward.

The plight of people with speech disorders

In fact, research on brain-computer interfaces has been going on for more than 40 years, but so far the most successful and most widely adopted clinical applications are sensory prostheses such as cochlear implants.

Even today, people with severe speech impairments can often express their thoughts only letter by letter through assistive devices.

These assistive devices track very subtle eye or facial muscle movements and spell out words and sentences based on the patient's gestures.

The physicist Stephen Hawking once had such a device installed on his wheelchair.

Hawking relied on a speech synthesizer to "speak"; over the years he used several different assistive communication systems

At that time, Hawking issued commands through muscle movements detected by an infrared sensor, confirming the letters scanned by the computer cursor to compose the text he wanted; a text-to-speech device then "spoke" the words aloud. It is thanks to such technology that we can read his book "A Brief History of Time".

However, generating text or synthesized speech with such a device is not only laborious but also error-prone, and it is very slow, usually allowing at most 10 words per minute. Hawking was among the faster users, yet he could spell only 15 to 20 words per minute. Natural speech reaches 100 to 150 words per minute.

In addition, this approach is severely limited by the operator's own remaining motor ability.

To overcome these problems, researchers in the brain-computer interface field have been studying how to decode the corresponding electrical signals of the cerebral cortex directly into speech.

Neural network interprets brain signals to synthesize speech

Today, this problem has ushered in a breakthrough.

Edward Chang, a professor of neurosurgery at the University of California, San Francisco, and his colleagues report in the paper "Speech synthesis from neural decoding of spoken sentences" a brain-computer interface that decodes the neural signals generated while people speak and synthesizes them into speech. The system can generate up to 150 words per minute, close to normal human speaking speed.
Gopala Anumanchipalli, first author of the paper, holds an example array of intracranial electrodes of the kind used to record brain activity in the current study

The researchers recruited five epilepsy patients already undergoing treatment, asked them to read hundreds of sentences aloud, and simultaneously recorded high-density electrocorticography (ECoG) signals, tracking neural activity in the brain's speech production center, the ventral sensorimotor cortex.

Using recurrent neural networks (RNNs), the researchers decoded the collected neural signals in two steps.

In the first step, they converted the neural signals into signals characterizing the movements of the articulators: the brain signals related to movements of the jaw, larynx, lips and tongue.

In the second step, they converted these decoded articulator movements into spoken words and sentences.

Illustration of the steps by which the brain-computer interface synthesizes speech

In the decoding process, the researchers first decode the continuous electrocorticography signals recorded from three brain regions while the patient speaks; these signals are captured by invasively implanted electrodes.

Decoding yields 33 articulator movement features, which are then decoded into 32 speech parameters (including pitch, voicing and others), and finally a speech waveform is synthesized from these parameters.
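The article describes this two-stage pipeline only in prose. Purely as a minimal sketch of what such a two-stage recurrent decoder might look like, and not the authors' actual model, the snippet below maps simulated ECoG feature sequences to the 33 articulator movement features and then to the 32 speech parameters. The ECoG feature count (256), the hidden size (128) and the choice of bidirectional LSTMs are illustrative assumptions; only the 33 and 32 come from the description above.

```python
# A minimal sketch (not the authors' code) of a two-stage recurrent decoder.
# Feature counts 33 and 32 come from the article; the ECoG feature count,
# hidden size and bidirectional LSTMs are assumptions of this sketch.
import torch
import torch.nn as nn

class StageOneDecoder(nn.Module):
    """Neural signals -> articulator movement features."""
    def __init__(self, n_ecog=256, n_kinematic=33, hidden=128):
        super().__init__()
        self.rnn = nn.LSTM(n_ecog, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_kinematic)

    def forward(self, ecog):                  # ecog: (batch, time, n_ecog)
        h, _ = self.rnn(ecog)
        return self.out(h)                    # (batch, time, 33)

class StageTwoDecoder(nn.Module):
    """Articulator movement features -> acoustic speech parameters."""
    def __init__(self, n_kinematic=33, n_acoustic=32, hidden=128):
        super().__init__()
        self.rnn = nn.LSTM(n_kinematic, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_acoustic)

    def forward(self, kinematics):            # (batch, time, 33)
        h, _ = self.rnn(kinematics)
        return self.out(h)                    # (batch, time, 32)

# Chained inference on dummy data: neural signals -> articulation -> acoustics.
# A separate vocoder (not shown) would turn the 32 parameters into a waveform.
ecog = torch.randn(1, 200, 256)               # 200 time steps of simulated ECoG features
acoustics = StageTwoDecoder()(StageOneDecoder()(ecog))
print(acoustics.shape)                        # torch.Size([1, 200, 32])
```

In a real system each stage would be trained on recorded data, and the acoustic parameters would be handed to a vocoder to produce audible speech, as the article describes.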

To assess how closely the synthesized speech matched the real speech, the researchers compared the acoustic features of the original and synthesized audio. They found that the speech decoded by the neural network reproduced the individual phonemes of the patient's original sentences quite faithfully, along with the natural transitions and pauses between phonemes.

Comparison of the original speech waveform (top) and the synthesized speech waveform (bottom)
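The paper has its own acoustic analysis; purely as a hedged illustration of the general idea of comparing original and synthesized audio, the snippet below correlates the log-power spectrograms of two signals using SciPy. The sampling rate, FFT window length and the correlation metric are assumptions of this sketch, not the paper's method.

```python
# A rough illustration (not the paper's metric) of comparing the spectrogram
# of an original recording with that of a synthesized one, assuming both are
# mono signals at the same sampling rate.
import numpy as np
from scipy.signal import spectrogram

def spectrogram_correlation(original, synthesized, fs=16000):
    """Correlation between two log-power spectrograms."""
    n = min(len(original), len(synthesized))          # align lengths crudely
    _, _, s_orig = spectrogram(original[:n], fs=fs, nperseg=512)
    _, _, s_syn = spectrogram(synthesized[:n], fs=fs, nperseg=512)
    log_orig = np.log(s_orig + 1e-10).ravel()
    log_syn = np.log(s_syn + 1e-10).ravel()
    return np.corrcoef(log_orig, log_syn)[0, 1]       # 1.0 means identical structure

# Dummy signals: a pure tone and a slightly noisy copy of it.
t = np.linspace(0, 1, 16000, endpoint=False)
clean = np.sin(2 * np.pi * 220 * t)
noisy = clean + 0.05 * np.random.randn(t.size)
print(round(spectrogram_correlation(clean, noisy), 3))
```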

The researchers then ran a crowdsourced test in which online listeners were asked to identify the speech synthesized by the decoder. In the end, listeners correctly transcribed close to 70% of the synthesized content.
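The crowdsourced evaluation in the paper is more elaborate than this, but as a toy illustration of how a listener's transcription could be scored against the intended sentence, the snippet below computes a word-level error rate with a standard edit distance. The example sentences are made up.

```python
# A toy illustration (not the paper's protocol) of scoring a listener's
# transcription against the intended sentence via word-level edit distance.
def word_error_rate(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

intended = "the cat sat on the mat"
transcribed = "the cat sat on a mat"
print(f"word error rate: {word_error_rate(intended, transcribed):.2f}")  # 0.17
```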

The researchers also tested the decoder's ability to synthesize speech from silent, mimed speech. A participant first spoke a sentence aloud, then silently mouthed the same sentence (making the articulator movements without producing sound). The results show that the spectrum the decoder synthesized from the silent movements was similar to the spectrum of the spoken version of the same sentence.

Demonstration of speech synthesis from neural decoding of spoken sentences

Milestone: Challenges and expectations coexist

"This research shows for the first time that we can generate complete spoken sentences based on individual brain activity," Edward Chang said. "This is exciting. This is a technology that is already within reach. We should be able to build Clinically feasible equipment."

Dr. Edward Chang's research focuses on the brain mechanisms of speech, movement and human emotion

Gopala Anumanchipalli, the paper's first author, added: "I am proud to bring together expertise from neuroscience, linguistics and machine learning as part of this important milestone toward helping patients with neurological disabilities."

Of course, many challenges remain before a speech-synthesis brain-computer interface can deliver fully natural voice interaction, such as whether patients will accept invasive surgery to implant the electrodes, and whether the brain signals recorded in these experiments match those of patients who have actually lost the ability to speak.

Nevertheless, this research shows that the speech-synthesis brain-computer interface is no longer just a concept.

We look forward to the day when people with speech impairments can regain the ability to "speak" and once again express their thoughts and feelings freely.

HyperNeuropedia

Feedforward Neural Networks

The feedforward neural network is the earliest and simplest type of artificial neural network invented in the field of artificial intelligence. Within it, signals propagate in one direction, from the input layer to the output layer; unlike a recurrent neural network, it contains no directed cycles.

"Feedforward" can also be read as "forward-propagating". In terms of signal flow, once an input enters the network, the signal moves in one direction only, from each layer to the next all the way to the output layer, with no feedback between any pair of connections; the signal never returns from a later layer to an earlier one. Viewed in terms of input-output relationships, every layer after the input layer takes the output of the previous layer as its input.

When signals between layers also flow in the reverse direction, or a layer feeds its output back into itself, the network is instead called a recurrent neural network.

In a deep feedforward network, the layers are connected in a chain, and the number of layers gives the depth of the network.
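As a minimal illustration of this idea, assuming PyTorch and arbitrary layer sizes, a deep feedforward network is just a chain of layers through which a signal passes once, with no feedback.

```python
# A minimal sketch of a deep feedforward network: signals flow one way,
# input -> hidden layers -> output, with no feedback connections.
# The layer sizes are arbitrary, chosen only for illustration.
import torch
import torch.nn as nn

feedforward = nn.Sequential(
    nn.Linear(16, 32),   # input layer -> first hidden layer
    nn.ReLU(),
    nn.Linear(32, 32),   # second hidden layer (the chain gives the network its depth)
    nn.ReLU(),
    nn.Linear(32, 4),    # output layer
)

x = torch.randn(8, 16)   # a batch of 8 input vectors
y = feedforward(x)       # a single forward pass; no state is carried over
print(y.shape)           # torch.Size([8, 4])
```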
