Amazon Alexa's upcoming AI-based whisper mode

Source: ATYUN AI platform 

Amazon has announced a series of features that will be available on new and existing speakers through its Alexa voice platform. One of them is "whisper mode", which lets Alexa answer a whispered request by whispering back. In a blog post published today, Zeynab Raeesy, a speech systems specialist on Amazon's Alexa team, revealed the AI basis for this feature.

Most of the work is described in detail in a paper, "LSTM-based Whisper Detection", which will be presented at the IEEE Spoken Language Technology Workshop in December.

Raeesy said, "If you're in a room where a child has just fallen asleep and someone else walks in, you might start speaking in a whisper to indicate that you're trying to keep the room quiet. The other person will probably start whispering, too. We would like Alexa to react to conversational cues in just such a natural, intuitive way."

Raeesy explained that whispered speech is difficult to interpret because it is mostly unvoiced, that is, it does not involve vibration of the vocal cords. Compared with normal speech, it also tends to have less energy in the low frequency bands.
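The low-frequency observation can be illustrated with a small sketch. This is not Amazon's method; it only measures what fraction of a signal's spectral energy falls below a cutoff. The function name `low_band_energy_ratio`, the 500 Hz cutoff, and the synthetic test signals (a 150 Hz tone standing in for voiced speech, white noise standing in for a whisper) are all assumptions made for illustration.

```python
import numpy as np

def low_band_energy_ratio(signal, sample_rate, cutoff_hz=500.0):
    """Return the fraction of spectral energy below cutoff_hz (hypothetical helper)."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    total = power.sum()
    if total == 0.0:
        return 0.0
    return float(power[freqs < cutoff_hz].sum() / total)

# Synthetic stand-ins, NOT real speech (assumption for illustration only).
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
# "Voiced-like": strong low-frequency tone plus a weaker high-frequency one.
voiced_like = np.sin(2 * np.pi * 150 * t) + 0.3 * np.sin(2 * np.pi * 2000 * t)
# "Whisper-like": broadband noise with no dominant low-frequency component.
whisper_like = np.random.default_rng(0).standard_normal(sample_rate)

voiced_ratio = low_band_energy_ratio(voiced_like, sample_rate)
whisper_ratio = low_band_energy_ratio(whisper_like, sample_rate)
print(f"voiced-like: {voiced_ratio:.2f}, whisper-like: {whisper_ratio:.2f}")
```

On these toy signals the voiced-like ratio is much higher than the whisper-like one, which is the kind of difference a detector could pick up on; real recordings would be far less clear-cut.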

She and her colleagues studied two different neural networks (layers of mathematical functions loosely modeled after the neurons in the human brain) for distinguishing between normal and whispered speech.

The two neural networks had different architectures: the first was a multilayer perceptron (MLP), and the second was a long short-term memory (LSTM) network, which processes its inputs in sequence. Both were trained on the same data. That data consisted of (1) log filter-bank energies, a set of features that represent the energy of the speech signal in different frequency ranges, and (2) a set of features that "exploit" the signal differences between normal and whispered speech.
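To make "log filter-bank energies" concrete, here is a minimal sketch of one common way to compute them. The article does not say how Alexa computes these features, so the mel-scale filter spacing, the 25 ms frame, the 10 ms hop, the 512-point FFT, the 20 filters, and all function names below are assumptions.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filter_bank(num_filters, fft_size, sample_rate):
    """Triangular filters spaced evenly on the mel scale (assumed layout)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             num_filters + 2)
    bins = np.floor((fft_size + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    bank = np.zeros((num_filters, fft_size // 2 + 1))
    for i in range(num_filters):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, center):      # rising edge of the triangle
            bank[i, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge of the triangle
            bank[i, k] = (right - k) / (right - center)
    return bank

def log_filter_bank_energies(signal, sample_rate, frame_len=400, hop=160,
                             fft_size=512, num_filters=20):
    """Return an array of shape (num_frames, num_filters)."""
    window = np.hamming(frame_len)
    bank = mel_filter_bank(num_filters, fft_size, sample_rate)
    num_frames = 1 + (len(signal) - frame_len) // hop
    features = np.empty((num_frames, num_filters))
    for n in range(num_frames):
        frame = signal[n * hop:n * hop + frame_len] * window
        power = np.abs(np.fft.rfft(frame, fft_size)) ** 2 / fft_size
        # Floor the energies so the logarithm is always finite.
        features[n] = np.log(np.maximum(bank @ power, 1e-10))
    return features

# One second of a synthetic 150 Hz tone at 16 kHz (not real speech).
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
features = log_filter_bank_energies(np.sin(2 * np.pi * 150 * t), sample_rate)
print(features.shape)
```

Each row is one frame's feature vector. A sequence model such as an LSTM would consume the rows in order, whereas an MLP would see each frame (or a fixed window of frames) independently.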

In tests, they found that the LSTM generally performed better than the MLP, which has a number of advantages. As Raeesy explained, the other components of Alexa's speech recognition engine rely entirely on log filter-bank energies, and feeding the same input data to the different components makes the whole system more compact.

It was not all smooth sailing, however, at least initially. Because Alexa recognizes the end of a command or reply by a short period of silence (a technique called "end-pointing"), the LSTM's confidence dropped toward the end of an utterance. To address this, the researchers averaged the LSTM's outputs over the entire utterance. In the end, discarding the last 1.25 seconds of speech data proved critical to maintaining performance.
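The averaging-and-trimming idea can be sketched as follows. This is an illustration only, not the researchers' code: the function name `utterance_whisper_score`, the assumed rate of 100 frames per second, the made-up per-frame scores, and the fallback for very short utterances are all assumptions.

```python
def utterance_whisper_score(frame_scores, frames_per_second=100, trim_seconds=1.25):
    """Average per-frame whisper scores after dropping the trailing frames.

    frame_scores: list of per-frame confidences in [0, 1] (hypothetical
    classifier outputs). frames_per_second is an assumed frame rate.
    """
    trim = int(round(frames_per_second * trim_seconds))
    # Assumption: if the utterance is no longer than the trim window,
    # keep every frame rather than discarding all of them.
    if trim > 0 and len(frame_scores) > trim:
        kept = frame_scores[:-trim]
    else:
        kept = frame_scores
    if not kept:
        return 0.0
    return sum(kept) / len(kept)

# Made-up scores: confident during speech, then dropping in the trailing silence.
scores = [0.9] * 200 + [0.2] * 125
trimmed = utterance_whisper_score(scores)
untrimmed = utterance_whisper_score(scores, trim_seconds=0.0)
print(f"trimmed: {trimmed:.3f}, untrimmed: {untrimmed:.3f}")
```

With these invented numbers, the low-confidence tail pulls the untrimmed average well below the trimmed one, which is the effect the trimming step is meant to avoid.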

Whisper mode will be available in US English in October.

This article is reposted from the ATYUN artificial intelligence media platform. Original link: Amazon Alexa upcoming AI-based whisper mode

Follow ATYUN's official public account. For business cooperation and content contributions, please contact: bd@atyun.com

Origin blog.csdn.net/whale52hertz/article/details/93190917