"Hey Siri" science and technology development team behind the black!

Author | Vishant Batta

Translator | Su Ruci, Editor | Wu Xingling

Produced by | CSDN (ID: CSDNnews)

The following is a translation: 

Today's iPhones can detect and respond to the "Hey Siri" command at any time. One might therefore think: aren't they set up to record our everyday conversations?

The answer is no!

"Hey Siri" do not so much as we thought!

Let's take a look at the development history of "Hey Siri!".

"Hey Siri!" As a pre-installed voice assistant Siri additional features, was released in September 2014 in iOS 8. However, in iOS 9 (2015 Nian September) it upgraded and may only be used to personalize voice recognition users.

Google's assistant already had this feature before 2013, but it could not work while the screen was off. Even now, many Android phones do not support it.

Let us compare the two user experiences, as shown below:

Conventional way vs. "Hey Siri!"

The conventional way is: the user picks up the phone -> presses the Home button -> Siri starts.

While in "Hey Siri!" Mode, users simply say "Hey Siri!", No buttons, you can make Siri start.

The advantage is that when it is inconvenient to operate the phone by hand (for example, while driving), the user can still access some of the phone's functions.

The foundation of "Hey Siri!": the M9 motion coprocessor

 

"Coprocessor" can be understood as an auxiliary processor limited functionality and battery consumption to support even when the phone is idle (screen off) can also access the "always on" function.

The M9 motion coprocessor, the third generation of Apple's coprocessor family, launched in September 2015 with the iPhone 6s. Thanks to the powerful processing capability and tiny battery consumption of the ARM-based, 64-bit A9 chip system it is embedded in, the iPhone's famous "wake-up" feature became possible. The M9 motion coprocessor is sometimes described as an embedded "Always On Processor" (AOP).

"Hey Siri!" How does it work?

When you first enable this feature, it prompts you to say "Hey Siri!" a certain number of times. Your iPhone then saves these recordings and uses them as a "trigger key" to recognize your individual voice in the future.

This personalized "Trigger key" stored in the coprocessor, even if your phone is idle, the coprocessor will be listening (but not hear) all fell sound on the microphone. 

Therefore, when a sound falling on the microphone matches the "trigger key", the coprocessor activates the main processor and recording begins (just as if we had pressed the Home button to open Siri). The recording is then sent to the server, and from there everything proceeds like any other voice assistant request.
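This flow can be summarized in a short sketch. It is a minimal illustration of the idea only, not Apple's implementation; all names, the similarity logic, and the threshold value are made up.

```python
import random

TRIGGER_THRESHOLD = 0.85  # hypothetical match score needed to "wake up"

def coprocessor_score(frame, trigger_key):
    """Cheap always-on check: a crude similarity between one audio
    frame and the user's enrolled "trigger key" (toy logic)."""
    matches = sum(1 for a, b in zip(frame, trigger_key) if abs(a - b) < 0.1)
    return matches / len(trigger_key)

def wake_word_loop(frames, trigger_key):
    """The coprocessor scores every frame but stores nothing; only a
    match hands control over to the (simulated) main processor."""
    for t, frame in enumerate(frames):
        if coprocessor_score(frame, trigger_key) >= TRIGGER_THRESHOLD:
            print(f"frame {t}: trigger matched -> waking main processor")
            return t  # only now would audio be recorded and sent onward
    print("no trigger detected; nothing was recorded")
    return None

if __name__ == "__main__":
    key = [0.5] * 20                                   # enrolled "trigger key"
    frames = [[random.random() for _ in range(20)] for _ in range(50)]
    frames.append([0.5 + random.uniform(-0.05, 0.05) for _ in range(20)])
    wake_word_loop(frames, key)
```

The point of the sketch is the division of labor: the always-on loop runs the cheapest possible check, and everything expensive happens only after a match.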

To picture this process, imagine you have thousands of keys and you are trying to find the one that matches your lock.

The important point to note here is that the AOP (on the A9 chip) is always "listening" to, rather than "hearing", the user's voice. It is like a baby who hears people talking all the time but cannot fully process the words; only when his name is called is he triggered into action.

The M9 motion coprocessor was released with the iPhone 6s in September 2015. But, as mentioned at the beginning of this article, the "Hey Siri!" feature was launched as early as September 2014. So how did earlier iPhones listen "passively"?

Well, if you happen to know someone with an iPhone 6, you can check for yourself: on that phone, "Hey Siri!" works in the idle state (screen off) only while charging. From this we can simply infer that the phone can only spare the small amount of extra power available during charging. Consider the following screenshot of the iPhone 6 Siri settings:

Algorithm "Hey Siri!" Behind

The user's voice is sampled in units of 0.01 seconds, and every 20 such frames (0.2 seconds) are fed consecutively into a deep neural network (DNN). The neural network converts these sounds into a probability score, and when that score exceeds a minimum threshold, the main processor is activated.
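As a concrete (and heavily simplified) sketch of this pipeline: the 0.01 s frames and the 20-frame window come from the text, while the 16 kHz sample rate, the network weights, and the threshold are stand-ins.

```python
import numpy as np

SAMPLE_RATE = 16_000                  # assumed; the article gives no rate
FRAME_LEN = int(0.01 * SAMPLE_RATE)   # one frame = 0.01 s of audio
WINDOW = 20                           # 20 frames = 0.2 s per DNN input

rng = np.random.default_rng(0)

# Stand-in for the trained DNN: one hidden layer plus a sigmoid output
# that scores "how much does this 0.2 s window sound like 'Hey Siri'?"
W1, b1 = rng.normal(size=(WINDOW * FRAME_LEN, 32)) * 0.01, np.zeros(32)
W2, b2 = rng.normal(size=(32, 1)) * 0.01, np.zeros(1)

def dnn_score(window):
    h = np.tanh(window @ W1 + b1)
    z = h @ W2 + b2
    return 1.0 / (1.0 + np.exp(-z[0]))

def detect(audio, threshold=0.9):
    """Slide a 0.2 s window over the audio one 0.01 s frame at a time
    and report when the DNN score first crosses the threshold."""
    frames = [audio[i:i + FRAME_LEN]
              for i in range(0, len(audio) - FRAME_LEN + 1, FRAME_LEN)]
    for t in range(len(frames) - WINDOW + 1):
        window = np.concatenate(frames[t:t + WINDOW])
        if dnn_score(window) >= threshold:
            return t * 0.01   # time (s) at which the main processor wakes
    return None

audio = rng.normal(size=SAMPLE_RATE)  # one second of fake audio
print(detect(audio))                  # None unless a window scores >= 0.9
```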

DNN training

The threshold here is not fixed; it varies with the background noise. To put it plainly, you could say the DNN's threshold is being recalculated all the time.
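A minimal sketch of what such an adaptive threshold could look like, assuming a simple noise-floor estimate (the constants are invented for illustration; Apple's actual scheme is not described in the article):

```python
import numpy as np

def adaptive_threshold(frame_energies, base=0.80, k=0.15):
    """Raise the wake threshold as the ambient noise floor rises,
    so that a noisy room does not cause spurious triggers."""
    noise_floor = np.median(frame_energies)   # robust estimate of background
    return min(base + k * noise_floor, 0.99)  # never demand more than 0.99

quiet = np.full(100, 0.05)  # low-energy background
noisy = np.full(100, 0.60)  # loud background
print(adaptive_threshold(quiet))  # ~0.81 -> easier to trigger
print(adaptive_threshold(noisy))  # ~0.89 -> harder to trigger
```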

In addition, when you first record your voice samples to generate the "trigger key", what actually happens is that the DNN is retrained and the probability weights are redefined for you, as sketched below.
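The article does not say how this retraining works, but as a toy sketch, enrollment can be pictured as a few gradient steps that push a generic detector toward the user's own samples (everything here is illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

def enroll(user_samples, w, lr=0.1, epochs=200):
    """Nudge a generic detector toward the enrolled user's voice with
    logistic-regression steps: the user's samples are labeled 1 and
    random background is labeled 0. Purely illustrative."""
    background = rng.normal(size=user_samples.shape)
    X = np.vstack([user_samples, background])
    y = np.array([1.0] * len(user_samples) + [0.0] * len(background))
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)   # gradient of the cross-entropy
    return w

samples = rng.normal(loc=0.5, size=(5, 64))  # the few enrollment recordings
w = enroll(samples, w=np.zeros(64))
score = 1.0 / (1.0 + np.exp(-samples[0] @ w))
print(f"score on the user's own sample after enrollment: {score:.2f}")
```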

DNN training also differs across accents. For example, in American English "Hey Siri" sounds a bit like "Serious" spoken without the pauses, while other renditions of "Hey Siri!" differ in the length of the "i" sound and in the emphasis of the exclamation.

Math "Hey Siri" behind

The following content is for all machine learning enthusiasts :).

This is the deep neural network (DNN) model:

DNN model

The overall probability function is computed recursively; from the definitions below it takes the form:

F(i,t) = max( F(i,t-1) + s(i), F(i-1,t-1) + m(i-1) ) + q(i,t)

where:

  • F(i,t) is the accumulated score of the model in state i

  • q(i,t) is the output of the acoustic model: a score over phonetic classes measuring how closely the speech near time t matches the i-th state of the pattern

  • s(i) is the cost associated with staying in state i

  • m(i) is the cost of moving on from state i

Here s(i) and m(i) are weights determined when the "trigger key" is trained, and it can be assumed that:

s(i) - is determined per frame of the "trigger key" and depends on parameters such as tone and volume.

m(i) - depends on the frequency and speed of the "trigger key"; in short, on how much and how fast s(i) changes.

For example, m(i) and s(i) are very different for Eminem and for Adele, since Eminem sings faster (much faster, in fact) with little variation, while Adele sings more slowly with greater variation.
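Put into code, the recursion above is a simple dynamic program, much like a Viterbi pass over the states of the phrase. The sketch below assumes q is given as a (states x time) score matrix; none of the names or numbers come from Apple.

```python
import numpy as np

def accumulated_scores(q, s, m):
    """Dynamic program for the recursion in the text:
    F(i,t) = max(F(i,t-1) + s[i], F(i-1,t-1) + m[i-1]) + q[i,t].
    q: (n_states, n_frames) acoustic scores; s, m: per-state costs."""
    n_states, n_frames = q.shape
    F = np.full((n_states, n_frames), -np.inf)
    F[0, 0] = q[0, 0]                  # the phrase must start in state 0
    for t in range(1, n_frames):
        for i in range(n_states):
            stay = F[i, t - 1] + s[i]
            move = F[i - 1, t - 1] + m[i - 1] if i > 0 else -np.inf
            F[i, t] = max(stay, move) + q[i, t]
    return F

rng = np.random.default_rng(1)
q = rng.normal(size=(4, 20))           # 4 phrase states, 20 frames
s = np.full(4, -0.1)                   # small cost for lingering in a state
m = np.full(4, -0.05)                  # smaller cost for advancing
F = accumulated_scores(q, s, m)
print(F[-1, -1])  # score for finishing the phrase; trigger if high enough
```

In this picture, a fast, even delivery racks up the m(i) transition costs quickly with few s(i) terms, while a slow, varied delivery accumulates many s(i) terms along the way.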

Taking processing power and battery consumption into account, the DNN comes in different sizes: 32 units per layer for the coprocessor and 192 units per layer for the main processor.
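One plausible way to picture the two sizes (the 32 and 192 unit counts come from the text; the five-layer depth, the thresholds, and the gating logic are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

def make_dnn(in_dim, hidden, n_layers=5):
    """Toy fully connected stack; hidden is 32 for the always-on
    coprocessor model and 192 for the main-processor model."""
    dims = [in_dim] + [hidden] * n_layers + [1]
    return [(rng.normal(size=(a, b)) * 0.1, np.zeros(b))
            for a, b in zip(dims, dims[1:])]

def score(dnn, x):
    for W, b in dnn[:-1]:
        x = np.maximum(x @ W + b, 0.0)   # ReLU hidden layers
    W, b = dnn[-1]
    z = x @ W + b
    return 1.0 / (1.0 + np.exp(-z[0]))   # sigmoid "is this Hey Siri?"

small = make_dnn(3200, 32)    # cheap network, runs on the coprocessor
large = make_dnn(3200, 192)   # accurate network, runs on the main processor

window = rng.normal(size=3200)           # one 0.2 s audio window
s1 = score(small, window)
print(f"stage 1 (coprocessor) score: {s1:.2f}")
if s1 > 0.8:                             # cheap gate (made-up threshold)
    s2 = score(large, window)
    if s2 > 0.95:                        # careful confirmation (made up)
        print("trigger confirmed -> start Siri")
```

The design intuition: the tiny network runs constantly within the coprocessor's power budget, and the large network only ever runs on the main processor after the tiny one has already fired.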

"Hey Siri!" Although this feature has not been widely publicized, but it is a revolutionary step towards automation and increasing mobile phone usability step. It can also be seen as a good example of how a small change in how users experience an enormous impact, and sometimes requires extensive study of these tiny revolutionary change.

Original link: https://hackernoon.com/how-does-hey-siri-work-without-your-iphone-listening-to-you-at-all-times-827932do

This article was translated by CSDN; please indicate the source when reprinting.

【END】
