Master these 5 key points and get speech recognition testing done!

There are tens of thousands of smart electronic products on the market today. To make them more convenient to use, many of these products offer speech recognition features for voice wake-up and interaction.

In addition, major companies have developed their own intelligent voice assistants, such as Xiaomi's "Xiaoai", Baidu's "Xiaodu", Samsung's "Bixby", and Apple's "Siri".

These speech recognition features improve the experience of using electronic products. But as a tester, how do you actually test a speech recognition product?

Foreword

Next, I will take Xiaomi mobile phones as an example to introduce how to test their voice recognition feature.

How do you test Xiaomi's voice recognition function?

To know how to test the voice recognition function, let's first understand the voice interaction flow of a smart product:

[Figure: voice interaction flow of a smart product. Typically: voice wake-up -> speech recognition -> semantic understanding -> command execution -> voice/UI feedback]

Therefore, for testing, we need to prepare test points from the following dimensions:

01. Basic function test:

1 Voiceprint recording:

For voice wake-up, to ensure that every user's voice, in every scenario, can successfully wake the device, the test needs a variety of different voiceprints.

Therefore, it is necessary to record a variety of voiceprints to enrich the coverage of test scenarios.
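
As a minimal sketch (assuming the pyaudio package and an ordinary microphone), each tester's wake-word sample can be captured into a WAV file like this; the file name is only an example:

```python
# Minimal recording sketch, assuming pyaudio and a normal microphone: capture a
# short wake-word sample so many speakers' voiceprints can form a test corpus.
import wave
import pyaudio

RATE = 16000        # 16 kHz mono is a common format for wake-word engines
CHUNK = 1024
SECONDS = 3

def record_sample(path: str) -> None:
    pa = pyaudio.PyAudio()
    stream = pa.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                     input=True, frames_per_buffer=CHUNK)
    frames = [stream.read(CHUNK) for _ in range(int(RATE / CHUNK * SECONDS))]
    stream.stop_stream()
    stream.close()
    width = pa.get_sample_size(pyaudio.paInt16)
    pa.terminate()
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(width)
        wf.setframerate(RATE)
        wf.writeframes(b"".join(frames))

# e.g. record_sample("corpus/speaker01_wake_word_quiet.wav")  # example path only
```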

2 Voice wake-up:

Normal wake-up: use a normal voiceprint for voice wake-up and check that it succeeds;

Abnormal wake-up: use abnormal sounds, such as video/recorded playback or music, and check that they do not trigger a false wake-up.
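
A hedged sketch of how these positive and negative wake-up cases can be organised as data: the corpus paths are examples, and play_and_check stands for a hypothetical helper that plays a clip near the device and reports whether the assistant actually woke up (for example via UI automation).

```python
# Hedged sketch: data-driven wake-up cases. The clip paths are examples, and
# play_and_check is a hypothetical callable (clip path -> bool "device woke up").
WAKE_CASES = [
    ("corpus/speaker01_wake_word.wav", True),          # normal voiceprint: must wake up
    ("corpus/tv_recording_no_wake_word.wav", False),   # video/recording: must not wake up
    ("corpus/background_music.wav", False),            # music: must not wake up
]

def run_wake_cases(play_and_check) -> None:
    for clip, should_wake in WAKE_CASES:
        woke = play_and_check(clip)
        status = "OK" if woke == should_wake else "FAIL"
        print(f"{status}: {clip} (woke={woke}, expected={should_wake})")
```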

3 Functions after wake-up:


a. Find the device by voice: you can wake up the device, such as a mobile phone, and locate it by voice.

b. Volume adjustment: you can adjust the device volume by voice.

c. Continuous dialogue: after waking up the device, you can hold a continuous voice dialogue with it and the function works normally.

d. Command recognition: after waking up, you can issue commands such as playing music, checking the weather, making calls, and setting alarms, and check that the commands are executed normally.
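
As a sketch of how the post-wake-up command checks might be scripted: the command list and the assistant driver below are assumptions for illustration, not a real API.

```python
# Hedged sketch: post-wake-up command checks. `assistant` stands for a hypothetical
# test driver (e.g. built on adb/UI automation); wake_up/say/verify are not a real API.
COMMAND_CASES = [
    ("play some music",          "music player in the foreground"),
    ("what's the weather today", "weather card shown"),
    ("call mom",                 "dialer opened with the contact"),
    ("set an alarm for 7 am",    "a 7:00 alarm exists in the Clock app"),
]

def run_command_cases(assistant) -> None:
    for utterance, expected in COMMAND_CASES:
        assistant.wake_up()            # play the wake word
        assistant.say(utterance)       # play the command corpus clip
        assistant.verify(expected)     # check the device reached the expected state
```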

4 Function conflict interaction test


a. Interruption test: introduce interruptions during the voice recognition process, such as being interrupted while the phone is waking up, interrupted by an alarm clock, or by a low-battery alert, and check that these interruptions are handled normally and do not cause abnormal behavior;

b. Microphone conflict: if the microphone is already occupied by another application, test whether the device can still be woken up.

5 Multi-user scenarios


Because users invoke speech recognition in many different scenarios, full test coverage is difficult. Therefore, we need to analyze users' mainstream usage scenarios and make sure the main ones are covered.

Through data collection, we found that users' usage scenarios are distributed as follows:

[Figure: distribution of users' usage scenarios, from collected data]

According to the survey results, users mainly use the voice function in the following scenarios:

[Figure: main user scenarios for the voice function, from the survey]

Therefore, the test focuses mainly on covering these user scenarios; the priority of test cases for other scenarios can be gradually lowered and their test weight adjusted, so that stability and accuracy are guaranteed for the mainstream user scenarios.

02. UI testing


The UI shown during voice interaction also requires UI testing.

For example, the interface displayed by the phone's voice wake-up function should be checked to ensure it remains friendly and visually appealing.

03. Compatibility testing

1. Compatibility testing of third-party applications

When other applications are installed on the device, such as third-party apps on the phone, check whether voice recognition can still wake the device and perform the specified actions involving those applications; third-party application compatibility must be guaranteed.

2. Compatibility with external devices

a. Three-segment (TRS) headphone connection

b. Four-segment (TRRS) headphone connection

c. Type-C digital headphone connection

d. Bluetooth headset connection

Check that, with each of these third-party headset devices connected, voice recognition still works and functions normally.

04. Automated speech recognition testing

All of the above is manual testing. To achieve relatively complete speech recognition coverage, at least the following configuration is required:

Number of testers: 10–20 people (half male, half female)

Number of tests: 50 runs per scenario

Test environment: office, conference room

Test scenarios: wake-up with the screen on, wake-up with the screen off, wake-up while music is playing on the phone, false wake-up with an incorrect voiceprint, and basic sentence recognition rate

But manual testing has some serious flaws that cannot be ignored:

1. The test methods are not uniform: different distances and different angles will lead to different recognition results.

2. Testers' voices fluctuate greatly during testing

For the same algorithm and the same product, with the same testers and the same scenarios, the data from multiple rounds of testing varies greatly.

Clearly, manual testing is time-consuming and labor-intensive, and the resulting data has limited reference value. Therefore, speech recognition testing should also include some automated tests.

05. The key points of automated testing

1 Implement semi-automated voice testing

Because manual testing cannot supply enough people to cover different corpora, the corpus needs to be synthesized and simulated automatically. You can use Python + pyaudio, plus speakers, to simulate a human voice for voice recognition testing.

At the same time, increase the corpus volume (at least 40 sets of voiceprints) to reduce wake-up and recognition failures, and add different noise environments, with different noises at different distances, to simulate the user's real environment.

In this way, far more corpora and scenarios can be covered, and recognition accuracy can be evaluated much more thoroughly.
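
A minimal playback sketch along these lines, again assuming pyaudio: each recorded corpus WAV is played through a speaker placed in front of the device to stand in for a human voice (the corpus file names are only examples).

```python
# Minimal playback sketch, assuming pyaudio: play each corpus WAV through a speaker.
import wave
import pyaudio

def play_corpus(path: str) -> None:
    with wave.open(path, "rb") as wf:
        pa = pyaudio.PyAudio()
        stream = pa.open(format=pa.get_format_from_width(wf.getsampwidth()),
                         channels=wf.getnchannels(), rate=wf.getframerate(),
                         output=True)
        data = wf.readframes(1024)
        while data:                      # stream the file in small buffers
            stream.write(data)
            data = wf.readframes(1024)
        stream.stop_stream()
        stream.close()
        pa.terminate()

for clip in ["corpus/speaker01_xiaoai.wav", "corpus/speaker02_xiaoai.wav"]:  # example paths
    play_corpus(clip)
```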

2 Automatic playback of corpus + automatic detection

We now have a corpus, but if it has to be played back manually the workload is still heavy, so playback of the corpus and checking of the results need to be automated.
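
One hedged way to close the loop is to check the device over adb after each clip is played. In the sketch below, the check simply looks for the assistant in the focused-window dump; the activity-name hint is an assumption and must be adapted to the actual device.

```python
# Hedged sketch: play a clip, then check the result automatically over adb.
import subprocess
import time

ASSISTANT_HINT = "voiceassist"   # assumed substring of the assistant's window/activity name

def device_woke_up() -> bool:
    out = subprocess.run(["adb", "shell", "dumpsys", "window"],
                         capture_output=True, text=True).stdout
    return ASSISTANT_HINT in out.lower()

corpus_clips = ["corpus/speaker01_xiaoai.wav", "corpus/speaker02_xiaoai.wav"]  # example paths
results = []
for clip in corpus_clips:
    play_corpus(clip)          # playback helper from the previous sketch
    time.sleep(2)              # give the device time to react
    results.append((clip, device_woke_up()))

wake_rate = sum(ok for _, ok in results) / len(results)
print(f"wake-up rate: {wake_rate:.1%}")
```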

3 Add noise playback system + slide rail control system

Because users' real environments often contain a lot of noise, a test that does not simulate this noise cannot truly reproduce the user scenario. Therefore, you need to set up noise sources that can automatically add noise and adjust their distance.
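
A small sketch of driving the noise/distance matrix might look like this; the rail and noise-source control functions are placeholders, since the real control protocol depends entirely on the rig.

```python
# Sketch of a noise/distance test matrix. The rail and noise controls below are
# placeholders: the real commands depend on the specific rig (serial, HTTP, ...).
from itertools import product

NOISE_CLIPS = ["noise/quiet.wav", "noise/tv_news.wav", "noise/cafe.wav"]  # assumed files
DISTANCES_M = [0.5, 1.0, 3.0, 5.0]                                        # rail positions

def move_rail_to(distance_m: float) -> None:
    # Placeholder: send whatever position command your slide rail actually understands.
    print(f"[rail] moving speaker to {distance_m} m")

def start_noise(clip: str) -> None:
    # Placeholder: play the noise clip on the second (noise) speaker.
    print(f"[noise] playing {clip}")

for clip, distance in product(NOISE_CLIPS, DISTANCES_M):
    move_rail_to(distance)
    start_noise(clip)
    # ...then run the play-corpus-and-check loop from the previous sketch
```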

The picture below shows the reverberation chamber that Xiaomi built specifically for speech recognition testing, together with its automatic head-adjustment system.

[Figure: Xiaomi's reverberation chamber and automatic head-adjustment system]