Live streaming is hugely popular, so how do testers verify the quality of live streaming products?


Which aspects of performance do users care about most?

What are the standards for audio and video quality testing?

The key to future competitiveness is to improve the quality metrics of the live streaming software once its features meet user needs: run dedicated tests on fluency, clarity, sound quality, stability, and traffic consumption to raise the overall quality of audio and video.

Basic principles of audio and video

1. Collection

Audio and video are captured by sensors on hardware devices such as cameras and microphones, then transmitted and converted into digital signals. In this product, capture and playback for both one-on-one and group video are handled by the ffmpeg plug-in.


 

2. Pre-processing

The captured audio and video data must be pre-processed to achieve better results. Audio pre-processing includes automatic gain control (AGC), noise suppression (ANS), acoustic echo cancellation (AEC), silence/voice activity detection (VAD), and so on; video pre-processing includes video denoising, scaling, and more.
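As a sketch of what the VAD step does, a minimal energy-based silence detector can be written in a few lines. Real pipelines (e.g. WebRTC's VAD) use statistical models; the fixed threshold here is an arbitrary illustration:

```python
def is_speech(frame, threshold=0.01):
    """Classify one audio frame (samples in [-1, 1]) by mean energy."""
    energy = sum(s * s for s in frame) / len(frame)
    return energy > threshold

silent = [0.001] * 160     # 10 ms of near-silence at 16 kHz
voiced = [0.5, -0.4] * 80  # 10 ms of loud signal

print(is_speech(silent), is_speech(voiced))  # False True
```

In practice the threshold would be adapted to the noise floor rather than hard-coded.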


3. Codec

A signal or data stream must be encoded and decoded. Transformation here means encoding the signal or data stream (usually for transmission, storage, or encryption) or decoding an encoded stream. There are many video codecs, such as VP8, VP9, MPEG, and H.264. Audio codecs fall into two categories: speech codecs (SILK, Speex, iSAC, etc.) and general audio codecs (CELT, AAC, etc.).


4. Network transmission

For network transmission, UDP or TCP is chosen depending on the network environment. UDP is generally preferred for real-time audio and video calls because of its greater flexibility and lower delay. Impairments during transmission also need to be handled: controlling packet size, FEC (forward error correction), packet-loss retransmission, jitter control, delay, out-of-order delivery, and so on.
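Of the impairments listed, jitter has a standard formula: RFC 3550 defines a smoothed interarrival-jitter estimate, J += (|D| - J) / 16, where D is the difference between consecutive packets' transit times. A small sketch (the transit times in milliseconds are made up):

```python
def update_jitter(jitter, transit_prev, transit_now):
    """One RFC 3550 jitter update: J += (|D| - J) / 16."""
    d = abs(transit_now - transit_prev)
    return jitter + (d - jitter) / 16.0

# transit time = arrival timestamp - send timestamp, per packet (ms)
transits = [50, 55, 52, 70, 51]
j = 0.0
for prev, now in zip(transits, transits[1:]):
    j = update_jitter(j, prev, now)
print(round(j, 2))  # 2.66
```

The 1/16 gain smooths out single spikes, which is why the 18 ms jump above only moves the estimate modestly.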


 

5. Post-processing

After the data reaches the receiver over the network, it is decoded and enters the post-processing stage. Here, audio data may need resampling or mixing, and video may need deblocking filtering, temporal noise reduction, and so on.
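As an illustration of the resampling step, here is a naive linear-interpolation resampler. Real post-processing uses proper polyphase filters; this only shows the sample-rate conversion itself:

```python
def resample(samples, src_rate, dst_rate):
    """Linearly interpolate samples from src_rate to dst_rate."""
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate       # position in the source signal
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out

upsampled = resample([0.0, 1.0], src_rate=8000, dst_rate=16000)
print(upsampled)  # [0.0, 0.5, 1.0, 1.0]
```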


6. Play/render display

After post-processing, converting the digital signals back into sound and images is the playback/rendering step. Common audio playback APIs on Windows include DirectSound, WaveOut, and Core Audio.


Video Quality Standards

The video quality standards and testing methods are introduced below.

1. Room entry speed

Normal network requirement: entering the room takes less than 1 second (iOS and Android)

Weak network requirement: there is no defined standard for room-entry speed on a weak network

For Android, low-end models (such as the Xiaomi Note) are recommended; for iOS, the iPhone 6s is recommended for testing

Test Methods

Coverage scenarios: entry points should cover all channels, such as in-app, QQ, Qzone, WeChat, WeChat Moments, and Sina Weibo

1. Start a millisecond-precision stopwatch on one phone, then open the product under test on another phone and enter a host's room;

2. Pause the stopwatch and record the time as soon as the first frame appears after entering the room;

3. Repeat the steps above 20 times and average the results.
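The averaging in step 3 is easy to script. The readings below are illustrative only (a real run would have 20), and the 1-second budget comes from the normal-network requirement above:

```python
def entry_time_report(samples_ms, budget_ms=1000):
    """Summarize stopwatch readings (ms) against an entry-time budget."""
    avg = sum(samples_ms) / len(samples_ms)
    return {"avg_ms": round(avg, 1),
            "max_ms": max(samples_ms),
            "within_budget": avg < budget_ms}

readings = [820, 910, 760, 880, 950]  # made-up readings; use 20 in practice
report = entry_time_report(readings)
print(report)  # avg 864.0 ms, within the 1 s budget
```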

Competitive product data

| Model | Application | Time to enter room (ms) |
| --- | --- | --- |
| Android | Competitor A |  |
| Android | Competitor B |  |
| Android | Competitor C |  |
| Android | Competitor D |  |
| Apple | Competitor A |  |
| Apple | Competitor B |  |
| Apple | Competitor C |  |
| Apple | Competitor D |  |

 

2. Clarity

Normal network requirement: clarity is no worse than in the previous version

Weak network requirement: with 10% network packet loss, clarity does not drop significantly compared with a normal network

Tool: Imatest

Environment debugging:

1. Keep the camera 0.75 m from the test chart, with the light sources at 45° to the chart so that no shadow falls on its surface;

2. When testing with fluorescent lamps (D65/CWF/SP35), warm up the light source for at least 15 minutes beforehand;

3. Measure the illuminance and color temperature at 9 points on the chart surface to verify that the lighting is consistent, and adjust the position of the phone under test so that it shoots from a centered position.
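The 9-point lighting-consistency check in step 3 can be automated once the lux readings are collected. The 5% tolerance below is an assumption for illustration, not a figure from the original procedure:

```python
def is_uniform(lux_readings, tolerance=0.05):
    """True if every reading is within `tolerance` of the mean illuminance."""
    mean = sum(lux_readings) / len(lux_readings)
    return all(abs(x - mean) / mean <= tolerance for x in lux_readings)

nine_points = [598, 602, 600, 605, 599, 601, 597, 603, 600]  # example lux values
print(is_uniform(nine_points))  # True: all within 5% of the mean
```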


 

Steps:

1. Shoot the chart with each competing product, import the captured images to a PC, open the Imatest tool to calculate sharpness, and click SFR: New File;

2. Select and add the image to be processed, mark the 13 regions of interest on it (as shown below), and click [Yes, Continue] to finish the region selection;

3. Click [OK] and [Yes]; the computed MTF50P value is the clarity of the image.

Influencing factors

Clarity is strongly affected by video resolution and bit rate: the higher the sending bit rate and resolution, the better the clarity. Note that clarity cannot be judged from resolution alone or from bit rate alone.

Competitive product data

| Host platform | Competing product | Clarity value |
| --- | --- | --- |
| iOS | Competitor A |  |
| iOS | Competitor B |  |
| iOS | Competitor C |  |
| Android | Competitor A |  |
| Android | Competitor B |  |
| Android | Competitor C |  |

 

3. Frame rate

Normal network requirements: because of how human vision works, a picture at more than 16 frames per second is perceived as continuous, so the frame rate should not fall below 16 fps. When choosing a frame rate, weigh your product's needs and compare with competing products. Below 5 fps the human eye clearly perceives the picture as discontinuous, and the stream feels stuck.
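The thresholds above (16 fps for perceived smoothness, 5 fps for obvious stutter) can be captured in a small helper for test reports; the band labels are informal wording, not standard terminology:

```python
def perceived_smoothness(fps):
    """Map a measured frame rate to the perceptual bands described above."""
    if fps >= 16:
        return "smooth"
    if fps >= 5:
        return "noticeable judder"
    return "stuttering"

print(perceived_smoothness(30))  # smooth
print(perceived_smoothness(10))  # noticeable judder
print(perceived_smoothness(3))   # stuttering
```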

Weak network requirements: with 10% network packet loss, the frame rate does not drop significantly compared with a normal network

Test Methods

Equipment: 2 computers + 1 camera + 2 mobile phones. One computer plays the source video and the other records; one phone acts as the host and the other as the viewer; the camera captures the viewer's screen.

Video source: specific video demo.avi

Steps:

1. Computer 1 plays demo.avi in a loop; computer 2 has the camera plugged in and opens the "VideoStudio" software;

2. Phone A starts the live broadcast and phone B joins as a viewer. Point phone A at the computer playing the video, and point the recording computer's camera at phone B's screen;

3. In "VideoStudio", click Capture → Capture Video, set the capture folder, then click Capture Video (record about 10~20 s). The captured video is in mpg format;

4. Convert the mpg file to yuv format: edit the mpeg2dec.cmd file as shown below, change the file name to the captured video's file name, save, and run mpeg2dec.exe;


5. Open YUVviewerPlus.exe and, as shown below, set the resolution of the recorded video (VideoStudio records at 720*480 by default), then click Open File to open the converted yuv file;

6. Click "next" to step through and count the frames. At 30 frames per second, the number of image changes within 30 frames is the frame rate; preferably count over a 3 s window and average the number of changes per second. It is recommended to sample at the beginning, middle, and end of the recording.
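Step 6's manual counting can be automated by comparing consecutive decoded frames. The frames here are stand-in byte strings rather than real YUV planes:

```python
def measured_fps(frames, window_seconds):
    """Effective frame rate: how often the picture actually changes per second."""
    changes = sum(1 for a, b in zip(frames, frames[1:]) if a != b)
    return changes / window_seconds

# 90 captured frames over 3 s, where the source only updates every 3rd frame
frames = [bytes([i // 3]) for i in range(90)]
print(round(measured_fps(frames, 3), 2))  # 9.67, i.e. ~10 fps effective
```

A real script would read the converted .yuv file frame by frame and compare luma planes, possibly with a small difference threshold to ignore camera noise.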

Influencing factors

When the network is normal and unimpaired, the frame rate is mainly determined by the video bit rate: the higher the bit rate, the higher the frame rate and resolution the encoder can sustain.

Competitive product data

competing products

anchor mobile platform

frame rate

Competitor A

IOS

Android

Competitor B

IOS

Android

Competitor C

IOS

Android

4. Freezing times

Standard

Normal network requirements:

Weak network requirements:

Test Methods

Globe (iOS) or an automated testing tool (Android)

Influencing factors

When the network is normal and unimpaired, the frame rate is mainly determined by the video bit rate: the higher the bit rate, the higher the frame rate and resolution the encoder can sustain.

5. Video quality stability

Under various impairment and change scenarios, no blurred screen, black screen, or automatic interruption occurs within 3 hours of live broadcasting

Test Methods

1. Run automated impairment tests and record the stream with the VideoStudio software;

2. Check the recording for blurred screens, black screens, or abnormal interruptions.
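The black-screen check in step 2 can be partially automated on decoded luma data. The luma threshold of 16 is an assumption; blurred-screen detection needs more sophisticated metrics than this sketch covers:

```python
def find_black_frames(frames, luma_threshold=16):
    """frames: list of flat luma arrays (0-255). Returns indices of black frames."""
    bad = []
    for idx, frame in enumerate(frames):
        if sum(frame) / len(frame) < luma_threshold:
            bad.append(idx)
    return bad

normal = [128] * 64  # mid-gray stand-in frame
black = [2] * 64     # near-black stand-in frame
print(find_black_frames([normal, black, normal]))  # [1]
```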

Audio Quality Standards

The audio quality standards and test methods are introduced below.

1. Sampling rate

Normal network requirement: audio sampling rate of 16 kHz or higher

Weak network requirement: audio sampling rate of 16 kHz or higher

The test must cover both the live broadcast scenario and the co-mic (Lianmai) scenario.

Test Methods

Equipment: two mobile phones, a sample playback device, a voice recorder

1. One phone enters the room as the host; the other phone serves as the viewer;

2. On the host side, play the voice (music) sample on the playback device;

3. On the viewer side, record the received audio with the voice recorder;

4. Use Adobe Audition to inspect the spectrum: the highest component is around 7 kHz, so by the Nyquist criterion the sampling rate must exceed 14 kHz, and 16 kHz satisfies this;
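The reasoning in step 4 is the Nyquist criterion: content up to ~7 kHz requires a sampling rate above 14 kHz, which 16 kHz satisfies. A plain DFT on a synthetic 7 kHz tone illustrates locating the dominant spectral component (Audition shows this in its spectrum view):

```python
import cmath
import math

def dominant_frequency(samples, sample_rate):
    """Return the frequency (Hz) of the strongest DFT bin below Nyquist."""
    n = len(samples)
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2):
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        if abs(acc) > best_mag:
            best_bin, best_mag = k, abs(acc)
    return best_bin * sample_rate / n

rate = 16000
tone = [math.sin(2 * math.pi * 7000 * t / rate) for t in range(160)]
print(dominant_frequency(tone, rate))  # 7000.0
```

This O(n²) DFT is only for illustration; a real analysis would use an FFT on the actual recording.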

2. Objective scoring of sound quality

Normal network requirement: during a live broadcast on a normal network, the average voice quality score is >= 4.0

Weak network requirement: during a live broadcast on a weak network, the average voice quality score is >= 3.5

Test Methods

Live mode: since live broadcast delay exceeds 2 seconds, record over an audio cable and cut the recording into segments first, then score the segments with the SPIRENT device.

Equipment: two audio cables, one PC, two mobile phones

1. Connect the host phone's microphone input to the PC's speaker output, and the viewer phone's speaker output to the PC's microphone input;

2. On the PC, loop-play 48 kHz voice samples (each sample is 10 s long);

3. Open Adobe Audition and record for about 2 minutes;

4. Cut the recording into segments (each voice segment is 10 s, preserving about 3 s of leading silence);

5. Upload the cut audio files to the SPIRENT device and compute the average POLQA score.
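The cutting in step 4 can be scripted once the recording layout is known. Here it is assumed that each 10 s sample is preceded by 3 s of silence, repeating through the 2-minute recording; offsets are in samples:

```python
def cut_segments(total_samples, sample_rate, segment_s=10, lead_s=3):
    """Return (start, end) sample offsets for each 10 s voice segment."""
    step = (segment_s + lead_s) * sample_rate   # one lead + one segment
    seg_len = segment_s * sample_rate
    cuts = []
    start = lead_s * sample_rate                # skip the first lead-in
    while start + seg_len <= total_samples:
        cuts.append((start, start + seg_len))
        start += step
    return cuts

# 2 minutes of recording at 48 kHz
segments = cut_segments(120 * 48000, 48000)
print(len(segments), segments[0])  # 9 segments; first is (144000, 624000)
```

In practice you would detect the silence gaps rather than assume a fixed layout, since playback and recording are not sample-aligned.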

Co-mic (Lianmai) mode: the delay is under 1 s, so sound quality can be measured directly with the SPIRENT equipment.

1. Connect the host side to the viewer side;

2. Connect the SPIRENT device and run the sound-quality test; the two-way test takes about 8 minutes;

3. Record the average sound quality score.

3. Audio and video synchronization

Under both normal and weak networks, the probability of audio and video falling out of sync must be 0.

Test Methods

While watching the live broadcast, subjectively judge whether the host's lip movements match the voice heard.

4. Lianmai (co-mic) - Noise suppression

Standard: in host-viewer co-mic mode, the host→viewer noise suppression is no worse than in the previous version.

Test Methods

Equipment: one audio cable, a device for playing voice samples, one PC

1. Connect the host side to the viewer side;

2. Place the host's phone in an anechoic chamber and fix it in place, then play the noise sample in the chamber on the sample playback device;

3. Connect the viewer phone's speaker output to the PC's microphone input;

4. Record with Adobe Audition and save the file;

5. Record the previous version in the same way (keeping the test environment identical);

6. Compare the two versions: select the same speech segment and noise segment and calculate the signal-to-noise ratio.
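The SNR comparison in step 6 is 10·log10(P_speech / P_noise) over the chosen segments. The sample values below are synthetic placeholders for the real recorded segments:

```python
import math

def power(samples):
    """Mean signal power of a segment."""
    return sum(s * s for s in samples) / len(samples)

def snr_db(speech, noise):
    """Signal-to-noise ratio in dB between two segments."""
    return 10 * math.log10(power(speech) / power(noise))

speech = [0.5, -0.5] * 100   # synthetic "speech" segment
noise = [0.05, -0.05] * 100  # synthetic noise-floor segment
print(round(snr_db(speech, noise), 1))  # 20.0
```

A higher SNR on the new version for the same segments indicates the noise suppression has not regressed.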

5. Lianmai (co-mic) - Echo cancellation

Standard: in host-viewer co-mic mode, the echo heard by the talker during both single-talk and double-talk is small enough not to affect communication.

Test Methods

Single-talk: turn on the viewer's loudspeaker, speak on the host side, and listen subjectively for echo; then speak on the viewer side and listen for echo in the other direction.

Double-talk: both parties turn on their speakers and speak at the same time; listen subjectively for echo or intermittently cut-off audio.
 


Origin blog.csdn.net/kk_lzvvkpj/article/details/130019344