Stress test comparison of ffmpeg hard decoding and soft decoding

1. Basic knowledge

This article runs its stress tests on an Intel integrated GPU.

  • Soft decoding: the CPU decodes the video.
  • Hard decoding: a graphics card or a dedicated multimedia processing chip decodes the video.
    ffmpeg can do hard decoding through the VAAPI plug-in or the QSV acceleration library.
    Reference: "Video and video frames: the Intel GPU (integrated graphics) codec story"
    · FFmpeg-vaapi is an FFmpeg plug-in that provides hardware acceleration based on the low-level VAAPI interface. It uses the industry-standard VAAPI to perform high-performance video encoding and decoding, video processing, and transcoding on Intel GPUs.
    · FFmpeg-qsv is an FFmpeg plug-in that provides hardware acceleration on Intel GPUs. It provides high-performance video encoding and decoding, video processing, and transcoding based on the Intel Media SDK library.
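Before testing, it can help to confirm that the local ffmpeg build actually lists these two acceleration methods. The sketch below is one possible way to check; `has_hwaccel` is a hypothetical helper name, and the sample list is illustrative text rather than output from the test machine.

```shell
#!/bin/bash
# has_hwaccel NAME
# Reads a list of hardware acceleration methods on stdin (one per line, as
# printed by `ffmpeg -hide_banner -hwaccels`) and succeeds if NAME is listed.
has_hwaccel() {
    grep -qx -- "$1"
}

# Demo on sample text so the sketch runs without ffmpeg installed.
# The sample content is illustrative, not output from the test machine.
sample='Hardware acceleration methods:
vaapi
qsv'

for method in vaapi qsv cuda; do
    if printf '%s\n' "$sample" | has_hwaccel "$method"; then
        echo "$method: listed"
    else
        echo "$method: not listed"
    fi
done
```

On a real machine the check would read ffmpeg's own list, for example `ffmpeg -hide_banner -hwaccels | has_hwaccel qsv`.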

2. Stress test

1. Experimental conditions and tools

  • Memory: 32GB
  • OS: Ubuntu 22.04
  • CPU: 12th Gen Intel® Core™ i7-12700 (12 cores), limited to 6 cores for the test
  • Video: 2K resolution
  • intel-gpu-tools
    Install it with apt-get install intel-gpu-tools, then open a new terminal window and run intel_gpu_top to observe GPU usage.
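The article does not say how the CPU was limited to 6 cores. One common way on Linux is to pin the test to a fixed set of cores with taskset; the sketch below only builds the CPU list, and both `cpu_range` and the script name `stress.sh` are hypothetical names used for illustration.

```shell
#!/bin/bash
# cpu_range N: print a CPU list covering the first N cores ("0" or "0-(N-1)"),
# in the form that `taskset -c` accepts.
cpu_range() {
    n=$1
    if [ "$n" -lt 1 ]; then
        echo "cpu_range: need at least 1 core" >&2
        return 1
    fi
    if [ "$n" -eq 1 ]; then
        echo "0"
    else
        echo "0-$((n - 1))"
    fi
}

echo "CPU list for 6 cores: $(cpu_range 6)"

# Hypothetical usage (stress.sh is a placeholder name for the test script):
#   taskset -c "$(cpu_range 6)" ./stress.sh
```

Child processes inherit the CPU affinity of the script, so every ffmpeg instance it starts would stay on the same cores.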

2. Stress test script

Original version:

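#!/bin/bash
read thread_num   # number of concurrent ffmpeg processes, read from stdin

tmp_fifofile="/tmp/$$.fifo"
mkfifo "$tmp_fifofile"     # create a FIFO (named pipe)
exec 6<>"$tmp_fifofile"    # open the FIFO read-write on FD 6
rm "$tmp_fifofile"         # the name can be removed; FD 6 stays open

# Set the number of tokens from the thread count:
# this simply writes $thread_num newlines into FD 6
for ((i=0;i<thread_num;i++))
do
    echo
done >&6

start_time=$(date +%s)   # start time of the run
for ((i=0;i<thread_num;i++))   # launch one decoding job per iteration
do
    # Each read -u6 takes one newline out of FD 6 and then continues.
    # When FD 6 holds no newline, read blocks, which limits the number of concurrent jobs.
    read -u6
    {
        ffmpeg -i 10min39.mp4 -f null - -benchmark
        echo >&6   # return the token
    } &
done
wait
stop_time=$(date +%s)

exec 6>&-   # close FD 6
echo "$((stop_time - start_time))"   # elapsed seconds
echo 'over'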

However, running this script produces the following problem:
-bash: wait: warning: job stopped
Section 9.8, "Job Control", of "Advanced Programming in the UNIX Environment" explains that a background job trying to read from the terminal is not an error, but the terminal driver detects it and sends the SIGTTIN signal to the background job, which stops the background process and notifies the user. ffmpeg reads standard input for interactive commands by default, so each background ffmpeg process here tries to read from the terminal and is stopped.
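The mechanism can be seen without ffmpeg. In the sketch below, `try_read` is a hypothetical stand-in for a program that polls standard input. When its stdin comes from /dev/null, the read returns end-of-file immediately and the terminal is never touched, so there is nothing for the terminal driver to object to.

```shell
#!/bin/bash
# try_read: attempt to read one line from stdin and report what happened.
# It stands in for a program that polls stdin, as ffmpeg does for its
# interactive commands.
try_read() {
    if read -r line; then
        echo "read a line"
    else
        echo "end of input"
    fi
}

# Background job with stdin redirected from /dev/null: the read returns at
# once with end-of-file instead of trying to read the terminal.
try_read < /dev/null &
wait
```

ffmpeg also has a -nostdin option that disables interaction on standard input. The article does not use it, but it addresses the same cause.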

Modified version: add < /dev/null

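#!/bin/bash
read thread_num   # number of concurrent ffmpeg processes, read from stdin

tmp_fifofile="/tmp/$$.fifo"
mkfifo "$tmp_fifofile"     # create a FIFO (named pipe)
exec 6<>"$tmp_fifofile"    # open the FIFO read-write on FD 6
rm "$tmp_fifofile"         # the name can be removed; FD 6 stays open

# Set the number of tokens from the thread count:
# this simply writes $thread_num newlines into FD 6
for ((i=0;i<thread_num;i++))
do
    echo
done >&6

start_time=$(date +%s)   # start time of the run
for ((i=0;i<thread_num;i++))   # launch one decoding job per iteration
do
    # Each read -u6 takes one newline out of FD 6 and then continues.
    # When FD 6 holds no newline, read blocks, which limits the number of concurrent jobs.
    read -u6
    {
        ffmpeg -i 10min39.mp4 -f null - -benchmark
        echo >&6   # return the token
    } < /dev/null &   # stdin from /dev/null, so ffmpeg never reads the terminal
done
wait
stop_time=$(date +%s)

exec 6>&-   # close FD 6
echo "$((stop_time - start_time))"   # elapsed seconds
echo 'over'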
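The token FIFO used in both versions of the script is easier to see in isolation. The sketch below uses `sleep 1` as a stand-in workload and deliberately launches more jobs than there are tokens; the values of `max_jobs` and `total_jobs` are arbitrary choices for illustration. In the article's script the loop count and the token count are both thread_num, so as written the tokens never run out and all jobs start at once.

```shell
#!/bin/bash
# Sketch: cap concurrency with a token FIFO, using `sleep 1` as a stand-in job.
max_jobs=2      # number of tokens, i.e. jobs allowed to run at once
total_jobs=4    # number of jobs to launch in total

fifo="${TMPDIR:-/tmp}/tokens.$$.fifo"
mkfifo "$fifo"
exec 6<>"$fifo"   # open the FIFO read-write on FD 6 so the open does not block
rm -f "$fifo"     # FD 6 stays usable after the name is removed

# Put one newline (token) into the FIFO per allowed job.
i=0
while [ "$i" -lt "$max_jobs" ]; do
    echo >&6
    i=$((i + 1))
done

start=$(date +%s)
i=0
while [ "$i" -lt "$total_jobs" ]; do
    read -r token <&6      # take a token; blocks while max_jobs jobs are running
    {
        sleep 1            # stand-in for the real workload
        echo >&6           # return the token
    } < /dev/null &
    i=$((i + 1))
done
wait
elapsed=$(( $(date +%s) - start ))
exec 6>&-                  # close FD 6

echo "ran $total_jobs jobs, at most $max_jobs at a time, in about ${elapsed}s"
```

With four one-second jobs and two tokens, the run needs at least two seconds, which is the visible effect of the limit.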

The stress test script above is the soft decoding version. For the hard decoding version, change the ffmpeg line as follows:

  • To use QSV, change ffmpeg -i 10min39.mp4 -f null - -benchmark to ffmpeg -hwaccel qsv -hwaccel_output_format qsv -c:v h264_qsv -i 10min39.mp4 -f null - -benchmark (h264_qsv assumes the input video is H.264)
  • To use VAAPI, change it to ffmpeg -hwaccel vaapi -hwaccel_output_format vaapi -i 10min39.mp4 -f null -
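To keep the three variants side by side, the command can be selected by mode. This is only a sketch: `decode_cmd` is a hypothetical helper, and the QSV line is written with a single-dash -hwaccel_output_format and an explicit -i before the input file, which is how ffmpeg options are normally given. The helper prints the command for inspection and does not run it.

```shell
#!/bin/bash
# decode_cmd MODE FILE: print the ffmpeg decode command for MODE.
# The soft and qsv lines end with -benchmark as in the article; the vaapi
# line is reproduced as the article gives it.
decode_cmd() {
    mode=$1
    file=$2
    case "$mode" in
        soft)  echo "ffmpeg -i $file -f null - -benchmark" ;;
        qsv)   echo "ffmpeg -hwaccel qsv -hwaccel_output_format qsv -c:v h264_qsv -i $file -f null - -benchmark" ;;
        vaapi) echo "ffmpeg -hwaccel vaapi -hwaccel_output_format vaapi -i $file -f null -" ;;
        *)     echo "decode_cmd: unknown mode '$mode'" >&2; return 1 ;;
    esac
}

for mode in soft qsv vaapi; do
    decode_cmd "$mode" 10min39.mp4
done
```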

3. Experimental results

Part of the decoding speed results:
[Figure: Experimental results]

Analysis of experimental results:
1. The comparison data show that hard decoding is better than soft decoding on every measure: decoding is faster, CPU usage is lower, and memory usage is lower. With 5 to 100 concurrent decodes, GPU usage reaches 100%.
2. In the experimental environment, the critical concurrency for soft decoding is between 21 and 22, while hard decoding still runs faster than the normal video playback rate with 100 concurrent threads.
3. In the experimental environment, to keep the decoding rate no slower than the normal playback rate: for soft decoding, keep concurrency at 20 or below; for hard decoding, concurrency should not exceed 100.
4. For hard decoding, CPU usage first rises and then falls as concurrency increases. This may be because the decoding rate slows down, so the CPU can cope quickly with switching among multiple threads.
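The "not slower than normal playback" criterion can be written as a simple ratio: the video duration divided by the real time one decode takes, which -benchmark reports at the end of a run. The sketch below assumes a duration of 639 seconds, inferred from the file name 10min39.mp4, and uses a made-up real time; `speed_factor` and `keeps_up` are hypothetical helper names.

```shell
#!/bin/bash
# speed_factor DURATION RTIME: print DURATION / RTIME to two decimals.
speed_factor() {
    awk -v d="$1" -v r="$2" 'BEGIN { printf "%.2f\n", d / r }'
}

# keeps_up DURATION RTIME: succeed if decoding took no longer than playback.
keeps_up() {
    awk -v d="$1" -v r="$2" 'BEGIN { exit !(r <= d) }'
}

duration=639   # 10 min 39 s, assumed from the file name 10min39.mp4
rtime=213      # made-up sample value, not a measurement from the article

echo "speed factor: $(speed_factor "$duration" "$rtime")x"
if keeps_up "$duration" "$rtime"; then
    echo "decoding keeps up with playback"
else
    echo "decoding is slower than playback"
fi
```

A factor of 1.00 or more means the decode finishes within the playback time; the critical concurrency is the point where the factor drops below 1.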

Note: corrections to the above are welcome.

Origin blog.csdn.net/weixin_47407737/article/details/128960356