H.265 video hardware decoding

Hardware decoding means using something other than the CPU for the work: a graphics card (GPU), a dedicated DSP, an FPGA, an ASIC chip, and so on. The current mainstream GPU acceleration platforms are Intel, AMD, and NVIDIA.

1. Comparison of software encoding and hardware encoding

Software encoding: direct and simple to implement, parameters are easy to adjust, and upgrades are easy, but the CPU load is heavy and throughput is lower than hardware encoding; quality at low bit rates is usually better than hardware encoding.
Hardware encoding: high performance, but quality at low bit rates is usually lower than software encoding. Some products have, however, ported excellent software encoding algorithms (such as x264) to GPU hardware platforms, with quality essentially equivalent to software encoding.

2. Mainstream GPU platform development frameworks

CUDA (CUVID): NVIDIA's proprietary programming framework, through which GPU compute resources can be invoked; NVIDIA cards only (CUVID is its video decoding component).
AMD APP: a general-purpose parallel programming framework proposed by AMD for its own GPUs. The standard is open: by supporting the OpenCL framework on both the CPU and the GPU, it fuses their computing power.
OpenCL: the Open Computing Language, a framework for writing programs that run on heterogeneous platforms, which can include CPUs, GPUs, and other processors. The goal is for the same code to be hardware-accelerated on different platforms.
Intel Quick Sync: a dedicated video codec module integrated into Intel GPUs; Intel integrated graphics only.
CUDA can only run on NVIDIA GPU hardware, whereas OpenCL targets any kind of parallel processor. OpenCL is the first truly open, royalty-free programming standard for general-purpose computing on heterogeneous systems, which can be built from CPUs, GPUs, DSPs, FPGAs, or other types of processors; a minimal enumeration sketch follows this list.
DXVA: short for DirectX Video Acceleration, a video hardware acceleration specification defined by Microsoft. It has two versions, DXVA 1.0 and DXVA 2.0, and almost all graphics cards support it.
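
To make the cross-platform claim concrete, here is a minimal sketch (my own illustration, not from the original text) that enumerates the OpenCL platforms and devices visible on a machine. It assumes an OpenCL runtime and headers are installed; link with -lOpenCL:

```c
/* Minimal OpenCL probe: list platforms (e.g. NVIDIA CUDA, AMD APP,
 * Intel OpenCL) and how many devices each one exposes. */
#include <CL/cl.h>
#include <stdio.h>

int main(void)
{
    cl_platform_id platforms[8];
    cl_uint nplat = 0;
    if (clGetPlatformIDs(8, platforms, &nplat) != CL_SUCCESS || nplat == 0) {
        fprintf(stderr, "no OpenCL runtime found\n");
        return 1;
    }
    for (cl_uint i = 0; i < nplat; i++) {
        char name[256];
        cl_uint ndev = 0;
        clGetPlatformInfo(platforms[i], CL_PLATFORM_NAME,
                          sizeof(name), name, NULL);
        /* CL_DEVICE_TYPE_ALL counts CPUs, GPUs, and accelerators alike */
        clGetDeviceIDs(platforms[i], CL_DEVICE_TYPE_ALL, 0, NULL, &ndev);
        printf("platform %u: %s, %u device(s)\n", i, name, ndev);
    }
    return 0;
}
```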

3. Pipeline differences

Hardware decode + software encode: read (ffmpeg) -> decoder (NVIDIA) -> queue -> encoder (ffmpeg)
Software decode + software encode: read (ffmpeg) -> decoder (ffmpeg) -> encoder (ffmpeg)
In the hardware pipeline a queue is maintained between decoding and encoding, with its length set to 20: decoding is faster than encoding, so without the queue data would be overwritten and frames lost. A sketch of such a queue follows.
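
A minimal sketch of such a bounded queue (my own illustration; the names and the pthreads implementation are assumptions, not from the original pipeline). The faster decoder blocks when the queue is full instead of overwriting frames:

```c
/* Bounded decoder->encoder frame queue (pthreads sketch). */
#include <pthread.h>
#include <string.h>

#define QUEUE_LEN 20  /* queue length used in the text */

typedef struct {
    void *frames[QUEUE_LEN];   /* e.g. AVFrame* from the hw decoder */
    int head, tail, count;
    pthread_mutex_t lock;
    pthread_cond_t not_full, not_empty;
} FrameQueue;

void fq_init(FrameQueue *q)
{
    memset(q, 0, sizeof(*q));
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_full, NULL);
    pthread_cond_init(&q->not_empty, NULL);
}

/* Called by the (faster) decoder thread: blocks when full, so no
 * frame is overwritten or dropped while the encoder catches up. */
void fq_push(FrameQueue *q, void *frame)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == QUEUE_LEN)
        pthread_cond_wait(&q->not_full, &q->lock);
    q->frames[q->tail] = frame;
    q->tail = (q->tail + 1) % QUEUE_LEN;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Called by the encoder thread: blocks when empty. */
void *fq_pop(FrameQueue *q)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->not_empty, &q->lock);
    void *frame = q->frames[q->head];
    q->head = (q->head + 1) % QUEUE_LEN;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return frame;
}
```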

4. NVIDIA CUVID, Intel Quick Sync, and DXVA2 (which comes in copy-back and native variants): what is the difference between these decoding methods?

NVIDIA CUVID: NVIDIA's dedicated hardware decoding interface; hardware de-interlacing can be enabled.
Intel Quick Sync: the dedicated hardware decoding interface of Intel integrated graphics. CPU usage is about 5–10% higher than the other hardware decoding modes; hardware de-interlacing can be enabled.
DXVA2 (copy-back): a hardware acceleration interface defined by Microsoft; AMD, NVIDIA, and Intel graphics cards can all use it. It copies the decoded frames back to system memory, and because of these extra copies its performance is slightly worse than native; the advantage is that filters can be inserted between the decoder and the renderer.
DXVA2 (native): a hardware acceleration interface defined by Microsoft; AMD, NVIDIA, and Intel graphics cards can likewise use it. Decoded frames are not copied back to system memory but rendered directly, so performance is better than copy-back; the disadvantage is that it is more restrictive.
The recommended order of decoding methods is therefore: DXVA2 (native) > DXVA2 (copy-back) > NVIDIA CUVID or Intel Quick Sync.

5. NVIDIA hardware codec solutions

1. Use the codec interfaces in the SDK

NVIDIA provides two SDKs for video encoding and decoding:
NVENC -- responsible for hardware encoding
NVCUVID -- responsible for hardware decoding
NVENC is a separate SDK integrated into recent graphics card drivers; after installing the latest driver you can find the related libraries. On Ubuntu 14.04 the library files are under the /usr/lib/nvidia-352/ directory.
NVCUVID is a CUDA component included in recent CUDA Toolkits, but the library file libnvcuvid.so ships with the graphics driver's libraries. Earlier driver versions also included a hardware encoder called NVCUVENC as the counterpart of NVCUVID, but that component has been replaced by NVENC.
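
A quick way to check which of these libraries the installed driver actually exposes is to probe them with dlopen. A minimal Linux-only sketch (my own illustration; the SONAMEs below are the standard driver library names; build with gcc probe.c -ldl):

```c
/* Probe for the NVENC and NVCUVID driver libraries. */
#include <dlfcn.h>
#include <stdio.h>

int main(void)
{
    const char *libs[] = {
        "libnvidia-encode.so.1", /* implements the NVENC API (nvEncodeAPI.h) */
        "libnvcuvid.so.1"        /* implements the NVCUVID decode API */
    };
    for (int i = 0; i < 2; i++) {
        void *h = dlopen(libs[i], RTLD_LAZY);
        printf("%-24s %s\n", libs[i], h ? "found" : "NOT found");
        if (h)
            dlclose(h);
    }
    return 0;
}
```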

2. Use encoders that wrap OpenCL and the SDKs

This is, in my personal opinion, the most practical approach. FFmpeg's nvenc encoder is a wrapper around NVIDIA's NVENC; using it integrates seamlessly with the rest of FFmpeg. FFmpeg also contains a wrapper for Intel QSV. I have not yet found information on a corresponding AMD interface.
However, FFmpeg only wrapped the NVENC interface at the time; there was no encapsulation of NVCUVID, so if you needed the corresponding decoders you had to implement the FFmpeg interface yourself. (Newer FFmpeg builds do ship cuvid decoders such as hevc_cuvid, as the log in section 7 shows.)
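
On the encoding side, here is a minimal sketch (my own illustration) of opening FFmpeg's nvenc wrapper through the libavcodec C API. The parameter values are illustrative, and the "p4" preset assumes an FFmpeg and driver new enough to ship the p1–p7 NVENC presets:

```c
/* Open the h264_nvenc encoder via libavcodec. */
#include <libavcodec/avcodec.h>
#include <libavutil/opt.h>
#include <stdio.h>

int main(void)
{
    const AVCodec *codec = avcodec_find_encoder_by_name("h264_nvenc");
    if (!codec) {
        fprintf(stderr, "h264_nvenc not compiled in or driver missing\n");
        return 1;
    }
    AVCodecContext *ctx = avcodec_alloc_context3(codec);
    ctx->width     = 960;               /* matches the sample stream below */
    ctx->height    = 576;
    ctx->time_base = (AVRational){1, 25};
    ctx->pix_fmt   = AV_PIX_FMT_NV12;   /* NVENC accepts NV12 input */
    ctx->bit_rate  = 4000000;
    av_opt_set(ctx->priv_data, "preset", "p4", 0); /* nvenc private option */
    if (avcodec_open2(ctx, codec, NULL) < 0) {
        fprintf(stderr, "failed to open h264_nvenc\n");
        return 1;
    }
    /* ... feed frames with avcodec_send_frame()/avcodec_receive_packet() ... */
    avcodec_free_context(&ctx);
    return 0;
}
```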
libx264 has an OpenCL wrapper, but when I tried this feature on Windows I did not get it to work.
There is also the open-source format converter HandBrake, which includes wrappers for Intel Quick Sync, image scaling using OpenCL, and x264's OpenCL support. The drawback of this project is that the documentation is not very rich, so it is hard to study.

To use the nvenc encoder in FFmpeg you need to add the --enable-nvenc option to the configure options (in old versions; new versions detect it automatically and instead expose --disable-nvenc).
This option depends on the nvEncodeAPI.h header file, which is not included in the proprietary driver. Download the NVIDIA VIDEO CODEC SDK; after unpacking, the header is in the Samples/common/inc directory. Copy it to a directory the compiler can find, after which the build passes and you get a library containing the nvenc encoder.
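For example, on an old FFmpeg version without auto-detection (the include path is illustrative; point it wherever you copied the header):

./configure --enable-nvenc --extra-cflags=-I/path/to/nv_sdk/Samples/common/inc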

6. ffmpeg hardware-decodes and renders video, yet CPU usage is still high

The high CPU usage mainly comes from av_hwframe_transfer_data(sw_frame, frame, 0), which copies each frame back to system memory on the CPU. Do not use this function; render on the GPU instead, with D3D + DXVA2 or CUDA + OpenGL.
OpenCL is mainly used for general-purpose parallel computing: based on user-defined data structures and code, it lets GPUs handle all kinds of general computing tasks, such as video processing, financial modeling, scientific computing, and image processing, making full use of the parallel scalability of modern GPUs. OpenCL runs on multiple hardware platforms, including CPUs, GPUs, and FPGAs, and is independent of OpenGL, Direct3D, and other APIs, so it can interoperate with different graphics APIs.
OpenGL, by contrast, focuses on computer graphics processing and rendering: it provides a powerful rendering pipeline with broad support and a wide range of uses, and in games and virtual-reality applications it is the standard API for real-time rendering. OpenGL cannot perform general-purpose computing tasks itself; for those you need another API such as OpenCL or CUDA.
av_hwframe_transfer_data: copies GPU data back to CPU memory.
dxva2_retrieve_data_call: retrieves the data and renders at the same time.
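
A minimal sketch of the two paths (my own illustration; it assumes hw_frame is a decoded hardware AVFrame, e.g. AV_PIX_FMT_CUDA from hevc_cuvid):

```c
/* Copy-back path: this GPU->CPU download is where the CPU time goes. */
#include <libavcodec/avcodec.h>
#include <libavutil/hwcontext.h>

int download_frame(AVFrame *hw_frame, AVFrame **out_sw)
{
    AVFrame *sw = av_frame_alloc();
    if (!sw)
        return AVERROR(ENOMEM);
    /* Allocates a CPU-side frame (e.g. NV12) and copies the surface down. */
    int ret = av_hwframe_transfer_data(sw, hw_frame, 0);
    if (ret < 0) {
        av_frame_free(&sw);
        return ret;
    }
    *out_sw = sw;
    return 0;
}

/* Hard-rendering path: skip download_frame() entirely. For CUDA + OpenGL,
 * hw_frame->data[0] is a CUDA device pointer; register an OpenGL texture
 * with cuGraphicsGLRegisterImage()/cuGraphicsMapResources() and copy
 * on-GPU with cuMemcpy2DAsync(), so pixels never return to system memory.
 * For D3D9 + DXVA2, hw_frame->data[3] is an IDirect3DSurface9* that can
 * be handed straight to the renderer (e.g. StretchRect onto the back
 * buffer). */
```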

7. Common commands

1. List the hardware acceleration methods supported by the ffmpeg build:

ffmpeg -hwaccels
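
Note that -hwaccels lists the acceleration methods compiled in; to see the hardware decoder wrappers themselves, filter the decoder list (grep on Linux/macOS, findstr on Windows), for example:

ffmpeg -decoders | grep cuvid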


2. ffmpeg hardware transcoding command (matching the hevc_cuvid -> h264_nvenc stream mapping in the log below):

ffmpeg -c:v hevc_cuvid -i d:\input.mp4 -c:v h264_nvenc d:\output.mp4


The meaning of the frame, fps, q, size, time, bitrate, and speed values that ffmpeg prints during transcoding:
frame is the current video frame number; fps is how many video frames are encoded per second; q is the encoding quality (quantizer); size is the amount of data written to the output file; time is the position in the source that has been processed so far; bitrate is the bit rate (in kbits/s, i.e. 1000 bits per second); speed is the encoding speed (seconds of video processed per second of ffmpeg wall-clock time).
From the speed you can estimate how long ffmpeg will take: the time to process a whole video file is its duration in seconds divided by the speed.
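Taking the transcode log below as an example: the input duration is 03:37:39.62 ≈ 13,060 seconds and the reported speed is 29.1x, so processing the whole file should take roughly 13,060 / 29.1 ≈ 449 seconds, about 7.5 minutes.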

ffmpeg version 4.4-full_build-www.gyan.dev Copyright (c) 2000-2021 the FFmpeg developers
  built with gcc 10.2.0 (Rev6, Built by MSYS2 project)
  configuration: --enable-gpl --enable-version3 --enable-static --disable-w32threads --disable-autodetect --enable-fontconfig --enable-iconv --enable-gnutls --enable-libxml2 --enable-gmp --enable-lzma --enable-libsnappy --enable-zlib --enable-librist --enable-libsrt --enable-libssh --enable-libzmq --enable-avisynth --enable-libbluray --enable-libcaca --enable-sdl2 --enable-libdav1d --enable-libzvbi --enable-librav1e --enable-libsvtav1 --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxvid --enable-libaom --enable-libopenjpeg --enable-libvpx --enable-libass --enable-frei0r --enable-libfreetype --enable-libfribidi --enable-libvidstab --enable-libvmaf --enable-libzimg --enable-amf --enable-cuda-llvm --enable-cuvid --enable-ffnvcodec --enable-nvdec --enable-nvenc --enable-d3d11va --enable-dxva2 --enable-libmfx --enable-libglslang --enable-vulkan --enable-opencl --enable-libcdio --enable-libgme --enable-libmodplug --enable-libopenmpt --enable-libopencore-amrwb --enable-libmp3lame --enable-libshine --enable-libtheora --enable-libtwolame --enable-libvo-amrwbenc --enable-libilbc --enable-libgsm --enable-libopencore-amrnb --enable-libopus --enable-libspeex --enable-libvorbis --enable-ladspa --enable-libbs2b --enable-libflite --enable-libmysofa --enable-librubberband --enable-libsoxr --enable-chromaprint
  libavutil      56. 70.100 / 56. 70.100
  libavcodec     58.134.100 / 58.134.100
  libavformat    58. 76.100 / 58. 76.100
  libavdevice    58. 13.100 / 58. 13.100
  libavfilter     7.110.100 /  7.110.100
  libswscale      5.  9.100 /  5.  9.100
  libswresample   3.  9.100 /  3.  9.100
  libpostproc    55.  9.100 / 55.  9.100
Input #0, mpeg, from 'd:\input.mp4':
  Duration: 03:37:39.62, start: 37915.032111, bitrate: 354 kb/s
  Stream #0:0[0x1e0]: Video: hevc (Main), yuvj420p(pc, bt709), 960x576, 50 fps, 25 tbr, 90k tbn, 50 tbc
  Stream #0:1[0x1c0]: Audio: pcm_mulaw, 8000 Hz, mono, s16, 64 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (hevc (hevc_cuvid) -> h264 (h264_nvenc))
Press [q] to stop, [?] for help
Output #0, mp4, to 'd:\ch01_202203302130002-hw3.mp4':
  Metadata:
    encoder         : Lavf58.76.100
  Stream #0:0: Video: h264 (Main) (avc1 / 0x31637661), nv12(pc, bt709, progressive), 960x576 [SAR 1:1 DAR 5:3], q=2-31, 4000 kb/s, 25 fps, 12800 tbn
    Metadata:
      encoder         : Lavc58.134.100 h264_nvenc
    Side data:
      cpb: bitrate max/min/avg: 0/0/4000000 buffer size: 8000000 vbv_delay: N/A
frame=200781 fps=728 q=9.0 size= 3945472kB time=02:13:51.12 bitrate=4024.5kbits/s dup=0 drop=5 speed=29.1x

3. The nvidia-smi command for NVIDIA graphics cards
3.1 Display the current status of the GPU:

nvidia-smi

Detailed parameter explanation:
**GPU:** the GPU's index in this machine (when there are multiple graphics cards, numbering starts from 0); in the original screenshot: 0.
**Fan:** fan speed (0%–100%; N/A means there is no fan). This is the speed the driver requests; if the fan is physically obstructed, the displayed speed may not actually be reached.
**Name:** GPU model; in the screenshot: GeForce MX250/RTX 2080Ti.
**Temp:** GPU temperature (too high a temperature makes the GPU clock down).
**Perf:** the GPU's performance state, from P0 (maximum performance) to P12 (minimum performance); in the screenshot: P0.
**Persistence-M:** persistence mode status. Persistence mode draws more power, but new GPU applications start in less time; in the screenshot: off.
**Pwr: Usage/Cap:** power display: how much is currently drawn (Usage) out of the total capacity (Cap).
**Bus-Id:** the GPU's bus address, in the form domain:bus:device.function.
**Disp.A:** Display Active, whether the GPU's display output is initialized.
**Memory-Usage:** video memory usage.
**Volatile GPU-Util:** GPU utilization.
**Uncorr. ECC:** whether error checking and correction (ECC) is enabled: 0/disabled, 1/enabled.
**Compute M.:** compute mode: 0/DEFAULT, 1/EXCLUSIVE_PROCESS, 2/PROHIBITED.
**Processes:** the video memory usage, process ID, and GPU occupied by each process.
**Type:** process type: C means a compute process, G means a graphics process, and C+G means both.
3.2 Refresh the status every N seconds:

nvidia-smi -l <seconds>

3.3 Write the monitoring results to a file, specifying which fields to record:

nvidia-smi -l 1 --format=csv --filename=report.csv --query-gpu=timestamp,name,index,utilization.gpu,memory.total,memory.used,power.draw

