[2023 CANN Training Camp Season 1] Application Development (Elementary) Chapter 5 - Media Data Processing

1. Media data processing

Because of factors such as network structure and training method, most neural network models place format restrictions on their input data. In computer vision these restrictions mostly concern image size, color space, normalization parameters, and so on. If the size or format of the source image or video does not match the model's requirements, the source data must be processed into images or video that the model can accept.

Data preprocessing approach

AIPP and DVPP can be used independently or in combination. When combined, DVPP is generally used first to decode, crop, and resize the images or videos. Because of DVPP hardware constraints, the format and resolution of the pictures produced by DVPP may still not meet the model's requirements, so the data then goes through AIPP for further processing such as color space conversion, cropping, and padding.
[Figure: data preprocessing flow combining DVPP and AIPP]

DVPP data preprocessing functions

The Ascend AI processor has a built-in image processing unit, the DVPP (Digital Vision Pre-Processor), which provides powerful hardware acceleration for media processing. The heterogeneous computing architecture CANN provides an entry point to this image processing hardware: the AscendCL interface, through which developers can perform image processing and make use of the Ascend AI processor's computing power.
[Figure: DVPP hardware units and the AscendCL media data processing interfaces]

Width stride and height stride

Because the hardware requires image width and height to be aligned, DVPP data preprocessing introduces the concepts of width stride and height stride, which represent the width and height after alignment. Different DVPP functions and different input/output image formats have different alignment requirements. When chaining multiple DVPP functions (for example JPEGD followed by VPC), pay special attention to the alignment notes in the interface documentation.
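For illustration, here is a minimal stride calculation. The alignment values below are only examples of figures commonly seen for JPEGD output; the real requirements depend on the function and format, so treat the numbers as placeholders and check the documentation.

```c
#include <stdint.h>

/* Round 'value' up to the nearest multiple of 'align' (align must be a power of two). */
static uint32_t align_up(uint32_t value, uint32_t align) {
    return (value + align - 1) & ~(align - 1);
}

/* Example only: width aligned to 128 and height aligned to 16.
 * The actual alignment depends on the DVPP function and image format. */
void example_strides(uint32_t width, uint32_t height,
                     uint32_t *widthStride, uint32_t *heightStride) {
    *widthStride  = align_up(width, 128);
    *heightStride = align_up(height, 16);
}
```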

Memory application and memory release

Memory application/release interface

Before implementing VPC, JPEGD, JPEGE, or other media data processing functions, if you need memory to store input or output data, you must call the dedicated memory application/release interfaces; the memory applied for and released in this way is on the Device. If several functions are used in series and need to reuse the same block of memory, apply for it according to the largest memory requirement. For media data processing V1, call acldvppMalloc to apply for memory and acldvppFree to release it. For media data processing V2, call hi_mpi_dvpp_malloc to apply for memory and hi_mpi_dvpp_free to release it.
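A minimal sketch of the V1 interfaces, assuming the usual AscendCL initialization (aclInit, device and context setup) has already been done; the buffer size is only an illustration:

```c
#include "acl/acl.h"
#include "acl/ops/acl_dvpp.h"

/* Apply for a Device buffer for DVPP input/output and release it afterwards.
 * The size here (one aligned 1920x1088 YUV420SP frame) is just an example. */
aclError dvpp_buffer_example(void) {
    void *devBuf = NULL;
    size_t bufSize = 1920 * 1088 * 3 / 2;

    aclError ret = acldvppMalloc(&devBuf, bufSize);   /* V1 dedicated interface */
    if (ret != ACL_SUCCESS) {
        return ret;
    }

    /* ... use devBuf as DVPP input or output ... */

    return acldvppFree(devBuf);                       /* matching release interface */
}
```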

Memory size

The formula for calculating the required memory size differs for each DVPP function and each input/output image format. When using a DVPP function, refer to the detailed documentation.
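As one illustration (an assumption based on the common YUV420SP semi-planar layout, not a formula from this article), the buffer size for such an image is usually derived from the aligned width and height:

```c
#include <stdint.h>

/* Sketch: buffer size for a YUV420SP (NV12/NV21) image, assuming widthStride
 * and heightStride are already aligned as the DVPP function requires:
 * a full-resolution Y plane plus an interleaved UV plane at quarter resolution. */
uint32_t yuv420sp_buffer_size(uint32_t widthStride, uint32_t heightStride) {
    return widthStride * heightStride * 3 / 2;
}
```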

Memory reuse

Memory requested through the dedicated media data processing interfaces can also be used by other tasks. For example, to reduce copies and improve performance, the output of media data processing can be used directly as the input of model inference, so the same memory is reused. However, the address space that media data processing can access is limited. To make sure enough of it remains available for media data processing, it is recommended that functions other than media data processing (such as model loading) apply for memory through the general memory management interfaces aclrtMalloc, aclrtMallocHost, or aclrtMallocCached.
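A sketch of the reuse pattern, assuming the VPC output buffer already matches the model's expected input layout; the helper name and the abbreviated dataset handling are illustrative, not from the original article:

```c
#include "acl/acl.h"

/* Wrap a DVPP output buffer directly as a model input -- no extra copy.
 * modelId and the output dataset are assumed to be prepared elsewhere. */
aclError run_inference_on_dvpp_output(uint32_t modelId, void *vpcOutBuf,
                                      size_t vpcOutSize, aclmdlDataset *output) {
    aclmdlDataset *input = aclmdlCreateDataset();
    aclDataBuffer *buf = aclCreateDataBuffer(vpcOutBuf, vpcOutSize);
    aclmdlAddDatasetBuffer(input, buf);

    aclError ret = aclmdlExecute(modelId, input, output);

    aclDestroyDataBuffer(buf);        /* destroys the wrapper, not the DVPP memory */
    aclmdlDestroyDataset(input);
    return ret;
}
```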

Number of channels

Before implementing each media data processing function, an interface must be called to create a channel for that function. Creating and destroying channels involves applying for and releasing resources, and repeatedly creating and destroying channels hurts performance, so manage channels according to the actual scenario. For example, if VPC image processing runs continuously, create the VPC channel once and destroy it only after all VPC calls have finished. Too many channels also increases CPU and memory usage on the Device; for the recommended number of channels, refer to the performance indicators in the chapter for each function.
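A sketch of this channel lifecycle with the V1 interfaces, assuming a VPC workload; the per-image processing is abbreviated:

```c
#include "acl/ops/acl_dvpp.h"

/* Create the VPC channel once, reuse it for every image, destroy it at the end. */
void process_batch(void) {
    acldvppChannelDesc *channelDesc = acldvppCreateChannelDesc();
    aclError ret = acldvppCreateChannel(channelDesc);
    if (ret != ACL_SUCCESS) {
        acldvppDestroyChannelDesc(channelDesc);
        return;
    }

    /* Reuse the same channel for each image instead of recreating it:
     * for each image: acldvppVpcResizeAsync(channelDesc, ...); aclrtSynchronizeStream(...); */

    acldvppDestroyChannel(channelDesc);
    acldvppDestroyChannelDesc(channelDesc);
}
```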

2. JPEG image decoding

Image Resolution Constraints

[Table: JPEGD image resolution constraints and output memory size formulas]

When implementing the JPEGD image decoding function, input and output data must be stored, so the dedicated memory application/release interfaces must be called to apply for input and output memory on the Device; the lifetime of this memory is managed by the user. The input memory size is the size actually occupied by the input image; for the output memory size, refer to the calculation formulas in the table above.
JPEGD supports only Huffman coding, and the color space of the original image before compression must be YUV with component sampling ratios of 4:4:4, 4:2:2, 4:2:0, 4:0:0, or 4:4:0. Arithmetic coding, progressive JPEG, and JPEG2000 are not supported.
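A sketch of a typical V1 decode flow, assuming the compressed image is already in Device memory and a decode channel and stream exist; the alignment values and error handling are simplified illustrations:

```c
#include "acl/ops/acl_dvpp.h"

aclError decode_jpeg(acldvppChannelDesc *channelDesc, aclrtStream stream,
                     void *jpegData, uint32_t jpegSize) {
    uint32_t width = 0, height = 0;
    int32_t components = 0;
    acldvppJpegGetImageInfo(jpegData, jpegSize, &width, &height, &components);

    /* Let the library predict the output buffer size for the chosen format. */
    uint32_t decSize = 0;
    acldvppJpegPredictDecSize(jpegData, jpegSize,
                              PIXEL_FORMAT_YUV_SEMIPLANAR_420, &decSize);

    void *outBuf = NULL;
    acldvppMalloc(&outBuf, decSize);

    acldvppPicDesc *outDesc = acldvppCreatePicDesc();
    acldvppSetPicDescData(outDesc, outBuf);
    acldvppSetPicDescSize(outDesc, decSize);
    acldvppSetPicDescFormat(outDesc, PIXEL_FORMAT_YUV_SEMIPLANAR_420);
    acldvppSetPicDescWidth(outDesc, width);
    acldvppSetPicDescHeight(outDesc, height);
    /* Example alignments only; check the JPEGD documentation for the exact values. */
    acldvppSetPicDescWidthStride(outDesc, (width + 127) / 128 * 128);
    acldvppSetPicDescHeightStride(outDesc, (height + 15) / 16 * 16);

    aclError ret = acldvppJpegDecodeAsync(channelDesc, jpegData, jpegSize,
                                          outDesc, stream);
    if (ret == ACL_SUCCESS) {
        ret = aclrtSynchronizeStream(stream);   /* wait for the decode to finish */
    }

    acldvppDestroyPicDesc(outDesc);
    /* outBuf now holds the decoded YUV420SP image; free it later with acldvppFree. */
    return ret;
}
```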

3. VPC image scaling

Image Resolution Constraints

[Table: VPC image resolution constraints]

Image format, width/height, and memory

[Table: VPC supported image formats, width/height alignment, and memory size formulas]
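A sketch of a V1 resize call, assuming the input picture description (for example, the JPEGD output above) is already filled in; the alignment values are illustrative and should be checked against the VPC documentation:

```c
#include "acl/ops/acl_dvpp.h"

aclError resize_to_model_input(acldvppChannelDesc *channelDesc, aclrtStream stream,
                               acldvppPicDesc *inDesc,
                               uint32_t outWidth, uint32_t outHeight) {
    /* Example alignments only (width to 16, height to 2 for YUV420SP). */
    uint32_t outWidthStride  = (outWidth + 15) / 16 * 16;
    uint32_t outHeightStride = (outHeight + 1) / 2 * 2;
    uint32_t outSize = outWidthStride * outHeightStride * 3 / 2;

    void *outBuf = NULL;
    acldvppMalloc(&outBuf, outSize);

    acldvppPicDesc *outDesc = acldvppCreatePicDesc();
    acldvppSetPicDescData(outDesc, outBuf);
    acldvppSetPicDescSize(outDesc, outSize);
    acldvppSetPicDescFormat(outDesc, PIXEL_FORMAT_YUV_SEMIPLANAR_420);
    acldvppSetPicDescWidth(outDesc, outWidth);
    acldvppSetPicDescHeight(outDesc, outHeight);
    acldvppSetPicDescWidthStride(outDesc, outWidthStride);
    acldvppSetPicDescHeightStride(outDesc, outHeightStride);

    acldvppResizeConfig *config = acldvppCreateResizeConfig();
    aclError ret = acldvppVpcResizeAsync(channelDesc, inDesc, outDesc, config, stream);
    if (ret == ACL_SUCCESS) {
        ret = aclrtSynchronizeStream(stream);
    }

    acldvppDestroyResizeConfig(config);
    acldvppDestroyPicDesc(outDesc);
    /* outBuf now holds the resized image; release it with acldvppFree when done. */
    return ret;
}
```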

Source: blog.csdn.net/qq_45257495/article/details/130875866