Data Arrangement and Stride Alignment

1 Data Arrangement

1.1 The concept of data arrangement

In deep learning frameworks, a feature map is usually represented as a four-dimensional array whose dimensions are: batch size N, number of channels C, feature map height H, and feature map width W. Data layout refers to the order in which these four dimensions are arranged, most commonly NHWC or NCHW. From a human perspective both are simply four-dimensional data, but computer memory is linear, so the four-dimensional array must be stored in one-dimensional form; NHWC and NCHW differ in the rules by which that data is laid out in memory. Note that the concepts of NHWC and NCHW do not apply to the NV12 (YUV420) data type: every 4 Y components share 1 set of UV components, so there is no notion of channels.

1.2 NHWC

For a 2x2 RGB image stored as NHWC, the dimensions are laid out in memory in the order C, W, H, N (C varying fastest), so the pixel values at the same position in different channels are stored next to each other:

(figure: NHWC memory layout of a 2x2 RGB image)

1.3 NCHW

If the same 2x2 RGB image is stored as NCHW, the dimensions are laid out in memory in the order W, H, C, N (W varying fastest): all R values are stored first, then all G values, and finally all B values:

(figure: NCHW memory layout of a 2x2 RGB image)
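The difference between the two layouts comes down to how a (n, c, h, w) coordinate maps to a one-dimensional offset. A minimal C sketch of both index formulas (this is just the standard row-major arithmetic, not anything Horizon-specific):

```c
#include <stddef.h>

/* Flat offset of element (n, c, h, w) in each layout.
 * NCHW: w varies fastest, then h, then c, then n.
 * NHWC: c varies fastest, then w, then h, then n. */
static size_t offset_nchw(size_t C, size_t H, size_t W,
                          size_t n, size_t c, size_t h, size_t w) {
    return ((n * C + c) * H + h) * W + w;
}

static size_t offset_nhwc(size_t C, size_t H, size_t W,
                          size_t n, size_t c, size_t h, size_t w) {
    return ((n * H + h) * W + w) * C + c;
}
```

For the 2x2 RGB image (C=3, H=2, W=2): in NHWC the three channel values of pixel (0,0) sit at offsets 0, 1, 2, while in NCHW the four R values occupy offsets 0 through 3 and the first G value only appears at offset 4.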

1.4 Support

The PyTorch, Caffe, and PaddlePaddle deep learning frameworks use the NCHW format. TensorFlow uses NHWC by default, but its GPU version also supports NCHW. The Horizon chip algorithm toolchain can convert and compile models trained with either layout.

2 Stride Alignment

2.1 The concept of stride

Stride refers to the actual number of bytes each image row occupies when the image is stored in memory. Most processors are 32-bit or 64-bit, so each complete read is most efficient when it covers a multiple of 4 or 8 bytes; other sizes force the processor to do extra handling and reduce efficiency. To let the computer process the image efficiently, extra data is therefore appended to each row of the original data to reach 4-byte or 8-byte alignment. This operation is also called Padding, and the actual alignment rules depend on the specific hardware and software system.

Suppose we have an 8-bit grayscale image 20 pixels high and 30 pixels wide; each row then carries 30 bytes of valid data. If the computer's alignment rule is 8 bytes, the stride of the aligned image is 32 bytes, and each row needs 2 bytes of padding.
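The stride computation above can be sketched as a small helper (assuming the usual round-up-to-a-multiple rule; the alignment value itself depends on the system):

```c
#include <stddef.h>

/* Round a row size up to the next multiple of the alignment. */
static size_t aligned_stride(size_t row_bytes, size_t alignment) {
    return (row_bytes + alignment - 1) / alignment * alignment;
}

/* Bytes of padding appended to each row. */
static size_t padding_per_row(size_t row_bytes, size_t alignment) {
    return aligned_stride(row_bytes, alignment) - row_bytes;
}
```

For the 30-byte rows above with 8-byte alignment, aligned_stride(30, 8) is 32 and padding_per_row(30, 8) is 2, so the 20-row image occupies 20 * 32 = 640 bytes rather than 600.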

2.2 BPU stride alignment

The content above describes only general stride rules; the BPU in Horizon's Journey and Sunrise series chips has special stride alignment rules of its own. For example, for NV12 input, on the premise that H and W are even, the Width must be aligned to a multiple of 16 bytes (refer to the analysis of model input and output alignment rules at https://developer.horizon.ai/forumDetail/118364000835765837 ). Different data layouts and data types follow different BPU stride alignment rules. On the board side, the alignment of image data is completed automatically by the model inference library (using the code input[i].properties.alignedShape = input[i].properties.validShape;); when writing deployment code you only need to allocate BPU memory according to the aligned byte size (aligning featuremap data still requires user-written code; refer to horizon_runtime_sample/code/03_misc/resnet_featur in the OE package). The aligned byte size can be read directly from the model parameters, which makes it very convenient to use.
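As an illustration of the NV12 rule just described, the sketch below computes an aligned buffer size by padding the width to a multiple of 16. These are hand-written helpers for illustration only, not a Horizon API: in real deployment code the authoritative values are the alignedShape and alignedByteSize reported by the model itself.

```c
#include <stddef.h>

/* Round up to a multiple of 16 (16 is a power of two, so a bit mask works). */
static size_t align16(size_t x) {
    return (x + 15) & ~(size_t)15;
}

/* Hypothetical helper: padded buffer size of an NV12 image whose
 * width is aligned to 16 bytes (H and W assumed even). */
static size_t nv12_aligned_bytes(size_t height, size_t width) {
    size_t stride   = align16(width);        /* padded row size in bytes    */
    size_t y_bytes  = stride * height;       /* Y plane: 1 byte per pixel   */
    size_t uv_bytes = stride * height / 2;   /* interleaved UV: half height */
    return y_bytes + uv_bytes;
}
```

For a 100x30 image the stride becomes 32, giving 32 * 100 * 3 / 2 = 4800 bytes; when the width is already a multiple of 16 (e.g. 1280x720) no padding is added.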

typedef struct {
  hbDNNTensorShape validShape;    // valid (unpadded) shape of the data
  hbDNNTensorShape alignedShape;  // aligned (padded) shape of the data
  int32_t tensorLayout;
  int32_t tensorType;
  hbDNNQuantiShift shift;
  hbDNNQuantiScale scale;
  hbDNNQuantiType quantiType;
  int32_t quantizeAxis;
  int32_t alignedByteSize;        // size in bytes of the data after alignment
  int32_t stride[HB_DNN_TENSOR_MAX_DIMENSIONS];
} hbDNNTensorProperties;

The hbDNNTensorProperties structure provided by the toolchain's C++ SDK contains detailed information about the model's input/output tensors: validShape is the valid size of the data, alignedShape is the aligned size, and alignedByteSize is the byte size after alignment. Using these fields properly makes deployment code easier to write; for details, refer to the BPU SDK API chapter of the toolchain manual.

2.3 Removing alignment

Alignment exists to accommodate the image-reading performance of the hardware and software; once the computation is finished, the alignment must be removed so that only the valid data remains. If the model ends with a BPU node, it outputs data in alignedShape, and the user must write code to skip the padding (you can use hrt_model_exec model_info to view the alignedShape and validShape of the model's inputs and outputs). If the model ends with a CPU node, the BPU and CPU remove the alignment automatically during data transfer, and no manual action is required.
(figure: a model whose final node runs on the BPU requires the user to remove the padding manually)

(figure: a model whose final node runs on the CPU has the padding removed automatically, with no user intervention)
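Stripping the padding from a BPU-tailed model's output amounts to copying only the valid bytes of each row. A generic sketch (in practice the row count, valid bytes, and stride would come from validShape and alignedShape):

```c
#include <stddef.h>
#include <string.h>

/* Copy only the valid bytes of each row out of a stride-padded
 * buffer; dst receives the rows packed back-to-back. */
static void strip_row_padding(unsigned char *dst, const unsigned char *src,
                              size_t rows, size_t valid_bytes,
                              size_t stride) {
    for (size_t r = 0; r < rows; ++r) {
        memcpy(dst + r * valid_bytes, src + r * stride, valid_bytes);
    }
}
```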


Origin blog.csdn.net/weixin_38346042/article/details/131792300