NHWC and NCHW data arrangement and conversion (model deployment)

1. Concept

First of all, NHWC and NCHW are two layouts for storing a batch of images, defining how the data of that batch is arranged in the computer's memory. N is the number of images in the batch, C is the number of channels in each image, H is the pixel height of the images, and W is the pixel width. The channel count C can take several values. For example, an RGB image has 3 channels: R (red), G (green), and B (blue) each occupy one channel, so every pixel is represented by three channel values, each in the range [0, 255]; the superposition of the three channels gives the color of the pixel. RGB images also have a four-channel variant that adds an alpha channel for transparency. A grayscale image has only one channel. A YUV image also contains three channels, but unlike RGB, YUV data can be arranged in a variety of combinations.
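As a concrete illustration (a minimal sketch added here, not from the original post), the snippet below shows the shapes the two layouts describe for a hypothetical batch and the buffer size they both require:

#include <cstdio>

int main() {
    // Hypothetical batch: 8 RGB images of 224 x 224 pixels
    int N = 8, C = 3, H = 224, W = 224;

    // Both layouts hold the same number of elements; only the ordering differs
    long total = (long)N * C * H * W;   // 8 * 3 * 224 * 224 = 1204224 floats

    printf("NCHW shape: (%d, %d, %d, %d)\n", N, C, H, W);
    printf("NHWC shape: (%d, %d, %d, %d)\n", N, H, W, C);
    printf("total elements: %ld\n", total);
    return 0;
}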

2. Data layout understanding

Taking an image in RGB format as an example: an RGB image can be described by the three dimensions H, W, and C. The height and width of the image correspond to H and W, and each pixel is represented by three values for red (R), green (G), and blue (B), each in the range [0, 255], as shown in the following figure:

(figure: an RGB image described by the three dimensions H, W, and C)

2.1 Application of NCHW and NHWC in model inference

In actual deployment with frameworks such as NCNN, TensorRT, or Caffe, model inference returns the result data as a float* buffer, i.e. the data is stored as a one-dimensional array. If you know whether the returned data is arranged as NCHW or NHWC, parsing the result is straightforward. The two layouts arrange the data as follows:

(figure: how NCHW and NHWC lay out the same data in memory)

(1) NCHW arrangement

NCHW is arranged like a 4-dimensional numpy.array in Python: the innermost dimensions are H and W, the next is C, and the outermost is N. Since the returned data is a float* buffer, the 4-dimensional data is flattened from the innermost dimensions outwards (the H and W data are laid out first) into a one-dimensional array.
That is, [R11,R12,R13,...,R21,R22,R23,R24,...,G11,G12,G13,...,G21,G22,G23,G24,...,B11,B12,B13,...,B21,B22,B23,B24,...]: the elements of each channel are stored next to each other. For an RGB image of size HxWx3 arranged as NCHW, [0 : HxW] holds the pixel values of the R channel, [HxW : 2xHxW] holds the G channel, and [2xHxW : 3xHxW] holds the B channel.
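A minimal indexing sketch (added here for illustration, assuming a single image, i.e. N = 1): the value of channel c at row h, column w of an NCHW float* buffer sits at offset c*H*W + h*W + w.

// Read one value from an NCHW buffer (assumes N = 1)
inline float at_nchw(const float *data, int C, int H, int W,
                     int c, int h, int w) {
    return data[c * H * W + h * W + w];
    // e.g. at_nchw(data, 3, H, W, 0, i, j) reads the R value of pixel (i, j)
}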


(2) NHWC arrangement

NHWC is arranged like a 4-dimensional numpy.array in Python: the innermost dimension is C, then W, then H, and the outermost is N. Since the returned data is a float* buffer, the 4-dimensional data is flattened from the innermost dimension outwards (the C data is laid out first) into a one-dimensional array.

That is, [R11,G11,B11,R12,G12,B12,R13,G13,B13,...,R21,G21,B21,R22,G22,B22,...,Rij,Gij,Bij]: the values of all channels at pixel position ij are stored consecutively.
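Correspondingly, a minimal indexing sketch for NHWC (also added for illustration, assuming N = 1): the value of channel c at row h, column w sits at offset (h*W + w)*C + c.

// Read one value from an NHWC buffer (assumes N = 1)
inline float at_nhwc(const float *data, int C, int W,
                     int c, int h, int w) {
    return data[(h * W + w) * C + c];
    // e.g. at_nhwc(data, 3, W, 2, i, j) reads the B value of pixel (i, j)
}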

Finally, N is the number of images stored, i.e. the batch contains N images in total. NHWC and NCHW are two layouts for image data and are suited to different hardware acceleration scenarios. With GPU acceleration it is desirable for the pixels of the same channel to be accessed contiguously, so NCHW is generally used as the input data format; this way the memory accesses during CNN computation are contiguous, which is convenient.

Best practice: take both formats into account when designing the network, and ideally support switching between them flexibly. For example, when training on the GPU the input data is taken in NCHW format, while the inference results are returned in NHWC format.

3. Conversion between NHWC and NCHW

The two storage methods describe how image data resides in memory, and they can be converted into one another. Taking NHWC to NCHW as an example, the conversion can be done as follows:

3.1 NHWC to NCHW

int nhwc_to_nchw(float *out_data, float *in_data, int img_h, int img_w) {
  // Assumes a 3-channel image; out_data is split into three channel planes.
  // The channel order (here B, G, R) simply follows the order in the input buffer.
  float *hw1 = out_data;                    // first channel plane
  float *hw2 = hw1 + img_h * img_w;         // second channel plane
  float *hw3 = hw2 + img_h * img_w;         // third channel plane
  for (int hh = 0; hh < img_h; ++hh) {
    for (int ww = 0; ww < img_w; ++ww) {
      *hw1++ = *(in_data++);    // B
      *hw2++ = *(in_data++);    // G
      *hw3++ = *(in_data++);    // R
    }
  }
  return 0;
}
  • in_data is the input data arranged as NHWC; out_data is the converted NCHW data

3.2 NCHW to NHWC

int nchw_to_nhwc(float *out_data, float *in_data, int img_h, int img_w) {
  // Assumes a 3-channel image; in_data holds three channel planes of size img_h * img_w
  float *res = out_data;
  int hw = img_h * img_w;
  for (int j = 0; j < hw; ++j) {
    res[3 * j + 0] = in_data[j];            // channel 0 (e.g. B)
    res[3 * j + 1] = in_data[j + hw];       // channel 1 (e.g. G)
    res[3 * j + 2] = in_data[j + 2 * hw];   // channel 2 (e.g. R)
  }
  return 0;
}
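A small round-trip check (added as an illustration; the buffer size and dummy values are hypothetical) can verify that the two conversions above are inverses of each other:

#include <cassert>
#include <vector>

int main() {
    int img_h = 2, img_w = 3;
    std::vector<float> nhwc(img_h * img_w * 3);
    for (size_t i = 0; i < nhwc.size(); ++i) nhwc[i] = (float)i;  // dummy pixel data

    std::vector<float> nchw(nhwc.size()), back(nhwc.size());
    nhwc_to_nchw(nchw.data(), nhwc.data(), img_h, img_w);
    nchw_to_nhwc(back.data(), nchw.data(), img_h, img_w);

    for (size_t i = 0; i < nhwc.size(); ++i)
        assert(back[i] == nhwc[i]);  // the round trip should reproduce the original layout
    return 0;
}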

4. Image preprocessing (HWC to CHW, BGR to RGB, and normalization)

  • The image fed to the model is generally arranged as CHW, in RGB format, and normalized (divided by 255).
  • Assuming the image is read with OpenCV, the data is arranged as HWC and the format is BGR. Therefore, before feeding the network, HWC must be converted to CHW, BGR converted to RGB, and the values normalized.
  • The code is implemented as follows:
int img_preprocess(cv::Mat input_image, float *out_data, int img_h, int img_w)
{
    int image_area = input_image.cols * input_image.rows;
    unsigned char *pimage = input_image.data;
    // HWC -> CHW: point at the three destination channel planes
    float *hw_r = out_data + image_area * 0;
    float *hw_g = out_data + image_area * 1;
    float *hw_b = out_data + image_area * 2;
    // BGR -> RGB, with normalization (divide by 255)
    for (int i = 0; i < image_area; ++i, pimage += 3) {
        *hw_r++ = pimage[2] / 255.0f;
        *hw_g++ = pimage[1] / 255.0f;
        *hw_b++ = pimage[0] / 255.0f;
    }
    return 0;
}
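A minimal usage sketch (added for illustration; the file name and the 640 x 640 input size are assumptions, not taken from the original post):

#include <opencv2/opencv.hpp>
#include <vector>

int main() {
    cv::Mat img = cv::imread("test.jpg");            // hypothetical input image (BGR, HWC)
    cv::resize(img, img, cv::Size(640, 640));        // assumed network input size
    std::vector<float> input(3 * 640 * 640);
    img_preprocess(img, input.data(), 640, 640);     // CHW, RGB, normalized to [0, 1]
    return 0;
}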

5. Post-processing decode (taking YOLOX object detection as an example)

The main steps of post-processing decode include the following:

  • generate grid cells
  • Decode the prediction output into proposals
  • NMS

This article mainly introduces how to decode the prediction output into proposals, distinguishing the decoding for the NHWC and NCHW formats.

5.1 Model decode (NHWC)

struct Object
{
    cv::Rect_<float> rect;
    int label;
    float prob;
};

struct GridAndStride
{
    int grid0;
    int grid1;
    int stride;
};

static void generate_grids_and_stride(const int target_size, std::vector<int>& strides, std::vector<GridAndStride>& grid_strides)
{
    for (int i = 0; i < (int)strides.size(); i++)
    {
        int stride = strides[i];
        int num_grid = target_size / stride;
        for (int g1 = 0; g1 < num_grid; g1++)
        {
            for (int g0 = 0; g0 < num_grid; g0++)
            {
                GridAndStride gs;
                gs.grid0 = g0;
                gs.grid1 = g1;
                gs.stride = stride;
                grid_strides.push_back(gs);
            }
        }
    }
}

static void generate_yolox_proposals(std::vector<GridAndStride> grid_strides, const float* bottom, float prob_threshold, std::vector<Object>& objects)
{
    int feat_h = 640 / 32;   // 640 is the network input height
    int feat_w = 640 / 32;   // 640 is the network input width
    int pred_num = 85;       // x y w h conf + 80 classes
    const int num_grid = feat_h * feat_w;
    const int num_class = 80;   // COCO: 80 classes
    const int num_anchors = grid_strides.size();  // equals feat_h * feat_w here

    const float* feat_ptr = bottom;
    for (int anchor_idx = 0; anchor_idx < num_anchors; anchor_idx++)
    {
        const int grid0 = grid_strides[anchor_idx].grid0;
        const int grid1 = grid_strides[anchor_idx].grid1;
        const int stride = grid_strides[anchor_idx].stride;

        // yolox/models/yolo_head.py decode logic
        //  outputs[..., :2] = (outputs[..., :2] + grids) * strides
        //  outputs[..., 2:4] = torch.exp(outputs[..., 2:4]) * strides
        float x_center = (feat_ptr[0] + grid0) * stride;
        float y_center = (feat_ptr[1] + grid1) * stride;
        float w = exp(feat_ptr[2]) * stride;
        float h = exp(feat_ptr[3]) * stride;

        float x0 = x_center - w * 0.5f;
        float y0 = y_center - h * 0.5f;

        float box_objectness = feat_ptr[4];
        for (int class_idx = 0; class_idx < num_class; class_idx++)
        {
            float box_cls_score = feat_ptr[5 + class_idx];
            float box_prob = box_objectness * box_cls_score;
            if (box_prob > prob_threshold)
            {
                Object obj;
                obj.rect.x = x0;
                obj.rect.y = y0;
                obj.rect.width = w;
                obj.rect.height = h;
                obj.label = class_idx;
                obj.prob = box_prob;

                objects.push_back(obj);
            }
        } // class loop

        feat_ptr += pred_num;   // NHWC: move on to the next anchor's 85 consecutive values
    } // point anchor loop
}

Reference: https://github.com/Megvii-BaseDetection/YOLOX/blob/main/demo/ncnn/cpp/yolox.cpp
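A minimal sketch of calling the decode (added for illustration; the single stride of 32 and the 0.25 confidence threshold are assumptions matching the hard-coded 640 / 32 feature map above, and output stands for the float* buffer returned by the inference framework):

#include <vector>

// Hypothetical wrapper: decode the raw NHWC output buffer into proposals
void decode_output_nhwc(const float* output, std::vector<Object>& proposals)
{
    std::vector<int> strides = { 32 };        // assumed single detection head (stride 32)
    std::vector<GridAndStride> grid_strides;
    generate_grids_and_stride(640, strides, grid_strides);  // 640: assumed input size

    generate_yolox_proposals(grid_strides, output, 0.25f, proposals);  // 0.25: assumed threshold
    // proposals would then go through NMS to produce the final detections
}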

5.2 Model decode (NCHW)

The generate_grids_and_stride function is identical to the one in 5.1; only generate_yolox_proposals changes, because the offsets into the float* output now follow the NCHW layout:
static void generate_yolox_proposals(std::vector<GridAndStride> grid_strides, const float* bottom, float prob_threshold, std::vector<Object>& objects)
{
    int feat_h = 640 / 32;   // 640 is the network input height
    int feat_w = 640 / 32;   // 640 is the network input width
    int pred_num = 85;       // x y w h conf + 80 classes
    const int num_grid = feat_h * feat_w;
    const int num_class = 80;   // COCO: 80 classes
    const int num_anchors = grid_strides.size();  // equals feat_h * feat_w here

    const float* feat_ptr = bottom;
    for (int anchor_idx = 0; anchor_idx < num_anchors; anchor_idx++)
    {
        const int grid0 = grid_strides[anchor_idx].grid0;
        const int grid1 = grid_strides[anchor_idx].grid1;
        const int stride = grid_strides[anchor_idx].stride;

        // yolox/models/yolo_head.py decode logic
        //  outputs[..., :2] = (outputs[..., :2] + grids) * strides
        //  outputs[..., 2:4] = torch.exp(outputs[..., 2:4]) * strides
        // NCHW: the values of one prediction channel for all anchors are contiguous,
        // so channel c of this anchor sits at feat_ptr[anchor_idx + c * num_grid]
        float x_center = (feat_ptr[anchor_idx + 0 * num_grid] + grid0) * stride;
        float y_center = (feat_ptr[anchor_idx + 1 * num_grid] + grid1) * stride;
        float w = exp(feat_ptr[anchor_idx + 2 * num_grid]) * stride;
        float h = exp(feat_ptr[anchor_idx + 3 * num_grid]) * stride;

        float x0 = x_center - w * 0.5f;
        float y0 = y_center - h * 0.5f;

        float box_objectness = feat_ptr[anchor_idx + 4 * num_grid];
        for (int class_idx = 0; class_idx < num_class; class_idx++)
        {
            float box_cls_score = feat_ptr[anchor_idx + (5 + class_idx) * num_grid];
            float box_prob = box_objectness * box_cls_score;
            if (box_prob > prob_threshold)
            {
                Object obj;
                obj.rect.x = x0;
                obj.rect.y = y0;
                obj.rect.width = w;
                obj.rect.height = h;
                obj.label = class_idx;
                obj.prob = box_prob;

                objects.push_back(obj);
            }
        } // class loop

        // NCHW: feat_ptr is not advanced; indexing uses anchor_idx directly
    } // point anchor loop
}
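The only real difference between the two decode routines is the offset computation into the float* output. A side-by-side sketch of the two formulas (added for illustration; num_grid = feat_h * feat_w and pred_num = 85 as defined above):

// Offset of prediction channel c (0..84) for a given anchor in the raw output buffer
inline int offset_nhwc(int anchor_idx, int c, int pred_num /* 85 */) {
    return anchor_idx * pred_num + c;   // the 85 values of one anchor are consecutive
}

inline int offset_nchw(int anchor_idx, int c, int num_grid /* feat_h * feat_w */) {
    return c * num_grid + anchor_idx;   // one channel for all anchors is consecutive
}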

