1. Concept
NCHW and NHWC are two storage layouts for a batch of images, i.e. two ways of arranging the batch's image data in computer memory. N is the number of images in the batch, C is the number of channels in each image, H is the pixel height, and W is the pixel width. The channel count C varies by format. For example, an RGB image has 3 channels, with R (red), G (green), and B (blue) each occupying one channel, so every pixel is represented by three channel values, each in the range [0, 255]; the superposition of the three channels gives the pixel its color. RGB images also have a four-channel variant that adds an alpha (transparency) channel to the three RGB channels. A grayscale image has only one channel. A YUV image also has three channels, but unlike RGB, YUV data can be arranged in several different combinations.
2. Data layout understanding
Taking an RGB image as an example: a single image can be described by the three parameters H, W, and C. The height and width of the image correspond to the H and W dimensions, and each pixel carries three values representing red (R), green (G), and blue (B), each in the range [0, 255], as shown in the following figure:
2.1 Application of NCHW and NHWC in model inference
In actual deployment with an inference framework such as NCNN, TensorRT, or Caffe, the model returns its result data as a float* pointer, i.e. stored as a one-dimensional buffer. If you know whether that buffer is arranged as NCHW or NHWC, it is easy to parse the result. The two arrangements are as follows:
(1) NCHW arrangement
NCHW is laid out like a 4-dimensional numpy.array in Python: the innermost dimensions are H and W, outside them C, and outermost N. Since the returned data is of float* type, the 4-dimensional tensor is flattened from the innermost dimension outward (the H and W data are laid out first) into a one-dimensional array:
[R11, R12, R13, ... R21, R22, R23, R24, ... G11, G12, G13, ... G21, G22, G23, G24, ... B11, B12, B13, ... B21, B22, B23, B24, ...]
The elements of each channel are placed next to each other. For an RGB image of size HxWx3 arranged as NCHW, [0 : HxW] holds the R-channel pixel values, [HxW : 2*HxW] holds the G-channel values, and [2*HxW : 3*HxW] holds the B-channel values.
(2) NHWC arrangement
NHWC is likewise laid out like a 4-dimensional numpy.array in Python: the innermost dimension is C, then W, then H, with N outermost. Since the returned data is of float* type, the 4-dimensional tensor is flattened from the innermost dimension outward (the C data is laid out first) into a one-dimensional array:
[R11, G11, B11, R12, G12, B12, R13, G13, B13, ... R21, G21, B21, R22, G22, B22, ... Rij, Gij, Bij]
That is, the channel values at each pixel position (i, j) are stored consecutively. Outermost comes N, meaning there are N images in total. NHWC and NCHW are thus two image storage layouts, suited to different hardware-acceleration scenarios. With GPU acceleration it is usually desirable for pixels of the same channel to be contiguous, so NCHW is commonly used as the input format; memory accesses during CNN computation are then sequential, which is convenient.
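The two flattening rules above can be written as flat-index formulas. A minimal sketch (function names are illustrative, not from any framework):

```cpp
#include <cassert>
#include <cstddef>

// Flat offset of element (n, c, h, w) in an NCHW-laid-out buffer:
// W varies fastest, then H, then C, then N.
inline std::size_t nchw_index(std::size_t C, std::size_t H, std::size_t W,
                              std::size_t n, std::size_t c,
                              std::size_t h, std::size_t w) {
    return ((n * C + c) * H + h) * W + w;
}

// Flat offset of the same element in an NHWC-laid-out buffer:
// C varies fastest, then W, then H, then N.
inline std::size_t nhwc_index(std::size_t C, std::size_t H, std::size_t W,
                              std::size_t n, std::size_t c,
                              std::size_t h, std::size_t w) {
    return ((n * H + h) * W + w) * C + c;
}
```

For example, with C=3, H=4, W=5, the element (n=0, c=1, h=2, w=3) sits at offset 33 in NCHW but at offset 40 in NHWC, which is exactly why the parsing code must know which layout the buffer uses.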
Best practice: consider both formats when designing a network, and ideally be able to switch between them flexibly. For example, when training on a GPU the input data may take the NCHW format, while the inference output may return data in NHWC format.
3. Conversion between NHWC and NCHW
The two storage layouts describe how image data sits in memory, and they can be converted into each other. Taking NHWC to NCHW as an example, the conversion can be done as follows:
3.1 NHWC to NCHW
// Convert an interleaved NHWC image (3 channels) to planar NCHW.
int nhwc_to_nchw(float *out_data, float *in_data, int img_h, int img_w) {
    float *hw1 = out_data;             // channel 0 plane (e.g. B)
    float *hw2 = hw1 + img_h * img_w;  // channel 1 plane (e.g. G)
    float *hw3 = hw2 + img_h * img_w;  // channel 2 plane (e.g. R)
    for (int hh = 0; hh < img_h; ++hh) {
        for (int ww = 0; ww < img_w; ++ww) {
            *hw1++ = *(in_data++);
            *hw2++ = *(in_data++);
            *hw3++ = *(in_data++);
        }
    }
    return 0;
}
Here in_data is the input data arranged as NHWC, and out_data receives the converted NCHW data.
3.2 NCHW to NHWC
// Convert planar NCHW (3 channels) back to interleaved NHWC.
int nchw_to_nhwc(float *out_data, float *in_data, int img_h, int img_w) {
    int image_area = img_h * img_w;
    for (int i = 0; i < image_area; ++i) {
        out_data[i * 3 + 0] = in_data[i];                  // channel 0
        out_data[i * 3 + 1] = in_data[i + image_area];     // channel 1
        out_data[i * 3 + 2] = in_data[i + 2 * image_area]; // channel 2
    }
    return 0;
}
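A quick sanity check for either direction is a round trip: converting NHWC to NCHW and back must reproduce the input. A minimal self-contained sketch (the index math is inlined here rather than calling the functions above, and std::vector stands in for the raw float* buffers):

```cpp
#include <cassert>
#include <vector>

// NHWC (interleaved) -> NCHW (planar), 3 channels, single image.
std::vector<float> hwc_to_chw(const std::vector<float>& src, int h, int w) {
    std::vector<float> dst(src.size());
    for (int i = 0; i < h * w; ++i)       // i = pixel index in row-major order
        for (int c = 0; c < 3; ++c)
            dst[c * h * w + i] = src[i * 3 + c];
    return dst;
}

// NCHW (planar) -> NHWC (interleaved), the inverse mapping.
std::vector<float> chw_to_hwc(const std::vector<float>& src, int h, int w) {
    std::vector<float> dst(src.size());
    for (int i = 0; i < h * w; ++i)
        for (int c = 0; c < 3; ++c)
            dst[i * 3 + c] = src[c * h * w + i];
    return dst;
}
```

For a 1x2 image stored HWC as {1,2,3, 4,5,6} (pixel 0 = (1,2,3), pixel 1 = (4,5,6)), hwc_to_chw yields {1,4, 2,5, 3,6}: each channel's two pixels become contiguous.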
4. Image preprocessing (HWC to CHW, BGR to RGB, and normalization)
- The model input generally expects data arranged as CHW, in RGB format, and normalized (divided by 255).
- If the image is read with OpenCV, the data arrangement is HWC and the format is BGR.
- Therefore, before feeding the network, the data must be converted from HWC to CHW and from BGR to RGB, and normalized. The code is implemented as follows:
// HWC/BGR (OpenCV) -> CHW/RGB float, normalized to [0, 1].
int img_preprocess(cv::Mat input_image, float *out_data, int img_h, int img_w)
{
    int image_area = input_image.cols * input_image.rows;
    unsigned char *pimage = input_image.data;
    // Split HWC into CHW planes.
    float *hw_r = out_data + image_area * 0;
    float *hw_g = out_data + image_area * 1;
    float *hw_b = out_data + image_area * 2;
    // BGR -> RGB swap with /255 normalization.
    for (int i = 0; i < image_area; ++i, pimage += 3) {
        *hw_r++ = pimage[2] / 255.0f;
        *hw_g++ = pimage[1] / 255.0f;
        *hw_b++ = pimage[0] / 255.0f;
    }
    return 0;
}
5. Post-processing decode (take yolox target detection as an example)
The main steps of post-processing decode include the following:
- generate grid cells
- Decode the raw predictions into proposals
- NMS
This article mainly introduces how to decode the prediction output into proposals, distinguishing the decoding methods for the NHWC and NCHW formats.
5.1 Model decode (NHWC)
struct Object
{
    cv::Rect_<float> rect;
    int label;
    float prob;
};

struct GridAndStride
{
    int grid0;
    int grid1;
    int stride;
};
static void generate_grids_and_stride(const int target_size, std::vector<int>& strides, std::vector<GridAndStride>& grid_strides)
{
    for (int i = 0; i < (int)strides.size(); i++)
    {
        int stride = strides[i];
        int num_grid = target_size / stride;
        for (int g1 = 0; g1 < num_grid; g1++)
        {
            for (int g0 = 0; g0 < num_grid; g0++)
            {
                GridAndStride gs;
                gs.grid0 = g0;
                gs.grid1 = g1;
                gs.stride = stride;
                grid_strides.push_back(gs);
            }
        }
    }
}
static void generate_yolox_proposals(std::vector<GridAndStride> grid_strides, const float* bottom, float prob_threshold, std::vector<Object>& objects)
{
    int feat_h = 640 / 32;    // 640 = input net size h
    int feat_w = 640 / 32;    // 640 = input net size w
    int pred_num = 85;        // x, y, w, h, conf + 80 classes
    const int num_grid = feat_h * feat_w;
    const int num_class = 80; // COCO: 80 classes
    const int num_anchors = grid_strides.size(); // equals feat_h * feat_w
    const float* feat_ptr = bottom;
    for (int anchor_idx = 0; anchor_idx < num_anchors; anchor_idx++)
    {
        const int grid0 = grid_strides[anchor_idx].grid0;
        const int grid1 = grid_strides[anchor_idx].grid1;
        const int stride = grid_strides[anchor_idx].stride;
        // yolox/models/yolo_head.py decode logic:
        // outputs[..., :2] = (outputs[..., :2] + grids) * strides
        // outputs[..., 2:4] = torch.exp(outputs[..., 2:4]) * strides
        float x_center = (feat_ptr[0] + grid0) * stride;
        float y_center = (feat_ptr[1] + grid1) * stride;
        float w = exp(feat_ptr[2]) * stride;
        float h = exp(feat_ptr[3]) * stride;
        float x0 = x_center - w * 0.5f;
        float y0 = y_center - h * 0.5f;
        float box_objectness = feat_ptr[4];
        for (int class_idx = 0; class_idx < num_class; class_idx++)
        {
            float box_cls_score = feat_ptr[5 + class_idx];
            float box_prob = box_objectness * box_cls_score;
            if (box_prob > prob_threshold)
            {
                Object obj;
                obj.rect.x = x0;
                obj.rect.y = y0;
                obj.rect.width = w;
                obj.rect.height = h;
                obj.label = class_idx;
                obj.prob = box_prob;
                objects.push_back(obj);
            }
        } // class loop
        feat_ptr += pred_num;
    } // anchor loop
}
Reference: https://github.com/Megvii-BaseDetection/YOLOX/blob/main/demo/ncnn/cpp/yolox.cpp
5.2 Model decode (NCHW)
generate_grids_and_stride is identical to the NHWC version above; only the proposal decoding differs.
static void generate_yolox_proposals(std::vector<GridAndStride> grid_strides, const float* bottom, float prob_threshold, std::vector<Object>& objects)
{
    int feat_h = 640 / 32;    // 640 = input net size h
    int feat_w = 640 / 32;    // 640 = input net size w
    int pred_num = 85;        // x, y, w, h, conf + 80 classes
    const int num_grid = feat_h * feat_w;
    const int num_class = 80; // COCO: 80 classes
    const int num_anchors = grid_strides.size(); // equals feat_h * feat_w
    const float* feat_ptr = bottom;
    for (int anchor_idx = 0; anchor_idx < num_anchors; anchor_idx++)
    {
        const int grid0 = grid_strides[anchor_idx].grid0;
        const int grid1 = grid_strides[anchor_idx].grid1;
        const int stride = grid_strides[anchor_idx].stride;
        // yolox/models/yolo_head.py decode logic:
        // outputs[..., :2] = (outputs[..., :2] + grids) * strides
        // outputs[..., 2:4] = torch.exp(outputs[..., 2:4]) * strides
        // NCHW: each of the 85 attributes forms a whole plane of
        // feat_h * feat_w values, so attribute k of this anchor sits at
        // feat_ptr[anchor_idx + k * feat_h * feat_w].
        float x_center = (feat_ptr[anchor_idx + 0 * feat_h * feat_w] + grid0) * stride;
        float y_center = (feat_ptr[anchor_idx + 1 * feat_h * feat_w] + grid1) * stride;
        float w = exp(feat_ptr[anchor_idx + 2 * feat_h * feat_w]) * stride;
        float h = exp(feat_ptr[anchor_idx + 3 * feat_h * feat_w]) * stride;
        float x0 = x_center - w * 0.5f;
        float y0 = y_center - h * 0.5f;
        float box_objectness = feat_ptr[anchor_idx + 4 * feat_h * feat_w];
        for (int class_idx = 0; class_idx < num_class; class_idx++)
        {
            float box_cls_score = feat_ptr[anchor_idx + (5 + class_idx) * feat_h * feat_w];
            float box_prob = box_objectness * box_cls_score;
            if (box_prob > prob_threshold)
            {
                Object obj;
                obj.rect.x = x0;
                obj.rect.y = y0;
                obj.rect.width = w;
                obj.rect.height = h;
                obj.label = class_idx;
                obj.prob = box_prob;
                objects.push_back(obj);
            }
        } // class loop
    } // anchor loop
}
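The only difference between the two decoders is where attribute k of anchor a lives in the flat output buffer. The two offset rules can be isolated into a minimal sketch (function names are illustrative; pred_num = 85 attributes per anchor, num_anchors = feat_h * feat_w):

```cpp
#include <cassert>

// NHWC output: anchors are the outer dimension, so each anchor's
// 85 attributes are contiguous.
inline int nhwc_offset(int anchor, int attr, int pred_num, int /*num_anchors*/) {
    return anchor * pred_num + attr;
}

// NCHW output: attributes are whole planes of num_anchors values,
// so the same attribute of all anchors is contiguous.
inline int nchw_offset(int anchor, int attr, int /*pred_num*/, int num_anchors) {
    return attr * num_anchors + anchor;
}
```

For a 20x20 head (num_anchors = 400, pred_num = 85), attribute 4 (objectness) of anchor 2 sits at offset 2*85+4 = 174 in NHWC but at 4*400+2 = 1602 in NCHW, matching the feat_ptr arithmetic in the two decoders above.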
Reference:
1. https://developer.horizon.ai/forumDetail/136488103547258555