1 Introduction
The CenterNet code is still somewhat difficult for me to understand, but I would nevertheless like to thank everyone who has shared information about it~
2 Study notes for CenterNet code
2.1 Data reading-COCO type
The COCO class is used for data reading. Each item obtained from it has the following attributes:
'image': img,
'hmap': hmap,
'w_h_': w_h_,
'regs': regs,
'inds': inds,
Here inds is the linear index of each object center, where "linear index" means the index (i.e. the coordinate) of the center after the single feature map has been flattened into a 1-D vector.
This can also be seen from the formula that computes inds: inds[k] = obj_c_int[1] * self.fmap_size['w'] + obj_c_int[0]
'ind_masks': ind_masks,
'c': center, 's': scale, 'img_id': img_id
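As a concrete illustration of the inds formula above (the feature-map size and the integer center coordinates below are hypothetical example values, mirroring the fields just described):

```python
# Hypothetical feature-map size and integer object center (x, y),
# mirroring the dataset fields described above.
fmap_size = {'w': 128, 'h': 128}
obj_c_int = (40, 25)  # x = 40, y = 25

# Linear index: the position of the center after the 2-D feature map
# is flattened row by row into a 1-D vector.
ind = obj_c_int[1] * fmap_size['w'] + obj_c_int[0]  # 25 * 128 + 40 = 3240

# The (x, y) coordinates can be recovered with divmod:
y, x = divmod(ind, fmap_size['w'])
```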
2.2 Post-processing
Post-processing is presented below in the order in which it runs at test time;
2.2.1 _nms - NMS based on the Gaussian distribution of the heatmap
Local maxima are detected on the heatmap so that only peak values remain, which yields the highest-scoring detection box for each object;
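In the published code this peak-picking is done with a max pooling (kernel 3, stride 1) over the heatmap: a point survives only if it equals the maximum of its neighborhood. Below is a plain numpy sketch of the same idea (the function name pseudo_nms and the toy heatmap are mine):

```python
import numpy as np

def pseudo_nms(hmap, kernel=3):
    """Keep only local maxima of the heatmap; everything else is zeroed.

    Same spirit as the max-pool trick: a point survives only if it
    equals the max of its kernel x kernel neighborhood.
    """
    pad = kernel // 2
    h, w = hmap.shape
    padded = np.pad(hmap, pad, mode='constant', constant_values=-np.inf)
    out = np.zeros_like(hmap)
    for i in range(h):
        for j in range(w):
            window = padded[i:i + kernel, j:j + kernel]
            if hmap[i, j] == window.max():
                out[i, j] = hmap[i, j]
    return out

# Toy heatmap with a single Gaussian-like peak at the center.
hmap = np.array([[0.1, 0.3, 0.2],
                 [0.2, 0.9, 0.4],
                 [0.1, 0.2, 0.1]])
result = pseudo_nms(hmap)  # only the 0.9 peak survives
```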
2.2.2 Top-K - get the k highest scores
The Top-K operation keeps the k peaks with the highest scores, where k is a hyperparameter;
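A sketch of the idea, assuming the heatmap has already been peak-filtered (the helper name topk_scores and the toy heatmap are mine):

```python
import numpy as np

def topk_scores(hmap, k):
    """Return the k best scores (descending) and their linear indices."""
    flat = hmap.ravel()
    idx = np.argpartition(flat, -k)[-k:]    # unordered top-k, O(n)
    idx = idx[np.argsort(flat[idx])[::-1]]  # sort just those k entries
    return flat[idx], idx

# Toy heatmap; the linear indices follow the same convention as 'inds'
# above, so (y, x) can be recovered with divmod(ind, width).
hmap = np.array([[0.1, 0.8, 0.0],
                 [0.3, 0.0, 0.9]])
scores, inds = topk_scores(hmap, k=2)
```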
2.2.3 Soft-NMS - filtering boxes during the testing phase
CenterNet applies the Soft-NMS algorithm during the testing phase.
I think the likely reason is that CenterNet uses multi-scale inputs for each image at test time, so duplicate detection boxes across scales are likely to appear.
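A minimal sketch of linear Soft-NMS, where scores of overlapping boxes are decayed by (1 - IoU) rather than dropped outright. The function names, the IoU helper, and the toy boxes (two near-duplicates, as might come from two test scales) are all mine:

```python
import numpy as np

def iou(a, b):
    """IoU of two boxes given as [x1, y1, x2, y2]."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def soft_nms(boxes, scores, iou_thresh=0.3, min_score=0.001):
    """Linear Soft-NMS: overlapping boxes have their scores decayed
    by (1 - IoU) instead of being discarded outright."""
    scores = scores.astype(float).copy()
    keep, alive = [], list(range(len(scores)))
    while alive:
        best = max(alive, key=lambda i: scores[i])
        keep.append(best)
        alive.remove(best)
        for i in alive:
            ov = iou(boxes[best], boxes[i])
            if ov > iou_thresh:
                scores[i] *= 1.0 - ov  # linear decay
        alive = [i for i in alive if scores[i] > min_score]
    return keep, scores

# Two near-duplicate boxes plus one separate box.
boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], float)
scores = np.array([0.9, 0.8, 0.7])
keep, decayed = soft_nms(boxes, scores)
```

The near-duplicate box keeps a (much lower) score instead of vanishing, which is the point of Soft-NMS: genuinely overlapping objects are not silently deleted.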
2.2.4 max_per_image - actually uses Top-K for filtering
Detections are filtered by capping the number of objects per image: max_per_image determines the score threshold used for the final filtering;
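A sketch of the idea, assuming detections from all scales and classes have been merged into one score array; the function name is mine, and max_per_image mirrors the hyperparameter above:

```python
import numpy as np

def filter_by_max_per_image(scores, max_per_image):
    """Keep at most max_per_image detections: the score of the
    max_per_image-th best detection becomes the cut-off threshold,
    so this is itself just a Top-K selection."""
    if len(scores) <= max_per_image:
        return np.ones(len(scores), dtype=bool)
    # kth largest score becomes the threshold
    thresh = np.partition(scores, -max_per_image)[-max_per_image]
    return scores >= thresh

scores = np.array([0.9, 0.1, 0.5, 0.7, 0.3])
mask = filter_by_max_per_image(scores, max_per_image=3)
```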