1. Types of occlusion
- Intra-class occlusion: an object is occluded by other objects of the same class
- Inter-class occlusion: an object is occluded by objects of other classes
2. Solutions
Data annotation
Refine the ground-truth (GT) bounding boxes of occluded objects
Data augmentation
- Cutout: during training, random regions of the targets are masked out, which improves the model's ability to cope with occlusion
- Mosaic: several images are stitched into one training image at a chosen ratio, which effectively simulates some occluded scenes
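The cutout idea can be sketched in a few lines. This is a minimal illustration, not the original Cutout implementation: the function name `cutout`, the `max_frac` parameter, and the choice to place the mask inside the target's box (rather than anywhere in the image) are assumptions made for this example. The image is a plain 2-D list and boxes are `(x1, y1, x2, y2)` with exclusive right and bottom edges.

```python
import random

def cutout(image, box, max_frac=0.5, fill=0, rng=random):
    """Mask one random rectangle inside `box` of a 2-D image (list of rows).

    Hypothetical helper: the mask covers between 10% and `max_frac` of the
    box width and height, so part of the target stays visible.
    """
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    # Mask size, at least one pixel in each direction.
    mw = max(1, int(w * rng.uniform(0.1, max_frac)))
    mh = max(1, int(h * rng.uniform(0.1, max_frac)))
    # Top-left corner chosen so the mask stays inside the box.
    mx = rng.randint(x1, x2 - mw)
    my = rng.randint(y1, y2 - mh)
    out = [row[:] for row in image]  # copy, leave the input untouched
    for y in range(my, my + mh):
        for x in range(mx, mx + mw):
            out[y][x] = fill
    return out, (mx, my, mx + mw, my + mh)
```

Passing a seeded `random.Random` instance as `rng` makes the augmentation reproducible, which helps when debugging a training pipeline.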
Network structure
- Add attention mechanisms or similar modules so that the model extracts more discriminative features
- Pedestrian detection: each pedestrian is divided into 5 independent parts, and an occlusion score between 0 and 1 is predicted for each part, representing how visible that part is. Each part's features are multiplied by its visibility score, and the weighted features are summed to obtain the final feature (Zhang, S., Wen, L., Bian, X., Lei, Z., & Li, S. Z. (2018). Occlusion-aware R-CNN: Detecting Pedestrians in a Crowd)
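The multiply-and-sum step for the part features can be sketched as follows. This is only an illustration of the weighting described above, not the network from the cited paper: the function name `fuse_part_features` is hypothetical, and the part features and visibility scores are assumed to be given as plain Python lists.

```python
def fuse_part_features(part_feats, visibility):
    """Weight each part feature by its visibility score and sum element-wise.

    part_feats: list of equal-length feature vectors, one per body part.
    visibility: list of scores in [0, 1], one per body part.
    """
    if len(part_feats) != len(visibility):
        raise ValueError("need exactly one visibility score per part")
    dim = len(part_feats[0])
    fused = [0.0] * dim
    for feat, score in zip(part_feats, visibility):
        for k in range(dim):
            # An occluded part (score near 0) contributes almost nothing.
            fused[k] += score * feat[k]
    return fused
```

With scores `[1, 0, 0.5, 0, 0]`, only the first part and half of the third part contribute to the fused feature.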
Positive and negative sample matching mechanism
Change the matching criterion from IoU to GIoU, DIoU, CIoU, etc.
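To make the difference between these criteria concrete, here is a small sketch of IoU, GIoU, and DIoU for axis-aligned boxes `(x1, y1, x2, y2)`. It follows the commonly used definitions (GIoU subtracts the empty fraction of the smallest enclosing box; DIoU subtracts the squared center distance over the squared enclosing-box diagonal). CIoU, which adds an aspect-ratio term, is left out to keep the example short. The helper names are chosen for this example.

```python
def _inter_union(a, b):
    """Intersection and union areas of two boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter, union

def _enclosing(a, b):
    """Smallest box that contains both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def iou(a, b):
    inter, union = _inter_union(a, b)
    return inter / union if union > 0 else 0.0

def giou(a, b):
    _, union = _inter_union(a, b)
    c = _enclosing(a, b)
    c_area = (c[2] - c[0]) * (c[3] - c[1])
    if c_area <= 0:
        return 0.0
    return iou(a, b) - (c_area - union) / c_area

def diou(a, b):
    c = _enclosing(a, b)
    diag2 = (c[2] - c[0]) ** 2 + (c[3] - c[1]) ** 2
    if diag2 <= 0:
        return 0.0
    dx = (a[0] + a[2]) / 2 - (b[0] + b[2]) / 2
    dy = (a[1] + a[3]) / 2 - (b[1] + b[3]) / 2
    return iou(a, b) - (dx * dx + dy * dy) / diag2
```

For two boxes that do not overlap, IoU is always 0, while GIoU and DIoU become negative and still change with the distance between the boxes, so they can rank candidate matches that IoU cannot tell apart.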
Loss function
RepLoss (Wang, X., Xiao, T., Jiang, Y., Shao, S., Sun, J., & Shen, C. (2017). Repulsion Loss: Detecting Pedestrians in a Crowd)
The loss function consists of two parts. It requires each predicted box to be close to its matched GT box (T) while also keeping it far away from the other GT boxes (B), which improves detection under occlusion and reduces the detector's sensitivity to NMS.
The overall formula is as follows. The first term is the attraction loss; the remaining terms form the repulsion loss, which is split into the repulsion between a predicted box and other GT boxes (RepGT) and the repulsion between a predicted box and other predicted boxes (RepBox):
$$L = L_{Attr} + \alpha \cdot L_{RepGT} + \beta \cdot L_{RepBox}$$
where the coefficients $\alpha$ and $\beta$ weight the two repulsion terms.
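The formula for the first part of the loss (the attraction term) is as follows:
$$L_{Attr} = \frac{\sum_{P \in \mathcal{P}_+} \mathrm{Smooth}_{L1}(B^P, G^P_{Attr})}{|\mathcal{P}_+|}$$
Purpose: pull the box predicted from proposal P toward its matched GT box
where
$\mathcal{P}_+$: the set of positive proposals, i.e. proposals whose IoU with at least one GT box is greater than or equal to 0.5
$B^P$: the predicted box regressed from proposal P
$G^P_{Attr} = \arg\max_{G \in \mathcal{G}} IoU(G, P)$: the GT box with the maximum IoU with P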
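The attraction term can be sketched as follows: each proposal is matched to the GT box with the highest IoU, proposals below the 0.5 threshold are dropped, and a smooth L1 distance between the predicted box and the matched GT box is averaged over the remaining positives. This is a minimal sketch under simplifying assumptions: the smooth L1 is applied directly to raw box coordinates (real detectors usually apply it to encoded offsets), and the function names are chosen for this example.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def smooth_l1(x):
    """Quadratic near zero, linear for |x| >= 1."""
    x = abs(x)
    return 0.5 * x * x if x < 1.0 else x - 0.5

def attraction_loss(proposals, preds, gts, pos_thresh=0.5):
    """Average smooth L1 distance between predicted boxes and matched GT boxes.

    proposals[i] is the proposal P, preds[i] the box predicted from it.
    """
    if not gts:
        return 0.0
    total, n_pos = 0.0, 0
    for p, b in zip(proposals, preds):
        ious = [iou(p, g) for g in gts]
        best = max(range(len(gts)), key=ious.__getitem__)
        if ious[best] < pos_thresh:
            continue  # not a positive proposal
        total += sum(smooth_l1(bc - gc) for bc, gc in zip(b, gts[best]))
        n_pos += 1
    return total / n_pos if n_pos else 0.0
```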
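The formula for the second part of the loss (the repulsion term) is as follows:
RepGT
$$L_{RepGT} = \frac{\sum_{P \in \mathcal{P}_+} \mathrm{Smooth}_{ln}\left(IoG(B^P, G^P_{Rep})\right)}{|\mathcal{P}_+|}$$
Purpose: push the box predicted from P away from the GT box that has the second-largest IoU with P
where
$\mathcal{P}_+$: the set of positive proposals, and $B^P$ the box predicted from proposal P
$G^P_{Rep} = \arg\max_{G \in \mathcal{G} \setminus \{G^P_{Attr}\}} IoU(G, P)$: apart from $G^P_{Attr}$, the GT box matched to P, the GT box with the largest IoU with P (that is, the GT box whose IoU with P is the second largest)
$IoG(B, G) = \frac{area(B \cap G)}{area(G)}$: the intersection over the GT box area
$\mathrm{Smooth}_{ln}(x) = -\ln(1 - x)$ for $x \le \sigma$, and $\frac{x - \sigma}{1 - \sigma} - \ln(1 - \sigma)$ for $x > \sigma$, with $\sigma \in [0, 1)$ controlling how strongly large overlaps are penalized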
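A sketch of the RepGT term, written from the description in the cited paper: for each positive proposal, the repulsion target is the GT box with the second-largest IoU, the overlap is measured as intersection over GT area (IoG), and a smoothed `-ln(1 - x)` penalty is averaged over the positives. The default `sigma=0.5` and all function names are assumptions for this example, and `sigma` must stay below 1 so the logarithm remains finite.

```python
import math

def _inter(a, b):
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    return iw * ih

def _area(b):
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])

def iou(a, b):
    inter = _inter(a, b)
    union = _area(a) + _area(b) - inter
    return inter / union if union > 0 else 0.0

def iog(b, g):
    """Intersection over the area of the GT box g."""
    ga = _area(g)
    return _inter(b, g) / ga if ga > 0 else 0.0

def smooth_ln(x, sigma=0.5):
    """-ln(1 - x) up to sigma, then a linear continuation (sigma in [0, 1))."""
    if x <= sigma:
        return -math.log(1.0 - x)
    return (x - sigma) / (1.0 - sigma) - math.log(1.0 - sigma)

def repgt_loss(proposals, preds, gts, pos_thresh=0.5, sigma=0.5):
    total, n_pos = 0.0, 0
    for p, b in zip(proposals, preds):
        ious = [iou(p, g) for g in gts]
        order = sorted(range(len(gts)), key=lambda i: ious[i], reverse=True)
        if not order or ious[order[0]] < pos_thresh:
            continue  # not a positive proposal
        n_pos += 1
        if len(order) > 1:
            # order[0] is the matched GT, order[1] the repulsion target.
            total += smooth_ln(iog(b, gts[order[1]]), sigma)
    return total / n_pos if n_pos else 0.0
```

The loss is zero when the predicted box does not touch the neighboring GT box and grows as the prediction drifts into it.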
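RepBox
$$L_{RepBox} = \frac{\sum_{i \neq j} \mathrm{Smooth}_{ln}\left(IoU(B^{P_i}, B^{P_j})\right)}{\sum_{i \neq j} \mathbb{1}\left[IoU(B^{P_i}, B^{P_j}) > 0\right] + \epsilon}$$
Purpose: make the predicted boxes that match different GT boxes repel each other, reducing the sensitivity of the detector to NMS
According to the index of the matched GT box, the positive proposals $\mathcal{P}_+$ are divided into mutually disjoint subsets, $\mathcal{P}_+ = \mathcal{P}_1 \cup \mathcal{P}_2 \cup \dots \cup \mathcal{P}_{|\mathcal{G}|}$, where $|\mathcal{G}|$ is the number of GT boxes. In the formula, $P_i \in \mathcal{P}_i$ and $P_j \in \mathcal{P}_j$ come from different subsets, $B^{P_i}$ and $B^{P_j}$ are the boxes predicted from them, $\mathrm{Smooth}_{ln}$ is a smoothed $-\ln(1 - x)$ penalty, $\mathbb{1}[\cdot]$ is the indicator function, and $\epsilon$ is a small constant that prevents division by zero.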
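A sketch of the RepBox term, written from the description in the cited paper: predicted boxes matched to different GT boxes are compared pairwise, overlapping pairs are penalized with a smoothed `-ln(1 - IoU)`, and the sum is normalized by the number of overlapping pairs. The names `repbox_loss` and `gt_index`, and the defaults `sigma=0.5` and `eps=1e-6`, are assumptions for this example. Each unordered pair is visited once here; summing over ordered pairs would double both the numerator and the pair count.

```python
import math

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def smooth_ln(x, sigma=0.5):
    """-ln(1 - x) up to sigma, then a linear continuation (sigma in [0, 1))."""
    if x <= sigma:
        return -math.log(1.0 - x)
    return (x - sigma) / (1.0 - sigma) - math.log(1.0 - sigma)

def repbox_loss(preds, gt_index, sigma=0.5, eps=1e-6):
    """Repel predicted boxes whose proposals are matched to different GT boxes.

    preds[i] is a predicted box, gt_index[i] the index of its matched GT box.
    """
    total, n_pairs = 0.0, 0
    for i in range(len(preds)):
        for j in range(i + 1, len(preds)):
            if gt_index[i] == gt_index[j]:
                continue  # same target: these boxes are allowed to overlap
            overlap = iou(preds[i], preds[j])
            if overlap > 0:
                total += smooth_ln(overlap, sigma)
                n_pairs += 1
    return total / (n_pairs + eps)
```

Because boxes assigned to different targets are pushed apart, fewer of them end up overlapping enough to suppress one another during NMS, which is why this term makes the detector less sensitive to the NMS threshold.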
Reference article
Target detection in Repulsion Loss occluded scenes - Programmer Sought