Handling occlusion in object detection

1. Types of occlusion

  1. Intra-class occlusion: the object is occluded by other objects of the same class
  2. Inter-class occlusion: the object is occluded by objects of other classes

2. Solutions

Data annotation

Refine the ground-truth (GT) bounding-box annotations of occluded objects

Data augmentation

  1. Cutout: during training, parts of the targets are randomly masked out so that the model learns to cope with occlusion
  2. Mosaic: several images are stitched into a single training image at a certain ratio, which effectively simulates some occluded scenes
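The cutout idea can be sketched in a few lines. This is a minimal, dependency-free illustration and not the reference implementation: the image is a plain nested list, and the function name `cutout` and the parameters `half_size` and `fill` are assumptions made for this example.

```python
import random

def cutout(image, half_size, fill=0):
    """Return a copy of `image` (a list of pixel rows) with one random square patch masked.

    The patch is centred on a random pixel and has side 2 * half_size + 1,
    clipped where it crosses the image border.
    """
    height, width = len(image), len(image[0])
    cy, cx = random.randrange(height), random.randrange(width)
    top, bottom = max(0, cy - half_size), min(height, cy + half_size + 1)
    left, right = max(0, cx - half_size), min(width, cx + half_size + 1)
    out = [row[:] for row in image]  # copy so the input image is left untouched
    for y in range(top, bottom):
        for x in range(left, right):
            out[y][x] = fill
    return out

image = [[1] * 8 for _ in range(8)]
masked = cutout(image, half_size=1)
```

In a real pipeline the mask would usually be applied to an image array inside the data loader, and only to the training split.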

Network structure

  1. Add an attention mechanism or similar module so that the model can extract more discriminative features
  2. Pedestrian detection: the pedestrian is divided into 5 separate parts, and an occlusion score between 0 and 1 is predicted for each part, representing how visible (or occluded) that part is. Each part's feature is multiplied by its visibility score, and the weighted part features are summed to obtain the final feature (Zhang, S., Wen, L., Bian, X., Lei, Z., & Li, S. Z. (2018). Occlusion-aware R-CNN: Detecting Pedestrians in a Crowd)
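The multiply-and-add step for the part features can be illustrated as follows. This is only a sketch of the weighting described above: in the cited paper the scores come from a small sub-network and the features are pooled feature maps, whereas here both are passed in as plain lists, and the function name `fuse_part_features` is hypothetical.

```python
def fuse_part_features(part_features, visibility_scores):
    """Weight each part's feature vector by its visibility score and sum element-wise."""
    if len(part_features) != len(visibility_scores):
        raise ValueError("one visibility score is needed per part")
    dim = len(part_features[0])
    fused = [0.0] * dim
    for feature, score in zip(part_features, visibility_scores):
        for k in range(dim):
            # A heavily occluded part (score near 0) contributes almost nothing.
            fused[k] += score * feature[k]
    return fused

# Two toy parts with 2-dimensional features; the second part is fully occluded.
fused = fuse_part_features([[1.0, 2.0], [3.0, 4.0]], [1.0, 0.0])
```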

Positive and negative sample matching mechanism

Replace IoU as the matching criterion with GIoU, DIoU, CIoU, etc.

Loss function
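To make the difference between these criteria concrete, the sketch below computes IoU, GIoU and DIoU for two axis-aligned boxes in (x1, y1, x2, y2) format. It assumes both boxes have positive area, omits CIoU's extra aspect-ratio term for brevity, and the function name `iou_variants` is chosen only for this example.

```python
def iou_variants(a, b):
    """Return (IoU, GIoU, DIoU) for two boxes given as (x1, y1, x2, y2)."""
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = area_a + area_b - inter
    iou = inter / union

    # Smallest box enclosing both inputs.
    enc_w = max(a[2], b[2]) - min(a[0], b[0])
    enc_h = max(a[3], b[3]) - min(a[1], b[1])
    enc_area = enc_w * enc_h

    # GIoU penalises the empty part of the enclosing box.
    giou = iou - (enc_area - union) / enc_area

    # DIoU penalises the squared centre distance, normalised by the squared
    # diagonal of the enclosing box.
    centre_dist_sq = (((a[0] + a[2]) - (b[0] + b[2])) ** 2
                      + ((a[1] + a[3]) - (b[1] + b[3])) ** 2) / 4.0
    diou = iou - centre_dist_sq / (enc_w ** 2 + enc_h ** 2)
    return iou, giou, diou

# Two disjoint boxes: IoU is 0 for both pairs, but GIoU and DIoU still rank them.
near = iou_variants((0, 0, 1, 1), (2, 0, 3, 1))
far = iou_variants((0, 0, 1, 1), (5, 0, 6, 1))
```

Unlike plain IoU, GIoU and DIoU stay informative when the boxes do not overlap, which is one reason they are used as alternative matching criteria.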

RepLoss (Wang, X., Xiao, T., Jiang, Y., Shao, S., Sun, J., & Shen, C. (2017). Repulsion Loss: Detecting Pedestrians in a Crowd)

The loss function consists of two parts. It requires the predicted box to be close to its matched GT box (T) while also requiring it to stay away from the other GT boxes (B). This improves detection performance under occlusion and reduces the detector's sensitivity to NMS.

In the overall formula, the first part is the attraction loss and the second part is the repulsion loss. The repulsion loss can itself be split into two terms: one between the predicted box and other GT boxes, and one between the predicted box and other predicted boxes.
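In the notation of the cited Repulsion Loss paper, the overall loss can be written as below. The weights \alpha and \beta, which balance the two repulsion terms RepGT and RepBox, are symbols taken from that paper rather than from this post.

```latex
L = L_{Attr} + \alpha \cdot L_{RepGT} + \beta \cdot L_{RepBox}
```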

The first part is the attraction loss.

Purpose: pull the box predicted from proposal P toward its matched GT box

where,

P_{+}: the set of positive proposals, i.e. proposals whose IoU with at least one GT box is greater than or equal to 0.5,

G_{Attr}^{P}: the GT box that has the maximum IoU with P
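With these definitions, the attraction term from the cited paper can be written as follows. Two symbols are borrowed from the paper and do not appear elsewhere in this post: B^{P} is the box regressed from proposal P, and \mathcal{G} is the set of all GT boxes.

```latex
L_{Attr} = \frac{\sum_{P \in P_{+}} Smooth_{L1}\left(B^{P},\, G_{Attr}^{P}\right)}{\left|P_{+}\right|},
\qquad
G_{Attr}^{P} = \arg\max_{G \in \mathcal{G}} IoU(G, P)
```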

The second part is the repulsion loss, which has two terms: RepGT and RepBox.

RepGT

Purpose: keep P away from the GT box with which it has the second-largest IoU

where,

G_{Rep}^{P}: among all GT boxes other than the one matched to P, the GT box with the largest IoU with P (that is, the GT box whose IoU with P is the second largest overall)
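The sketch below follows the cited paper's RepGT term as I understand it: L_{RepGT} is the mean over P_{+} of Smooth_{ln}(IoG(B^{P}, G_{Rep}^{P})), where IoG is the intersection area divided by the GT box area. The helper names and the default sigma=0.5 are assumptions for this example, not values taken from the post.

```python
import math

def box_area(box):
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])

def intersection_area(a, b):
    w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    return w * h

def iou(a, b):
    inter = intersection_area(a, b)
    return inter / (box_area(a) + box_area(b) - inter)

def iog(pred, gt):
    """Intersection over ground truth: overlap area divided by the GT box area."""
    return intersection_area(pred, gt) / box_area(gt)

def smooth_ln(x, sigma):
    """-ln(1 - x) up to sigma, then a linear continuation; sigma must lie in [0, 1)."""
    if x <= sigma:
        return -math.log(1.0 - x)
    return (x - sigma) / (1.0 - sigma) - math.log(1.0 - sigma)

def select_targets(proposal, gt_boxes):
    """Return (G_Attr, G_Rep): the GT boxes with the largest and second-largest IoU."""
    ranked = sorted(gt_boxes, key=lambda g: iou(proposal, g), reverse=True)
    return ranked[0], (ranked[1] if len(ranked) > 1 else None)

def rep_gt_loss(pred_boxes, rep_gt_boxes, sigma=0.5):
    """Mean Smooth_ln(IoG) between each predicted box and its repulsion GT box."""
    if not pred_boxes:
        return 0.0
    total = sum(smooth_ln(iog(p, g), sigma) for p, g in zip(pred_boxes, rep_gt_boxes))
    return total / len(pred_boxes)

gts = [(0, 0, 2, 2.5), (1, 0, 3, 2), (10, 10, 12, 12)]
attr, rep = select_targets((0, 0, 2, 2), gts)
loss = rep_gt_loss([(0, 0, 2, 2)], [rep])
```

IoG is used instead of IoU here because IoU could be reduced simply by enlarging the predicted box, while IoG can only be reduced by actually shrinking the overlap with the repulsion target.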

RepBox

Purpose: push predicted boxes that are matched to different GT boxes away from each other (repel), reducing the detector's sensitivity to NMS

According to the index of the matched GT box, P_{+} is divided into mutually disjoint subsets, P_{+}=P_{1}\cup P_{2}\cup ... \cup P_{\varrho}, where \varrho is the number of GT boxes
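A rough sketch of the RepBox term, following the cited paper as I understand it: Smooth_{ln} of the IoU is summed over pairs of predicted boxes that belong to different subsets, then divided by the number of such pairs that actually overlap plus a small constant. The function names, the default sigma=0.5 and eps=1e-6 are assumptions for this example. Unordered pairs are used here, which changes numerator and denominator by the same factor compared with summing over ordered pairs.

```python
import math
from itertools import combinations

def iou(a, b):
    w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = w * h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def smooth_ln(x, sigma):
    """-ln(1 - x) up to sigma, then a linear continuation; sigma must lie in [0, 1)."""
    if x <= sigma:
        return -math.log(1.0 - x)
    return (x - sigma) / (1.0 - sigma) - math.log(1.0 - sigma)

def rep_box_loss(pred_boxes, gt_indices, sigma=0.5, eps=1e-6):
    """Penalise overlap between predicted boxes whose matched GT indices differ."""
    total, overlapping_pairs = 0.0, 0
    for i, j in combinations(range(len(pred_boxes)), 2):
        if gt_indices[i] == gt_indices[j]:
            continue  # same target: these boxes are expected to overlap
        overlap = iou(pred_boxes[i], pred_boxes[j])
        if overlap > 0.0:
            overlapping_pairs += 1
            total += smooth_ln(overlap, sigma)
    return total / (overlapping_pairs + eps)

# Boxes 0 and 2 share a target; box 1 belongs to a neighbouring target.
boxes = [(0, 0, 2, 2), (1, 1, 3, 3), (0, 0, 2, 2)]
loss = rep_box_loss(boxes, gt_indices=[0, 1, 0])
```

Because boxes assigned to different targets are pushed apart, fewer of them end up overlapping heavily enough to be merged by NMS, which is the reduced NMS sensitivity mentioned above.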

Origin blog.csdn.net/qq_38964360/article/details/131516787