Notes on implementing the Grad-CAM visualization algorithm

1. Things to pay attention to

1. Which layer to attach the hook to

To see where the model focuses on the classification response (score) map, the layer to record is the output of the activation function, not the output of the convolutional layer that predicts the classification scores.

In other words, the hook must be attached to the activation layer just before the classification head.
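A minimal PyTorch sketch of this idea follows. It assumes a toy model in which `backbone[1]` (an `nn.ReLU`) is the activation before a 1x1-conv classification head; the class name `GradCAM`, the toy model, and the choice of the peak of the score map as the back-propagated target are illustrative assumptions, not the author's code. A forward hook on the activation layer stores its output, a tensor hook on that output captures its gradient, and the channel weights are the spatially averaged gradients, as in standard Grad-CAM.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GradCAM:
    """Minimal Grad-CAM sketch: hook the activation layer before the head."""

    def __init__(self, model: nn.Module, target_layer: nn.Module):
        self.model = model
        self.activations = None
        self.gradients = None
        # Forward hook on the activation layer (not on the head's conv layer)
        self.handle = target_layer.register_forward_hook(self._save_activation)

    def _save_activation(self, module, inputs, output):
        self.activations = output
        # Tensor hook: called with d(score)/d(output) during backward
        output.register_hook(self._save_gradient)

    def _save_gradient(self, grad):
        self.gradients = grad

    def remove(self):
        self.handle.remove()

    def __call__(self, x: torch.Tensor) -> torch.Tensor:
        self.model.zero_grad()
        score_map = self.model(x)  # (N, 1, H, W) classification response map
        # Assumption: explain the peak response of the score map
        score_map.flatten().max().backward()
        # Channel weights = gradients averaged over the spatial dimensions
        weights = self.gradients.mean(dim=(2, 3), keepdim=True)
        cam = F.relu((weights * self.activations).sum(dim=1))
        cam = cam - cam.min()
        return (cam / (cam.max() + 1e-8)).detach()


# Toy model: backbone ending in nn.ReLU, followed by a 1x1-conv classification head
backbone = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(inplace=False))
head = nn.Conv2d(8, 1, kernel_size=1)
model = nn.Sequential(backbone, head)

grad_cam = GradCAM(model, backbone[1])
cam = grad_cam(torch.randn(1, 3, 16, 16))
print(cam.shape)  # torch.Size([1, 16, 16])
```

Hooking `head` instead of `backbone[1]` would weight the single score channel by its own gradient, which says nothing about which features produced the score.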

2. The activation functions must use inplace=False

This can be done with the following function:

```python
import torch.nn as nn

# The activation functions must use inplace=False, otherwise the gradient cannot be saved
def change_inplace(model, inplace=False):
    for m in model.modules():
        if isinstance(m, nn.ReLU):
            m.inplace = inplace
```
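`change_inplace` only matches `nn.ReLU`. If the model also uses `nn.LeakyReLU`, `nn.ReLU6` or similar modules, a more general variant can clear every module that exposes an `inplace` flag and report which ones it changed. This is a sketch; the name `disable_inplace` is hypothetical.

```python
import torch.nn as nn


def disable_inplace(model: nn.Module) -> list:
    """Set inplace=False on every submodule that has an `inplace` flag
    (ReLU, ReLU6, LeakyReLU, Hardtanh, ELU, SiLU, Dropout, ...).
    Returns the names of the modules that were changed."""
    changed = []
    for name, m in model.named_modules():
        if getattr(m, "inplace", False):
            m.inplace = False
            changed.append(name)
    return changed


model = nn.Sequential(
    nn.Conv2d(3, 4, 3),
    nn.ReLU(inplace=True),
    nn.LeakyReLU(0.1, inplace=True),
    nn.ReLU(),
)
print(disable_inplace(model))  # ['1', '2']
```

This only affects module attributes. In-place operations written directly in `forward`, such as `F.relu(x, inplace=True)` or `x += identity`, still have to be edited in the source.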

3. Handling the classification score map

The score map must not be converted from tensor to numpy and back. Keep the raw score map exactly as the network outputs it, in tensor form; otherwise the computation graph is cut and no gradient can flow back from it. The points that need attention are circled in the figure below.
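The difference can be shown with a small sketch. A hypothetical scalar weight `w` stands in for the network parameters, and the peak of the map is assumed to be the target score: back-propagating from the tensor returned by the network reaches `w`, while a round trip through NumPy produces a tensor with no `grad_fn`, so nothing can flow back.

```python
import torch


def target_score(score_map: torch.Tensor) -> torch.Tensor:
    # Assumption: back-propagate from the peak of the raw response map
    return score_map.flatten().max()


# Hypothetical stand-in for the network: a 5x5 score map that depends on w
w = torch.ones(1, requires_grad=True)
score_map = torch.arange(25.0).view(1, 1, 5, 5) * w

# Correct: stay in tensor form, so the graph back to w is intact
target_score(score_map).backward()
print(w.grad)  # tensor([24.])

# Wrong: .numpy() refuses a tensor that requires grad ...
try:
    score_map.numpy()
except RuntimeError as err:
    print("RuntimeError:", err)

# ... and detaching first gives a tensor cut off from the graph
rebuilt = torch.from_numpy(score_map.detach().numpy())
print(rebuilt.requires_grad, rebuilt.grad_fn)  # False None
```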

2. Results

Grad-CAM applied to the classification scores shows which part of the image the model attends to. (In this frame the tracked target is not the black cat but the other cat.)

The result above comes from a Siamese-network tracking algorithm. The blending of the heat map with the input image has not been tuned well, haha.
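Blending is a weighted sum of the network input and a tinted mask, so the strength of the overlay comes down to one weight. A minimal sketch, assuming the image is a `(3, H, W)` tensor in `[0, 1]` and the CAM is an `(h, w)` map in `[0, 1]`; `overlay_cam`, `alpha`, and the single-color tint (used here instead of a colormap such as OpenCV's JET) are illustrative choices.

```python
import torch
import torch.nn.functional as F


def overlay_cam(image, cam, alpha=0.4, color=(1.0, 0.0, 0.0)):
    """Blend a Grad-CAM map onto the network input.

    image: (3, H, W) float tensor in [0, 1] (the network input, not the original frame)
    cam:   (h, w) float tensor in [0, 1], usually smaller than the input
    alpha: weight of the heat map; the knob that controls how strong it looks
    color: RGB color used to tint the mask (stand-in for a real colormap)
    """
    H, W = image.shape[-2:]
    # Upsample the CAM to the input resolution: (1, 1, h, w) -> (1, H, W)
    cam = F.interpolate(cam[None, None], size=(H, W), mode="bilinear", align_corners=False)[0]
    # Tint the single-channel mask: (1, H, W) * (3, 1, 1) -> (3, H, W)
    mask = cam * torch.tensor(color).view(3, 1, 1)
    return ((1 - alpha) * image + alpha * mask).clamp(0, 1)


blended = overlay_cam(torch.rand(3, 64, 64), torch.rand(8, 8), alpha=0.5)
print(blended.shape)  # torch.Size([3, 64, 64])
```

Lowering `alpha` keeps more of the original image visible; changing `color` sets the mask color freely.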

Another point: Grad-CAM can only be traced back to the model's input, so the attention map cannot be placed directly on the original image. In tracking, the network input is a crop of the frame (crop + pad + resize), as shown above. In detection, the input is only resize + pad, so after overlaying the heat map on the network input, the result can be resized back to the original image with no visible difference. For tracking, this does not work.
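For the detection case, undoing resize + pad only needs the scale and the padding offsets. A sketch, assuming an aspect-preserving letterbox resize with scale `min(in_h / orig_h, in_w / orig_w)`; `cam_to_original`, `pad_top` and `pad_left` are hypothetical names. For tracking, the CAM would additionally have to be pasted back into the crop region of the frame, which a plain resize cannot do.

```python
import torch
import torch.nn.functional as F


def cam_to_original(cam, orig_h, orig_w, pad_top, pad_left):
    """Map a CAM computed on a letterboxed network input back to the original image.

    cam: (in_h, in_w) heat map at network-input resolution
    pad_top, pad_left: padding placed before the resized image (assumed layout)
    """
    in_h, in_w = cam.shape
    scale = min(in_h / orig_h, in_w / orig_w)
    new_h, new_w = round(orig_h * scale), round(orig_w * scale)
    # Strip the padding so only the resized image region remains
    valid = cam[pad_top:pad_top + new_h, pad_left:pad_left + new_w]
    # Undo the resize with bilinear interpolation
    out = F.interpolate(valid[None, None], size=(orig_h, orig_w), mode="bilinear", align_corners=False)
    return out[0, 0]


# A 480x640 frame letterboxed into 320x320: scale 0.5, 240x320 image, 40 px top padding
cam = torch.rand(320, 320)
print(cam_to_original(cam, 480, 640, pad_top=40, pad_left=0).shape)  # torch.Size([480, 640])
```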

Below is the heat map mask on its own (here the black cat is the tracked target); its colors can be set freely. Judging by the highlighted position, the model also seems to attend to the TV.

Source: blog.csdn.net/allrubots/article/details/127962773