Concise code introduction to class activation maps: CAM, GradCAM, GradCAM++

  A class activation map (CAM) shows how much each region of an input image contributes to a classification network's prediction for a specified class, which helps us better understand how the network works. There is plenty of discussion of CAM online; here I focus on a concise code implementation and on how CAM behaves in various situations. The CAM family has grown to include CAM, GradCAM, GradCAM++, SmoothGradCAM++, ScoreCAM, SSCAM, ISCAM, and so on. Only the first three are covered here; I have no plans for the others for now. The code available online is either complete but too long, or short but incomplete, and neither is easy to digest quickly. Here I give a version that is both complete and short, which should be easier to understand. If you are not familiar with the hook mechanism used below, you can read my other blog post dedicated to hooks.

1. CAM

  CAM only applies when the network ends in a convolutional layer followed by GAP (global average pooling) and then a fully connected layer fc. In this case, the feature map output by the last convolutional layer (named hook_a in the code below) goes from C×H×W to C×1×1 after GAP, so the fc layer has C input nodes and its weight matrix has shape num_classes×C. If we take the weight vector for a given class, multiply it channel-wise with the last convolutional layer's feature map, and sum over the channels, we get an H×W map, the heatmap, which reflects how important each pixel of feature_map is to the final classification. Since the convolutional network only rescales the image and applies no rotation, hook_a stays spatially aligned with the original input image, so the heatmap can be resized and overlaid on the original image to show how much each region contributes to the specified class.
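  To make the shapes concrete, here is a minimal sketch (assuming torchvision's resnet18; the manual layer chaining simply mirrors its forward pass up to layer4) that verifies the structure described above:

# Minimal shape check (sketch): layer4 outputs C×H×W features, and fc has a
# num_classes×C weight matrix, exactly the structure CAM relies on.
import torch
import torchvision.models as models

net = models.resnet18(pretrained=True).eval()
x = torch.randn(1, 3, 224, 224)
feat = net.layer4(net.layer3(net.layer2(net.layer1(
    net.maxpool(net.relu(net.bn1(net.conv1(x))))))))
print(feat.shape)           # torch.Size([1, 512, 7, 7]) -> C=512, H=W=7
print(net.fc.weight.shape)  # torch.Size([1000, 512])    -> num_classes×C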
  Well, explaining this in human language is strenuous, and I suspect it still isn't very clear, so let me show you the code:

import numpy as np
import torch.nn.functional as F
import torchvision.models as models
from torchvision.transforms.functional import normalize, resize, to_tensor, to_pil_image
import matplotlib.pyplot as plt
from matplotlib import cm
from PIL import Image

net = models.resnet18(pretrained=True).cuda()
net.eval()  # use the stored BatchNorm statistics for single-image inference

# Forward hook: capture the feature map output by the target layer
hook_a = None
def _hook_a(module, inp, out):
    global hook_a
    hook_a = out

submodule_dict = dict(net.named_modules())
target_layer = submodule_dict['layer4']  # the last convolutional stage
hook1 = target_layer.register_forward_hook(_hook_a)

img_path = 'images/border_collie2.jpg'
img = Image.open(img_path, mode='r').convert('RGB')
img_tensor = normalize(to_tensor(resize(img, (224, 224))),
                           [0.485, 0.456, 0.406], [0.229, 0.224, 0.225]).cuda()

scores = net(img_tensor.unsqueeze(0))  # the forward pass fills hook_a
hook1.remove()
class_idx = 232  # class 232 corresponds to the Border Collie
weights = net.fc.weight.data[class_idx, :]  # fc weight vector for this class, shape (C,)
cam = (weights.view(*weights.shape, 1, 1) * hook_a.squeeze(0)).sum(0)  # weighted channel sum -> H×W
cam = F.relu(cam)
# min-max normalize the map to [0, 1]
cam.sub_(cam.flatten(start_dim=-2).min(-1).values.unsqueeze(-1).unsqueeze(-1))
cam.div_(cam.flatten(start_dim=-2).max(-1).values.unsqueeze(-1).unsqueeze(-1))
cam = cam.data.cpu().numpy()

# Upsample the map to the input resolution and blend it with the original image
heatmap = to_pil_image(cam, mode='F')
overlay = heatmap.resize(img.size, resample=Image.BICUBIC)
cmap = cm.get_cmap('jet')
overlay = (255 * cmap(np.asarray(overlay) ** 2)[:, :, :3]).astype(np.uint8)
alpha = .7
result = (alpha * np.asarray(img) + (1 - alpha) * overlay).astype(np.uint8)
plt.imshow(result)
plt.show()
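  By the way, class_idx need not be hardcoded; a one-line sketch that takes the network's own top prediction instead:

# Sketch: use the network's top-scoring class instead of a hardcoded index.
class_idx = scores.squeeze(0).argmax().item()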


Figure 1. Original image


Figure 2. CAM method output results

  We can see that ResNet18 does not pay attention to all the Border Collies in the picture; it attends only to the most typical-looking one, which is already enough for it to make its classification judgment.

2. GradCAM

  Compared with CAM, GradCAM no longer uses the weights of the fully connected layer. Instead, it cleverly computes the gradient of the specified class's output score with respect to the convolutional layer's feature map, and it no longer requires a GAP layer between the last convolutional layer and the fully connected layer, so it has a much wider range of application.

Figure 3. The core formula of the GradCAM method

  
  To explain with the formula from the paper: $A^k_{ij}$ denotes the output feature map of the convolutional layer, where $k$ indexes the channels and $i, j$ index the pixels; $y^c$ denotes the output score for class $c$; and $\frac{\partial y^c}{\partial A^k_{ij}}$ is the gradient of class $c$'s score with respect to the feature map. Averaging the gradient over each channel gives a $k$-dimensional vector

$$\alpha^c_k = \frac{1}{Z}\sum_{i}\sum_{j}\frac{\partial y^c}{\partial A^k_{ij}},$$

which is weights in the code and can be used in place of the fc weights in CAM; the map is then $L^c_{\mathrm{GradCAM}} = \mathrm{ReLU}\big(\sum_k \alpha^c_k A^k\big)$. When GAP is used between the last convolutional layer and the fully connected layer, the GradCAM method and the CAM method give identical results. The complete code is given below:

import numpy as np
import torch.nn.functional as F
import torchvision.models as models
from torchvision.transforms.functional import normalize, resize, to_tensor, to_pil_image
import matplotlib.pyplot as plt
from matplotlib import cm
from PIL import Image

net = models.resnet18(pretrained=True).cuda()
net.eval()  # use the stored BatchNorm statistics for single-image inference

# Forward hook: capture the feature map output by the target layer
hook_a = None
def _hook_a(module, inp, out):
    global hook_a
    hook_a = out

# Backward hook: out is grad_output, so out[0] is the gradient of the loss
# with respect to the layer's output feature map
# (newer PyTorch versions prefer register_full_backward_hook)
hook_g = None
def _hook_g(module, inp, out):
    global hook_g
    hook_g = out[0]

submodule_dict = dict(net.named_modules())
target_layer = submodule_dict['layer4']

hook1 = target_layer.register_forward_hook(_hook_a)
hook2 = target_layer.register_backward_hook(_hook_g)

img_path = 'images/border_collie3.jpg'
img = Image.open(img_path, mode='r').convert('RGB')
img_tensor = normalize(to_tensor(resize(img, (224, 224))),
                           [0.485, 0.456, 0.406], [0.229, 0.224, 0.225]).cuda()

scores = net(img_tensor.unsqueeze(0))  # the forward pass fills hook_a
class_idx = 232  # class 232 corresponds to the Border Collie
loss = scores[:, class_idx].sum()  # the "loss" is just the target class score
loss.backward()  # backpropagation fills hook_g with the feature-map gradient
hook1.remove()
hook2.remove()

weights = hook_g.squeeze(0).mean(dim=(1, 2))  # channel-averaged gradient replaces the fc weights
cam = (weights.view(*weights.shape, 1, 1) * hook_a.squeeze(0)).sum(0)
cam = F.relu(cam)
# min-max normalize the map to [0, 1]
cam.sub_(cam.flatten(start_dim=-2).min(-1).values.unsqueeze(-1).unsqueeze(-1))
cam.div_(cam.flatten(start_dim=-2).max(-1).values.unsqueeze(-1).unsqueeze(-1))
cam = cam.data.cpu().numpy()

# Upsample the map to the input resolution and blend it with the original image
heatmap = to_pil_image(cam, mode='F')
overlay = heatmap.resize(img.size, resample=Image.BICUBIC)
cmap = cm.get_cmap('jet')
overlay = (255 * cmap(np.asarray(overlay) ** 2)[:, :, :3]).astype(np.uint8)
alpha = .7
result = (alpha * np.asarray(img) + (1 - alpha) * overlay).astype(np.uint8)
plt.imshow(result)
plt.show()
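  As a quick sanity check of the claim above that GradCAM reduces to CAM for a GAP + fc head, the following sketch can be run right after the code above (it reuses net, class_idx, and weights; for ResNet18's 7×7 final feature map, Z = 49):

# For a GAP+fc head, y_c = sum_k w_k * mean_ij(A^k_ij), so dy_c/dA^k_ij = w_k / Z,
# and the channel-averaged gradient should equal the fc weights divided by Z.
import torch

fc_w = net.fc.weight.data[class_idx]
print(torch.allclose(weights, fc_w / 49., atol=1e-5))  # expected to print True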

  Note that the "loss" used in GradCAM is simply the output score of the specified class. We backpropagate from this score, and the gradient extracted here is not the weight gradient we usually care about, but the gradient with respect to the feature map.
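  Incidentally, the backward hook is not the only route to this gradient. A sketch of an equivalent alternative using torch.autograd.grad (run it in place of loss.backward(), since backward frees the graph by default):

# hook_a is still part of the autograd graph, so we can differentiate the
# class score with respect to it directly, without any backward hook.
import torch

grads, = torch.autograd.grad(scores[:, class_idx].sum(), hook_a)
weights = grads.squeeze(0).mean(dim=(1, 2))  # same weights as the hook_g route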
  In theory, the GradCAM method can compute this cam map from the feature map of any convolutional layer together with its gradient:

Figure 4. The cam maps drawn by the GradCAM method using different layers of ResNet18

  However, as the per-layer results show, only the cam map drawn from the last layer is meaningful. I have not figured out the reason for this yet.

3. GradCAM++

  For better results, GradCAM++ introduces second-order and third-order gradients. For convenience, the code approximates them with the square and cube of the first-order gradient. The code is basically the same as for GradCAM; just replace the single line that computes weights with the following:

## weights = hook_g.squeeze(0).mean(dim=(1,2))  # the GradCAM line being replaced
grad_2 = hook_g.pow(2)    # square of the first-order gradient, stands in for the 2nd-order term
grad_3 = grad_2 * hook_g  # cube of the first-order gradient, stands in for the 3rd-order term
denom = 2 * grad_2 + (grad_3 * hook_a).sum(dim=(2, 3), keepdim=True)
nan_mask = grad_2 > 0  # only divide where the denominator cannot vanish
grad_2[nan_mask] = grad_2[nan_mask] / denom[nan_mask]  # indexed assignment: grad_2[nan_mask].div_() would only modify a temporary copy
weights = grad_2.squeeze_(0).mul_(F.relu(hook_g.squeeze(0))).sum(dim=(1, 2))
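  For reference, the weighting this snippet implements is (in the notation of Section 2, with the higher-order derivatives approximated by powers of the first-order gradient as described above):

$$\alpha^{kc}_{ij} = \frac{\left(\frac{\partial y^c}{\partial A^k_{ij}}\right)^{2}}{2\left(\frac{\partial y^c}{\partial A^k_{ij}}\right)^{2} + \sum_{a,b} A^k_{ab}\left(\frac{\partial y^c}{\partial A^k_{ij}}\right)^{3}}, \qquad w^c_k = \sum_{i,j}\alpha^{kc}_{ij}\,\mathrm{ReLU}\left(\frac{\partial y^c}{\partial A^k_{ij}}\right),$$

after which weights plays exactly the same role as in GradCAM.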


Figure 5. GradCAM++ method output results
