Feature Visualization Technique (CAM)


CAM can help us understand how a CNN looks for target objects in an image: it can be used to visualize the intermediate feature maps of a CNN and to interpret and visualize image classification results. CAM is also relatively simple to implement with commonly used deep learning frameworks such as PyTorch and TensorFlow.

CAM (Class Activation Map) is a feature visualization technique that helps us understand the decision-making process of a neural network for image classification. It produces a heat map showing which regions of the input image play an important role in the network's classification decision. CAM is mainly applicable to image classification tasks based on convolutional neural networks.

In a CNN, each convolutional layer produces a set of feature maps, one per convolution kernel. CAM applies a Global Average Pooling (GAP) operation to the feature maps of the last convolutional layer, and the weights of the fully connected layer that follows indicate how much each pooled feature map contributes to each class. Mapping these class-specific weights back onto the spatial feature maps yields a class activation map (Class Activation Map) for each category, which highlights the image locations that provide evidence for that class.
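
For reference, this classic CAM combination can be written directly in code. The sketch below is only illustrative (the tensor names are assumptions, not from any particular library): given the last convolutional layer's feature maps and the fully connected layer's weights, the class activation map is a weighted sum over channels.

# Minimal sketch of the classic CAM combination (illustrative names):
# feature_maps: output of the last conv layer, shape (C, H, W)
# fc_weights:   weights of the final fully connected layer, shape (num_classes, C)
import torch

def class_activation_map(feature_maps, fc_weights, target_class):
    w = fc_weights[target_class].view(-1, 1, 1)   # (C, 1, 1) class-specific weights
    cam = (w * feature_maps).sum(dim=0)           # weighted sum over channels -> (H, W)
    cam = torch.relu(cam)                         # keep only positive evidence
    return cam / (cam.max() + 1e-8)               # normalize to [0, 1]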

Specifically, the steps of CAM technology are as follows:

  1. Feed the image into the CNN and obtain the output feature maps of the last convolutional layer.

  2. For each target category, compute a weight for every feature map. In the original CAM formulation, these weights are the fully connected layer weights (applied after global average pooling) that connect each pooled feature map to that category.

  3. Multiply each feature map by its weight and sum the results over the channels to obtain the Class Activation Map for that category.

  4. Normalize the class activation map to get the visualization result.

Here is a simple example that uses PyTorch and the CAM idea to visualize which regions of an image contribute significantly to the network's classification decision. To avoid modifying the pre-trained model, the example derives the feature-map weights from gradients with respect to the last convolutional feature maps (in the spirit of Grad-CAM):

import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
from PIL import Image

# Load a pre-trained ResNet50 model
model = torch.hub.load('pytorch/vision', 'resnet50', pretrained=True)
model.eval()

# Define the CAM model
class CAM(nn.Module):
    def __init__(self, feature_extractor, classifier):
        super(CAM, self).__init__()
        self.feature_extractor = feature_extractor
        self.classifier = classifier
        self.features = None
        self.gradient = None

    def gradients_hook(self, grad):
        # Called during backward with the gradient w.r.t. the feature maps
        self.gradient = grad

    def forward(self, x):
        x = self.feature_extractor(x)        # (1, C, H, W) feature maps
        self.features = x
        x.register_hook(self.gradients_hook)
        x = F.adaptive_avg_pool2d(x, 1).flatten(1)  # global average pooling
        x = self.classifier(x)               # (1, num_classes) class scores
        return x

# Define the CAM visualization function
def visualize_CAM(image, model, target_class):
    # Convert the image to a normalized tensor
    transform = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ])
    input_tensor = transform(image).unsqueeze(0)

    # Build the CAM model from the convolutional backbone and the final fc layer
    feature_extractor = nn.Sequential(*list(model.children())[:-2])
    classifier = model.fc
    cam_model = CAM(feature_extractor, classifier)

    # Forward pass
    output = cam_model(input_tensor)

    # Backward pass from the score of the target class
    cam_model.zero_grad()
    output[0, target_class].backward()

    # Compute the CAM: weight each feature map by its average gradient
    features = cam_model.features.detach().squeeze(0)                       # (C, H, W)
    weights = cam_model.gradient.squeeze(0).mean(dim=(1, 2), keepdim=True)  # (C, 1, 1)
    cam = torch.relu((weights * features).sum(dim=0))
    cam = cam / cam.max()

    # Upsample the CAM to the input resolution
    cam = F.interpolate(cam[None, None], size=(224, 224),
                        mode='bilinear', align_corners=False).squeeze()

    # Visualize the CAM overlaid on the (de-normalized) input image
    mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
    std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)
    display = (input_tensor.squeeze(0) * std + mean).clamp(0, 1)
    plt.imshow(display.permute(1, 2, 0))
    plt.imshow(cam, cmap='jet', alpha=0.5, interpolation='nearest')
    plt.axis('off')
    plt.show()

# Load a test image
image = Image.open('test_image.jpg').convert('RGB')

# Visualize the CAM
visualize_CAM(image, model, target_class=0)

In the code above, we first load a pre-trained ResNet50 model and define a CAM model. We then define a visualization function that takes an image, the network, and a target class, and uses the CAM technique to generate a heatmap showing which regions of the input image played an important role in the network's classification decision. Finally, we load a test image and visualize it with the visualize_CAM function.

Note that this is just a simple example; you can modify it as needed to suit your own data and visualization needs.
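
As one possible extension (assuming the variables from the example above, such as image, model, and visualize_CAM, are still in scope), you could visualize the model's own top prediction instead of a hard-coded class index:

# Possible extension: pick the class to visualize from the model's top prediction
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])
with torch.no_grad():
    logits = model(preprocess(image).unsqueeze(0))
predicted_class = logits.argmax(dim=1).item()
visualize_CAM(image, model, target_class=predicted_class)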
