VGG16 feature map visualization and image convolution data flow analysis (PyTorch)

I was recently studying deep-learning object detection networks. While working through the code, I became curious about what the image features inside the "black box" actually look like, so I dug into the relevant material and implemented feature map visualization for the vgg16 network.
Here is the complete code:

import torch
from torchvision import models, transforms
from PIL import Image
import matplotlib.pyplot as plt
import numpy as np
import cv2
 
 
# Convert the input image into a tensor ready for convolution
def transfer_image(image_dir):
    # Open the image in RGB format.
    # PyTorch's DataLoader works with images read by PIL,
    # so reading images this way is recommended;
    # for grayscale images use convert('L')
    image_info = Image.open(image_dir).convert('RGB')
    # Preprocessing pipeline
    image_transform = transforms.Compose([
        transforms.Resize(1024),     # scale the shorter side to 1024 pixels
        transforms.CenterCrop(672),  # crop a 672*672 patch from the center
        transforms.ToTensor(),       # convert the image to a float tensor in [0, 1]
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])  # normalize with the ImageNet mean and std
    ])
    image_info = image_transform(image_info)  # apply the preprocessing
    # The block below converts the tensor back into an image, showing how
    # OpenCV drawing differs from the PIL/torch conventions
    array1 = image_info.numpy()
    maxvalue = array1.max()
    array1 = array1 * 255 / maxvalue  # rescale to [0, 255]
    mat = np.uint8(array1)
    print('mat_shape:', mat.shape)
    mat = mat.transpose(1, 2, 0)  # CHW -> HWC, the layout OpenCV expects
    # mat is still in RGB order; cv2.imwrite assumes BGR, so this file has swapped colors
    cv2.imwrite('convert_cocomi1.jpg', mat)
    mat = cv2.cvtColor(mat, cv2.COLOR_RGB2BGR)  # swap to BGR so OpenCV saves correct colors
    cv2.imwrite('convert_cocomi2.jpg', mat)

    image_info = image_info.unsqueeze(0)  # add a batch dimension for the conv layers
    return image_info
 
# Get the feature map output by the k-th layer
def get_k_layer_feature_map(feature_extractor, k, x):
    with torch.no_grad():
        # feature_extractor is the feature-extraction part of the network
        # (see the full vgg16 structure printed below)
        for index, layer in enumerate(feature_extractor):
            x = layer(x)  # push the image tensor through the layer at this position
            if k == index:  # k is the index of the layer whose feature map we want
                return x
 
# Visualize the feature maps
def show_feature_map(feature_map):
    feature_map = feature_map.squeeze(0)  # drop the batch dimension: [1, C, H, W] -> [C, H, W]
    feature_map = feature_map.cpu().numpy()  # move off the GPU and convert to numpy
    feature_map_num = feature_map.shape[0]  # one feature map per output channel of that layer
    row_num = int(np.ceil(np.sqrt(feature_map_num)))  # grid size; subplot needs an int
    plt.figure()
    for index in range(1, feature_map_num + 1):
        plt.subplot(row_num, row_num, index)
        plt.imshow(feature_map[index - 1], cmap='gray')
        plt.axis('off')
        # scipy.misc.imsave was removed in SciPy >= 1.2; plt.imsave does the same job
        plt.imsave(str(index) + ".png", feature_map[index - 1], cmap='gray')
    plt.show()
 
 
 
 
if __name__ == '__main__':
    # Path of the input image
    image_dir = 'cocomi.jpg'
    # Index of the layer whose feature map to extract
    k = 5
    # Load the vgg16 model that ships with torchvision
    # (pretrained=False uses random weights; pass pretrained=True
    # for ImageNet weights and more meaningful feature maps)
    model = models.vgg16(pretrained=False)
    # Whether to run on the GPU
    use_gpu = torch.cuda.is_available()
    use_gpu = False
    # Read and preprocess the image
    image_info = transfer_image(image_dir)
    # Move model and data to the GPU if requested
    if use_gpu:
        model = model.cuda()
        image_info = image_info.cuda()
    # vgg16 consists of features and classifier, but only the features part
    # produces 2-D feature maps; the classifier outputs are vectors
    feature_extractor = model.features
    feature_map = get_k_layer_feature_map(feature_extractor, k, image_info)
    show_feature_map(feature_map)

After running the code, the result is as follows:
[Figure: grid of the feature maps extracted at k = 5]
You can see that 128 feature maps are generated when k = 5. Why 128? The network structure makes this clear, and looking at it also helps beginners understand feature map dimensions.
Debugging with ipdb shows:

ipdb> model                                                                                                                                           
VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace=True)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace=True)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace=True)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace=True)
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU(inplace=True)
    (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (18): ReLU(inplace=True)
    (19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (20): ReLU(inplace=True)
    (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (22): ReLU(inplace=True)
    (23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (25): ReLU(inplace=True)
    (26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (27): ReLU(inplace=True)
    (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (29): ReLU(inplace=True)
    (30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(7, 7))
  (classifier): Sequential(
    (0): Linear(in_features=25088, out_features=4096, bias=True)
    (1): ReLU(inplace=True)
    (2): Dropout(p=0.5, inplace=False)
    (3): Linear(in_features=4096, out_features=4096, bias=True)
    (4): ReLU(inplace=True)
    (5): Dropout(p=0.5, inplace=False)
    (6): Linear(in_features=4096, out_features=1000, bias=True)
  )
)

The above is the complete vgg16 network structure, divided into two parts: features and classifier. The features part, layers (0) through (30), is what produces the feature maps; it consists of convolutional layers, activation layers, and max-pooling layers. Our k = 5 corresponds to (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)): 64 input channels and 128 output channels, which means the layer outputs 128 feature maps.
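You can verify this with a quick shape check — a minimal sketch, pushing a dummy input through the layers up to index k (the 224*224 input size here is arbitrary):

import torch
from torchvision import models

model = models.vgg16(pretrained=False)
x = torch.randn(1, 3, 224, 224)  # dummy RGB input, batch size 1
with torch.no_grad():
    for index, layer in enumerate(model.features):
        x = layer(x)
        if index == 5:
            break
print(x.shape)  # torch.Size([1, 128, 112, 112]) -> 128 feature maps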
Below are a few more example images to show the extraction effect:
[Figure: feature maps extracted from example image 1]
[Figure: feature maps extracted from example image 2]
The results above make the effect more intuitive.
As for how the image data flows through the convolutions, if you carefully debug the code part by part, I believe the picture will become much more concrete. Let me briefly walk through it alongside the code:

First, a three-channel color image enters the convolutional layer. During the operation, each kernel convolves the three channels separately and sums the results into a single feature map. The first convolutional layer has 64 kernels (a manually chosen hyperparameter), and each kernel produces one feature map, so one input image becomes a 64-channel output: 64 feature maps, flowing onward in tensor form. All subsequent convolution and pooling layers follow the same principle.
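To make the shape flow concrete, here is a minimal sketch (the image sizes are arbitrary):

import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, padding=1)
print(conv.weight.shape)  # torch.Size([64, 3, 3, 3]): 64 kernels, each spanning all 3 input channels

image = torch.randn(1, 3, 672, 672)  # one RGB image as an [N, C, H, W] tensor
print(conv(image).shape)  # torch.Size([1, 64, 672, 672]): 64 feature maps

batch = torch.randn(8, 3, 672, 672)  # a mini-batch only enlarges the leading N dimension
print(conv(batch).shape)  # torch.Size([8, 64, 672, 672])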
Sometimes, when processing several images at once, the tensor takes a new form: a mini-batch simply enlarges the leading batch dimension, giving [N, C, H, W] as in the sketch above.
After the convolution operation, the feature map still exists in tensor form. To visualize it, you only need to convert the tensor to numpy form, and the cv library can then draw it. All of this can be traced in the code posted above. The specific differences between the drawing conventions are as follows:

The image data OpenCV supports is in numpy format with dtype uint8 and pixel values in [0, 255]. A tensor's pixel values are not distributed over [0, 255] and its dtype is float32, so you need to rescale and convert the data to bring it into [0, 255]. Another point is that tensor and numpy store the dimensions in a different order (CHW vs. HWC), which requires a transpose.
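A minimal sketch of that conversion, mirroring what transfer_image does above (random data stands in for a real image):

import numpy as np
import torch

tensor_img = torch.rand(3, 672, 672)  # float32, CHW order, values not in [0, 255]
arr = tensor_img.numpy()
arr = np.uint8(arr * 255 / arr.max())  # rescale to [0, 255] and cast to uint8
arr = arr.transpose(1, 2, 0)  # CHW -> HWC, the layout numpy/OpenCV expect
print(arr.shape, arr.dtype)  # (672, 672, 3) uint8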

In addition, the color-channel order in OpenCV is BGR, while the image color channels in PIL and torch are RGB; use cvtColor to convert between them.
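For example (the output filename is just a placeholder):

import cv2
import numpy as np

rgb = np.uint8(np.random.rand(672, 672, 3) * 255)  # an HWC uint8 image in RGB order
bgr = cv2.cvtColor(rgb, cv2.COLOR_RGB2BGR)  # swap the R and B channels for OpenCV
cv2.imwrite('example_bgr.jpg', bgr)  # colors are now saved correctly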

Origin: blog.csdn.net/qq_44442727/article/details/112977805