【Reading paper】Visualizing and Understanding Convolutional Networks

1. What is the main purpose of this article?

In this paper,

  • The authors introduce a new visualization technique that reveals the input stimuli which excite individual feature maps at any layer of the model. It lets us observe the evolution of features during model training and gives insight into the function of intermediate feature layers and the operation of the classifier. Guided by these visualizations, the authors find model architectures that outperform Krizhevsky et al. on the ImageNet classification benchmark.
  • The authors also carry out an ablation study to discover the contribution of the different model layers to performance.

2. Specific practices

2.1 Model

A standard fully supervised convolutional neural network model is used in the article. The model takes a 2D color image as input, passes it through a series of layers, and produces a probability vector $\widehat{y}$ over the $C$ categories. Each predicted probability $\widehat{y}_i$ is compared with the true label $y_i$ using the cross-entropy loss; the parameters are trained by backpropagation and updated with stochastic gradient descent.
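As a minimal sketch of the classifier head just described (the logits and class count here are hypothetical, not from the paper): softmax turns the network's scores into the probability vector $\widehat{y}$, and cross-entropy compares it with the one-hot true label.

```python
import numpy as np

def softmax(z):
    # subtract the max for numerical stability before exponentiating
    e = np.exp(z - z.max())
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])   # hypothetical scores for C = 3 classes
y_hat = softmax(logits)              # probability vector over the C classes
true_class = 0
loss = -np.log(y_hat[true_class])    # cross-entropy with a one-hot target
print(y_hat, loss)
```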

(Figure: architecture of the convnet model used in the paper)

2.2 Visualization with a Deconvnet

To explain why convolutional neural networks work, we need to understand what each layer of the CNN has learned. This article uses the deconvolutional network (deconvnet) [2] to visualize the intermediate layers of a CNN, allowing us to observe the evolution of features during model training and gain insight into the function of intermediate feature layers and the operation of the classifier.

In the paper Adaptive deconvolutional networks for mid and high level feature learning, the authors propose the deconvnet as an unsupervised method for learning image representations.

In this paper, however, the deconvnet has no learning role; it is used purely as a probe into an already trained convolutional neural network.

How to do it: first, an input image is presented to the convnet and the features of each layer are computed. To examine a given convnet activation, all other activations in that layer are set to zero and the feature map is passed as input to the attached deconvnet layer. We then successively (i) unpool, (ii) rectify, and (iii) filter to reconstruct the activity in the layer beneath that gave rise to the chosen activation. This is repeated until input pixel space is reached.
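The first two of these steps can be sketched for a single max-pooling layer in plain NumPy (a toy 4x4 feature map and hand-rolled pooling, purely illustrative): the forward pass records switch indices, and the deconvnet direction zeroes all but one activation, unpools it through the switches, and rectifies.

```python
import numpy as np

def unpool(pooled, switches, shape):
    # place each pooled value back at the recorded location of its maximum
    out = np.zeros(shape)
    out.flat[switches.ravel()] = pooled.ravel()
    return out

def relu(x):
    return np.maximum(x, 0.0)

# forward pass: 4x4 feature map -> 2x2 max pool, keeping flat indices of the maxima
feat = np.array([[1., -2., 0., 3.],
                 [4., 0., -1., 2.],
                 [0., 5., 1., 0.],
                 [-3., 2., 0., 6.]])
pooled = np.zeros((2, 2))
switches = np.zeros((2, 2), dtype=int)
for i in range(2):
    for j in range(2):
        win = feat[2*i:2*i+2, 2*j:2*j+2]
        idx = np.argmax(win)
        pooled[i, j] = win.flat[idx]
        switches[i, j] = np.ravel_multi_index((2*i + idx // 2, 2*j + idx % 2), feat.shape)

# deconvnet direction for one chosen activation: zero all others, unpool, rectify
selected = np.zeros_like(pooled)
selected[1, 1] = pooled[1, 1]
recon = relu(unpool(selected, switches, feat.shape))
print(recon)  # only the position of the chosen maximum (value 6) is nonzero
```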

(Figure: a deconvnet layer attached to a convnet layer)


(1) Unpooling (corresponding to the max-pooling operation of the convolutional network)

In theory, the max-pooling operation in a convolutional network is non-invertible, but we can make it approximately invertible by recording the location of the maximum within each pooling region (the pooling switches).

(Figure: unpooling using the recorded switch locations)

(2) Rectification (corresponding to the ReLU operation of the convolutional network)

The convolutional network uses ReLU to keep the feature maps non-negative. To stay consistent with the forward pass, the reconstructed features at each layer of the deconvnet are likewise passed through a ReLU to obtain non-negative values.

(3) Filtering (corresponding to the convolution operation of the convolutional network)

Implemented naively, convolution is inefficient, so mainstream neural network frameworks realize it via im2col plus matrix multiplication, trading space for speed. The elements inside each convolution window of the input are straightened into a single column, converting the input into a matrix with H_out * W_out columns (hence the name im2col); the convolution kernel is likewise flattened (Kernel) and left-multiplies this matrix to give the convolution result (Output). im2col and the matrix multiplication are shown in the following two figures:
(Figure: the im2col transformation)
(Figure: convolution as matrix multiplication)
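The im2col + matrix-multiplication scheme can be checked with a small NumPy sketch (single channel, stride 1, no padding assumed; note that, like most frameworks, this computes cross-correlation, i.e. no kernel flip):

```python
import numpy as np

def im2col(x, kh, kw):
    # straighten each kh x kw window of x into one column
    H, W = x.shape
    h_out, w_out = H - kh + 1, W - kw + 1
    cols = np.empty((kh * kw, h_out * w_out))
    for i in range(h_out):
        for j in range(w_out):
            cols[:, i * w_out + j] = x[i:i+kh, j:j+kw].ravel()
    return cols

def conv2d_im2col(x, k):
    kh, kw = k.shape
    cols = im2col(x, kh, kw)          # (kh*kw, H_out*W_out)
    out = k.ravel() @ cols            # flattened kernel left-multiplies the columns
    return out.reshape(x.shape[0] - kh + 1, x.shape[1] - kw + 1)

x = np.arange(16.0).reshape(4, 4)
k = np.array([[1.0, 0.0], [0.0, -1.0]])
# every window gives x[i,j] - x[i+1,j+1] = -5 for this input
print(conv2d_im2col(x, k))
```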
The deconvolution operation mentioned in this article is actually transposed convolution, which is also how neural network frameworks implement gradient backpropagation, as shown in the following two figures. The transposed kernel matrix (Weight_T) left-multiplies the output gradient (GradOutput) to give the gradient column matrix (GradColumns); col2im then restores it to the input size, yielding the gradient with respect to the input (GradInput).
(Figure: the transposed kernel matrix multiplying GradOutput)
(Figure: col2im restoring GradColumns to the input size)
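This backward step can be sketched in NumPy (single channel, stride 1, no padding; the names GradColumns/GradInput follow the figures, while the helper functions are my own illustration): the transposed kernel column times the flattened output gradient gives GradColumns, and col2im scatter-adds them back to the input shape.

```python
import numpy as np

def col2im(cols, H, W, kh, kw):
    # scatter-add each column back into its window position (inverse of im2col)
    x = np.zeros((H, W))
    h_out, w_out = H - kh + 1, W - kw + 1
    for i in range(h_out):
        for j in range(w_out):
            x[i:i+kh, j:j+kw] += cols[:, i * w_out + j].reshape(kh, kw)
    return x

def conv_backward_input(grad_out, k, H, W):
    # GradColumns = Kernel^T @ GradOutput (flattened), then col2im to input size
    grad_cols = np.outer(k.ravel(), grad_out.ravel())
    return col2im(grad_cols, H, W, k.shape[0], k.shape[1])

k = np.array([[1.0, 0.0], [0.0, -1.0]])
grad_out = np.ones((2, 2))                 # pretend dL/dOutput is all ones
grad_in = conv_backward_input(grad_out, k, 3, 3)
# each input pixel accumulates the kernel weights of every window covering it
print(grad_in)
```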

Code

Reference: https://github.com/huybery/VisualizingCNN






Reference:
[1] Zeiler, M., Taylor, G., Fergus, R. Adaptive deconvolutional networks for mid and high level feature learning. In: ICCV (2011).
[2] Zeiler, M. D., Taylor, G. W., Fergus, R. Adaptive deconvolutional networks for mid and high level feature learning. In: 2011 International Conference on Computer Vision. IEEE, 2011: 2018-2025.
[3] Deep Learning (27): Visual Understanding of Convolutional Neural Networks
[4] "Intuitive Understanding" of Convolutional Neural Networks (1): Deconvolution (Deconvnet)
[5] Principle and Implementation of im2col
[6] Deep Learning - Detailed Explanation of VGG16 Principles
[7] Code Implementation Reference


Origin blog.csdn.net/qq_42757191/article/details/126526741