CAM: Class Activation Mapping


Notes on Tongji Zihao's video: CAM Interpretability Analysis – Algorithm Explanation

Paper: Learning Deep Features for Discriminative Localization, CVPR 2016

Contributions:

  1. Laid the cornerstone of interpretability and saliency analysis


Introduction to class activation heatmap visualization

  1. For the same image, different heatmaps are drawn depending on the target class


  1. An implicit attention mechanism

  2. Weakly supervised learning (an image classification model -> full object localization)

1. Introduction and related work

  1. Convolutional units in a convolutional neural network act as object detectors, but this ability to localize objects is lost when fully connected layers are used for classification.
  2. NIN proposed GAP (global average pooling), whose advantage is not only regularization; more importantly, it preserves the network's localization ability up to the last layer.
  3. GAP can be used for weakly supervised object localization.

Related work: weakly supervised object localization via multiple-instance learning
"R. G. Cinbis, J. Verbeek, and C. Schmid. Weakly supervised object localization with multi-fold multiple instance learning. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2015."
"P. O. Pinheiro and R. Collobert. From image-level to pixel-level labeling with convolutional networks. 2015."


2. Class Activation Mapping


Analysis:

  • The classification logit for class $c$: $S_c = \sum_k \omega_k^c \sum_{x,y} f_k(x,y) = \sum_{x,y} \sum_k \omega_k^c f_k(x,y)$

where $f_k(x,y)$ is the activation of unit $k$ of the last convolutional layer at spatial location $(x,y)$. For unit $k$, global average pooling (GAP) gives $F_k = \sum_{x,y} f_k(x,y)$. For class $c$, the input to the softmax is $S_c = \sum_k \omega_k^c F_k$, where $\omega_k^c$ is the weight encoding the importance of unit $k$ for class $c$.

  • Class activation map $M_c$ for class $c$:
    $M_c(x,y) = \sum_k \omega_k^c f_k(x,y)$

Therefore $S_c = \sum_{x,y} M_c(x,y)$, where $M_c(x,y)$ indicates the importance of the activation at spatial location $(x,y)$ for classifying the image into class $c$.

Each channel of the feature map represents one kind of visual feature extracted from the image by a convolution kernel. The weights $\omega^c$ indirectly reflect the importance of each such feature for class $c$. The resulting 14×14 class activation map is then upsampled to the size of the input image.
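The derivation above can be sketched numerically. This is a minimal NumPy sketch with assumed shapes (4 channels, 14×14 feature maps, random values); it is not the paper's implementation, just a check of the $S_c = \sum_{x,y} M_c(x,y)$ identity:

```python
import numpy as np

# Assumed shapes: K=4 channels, 14x14 feature maps; all values random.
rng = np.random.default_rng(0)
K, H, W = 4, 14, 14
f = rng.random((K, H, W))   # f_k(x, y): last-conv-layer activations
w = rng.random(K)           # w_k^c: GAP-classifier weights for one class c

# CAM: M_c(x, y) = sum_k w_k^c * f_k(x, y)
M_c = np.tensordot(w, f, axes=1)   # shape (H, W)

# Class score through the GAP branch: S_c = sum_k w_k^c * F_k,
# with F_k = sum_{x,y} f_k(x, y)
F = f.sum(axis=(1, 2))
S_c = float(w @ F)

# Identity from the derivation: S_c equals the spatial sum of the CAM
assert np.isclose(S_c, M_c.sum())

# Nearest-neighbour upsampling of the 14x14 map to a 224x224 input
cam_up = np.kron(M_c, np.ones((16, 16)))
```

In practice the upsampling step would use bilinear interpolation; `np.kron` is used here only to keep the sketch dependency-free.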


GAP vs. GMP:

  • Average: features in every discriminative region influence the score
  • Max: changing non-maximal features has no effect (those locations receive no gradient)

Classification performance is similar for both, but localization performance differs: GAP encourages identifying the full extent of the object, while GMP highlights only the single most discriminative part.
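A toy example makes the difference concrete. The 2×2 map and perturbation below are invented for illustration: perturbing a non-maximal location changes the GAP output but leaves the GMP output (and hence its gradient at that location) unchanged:

```python
import numpy as np

# Toy 2x2 feature map (values chosen arbitrarily for illustration)
fmap = np.array([[1.0, 2.0],
                 [3.0, 8.0]])

gap = fmap.mean()    # global average pooling
gmp = fmap.max()     # global max pooling

# Perturb a NON-maximal location
fmap2 = fmap.copy()
fmap2[0, 0] += 4.0   # 1.0 -> 5.0, still below the maximum of 8.0

assert fmap2.mean() != gap   # the average "sees" the change
assert fmap2.max() == gmp    # the max does not: no gradient flows there
```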


Discussion:

  1. Why a fully convolutional network? Why not use pooling?
    Pooling (max or mean) serves to:
    • reduce the amount of computation
    • prevent overfitting
    • provide translation invariance

Pooling (downsampling) introduces translation invariance, which also means that position information along the height and width directions is lost. Therefore, CAM heatmaps are computed on convolutional networks that avoid pooling.

  2. Why global average pooling (GAP)?

Global average pooling (GAP) replaces the fully connected layer, which reduces the number of parameters and prevents overfitting.
The GAP value also indirectly summarizes each channel output by the last convolutional layer.
In the CAM algorithm a GAP layer is required; otherwise the weight of each channel cannot be obtained. [This is a drawback.]
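The parameter saving can be sketched with assumed (hypothetical) sizes: a 512×14×14 final feature map and 1000 classes. The counts below follow directly from the layer shapes, not from any specific model:

```python
import numpy as np

# Hypothetical sizes: 512-channel 14x14 feature map, 1000 classes
C, H, W, num_classes = 512, 14, 14, 1000

fc_params = C * H * W * num_classes   # flatten + dense head: ~100M weights
gap_params = C * num_classes          # GAP + small linear head: ~0.5M weights
assert gap_params < fc_params / 100

# Forward pass with a GAP head
rng = np.random.default_rng(1)
feats = rng.random((C, H, W))
w = rng.random((num_classes, C))
gap = feats.mean(axis=(1, 2))   # one scalar per channel
logits = w @ gap
assert logits.shape == (num_classes,)
```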



[Note] Both GAP and 1×1 convolution were proposed in NIN (Network in Network).


3. Experiment: Localization

Localization task: the image contains an object that must be classified and enclosed in a bounding box.

Method: replace the fully connected layers with GAP $\longrightarrow$ retrain the model.
The fewer the downsampling steps, the larger the feature map output by the last convolutional layer, the less spatial information is lost, and the better the localization performance.


4. Drawbacks of the CAM algorithm:

  1. A GAP layer is required; otherwise the model structure must be modified and the network retrained
  2. Only the output of the last convolutional layer can be analyzed; intermediate layers cannot

Follow-up improvements:

Grad-CAM

  • No GAP layer required
  • Intermediate layers can be analyzed

SqueezeNet (a lightweight network)

Its last convolutional layer directly outputs one feature map per class, i.e., the number of channels equals the number of classes
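With such a head, GAP alone produces the logits, and each channel is already the activation map for its class. A minimal sketch with assumed sizes (10 classes, 13×13 maps, as in SqueezeNet-style outputs; values random):

```python
import numpy as np

# Assumed: the last conv layer outputs one 13x13 map per class (10 classes)
rng = np.random.default_rng(2)
num_classes, H, W = 10, 13, 13
class_maps = rng.random((num_classes, H, W))

# GAP over each per-class map yields the logits directly; no classifier
# weights are needed, and class_maps[c] is itself the CAM for class c
logits = class_maps.mean(axis=(1, 2))
assert logits.shape == (num_classes,)
```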


5. The significance of saliency analysis

1. Industrial applications

  • Machine learning
    • Solving problems in industry, such as parameter tuning
  • Machine teaching
    • Visualization shows people where to pay attention and teaches them to learn

2. AI-assisted teaching

Paper: Making a Bird AI Expert Work for You and Me
It teaches people to distinguish bird species using the discriminative features of different birds in images.




Origin blog.csdn.net/qq_38869560/article/details/128341311