This is one of many methods for visualizing convolutional neural networks. It comes from the paper "Learning Deep Features for Discriminative Localization" (a Chinese translation of the paper is also available).
The core idea of the paper is a technique called Class Activation Mapping (CAM), which visualizes what a CNN "looks at" when it classifies an image. The intuition is as follows: the convolutional layers of a CNN retain a great deal of spatial information, which gives them good localization ability, but the fully connected layers discard it. If we keep only the final classification layer (the softmax) and replace the remaining fully connected layers with a Global Average Pooling (GAP) layer, this localization ability is preserved. Multiplying each unit's weight in the final softmax layer by the corresponding feature map of the last convolutional layer (a weighted sum) and rendering the result as a heat map yields the class activation map.
For example, suppose the output of the last convolutional layer has shape (14, 14, 512): the first two dimensions are width and height, and the third is the depth (the number of feature maps). The softmax weight matrix has shape (512, 10): the first dimension is the number of input units and the second is the number of classes. To obtain the class activation map for a given class, multiply the (14, 14, 512) output of the last convolutional layer by that class's (512, 1) column of the softmax weights; the result is a (14, 14, 1) matrix, which is the class activation map for that class. The figure below shows an example:
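As a sanity check on these shapes, the weighted sum can be written as a single matrix product. A NumPy sketch with random data standing in for real feature maps and weights (the shapes follow the example above):

```python
import numpy as np

conv_output = np.random.rand(14, 14, 512)   # last conv layer output for one image
softmax_weights = np.random.rand(512, 10)   # (units, classes)

target_class = 3  # arbitrary class index, chosen for illustration
# Dotting (14, 14, 512) with (512,) collapses the depth axis -> (14, 14)
cam = np.dot(conv_output, softmax_weights[:, target_class])
print(cam.shape)  # (14, 14)
```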
Driver's driving state classification
The paper notes that the higher the resolution of the last convolutional layer's output, the stronger the localization ability and the better the resulting CAM. The corresponding modification is to cut off not only the fully connected layers but also some of the convolutional layers, so that the spatial resolution stays at roughly 14 × 14. Below is a figure from the paper. The biggest difference from the image above is the red regions; the cause may be the resolution, or simply the color mapping, and further experiments would be needed to tell, but it does not affect the visualization. Classification accuracy also remains above 90%.
dog classification
A natural question arises here: if so many layers are cut off, will the accuracy drop?
The answer is yes, but not by much, and the network's accuracy can be recovered by fine-tuning.
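One way to see why little discriminative power is lost: with GAP followed by a single softmax layer, the pre-softmax score for a class is exactly the spatial average of that class's activation map, so the classifier still sees the same weighted feature evidence, just pooled over space. A pure-NumPy sketch of this identity (shapes and data are illustrative, not from the paper's network):

```python
import numpy as np

rng = np.random.default_rng(0)
feat = rng.random((14, 14, 512))   # last conv feature maps for one image
w = rng.random((512, 10))          # weights of the single remaining softmax layer

gap = feat.mean(axis=(0, 1))       # Global Average Pooling -> (512,)
score = gap @ w[:, 3]              # pre-softmax score for class 3

cam = np.dot(feat, w[:, 3])        # class activation map, shape (14, 14)
# The class score equals the spatial mean of its activation map
print(np.allclose(score, cam.mean()))  # True
```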
Below is the part everyone cares about most: the code. I use Keras with the TensorFlow backend, so the color channel comes last; users of other frameworks can adjust accordingly. The code will be pushed to a GitHub repository shortly.
import numpy as np
import cv2
import matplotlib.pyplot as plt
from keras import backend as K

def visualize_class_activation_map(model, img_path, target_class):
    '''
    Parameters:
        model: the trained model
        img_path: path to the input image
        target_class: index of the target class
    '''
    origin_img = get_im_cv2([img_path], 224, 224, 3)   # custom image-loading function
    class_weights = model.layers[-1].get_weights()[0]  # weights of the final softmax layer
    final_conv_layer = model.layers[17]                # index of the last convolutional layer
    get_output = K.function([model.layers[0].input],
                            [final_conv_layer.output, model.layers[-1].output])
    [conv_outputs, predictions] = get_output([origin_img])
    conv_outputs = conv_outputs[0, :, :, :]            # drop the batch dimension -> (14, 14, 512)
    # Weighted sum of the feature maps gives the class activation map
    cam = np.zeros(dtype=np.float32, shape=(14, 14))
    for i, w in enumerate(class_weights[:, target_class]):
        cam += conv_outputs[:, :, i] * w
    cam = cv2.resize(cam, (224, 224))                  # upsample to the input image size
    cam = 100 * cam                                    # scale for display
    plt.imshow(origin_img[0])
    plt.imshow(cam, alpha=0.8, interpolation='nearest')  # overlay the heat map
    plt.show()
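One caveat about the fixed `cam = 100 * cam` scaling: CAM values vary from image to image, so rescaling the map to [0, 1] before overlaying is often more robust. A small helper to that end (my addition, not part of the original code):

```python
import numpy as np

def normalize_cam(cam):
    """Clamp negatives and rescale a CAM to the [0, 1] range."""
    cam = np.maximum(cam, 0)           # keep only positive class evidence
    span = cam.max() - cam.min()
    if span == 0:                      # constant map: nothing to highlight
        return np.zeros_like(cam)
    return (cam - cam.min()) / span

# Toy example: values are rescaled to [[0, 0.5], [0.25, 1]]
print(normalize_cam(np.array([[0.0, 2.0], [1.0, 4.0]])))
```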
Original link: https://www.jianshu.com/p/641a6fc97117