What Convolutional Neural Networks See



This article covers one of many methods for visualizing convolutional neural networks. The method comes from the paper "Learning Deep Features for Discriminative Localization".


The core idea of the paper is a method called Class Activation Mapping (CAM), which can visualize what a CNN "sees" when it classifies an image. The principle is as follows: the convolutional layers of a CNN preserve a great deal of spatial information, which gives the network good localization ability, but the fully connected layers discard it. If only the final classification layer (the softmax) is kept and the remaining fully connected layers are replaced with a global average pooling (GAP) layer, this localization ability is preserved. Multiplying the softmax weights of a given class by the output of the last convolutional layer (a weighted sum over channels) and rendering the result as a heat map yields the class activation map.


For example, suppose the feature map after the last convolutional layer has shape (14, 14, 512), where the first two dimensions are width and height and the third is the number of channels, and the softmax weight matrix has shape (512, 10), where the first dimension is the number of units and the second is the number of classes. To get the class activation map for a particular class, multiply the output of the last convolutional layer by that class's column of the softmax weights: reshaping (14, 14, 512) to (196, 512) and multiplying by the (512, 1) weight vector gives a (196, 1) matrix, which reshapes back to (14, 14). That is the class activation map for the class. The figure below shows some class activation maps:
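The shape bookkeeping above can be sketched in a few lines of NumPy (random arrays stand in here for real convolutional outputs and softmax weights):

```python
import numpy as np

def class_activation_map(conv_output, class_weights, target_class):
    """Weighted sum of feature-map channels, using the softmax
    weights of the target class as the channel weights."""
    h, w, c = conv_output.shape               # e.g. (14, 14, 512)
    flat = conv_output.reshape(h * w, c)      # (196, 512)
    weights = class_weights[:, target_class]  # (512,)
    cam = flat @ weights                      # (196,)
    return cam.reshape(h, w)                  # (14, 14)

# The shapes from the example above:
conv_output = np.random.rand(14, 14, 512).astype(np.float32)
class_weights = np.random.rand(512, 10).astype(np.float32)
cam = class_activation_map(conv_output, class_weights, 3)
```

The same result could be written as an explicit loop over channels, which is how the author's code at the end of the article does it.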


[Figure: class activation maps for driver driving-state classification]


The paper notes that the higher the spatial resolution of the last convolutional layer's output, the stronger the localization ability and the better the resulting CAM. The corresponding processing is not only to remove the fully connected layers but also to cut some of the later convolutional layers, so that the resolution stays at around 14×14. The figure below is from the paper. The biggest difference from the image above is the presence of red regions; the cause may be a resolution issue, or simply a difference in color mapping, and further experiments would be needed to tell, but it does not affect the visualization. Classification accuracy also stays above 90%.
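A minimal sketch of this modification in Keras, assuming a VGG16 backbone truncated at `block5_conv3` (the layer name and the 10-class head are illustrative choices, not necessarily the paper's exact setup; `weights=None` just avoids downloading pretrained weights for the sketch):

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

# Convolutional backbone only (no fully connected top).
backbone = tf.keras.applications.VGG16(
    weights=None, include_top=False, input_shape=(224, 224, 3))

# Truncate after the last 14x14 conv layer, then replace all fully
# connected layers with global average pooling plus a single softmax.
features = backbone.get_layer('block5_conv3').output  # (None, 14, 14, 512)
gap = layers.GlobalAveragePooling2D()(features)       # (None, 512)
out = layers.Dense(10, activation='softmax')(gap)     # (None, 10)

cam_model = Model(inputs=backbone.input, outputs=out)
```

Stopping at `block5_conv3` rather than after `block5_pool` keeps the feature map at 14×14 instead of 7×7, which is exactly the resolution the paper recommends for sharp CAMs.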


[Figure: class activation maps for dog classification, from the paper]


A natural question arises here: if so many layers are cut off, won't the accuracy drop?


The answer is yes, but not by much, and the accuracy can be recovered by fine-tuning the network.


Below is the code most readers care about. I use Keras with the TensorFlow backend, so the color channel comes last; readers using other frameworks can adjust accordingly. The code will be pushed to a GitHub repository in a while.


import cv2
import numpy as np
import matplotlib.pyplot as plt
from keras import backend as K

def visualize_class_activation_map(model, img_path, target_class):
    '''
    Parameters:
        model: the trained model (conv layers + GAP + softmax)
        img_path: path to the input image
        target_class: index of the class to visualize
    '''
    origin_img = get_im_cv2([img_path], 224, 224, 3)   # custom image-loading function
    class_weights = model.layers[-1].get_weights()[0]  # weights of the final softmax layer

    final_conv_layer = model.layers[17]  # index of the last convolutional layer
    get_output = K.function([model.layers[0].input],
                            [final_conv_layer.output, model.layers[-1].output])
    [conv_outputs, predictions] = get_output([origin_img])

    conv_outputs = conv_outputs[0, :, :, :]
    cam = np.zeros(dtype=np.float32, shape=(14, 14))

    # Weighted sum of the feature-map channels for the target class
    for i, w in enumerate(class_weights[:, target_class]):
        cam += conv_outputs[:, :, i] * w

    cam = cv2.resize(cam, (224, 224))
    cam = 100 * cam  # scale for visibility
    plt.imshow(origin_img[0])
    plt.imshow(cam, alpha=0.8, interpolation='nearest')
    plt.show()


Original link: https://www.jianshu.com/p/641a6fc97117

