Deep Learning Theory (18) -- SENet and the Attention Mechanism

Scientific knowledge

ILSVRC (the ImageNet Large Scale Visual Recognition Challenge) is one of the most prestigious and authoritative academic competitions in computer vision, representing the state of the art in image recognition. It is built on the ImageNet dataset, a project led by Professor Fei-Fei Li of Stanford University that contains more than 14 million labeled full-resolution images. Each year the ILSVRC competition draws a subset of samples from ImageNet. Taking 2012 as an example, the competition's training set contained 1,281,167 images, the validation set 50,000 images, and the test set 100,000 images.


Foreword

In the previous article we studied the DenseNet network. Its core idea is that, within each dense block, the input to every layer is the channel-wise concatenation of the outputs of all preceding layers. This alleviates the vanishing-gradient problem to a certain extent and therefore makes it possible to build deeper neural networks. A minimal sketch of that connectivity pattern is given below.
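To make the dense-connectivity recap concrete, here is a minimal, hedged sketch in PyTorch (the framework this series uses for its practical parts). The `TinyDenseBlock` name, the growth rate, and the BN-ReLU-3x3-conv layers are illustrative assumptions rather than the exact DenseNet configuration; the point is only the channel-wise concatenation:

```python
import torch
import torch.nn as nn

class TinyDenseBlock(nn.Module):
    """Illustrative dense block: each layer receives the channel-wise
    concatenation of the block input and all previous layer outputs."""
    def __init__(self, in_channels: int, growth_rate: int = 12, num_layers: int = 3):
        super().__init__()
        self.layers = nn.ModuleList()
        channels = in_channels
        for _ in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, growth_rate, kernel_size=3, padding=1, bias=False),
            ))
            channels += growth_rate  # input width grows because outputs are concatenated

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            out = layer(torch.cat(features, dim=1))  # splice along the channel dimension
            features.append(out)
        return torch.cat(features, dim=1)            # in_channels + num_layers * growth_rate channels
```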

Today we continue with a new network, SENet, whose appearance once again reshaped the direction of convolutional neural network design. What problem does this network address? What does it actually do? How well does it work? Let's find out!

1. Overview of SENet

SENet stands for Squeeze-and-Excitation Networks, literally a network of "squeezing" and "excitation". It took first place in the ILSVRC 2017 competition, and because it can be easily embedded into other convolutional neural networks to improve their accuracy, it attracted a great deal of attention. It also prompted the community to think seriously about attention mechanisms in convolutional neural networks, and various attention mechanisms were subsequently born, such as CBAM, DACNet, and so on.

Paper screenshots

Paper address: https://arxiv.org/pdf/1709.01507.pdf

2. Reasons for SENet

Let's first consider a question: in the classic networks we studied previously, what exactly was being improved, and from what angle? In fact, the improvements came mainly from the structure and depth of the network, that is, from how layers are stacked and connected; relatively few improvements were made to the features produced inside the network itself. SENet makes its improvement from inside the network, which is why its core idea can be easily embedded into earlier classic architectures (mainly convolutional neural networks).

1. From the abstract: previous work has tried to improve the representational power of CNNs by enhancing the quality of spatial encodings throughout the feature hierarchy. Put more simply: try to improve the quality of the features inside a CNN. Why? Because not every feature is equally useful; what deserves enhancement should be enhanced, and what deserves suppression should be suppressed (suppression being the flip side of enhancement). The goal is a kind of feature refinement that makes the CNN's features more representative.

2. From the introduction: some recent work has improved the representational power of CNNs by integrating learning mechanisms that capture spatial correlations between features.

3. To sum up, the work mentioned above improves the representational power of CNNs from the perspective of spatial relationships. Is there somewhere else the same effect could be achieved?

4. The authors therefore ask: can we instead model the relationships (interdependencies) between feature-map channels to improve the representational power of a CNN? A short sketch of that idea follows below.
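Before moving on, here is a tiny, hedged illustration of what "looking at channels" means in practice: global average pooling collapses each channel's H x W spatial map into a single number, giving one descriptor per channel that later layers can use to model inter-channel dependencies. The shapes below are arbitrary examples:

```python
import torch
import torch.nn.functional as F

# Collapse each channel's spatial map into a single descriptor.
x = torch.randn(1, 256, 56, 56)                   # (N, C, H, W) feature map, sizes chosen arbitrarily
channel_descriptor = F.adaptive_avg_pool2d(x, 1)  # -> (1, 256, 1, 1): one value per channel
print(channel_descriptor.flatten(1).shape)        # torch.Size([1, 256]) -- a C-dimensional channel summary
```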

3. Contributions of SENet

In this paper, the authors study another aspect of network design: the relationship between channels. They propose a new structural unit, named the Squeeze-and-Excitation (SE) block.

1. A new structural unit called the SE block is introduced. What does it do? It models the interdependencies between feature-map channels. What is that relationship good for? It is used to improve the representational quality of a CNN (think of it as feature refinement that makes the features more expressive).

2. The module can be easily embedded into most CNNs, at any level and any depth, from early stages to late stages, and its benefits accumulate across the network.

3. Networks built with this module dominated that year's ILSVRC competition and set a new record on ImageNet, and the module improves the performance of almost all earlier classic CNN architectures.

4. SENet's Network Architecture

In fact, the architecture of this module is very simple, but simple things are often the ones most easily overlooked. In short, the module is streamlined, efficient, lightweight, and easy to integrate.

SE block structure diagram:


Schematic of the SE block embedded into mainstream CNN architectures:


Looking at the first figure alone you may not quite follow it, and you might have to work carefully through the formulas in the paper; but once you see the second figure you should understand what the so-called SE block really is. Put simply: after an SE block is attached to some layer, the feature map it receives first goes through global average pooling, so the spatial size becomes 1x1 while the number of channels stays the same (why keep the channels? Because what we want to learn is the dependency between channels; changing the channel count would break that). This is followed by a fully connected layer, an activation layer, another fully connected layer, and a sigmoid activation (why sigmoid rather than another activation? Because it squashes the per-channel values into the range 0-1, which makes them easier to learn and use as weights). The final Scale step then multiplies the learned per-channel values with the original input feature map, broadcasting each channel's value across its spatial positions (element-wise scaling with broadcasting, not a matrix multiplication).

Wait, something should click here: isn't the SE module just learning one value (in the 0-1 range) per channel of the original feature map? Isn't that equivalent to learning a weight for each channel? Exactly, and if you understand this, you understand the essence of SE: it learns an adaptive weight for each channel, determining which channels deserve more attention, which are less important, and which are almost unnecessary (weight close to 0). One subtlety remains: each channel contains many feature points, yet we learn only a single value per channel, so the learned weight applies to all feature points in that channel. In the actual computation, each channel's weight is broadcast to every feature point on that channel, so every feature point ends up with a learned weight. This also shows, from another angle, that SE focuses on the channel as a whole and treats every spatial position within a channel as equally important. A minimal sketch of the block is given below.
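The description above maps almost line-for-line onto code. Below is a minimal, hedged PyTorch sketch of an SE block (squeeze -> excitation -> scale). The class name `SEBlock`, the `reduction` ratio of 16, and the bias-free linear layers are assumptions chosen to mirror common implementations, not an exact reproduction of the authors' released code:

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Sketch of channel attention: squeeze -> excitation -> scale."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)  # global average pooling -> spatial size 1x1
        self.excitation = nn.Sequential(
            nn.Linear(channels, channels // reduction, bias=False),  # FC: reduce to C/r
            nn.ReLU(inplace=True),                                   # activation
            nn.Linear(channels // reduction, channels, bias=False),  # FC: restore to C
            nn.Sigmoid(),                                            # per-channel weights in (0, 1)
        )

    def forward(self, x):
        n, c, _, _ = x.shape
        w = self.squeeze(x).view(n, c)            # squeeze: (N, C, H, W) -> (N, C)
        w = self.excitation(w).view(n, c, 1, 1)   # excitation: learned channel weights
        return x * w                              # scale: broadcast over H and W (element-wise)

# Usage sketch: re-weight the output feature map of some convolutional stage.
feat = torch.randn(2, 64, 32, 32)  # dummy feature map from a hypothetical conv layer
se = SEBlock(64)
out = se(feat)
print(out.shape)                   # torch.Size([2, 64, 32, 32]) -- same shape, channels re-weighted
```

Because the output has the same shape as the input, the block can be dropped in after almost any convolutional stage of an existing network, which is exactly why the paper describes it as easy to embed.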

The End


This is the end of this issue's SENet sharing. If you have time, you can study the code referenced in the paper in advance. In the next issue we will use the PyTorch framework to move on to the hands-on part. Stay tuned!

Editor: Layman Yueyi | Reviewer: Layman Xiaoquanquan



Past review

Deep Learning Theory (18) -- DenseNet covers thousands

Deep Learning Theory (17) -- ResNet's deep classic

Deep Learning Theory (16) -- GoogLeNet's Re-exploration of the Mystery of Depth

Also from the past:

[Year-end Summary] Saying goodbye to the old and welcoming the new, 2020, let's start again

[Year-end summary] 2021, bid farewell to the old and welcome the new

[Year-end summary] Pay tribute to the distant 2021 and embrace a different 2022



Origin blog.csdn.net/xyl666666/article/details/123785177