"Pyramid Attention Network for Image Semantic Segmentation" Paper Notes

Module 1: The Feature Pyramid Attention Module (FPA)
The FPA module combines the spatial pyramid structure of PSPNet/DeepLab with a SENet-style attention mechanism. It fuses context information at different scales while providing pixel-level attention for the high-level feature maps, which enlarges the receptive field and helps classify small objects.
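The core idea can be sketched in a few lines of NumPy. This is a simplified illustration, not the paper's exact architecture: the 3×3/5×5/7×7 convolution branches are replaced by average pooling at three scales, and the learned 1×1 convolutions are omitted.

```python
import numpy as np

def avg_pool(x, k):
    """k x k average pooling with stride k on a (C, H, W) array."""
    c, h, w = x.shape
    return x.reshape(c, h // k, k, w // k, k).mean(axis=(2, 4))

def upsample(x, k):
    """Nearest-neighbour upsampling by factor k."""
    return x.repeat(k, axis=1).repeat(k, axis=2)

def feature_pyramid_attention(feat):
    """Sketch of FPA on a (C, H, W) high-level feature map (H, W divisible by 8)."""
    # Pyramid branches: context at 1/2, 1/4, 1/8 resolution, fused by addition
    attention = sum(upsample(avg_pool(feat, k), k) for k in (2, 4, 8))
    # Pixel-wise multiplication: the multi-scale context re-weights each pixel
    out = feat * attention
    # Global pooling branch, added back for SENet-style global context
    out = out + feat.mean(axis=(1, 2), keepdims=True)
    return out
```

The key step is the pixel-wise multiplication: unlike SENet, which produces one weight per channel, the pyramid here yields a full-resolution attention map, so every spatial position of the high-level features gets its own weight.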

Module 2: Global Attention Upsampling Module (GAU)
The GAU module extracts the global context of the high-level features and uses it to weight the low-level features (i.e., high-level features guide low-level features).
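The guidance mechanism can be sketched as follows. This is a minimal stand-in, assuming global average pooling, a 1×1 convolution modelled as a matrix multiply `w`, and nearest-neighbour upsampling; the paper's 3×3 convolution on the low-level path is omitted.

```python
import numpy as np

def global_attention_upsample(low_feat, high_feat, w):
    """Sketch of GAU: high-level global context re-weights low-level channels.

    low_feat:  (C, H, W) low-level feature map
    high_feat: (C, h, w) high-level feature map (h divides H, w divides W)
    w:         (C, C) stand-in for the 1x1 conv applied to the pooled vector
    """
    # Global average pooling of the high-level features -> (C,) context vector
    context = high_feat.mean(axis=(1, 2))
    # 1x1 conv (here a matrix multiply) + sigmoid -> per-channel weights
    weights = 1.0 / (1.0 + np.exp(-(w @ context)))
    # Weight the low-level features channel by channel (multiplication)
    attended = low_feat * weights[:, None, None]
    # Nearest-neighbour upsample of the high-level features to low-level size
    sh = low_feat.shape[1] // high_feat.shape[1]
    sw = low_feat.shape[2] // high_feat.shape[2]
    upsampled = high_feat.repeat(sh, axis=1).repeat(sw, axis=2)
    # Element-wise addition fuses the two branches
    return attended + upsampled
```

Note the division of labour: multiplication applies the attention weights, while addition fuses the attended low-level features with the upsampled high-level features.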

Overall network model: Pyramid Attention Network (PAN)
Ablation experiments
1. FPA module
Legend: AVE = average pooling, MAX = max pooling, C333 = all 3×3 convolution kernels, C357 = 3×3, 5×5 and 7×7 kernels respectively, GP = global pooling branch, SE = SENet attention module.
2. GAU module
(1) Only skip-connected low-level features, without the global-context attention branch. (2) Use a 1×1 convolution to reduce the number of low-level channels in the GAU module. (3) Use a 3×3 convolution instead of the 1×1 convolution to reduce the number of channels.
3. Comparison with other classic network models
PASCAL VOC 2012 dataset; Cityscapes dataset.
Article link: https://blog.csdn.net/guleileo/article/details/80544835

Question: Why does FPA multiply pixel by pixel, and what is the significance? When should addition be used and when multiplication?

Source: blog.csdn.net/qq_45234219/article/details/113868526