Introducing the attention mechanism into ResNet: a performance-boosting trick for computer vision! Usage instructions included

Why use BoTNet? Design ideas

In recent years, convolutional backbone networks have made great progress across many areas of computer vision. This is thanks to convolution's ability to efficiently capture local information in images. However, vision tasks such as instance segmentation, object detection, and keypoint detection also require modeling long-range dependencies.

1. Why introduce an attention mechanism? A purely convolutional architecture must stack many convolutional layers before locally computed features can be aggregated into a global view. Although stacking more layers can improve the performance of these backbone networks, explicitly modeling global dependencies may be a more powerful and scalable solution.
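To make the "many layers for global context" point concrete, here is a minimal receptive-field calculation for stacked 3x3 convolutions with stride 1. The stride-1, 3x3 setup is an illustrative assumption, not a specific network from the article; the point is that each such layer only widens the receptive field by 2 pixels, whereas a single self-attention layer is already global.

```python
# Receptive-field arithmetic for stacked 3x3 convolutions (stride 1).
# Each layer grows the receptive field by (kernel - 1) = 2 pixels,
# so global context emerges only after many layers.
def receptive_field(num_layers, kernel=3):
    """Receptive field (in pixels) after stacking `num_layers` convs."""
    return 1 + num_layers * (kernel - 1)

# How many 3x3 layers until one output pixel "sees" a full 224-pixel input?
layers = 0
while receptive_field(layers) < 224:
    layers += 1
print(layers)  # prints 112
```

So on the order of a hundred stride-1 3x3 layers would be needed before any output position covers a 224-pixel input, which is why real networks rely on downsampling, and why a global operation like self-attention is attractive.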

2. Why not replace all convolutions with attention? Input images in object detection (around 1024 pixels) are very large compared to those in image classification (around 224 pixels). For self-attention, memory and compute grow quadratically with the number of input positions, so training and inference become inefficient to the point of being impractical at detection resolutions.
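The quadratic cost can be checked with a quick back-of-the-envelope calculation. Assuming the attention operates on a feature map downsampled by stride 32 (a common backbone output stride; the stride is my assumption, not stated in the article), the attention matrix has one entry per pair of spatial tokens:

```python
# Size of the (tokens x tokens) self-attention matrix for a square input.
# Assumes the feature map fed to attention has stride 32 (hypothetical
# but typical for a backbone's final stage).
def attention_entries(image_size, stride=32):
    """Entries in the attention matrix for an image_size x image_size input."""
    tokens = (image_size // stride) ** 2  # flattened feature-map positions
    return tokens * tokens

cls = attention_entries(224)   # classification-sized input -> 49 tokens
det = attention_entries(1024)  # detection-sized input -> 1024 tokens
print(cls, det, det / cls)     # prints 2401 1048576 436.72...
```

A 224-pixel input yields a 49 x 49 attention matrix (2,401 entries), while a 1024-pixel input yields 1024 x 1024 (over a million entries), roughly a 437x increase in attention memory for a ~4.6x increase in image side length. This is why BoTNet applies self-attention only in the low-resolution final stage of ResNet rather than throughout the network.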


Origin blog.csdn.net/weixin_47967031/article/details/114883345