【Recurrent Attention Convolutional Neural Network for Fine-grained Image Recognition】

Classification of skin diseases based on weakly supervised fine-grained methods

Article title

Look Closer to See Better: Recurrent Attention Convolutional Neural Network for Fine-grained Image Recognition

Article Source

CVPR 2017

Author motivation

Region localization and fine-grained feature learning are the two major challenges in fine-grained recognition. Existing methods (at the time) mostly tackled these two problems independently and ignored the correlation between them, so the authors propose a new architecture, RA-CNN.

Author's ideas

The Attention Proposal Network (APN) predicts a region of the input image; that region is cropped out and then enlarged through bilinear interpolation. The effect is equivalent to discarding everything else in the picture and magnifying what the network "wants" to see, as shown below:
[Figure: the attention region is cropped from the input image and enlarged]
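To make the crop-and-zoom step concrete, here is a minimal PyTorch sketch (illustrative, not the authors' code); the function name `crop_and_zoom` and its integer pixel coordinates are assumptions for readability, and the real model replaces the hard crop with a differentiable approximation (sketched further below) so the APN can be trained end to end:

```python
import torch
import torch.nn.functional as F

def crop_and_zoom(image, tx, ty, tl, out_size=224):
    """Crop a square of half side length tl centered at (tx, ty), then
    enlarge it to out_size x out_size with bilinear interpolation."""
    _, h, w = image.shape
    # Clamp the square so it stays inside the image bounds.
    x0, x1 = max(tx - tl, 0), min(tx + tl, w)
    y0, y1 = max(ty - tl, 0), min(ty + tl, h)
    patch = image[:, y0:y1, x0:x1]
    # F.interpolate expects a batch dimension: (N, C, H, W).
    zoomed = F.interpolate(patch.unsqueeze(0), size=(out_size, out_size),
                           mode="bilinear", align_corners=False)
    return zoomed.squeeze(0)

# Example: zoom into the 100x100 square around (160, 120) of a 448x448 image.
img = torch.rand(3, 448, 448)
head = crop_and_zoom(img, tx=160, ty=120, tl=50)
print(head.shape)  # torch.Size([3, 224, 224])
```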

Network Architecture

[Figure: RA-CNN network architecture]
Rough explanation:
The network takes an original image and performs two tasks on it. First, like conventional image classification, the image goes through convolution, a fully connected layer, and softmax to produce the probabilities of a series of categories. Second, the feature maps produced by the convolutional layers are passed through the Attention Proposal Network (APN) to obtain an attention region. In the figure above, the attention falls on the bird's head, so everything else is cropped away, leaving only the head, which is then enlarged by bilinear interpolation, echoing the title of the paper: the closer you look, the better you see.
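As a rough sketch of one such scale (a toy PyTorch module, not the paper's implementation; `Scale`, the tiny backbone, and the layer sizes are all assumptions), the shared convolutional features feed both a classifier head and an APN head:

```python
import torch
import torch.nn as nn

class Scale(nn.Module):
    """One scale of the RA-CNN idea: a shared conv backbone feeds both a
    classifier head and an APN head that regresses the attention square."""
    def __init__(self, backbone, feat_dim, num_classes):
        super().__init__()
        self.backbone = backbone                      # conv feature extractor
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Linear(feat_dim, num_classes)
        self.apn = nn.Sequential(                     # attention proposal network
            nn.Linear(feat_dim, 128), nn.Tanh(),
            nn.Linear(128, 3), nn.Sigmoid())          # (tx, ty, tl), normalized to [0, 1]

    def forward(self, x):
        fmap = self.backbone(x)                       # (N, feat_dim, h, w)
        feat = self.pool(fmap).flatten(1)             # (N, feat_dim)
        return self.classifier(feat), self.apn(feat)  # class logits, attention params

# Toy usage with a two-layer backbone standing in for the paper's VGG.
backbone = nn.Sequential(nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                         nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
logits, t = Scale(backbone, feat_dim=64, num_classes=200)(torch.rand(2, 3, 448, 448))
print(logits.shape, t.shape)  # torch.Size([2, 200]) torch.Size([2, 3])
```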

Detailed explanation:
For an input image X, feature extraction (convolution operations), a fully connected layer, and softmax yield the probability distribution p over categories:

$$p(X) = f(W_c * X)$$

where $W_c$ denotes the network parameters, $*$ the convolution, pooling, and activation operations, and $f(\cdot)$ the fully connected layer followed by softmax. The loss $L(X)$ combines a classification term at each scale s with a ranking term between adjacent scales:

$$L(X) = \sum_{s=1}^{3} L_{cls}\big(Y^{(s)}, Y^{*}\big) + \sum_{s=1}^{2} L_{rank}\big(p_t^{(s)}, p_t^{(s+1)}\big)$$

Here $L_{cls}$ is the cross-entropy between the predicted label distribution $Y^{(s)}$ at scale s and the ground truth $Y^{*}$, and $L_{rank}(p_t^{(s)}, p_t^{(s+1)}) = \max\{0,\; p_t^{(s)} - p_t^{(s+1)} + \mathrm{margin}\}$ pushes the finer scale to be more confident on the true class t than the coarser one.
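The following PyTorch sketch computes this loss from a list of per-scale logits; the helper names `total_loss` and `rank_loss` are mine, and the 0.05 margin is an assumption based on the paper's reported setting:

```python
import torch
import torch.nn.functional as F

def rank_loss(pt_coarse, pt_fine, margin=0.05):
    """L_rank: the finer scale should beat the coarser scale's probability
    on the ground-truth class by at least the margin."""
    return F.relu(pt_coarse - pt_fine + margin).mean()

def total_loss(logits_per_scale, labels, margin=0.05):
    """L(X): cross-entropy at every scale plus ranking loss between
    adjacent scales. logits_per_scale: list of (N, num_classes) tensors."""
    # p_t at each scale: predicted probability of the true class.
    pt = [F.softmax(l, dim=1).gather(1, labels.unsqueeze(1)).squeeze(1)
          for l in logits_per_scale]
    l_cls = sum(F.cross_entropy(l, labels) for l in logits_per_scale)
    l_rank = sum(rank_loss(pt[s], pt[s + 1], margin) for s in range(len(pt) - 1))
    return l_cls + l_rank

# Example with three scales, a batch of 4, and 200 classes.
labels = torch.randint(0, 200, (4,))
loss = total_loss([torch.randn(4, 200) for _ in range(3)], labels)
print(loss.item())
```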
At the same time, the feature maps produced by the convolutional layers are passed through the attention proposal network (APN) to obtain a square attention region, recorded as:

$$[t_x, t_y, t_l] = g(W_c * X)$$

where $t_x$ and $t_y$ are the coordinates of the attention center and $t_l$ is half the side length of the square. This square is the part of the original image we keep.
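The paper makes this crop differentiable by multiplying the image with a soft boxcar mask built from shifted sigmoids, so gradients flow back into $(t_x, t_y, t_l)$. A minimal sketch of that mask, with the sigmoid steepness `k` chosen arbitrarily here:

```python
import torch

def attention_mask(h, w, tx, ty, tl, k=10.0):
    """Soft boxcar over [tx - tl, tx + tl] x [ty - tl, ty + tl], built from
    shifted sigmoids so it is differentiable w.r.t. tx, ty, tl."""
    ys = torch.arange(h, dtype=torch.float32).view(h, 1)
    xs = torch.arange(w, dtype=torch.float32).view(1, w)
    box_x = torch.sigmoid(k * (xs - (tx - tl))) - torch.sigmoid(k * (xs - (tx + tl)))
    box_y = torch.sigmoid(k * (ys - (ty - tl))) - torch.sigmoid(k * (ys - (ty + tl)))
    return box_y * box_x  # (h, w): ~1 inside the square, ~0 outside

# Soft crop: multiply the image by the mask; gradients reach (tx, ty, tl).
img = torch.rand(3, 448, 448)
t = torch.tensor([160.0, 120.0, 50.0], requires_grad=True)
attended = img * attention_mask(448, 448, t[0], t[1], t[2])
attended.sum().backward()
print(t.grad)  # non-zero gradients for tx, ty, tl
```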

Origin blog.csdn.net/weixin_46516242/article/details/127853088