R-CNN after reading an article in five minutes

The basic structure of the Alex-Net network is almost the same as that of LeNet more than ten years ago. However, due to the earth-shaking development of data and hardware devices (especially GPU), the deep neural network is no longer a "party trick". It has become a practical and feasible tool and application means. Due to the emergence of Alex-Net, the eyes of the world have returned to the neural network.

Alex-Net network structure
Figure Alex-Net network structure

The Alex-Net network contains 5 convolutional layers and 2 fully connected layers. Inspired by Alex-Net, the author of R-CNN (Regions with CNN features) tried to generalize the ability of Alex-Net's target recognition on the ImageNet dataset to the PASCAL VOC dataset for target detection (now called migration learning) .

Two problems to be solved by R-CNN
1. How to use the convolutional network for target positioning;
2. How to train a network model with good performance on a small data set.

R-CNN target detection process
Figure R-CNN target detection process

R-CNN training process
1. The Alex-net network uses the ImageNet dataset for pre-train (“image classification”);
2. Uses the candidate regions extracted by SS for fine-tune (“target detection”).

R-CNN target detection process
1. Read an input image;
2. Use SS (Selective Search) to extract about 2000 bottom-up (from details to the whole) candidate regions (Region Proposals);
3. The candidate regions (Need to be scaled to 227x227 to make it compatible with Alex-Net) Input the Alex-net network separately, and use the fc7 layer output of Alex-net as the feature; 4. Input the output of the fc7 layer
into the SVM for classification, and use NMS for different types of collections (non-maximal suppression) for processing.

Finally, in order to improve the accuracy of target positioning, the author of R-CNN trained a linear regression model, which can predict the pool5 data of the candidate area and get a more accurate Box position (for details, please refer to my other Article).

For more details, please refer to the author's paper: Rich feature hierarchies for accurate object detection and semantic segmentation Tech report (v5)

Summary
Now, although R-CNN has various shortcomings, it is the first algorithm to successfully apply deep learning to target detection.

Guess you like

Origin blog.csdn.net/weixin_41006390/article/details/105261770