The basic structure of the Alex-Net network is almost the same as that of LeNet more than ten years ago. However, due to the earth-shaking development of data and hardware devices (especially GPU), the deep neural network is no longer a "party trick". It has become a practical and feasible tool and application means. Due to the emergence of Alex-Net, the eyes of the world have returned to the neural network.
Figure Alex-Net network structure
The Alex-Net network contains 5 convolutional layers and 2 fully connected layers. Inspired by Alex-Net, the author of R-CNN (Regions with CNN features) tried to generalize the ability of Alex-Net's target recognition on the ImageNet dataset to the PASCAL VOC dataset for target detection (now called migration learning) .
Two problems to be solved by R-CNN
1. How to use the convolutional network for target positioning;
2. How to train a network model with good performance on a small data set.
Figure R-CNN target detection process
R-CNN training process
1. The Alex-net network uses the ImageNet dataset for pre-train (“image classification”);
2. Uses the candidate regions extracted by SS for fine-tune (“target detection”).
R-CNN target detection process
1. Read an input image;
2. Use SS (Selective Search) to extract about 2000 bottom-up (from details to the whole) candidate regions (Region Proposals);
3. The candidate regions (Need to be scaled to 227x227 to make it compatible with Alex-Net) Input the Alex-net network separately, and use the fc7 layer output of Alex-net as the feature; 4. Input the output of the fc7 layer
into the SVM for classification, and use NMS for different types of collections (non-maximal suppression) for processing.
Finally, in order to improve the accuracy of target positioning, the author of R-CNN trained a linear regression model, which can predict the pool5 data of the candidate area and get a more accurate Box position (for details, please refer to my other Article).
For more details, please refer to the author's paper: Rich feature hierarchies for accurate object detection and semantic segmentation Tech report (v5)
Summary
Now, although R-CNN has various shortcomings, it is the first algorithm to successfully apply deep learning to target detection.