Introduction to several classic convolutional neural networks (classification and regression)

1. Classical neural networks include: AlexNet proposed in 2012 and VGGNet proposed in 2014. The structure diagrams are as follows:

2. Classification and regression:

(1) Classification: after a series of convolutional and pooling layers, the fully connected layer produces a score for each class, and the sample is then classified with a softmax classifier;

(2) Regression: it amounts to framing the object to be recognized with a rectangular box, i.e., localization;

as follows:

Here, the regression uses a fitting approach: the ground-truth position of the object in the input is given as (x, y, w, h), and the network output (x', y', w', h') is trained to fit that position correctly; the loss is the Euclidean distance between the two:

So in the end, the fully connected layers have two jobs to do, namely classification and localization:
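To make the two-head idea concrete, here is a minimal PyTorch sketch (the tiny backbone, layer sizes, and the name TwoHeadNet are illustrative assumptions, not from the original post): a shared convolutional backbone feeds a classification head trained with cross-entropy and a box-regression head trained with a squared-error stand-in for the Euclidean-distance loss described above.

```python
import torch
import torch.nn as nn

class TwoHeadNet(nn.Module):
    """Shared conv backbone with a classification head and a box-regression head."""
    def __init__(self, num_classes=10):
        super().__init__()
        # Tiny stand-in backbone; in practice this would be VGG/AlexNet features.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(7),
        )
        self.flatten = nn.Flatten()
        self.cls_head = nn.Linear(64 * 7 * 7, num_classes)  # class scores -> softmax
        self.reg_head = nn.Linear(64 * 7 * 7, 4)             # (x, y, w, h)

    def forward(self, x):
        feats = self.flatten(self.backbone(x))
        return self.cls_head(feats), self.reg_head(feats)

model = TwoHeadNet()
images = torch.randn(8, 3, 224, 224)          # dummy batch for illustration
labels = torch.randint(0, 10, (8,))
boxes = torch.rand(8, 4)

scores, pred_boxes = model(images)
cls_loss = nn.CrossEntropyLoss()(scores, labels)   # classification loss
reg_loss = nn.MSELoss()(pred_boxes, boxes)         # squared distance between boxes
loss = cls_loss + reg_loss
loss.backward()
```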

Of course, with this regression approach we can also do other things, such as pose estimation: regress n key points on the object and connect them, and the object's pose or behavior can be judged from the resulting shape:
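A minimal sketch of that pose-estimation variant (the head name and the number of key points are hypothetical): only the regression target changes, from 4 box coordinates to 2n keypoint coordinates.

```python
import torch
import torch.nn as nn

NUM_KEYPOINTS = 14  # illustrative number of body joints

# Same idea as the box-regression head above, but now predicting (x, y) per keypoint.
keypoint_head = nn.Linear(64 * 7 * 7, NUM_KEYPOINTS * 2)

feats = torch.randn(8, 64 * 7 * 7)                  # features from the shared backbone
pred = keypoint_head(feats).view(8, NUM_KEYPOINTS, 2)
target = torch.rand(8, NUM_KEYPOINTS, 2)            # ground-truth joint locations
loss = nn.MSELoss()(pred, target)                   # same distance-based regression loss
```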

In the process of finding the location, the most commonly used technique is the Sliding Window: for a given input, a fixed-size window (say 3*3) is slid across the image, and at each position a trained network such as VGGNet computes the probability that the window contains a "cat". After all positions have been covered, the window with the highest probability is taken as the final location:

It should also be noted that when using the sliding window, the input image should first be rescaled: the original image is enlarged and reduced into a series of images of different sizes, and the fixed-size window is then slid over each of them. This makes it possible to detect objects that appear at different scales. A sketch of this is given below.
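A rough illustration of multi-scale sliding-window search (a sketch only; the classifier, scales, window size, and stride are assumptions, not values from the post):

```python
import torch
import torch.nn.functional as F

def sliding_window_search(image, classifier, cat_class=0,
                          scales=(0.5, 1.0, 2.0), win=64, stride=32):
    """Slide a fixed-size window over several rescaled copies of the image
    and return the window with the highest 'cat' probability."""
    best = (None, -1.0)  # (location, probability)
    for s in scales:
        scaled = F.interpolate(image, scale_factor=s, mode="bilinear",
                               align_corners=False)
        _, _, h, w = scaled.shape
        for y in range(0, h - win + 1, stride):
            for x in range(0, w - win + 1, stride):
                crop = scaled[:, :, y:y + win, x:x + win]
                prob = classifier(crop).softmax(dim=1)[0, cat_class].item()
                if prob > best[1]:
                    best = ((s, x, y, win), prob)
    return best
```

Every window position requires a forward pass through the CNN, which is exactly the cost problem discussed next.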

Of course, this is very laborious: the sliding window is usually small, each input may require many window positions, and the CNN has to be run once per position, which is very expensive. A simpler approach is Region Proposals: candidate boxes are drawn around the potentially useful objects in the input image using the selective search algorithm, which merges neighboring regions whose pixel colors or textures are similar into larger boxes; the resulting boxes are the candidate locations:
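OpenCV's contrib package ships a selective search implementation; a minimal usage sketch (assuming opencv-contrib-python is installed, and with a hypothetical input file name):

```python
import cv2

# Selective search from opencv-contrib-python (cv2.ximgproc).
img = cv2.imread("test.jpg")                         # hypothetical input image
ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
ss.setBaseImage(img)
ss.switchToSelectiveSearchFast()                     # fast mode; a quality mode also exists
rects = ss.process()                                 # candidate boxes as (x, y, w, h)
print(f"{len(rects)} region proposals, first few: {rects[:3]}")
```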

The regression head (regression module) is generally placed after the last convolutional layer (as in OverFeat and VGG) or after the fully connected layers (as in DeepPose and R-CNN).

 

OK, now let's talk about the training process of a convolutional neural network:

(1) Select a model (classic models include VGGNet, AlexNet, and GoogLeNet; VGG is the most commonly used):

        Here you do not need to initialize parameters such as weights and biases from scratch. Instead, use fine-tuning: take a model that someone else has already trained, reuse its parameters to initialize your own network, and then fine-tune it yourself. (Sometimes the other model's task differs from ours; for example, it may classify 1000 categories while we only need 10. In that case only the fully connected layer needs to be modified, and the earlier layers can stay unchanged; see the sketch after this list.)

(2) Add regression module;

(3) Train the network;

(4) Testing;
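As a concrete illustration of the fine-tuning in step (1), here is a minimal sketch with torchvision's pretrained VGG16 (assuming a recent torchvision; the 10-class target is the example from above): only the last fully connected layer is replaced and retrained.

```python
import torch.nn as nn
import torchvision.models as models

# Load a VGG16 pretrained on ImageNet (1000 classes) and reuse its parameters.
model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)

# Optionally freeze the convolutional features so only the new layer is trained.
for p in model.features.parameters():
    p.requires_grad = False

# Replace only the last fully connected layer: 1000 classes -> 10 classes.
model.classifier[6] = nn.Linear(model.classifier[6].in_features, 10)
```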

The process is as follows:

Now let's introduce R-CNN and its training process. It is similar to the CNN process above, except that in the last step the extracted features are first cached to disk and the regression is performed afterwards:

It can be seen from the above that R-CNN has two disadvantages: (1) the CNN is run once for each region proposal;

(2) the process is cumbersome: the features are first cached to disk and then read back for the later steps.

Then comes Fast R-CNN: it shares the convolution computation, so the CNN is run only once per image and all regions reuse the resulting feature map; no caching to disk is required:
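The convolution-sharing idea can be sketched with torchvision's RoI pooling op (a sketch under assumed shapes; the backbone choice and the proposal boxes are placeholders): the backbone runs once, and every proposal is pooled from the same shared feature map.

```python
import torch
import torchvision.ops as ops
import torchvision.models as models

backbone = models.vgg16(weights=None).features      # run ONCE per image
image = torch.randn(1, 3, 512, 512)
feature_map = backbone(image)                       # shared feature map: (1, 512, 16, 16)

# Region proposals in image coordinates: (batch_index, x1, y1, x2, y2).
proposals = torch.tensor([[0,  50.,  60., 200., 220.],
                          [0, 300., 100., 480., 400.]])

# Pool a fixed-size feature for every proposal from the shared map.
# spatial_scale maps image coords to feature-map coords (VGG16 downsamples by 32).
rois = ops.roi_pool(feature_map, proposals, output_size=(7, 7), spatial_scale=1.0 / 32)
print(rois.shape)  # (num_proposals, 512, 7, 7)
```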

Faster R-CNN: selective search is integrated into the network itself by adding a Region Proposal Network (RPN) layer. Candidate boxes are no longer searched for in the original input image but are generated on the convolutional feature map, which improves speed:
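torchvision ships a ready-made Faster R-CNN with an RPN; a minimal inference sketch (assuming a recent torchvision; the input tensor is random data used only for illustration):

```python
import torch
from torchvision.models.detection import (
    fasterrcnn_resnet50_fpn, FasterRCNN_ResNet50_FPN_Weights,
)

# Faster R-CNN = backbone + RPN (region proposal network) + RoI heads, end to end.
model = fasterrcnn_resnet50_fpn(weights=FasterRCNN_ResNet50_FPN_Weights.DEFAULT)
model.eval()

images = [torch.rand(3, 480, 640)]                  # list of CHW tensors in [0, 1]
with torch.no_grad():
    outputs = model(images)
print(outputs[0]["boxes"].shape, outputs[0]["labels"][:5], outputs[0]["scores"][:5])
```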

A comparison of R-CNN, Fast R-CNN, and Faster R-CNN is as follows:
