Convolutional Neural Networks Based on Machine Learning: Several Simple Training Techniques

Here are a few simple training techniques:

1. First, the concept of the filter (receptive field): the receptive field of a unit in a feature map is the region of the original input image from which that unit's feature is extracted;

As shown in the figure above, in the feature map obtained after the first convolution, the receptive field of each unit is 3*3; after the second convolution, the receptive field of each unit is 5*5; and so on. With three layers of 3*3 filters, the receptive field of each unit in the last feature map is 7*7.

From this we can infer that the more convolutional layers there are, the larger the receptive field of each unit in the feature map, and the better the feature extraction.
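To make the growth concrete, here is a minimal Python sketch (the function name receptive_field is just illustrative) that reproduces the 3*3 -> 5*5 -> 7*7 progression for stride-1 convolutions:

```python
# Minimal sketch: receptive field of stacked 3*3 convolutions (stride 1, no dilation).
# Each extra 3*3 layer grows the receptive field by 2 pixels: 3 -> 5 -> 7 -> ...
def receptive_field(num_layers, kernel_size=3, stride=1):
    rf = 1
    jump = 1  # distance between adjacent units, measured in input pixels
    for _ in range(num_layers):
        rf += (kernel_size - 1) * jump
        jump *= stride
    return rf

for n in range(1, 4):
    size = receptive_field(n)
    print(f"{n} layer(s) of 3*3: receptive field = {size}*{size}")
# 1 layer(s) of 3*3: receptive field = 3*3
# 2 layer(s) of 3*3: receptive field = 5*5
# 3 layer(s) of 3*3: receptive field = 7*7
```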

This raises a question: can we use one layer of 7*7 filters instead of three layers of 3*3 filters?

From the above, a 7*7 filter requires 49C*C parameters, while three layers of 3*3 filters require only 27C*C. On the one hand, fewer parameters means lower computational cost; on the other hand, the three-layer 3*3 stack passes through three activation functions, which gives the classifier better nonlinearity. So we choose three layers of 3*3 filters;
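A small PyTorch sketch of this comparison (C = 64 is an arbitrary channel count chosen only for illustration); it reproduces the 49C*C vs 27C*C counts:

```python
import torch.nn as nn

C = 64  # assumed channel count, for illustration only

# One 7*7 convolution: 7*7*C*C weights
single_7x7 = nn.Conv2d(C, C, kernel_size=7, padding=3, bias=False)

# Three stacked 3*3 convolutions: 3 * (3*3*C*C) weights, with a ReLU after each
stacked_3x3 = nn.Sequential(
    nn.Conv2d(C, C, kernel_size=3, padding=1, bias=False), nn.ReLU(inplace=True),
    nn.Conv2d(C, C, kernel_size=3, padding=1, bias=False), nn.ReLU(inplace=True),
    nn.Conv2d(C, C, kernel_size=3, padding=1, bias=False), nn.ReLU(inplace=True),
)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(single_7x7))   # 49*C*C = 200704
print(count(stacked_3x3))  # 27*C*C = 110592
```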

To get both of these effects, we can also use the deep staggered network that has shone this year: a 1*1 filter first reduces the channels from C to C/2, a 3*3 filter operates at C/2, and another 1*1 filter restores the channels to C. The parameter counts compare as follows:

Deep staggered network: 1*1*C*(C/2) + 3*3*(C/2)*(C/2) + 1*1*(C/2)*C = 3.25C*C;

Plain 3*3 filter: 3*3*C*C = 9C*C;

Fewer parameters and better nonlinearity; and as the number of layers increases, the number of parameters stays roughly the same rather than growing!
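A minimal PyTorch sketch of the block described by the formula above (again C = 64 is an assumed channel count for illustration); it reproduces the 3.25C*C vs 9C*C comparison:

```python
import torch.nn as nn

C = 64  # assumed channel count, for illustration only

# "Staggered" block: 1*1 reduces C -> C/2, 3*3 works at C/2, 1*1 restores C/2 -> C
staggered = nn.Sequential(
    nn.Conv2d(C, C // 2, kernel_size=1, bias=False), nn.ReLU(inplace=True),
    nn.Conv2d(C // 2, C // 2, kernel_size=3, padding=1, bias=False), nn.ReLU(inplace=True),
    nn.Conv2d(C // 2, C, kernel_size=1, bias=False), nn.ReLU(inplace=True),
)

# Plain 3*3 convolution at full width C
plain = nn.Conv2d(C, C, kernel_size=3, padding=1, bias=False)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(staggered))  # 3.25*C*C = 13312
print(count(plain))      # 9*C*C   = 36864
```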

To sum up: (1) stack small filters;

(2) combine 3*3 filters and 1*1 filters into a deep staggered network;

(3) split an N*N filter into a 1*N filter and an N*1 filter (a sketch follows below);
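For item (3), here is a minimal sketch of the factorized filter (N = 7 and C = 64 are assumed values for illustration); splitting reduces the weight count from N*N*C*C to 2*N*C*C:

```python
import torch.nn as nn

C, N = 64, 7  # assumed channel count and kernel size, for illustration only

# Full N*N filter: N*N*C*C parameters
full = nn.Conv2d(C, C, kernel_size=N, padding=N // 2, bias=False)

# Factorized into 1*N followed by N*1: 2*N*C*C parameters
factorized = nn.Sequential(
    nn.Conv2d(C, C, kernel_size=(1, N), padding=(0, N // 2), bias=False),
    nn.Conv2d(C, C, kernel_size=(N, 1), padding=(N // 2, 0), bias=False),
)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(full))        # N*N*C*C = 200704
print(count(factorized))  # 2*N*C*C = 57344
```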

2. Training technique: data preprocessing. Preprocessing can multiply the amount of input data, and for algorithms such as deep learning that require a large amount of training data, it is very important!

The preprocessing methods are roughly as follows: (1) horizontal flip;

(2) random crop / size transformation: take different crops at different scales;

(3) translation, rotation, stretching, shearing, etc.

For example, the following picture shows the effect of a horizontal flip:

One thing to note here: for each original input image, these operations are applied with randomly sampled parameters;
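A minimal torchvision sketch of such a pipeline (the crop size 224 and the ranges for scale, rotation, translation, and shear are arbitrary values chosen for illustration); each transform re-samples its random parameters every time an image is loaded:

```python
from torchvision import transforms

# Augmentation pipeline covering the operations listed above
train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),                                # (1) horizontal flip
    transforms.RandomResizedCrop(224, scale=(0.5, 1.0)),                   # (2) random crop at different scales
    transforms.RandomAffine(degrees=15, translate=(0.1, 0.1), shear=10),   # (3) rotation / translation / shear
    transforms.ToTensor(),
])
```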

3. Training technique: transfer learning (fine-tuning). Even after data augmentation, the amount of data may still be insufficient. In that case we take a network trained by someone else, reuse its learned parameters, and train our own network on that basis;

As shown in the figure above: if the available training set is small, only the FC layer is retrained; if it is a medium-sized dataset, the FC layer plus the last conv and pooling layers are retrained as well.
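A minimal fine-tuning sketch (torchvision's ResNet-18 is an arbitrary pretrained backbone chosen for illustration, and num_classes = 10 is an assumption; the exact layers to unfreeze depend on the architecture):

```python
import torch.nn as nn
from torchvision import models

# Load a network pretrained by someone else (ImageNet weights)
model = models.resnet18(weights="IMAGENET1K_V1")

# Small dataset: freeze everything and retrain only a new FC layer
for param in model.parameters():
    param.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 10)  # new layer is trainable by default

# Medium dataset: additionally unfreeze the last conv stage (layer4 in ResNet-18)
for param in model.layer4.parameters():
    param.requires_grad = True
```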
