AlexNet summary

This article is an interpretation and summary of the paper "ImageNet Classification with Deep Convolutional Neural Networks". The paper was published at NIPS 2012; its first author, Alex Krizhevsky, belonged to Hinton's group. Since its publication in 2012 it has accumulated more than 30,000 citations, making it a true classic. AlexNet took first place in ILSVRC 2012 (the ImageNet Large Scale Visual Recognition Challenge) with a top-5 test error rate of 15.3%, far ahead of the runner-up. Compared with traditional methods, AlexNet showed an enormous advantage, and its arrival pushed deep learning back into the spotlight.

This is not a translation of the original paper, but a summary of its main points. Many of the ideas and methods proposed in the paper are still in use today, and they served as a guideline for each of the classic neural networks born afterwards.

1. Network structure

The network diagram is now visible everywhere: 5 convolutional layers (with pooling), 2 fully connected layers, and 1 softmax layer.
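As a rough sketch of that layer stack (a single-GPU layout with channel counts following the common torchvision-style reproduction, not the paper's original two-GPU split), it can be written in PyTorch roughly as follows:

```python
import torch.nn as nn

# Rough single-GPU sketch of the AlexNet layer stack; channel counts follow
# the common reproduction, not the paper's exact two-GPU configuration.
alexnet = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2), nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),                  # overlapping pooling
    nn.Conv2d(64, 192, kernel_size=5, padding=2), nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(192, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Flatten(),
    nn.Dropout(0.5), nn.Linear(256 * 6 * 6, 4096), nn.ReLU(inplace=True),
    nn.Dropout(0.5), nn.Linear(4096, 4096), nn.ReLU(inplace=True),
    nn.Linear(4096, 1000),                                   # softmax is applied in the loss
)
```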

1.1 ReLU

AlexNet used ReLU in place of the traditional activation functions. ReLU has since been widely adopted in all kinds of CNN architectures and is described everywhere; for a comparison of common activation functions, see the blogger's other post: Comparison of Common Activation Functions.
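For a bit of intuition (just an illustrative snippet, not from the paper): the gradient of a saturating activation like tanh shrinks toward zero for large inputs, while ReLU keeps a constant gradient of 1 for any positive input, which is part of why it trains faster.

```python
import torch

x = torch.tensor([0.5, 3.0, 8.0], requires_grad=True)

# tanh saturates: its gradient 1 - tanh(x)^2 approaches 0 for large inputs
torch.tanh(x).sum().backward()
print(x.grad)      # approximately tensor([0.7864, 0.0099, 0.0000])

x.grad = None
# ReLU does not saturate for positive inputs: the gradient is exactly 1
torch.relu(x).sum().backward()
print(x.grad)      # tensor([1., 1., 1.])
```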

1.2 Multi-GPU parallel training

GPU memory limits the size of the network, so the network was split across two GPUs and trained in parallel. The GPUs AlexNet used at the time had only 3 GB of memory each, hence this strategy (of course, the two GPUs are not completely isolated; they need to communicate with each other at certain stages). With the development of hardware, this aspect of model training has improved greatly since then.
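The basic idea can be sketched as follows (a toy sketch only, assuming a machine with two GPUs; the device names are placeholders and this is not the paper's actual cross-GPU connectivity, which only lets certain layers communicate): put half of a layer's filters on each GPU and concatenate the results.

```python
import torch
import torch.nn as nn

# Toy sketch of splitting one conv layer's filters across two devices.
# "cuda:0"/"cuda:1" are placeholder device names.
conv_a = nn.Conv2d(3, 48, kernel_size=11, stride=4).to("cuda:0")
conv_b = nn.Conv2d(3, 48, kernel_size=11, stride=4).to("cuda:1")

def split_forward(x):
    ya = conv_a(x.to("cuda:0"))                       # half the feature maps on GPU 0
    yb = conv_b(x.to("cuda:1"))                       # the other half on GPU 1
    return torch.cat([ya, yb.to("cuda:0")], dim=1)    # gather onto one device
```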

1.3 Local response normalization

ReLU has the nice property that it does not require input normalization to prevent saturation: as long as some inputs to a neuron are positive, that neuron can learn. Even so, local normalization helps generalization. The specific normalization formula is given in the paper and will not be the focus here. The authors note in the paper that with this normalization, the top-1 and top-5 error rates both decreased.
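For reference, this normalization across adjacent channels has a ready-made PyTorch counterpart; the hyperparameters below are the ones reported in the paper (n=5, k=2, α=1e-4, β=0.75), though PyTorch averages the squared sum over the window, so its `alpha` is not numerically identical to the paper's α and this is only a sketch.

```python
import torch
import torch.nn as nn

# Local response normalization across adjacent channels (sketch).
lrn = nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0)

x = torch.randn(1, 96, 55, 55)   # e.g. the output of the first conv layer
y = lrn(x)                        # same shape, responses normalized channel-wise
print(y.shape)                    # torch.Size([1, 96, 55, 55])
```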

1.4 Overlapping pooling

Pooling windows generally do not overlap, but the pooling AlexNet used does: the stride of each pooling step is smaller than the side length of the pooling window. AlexNet's pooling window is a 3 × 3 square and the stride is 2, so adjacent windows overlap. This scheme also reduced the top-1 and top-5 error rates, while helping somewhat against overfitting.
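Concretely, overlapping pooling just means the window size is larger than the stride; in PyTorch terms (an illustrative snippet):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 96, 55, 55)

overlap    = nn.MaxPool2d(kernel_size=3, stride=2)   # AlexNet: 3x3 window, stride 2
no_overlap = nn.MaxPool2d(kernel_size=2, stride=2)   # conventional non-overlapping pooling

# Both happen to give 27x27 feature maps here; the difference is that
# the 3x3 windows overlap by one pixel at each step.
print(overlap(x).shape)     # torch.Size([1, 96, 27, 27])
print(no_overlap(x).shape)  # torch.Size([1, 96, 27, 27])
```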

2. Reducing overfitting

AlexNet has about 60 million parameters, so it was necessary to deal with the problem of overfitting.

2.1 Data augmentation

Alex used two forms of data augmentation. The first is image cropping and horizontal mirroring: 224 × 224 patches are randomly extracted from the 256 × 256 images and randomly flipped horizontally, which increases the diversity of the data set. Because the extracted training crops are 224 × 224, the test data has to be handled slightly differently: from each test image the four corner patches and the center patch of size 224 × 224 are extracted, together with their horizontal mirrors, for a total of 10 patches; the predictions on these 10 inputs are then averaged to give the final test result.
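Using torchvision's transforms (a sketch of the idea, not the paper's original pipeline), the training-time cropping/mirroring and the test-time ten-crop averaging look roughly like this:

```python
import torch
from torchvision import transforms

# Training: random 224x224 crop from a 256x256 image plus a random horizontal flip.
train_tf = transforms.Compose([
    transforms.RandomCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

# Test: four corner crops + center crop and their mirrors = 10 patches per image.
test_tf = transforms.Compose([
    transforms.TenCrop(224),
    transforms.Lambda(lambda crops: torch.stack([transforms.ToTensor()(c) for c in crops])),
])

# At test time the 10 predictions are averaged, e.g.:
# logits = model(patches.view(-1, 3, 224, 224))   # (10, num_classes)
# final  = logits.softmax(dim=1).mean(dim=0)      # average over the 10 patches
```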
The second form of augmentation alters the intensities of the RGB channels of the training images: PCA is performed on the set of all RGB pixel values of the ImageNet training images, and to each training image, multiples of the found principal components are added in certain proportions. Alex mentions that this method effectively reduces the top-1 error rate.
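This color-jitter step (often called "fancy PCA") can be sketched with NumPy as follows (illustrative only; `eigvals` and `eigvecs` would come from a PCA over all RGB pixels of the training set, which is not shown here):

```python
import numpy as np

def fancy_pca(image, eigvals, eigvecs, sigma=0.1):
    """Add multiples of the RGB principal components to every pixel.

    image:   HxWx3 float array
    eigvals: (3,) eigenvalues of the RGB pixel covariance over the training set
    eigvecs: (3,3) matrix whose columns are the corresponding eigenvectors
    """
    alpha = np.random.normal(0.0, sigma, size=3)   # drawn once per training image
    shift = eigvecs @ (alpha * eigvals)            # (3,) RGB offset
    return image + shift                           # broadcast over all pixels
```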

2.2 Dropout

Dropout is now widely used in all kinds of CNN networks and is truly a classic; it is an efficient way to approximate model ensembling. Dropout means that during the training of a deep network, neural units are temporarily dropped from the network with a certain probability. Note the word temporarily: for stochastic gradient descent, because the units are dropped at random, each mini-batch effectively trains a different network. Regarding the choice of the dropout ratio and the explanation of why it works, there are dedicated research papers; the blogger will cover these separately in a future post, so stay tuned!
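A minimal sketch of what dropout does during training (the probability 0.5 is the value AlexNet used in its fully connected layers; this is the "inverted dropout" variant used in modern frameworks, whereas the paper instead scales activations at test time):

```python
import torch

def dropout(x, p=0.5, training=True):
    """Randomly zero each unit with probability p (inverted-dropout scaling)."""
    if not training or p == 0.0:
        return x
    mask = (torch.rand_like(x) > p).float()   # a fresh random mask for every mini-batch
    return x * mask / (1.0 - p)               # rescale so the expected value is unchanged
```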

There are other training details and tricks in AlexNet, but they are fairly common by now and are not repeated here. Alex also points out in the paper that depth is very important: removing any single layer of AlexNet reduces performance. So the well-known network architectures born after AlexNet all put effort into depth; more network interpretations to come, so stay tuned!

Original: https://blog.csdn.net/huangfei711/article/details/80421705


Origin blog.csdn.net/qq_36697196/article/details/91944755