Please cite the source: https://blog.csdn.net/Fire_Light_/article/details/79595687
Paper: A Lightened CNN for Deep Face Representation
Authors: CASIA
Outline
To improve accuracy, deep-learning methods tend toward deeper networks and ensembles of multiple models, which leads to large models and long computation times. This paper presents a lightweight CNN that achieves strong results while simplifying the network structure; both runtime and memory cost are optimized so that the model can run on embedded and mobile devices.
MFM activation function
The paper uses an activation function called MFM (Max-Feature-Map), whose structure is very simple: the channels of the input convolution layer are split into two halves, and at each position the larger of the two values is selected.
Written as a formula: if the input convolution layer has 2n channels, the k-th output channel takes the larger of the k-th and (k+n)-th input channels,

f_k = max(x_k, x_{k+n}), k = 1, …, n,

so the MFM output has n channels. The gradient of this activation is 1 for the element that wins the max and 0 for the other.
Thus half of the activations have zero gradient, so MFM produces a sparse gradient and updates only the weights corresponding to the winning elements. Comparing MFM with the ReLU activation: ReLU yields sparse, high-dimensional features, while MFM yields compact features, performs feature selection, and also achieves a dimensionality-reduction effect.
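The channel-split-and-max operation described above can be sketched in a few lines of numpy; the toy input shape below is made up for illustration and is not taken from the paper:

```python
import numpy as np

def mfm(x):
    """Max-Feature-Map: split the channel axis in half and take the
    elementwise max. Input shape (2n, H, W); output shape (n, H, W)."""
    n = x.shape[0] // 2
    return np.maximum(x[:n], x[n:])

# Toy input: 4 channels of 2x2 feature maps -> 2 output channels
x = np.arange(16, dtype=float).reshape(4, 2, 2)
y = mfm(x)
print(y.shape)  # (2, 2, 2)
```

Because each output element copies exactly one of the two competing inputs, backpropagation routes the gradient only to the winner, which is the sparse-gradient effect the paper describes.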
Network architecture
The last layer of the network is a softmax layer for classification; the fc1 features are the face representation.
Why doesn't the dimension drop in the table?
Each convolution layer consists of two independent parts that are trained independently and then fed into MFM; that is, conv2_1 and conv2_2 run in parallel.
Why do the convolution layers have so few parameters?
It looks as if each convolution kernel shares the same weights across all of its input channels, unlike other convolutional networks (in both VGG and Inception, the kernel weights for each input channel are different). This would significantly reduce the parameter count.
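The blogger's conjecture above (one 2-D kernel per output map, shared across all input channels) can be illustrated with a quick parameter count; the layer sizes below are hypothetical, not taken from the paper, and biases are ignored:

```python
def conv_params_standard(c_in, c_out, k):
    """Standard convolution: one k x k kernel per (input, output) channel pair."""
    return c_in * c_out * k * k

def conv_params_shared(c_in, c_out, k):
    """Conjectured sharing: one k x k kernel per output map, reused across
    all input channels, so c_in drops out of the count."""
    return c_out * k * k

# Hypothetical layer: 48 input channels, 96 output channels, 3x3 kernels
print(conv_params_standard(48, 96, 3))  # 41472
print(conv_params_shared(48, 96, 3))    # 864, i.e. 48x fewer
```

Under this reading the saving factor equals the number of input channels, which would explain the unusually small counts in the table if the conjecture is right.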
And the final results are still good.
Training
Trained for two weeks on a GTX 980.
The training set is CASIA-WebFace (10K identities, 0.5M images).
Results
Figure (c) compares MFM with ReLU. The blogger has doubts here: the CNN uses two independent convolution parts per layer for the maxout, while the ReLU baseline uses only one of those parts, so the comparison may not be very meaningful, because the maxout network's parameter count has also been increased.
This table shows that the network is very lightweight:
LFW score