Webface v2

When reposting, please credit the source: https://blog.csdn.net/Fire_Light_/article/details/79595687

Paper: A Lightened CNN for Deep Face Representation

Authors: CASIA

Overview

To obtain better accuracy, deep learning methods tend toward deeper networks and ensembles of multiple models, which leads to large models and long computation times. This paper presents a lightweight CNN that still achieves strong results: the network structure is simplified, and both time and space costs are optimized so that it can run on embedded and mobile devices.

MFM activation function

The activation function used here is called MFM (Max-Feature-Map), and its structure is very simple: the input feature maps of a convolutional layer are split into two halves, and at each position the larger of the two corresponding values is selected.

[Figure: the MFM operation, taking the element-wise maximum of two feature maps]

Written as a formula:

$$\hat{x}^k_{ij} = \max\left(x^k_{ij},\; x^{k+n}_{ij}\right), \qquad k = 1, \dots, n$$

The input to the MFM layer has 2n channels; for each k, the larger of channel k and channel k + n is taken as the k-th output channel, so the MFM output has only n channels. The gradient of this activation function is

$$\frac{\partial \hat{x}^k_{ij}}{\partial x^{k'}_{ij}} = \begin{cases} 1, & \text{if } x^{k'}_{ij} = \max\left(x^k_{ij},\; x^{k+n}_{ij}\right) \\ 0, & \text{otherwise} \end{cases} \qquad k' \in \{k,\; k+n\}$$

Thus half of the activations receive a gradient of 0: MFM yields sparse gradients, achieving the effect of updating only the weights that correspond to the winning activations. Comparing MFM with the ReLU activation function: ReLU produces sparse, high-dimensional features, whereas MFM produces compact features, performing feature selection and at the same time achieving a dimensionality-reduction effect.
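
To make the operation concrete, here is a minimal PyTorch sketch of the MFM activation as described above (the module name and tensor shapes are assumptions for illustration, not code from the paper):

```python
import torch
import torch.nn as nn

class MFM(nn.Module):
    """Max-Feature-Map: split the 2n input channels into two halves
    and keep the element-wise maximum, giving n output channels."""
    def forward(self, x):
        a, b = torch.chunk(x, 2, dim=1)  # two (batch, n, H, W) halves
        return torch.max(a, b)           # element-wise maximum

x = torch.randn(1, 96, 56, 56)  # 2n = 96 input channels (illustrative)
print(MFM()(x).shape)           # torch.Size([1, 48, 56, 56])
```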

Network architecture

Structure

[Figure: the network structure table]

The last layer of the network is a Softmax layer, which serves the classification objective during training; the output of the fc1 layer is the face feature.
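
In other words, the Softmax head is only needed for training; at test time the fc1 features of two faces can be compared directly, for example with cosine similarity. A minimal sketch, assuming model(...) returns the fc1 feature vector (the function name and threshold are illustrative, not from the paper):

```python
import torch.nn.functional as F

def same_person(model, img_a, img_b, threshold=0.5):
    # assumption: model(...) returns the fc1 feature vector, shape (batch, d)
    feat_a, feat_b = model(img_a), model(img_b)
    sim = F.cosine_similarity(feat_a, feat_b)  # similarity in [-1, 1]
    return sim > threshold  # same identity if the features are close
```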

Why doesn't the dimension in the table drop?

Because each convolutional layer consists of two separate, independently trained parts whose outputs are then fed into the MFM; that is, conv2_1 and conv2_2 run in parallel, as in the sketch below.
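
A hedged PyTorch sketch of such a block, with the two parallel convolutions written out explicitly (the class name and hyperparameters are assumptions for illustration; in practice a single convolution with 2n output channels followed by the MFM above is equivalent):

```python
import torch
import torch.nn as nn

class ParallelConvMFM(nn.Module):
    """Two independently trained convolutions merged by an
    element-wise maximum (the MFM operation)."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.conv2_1 = nn.Conv2d(in_ch, out_ch, kernel_size=k, padding=k // 2)
        self.conv2_2 = nn.Conv2d(in_ch, out_ch, kernel_size=k, padding=k // 2)

    def forward(self, x):
        return torch.max(self.conv2_1(x), self.conv2_2(x))
```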

Why do the convolutional layers have so few parameters?

It looks as though each convolution kernel shares the same weights across all of its input dimensions, unlike other convolutional networks (whether VGG or Inception, the kernel parameters differ for every input dimension). This would significantly reduce the parameter count.
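
A quick back-of-the-envelope comparison of the two schemes under the blogger's reading (the channel counts are illustrative, not taken from the paper):

```python
k, c_in, c_out = 3, 96, 96       # illustrative 3x3 conv, 96 -> 96 channels
standard = k * k * c_in * c_out  # separate weights per input channel
shared = k * k * c_out           # one 3x3 kernel reused across all inputs
print(standard, shared)          # 82944 864
```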

And the final results are still quite good.

Training

Training took two weeks on a GTX 980.

The training set is CASIA-WebFace (10K identities, 0.5M images).

Results

[Figure: results; panel (c) is the comparison against ReLU]

Panel (c) of the figure is the comparison with ReLU. The blogger has doubts about this: the CNN here uses two separate convolution parts in each layer to perform the MAXOUT, while the ReLU baseline corresponds to only one of those separate parts with ReLU applied, so the result may not be very meaningful, since the MFM network's parameter count has also been increased.

This table shows that the network is very lightweight:

[Table: model size and speed comparison]

LFW score

[Table: verification accuracy on LFW]

Origin: https://blog.csdn.net/weixin_39875161/article/details/91648243