SlimYOLOv3: Narrower, Faster and Better for Real-Time UAV Applications
This article explains how the SlimYOLOv3 paper prunes the YOLOv3 model; the typical pruning pipeline is shown in the figure below.
What is model pruning? As the word "narrower" in the title suggests, it means reducing the number of channels in the model: unimportant feature channels are removed from each convolutional layer, which requires a sound way to measure each channel's importance. In essence, L1 regularization is imposed on the channel scaling factors to achieve channel-level sparsity in the convolutional layers, and the channels with small scaling factors are then pruned away to obtain SlimYOLOv3.
The pipeline in the figure above works as follows: YOLOv3 is first trained with sparsity regularization, which yields a scaling factor for each channel; channels with small scaling factors are then removed. The resulting pruned model, SlimYOLOv3, is fine-tuned on the dataset and evaluated, and then the next round of sparsity training begins. This pruning process is repeated iteratively until the model meets a stopping condition, for example a target pruning ratio.
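The loop just described can be sketched as follows; every callable here is a placeholder for the actual training, pruning, and evaluation step, not the authors' code:

```python
def slim_pipeline(train_sparse, prune, fine_tune, evaluate, done):
    """Iterate sparsity training -> pruning -> fine-tuning until `done`.

    All five arguments are user-supplied callables (placeholders):
    train_sparse(model) trains with the L1 sparsity penalty,
    prune/fine_tune transform the model, evaluate returns metrics,
    and done(metrics) is the stopping condition (e.g. pruning ratio reached).
    """
    model = train_sparse(None)          # initial sparsity training
    while True:
        model = prune(model)            # remove low-scale channels
        model = fine_tune(model)        # recover accuracy
        metrics = evaluate(model)
        if done(metrics):
            return model
        model = train_sparse(model)     # next round of sparsity training
```

The model and metrics can be anything the callables agree on; the function only encodes the control flow of the figure.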
The authors also make a small change to YOLOv3 using a feature-fusion technique: they introduce a Spatial Pyramid Pooling (SPP) structure. The SPP module consists of four parallel max-pooling layers with kernel sizes 1×1, 5×5, 9×9 and 13×13. It extracts features with different receptive fields and then concatenates them along the channel dimension.
An SPP module is added between the fifth and sixth convolutional layers in front of each detection head (counting layers from the input toward the YOLO layer).
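A minimal NumPy sketch of such an SPP module, assuming a (C, H, W) feature map and stride-1, "same"-padded max pooling so all branches keep the spatial size (this is an illustration, not the authors' implementation):

```python
import numpy as np

def max_pool_same(x, k):
    """Max-pool a (C, H, W) tensor with kernel k, stride 1, 'same' padding."""
    c, h, w = x.shape
    p = k // 2
    padded = np.full((c, h + 2 * p, w + 2 * p), -np.inf)
    padded[:, p:p + h, p:p + w] = x
    out = np.empty_like(x)
    for i in range(h):
        for j in range(w):
            out[:, i, j] = padded[:, i:i + k, j:j + k].max(axis=(1, 2))
    return out

def spp(x, kernels=(1, 5, 9, 13)):
    """Concatenate the four max-pooled feature maps along the channel axis."""
    return np.concatenate([max_pool_same(x, k) for k in kernels], axis=0)

x = np.random.rand(4, 13, 13)
y = spp(x)
print(y.shape)  # (16, 13, 13): channel count is multiplied by len(kernels)
```

Note that the 1×1 branch is the identity, so the input features are always preserved alongside the pooled ones.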
Each detection head outputs an N × N × (3 × (4 + 1 + C)) tensor, where N × N is the feature-map size and C is the number of classes; each grid cell has 3 anchors, and each anchor predicts 4 box coordinates and 1 objectness score.
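As a quick sanity check of that output size, for COCO (C = 80) each head has 3 × (4 + 1 + 80) = 255 output channels:

```python
def head_channels(num_classes, anchors=3):
    # 4 box offsets + 1 objectness score + C class scores, per anchor
    return anchors * (4 + 1 + num_classes)

print(head_channels(80))  # 255
```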
Sparsity training
In the YOLOv3 network, every convolutional layer except those directly in front of the YOLO layers is followed by a BN layer, which normalizes its input as

  ẑ = (z − μ) / √(σ² + ε),  y = γ·ẑ + β   (1)

In Equation (1), μ and σ² are the mean and variance computed over a mini-batch, and γ and β are the trainable scale factor and bias. γ is used directly to measure the importance of a channel, with its importance encouraged toward sparsity by an L1 penalty. The sparsity-training objective is

  L = L_YOLOv3 + α Σ_{γ∈Γ} f(γ)   (2)

where f(γ) = |γ| is the L1 norm, Γ is the set of all BN scale factors, and α balances the two loss terms. Because the L1 penalty is non-smooth, the authors optimize it with the subgradient method, using α = 0.0001.
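Since f(γ) = |γ| is non-smooth at zero, its subgradient is simply α·sign(γ); a one-line sketch of the extra gradient term added to each BN scale factor during sparsity training:

```python
import numpy as np

ALPHA = 1e-4  # penalty weight alpha used in the paper

def sparsity_grad(gamma, alpha=ALPHA):
    """Subgradient of alpha * |gamma| w.r.t. the BN scale factors gamma."""
    return alpha * np.sign(gamma)

print(sparsity_grad(np.array([0.5, -0.02, 0.0, 1.3])))
```

During training this term would be added to the gradient of the detection loss for each γ before the optimizer step.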
Channel pruning:
After sparsity training, a global threshold determines which feature channels are pruned; it controls the overall percentage of channels removed across the whole network. In addition, a local threshold is introduced to keep any single convolutional layer from being pruned excessively, which would break the connectivity of the network.
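A sketch of that global/local thresholding on the BN scale factors (a simplified reading of the paper's procedure, not the authors' code; `local_keep`, the minimum fraction of channels each layer must keep, is an assumed name and value):

```python
import numpy as np

def prune_mask(gammas, global_ratio=0.5, local_keep=0.1):
    """Decide which channels to keep in each BN layer.

    gammas: list of 1-D arrays of |gamma| values, one array per layer.
    global_ratio: fraction of channels to remove network-wide.
    local_keep: minimum fraction of channels every layer must keep
                (the local safety threshold, assumed here).
    """
    all_g = np.sort(np.concatenate(gammas))
    global_thresh = all_g[int(global_ratio * len(all_g))]
    masks = []
    for g in gammas:
        # local threshold: the layer's top `local_keep` channels always survive
        top = np.sort(g)[::-1]
        local_thresh = top[max(int(np.ceil(local_keep * len(g))) - 1, 0)]
        masks.append(g >= min(global_thresh, local_thresh))
    return masks
```

Taking the minimum of the two thresholds is what protects a layer whose scale factors are all small: the local threshold wins and the layer keeps its strongest channels instead of being emptied.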
Fine-tuning
The pruned structure is fine-tuned on the dataset to recover the accuracy lost by removing channels.
Iterative pruning
Pruning is applied iteratively, in multiple conservative rounds, to prevent over-fitting.
Results and Discussions
As the figure shows, YOLOv3-SPP3 outperforms YOLOv3-SPP1, which indicates that multiple receptive fields extract multi-scale deep features more effectively.
Note: YOLOv3-SPP3 uses three SPP modules, one mounted between the fifth and sixth convolutional layers in front of each of the three detection heads.