Object detection test of TensorFlow MobileNetV2

  Google recently released MobileNetV2, the next generation of its mobile vision model family. It improves significantly on MobileNetV1 and advances mobile visual recognition across classification, object detection, and semantic segmentation. MobileNetV2 is released as part of the TensorFlow-Slim image classification library, has been integrated into the object detection package, and, importantly, ships with pre-trained models.

1. Model principle

  As the title of the paper, "MobileNetV2: Inverted Residuals and Linear Bottlenecks", suggests, MobileNetV2 builds on the ideas of MobileNetV1 with two major differences. First, it is based on an inverted residual structure, where the shortcut connections of the residual block run between thin bottleneck layers, in contrast to traditional residual models that use expanded representations at the input; MobileNetV2 uses lightweight depthwise convolutions to filter features in the intermediate expansion layer. Second, to maintain representational power, the nonlinearities in the narrow bottleneck layers are removed. This improves performance and also provides the intuition behind the design: it separates the input/output domains from the expressiveness of the transformation, giving a convenient framework for further analysis.

(1) Inverted residual structure


The left diagram (a standard residual block) can be understood as follows: a 1x1 convolution first reduces the number of channels, followed by ReLU; then a 3x3 spatial convolution with ReLU; then a 1x1 convolution restores the channels, and the result is added to the input. The 1x1 channel reduction exists to cut the computational cost; otherwise the 3x3 spatial convolution in the middle would be too expensive. The residual block is therefore hourglass-shaped: wide at both ends and narrow in the middle.
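A quick multiply-add count illustrates why the 1x1 channel reduction matters. The feature-map size and channel counts below are illustrative, not taken from the paper:

```python
def conv_macs(h, w, c_in, c_out, k):
    """Multiply-adds for a k x k convolution at stride 1 with 'same' padding."""
    return h * w * c_in * c_out * k * k

H = W = 56  # illustrative feature-map size

# Hourglass residual block: 1x1 reduce (256 -> 64), 3x3 at 64 channels,
# 1x1 restore (64 -> 256).
bottleneck = (conv_macs(H, W, 256, 64, 1)
              + conv_macs(H, W, 64, 64, 3)
              + conv_macs(H, W, 64, 256, 1))

# The same 3x3 convolution applied directly at 256 channels would cost:
direct = conv_macs(H, W, 256, 256, 3)

print(direct / bottleneck)  # the 1x1 reduction saves roughly 8.5x
```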
In the right diagram (the inverted residual block), the middle 3x3 convolution becomes a depthwise convolution, whose cost is very low, so the channel count there can be larger for a better result. A 1x1 convolution first increases the number of channels; then a depthwise 3x3 spatial convolution is applied; then a 1x1 convolution reduces the dimensionality again. Since the channel counts at both ends are small, the 1x1 expansion and projection convolutions are cheap; and although the middle has many channels, the depthwise convolution keeps its cost low. This is called the inverted residual block: narrow at both ends and wide in the middle, achieving better performance for a small amount of computation.
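As a sketch, the inverted residual block described above can be written with tf.keras layers. The expansion factor of 6 and the use of ReLU6 follow the paper; the input size and channel counts here are illustrative:

```python
import tensorflow as tf

def inverted_residual_block(x, expansion=6, out_channels=None, stride=1):
    """Sketch of a MobileNetV2-style inverted residual block:
    1x1 expand -> depthwise 3x3 -> 1x1 linear projection,
    with a shortcut when input and output shapes match."""
    in_channels = x.shape[-1]
    out_channels = out_channels or in_channels

    # 1x1 convolution expands the narrow input to a wide representation.
    h = tf.keras.layers.Conv2D(expansion * in_channels, 1,
                               padding="same", use_bias=False)(x)
    h = tf.keras.layers.BatchNormalization()(h)
    h = tf.keras.layers.ReLU(max_value=6.0)(h)  # ReLU6, as in the paper

    # Cheap depthwise 3x3 spatial convolution on the wide representation.
    h = tf.keras.layers.DepthwiseConv2D(3, strides=stride,
                                        padding="same", use_bias=False)(h)
    h = tf.keras.layers.BatchNormalization()(h)
    h = tf.keras.layers.ReLU(max_value=6.0)(h)

    # 1x1 linear projection back to a narrow bottleneck
    # (no ReLU here: the "linear bottleneck" idea).
    h = tf.keras.layers.Conv2D(out_channels, 1,
                               padding="same", use_bias=False)(h)
    h = tf.keras.layers.BatchNormalization()(h)

    # Residual shortcut only when the shapes match.
    if stride == 1 and in_channels == out_channels:
        h = tf.keras.layers.Add()([x, h])
    return h

inputs = tf.keras.Input(shape=(32, 32, 16))
outputs = inverted_residual_block(inputs)
model = tf.keras.Model(inputs, outputs)
print(model.output_shape)  # (None, 32, 32, 16)
```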

The following figure compares its architecture with MobileNetV1 and other models:



2. Model experiment

  Since the classification experiment is essentially subsumed by object detection, only detection is analyzed here:

(1) On the test images bundled with the object_detection package, the detection results are as follows; it can be seen that some objects are still missed.
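Running the actual model requires downloading a pre-trained checkpoint, so as a small, self-contained piece of that pipeline, the sketch below shows the post-processing step: the detection API returns normalized `[ymin, xmin, ymax, xmax]` boxes with scores and class IDs, which are thresholded and converted to pixel coordinates before drawing. The raw outputs here are made up for illustration:

```python
import numpy as np

def filter_detections(boxes, scores, classes, image_size, min_score=0.5):
    """Keep detections above a score threshold and convert normalized
    [ymin, xmin, ymax, xmax] boxes to pixel coordinates."""
    height, width = image_size
    keep = scores >= min_score
    pixel_boxes = boxes[keep] * np.array([height, width, height, width])
    return pixel_boxes.astype(int), scores[keep], classes[keep]

# Made-up raw outputs in the shape the detection API returns them.
boxes = np.array([[0.1, 0.2, 0.5, 0.6],    # a confident detection
                  [0.0, 0.0, 0.9, 0.9]])   # a low-score detection
scores = np.array([0.92, 0.31])
classes = np.array([1, 18])  # e.g. 1 = person, 18 = dog in the COCO label map

px_boxes, px_scores, px_classes = filter_detections(
    boxes, scores, classes, image_size=(480, 640))
print(px_boxes)   # [[ 48 128 240 384]]
print(px_classes) # [1]
```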


(2) For comparison with the earlier MobileNetV1 experiment, see the blog post: Compiling and testing the object detection package in the TensorFlow models repository.

Photo source: https://worldtravelholics.files.wordpress.com/2014/07/img_4720.jpg

The following image shows the detection results of MobileNetV2:

The following image shows the detection results of MobileNetV1:


It can be seen that MobileNetV2's detection results are better than those of the v1 version.
