Learning the YOLO Series: YOLO v2


yolo_v2 was published at CVPR 2017. Building on v1, the algorithm was drastically improved, absorbing the strengths of many earlier works. The v2 paper is also by far the densest with practical content of all the YOLO series papers.

Paper title: "YOLO9000: Better, Faster, Stronger"
Paper address: https://arxiv.org/pdf/1612.08242v1.pdf

A major feature of yolo_v2 is the "tradeoff": v2 can trade off speed against accuracy. For example, at 67 FPS, v2 reaches 76.8 mAP on VOC2007; at 40 FPS, it reaches 78.6 mAP. In this way, v2 can adapt to the needs of a variety of scenarios: when speed is not critical it can deliver high accuracy, and when top accuracy is not needed it can run fast.

Improvements of v2 over v1:

**Batch Normalization:** BN brings a significant improvement in model convergence, while also removing the need for other forms of regularization. Adding BN after every convolutional layer improves mAP by 2%. BN also helps regularize the model: with BN, dropout can be removed without overfitting. Simply adding BN layers pulls mAP up by two points; this is kept in yolo_v3, and from v2 onward BN layers became standard in the YOLO algorithms.
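As a rough illustration (not the original Darknet code), a Conv-BN block of this kind might look like the following PyTorch sketch; dropout is simply omitted because BN already regularizes the model:

```python
import torch.nn as nn

def conv_bn_leaky(in_ch, out_ch, kernel_size=3):
    """Convolution followed by BatchNorm and the leaky ReLU used in Darknet."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size, padding=kernel_size // 2, bias=False),
        nn.BatchNorm2d(out_ch),   # BN after every convolution; no dropout anywhere
        nn.LeakyReLU(0.1, inplace=True),
    )
```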

**High Resolution Classifier:** All state-of-the-art detection methods use a classifier pre-trained on ImageNet. Since AlexNet, most classifiers have taken inputs smaller than 256x256. The original YOLO trained its classifier at 224x224 and then raised the resolution to 448 for detection. This means the network has to simultaneously learn object detection and adjust to the new input resolution.

For YOLOv2, the authors first fine-tune the classification network (darknet-19) at the full 448x448 resolution for 10 epochs on ImageNet. This gives the network time to adjust its filters so that they work better on higher-resolution input. The resulting higher-resolution classifier is then fine-tuned for detection, and mAP improves by about 4%.

**Convolutional With Anchor Boxes:** yolo_v2 experimented with adding the anchor mechanism as an optimization. YOLO (v1) predicts bounding box coordinates directly with fully connected layers. Faster R-CNN does not predict coordinates directly: its RPN, using only convolutional layers, predicts an offset (a coordinate correction) and a confidence (score) for each anchor box. (Note: in Faster R-CNN the box itself comes from the anchor; the RPN only provides the offsets that refine the anchor.)

Since the prediction layer is convolutional, the RPN predicts these offsets at every location in the feature map. Predicting offsets instead of absolute coordinates simplifies the problem and makes it easier for the network to learn.

The authors remove the fully connected layers from YOLO and use anchor boxes to predict the bounding boxes. First, one pooling layer is removed so that the convolutional output has a higher resolution. They also shrink the input image from 448x448 to 416x416, because they want the output feature map to have an odd number of locations per side (416/32 = 13, which is odd), so that there is a single center cell. Objects, especially large ones, often occupy the center of the image, so having one cell located exactly at the center makes it easier to predict such objects. YOLO's convolutional layers downsample the image by a factor of 32 (i.e., 2^5), so with a 416x416 input the output feature map is 13x13.

With the anchor box mechanism, accuracy drops slightly. YOLO (v1) predicts only 98 boxes per image, while with anchor boxes the model predicts more than 1000 boxes.
Although mAP drops slightly, the increase in recall means the model has more room for improvement.
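The jump in the number of predicted boxes is simple arithmetic (the anchor count of 9 below is an assumption for illustration; the paper only says "more than a thousand"):

```python
# YOLO v1: 7x7 grid, 2 boxes per cell
print(7 * 7 * 2)    # 98 boxes per image

# With anchor boxes on a 13x13 feature map, e.g. 9 anchors per location
print(13 * 13 * 9)  # 1521 boxes per image, i.e. "more than a thousand"
```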

**Dimension Clusters:** When using the anchor mechanism with YOLO, the authors ran into two problems. First, the sizes of the prior boxes (templates) are picked by hand (previously, anchor sizes were always set manually; in Faster R-CNN, k = 9, i.e. 3 scales x 3 aspect ratios). Although the box dimensions can later be adjusted by linear regression, choosing better priors (template boxes) to begin with makes it easier for the network to learn. (I translate "prior" as template box here based on my own understanding, for reference only.)

Instead of setting the priors by hand, the authors run k-means clustering on the bounding boxes of the training set to find good priors automatically. With standard k-means (Euclidean distance), larger boxes generate more error than smaller boxes. What we really want are priors that lead to high IOU scores, independent of box size. So the distance metric used is: d(box, centroid) = 1 - IOU(box, centroid)
The k-means algorithm is run for various values of k, and the results are plotted:
(Figure: average IOU versus number of clusters k, and the resulting cluster centroids.)
In the end k = 5 is chosen, as a compromise between model complexity and high recall. The boxes obtained by clustering are quite different from the previously hand-picked boxes: there are fewer short, wide boxes and more tall, thin ones.
Comparing the average IOU before and after:
(Table: average IOU to the closest prior, clustered priors versus hand-picked anchor boxes.)
With only k = 5 clustered priors, the result is already close to that of the 9 hand-picked anchors in Faster R-CNN, and with 9 clustered priors there is a significant improvement. This shows that using k-means clustering to generate the initial b-box priors gives the model a better, easier-to-learn representation; a small sketch of the clustering is given below.
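A minimal numpy sketch of this IOU-based clustering (not the authors' code; box sizes are assumed to be (width, height) pairs):

```python
import numpy as np

def iou_wh(boxes, centroids):
    """IOU between (w, h) pairs, as if all boxes shared the same top-left corner."""
    inter_w = np.minimum(boxes[:, None, 0], centroids[None, :, 0])
    inter_h = np.minimum(boxes[:, None, 1], centroids[None, :, 1])
    inter = inter_w * inter_h
    union = (boxes[:, None, 0] * boxes[:, None, 1]
             + centroids[None, :, 0] * centroids[None, :, 1] - inter)
    return inter / union

def kmeans_priors(boxes, k=5, iters=100, seed=0):
    """Cluster box dimensions with the distance d = 1 - IOU."""
    rng = np.random.default_rng(seed)
    centroids = boxes[rng.choice(len(boxes), size=k, replace=False)]
    for _ in range(iters):
        assign = np.argmax(iou_wh(boxes, centroids), axis=1)  # min distance = max IOU
        new_centroids = np.array([
            boxes[assign == i].mean(axis=0) if np.any(assign == i) else centroids[i]
            for i in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids

# Example with made-up box sizes in place of real training-set boxes:
boxes = np.abs(np.random.randn(1000, 2)) * 100 + 20
print(kmeans_priors(boxes, k=5))
```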

**Direct location prediction:** When using anchor boxes with YOLO, a second problem appears: model instability, especially during early iterations. Most of the instability comes from predicting the (x, y) location of the box. In the RPN, the network predicts values t_x and t_y, and the (x, y) center coordinates are computed as:
x = (t_x × w_a) + x_a
y = (t_y × h_a) + y_a

where (x_a, y_a) is the anchor's center and (w_a, h_a) its width and height.

For example, a prediction of t_x = 1 shifts the box to the right by the width of the anchor box.

This formulation is unconstrained: any anchor box can end up at any position in the image, regardless of which location made the prediction. With random initialization, the model takes a long time to stabilize and produce sensible offsets.

Instead of predicting offsets, yolo_v2 follows the approach of YOLO (v1) and predicts location coordinates relative to the grid cell.
(Figure: bounding box with dimension priors and location prediction; the center is constrained to its grid cell via a sigmoid.)

(x, y) is predicted directly, as in yolo_v1, except that in v2 the prediction is relative to the top-left corner of the grid cell (shown above). Once (x, y) is predicted, the bounding box still needs w and h to be fully determined. yolo_v2's approach is both conservative and aggressive: x and y are predicted directly, while w and h are obtained by adjusting a bounding box prior. For each bounding box, the network predicts 5 values: (t_x, t_y, t_w, t_h, t_o).
b_x = σ(t_x) + c_x
b_y = σ(t_y) + c_y
b_w = p_w · e^(t_w)
b_h = p_h · e^(t_h)
Pr(object) · IOU(b, object) = σ(t_o)

where (c_x, c_y) is the offset of the grid cell from the top-left corner of the image.
From the equations above, the b-box width and height are determined directly at the same time, rather than refined by regression as in the RPN. p_w and p_h are the width and height of the prior (template box) obtained from k-means clustering, and yolo directly predicts the offsets t_w and t_h, which effectively amounts to predicting the width and height of the bounding box. Using dimension clusters together with direct location prediction improves the model by 5 percentage points of mAP.
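A minimal numpy sketch of this decoding step, assuming a 13x13 grid and 5 priors (the function name and prior values are illustrative, not taken from the paper):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_boxes(preds, priors, grid=13):
    """preds: (grid, grid, num_priors, 5) raw outputs (t_x, t_y, t_w, t_h, t_o).
    priors: (num_priors, 2) prior widths/heights in grid-cell units.
    Returns boxes (b_x, b_y, b_w, b_h) in grid-cell units plus objectness."""
    cy, cx = np.meshgrid(np.arange(grid), np.arange(grid), indexing="ij")
    bx = sigmoid(preds[..., 0]) + cx[..., None]   # center x, constrained to its cell
    by = sigmoid(preds[..., 1]) + cy[..., None]   # center y
    bw = priors[:, 0] * np.exp(preds[..., 2])     # width scaled from the prior
    bh = priors[:, 1] * np.exp(preds[..., 3])     # height scaled from the prior
    obj = sigmoid(preds[..., 4])                  # objectness score
    return np.stack([bx, by, bw, bh], axis=-1), obj

# Example: random network outputs and 5 made-up priors
preds = np.random.randn(13, 13, 5, 5)
priors = np.array([[1.3, 1.7], [3.2, 4.0], [5.1, 8.1], [9.5, 4.8], [11.2, 10.0]])
boxes, obj = decode_boxes(preds, priors)
print(boxes.shape, obj.shape)   # (13, 13, 5, 4) (13, 13, 5)
```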

When I first read this part of the paper I was puzzled: it seems anchors are not actually used after all, and the anchor mechanism the authors spent so much space on earlier is abandoned here. It was not until I saw the table below that I understood:
(Table: the path from YOLO to YOLOv2, listing each design change and its effect on VOC2007 mAP.)
From the fourth row we can see that the anchor mechanism was only laid into yolo_v2 experimentally; once dimension priors are in place, the anchors are discarded. The final mature model reaching 78.6 mAP does not use anchor boxes either.

**Fine-Grained Features:** After the adjustments above, yolo performs detection on a 13x13 feature map. Such a fine-grained feature map is not really needed for detecting large objects, but it helps a lot for small objects. Faster R-CNN and SSD both run their proposal/detection networks on feature maps of several resolutions to cover a range of scales. yolo_v2 takes a different approach: it simply adds a passthrough layer that brings in features from the earlier 26x26-resolution layer.
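The passthrough layer is essentially a space-to-depth reshuffle: each 2x2 block of spatial positions is stacked into the channel dimension, turning the 26x26x512 map into 13x13x2048, which is then concatenated with the 13x13 detection features. A minimal numpy sketch (the exact channel ordering differs between implementations):

```python
import numpy as np

def passthrough(x, stride=2):
    """Space-to-depth: (H, W, C) -> (H/stride, W/stride, C*stride*stride)."""
    h, w, c = x.shape
    assert h % stride == 0 and w % stride == 0
    x = x.reshape(h // stride, stride, w // stride, stride, c)
    x = x.transpose(0, 2, 1, 3, 4)            # group each 2x2 neighborhood together
    return x.reshape(h // stride, w // stride, c * stride * stride)

fine = np.random.randn(26, 26, 512)     # earlier, higher-resolution feature map
coarse = np.random.randn(13, 13, 1024)  # 13x13 detection feature map
stacked = np.concatenate([passthrough(fine), coarse], axis=-1)
print(stacked.shape)                    # (13, 13, 3072)
```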

**Multi-Scale Training:** train with input images of multiple resolutions. Because the network is fully convolutional, every 10 batches it randomly picks a new input size from the multiples of 32 between 320 and 608.
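A minimal sketch of how such resolution switching could look in a training loop (the 320-608 range of multiples of 32 and the every-10-batches schedule follow the paper; the loop itself is illustrative):

```python
import random

SCALES = list(range(320, 609, 32))   # 320, 352, ..., 608: all multiples of 32
num_batches = 100                    # placeholder for the real training length

input_size = 416
for batch_idx in range(num_batches):
    if batch_idx % 10 == 0:          # every 10 batches, pick a new input resolution
        input_size = random.choice(SCALES)
    # resize the batch to (input_size, input_size) and run the usual training step
```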

**Darknet-19:** yolo_v2 uses darknet-19 as its backbone network. Detection models usually take a classification network as the backbone; for example, Faster R-CNN uses VGG. yolo_v2 uses its own classification network, darknet-19, as the base, showing the authors' home-grown advantage. Darknet-19 uses batch normalization to accelerate convergence.
(Table: the Darknet-19 architecture, with 19 convolutional layers and 5 maxpooling layers.)


Origin blog.csdn.net/czp_374/article/details/91887626