Object Detection - YOLOv5 (8)

Introduction:
YOLOv4 (2020.4.23) had been released for less than two months, and many people had not yet had time to study it closely, when YOLOv5 (2020.6.10) suddenly appeared.
The YOLOv5 weight file is only 27 MB, while the Darknet-based YOLOv4 weights are 244 MB, nearly 90% smaller by comparison, with accuracy roughly on par with the YOLOv4 benchmark.
GitHub address: https://github.com/ultralytics/yolov5
YOLOv5 weight files: https://pan.baidu.com/s/1Zk2Ksfl_v-apbRBQ_mqc6w (password: 00mp)
The algorithm performance given by the author is as follows:
[Figure: performance comparison chart given by the author]

Network structure:

Adapted from a figure on Zhihu: YOLOv5s is the network in the YOLOv5 series with the smallest depth and the smallest feature-map width. The other three models (YOLOv5m, YOLOv5l, YOLOv5x) progressively deepen and widen the network on this basis.
[Figure: YOLOv5s network structure]

Detailed explanation of core principles:

1. Input:
(1) Mosaic data augmentation:
The same as in YOLOv4; this augmentation was in fact proposed by the YOLOv5 author. Four training images are stitched into a single mosaic, which enriches the backgrounds and small objects seen in each batch, as sketched below.
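A minimal sketch of the Mosaic idea, using NumPy and OpenCV (a toy illustration, not the ultralytics implementation; the function name mosaic4 is made up here, and a real pipeline would also shift and clip the box labels of each source image):

```python
import cv2
import numpy as np

def mosaic4(images, out_size=608):
    """Stitch 4 images into one mosaic around a random center point."""
    s = out_size
    # random mosaic center, kept away from the borders
    xc = np.random.randint(s // 4, 3 * s // 4)
    yc = np.random.randint(s // 4, 3 * s // 4)
    canvas = np.full((s, s, 3), 114, dtype=np.uint8)  # gray background
    # destination quadrants: top-left, top-right, bottom-left, bottom-right
    quads = [(0, 0, xc, yc), (xc, 0, s, yc), (0, yc, xc, s), (xc, yc, s, s)]
    for img, (x1, y1, x2, y2) in zip(images, quads):
        # naive resize-to-fit; the real pipeline crops random regions instead
        patch = cv2.resize(img, (x2 - x1, y2 - y1))
        canvas[y1:y2, x1:x2] = patch
    return canvas
```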
(2) Adaptive anchor box calculation:
In earlier YOLO versions, the initial anchor sizes had to be computed beforehand with k-means clustering. In YOLOv5 this step is integrated into the training code, which adaptively computes the best anchor values for the current training set each time training runs.
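As an illustration, the clustering step might look like the simplified sketch below (the real autoanchor code in the repo also evolves the k-means result with a genetic algorithm under a best-possible-recall fitness, and only replaces the default anchors when they fit the dataset poorly):

```python
import numpy as np

def kmeans_anchors(wh, k=9, iters=30):
    """Plain k-means over ground-truth box (width, height) pairs.

    wh: (N, 2) array of box sizes scaled to the training resolution.
    YOLO variants often use an IoU-based distance; Euclidean is used
    here to keep the sketch short."""
    anchors = wh[np.random.choice(len(wh), k, replace=False)].astype(float)
    for _ in range(iters):
        # assign every box to its nearest anchor
        d = ((wh[:, None, :] - anchors[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(1)
        for i in range(k):
            if (assign == i).any():
                anchors[i] = wh[assign == i].mean(0)  # recenter cluster
    return anchors[np.argsort(anchors.prod(1))]  # sorted by anchor area
```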
(3) Adaptive image scaling:
During training the network input is, say, 608×608, while dataset images come in all sizes. The common approach is to scale the image directly to the standard size and fill the remainder with black borders, as shown below:
[Figure: direct scaling with black-border padding]

However, heavy padding introduces redundant information and slows down inference.
In the inference stage, YOLOv5 therefore shrinks the black borders to speed things up: the letterbox function in datasets.py is modified to adaptively add the fewest black borders possible to the original image.
For example, a 1000×800 image is not scaled straight to 608×608. Instead, the scale factor 608/1000 = 0.608 is computed and the image is resized to 608×486; then 608 - 486 = 122, np.mod(122, 32) gives a remainder of 26, and half of that (13 pixels) is padded onto each end of the image height, for a final size of 608×512.
[Figure: adaptive scaling with minimal black borders]
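A simplified sketch of this logic, loosely modeled on the letterbox function in datasets.py (not the exact implementation), reproduces the numbers from the example above:

```python
import cv2
import numpy as np

def letterbox(img, new_size=608, stride=32, color=(114, 114, 114)):
    """Resize so the long side equals new_size, then pad the short side
    only up to the next multiple of the stride instead of all the way."""
    h, w = img.shape[:2]                                  # e.g. 800, 1000
    r = new_size / max(h, w)                              # 608/1000 = 0.608
    new_w, new_h = int(round(w * r)), int(round(h * r))   # 608, 486
    img = cv2.resize(img, (new_w, new_h))
    # minimal padding: only to the nearest multiple of 32, not to 608
    pad_w = np.mod(new_size - new_w, stride)              # 0
    pad_h = np.mod(new_size - new_h, stride)              # mod(122, 32) = 26
    top, bottom = pad_h // 2, pad_h - pad_h // 2          # 13, 13
    left, right = pad_w // 2, pad_w - pad_w // 2
    return cv2.copyMakeBorder(img, top, bottom, left, right,
                              cv2.BORDER_CONSTANT, value=color)

# a 1000x800 input comes out as 608x512, exactly as in the example above
```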

2. Backbone:
(1) Focus structure:
The Focus structure does not appear in YOLOv3 or YOLOv4; its key operation is slicing.
Taking YOLOv5s as an example, the original 608×608×3 image enters the Focus structure, where slicing first turns it into a 304×304×12 feature map; a convolution with 32 kernels then produces a 304×304×32 feature map.
[Figure: the Focus slicing operation]
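A minimal PyTorch sketch of the slicing operation (BatchNorm and activation are omitted here; in the repo the convolution is wrapped in its Conv block):

```python
import torch
import torch.nn as nn

class Focus(nn.Module):
    """Spread every 2x2 pixel neighbourhood across the channel axis,
    turning a 608x608x3 input into 304x304x12 before the convolution."""
    def __init__(self, c_in=3, c_out=32, k=3):
        super().__init__()
        self.conv = nn.Conv2d(c_in * 4, c_out, k, stride=1, padding=k // 2)

    def forward(self, x):
        # four phase-shifted subsamplings of the image
        x = torch.cat([x[..., ::2, ::2],     # top-left pixels
                       x[..., 1::2, ::2],    # bottom-left pixels
                       x[..., ::2, 1::2],    # top-right pixels
                       x[..., 1::2, 1::2]],  # bottom-right pixels
                      dim=1)                 # (B, 12, 304, 304)
        return self.conv(x)                  # (B, 32, 304, 304)

print(Focus()(torch.randn(1, 3, 608, 608)).shape)  # [1, 32, 304, 304]
```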

(2) CSP structure:
Both YOLOv5 and YOLOv4 use CSP structures; the difference is that YOLOv4 applies CSP only in the backbone.
YOLOv5 designs two CSP variants. Taking the YOLOv5s network as an example, the CSP1_X structure is used in the Backbone, and the CSP2_X structure is used in the Neck, as sketched below.
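A hedged PyTorch sketch of the two variants, loosely modeled on the repo's BottleneckCSP/C3 modules rather than copied from them: CSP1_X keeps a residual shortcut inside each of its X bottleneck units (shortcut=True, Backbone), while CSP2_X drops the shortcut (shortcut=False, Neck):

```python
import torch
import torch.nn as nn

def conv_bn_act(c1, c2, k=1):
    return nn.Sequential(nn.Conv2d(c1, c2, k, padding=k // 2, bias=False),
                         nn.BatchNorm2d(c2), nn.SiLU())

class Bottleneck(nn.Module):
    """1x1 -> 3x3 unit; the shortcut flag separates CSP1_X from CSP2_X."""
    def __init__(self, c, shortcut=True):
        super().__init__()
        self.m = nn.Sequential(conv_bn_act(c, c, 1), conv_bn_act(c, c, 3))
        self.add = shortcut

    def forward(self, x):
        return x + self.m(x) if self.add else self.m(x)

class CSP(nn.Module):
    """Half the channels take the bottleneck path, half bypass it;
    the two partial results are concatenated and fused by a 1x1 conv."""
    def __init__(self, c1, c2, n=1, shortcut=True):
        super().__init__()
        c_ = c2 // 2
        self.cv1 = conv_bn_act(c1, c_, 1)   # main path in
        self.cv2 = conv_bn_act(c1, c_, 1)   # cross-stage bypass
        self.m = nn.Sequential(*(Bottleneck(c_, shortcut) for _ in range(n)))
        self.cv3 = conv_bn_act(2 * c_, c2, 1)

    def forward(self, x):
        return self.cv3(torch.cat([self.m(self.cv1(x)), self.cv2(x)], dim=1))

csp1_3 = CSP(64, 128, n=3, shortcut=True)    # CSP1_3-style block (Backbone)
csp2_1 = CSP(128, 128, n=2, shortcut=False)  # CSP2_1-style block (Neck)
```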
3. Neck:
(1) FPN+PAN structure:
The Neck of YOLOv5 is the same as YOLOv4's: both adopt the FPN+PAN structure.
However, YOLOv4's Neck uses ordinary convolution operations, while YOLOv5's Neck adopts the CSP2 structure derived from CSPNet, strengthening the network's feature-fusion capability.
[Figure: the FPN+PAN structure in the Neck]
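A shape-level sketch of the fusion pattern (illustrative only: every merge block is reduced to a single convolution where YOLOv5 would place a CSP2 block, and the channel counts 128/256/512 at strides 8/16/32 are assumptions for a 608×608 input):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FPNPAN(nn.Module):
    def __init__(self):
        super().__init__()
        self.fpn4 = nn.Conv2d(512 + 256, 256, 1)  # top-down merges
        self.fpn3 = nn.Conv2d(256 + 128, 128, 1)
        self.pan4 = nn.Conv2d(128 + 256, 256, 1)  # bottom-up merges
        self.pan5 = nn.Conv2d(256 + 512, 512, 1)
        self.down3 = nn.Conv2d(128, 128, 3, stride=2, padding=1)
        self.down4 = nn.Conv2d(256, 256, 3, stride=2, padding=1)

    def forward(self, c3, c4, c5):
        # FPN: top-down path carries strong semantic features to high-res maps
        p4 = self.fpn4(torch.cat([F.interpolate(c5, scale_factor=2.0), c4], 1))
        p3 = self.fpn3(torch.cat([F.interpolate(p4, scale_factor=2.0), c3], 1))
        # PAN: bottom-up path carries strong localization features back down
        n4 = self.pan4(torch.cat([self.down3(p3), p4], 1))
        n5 = self.pan5(torch.cat([self.down4(n4), c5], 1))
        return p3, n4, n5  # the three detection heads attach here
```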

4. Prediction:
(1) Bounding box loss function:
YOLOv4 uses CIoU_Loss as its bounding-box loss function, while YOLOv5 uses GIoU_Loss.
ps: the bounding-box regression losses of recent years evolved roughly as follows:
Smooth L1 Loss → IoU Loss (2016) → GIoU Loss (2019) → DIoU Loss (2020) → CIoU Loss (2020)
IoU_Loss: considers only the overlap area between the predicted box and the ground-truth box.
GIoU_Loss: builds on IoU and fixes the degenerate case where the two boxes do not overlap (see the sketch after this list).
DIoU_Loss: builds on IoU/GIoU and additionally considers the distance between the box center points.
CIoU_Loss: builds on DIoU and additionally considers the aspect-ratio (scale) consistency of the boxes.
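A minimal PyTorch sketch of GIoU_Loss for boxes in corner format (illustrative; the repo's version lives inside a more general IoU routine):

```python
import torch

def giou_loss(box1, box2, eps=1e-7):
    """GIoU = IoU - (enclosing area - union) / enclosing area; loss = 1 - GIoU,
    so even non-overlapping pairs produce a meaningful gradient.
    box1, box2: (N, 4) tensors in (x1, y1, x2, y2) format."""
    # intersection
    iw = (torch.min(box1[:, 2], box2[:, 2]) - torch.max(box1[:, 0], box2[:, 0])).clamp(0)
    ih = (torch.min(box1[:, 3], box2[:, 3]) - torch.max(box1[:, 1], box2[:, 1])).clamp(0)
    inter = iw * ih
    # union
    area1 = (box1[:, 2] - box1[:, 0]) * (box1[:, 3] - box1[:, 1])
    area2 = (box2[:, 2] - box2[:, 0]) * (box2[:, 3] - box2[:, 1])
    union = area1 + area2 - inter + eps
    iou = inter / union
    # smallest box enclosing both
    cw = torch.max(box1[:, 2], box2[:, 2]) - torch.min(box1[:, 0], box2[:, 0])
    ch = torch.max(box1[:, 3], box2[:, 3]) - torch.min(box1[:, 1], box2[:, 1])
    c_area = cw * ch + eps
    giou = iou - (c_area - union) / c_area
    return (1.0 - giou).mean()
```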
(2) NMS non-maximum suppression:
In the post-processing stage, NMS must be run over the many candidate boxes. CIoU_Loss contains the aspect-ratio influence factor v, which requires ground-truth information that is not available at inference time, so it is not used in the NMS step.
Therefore, at inference YOLOv4 pairs its DIoU_Loss with DIoU-NMS, while YOLOv5 uses weighted NMS on top of its GIoU_Loss.
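For reference, a NumPy sketch of class-agnostic DIoU-NMS as described above (simplified, not taken from either repo; weighted NMS would instead average the coordinates of overlapping boxes weighted by their scores):

```python
import numpy as np

def diou_nms(boxes, scores, thresh=0.5):
    """A box is suppressed only when its IoU with a kept higher-scoring box,
    minus the normalized squared center distance, exceeds the threshold.
    boxes: (N, 4) array in (x1, y1, x2, y2) format; scores: (N,)."""
    cx = (boxes[:, 0] + boxes[:, 2]) / 2   # box centers
    cy = (boxes[:, 1] + boxes[:, 3]) / 2
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    order = scores.argsort()[::-1]         # descending by confidence
    keep = []
    while order.size > 0:
        i, rest = order[0], order[1:]
        keep.append(i)
        # IoU of the current best box with the remaining ones
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        iou = inter / (areas[i] + areas[rest] - inter + 1e-7)
        # squared center distance over squared enclosing-box diagonal
        d2 = (cx[i] - cx[rest]) ** 2 + (cy[i] - cy[rest]) ** 2
        cw = np.maximum(boxes[i, 2], boxes[rest, 2]) - np.minimum(boxes[i, 0], boxes[rest, 0])
        ch = np.maximum(boxes[i, 3], boxes[rest, 3]) - np.minimum(boxes[i, 1], boxes[rest, 1])
        c2 = cw ** 2 + ch ** 2 + 1e-7
        order = rest[iou - d2 / c2 <= thresh]  # keep boxes below the threshold
    return keep
```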

Series portals:
Object Detection - R-CNN (1)
Object Detection - Fast R-CNN (2)
Object Detection - Faster R-CNN (3)
Object Detection - Mask R-CNN (4)
Object Detection - R-FCN (5)
Object Detection - YOLOv3 (6)
Object Detection - YOLOv4 (7)

Original post: blog.csdn.net/qq_42823043/article/details/108002560