Analysis of Yolov5 target detection algorithm: model structure

        The Yolov5 series is a new generation of models in the Yolo family. Like its predecessors Yolov3 and Yolov4, it still uses anchor boxes (anchors) to regress target sizes and keeps feature outputs at large, medium and small scales. Unlike them, Yolov5 changes the loss function design, the network structure and the post-processing of the feature outputs. Thanks to its innovations in network structure and data augmentation, Yolov5 improves on the earlier versions in both inference speed and inference accuracy.

        At present, the official repository has released Yolov5 version 6. This version provides several predefined models; in increasing order of size, their configuration files are yolov5n, yolov5s, yolov5m, yolov5l and yolov5x. Generally speaking, lightweight models are fully adequate for relatively simple detection tasks, and yolov5n or yolov5s is a good choice for mobile and edge deployment. The overall network structure of Yolov5 is briefly described below.
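        The model sizes differ mainly in two multipliers from their yaml files: depth_multiple scales how many times a block is repeated, and width_multiple scales the number of channels. The sketch below shows that scaling for one layer definition. The multiplier values are the ones I recall from the official yolov5*.yaml files and should be treated as assumptions, and the function names are hypothetical; the rounding mirrors what the official model parser does as far as I know.

```python
import math

# (depth_multiple, width_multiple) per model; values assumed from the
# official yolov5n/s/m/l/x yaml configs.
SCALES = {
    "yolov5n": (0.33, 0.25),
    "yolov5s": (0.33, 0.50),
    "yolov5m": (0.67, 0.75),
    "yolov5l": (1.00, 1.00),
    "yolov5x": (1.33, 1.25),
}


def make_divisible(x, divisor=8):
    """Round a channel count up to the nearest multiple of `divisor`."""
    return math.ceil(x / divisor) * divisor


def scale_layer(model, repeats, channels):
    """Return (scaled repeats, scaled output channels) for one layer definition."""
    gd, gw = SCALES[model]
    # Repeated blocks shrink or grow with depth, but never drop below one
    n = max(round(repeats * gd), 1) if repeats > 1 else repeats
    # Channel counts scale with width and are kept divisible by 8
    c = make_divisible(channels * gw, 8)
    return n, c


if __name__ == "__main__":
    # A hypothetical layer declared with 9 repeats and 512 output channels
    for name in SCALES:
        print(name, scale_layer(name, 9, 512))
```

        For this layer, yolov5n ends up with 3 repeats and 128 channels while yolov5x gets 12 repeats and 640 channels, which is why the same architecture spans such a wide range of speed and accuracy.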

Figure 1 Yolov5 network structure

        As shown in Figure 1, Yolov5 is an end-to-end single-stage detection model that can be roughly divided into five parts, whose functions are as follows:

(1) Input part (Input) 

        The input part is essentially the image pre-processing stage, covering image reading and decoding as well as data augmentation (scaling, translation, flipping, perspective transformation, MixUp, shearing, Mosaic, etc.). If automatic anchor generation is enabled, this part also adaptively computes suitable anchor boxes from the annotations.
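        Before generating new anchors, it helps to measure how well the current anchors fit the labelled boxes. To my understanding, the official autoanchor check computes a "best possible recall" (BPR): a box counts as covered if some anchor is within a factor of 4 of it in both width and height, and new anchors are only searched for (k-means plus evolution) when BPR falls below roughly 0.98. The sketch below implements only that check in plain Python; the default anchors are assumed from the official yaml files, the box sizes are assumed to be already in training-image pixels, and the function names are hypothetical.

```python
# Default anchors (width, height in pixels) for the P3, P4 and P5 outputs,
# assumed from the official yolov5*.yaml files.
DEFAULT_ANCHORS = [
    (10, 13), (16, 30), (33, 23),       # P3/8  (small objects)
    (30, 61), (62, 45), (59, 119),      # P4/16 (medium objects)
    (116, 90), (156, 198), (373, 326),  # P5/32 (large objects)
]


def best_ratio(box_wh, anchors):
    """Best anchor match for one box: the worse of the w/h ratios, maximised over anchors."""
    w, h = box_wh
    best = 0.0
    for aw, ah in anchors:
        rw = min(w / aw, aw / w)  # 1.0 means identical width, smaller is worse
        rh = min(h / ah, ah / h)
        best = max(best, min(rw, rh))
    return best


def best_possible_recall(label_whs, anchors, thr=4.0):
    """Fraction of labelled boxes that fit some anchor within a factor of `thr` in w and h."""
    matched = sum(1 for wh in label_whs if best_ratio(wh, anchors) > 1.0 / thr)
    return matched / len(label_whs)


if __name__ == "__main__":
    # A 12x15 box fits the (10, 13) anchor; a 2000x1800 box fits none of them
    print(best_possible_recall([(12, 15), (2000, 1800)], DEFAULT_ANCHORS))
```

        A dataset full of very large or very elongated objects would score a low BPR here, which is exactly the case where regenerating anchors from the annotations pays off.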

(2) Backbone network part (Backbone)

        The backbone network is the main feature extractor of the network and produces features at three levels: low, medium and high. In transfer learning, its weights are carried over to the new model as initial weights, and while the new model is trained they are often frozen so that gradient descent does not update them.

(3) Bottleneck network part (Neck)

        As the name suggests, the bottleneck network part is composed of a series of modules with a bottleneck structure. In such a module the number of feature channels first shrinks by half and then returns to the original width, like the neck of a bottle; this reduces computation while still strengthening feature extraction. The function of this part is to fuse the features of each level into large, medium and small feature maps.
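        The computational saving can be checked with simple arithmetic. Assuming the bottleneck is a 1x1 convolution that halves the channels followed by a 3x3 convolution that restores them (how I understand the Yolov5 Bottleneck block with its default expansion of 0.5), the sketch below compares its weight count with a single plain 3x3 convolution of the same width. Bias and BatchNorm are ignored, and the helper names are hypothetical. Because both variants are stride-1 convolutions on the same feature map, multiply-accumulate counts scale by the same ratio.

```python
def conv_params(c_in, c_out, k):
    """Weights of a k x k convolution (bias and BatchNorm ignored)."""
    return c_in * c_out * k * k


def plain_3x3(c):
    """A single 3x3 convolution that keeps c channels."""
    return conv_params(c, c, 3)


def bottleneck(c, e=0.5):
    """1x1 conv shrinks channels to c*e, then a 3x3 conv restores them to c."""
    hidden = int(c * e)
    return conv_params(c, hidden, 1) + conv_params(hidden, c, 3)


if __name__ == "__main__":
    c = 256
    print(plain_3x3(c), bottleneck(c), bottleneck(c) / plain_3x3(c))
```

        For 256 channels this gives 589824 weights for the plain convolution and 327680 for the bottleneck, i.e. 5/9 of the cost, while the residual shortcut (when used) adds no weights at all.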

(4) Detection head part (Head)

        The detection head regresses the position of each target: it predicts the target's center coordinates, derives its width and height from the anchor box, and outputs the final objectness confidence and class scores.
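        The sketch below decodes one raw prediction from one grid cell into a box, using the formulas I understand the v6 head to use: sigmoid outputs are rescaled so the center can move slightly outside its cell, and width and height are the anchor multiplied by a factor between 0 and 4. The raw-vector layout, the pixel-unit anchor and the function name are assumptions for illustration; the final per-class score is objectness times class probability.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decode_cell(raw, grid_xy, anchor_wh, stride):
    """Decode [tx, ty, tw, th, tobj, tcls0, tcls1, ...] from one grid cell.

    grid_xy is the cell index, anchor_wh the anchor size in pixels and
    stride the downsampling factor of this output (8, 16 or 32).
    """
    tx, ty, tw, th, tobj, *tcls = raw
    gx, gy = grid_xy
    aw, ah = anchor_wh
    # Center: offset in (-0.5, 1.5) relative to the cell, then scaled to pixels
    cx = (sigmoid(tx) * 2 - 0.5 + gx) * stride
    cy = (sigmoid(ty) * 2 - 0.5 + gy) * stride
    # Size: anchor times a factor in (0, 4)
    w = (sigmoid(tw) * 2) ** 2 * aw
    h = (sigmoid(th) * 2) ** 2 * ah
    obj = sigmoid(tobj)
    scores = [obj * sigmoid(t) for t in tcls]  # final confidence per class
    return (cx, cy, w, h), scores


if __name__ == "__main__":
    box, scores = decode_cell([0.0] * 7, grid_xy=(3, 4), anchor_wh=(10, 13), stride=8)
    print(box, scores)  # (28.0, 36.0, 10.0, 13.0) [0.25, 0.25]
```

        Bounding the width and height factor, instead of using an unbounded exponential as in Yolov3, keeps predictions close to their anchors and is one reason anchor quality matters for training.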

(5) Loss function part (Loss)

        The loss function part is used only during training. It computes the loss of the current model from the deviation between the predictions and the ground truth; the error is then backpropagated and the weights are updated by gradient descent.
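        As a rough sketch of how such a loss is assembled for a single matched target, the code below combines a box term, an objectness term and a classification term. It assumes a plain IoU box loss where Yolov5 actually uses CIoU, uses the IoU as the objectness target (my understanding of the official loss), and uses gains of 0.05, 1.0 and 0.5, which I recall as the defaults in the official hyperparameter files; all of these, and the function names, are assumptions. In real training the returned value would be a tensor that autograd backpropagates.

```python
import math


def bce_with_logits(logit, target):
    """Numerically stable binary cross-entropy computed on a raw logit."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def detection_loss(pred_box, true_box, obj_logit, cls_logits, true_cls,
                   box_gain=0.05, obj_gain=1.0, cls_gain=0.5):
    """Weighted sum of box, objectness and classification losses for one target."""
    overlap = iou(pred_box, true_box)
    l_box = 1.0 - overlap  # plain IoU here; Yolov5 itself uses CIoU
    l_obj = bce_with_logits(obj_logit, overlap)  # objectness target = IoU
    # One-vs-all BCE per class, averaged over classes
    l_cls = sum(bce_with_logits(z, 1.0 if i == true_cls else 0.0)
                for i, z in enumerate(cls_logits)) / len(cls_logits)
    return box_gain * l_box + obj_gain * l_obj + cls_gain * l_cls


if __name__ == "__main__":
    print(detection_loss((0, 0, 10, 10), (2, 0, 12, 10), 1.5, [2.0, -1.0], 0))
```

        Unmatched grid cells would only contribute an objectness term with target 0, which is how the model learns to suppress background predictions.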

Origin blog.csdn.net/qq_28249373/article/details/129426556