English Paper (SCI) Interpretation and Reproduction [No. 7]: Improved YOLOv5s Object Detection Algorithm Based on an Attention Mechanism

In my earlier column on object detection algorithm improvements, readers often asked which scenarios a given improvement applies to, which modifications are actually effective for their own application, and how many improvements are needed to publish at a given journal level. To answer these questions, this series interprets SCI papers published in high-level academic journals and introduces the corresponding journals, in the hope of resolving your doubts and helping with the submission of scientific research papers. For each paper interpreted in the series, I reproduce the code for its innovations; readers who need it can follow me and send a private message to get it.

1. Summary

To improve the accuracy of the YOLOv5s (You Only Look Once v5 small) object detection algorithm, an improved algorithm, CBAM-YOLOv5s, is proposed. A Convolutional Block Attention Module (CBAM) is incorporated into the YOLOv5s backbone network to improve its feature extraction capability. Furthermore, the complete intersection over union (CIoU) loss is used as the bounding box regression loss function to speed up the regression process. Experiments are conducted on the Pascal Visual Object Classes 2007 (VOC 2007) dataset and the Microsoft Common Objects in Context (COCO 2014) dataset, both widely used for object detection evaluation. The results on VOC 2007 show that, compared with the original YOLOv5s algorithm, the precision, recall, and mean average precision (mAP) of CBAM-YOLOv5s increase by 4.52%, 1.18%, and 3.09%, respectively. On COCO 2014, the precision, recall, and mAP of CBAM-YOLOv5s increase by 2.21%, 0.88%, and 1.39%, respectively, over the original YOLOv5s.

2. Network model and core innovation points

 

1. Convolutional Block Attention Module (CBAM)
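CBAM refines a feature map in two sequential stages: a channel attention map (computed from global average- and max-pooled descriptors passed through a shared MLP) followed by a spatial attention map (computed by a convolution over channel-wise average and max projections). The following PyTorch sketch shows the standard CBAM structure; it is a minimal illustration of the module, not the paper's exact configuration (the reduction ratio and kernel size here are the common defaults from the CBAM paper):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelAttention(nn.Module):
    """Channel attention: shared MLP over avg- and max-pooled descriptors."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )

    def forward(self, x):
        avg = self.mlp(F.adaptive_avg_pool2d(x, 1))
        mx = self.mlp(F.adaptive_max_pool2d(x, 1))
        return torch.sigmoid(avg + mx)          # (B, C, 1, 1) weights

class SpatialAttention(nn.Module):
    """Spatial attention: conv over channel-wise avg and max projections."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size,
                              padding=kernel_size // 2, bias=False)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)       # (B, 1, H, W)
        mx, _ = x.max(dim=1, keepdim=True)      # (B, 1, H, W)
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

class CBAM(nn.Module):
    """Channel attention first, then spatial attention (CBAM ordering)."""
    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        self.ca = ChannelAttention(channels, reduction)
        self.sa = SpatialAttention(kernel_size)

    def forward(self, x):
        x = x * self.ca(x)   # reweight channels
        x = x * self.sa(x)   # reweight spatial locations
        return x
```

The module is shape-preserving, so it can be dropped between existing backbone stages without changing the rest of the network.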

2. Complete Intersection over Union (CIoU) Loss
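The CIoU loss extends plain IoU with two penalty terms: the normalized squared distance between the box centers (relative to the diagonal of the smallest enclosing box) and an aspect-ratio consistency term, so the loss still provides a useful gradient when the predicted and ground-truth boxes do not overlap. A minimal standard-library sketch of the published CIoU formulation, for single boxes in (x1, y1, x2, y2) format:

```python
import math

def ciou_loss(box1, box2):
    """CIoU loss between two boxes in (x1, y1, x2, y2) format.

    loss = 1 - IoU + rho^2 / c^2 + alpha * v, where rho is the
    center distance, c the enclosing-box diagonal, and v measures
    aspect-ratio inconsistency.
    """
    x1, y1, x2, y2 = box1
    X1, Y1, X2, Y2 = box2

    # Intersection and union
    iw = max(0.0, min(x2, X2) - max(x1, X1))
    ih = max(0.0, min(y2, Y2) - max(y1, Y1))
    inter = iw * ih
    union = (x2 - x1) * (y2 - y1) + (X2 - X1) * (Y2 - Y1) - inter
    iou = inter / union

    # Squared distance between box centers
    rho2 = ((x1 + x2) - (X1 + X2)) ** 2 / 4 + ((y1 + y2) - (Y1 + Y2)) ** 2 / 4

    # Squared diagonal of the smallest enclosing box
    cw = max(x2, X2) - min(x1, X1)
    ch = max(y2, Y2) - min(y1, Y1)
    c2 = cw ** 2 + ch ** 2

    # Aspect-ratio consistency term
    v = (4 / math.pi ** 2) * (math.atan((X2 - X1) / (Y2 - Y1))
                              - math.atan((x2 - x1) / (y2 - y1))) ** 2
    alpha = v / (1 - iou + v + 1e-9)

    return 1 - iou + rho2 / c2 + alpha * v
```

For identical boxes all three terms vanish and the loss is 0; for disjoint boxes the center-distance term keeps the loss above 1 and differentiable, which is what speeds up regression compared with a plain IoU loss.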

3. Dataset

The datasets used in this experiment are the Pascal Visual Object Classes 2007 (VOC 2007) dataset [28] and the Microsoft Common Objects in Context (COCO 2014) dataset. The COCO 2014 dataset contains a total of 123,287 images in 80 categories. The VOC 2007 dataset contains a total of 9,963 images in 20 classes, as shown in Figure 6: airplane, bicycle, bird, boat, bottle, bus, car, cat, chair, cow, dining table, dog, horse, motorcycle, person, potted plant, sheep, sofa, train, and TV monitor. The associated XML files provide the object classes of each input image and the corresponding ground-truth coordinates.

4. Experimental results (partial display)

1. Ablation experiment

During training, the stochastic gradient descent (SGD) optimization algorithm was used to update the model parameters. Table 3 shows the experimental results obtained on the VOC 2007 dataset.
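For reference, an SGD update in PyTorch looks like the sketch below. The toy model and the hyperparameters (learning rate, momentum, weight decay) are illustrative assumptions, as the blog excerpt does not reproduce the paper's exact training settings:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-in model; in the paper this would be CBAM-YOLOv5s.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(8 * 8 * 8, 4),
)

# Hypothetical hyperparameters for illustration only.
optimizer = torch.optim.SGD(model.parameters(),
                            lr=0.01, momentum=0.937, weight_decay=5e-4)

# One SGD step: forward pass, loss, backward pass, parameter update.
x = torch.randn(2, 3, 8, 8)
target = torch.randn(2, 4)
loss = F.mse_loss(model(x), target)

optimizer.zero_grad()
loss.backward()
optimizer.step()
```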

A comparison of the experimental results shows that the proposed algorithm outperforms both the original algorithm and the variant with the SENet module. SENet contains only channel attention, so it can capture important feature information only along the channel dimension, whereas CBAM contains both channel attention and spatial attention and can capture important feature information in both dimensions, allowing the network to learn the important features in an image more effectively. The more image features the network learns, the better it can recognize objects, which leads to higher recognition accuracy.
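To make the contrast concrete, a Squeeze-and-Excitation (SE) block produces only a per-channel weight vector and has no spatial attention stage, unlike the CBAM module above. A minimal PyTorch sketch of the standard SE block (common default reduction ratio assumed):

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: channel attention only, no spatial map."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels, bias=False),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(x.mean(dim=(2, 3)))    # squeeze: global average pool
        return x * w.view(b, c, 1, 1)      # excite: per-channel rescale
```

Every spatial location in a channel is scaled by the same weight, which is exactly the limitation the paragraph above describes.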

2. Comparative experiment

To further verify the effectiveness of the improved algorithm, comparative experiments are conducted on the COCO 2014 dataset. The experimental results are shown in Table 4.

It can be seen from Table 4 that, compared with the original YOLOv5s algorithm, the precision, recall, and mAP of the CBAM-YOLOv5s algorithm increase by 2.21%, 0.88%, and 1.39%, respectively. Based on the experimental results in Tables 3 and 4, it can be concluded that the improved CBAM-YOLOv5s algorithm outperforms the original YOLOv5s algorithm on both the VOC 2007 and COCO 2014 datasets.

5. Experimental conclusion

This paper introduces CBAM into the YOLOv5s backbone network, optimizing its network structure, and uses the CIoU loss as the bounding box regression loss function to speed up the regression process; the CIoU loss is chosen because it converges faster than the GIoU loss. To verify the performance of the proposed algorithm, extensive experiments are conducted on the VOC 2007 dataset, and the results show that the precision, recall, and mAP are significantly improved over the original YOLOv5s algorithm. The proposed algorithm alleviates the low detection accuracy of the original YOLOv5s to a certain extent, but detection errors and missed detections remain for complex images with dense targets. Future research will continue to optimize the network structure of the proposed algorithm to further improve its detection accuracy.

Note: The original paper is "An Improved YOLOv5s Algorithm for Object Detection with an Attention Mechanism". This post is shared for academic purposes only; if there is any infringement, please contact us for deletion.


Source: blog.csdn.net/m0_70388905/article/details/130500598