[YOLOv7] Reflective clothing detection system based on YOLOv7 (source code & deployment tutorial & data set)

1. Recognition effect display

2.png
1 (262).jpg

1 (267).jpg

2. Video demonstration

[YOLOv7] Reflective clothing detection system based on YOLOv7 (source code & deployment tutorial & data set), video on bilibili

3. Introduction to YOLOv7 Algorithm

YOLOv7 surpasses all known object detectors in both speed and accuracy in the range of 5 FPS to 160 FPS.

Among all known real-time object detectors running at 30 FPS or higher on a V100 GPU, YOLOv7 has the highest accuracy, 56.8% AP. YOLOv7 is trained from scratch on the MS COCO dataset only, without any other datasets or pretrained weights.
Compared with other types of detectors, YOLOv7-E6 (56 FPS on V100, 55.9% AP) is 509% faster and 2% AP more accurate than the transformer-based detector SWIN-L Cascade-Mask R-CNN (9.2 FPS on A100, 53.9% AP), and 551% faster and 0.7% AP more accurate than the convolution-based detector ConvNeXt-XL Cascade-Mask R-CNN (8.6 FPS on A100, 55.2% AP).
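The percentages quoted above can be reproduced from the FPS and AP figures themselves. The short check below is only arithmetic on the numbers in the text; the helper name `percent_faster` is my own, and note that the FPS values were measured on different GPUs (V100 vs. A100), so this is a ratio of the quoted figures rather than a controlled benchmark.

```python
def percent_faster(fps_new, fps_ref):
    """Relative speed gain of fps_new over fps_ref, in percent."""
    return (fps_new - fps_ref) / fps_ref * 100.0


# Figures quoted in the text above
yolov7_e6 = {"fps": 56.0, "ap": 55.9}
swin_l = {"fps": 9.2, "ap": 53.9}
convnext_xl = {"fps": 8.6, "ap": 55.2}

swin_speed_gain = percent_faster(yolov7_e6["fps"], swin_l["fps"])
convnext_speed_gain = percent_faster(yolov7_e6["fps"], convnext_xl["fps"])
swin_ap_gain = yolov7_e6["ap"] - swin_l["ap"]
convnext_ap_gain = yolov7_e6["ap"] - convnext_xl["ap"]

print(f"vs SWIN-L:      {swin_speed_gain:.0f}% faster, +{swin_ap_gain:.1f} AP")
print(f"vs ConvNeXt-XL: {convnext_speed_gain:.0f}% faster, +{convnext_ap_gain:.1f} AP")
```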
5.png

In addition, YOLOv7 also outperforms YOLOR, YOLOX, Scaled-YOLOv4, YOLOv5, DETR and other object detectors in both speed and accuracy.

4. YOLOv7 technical method

In recent years, real-time object detectors have continued to be developed for different edge devices. For example, MCUNet and NanoDet focus on low-power single-chip devices and on improving inference speed on edge CPUs, while methods such as YOLOX and YOLOR focus on improving inference speed on various GPUs. More recently, the development of real-time object detectors has concentrated on the design of efficient architectures. Real-time detectors intended for CPUs are mostly based on MobileNet, ShuffleNet or GhostNet; most real-time detectors developed for GPUs use ResNet, DarkNet or DLA, and then apply the CSPNet strategy to optimize the architecture.

The development direction of YOLOv7 differs from that of the current mainstream real-time object detectors: the research team hopes that it can support both mobile GPUs and GPU devices from the edge to the cloud. In addition to architecture optimization, the proposed method also focuses on optimizing the training process, through a set of optimized modules and optimization methods. These may increase the training cost in order to improve detection accuracy, but they do not increase the inference cost. The researchers refer to these modules and optimization methods as a trainable "bag-of-freebies".

For model re-parameterization, the study uses the concept of gradient propagation paths to analyze which re-parameterization strategies suit the layers of different networks, and proposes a planned re-parameterized model. In addition, the researchers found that when dynamic label assignment is used, a model with multiple output layers raises a new problem during training: "How should dynamic targets be assigned to the outputs of different branches?" To solve this problem, they proposed a new label assignment method called coarse-to-fine lead guided label assignment.

The main contributions of the study include:

(1) Several trainable bag-of-freebies methods are designed, so that real-time object detection can greatly improve its accuracy without increasing the inference cost;

(2) For the evolution of object detection methods, the researchers identified two new problems: how a re-parameterized module should replace the original module, and how a dynamic label assignment strategy should handle assignment to different output layers. They propose a solution to both problems;

(3) "Extend" and "compound scaling" methods are proposed for real-time object detectors, so that parameters and computation are used efficiently;

(4) The proposed method removes about 40% of the parameters and 50% of the computation of the SOTA real-time object detector, while achieving faster inference and higher detection accuracy.

In most of the literature on designing efficient architectures, the main considerations are the number of parameters, the amount of computation, and the computational density. The CSPVoVNet design in Figure 2(b) below is a variant of VoVNet. The CSPVoVNet architecture also analyzes the gradient path so that the weights of different layers can learn more diverse features, which makes inference faster and more accurate. The ELAN in Figure 2(c) addresses the question of "how to design an efficient network".

The YOLOv7 research team proposed an extended E-ELAN based on ELAN, and its main architecture is shown in the figure.
6.png
The new E-ELAN does not change the gradient transmission path of the original architecture at all. It uses group convolution to increase the cardinality of the added features, and combines the features of different groups by shuffling and merging cardinality. This mode of operation enhances the features learned by different feature maps and improves the use of parameters and computation.

Large-scale ELAN has already reached a stable state, regardless of the gradient path length and the number of stacked computational blocks. If more computational blocks are stacked without limit, this stable state may be destroyed and the parameter utilization rate will decrease. The newly proposed E-ELAN uses expand, shuffle, and merge cardinality to keep enhancing the learning ability of the network without destroying the original gradient path.

In terms of architecture, E-ELAN changes only the computational block; the transition layer is left completely unchanged. The strategy is to use group convolution to expand the channels and cardinality of the computational blocks, applying the same group parameter and channel multiplier to all computational blocks of a computational layer. The feature map calculated by each computational block is then shuffled into g groups according to the group parameter g, and the groups are concatenated together. At this point, the number of channels in each group of feature maps is the same as in the original architecture. Finally, the g groups of feature maps are added together to perform merge cardinality. Besides preserving the original ELAN design, E-ELAN also guides different groups of computational blocks to learn more diverse features.
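
The shuffle and merge-cardinality steps can be illustrated with plain Python lists. This is a simplified sketch, not the YOLOv7 implementation: each "channel" is a single number standing in for a whole feature map, the function name `shuffle_and_merge` is hypothetical, and splitting each block's output into g contiguous chunks is my assumption about how the shuffle is laid out.

```python
def shuffle_and_merge(block_outputs, g):
    """Shuffle block outputs into g groups, then merge cardinality.

    block_outputs: one list of channel values per computational block,
                   each of length g * c (the channels were expanded g times).
    Returns (groups, merged): the g concatenated groups and their
    element-wise sum.
    """
    groups = []
    for k in range(g):
        concatenated = []
        for channels in block_outputs:
            if len(channels) % g != 0:
                raise ValueError("channel count must be divisible by g")
            c = len(channels) // g
            # Take the k-th chunk of this block and concatenate across blocks
            concatenated.extend(channels[k * c:(k + 1) * c])
        groups.append(concatenated)
    # Merge cardinality: add the g groups element by element
    merged = [sum(values) for values in zip(*groups)]
    return groups, merged


# Two computational blocks, g = 2 groups, c = 2 channels per group
block_a = [1, 2, 3, 4]
block_b = [10, 20, 30, 40]
groups, merged = shuffle_and_merge([block_a, block_b], g=2)
print(groups)  # [[1, 2, 10, 20], [3, 4, 30, 40]]
print(merged)  # [4, 6, 40, 60]
```

Each group holds 4 channels (2 blocks times c = 2), which is what an unexpanded concatenation of the two blocks would produce, and the merged result keeps that same width.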
In a concatenation-based model, scaling the depth of a computational block also changes the number of output channels of that block, and hence the input width of the transition layer that follows it. Therefore, for concatenation-based models, the different scaling factors cannot be analyzed separately but must be considered together. The study proposes the approach shown in Figure (c): when scaling a concatenation-based model, only the depth of the computational block is scaled directly, and the remaining transition layers are scaled in width by the corresponding amount. This compound scaling approach preserves the properties and the optimal structure that the model had at its initial design.
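
A toy calculation shows why depth and width end up coupled. The model below is an assumption made purely for illustration: it pretends that a block concatenates the output of every one of its layers, and all channel counts are made-up numbers rather than values from YOLOv7.

```python
def block_out_channels(layer_channels, depth):
    """Toy model (assumption): the block concatenates every layer's output."""
    return layer_channels * depth


def compound_scale(layer_channels, depth, transition_channels, depth_factor):
    """Scale block depth, then scale the transition width by the same ratio
    by which the block's concatenated output grew."""
    new_depth = round(depth * depth_factor)
    old_out = block_out_channels(layer_channels, depth)
    new_out = block_out_channels(layer_channels, new_depth)
    width_ratio = new_out / old_out
    new_transition = round(transition_channels * width_ratio)
    return new_depth, new_out, new_transition


# Hypothetical block: 4 layers of 64 channels, followed by a 256-channel transition
new_depth, new_out, new_transition = compound_scale(
    layer_channels=64, depth=4, transition_channels=256, depth_factor=1.5
)
print(new_depth, new_out, new_transition)  # 6 384 384
```

In this toy setting, scaling depth alone would feed 384 channels into a transition layer sized for 256, which is the mismatch that compound scaling avoids.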

In addition, the study uses the gradient flow propagation path to analyze how re-parameterized convolution should be combined with different networks. The figure below shows the "planned re-parameterized convolution" that the study designed for PlainNet and ResNet.
7.png
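
The idea behind re-parameterized convolution is that several parallel branches used during training can be folded into one convolution for inference, because convolution is linear. The sketch below shows this for a single channel in pure Python, under simplifying assumptions of mine: no batch normalization, a scalar weight for the 1x1 branch, and hypothetical function names. It is not the RepConv code used in YOLOv7. Setting `with_identity=False` corresponds to a variant without the identity branch.

```python
def conv3x3_same(image, kernel):
    """Single-channel 3x3 convolution, stride 1, zero padding."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:
                        acc += kernel[di + 1][dj + 1] * image[y][x]
            out[i][j] = acc
    return out


def multi_branch(image, k3, k1, with_identity=False):
    """Training-time form: 3x3 branch + 1x1 branch (+ optional identity)."""
    out = conv3x3_same(image, k3)
    extra = k1 + (1.0 if with_identity else 0.0)
    return [[o + extra * v for o, v in zip(ro, rv)] for ro, rv in zip(out, image)]


def fuse_kernels(k3, k1, with_identity=False):
    """Inference-time form: fold the other branches into the 3x3 kernel centre."""
    fused = [row[:] for row in k3]
    fused[1][1] += k1 + (1.0 if with_identity else 0.0)
    return fused


image = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
k3 = [[0.5, 0.0, -0.5], [1.0, 0.25, -1.0], [0.5, 0.0, -0.5]]
k1 = 2.0

trained = multi_branch(image, k3, k1)
deployed = conv3x3_same(image, fuse_kernels(k3, k1))
max_diff = max(abs(a - b) for ra, rb in zip(trained, deployed) for a, b in zip(ra, rb))
print("max difference between the two forms:", max_diff)
```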

5. Dataset Preparation

Annotate the collected pictures to create a YOLO-format dataset.
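
In the YOLO format, each image has a txt label file with one line per object: the class index followed by the box centre, width and height, all normalized to the range 0 to 1 by the image size. A minimal conversion from pixel corner coordinates might look like the sketch below; the function name, the class index 0 and the example box are all hypothetical.

```python
def to_yolo_line(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-coordinate box to one YOLO label line:
    'class cx cy w h', all coordinates normalized by the image size."""
    cx = (x_min + x_max) / 2.0 / img_w
    cy = (y_min + y_max) / 2.0 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


# Hypothetical box (100, 200) to (300, 600) in a 1000 x 800 image, class 0
line = to_yolo_line(0, 100, 200, 300, 600, 1000, 800)
print(line)  # 0 0.200000 0.500000 0.200000 0.500000
```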

4.png

11.png
Create a myself.yaml file to configure the paths. The path format differs from that of YOLOv5 and YOLOv6: you only need to configure the paths of the txt files.
8.png

9.png
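For readers who cannot see the screenshots, the sketch below renders a data file of the same general shape. The keys `train`, `val`, `nc` and `names` follow the data YAML files shipped with YOLOv7; the txt paths and the class names are placeholders of mine, not the values used in this project.

```python
def render_data_yaml(train_list, val_list, class_names):
    """Return the text of a YOLOv7-style data YAML that points at txt image lists."""
    names = ", ".join(f"'{name}'" for name in class_names)
    return (
        f"train: {train_list}\n"
        f"val: {val_list}\n"
        f"nc: {len(class_names)}\n"
        f"names: [{names}]\n"
    )


# Placeholder paths and class names (assumptions, adjust to your dataset)
yaml_text = render_data_yaml(
    "./data/train-list.txt",
    "./data/val-list.txt",
    ["reflective_clothes", "other_clothes"],
)
print(yaml_text)
```

Saving `yaml_text` as myself.yaml gives a file that the training script can be pointed at through its data option.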
The train-list.txt and val-list.txt files store the absolute path of each image, one per line (relative paths also work).
12.png
The following script collects the absolute paths of the images (it can also be used to collect relative paths):

# From Mr. Dinosaur

import os


def listdir(path, list_name):  # list_name: the list that collects the file paths
    """Recursively append the path of every file under `path` to `list_name`."""
    for file in os.listdir(path):
        file_path = os.path.join(path, file)
        if os.path.isdir(file_path):
            listdir(file_path, list_name)  # descend into sub-folders
        else:
            list_name.append(file_path)


list_name = []
path = 'D:/PythonProject/data/'  # folder that contains the images
listdir(path, list_name)
print(list_name)

with open('./list.txt', 'w') as f:  # txt file to write, one path per line
    for file_path in list_name:
        f.write(file_path + '\n')
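
The script above writes every file into a single list.txt, while the configuration expects separate train-list.txt and val-list.txt files. One possible way to produce them is sketched below. The 80/20 split, the fixed random seed, the image-extension filter and the helper names are my assumptions, and the example paths are stand-ins for the output of the script above.

```python
import random

IMAGE_EXTS = (".jpg", ".jpeg", ".png")  # assumed image extensions


def split_paths(paths, val_ratio=0.2, seed=0):
    """Keep only image files, shuffle them reproducibly, and split into
    (train_paths, val_paths)."""
    images = sorted(p for p in paths if p.lower().endswith(IMAGE_EXTS))
    random.Random(seed).shuffle(images)
    n_val = int(len(images) * val_ratio)
    return images[n_val:], images[:n_val]


def write_list(paths, txt_path):
    """Write one path per line."""
    with open(txt_path, "w", encoding="utf-8") as f:
        for p in paths:
            f.write(p + "\n")


# Stand-in paths; in practice pass the list collected by the script above
all_paths = [f"D:/PythonProject/data/img_{i}.jpg" for i in range(10)]
all_paths.append("D:/PythonProject/data/notes.txt")  # non-image, filtered out

train_paths, val_paths = split_paths(all_paths)
print(len(train_paths), len(val_paths))  # 8 2

# To create the files:
# write_list(train_paths, "train-list.txt")
# write_list(val_paths, "val-list.txt")
```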

6. Training process

Run train.py.

The training file is still the same as in YOLOv5. For convenience, I put the files I need in the root directory.
13.png

After the path is modified, right-click to run
14.png

Training takes a long time. YOLOv7 demands more hardware resources than YOLOv5 during training, especially video memory. In my test, training on an RTX 3080 Ti GPU took more than 40 hours. Be cautious with computers that have less than 8 GB of video memory: a low-end machine may blue-screen during training because the graphics card is overloaded. Prediction with the trained weights provided in this article has no special hardware requirements; even a CPU gives good prediction results, with no risk to the computer.
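
If video memory is tight, training can also be started from the command line with a smaller batch size instead of right-clicking in the IDE. The sketch below only builds the argument list and does not launch anything. The flags are those of the official YOLOv7 train.py and may differ in the modified project files provided with this article; the file locations `data/myself.yaml` and `cfg/training/yolov7.yaml` and the default values are assumptions.

```python
import sys


def build_train_command(data_yaml, batch_size=8, img_size=640, epochs=300, device="0"):
    """Build the argument list for YOLOv7's train.py (nothing is executed here)."""
    return [
        sys.executable, "train.py",
        "--data", data_yaml,
        "--cfg", "cfg/training/yolov7.yaml",  # assumed model config path
        "--weights", "",                      # empty string: train from scratch
        "--batch-size", str(batch_size),      # lower this first when memory runs out
        "--img-size", str(img_size), str(img_size),
        "--epochs", str(epochs),
        "--device", device,
    ]


cmd = build_train_command("data/myself.yaml", batch_size=4)
print(" ".join(cmd[1:]))

# To actually start training from the project root:
# import subprocess
# subprocess.run(cmd, check=True)
```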

The hardware configuration used for the experiments in this article is shown below:
16.jpg

7. Test verification

The comparison chart is shown below (YOLOv7 above, YOLOv5 below):
15.png

8. Project display

1.png

9. Complete source code & environment deployment video tutorial & data set & custom UI interface:

The source code can be downloaded by searching Baidu for Mianbaoduo (面包多) together with the title of this article.

10. Project background

The correct wearing of safety helmets and reflective clothing by construction workers is an important part of safe production and personal safety. Inspection of whether helmets and reflective clothing are worn still relies on traditional manual methods, which are time-consuming and labor-intensive. In view of this situation, the SSD (single shot multibox detector) deep learning algorithm is used as the base network framework to detect in real time whether the target people are wearing the required gear. Because the original SSD algorithm has low detection accuracy, it is improved in two ways. First, part of the ResNet50 network replaces the original VGG-16 as the feature extraction network. Second, a deformable convolution module is added to the high-level convolution layers of SSD, so that the network adapts better to targets of different sizes and detection accuracy improves. Experimental results show that the network performs well in both accuracy and speed when detecting safety helmets and reflective clothing.

11. References:


[1] YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors
[2] Improved multi-target detection of SSD algorithm [J]. Ma Yuandong, Luo Zijiang, Ni Zhaofeng, Xu Bin, Wu Fengjiao, Sun Shouyu, Yang Xiuzhang. Computer Engineering and Application. 2020(23)
[3] An Improved Algorithm for Pedestrian Detection Based on SSD [J]. Li Guojin, Wei Huiling, Ai Jiaoyan, Chen Yanming. Journal of Guangxi University (Natural Science Edition). 2021(05)
[4] Detection of Leaky Cable Fixtures in Railway Tunnel Based on Improved SSD Algorithm [J]. Zhang Yunzuo, Yang Panliang, Li Wenxuan. Progress in Laser and Optoelectronics. 2021(22)
[5] Research on Highway Asset Detection Algorithm Based on Improved SSD [J]. Wei Xinyu, Wang Chishe. Information and Computer (Theoretical Edition). 2022(04)
[6] Vehicle Detection Based on Improved SSD Algorithm [J]. Jie, Ai Jiaoyan. Computer Engineering. 2022(01)
[7] Improved SSD algorithm and its application in subway security inspection [J]. Zhang Zhen, Li Mengzhou, Li Haofang, Ma Junqiang. Computer Engineering. 2021(07)
[8] An Improved SSD Algorithm for Detection of Transmission Line Objects [J]. Huang Qinqin, Dong Jie, Chen Yue, Zhu Yuanyuan. Electrical Engineering Electric. 2021(06)
[9] Pedestrian Detection Method Based on Improved SSD Algorithm [J]. Yu Bo, Liu Chang. Electronic measurement technology. 2021(12)
[10] Individual recognition of cows based on improved SSD algorithm [J]. Xing Yongxin, Sun Youdong, Wang Tianyi. Computer Engineering and Application. 2022(02)

Origin blog.csdn.net/cheng2333333/article/details/126681684