Face mask detection system based on the YOLOv8 model (PyTorch + PySide6 + YOLOv8)

Abstract: The face mask detection system based on the YOLOv8 model detects and locates faces and face masks in everyday scenes. Using a deep learning algorithm, it performs target detection on images, videos, and live camera input, and it supports formatted visualization and export of the results for images and videos. The system trains the YOLOv8 detection algorithm on a face mask dataset and uses the PySide6 library to build the front-end interface. Its functions include: importing and initializing trained models; adjusting the detection confidence threshold and the post-processing IOU threshold; uploading, detecting, visualizing, and exporting results for images; uploading, detecting, visualizing, and exporting results for videos; detecting and visualizing camera input; showing the number of detected targets, a target list, and their locations; and reporting the forward inference time. This blog post provides the complete Python code together with installation and usage tutorials, making it a suitable reference for beginners. Important parts of the code are commented. For the complete code and resource files, see the download link at the end of the article.

Readers who need the source code can send the blogger a private message to get the download link.

Basic introduction

In recent years, machine learning and deep learning have advanced rapidly, and deep learning methods now outperform traditional methods in both detection accuracy and speed. YOLOv8 is the next-generation model developed by Ultralytics after YOLOv5. It currently supports image classification, object detection, and instance segmentation. YOLOv8 is a state-of-the-art (SOTA) model that builds on the success of earlier YOLO versions and introduces new features and improvements to further boost performance and flexibility. Its specific innovations include a new backbone network, a new anchor-free detection head, and a new loss function, and it can run on a wide range of hardware, from CPUs to GPUs. This blog post therefore uses the YOLOv8 detection algorithm to implement a face mask detection system and then uses the PySide6 library to build the graphical interface. The blogger has previously published models and interfaces based on YOLOv5; interested readers can find them in earlier posts. The blogger also plans to release versions based on YOLOv5, YOLOv6, YOLOv7, and YOLOv8 together, so stay tuned. Follows and bookmarks are welcome.

Environment setup

(1) Open the project directory, type cmd in the address bar of the folder window, and press Enter to open a terminal in that directory.
[Figure: opening a terminal in the project directory]

(2) Create a new virtual environment (conda create -n yolo8 python=3.8)
[Figure: creating the conda environment]

(3) Activate the environment (conda activate yolo8) and install the ultralytics library (the official YOLOv8 library): pip install ultralytics -i https://pypi.tuna.tsinghua.edu.cn/simple
[Figure: installing the ultralytics library]

(4) Note that this installation method may install only the CPU version of torch (this is what happens with the default PyPI wheels on Windows). If you need the GPU version, install torch before installing ultralytics: pip install torch==2.0.1+cu118 torchvision==0.15.2+cu118 -f https://download.pytorch.org/whl/torch_stable.html; then run pip install ultralytics -i https://pypi.tuna.tsinghua.edu.cn/simple
[Figure: installing the GPU version of torch]

(5) Install the graphical interface library PySide6: pip install pyside6 -i https://pypi.tuna.tsinghua.edu.cn/simple
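After steps (2) to (5), a short script can confirm which packages are importable and whether torch can see a GPU. This is a minimal sketch; the helper names env_report and cuda_available and the default package list are illustrative assumptions, not part of the project code.

```python
import importlib

def env_report(packages=("torch", "torchvision", "ultralytics", "PySide6")):
    """Return {package: version string or None} without failing on missing packages."""
    report = {}
    for name in packages:
        try:
            module = importlib.import_module(name)
        except (ImportError, OSError):
            report[name] = None  # not installed (or a broken native library)
            continue
        report[name] = getattr(module, "__version__", "unknown")
    return report

def cuda_available():
    """True only if torch is importable and reports a usable CUDA device."""
    try:
        import torch
    except (ImportError, OSError):
        return False
    return torch.cuda.is_available()

if __name__ == "__main__":
    for pkg, version in env_report().items():
        print(f"{pkg:12s} {version or 'NOT INSTALLED'}")
    print("CUDA available:", cuda_available())
```

If "CUDA available" prints False after installing the cu118 wheels, the CPU build of torch is most likely still the one being imported.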

Interface and function display

The software interface designed in this blog post is shown below. The overall interface is simple and clean. Its main functions include importing and initializing trained models; adjusting the confidence score and IOU thresholds; image upload, detection, result visualization, result export, and ending detection; video upload, detection, result visualization, result export, and ending detection; the list of detected targets and their locations; and the forward inference time. The initial interface is shown below:
[Figure: initial interface of the system]

Model selection and initialization

Users can click the model weight selection button to load trained model weights in formats such as .pt, .onnx, or .engine, and then click the model weight initialization button to initialize the selected model.
[Figure: selecting and initializing the model weights]
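The weight-selection step can be sketched as a format check followed by loading with the Ultralytics YOLO class, which accepts .pt files as well as exported formats such as .onnx and .engine. This is a hedged sketch: the SUPPORTED_WEIGHTS table and the helper names weight_format and init_model are hypothetical, and the project's own loading code may differ.

```python
from pathlib import Path

# Extensions the interface is described as accepting (hypothetical table).
SUPPORTED_WEIGHTS = {".pt": "PyTorch", ".onnx": "ONNX", ".engine": "TensorRT"}

def weight_format(path):
    """Return a readable format name for a weights file, or raise ValueError."""
    suffix = Path(path).suffix.lower()
    if suffix not in SUPPORTED_WEIGHTS:
        raise ValueError(f"unsupported weights file: {path!r}")
    return SUPPORTED_WEIGHTS[suffix]

def init_model(path):
    """Load the selected weights for detection (requires `pip install ultralytics`)."""
    weight_format(path)  # fail fast before importing the heavy library
    from ultralytics import YOLO
    # task="detect" tells YOLO what to expect from exported formats like .onnx/.engine
    return YOLO(path, task="detect")
```

Checking the extension before importing ultralytics keeps the button responsive when the user picks the wrong file.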

Changes in confidence score and IOU

Changing the value in the input box below Confidence or IOU moves the corresponding slider, and moving the slider updates the value in the input box. Any change to the Confidence or IOU value is synchronized to the model configuration, so the detection confidence threshold and the IOU threshold change accordingly.
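The two-way synchronization between a slider and an input box can be sketched with PySide6's signal and slot mechanism. This sketch assumes the slider is a QSlider with integer ticks 0 to 100 and the input box is a QDoubleSpinBox holding values 0.00 to 1.00; the constant SLIDER_MAX and the helper names are hypothetical. PySide6 is only touched inside wire_threshold, so the conversion helpers work without a GUI.

```python
SLIDER_MAX = 100  # hypothetical: slider ticks 0..100 map to thresholds 0.00..1.00

def slider_to_threshold(tick):
    """Convert an integer slider position into a threshold in [0, 1]."""
    tick = max(0, min(SLIDER_MAX, int(tick)))
    return tick / SLIDER_MAX

def threshold_to_slider(value):
    """Convert a threshold typed into the input box back to a slider position."""
    value = max(0.0, min(1.0, float(value)))
    return round(value * SLIDER_MAX)

def wire_threshold(slider, spinbox, on_change):
    """Keep a QSlider and a QDoubleSpinBox in sync via signals and slots.

    `on_change` receives the new threshold, e.g. to update the conf or iou
    value passed to the detector.
    """
    slider.setRange(0, SLIDER_MAX)
    spinbox.setRange(0.0, 1.0)
    spinbox.setSingleStep(1 / SLIDER_MAX)
    # setValue only emits valueChanged when the value actually changes,
    # so the two connections below do not loop forever.
    slider.valueChanged.connect(lambda t: spinbox.setValue(slider_to_threshold(t)))
    spinbox.valueChanged.connect(lambda v: slider.setValue(threshold_to_slider(v)))
    spinbox.valueChanged.connect(on_change)
```

The same wiring can be reused for both the Confidence and the IOU controls by calling wire_threshold twice with different callbacks.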

Image selection, detection and export

Users can click the Select Image button to upload a single image for detection and recognition. Once the upload succeeds, the interface displays the input image.
[Figure: input image displayed after upload]

Then click the Image Detection button to run target detection on the input image. The system shows the detection time in the time field and the number of detected targets in the target count field. Selecting a detected target in the drop-down box updates the position labels (xmin, ymin, xmax, and ymax) to that target's coordinates.
[Figure: image detection time, target count, and target position]
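The target count and drop-down entries can be built from the per-box outputs of the detector. This sketch assumes the outputs have already been converted to plain lists; with Ultralytics they would typically come from `r = model(image)[0]` as `r.boxes.xyxy.tolist()`, `r.boxes.cls.tolist()`, `r.boxes.conf.tolist()`, and `r.names`. The function name and the entry layout are hypothetical, and the class order {0: "face", 1: "mask"} is an assumption.

```python
def summarize_detections(boxes, classes, scores, names):
    """Build the target count and drop-down entries from per-box outputs.

    boxes:   list of (xmin, ymin, xmax, ymax) in pixels
    classes: list of integer class ids
    scores:  list of confidences
    names:   mapping from class id to label, e.g. {0: "face", 1: "mask"}
    """
    entries = []
    for i, (box, cls, score) in enumerate(zip(boxes, classes, scores)):
        xmin, ymin, xmax, ymax = (round(v) for v in box)  # whole-pixel labels
        entries.append({
            "label": f"{i + 1}: {names[int(cls)]} {score:.2f}",  # drop-down text
            "xmin": xmin, "ymin": ymin, "xmax": xmax, "ymax": ymax,
        })
    return len(entries), entries
```

When the user picks an entry in the drop-down box, the interface only needs to copy its four coordinates into the xmin, ymin, xmax, and ymax labels.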

Then click the Show Detection Results button to display the detection results for the input image in the lower-left corner of the interface, including each target's category, location, and confidence.
[Figure: image detection results with category, location, and confidence]

Click the Export Image Detection Results button to export the annotated image. Enter the file name and extension in the save field to save the result image.
[Figure: exporting the image detection result]

Click the End Image Detection button to refresh the interface and clear all output. You can then click Select Image or Select Video to upload an image or a video, or click Open Camera to turn on the camera.

Video selection, detection and export

The user clicks the Select Video button to upload a video for detection and recognition, and the system displays the first frame of the video in the interface.
[Figure: first frame of the uploaded video]

Then click the Video Detection button to run target detection on the input video. The system shows the detection time in the time field and the number of detected targets in the target count field. Selecting a detected target in the drop-down box updates the position labels (xmin, ymin, xmax, and ymax) to that target's coordinates.
[Figure: video detection results]
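Per-frame detection with a measured forward time can be sketched as a loop over frames that times each detector call. This is an illustrative sketch, not the project's code: detect_stream and read_frames are hypothetical names, and the detector is any callable (for example `lambda f: model(f)[0]` with Ultralytics, whose Results objects also carry their own `speed` timings). read_frames accepts a file path or a camera index, so the same loop also covers camera input.

```python
import time

def detect_stream(frames, detector):
    """Run `detector` on each frame and record per-frame inference time in ms.

    frames:   any iterable of images (e.g. frames read from cv2.VideoCapture)
    detector: callable taking a frame and returning its detections
    Yields (frame_index, detections, elapsed_ms).
    """
    for index, frame in enumerate(frames):
        start = time.perf_counter()
        detections = detector(frame)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        yield index, detections, elapsed_ms

def read_frames(source):
    """Yield frames from a video path or a camera index such as 0 (needs opencv-python)."""
    import cv2
    cap = cv2.VideoCapture(source)
    try:
        while True:
            ok, frame = cap.read()
            if not ok:  # end of file or camera read failure
                break
            yield frame
    finally:
        cap.release()
```

Usage might look like `for i, dets, ms in detect_stream(read_frames("test.mp4"), detector): ...`, where "test.mp4" is a placeholder path.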

Click the Pause Video Detection button to pause the input video; the button then changes to Continue Video Detection. The current frame and its detection results remain in the interface, and you can select a detected target in the drop-down box to view its coordinates. Click Continue Video Detection to resume detection on the input video.
Click the Export Video Detection Results button to export the processed video. Enter the file name and extension in the save field to save the result video.
[Figure: pausing and exporting video detection]
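The pause and continue behaviour can be modelled as a small state object that a frame timer consults before processing the next frame. This is a hypothetical sketch: the class name, its methods, and the button texts are illustrative, and in a Qt application the check would typically run inside a QTimer timeout handler.

```python
class VideoDetectionState:
    """Hypothetical controller for the pause/continue button in the video tab."""

    def __init__(self):
        self.running = False
        self.paused = False

    @property
    def button_text(self):
        return "Continue Video Detection" if self.paused else "Pause Video Detection"

    def start(self):
        self.running, self.paused = True, False

    def toggle_pause(self):
        """Flip between paused and running; returns the new button label."""
        if self.running:
            self.paused = not self.paused
        return self.button_text

    def stop(self):
        self.running, self.paused = False, False

    def should_process_next_frame(self):
        # While paused, the last frame and its results stay on screen.
        return self.running and not self.paused
```

Keeping this state separate from the widgets makes the pause logic easy to test without opening a window.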

Click the End Video Detection button to refresh the interface and clear all output. You can then click Select Image or Select Video to upload an image or a video, or click Open Camera to turn on the camera.

Camera opening, detection and termination

The user can click the Open Camera button to open the camera for detection and recognition, and the system displays the camera feed in the interface.
[Figure: camera feed displayed in the interface]

Then click the Camera Detection button to run target detection on the camera input. The system shows the detection time in the time field and the number of detected targets in the target count field. Selecting a detected target in the drop-down box updates the position labels (xmin, ymin, xmax, and ymax) to that target's coordinates.
[Figure: camera detection results]

Click the End Video Detection button to refresh the interface and clear all output. You can then click Select Image or Select Video to upload an image or a video, or click Open Camera to turn on the camera.

Introduction to algorithm principles

This system uses YOLOv8, a deep-learning-based single-stage object detection algorithm. Compared with earlier YOLO detectors, YOLOv8 offers the following advantages: (1) easier installation and use; (2) higher speed and accuracy; (3) a new backbone that replaces the C3 module of YOLOv5 with C2f; (4) an anchor-free detection head instead of YOLOv5's anchor-based one; (5) a new loss function. The overall structure of the YOLOv8 model is shown in the figure below; the original figure can be found in the official mmyolo repository.
[Figure: overall structure of the YOLOv8 model (source: mmyolo repository)]

The most obvious difference in the backbone between YOLOv8 and YOLOv5 is that the C2f module replaces the original C3 module. The structures of the two modules are shown in the figure below; the original figure can be found in the official mmyolo repository.
[Figure: C3 module versus C2f module (source: mmyolo repository)]
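The data flow of a C2f block can be sketched without any deep learning library by modelling a feature map as a list of channels. The names cv1, blocks, and cv2 follow the layer names used in Ultralytics' C2f, but this is an illustration of the structure, not the real PyTorch module: cv1 produces 2c channels, the result is split in half, each bottleneck consumes the previous output, and every intermediate output is concatenated before cv2 fuses the (2 + n)c channels. A C3 block, by contrast, only keeps the last bottleneck output plus one shortcut branch.

```python
def c2f_forward(x, cv1, blocks, cv2):
    """Data flow of a C2f block (sketch; feature maps modelled as channel lists).

    cv1:    maps the input to 2*c channels
    blocks: n bottleneck callables, each mapping c channels to c channels
    cv2:    fuses the concatenated (2 + n)*c channels into the output
    """
    y = cv1(x)
    c = len(y) // 2
    parts = [y[:c], y[c:]]  # split in two halves, like tensor.chunk(2, dim=1)
    for block in blocks:
        parts.append(block(parts[-1]))  # each bottleneck takes the previous output
    concat = [ch for part in parts for ch in part]  # keep every branch (dense links)
    return cv2(concat)
```

Keeping every intermediate output gives the fused layer richer gradient paths than C3, at the cost of a wider concatenation.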

In addition, the head changes the most: it moves from YOLOv5's coupled head to a decoupled head, and from anchor-based to anchor-free detection. The structural comparison is shown in the figure below.
[Figure: YOLOv5 coupled head versus YOLOv8 decoupled head]
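In the anchor-free head, each grid-cell centre predicts its distances to the four box edges. YOLOv8 represents each distance as a discrete distribution over bins (reg_max = 16 by default) and decodes it with the expected value, a Distribution Focal Loss (DFL) style decoding. The sketch below, in plain Python with hypothetical function names, shows that decoding for a single prediction; the real head does the same on tensors for every grid cell at strides 8, 16, and 32.

```python
import math

def dfl_expectation(logits):
    """DFL-style decoding: expected bin index under softmax(logits)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]  # numerically stable softmax
    total = sum(exps)
    return sum(i * e / total for i, e in enumerate(exps))

def decode_box(anchor_xy, side_logits, stride):
    """Anchor-free decoding of one prediction into (xmin, ymin, xmax, ymax) pixels.

    anchor_xy:   grid-cell centre in feature-map units, e.g. (i + 0.5, j + 0.5)
    side_logits: four lists of bin logits for the left, top, right, bottom distances
    stride:      downsampling factor of the feature map (8, 16 or 32)
    """
    left, top, right, bottom = (dfl_expectation(bins) for bins in side_logits)
    ax, ay = anchor_xy
    return ((ax - left) * stride, (ay - top) * stride,
            (ax + right) * stride, (ay + bottom) * stride)
```

Because no anchor box sizes are involved, there are no anchor hyperparameters to tune per dataset.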

Dataset introduction

The face mask dataset used by this system was manually labeled with two categories, face and mask, and contains 7,952 images in total. Both categories include many rotated instances and varied lighting conditions, which helps train a more robust detector. In this experiment, the dataset is split into 6,612 training images and 1,340 validation images. Some sample images from the dataset are shown in the figure below. Because YOLO detectors are trained on inputs of a fixed size, all images must be resized to the same size. To minimize distortion without hurting detection accuracy, we resize all images to 640x640 while preserving the original aspect ratio (padding the remaining area). In addition, to improve the model's generalization and robustness, we use data augmentation, including random rotation, scaling, cropping, and color transformation, to expand the dataset and reduce the risk of overfitting.
[Figure: sample images from the face mask dataset]
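Resizing to 640x640 while keeping the aspect ratio means scaling by the smaller of the two ratios and padding the rest, a "letterbox" resize; Ultralytics applies a similar transform internally. The sketch below, with hypothetical function names, computes only the geometry (scale, new size, padding) and maps a box into the padded image; pixel resizing itself would be done with a library such as OpenCV.

```python
def letterbox_params(width, height, target=640):
    """Scale and padding that fit a width x height image into target x target
    while keeping the aspect ratio (the remainder is filled with padding)."""
    scale = min(target / width, target / height)
    new_w, new_h = round(width * scale), round(height * scale)
    pad_x, pad_y = target - new_w, target - new_h
    left, top = pad_x // 2, pad_y // 2  # split padding evenly between both sides
    return scale, (new_w, new_h), (left, top, pad_x - left, pad_y - top)

def to_letterbox(box, scale, left, top):
    """Map an (xmin, ymin, xmax, ymax) box from the original image to the padded one."""
    xmin, ymin, xmax, ymax = box
    return (xmin * scale + left, ymin * scale + top,
            xmax * scale + left, ymax * scale + top)
```

For a 1280x720 image this gives a scale of 0.5, a 640x360 resized image, and 140 pixels of padding above and below.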

Key code analysis

In the training phase, we start from a pre-trained model and optimize the network parameters over many iterations to improve detection performance. During training, we use techniques such as learning rate decay and data augmentation to improve the model's generalization and robustness. A simple single-GPU training command is as follows.
[Figure: single-GPU training command]
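A comparable training run can be sketched with the Ultralytics Python API: a dataset YAML that lists the image folders and class names, then `YOLO(weights).train(...)`. The dataset layout (images/train, images/val), the root "datasets/mask", the class order, and the choice of yolov8n.pt as the pre-trained model are assumptions for illustration; the post does not state them.

```python
CLASS_NAMES = ["face", "mask"]  # assumed class order; must match the label files

def data_yaml_text(root, names=CLASS_NAMES):
    """Ultralytics-style dataset description (folder layout is hypothetical)."""
    lines = [f"path: {root}", "train: images/train", "val: images/val", "names:"]
    lines += [f"  {i}: {name}" for i, name in enumerate(names)]
    return "\n".join(lines) + "\n"

def train(data_yaml, weights="yolov8n.pt", epochs=100, imgsz=640, device=None):
    """Fine-tune pre-trained weights on the dataset (requires ultralytics)."""
    from ultralytics import YOLO
    model = YOLO(weights)  # start from pre-trained weights
    kwargs = {"data": data_yaml, "epochs": epochs, "imgsz": imgsz}
    if device is not None:
        kwargs["device"] = device  # e.g. 0 for the first GPU, "cpu" for CPU
    return model.train(**kwargs)
```

After saving the text to, say, mask.yaml, `train("mask.yaml", device=0)` roughly corresponds to the CLI call `yolo detect train data=mask.yaml model=yolov8n.pt epochs=100 imgsz=640 device=0`.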

Many more parameters can be specified during training. The most important ones are as follows:
[Figure: main training parameters]
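A few commonly adjusted Ultralytics training arguments, and the learning rate decay mentioned above, can be sketched as follows. The argument names are real Ultralytics train arguments, but the values are illustrative and not necessarily those used in this post. linear_lr mirrors the linear schedule Ultralytics uses when cos_lr is off (from lr0 down to lr0 * lrf); warmup is not modelled.

```python
# Commonly adjusted Ultralytics training arguments (values are illustrative).
TRAIN_ARGS = {
    "epochs": 100,        # passes over the training set
    "batch": 16,          # images per batch (-1 lets Ultralytics pick a size)
    "imgsz": 640,         # training image size
    "workers": 8,         # dataloader worker processes
    "optimizer": "auto",  # e.g. "SGD", "Adam", "AdamW" or "auto"
    "lr0": 0.01,          # initial learning rate
    "lrf": 0.01,          # final learning rate as a fraction of lr0
    "cos_lr": False,      # cosine instead of linear decay
    "patience": 50,       # stop early after this many epochs without improvement
    "mosaic": 1.0,        # probability of Mosaic augmentation
    "close_mosaic": 10,   # disable Mosaic for the last N epochs
}

def linear_lr(epoch, epochs=TRAIN_ARGS["epochs"], lr0=TRAIN_ARGS["lr0"],
              lrf=TRAIN_ARGS["lrf"]):
    """Learning rate at `epoch` when it decays linearly from lr0 to lr0 * lrf."""
    factor = max(1 - epoch / epochs, 0) * (1.0 - lrf) + lrf
    return lr0 * factor
```

The dictionary can be passed straight to training as `model.train(data="mask.yaml", **TRAIN_ARGS)`, where "mask.yaml" is a placeholder dataset file.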

In the testing phase, we use the trained model to detect objects in new images and videos. Detection boxes whose confidence falls below a set threshold are filtered out to produce the final results. The detection results can also be saved as images or videos for later analysis and use. This system is based on the YOLOv8 algorithm and implemented in PyTorch; the main libraries used include PyTorch, NumPy, OpenCV, and PySide6.
[Figure: detection code]
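The confidence and IOU thresholds act in post-processing: low-confidence boxes are dropped, then non-maximum suppression (NMS) removes boxes of the same class that overlap a higher-scoring box by more than the IOU threshold. The plain-Python sketch below illustrates that two-step filter; the function names are hypothetical, and Ultralytics runs an equivalent, vectorized version when `model.predict(source, conf=..., iou=...)` is called.

```python
def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def postprocess(detections, conf=0.25, iou_thres=0.45):
    """Drop low-confidence boxes, then apply class-wise greedy NMS.

    detections: list of (box, score, class_id)
    """
    candidates = sorted((d for d in detections if d[1] >= conf),
                        key=lambda d: d[1], reverse=True)
    kept = []
    for box, score, cls in candidates:
        # Keep a box unless a higher-scoring box of the same class overlaps it too much.
        if all(c != cls or iou(box, k) <= iou_thres for k, _, c in kept):
            kept.append((box, score, cls))
    return kept
```

Raising conf removes more uncertain boxes, while lowering iou_thres suppresses overlapping boxes more aggressively.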

PySide6 interface design

PySide is the official Python binding for the Qt GUI framework, which is written in C++, and its usage is largely the same as in C++ Qt. Compared with other Python GUI libraries, PySide offers fast development, comprehensive functionality, and good documentation. In this blog post, we use PySide6 to build a simple, easy-to-use graphical interface that lets users select images and videos for target detection.
We design the interface with Qt Designer and then convert the resulting .ui file into Python code with PySide6's pyside6-uic tool. The interface contains multiple UI controls, such as labels, buttons, text boxes, and selection boxes. PySide6's signal and slot mechanism connects these controls to the program logic.

Experimental results and analysis

In this section, we evaluate the model with metrics such as precision and recall and analyze the training process through the loss curves and the PR curve. We trained YOLOv8 on the dataset described above for 100 epochs and used TensorBoard to record the loss curves on the training and validation sets. As the figure below shows, both the training loss and the validation loss decrease steadily as training progresses, indicating that the model keeps learning more accurate features. After training, we evaluated the model on the validation set and obtained the following results.
[Figure: training and validation loss curves and evaluation results]

The figure below shows the PR curve of the trained YOLOv8 model on the validation set. The model achieves high recall and precision, and its overall performance is good.
[Figure: PR curve on the validation set]
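A PR curve is built by sorting detections by confidence and accumulating true and false positives: precision = TP / (TP + FP) and recall = TP / (number of ground-truth boxes). Average precision (AP) is the area under the curve after making precision monotonically non-increasing. The sketch below uses all-point interpolation and a hypothetical function name; Ultralytics reports mAP with a 101-point interpolated variant, so its numbers can differ slightly.

```python
def precision_recall_ap(is_tp, num_gt):
    """Precision/recall curve and all-point-interpolated AP for one class.

    is_tp:  True/False per detection, sorted by descending confidence
            (True = matched a ground-truth box at the chosen IoU threshold)
    num_gt: number of ground-truth boxes of this class
    """
    tp = fp = 0
    precisions, recalls = [], []
    for hit in is_tp:
        if hit:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_gt)
    # Precision envelope: best precision achievable at this recall or higher.
    envelope = precisions[:]
    for i in range(len(envelope) - 2, -1, -1):
        envelope[i] = max(envelope[i], envelope[i + 1])
    # Sum rectangle areas wherever recall increases.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(envelope, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return precisions, recalls, ap
```

Averaging AP over the face and mask classes gives mAP at that IoU threshold; mAP50-95 further averages over IoU thresholds from 0.50 to 0.95.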

The figure below shows Mosaic-augmented training images produced when training YOLOv8 on this dataset.
[Figure: Mosaic data augmentation samples]
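Mosaic augmentation stitches four training images around a centre point on a 2s x 2s canvas (s = 640 here), so their boxes must be shifted into canvas coordinates. The geometry-only sketch below uses hypothetical names and a caller-supplied centre; Ultralytics picks the centre randomly, crops images that overflow the canvas, and then applies further transforms, none of which is modelled here.

```python
def mosaic_offsets(sizes, center=(640, 640)):
    """Where each of four images lands in a Mosaic canvas.

    sizes:  four (width, height) pairs placed top-left, top-right,
            bottom-left, bottom-right around `center`
    Returns, per image, the (dx, dy) shift to add to its box coordinates.
    """
    xc, yc = center
    (w0, h0), (w1, h1), (w2, h2), (w3, h3) = sizes
    return [
        (xc - w0, yc - h0),  # top-left: its bottom-right corner touches the centre
        (xc, yc - h1),       # top-right: its bottom-left corner touches the centre
        (xc - w2, yc),       # bottom-left: its top-right corner touches the centre
        (xc, yc),            # bottom-right: its top-left corner touches the centre
    ]

def shift_and_clip(box, dx, dy, s=640):
    """Shift a box into canvas coordinates and clip it to the 2s x 2s canvas."""
    limit = 2 * s
    xmin, ymin, xmax, ymax = box
    return (min(max(xmin + dx, 0), limit), min(max(ymin + dy, 0), limit),
            min(max(xmax + dx, 0), limit), min(max(ymax + dy, 0), limit))
```

Because each Mosaic sample mixes four scenes at different scales, the model sees many more small faces and masks per batch.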

In summary, the YOLOv8 model trained in this blog post performs well on the dataset, with high detection accuracy and robustness, and can be applied in real-world scenarios. The blogger also tested the entire system in detail and developed a smooth, high-accuracy detection interface, shown in the demonstration section of this post. The complete UI, test images and videos, and code files have all been packaged and uploaded; interested readers can follow the blogger and send a private message to obtain them. For a PDF of this post and more target detection and recognition systems, follow the author's WeChat public account BestSongC (formerly Nuist Computer Vision and Pattern Recognition).

The blogger has also built other deep-learning detection systems for tomatoes, cats and dogs, goats, wild targets, cigarette butts, QR codes, helmets, traffic police, wild animals, outdoor smoke, human fall recognition, infrared pedestrians, poultry and pigs, apples, bulldozers, bees, phone calls, pigeons, footballs, cows, face masks, safety vests, smoke detection, and more. Readers who need them can follow the blogger and get the download links from the blogger's other videos.

The complete project directory is as follows:
[Figure: complete project directory]


Source: blog.csdn.net/sc1434404661/article/details/132528173