Intelligent driving target detection system based on the Kitti dataset (PyTorch + PySide6 + YOLOv5)

Abstract: This intelligent driving target detection system, built on the Kitti dataset, detects and locates pedestrians (Pedestrian), vans (Van), sitting people (Person Sitting), cars (Car), trucks (Truck), cyclists (Cyclist), trams (Tram), and miscellaneous targets (Misc) in everyday driving scenes. Using deep learning, it performs detection on images, videos, and camera streams, and supports result visualization as well as export of image or video detection results. The system trains a YOLOv5 detection model on the dataset, builds the display pages with the PySide6 library, and supports ONNX, PT, and other formats for the output weight model. Supported functions include: import and initialization of the trained model; adjustment of the confidence score and IOU thresholds; image upload, detection, visualized result display, result export, and ending detection; video upload, detection, visualized result display, result export, and ending detection; camera input, detection, visualized result display, and ending detection; a list of detected targets with their location information; and forward inference time. In addition, the system can display the original image alongside the detection result image, and the original video alongside the detection result video. This blog post provides complete Python code and a usage tutorial suitable for beginners; for the complete code resource files, please see the download link at the end of the post.
[Figure: system interface]

Basic introduction

In recent years, machine learning and deep learning have made great progress, and deep learning methods now outperform traditional methods in both detection accuracy and speed. YOLOv5 is the fifth generation of the single-stage YOLO detection algorithm and, according to its experiments, improves significantly on both speed and accuracy; the open-source code is available at https://github.com/ultralytics/yolov5. This blog post therefore uses the YOLOv5 detection algorithm to implement an intelligent driving target detection system based on the Kitti dataset, and builds the interface with the PySide6 library to complete the detection and recognition pages. Newer members of the YOLO series, such as YOLOv6, YOLOv7, and YOLOv8, have since appeared; code that swaps this system's detection algorithm for those newer ones will also be released later, so feel free to follow and bookmark.

Environment setup

(1) Download the complete project files to your computer, then open the project directory with cmd.
(2) Create the environment with Conda (Anaconda): conda create -n yolo5 python=3.8. Then install torch and torchvision: pip install torch==1.10.0+cu113 torchvision==0.11.0+cu113 -f https://download.pytorch.org/whl/torch_stable.html -i https://pypi.tuna.tsinghua.edu.cn/simple, where -i https://pypi.tuna.tsinghua.edu.cn/simple selects the Tsinghua mirror. This command requires the CUDA version shown by nvidia-smi to be >= 11.3. Finally, install the remaining dependencies with: pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple (a quick sanity check of the install is sketched after these steps).
[Screenshots: installing torch/torchvision and the remaining dependencies]

(3) Install the PySide6 library: pip install pyside6==6.3 -i https://pypi.tuna.tsinghua.edu.cn/simple
[Screenshot: installing PySide6]

(4) On Windows, install the pycocotools library: pip install pycocotools-windows -i https://pypi.tuna.tsinghua.edu.cn/simple
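
After these steps, a quick sanity check (a minimal sketch; the exact version string depends on the wheels you installed) confirms that the CUDA build of PyTorch is active and PySide6 imports cleanly:

```python
import torch
from PySide6 import QtWidgets  # verifies the PySide6 install as well

print(torch.__version__)          # e.g. 1.10.0+cu113
print(torch.cuda.is_available())  # should be True with a matching CUDA driver
```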

Interface and function display

The software interface designed in this blog post is shown below. The overall interface is simple and elegant. Its main functions include: trained model import and initialization; confidence score and IOU threshold adjustment; image upload, detection, visualized result display, result export, and ending detection; video upload, detection, visualized result display, result export, and ending detection; a detected target list with location information; and forward inference time. Hope you like it; the initial interface is as follows:
[Figure: initial system interface]

Model selection and initialization

Users can click the model weight selection button to upload trained model weights in formats such as .pt, .onnx, and .engine, and then click the model initialization button to apply the settings of the selected model.
[Figures: model weight selection and initialization]
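
The packaged system ships its own loader supporting several weight formats, but the basic idea can be sketched with yolov5's torch.hub entry point ('best.pt' is a placeholder for the weights you select):

```python
import torch

# Load custom-trained weights through the ultralytics/yolov5 hub interface.
model = torch.hub.load('ultralytics/yolov5', 'custom', path='best.pt')
model.eval()  # switch to inference mode
```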

Confidence score and IOU adjustment

Changing the value in the input box below Confidence or IOU synchronously moves the corresponding slider, and moving a slider likewise updates its input box. Any change to the Confidence or IOU value is synchronized to the model configuration, changing the detection confidence threshold and the IOU threshold used during detection.
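
With a model loaded through torch.hub as above, the two thresholds map onto attributes of the yolov5 wrapper; a minimal sketch with illustrative values:

```python
# The sliders in the interface adjust exactly these two quantities.
model.conf = 0.25  # confidence threshold: boxes scored below this are dropped
model.iou = 0.45   # IoU threshold used by non-maximum suppression
```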

Image selection, detection and export

Users can click the Select Image button to upload a single image for detection and recognition.
[Figure: selecting an image]

Then click the image detection button to run target detection on the input image. The system then outputs the detection time in the time-spent column and the number of detected targets in the target-quantity column. You can select a detected target in the drop-down box, and the position labels (i.e., xmin, ymin, xmax, and ymax) update accordingly.
[Figure: image detection results and target information]

Then click the detection result display button to show the detection result of the input image at the bottom left of the system; the system displays the category, location, and confidence of each recognized target in the picture.
[Figure: detection result display]

Click the image detection result export button to export the detected image; enter the file name and suffix in the save dialog to save the detection result image.
[Figure: exporting the image detection result]
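
Under the same assumption of a hub-loaded model, the image workflow above (detect, read back box coordinates, export) can be sketched as follows; "test.jpg" is a placeholder file name:

```python
# Run detection on a single image (placeholder name "test.jpg").
results = model('test.jpg')

# Boxes come back as a pandas DataFrame with columns
# xmin, ymin, xmax, ymax, confidence, class, name.
for _, det in results.pandas().xyxy[0].iterrows():
    print(det['name'], round(det['confidence'], 2),
          [det['xmin'], det['ymin'], det['xmax'], det['ymax']])

# Export the annotated image to disk.
results.save(save_dir='runs/export')
```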

Click the End Image Detection button to refresh the system interface and clear all output information; then click the Select Image or Select Video button to upload a new image or video.

Video selection, detection and export

Users can click the Select Video button to upload a video for detection and recognition; the system then shows the first frame of the video at the upper left of the interface.
[Figure: selecting a video]

Then click the video detection button to run target detection on the input video. The system outputs the detection time in the time-spent column and the number of detected targets in the target-quantity column. You can select a detected target in the drop-down box, and the position labels (i.e., xmin, ymin, xmax, and ymax) update accordingly.
[Figure: video detection]

Click the Pause Video Detection button to pause the input video; the button then changes to Continue Video Detection, and the current video frame and its detection results remain on the interface. You can use the target drop-down box to inspect the coordinates of any detected target, and then click the Continue Video Detection button to resume detection on the input video.
Click the video detection result export button to export the detected video; enter the file name and suffix in the save dialog to save the detection result video.
[Figure: exporting the video detection result]
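
A frame-by-frame detection loop is the core of this workflow; a minimal sketch follows ("input.mp4" and "result.mp4" are placeholder names, and the real interface drives pausing/resuming through button events instead of a plain loop):

```python
import cv2

cap = cv2.VideoCapture('input.mp4')   # pass 0 instead of a path for a webcam
fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
writer = cv2.VideoWriter('result.mp4',
                         cv2.VideoWriter_fourcc(*'mp4v'), fps, (w, h))

while True:
    ok, frame = cap.read()
    if not ok:
        break                          # end of stream
    results = model(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    annotated = cv2.cvtColor(results.render()[0], cv2.COLOR_RGB2BGR)
    writer.write(annotated)            # save the annotated frame

cap.release()
writer.release()
```

Passing 0 to cv2.VideoCapture turns the same loop into the camera workflow described in the next section.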

Click the End Video Detection button to refresh the system interface and clear all output information; then click the Select Image or Select Video button to upload new images or videos.

Camera opening, detection and termination

The user can click the Open Camera button to open a camera device for detection and recognition; the system then shows the camera feed at the upper left of the interface.
[Figure: camera feed display]

Then click the camera detection button to run target detection on the camera feed. The system outputs the detection time in the time-spent column and the number of detected targets in the target-quantity column. You can select a detected target in the drop-down box, and the position labels (i.e., xmin, ymin, xmax, and ymax) update accordingly.
[Figure: camera detection]

Click the End Video Detection button to refresh the system interface and clear all output information; then click the Select Image, Select Video, or Open Camera button to upload images or videos or open the camera again.

Introduction to Algorithm Principles

This system adopts YOLOv5, a deep-learning-based single-stage target detection algorithm that improves considerably on YOLOv3 and YOLOv4 in both detection accuracy and speed. The core idea of the YOLO family is to cast target detection as a regression problem, predicting bounding-box coordinates and class probabilities directly from the image in a single forward pass. In addition, YOLOv5 uses SPP (Spatial Pyramid Pooling) for feature extraction, which effectively extracts multi-scale features and improves detection performance without adding much computation. The overall structure of the YOLOv5s model is shown in the figure below.

[Figure: overall structure of the YOLOv5s model]

The YOLOv5 network consists of the Input, Backbone, Neck, and Prediction stages. At the input end, the Mosaic data augmentation method randomly crops input images and stitches them together. The Backbone is the feature-extraction part of YOLOv5, and its capability directly affects the performance of the whole network. For feature extraction, YOLOv5 uses the CSPNet (Cross Stage Partial Network) structure, which splits the input feature map into two parts: one part passes through a series of convolutional layers, the other is forwarded directly, and the two parts are finally fused. This design gives the network stronger nonlinear expressive power and helps it handle complex backgrounds and diverse objects in detection tasks. In the Neck stage, feature maps are fused with successive convolutions and C3 structural blocks. In the Prediction stage, the model uses the resulting feature maps to predict the center coordinates and size of each target. In the blogger's view, YOLOv5 is a high-performance detection solution that classifies and locates targets with high accuracy. Of course, algorithms such as YOLOv6, YOLOv7, and YOLOv8 keep being proposed and improved; the blogger will also integrate them into this system in follow-ups, so stay tuned.
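To make the split-and-fuse idea concrete, here is a much-simplified CSP-style block in PyTorch; it is only a sketch of the concept, not yolov5's actual C3 module (whose channel counts, bottlenecks, and activations differ):

```python
import torch
import torch.nn as nn

class TinyCSPBlock(nn.Module):
    """Simplified CSP block: split channels, convolve one branch, re-fuse."""
    def __init__(self, channels):
        super().__init__()
        half = channels // 2
        self.branch = nn.Sequential(          # the "processed" half
            nn.Conv2d(half, half, 3, padding=1, bias=False),
            nn.BatchNorm2d(half),
            nn.SiLU(),
        )
        self.fuse = nn.Conv2d(channels, channels, 1)  # 1x1 fusion after concat

    def forward(self, x):
        a, b = x.chunk(2, dim=1)              # cross-stage partial split
        return self.fuse(torch.cat((self.branch(a), b), dim=1))

x = torch.randn(1, 64, 80, 80)
print(TinyCSPBlock(64)(x).shape)  # torch.Size([1, 64, 80, 80])
```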

Dataset introduction

The Kitti dataset used in this system labels eight categories: pedestrians (Pedestrian), vans (Van), sitting people (Person Sitting), cars (Car), trucks (Truck), cyclists (Cyclist), trams (Tram), and other targets (Misc), with 7481 images in total. The categories appear under many rotations and varied lighting conditions, which helps train a more robust detection model. The Kitti detection dataset in this experiment contains 6000 training images and 1481 validation images; some sample data are shown in the figure below. Since the YOLOv5 algorithm constrains the input image size, all images must be adjusted to the same dimensions. To minimize distortion without hurting detection accuracy, we resize all images to 640x640 while keeping the original aspect ratio. In addition, to enhance the generalization ability and robustness of the model, we apply data augmentation techniques, including random rotation, scaling, cropping, and color transformations, to expand the dataset and reduce the risk of overfitting.
[Figure: sample images from the Kitti dataset]
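
The aspect-ratio-preserving resize described above is usually implemented as "letterboxing": scale the long side to 640 and pad the rest. A minimal sketch follows (yolov5's own utility additionally aligns padding to the network stride):

```python
import cv2
import numpy as np

def letterbox(img, size=640, color=(114, 114, 114)):
    """Resize to size x size while keeping aspect ratio, padding the rest."""
    h, w = img.shape[:2]
    scale = size / max(h, w)
    nh, nw = int(round(h * scale)), int(round(w * scale))
    resized = cv2.resize(img, (nw, nh))
    canvas = np.full((size, size, 3), color, dtype=np.uint8)
    top, left = (size - nh) // 2, (size - nw) // 2
    canvas[top:top + nh, left:left + nw] = resized
    return canvas
```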

Key code analysis

The deep learning model in this system is implemented with PyTorch, and detection is based on the YOLOv5 algorithm. In the training phase, we start from a pre-trained model and optimize the network parameters over many iterations to achieve better detection performance. During training, we employ techniques such as learning-rate decay and data augmentation to enhance the generalization ability and robustness of the model.
During the testing phase, we use the trained model to detect new images and videos. Detection boxes whose confidence falls below the set threshold are filtered out to produce the final results, which can also be saved in image or video format for later analysis and application. This system is based on the YOLOv5 algorithm and implemented with PyTorch; the main libraries used in the code include PyTorch, NumPy, OpenCV, PySide6, etc.
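
As an illustration of the learning-rate decay mentioned above, one common scheduler setup is sketched below, reusing the hub-loaded model from earlier; yolov5's train.py uses its own cosine/linear schedule and hyperparameters, so this is only a generic example:

```python
import torch

# Generic cosine learning-rate decay over 300 epochs (illustrative values).
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.937)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=300)

for epoch in range(300):
    # ... one pass over the training set, with optimizer.step() per batch ...
    scheduler.step()  # decay the learning rate once per epoch
```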

PySide6 interface design

PySide6 is one of the GUI programming solutions for the Python language, and it makes it quick to create GUI applications for Python programs. In this blog post, we use the PySide6 library to create a graphical interface that gives users an easy-to-use interactive interface for selecting pictures and videos for target detection.
We use Qt Designer to design the graphical interface and then use PySide6 to convert the designed UI file into Python code. The interface contains multiple UI controls, such as labels, buttons, text boxes, and combo boxes. Through PySide6's signal/slot mechanism, UI controls are connected to the program's logic code, as sketched below.
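
A minimal, self-contained example of that signal/slot wiring (the button caption and handler name here are stand-ins, not the system's actual code):

```python
from PySide6.QtWidgets import QApplication, QFileDialog, QPushButton, QWidget

app = QApplication([])
window = QWidget()
button = QPushButton("Select Image", window)

def on_select_image():
    # Slot: open a file dialog, as the real "Select Image" button would.
    path, _ = QFileDialog.getOpenFileName(window, "Choose an image",
                                          "", "Images (*.jpg *.png)")
    print("selected:", path)

button.clicked.connect(on_select_image)  # signal -> slot connection
window.show()
app.exec()
```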

Experimental results and analysis

In the experimental results and analysis section, we evaluate the model with metrics such as precision and recall, and analyze the training process through the loss and PR curves. In the training phase, we trained on the Kitti dataset introduced earlier with the YOLOv5 algorithm for a total of 300 epochs. During training, we used TensorBoard to record the model's loss curves on the training and validation sets. As the figure below shows, both the training loss and the validation loss decrease steadily as training progresses, indicating that the model keeps learning more accurate features. After training, we evaluated the model on the validation set and obtained the following results.
[Figure: training and validation loss curves]

The figure below shows the PR curve of our trained YOLOv5 model on the validation set. The model achieves high recall and precision, and overall performance is good.
[Figure: PR curve on the validation set]

The figure below shows Mosaic-augmented training images produced while training the YOLOv5 model on the Kitti dataset.
[Figures: Mosaic data augmentation examples]

In summary, the YOLOv5 model trained in this blog post performs well on the dataset, with high detection accuracy and robustness, and can be applied in real scenarios. The blogger also tested the entire system in detail and finally produced a smooth, high-precision detection interface, which is what this post demonstrates. The complete UI, test images and videos, code files, etc. have been packaged and uploaded; interested friends can follow me and send a private message to get them.

Friends who need other deep-learning-based detection systems, such as those for tomatoes, cats and dogs, goats, wild targets, cigarette butts, QR codes, helmets, traffic police, wild animals, wild smoke, human fall recognition, infrared pedestrians, poultry and pigs, apples, bulldozers, bees, phone calls, pigeons, footballs, cows, face masks, safety vests, smoke detection, etc., can follow me and get download links from the blogger's other posts.

The complete project directory is as follows:

[Figure: complete project directory]

Origin: blog.csdn.net/sc1434404661/article/details/131860854