This article develops and builds a CCTSDB2021 traffic sign detection and recognition system for road traffic scenarios based on the full YOLOv8 series [n/s/m/l/x].

Traffic sign detection is a core task in traffic sign recognition systems. Compared with traffic signs in other countries, China's traffic signs have their own distinctive characteristics. Convolutional neural networks (CNNs) have made breakthrough progress in computer vision tasks and achieved great success in traffic sign classification. The CCTSDB dataset was produced by scholars and teams from Changsha University of Science and Technology. It contains nearly 20,000 traffic sign sample images with a total of nearly 40,000 traffic signs, although only about 10,000 of the images have been released so far. It covers the three common categories of traffic signs: mandatory signs, prohibitory signs, and warning signs. Over time, several versions of the dataset have been published. The main purpose of this article is to develop and build a target detection and recognition system on the CCTSDB2021 dataset based on YOLOv8. First, let's look at an example of the detection results:

In previous articles, we carried out a series of development practices on this dataset with earlier YOLO versions; if you are interested, you can read them here:

" Development and construction of CCTSDB2021 traffic sign detection and recognition system in road traffic scenarios based on YOLOv3"

"Development and Construction of CCTSDB2021 Traffic Sign Detection and Recognition System in Road Traffic Scenarios Based on YOLOv4"

"Development and construction of CCTSDB2021 traffic sign detection and recognition system in road traffic scenarios based on YOLOv5 full series parameter model [n/s/m/l/x]"

"Development and Construction of CCTSDB2021 Traffic Sign Detection and Recognition System in Road Traffic Scenarios Based on YOLOv6"

"Development and construction of CCTSDB2021 traffic sign detection and recognition system in road traffic scenarios based on YOLOv7"

The CCTSDB2021 dataset contains 17,856 images across the training set and the positive sample test set. The traffic signs in the images are divided into mandatory, prohibitory, and warning classes according to their meaning. The training set has 16,356 images, numbered 00000-18991; the positive sample test set has 1,500 images, numbered 18992-20491. The archives are organized as follows:
- "XML": XML-format annotation files for the training set and the positive sample test set.
- "train_img": training set images.
- "train_labels": TXT-format annotation files for the training set.
- "test_img": positive sample test set images.
- "Weather and Environment Based Classification": XML-format annotation files of the positive sample test set, classified by weather and lighting conditions.
- "Classification based on traffic sign size": XML-format annotation files of the positive sample test set, classified by the size of the traffic signs in the images.
- "Negative Samples": 500 negative sample images.
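If you need to regenerate the TXT labels from the XML annotations yourself (for example for the test set, which only ships XML), a minimal conversion sketch might look like the following. The class name strings and file paths are assumptions; check them against the actual annotation files.

from xml.etree import ElementTree as ET

# Assumed class order; it must match the names section of the dataset YAML
CLASSES = ["mandatory", "prohibitory", "warning"]

def xml_to_yolo_txt(xml_path, txt_path):
    """Convert one Pascal-VOC style XML annotation file to a YOLO TXT label file."""
    root = ET.parse(xml_path).getroot()
    img_w = float(root.find("size/width").text)
    img_h = float(root.find("size/height").text)
    lines = []
    for obj in root.iter("object"):
        name = obj.find("name").text
        if name not in CLASSES:
            continue  # skip classes we do not train on
        box = obj.find("bndbox")
        xmin, ymin = float(box.find("xmin").text), float(box.find("ymin").text)
        xmax, ymax = float(box.find("xmax").text), float(box.find("ymax").text)
        # YOLO format: class_id x_center y_center width height, normalized to [0, 1]
        cx = (xmin + xmax) / 2.0 / img_w
        cy = (ymin + ymax) / 2.0 / img_h
        bw = (xmax - xmin) / img_w
        bh = (ymax - ymin) / img_h
        lines.append(f"{CLASSES.index(name)} {cx:.6f} {cy:.6f} {bw:.6f} {bh:.6f}")
    with open(txt_path, "w") as f:
        f.write("\n".join(lines))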

Next, let's take a look at the dataset:

If you have any questions about YOLOv8 development or about building your own target detection project, you can read the following article:

"Super detailed tutorial on developing and building a target detection model based on YOLOv8 [taking the weld quality inspection data scenario as an example]"

It is a very detailed, hands-on development tutorial, so the details will not be repeated here. Note that starting from YOLOv8 the project is distributed as an installable package (pip install ultralytics), and its overall usage differs considerably from v5 and v7.

The core features and changes of YOLOv8 are as follows:
1. Provides new SOTA (state-of-the-art) models, including P5 640 and P6 1280 resolution target detection networks as well as a YOLACT-based instance segmentation model. As with YOLOv5, models at the N/S/M/L/X scales are provided via scaling factors to meet the needs of different scenarios.
2. The backbone and Neck likely draw on the ELAN design idea of YOLOv7: the C3 structure of YOLOv5 is replaced by the C2f structure with richer gradient flow, and the channel numbers are tuned separately for each model scale. This careful per-scale adjustment of the model structure, instead of applying one set of parameters to all models, greatly improves performance.
3. The Head part has changed significantly compared to YOLOv5: it adopts the now-mainstream decoupled head structure, separating the classification and regression heads, and switches from Anchor-Based to Anchor-Free.
4. Loss calculation adopts the TaskAlignedAssigner positive sample assignment strategy and introduces Distribution Focal Loss.
5. For training-time data augmentation, it borrows from YOLOX the practice of turning off Mosaic augmentation for the last 10 epochs, which effectively improves accuracy (see the sketch after this list).
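Point 5 is exposed in the ultralytics trainer through the close_mosaic argument; a minimal sketch, where the dataset config path is a placeholder:

from ultralytics import YOLO

model = YOLO("yolov8n.pt")
# close_mosaic=10 turns off Mosaic augmentation for the final 10 epochs
model.train(data="data/self.yaml", epochs=100, imgsz=640, close_mosaic=10)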

The official project address is https://github.com/ultralytics/ultralytics, as shown below:

At present, the project has earned more than 17k stars. The officially provided pre-trained models (evaluated on COCO) are as follows:

| Model | size (pixels) | mAPval 50-95 | Speed CPU ONNX (ms) | Speed A100 TensorRT (ms) | params (M) | FLOPs (B) |
| --- | --- | --- | --- | --- | --- | --- |
| YOLOv8n | 640 | 37.3 | 80.4 | 0.99 | 3.2 | 8.7 |
| YOLOv8s | 640 | 44.9 | 128.4 | 1.20 | 11.2 | 28.6 |
| YOLOv8m | 640 | 50.2 | 234.7 | 1.83 | 25.9 | 78.9 |
| YOLOv8l | 640 | 52.9 | 375.2 | 2.39 | 43.7 | 165.2 |
| YOLOv8x | 640 | 53.9 | 479.1 | 3.53 | 68.2 | 257.8 |

A second set of official pre-trained models is as follows:

| Model | size (pixels) | mAPval 50-95 | Speed CPU ONNX (ms) | Speed A100 TensorRT (ms) | params (M) | FLOPs (B) |
| --- | --- | --- | --- | --- | --- | --- |
| YOLOv8n | 640 | 18.4 | 142.4 | 1.21 | 3.5 | 10.5 |
| YOLOv8s | 640 | 27.7 | 183.1 | 1.40 | 11.4 | 29.7 |
| YOLOv8m | 640 | 33.6 | 408.5 | 2.26 | 26.2 | 80.6 |
| YOLOv8l | 640 | 34.9 | 596.9 | 2.43 | 44.1 | 167.4 |
| YOLOv8x | 640 | 36.3 | 860.6 | 3.56 | 68.7 | 260.6 |

These models are trained on the Open Images V7 dataset and can be selected according to your own needs.

YOLOv8 is positioned not merely as a target detector but as a powerful, comprehensive tool library. It therefore supports multiple task types: pose estimation, detection, classification, segmentation, and tracking, which you can choose according to your own needs; a brief sketch follows. I won't elaborate further here.
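As a brief illustration, the different task heads are selected simply by loading the corresponding official weights (the weight names below follow the official naming convention):

from ultralytics import YOLO

det = YOLO("yolov8n.pt")        # object detection
seg = YOLO("yolov8n-seg.pt")    # instance segmentation
cls = YOLO("yolov8n-cls.pt")    # image classification
pose = YOLO("yolov8n-pose.pt")  # pose estimation
# tracking is invoked on a detection model, e.g. det.track(source="video.mp4")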

A simple training example across the five model scales is as follows:

from ultralytics import YOLO
 
# yolov8n
model = YOLO('yolov8n.yaml').load('yolov8n.pt')  # build from YAML and transfer weights
model.train(data='data/self.yaml', epochs=100, imgsz=640)
 
 
# yolov8s
model = YOLO('yolov8s.yaml').load('yolov8s.pt')  # build from YAML and transfer weights
model.train(data='data/self.yaml', epochs=100, imgsz=640)
 
 
# yolov8m
model = YOLO('yolov8m.yaml').load('yolov8m.pt')  # build from YAML and transfer weights
model.train(data='data/self.yaml', epochs=100, imgsz=640)
 
 
# yolov8l
model = YOLO('yolov8l.yaml').load('yolov8l.pt')  # build from YAML and transfer weights
model.train(data='data/self.yaml', epochs=100, imgsz=640)
 
 
# yolov8x
model = YOLO('yolov8x.yaml').load('yolov8x.pt')  # build from YAML and transfer weights
model.train(data='data/self.yaml', epochs=100, imgsz=640)

Here we select all five models, n, s, m, l, and x, with different parameter magnitudes for development.
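All of the training calls above point to data/self.yaml. For reference, a minimal sketch of such a dataset config for the three CCTSDB2021 classes could look like this (all paths are placeholders for your local layout):

# data/self.yaml -- all paths below are placeholders
path: ./CCTSDB2021   # dataset root
train: images/train  # training images, relative to path
val: images/val      # validation images, relative to path

# three classes, in the same order as the label files
names:
  0: mandatory
  1: prohibitory
  2: warning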

The YOLOv8 model configuration file used here is as follows (nc is set to 3 for the three traffic sign classes):

# Parameters
nc: 3  # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n.yaml' will call yolov8.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.33, 0.25, 1024]  # YOLOv8n summary: 225 layers,  3157200 parameters,  3157184 gradients,   8.9 GFLOPs
  s: [0.33, 0.50, 1024]  # YOLOv8s summary: 225 layers, 11166560 parameters, 11166544 gradients,  28.8 GFLOPs
  m: [0.67, 0.75, 768]   # YOLOv8m summary: 295 layers, 25902640 parameters, 25902624 gradients,  79.3 GFLOPs
  l: [1.00, 1.00, 512]   # YOLOv8l summary: 365 layers, 43691520 parameters, 43691504 gradients, 165.7 GFLOPs
  x: [1.00, 1.25, 512]   # YOLOv8x summary: 365 layers, 68229648 parameters, 68229632 gradients, 258.5 GFLOPs
 
# YOLOv8.0n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]]  # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]]  # 1-P2/4
  - [-1, 3, C2f, [128, True]]
  - [-1, 1, Conv, [256, 3, 2]]  # 3-P3/8
  - [-1, 6, C2f, [256, True]]
  - [-1, 1, Conv, [512, 3, 2]]  # 5-P4/16
  - [-1, 6, C2f, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]]  # 7-P5/32
  - [-1, 3, C2f, [1024, True]]
  - [-1, 1, SPPF, [1024, 5]]  # 9
 
# YOLOv8.0n head
head:
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [[-1, 6], 1, Concat, [1]]  # cat backbone P4
  - [-1, 3, C2f, [512]]  # 12
 
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [[-1, 4], 1, Concat, [1]]  # cat backbone P3
  - [-1, 3, C2f, [256]]  # 15 (P3/8-small)
 
  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 12], 1, Concat, [1]]  # cat head P4
  - [-1, 3, C2f, [512]]  # 18 (P4/16-medium)
 
  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 9], 1, Concat, [1]]  # cat head P5
  - [-1, 3, C2f, [1024]]  # 21 (P5/32-large)
 
  - [[15, 18, 21], 1, Detect, [nc]]  # Detect(P3, P4, P5)

It covers five models of different parameter magnitudes. We keep the same training parameter settings for every model; after training completes, we perform a horizontal comparison and visualization for overall analysis, as sketched below.
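For the horizontal comparison, each trained model can be evaluated on the same validation split via model.val(); a minimal sketch, where the weight paths are assumptions based on ultralytics' default runs directory:

from ultralytics import YOLO

# Placeholder weight paths; adjust to your actual training output directories
weights = {
    "n": "runs/detect/yolov8n/weights/best.pt",
    "s": "runs/detect/yolov8s/weights/best.pt",
    "m": "runs/detect/yolov8m/weights/best.pt",
    "l": "runs/detect/yolov8l/weights/best.pt",
    "x": "runs/detect/yolov8x/weights/best.pt",
}

for scale, path in weights.items():
    metrics = YOLO(path).val(data="data/self.yaml", imgsz=640)
    # metrics.box.map is mAP50-95, metrics.box.map50 is mAP50
    print(f"yolov8{scale}: mAP50-95={metrics.box.map:.3f} mAP50={metrics.box.map50:.3f}")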

[Precision Curve]
The precision curve is a visualization tool used to evaluate the precision performance of a binary classification model under different thresholds. By plotting the relationship between precision and recall at different thresholds, it helps us understand how the model behaves as the threshold changes.
Precision is the ratio of samples correctly predicted as positive to all samples predicted as positive. Recall is the ratio of samples correctly predicted as positive to all samples that are actually positive.
The steps for plotting a precision curve are as follows:
1. Convert the predicted probabilities into binary class labels using different thresholds. Usually, when the predicted probability exceeds the threshold, the sample is classified as positive; otherwise it is classified as negative.
2. For each threshold, calculate the corresponding precision and recall.
3. Plot the precision and recall at each threshold on the same graph to form the precision curve.
4. Based on the shape and trend of the precision curve, select an appropriate threshold to meet the performance requirements.
By observing the precision curve, we can choose the threshold that best balances precision and recall for our needs: higher precision means fewer false positives, while higher recall means fewer false negatives. Depending on the specific business needs and cost trade-offs, an appropriate operating point can be selected on the curve.
Precision curves are often used together with recall curves to provide a more comprehensive analysis of classifier performance and to help evaluate and compare different models, as the sketch below illustrates.
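As a hedged illustration of the computation (a generic binary-classification example with toy data, not tied to the YOLOv8 internals), scikit-learn derives the whole curve directly from predicted scores:

import numpy as np
from sklearn.metrics import precision_recall_curve

# Toy ground-truth labels and predicted scores, purely for illustration
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0, 1, 1])
y_score = np.array([0.10, 0.35, 0.40, 0.80, 0.65, 0.20, 0.90, 0.50, 0.70, 0.30])

precision, recall, thresholds = precision_recall_curve(y_true, y_score)
# precision/recall have one more entry than thresholds (the final (1, 0) point)
for t, p, r in zip(thresholds, precision[:-1], recall[:-1]):
    print(f"threshold={t:.2f}  precision={p:.2f}  recall={r:.2f}")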

[Recall Curve]
The recall curve is a visualization tool used to evaluate the recall performance of a binary classification model under different thresholds. By plotting the relationship between recall and the corresponding precision at different thresholds, it helps us understand how the model behaves as the threshold changes.
Recall is the ratio of samples correctly predicted as positive to all samples that are actually positive; it is also called sensitivity or the true positive rate (TPR).
The steps for plotting a recall curve are as follows:
1. Convert the predicted probabilities into binary class labels using different thresholds. Usually, when the predicted probability exceeds the threshold, the sample is classified as positive; otherwise it is classified as negative.
2. For each threshold, calculate the corresponding recall and precision.
3. Plot the recall and precision at each threshold on the same graph to form the recall curve.
4. Based on the shape and trend of the recall curve, select an appropriate threshold to meet the performance requirements.
By observing the recall curve, we can choose the threshold that best balances recall and precision for our needs: higher recall means fewer false negatives, while higher precision means fewer false positives. Depending on the specific business needs and cost trade-offs, an appropriate operating point can be selected on the curve.
Recall curves are often used together with precision curves to provide a more comprehensive analysis of classifier performance; the counting behind these quantities is illustrated below.
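At a single threshold, these quantities reduce to simple counts over the confusion matrix; a tiny sketch with the same toy data:

import numpy as np

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0, 1, 1])
y_score = np.array([0.10, 0.35, 0.40, 0.80, 0.65, 0.20, 0.90, 0.50, 0.70, 0.30])

threshold = 0.5
y_pred = (y_score >= threshold).astype(int)

tp = int(np.sum((y_pred == 1) & (y_true == 1)))  # true positives
fn = int(np.sum((y_pred == 0) & (y_true == 1)))  # false negatives
fp = int(np.sum((y_pred == 1) & (y_true == 0)))  # false positives

recall = tp / (tp + fn)     # also called sensitivity / true positive rate
precision = tp / (tp + fp)
print(f"recall={recall:.2f}  precision={precision:.2f}")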

[F1 value curve]
The F1 value curve is a visualization tool used to evaluate the performance of a binary classification model under different thresholds. By plotting the relationship between precision, recall, and the F1 score at different thresholds, it helps us understand the overall performance of the model.
The F1 score is the harmonic mean of precision and recall, taking both indicators into account at once. The F1 value curve helps us find a balance point between precision and recall and thus choose the best threshold.
The steps for plotting an F1 value curve are as follows:
1. Convert the predicted probabilities into binary class labels using different thresholds. Usually, when the predicted probability exceeds the threshold, the sample is classified as positive; otherwise it is classified as negative.
2. For each threshold, calculate the corresponding precision, recall, and F1 score.
3. Plot the precision, recall, and F1 score at each threshold on the same graph to form the F1 value curve.
4. Based on the shape and trend of the F1 value curve, select an appropriate threshold to meet the performance requirements.
F1 value curves are often used together with receiver operating characteristic (ROC) curves to help evaluate and compare different models. They provide a more comprehensive analysis of classifier performance, allowing appropriate models and thresholds to be selected for specific application scenarios; a computation sketch follows.
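Since the F1 score is the harmonic mean F1 = 2 * P * R / (P + R), the curve and its best operating point follow directly from the precision/recall arrays; a sketch continuing the toy example:

import numpy as np
from sklearn.metrics import precision_recall_curve

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0, 1, 1])
y_score = np.array([0.10, 0.35, 0.40, 0.80, 0.65, 0.20, 0.90, 0.50, 0.70, 0.30])

precision, recall, thresholds = precision_recall_curve(y_true, y_score)
# Harmonic mean of precision and recall; the epsilon guards against 0/0
f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)

best = int(np.argmax(f1[:-1]))  # the last curve point has no threshold attached
print(f"best threshold={thresholds[best]:.2f}  F1={f1[best]:.2f}")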

【loss】

From a comprehensive comparison: the n-scale model performs worst, the s-scale model comes next, the m-scale model sits in the middle, and the l and x scale models perform comparably. Weighing parameter magnitude against inference speed, we finally selected the l-scale model as the online inference model; an export sketch follows.
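For serving, the chosen l-scale weights can be exported through the ultralytics export API (the weight path is a placeholder; ONNX is just one of the supported formats):

from ultralytics import YOLO

# Placeholder path to the trained l-scale weights
model = YOLO("runs/detect/yolov8l/weights/best.pt")
model.export(format="onnx")  # writes an .onnx file next to the weights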

Next, let's take a detailed look at the results of the l-scale model:

【Batch example】

【Training Visualization】

【PR Curve】

Development is not easy. Anyone familiar with the CCTSDB2021 dataset will know how much computing power it takes to complete development and training for every parameter scale. If you are interested, you can practice it yourself, starting with the most lightweight n-scale model.

