Using the Cityscapes dataset to build a YOLOv7 test set

Building a good test set is essential for evaluating the quality of a model. This article uses the Cityscapes dataset, converts its segmentation labels into object detection labels, and builds a YOLOv7 test set for testing the previously trained model.

The label-format conversion mainly follows this reference: https://blog.csdn.net/Shenpibaipao/article/details/111240711

The download, format conversion, and YOLOv7 training process for the bdd100k dataset mentioned in this article are described here: https://blog.csdn.net/qq_37214693/article/details/126708738?spm=1001.2014.3001.5501

After conversion using the program linked above, the resulting classes are as follows:

image-20220907104810398

There are many categories in `classes` that we do not need, as well as types such as "cargroup". The official documentation explains these group annotations:

> Single-instance annotations are available. However, if the boundary between such instances cannot be clearly seen, the whole crowd/group is labeled together and annotated as group, e.g. "car group".

Since this test set is built for a previously trained model that does not need to handle this case, the group categories are discarded here.
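A minimal sketch of how the group categories can be dropped during conversion, assuming `objects` is the `"objects"` list inside a gtFine `*_polygons.json` file (the function names here are illustrative, not from the original conversion script):

```python
import json

def keep_object(obj_label: str) -> bool:
    """Drop ambiguous group annotations such as 'cargroup' or 'persongroup'."""
    return not obj_label.endswith("group")

def load_objects(polygon_json_path: str):
    """Return only the single-instance objects from a gtFine polygon JSON file."""
    with open(polygon_json_path, encoding="utf-8") as f:
        data = json.load(f)
    return [obj for obj in data["objects"] if keep_object(obj["label"])]
```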

The filtered categories are shown in the figure below.

image-20220907135713104

Modify the code: comment out the lines below, define `label_map` explicitly, and re-run the program to obtain only the categories I need, together with the corresponding YOLO-format label file for each picture.

```python
# if obj_label not in label_map.keys():             # record each object type as an int
#     label_map[obj_label] = len(label_map.keys())  # labels start from 0
label_map = {
    "person": 0, "rider": 1, "car": 2, "bus": 3, "truck": 4,
    "bicycle": 5, "motorcycle": 6,
    "traffic light": 10, "traffic sign": 11, "train": 12,
}
```
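The core of the conversion is turning each segmentation polygon into a normalized YOLO bounding box. A hedged sketch of that step, assuming the `label_map` above and polygons given as `[x, y]` point lists (the function name is illustrative):

```python
label_map = {
    "person": 0, "rider": 1, "car": 2, "bus": 3, "truck": 4,
    "bicycle": 5, "motorcycle": 6,
    "traffic light": 10, "traffic sign": 11, "train": 12,
}

def polygon_to_yolo(label, polygon, img_w, img_h):
    """Return a 'cls cx cy w h' YOLO line (normalized), or None for unmapped classes."""
    if label not in label_map:
        return None
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    # Bounding box of the polygon, clipped to the image.
    x_min, x_max = max(min(xs), 0), min(max(xs), img_w)
    y_min, y_max = max(min(ys), 0), min(max(ys), img_h)
    cx = (x_min + x_max) / 2 / img_w
    cy = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{label_map[label]} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"
```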

The converted file structure is as follows:

```
gtFine
	test
	train
	val
labels
	test
	train
	val
leftImg8bit
	test
	train
	val
```

Within each of the test, train, and val folders, the data is further divided into subfolders by the city where it was collected. For our purposes, put all the images into one folder and all the labels into another, place the two folders side by side, and then adjust the relevant YOLO parameters to train or test.
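Flattening the per-city subfolders can be done with a small helper like the one below. The paths in the usage comment are illustrative assumptions; adjust them to your own layout.

```python
import shutil
from pathlib import Path

def flatten(src_root: str, dst_dir: str, pattern: str) -> int:
    """Copy every file matching `pattern` from the city subfolders of
    `src_root` into the flat folder `dst_dir`; return the file count."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    count = 0
    for f in Path(src_root).rglob(pattern):
        shutil.copy2(f, dst / f.name)
        count += 1
    return count

# Example usage (paths are assumptions):
# flatten("leftImg8bit/val", "data/cityscapes/images/val", "*.png")
# flatten("labels/val", "data/cityscapes/labels/val", "*.txt")
```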

image-20220907144635332

The final structure of the yolov7/data/cityscapes folder is shown in the figure. All images are under images, and all labels are under labels.

image-20220907151231212

Modify data/coco.yaml to obtain cityscapes.yaml. Since the dataset is only used for testing here, only val needs to be defined.
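A hypothetical cityscapes.yaml, following the data/coco.yaml layout. The class count and name order must match whatever was used when training on bdd100k, so treat `nc` and `names` below as placeholders:

```yaml
# data/cityscapes.yaml (illustrative; class order is an assumption)
val: ./data/cityscapes/images/val

nc: 13
names: ['person', 'rider', 'car', 'bus', 'truck', 'bicycle', 'motorcycle',
        'tl_red', 'tl_yellow', 'tl_green', 'tl_none', 'traffic sign', 'train']
```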

image-20220907152744086

Run the following command to test:

```shell
python test.py --data data/cityscapes.yaml --img 640 --batch 4 --conf 0.001 --iou 0.65 --device 0 --weights runs/train/bdd100k6/weights/best.pt --name bdd100k6_val_20220907

python test.py --data data/cityscapes.yaml --img 640 --batch 32 --conf 0.001 --iou 0.65 --device 0 --weights yolov7_bdd.pt --name bdd100k6_val_202209071515
```

The test results are shown in the figure. Since the model was trained for relatively few epochs, the mAP is low. In addition, the traffic lights in the training set are divided into four categories: red, yellow, green, and none, while the test set has no color attribute, so the mAP of tl_none in the test results is low; further processing is needed to merge all traffic lights into a single class. The mAP of train is clearly outside the normal range. The guess is that the amount of data is too small: bdd100k contains only 179 train labels. This class could be removed, and since trains are rarely relevant in an autonomous-driving environment, dropping it has little effect on the usefulness of the model.
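One way to merge the traffic-light classes is to remap their class ids inside the YOLO-format label files. This is only a sketch: the ids 7-10 for tl_red/tl_yellow/tl_green/tl_none and the merged id 10 are assumptions inferred from the `label_map` above, not confirmed values.

```python
from pathlib import Path

TL_IDS = {7, 8, 9, 10}   # assumed ids for tl_red, tl_yellow, tl_green, tl_none
MERGED_ID = 10           # keep everything as the colour-less class

def merge_traffic_lights(line: str) -> str:
    """Rewrite one YOLO label line, collapsing all traffic-light ids into one."""
    cls, rest = line.split(maxsplit=1)
    if int(cls) in TL_IDS:
        cls = str(MERGED_ID)
    return f"{cls} {rest}"

def rewrite_label_file(path: str) -> None:
    """Apply the merge to every line of a YOLO label file, in place."""
    p = Path(path)
    lines = [merge_traffic_lights(l) for l in p.read_text().splitlines() if l.strip()]
    p.write_text("\n".join(lines) + "\n")
```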

img

Although the accuracy of the model is still low, the test results are worth analyzing. The confusion matrix, PR curve, and test examples are given below.

Analysis of the confusion matrix shows that training is clearly effective for categories such as car, person, rider, and bus, but because training is incomplete there are a large number of missed detections. Since car has at least an order of magnitude more labels than the other classes, it is reasonable that its accuracy reaches 0.72; this shows the model's predictions are effective, but more training iterations are still needed. The traffic-light results confirm the speculation above: their low mAP is caused by the test set not distinguishing traffic-light colors. All trains are missed, showing that training was essentially ineffective for the train class, again due to the small number of samples.

confusion_matrix

It can also be seen from the PR curve that the training effect for car is significantly better than for the other categories, and the model as a whole still needs further training.

PR_curve

Three test batches are given below. The overall prediction quality is decent: most labeled objects are detected, but there are still some missed detections and many false detections. One reason is insufficient model accuracy; on the other hand, some labels are simply wrong. This is partly because Cityscapes labels are segmentation labels, and converting them to detection labels introduces some error. In addition, different datasets use different annotation standards: for example, in test batch 1, on the left side of the second image, two traffic signs are detected but the label contains only the front one.

test_batch0_labels

test_batch0_pred

test_batch1_labels

test_batch1_pred

test_batch2_labels

test_batch2_pred

Origin blog.csdn.net/qq_37214693/article/details/126751732