YOLOv5 small target detection (image cutting method, with source code)

Update (6.30): added the label data processing for the cut small pictures

Foreword

Everyone is familiar with YOLOv5. It is very versatile, but its results on small targets are very poor.
During YOLOv5 training the default image size (img-size) is 640x640 pixels. To detect small targets you might simply raise img-size to 4000x4000, but then the required memory becomes so large that training is almost impossible.
Below are the results of training small-target detection directly on 6k x 4k pictures. In a word: bad.
[Image: detection results when training directly on the full-size images]
Dataset (road signs):
[Image: sample from the road sign dataset]

The image cutting approach

The easiest way is to cut the large picture into small pictures; see the open-source framework SAHI [1].
A few issues to handle:

1. With a simple cut, every small picture must end up the same size;
2. Cutting will inevitably split some targets, so a "fusion" (overlap) region has to be set;
3. The cut dataset consists of small pictures, so detection also runs only on small pictures; the detection results on the small pictures then have to be merged back together afterwards. (troublesome)

1. Image cutting

General structure diagram:
The blue and green cells are the 4*4 = 16 sub-images after cutting; the red and blue frames mark the fusion images, with an overlap ratio of 0.2.
[Image: cutting layout, 4x4 sub-images with fusion regions]
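The cut count and overlap ratio live in the configuration file (config.py in the file list at the end). Its exact contents are not shown in the post; a minimal sketch, where every name except mix_percent (used by the code below) is an assumption:

cut_num = 4          # cut the large image into cut_num x cut_num sub-images (assumed name)
mix_percent = 0.2    # overlap ratio of the fusion regions (used in img_mix below)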

This part is simple. Refer to the blog on cutting pictures with Python and use OpenCV to slice the image; note that the fusion (overlap) parts have to be cut out at the same time.
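The plain 4x4 cut itself could look roughly like this (a minimal sketch, not the author's cut_image.py; only cv2 slicing, and the file-naming scheme is an assumption):

import cv2

def cut_image(img_path, save_path, file, cut_num=4):
    # slice the large image into cut_num x cut_num equally sized sub-images
    img = cv2.imread(img_path)
    height, width = img.shape[:2]
    row_height, col_width = height // cut_num, width // cut_num
    for i in range(cut_num):
        for j in range(cut_num):
            sub = img[i * row_height:(i + 1) * row_height,
                      j * col_width:(j + 1) * col_width]
            cv2.imwrite(save_path + '/' + file + '_' + str(i * cut_num + j) + '.jpg', sub)
    return row_height, col_width  # img_mix below also needs these two values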

# Cut out the fusion (overlap) images
import cv2  # mix_percent (the overlap ratio) is read from config.py

def img_mix(img, row_height, col_width, save_path, file):
    mix_num = 3
    # row_height / col_width: the height of each row and the width of each column of sub-images

    # A 4x4 split produces
    # 4*3 row-fusion regions and
    # 3*4 column-fusion regions
    # fusion regions along the rows
    row = 0
    for i in range(mix_num + 1):
        mix_height_start = i * row_height
        mix_height_end = (i + 1) * row_height
        for j in range(mix_num):
            mix_row_path = save_path + '/' + file + '_mix_row_' + str(row) + '.jpg'
            mix_row_start = int(j * col_width + col_width * (1 - mix_percent))
            mix_row_end = int(mix_row_start + col_width * mix_percent * 2)
            # print(mix_height_start, mix_height_end, mix_row_start, mix_row_end)
            mix_row_img = img[mix_height_start:mix_height_end, mix_row_start:mix_row_end]
            cv2.imwrite(mix_row_path, mix_row_img)
            row += 1

    col = 0
    # fusion regions along the columns
    for i in range(mix_num):
        mix_col_start = int(i * row_height + row_height * (1 - mix_percent))
        mix_col_end = int(mix_col_start + row_height * mix_percent * 2)
        for j in range(mix_num + 1):
            mix_col_path = save_path + '/' + file + '_mix_col_' + str(col) + '.jpg'
            mix_width_start = j * col_width
            mix_width_end = (j + 1) * col_width
            # print(mix_col_start, mix_col_end, mix_width_start, mix_width_end)
            mix_col_img = img[mix_col_start:mix_col_end, mix_width_start:mix_width_end]
            cv2.imwrite(mix_col_path, mix_col_img)
            col += 1

Label processing after cutting into small pictures

The target data is read directly from the XML annotation files; see get_xml_data.py. After a successful read, the data is saved in txt format. The stored fields are:

Image type (0: sub-image, 1: row-fusion image, 2: column-fusion image)
Sub-image position (0-15)
Sub-image file name
Large-image width
Large-image height
Target class
x minimum
x maximum
y minimum
y maximum

The results obtained after reading are as follows
[Image: parsed label data stored in txt format]
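get_xml_data.py itself is not printed in the post; a minimal sketch of the reading step, assuming standard VOC-style XML annotations (tag names follow the usual VOC layout):

import xml.etree.ElementTree as ET

def read_voc_xml(xml_path):
    # return the big-image size and a list of (class, xmin, xmax, ymin, ymax) targets
    root = ET.parse(xml_path).getroot()
    width = int(root.find('size/width').text)
    height = int(root.find('size/height').text)
    targets = []
    for obj in root.findall('object'):
        box = obj.find('bndbox')
        targets.append((obj.find('name').text,
                        int(float(box.find('xmin').text)), int(float(box.find('xmax').text)),
                        int(float(box.find('ymin').text)), int(float(box.find('ymax').text))))
    return width, height, targets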
Next, the data needs to be processed further; see txt_to_yolo.py.
We now know the position of each sub-image, the width and height of each sub-image, and the width and height of the large image, so every target can be located on its sub-image.
For example, suppose the picture below has width and height 100, the small box sits at the center of its upper-right quarter, and the box's width and height are 10. Then the box's position in the large image is
xmin=70, xmax=80
ymin=20, ymax=30
On sub-image No. 1 (the sub-images are numbered 0-3),
[Image: diagram of the example box on the cut image]
i.e. the upper-right corner, the position of the small box becomes
xmin=20, xmax=30
ymin=20, ymax=30
Following this idea, the rest of the data can be handled nicely.
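A minimal sketch of that conversion (not the author's txt_to_yolo.py; the grid size and clipping behavior are assumptions):

def box_to_subimg_yolo(xmin, xmax, ymin, ymax, big_w, big_h, row, col, grid=4):
    # convert one big-image box to a YOLO label on the sub-image at grid position (row, col)
    sub_w, sub_h = big_w / grid, big_h / grid
    # shift into the sub-image's local coordinates and clip to its borders
    lx1 = min(max(xmin - col * sub_w, 0), sub_w)
    lx2 = min(max(xmax - col * sub_w, 0), sub_w)
    ly1 = min(max(ymin - row * sub_h, 0), sub_h)
    ly2 = min(max(ymax - row * sub_h, 0), sub_h)
    if lx2 - lx1 <= 0 or ly2 - ly1 <= 0:
        return None  # the box does not fall on this sub-image
    # YOLO format: normalized center x, center y, width, height
    return ((lx1 + lx2) / 2 / sub_w, (ly1 + ly2) / 2 / sub_h,
            (lx2 - lx1) / sub_w, (ly2 - ly1) / sub_h)

# the example above: 100x100 image cut 2x2, box (70, 80, 20, 30) lands on sub-image No. 1
print(box_to_subimg_yolo(70, 80, 20, 30, 100, 100, row=0, col=1, grid=2))
# prints (0.5, 0.5, 0.2, 0.2)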

2. Target detection

There is not much to say about this step. Once the pictures are cut, just point YOLOv5's training and detection image paths at the cut pictures.
Note that the fusion images are used during training but not during detection (I did not run detection on the fusion images; their detections easily overlap with those of the sub-images, which complicates merging the machine's results).
Changing the path: simply edit the paths in def run() of detect.py, for example:
[Image: changed paths in def run() of detect.py]
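Rather than editing the defaults in place, you can also call YOLOv5's run() directly. A sketch under assumptions: a recent YOLOv5 release, run from the repo root, and placeholder paths:

from detect import run  # YOLOv5's detect.py exposes run()

run(
    weights='runs/train/exp/weights/best.pt',   # weights trained on the cut sub-images
    source='../datasets/roadsign_cut/images',   # directory of cut sub-images (placeholder)
    imgsz=(640, 640),
    save_txt=True,   # write one txt per image so the results can be merged later
)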
Test results:
[Images: detection results on the cut sub-images]

3. Fusion

This step is hard to describe well; the main thing is to keep the idea clear:

1. Locate the position of each sub-image (for example, with a 4*4 cut there are 16 positions in total);
2. Based on that position, map the content of each sub-image's detection result (txt file) back to the corresponding place in the large image. For example, if the position is the upper-right corner (0, 3), then add 3 * (large-image width / 4) to the x values of the detected boxes and convert back to the YOLOv5 label format; see the sketch after this list.
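A minimal sketch of that mapping for one detection line (not the author's joint_image.py; it assumes plain sub-images on a 4x4 grid and YOLO-normalized txt lines):

def subimg_to_big(line, row, col, grid=4):
    # map one 'cls xc yc w h' line (normalized to the sub-image) back to the big image
    cls, xc, yc, w, h = line.split()
    xc, yc, w, h = map(float, (xc, yc, w, h))
    # each sub-image covers 1/grid of the big image in both directions
    xc_big = (col + xc) / grid
    yc_big = (row + yc) / grid
    return f"{cls} {xc_big:.6f} {yc_big:.6f} {w / grid:.6f} {h / grid:.6f}"

# example: a box centered in sub-image (row 0, col 3) ends up in the big image's top-right cell
print(subimg_to_big("0 0.5 0.5 0.2 0.2", row=0, col=3))
# prints: 0 0.875000 0.125000 0.050000 0.050000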

That's about it.
Fusion result:
[Image: detection boxes merged back onto the large image]

4. Result observation

As mentioned earlier, training and detection here run on the small pictures, so it is not convenient to inspect the results directly (the boxes are drawn on the small pictures). Instead, you can take the fused txt file and use ImageDraw to draw the boxes onto the original image.
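A minimal sketch of that drawing step (not the author's draw_box.py; paths and colors are placeholders):

from PIL import Image, ImageDraw

def draw_boxes(img_path, txt_path, out_path):
    # draw YOLO-format boxes from the fused txt file onto the original large image
    img = Image.open(img_path)
    draw = ImageDraw.Draw(img)
    W, H = img.size
    with open(txt_path) as f:
        for line in f:
            cls, xc, yc, w, h = line.split()
            xc, yc, w, h = float(xc) * W, float(yc) * H, float(w) * W, float(h) * H
            # convert center/size back to corner coordinates
            draw.rectangle([xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2],
                           outline='red', width=3)
    img.save(out_path)

# draw_boxes('road.jpg', 'road_fused.txt', 'road_boxed.jpg')  # placeholder file names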
Not bad:
[Image: merged boxes drawn on the original image]

Training results

Let's look at the training results:
[Image: training results]

Other

You can also refer to some similar projects:
yolov5-tph: https://github.com/Gumpest/YOLOv5-Multibackbone-Compression
YOLO-Z
You can also add an extra detection layer for small targets (it does not feel generally useful; in my experience, apart from increasing the training time, the effect was only so-so).

Related documents:

Configuration file: config.py
Image cutting: cut_image.py
Reading XML data: get_xml_data.py
Cutting the label data: txt_to_yolo.py
Fusing the pictures: joint_image.py
Drawing boxes on the original picture: draw_box.py
Main function: main.py
Download address ①
Download address ②

Origin: blog.csdn.net/qq_43622870/article/details/124984295