Fixing YOLOv5's failure to detect targets with large aspect ratios (slender targets)

In deep-learning engineering there are always strange problems that leave you dizzy.

Last Friday, Franpper ran into such a strange problem while training with yolov5-5.0: the label count stayed at 0 throughout training, meaning no labels were being read. Training would still run, but it was useless training, because the network never learns where the targets are and can only guess freely.

At first, Franpper searched the Internet for an answer, to no avail (how come no one had run into this problem before?). He then asked several bloggers and discussed it with friends; everyone found it baffling and suspected the dataset was most likely at fault. But Franpper checked again and again and really found nothing wrong: the folder layout was fine, and the annotation .txt files were fine too. With no other lead, he could only relabel the data, even reinstalling labelimg, labeling everything again, and retraining, but it was no use (sigh).
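The kind of dataset check described above can be partly automated. Below is a minimal sketch (my own, not Franpper's script; the directory layout is an assumption) that scans YOLO-format label files, where each line should be "class x_center y_center width height" with coordinates normalized to (0, 1]:

```python
# Hypothetical sanity check for YOLO-format label .txt files.
# Each line: "class x_center y_center width height", coordinates in (0, 1].
from pathlib import Path

def check_yolo_labels(label_dir):
    """Return a list of (file_name, line_no, reason) for malformed label lines."""
    problems = []
    for txt in sorted(Path(label_dir).glob("*.txt")):
        for i, line in enumerate(txt.read_text().splitlines(), 1):
            parts = line.split()
            if len(parts) != 5:
                problems.append((txt.name, i, "expected 5 fields"))
                continue
            try:
                cls = int(parts[0])
                coords = [float(p) for p in parts[1:]]
            except ValueError:
                problems.append((txt.name, i, "non-numeric field"))
                continue
            if cls < 0 or not all(0 < c <= 1 for c in coords):
                problems.append((txt.name, i, "value out of range"))
    return problems
```

Note that a check like this would have passed Franpper's dataset: the label files themselves were valid, and the real problem was elsewhere.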

During the relabeling, however, Franpper noticed an interesting phenomenon. The targets in this project are slender information bars running along the bottom of the image, from the far left to the far right. When the whole target was boxed (the blue label box in the figure), the labels failed to load during training and training could not proceed normally. But when only half of the target was boxed (the red label box in the figure), training worked. Franpper therefore suspected that YOLOv5 filters label boxes by aspect ratio, and went looking for the filter while debugging.
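A quick back-of-the-envelope check makes the half-box observation plausible. With illustrative numbers (not the project's real dimensions): a strip spanning the full image width exceeds YOLOv5's default aspect-ratio limit of 20, while half of it does not:

```python
# Illustrative numbers only: a full-width strip vs. half of it.
full_w, half_w, h = 1920, 960, 60

ar_full = max(full_w / h, h / full_w)  # aspect ratio of the full strip
ar_half = max(half_w / h, h / half_w)  # aspect ratio of the half strip

print(ar_full)  # 32.0 -> above the default limit of 20, label filtered out
print(ar_half)  # 16.0 -> below 20, label kept
```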

Sure enough!!! There is a box_candidates function in datasets.py that filters an image's labels. See the comments below for the exact filtering conditions:


import numpy as np


def box_candidates(box1, box2, wh_thr=2, ar_thr=20, area_thr=0.1, eps=1e-16):
    """
    Used in random_perspective to filter the labels of a perspective-transformed image.
    Removes boxes that were cropped too small (area below area_thr of the pre-augmentation
    area); width and height must also exceed wh_thr pixels, and the aspect ratio must
    fall within (1/ar_thr, ar_thr).
    Compute candidate boxes: box1 before augment, box2 after augment, wh_thr (pixels), aspect_ratio_thr, area_ratio
    :params box1: [4, n]
    :params box2: [4, n]
    :params wh_thr: filter condition, width/height threshold in pixels
    :params ar_thr: filter condition, maximum threshold for both w/h and h/w
    :params area_thr: filter condition, area-ratio threshold
    :params eps: 1e-16, a near-zero value that prevents division by zero
    :return i: boolean mask of shape [n]; e.g. box1[i] keeps every box marked True and drops the rest
    """
    w1, h1 = box1[2] - box1[0], box1[3] - box1[1]  # widths and heights of all box1 boxes  [n] [n]
    w2, h2 = box2[2] - box2[0], box2[3] - box2[1]  # widths and heights of all box2 boxes  [n] [n]
    ar = np.maximum(w2 / (h2 + eps), h2 / (w2 + eps))  # larger of w/h and h/w for each box2  [n]
    # Keep a box only if: augmented w and h exceed wh_thr, the post/pre area ratio
    # exceeds area_thr, and the aspect ratio stays below ar_thr
    return (w2 > wh_thr) & (h2 > wh_thr) & (w2 * h2 / (w1 * h1 + eps) > area_thr) & (ar < ar_thr)  # candidates

The default maximum threshold for the aspect ratio (and its inverse) is 20, but the aspect ratio in this project was well above 20, so Franpper set ar_thr to 50 and restarted training.
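The effect of the change can be reproduced in isolation. The sketch below re-declares box_candidates as shown above and feeds it one made-up slender box (1800 px wide, 40 px tall, aspect ratio 45) that is unchanged by augmentation, so box1 equals box2:

```python
# Demonstrating box_candidates on a single slender box (coordinates made up).
# Boxes are [4, n] arrays of x1, y1, x2, y2, matching the function's layout.
import numpy as np

def box_candidates(box1, box2, wh_thr=2, ar_thr=20, area_thr=0.1, eps=1e-16):
    w1, h1 = box1[2] - box1[0], box1[3] - box1[1]
    w2, h2 = box2[2] - box2[0], box2[3] - box2[1]
    ar = np.maximum(w2 / (h2 + eps), h2 / (w2 + eps))
    return (w2 > wh_thr) & (h2 > wh_thr) & (w2 * h2 / (w1 * h1 + eps) > area_thr) & (ar < ar_thr)

# One box: 1800 px wide, 40 px tall -> aspect ratio 45.
box = np.array([[60.0], [1000.0], [1860.0], [1040.0]])

print(box_candidates(box, box))             # [False] -> label dropped at the default ar_thr=20
print(box_candidates(box, box, ar_thr=50))  # [ True] -> label kept with ar_thr=50
```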

It's time to witness the miracle! ! !   

It could finally train normally!!! And the training results were very gratifying!!!


Origin blog.csdn.net/weixin_58283091/article/details/128441085