Analysis and solution of nested detection boxes in object detection

Table of contents

Problem Description
Cause analysis and solutions:
Follow-up and thoughts
Reference documentation

Problem Description

The detection boxes output by the object detection model are nested inside one another.

[Image: detection result with nested boxes]

Cause analysis and solutions:

Based on experience, my first impression was that something is wrong in the nms part of the post-processing. Let's take a look at the corresponding code:

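/* Box layout: lx, ly, rx, ry, score, class id, area */
static float CalcIou(const vector<float> &box1, const vector<float> &box2)
{
    /* Areas are precomputed by the caller and stored at index 6. */
    float area1 = box1[6];
    float area2 = box2[6];
    /* Corners of the intersection rectangle. */
    float xx1 = max(box1[0], box2[0]);
    float yy1 = max(box1[1], box2[1]);
    float xx2 = min(box1[2], box2[2]);
    float yy2 = min(box1[3], box2[3]);
    /* The +1 matches the inclusive pixel convention used for the stored areas. */
    float w = max(0.0f, xx2 - xx1 + 1);
    float h = max(0.0f, yy2 - yy1 + 1);
    float inter = w * h;
    float ovr = inter / (area1 + area2 - inter);
    return ovr;
}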

/*
 * Greedy NMS: a candidate is kept only if its IoU with every box already in
 * bboxes is at most nmsThr. No sorting is done here, so vaildBox is assumed
 * to arrive ordered by descending score.
 */
static void MulticlassNms(vector<vector<float>>& bboxes, const vector<vector<float>>& vaildBox, float nmsThr)
{
    for (auto &item : vaildBox) {
        /* score, xcenter, ycenter, w, h, classId */
        float boxXCenter = item[XCENTER_IDX];
        float boxYCenter = item[YCENTER_IDX];
        float boxWidth = item[W_IDX];
        float boxHeight = item[H_IDX];

        /* Convert center/size format to corner format. */
        float x1 = (boxXCenter - boxWidth / 2);
        float y1 = (boxYCenter - boxHeight / 2);
        float x2 = (boxXCenter + boxWidth / 2);
        float y2 = (boxYCenter + boxHeight / 2);
        float area = (x2 - x1 + 1) * (y2 - y1 + 1);
        bool keep = true;
        /* lx, ly, rx, ry, score, class id, area */
        vector<float> bbox {x1, y1, x2, y2, item[SCORE_IDX], item[CLSAA_ID_IDX], area};
        for (size_t j = 0; j < bboxes.size(); j++) {
            if (CalcIou(bbox, bboxes[j]) > nmsThr) {
                keep = false;
                break;
            }
        }
        if (keep) {
            bboxes.push_back(bbox);
        }
    }
}

The most likely cause at this point is that the nms threshold was set too high, so overlapping detection boxes were not filtered out. nmsThr was originally set to 0.45 and has now been adjusted to 0.1. The nested detection boxes have basically disappeared:

[Image: detection result after lowering nmsThr]
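To see why the threshold matters so much for nested boxes, here is a small worked sketch. When one box lies fully inside another, the IoU is simply the ratio of the two areas, so it can easily fall below 0.45. The `Box` struct, the function names and the coordinates are made up for illustration and are not taken from the project; only the +1 pixel convention follows CalcIou above.

```cpp
#include <algorithm>

// Hypothetical corner-format box (inclusive pixel coordinates), for illustration only.
struct Box {
    float x1, y1, x2, y2;
};

// Area with the same "+1" pixel convention as CalcIou in the article.
constexpr float BoxArea(const Box& b)
{
    return (b.x2 - b.x1 + 1.0f) * (b.y2 - b.y1 + 1.0f);
}

constexpr float Iou(const Box& a, const Box& b)
{
    const float w = std::max(0.0f, std::min(a.x2, b.x2) - std::max(a.x1, b.x1) + 1.0f);
    const float h = std::max(0.0f, std::min(a.y2, b.y2) - std::max(a.y1, b.y1) + 1.0f);
    const float inter = w * h;
    return inter / (BoxArea(a) + BoxArea(b) - inter);
}

// A candidate is suppressed when its IoU with an already kept box exceeds nmsThr.
constexpr bool IsSuppressed(const Box& kept, const Box& candidate, float nmsThr)
{
    return Iou(kept, candidate) > nmsThr;
}

// Illustrative values, not measured from the project.
constexpr Box kOuter{0.0f, 0.0f, 99.0f, 99.0f};    // 100 x 100 pixels
constexpr Box kInner{20.0f, 20.0f, 79.0f, 79.0f};  // 60 x 60 pixels, fully inside kOuter
```

With these numbers the IoU is 3600 / 10000 = 0.36. That is below 0.45, so the inner box survives nms, but above 0.1, so the lower threshold removes it. Keep in mind that a threshold this low may also suppress boxes of genuinely different objects that stand close together.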

Follow-up and thoughts

I have also made some notes on the loss functions used in object detection (IOU_Loss, GIOU_Loss, DIOU_Loss and CIOU_Loss); take a look if you need them.
Here is the conclusion first, an overall view of the differences between the loss functions:
IOU_Loss: mainly considers the overlapping area of the detection box and the target box.
GIOU_Loss: based on IOU, solves the problem that arises when the bounding boxes do not overlap.
DIOU_Loss: based on IOU and GIOU, considers the distance between the center points of the bounding boxes.
CIOU_Loss: based on DIOU, considers the scale information of the bounding box aspect ratio.

This project uses the basic IOU in nms. If the inference performance is sufficient, you can consider using DIOU. The code for DIOU nms is also given below.
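The differences between IOU, GIOU and DIOU are easiest to see on boxes that do not overlap at all. The following is a minimal sketch using the commonly cited definitions (GIoU subtracts the empty fraction of the smallest enclosing box, DIoU subtracts the squared center distance divided by the squared enclosing-box diagonal). The `Box` struct, the function names and the coordinates are assumptions made for this example.

```cpp
#include <algorithm>

// Hypothetical corner-format box with continuous coordinates (no +1 convention).
struct Box {
    float x1, y1, x2, y2;
};

constexpr float Area(const Box& b)
{
    return (b.x2 - b.x1) * (b.y2 - b.y1);
}

constexpr float Intersection(const Box& a, const Box& b)
{
    const float w = std::max(0.0f, std::min(a.x2, b.x2) - std::max(a.x1, b.x1));
    const float h = std::max(0.0f, std::min(a.y2, b.y2) - std::max(a.y1, b.y1));
    return w * h;
}

// Smallest axis-aligned box that contains both inputs.
constexpr Box Enclosing(const Box& a, const Box& b)
{
    return Box{std::min(a.x1, b.x1), std::min(a.y1, b.y1),
               std::max(a.x2, b.x2), std::max(a.y2, b.y2)};
}

constexpr float Iou(const Box& a, const Box& b)
{
    const float inter = Intersection(a, b);
    return inter / (Area(a) + Area(b) - inter);
}

// GIoU = IoU - (enclosing area - union area) / enclosing area
constexpr float Giou(const Box& a, const Box& b)
{
    const float inter = Intersection(a, b);
    const float uni = Area(a) + Area(b) - inter;
    const float enc = Area(Enclosing(a, b));
    return inter / uni - (enc - uni) / enc;
}

// DIoU = IoU - (center distance)^2 / (enclosing box diagonal)^2
constexpr float Diou(const Box& a, const Box& b)
{
    const Box e = Enclosing(a, b);
    const float dx = ((a.x1 + a.x2) - (b.x1 + b.x2)) / 2.0f;
    const float dy = ((a.y1 + a.y2) - (b.y1 + b.y2)) / 2.0f;
    const float diag2 = (e.x2 - e.x1) * (e.x2 - e.x1) + (e.y2 - e.y1) * (e.y2 - e.y1);
    return Iou(a, b) - (dx * dx + dy * dy) / diag2;
}

// Illustrative boxes: kNear and kFar are both disjoint from kA.
constexpr Box kA{0.0f, 0.0f, 10.0f, 10.0f};
constexpr Box kNear{12.0f, 0.0f, 22.0f, 10.0f};
constexpr Box kFar{30.0f, 0.0f, 40.0f, 10.0f};
```

For both disjoint pairs the IoU is 0, so it cannot tell a near miss from a far miss. GIoU gives about -0.09 for kNear and -0.5 for kFar, and DIoU gives about -0.25 and -0.53. Also note that DIoU is never larger than IoU, so with the same nmsThr a DIoU-based nms suppresses no more boxes than an IoU-based one.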

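/* Box layout: lx, ly, rx, ry, score, class id, area */
static float CalcDiou(const vector<float>& box1, const vector<float>& box2)
{
    /* Intersection rectangle: max of the top-left corners, min of the bottom-right corners. */
    float ix1 = max(box1[0], box2[0]);
    float iy1 = max(box1[1], box2[1]);
    float ix2 = min(box1[2], box2[2]);
    float iy2 = min(box1[3], box2[3]);
    float w = max(0.0f, ix2 - ix1);
    float h = max(0.0f, iy2 - iy1);
    float intersection = w * h;

    float area1 = (box1[2] - box1[0]) * (box1[3] - box1[1]);
    float area2 = (box2[2] - box2[0]) * (box2[3] - box2[1]);
    float union_area = area1 + area2 - intersection;

    /* Squared distance between the two box centers. */
    float c_x1 = (box1[0] + box1[2]) / 2.0f;
    float c_y1 = (box1[1] + box1[3]) / 2.0f;
    float c_x2 = (box2[0] + box2[2]) / 2.0f;
    float c_y2 = (box2[1] + box2[3]) / 2.0f;
    float dist_center2 = (c_x1 - c_x2) * (c_x1 - c_x2) + (c_y1 - c_y2) * (c_y1 - c_y2);

    /* Squared diagonal of the smallest box enclosing both boxes. */
    float ex1 = min(box1[0], box2[0]);
    float ey1 = min(box1[1], box2[1]);
    float ex2 = max(box1[2], box2[2]);
    float ey2 = max(box1[3], box2[3]);
    float diag2 = (ex2 - ex1) * (ex2 - ex1) + (ey2 - ey1) * (ey2 - ey1);

    /* Degenerate (zero-size) boxes: treat as no overlap to avoid dividing by zero. */
    if (union_area <= 0.0f || diag2 <= 0.0f) {
        return 0.0f;
    }

    /* DIoU = IoU - d^2 / c^2 */
    float diou = intersection / union_area - dist_center2 / diag2;
    return diou;
}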

static void MulticlassNms(vector<vector<float>>& bboxes, const vector<vector<float>>& vaildBox, float nmsThr)
{
    for (auto &item : vaildBox) {
        /* score, xcenter, ycenter, w, h, classId */
        float boxXCenter = item[XCENTER_IDX];
        float boxYCenter = item[YCENTER_IDX];
        float boxWidth = item[W_IDX];
        float boxHeight = item[H_IDX];

        float x1 = (boxXCenter - boxWidth / 2);
        float y1 = (boxYCenter - boxHeight / 2);
        float x2 = (boxXCenter + boxWidth / 2);
        float y2 = (boxYCenter + boxHeight / 2);
        float area = (x2 - x1 + 1) * (y2 - y1 + 1);
        bool keep = true;

        /* lx, ly, rx, ry, score, class id, area */
        vector<float> bbox {x1, y1, x2, y2, item[SCORE_IDX], item[CLSAA_ID_IDX], area};
        for (size_t j = 0; j < bboxes.size(); j++) {
            if (CalcDiou(bbox, bboxes[j]) > nmsThr) {
                keep = false;
                break;
            }
        }
        if (keep) {
            bboxes.push_back(bbox);
        }
    }
}
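One more thing that may be worth checking: the MulticlassNms listings in this article neither sort the candidates by score nor compare class ids, so they rely on the caller for both (the article does not show whether the caller does this). Below is a minimal sketch of a greedy nms that handles both itself. The `Detection` struct and the function names are hypothetical and use plain IoU with the +1 pixel convention; the overlap function could be swapped for a DIoU one.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical detection record for this sketch; the article keeps the same
// fields in a vector<float> laid out as lx, ly, rx, ry, score, class id, area.
struct Detection {
    float x1, y1, x2, y2;
    float score;
    int classId;
};

// Plain IoU with the "+1" inclusive pixel convention.
static float Iou(const Detection& a, const Detection& b)
{
    const float w = std::max(0.0f, std::min(a.x2, b.x2) - std::max(a.x1, b.x1) + 1.0f);
    const float h = std::max(0.0f, std::min(a.y2, b.y2) - std::max(a.y1, b.y1) + 1.0f);
    const float inter = w * h;
    const float areaA = (a.x2 - a.x1 + 1.0f) * (a.y2 - a.y1 + 1.0f);
    const float areaB = (b.x2 - b.x1 + 1.0f) * (b.y2 - b.y1 + 1.0f);
    return inter / (areaA + areaB - inter);
}

// Greedy nms: visit candidates from highest to lowest score and keep a box
// only if no already kept box of the same class overlaps it by more than nmsThr.
static std::vector<Detection> SortedClassAwareNms(std::vector<Detection> candidates, float nmsThr)
{
    std::sort(candidates.begin(), candidates.end(),
              [](const Detection& a, const Detection& b) { return a.score > b.score; });

    std::vector<Detection> kept;
    for (const Detection& cand : candidates) {
        bool keep = true;
        for (const Detection& k : kept) {
            if (k.classId == cand.classId && Iou(cand, k) > nmsThr) {
                keep = false;
                break;
            }
        }
        if (keep) {
            kept.push_back(cand);
        }
    }
    return kept;
}
```

Sorting first means that when a nested pair is found, the box that gets dropped is the one with the lower score rather than whichever happened to come later in the input.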

Some readers may wonder why DIOU_nms is used here rather than CIOU_nms.
Answer: CIOU_loss adds an influence factor on top of DIOU_loss. That factor involves the ground-truth box and is used for regression during training.
At test time there is no ground-truth information, so the influence factor does not need to be considered and DIOU_nms is enough.
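For reference, the usual definitions make the dependence on the ground truth visible. The notation here is an assumption of this note, not something taken from the project: $b$ and $b^{gt}$ are the centers of the predicted and ground-truth boxes, $\rho$ is the Euclidean distance, $c$ is the diagonal of the smallest enclosing box, and $w, h$ are box width and height.

```latex
\mathrm{DIoU} = \mathrm{IoU} - \frac{\rho^{2}(b, b^{gt})}{c^{2}}
\qquad
\mathrm{CIoU} = \mathrm{DIoU} - \alpha v

v = \frac{4}{\pi^{2}} \left( \arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h} \right)^{2}
\qquad
\alpha = \frac{v}{(1 - \mathrm{IoU}) + v}
```

The extra term $\alpha v$ compares the aspect ratio of the prediction with that of the ground-truth box $w^{gt}/h^{gt}$. In nms both boxes are predictions, so there is no ground-truth aspect ratio to plug in, and the DIoU part is what remains.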

Reference documentation

https://blog.csdn.net/nan355655600/article/details/106246625