Model inference post-processing C++ code optimization case

Project scenario:

Reducing the runtime of the post-processing step that follows model inference.

Let's first compare the runtime before and after optimization:
Before optimization:
[Image: runtime before optimization]
After optimization:
[Image: runtime after optimization]

The improvement is substantial.
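A before/after comparison like this can be reproduced with a small timing wrapper. The following is only a sketch: the helper name timeMs is hypothetical and is not part of the original project.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper: runs `fn` once and returns the elapsed
// wall-clock time in milliseconds, measured with a monotonic clock.
template <typename Fn>
double timeMs(Fn&& fn) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<Fn>(fn)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Wrapping the post-processing loop in a lambda, for example `double ms = timeMs([&] { /* post-processing loop */ });`, gives one measurement. Averaging several runs makes the comparison more stable.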


Problem Description

Post-processing the data produced by model inference takes a large amount of time.

auto outputsf = pRealEngine->sampleProcess->outData;
// post-processing

std::vector<float> outputvtemp;
std::vector<std::vector<BoundingBox>> preds(pRealEngine->num_class); // class -> bboxes
BoundingBox temp;
auto n = pRealEngine->modelout_len*pRealEngine->nc;
for(auto i=0;i<n;i++){
    outputvtemp.push_back(outputsf[i]);  // copy every value into a temporary vector
    if((i+1)%pRealEngine->nc==0) {       // one full box record has been collected
        if(outputvtemp[4]>pRealEngine->confidence_threshold){
            auto cx = outputvtemp[0];
            auto cy = outputvtemp[1];
            auto w = outputvtemp[2];
            auto h = outputvtemp[3];
            temp.x = std::max(int((cx-w/2)),0);
            temp.y = std::max(int((cy-h/2)),0);
            temp.w = std::min(int(w),int(pImgInfo->i32Width-temp.x));
            temp.h = std::min(int(h),int(pImgInfo->i32Height-temp.y));
            temp.cx = int((cx-w/2));
            temp.cy = int((cy-h/2));
            temp.confidence =  outputvtemp[4];
            temp.classid = getfclassid(outputvtemp);

            preds[temp.classid].push_back(temp);
        }
        outputvtemp.clear();
    }
}

Cause Analysis:

Unnecessary data copying: The original code calls outputvtemp.push_back(outputsf[i]) to append every element of outputsf to the vector outputvtemp. This copies each value and can trigger memory reallocation. To avoid the overhead, read outputsf directly inside the loop instead of going through an extra vector.

Repeated computation: The values cx - w/2 and cy - h/2 are each computed in more than one place inside the loop. Computing each of them once and reusing the result avoids the duplicate work.

Complex conditional checks: The loop contains conditional checks such as if (outputvtemp[4] > pRealEngine->confidence_threshold), and every check adds to the running time. Keep only the checks that are necessary. The confidence check is needed, but the per-element test (i+1)%pRealEngine->nc==0 can be removed by advancing the loop nc elements at a time.
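The second point can be sketched as follows. This is only an illustration under stated assumptions: Box is a simplified stand-in for the article's BoundingBox (whose definition is not shown), and the helper name toClippedBox is hypothetical.

```cpp
#include <algorithm>

// Hypothetical stand-in for BoundingBox; only the geometric fields are shown.
struct Box {
    int x, y, w, h;
};

// Converts a center-format box (cx, cy, w, h) into a top-left box
// clipped to an image of size imgW x imgH.
// The top-left corner is computed once and then reused.
Box toClippedBox(float cx, float cy, float w, float h, int imgW, int imgH) {
    const int left = static_cast<int>(cx - w / 2);
    const int top  = static_cast<int>(cy - h / 2);
    Box b;
    b.x = std::max(left, 0);
    b.y = std::max(top, 0);
    b.w = std::min(static_cast<int>(w), imgW - b.x);
    b.h = std::min(static_cast<int>(h), imgH - b.y);
    return b;
}
```

Keeping left and top in local variables also lets other fields that need the same values reuse them without recomputing.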


Solution:

auto outputsf = pRealEngine->sampleProcess->outData;
std::vector<std::vector<BoundingBox>> preds(pRealEngine->num_class); // class -> bboxes
BoundingBox temp;
auto n = pRealEngine->modelout_len * pRealEngine->nc;

// Step through the output one box record (nc values) at a time.
for (auto i = 0; i < n; i += pRealEngine->nc)
{
    float confidence = outputsf[i + 4];

    if (confidence > pRealEngine->confidence_threshold)
    {
        auto cx = outputsf[i];
        auto cy = outputsf[i + 1];
        auto w = outputsf[i + 2];
        auto h = outputsf[i + 3];
        temp.x = std::max(int((cx - w / 2)), 0);
        temp.y = std::max(int((cy - h / 2)), 0);
        temp.w = std::min(int(w), int(pImgInfo->i32Width - temp.x));
        temp.h = std::min(int(h), int(pImgInfo->i32Height - temp.y));
        temp.cx = int((cx - w / 2));
        temp.cy = int((cy - h / 2));
        temp.confidence = confidence;
        // Copy this record's nc values into a std::vector<float>
        std::vector<float> outputvtemp(outputsf + i, outputsf + i + pRealEngine->nc);

        // Call getfclassid on the copied record
        temp.classid = getfclassid(outputvtemp);

        preds[temp.classid].push_back(temp);
    }
}
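The loop above still builds a temporary std::vector<float> for every box that passes the threshold, because getfclassid takes a vector. If that function can be changed, the remaining copy could be avoided by passing a pointer to the record instead. The sketch below rests on assumptions: getfclassid is not shown in this article, so the record layout (five box values followed by per-class scores) and the argmax behavior are guesses, and classIdFromRange is a hypothetical name.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical replacement for getfclassid.
// Assumption: a record is [cx, cy, w, h, confidence, score_0, ..., score_k]
// and the class id is the index of the highest class score.
// Requires nc > 5 so that at least one class score exists.
int classIdFromRange(const float* record, std::size_t nc) {
    const float* first = record + 5;   // first class score
    const float* last  = record + nc;  // one past the last class score
    return static_cast<int>(std::max_element(first, last) - first);
}
```

Under these assumptions, the call inside the loop would become `temp.classid = classIdFromRange(outputsf + i, pRealEngine->nc);`, with no temporary vector.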

Summary

This article records a practical problem I ran into in my own project. I have only just started working on C++ projects, so I am writing it down here!
There is still a long way to go with C++. Keep going!
September 9, 2023 15:33:36

Origin blog.csdn.net/JulyLi2019/article/details/132777406