Project scenario:
Optimizing the runtime of the post-processing step that follows model inference.
Let’s first look at the time comparison before and after optimization:
Before optimization:
After optimization:
The improvement is substantial.
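A before/after comparison like this can be reproduced with a small timing helper. The following is only a sketch: average_runtime_us is a hypothetical name that does not come from the project, and it assumes the post-processing code can be wrapped in a callable.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper (not from the project): runs `fn` `repeats` times and
// returns the average wall-clock time per run in microseconds.
template <typename Fn>
double average_runtime_us(Fn&& fn, int repeats) {
    assert(repeats > 0);
    using clock = std::chrono::steady_clock;  // monotonic, suited to interval timing
    const auto start = clock::now();
    for (int r = 0; r < repeats; ++r) {
        fn();
    }
    const auto stop = clock::now();
    const std::chrono::duration<double, std::micro> elapsed = stop - start;
    return elapsed.count() / repeats;
}
```

Passing each version of the post-processing loop as a lambda and averaging over many runs gives numbers that are less sensitive to one-off noise than a single measurement.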
Problem Description
Post-processing the data returned by model inference takes a long time. The original code:
auto outputsf = pRealEngine->sampleProcess->outData;
// post-processing
std::vector<float> outputvtemp;
std::vector<std::vector<BoundingBox>> preds(pRealEngine->num_class); // bounding boxes grouped by class
BoundingBox temp;
auto n = pRealEngine->modelout_len*pRealEngine->nc;
// int c=0;
for(auto i=0;i<n;i++){
    outputvtemp.push_back(outputsf[i]); // copies every output value, one at a time
    if((i+1)%pRealEngine->nc==0) { // a full row of nc values has been collected
        if(outputvtemp[4]>pRealEngine->confidence_threshold){
            auto cx = outputvtemp[0];
            auto cy = outputvtemp[1];
            auto w = outputvtemp[2];
            auto h = outputvtemp[3];
            temp.x = std::max(int((cx-w/2)),0);
            temp.y = std::max(int((cy-h/2)),0);
            temp.w = std::min(int(w),int(pImgInfo->i32Width-temp.x));
            temp.h = std::min(int(h),int(pImgInfo->i32Height-temp.y));
            temp.cx = int((cx-w/2));
            temp.cy = int((cx-w/2)); // note: uses cx and w here; the optimized version uses cy-h/2
            temp.confidence = outputvtemp[4];
            temp.classid = getfclassid(outputvtemp);
            preds[temp.classid].push_back(temp);
        }
        outputvtemp.clear();
    }
}
Cause Analysis:
Unnecessary data copying: The original code calls outputvtemp.push_back(outputsf[i]) to append every outputsf[i] to the vector outputvtemp. This copies each value one at a time and can trigger memory reallocation while the vector grows. To avoid this overhead, the loop can read the outputsf array directly, without the additional vector.
Repeated computation: The values cx - w/2 and cy - h/2 are each computed in more than one place inside the loop. Each can be computed once, stored in a local variable, and reused.
Complex conditional checks: The loop contains several conditional checks, for example if (outputvtemp[4] > pRealEngine->confidence_threshold) and the per-element test (i+1)%pRealEngine->nc==0, and every check adds to the running time. Keep only the checks that are necessary and remove the rest where possible.
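The three points above can be sketched together in one small function. This is a simplified illustration, not the project's code: Box and decode_rows are hypothetical names, and the row layout [cx, cy, w, h, confidence, class scores...] is an assumption inferred from how the loop indexes its data.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified stand-in for the article's BoundingBox (assumption: fewer fields).
struct Box {
    int x;
    int y;
    float confidence;
};

// Walks a flat buffer of `rows` records, each `stride` floats wide.
// Assumed layout per record: [cx, cy, w, h, confidence, class scores...].
std::vector<Box> decode_rows(const float* data, std::size_t rows,
                             std::size_t stride, float threshold) {
    assert(stride >= 5);
    std::vector<Box> boxes;
    for (std::size_t r = 0; r < rows; ++r) {
        const float* row = data + r * stride;  // a view into the buffer, not a copy
        const float confidence = row[4];
        if (confidence <= threshold) {
            continue;  // reject early, before any coordinate math
        }
        // Compute each top-left coordinate once and reuse it.
        const int left = static_cast<int>(row[0] - row[2] / 2);
        const int top = static_cast<int>(row[1] - row[3] / 2);
        boxes.push_back(Box{std::max(left, 0), std::max(top, 0), confidence});
    }
    return boxes;
}
```

Stepping one whole record at a time removes both the per-element push_back and the per-element modulo test, leaving the confidence comparison as the only branch in the loop body.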
Solution:
auto outputsf = pRealEngine->sampleProcess->outData;
std::vector<std::vector<BoundingBox>> preds(pRealEngine->num_class); // bounding boxes grouped by class
BoundingBox temp;
auto n = pRealEngine->modelout_len * pRealEngine->nc;
// int elements_per_output = 5; // each output element contains 5 values
for (auto i = 0; i < n; i += pRealEngine->nc) // elements_per_output)
{
    // read the confidence directly from the output array
    float confidence = outputsf[i + 4];
    if (confidence > pRealEngine->confidence_threshold)
    {
        auto cx = outputsf[i];
        auto cy = outputsf[i + 1];
        auto w = outputsf[i + 2];
        auto h = outputsf[i + 3];
        temp.x = std::max(int((cx - w / 2)), 0);
        temp.y = std::max(int((cy - h / 2)), 0);
        temp.w = std::min(int(w), int(pImgInfo->i32Width - temp.x));
        temp.h = std::min(int(h), int(pImgInfo->i32Height - temp.y));
        temp.cx = int((cx - w / 2));
        temp.cy = int((cy - h / 2));
        temp.confidence = confidence;
        // copy this row into a std::vector<float>; this happens only for rows that pass the threshold
        std::vector<float> outputvtemp(outputsf + i, outputsf + i + pRealEngine->nc);
        // getfclassid takes the row as a vector
        temp.classid = getfclassid(outputvtemp);
        preds[temp.classid].push_back(temp);
    }
}
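The optimized loop still builds one temporary std::vector<float> for every box that passes the threshold, because getfclassid takes a vector. If that function can be changed, the remaining copy could be avoided by reading the class scores in place. The sketch below is hypothetical: the implementation of getfclassid is not shown in this article, so class_id_from_row assumes that it returns the index of the highest class score and that the scores start at index 5 of each row.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical alternative to getfclassid (whose implementation is not shown):
// reads the row in place instead of receiving a std::vector<float> copy.
// Assumption: class scores occupy row[5] .. row[stride - 1], and the class id
// is the index of the highest score.
int class_id_from_row(const float* row, std::size_t stride) {
    assert(stride > 5);
    const float* first = row + 5;
    const float* last = row + stride;
    return static_cast<int>(std::max_element(first, last) - first);
}
```

Under these assumptions the call site would become temp.classid = class_id_from_row(outputsf + i, pRealEngine->nc), with no per-box allocation.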
Summary
This article records a practical problem I ran into in my own project. I have only just started working on C++ projects, so I am writing it down here!
There is a long way to go with C++. Keep going!
September 9, 2023 15:33:36