I previously wrote an article about TensorRT: a performance comparison of TensorRT under different batch sizes.
This post uses the same model as that article, converted to an OpenVINO model and then tested with benchmark_app.
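For completeness, here is a conversion sketch. It assumes the model was first exported to ONNX as ctdet_coco_dlav0_512.onnx (the export step is not covered here), and uses the OpenVINO Model Optimizer to produce the .xml/.bin IR pair used below; depending on your OpenVINO version the entry point is mo or python mo.py:
mo --input_model ctdet_coco_dlav0_512.onnx --output_dir .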
To use benchmark_app, you need to compile it yourself; see: https://blog.csdn.net/zhou_438/article/details/112974101
After that you can run tests. As a concrete example, the following command runs with batchsize=32:
benchmark_app.exe -m ctdet_coco_dlav0_512.xml -d CPU -i 2109.png -b 32
The remaining batch sizes have to be tested one by one and the data collected for comparison; a sketch for automating the sweep follows.
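To avoid retyping the command for every batch size, a minimal sweep sketch in Python is shown below. It assumes benchmark_app.exe is on PATH, and the regular expressions that pull out the first-inference time and latency are guesses at the log format, which changes between OpenVINO versions, so adjust them to your actual output:
import re
import subprocess

model = "ctdet_coco_dlav0_512.xml"
image = "2109.png"

with open("FirstInference.txt", "w") as f1, open("Latency.txt", "w") as f2:
    f1.write("batchsize FirstInference\n")
    f2.write("batchsize Latency\n")
    for bs in [1, 2, 4, 8, 16, 32]:
        # Run one benchmark per batch size and capture the console report
        out = subprocess.run(
            ["benchmark_app.exe", "-m", model, "-d", "CPU", "-i", image, "-b", str(bs)],
            capture_output=True, text=True).stdout
        # Assumed log patterns; check them against your benchmark_app output
        first = re.search(r"[Ff]irst inference.*?([\d.]+)\s*ms", out)
        lat = re.search(r"[Ll]atency.*?([\d.]+)\s*ms", out)
        if first:
            f1.write(f"{bs} {first.group(1)}\n")
        if lat:
            f2.write(f"{bs} {lat.group(1)}\n")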
Then visualize the data; the plotting code is as follows:
import pandas as pd
import matplotlib.pyplot as plt

# Load the per-batchsize measurements (space-separated files with a header row)
df1 = pd.read_csv("FirstInference.txt", sep=' ')
df2 = pd.read_csv("Latency.txt", sep=' ')
# Join the two tables on the shared batchsize column
df = pd.merge(df1, df2, how='inner', on='batchsize')
print(df.head())

x = df['batchsize'].values
y1 = df['FirstInference'].values
y2 = df['Latency'].values
plt.plot(x, y1, 'ro--', label='FirstInference')
plt.plot(x, y2, 'bo-', label='Latency')
plt.xlabel('batchsize')
plt.ylabel('time(ms)')
plt.legend()
plt.tight_layout()  # adjust spacing after all artists exist, not before plotting
plt.show()
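For reference, this script assumes the two input files are space-separated with a header row: FirstInference.txt starts with "batchsize FirstInference" and Latency.txt starts with "batchsize Latency", each followed by one line per measured batch size (the values themselves depend on your machine, so they are not reproduced here).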
As the plot shows, on the CPU the inference time grows roughly linearly with batch size no matter how large the batch is, so the throughput barely changes. The curves appear to fluctuate a lot, but the values actually only vary within a small range.
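To put a number on the throughput claim, here is a small follow-up to the script above; it assumes Latency holds the average time in milliseconds to process one whole batch, so images per second is the batch size divided by seconds per batch:
# Derive throughput from the merged dataframe (assumes Latency is ms per batch)
df['throughput_fps'] = df['batchsize'] * 1000.0 / df['Latency']
print(df[['batchsize', 'throughput_fps']])
If latency really grows linearly with batch size, this ratio stays roughly constant, which matches what the plot shows.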