TensorRT-Plugin small test
Note: this needs to be compiled under Linux; compilation on Windows 10 did not succeed.
The overall process:
- Install the TRT library first;
- Modify the onnx model to add the plugin op;
- Compile the source code matching your TRT version (there are many pitfalls on Windows 10; Linux is relatively smooth);
- Modify the TRT source: add the cpp, h and cu files for the plugin, register it, and update the other cpp and h files that reference it;
- Compile the modified TRT source, then copy the generated .so and .a files into the lib directory of the TRT library, replacing the originals;
- Use the trtexec tool to build an engine from the modified onnx model;
- In the verification phase, to call the custom plugin you need to add the header #include "NvInferPlugin.h" and call bool didInitPlugins = initLibNvInferPlugins(&gLogger, ""); before deserializing the engine (the deserializeCudaEngine call), and add one line to CMakeLists.txt: target_link_libraries(yolox libnvinfer_plugin.so).
Step 1:
Install the TensorRT library. Version 8.2.0.6 is used here; you can also try other versions.
Step 2:
Modify the network's activation function in the onnx model. To introduce an op type that TRT does not support, use a script to change the op_type of LeakyRelu to Custom.
At this point, running the trtexec tool on the modified onnx model fails with an error, since TRT does not yet know the Custom op.
Step 3:
Download and compile the TensorRT source code matching the installed TensorRT version.
It is best to keep the versions identical: if the TensorRT library is 8.2.0.6, use the 8.2.0.6 source as well. I haven't tried mismatched versions, so I don't know whether that works.
Download command:
git clone -b release/8.2.0.6 https://github.com/nvidia/TensorRT TensorRT // 8.2.0.6 can be replaced with another version, e.g. 7.1
or:
git clone -b master https://github.com/nvidia/TensorRT TensorRT // master is currently the 8.2.0.6 version
cd TensorRT
git submodule update --init --recursive // fetches the third-party libraries in the TRT source, such as onnx; on a flaky network you may need to rerun this several times before everything downloads
As shown in the figure below, when rerunning git submodule update --init --recursive produces no output, all submodules have been fetched.
Before running make, you need to modify one line in CMakeLists.txt, as shown in the figure below: set TRT_LIB_DIR to the lib directory of the TensorRT library:
Compile command:
mkdir build
cd build
cmake ..
make -j10
Under normal circumstances this goes smoothly with no errors. After compilation, .so and .a files are generated in the build directory. The purpose of this step is to verify that the TensorRT source compiles at all: once a new plugin is added, the source has to be compiled again, so any compilation pitfalls are best sorted out first.
If you compile another version of the TRT source, such as 7.1, note that 7.1 requires CUDA 11.0 or 10.2. To compile with a different CUDA version, change the cmake step to:
cmake .. -DCUDA_VERSION=11.2 // CUDA 11.2 is used here; replace it with the version you want
In addition, the third-party protobuf used by the TRT 7.1 source defaults to version 3.0.0. If it cannot be downloaded during compilation, download the tar.gz in advance and point the corresponding CMakeLists.txt at it, as shown below:
Step 4:
Open the TRT source, duplicate the LeakyReluPlugin directory under plugin, replace lRelu with Custom in the .h and .cpp files, and rename the files, as shown in the figure below.
To make the later registration straightforward, the custom Custom plugin class needs to inherit the nvinfer1::IPluginV2DynamicExt interface, which requires adding some overridden methods and members, as shown in Figures 3, 4 and 5:
In plugin/common/kernel.h, add the line shown in Figure 6.
In plugin/common/kernels, write an lCustom.cu file modeled on lRelu.cu, replacing lRelu with Custom, as shown in Figure 7. The Custom plugin's computation lives in pCustomKernel(); modify it there to whatever kernel code you want.
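When replacing the body of pCustomKernel(), it helps to have a host-side reference of the op being swapped out, so the CUDA kernel's output can be checked element by element. Since the Custom op here started life as LeakyRelu, a small NumPy reference (the alpha value is an assumption; match it to your node's attribute) looks like:

```python
import numpy as np


def leaky_relu_ref(x: np.ndarray, alpha: float = 0.1) -> np.ndarray:
    """Host-side reference for the kernel: y = x if x > 0 else alpha * x."""
    return np.where(x > 0, x, alpha * x)
```

Comparing the engine's output against this reference on a few random tensors is a quick way to confirm the kernel was wired up correctly.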
The plugin also needs to be registered: add the two lines of code from Figures 8 and 9 to InferPlugin.cpp:
And add the line shown in Figure 10 to plugin/CMakeLists.txt:
Finally, implement the mapping between the onnx node and the TRT plugin; the changes are as follows:
Step 5:
Compile the modified TRT source. The commands are the same as in step 3. After compiling, copy the .so and .a files from the build directory into the lib directory of the TRT library, choosing "Replace All".
Step 6:
Use the trtexec tool to build an engine from the modified onnx model; the results are as follows:
Step 7:
How to call the custom plugin?
If nothing in the inference code is changed, then even though the engine builds successfully, calling an engine with a custom plugin raises an error; the error screenshot is shown in Figure 14.
A search turned up the solution (see reference link 2): add the code from Figures 15 and 16 at the corresponding positions in the inference code, with Figure 16's call placed before deserializeCudaEngine. In addition, CMakeLists.txt needs one extra line, as shown in Figure 17.
Finally, you can happily run inference with the custom-plugin model!
The above is a small TensorRT plugin example. In a real project I converted yolox's pre-processing resize (and the related nodes) into the model with the same method. Since Resize is an operator already supported by the onnx-to-TensorRT conversion, no C++ plugin code is needed; it is enough to add Shape, Gather, Concat and Resize nodes to the onnx model.
Figure 19 shows the visualization after moving yolox's pre-processing into the model: on top of the original model, Shape, Gather, Concat and Resize nodes are added. The Concat and Resize nodes are visualized in Figure 20; the output of the Concat node is the target shape consumed by the Resize node. The Shape, Gather and Concat nodes exist to obtain the dynamic shape of the network input. As for why the shape is not specified directly as [1, 3, 640, 640] in the Resize node's sizes parameter: doing so reports an error.
The point of turning the resize C++ code into a TensorRT plugin was to cut per-frame processing time and improve performance. But when I actually measured it, the resize plugin made each frame slower. My guess is that the input image is now larger (the original full-resolution frame is fed in), and it must be converted from NHWC to NCHW before resizing, so the larger input makes the layout-conversion step take longer. In addition, Nsight Systems shows that the resize node's inference time is itself not short. So I changed the optimization strategy: rewrite all of yolox's pre- and post-processing as CUDA kernels to reduce the time cost. Writing those CUDA kernels will be covered in a future post.
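For reference, the NHWC-to-NCHW conversion blamed above is essentially one transpose plus a contiguous copy, and that copy grows linearly with the input size, which is why feeding the full-resolution frame hurts. A minimal NumPy version (shape values are illustrative):

```python
import numpy as np


def hwc_to_chw(frame: np.ndarray) -> np.ndarray:
    """Convert an HWC uint8 frame to a contiguous CHW float32 tensor.

    transpose() itself is a zero-copy view; the real cost is the
    dtype conversion and the contiguous copy, both O(H * W * C).
    """
    return np.ascontiguousarray(frame.transpose(2, 0, 1).astype(np.float32))
```

At 1920x1080x3 that copy touches roughly 6 million elements per frame, versus about 1.2 million after a 640x640 resize, which matches the observation that converting before resizing is the expensive order.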
Reference links:
1、https://zhuanlan.zhihu.com/p/492144628
2、https://blog.csdn.net/a2824256/article/details/121262135?spm=1001.2101.3001.6650.11&utm_medium=distribute.pc_relevant.none-task-blog-2%7Edefault%7ECTRLIST%7ERate-11-121262135-blog-102723545.pc_relevant_default&depth_1-utm_source=distribute.pc_relevant.none-task-blog-2%7Edefault%7ECTRLIST%7ERate-11-121262135-blog-102723545.pc_relevant_default&utm_relevant_index=14