TensorRT Plugin in action

A small TensorRT-Plugin test.

Note: this needs to be built under Linux; the build did not succeed on Windows 10.
The overall process:

  1. Install the TensorRT (TRT) library;
  2. Modify the onnx model to add the plugin node;
  3. Build the TRT source code matching the installed TRT version (there are many pitfalls on Windows 10; Linux is relatively smooth);
  4. Modify the TRT source code: add the plugin's .cpp, .h and .cu files, register the plugin, and update the other .cpp and .h files that reference it;
  5. Build the modified TRT source code and copy the generated .so and .a files into the lib directory of the TRT library, replacing the originals;
  6. Use the trtexec tool to build an engine from the modified onnx model;
  7. In the verification phase, before deserializing the engine (the deserializeCudaEngine function), the code that calls the custom plugin needs the header #include "NvInferPlugin.h" and the call bool didInitPlugins = initLibNvInferPlugins(&gLogger, ""); in addition, add one line, target_link_libraries(yolox libnvinfer_plugin.so), to CMakeLists.txt.

Step 1:
Install the TensorRT library. Version 8.2.0.6 is used here; other versions can also be tried.

Step 2:
Modify the network's activation function in the onnx model. To introduce an op type that TensorRT does not support, use a script that changes the op_type of the LeakyRelu nodes to Custom (i.e., iterate over the graph's nodes, rewrite op_type for the LeakyRelu nodes, and save the model):
(Figure: the op_type modification script)

At this point, trying to convert the modified onnx model into an engine with the trtexec tool fails, since TensorRT does not yet know the Custom op:
(Figure: trtexec error output)

Step 3:
Download and build the TensorRT source code that matches the installed TensorRT version.
It is best to use exactly the same version: for example, if the TensorRT library is 8.2.0.6, use the 8.2.0.6 source code as well. I have not tried mismatched versions, so I do not know whether that works.
Download command:

git clone -b release/8.2.0.6 https://github.com/nvidia/TensorRT TensorRT   // 8.2.0.6 can be replaced with another version, e.g. 7.1
or:
git clone -b master https://github.com/nvidia/TensorRT TensorRT            // master is currently the 8.2.0.6 version

cd TensorRT
git submodule update --init --recursive                                     // fetches the third-party libraries (onnx, etc.) used by the TRT source; with a flaky network it may take several runs to fetch everything

As shown below, when the git submodule update --init --recursive command finishes with no further output, everything has been fetched.
(Figure: submodule update completing with no output)

Before running make, one line in CMakeLists.txt needs to be changed: TRT_LIB_DIR must point to the lib directory of the installed TensorRT library, as shown below:
(Figure: TRT_LIB_DIR set to the TensorRT library's lib directory)

Compile command:

mkdir build
cd build
cmake .. 
make -j10

  Under normal circumstances this compiles smoothly without errors. After the build, .so and .a files are generated in the build directory. The purpose of this step is to verify that the TensorRT source code builds at all, because after adding a new plugin the source has to be compiled again, and any build pitfalls are better dealt with now.
  If you compile another version of the TRT source, such as 7.1, note that 7.1 requires CUDA 11.0 or 10.2. To build with a different CUDA version, change the cmake step to:

cmake .. -DCUDA_VERSION=11.2    // CUDA 11.2 is used here; replace it with the version you want

  In addition, the third-party protobuf version used by the TRT 7.1 source defaults to 3.0.0. If it cannot be downloaded during the build, you can download the tar.gz in advance and modify the corresponding CMakeLists.txt, as shown below:
(Figure: the protobuf-related change in CMakeLists.txt)

Step 4:
  Open the TRT source code, make a copy of the LeakyReluPlugin directory under plugin/, replace lRelu with Custom in the .h and .cpp files, and rename the files accordingly, as shown in Figures 1 and 2.
Figure 1

Figure 2 lCustomPlugin.cpp (in addition to the parts shown in the figure, the other occurrences also need to be changed to Custom)
 

  To make the later registration straightforward, the custom Custom plugin class needs to inherit from the nvinfer1::IPluginV2DynamicExt interface, and a number of overridden methods and member variables need to be added, as shown in Figures 3, 4 and 5:
Figure 3 lCustomPlugin.h, first part

Figure 4 lCustomPlugin.h, middle part

Figure 5 lCustomPlugin.h, last part
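
  For reference, a minimal sketch of what the renamed header ends up declaring is given below. It is not the exact content of Figures 3-5: the method signatures follow the public TensorRT 8.x IPluginV2DynamicExt API, the negSlope parameter is simply carried over from the LeakyRelu plugin, and the remaining IPluginV2/IPluginV2Ext overrides (initialize, terminate, serialize and so on) are omitted for brevity.

#include "NvInferPlugin.h"
#include <cuda_runtime_api.h>
#include <string>

namespace nvinfer1
{
namespace plugin
{
// The renamed copy of the LReLU class, now inheriting IPluginV2DynamicExt.
class Custom : public IPluginV2DynamicExt
{
public:
    explicit Custom(float negSlope);            // negSlope carried over from the LeakyRelu plugin
    Custom(const void* buffer, size_t length);  // deserialization constructor

    // IPluginV2DynamicExt methods
    DimsExprs getOutputDimensions(int outputIndex, const DimsExprs* inputs, int nbInputs,
        IExprBuilder& exprBuilder) noexcept override;
    bool supportsFormatCombination(int pos, const PluginTensorDesc* inOut, int nbInputs,
        int nbOutputs) noexcept override;
    void configurePlugin(const DynamicPluginTensorDesc* in, int nbInputs,
        const DynamicPluginTensorDesc* out, int nbOutputs) noexcept override;
    size_t getWorkspaceSize(const PluginTensorDesc* inputs, int nbInputs,
        const PluginTensorDesc* outputs, int nbOutputs) const noexcept override;
    int enqueue(const PluginTensorDesc* inputDesc, const PluginTensorDesc* outputDesc,
        const void* const* inputs, void* const* outputs, void* workspace,
        cudaStream_t stream) noexcept override;
    IPluginV2DynamicExt* clone() const noexcept override;

    // A few of the required IPluginV2 basics
    const char* getPluginType() const noexcept override;     // returns "Custom"
    const char* getPluginVersion() const noexcept override;  // returns "1"
    int getNbOutputs() const noexcept override;              // returns 1
    // ... the remaining IPluginV2 / IPluginV2Ext overrides go here

private:
    float mNegSlope;
    std::string mNamespace;
};

// Creator class, mirroring LReluPluginCreator with the plugin name changed to "Custom".
class CustomPluginCreator : public IPluginCreator
{
    // same members as LReluPluginCreator
};
} // namespace plugin
} // namespace nvinfer1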
 

In plugin/common/kernel.h, add the line shown in Figure 6.

Figure 6 kernel.h
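
  What that line is depends on your kernel. A reasonable guess, mirroring the lReluInference declaration that already lives in kernel.h, is the forward declaration of the launcher that the plugin's enqueue() will later call (the name lCustomInference is this walkthrough's own choice, not an official TensorRT function):

pluginStatus_t lCustomInference(cudaStream_t stream, int n, float negativeSlope, const void* input, void* output);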
 

  In plugin/common/kernels, write an lCustom.cu file modeled on lRelu.cu, replacing lRelu with Custom throughout, as shown in Figure 7. The actual Custom plugin computation is implemented in pCustomKernel(); this is the place to put the plugin logic you want.

Figure 7 lCustom.cu
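
  A rough sketch of the shape such an lCustom.cu can take, modeled on lRelu.cu, is shown below. The kernel body simply keeps the LeakyReLU computation as a placeholder; replace the marked line inside pCustomKernel() with whatever the Custom op should actually compute (lCustomInference is the launcher name assumed in the kernel.h sketch above):

#include "kernel.h"

template <unsigned nthdsPerCTA>
__launch_bounds__(nthdsPerCTA)
__global__ void pCustomKernel(const int n, const float negativeSlope, const float* input, float* output)
{
    // Grid-stride loop over all n elements.
    for (int i = blockIdx.x * nthdsPerCTA + threadIdx.x; i < n; i += gridDim.x * nthdsPerCTA)
    {
        // Placeholder: identical to LeakyReLU. Put the real Custom logic here.
        output[i] = input[i] > 0 ? input[i] : input[i] * negativeSlope;
    }
}

pluginStatus_t lCustomInference(
    cudaStream_t stream, const int n, const float negativeSlope, const void* input, void* output)
{
    const int BS = 512;                // threads per block
    const int GS = (n + BS - 1) / BS;  // enough blocks to cover n elements
    pCustomKernel<BS><<<GS, BS, 0, stream>>>(
        n, negativeSlope, static_cast<const float*>(input), static_cast<float*>(output));
    return STATUS_SUCCESS;
}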
 

The plugin also needs to be registered: add the two lines of code shown in Figures 8 and 9 to InferPlugin.cpp.

Figure 8 InferPlugin.cpp

Figure 9 InferPlugin.cpp
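
  Judging by how the existing plugins are wired up in InferPlugin.cpp, the two added lines are most likely an include of the new header and one more registration call inside initLibNvInferPlugins(); the exact lines in Figures 8 and 9 may differ slightly from this sketch:

// Figure 8: near the other plugin includes at the top of InferPlugin.cpp
#include "lCustomPlugin.h"

// Figure 9: inside initLibNvInferPlugins(), next to the existing initializePlugin<...> calls
initializePlugin<nvinfer1::plugin::CustomPluginCreator>(logger, libNamespace);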
 

Also add the line shown in Figure 10 to plugin/CMakeLists.txt:

Figure 10 plugin/CMakeLists.txt
 

  Finally, the mapping between the onnx node and the TRT plugin needs to be implemented; builtin_op_importers.cpp is changed as follows:

Figure 11 builtin_op_importers.cpp
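
  The change itself is a small importer function. The sketch below is not the exact diff from Figure 11, but it follows the DEFINE_BUILTIN_OP_IMPORTER pattern used throughout builtin_op_importers.cpp and looks the plugin up through the standard plugin registry; helper macros and signatures vary a little between onnx-tensorrt versions, so treat it as an outline rather than a drop-in patch:

DEFINE_BUILTIN_OP_IMPORTER(Custom)
{
    // The single input tensor of the onnx Custom (formerly LeakyRelu) node.
    nvinfer1::ITensor* tensorPtr = &convertToTensor(inputs.at(0), ctx);

    // Look up the creator that CustomPluginCreator registered under the name "Custom", version "1".
    auto* creator = getPluginRegistry()->getPluginCreator("Custom", "1");
    ASSERT(creator != nullptr && "the Custom plugin is not registered", ErrorCode::kUNSUPPORTED_NODE);

    // This minimal example passes no plugin fields; onnx attributes would be packed into the
    // PluginFieldCollection here if the plugin needed them.
    nvinfer1::PluginFieldCollection fc{0, nullptr};
    auto* plugin = creator->createPlugin(node.name().c_str(), &fc);

    // Insert the plugin layer into the network and return its output.
    auto* layer = ctx->network()->addPluginV2(&tensorPtr, 1, *plugin);
    RETURN_FIRST_OUTPUT(layer);
}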
 

Step 5:
  Compile the modified TRT source code. The build commands are the same as in step 3. After the build, copy the .so and .a files from the build directory into the lib directory of the TRT library, choosing "Replace All" when prompted.

Step 6:
Use the trtexec tool to build an engine from the modified onnx model; the result is shown in Figure 12.

Figure 12 custom.engine
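
  For reference, the invocation is of roughly this shape; --onnx and --saveEngine are standard trtexec options, and the input file name below (yolox_s_custom.onnx) is just a placeholder for whatever you saved the modified model as:

trtexec --onnx=yolox_s_custom.onnx --saveEngine=custom.engine --verbose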
 

Step 7:
How to call the custom plugin?
Figure 13 Model comparison after modifying the node in yolox_s.onnx
 

If nothing in the inference code is changed, then even though the engine was generated successfully, calling an engine with a custom plugin fails at runtime. The error is shown in Figure 14.

Figure 14 With the inference code unchanged, the program compiles, but this error appears at runtime
 

  Searching online turned up the solution (see reference link 2): the additions shown in Figures 15 and 16 need to be made at the corresponding places in the inference code, with the code in Figure 16 placed before the deserializeCudaEngine call. In addition, one line needs to be added to CMakeLists.txt, as shown in Figure 17.
Figure 15 Add the NvInferPlugin.h header file to yolox.cpp

Figure 16 Add the plugin initialization call before the deserializeCudaEngine function

Figure 17 CMakeLists.txt needs to link libnvinfer_plugin.so
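
  Put together, the inference-side changes boil down to something like the sketch below: a minimal engine-loading path rather than the full yolox.cpp, with gLogger assumed to be the nvinfer1::ILogger instance that yolox.cpp already defines.

#include <fstream>
#include <iterator>
#include <string>
#include <vector>
#include "NvInfer.h"
#include "NvInferPlugin.h"   // Figure 15: declares initLibNvInferPlugins()

nvinfer1::ICudaEngine* loadEngine(const std::string& enginePath, nvinfer1::ILogger& gLogger)
{
    // Figure 16: register every plugin in libnvinfer_plugin.so (including Custom)
    // before the engine is deserialized.
    bool didInitPlugins = initLibNvInferPlugins(&gLogger, "");
    if (!didInitPlugins)
    {
        return nullptr;
    }

    // Read the serialized engine from disk.
    std::ifstream file(enginePath, std::ios::binary);
    std::vector<char> blob((std::istreambuf_iterator<char>(file)), std::istreambuf_iterator<char>());

    nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(gLogger);
    return runtime->deserializeCudaEngine(blob.data(), blob.size());
}

// Figure 17: in CMakeLists.txt the inference target also has to link the rebuilt plugin
// library, e.g. target_link_libraries(yolox libnvinfer_plugin.so)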
 

Finally, you can happily run inference with the model that uses the custom plugin!

Figure 18 Inference results from yolox_s.engine with the custom plugin

  The above is a small TensorRT-Plugin example. In the actual project I moved yolox's preprocessing (resize and the related nodes) into the model following the same approach. Since Resize is an operator the onnx-to-TensorRT path already supports, there is no need to write any C++ plugin code; it is enough to add Shape, Gather, Concat and Resize nodes to the onnx model.
  Figure 19 visualizes the result of moving the yolox preprocessing into the model: on top of the original model, Shape, Gather, Concat and Resize nodes are added. The Concat and Resize nodes are shown in Figure 20; the output of the Concat node is the shape that the Resize node outputs (it feeds the Resize node's sizes input). The Shape, Gather and Concat nodes exist only to obtain the dynamic shape of the network input. As for why the shape is not written directly as [1, 3, 640, 640] into the sizes parameter of the Resize node: specifying it directly raises an error.
Figure 19 Left: the original yolox_s model; right: the yolox_s model after adding the preprocessing nodes
 

Figure 20 Left: the Concat node; right: the Resize node

  The point of converting the resize C++ code into a TensorRT plugin was to shorten the per-frame processing time and improve performance. After timing it, however, I found that using the resize plugin actually makes each frame take longer. My guess is that the input image has become larger (the original image is fed in directly) and NHWC has to be converted to NCHW before resizing, so with the larger input the dimension-conversion step takes more time; in addition, Nsight Systems shows that the Resize node's inference time is not short either. So I changed the optimization strategy: write all of yolox's pre- and post-processing as CUDA C kernel functions to cut the time cost and improve performance. Writing CUDA kernel functions will be covered in a future post.
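
  For a concrete picture of the dimension conversion mentioned above, a small CUDA kernel of this kind turns a packed HWC image into planar CHW; it is only an illustration of the conversion, not the exact kernel used in the project:

// Convert one HWC uint8 image into planar CHW float, one thread per pixel.
__global__ void hwcToChw(const unsigned char* src, float* dst, int h, int w, int c)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    int total = h * w;
    if (idx >= total)
        return;
    for (int k = 0; k < c; ++k)
    {
        // src is laid out as [h][w][c], dst as [c][h][w].
        dst[k * total + idx] = static_cast<float>(src[idx * c + k]);
    }
}

// Launch example: one thread per pixel, 256 threads per block.
//   int total = h * w;
//   hwcToChw<<<(total + 255) / 256, 256, 0, stream>>>(dSrc, dDst, h, w, c);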
 
 
 

References:
1. https://zhuanlan.zhihu.com/p/492144628
2. https://blog.csdn.net/a2824256/article/details/121262135?spm=1001.2101.3001.6650.11&utm_medium=distribute.pc_relevant.none-task-blog-2%7Edefault%7ECTRLIST%7ERate-11-121262135-blog-102723545.pc_relevant_default&depth_1-utm_source=distribute.pc_relevant.none-task-blog-2%7Edefault%7ECTRLIST%7ERate-11-121262135-blog-102723545.pc_relevant_default&utm_relevant_index=14
