[Model Deployment] Getting Started Tutorial (8): How to Add a TensorRT Custom Operator


Table of contents

Add TensorRT plugin to MMDeploy

Create an ONNX node

C++ implementation

Test

Summary


In the previous tutorials of this series, we showed how to deploy a PyTorch model to an inference backend such as ONNXRuntime, a process that can run into many engineering problems.

Some of these problems can be solved by creating ONNX nodes that are still executed by the backend's native implementation. For algorithms that cannot be exported to the backend at all, rewriting the code to change how the algorithm is implemented also makes it exportable to ONNX with equivalent results. These two approaches can handle most deployment problems and do not require adding anything new to the inference framework, so they are our first choice when deploying models.

However, there are still models containing operators that cannot be handled by either of the two approaches above. In that case, knowing how to implement the operator for a specific backend becomes essential. This is the third approach introduced in this article: custom plugins.

Custom plugins are the mechanism many inference frameworks provide for user-defined operators. Take MMDeploy as an example: it is an algorithm library that supports multiple inference backends. Currently supported backends are:

  • ONNXRuntime
  • TensorRT
  • ncnn
  • OpenVINO
  • PPLNN

Among them, the first three backends already implement some custom operators, for example the modulated deformable convolution for ONNXRuntime, the topk operator for ncnn, and MultiLevelRoiAlign for TensorRT.

Covering how to add custom operators for every backend would be rather involved, so this article only introduces custom operators for one backend, TensorRT. Readers interested in other backends can check their code bases; in general, each inference framework has detailed documentation on how to add custom operator implementations.

Add TensorRT plugin to MMDeploy

We again take the super-resolution model SRCNN from tutorial 2 as an example. In tutorial 2 we used ONNXRuntime as the backend and, through a PyTorch symbolic function, exported an ONNX model that supports a dynamic scale factor. That model can be run directly with ONNXRuntime, because the node exported by the NewInterpolate class is the Resize node that ONNXRuntime supports. Below we try to convert the srcnn3.onnx exported in tutorial 2 directly to TensorRT.

from mmdeploy.backend.tensorrt.utils import from_onnx 
 
from_onnx( 
    'srcnn3.onnx', 
    'srcnn3', 
    input_shapes=dict( 
        input=dict( 
            min_shape=[1, 3, 256, 256], 
            opt_shape=[1, 3, 256, 256], 
            max_shape=[1, 3, 256, 256]), 
        factor=dict( 
            min_shape=[4], 
            opt_shape=[4], 
            max_shape=[4]))) 

If you have not installed MMDeploy yet, please refer to build.md to install it first. After installation, running the above script reports the following error:

RuntimeError: Failed to parse onnx, In node 1 (importResize): UNSUPPORTED_NODE: Assertion failed: mode != "cubic" && "This version of TensorRT does not support cubic interpolation!" 
 

There are two causes for this error:

  1. The Resize node in 'srcnn3.onnx' is a native ONNX node, and one of its interpolation modes, bicubic, is not supported by TensorRT (TensorRT's Resize layer only supports nearest and bilinear interpolation). The error message in the log points this out clearly;
  2. Even after changing the mode from "bicubic" to "bilinear", the conversion still fails: RuntimeError: Failed to parse onnx, In node 1 (importResize): UNSUPPORTED_NODE: Assertion failed: scales.is_weights() && "Resize scales must be initializer!". This is because TensorRT cannot accept a dynamic scale. Both points can be confirmed with the quick inspection below.
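Both issues can be seen directly in the ONNX file. The following is only a minimal inspection sketch, assuming the onnx Python package is installed; the printed values are what we expect to observe, not part of the official workflow:

import onnx

onnx_model = onnx.load('srcnn3.onnx')
initializer_names = {init.name for init in onnx_model.graph.initializer}
for node in onnx_model.graph.node:
    if node.op_type == 'Resize':
        # the 'mode' attribute is 'cubic', which TensorRT's Resize layer rejects
        print([(attr.name, attr.s) for attr in node.attribute if attr.name == 'mode'])
        # the 'scales' input (the 3rd input of Resize-11) is produced by another
        # node instead of being an initializer, hence the second error
        print([(name, name in initializer_names) for name in node.input])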

Create an ONNX node

To solve the above problems, we need to create a new node to replace the original Resize node and implement the plugin code corresponding to this new node.

We name the new node Test::DynamicTRTResize, written in the C++ style, where Test is the domain name. The domain is mainly used to distinguish nodes with the same name that come from different sources, such as ONNX:: versus Test::. Of course, ONNX itself has no node named DynamicTRTResize.

import torch 
from torch import nn 
from torch.nn.functional import interpolate 
import torch.onnx 
import cv2 
import numpy as np 
import os, requests 
# Download checkpoint and test image 
urls = ['https://download.openmmlab.com/mmediting/restorers/srcnn/srcnn_x4k915_1x16_1000k_div2k_20200608-4186f232.pth', 
    'https://raw.githubusercontent.com/open-mmlab/mmediting/master/tests/data/face/000001.png'] 
names = ['srcnn.pth', 'face.png'] 
for url, name in zip(urls, names): 
    if not os.path.exists(name): 
        open(name, 'wb').write(requests.get(url).content) 
class DynamicTRTResize(torch.autograd.Function): 
    def __init__(self) -> None: 
        super().__init__() 
    @staticmethod 
    def symbolic(g, input, size_tensor, align_corners = False): 
        """Symbolic function for creating onnx op.""" 
        return g.op( 
            'Test::DynamicTRTResize', 
            input, 
            size_tensor, 
            align_corners_i=align_corners) 
    @staticmethod 
    def forward(g, input, size_tensor, align_corners = False): 
        """Run forward.""" 
        size = [size_tensor.size(-2), size_tensor.size(-1)] 
        return interpolate( 
            input, size=size, mode='bicubic', align_corners=align_corners) 
class StrangeSuperResolutionNet(nn.Module): 
    def __init__(self): 
        super().__init__() 
        self.conv1 = nn.Conv2d(3, 64, kernel_size=9, padding=4) 
        self.conv2 = nn.Conv2d(64, 32, kernel_size=1, padding=0) 
        self.conv3 = nn.Conv2d(32, 3, kernel_size=5, padding=2) 
        self.relu = nn.ReLU() 
    def forward(self, x, size_tensor): 
        x = DynamicTRTResize.apply(x, size_tensor) 
        out = self.relu(self.conv1(x)) 
        out = self.relu(self.conv2(out)) 
        out = self.conv3(out) 
        return out 
def init_torch_model(): 
    torch_model = StrangeSuperResolutionNet() 
    state_dict = torch.load('srcnn.pth')['state_dict'] 
    # Adapt the checkpoint 
    for old_key in list(state_dict.keys()): 
        new_key = '.'.join(old_key.split('.')[1:]) 
        state_dict[new_key] = state_dict.pop(old_key) 
    torch_model.load_state_dict(state_dict) 
    torch_model.eval() 
    return torch_model 
model = init_torch_model() 
factor = torch.rand([1, 1, 512, 512], dtype=torch.float) 
input_img = cv2.imread('face.png').astype(np.float32) 
# HWC to NCHW 
input_img = np.transpose(input_img, [2, 0, 1]) 
input_img = np.expand_dims(input_img, 0) 
# Inference 
torch_output = model(torch.from_numpy(input_img), factor).detach().numpy() 
# NCHW to HWC 
torch_output = np.squeeze(torch_output, 0) 
torch_output = np.clip(torch_output, 0, 255) 
torch_output = np.transpose(torch_output, [1, 2, 0]).astype(np.uint8) 
# Show image 
cv2.imwrite("face_torch.png", torch_output) 
x = torch.randn(1, 3, 256, 256) 
dynamic_axes={ 
        'input': { 
            0: 'batch', 
            2: 'height', 
            3: 'width' 
        }, 
        'factor': { 
            0: 'batch1', 
            2: 'height1', 
            3: 'width1' 
        }, 
        'output': { 
            0: 'batch2', 
            2: 'height2', 
            3: 'width2' 
        }, 
    } 
with torch.no_grad(): 
    torch.onnx.export( 
        model, (x, factor), 
        "srcnn3.onnx", 
        opset_version=11, 
        input_names=['input', 'factor'], 
        output_names=['output'], 
        dynamic_axes=dynamic_axes) 
 

Running the above script, we successfully export the ONNX model srcnn3.onnx. Opening it with netron, we can see the new DynamicTRTResize node; a quick programmatic check is sketched below as well.
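If netron is not at hand, the new node can also be confirmed programmatically. This is just a short sketch, again assuming the onnx Python package; it only prints the operator types and domains in the graph:

import onnx

onnx_model = onnx.load('srcnn3.onnx')
# the resize node should now appear as op_type 'DynamicTRTResize' with domain 'Test'
print([(node.op_type, node.domain) for node in onnx_model.graph.node])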

Converting this model directly to a TensorRT model is still not feasible, because TensorRT cannot yet parse the DynamicTRTResize node. To make the node parsable, we must add C++ code to TensorRT that implements the plugin.

C++ implementation

Because the bicubic interpolation operator has already been implemented in MMDeploy, we can reuse part of its CUDA code and only need to implement the TensorRT plugin that supports a dynamic scale. Readers interested in CUDA programming can refer to the official CUDA tutorial.

Since csrc/backend_ops/tensorrt/bicubic_interpolate already contains the CUDA code we need, we can add the TensorRT-related files trt_dynamic_resize.hpp and trt_dynamic_resize.cpp directly in this folder, declaring and implementing the plugin in these two files respectively. Alternatively, we can create a new folder csrc/backend_ops/tensorrt/dynamic_resize and put the two files there.

For TensorRT 7+, implementing such a custom plugin requires writing two classes:

  • DynamicTRTResize, inherited from nvinfer1::IPluginV2DynamicExt, which contains the concrete implementation of the plugin.
  • DynamicTRTResizeCreator, inherited from nvinfer1::IPluginCreator, which is the plugin factory class used to create DynamicTRTResize plugin instances.

In MMDeploy, since several plugins need to be implemented, we provide two classes in mmdeploy/csrc/backend_ops/tensorrt/common/trt_plugin_base.hpp, TRTPluginBase and TRTPluginCreatorBase, which manage the properties and methods common to all plugins.

Among them, TRTPluginBase inherits from nvinfer1::IPluginV2DynamicExt, and TRTPluginCreatorBase inherits from nvinfer1::IPluginCreator. This way, users only need to inherit from these two new classes when implementing a plugin. So in the .hpp file under the dynamic_resize folder we only need to include the header trt_plugin_base.hpp, and the inheritance looks like this:

class DynamicTRTResize : public TRTPluginBase{} 
class DynamicTRTResizeCreator : public TRTPluginCreatorBase{} 

In trt_dynamic_resize.hpp, we declare the following:

#ifndef TRT_DYNAMIC_RESIZE_HPP 
#define TRT_DYNAMIC_RESIZE_HPP 
#include <cublas_v2.h> 
#include <memory> 
#include <string> 
#include <vector> 
#include "trt_plugin_base.hpp" 
namespace mmdeploy { 
class DynamicTRTResize : public TRTPluginBase { 
 public: 
  DynamicTRTResize(const std::string &name, bool align_corners); 
  DynamicTRTResize(const std::string name, const void *data, size_t length); 
  DynamicTRTResize() = delete; 
  // IPluginV2DynamicExt Methods 
  nvinfer1::IPluginV2DynamicExt *clone() const TRT_NOEXCEPT override; 
  nvinfer1::DimsExprs getOutputDimensions(int outputIndex, const nvinfer1::DimsExprs *inputs, 
                                          int nbInputs, nvinfer1::IExprBuilder &exprBuilder) 
      TRT_NOEXCEPT override; 
  bool supportsFormatCombination(int pos, const nvinfer1::PluginTensorDesc *ioDesc, int nbInputs, 
                                 int nbOutputs) TRT_NOEXCEPT override; 
  void configurePlugin(const nvinfer1::DynamicPluginTensorDesc *in, int nbInputs, 
                       const nvinfer1::DynamicPluginTensorDesc *out, 
                       int nbOutputs) TRT_NOEXCEPT override; 
  size_t getWorkspaceSize(const nvinfer1::PluginTensorDesc *inputs, int nbInputs, 
                          const nvinfer1::PluginTensorDesc *outputs, 
                          int nbOutputs) const TRT_NOEXCEPT override; 
  int enqueue(const nvinfer1::PluginTensorDesc *inputDesc, 
              const nvinfer1::PluginTensorDesc *outputDesc, const void *const *inputs, 
              void *const *outputs, void *workspace, cudaStream_t stream) TRT_NOEXCEPT override; 
  // IPluginV2Ext Methods 
  nvinfer1::DataType getOutputDataType(int index, const nvinfer1::DataType *inputTypes, 
                                       int nbInputs) const TRT_NOEXCEPT override; 
  // IPluginV2 Methods 
  const char *getPluginType() const TRT_NOEXCEPT override; 
  const char *getPluginVersion() const TRT_NOEXCEPT override; 
  int getNbOutputs() const TRT_NOEXCEPT override; 
  size_t getSerializationSize() const TRT_NOEXCEPT override; 
  void serialize(void *buffer) const TRT_NOEXCEPT override; 
 private: 
  bool mAlignCorners; 
}; 
class DynamicTRTResizeCreator : public TRTPluginCreatorBase { 
 public: 
  DynamicTRTResizeCreator(); 
  const char *getPluginName() const TRT_NOEXCEPT override; 
  const char *getPluginVersion() const TRT_NOEXCEPT override; 
  nvinfer1::IPluginV2 *createPlugin(const char *name, const nvinfer1::PluginFieldCollection *fc) 
      TRT_NOEXCEPT override; 
  nvinfer1::IPluginV2 *deserializePlugin(const char *name, const void *serialData, 
                                         size_t serialLength) TRT_NOEXCEPT override; 
}; 
}  // namespace mmdeploy 
#endif  // TRT_DYNAMIC_RESIZE_HPP 
 

In this header file, the DynamicTRTResize class sits at the end of a nesting-doll inheritance chain: DynamicTRTResize inherits from TRTPluginBase, which inherits from nvinfer1::IPluginV2DynamicExt, which in turn derives from nvinfer1::IPluginV2Ext and nvinfer1::IPluginV2.

From this inheritance chain and the code above, we can see that the plugin class DynamicTRTResize defines a single private member, mAlignCorners, which indicates whether align_corners is enabled. Beyond that, we only need to implement the constructors and the methods of the three TensorRT base classes. There are two constructors, one for creating the plugin and one for deserializing it (the parameterless default constructor is deleted). As for the base class methods:

  1. The methods from IPluginV2DynamicExt deserve the most attention: getOutputDimensions infers the shape of the output tensor, while enqueue actually executes our algorithm, usually by launching a CUDA kernel internally. The plugin implemented in this article directly calls the kernel function bicubic_interpolate defined in MMDeploy under csrc/backend_ops/tensorrt/bicubic_interpolate.
  2. For IPluginV2Ext, we only need to implement getOutputDataType, which returns the data type of the output.
  3. IPluginV2 contributes the methods that return the plugin type and version number, plus serialize, which serializes the plugin's parameters, getSerializationSize, which computes the size of the serialization buffer, and getNbOutputs, which returns the number of output tensors. Some other shared methods are already provided by the TRTPluginBase class.

In the plugin factory class DynamicTRTResizeCreator, we need to declare getPluginName and getPluginVersion, which return the plugin name and version, as well as createPlugin and deserializePlugin, which create and deserialize plugin instances: the former calls the creating constructor of DynamicTRTResize, and the latter calls its deserializing constructor.

Next, let's implement these declarations. The code in the .cpp file is as follows:

// Copyright (c) OpenMMLab. All rights reserved 
#include "trt_dynamic_resize.hpp" 
#include <assert.h> 
#include <chrono> 
#include "trt_plugin_helper.hpp" 
#include "trt_serialize.hpp" 
// get the reference to the kernel function bicubic_interpolate, which will be used in enqueue 
#include "../bicubic_interpolate/trt_bicubic_interpolate_kernel.hpp" 
using namespace nvinfer1; 
namespace mmdeploy { 
namespace { 
static const char *PLUGIN_VERSION{"1"}; 
static const char *PLUGIN_NAME{"DynamicTRTResize"};  // plugin name == ONNX node name, matched when building the engine 
}  // namespace 
DynamicTRTResize::DynamicTRTResize(const std::string &name, bool align_corners) 
    : TRTPluginBase(name), mAlignCorners(align_corners) {} 
DynamicTRTResize::DynamicTRTResize(const std::string name, const void *data, 
                                             size_t length) 
    : TRTPluginBase(name) { 
  deserialize_value(&data, &length, &mAlignCorners); 
} 
nvinfer1::IPluginV2DynamicExt *DynamicTRTResize::clone() const TRT_NOEXCEPT { 
  DynamicTRTResize *plugin = 
      new DynamicTRTResize(mLayerName, mAlignCorners); 
  plugin->setPluginNamespace(getPluginNamespace()); 
  return plugin; 
} 
nvinfer1::DimsExprs DynamicTRTResize::getOutputDimensions( 
    int outputIndex, const nvinfer1::DimsExprs *inputs, int nbInputs, 
    nvinfer1::IExprBuilder &exprBuilder) TRT_NOEXCEPT { 
  nvinfer1::DimsExprs ret; 
  ret.nbDims = 4; 
  // two input tensors: input and size_tensor; the latter is used for shape inference only 
  ret.d[0] = inputs[0].d[0]; 
  ret.d[1] = inputs[0].d[1]; 
  ret.d[2] = inputs[1].d[2]; 
  ret.d[3] = inputs[1].d[3]; 
  return ret; 
} 
bool DynamicTRTResize::supportsFormatCombination(int pos, 
                                                      const nvinfer1::PluginTensorDesc *ioDesc, 
                                                      int nbInputs, int nbOutputs) TRT_NOEXCEPT { 
  if (pos == 0) { 
    return (ioDesc[pos].type == nvinfer1::DataType::kFLOAT && 
            ioDesc[pos].format == nvinfer1::TensorFormat::kLINEAR); 
  } else { 
    return ioDesc[pos].type == ioDesc[0].type && ioDesc[pos].format == ioDesc[0].format; 
  } 
} 
void DynamicTRTResize::configurePlugin(const nvinfer1::DynamicPluginTensorDesc *inputs, 
                                            int nbInputs, 
                                            const nvinfer1::DynamicPluginTensorDesc *outputs, 
                                            int nbOutputs) TRT_NOEXCEPT {} 
size_t DynamicTRTResize::getWorkspaceSize(const nvinfer1::PluginTensorDesc *inputs, 
                                               int nbInputs, 
                                               const nvinfer1::PluginTensorDesc *outputs, 
                                               int nbOutputs) const TRT_NOEXCEPT { 
  return 0; 
} 
int DynamicTRTResize::enqueue(const nvinfer1::PluginTensorDesc *inputDesc, 
                                   const nvinfer1::PluginTensorDesc *outputDesc, 
                                   const void *const *inputs, void *const *outputs, void *workSpace, 
                                   cudaStream_t stream) TRT_NOEXCEPT { 
  int batch = inputDesc[0].dims.d[0]; 
  int channels = inputDesc[0].dims.d[1]; 
  int height = inputDesc[0].dims.d[2]; 
  int width = inputDesc[0].dims.d[3]; 
  int height_out = outputDesc[0].dims.d[2]; 
  int width_out = outputDesc[0].dims.d[3]; 
  const void *x = inputs[0]; 
  void *output = outputs[0]; 
  // TODO: add fp16 support 
  auto data_type = inputDesc[0].type; 
  switch (data_type) { 
    case nvinfer1::DataType::kFLOAT: 
      bicubic_interpolate<float>((float *)x, (float *)output, batch, channels, height, width, 
                                 height_out, width_out, mAlignCorners, stream); 
      break; 
    default: 
      return 1; 
      break; 
  } 
  return 0; 
} 
nvinfer1::DataType DynamicTRTResize::getOutputDataType(int index, 
                                                            const nvinfer1::DataType *inputTypes, 
                                                            int nbInputs) const TRT_NOEXCEPT { 
  return inputTypes[0]; 
} 
// IPluginV2 Methods 
const char *DynamicTRTResize::getPluginType() const TRT_NOEXCEPT { return PLUGIN_NAME; } 
const char *DynamicTRTResize::getPluginVersion() const TRT_NOEXCEPT { return PLUGIN_VERSION; } 
int DynamicTRTResize::getNbOutputs() const TRT_NOEXCEPT { return 1; } 
size_t DynamicTRTResize::getSerializationSize() const TRT_NOEXCEPT { 
  return serialized_size(mAlignCorners); 
} 
void DynamicTRTResize::serialize(void *buffer) const TRT_NOEXCEPT { 
  serialize_value(&buffer, mAlignCorners); 
} 
// Creator 
DynamicTRTResizeCreator::DynamicTRTResizeCreator() { 
  mPluginAttributes.clear(); 
  mPluginAttributes.emplace_back(nvinfer1::PluginField("align_corners")); 
  mFC.nbFields = mPluginAttributes.size(); 
  mFC.fields = mPluginAttributes.data(); 
} 
const char *DynamicTRTResizeCreator::getPluginName() const TRT_NOEXCEPT { return PLUGIN_NAME; } 
const char *DynamicTRTResizeCreator::getPluginVersion() const TRT_NOEXCEPT { 
  return PLUGIN_VERSION; 
} 
nvinfer1::IPluginV2 *DynamicTRTResizeCreator::createPlugin( 
    const char *name, const nvinfer1::PluginFieldCollection *fc) TRT_NOEXCEPT { 
  nvinfer1::Dims size{2, {1, 1}}; 
  bool align_corners = 1; 
  for (int i = 0; i < fc->nbFields; i++) { 
    if (fc->fields[i].data == nullptr) { 
      continue; 
    } 
    std::string field_name(fc->fields[i].name); 
    if (field_name.compare("align_corners") == 0) { 
      align_corners = static_cast<const int *>(fc->fields[i].data)[0]; 
    } 
  } 
  // create the instance of DynamicTRTResize 
  DynamicTRTResize *plugin = new DynamicTRTResize(name, align_corners); 
  plugin->setPluginNamespace(getPluginNamespace()); 
  return plugin; 
} 
nvinfer1::IPluginV2 *DynamicTRTResizeCreator::deserializePlugin( 
    const char *name, const void *serialData, size_t serialLength) TRT_NOEXCEPT { 
  auto plugin = new DynamicTRTResize(name, serialData, serialLength); 
  plugin->setPluginNamespace(getPluginNamespace()); 
  return plugin; 
} 
REGISTER_TENSORRT_PLUGIN(DynamicTRTResizeCreator);//register the plugin 
}  // namespace mmdeploy 
 

Then we rebuild MMDeploy's TensorRT custom-op library build/lib/libmmdeploy_tensorrt_ops.so. A successful build generally means the operator has been registered, but we still need to run some tests to make sure the results are correct.

Test

Let's use TensorRT's Python API to check the current list of plugins:

import tensorrt as trt 
from mmdeploy.backend.tensorrt import load_tensorrt_plugin 
load_tensorrt_plugin() 
def get_plugin_names(): 
    return [pc.name for pc in trt.get_plugin_registry().plugin_creator_list] 
print(get_plugin_names()) 

You can find 'DynamicTRTResize' in the plugin list. Next we run a functional test on this plugin to check whether its inference result is consistent with PyTorch and whether the output size can be controlled dynamically.

import numpy as np 
import torch 
 
from mmdeploy.backend.tensorrt import create_trt_engine, save_trt_engine 
 
# `x` and `model` are reused from the export script above 
engine = create_trt_engine( 
        'srcnn3.onnx', 
        input_shapes=dict(input = dict( 
            min_shape=[1, 3, 256, 256], 
            opt_shape=[1, 3, 256, 256], 
            max_shape=[1, 3, 256, 256]), 
            factor = dict(min_shape = [1, 1, 256, 256], opt_shape = [1, 1, 512, 512], max_shape = [1, 1, 1024, 1024]))) 
save_trt_engine(engine, 'srcnn3.engine') 
from mmdeploy.backend.tensorrt import TRTWrapper 
trt_model = TRTWrapper('srcnn3.engine', ['output']) 
factor = torch.rand([1, 1, 768, 768], dtype=torch.float) 
trt_output = trt_model.forward(dict(input = x.cuda(), factor = factor.cuda())) 
torch_output = model.forward(x, factor) 
assert np.allclose(trt_output['output'].cpu().numpy(), torch_output.cpu().detach(), rtol = 1e-3, atol = 1e-5) 

If the assertion passes without error, the TensorRT output is consistent with PyTorch and the inference is correct. In addition, the factor size used here (768x768) differs from the one used at export time (512x512), yet the result still matches PyTorch, which shows that dynamic sizes are supported. One more informal check of the dynamic behaviour is sketched below.
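Continuing the same Python session (x, model and trt_model are already defined), we can feed a factor of yet another size; 384x384 is an arbitrary choice within the engine's min/max range used only for this sketch, and the output resolution should follow it:

factor_new = torch.rand([1, 1, 384, 384], dtype=torch.float)
trt_output_new = trt_model.forward(dict(input=x.cuda(), factor=factor_new.cuda()))
print(trt_output_new['output'].shape)   # expected: torch.Size([1, 3, 384, 384])
torch_output_new = model.forward(x, factor_new)
print(torch_output_new.shape)           # should match the TensorRT output shape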

Summary

In this tutorial, we described how to add a custom TensorRT plugin to the MMDeploy code base. The whole process does not involve much complicated CUDA programming, and we believe that after working through it you will be able to implement the plugins you need by yourself.

https://github.com/open-mmlab/mmdeploy

So far, our introductory series on model deployment has run for eight installments, and it may pause here for now. Thank you all for your support! We will continue to publish advanced tutorials on model deployment in the future, so stay tuned. All content related to model deployment is organized in the column below; you are welcome to follow it.

Those things about model deployment: www.zhihu.com/column/c_1497987564452114432

 
