Add TensorRT plugin to MMDeploy
In the previous tutorials of this model deployment series, we saw that deploying a PyTorch model to an inference backend such as ONNXRuntime can run into many engineering problems.
Some of them can be solved by creating ONNX nodes that still use the backend's native implementation for inference. For algorithms that cannot be exported to the backend directly, we can rewrite the code to change how the algorithm is implemented, so that it exports to ONNX with an equivalent result. These two methods handle most deployment problems without introducing anything new into the inference framework, which makes them our preferred choice when deploying models.
However, some operators in some models cannot be handled by either method. In that case we have to implement the operator for a specific backend ourselves. This is the third approach, and the one this article introduces: custom plug-ins.
Custom plug-ins are how many inference frameworks support user-defined operators. MMDeploy, for example, is a library that supports multiple inference backends. The currently supported backends are:
- ONNXRuntime
- TensorRT
- ncnn
- OpenVINO
- PPLNN
Among them, the first three backends implement some custom operators, for example the modulated deformable convolution in ONNXRuntime, the topk operator in ncnn, and MultiLevelRoiAlign in TensorRT.
Explaining how to add custom operators for every backend would be lengthy, so this article covers only one backend, TensorRT. Readers interested in other backends can consult their code bases; most inference frameworks document in detail how to add custom operator implementations.
Add TensorRT plugin to MMDeploy
We again take the super-resolution model SRCNN from tutorial 2 as an example. In tutorial 2, we used ONNXRuntime as the backend and exported an ONNX model that supports a dynamic scale through a PyTorch symbolic function. That model runs directly in ONNXRuntime, because the nodes exported by the NewInterpolate class are Resize nodes, which ONNXRuntime supports. Below we try to convert srcnn3.onnx, exported in tutorial 2, directly to TensorRT.
from mmdeploy.backend.tensorrt.utils import from_onnx

from_onnx(
    'srcnn3.onnx',
    'srcnn3',
    input_shapes=dict(
        input=dict(
            min_shape=[1, 3, 256, 256],
            opt_shape=[1, 3, 256, 256],
            max_shape=[1, 3, 256, 256]),
        factor=dict(
            min_shape=[4],
            opt_shape=[4],
            max_shape=[4])))
Readers who have not installed MMDeploy can follow build.md to install it first. After installation, running the script above reports the following error:
RuntimeError: Failed to parse onnx, In node 1 (importResize): UNSUPPORTED_NODE: Assertion failed: mode != "cubic" && "This version of TensorRT does not support cubic interpolation!"
There are two reasons for the error:
- The Resize node in 'srcnn3.onnx' is a native ONNX node. One of its interpolation modes, bicubic, is not supported by this version of TensorRT (its Resize layer only supports nearest and bilinear interpolation). The error message in the log states this clearly.
- Even after changing the mode from "bicubic" to "bilinear", the conversion still fails with the error below. This is because TensorRT cannot accept a dynamic scale.
RuntimeError: Failed to parse onnx, In node 1 (importResize): UNSUPPORTED_NODE: Assertion failed: scales.is_weights() && "Resize scales must be initializer!"
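The two failure conditions can be summarized in a small check. This is only a toy model of the behaviour described above: `ResizeNode` and `trt_resize_problems` are hypothetical names introduced for illustration, not ONNX or TensorRT APIs, and the rules encode only what the two error messages say.

```python
from dataclasses import dataclass


@dataclass
class ResizeNode:
    """Hypothetical, simplified description of an ONNX Resize node."""
    mode: str                    # "nearest", "linear" or "cubic"
    scales_is_initializer: bool  # True if the scales are constants stored in the model


def trt_resize_problems(node):
    """Return the reasons why the conversion described above would fail."""
    problems = []
    if node.mode == "cubic":
        problems.append("cubic interpolation is not supported")
    if not node.scales_is_initializer:
        problems.append("scales must be an initializer, not a runtime input")
    return problems


# The Resize node in srcnn3.onnx: bicubic mode with a dynamic scale input.
print(trt_resize_problems(ResizeNode(mode="cubic", scales_is_initializer=False)))
# A bilinear Resize node with a fixed scale passes both checks.
print(trt_resize_problems(ResizeNode(mode="linear", scales_is_initializer=True)))
```

Both conditions have to be removed for the conversion to succeed, which is why changing the interpolation mode alone is not enough.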
Create an ONNX node
To solve the above problems, we need to create a new node to replace the original Resize node, and implement the plug-in code corresponding to the new node.
The new node is named Test::DynamicTRTResize. The name uses C++-style notation: Test is the domain name, which distinguishes nodes with the same name that come from different sources, such as ONNX:: and Test::. ONNX itself, of course, has no node named DynamicTRTResize.
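To make the role of the domain concrete, here is a small sketch that splits a qualified name into its domain and operator name. `split_op_name` is a hypothetical helper written for this article; it only mimics the naming convention and is not part of ONNX or PyTorch.

```python
def split_op_name(qualified_name, default_domain=""):
    """Split 'Domain::OpName' into (domain, op_name).

    A name without '::' is assigned the default domain, written here as
    the empty string.
    """
    domain, sep, op_name = qualified_name.rpartition("::")
    if not sep:
        return default_domain, qualified_name
    return domain, op_name


print(split_op_name("Test::DynamicTRTResize"))
print(split_op_name("Resize"))

# Two operators with the same name stay distinct when their domains differ.
known_ops = {split_op_name("Resize"), split_op_name("Test::Resize")}
print(len(known_ops))
```

Because the pair (domain, name) identifies the operator, a custom node never collides with a native ONNX node of the same name.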
import torch
from torch import nn
from torch.nn.functional import interpolate
import torch.onnx
import cv2
import numpy as np
import os
import requests

# Download checkpoint and test image
urls = ['https://download.openmmlab.com/mmediting/restorers/srcnn/srcnn_x4k915_1x16_1000k_div2k_20200608-4186f232.pth',
        'https://raw.githubusercontent.com/open-mmlab/mmediting/master/tests/data/face/000001.png']
names = ['srcnn.pth', 'face.png']
for url, name in zip(urls, names):
    if not os.path.exists(name):
        with open(name, 'wb') as f:
            f.write(requests.get(url).content)


class DynamicTRTResize(torch.autograd.Function):

    def __init__(self) -> None:
        super().__init__()

    @staticmethod
    def symbolic(g, input, size_tensor, align_corners=False):
        """Symbolic function for creating onnx op."""
        return g.op(
            'Test::DynamicTRTResize',
            input,
            size_tensor,
            align_corners_i=align_corners)

    @staticmethod
    def forward(ctx, input, size_tensor, align_corners=False):
        """Run forward."""
        # Only the shape of size_tensor is used; its values are ignored.
        size = [size_tensor.size(-2), size_tensor.size(-1)]
        return interpolate(
            input, size=size, mode='bicubic', align_corners=align_corners)


class StrangeSuperResolutionNet(nn.Module):

    def __init__(self):
        super().__init__()

        self.conv1 = nn.Conv2d(3, 64, kernel_size=9, padding=4)
        self.conv2 = nn.Conv2d(64, 32, kernel_size=1, padding=0)
        self.conv3 = nn.Conv2d(32, 3, kernel_size=5, padding=2)

        self.relu = nn.ReLU()

    def forward(self, x, size_tensor):
        x = DynamicTRTResize.apply(x, size_tensor)
        out = self.relu(self.conv1(x))
        out = self.relu(self.conv2(out))
        out = self.conv3(out)
        return out


def init_torch_model():
    torch_model = StrangeSuperResolutionNet()

    state_dict = torch.load('srcnn.pth')['state_dict']

    # Adapt the checkpoint: drop the first component of every key
    for old_key in list(state_dict.keys()):
        new_key = '.'.join(old_key.split('.')[1:])
        state_dict[new_key] = state_dict.pop(old_key)

    torch_model.load_state_dict(state_dict)
    torch_model.eval()
    return torch_model


model = init_torch_model()
# The shape of factor (not its values) sets the output height and width
factor = torch.rand([1, 1, 512, 512], dtype=torch.float)

input_img = cv2.imread('face.png').astype(np.float32)

# HWC to NCHW
input_img = np.transpose(input_img, [2, 0, 1])
input_img = np.expand_dims(input_img, 0)

# Inference
torch_output = model(torch.from_numpy(input_img), factor).detach().numpy()

# NCHW to HWC
torch_output = np.squeeze(torch_output, 0)
torch_output = np.clip(torch_output, 0, 255)
torch_output = np.transpose(torch_output, [1, 2, 0]).astype(np.uint8)

# Save image
cv2.imwrite("face_torch.png", torch_output)

x = torch.randn(1, 3, 256, 256)

dynamic_axes = {
    'input': {
        0: 'batch',
        2: 'height',
        3: 'width'
    },
    'factor': {
        0: 'batch1',
        2: 'height1',
        3: 'width1'
    },
    'output': {
        0: 'batch2',
        2: 'height2',
        3: 'width2'
    },
}

with torch.no_grad():
    torch.onnx.export(
        model, (x, factor),
        "srcnn3.onnx",
        opset_version=11,
        input_names=['input', 'factor'],
        output_names=['output'],
        dynamic_axes=dynamic_axes)
Executing the above script exports an ONNX model, srcnn3.onnx. Opening this model in Netron confirms that it contains the new DynamicTRTResize node.
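The dynamic_axes argument in the export script marks which dimensions of each tensor may vary at run time. The sketch below only illustrates that effect on the shapes used in the script; `symbolic_shape` is a hypothetical helper, not how torch.onnx implements the feature.

```python
def symbolic_shape(concrete_shape, axis_names):
    """Replace the dimensions listed in axis_names by their symbolic names."""
    return [axis_names.get(i, dim) for i, dim in enumerate(concrete_shape)]


dynamic_axes = {
    'input': {0: 'batch', 2: 'height', 3: 'width'},
    'factor': {0: 'batch1', 2: 'height1', 3: 'width1'},
    'output': {0: 'batch2', 2: 'height2', 3: 'width2'},
}

# Concrete shapes used when tracing the model in the export script.
traced_shapes = {
    'input': (1, 3, 256, 256),
    'factor': (1, 1, 512, 512),
    'output': (1, 3, 512, 512),
}

for name, shape in traced_shapes.items():
    print(name, symbolic_shape(shape, dynamic_axes[name]))
```

Only the channel dimension keeps a fixed value; batch, height and width of every tensor become named, variable dimensions.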
Converting this model directly to TensorRT does not work either, because TensorRT cannot yet parse DynamicTRTResize nodes. To parse the node, we must write C++ code that implements a TensorRT plug-in.
C++ implementation
Because the bicubic interpolation operator is already implemented in MMDeploy, we can reuse its CUDA code and only implement a TensorRT plug-in that supports a dynamic scale. Readers interested in CUDA programming can refer to the official CUDA tutorial.
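For readers who want to see what bicubic interpolation computes without reading CUDA, here is a one-dimensional pure Python sketch. It assumes the cubic convolution coefficient A = -0.75 and the coordinate mapping used by PyTorch's bicubic mode; whether the MMDeploy kernel matches it in every detail is an assumption, and the function names are illustrative only. The two-dimensional case applies the same weights along rows and then along columns.

```python
import math

A = -0.75  # cubic convolution coefficient (assumed, as in PyTorch's bicubic mode)


def cubic_weights(t):
    """Weights of the four neighbouring samples for a fractional offset t in [0, 1)."""
    def near(x):  # kernel for |x| <= 1
        return ((A + 2) * x - (A + 3)) * x * x + 1

    def far(x):  # kernel for 1 < |x| < 2
        return ((A * x - 5 * A) * x + 8 * A) * x - 4 * A

    return [far(t + 1), near(t), near(1 - t), far(2 - t)]


def resize_1d(src, out_len, align_corners=False):
    """Resize a list of samples to out_len values with cubic interpolation."""
    in_len = len(src)
    out = []
    for i in range(out_len):
        # Map the output index to a (fractional) position in the input.
        if align_corners:
            pos = i * (in_len - 1) / (out_len - 1) if out_len > 1 else 0.0
        else:
            pos = (i + 0.5) * in_len / out_len - 0.5
        base = math.floor(pos)
        t = pos - base
        value = 0.0
        for k, w in enumerate(cubic_weights(t)):
            idx = min(max(base - 1 + k, 0), in_len - 1)  # clamp at the borders
            value += w * src[idx]
        out.append(value)
    return out


print(resize_1d([0.0, 1.0, 2.0, 3.0], 8))
```

The four weights always sum to one, so a constant signal stays constant after resizing, and resizing to the original length returns the input unchanged.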
Because the CUDA code we need is in csrc/backend_ops/tensorrt/bicubic_interpolate, we can add the TensorRT-related files trt_dynamic_resize.hpp and trt_dynamic_resize.cpp directly to this folder, declaring the plug-in in the former and implementing it in the latter. Alternatively, we can create a new folder csrc/backend_ops/tensorrt/dynamic_resize and put the two files there.
For TensorRT 7+, implementing such a custom plug-in requires two classes:
- DynamicTRTResize, which inherits from nvinfer1::IPluginV2DynamicExt and contains the actual implementation of the plug-in.
- DynamicTRTResizeCreator, which inherits from nvinfer1::IPluginCreator and is the factory class that creates DynamicTRTResize plug-in instances.
In MMDeploy, since several plug-ins need to be implemented, we define two classes, TRTPluginBase and TRTPluginCreatorBase, in mmdeploy/csrc/backend_ops/tensorrt/common/trt_plugin_base.hpp to manage the properties and methods common to all plug-ins.
TRTPluginBase inherits from nvinfer1::IPluginV2DynamicExt, and TRTPluginCreatorBase inherits from nvinfer1::IPluginCreator. Users therefore only need to inherit from these two classes when implementing a plug-in. So in the .hpp file under the dynamic_resize folder we only need to include the header trt_plugin_base.hpp, and the inheritance is as follows:
class DynamicTRTResize : public TRTPluginBase {};
class DynamicTRTResizeCreator : public TRTPluginCreatorBase {};
In trt_dynamic_resize.hpp, we declare the following:
#ifndef TRT_DYNAMIC_RESIZE_HPP
#define TRT_DYNAMIC_RESIZE_HPP
#include <cublas_v2.h>

#include <memory>
#include <string>
#include <vector>

#include "trt_plugin_base.hpp"
namespace mmdeploy {
class DynamicTRTResize : public TRTPluginBase {
 public:
  DynamicTRTResize(const std::string &name, bool align_corners);

  DynamicTRTResize(const std::string name, const void *data, size_t length);

  DynamicTRTResize() = delete;

  // IPluginV2DynamicExt Methods
  nvinfer1::IPluginV2DynamicExt *clone() const TRT_NOEXCEPT override;
  nvinfer1::DimsExprs getOutputDimensions(int outputIndex, const nvinfer1::DimsExprs *inputs,
                                          int nbInputs, nvinfer1::IExprBuilder &exprBuilder)
      TRT_NOEXCEPT override;
  bool supportsFormatCombination(int pos, const nvinfer1::PluginTensorDesc *ioDesc, int nbInputs,
                                 int nbOutputs) TRT_NOEXCEPT override;
  void configurePlugin(const nvinfer1::DynamicPluginTensorDesc *in, int nbInputs,
                       const nvinfer1::DynamicPluginTensorDesc *out,
                       int nbOutputs) TRT_NOEXCEPT override;
  size_t getWorkspaceSize(const nvinfer1::PluginTensorDesc *inputs, int nbInputs,
                          const nvinfer1::PluginTensorDesc *outputs,
                          int nbOutputs) const TRT_NOEXCEPT override;
  int enqueue(const nvinfer1::PluginTensorDesc *inputDesc,
              const nvinfer1::PluginTensorDesc *outputDesc, const void *const *inputs,
              void *const *outputs, void *workspace, cudaStream_t stream) TRT_NOEXCEPT override;

  // IPluginV2Ext Methods
  nvinfer1::DataType getOutputDataType(int index, const nvinfer1::DataType *inputTypes,
                                       int nbInputs) const TRT_NOEXCEPT override;

  // IPluginV2 Methods
  const char *getPluginType() const TRT_NOEXCEPT override;
  const char *getPluginVersion() const TRT_NOEXCEPT override;
  int getNbOutputs() const TRT_NOEXCEPT override;
  size_t getSerializationSize() const TRT_NOEXCEPT override;
  void serialize(void *buffer) const TRT_NOEXCEPT override;

 private:
  bool mAlignCorners;
};

class DynamicTRTResizeCreator : public TRTPluginCreatorBase {
 public:
  DynamicTRTResizeCreator();

  const char *getPluginName() const TRT_NOEXCEPT override;

  const char *getPluginVersion() const TRT_NOEXCEPT override;
  nvinfer1::IPluginV2 *createPlugin(const char *name, const nvinfer1::PluginFieldCollection *fc)
      TRT_NOEXCEPT override;

  nvinfer1::IPluginV2 *deserializePlugin(const char *name, const void *serialData,
                                         size_t serialLength) TRT_NOEXCEPT override;
};
}  // namespace mmdeploy
#endif  // TRT_DYNAMIC_RESIZE_HPP
In this header file, the DynamicTRTResize class sits at the end of a nested chain of inheritance: DynamicTRTResize inherits from TRTPluginBase, which inherits from IPluginV2DynamicExt, which inherits from IPluginV2Ext, which in turn inherits from IPluginV2.
From this chain and the code above, we can see that the plug-in class DynamicTRTResize defines a private member mAlignCorners, which indicates whether to align corners. Beyond that, we only need to implement the constructors, the destructor, and the methods of the three TensorRT base classes. There are two constructors, used to create the plug-in and to deserialize it, respectively. Among the base-class methods:
- The methods of the base class IPluginV2DynamicExt deserve the most attention. getOutputDimensions obtains the shape of the output tensor, and enqueue is where our algorithm actually runs, usually by calling a CUDA kernel. The plug-in in this article directly calls the kernel function bicubic_interpolate, which MMDeploy defines under csrc/backend_ops/tensorrt/bicubic_interpolate.
- For the base class IPluginV2Ext, we only need to implement getOutputDataType, which returns the output data type.
- For the base class IPluginV2, we implement the methods that return the plug-in type and version number, serialize, which writes the plug-in's parameters into a buffer, getSerializationSize, which computes the size of that serialized buffer, and getNbOutputs, which returns the number of output tensors. Some further common methods are defined in the TRTPluginBase class.
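The shape rule that getOutputDimensions implements for this plug-in is simple enough to mirror in a few lines of Python. This is only a sketch of the rule (batch and channels from the image, height and width from size_tensor); `get_output_dimensions` here is a plain Python function written for illustration, not a TensorRT call.

```python
def get_output_dimensions(input_shape, size_tensor_shape):
    """Output shape: N and C come from the image, H and W from size_tensor."""
    n, c = input_shape[0], input_shape[1]
    h, w = size_tensor_shape[2], size_tensor_shape[3]
    return (n, c, h, w)


# Shapes from the export script: a 256x256 image and a 512x512 size tensor.
print(get_output_dimensions((1, 3, 256, 256), (1, 1, 512, 512)))
# Changing only the shape of size_tensor changes the output size.
print(get_output_dimensions((1, 3, 256, 256), (1, 1, 768, 768)))
```

The values stored in size_tensor never enter the computation; only its shape does, which is what lets TensorRT infer the output size before running the kernel.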
In the plug-in factory class DynamicTRTResizeCreator, we need to declare getPluginName and getPluginVersion, which return the plug-in name and version. We also need to declare createPlugin and deserializePlugin, which create and deserialize the plug-in: the former calls the DynamicTRTResize constructor that creates a plug-in, and the latter calls the constructor that deserializes one.
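The relationship between the plug-in class, its creator, and serialization can be sketched in plain Python. Everything below is a stand-in written for this article: the class names, the `registry` dictionary and the one-byte layout are assumptions for illustration, not the TensorRT API or the exact format written by MMDeploy's serialization helpers.

```python
import struct


class DynamicResizePlugin:
    """Python stand-in for the plug-in class; it only stores align_corners."""

    def __init__(self, name, align_corners):
        self.name = name
        self.align_corners = bool(align_corners)

    def serialize(self):
        # Assumed layout: a single byte holding the boolean attribute.
        return struct.pack("?", self.align_corners)

    @classmethod
    def from_bytes(cls, name, data):
        (align_corners,) = struct.unpack("?", data)
        return cls(name, align_corners)


class DynamicResizeCreator:
    """Stand-in for the factory class: builds plug-ins from attributes or bytes."""
    plugin_name = "DynamicTRTResize"
    plugin_version = "1"

    def create_plugin(self, name, fields):
        # fields plays the role of the attribute collection taken from the ONNX node.
        return DynamicResizePlugin(name, fields["align_corners"])

    def deserialize_plugin(self, name, data):
        return DynamicResizePlugin.from_bytes(name, data)


# A registry keyed by (name, version) lets the engine builder find the creator.
registry = {}


def register(creator):
    registry[(creator.plugin_name, creator.plugin_version)] = creator


register(DynamicResizeCreator())

creator = registry[("DynamicTRTResize", "1")]
plugin = creator.create_plugin("resize0", {"align_corners": True})
restored = creator.deserialize_plugin("resize0", plugin.serialize())
print(restored.align_corners)
```

The two entry points correspond to the two constructors: one is used when an engine is built from the ONNX node, the other when a saved engine is loaded again.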
Next, let's implement the declarations above. The code in the .cpp file is as follows:
// Copyright (c) OpenMMLab. All rights reserved
#include "trt_dynamic_resize.hpp"

#include <assert.h>

#include <chrono>

#include "trt_plugin_helper.hpp"
#include "trt_serialize.hpp"
// declares the kernel function bicubic_interpolate, which is called in enqueue
#include "../bicubic_interpolate/trt_bicubic_interpolate_kernel.hpp"

using namespace nvinfer1;

namespace mmdeploy {
namespace {
static const char *PLUGIN_VERSION{"1"};
// the plug-in name must match the ONNX node name; it is looked up when the engine is built
static const char *PLUGIN_NAME{"DynamicTRTResize"};
}  // namespace

DynamicTRTResize::DynamicTRTResize(const std::string &name, bool align_corners)
    : TRTPluginBase(name), mAlignCorners(align_corners) {}

DynamicTRTResize::DynamicTRTResize(const std::string name, const void *data,
                                   size_t length)
    : TRTPluginBase(name) {
  deserialize_value(&data, &length, &mAlignCorners);
}

nvinfer1::IPluginV2DynamicExt *DynamicTRTResize::clone() const TRT_NOEXCEPT {
  DynamicTRTResize *plugin =
      new DynamicTRTResize(mLayerName, mAlignCorners);
  plugin->setPluginNamespace(getPluginNamespace());

  return plugin;
}

nvinfer1::DimsExprs DynamicTRTResize::getOutputDimensions(
    int outputIndex, const nvinfer1::DimsExprs *inputs, int nbInputs,
    nvinfer1::IExprBuilder &exprBuilder) TRT_NOEXCEPT {
  nvinfer1::DimsExprs ret;
  ret.nbDims = 4;
  // two input tensors: input and size_tensor; the latter is used for shape inference only
  ret.d[0] = inputs[0].d[0];
  ret.d[1] = inputs[0].d[1];
  ret.d[2] = inputs[1].d[2];
  ret.d[3] = inputs[1].d[3];
  return ret;
}

bool DynamicTRTResize::supportsFormatCombination(int pos,
                                                 const nvinfer1::PluginTensorDesc *ioDesc,
                                                 int nbInputs, int nbOutputs) TRT_NOEXCEPT {
  if (pos == 0) {
    return (ioDesc[pos].type == nvinfer1::DataType::kFLOAT &&
            ioDesc[pos].format == nvinfer1::TensorFormat::kLINEAR);
  } else {
    return ioDesc[pos].type == ioDesc[0].type && ioDesc[pos].format == ioDesc[0].format;
  }
}

void DynamicTRTResize::configurePlugin(const nvinfer1::DynamicPluginTensorDesc *inputs,
                                       int nbInputs,
                                       const nvinfer1::DynamicPluginTensorDesc *outputs,
                                       int nbOutputs) TRT_NOEXCEPT {}

size_t DynamicTRTResize::getWorkspaceSize(const nvinfer1::PluginTensorDesc *inputs,
                                          int nbInputs,
                                          const nvinfer1::PluginTensorDesc *outputs,
                                          int nbOutputs) const TRT_NOEXCEPT {
  return 0;
}

int DynamicTRTResize::enqueue(const nvinfer1::PluginTensorDesc *inputDesc,
                              const nvinfer1::PluginTensorDesc *outputDesc,
                              const void *const *inputs, void *const *outputs, void *workSpace,
                              cudaStream_t stream) TRT_NOEXCEPT {
  int batch = inputDesc[0].dims.d[0];
  int channels = inputDesc[0].dims.d[1];
  int height = inputDesc[0].dims.d[2];
  int width = inputDesc[0].dims.d[3];

  int height_out = outputDesc[0].dims.d[2];
  int width_out = outputDesc[0].dims.d[3];
  const void *x = inputs[0];
  void *output = outputs[0];

  // TODO: add fp16 support
  auto data_type = inputDesc[0].type;
  switch (data_type) {
    case nvinfer1::DataType::kFLOAT:
      bicubic_interpolate<float>((float *)x, (float *)output, batch, channels, height, width,
                                 height_out, width_out, mAlignCorners, stream);
      break;
    default:
      return 1;
      break;
  }

  return 0;
}

nvinfer1::DataType DynamicTRTResize::getOutputDataType(int index,
                                                       const nvinfer1::DataType *inputTypes,
                                                       int nbInputs) const TRT_NOEXCEPT {
  return inputTypes[0];
}

// IPluginV2 Methods
const char *DynamicTRTResize::getPluginType() const TRT_NOEXCEPT { return PLUGIN_NAME; }

const char *DynamicTRTResize::getPluginVersion() const TRT_NOEXCEPT { return PLUGIN_VERSION; }

int DynamicTRTResize::getNbOutputs() const TRT_NOEXCEPT { return 1; }

size_t DynamicTRTResize::getSerializationSize() const TRT_NOEXCEPT {
  return serialized_size(mAlignCorners);
}

void DynamicTRTResize::serialize(void *buffer) const TRT_NOEXCEPT {
  serialize_value(&buffer, mAlignCorners);
}

// Creator

DynamicTRTResizeCreator::DynamicTRTResizeCreator() {
  mPluginAttributes.clear();
  mPluginAttributes.emplace_back(nvinfer1::PluginField("align_corners"));
  mFC.nbFields = mPluginAttributes.size();
  mFC.fields = mPluginAttributes.data();
}

const char *DynamicTRTResizeCreator::getPluginName() const TRT_NOEXCEPT { return PLUGIN_NAME; }

const char *DynamicTRTResizeCreator::getPluginVersion() const TRT_NOEXCEPT {
  return PLUGIN_VERSION;
}

nvinfer1::IPluginV2 *DynamicTRTResizeCreator::createPlugin(
    const char *name, const nvinfer1::PluginFieldCollection *fc) TRT_NOEXCEPT {
  nvinfer1::Dims size{2, {1, 1}};
  bool align_corners = 1;

  for (int i = 0; i < fc->nbFields; i++) {
    if (fc->fields[i].data == nullptr) {
      continue;
    }
    std::string field_name(fc->fields[i].name);

    if (field_name.compare("align_corners") == 0) {
      align_corners = static_cast<const int *>(fc->fields[i].data)[0];
    }
  }
  // create the instance of DynamicTRTResize
  DynamicTRTResize *plugin = new DynamicTRTResize(name, align_corners);
  plugin->setPluginNamespace(getPluginNamespace());
  return plugin;
}

nvinfer1::IPluginV2 *DynamicTRTResizeCreator::deserializePlugin(
    const char *name, const void *serialData, size_t serialLength) TRT_NOEXCEPT {
  auto plugin = new DynamicTRTResize(name, serialData, serialLength);
  plugin->setPluginNamespace(getPluginNamespace());
  return plugin;
}
REGISTER_TENSORRT_PLUGIN(DynamicTRTResizeCreator);  // register the plug-in
}  // namespace mmdeploy
Then we rebuild MMDeploy's TensorRT dynamic library, build/lib/libmmdeploy_tensorrt_ops.so. Successful compilation generally means that the operator has been registered, but we still need to run some tests to make sure the results are correct.
Test
Let's use TensorRT's Python API to check the current list of plug-ins:
import tensorrt as trt
from mmdeploy.backend.tensorrt import load_tensorrt_plugin

load_tensorrt_plugin()


def get_plugin_names():
    return [pc.name for pc in trt.get_plugin_registry().plugin_creator_list]


print(get_plugin_names())
You should find 'DynamicTRTResize' in the list of plug-ins. Next we run a functional test on the plug-in to check that its inference result is consistent with the PyTorch result and that the output size can be controlled dynamically.
# Continues the export script above: torch, np, model and x are already defined.
from mmdeploy.backend.tensorrt import create_trt_engine, save_trt_engine

engine = create_trt_engine(
    'srcnn3.onnx',
    input_shapes=dict(
        input=dict(
            min_shape=[1, 3, 256, 256],
            opt_shape=[1, 3, 256, 256],
            max_shape=[1, 3, 256, 256]),
        factor=dict(
            min_shape=[1, 1, 256, 256],
            opt_shape=[1, 1, 512, 512],
            max_shape=[1, 1, 1024, 1024])))

save_trt_engine(engine, 'srcnn3.engine')

from mmdeploy.backend.tensorrt import TRTWrapper

trt_model = TRTWrapper('srcnn3.engine', ['output'])

# Use a size that differs from the one used at export time (512 x 512).
factor = torch.rand([1, 1, 768, 768], dtype=torch.float)
trt_output = trt_model.forward(dict(input=x.cuda(), factor=factor.cuda()))
torch_output = model.forward(x, factor)
assert np.allclose(
    trt_output['output'].cpu().numpy(),
    torch_output.cpu().detach().numpy(),
    rtol=1e-3,
    atol=1e-5)
The script compares the TensorRT output with the PyTorch output. If it finishes without an error, the inference result is correct. In addition, the size used for testing differs from the one used for export, and the result still matches PyTorch, which shows that dynamic sizes are supported.
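Two checks are implicit in this test: the test shape must lie inside the min/max range given when the engine was built, and the outputs must agree within the stated tolerances. A minimal pure Python sketch of both follows; `within_profile` and `is_close` are hypothetical helpers, and `is_close` applies the documented numpy.allclose criterion to single values.

```python
def within_profile(shape, min_shape, max_shape):
    """True if every dimension lies between the profile's min and max."""
    return len(shape) == len(min_shape) == len(max_shape) and all(
        lo <= dim <= hi for dim, lo, hi in zip(shape, min_shape, max_shape))


def is_close(actual, expected, rtol=1e-3, atol=1e-5):
    """Scalar form of the allclose criterion: |a - b| <= atol + rtol * |b|."""
    return abs(actual - expected) <= atol + rtol * abs(expected)


# The factor profile used when building the engine above.
min_shape, max_shape = [1, 1, 256, 256], [1, 1, 1024, 1024]

print(within_profile([1, 1, 768, 768], min_shape, max_shape))    # the test shape
print(within_profile([1, 1, 2048, 2048], min_shape, max_shape))  # outside the range

print(is_close(100.05, 100.0))
print(is_close(100.2, 100.0))
```

With rtol = 1e-3, a pixel value near 100 may differ by roughly 0.1 before the assertion fails, so the test tolerates small floating-point differences between the two implementations but not a wrong interpolation.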
Summary
In this tutorial, we described how to add a custom TensorRT plug-in to the MMDeploy code base. The whole process involves little complicated CUDA programming, and after working through it you should be able to implement the plug-ins you need yourself.
https://github.com/open-mmlab/mmdeploy
Our series of introductory tutorials on model deployment has now run for eight issues, so it may come to an end here. Thank you for your support. We will continue to publish advanced tutorials on model deployment in the future, so please stay tuned. All content related to model deployment is collected in the following column:
Model deployment those things: www.zhihu.com/column/c_1497987564452114432