Background
Many excellent stereo-matching models use torch's grid_sample, but TensorRT has no built-in support for it. For the 4D case there are workarounds, but they still cost performance. I recently tested BGNet and found it the most reliable real-time option: it does not produce large numbers of mismatches under jitter. However, its bilateral filter adds a dimension, turning the operation into a 5D grid_sample, which is a real headache. I could not find an alternative formulation online, so I implemented one myself; the result was identical, but inference time doubled outright. After some thought, I decided to try getting a plugin interface working instead.
Testing platform
Jetson Xavier NX
Preface
After some convoluted attempts: I first tried torch2trt, but details of my model kept failing to meet its interface requirements and there was too much to dig into, so I gave it up.
Compiling the TensorRT OSS source on the NX did not seem practical either. After some searching and experimenting, I settled on the following solution.
References
- Another author's GitHub tutorial
- The official NVIDIA instructions
- An open-source TensorRT plugin project on GitHub
- NVIDIA's trtexec source folder
Process
- First, register the unsupported operator and export it in ONNX. The registration code is copied here. Note that if you want to change the operator's name, change the place I marked.
```python
import typing

import torch
from torch.onnx import symbolic_helper

from my_model import my_model

_OPSET_VERSION = 11
_registered_ops: typing.AbstractSet[str] = set()


def _reg(symbolic_fn: typing.Callable):
    name = "::%s" % symbolic_fn.__name__
    torch.onnx.register_custom_op_symbolic(name, symbolic_fn, _OPSET_VERSION)
    _registered_ops.add(name)


def register():
    """Register the custom symbolic function for grid_sample.

    Should be run before torch.onnx.export().
    """

    def grid_sampler(g, input, grid, mode, padding_mode, align_corners):
        # mode
        #   'bilinear'   : onnx::Constant[value={0}]
        #   'nearest'    : onnx::Constant[value={1}]
        #   'bicubic'    : onnx::Constant[value={2}]
        # padding_mode
        #   'zeros'      : onnx::Constant[value={0}]
        #   'border'     : onnx::Constant[value={1}]
        #   'reflection' : onnx::Constant[value={2}]
        mode = symbolic_helper._maybe_get_const(mode, "i")
        padding_mode = symbolic_helper._maybe_get_const(padding_mode, "i")
        mode_str = ["bilinear", "nearest", "bicubic"][mode]
        padding_mode_str = ["zeros", "border", "reflection"][padding_mode]
        align_corners = int(symbolic_helper._maybe_get_const(align_corners, "b"))

        # From opset v13 onward, the output shape can be specified with
        # (N, C, H, W) (N, H_out, W_out, 2) => (N, C, H_out, W_out)
        # input_shape = input.type().sizes()
        # grid_shape = grid.type().sizes()
        # output_shape = input_shape[:2] + grid_shape[1:3]
        # g.op(...).setType(input.type().with_sizes(output_shape))
        return g.op(
            # op name, modify here; not sure whether the "com.microsoft::"
            # domain prefix is required
            "com.microsoft::GridSamplePluginDynamic",
            input,
            grid,
            mode_s=mode_str,
            padding_mode_s=padding_mode_str,
            align_corners_i=align_corners,
        )

    _reg(grid_sampler)


@torch.no_grad()
def convert():
    register()

    device = "cuda"  # or "cpu"
    model = my_model(88, 'models.pth').to(device)
    model.eval()

    t1 = torch.rand(1, 1, 384, 640).to(device)
    t2 = torch.rand(1, 1, 384, 640).to(device)

    # Export the model
    torch.onnx.export(model,
                      (t1, t2),
                      'model.onnx',              # where to save the model
                      export_params=True,        # store the trained weights inside the model file
                      opset_version=11,          # the ONNX opset version to export to
                      do_constant_folding=True,  # execute constant folding for optimization
                      input_names=['left', 'right'],
                      output_names=['output'])


if __name__ == "__main__":
    convert()
```
Because the grid_sample I need is 5D, even the newest ONNX opset does not support it, so I export with the old opset anyway. For the 4D case, I don't know whether exporting directly with a higher opset causes problems; I may get a chance to try that on CREStereo later.
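To pin down what a 5D grid_sample actually computes (and what any replacement has to reproduce), here is a minimal NumPy sketch of the nearest-neighbor case. This is only an illustration of the indexing semantics, not the bilinear kernel BGNet needs, and it ignores padding_mode:

```python
import numpy as np

def grid_sample_5d_nearest(inp, grid, align_corners=True):
    """Nearest-neighbor 5D grid_sample sketch.

    inp:  (N, C, D, H, W) volume
    grid: (N, D_out, H_out, W_out, 3) with normalized coords in [-1, 1];
          grid[..., 0] -> x (W axis), [..., 1] -> y (H), [..., 2] -> z (D)
    returns (N, C, D_out, H_out, W_out)
    """
    N, C, D, H, W = inp.shape

    def unnorm(c, size):
        # map [-1, 1] to pixel indices, following align_corners semantics
        if align_corners:
            return (c + 1) / 2 * (size - 1)
        return ((c + 1) * size - 1) / 2

    x = np.clip(np.rint(unnorm(grid[..., 0], W)), 0, W - 1).astype(int)
    y = np.clip(np.rint(unnorm(grid[..., 1], H)), 0, H - 1).astype(int)
    z = np.clip(np.rint(unnorm(grid[..., 2], D)), 0, D - 1).astype(int)

    out = np.empty((N, C) + grid.shape[1:4], dtype=inp.dtype)
    for n in range(N):
        # advanced indexing gathers one voxel per output location, per channel
        out[n] = inp[n][:, z[n], y[n], x[n]]
    return out
```

With align_corners=True, an identity grid (normalized meshgrid) reproduces the input exactly, which is a handy sanity check for any alternative implementation.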
- Run the model through onnx-simplifier while you're at it; it can eliminate some If nodes that would otherwise make TensorRT report errors:

```shell
python3 -m onnxsim model.onnx model_sim.onnx
```
- Download the open-source plugin from here (mmcv also ships one; I expect you could use that too). I personally only need grid_sample, so I deleted everything else; after deleting, change these two places.
After compiling it according to the repo's markdown, you're set. The generated library file is in build/lib.
- Then use trtexec to convert the model file, linking against the plugin library:

```shell
/usr/src/tensorrt/bin/trtexec --onnx=model.onnx --saveEngine=model.trt --fp16 \
    --plugins=/home/ubuntu/Documents/amirstan_plugin/build/lib/libamirstan_plugin.so
```
At first I linked the library here but the interface still didn't match: the operator name in the plugin project apparently hadn't been changed. After some thought, I rewrote the operator type in the ONNX model to GridSamplePluginDynamic.
- Then run the C++ project file from my earlier project to get the result.
At first I tried linking the plugin directly in CMakeLists, but it had no effect. Since trtexec could do it, I looked at the trtexec source code: it uses the dlopen function, so I followed suit.
```cpp
#include <dlfcn.h>
......
string dll_path = "/home/ubuntu/Documents/amirstan_plugin/build/lib/libamirstan_plugin.so";
void *handle = dlopen(dll_path.c_str(), RTLD_LAZY);
if (NULL == handle)
{
    printf("dlopen error. msg:%s", dlerror());
    return -1;
}
......
```
At the same time, add this to CMakeLists:

```cmake
target_link_libraries(main ${CMAKE_DL_LIBS})
```
- Then compile and run. The output image is not convenient to show here, but it looks the same as the grid_sample I had previously implemented by other means. The model with the alternative grid_sample ran at about 12 FPS; with the plugin it now runs at about 17 FPS, a clear improvement. Done~
Addendum
For a Python implementation, refer to here: use ctypes.CDLL(plugin_lib) to load the library, and everything else is the same. I'm not sure about this, though — I haven't tried it yet.
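If it works the way trtexec's dlopen does, the Python side would only need to load the .so before deserializing the engine. A sketch of that idea (untested; the path and the TensorRT calls in the comments are assumptions):

```python
import ctypes

def load_trt_plugin(plugin_lib: str) -> ctypes.CDLL:
    """dlopen the plugin library. TensorRT plugin creators register
    themselves with the global plugin registry as a load-time side effect,
    so simply loading the .so should be enough before deserialization."""
    return ctypes.CDLL(plugin_lib, mode=ctypes.RTLD_GLOBAL)

# assumed flow, not yet verified on the NX:
# load_trt_plugin("/home/ubuntu/Documents/amirstan_plugin/build/lib/libamirstan_plugin.so")
# import tensorrt as trt
# logger = trt.Logger(trt.Logger.WARNING)
# with open("model.trt", "rb") as f:
#     engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())
```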