Several small pitfalls when converting PyTorch models to TensorRT

1. Version mismatch

[E] [TRT] Layer:Where_51's output can not be used as shape tensor.
[E] [TRT] Network validation failed.
[E] Engine creation failed.
[E] Engine set up failed.

This is caused by a version mismatch between PyTorch and TensorRT: my TensorRT is 7.0, which should be paired with PyTorch 1.4, but I was using 1.7.

The fix is to load the weight file with 1.7, re-save it in the legacy serialization format, and then export to ONNX from a PyTorch 1.4 environment.

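import torch

# YoloBody is the model class from the author's own project; its import is not shown here.
def main():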
    input_shape = (3, 416, 416)
    model_onnx_path = "yolov4tiny.onnx"

    # model = torch.hub.load('mateuszbuda/brain-segmentation-pytorch', 'unet',
    #                        in_channels=3, out_channels=1, init_features=32, pretrained=True)
    # Pick the device first so the model and the dummy input end up on the same one
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model = YoloBody(3, 12).to(device)
    dummy_input = torch.randn(1, 3, 416, 416, device=device)
    # Load the weights with PyTorch 1.7
    state_dict = torch.load('logs/Epoch120-Total_Loss0.5324-Val_Loss0.8735.pth', map_location=device)
    model.load_state_dict(state_dict)
    # Save them in the legacy (non-zipfile) format that PyTorch 1.4 can read
    torch.save(model.state_dict(), 'logs/for_onnx.pth', _use_new_zipfile_serialization=False)
    
    # Switch the Python interpreter to a torch 1.4 environment, reload the weights,
    # and export the ONNX model with PyTorch 1.4
    state_dict = torch.load('logs/for_onnx.pth', map_location=device)
    model.load_state_dict(state_dict)
    model.train(False)

    inputs = ['input_1']
    outputs = ['output_1', 'output_2']
    dynamic_axes = {'input_1': {0: 'batch'}, 'output_1': {0: 'batch'}}
    torch.onnx.export(model,
                      dummy_input,
                      model_onnx_path,
                      export_params=True,
                      opset_version=11,
                      do_constant_folding=True,
                      input_names=inputs, output_names=outputs,
                      dynamic_axes=None)  # static shapes; the dynamic_axes dict above is not used
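
Before switching to the 1.4 environment, it can be worth confirming that the re-saved file really is in the legacy format. Since PyTorch 1.6, torch.save writes a zip archive by default, which older versions cannot read, so a plain zipfile check gives a quick hint. This is only a minimal sketch using the standard library; the helper names is_zip_checkpoint and describe_checkpoint are made up for illustration.

```python
import zipfile


def is_zip_checkpoint(path):
    """Return True if the file at `path` is a zip archive.

    torch.save has written zip archives by default since PyTorch 1.6;
    _use_new_zipfile_serialization=False selects the legacy format instead.
    """
    return zipfile.is_zipfile(path)


def describe_checkpoint(path):
    """Return a short description of what the file looks like."""
    if is_zip_checkpoint(path):
        return "zip archive (PyTorch >= 1.6 format)"
    # Assumption: a non-zip .pth file written by torch.save is the legacy format.
    return "not a zip archive (consistent with the legacy format)"
```

Calling describe_checkpoint('logs/for_onnx.pth') after the re-save should report that the file is not a zip archive; if it still reports a zip archive, the legacy flag was not applied.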

The ONNX file exported with PyTorch 1.4 can then be converted by TensorRT.

Use trtexec from TensorRT's OSS tools:

trtexec --explicitBatch --onnx=yolov4tiny.onnx --saveEngine=yolov4tiny.trt --fp16 --workspace=10240 --verbose

Turning on --verbose is recommended. The conversion is very slow, and the printed log is reassuring; otherwise, staring at a silent screen, you may think it is stuck and panic.
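
If the conversion is scripted, the same command can be assembled in Python. The sketch below only builds the argument list from the flags used above; build_trtexec_cmd is a hypothetical helper, and it assumes trtexec is on PATH when the list is eventually run.

```python
import shlex


def build_trtexec_cmd(onnx_path, engine_path, fp16=True,
                      workspace_mb=10240, verbose=True):
    """Assemble the trtexec argument list used in this post."""
    cmd = [
        "trtexec",
        "--explicitBatch",
        f"--onnx={onnx_path}",
        f"--saveEngine={engine_path}",
    ]
    if fp16:
        cmd.append("--fp16")
    cmd.append(f"--workspace={workspace_mb}")
    if verbose:
        cmd.append("--verbose")
    return cmd


cmd = build_trtexec_cmd("yolov4tiny.onnx", "yolov4tiny.trt")
# Print the command in a form that can be pasted into a shell
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Passing cmd to subprocess.run(cmd, check=True) would start the conversion, with the verbose log streaming to the terminal as it runs.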

 

2. Upsampling scale problem

[5] Assertion failed: ctx->tensors().count(inputName)

During the YOLO upsampling stage, exporting from PyTorch to ONNX with opset 11 adds a constant node to the upsample layer, which makes the TensorRT conversion fail. Along the way I tried the method described in a post about the pitfalls of deploying PyTorch models through ONNX.

After trying several versions of PyTorch and ONNX, the upsample problem still could not be solved. In the end I followed the implementation in https://github.com/Tianxiaomo/pytorch-YOLOv4 : instead of calling torch's built-in interpolation function at inference time, the upsampling is rewritten by hand, and the export to TensorRT then succeeded.

import torch.nn as nn
import torch.nn.functional as F

class Upsample(nn.Module):
    def __init__(self):
        super(Upsample, self).__init__()

    def forward(self, x, target_size, inference=False):
        assert (x.data.dim() == 4)
        # _, _, tH, tW = target_size

        if inference:

            #B = x.data.size(0)
            #C = x.data.size(1)
            #H = x.data.size(2)
            #W = x.data.size(3)

            return x.view(x.size(0), x.size(1), x.size(2), 1, x.size(3), 1).\
                    expand(x.size(0), x.size(1), x.size(2), target_size[2] // x.size(2), x.size(3), target_size[3] // x.size(3)).\
                    contiguous().view(x.size(0), x.size(1), target_size[2], target_size[3])
        else:
            return F.interpolate(x, size=(target_size[2], target_size[3]), mode='nearest')
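
The view/expand/view chain in the inference branch simply repeats every pixel along height and width, which is what nearest-neighbour interpolation does for an integer scale factor. As a rough illustration only, here is the same index arithmetic on a plain nested list for a single channel; upsample_nearest_2d is a made-up name and no PyTorch is involved.

```python
def upsample_nearest_2d(plane, scale_h, scale_w):
    """Repeat each element scale_h times vertically and scale_w times horizontally.

    `plane` is a list of rows (one H x W channel). Output pixel (i, j)
    takes its value from input pixel (i // scale_h, j // scale_w).
    """
    height = len(plane)
    width = len(plane[0])
    return [
        [plane[i // scale_h][j // scale_w] for j in range(width * scale_w)]
        for i in range(height * scale_h)
    ]


small = [[1, 2],
         [3, 4]]
for row in upsample_nearest_2d(small, 2, 2):
    print(row)
# [1, 1, 2, 2]
# [1, 1, 2, 2]
# [3, 3, 4, 4]
# [3, 3, 4, 4]
```

The equivalence only holds when the target size is an exact integer multiple of the input size, which is the case for the 2x upsampling used here.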

3. The data type is wrong

Unsupported ONNX data type: DOUBLE (2)

Because of the division introduced by the change above, TRT sees a double-precision value. The Cast node in the exported graph (shown in a figure in the original post) is the culprit. Oddly, the problem does not occur with the original code from that repository; it only appears after switching to my own model.

I tried the workaround suggested in https://github.com/onnx/onnx-tensorrt/issues/400#issuecomment-730240546 , but it did not help.

To solve this, I replaced target_size[3] // x.size(3) directly with its result, the constant 2, and the conversion succeeded.
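
To make that change concrete: with the ratio hard-coded, every shape passed to view and expand can be derived from the input size by multiplication alone, so the division that triggered the DOUBLE cast is no longer needed. The helper below just spells out those three shapes in plain Python; upsample_shapes, its scale=2 default, and the 13x13 example input are assumptions for illustration, not part of the original code.

```python
def upsample_shapes(in_shape, scale=2):
    """Return the shapes for view -> expand -> view with a fixed integer scale.

    `in_shape` is (B, C, H, W). `scale` is a plain Python constant, standing
    in for the hard-coded 2 that replaced target_size[3] // x.size(3).
    """
    b, c, h, w = in_shape
    view_shape = (b, c, h, 1, w, 1)            # add singleton axes after H and W
    expand_shape = (b, c, h, scale, w, scale)  # repeat along the new axes
    out_shape = (b, c, h * scale, w * scale)   # merge the repeats into H and W
    return view_shape, expand_shape, out_shape


for shape in upsample_shapes((1, 128, 13, 13)):
    print(shape)
# (1, 128, 13, 1, 13, 1)
# (1, 128, 13, 2, 13, 2)
# (1, 128, 26, 26)
```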

Origin blog.csdn.net/qq_26751117/article/details/111352947