TensorRT dynamic input resolution (dynamic shapes)

This article covers dynamic resolution settings only for the TensorRT Python API, not C++.

Contents

PyTorch to ONNX

ONNX to TensorRT

Python TensorRT inference


A related post on Zhihu can also be used as a reference:

tensorrt dynamic input (Dynamic shapes) - Programmer Sought

There are two reasons for writing this post: 1. A lot of people probably need it. 2. As far as I searched, none of the existing posts explain it clearly, and the official documentation is hard to follow, so a fair amount of guessing was involved. Without further ado, let's go straight to the code.

Take the PyTorch -> ONNX -> TensorRT pipeline as an example, where the dynamic dimensions are the height and width of the image.

PyTorch to ONNX:

import torch


def export_onnx(model, image_shape, onnx_path, batch_size=1):
    x, y = image_shape
    # Dummy input used only to trace the model; its H/W are just placeholders.
    img = torch.zeros((batch_size, 3, x, y))
    dynamic_onnx = True
    if dynamic_onnx:
        # Mark axes 2 (height) and 3 (width) of the input and output as dynamic.
        dynamic_ax = {'input_1': {2: 'image_height', 3: 'image_width'},
                      'output_1': {2: 'image_height', 3: 'image_width'}}
        torch.onnx.export(model, (img), onnx_path,
                          input_names=["input_1"], output_names=["output_1"],
                          verbose=False, opset_version=11, dynamic_axes=dynamic_ax)
    else:
        torch.onnx.export(model, (img), onnx_path,
                          input_names=["input_1"], output_names=["output_1"],
                          verbose=False, opset_version=11)
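
To confirm that the exported model really accepts variable resolutions, a quick check with onnxruntime can be run at two different sizes. This is a minimal sketch; the model path model.onnx and the test resolutions are assumptions, not from the original post:

import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("model.onnx")  # hypothetical path
input_name = sess.get_inputs()[0].name     # "input_1" from the export above
for h, w in [(512, 512), (768, 1024)]:
    dummy = np.zeros((1, 3, h, w), dtype=np.float32)
    out = sess.run(None, {input_name: dummy})[0]
    print((h, w), "->", out.shape)  # the output H/W should follow the input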


ONNX to TensorRT:

According to NVIDIA's official documentation, a dynamic shape simply means that the dimension is left unspecified (represented as -1) when the engine is defined, and is only fixed at inference time. Therefore both the engine-building code and the inference code need to be modified.

When building the engine, the input and output of the network parsed from the ONNX file already have dynamic shapes. You only need to add an optimization profile to tell TensorRT the allowed range of input sizes.
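
Before adding the profile, a quick sanity check (a minimal sketch, assuming the ONNX file from the previous step is saved at the hypothetical path model.onnx) shows that the parsed network indeed reports -1 for the dynamic height/width:

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
EXPLICIT_BATCH = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)

with trt.Builder(TRT_LOGGER) as builder, \
     builder.create_network(EXPLICIT_BATCH) as network, \
     trt.OnnxParser(network, TRT_LOGGER) as parser:
    with open("model.onnx", "rb") as f:  # hypothetical path
        parser.parse(f.read())
    inp = network.get_input(0)
    print(inp.name, inp.shape)  # expected: input_1 (1, 3, -1, -1)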

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
EXPLICIT_BATCH = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)


def GiB(val):
    # Helper from the TensorRT Python samples: val gibibytes in bytes.
    return val * 1 << 30


def build_engine(onnx_path, using_half, engine_file, dynamic_input=True):
    trt.init_libnvinfer_plugins(None, '')
    with trt.Builder(TRT_LOGGER) as builder, builder.create_network(EXPLICIT_BATCH) as network, trt.OnnxParser(network, TRT_LOGGER) as parser:
        builder.max_batch_size = 1  # always 1 for explicit batch
        config = builder.create_builder_config()
        config.max_workspace_size = GiB(1)
        if using_half:
            config.set_flag(trt.BuilderFlag.FP16)
        # Load the ONNX model and parse it in order to populate the TensorRT network.
        with open(onnx_path, 'rb') as model:
            if not parser.parse(model.read()):
                print('ERROR: Failed to parse the ONNX file.')
                for error in range(parser.num_errors):
                    print(parser.get_error(error))
                return None
        ## Added part: an optimization profile for the dynamic input
        if dynamic_input:
            profile = builder.create_optimization_profile()
            # (min, opt, max) shapes allowed for "input_1"
            profile.set_shape("input_1", (1, 3, 512, 512), (1, 3, 1024, 1024), (1, 3, 1600, 1600))
            config.add_optimization_profile(profile)
        # Append a sigmoid layer to the network output
        previous_output = network.get_output(0)
        network.unmark_output(previous_output)
        sigmoid_layer = network.add_activation(previous_output, trt.ActivationType.SIGMOID)
        network.mark_output(sigmoid_layer.get_output(0))
        return builder.build_engine(network, config)
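
A possible usage sketch (the file paths are hypothetical): build the engine once, serialize it to disk, and deserialize it later with trt.Runtime so the slow build step is only paid once.

# Minimal usage sketch; paths are hypothetical.
engine = build_engine("model.onnx", using_half=False, engine_file="model.trt")

# Serialize the engine so future runs can skip the build step.
with open("model.trt", "wb") as f:
    f.write(engine.serialize())

# Later / elsewhere: deserialize the engine from disk.
with open("model.trt", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())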

Python TensorRT inference:


There is a big pitfall at inference time. My previous understanding was that since the input is dynamic, I only need to allocate a suitably sized buffer for it and can then run inference regardless of the actual size. It turns out I was naive. According to the official documentation, you must add a line such as context.active_optimization_profile = 0 during inference to select the corresponding optimization profile. OK, I added it, but an error was still reported, because the input size was never fixed when the engine was defined; you therefore also have to set the binding shape from the actual input at inference time.

import numpy as np
import torch
import pycuda.driver as cuda
import pycuda.autoinit  # noqa: F401 -- creates a CUDA context


def profile_trt(engine, imagepath, batch_size):
    assert(engine is not None)

    # preprocess_image and allocate_buffers are helper functions (not shown in the original post).
    input_image, input_shape = preprocess_image(imagepath)

    segment_inputs, segment_outputs, segment_bindings = allocate_buffers(engine, True, input_shape)

    stream = cuda.Stream()
    with engine.create_execution_context() as context:
        context.active_optimization_profile = 0  # added part: select optimization profile 0
        origin_inputshape = context.get_binding_shape(0)
        # added part: if H/W are still dynamic (-1), fix them to the actual input size
        if (origin_inputshape[-1] == -1):
            origin_inputshape[-2], origin_inputshape[-1] = (input_shape)
            context.set_binding_shape(0, (origin_inputshape))
        input_img_array = np.array([input_image] * batch_size)
        img = torch.from_numpy(input_img_array).float().numpy()
        segment_inputs[0].host = img
        [cuda.memcpy_htod_async(inp.device, inp.host, stream) for inp in segment_inputs]  # copy host buffers to the device asynchronously
        stream.synchronize()  # wait for all activity on this stream to finish

        context.execute_async(bindings=segment_bindings, stream_handle=stream.handle)  # asynchronously execute inference
        stream.synchronize()
        [cuda.memcpy_dtoh_async(out.host, out.device, stream) for out in segment_outputs]  # copy device buffers back to the host asynchronously
        stream.synchronize()
        results = np.array(segment_outputs[0].host).reshape(batch_size, input_shape[0], input_shape[1])
    return results.transpose(1, 2, 0)
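
The code above relies on two helpers that the original post does not show: preprocess_image (model-specific) and allocate_buffers. Below is a minimal sketch of allocate_buffers, adapted from the buffer-allocation pattern in the TensorRT Python samples; the meaning of the second argument and the way buffers are sized from the actual input shape (because the engine reports -1 for dynamic dimensions) are assumptions, not the author's original code.

import numpy as np
import tensorrt as trt
import pycuda.driver as cuda


class HostDeviceMem:
    # Pairs a page-locked host buffer with its device allocation.
    def __init__(self, host_mem, device_mem):
        self.host = host_mem
        self.device = device_mem


def allocate_buffers(engine, is_dynamic, input_shape):
    inputs, outputs, bindings = [], [], []
    for binding in engine:
        shape = list(engine.get_binding_shape(binding))
        if is_dynamic and shape[-1] == -1:
            # Replace the dynamic H/W with the actual input height/width.
            shape[-2], shape[-1] = input_shape
        size = trt.volume(shape)
        dtype = trt.nptype(engine.get_binding_dtype(binding))
        host_mem = cuda.pagelocked_empty(size, dtype)  # page-locked host buffer
        device_mem = cuda.mem_alloc(host_mem.nbytes)   # matching device buffer
        bindings.append(int(device_mem))
        if engine.binding_is_input(binding):
            inputs.append(HostDeviceMem(host_mem, device_mem))
        else:
            outputs.append(HostDeviceMem(host_mem, device_mem))
    return inputs, outputs, bindings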


It was only a few lines of code, yet it took a whole day of fiddling. Fortunately, the dynamic input problem is now solved, without having to write a pile of messy code.

Original link: https://blog.csdn.net/weixin_42365510/article/details/112088887

Reposted via: https://blog.csdn.net/jacke121/article/details/123767395