Solving a GPU Memory Leak in TensorRT Model Inference

Problem description:

While deploying the backend of an AI cloud service, we used TensorRT for model inference and found that GPU memory usage kept growing as clients continued to send requests. Once the usage accumulated past a certain point, memory allocation failed and the service reported an error.

Analysis showed that the problem was caused during the TensorRT model's forward inference. The code looked like this:

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
trt_engine_path = './model/resnet50.trt'
trt_runtime = trt.Runtime(TRT_LOGGER)

# The engine and context created here are never explicitly released,
# so the GPU memory they hold accumulates across requests.
engine = load_engine(trt_runtime, trt_engine_path)
context = engine.create_execution_context()
inputs, outputs, bindings, stream = allocate_buffers(engine)
trt_outputs = do_inference(context, bindings=bindings, inputs=inputs, outputs=outputs, stream=stream)
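
The helpers load_engine, allocate_buffers, and do_inference are not shown above; they follow the style of the TensorRT Python samples. For reference, a minimal load_engine would simply deserialize the .trt file with the runtime. The sketch below assumes the standard deserialize_cuda_engine API:

def load_engine(trt_runtime, engine_path):
    # Read the serialized engine from disk and deserialize it into an ICudaEngine.
    with open(engine_path, 'rb') as f:
        engine_data = f.read()
    return trt_runtime.deserialize_cuda_engine(engine_data)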

Solution:

Load the engine and create the execution context inside with statements, so that the GPU memory they occupy is released automatically once inference finishes. The code looks like this:

trt_engine_path = './model/resnet50.trt'
trt_runtime = trt.Runtime(TRT_LOGGER)

# Used as context managers, the engine and the execution context release
# their GPU memory automatically when the with blocks exit.
with load_engine(trt_runtime, trt_engine_path) as engine:
    inputs, outputs, bindings, stream = allocate_buffers(engine)

    with engine.create_execution_context() as context:
        trt_outputs = do_inference(context, bindings=bindings, inputs=inputs, outputs=outputs, stream=stream)
        ......
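
In the cloud-service setting described above, this pattern can be wrapped around each incoming request. The sketch below is only illustrative: handle_request, preprocess, and postprocess are hypothetical names, not part of the original code.

import numpy as np

def handle_request(image):
    # Hypothetical per-request handler built on the with-based pattern above.
    with load_engine(trt_runtime, trt_engine_path) as engine:
        inputs, outputs, bindings, stream = allocate_buffers(engine)
        # Fill the first input binding with the preprocessed image.
        np.copyto(inputs[0].host, preprocess(image).ravel())
        with engine.create_execution_context() as context:
            trt_outputs = do_inference(context, bindings=bindings, inputs=inputs, outputs=outputs, stream=stream)
    # Both with blocks have exited here, so the engine and context memory is freed.
    return postprocess(trt_outputs)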

Beyond that, it is worth studying the samples that ship with TensorRT, which show how to write TensorRT inference code for different kinds of models.
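
For completeness, the allocate_buffers and do_inference helpers used above follow the common.py utilities in those samples. The rough sketch below is modeled on the older, implicit-batch sample code; the exact binding APIs change between TensorRT versions, so treat it as an outline rather than a drop-in implementation.

import pycuda.autoinit  # creates a CUDA context on import
import pycuda.driver as cuda
import tensorrt as trt

class HostDeviceMem:
    # A pinned host buffer paired with its device buffer for one binding.
    def __init__(self, host_mem, device_mem):
        self.host = host_mem
        self.device = device_mem

def allocate_buffers(engine):
    inputs, outputs, bindings = [], [], []
    stream = cuda.Stream()
    for binding in engine:
        size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
        dtype = trt.nptype(engine.get_binding_dtype(binding))
        host_mem = cuda.pagelocked_empty(size, dtype)   # pinned host memory
        device_mem = cuda.mem_alloc(host_mem.nbytes)    # device memory
        bindings.append(int(device_mem))
        if engine.binding_is_input(binding):
            inputs.append(HostDeviceMem(host_mem, device_mem))
        else:
            outputs.append(HostDeviceMem(host_mem, device_mem))
    return inputs, outputs, bindings, stream

def do_inference(context, bindings, inputs, outputs, stream, batch_size=1):
    # Copy inputs to the GPU, run inference asynchronously, then copy results back.
    for inp in inputs:
        cuda.memcpy_htod_async(inp.device, inp.host, stream)
    context.execute_async(batch_size=batch_size, bindings=bindings, stream_handle=stream.handle)
    for out in outputs:
        cuda.memcpy_dtoh_async(out.host, out.device, stream)
    stream.synchronize()
    return [out.host for out in outputs]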

Reposted from blog.csdn.net/u012505617/article/details/111543125