Accelerating ScaledYOLOv4 with TensorRT

Many people have written TensorRT versions of YOLO, and this is mine. The full code is available in my GitHub repository.

Test environment

Ubuntu 18.04
PyTorch 1.7.1
JetPack 4.4
CUDA 11.0
TensorRT 7.1

Quick start

1. Generate the ONNX model

git clone --branch yolov4-csp https://github.com/WongKinYiu/ScaledYOLOv4
git clone https://github.com/talebolano/TensorRT-Scaled-YOLOv4
cp TensorRT-Scaled-YOLOv4/script/ScaledYOLOv4-csp/* ScaledYOLOv4/
Download yolov4-csp.weights into ScaledYOLOv4/
cd ScaledYOLOv4
python3 export.py

2. Compile

cd ../TensorRT-Scaled-YOLOv4
mkdir build 
cd build
cmake ..
make -j8
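For orientation, a TensorRT project of this kind is typically built against the nvinfer and nvonnxparser libraries. The fragment below is an illustrative sketch only, not the repository's actual CMakeLists.txt; the source file name is a placeholder:

```cmake
# Illustrative sketch -- the repository ships its own CMakeLists.txt.
cmake_minimum_required(VERSION 3.10)
project(TensorRT-Scaled-YOLOv4 LANGUAGES CXX)
find_package(CUDA REQUIRED)
include_directories(${CUDA_INCLUDE_DIRS})
add_executable(makeCudaEngine makeCudaEngine.cpp)  # placeholder source name
# nvinfer = TensorRT core runtime, nvonnxparser = ONNX model parser
target_link_libraries(makeCudaEngine nvinfer nvonnxparser ${CUDA_LIBRARIES})
```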

3. Convert the ONNX model to a TRT engine

./makeCudaEngine -i ../../ScaledYOLOv4/yolov4-csp.onnx -o yolov4-csp.trt

4. Test

./inferYoloCuda -e yolov4-csp.trt -i <your image> -show -save
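The detector expects a fixed square input, so the image is letterboxed (scaled to fit, then padded) before inference. A minimal sketch of that scale-and-pad arithmetic; the function name and the 512-pixel input size are illustrative, not taken from the repo:

```python
def letterbox_params(src_w, src_h, dst=512):
    """Compute resize scale and padding to fit a src_w x src_h image
    into a dst x dst square while preserving aspect ratio."""
    scale = min(dst / src_w, dst / src_h)
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    pad_x = (dst - new_w) / 2  # left/right padding
    pad_y = (dst - new_h) / 2  # top/bottom padding
    return scale, new_w, new_h, pad_x, pad_y

# A 1920x1080 frame fitted into a 512x512 input:
scale, w, h, px, py = letterbox_params(1920, 1080)
print(w, h, py)  # 512 288 112.0  (112 px of padding top and bottom)
```

The same scale and padding are then used in reverse to map detected boxes back to the original image coordinates.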

Speed

| Mode | GPU    | Inference time | AP |
| ---- | ------ | -------------- | -- |
| FP16 | V100   | 10 ms          | -  |
| FP16 | Xavier | 35 ms          | -  |


Using the Mish plugin layer

1. Download the open-source version of TensorRT and register the Mish plugin in builtin_op_importers.cpp by pasting the following code at the bottom of that file:

	DEFINE_BUILTIN_OP_IMPORTER(Mish)
	{
	    ASSERT(inputs.at(0).is_tensor(), ErrorCode::kUNSUPPORTED_NODE); // input must be a tensor
	    std::vector<nvinfer1::ITensor*> tensors;
	    nvinfer1::ITensor* input = &convertToTensor(inputs.at(0), ctx);
	    tensors.push_back(input);

	    // Look up the plugin registered as "Mish_TRT" in the TensorRT plugin registry
	    const std::string pluginName = "Mish_TRT";
	    const std::string pluginVersion = "001";
	    std::vector<nvinfer1::PluginField> f;

	    const auto mPluginRegistry = getPluginRegistry();
	    const auto pluginCreator
	        = mPluginRegistry->getPluginCreator(pluginName.c_str(), pluginVersion.c_str(), "");
	    nvinfer1::PluginFieldCollection fc;
	    fc.nbFields = f.size();
	    fc.fields = f.data();
	    nvinfer1::IPluginV2* plugin = pluginCreator->createPlugin(node.name().c_str(), &fc);

	    ASSERT(plugin != nullptr && "Mish plugin was not found in the plugin registry!",
	        ErrorCode::kUNSUPPORTED_NODE);
	    // Insert the plugin as a layer in the network being built from the ONNX graph
	    nvinfer1::IPluginV2Layer* layer = ctx->network()->addPluginV2(tensors.data(), tensors.size(), *plugin);
	    RETURN_ALL_OUTPUTS(layer);
	}
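The Mish_TRT plugin computes the Mish activation, mish(x) = x · tanh(softplus(x)). A stdlib-only Python reference, handy for spot-checking the plugin's output on a few values:

```python
import math

def mish(x):
    """Mish activation: x * tanh(ln(1 + exp(x))).
    log1p(exp(x)) is the softplus; for large x, softplus(x) ~ x,
    so we short-circuit to avoid exp() overflow."""
    softplus = x if x > 20 else math.log1p(math.exp(x))
    return x * math.tanh(softplus)

print(mish(0.0))  # 0.0
print(mish(1.0))  # ~0.8651
```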

2. In ScaledYOLOv4/models/models.py, use MishImplementation() instead of Mish(), i.e. replace

	modules.add_module('activation', Mish())

with

	modules.add_module('activation', MishImplementation())

3. Generate the ONNX model and convert it to a TRT engine

python3 export.py
./makeCudaEngine -i ../../ScaledYOLOv4/yolov4-csp.onnx -o yolov4-csp.trt

4. Test

./inferYoloCuda -e yolov4-csp.trt -i <your image> -show -save

Note: On Xavier, using the Mish plugin is more than 10 ms slower than not using it.
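Latency gaps like this are easy to measure with a plain timing loop. In the sketch below, infer is a placeholder for whatever call runs the engine (not part of the repo); warm-up iterations are discarded so lazy initialization and GPU clock ramp-up don't skew the average:

```python
import time

def benchmark(infer, warmup=10, iters=100):
    """Return the average wall-clock latency of infer() in milliseconds."""
    for _ in range(warmup):   # discard warm-up runs
        infer()
    start = time.perf_counter()
    for _ in range(iters):
        infer()
    return (time.perf_counter() - start) / iters * 1000.0

# Dummy workload standing in for engine inference:
ms = benchmark(lambda: sum(range(10000)))
print(f"{ms:.3f} ms per call")
```

Run it once per engine (with and without the plugin) on the same input to reproduce the comparison above.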


Origin blog.csdn.net/blanokvaffy/article/details/115312643