# Accelerating ScaledYOLOv4 with TensorRT

Many people have written TensorRT versions of YOLO, and this is mine. The full code is in my GitHub repository.
## Test environment

- Ubuntu 18.04
- PyTorch 1.7.1
- JetPack 4.4
- CUDA 11.0
- TensorRT 7.1
## Quick start

### 1. Generate the ONNX model

```shell
git clone --branch yolov4-csp https://github.com/WongKinYiu/ScaledYOLOv4
git clone https://github.com/talebolano/TensorRT-Scaled-YOLOv4
cp TensorRT-Scaled-YOLOv4/script/ScaledYOLOv4-csp/* ScaledYOLOv4/
# Download yolov4-csp.weights into ScaledYOLOv4/
cd ScaledYOLOv4
python3 export.py
```
### 2. Compile

```shell
cd ../TensorRT-Scaled-YOLOv4
mkdir build
cd build
cmake ..
make -j8
```
### 3. Convert the ONNX model to a TensorRT engine

```shell
./makeCudaEngine -i ../../ScaledYOLOv4/yolov4-csp.onnx -o yolov4-csp.trt
```
### 4. Test

```shell
./inferYoloCuda -e yolov4-csp.trt -i <your image> -show -save
```
## Speed

| Mode | GPU | Inference time | AP |
|------|-----|----------------|----|
| FP16 | V100 | 10 ms | - |
| FP16 | Xavier | 35 ms | - |
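For reference, the per-image latencies above translate into throughput as follows (plain arithmetic on the measurements, not an additional benchmark):

```python
def latency_to_fps(ms: float) -> float:
    """Convert a per-image inference time in milliseconds to frames per second."""
    return 1000.0 / ms

print(latency_to_fps(10.0))            # V100:   100.0 FPS
print(round(latency_to_fps(35.0), 1))  # Xavier: 28.6 FPS
```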
## Using the Mish plugin layer

### 1. Register the Mish plugin

Download the open-source version of TensorRT and register the Mish plugin in `builtin_op_importers.cpp` by pasting the following code at the bottom of that file:
```cpp
DEFINE_BUILTIN_OP_IMPORTER(Mish)
{
    ASSERT(inputs.at(0).is_tensor(), nvonnxparser::ErrorCode::kUNSUPPORTED_NODE); // input
    std::vector<nvinfer1::ITensor*> tensors;
    nvinfer1::ITensor* input = &convertToTensor(inputs.at(0), ctx);
    tensors.push_back(input);

    const std::string pluginName = "Mish_TRT";
    const std::string pluginVersion = "001";
    std::vector<nvinfer1::PluginField> f;

    const auto mPluginRegistry = getPluginRegistry();
    const auto pluginCreator
        = mPluginRegistry->getPluginCreator(pluginName.c_str(), pluginVersion.c_str(), "");

    nvinfer1::PluginFieldCollection fc;
    fc.nbFields = f.size();
    fc.fields = f.data();

    nvinfer1::IPluginV2* plugin = pluginCreator->createPlugin(node.name().c_str(), &fc);
    ASSERT(plugin != nullptr && "Mish plugin was not found in the plugin registry!",
        ErrorCode::kUNSUPPORTED_NODE);

    nvinfer1::IPluginV2Layer* layer = ctx->network()->addPluginV2(tensors.data(), tensors.size(), *plugin);
    RETURN_ALL_OUTPUTS(layer);
}
```
### 2. Switch models.py to MishImplementation

In `ScaledYOLOv4/models/models.py`, use `MishImplementation()` instead of `Mish()`, i.e. replace

```python
modules.add_module('activation', Mish())
```

with

```python
modules.add_module('activation', MishImplementation())
```
### 3. Generate the ONNX model and convert it to a TensorRT engine

```shell
python3 export.py
./makeCudaEngine -i ../../ScaledYOLOv4/yolov4-csp.onnx -o yolov4-csp.trt
```
### 4. Test

```shell
./inferYoloCuda -e yolov4-csp.trt -i <your image> -show -save
```

Note: on Xavier, inference with the Mish plugin is more than 10 ms slower than without it.
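For reference, the activation that the `Mish_TRT` plugin computes is mish(x) = x · tanh(softplus(x)). A minimal pure-Python reference implementation (a sketch for checking values, not the repository's CUDA kernel):

```python
import math

def softplus(x: float) -> float:
    # Numerically stable softplus: ln(1 + e^x) = max(x, 0) + log1p(e^-|x|)
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def mish(x: float) -> float:
    """Mish activation: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))

print(mish(0.0))            # 0.0
print(round(mish(1.0), 4))  # 0.8651
```

For large positive inputs mish(x) approaches x, and for large negative inputs it approaches 0, which is why it behaves like a smooth variant of ReLU.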