[TensorRT] Deploying the Yolov5 model from C# via TensorRT - Part 2: Building the Nvinfer class

  NVIDIA TensorRT™ is an SDK for high-performance deep learning inference that provides low latency and high throughput for deep learning inference applications. For detailed installation methods, refer to the following blog: NVIDIA TensorRT Installation (Windows C++)
  In the previous article we showed how to deploy the Yolov5 model with TensorRT in C++. In practice, however, the model often needs to be deployed from C#, with C# calling the inference interface through function calls. Here we expose the C++ implementation as a dynamic link library (dll) and build TensorRTSharp on top of it, so that the model can be deployed from C#.

2. Build the Nvinfer class

2.1 Create a new C# class library

  Right-click the solution, select Add -> New Project, choose the C# Class Library template, name the project csharp_tensorrt_class, and pick a project framework that matches the one installed on your machine; .NET 5.0 is used here. After the project is created, right-click the project, select Add -> New Item, choose Class, and add two class files: Nvinfer.cs and NativeMethods.cs.

2.2 Import the methods from the dll file

  In the NativeMethods.cs file, we import all the methods exported by the dll into C# through the [DllImport()] attribute. The model conversion method is imported as follows:

[DllImport(tensorrt_dll_path, CharSet = CharSet.Unicode, CallingConvention = CallingConvention.Cdecl)]
public extern static void onnx_to_engine(string onnx_file_path, string engine_file_path, int type);
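  The tensorrt_dll_path constant referenced by the attribute is not shown above; it is just a string constant inside the NativeMethods class pointing at the compiled dll. A minimal sketch, where the file name and path are placeholders that must be replaced with the actual output of your dll project:

using System.Runtime.InteropServices;

public class NativeMethods
{
    // Hypothetical location of the compiled C++ TensorRT wrapper dll;
    // replace this with the real build output path on your machine.
    private const string tensorrt_dll_path = @"E:\TensorRTSharp\x64\Release\tensorrt_dll.dll";

    // ... the [DllImport] declarations shown in this section go here ...
}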

  Here tensorrt_dll_path is the path of the dll file, CharSet = CharSet.Unicode means that strings are marshalled as Unicode (so Chinese characters in paths are supported), and CallingConvention = CallingConvention.Cdecl specifies the Cdecl calling convention, in which the caller cleans up the stack.
When matching the dll interface, the entry point is matched by method name, so the C# method name must be exactly the same as the name exported by the dll. The parameter types must also correspond: a void* return value in C++ corresponds to IntPtr in C#, and a wchar_t* character pointer among the input parameters corresponds to a C# string. With the method names and parameter types matched one to one, the dll methods can be called from C#. The other methods are rewritten in C# as follows:

// Read a local engine model and initialize NvinferStruct
[DllImport(tensorrt_dll_path, CharSet = CharSet.Unicode, CallingConvention = CallingConvention.Cdecl)]
public extern static IntPtr nvinfer_init(string engine_filename, int num_ionode);
// Create an input/output buffer in GPU memory
[DllImport(tensorrt_dll_path, CharSet = CharSet.Unicode, CallingConvention = CallingConvention.Cdecl)]
public extern static IntPtr creat_gpu_buffer(IntPtr nvinfer_ptr, string node_name, ulong data_length);
// Load image input data into the buffer
[DllImport(tensorrt_dll_path, CharSet = CharSet.Unicode, CallingConvention = CallingConvention.Cdecl)]
public extern static IntPtr load_image_data(IntPtr nvinfer_ptr, string node_name, ref byte image_data, ulong image_size, int BN_means);
// Model inference
[DllImport(tensorrt_dll_path, CharSet = CharSet.Unicode, CallingConvention = CallingConvention.Cdecl)]
public extern static IntPtr infer(IntPtr nvinfer_ptr);
// Read the inference results
[DllImport(tensorrt_dll_path, CharSet = CharSet.Unicode, CallingConvention = CallingConvention.Cdecl)]
public extern static void read_infer_result(IntPtr nvinfer_ptr, string node_name_wchar, ref float result, ulong data_length);
// Delete the memory address
[DllImport(tensorrt_dll_path, CharSet = CharSet.Unicode, CallingConvention = CallingConvention.Cdecl)]
public extern static void nvinfer_delete(IntPtr nvinfer_ptr);

2.3 Create Nvinfer class

  To make the TensorRT methods imported from the dll easier to call and to reduce the number of interface methods the user has to deal with, we wrap them in our own inference class in C#, named Nvinfer. Its main member variables and methods are shown in the Nvinfer class diagram.

public class Nvinfer{}

(Figure: Nvinfer class diagram)
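  In place of the diagram, the outline below summarizes the members of the Nvinfer class as they are built up in the rest of this section:

public class Nvinfer
{
    // Member variable:
    //   private IntPtr ptr               -- pointer to the inference core created on the C++ side
    // Methods (detailed below):
    //   onnx_to_engine()    -- convert an onnx model to an engine file
    //   init()              -- read the local engine model and initialize the inference core
    //   creat_gpu_buffer()  -- create input/output buffers in GPU memory
    //   load_image_data()   -- load image data into the input buffer
    //   infer()             -- run model inference
    //   read_infer_result() -- read the inference results
    //   delete()            -- release the unmanaged memory
}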

  In the Nvinfer class we only need one address variable as a member, used to receive the inference core pointer returned by the interface functions. Since it is only accessed inside this class, it is declared as a private member:

private IntPtr ptr = new IntPtr();

  First we encapsulate the model conversion method onnx_to_engine(), which converts an onnx model to the engine format. Because an engine file is built against the local hardware configuration, it is not portable and has to be generated on the machine where it will be used. The method takes the path of the local onnx model file, the path where the converted engine file will be stored, and the target precision, and simply calls the imported NativeMethods.onnx_to_engine() method.

public void onnx_to_engine(string onnx_file_path, string engine_file_path, AccuracyFlag type){
    NativeMethods.onnx_to_engine(onnx_file_path, engine_file_path, (int)type);
}
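  The AccuracyFlag enum used for the precision argument is not listed in this part. The sketch below shows a plausible definition together with an example call; the enum member names, their integer values, and the file paths are all assumptions that have to match your C++ implementation and local files:

// Assumed precision flags; the integer values must agree with what the C++ side expects.
public enum AccuracyFlag
{
    FP32 = 0,
    FP16 = 1
}

public class ConvertExample
{
    public static void Run()
    {
        // Example: convert a local yolov5s onnx model to an engine file (paths are placeholders).
        Nvinfer nvinfer = new Nvinfer();
        nvinfer.onnx_to_engine(@"E:\model\yolov5s.onnx", @"E:\model\yolov5s.engine", AccuracyFlag.FP16);
    }
}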

  Next we build the inference model initialization method init(). It only needs the path of the engine model file and the number of input and output nodes; it then calls NativeMethods.nvinfer_init(), which reads the local engine model and initializes the relevant member variables of the inference engine structure.

public void init(string engine_filename, int num_ionode){
    ptr = NativeMethods.nvinfer_init(engine_filename, num_ionode);
}

  creat_gpu_buffer() creates the input/output buffers in GPU memory; the name of each input/output node and the data size of that node have to be specified.

public void creat_gpu_buffer(string node_name, ulong data_length){
    ptr = NativeMethods.creat_gpu_buffer(ptr, node_name, data_length);
}
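  As a concrete example of choosing data_length: a standard yolov5s export typically has an input of 1x3x640x640 and an output of 1x25200x85. These shapes and the node names "images" and "output" are assumptions about the exported model, so verify them against your own onnx file:

public class BufferSizeExample
{
    // Assumed yolov5s node shapes: input 1x3x640x640, output 1x25200x85.
    public const ulong InputLength = 3 * 640 * 640;
    public const ulong OutputLength = 25200 * 85;

    public static void CreateBuffers(Nvinfer nvinfer)
    {
        // "images" and "output" are the usual yolov5 export node names; check your own model.
        nvinfer.creat_gpu_buffer("images", InputLength);
        nvinfer.creat_gpu_buffer("output", OutputLength);
    }
}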

  load_image_data() loads the data to be inferred into the inference model. The input of this method is the image data converted into a matrix and passed as a byte array, which makes it convenient to transfer image data between C++ and C#. The image preprocessing steps are already handled inside the C++ implementation, so no preprocessing is needed on the C# side.

public void load_image_data(string node_name, byte[] image_data, ulong image_size, BNFlag BN_means){
    ptr = NativeMethods.load_image_data(ptr, node_name, ref image_data[0], image_size, (int)BN_means);
}
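  Exactly how the byte array must be laid out is decided by the C++ implementation, which is not shown in this part. The sketch below simply assumes the raw bytes of the encoded image file can be passed through and decoded on the C++ side, and leaves the BNFlag choice to the caller since its members are not listed here; the image path is a placeholder:

using System.IO;

public class LoadImageExample
{
    public static void LoadImage(Nvinfer nvinfer, BNFlag bn_means)
    {
        // Assumption: the C++ side decodes and preprocesses the encoded image bytes itself.
        byte[] image_data = File.ReadAllBytes(@"E:\image\test.jpg");
        ulong image_size = (ulong)image_data.Length;
        nvinfer.load_image_data("images", image_data, image_size, bn_means);
    }
}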

  infer() calls the model inference method, running inference on the data configured above.

public void infer(){
    ptr = NativeMethods.infer(ptr);
}

  read_infer_result() reads the result data after model inference. At present only floating-point results are supported; if other data types are needed later, the method will be extended accordingly.

public float[] read_infer_result(string node_name_wchar, ulong data_length){
    float[] result = new float[data_length];
    NativeMethods.read_infer_result(ptr, node_name_wchar, ref result[0], data_length);
    return result;
}

  The last step is to release the memory held on the C++ side; if it is not released, repeated inference will accumulate allocations and cause memory leaks.

public void delete(){
    NativeMethods.nvinfer_delete(ptr);
}
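  Putting the methods together, a minimal end-to-end usage sketch could look like the following. The engine path, image path, node names, buffer sizes (a yolov5s model with a 640x640 input) and the BN flag value are all assumptions that need to be adapted to your own model:

using System;
using System.IO;

public class NvinferDemo
{
    public static void Main()
    {
        Nvinfer predictor = new Nvinfer();

        // Read the local engine model; 2 = one input node + one output node (assumed).
        predictor.init(@"E:\model\yolov5s.engine", 2);

        // Create GPU buffers for the assumed yolov5s shapes: 1x3x640x640 in, 1x25200x85 out.
        predictor.creat_gpu_buffer("images", 3 * 640 * 640);
        predictor.creat_gpu_buffer("output", 25200 * 85);

        // Load the encoded image bytes; preprocessing is done on the C++ side.
        // (BNFlag)0 is a placeholder: the correct flag depends on the C++ implementation.
        byte[] image_data = File.ReadAllBytes(@"E:\image\test.jpg");
        predictor.load_image_data("images", image_data, (ulong)image_data.Length, (BNFlag)0);

        // Run inference and read back the 25200 x 85 output floats.
        predictor.infer();
        float[] result = predictor.read_infer_result("output", 25200 * 85);
        Console.WriteLine("Output length: " + result.Length);

        // Release the unmanaged memory held by the inference core.
        predictor.delete();
    }
}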

2.4 Compile the Nvinfer class library

Right-click the project and click Build/Rebuild. If the build output reports success, the class library has been compiled correctly.


Origin: blog.csdn.net/grape_yan/article/details/128551469