1. Project introduction
Paddle Inference is PaddlePaddle's native inference library. It provides high-performance server-side inference and is built directly on Paddle's training operators, so it can run inference for every model trained with Paddle. Paddle Inference offers a rich feature set and excellent performance: it has been deeply adapted and optimized for different platforms and application scenarios to achieve high throughput and low latency, so PaddlePaddle models can be deployed on the server side as soon as they are trained.
However, Paddle Inference currently only exposes Python, C++, C, and Go interfaces, so it cannot be used from C# directly. In recent years, C# has developed rapidly and now ranks among the most popular programming languages. To make the Paddle Inference library callable from C#, PaddleInferenceSharp wraps the C++ interface in a dynamic link library and invokes it from the C# side, allowing deep learning models to be deployed with Paddle Inference on the C# platform. The figure below illustrates the underlying principle:
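As background, the C#-to-C++ bridge relies on P/Invoke: a native DLL that links against the Paddle Inference C++ library exports plain C functions, and the C# side declares them with `DllImport`. The following is only an illustrative sketch of that pattern; the DLL name and exported function names below are placeholders, not the actual exports of PaddleInferenceSharp.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical declarations showing the P/Invoke pattern the wrapper is built on.
internal static class NativeMethods
{
    // Create an inference core from a model file inside the native wrapper DLL.
    [DllImport("paddle_inference_wrapper.dll", CallingConvention = CallingConvention.Cdecl)]
    public static extern IntPtr create_predictor(string modelPath, string paramsPath);

    // Release the unmanaged predictor when inference is finished.
    [DllImport("paddle_inference_wrapper.dll", CallingConvention = CallingConvention.Cdecl)]
    public static extern void delete_predictor(IntPtr predictor);
}
```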
2. Project environment configuration
To make the code reproducible, the development environment used for this project is listed below; adjust it to your own needs.
- OS: Windows 11
- CUDA: 11.4
- cuDNN: 8.2.4
- TensorRT: 8.4.0.6
- OpenCV: 4.5.5
- Visual Studio: 2022
- C# framework: .NET 6.0
- OpenCvSharp: OpenCvSharp4
The most important step is installing the C++ version of Paddle Inference; for the installation procedure, see: Paddle Inference C++ Dependent Library Installation (Windows). For the other dependencies, see: NVIDIA TensorRT Installation (Windows C++) and OpenCV C++ Installation and Configuration.
3. Project download method
The project's source code is open-sourced on both GitHub and Gitee.
Github:
git clone https://github.com/guojin-yan/PaddleInferenceSharp.git
Gitee:
git clone https://gitee.com/guojin-yan/PaddleInferenceSharp.git
4. PaddleInfer class
4.1 API methods
Serial number | Type | API | Parameter explanation | Description
---|---|---|---|---
1 | method | PaddleInfer() | Constructor: initializes the inference core and reads the local model |
| | parameter | string model_path | Static graph model file |
| | | string params_path | Model parameter file; empty by default |
2 | method | void set_divice() | Set the inference device | Supports CPU, GPU, ONNX Runtime, and oneDNN
| | parameter | Device device | Device name selection |
| | | int num | For CPU and ONNX Runtime, the number of threads (default 10); for GPU, the graphics card index (default 0); for oneDNN, the cache capacity (default 1) |
| | | ulong memory_init_size | Initial memory pool size (takes effect when using GPU); default 500 |
| | | int workspace_size | Video memory workspace (takes effect when using GPU); default 30 |
3 | method | List<string> get_input_names() | Get the input node names |
4 | method | void set_input_shape() | Set the shape of an input node | Set according to the node dimensions
| | parameter | int[] input_shape | Shape array |
| | | string input_name | Node name |
5 | method | void load_input_data() | Set image/ordinary input data | Method is overloaded
| | parameter | string input_name | Input node name |
| | | float[] input_data | Input data |
| | parameter | string input_name | Input node name |
| | | byte[] image_data | Image data |
| | | ulong image_size | Image data length |
| | | int type | Data processing type: type == 0: mean-variance normalization, direct resize; type == 1: ordinary normalization, direct resize; type == 2: mean-variance normalization, affine transformation |
6 | method | void infer() | Model inference |
7 | method | List<string> get_output_names() | Get the output node names |
8 | method | List<int> get_shape() | Get the shape of the specified node |
| | parameter | string node_name | Node name |
9 | method | T[] read_infer_result<T>() | Read inference result data | Supports reading Float32, Int32, and Int64 data
| | parameter | string output_name | Output node name |
| | | int data_size | Output data length |
10 | method | void delet() | Free the memory address |
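To tie the API table together, here is a minimal usage sketch of a typical inference flow (load model, set device, set input, infer, read output). The model paths, input shape, and output length are placeholders, `using` directives for the library's namespace are omitted, and the exact method signatures may differ slightly from the table; treat it as an outline rather than a verified sample.

```csharp
using System;
using System.Collections.Generic;

class Demo
{
    static void Main()
    {
        // 1. Load the static graph model and its parameter file (placeholder paths).
        PaddleInfer predictor = new PaddleInfer("model.pdmodel", "model.pdiparams");

        // 2. Choose the inference device (GPU index 0 in this sketch).
        predictor.set_divice(Device.GPU, 0);

        // 3. Query the input node and set its shape, e.g. NCHW for one 224x224 image.
        List<string> input_names = predictor.get_input_names();
        predictor.set_input_shape(new int[] { 1, 3, 224, 224 }, input_names[0]);

        // 4. Feed the input data (here a plain float buffer of matching size).
        float[] input_data = new float[1 * 3 * 224 * 224];
        predictor.load_input_data(input_names[0], input_data);

        // 5. Run inference.
        predictor.infer();

        // 6. Read the result from the first output node (output length is a placeholder).
        List<string> output_names = predictor.get_output_names();
        List<int> output_shape = predictor.get_shape(output_names[0]);
        float[] result = predictor.read_infer_result<float>(output_names[0], 1000);

        // 7. Release the unmanaged memory held by the inference core.
        predictor.delet();
    }
}
```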
4.2 Enumeration
Serial number | Name | Enum value | Meaning
---|---|---|---
1 | Device (device name) | CPU | Inference on the CPU
| | | GPU | Inference on the GPU
| | | ONNX_runtime | Inference with ONNX Runtime
| | | oneDNN | Inference with oneDNN
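For reference, a short sketch of how the Device enumeration values might be passed to set_divice(), following the parameter meanings and defaults listed in the API table above; the exact call signature is an assumption.

```csharp
// Selecting different inference back ends with the Device enum (illustrative values).
predictor.set_divice(Device.CPU, 10);           // CPU with 10 threads
predictor.set_divice(Device.GPU, 0, 500, 30);   // GPU 0, 500 MB memory pool, workspace 30
predictor.set_divice(Device.ONNX_runtime, 10);  // ONNX Runtime with 10 threads
predictor.set_divice(Device.oneDNN, 1);         // oneDNN with cache capacity 1
```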
Tutorials with worked examples of the methods above, together with detailed technical documentation, will be published later, so stay tuned.