Added 2023-05-12: compile ONNX Runtime directly from source, and build ONNX Runtime with different inference engines (especially TensorRT) as the backend.
- Documentation: TensorRT Execution Provider; note that the versions must correspond.
- Of course, other inference engines can also be used as the backend: ONNX Runtime Execution Providers
- The currently supported backends are as follows:
- Check the versions yourself. I verified ONNX Runtime 1.12.1 successfully with TensorRT 8.4.3 | CUDA 11.4.4 | cuDNN 8.4.1.
- My driver was originally version 525.60, but after reinstalling I could not find that version again, so I installed 525.125 instead. The CUDA version reported here is the highest version the driver supports!
- Reference:
TensorRT Execution Provider
CUDA Execution Provider
The TensorRT configuration code is as follows; this setup enables only FP16 quantization, not INT8 quantization.
OrtTensorRTProviderOptions trt_options{};
trt_options.device_id = 0;
trt_options.has_user_compute_stream = 1;
trt_options.trt_max_partition_iterations = 1000;
trt_options.trt_min_subgraph_size = 1;
trt_options.trt_engine_decryption_enable = false;
trt_options.trt_dla_core = 0;
trt_options.trt_dla_enable = 0;
trt_options.trt_fp16_enable = 1;
trt_options.trt_int8_enable = 0;
trt_options.trt_max_workspace_size = 2147483648;
trt_options.trt_int8_use_native_calibration_table = 1;
trt_options.trt_engine_cache_enable = 1;
trt_options.trt_engine_cache_path = "./trtcache";
trt_options.trt_dump_subgraphs = 1;
session_options.AppendExecutionProvider_TensorRT(trt_options);
You can install the Linux (x86_64) build. Use the nvidia-smi command to view the current driver version and the highest CUDA version it supports.
Note: because of CUDA Minor Version Compatibility, ONNX Runtime built with CUDA 11.4 should be compatible with any CUDA 11.x version. See Nvidia's CUDA Minor Version Compatibility documentation.
- Note:
To compile ONNX Runtime with TensorRT as the backend on Ubuntu, you need to install the cuDNN developer package, because it contains the cuDNN header files and dynamic libraries used to compile and link code that uses cuDNN.
Note that cuDNN is installed here from deb packages. The cuDNN developer package is different from the cuDNN runtime package: the former contains the files required for development, while the latter contains only the runtime library files. Make sure to select the correct one when installing; note that to install the dev package you must first install the runtime package. After installing cuDNN from the deb packages on Ubuntu, the cuDNN library files usually end up in /usr/lib/x86_64-linux-gnu/: specifically, libcudnn.so is installed in /usr/lib/x86_64-linux-gnu/ and the header files in /usr/include/. The version table in the official onnxruntime docs below is not accurate! For the ONNX Runtime / TensorRT / CUDA version pairing, it is best to consult the TensorRT documentation and the ONNX Runtime GitHub repository: TensorRT Release Notes, ONNX Runtime Releases
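A quick way to confirm that both halves of the cuDNN install are usable is to compile a one-file program against the header and library. This is a minimal sketch; the include/library paths assume the default deb layout described above:
#include <cudnn.h>   // from the cuDNN developer package
#include <cstdio>
int main() {
    // CUDNN_MAJOR/MINOR/PATCHLEVEL come from the header (developer package)
    std::printf("cuDNN header version : %d.%d.%d\n", CUDNN_MAJOR, CUDNN_MINOR, CUDNN_PATCHLEVEL);
    // cudnnGetVersion() is resolved from libcudnn.so (runtime package)
    std::printf("cuDNN library version: %zu\n", cudnnGetVersion());
    return 0;
}
Build with, e.g., g++ check_cudnn.cpp -I/usr/include -L/usr/lib/x86_64-linux-gnu -lcudnn: it compiles only if the headers are present and links only if the runtime library is present.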
- Supplementary knowledge
- IR (Intermediate Representation): Analysis of layer IR in AI frameworks
- Operator: What is an operator?
- ONNX opset (the ONNX operator set version referenced during conversion; default 9): Model deployment tutorial (3): PyTorch to ONNX detailed explanation
- ONNX official website: Open Neural Network Exchange
- ONNX Runtime official website: Optimize and Accelerate Machine Learning Inferencing and Training
- Introduction on GitHub to usage in each language, plus some examples: microsoft / onnxruntime-inference-examples, where the c_cxx folder holds the C++ examples (onnxruntime-inference-examples/c_cxx/).
- Operations involving char* and std::string that need to be understood: c++ string to char*
1 Export and verification of ONNX model
- For the introduction of Yolov5 to ONNX model + official documents, see the article: Detailed introduction of Yolov5 to ONNX model + Python deployment using ONNX Runtime (including the introduction of official documents) .
The model used in this article is as follows:
According to the Python code and environment, the opset version I use is 12:
parser.add_argument('--opset', type=int, default=12, help='ONNX: opset version')
The onnx version is 1.12.0; onnxruntime 1.13.1 is used in Python, and 1.13.1 will also be used here.
Versions of CUDA, TensorRT, etc. follow: (Windows 10) TensorRT acceleration of the Yolov5-5.0 model + C++ deployment + VS2019 packaged dll (CMake) + Qt calls
cuda 11.1
cudnn 8.5.0
TensorRT 8.2.1.8
Opencv 4.5.5
CMake 3.24.2
2 ONNX Runtime (C++) reads the model in ONNX format
2.1 Check the version compatibility of ONNX Runtime (C++)
In Python you can simply pip install onnxruntime, but in C++ it has to be set up yourself, and compatibility must also be considered.
2.1.1 Enter the Docs page of ONNX Runtime
2.1.2 View the compatibility of the C++ version of ONNX Runtime with different systems
Enter Get Started through the How to use ONNX Runtime option, then open Get Started / C++. Compatibility can be checked under Builds.
2.2 Installation of ONNX Runtime (C++)
- This chapter mainly records what I learned while exploring ONNX Runtime. You can choose a different installation method according to your needs; I chose the simplest method for basic learning and use.
- Official address: Install ONNX Runtime (ORT) , select C#/C/C++/WinML Installs .
You can also download directly from GitHub: .zip and .tgz files are included as assets in each GitHub release.
Here we only try the CPU deployment, as a learning exercise; the real focus is the TensorRT deployment. - The whole process can be understood through the How it works section of the homepage: click basic tutorials to enter ONNX Runtime Inferencing: API Basics, select C/C++ examples to enter microsoft / onnxruntime-inference-examples / c_cxx, and proceed according to the process in the README.
I am mainly here to learn Load and run the model with ONNX Runtime.
You can see the use of the C++ version of ONNX Runtime, there are two options:
- download a prebuilt package : download the prebuilt package directly
- build from source : Build the required version from the source code, which can be understood as building a version according to the environment of your computer, such as different CUDA versions, etc.
- The two methods are shown in the figure below: You can choose according to your own needs.
The following is a brief introduction of the two options.
2.2.1 Select the prebuilt version you need in GitHub
Option 1: download a prebuilt package
The version I downloaded: onnxruntime-win-x64-1.13.1.zip , after downloading, unzip it and use it.
2.2.2 build from source
Option 2: build from source
Just follow the guidance in Option 2 in the picture above.
2.3 ONNX Runtime (C++) reads the model
- The official document steps at this point: Get Started
- You can also look at the official Samples; select C/C++ examples in ONNX Runtime Inferencing: API Basics.
- In the examples C++ GitHub repository https://github.com/microsoft/onnxruntime-inference-examples/tree/main/c_cxx , just choose a sample and read the cpp file inside; it is more convenient to download the repository. You can use the code there directly. My YOLOv5 inference code also draws on OpenCV deployment of yolov5-v6.1 target detection (with source code) and using onnxruntime to deploy the yolov5 model under the win10 system. The C++ onnxruntime usage code is as follows. It is rather messy; I later encapsulated it in a class myself, and you can wrap the following code according to your own needs:
#include <fstream>
#include <iostream>
#include <opencv2/imgproc.hpp>
#include <opencv2/highgui.hpp>
#include <onnxruntime_cxx_api.h>
#include <opencv2/core/utils/logger.hpp>
#include <opencv2/opencv.hpp>
using namespace cv;
Mat resize_image(Mat srcimg, int* newh, int* neww, int* top, int* left)
{
int srch = srcimg.rows, srcw = srcimg.cols;
int inpHeight = 640;
int inpWidth = 640;
*newh = inpHeight;
*neww = inpWidth;
bool keep_ratio = true;
Mat dstimg;
if (keep_ratio && srch != srcw) {
float hw_scale = (float)srch / srcw;
if (hw_scale > 1) {
*newh = inpHeight;
*neww = int(inpWidth / hw_scale);
resize(srcimg, dstimg, Size(*neww, *newh), INTER_AREA);
*left = int((inpWidth - *neww) * 0.5);
copyMakeBorder(dstimg, dstimg, 0, 0, *left, inpWidth - *neww - *left, BORDER_CONSTANT, Scalar(114, 114, 114)); // a bare 114 would mean Scalar(114,0,0): only the B channel filled
}
else {
*newh = (int)(inpHeight * hw_scale);
*neww = inpWidth;
resize(srcimg, dstimg, Size(*neww, *newh), INTER_AREA);
*top = (int)((inpHeight - *newh) * 0.5);
copyMakeBorder(dstimg, dstimg, *top, inpHeight - *newh - *top, 0, 0, BORDER_CONSTANT, Scalar(114, 114, 114)); // gray padding on top/bottom
}
}
else {
resize(srcimg, dstimg, Size(*neww, *newh), INTER_AREA);
}
return dstimg;
}
int main(int argc, char* argv[])
{
//std::string imgpath = "images/bus.jpg";
std::string imgpath = "images/real.jpg";
utils::logging::setLogLevel(utils::logging::LOG_LEVEL_ERROR); // make OpenCV print error logs only
Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "yolov5s-5.0");
Ort::SessionOptions session_options;
session_options.SetIntraOpNumThreads(1);
session_options.SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_ENABLE_EXTENDED);
#ifdef _WIN32
//const wchar_t* model_path = L"yolov5s.onnx";
const wchar_t* model_path = L"sim_best20221027.onnx";
#else
const char* model_path = "yolov5s.onnx";
#endif
std::vector<std::string> class_names;
//std::string classesFile = "class.names";
std::string classesFile = "myclass.txt";
std::ifstream ifs(classesFile.c_str());
std::string line;
while (getline(ifs, line)) class_names.push_back(line);
Ort::Session session(env, model_path, session_options);
// print model input layer (node names, types, shape etc.)
Ort::AllocatorWithDefaultOptions allocator;
// print number of model input nodes
size_t num_input_nodes = session.GetInputCount();
std::vector<const char*> input_node_names = { "images" };
std::vector<const char*> output_node_names = { "output0" };
size_t input_tensor_size = 3*640*640;
std::vector<float> input_tensor_values(input_tensor_size);
cv::Mat srcimg = cv::imread(imgpath);
int newh = 0, neww = 0, padh = 0, padw = 0;
Mat dstimg = resize_image(srcimg, &newh, &neww, &padh, &padw);//Padded resize
//resizedImage.convertTo(floatImage, CV_32FC3, 1 / 255.0);
for (int c = 0; c < 3; c++)
{
for (int i = 0; i < 640; i++)
{
for (int j = 0; j < 640; j++)
{
float pix = dstimg.ptr<uchar>(i)[j * 3 + 2 - c]; // read the BGR pixel in RGB order (index 2 - c)
input_tensor_values[c * 640 * 640 + i * 640 + size_t(j)] = pix / 255.0; // HWC -> CHW, normalize to [0,1]
}
}
}
// create input tensor object from data values
std::vector<int64_t> input_node_dims = { 1, 3, 640, 640 };
auto memory_info = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
Ort::Value input_tensor = Ort::Value::CreateTensor<float>(memory_info, input_tensor_values.data(), input_tensor_size, input_node_dims.data(), input_node_dims.size());
std::vector<Ort::Value> ort_inputs;
ort_inputs.push_back(std::move(input_tensor));
// score model & input tensor, get back output tensor
std::vector<Ort::Value> output_tensors = session.Run(Ort::RunOptions{ nullptr }, input_node_names.data(), ort_inputs.data(), input_node_names.size(), output_node_names.data(), output_node_names.size());
// Get pointer to output tensor float values
const float* rawOutput = output_tensors[0].GetTensorData<float>();
//generate proposals
std::vector<int64_t> outputShape = output_tensors[0].GetTensorTypeAndShapeInfo().GetShape();
size_t count = output_tensors[0].GetTensorTypeAndShapeInfo().GetElementCount();
std::vector<float> output(rawOutput, rawOutput + count);
std::vector<cv::Rect> boxes;
std::vector<float> confs;
std::vector<int> classIds;
int numClasses = (int)outputShape[2] - 5;
int elementsInBatch = (int)(outputShape[1] * outputShape[2]);
float confThreshold = 0.5;
for (auto it = output.begin(); it != output.begin() + elementsInBatch; it += outputShape[2])
{
float clsConf = *(it+4);//object scores
if (clsConf > confThreshold)
{
int centerX = (int)(*it);
int centerY = (int)(*(it + 1));
int width = (int)(*(it + 2));
int height = (int)(*(it + 3));
int x1 = centerX - width / 2;
int y1 = centerY - height / 2;
boxes.emplace_back(cv::Rect(x1, y1, width, height));
// first 5 element are x y w h and obj confidence
int bestClassId = -1;
float bestConf = 0.0;
for (int i = 5; i < numClasses + 5; i++)
{
if ((*(it + i)) > bestConf)
{
bestConf = it[i];
bestClassId = i - 5;
}
}
//confs.emplace_back(bestConf * clsConf);
confs.emplace_back(clsConf);
classIds.emplace_back(bestClassId);
}
}
float iouThreshold = 0.5;
std::vector<int> indices;
// Perform non maximum suppression to eliminate redundant overlapping boxes with
// lower confidences
cv::dnn::NMSBoxes(boxes, confs, confThreshold, iouThreshold, indices);
// random seed
RNG rng((unsigned)time(NULL));
for (size_t i = 0; i < indices.size(); ++i)
{
int index = indices[i];
int colorR = rng.uniform(0, 255);
int colorG = rng.uniform(0, 255);
int colorB = rng.uniform(0, 255);
// keep two decimal places
float scores = round(confs[index] * 100) / 100;
std::ostringstream oss;
oss << scores;
rectangle(dstimg, Point(boxes[index].tl().x, boxes[index].tl().y), Point(boxes[index].br().x, boxes[index].br().y), Scalar(colorR, colorG, colorB), 1.5);
putText(dstimg, class_names[classIds[index]] + " " + oss.str(), Point(boxes[index].tl().x, boxes[index].tl().y - 5), FONT_HERSHEY_SIMPLEX, 0.5, Scalar(colorR, colorG, colorB), 2);
}
imshow("检测结果", dstimg);
cv::waitKey();
}
There is a big pitfall here: with the ONNX Runtime API, direct assignment of the node names works fine:
std::vector<const char*> input_node_names = { "images" };
std::vector<const char*> output_node_names = { "output0" };
But if input_node_names is filled through the ONNX Runtime API as shown below, using it will throw an exception. On inspection, both variants hold the string "images", but the stored addresses (const char* __ptr64) differ: the pointer obtained through the API no longer points to valid memory.
// print number of model input nodes
size_t num_input_nodes = session.GetInputCount();
for (int i = 0; i < num_input_nodes; i++)
{
Ort::AllocatedStringPtr input_name_Ptr = session.GetInputNameAllocated(i, allocator);
input_node_names.push_back(input_name_Ptr.get()); // "images", char* __ptr64 -- BUG: input_name_Ptr frees the string at the end of this iteration, so the stored pointer dangles
Ort::TypeInfo input_type_info = session.GetInputTypeInfo(i);
auto input_tensor_info = input_type_info.GetTensorTypeAndShapeInfo(); // Get OrtTensorTypeAndShapeInfo from an OrtTypeInfo.
auto input_dims = input_tensor_info.GetShape(); // Uses GetDimensionsCount & GetDimensions to return a std::vector of the shape.
input_node_dims.push_back(input_dims); // input_node_dims[0] = vector<int64_t>{1, 3, 640, 640}
}
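The exception can be avoided by copying each name into storage that outlives the Run() call. A minimal sketch of the safe pattern (the variable names here are mine, not from the code above):
std::vector<std::string> input_names_owned; // owns the string storage
for (size_t i = 0; i < session.GetInputCount(); i++)
{
    Ort::AllocatedStringPtr name_ptr = session.GetInputNameAllocated(i, allocator);
    input_names_owned.push_back(name_ptr.get()); // deep copy before name_ptr is destroyed
}
std::vector<const char*> input_node_names; // the view that Run() consumes
for (const std::string& s : input_names_owned)
    input_node_names.push_back(s.c_str()); // valid while input_names_owned is alive and unmodified
Note that c_str() is only taken after input_names_owned stops growing: a push_back may reallocate the vector and invalidate earlier pointers.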
2.3.1 API Learning of ONNX Runtime (C++)
- The C++ API is a thin wrapper over the C API; please refer to the C API for more details. You can also choose C/C++ API Docs from the official API Docs.
- Following the official documentation, here are some of the important APIs. The official C/C++ documentation feels rather messy, so we will learn directly from the MNIST demo combined with the official C & C++ APIs.
- Reference: Ort Namespace Reference for C++
- The following introduction covers the Ort namespace (Ort Namespace Reference), grouped by its classes, and then describes the member functions each class uses.
- According to the introduction of Python's onnxruntime, InferenceSession is the main class of ONNX Runtime. It is used to load and run ONNX models, and to specify environment and application configuration options.
- Functions marked deprecated should not be used; use their suggested alternatives instead.
2.3.1.1 (Classes)Ort::MemoryInfo
- Reference: Ort::MemoryInfo Struct Reference
- (Static Public Member Functions)
CreateCpu()
static MemoryInfo Ort::MemoryInfo::CreateCpu(OrtAllocatorType type, OrtMemType mem_type1)
- Function input:
OrtAllocatorType type:
OrtMemType mem_type1:
- The function returns an OrtMemoryInfo. Per the official description, it records where the p_data buffer (the pointer to the data) resides (CPU vs GPU etc.), and it is required as input by functions such as CreateTensor().
2.3.1.2 (Classes)Ort::Value
- Reference: Ort::Value Struct Reference
- (Static Public Member Functions)
CreateTensor()
There are four overloaded forms, you can enter the link to view: CreateTensor() [1/4]
MNIST uses the first one:
- Inputs:
p_data: obtainable via the .data() member of a std::vector
p_data_element_count: obtainable via .size()
shape: pointer to the dimensions (1, 3, 640, 640)
shape_len: the number of dimensions (4)
- The function returns an Ort::Value (see Ort::Value Struct Reference); it "Creates a tensor with a user supplied buffer. Wraps OrtApi::CreateTensorWithDataAsOrtValue."
Following the link, the wrapped C API returns nullptr if there is no error; if there is an error, it returns a pointer to an OrtStatus containing the error details, which should be freed with OrtApi::ReleaseStatus.
- (Public Member Functions)
GetTensorMutableData() or GetTensorData(); the difference is that the input and return of the latter are const.
The wrapped C function GetTensorMutableData() works as follows:
Gets a pointer to the raw data inside the tensor.
Used to directly read/write/modify the internal tensor data.
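A self-contained sketch of the round trip (not tied to any particular model): wrap a user buffer as an Ort::Value with CreateTensor(), then read it back through GetTensorMutableData(). CreateTensor() wraps the buffer rather than copying it, so the buffer must outlive the tensor:
#include <onnxruntime_cxx_api.h>
#include <cstdio>
#include <vector>
int main() {
    Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "demo"); // an Env must exist before other ORT calls
    std::vector<float> data(1 * 3 * 4 * 4, 0.5f);    // pretend NCHW 1x3x4x4 input
    std::vector<int64_t> shape = { 1, 3, 4, 4 };
    auto memory_info = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
    Ort::Value tensor = Ort::Value::CreateTensor<float>(
        memory_info, data.data(), data.size(), shape.data(), shape.size());
    float* p = tensor.GetTensorMutableData<float>(); // same memory as `data`, no copy
    std::printf("first element: %f, element count: %zu\n", p[0],
        tensor.GetTensorTypeAndShapeInfo().GetElementCount());
    return 0;
}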
2.3.1.3 (Classes)Ort::Session
- Reference: Ort::Session Struct Reference
- (Public Member Functions): link
size_t GetInputCount () const
size_t GetOutputCount () const
Note that these are the numbers of input and output nodes the model requires, not the number of images in an input batch. For example, an unprocessed yolov5 model has one input and three outputs; that is what these counts refer to. - (Public Member Functions)
Run()
There are three overloaded forms; you can follow the link to view them: Run() [1/3]
MNIST uses the second one:
- Input parameters:
input_names: the array of input names
input_values: the input data as Ort::Value objects; for how to construct them, refer to the first form of CreateTensor()
- The first form of the function returns:
A std::vector of Value objects that directly maps to the output_count (e.g. output_name[0] is the first entry of the returned vector)
- (Public Member Functions)
Session()
There are 5 overloaded forms; you can follow the link to view them: Session() [1/5]
The demo uses the second one, which wraps OrtApi::CreateSession:
- (Public Member Functions): GetInputName() / GetOutputName()
char * GetInputName (size_t index, OrtAllocator *allocator) const
char * GetOutputName (size_t index, OrtAllocator *allocator) const
Both are marked Deprecated below, so these two member functions have been abandoned and should be replaced with GetInputNameAllocated() and GetOutputNameAllocated(). From the introduction of the wrapped functions SessionGetInputName() and SessionGetOutputName():
Returns a copy of the input/output name (corresponding to the onnx model's input and output names) at the specified index.
The return value is an AllocatedStringPtr, an instance of a smart pointer (unique_ptr); see 2.3.1.8 for details.
- (Public Member Functions): link
TypeInfo GetInputTypeInfo (size_t index) const
TypeInfo GetOutputTypeInfo (size_t index) const
How to use their inputs and outputs can be learned from the wrapped C functions:
SessionGetInputTypeInfo()
SessionGetOutputTypeInfo()
2.3.1.4 (Classes)Ort::Env
- Reference: Ort::Env Struct Reference
- The Env holds the logging state used by all other objects. Note: One Env must be created before using any other Onnxruntime functionality
- (Public Member Functions)
Env()
There are a total of 6 overloaded forms, which can be viewed at the link: Env() [1/6]
The demo uses the second one. OrtLoggingLevel specifies the lowest severity of log messages to be displayed.
2.3.1.5 (Classes)Ort::SessionOptions
- Reference: Ort::SessionOptions Struct Reference
- Options object used when creating a new Session object .
- (Public Member Functions)
SessionOptions()
There are a total of 3 overloaded forms, which can be viewed at the link: SessionOptions() [1/3]
The demo uses the second one, which wraps CreateSessionOptions().
For execution providers: call the append method first for your preferred execution provider, then for the less preferred ones. If none is appended, Ort will use its internal CPU execution provider.
- (Public Member Functions)
SetGraphOptimizationLevel()
Introduction: Graph Optimizations in ONNX Runtime
Each level of optimization is performed on top of the optimizations of the previous level.
The levels are defined in the GraphOptimizationLevel enum.
- (Public Member Functions)
SetIntraOpNumThreads()
Introduction: SetIntraOpNumThreads()
It wraps OrtApi::SetIntraOpNumThreads:
Sets the number of threads used for parallel execution within the node.
When running single node operations, eg. add, this will set the maximum number of threads to use.
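A minimal configuration sketch pulling 2.3.1.4 and 2.3.1.5 together ("model.onnx" is a placeholder path, not from the code above):
Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "demo"); // lowest severity that will be logged
Ort::SessionOptions opts;
opts.SetIntraOpNumThreads(1);                    // threads used *within* one node
opts.SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_ENABLE_EXTENDED);
// No execution provider appended here, so ORT falls back to its internal CPU provider.
#ifdef _WIN32
Ort::Session session(env, L"model.onnx", opts);  // Windows builds take a wide-char path
#else
Ort::Session session(env, "model.onnx", opts);
#endif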
2.3.1.6 (Classes)Ort::TypeInfo
- Reference: Ort::TypeInfo Struct Reference
- (Public Member Functions)
Unowned< TensorTypeAndShapeInfo > GetTensorTypeAndShapeInfo () const
You can follow the link to view: GetTensorTypeAndShapeInfo(); it wraps CastTypeInfoToTensorInfo(): Get OrtTensorTypeAndShapeInfo from an OrtTypeInfo.
2.3.1.7 (Classes)Ort::TensorTypeAndShapeInfo
- (Public Member Functions)
std::vector< int64_t > GetShape () const
You can enter the link to view: GetShape()
Uses GetDimensionsCount & GetDimensions to return a std::vector of the shape. - (Public Member Functions)
size_t GetElementCount () const
You can follow the link to view: GetElementCount()
and its wrapped function GetTensorShapeElementCount():
returns the total number of elements (all dimensions multiplied together); returns 1 for a 0-dimensional tensor, and -1 if any dimension is less than 0.
2.3.1.8 (Classes)Ort::AllocatorWithDefaultOptions / Ort::Allocator
- Reference: Ort::AllocatorWithDefaultOptions Struct Reference
- an instance of a smart pointer that deallocates the buffer when it goes out of scope.
The OrtAllocator instance must still be valid at the point of memory release. - Supplementary references: Ort::Allocator Struct Reference and OrtAllocator Struct Reference
- (Typedef Documentation) AllocatedStringPtr
is a unique_ptr typedef for owning strings allocated by OrtAllocators and freeing them when the scope ends. The lifetime of the given allocator must outlive the lifetime of the AllocatedStringPtr instance .
2.3.1.10 (Classes)Ort::RunOptions
- Reference: Ort::RunOptions Struct Reference
2.3.1.11 (Classes)AllocatedStringPtr
AllocatedStringPtr: note that there are pitfalls here. AllocatedStringPtr is a smart pointer, so you must pay attention to its lifetime.
It is a unique_ptr typedef used to own strings allocated by OrtAllocators and free them when the scope ends. The lifetime of the given allocator must outlive the lifetime of the AllocatedStringPtr instance.
2.3.2 Basic learning of OpenCV (C++)
To run inference on images, you need OpenCV to process them.
2.3.2.1 Important library functions
1. Use the function cv::imread() to read pictures
2. Use the function cv::imshow() to display pictures
- When using imshow(), it must be paired with a later call to waitKey(), otherwise the image cannot be displayed normally.
waitKey() explanation:
@brief: Polls for a pressed key.
The function pollKey polls for keypress events without waiting. It returns the code of the key that was pressed, or -1 if no key was pressed since the last call. To wait until a key is pressed, use waitKey.
@note: The functions waitKey and pollKey are the only methods in HighGUI that can fetch and handle GUI events, so one of them needs to be called periodically for normal event handling, unless HighGUI is used within an environment that takes care of event processing.
@note: This function only works if there is at least one HighGUI window created and the window is active. If there are several HighGUI windows, any one of them can be activated.
3. Use the function cv::dnn::NMSBoxes() for non-maximum suppression
4. Use the function cv::rectangle() to draw rectangles
5. Use the function cv::putText() to write text
2.3.2.2 Basic data structure Mat
Official documentation: cv::Mat Class Reference
- Refer to Opencv C++ basic data structure Mat, OpenCV basic type 4 -- cv::Mat detailed explanation, and [opencv] cv::Mat image format (Data Type). Note that some operations call member functions (with parentheses), while others directly access member variables.
- Viewing the data type: OpenCV's Mat class -- obtaining common attributes and methods of image information
Constructor form: Mat(int rows, int cols, int type)
Introduction to the more important operations:
1. Use the member function channels()
to get the number of channels of the matrix
Color channel conversion: cvtColor(cv_image, cv_image, cv::COLOR_BGR2RGB);
[OpenCV3] Color space conversion - cv::cvtColor() detailed explanation
2. Use member variables dims
to get the dimension of the matrix
Tips: The difference between dims and channels()
3. Use member functions size()
to get the dimensions of the matrix
4. Use member functions convertTo()
to convert the format of the matrix
cv::Mat::convertTo() can realize data type conversion, channel sequence conversion, data digit conversion, etc., and realize its own search.
5. Use the member function ptr() / at<T>() / the element address to get the value of a pixel in the matrix
to get the value of a pixel in the matrix
Official docs: ptr() [1/20]
uchar* cv::Mat::ptr ( int i0 = 0 )
Returns a pointer to the specified matrix row.
- So how do you get the value at a given position of the picture in CHW format?
The value at row i, column j, channel c: the at operation is simple and convenient but inefficient; the ptr operation is recommended.
at method: float pix = img.at<Vec3b>(i, j)[c];
matrix element address: float pix = (int)(*(img.data + img.step[0] * i + img.step[1] * j + c));
pointer ptr: float pix = img.ptr<uchar>(i)[j * channel + channel-1 - c]; // channel-1 because arrays start at 0
iterator: difficult, not recommended for novices
- Among them, the three-channel value can be printed with cout<<img.at<Vec3b>(i, j)<<endl;, e.g. [230 222 102]; printing cout<<img.at<Vec3b>(i, j)[c]<<endl; outputs uchar gibberish, so a data type conversion is needed to print the numeric value.
- The above access methods refer to: C++ version OpenCv tutorial (four) 4 methods of reading Mat class elements and [C++ Opencv] read and write grayscale images, a certain pixel of RGB image, modify pixel value, image inversion (source code + API) , you can also find some other methods.
- For which data type to use (int, uchar, Vec3b or others), refer to the second table of opencv cv::Mat data type summary. - Note that OpenCV stores color channels in BGR order instead of RGB. If you want to read the values out in RGB order, you should use:
float pix = img.ptr<uchar>(i)[j * 3 + 2 - c];
- Here is an image of my hand drawing:
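A small comparison sketch of the access methods above, for an 8-bit, 3-channel BGR cv::Mat named img (i = row, j = column, c = channel index in RGB order; the variable names are illustrative):
int i = 10, j = 20, c = 0;                     // c == 0 means R in RGB order
uchar v1 = img.at<cv::Vec3b>(i, j)[2 - c];     // at<>: BGR is stored, so R sits at index 2
uchar v2 = *(img.data + img.step[0] * i + img.step[1] * j + (2 - c)); // raw address arithmetic
uchar v3 = img.ptr<uchar>(i)[j * 3 + 2 - c];   // row pointer (recommended)
CV_Assert(v1 == v2 && v2 == v3);               // all three read the same byte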
2.4 ONNX Runtime (C++) reasoning model
The steps of the inference model:
OpenCV reads the image —> processes the image to obtain an image of a suitable size —> performs inference —> obtains the result of the inference
2.4.1 Processing before data input
Padded resize
The processing of the input data is best done with the method proposed by yolov5.
2.4.1.1 Padded Resize Method Introduction
Padded Resize: it maintains the aspect ratio of the image and fills the remaining area with gray; the border padding (usually gray) keeps the original aspect ratio while satisfying the model's square input requirement. - Its Python source code is the letterbox() method in yolov5's dataset.py.
Of its return values only the first, img (the picture after the padded resize), is needed.
The source code is as follows; once you know the principle, you can reproduce it in C++:
def letterbox(img, new_shape=(640, 640), color=(114, 114, 114), auto=True, scaleFill=False, scaleup=True, stride=32):
# Resize and pad image while meeting stride-multiple constraints
shape = img.shape[:2] # current shape [height, width]
if isinstance(new_shape, int):
new_shape = (new_shape, new_shape)
# Scale ratio (new / old)
r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
if not scaleup: # only scale down, do not scale up (for better test mAP)
r = min(r, 1.0)
# Compute padding
ratio = r, r # width, height ratios
new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1] # wh padding
if auto: # minimum rectangle
dw, dh = np.mod(dw, stride), np.mod(dh, stride) # wh padding
elif scaleFill: # stretch
dw, dh = 0.0, 0.0
new_unpad = (new_shape[1], new_shape[0])
ratio = new_shape[1] / shape[1], new_shape[0] / shape[0] # width, height ratios
dw /= 2 # divide padding into 2 sides
dh /= 2
if shape[::-1] != new_unpad: # resize
img = cv2.resize(img, new_unpad, interpolation=cv2.INTER_LINEAR)
top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
img = cv2.copyMakeBorder(img, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color) # add border
return img, ratio, (dw, dh)
2.4.1.2 Learning the cv::resize() function
Reference: official document cv::resize()
Look at the declaration first, in imgproc.hpp:
CV_EXPORTS_W void resize( InputArray src, OutputArray dst,
Size dsize, double fx = 0, double fy = 0,
int interpolation = INTER_LINEAR );
Parameter introduction:
- src: the input image
- dst: the output image; it has size dsize and the same type as src
- dsize: the output size; dsize = Size(round(fx*src.cols), round(fy*src.rows)). For the C++ function round() and the OpenCV class Size, see: C++: round function usage and Opencv's Size class - size class; note that Size takes the width first and then the height
- interpolation: the interpolation method, bilinear by default; see enum cv::InterpolationFlags
The screenshot is as follows:
2.4.1.3 Learning the cv::copyMakeBorder() function
Reference: official document copyMakeBorder(); the function "Forms a border around an image."
- This function copies the source image into the middle of the destination image.
The areas to the left, right, above, and below the copied source image are filled with extrapolated pixels. This is not what the filtering functions based on it do (they extrapolate pixels on the fly), but it can be used by other, more complex functions (including your own) to simplify image boundary handling.
The function supports the mode where src is already in the middle of dst; in this case the function does not copy src itself but simply constructs the border. - When the source image is part of a larger image (ROI: region of interest), the function will try to use pixels outside the ROI to form the border. To disable this and always extrapolate as if src were not an ROI, use borderType | BORDER_ISOLATED.
CV_EXPORTS_W void copyMakeBorder(InputArray src, OutputArray dst,
int top, int bottom, int left, int right,
int borderType, const Scalar& value = Scalar() );
Parameter introduction:
- src: the input image
- dst: the output image; it has the same type as src and size Size(src.cols+left+right, src.rows+top+bottom)
- top, bottom, left, right: how far to extend the border on each side of the original image
- borderType: the type of border to extend; see borderInterpolate for details
- value: the border value if borderType==BORDER_CONSTANT; otherwise the border depends on the original image, and value is ignored
- In the official introduction of BorderTypes you can see the various border types in the figure below. The image border is denoted by |, the middle is the input image, and on both sides is the extended border and its relationship to the image interior. For details, see OpenCV library members - BorderTypes.
- The Scalar() function is used to set color values in OpenCV: Scalar(B,G,R). If you directly write a single number X, it is equivalent to Scalar(X,0,0); see: Scalar() function in opencv, and the official typedef: typedef Scalar_ cv::Scalar
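A short sketch of the YOLOv5-style gray padding discussed above, spelling the value out per channel to avoid the Scalar pitfall (resized is assumed to be an already-scaled BGR image; the padding amounts are illustrative):
cv::Mat padded;
int top = 20, bottom = 20, left = 0, right = 0;  // example padding amounts
cv::copyMakeBorder(resized, padded, top, bottom, left, right,
                   cv::BORDER_CONSTANT, cv::Scalar(114, 114, 114)); // a bare 114 would mean Scalar(114,0,0)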
2.4.1.4 Pixel normalization
Knowledge that needs to be understood: What are the data types CV_8U, CV_16U, CV_16S, CV_32F and CV_64F in Opencv? Mine should be CV_8UC3. Note here: the channel order of images read by OpenCV is BGR, while the normalized input should be in RGB order. Also, my input type is float32, so the data needs to be converted to CV_32FC3.
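One way to perform these conversions in a single call (a sketch, not the loop actually used in the code above) is cv::dnn::blobFromImage, which scales by 1/255, swaps BGR to RGB, and emits a CV_32F NCHW blob in one step:
cv::Mat blob = cv::dnn::blobFromImage(
    dstimg,              // the already letterboxed 640x640 BGR image
    1.0 / 255.0,         // scale factor applied to every pixel
    cv::Size(640, 640),  // spatial size of the output blob
    cv::Scalar(),        // mean to subtract (none here)
    true,                // swapRB: BGR -> RGB
    false);              // crop: keep the whole image
// blob is CV_32F with shape 1x3x640x640; blob.ptr<float>(0) can be handed to Ort::Value::CreateTensor<float>().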
3 ONNX Runtime (C++) inference with a model in ONNX format
Function used to construct the input data: the member function CreateTensor() of Ort::Value
Main inference function used: the member function Run() of Ort::Session
3.1 Model input data specification
Carefully study the inputs of the function; arrange your own data into the format it needs, matching the input data type and shape of your own model, and then feed it in.
- For the specification of data types, refer to the article:
Analysis of uint8_t / uint16_t / uint32_t /uint64_t of C language
[C/C++] uin8_t uint16_t uint32_t uint64_t data type analysis
C language in DSP (1) - definitions of int16, Uint16, float32, etc. Usage and difference of
c++ data type uint8_t/uint16_t/uint32_t/float128_t
For my model, the input should be float32 (1, 3, 640, 640)
- OpenCV is used here, so OpenCV's input data specification applies; it is located in the following header file:
- For the data type specification of C/C++, you can look at the headers themselves, where you can see that the same data type is given many aliases, for example: typedef unsigned short wchar_t; another set of aliases is located in the following header file, stdint.h.
Looking at the stdint.h source below, you can see that in addition to the data type aliases, many data-related macros are defined, which can be used to check value ranges and avoid problems such as overflow.
//
// stdint.h
//
// Copyright (c) Microsoft Corporation. All rights reserved.
//
// The C Standard Library <stdint.h> header.
//
#pragma once
#define _STDINT
#include <vcruntime.h>
#if _VCRT_COMPILER_PREPROCESSOR
#pragma warning(push)
#pragma warning(disable: _VCRUNTIME_DISABLED_WARNINGS)
typedef signed char int8_t;
typedef short int16_t;
typedef int int32_t;
typedef long long int64_t;
typedef unsigned char uint8_t;
typedef unsigned short uint16_t;
typedef unsigned int uint32_t;
typedef unsigned long long uint64_t;
typedef signed char int_least8_t;
typedef short int_least16_t;
typedef int int_least32_t;
typedef long long int_least64_t;
typedef unsigned char uint_least8_t;
typedef unsigned short uint_least16_t;
typedef unsigned int uint_least32_t;
typedef unsigned long long uint_least64_t;
typedef signed char int_fast8_t;
typedef int int_fast16_t;
typedef int int_fast32_t;
typedef long long int_fast64_t;
typedef unsigned char uint_fast8_t;
typedef unsigned int uint_fast16_t;
typedef unsigned int uint_fast32_t;
typedef unsigned long long uint_fast64_t;
typedef long long intmax_t;
typedef unsigned long long uintmax_t;
// These macros must exactly match those in the Windows SDK's intsafe.h.
#define INT8_MIN (-127i8 - 1)
#define INT16_MIN (-32767i16 - 1)
#define INT32_MIN (-2147483647i32 - 1)
#define INT64_MIN (-9223372036854775807i64 - 1)
#define INT8_MAX 127i8
#define INT16_MAX 32767i16
#define INT32_MAX 2147483647i32
#define INT64_MAX 9223372036854775807i64
#define UINT8_MAX 0xffui8
#define UINT16_MAX 0xffffui16
#define UINT32_MAX 0xffffffffui32
#define UINT64_MAX 0xffffffffffffffffui64
#define INT_LEAST8_MIN INT8_MIN
#define INT_LEAST16_MIN INT16_MIN
#define INT_LEAST32_MIN INT32_MIN
#define INT_LEAST64_MIN INT64_MIN
#define INT_LEAST8_MAX INT8_MAX
#define INT_LEAST16_MAX INT16_MAX
#define INT_LEAST32_MAX INT32_MAX
#define INT_LEAST64_MAX INT64_MAX
#define UINT_LEAST8_MAX UINT8_MAX
#define UINT_LEAST16_MAX UINT16_MAX
#define UINT_LEAST32_MAX UINT32_MAX
#define UINT_LEAST64_MAX UINT64_MAX
#define INT_FAST8_MIN INT8_MIN
#define INT_FAST16_MIN INT32_MIN
#define INT_FAST32_MIN INT32_MIN
#define INT_FAST64_MIN INT64_MIN
#define INT_FAST8_MAX INT8_MAX
#define INT_FAST16_MAX INT32_MAX
#define INT_FAST32_MAX INT32_MAX
#define INT_FAST64_MAX INT64_MAX
#define UINT_FAST8_MAX UINT8_MAX
#define UINT_FAST16_MAX UINT32_MAX
#define UINT_FAST32_MAX UINT32_MAX
#define UINT_FAST64_MAX UINT64_MAX
#ifdef _WIN64
#define INTPTR_MIN INT64_MIN
#define INTPTR_MAX INT64_MAX
#define UINTPTR_MAX UINT64_MAX
#else
#define INTPTR_MIN INT32_MIN
#define INTPTR_MAX INT32_MAX
#define UINTPTR_MAX UINT32_MAX
#endif
#define INTMAX_MIN INT64_MIN
#define INTMAX_MAX INT64_MAX
#define UINTMAX_MAX UINT64_MAX
#define PTRDIFF_MIN INTPTR_MIN
#define PTRDIFF_MAX INTPTR_MAX
#ifndef SIZE_MAX
// SIZE_MAX definition must match exactly with limits.h for modules support.
#ifdef _WIN64
#define SIZE_MAX 0xffffffffffffffffui64
#else
#define SIZE_MAX 0xffffffffui32
#endif
#endif
#define SIG_ATOMIC_MIN INT32_MIN
#define SIG_ATOMIC_MAX INT32_MAX
#define WCHAR_MIN 0x0000
#define WCHAR_MAX 0xffff
#define WINT_MIN 0x0000
#define WINT_MAX 0xffff
#define INT8_C(x) (x)
#define INT16_C(x) (x)
#define INT32_C(x) (x)
#define INT64_C(x) (x ## LL)
#define UINT8_C(x) (x)
#define UINT16_C(x) (x)
#define UINT32_C(x) (x ## U)
#define UINT64_C(x) (x ## ULL)
#define INTMAX_C(x) INT64_C(x)
#define UINTMAX_C(x) UINT64_C(x)
#pragma warning(pop) // _VCRUNTIME_DISABLED_WARNINGS
#endif // _VCRT_COMPILER_PREPROCESSOR
3.2 Introduction to model input data and inference with ONNX Runtime (C++)
Set the inputs according to the chosen inference function inline std::vector<Value> Session::Run(). There are three forms; I chose the first one: Run() [1/3]
std::vector< Value > Ort::Session::Run ( const RunOptions & run_options,
const char *const * input_names,
const Value * input_values,
size_t input_count,
const char *const * output_names,
size_t output_count
)
The caller provides a list of inputs and a list of desired outputs to be returned.
- Inside the function a
std::vector<Ort::Value> output_values;
is created as the output.
- run_options: set by yourself
- input_names: stores the input names; for example, with vector<const char*> input_names = { "images" }; you pass input_names.data() (because pointers are used); multi-input models can of course have multiple entries. - input_values: this is the key point, and there are pitfalls! Rvalue references must be used; this parameter is described in detail later
- input_count: the number of input names, which is also the number of model input nodes, e.g. input_names.size().
- output_names: the names of the model's outputs; see input_names.
- output_count: the number of output names, which is also the number of model output nodes; refer to input_count.
Then enter the following function:
inline void Session::Run(const RunOptions& run_options, const char* const* input_names, const Value* input_values, size_t input_count,
const char* const* output_names, Value* output_values, size_t output_count) {
static_assert(sizeof(Value) == sizeof(OrtValue*), "Value is really just an array of OrtValue* in memory, so we can reinterpret_cast safely");
auto ort_input_values = reinterpret_cast<const OrtValue**>(const_cast<Value*>(input_values));
auto ort_output_values = reinterpret_cast<OrtValue**>(output_values);
ThrowOnError(GetApi().Run(p_, run_options, input_names, ort_input_values, input_count, output_names, output_count, ort_output_values));
}
From this we can see:
- For input_values, the operation auto ort_input_values = reinterpret_cast<const OrtValue**>(const_cast<Value*>(input_values)) first removes the const attribute of the input and then converts it to const OrtValue**.
- output_values receives the same treatment with reinterpret_cast<OrtValue**>, which yields ort_input_values and ort_output_values.
- Finally the values (of type OrtValue**) are returned and wrapped back into the C++ types.
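A compact sketch of the resulting calling convention (the variable names match the main code earlier in this article): Ort::Value is move-only, so inputs are pushed with std::move, and the name pointers must stay valid for the duration of the call (see the pitfall in 2.3):
std::vector<Ort::Value> ort_inputs;
ort_inputs.push_back(std::move(input_tensor)); // rvalue: an Ort::Value cannot be copied
std::vector<Ort::Value> outputs = session.Run(
    Ort::RunOptions{ nullptr },
    input_node_names.data(),  ort_inputs.data(),  input_node_names.size(),
    output_node_names.data(), output_node_names.size());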
4 Display of inference results with ONNX Runtime (C++)
Tip: deploying the ONNX Runtime inference framework with Qt (MinGW):
the prebuilt ONNX Runtime is compiled with MSVC and cannot be used with the MinGW compiler; you will see errors like the following:
error: unknown type name '_Frees_ptr_opt_'
error: '_Frees_ptr_opt_' has not been declared