YOLOv3 model loading and prediction in libtorch (C++, YOLOv3, libtorch)

A note up front:

Huang Ningran, your Party A's project is not an easy one.

Source of the problem:

YOLOv3 had previously been implemented based on WIN10 + VS2015 + OpenCV 3.4.12
(https://download.csdn.net/download/xiaohuolong1827/34664248).
When that implementation was used in the project, it turned out not to support CUDA: CUDA support requires OpenCV 4 or above, as noted in dnn.hpp (//OpenCV 4.x: DNN_BACKEND_CUDA), yet all of Party A's projects use OpenCV 3.4.12.
One solution is therefore to implement YOLOv3 based on libtorch.

References:

[1] Luo Bin, Training a YOLOv3 model in PyTorch and loading and inferring it with libtorch in C++, https://zhuanlan.zhihu.com/p/246156517?utm_source=qq
[2] L2_Zhang, Calling PyTorch from C++ with libtorch (YOLOv3 in practice), https://blog.csdn.net/WANGWUSHAN/article/details/118968060

1. Download the PyTorch version of YOLOv3

Download URL: https://github.com/eriklindernoren/PyTorch-YOLOv3

2. Producing the pt file in Python

2.1 Creating the pt file

Open detect.py in the project and set the paths of the model's parameter files (trained in advance). For example:

args.model = '../lxxz_yolo_test/yolov3_1classes.cfg'
args.weights = '../lxxz_yolo_test/trained_weights_final_202204252156.weights'
args.classes = '../lxxz_yolo_test/lxxz_classes.txt'
args.n_cpu = 1
args.conf_thres = 0.5
args.nms_thres = 0.5
args.images = '../lxxz_image/'

Here, rewrite the detect_directory function:

import os
import cv2

def detect_directory2(model_path, weights_path, img_path, classes, output_path,
                      batch_size=8, img_size=416, n_cpu=8, conf_thres=0.5, nms_thres=0.5):
    files_list = os.listdir(img_path)
    model = load_model(model_path, weights_path)
    for f in files_list:
        img = cv2.imread(os.path.join(img_path, f), cv2.IMREAD_COLOR)
        boxes = detect_image(model, img, img_size=img_size,
                             conf_thres=conf_thres, nms_thres=nms_thres)
        draw_img = draw_boxes(img, boxes)  # draw the detections onto the image
        cv2.imwrite(os.path.join(output_path, f), draw_img)  # save, so the message below is accurate
        print(boxes)
    print(f"---- Detections were saved to: '{output_path}' ----")

In the main program, the original call to detect_directory is replaced with this call to detect_directory2. The real work happens in the subfunction detect_image: there, after the network prediction, add statements to save the network as a pt file, or to load a pt file back, as needed:

with torch.no_grad():
    detections = model(input_img)
    detections = non_max_suppression(detections, conf_thres, nms_thres)
    detections = rescale_boxes(detections[0], img_size, image.shape[:2])
# Save the network as a pt file, as needed
traced_model = torch.jit.trace(model, input_img, check_trace=False)
traced_model.save("yolo_temp.pt")
test_out = traced_model(input_img)
# As needed, load the pt file back and predict with it
model2 = torch.jit.load("yolo_temp.pt")
output2 = model2.forward(input_img)
output2 = non_max_suppression(output2, conf_thres, nms_thres)
output2 = rescale_boxes(output2[0], img_size, image.shape[:2])
print(output2[0].equal(detections[0]))

If the trace is faithful, the final print outputs True: the loaded pt model reproduces the original model's boxes for this image.

2.2 Points to note

(1) Errors during tracing

If torch.jit.trace raises an error, set the parameter check_trace=False.
In the _make_grid function in models.py, the call to torch.meshgrid,
yv, xv = torch.meshgrid([torch.arange(ny), torch.arange(nx)], indexing='ij')
may fail with an error about the indexing parameter; if so, delete the indexing parameter.
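
That is, on older PyTorch versions the line becomes (a sketch; older torch.meshgrid has no indexing keyword and already behaves as indexing='ij'):

yv, xv = torch.meshgrid([torch.arange(ny), torch.arange(nx)])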

(2) GPU or CPU

When trace generates the pt file, if the model and input_img are on 'cuda', the generated pt file is the GPU version; otherwise it is the CPU version.
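
A minimal sketch of tracing the GPU version (assuming a CUDA device is available; the file name is a placeholder):

device = torch.device('cuda')  # torch.device('cpu') for the CPU version
model = model.to(device)
input_img = input_img.to(device)
traced_model = torch.jit.trace(model, input_img, check_trace=False)
traced_model.save("yolo_temp_gpu.pt")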

(3) Do not add shape-unstable code to the model when generating the pt file

In Python, torch.jit.trace is used to generate the pt file:

traced_model = torch.jit.trace(model, input_img, check_trace=False)
traced_model.save("yolo_temp.pt")

Do not, to save trouble on the C++ side, add the non-maximum-suppression code into the model here. The reason: when generating a pt file with torch.jit.trace, the model's output must have a fixed shape; for YOLOv3 the output is 10647 x (4 + 1 + number of classes). With NMS added, the output shape becomes data-dependent: after NMS, one image may yield 2 boxes while another yields 4. The direct failure mode is this: trace and save the pt file using picture A, then load the pt model; prediction on picture A is normal, but prediction on other pictures fails.
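
A quick way to confirm the traced output is shape-stable is to run the traced model on two different images and compare shapes (a sketch; input_img_a and input_img_b are hypothetical inputs, and for a one-class model the expected shape is 1 x 10647 x 6):

out_a = traced_model(input_img_a)
out_b = traced_model(input_img_b)
print(out_a.shape, out_b.shape)  # both should print torch.Size([1, 10647, 6])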

3. Prediction in libtorch

A libtorch project was set up in VS2017 in advance, verifying that the project can use torch normally. Refer to:
https://blog.csdn.net/xiaohuolong1827/article/details/121428648

3.1 Import model

Import the model in the main program:

torch::jit::script::Module module = torch::jit::load("D:\\xxx_gpu.pt");
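
torch::jit::load throws c10::Error when it fails (missing file, incompatible build), so a guarded version can look like this sketch:

torch::jit::script::Module module;
try
{
	module = torch::jit::load("D:\\xxx_gpu.pt");
}
catch (const c10::Error& e)
{
	cout << "failed to load pt file: " << e.what() << endl;
	return -1;
}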

3.2 Read the image

Read the image in the main program and normalize its format:

	Mat imgSrc = imread("D:\\60_1.265_89.74.tif", -1); // -1 = IMREAD_UNCHANGED: keep the original bit depth
	Mat img = imgSrc.clone();
	if (img.depth() == CV_16U)
	{
		img.convertTo(img, CV_32F, 1.0 / 65535); // 16-bit image: scale to [0, 1]
	}
	else
	{
		img.convertTo(img, CV_32F, 1.0 / 255); // 8-bit image: scale to [0, 1]
	}
	cv::resize(img, img, Size(416, 416)); // YOLOv3 input size
	if (img.channels() == 1)
	{
		cv::cvtColor(img, img, cv::COLOR_GRAY2BGR); // the network expects 3 channels
	}
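
Since imread returns an empty Mat on failure rather than throwing, a guard right after the read is worthwhile (a sketch):

	if (imgSrc.empty())
	{
		cout << "failed to read image" << endl;
		return -1;
	}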

3.3 Network Prediction

Write a prediction function

int torch_model_predict(void* pmodule, Mat img_input, Mat* img_output, int cuda_flag)
{
	Mat img = img_input.clone();
	if (pmodule == 0)
	{
		return 1;
	}
	try
	{
		torch::DeviceType device_type = (cuda_flag) ? at::kCUDA : at::kCPU;
		torch::jit::script::Module *module = (torch::jit::script::Module *)pmodule;
		module->to(device_type);
		module->eval();
		// Build the input tensor
		Mat data_src;
		img.convertTo(data_src, CV_32F, 1.0); // whatever type img has, convert to float first
		torch::Tensor tensor_image = torch::from_blob(data_src.data,
			{ 1, img.rows, img.cols, img.channels() }, torch::kFloat);
		tensor_image = tensor_image.permute({ 0, 3, 1, 2 }); // move channels forward: NHWC -> NCHW
		tensor_image = tensor_image.to(device_type);
		// Run the network
		at::Tensor outputs = module->forward({ tensor_image }).toTensor();
		// Extract the prediction result
		int size_arr[10] = { 0 }; // accept at most 10 dimensions
		int n = outputs.dim();
		if (n > 10)
		{
			return 2;
		}
		for (int i = 0; i < n; i++)
		{
			size_arr[i] = outputs.size(i);
		}
		outputs = outputs.to(at::kCPU);
		Mat outimg(n, size_arr, CV_32F, outputs.data_ptr());
		*img_output = outimg.clone();
	}
	catch (...)
	{
		return 3;
	}
	return 0;
}

Called in the main program:

	Mat img_predict2;
	int reu = 0;
	reu = torch_model_predict((void*)&module, img, &img_predict2, 1); // module was loaded by value in 3.1, so pass its address
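
The return value maps onto the failure branches of torch_model_predict, so the caller can check it (a sketch):

	if (reu != 0)
	{
		// 1: null module pointer, 2: more than 10 output dimensions, 3: forward() threw
		cout << "torch_model_predict failed with code " << reu << endl;
		return -1;
	}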

3.4 Post-processing of network output

From the Python side we already know the network's output dimension is 1*10647*n, where n = 5 + number of classes.

	int size_arr[10];
	for (int i = 0; i < img_predict2.dims; i++)
	{
		cout << img_predict2.size[i] << " ";
		size_arr[i] = img_predict2.size[i];
	}
	Mat boxes(Size(size_arr[2], size_arr[1]), CV_32F, img_predict2.data); // view the 1*10647*n output as a 10647-row, n-column matrix

This yields the matrix of predicted boxes.

3.5 Thresholding and non-maximum suppression

The boxes are thresholded and NMS is applied. This part can reuse the corresponding code from the OpenCV-based YOLOv3 implementation, but the box coordinates must first be normalized. Each box has the format (cx, cy, w, h, s, c0~cn):

	for (int i = 0; i < boxes.rows; i++)
	{
		boxes.at<float>(i, 0) /= img_size;
		boxes.at<float>(i, 1) /= img_size;
		boxes.at<float>(i, 2) /= img_size;
		boxes.at<float>(i, 3) /= img_size;
	}

Here img_size is 416.
Then convert the boxes matrix into a vector<Mat> and call the postprocess function from the OpenCV-based YOLOv3 project commonly found online
(https://download.csdn.net/download/xiaohuolong1827/34664248).
The code is pasted here:

void postprocess(Mat& frame, const vector<Mat>& outs)
{
	// Output classes
	vector<int> classIds;
	// Confidences
	vector<float> confidences;
	vector<Rect> boxes;

	// Iterate over all output layers
	for (size_t i = 0; i < outs.size(); ++i)
	{
		// Scan through all the bounding boxes output from the network and keep only the
		// ones with high confidence scores. Assign the box's class label as the class
		// with the highest score for the box.
		float* data = (float*)outs[i].data;
		for (int j = 0; j < outs[i].rows; ++j, data += outs[i].cols)
		{
			Mat scores = outs[i].row(j).colRange(5, outs[i].cols);
			Point classIdPoint;
			double confidence;
			// Get the value and location of the maximum score
			minMaxLoc(scores, 0, &confidence, 0, &classIdPoint);
			// Keep the box if its score exceeds the confidence threshold
			if (confidence > confThreshold)
			{
				// Recover pixel coordinates from the normalized box
				int centerX = (int)(data[0] * frame.cols);
				int centerY = (int)(data[1] * frame.rows);
				int width = (int)(data[2] * frame.cols);
				int height = (int)(data[3] * frame.rows);
				int left = centerX - width / 2;
				int top = centerY - height / 2;

				classIds.push_back(classIdPoint.x);
				confidences.push_back((float)confidence);
				boxes.push_back(Rect(left, top, width, height));
			}
		}
	}

	// Perform non maximum suppression to eliminate redundant overlapping boxes with
	// lower confidences
	vector<int> indices;
	NMSBoxes(boxes, confidences, confThreshold, nmsThreshold, indices);
	// Draw the surviving boxes, in descending order of confidence
	for (size_t i = 0; i < indices.size(); ++i)
	{
		int idx = indices[i];
		Rect box = boxes[idx];
		// Class and confidence
		drawPred(classIds[idx], confidences[idx], box.x, box.y,
			box.x + box.width, box.y + box.height, frame);
	}
}
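
Putting the pieces together, the hand-off from section 3.5 is then just (a sketch; confThreshold and nmsThreshold are assumed to be globals defined alongside postprocess, as in the referenced project):

	vector<Mat> outs;
	outs.push_back(boxes);     // normalized boxes from section 3.5
	postprocess(imgSrc, outs); // thresholds, runs NMS, draws the surviving boxes on imgSrc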

3.6 Points to note

(1) Dimension order

When building the network's input tensor with torch::from_blob, pay attention to the order of the dimensions:
{ 1, img.rows, img.cols, img.channels() }, i.e. batch, rows, columns, channels. For a 416x416 BGR image this gives a {1, 416, 416, 3} tensor, which permute({ 0, 3, 1, 2 }) then reorders into the {1, 3, 416, 416} layout the network expects.

(2) GPU or CPU needs strict correspondence

When YOLOv3 predicts in libtorch, whether the GPU is used must match how the pt file was saved: if the model was on the CPU when the pt file was saved in Python, the model must also be placed on the CPU when libtorch predicts; if the model was on the GPU when the pt file was saved, it must be placed on the GPU when predicting in libtorch.
The model is placed on the GPU or CPU with model.to('cpu') or model.to('cuda'), and the image tensor should be moved the same way.
This differs from UNet: when UNet predicts in libtorch, it only matters that the model and the image tensor are on the same device (CPU or CUDA); which device the model was on when the pt file was generated does not matter. The reason was not explored.

4. Other

The C++ project is built with VS2017 and can be packaged as a DLL project producing a DLL file, which can then be called from VS2013 and VS2015 (some clients, such as Huang Jia's, still use VS2013, which cannot use libtorch directly; so the libtorch-related calls can first be wrapped into a DLL with VS2017 and then called from VS2013).
Admittedly, this is yet another quick, low-content write-up.
