HybridNets, a multi-task framework for automated driving: deploying one model for vehicle detection, drivable area segmentation, and lane line segmentation (C++/Python)

1. Multi-task perception

The perception system of a mobile robot, whether a self-driving car or a drone, relies on a variety of sensors to gather the information needed for environmental perception and object detection. Typical sensors include cameras, lidar, radar, inertial measurement units (IMU), and global navigation satellite systems (GNSS). In autonomous driving and mobile robotics, the camera is mainly used to detect surrounding objects; the associated perception tasks are object detection and segmentation.
Object detection: detecting objects in the environment, such as cars, pedestrians, cyclists, and traffic lights, from sensor data. The YOLO family of networks is a common choice for real-time detection; these networks identify multiple objects per frame and return their locations as bounding boxes.

Instance segmentation: a more demanding task that not only detects objects but also produces a pixel-level mask for each individual instance, which is useful for tracking and distinguishing individual vehicles or pedestrians. Networks such as Mask R-CNN are typically used for instance segmentation.

Semantic segmentation: assigning every pixel in an image to a semantic category, for example separating roads, buildings, and vehicles into different regions. CNN architectures such as U-Net and fully convolutional networks (FCN) are commonly used for semantic segmentation.
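Whatever the architecture, a segmentation head typically produces a per-class score map, and the final label map is obtained with a per-pixel argmax. A minimal numpy sketch of that step (the score tensor and color palette here are illustrative, not taken from HybridNets):

import numpy as np

# Illustrative per-class score map of shape (num_classes, H, W),
# standing in for the raw output of a segmentation head.
num_classes, H, W = 3, 4, 6
rng = np.random.default_rng(0)
scores = rng.standard_normal((num_classes, H, W)).astype(np.float32)

# Per-pixel argmax over the class axis gives the (H, W) label map.
label_map = np.argmax(scores, axis=0)

# Colorize the label map for visualization (the BGR colors are arbitrary).
palette = np.array([[0, 0, 0], [0, 255, 0], [255, 0, 0]], dtype=np.uint8)
color_mask = palette[label_map]
print(label_map.shape, color_mask.shape)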
There are two main approaches to handling these tasks:
Single model: develop one neural network that handles several perception tasks, such as detection and segmentation, at the same time. This improves computational efficiency but requires a large amount of training data and a more complex network structure.
Multi-model ensemble: use several independent models, each focused on one task. This makes it easier to cope with data imbalance between tasks and lets resources be allocated per model according to task importance.

2. The HybridNets multi-task framework

HybridNets is an end-to-end multi-task perception network that introduces several key optimizations to improve accuracy:

  1. An efficient segmentation head and a box/class prediction network built on a weighted bidirectional feature network (BiFPN).
  2. An automatic, customized anchor scheme for each level of the weighted bidirectional feature network.
  3. An efficient training loss function and training strategy that balance and optimize performance across tasks.

Building on these optimizations, HybridNets is an end-to-end perception network that performs traffic object detection, drivable area segmentation, and lane line detection in a single forward pass. On the Berkeley DeepDrive (BDD100K) dataset it reaches 77.3% mean average precision for detection and 31.6% mean intersection over union for segmentation, with only 12.83 million parameters and 15.6 billion floating-point operations, and it runs in real time, making it a practical and accurate multi-task solution.
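The detection head predicts offsets relative to a set of anchor boxes rather than absolute coordinates. A small numpy sketch of the decoding step, mirroring the post-processing used in the C++ and Python deployments below (the random arrays stand in for real network outputs; the cx, cy, w, h anchor layout matches the anchor files used later):

import numpy as np

# Illustrative shapes: one image, N anchors, 4 regression values per anchor.
N = 5
anchors = np.abs(np.random.randn(1, N, 4)).astype(np.float32)          # cx, cy, w, h
box_regression = (np.random.randn(1, N, 4) * 0.1).astype(np.float32)   # dy, dx, dh, dw

# Decode: center offsets scale with the anchor size, width/height are predicted in log space.
x_centers = box_regression[..., 1] * anchors[..., 2] + anchors[..., 0]
y_centers = box_regression[..., 0] * anchors[..., 3] + anchors[..., 1]
w = np.exp(box_regression[..., 3]) * anchors[..., 2]
h = np.exp(box_regression[..., 2]) * anchors[..., 3]

# Convert to top-left corner plus size, the format expected by cv2.dnn.NMSBoxes.
xmin = x_centers - w * 0.5
ymin = y_centers - h * 0.5
boxes = np.stack([xmin, ymin, w, h], axis=2)
print(boxes.shape)  # (1, N, 4)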
Algorithm source code: https://github.com/datvuthanh/HybridNets.git

3. Use C++ for model deployment

Convert the trained model to the ONNX format, then run inference with the dnn module of OpenCV 4.6 (built together with the contrib modules). The IDE used here is Visual Studio 2019.
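The export step is typically done with torch.onnx.export on the trained PyTorch model. The sketch below only illustrates the export call; the stand-in module, opset version, and input name are assumptions, and in practice the model would be built and loaded with the tools from the HybridNets repository above.

import torch
import torch.nn as nn

# Stand-in module used only to demonstrate the export mechanics; in practice this
# would be the trained HybridNets model loaded from the repository.
class DummyNet(nn.Module):
    def forward(self, x):
        return x.mean(dim=1, keepdim=True)

model = DummyNet().eval()

# Dummy input matching the deployment resolution used below (height 768, width 1280).
dummy = torch.randn(1, 3, 768, 1280)

# Export to ONNX so the file can be read by cv2.dnn.readNet.
# The opset version and input name are illustrative choices.
torch.onnx.export(model, dummy, "hybridnets_768x1280.onnx",
                  opset_version=11, input_names=["input"])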
With the ONNX file in place, first define a class that wraps model inference. For the convenience of this demonstration, only the CPU is used for inference.

#include <cstdio>
#include <string>
#include <vector>
#include <opencv2/opencv.hpp>

struct Net_config
{
	float confThreshold;    // class confidence threshold
	float nmsThreshold;     // NMS IoU threshold
	std::string modelpath;  // path to the ONNX model
	std::string anchorpath; // path to the binary anchor file
};

class HybridNets
{
public:
	HybridNets(Net_config config);
	cv::Mat detect(cv::Mat frame);
	~HybridNets();
private:
	int inpWidth;
	int inpHeight;
	std::vector<std::string> det_class_names = { "car" };
	std::vector<std::string> seg_class_names = { "Background", "Lane", "Line" };
	int det_num_class;
	int seg_numclass;

	float confThreshold;
	float nmsThreshold;
	cv::dnn::Net net;
	float* anchors = nullptr;
	const float mean_[3] = { 0.485, 0.456, 0.406 };
	const float std_[3] = { 0.229, 0.224, 0.225 };
	const bool keep_ratio = true;
	cv::Mat resize_image(cv::Mat srcimg, int* newh, int* neww, int* padh, int* padw);
	cv::Mat normalize_(cv::Mat img);
	std::vector<cv::Vec3b> class_colors = { cv::Vec3b(0, 0, 0), cv::Vec3b(0, 255, 0), cv::Vec3b(255, 0, 0) };
};

HybridNets::HybridNets(Net_config config)
{
	this->confThreshold = config.confThreshold;
	this->nmsThreshold = config.nmsThreshold;

	this->net = cv::dnn::readNet(config.modelpath);
	this->det_num_class = det_class_names.size();
	this->seg_numclass = seg_class_names.size();

	// Parse the input height and width from the model file name, e.g. "hybridnets_768x1280.onnx".
	size_t pos = config.modelpath.rfind("_");
	size_t pos_ = config.modelpath.rfind(".");
	int len = pos_ - pos - 1;
	std::string hxw = config.modelpath.substr(pos + 1, len);

	pos = hxw.rfind("x");
	std::string h = hxw.substr(0, pos);
	len = hxw.length() - pos;
	std::string w = hxw.substr(pos + 1, len);
	this->inpHeight = stoi(h);
	this->inpWidth = stoi(w);

	// The anchor file name encodes the number of floats it contains, e.g. "anchors_736560.bin".
	pos = config.anchorpath.rfind("_");
	pos_ = config.anchorpath.rfind(".");
	len = pos_ - pos - 1;
	std::string len_ = config.anchorpath.substr(pos + 1, len);
	len = stoi(len_);
	this->anchors = new float[len];
	FILE* fp = fopen(config.anchorpath.c_str(), "rb");
	fread(this->anchors, sizeof(float), len, fp);
	fclose(fp);
}

HybridNets::~HybridNets()
{
	delete[] anchors;
	anchors = nullptr;
}

cv::Mat HybridNets::resize_image(cv::Mat srcimg, int* newh, int* neww, int* padh, int* padw)
{
	// Letterbox resize: keep the aspect ratio and pad the remaining border with 114.
	int srch = srcimg.rows, srcw = srcimg.cols;
	*newh = this->inpHeight;
	*neww = this->inpWidth;
	cv::Mat dstimg;
	if (this->keep_ratio && srch != srcw) {
		float hw_scale = (float)srch / srcw;
		if (hw_scale > 1) {
			*newh = this->inpHeight;
			*neww = int(this->inpWidth / hw_scale);
			resize(srcimg, dstimg, cv::Size(*neww, *newh), 0, 0, cv::INTER_LINEAR);
			*padw = int((this->inpWidth - *neww) * 0.5);
			copyMakeBorder(dstimg, dstimg, 0, 0, *padw, this->inpWidth - *neww - *padw, cv::BORDER_CONSTANT, cv::Scalar(114, 114, 114));
		}
		else {
			*newh = int(this->inpHeight * hw_scale);
			*neww = this->inpWidth;
			resize(srcimg, dstimg, cv::Size(*neww, *newh), 0, 0, cv::INTER_LINEAR);
			*padh = int((this->inpHeight - *newh) * 0.5);
			copyMakeBorder(dstimg, dstimg, *padh, this->inpHeight - *newh - *padh, 0, 0, cv::BORDER_CONSTANT, cv::Scalar(114, 114, 114));
		}
	}
	else {
		resize(srcimg, dstimg, cv::Size(*neww, *newh), 0, 0, cv::INTER_LINEAR);
	}
	return dstimg;
}

cv::Mat HybridNets::normalize_(cv::Mat img)
{
	// Scale to [0, 1], then normalize per channel with the ImageNet mean and std.
	std::vector<cv::Mat> rgbChannels(3);
	split(img, rgbChannels);
	for (int c = 0; c < 3; c++)
	{
		rgbChannels[c].convertTo(rgbChannels[c], CV_32FC1, 1.0 / (255.0 * std_[c]), (0.0 - mean_[c]) / std_[c]);
	}
	cv::Mat m_normalized_mat;
	merge(rgbChannels, m_normalized_mat);
	return m_normalized_mat;
}

cv::Mat HybridNets::detect(cv::Mat srcimg)
{
	// Preprocess: BGR -> RGB, letterbox resize, normalize, then build the blob.
	int newh = 0, neww = 0, padh = 0, padw = 0;
	cv::Mat rgbimg;
	cvtColor(srcimg, rgbimg, cv::COLOR_BGR2RGB);
	cv::Mat dstimg = this->resize_image(rgbimg, &newh, &neww, &padh, &padw);
	cv::Mat normalized_mat = this->normalize_(dstimg);

	cv::Mat blob = cv::dnn::blobFromImage(normalized_mat);
	this->net.setInput(blob);
	std::vector<cv::Mat> outs;
	this->net.forward(outs, this->net.getUnconnectedOutLayersNames());

	// Network outputs: class scores, box regressions, and the segmentation map.
	float* classification = (float*)outs[0].data;
	float* box_regression = (float*)outs[1].data;
	float* seg = (float*)outs[2].data;

	std::vector<cv::Rect> boxes;
	std::vector<float> confidences;
	std::vector<int> classIds;
	float ratioh = (float)srcimg.rows / newh, ratiow = (float)srcimg.cols / neww;
	const int num_proposal = outs[1].size[1];
	for (int n = 0; n < num_proposal; n++)
	{
		float conf = classification[n];
		if (conf > this->confThreshold)
		{
			// Decode the box regression against the matching anchor (cx, cy, w, h).
			const int row_ind = n * 4;
			float x_centers = box_regression[row_ind + 1] * this->anchors[row_ind + 2] + this->anchors[row_ind];
			float y_centers = box_regression[row_ind] * this->anchors[row_ind + 3] + this->anchors[row_ind + 1];
			float w = exp(box_regression[row_ind + 3]) * this->anchors[row_ind + 2];
			float h = exp(box_regression[row_ind + 2]) * this->anchors[row_ind + 3];

			// Map the box from the padded network input back to the original image.
			float xmin = (x_centers - w * 0.5 - padw) * ratiow;
			float ymin = (y_centers - h * 0.5 - padh) * ratioh;
			w *= ratiow;
			h *= ratioh;
			cv::Rect box = cv::Rect(int(xmin), int(ymin), int(w), int(h));
			boxes.push_back(box);
			confidences.push_back(conf);
			classIds.push_back(0);
		}
	}

	// Non-maximum suppression, then draw the surviving detections.
	std::vector<int> indices;
	cv::dnn::NMSBoxes(boxes, confidences, this->confThreshold, this->nmsThreshold, indices);

	cv::Mat outimg = srcimg.clone();
	for (size_t i = 0; i < indices.size(); ++i)
	{
		int idx = indices[i];
		cv::Rect box = boxes[idx];
		rectangle(outimg, cv::Point(box.x, box.y), cv::Point(box.x + box.width, box.y + box.height), cv::Scalar(0, 0, 255), 2);
		std::string label = cv::format("%.2f", confidences[idx]);
		label = this->det_class_names[classIds[idx]] + ":" + label;
		putText(outimg, label, cv::Point(box.x, box.y - 5), cv::FONT_HERSHEY_SIMPLEX, 0.75, cv::Scalar(0, 0, 255), 1);
	}

	// Per-pixel argmax over the segmentation classes, sampled at the network resolution.
	const int area = this->inpHeight * this->inpWidth;
	int i = 0, j = 0, c = 0;
	for (i = 0; i < outimg.rows; i++)
	{
		for (j = 0; j < outimg.cols; j++)
		{
			const int x = int(j / ratiow) + padw;
			const int y = int(i / ratioh) + padh;
			int max_id = -1;
			float max_conf = -10000;
			for (c = 0; c < seg_numclass; c++)
			{
				float seg_conf = seg[c * area + y * this->inpWidth + x];
				if (seg_conf > max_conf)
				{
					max_id = c;
					max_conf = seg_conf;
				}
			}
			// Color the non-background segmentation classes using class_colors.
			if (max_id > 0)
			{
				outimg.at<cv::Vec3b>(i, j)[0] = this->class_colors[max_id][0];
				outimg.at<cv::Vec3b>(i, j)[1] = this->class_colors[max_id][1];
				outimg.at<cv::Vec3b>(i, j)[2] = this->class_colors[max_id][2];
			}
		}
	}
	return outimg;
}
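The constructor above reads the anchors from a raw binary file whose float count is encoded in its name (anchors_736560.bin in main() below). One plausible way to produce such a file from the .npy anchor file used in section 4 is sketched here; the flat float32 cx, cy, w, h layout is an assumption based on how both deployments index the anchors:

import numpy as np

# Load the anchors used by the Python deployment (assumed shape (1, N, 4), cx_cy_w_h).
anchors = np.load("models/anchors_768x1280.npy").astype(np.float32)

# Flatten to a raw float32 buffer and encode the element count in the file name,
# which is what the C++ constructor parses (e.g. 184140 anchors * 4 = 736560 floats).
flat = anchors.ravel()
flat.tofile("models/anchors_%d.bin" % flat.size)
print(flat.size)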

Read a video and run the model on each frame:

int main()
{
	Net_config cfg = { 0.3, 0.5, "models/hybridnets_768x1280.onnx", "models/anchors_736560.bin" };
	HybridNets net(cfg);

	//cv::VideoWriter outputVideo;
	cv::VideoCapture cap("test2.mp4");

	//cv::Size S = cv::Size((int)cap.get(cv::CAP_PROP_FRAME_WIDTH),
	//	(int)cap.get(cv::CAP_PROP_FRAME_HEIGHT));
	//std::string out_path = "dst.mp4";
	if (cap.isOpened())
	{
		//outputVideo.open(out_path, -1, 30.0, S, true);
		cv::Mat cv_fram;
		while (1)
		{
			cap.read(cv_fram);
			if (!cv_fram.empty())
			{
				cv::Mat outimg = net.detect(cv_fram);
				//outputVideo << outimg;
				cv::imshow("video", outimg);
			}
			// Press Esc to quit.
			if (cv::waitKey(100) == 27) break;
		}
	}

	cap.release();
	return 0;
}

The inference result looks like this:

4. Use Python for model deployment

import cv2
import argparse
import numpy as np
import os

print(cv2.__version__)

class HybridNets():
    def __init__(self, modelpath, anchorpath, confThreshold=0.5, nmsThreshold=0.5):
        self.det_classes = ["car"]
        self.seg_classes = ["Background", "Lane", "Line"]

        self.net = cv2.dnn.readNet(modelpath)
        self.confThreshold = confThreshold
        self.nmsThreshold = nmsThreshold
        
        h, w = os.path.basename(modelpath).split('_')[-1].replace('.onnx', '').split('x')
        self.inpHeight, self.inpWidth = int(h), int(w)
        self.mean_ = np.array([0.485, 0.456, 0.406], dtype=np.float32).reshape((1, 1, 3))
        self.std_ = np.array([0.229, 0.224, 0.225], dtype=np.float32).reshape((1, 1, 3))
        self.anchors = np.load(anchorpath)  ### cx_cy_w_h
    
    def resize_image(self, srcimg, keep_ratio=True):
        padh, padw, newh, neww = 0, 0, self.inpHeight, self.inpWidth
        if keep_ratio and srcimg.shape[0] != srcimg.shape[1]:
            hw_scale = srcimg.shape[0] / srcimg.shape[1]
            if hw_scale > 1:
                newh, neww = self.inpHeight, int(self.inpWidth / hw_scale)
                img = cv2.resize(srcimg, (neww, newh), interpolation=cv2.INTER_LINEAR)
                padw = int((self.inpWidth - neww) * 0.5)
                img = cv2.copyMakeBorder(img, 0, 0, padw, self.inpWidth - neww - padw, cv2.BORDER_CONSTANT,
                                         value=(114, 114, 114))  # add border
            else:
                newh, neww = int(self.inpHeight * hw_scale), self.inpWidth
                img = cv2.resize(srcimg, (neww, newh), interpolation=cv2.INTER_LINEAR)
                padh = int((self.inpHeight - newh) * 0.5)
                img = cv2.copyMakeBorder(img, padh, self.inpHeight - newh - padh, 0, 0, cv2.BORDER_CONSTANT,
                                         value=(114, 114, 114))
        else:
            img = cv2.resize(srcimg, (self.inpWidth, self.inpHeight), interpolation=cv2.INTER_LINEAR)
        return img, newh, neww, padh, padw
    
    def detect(self, srcimg):
        img, newh, neww, padh, padw = self.resize_image(cv2.cvtColor(srcimg, cv2.COLOR_BGR2RGB))
        scale_h, scale_w = srcimg.shape[0] / newh, srcimg.shape[1] / neww
        img = (img.astype(np.float32) / 255.0 - self.mean_) / self.std_
        # Sets the input to the network
        blob = cv2.dnn.blobFromImage(img)
        self.net.setInput(blob)
        
        classification, box_regression, seg = self.net.forward(self.net.getUnconnectedOutLayersNames())

        x_centers = box_regression[..., 1] * self.anchors[..., 2] + self.anchors[..., 0]
        y_centers = box_regression[..., 0] * self.anchors[..., 3] + self.anchors[..., 1]
        w = np.exp(box_regression[..., 3]) * self.anchors[..., 2]
        h = np.exp(box_regression[..., 2]) * self.anchors[..., 3]

        xmin = x_centers - w * 0.5
        ymin = y_centers - h * 0.5
        
        bboxes_wh = np.stack([xmin, ymin, w, h], axis=2).squeeze(axis=0)
        
        confidences = np.max(classification.squeeze(axis=0), axis=1)  ####max_class_confidence
        classIds = np.argmax(classification.squeeze(axis=0), axis=1)
        mask = confidences > self.confThreshold
        bboxes_wh = bboxes_wh[mask]
        confidences = confidences[mask]
        classIds = classIds[mask]
        
        bboxes_wh -= np.array([[padw, padh, 0, 0]])  
        bboxes_wh *= np.array([[scale_w, scale_h, scale_w, scale_h]])
        
        indices = cv2.dnn.NMSBoxes(bboxes_wh.tolist(), confidences.tolist(), self.confThreshold,
                                   self.nmsThreshold).flatten().tolist()
        
        drive_area_mask = np.squeeze(seg, axis=0)[:, padh:(self.inpHeight - padh), padw:(self.inpWidth - padw)]
        seg_id = np.argmax(drive_area_mask, axis=0).astype(np.uint8)
        seg_id = cv2.resize(seg_id, (srcimg.shape[1], srcimg.shape[0]), interpolation=cv2.INTER_NEAREST)
     
        outimg = srcimg.copy()
        for ind in indices:
            x, y, w, h = bboxes_wh[ind,:].astype(int)
            cv2.rectangle(outimg, (x, y), (x + w, y + h), (0, 0, 255), thickness=2, lineType=cv2.LINE_AA)
            cv2.putText(outimg, self.det_classes[classIds[ind]]+ ":" + str(round(confidences[ind], 2)), (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.75, (0, 0, 255),
                        thickness=1, lineType=cv2.LINE_AA)

        outimg[seg_id == 1] = [0, 255, 0]
        outimg[seg_id == 2] = [255, 0, 0]
        return outimg


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('--imgpath', type=str, default='images/test.jpg', help="image path")
    parser.add_argument('--modelpath', type=str, default='models/hybridnets_768x1280.onnx')
    parser.add_argument('--anchorpath', type=str, default='models/anchors_768x1280.npy')
    parser.add_argument('--confThreshold', default=0.3, type=float, help='class confidence')
    parser.add_argument('--nmsThreshold', default=0.5, type=float, help='nms iou thresh')
    args = parser.parse_args()
    
    yolonet = HybridNets(args.modelpath, args.anchorpath, confThreshold=args.confThreshold,
                         nmsThreshold=args.nmsThreshold)
    srcimg = cv2.imread(args.imgpath)
    srcimg = yolonet.detect(srcimg)
    
    cv2.namedWindow('dst', 0)
    cv2.imshow('dst', srcimg)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
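The script above processes a single image. A minimal sketch for running the same class on a video, mirroring the C++ main() (assumes it runs in the same file or module as the HybridNets class above; the video path is illustrative):

import cv2

net = HybridNets("models/hybridnets_768x1280.onnx", "models/anchors_768x1280.npy",
                 confThreshold=0.3, nmsThreshold=0.5)

cap = cv2.VideoCapture("test2.mp4")  # illustrative video path
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    outimg = net.detect(frame)
    cv2.imshow("video", outimg)
    if cv2.waitKey(1) == 27:  # press Esc to quit
        break
cap.release()
cv2.destroyAllWindows()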



Original article: blog.csdn.net/matt45m/article/details/132797336