1. Model conversion
1. ONNX Runtime
ONNX Runtime (ORT) is an open-source, high-performance inference engine for deploying and running machine learning models. It is designed to optimize the execution of models defined in the Open Neural Network Exchange (ONNX) format, an open standard for representing machine learning models.
ONNX Runtime provides several key features and benefits:
Cross-platform compatibility: ONNX Runtime is designed to work across a variety of operating systems, including Windows and Linux, and across hardware ranging from CPUs and GPUs to accelerators such as FPGAs. This makes it easy to deploy and run models in different environments.
High performance: ONNX Runtime is optimized for low-latency model execution, with hardware-specific optimizations that keep models running efficiently on each platform.
Multi-framework support: ONNX Runtime can be used with models created using different machine learning frameworks, including PyTorch, TensorFlow, and more, thanks to its support for the ONNX format.
Model conversion: the ONNX ecosystem provides exporters and converters (such as PyTorch's built-in ONNX export and tf2onnx) that turn models from supported frameworks into ONNX format, making them easier to use in a variety of deployment scenarios.
Multi-language support: ONNX Runtime is available in multiple programming languages, including C++, C#, Python, etc., making it usable by a wide range of developers.
Custom Operators: It supports custom operators, allowing developers to extend its functionality to support specific operations or hardware acceleration.
ONNX Runtime is widely used in production deployments of machine learning applications, including computer vision and natural language processing. It is actively maintained by the ONNX community and continues to receive updates and improvements.
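As a quick illustration of the API, here is a minimal sketch of loading and running an ONNX model in Python (the file name model.onnx and the 1x3x640x640 input shape are placeholders for illustration):

import numpy as np
import onnxruntime as ort

# Load a model and run one forward pass on dummy data
session = ort.InferenceSession('model.onnx', providers=['CPUExecutionProvider'])
input_name = session.get_inputs()[0].name
dummy_input = np.random.rand(1, 3, 640, 640).astype(np.float32)
outputs = session.run(None, {input_name: dummy_input})
print([o.shape for o in outputs])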
2. .pt models and .onnx models
The .pt model and the .onnx model are two different file formats for representing deep learning models. The main differences between them include:
File format:
.pt model: This is the native model/weight file format of the PyTorch framework, usually saved with a .pt or .pth extension. It contains the model's weight parameters and, depending on how it was saved, the model structure. This format is PyTorch-specific.
.onnx model: This is a model file in ONNX (Open Neural Network Exchange) format, usually saved with a .onnx extension. ONNX is an intermediate representation that is independent of any specific deep learning framework and is used for model conversion and deployment across frameworks.
Framework dependencies:
.pt model: It depends on the PyTorch framework, so the PyTorch library is required to load and run it. This limits its direct use in other frameworks.
.onnx model: The ONNX model is independent of any single deep learning framework and can be loaded and run by different tools that support ONNX, such as ONNX Runtime, TensorFlow, Caffe2, and others.
Cross-platform compatibility:
.pt model: It usually requires a working PyTorch installation on each target platform, which may involve extra configuration and dependency handling.
.onnx model: Because ONNX is framework-independent, it is easier to deploy on different platforms and hardware without worrying about framework dependencies.
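To make the framework-dependency difference concrete, here is a minimal sketch (file names are placeholders): the .pt checkpoint can only be opened through PyTorch, while the exported .onnx file loads in any ONNX-capable runtime.

import torch
import onnxruntime as ort

# Loading a .pt checkpoint requires the PyTorch library
pt_model = torch.load('model.pt', map_location='cpu')

# The equivalent .onnx file needs no deep learning framework at all
onnx_session = ort.InferenceSession('model.onnx')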
3. Converting a YOLOv8 .pt model to ONNX
To use a .pt model in other frameworks or deploy it across platforms, you first need to convert it to ONNX format. The Ultralytics library provides a built-in exporter that converts PyTorch YOLOv8 models to ONNX:
from ultralytics import YOLO
# load model
model = YOLO('yolov8m.pt')
# Export model
success = model.export(format="onnx")
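After export, a yolov8m.onnx file is written next to the .pt file. Before deployment it is worth a quick sanity check; a minimal sketch, assuming the onnx package is installed:

import onnx

# Load the exported model and run the structural validity checker
onnx_model = onnx.load('yolov8m.onnx')
onnx.checker.check_model(onnx_model)
print('ONNX export looks valid')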
2. Model inference
1. Environment setup
Inference with the ONNX model depends only on the onnxruntime library, and image processing depends on OpenCV, so only a handful of lightweight packages need to be installed (numpy for array handling, and gradio for the web demo used below):
pip install onnxruntime
pip install opencv-python
pip install numpy
pip install gradio
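A quick way to verify the environment is to import the packages and list the execution providers ONNX Runtime detects on your machine:

import cv2
import numpy as np
import onnxruntime

print(onnxruntime.__version__)
print(onnxruntime.get_available_providers())  # e.g. ['CPUExecutionProvider']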
2. Deployment code
utils.py
import numpy as np
import cv2

class_names = ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light',
               'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
               'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee',
               'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
               'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
               'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
               'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard',
               'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase',
               'scissors', 'teddy bear', 'hair drier', 'toothbrush']

# Create a list of colors for each class where each color is a tuple of 3 integer values
rng = np.random.default_rng(3)
colors = rng.uniform(0, 255, size=(len(class_names), 3))


def nms(boxes, scores, iou_threshold):
    # Sort by score (descending)
    sorted_indices = np.argsort(scores)[::-1]

    keep_boxes = []
    while sorted_indices.size > 0:
        # Pick the box with the highest remaining score
        box_id = sorted_indices[0]
        keep_boxes.append(box_id)

        # Compute IoU of the picked box with the rest
        ious = compute_iou(boxes[box_id, :], boxes[sorted_indices[1:], :])

        # Remove boxes with IoU over the threshold
        keep_indices = np.where(ious < iou_threshold)[0]
        sorted_indices = sorted_indices[keep_indices + 1]

    return keep_boxes


def multiclass_nms(boxes, scores, class_ids, iou_threshold):
    unique_class_ids = np.unique(class_ids)

    keep_boxes = []
    for class_id in unique_class_ids:
        class_indices = np.where(class_ids == class_id)[0]
        class_boxes = boxes[class_indices, :]
        class_scores = scores[class_indices]

        # Run NMS separately within each class
        class_keep_boxes = nms(class_boxes, class_scores, iou_threshold)
        keep_boxes.extend(class_indices[class_keep_boxes])

    return keep_boxes


def compute_iou(box, boxes):
    # Compute the intersection rectangle coordinates
    xmin = np.maximum(box[0], boxes[:, 0])
    ymin = np.maximum(box[1], boxes[:, 1])
    xmax = np.minimum(box[2], boxes[:, 2])
    ymax = np.minimum(box[3], boxes[:, 3])

    # Compute intersection area
    intersection_area = np.maximum(0, xmax - xmin) * np.maximum(0, ymax - ymin)

    # Compute union area
    box_area = (box[2] - box[0]) * (box[3] - box[1])
    boxes_area = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    union_area = box_area + boxes_area - intersection_area

    # Compute IoU
    iou = intersection_area / union_area

    return iou


def xywh2xyxy(x):
    # Convert bounding box (x, y, w, h) to bounding box (x1, y1, x2, y2)
    y = np.copy(x)
    y[..., 0] = x[..., 0] - x[..., 2] / 2
    y[..., 1] = x[..., 1] - x[..., 3] / 2
    y[..., 2] = x[..., 0] + x[..., 2] / 2
    y[..., 3] = x[..., 1] + x[..., 3] / 2
    return y


def draw_detections(image, boxes, scores, class_ids, mask_alpha=0.3):
    det_img = image.copy()

    img_height, img_width = image.shape[:2]
    font_size = min([img_height, img_width]) * 0.0006
    text_thickness = int(min([img_height, img_width]) * 0.001)

    det_img = draw_masks(det_img, boxes, class_ids, mask_alpha)

    # Draw bounding boxes and labels of detections
    for class_id, box, score in zip(class_ids, boxes, scores):
        color = colors[class_id]

        draw_box(det_img, box, color)

        label = class_names[class_id]
        caption = f'{label} {int(score * 100)}%'
        draw_text(det_img, caption, box, color, font_size, text_thickness)

    return det_img


def detections_dog(image, boxes, scores, class_ids, mask_alpha=0.3):
    # Same as draw_detections, but without the translucent mask overlay
    det_img = image.copy()

    img_height, img_width = image.shape[:2]
    font_size = min([img_height, img_width]) * 0.0006
    text_thickness = int(min([img_height, img_width]) * 0.001)

    # Draw bounding boxes and labels of detections
    for class_id, box, score in zip(class_ids, boxes, scores):
        color = colors[class_id]

        draw_box(det_img, box, color)

        label = class_names[class_id]
        caption = f'{label} {int(score * 100)}%'
        draw_text(det_img, caption, box, color, font_size, text_thickness)

    return det_img


def draw_box(image: np.ndarray, box: np.ndarray, color: tuple[int, int, int] = (0, 0, 255),
             thickness: int = 2) -> np.ndarray:
    x1, y1, x2, y2 = box.astype(int)
    return cv2.rectangle(image, (x1, y1), (x2, y2), color, thickness)


def draw_text(image: np.ndarray, text: str, box: np.ndarray, color: tuple[int, int, int] = (0, 0, 255),
              font_size: float = 0.001, text_thickness: int = 2) -> np.ndarray:
    x1, y1, x2, y2 = box.astype(int)
    (tw, th), _ = cv2.getTextSize(text=text, fontFace=cv2.FONT_HERSHEY_SIMPLEX,
                                  fontScale=font_size, thickness=text_thickness)
    th = int(th * 1.2)

    # Draw a filled background rectangle behind the label
    cv2.rectangle(image, (x1, y1), (x1 + tw, y1 - th), color, -1)

    return cv2.putText(image, text, (x1, y1), cv2.FONT_HERSHEY_SIMPLEX, font_size, (255, 255, 255), text_thickness, cv2.LINE_AA)


def draw_masks(image: np.ndarray, boxes: np.ndarray, classes: np.ndarray, mask_alpha: float = 0.3) -> np.ndarray:
    mask_img = image.copy()

    # Draw a filled rectangle in the mask image for each detection
    for box, class_id in zip(boxes, classes):
        color = colors[class_id]
        x1, y1, x2, y2 = box.astype(int)
        cv2.rectangle(mask_img, (x1, y1), (x2, y2), color, -1)

    return cv2.addWeighted(mask_img, mask_alpha, image, 1 - mask_alpha, 0)
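To see the NMS helpers in action, here is a toy example (the box coordinates are made up for illustration, and the import assumes utils.py lives in a detection package, as in the code below):

import numpy as np
from detection.utils import nms

boxes = np.array([[10, 10, 110, 110],
                  [12, 12, 112, 112],
                  [200, 200, 300, 300]], dtype=np.float32)
scores = np.array([0.9, 0.8, 0.7])

# The second box almost completely overlaps the first, so it is suppressed
print(nms(boxes, scores, iou_threshold=0.5))  # -> [0, 2]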
YOLODet.py
import time
import cv2
import numpy as np
import onnxruntime

from detection.utils import xywh2xyxy, draw_detections, multiclass_nms, detections_dog


class YOLODet:

    def __init__(self, path, conf_thres=0.7, iou_thres=0.5):
        self.conf_threshold = conf_thres
        self.iou_threshold = iou_thres

        # Initialize model
        self.initialize_model(path)

    def __call__(self, image):
        return self.detect_objects(image)

    def initialize_model(self, path):
        self.session = onnxruntime.InferenceSession(path, providers=onnxruntime.get_available_providers())
        # Get model info
        self.get_input_details()
        self.get_output_details()

    def detect_objects(self, image):
        input_tensor = self.prepare_input(image)

        # Perform inference on the image
        outputs = self.inference(input_tensor)

        self.boxes, self.scores, self.class_ids = self.process_output(outputs)
        return self.boxes, self.scores, self.class_ids

    def prepare_input(self, image):
        self.img_height, self.img_width = image.shape[:2]

        input_img = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

        # Resize input image
        input_img = cv2.resize(input_img, (self.input_width, self.input_height))

        # Scale input pixel values to 0 to 1
        input_img = input_img / 255.0
        input_img = input_img.transpose(2, 0, 1)
        input_tensor = input_img[np.newaxis, :, :, :].astype(np.float32)

        return input_tensor

    def inference(self, input_tensor):
        start = time.perf_counter()
        outputs = self.session.run(self.output_names, {self.input_names[0]: input_tensor})
        # print(f"Inference time: {(time.perf_counter() - start)*1000:.2f} ms")
        return outputs

    def process_output(self, output):
        predictions = np.squeeze(output[0]).T

        # Filter out object confidence scores below threshold
        scores = np.max(predictions[:, 4:], axis=1)
        predictions = predictions[scores > self.conf_threshold, :]
        scores = scores[scores > self.conf_threshold]

        if len(scores) == 0:
            return [], [], []

        # Get the class with the highest confidence
        class_ids = np.argmax(predictions[:, 4:], axis=1)

        # Get bounding boxes for each object
        boxes = self.extract_boxes(predictions)

        # Apply non-maximum suppression to remove weak, overlapping bounding boxes
        indices = multiclass_nms(boxes, scores, class_ids, self.iou_threshold)

        return boxes[indices], scores[indices], class_ids[indices]

    def extract_boxes(self, predictions):
        # Extract boxes from predictions
        boxes = predictions[:, :4]

        # Scale boxes to original image dimensions
        boxes = self.rescale_boxes(boxes)

        # Convert boxes to xyxy format
        boxes = xywh2xyxy(boxes)

        return boxes

    def rescale_boxes(self, boxes):
        # Rescale boxes to original image dimensions
        input_shape = np.array([self.input_width, self.input_height, self.input_width, self.input_height])
        boxes = np.divide(boxes, input_shape, dtype=np.float32)
        boxes *= np.array([self.img_width, self.img_height, self.img_width, self.img_height])
        return boxes

    def draw_detections(self, image, draw_scores=True, mask_alpha=0.4):
        return detections_dog(image, self.boxes, self.scores, self.class_ids, mask_alpha)

    def get_input_details(self):
        model_inputs = self.session.get_inputs()
        self.input_names = [model_inputs[i].name for i in range(len(model_inputs))]

        self.input_shape = model_inputs[0].shape
        self.input_height = self.input_shape[2]
        self.input_width = self.input_shape[3]

    def get_output_details(self):
        model_outputs = self.session.get_outputs()
        self.output_names = [model_outputs[i].name for i in range(len(model_outputs))]
3. Model testing
Image inference:
import cv2
import numpy as np
from detection import YOLODet
import gradio as gr

model = 'yolov8m.onnx'
yolo_det = YOLODet(model, conf_thres=0.5, iou_thres=0.3)

def det_img(cv_src):
    yolo_det(cv_src)
    cv_dst = yolo_det.draw_detections(cv_src)
    return cv_dst

if __name__ == '__main__':
    input = gr.Image()
    output = gr.Image()
    demo = gr.Interface(fn=det_img, inputs=input, outputs=output)
    demo.launch()
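If you prefer to test without Gradio, the same detector can also be run directly on an image file with OpenCV (the file paths here are placeholders):

import cv2
from detection import YOLODet

yolo_det = YOLODet('yolov8m.onnx', conf_thres=0.5, iou_thres=0.3)
img = cv2.imread('test.jpg')
yolo_det(img)
result = yolo_det.draw_detections(img)
cv2.imwrite('result.jpg', result)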
Video inference:
import cv2
from detection import YOLODet

def detectio_video(input_path, model_path, output_path):
    cap = cv2.VideoCapture(input_path)
    fps = int(cap.get(cv2.CAP_PROP_FPS))
    t = int(1000 / fps)
    videoWriter = None

    det = YOLODet(model_path, conf_thres=0.3, iou_thres=0.5)

    while True:
        _, img = cap.read()
        if img is None:
            break
        det(img)
        cv_dst = det.draw_detections(img)

        # Lazily create the writer once the first frame size is known
        if videoWriter is None:
            fourcc = cv2.VideoWriter_fourcc('m', 'p', '4', 'v')
            videoWriter = cv2.VideoWriter(output_path, fourcc, fps, (cv_dst.shape[1], cv_dst.shape[0]))

        videoWriter.write(cv_dst)
        cv2.imshow("detection", cv_dst)
        cv2.waitKey(t)

        # Exit when the user closes the window with the X button
        if cv2.getWindowProperty("detection", cv2.WND_PROP_AUTOSIZE) < 1:
            break

    cap.release()
    if videoWriter is not None:
        videoWriter.release()
    cv2.destroyAllWindows()
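A minimal sketch of calling the function (the file names are placeholders):

if __name__ == '__main__':
    detectio_video('test.mp4', 'yolov8m.onnx', 'result.mp4')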
Test results:
Object detection