RealSense D455 depth camera + YOLOv5: combined object detection (2)
Part 1 of this series: RealSense D455 depth camera + YOLOv5: combined object detection (1)
Why a second article on combining the RealSense D455 with YOLOv5? The previous article was written after I found a project on GitHub and got it running. Later I discovered that it did not work with the YOLOv5 code I had downloaded from GitHub myself, and that some pieces were still missing. After learning from various sources, I adapted the original YOLOv5 code from GitHub to work with the camera, and detection now works very well!
The code works with both the D435 and D455 depth cameras together with YOLOv5. While recognizing an object, it also measures the distance between the object and the camera.
Why do this?
1. Why use the RealSense D455 depth camera? It pairs an ordinary color camera with infrared-based depth sensing. An ordinary 2D camera only captures the projection of the 3D world onto the 2D pixel plane, that is, the picture, so the depth dimension is lost. For example, an apple can appear as large as a football in the image, because we do not know the depth, that is, the distance from the object to the camera. A depth camera is needed to measure that distance.
2. Why use the YOLO algorithm? It offers a good balance of real-time speed and accuracy, and it can be applied in industrial and agricultural production.
That is why combining the two is useful!
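To make the apple-versus-football ambiguity concrete, here is a minimal sketch based on the pinhole camera model. The focal length in pixels and the object sizes are illustrative assumptions, not D455 calibration values.

```python
def apparent_size_px(real_size_m: float, depth_m: float, focal_px: float = 600.0) -> float:
    """Projected size in pixels of an object under the pinhole model.

    focal_px is a hypothetical focal length in pixels, chosen only for illustration.
    """
    return focal_px * real_size_m / depth_m


# An 8 cm apple at 0.4 m and a 22 cm football at 1.1 m (assumed sizes and distances)
apple_px = apparent_size_px(0.08, 0.4)
football_px = apparent_size_px(0.22, 1.1)
print(apple_px, football_px)  # both come out at roughly 120 pixels
```

Both objects cover about the same number of pixels, so the image alone cannot tell them apart. Once the depth is known, the real size follows from the same formula.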
1. Code source
This is the first time I have modified code and published it on GitHub, and I hope you will give it a star. The main change is that I rewrote detect.py as realsensedetect.py. To use the code, git clone the repository (the URL is written out here in case the link fails): https://github.com/wenyishengkingkong/realsense-D455-YOLOV5.git
2. Environment configuration
You can set up the environment by following the YOLOv5 environment configuration instructions, or use the simple configuration described in the previous article.
Then cd into the project folder and run:
python realsensedetect.py
As mentioned, the main change is that detect.py was rewritten as realsensedetect.py. The result of running it is as follows:
3. Code analysis:
3.1 The changes are mainly in converting detect.py into realsensedetect.py. You can also replace the contents of your own detect.py with the following file and run it directly.
import argparse
import os
import shutil
import time
from pathlib import Path

import cv2
import torch
import torch.backends.cudnn as cudnn
from numpy import random
import numpy as np
import pyrealsense2 as rs

from models.experimental import attempt_load
from utils.general import (
    check_img_size, non_max_suppression, apply_classifier, scale_coords,
    xyxy2xywh, plot_one_box, strip_optimizer, set_logging)
from utils.torch_utils import select_device, load_classifier, time_synchronized
from utils.datasets import letterbox


def detect(save_img=False):
    out, source, weights, view_img, save_txt, imgsz = \
        opt.save_dir, opt.source, opt.weights, opt.view_img, opt.save_txt, opt.img_size
    webcam = source == '0' or source.startswith(('rtsp://', 'rtmp://', 'http://')) or source.endswith('.txt')

    # Initialize
    set_logging()
    device = select_device(opt.device)
    if os.path.exists(out):  # output dir
        shutil.rmtree(out)  # delete dir
    os.makedirs(out)  # make new dir
    half = device.type != 'cpu'  # half precision only supported on CUDA

    # Load model
    model = attempt_load(weights, map_location=device)  # load FP32 model
    imgsz = check_img_size(imgsz, s=model.stride.max())  # check img_size
    if half:
        model.half()  # to FP16

    # Set Dataloader
    vid_path, vid_writer = None, None
    view_img = True
    cudnn.benchmark = True  # set True to speed up constant image size inference
    # dataset = LoadStreams(source, img_size=imgsz)

    # Get names and colors
    names = model.module.names if hasattr(model, 'module') else model.names
    colors = [[random.randint(0, 255) for _ in range(3)] for _ in range(len(names))]

    # Run inference
    t0 = time.time()
    img = torch.zeros((1, 3, imgsz, imgsz), device=device)  # init img
    _ = model(img.half() if half else img) if device.type != 'cpu' else None  # run once

    pipeline = rs.pipeline()
    # Create the config object
    config = rs.config()
    # config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
    config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 60)
    config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 60)
    # Start streaming
    pipeline.start(config)
    align_to_color = rs.align(rs.stream.color)  # align depth pixels to the color image
    try:
        while True:
            start = time.time()
            # Wait for a coherent pair of frames: depth and color
            frames = pipeline.wait_for_frames()
            frames = align_to_color.process(frames)
            depth_frame = frames.get_depth_frame()
            color_frame = frames.get_color_frame()
            if not depth_frame or not color_frame:
                continue  # skip incomplete frame sets
            color_image = np.asanyarray(color_frame.get_data())
            depth_image = np.asanyarray(depth_frame.get_data())
            mask = np.zeros([color_image.shape[0], color_image.shape[1]], dtype=np.uint8)
            mask[0:480, 320:640] = 255

            sources = [source]
            imgs = [None]
            path = sources
            imgs[0] = color_image
            im0s = imgs.copy()
            img = [letterbox(x, new_shape=imgsz)[0] for x in im0s]
            img = np.stack(img, 0)
            img = img[:, :, :, ::-1].transpose(0, 3, 1, 2)  # BGR to RGB, to 3x416x416, uint8 to float32
            img = np.ascontiguousarray(img, dtype=np.float16 if half else np.float32)
            img /= 255.0  # 0 - 255 to 0.0 - 1.0

            # Get detections
            img = torch.from_numpy(img).to(device)
            if img.ndimension() == 3:
                img = img.unsqueeze(0)
            t1 = time_synchronized()
            pred = model(img, augment=opt.augment)[0]

            # Apply NMS
            pred = non_max_suppression(pred, opt.conf_thres, opt.iou_thres, classes=opt.classes, agnostic=opt.agnostic_nms)
            t2 = time_synchronized()

            for i, det in enumerate(pred):  # detections per image
                p, s, im0 = path[i], '%g: ' % i, im0s[i].copy()
                s += '%gx%g ' % img.shape[2:]  # print string
                gn = torch.tensor(im0.shape)[[1, 0, 1, 0]]  # normalization gain whwh
                if det is not None and len(det):
                    # Rescale boxes from img_size to im0 size
                    det[:, :4] = scale_coords(img.shape[2:], det[:, :4], im0.shape).round()

                    # Print results
                    for c in det[:, -1].unique():
                        n = (det[:, -1] == c).sum()  # detections per class
                        s += '%g %ss, ' % (n, names[int(c)])  # add to string

                    # Write results
                    for *xyxy, conf, cls in reversed(det):
                        xywh = (xyxy2xywh(torch.tensor(xyxy).view(1, 4)) / gn).view(-1).tolist()  # normalized xywh
                        line = (cls, conf, *xywh) if opt.save_conf else (cls, *xywh)  # label format

                        distance_list = []
                        # Center pixel of the box: (top-left + bottom-right) / 2
                        mid_pos = [(int(xyxy[0]) + int(xyxy[2])) // 2, (int(xyxy[1]) + int(xyxy[3])) // 2]
                        # The shorter box side sets the depth sampling range
                        min_val = min(abs(int(xyxy[2]) - int(xyxy[0])), abs(int(xyxy[3]) - int(xyxy[1])))
                        randnum = 40
                        for _ in range(randnum):  # "_" avoids shadowing the outer loop index i
                            # +1 keeps the range non-empty even for very small boxes
                            bias = random.randint(-(min_val // 4), min_val // 4 + 1)
                            # Clamp to the image so get_distance never receives an out-of-range pixel
                            px = min(max(int(mid_pos[0] + bias), 0), im0.shape[1] - 1)
                            py = min(max(int(mid_pos[1] + bias), 0), im0.shape[0] - 1)
                            dist = depth_frame.get_distance(px, py)
                            if dist:  # 0 means no valid depth at this pixel
                                distance_list.append(dist)
                        # Sort and keep the middle half of the valid samples (a trimmed mean)
                        distance_list = np.sort(np.array(distance_list))
                        k = len(distance_list) // 4
                        distance_list = distance_list[k:len(distance_list) - k]
                        mean_dist = np.mean(distance_list) if len(distance_list) else float('nan')

                        label = '%s %.2f%s' % (names[int(cls)], mean_dist, 'm')
                        plot_one_box(xyxy, im0, label=label, color=colors[int(cls)], line_thickness=3)

                # Print time (inference + NMS)
                print('%sDone. (%.3fs)' % (s, t2 - t1))

                # Stream results
                if view_img:
                    cv2.imshow(p, im0)
                    if cv2.waitKey(1) == ord('q'):  # q to quit
                        raise StopIteration
    except StopIteration:
        pass  # user pressed q
    finally:
        pipeline.stop()  # release the camera
        cv2.destroyAllWindows()

    print('Done. (%.3fs)' % (time.time() - t0))


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--weights', nargs='+', type=str, default='yolov5m.pt', help='model.pt path(s)')
    parser.add_argument('--source', type=str, default='inference/images', help='source')  # file/folder, 0 for webcam
    parser.add_argument('--img-size', type=int, default=640, help='inference size (pixels)')
    parser.add_argument('--conf-thres', type=float, default=0.25, help='object confidence threshold')
    parser.add_argument('--iou-thres', type=float, default=0.45, help='IOU threshold for NMS')
    parser.add_argument('--device', default='', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
    parser.add_argument('--view-img', action='store_true', help='display results')
    parser.add_argument('--save-txt', action='store_true', help='save results to *.txt')
    parser.add_argument('--save-conf', action='store_true', help='save confidences in --save-txt labels')
    parser.add_argument('--save-dir', type=str, default='inference/output', help='directory to save results')
    parser.add_argument('--classes', nargs='+', type=int, help='filter by class: --class 0, or --class 0 2 3')
    parser.add_argument('--agnostic-nms', action='store_true', help='class-agnostic NMS')
    parser.add_argument('--augment', action='store_true', help='augmented inference')
    parser.add_argument('--update', action='store_true', help='update all models')
    opt = parser.parse_args()
    print(opt)

    with torch.no_grad():  # context manager: code wrapped in it does not track gradients
        detect()
After seeing this much code, you probably have a headache. In fact, not many lines were changed; mostly the order and position of the code changed. If comparing by hand is tedious, two tools can help you diff the files. (The code above is based on the v3.1 release of YOLOv5. I expect other versions to work as well. I have not tested other object detection algorithms, but I expect the required changes to be the same in essence.)
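The distance step in the listing samples depth values around the box center, discards empty readings, and averages the middle of the sorted samples. The sketch below isolates that idea so it can be tried without a camera. It is an illustration rather than the repository's code: the function name box_distance is hypothetical, and it assumes the depth image is a NumPy array already converted to meters and aligned to the color image.

```python
import numpy as np


def box_distance(depth_m, xyxy, samples=40, rng=None):
    """Estimate the distance to a detection box from a depth image in meters.

    Samples pixels along the diagonal through the box center (the same offset
    on both axes, as in the listing), drops zero readings, and returns the
    mean of the middle half of the sorted samples.
    """
    rng = np.random.default_rng() if rng is None else rng
    x1, y1, x2, y2 = (int(v) for v in xyxy)
    height, width = depth_m.shape
    cx, cy = (x1 + x2) // 2, (y1 + y2) // 2
    reach = min(abs(x2 - x1), abs(y2 - y1)) // 4      # sampling range from the shorter side
    bias = rng.integers(-reach, reach + 1, size=samples)
    xs = np.clip(cx + bias, 0, width - 1)             # stay inside the image
    ys = np.clip(cy + bias, 0, height - 1)
    values = depth_m[ys, xs]
    values = np.sort(values[values > 0])              # 0 means no depth reading
    if values.size == 0:
        return float('nan')
    k = values.size // 4
    return float(values[k:values.size - k].mean())    # trimmed mean


# Synthetic scene: a flat wall 2 m away with some holes in the depth map
depth = np.full((480, 640), 2.0, dtype=np.float32)
depth[::7, ::5] = 0.0
estimate = box_distance(depth, (200, 100, 400, 300), rng=np.random.default_rng(0))
print(estimate)
```

Trimming both ends of the sorted samples makes the estimate less sensitive to background pixels and depth noise inside the box than a plain average would be.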
3.2 Tools for comparing files and folders
On both Windows and Ubuntu, PyCharm works well: select a file or folder, right-click, and choose the "Compare With" option to run a diff. Comparing the realsensedetect.py file above against detect.py shows how much has changed. Alternatively, on Windows you can use Diffinity, which is also quite easy to use!
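If neither tool is at hand, Python's standard difflib module can produce the same kind of comparison. The sketch below is a minimal example; the sample lines are illustrative stand-ins, not the real contents of the two files.

```python
import difflib


def unified_diff_text(old_lines, new_lines, old_name="detect.py", new_name="realsensedetect.py"):
    """Return a unified diff between two lists of lines as one string."""
    return "".join(difflib.unified_diff(old_lines, new_lines, fromfile=old_name, tofile=new_name))


# Stand-in snippets; for real files, read them with
# pathlib.Path(name).read_text().splitlines(keepends=True)
old = ["import cv2\n", "import torch\n", "dataset = LoadStreams(source, img_size=imgsz)\n"]
new = ["import cv2\n", "import torch\n", "import pyrealsense2 as rs\n", "pipeline = rs.pipeline()\n"]

diff_text = unified_diff_text(old, new)
print(diff_text)
```

Lines starting with "-" exist only in the first file and lines starting with "+" only in the second, which makes it easy to see where the camera code replaces the original data loader.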
4. Thoughts and concluding remarks
Why use the RealSense depth camera? As mentioned in the previous article, it adds a dimension: distance. What is this extra dimension good for? The first application is social distance detection. For example, if you detect that a person is not wearing a mask, you can directly measure the distance between that person and the camera and notify them in advance to put on a mask, which helps avoid cross-infection when people crowd at an entrance. That is one practical example. The second application is 3D reconstruction. Once we have the 2D pixel coordinates and distance values of an object, we can rebuild it in 3D through 3D reconstruction or mathematical modeling, which is very important! Finally, the information obtained can be used for 3D modeling, and the pcl library can be used for more accurate distance calculations in real-world applications!
This is my first repository on GitHub, and I hope it helps you. If you are interested, feel free to follow me; maybe I can help you some day!
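Both applications rest on turning a pixel plus its depth into a 3D point. The sketch below shows this with the plain pinhole model and then measures the gap between two detections. The intrinsics (FX, FY, CX, CY) are hypothetical placeholder values, and lens distortion is ignored; with a real camera, the intrinsics should come from the device, and pyrealsense2's rs2_deproject_pixel_to_point is the usual route.

```python
import math


def deproject(u, v, depth_m, fx, fy, cx, cy):
    """Convert pixel (u, v) with depth in meters to a 3D point in the camera frame.

    Plain pinhole model without lens distortion.
    """
    x = (u - cx) / fx * depth_m
    y = (v - cy) / fy * depth_m
    return (x, y, depth_m)


# Hypothetical intrinsics for a 640x480 image (not real D455 calibration)
FX, FY, CX, CY = 600.0, 600.0, 320.0, 240.0

# Box centers of two detected people, both measured at 2 m from the camera
person_a = deproject(320, 240, 2.0, FX, FY, CX, CY)
person_b = deproject(620, 240, 2.0, FX, FY, CX, CY)

gap_m = math.dist(person_a, person_b)  # Euclidean distance in meters (Python 3.8+)
print(person_a, person_b, gap_m)
```

Comparing the 3D points rather than the pixel positions gives the real separation between objects, which is what a social distance check needs, and the same points are the raw material for a point cloud.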