RealSense D455 depth camera + YOLO v5 for object detection (2)

Link to part 1: RealSense D455 depth camera + YOLO v5 for object detection (1)

Why a second article on combining the RealSense D455 with YOLO v5? The first one was written right after I found a project on GitHub and got it running. Later I discovered that it would not work with the YOLO v5 code I had downloaded from GitHub; something was still missing. After trying several approaches, I grafted the original YOLO v5 code from GitHub onto it, and now detection works very well!

It combines a D435 or D455 depth camera with YOLO v5: while recognizing objects, it also measures the distance between each object and the camera.

Why do this? 1. Why use a RealSense D455 depth camera? An ordinary camera only captures the projection of the 3D world onto the 2D pixel plane, i.e. a flat picture. Once that depth dimension is lost, an apple can look as big as a football, because we no longer know how far the object is from the camera. The D455 pairs a regular RGB camera with infrared-based depth sensing, so it can recover that distance. 2. Why use the YOLO algorithm? Because it is fast enough for real time and accurate enough to be useful, which is exactly what industrial and agricultural applications need. Hence the combination of the two!

1. Code source

This is the first time I have modified code and put it on GitHub, so please give it plenty of stars. I mainly rewrote the detect.py file as realsensedetect.py. If you want to use this code, git clone it from the repository (in case the link ever breaks, here it is in full: https://github.com/wenyishengkingkong/realsense-D455-YOLOV5.git).

2. Environment configuration

You can set up the environment following the standard YOLO v5 instructions, or use the simple configuration described in the previous article.
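As a rough sketch of the setup (assuming the repository keeps YOLO v5's usual requirements.txt; adjust to your own environment):

git clone https://github.com/wenyishengkingkong/realsense-D455-YOLOV5.git
pip install -r realsense-D455-YOLOV5/requirements.txt   # PyTorch, OpenCV, NumPy, etc.
pip install pyrealsense2                                 # Intel RealSense SDK Python bindings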

Then cd into the project folder and execute:

python realsensedetect.py

Again, the main change is that detect.py has been rewritten as realsensedetect.py. Running it produces results like this:
(Figure: detection example, with boxes labeled by class and distance)

3. Code analysis:

3.1 The main change is the conversion of detect.py into realsensedetect.py, shown in full below. You can also replace your own detect.py with this file and run it directly.

import argparse
import os
import shutil
import time
from pathlib import Path

import cv2
import torch
import torch.backends.cudnn as cudnn
from numpy import random
import numpy as np
import pyrealsense2 as rs

from models.experimental import attempt_load
from utils.general import (
    check_img_size, non_max_suppression, apply_classifier, scale_coords,
    xyxy2xywh, plot_one_box, strip_optimizer, set_logging)
from utils.torch_utils import select_device, load_classifier, time_synchronized
from utils.datasets import letterbox

def detect(save_img=False):
    out, source, weights, view_img, save_txt, imgsz = \
        opt.save_dir, opt.source, opt.weights, opt.view_img, opt.save_txt, opt.img_size
    webcam = source == '0' or source.startswith(('rtsp://', 'rtmp://', 'http://')) or source.endswith('.txt')

    # Initialize
    set_logging()
    device = select_device(opt.device)
    if os.path.exists(out):  # output dir
        shutil.rmtree(out)  # delete dir
    os.makedirs(out)  # make new dir
    half = device.type != 'cpu'  # half precision only supported on CUDA

    # Load model
    model = attempt_load(weights, map_location=device)  # load FP32 model
    imgsz = check_img_size(imgsz, s=model.stride.max())  # check img_size
    if half:
        model.half()  # to FP16
    # Set Dataloader
    vid_path, vid_writer = None, None
    view_img = True
    cudnn.benchmark = True  # set True to speed up constant image size inference
    #dataset = LoadStreams(source, img_size=imgsz)

    # Get names and colors
    names = model.module.names if hasattr(model, 'module') else model.names
    colors = [[random.randint(0, 255) for _ in range(3)] for _ in range(len(names))]

    # Run inference
    t0 = time.time()
    img = torch.zeros((1, 3, imgsz, imgsz), device=device)  # init img
    _ = model(img.half() if half else img) if device.type != 'cpu' else None  # run once
    pipeline = rs.pipeline()
    # create the stream configuration object
    config = rs.config()
    # config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
    config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 60)
    config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 60)

    # Start streaming
    pipeline.start(config)
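    # align depth frames to the color stream so detection-box pixel coordinates index the depth map correctly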
    align_to_color = rs.align(rs.stream.color)
    while True:
        start = time.time()
        # Wait for a coherent pair of frames: depth and color
        frames = pipeline.wait_for_frames()
        frames = align_to_color.process(frames)
        depth_frame = frames.get_depth_frame()
        color_frame = frames.get_color_frame()
        color_image = np.asanyarray(color_frame.get_data())
        depth_image = np.asanyarray(depth_frame.get_data())
        mask = np.zeros([color_image.shape[0], color_image.shape[1]], dtype=np.uint8)
        mask[0:480, 320:640] = 255
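        # this mask marks the right half of the frame; it is not used further in the script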

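        # wrap the single RealSense color image in the lists the original detect.py loop expects, then letterbox and normalize it for the model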
        sources = [source]
        imgs = [None]
        path = sources
        imgs[0] = color_image
        im0s = imgs.copy()
        img = [letterbox(x, new_shape=imgsz)[0] for x in im0s]
        img = np.stack(img, 0)
        img = img[:, :, :, ::-1].transpose(0, 3, 1, 2)  # BGR to RGB, HWC to CHW
        img = np.ascontiguousarray(img, dtype=np.float16 if half else np.float32)
        img /= 255.0  # 0 - 255 to 0.0 - 1.0

        # Get detections
        img = torch.from_numpy(img).to(device)
        if img.ndimension() == 3:
            img = img.unsqueeze(0)
        t1 = time_synchronized()
        pred = model(img, augment=opt.augment)[0]

        # Apply NMS
        pred = non_max_suppression(pred, opt.conf_thres, opt.iou_thres, classes=opt.classes, agnostic=opt.agnostic_nms)
        t2 = time_synchronized()

        for i, det in enumerate(pred):  # detections per image
            p, s, im0 = path[i], '%g: ' % i, im0s[i].copy()
            s += '%gx%g ' % img.shape[2:]  # print string
            gn = torch.tensor(im0.shape)[[1, 0, 1, 0]]  # normalization gain whwh
            if det is not None and len(det):
                # Rescale boxes from img_size to im0 size
                det[:, :4] = scale_coords(img.shape[2:], det[:, :4], im0.shape).round()

                # Print results
                for c in det[:, -1].unique():
                    n = (det[:, -1] == c).sum()  # detections per class
                    s += '%g %ss, ' % (n, names[int(c)])  # add to string

                # Write results
                for *xyxy, conf, cls in reversed(det):
                    xywh = (xyxy2xywh(torch.tensor(xyxy).view(1, 4)) / gn).view(-1).tolist()  # normalized xywh
                    line = (cls, conf, *xywh) if opt.save_conf else (cls, *xywh)  # label format
                    distance_list = []
                    mid_pos = [int((int(xyxy[0]) + int(xyxy[2])) / 2), int((int(xyxy[1]) + int(xyxy[3])) / 2)]  # center pixel of the box (midpoint of the two corners)
                    min_val = min(abs(int(xyxy[2]) - int(xyxy[0])), abs(int(xyxy[3]) - int(xyxy[1])))  # shorter box side, limits the depth sampling range
                    randnum = 40
                    for _ in range(randnum):
                        bias = random.randint(-min_val // 4, min_val // 4)
                        dist = depth_frame.get_distance(int(mid_pos[0] + bias), int(mid_pos[1] + bias))
                        if dist:  # ignore pixels with no valid depth reading
                            distance_list.append(dist)
                    distance_list = np.array(distance_list)
                    distance_list = np.sort(distance_list)[
                                    randnum // 2 - randnum // 4:randnum // 2 + randnum // 4]  # sort and keep the middle half of the samples (median filtering)

                    label = '%s %.2f%s' % (names[int(cls)], np.mean(distance_list), 'm')
                    plot_one_box(xyxy, im0, label=label, color=colors[int(cls)], line_thickness=3)

            # Print time (inference + NMS)
            print('%sDone. (%.3fs)' % (s, t2 - t1))

            # Stream results
            if view_img:
                cv2.imshow(p, im0)
                if cv2.waitKey(1) == ord('q'):  # q to quit cleanly: stop the camera and close the window
                    pipeline.stop()
                    cv2.destroyAllWindows()
                    print('Done. (%.3fs)' % (time.time() - t0))
                    return


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--weights', nargs='+', type=str, default='yolov5m.pt', help='model.pt path(s)')
    parser.add_argument('--source', type=str, default='inference/images', help='source')  # file/folder, 0 for webcam
    parser.add_argument('--img-size', type=int, default=640, help='inference size (pixels)')
    parser.add_argument('--conf-thres', type=float, default=0.25, help='object confidence threshold')
    parser.add_argument('--iou-thres', type=float, default=0.45, help='IOU threshold for NMS')
    parser.add_argument('--device', default='', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
    parser.add_argument('--view-img', action='store_true', help='display results')
    parser.add_argument('--save-txt', action='store_true', help='save results to *.txt')
    parser.add_argument('--save-conf', action='store_true', help='save confidences in --save-txt labels')
    parser.add_argument('--save-dir', type=str, default='inference/output', help='directory to save results')
    parser.add_argument('--classes', nargs='+', type=int, help='filter by class: --class 0, or --class 0 2 3')
    parser.add_argument('--agnostic-nms', action='store_true', help='class-agnostic NMS')
    parser.add_argument('--augment', action='store_true', help='augmented inference')
    parser.add_argument('--update', action='store_true', help='update all models')
    opt = parser.parse_args()
    print(opt)

    with torch.no_grad():  # context manager: gradients are not tracked inside this block
        detect()

I suspect all that code gives some of you a headache. In fact, not many lines were changed; mostly the order and position of things moved around. If comparing by eye feels tedious, two tools (introduced below) can diff the files for you. (Note: the code above is based on YOLO v5 v3.1. Other versions should work with only minor adjustments; other detection algorithms have not been tested, but the idea should carry over almost unchanged.)
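If you only want the gist of the change, here is a minimal standalone sketch (not the full script above; the continuous loop is reduced to a single frame): the usual image/stream dataloader is replaced by a pyrealsense2 pipeline that supplies the color frame to the model and the depth frame for distance lookups.

import numpy as np
import pyrealsense2 as rs

# minimal sketch of what replaces the dataloader: a RealSense pipeline yielding
# an aligned color frame (for YOLO) and a depth frame (for per-box distances)
pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 60)
config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 60)
pipeline.start(config)
align = rs.align(rs.stream.color)  # map depth pixels onto the color image grid

try:
    frames = align.process(pipeline.wait_for_frames())
    color_image = np.asanyarray(frames.get_color_frame().get_data())  # this is what YOLO v5 sees
    depth_frame = frames.get_depth_frame()
    # distance in meters at any pixel (u, v), e.g. the center of a detected box
    print('distance at image center: %.2f m' % depth_frame.get_distance(320, 240))
finally:
    pipeline.stop()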

3.2 Tools for comparing files or folders:

On both Windows and Ubuntu you can use PyCharm: select two files or folders, right-click, and choose "Compare With" to see the differences. Comparing the realsensedetect.py file above with the original detect.py shows exactly how much was changed. On Windows you can also use Diffinity, which works quite well.

4. Thoughts and concluding remarks

Why use a RealSense depth camera at all? As mentioned in the previous article, it adds one more dimension: distance. What is this extra dimension good for? First, social-distance detection: if you detect a person who is not wearing a mask, you can also measure how far they are from the camera, warn them in advance to put on a mask, and avoid cross-infection in crowds at an entrance. That is one practical example. Second, 3D reconstruction: once we have an object's 2D pixel coordinates and its distance, we can rebuild the object in 3D through reconstruction or mathematical modeling, which is very important. Finally, the 3D information we obtain can be used for 3D modeling and fed into the PCL library for more accurate distance calculations, enabling real-world applications.
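For the 3D-reconstruction point, here is a minimal sketch of how a detected pixel plus its depth can be lifted to a 3D point with pyrealsense2 (pixel_to_3d is a hypothetical helper name; depth_frame and the box center mid_pos refer to the same objects used in the script above):

import pyrealsense2 as rs

def pixel_to_3d(depth_frame, u, v):
    # camera intrinsics of the (color-aligned) depth stream
    intrinsics = depth_frame.profile.as_video_stream_profile().intrinsics
    depth = depth_frame.get_distance(u, v)  # depth in meters at pixel (u, v)
    # back-project the pixel into a 3D point (X, Y, Z) in the camera frame, in meters
    return rs.rs2_deproject_pixel_to_point(intrinsics, [u, v], depth)

# example: 3D position of the center of a detected bounding box
# x, y, z = pixel_to_3d(depth_frame, mid_pos[0], mid_pos[1])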

This is the first piece of my own code I have pushed to GitHub. I hope it helps; if you find my work interesting, feel free to follow me, and perhaps I can help you again some day!


Source: blog.csdn.net/qq_45077256/article/details/120040059