ROI area pedestrian speed measurement statistical system based on DeepSort and OpenCV

1. Research background and significance

With the acceleration of urbanization and continuous population growth, urban traffic problems have become increasingly prominent. Pedestrians are a part of urban traffic that cannot be ignored, and the analysis and statistics of pedestrian behavior are of great significance to urban traffic management and planning. In densely populated areas in particular, such as commercial districts, train stations and airports, pedestrian flow directly affects the distribution of traffic and road design. Therefore, developing an efficient and accurate pedestrian speed measurement and statistics system is of great significance for urban traffic management.

The traditional method of pedestrian speed measurement and counting relies mainly on manual counting, that is, manually observing and recording the number of pedestrians passing through an area and the time they take. This approach has many problems, such as low accuracy and low efficiency. Meanwhile, with the rapid development of computer vision and deep learning technology, image- and video-based methods for pedestrian speed measurement have gradually become a research hotspot.

In recent years, deep learning technology has made major breakthroughs in the field of computer vision, especially in target detection and tracking. Deepsort is a multi-target tracking algorithm based on deep learning, which can achieve efficient and accurate target tracking in complex scenes. OpenCV is an open source computer vision library that provides a wealth of image processing and analysis tools. Combining Deepsort and OpenCV, accurate tracking and speed statistics of pedestrians can be achieved.

The ROI area pedestrian speed measurement statistical system based on Deepsort and OpenCV has the following significance:

  1. Improve the accuracy of speed measurement statistics: The traditional manual counting method is easily affected by human factors, such as observation angle, counting errors, etc. Systems based on Deepsort and OpenCV can perform automated pedestrian tracking and speed measurement statistics through image and video data, greatly improving the accuracy of speed measurement statistics.

  2. Improve the efficiency of speed measurement statistics: The traditional manual counting method requires a lot of manpower and time and is inefficient. The system based on Deepsort and OpenCV can realize automated pedestrian tracking and speed measurement statistics, greatly improving the efficiency of speed measurement statistics.

  3. Provide scientific basis for urban traffic management and planning: The flow of pedestrians is of great significance to urban traffic management and planning. The system based on Deepsort and OpenCV can monitor and count the flow of pedestrians in real time, providing scientific basis for urban traffic management and planning.

  4. Promote the application of computer vision and deep learning technology: Systems based on Deepsort and OpenCV are one of the applications of computer vision and deep learning technology in the transportation field, and can promote the application and development of these technologies in other fields.

In summary, the ROI area pedestrian speed measurement statistical system based on Deepsort and OpenCV has important research background and significance. By improving the accuracy and efficiency of speed measurement statistics, providing scientific basis for urban traffic management and planning, and promoting the application of computer vision and deep learning technology, the system is expected to play an important role in practical applications.

2. Picture demonstration

(demonstration screenshots)

3. Video demonstration

ROI area pedestrian speed measurement statistical system based on Deepsort and OpenCV (video demo on Bilibili)

4. Collection, labeling and organization of data sets

Collection of images

First, we need to collect the images we need. This can be achieved in different ways, such as using the existing dataset voc_from_mot20.

Use labelImg for labeling

labelImg is a graphical image annotation tool that supports VOC and YOLO formats. The following are the steps to use labelImg to label images in VOC format:

(1) Download and install labelImg.
(2) Open labelImg and use "Open Dir" to select your image directory.
(3) Set the label name for your target object.
(4) Draw a rectangular box around each target in the image and select the corresponding label.
(5) Save the annotation; an XML file with the same name as the image will be generated in the image directory.
(6) Repeat this process until all images are labeled.

Convert to YOLO format

Since YOLO uses txt format annotations, we need to convert the VOC format to YOLO format. This can be achieved using various conversion tools or scripts.

Here's a simple way to do it using a Python script that reads the XML file and then converts it to the txt format required by YOLO.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import os
import xml.etree.ElementTree as ET

classes = []  # class names collected while converting; initialized as an empty list

CURRENT_DIR = os.path.dirname(os.path.abspath(__file__))


def convert(size, box):
    # Convert a VOC box (xmin, xmax, ymin, ymax) to normalized YOLO (x_center, y_center, w, h).
    dw = 1. / size[0]
    dh = 1. / size[1]
    x = (box[0] + box[1]) / 2.0
    y = (box[2] + box[3]) / 2.0
    w = box[1] - box[0]
    h = box[3] - box[2]
    return (x * dw, y * dh, w * dw, h * dh)


def convert_annotation(image_id):
    in_file = open('./label_xml/%s.xml' % image_id, encoding='UTF-8')
    out_file = open('./label_txt/%s.txt' % image_id, 'w')  # generate the txt file
    tree = ET.parse(in_file)
    root = tree.getroot()
    size = root.find('size')
    w = int(size.find('width').text)
    h = int(size.find('height').text)

    for obj in root.iter('object'):
        cls = obj.find('name').text
        if cls not in classes:
            classes.append(cls)  # add the class to the list if it has not been seen yet
        cls_id = classes.index(cls)
        xmlbox = obj.find('bndbox')
        b = (float(xmlbox.find('xmin').text), float(xmlbox.find('xmax').text),
             float(xmlbox.find('ymin').text), float(xmlbox.find('ymax').text))
        bb = convert((w, h), b)
        out_file.write(str(cls_id) + " " + " ".join(str(a) for a in bb) + '\n')
    in_file.close()
    out_file.close()


xml_path = os.path.join(CURRENT_DIR, './label_xml/')
os.makedirs(os.path.join(CURRENT_DIR, 'label_txt'), exist_ok=True)  # make sure the output directory exists

# iterate over every XML annotation file
img_xmls = os.listdir(xml_path)
for img_xml in img_xmls:
    label_name = img_xml.split('.')[0]
    print(label_name)
    convert_annotation(label_name)

print("Classes:")  # print the final class list
print(classes)

Organize data folder structure

We need to organize the dataset into the following structure:

-----data
   |-----train
   |   |-----images
   |   |-----labels
   |
   |-----valid
   |   |-----images
   |   |-----labels
   |
   |-----test
       |-----images
       |-----labels

Make sure the following:

All training images are located in the data/train/images directory, and the corresponding label files are located in the data/train/labels directory.
All validation images are located in the data/valid/images directory, and the corresponding label files are located in the data/valid/labels directory.
All test images are located in the data/test/images directory, and the corresponding label files are located in the data/test/labels directory.
Such a structure makes data management and model training, validation and testing very convenient.
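
A minimal sketch of how image/label pairs might be copied into this layout, assuming the images (.jpg) and the YOLO .txt labels produced above sit in flat source folders; the folder names and split ratios are illustrative only:

import os
import random
import shutil

def split_dataset(image_dir, label_dir, out_dir='data', ratios=(0.8, 0.1, 0.1), seed=0):
    """Copy image/label pairs into data/{train,valid,test}/{images,labels}."""
    names = [os.path.splitext(f)[0] for f in os.listdir(image_dir) if f.endswith('.jpg')]
    random.Random(seed).shuffle(names)
    n = len(names)
    cuts = [int(n * ratios[0]), int(n * (ratios[0] + ratios[1]))]
    splits = {'train': names[:cuts[0]], 'valid': names[cuts[0]:cuts[1]], 'test': names[cuts[1]:]}
    for split, split_names in splits.items():
        img_out = os.path.join(out_dir, split, 'images')
        lbl_out = os.path.join(out_dir, split, 'labels')
        os.makedirs(img_out, exist_ok=True)
        os.makedirs(lbl_out, exist_ok=True)
        for name in split_names:
            shutil.copy(os.path.join(image_dir, name + '.jpg'), img_out)
            shutil.copy(os.path.join(label_dir, name + '.txt'), lbl_out)

# Example call (paths are assumptions):
# split_dataset('./images', './label_txt')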

Model training


5. Core code explanation

5.1 detector_CPU.py

import numpy as np
import torch

# attempt_load, select_device, letterbox, non_max_suppression and scale_coords are helper
# functions imported from the YOLO code bundled with the project (imports omitted in the
# original listing).

class Detector:
    def __init__(self):
        self.img_size = 640
        self.threshold = 0.4
        self.stride = 1

        self.weights = './weights/output_of_small_target_detection.pt'

        self.device = '0' if torch.cuda.is_available() else 'cpu'
        self.device = select_device(self.device)
        model = attempt_load(self.weights, map_location=self.device)
        model.to(self.device).eval()
        model.float()

        self.m = model
        self.names = model.module.names if hasattr(
            model, 'module') else model.names

    def preprocess(self, img):
        img0 = img.copy()
        img = letterbox(img, new_shape=self.img_size)[0]
        img = img[:, :, ::-1].transpose(2, 0, 1)
        img = np.ascontiguousarray(img)
        img = torch.from_numpy(img).to(self.device)
        img = img.float()
        img /= 255.0
        if img.ndimension() == 3:
            img = img.unsqueeze(0)
        return img0, img

    def detect(self, im):
        im0, img = self.preprocess(im)
        pred = self.m(img, augment=False)[0]
        pred = pred.float()
        pred = non_max_suppression(pred, self.threshold, 0.4)

        boxes = []
        for det in pred:
            if det is not None and len(det):
                det[:, :4] = scale_coords(
                    img.shape[2:], det[:, :4], im0.shape).round()

                for *x, conf, cls_id in det:
                    lbl = self.names[int(cls_id)]
                    if lbl not in ['person']:
                        continue
                    x1, y1 = int(x[0]), int(x[1])
                    x2, y2 = int(x[2]), int(x[3])
                    xm = x2
                    ym = y2
                    boxes.append(
                            (x1, y1, x2, y2, lbl, conf))
        return boxes

This program file, detector_CPU.py, implements target detection. It uses the PyTorch library for model loading and inference, and the OpenCV library for image processing.

In the initialization method of the Detector class, several parameters are defined, including the image size, the confidence threshold and the stride. The weights of the pre-trained model are loaded, and the model is moved to an available device for inference.

The preprocess method preprocesses the input image: it resizes the image with letterboxing, converts the color channels, and converts the result to a tensor.

The detect method performs target detection. It first preprocesses the input image and then runs the loaded model to obtain predictions. Non-maximum suppression is applied to the predictions to keep boxes with higher confidence. Finally, the boxes are filtered by class: only boxes with the class 'person' are kept and stored in a list.

The overall function of the program is to perform target detection on an input image and return the coordinates and class information of the detected person boxes.
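
A minimal usage sketch (the test image path is a placeholder; the weights path is the one hard-coded in the class):

import cv2
from detector_CPU import Detector

detector = Detector()
frame = cv2.imread('./images/example.jpg')  # placeholder test image
boxes = detector.detect(frame)              # list of (x1, y1, x2, y2, label, conf) tuples
for x1, y1, x2, y2, lbl, conf in boxes:
    print(f'{lbl}: ({x1}, {y1}) - ({x2}, {y2}), confidence {float(conf):.2f}')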

5.2 detector_GPU.py

class Detector:
    def __init__(self):
        self.img_size = 640
        self.threshold = 0.1
        self.stride = 1
        self.weights = './weights/Attention_mechanism.pt'
        self.device = '0' if torch.cuda.is_available() else 'cpu'
        self.device = select_device(self.device)
        model = attempt_load(self.weights, map_location=self.device)
        model.to(self.device).eval()
        model.half()
        self.m = model
        self.names = model.module.names if hasattr(model, 'module') else model.names

    def preprocess(self, img):
        img0 = img.copy()
        img = letterbox(img, new_shape=self.img_size)[0]
        img = img[:, :, ::-1].transpose(2, 0, 1)
        img = np.ascontiguousarray(img)
        img = torch.from_numpy(img).to(self.device)
        img = img.half()
        img /= 255.0
        if img.ndimension() == 3:
            img = img.unsqueeze(0)
        return img0, img

    def detect(self, im):
        im0, img = self.preprocess(im)
        pred = self.m(img, augment=False)[0]
        pred = pred.float()
        pred = non_max_suppression(pred, self.threshold, 0.4)
        boxes = []
        for det in pred:
            if det is not None and len(det):
                det[:, :4] = scale_coords(img.shape[2:], det[:, :4], im0.shape).round()
                for *x, conf, cls_id in det:
                    lbl = self.names[int(cls_id)]
                    if lbl not in ['bicycle','car', 'bus', 'truck']:
                        continue
                    x1, y1 = int(x[0]), int(x[1])
                    x2, y2 = int(x[2]), int(x[3])
                    xm = x2
                    ym = y2
                    # Keep only detections whose bottom-right corner (xm, ym) lies on one side
                    # of a hand-tuned image line; this acts as a scene-specific ROI filter.
                    if ym + 0.797 * xm - 509.77 > 0:
                        boxes.append((x1, y1, x2, y2, lbl, conf))
        return boxes

This program file, detector_GPU.py, defines a Detector class for target detection. The class has the following methods:

  1. The __init__ method initializes parameters such as the image size, confidence threshold, stride and the path to the model weights, and selects a device for model loading and inference depending on whether a GPU is available.

  2. The preprocess method preprocesses the input image: it resizes the image with letterboxing, reorders the color channels, converts the result to a contiguous numpy array and then to a torch tensor.

  3. The detect method performs target detection on the input image. It first calls preprocess, then runs the loaded model to obtain predictions. Based on the confidence and class of the predictions, boxes of specific classes are filtered out, and the coordinates and class information of these boxes are returned.

The overall function of the program is to perform target detection on the GPU: given an input image, it outputs the coordinates and class information of the boxes of specific target classes in the image.
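
The hard-coded inequality ym + 0.797 * xm - 509.77 > 0 only suits one particular camera view. As a hedged illustration (the helper below is not part of the project), the line coefficients can be derived from two points chosen on the image, so the same half-plane test can be re-tuned for another scene:

def line_filter_from_points(p1, p2):
    # Return a predicate that is True for points lying below the (non-vertical) line through p1 and p2.
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    # Rearranged as y - slope * x - intercept > 0, the same form as the check in detect()
    return lambda x, y: y - slope * x - intercept > 0

keep = line_filter_from_points((0, 509.77), (100, 430.07))  # reproduces roughly y + 0.797 * x - 509.77 > 0
print(keep(640, 400))  # True: this bottom-right corner would be kept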

5.3 fit.py


import cv2

# YOLODetector and DeepSort are wrapper classes provided by the project's own modules
# (their imports are omitted in the original listing).

class PedestrianSpeedEstimator:
    def __init__(self, yolo_weights, deepsort_weights, video_path):
        self.yolo = YOLODetector(yolo_weights)
        self.deepsort = DeepSort(deepsort_weights)
        self.video_path = video_path

    def detect_and_track(self):
        cap = cv2.VideoCapture(self.video_path)
        ret, frame = cap.read()
        if not ret:
            print("Failed to read video")
            return

        roi = cv2.selectROI("Select ROI", frame, fromCenter=False, showCrosshair=True)
        cv2.destroyAllWindows()

        while cap.isOpened():
            ret, frame = cap.read()
            if not ret:
                break

            roi_frame = frame[int(roi[1]):int(roi[1] + roi[3]), int(roi[0]):int(roi[0] + roi[2])]

            detections = self.yolo.detect(roi_frame)

            tracker_outputs = self.deepsort.update(detections)

            for track in tracker_outputs:
                if track.is_confirmed() and track.time_since_update > 1:
                    continue

                speed = self.calculate_speed(track)

                bbox = track.to_tlbr()
                cv2.rectangle(roi_frame, (int(bbox[0]), int(bbox[1])), (int(bbox[2]), int(bbox[3])), (255, 0, 0), 2)
                cv2.putText(roi_frame, f"ID: {track.track_id}, Speed: {speed:.2f} m/s",
                            (int(bbox[0]), int(bbox[1]) - 10), 0, 0.5, (255, 0, 0), 2)

            frame[int(roi[1]):int(roi[1] + roi[3]), int(roi[0]):int(roi[0] + roi[2])] = roi_frame
            cv2.rectangle(frame, (int(roi[0]), int(roi[1])), (int(roi[0] + roi[2]), int(roi[1] + roi[3])), (0, 255, 0),
                          2)

            cv2.imshow('Frame', frame)

            if cv2.waitKey(1) & 0xFF == ord('q'):
                break

        cap.release()
        cv2.destroyAllWindows()

    def calculate_speed(self, track):
        # Pixel displacement of the box centre between the last two tracked positions.
        # Note: no pixel-to-metre calibration or frame-rate scaling is applied here, so the
        # value drawn as "m/s" in detect_and_track is really a per-update pixel distance.
        if len(track.history) < 2:
            return 0.0

        prev_bbox, curr_bbox = track.history[-2], track.history[-1]
        dy = (curr_bbox[1] + curr_bbox[3]) / 2 - (prev_bbox[1] + prev_bbox[3]) / 2
        dx = (curr_bbox[0] + curr_bbox[2]) / 2 - (prev_bbox[0] + prev_bbox[2]) / 2

        speed = (dy ** 2 + dx ** 2) ** 0.5

        return speed

This program file, fit.py, implements a pedestrian speed estimator. It uses a pre-trained YOLO model for pedestrian detection and the DeepSort algorithm for pedestrian tracking. The program first opens a video file and lets the user select a region of interest (ROI). It then loops over the video frames, crops the ROI region, and runs the YOLO model on it for pedestrian detection. The DeepSort algorithm is used to track each pedestrian, and the pedestrian's speed is calculated. Finally, the detection box and speed information are drawn onto the image and displayed in a window. The user can press the 'q' key to exit the program.
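
A minimal usage sketch (all three paths below are placeholders that depend on where the weights and the test video are stored):

estimator = PedestrianSpeedEstimator(
    yolo_weights='./weights/output_of_small_target_detection.pt',  # placeholder path
    deepsort_weights='./deep_sort/deep/checkpoint/ckpt.t7',        # placeholder path
    video_path='./videos/pedestrians.mp4'                          # placeholder path
)
estimator.detect_and_track()  # draw the ROI in the pop-up window, then press 'q' to quit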

5.4 tracker.py

import cv2
import numpy as np
import torch

from deep_sort.utils.parser import get_config
from deep_sort.deep_sort import DeepSort


class ObjectTracker:
    def __init__(self):
        cfg = get_config()
        cfg.merge_from_file("./deep_sort/configs/deep_sort.yaml")
        self.deepsort = DeepSort(cfg.DEEPSORT.REID_CKPT,
                                 max_dist=cfg.DEEPSORT.MAX_DIST, min_confidence=cfg.DEEPSORT.MIN_CONFIDENCE,
                                 nms_max_overlap=cfg.DEEPSORT.NMS_MAX_OVERLAP, max_iou_distance=cfg.DEEPSORT.MAX_IOU_DISTANCE,
                                 max_age=cfg.DEEPSORT.MAX_AGE, n_init=cfg.DEEPSORT.N_INIT, nn_budget=cfg.DEEPSORT.NN_BUDGET,
                                 use_cuda=True)

    def update(self, bboxes, image):
        bbox_xywh = []
        confs = []
        bboxes2draw = []

        if len(bboxes) > 0:
            for x1, y1, x2, y2, lbl, conf in bboxes:
                obj = [
                    int((x1 + x2) * 0.5), int((y1 + y2) * 0.5),
                    x2 - x1, y2 - y1
                ]
                bbox_xywh.append(obj)
                confs.append(conf)

            xywhs = torch.Tensor(bbox_xywh)
            confss = torch.Tensor(confs)

            outputs = self.deepsort.update(xywhs, confss, image)

            for x1, y1, x2, y2, track_id in list(outputs):
                center_x = (x1 + x2) * 0.5
                center_y = (y1 + y2) * 0.5

                label = self.search_label(center_x=center_x, center_y=center_y,
                                          bboxes_xyxy=bboxes, max_dist_threshold=20.0)

                bboxes2draw.append((x1, y1, x2, y2, label, track_id))

        return bboxes2draw

    def search_label(self, center_x, center_y, bboxes_xyxy, max_dist_threshold):
        label = ''
        min_dist = -1.0

        for x1, y1, x2, y2, lbl, conf in bboxes_xyxy:
            center_x2 = (x1 + x2) * 0.5
            center_y2 = (y1 + y2) * 0.5

            min_x = abs(center_x2 - center_x)
            min_y = abs(center_y2 - center_y)

            if min_x < max_dist_threshold and min_y < max_dist_threshold:
                avg_dist = (min_x + min_y) * 0.5
                if min_dist == -1.0:
                    min_dist = avg_dist
                    label = lbl
                else:
                    if avg_dist < min_dist:
                        min_dist = avg_dist
                        label = lbl

        return label

    def draw_bboxes(self, point_list, speed_list, name_list, image, bboxes, line_thickness):
        line_thickness = line_thickness or round(
            0.002 * (image.shape[0] + image.shape[1]) * 0.5) + 1

        list_pts = []
        point_radius = 4

        for (x1, y1, x2, y2, cls_id, pos_id) in bboxes:
            color = (0, 255, 0)

            check_point_x = x1
            check_point_y = int(y1 + ((y2 - y1) * 0.6))

            c1, c2 = (x1, y1), (x2, y2)
            cv2.rectangle(image, c1, c2, color, thickness=line_thickness, lineType=cv2.LINE_AA)

            font_thickness = max(line_thickness - 1, 1)
            t_size = cv2.getTextSize(cls_id, 0, fontScale=line_thickness / 3, thickness=font_thickness)[0]
            c2 = c1[0] + t_size[0], c1[1] - t_size[1] - 3

            if str(pos_id) not in name_list:
                name_list.append(str(pos_id))
                point_list.append([])
                speed_list.append([])
            id = name_list.index(str(pos_id))
            point_list[id].append(c1)
            list_pts.append([check_point_x - point_radius, check_point_y - point_radius])
            list_pts.append([check_point_x - point_radius, check_point_y + point_radius])
            list_pts.append([check_point_x + point_radius, check_point_y + point_radius])
            list_pts.append([check_point_x + point_radius, check_point_y - point_radius])

            ndarray_pts = np.array(list_pts, np.int32)

            cv2.rectangle(image, c1, c2, color, -1, cv2.LINE_AA)
            try:
                cv2.putText(image, '{} ID-{}-{}Km/H'.format(cls_id, pos_id,str(float(speed_list[id])/3)[:5]), (c1[0], c1[1] - 2), 0, line_thickness / 3,
                            [225, 255, 255], thickness=font_thickness, lineType=cv2.LINE_AA)
            except:
                cv2.putText(image, '{} ID-{}'.format(cls_id, pos_id),
                            (c1[0], c1[1] - 2), 0, line_thickness / 3,
                            [225, 255, 255], thickness=font_thickness, lineType=cv2.LINE_AA)
            cv2.fillPoly(image, [ndarray_pts], color=(0, 0, 255))

            list_pts.clear()

        return image, point_list, speed_list, name_list

The program file is tracker.py; its main function is target tracking and drawing the tracked boxes together with their speed. It imports cv2, torch and numpy, and imports functions and classes from the deep_sort.utils.parser and deep_sort.deep_sort modules.

The ObjectTracker class first reads the configuration file deep_sort.yaml and uses it to initialize the DeepSort object deepsort. The update method converts the input box coordinates and confidences into the format required by DeepSort and calls deepsort.update to perform tracking. The search_label method finds, among the detection boxes, the label whose centre is closest to a given centre point. The draw_bboxes method draws the target boxes and labels on the image and records each target's position and speed information.

The main function of the whole program is to pass the input box coordinates and confidences to DeepSort for tracking and to draw the resulting boxes and labels on the image; the speed computed elsewhere from each target's position history is added to the label.
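
A hedged sketch of how the detector and the tracker can be combined in a per-frame loop (the video path is a placeholder, the three bookkeeping lists mirror the arguments of draw_bboxes, and the project's own ui.py presumably does something similar):

import cv2
from detector_CPU import Detector
from tracker import ObjectTracker

detector = Detector()
tracker = ObjectTracker()
point_list, speed_list, name_list = [], [], []   # per-ID history used by draw_bboxes

cap = cv2.VideoCapture('./videos/pedestrians.mp4')  # placeholder path
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    boxes = detector.detect(frame)                   # (x1, y1, x2, y2, label, conf)
    tracked = tracker.update(boxes, frame)           # (x1, y1, x2, y2, label, track_id)
    frame, point_list, speed_list, name_list = tracker.draw_bboxes(
        point_list, speed_list, name_list, frame, tracked, line_thickness=None)
    cv2.imshow('tracking', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()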

5.5 train.py


class Trainer:
    def __init__(self, hyp, opt, device, tb_writer=None):
        self.hyp = hyp
        self.opt = opt
        self.device = device
        self.tb_writer = tb_writer

    def train(self):
        logger.info(colorstr('hyperparameters: ') + ', '.join(f'{k}={v}' for k, v in self.hyp.items()))
        save_dir, epochs, batch_size, total_batch_size, weights, rank, freeze = \
            Path(self.opt.save_dir), self.opt.epochs, self.opt.batch_size, self.opt.total_batch_size, self.opt.weights, self.opt.global_rank, self.opt.freeze

        # Directories
        wdir = save_dir / 'weights'
        wdir.mkdir(parents=True, exist_ok=True)  # make dir
        last = wdir / 'last.pt'
        best = wdir / 'best.pt'
        results_file = save_dir / 'results.txt'

        # Save run settings
        with open(save_dir / 'hyp.yaml', 'w') as f:
            yaml.dump(self.hyp, f, sort_keys=False)
        with open(save_dir / 'opt.yaml', 'w') as f:
            yaml.dump(vars(self.opt), f, sort_keys=False)

        # Configure
        plots = not self.opt.evolve  # create plots
        cuda = self.device.type != 'cpu'
        init_seeds(2 + rank)
        with open(self.opt.data) as f:
            data_dict = yaml.load(f, Loader=yaml.SafeLoader)  # data dict
        is_coco = self.opt.data.endswith('coco.yaml')

        # Logging- Doing this before checking the dataset. Might update data_dict
        loggers = {'wandb': None}  # loggers dict
        if rank in [-1, 0]:
            self.opt.hyp = self.hyp  # add hyperparameters
            run_id = torch.load(weights, map_location=self.device).get('wandb_id') if weights.endswith('.pt') and os.path.isfile(weights) else None
            wandb_logger = WandbLogger(self.opt, Path(self.opt.save_dir).stem, run_id, data_dict)
            loggers['wandb'] = wandb_logger.wandb
            data_dict = wandb_logger.data_dict
            if wandb_logger.wandb:
                weights, epochs, self.hyp = self.opt.weights, self.opt.epochs, self.opt.hyp  # WandbLogger might update weights, epochs if resuming

        nc = 1 if self.opt.single_cls else int(data_dict['nc'])  # number of classes
        names = ['item'] if self.opt.single_cls and len(data_dict['names']) != 1 else data_dict['names']  # class names
        assert len(names) == nc, '%g names found for nc=%g dataset in %s' % (len(names), nc, self.opt.data)  # check

        # Model
        pretrained = weights.endswith('.pt')
        if pretrained:
            with torch_distributed_zero_first(rank):
                attempt_download(weights)  # download if not found locally
            ckpt = torch.load(weights, map_location=self.device)  # load checkpoint
            model = Model(self.opt.cfg or ckpt['model'].yaml, ch=3, nc=nc, anchors=self.hyp.get('anchors')).to(self.device)  # create
            exclude = ['anchor'] if (self.opt.cfg or self.hyp.get('anchors')) and not self.opt.resume else []  # exclude keys
            state_dict = ckpt['model'].float().state_dict()  # to FP32
            state_dict = intersect_dicts(state_dict, model.state_dict(), exclude=exclude)  # intersect
            model.load_state_dict(state_dict, strict=False)  # load
            logger.info('Transferred %g/%g items from %s' % (len(state_dict), len(model.state_dict()), weights))  # report
        else:
            model = Model(self.opt.cfg, ch=3, nc=nc, anchors=self.hyp.get('anchors')).to(self.device)  # create
        with torch_distributed_zero_first(rank):
            check_dataset(data_dict)  # check
        train_path = data_dict['train']
        test_path = data_dict['val']

        # Freeze
        freeze = [f'model.{x}.' for x in (freeze if len(freeze) > 1 else range(freeze[0]))]  # parameter names to freeze (full or partial)
        for k, v in model.named_parameters():
            v.requires_grad = True  # train all layers
            if any(x in k for x in freeze):
                print('freezing %s' % k)
                v.requires_grad = False

        # Optimizer
        nbs = 64  # nominal batch size
        accumulate = max(round(nbs / total_batch_size), 1)  # accumulate loss before optimizing
        self.hyp['weight_decay'] *= total_batch_size * accumulate / nbs  # scale weight_decay
        logger.info(f"Scaled weight_decay = {
      
      self.hyp['weight_decay']}")

        pg0, pg1, pg2 = [], [], []  # optimizer parameter groups
        for k, v in model.named_modules():
            if hasattr(v, 'bias') and isinstance(v.bias, nn.Parameter):
                pg2.append(v.bias)  # biases
            if isinstance(v, nn.BatchNorm2d):
                pg0.append(v.weight)  # no decay
            elif hasattr(v, 'weight') and isinstance(v.weight, nn.Parameter):
                pg1.append(v.weight)  # apply decay
            if hasattr(v, 'im'):
                if hasattr(v.im, 'implicit'):           
                    pg0.append(v.im.implicit)
                else:
                    for iv in v.im:
                        pg0.append(iv.implicit)
            if hasattr(v, 'imc'):
                if hasattr(v.imc, 'implicit'):           
                    pg0.append(v.imc.implicit)
                else:
                    for iv in v.imc:
                        pg0.append(iv.implicit)
            # ..... (the remainder of the training function is omitted here)

This program file is a script used to train the model. It includes importing the required libraries and modules, defining training functions and some auxiliary functions. The training function includes steps such as model initialization, data loading, optimizer settings, training loop, and model saving. This script also supports distributed training and visualization with TensorBoard.
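
The dataset description is read from the file passed as opt.data. Judging from the keys accessed above (train, val, nc, names), it is a YAML file along the following lines; the paths and class list here are illustrative only:

# dataset configuration consumed by train.py (illustrative values)
train: ./data/train/images
val: ./data/valid/images

nc: 1              # number of classes
names: ['person']  # class names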

6. Overall structure of the system

Overview of overall functions and architecture:
This project is a ROI area pedestrian speed measurement statistics system based on DeepSort and OpenCV. It uses the YOLO model for target detection, the DeepSort algorithm for target tracking, and combines collision detection and speed calculation to achieve speed statistics and trajectory analysis of pedestrians in the ROI area.

Here is an overview of the functionality of each file (all paths are relative to the project root, E:\Visual Project\shop\ROI area pedestrian speed measurement statistical system based on Deepsort and OpenCV):

file path | functional overview
code\detector_CPU.py | CPU-based object detector; uses PyTorch and OpenCV for image preprocessing and inference
code\detector_GPU.py | GPU-based object detector; uses PyTorch and OpenCV for image preprocessing and inference
code\fit.py | Pedestrian speed estimator; uses the YOLO model for pedestrian detection and the DeepSort algorithm for tracking
code\tracker.py | Target tracking and speed display; uses the DeepSort algorithm for tracking and draws the boxes, IDs and speeds
code\train.py | Model training script; supports distributed training and TensorBoard visualization
code\ui.py | UI interface; uses the YOLO model for detection and DeepSort for tracking, and implements pedestrian speed statistics and trajectory analysis
deep_sort\deep_sort\deep_sort.py | Main logic of the DeepSort algorithm, including target tracking and feature extraction
deep_sort\deep_sort\__init__.py | Package initialization file that imports the DeepSort algorithm's modules and functions
deep_sort\deep\evaluate.py | Evaluation functions for assessing model performance and accuracy
deep_sort\deep\feature_extractor.py | Feature extractor that computes feature vectors from target image crops
deep_sort\deep\model.py | Model class for loading and saving model weights and parameters
deep_sort\deep\original_model.py | Original model class for loading and saving the original model's weights and parameters
deep_sort\deep\test.py | Test functions for measuring model performance and accuracy
deep_sort\deep\train.py | Training function used to train the model's parameters and weights
deep_sort\deep\__init__.py | Package initialization file that imports the DeepSort algorithm's modules and functions
deep_sort\sort\detection.py | Detection class for handling detection results and bounding-box operations
deep_sort\sort\iou_matching.py | IoU matching; computes IoU-based matching costs between target boxes
deep_sort\sort\kalman_filter.py | Kalman filter class for predicting and updating target states
deep_sort\sort\linear_assignment.py | Linear assignment; matches detection results with tracking results
deep_sort\sort\nn_matching.py | Nearest-neighbour matching; computes similarity scores between feature vectors
deep_sort\sort\preprocessing.py | Preprocessing of detection boxes before matching
deep_sort\sort\track.py | Track class representing and managing the state and attributes of a tracked target
deep_sort\sort\tracker.py | Tracker class integrating target detection results with target tracking

7. Deepsort target tracking

(1) Obtain the original video frames.
(2) Use the target detector to detect targets in each video frame.
(3) Extract features from the detected target boxes, including appearance features (used for feature comparison to avoid ID switches) and motion features (used by the Kalman filter for prediction).
(4) Compute the matching degree between targets in consecutive frames (using the Hungarian algorithm and cascade matching) and assign an ID to each tracked target.

The predecessor of DeepSort is the SORT algorithm, whose core components are the Kalman filter and the Hungarian algorithm.


    Role of the Kalman filter: it uses the current set of motion variables to predict the motion variables at the next time step; the first detection result is used to initialize the Kalman filter's motion variables.

    Role of the Hungarian algorithm: simply put, it solves the assignment problem, assigning a set of detection boxes to the boxes predicted by the Kalman filter so that each predicted box is paired with the detection box that matches it best, which is what achieves tracking.
The SORT workflow is shown in the figure below:

(figure: SORT workflow, 4.png)

Detections are the boxes produced by the target detector; Tracks hold the track information.

The workflow of the entire algorithm is as follows:

(1) Create the corresponding Tracks from the detection results of the first frame, initialize the motion variables of the Kalman filter and predict the corresponding boxes through the Kalman filter.

(2) Match the detection boxes of the current frame one by one with the boxes predicted by the Tracks of the previous frame, and compute the cost matrix from the IoU results (each entry is 1 - IoU).

(3) Feed the cost matrix from (2) into the Hungarian algorithm to obtain a linear assignment (see the sketch after this list). This yields three kinds of results: unmatched Tracks, which are deleted directly; unmatched Detections, which are initialized as new Tracks; and successfully paired detection and prediction boxes, which means the target was tracked from the previous frame to the current one, so the corresponding Tracks variables are updated through the Kalman filter.

(4) Repeat steps (2)-(3) until the video ends.
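
A minimal sketch of the 1 - IoU cost matrix and Hungarian assignment used in steps (2)-(3). It uses scipy.optimize.linear_sum_assignment as the Hungarian solver for illustration only; the project itself relies on the matching code inside DeepSort:

import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(box_a, box_b):
    # boxes are (x1, y1, x2, y2)
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def match(track_boxes, detection_boxes, iou_threshold=0.3):
    # cost matrix: entry (i, j) is 1 - IoU between predicted track box i and detection j
    cost = np.array([[1.0 - iou(t, d) for d in detection_boxes] for t in track_boxes])
    rows, cols = linear_sum_assignment(cost)  # Hungarian algorithm
    matches = []
    unmatched_tracks = set(range(len(track_boxes)))
    unmatched_dets = set(range(len(detection_boxes)))
    for r, c in zip(rows, cols):
        if 1.0 - cost[r, c] >= iou_threshold:  # reject pairings with too little overlap
            matches.append((r, c))
            unmatched_tracks.discard(r)
            unmatched_dets.discard(c)
    return matches, sorted(unmatched_tracks), sorted(unmatched_dets)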

Deepsort algorithm process

Since SORT is a relatively rough tracking algorithm, a target easily loses its ID when it is occluded. The DeepSort algorithm adds matching cascade and confirmation of new trajectories on top of SORT. Tracks are divided into a confirmed state and an unconfirmed state; newly created Tracks start out unconfirmed, and an unconfirmed Track must match Detections a certain number of times (3 by default) before it is converted to the confirmed state. A confirmed Track must fail to match Detections a certain number of times (30 by default) before it is deleted.
The workflow of the Deepsort algorithm is shown in the figure below:
(figure: DeepSort workflow, 5.png)
The workflow of the entire algorithm is as follows:

(1) Create the corresponding Tracks from the detection results of the first frame, initialize the motion variables of the Kalman filter and predict the corresponding boxes through the Kalman filter. The Tracks at this point are all unconfirmed.

(2) Match the detection boxes of the current frame one by one with the boxes predicted by the Tracks of the previous frame, and compute the cost matrix from the IoU results (each entry is 1 - IoU).

(3) Feed the cost matrix from (2) into the Hungarian algorithm to obtain a linear assignment. This yields three kinds of results: unmatched Tracks, which are deleted directly (because these Tracks are still unconfirmed; a confirmed Track is only deleted after failing to match a certain number of times, 30 by default); unmatched Detections, which are initialized as new Tracks; and successfully paired detection and prediction boxes, which means the target was tracked across the two frames, so the corresponding Tracks variables are updated through the Kalman filter using these Detections.

(4) Repeat steps (2)-(3) until confirmed Tracks appear or the video ends.

(5) Use the Kalman filter to predict the boxes of both the confirmed and the unconfirmed Tracks. Perform cascade matching between the boxes of the confirmed Tracks and the Detections (each time a Track is matched, the appearance features and motion information of its Detections are saved, by default for the last 100 frames; these features and motion information are used in the cascade matching with Detections, because confirmed Tracks are more likely to match Detections).

(6) Cascade matching yields three possible results. The first is matched Tracks, whose Tracks variables are updated through the Kalman filter. The second and third are unmatched Detections and unmatched Tracks; these unmatched Tracks, together with the previously unconfirmed Tracks, are then matched by IoU against the unmatched Detections one by one, and a cost matrix is computed from the IoU results (each entry is 1 - IoU).

(7) Feed the cost matrix from (6) into the Hungarian algorithm to obtain a linear assignment. This yields three kinds of results: unmatched Tracks, which are deleted directly (because these Tracks are unconfirmed; a confirmed Track is only deleted after failing to match a certain number of times, 30 by default); unmatched Detections, which are initialized as new Tracks; and successfully paired detection and prediction boxes, which means the target was tracked across the two frames, so the corresponding Tracks variables are updated through the Kalman filter using these Detections.

(8) Repeat steps (5)-(7) until the end of the video frame.

8. Principle of pedestrian speed measurement

Algorithm process

First, the ratio of real-world distance to pixel distance is obtained from the pre-set real width of a pedestrian and the detected pixel width of that pedestrian. Then, for each pedestrian, the pixel distance moved between two consecutive frames is computed from the centre coordinates of the boxes in those two frames. Multiplying the pixel distance by the ratio gives the real distance moved between the two frames, and dividing that distance by the time between the frames gives the speed. This speed measurement treats the target's real moving distance and its pixel moving distance as linearly related, which only holds when the camera axis is perpendicular to the direction of movement; moreover, the detected pedestrian box deforms in space, which makes the mapping from real distance to pixel distance inaccurate. Interested readers can add a perspective transform to the code to warp the image into an overhead view similar to remote-sensing data, measure the speed there, and then transform back to the original view to achieve more accurate pedestrian speed measurement.
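
A minimal sketch of that mapping (the assumed real pedestrian width of 0.5 m and the example numbers are illustrative):

def pixel_speed_to_real(pixel_distance, pixel_width, real_width_m, frame_interval_s):
    """Map a per-frame pixel displacement to a real-world speed in m/s."""
    metres_per_pixel = real_width_m / pixel_width        # ratio of real distance to pixel distance
    return pixel_distance * metres_per_pixel / frame_interval_s

# Example: a pedestrian whose box is 40 px wide (assumed real width 0.5 m)
# moves 4 px between two frames of a 25 fps video.
speed = pixel_speed_to_real(pixel_distance=4, pixel_width=40, real_width_m=0.5, frame_interval_s=1 / 25)
print(f"{speed:.2f} m/s")  # 1.25 m/s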

Core code
import math

def Estimated_speed(locations, fps, width):
    present_IDs = []
    prev_IDs = []
    work_IDs = []
    work_IDs_index = []
    work_IDs_prev_index = []
    work_locations = []  # current-frame data: centre x, centre y, track ID, class, pixel width of the box
    work_prev_locations = []  # previous-frame data, same format
    speed = []
    for i in range(len(locations[1])):
        present_IDs.append(locations[1][i][2])  # IDs of the targets tracked in the current frame
    for i in range(len(locations[0])):
        prev_IDs.append(locations[0][i][2])  # IDs of the targets tracked in the previous frame
    for m, n in enumerate(present_IDs):
        if n in prev_IDs:  # keep only IDs detected in both frames and store them in work_IDs
            work_IDs.append(n)
            work_IDs_index.append(m)
    for x in work_IDs_index:  # store the current-frame info of the valid targets in work_locations
        work_locations.append(locations[1][x])
    for y, z in enumerate(prev_IDs):
        if z in work_IDs:  # store the previous-frame indices of the valid IDs in work_IDs_prev_index
            work_IDs_prev_index.append(y)
    for x in work_IDs_prev_index:  # store the previous-frame info of the valid targets in work_prev_locations
        work_prev_locations.append(locations[0][x])
    for i in range(len(work_IDs)):
        speed.append(
            math.sqrt((work_locations[i][0] - work_prev_locations[i][0]) ** 2 +  # speed of each valid target, using a linear mapping from pixel distance to real distance
                      (work_locations[i][1] - work_prev_locations[i][1]) ** 2) *  # when the camera is not perpendicular to the direction of movement, the measured speed is lower than the real speed
            width[work_locations[i][3]] / (work_locations[i][4]) * fps / 5 * 3.6 * 2)
    for i in range(len(speed)):
        speed[i] = [round(speed[i], 1), work_locations[i][2]]  # store the speed in km/h (one decimal place) together with the target ID in the 2-D list speed
    # ...... (remainder omitted)
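
Judging from the indexing above, locations holds two frames of per-target records [centre_x, centre_y, track_id, class, pixel_width] (locations[0] is the previous frame, locations[1] the current one), and width maps each class to its assumed real width. A hedged illustration of a call; the values are made up, and the truncated function is assumed to return the speed list:

prev_frame = [[320, 240, 1, 'person', 40]]   # centre_x, centre_y, track_id, class, pixel width
curr_frame = [[324, 243, 1, 'person', 40]]
real_widths = {'person': 0.5}                # assumed real width per class, in metres

speeds = Estimated_speed([prev_frame, curr_frame], fps=25, width=real_widths)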


9. System integration

The complete source code, data set, environment deployment video tutorial and custom UI interface are shown below:


Reference blog: "ROI area pedestrian speed measurement statistical system based on Deepsort and OpenCV"


Origin blog.csdn.net/cheng2333333/article/details/134992760