OpenMMLab MMTracking Object Tracking: Official Documentation Notes (1)

Introduction

MMTracking is an open source video perception toolbox for PyTorch. It is part of the OpenMMLab project.
It supports 4 video tasks:
Video Object Detection (VID)
Single Object Tracking (SOT)
Multiple Object Tracking (MOT)
Video Instance Segmentation (VIS)

Main Features

The First Unified Video Perception Platform
We are the first open-source toolbox that unifies multifunctional video perception tasks, including video object detection, multiple object tracking, single object tracking, and video instance segmentation.

Modular Design
We decompose the video perception framework into different components, and customized methods can be easily built by combining different modules.

Simple, Fast, Powerful
Simple: MMTracking interoperates with other OpenMMLab projects. It is built on top of MMDetection, so any detector can be used simply by modifying the configuration.
Fast: All operations run on the GPU. Training and inference speeds are faster or comparable to other implementations.
Powerful: We reproduce state-of-the-art models, and some of them even outperform the official implementations.

Getting Started

For the basic usage of MMTracking, please refer to get_started.md.
A Colab tutorial is provided. You can preview the notebook here or run it directly on Colab.

More updates coming~ MMTracking will be continuously updated from now on.

At the same time, a set of efficient and strong benchmark models has been released. Some of them exceed the official implementations (for example, SELSA for video object detection, Tracktor for multiple object tracking, and SiameseRPN++ for single object tracking), and some reach SOTA level on academic datasets such as ImageNet VID.

1) When MMTracking V0.5.0 was just released, we supported the following algorithms:

Video object detection (VID) algorithm:

Video object detection only needs to detect objects in each frame of the video; it does not require associating the same object across different frames.
DFF (CVPR 2017)
FGFA (ICCV 2017)
SELSA (ICCV 2019)
Temporal RoI Align (AAAI 2021)

Multiple Object Tracking (MOT) Algorithm:

Building on video object detection, multiple object tracking focuses more on associating the same object across frames of the video.
SORT (ICIP 2016)
DeepSORT (ICIP 2017)
Tracktor (ICCV 2019)

Single Object Tracking (SOT) Algorithm:

Single object tracking is more oriented toward human-computer interaction: given a target of arbitrary category and shape, the algorithm must be able to track it continuously.
Siamese RPN++ (CVPR 2019)

MMTracking V0.7.0 is released! This version mainly adds the following features:
Codebase: refactored code and improved English and Chinese user documentation, with vivid examples showing how to run inference, test, and train VID, MOT, and SOT models.
FP16: training and testing in FP16 are supported for all algorithms (see the config sketch after this list).
VID: supports the new video object detection algorithm Temporal RoI Align (AAAI 2021) and provides pre-trained models using ResNeXt-101 as the backbone for all video object detection algorithms; this method reaches 84.1 mAP@50 on the ImageNet VID dataset.
MOT: provides Tracktor results on MOT15, MOT16, and MOT20. On MOTA, the main evaluation metric of the more challenging MOT20 dataset, it is 5.3 points higher than the official version. Also supports training ReID models on MOT datasets and visual analysis of errors (FP, FN, IDS) on MOT datasets.
SOT: supports more SOT datasets: LaSOT, UAV123, and TrackingNet; other mainstream datasets will also be supported soon.
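As a rough illustration of the FP16 support mentioned above, OpenMMLab projects of this generation typically enable mixed precision by adding an fp16 field to the config. The snippet below is a minimal sketch, assuming the mmcv-style fp16 = dict(loss_scale=...) mechanism; the base config path and loss_scale value are assumptions for illustration, so check the configs shipped with the repository for the exact settings.

# Minimal sketch: enabling FP16 in an mmcv-style config.
# The _base_ path and loss_scale value are assumptions for illustration only.
_base_ = ['./selsa_faster-rcnn_r50-dc5_8xb1-7e_imagenetvid.py']

# mmcv's Fp16OptimizerHook picks this field up and runs training/testing in FP16.
fp16 = dict(loss_scale=512.)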

Full interaction between projects within OpenMMLab

MMTracking: OpenMMLab all-in-one video object perception platform

In most cases, video object perception can be regarded as a downstream task of 2D object detection and relies heavily on various 2D object detection algorithms. Before MMTracking, using or switching between different 2D object detectors was actually a very tedious and time-consuming task.

In MMTracking, we make full use of the achievements and strengths of other OpenMMLab platforms. For example, we import or inherit most modules from MMDetection, which greatly simplifies the code framework. In this way, all models in MMDetection can be used directly through configs. Taking multiple object tracking as an example, each multiple object tracking model consists of the following modules:

import torch.nn as nn
from mmdet.models import build_detector

class BaseMultiObjectTracker(nn.Module):
    """A multi-object tracker assembled from pluggable components."""

    def __init__(self,
                 detector=None,
                 reid=None,
                 tracker=None,
                 motion=None,
                 pretrains=None):
        super().__init__()
        # The detector is built directly from an MMDetection config dict,
        # so any detector implemented in MMDetection can be plugged in.
        self.detector = build_detector(detector)
        ...

Config example:

model = dict(
    type='BaseMultiObjectTracker',
    detector=dict(type='FasterRCNN', **kwargs),
    reid=dict(type='BaseReID', **kwargs),
    motion=dict(type='KalmanFilter', **kwargs),
    tracker=dict(type='BaseTracker', **kwargs))
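Because the detector is built from a plain config dict, switching to a different MMDetection detector is just a config edit. The snippet below is a hypothetical sketch: 'RetinaNet' stands in for any detector registered in MMDetection, and the nested fields are abbreviated rather than a complete working config.

# Hypothetical sketch: swap the 2D detector by editing only the `detector` dict.
# The nested fields would follow the corresponding MMDetection config.
model = dict(
    type='BaseMultiObjectTracker',
    detector=dict(
        type='RetinaNet',
        backbone=dict(type='ResNet', depth=50),
        # ... neck, bbox_head, train_cfg, test_cfg as in MMDetection configs
    ),
    reid=dict(type='BaseReID'),
    motion=dict(type='KalmanFilter'),
    tracker=dict(type='BaseTracker'))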

Quick start! MMTracking VID (Video Object Detection) User Guide (with an interpretation of the AAAI 2021 paper!)

1. Run VID demo

This script runs inference on an input video with a video object detection model.

python demo/demo_vid.py \
    ${CONFIG_FILE} \
    --input ${INPUT} \
    --checkpoint ${CHECKPOINT_FILE} \
    [--output ${OUTPUT}] \
    [--device ${DEVICE}] \
    [--show]

Both INPUT and OUTPUT can be either an mp4 video or a folder of image frames.
Optional parameters:

OUTPUT: The path for the visualization output. If not specified, --show must be set so that the video is displayed on the fly.
DEVICE: The device used for inference. Options are cpu, cuda:0, etc.
--show: Whether to display the video on the fly.

example:

Suppose you have downloaded the checkpoint to the directory checkpoints/, your video file name is demo.mp4, and the output path is ./outputs/

python ./demo/demo_vid.py \
    configs/vid/selsa/selsa_faster-rcnn_r50-dc5_8xb1-7e_imagenetvid.py \
    --input ./demo.mp4 \
    --checkpoint checkpoints/selsa_faster_rcnn_r101_dc5_1x_imagenetvid_20201218_172724-aa961bcc.pth \
    --output ./outputs/ \
    --show

Alternatively, execute either of the following commands in the MMTracking root directory to run the VID demo with the SELSA + Temporal RoI Align algorithm:

python demo/demo_vid.py \
    configs/vid/temporal_roi_align/selsa-troialign_faster-rcnn_r101-dc5_8xb1-7e_imagenetvid.py \
    --input demo/demo.mp4 \
    --checkpoint checkpoints/selsa_troialign_faster_rcnn_r101_dc5_7e_imagenetvid_20210822_111621-22cb96b9.pth \
    --output vid.mp4 \
    --show

python demo/demo_vid.py \
    configs/vid/temporal_roi_align/selsa-troialign_faster-rcnn_x101-dc5_8xb1-7e_imagenetvid.py \
    --input demo/demo.mp4 \
    --checkpoint checkpoints/selsa_troialign_faster_rcnn_x101_dc5_7e_imagenetvid_20210822_164036-4471ac42.pth \
    --output vid.mp4 \
    --show
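If you prefer to call the model from Python instead of the demo script, the sketch below shows one way to do it. It assumes the mmtrack.apis helpers init_model and inference_vid from the MMTracking 0.x API and reuses the config/checkpoint paths above; treat it as an illustrative sketch rather than an official snippet.

# Sketch: run VID inference from Python (assumes MMTracking 0.x's mmtrack.apis).
import mmcv
from mmtrack.apis import init_model, inference_vid

config_file = 'configs/vid/temporal_roi_align/selsa-troialign_faster-rcnn_r101-dc5_8xb1-7e_imagenetvid.py'
checkpoint_file = 'checkpoints/selsa_troialign_faster_rcnn_r101_dc5_7e_imagenetvid_20210822_111621-22cb96b9.pth'

# Build the model from the config and load the checkpoint onto the GPU.
model = init_model(config_file, checkpoint_file, device='cuda:0')

# Read the demo video frame by frame and run per-frame inference.
video = mmcv.VideoReader('demo/demo.mp4')
for frame_id, frame in enumerate(video):
    result = inference_vid(model, frame, frame_id)
    print(frame_id, list(result.keys()))  # per-frame detection results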


2. Test the VID model

Use the following command in the MMTracking root directory to test the VID model and evaluate its bbox mAP:

./tools/dist_test.sh configs/vid/temporal_roi_align/selsa_troialign_faster_rcnn_r101_dc5_7e_imagenetvid.py 8 \
    --checkpoint checkpoints/selsa_troialign_faster_rcnn_r101_dc5_7e_imagenetvid_20210822_111621-22cb96b9.pth \
    --out results.pkl \
    --eval bbox
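The --out flag dumps the raw test results to results.pkl. Below is a small sketch for inspecting that file; the exact structure of the results depends on the model and dataset, so the printout is intentionally generic.

# Sketch: load the pickled test results written by --out results.pkl.
import mmcv

results = mmcv.load('results.pkl')  # mmcv.load dispatches on the .pkl extension
print(type(results))
# For detection-style outputs this is typically a per-image, per-class structure;
# inspect a single entry to see the exact layout produced by your model.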

3. Training VID model

Use the following command in the MMTracking root directory to train the VID model and evaluate its bbox mAP at the last epoch:

./tools/dist_train.sh ./configs/vid/temporal_roi_align/selsa_troialign_faster_rcnn_r101_dc5_7e_imagenetvid.py 8 \
    --work-dir ./work_dirs/
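Since everything is config-driven, it is also easy to inspect or tweak a config programmatically before launching training. A minimal sketch, assuming mmcv's Config loader and MMDet-style config keys (the work_dir value and the batch-size key below are assumptions for illustration):

# Sketch: load and tweak a training config with mmcv.Config before launching.
from mmcv import Config

cfg = Config.fromfile(
    'configs/vid/temporal_roi_align/selsa_troialign_faster_rcnn_r101_dc5_7e_imagenetvid.py')
cfg.work_dir = './work_dirs/my_vid_experiment'  # where checkpoints and logs are written
cfg.data.samples_per_gpu = 1                    # assumption: MMDet-style per-GPU batch size key
print(cfg.pretty_text[:500])                    # preview the merged config
cfg.dump('./work_dirs/my_vid_experiment.py')    # save the modified config for dist_train.sh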

Newly released: MMTracking Video Instance Segmentation (VIS) user guide
New: MMTracking Single Object Tracking (SOT) task user guide
MMTracking Multiple Object Tracking (MOT) task user guide


Source: blog.csdn.net/qq_41627642/article/details/131723815