Target Tracking|Paper Note Sharing|CVPR-10 Papers (2)

Hello everyone, this is [Come to a Scallion Cake]. This time I am sharing notes on target tracking papers~

I previously spent some time researching target tracking algorithms (mainly single object tracking, SOT) and studied more than 40 top conference papers. I have therefore set up a new column, Target Tracking (SOT) | Top Conference Papers | Study Notes, to share my paper notes so that you can quickly catch up on progress in target tracking and get to know the ideas behind different algorithms. Everyone is welcome to discuss and leave their own thoughts in the comment area~

This article is part (2) of my notes on the 10 CVPR target tracking papers. For detailed analysis notes on individual papers, see the other articles in the column. Everyone is welcome to follow; the links are as follows:
Target Tracking | Last Three Years | 45 Top Conference Papers Organized
Target Tracking | Seven Datasets | Organized
Target Tracking | Paper Note Sharing | ICCV-6 Papers
Target Tracking | Paper Note Sharing | ICCV-2 Papers
Target Tracking | Paper Note Sharing | ECCV-6 Papers
Target Tracking | Paper Note Sharing | CVPR-12 Papers
Target Tracking | Paper Note Sharing | CVPR-10 Papers (1)
Target Tracking | Paper Note Sharing | CVPR-10 Papers (2)

1. Paper titles

Learning to Filter: Siamese Relation Network for Robust Tracking
STMTrack: Template-free Visual Tracking with Space-time Memory Networks
LightTrack: Finding Lightweight Neural Networks for Object Tracking via One-Shot Architecture Search
Alpha-Refine: Boosting Tracking Performance by Precise Bounding Box Estimation
Graph Attention Tracking
CapsuleRRT: Relationships-aware Regression Tracking via Capsules
Progressive Unsupervised Learning for Visual Object Tracking
Towards More Flexible and Accurate Object Tracking with Natural Language: Algorithms and Benchmark
Siamese Natural Language Tracker: Tracking by Natural Language Descriptions with Siamese Trackers
Rotation Equivariant Siamese Networks for Tracking

2. Main idea

Main ideas: combining NLP and CV; using transformers for target tracking; template-free tracking with Siamese-style networks; handling rotating targets; more accurate bbox estimation; contrastive learning, meta-learning, and capsule networks; lightweight neural networks.

3. Specific articles

Towards More Flexible and Accurate Object Tracking with Natural Language: Algorithms and Benchmark

Towards More Flexible and Accurate Natural Language Object Tracking: Algorithms and Benchmarks

This paper is about tracking by natural language, which sits at the intersection of NLP and target tracking. Over the past few years, several works have combined NLP with visual processing, taking both language and visual input to further improve accuracy; MDETR is one example.

It is also a promising direction for innovation, but it is likely to be very demanding in computing power.

The TNL2K dataset proposed in this paper is designed specifically for tracking by natural language specification. It contains many videos with significant appearance changes and adversarial examples, and also covers natural videos, animation videos, infrared videos, virtual game videos, and so on.

The paper also proposes a simple but strong baseline, AdaSwitcher, for future work to compare against. It adaptively switches between a local tracking module and a global grounding module (which localizes the object described by the language anywhere in the video frame). This is another use of an adaptive mechanism.
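
To make the switching idea concrete, here is a minimal sketch in Python. These notes do not cover the paper's actual switching criterion, so a fixed confidence threshold is used as a stand-in. `Prediction`, `adaptive_switch_step`, and `fail_threshold` are hypothetical names of my own, and the two modules are passed in as plain callables.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h)


@dataclass
class Prediction:
    box: Box
    score: float  # confidence in [0, 1]


def adaptive_switch_step(
    frame: Any,
    local_tracker: Callable[[Any], Prediction],
    global_grounder: Callable[[Any], Prediction],
    fail_threshold: float = 0.3,  # assumed value, not from the paper
) -> Tuple[Prediction, str]:
    """Use the local tracker while it is confident; otherwise fall back
    to the global grounding module to re-localize the target."""
    local = local_tracker(frame)
    if local.score >= fail_threshold:
        return local, "local"
    # Local tracking looks unreliable: search the whole frame instead.
    return global_grounder(frame), "global"
```

In a real tracker the grounding module would also take the language description as input; it is folded into the callable here to keep the sketch short.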

STMTrack: Template-free Visual Tracking with Space-time Memory Networks

STMTrack: Template-Free Visual Tracking Using Spatiotemporal Memory Networks

Yet another paper built on an adaptive mechanism, this time with template-free tracking.

The paper proposes a tracking framework based on space-time memory networks. The framework abandons the traditional template-based tracking mechanism and instead uses multiple memory frames, together with their foreground-background label maps, to locate the target in the query frame.

In the space-time memory network, the target information stored in the multiple memory frames is adaptively retrieved by the query frame, which lets the tracker adapt well to changes in the target's appearance.

The pixel-level similarity computation of the memory network also enables the tracker to generate more accurate bboxes.
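
The memory retrieval described above can be sketched as an attention-style read. This is my own simplified NumPy version, assuming flattened feature maps and a softmax over memory pixels; `memory_read` and the scaling by the square root of the channel count are my assumptions, not details taken from the paper.

```python
import numpy as np


def memory_read(query_feat, memory_feat, memory_value):
    """Retrieve target information from memory for every query pixel.

    query_feat:   (C, Nq) features of the query-frame pixels
    memory_feat:  (C, Nm) features of all pixels in all memory frames
    memory_value: (D, Nm) information stored for each memory pixel
    returns:      (D, Nq) retrieved information per query pixel
    """
    c = query_feat.shape[0]
    # Pixel-level similarity between every memory pixel and every query pixel.
    sim = memory_feat.T @ query_feat / np.sqrt(c)  # (Nm, Nq)
    # Softmax over the memory axis, so each query pixel distributes
    # its attention across all memory pixels.
    sim = sim - sim.max(axis=0, keepdims=True)
    weights = np.exp(sim)
    weights = weights / weights.sum(axis=0, keepdims=True)
    return memory_value @ weights
```

Because the weights are recomputed for each query frame, adding or replacing memory frames changes what is retrieved without any fixed template.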

Siamese Natural Language Tracker: Tracking by Natural Language Descriptions with Siamese Trackers

Siamese Natural Language Tracker: tracking by natural language description with Siamese trackers

Another paper that combines CV and NLP to improve target tracking, and a good NL tracker. There are in fact very few NL trackers; before this paper there were only two.

The paper proposes a novel and general Siamese Natural Language Region Proposal Network (SNL-RPN) that is applicable to a wide range of Siamese trackers, providing a class of strong baselines for tracking by NL description. A dynamic aggregation of the predictions from the visual and language modalities is then proposed to turn SNL-RPN into a real-time Siamese Natural Language Tracker (SNLT).

The proposed SNLT consistently improves the performance of SiamFC [1], SiamRPN [25] and SiamRPN++ [24], at the cost of a slight decrease in speed. It also outperforms all previous NL trackers.
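
As a rough illustration of aggregating the two modalities, the sketch below blends a visual score map and a language score map with softmax weights. How the paper actually computes its weights is not covered in these notes; `aggregate` and the two logit arguments are hypothetical and only show the general shape of a dynamic weighted sum.

```python
import numpy as np


def aggregate(vis_scores, lang_scores, vis_logit, lang_logit):
    """Blend per-location scores from the visual and language branches.

    vis_logit / lang_logit are per-frame reliabilities (assumed to be
    produced elsewhere); a softmax turns them into weights summing to 1.
    """
    logits = np.array([vis_logit, lang_logit], dtype=float)
    w = np.exp(logits - logits.max())
    w = w / w.sum()
    return w[0] * np.asarray(vis_scores, dtype=float) + w[1] * np.asarray(lang_scores, dtype=float)
```

With equal logits this reduces to a plain average; when one branch is much more reliable, its score map dominates.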

This paper combines vision and NL, so could other ideas be added on top (template-free tracking, or multiple similarity computation methods)? Would that make a new paper?

Rotation Equivariant Siamese Networks for Tracking

Tracking with rotation-equivariant Siamese networks

This paper targets the case where the tracked object rotates, and adds rotation equivariance to the network to improve tracking.

1. The authors propose rotation-equivariant Siamese networks (RE-SiamNets), built from group-equivariant convolutional layers composed of steerable filters. Steerable filters are a more robust way to enforce rotation equivariance in CNNs. Steerable filter CNNs (SFCNNs) extend the concept of weight sharing from the translation group to the rotation group. To obtain rotation equivariance with steerable filters, the network must convolve the input with differently rotated versions of each filter.

To design RE-SiamNets, the regular CNN layers are replaced by rotation-equivariant layers, and group pooling layers are used so that a single-orientation feature output is produced for each input. The base Siamese trackers used are SiamFC, its variant SiamFCv2, and SiamRPN++.

However, most of the experiments still use SiamFC.

2. A dataset focused on rotation is proposed.

3. Moreover, RE-SiamNets make it possible to estimate the change in an object's orientation in an unsupervised way, so they are also convenient for relative 2D pose estimation.
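
The idea of convolving with rotated copies of a filter and then pooling over orientations can be checked on a toy example. The sketch below is restricted to the four 90-degree rotations, which `np.rot90` handles exactly; the paper's steerable filters cover finer rotation groups, and `correlate_valid`, `c4_lift`, and `group_pool` are names I made up for this demo.

```python
import numpy as np


def correlate_valid(image, kernel):
    """Plain 2D cross-correlation without padding."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def c4_lift(image, kernel):
    """Correlate with a square kernel rotated by 0, 90, 180, 270 degrees.

    Returns an array of shape (4, H', W'), one response map per orientation.
    """
    return np.stack([correlate_valid(image, np.rot90(kernel, r)) for r in range(4)])


def group_pool(responses):
    """Max over the orientation axis, leaving a single map per input."""
    return responses.max(axis=0)
```

Rotating the input by 90 degrees rotates each response map and shifts it to the next orientation channel, so the pooled map simply rotates along with the input. The channel shift is also what makes it possible to read off an orientation change without labels.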

It is a good idea, and the rotation-handling module of this method could also be transferred to other Siamese networks. The authors speculate that introducing additional types of equivariance, which impose more constraints on the kinds of motion possible in videos, will yield more robust trackers. This is a good lead: other constraints could be used to further improve the handling of rotating objects.

Alpha-Refine: Boosting Tracking Performance by Precise Bounding Box Estimation

Alpha-Refine: Improve tracking performance through accurate bbox estimation

The goal of visual object tracking is to accurately estimate the bbox of a given target. The accuracy of existing methods is limited, and the coupling between their stages severely restricts their portability.

This paper proposes a new, flexible and accurate refinement module, Alpha-Refine (AR), which can significantly improve the quality of the bbox estimates of base trackers. The earlier SiamMask was designed as a standalone tracker rather than a refinement module, which makes it ill-suited and uneconomical for refining other trackers.

Alpha-Refine is an accurate and general plug-and-play refinement module that can effectively improve the tracking performance of different types of trackers.

By exploring multiple design options, it is found that extracting and maintaining accurate spatial information is key for accurate box estimation.

Alpha-Refine finally adopts a pixel-wise correlation layer, a key-point-style prediction head, and an auxiliary mask head.
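
To show why a pixel-level correlation keeps spatial information, here is a small NumPy sketch of pixel-wise correlation as I understand it: every template pixel acts as its own 1x1 kernel over the search features, instead of the whole template being collapsed into one kernel. `pixelwise_correlation` and the tensor layouts are my assumptions, not the paper's code.

```python
import numpy as np


def pixelwise_correlation(template, search):
    """Correlate each template pixel with every search pixel.

    template: (C, Ht, Wt) template features
    search:   (C, Hs, Ws) search-region features
    returns:  (Ht * Wt, Hs, Ws), one similarity map per template pixel
    """
    c, ht, wt = template.shape
    # One C-dimensional kernel per template pixel.
    kernels = template.reshape(c, ht * wt)
    # Dot product over channels for every (template pixel, search pixel) pair.
    return np.einsum("ck,chw->khw", kernels, search)
```

The output has the same spatial size as the search features, so later heads can still localize box corners or mask pixels precisely.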

It is a small change, but the performance does improve, because more attention is paid to extracting and maintaining accurate spatial information.

Worth learning from.

CapsuleRRT: Relationships-aware Regression Tracking via Capsules

CapsuleRRT: relationship-aware regression tracking with capsule networks

This paper uses capsule networks to improve the target tracking algorithm.

However, capsule networks are not a good fit for our research direction, and I did not know much about them beforehand, so I skipped this paper.

Graph Attention Tracking

This paper uses a graph attention mechanism to improve target tracking.

However, the graph attention mechanism is far from our research direction, and I do not know much about it, so I skipped this paper as well.

Progressive Unsupervised Learning for Visual Object Tracking

Contrastive learning is used to distinguish foreground from background. The trained contrastive model is then used for unsupervised training on unlabeled videos, and a new noise-robust loss is used to optimize the results.

Progressive unsupervised learning (PUL) learns feature representations for tracking in the following stages.

First, contrastive learning is used to train a background discrimination (BD) model, with anchor-based hard negative mining to find the corresponding positive and negative samples.
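
For readers new to contrastive learning, the sketch below shows a generic InfoNCE-style loss for one anchor, one positive, and several negatives. These notes do not record the exact loss used in the paper, so treat this as the textbook formulation only; `info_nce` and the `temperature` value are my own choices.

```python
import numpy as np


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Contrastive loss: pull the positive towards the anchor,
    push the negatives away.

    anchor, positive: (D,) embeddings
    negatives:        (K, D) embeddings, e.g. hard background patches
    """
    def unit(v):
        v = np.asarray(v, dtype=float)
        return v / np.linalg.norm(v, axis=-1, keepdims=True)

    a, p, n = unit(anchor), unit(positive), unit(negatives)
    # Cosine similarities; the positive pair comes first.
    logits = np.concatenate([[a @ p], n @ a]) / temperature
    logits = logits - logits.max()
    # Negative log-probability of picking the positive among all candidates.
    return -(logits[0] - np.log(np.exp(logits).sum()))
```

Hard negative mining then amounts to choosing `negatives` that are currently most similar to the anchor, which makes the loss more informative.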

To learn temporal correspondence (TC), the BD model is applied to mine temporally corresponding patches. Since the mined patch pairs are noisy (i.e., they lack precise spatial correspondence), a noise-robust (NR) loss function is proposed for TC learning. In the paper's illustration of the mined patches, the estimated target center is marked with a red "x" and the true target center with a green circle.

This algorithm performs better than other unsupervised tracking algorithms.

Using contrastive learning for target tracking is a great idea! And it achieves unsupervised training!

Learning to Filter: Siamese Relation Network for Robust Tracking

Learning to filter: Siamese relational networks for robust tracking

This paper introduces two efficient modules, the Relation Detector (RD) and the Refinement Module (RM).

RD uses a meta-learning approach to learn how to filter distractors out of the background, while RM effectively integrates the proposed RD into the Siamese framework to generate accurate tracking results.

To further improve the discriminability and robustness of the tracker, the authors introduce a contrastive training strategy that learns not only to match the same target but also to distinguish different objects. As a result, the tracker achieves accurate tracking results in the face of background clutter, fast motion, and occlusion.

However, the paper mainly draws on ideas from meta-learning and contrastive learning, which are not a close fit for our research direction and which I do not know much about, so I only took a brief look.

LightTrack: Finding Lightweight Neural Networks for Object Tracking via One-Shot Architecture Search

LightTrack: An Architecture Search for Lightweight Neural Networks for Object Tracking

This paper is the first attempt to use neural architecture search to design a lightweight object tracker. It has good prospects in industry.

LightTrack reformulates one-shot NAS specifically for object tracking and introduces an efficient search space. Extensive experiments on multiple benchmarks show that LightTrack achieves state-of-the-art performance while using fewer FLOPs and parameters. In addition, LightTrack can run in real time on a variety of resource-constrained platforms.

Neural architecture search (NAS) aims to automate the design of neural network architectures. Most recent studies adopt a one-shot weight-sharing strategy to amortize the search cost. The key idea is to train a single over-parameterized supernet and then share its weights across subnetworks. The single-path method with uniform sampling is a representative one-shot method: in each iteration, it samples only one random path and trains that path on a batch of data. Once training is complete, the subnetworks can be ranked using the shared weights.
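
The single-path uniform sampling step is easy to sketch. The search space below is a made-up toy (the layer names and operator choices are hypothetical, not LightTrack's), and the supernet training itself is left out; the code only shows how one path is drawn per iteration and how candidates are ranked afterwards.

```python
import random

# Hypothetical toy search space: candidate operators for each layer.
SEARCH_SPACE = {
    "layer1": ["dwconv3x3", "dwconv5x5", "dwconv7x7"],
    "layer2": ["dwconv3x3", "dwconv5x5", "dwconv7x7"],
    "layer3": ["dwconv3x3", "dwconv5x5", "skip"],
}


def sample_path(space, rng):
    """Uniformly pick one operator per layer, i.e. one subnetwork."""
    return {layer: rng.choice(ops) for layer, ops in space.items()}


def rank_subnets(candidates, evaluate):
    """Rank subnetworks by a score computed with the shared supernet weights.

    `evaluate` stands in for running a candidate with inherited weights
    on validation data; higher is assumed to be better.
    """
    return sorted(candidates, key=evaluate, reverse=True)
```

During supernet training, each iteration would call `sample_path` once and update only the weights on that path. After training, `rank_subnets` picks the best candidates without retraining each one from scratch.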

The authors propose a new one-shot NAS algorithm for object tracking tasks. Then, a lightweight search space consisting of depthwise separable convolutions and inverted residual structures is designed, allowing efficient tracking architectures to be built. Finally, LightTrack's pipeline is proposed, which is able to search different models for different deployment scenarios.
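
A quick calculation shows why depthwise separable convolutions suit a lightweight search space. The layer sizes below (3x3 kernel, 64 input channels, 128 output channels) are an arbitrary example of mine, and biases are ignored.

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    return k * k * c_in + c_in * c_out


standard = conv_params(3, 64, 128)                   # 73728
separable = depthwise_separable_params(3, 64, 128)   # 8768
ratio = standard / separable                         # about 8.4x fewer weights
```

The saving grows with the number of output channels and the kernel size.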

Going forward, I will share detailed notes on more than 40 top conference papers from the past three years in the column Target Tracking (SOT) | Top Conference Papers | Study Notes, so that everyone can get started quickly.

If you are interested, please like + bookmark + follow, and head straight to the column to learn~ Your support is my biggest motivation~

Origin blog.csdn.net/weixin_42784535/article/details/128455420