Hello everyone, this is [Come to a Scallion Cake]. This time I'm bringing a paper-sharing post on object tracking~
I spent some time researching object tracking algorithms (mainly single object tracking, SOT) and studied more than 40 top-conference papers. So I set up a new column, Object Tracking (SOT) | Top Conference Papers | Study Notes, to share my paper notes with you, so that you can quickly understand the progress of object tracking and master different algorithm ideas. You're welcome to discuss and write your own thoughts in the comments~
This article contains my object tracking paper notes for CVPR, 10 papers (part 2). For detailed analysis notes on each paper, see the other articles in the column; everyone is welcome to follow. The links are as follows:
Object Tracking | Last Three Years | 45 Top Conference Papers Organized
Object Tracking | Seven Datasets | Organized
Object Tracking | Paper Notes Sharing | ICCV, 6 papers
Object Tracking | Paper Notes Sharing | ICCV, 2 papers
Object Tracking | Paper Notes Sharing | ECCV, 6 papers
Object Tracking | Paper Notes Sharing | CVPR, 12 papers
Object Tracking | Paper Notes Sharing | CVPR, 10 papers (1)
Object Tracking | Paper Notes Sharing | CVPR, 10 papers (2)
Table of Contents
- 1. Paper Titles
- 2. Main Ideas
- 3. Individual Papers
- Towards More Flexible and Accurate Object Tracking with Natural Language: Algorithms and Benchmark
- STMTrack: Template-free Visual Tracking with Space-time Memory Networks
- Siamese Natural Language Tracker: Tracking by Natural Language Descriptions with Siamese Trackers
- Rotation Equivariant Siamese Networks for Tracking
- Alpha-Refine: Boosting Tracking Performance by Precise Bounding Box Estimation
- CapsuleRRT: Relationships-aware Regression Tracking via Capsules
- Graph Attention Tracking
- Progressive Unsupervised Learning for Visual Object Tracking
- Learning to Filter: Siamese Relation Network for Robust Tracking
- LightTrack: Finding Lightweight Neural Networks for Object Tracking via One-Shot Architecture Search
1. Paper Titles
| Paper Title |
| --- |
| Learning to Filter: Siamese Relation Network for Robust Tracking |
| STMTrack: Template-free Visual Tracking with Space-time Memory Networks |
| LightTrack: Finding Lightweight Neural Networks for Object Tracking via One-Shot Architecture Search |
| Alpha-Refine: Boosting Tracking Performance by Precise Bounding Box Estimation |
| Graph Attention Tracking |
| CapsuleRRT: Relationships-aware Regression Tracking via Capsules |
| Progressive Unsupervised Learning for Visual Object Tracking |
| Towards More Flexible and Accurate Object Tracking with Natural Language: Algorithms and Benchmark |
| Siamese Natural Language Tracker: Tracking by Natural Language Descriptions with Siamese Trackers |
| Rotation Equivariant Siamese Networks for Tracking |
2. Main Ideas
The main ideas in this batch: combining NLP and CV to track by natural language descriptions; template-free Siamese tracking with space-time memory; handling rotation of the tracked object; more precise bounding-box estimation; contrastive learning, meta-learning, and capsule networks; and lightweight neural networks found by architecture search.
3. Individual Papers
Towards More Flexible and Accurate Object Tracking with Natural Language: Algorithms and Benchmark
This paper is about natural-language tracking, at the intersection of NLP and object tracking. In the past few years there have been works that combine NLP with video processing, feeding both language and video to further improve accuracy, such as MDETR.
It is a promising direction for innovation, though presumably quite demanding in compute.
The TNL2K dataset proposed in this paper is designed specifically for tracking by natural language specification, and contains many videos with significant appearance variation as well as adversarial examples. It also covers natural videos, animation, infrared videos, virtual game videos, etc.
A simple but strong baseline method (called AdaSwitcher) is proposed for comparison in future work; it adaptively switches between a local tracking module and a global grounding module (which localizes the described object in the video). An adaptive switching mechanism is used here.
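The switching idea can be sketched as a confidence-gated choice between the two modules (a toy sketch with a hard threshold; the paper's AdaSwitcher learns when to switch, and the function name, threshold, and box format here are my assumptions):

```python
def adaswitcher_step(track_conf, track_box, ground_box, conf_thresh=0.5):
    """Toy switching rule: trust the local tracker while its confidence is
    high; otherwise fall back to the box from the global grounding module.
    (The hard threshold is an illustrative assumption -- the paper's
    AdaSwitcher learns this decision.)"""
    if track_conf >= conf_thresh:
        return "local", track_box
    return "grounding", ground_box

# Confident tracker -> keep the local result
mode, box = adaswitcher_step(0.9, (10, 10, 50, 50), (100, 100, 40, 40))
# Low confidence -> switch to the grounding result
mode, box = adaswitcher_step(0.2, (10, 10, 50, 50), (100, 100, 40, 40))
```

The point is simply that grounding acts as a global re-detector when local tracking drifts.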
STMTrack: Template-free Visual Tracking with Space-time Memory Networks
Yet another paper using adaptive, template-free tracking.
This paper proposes a tracking framework based on space-time memory networks. The framework abandons the traditional template-based tracking mechanism and instead uses multiple memory frames, together with foreground-background label maps, to locate the object in the query frame.
In the space-time memory network, the target information stored in the multiple memory frames is adaptively retrieved by the query frame, giving the tracker strong adaptability to target appearance changes.
The pixel-level similarity computation of the memory network enables the tracker to generate more accurate bounding boxes.
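The pixel-level retrieval step can be sketched as a softmax attention over memory pixels (a simplified numpy sketch; the tensor shapes and plain dot-product similarity are my assumptions, not the paper's exact implementation):

```python
import numpy as np

def memory_read(query, mem_keys, mem_values):
    """Pixel-level memory read in the spirit of space-time memory networks.

    query:      (C, Q) features of the query frame (Q = Hq*Wq pixels)
    mem_keys:   (C, M) keys from all memory frames (M = T*Hm*Wm pixels)
    mem_values: (C, M) values stored alongside those keys

    Every query pixel computes a similarity with every memory pixel and
    retrieves a softmax-weighted sum of memory values."""
    sim = mem_keys.T @ query                 # (M, Q) pairwise similarities
    sim = sim - sim.max(axis=0, keepdims=True)  # numerical stability
    attn = np.exp(sim)
    attn /= attn.sum(axis=0, keepdims=True)  # softmax over memory pixels
    return mem_values @ attn                 # (C, Q) retrieved features
```

Because the similarity is computed per pixel rather than per template, no fixed template crop is needed.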
Siamese Natural Language Tracker: Tracking by Natural Language Descriptions with Siamese Trackers
Another paper combining CV and NLP to achieve better tracking; a good NL tracker. In fact NL trackers are rare: before this paper there were only two.
For Siamese trackers in general, this paper proposes a novel and generic Siamese Natural Language Region Proposal Network (SNL-RPN), which serves as a strong baseline for tracking by NL description across a wide range of Siamese trackers. A dynamic aggregation of the predictions from the visual and language modalities is then proposed, turning SNL-RPN into a real-time Siamese Natural Language Tracker (SNLT).
The proposed SNLT consistently improves the performance of SiamFC [1], SiamRPN [25] and SiamRPN++ [24] with only a slight decrease in speed, and it outperforms all NL trackers to date.
This paper combines vision and NL; could other ideas be added on top (template-free tracking, or multiple similarity computation methods)? Could that be a new paper?
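The dynamic aggregation of the two modalities can be illustrated as a confidence-weighted sum of the visual and language response maps (a toy numpy sketch; the paper learns the aggregation weights, whereas here each map is simply weighted by its own peak response):

```python
import numpy as np

def aggregate_predictions(vis_map, lang_map):
    """Toy dynamic aggregation of visual and language response maps:
    each modality is weighted by the peak of its own response map,
    a stand-in for the learned aggregation in the paper."""
    w_v, w_l = vis_map.max(), lang_map.max()
    total = w_v + w_l + 1e-8
    return (w_v / total) * vis_map + (w_l / total) * lang_map
```

When the language branch produces a weak response, the visual branch dominates, and vice versa.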
Rotation Equivariant Siamese Networks for Tracking
This paper targets the situation where the tracked object rotates, adding rotation equivariance to improve performance.
1. Rotation-equivariant Siamese networks (RE-SiamNets) are proposed, built from group-equivariant convolutional layers composed of steerable filters. A robust way to enforce rotation equivariance in CNNs is to use steerable filters; steerable-filter CNNs (SFCNNs) extend the idea of weight sharing from the translation group to the rotation group. For rotation equivariance, the network must convolve with differently rotated versions of each filter.
To build RE-SiamNets, the regular CNN layers are replaced by rotation-equivariant layers, and group pooling layers output, for each input, features in a single orientation. As base Siamese trackers the authors use SiamFC, its variant SiamFCv2, and SiamRPN++.
But most experiments still use SiamFC.
2. A rotation-focused dataset is proposed.
3. Moreover, RE-SiamNets allow estimating the orientation change of the object in an unsupervised way, which is also convenient for relative 2D pose estimation.
It is a good idea, and the rotation-handling module could also be transplanted into other Siamese networks. The authors speculate that introducing additional types of equivariance, imposing more constraints on the kinds of motion realizable in videos, would yield more robust trackers. This is a good direction: other constraints could be used to extend the handling of rotated objects.
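The group-equivariance idea above can be sketched with the simplest rotation group, the four 90-degree rotations: correlate with every rotated copy of the filter, then group-pool over orientations (a toy numpy sketch; the actual RE-SiamNets use steerable filters to cover finer rotation angles):

```python
import numpy as np

def conv2d_valid(img, kernel):
    """Plain valid-mode 2D cross-correlation."""
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def rot_equivariant_conv(img, kernel):
    """Correlate with all four 90-degree rotations of the filter, then
    group-pool (max over orientations). The pooled response map rotates
    with the input, so its peak value is invariant to 90-degree
    rotations of the image."""
    responses = [conv2d_valid(img, np.rot90(kernel, k)) for k in range(4)]
    return np.max(np.stack(responses), axis=0)
```

Rotating the input image by 90 degrees simply rotates the pooled response map, so a matching step built on it sees the same peak score.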
Alpha-Refine: Boosting Tracking Performance by Precise Bounding Box Estimation
Visual object tracking aims to accurately estimate the bounding box of a given object; the accuracy of existing methods is limited, and the coupling between stages severely restricts their portability.
This paper proposes a new, flexible and accurate refinement module, Alpha-Refine (AR), which can significantly improve the quality of bbox estimation for base trackers. The earlier SiamMask was designed as a standalone tracker rather than a refinement module, which makes it a poor and uneconomical fit for refining other trackers.
Alpha-Refine is an accurate and general refinement module that can effectively improve, in a plug-and-play way, the performance of different types of trackers.
By exploring multiple design options, the authors find that extracting and maintaining precise spatial information is the key to accurate box estimation.
Alpha-Refine finally adopts a pixel-wise correlation layer, a key-point-style prediction head, and an auxiliary mask head.
It is a small change, but the gains are real, coming from the extra attention to extracting and maintaining precise spatial information. Worth learning from.
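The pixel-wise correlation idea, which preserves spatial detail instead of pooling it away, can be sketched as follows (a minimal numpy sketch; the variable shapes are assumptions, and the real module wraps this correlation volume in learned layers):

```python
import numpy as np

def pixelwise_correlation(template_feat, search_feat):
    """Pixel-wise correlation: every template pixel's feature vector is
    correlated with every search pixel, so no spatial detail is pooled
    away.

    template_feat: (C, Ht, Wt) features of the template region
    search_feat:   (C, Hs, Ws) features of the search region
    returns:       (Ht*Wt, Hs, Ws) correlation volume (one response map
                   per template pixel)."""
    C, Ht, Wt = template_feat.shape
    _, Hs, Ws = search_feat.shape
    t = template_feat.reshape(C, Ht * Wt)   # (C, Ht*Wt)
    s = search_feat.reshape(C, Hs * Ws)     # (C, Hs*Ws)
    corr = t.T @ s                          # (Ht*Wt, Hs*Ws) dot products
    return corr.reshape(Ht * Wt, Hs, Ws)
```

Compared with pooling the template into a single vector, each template pixel keeps its own response map, which is what makes precise box (and mask) prediction possible downstream.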
CapsuleRRT: Relationships-aware Regression Tracking via Capsules
This paper uses capsule networks to improve a tracking algorithm.
However, capsule networks are not a good fit for our research direction and I don't know much about them, so I'll skip it.
Graph Attention Tracking
This paper uses a graph attention mechanism to optimize object tracking.
However, graph attention is far from our research direction and I don't know much about it, so I'll skip it too.
Progressive Unsupervised Learning for Visual Object Tracking
Contrastive learning is used to distinguish foreground from background; the trained contrastive model then enables unsupervised training on unlabeled videos, and a new noise-robust loss is used to optimize the result.
The paper presents progressive unsupervised learning (PUL) for learning feature representations for tracking.
First, contrastive learning with anchor-based hard negative mining is used to learn a background discrimination (BD) model, which finds the corresponding positive and negative samples.
To learn temporal correspondence (TC), the BD model is then applied to mine temporally corresponding patches. Since the mined patch pairs are noisy (they lack precise spatial correspondence), a noise-robust (NR) loss function is proposed for TC learning. (In the paper's figure, the estimated target center of a mined patch is marked with a red "x" and the true center with a green circle.)
This algorithm outperforms other unsupervised tracking algorithms.
Using contrastive learning for object tracking, and achieving unsupervised training, is a great idea!
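The contrastive objective behind the BD model can be illustrated with a generic InfoNCE-style loss (a sketch of standard contrastive learning, not the paper's exact BD/NR formulation; the temperature value and cosine similarity are assumptions):

```python
import numpy as np

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss: pull the anchor toward its positive
    patch and push it away from (hard) negative patches.

    anchor, positive: (D,) feature vectors; negatives: (N, D)."""
    def norm(v):
        return v / (np.linalg.norm(v, axis=-1, keepdims=True) + 1e-8)
    a, p, n = norm(anchor), norm(positive), norm(negatives)
    logits = np.concatenate([[a @ p], n @ a]) / temperature
    logits -= logits.max()                       # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])  # cross-entropy, positive is class 0
```

Hard negative mining simply means choosing the `negatives` whose similarity to the anchor is highest, which makes this loss most informative.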
Learning to Filter: Siamese Relation Network for Robust Tracking
This paper introduces two efficient modules, Relation Detector (RD) and Refinement Module (RM) .
RD employs a meta-learning approach to acquire the ability to filter distractors from the background, while RM effectively integrates the proposed RD into the Siamese framework to generate accurate tracking results.
To further improve the discriminability and robustness of the tracker, a contrastive training strategy is introduced that learns not only to match the same object but also to distinguish different objects. As a result, the tracker achieves accurate results under background clutter, fast motion, and occlusion.
However, it mainly uses ideas from meta-learning and contrastive learning, which don't fit our research direction well and which I don't know much about, so I only skimmed it.
LightTrack: Finding Lightweight Neural Networks for Object Tracking via One-Shot Architecture Search
This paper is the first attempt to leverage neural architecture search to design a lightweight object tracker, which has good prospects in industry.
LightTrack reformulates one-shot NAS specifically for object tracking and introduces an efficient search space. Extensive experiments on multiple benchmarks show that LightTrack achieves state-of-the-art performance with far fewer FLOPs and parameters, and it can run in real time on a variety of resource-constrained platforms.
Neural architecture search (NAS) aims to automate the design of network architectures. Most recent studies adopt a one-shot weight-sharing strategy to amortize the search cost: a single over-parameterized supernet is trained, and its weights are shared across subnetworks. The single-path method with uniform sampling is a representative one-shot approach: in each iteration it samples one random path and trains that path on a batch of data. Once training is complete, subnetworks can be ranked using the shared weights.
The authors propose a new one-shot NAS algorithm for the tracking task; design a lightweight search space of depthwise separable convolutions and inverted residual structures, allowing efficient tracking architectures to be built; and propose the LightTrack pipeline, which can search different models for different deployment scenarios.
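The single-path uniform-sampling idea can be sketched as follows (a toy sketch: the op names and search space below are illustrative, not LightTrack's actual space, and the training loop only records the sampled paths instead of running forward/backward passes):

```python
import random

# Toy single-path one-shot supernet: each layer has several candidate ops
# that share the supernet's weights. Each training step samples one random
# path and would update only the ops on that path.
SEARCH_SPACE = [
    ["dwconv3x3", "dwconv5x5", "skip"],        # layer 1 candidates
    ["invres_e3", "invres_e6"],                # layer 2 candidates
    ["dwconv3x3", "invres_e3", "invres_e6"],   # layer 3 candidates
]

def sample_path(space, rng=random):
    """Uniformly sample one candidate op per layer (one 'path')."""
    return [rng.choice(ops) for ops in space]

def train_supernet(space, steps, rng=random):
    """Sketch of the training loop: one sampled path per batch. Returns
    the list of sampled paths (a real loop would run a forward/backward
    pass on each path's shared weights)."""
    return [sample_path(space, rng) for _ in range(steps)]
```

After this weight-sharing phase, candidate paths are ranked with the shared weights, so no subnetwork has to be trained from scratch during the search.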
After this, I will share detailed notes on more than 40 top-conference papers from the past three years in the column Object Tracking (SOT) | Top Conference Papers | Study Notes, so everyone can get started quickly.
If you're interested, like + bookmark + follow, and head straight into the column to learn~ Your support is my biggest motivation~