Target Tracking|Paper Notes Sharing|ICCV-6 papers

Hello everyone, this is [Come to a Scallion Cake]. This time I am sharing my notes on target tracking papers~

I previously spent some time researching target tracking algorithms (mainly single object tracking, SOT) and studied more than 40 top-conference papers. I have therefore set up a new column, Target Tracking (SOT)|Top Conference Papers|Study Notes, to share my paper notes so that you can quickly catch up on progress in target tracking and get a grasp of the different algorithmic ideas. Feel free to discuss and leave your own thoughts in the comment area~

This article contains my notes on six target tracking papers from ICCV. Notes on papers from other top conferences are in the other articles of the column, and you are welcome to follow it. The links are as follows:
Target Tracking|Last Three Years|45 Top Conference Papers Organized
Target Tracking|Seven Datasets|Organized
Target Tracking|Paper Notes Sharing|ICCV-6 papers
Target Tracking|Paper Notes Sharing|ICCV-2 papers
Target Tracking|Paper Notes Sharing|ECCV-6 papers
Target Tracking|Paper Notes Sharing|CVPR-12 papers
Target Tracking|Paper Notes Sharing|CVPR-10 papers (1)
Target Tracking|Paper Notes Sharing|CVPR-10 papers (2)

1. Paper titles
Learning to Track Objects from Unlabeled Videos
Learning Spatio-Temporal Transformer for Visual Tracking
Learning to Adversarially Blur Visual Object Tracking
HiFT: Hierarchical Feature Transformer for Aerial Tracking
Learn to Match: Automatic Matching Network Design for Visual Tracking
Saliency-Associated Object Tracking

2. Main idea

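These papers mainly cover: using a transformer to exploit spatio-temporal features, fusing hierarchical features with a transformer, unsupervised training, focusing on locally salient regions, automatically designing the matching network to improve on the Siamese matching operator, and adversarial blur attacks on trackers.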

3. Specific articles

Learning Spatio-Temporal Transformer for Visual Tracking

This paper designs a tracker around a transformer and combines spatial and temporal features.

It is a very effective method and well worth referring to!

Earlier Siamese trackers only use spatial features, so they cope poorly with scenes where the target disappears or changes a lot in appearance.

Transformers address the problem of long-range interactions in sequence modeling.

Spatial information carries the object's appearance, which is used for target localization, while temporal information carries how the object's state changes between frames. Because the transformer is good at modeling global dependencies, it is used to integrate spatial and temporal information, producing discriminative spatio-temporal features for object localization.
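
To make the idea of fusing spatial and temporal cues more concrete, here is a minimal sketch in plain Python. It assumes that features from an initial template, a recently updated template, and the search region are concatenated into one token sequence and passed through self-attention. The toy feature values, the variable names, and the identity projections (no learned weights) are all assumptions for illustration, not the paper's implementation.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Single-head self-attention with identity projections (no learned weights)."""
    dim = len(tokens[0])
    fused = []
    for q in tokens:
        # Scaled dot-product score between this token and every token.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in tokens]
        weights = softmax(scores)
        # Each output token is a weighted average of all input tokens.
        fused.append([sum(w * v[d] for w, v in zip(weights, tokens)) for d in range(dim)])
    return fused

# Hypothetical toy features: each token is a 2-D feature vector.
initial_template = [[1.0, 0.0], [0.9, 0.1]]            # first frame (spatial cue)
dynamic_template = [[0.6, 0.4]]                        # recent frame (temporal cue)
search_region = [[0.7, 0.3], [0.0, 1.0], [0.1, 0.9]]   # current frame

tokens = initial_template + dynamic_template + search_region
fused = self_attention(tokens)
# Only the search-region tokens would go on to a localization head.
search_fused = fused[len(initial_template) + len(dynamic_template):]
```

Because every token attends to every other token, each search-region feature can draw on both the original appearance and the more recent state of the target.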

Learning to Track Objects from Unlabeled Videos

This paper proposes an unsupervised tracker.

It identifies three challenges in previous unsupervised tracking: moving object discovery, exploiting rich temporal variation, and online update.

To address these challenges, a new unsupervised tracking method is proposed. First, unsupervised optical flow and dynamic programming are used to sequentially sample moving objects; then a naive Siamese tracker is trained from scratch on single-frame pairs; finally, a cycle memory learning scheme is used to train the tracker so that it can be updated online.
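
The dynamic programming step can be illustrated with a small sketch. It assumes that each frame already has a few candidate boxes (here reduced to a center and a motion score, for example derived from optical flow) and picks one candidate per frame so that the total score is high while the box does not jump around. The function name, the scoring rule, and the `jump_penalty` parameter are hypothetical and are not taken from the paper.

```python
def pick_box_sequence(candidates, jump_penalty=0.1):
    """Choose one candidate per frame with Viterbi-style dynamic programming.

    candidates[t] is a list of (cx, cy, score) tuples for frame t.
    The objective is the sum of scores minus jump_penalty times the
    center distance between consecutive picks.
    """
    best = [[score for _, _, score in candidates[0]]]
    back = []
    for t in range(1, len(candidates)):
        row, ptr = [], []
        for cx, cy, score in candidates[t]:
            options = []
            for j, (px, py, _) in enumerate(candidates[t - 1]):
                dist = ((cx - px) ** 2 + (cy - py) ** 2) ** 0.5
                options.append((best[t - 1][j] - jump_penalty * dist, j))
            value, j = max(options)
            row.append(value + score)
            ptr.append(j)
        best.append(row)
        back.append(ptr)
    # Backtrack from the best final candidate.
    idx = max(range(len(best[-1])), key=lambda i: best[-1][i])
    path = [idx]
    for ptr in reversed(back):
        idx = ptr[idx]
        path.append(idx)
    return path[::-1]

# Toy candidates: (center x, center y, motion score) per frame.
frames = [
    [(10, 10, 0.9), (50, 50, 0.8)],
    [(12, 11, 0.5), (90, 90, 0.9)],
    [(14, 12, 0.9), (52, 50, 0.2)],
]
smooth_path = pick_box_sequence(frames)
greedy_path = pick_box_sequence(frames, jump_penalty=0.0)
```

With the smoothness penalty, the selected sequence follows the slowly moving candidate even though another box has a higher score in the middle frame; with the penalty set to zero, the choice is driven by the scores alone.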

The paper also points out an important idea for target tracking: feeding in a dynamic, constantly updated template of the object helps sustain long-term tracking.

In summary, there are two points: unsupervised tracking and online update.

Saliency-Associated Object Tracking

Most current trackers track and recognize the target as a whole, which makes it hard to follow targets whose shape changes a lot. Another idea is to divide the target into equal-sized patches and track all of them locally in parallel.

In practice, however, many of these local patches are uninformative and hurt the result.

Therefore, this paper proposes tracking based on locally salient regions: a local saliency mining module captures the local saliencies, and a saliency association modeling module then associates the captured saliencies with each other for tracking and state estimation.
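
As a rough illustration of what "mining local saliencies" could look like, the sketch below picks the strongest local maxima from a 2-D response map. Reading saliency as local peaks of a similarity map, as well as the function name and the toy map, are assumptions made for this example; the paper's module may work differently.

```python
def top_k_salient_points(response, k):
    """Return up to k (row, col, value) strict local maxima, strongest first."""
    h, w = len(response), len(response[0])
    peaks = []
    for r in range(h):
        for c in range(w):
            value = response[r][c]
            # Values of the (up to 8) neighbours inside the map.
            neighbours = [
                response[rr][cc]
                for rr in range(max(0, r - 1), min(h, r + 2))
                for cc in range(max(0, c - 1), min(w, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(value > n for n in neighbours):
                peaks.append((r, c, value))
    peaks.sort(key=lambda p: p[2], reverse=True)
    return peaks[:k]

# Toy response map between the template and a search region.
response = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.3, 0.1],
    [0.1, 0.3, 0.2, 0.6],
    [0.0, 0.1, 0.4, 0.3],
]
salient = top_k_salient_points(response, k=2)
```

Only the few selected points would then be associated with each other, instead of treating every patch of the target as equally useful.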

This method tracks objects with large shape changes more effectively.

HiFT: Hierarchical Feature Transformer for Aerial Tracking

This paper mainly aims to improve Siamese trackers and proposes an efficient and effective hierarchical feature transformer (HiFT) for aerial tracking. Hierarchical similarity maps generated from multi-level convolutional layers are fed into the feature transformer, which interactively fuses spatial cues (shallow layers) and semantic cues (deep layers).

Thanks to the transformer, global context can be captured better, which helps locate the target; multi-level feature learning yields a strongly discriminative feature space for tracking.
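
The notion of hierarchical similarity maps can be pictured with a small sketch: a similarity map is computed separately for a shallow and a deep feature level by sliding the template features over the search features, and each map is flattened into a token sequence that a transformer could consume. Using plain cross-correlation on single-channel toy maps, and the level names `shallow` and `deep`, are assumptions for illustration only.

```python
def cross_correlate(search, template):
    """Slide a 2-D template over a 2-D search map (valid positions only)."""
    sh, sw = len(search), len(search[0])
    th, tw = len(template), len(template[0])
    return [
        [
            sum(search[r + i][c + j] * template[i][j]
                for i in range(th) for j in range(tw))
            for c in range(sw - tw + 1)
        ]
        for r in range(sh - th + 1)
    ]

def flatten(feature_map):
    """Turn a 2-D map into a flat token sequence in row-major order."""
    return [value for row in feature_map for value in row]

# Hypothetical single-channel features from two backbone levels.
levels = {
    "shallow": ([[1, 0, 0], [0, 2, 0], [0, 0, 1]], [[1, 0], [0, 1]]),
    "deep": ([[0, 1, 0], [1, 1, 1], [0, 1, 0]], [[1, 1], [1, 1]]),
}

similarity_maps = {
    name: cross_correlate(search, template)
    for name, (search, template) in levels.items()
}
token_sequences = {name: flatten(m) for name, m in similarity_maps.items()}
```

In this toy case the shallow map has sharp, position-specific peaks while the deep map is flat, which hints at why fusing both levels can be more informative than using either alone.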

Of course, the scenario targeted by this paper is mainly low-resolution objects captured by a drone in the air.

Learn to Match: Automatic Matching Network Design for Visual Tracking

Earlier Siamese trackers work well, but they only use cross-correlation to measure the similarity between the input template and the search region. This has two disadvantages:
1. A heuristic matching network design relies heavily on expert experience.
2. A single matching operator can hardly guarantee stable tracking in all challenging environments.

This paper introduces six other matching operators to explore the feasibility of matching operator selection, which is actually a very good idea!

The operators can be combined to exploit complementary features and get better matching results.

In addition, binary channel manipulation (BCM) is used to search for the optimal combination of these operators. The best combination is generated automatically, yielding a general tracking model that copes with a variety of complex tracking scenes.
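
The idea of switching matching operators on and off can be sketched as follows. The three operators here (concatenation, pointwise addition, and a pointwise product standing in for correlation) are simplified toy versions operating on flat feature vectors, and the gates are set by hand; in the paper the combination is searched for automatically. All names in the block are hypothetical.

```python
def concatenation(template, search):
    """Stack template and search features side by side."""
    return template + search

def pointwise_addition(template, search):
    """Add features element by element."""
    return [t + s for t, s in zip(template, search)]

def pointwise_product(template, search):
    """Multiply features element by element (a stand-in for correlation)."""
    return [t * s for t, s in zip(template, search)]

OPERATORS = {
    "concatenation": concatenation,
    "pointwise_addition": pointwise_addition,
    "pointwise_product": pointwise_product,
}

def combine(template, search, gates):
    """Concatenate the outputs of every operator whose binary gate is on."""
    fused = []
    for name, operator in OPERATORS.items():
        if gates.get(name, False):
            fused.extend(operator(template, search))
    return fused

# Hand-picked gates; a search procedure would choose these instead.
gates = {"concatenation": True, "pointwise_addition": False, "pointwise_product": True}
fused = combine([1.0, 2.0], [3.0, 4.0], gates)
```

Each gate decides whether an operator contributes to the fused representation, so searching over gate settings amounts to searching over operator combinations.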

Learning to Adversarially Blur Visual Object Tracking

Because earlier work on object tracking rarely discusses how robust trackers are to blurred images, this paper proposes the Adversarial Blur Attack (ABA).

The authors study how motion blur is generated and design a blur synthesis method for visual tracking.
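
For intuition about how motion blur arises, the sketch below approximates horizontal motion blur by averaging each pixel with the pixels the object has just passed over. This covers only plain blur synthesis under a simple linear-motion assumption; it is not adversarial, and the function name and `length` parameter are made up for this example.

```python
def horizontal_motion_blur(image, length):
    """Blur a grayscale image (list of rows) along the horizontal axis.

    Each output pixel is the mean of the current pixel and the length - 1
    pixels to its left, with the left edge replicated at the border.
    """
    blurred = []
    for row in image:
        new_row = []
        for c in range(len(row)):
            samples = [row[max(0, c - i)] for i in range(length)]
            new_row.append(sum(samples) / length)
        blurred.append(new_row)
    return blurred

# A single bright pixel gets smeared into a streak of three pixels.
image = [[0, 0, 9, 0, 0]]
blurred = horizontal_motion_blur(image, length=3)
```

An attack in the spirit of the paper would go further and tune the blur so that it degrades the tracker's output as much as possible.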

We may not need this research scenario directly, but the method could be used in the future to generate a corresponding dataset and train on it to improve model robustness.

Going forward, I will share detailed notes on more than 40 top-conference papers from the past three years in the column Target Tracking (SOT)|Top Conference Papers|Study Notes, so that everyone can get started quickly.

If you are interested, please like + bookmark + follow, and head straight to the column to learn more ~ Your support is my biggest motivation ~

Origin blog.csdn.net/weixin_42784535/article/details/128455672