Feature extraction series based on deep learning (3): Transformer Guided Geometry Model for Flow-Based Unsupervised Visual Odometry paper

Summary

The paper integrates a transformer into visual odometry (VO) to improve performance, combining the direct method with unsupervised learning. It has two main components: TAPE and F2FPE.

TAPE

TAPE - Transformer-based Auxiliary Pose Estimator
TAPE is a transformer-style pose estimator that models geometric and temporal information over a short time window.


The module takes two sets of DF-Groups as input. Each DF-Group consists of two depth maps and one optical flow map, and the module outputs the relative pose between the two frames of each group, which is equivalent to translating DF-Groups into camera poses one-to-one. Each DF-Group passes through convolution layers and is combined with a positional encoding to form a feature embedding. The embeddings then pass through several attention layers, with dropout, residual connections and layer normalization (LN), to produce the final result.
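The attention stage described above can be sketched in plain Python. This is only an illustration under several assumptions, not the paper's network: the convolutional embedding is replaced by ready-made vectors, attention uses a single head with identity query/key/value projections, dropout is skipped (as at inference time), and the names `tape_sketch` and `pose_weights` are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def positional_encoding(position: int, dim: int) -> Vector:
    """Sinusoidal position code: sin on even indices, cos on odd indices."""
    code = []
    for i in range(dim):
        angle = position / (10000 ** (2 * (i // 2) / dim))
        code.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return code


def softmax(scores: Vector) -> Vector:
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product attention (assumption: no learned projections)."""
    dim = len(tokens[0])
    attended = []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in tokens]
        weights = softmax(scores)
        attended.append([sum(w * value[j] for w, value in zip(weights, tokens))
                         for j in range(dim)])
    return attended


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def encoder_block(tokens: List[Vector]) -> List[Vector]:
    """Attention, then a residual connection, then layer normalization."""
    attended = self_attention(tokens)
    return [layer_norm([t + a for t, a in zip(token, att)])
            for token, att in zip(tokens, attended)]


def tape_sketch(group_embeddings: List[Vector],
                pose_weights: List[Vector]) -> List[Vector]:
    """Map each DF-Group embedding to one 6-DoF pose vector (hypothetical linear head)."""
    dim = len(group_embeddings[0])
    tokens = [[e + p for e, p in zip(emb, positional_encoding(i, dim))]
              for i, emb in enumerate(group_embeddings)]
    tokens = encoder_block(tokens)
    return [[sum(w * t for w, t in zip(row, token)) for row in pose_weights]
            for token in tokens]


# Toy input: two DF-Group embeddings of size 8 and a 6 x 8 pose head.
group_embeddings = [[0.1 * (i + j) for j in range(8)] for i in range(2)]
pose_weights = [[0.01 * (r + c) for c in range(8)] for r in range(6)]
poses = tape_sketch(group_embeddings, pose_weights)
```

Each input embedding yields exactly one pose vector, which mirrors the one-to-one DF-Group to pose mapping described above.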

F2FPE

Flow-to-Flow Pose Estimator (F2FPE)
F2FPE has four components: Initial Flow Generator (IFG), Feature Encoder (FE), Pose Estimator (PE) and Final Flow Generator (FFG).
IFG uses a pre-trained optical flow estimator to generate an initial optical flow map. Along Route 1 in the figure, the initial flow map passes through FE and PE to produce the camera pose. The initial flow map also follows Route 2 to obtain an improved optical flow map. FFG is optional in this design and can be removed.
[Figure: F2FPE structure]
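The data flow through F2FPE can be sketched as a function over four pluggable components. This is a rough sketch under assumptions: the pose is assumed to come from PE applied to FE's features, FFG is assumed to sit on Route 2 and to read the same features, and every `toy_*` function is a placeholder invented for the example, not a real model.

```python
from typing import Callable, List, Optional, Tuple

Image = List[List[float]]                # H x W grid of intensities
Flow = List[List[Tuple[float, float]]]   # H x W grid of (u, v) displacements
Features = List[float]
Pose = List[float]                       # 6-DoF: tx, ty, tz, rx, ry, rz


def f2fpe_sketch(
    image_a: Image,
    image_b: Image,
    ifg: Callable[[Image, Image], Flow],
    fe: Callable[[Flow], Features],
    pe: Callable[[Features], Pose],
    ffg: Optional[Callable[[Features], Flow]] = None,
) -> Tuple[Pose, Optional[Flow]]:
    initial_flow = ifg(image_a, image_b)   # pre-trained flow estimator
    features = fe(initial_flow)
    pose = pe(features)                    # Route 1: camera pose
    # Route 2 (assumed placement of FFG): refined flow, skipped when FFG is removed.
    refined_flow = ffg(features) if ffg is not None else None
    return pose, refined_flow


# Placeholder components, only here so the sketch runs end to end.
def toy_ifg(image_a: Image, image_b: Image) -> Flow:
    return [[(1.0, 0.0) for _ in row] for row in image_a]


def toy_fe(flow: Flow) -> Features:
    flat = [uv for row in flow for uv in row]
    count = len(flat)
    return [sum(u for u, _ in flat) / count, sum(v for _, v in flat) / count]


def toy_pe(features: Features) -> Pose:
    return [features[0], features[1], 0.0, 0.0, 0.0, 0.0]


def toy_ffg(features: Features) -> Flow:
    return [[(features[0], features[1])] * 2 for _ in range(2)]


frame = [[0.0, 0.0], [0.0, 0.0]]
pose, refined_flow = f2fpe_sketch(frame, frame, toy_ifg, toy_fe, toy_pe, toy_ffg)
pose_only, no_flow = f2fpe_sketch(frame, frame, toy_ifg, toy_fe, toy_pe)
```

Making `ffg` an optional argument reflects the note that FFG can be removed: the pose route does not depend on it.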

Network structure

[Figure: overall network structure]
The overall pipeline is as follows. The original images are input; a depth network extracts the depth maps, and the F2FPE network generates the optical flow maps and a camera pose. The depth maps (upper branch) and optical flow maps (lower branch) are combined into DF-Groups, which are fed to TAPE to generate a second camera pose. The two camera poses are then checked for pose consistency. Some intermediate branches also use the ISP (picture information processing) module to generate specific images.
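One simple way to score the agreement between the two pose estimates is an L1 distance over 6-DoF pose vectors. These notes do not state which measure the paper uses, so the function below is an assumption for illustration; `pose_consistency` and `rot_weight` are hypothetical names.

```python
from typing import List

Pose = List[float]  # 6-DoF: tx, ty, tz, rx, ry, rz


def pose_consistency(poses_a: List[Pose], poses_b: List[Pose],
                     rot_weight: float = 1.0) -> float:
    """Mean L1 gap between paired poses (assumed measure, not the paper's loss)."""
    if not poses_a or len(poses_a) != len(poses_b):
        raise ValueError("pose sequences must be non-empty and equally long")
    total = 0.0
    for pa, pb in zip(poses_a, poses_b):
        translation_gap = sum(abs(x - y) for x, y in zip(pa[:3], pb[:3]))
        rotation_gap = sum(abs(x - y) for x, y in zip(pa[3:], pb[3:]))
        total += translation_gap + rot_weight * rotation_gap
    return total / len(poses_a)


# Toy poses standing in for the F2FPE and TAPE outputs.
f2fpe_poses = [[1.0, 0.0, 0.0, 0.0, 0.0, 0.0]]
tape_poses = [[0.5, 0.0, 0.0, 0.0, 0.0, 0.25]]
gap = pose_consistency(f2fpe_poses, tape_poses)
weighted_gap = pose_consistency(f2fpe_poses, tape_poses, rot_weight=2.0)
```

A value of zero means the two estimators agree exactly; `rot_weight` lets rotation errors count more or less than translation errors, since the two are in different units.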

Related Links

Paper address

Origin blog.csdn.net/private_Jack/article/details/132980821