SuperGlue Learning Notes: Optimal Transport

While studying the theory of optimal transport, I came across the paper SuperGlue, which uses optimal transport to match feature points between two images.

SuperGlue structure

Let's take a look at its structure first:

First, the two images are fed into a feature-extraction network, which extracts features through convolutional layers. This produces four main outputs: the descriptors of the two images, diA (1, 256, 997) and diB (1, 256, 1074), where 256 is the descriptor dimension, and the keypoint positions piA (1, 997, 2) and piB (1, 1074, 2), where 997 and 1074 are the numbers of keypoints detected in each image and 2 is the xy coordinate.
The keypoint positions are then sent to the Keypoint Encoder, which lifts them to the descriptor dimension, producing tensors of shape (1, 256, 997) and (1, 256, 1074); these positional encodings are combined with the descriptors and passed to the AGNN (Attentional Graph Neural Network). This module is based on the Transformer and alternates self-attention and cross-attention layers, finally producing refined features of shapes (1, 256, 997) and (1, 256, 1074) for the two images. These two feature tensors are then used to compute the matching scores. The specific calculation is:
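The Keypoint Encoder step can be sketched as a small MLP that maps xy coordinates into the 256-D descriptor space and adds the result to the visual descriptors. This is a minimal sketch, assuming a simple two-layer MLP; the layer sizes and the class name `KeypointEncoder` here are illustrative, not the official configuration:

```python
import torch
import torch.nn as nn

class KeypointEncoder(nn.Module):
    """Hypothetical sketch of a keypoint encoder: lift 2-D positions
    into the 256-D descriptor space with a small 1x1-conv MLP, then
    add them to the visual descriptors (layer sizes are assumptions)."""
    def __init__(self, feature_dim=256, hidden=32):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv1d(2, hidden, kernel_size=1),
            nn.ReLU(),
            nn.Conv1d(hidden, feature_dim, kernel_size=1),
        )

    def forward(self, desc, kpts):
        # desc: (B, 256, N) descriptors; kpts: (B, N, 2) xy coordinates
        return desc + self.mlp(kpts.transpose(1, 2))

desc = torch.randn(1, 256, 997)   # diA
kpts = torch.rand(1, 997, 2)      # piA
out = KeypointEncoder()(desc, kpts)
print(tuple(out.shape))           # stays (1, 256, 997)
```

Adding (rather than concatenating) the positional encoding keeps the feature dimension at 256, which matches the shapes reported above.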

mdesc0, mdesc1 = self.final_proj(desc0), self.final_proj(desc1)
# Compute matching descriptor distance.
scores = torch.einsum('bdn,bdm->bnm', mdesc0, mdesc1)
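The einsum string `'bdn,bdm->bnm'` contracts over the shared descriptor dimension `d`, so each entry of `scores` is the dot-product similarity between one keypoint descriptor from image A and one from image B. A small sketch with the shapes from the text (random data, for illustration only):

```python
import torch

# Shapes follow the text: bdn = (batch, dim, n_kpts_A), bdm = (batch, dim, n_kpts_B).
mdesc0 = torch.randn(1, 256, 997)
mdesc1 = torch.randn(1, 256, 1074)

# Contract over the 256-D descriptor dimension d -> one score S_ij per pair.
scores = torch.einsum('bdn,bdm->bnm', mdesc0, mdesc1)
print(tuple(scores.shape))  # (1, 997, 1074)

# The einsum is equivalent to a batched matrix product.
assert torch.allclose(scores, mdesc0.transpose(1, 2) @ mdesc1, atol=1e-4)
```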

The result is the score matrix Sij. Then, borrowing the dustbin idea from SuperPoint, an extra row and column are appended to absorb keypoints that have no match, forming the cost matrix. This augmented matrix is fed to the Sinkhorn algorithm, which iteratively normalizes it to finally obtain the transport plan and the loss value.
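The Sinkhorn step can be sketched as follows. This is a minimal log-domain Sinkhorn with a dustbin, simplified from the description in the paper and not the exact SuperGlue repository code; the function name and the fixed iteration count are assumptions:

```python
import torch

def log_sinkhorn(scores, alpha, iters=100):
    """Minimal log-domain Sinkhorn sketch with a dustbin (illustrative,
    not the official implementation). scores: (N, M) score matrix;
    alpha: dustbin score. Returns the log transport plan, (N+1, M+1)."""
    N, M = scores.shape
    # Augment with a dustbin row and column filled with alpha.
    S = torch.full((N + 1, M + 1), float(alpha))
    S[:N, :M] = scores
    # Target marginals: each real keypoint carries mass 1; each dustbin
    # absorbs the other image's potentially unmatched mass.
    log_mu = torch.cat([torch.zeros(N), torch.tensor([float(M)]).log()])
    log_nu = torch.cat([torch.zeros(M), torch.tensor([float(N)]).log()])
    u, v = torch.zeros(N + 1), torch.zeros(M + 1)
    for _ in range(iters):
        # Alternately rescale rows and columns in log space.
        u = log_mu - torch.logsumexp(S + v[None, :], dim=1)
        v = log_nu - torch.logsumexp(S + u[:, None], dim=0)
    return S + u[:, None] + v[None, :]

P = log_sinkhorn(torch.randn(5, 7), alpha=1.0).exp()
# After the final column update, column marginals equal the targets:
# 1 per real keypoint in B, and N for the dustbin column.
print(P.sum(0))
```

Working in log space with `logsumexp` avoids the numerical underflow that plain Sinkhorn scaling suffers when scores are large or iterations are many.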

The figure below shows the data flow and the model structure.

Origin blog.csdn.net/pengxiang1998/article/details/131794469