Paper Express: A Line Feature Matching Algorithm Based on NLP Ideas for Visual Positioning

Title: Line as a Visual Sentence: Context-aware Line Descriptor for Visual Localization

Authors: Sungho Yoon and Ayoung Kim

Summary

    In robotics and computer vision, multi-view geometry problems are usually solved with feature points matched across images, but line features can be used as well because they provide additional constraints. CNN-based line descriptors are promising under viewpoint changes and in dynamic environments; however, the authors argue that CNNs have an inherent disadvantage when line length varies, because a line of arbitrary length must be abstracted into a fixed-dimensional descriptor. The paper proposes Line-Transformers to handle this variation in line length. Inspired by natural language processing (NLP), a line segment is treated as a sentence and the points on it as words. By dynamically attending to the well-describable points on a line, the descriptor performs well on lines of variable length. The paper also proposes line signature networks, which share each line's geometric attributes with its neighbors. Finally, the proposed line description and matching method is used for point-and-line localization (PL-Loc). The authors show that adding these line features improves point-based visual localization, and they validate the method on homography estimation and visual localization.

    Open-source project: https://github.com/yosungho/LineTR

Line Transformers

1. Line Tokenizer: The author borrows an idea from NLP, where splitting a sentence into tokens (for example, words) is called tokenization. A line segment is treated as a sentence, and the feature points along it, which divide the line into short pieces, play the role of words. Each point is expressed as p_i = (x, y, c)_i, where x and y are the position and c is the confidence. If v is the interval between two adjacent points and l is the length of the line, the number of points on the line is n = ⌊l/v⌋ + 1. The process is shown in the figure below:

    My understanding: to apply an NLP-style network, the information of a line must first be vectorized. The figure above shows this vectorization step, which turns the line into the token-style input that such a network expects.
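    The point count n = ⌊l/v⌋ + 1 can be illustrated with a minimal sketch. The function name tokenize_line and the plain (x, y) tuples are hypothetical, chosen only for illustration; the sketch places points at a fixed spacing and leaves out the confidence c and the descriptor that would be read for each point from a feature extractor.

```python
import math


def tokenize_line(start, end, interval):
    """Place points along a segment at a fixed spacing (hypothetical helper).

    Returns n = floor(l / v) + 1 positions, where l is the segment length
    and v is `interval`. Confidence and descriptors are not handled here.
    """
    (x0, y0), (x1, y1) = start, end
    length = math.hypot(x1 - x0, y1 - y0)
    if length == 0:
        # Degenerate segment: a single point.
        return [(float(x0), float(y0))]
    n = int(math.floor(length / interval)) + 1
    # Unit direction vector of the segment.
    ux, uy = (x1 - x0) / length, (y1 - y0) / length
    return [(x0 + i * interval * ux, y0 + i * interval * uy) for i in range(n)]


points = tokenize_line((0, 0), (10, 0), interval=4)
print(len(points), points)
```

    For a line of length 10 and an interval of 4, this gives ⌊10/4⌋ + 1 = 3 points; the remainder of the line past the last point carries no token.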

2. Transformer: The author uses a Transformer to build the line descriptor model. The Transformer encoder consists of two parts: multi-head self-attention (MSA) layers and MLP layers. The line descriptor is obtained by stacking the encoder L times, as shown in formula 1:

    In the formula, z0 is the input of the Transformer; Eline is the initial state of the line descriptor; En is the embedding (description vector) of the n-th feature point; Epos is the position information of each feature point; mask0 is a mask added to handle lines with different numbers of points by excluding the points that are not relevant; and d is the resulting line descriptor.
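    The role of a mask in attention can be sketched as follows. This is an assumption about how mask0 is applied, based on common Transformer practice: token slots marked False receive zero attention weight, so lines with fewer points can share a fixed-size input. The function name masked_softmax is hypothetical, and the sketch covers only the weight normalization for one query, not the full MSA layer.

```python
import math


def masked_softmax(scores, mask):
    """Softmax over attention scores, giving zero weight where mask is False."""
    kept = [s for s, keep in zip(scores, mask) if keep]
    if not kept:
        raise ValueError("mask must keep at least one token")
    peak = max(kept)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) if keep else 0.0 for s, keep in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]


# Three token slots; the last one is an unused slot and is masked out,
# even though its raw score is the largest.
weights = masked_softmax([1.0, 1.0, 5.0], [True, True, False])
print(weights)
```

    The masked slot contributes nothing to the output regardless of its score, so the descriptor depends only on the points that actually lie on the line.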

Line Signature Networks

    In addition to describing each line individually, the author designs a line signature network. Lines are grouped into clusters according to their position and angle, and the information (position, angle) of neighboring lines is shared through a message-passing network. The message-passing formula is as follows:

Sublines to Keylines

    A Transformer has a limit on the maximum number of tokens. For line features this limits the maximum number of feature points on a line, and therefore the maximum line length. To solve this problem, the paper introduces the concepts of sublines and keylines. The original line is called the keyline. When its length exceeds the maximum length, the keyline is divided into several shorter lines, called sublines. Adjacency matrices are then used to convert the distance matrix between subline descriptors into a distance matrix between keylines, as follows:
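    The conversion from subline distances to keyline distances can be sketched as below. The text above only says that adjacency matrices perform this conversion, so the reduction rule here (each keyline pair takes the minimum distance over its subline pairs) is an assumption, as are the function name keyline_distances and the owner lists, which are a compact stand-in for the adjacency matrices.

```python
def keyline_distances(sub_dist, owners_a, owners_b):
    """Collapse a subline-to-subline distance matrix to keyline level.

    sub_dist[i][j] is the descriptor distance between subline i of image A
    and subline j of image B. owners_a[i] and owners_b[j] give the index of
    the keyline each subline was cut from. Assumed rule: a keyline pair
    takes the minimum over all of its subline pairs.
    """
    n_a = max(owners_a) + 1
    n_b = max(owners_b) + 1
    key = [[float("inf")] * n_b for _ in range(n_a)]
    for i, row in enumerate(sub_dist):
        for j, dist in enumerate(row):
            a, b = owners_a[i], owners_b[j]
            key[a][b] = min(key[a][b], dist)
    return key


# Image A: keyline 0 was split into sublines 0 and 1; keyline 1 is subline 2.
# Image B: two keylines, neither split.
sub_dist = [[0.9, 0.2],
            [0.4, 0.7],
            [0.1, 0.8]]
print(keyline_distances(sub_dist, owners_a=[0, 0, 1], owners_b=[0, 1]))
```

    Matching can then run on the smaller keyline matrix, so a long line split into several sublines still yields a single match.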

Loss Function

    The loss function uses a semi-hard negative sampling strategy and is designed as follows:

    Here, ai is the anchor descriptor (my understanding: the descriptor for which matching distances are computed), pi is the positive descriptor (my understanding: the descriptor of the line that truly matches the anchor), and ni is the negative descriptor (my understanding: the descriptor of a line that does not match). As the loss function shows, the goal is to minimize the distance between matched descriptors and maximize the distance between unmatched descriptors.
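    A minimal sketch of a triplet loss with semi-hard negative selection is given below. It follows the commonly used definition, in which a semi-hard negative is farther from the anchor than the positive but still within the margin. The Euclidean distance, the margin value, and the function names are assumptions made for illustration; the exact formulation in the paper may differ.

```python
import math


def l2(u, v):
    """Euclidean distance between two descriptors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def semi_hard_negative(anchor, positive, candidates, margin):
    """Return the closest candidate n with d(a,p) < d(a,n) < d(a,p) + margin.

    Returns None if no candidate falls in the semi-hard band.
    """
    d_pos = l2(anchor, positive)
    band = [c for c in candidates if d_pos < l2(anchor, c) < d_pos + margin]
    return min(band, key=lambda c: l2(anchor, c)) if band else None


def triplet_loss(anchor, positive, negative, margin):
    """Hinge loss: pull the positive closer than the negative by `margin`."""
    return max(0.0, l2(anchor, positive) - l2(anchor, negative) + margin)


anchor, positive = (0.0, 0.0), (1.0, 0.0)
# Distances to the anchor: 0.5 (closer than the positive), 1.5 (semi-hard), 5.0 (easy).
candidates = [(0.5, 0.0), (1.5, 0.0), (5.0, 0.0)]
negative = semi_hard_negative(anchor, positive, candidates, margin=1.0)
print(negative, triplet_loss(anchor, positive, negative, margin=1.0))
```

    Easy negatives such as the one at distance 5.0 give zero loss and no gradient, which is why sampling from the semi-hard band is preferred over picking negatives at random.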

Experimental results

    The author evaluates the algorithm on homography estimation and visual localization. The compared algorithms include SuperPoint, LBD, LLD, WLD, and SOLD2.

    The above results show that the algorithm proposed in this paper generally outperforms the compared algorithms on the reported metrics.

Conclusion

    This paper proposes a line descriptor based on ideas from NLP. Quantitative experiments on homography estimation and visual localization show that it outperforms other line matching algorithms, and the method has been open-sourced on GitHub. Readers interested in line matching may want to take a look.

Abstract

    Along with feature points for image matching, line features provide additional constraints to solve visual geometric problems in robotics and computer vision (CV). Although recent convolutional neural network (CNN)-based line descriptors are promising for viewpoint changes or dynamic environments, we claim that the CNN architecture has innate disadvantages to abstract variable line length into the fixed-dimensional descriptor. In this paper, we effectively introduce Line-Transformers dealing with variable lines. Inspired by natural language processing (NLP) tasks where sentences can be understood and abstracted well in neural nets, we view a line segment as a sentence that contains points (words). By attending to well-describable points on a line dynamically, our descriptor performs excellently on variable line length. We also propose line signature networks sharing the line's geometric attributes to neighborhoods. Performing as group descriptors, the networks enhance line descriptors by understanding lines' relative geometries. Finally, we present the proposed line descriptor and matching in a Point and Line Localization (PL-Loc). We show that the visual localization with feature points can be improved using our line features. We validate the proposed method for homography estimation and visual localization.
