Classic literature reading -- BEVPlace (LiDAR place recognition from bird's-eye view)

0. Introduction

In previous blogs we have discussed at length how to perform relocalization through frame-to-map matching, which is essentially place recognition; there is also a dedicated blog, "Positioning Analysis and Thinking", that introduces relocalization. Current LiDAR-based place recognition methods usually rely on point cloud representations such as unordered points or range images. These methods achieve high retrieval recall, but their performance may degrade under viewpoint or scene changes. In "BEVPlace: Learning LiDAR-based Place Recognition using Bird's Eye View Images", we explore the potential of a different representation for place recognition, namely bird's-eye view (BEV) images. We observe that the structural content of BEV images is less affected by the rotation and translation of the point cloud. The code for this paper is open source on GitHub.

[Figure 1]

Figure 1. (a) Two range images from the KITTI dataset, projected from two point clouds about 5 meters apart. Small translations of the point cloud introduce structural distortions such as scale changes and occlusions from objects in the scene. (b) The corresponding BEV images. The scale and spatial distribution of objects on the road remain almost unchanged. (c) Performance on various datasets. The Top-1 recall of a simple NetVLAD network based on the BEV representation is comparable to state-of-the-art methods. Our BEVPlace further improves on this baseline.

1. Main contributions

The contributions of this article are as follows:

  1. A new LiDAR-based place recognition method, BEVPlace, is proposed. In this method, we extract rotation-equivariant local features from BEV images based on group convolution, which facilitates the design of rotation-invariant global features.

  2. The statistical correlation between the feature distances and geometric distances of point cloud pairs is explored. Based on this prior, we recover the geometric distance between the query point cloud and the matched point cloud and use it for position estimation.

  3. We evaluate our method on three large-scale public datasets and show that it is robust to view changes, has strong generalization ability, and achieves the best performance in terms of recall.

2. Preliminaries (problem formulation)

Let $m_i$ denote the point cloud collected by the sensor at pose $T_i = (R_i, t_i)$, where $R_i$ is the rotation matrix and $t_i$ is the position. The database formed by $n$ point clouds and their associated poses can be expressed as $M = \{(m_i, T_i)\}_{i = 1, 2, \dots, n}$. Given a query point cloud $m_q$, place recognition aims to find the structurally most similar point cloud in the pre-built database $M$. In LiDAR-based place recognition it is generally assumed that two point clouds are structurally similar if they are geometrically close. To achieve this goal, we design a network $f(\cdot)$ that maps a point cloud to a unique, compact global feature vector such that if $m_q$ is structurally similar to $m_i$ but dissimilar to $m_j$, then $\|f(m_q) - f(m_i)\|_2 < \|f(m_q) - f(m_j)\|_2$. Based on the network $f$, we perform place retrieval by finding the point cloud with the smallest feature distance to the query point cloud.
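
To make the retrieval criterion concrete, here is a minimal sketch (not the paper's code) of nearest-neighbor search in global feature space, assuming the descriptors produced by $f$ are already available as NumPy arrays:

```python
import numpy as np

def retrieve(query_feat, db_feats):
    """Return the index of the database entry whose global feature is
    closest (L2 distance) to the query feature, plus that distance."""
    dists = np.linalg.norm(db_feats - query_feat[None, :], axis=1)
    idx = int(np.argmin(dists))
    return idx, float(dists[idx])

# Toy usage with random placeholders standing in for f(m_q) and {f(m_i)}
db_feats = np.random.randn(100, 256).astype(np.float32)
query_feat = np.random.randn(256).astype(np.float32)
idx, dist = retrieve(query_feat, db_feats)
print(f"matched database index {idx}, feature distance {dist:.3f}")
```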

In this work, we train our network on BEV images of point clouds. In addition to place retrieval, we also develop an extension that estimates the position of the query point cloud.

3. Summary of methods

The method in this article consists of two modules, as shown in Figure 2. In the BEVPlace network, we project the query point cloud into a BEV image. We then extract rotation-invariant global features through a group convolutional network and NetVLAD [2]. In the position estimator, we retrieve from a pre-built database the global feature closest to that of the query. We recover the geometric distance between the query and the matched point cloud based on a mapping model, and use the recovered distance to estimate the position of the query.

[Figure 2]

Figure 2: Two modules of our approach. In the BEVPlace network, we project point clouds into BEV images and extract rotation-invariant global features. In the position estimator module, we recover the geometric distance from the feature space and estimate the position of the query point cloud.
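
As an illustration of how recovered geometric distances could be turned into a position estimate, the following sketch trilaterates from the top-k matches. The mapping `g` from feature distance to geometric distance is a hypothetical stand-in for the paper's mapping model, and the paper's actual estimator may differ:

```python
import numpy as np

def estimate_position(query_feat, db_feats, db_positions, g, k=3):
    """Trilateration-style position estimate for the query point cloud.

    g is a monotone mapping from feature distance to geometric distance,
    assumed to have been fitted offline (a hypothetical stand-in for the
    paper's mapping model). db_positions holds the 2D positions t_i of
    the database point clouds.
    """
    feat_dists = np.linalg.norm(db_feats - query_feat[None, :], axis=1)
    top = np.argsort(feat_dists)[:k]
    centers = db_positions[top]                         # (k, 2) positions of the matches
    radii = np.array([g(d) for d in feat_dists[top]])   # recovered geometric distances

    # Linearize the circle constraints ||x - c_i||^2 = r_i^2 against the first match
    A = 2.0 * (centers[1:] - centers[0])
    b = (np.sum(centers[1:] ** 2, axis=1) - np.sum(centers[0] ** 2)
         - radii[1:] ** 2 + radii[0] ** 2)
    pos, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pos  # estimated (x, y) of the query

# Purely illustrative linear mapping from feature distance to geometric distance
g = lambda feat_dist: 2.0 * feat_dist
```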

4. BEVPlace network (important part)

In road scenarios, the LiDAR sensor on a car or robot essentially moves on the ground plane. Since we generate BEV images by projecting point clouds onto the ground plane, a change in the sensor's viewpoint causes a rotational transformation of the image. To achieve robust place recognition, we aim to design a network $f$ that extracts rotation-invariant features from BEV images. Denoting the rotation transformation $R \in SO(2)$ applied to a BEV image $I$ as $R \circ I$, the rotation invariance of $f$ can be expressed as:

$$f(R \circ I) = f(I), \quad \forall R \in SO(2)$$

A straightforward way to achieve this invariance is to train the network with data augmentation [17]. However, data augmentation typically requires a network with more parameters to learn rotations, and it may fail to generalize to combinations of rotations and scenes not present in the training set. In this work, we use a cascade of a group convolutional network and NetVLAD to achieve rotation invariance. Because the network is inherently rotation invariant, our BEVPlace has strong generalization ability.

4.1 Bird’s-eye image generation

We follow the approach of BVMatch [20] and construct images from point densities. We divide the ground space into grids with a grid size of 0.4 m. For a point cloud $m$, we count the number of points in each grid cell and use the normalized point density as the pixel intensity of the BEV image $I$.
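
A minimal NumPy sketch of this density-image construction might look as follows. Only the 0.4 m grid size comes from the text; the 40 m range and the max-normalization are assumptions for illustration (BVMatch's exact normalization may differ):

```python
import numpy as np

def point_cloud_to_bev(points, grid_size=0.4, extent=40.0):
    """Build a BEV density image from an (N, 3) point cloud.

    Points are binned into grid_size x grid_size cells on the ground plane
    within [-extent, extent] meters; the pixel intensity is the normalized
    point count per cell.
    """
    n_cells = int(2 * extent / grid_size)
    xy = points[:, :2]
    mask = np.all(np.abs(xy) < extent, axis=1)
    cols = ((xy[mask, 0] + extent) / grid_size).astype(int)
    rows = ((xy[mask, 1] + extent) / grid_size).astype(int)

    density = np.zeros((n_cells, n_cells), dtype=np.float32)
    np.add.at(density, (rows, cols), 1.0)   # count points per cell
    if density.max() > 0:
        density /= density.max()            # normalize counts to [0, 1]
    return density
```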

4.2 Group convolutional network (rotation equivariance)

Group convolutions treat feature maps as functions on the corresponding symmetry group [30]. Consider the 2D rotation group $SO(2)$: applying a group convolution $f_{gc}$ to a BEV image $I$ produces rotation-equivariant features, which can be written as

$$f_{gc}(R \circ I) = R' \circ f_{gc}(I)$$

That is, transforming the input $I$ by a rotation $R$ and then mapping it through $f_{gc}$ gives the same result as first mapping $I$ through $f_{gc}$ and then transforming the resulting features by some $R' \in SO(2)$. Usually, $f_{gc}$ is designed such that $R' = R$.
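
The paper builds on GIFT for this, but the equivariance-then-pool idea can be illustrated with a toy cyclic-group approximation in PyTorch: convolve rotated copies of the BEV image with a shared kernel, stack the responses along a group axis, and note that pooling over space and the group axis is (up to interpolation error) unchanged when the input is rotated by a multiple of the group's step angle. This is only a sketch of the principle, not the network used in BEVPlace:

```python
import torch
import torch.nn.functional as F
import torchvision.transforms.functional as TF

def lifted_group_features(img, weight, n_rot=8):
    """Toy C_n "lifting" layer: one shared convolution applied to n_rot
    rotated copies of the input, responses stacked along a group axis.

    img:    (1, C_in, H, W) BEV image tensor
    weight: (C_out, C_in, k, k) shared convolution kernel
    Returns a tensor of shape (n_rot, C_out, H, W).
    """
    feats = []
    for i in range(n_rot):
        angle = 360.0 * i / n_rot
        rotated = TF.rotate(img, angle)                            # rotate the input
        resp = F.conv2d(rotated, weight, padding=weight.shape[-1] // 2)
        feats.append(TF.rotate(resp, -angle))                      # align back to a common frame
    return torch.stack(feats, dim=0).squeeze(1)

# Rotation-invariance check: pooling over space and the group axis gives
# (up to interpolation error) the same descriptor for a rotated input,
# as long as the test rotation is a multiple of 360 / n_rot degrees.
img = torch.rand(1, 1, 128, 128)
w = torch.rand(16, 1, 5, 5)
desc_a = lifted_group_features(img, w).amax(dim=(0, 2, 3))
desc_b = lifted_group_features(TF.rotate(img, 90.0), w).amax(dim=(0, 2, 3))
print(torch.allclose(desc_a, desc_b, atol=1e-2))
```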

Group convolutions have been studied for several years, and there are mature group convolution designs [30, 18, 32]. We implement our network based on GIFT [18]. GIFT was originally designed for image matching and produces distinctive local features. Our main modification to GIFT is to remove the scale features, since there is no scale difference between BEV images. More details of the network implementation are given in the supplementary material.


4.3 Loss function

Several loss functions have been proposed for the LiDAR-based place recognition problem [1, 33]. In this work, we use the simple and commonly used lazy triplet loss [1] to train our network, whose formula is:

$$\mathcal{L}_{lazy} = \max_j \left( \left[ \alpha + \delta_{pos} - \delta_{neg_j} \right]_+ \right)$$

where $[\cdot]_+$ denotes the hinge function $\max(\cdot, 0)$, $\alpha$ is the margin, $\delta_{pos}$ is the feature distance between the query and the closest positive sample, and $\delta_{neg_j}$ is the feature distance between the query and the $j$-th negative sample.
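
A compact PyTorch sketch of a lazy-triplet-style loss (hinge against the hardest negative only, in the spirit of PointNetVLAD [1]) could look like the following; the margin value and tensor shapes are illustrative assumptions, not the paper's training configuration:

```python
import torch

def lazy_triplet_loss(query, positives, negatives, margin=0.3):
    """Lazy-triplet-style loss: hinge against the hardest negative only.

    query:     (d,)   global descriptor of the query BEV image
    positives: (P, d) descriptors of structurally similar (nearby) point clouds
    negatives: (N, d) descriptors of dissimilar (far away) point clouds
    The margin of 0.3 is a placeholder, not taken from the paper.
    """
    d_pos = torch.norm(positives - query, dim=1).min()   # distance to the closest positive
    d_neg = torch.norm(negatives - query, dim=1)         # distances to all negatives
    return torch.clamp(margin + d_pos - d_neg, min=0.0).max()

# Toy usage with random descriptors
loss = lazy_triplet_loss(torch.randn(256), torch.randn(2, 256), torch.randn(10, 256))
print(loss.item())
```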

… For details, please refer to Guyuehome.

Source: blog.csdn.net/lovely_yoshino/article/details/130111243