Classic Literature Reading--RigidFusion (Dynamic Obstacle SLAM)

0. Introduction

In real-world SLAM, scenes with many dynamic obstacles easily cause tracking loss. The traditional solution is to filter out the dynamic obstacles, but the paper "RigidFusion: Robot Localisation and Mapping in Environments with Large Dynamic Rigid Objects" instead tracks the dynamic objects as a rigid body. The paper is not especially innovative, but it summarizes the state of research very well. At present its code is not open source, but video explanations are available.

The paper proposes a novel RGB-D SLAM method that simultaneously segments, tracks and reconstructs both the static background and large dynamic rigid objects that may occlude a major part of the camera's field of view. Previous methods either treat dynamic parts of the scene as outliers, and are therefore limited to small changes in the scene, or rely on prior information about all objects in the scene for robust camera tracking. RigidFusion instead treats all dynamic parts as a single rigid body and segments and tracks the static and dynamic parts together. As a result, in environments with large occlusions caused by dynamic objects, both the static background and the rigid dynamic components can be localized and reconstructed simultaneously.


1. Contributions

Tasks such as handling and transporting objects in an unmanned warehouse require mobile manipulation: the robot must localize itself in the static environment while moving, stay robust to disturbances from dynamic objects, and track the objects it needs to manipulate. These two problems have each been addressed separately before, but solving both simultaneously is rare.

  • A new SLAM framework that uses an RGB-D camera and motion priors with potential drift to simultaneously segment, track and reconstruct the static background and a dynamic rigid body.

  • A dense SLAM method that is robust to large dynamic occlusions (more than 65% of the field of view) in the visual input and does not depend on the initialization of the static or dynamic models.

  • A new RGB-D SLAM dataset containing dynamic objects that cause large occlusions in the scene, together with ground-truth trajectories.

2. Details

The paper proposes a SLAM framework that treats the dynamic parts as a single rigid body and uses motion priors to segment the static and dynamic parts. The segmented images are then used to track the camera and to reconstruct the background and object models.

The figure below shows the reconstruction pipeline of the method. It takes as input two consecutive RGB-D keyframes A and B, together with motion priors for the static and dynamic parts, $\tilde{\xi}_s, \tilde{\xi}_d \in se(3)$, and the segmentation of the previous frame, $\tilde{\Gamma}_A \in \mathbb{R}^{w \times h}$. First, the motion priors are used to detect whether the object is moving. Then, if it is moving, the segmentation $\tilde{\Gamma}_B$ and the rigid body motions $\xi_s$, $\xi_d$ are jointly estimated from frame-to-frame alignment. The resulting segments are used to reconstruct the static environment and the dynamic object, and frame-to-model alignment is used for camera localization.
[Figure: overview of the RigidFusion reconstruction pipeline]

2.1 Image Clustering

Similar to [4], each new pair of intensity and depth images $(I, D) \in \mathbb{R}^{W \times H}$ is divided into $K$ geometric clusters $V = \{V_i \mid i = 1, \cdots, K\}$ using K-means clustering. Each cluster is assumed to satisfy the rigidity condition, so each rigid body can be approximated by a combination of clusters. Each cluster is also assigned a score $\gamma_i \in [0, 1]$ that represents the probability that the cluster belongs to the static rigid body: $\gamma_i = 0$ indicates a dynamic cluster and $\gamma_i = 1$ indicates a static cluster. For RGB-D frame A, the scores of all clusters are denoted $\gamma_A \in \mathbb{R}^K$.
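As a rough illustration of the clustering step, the sketch below runs plain Lloyd's K-means on a handful of 3D points and attaches one score per cluster. It is only a minimal sketch under stated assumptions, not the paper's implementation: the function name `kmeans_clusters`, the toy points, the use of raw 3D coordinates as features, and the all-static initial scores are all chosen here for illustration.

```python
import random


def kmeans_clusters(points, k, iterations=20, seed=0):
    """Group 3D points into k clusters with plain Lloyd's K-means.

    points: list of (x, y, z) tuples, e.g. back-projected depth pixels.
    Returns (labels, centers); labels[i] is the cluster index of points[i].
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # pick k distinct points as initial centers
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center
        # (squared Euclidean distance).
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centers


# Toy data (hypothetical): two well-separated groups of 3D points.
points = [
    (0.0, 0.0, 1.0), (0.1, 0.0, 1.0), (0.0, 0.1, 1.1),
    (2.0, 2.0, 3.0), (2.1, 2.0, 3.0), (2.0, 2.1, 3.1),
]
K = 2
labels, centers = kmeans_clusters(points, K)

# One score per cluster: 1.0 = static, 0.0 = dynamic.
# Starting with every cluster static is an assumption of this sketch.
gamma = [1.0] * K
```

Here `labels` plays the role of the cluster assignment $V$ and `gamma` the role of the per-frame score vector $\gamma_A \in \mathbb{R}^K$.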

2.2 Distance Judgment

If the difference between the two motion priors, $\|\tilde{\xi}_s - \tilde{\xi}_d\|^2$, is less than the threshold $\hat{d}$, the object is considered not to be moving and all clusters in the image are treated as static. Otherwise, the scores $\gamma_B$ of the current frame and the relative motions of the static and dynamic rigid bodies, $\xi_s$ and $\xi_d$, are jointly optimized.
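The threshold test itself is simple enough to sketch in a few lines. The example below assumes each motion prior is given as a 6-element vector of twist coordinates in $se(3)$; the function name `should_optimize_jointly` and the numeric values, including the threshold, are hypothetical and not taken from the paper.

```python
def should_optimize_jointly(xi_s_prior, xi_d_prior, d_hat):
    """Return True when the two motion priors differ by at least d_hat.

    xi_s_prior, xi_d_prior: 6-element sequences, assumed to be the twist
    coordinates of the se(3) priors for the static and dynamic parts.
    d_hat: threshold on the squared difference (hypothetical value).
    """
    diff_sq = sum((a - b) ** 2 for a, b in zip(xi_s_prior, xi_d_prior))
    return diff_sq >= d_hat


# Hypothetical priors: the object moves with the background in the first
# case and has an extra translation along x in the second case.
static_prior = [0.01, 0.0, 0.0, 0.0, 0.0, 0.0]
object_prior_same = [0.01, 0.0, 0.0, 0.0, 0.0, 0.0]
object_prior_moving = [0.21, 0.0, 0.0, 0.0, 0.0, 0.0]
D_HAT = 0.01  # illustrative threshold only

run_joint_same = should_optimize_jointly(static_prior, object_prior_same, D_HAT)
run_joint_moving = should_optimize_jointly(static_prior, object_prior_moving, D_HAT)
```

With these values the joint optimization of $\gamma_B$, $\xi_s$ and $\xi_d$ would only run in the second case.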

2.3 Image Segmentation

The pixel-level segmentation $\tilde{\Gamma}_B \in \mathbb{R}^{w \times h}$ is then computed from the clusters and their scores. Similar to StaticFusion, weighted RGB-D images of the static and dynamic rigid bodies are computed from the segmentation $\tilde{\Gamma}_B$. These weighted images are used to reconstruct models of the background and the dynamic object, and to refine the estimated camera pose via frame-to-model alignment (Section V of the paper).
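One simple way to picture this step is to give every pixel the score of its cluster and then weight the image by that score. The sketch below does exactly that on a 2×2 toy image. It rests on assumptions: the per-pixel lookup without any smoothing, the weights $\Gamma$ for the static image and $1 - \Gamma$ for the dynamic image, and the names `pixel_segmentation` and `weighted_images` are illustrative and may differ from what the paper actually does.

```python
def pixel_segmentation(cluster_map, gamma):
    """Give every pixel the static-probability score of its cluster.

    cluster_map: 2D list of cluster indices, one per pixel.
    gamma: list of per-cluster scores in [0, 1].
    """
    return [[gamma[c] for c in row] for row in cluster_map]


def weighted_images(image, segmentation):
    """Split one image channel into static and dynamic weighted images.

    Assumption of this sketch: static weight = score, dynamic weight = 1 - score.
    """
    static = [
        [s * v for v, s in zip(img_row, seg_row)]
        for img_row, seg_row in zip(image, segmentation)
    ]
    dynamic = [
        [(1.0 - s) * v for v, s in zip(img_row, seg_row)]
        for img_row, seg_row in zip(image, segmentation)
    ]
    return static, dynamic


# Toy 2x2 example (hypothetical values): three clusters with scores
# static (1.0), dynamic (0.0) and uncertain (0.5).
cluster_map = [[0, 0], [1, 2]]
gamma_B = [1.0, 0.0, 0.5]
depth = [[2.0, 2.0], [4.0, 4.0]]

Gamma_B = pixel_segmentation(cluster_map, gamma_B)
static_depth, dynamic_depth = weighted_images(depth, Gamma_B)
```

The same weighting would be applied to the intensity channel, so that each model is fused only from the pixels likely to belong to it.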

…For details, please refer to Gu Yueju


Origin blog.csdn.net/lovely_yoshino/article/details/127527794