Classic paper reading: Online Monocular Lane Mapping (online monocular lane mapping with Catmull-Rom splines)

0. Introduction

Completing tasks such as SLAM mapping with only a monocular camera is very important for the autonomous driving industry. The paper "Online Monocular Lane Mapping Using Catmull-Rom Spline" introduces a spline-based online monocular lane mapping method that relies only on a single camera and odometry. The proposed technique models lane association as a bipartite graph assignment problem and weights the edges by combining Chamfer distance, pose uncertainty, and lateral order consistency. In addition, control point initialization, spline parameterization, and optimization are carefully designed to progressively create, extend, and refine the splines. The code has been open-sourced on GitHub.

1. Main contributions

Based on Catmull-Rom spline representation, a complete online lane mapping system is designed, as shown in Figure 1. The proposed system allows autonomous vehicles to construct local lane maps in real time using temporal images and odometry data, which can be used for self-localization, planning, and crowdsourced updates [4]. Overall, the specific contributions can be summarized as follows:

  1. We propose an online monocular lane mapping system, including lane tracking and map optimization subsystems. The system can directly output lightweight instance-level lane maps represented by Catmull-Rom splines without the need for offline vectorization.
  2. Every part of the system has been carefully designed around the characteristics of lane markings and splines, including lane association, pose estimation, and spline initialization, extension, and optimization.
  3. Experiments on the publicly available dataset OpenLane show that our proposed method can improve lane association, odometry accuracy and map quality.

Figure 1. Experimental results on the OpenLane dataset. Gray points are the multi-frame detections accumulated using odometry. Colored curves are points sampled from the splines of different lane instances in the map. Red spheres are the control points of the splines.

2. System Overview

The structure of the proposed monocular lane mapping system is shown in Figure 2. The system takes only a monocular camera and odometry (e.g. VIO, LIO) as input, without a prior navigation map or aerial imagery, and outputs a compact map of lane markings represented as splines. Specifically, the framework consists of two subsystems: lane tracking and map optimization. In lane tracking, a neural network predicts 3D lane markings directly from the input image. These predictions are then post-processed to meet downstream needs, as detailed in Section 3.1. Next, the processed lane markings are associated with those in the map, taking into account the pose provided by the odometry (Section 3.2). Finally, the pose is updated based on the association results (Section 3.3). In map optimization, splines are first initialized from scratch or extended based on newly obtained detections (Section 4.1). Then the incremental optimization framework iSAM2 [28] is applied to progressively update the splines in the map by adding new observations without losing information from past observations (Section 4.2).

Figure 2. Block diagram of the full pipeline of the proposed monocular lane mapping system. The system is divided into two parts: lane tracking and map optimization. The former handles lane marking association and pose updates, while the latter handles initialization, extension, and optimization of the splines. The factor graph is shown on the right. Unlike conventional binary visual factors, the optimization involves point-to-spline factors, which are used to optimize four control point landmarks.

3. Lane tracking

3.1 Lane representation

In this study, we use PersFormer [12] to obtain lane detection results, which consist of unordered lane marking points and their instance-level labels. We distinguish between two lane representations: observations (detections) and landmarks (the map). Because the network predictions are sparse and noisy, we first transform them into a local reference frame (LRF) in which the main direction of the lane is aligned with the X-axis. We then fit cubic polynomials to the X-Y and X-Z coordinates and sample them at a fixed resolution (0.5 meters in our experiments). The lane observation can therefore be expressed as:

$D_i = \{{}^dp_{1:M},\ f_{xy},\ f_{xz},\ c,\ {}^d\sigma_{1:M}\}$

Here, for a detected lane, ${}^dp_{1:M} \in \mathbb{R}^3$ are the sampled points, $f_{xy}$ and $f_{xz}$ are the polynomial coefficients, and $c$ is the category (e.g. double yellow line, solid white line). ${}^d\sigma_{1:M}$ is the standard deviation of the detection noise, which can be set proportional to the 2-norm of each point.
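The preprocessing of a detected lane (local frame, cubic fits, fixed-step sampling) can be sketched with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: the name `fit_and_resample` is hypothetical, and the local reference frame is assumed to come from a PCA of the X-Y coordinates, since the text only says that the main lane direction is aligned with the X-axis.

```python
import numpy as np

def fit_and_resample(points, resolution=0.5):
    """Fit cubic polynomials y(x) and z(x) in a local frame, then resample.

    points: (M, 3) array of unordered 3D lane points of one instance.
    Returns the resampled points in the original frame plus both coefficient sets.
    """
    points = np.asarray(points, dtype=float)
    centroid = points.mean(axis=0)
    centered = points - centroid

    # Assumption: main lane direction = dominant eigenvector of the X-Y covariance.
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered[:, :2].T))
    main_dir = eigvecs[:, np.argmax(eigvals)]
    yaw = np.arctan2(main_dir[1], main_dir[0])

    # Rotation about Z that maps the main direction onto the local X-axis.
    c, s = np.cos(-yaw), np.sin(-yaw)
    R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    local = centered @ R.T

    # Cubic fits on the X-Y and X-Z coordinates.
    f_xy = np.polyfit(local[:, 0], local[:, 1], 3)
    f_xz = np.polyfit(local[:, 0], local[:, 2], 3)

    # Sample at a fixed step along the local X-axis.
    xs = np.arange(local[:, 0].min(), local[:, 0].max() + 1e-9, resolution)
    sampled_local = np.stack(
        [xs, np.polyval(f_xy, xs), np.polyval(f_xz, xs)], axis=1
    )

    # Rotate and translate back to the original frame.
    sampled = sampled_local @ R + centroid
    return sampled, f_xy, f_xz
```

Sampling along the local X-axis gives ordered points, which the unordered network output does not provide.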

The lane landmarks in the map are represented by Catmull-Rom splines, written $L_j = \{P_{0:N+1}, c\}$, where $P_{0:N+1}$ are the control points of the spline and $c$ is the category. The piecewise spline $L_j$ has $N$ segments (every two adjacent segments share three control points), and in each segment $l$ four adjacent control points determine the curve point ${}^lp(u)$:

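${}^lp(u) = \begin{bmatrix} 1 & u & u^2 & u^3 \end{bmatrix} \begin{bmatrix} 0 & 1 & 0 & 0 \\ -\tau & 0 & \tau & 0 \\ 2\tau & \tau-3 & 3-2\tau & -\tau \\ -\tau & 2-\tau & \tau-2 & \tau \end{bmatrix} \begin{bmatrix} {}^lP_0 \\ {}^lP_1 \\ {}^lP_2 \\ {}^lP_3 \end{bmatrix}$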

where $u \in [0, 1]$ is called the parameter, and the process of finding $u$ is called parameterization; $\tau$ controls how sharply the curve bends and is usually set to 1/2. Without loss of generality, we denote these four control points as $[{}^lP_0, {}^lP_1, {}^lP_2, {}^lP_3]$. In the following sections, the superscripts $l$ and $d$ may be omitted for brevity. A point on the curve can thus be viewed as a weighted sum of the four control points. The weighting coefficients are shown in Figure 3(a).
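The weighted-sum view can be made concrete with the standard Catmull-Rom (cardinal spline) basis for tension $\tau$. The sketch below uses only the standard library. The function names are hypothetical, and the weights are the textbook form, which is assumed to match the paper's equation.

```python
def catmull_rom_weights(u, tau=0.5):
    """Blending weights of the four control points for a parameter u in [0, 1]."""
    u2, u3 = u * u, u * u * u
    return (
        -tau * u + 2.0 * tau * u2 - tau * u3,
        1.0 + (tau - 3.0) * u2 + (2.0 - tau) * u3,
        tau * u + (3.0 - 2.0 * tau) * u2 + (tau - 2.0) * u3,
        -tau * u2 + tau * u3,
    )

def catmull_rom_point(ctrl, u, tau=0.5):
    """Point on one segment given its four control points [P0, P1, P2, P3]."""
    w = catmull_rom_weights(u, tau)
    dim = len(ctrl[0])
    return tuple(sum(w[i] * ctrl[i][d] for i in range(4)) for d in range(dim))
```

The weights sum to 1 for any `u` and `tau`, and the segment runs from P1 at `u = 0` to P2 at `u = 1`, so P0 and P3 only shape the curve. This is why adjacent segments share three control points.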

Figure 3. (a) Coefficients of the four control points when $\tau = 1/2$. (b) Coarse-to-fine parameterization. Red stars represent control points and yellow points represent sampled path points. We first find the two closest control points and then determine the parameter by finding the foot of the perpendicular on the polyline. Note that the curvature of actual lane markings is much smaller than in the illustration.
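The coarse-to-fine parameterization in Figure 3(b) can be sketched as follows. This is an illustrative sketch, not the authors' code: `parameterize` is a hypothetical name, the coarse step is taken to be a nearest-control-point search, and the parameter is approximated by the normalized position of the perpendicular foot on the chord between two control points.

```python
import math

def _project_on_chord(q, a, b):
    """Foot of the perpendicular from q onto chord a-b, returned as (u, distance)."""
    ab = [bj - aj for aj, bj in zip(a, b)]
    aq = [qj - aj for aj, qj in zip(a, q)]
    denom = sum(c * c for c in ab)
    u = sum(x * y for x, y in zip(aq, ab)) / denom if denom > 0.0 else 0.0
    u = min(1.0, max(0.0, u))  # clamp so the foot stays on the chord
    foot = [aj + u * c for aj, c in zip(a, ab)]
    return u, math.dist(q, foot)

def parameterize(q, control_points):
    """Return (i, u): q maps to parameter u on the segment between P_i and P_{i+1}.

    Returns None when q is nearest to an end control point that bounds no segment.
    """
    n = len(control_points)
    # Coarse step: nearest control point.
    k = min(range(n), key=lambda i: math.dist(q, control_points[i]))
    # Fine step: test the chords on either side of it. Only chords P_i-P_{i+1}
    # with 1 <= i <= n - 3 have the four control points a segment needs.
    best = None
    for i in (k - 1, k):
        if 1 <= i <= n - 3:
            u, dist = _project_on_chord(q, control_points[i], control_points[i + 1])
            if best is None or dist < best[2]:
                best = (i, u, dist)
    return None if best is None else (best[0], best[1])
```

Because real lane markings have low curvature, the chord is close to the curve and the chord parameter is a reasonable initial value for `u`.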


3.2 Lane association (key content)

Given a set of detections $D$ and a set of landmarks $L$, lane association either matches each detected lane $D_i$ with an existing landmark $L_j$ or identifies it as a new lane. To do this, we model the problem as an assignment problem on a bipartite graph and solve it with the Kuhn-Munkres (KM) algorithm [29], [30]. The key question is how to determine the edges and their weights. First, associated lanes should have the same category. Another natural idea is to sample points on the spline and then compute the distance between the two point sets, such as the Chamfer distance, to measure the similarity between $D_i$ and $L_j$. However, the Chamfer distance can always be computed, so every vertex in the bipartite graph would have an edge and no new lane marking would ever be created. Therefore, inspired by the recent KISS-ICP [31] and the work of Kim et al. [32], we restrict the search range and place an upper bound on the Chamfer distance.

Let $(\Delta R, \Delta t)$ denote the difference between the odometry pose and the true pose. Unlike the conventional assumption of noise in the Lie group tangent space, we use only two parameters, the standard deviations of rotation and translation, $\sigma_\theta$ and $\sigma_t$, where:

Insert image description here

Then, for a point $p_k$ in $D_i$, the upper bound $\delta_k$ on the distance between it and its true matching point (using the 95% two-standard-deviation rule) can be expressed as:

Insert image description here

Therefore, given the points $q_{1:Q}$ sampled on the spline, the detection points $p_{1:M}$, and the odometry pose $T$, the distance between $D_i$ and $L_j$ is:

Insert image description here

where $n_a$ is the number of points that satisfy the distance threshold, $\mathbb{I}$ is the indicator function, $q_{k'}$ is the sampled point nearest to $p_k$, and the factor $\sqrt{\frac{M}{n_a}}$ penalizes low match rates. In addition, we set an upper limit of $\sqrt{2}\,\mathrm{mean}(\delta_k)$ to decide whether a new lane has appeared. The factor $\sqrt{2}$ corresponds to requiring that at least half of the points match, because $\sqrt{M/n_a} = \sqrt{2}$ when $n_a = M/2$.
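A gated Chamfer distance with these ingredients can be sketched as below. The exact equation is not reproduced in this text, so the normalization here (mean of the accepted nearest-neighbor distances times $\sqrt{M/n_a}$) is an assumption, as are the function names. The per-point bounds `deltas` are taken as given inputs.

```python
import math

def gated_chamfer(points, samples, deltas):
    """Assumed form of the gated distance between a detection and a landmark.

    points:  detection points already transformed by the odometry pose T
    samples: points sampled on the landmark spline
    deltas:  per-point upper bounds delta_k on the matching distance
    """
    m = len(points)
    total, n_a = 0.0, 0
    for p, delta in zip(points, deltas):
        d = min(math.dist(p, q) for q in samples)  # distance to nearest sample q_k'
        if d < delta:  # indicator function: keep only points within the bound
            total += d
            n_a += 1
    if n_a == 0:
        return math.inf  # nothing matched, so no edge should be created
    return math.sqrt(m / n_a) * total / n_a  # penalize low match rates

def is_new_lane(distance, deltas):
    """Apply the upper limit sqrt(2) * mean(delta_k) from the text."""
    return distance > math.sqrt(2.0) * sum(deltas) / len(deltas)
```

A detection for which `is_new_lane` is true against every landmark gets no edge in the bipartite graph and therefore starts a new spline.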

Figure 4. (a) Detected lane markings are shown as yellow points, and lane markings in the map are shown as blue points. $u$ denotes an association obtained from Euclidean distance, and red marks a wrong association. (b) An undirected graph is created whose vertices represent associations. Two associations share an edge if they have lateral order consistency. (c) The definition of the weight of each edge. The degree of a vertex is the sum of the weights of its edges. (d) The degree of a vertex is used as the edge weight of the bipartite graph in the assignment problem.

Figure 5. (a) Two frames of lane markings that need to be associated. (b) Visualization of the association results for these two frames. In each frame, the color represents the category of the lane marking. In the associations, red marks wrong matches and green marks correct ones. Association based solely on Euclidean distance may produce wrong results because of pose uncertainty.

Nevertheless, relying solely on distance for data association may lead to ambiguity, especially in the presence of $\Delta R$, as shown in Figure 5. In Euclidean space, detections and landmarks are not well separated and may even intersect. To this end, we further weight the edges using lateral order consistency. Like most graph matching methods [33], we define the consistency between two association pairs (edges in the bipartite graph). To better illustrate data association, we construct an example in Figure 4. There are four detected lanes, five landmarks, and seven association pairs based on Euclidean distance, three of which are wrong (but still within the upper distance limit). Each association pair acts as a vertex in the graph. Whether an edge exists between two vertices depends on the lateral order consistency of the associations they represent. For example, in BEV, for $u_1$ and $u_3$, we sample two points on lane $b'$ (such as the start and end points), create a straight line, and determine whether a sampled point of lane $a'$ (such as the midpoint) lies above or below that line; we then do the same for lane $c$. If both have the same relative relationship, then $u_1$ and $u_3$ are consistent. In addition, $u_2$ and $u_3$ are inconsistent because they share the landmark $b'$. Without loss of generality, we take $u_3$ as an example and calculate its consistency score $S(u_3)$:

Insert image description here

where $C$ denotes the set of vertices connected to $u_3$. For $u_3$ and $u_1$, we have

Insert image description here

where $\mathrm{abs}(\cdot)$ denotes the absolute value and $\phi$ represents the minimum of the point-to-line distances mentioned above.
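The lateral order test itself (is one lane on the same side of the other in both the detection and the map?) can be sketched as below. Only the boolean consistency check is shown; the edge weight built from $\mathrm{abs}(\cdot)$ and $\phi$ is not reproduced. The names are hypothetical, points are BEV `(x, y)` tuples, and each lane is assumed to be ordered along the same driving direction.

```python
def side_of_line(a, b, p):
    """Return +1 if p is left of the directed line a->b, -1 if right, 0 if on it."""
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    return (cross > 0) - (cross < 0)

def laterally_consistent(assoc_1, assoc_2, detections, landmarks):
    """Check lateral order consistency of two (detection_id, landmark_id) pairs."""
    det_1, lm_1 = assoc_1
    det_2, lm_2 = assoc_2
    if lm_1 == lm_2:
        return False  # associations that share a landmark are inconsistent

    # Line through the start and end points of the second pair's lanes,
    # tested against the midpoint of the first pair's lanes.
    ref_lm, other_lm = landmarks[lm_2], landmarks[lm_1]
    ref_det, other_det = detections[det_2], detections[det_1]
    side_in_map = side_of_line(ref_lm[0], ref_lm[-1], other_lm[len(other_lm) // 2])
    side_in_det = side_of_line(ref_det[0], ref_det[-1], other_det[len(other_det) // 2])
    return side_in_map != 0 and side_in_map == side_in_det
```

Because the test depends only on relative ordering, it is insensitive to the small rotation and translation errors that confuse a purely distance-based match.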

Finally, the weight of each edge in the bipartite graph is obtained by multiplying two scores: the reciprocal of the Chamfer distance described earlier and the corresponding lateral order consistency score $S$.
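The final weighting and assignment step can be illustrated with a small sketch. For brevity it uses an exhaustive search over one-to-one assignments in place of the KM solver, which is only practical for a handful of lanes. The names `edge_weight` and `best_assignment` are hypothetical.

```python
import math

def edge_weight(chamfer, consistency):
    """Edge weight = (1 / Chamfer distance) * consistency score; 0 means no edge."""
    if chamfer <= 0.0 or math.isinf(chamfer):
        return 0.0
    return consistency / chamfer

def best_assignment(weights):
    """Maximize the total weight of a one-to-one detection-to-landmark matching.

    weights[i][j] > 0 is the edge weight between detection i and landmark j.
    Returns a list with a landmark index, or None (new lane), per detection.
    """
    n_det = len(weights)
    best = {"score": -1.0, "match": [None] * n_det}

    def search(i, used, match, score):
        if i == n_det:
            if score > best["score"]:
                best["score"], best["match"] = score, list(match)
            return
        # Option 1: leave detection i unmatched (it would start a new lane).
        match.append(None)
        search(i + 1, used, match, score)
        match.pop()
        # Option 2: match it to any free landmark it has an edge to.
        for j, w in enumerate(weights[i]):
            if w and j not in used:
                used.add(j)
                match.append(j)
                search(i + 1, used, match, score + w)
                match.pop()
                used.remove(j)

    search(0, set(), [], 0.0)
    return best["match"]
```

A greedy choice per detection can block a better global matching, which is why the problem is posed as an assignment problem. A production system would use a polynomial-time solver such as KM instead of this search.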

3.3 Pose update

The problem can be formalized as follows. Let $T_t$ denote the pose of the camera relative to the world coordinate system at time $t$. Using the association results from Section 3.2, we project ${}^dp_k$ into the world coordinate system with $T_t$ and determine the parameter $u_k$ by finding its foot point $p(u_k)$ on the associated spline. Using the local direction $d_k$ at $p(u_k)$, we define a point-to-tangent residual, because lane markings only provide lateral constraints for pose estimation. The overall goal of registration is to find

…For details, please refer to Guyueju.
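The idea behind the point-to-tangent residual in Section 3.3 can be sketched as the part of the point-to-foot offset that is perpendicular to the local lane direction. This is one common way to express a purely lateral constraint and is an assumption here, since the exact residual is not reproduced in this text. The function name is hypothetical.

```python
import math

def point_to_tangent_residual(p_world, foot, direction):
    """Component of (p_world - foot) perpendicular to the local lane direction.

    p_world:   detection point after projection with the current pose estimate
    foot:      foot point p(u_k) on the associated spline
    direction: local spline direction d_k at the foot point (need not be unit length)
    """
    norm = math.sqrt(sum(c * c for c in direction))
    d = [c / norm for c in direction]
    diff = [a - b for a, b in zip(p_world, foot)]
    along = sum(x * y for x, y in zip(diff, d))  # longitudinal part, discarded
    return [x - along * c for x, c in zip(diff, d)]
```

Sliding the point along the lane leaves the residual unchanged, which matches the observation that lane markings constrain the pose only laterally.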

Origin blog.csdn.net/lovely_yoshino/article/details/131895502