Classic paper reading: MapEX (map-free BEV perception)

0. Introduction

Online estimation of high-definition maps (HDMaps) from sensors provides a low-cost alternative to traditional manual HDMap acquisition. It is therefore expected to reduce the cost of autonomous driving systems that already rely on HDMaps, and it may even make HDMaps usable in new systems.

The paper "Mind the map! Accounting for existing map information when estimating online HDMaps from sensor data" proposes improving online HDMap estimation by taking existing maps into account. It identifies three plausible types of existing maps (simple, noisy and old maps). It also introduces MapEX, a new online HDMap construction framework that is aware of existing maps. MapEX achieves this by encoding map elements as queries and by improving the matching algorithm used to train classic query-based map estimation models.

Figure 1. We propose to use existing map information - even if inaccurate - to better estimate online HDMaps from sensor inputs. In doing so, we simplify the problem from generating a map using only sensors to using an always-available map assisted by sensors.

1. Main contributions

In summary, the main contributions of MapEX are as follows:

  1. Take existing map information into account when estimating online HDMaps from sensor data;

  2. Discuss plausible cases in which existing maps are imperfect. We also provide practical implementations and code for these scenarios on the nuScenes dataset;

  3. Introduce MapEX, a new query-based HDMap acquisition scheme that can incorporate map information when estimating online HDMaps from sensors. In particular, MapEX introduces a new way to incorporate existing map information through existing (EX) queries, as well as a way to help the model learn to exploit this information by pre-attributing predictions to GT elements during training.

2. Review of online local high-precision map construction work

We give a brief overview of HDMaps in autonomous driving. We first discuss the use of HDMaps in trajectory prediction, then their acquisition, and finally online HDMap construction itself.

HDMaps for trajectory prediction: Autonomous driving often requires a large amount of information about the world in which the vehicle navigates. This information is often embedded in rich HDMaps that serve as input to suitably modified neural networks. HDMaps have proven critical to the performance of trajectory prediction. Some trajectory prediction methods are even built explicitly on an HDMap representation, so access to an HDMap is strictly required.

HDMap acquisition and maintenance: Traditional HDMaps are expensive to acquire and maintain. The HDMaps used in prediction are only a simplified version that contains map elements (lane dividers, road boundaries, etc.) and omits much of the complex information found in full HDMaps, yet they still require very precise measurements. As a result, many companies have been moving towards the less stringent standard of medium-definition maps (MDMaps) or even satellite navigation maps (Google Maps, SDMaps). Crucially, an MDMap accurate to a few meters is a good example of an existing map that provides valuable information for the online HDMap generation process. Our map scenario 2a explores an approximation of this situation.

Online HDMap construction from sensors: Online HDMap construction has therefore become central to map-light and map-free perception. While some work focuses on predicting virtual map elements, i.e., lane centerlines, other work focuses on more visually identifiable map elements: lane dividers, road boundaries, and crosswalks. Perhaps because visual elements are easier for sensors to detect, the latter approach has made rapid progress over the past year. Interestingly, the latest such method, MapTRv2, does provide an auxiliary setting for detecting lane centerlines. This suggests a convergence towards more complex schemes that include a large number of additional map elements (traffic lights, etc.).

This work is adjacent to the more commonly studied change detection problem, which aims to detect changes in maps (e.g. crossings). The goal of MapEX is to generate accurate online HDMaps with the help of existing (possibly very different) maps, and it does so within the current online HDMap construction problem. As a result, it does not only correct small errors in the map but offers a more expressive framework that can accommodate any change (e.g. distorted lines, very noisy elements).

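[Figure 2: HDMap representation and existing-map scenarios: (a) real HDMap, (b) boundaries only, (c) noisy map, (d) substantially changed map.]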

What existing maps can we use?

Our core proposition is that leveraging existing maps facilitates online HDMap construction. We believe there are many plausible circumstances in which imperfect maps may arise.

Online HDMap representation

We adopt the standard format for HDMaps generated online from sensors: we consider HDMaps to consist of 3 types of polylines, namely road boundaries, lane dividers and crosswalks, drawn in green, stone gray and blue respectively, as shown in Figure 2a.

While real HDMaps are much more complex and more sophisticated representations have been proposed, the purpose of this work is to investigate how existing map information can be taken into account. Therefore, we use the most studied paradigm. The work will be directly applicable to the prediction of more map elements, finer polylines or rasterized targets.

MapModEX: Simulating imperfect maps

Since acquiring real imprecise maps for standard map datasets would be expensive and time-consuming, we synthetically generate imprecise maps from existing HDMaps.

For this purpose we developed MapModEX, a standalone map modification library. It takes nuScenes map files and sample records and outputs, for each sample, the polyline coordinates of dividers, boundaries, and crosswalks in a given patch around the ego vehicle. Importantly, MapModEX can modify these polylines in various ways: deleting map elements, adding or moving crosswalks, adding noise to point coordinates, shifting the map, rotating the map, and warping the map. MapModEX will be made available upon release to facilitate further research on incorporating existing maps into online HDMap acquisition from sensors.

We implemented three challenging scenarios using the MapModEX package, as described below, generating 10 variants of scenarios 2 and 3 for each sample (scenario 1 admits only one variant). We chose to use a fixed set of modified maps to reduce costs during training and to reflect real-world situations where only a limited number of map variants may be available.

Scenario 1: Only borders are available

The first case is that only a rough HDMap (without lane dividers and crosswalks) is available, as shown in Figure 2b. Road boundaries are often associated with 3D physical landmarks such as sidewalk edges, while lane dividers and crosswalks are usually flat painted markings that are easier to miss. Additionally, crosswalks and lane dividers are often displaced by construction work or road deviations, or even partially hidden by tire tracks.

Therefore, it is reasonable to use HDMaps with only boundaries. The advantage is that only road boundaries need to be annotated, which reduces annotation cost. Additionally, locating only road boundaries may require less precise equipment and fewer updates.

Implementation: In practice, scenario 1 is simple to implement: we remove dividers and crosswalks from the available HDMaps.
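As a minimal sketch of this filtering step, assuming each map element is stored as a dictionary with a `cls` label and a list of `points` (a hypothetical layout chosen for illustration, not the MapModEX API):

```python
# Hypothetical in-memory layout for map elements; the real MapModEX
# data structures may differ.
def keep_boundaries_only(elements):
    """Scenario 1: drop lane dividers and crosswalks, keep road boundaries."""
    return [e for e in elements if e["cls"] == "boundary"]


hd_map = [
    {"cls": "boundary", "points": [(0.0, -7.0), (30.0, -7.0)]},
    {"cls": "divider", "points": [(0.0, 0.0), (30.0, 0.0)]},
    {"cls": "crossing", "points": [(10.0, -7.0), (10.0, 7.0)]},
]
existing_map = keep_boundaries_only(hd_map)
```

Because the filter is deterministic, it produces a single modified map per sample, which is why this scenario has only one variant.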

Scenario 2: Noisy map

The second possible scenario is that we only have very noisy maps, as shown in Figure 2c. One weakness of existing HDMaps is the need for high accuracy (on the order of a few centimeters), which puts great pressure on their acquisition and maintenance [11]. In fact, a key difference between HDMaps and the emerging MDMaps standard is the latter's lower accuracy (a few meters versus a few centimeters).

Therefore, we propose using noisy HDMaps to simulate situations where maps are less accurate, either because of a cheaper acquisition process or because the MDMaps standard is used instead. More interestingly, such less precise maps could be obtained automatically from sensor data. Although methods like MapTRv2 have achieved very impressive performance, they are not yet completely accurate: even with very lenient retrieval thresholds, their average precision is well below 80%.

Implementation: We propose two possible implementations of these noisy HDMaps to reflect the various conditions under which accuracy may be lacking. In the first, scenario 2a, we propose an offset noise setup: for each map element, we add to its position noise drawn from a Gaussian distribution with a standard deviation of 1 meter. This applies a uniform translation to each map element (divider, boundary, crosswalk), so that all points of an element move by the same offset. Such a setup should be a good approximation of the situation where human annotators quickly provide imprecise annotations from noisy data. We chose a standard deviation of 1 meter to reflect the MDMaps standard, which is accurate to a few meters.

We then test our method on a very challenging point-wise noise setup, scenario 2b: for each ground truth point (recall that a map element consists of 20 such points), we sample noise from a Gaussian distribution with a standard deviation of 5 meters and add it to the point coordinates. This provides a worst-case approximation of situations where the map is acquired automatically or has very imprecise localization.
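The difference between the two noise setups can be sketched as follows. This is an illustrative reading of the description above, not the MapModEX implementation: the function names are hypothetical, and drawing independent noise for the x and y coordinates is an assumption.

```python
import random


def offset_noise(elements, sigma=1.0, rng=random):
    """Scenario 2a: one Gaussian (dx, dy) shift shared by all points of an element."""
    noisy = []
    for points in elements:
        dx, dy = rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)
        noisy.append([(x + dx, y + dy) for x, y in points])
    return noisy


def pointwise_noise(elements, sigma=5.0, rng=random):
    """Scenario 2b: independent Gaussian noise on every single point."""
    return [
        [(x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma)) for x, y in points]
        for points in elements
    ]


rng = random.Random(0)  # fixed seed so the modified maps are reproducible
line = [(float(i), 0.0) for i in range(20)]  # one map element with 20 points
shifted = offset_noise([line], sigma=1.0, rng=rng)[0]
jittered = pointwise_noise([line], sigma=5.0, rng=rng)[0]
```

In `shifted` the element keeps its shape and is only translated, whereas in `jittered` the shape itself is destroyed, which is what makes scenario 2b so much harder.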

Scenario 3: The map has undergone substantial changes

The final case we consider is where we have access to old maps that were accurate in the past (see Figure 2d). It is quite common for painted markings such as crosswalks to be moved from time to time. In addition, cities sometimes substantially remodel problematic intersections or renovate districts to accommodate the increased traffic generated by new attractions.

It is therefore interesting to work with HDMaps that are valid in their own right but differ substantially from the actual HDMaps. Such maps should arise regularly when HDMaps are only updated every few years by maintainers to keep costs down. In this case, the existing map still provides some information about the world, but may not reflect temporary or recent changes.

Implementation: We approximate this in scenario 3a by making strong changes to existing HDMaps. We remove 50% of the crosswalks and lane dividers in the map, add some crosswalks (half as many as the remaining crosswalks), and finally apply a small warp to the map.

However, it is important to note that large parts of the global map will remain unchanged over time. We account for this in scenario 3b, where we randomly choose (with probability p = 0.5) to use the real HDMap instead of the perturbed version.
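The two variants of scenario 3 can be sketched roughly as below. The element layout and function names are hypothetical, the way new crosswalks are placed (shifted copies of surviving ones) is an assumption made only to keep the example short, and the small warp is left out.

```python
import random


def outdated_map(elements, rng, drop_ratio=0.5, shift=5.0):
    """Scenario 3a sketch: drop dividers/crossings, then add new crossings."""
    kept = []
    for cls in ("divider", "crossing"):
        group = [e for e in elements if e["cls"] == cls]
        n_keep = len(group) - int(len(group) * drop_ratio)
        kept += rng.sample(group, n_keep)
    kept += [e for e in elements if e["cls"] == "boundary"]
    # Add half as many crossings as remain. Each new crossing is a shifted
    # copy of a surviving one (placement strategy is an assumption).
    crossings = [e for e in kept if e["cls"] == "crossing"]
    for src in rng.sample(crossings, len(crossings) // 2):
        pts = [(x + shift, y) for x, y in src["points"]]
        kept.append({"cls": "crossing", "points": pts, "added": True})
    return kept  # the small warp described in the text is omitted here


def scenario_3b(true_map, rng, p_true=0.5):
    """Scenario 3b: return the true map with probability p_true."""
    return true_map if rng.random() < p_true else outdated_map(true_map, rng)


true_map = (
    [{"cls": "divider", "points": [(0.0, float(i)), (30.0, float(i))]} for i in range(4)]
    + [{"cls": "crossing", "points": [(float(i), -7.0), (float(i), 7.0)]} for i in range(4)]
    + [{"cls": "boundary", "points": [(0.0, s), (30.0, s)]} for s in (-7.0, 7.0)]
)
old_map = outdated_map(true_map, random.Random(0))
```

Elements flagged with `added` have no counterpart in the real map, which matters later when existing elements are associated with GT elements during training.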

3. MapEX: Leverage existing maps

To this end we propose MapEX (see Figure 3), a new framework for online HDMap construction. It follows the standard query-based online HDMap construction paradigm and processes existing map information through two key modules: a map query encoding module and a scheme that pre-attributes predictions to GT elements. The baseline used here is built on MapTRv2.

[Figure 3: Overview of the MapEX framework.]

Overview

The query-based core is shown by the gray elements in Figure 3. It first takes the sensor input (camera or lidar) and encodes it into a Bird's Eye View (BEV) representation that serves as sensor features. The map itself is obtained with a DETR-like detection scheme that detects up to N map elements. This is done by passing N×L learned query tokens (N is the maximum number of detected elements, L is the number of points predicted per element) into a Transformer decoder, which feeds sensor information to the query tokens through cross-attention with the BEV features. The decoded queries are then converted into map element coordinates by a linear layer, along with class predictions (including an additional background class), such that each group of L queries represents the L points of one map element (L=20 in this paper). Training is done by matching predicted map elements to GT map elements using some variant of the Hungarian algorithm. Once matched, the model is optimized so that each predicted map element fits the GT element it was matched to, using regression (for coordinates) and classification (for element classes) losses.

However, this framework cannot account for existing maps, which requires introducing new modules at two key levels. At the query level, we encode map elements into non-learnable EX queries. At the matching level, we pre-attribute queries to the GT map elements they represent.

The complete MapEX framework (shown in Figure 3) converts existing map elements into non-learnable EX queries and adds learnable queries to reach a fixed total of N×L queries. This complete set of queries is then passed to the Transformer decoder and converted into predictions by a linear layer as usual. During training, our attribution module pre-matches some predictions to GT elements, and the remaining predictions are matched normally using Hungarian matching. At test time, the decoded non-background queries produce the HDMap representation.

Convert map to EX query

Current online HDMap construction frameworks have no mechanism to account for existing map information. Therefore, we need a new scheme that translates existing maps into a form that the standard query-based online HDMap construction framework can understand. In MapEX, we propose a simple method that encodes existing map elements into EX queries for the decoder, as shown in Figure 4.

[Figure 4: Encoding existing map elements into EX queries.]

For a given map element, we extract L equidistant points, where L is the number of points we seek to predict for any map element. For each point, we build an EX query that encodes its map coordinates (x, y) in the first 2 dimensions and the map element class (divider, crossing, or boundary) as a one-hot encoding in the next 3 dimensions. The remainder of the EX query is padded with 0s to reach the standard query size used by the decoder architecture.
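The encoding described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' code: the class order in `CLASSES`, the default query size of 256, and both function names are hypothetical, and coordinates are used as-is without any normalization.

```python
import math

CLASSES = ("divider", "crossing", "boundary")  # order is an assumption


def resample_polyline(points, num_points):
    """Return num_points points evenly spaced by arc length along a polyline."""
    cum = [0.0]  # cumulative arc length at each vertex
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    out, seg = [], 0
    for k in range(num_points):
        target = cum[-1] * k / (num_points - 1)
        # advance to the segment that contains the target arc length
        while seg < len(points) - 2 and cum[seg + 1] < target:
            seg += 1
        seg_len = cum[seg + 1] - cum[seg]
        t = 0.0 if seg_len == 0 else (target - cum[seg]) / seg_len
        (x0, y0), (x1, y1) = points[seg], points[seg + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out


def encode_ex_queries(points, cls, num_points=20, query_dim=256):
    """Build one non-learnable EX query per resampled point of a map element."""
    one_hot = [1.0 if c == cls else 0.0 for c in CLASSES]
    queries = []
    for x, y in resample_polyline(points, num_points):
        q = [x, y] + one_hot                 # 2 coordinates + 3 class dimensions
        q += [0.0] * (query_dim - len(q))    # zero padding up to the query size
        queries.append(q)
    return queries


ex_queries = encode_ex_queries([(0.0, 0.0), (10.0, 0.0)], "boundary",
                               num_points=5, query_dim=16)
```

Only 5 of the query dimensions carry information; everything else is zero, which is the property the next paragraph relies on.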

Although this query design is very simple, it has two key benefits: it directly encodes the information of interest (point coordinates and element class), and it minimizes collisions with learned queries (thanks to the abundant zero padding).

Once we have the N_EX×L EX queries (L queries for each of the N_EX map elements in the existing map), we retrieve (N − N_EX) sets of L standard learnable queries from the pool of learnable queries. The resulting N×L queries are then fed to the decoder following the base method: in MapTR, the N×L queries are treated as independent queries, while MapTRv2 uses a more efficient decoupled attention scheme that groups together the queries of the same map element. After the map elements are predicted from the queries, they can be used directly at test time or matched to the GT during training.
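A small sketch of how the query set might be completed, with each group standing for the L queries of one map element. The function name is hypothetical, strings stand in for query tensors, and raising an error when the existing map has more elements than query slots is an assumption, since the text does not cover that case.

```python
def build_query_set(ex_groups, learnable_groups, max_elements):
    """Concatenate EX query groups with learnable ones, up to max_elements groups."""
    n_ex = len(ex_groups)
    if n_ex > max_elements:
        # Behavior in this case is not specified in the text (assumption).
        raise ValueError("more existing map elements than query slots")
    return ex_groups + learnable_groups[: max_elements - n_ex]


ex_groups = ["ex_0", "ex_1"]                     # N_EX = 2 existing elements
learnable_pool = [f"learned_{i}" for i in range(5)]
query_groups = build_query_set(ex_groups, learnable_pool, max_elements=5)
```

The decoder always sees the same number of groups, so the rest of the architecture does not need to change when the existing map contains more or fewer elements.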

Map element ownership

While EX queries provide a way to account for existing map information, nothing ensures that the model uses these queries correctly to estimate the corresponding elements. In fact, if EX queries are used alone, the network does not even learn to recognize a fully accurate EX query. Therefore, we pre-attribute predictions to GT elements before applying traditional Hungarian matching in training, as shown in Figure 3.

Simply put, we keep track of which GT map element each element of the modified map corresponds to: if a map element is unmodified, shifted or warped, we can associate it with the original map element in the real map. To ensure that the model only learns to use useful information, we keep a correspondence only when the mean point-wise displacement score between the modified and real map elements is small enough:

[Equation: mean point-wise displacement score between a modified map element and its corresponding real map element.]
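One way to read this filtering rule is sketched below. The score is taken to be the average Euclidean distance between corresponding points, and the threshold of 1.0 m is a placeholder rather than the value used by MapEX; the function names are hypothetical.

```python
import math


def mean_displacement(element_a, element_b):
    """Average distance between corresponding points of two L-point elements."""
    if len(element_a) != len(element_b):
        raise ValueError("elements must have the same number of points")
    total = sum(math.dist(p, q) for p, q in zip(element_a, element_b))
    return total / len(element_a)


def pre_attribute(existing, ground_truth, correspondences, threshold=1.0):
    """Keep a tracked (existing -> GT) pair only if its displacement is small.

    correspondences maps an index in `existing` to an index in `ground_truth`.
    The threshold is a placeholder value, not the one used by MapEX.
    """
    return {
        i: j
        for i, j in correspondences.items()
        if mean_displacement(existing[i], ground_truth[j]) <= threshold
    }


ground_truth = [[(0.0, 0.0), (1.0, 0.0)], [(0.0, 5.0), (1.0, 5.0)]]
existing = [[(0.0, 0.5), (1.0, 0.5)], [(0.0, 9.0), (1.0, 9.0)]]
kept = pre_attribute(existing, ground_truth, {0: 0, 1: 1})
```

Here the first element moved by 0.5 m and keeps its correspondence, while the second moved by 4 m and is handed back to ordinary matching.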

Given the correspondences between GT and predicted map elements, we can remove the pre-attributed map elements from the pool of elements to be matched. The remaining map elements (predictions and GT) are then matched using some variant of the Hungarian algorithm, as usual. Therefore, the Hungarian matching step only needs to identify which EX queries correspond to added map elements that do not actually exist, and to find standard learned queries that fit the real map elements that are absent from the existing map (due to deletions or strong perturbations).
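The sketch below illustrates how pre-attribution shrinks the matching problem. To stay dependency-free it replaces the Hungarian algorithm with a brute-force search over assignments, which is only practical for a handful of elements; a real pipeline would more likely call something like `scipy.optimize.linear_sum_assignment`. The function name and the cost values are hypothetical.

```python
from itertools import permutations


def match_remaining(cost, pre_attributed):
    """Match the predictions and GT elements left over after pre-attribution.

    cost[i][j] is the cost of assigning prediction i to GT element j.
    pre_attributed is a dict {prediction index: GT index} fixed beforehand.
    Brute force stands in for the Hungarian algorithm in this sketch.
    """
    free_preds = [i for i in range(len(cost)) if i not in pre_attributed]
    used_gts = set(pre_attributed.values())
    free_gts = [j for j in range(len(cost[0])) if j not in used_gts]
    if len(free_preds) < len(free_gts):
        raise ValueError("not enough free predictions for the remaining GT")
    best, best_cost = (), float("inf")
    for perm in permutations(free_preds, len(free_gts)):
        total = sum(cost[i][j] for i, j in zip(perm, free_gts))
        if total < best_cost:
            best, best_cost = perm, total
    matches = dict(pre_attributed)
    matches.update(zip(best, free_gts))
    return matches  # unmatched predictions are treated as background


cost = [[0.1, 9.0], [9.0, 5.0], [9.0, 1.0]]  # 3 predictions, 2 GT elements
matches = match_remaining(cost, pre_attributed={0: 0})
```

With prediction 0 already attributed, only one GT element and two predictions remain to be matched, instead of the full 3×2 problem.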

Reducing the number of elements that the Hungarian algorithm has to process is important because even its most efficient variant has cubic complexity, O(N^3) [8]. This is not a major weakness of most current online HDMap acquisition methods, as the predicted maps are small (30m × 60m) and only three types of map elements are predicted. However, as online map generation develops further, predicted maps will become larger and more complete, and it will become necessary to accommodate a growing number of map elements.

4. Experimental results

Settings: We evaluate the MapEX framework on the nuScenes dataset, as it is the standard evaluation dataset for online HDMap estimation. We build on the MapTRv2 framework and its official code base. Following common practice, we report the average precision (AP) for the three map element types (divider, boundary, crossing) at different retrieval thresholds (chamfer distances of 0.5m, 1.0m and 1.5m), as well as the mAP over the three classes.
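To make the retrieval thresholds concrete, here is a sketch of a chamfer distance between two polylines and of the per-threshold test. It assumes the symmetric form that averages the two directed mean nearest-point distances; the official evaluation code may define it differently, and computing AP additionally requires ranking predictions by confidence, which is not shown.

```python
import math


def chamfer_distance(a, b):
    """Symmetric chamfer distance: mean nearest-point distance, both directions."""
    a_to_b = sum(min(math.dist(p, q) for q in b) for p in a) / len(a)
    b_to_a = sum(min(math.dist(q, p) for p in a) for q in b) / len(b)
    return 0.5 * (a_to_b + b_to_a)


THRESHOLDS = (0.5, 1.0, 1.5)  # meters, as in the evaluation protocol above

pred = [(0.0, 0.75), (1.0, 0.75), (2.0, 0.75)]
gt = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
hits = {t: chamfer_distance(pred, gt) <= t for t in THRESHOLDS}
```

A prediction offset by 0.75 m from its GT element counts as a correct retrieval at the 1.0 m and 1.5 m thresholds but not at 0.5 m.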

Each experiment was run three times, using three fixed random seeds. Importantly, for a given combination of seed and map scenario, the existing map data provided during validation is fixed to facilitate comparison. For consistency, we report results as mean ± standard deviation to one decimal place, even if the standard deviation exceeds this precision.
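The reporting convention can be sketched as follows. The scores are made-up values for illustration only, and the use of the sample standard deviation (rather than the population one) is an assumption.

```python
from statistics import mean, stdev


def summarize(scores):
    """Format per-seed scores as 'mean ± std' with one decimal place."""
    return f"{mean(scores):.1f} ± {stdev(scores):.1f}"


# Made-up mAP values for three seeds, for illustration only.
report = summarize([70.0, 71.0, 72.0])
print(report)
```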

MapEX performance

Table 2 compares related methods with MapEX in each scenario: maps without lane dividers or crosswalks (S1), noisy maps (S2a for offset map elements, S2b for strong point-wise noise) and substantially changed maps (S3a contains only such maps, S3b mixes in real maps). We exhaustively compare the performance of MapEX to existing online HDMap estimation methods in comparable settings (camera input, CNN backbone) and to the current state-of-the-art (which uses significantly more resources).

[Table 2: Comparison of MapEX with related methods on nuScenes.]

First, it is clear from Table 2 that any type of existing map information makes MapEX significantly outperform the existing literature in comparable settings, regardless of the scenario considered. In all but one case, the existing map information even allows MapEX to outperform the current state-of-the-art MapTRv2 model, which uses a large ViT backbone pretrained on an extensive depth estimation dataset and is trained for four times as many epochs. Even the rather conservative S2a scenario, with imprecise map element positioning, yields an improvement of 11.4 mAP (i.e. 16%).

Across all scenarios, we observe consistent improvements over the base MapTRv2 model on all 4 metrics. Understandably, scenario 3b (using accurate existing maps half the time) produces the best overall performance by a large margin, demonstrating a strong ability to identify and exploit fully accurate existing maps. Both scenario 2a (with offset map elements) and scenario 3a (with "outdated" map elements) provide very strong overall performance, with good results for all three types of map elements. Scenario 1, where only road boundaries are available, shows large mAP gains thanks to its (expected) very strong boundary retrieval. Even in the extremely challenging scenario 2b, where Gaussian noise with a standard deviation of 5 meters is applied to each map element point, significant gains are obtained over the base model, with particularly good retrieval performance for dividers and boundaries.

Improvements brought by MapEX

We now focus more specifically on the improvements that existing map information brings to MapEX. For reference, we compare the MapEX gains with those from other sources of additional information: Neural Map Prior, which uses a globally learned feature map; satellite maps; and P-MapNet, which uses geo-localized SDMaps. Importantly, MapEX relies on a stronger base model than these methods. While this makes it harder to improve upon the base model, it also makes it easier to achieve high scores. To avoid an unfair advantage, absolute scores are provided in Table 3.

We see from Table 3 that using MapEX with any type of existing map results in a larger overall mAP gain than any other source of additional information, including the more complex P-MapNet setup. We observe large improvements in the model's detection performance on both lane dividers and road boundaries. A slight exception is scenario 1 (only road boundaries are available), where the model successfully preserves map information on the boundaries but only matches the improvement of previous methods on the two map element types for which it has no prior information. Crosswalks appear to require more precise information from existing maps, as scenario 1 and scenario 2b (which imposes extremely damaging noise on each map point) only provide improvements comparable to existing techniques. Scenario 2a (shifted elements) and scenario 3a ("outdated" map) result in high crosswalk detection scores, probably because these two scenarios contain more accurate crosswalk information.

[Table 3: Gains from additional information sources, with absolute scores.]

Ablation experiments

MapEX input contribution

Table 4 shows how different types of inputs (existing maps, map element correspondences, and sensor inputs) affect MapEX. Existing maps have greatly improved performance.

[Table 4: Contribution of each input type to MapEX.]

About EX query encoding

Table 5 shows that learned EX queries perform much worse than our simple non-learnable EX queries. Interestingly, initializing learnable EX queries with the non-learnable values may bring very small improvements, which do not justify the added complexity.

[Table 5: Learned versus non-learnable EX queries.]

On ground truth attribution

Since pre-attributing map elements is important for making full use of existing map information, it might be tempting to pre-attribute all corresponding map elements instead of filtering them as MapEX does. Table 6 shows that when existing map elements are too different, discarding the correspondence indeed leads to stronger performance than indiscriminate attribution. Essentially, this suggests that MapEX is better off using learnable queries rather than EX queries when existing map elements differ too much from the ground truth.

[Table 6: Filtered versus indiscriminate pre-attribution.]

5. Discussion

This paper proposes leveraging existing maps to improve online HDMap construction. To investigate this, we outline three realistic scenarios in which existing (simple, noisy or outdated) maps are available and introduce the new MapEX framework to exploit these maps. Since current frameworks have no mechanism to take existing maps into account, we developed two new modules: one that encodes map elements into EX queries and another that ensures the model makes use of these queries.

Experimental results show that existing maps provide key information for online HDMap construction and that MapEX significantly improves on comparable methods in all cases. In fact, in terms of mAP, MapEX in scenario 2a (randomly shifted map elements) improves by 38% over the base MapTRv2 model and by 16% over the current state-of-the-art.

We hope this work will lead to new online HDMap construction methods that account for existing information. Existing maps, good or bad, are widely available. Ignoring them means giving up a key tool in the search for reliable online HDMap construction.

Origin blog.csdn.net/lovely_yoshino/article/details/134799176