[Paper Express] arXiv 2019 - MultiPath: Multiple Probabilistic Anchor Trajectory Hypotheses for Behavior Prediction

【Original Paper】: MultiPath: Multiple Probabilistic Anchor Trajectory Hypotheses for Behavior Prediction

【Author Information】: Yuning Chai∗, Benjamin Sapp∗, Mayank Bansal, Dragomir Anguelov

Available at: https://arxiv.org/pdf/1910.05449.pdf

Blogger keywords: motion planning, trajectory prediction

Recommended related papers:

- None

Summary:

Predicting human behavior is a difficult and crucial task required for motion planning. It is challenging in large part because of the highly uncertain and multimodal set of possible outcomes in real-world domains such as autonomous driving. Beyond single MAP trajectory prediction [1,2], obtaining an accurate probability distribution over the future is an area of active interest [3,4]. **We propose MultiPath, which leverages a fixed set of future state-sequence anchors that correspond to modes of the trajectory distribution.** At inference, our model predicts a discrete distribution over the anchors and, for each anchor, regresses offsets from the anchor waypoints along with uncertainties, yielding a Gaussian mixture at each time step. Our model is efficient, requiring only a single forward inference pass to obtain multimodal future distributions, and the output is parametric, allowing compact communication and analytical probabilistic queries. We show on several datasets that our model achieves more accurate predictions and, compared with sampling baselines, does so with an order of magnitude fewer trajectories.

Introduction:

We focus on the problem of predicting future agent states, a key task for robot planning in real-world environments. We are particularly interested in solving this problem for self-driving cars, an application with potentially huge societal impact. Importantly, anticipating the future behavior of other agents in the scene is critical for safe, comfortable and efficient operation. For example, if a car is about to cut in front of our robot, it is important to know whether to yield, or when the best time is to merge into traffic. Such future prediction requires an understanding of the static and dynamic world context: road semantics (e.g., lane connectivity, stop lines), traffic light information, and past observations of other agents, as shown in Figure 1.

Fig. 1. MultiPath estimates the distribution over future trajectories for each agent in a scene as follows: 1) Based on a top-down scene representation, the scene CNN extracts mid-level features that encode the states of individual agents and their interactions. 2) For each agent in the scene, we crop an agent-centric view of the mid-level feature representation and predict probabilities over a fixed set of K predefined anchor trajectories. 3) For each anchor, the model regresses offsets from the anchor states and an uncertainty distribution for each future time step.
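
As a rough illustration of steps 2 and 3 in the caption, the sketch below turns hypothetical raw head outputs into mixture parameters. The function name `decode_head`, the softmax over anchor logits, and the diagonal covariance stored as log standard deviations are assumptions made for this example; the summary above does not specify them.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decode_head(anchors, logits, offsets, log_stds):
    """Turn raw per-anchor outputs into mixture parameters (hypothetical layout).

    anchors:  K trajectories, each a list of T (x, y) waypoints
    logits:   K unnormalized anchor scores
    offsets:  K x T (dx, dy) offsets regressed relative to the anchor waypoints
    log_stds: K x T (log sx, log sy) per-axis log standard deviations (assumed)
    """
    weights = softmax(logits)
    # Mean of each Gaussian = anchor waypoint + regressed offset.
    means = [
        [(ax + dx, ay + dy) for (ax, ay), (dx, dy) in zip(anchor, offs)]
        for anchor, offs in zip(anchors, offsets)
    ]
    # Exponentiate so the standard deviations are always positive.
    stds = [[(math.exp(lx), math.exp(ly)) for lx, ly in row] for row in log_stds]
    return weights, means, stds

# Toy example: K = 2 anchors, T = 2 future steps.
anchors = [[(1.0, 0.0), (2.0, 0.0)], [(1.0, 1.0), (2.0, 2.0)]]
logits = [0.0, 0.0]
offsets = [[(0.1, 0.0), (0.2, 0.0)], [(0.0, -0.1), (0.0, -0.2)]]
log_stds = [[(0.0, 0.0), (0.0, 0.0)], [(0.0, 0.0), (0.0, 0.0)]]
weights, means, stds = decode_head(anchors, logits, offsets, log_stds)
```

The returned `weights`, `means` and `stds` together describe one Gaussian mixture per future time step, with one component per anchor.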

A fundamental aspect of future state prediction is that it is inherently stochastic, since agents have no way of knowing each other's motivations. When driving, we can never be sure what other drivers will do next, and it is important to consider multiple outcomes and their likelihood.

We seek a model of the future that can provide (1) a weighted, parsimonious set of discrete trajectories that covers the space of likely outcomes, and (2) a closed-form evaluation of the likelihood of any trajectory. These two properties enable efficient reasoning in key planning use cases, e.g., human-like reactions to discrete trajectory hypotheses (e.g., yielding, following), and probabilistic queries such as the expected risk of collision in a spatiotemporal region.

Both of these properties pose modeling challenges. Models attempting to achieve diversity and coverage often suffer from mode collapse during training [4,5,6], while tractable probabilistic inference is difficult due to the exponential growth in the space of possible trajectories over time.

**Our MultiPath model addresses these issues with a key insight: it uses a fixed set of trajectory anchors as the basis for modeling.** This allows us to factor stochastic uncertainty hierarchically. First, intent uncertainty captures the uncertainty about what the agent intends to do, and is encoded as a distribution over the set of anchor trajectories. Second, given an intent, control uncertainty represents our uncertainty about how the agent will achieve it. We assume that the control uncertainty is normally distributed at each future time step [7], parameterized such that its mean corresponds to a context-specific offset from the anchor state, and the associated covariance captures the unimodal aleatoric uncertainty [8]. Figure 1 shows a typical scenario where, given the scene context, there are 3 likely intents, with the control mean offsets refined to respect the road geometry and the control uncertainty intuitively growing over time.
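
The two-level factorization described above can be written compactly. The symbols are introduced here for illustration: $\mathbf{x}$ is the observed scene context, $\mathbf{a}^k$ the $k$-th anchor trajectory with waypoints $\mathbf{a}^k_t$, $\pi$ the intent distribution over the $K$ anchors, and $\boldsymbol{\mu}^k_t$, $\boldsymbol{\Sigma}^k_t$ the predicted offset and covariance at time step $t$.

$$
p(\mathbf{s} \mid \mathbf{x}) = \sum_{k=1}^{K} \pi(\mathbf{a}^k \mid \mathbf{x}) \prod_{t=1}^{T} \mathcal{N}\!\left(\mathbf{s}_t \,\middle|\, \mathbf{a}^k_t + \boldsymbol{\mu}^k_t(\mathbf{x}),\; \boldsymbol{\Sigma}^k_t(\mathbf{x})\right)
$$

The sum over $k$ carries the intent uncertainty, and the product of per-step Gaussians carries the control uncertainty conditioned on one intent.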

Our trajectory anchors are modes found in the training data, in state-sequence space, by unsupervised learning. These anchors provide templates for an agent's coarse-grained futures, and may correspond to semantic concepts such as "change lane" or "slow down" (although, to be clear, we do not use any semantic concepts in our modeling).
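
One way to picture how such anchors could be obtained is to cluster agent-centric future trajectories. The sketch below uses k-means with farthest-point initialization as an assumed choice of unsupervised method; the text above only says "unsupervised learning", and the helper names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length (x, y) trajectories."""
    return sum((ax - bx) ** 2 + (ay - by) ** 2 for (ax, ay), (bx, by) in zip(a, b))

def mean_trajectory(trajs):
    """Waypoint-wise mean of a non-empty list of trajectories."""
    n = len(trajs)
    return [
        (sum(t[i][0] for t in trajs) / n, sum(t[i][1] for t in trajs) / n)
        for i in range(len(trajs[0]))
    ]

def kmeans_anchors(trajs, k, iters=20):
    """Cluster trajectories into k anchors (k-means, farthest-point init)."""
    # Deterministic init: start from the first trajectory, then repeatedly add
    # the trajectory that is farthest from all anchors chosen so far.
    anchors = [trajs[0]]
    while len(anchors) < k:
        anchors.append(max(trajs, key=lambda t: min(sq_dist(t, a) for a in anchors)))
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for t in trajs:
            nearest = min(range(k), key=lambda c: sq_dist(t, anchors[c]))
            clusters[nearest].append(t)
        # Keep the previous anchor if a cluster ends up empty.
        anchors = [mean_trajectory(c) if c else anchors[i] for i, c in enumerate(clusters)]
    return anchors

# Toy data: two roughly straight futures and two left-turning futures.
trajs = [
    [(1.0, 0.0), (2.0, 0.0)],
    [(1.0, 0.0), (2.0, 0.2)],
    [(1.0, 1.0), (2.0, 3.0)],
    [(1.0, 1.0), (2.0, 3.2)],
]
anchors = kmeans_anchors(trajs, k=2)
```

On this toy data the two anchors settle on the mean "straight" and the mean "left turn" trajectory, which is the kind of coarse template the paragraph describes.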

Our full model predicts a Gaussian mixture model (GMM) at each time step, with the mixture weights (the intent distribution) fixed over time. Given such a parametric distribution model, we can directly evaluate the likelihood of any future trajectory, and there is also a simple way to obtain a compact, diverse, weighted set of trajectory samples: take the MAP sample from each anchor-intent.
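
A minimal sketch of both operations follows, assuming diagonal covariances given as per-axis standard deviations and strictly positive mixture weights. The function names and data layout are hypothetical, chosen only for this example.

```python
import math

def gaussian_logpdf_diag(point, mean, std):
    """Log density of an axis-aligned (diagonal-covariance) Gaussian."""
    lp = 0.0
    for v, m, s in zip(point, mean, std):
        lp += -0.5 * ((v - m) / s) ** 2 - math.log(s) - 0.5 * math.log(2.0 * math.pi)
    return lp

def trajectory_log_likelihood(traj, weights, means, stds):
    """log sum_k w_k prod_t N(s_t | mean_kt, std_kt), via log-sum-exp.

    Weights are shared across time steps and must be strictly positive.
    """
    per_anchor = []
    for w, mu_k, sd_k in zip(weights, means, stds):
        lp = math.log(w)
        for s_t, m_t, sd_t in zip(traj, mu_k, sd_k):
            lp += gaussian_logpdf_diag(s_t, m_t, sd_t)
        per_anchor.append(lp)
    top = max(per_anchor)
    return top + math.log(sum(math.exp(lp - top) for lp in per_anchor))

def map_trajectories(weights, means):
    """One representative (the component mean) per anchor, heaviest first."""
    return sorted(zip(weights, means), key=lambda pair: pair[0], reverse=True)

# Toy mixture: K = 2 anchors, T = 2 steps, unit standard deviations.
weights = [0.7, 0.3]
means = [[(1.0, 0.0), (2.0, 0.0)], [(1.0, 1.0), (2.0, 3.0)]]
stds = [[(1.0, 1.0), (1.0, 1.0)], [(1.0, 1.0), (1.0, 1.0)]]
ll_straight = trajectory_log_likelihood(means[0], weights, means, stds)
ll_turn = trajectory_log_likelihood(means[1], weights, means, stds)
ranked = map_trajectories(weights, means)
```

Because each component is Gaussian, its most likely trajectory is simply its mean, so the weighted sample set costs nothing beyond the forward pass.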

Our model contrasts with popular earlier methods, which either provide only a single MAP trajectory [1,2,9,10,11] or an unweighted set of samples drawn from a generative model [3,4,6,12,13,14,15]. For real-world applications such as self-driving cars, sample-based approaches have several disadvantages: (1) non-determinism in a safety-critical system; (2) a poor handle on approximation error (e.g., "How many samples must I draw to know the probability that a pedestrian will cross the street?"); and (3) no easy way to perform probabilistic inference for relevant queries, such as computing an expectation over a spatiotemporal region.
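
To make point (3) concrete, the sketch below answers one such query in closed form for a parametric mixture: the probability that the agent is inside an axis-aligned box at a given time step. It assumes diagonal covariances so that each component factorizes over x and y; the function names and the box layout are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def interval_prob(lo, hi, mean, std):
    """P(lo <= X <= hi) for a 1D Gaussian X with the given mean and std."""
    return normal_cdf((hi - mean) / std) - normal_cdf((lo - mean) / std)

def region_probability(t, box, weights, means, stds):
    """P(agent inside box at time step t); box = (x_min, x_max, y_min, y_max)."""
    x_min, x_max, y_min, y_max = box
    total = 0.0
    for w, mu_k, sd_k in zip(weights, means, stds):
        (mx, my), (sx, sy) = mu_k[t], sd_k[t]
        # With a diagonal covariance, the box probability is a product of
        # two 1D interval probabilities.
        total += w * interval_prob(x_min, x_max, mx, sx) * interval_prob(y_min, y_max, my, sy)
    return total

# Toy mixture: K = 2 anchors, T = 2 steps, unit standard deviations.
weights = [0.7, 0.3]
means = [[(1.0, 0.0), (2.0, 0.0)], [(1.0, 1.0), (2.0, 3.0)]]
stds = [[(1.0, 1.0), (1.0, 1.0)], [(1.0, 1.0), (1.0, 1.0)]]
# Box centered on the second anchor's final waypoint.
p_box = region_probability(1, (1.0, 3.0, 2.0, 4.0), weights, means, stds)
```

No sampling is involved, so the answer is deterministic and carries no Monte Carlo approximation error.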

We demonstrate empirically, on synthetic and real-world prediction datasets, that the distributions emitted by our model better predict observed outcomes: we obtain higher likelihood than models emitting unimodal parametric distributions, which shows the importance of multiple anchors in real-world data. We also compare with sampling-based methods by using the weighted set of MAP trajectories, one per anchor, which describes the future better with fewer samples when evaluated on sample-set metrics.
