An Overview of End-to-End Driving Models for Unmanned Vehicles

Summary:

End-to-end driving models map raw sensor data directly to vehicle control signals with a single deep neural network whose parameters are all trained jointly. The approach is notable for its simplicity and efficiency.

Introduction

When building an unmanned vehicle, the main job for my friends and me was to build the driving model. The so-called driving model is the software that controls how the unmanned vehicle drives; functionally, it plays the role of the driver. Its input is the vehicle's state and information about the surrounding environment, and its output is the control signal for the vehicle. Of all driving models, the most straightforward is the end-to-end driving model, which derives the vehicle's control signals directly from the vehicle state and the external environment. From the input end (raw sensor data) to the output end (control signals), no hand-designed features are needed. Typically, end-to-end driving models use a deep neural network to perform this mapping, and all parameters of the network are trained jointly. This approach is notable for its simplicity and efficiency.
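To make the idea concrete, here is a minimal, purely illustrative PyTorch sketch of such a mapping, assuming a single front-camera image as input and a single steering value as output. The layer sizes, image resolution, and training loop are placeholders for illustration, not any published architecture.

```python
import torch
import torch.nn as nn

class TinyEndToEndDriver(nn.Module):
    """Illustrative end-to-end model: camera image in, control signal out."""
    def __init__(self):
        super().__init__()
        # Convolutional feature extractor over the raw camera frame.
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # Regression head that outputs a single steering value.
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, image):
        return self.head(self.features(image))

# All parameters are trained jointly from (image, human steering) pairs.
model = TinyEndToEndDriver()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

image = torch.randn(8, 3, 120, 160)   # a batch of camera frames (placeholder data)
steering = torch.randn(8, 1)          # the human driver's steering commands
loss = loss_fn(model(image), steering)
loss.backward()
optimizer.step()
```

There are no hand-designed intermediate representations here: whatever features are useful emerge from the joint training of the whole network.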

The Universal Approximation Theorem (UAT) provides some theoretical basis for this approach [1]. UAT states that a feedforward network with sufficiently many hidden nodes, using a non-constant, bounded, continuous activation function, can approximate any continuous function defined on a compact subset of Euclidean space to any given accuracy. If we assume that human driving behavior is such a continuous function on a compact subset of Euclidean space, then there must exist a neural network that approximates the world's best human driver with arbitrary precision. Although UAT does not tell us how to find this network, intrepid scientists have already embarked on the journey.
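Stated loosely in symbols (following the classical single-hidden-layer formulation surveyed in [1]): for any continuous target function f on a compact set K and any tolerance ε > 0, there is a network F of the following form that stays within ε of f everywhere on K, where σ is a non-constant, bounded, continuous activation function:

```latex
\[
  F(x) \;=\; \sum_{i=1}^{N} v_i \,\sigma\!\left(w_i^{\top} x + b_i\right),
  \qquad
  \sup_{x \in K} \bigl| F(x) - f(x) \bigr| \;<\; \varepsilon .
\]
```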

The Evolution of End-to-End Driving Models

The earliest attempts at an end-to-end driving model go back at least to the ALVINN model in 1989 [2]. ALVINN is a three-layer neural network whose inputs include video of the road ahead, laser rangefinder data, and an intensity feedback signal. For the video input, ALVINN uses only the blue channel, because the contrast between road and non-road is strongest there. For the rangefinder input, each neuron's activation is proportional to the distance from the corresponding point to the vehicle. The intensity feedback describes the relative brightness of road and non-road regions in the previous image. ALVINN's output is a vector indicating the direction to travel, together with the intensity feedback that is fed back as input at the next time step. The specific network structure is shown in Figure 1.

Figure 1: Schematic diagram of the ALVINN network structure (image from [2])

When training ALVINN, the ground truth for its output is set to a distribution: the center of the distribution corresponds to the direction that would take the vehicle to the center of the road 7 meters ahead, and the distribution decays rapidly to 0 on both sides of that center. In addition, a large amount of synthetic road data was used during training to improve ALVINN's generalization ability. The model successfully drove a 400-meter road at a speed of 0.5 meters per second. In 1995, researchers at Carnegie Mellon University built on ALVINN by introducing a virtual camera, enabling it to detect and traverse roads and intersections [3]. In 2006, Yann LeCun's group at New York University presented an end-to-end obstacle-avoidance robot built on a six-layer convolutional neural network [4].
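As an illustration of such a soft steering target, the snippet below builds a vector of direction bins that peaks at the correct bin and decays toward both sides. The Gaussian shape, bin count, and width here are assumptions chosen for illustration, not ALVINN's exact target curve.

```python
import numpy as np

def soft_steering_target(correct_bin, num_bins=45, width=2.0):
    """Build a target vector that peaks at the correct direction bin and
    decays rapidly toward both sides (illustrative, not ALVINN's exact curve)."""
    bins = np.arange(num_bins)
    target = np.exp(-0.5 * ((bins - correct_bin) / width) ** 2)
    return target / target.max()

# Bin 22 here stands for the direction that steers the vehicle toward
# the road center 7 m ahead.
print(soft_steering_target(correct_bin=22)[18:27].round(2))
```

Training against a peaked distribution rather than a one-hot label tells the network that nearby steering directions are almost as good, which makes the supervision smoother.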

In recent years, one of the more influential works is PilotNet [5], developed by NVIDIA in 2016. As shown in Figure 2, the model uses convolutional and fully connected layers to extract features from the input image and output the steering wheel angle (equivalently, the turning radius). NVIDIA also provided the NVIDIA PX 2 computing platform for real-vehicle road tests. In follow-up work, NVIDIA visualized the features learned inside PilotNet and found that it spontaneously attends to obstacles, lane lines, and other objects of important reference value for driving [6].

Figure 2: Schematic diagram of the PilotNet network structure (image from [5])
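For reference, [5] describes five convolutional layers followed by fully connected layers over a 66x200 camera image. A sketch close to that description might look like the following; normalization and other details are simplified, so treat it as approximate rather than the exact published network.

```python
import torch
import torch.nn as nn

# A PilotNet-style stack: five conv layers and fully connected layers
# mapping a 66x200 image to one steering value (layer widths follow [5],
# other details simplified).
pilotnet_like = nn.Sequential(
    nn.Conv2d(3, 24, 5, stride=2), nn.ReLU(),
    nn.Conv2d(24, 36, 5, stride=2), nn.ReLU(),
    nn.Conv2d(36, 48, 5, stride=2), nn.ReLU(),
    nn.Conv2d(48, 64, 3), nn.ReLU(),
    nn.Conv2d(64, 64, 3), nn.ReLU(),
    nn.Flatten(),
    nn.LazyLinear(100), nn.ReLU(),   # infers the flattened size automatically
    nn.Linear(100, 50), nn.ReLU(),
    nn.Linear(50, 10), nn.ReLU(),
    nn.Linear(10, 1),                # steering command
)

steering = pilotnet_like(torch.randn(1, 3, 66, 200))  # one camera frame
```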

Models after PilotNet have sprung up. An important representative is the FCN-LSTM network proposed by the University of California, Berkeley [7]. As shown in Figure 3, the network first abstracts the image into a feature vector with a fully convolutional network, then fuses the current features with previous features through a long short-term memory (LSTM) network, and outputs the current control signal. It is worth pointing out that the network uses an image segmentation task to assist training; using extra supervisory signals to nudge the network parameters from "disordered" toward "ordered" is an interesting attempt. The works above focus only on the "lateral control" of the unmanned vehicle, i.e., the steering wheel angle. The multi-modal multi-task network proposed by the University of Rochester [8] builds on this earlier work: it outputs not only the steering wheel angle but also the expected speed, i.e., it adds "longitudinal control", and thus provides the complete set of basic control signals an unmanned vehicle requires. Its network structure is shown in Figure 4, and a rough code sketch of this recurrent, two-headed idea follows the figures below.

Figure 3: Schematic diagram of the FCN-LSTM network structure (image from [7])

Figure 4: Schematic diagram of the multi-modal multi-task network structure (image from [8])
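The sketch below shows only the shared idea behind [7] and [8]: per-frame CNN features are fused over time by an LSTM, and two heads output steering (lateral) and speed (longitudinal) commands. It is not either paper's exact architecture, and all sizes are illustrative.

```python
import torch
import torch.nn as nn

class RecurrentDriver(nn.Module):
    """Illustrative FCN-LSTM-style model: per-frame CNN features fused over
    time by an LSTM, then mapped to steering and speed (not the exact
    architectures of [7] or [8])."""
    def __init__(self, feat_dim=128, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(              # per-frame feature extractor
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.steering_head = nn.Linear(hidden, 1)  # lateral control
        self.speed_head = nn.Linear(hidden, 1)     # longitudinal control

    def forward(self, frames):                     # frames: (batch, time, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.encoder(frames.flatten(0, 1)).view(b, t, -1)
        hidden_seq, _ = self.lstm(feats)
        last = hidden_seq[:, -1]                   # state after the latest frame
        return self.steering_head(last), self.speed_head(last)

steer, speed = RecurrentDriver()(torch.randn(2, 5, 3, 90, 160))
```

An auxiliary segmentation head, as used in [7], would simply be a third output branch trained with its own loss on the encoder features.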

The ST-Conv + ConvLSTM + LSTM network proposed by Peking University is more refined still [9]. As shown in Figure 5, the network is roughly divided into two parts: a feature extraction sub-network and a steering-angle prediction sub-network. The feature extraction sub-network employs spatio-temporal convolution, multi-scale residual aggregation, convolutional LSTM, and other modules and techniques. The steering-angle prediction sub-network mainly fuses and propagates temporal information. The authors also found that the lateral and longitudinal control of an unmanned vehicle are strongly correlated, so jointly predicting both helps the network learn more effectively.

Figure 5: Schematic diagram of the ST-Conv + ConvLSTM + LSTM network structure (image from [9])
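To illustrate what "spatio-temporal convolution" refers to in general terms (the generic operation only, not the specific module in [9]): a 3D convolution slides over the time axis as well as over height and width, so the resulting features can capture motion across frames.

```python
import torch
import torch.nn as nn

# Spatio-temporal convolution illustrated: the kernel spans 3 frames in time
# and a 5x5 window in space, so activations respond to motion, not just appearance.
clip = torch.randn(1, 3, 8, 90, 160)          # (batch, channels, time, H, W)
st_conv = nn.Conv3d(3, 16, kernel_size=(3, 5, 5), stride=(1, 2, 2))
print(st_conv(clip).shape)                    # torch.Size([1, 16, 6, 43, 78])
```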

Characteristics of the end-to-end driving model

By this point, you may have noticed that the end-to-end model has benefited from the rapid development of deep learning and keeps evolving in a more sophisticated direction: from the initial three-layer network, it has gradually been armed with the latest modules and techniques. With the support of these technologies, end-to-end driving models have basically realized functions such as straight-road driving, curve driving, and speed control. To convey the current state of development of the end-to-end model, we make a simple comparison between it and the traditional model at the algorithmic level, as shown in Table 1 below:

Table 1: Comparison of traditional driving model and end-to-end model

Traditional models generally divide the driving task into multiple sub-modules, such as perception, localization, mapping, planning, and control. Each sub-module performs a specific function; the output of one module serves as the input of others, and the modules connect to form a directed graph. This approach requires manually decoupling the unmanned vehicle's driving task and designing each sub-module, and the number of sub-modules can reach into the thousands, making the work time-consuming and labor-intensive and the maintenance costs high. Such a large number of sub-modules also places extremely high demands on the on-board computing platform, which needs substantial computing power so that every module can respond quickly to changes in the environment. In addition, traditional driving models often rely on high-precision maps, which drives up data costs. Finally, this type of model uses rule-based logic to plan and control the vehicle's motion, which leads to a driving style that is not very human-like and hurts ride comfort. In contrast, the end-to-end model shows strong advantages in simplicity, ease of use, low cost, and human-like driving.

People often regard the end-to-end driving model as the antithesis of the traditional modular model: once you have a modular model, there is no need for end-to-end. In the field of unmanned delivery, however, I think the two should be complementary. First, the "small, light, slow, goods-carrying" characteristics of unmanned delivery vehicles [10] greatly reduce their safety risks, which makes deploying end-to-end models feasible. Second, the end-to-end model handles common scenarios well and with low power consumption, while the modular approach covers more scenarios but consumes much more power. Therefore, a valuable direction is to deploy end-to-end and modular models jointly: use end-to-end for common scenarios and switch to the modular model for complex ones. In this way, we can reduce the delivery vehicle's power consumption as much as possible while maintaining overall performance.
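A toy sketch of such joint deployment is below. The names scenario_is_common, end_to_end_model, and modular_pipeline are hypothetical placeholders standing in for a scenario classifier and the two controllers; this is only the dispatch idea, not a real system.

```python
def choose_control(sensor_frame, scenario_is_common, end_to_end_model, modular_pipeline):
    """Toy dispatcher: run the cheap end-to-end model in common scenarios and
    fall back to the full modular pipeline in complex ones.
    All three callables are hypothetical placeholders."""
    if scenario_is_common(sensor_frame):
        return end_to_end_model(sensor_frame)   # low-power path for routine driving
    return modular_pipeline(sensor_frame)       # high-coverage, high-power path
```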

So will we see unmanned delivery vehicles controlled by end-to-end driving models soon? In fact, the end-to-end driving model is still at the research stage. From my own work experience, I have summarized the following difficulties:

1. The end-to-end driving model is difficult to debug because of its nearly black-box nature.

Since the end-to-end model works as a whole, when it fails in a certain situation it is almost impossible to find the "sub-module" responsible for the failure, so there is no way to do targeted tuning. When a failure case is encountered, the usual approach is simply to add more data and hope that the retrained model handles that case next time.

2. It is difficult to introduce prior knowledge into the end-to-end driving model.

Current end-to-end models mostly imitate the actions of human drivers without understanding the rules behind those actions. It is difficult to teach a model rules such as traffic regulations and courteous driving in a purely data-driven way; more research is needed.

3. It is difficult for end-to-end driving models to properly handle long-tail scenarios.

For common scenarios, it is easy to teach an end-to-end model the correct behavior in a data-driven way. However, real road conditions vary widely, and we cannot collect data for every scenario. For scenes the model has never seen, its performance is often worrying. How to improve the generalization ability of the model is an urgent problem to be solved.

4. End-to-end driving models typically learn driving skills by imitating the control behavior of human drivers. But what this method essentially learns is the driver's "average control signal", and the "average control signal" may not be a "correct" signal at all.

For example, at a T-junction where both left and right turns are allowed, the average control signal, "go straight", is simply wrong. How to properly learn the control strategy of human drivers therefore remains to be studied.

On this problem, my friends and I did a small piece of work together. In this work, we model the driver's actions in a given state as samples from a probability distribution, and we estimate that distribution by learning its moments. The driver's control strategy can then be expressed through the moments of this distribution, avoiding the shortcoming of learning only the "average control signal". This work was accepted at ROBIO 2018.
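To illustrate the general idea of predicting distribution parameters instead of a single average (and not the ROBIO 2018 method itself), the sketch below predicts the first two moments of a Gaussian over the steering command and trains them with a negative log-likelihood loss. For a truly multi-modal case like the T-junction, one would need more moments or a mixture; this only shows the basic mechanism.

```python
import torch
import torch.nn as nn

class MomentHead(nn.Module):
    """Predicts the mean and (log-)variance of the steering distribution
    instead of a single 'average' value (a generic sketch, not the paper's method)."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.mean = nn.Linear(feat_dim, 1)
        self.log_var = nn.Linear(feat_dim, 1)

    def forward(self, features):
        return self.mean(features), self.log_var(features)

def gaussian_nll(mean, log_var, target):
    # Negative log-likelihood of the observed driver action under the predicted
    # Gaussian; penalizes both wrong means and overconfident variances.
    return 0.5 * (log_var + (target - mean) ** 2 / log_var.exp()).mean()

head = MomentHead()
features = torch.randn(16, 128)   # features from any backbone network
target = torch.randn(16, 1)       # observed human steering commands
mean, log_var = head(features)
loss = gaussian_nll(mean, log_var, target)
```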

Common methods used in end-to-end driving models

To address the problems above, brave scientists have proposed many methods, among which the most anticipated are deep learning [11] and reinforcement learning [12]. As deep learning techniques continue to develop, the interpretability and generalization ability of these models should improve further; we may then be able to tune a network in a targeted manner, or generalize from coarse simulation and limited data to real-vehicle and long-tail scenarios. Reinforcement learning has achieved amazing results in recent years; by letting unmanned vehicles learn in a simulated environment, they might even acquire control strategies better than those of human drivers, though whether this will happen remains unknown. In addition, the rapid development of transfer learning, adversarial learning, meta-learning, and related technologies may also have a huge impact on end-to-end driving models.

I am very excited about the future development of end-to-end driving models. "Two roads diverged in a wood, and I took the one less traveled by" [13].

References

[1] Csáji, Balázs Csanád. "Approximation with artificial neural networks." Faculty of Sciences, Eötvös Loránd University, Hungary 24 (2001): 48.

[2]  Pomerleau, Dean A. "Alvinn: An autonomous land vehicle in a neural network." In Advances in neural information processing systems, pp. 305-313. 1989.

[3]  Jochem, Todd M., Dean A. Pomerleau, and Charles E. Thorpe. "Vision-based neural network road and intersection detection and traversal." In Intelligent Robots and Systems 95.'Human Robot Interaction and Cooperative Robots', Proceedings. 1995 IEEE/RSJ International Conference on, vol. 3, pp. 344-349. IEEE, 1995.

[4]  Muller, Urs, Jan Ben, Eric Cosatto, Beat Flepp, and Yann L. Cun. "Off-road obstacle avoidance through end-to-end learning." In Advances in neural information processing systems, pp. 739-746. 2006.

[5] Bojarski, Mariusz, Davide Del Testa, Daniel Dworakowski, Bernhard Firner, Beat Flepp, Prasoon Goyal, Lawrence D. Jackel et al. "End to End Learning for Self-Driving Cars." arXiv preprint arXiv:1604.07316 (2016).

[6]  Bojarski, Mariusz, Philip Yeres, Anna Choromanska, Krzysztof Choromanski, Bernhard Firner, Lawrence Jackel, and Urs Muller. "Explaining how a deep neural network trained with end-to-end learning steers a car." arXiv preprint arXiv:1704.07911 (2017).

[7]  Xu, Huazhe, Yang Gao, Fisher Yu, and Trevor Darrell. "End-to-end learning of driving models from large-scale video datasets." arXiv preprint (2017).

[8] Yang, Zhengyuan, Yixuan Zhang, Jerry Yu, Junjie Cai, and Jiebo Luo. "End-to-end Multi-Modal Multi-Task Vehicle Control for Self-Driving Cars with Visual Perception." arXiv preprint arXiv:1801.06734 (2018).

[9] Chi, Lu, and Yadong Mu. "Deep steering: Learning end-to-end driving model from spatial and temporal visual cues." arXiv preprint arXiv:1708.03798 (2017).

[10] "Xia Huaxia: Unmanned Delivery Scenario Contributes to the Iteration of Autonomous Driving Technology",
http://auto.qq.com/a/20180621/029250.htm, Tencent Auto, 2018.

[11] LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. "Deep learning." nature 521, no. 7553 (2015): 436.

[12] Sutton, Richard S., and Andrew G. Barto. Reinforcement learning: An introduction. MIT press, 2018.

[13] Frost, Robert. Mountain interval. H. Holt, 1921.

Source | Meituan